root/llama.cpp (mirror of https://github.com/ggerganov/llama.cpp.git, synced 2024-09-22 21:16:20 +00:00)
Actions
All Workflows
build.yml
close-issue.yml
docker.yml
editorconfig.yml
gguf-publish.yml
labeler.yml
nix-ci-aarch64.yml
nix-ci.yml
nix-flake-update.yml
nix-publish-flake.yml
python-check-requirements.yml
python-lint.yml
python-type-check.yml
server.yml
#218 | musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80) (#9526) | commit c35e586ea5 | pushed by root to master | 2024-09-22 21:16:20 +00:00 | 0s
#212 | CUDA: enable Gemma FA for HIP/Pascal (#9581) | commit a5b57b08ce | pushed by root to master | 2024-09-22 21:16:20 +00:00 | 0s
#203 | llama: remove redundant loop when constructing ubatch (#9574) | commit ecd5d6b65b | pushed by root to master | 2024-09-22 13:06:19 +00:00 | 0s
#197 | ggml-alloc : fix list of allocated tensors with GGML_ALLOCATOR_DEBUG (#9573) | commit d09770cae7 | pushed by root to master | 2024-09-22 04:56:20 +00:00 | 0s
#190 | Update CUDA graph on scale change plus clear nodes/params (#9550) | commit 41f477879f | pushed by root to master | 2024-09-21 12:36:19 +00:00 | 0s
#184 | quantize : improve type name parsing (#9570) | commit 63351143b2 | pushed by root to master | 2024-09-21 04:26:20 +00:00 | 0s
#178 | examples : flush log upon ctrl+c (#9559) | commit d39e26741f | pushed by root to master | 2024-09-20 20:16:21 +00:00 | 0s
#169 | server : clean-up completed tasks from waiting list (#9531) | commit 6026da52d6 | pushed by root to master | 2024-09-20 12:06:20 +00:00 | 0s
#159 | ggml : fix n_threads_cur initialization with one thread (#9538) | commit 64c6af3195 | pushed by root to master | 2024-09-19 11:36:20 +00:00 | 0s
#150 | server : match OAI structured output response (#9527) | commit 8a308354f6 | pushed by root to master | 2024-09-18 19:16:20 +00:00 | 0s
#145 | [SYCL] set context default value to avoid memory issue, update guide (#9476) | commit faf67b3de4 | pushed by root to master | 2024-09-18 11:06:21 +00:00 | 0s
#139 | arg : add env variable for parallel (#9513) | commit 8b836ae731 | pushed by root to master | 2024-09-18 02:56:19 +00:00 | 0s
#129 | llama : fix n_vocab init for 'no_vocab' case (#9511) | commit 8344ef58f8 | pushed by root to master | 2024-09-17 18:46:21 +00:00 | 0s
#110 | ggml : move common CPU backend impl to new header (#9509) | commit 23e0d70bac | pushed by root to master | 2024-09-17 10:46:18 +00:00 | 0s
#103 | convert : identify missing model files (#9397) | commit d54c21df7e | pushed by root to master | 2024-09-16 18:26:17 +00:00 | 0s
#89 | common : reimplement logging (#9418) | commit 6262d13e0b | pushed by root to master | 2024-09-16 10:16:19 +00:00 | 0s
#82 | py : add "LLaMAForCausalLM" conversion support (#9485) | commit 3c7989fd29 | pushed by root to master | 2024-09-15 17:56:18 +00:00 | 0s
#71 | ggml : ggml_type_name return "NONE" for invalid values (#9458) | commit 822b6322de | pushed by root to master | 2024-09-15 09:46:17 +00:00 | 0s
#66 | cmake : use list(APPEND ...) instead of set() + dedup linker (#9463) | commit 1f4111e540 | pushed by root to master | 2024-09-14 17:26:17 +00:00 | 0s
#57 | server : add loading html page while model is loading (#9468) | commit feff4aa846 | pushed by root to master | 2024-09-14 09:16:17 +00:00 | 0s
#50 | llama : llama_perf + option to disable timings during decode (#9355) | commit 0abc6a2c25 | pushed by root to master | 2024-09-13 16:56:18 +00:00 | 0s
#42 | server : Add option to return token pieces in /tokenize endpoint (#9108) | commit 78203641fe | pushed by root to master | 2024-09-13 08:46:18 +00:00 | 0s
#31 | cann: Add host buffer type for Ascend NPU (#9406) | commit e6b7801bd1 | pushed by root to master | 2024-09-13 00:36:16 +00:00 | 0s
#24 | cann: Fix error when running a non-exist op (#9424) | commit df4b7945ae | pushed by root to master | 2024-09-12 16:26:18 +00:00 | 0s
#4 | llama : skip token bounds check when evaluating embeddings (#9437) | commit 1b28061400 | pushed by root to master | 2024-09-12 08:16:17 +00:00 | 0s