llama.cpp/ggml
Latest commit 116efee0ee by Ivan, 2024-09-24 02:14:24 +02:00:

    cuda: add q8_0->f32 cpy operation (#9571)

    llama: enable K-shift for quantized KV cache.
    It will fail on unsupported backends or quant types.
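For context on what the new copy op enables: ggml stores q8_0 data in blocks of 32 int8 quants that share a single fp16 scale, so copying q8_0 into an f32 tensor amounts to a per-block dequantization. The sketch below shows that idea under that assumption; the type and function names (block_q8_0_sketch, cpy_blck_q8_0_f32_sketch) are illustrative, not the identifiers used in the commit.

```cuda
// Minimal sketch of a q8_0 -> f32 dequantizing block copy, assuming ggml's
// q8_0 layout: 32 int8 quants sharing one fp16 scale per block.
// Names suffixed with _sketch are hypothetical, not ggml's API.
#include <cuda_fp16.h>
#include <stdint.h>

#define QK8_0 32  // quants per q8_0 block

typedef struct {
    half   d;          // per-block fp16 scale
    int8_t qs[QK8_0];  // quantized values
} block_q8_0_sketch;

// Copy one q8_0 block into 32 contiguous floats: dst[j] = quant[j] * scale.
static __device__ void cpy_blck_q8_0_f32_sketch(const char * cxi, char * cdsti) {
    const block_q8_0_sketch * xi   = (const block_q8_0_sketch *) cxi;
    float                   * dsti = (float *) cdsti;

    const float d = __half2float(xi->d);

    for (int j = 0; j < QK8_0; j++) {
        dsti[j] = xi->qs[j] * d;
    }
}
```

With such a copy op available, the K-shift path can round-trip the quantized K cache through f32 (dequantize, apply the RoPE shift, re-quantize), which is presumably why backends or quant types lacking the op are the ones the commit message says will fail.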
Name            Last commit message                                                       Last commit date
cmake           llama : reorganize source code + improve CMake (#8006)                    2024-06-26 18:33:02 +03:00
include         ggml/examples: add backend support for numerical optimization (ggml/949)  2024-09-20 21:15:05 +03:00
src             cuda: add q8_0->f32 cpy operation (#9571)                                 2024-09-24 02:14:24 +02:00
.gitignore      vulkan : cmake integration (#8119)                                        2024-07-13 18:12:39 +02:00
CMakeLists.txt  cmake : do not hide GGML options + rename option (#9465)                  2024-09-16 10:27:50 +03:00