llama.cpp

Mirror of https://github.com/ggerganov/llama.cpp.git (last synced 2025-01-13 12:10:18 +00:00)


Latest commit: e9e661bd59 by mahorozte, 2024-12-03 20:04:49 +02:00
CUDA: remove unnecessary warp reduce in FA (ggml/1032)
* kqmax_new_j is already the same in every thread of the warp after the operation at line 199, so this reduce can be omitted
* the same problem exists in vec32
Co-authored-by: ZhaoXiaoYu <zhao.xiaoyu@zte.com.cn>
include	ggml : move AMX to the CPU backend (#10570)	2024-11-29 21:54:58 +01:00
src	CUDA: remove unnecessary warp reduce in FA (ggml/1032)	2024-12-03 20:04:49 +02:00
.gitignore	vulkan : cmake integration (#8119)	2024-07-13 18:12:39 +02:00
CMakeLists.txt	ggml : automatic selection of best CPU backend (#10606)	2024-12-01 16:12:41 +01:00
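The FA commit (ggml/1032) argues that kqmax_new_j already holds the same value in every thread of the warp, so a second warp reduction is redundant. The sketch below illustrates why that can hold. It is a host-side Python simulation, not the ggml CUDA kernel, and it assumes the earlier reduction is a butterfly (XOR-shuffle) max in the style of CUDA's `__shfl_xor_sync`. The helper names `shfl_xor` and `warp_reduce_max` are illustrative only.

```python
WARP_SIZE = 32  # assumption: 32 lanes per warp, as on NVIDIA GPUs


def shfl_xor(lanes, lane_mask):
    """Simulate an XOR shuffle: lane i reads the value held by lane i ^ lane_mask."""
    return [lanes[i ^ lane_mask] for i in range(len(lanes))]


def warp_reduce_max(lanes):
    """Butterfly (XOR-shuffle) max reduction over one simulated warp.

    Each step pairs lane i with lane i ^ offset, for offsets 16, 8, 4, 2, 1.
    After log2(WARP_SIZE) steps EVERY lane holds the warp-wide maximum,
    not just lane 0.
    """
    assert len(lanes) == WARP_SIZE
    offset = WARP_SIZE // 2
    while offset > 0:
        partner = shfl_xor(lanes, offset)
        lanes = [max(a, b) for a, b in zip(lanes, partner)]
        offset //= 2
    return lanes


if __name__ == "__main__":
    # Deterministic per-lane values (a permutation of 0..31, shifted by -7).
    vals = [float((i * 13) % WARP_SIZE) - 7.0 for i in range(WARP_SIZE)]
    once = warp_reduce_max(vals)
    twice = warp_reduce_max(once)
    print(once[0], all(v == once[0] for v in once), twice == once)
```

Once every lane holds the same value, reducing again only compares equal values. The second pass therefore returns its input unchanged, which matches the commit's claim that the extra reduce can be dropped.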