llama.cpp/ggml
mahorozte e9e661bd59 CUDA: remove unnecessary warp reduce in FA (ggml/1032)
* kqmax_new_j is already the same in every thread within the warp after the operation at line 199, so this reduce can be omitted

* same problem in vec32

---------

Co-authored-by: ZhaoXiaoYu <zhao.xiaoyu@zte.com.cn>
2024-12-03 20:04:49 +02:00
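
The reasoning in the commit message can be illustrated with a small host-side model. This is a sketch, not the ggml kernel code: it assumes the operation at line 199 leaves every lane holding the warp-wide maximum, as a butterfly (XOR-shuffle) all-reduce does, which the commit message implies but does not spell out. The names `Warp`, `warp_all_reduce_max`, `all_lanes_equal`, and `make_example_warp` are hypothetical and chosen for illustration only.

```cpp
#include <algorithm>
#include <array>

constexpr int WARP_SIZE = 32;
using Warp = std::array<float, WARP_SIZE>;

// Host-side model of a butterfly max all-reduce across one warp.
// On the GPU each step would be a single XOR shuffle per lane; here the
// lanes are array slots and each step reads the partner slot lane ^ offset.
Warp warp_all_reduce_max(Warp lanes) {
    for (int offset = WARP_SIZE / 2; offset > 0; offset >>= 1) {
        Warp next = lanes;
        for (int lane = 0; lane < WARP_SIZE; ++lane) {
            next[lane] = std::max(lanes[lane], lanes[lane ^ offset]);
        }
        lanes = next;
    }
    return lanes;
}

// True if every lane holds the same value as lane 0.
bool all_lanes_equal(const Warp & lanes) {
    return std::all_of(lanes.begin(), lanes.end(),
                       [&](float v) { return v == lanes[0]; });
}

// Arbitrary per-lane inputs (illustrative values, not real attention scores).
Warp make_example_warp() {
    Warp lanes{};
    for (int lane = 0; lane < WARP_SIZE; ++lane) {
        lanes[lane] = static_cast<float>((lane * 7) % 13) - 3.0f;
    }
    return lanes;
}
```

Applying `warp_all_reduce_max` to the result of a previous `warp_all_reduce_max` returns the same array, because the maximum of identical values is that value. Under the stated assumption, this is why a second reduce on a value that is already uniform across the warp only costs shuffle instructions without changing the result.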
..
include ggml : move AMX to the CPU backend (#10570) 2024-11-29 21:54:58 +01:00
src CUDA: remove unnecessary warp reduce in FA (ggml/1032) 2024-12-03 20:04:49 +02:00
.gitignore vulkan : cmake integration (#8119) 2024-07-13 18:12:39 +02:00
CMakeLists.txt ggml : automatic selection of best CPU backend (#10606) 2024-12-01 16:12:41 +01:00