llama.cpp/ggml/src/ggml-vulkan
Jeff Bolz b3e585988f
vulkan: Optimize soft_max (#10301)
* vulkan: Optimize soft_max

Large soft_max could already saturate memory, but small/medium sizes were
pretty slow. The bulk of the gains for them comes from using a smaller
workgroup size, and making the workgroup size match the subgroup size also
makes the barriers much cheaper.

Cache some values in locals to avoid refetching/recomputing. And stamp
out a few "template instantiations" so smaller cases will fully unroll.

Add a missing early return for OOB rows. This happens when there are more
than 512 rows and the dispatch is 512 x H.

* vulkan: Further soft_max optimizations

Restore the workgroup size of 512 case, use it for >1024.

Use unrollable loops for more iteration counts.
2024-11-19 08:25:17 +01:00
..
vulkan-shaders vulkan: Optimize soft_max (#10301) 2024-11-19 08:25:17 +01:00
CMakeLists.txt ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-vulkan.cpp vulkan: Optimize soft_max (#10301) 2024-11-19 08:25:17 +01:00