llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-25 10:54:36 +00:00

Markus Tavenrath 7c5bfd57f8 Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (#8943)	2024-08-11 10:09:09 +02:00

* Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead.
  - Allocation overhead for the temporary std::vectors was easily detectable with a sampling profiler and simple to remove.
  - ggml_vk_sync_buffer introduces a full pipeline sync, which has a significant cost on the GPU side, sometimes larger than the actual kernel execution. Adding only barriers for shader reads/writes and transfers seems to be sufficient, judging from the code, which either launches compute kernels or copies tensors.
* Fix small typo

---------

Co-authored-by: 0cc4m <picard12@live.de>

cmake	llama : reorganize source code + improve CMake (#8006)	2024-06-26 18:33:02 +03:00
include	metal : add abort callback (ggml/905)	2024-08-08 13:19:30 +03:00
src	Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (#8943)	2024-08-11 10:09:09 +02:00
.gitignore	vulkan : cmake integration (#8119)	2024-07-13 18:12:39 +02:00
CMakeLists.txt	cann: update cmake (#8765)	2024-07-30 12:37:35 +02:00