llama.cpp/ggml/src/vulkan-shaders
Jeff Bolz 80dd7ff22f
vulkan: Optimize contiguous copies (#10254)

* tests: Fix memory bandwidth calculation for perf tests

Add a flops calculation for flash attention.

Add one GGML_OP_CPY perf test.

* vulkan: Optimize contiguous copies

Add a variant of the copy shader for when the tensors are contiguous. Avoid
the complex addressing calculations, and do four elements per invocation
to hide some other overhead.

Apply similar changes to the scale shader, since scale is always contiguous.
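
The difference between the generic and contiguous copy paths described above can be sketched in Python. This is an illustration only, not the shader code from the commit: the function names, the 4-D extent/stride tuples (`ne`, `nb`), and the way the four elements are grouped per invocation are all assumptions made for the example.

```python
def strided_index(i, ne, nb):
    """Map a flat element index to a storage offset for a 4-D tensor.

    ne: extents per dimension, nb: strides in elements (both hypothetical names).
    This is the kind of addressing work a generic copy does for every element.
    """
    i0 = i % ne[0]
    i1 = (i // ne[0]) % ne[1]
    i2 = (i // (ne[0] * ne[1])) % ne[2]
    i3 = i // (ne[0] * ne[1] * ne[2])
    return i0 * nb[0] + i1 * nb[1] + i2 * nb[2] + i3 * nb[3]


def copy_generic(src, dst, ne, src_nb, dst_nb):
    """One element per simulated invocation, full index decomposition each time."""
    n = ne[0] * ne[1] * ne[2] * ne[3]
    for i in range(n):
        dst[strided_index(i, ne, dst_nb)] = src[strided_index(i, ne, src_nb)]


def copy_contiguous(src, dst, n, per_invocation=4):
    """Contiguous case: the flat index is the offset, so no decomposition is needed.

    Each simulated invocation handles up to `per_invocation` elements; the exact
    grouping used by the real shader is not shown in the source and may differ.
    """
    invocations = (n + per_invocation - 1) // per_invocation
    for inv in range(invocations):
        base = inv * per_invocation
        for k in range(per_invocation):
            idx = base + k
            if idx < n:  # guard the tail when n is not a multiple of 4
                dst[idx] = src[idx]


ne = (5, 2, 1, 1)
contiguous_nb = (1, 5, 10, 10)  # strides of a densely packed tensor
src = list(range(10))
out_generic = [None] * 10
out_contig = [None] * 10
copy_generic(src, out_generic, ne, contiguous_nb, contiguous_nb)
copy_contiguous(src, out_contig, 10)
```

When both tensors are densely packed, the two functions produce the same result, but the contiguous version skips the per-element divisions and modulo operations and needs a quarter as many invocations.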

Add a "progress bar" for shader compiles.
2024-11-13 07:58:57 +01:00
acc.comp llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model. (#8984) 2024-08-20 21:00:00 +02:00
add.comp vulkan : implement Stable Diffusion operators (ggml/904) 2024-08-05 08:50:57 +03:00
argsort.comp vulkan : argsort barriers must be under uniform control flow (ggml/951) 2024-09-29 21:15:37 +03:00
clamp.comp vulkan: Optimize contiguous copies (#10254) 2024-11-13 07:58:57 +01:00
CMakeLists.txt cmake : Link vulkan-shaders-gen with pthreads (#8835) 2024-08-06 15:21:47 +02:00
concat.comp Vulkan Optimizations and Fixes (#8959) 2024-08-14 18:32:53 +02:00
contig_copy.comp vulkan: Optimize contiguous copies (#10254) 2024-11-13 07:58:57 +01:00
copy.comp vulkan: Optimize contiguous copies (#10254) 2024-11-13 07:58:57 +01:00
cos.comp vulkan: Optimize contiguous copies (#10254) 2024-11-13 07:58:57 +01:00
dequant_f32.comp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
dequant_funcs.comp Vulkan IQ4_NL Support (#8613) 2024-07-23 10:56:49 +02:00
dequant_head.comp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
dequant_iq4_nl.comp Vulkan IQ4_NL Support (#8613) 2024-07-23 10:56:49 +02:00
dequant_q2_k.comp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
dequant_q3_k.comp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
dequant_q4_0.comp Vulkan IQ4_NL Support (#8613) 2024-07-23 10:56:49 +02:00
dequant_q4_1.comp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
dequant_q4_k.comp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
dequant_q5_0.comp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
dequant_q5_1.comp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
dequant_q5_k.comp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
dequant_q6_k.comp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
dequant_q8_0.comp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
diag_mask_inf.comp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
div.comp vulkan : implement Stable Diffusion operators (ggml/904) 2024-08-05 08:50:57 +03:00
gelu_quick.comp vulkan : implement Stable Diffusion operators (ggml/904) 2024-08-05 08:50:57 +03:00
gelu.comp vulkan : implement Stable Diffusion operators (ggml/904) 2024-08-05 08:50:57 +03:00
generic_binary_head.comp vulkan : implement Stable Diffusion operators (ggml/904) 2024-08-05 08:50:57 +03:00
generic_head.comp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
generic_unary_head.comp vulkan: Optimize contiguous copies (#10254) 2024-11-13 07:58:57 +01:00
get_rows_quant.comp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
get_rows.comp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
group_norm.comp vulkan : implement Stable Diffusion operators (ggml/904) 2024-08-05 08:50:57 +03:00
im2col.comp vulkan : implement Stable Diffusion operators (ggml/904) 2024-08-05 08:50:57 +03:00
leaky_relu.comp vulkan : implement Stable Diffusion operators (ggml/904) 2024-08-05 08:50:57 +03:00
mul_mat_split_k_reduce.comp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
mul_mat_vec_base.comp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
mul_mat_vec_nc.comp Vulkan Optimizations and Fixes (#8959) 2024-08-14 18:32:53 +02:00
mul_mat_vec_p021.comp Vulkan Optimizations and Fixes (#8959) 2024-08-14 18:32:53 +02:00
mul_mat_vec_q2_k.comp Vulkan Optimizations and Fixes (#8959) 2024-08-14 18:32:53 +02:00
mul_mat_vec_q3_k.comp Vulkan Optimizations and Fixes (#8959) 2024-08-14 18:32:53 +02:00
mul_mat_vec_q4_k.comp Vulkan Optimizations and Fixes (#8959) 2024-08-14 18:32:53 +02:00
mul_mat_vec_q5_k.comp Vulkan Optimizations and Fixes (#8959) 2024-08-14 18:32:53 +02:00
mul_mat_vec_q6_k.comp Vulkan Optimizations and Fixes (#8959) 2024-08-14 18:32:53 +02:00
mul_mat_vec.comp Vulkan Optimizations and Fixes (#8959) 2024-08-14 18:32:53 +02:00
mul_mm.comp Vulkan Optimizations and Fixes (#8959) 2024-08-14 18:32:53 +02:00
mul.comp vulkan : implement Stable Diffusion operators (ggml/904) 2024-08-05 08:50:57 +03:00
norm.comp vulkan : implement Stable Diffusion operators (ggml/904) 2024-08-05 08:50:57 +03:00
pad.comp vulkan: Optimize contiguous copies (#10254) 2024-11-13 07:58:57 +01:00
pool2d.comp ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (#9763) 2024-10-29 09:52:56 +01:00
relu.comp vulkan : implement Stable Diffusion operators (ggml/904) 2024-08-05 08:50:57 +03:00
repeat.comp vulkan: Optimize contiguous copies (#10254) 2024-11-13 07:58:57 +01:00
rms_norm.comp vulkan : implement Stable Diffusion operators (ggml/904) 2024-08-05 08:50:57 +03:00
rope_head.comp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
rope_neox.comp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
rope_norm.comp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
scale.comp vulkan: Optimize contiguous copies (#10254) 2024-11-13 07:58:57 +01:00
silu.comp vulkan : implement Stable Diffusion operators (ggml/904) 2024-08-05 08:50:57 +03:00
sin.comp vulkan: Optimize contiguous copies (#10254) 2024-11-13 07:58:57 +01:00
soft_max.comp vulkan : implement Stable Diffusion operators (ggml/904) 2024-08-05 08:50:57 +03:00
square.comp vulkan: Optimize contiguous copies (#10254) 2024-11-13 07:58:57 +01:00
sum_rows.comp vulkan : implement Stable Diffusion operators (ggml/904) 2024-08-05 08:50:57 +03:00
tanh.comp vulkan : implement Stable Diffusion operators (ggml/904) 2024-08-05 08:50:57 +03:00
timestep_embedding.comp vulkan : implement Stable Diffusion operators (ggml/904) 2024-08-05 08:50:57 +03:00
types.comp vulkan : implement Stable Diffusion operators (ggml/904) 2024-08-05 08:50:57 +03:00
upscale.comp vulkan : implement Stable Diffusion operators (ggml/904) 2024-08-05 08:50:57 +03:00
vulkan-shaders-gen.cpp vulkan: Optimize contiguous copies (#10254) 2024-11-13 07:58:57 +01:00