llama.cpp/ggml/src/ggml-vulkan/vulkan-shaders
Eve 64ae065511
vulkan: small mul_mat_vec optimizations (#10665)
* double the number of rows per workgroup

* Update ggml-vulkan.cpp

* Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats

* only increase the number of rows for amd and subgroup size 64

* fix missing NUM_ROWS for mul_mat_vec_iq4_nl_f16_f32, untested

* use subgroup min and max to check for gcn (requires https://github.com/ggerganov/llama.cpp/pull/10721)

* manual merge ggml-vulkan.cpp

* set min and max subgroup size in any case

* Also double the number of rows for Intel GPUs
2024-12-13 09:42:04 +01:00
acc.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
add.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
argsort.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
clamp.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
CMakeLists.txt vulkan: Add VK_NV_cooperative_matrix2 support for mul_mat and flash attention (#10206) 2024-12-05 20:15:05 +01:00
concat.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
contig_copy.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
copy.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
cos.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
dequant_f32.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
dequant_funcs_cm2.comp vulkan: Add VK_NV_cooperative_matrix2 support for mul_mat and flash attention (#10206) 2024-12-05 20:15:05 +01:00
dequant_funcs.comp vulkan: small mul_mat_vec optimizations (#10665) 2024-12-13 09:42:04 +01:00
dequant_head.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
dequant_iq4_nl.comp vulkan: copy iq4_nl LUT into shared memory (#10409) 2024-11-20 08:40:18 +01:00
dequant_q2_k.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
dequant_q3_k.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
dequant_q4_0.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
dequant_q4_1.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
dequant_q4_k.comp Vulkan: Use improved q4_k and q5_k dequant code in dequant shaders (#10798) 2024-12-12 18:36:00 +01:00
dequant_q5_0.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
dequant_q5_1.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
dequant_q5_k.comp Vulkan: Use improved q4_k and q5_k dequant code in dequant shaders (#10798) 2024-12-12 18:36:00 +01:00
dequant_q6_k.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
dequant_q8_0.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
diag_mask_inf.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
div.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
flash_attn_cm2.comp vulkan: Add VK_NV_cooperative_matrix2 support for mul_mat and flash attention (#10206) 2024-12-05 20:15:05 +01:00
gelu_quick.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
gelu.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
generic_binary_head.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
generic_head.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
generic_unary_head.comp vulkan: Implement "fast divide" (mul+shift) for unary ops like copy (#10642) 2024-12-04 08:28:59 +01:00
get_rows_quant.comp vulkan: small mul_mat_vec optimizations (#10665) 2024-12-13 09:42:04 +01:00
get_rows.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
group_norm.comp vulkan: fix group_norm (#10496) 2024-11-26 16:45:05 +01:00
im2col.comp vulkan: request round-to-even for fp16 in im2col/rope_head (#10767) 2024-12-10 21:23:17 +01:00
leaky_relu.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
mul_mat_split_k_reduce.comp vulkan: optimize and reenable split_k (#10637) 2024-12-03 20:29:54 +01:00
mul_mat_vec_base.comp vulkan: dynamic subgroup size for the remaining k quants (#10745) 2024-12-10 20:33:23 +01:00
mul_mat_vec_nc.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
mul_mat_vec_p021.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
mul_mat_vec_q2_k.comp vulkan: dynamic subgroup size for the remaining k quants (#10745) 2024-12-10 20:33:23 +01:00
mul_mat_vec_q3_k.comp vulkan: dynamic subgroup size for the remaining k quants (#10745) 2024-12-10 20:33:23 +01:00
mul_mat_vec_q4_k.comp vulkan: dynamic subgroup size for the remaining k quants (#10745) 2024-12-10 20:33:23 +01:00
mul_mat_vec_q5_k.comp vulkan: dynamic subgroup size for the remaining k quants (#10745) 2024-12-10 20:33:23 +01:00
mul_mat_vec_q6_k.comp vulkan: Dynamic subgroup size support for Q6_K mat_vec (#10536) 2024-11-30 08:00:02 +01:00
mul_mat_vec.comp vulkan: small mul_mat_vec optimizations (#10665) 2024-12-13 09:42:04 +01:00
mul_mm_cm2.comp vulkan: Add VK_NV_cooperative_matrix2 support for mul_mat and flash attention (#10206) 2024-12-05 20:15:05 +01:00
mul_mm.comp Vulkan: VK_KHR_cooperative_matrix support to speed up prompt processing (#10597) 2024-12-07 10:24:15 +01:00
mul.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
norm.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
pad.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
pool2d.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
relu.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
repeat.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
rms_norm.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
rope_head.comp vulkan: request round-to-even for fp16 in im2col/rope_head (#10767) 2024-12-10 21:23:17 +01:00
rope_neox.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
rope_norm.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
scale.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
silu.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
sin.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
soft_max.comp vulkan: predicate max operation in soft_max shaders/soft_max (#10437) 2024-11-20 20:47:36 +01:00
square.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
sum_rows.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
tanh.comp Vulkan: fix NaN in tanh.comp with AMD proprietary driver on Windows (#10723) 2024-12-08 19:19:19 +01:00
test_coopmat2_support.comp vulkan: compile a test shader in cmake to check for coopmat2 support (#10713) 2024-12-08 09:05:55 +01:00
timestep_embedding.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
types.comp vulkan: define all quant data structures in types.comp (#10440) 2024-11-27 08:32:54 +01:00
upscale.comp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
vulkan-shaders-gen.cpp vulkan: request round-to-even for fp16 in im2col/rope_head (#10767) 2024-12-10 21:23:17 +01:00