llama.cpp/ggml/src
Jeff Bolz 80dd7ff22f
vulkan: Optimize contiguous copies (#10254)
* tests: Fix memory bandwidth calculation for perf tests

Add a flops calculation for flash attention.

Add one GGML_OP_CPY perf test.

* vulkan: Optimize contiguous copies

Add a variant of the copy shader for when the tensors are contiguous. Avoid
the complex addressing calculations, and do four elements per invocation
to hide some other overhead.

Apply similar changes to the scale shader, since scale is always contiguous.

Add a "progress bar" for shader compiles.
2024-11-13 07:58:57 +01:00
Name                 Last commit  Date
ggml-amx             add amx kernel for gemm (#8998)  2024-10-18 13:34:36 +08:00
ggml-cann            cann: fix crash when llama-bench is running on multiple cann devices (#9627)  2024-09-25 11:30:38 +08:00
ggml-cuda            ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL operator when ‘ne’ is small (#10213)  2024-11-09 08:35:46 +01:00
ggml-sycl            Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133)  2024-11-07 15:19:10 +08:00
kompute@4565194ed7   llama : reorganize source code + improve CMake (#8006)  2024-06-26 18:33:02 +03:00
kompute-shaders      kompute: add mul_mat_q4_k shader (#10097)  2024-10-31 11:09:52 +02:00
llamafile            ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156)  2024-11-09 09:17:50 +02:00
vulkan-shaders       vulkan: Optimize contiguous copies (#10254)  2024-11-13 07:58:57 +01:00
CMakeLists.txt       ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156)  2024-11-09 09:17:50 +02:00
ggml-aarch64.c       ggml : move CPU backend to a separate file (#10144)  2024-11-03 19:34:08 +01:00
ggml-aarch64.h       ggml : minor naming changes (#8433)  2024-07-12 10:46:02 +03:00
ggml-alloc.c         ggml-alloc : remove buffer_id from leaf_alloc (ggml/987)  2024-10-16 11:28:01 +03:00
ggml-amx.cpp         llama : refactor model loader with backend registry (#10026)  2024-10-30 02:01:23 +01:00
ggml-backend-impl.h  llama : refactor model loader with backend registry (#10026)  2024-10-30 02:01:23 +01:00
ggml-backend.cpp     ggml : move CPU backend to a separate file (#10144)  2024-11-03 19:34:08 +01:00
ggml-blas.cpp        llama : refactor model loader with backend registry (#10026)  2024-10-30 02:01:23 +01:00
ggml-cann.cpp        CANN: adjust backend registry refactor. (#10158)  2024-11-04 19:08:22 +08:00
ggml-common.h        ggml-quants : ternary packing for TriLMs and BitNet b1.58 (#8151)  2024-09-05 21:48:47 -04:00
ggml-cpu-impl.h      ggml : move common CPU backend impl to new header (#9509)  2024-09-16 16:22:07 +02:00
ggml-cpu.c           fix q4_0_8_8 format for corrupted tokens issue (#10198)  2024-11-07 09:02:08 +01:00
ggml-cuda.cu         metal : optimize FA kernels (#10171)  2024-11-08 13:47:22 +02:00
ggml-impl.h          ggml : move CPU backend to a separate file (#10144)  2024-11-03 19:34:08 +01:00
ggml-kompute.cpp     kompute: add mul_mat_q4_k shader (#10097)  2024-10-31 11:09:52 +02:00
ggml-metal.m         metal : fix build and some more comments (#10229)  2024-11-09 11:53:02 +02:00
ggml-metal.metal     metal : more precise Q*K in FA vec kernel (#10247)  2024-11-11 08:39:13 +02:00
ggml-quants.c        Q6_K AVX improvements (#10118)  2024-11-04 23:06:31 +01:00
ggml-quants.h        ggml : add run-time detection of neon, i8mm and sve (#9331)  2024-09-28 15:06:16 +03:00
ggml-rpc.cpp         ggml : move CPU backend to a separate file (#10144)  2024-11-03 19:34:08 +01:00
ggml-sycl.cpp        Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133)  2024-11-07 15:19:10 +08:00
ggml-vulkan.cpp      vulkan: Optimize contiguous copies (#10254)  2024-11-13 07:58:57 +01:00
ggml.c               metal : optimize FA kernels (#10171)  2024-11-08 13:47:22 +02:00