llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-30 21:34:36 +00:00

History

snadampal a07d0fee1f ggml : add mmla kernels for quantized GEMM (#4966 ) * ggml: aarch64: implement smmla kernel for q8_0_q8_0 quantized gemm armv8.2-a and above supports MMLA instructions that have higher throughput than DOT. this commit adds mmla kernel for q8_0_q8_0 gemm. The feature is enabled if the platform supports "__ARM_FEATURE_MATMUL_INT8" On AWS Graviton3 processors this kernel resulted up to 1.5x improvement for prompt evaluation throughput compared to the default sdot kernel. * ggml: aarch64: implement smmla kernel for q4_0_q8_0 quantized gemm armv8.2-a and above supports MMLA instructions that have higher throughput than DOT. this commit adds mmla kernel for q4_0_q8_0 gemm. The feature is enabled if the platform supports "__ARM_FEATURE_MATMUL_INT8" On AWS Graviton3 processors this kernel resulted up to 1.5x improvement for prompt evaluation throughput compared to the default sdot kernel. * ggml: aarch64: implement smmla kernel for q4_1_q8_1 quantized gemm armv8.2-a and above supports MMLA instructions that have higher throughput than DOT. this commit adds mmla kernel for q4_1_q8_1 gemm. The feature is enabled if the platform supports "__ARM_FEATURE_MATMUL_INT8" On AWS Graviton3 processors this kernel resulted up to 1.5x improvement for prompt evaluation throughput compared to the default sdot kernel. * ggml: update unit tests for the new vec_dot interface * llama.cpp: add MATMUL_INT8 capability to system_info		2024-02-11 15:22:33 +02:00
..
CMakeLists.txt	ggml : a faster version for Q4_1 x Q8_0 dot products (#1083 )	2023-04-21 18:18:26 +03:00
q8dot.cpp	ggml : add mmla kernels for quantized GEMM (#4966 )	2024-02-11 15:22:33 +02:00
vdot.cpp	ggml : add mmla kernels for quantized GEMM (#4966 )	2024-02-11 15:22:33 +02:00