llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-11-15 07:19:53 +00:00

Jeroen Mostert 46e47417aa Allow all RDNA2 archs to use sdot4 intrinsic (#8629)	2024-07-23 10:50:40 +02:00

The check gating the use of `__builtin_amdgcn_sdot4` specifically checks for gfx1030. This causes a severe performance regression for any gfx103? architecture that is not gfx1030 and is not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, so let's use it.
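
The gating change described in the commit message above can be sketched roughly as follows. This is an illustrative host-side C++ sketch, not the code from the repository: the macro `SKETCH_RDNA2`, the particular architecture macros listed behind it, and the helper names are assumptions made for the example, and the intrinsic is replaced by a portable reference loop so the snippet compiles without ROCm.

```cpp
#include <cstdint>

// Hypothetical generic define for this sketch: set when any of a few
// gfx103x targets is being compiled for. The list here is illustrative
// and is not claimed to match the define used in the ggml-cuda sources.
#if defined(__gfx1030__) || defined(__gfx1031__) || defined(__gfx1032__)
#define SKETCH_RDNA2 1
#endif

// Portable reference for a signed 4 x int8 dot product with accumulate,
// i.e. the operation an sdot4-style instruction performs in hardware.
static int dot4_reference(int a, int b, int c) {
    for (int i = 0; i < 4; ++i) {
        // Extract byte i of each packed operand and reinterpret it as signed.
        const int8_t ai = static_cast<int8_t>((static_cast<uint32_t>(a) >> (8 * i)) & 0xFF);
        const int8_t bi = static_cast<int8_t>((static_cast<uint32_t>(b) >> (8 * i)) & 0xFF);
        c += static_cast<int>(ai) * static_cast<int>(bi);
    }
    return c;
}

// Narrow gate: the fast path is chosen for one exact architecture only,
// so sibling gfx103x targets silently fall back to the slow path.
constexpr bool fast_path_narrow_gate() {
#if defined(__gfx1030__)
    return true;
#else
    return false;
#endif
}

// Generic gate: the fast path is chosen for every target covered by the
// family-wide define, which is the idea the commit message describes.
constexpr bool fast_path_generic_gate() {
#if defined(SKETCH_RDNA2)
    return true;
#else
    return false;
#endif
}
```

When compiled for, say, gfx1031, the narrow gate would evaluate to false while the generic gate would evaluate to true, which is the difference the commit addresses. On an ordinary host compiler both gates are false and only the reference loop is exercised.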
ggml-cann	[CANN] Add Ascend NPU backend (#6035)	2024-07-17 14:23:50 +03:00
ggml-cuda	Allow all RDNA2 archs to use sdot4 intrinsic (#8629)	2024-07-23 10:50:40 +02:00
ggml-sycl	[SYCL] fix scratch size of softmax (#8642)	2024-07-23 15:43:28 +08:00
kompute@4565194ed7	llama : reorganize source code + improve CMake (#8006)	2024-06-26 18:33:02 +03:00
kompute-shaders	llama : reorganize source code + improve CMake (#8006)	2024-06-26 18:33:02 +03:00
llamafile	ggml : move sgemm sources to llamafile subfolder (#8394)	2024-07-10 15:23:29 +03:00
vulkan-shaders	Vulkan MMQ Fix (#8479)	2024-07-15 09:38:52 +02:00
CMakeLists.txt	[CANN] Add Ascend NPU backend (#6035)	2024-07-17 14:23:50 +03:00
ggml-aarch64.c	ggml : suppress unknown pragma 'GCC' on windows (#8460)	2024-07-15 15:48:17 +03:00
ggml-aarch64.h	ggml : minor naming changes (#8433)	2024-07-12 10:46:02 +03:00
ggml-alloc.c	CUDA: fix partial offloading for ne0 % 256 != 0 (#8572)	2024-07-18 23:48:47 +02:00
ggml-backend-impl.h	llama : reorganize source code + improve CMake (#8006)	2024-06-26 18:33:02 +03:00
ggml-backend.c	CUDA: fix partial offloading for ne0 % 256 != 0 (#8572)	2024-07-18 23:48:47 +02:00
ggml-blas.cpp	ggml : add NVPL BLAS support (#8329) (#8425)	2024-07-11 18:49:15 +02:00
ggml-cann.cpp	[CANN] Add Ascend NPU backend (#6035)	2024-07-17 14:23:50 +03:00
ggml-common.h	ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780)	2024-07-10 15:14:51 +03:00
ggml-cuda.cu	CUDA: fix partial offloading for ne0 % 256 != 0 (#8572)	2024-07-18 23:48:47 +02:00
ggml-impl.h	ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780)	2024-07-10 15:14:51 +03:00
ggml-kompute.cpp	llama : reorganize source code + improve CMake (#8006)	2024-06-26 18:33:02 +03:00
ggml-metal.m	ggml : fix quant dot product with odd number of blocks (#8549)	2024-07-19 17:17:27 +02:00
ggml-metal.metal	ggml : fix quant dot product with odd number of blocks (#8549)	2024-07-19 17:17:27 +02:00
ggml-quants.c	ggml: fix compile error for RISC-V (#8623)	2024-07-22 10:56:45 +03:00
ggml-quants.h	ggml : minor naming changes (#8433)	2024-07-12 10:46:02 +03:00
ggml-rpc.cpp	llama : reorganize source code + improve CMake (#8006)	2024-06-26 18:33:02 +03:00
ggml-sycl.cpp	[SYCL] add concat through dim 1/2 (#8483)	2024-07-15 19:32:15 +08:00
ggml-vulkan.cpp	Vulkan MMQ Fix (#8479)	2024-07-15 09:38:52 +02:00
ggml.c	gguf : handle null name during init (#8587)	2024-07-20 17:15:42 +03:00