llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-13 04:00:16 +00:00

Author	SHA1	Message	Date
Francis Couture-Harpin	c51daefc32	llama : advanced batch splits This includes equal-sequence-length batch splits which are useful to simplify recurrent model operators. * llama : always make recurrent state slots contiguous * ggml : simplify mamba operators	2024-07-16 20:38:48 -04:00
AidanBeltonS	f619024764	[SYCL] Remove unneeded semicolons (#8280)	2024-07-04 09:07:19 +08:00
Daniele	d23287f122	Define and optimize RDNA1 (#8085)	2024-07-04 01:02:58 +02:00
Judd	f8d6a23804	fix typo (#8267) Co-authored-by: Judd <foldl@boxvest.com>	2024-07-03 14:40:16 +02:00
AidanBeltonS	fadde67135	Dequant improvements rebase (#8255) * Single load for half2 * Store scales in local mem * Vec load quantized values	2024-07-03 09:55:34 +08:00
Clint Herron	07a3fc0608	Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258)	2024-07-02 12:18:10 -04:00
slaren	0e0590adab	cuda : update supports_op for matrix multiplication (#8245)	2024-07-02 09:39:38 +03:00
luoyu-intel	a9f3b10215	[SYCL] Fix win build conflict of math library (#8230) * fix win build conflict of math library * fix the condition: !(win32 & SYCL) * revert warp_size=16	2024-07-02 12:50:07 +08:00
luoyu-intel	d08c20edde	[SYCL] Fix the sub group size of Intel (#8106) * use warp_size macro for all sycl kernels * fix mask of permute_sub_group_by_xor * fix rms_norm with correct warp number * fix rms_norm_f32/group_norm_f32 * move norm to norm.cpp file * fix quantize bug * fix mmvq's batch size	2024-07-02 10:16:00 +08:00
Johannes Gäßler	cb5fad4c6c	CUDA: refactor and optimize IQ MMVQ (#8215) * CUDA: refactor and optimize IQ MMVQ * uint -> uint32_t * __dp4a -> ggml_cuda_dp4a * remove MIN_CC_DP4A checks * change default * try CI fix	2024-07-01 20:39:06 +02:00
zhentaoyu	197fe6c1d7	[SYCL] Update SYCL-Rope op and Refactor (#8157) * align with rope.cu and move sycl-op to a single file	2024-07-01 19:39:06 +08:00
Johannes Gäßler	85a267daaa	CUDA: fix MMQ stream-k for --split-mode row (#8167)	2024-06-27 16:26:05 +02:00
slaren	31ec3993f6	ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (#8140)	2024-06-26 21:34:14 +02:00
Georgi Gerganov	f3f65429c4	llama : reorganize source code + improve CMake (#8006) * scripts : update sync [no ci] * files : relocate [no ci] * ci : disable kompute build [no ci] * cmake : fixes [no ci] * server : fix mingw build ggml-ci * cmake : minor [no ci] * cmake : link math library [no ci] * cmake : build normal ggml library (not object library) [no ci] * cmake : fix kompute build ggml-ci * make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE ggml-ci * move public backend headers to the public include directory (#8122) * move public backend headers to the public include directory * nix test * spm : fix metal header --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * scripts : fix sync paths [no ci] * scripts : sync ggml-blas.h [no ci] --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-26 18:33:02 +03:00

14 Commits