llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-11-14 23:09:53 +00:00

Author	SHA1	Message	Date
compilade	3fd62a6b1c	py : type-check all Python scripts with Pyright (#8341) * py : type-check all Python scripts with Pyright * server-tests : use trailing slash in openai base_url * server-tests : add more type annotations * server-tests : strip "chat" from base_url in oai_chat_completions * server-tests : model metadata is a dict * ci : disable pip cache in type-check workflow The cache is not shared between branches, and it's 250MB in size, so it would become quite a big part of the 10GB cache limit of the repo. * py : fix new type errors from master branch * tests : fix test-tokenizer-random.py Apparently, gcc applies optimisations even when pre-processing, which confuses pycparser. * ci : only show warnings and errors in python type-check The "information" level otherwise has entries from 'examples/pydantic_models_to_grammar.py', which could be confusing for someone trying to figure out what failed, considering that these messages can safely be ignored even though they look like errors.	2024-07-07 15:04:39 -04:00
Natsu	1d894a790e	cmake : add GGML_BUILD and GGML_SHARED macro definitions (#8281)	2024-07-05 17:29:35 +03:00
Ouadie EL FAROUKI	1f3e1b66e2	Enabled more data types for oneMKL gemm_batch (#8236)	2024-07-05 13:23:25 +01:00
Johannes Gäßler	8e558309dc	CUDA: MMQ support for iq4_nl, iq4_xs (#8278)	2024-07-05 09:06:31 +02:00
Daniele	0a423800ff	CUDA: revert part of the RDNA1 optimizations (#8309) The change on the launch_bounds was causing a small performance drop in perplexity of 25 t/s	2024-07-05 09:06:09 +02:00
Johannes Gäßler	bcefa03bc0	CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (#8311)	2024-07-05 09:05:34 +02:00
luoyu-intel	a9554e20b6	[SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266) * fix group_norm ut * split softmax * fix softmax * add concat support condition * revert debug code * move QK_WARP_SIZE to presets.hpp	2024-07-05 13:06:13 +08:00
Neo Zhang Jianyu	f09b7cb609	rm get_work_group_size() by local cache for performance (#8286) Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-07-05 10:32:29 +08:00
AidanBeltonS	f619024764	[SYCL] Remove unneeded semicolons (#8280)	2024-07-04 09:07:19 +08:00
Daniele	d23287f122	Define and optimize RDNA1 (#8085)	2024-07-04 01:02:58 +02:00
Judd	f8d6a23804	fix typo (#8267) Co-authored-by: Judd <foldl@boxvest.com>	2024-07-03 14:40:16 +02:00
AidanBeltonS	fadde67135	Dequant improvements rebase (#8255) * Single load for half2 * Store scales in local mem * Vec load quantized values	2024-07-03 09:55:34 +08:00
Clint Herron	07a3fc0608	Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258)	2024-07-02 12:18:10 -04:00
slaren	0e0590adab	cuda : update supports_op for matrix multiplication (#8245)	2024-07-02 09:39:38 +03:00
luoyu-intel	a9f3b10215	[SYCL] Fix win build conflict of math library (#8230) * fix win build conflict of math library * fix the condition: !(win32 & SYCL) * revert warp_size=16	2024-07-02 12:50:07 +08:00
luoyu-intel	d08c20edde	[SYCL] Fix the sub group size of Intel (#8106) * use warp_size macro for all sycl kernels * fix mask of permute_sub_group_by_xor * fix rms_norm with correct warp number * fix rms_norm_f32/group_norm_f32 * move norm to norm.cpp file * fix quantize bug * fix mmvq's batch size	2024-07-02 10:16:00 +08:00
Johannes Gäßler	cb5fad4c6c	CUDA: refactor and optimize IQ MMVQ (#8215) * CUDA: refactor and optimize IQ MMVQ * uint -> uint32_t * __dp4a -> ggml_cuda_dp4a * remove MIN_CC_DP4A checks * change default * try CI fix	2024-07-01 20:39:06 +02:00
zhentaoyu	197fe6c1d7	[SYCL] Update SYCL-Rope op and Refactor (#8157) * align with rope.cu and move sycl-op to a single file	2024-07-01 19:39:06 +08:00
Johannes Gäßler	85a267daaa	CUDA: fix MMQ stream-k for --split-mode row (#8167)	2024-06-27 16:26:05 +02:00
slaren	31ec3993f6	ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (#8140)	2024-06-26 21:34:14 +02:00
Georgi Gerganov	f3f65429c4	llama : reorganize source code + improve CMake (#8006) * scripts : update sync [no ci] * files : relocate [no ci] * ci : disable kompute build [no ci] * cmake : fixes [no ci] * server : fix mingw build ggml-ci * cmake : minor [no ci] * cmake : link math library [no ci] * cmake : build normal ggml library (not object library) [no ci] * cmake : fix kompute build ggml-ci * make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE ggml-ci * move public backend headers to the public include directory (#8122) * move public backend headers to the public include directory * nix test * spm : fix metal header --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * scripts : fix sync paths [no ci] * scripts : sync ggml-blas.h [no ci] --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-26 18:33:02 +03:00

271 Commits