llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-31 22:04:35 +00:00

Author	SHA1	Message	Date
Francis Couture-Harpin	dd3e62a703	ggml : add some informative comments in q1_3 vec_dot	2024-07-28 21:17:16 -04:00
Francis Couture-Harpin	8fbd59308b	ggml-quants : attempt to fix Arm 32-bit support	2024-06-28 22:52:57 -04:00
Francis Couture-Harpin	ec50944bf6	ggml-quants : fix build failure on Windows	2024-06-28 20:41:13 -04:00
Francis Couture-Harpin	bfd2f21fb4	bitnet : replace 1.58b with b1.58, as in the paper	2024-06-28 20:38:12 -04:00
Francis Couture-Harpin	89dc3b254c	ggml-quants : use ceiling division when quantizing q1_3	2024-06-27 02:06:28 -04:00
Francis Couture-Harpin	9465ec6e12	ggml-quants : ARM NEON vec_dot for q2_2 and q1_3	2024-06-27 02:06:28 -04:00
Francis Couture-Harpin	638ad52f87	ggml-quants : cleanup Q1_3 code formatting	2024-06-27 02:06:28 -04:00
Francis Couture-Harpin	ef1e345c85	ggml-quants : Q2_2 now faster than Q4_K with AVX2	2024-06-27 02:06:28 -04:00
Francis Couture-Harpin	48b73b8498	ggml-quants : subtract 1 when back in epi8 This makes the 1.625 bpw type go faster than q4_0. Still not the fastest.	2024-06-27 02:06:28 -04:00
Francis Couture-Harpin	7ef4254a92	ggml-quants : faster 1.625 bpw AVX2 vec_dot Not using a lookup table anymore makes it match q4_0 speed. * gguf-py : fix formatting * llama : remove spaces on empty line	2024-06-27 02:06:28 -04:00
Francis Couture-Harpin	bd807499f7	ggml-quants : 1.625 bpw ternary packing for BitNet 1.58b	2024-06-27 02:06:22 -04:00
slaren	31ec3993f6	ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (#8140)	2024-06-26 21:34:14 +02:00
Georgi Gerganov	f3f65429c4	llama : reorganize source code + improve CMake (#8006) * scripts : update sync [no ci] * files : relocate [no ci] * ci : disable kompute build [no ci] * cmake : fixes [no ci] * server : fix mingw build ggml-ci * cmake : minor [no ci] * cmake : link math library [no ci] * cmake : build normal ggml library (not object library) [no ci] * cmake : fix kompute build ggml-ci * make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE ggml-ci * move public backend headers to the public include directory (#8122) * move public backend headers to the public include directory * nix test * spm : fix metal header --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * scripts : fix sync paths [no ci] * scripts : sync ggml-blas.h [no ci] --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-26 18:33:02 +03:00

13 Commits