Francis Couture-Harpin | 638ad52f87 | 2024-06-27 02:06:28 -04:00
ggml-quants : cleanup Q1_3 code formatting

Francis Couture-Harpin | ef1e345c85 | 2024-06-27 02:06:28 -04:00
ggml-quants : Q2_2 now faster than Q4_K with AVX2

Francis Couture-Harpin | 48b73b8498 | 2024-06-27 02:06:28 -04:00
ggml-quants : subtract 1 when back in epi8
This makes the 1.625 bpw type go faster than q4_0. Still not the fastest.
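
The "subtract 1 when back in epi8" title suggests the ternary digits are stored unsigned as {0, 1, 2} in the packed block and shifted back to signed {-1, 0, +1} once expanded to 8-bit integers. A minimal scalar sketch of that step, assuming the weights are already unpacked to {0, 1, 2} (q1_3_dot_ref is a hypothetical name, not the actual AVX2 kernel from this commit):

    #include <stdint.h>

    // Reference sketch: weights arrive unpacked as unsigned {0,1,2};
    // subtracting 1 in 8-bit (epi8) space recovers signed {-1,0,+1}
    // before the multiply-accumulate.
    static int32_t q1_3_dot_ref(const uint8_t * unpacked, const int8_t * x, int n) {
        int32_t sum = 0;
        for (int i = 0; i < n; ++i) {
            const int8_t w = (int8_t) unpacked[i] - 1; // {0,1,2} -> {-1,0,+1}
            sum += (int32_t) w * x[i];
        }
        return sum;
    }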

Francis Couture-Harpin | 7ef4254a92 | 2024-06-27 02:06:28 -04:00
ggml-quants : faster 1.625 bpw AVX2 vec_dot
Not using a lookup table anymore makes it match q4_0 speed.
* gguf-py : fix formatting
* llama : remove spaces on empty line

Francis Couture-Harpin | bd807499f7 | 2024-06-27 02:06:22 -04:00
ggml-quants : 1.625 bpw ternary packing for BitNet 1.58b
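
1.625 bits per weight is 13 bits per 8 weights, i.e. 13 bytes per 64 ternary weights. One way to reach that density is base-3 packing: since 3^5 = 243 fits in a byte, five ternary values can share one byte, so 64 weights need 12 such bytes plus one byte for the remaining four. The sketch below only illustrates that idea; it is not the actual Q1_3 block layout from this commit, and pack5_ternary/unpack5_ternary are assumed names:

    #include <stdint.h>

    // Illustrative base-3 packing of five ternary weights {-1,0,+1} per byte.
    static uint8_t pack5_ternary(const int8_t w[5]) {
        uint8_t packed = 0;
        for (int i = 0; i < 5; ++i) {
            packed = packed * 3 + (uint8_t)(w[i] + 1); // map {-1,0,+1} -> {0,1,2}
        }
        return packed; // at most 3^5 - 1 = 242, fits in 8 bits
    }

    static void unpack5_ternary(uint8_t packed, int8_t w[5]) {
        for (int i = 4; i >= 0; --i) {
            w[i] = (int8_t)(packed % 3) - 1; // map {0,1,2} back to {-1,0,+1}
            packed /= 3;
        }
    }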

slaren | 31ec3993f6 | 2024-06-26 21:34:14 +02:00
ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (#8140)

Georgi Gerganov | f3f65429c4 | 2024-06-26 18:33:02 +03:00
llama : reorganize source code + improve CMake (#8006)
* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build
ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build
ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE
ggml-ci
* move public backend headers to the public include directory (#8122)
* move public backend headers to the public include directory
* nix test
* spm : fix metal header
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]
---------
Co-authored-by: slaren <slarengh@gmail.com>