llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-27 03:44:35 +00:00

Author	SHA1	Message	Date
Matvey Soloviev	904d2a8d6a	Q4_1 quantization (#193 ) * Add AVX2 version of ggml_vec_dot_q4_1 * Small optimisations to q4_1 dot product (@Const-me) * Rearrange Q4_1 quantization to work for multipart models. (Fix #152) * Fix ggml_vec_mad_q4_1 too * Fix non-vectorised q4_1 vec mul	2023-03-17 06:48:39 +02:00
Nebula	9b4a15b17d	Fix RMS norm in GGML (#191 )	2023-03-15 19:29:25 -04:00
hoangmit	6eac39ba95	Add RMS norm and use it (#187 ) * add ggml_rms_norm * update op num	2023-03-16 00:41:38 +02:00
hoangmit	113e685d18	inline -> static inline for "bytesFromNibbles" (#161 ) Without "static" prefix, it fails to compile in clang	2023-03-15 21:05:14 +02:00
Ronsor	47857e564c	Don't use vdotq_s32 if it's not available (#139 ) * Don't use vdotq_s32 if it's not available `dotprod` extensions aren't available on some ARM CPUs (e.g. Raspberry Pi 4), so check for them and only use them if they're available. Reintroduces the code removed in `84d9015` if `__ARM_FEATURE_DOTPROD` isn't defined. * Update ggml.c --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-14 21:34:37 +02:00
Thomas Klausner	41be0a3b3d	Add NetBSD support. (#90 )	2023-03-13 18:40:54 +02:00
Georgi Gerganov	84d9015c4a	Use vdotq_s32 to improve performance (#67 ) * 10% performance boost on ARM * Back to original change	2023-03-13 18:36:44 +02:00
Georgi Gerganov	c80e2a8f2a	Revert "10% performance boost on ARM" This reverts commit `113a9e83eb`. There are some reports for illegal instruction. Moved this stuff to vdotq_s32 branch until resolve	2023-03-13 01:28:08 +02:00
Georgi Gerganov	54a0e66ea0	Check for vdotq_s32 availability	2023-03-13 01:21:03 +02:00
Georgi Gerganov	543c57e991	Ammend to previous commit - forgot to update non-QRDMX branch	2023-03-13 01:05:24 +02:00
Georgi Gerganov	113a9e83eb	10% performance boost on ARM	2023-03-13 00:56:10 +02:00
Sebastián A	eb062bb012	Windows fixes (#31 ) * Apply fixes suggested to build on windows Issue: https://github.com/ggerganov/llama.cpp/issues/22 * Remove unsupported VLAs * MSVC: Remove features that are only available on MSVC C++20. * Fix zero initialization of the other fields. * Change the use of vector for stack allocations.	2023-03-12 22:15:00 +02:00
Georgi Gerganov	f1eaff4721	Add AVX2 support for x86 architectures thanks to @Const-me !	2023-03-11 18:04:25 +02:00
Georgi Gerganov	007a8f6f45	Support all LLaMA models + change Q4_0 quantization storage	2023-03-11 11:28:30 +02:00
Georgi Gerganov	26c0846629	Initial release	2023-03-10 20:56:40 +02:00

1 2 3 4 5

215 Commits