mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-09-23 21:46:20 +00:00

quantize

TODO

Llama 2 7B

Quantization	Bits per Weight (BPW)
Q2_K	3.35
Q3_K_S	3.50
Q3_K_M	3.91
Q3_K_L	4.27
Q4_K_S	4.58
Q4_K_M	4.84
Q5_K_S	5.52
Q5_K_M	5.68
Q6_K	6.56

Llama 2 13B

Quantization	Bits per Weight (BPW)
Q2_K	3.34
Q3_K_S	3.48
Q3_K_M	3.89
Q3_K_L	4.26
Q4_K_S	4.56
Q4_K_M	4.83
Q5_K_S	5.51
Q5_K_M	5.67
Q6_K	6.56

Llama 2 70B

Quantization	Bits per Weight (BPW)
Q2_K	3.40
Q3_K_S	3.47
Q3_K_M	3.85
Q3_K_L	4.19
Q4_K_S	4.53
Q4_K_M	4.80
Q5_K_S	5.50
Q5_K_M	5.65
Q6_K	6.56
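
The BPW figures above can be turned into a rough on-disk size estimate with `size_bytes ≈ n_params × BPW / 8`. The sketch below is illustrative only: the helper name `estimate_size_bytes` and the nominal parameter count of 7e9 are assumptions made for this example, not values taken from the quantize tool, and real model files also carry metadata, so actual sizes will differ somewhat.

```python
# Rough size estimate from bits per weight (BPW).
# BPW values are copied from the Llama 2 7B table above.
BPW_LLAMA2_7B = {
    "Q2_K": 3.35,
    "Q3_K_S": 3.50,
    "Q3_K_M": 3.91,
    "Q3_K_L": 4.27,
    "Q4_K_S": 4.58,
    "Q4_K_M": 4.84,
    "Q5_K_S": 5.52,
    "Q5_K_M": 5.68,
    "Q6_K": 6.56,
}

# Assumption: nominal parameter count for a "7B" model.
# The exact count of the real model is not stated in this README.
N_PARAMS_NOMINAL = 7e9


def estimate_size_bytes(n_params: float, bpw: float) -> float:
    """Approximate tensor data size in bytes: parameters * bits per weight / 8."""
    return n_params * bpw / 8.0


if __name__ == "__main__":
    for name, bpw in BPW_LLAMA2_7B.items():
        gib = estimate_size_bytes(N_PARAMS_NOMINAL, bpw) / 2**30
        # Ratio relative to 16-bit weights, for a quick sense of the compression.
        ratio = 16.0 / bpw
        print(f"{name:7s} {bpw:5.2f} bpw  ~{gib:5.2f} GiB  ~{ratio:4.2f}x smaller than 16-bit")
```

The same helper works for the 13B and 70B tables by swapping in their BPW values and a matching parameter count.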