
quantize

You can also use the GGUF-my-repo space on Hugging Face to build your own quants without any setup.

Note: The space is synced from llama.cpp main every 6 hours.

Llama 2 7B

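| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.35                  |
| Q3_K_S       | 3.50                  |
| Q3_K_M       | 3.91                  |
| Q3_K_L       | 4.27                  |
| Q4_K_S       | 4.58                  |
| Q4_K_M       | 4.84                  |
| Q5_K_S       | 5.52                  |
| Q5_K_M       | 5.68                  |
| Q6_K         | 6.56                  |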

Llama 2 13B

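| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.34                  |
| Q3_K_S       | 3.48                  |
| Q3_K_M       | 3.89                  |
| Q3_K_L       | 4.26                  |
| Q4_K_S       | 4.56                  |
| Q4_K_M       | 4.83                  |
| Q5_K_S       | 5.51                  |
| Q5_K_M       | 5.67                  |
| Q6_K         | 6.56                  |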

Llama 2 70B

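| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.40                  |
| Q3_K_S       | 3.47                  |
| Q3_K_M       | 3.85                  |
| Q3_K_L       | 4.19                  |
| Q4_K_S       | 4.53                  |
| Q4_K_M       | 4.80                  |
| Q5_K_S       | 5.50                  |
| Q5_K_M       | 5.65                  |
| Q6_K         | 6.56                  |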
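
The BPW figures above can be turned into a rough size estimate with simple arithmetic: parameters × bits per weight ÷ 8 gives bytes. The sketch below is not part of llama.cpp. The helper names (`estimate_size_gib`, `compression_vs_f16`) are hypothetical, and the 7B parameter count is an approximate assumption. Real GGUF files also carry metadata, so actual sizes will likely differ somewhat.

```python
# Bits-per-weight values copied from the Llama 2 7B table above.
BPW_LLAMA2_7B = {
    "Q2_K": 3.35,
    "Q3_K_S": 3.50,
    "Q3_K_M": 3.91,
    "Q3_K_L": 4.27,
    "Q4_K_S": 4.58,
    "Q4_K_M": 4.84,
    "Q5_K_S": 5.52,
    "Q5_K_M": 5.68,
    "Q6_K": 6.56,
}

# Assumption: approximate parameter count for Llama 2 7B (not stated in this README).
N_PARAMS_7B = 6.74e9


def estimate_size_gib(n_params: float, bpw: float) -> float:
    """Approximate weight storage in GiB: params * bits per weight / 8 bits per byte."""
    return n_params * bpw / 8 / 2**30


def compression_vs_f16(bpw: float) -> float:
    """How many times smaller the weights are than a 16-bit-per-weight baseline."""
    return 16.0 / bpw


if __name__ == "__main__":
    for name, bpw in BPW_LLAMA2_7B.items():
        size = estimate_size_gib(N_PARAMS_7B, bpw)
        ratio = compression_vs_f16(bpw)
        print(f"{name:8s} {bpw:5.2f} BPW  ~{size:5.2f} GiB  ~{ratio:.2f}x smaller than F16")
```

Because the BPW values are nearly identical across the 7B, 13B, and 70B tables, the same estimate should scale roughly linearly with parameter count.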