# quantize
You can also use the GGUF-my-repo space on Hugging Face to build your own quants without any setup.
Note: it is synced from llama.cpp `main` every 6 hours.
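For orientation before the tables below, a local quantization run typically converts a model to GGUF and then quantizes it to one of the K-quant types listed. This is a minimal sketch: `llama-quantize` is the binary name in current llama.cpp builds, but the model paths here are hypothetical placeholders.

```bash
# Convert the original model to an F16 GGUF first (e.g. with
# convert_hf_to_gguf.py from this repo), then quantize it down.
# Usage: llama-quantize <input.gguf> <output.gguf> <type>
./llama-quantize ./models/mymodel/ggml-model-f16.gguf \
                 ./models/mymodel/ggml-model-Q4_K_M.gguf \
                 Q4_K_M
```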
## Llama 2 7B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.35                  |
| Q3_K_S       | 3.50                  |
| Q3_K_M       | 3.91                  |
| Q3_K_L       | 4.27                  |
| Q4_K_S       | 4.58                  |
| Q4_K_M       | 4.84                  |
| Q5_K_S       | 5.52                  |
| Q5_K_M       | 5.68                  |
| Q6_K         | 6.56                  |
## Llama 2 13B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.34                  |
| Q3_K_S       | 3.48                  |
| Q3_K_M       | 3.89                  |
| Q3_K_L       | 4.26                  |
| Q4_K_S       | 4.56                  |
| Q4_K_M       | 4.83                  |
| Q5_K_S       | 5.51                  |
| Q5_K_M       | 5.67                  |
| Q6_K         | 6.56                  |
## Llama 2 70B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.40                  |
| Q3_K_S       | 3.47                  |
| Q3_K_M       | 3.85                  |
| Q3_K_L       | 4.19                  |
| Q4_K_S       | 4.53                  |
| Q4_K_M       | 4.80                  |
| Q5_K_S       | 5.50                  |
| Q5_K_M       | 5.65                  |
| Q6_K         | 6.56                  |
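To read these tables in practical terms: BPW times the parameter count gives an approximate on-disk size. A rough back-of-the-envelope check, using the nominal 7e9 parameter count for Llama 2 7B (the actual count is closer to 6.7B, so real files come out slightly smaller):

```bash
# Approximate file size in GB = params (billions) * BPW / 8 bits-per-byte.
# Example: Llama 2 7B at Q4_K_M (4.84 BPW).
echo "scale=2; 7 * 4.84 / 8" | bc   # prints 4.23 -> roughly 4.2 GB
```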