mirror of https://github.com/ggerganov/llama.cpp.git (synced 2025-01-12 11:40:17 +00:00)

Added perplexity metrics for llama 3.1 with different quantization settings

commit 924c832461 (parent ebd541a570)
@@ -169,6 +169,29 @@ Results were calculated with LLaMA 3 8b BF16 as `--kl-divergence-base` and LLaMA
| RMS Δp | 0.150 ± 0.001 % |
| Same top p | 99.739 ± 0.013 % |

## LLaMA 3.1 BF16 Scoreboard

| Revision | b3472 |
|:---------|:-------------------|
| Backend | CUDA |
| CPU | AMD Epyc 7R13 |
| GPU | 1x NVIDIA L4 |

| Quantization | imatrix | PPL |
|--------------|---------|---------------------|
| bf16 | None | 6.4006 +/- 0.03938 |
| fp16 | None | 6.4016 +/- 0.03939 |
| q8_0 | None | 6.4070 +/- 0.03941 |
| q6_K | None | 6.4231 +/- 0.03957 |
| q5_K_M | None | 6.4623 +/- 0.03987 |
| q5_K_S | None | 6.5161 +/- 0.04028 |
| q4_K_M | None | 6.5837 +/- 0.04068 |
| q4_K_S | None | 6.6751 +/- 0.04125 |
| q3_K_L | None | 6.9458 +/- 0.04329 |
| q3_K_M | None | 7.0488 +/- 0.04384 |
| q3_K_S | None | 7.8823 +/- 0.04920 |
| q2_K | None | 9.7262 +/- 0.06393 |
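The PPL column above is perplexity: the exponential of the mean negative log-likelihood per token over the evaluation text (lower is better, so heavier quantization like q2_K drifts further from the bf16 baseline). As a rough illustration of the metric's definition only, not of the repository's actual implementation, a minimal sketch in Python:

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities.

    PPL = exp( -(1/N) * sum(log p_i) ), i.e. exp of the mean
    negative log-likelihood per token.
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# If the model assigned every token a uniform probability of 1/8,
# the perplexity is 8 (the model is "as confused as" an 8-way choice).
print(perplexity([math.log(1 / 8)] * 4))
```

A model that always assigned probability 1 to the observed token would score a perplexity of exactly 1; the bf16 score of about 6.4 here means the model is, on average, about as uncertain as a uniform choice over 6.4 tokens.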

## Old Numbers

<details>