Added perplexity metrics for LLaMA 3.1 with different quantization settings

fedric95 2024-08-08 10:55:33 +02:00 committed by GitHub
parent ebd541a570
commit 924c832461

@@ -169,6 +169,29 @@ Results were calculated with LLaMA 3 8b BF16 as `--kl-divergence-base` and LLaMA
| RMS Δp | 0.150 ± 0.001 % |
| Same top p | 99.739 ± 0.013 % |
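
For context, statistics like the ones above come from a two-pass run of llama.cpp's perplexity tool; a minimal sketch, with placeholder model and dataset paths:

```sh
# Pass 1: record the base (BF16) model's logits over the test set
./llama-perplexity -m llama-3-8b-bf16.gguf -f wiki.test.raw \
    --kl-divergence-base logits-bf16.kld

# Pass 2: score a quantized model against the recorded logits
# (the tokenized data is read back from the .kld file)
./llama-perplexity -m llama-3-8b-q4_K_M.gguf \
    --kl-divergence-base logits-bf16.kld --kl-divergence
```
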
## LLaMA 3.1 BF16 Scoreboard
| Revision | b3472 |
|:---------|:-------------------|
| Backend | CUDA |
| CPU | AMD Epyc 7R13 |
| GPU | 1x NVIDIA L4 |

| Quantization | imatrix | PPL |
|--------------|---------|---------------------|
| bf16 | None | 6.4006 ± 0.03938 |
| fp16 | None | 6.4016 ± 0.03939 |
| q8_0 | None | 6.4070 ± 0.03941 |
| q6_K | None | 6.4231 ± 0.03957 |
| q5_K_M | None | 6.4623 ± 0.03987 |
| q5_K_S | None | 6.5161 ± 0.04028 |
| q4_K_M | None | 6.5837 ± 0.04068 |
| q4_K_S | None | 6.6751 ± 0.04125 |
| q3_K_L | None | 6.9458 ± 0.04329 |
| q3_K_M | None | 7.0488 ± 0.04384 |
| q3_K_S | None | 7.8823 ± 0.04920 |
| q2_K | None | 9.7262 ± 0.06393 |
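
Each PPL row can be reproduced with a quantize-then-measure run; a sketch assuming a llama.cpp checkout at the listed revision, with placeholder GGUF and dataset paths:

```sh
# Quantize the BF16 reference to the target format (q4_K_M shown)
./llama-quantize llama-3.1-8b-bf16.gguf llama-3.1-8b-q4_K_M.gguf q4_K_M

# Compute perplexity over the test set; the tool prints the table's
# value as "Final estimate: PPL = ... +/- ..."
./llama-perplexity -m llama-3.1-8b-q4_K_M.gguf -f wiki.test.raw
```
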
## Old Numbers

<details>