mirror of
https://github.com/ggerganov/llama.cpp.git
synced 2024-12-24 02:14:35 +00:00
perplexity: add BF16 vs. FP16 results (#7150)
This commit is contained in:
parent
948f4ec7c5
commit
1c570d8bee
@ -7,6 +7,8 @@ Also note that finetunes typically result in a higher perplexity value even thou
|
||||
|
||||
Within llama.cpp the perplexity of base models is used primarily to judge the quality loss from e.g. quantized models vs. FP16.
|
||||
The convention among contributors is to use the Wikitext-2 test set for testing unless noted otherwise (can be obtained with `scripts/get-wikitext-2.sh`).
|
||||
When numbers are listed all command line arguments and compilation options are left at their defaults unless noted otherwise.
|
||||
llama.cpp numbers are **not** directly comparable to those of other projects because the exact values depend strongly on the implementation details.
|
||||
|
||||
By default only the mean perplexity value and the corresponding uncertainty is calculated.
|
||||
The uncertainty is determined empirically by assuming a Gaussian distribution of the "correct" logits per and then applying error propagation.
|
||||
@ -32,7 +34,13 @@ In addition to the KL divergence the following statistics are calculated with `-
|
||||
|
||||
## LLaMA 3 8b Scoreboard
|
||||
|
||||
Results are sorted by Kullback-Leibler divergence relative to FP16.
|
||||
| Revision | f364eb6f |
|
||||
|:---------|:-------------------|
|
||||
| Backend | CUDA |
|
||||
| CPU | AMD Epyc 7742 |
|
||||
| GPU | 1x NVIDIA RTX 4090 |
|
||||
|
||||
Results were generated using the CUDA backend and are sorted by Kullback-Leibler divergence relative to FP16.
|
||||
The "WT" importance matrices were created using varying numbers of Wikitext tokens and can be found [here](https://huggingface.co/JohannesGaessler/llama.cpp_importance_matrices/blob/main/imatrix-llama_3-8b-f16-2.7m_tokens.dat).
|
||||
|
||||
| Quantization | imatrix | Model size [GiB] | PPL | ΔPPL | KLD | Mean Δp | RMS Δp |
|
||||
@ -89,6 +97,12 @@ K-quants score better on mean Δp than the legacy quants than e.g. KL divergence
|
||||
|
||||
## LLaMA 2 vs. LLaMA 3 Quantization comparison
|
||||
|
||||
| Revision | f364eb6f |
|
||||
|:---------|:-------------------|
|
||||
| Backend | CUDA |
|
||||
| CPU | AMD Epyc 7742 |
|
||||
| GPU | 1x NVIDIA RTX 4090 |
|
||||
|
||||
| Metric | L2 7b q2_K | L3 8b q2_K | L2 7b q4_K_M | L3 8b q4_K_M | L2 7b q6_K | L3 8b q6_K | L2 7b q8_0 | L3 8b q8_0 |
|
||||
|-----------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|
|
||||
| Mean PPL | 5.794552 ± 0.032298 | 9.751568 ± 0.063312 | 5.877078 ± 0.032781 | 6.407115 ± 0.039119 | 5.808494 ± 0.032425 | 6.253382 ± 0.038078 | 5.798542 ± 0.032366 | 6.234284 ± 0.037878 |
|
||||
@ -107,6 +121,50 @@ K-quants score better on mean Δp than the legacy quants than e.g. KL divergence
|
||||
| RMS Δp | 9.762 ± 0.053 % | 21.421 ± 0.079 % | 3.252 ± 0.024 % | 5.519 ± 0.050 % | 1.339 ± 0.010 % | 2.295 ± 0.019 % | 0.618 ± 0.011 % | 1.198 ± 0.007 % |
|
||||
| Same top p | 85.584 ± 0.086 % | 71.138 ± 0.119 % | 94.665 ± 0.055 % | 91.901 ± 0.072 % | 97.520 ± 0.038 % | 96.031 ± 0.051 % | 98.846 ± 0.026 % | 97.674 ± 0.040 % |
|
||||
|
||||
## LLaMA 3 BF16 vs. FP16 comparison
|
||||
|
||||
| Revision | 83330d8c |
|
||||
|:---------|:--------------|
|
||||
| Backend | CPU |
|
||||
| CPU | AMD Epyc 7742 |
|
||||
| GPU | N/A |
|
||||
|
||||
Results were calculated with LLaMA 3 8b BF16 as `--kl-divergence-base` and LLaMA 3 8b FP16 as the `--model` for comparison.
|
||||
|
||||
| Metric | Value |
|
||||
|--------------------------------|--------------------------|
|
||||
| Mean PPL(Q) | 6.227711 ± 0.037833 |
|
||||
| Mean PPL(base) | 6.225194 ± 0.037771 |
|
||||
| Cor(ln(PPL(Q)), ln(PPL(base))) | 99.990% |
|
||||
| Mean ln(PPL(Q)/PPL(base)) | 0.000404 ± 0.000086 |
|
||||
| Mean PPL(Q)/PPL(base) | 1.000404 ± 0.000086 |
|
||||
| Mean PPL(Q)-PPL(base) | 0.002517 ± 0.000536 |
|
||||
| Mean KLD | 0.00002515 ± 0.00000020 |
|
||||
| Maximum KLD | 0.012206 |
|
||||
| 99.9% KLD | 0.000799 |
|
||||
| 99.0% KLD | 0.000222 |
|
||||
| 99.0% KLD | 0.000222 |
|
||||
| Median KLD | 0.000013 |
|
||||
| 10.0% KLD | -0.000002 |
|
||||
| 5.0% KLD | -0.000008 |
|
||||
| 1.0% KLD | -0.000023 |
|
||||
| Minimum KLD | -0.000059 |
|
||||
| Mean Δp | -0.0000745 ± 0.0003952 % |
|
||||
| Maximum Δp | 4.186% |
|
||||
| 99.9% Δp | 1.049% |
|
||||
| 99.0% Δp | 0.439% |
|
||||
| 95.0% Δp | 0.207% |
|
||||
| 90.0% Δp | 0.125% |
|
||||
| 75.0% Δp | 0.029% |
|
||||
| Median Δp | 0.000% |
|
||||
| 25.0% Δp | -0.030% |
|
||||
| 10.0% Δp | -0.126% |
|
||||
| 5.0% Δp | -0.207% |
|
||||
| 1.0% Δp | -0.434% |
|
||||
| 0.1% Δp | -1.016% |
|
||||
| Minimum Δp | -4.672% |
|
||||
| RMS Δp | 0.150 ± 0.001 % |
|
||||
| Same top p | 99.739 ± 0.013 % |
|
||||
|
||||
## Old Numbers
|
||||
|
||||
|
Loading…
Reference in New Issue
Block a user