# perplexity

TODO
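Perplexity measures how well a model predicts a given text: it is the exponential of the average negative log-likelihood per token, so lower values mean the model assigns higher probability to the evaluation data,

$$\mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p(x_i \mid x_{<i})\right).$$

As a minimal sketch of that computation (an illustration only, not this example's actual implementation), given per-token log-probabilities:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Perplexity = exp(mean negative log-probability per token).
double perplexity(const std::vector<double> & logprobs) {
    double nll = 0.0;
    for (double lp : logprobs) {
        nll -= lp; // accumulate negative log-likelihood
    }
    return std::exp(nll / (double) logprobs.size());
}

int main() {
    // Hypothetical per-token log-probabilities of an evaluation text.
    const std::vector<double> logprobs = {-1.2, -0.7, -2.3, -0.4};
    std::printf("PPL = %.4f\n", perplexity(logprobs));
    return 0;
}
```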

## Llama 2 70B Scorechart

| Quantization | Model size (GiB) | Perplexity | Delta to fp16 |
|--------------|-----------------:|-----------:|--------------:|
| Q4_0         | 36.20            | 3.5550     | 3.61%         |
| Q4_1         | 40.20            | 3.5125     | 2.37%         |
| Q5_0         | 44.20            | 3.4744     | 1.26%         |
| Q2_K         | 27.27            | 3.7339     | 8.82%         |
| Q3_K_S       | 27.86            | 3.7019     | 7.89%         |
| Q3_K_M       | 30.83            | 3.5932     | 4.72%         |
| Q3_K_L       | 33.67            | 3.5617     | 3.80%         |
| Q4_K_S       | 36.39            | 3.4852     | 1.57%         |
| Q4_K_M       | 38.54            | 3.4725     | 1.20%         |
| Q5_K_S       | 44.20            | 3.4483     | 0.50%         |
| Q5_K_M       | 45.41            | 3.4451     | 0.40%         |
| Q6_K         | 52.70            | 3.4367     | 0.16%         |
| fp16         | 128.5            | 3.4313     | -             |
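
The "Delta to fp16" column is the relative increase in perplexity over the fp16 baseline,

$$\Delta = \frac{\mathrm{PPL}_{\text{quant}} - \mathrm{PPL}_{\text{fp16}}}{\mathrm{PPL}_{\text{fp16}}} \times 100\%.$$

For example, for Q4_0: $(3.5550 - 3.4313) / 3.4313 \approx 3.61\%$.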