llama.cpp/examples/perplexity
Kawrakow 7051aacfac
winogrande: evaluate log-probs in parallel (#5036)
This is a relatively minor performance tweak resulting in
~10% speedup on my system.

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-19 11:39:11 +02:00
CMakeLists.txt build : link against build info instead of compiling against it (#3879) 2023-11-02 08:50:16 +02:00
perplexity.cpp winogrande: evaluate log-probs in parallel (#5036) 2024-01-19 11:39:11 +02:00
README.md readme : add some recent perplexity and bpw measurements to READMES, link for k-quants (#3340) 2023-09-27 18:30:36 +03:00

perplexity

TODO

Llama 2 70B Scorechart

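| Quantization | Model size (GiB) | Perplexity | Delta to fp16 |
|--------------|------------------|------------|---------------|
| Q4_0         | 36.20            | 3.5550     | 3.61%         |
| Q4_1         | 40.20            | 3.5125     | 2.37%         |
| Q5_0         | 44.20            | 3.4744     | 1.26%         |
| Q2_K         | 27.27            | 3.7339     | 8.82%         |
| Q3_K_S       | 27.86            | 3.7019     | 7.89%         |
| Q3_K_M       | 30.83            | 3.5932     | 4.72%         |
| Q3_K_L       | 33.67            | 3.5617     | 3.80%         |
| Q4_K_S       | 36.39            | 3.4852     | 1.57%         |
| Q4_K_M       | 38.54            | 3.4725     | 1.20%         |
| Q5_K_S       | 44.20            | 3.4483     | 0.50%         |
| Q5_K_M       | 45.41            | 3.4451     | 0.40%         |
| Q6_K         | 52.70            | 3.4367     | 0.16%         |
| fp16         | 128.5            | 3.4313     | -             |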
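
The "Delta to fp16" column appears to be the relative perplexity increase over the fp16 baseline, expressed as a percentage. The README does not state the formula, so the sketch below is an assumption that happens to reproduce the listed values to within rounding. The names `FP16_PPL`, `PERPLEXITIES`, and `delta_to_fp16` are hypothetical and are not part of llama.cpp.

```python
# Perplexity values copied from the scorechart above.
FP16_PPL = 3.4313

PERPLEXITIES = {
    "Q4_0": 3.5550,
    "Q4_1": 3.5125,
    "Q5_0": 3.4744,
    "Q2_K": 3.7339,
    "Q3_K_S": 3.7019,
    "Q3_K_M": 3.5932,
    "Q3_K_L": 3.5617,
    "Q4_K_S": 3.4852,
    "Q4_K_M": 3.4725,
    "Q5_K_S": 3.4483,
    "Q5_K_M": 3.4451,
    "Q6_K": 3.4367,
}


def delta_to_fp16(ppl: float, baseline: float = FP16_PPL) -> float:
    """Relative perplexity increase over the baseline, in percent.

    Assumed formula: (ppl / baseline - 1) * 100.
    """
    return (ppl / baseline - 1.0) * 100.0


if __name__ == "__main__":
    # Print one row per quantization type, mirroring the table layout.
    for name, ppl in PERPLEXITIES.items():
        print(f"{name:8s} {ppl:.4f} {delta_to_fp16(ppl):5.2f}%")
```

Because the published perplexities are rounded to four decimals, a recomputed delta can differ from the table in the last digit.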