llama.cpp/examples
Kawrakow 682986a08e
Add Winogrande evaluation (#5015)
* winogrande: simple implementation

It doesn't look like it is working - why?
For Mistral-7B it is only modestly better than
random chance (score ~60% for 1267 tasks, vs 50% for guessing), while I see
Mistral-7B scoring 78.4% on the HF leaderboard.
1-sigma statistical uncertainty for 1267 tasks is ~1.4%,
so no way the difference is due to statistics.
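
The ~1.4% figure above can be checked with the usual binomial standard error. This is a minimal sketch, not code from the commit: the function name `binomial_std_error` is made up for illustration, and it assumes the 1267 tasks are independent two-way choices.

```python
import math


def binomial_std_error(accuracy: float, n_tasks: int) -> float:
    """1-sigma uncertainty of an accuracy measured on n_tasks independent tasks."""
    if n_tasks <= 0:
        raise ValueError("n_tasks must be positive")
    return math.sqrt(accuracy * (1.0 - accuracy) / n_tasks)


n_tasks = 1267      # number of Winogrande tasks quoted in the commit message
measured = 0.60     # approximate score quoted in the commit message
reported = 0.784    # HF leaderboard score quoted in the commit message

sigma = binomial_std_error(measured, n_tasks)
gap_in_sigmas = (reported - measured) / sigma

print(f"1-sigma uncertainty: {100 * sigma:.2f} percentage points")
print(f"gap to reported score: {gap_in_sigmas:.1f} sigma")
```

A gap of more than ten standard errors is far too large to be a statistical fluctuation, which matches the conclusion in the commit message.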

* winogrande: somewhat better

Score for Mistral-7B is now 68.9 on the validation set of
winogrande_debiased. Still far from the reported 78.4, but
better than what I had before.

* winogrande: improving

Mistral-7B score is now 73.56.
Still not quite 78.4 but getting there.
We are also getting a lower score on HellaSwag
compared to the HF leaderboard, so I'm not expecting
we will get up to 78.4 anyway.

It looks like it is better to skip the choice word(s)
when evaluating the average log-likelihood. This kind of
makes sense because a more common word (in Winogrande this is
often a name) will have a higher probability without knowing
about the follow-up context, and this will skew the log-likelihood
towards the more common word. We can only do this if the
choice words are not last in the sentence.

It also looks like it is better to skip the punctuation at the
end of the sentence, provided the choice words are not last.

* winogrande: add dataset instructions
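
The scoring rule described in the "winogrande: improving" entry can be sketched as follows. This is an illustrative sketch only, not the code added to the perplexity example: the names `score_candidate`, `pick_answer` and `PUNCTUATION` are hypothetical, the tokens and log-probabilities are made-up toy values, and falling back to the whole sentence when the choice words come last is an assumption (the commit message only says the skipping cannot be done in that case).

```python
from typing import List, Sequence

# Assumed set of trailing punctuation tokens to ignore (hypothetical).
PUNCTUATION = {".", ",", "!", "?", ";", ":"}


def score_candidate(tokens: Sequence[str], logprobs: Sequence[float],
                    choice_start: int, choice_end: int) -> float:
    """Average log-likelihood of one candidate sentence.

    logprobs[i] is log p(tokens[i] | tokens[:i]); the choice words occupy
    tokens[choice_start:choice_end].
    """
    if len(tokens) != len(logprobs) or not tokens:
        raise ValueError("tokens and logprobs must be non-empty and equal in length")
    if not 0 <= choice_start < choice_end <= len(tokens):
        raise ValueError("invalid choice span")

    first, last = choice_end, len(tokens)
    # Skip punctuation at the end of the sentence.
    while last > first and tokens[last - 1].strip() in PUNCTUATION:
        last -= 1
    if first >= last:
        # Choice words are last: nothing follows them, so score everything
        # (assumed fallback, not taken from the commit).
        first, last = 0, len(tokens)

    span = logprobs[first:last]
    return sum(span) / len(span)


def pick_answer(scores: List[float]) -> int:
    """Index of the candidate with the highest average log-likelihood."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Toy data: the two candidates differ only in the name at index 6.
tokens_a = ["The", " trophy", " did", " not", " fit", " because",
            " John", " was", " small", "."]
tokens_b = tokens_a[:6] + [" Xavier"] + tokens_a[7:]
# The common name is likely on its own, but its continuation is less likely.
logprobs_a = [-1.0] * 6 + [-0.2, -2.0, -2.0, -0.1]
logprobs_b = [-1.0] * 6 + [-6.0, -1.5, -1.5, -0.1]

scores = [score_candidate(tokens_a, logprobs_a, 6, 7),
          score_candidate(tokens_b, logprobs_b, 6, 7)]
print("chosen candidate:", pick_answer(scores))
```

In this toy case a plain average over the whole sentence would favour the first candidate purely because its name token is more probable, while scoring only the tokens after the choice word favours the second.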

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-01-18 13:46:27 +02:00
baby-llama ggml : change ggml_scale to take a float instead of tensor (#4573) 2023-12-21 23:20:49 +02:00
batched examples : add passkey test (#3856) 2024-01-08 11:14:04 +02:00
batched-bench llama : ggml-backend integration (#4766) 2024-01-12 20:07:38 +01:00
batched.swift swift : fix prompt tokenization logic (#4321) 2023-12-04 15:43:45 +02:00
beam-search llama : remove token functions with context args in favor of model (#3720) 2023-10-23 22:40:03 +03:00
benchmark 2-bit quantizations (#4897) 2024-01-14 09:45:56 +02:00
convert-llama2c-to-ggml ggml : remove n_dims from ggml_tensor (#4469) 2023-12-14 16:52:08 +01:00
embedding build : link against build info instead of compiling against it (#3879) 2023-11-02 08:50:16 +02:00
export-lora export-lora : use LLAMA_FILE_MAGIC_GGLA (#4894) 2024-01-12 19:54:53 +02:00
finetune finetune : add training data file to log message (#4979) 2024-01-16 19:54:24 +02:00
gguf gguf : simplify example dependencies 2023-12-21 23:08:14 +02:00
imatrix imatrix : offload to GPU support (#4957) 2024-01-17 18:46:30 +02:00
infill main : Add ChatML functionality to main example (#4046) 2023-11-20 14:56:59 +01:00
jeopardy parallel : add option to load external prompt file (#3416) 2023-10-06 16:16:38 +03:00
llama-bench llama : ggml-backend integration (#4766) 2024-01-12 20:07:38 +01:00
llama.android android : introduce starter project example (#4926) 2024-01-16 15:47:34 +02:00
llama.swiftui llama.swiftui : update models layout (#4826) 2024-01-12 14:48:00 +02:00
llava clip : support more quantization types (#4846) 2024-01-10 15:37:09 +02:00
lookahead english : use typos to fix comments and logs (#4354) 2023-12-12 11:53:36 +02:00
lookup lookup : add prompt lookup decoding example (#4484) 2023-12-22 18:05:56 +02:00
main main : add parameter --no-display-prompt (#4541) 2024-01-13 18:09:08 +02:00
main-cmake-pkg main-cmake-pkg : fix build issue (#4665) 2023-12-29 16:18:20 +02:00
parallel llama : KV cache view API + better KV cache management (#4170) 2023-11-23 19:07:56 +02:00
passkey examples : add passkey test (#3856) 2024-01-08 11:14:04 +02:00
perplexity Add Winogrande evaluation (#5015) 2024-01-18 13:46:27 +02:00
quantize Add ability to use importance matrix for all k-quants (#4930) 2024-01-14 16:21:12 +02:00
quantize-stats llama : per-layer KV cache + quantum K cache (#4309) 2023-12-07 13:03:17 +02:00
save-load-state llama : minimize size used for state save/load (#4820) 2024-01-13 18:29:43 +02:00
server server : fix prompt caching with system prompt (#4914) 2024-01-13 19:31:26 +02:00
simple simple : update error message for KV cache check (#4324) 2023-12-04 18:04:21 +02:00
speculative speculative : threading options (#4959) 2024-01-16 13:04:32 +02:00
tokenize tokenize example: Respect normal add BOS token behavior (#4126) 2023-11-18 14:48:17 -07:00
train-text-from-scratch ggml : change ggml_scale to take a float instead of tensor (#4573) 2023-12-21 23:20:49 +02:00
alpaca.sh alpaca.sh : update model file name (#2074) 2023-07-06 19:17:50 +03:00
base-translate.sh examples : improve base-translate.sh script (#4783) 2024-01-06 11:40:24 +02:00
chat-13B.bat Create chat-13B.bat (#592) 2023-03-29 20:21:09 +03:00
chat-13B.sh examples : read chat prompts from a template file (#1196) 2023-05-03 20:58:11 +03:00
chat-persistent.sh llama : fix session saving/loading (#3400) 2023-10-03 21:04:01 +03:00
chat-vicuna.sh examples : add chat-vicuna.sh (#1854) 2023-06-15 21:05:53 +03:00
chat.sh main : log file (#2748) 2023-08-30 09:29:32 +03:00
CMakeLists.txt metal : remove old API (#4919) 2024-01-13 20:45:45 +02:00
gpt4all.sh examples : add -n to alpaca and gpt4all scripts (#706) 2023-04-13 16:03:39 +03:00
json-schema-to-grammar.py chmod : make scripts executable (#2675) 2023-08-23 17:29:09 +03:00
llama2-13b.sh gitignore : changes for Poetry users + chat examples (#2284) 2023-07-21 13:53:27 +03:00
llama2.sh gitignore : changes for Poetry users + chat examples (#2284) 2023-07-21 13:53:27 +03:00
llama.vim vim : streaming and more (#2495) 2023-08-08 14:44:48 +03:00
llm.vim llm.vim : stop generation at multiple linebreaks, bind to <F2> (#2879) 2023-08-30 09:50:55 +03:00
make-ggml.py make-ggml.py : compatibility with more models and GGUF (#3290) 2023-09-27 19:25:12 +03:00
Miku.sh MIKU MAYHEM: Upgrading the Default Model for Maximum Fun 🎉 (#2287) 2023-07-21 11:13:18 +03:00
pydantic_models_to_grammar.py examples : fix and improv docs for the grammar generator (#4909) 2024-01-16 14:10:48 +02:00
pydantic-models-to-grammar-examples.py examples : add complete parallel function calling example (#4974) 2024-01-16 19:41:42 +02:00
reason-act.sh chmod : make scripts executable (#2675) 2023-08-23 17:29:09 +03:00
server-llama2-13B.sh chmod : make scripts executable (#2675) 2023-08-23 17:29:09 +03:00