llama.cpp/models
Georgi Gerganov 92139b90af
tests : add test-tokenizer-0.sh + fix some tokenizers (#7036)
* tests : add test-tokenizer-0.sh

* unicode : add all unicode number ranges

* starcoder : fix pre-tokenizer

* tests : add test that fails with DeepSeek tokenizers

* falcon : fix regex

* unicode : regenerate unicode tables

* refact : add tokenizer model

* lint : fix

* tests : disable failing tests

ggml-ci

* refact : add tests files

ggml-ci

* convert : print -> logging

ggml-ci

* lint : fix

* unicode : digit -> number

* phi-3 : update
2024-05-04 08:32:32 +03:00
..
.editorconfig gguf : new file format with flexible meta data (beta) (#2398) 2023-08-21 23:07:43 +03:00
ggml-vocab-aquila.gguf Work on the BPE tokenizer (#3252) 2023-10-03 09:16:26 +02:00
ggml-vocab-baichuan.gguf Add more tokenizer tests (#3742) 2023-10-24 09:17:17 +02:00
ggml-vocab-bert-bge.gguf llama : fix BPE pre-tokenization (#6920) 2024-04-29 16:58:41 +03:00
ggml-vocab-bert-bge.gguf.inp tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-bert-bge.gguf.out tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-deepseek-coder.gguf llama : fix BPE pre-tokenization (#6920) 2024-04-29 16:58:41 +03:00
ggml-vocab-deepseek-coder.gguf.inp tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-deepseek-coder.gguf.out tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-deepseek-llm.gguf llama : fix BPE pre-tokenization (#6920) 2024-04-29 16:58:41 +03:00
ggml-vocab-deepseek-llm.gguf.inp tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-deepseek-llm.gguf.out tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-falcon.gguf llama : fix BPE pre-tokenization (#6920) 2024-04-29 16:58:41 +03:00
ggml-vocab-falcon.gguf.inp tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-falcon.gguf.out tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-gpt2.gguf gpt2 : Add gpt2 architecture integration (#4555) 2023-12-28 15:03:57 +01:00
ggml-vocab-gpt-2.gguf llama : fix BPE pre-tokenization (#6920) 2024-04-29 16:58:41 +03:00
ggml-vocab-gpt-2.gguf.inp tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-gpt-2.gguf.out tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-gpt-neox.gguf Add more tokenizer tests (#3742) 2023-10-24 09:17:17 +02:00
ggml-vocab-llama-bpe.gguf llama : fix BPE pre-tokenization (#6920) 2024-04-29 16:58:41 +03:00
ggml-vocab-llama-bpe.gguf.inp tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-llama-bpe.gguf.out tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-llama-spm.gguf llama : fix BPE pre-tokenization (#6920) 2024-04-29 16:58:41 +03:00
ggml-vocab-llama-spm.gguf.inp tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-llama-spm.gguf.out tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-mpt.gguf llama : fix BPE pre-tokenization (#6920) 2024-04-29 16:58:41 +03:00
ggml-vocab-mpt.gguf.inp tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-mpt.gguf.out tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-phi-3.gguf tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-phi-3.gguf.inp tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-phi-3.gguf.out tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-refact.gguf tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-refact.gguf.inp tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-refact.gguf.out tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-stablelm.gguf llama : fix BPE pre-tokenization (#6920) 2024-04-29 16:58:41 +03:00
ggml-vocab-starcoder.gguf llama : fix BPE pre-tokenization (#6920) 2024-04-29 16:58:41 +03:00
ggml-vocab-starcoder.gguf.inp tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00
ggml-vocab-starcoder.gguf.out tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) 2024-05-04 08:32:32 +03:00