llama.cpp

Mirror of https://github.com/ggerganov/llama.cpp.git (last synced 2024-11-14 06:49:54 +00:00)

Georgi Gerganov edd4c14817 llama : more tokenizer fixes (#2810)	2023-08-27 14:19:19 +03:00

* tests : write a Python tokenizer test (wip)
* llama : prefix input text for tokenization with whitespace
* llama : distinguish pieces from decoded text + fix detokenization
* common : add comments
* examples : no longer manually add leading space when tokenizing
* tests : use Python to generate tokenizer tests for C++
* tests : add option to tokenize text files

ggml-ci

* tests : add test-tokenizer-1.py
* llama.cpp : fix LF token
* hellaswag : move the concat space for clarity
* tests : add falcon tests (py + cpp, currently do not pass Unicode)

ggml-ci

* common : temporary separate llama_detokenize calls for SPM and BPE

---------

Co-authored-by: klosax <131523366+klosax@users.noreply.github.com>

CMakeLists.txt	cmake : install targets (#2256)	2023-07-19 10:01:11 +03:00
save-load-state.cpp	llama : more tokenizer fixes (#2810)	2023-08-27 14:19:19 +03:00