llama.cpp

Mirror of https://github.com/ggerganov/llama.cpp.git, synced 2024-12-25 19:04:35 +00:00.

Mikko Juola 57684331fc Make tokenize CLI tool have nicer command line arguments. (#6188)	2024-05-25 11:14:42 +10:00
CMakeLists.txt	examples : add tokenize (#4039)	2023-11-17 17:36:44 +02:00
tokenize.cpp	Make tokenize CLI tool have nicer command line arguments. (#6188)	2024-05-25 11:14:42 +10:00

Make tokenize CLI tool have nicer command line arguments. (#6188)

* Make tokenize.cpp CLI tool nicer.

Before this commit, tokenize was a simple CLI tool like this:

  tokenize MODEL_FILENAME PROMPT [--ids]

This simple tool loads the model, takes the prompt, and prints the tokens
that llama.cpp produces for it.

This changeset makes tokenize more sophisticated and more useful
for debugging and troubleshooting:

  tokenize [-m, --model MODEL_FILENAME]
           [--ids]
           [--stdin]
           [--prompt]
           [-f, --file]
           [--no-bos]
           [--log-disable]

It also behaves better on Windows: it correctly interprets and renders
Unicode passed through command-line arguments and pipes, regardless of
the code page set on the user's terminal.
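A common way to make command-line Unicode independent of the console code page on Windows is to read the arguments as wide (UTF-16) strings, for example via the Win32 `GetCommandLineW` and `CommandLineToArgvW` functions, and convert each one to UTF-8 before parsing. The sketch below shows only that conversion step, in portable C++, so it can be compiled and tested anywhere; the function name `utf16_to_utf8` is hypothetical and this is not the tool's actual code. On Windows, `WideCharToMultiByte(CP_UTF8, ...)` performs the equivalent conversion.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Convert a UTF-16 string (the form Windows wide-character APIs return)
// into UTF-8. Hypothetical helper for illustration only.
std::string utf16_to_utf8(const std::u16string & in) {
    std::string out;
    out.reserve(in.size());
    for (size_t i = 0; i < in.size(); ++i) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF) {
                throw std::runtime_error("unpaired high surrogate");
            }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;  // the low surrogate has been consumed
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        // Emit 1 to 4 UTF-8 bytes depending on the code point's magnitude.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Unlike `WideCharToMultiByte` with default flags, which substitutes U+FFFD for malformed input, this sketch throws on unpaired surrogates; that choice is made here only to keep the example short and explicit.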

* style fix: strlen(str) == 0 --> *str == 0
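For a non-null, NUL-terminated C string the two checks give the same answer: `strlen` scans the whole string to count its characters, while `*str == 0` only inspects the first byte, so it takes constant time. A minimal illustration, with hypothetical function names:

```cpp
#include <cstring>

// Original style: strlen walks to the terminating NUL, O(n).
bool is_empty_strlen(const char * str) {
    return std::strlen(str) == 0;
}

// Style after the fix: a string is empty exactly when its first
// character is the terminating NUL, so one comparison suffices, O(1).
bool is_empty_first_char(const char * str) {
    return *str == 0;
}
```

Both versions assume `str` is not a null pointer; neither guards against that case.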

* Simplify tokenize.cpp by dropping support for positional arguments.

It must now be invoked with named arguments only (--model, --prompt, etc.).
This shortens the code.
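A named-arguments-only interface like the synopsis above can be sketched as a small parser that walks the argument list, matches each flag, and rejects any bare positional value. This is a hypothetical illustration, not the tool's actual parser: the `TokenizeOptions` struct and `parse_args` function are invented names, `args` is assumed to exclude the program name (`argv[0]`), `--model`, `--prompt`, and `-f`/`--file` are assumed to take exactly one value each, and requiring exactly one prompt source is an assumption of this sketch.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical options struct; field names are illustrative only.
struct TokenizeOptions {
    std::string model_path;
    std::string prompt;       // value of --prompt, if given
    std::string prompt_file;  // value of -f/--file, if given
    bool read_stdin  = false;
    bool ids_only    = false;
    bool no_bos      = false;
    bool log_disable = false;
};

// Parse named arguments only; any positional argument is an error.
TokenizeOptions parse_args(const std::vector<std::string> & args) {
    TokenizeOptions opt;
    // Return the value following a flag, or fail if it is missing.
    auto next_value = [&](size_t & i, const std::string & flag) {
        if (i + 1 >= args.size()) {
            throw std::invalid_argument(flag + " requires a value");
        }
        return args[++i];
    };
    int prompt_sources = 0;
    for (size_t i = 0; i < args.size(); ++i) {
        const std::string & a = args[i];
        if (a == "-m" || a == "--model") {
            opt.model_path = next_value(i, a);
        } else if (a == "--prompt") {
            opt.prompt = next_value(i, a);
            ++prompt_sources;
        } else if (a == "-f" || a == "--file") {
            opt.prompt_file = next_value(i, a);
            ++prompt_sources;
        } else if (a == "--stdin") {
            opt.read_stdin = true;
            ++prompt_sources;
        } else if (a == "--ids") {
            opt.ids_only = true;
        } else if (a == "--no-bos") {
            opt.no_bos = true;
        } else if (a == "--log-disable") {
            opt.log_disable = true;
        } else {
            // Positional values such as a bare model path land here.
            throw std::invalid_argument("unrecognized argument: " + a);
        }
    }
    if (opt.model_path.empty()) {
        throw std::invalid_argument("--model is required");
    }
    // Assumption: exactly one of --prompt, --file, --stdin must be given.
    if (prompt_sources != 1) {
        throw std::invalid_argument("give exactly one of --prompt, --file, --stdin");
    }
    return opt;
}
```

With this sketch, `parse_args({"--model", "m.gguf", "--prompt", "Hello", "--ids"})` succeeds, while the old positional form `parse_args({"m.gguf", "Hello"})` throws `std::invalid_argument`. Rejecting positional values outright is what keeps the loop short: every token is either a known flag or its value.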

* tokenize.cpp: iostream header no longer required

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: brian khuu <mofosyne@gmail.com>
