llama.cpp/examples
compilade 4c676c85e5
llama : refactor session file management (#8699)
* llama : refactor session file management

* llama : saving and restoring state checks for overflow

The sizes of the buffers are now passed to the functions working
with them; otherwise a truncated file could cause out-of-bounds reads.
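
The bounds-checked pattern described above can be sketched like this (the struct and names are illustrative, not llama.cpp's actual types): the reader carries the total buffer size and refuses any read that would run past the end, so a truncated file fails loudly instead of reading out of bounds.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical illustration: a reader that knows the total size of the
// buffer it was given and throws instead of reading past the end.
struct sized_reader {
    const uint8_t * data;
    size_t          size;       // total bytes available
    size_t          offset = 0; // bytes consumed so far

    void read(void * dst, size_t n) {
        if (n > size - offset) {
            throw std::runtime_error("truncated session data");
        }
        memcpy(dst, data + offset, n);
        offset += n;
    }
};
```

Because every read goes through the same check, callers no longer need to pre-validate offsets themselves.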

* llama : stream from session file instead of copying into a big buffer

Loading session files should no longer cause a memory usage spike.
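
The streaming idea can be sketched as follows (a generic illustration, not llama.cpp's implementation): data is copied through a small fixed-size chunk, so peak memory stays bounded regardless of the session file's size. The `consume` callback stands in for whatever applies the bytes (e.g. filling a tensor).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical sketch: read n_total bytes from f through a small chunk
// instead of one big allocation. Returns the number of bytes processed,
// which is less than n_total if the file is truncated.
template <typename Fn>
size_t stream_file(FILE * f, size_t n_total, Fn && consume) {
    std::vector<uint8_t> chunk(4096); // peak memory stays at chunk size
    size_t n_done = 0;
    while (n_done < n_total) {
        size_t n = std::min(chunk.size(), n_total - n_done);
        if (fread(chunk.data(), 1, n, f) != n) {
            break; // truncated file: stop early, let the caller detect it
        }
        consume(chunk.data(), n);
        n_done += n;
    }
    return n_done;
}
```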

* llama : llama_state_get_size returns the actual size instead of max

This is a breaking change, but it makes that function *much* easier
to keep up to date, and it now reflects the behavior
of llama_state_seq_get_size.
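
One way to return the actual size without maintaining a separate size formula is to run the same serialization path through a writer interface, with one implementation that only counts bytes. The llama_data_context mentioned later in this commit suggests such an abstraction exists, but the types below are a hypothetical sketch, not the real API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Illustrative writer interface: the serializer is written once against it.
struct data_write {
    virtual void write(const void * src, size_t n) = 0;
    virtual ~data_write() = default;
};

// Counts bytes without copying: used to compute the exact state size.
struct data_write_count : data_write {
    size_t n_bytes = 0;
    void write(const void *, size_t n) override { n_bytes += n; }
};

// Copies bytes into a caller-provided buffer: used for the real save.
struct data_write_buffer : data_write {
    uint8_t * dst;
    explicit data_write_buffer(uint8_t * p) : dst(p) {}
    void write(const void * src, size_t n) override { memcpy(dst, src, n); dst += n; }
};

// Example serializer: any change here automatically updates the size.
static void write_state(data_write & w) {
    uint32_t magic = 0x6C6C616D; // illustrative field
    w.write(&magic, sizeof(magic));
}
```

Since both the size query and the save run the exact same `write_state` code, the two can never drift apart.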

* llama : share code between whole and seq_id-specific state saving

Both session file types now use a more similar format.

* llama : no longer store all hparams in session files

Instead, the model arch name is stored.
The layer count and the embedding dimensions of the KV cache
are still verified when loading.
Storing all the hparams is not necessary.
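
The compatibility check described above might look roughly like this (field names are assumptions for illustration, not llama.cpp's API): a session file is accepted only if the stored architecture string and the KV-cache-relevant dimensions match the loaded model.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Illustrative header: only what the KV cache needs, not every hparam.
struct session_header {
    std::string arch;     // model architecture name, e.g. "llama"
    uint32_t    n_layer;  // layer count
    uint32_t    n_embd_k; // per-layer key embedding size
    uint32_t    n_embd_v; // per-layer value embedding size
};

static bool session_compatible(const session_header & file, const session_header & model) {
    return file.arch     == model.arch
        && file.n_layer  == model.n_layer
        && file.n_embd_k == model.n_embd_k
        && file.n_embd_v == model.n_embd_v;
}
```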

* llama : fix uint64_t format type

* llama : various integer type cast and format string fixes

Some platforms use "%lu" and others "%llu" for uint64_t.
To stay portable, the values are cast to size_t when displayed in error messages.
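
A self-contained example of both portable options: casting to size_t and using "%zu" (the approach taken here, fine when the value fits in size_t), or using the standard PRIu64 macro from <cinttypes>, which expands to the correct specifier on each platform. The function name is illustrative.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a truncation error portably: %zu via a size_t cast for one value,
// PRIu64 for the other, so neither triggers a format-string warning.
static void report_truncation(char * buf, size_t buf_len, uint64_t n_read, uint64_t n_size) {
    snprintf(buf, buf_len, "read %zu bytes, file size %" PRIu64,
             (size_t) n_read, n_size);
}
```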

* llama : remove _context suffix for llama_data_context

* llama : fix session file loading

llama_state_get_size cannot be used to get the max size anymore.

* llama : more graceful error handling of invalid session files

* llama : remove LLAMA_MAX_RNG_STATE

It's no longer necessary to limit the size of the RNG state,
because the max size of session files is not estimated anymore.

* llama : cast seq_id in comparison with unsigned n_seq_max
2024-07-28 00:42:05 -04:00
baby-llama build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
batched batched: fix n_predict parameter (#8527) 2024-07-17 10:34:28 +03:00
batched-bench build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
batched.swift Detokenizer fixes (#8039) 2024-07-05 19:01:35 +02:00
benchmark build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
convert-llama2c-to-ggml build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
cvector-generator cvector: better prompt handling, add "mean vector" method (#8069) 2024-06-25 13:59:54 +02:00
deprecation-warning examples : remove finetune and train-text-from-scratch (#8669) 2024-07-25 10:39:04 +02:00
embedding Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258) 2024-07-02 12:18:10 -04:00
eval-callback ggml : reduce hash table reset cost (#8698) 2024-07-27 04:41:55 +02:00
export-lora examples : export-lora : fix issue with quantized base models (#8687) 2024-07-25 23:49:39 +02:00
gbnf-validator llama : move vocab, grammar and sampling into separate files (#8508) 2024-07-23 13:10:17 +03:00
gguf gguf : handle null name during init (#8587) 2024-07-20 17:15:42 +03:00
gguf-hash gguf-hash : update clib.json to point to original xxhash repo (#8491) 2024-07-16 10:14:16 +03:00
gguf-split build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
gritlm llama : allow pooled embeddings on any model (#7477) 2024-06-21 08:38:22 +03:00
imatrix ggml : reduce hash table reset cost (#8698) 2024-07-27 04:41:55 +02:00
infill infill : assert prefix/suffix tokens + remove old space logic (#8351) 2024-07-08 09:34:35 +03:00
jeopardy build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
llama-bench ggml : reduce hash table reset cost (#8698) 2024-07-27 04:41:55 +02:00
llama.android examples: fix android example cannot be generated continuously (#8621) 2024-07-22 09:54:42 +03:00
llama.swiftui llama.swiftui: fix end of generation bug (#8268) 2024-07-20 16:09:37 +03:00
llava ggml : reduce hash table reset cost (#8698) 2024-07-27 04:41:55 +02:00
lookahead build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
lookup lookup: fibonacci hashing, fix crashes (#8548) 2024-07-17 23:35:44 +02:00
main llama : fix llama_chat_format_single for mistral (#8657) 2024-07-24 13:48:46 +02:00
main-cmake-pkg Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258) 2024-07-02 12:18:10 -04:00
parallel build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
passkey passkey : add short intro to README.md [no-ci] (#8317) 2024-07-05 09:14:24 +03:00
perplexity ppl : fix n_seq_max for perplexity (#8277) 2024-07-03 20:33:31 +03:00
quantize llama : valign + remove unused ftype (#8502) 2024-07-16 10:00:30 +03:00
quantize-stats ggml : minor naming changes (#8433) 2024-07-12 10:46:02 +03:00
retrieval llama : allow pooled embeddings on any model (#7477) 2024-06-21 08:38:22 +03:00
rpc llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
save-load-state llama : refactor session file management (#8699) 2024-07-28 00:42:05 -04:00
server server : add Speech Recognition & Synthesis to UI (#8679) 2024-07-26 00:10:16 +02:00
simple build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
speculative build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
sycl Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258) 2024-07-02 12:18:10 -04:00
tokenize ggml : reduce hash table reset cost (#8698) 2024-07-27 04:41:55 +02:00
base-translate.sh build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
chat-13B.bat Create chat-13B.bat (#592) 2023-03-29 20:21:09 +03:00
chat-13B.sh build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
chat-persistent.sh build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
chat-vicuna.sh build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
chat.sh build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
CMakeLists.txt examples : remove finetune and train-text-from-scratch (#8669) 2024-07-25 10:39:04 +02:00
convert_legacy_llama.py convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor (#7499) 2024-07-18 20:40:15 +10:00
json_schema_pydantic_example.py py : type-check all Python scripts with Pyright (#8341) 2024-07-07 15:04:39 -04:00
json_schema_to_grammar.py py : type-check all Python scripts with Pyright (#8341) 2024-07-07 15:04:39 -04:00
llama.vim llama.vim : added api key support (#5090) 2024-01-23 08:51:27 +02:00
llm.vim llm.vim : stop generation at multiple linebreaks, bind to <F2> (#2879) 2023-08-30 09:50:55 +03:00
Miku.sh build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
pydantic_models_to_grammar_examples.py examples : Rewrite pydantic_models_to_grammar_examples.py (#8493) 2024-07-20 22:09:17 -04:00
pydantic_models_to_grammar.py pydantic : replace uses of __annotations__ with get_type_hints (#8474) 2024-07-14 19:51:21 -04:00
reason-act.sh build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
regex_to_grammar.py py : switch to snake_case (#8305) 2024-07-05 07:53:33 +03:00
server_embd.py py : type-check all Python scripts with Pyright (#8341) 2024-07-07 15:04:39 -04:00
server-llama2-13B.sh build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
ts-type-to-grammar.sh JSON schema conversion: faster repetitions, min/maxLength for strings, cap number length (#6555) 2024-04-12 19:43:38 +01:00