llama.cpp/examples
Steve Grubb 988631335a
server : free llama_batch on exit (#7212)
* [server] Cleanup a memory leak on exit

There are a couple memory leaks on exit of the server. This hides others.
After cleaning this up, you can see leaks on slots. But that is another
patch to be sent after this.

* make tab into spaces
2024-05-11 11:13:02 +03:00
baby-llama code : normalize enum names (#5697) 2024-02-25 12:09:09 +02:00
batched llama : support Llama 3 HF conversion (#6745) 2024-04-21 14:50:41 +03:00
batched-bench ggml : add Flash Attention (#5021) 2024-04-30 12:16:08 +03:00
batched.swift llama : add option to render special/control tokens (#6807) 2024-04-21 18:36:45 +03:00
beam-search llama : support Llama 3 HF conversion (#6745) 2024-04-21 14:50:41 +03:00
benchmark ggml : remove old quantization functions (#5942) 2024-03-09 15:53:59 +02:00
convert-llama2c-to-ggml TypoFix (#7162) 2024-05-09 10:16:45 +02:00
embedding llama : add Jina Embeddings architecture (#6826) 2024-05-11 10:46:09 +03:00
eval-callback eval-callback : fix conversion to float (#7184) 2024-05-10 01:04:12 +02:00
export-lora ci : add an option to fail on compile warning (#3952) 2024-02-17 23:03:14 +02:00
finetune ggml : introduce bfloat16 support (#6412) 2024-05-08 09:30:09 +03:00
gbnf-validator grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses) (#6609) 2024-04-11 19:47:34 +01:00
gguf gguf : add option to not check tensor data (#6582) 2024-04-10 21:16:48 +03:00
gguf-split gguf-split: add --no-tensor-first-split (#7072) 2024-05-04 18:56:22 +02:00
gritlm gritlm : add --outdir option to hf.sh script (#6699) 2024-04-16 09:34:06 +03:00
imatrix Fixed save_imatrix to match old behaviour for MoE (#7099) 2024-05-08 02:24:16 +02:00
infill llama : support Llama 3 HF conversion (#6745) 2024-04-21 14:50:41 +03:00
jeopardy parallel : add option to load external prompt file (#3416) 2023-10-06 16:16:38 +03:00
llama-bench llama-bench : add pp+tg test type (#7199) 2024-05-10 18:03:54 +02:00
llama.android llama : support Llama 3 HF conversion (#6745) 2024-04-21 14:50:41 +03:00
llama.swiftui llama : add option to render special/control tokens (#6807) 2024-04-21 18:36:45 +03:00
llava Fix memory bug in grammar parser (#7194) 2024-05-10 21:01:08 +10:00
lookahead llama : support Llama 3 HF conversion (#6745) 2024-04-21 14:50:41 +03:00
lookup Server: fix seed for multiple slots (#6835) 2024-04-24 11:08:36 +02:00
main Fix memory bug in grammar parser (#7194) 2024-05-10 21:01:08 +10:00
main-cmake-pkg build(cmake): simplify instructions (cmake -B build && cmake --build build ...) (#6964) 2024-04-29 17:02:45 +01:00
parallel llama : support Llama 3 HF conversion (#6745) 2024-04-21 14:50:41 +03:00
passkey llama : support Llama 3 HF conversion (#6745) 2024-04-21 14:50:41 +03:00
perplexity perplexity: more statistics, added documentation (#6936) 2024-04-30 23:36:27 +02:00
quantize ggml : introduce bfloat16 support (#6412) 2024-05-08 09:30:09 +03:00
quantize-stats Improve usability of --model-url & related flags (#6930) 2024-04-30 00:52:50 +01:00
retrieval examples : add "retrieval" (#6193) 2024-03-25 09:38:22 +02:00
save-load-state llama : save and restore kv cache for single seq id (#6341) 2024-04-08 15:43:30 +03:00
server server : free llama_batch on exit (#7212) 2024-05-11 11:13:02 +03:00
simple llama : support Llama 3 HF conversion (#6745) 2024-04-21 14:50:41 +03:00
speculative llama : support Llama 3 HF conversion (#6745) 2024-04-21 14:50:41 +03:00
sycl docs: fix typos (#7124) 2024-05-07 18:20:33 +03:00
tokenize BERT tokenizer fixes (#6498) 2024-04-09 13:44:08 -04:00
train-text-from-scratch train : add general name (#6752) 2024-04-19 10:16:45 +03:00
alpaca.sh alpaca.sh : update model file name (#2074) 2023-07-06 19:17:50 +03:00
base-translate.sh examples : improve base-translate.sh script (#4783) 2024-01-06 11:40:24 +02:00
chat-13B.bat Create chat-13B.bat (#592) 2023-03-29 20:21:09 +03:00
chat-13B.sh examples : read chat prompts from a template file (#1196) 2023-05-03 20:58:11 +03:00
chat-persistent.sh llama : fix session saving/loading (#3400) 2023-10-03 21:04:01 +03:00
chat-vicuna.sh examples : add chat-vicuna.sh (#1854) 2023-06-15 21:05:53 +03:00
chat.sh main : log file (#2748) 2023-08-30 09:29:32 +03:00
CMakeLists.txt eval-callback: Example how to use eval callback for debugging (#6576) 2024-04-11 14:51:07 +02:00
gpt4all.sh examples : add -n to alpaca and gpt4all scripts (#706) 2023-04-13 16:03:39 +03:00
json_schema_to_grammar.py JSON schema conversion: faster repetitions, min/maxLength for strings, cap number length (#6555) 2024-04-12 19:43:38 +01:00
json-schema-pydantic-example.py json-schema-to-grammar improvements (+ added to server) (#5978) 2024-03-21 11:50:43 +00:00
llama2-13b.sh gitignore : changes for Poetry users + chat examples (#2284) 2023-07-21 13:53:27 +03:00
llama2.sh gitignore : changes for Poetry users + chat examples (#2284) 2023-07-21 13:53:27 +03:00
llama.vim llama.vim : added api key support (#5090) 2024-01-23 08:51:27 +02:00
llm.vim llm.vim : stop generation at multiple linebreaks, bind to <F2> (#2879) 2023-08-30 09:50:55 +03:00
make-ggml.py make-ggml.py : compatibility with more models and GGUF (#3290) 2023-09-27 19:25:12 +03:00
Miku.sh MIKU MAYHEM: Upgrading the Default Model for Maximum Fun 🎉 (#2287) 2023-07-21 11:13:18 +03:00
pydantic_models_to_grammar.py examples : make pydantic scripts pass mypy and support py3.8 (#5099) 2024-01-25 14:51:24 -05:00
pydantic-models-to-grammar-examples.py examples : make pydantic scripts pass mypy and support py3.8 (#5099) 2024-01-25 14:51:24 -05:00
reason-act.sh chmod : make scripts executable (#2675) 2023-08-23 17:29:09 +03:00
regex-to-grammar.py JSON schema conversion: faster repetitions, min/maxLength for strings, cap number length (#6555) 2024-04-12 19:43:38 +01:00
server-embd.py server : refactor (#5882) 2024-03-07 11:41:53 +02:00
server-llama2-13B.sh chmod : make scripts executable (#2675) 2023-08-23 17:29:09 +03:00
ts-type-to-grammar.sh JSON schema conversion: faster repetitions, min/maxLength for strings, cap number length (#6555) 2024-04-12 19:43:38 +01:00