llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-27 11:54:35 +00:00

Author	SHA1	Message	Date
Francis Couture-Harpin	91deef4606	py : rename requirements for convert_legacy_llama.py Needed for scripts/check-requirements.sh	2024-07-04 16:16:21 -04:00
Francis Couture-Harpin	902de8826b	gguf-py : use snake_case in scripts entrypoint export	2024-07-04 16:09:06 -04:00
Georgi Gerganov	39a41a53b0	py : switch to snake_case ggml-ci	2024-07-04 20:44:32 +03:00
Daniel Bevenius	6f63d646c1	tokenize : add --show-count (token) option (#8299 ) This commit adds a new option to the tokenize example, --show-count. When this is set the total number of tokens are printed to stdout. This was added as an option as I was concerned that there might be scripts that use the output from this program and it might be better to not print this information by default. The motivation for this is that can be useful to find out how many tokens a file contains, for example when trying to determine prompt input file sizes for testing. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-07-04 19:38:58 +03:00
ditsuke	07786a61a2	chore: Fixup requirements and build	2024-07-04 15:39:13 +00:00
fairydreaming	807b0c49ff	Inference support for T5 and FLAN-T5 model families (#5763 ) * llama : add inference support and model types for T5 and FLAN-T5 model families * llama : add new API functions to support encoder-decoder models: llama_encode(), llama_model_has_encoder(), llama_model_decoder_start_token() * common, llama-cli, llama-batched : add support for encoder-decoder models * convert-hf : handle shared token embeddings tensors in T5Model * convert-hf : add support for SentencePiece BPE tokenizer in T5Model (for Pile-T5 models) * convert-hf : add MT5ForConditionalGeneration and UMT5ForConditionalGeneration to architectures supported by T5Model * convert : add t5 tokenizer tests, use "slow" HF tokenizer for t5 --------- Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-07-04 15:46:11 +02:00
slaren	5f2d4e60e2	ppl : fix n_seq_max for perplexity (#8277 ) * ppl : fix n_seq_max for perplexity * use 1 seq for kl_divergence	2024-07-03 20:33:31 +03:00
Clint Herron	07a3fc0608	Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258 )	2024-07-02 12:18:10 -04:00
Xuan Son Nguyen	9ef0780062	Fix new line issue with chat template, disable template when in-prefix/suffix is set (#8203 ) * preserve new line llama_chat_format_single * disable chat template if in-prefix/suffix is set * remove redundant change	2024-06-30 20:27:13 +02:00
Xuan Son Nguyen	72272b83a3	fix code typo in llama-cli (#8198 )	2024-06-29 00:14:20 +02:00
Sigbjørn Skjæret	38373cfbab	Add SPM infill support (#8016 ) * add --spm-infill option * support --spm-infill * support --spm-infill	2024-06-28 12:53:43 +02:00
Olivier Chafik	139cc621e9	`json`: restore default additionalProperties to false, fix some pattern escapes (#8180 ) * json: expand ESCAPED_IN_REGEXPS_BUT_NOT_IN_LITERALS charset * json: revert default of additionalProperties to false * Update README.md	2024-06-28 09:26:45 +01:00
Raj Hammeer Singh Hada	387952651a	Delete examples/llama.android/llama/CMakeLists.txt (#8165 ) * Delete examples/llama.android/llama/CMakeLists.txt https://github.com/ggerganov/llama.cpp/pull/8145#issuecomment-2194534244 This file is not being used for building on Android. `llama.cpp/examples/llama.android/llama/src/main/cpp/CMakeLists.txt` is being used instead. * Update CMakeLists.txt Pick local llama.cpp files instead of fetching content from git	2024-06-27 16:39:29 +02:00
Raj Hammeer Singh Hada	ac146628e4	Fix llama-android.cpp for error - "common/common.h not found" (#8145 ) - Path seems to be wrong for the common.h header file in llama-android.cpp file. Fixing the path so the Android Build doesn't fail with the error "There is no file common/common.h"	2024-06-27 03:57:57 +02:00
Daniel Bevenius	9b31a40c6d	clip : suppress unused variable warnings (#8105 ) * clip : suppress unused variable warnings This commit suppresses unused variable warnings for the variables e in the catch blocks. The motivation for this change is to suppress the warnings that are generated on Windows when using the MSVC compiler. The warnings are not displayed when using GCC because GCC will mark all catch parameters as used. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> * squash! clip : suppress unused variable warnings Remove e (/e/) instead instead of using GGML_UNUSED. --------- Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-06-27 01:50:09 +02:00
Georgi Gerganov	f3f65429c4	llama : reorganize source code + improve CMake (#8006 ) * scripts : update sync [no ci] * files : relocate [no ci] * ci : disable kompute build [no ci] * cmake : fixes [no ci] * server : fix mingw build ggml-ci * cmake : minor [no ci] * cmake : link math library [no ci] * cmake : build normal ggml library (not object library) [no ci] * cmake : fix kompute build ggml-ci * make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE ggml-ci * move public backend headers to the public include directory (#8122) * move public backend headers to the public include directory * nix test * spm : fix metal header --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * scripts : fix sync paths [no ci] * scripts : sync ggml-blas.h [no ci] --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-26 18:33:02 +03:00
Olivier Chafik	9b2f16f805	`json`: better support for "type" unions (e.g. nullable arrays w/ typed items) (#7863 ) * json: better suport for "type" arrays (e.g. `{"type": ["array", "null"], "items": {"type": "string"}}`) * json: add test for type: [array, null] fix * update tests	2024-06-26 01:46:35 +01:00
Olivier Chafik	6777c544bd	`json`: fix additionalProperties, allow space after enum/const (#7840 ) * json: default additionalProperty to true * json: don't force additional props after normal properties! * json: allow space after enum/const * json: update pydantic example to set additionalProperties: false * json: prevent additional props to redefine a typed prop * port not_strings to python, add trailing space * fix not_strings & port to js+py * Update json-schema-to-grammar.cpp * fix _not_strings for substring overlaps * json: fix additionalProperties default, uncomment tests * json: add integ. test case for additionalProperties * json: nit: simplify condition * reformat grammar integ tests w/ R"""()""" strings where there's escapes * update # tokens in server test: consts can now have trailing space	2024-06-26 01:45:58 +01:00
Daniel Bevenius	e6bf007744	llama : return nullptr from llama_grammar_init (#8093 ) * llama : return nullptr from llama_grammar_init This commit updates llama_grammar_init to return nullptr instead of throwing an exception. The motivation for this is that this function is declared inside an extern "C" block and is intended/may be used from C code which will not be able to handle exceptions thrown, and results in undefined behavior. On Windows and using MSVC the following warning is currently generated: ```console C:\llama.cpp\llama.cpp(13998,1): warning C4297: 'llama_grammar_init': function assumed not to throw an exception but does C:\llama.cpp\llama.cpp(13998,1): message : __declspec(nothrow), throw(), noexcept(true), or noexcept was specified on the function ``` Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> * squash! llama : return nullptr from llama_grammar_init Add checks for nullptr when calling llama_grammar_init. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> --------- Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> Co-authored-by: Clint Herron <hanclinto@gmail.com>	2024-06-25 15:07:28 -04:00
Olivier Chafik	84631fe150	`json`: support integer minimum, maximum, exclusiveMinimum, exclusiveMaximum (#7797 ) * json: support minimum for positive integer values * json: fix min 0 * json: min + max integer constraints * json: handle negative min / max integer bounds * json: fix missing paren min/max bug * json: proper paren fix * json: integration test for schemas * json: fix bounds tests * Update json-schema-to-grammar.cpp * json: fix negative max * json: fix negative min (w/ more than 1 digit) * Update test-grammar-integration.cpp * json: nit: move string rules together * json: port min/max integer support to Python & JS * nit: move + rename _build_min_max_int * fix min in [1, 9] * Update test-grammar-integration.cpp * add C++11-compatible replacement for std::string_view * add min/max constrained int field to pydantic json schema example * fix merge * json: add integration tests for min/max bounds * reshuffle/merge min/max integ test cases * nits / cleanups * defensive code against string out of bounds (apparently different behaviour of libstdc++ vs. clang's libc++, can't read final NULL char w/ former)	2024-06-25 20:06:20 +01:00
Xuan Son Nguyen	49c03c79cd	cvector: better prompt handling, add "mean vector" method (#8069 ) * remove completions file * fix inverted vector * add mean method * code style * remove inverted pca hotfix	2024-06-25 13:59:54 +02:00
Xuan Son Nguyen	48e6b92cc3	Add chat template support for llama-cli (#8068 ) * add chat template support for llama-cli * add help message * server: simplify format_chat * more consistent naming * improve * add llama_chat_format_example * fix server * code style * code style * Update examples/main/main.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-06-25 21:56:49 +10:00
HanishKVC	3791ad2193	SimpleChat v3.1: Boolean chat request options in Settings UI, cache_prompt (#7950 ) * SimpleChat: Allow for chat req bool options to be user controlled * SimpleChat: Allow user to control cache_prompt flag in request * SimpleChat: Add sample GUI images to readme file Show the chat screen and the settings screen * SimpleChat:Readme: Add quickstart block, title to image, cleanup * SimpleChat: RePosition contents of the Info and Settings UI Make it more logically structured and flow through. * SimpleChat: Rename to apiRequestOptions from chatRequestOptions So that it is not wrongly assumed that these request options are used only for chat/completions endpoint. Rather these are used for both the end points, so rename to match semantic better. * SimpleChat: Update image included with readme wrt settings ui * SimpleChat:ReadMe: Switch to webp screen image to reduce size	2024-06-25 21:27:35 +10:00
Yann Follet	646ef4a9cf	embedding : more cli arguments (#7458 ) * add parameters for embeddings --embd-normalize --embd-output-format --embd-separator description in the README.md * Update README.md fix tipo * Trailing whitespace * fix json generation, use " not ' * fix merge master * fix code formating group of parameters // embedding print usage for embedding parameters --------- Co-authored-by: Brian <mofosyne@gmail.com>	2024-06-24 08:30:24 +03:00
Aarni Koskela	6a2f298bd7	server : fix JSON-Scheme typo (#7975 )	2024-06-23 11:03:08 -04:00
Xuan Son Nguyen	3e58b0ee35	cvector: fix CI + correct help message (#8064 ) * cvector: fix CI + correct help message * also correct --pca-iter	2024-06-22 18:11:30 +02:00
HatsuneMikuUwU33	adf480c3ab	cvector-generator: Moe Moe Fixie-Fixie for Lots of Formats~! ♡(ᐢ ᴥ ᐢ)♡ (#8052 ) * Update negative.txt * Update positive.txt * Update cvector-generator.cpp * Update cvector-generator.cpp	2024-06-22 17:19:37 +02:00
ddh0	5b48cd53a8	Update llama-quantize ppl/file size output from LLaMA-v1 to Llama-3 values (#8058 ) Uses the values computed by @JohannesGaessler in PR #7413	2024-06-22 15:16:10 +02:00
Douglas Hanley	80ea089d77	llama : allow pooled embeddings on any model (#7477 ) * create append_pooling operation; allow to specify attention_type; add last token pooling; update examples * find result_norm/result_embd tensors properly; update output allocation logic * only use embd output for pooling_type NONE * get rid of old causal_attn accessor * take out attention_type; add in llama_set_embeddings * bypass logits when doing non-NONE pooling	2024-06-21 08:38:22 +03:00
Shuichi Tsutsumi	0e64591e82	swiftui : enable stream updating (#7754 )	2024-06-21 08:30:58 +03:00
luoyu-intel	de391e4c80	[SYCL] Fix windows build and inference (#8003 ) * add sycl preset * fix debug link error. fix windows crash * update README	2024-06-20 21:19:05 +08:00
sasha0552	ba58993152	server : fix smart slot selection (#8020 )	2024-06-20 09:57:10 +10:00
Sigbjørn Skjæret	91c188d6c2	Only use FIM middle token if it exists (#7648 ) * Only use FIM middle if it exists * Only use FIM middle if it exists	2024-06-18 22:19:45 +10:00
Calvin Laurenson	43b35e38ba	Add support for sqrt on CUDA (#7953 ) * cuda sqrt support * enable cuda in pca * fix comments in pca * add test * add sqrt to ggml_backend_cuda_supports_op * fix test * new line * Use F32 sqrtf instead of F64 sqrt Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2024-06-17 00:23:04 +02:00
Xuan Son Nguyen	0c7b3595b9	Add `cvector-generator` example (#7514 ) * add control-vector-generator * calc diff * add comments * proof-of-concept stdlib implementation Implements PCA and file writing using mostly standard libraries. The output is recognized as a functional control vector, but outputs gibberish. * param parsing, refactor, comments Added basic command-line parameters for outfile and one each positive/negative prompt. Refactored some messy code in PCA computation and GGUF exporting. Left a bunch of comments regarding further work needed. * example template completions Implements an example template set built from the positive/negative prompts like the control vector Python implementation. * add multi prompts, multi-thread for PCA * fix mem error * add debugs * fix matrix transpose multiplication you have got to be kidding me * preliminary template/multiprompt support model is running out of context and that ought to be fixed (segfaulting) but other than that it looks goodish * fix zero output & param parsing, functional templating fixed a bug where the output file had no tensor data/was all zero fixed a bug where single hyphen flags were not being correctly parsed implements creation of templated prompts from input (still need to adapt based on model) * fix square_diff matmul index range and CRLF->LF line endings fixed a logic error where square_diff would not multiply all rows fixed a formatting error where the provided completions.txt had CRLF line endings * add command-line args for num threads, num completions file lines, always reload model refactored a few things and did what the commit message says on the tin * code aestheticization * fix compiler warnings * in-series multithreading for prompt embedding? added commented-out code to attempt to start implementing mutlithreading for embedding in main * remove unnecessary multithreading * interim fix memory leak * translated everything but PCA (I think) * tentatively translate the rest * fix ggml errors and make new ones at least it compiles and runs * fix cb_eval * temporary commit while I move dev environments it finally outputs a functioning control vector - "functioning" in the sense that it can be loaded and it clearly has the right idea, but makes the model incoherent * update debug statements * pre-tokenize so we can allocate correct memory to ctx_diffs_wrapped * update comments * (wip) refactor * clean up PCA ggml implementation * fix shape of v_diff_original * add n_batch for pca * working version * remember to copy back the last_eigenvector * fix n_completions * bring back n_completions * default n_pca_batch to 20 * fix macos build * add to makefile all targets * use ggml_format_name * add readme * fix .editorconfig * use ggml_backend_tensor_copy * attemp to fix compile problem on mac * fix compile warn * reuse allocr * move param parser to common * better error handling * clean up a bit * add print_usage * shorten help msg * beautify help msg * escape prompt by default * change compile target to llama-cvector-generator * typo * disable GPU for PCA * code style --------- Co-authored-by: Christian Zhou-Zheng <christianzhouzheng@gmail.com>	2024-06-15 18:53:40 +02:00
Radoslav Gerganov	e65bbf606c	llama-bench : fix RPC indication (#7936 ) Show "<backend_name>+RPC" when RPC offloading is used	2024-06-14 16:47:41 +03:00
slaren	f578b86b21	move BLAS to a separate backend (#6210 ) * move BLAS to a separate backend * rename GGML_USE_OPENBLAS to GGML_USE_BLAS * alloc : reuse same buffer when the same buffer type if used multiple times * set number of threads automatically for openblas and blis * sched : print assignments when GGML_SCHED_DEBUG env variable is set * sched : allow ops with weights on an incompatible buffer type This will cause the weight to be copied to a backend that supports the op, which is very costly. The weight should have been stored in a buffer of a backend that can run the op, but llama.cpp cannot do this automatically at the moment. --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-06-13 03:11:35 +02:00
Olivier Chafik	1c641e6aac	`build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809 ) * `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew * server: update refs -> llama-server gitignore llama-server * server: simplify nix package * main: update refs -> llama fix examples/main ref * main/server: fix targets * update more names * Update build.yml * rm accidentally checked in bins * update straggling refs * Update .gitignore * Update server-llm.sh * main: target name -> llama-cli * Prefix all example bins w/ llama- * fix main refs * rename {main->llama}-cmake-pkg binary * prefix more cmake targets w/ llama- * add/fix gbnf-validator subfolder to cmake * sort cmake example subdirs * rm bin files * fix llama-lookup-* Makefile rules * gitignore /llama-* * rename Dockerfiles * rename llama\|main -> llama-cli; consistent RPM bin prefixes * fix some missing -cli suffixes * rename dockerfile w/ llama-cli * rename(make): llama-baby-llama * update dockerfile refs * more llama-cli(.exe) * fix test-eval-callback * rename: llama-cli-cmake-pkg(.exe) * address gbnf-validator unused fread warning (switched to C++ / ifstream) * add two missing llama- prefixes * Updating docs for eval-callback binary to use new `llama-` prefix. * Updating a few lingering doc references for rename of main to llama-cli * Updating `run-with-preset.py` to use new binary names. Updating docs around `perplexity` binary rename. * Updating documentation references for lookup-merge and export-lora * Updating two small `main` references missed earlier in the finetune docs. * Update apps.nix * update grammar/README.md w/ new llama-* names * update llama-rpc-server bin name + doc * Revert "update llama-rpc-server bin name + doc" This reverts commit `e474ef1df4`. * add hot topic notice to README.md * Update README.md * Update README.md * rename gguf-split & quantize bins refs in **/tests.sh --------- Co-authored-by: HanClinto <hanclinto@gmail.com>	2024-06-13 00:41:52 +01:00
Georgi Gerganov	704a35b183	server : restore numeric prompts (#7883 )	2024-06-12 14:42:29 +03:00
Johannes Gäßler	148995e5e5	llama-bench: more compact markdown tables (#7879 )	2024-06-11 14:45:40 +02:00
Olivier Chafik	b61eb9644d	json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866 )	2024-06-11 02:22:57 +01:00
Olivier Chafik	396b18dfec	`json`: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841 ) * json: fix char pattern in grammar converters * json: prevent number precision & whitespace runaways in example grammars * json: add doc to grammar readme	2024-06-11 01:00:30 +01:00
Georgi Gerganov	c28a83902c	examples : remove --instruct remnants (#7846 )	2024-06-10 15:00:15 +03:00
Georgi Gerganov	d9da0e4986	server : improve "prompt" handling (#7847 )	2024-06-10 14:59:55 +03:00
Georgi Gerganov	e95beeb1fc	imatrix : handle partial entries (#7833 )	2024-06-09 20:19:35 +03:00
mgroeber9110	3e2ee44315	server: do not remove whitespace at the start of a completion chunk (#7830 )	2024-06-09 20:50:35 +10:00
slaren	fe1e3917cf	Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682 )" (#7808 ) This reverts commit `9422c5e34b`.	2024-06-09 01:43:39 +02:00
sasha0552	7a16ce7db2	server : smart slot selection using Longest Common Prefix (#7728 ) * server : Smart selection of available slot using Longest Common Substring * add usage * remove trailing whitespaces * Use Longest Common Prefix (LCP) instead of LCS * Rename argument	2024-06-08 10:50:31 +03:00
Christian Zhou-Zheng	c00fad71e5	gguf-split : change binary multi-byte units to decimal (#7803 )	2024-06-07 15:56:01 +03:00
Johannes Gäßler	7027b27d76	server: update cache_prompt documentation [no ci] (#7745 )	2024-06-07 11:15:49 +02:00

1 2 3 4 5 ...

950 Commits