llama.cpp

Mirror of https://github.com/ggerganov/llama.cpp.git (last synced 2025-01-02 14:54:35 +00:00)

Author	SHA1	Message	Date
Jared Van Bortel	3e09e127eb	rename ggml-vulkan -> ggml-kompute	2023-12-13 17:49:45 -05:00
Jared Van Bortel	56430c3209	relicense Vulkan backend as MIT	2023-12-13 17:49:19 -05:00
Jared Van Bortel	9ae88baf38	Merge remote-tracking branch 'upstream/master' into nomic-vulkan-redo	2023-11-23 17:22:09 -05:00
Jared Van Bortel	a4bb9c5ced	vulkan : sync with "migrate to dynamic graphs"	2023-11-23 17:22:09 -05:00
Jared Van Bortel	23f6d51f68	Merge commit '4760e7cc0b68570d58f55e8dda469805d1759d0d' into nomic-vulkan	2023-11-23 17:22:09 -05:00
Jared Van Bortel	208cd52f7d	vulkan : implement YaRN RoPE scaling (#2268) The NeoX cur_rot part is different because I'm pretty sure my original implementation was wrong.	2023-11-23 17:22:09 -05:00
Jared Van Bortel	1829f1d7be	Merge commit '4760e7cc0b68570d58f55e8dda469805d1759d0d~' into nomic-vulkan	2023-11-23 17:22:08 -05:00
Jared Van Bortel	02c3309f6d	merge fixup (`e16b9fa4ba`)	2023-11-23 17:22:05 -05:00
Jared Van Bortel	9c4dfd06e8	mention skipped change	2023-11-23 17:22:05 -05:00
Jared Van Bortel	fe26e6adff	Merge commit 'e16b9fa4baa8a09c6619b116159830e898050942' into nomic-vulkan	2023-11-23 17:22:04 -05:00
Jared Van Bortel	6474fc879a	vulkan : handle ggml_scale for n%8 != 0 ref ggerganov/llama.cpp#3754	2023-11-23 17:22:00 -05:00
Jared Van Bortel	2a41ba7258	Merge commit '469c9addef75893e6be12edda852d12e840bf064' into nomic-vulkan	2023-11-23 17:22:00 -05:00
Jared Van Bortel	a934b2cb8a	vulkan : assert various kernel requirements	2023-11-23 17:22:00 -05:00
Jared Van Bortel	f194e1b6a6	Merge commit 'fcca0a700487999d52a525c96d6661e9f6a8703a' into nomic-vulkan	2023-11-23 17:21:59 -05:00
Jared Van Bortel	39abedd1d7	vulkan : optimize workgroup sizes	2023-11-23 17:18:48 -05:00
Jared Van Bortel	84f7fc4553	vulkan : rope n_past is now KQ_pos, f16 rope kernel	2023-11-23 17:18:42 -05:00
Jared Van Bortel	71565eb0c3	vulkan : replace ggml_diag_mask_inf with ggml_add (custom -inf mask)	2023-11-23 17:18:27 -05:00
Georgi Gerganov	6b0a7420d0	llama : KV cache view API + better KV cache management (#4170) * llama : keep track of used KV cells + better KV cache management * llama : zero KV cache used upon clear ggml-ci * llama : allow exporting a view of the KV cache (#4180) * Allow exporting a view of the KV cache * Allow dumping the sequences per cell in common * Track max contiguous cells value and position as well * Fix max contiguous empty cells index calculation Make dump functions deal with lengths or sequences counts > 10 better * Fix off by one error in dump_kv_cache_view * Add doc comments for KV cache view functions Eliminate cell sequence struct; use llama_seq_id directly Minor cleanups * common : add -dkvc arg for enabling kv cache dumps --------- Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>	2023-11-23 19:07:56 +02:00
Georgi Gerganov	d103d935c0	readme : update hot topics	2023-11-23 13:51:22 +02:00
Daniel Bevenius	9d5949f04b	examples : fix typo in parallel example doc comment (#4181) Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2023-11-23 13:34:20 +02:00
Georgi Gerganov	ff8238f71d	docs : add llama-star arch idea	2023-11-23 11:35:04 +02:00
Galunid	8e672efe63	stablelm : simplify + speedup generation (#4153)	2023-11-21 16:22:30 +01:00
Galunid	0b871f1a04	finetune - update readme to mention llama support only (#4148)	2023-11-20 19:30:00 +01:00
Aaryaman Vasishta	dfc7cd48b1	readme : update ROCm Windows instructions (#4122) * Update README.md * Update README.md Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> --------- Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>	2023-11-20 17:02:46 +02:00
Seb C	881800d1f0	main : Add ChatML functionality to main example (#4046) Co-authored-by: Sebastian Cramond <sebby37@users.noreply.github.com>	2023-11-20 14:56:59 +01:00
Galunid	f23c0359a3	ci : add flake8 to github actions (python linting) (#4129) Disabled rules: * E203 Whitespace before ':' - disabled because we often use 'C' Style where values are aligned * E211 Whitespace before '(' (E211) - disabled because we often use 'C' Style where values are aligned * E221 Multiple spaces before operator - disabled because we often use 'C' Style where values are aligned * E225 Missing whitespace around operator - disabled because it's broken so often it seems like a standard * E231 Missing whitespace after ',', ';', or ':' - disabled because we often use 'C' Style where values are aligned * E241 Multiple spaces after ',' - disabled because we often use 'C' Style where values are aligned * E251 Unexpected spaces around keyword / parameter equals - disabled because it's broken so often it seems like a standard * E261 At least two spaces before inline comment - disabled because it's broken so often it seems like a standard * E266 Too many leading '#' for block comment - sometimes used as "section" separator * E501 Line too long - disabled because it's broken so often it seems like a standard * E701 Multiple statements on one line (colon) - broken only in convert.py when defining abstract methods (we can use # noqa instead) * E704 Multiple statements on one line - broken only in convert.py when defining abstract methods (we can use # noqa instead)	2023-11-20 11:35:47 +01:00
Branden Butler	40a34fe8d0	speculative : fix prompt tokenization in speculative example (#4025) * Support special tokens and not adding BOS to prompt in speculative * Adapt to new should_add_bos function * Ensure tgt and dft have same add_bos setting	2023-11-20 11:50:04 +02:00
Georgi Gerganov	dae06c06e5	Revert "finetune : add --n-gpu-layers flag info to --help (#4128)" This reverts commit `05e8301e45`.	2023-11-19 19:16:07 +02:00
Clark Saben	05e8301e45	finetune : add --n-gpu-layers flag info to --help (#4128)	2023-11-19 18:56:38 +02:00
SoftwareRenderer	936c79b227	server : relay error messages (#4131)	2023-11-19 18:54:10 +02:00
kchro3	262005ad9d	common : comma should be semicolon (#4137)	2023-11-19 18:52:57 +02:00
Georgi Gerganov	35985acffa	gitignore : tokenize	2023-11-19 18:50:49 +02:00
slaren	e937066420	gguf-py : export chat templates (#4125) * gguf-py : export chat templates * llama.cpp : escape new lines in gguf kv info prints * gguf-py : bump version * gguf-py : check chat_template type * gguf-py : initialize chat_template	2023-11-19 11:10:52 +01:00
Kerfuffle	28a2e6e7d4	tokenize example: Respect normal add BOS token behavior (#4126) Allow building with Makefile	2023-11-18 14:48:17 -07:00
Galunid	0b5c3b0457	scripts : Remove missed baichuan convert script (#4127)	2023-11-18 21:08:33 +01:00
Kerfuffle	2923f17f6f	Clean up ggml-cuda.cu warnings when compiling with clang (for ROCM) (#4124) * ggml-cuda.cu: Clean up warnings when compiling with clang * ggml-cuda.cu: Move static items into anonymous namespace * ggml-cuda.cu: Fix use of namespace start macro * Revert "ggml-cuda.cu: Fix use of namespace start macro" This reverts commit `26c1149026`. * Revert "ggml-cuda.cu: Move static items into anonymous namespace" This reverts commit `e29757e0f7`.	2023-11-18 08:11:18 -07:00
slaren	bbecf3f415	llama : increase max nodes (#4115)	2023-11-17 21:39:11 +02:00
Roger Meier	8e9361089d	build : support ppc64le build for make and CMake (#3963) * build: support ppc64le build for make and CMake * build: keep __POWER9_VECTOR__ ifdef and extend with __powerpc64__ Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-17 18:11:23 +02:00
Georgi Gerganov	5ad387e994	tokenize : fix trailing whitespace	2023-11-17 18:01:38 +02:00
zakkor	2fa02b4b3d	examples : add tokenize (#4039)	2023-11-17 17:36:44 +02:00
Don Mahurin	2ab0707acb	convert : use 'model' value if it exists. This allows karpathy/tinyllamas to load (#4089) Co-authored-by: Don Mahurin <@>	2023-11-17 17:32:34 +02:00
John	11173c92d6	py : Falcon HF compatibility (#4104) Falcon HF compatibility	2023-11-17 17:24:30 +02:00
Jannis Schönleber	9e87ef60e1	common : improve yaml log escaping (#4080) * logging: improve escaping in yaml output * logging: include review feedback	2023-11-17 17:24:07 +02:00
Huawei Lin	c7cce1246e	llava : fix compilation warning that fread return value is not used (#4069)	2023-11-17 17:22:56 +02:00
Jiří Podivín	f7d5e97542	py : remove superfluous import statements (#4076) Signed-off-by: Jiri Podivin <jpodivin@gmail.com> Co-authored-by: Jiri Podivin <jpodivin@redhat.com>	2023-11-17 17:20:53 +02:00
Jiří Podivín	ba4cf5c0bf	train : move number of gpu layers argument parsing to common/train.cpp (#4074) - introduces help entry for the argument - cuts '--gpu-layers' form in order to simplify usage and documentation. Signed-off-by: Jiri Podivin <jpodivin@gmail.com> Co-authored-by: Jiri Podivin <jpodivin@redhat.com>	2023-11-17 17:19:16 +02:00
slaren	e85bb1a8e7	llama : add functions to get the model's metadata (#4013) * llama : add functions to get the model's metadata * format -> std::to_string * better documentation	2023-11-17 17:17:37 +02:00
gwjr	3e916a07ac	finetune : speed-up ggml_compute_forward_out_prod_f32 via BLAS (#4079) * Remove logically superfluous assertions and order by dimension * Use cblas_sgemm() to implement ggml_compute_forward_out_prod() * Remove ggml_compute_forward_out_prod_use_blas(), fix compiling errors on cmake/zig, remove trailing whitespace * Add openBLAS support for sgemm() in compute_forward_out_prod()	2023-11-17 16:48:19 +02:00
Andrew Godfrey	947f64f163	finetune : zero the loraB initial vectors (#4082) * finetune : zero the loraB initial vectors Without this, the first iteration is starting out far from the base model, instead of exactly on it. Zeroing loraB is what the paper recommends. loralib also zeroes at least one of the init vector pairs (though it departs from the paper in using a different distribution for the other vector, in some cases). * tabs to spaces * Use ggml_set_zero instead of adding a new function	2023-11-17 11:23:11 +01:00
Andrew Godfrey	b83e149ec6	cuda : get_row_rounding F32 (#4095) * Fix #4017 * Update ggml-cuda.cu Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update ggml-cuda.cu Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> --------- Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>	2023-11-17 10:01:15 +02:00


1631 Commits