llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-29 04:44:34 +00:00

Author	SHA1	Message	Date
Faisal Zaghloul	968967376d	Add `JAIS` model(s) (#8118 ) * Add `JAIS` model(s) * cleanup * address review comments * remove hack * un-hardcode max-alibi-bias * minor tweaks --------- Co-authored-by: fmz <quic_fzaghlou@quic.com>	2024-07-02 16:36:00 +02:00
Daniel Bevenius	023b8807e1	convert-hf : print output file name when completed (#8181 ) * convert-hf : print output file name when completed This commit adds the output file name to the log message when the conversion is completed. The motivation for this change is that when `--outfile` option is not specified it might not be obvious where the output file is written. With this change the output of running the script will be something like the following: ```console INFO:hf-to-gguf:Model successfully exported to models/gemma-2-9b-it.gguf. ``` Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> * squash! convert-hf : print output file name when completed Updates the output to support printing the directory if the output is split into multiple files. Also the output file name is now retrieved from the model_instance object. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> * squash! convert-hf : print output file name when completed Use parent attribute of Path object and string interpolation. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> * squash! convert-hf : print output file name when completed Use os.sep instead of hardcoding the path separator. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> --------- Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-07-02 09:40:49 +03:00
slaren	0e0590adab	cuda : update supports_op for matrix multiplication (#8245 )	2024-07-02 09:39:38 +03:00
luoyu-intel	a9f3b10215	[SYCL] Fix win build conflict of math library (#8230 ) * fix win build conflict of math library * fix the condition: !(win32 & SYCL) * revert warp_size=16	2024-07-02 12:50:07 +08:00
luoyu-intel	d08c20edde	[SYCL] Fix the sub group size of Intel (#8106 ) * use warp_size macro for all sycl kernels * fix mask of permute_sub_group_by_xor * fix rms_norm with correct warp number * fix rms_norm_f32/group_norm_f32 * move norm to norm.cpp file * fix quantize bug * fix mmvq's batch size	2024-07-02 10:16:00 +08:00
Xuan Son Nguyen	5fac350b9c	Fix gemma2 tokenizer convert (#8244 ) * fix gemma2 tokenizer convert * remove scores * improve code, fix new line issue	2024-07-02 01:07:23 +02:00
Johannes Gäßler	cb5fad4c6c	CUDA: refactor and optimize IQ MMVQ (#8215 ) * CUDA: refactor and optimize IQ MMVQ * uint -> uint32_t * __dp4a -> ggml_cuda_dp4a * remove MIN_CC_DP4A checks * change default * try CI fix	2024-07-01 20:39:06 +02:00
Mateusz Charytoniuk	dae57a1ebc	readme: add Paddler to the list of projects (#8239 )	2024-07-01 20:13:22 +03:00
Xuan Son Nguyen	49122a873f	gemma2: add sliding window mask (#8227 ) * gemma2: add sliding window mask * fix data_swa uninitialized * better naming * add co-author Co-authored-by: Arlo Phoenix <arlo-phoenix@users.noreply.github.com> * replace list with single tensor * update * llama : minor styling * convert : add sanity check for query_pre_attn_scalar * fix small typo in README --------- Co-authored-by: Arlo Phoenix <arlo-phoenix@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-07-01 18:48:34 +02:00
Roni	0ddeff1023	readme : update tool list (#8209 ) * Added gppm to Tool list in README * Update README.md --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-07-01 15:48:16 +03:00
Michael Francis	3840b6f593	nix : enable curl (#8043 ) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-07-01 14:47:04 +03:00
Georgi Gerganov	257f8e41e2	nix : remove OpenCL remnants (#8235 ) * nix : remove OpenCL remnants * minor : remove parentheses	2024-07-01 14:46:18 +03:00
iacore	694c59cb42	Document BERT support. (#8205 ) * Update README.md document BERT support * Update README.md	2024-07-01 13:40:58 +02:00
zhentaoyu	197fe6c1d7	[SYCL] Update SYCL-Rope op and Refactor (#8157 ) * align with rope.cu and move sycl-op to a single file	2024-07-01 19:39:06 +08:00
Georgi Gerganov	d0a7145ba9	flake.lock: Update (#8218 )	2024-06-30 16:09:34 -07:00
Xuan Son Nguyen	9ef0780062	Fix new line issue with chat template, disable template when in-prefix/suffix is set (#8203 ) * preserve new line llama_chat_format_single * disable chat template if in-prefix/suffix is set * remove redundant change	2024-06-30 20:27:13 +02:00
Andrei	1c5eba6f8e	llama: Add attention and final logit soft-capping, update scaling factor to Gemma2 (#8197 ) * Add attention and final logit softcapping. * fix * Add custom add_ functions * Disable flash attention for Gemma2 * Update src/llama.cpp Co-authored-by: slaren <slarengh@gmail.com> * Add default value for attention and final logit softcap value * Add custom kq scaling from Gemma2Attention * Remove custom pre attention scaling and use computed value instead. --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-29 23:44:08 -04:00
Francis Couture-Harpin	8fbd59308b	ggml-quants : attempt to fix Arm 32-bit support	2024-06-28 22:52:57 -04:00
Francis Couture-Harpin	ec50944bf6	ggml-quants : fix build failure on Windows	2024-06-28 20:41:13 -04:00
Francis Couture-Harpin	bfd2f21fb4	bitnet : replace 1.58b with b1.58, as in the paper	2024-06-28 20:38:12 -04:00
Xuan Son Nguyen	72272b83a3	fix code typo in llama-cli (#8198 )	2024-06-29 00:14:20 +02:00
Olivier Chafik	8748d8ac6f	json: attempt to skip slow tests when running under emulator (#8189 )	2024-06-28 18:02:05 +01:00
Xuan Son Nguyen	26a39bbd6b	Add MiniCPM, Deepseek V2 chat template + clean up `llama_chat_apply_template_internal` (#8172 ) * tmp_contains * minicpm chat template * add DeepSeek Lite template * change deepseek-lite to deepseek2 * correct code comment * correct code from master branch	2024-06-28 15:11:44 +02:00
Sigbjørn Skjæret	38373cfbab	Add SPM infill support (#8016 ) * add --spm-infill option * support --spm-infill * support --spm-infill	2024-06-28 12:53:43 +02:00
slaren	b851b3fba0	cmake : allow user to override default options (#8178 )	2024-06-28 12:37:45 +02:00
Olivier Chafik	139cc621e9	`json`: restore default additionalProperties to false, fix some pattern escapes (#8180 ) * json: expand ESCAPED_IN_REGEXPS_BUT_NOT_IN_LITERALS charset * json: revert default of additionalProperties to false * Update README.md	2024-06-28 09:26:45 +01:00
pculliton	e57dc62057	llama: Add support for Gemma2ForCausalLM (#8156 ) * Inference support for Gemma 2 model family * Update convert-hf-to-gguf.py, constants, and tensor mappings * cleanup * format fix * Fix special token vocab bug * Don't add space prefix * fix deleted lines * Update src/llama.cpp Co-authored-by: slaren <slarengh@gmail.com> * Add model type names * Add control vector * Fix model type identification --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com> Co-authored-by: slaren <slarengh@gmail.com>	2024-06-27 21:00:43 -07:00
Xuan Son Nguyen	a27aa50ab7	Add missing items in makefile (#8177 )	2024-06-28 02:19:11 +02:00
Olivier Chafik	cb0b06a8a6	`json`: update grammars/README w/ examples & note about additionalProperties (#8132 ) * json: update grammars/README * mention broken prefixItems * add mention to llama-gbnf-validator * json: explicit type: object for nested items object in cli example	2024-06-27 22:08:42 +01:00
loonerin	558f44bf83	CI: fix release build (Ubuntu+Mac) (#8170 ) * CI: fix release build (Ubuntu) PR #8006 changes defaults to build shared libs. However, CI for releases expects static builds. * CI: fix release build (Mac) --------- Co-authored-by: loonerin <loonerin@users.noreply.github.com>	2024-06-27 21:01:23 +02:00
slaren	8172ee9da9	cmake : fix deprecated option names not working (#8171 ) * cmake : fix deprecated option names not working * remove LlAMA_OPENMP	2024-06-27 20:04:39 +02:00
Xuan Son Nguyen	16791b8f0b	Add chatml fallback for cpp `llama_chat_apply_template` (#8160 ) * add chatml fallback for cpp `llama_chat_apply_template` * remove redundant code	2024-06-27 18:14:19 +02:00
Georgi Gerganov	ab3679112d	flake.lock: Update (#8071 ) Flake lock file updates: • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/e9ee548d90ff586a6471b4ae80ae9cfcbceb3420?narHash=sha256-4Zu0RYRcAY/VWuu6awwq4opuiD//ahpc2aFHg2CWqFY%3D' (2024-06-13) → 'github:NixOS/nixpkgs/d603719ec6e294f034936c0d0dc06f689d91b6c3?narHash=sha256-k3JqJrkdoYwE3fHE6xGDY676AYmyh4U2Zw%2B0Bwe5DLU%3D' (2024-06-20) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Philip Taron <philip.taron@gmail.com>	2024-06-27 08:37:29 -07:00
jukofyork	97877eb10b	Control vector loading fixes (#8137 ) * Fixed leak in llama_control_vector_load_one() and allow llama_control_vector_load() to grow * refactored `llama_control_vector_load_one()` * allow multiple directions for same layer in same file * llama_control_vector_load_one() and llama_control_vector_load() now break on error * removed unnecessary ggml_free() call	2024-06-27 16:48:07 +02:00
Raj Hammeer Singh Hada	387952651a	Delete examples/llama.android/llama/CMakeLists.txt (#8165 ) * Delete examples/llama.android/llama/CMakeLists.txt https://github.com/ggerganov/llama.cpp/pull/8145#issuecomment-2194534244 This file is not being used for building on Android. `llama.cpp/examples/llama.android/llama/src/main/cpp/CMakeLists.txt` is being used instead. * Update CMakeLists.txt Pick local llama.cpp files instead of fetching content from git	2024-06-27 16:39:29 +02:00
Sigbjørn Skjæret	6030c61281	Add Qwen2MoE 57B-A14B model identifier (#8158 ) * Add Qwen2MoE 57B-A14B * Add Qwen2MoE 57B-A14B	2024-06-27 16:27:41 +02:00
Johannes Gäßler	85a267daaa	CUDA: fix MMQ stream-k for --split-mode row (#8167 )	2024-06-27 16:26:05 +02:00
kustaaya	f675b20a3b	Added support for Viking pre-tokenizer (#8135 ) Co-authored-by: kustaaya <kustaaya@protonmail.com>	2024-06-27 10:58:54 +02:00
Sigbjørn Skjæret	911e35bb8b	llama : fix CodeLlama FIM token checks (#8144 ) * account for space prefix character * use find instead	2024-06-27 10:46:41 +03:00
Francis Couture-Harpin	0996149911	convert-hf : allow converting the weird BitNet 1.3B Its FFN size is 5460 which is not convenient. The offending tensors are kept in F16, which makes the final model 5.01 bpw.	2024-06-27 02:06:28 -04:00
Francis Couture-Harpin	961e293833	convert-hf : simplify BitNet pre-quantization This still results in the exact same tensor weights and scales, but it reveals some weirdness in the current algorithm.	2024-06-27 02:06:28 -04:00
Francis Couture-Harpin	89dc3b254c	ggml-quants : use ceiling division when quantizing q1_3	2024-06-27 02:06:28 -04:00
Francis Couture-Harpin	9465ec6e12	ggml-quants : ARM NEON vec_dot for q2_2 and q1_3	2024-06-27 02:06:28 -04:00
Francis Couture-Harpin	638ad52f87	ggml-quants : cleanup Q1_3 code formatting	2024-06-27 02:06:28 -04:00
Francis Couture-Harpin	ef1e345c85	ggml-quants : Q2_2 now faster than Q4_K with AVX2	2024-06-27 02:06:28 -04:00
Francis Couture-Harpin	48b73b8498	ggml-quants : subtract 1 when back in epi8 This makes the 1.625 bpw type go faster than q4_0. Still not the fastest.	2024-06-27 02:06:28 -04:00
Francis Couture-Harpin	7ef4254a92	ggml-quants : faster 1.625 bpw AVX2 vec_dot Not using a lookup table anymore makes it match q4_0 speed. * gguf-py : fix formatting * llama : remove spaces on empty line	2024-06-27 02:06:28 -04:00
Francis Couture-Harpin	bd807499f7	ggml-quants : 1.625 bpw ternary packing for BitNet 1.58b	2024-06-27 02:06:22 -04:00
Raj Hammeer Singh Hada	ac146628e4	Fix llama-android.cpp for error - "common/common.h not found" (#8145 ) - Path seems to be wrong for the common.h header file in llama-android.cpp file. Fixing the path so the Android Build doesn't fail with the error "There is no file common/common.h"	2024-06-27 03:57:57 +02:00
Daniel Bevenius	9b31a40c6d	clip : suppress unused variable warnings (#8105 ) * clip : suppress unused variable warnings This commit suppresses unused variable warnings for the variables e in the catch blocks. The motivation for this change is to suppress the warnings that are generated on Windows when using the MSVC compiler. The warnings are not displayed when using GCC because GCC will mark all catch parameters as used. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> * squash! clip : suppress unused variable warnings Remove e (/e/) instead of using GGML_UNUSED. --------- Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-06-27 01:50:09 +02:00

3594 Commits