llama.cpp

Mirror of https://github.com/ggerganov/llama.cpp.git (synced 2024-12-26 19:34:35 +00:00)

Author	SHA1	Message	Date
Xuan Son Nguyen	49122a873f	gemma2: add sliding window mask (#8227) * gemma2: add sliding window mask * fix data_swa uninitialized * better naming * add co-author Co-authored-by: Arlo Phoenix <arlo-phoenix@users.noreply.github.com> * replace list with single tensor * update * llama : minor styling * convert : add sanity check for query_pre_attn_scalar * fix small typo in README --------- Co-authored-by: Arlo Phoenix <arlo-phoenix@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-07-01 18:48:34 +02:00
Andrei	1c5eba6f8e	llama: Add attention and final logit soft-capping, update scaling factor to Gemma2 (#8197) * Add attention and final logit softcapping. * fix * Add custom add_ functions * Disable flash attention for Gemma2 * Update src/llama.cpp Co-authored-by: slaren <slarengh@gmail.com> * Add default value for attention and final logit softcap value * Add custom kq scaling from Gemma2Attention * Remove custom pre attention scaling and use computed value instead. --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-29 23:44:08 -04:00
Xuan Son Nguyen	26a39bbd6b	Add MiniCPM, Deepseek V2 chat template + clean up `llama_chat_apply_template_internal` (#8172) * tmp_contains * minicpm chat template * add DeepSeek Lite template * change deepseek-lite to deepseek2 * correct code comment * correct code from master branch	2024-06-28 15:11:44 +02:00
pculliton	e57dc62057	llama: Add support for Gemma2ForCausalLM (#8156) * Inference support for Gemma 2 model family * Update convert-hf-to-gguf.py, constants, and tensor mappings * cleanup * format fix * Fix special token vocab bug * Don't add space prefix * fix deleted lines * Update src/llama.cpp Co-authored-by: slaren <slarengh@gmail.com> * Add model type names * Add control vector * Fix model type identification --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com> Co-authored-by: slaren <slarengh@gmail.com>	2024-06-27 21:00:43 -07:00
Sigbjørn Skjæret	6030c61281	Add Qwen2MoE 57B-A14B model identifier (#8158) * Add Qwen2MoE 57B-A14B * Add Qwen2MoE 57B-A14B	2024-06-27 16:27:41 +02:00
kustaaya	f675b20a3b	Added support for Viking pre-tokenizer (#8135) Co-authored-by: kustaaya <kustaaya@protonmail.com>	2024-06-27 10:58:54 +02:00
Sigbjørn Skjæret	911e35bb8b	llama : fix CodeLlama FIM token checks (#8144) * account for space prefix character * use find instead	2024-06-27 10:46:41 +03:00
Georgi Gerganov	f3f65429c4	llama : reorganize source code + improve CMake (#8006) * scripts : update sync [no ci] * files : relocate [no ci] * ci : disable kompute build [no ci] * cmake : fixes [no ci] * server : fix mingw build ggml-ci * cmake : minor [no ci] * cmake : link math library [no ci] * cmake : build normal ggml library (not object library) [no ci] * cmake : fix kompute build ggml-ci * make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE ggml-ci * move public backend headers to the public include directory (#8122) * move public backend headers to the public include directory * nix test * spm : fix metal header --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * scripts : fix sync paths [no ci] * scripts : sync ggml-blas.h [no ci] --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-26 18:33:02 +03:00

58 Commits