Commit Graph

4409 Commits

Author SHA1 Message Date
Georgi Gerganov
e160b0608d
Merge 1e7e3384e1 into 09fe2e7613 2024-12-24 12:10:45 -05:00
NeverLucky
09fe2e7613
server: allow filtering llama server response fields (#10940)
* llama_server_response_fields

* llama_server_response_fields_fix_issues

* params fixes

* fix

* clarify docs

* change to "response_fields"

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-12-24 17:39:49 +01:00
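The `response_fields` option lets a client ask the server to return only selected fields of a response. Below is a minimal Python sketch of that idea. It assumes a flat JSON object and a list of top-level key names. The helper `filter_response_fields` is hypothetical and is not the server's code.

```python
from typing import Any, Optional


def filter_response_fields(response: dict, fields: Optional[list[str]]) -> dict:
    """Return only the requested top-level keys of a response.

    Hypothetical helper: an empty or missing field list keeps the full
    response, and requested keys that are absent are silently skipped.
    """
    if not fields:
        return response
    return {key: response[key] for key in fields if key in response}


full: dict[str, Any] = {"content": "Hello", "tokens_predicted": 5, "stop": True}
print(filter_response_fields(full, ["content"]))  # {'content': 'Hello'}
```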
Georgi Gerganov
30caac3a68
llama : the WPM vocabs use the CLS token as BOS (#10930)
* llama : the WPM vocabs use the CLS token as BOS

ggml-ci

* llama : add comment
2024-12-24 09:44:20 +02:00
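WordPiece (WPM) vocabularies such as BERT's usually have no dedicated BOS token, and their sequences start with `[CLS]`. The sketch below reuses the CLS id wherever a BOS id is expected. Several details are illustrative assumptions: the function names, the vocab-type strings, and the token ids (101 for `[CLS]` and 102 for `[SEP]`, as in common BERT vocabularies). Appending `[SEP]` follows the usual BERT convention and is not stated in the commit.

```python
from typing import Optional

# Illustrative special-token table for a BERT-style WPM vocab (no BOS entry).
SPECIAL = {"cls": 101, "sep": 102, "bos": None}


def bos_token_id(vocab_type: str, special: dict) -> Optional[int]:
    """Id to prepend as BOS; WPM vocabs reuse the CLS token."""
    if vocab_type == "wpm":
        return special["cls"]
    return special["bos"]


def wrap_sequence(ids: list[int], vocab_type: str, special: dict) -> list[int]:
    """Prepend BOS; for WPM also append SEP, following the BERT convention."""
    out = [bos_token_id(vocab_type, special)] + list(ids)
    if vocab_type == "wpm":
        out.append(special["sep"])
    return out


print(wrap_sequence([7592, 2088], "wpm", SPECIAL))  # [101, 7592, 2088, 102]
```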
Georgi Gerganov
1e7e3384e1
minor 2024-12-24 09:42:53 +02:00
Diego Devesa
60cfa728e2
ggml : use wstring for backend search paths (#10960)
ggml-ci
2024-12-24 04:05:27 +01:00
Diego Devesa
3327bb0f8d
ggml : fix arm enabled features check (#10961) 2024-12-24 04:05:17 +01:00
Diego Devesa
32d6ee6385
ggml : fix const usage in SSE path (#10962) 2024-12-23 20:25:52 +01:00
Georgi Gerganov
bb0b2c4f56
llama : context
ggml-ci
2024-12-23 21:05:54 +02:00
Georgi Gerganov
0ccae21e6b
cont
ggml-ci
2024-12-23 19:22:24 +02:00
Georgi Gerganov
7035c79fb5
llama : batch
ggml-ci
2024-12-23 18:43:42 +02:00
Georgi Gerganov
a7df0714db
llama : impl
ggml-ci
2024-12-23 17:42:12 +02:00
Georgi Gerganov
b0d6b66b7d
llama : kv cache
ggml-ci
2024-12-23 15:43:16 +02:00
Xuan Son Nguyen
14b699ecde
server : fix missing model id in /model endpoint (#10957)
* server : fix missing model id in /model endpoint

* fix ci
2024-12-23 12:52:25 +01:00
Georgi Gerganov
6eaea63e36
minor 2024-12-23 13:28:56 +02:00
Xuan Son Nguyen
485dc01214
server : add system_fingerprint to chat/completion (#10917)
* server : add system_fingerprint to chat/completion

* update README
2024-12-23 12:02:44 +01:00
Georgi Gerganov
de014bc339
rebase
ggml-ci
2024-12-23 11:52:36 +02:00
Georgi Gerganov
e42839382e
examples : fix
ggml-ci
2024-12-23 11:46:51 +02:00
Georgi Gerganov
963fb4d26f
llama : adapter
ggml-ci
2024-12-23 11:46:51 +02:00
Georgi Gerganov
0969970a48
llama : hparams
ggml-ci
2024-12-23 11:46:51 +02:00
Georgi Gerganov
ac62ce0236
llama : model
ggml-ci
2024-12-23 11:46:51 +02:00
Georgi Gerganov
29fd7b56d0
llama : chat
ggml-ci
2024-12-23 11:46:49 +02:00
Georgi Gerganov
c8669a0e55
llama : arch (cont)
ggml-ci
2024-12-23 11:46:39 +02:00
Georgi Gerganov
52063f737d
ci : remove BUILD_SHARED_LIBS=OFF
ggml-ci
2024-12-23 11:46:39 +02:00
Georgi Gerganov
7eb858aab4
llama : mmap
ggml-ci
2024-12-23 11:46:39 +02:00
Georgi Gerganov
4c5b321042
llama : arch 2024-12-23 11:46:39 +02:00
Georgi Gerganov
7b5b594526
llama : control-vector -> adapter 2024-12-23 11:46:38 +02:00
Georgi Gerganov
f9b0e3b382
llama : scatter llama.cpp into multiple modules (wip) 2024-12-23 11:46:37 +02:00
Radoslav Gerganov
86bf31cfe6
rpc-server : add support for the SYCL backend (#10934) 2024-12-23 10:39:30 +02:00
Yun Dou
b92a14a841
llama : support InfiniAI Megrez 3b (#10893)
* Support InfiniAI Megrez 3b

* Fix tokenizer_clean_spaces for megrez
2024-12-23 01:35:44 +01:00
ymcki
6f0c9e034b
llama : support for Llama-3_1-Nemotron-51B (#10669)
* conflict resolution

* move comments after bracket to its own line
2024-12-23 01:22:33 +01:00
Eric Curtin
dab76c92cc
llama-run : include temperature option (#10899)
This commit updates the `examples/run/README.md` file to include a new
option for setting the temperature and updates the `run.cpp` file to
parse this option.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2024-12-23 01:21:40 +01:00
yuri@FreeBSD
7024d59e6a
ggml : fix run-time on FreeBSD in get_executable_path() (#10948) 2024-12-23 01:20:11 +01:00
Rudi Servo
7c0e285858
devops : add docker-multi-stage builds (#10832) 2024-12-22 23:22:58 +01:00
Billel Mokeddem
7ae33a616f
llama : add Falcon3 support (#10883)
* Add Falcon3 model support

* Add fix for adding bos to added special tokens

* Add comment explaining the logic behind the if statement

* Add a log message to better track when the following line of code is triggered

* Update log to only print when input and output characters are different

* Fix handling pre-normalized tokens

* Refactoring
2024-12-23 00:09:58 +02:00
Jeff Bolz
ebdee9478c
vulkan: build fixes for 32b (#10927)
* vulkan: build fixes for 32b

Should fix #10923

* vulkan: initialize some buffer/offset variables
2024-12-22 10:44:01 +01:00
Georgi Gerganov
5cd85b5e00
convert : add BertForMaskedLM (#10919)
2024-12-21 10:10:18 +02:00
Jeff Bolz
a91a41364b
vulkan: optimize coopmat2 dequant functions (#10855)
Change the code to do 16b loads when possible and extract the appropriate
component late, so the code is effectively decoding a pair of elements and
then selecting one. This can allow more commoning to happen in the compiler
when neighboring elements are loaded.
2024-12-21 08:04:45 +01:00
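The coopmat2 change loads 16 bits at a time and picks the wanted component late, so two neighboring lookups share one load. The Python sketch below mimics that pattern on a hypothetical block of signed 8-bit quants with a single scale. The layout and the names are assumptions; the real shaders are GLSL and operate on the ggml quant formats.

```python
import struct


def load_u16(buf: bytes, byte_offset: int) -> int:
    """Little-endian 16-bit load, standing in for a 16b load in the shader."""
    return struct.unpack_from("<H", buf, byte_offset)[0]


def dequant_element(qs: bytes, scale: float, idx: int) -> float:
    """Dequantize element idx of a block of signed 8-bit quants.

    Hypothetical layout: quants packed back to back, one scale per block,
    even number of bytes. Both neighbors are decoded, then one is selected.
    """
    word = load_u16(qs, idx & ~1)                 # one load covers elements idx&~1 and idx|1
    lo = (word & 0xFF) - 256 * ((word >> 7) & 1)  # sign-extend the low byte
    hi = (word >> 8) - 256 * ((word >> 15) & 1)   # sign-extend the high byte
    pair = (lo * scale, hi * scale)               # decode the pair
    return pair[idx & 1]                          # select the wanted element late


qs = struct.pack("<4b", 1, -2, 3, -128)
print([dequant_element(qs, 0.5, i) for i in range(4)])  # [0.5, -1.0, 1.5, -64.0]
```

`dequant_element(qs, s, 2*k)` and `dequant_element(qs, s, 2*k + 1)` perform the identical load and decode. A compiler can therefore reuse that work when both are needed, which is the kind of commoning the commit message describes.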
Adrien Gallouët
e34c5af43f
ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0() (#10874)
* ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0()

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* ggml-cpu: format code

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2024-12-21 00:33:37 +01:00
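A Q4_0 block holds 32 weights as 4-bit values with one scale. A Q8_0 block holds 32 signed 8-bit values with one scale. Going by its name, `ggml_gemv_q4_0_4x8_q8_0()` works on a repacked layout that interleaves several rows. The scalar sketch below shows only the per-block arithmetic and ignores that interleaving. Scales are plain floats here (fp16 in ggml), and the function name is hypothetical. A reference like this can serve as an oracle when checking a NEON intrinsics port.

```python
QK = 32  # elements per block for both Q4_0 and Q8_0


def dot_q4_0_q8_0(dx: float, qx: bytes, dy: float, qy: list[int]) -> float:
    """Scalar dot product of one Q4_0 block with one Q8_0 block.

    qx: 16 bytes; low nibbles hold elements 0..15, high nibbles 16..31,
        each stored with an offset of 8.
    qy: 32 signed 8-bit values.
    """
    acc = 0
    for j in range(QK // 2):
        x0 = (qx[j] & 0x0F) - 8  # element j
        x1 = (qx[j] >> 4) - 8    # element j + 16
        acc += x0 * qy[j] + x1 * qy[j + QK // 2]
    return dx * dy * acc  # apply both block scales once


print(dot_q4_0_q8_0(0.5, bytes([0x98] * 16), 2.0, [1] * 32))  # 16.0
```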
Akarshan Biswas
eb5c3dc64b
SYCL: Migrate away from deprecated ggml_tensor->backend (#10840)
* Migrate to tensor->buffer for checking backend buffer type: 1

* SYCL: common.cpp try to migrate away from tensor->backend

* SYCL: fix assertions and add proper comments

* SYCL: remove extra space

* SYCL: Add back static to ggml_backend_buffer_is_sycl_split function

* SYCL: Add pragma directive to suppress warning spam

* SYCL: Integrate debug logs with GGML_LOG and other fixes

* Revert "SYCL: Integrate debug logs with GGML_LOG and other fixes"

This reverts commit 2607b7de0f.
Let's keep the current SYCL specific logging mechanism for now

* SYCL: Use GGML_SYCL_DEBUG after reverting

* SYCL: reg_get_proc_address func, update to the current func signature

* SYCL: Refactor SYCL buffer checks in ggml_sycl_cpy_tensor_2d
2024-12-20 23:31:28 +08:00
Xuan Son Nguyen
0ca416c91a
server : (UI) fix copy to clipboard function (#10916) 2024-12-20 14:12:06 +01:00
Diego Devesa
21ae3b9be8
ggml : add test for SVE and disable when it fails (#10906) 2024-12-20 13:31:28 +01:00
Molly Sophia
0a11f8b7b5
convert : fix RWKV v6 model conversion (#10913)
* Enable --no-context-shift for llama-perplexity example

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV 6: Fix error in ggml_cuda_op_bin_bcast

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

---------

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-12-20 11:44:58 +02:00
Georgi Gerganov
d408bb9268
clip : disable GPU support (#10896)
ggml-ci
2024-12-19 18:47:15 +02:00
Georgi Gerganov
5cab3e4aaa
llama : minor grammar refactor (#10897)
ggml-ci
2024-12-19 17:42:13 +02:00
Georgi Gerganov
36319dec5d
tts : small QoL for easy model fetch (#10903) 2024-12-19 17:35:15 +02:00
Xuan Son Nguyen
57bb2c40cd
server : fix logprobs, make it OAI-compatible (#10783)
* server : fix logprobs, make it openai-compatible

* update docs

* add std::log

* return pre-sampling p

* sort before apply softmax

* add comment

* fix test

* set p for sampled token

* update docs

* add --multi-token-probs

* update docs

* add `post_sampling_probs` option

* update docs [no ci]

* remove --multi-token-probs

* "top_probs" with "post_sampling_probs"

* resolve review comments

* rename struct token_prob to prob_info

* correct comment placement

* fix setting prob for sampled token
2024-12-19 15:40:08 +01:00
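The logprobs rework returns OpenAI-style token probabilities. Its bullet list mentions sorting before the softmax and returning pre-sampling probabilities. The following is a minimal sketch of computing top-n log-probabilities from raw logits. It works on token ids only, and the function name and output dict keys are illustrative, not the server's JSON schema.

```python
import math


def top_logprobs(logits: list[float], n: int) -> list[dict]:
    """Return the n most likely token ids with their log-probabilities.

    The softmax runs over the full, unfiltered distribution (pre-sampling).
    Subtracting the max logit keeps exp() from overflowing.
    """
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    max_logit = logits[order[0]]
    log_denom = math.log(sum(math.exp(x - max_logit) for x in logits))
    return [{"id": i, "logprob": (logits[i] - max_logit) - log_denom} for i in order[:n]]


for entry in top_logprobs([1.0, 3.0, 2.0], 2):
    print(entry["id"], round(math.exp(entry["logprob"]), 3))
```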
Adrien Gallouët
a3c33b1dce
ggml: fix arm build with gcc (#10895)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2024-12-19 14:20:41 +01:00
Sukriti Sharma
2fffc52b50
llama : fix Roberta embeddings (#10856)
* fix: Use gpt2 tokenizer for roberta and add eos/bos tokens

Branch: RobertaTokenizer

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fixes to position embeddings

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* map roberta-bpe to gpt-2

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* fix linting

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

---------

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Gabe Goodhart <ghart@us.ibm.com>
2024-12-19 15:04:51 +02:00
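In Hugging Face Transformers, RoBERTa numbers positions starting at `padding_idx + 1`, and padding tokens map to `padding_idx`. The sketch below assumes the "fixes to position embeddings" item concerns this offset. It reproduces the position ids and shows how dropping the first `padding_idx + 1` rows of the position-embedding table lets a runtime that counts positions from 0 index the same rows. The function names are hypothetical.

```python
def roberta_position_ids(input_ids: list[int], padding_idx: int) -> list[int]:
    """Position ids as RoBERTa computes them: real tokens count up from
    padding_idx + 1, padding tokens get padding_idx."""
    positions, count = [], 0
    for tok in input_ids:
        if tok == padding_idx:
            positions.append(padding_idx)
        else:
            count += 1
            positions.append(padding_idx + count)
    return positions


def trim_position_table(rows: list, padding_idx: int) -> list:
    """Drop the rows a 0-based runtime would never index."""
    return rows[padding_idx + 1:]


ids = [0, 31414, 232, 2]                # illustrative token ids, no padding
pos = roberta_position_ids(ids, padding_idx=1)
print(pos)                              # [2, 3, 4, 5]
table = [f"row{r}" for r in range(8)]   # stand-in for the embedding matrix
trimmed = trim_position_table(table, padding_idx=1)
print(trimmed[0] == table[pos[0]])      # True
```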
fairydreaming
7585edbdeb
convert : Add support for Microsoft Phi-4 model (#10817)
* convert : use GPT2 vocab for Phi-4 model

* convert : use null value of sliding_window to distinguish Phi-4 from other PHI3-based models

* llama : do not use sliding window attention mask for Phi-4 model

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2024-12-19 10:37:12 +01:00
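According to the Phi-4 commit, the converter treats a null `sliding_window` in the model config as the marker for Phi-4, and llama then skips the sliding-window attention mask. The sketch below shows that decision and the two mask shapes. The helper names are hypothetical, and the window test `i - j < window` is one common convention, assumed here.

```python
from typing import Optional


def sliding_window_of(hparams: dict) -> Optional[int]:
    """Window size to use, or None for full causal attention (Phi-4 case)."""
    return hparams.get("sliding_window")


def causal_mask(n: int, window: Optional[int] = None) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j."""
    return [[j <= i and (window is None or i - j < window) for j in range(n)]
            for i in range(n)]


phi4_like = {"sliding_window": None}   # illustrative config fragments
phi3_like = {"sliding_window": 2}
print(causal_mask(3, sliding_window_of(phi4_like))[2])  # [True, True, True]
print(causal_mask(3, sliding_window_of(phi3_like))[2])  # [False, True, True]
```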
Johannes Gäßler
cd920d0ac3
tests: disable GGUF test for bad value size (#10886) 2024-12-19 08:53:58 +01:00