Georgi Gerganov
e160b0608d
Merge 1e7e3384e1 into 09fe2e7613
2024-12-24 12:10:45 -05:00
NeverLucky
09fe2e7613
server: allow filtering llama server response fields ( #10940 )
...
* llama_server_response_fields
* llama_server_response_fields_fix_issues
* params fixes
* fix
* clarify docs
* change to "response_fields"
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-12-24 17:39:49 +01:00
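The response_fields change above lets a client ask llama-server to return only selected fields from /completion. A minimal request sketch, assuming a local server on the default port; the prompt and field list are placeholders:

```python
import requests  # third-party: pip install requests

# Ask the server to strip the response down to the listed fields.
# Nested fields are addressed with a "/" separator, per the server docs.
resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "Write a haiku about winter.",
        "n_predict": 32,
        "response_fields": ["content", "generation_settings/n_predict"],
    },
)
print(resp.json())  # contains only the requested fields
```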
Georgi Gerganov
30caac3a68
llama : the WPM vocabs use the CLS token as BOS ( #10930 )
...
* llama : the WPM vocabs use the CLS token as BOS
ggml-ci
* llama : add comment
2024-12-24 09:44:20 +02:00
Georgi Gerganov
1e7e3384e1
minor
2024-12-24 09:42:53 +02:00
Diego Devesa
60cfa728e2
ggml : use wstring for backend search paths ( #10960 )
...
ggml-ci
2024-12-24 04:05:27 +01:00
Diego Devesa
3327bb0f8d
ggml : fix arm enabled features check ( #10961 )
2024-12-24 04:05:17 +01:00
Diego Devesa
32d6ee6385
ggml : fix const usage in SSE path ( #10962 )
2024-12-23 20:25:52 +01:00
Georgi Gerganov
bb0b2c4f56
llama : context
...
ggml-ci
2024-12-23 21:05:54 +02:00
Georgi Gerganov
0ccae21e6b
cont
...
ggml-ci
2024-12-23 19:22:24 +02:00
Georgi Gerganov
7035c79fb5
llama : batch
...
ggml-ci
2024-12-23 18:43:42 +02:00
Georgi Gerganov
a7df0714db
llama : impl
...
ggml-ci
2024-12-23 17:42:12 +02:00
Georgi Gerganov
b0d6b66b7d
llama : kv cache
...
ggml-ci
2024-12-23 15:43:16 +02:00
Xuan Son Nguyen
14b699ecde
server : fix missing model id in /model endpoint ( #10957 )
...
* server : fix missing model id in /model endpoint
* fix ci
2024-12-23 12:52:25 +01:00
Georgi Gerganov
6eaea63e36
minor
2024-12-23 13:28:56 +02:00
Xuan Son Nguyen
485dc01214
server : add system_fingerprint to chat/completion ( #10917 )
...
* server : add system_fingerprint to chat/completion
* update README
2024-12-23 12:02:44 +01:00
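The system_fingerprint field mirrors the OpenAI API and ties a response to the build that produced it. A quick way to inspect it, assuming a local llama-server and its OpenAI-compatible endpoint (the model name below is an arbitrary placeholder):

```python
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "local",  # placeholder; the server uses whatever it loaded
        "messages": [{"role": "user", "content": "Hello"}],
    },
).json()

# Added in #10917: identifies the server build behind this response.
print(resp.get("system_fingerprint"))
```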
Georgi Gerganov
de014bc339
rebase
...
ggml-ci
2024-12-23 11:52:36 +02:00
Georgi Gerganov
e42839382e
examples : fix
...
ggml-ci
2024-12-23 11:46:51 +02:00
Georgi Gerganov
963fb4d26f
llama : adapter
...
ggml-ci
2024-12-23 11:46:51 +02:00
Georgi Gerganov
0969970a48
llama : hparams
...
ggml-ci
2024-12-23 11:46:51 +02:00
Georgi Gerganov
ac62ce0236
llama : model
...
ggml-ci
2024-12-23 11:46:51 +02:00
Georgi Gerganov
29fd7b56d0
llama : chat
...
ggml-ci
2024-12-23 11:46:49 +02:00
Georgi Gerganov
c8669a0e55
llama : arch (cont)
...
ggml-ci
2024-12-23 11:46:39 +02:00
Georgi Gerganov
52063f737d
ci : remove BUILD_SHARED_LIBS=OFF
...
ggml-ci
2024-12-23 11:46:39 +02:00
Georgi Gerganov
7eb858aab4
llama : mmap
...
ggml-ci
2024-12-23 11:46:39 +02:00
Georgi Gerganov
4c5b321042
llama : arch
2024-12-23 11:46:39 +02:00
Georgi Gerganov
7b5b594526
llama : control-vector -> adapter
2024-12-23 11:46:38 +02:00
Georgi Gerganov
f9b0e3b382
llama : scatter llama.cpp into multiple modules (wip)
2024-12-23 11:46:37 +02:00
Radoslav Gerganov
86bf31cfe6
rpc-server : add support for the SYCL backend ( #10934 )
2024-12-23 10:39:30 +02:00
Yun Dou
b92a14a841
llama : support InfiniAI Megrez 3b ( #10893 )
...
* Support InfiniAI Megrez 3b
* Fix tokenizer_clean_spaces for megrez
2024-12-23 01:35:44 +01:00
ymcki
6f0c9e034b
llama : support for Llama-3_1-Nemotron-51B ( #10669 )
...
* conflict resolution
* move comments after brackets to their own lines
2024-12-23 01:22:33 +01:00
Eric Curtin
dab76c92cc
llama-run : include temperature option ( #10899 )
...
This commit updates the `examples/run/README.md` file to include a new
option for setting the temperature and updates the `run.cpp` file to
parse this option.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2024-12-23 01:21:40 +01:00
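A usage sketch for the new temperature option. The `--temp` flag name is an assumption based on the convention other llama.cpp tools follow; check `llama-run --help` on your build before relying on it:

```python
import subprocess

# Launch llama-run with a sampling temperature; the model path and prompt
# are placeholders, and --temp is an assumed flag name (see lead-in).
subprocess.run(
    ["llama-run", "--temp", "0.8", "model.gguf", "Tell me a joke."],
    check=True,
)
```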
yuri@FreeBSD
7024d59e6a
ggml : fix run-time on FreeBSD in get_executable_path() ( #10948 )
2024-12-23 01:20:11 +01:00
Rudi Servo
7c0e285858
devops : add docker-multi-stage builds ( #10832 )
2024-12-22 23:22:58 +01:00
Billel Mokeddem
7ae33a616f
llama : add Falcon3 support ( #10883 )
...
* Add Falcon3 model support
* Add fix for adding bos to added special tokens
* Add comment explaining the logic behind the if statement
* Add a log message to better track when the following line of code is triggered
* Update log to only print when input and output characters are different
* Fix handling pre-normalized tokens
* Refactoring
2024-12-23 00:09:58 +02:00
Jeff Bolz
ebdee9478c
vulkan: build fixes for 32b ( #10927 )
...
* vulkan: build fixes for 32b
Should fix #10923
* vulkan: initialize some buffer/offset variables
2024-12-22 10:44:01 +01:00
Georgi Gerganov
5cd85b5e00
convert : add BertForMaskedLM ( #10919 )
2024-12-21 10:10:18 +02:00
Jeff Bolz
a91a41364b
vulkan: optimize coopmat2 dequant functions ( #10855 )
...
Change the code to do 16b loads when possible and extract the appropriate
component late, so the code is effectively decoding a pair of elements and
then selecting one. This can allow more commoning to happen in the compiler
when neighboring elements are loaded.
2024-12-21 08:04:45 +01:00
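The coopmat2 optimization above is shader code, but the idea transfers: do one wider load covering a pair of quantized elements, decode both, and pick the needed one at the end, so adjacent lanes can share loads. An illustrative Python model of that pattern, using a made-up 8-bit offset format rather than any real ggml quant layout:

```python
import struct

def dequant_pair_select(buf: bytes, idx: int, scale: float) -> float:
    """Decode a pair of elements from one 16-bit load, then select
    the one `idx` refers to (illustrative format, not a ggml quant)."""
    # One 16-bit load covering elements (idx & ~1) and (idx | 1).
    (word,) = struct.unpack_from("<H", buf, (idx // 2) * 2)
    lo = (word & 0xFF) - 128         # decode both halves up front...
    hi = ((word >> 8) & 0xFF) - 128
    # ...and extract the appropriate component late, mirroring how the
    # shader lets the compiler common the load across neighboring lanes.
    q = hi if (idx & 1) else lo
    return q * scale
```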
Adrien Gallouët
e34c5af43f
ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0() ( #10874 )
...
* ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0()
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* ggml-cpu: format code
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2024-12-21 00:33:37 +01:00
Akarshan Biswas
eb5c3dc64b
SYCL: Migrate away from deprecated ggml_tensor->backend ( #10840 )
...
* Migrate to tensor->buffer for checking backend buffer type: 1
* SYCL: common.cpp try to migrate away from tensor->backend
* SYCL: fix assertions and add proper comments
* SYCL: remove extra space
* SYCL: Add back static to ggml_backend_buffer_is_sycl_split function
* SYCL: Add pragma directive to suppress warning spam
* SYCL: Integrate debug logs with GGML_LOG and other fixes
* Revert "SYCL: Integrate debug logs with GGML_LOG and other fixes"
This reverts commit 2607b7de0f.
Let's keep the current SYCL specific logging mechanism for now
* SYCL: Use GGML_SYCL_DEBUG after reverting
* SYCL: reg_get_proc_address func, update to the current func signature
* SYCL: Refactor SYCL buffer checks in ggml_sycl_cpy_tensor_2d
2024-12-20 23:31:28 +08:00
Xuan Son Nguyen
0ca416c91a
server : (UI) fix copy to clipboard function ( #10916 )
2024-12-20 14:12:06 +01:00
Diego Devesa
21ae3b9be8
ggml : add test for SVE and disable when it fails ( #10906 )
2024-12-20 13:31:28 +01:00
Molly Sophia
0a11f8b7b5
convert : fix RWKV v6 model conversion ( #10913 )
...
* Enable --no-context-shift for llama-perplexity example
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV 6: Fix error in ggml_cuda_op_bin_bcast
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-12-20 11:44:58 +02:00
Georgi Gerganov
d408bb9268
clip : disable GPU support ( #10896 )
...
ggml-ci
2024-12-19 18:47:15 +02:00
Georgi Gerganov
5cab3e4aaa
llama : minor grammar refactor ( #10897 )
...
ggml-ci
2024-12-19 17:42:13 +02:00
Georgi Gerganov
36319dec5d
tts : small QoL for easy model fetch ( #10903 )
2024-12-19 17:35:15 +02:00
Xuan Son Nguyen
57bb2c40cd
server : fix logprobs, make it OAI-compatible ( #10783 )
...
* server : fix logprobs, make it openai-compatible
* update docs
* add std::log
* return pre-sampling p
* sort before apply softmax
* add comment
* fix test
* set p for sampled token
* update docs
* add --multi-token-probs
* update docs
* add `post_sampling_probs` option
* update docs [no ci]
* remove --multi-token-probs
* "top_probs" with "post_sampling_probs"
* resolve review comments
* rename struct token_prob to prob_info
* correct comment placement
* fix setting prob for sampled token
2024-12-19 15:40:08 +01:00
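The reworked probabilities can be exercised directly. A sketch of a /completion request, assuming a local server: n_probs asks for the top candidates per token, and the post_sampling_probs option added here switches from pre-sampling logprobs to probabilities measured after sampling:

```python
import requests

resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "The capital of France is",
        "n_predict": 4,
        "n_probs": 3,                 # top-3 candidates per token
        "post_sampling_probs": True,  # added in #10783
    },
).json()

# Per-token candidates come back under completion_probabilities;
# see the server README for the exact field layout.
for tok in resp.get("completion_probabilities", []):
    print(tok)
```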
Adrien Gallouët
a3c33b1dce
ggml: fix arm build with gcc ( #10895 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2024-12-19 14:20:41 +01:00
Sukriti Sharma
2fffc52b50
llama : fix Roberta embeddings ( #10856 )
...
* fix: Use gpt2 tokenizer for roberta and add eos/bos tokens
Branch: RobertaTokenizer
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fixes to position embeddings
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* map roberta-bpe to gpt-2
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix linting
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Gabe Goodhart <ghart@us.ibm.com>
2024-12-19 15:04:51 +02:00
fairydreaming
7585edbdeb
convert : Add support for Microsoft Phi-4 model ( #10817 )
...
* convert : use GPT2 vocab for Phi-4 model
* convert : use null value of sliding_window to distinguish Phi-4 from other PHI3-based models
* llama : do not use sliding window attention mask for Phi-4 model
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2024-12-19 10:37:12 +01:00
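The detection heuristic in the Phi-4 change is easy to state: Phi-4 reuses the Phi-3 architecture but ships a config.json with "sliding_window": null. A standalone sketch of that check; the helper name is hypothetical, not the converter's actual code:

```python
import json

def looks_like_phi4(config_path: str) -> bool:
    """Hypothetical helper mirroring #10817: a null sliding_window
    distinguishes Phi-4 from other Phi-3-based checkpoints."""
    with open(config_path) as f:
        config = json.load(f)
    # json.load turns JSON null into Python None; a missing key falls
    # back to 0 here so it does not count as null.
    return config.get("sliding_window", 0) is None
```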
Johannes Gäßler
cd920d0ac3
tests: disable GGUF test for bad value size ( #10886 )
2024-12-19 08:53:58 +01:00