ochafik
542853b34b
tool-call: greedy sampling in server tests + tweak prompt
2024-10-31 04:38:22 +00:00
ochafik
61655b9cdd
Merge remote-tracking branch 'origin/master' into tool-call
2024-10-31 01:45:07 +00:00
Olivier Chafik
e4d5449638
tool-calls: test Qwen2.5-7B-Instruct-Q4_K_M.gguf
2024-10-30 21:40:15 +00:00
ochafik
5227321dfd
tool-call: when slow server tests fail, hint to run python scripts/fetch_server_test_models.py
2024-10-30 12:40:22 +00:00
Rich Dougherty
6763f713bb
readme : more lora detail in main example readme ( #10064 )
2024-10-30 13:22:39 +01:00
ochafik
3ebdb2b805
tool-call: support tool_use variant in llama_chat_template_from_model + drop llama_get_chat_template
2024-10-30 10:07:10 +00:00
Diego Devesa
c5b0f4b5d9
llama : refactor model loader with backend registry ( #10026 )
2024-10-30 02:01:23 +01:00
Olivier Chafik
92c384a5e8
nits
2024-10-29 17:24:59 +00:00
Olivier Chafik
773ff91b7a
tool-call: force printing of lazy grammar trigger tokens to regularize function call parsing
2024-10-29 15:26:51 +00:00
Olivier Chafik
fa4c1119c9
tool-call: use functionary-small-v3.2-Q8_0.gguf in test (Q4_K_M too dumb for function call)
2024-10-29 15:25:37 +00:00
Olivier Chafik
64287a328d
tool-call: test Hermes-3-Llama-3.1-8B
2024-10-29 14:52:25 +00:00
Georgi Gerganov
8d8ff71536
llama : remove Tail-Free sampling ( #10071 )
...
ggml-ci
2024-10-29 10:42:05 +02:00
ochafik
aefac1e5cb
tool-call: update scripts/fetch_server_test_models.py
2024-10-28 23:57:23 +00:00
ochafik
b825440c81
tool-call: use Q4_K_M models
2024-10-28 23:56:40 +00:00
ochafik
74d71a673e
agent: simplify syntax (default tools to local w/ default port)
2024-10-28 23:54:01 +00:00
ochafik
ec547e4137
tool-call: add tests: tool_call=none, parallel_tool_calls=true
2024-10-28 10:04:00 +00:00
Georgi Gerganov
8125e6cbfc
server : don't overfill the batch during infill ( #10018 )
...
ggml-ci
2024-10-28 08:49:32 +02:00
ochafik
168add7ec8
Update tool_call.feature
2024-10-28 02:06:00 +00:00
ochafik
7fde6d0091
tool_call: test no tool call on a real model + rename scenarios
2024-10-28 02:00:09 +00:00
ochafik
c88095e3fc
space nits
2024-10-28 00:27:04 +00:00
ochafik
9a86ea79a2
tool-call: slow tool call integration tests
2024-10-28 00:26:40 +00:00
ochafik
080982ebf3
tool-call: test MistralNemo in forced tools server tests (w/ parallel tool calls disabled)
2024-10-27 16:39:51 +00:00
wwoodsTM
ff252ea48e
llama : add DRY sampler ( #9702 )
...
* sampling : add DRY sampler (post-refactor)
* DRY: Trying to fix coauthors, removed unneeded line
* DRY: Fixed redundant code
* DRY: Fixed crash issue due to DRY being in chain but uninitialized
---------
Co-authored-by: l3utterfly <gc.pthzfoldr@gmail.com>
Co-authored-by: pi6am <34464159+pi6am@users.noreply.github.com>
2024-10-25 19:07:34 +03:00
Michael Podvitskiy
d80fb71f8b
llama: string_split fix ( #10022 )
...
* llama: Refactor string_split to use template specialization, fixes parsing strings with spaces
* llama: Add static_assert in the string_split template to ensure the correct template specialization is used for std::string
2024-10-25 17:57:54 +02:00
Georgi Gerganov
bc5ba007b2
server : check that the prompt fits in the slot's context ( #10030 )
...
ggml-ci
2024-10-25 10:13:46 +03:00
Olivier Chafik
30bd00bcf7
agent: fix tools setup
2024-10-25 02:00:47 +01:00
Olivier Chafik
5c414a3335
agent: simplify tools setup
2024-10-25 01:03:45 +01:00
Xuan Son Nguyen
958367bf53
server : refactor slot input data, move tokenizer to HTTP thread ( #10023 )
...
* server : refactor slot input data, move tokenizer to HTTP thread
* move prompt_tokens.empty() check
* fix incorrect if branch
* fix infinite generation loop
* bring back infill validation
* add infill test
* try fixing format_infill
* fix test
* remove redundant code
* rename completion to inference
* update docs
* use llama_tokens everywhere
2024-10-24 21:51:22 +02:00
Olivier Chafik
0f4fc8cb28
agent: fix no-cache issue in squid for brave tool
2024-10-24 18:59:37 +01:00
Olivier Chafik
03b86416e1
agent: fix deps + make docker compose setup easier to debug
2024-10-24 12:30:27 +01:00
ochafik
c2926e4bd9
Update README.md
2024-10-24 06:40:16 +01:00
ochafik
d338bfb87f
agent: ditch aiohttp & define REQUESTS_CA_BUNDLE to fix http proxying / trust the self-signed cert from python
2024-10-24 06:35:37 +01:00
ochafik
0f5d63943f
agent: display http errors nicely
2024-10-24 05:40:58 +01:00
ochafik
f5320af02a
tool-call: return tool_call.id (required by Nemo)
2024-10-24 05:40:15 +01:00
ochafik
267e630c14
agent: isolate tools container + log its outgoing HTTP & HTTPS traffic w/ docker compose + self-signed squid proxy
2024-10-24 05:38:54 +01:00
wwoodsTM
0a1c750c80
server : samplers accept the prompt correctly ( #10019 )
2024-10-23 22:27:51 +03:00
Georgi Gerganov
2d3aba9ee8
llama.vim : bump generation time limit to 3s [no ci]
2024-10-23 17:16:56 +03:00
Michael Coppola
ac113a0fee
llama.vim : add classic vim support ( #9995 )
...
* added classic vim support
* fixed ring update, removed blank line
* minor
* minor
* minor doc update
* removed unneeded var
* minor
* minor
* fixed job_start creating new scratch buffers
* fixed job_start creating new scratch buffers
* fixed ghost text indenting when expandtab is on
* removed unused code
* minor
* unified fim_on_exit
* minor
* vim ghost text rendering now uses pos_x and pos_y parameters
* renamed *_hlgroup to hlgroup_*
* renamed *_ghost_text to ghost_text_*, moved nvim/vim detection to llama#init()
* minor
---------
Co-authored-by: Michael Coppola <info@michaeljcoppola.com>
2024-10-23 14:09:26 +03:00
ochafik
2b49440011
tool-call: fix previous commit's parallel arg
2024-10-23 02:35:21 +01:00
ochafik
3e12b9b38e
tool-calls: basic Nemo support, default parallel to true if template mentions tool_call_id
2024-10-23 02:30:31 +01:00
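
Note on 3e12b9b38e: the subject says parallel tool calls default to true when the chat template mentions tool_call_id. A minimal sketch of that heuristic follows. It assumes "mentions" means a plain substring check on the template text and that an explicit request value wins over the default; the function name and parameters are hypothetical and not taken from the commit.

```python
from typing import Optional


def default_parallel_tool_calls(chat_template: str,
                                requested: Optional[bool] = None) -> bool:
    """Pick a value for parallel tool calls.

    Hypothetical helper: if the caller set the option explicitly, honour it;
    otherwise default to True only when the template text references
    tool_call_id (assumed here to be a simple substring test).
    """
    if requested is not None:
        return requested
    return "tool_call_id" in chat_template
```

With this sketch, a template containing `{{ message.tool_call_id }}` yields True by default, while a template without that identifier yields False unless the caller opts in.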
ochafik
fc80ad20ce
tool-call: Log tool call style name, ensure returned content not null
2024-10-22 23:41:47 +01:00
Olivier Chafik
db4bf93812
Merge remote-tracking branch 'origin/master' into tool-call
2024-10-22 14:37:30 +01:00
ochafik
9f5ab97756
tool-calls: add generic tool call style as default
2024-10-22 10:53:21 +01:00
Georgi Gerganov
e94a138d64
llama.vim : fix info text display [no ci] ( #9787 )
2024-10-22 00:37:55 +03:00
Georgi Gerganov
e01c67affe
llama.vim : move info to the right of screen [no ci] ( #9787 )
...
'eol' messes up the rendering with nvim v0.10.2 for some reason
2024-10-21 22:53:18 +03:00
Georgi Gerganov
dbd5f2f573
llama.vim : plugin for Neovim ( #9787 )
2024-10-21 20:25:02 +03:00
Georgi Gerganov
55e47786e3
llama : default sampling changes + greedy update ( #9897 )
...
* llama : deprecate softmax sampler + fix dist sampler
ggml-ci
* tests : replace macros with functions
ggml-ci
* sampling : change temperature sampler logic
For t <= 0.0f, keep the max logit intact and set the rest to -inf
* cont : no need for special "greedy" logic
top-k == 1 is the same
* tests : init prob correctly
* llama : handle temp <= 0.0 in the temp_ext sampler too
ggml-ci
* cont : avoid extra loop in temperature sampler for sub-zero temp
ggml-ci
2024-10-21 09:46:40 +03:00
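
Note on 55e47786e3: the body describes the new temperature sampler rule ("For t <= 0.0f, keep the max logit intact and set the rest to -inf") and observes that this makes separate greedy logic unnecessary. The sketch below illustrates that rule on a plain list of logits. It is not the llama.cpp implementation; the function names are hypothetical, and breaking ties by taking the first maximum is an assumption.

```python
import math


def apply_temperature(logits, temp):
    """Return temperature-adjusted logits as a new list.

    For temp <= 0, keep the largest logit unchanged and set every other
    entry to -inf, so any later softmax puts all probability on it.
    Ties are broken by taking the first maximum (an assumption).
    """
    if not logits:
        return []
    if temp <= 0.0:
        max_idx = max(range(len(logits)), key=logits.__getitem__)
        return [x if i == max_idx else -math.inf for i, x in enumerate(logits)]
    return [x / temp for x in logits]


def softmax(logits):
    """Numerically stable softmax; exp(-inf) evaluates to 0.0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Under this rule, sampling from `softmax(apply_temperature(logits, 0.0))` always returns the highest-scoring token, which matches the commit's remark that top-k == 1 gives the same result.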
Georgi Gerganov
bc21975084
speculative : fix handling of some input params ( #9963 )
...
* speculative : fix batch sizes at initialization
ggml-ci
* speculative : handle params.n_predict == -1
* speculative : limit batch size to llama_n_batch
2024-10-21 09:37:12 +03:00
Xuan Son Nguyen
cda0e4b648
llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch ( #9745 )
...
* refactor llama_batch_get_one
* adapt all examples
* fix simple.cpp
* fix llama_bench
* fix
* fix context shifting
* free batch before return
* use common_batch_add, reuse llama_batch in loop
* null terminated seq_id list
* fix save-load-state example
* fix perplexity
* correct token pos in llama_batch_allocr
2024-10-18 23:18:01 +02:00
Ouadie EL FAROUKI
87421a23e8
[SYCL] Add SYCL Backend registry, device and Event Interfaces ( #9705 )
...
* implemented missing SYCL event APIs
* sycl : Added device and backend reg interfaces
* Restructured ggml-sycl.cpp
2024-10-18 06:46:16 +01:00