Daniel Kleine
133c7b46b3
Fixed RNG seed docs ( #9723 )
...
* Update README.md
fixed RNG seed info
* changed print format to unsigned
2024-10-04 10:54:44 +02:00
ochafik
a151ddcd5a
agent: handle function errors and don't stringify str outputs
2024-10-04 04:06:00 +01:00
Olivier Chafik
21a3c90a1c
agent: tool tweaks (remove ansi escapes from python output, update env keys + provider docs)
2024-10-03 22:20:34 +01:00
Olivier Chafik
da02397f7f
agent: support more providers (+ extract serve_tools_inside_docker.sh)
...
update readme
2024-10-03 19:23:32 +01:00
Radoslav Gerganov
841713e1e4
rpc : enable vulkan ( #9714 )
...
closes #8536
2024-10-03 13:00:52 +03:00
Olivier Chafik
fa8df0c350
agent: drop fastify.py -> simpler serve_tools.py, and expose other tools to python interpreter
2024-10-02 19:51:23 +01:00
Olivier Chafik
6b4a454735
agent: hard-code max_results=10 in brave_search
2024-10-02 19:13:28 +01:00
Olivier Chafik
26e76f9704
agent: allow interactive chat by default, and don't reuse sessions
2024-10-02 19:12:57 +01:00
Olivier Chafik
6f2191d99e
agent: remove *lots* of cruft from tool definitions derived from FastAPI catalog (and remove wait* tools which can be implemented in Python anyway)
2024-10-02 17:54:20 +01:00
Olivier Chafik
e2a9ab68a3
agent: --openai flag (auto-fetches OPENAI_API_KEY), improved logging
2024-10-02 17:15:55 +01:00
Olivier Chafik
2428b73853
agent: ditch openai dependency, use cache_prompt and expose seed
2024-10-02 16:26:45 +01:00
Olivier Chafik
b559d64ecc
Update README.md
2024-10-02 15:19:27 +01:00
Olivier Chafik
9e502e89a5
tool-call: promote getting chat templates w/ dedicated script rather than rely on test resources
2024-10-02 15:03:08 +01:00
Olivier Chafik
f3538e755b
update tools
2024-10-02 14:57:25 +01:00
Olivier Chafik
5b01402655
agent: add brave_search & fetch_page tools + move to examples/agent/tools/
2024-10-02 14:29:45 +01:00
Zhenwei Jin
76b37d1541
gguf-split : improve --split and --merge logic ( #9619 )
...
* make sure params --split and --merge are not specified at same time
* update gguf-split params parse logic
* Update examples/gguf-split/gguf-split.cpp
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2024-10-02 10:21:57 +03:00
Georgi Gerganov
148844fe97
examples : remove benchmark ( #9704 )
...
ggml-ci
2024-10-02 10:14:44 +03:00
Olivier Chafik
c36a196f53
tool-call: prepare possible externalization of minja + factor tool call style out of template
2024-10-01 23:12:24 +01:00
Georgi Gerganov
cad341d889
metal : reduce command encoding overhead ( #9698 )
...
* metal : reduce command encoding overhead
ggml-ci
* metal : add comments
2024-10-01 16:00:25 +03:00
compilade
511636df0c
ci : reduce severity of unused Pyright ignore comments ( #9697 )
2024-09-30 14:13:16 -04:00
vb
08a43d05b6
py : update transformers version ( #9694 )
...
* update transformers version.
* update hfh version.
2024-09-30 18:03:47 +03:00
ochafik
dbda025f87
tool-call: test messages -> template -> grammar -> tool call parser
2024-09-28 22:32:47 +01:00
ochafik
0ae1112faa
agent: try to fix pyright lint
2024-09-28 20:10:08 +01:00
ochafik
ef2a020276
tool-call: make agent async
2024-09-28 19:11:09 +01:00
ochafik
c657857e21
tool-call: cleanup tools.py
2024-09-28 18:33:40 +01:00
ochafik
55cf337560
tool-call: better error reporting for server tests
2024-09-28 18:33:40 +01:00
ochafik
7cef90cf9c
tool-call: more eager function call parsing for Functionary & Llama (give a chance to 3B model)
2024-09-28 18:33:40 +01:00
ochafik
8b2cf3509f
tool-call: fix grammar trigger crash
2024-09-28 18:30:01 +01:00
ochafik
d983516f40
tool-call: let the tool call handler expand chat template, moving builtin_tools down as extra_context
2024-09-28 17:46:36 +01:00
Georgi Gerganov
f4d2b8846a
llama : add reranking support ( #9510 )
...
* py : add XLMRobertaForSequenceClassification [no ci]
* py : fix scalar-tensor conversion [no ci]
* py : fix position embeddings chop [no ci]
* llama : read new cls tensors [no ci]
* llama : add classification head (wip) [no ci]
* llama : add "rank" pooling type
ggml-ci
* server : add rerank endpoint
ggml-ci
* llama : avoid ggml_repeat during classification
* rerank : cleanup + comments
* server : accept /rerank endpoint in addition to /v1/rerank [no ci]
* embedding : parse special tokens
* jina : support v1 reranker
* vocab : minor style
ggml-ci
* server : initiate tests for later
ggml-ci
* server : add docs
* llama : add comment [no ci]
* llama : fix uninitialized tensors
* ci : add rerank tests
ggml-ci
* add reranking test
* change test data
* Update examples/server/server.cpp
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* add `--reranking` argument
* update server docs
* llama : fix comment [no ci]
ggml-ci
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-09-28 17:42:03 +03:00
Zhenwei Jin
6102037bbb
vocab : refactor tokenizer to reduce init overhead ( #9449 )
...
* refactor tokenizer
* llama : make llm_tokenizer more private
ggml-ci
* refactor tokenizer
* refactor tokenizer
* llama : make llm_tokenizer more private
ggml-ci
* remove unused files
* remove unused fields to avoid unused field build error
* avoid symbol link error
* Update src/llama.cpp
* Update src/llama.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-28 15:10:58 +03:00
ochafik
2f25ee30ef
Update README.md
2024-09-27 07:18:07 +01:00
ochafik
86e4f99092
Update README.md
2024-09-27 07:15:25 +01:00
ochafik
e62b5de3cf
tool-call: fix functionary-small-3.2 (first tool starts w/ name\n, subsequent are >>>name\n)
2024-09-27 07:06:33 +01:00
ochafik
e33b342da7
tool-call: fix passing of tools to template + allow agent to finish
2024-09-27 06:24:22 +01:00
ochafik
f62e688387
tool-call: fix crash / test non-tool call case (added llama_sampler_is_grammar_empty)
2024-09-27 06:04:41 +01:00
ochafik
0abfa36ca7
tool-call: move usage examples to examples/agent
2024-09-27 05:10:30 +01:00
ochafik
6610ecf965
server: rm bad debug code
2024-09-27 04:07:35 +01:00
ochafik
9295ca95db
tool-call: fix agent type lints
2024-09-27 03:53:56 +01:00
ochafik
8299fac07c
tool-call: adapt very simple agent + docker isolation from https://github.com/ggerganov/llama.cpp/pull/6389
2024-09-26 21:07:46 +01:00
ochafik
10f9fe8d49
tool-call: fix tool call return format
2024-09-26 21:01:04 +01:00
ochafik
c88c932d98
fix gcc error + lint
2024-09-26 19:18:40 +01:00
ochafik
9cfe4d7202
tool-call: refactor llama_chat_template class + use in validate_model_chat_template
2024-09-26 18:06:03 +01:00
ochafik
cf7bece6a7
tool-call: factor chat template away from legacy API
2024-09-26 17:19:29 +01:00
ochafik
3d2650ce65
fix gcc build
2024-09-26 06:52:34 +01:00
ochafik
0c870133d8
tool-call: test/fix functionary-medium-v3.1's template (can "look" like llama3.1 template)
2024-09-26 05:56:15 +01:00
ochafik
4cd82d61dd
tool-call: fix pyright type errors
2024-09-26 03:59:38 +01:00
ochafik
94377d743c
server: catch errors in format_final_response_oaicompat instead of taking server down
2024-09-26 03:42:36 +01:00
ochafik
595e11cb11
tool-call: fix/test functionary v3
2024-09-26 03:42:05 +01:00
ochafik
1b6280102b
fix editorconfig lints
2024-09-26 02:27:46 +01:00
ochafik
ab25e3fbf9
tool-call: allow empty message content when there's tool_calls in format_chat
2024-09-26 02:19:04 +01:00
ochafik
d928ff4dfd
server: catch errors in oaicompat_completion_params_parse instead of taking server down
2024-09-26 02:18:01 +01:00
ochafik
a774093a99
tool-call: add server tests for llama 3.1
2024-09-26 02:17:30 +01:00
ochafik
9e366b3d03
server: fix trailing comma in completions_seed
2024-09-26 02:15:48 +01:00
ochafik
45b243b4a5
minja: fix llama_chat_apply_template + add use_jinja param to validate_model_chat_template
2024-09-26 02:14:42 +01:00
ochafik
e983c9d0de
tool-call: fix llama_chat_apply_template signature / test-chat-template
2024-09-25 22:02:58 +01:00
ochafik
d15dcfb09d
tool-call: add output example to readme
2024-09-25 19:22:16 +01:00
ochafik
33ea20edd1
Merge remote-tracking branch 'origin/master' into tool-call
2024-09-25 18:58:54 +01:00
ochafik
8f25531c44
tool-call: add basic usage example to server readme
2024-09-25 18:00:31 +01:00
ochafik
e309c6a47f
tool-call: integrate minja & tool-call to server when --jinja is set
2024-09-25 16:14:46 +01:00
ochafik
5b6d5040d5
grammar: trigger words + refactor of antiprompts
2024-09-25 16:14:22 +01:00
Xuan Son Nguyen
afbbfaa537
server : add more env vars, improve gen-docs ( #9635 )
...
* server : add more env vars, improve gen-docs
* update server docs
* LLAMA_ARG_NO_CONTEXT_SHIFT
2024-09-25 14:05:13 +02:00
Georgi Gerganov
cea1486ecf
log : add CONT level for continuing previous log entry ( #9610 )
2024-09-24 10:15:35 +03:00
StrangeBytesDev
0aa15011e3
server : add newline after chat example ( #9616 )
2024-09-24 09:04:39 +03:00
Georgi Gerganov
b0f27361f3
sampling : avoid expensive softmax during greedy sampling ( #9605 )
...
* sampling : avoid expensive softmax during greedy sampling
ggml-ci
* speculative : fix default RNG seed + set sparams.n_probs
* Update tests/test-sampling.cpp
Co-authored-by: slaren <slarengh@gmail.com>
* sampling : add clarifying comment [no ci]
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-09-24 09:03:17 +03:00
Xuan Son Nguyen
0b3bf966f4
server : add --no-context-shift option ( #9607 )
...
* server : add --no-context-shift option
* small fix
* Update examples/server/tests/features/embeddings.feature
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* tests : minor fix
* revert usage of GGML_ASSERT
* update server documentation
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-23 22:23:54 +02:00
Georgi Gerganov
37f8c7b4c9
perplexity : remove extra new lines after chunks ( #9596 )
2024-09-23 11:28:02 +03:00
slaren
63351143b2
quantize : improve type name parsing ( #9570 )
...
quantize : do not ignore invalid types in arg parsing
quantize : ignore case of type and ftype arguments
2024-09-20 20:55:36 +02:00
Georgi Gerganov
d39e26741f
examples : flush log upon ctrl+c ( #9559 )
2024-09-20 11:46:56 +03:00
Sigbjørn Skjæret
722ec1eb51
perplexity : do not escape input data by default ( #9548 )
2024-09-20 09:38:10 +03:00
Georgi Gerganov
6026da52d6
server : clean-up completed tasks from waiting list ( #9531 )
...
ggml-ci
2024-09-19 12:44:53 +03:00
Sigbjørn Skjæret
eca0fab44e
imatrix : disable prompt escape by default ( #9543 )
2024-09-19 10:58:14 +03:00
Vinesh Janarthanan
8a308354f6
server : match OAI structured output response ( #9527 )
2024-09-18 09:50:34 +03:00
Eric Zhang
f799155ab8
server : fix OpenSSL build (remove obsolete LOG_INFO) ( #9529 )
2024-09-18 09:28:20 +03:00
Neo Zhang Jianyu
faf67b3de4
[SYCL] set context default value to avoid memory issue, update guide ( #9476 )
...
* set context default to avoid memory issue, update guide
* Update docs/backend/SYCL.md
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
---------
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
2024-09-18 08:30:31 +08:00
Michael Podvitskiy
7be099fa81
llama-bench: correct argument parsing error message ( #9524 )
2024-09-17 22:41:38 +02:00
Bert Wagner
8b836ae731
arg : add env variable for parallel ( #9513 )
...
* add env variable for parallel
* Update README.md with env: LLAMA_ARG_N_PARALLEL
2024-09-17 16:35:38 +03:00
Vinesh Janarthanan
441b72b91f
main : option to disable context shift ( #9484 )
...
* added cli arg to disable context shift
* reverted precommit
* updated README.md for main
* white space
* allow disabling context shift in the server
* Update common/arg.cpp
no-context-shift only works for main example
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* added server example to --no-context-shift args
* removed server changes
* white space
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-16 09:20:01 +03:00
Georgi Gerganov
6262d13e0b
common : reimplement logging ( #9418 )
...
https://github.com/ggerganov/llama.cpp/pull/9418
2024-09-15 20:46:12 +03:00
slaren
e6deac31f7
gguf-split : add basic checks ( #9499 )
...
* gguf-split : do not overwrite existing files when merging
* gguf-split : error when too many arguments are passed
2024-09-15 19:02:27 +02:00
VoidIsVoid
dcdcee3a74
server: add data: [DONE] to /chat/completions stream response ( #9459 )
2024-09-14 11:36:44 +02:00
Xuan Son Nguyen
feff4aa846
server : add loading html page while model is loading ( #9468 )
...
* Adding loading page for '/' server requests
* set content when model is loading
* removed loading html file
* updated cmakelist
* updated makefile
* cleaned up whitespace
* cleanup for PR removed error
* updated server test to handle 503 HTML
* updated server test to handle 503 HTML
* catch 503 before parsing json
* revert test
* account for both api and web browser requests
* precommit corrections
* eol fix
* revert changes to pre-commit
* removed print statement
* made loading message more descriptive
* also support .html files
---------
Co-authored-by: VJHack <flymyplane21@gmail.com>
Co-authored-by: Vinesh Janarthanan <36610342+VJHack@users.noreply.github.com>
2024-09-13 14:23:11 +02:00
Georgi Gerganov
0abc6a2c25
llama : llama_perf + option to disable timings during decode ( #9355 )
...
* llama : llama_perf + option to disable timings during decode
ggml-ci
* common : add llama_arg
* Update src/llama.cpp
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* perf : separate functions in the API
ggml-ci
* perf : safer pointer handling + naming update
ggml-ci
* minor : better local var name
* perf : abort on invalid sampler pointer
ggml-ci
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-09-13 09:53:38 +03:00
Mathijs Henquet
78203641fe
server : Add option to return token pieces in /tokenize endpoint ( #9108 )
...
* server : added with_pieces functionality to /tokenize endpoint
* server : Add tokenize with pieces tests to server.feature
* Handle case if tokenizer splits along utf8 continuation bytes
* Add example of token splitting
* Remove trailing ws
* Fix trailing ws
* Maybe fix ci
* maybe this fix windows ci?
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-09-12 22:30:11 +02:00
fengerhu1
e665744317
llava : fix the script error in MobileVLM README ( #9054 )
...
Signed-off-by: Erhu Feng <2748250768@qq.com>
2024-09-12 14:34:22 +03:00
Michael Podvitskiy
ff76e18516
cmake : fixed the order of linking libraries for llama-quantize ( #9450 )
2024-09-12 14:27:14 +03:00
Georgi Gerganov
d6a04f872d
ggml : hide ggml_object, ggml_cgraph, ggml_hash_set ( #9408 )
...
* ggml : hide ggml_object, ggml_cgraph, ggml_hash_set
ggml-ci
* ggml : add ggml-impl.h to backends
* ggml : fix compiler warnings
ggml-ci
* ggml : add assert upon adding nodes
2024-09-12 14:23:49 +03:00
Neo Zhang Jianyu
c9c8575a1a
enhance run script to be easy to change the parameters ( #9448 )
...
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-09-12 17:44:17 +08:00
Xuan Son Nguyen
0996c5597f
llava : correct args for minicpmv-cli ( #9429 )
2024-09-11 12:59:13 +02:00
Georgi Gerganov
d2b496bff4
batched-bench : remove unused code ( #9305 )
2024-09-11 10:03:54 +03:00
slaren
49006c67b4
llama : move random seed generation to the samplers ( #9398 )
...
* llama_sampler_penalties : clamp penalty_last_n to zero
2024-09-10 18:04:25 +02:00
Xuan Son Nguyen
bfe76d4a17
common : move arg parser code to arg.cpp ( #9388 )
...
* common : move arg parser to arg.cpp
* better categorize args
* add cmake
* missing climits
* missing cstdarg
* common : more explicit includes
* fix build
* refactor gpt_params_parse
* update server readme
* fix test
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-09 23:36:09 +02:00
slaren
5fb5e24811
llama : minor sampling refactor (2) ( #9386 )
2024-09-09 17:10:46 +02:00
Antonis Makropoulos
5ed087573e
readme : add LLMUnity to UI projects ( #9381 )
...
* add LLMUnity to UI projects
* add newline to examples/rpc/README.md to fix editorconfig-checker unit test
2024-09-09 14:21:38 +03:00
Radoslav Gerganov
54f376d0b9
rpc : update README [no ci] ( #9320 )
...
Update README with instructions how to offload model layers to both
local and remote devices
2024-09-09 11:04:39 +03:00
Dan Johansson
b2e89a3274
Arm AArch64: Documentation updates ( #9321 )
...
* Arm AArch64: Documentation updates
* Update docs/build.md to include information on how to enable the Arm optimized gemm/gemv kernels
* Update examples/quantize/README.md with information on the Q4_0_4_4, Q4_0_4_8 and Q4_0_8_8 formats
* Add newline to the end of docs/build.md
2024-09-09 10:02:45 +03:00
Xuan Son Nguyen
3f7ccfd649
common : bring back missing args, add env var duplication check ( #9375 )
...
* common : bring back missing args
* move duplication check to test-arg-parser
* add check for duplicated env var
* correct default values
2024-09-08 18:08:55 +02:00
Xuan Son Nguyen
00b02bb249
imatrix : fix arg parser for imatrix ( #9366 )
...
* imatrix : fix arg parser
* beautify printing first arg
2024-09-08 12:12:17 +02:00
Georgi Gerganov
a5b5d9a101
llama.android : fix build ( #9350 )
2024-09-08 00:33:50 +03:00
Georgi Gerganov
faf69d4237
llama : sanitize invalid tokens ( #9357 )
...
* common : do not add null tokens during warmup
ggml-ci
* llama : check that the input tokens are valid
ggml-ci
* tests : fix batch size of bert model
ggml-ci
2024-09-08 00:33:13 +03:00