Commit Graph

4065 Commits

Author SHA1 Message Date
ochafik
2926089c5d fix lints 2024-09-26 19:06:29 +01:00
ochafik
5840e10069 tool-call: merge & fix jinja template tests into test-chat-template 2024-09-26 19:05:00 +01:00
ochafik
50685f837f minja: add str.title() 2024-09-26 19:03:59 +01:00
ochafik
296331bba3 minja: update chat template goldens w/ llama.3.1 arguments workaround 2024-09-26 18:10:27 +01:00
ochafik
9cfe4d7202 tool-call: refactor llama_chat_template class + use in validate_model_chat_template 2024-09-26 18:06:03 +01:00
ochafik
cf7bece6a7 tool-call: factor chat template away from legacy API 2024-09-26 17:19:29 +01:00
Neo Zhang Jianyu
95bc82fbc0
[SYCL] add missed dll file in package (#9577)
Some checks failed
Nix CI / nix-eval (macos-latest) (push) Has been cancelled
Nix CI / nix-eval (ubuntu-latest) (push) Has been cancelled
Nix CI / nix-build (macos-latest) (push) Has been cancelled
Nix CI / nix-build (ubuntu-latest) (push) Has been cancelled
flake8 Lint / Lint (push) Has been cancelled
* update oneapi to 2024.2

* use 2024.1

---------

Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-09-26 17:38:31 +08:00
ochafik
d7ec84f78c tool-call: allow <|python_tag|> in functionary-medium-3.1 2024-09-26 06:52:34 +01:00
ochafik
3d2650ce65 fix gcc build 2024-09-26 06:52:34 +01:00
ochafik
749a21c67a gcc appeasement 2024-09-26 06:08:18 +01:00
ochafik
0c870133d8 tool-call: test/fix functionary-medium-v3.1's template (can "look" like llama3.1 template) 2024-09-26 05:56:15 +01:00
ochafik
8e4a9bad8a minja: allow none input to selectattr, and add safe passthrough filter 2024-09-26 05:53:12 +01:00
ochafik
5f5be9cde7 minja: gcc tweaks 2024-09-26 05:06:11 +01:00
ochafik
2eb29bf8b8 tool-call: update chat templates/goldens 2024-09-26 04:00:10 +01:00
ochafik
4cd82d61dd tool-call: fix pyright type errors 2024-09-26 03:59:38 +01:00
ochafik
059babdd9b minja: try to please gcc 2024-09-26 03:58:18 +01:00
ochafik
94377d743c server: catch errors in format_final_response_oaicompat instead of taking server down 2024-09-26 03:42:36 +01:00
ochafik
595e11cb11 tool-call: fix/test functionary v3 2024-09-26 03:42:05 +01:00
ochafik
c124ab48ea minja: add str.endswith 2024-09-26 03:21:23 +01:00
ochafik
76d2938ef8 fix flake8 lints 2024-09-26 02:30:17 +01:00
ochafik
1b6280102b fix editorconfig lints 2024-09-26 02:27:46 +01:00
R0CKSTAR
7691654c68
mtgpu: enable VMM (#9597)
Some checks failed
Nix CI / nix-eval (macos-latest) (push) Waiting to run
Nix CI / nix-eval (ubuntu-latest) (push) Waiting to run
Nix CI / nix-build (macos-latest) (push) Waiting to run
Nix CI / nix-build (ubuntu-latest) (push) Waiting to run
flake8 Lint / Lint (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full-cuda.Dockerfile platforms:linux/amd64 tag:full-cuda]) (push) Has been cancelled
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full.Dockerfile platforms:linux/amd64,linux/arm64 tag:full]) (push) Has been cancelled
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-cuda.Dockerfile platforms:linux/amd64 tag:light-cuda]) (push) Has been cancelled
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-intel.Dockerfile platforms:linux/amd64 tag:light-intel]) (push) Has been cancelled
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli.Dockerfile platforms:linux/amd64,linux/arm64 tag:light]) (push) Has been cancelled
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-cuda.Dockerfile platforms:linux/amd64 tag:server-cuda]) (push) Has been cancelled
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-intel.Dockerfile platforms:linux/amd64 tag:server-intel]) (push) Has been cancelled
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server.Dockerfile platforms:linux/amd64,linux/arm64 tag:server]) (push) Has been cancelled
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-09-26 03:27:40 +02:00
ochafik
ab25e3fbf9 tool-call: allow empty message content when there's tool_calls in format_chat 2024-09-26 02:19:04 +01:00
ochafik
d928ff4dfd server: catch errors in oaicompat_completion_params_parse instead of taking server down 2024-09-26 02:18:01 +01:00
ochafik
a774093a99 tool-call: add server tests for llama 3.1 2024-09-26 02:17:30 +01:00
ochafik
9e366b3d03 server: fix tailing comma in completions_seed 2024-09-26 02:15:48 +01:00
ochafik
45b243b4a5 minja: fix llama_chat_apply_template + adde use_jinja param to validate_model_chat_template 2024-09-26 02:14:42 +01:00
ochafik
e983c9d0de tool-call: fix llama_chat_apply_template signature / test-chat-template 2024-09-25 22:02:58 +01:00
ochafik
97d0620968 minja: fetch more templates (add models from test-chat-template) 2024-09-25 19:22:43 +01:00
ochafik
d15dcfb09d tool-call: add output example to readme 2024-09-25 19:22:16 +01:00
ochafik
33ea20edd1 Merge remote-tracking branch 'origin/master' into tool-call 2024-09-25 18:58:54 +01:00
ochafik
8f25531c44 tool-call: add basic usage example to server readme 2024-09-25 18:00:31 +01:00
ochafik
4706bdbae1 tool-call: support Functionary v3 vs. v3-llama3.1 variants 2024-09-25 17:33:00 +01:00
Xuan Son Nguyen
ea9c32be71
ci : fix docker build number and tag name (#9638)
Some checks are pending
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full-cuda.Dockerfile platforms:linux/amd64 tag:full-cuda]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full.Dockerfile platforms:linux/amd64,linux/arm64 tag:full]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-cuda.Dockerfile platforms:linux/amd64 tag:light-cuda]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-intel.Dockerfile platforms:linux/amd64 tag:light-intel]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli.Dockerfile platforms:linux/amd64,linux/arm64 tag:light]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-cuda.Dockerfile platforms:linux/amd64 tag:server-cuda]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-intel.Dockerfile platforms:linux/amd64 tag:server-intel]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server.Dockerfile platforms:linux/amd64,linux/arm64 tag:server]) (push) Waiting to run
Nix CI / nix-eval (macos-latest) (push) Waiting to run
Nix CI / nix-eval (ubuntu-latest) (push) Waiting to run
Nix CI / nix-build (macos-latest) (push) Waiting to run
Nix CI / nix-build (ubuntu-latest) (push) Waiting to run
flake8 Lint / Lint (push) Waiting to run
* ci : fix docker build number and tag name

* fine-grant permissions
2024-09-25 17:26:01 +02:00
ochafik
41103c0ed6 server: add --chat-template-file 2024-09-25 16:14:46 +01:00
ochafik
e309c6a47f tool-call: integrate minja & tool-call to server when --jinja is set 2024-09-25 16:14:46 +01:00
ochafik
3cfc21ea71 tool-call: basic Functionary 3.2, Llama 3.1, Hermes 2 Pro grammar generators + parsers 2024-09-25 16:14:22 +01:00
ochafik
26c175b416 json: build_grammar helper 2024-09-25 16:14:22 +01:00
ochafik
eaca756ecc minja: minimalist Jinja templating engine for LLM chat templates 2024-09-25 16:14:22 +01:00
ochafik
5b6d5040d5 grammar: trigger words + refactor of antiprompts 2024-09-25 16:14:22 +01:00
Charles Xu
1e43630218
ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels (#9217)
Some checks failed
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full-cuda.Dockerfile platforms:linux/amd64 tag:full-cuda]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full.Dockerfile platforms:linux/amd64,linux/arm64 tag:full]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-cuda.Dockerfile platforms:linux/amd64 tag:light-cuda]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-intel.Dockerfile platforms:linux/amd64 tag:light-intel]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli.Dockerfile platforms:linux/amd64,linux/arm64 tag:light]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-cuda.Dockerfile platforms:linux/amd64 tag:server-cuda]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-intel.Dockerfile platforms:linux/amd64 tag:server-intel]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server.Dockerfile platforms:linux/amd64,linux/arm64 tag:server]) (push) Waiting to run
Nix CI / nix-eval (macos-latest) (push) Waiting to run
Nix CI / nix-eval (ubuntu-latest) (push) Waiting to run
Nix CI / nix-build (macos-latest) (push) Waiting to run
Nix CI / nix-build (ubuntu-latest) (push) Waiting to run
flake8 Lint / Lint (push) Waiting to run
Python check requirements.txt / check-requirements (push) Has been cancelled
Python Type-Check / pyright type-check (push) Has been cancelled
* ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels

* added fallback mechanism when the offline re-quantized model is not
optimized for the underlying target.

* fix for build errors

* remove prints from the low-level code

* Rebase to the latest upstream
2024-09-25 16:12:20 +03:00
Xuan Son Nguyen
afbbfaa537
server : add more env vars, improve gen-docs (#9635)
* server : add more env vars, improve gen-docs

* update server docs

* LLAMA_ARG_NO_CONTEXT_SHIFT
2024-09-25 14:05:13 +02:00
Gabe Goodhart
3d6bf6919f
llama : add IBM Granite MoE architecture (#9438)
* feat(gguf-py): Add granitemoe architecture

This includes the addition of new tensor names for the new moe layers.
These may not be correct at this point due to the need for the hack in
gguf_writer.py to double-check the length of the shape for these layers.

Branch: GraniteMoE

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat(convert_hf_to_gguf): Add GraniteMoeModel

GraniteMoe has the same configuration deltas as Granite

Branch: GraniteMoE

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(granitemoe convert): Split the double-sized input layer into gate and up

After a lot of staring and squinting, it's clear that the standard mixtral
expert implementation is equivalent to the vectorized parallel experts in
granite. The difference is that in granite, the w1 and w3 are concatenated
into a single tensor "input_linear." Rather than reimplementing all of the
math on the llama.cpp side, the much simpler route is to just split this
tensor during conversion and follow the standard mixtral route.

Branch: GraniteMoE

Co-Authored-By: alex.brooks@ibm.com

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat(granitemoe): Implement granitemoe

GraniteMoE follows the mixtral architecture (once the input_linear layers
are split into gate_exps/up_exps). The main delta is the addition of the
same four multipliers used in Granite.

Branch: GraniteMoE

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* Typo fix in docstring

Co-Authored-By: ggerganov@gmail.com

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(conversion): Simplify tensor name mapping in conversion

Branch: GraniteMoE

Co-Authored-By: git@compilade.net
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(convert): Remove unused tensor name mappings

Branch: GraniteMoE

Co-Authored-By: git@compilade.net
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(convert): Sanity check on merged FFN tensor sizes

Branch: GraniteMoE

Co-Authored-By: git@compilade.net
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Allow "output" layer in granite moe architecture (convert and cpp)

Branch: GraniteMoE

Co-Authored-By: git@compilade.net
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(granite): Add missing 'output' tensor for Granite

This is a fix for the previous `granite` architecture PR. Recent snapshots
have included this (`lm_head.weights`) as part of the architecture

Branch: GraniteMoE

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

---------

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-25 10:06:52 +03:00
Dou Xinpeng
904837e0cb
cann: fix crash when llama-bench is running on multiple cann devices (#9627)
Some checks are pending
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full-cuda.Dockerfile platforms:linux/amd64 tag:full-cuda]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full.Dockerfile platforms:linux/amd64,linux/arm64 tag:full]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-cuda.Dockerfile platforms:linux/amd64 tag:light-cuda]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-intel.Dockerfile platforms:linux/amd64 tag:light-intel]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli.Dockerfile platforms:linux/amd64,linux/arm64 tag:light]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-cuda.Dockerfile platforms:linux/amd64 tag:server-cuda]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-intel.Dockerfile platforms:linux/amd64 tag:server-intel]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server.Dockerfile platforms:linux/amd64,linux/arm64 tag:server]) (push) Waiting to run
Nix CI / nix-eval (macos-latest) (push) Waiting to run
Nix CI / nix-eval (ubuntu-latest) (push) Waiting to run
Nix CI / nix-build (macos-latest) (push) Waiting to run
Nix CI / nix-build (ubuntu-latest) (push) Waiting to run
flake8 Lint / Lint (push) Waiting to run
2024-09-25 11:30:38 +08:00
Eric Zhang
70392f1f81
ggml : add AVX512DQ requirement for AVX512 builds (#9622)
Some checks are pending
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full-cuda.Dockerfile platforms:linux/amd64 tag:full-cuda]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full.Dockerfile platforms:linux/amd64,linux/arm64 tag:full]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-cuda.Dockerfile platforms:linux/amd64 tag:light-cuda]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-intel.Dockerfile platforms:linux/amd64 tag:light-intel]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli.Dockerfile platforms:linux/amd64,linux/arm64 tag:light]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-cuda.Dockerfile platforms:linux/amd64 tag:server-cuda]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-intel.Dockerfile platforms:linux/amd64 tag:server-intel]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server.Dockerfile platforms:linux/amd64,linux/arm64 tag:server]) (push) Waiting to run
Nix CI / nix-eval (macos-latest) (push) Waiting to run
Nix CI / nix-eval (ubuntu-latest) (push) Waiting to run
Nix CI / nix-build (macos-latest) (push) Waiting to run
Nix CI / nix-build (ubuntu-latest) (push) Waiting to run
flake8 Lint / Lint (push) Waiting to run
2024-09-24 11:03:21 +03:00
Georgi Gerganov
bb5f819975
sync : ggml 2024-09-24 11:01:18 +03:00
Georgi Gerganov
c038931615
examples : adapt to ggml.h changes (ggml/0)
ggml-ci
2024-09-24 11:00:52 +03:00
Georgi Gerganov
31ac5834fe
llama : keep track of all EOG tokens in the vocab (#9609)
ggml-ci
2024-09-24 10:16:06 +03:00
Georgi Gerganov
cea1486ecf
log : add CONT level for continuing previous log entry (#9610) 2024-09-24 10:15:35 +03:00
StrangeBytesDev
0aa15011e3
server : add newline after chat example (#9616) 2024-09-24 09:04:39 +03:00