llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-13 04:00:16 +00:00

Author	SHA1	Message	Date
ochafik	2926089c5d	fix lints	2024-09-26 19:06:29 +01:00
ochafik	5840e10069	`tool-call`: merge & fix jinja template tests into test-chat-template	2024-09-26 19:05:00 +01:00
ochafik	50685f837f	`minja`: add str.title()	2024-09-26 19:03:59 +01:00
ochafik	296331bba3	`minja`: update chat template goldens w/ llama.3.1 arguments workaround	2024-09-26 18:10:27 +01:00
ochafik	9cfe4d7202	`tool-call`: refactor llama_chat_template class + use in validate_model_chat_template	2024-09-26 18:06:03 +01:00
ochafik	cf7bece6a7	`tool-call`: factor chat template away from legacy API	2024-09-26 17:19:29 +01:00
Neo Zhang Jianyu	95bc82fbc0	[SYCL] add missed dll file in package (#9577) * update oneapi to 2024.2 * use 2024.1 --------- Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-09-26 17:38:31 +08:00
ochafik	d7ec84f78c	`tool-call`: allow <|python_tag|> in functionary-medium-3.1	2024-09-26 06:52:34 +01:00
ochafik	3d2650ce65	fix gcc build	2024-09-26 06:52:34 +01:00
ochafik	749a21c67a	gcc appeasement	2024-09-26 06:08:18 +01:00
ochafik	0c870133d8	`tool-call`: test/fix functionary-medium-v3.1's template (can "look" like llama3.1 template)	2024-09-26 05:56:15 +01:00
ochafik	8e4a9bad8a	`minja`: allow none input to selectattr, and add safe passthrough filter	2024-09-26 05:53:12 +01:00
ochafik	5f5be9cde7	`minja`: gcc tweaks	2024-09-26 05:06:11 +01:00
ochafik	2eb29bf8b8	`tool-call`: update chat templates/goldens	2024-09-26 04:00:10 +01:00
ochafik	4cd82d61dd	`tool-call`: fix pyright type errors	2024-09-26 03:59:38 +01:00
ochafik	059babdd9b	`minja`: try to please gcc	2024-09-26 03:58:18 +01:00
ochafik	94377d743c	`server`: catch errors in format_final_response_oaicompat instead of taking server down	2024-09-26 03:42:36 +01:00
ochafik	595e11cb11	`tool-call`: fix/test functionary v3	2024-09-26 03:42:05 +01:00
ochafik	c124ab48ea	`minja`: add str.endswith	2024-09-26 03:21:23 +01:00
ochafik	76d2938ef8	fix flake8 lints	2024-09-26 02:30:17 +01:00
ochafik	1b6280102b	fix editorconfig lints	2024-09-26 02:27:46 +01:00
R0CKSTAR	7691654c68	mtgpu: enable VMM (#9597) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2024-09-26 03:27:40 +02:00
ochafik	ab25e3fbf9	`tool-call`: allow empty message content when there's tool_calls in format_chat	2024-09-26 02:19:04 +01:00
ochafik	d928ff4dfd	`server`: catch errors in oaicompat_completion_params_parse instead of taking server down	2024-09-26 02:18:01 +01:00
ochafik	a774093a99	`tool-call`: add server tests for llama 3.1	2024-09-26 02:17:30 +01:00
ochafik	9e366b3d03	`server`: fix tailing comma in completions_seed	2024-09-26 02:15:48 +01:00
ochafik	45b243b4a5	`minja`: fix llama_chat_apply_template + adde use_jinja param to validate_model_chat_template	2024-09-26 02:14:42 +01:00
ochafik	e983c9d0de	`tool-call`: fix llama_chat_apply_template signature / test-chat-template	2024-09-25 22:02:58 +01:00
ochafik	97d0620968	`minja`: fetch more templates (add models from test-chat-template)	2024-09-25 19:22:43 +01:00
ochafik	d15dcfb09d	`tool-call`: add output example to readme	2024-09-25 19:22:16 +01:00
ochafik	33ea20edd1	Merge remote-tracking branch 'origin/master' into tool-call	2024-09-25 18:58:54 +01:00
ochafik	8f25531c44	`tool-call`: add basic usage example to server readme	2024-09-25 18:00:31 +01:00
ochafik	4706bdbae1	`tool-call`: support Functionary v3 vs. v3-llama3.1 variants	2024-09-25 17:33:00 +01:00
Xuan Son Nguyen	ea9c32be71	ci : fix docker build number and tag name (#9638) * ci : fix docker build number and tag name * fine-grant permissions	2024-09-25 17:26:01 +02:00
ochafik	41103c0ed6	`server`: add --chat-template-file	2024-09-25 16:14:46 +01:00
ochafik	e309c6a47f	`tool-call`: integrate minja & tool-call to server when --jinja is set	2024-09-25 16:14:46 +01:00
ochafik	3cfc21ea71	`tool-call`: basic Functionary 3.2, Llama 3.1, Hermes 2 Pro grammar generators + parsers	2024-09-25 16:14:22 +01:00
ochafik	26c175b416	`json`: build_grammar helper	2024-09-25 16:14:22 +01:00
ochafik	eaca756ecc	`minja`: minimalist Jinja templating engine for LLM chat templates	2024-09-25 16:14:22 +01:00
ochafik	5b6d5040d5	`grammar`: trigger words + refactor of antiprompts	2024-09-25 16:14:22 +01:00
Charles Xu	1e43630218	ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels (#9217) * ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels * added fallback mechanism when the offline re-quantized model is not optimized for the underlying target. * fix for build errors * remove prints from the low-level code * Rebase to the latest upstream	2024-09-25 16:12:20 +03:00
Xuan Son Nguyen	afbbfaa537	server : add more env vars, improve gen-docs (#9635) * server : add more env vars, improve gen-docs * update server docs * LLAMA_ARG_NO_CONTEXT_SHIFT	2024-09-25 14:05:13 +02:00
Gabe Goodhart	3d6bf6919f	llama : add IBM Granite MoE architecture (#9438) * feat(gguf-py): Add granitemoe architecture This includes the addition of new tensor names for the new moe layers. These may not be correct at this point due to the need for the hack in gguf_writer.py to double-check the length of the shape for these layers. Branch: GraniteMoE Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat(convert_hf_to_gguf): Add GraniteMoeModel GraniteMoe has the same configuration deltas as Granite Branch: GraniteMoE Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix(granitemoe convert): Split the double-sized input layer into gate and up After a lot of staring and squinting, it's clear that the standard mixtral expert implementation is equivalent to the vectorized parallel experts in granite. The difference is that in granite, the w1 and w3 are concatenated into a single tensor "input_linear." Rather than reimplementing all of the math on the llama.cpp side, the much simpler route is to just split this tensor during conversion and follow the standard mixtral route. Branch: GraniteMoE Co-Authored-By: alex.brooks@ibm.com Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat(granitemoe): Implement granitemoe GraniteMoE follows the mixtral architecture (once the input_linear layers are split into gate_exps/up_exps). The main delta is the addition of the same four multipliers used in Granite. Branch: GraniteMoE Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * Typo fix in docstring Co-Authored-By: ggerganov@gmail.com Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix(conversion): Simplify tensor name mapping in conversion Branch: GraniteMoE Co-Authored-By: git@compilade.net Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix(convert): Remove unused tensor name mappings Branch: GraniteMoE Co-Authored-By: git@compilade.net Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix(convert): Sanity check on merged FFN tensor sizes Branch: GraniteMoE Co-Authored-By: git@compilade.net Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Allow "output" layer in granite moe architecture (convert and cpp) Branch: GraniteMoE Co-Authored-By: git@compilade.net Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix(granite): Add missing 'output' tensor for Granite This is a fix for the previous `granite` architecture PR. Recent snapshots have included this (`lm_head.weights`) as part of the architecture Branch: GraniteMoE Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-09-25 10:06:52 +03:00
Dou Xinpeng	904837e0cb	cann: fix crash when llama-bench is running on multiple cann devices (#9627)	2024-09-25 11:30:38 +08:00
Eric Zhang	70392f1f81	ggml : add AVX512DQ requirement for AVX512 builds (#9622)	2024-09-24 11:03:21 +03:00
Georgi Gerganov	bb5f819975	sync : ggml	2024-09-24 11:01:18 +03:00
Georgi Gerganov	c038931615	examples : adapt to ggml.h changes (ggml/0) ggml-ci	2024-09-24 11:00:52 +03:00
Georgi Gerganov	31ac5834fe	llama : keep track of all EOG tokens in the vocab (#9609) ggml-ci	2024-09-24 10:16:06 +03:00
Georgi Gerganov	cea1486ecf	log : add CONT level for continuing previous log entry (#9610)	2024-09-24 10:15:35 +03:00
StrangeBytesDev	0aa15011e3	server : add newline after chat example (#9616)	2024-09-24 09:04:39 +03:00

4065 Commits