llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-13 12:10:18 +00:00

Author	SHA1	Message	Date
ochafik	bc3e0c0830	`tool-call`: Qwen 2.5 Instruct also requires object arguments	2024-09-28 23:05:35 +01:00
ochafik	b10ef04d8d	`chat-template`: tweak --chat-template error message when --jinja is set	2024-09-28 22:36:38 +01:00
ochafik	dbda025f87	`tool-call`: test messages -> template -> grammar -> tool call parser	2024-09-28 22:32:47 +01:00
ochafik	1b32ac129f	`chat-template`: fix test-arg	2024-09-28 20:06:10 +01:00
ochafik	e6be59c2a0	`antiprompts`: fix gcc8 build (avoid recursive struct)	2024-09-28 19:39:52 +01:00
ochafik	05bbba9f8a	`tool-call`: only match json eagerly for Llama 3.2	2024-09-28 19:05:10 +01:00
ochafik	7cef90cf9c	`tool-call`: more eager function call parsing for Functionary & Llama (give a chance to 3B model)	2024-09-28 18:33:40 +01:00
ochafik	d983516f40	`tool-call`: let the tool call handler expand chat template, moving builtin_tools down as extra_context	2024-09-28 17:46:36 +01:00
ochafik	0c85bc7a8f	`tool-call`: test tool call style detection	2024-09-28 17:43:09 +01:00
ochafik	701b664551	`minja`: add `indent` filter to support command-r-plus's chat templates	2024-09-27 19:00:14 +01:00
ochafik	0093a5e527	`minja`: fix identifiers parsing (when start w/ not/is/etc) and lstrip_blocks corner case (needed by DeepSeek-V2.5)	2024-09-27 18:30:44 +01:00
ochafik	e62b5de3cf	`tool-call`: fix functionary-small-3.2 (first tool starts w/ name\n, subsequent are >>>name\n)	2024-09-27 07:06:33 +01:00
ochafik	e33b342da7	`tool-call`: fix passing of tools to template + allow agent to finish	2024-09-27 06:24:22 +01:00
ochafik	f62e688387	`tool-call`: fix crash / test non-tool call case (added llama_sampler_is_grammar_empty)	2024-09-27 06:04:41 +01:00
ochafik	27cd07a056	`json`: fix grammar conversion typo	2024-09-27 03:57:48 +01:00
ochafik	f9c1743bb5	`minja`: fix iterables	2024-09-27 03:36:49 +01:00
ochafik	c88c932d98	fix gcc error + lint	2024-09-26 19:18:40 +01:00
ochafik	5840e10069	`tool-call`: merge & fix jinja template tests into test-chat-template	2024-09-26 19:05:00 +01:00
ochafik	50685f837f	`minja`: add str.title()	2024-09-26 19:03:59 +01:00
ochafik	9cfe4d7202	`tool-call`: refactor llama_chat_template class + use in validate_model_chat_template	2024-09-26 18:06:03 +01:00
ochafik	cf7bece6a7	`tool-call`: factor chat template away from legacy API	2024-09-26 17:19:29 +01:00
ochafik	d7ec84f78c	`tool-call`: allow <|python_tag|> in functionary-medium-3.1	2024-09-26 06:52:34 +01:00
ochafik	3d2650ce65	fix gcc build	2024-09-26 06:52:34 +01:00
ochafik	0c870133d8	`tool-call`: test/fix functionary-medium-v3.1's template (can "look" like llama3.1 template)	2024-09-26 05:56:15 +01:00
ochafik	8e4a9bad8a	`minja`: allow none input to selectattr, and add safe passthrough filter	2024-09-26 05:53:12 +01:00
ochafik	5f5be9cde7	`minja`: gcc tweaks	2024-09-26 05:06:11 +01:00
ochafik	059babdd9b	`minja`: try to please gcc	2024-09-26 03:58:18 +01:00
ochafik	595e11cb11	`tool-call`: fix/test functionary v3	2024-09-26 03:42:05 +01:00
ochafik	c124ab48ea	`minja`: add str.endswith	2024-09-26 03:21:23 +01:00
ochafik	1b6280102b	fix editorconfig lints	2024-09-26 02:27:46 +01:00
ochafik	a774093a99	`tool-call`: add server tests for llama 3.1	2024-09-26 02:17:30 +01:00
ochafik	45b243b4a5	`minja`: fix llama_chat_apply_template + add use_jinja param to validate_model_chat_template	2024-09-26 02:14:42 +01:00
ochafik	e983c9d0de	`tool-call`: fix llama_chat_apply_template signature / test-chat-template	2024-09-25 22:02:58 +01:00
ochafik	33ea20edd1	Merge remote-tracking branch 'origin/master' into tool-call	2024-09-25 18:58:54 +01:00
ochafik	4706bdbae1	`tool-call`: support Functionary v3 vs. v3-llama3.1 variants	2024-09-25 17:33:00 +01:00
ochafik	41103c0ed6	`server`: add --chat-template-file	2024-09-25 16:14:46 +01:00
ochafik	e309c6a47f	`tool-call`: integrate minja & tool-call to server when --jinja is set	2024-09-25 16:14:46 +01:00
ochafik	3cfc21ea71	`tool-call`: basic Functionary 3.2, Llama 3.1, Hermes 2 Pro grammar generators + parsers	2024-09-25 16:14:22 +01:00
ochafik	26c175b416	`json`: build_grammar helper	2024-09-25 16:14:22 +01:00
ochafik	eaca756ecc	`minja`: minimalist Jinja templating engine for LLM chat templates	2024-09-25 16:14:22 +01:00
ochafik	5b6d5040d5	`grammar`: trigger words + refactor of antiprompts	2024-09-25 16:14:22 +01:00
Xuan Son Nguyen	afbbfaa537	server : add more env vars, improve gen-docs (#9635) * server : add more env vars, improve gen-docs * update server docs * LLAMA_ARG_NO_CONTEXT_SHIFT	2024-09-25 14:05:13 +02:00
Georgi Gerganov	cea1486ecf	log : add CONT level for continuing previous log entry (#9610)	2024-09-24 10:15:35 +03:00
Georgi Gerganov	b0f27361f3	sampling : avoid expensive softmax during greedy sampling (#9605) * sampling : avoid expensive softmax during greedy sampling ggml-ci * speculative : fix default RNG seed + set sparams.n_probs * Update tests/test-sampling.cpp Co-authored-by: slaren <slarengh@gmail.com> * sampling : add clarifying comment [no ci] --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-09-24 09:03:17 +03:00
Xuan Son Nguyen	0b3bf966f4	server : add --no-context-shift option (#9607) * server : add --no-context-shift option * small fix * Update examples/server/tests/features/embeddings.feature Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * tests : minor fix * revert usage of GGML_ASSERT * update server documentation --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-09-23 22:23:54 +02:00
Bert Wagner	8b836ae731	arg : add env variable for parallel (#9513) * add env variable for parallel * Update README.md with env: LLAMA_ARG_N_PARALLEL	2024-09-17 16:35:38 +03:00
Vinesh Janarthanan	441b72b91f	main : option to disable context shift (#9484) * added cli arg to disable context shift * reverted precommit * updated README.md for main * white space * allow disabling context shift in the server * Update common/arg.cpp no-context-shift only works for main example Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * added server example to --no-context-shift args * removed server changes * white space --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-09-16 09:20:01 +03:00
Georgi Gerganov	6262d13e0b	common : reimplement logging (#9418) https://github.com/ggerganov/llama.cpp/pull/9418	2024-09-15 20:46:12 +03:00
Georgi Gerganov	0abc6a2c25	llama : llama_perf + option to disable timings during decode (#9355) * llama : llama_perf + option to disable timings during decode ggml-ci * common : add llama_arg * Update src/llama.cpp Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> * perf : separate functions in the API ggml-ci * perf : safer pointer handling + naming update ggml-ci * minor : better local var name * perf : abort on invalid sampler pointer ggml-ci --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>	2024-09-13 09:53:38 +03:00
Ahmad Tameem	2b00fa7997	riscv : modify Makefile and add a RISCV_VECT to print log info (#9442) - Added ggml_cpu_has_riscv_v() in GGML to print system info in log - Modified Makefile to only use flag when cross compiling for RISC-V	2024-09-12 14:24:31 +03:00

337 Commits