llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-13 04:00:16 +00:00

Author	SHA1	Message	Date
Diego Devesa	a7ad553513	ggml-backend : add device description to CPU backend (#9720 )	2024-10-03 17:39:18 +02:00
bandoti	d6fe7abf04	ggml: unify backend logging mechanism (#9709 ) * Add scaffolding for ggml logging macros * Metal backend now uses GGML logging * Cuda backend now uses GGML logging * Cann backend now uses GGML logging * Add enum tag to parameters * Use C memory allocation funcs * Fix compile error * Use GGML_LOG instead of GGML_PRINT * Rename llama_state to llama_logger_state * Prevent null format string * Fix whitespace * Remove log callbacks from ggml backends * Remove cuda log statement	2024-10-03 17:39:03 +02:00
compilade	e3c355ba65	convert : handle tokenizer merges format from transformers 4.45 (#9696 )	2024-10-03 17:22:15 +03:00
Radoslav Gerganov	841713e1e4	rpc : enable vulkan (#9714 ) Some checks are pending Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full-cuda.Dockerfile platforms:linux/amd64 tag:full-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full.Dockerfile platforms:linux/amd64,linux/arm64 tag:full]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-cuda.Dockerfile platforms:linux/amd64 tag:light-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-intel.Dockerfile platforms:linux/amd64 tag:light-intel]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli.Dockerfile platforms:linux/amd64,linux/arm64 tag:light]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-cuda.Dockerfile platforms:linux/amd64 tag:server-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-intel.Dockerfile platforms:linux/amd64 tag:server-intel]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server.Dockerfile platforms:linux/amd64,linux/arm64 tag:server]) (push) Waiting to run Details Nix CI / nix-eval (macos-latest) (push) Waiting to run Details Nix CI / nix-eval (ubuntu-latest) (push) Waiting to run Details Nix CI / nix-build (macos-latest) (push) Waiting to run Details Nix CI / nix-build (ubuntu-latest) (push) Waiting to run Details flake8 Lint / Lint (push) Waiting to run Details closes #8536	2024-10-03 13:00:52 +03:00
Ouadie EL FAROUKI	5639971466	Fixed dequant precision issues in Q4_1 and Q5_1 (#9711 )	2024-10-03 07:50:44 +01:00
Diego Devesa	c83ad6d01e	ggml-backend : add device and backend reg interfaces (#9707 ) Some checks are pending Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full-cuda.Dockerfile platforms:linux/amd64 tag:full-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full.Dockerfile platforms:linux/amd64,linux/arm64 tag:full]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-cuda.Dockerfile platforms:linux/amd64 tag:light-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-intel.Dockerfile platforms:linux/amd64 tag:light-intel]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli.Dockerfile platforms:linux/amd64,linux/arm64 tag:light]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-cuda.Dockerfile platforms:linux/amd64 tag:server-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-intel.Dockerfile platforms:linux/amd64 tag:server-intel]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server.Dockerfile platforms:linux/amd64,linux/arm64 tag:server]) (push) Waiting to run Details Nix CI / nix-eval (macos-latest) (push) Waiting to run Details Nix CI / nix-eval (ubuntu-latest) (push) Waiting to run Details Nix CI / nix-build (macos-latest) (push) Waiting to run Details Nix CI / nix-build (ubuntu-latest) (push) Waiting to run Details flake8 Lint / Lint (push) Waiting to run Details Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2024-10-03 01:49:47 +02:00
Olivier Chafik	fa8df0c350	`agent`: drop fastify.py -> simpler serve_tools.py, and expose other tools to python interpreter	2024-10-02 19:51:23 +01:00
Olivier Chafik	6b4a454735	`agent`: hard-code max_results=10 in brave_search	2024-10-02 19:13:28 +01:00
Olivier Chafik	26e76f9704	`agent`: allow interactive chat by default, and don't reuse sessions	2024-10-02 19:12:57 +01:00
Olivier Chafik	6f2191d99e	`agent`: remove lots of cruft from tool definitions derived from FastAPI catalog (and remove wait* tools which can be implemented in Python anyway)	2024-10-02 17:54:20 +01:00
Olivier Chafik	e2a9ab68a3	`agent`: --openai flag (auto-fetches OPENAI_API_KEY), improved logging	2024-10-02 17:15:55 +01:00
Olivier Chafik	2428b73853	`agent`: ditch openai dependency, use cache_prompt and expose seed	2024-10-02 16:26:45 +01:00
Olivier Chafik	b559d64ecc	Update README.md	2024-10-02 15:19:27 +01:00
Olivier Chafik	9e502e89a5	`tool-call`: promote getting chat templates w/ dedicated script rather than rely on test resources	2024-10-02 15:03:08 +01:00
Olivier Chafik	f3538e755b	update tools	2024-10-02 14:57:25 +01:00
Xuan Son Nguyen	a39ab216aa	llama : reduce compile time and binary size (#9712 ) Some checks are pending Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full-cuda.Dockerfile platforms:linux/amd64 tag:full-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full.Dockerfile platforms:linux/amd64,linux/arm64 tag:full]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-cuda.Dockerfile platforms:linux/amd64 tag:light-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-intel.Dockerfile platforms:linux/amd64 tag:light-intel]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli.Dockerfile platforms:linux/amd64,linux/arm64 tag:light]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-cuda.Dockerfile platforms:linux/amd64 tag:server-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-intel.Dockerfile platforms:linux/amd64 tag:server-intel]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server.Dockerfile platforms:linux/amd64,linux/arm64 tag:server]) (push) Waiting to run Details Nix CI / nix-eval (macos-latest) (push) Waiting to run Details Nix CI / nix-eval (ubuntu-latest) (push) Waiting to run Details Nix CI / nix-build (macos-latest) (push) Waiting to run Details Nix CI / nix-build (ubuntu-latest) (push) Waiting to run Details flake8 Lint / Lint (push) Waiting to run Details * llama : speed up compile time * fix build * fix build (2)	2024-10-02 15:49:55 +02:00
Olivier Chafik	5b01402655	`agent`: add brave_search & fetch_page tools + move to examples/agent/tools/	2024-10-02 14:29:45 +01:00
Alberto Cabrera Pérez	f536f4c439	[SYCL] Initial cmake support of SYCL for AMD GPUs (#9658 ) sycl: initial cmake support of SYCL for AMD GPUs	2024-10-02 13:57:18 +01:00
Radoslav Gerganov	00b7317e63	vulkan : do not use tensor->extra (#9407 ) * vulkan : do not use tensor->extra This patch allows using the Vulkan backend with the RPC backend as tensor->extra is no longer used. Ref: #8536 * Adapt GGML_VULKAN_CHECK_RESULTS to extra removal (#2) --------- Co-authored-by: 0cc4m <picard12@live.de>	2024-10-02 13:49:16 +03:00
Zhenwei Jin	76b37d1541	gguf-split : improve --split and --merge logic (#9619 ) Some checks are pending Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full-cuda.Dockerfile platforms:linux/amd64 tag:full-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full.Dockerfile platforms:linux/amd64,linux/arm64 tag:full]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-cuda.Dockerfile platforms:linux/amd64 tag:light-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-intel.Dockerfile platforms:linux/amd64 tag:light-intel]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli.Dockerfile platforms:linux/amd64,linux/arm64 tag:light]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-cuda.Dockerfile platforms:linux/amd64 tag:server-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-intel.Dockerfile platforms:linux/amd64 tag:server-intel]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server.Dockerfile platforms:linux/amd64,linux/arm64 tag:server]) (push) Waiting to run Details Nix CI / nix-eval (macos-latest) (push) Waiting to run Details Nix CI / nix-eval (ubuntu-latest) (push) Waiting to run Details Nix CI / nix-build (macos-latest) (push) Waiting to run Details Nix CI / nix-build (ubuntu-latest) (push) Waiting to run Details flake8 Lint / Lint (push) Waiting to run Details * make sure params --split and --merge are not specified at same time * update gguf-split params parse logic * Update examples/gguf-split/gguf-split.cpp Co-authored-by: slaren <slarengh@gmail.com> --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: slaren <slarengh@gmail.com>	2024-10-02 10:21:57 +03:00
Georgi Gerganov	148844fe97	examples : remove benchmark (#9704 ) ggml-ci	2024-10-02 10:14:44 +03:00
Olivier Chafik	c76b14501e	`tool-call`: fix Makefile	2024-10-02 00:06:42 +01:00
Olivier Chafik	c36a196f53	`tool-call`: prepare possible externalization of minja + factor tool call style out of template	2024-10-01 23:12:24 +01:00
Paweł Wodnicki	3f1ae2e32c	Update README.md (#9591 ) Some checks are pending Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full-cuda.Dockerfile platforms:linux/amd64 tag:full-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full.Dockerfile platforms:linux/amd64,linux/arm64 tag:full]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-cuda.Dockerfile platforms:linux/amd64 tag:light-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-intel.Dockerfile platforms:linux/amd64 tag:light-intel]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli.Dockerfile platforms:linux/amd64,linux/arm64 tag:light]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-cuda.Dockerfile platforms:linux/amd64 tag:server-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-intel.Dockerfile platforms:linux/amd64 tag:server-intel]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server.Dockerfile platforms:linux/amd64,linux/arm64 tag:server]) (push) Waiting to run Details Nix CI / nix-eval (macos-latest) (push) Waiting to run Details Nix CI / nix-eval (ubuntu-latest) (push) Waiting to run Details Nix CI / nix-build (macos-latest) (push) Waiting to run Details Nix CI / nix-build (ubuntu-latest) (push) Waiting to run Details flake8 Lint / Lint (push) Waiting to run Details Add Bielik model.	2024-10-01 19:18:46 +02:00
Georgi Gerganov	f1b8c42711	sync : ggml	2024-10-01 16:09:42 +03:00
Johannes Gäßler	e98c1c188e	test: fix OPT_STEP_ADAMW for test-backend-ops (ggml/974)	2024-10-01 16:07:40 +03:00
Salvatore Mesoraca	cb00020504	vulkan : mul_mat: fix UB with small warps (ggml/952) When the device's warp size is less than 16, it is possible for loadstride_a (mul_mm.comp:114) and loadstride_b (mul_mm.comp:115) to be set to 0. Because they are calculated as: the workgroup size, multiplied by LOAD_VEC_* (which can be 1) and divided by 16. And the workgroup size is set to be the same as the warp/subgroup size. The loadstride_* variables are used as increments in the loops that populate the buffers used for the multiplication. When they are 0 they cause an infinite loop. But infinite loops without side-effects are UB and the values of loadstride_* are known at compile time. So, the compiler quietly optimizes all the loops away. As a consequence, the buffers are not populated and the multiplication result is just a matrix with all elements set to 0. We prevent the UB by making sure that the workgroup size will never be less than 16, even if our device has a smaller warp size (e.g. 8). Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>	2024-10-01 16:07:39 +03:00
Borislav Stanimirov	6c5322481a	ggml : fix ggml_cast (ggml/973)	2024-10-01 16:07:39 +03:00
Johannes Gäßler	7254cdf7e8	ggml: fix gradient allocation logic (ggml/966) * ggml: fix gradient allocation logic * gradient allocation in ggml_build_backward_expand * fixup * fix test-backend-ops grad * suggestions by slaren * fix test1.c * fix legacy opt API * fix test-grad0 * remove keep arg	2024-10-01 16:07:38 +03:00
Georgi Gerganov	cad341d889	metal : reduce command encoding overhead (#9698 ) * metal : reduce command encoding overhead ggml-ci * metal : add comments	2024-10-01 16:00:25 +03:00
Georgi Gerganov	a90484c6d9	llama : print correct model type for Llama 3.2 1B and 3B Some checks failed Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full-cuda.Dockerfile platforms:linux/amd64 tag:full-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full.Dockerfile platforms:linux/amd64,linux/arm64 tag:full]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-cuda.Dockerfile platforms:linux/amd64 tag:light-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-intel.Dockerfile platforms:linux/amd64 tag:light-intel]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli.Dockerfile platforms:linux/amd64,linux/arm64 tag:light]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-cuda.Dockerfile platforms:linux/amd64 tag:server-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-intel.Dockerfile platforms:linux/amd64 tag:server-intel]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server.Dockerfile platforms:linux/amd64,linux/arm64 tag:server]) (push) Waiting to run Details Nix CI / nix-eval (macos-latest) (push) Waiting to run Details Nix CI / nix-eval (ubuntu-latest) (push) Waiting to run Details Nix CI / nix-build (macos-latest) (push) Waiting to run Details Nix CI / nix-build (ubuntu-latest) (push) Waiting to run Details flake8 Lint / Lint (push) Waiting to run Details Python check requirements.txt / check-requirements (push) Has been cancelled Details Python Type-Check / pyright type-check (push) Has been cancelled Details	2024-10-01 11:42:01 +03:00
compilade	1927378bcc	convert : refactor rope_freqs generation (#9396 ) * convert : refactor rope_freqs generation This should also fix vocab-only conversion for Phi-3. * convert : adapt MiniCPM3 to separate rope_freqs insertion MiniCPM3's tokenizer is treated as a SentencePiece tokenizer to avoid having to run its custom Python code which mixes tokenization in the same file as tool calls. gguf-py : add long and short RoPE factors to tensor mappings Empty, but the key names are used to populate the mappings.	2024-10-01 09:31:36 +03:00
serhii-nakon	6f1d9d71f4	Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS (#9641 ) Some checks are pending Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full-cuda.Dockerfile platforms:linux/amd64 tag:full-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full.Dockerfile platforms:linux/amd64,linux/arm64 tag:full]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-cuda.Dockerfile platforms:linux/amd64 tag:light-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-intel.Dockerfile platforms:linux/amd64 tag:light-intel]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli.Dockerfile platforms:linux/amd64,linux/arm64 tag:light]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-cuda.Dockerfile platforms:linux/amd64 tag:server-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-intel.Dockerfile platforms:linux/amd64 tag:server-intel]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server.Dockerfile platforms:linux/amd64,linux/arm64 tag:server]) (push) Waiting to run Details Nix CI / nix-eval (macos-latest) (push) Waiting to run Details Nix CI / nix-eval (ubuntu-latest) (push) Waiting to run Details Nix CI / nix-build (macos-latest) (push) Waiting to run Details Nix CI / nix-build (ubuntu-latest) (push) Waiting to run Details flake8 Lint / Lint (push) Waiting to run Details Python Type-Check / pyright type-check (push) Waiting to run Details * Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS * Set ROCM_DOCKER_ARCH as string due it incorrectly build and cause OOM exit code	2024-09-30 20:57:12 +02:00
compilade	511636df0c	ci : reduce severity of unused Pyright ignore comments (#9697 )	2024-09-30 14:13:16 -04:00
vb	08a43d05b6	py : update transfomers version (#9694 ) Some checks failed Nix CI / nix-eval (macos-latest) (push) Waiting to run Details Nix CI / nix-eval (ubuntu-latest) (push) Waiting to run Details Nix CI / nix-build (macos-latest) (push) Waiting to run Details Nix CI / nix-build (ubuntu-latest) (push) Waiting to run Details Python check requirements.txt / check-requirements (push) Waiting to run Details flake8 Lint / Lint (push) Waiting to run Details Python Type-Check / pyright type-check (push) Waiting to run Details Nix aarch64 builds / nix-build-aarch64 (push) Has been cancelled Details * update transfomers version. * update hfh version.	2024-09-30 18:03:47 +03:00
Georgi Gerganov	ace4f4be37	flake.lock: Update (#9680 ) Flake lock file updates: • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/c04d5652cfa9742b1d519688f65d1bbccea9eb7e?narHash=sha256-PmUr/2GQGvFTIJ6/Tvsins7Q43KTMvMFhvG6oaYK%2BWk%3D' (2024-09-19) → 'github:NixOS/nixpkgs/1925c603f17fc89f4c8f6bf6f631a802ad85d784?narHash=sha256-J%2BPeFKSDV%2BpHL7ukkfpVzCOO7mBSrrpJ3svwBFABbhI%3D' (2024-09-26) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2024-09-30 07:48:49 -07:00
Ruchira Hasaranga	8277a817f1	console : utf-8 fix for windows stdin (#9690 ) Some checks are pending Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full-cuda.Dockerfile platforms:linux/amd64 tag:full-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full.Dockerfile platforms:linux/amd64,linux/arm64 tag:full]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-cuda.Dockerfile platforms:linux/amd64 tag:light-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-intel.Dockerfile platforms:linux/amd64 tag:light-intel]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli.Dockerfile platforms:linux/amd64,linux/arm64 tag:light]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-cuda.Dockerfile platforms:linux/amd64 tag:server-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-intel.Dockerfile platforms:linux/amd64 tag:server-intel]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server.Dockerfile platforms:linux/amd64,linux/arm64 tag:server]) (push) Waiting to run Details Nix CI / nix-eval (macos-latest) (push) Waiting to run Details Nix CI / nix-eval (ubuntu-latest) (push) Waiting to run Details Nix CI / nix-build (macos-latest) (push) Waiting to run Details Nix CI / nix-build (ubuntu-latest) (push) Waiting to run Details flake8 Lint / Lint (push) Waiting to run Details * utf-8 fix for windows stdin * Update common/console.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-09-30 11:23:42 +03:00
ochafik	d9451fd647	`antiprompts`: avoid c++20 struct initializers in test	2024-09-30 04:08:55 +01:00
ochafik	0fc5ad7ae1	`minja`: avoid c++20 struct initializers in test	2024-09-30 03:51:48 +01:00
ochafik	277f38536c	`minja`: attempt to handle windows' crlf	2024-09-30 03:45:50 +01:00
Georgi Gerganov	c919d5db39	ggml : define missing HWCAP flags (#9684 ) Some checks are pending Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full-cuda.Dockerfile platforms:linux/amd64 tag:full-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full.Dockerfile platforms:linux/amd64,linux/arm64 tag:full]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-cuda.Dockerfile platforms:linux/amd64 tag:light-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-intel.Dockerfile platforms:linux/amd64 tag:light-intel]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli.Dockerfile platforms:linux/amd64,linux/arm64 tag:light]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-cuda.Dockerfile platforms:linux/amd64 tag:server-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-intel.Dockerfile platforms:linux/amd64 tag:server-intel]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server.Dockerfile platforms:linux/amd64,linux/arm64 tag:server]) (push) Waiting to run Details Nix CI / nix-eval (macos-latest) (push) Waiting to run Details Nix CI / nix-eval (ubuntu-latest) (push) Waiting to run Details Nix CI / nix-build (macos-latest) (push) Waiting to run Details Nix CI / nix-build (ubuntu-latest) (push) Waiting to run Details flake8 Lint / Lint (push) Waiting to run Details ggml-ci Co-authored-by: Willy Tarreau <w@1wt.eu>	2024-09-29 21:18:23 +03:00
Georgi Gerganov	d0b1d663e4	sync : ggml	2024-09-29 21:16:07 +03:00
Johannes Gäßler	aaa4099925	CUDA: remove bad assert (ggml/972)	2024-09-29 21:15:37 +03:00
Jeff Bolz	641002fba8	vulkan : multithread pipeline creation (ggml/963)	2024-09-29 21:15:37 +03:00
Jeff Bolz	0de8b203f1	vulkan : fix build for GGML_VULKAN_RUN_TESTS, add TFLOPS to log (ggml/961)	2024-09-29 21:15:37 +03:00
Salvatore Mesoraca	544f409b4b	vulkan : argsort barriers must be under uniform control flow (ggml/951) a return before a barrier (that happens only in some threads in a workgroup) leads to UB. While the old code actually works on some devices, it fails on some others (i.e. "smaller" GPUs). BTW, I think it would be better to set specialization constants when the graph is built, in that way the local workgroup could be sized appropriately. But it would take a lot of work. Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>	2024-09-29 21:15:37 +03:00
Georgi Gerganov	6084bfb261	ggml : fix GGML_MAX_N_THREADS + improve formatting (ggml/969)	2024-09-29 21:15:35 +03:00
matiaslin	faac0bae26	common : ensure llama_batch size does not exceed max size (#9668 ) Some checks are pending Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full-cuda.Dockerfile platforms:linux/amd64 tag:full-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full.Dockerfile platforms:linux/amd64,linux/arm64 tag:full]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-cuda.Dockerfile platforms:linux/amd64 tag:light-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-intel.Dockerfile platforms:linux/amd64 tag:light-intel]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli.Dockerfile platforms:linux/amd64,linux/arm64 tag:light]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-cuda.Dockerfile platforms:linux/amd64 tag:server-cuda]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-intel.Dockerfile platforms:linux/amd64 tag:server-intel]) (push) Waiting to run Details Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server.Dockerfile platforms:linux/amd64,linux/arm64 tag:server]) (push) Waiting to run Details Nix CI / nix-eval (macos-latest) (push) Waiting to run Details Nix CI / nix-eval (ubuntu-latest) (push) Waiting to run Details Nix CI / nix-build (macos-latest) (push) Waiting to run Details Nix CI / nix-build (ubuntu-latest) (push) Waiting to run Details Python check requirements.txt / check-requirements (push) Waiting to run Details flake8 Lint / Lint (push) Waiting to run Details Python Type-Check / pyright type-check (push) Waiting to run Details A crash was observed when the number of tokens added to a batch exceeds llama_batch size. An assertion in llama_batch_add was added to protect against llama_batch size overflow.	2024-09-29 15:25:00 +03:00
nopperl	f99d3f8367	py : add model class for Chameleon conversion (#9683 )	2024-09-29 15:02:06 +03:00
Georgi Gerganov	589b48d41e	contrib : add Resources section (#9675 )	2024-09-29 14:38:18 +03:00

1 2 3 4 5 ...

4065 Commits