llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-11-14 06:49:54 +00:00

Author	SHA1	Message	Date
slaren	1b28061400	llama : skip token bounds check when evaluating embeddings (#9437 )	2024-09-11 17:52:13 +02:00
Pavel Zloi	8db003a19d	py : support converting local models (#7547 ) * Support of converting local models added to convert-hf-to-gguf-update.py * Description fixed * shutil added to imports	2024-09-11 15:29:51 +03:00
Xuan Son Nguyen	0996c5597f	llava : correct args for minicpmv-cli (#9429 )	2024-09-11 12:59:13 +02:00
Xuan Son Nguyen	5bb2c5dbd2	files : remove accidentally added `lora_test` submodule (#9430 )	2024-09-11 13:02:09 +03:00
Farbod Bijary	67155ab7f5	feat: Implements retrying logic for downloading models using --model-url flag (#9255 ) * feat: Implements retrying logic for downloading models using --model-url flag * Update common/common.cpp Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> * Update common/common.cpp Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> * apply comments * implements a retry function to avoid duplication * fix editorconfig * change function name --------- Co-authored-by: farbod <farbod.bjary82@gmail.com> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: slaren <slarengh@gmail.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2024-09-11 11:22:37 +02:00
Johannes Gäßler	5af118efda	CUDA: fix --split-mode row race condition (#9413 )	2024-09-11 10:22:40 +02:00
Georgi Gerganov	d2b496bff4	batched-bench : remove unused code (#9305 )	2024-09-11 10:03:54 +03:00
R0CKSTAR	b34e023480	musa: remove Clang builtins mapping (#9421 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2024-09-11 03:46:55 +02:00
Alberto Cabrera Pérez	51b6038636	sycl : update support conditions (#9394 ) * sycl : update support condition to im2col Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com> * Added TODO to remind supporting FP32 im2col --------- Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>	2024-09-11 08:53:42 +08:00
Georgi Gerganov	cb9c933eb2	flake.lock: Update (#9360 ) Flake lock file updates: • Updated input 'flake-parts': 'github:hercules-ci/flake-parts/af510d4a62d071ea13925ce41c95e3dec816c01d?narHash=sha256-ODYRm8zHfLTH3soTFWE452ydPYz2iTvr9T8ftDMUQ3E%3D' (2024-08-30) → 'github:hercules-ci/flake-parts/567b938d64d4b4112ee253b9274472dc3a346eb6?narHash=sha256-%2Bebgonl3NbiKD2UD0x4BszCZQ6sTfL4xioaM49o5B3Y%3D' (2024-09-01) • Updated input 'flake-parts/nixpkgs-lib': '`a5d394176e`.tar.gz?narHash=sha256-uFf2QeW7eAHlYXuDktm9c25OxOyCoUOQmh5SZ9amE5Q%3D' (2024-08-01) → '`356624c120`.tar.gz?narHash=sha256-Ss8QWLXdr2JCBPcYChJhz4xJm%2Bh/xjl4G0c0XlP6a74%3D' (2024-09-01) • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/71e91c409d1e654808b2621f28a327acfdad8dc2?narHash=sha256-GnR7/ibgIH1vhoy8cYdmXE6iyZqKqFxQSVkFgosBh6w%3D' (2024-08-28) → 'github:NixOS/nixpkgs/574d1eac1c200690e27b8eb4e24887f8df7ac27c?narHash=sha256-v3rIhsJBOMLR8e/RNWxr828tB%2BWywYIoajrZKFM%2B0Gg%3D' (2024-09-06) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2024-09-10 15:46:59 -07:00
Xuan Son Nguyen	6cd4e03444	arg : bring back missing ifdef (#9411 ) * arg : bring back missing ifdef * replace with llama_supports_gpu_offload	2024-09-10 22:41:29 +02:00
matteo	8d300bd35f	enable --special arg for llama-server (#9419 ) Co-authored-by: matteo serva <matteo.serva@gmail.com>	2024-09-10 22:40:59 +02:00
slaren	49006c67b4	llama : move random seed generation to the samplers (#9398 ) * llama_sampler_penalties : clamp penalty_last_n to zero	2024-09-10 18:04:25 +02:00
Georgi Gerganov	00ba2ff781	metal : fix compile warning with GGML_METAL_NDEBUG (#0 )	2024-09-10 10:17:43 +03:00
Daniel Bevenius	83008b7cfe	llama : update llm_build_copy_mask_state comment [no ci] (#9385 ) This commit updates the comment, which seems to contain a typo or be an outdated comment, in the copy_mask_state function changing the variable n_rs to n_kv. I believe this change is correct and what the comment wants to convey is to copy the states that are not going to be used in the upcoming processing, which are the tokens states from n_seqs up to the number of possible token states n_kv.	2024-09-10 10:03:21 +03:00
Molly Sophia	0b4ac75772	RWKV v6: Add time_mix_decay_w1/w2 in quant exclusion list (#9387 ) Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-09-10 10:02:30 +03:00
slaren	fb3f249815	make : do not run llama-gen-docs when building (#9399 )	2024-09-10 09:23:33 +03:00
Xuan Son Nguyen	bfe76d4a17	common : move arg parser code to `arg.cpp` (#9388 ) * common : move arg parser to arg.cpp * better categorize args * add cmake * missing climits * missing cstdarg * common : more explicit includes * fix build * refactor gpt_params_parse * update server readme * fix test --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-09-09 23:36:09 +02:00
Radoslav Gerganov	293bebe077	rpc : fix segfault with nkvo (#9389 ) * rpc : fix nkvo * rpc : buf_size must not be static ref: #9337 --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-09-09 18:40:10 +03:00
Prashant Vithule	5fac4d5764	ggml : vector length agnostic SVE support (#9290 ) * Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths * Implemented vector length agnostic SVE using switch case for 512-bit, 256-bit, 128-bit vector lengths * Removed WhiteSpaces * ggml : style changes + fix 512-bit nb loop check - fix local scope in switch cases - consistent predicate names - empty lines when necessary - opening braces, spaces - const-correctness - add asserts * Update ggml/src/ggml-quants.c Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-09-09 18:37:18 +03:00
slaren	5fb5e24811	llama : minor sampling refactor (2) (#9386 )	2024-09-09 17:10:46 +02:00
Georgi Gerganov	38ca6f644b	readme : update hot topics	2024-09-09 15:51:37 +03:00
Johannes Gäßler	8e6e2fbe14	CUDA: fix variable name conflict for Windows build (#9382 )	2024-09-09 14:22:53 +02:00
Antonis Makropoulos	5ed087573e	readme : add LLMUnity to UI projects (#9381 ) * add LLMUnity to UI projects * add newline to examples/rpc/README.md to fix editorconfig-checker unit test	2024-09-09 14:21:38 +03:00
Radoslav Gerganov	54f376d0b9	rpc : update README [no ci] (#9320 ) Update README with instructions how to offload model layers to both local and remote devices	2024-09-09 11:04:39 +03:00
Dan Johansson	b2e89a3274	Arm AArch64: Documentation updates (#9321 ) * Arm AArch64: Documentation updates * Update docs/build.md to include information on how to enable the Arm optimized gemm/gemv kernels * Update examples/quantize/README.md with information on the Q4_0_4_4, Q4_0_4_8 and Q4_0_8_8 formats * Add newline to the end of docs/build.md	2024-09-09 10:02:45 +03:00
Markus Tavenrath	daa9623ab0	Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. (#9118 ) * Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. * fix compile issues * Fix issues where the last submit wasn't executed or handled properly. * remove trailing whitespace * Repair GGML_VULKAN_CHECK_RESULTS * Increase submit counter only if actual work has been submitted and increase submit count to 100. * Fix some nodes are not checked with GGML_VULKAN_CHECK_RESULTS enabled.	2024-09-08 21:43:48 +02:00
Georgi Gerganov	e079bffb66	cuda : fix FA Q src index (1 -> 0) (#9374 )	2024-09-08 22:01:02 +03:00
Xuan Son Nguyen	3f7ccfd649	common : bring back missing args, add env var duplication check (#9375 ) * common : bring back missing args * move duplication check to test-arg-parser * add check for duplicated env var * correct default values	2024-09-08 18:08:55 +02:00
slaren	a249843d89	common : restore --n-gpu-layers (#9371 )	2024-09-08 16:44:42 +02:00
slaren	19f4a7b296	llama : refactor samplers internal implementation (#9370 )	2024-09-08 15:52:07 +02:00
Neo Zhang Jianyu	2a358fb0c4	[SYCL] add check malloc result on device (#9346 ) * add check malloc result on device * update for review comments, check all malloc_device() result --------- Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-09-08 19:05:29 +08:00
slaren	eae597182c	llama : sanitize tokens in the upper bound (#9359 )	2024-09-08 12:41:51 +02:00
Xuan Son Nguyen	00b02bb249	imatrix : fix arg parser for imatrix (#9366 ) * imatrix : fix arg parser * beautify printing first arg	2024-09-08 12:12:17 +02:00
Georgi Gerganov	a876861455	metal : update support condition for im2col + fix warning (#0 )	2024-09-08 11:05:55 +03:00
Georgi Gerganov	385decbd63	sync : ggml	2024-09-08 11:05:55 +03:00
Georgi Gerganov	60a3107ccd	scripts : option to increase git patch context	2024-09-08 11:05:55 +03:00
Salvatore Mesoraca	406c1a32a1	vulkan: add dryrun support to sin and cos ops (ggml/947) sin and cos failed test-backend-ops because they tried to dereference a context pointer that is null on dry runs. This commit prevents that segfault. Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>	2024-09-08 11:05:55 +03:00
Salvatore Mesoraca	9cb9260861	vulkan: correctly report support for OP_CONT (ggml/946) test-backend-ops fails because ggml_cont aborts when invoked passing an unsupported type. This commit makes ggml_cont tests pass Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com>	2024-09-08 11:05:55 +03:00
Johannes Gäßler	202084d31d	tests: add gradient tests for all backends (ggml/932) * tests: add gradient checking to test-backend-ops * remove old comment * reorder includes * adjust SIN/COS parameters * add documentation, use supports_op if possible	2024-09-08 11:05:55 +03:00
Johannes Gäßler	dbbebcab33	ggml: fix ggml_graph_cpy undefined behavior (ggml/943)	2024-09-08 11:05:55 +03:00
Georgi Gerganov	ba1cf846ed	cann : fix doxy (ggml/0)	2024-09-08 11:05:55 +03:00
Mengqing Cao	d2d3200b38	cann : add Ascend NPU support (whisper/2336) * enable Ascend NPU in src/whisper.cpp * sync test-backend-ops with llama.cpp	2024-09-08 11:05:55 +03:00
Georgi Gerganov	51d964a4ef	cuda : mark BF16 CONT as unsupported	2024-09-08 11:05:55 +03:00
Salvatore Mesoraca	efe6a83e30	ggml : fix cont with transposed tensors when one dimension is 1 (ggml/934) * ggml_cont: fix issue with transposed tensors when one dimension is 1 when using multiple threads, it is not enough to check for the tensors to be contiguous for ggml_compute_forward_dup_same_cont to work correctly. The tensors strides also need to match. Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> * Add ggml_cont tests Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> * Remove dead code it isn't possible to reach this code because all these functions are invoked by ggml_compute_forward_dup if and only if src0->type != dst->type Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> * Make ggml_compute_forward_dup_same_cont work with contiguous tensors Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> --------- Signed-off-by: Salvatore Mesoraca <s.mesoraca16@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-09-08 11:05:55 +03:00
Kevin Gibbons	fbb7fcffbc	llama : set attrs of mislabelled EOT/EOM tokens (#9348 )	2024-09-08 08:51:00 +03:00
Georgi Gerganov	a5b5d9a101	llama.android : fix build (#9350 )	2024-09-08 00:33:50 +03:00
Georgi Gerganov	f12295b8a9	llama : fix empty ring buffer push (#9358 )	2024-09-08 00:33:33 +03:00
Georgi Gerganov	faf69d4237	llama : sanitize invalid tokens (#9357 ) * common : do not add null tokens during warmup ggml-ci * llama : check that the input tokens are valid ggml-ci * tests : fix batch size of bert model ggml-ci	2024-09-08 00:33:13 +03:00
Eve	e536426ded	llamafile : disable sgemm for batch-size 1 (#9330 )	2024-09-07 22:02:26 +03:00

3933 Commits