llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-11-13 14:29:52 +00:00

Author	SHA1	Message	Date
slaren	2f0e81e053	cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy (#6208 ) * cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy * add LLAMA_CUDA_NO_PEER_COPY to HIP build	2024-03-22 14:05:31 +01:00
Xiaoyi Chen	29ab270e65	readme : add RecurseChat to the list of UIs (#6219 )	2024-03-22 13:29:49 +02:00
Jan Boon	6b8bb3a31d	server : fix n_keep always showing as 0 in response (#6211 )	2024-03-22 13:12:05 +02:00
Georgi Gerganov	68e210b354	server : enable continuous batching by default (#6231 )	2024-03-22 13:08:28 +02:00
Georgi Gerganov	b3e94f26ba	metal : proper assert for mat-mat memory alignment (#6225 ) * metal : proper assert for mat-mat memory alignment ggml-ci * readme : add notice about the bug fix * metal : fix the fix ggml-ci	2024-03-22 11:35:53 +02:00
Vaibhav Srivastav	b2075fd6a5	ci : add CURL flag for the mac builds (#6214 )	2024-03-22 09:53:43 +02:00
Georgi Gerganov	95d576b48e	metal : pad n_ctx by 32 (#6177 ) * metal : require ne00 >= 128 for mat-mat kernels ggml-ci * llama : pad n_ctx by 32 ggml-ci	2024-03-22 09:36:03 +02:00
Neo Zhang Jianyu	59c17f02de	add blog link (#6222 )	2024-03-22 15:19:37 +08:00
DAN™	fa046eafbc	Fix params underscore convert to dash. (#6203 ) * Fix params underscore convert to dash. * Update common/common.cpp --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-03-22 02:32:42 +01:00
Jan Boon	be07a03217	server : update readme doc from `slot_id` to `id_slot` (#6213 )	2024-03-21 23:41:24 +01:00
slaren	d0a71233fb	cuda : disable host register by default (#6206 )	2024-03-21 20:54:28 +02:00
semidark	f372c49ccd	Corrected typo to wrong file (#6199 ) The stated file `./devops/main-server.Dockerfile` does not exist. I figure that `.devops/server-intel.Dockerfile` was meant.	2024-03-21 18:52:35 +01:00
Georgi Gerganov	924ce1dce7	tests : disable system() calls (#6198 ) ggml-ci	2024-03-21 16:20:05 +02:00
slaren	03a8f8fafe	cuda : fix LLAMA_CUDA_F16 build (#6197 )	2024-03-21 14:59:53 +02:00
Kawrakow	cfd3be76e3	ggml : same IQ4_NL quantization for CPU/CUDA/Metal (#6196 ) * Make quantize_row_iq4_nl do the same thing is quantization on CUDA * Make quantize_row_iq4_nl do the same thing is quantization on CUDA This time for real. backend-ops tests pass. * Now fix test-quantize-fns --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-03-21 14:59:38 +02:00
Olivier Chafik	5b7b0ac8df	json-schema-to-grammar improvements (+ added to server) (#5978 ) * json: fix arrays (disallow `[,1]`) * json: support tuple types (`[number, string]`) * json: support additionalProperties (`{[k: string]: [string,number][]}`) * json: support required / optional properties * json: add support for pattern * json: resolve $ref (and support https schema urls) * json: fix $ref resolution * join: support union types (mostly for nullable types I think) * json: support allOf + nested anyOf * json: support any (`{}` or `{type: object}`) * json: fix merge * json: temp fix for escapes * json: spaces in output and unrestricted output spaces * json: add typings * json:fix typo * Create ts-type-to-grammar.sh * json: fix _format_literal (json.dumps already escapes quotes) * json: merge lit sequences and handle negatives {"type": "string", "pattern": "^({\"question\": \"[^\"]+\", \"response\": \"[^\"]+\"}\\n)+$"} * json: handle pattern repetitions * Update json-schema-to-grammar.mjs * Create regex-to-grammar.py * json: extract repeated regexp patterns to subrule * Update json-schema-to-grammar.py * Update json-schema-to-grammar.py * Update json-schema-to-grammar.py * json: handle schema from pydantic Optional fields * Update json-schema-to-grammar.py * Update json-schema-to-grammar.py * Update ts-type-to-grammar.sh * Update ts-type-to-grammar.sh * json: simplify nullable fields handling * json: accept duplicate identical rules * json: revert space to 1 at most * json: reuse regexp pattern subrules * json: handle uuid string format * json: fix literal escapes * json: add --allow-fetch * json: simplify range escapes * json: support negative ranges in patterns * Delete commit.txt * json: custom regex parser, adds dot support & JS-portable * json: rm trailing spaces * Update json-schema-to-grammar.mjs * json: updated server & chat `( cd examples/server && ./deps.sh )` * json: port fixes from mjs to python * Update ts-type-to-grammar.sh * json: support prefixItems alongside array items * json: add date format + fix uuid * json: add date, time, date-time formats * json: preserve order of props from TS defs * json: port schema converter to C++, wire in ./server * json: nits * Update json-schema-to-grammar.cpp * Update json-schema-to-grammar.cpp * Update json-schema-to-grammar.cpp * json: fix mjs implementation + align outputs * Update json-schema-to-grammar.mjs.hpp * json: test C++, JS & Python versions * json: nits + regen deps * json: cleanup test * json: revert from c++17 to 11 * json: nit fixes * json: dirty include for test * json: fix zig build * json: pass static command to std::system in tests (fixed temp files) * json: fix top-level $refs * json: don't use c++20 designated initializers * nit * json: basic support for reserved names `{number:{number:{root:number}}}` * Revamp test cmake to allow args (WORKING_DIRECTORY needed for JSON test) * json: re-ran server deps.sh * json: simplify test * json: support mix of additional props & required/optional * json: add tests for some expected failures * json: fix type=const in c++, add failure expectations for non-str const&enum * json: test (& simplify output of) empty schema * json: check parsing in test + fix value & string refs * json: add server tests for OAI JSON response_format * json: test/fix top-level anyOf * json: improve grammar parsing failures * json: test/fix additional props corner cases * json: fix string patterns (was missing quotes) * json: ws nit * json: fix json handling in server when there's no response_format * json: catch schema conversion errors in server * json: don't complain about unknown format type in server if unset * json: cleaner build of test * json: create examples/json-schema-pydantic-example.py * json: fix date pattern * json: move json.hpp & json-schema-to-grammar.{cpp,h} to common * json: indent 4 spaces * json: fix naming of top-level c++ function (+ drop unused one) * json: avoid using namespace std * json: fix zig build * Update server.feature * json: iostream -> fprintf * json: space before & refs for consistency * json: nits	2024-03-21 11:50:43 +00:00
Vaibhav Srivastav	1943c01981	ci : fix indentation error (#6195 )	2024-03-21 11:30:40 +02:00
Vaibhav Srivastav	5e43ba8742	build : add mac pre-build binaries (#6182 ) * Initial commit - add mac prebuilds. * forward contribution credits for building the workflow. * minor : remove trailing whitespaces --------- Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-21 11:13:12 +02:00
Kawrakow	76aa30a263	Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183 ) * k_cache: be able to use Q5_0 * k_cache: be able to use Q5_1 on CODA * k_cache: be able to use Q5_0 on Metal * k_cache: be able to use Q5_1 on Metal * k_cache: be able to use IQ4_NL - just CUDA for now * k_cache: be able to use IQ4_NL on Metal * k_cache: add newly added supported types to llama-bench and CUDA supports_op --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2024-03-21 08:27:57 +01:00
AidanBeltonS	c5b8595e3f	Add nvidia and amd backends (#6157 )	2024-03-21 11:40:52 +05:30
slaren	42e21c6882	cuda : fix conflict with std::swap (#6186 )	2024-03-21 01:47:46 +01:00
slaren	1c51f98adc	cuda : print the returned error when CUDA initialization fails (#6185 )	2024-03-20 21:03:26 +01:00
Ziang Wu	f9c7ba3447	llava : update MobileVLM-README.md (#6180 )	2024-03-20 17:29:51 +02:00
Ziang Wu	272935b281	llava : add MobileVLM_V2 backup (#6175 ) * Add MobileVLM_V2 backup * Update MobileVLM-README.md * Update examples/llava/MobileVLM-README.md Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update examples/llava/convert-image-encoder-to-gguf.py Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * clip : fix whitespace * fix deifinition mistake in clip.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-20 17:02:32 +02:00
slaren	ccf58aa3ec	cuda : refactor to remove global resources (#6170 ) * cuda : refactor to remove global resources	2024-03-20 14:42:59 +01:00
Xuan Son Nguyen	91f8ad167d	Server: version bump for httplib and json (#6169 ) * server: version bump for httplib and json * fix build * bring back content_length	2024-03-20 13:30:36 +01:00
Georgi Gerganov	6b7e76d28c	gitignore : ignore curl-related files	2024-03-20 14:17:34 +02:00
Georgi Gerganov	bc0baab2ea	server : allow to override -ngl in tests (#6170 )	2024-03-20 14:14:32 +02:00
Georgi Gerganov	d795988d9e	Revert "llava : add a MobileVLM_V2-1.7B backup (#6152 )" This reverts commit `f8c4e745e1`.	2024-03-20 13:29:49 +02:00
Ziang Wu	f8c4e745e1	llava : add a MobileVLM_V2-1.7B backup (#6152 ) * Add MobileVLM_V2 backup * Update MobileVLM-README.md * Update examples/llava/MobileVLM-README.md Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update examples/llava/convert-image-encoder-to-gguf.py Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * clip : fix whitespace --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-20 13:20:37 +02:00
Karthick	47cc7a7bf9	Server: Handle n_keep parameter in the request (#6174 )	2024-03-20 12:02:34 +01:00
Jared Van Bortel	bd60d82d0c	server tests : more pythonic process management; fix bare `except:` (#6146 ) * server tests : remove seemingly redundant newlines in print() * server tests : use built-in subprocess features, not os.kill and psutil * server tests : do not catch e.g. SystemExit; use print_exc * server tests: handle TimeoutExpired exception * server tests: fix connect on dual-stack systems * server: tests: add new tokens regex on windows generated following new repeat penalties default changed in (#6127) * server: tests: remove the hack on windows since now we get the good socket family * server: tests: add new tokens regex following new repeat penalties default changed in (#6127) * server: tests: add new tokens regex following new repeat penalties default changed in (#6127) --------- Co-authored-by: Pierrick HYMBERT <pierrick.hymbert@gmail.com>	2024-03-20 06:33:49 +01:00
Neo Zhang Jianyu	6c0b287748	update readme sycl for new update (#6151 ) * update readme sycl for new update * Update README-sycl.md Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com> * Update README-sycl.md Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com> * Update README-sycl.md Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com> * Update README-sycl.md Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com> * Update README-sycl.md Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com> * Update README-sycl.md Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com> * update by review comments * update w64devkit link * update for verify device id part * Update README-sycl.md Co-authored-by: Meng, Hengyu <airdldl@163.com> --------- Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com> Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com> Co-authored-by: Meng, Hengyu <airdldl@163.com>	2024-03-20 11:21:41 +08:00
Abhilash Majumder	d26e8b669d	increase igpu cluster limit (#6159 )	2024-03-20 08:28:49 +05:30
DAN™	d8b009a945	Remove undeed header file. (#6158 )	2024-03-19 17:16:09 +01:00
Pierrick Hymbert	d0d5de42e5	gguf-split: split and merge gguf per batch of tensors (#6135 ) * gguf-split: split and merge gguf files per tensor * gguf-split: build with make toolchain * gguf-split: rename `--split-tensors-size` to `--split-max-tensors`. Set general.split_count KV to all split * split : minor style + fix compile warnings * gguf-split: remove --upload not implemented --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-19 12:05:44 +01:00
Georgi Gerganov	b80cf3b2d1	common : disable repeat penalties by default (#6127 )	2024-03-19 10:21:54 +02:00
slaren	970a48060a	ci : exempt some labels from being tagged as stale (#6140 )	2024-03-19 10:06:54 +02:00
DAN™	4c28b82529	common : print usage on '-h' and '--help' (#6145 )	2024-03-19 07:59:36 +02:00
github-actions[bot]	2d15886bb0	flake.lock: Update Flake lock file updates: • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/9df3e30ce24fd28c7b3e2de0d986769db5d6225d' (2024-03-06) → 'github:NixOS/nixpkgs/d691274a972b3165335d261cc4671335f5c67de9' (2024-03-14)	2024-03-18 18:51:30 +00:00
Jared Van Bortel	d199ca79f2	mpt : implement backwards compatiblity with duped output tensor (#6139 )	2024-03-18 12:49:02 -04:00
Felix	104f5e0fc1	clip : fix memory leak (#6138 )	2024-03-18 17:40:22 +02:00
slaren	5e1b7f94a0	backend : set max split inputs to GGML_MAX_SRC (#6137 )	2024-03-18 16:33:44 +01:00
Georgi Gerganov	ac9ee6a4ad	ci : disable stale issue messages (#6126 )	2024-03-18 13:45:38 +02:00
Georgi Gerganov	4f6d1337ca	ci : temporary disable sanitizer builds (#6128 )	2024-03-18 13:45:27 +02:00
slaren	2bf8d0f7c4	backend : offload large batches to GPU (#6083 ) * backend : offload large batches to GPU * fix hip * code cleanup * fix CUDA split buffers * Update ggml-backend-impl.h Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * cuda : fix memset without set_device * imatrix : remove sched affix from weight names * sched : add a new split if the current one has too many inputs reduce max inputs per split more cleanup * update backends ggml-ci --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2024-03-18 11:03:04 +01:00
DAN™	496bc79bc2	common : tidy-up argument parsing (#6105 ) * Tidy-up argument parsing. * Missing ref. * common : minor * common : add static classifier --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-18 10:27:44 +02:00
Thérence	9b03719ad7	convert : add support for CamembertModel architecture (#6119 ) Adding support for CamembertModel architecture used by : https://huggingface.co/dangvantuan/sentence-camembert-large	2024-03-18 10:17:00 +02:00
Romain D	3a6efdd03c	convert : use f32 outtype for bf16 tensors (#6106 ) The old behaviour is to use f16, but bf16 to f16 is not a lossless conversion. Change the outtype to f32 to default to a lossless conversion.	2024-03-18 10:04:41 +02:00
Pierrick Hymbert	d01b3c4c32	common: llama_load_model_from_url using --model-url (#6098 ) * common: llama_load_model_from_url with libcurl dependency Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-17 19:12:37 +01:00

2849 Commits