llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-05 00:04:36 +00:00

Author	SHA1	Message	Date
Cebtenzzre	6b6c73a9e3	kompute : don't fail build because of -Warray-bounds There are some warnings in debug builds that are likely to be false positives.	2023-11-03 17:22:21 -04:00
Adam Treat	1b1416d7b7	Support for gguf.	2023-11-03 17:22:20 -04:00
Georgi Gerganov	fcca0a7004	refact : fix convert script + zero out KV cache to avoid nans (#3523 ) * refact : fix convert script + zero out KV cache to avoid nans * ggml : silu(-inf) should never happen * metal : assert various kernel requirements	2023-10-09 14:32:17 +03:00
Georgi Gerganov	dcc09d2596	metal : do not use mul_mm kernels when ne00 < 64 (#3542 )	2023-10-09 14:28:27 +03:00
Georgi Gerganov	db3abcc114	sync : ggml (ggml-backend) (#3548 ) * sync : ggml (ggml-backend) ggml-ci * zig : add ggml-backend to the build	2023-10-08 20:19:14 +03:00
Matheus C. França	eee42c670e	ci : add Zig CI/CD and fix build (#2996 ) * zig CI/CD and fix build Signed-off-by: Matheus Catarino França <matheus-catarino@hotmail.com> * fix build_compiler * ci : remove trailing whitespace --------- Signed-off-by: Matheus Catarino França <matheus-catarino@hotmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-08 16:59:20 +03:00
Ryder Wishart	8e6716a102	api_like_OAI.py : compat with Microsoft Guidance (#2746 ) Check for None in addition to empty string check in all request params Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-08 13:55:58 +03:00
arcrank	9c38d181d4	api_like_OAI.py : simplify function (#2796 ) Simplify function	2023-10-08 13:52:57 +03:00
Johannes Rudolph	a1202a31ed	k-quants : fix comments about block sizing (#3499 )	2023-10-08 13:21:19 +03:00
Georgi Gerganov	94e502dfb7	ci : enable on obj-c changes + fix metal build (#3540 )	2023-10-08 11:24:50 +03:00
Luo Tian	7d8b24932f	zig : fix build by introducing train.cpp (#3539 )	2023-10-08 11:24:01 +03:00
Georgi Gerganov	b0ec5218c3	metal : support MTLGPUFamily < Apple7, formatting, style (#3524 ) * metal : improve decoding speed for batches of 2-16 * metal : rename kernels mul_mat_ to mul_mv_ * metal : indentations * minor * metal : print more GPU info + disable mul_mm for MTLGPUFamiliy < Apple7	2023-10-08 10:01:53 +03:00
Kerfuffle	63d3b06a43	llama : fix missing break in Persimmon arch case statements (#3535 )	2023-10-08 08:22:17 +03:00
Kerfuffle	a16e89cec8	Fix trying to strip newline from empty prompt and cfg prompt file content (#3534 )	2023-10-07 15:31:41 -06:00
M. Yusuf Sarıgöz	4d03833211	gguf.py : fix CI for publishing GGUF package (#3532 ) * Fix CI for publishing GGUF package * Bump version * fix * bump version * bump version * bump version	2023-10-07 22:14:10 +03:00
Tom C	c47066d833	py : change version of numpy requirement to 1.24.4 (#3515 ) Co-authored-by: Lyjia <me@lyjia.us>	2023-10-07 12:56:15 +03:00
cebtenzzre	f1782c68de	quantize : fail fast on write errors (#3521 )	2023-10-07 11:41:52 +03:00
Jhen-Jie Hong	c26765a0a1	metal : support default.metallib load & reuse code for swift package (#3522 ) * metal : support load default.metallib & reuse code for swift package * metal : use SWIFT_PACKAGE def instead of define GGML_SWIFT	2023-10-07 11:40:27 +03:00
Phillip Kravtsov	0e797c2fc5	llm : support Adept Persimmon 8B (#3410 ) * Produces garbage output * wip: correct tensors up to RoPE * correct tensors thru RoPE * Correct outputs through masked & softmax'd KQ * fp32 works * Rename adept->persimmon * Produces correct outputs * clean up convert scripts * remove printing logic from ggml.c * remove prints from llama.cpp & fix merge * trivial cleanups * Add offload funcs * update conversion script to directly take adept artifacts rather than .saftensors file * Fix norm eps bug * Support sqr and concat on metal, persimmon-8b-q4 runs correctly * Small changes from review * Formatting changes * Minor changes to conversion script * Remove old script * Fix editorconfig formatting * Fix build * add overlooked offload code ggml-ci	2023-10-07 10:12:43 +03:00
goerch	3a716b4dae	Fix for #3454 (#3455 ) Fix: `sentencepiece` tokenizers with added tokens failed with an incorrect assertion	2023-10-07 06:57:01 +02:00
BarfingLemurs	1faaae8c2b	readme : update models, cuda + ppl instructions (#3510 )	2023-10-06 22:13:36 +03:00
Mihai	cb13d73a72	server : docs fix default values and add n_probs (#3506 )	2023-10-06 21:39:33 +03:00
Kerfuffle	9ca79d5cbb	kv cache slot search improvements (#3493 ) * kv cache slot search improvements * Use n_ctx in kv find slot for consistency * Ensure kv cache head points to a valid slot in llama_decode internal * Add some comments to prevent dumb people (like me) from getting confused.	2023-10-06 10:10:13 -06:00
Georgi Gerganov	0c731ca403	prompts : fix editorconfig checks after #3416	2023-10-06 16:36:32 +03:00
pudepiedj	a8777ad84e	parallel : add option to load external prompt file (#3416 ) * Enable external file and add datestamp * Add name of external file at end * Upload ToK2024 * Delete ToK2024.txt * Experiments with jeopardy * Move ParallelQuestions to /proimpts and rename * Interim commit * Interim commit * Final revision * Remove trailing whitespace * remove cmake_all.sh * Remove cmake_all.sh * Changed .gitignore * Improved reporting and new question files. * Corrected typo * More LLM questions * Update LLM-questions.txt * Yet more LLM-questions * Remove jeopardy results file * Reinstate original jeopardy.sh * Update examples/parallel/parallel.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-06 16:16:38 +03:00
Jhen-Jie Hong	97af49fa39	server : reuse llama_sample_token common util (#3494 ) * server : reuse llama_sample_token common function * common : use n_probs for temperature sampling	2023-10-06 15:44:24 +03:00
l3utterfly	16820a5a0d	llama : correct hparams comparison (#3446 ) * fixed floating point comparison issues * updated implementation for hparam comparison to handle inf and NaN * fixed code review comments * minor simplification * rename is_float_eq -> is_float_close --------- Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>	2023-10-06 13:47:59 +03:00
Jhen-Jie Hong	04b2f4386e	ci : fix xcodebuild destinations (#3491 ) * ci : fix xcodebuild destinations * ci : add .swift to paths	2023-10-06 13:36:43 +03:00
cebtenzzre	48edda30ee	convert : update Falcon script for new HF config (#3448 ) Also adds Falcon-180B support. Closes #3049 Co-authored-by: jb <jonathan.t.barnard@gmail.com>	2023-10-05 15:00:34 -04:00
Adam Treat	2c24d67e7b	Don't crash on available devices if we can't even create an instance.	2023-10-05 13:39:18 -04:00
Adam Treat	addac25293	Set the singleton to nullptr here.	2023-10-05 13:39:18 -04:00
Adam Treat	68aca6be08	Only use vulkan with known quant that work.	2023-10-05 13:39:18 -04:00
Adam Treat	4ed25b2f88	Sync from device back to host at begin of new prompt.	2023-10-05 13:39:18 -04:00
Adam Treat	bd5f6399bb	Don't try and install kompute artifacts.	2023-10-05 13:39:18 -04:00
Aaron Miller	8bea719879	vulkan: disambiguate gpus with the same name	2023-10-05 13:39:18 -04:00
Adam Treat	68cf1df6fb	Throw an exception when allocation fails for vulkan.	2023-10-05 13:39:18 -04:00
Aaron Miller	beee57266f	Make kompute actually include external SDK headers when requested	2023-10-05 13:39:18 -04:00
Adam Treat	b7e2e691d4	Completely revamp how we do object management with the vulkan backend and stop using so many static objects so we can tear down and bring up vulkan on new devices in the same runtime.	2023-10-05 13:39:18 -04:00
Adam Treat	45c8778b49	Switch to a dynamic dispatch table instead of linking hard against libvulkan.	2023-10-05 13:39:18 -04:00
Aaron Miller	8563fa001f	remove dynamic deps from kompute build should no longer have new external deps other than libvulkan ``` ubuntu@ip-172-31-1-24:~/repo/gpt4all/gpt4all-backend/build$ ldd ./libllamamodel-mainline-avxonly.so linux-vdso.so.1 (0x00007ffcb53bb000) libvulkan.so.1 => /lib/x86_64-linux-gnu/libvulkan.so.1 (0x00007f239dab5000) libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f239d800000) libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f239d719000) libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f239da95000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f239d400000) /lib64/ld-linux-x86-64.so.2 (0x00007f239dd1d000) ```	2023-10-05 13:39:18 -04:00
Adam Treat	48a45ea435	Remove warning which fails on windows.	2023-10-05 13:39:18 -04:00
niansa	ba15dfd0be	Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0.	2023-10-05 13:39:18 -04:00
Kenvix ⭐	45eba9369f	build : use std::make_tuple() for compatibility with older GCC versions (#3488 )	2023-10-05 20:16:39 +03:00
staviq	acec9eaaa9	common : process escape sequences in reverse prompts (#3461 )	2023-10-05 19:17:29 +03:00
shibe2	e2583cbc29	CLBlast: Fix handling of on-device tensor data Fix uploading tensor data to device, including 3D, 4D, and non-contiguous tensors. Use correct offsets into data that is already in VRAM. Correct handling of OpenCL events when multiple commands are queued.	2023-10-05 18:25:23 +04:00
Jhen-Jie Hong	e8b8d32e86	server : fix incorrect num_tokens_predicted (#3480 )	2023-10-05 17:02:55 +03:00
Jhen-Jie Hong	8f3a642ec1	swift : disable ACCELERATE_NEW_LAPACK (#3481 )	2023-10-05 17:00:07 +03:00
Jhen-Jie Hong	0745384449	ci : add swift build via xcodebuild (#3482 )	2023-10-05 16:56:21 +03:00
Kerfuffle	019ba1dcd0	convert : fix Baichuan2 models by using vocab size in config.json (#3299 ) Use local GGUF package when possible in Baichuan converter	2023-10-04 17:20:28 +03:00
Georgi Gerganov	beabc8cfb0	readme : add project status link	2023-10-04 16:50:44 +03:00

1421 Commits