Aaron Miller
8bea719879
vulkan: disambiguate gpus with the same name
2023-10-05 13:39:18 -04:00
Adam Treat
68cf1df6fb
Throw an exception when allocation fails for vulkan.
2023-10-05 13:39:18 -04:00
Aaron Miller
beee57266f
Make kompute actually include external SDK headers when requested
2023-10-05 13:39:18 -04:00
Adam Treat
b7e2e691d4
Completely revamp how we do object management with the vulkan backend and
stop using so many static objects so we can tear down and bring up vulkan
on new devices in the same runtime.
2023-10-05 13:39:18 -04:00
Adam Treat
45c8778b49
Switch to a dynamic dispatch table instead of linking hard against libvulkan.
2023-10-05 13:39:18 -04:00
Aaron Miller
8563fa001f
remove dynamic deps from kompute build
should no longer have new external deps other than libvulkan
```
ubuntu@ip-172-31-1-24:~/repo/gpt4all/gpt4all-backend/build$ ldd ./libllamamodel-mainline-avxonly.so
linux-vdso.so.1 (0x00007ffcb53bb000)
libvulkan.so.1 => /lib/x86_64-linux-gnu/libvulkan.so.1 (0x00007f239dab5000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f239d800000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f239d719000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f239da95000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f239d400000)
/lib64/ld-linux-x86-64.so.2 (0x00007f239dd1d000)
```
2023-10-05 13:39:18 -04:00
Adam Treat
48a45ea435
Remove warning which fails on Windows.
2023-10-05 13:39:18 -04:00
niansa
ba15dfd0be
Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0.
2023-10-05 13:39:18 -04:00
Kevin Ji
45855b3f1c
docs : mark code as Bash ( #3375 )
2023-09-28 09:11:32 -04:00
Pierre Alexandre SCHEMBRI
4aea3b846e
readme : add Mistral AI release 0.1 ( #3362 )
2023-09-28 15:13:37 +03:00
slaren
da0400344b
ggml-cuda : perform cublas fp16 matrix multiplication as fp16 ( #3370 )
* ggml-cuda : perform cublas fp16 matrix multiplication as fp16
* try to fix rocm build
* restrict fp16 mat mul to volta and up
2023-09-28 13:08:28 +03:00
Zhang Peiyuan
e519621010
convert : remove bug in convert.py permute function ( #3364 )
2023-09-27 20:45:20 +02:00
Richard Roberson
ac43576124
make-ggml.py : compatibility with more models and GGUF ( #3290 )
* Resync my fork with new llama.cpp commits
* examples : rename to use dash instead of underscore
* New model conversions
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-27 19:25:12 +03:00
Cebtenzzre
20c7e1e804
gguf : fix a few general keys ( #3341 )
2023-09-27 12:18:07 -04:00
Rickard Hallerbäck
dc6897404e
metal : reusing llama.cpp logging ( #3152 )
* metal : reusing llama.cpp logging
* cmake : build fix
* metal : logging callback
* metal : logging va_args memory fix
* metal : minor cleanup
* metal : setting function like logging macro to capital letters
* llama.cpp : trailing whitespace fix
* ggml : log level enum used by llama
* Makefile : cleanup ggml-metal recipe
* ggml : ggml_log_callback typedef
* ggml : minor
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-27 18:48:33 +03:00
Jag Chadha
527e57cfd8
build : add ACCELERATE_NEW_LAPACK to fix warning on macOS Sonoma ( #3342 )
2023-09-27 18:34:32 +03:00
BarfingLemurs
ffe88a36a9
readme : add some recent perplexity and bpw measurements to READMES, link for k-quants ( #3340 )
* Update README.md
* Update README.md
* Update README.md with k-quants bpw measurements
2023-09-27 18:30:36 +03:00
DAN™
99115f3fa6
cmake : fix build-info.h on MSVC ( #3309 )
2023-09-25 18:45:33 -04:00
2f38b454
1726f9626f
docs: Fix typo CLBlast_DIR var. ( #3330 )
2023-09-25 20:24:52 +02:00
Erik Scholz
a98b1633d5
nix : add cuda, use a symlinked toolkit for cmake ( #3202 )
2023-09-25 13:48:30 +02:00
slaren
c091cdfb24
llama-bench : add README ( #3317 )
* llama-bench : add README
* minor edit
2023-09-23 21:48:24 +02:00
Cebtenzzre
51a7cf5c6e
examples : fix RoPE defaults to match PR #3240 ( #3315 )
2023-09-23 12:28:50 +03:00
Kevin Ji
bedb92b603
scripts : use /usr/bin/env in shebang ( #3313 )
2023-09-22 23:52:23 -04:00
Lee Drake
bc9d3e3971
Update README.md ( #3289 )
* Update README.md
* Update README.md
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2023-09-21 21:00:24 +02:00
shibe2
36b904e200
ggml-opencl.cpp: Make private functions static ( #3300 )
2023-09-21 14:10:26 -04:00
Edward Taylor
324f3403d5
zig : fix for updated c lib ( #3259 )
2023-09-21 12:08:20 +03:00
yuiseki
f56c418ab0
embedding : update README.md ( #3224 )
2023-09-21 11:57:40 +03:00
Johannes Gäßler
8185710a80
CUDA: use only 1 thread if fully offloaded ( #2915 )
2023-09-21 11:43:53 +03:00
Georgi Gerganov
7eb41179ed
readme : update hot topics
2023-09-20 20:48:22 +03:00
Cebtenzzre
a5661d7e71
llama : allow gguf RoPE keys to be overridden with defaults ( #3240 )
2023-09-20 12:12:47 -04:00
Cebtenzzre
65c2c1c5ab
benchmark-matmult : do not use integer abs() on a float ( #3277 )
2023-09-20 12:06:08 -04:00
kang
80834daecf
flake : Restore default package's buildInputs ( #3262 )
2023-09-20 15:48:22 +02:00
Alon
a40f2b656f
CI: FreeBSD fix ( #3258 )
* - freebsd ci: use qemu
2023-09-20 14:06:36 +02:00
Georgi Gerganov
d119c04c15
examples : fix benchmark-matmult ( #1554 )
The precision for Q4_0 has degraded since #1508
2023-09-20 10:02:39 +03:00
Cebtenzzre
8781013ef6
make : restore build-info.h dependency for several targets ( #3205 )
2023-09-18 10:03:53 -04:00
Erik Scholz
7ddf185537
ci : switch cudatoolkit install on windows to networked ( #3236 )
2023-09-18 02:21:47 +02:00
Johannes Gäßler
ee66942d7e
CUDA: fix peer access logic ( #3231 )
2023-09-17 23:35:20 +02:00
Johannes Gäßler
111163e246
CUDA: enable peer access between devices ( #2470 )
2023-09-17 16:37:53 +02:00
slaren
8b428c9bc8
llama.cpp : show model size and BPW on load ( #3223 )
2023-09-17 14:33:28 +02:00
Johannes Gäßler
578d8c8f5c
CUDA: fix scratch malloced on non-main device ( #3220 )
2023-09-17 14:16:22 +02:00
IsaacDynamo
b541b4f0b1
Enable BUILD_SHARED_LIBS=ON on all Windows builds ( #3215 )
2023-09-16 19:35:25 +02:00
Vlad
5dbc2b3213
Enable build with CUDA 11.0 (make) ( #3132 )
* CUDA 11.0 fixes
* Cleaner CUDA/host flags separation
Also renamed GGML_ASSUME into GGML_CUDA_ASSUME
2023-09-16 16:55:43 +02:00
goerch
b08e75baea
Fixing the last deviations from sentencepiece indicated by test-tokenizer-1 ( #3170 )
* Fix for #2721
* Reenable tokenizer test for LLaMa
* Add `console.cpp` dependency
* Fix dependency to `common`
* Fixing wrong fix.
* Make console usage platform specific
Work on compiler warnings.
* Adapting makefile
* Remove trailing whitespace
* Adapting the other parts of the makefile
* Fix typo.
* Fixing the last deviations from sentencepiece indicated by test-tokenizer-1
* Simplify logic
* Add missing change...
* Fix ugly compiler warning
* llama_tokenize should accept strings containing NUL now
* Adding huichen's test case
2023-09-16 13:41:33 +02:00
Cebtenzzre
e6616cf0db
examples : add compiler version and target to build info ( #2998 )
2023-09-15 16:59:49 -04:00
Cebtenzzre
3aefaab9e5
check C++ code with -Wmissing-declarations ( #3184 )
2023-09-15 15:38:27 -04:00
Cebtenzzre
69eb67e282
fix build numbers by setting fetch-depth=0 ( #3197 )
2023-09-15 15:18:15 -04:00
Meng Zhang
4fe09dfe66
llama : add support for StarCoder model architectures ( #3187 )
* add placeholder of starcoder in gguf / llama.cpp
* support convert starcoder weights to gguf
* convert MQA to MHA
* fix ffn_down name
* add LLM_ARCH_STARCODER to llama.cpp
* set head_count_kv = 1
* load starcoder weight
* add max_position_embeddings
* set n_positions to max_position_embeddings
* properly load all starcoder params
* fix head count kv
* fix comments
* fix vram calculation for starcoder
* store mqa directly
* add input embeddings handling
* add TBD
* working in cpu, metal buggy
* cleanup useless code
* metal : fix out-of-bounds access in soft_max kernels
* llama : make starcoder graph build more consistent with others
* refactor: cleanup comments a bit
* add other starcoder models: 3B, 7B, 15B
* support-mqa-directly
* fix: remove max_position_embeddings, use n_train_ctx
* Update llama.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update llama.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix: switch to space from tab
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-15 22:02:13 +03:00
Cebtenzzre
80291a1d02
common : do not use GNU zero-length __VA_ARGS__ extension ( #3195 )
2023-09-15 21:02:01 +03:00
Georgi Gerganov
c6f1491da0
metal : fix bug in soft_max kernels (out-of-bounds access) ( #3194 )
2023-09-15 20:17:24 +03:00
Cebtenzzre
e3d87a6c36
convert : make ftype optional in simple scripts ( #3185 )
2023-09-15 12:29:02 -04:00