llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-27 03:44:35 +00:00

Author	SHA1	Message	Date
Ronsor	47857e564c	Don't use vdotq_s32 if it's not available (#139) * Don't use vdotq_s32 if it's not available `dotprod` extensions aren't available on some ARM CPUs (e.g. Raspberry Pi 4), so check for them and only use them if they're available. Reintroduces the code removed in `84d9015` if `__ARM_FEATURE_DOTPROD` isn't defined. * Update ggml.c --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-14 21:34:37 +02:00
Radoslav Gerganov	60f819a2b1	Add section to README on how to run the project on Android (#130)	2023-03-14 15:30:08 +02:00
Georgi Gerganov	97ab2b2578	Add Misc section + update hot topics + minor fixes	2023-03-14 09:43:52 +02:00
Sebastián A	2f700a2738	Add windows to the CI (#98)	2023-03-13 22:29:10 +02:00
Georgi Gerganov	c09a9cfb06	CMake build in Release by default (#75)	2023-03-13 21:22:15 +02:00
Georgi Gerganov	7ec903d3c1	Update contribution section, hot topics, limitations, etc.	2023-03-13 19:21:51 +02:00
Georgi Gerganov	4497ad819c	Print system information	2023-03-13 19:15:08 +02:00
Sebastián A	ed6849cc07	Initial support for CMake (#75)	2023-03-13 19:12:33 +02:00
Thomas Klausner	41be0a3b3d	Add NetBSD support. (#90)	2023-03-13 18:40:54 +02:00
Pavol Rusnak	671d5cac15	Use fprintf for diagnostic output (#48) keep printf only for printing model output one can now use ./main ... 2>dev/null to suppress any diagnostic output	2023-03-13 18:39:56 +02:00
Georgi Gerganov	84d9015c4a	Use vdotq_s32 to improve performance (#67) * 10% performance boost on ARM * Back to original change	2023-03-13 18:36:44 +02:00
uint256_t	63fd76fbb0	Reduce model loading time (#43) * Use buffering * Use vector * Minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-13 18:33:43 +02:00
Val Kharitonov	2a20f48efa	Fix UTF-8 handling (including colors) (#79)	2023-03-13 18:24:18 +02:00
Pavol Rusnak	d1f224712d	Add quantize script for batch quantization (#92) * Add quantize script for batch quantization * Indentation * README for new quantize.sh * Fix script name * Fix file list on Mac OS --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-13 18:15:20 +02:00
Georgi Gerganov	1808ee0500	Add initial contribution guidelines	2023-03-13 09:42:26 +02:00
Matvey Soloviev	a169bb889c	Gate signal support on being on a unixoid system. (#74)	2023-03-13 04:08:01 +01:00
Matvey Soloviev	460c482540	Fix token count accounting	2023-03-13 01:04:41 +01:00
Georgi Gerganov	c80e2a8f2a	Revert "10% performance boost on ARM" This reverts commit `113a9e83eb`. There are some reports for illegal instruction. Moved this stuff to vdotq_s32 branch until resolve	2023-03-13 01:28:08 +02:00
Georgi Gerganov	54a0e66ea0	Check for vdotq_s32 availability	2023-03-13 01:21:03 +02:00
Georgi Gerganov	543c57e991	Ammend to previous commit - forgot to update non-QRDMX branch	2023-03-13 01:05:24 +02:00
Georgi Gerganov	113a9e83eb	10% performance boost on ARM	2023-03-13 00:56:10 +02:00
Matvey Soloviev	404fac0d62	Fix color getting reset before prompt output done (#65) (cherry picked from commit 7eb2987619feee04c40eff69b604017d09919cb6)	2023-03-13 00:07:34 +02:00
Georgi Gerganov	1a0a74300f	Update README.md	2023-03-12 23:39:01 +02:00
Matvey Soloviev	96ea727f47	Add interactive mode (#61) * Initial work on interactive mode. * Improve interactive mode. Make rev. prompt optional. * Update README to explain interactive mode. * Fix OS X build	2023-03-12 23:13:28 +02:00
Marc Köhlbrugge	9661954835	Fix typo in README (#45)	2023-03-12 22:30:08 +02:00
Ben Garney	f385f8dee8	Allow using prompt files (#59)	2023-03-12 22:28:36 +02:00
beiller	02f0c6fe7f	Add back top_k (#56) * Add back top_k * Update utils.cpp * Update utils.h --------- Co-authored-by: Bill Hamilton <bill.hamilton@shopify.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-12 22:23:15 +02:00
Sebastián A	eb062bb012	Windows fixes (#31) * Apply fixes suggested to build on windows Issue: https://github.com/ggerganov/llama.cpp/issues/22 * Remove unsupported VLAs * MSVC: Remove features that are only available on MSVC C++20. * Fix zero initialization of the other fields. * Change the use of vector for stack allocations.	2023-03-12 22:15:00 +02:00
Georgi Gerganov	7027a97837	Update README.md	2023-03-12 22:09:26 +02:00
Georgi Gerganov	2d555e5b42	Add CI (#60)	2023-03-12 22:08:24 +02:00
Georgi Gerganov	7c9e54e55e	Revert "weights_only" arg - this causing more trouble than help	2023-03-12 20:59:01 +02:00
Oleksandr Nikitin	b9bd1d0141	python/pytorch compat notes (#44)	2023-03-12 14:16:33 +02:00
beiller	129c7d1ea8	Add repetition penalty (#20) * Adding repeat penalization * Update utils.h * Update utils.cpp * Numeric fix Should probably still scale by temp even if penalized * Update comments, more proper application I see that numbers can go negative so a fix from a referenced commit * Minor formatting --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-12 11:27:42 +02:00
Georgi Gerganov	702fddf5c5	Clarify meaning of hacking	2023-03-12 09:03:25 +02:00
Georgi Gerganov	7d86e25bf6	README: add "Supported platforms" + update hot topics	2023-03-12 08:41:54 +02:00
deepdiffuser	a93120236f	use weights_only in conversion script (#32) this restricts malicious weights from executing arbitrary code by restricting the unpickler to only loading tensors, primitive types, and dictionaries	2023-03-12 08:36:35 +02:00
Pavol Rusnak	6a9a67f0be	Add LICENSE (#21)	2023-03-12 08:36:03 +02:00
Georgi Gerganov	da1a4ff01f	Update README.md	2023-03-12 01:26:32 +02:00
Juraj Bednar	6b2cb6302f	Fix a typo in model name (#16)	2023-03-11 19:32:20 +02:00
Georgi Gerganov	4235e3d5b3	Update README.md	2023-03-11 18:10:18 +02:00
Georgi Gerganov	f1eaff4721	Add AVX2 support for x86 architectures thanks to @Const-me !	2023-03-11 18:04:25 +02:00
Georgi Gerganov	a9e58529ea	Fix un-initialized FP16 tables on x86 (#15, #2)	2023-03-11 17:40:14 +02:00
Georgi Gerganov	7d9ed7b25f	Bump memory buffer	2023-03-11 12:45:01 +02:00
Georgi Gerganov	0c6803321c	Update README.md	2023-03-11 12:31:21 +02:00
Georgi Gerganov	f60fa9e50a	.gitignore models/	2023-03-11 12:27:02 +02:00
Georgi Gerganov	7211862c94	Update Makefile var + add comment	2023-03-11 12:27:02 +02:00
Georgi Gerganov	a5c5ae2f54	Update README.md	2023-03-11 11:34:25 +02:00
Georgi Gerganov	ea977e85ec	Update README.md	2023-03-11 11:34:11 +02:00
Georgi Gerganov	007a8f6f45	Support all LLaMA models + change Q4_0 quantization storage	2023-03-11 11:28:30 +02:00
Simon Willison	5f2f970d51	Include Python dependencies in README (#6)	2023-03-11 07:47:26 +02:00


3363 Commits