llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-11-18 08:49:52 +00:00

Author	SHA1	Message	Date
Qingyou Meng	084e2f0ec0	interactive mode: print '\n' in sigint_handler; this flushes stdout and thus ensures the color reset. (#283 )	2023-03-19 20:10:00 +02:00
Erik Scholz	0b366e7357	Command line switch to use F16 for memory_k and memory_v (refactor of #154 ) (#294 ) * Use F16 for memory_k and memory_v * add command line switch to use f16 instead of f32 for memory k+v --------- Co-authored-by: Ty Everett <ty@tyweb.us>	2023-03-19 19:57:00 +02:00
Georgi Gerganov	160bfb217d	Update hot topics to mention Alpaca support	2023-03-19 19:51:55 +02:00
Georgi Gerganov	c494ed5b94	Fix off-by-one bug (#115 )	2023-03-19 19:46:32 +02:00
Georgi Gerganov	c1c7026b47	Fix python stuff (#109 )	2023-03-19 19:33:18 +02:00
qunash	467b149761	Refactoring `convert-pth-to-ggml.py`: more concise and readable (#109 ) * Refactor get_n_parts function to simplify code and improve readability * Use f-strings instead of concatenation * Refactoring: more concise and readable * modularize --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-19 19:17:39 +02:00
Georgi Gerganov	70f01cb863	Drop trailing new line from file prompts (#80 )	2023-03-19 19:05:04 +02:00
Georgi Gerganov	a4e63b73df	Add instruction for using Alpaca (#240 )	2023-03-19 18:49:50 +02:00
Georgi Gerganov	9e1707218a	Add "--instruct" argument for usage with Alpaca (#240 ) Also start adding prompts in "./prompts"	2023-03-19 18:37:02 +02:00
Georgi Gerganov	22213a17b5	Change RMSNorm eps to 1e-6 (#173 ) I think this is what is used in the Python code	2023-03-19 17:30:00 +02:00
Ronsor	d7def1a752	Warn user if a context size greater than 2048 tokens is specified (#274 ) LLaMA doesn't support more than 2048 token context sizes, and going above that produces terrible results.	2023-03-18 20:10:47 -04:00
Pavol Rusnak	6f61c18ec9	Fix typo in readme	2023-03-18 23:18:04 +01:00
Pavol Rusnak	1e5a6d088d	Add note about Python 3.11 to readme	2023-03-18 22:25:35 +01:00
Pavol Rusnak	554b541521	Add memory/disk requirements to readme	2023-03-18 22:25:35 +01:00
Alex Nguyen	d3f202d57b	Remove unused code since n_vocab is model.hparams.n_vocab (#262 )	2023-03-18 13:51:49 +00:00
Justin Suess	e03e359730	fixed warning with std::ignore about unused function result (#151 )	2023-03-18 11:44:09 +00:00
Gary Linscott	a81d0c2a17	Fix n^2 loop in tokenization (#254 ) This causes long prompts to parse very slowly.	2023-03-18 11:17:19 +00:00
anzz1	b2de7f18df	CI Improvements (#230 ) * CI Improvements Manual build feature, autoreleases for Windows * better CI naming convention use branch name in releases and tags	2023-03-18 09:27:12 +02:00
Niklas Korz	a292747893	Nix flake (#40 ) * Nix flake * Nix: only add Accelerate framework on macOS * Nix: development shell, direnv and compatibility * Nix: use python packages supplied by withPackages * Nix: remove channel compatibility * Nix: fix ARM neon dotproduct on macOS --------- Co-authored-by: Pavol Rusnak <pavol@rusnak.io>	2023-03-17 23:03:48 +01:00
thement	c9f670a177	Implement non-greedy tokenizer that tries to maximize token lengths (#242 ) * Implement non-greedy tokenizer that tries to maximize token lengths * Insert single space in front of the prompt - this is to match original llama tokenizer behavior --------- Co-authored-by: Jakub Horak <jakub.horak@ibawizard.net>	2023-03-17 21:05:58 +01:00
Georgi Gerganov	4f54609110	Default to 4 threads (#243 )	2023-03-17 21:46:46 +02:00
Georgi Gerganov	e81b9c81c1	Update Contributing section	2023-03-17 20:30:04 +02:00
Stephan Walter	367946c668	Don't tell users to use a bad number of threads (#243 ) The readme tells people to use the command line option "-t 8", causing 8 threads to be started. On systems with fewer than 8 cores, this causes a significant slowdown. Remove the option from the example command lines and use /proc/cpuinfo on Linux to determine a sensible default.	2023-03-17 19:47:35 +02:00
mmyjona	6b0df5ccf3	add pthread link to fix cmake build under linux (#114 ) * add pthread link to fix cmake build under linux * add cmake to linux and macos platform * separate make and cmake workflow --------- Co-authored-by: Sebastián A <sebastian.aedo29@gmail.com>	2023-03-17 13:38:24 -03:00
Bernat Vadell	2af23d3043	🚀 Dockerize llamacpp (#132 ) * feat: dockerize llamacpp * feat: split build & runtime stages * split dockerfile into main & tools * add quantize into tool docker image * Update .devops/tools.sh Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add docker action pipeline * change CI to publish at github docker registry * fix name runs-on macOS-latest is macos-latest (lowercase) * include docker versioned images * fix github action docker * fix docker.yml * feat: include all-in-one command tool & update readme.md --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-17 10:47:06 +01:00
Matvey Soloviev	904d2a8d6a	Q4_1 quantization (#193 ) * Add AVX2 version of ggml_vec_dot_q4_1 * Small optimisations to q4_1 dot product (@Const-me) * Rearrange Q4_1 quantization to work for multipart models. (Fix #152) * Fix ggml_vec_mad_q4_1 too * Fix non-vectorised q4_1 vec mul	2023-03-17 06:48:39 +02:00
Georgi Gerganov	721311070e	Update README.md	2023-03-16 15:00:09 +02:00
Georgi Gerganov	ac15de7895	Expand "Contributing" section	2023-03-16 08:55:13 +02:00
Georgi Gerganov	273abc47ff	Update hot topics - RMSnorm	2023-03-16 07:12:12 +02:00
Nebula	9b4a15b17d	Fix RMS norm in GGML (#191 )	2023-03-15 19:29:25 -04:00
hoangmit	6eac39ba95	Add RMS norm and use it (#187 ) * add ggml_rms_norm * update op num	2023-03-16 00:41:38 +02:00
moritzbrantner	27944c4206	fixed typo (#178 )	2023-03-15 22:35:25 +02:00
Rickey Bowers Jr	2d15d6c9a9	add SIGINT support for _WIN32 environments (#120 ) * add SIGINT support for _WIN32 environments * perhaps more consistent	2023-03-15 21:56:24 +02:00
Justin Suess	2d64715ad4	added ctx_size parameter (#148 ) * added ctx_size parameter * added it in more places * Apply suggestions from code review --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-15 21:42:40 +02:00
Justin Suess	16b2c61a22	fixed color reset on exit (#149 ) * fixed color reset on exit * added sigint handler for ansi_color_reset * Update main.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-15 21:39:38 +02:00
Musab Gultekin	977295c700	Fix potential licensing issue (#126 ) * Update README.md * Update README.md remove facebook	2023-03-15 21:39:06 +02:00
Ronsor	956dfda8ad	Use `tokenizer.vocab_size()` instead of hardcoding 32000 in convert-pth-to-ggml.py (#142 ) There are ways that special tokens or other new tokens could be added to the tokenizer; therefore it's probably best not to assume the vocabulary is only 32000 tokens.	2023-03-15 21:37:50 +02:00
hoangmit	113e685d18	inline -> static inline for "bytesFromNibbles" (#161 ) Without "static" prefix, it fails to compile in clang	2023-03-15 21:05:14 +02:00
Ronsor	47857e564c	Don't use vdotq_s32 if it's not available (#139 ) * Don't use vdotq_s32 if it's not available `dotprod` extensions aren't available on some ARM CPUs (e.g. Raspberry Pi 4), so check for them and only use them if they're available. Reintroduces the code removed in `84d9015` if `__ARM_FEATURE_DOTPROD` isn't defined. * Update ggml.c --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-14 21:34:37 +02:00
Radoslav Gerganov	60f819a2b1	Add section to README on how to run the project on Android (#130 )	2023-03-14 15:30:08 +02:00
Georgi Gerganov	97ab2b2578	Add Misc section + update hot topics + minor fixes	2023-03-14 09:43:52 +02:00
Sebastián A	2f700a2738	Add windows to the CI (#98 )	2023-03-13 22:29:10 +02:00
Georgi Gerganov	c09a9cfb06	CMake build in Release by default (#75 )	2023-03-13 21:22:15 +02:00
Georgi Gerganov	7ec903d3c1	Update contribution section, hot topics, limitations, etc.	2023-03-13 19:21:51 +02:00
Georgi Gerganov	4497ad819c	Print system information	2023-03-13 19:15:08 +02:00
Sebastián A	ed6849cc07	Initial support for CMake (#75 )	2023-03-13 19:12:33 +02:00
Thomas Klausner	41be0a3b3d	Add NetBSD support. (#90 )	2023-03-13 18:40:54 +02:00
Pavol Rusnak	671d5cac15	Use fprintf for diagnostic output (#48 ) Keep printf only for printing model output; one can now use ./main ... 2>/dev/null to suppress any diagnostic output	2023-03-13 18:39:56 +02:00
Georgi Gerganov	84d9015c4a	Use vdotq_s32 to improve performance (#67 ) * 10% performance boost on ARM * Back to original change	2023-03-13 18:36:44 +02:00
uint256_t	63fd76fbb0	Reduce model loading time (#43 ) * Use buffering * Use vector * Minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-13 18:33:43 +02:00


3851 Commits