llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-26 19:34:35 +00:00

Author	SHA1	Message	Date
SebastianApel	95ea26f6e9	benchmark : add tool for timing q4_0 matrix multiplication (#653) * Initial version of q4_0 matrix multiplication benchmark * Bugfix: Added dependency to ggml.o to benchmark * Reviewer requests: added parameter for threads, switched to ggml_time_us() * Reviewer input: removed rtsc, use epsilon for check * Review comment: Removed set_locale * Feature: Param for number of iterations, Bugfix for use of parameter threads * Reviewer suggestion: Moved to examples * Reviewer feedback: Updated clean: and benchmark: sections --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-04-13 15:46:23 +03:00
Pavol Rusnak	8b679987cd	Fix whitespace, add .editorconfig, add GitHub workflow (#883)	2023-04-11 19:45:44 +00:00
Stephan Walter	3e6e70d8e8	Add enum llama_ftype, sync ggml_type to model files (#709)	2023-04-11 15:03:51 +00:00
comex	2663d2c678	Windows fixes (#890 ) Mostly for msys2 and mingw64 builds, which are different from each other and different from standard Visual Studio builds. Isn't Windows fun? - Define _GNU_SOURCE in more files (it's already used in ggml.c for Linux's sake). - Don't use PrefetchVirtualMemory if not building for Windows 8 or later (mingw64 doesn't by default). But warn the user about this situation since it's probably not intended. - Check for NOMINMAX already being defined, which it is on mingw64. - Actually use the `increment` variable (bug in my `pizza` PR). - Suppress unused variable warnings in the fake pthread_create and pthread_join implementations for Windows. - (not Windows-related) Remove mention of `asprintf` from comment; `asprintf` is no longer used. Fixes #871.	2023-04-11 15:19:54 +02:00
comex	f963b63afa	Rewrite loading code to try to satisfy everyone: - Support all three formats (ggml, ggmf, ggjt). (However, I didn't include the hack needed to support GPT4All files without conversion. Those can still be used after converting them with convert.py from my other PR.) - Support both mmap and read (mmap is used by default, but can be disabled with `--no-mmap`, and is automatically disabled for pre-ggjt files or on platforms where mmap is not supported). - Support multi-file models like before, but automatically determine the number of parts rather than requiring `--n_parts`. - Improve validation and error checking. - Stop using the per-file type field (f16) entirely in favor of just relying on the per-tensor type/size fields. This has no immediate benefit, but makes it easier to experiment with different formats, and should make it easier to support the new GPTQ-for-LLaMa models in the future (I have some work in progress on that front). - Support VirtualLock on Windows (using the same `--mlock` option as on Unix). - Indicate loading progress when using mmap + mlock. (Which led me to the interesting observation that on my Linux machine, with a warm file cache, mlock actually takes some time, whereas mmap without mlock starts almost instantly...) - To help implement this, move mlock support from ggml to the loading code. - madvise/PrefetchVirtualMemory support (based on #740) - Switch from ifstream to the `fopen` family of functions to avoid unnecessary copying and, when mmap is enabled, allow reusing the same file descriptor for both metadata reads and mmap (whereas the existing implementation opens the file a second time to mmap). - Quantization now produces a single-file output even with multi-file inputs (not really a feature as much as 'it was easier this way'). Implementation notes: I tried to factor the code into more discrete pieces than before. 
Regarding code style: I tried to follow the code style, but I'm naughty and used a few advanced C++ features repeatedly: - Destructors to make it easier to ensure everything gets cleaned up. - Exceptions. I don't even usually use exceptions when writing C++, and I can remove them if desired... but here they make the loading code much more succinct while still properly handling a variety of errors, ranging from API calls failing to integer overflow and allocation failure. (The exceptions are converted to error codes at the API boundary.) Co-authored-by: Pavol Rusnak <pavol@rusnak.io> (for the bit I copied from #740)	2023-04-10 01:10:46 +02:00
Tomáš Pazdiora	aaf3b23deb	fix for windows utf-8 input (#840 ) Use UTF-16 as input on Windows, since UTF-8 does not work and reads multibyte characters as zeros	2023-04-08 17:49:39 +02:00
unbounded	62cfc54f77	Add quantize-stats command for testing quantization (#728 ) Command that calculates some statistics over the errors introduced by quantization, like mean square error, max error and some percentile errors for layer weights. Should be useful for testing quantization improvements. Exposes some internal state from ggml and llama for testing	2023-04-08 00:09:18 +02:00
Sergey Alirzaev	cc9cee8e9e	Do not crash when it has nothing to say. (#796 ) Otherwise observing this in the interactive mode: /usr/lib/gcc/x86_64-pc-linux-gnu/12/include/g++-v12/bits/stl_vector.h:1230: reference std::vector<int>::back() [_Tp = int, _Alloc = std::allocator<int>]: Assertion '!this->empty()' failed.	2023-04-06 17:59:11 +02:00
at8u	ff05d05c96	miku.sh : add executable bit (#780)	2023-04-05 18:59:13 +03:00
at8u	88ed5761b8	examples : add Miku.sh (#724) * Add Miku.sh to examples * Add missing line to prompt in Miku.sh * Add --keep param to Miku.sh * Remove '[end_of_conversation]' line from Miku.sh It is no longer necessary.	2023-04-05 17:32:42 +03:00
mgroeber9110	53dbba7695	Windows: reactivate sigint handler after each Ctrl-C (#736)	2023-04-03 18:00:55 +02:00
Leonardo Neumann	6e7801d08d	examples : add gpt4all script (#658)	2023-04-02 10:56:20 +03:00
Murilo Santana	5b70e7de4c	fix default params for examples/main (#697)	2023-04-02 04:41:12 +02:00
Slaren	0d054e292e	Show error message when -f fails	2023-04-01 16:08:40 +02:00
Slaren	64bde3ffd4	Fix ggml_init_params in quantize	2023-03-30 12:28:25 -07:00
Thérence	d9ad104440	Create chat-13B.bat (#592) * Create chat-13B.bat Same script as chat-13B.sh, but for Windows users. Tested and working on Windows 10/11 v22H2 * Apply suggestions from code review --------- Co-authored-by: anzz1 <anzz1@live.com>	2023-03-29 20:21:09 +03:00
Tobias Lütke	a6956b25a1	add example of re-act pattern (#583 ) * add example of re-act pattern * spelling... * fixed whitespace in reverse prompt issue	2023-03-29 10:10:24 -05:00
anzz1	7f4c5c6651	llama : fix linkage with mingw (#551 ) * Revert `7e53955` (#542) Still needs to be fixed properly * Fix linking on mingw32	2023-03-28 21:23:09 +03:00
Stephan Walter	436e561931	all : be more strict about converting float to double (#458 ) * Be more strict about converting float to double * Test equivalence of round, SILU implementations Test module is commented out in CMakeLists.txt because the tests may take a long time, depending on how much the compiler optimizes. * Fix softmax in perplexity.cpp * all : prefer float over double where appropriate * perplexity : add <cmath> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-28 19:48:20 +03:00
Stephan Walter	c1f885067c	ggml : introduce structs for the q4 data blocks (#356 ) * Introduce structs for the q4 data blocks * ggml : rename quant struct variables + fix ARM_NEON --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-28 18:56:03 +03:00
anzz1	7b8dbcb78b	main.cpp fixes, refactoring (#571 ) - main: entering empty line passes back control without new input in interactive/instruct modes - instruct mode: keep prompt fix - instruct mode: duplicate instruct prompt fix - refactor: move common console code from main->common	2023-03-28 17:09:55 +03:00
Marco Matthies	7e5395575a	Fix missing ggml link in cmake for examples/* on w64-mingw32 (#542)	2023-03-27 07:55:26 +03:00
Stephan Walter	b391579db9	Update README and comments for standalone perplexity tool (#525)	2023-03-26 16:14:01 +03:00
anzz1	7a87d31f4f	[main] fix infinite generation (-n == -1) (#523)	2023-03-26 16:06:10 +03:00
Harald Fernengel	33e35b8fe8	Exit from interactive mode if input stream is bad (#491) Allow exiting the interactive prompt also with CTRL-D on Unix and CTRL-Z on Windows.	2023-03-26 08:25:46 +03:00
anzz1	34ab526843	(Windows) Set console to UTF-8 on init (#420) Sets console codepage to 65001 (CP_UTF8) on start for both input and output, should fix problems with UTF-8 characters.	2023-03-25 22:29:22 +02:00
Georgi Gerganov	c2b25b6912	Fix colors enabling on WIN32	2023-03-25 21:53:39 +02:00
Georgi Gerganov	79b2b266db	If n_predict == -1, generate forever	2023-03-25 21:51:41 +02:00
Georgi Gerganov	e2d490dafd	Infinite generation via context swapping (#71)	2023-03-25 21:36:22 +02:00
Georgi Gerganov	03f7e33560	Cleanup STL headers + fix embedding examples + minor stuff	2023-03-25 20:51:14 +02:00
Georgi Gerganov	55ad42af84	Move chat scripts into "./examples"	2023-03-25 20:37:09 +02:00
Georgi Gerganov	a316a425d0	Overhaul the examples structure - main -> examples - utils -> examples (renamed to "common") - quantize -> examples - separate tools for "perplexity" and "embedding" Hope I didn't break something !	2023-03-25 20:26:40 +02:00
Georgi Gerganov	04c6f5ed6f	Immediately start processing the prompt before user input has been provided (#476)	2023-03-24 23:17:58 +02:00
Mathieu Nayrolles	3f9c6135e4	fix typo in chatLLaMa (#368) The prompt contains a typo where 'alound' is used instead of 'aloud'.	2023-03-21 22:52:27 +02:00
Jean-Christophe Hoelt	3ab3e6582f	Add chatLLaMa script (#198) * Add chatLLaMa script * Fix shellcheck errors and do some cleanup * Move chatLLaMa script to `examples` directory * Reduce chatLLaMa context size to 2048 Ref `d7def1a752` * Include n_predict to 2048 in examples/chatLLaMa	2023-03-21 18:23:15 +02:00

935 Commits