llama.cpp

Mirror of https://github.com/ggerganov/llama.cpp.git, synced 2024-12-27 20:04:35 +00:00

Author	SHA1	Message	Date
oKatanaaa	e4881686b4	Make WIN32 mmap() improvements (#341). Still not fully working yet. Closes #341.	2023-03-28 09:19:03 -07:00
Justine Tunney	0b5448a3a4	Implement system polyfill for Win32 / POSIX.1. I don't have access to Microsoft Visual Studio right now (aside from the GitHub Actions CI system), but I think this code should come close to what we want in terms of polyfilling UNIX functionality.	2023-03-17 21:22:40 -07:00
Justine Tunney	5b8023d935	Implement prototype for instant mmap() loading. This change uses a custom malloc() implementation to transactionally capture to a file the dynamic memory created during the loading process. That includes (1) the malloc() allocation for mem_buffer and (2) all the C++ STL objects. On my $1000 personal computer, this change lets me run ./main to generate a single token (-n 1) using the float16 7B model (~12 GB) in one second. In order to do that, there's a one-time cost where a 13 GB file needs to be generated. This change rocks, but it shouldn't be necessary to do something this heroic. We should instead change the file format so that tensors don't need reshaping and realignment in order to be loaded.	2023-03-16 22:16:33 -07:00
Georgi Gerganov	4497ad819c	Print system information	2023-03-13 19:15:08 +02:00
Pavol Rusnak	671d5cac15	Use fprintf for diagnostic output (#48). Keep printf only for printing model output; one can now use ./main ... 2>/dev/null to suppress any diagnostic output.	2023-03-13 18:39:56 +02:00
uint256_t	63fd76fbb0	Reduce model loading time (#43). Use buffering; use vector; minor. Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-13 18:33:43 +02:00
Val Kharitonov	2a20f48efa	Fix UTF-8 handling (including colors) (#79)	2023-03-13 18:24:18 +02:00
Matvey Soloviev	a169bb889c	Gate signal support on being on a unixoid system (#74)	2023-03-13 04:08:01 +01:00
Matvey Soloviev	460c482540	Fix token count accounting	2023-03-13 01:04:41 +01:00
Matvey Soloviev	404fac0d62	Fix color getting reset before prompt output is done (#65) (cherry picked from commit 7eb2987619feee04c40eff69b604017d09919cb6)	2023-03-13 00:07:34 +02:00
Matvey Soloviev	96ea727f47	Add interactive mode (#61). Initial work on interactive mode; improve interactive mode and make the reverse prompt optional; update README to explain interactive mode; fix OS X build.	2023-03-12 23:13:28 +02:00
beiller	02f0c6fe7f	Add back top_k (#56). Update utils.cpp; update utils.h. Co-authored-by: Bill Hamilton <bill.hamilton@shopify.com>; Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-12 22:23:15 +02:00
Sebastián A	eb062bb012	Windows fixes (#31). Apply fixes suggested to build on Windows (issue: https://github.com/ggerganov/llama.cpp/issues/22); remove unsupported VLAs; MSVC: remove features that are only available with MSVC C++20; fix zero initialization of the other fields; change the use of vector for stack allocations.	2023-03-12 22:15:00 +02:00
beiller	129c7d1ea8	Add repetition penalty (#20). Add repeat penalization; update utils.h; update utils.cpp; numeric fix (should probably still scale by temp even if penalized); update comments and apply the penalty more properly (I see that numbers can go negative, so a fix from a referenced commit); minor formatting. Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-12 11:27:42 +02:00
Georgi Gerganov	7d9ed7b25f	Bump memory buffer	2023-03-11 12:45:01 +02:00
Georgi Gerganov	007a8f6f45	Support all LLaMA models + change Q4_0 quantization storage	2023-03-11 11:28:30 +02:00
Georgi Gerganov	70bc0b8b15	Fix a bug in the rope calculation	2023-03-10 23:46:57 +02:00
Georgi Gerganov	319cdb3e1f	Final touches	2023-03-10 21:50:46 +02:00
Georgi Gerganov	26c0846629	Initial release	2023-03-10 20:56:40 +02:00

19 Commits