Commit Graph

926 Commits

Author SHA1 Message Date
Matvey Soloviev
904d2a8d6a
Q4_1 quantization (#193)
* Add AVX2 version of ggml_vec_dot_q4_1

* Small optimisations to q4_1 dot product (@Const-me)

* Rearrange Q4_1 quantization to work for multipart models. (Fix #152)

* Fix ggml_vec_mad_q4_1 too

* Fix non-vectorised q4_1 vec mul
2023-03-17 06:48:39 +02:00
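Q4_1 stores each block of weights as 4-bit integers plus a per-block scale d and minimum m, so each value is reconstructed as d*q + m. Below is a minimal scalar sketch. It assumes a block size of 32 and a hypothetical struct block_q4_1 {d, m, qs[16]}. ggml's actual memory layout and function names may differ, and this is not the AVX2 kernel from the commit.

```c
#include <stdint.h>

#define QK 32 /* assumed block size */

/* Hypothetical block layout: scale, minimum, 32 packed 4-bit values. */
typedef struct {
    float   d;          /* scale: (max - min) / 15 */
    float   m;          /* block minimum */
    uint8_t qs[QK / 2]; /* two 4-bit values per byte, low nibble first */
} block_q4_1;

/* Round a value expected in [0, 15] to the nearest nibble, clamping. */
static uint8_t to_nibble(float v) {
    int q = (int)(v + 0.5f);
    if (q < 0) q = 0;
    if (q > 15) q = 15;
    return (uint8_t)q;
}

static void quantize_block_q4_1(const float *x, block_q4_1 *b) {
    float min = x[0], max = x[0];
    for (int i = 1; i < QK; i++) {
        if (x[i] < min) min = x[i];
        if (x[i] > max) max = x[i];
    }
    const float d  = (max - min) / 15.0f;
    const float id = d != 0.0f ? 1.0f / d : 0.0f; /* constant block: all q = 0 */
    b->d = d;
    b->m = min;
    for (int i = 0; i < QK; i += 2) {
        const uint8_t v0 = to_nibble((x[i]     - min) * id);
        const uint8_t v1 = to_nibble((x[i + 1] - min) * id);
        b->qs[i / 2] = (uint8_t)(v0 | (v1 << 4));
    }
}

static void dequantize_block_q4_1(const block_q4_1 *b, float *y) {
    for (int i = 0; i < QK / 2; i++) {
        y[2 * i]     = b->d * (float)(b->qs[i] & 0x0F) + b->m;
        y[2 * i + 1] = b->d * (float)(b->qs[i] >> 4)   + b->m;
    }
}

/* Reference dot product of two Q4_1 blocks via explicit dequantization. */
static float vec_dot_q4_1_block(const block_q4_1 *a, const block_q4_1 *b) {
    float xa[QK], xb[QK], sum = 0.0f;
    dequantize_block_q4_1(a, xa);
    dequantize_block_q4_1(b, xb);
    for (int i = 0; i < QK; i++) sum += xa[i] * xb[i];
    return sum;
}
```

Vectorized kernels can skip the explicit dequantization. Expanding (d_a q_a + m_a)(d_b q_b + m_b) gives d_a d_b Σ q_a q_b + d_a m_b Σ q_a + m_a d_b Σ q_b + 32 m_a m_b, which only needs integer sums over the nibbles.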
Georgi Gerganov
721311070e
Update README.md 2023-03-16 15:00:09 +02:00
Georgi Gerganov
ac15de7895
Expand "Contributing" section 2023-03-16 08:55:13 +02:00
Georgi Gerganov
273abc47ff
Update hot topics - RMSnorm 2023-03-16 07:12:12 +02:00
Nebula
9b4a15b17d
Fix RMS norm in GGML (#191) 2023-03-15 19:29:25 -04:00
hoangmit
6eac39ba95
Add RMS norm and use it (#187)
* add ggml_rms_norm

* update op num
2023-03-16 00:41:38 +02:00
moritzbrantner
27944c4206
fixed typo (#178) 2023-03-15 22:35:25 +02:00
Rickey Bowers Jr
2d15d6c9a9
add SIGINT support for _WIN32 environments (#120)
* add SIGINT support for _WIN32 environments

* perhaps more consistent
2023-03-15 21:56:24 +02:00
Justin Suess
2d64715ad4
added ctx_size parameter (#148)
* added ctx_size parameter

* added it in more places

* Apply suggestions from code review

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-15 21:42:40 +02:00
Justin Suess
16b2c61a22
fixed color reset on exit (#149)
* fixed color reset on exit

* added sigint handler for ansi_color_reset

* Update main.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-15 21:39:38 +02:00
Musab Gultekin
977295c700
Fix potential licensing issue (#126)
* Update README.md

* Update README.md

remove facebook
2023-03-15 21:39:06 +02:00
Ronsor
956dfda8ad
Use tokenizer.vocab_size() instead of hardcoding 32000 in convert-pth-to-ggml.py (#142)
There are ways that special tokens or other new tokens could be added to the tokenizer; therefore it's probably best not to assume the vocabulary is only 32000 tokens.
2023-03-15 21:37:50 +02:00
hoangmit
113e685d18
inline -> static inline for "bytesFromNibbles" (#161)
Without "static" prefix, it fails to compile in clang
2023-03-15 21:05:14 +02:00
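In C99 and later, a function declared `inline` without `static`, and with no `extern` declaration in the same translation unit, is only an inline definition. No external definition is emitted. If the compiler decides not to inline a call, as clang commonly does at -O0, the build fails with an undefined-symbol error at link time. Adding `static` gives the function internal linkage, so the compiler emits a local copy whenever one is needed. The sketch below illustrates this with a hypothetical scalar analogue of the AVX2 helper. It expands 16 packed bytes into 32 bytes, one 4-bit value per byte with the low nibble first.

```c
#include <stdint.h>

/* static inline: safe at any optimization level, no external symbol needed. */
static inline void bytes_from_nibbles(const uint8_t *src, uint8_t *dst) {
    for (int i = 0; i < 16; i++) {
        dst[2 * i]     = src[i] & 0x0F; /* low nibble */
        dst[2 * i + 1] = src[i] >> 4;   /* high nibble */
    }
}
```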
Ronsor
47857e564c
Don't use vdotq_s32 if it's not available (#139)
* Don't use vdotq_s32 if it's not available

`dotprod` extensions aren't available on some ARM CPUs (e.g. Raspberry Pi 4), so check for them and only use them if they're available.

Reintroduces the code removed in 84d9015 if `__ARM_FEATURE_DOTPROD` isn't defined.

* Update ggml.c

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-14 21:34:37 +02:00
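vdotq_s32 belongs to the Arm dot-product extension, which is optional from Armv8.2-A. Compilers define `__ARM_FEATURE_DOTPROD` when the target CPU has it. Below is a sketch of the compile-time gate the commit describes. It assumes int8 inputs whose length is a multiple of 16. dot_s8 is a hypothetical name, and the fallback here is plain scalar C rather than the NEON code the commit reintroduced. The `__aarch64__` check is there because vaddvq_s32 is AArch64-only.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>
#endif

/* Dot product of two int8 vectors; n is assumed to be a multiple of 16. */
static int32_t dot_s8(const int8_t *a, const int8_t *b, size_t n) {
#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
    int32x4_t acc = vdupq_n_s32(0);
    for (size_t i = 0; i < n; i += 16) {
        /* each 32-bit lane accumulates the dot product of 4 int8 pairs */
        acc = vdotq_s32(acc, vld1q_s8(a + i), vld1q_s8(b + i));
    }
    return vaddvq_s32(acc); /* horizontal sum of the 4 lanes */
#else
    /* Fallback for CPUs without the dotprod extension (e.g. Cortex-A72) */
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) sum += (int32_t)a[i] * (int32_t)b[i];
    return sum;
#endif
}
```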
Radoslav Gerganov
60f819a2b1
Add section to README on how to run the project on Android (#130) 2023-03-14 15:30:08 +02:00
Georgi Gerganov
97ab2b2578
Add Misc section + update hot topics + minor fixes 2023-03-14 09:43:52 +02:00
Sebastián A
2f700a2738
Add windows to the CI (#98) 2023-03-13 22:29:10 +02:00
Georgi Gerganov
c09a9cfb06
CMake build in Release by default (#75) 2023-03-13 21:22:15 +02:00
Georgi Gerganov
7ec903d3c1
Update contribution section, hot topics, limitations, etc. 2023-03-13 19:21:51 +02:00
Georgi Gerganov
4497ad819c
Print system information 2023-03-13 19:15:08 +02:00
Sebastián A
ed6849cc07
Initial support for CMake (#75) 2023-03-13 19:12:33 +02:00
Thomas Klausner
41be0a3b3d
Add NetBSD support. (#90) 2023-03-13 18:40:54 +02:00
Pavol Rusnak
671d5cac15
Use fprintf for diagnostic output (#48)
keep printf only for printing model output

one can now use ./main ... 2>/dev/null to suppress any diagnostic output
2023-03-13 18:39:56 +02:00
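Redirecting stream 2 is only useful if model output stays on stdout and diagnostics go to stderr. Below is a minimal sketch with hypothetical helper names. The FILE * parameters stand in for stdout and stderr so the behaviour can be checked. In a program the calls would be emit_token(stdout, piece) and report_timing(stderr, ...).

```c
#include <stdio.h>

/* Model output: stdout, flushed so tokens appear as they are generated. */
static void emit_token(FILE *out, const char *piece) {
    fputs(piece, out);
    fflush(out);
}

/* Diagnostics: stderr, so `2>/dev/null` hides them without touching output. */
static void report_timing(FILE *err, const char *label, double ms) {
    fprintf(err, "%s: %8.2f ms\n", label, ms);
}
```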
Georgi Gerganov
84d9015c4a
Use vdotq_s32 to improve performance (#67)
* 10% performance boost on ARM

* Back to original change
2023-03-13 18:36:44 +02:00
uint256_t
63fd76fbb0
Reduce model loading time (#43)
* Use buffering

* Use vector

* Minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-13 18:33:43 +02:00
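When a loader issues many small reads, a large input buffer reduces the number of read system calls. The commit notes only mention buffering and a vector. The following is a C analogue using setvbuf. open_buffered is a hypothetical helper, and the buffer size is left to the caller.

```c
#include <stdio.h>
#include <stdlib.h>

/* Open `path` for binary reading with a caller-sized stdio buffer.
   On success *buf_out receives the buffer (or NULL if the default is used);
   free it only after fclose(). */
static FILE *open_buffered(const char *path, size_t buf_size, char **buf_out) {
    *buf_out = NULL;
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    char *buf = malloc(buf_size);
    /* setvbuf must be called before any other operation on the stream */
    if (buf && setvbuf(f, buf, _IOFBF, buf_size) != 0) {
        free(buf); /* not installed: keep the default buffer */
        buf = NULL;
    }
    *buf_out = buf;
    return f;
}
```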
Val Kharitonov
2a20f48efa
Fix UTF-8 handling (including colors) (#79) 2023-03-13 18:24:18 +02:00
Pavol Rusnak
d1f224712d
Add quantize script for batch quantization (#92)
* Add quantize script for batch quantization

* Indentation

* README for new quantize.sh

* Fix script name

* Fix file list on Mac OS

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-13 18:15:20 +02:00
Georgi Gerganov
1808ee0500
Add initial contribution guidelines 2023-03-13 09:42:26 +02:00
Matvey Soloviev
a169bb889c
Gate signal support on being on a unixoid system. (#74) 2023-03-13 04:08:01 +01:00
Matvey Soloviev
460c482540
Fix token count accounting 2023-03-13 01:04:41 +01:00
Georgi Gerganov
c80e2a8f2a
Revert "10% performance boost on ARM"
This reverts commit 113a9e83eb.

There are some reports of illegal instructions.
Moved this stuff to the vdotq_s32 branch until resolved
2023-03-13 01:28:08 +02:00
Georgi Gerganov
54a0e66ea0
Check for vdotq_s32 availability 2023-03-13 01:21:03 +02:00
Georgi Gerganov
543c57e991
Amend to previous commit - forgot to update non-QRDMX branch 2023-03-13 01:05:24 +02:00
Georgi Gerganov
113a9e83eb
10% performance boost on ARM 2023-03-13 00:56:10 +02:00
Matvey Soloviev
404fac0d62
Fix color getting reset before prompt output done (#65)
(cherry picked from commit 7eb2987619feee04c40eff69b604017d09919cb6)
2023-03-13 00:07:34 +02:00
Georgi Gerganov
1a0a74300f
Update README.md 2023-03-12 23:39:01 +02:00
Matvey Soloviev
96ea727f47
Add interactive mode (#61)
* Initial work on interactive mode.

* Improve interactive mode. Make rev. prompt optional.

* Update README to explain interactive mode.

* Fix OS X build
2023-03-12 23:13:28 +02:00
Marc Köhlbrugge
9661954835
Fix typo in README (#45) 2023-03-12 22:30:08 +02:00
Ben Garney
f385f8dee8
Allow using prompt files (#59) 2023-03-12 22:28:36 +02:00
beiller
02f0c6fe7f
Add back top_k (#56)
* Add back top_k

* Update utils.cpp

* Update utils.h

---------

Co-authored-by: Bill Hamilton <bill.hamilton@shopify.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-12 22:23:15 +02:00
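Top-k sampling restricts the next-token choice to the k highest-scoring candidates. Below is a minimal sketch that masks logits in place before softmax. top_k_filter and its convention that k == 0 disables filtering are hypothetical, and this is not the code from utils.cpp.

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for descending floats. */
static int cmp_desc(const void *a, const void *b) {
    const float x = *(const float *)a, y = *(const float *)b;
    return (x < y) - (x > y);
}

/* Keep the k largest logits; set the rest to -INFINITY so softmax gives them
   zero probability. Ties at the k-th value are all kept. Returns -1 on OOM. */
static int top_k_filter(float *logits, size_t n, size_t k) {
    if (k == 0 || k >= n) return 0; /* nothing to filter */
    float *sorted = malloc(n * sizeof *sorted);
    if (!sorted) return -1;
    memcpy(sorted, logits, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_desc);
    const float threshold = sorted[k - 1];
    free(sorted);
    for (size_t i = 0; i < n; i++)
        if (logits[i] < threshold) logits[i] = -INFINITY;
    return 0;
}
```

Sorting a full copy costs O(n log n). A partial selection would be cheaper for large vocabularies, but the full sort keeps the sketch short.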
Sebastián A
eb062bb012
Windows fixes (#31)
* Apply fixes suggested to build on windows

Issue: https://github.com/ggerganov/llama.cpp/issues/22

* Remove unsupported VLAs

* MSVC: Remove features that are only available on MSVC C++20.

* Fix zero initialization of the other fields.

* Change the use of vector for stack allocations.
2023-03-12 22:15:00 +02:00
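MSVC does not accept variable-length arrays, so a scratch buffer sized at run time has to come from the heap instead. For the C++ side, the commit notes mention std::vector. Below is a C sketch of that substitution. rotate_left is a hypothetical helper that genuinely needs a scratch copy.

```c
#include <stdlib.h>
#include <string.h>

/* Rotate x left by k positions. Returns -1 if allocation fails. */
static int rotate_left(float *x, size_t n, size_t k) {
    if (n == 0) return 0;
    /* was: float tmp[n];  (a VLA, rejected by MSVC) */
    float *tmp = malloc(n * sizeof *tmp);
    if (!tmp) return -1;
    memcpy(tmp, x, n * sizeof *tmp);
    for (size_t i = 0; i < n; i++) x[i] = tmp[(i + k) % n];
    free(tmp);
    return 0;
}
```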
Georgi Gerganov
7027a97837
Update README.md 2023-03-12 22:09:26 +02:00
Georgi Gerganov
2d555e5b42
Add CI (#60) 2023-03-12 22:08:24 +02:00
Georgi Gerganov
7c9e54e55e
Revert "weights_only" arg - this is causing more trouble than it helps 2023-03-12 20:59:01 +02:00
Oleksandr Nikitin
b9bd1d0141
python/pytorch compat notes (#44) 2023-03-12 14:16:33 +02:00
beiller
129c7d1ea8
Add repetition penalty (#20)
* Adding repeat penalization

* Update utils.h

* Update utils.cpp

* Numeric fix

Should probably still scale by temp even if penalized

* Update comments, more proper application

I see that numbers can go negative so a fix from a referenced commit

* Minor formatting

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-12 11:27:42 +02:00
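The commit notes describe two details. First, logits are still scaled by the temperature when a token is penalized. Second, negative logits need different treatment, because dividing a negative number by a penalty greater than 1 would raise it. Below is a minimal sketch of that rule, assuming penalty > 1 and temp > 0. apply_repeat_penalty and its parameters are hypothetical names.

```c
#include <stddef.h>

/* Scale all logits by 1/temp; push recently seen tokens toward lower scores. */
static void apply_repeat_penalty(float *logits, size_t n_vocab,
                                 const int *last_tokens, size_t n_last,
                                 float temp, float penalty) {
    const float scale = 1.0f / temp;
    for (size_t id = 0; id < n_vocab; id++) {
        int seen = 0;
        for (size_t j = 0; j < n_last; j++) {
            if (last_tokens[j] == (int)id) { seen = 1; break; }
        }
        float l = logits[id] * scale;
        if (seen) {
            /* dividing a negative logit would raise it, so multiply instead */
            l = l < 0.0f ? l * penalty : l / penalty;
        }
        logits[id] = l;
    }
}
```

The linear scan over last_tokens costs O(n_vocab * n_last). A token that appears several times in the window is still penalized only once.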
Georgi Gerganov
702fddf5c5
Clarify meaning of hacking 2023-03-12 09:03:25 +02:00
Georgi Gerganov
7d86e25bf6
README: add "Supported platforms" + update hot topics 2023-03-12 08:41:54 +02:00
deepdiffuser
a93120236f
use weights_only in conversion script (#32)
this restricts malicious weights from executing arbitrary code by restricting the unpickler to only loading tensors, primitive types, and dictionaries
2023-03-12 08:36:35 +02:00
Pavol Rusnak
6a9a67f0be
Add LICENSE (#21) 2023-03-12 08:36:03 +02:00