These flags are only really useful for linting. They put GCC and other
compilers into `__STRICT_ANSI__` mode. That can make systems stuff
slower, in favor of standards conformance, since it may cause headers to
remove platform specific goodness. It also makes builds more painful on
old distros that have the functions we need, but track an older version
of the standards where those functions weren't strictly available. One
such example is mkstemp(). It's available everywhere in practice, but GA
Ubuntu in strict ansi mode complains about it. If we don't use mkstemp()
then that'll put us on the security radar with other platforms.
I don't have access to Microsoft Visual Studio right now (aside from the
the Github Actions CI system) but I think this code should come close to
what we want in terms of polyfilling UNIX functionality.
This change uses a custom malloc() implementation to transactionally
capture to a file dynamic memory created during the loading process.
That includes (1) the malloc() allocation for mem_buffer and (2) all
the C++ STL objects. On my $1000 personal computer, this change lets
me run ./main to generate a single token (-n 1) using the float16 7B
model (~12gb size) in one second. In order to do that, there's a one
time cost where a 13gb file needs to be generated. This change rocks
but it shouldn't be necessary to do something this heroic. We should
instead change the file format, so that tensors don't need reshaping
and realignment in order to be loaded.
* Don't use vdotq_s32 if it's not available
`dotprod` extensions aren't available on some ARM CPUs (e.g. Raspberry Pi 4), so check for them and only use them if they're available.
Reintroduces the code removed in 84d9015 if `__ARM_FEATURE_DOTPROD` isn't defined.
* Update ggml.c
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Add quantize script for batch quantization
* Indentation
* README for new quantize.sh
* Fix script name
* Fix file list on Mac OS
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Initial work on interactive mode.
* Improve interactive mode. Make rev. prompt optional.
* Update README to explain interactive mode.
* Fix OS X build
* Add back top_k
* Update utils.cpp
* Update utils.h
---------
Co-authored-by: Bill Hamilton <bill.hamilton@shopify.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Apply fixes suggested to build on windows
Issue: https://github.com/ggerganov/llama.cpp/issues/22
* Remove unsupported VLAs
* MSVC: Remove features that are only available on MSVC C++20.
* Fix zero initialization of the other fields.
* Change the use of vector for stack allocations.
* Adding repeat penalization
* Update utils.h
* Update utils.cpp
* Numeric fix
Should probably still scale by temp even if penalized
* Update comments, more proper application
I see that numbers can go negative so a fix from a referenced commit
* Minor formatting
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>