M. Yusuf Sarıgöz
06f423a8e1
gguf : write sample tensors to read
2023-07-29 10:26:26 +03:00
M. Yusuf Sarıgöz
08dc8fd884
gguf : do not hardcode tensor names to read
2023-07-29 10:24:46 +03:00
M. Yusuf Sarıgöz
9475cdb7a3
Merge branch 'gguf-write-tokenization' into gguf
2023-07-29 00:36:35 +03:00
M. Yusuf Sarıgöz
1495735aac
gguf : fix writing tensors
2023-07-29 00:26:22 +03:00
klosax
3492f848d7
gguf : add gguf_find_key ( #2438 )
...
* gguf.cpp : find key example
* ggml.h : add gguf_find_key
* ggml.c : add gguf_find_key
2023-07-28 23:45:24 +03:00
klosax
8a88e5855c
perplexity : add Hellaswag calculation ( #2389 )
...
* common.h : add hellaswag / remove perplexity-lines
* common.cpp : add hellaswag / remove perplexity-lines
* perplexity.cpp : add hellswag scores / remove perplexity-lines
* perplexity.cpp : clean up
* common.h : change default param value
* common.cpp : Change default param
* perplexity.cpp : alter wording
* common.h : alter wording
* common.cpp : alter wording
2023-07-28 21:25:36 +03:00
Lee
a9559bf77b
ggml : workaround for missing _mm256_setr_m128i in GCC < 8 in k_quants.c ( #2405 )
2023-07-28 21:17:45 +03:00
eric8607242
ee1b497c98
llama : support more diverse tokenizers? ( #2420 )
...
* supporting more diverse tokenizers
* Update llama.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-28 21:10:05 +03:00
Georgi Gerganov
d73b8d48b4
examples : fix whitespace
2023-07-28 21:05:08 +03:00
nhamanasu
34ae1caf7f
examples : server chat mode with llama2 ( #2400 )
...
* add: server chat mode with llama2
* fix: remove the unnecessary last \n
2023-07-28 21:02:10 +03:00
Weird Constructor
d91f3f0c55
readme : fix the description of the Tail free sampling (TFS) method ( #2431 )
2023-07-28 11:44:43 +03:00
Rand Xie
65cdf34bdc
llama : use n_embd_gqa instead of n_embd to handle llama-2 70B ( #2433 )
2023-07-28 11:42:53 +03:00
M. Yusuf Sarıgöz
11ef380c2a
GGUF : write tensor ( #2426 )
...
* WIP: Write tensor
* GGUF : Support writing tensors in Python
* refactor : rm unused import and upd todos
* fix : fix errors upd writing example
* rm example.gguf
* gitignore *.gguf
* undo formatting
2023-07-28 11:34:16 +03:00
niansa/tuxifan
edcc7ae7d2
Obtaining LLaMA 2 instructions ( #2308 )
...
* Obtaining LLaMA 2 instructions
* Removed sharing warning for LLaMA 2
* Linked TheBloke's GGML repos
* Add LLaMA 2 to list of supported models
* Added LLaMA 2 usage instructions
* Added links to LLaMA 2 70B models
2023-07-28 03:14:11 +02:00
mj-shifu
7c529cede6
convert.py : Update to support 70B HF format model files ( #2427 )
...
* convert.py : fix llama 2 70b conversion from Huggingface
2023-07-27 14:39:17 -06:00
Georgi Gerganov
d2bb3ac10b
convert.py : remove GGML vocab + other obsolete stuff
2023-07-27 16:36:35 +03:00
Georgi Gerganov
68f53485e4
convert.py : start a new simplified implementation by removing old stuff
2023-07-27 15:56:53 +03:00
Georgi Gerganov
158be8f7f4
gguf.py : some code style changes
2023-07-27 15:37:06 +03:00
Georgi Gerganov
d2b6ca13ad
gguf : add array support
2023-07-27 14:53:07 +03:00
Georgi Gerganov
d89533dff6
gguf : expose the gguf_type enum through the API for now
2023-07-27 11:10:34 +03:00
Georgi Gerganov
1a941869cb
metal : disable graph concurrency optimization due to bug ( #2413 )
2023-07-27 11:00:54 +03:00
M. Yusuf Sarıgöz
c85d3178b3
refactor : reduce code duplication and better API ( #2415 )
2023-07-27 10:29:29 +03:00
slaren
b5472ea0ad
ggml : fix assert in ggml_set_unary_op ( #2410 )
2023-07-26 23:57:23 +02:00
Georgi Gerganov
d8491fc7e3
gguf : add comments
2023-07-26 23:00:24 +03:00
Georgi Gerganov
5628ec7163
gguf : read / write sample models
2023-07-26 22:40:45 +03:00
Cebtenzzre
6df1f5940f
make : build with -Wmissing-prototypes ( #2394 )
2023-07-26 21:00:04 +03:00
Georgi Gerganov
e46870f5af
gguf : gguf.c is now part of ggml.c
2023-07-26 18:55:32 +03:00
Georgi Gerganov
d313c0fa33
gguf : simplify gguf_get_val
2023-07-26 18:53:57 +03:00
Georgi Gerganov
cb871fa022
gguf : do not support passing existing ggml_context to gguf_init
2023-07-26 18:48:52 +03:00
Georgi Gerganov
860c9c63ce
gguf : add gguf_get_tensor_name()
2023-07-26 18:21:14 +03:00
Georgi Gerganov
78b226a959
gguf : initial model loading - not tested
2023-07-26 18:21:14 +03:00
Georgi Gerganov
d91b985d2d
gguf : read tensor info
2023-07-26 18:21:13 +03:00
Georgi Gerganov
8d6acfec12
gguf : read header + meta data
2023-07-26 18:21:13 +03:00
Georgi Gerganov
6873148771
gguf : first API pass
2023-07-26 18:21:13 +03:00
Georgi Gerganov
7e82d25f40
ci : disable CI temporary to not waste energy
2023-07-26 18:21:13 +03:00
M. Yusuf Sarıgöz
bae6b125f6
wip : implement GGUF ( #2397 )
...
* Add LLAMA_DEFAULT_RMS_EPS so we can change the default (#2384 )
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
* WIP: python class to write GGUF, incomplete C apı for reading
---------
Co-authored-by: Kawrakow <48489457+ikawrakow@users.noreply.github.com>
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-26 18:21:13 +03:00
Georgi Gerganov
4d698495ea
gguf : init
2023-07-26 18:21:12 +03:00
slaren
5488fb789e
ggml : allocate graphs in a context ( #2392 )
...
* ggml : graph allocation in contexts
* allocate work buffer as a ggml_object in ggml_graph_compute_with_ctx
* llama.cpp : allocate graph in the context
* add GGML_PAD
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-26 15:56:53 +02:00
Kawrakow
eb542d3932
Add LLAMA_DEFAULT_RMS_EPS so we can change the default ( #2384 )
...
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-25 18:35:53 +03:00
slaren
07aaa0f63f
ggml : fix ggml_flash_attn to use op_params ( #2387 )
...
* ggml : fix ggml_flash_attn to use op_params
2023-07-25 16:20:12 +02:00
ldwang
fce48caf9a
convert.py : support bpe tokenizer ( #2228 )
...
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert
Signed-off-by: ldwang <ftgreat@gmail.com>
* support bpe tokenizer in convert, fix
Signed-off-by: ldwang <ftgreat@gmail.com>
---------
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
2023-07-25 16:22:09 +03:00
Jiahao Li
875086bdb9
ggml : relax contiguous constraints in activation function ( #2371 )
2023-07-25 15:58:32 +03:00
slaren
da1889834a
ggml : improve graph build time via hash table lookup ( #2329 )
...
* improve graph build time
* ggml_tensor : use 1 bit per flag
* use a hash table instead
2023-07-25 15:32:20 +03:00
Hesen Peng
82552b7f54
build : fix line breaking error in build-info.sh ( #2349 )
...
* fix line breaking
* build number line break removal
2023-07-25 15:24:09 +03:00
Xiao-Yong Jin
0c06204fb3
main : add --in-prefix-bos
to prefix BOS to user inputs; keep EOS ( #2304 )
...
* add `--in-prefix-bos` to prefix BOS to user inputs; keep EOS
The BOS precedes the string specified by `--in-prefix`.
Model generated EOS is now kept in the context.
It provides a way to strictly following the prompt format used in
Llama-2-chat.
The EOS handling also benefits some existing finetunes that uses
EOS to mark the end of turn.
* examples/common: move input_prefix_bos to other bools
2023-07-25 15:19:11 +03:00
Eve
1fed755b1f
ci : add non-AVX scalar build/test ( #2356 )
...
* noavx build and test
* we don't need to remove f16c in windows
2023-07-25 15:16:13 +03:00
katsu560
be2301bcda
k_quants : add AVX support to dot functions with QK_K as 64 ( #2339 )
...
* add AVX to ggml_vec_dot_q2_K_q8_K()
* add AVX to ggml_vec_dot_q3_K_q8_K()
* add AVX to ggml_vec_dot_q4_K_q8_K()
* add AVX to ggml_vec_dot_q5_K_q8_K()
* add AVX to ggml_vec_dot_q6_K_q8_K()
* refactor AVX code in ggml_vec_dot_q6_K_q8_K()
2023-07-25 15:13:41 +03:00
Shouzheng Liu
1aa18ef994
metal : concurrently dispatch commands ( #2358 )
...
* metal: concurrently dispatch commands
Function `ggml_metal_graph_find_concurrency` will run and write
commands that can be issued concurrently to metal context `concur_list`
array, when `ggml_metal_graph_compute` is called for the first time.
* metal: don't call find_concurrency automatically.
* metal : code style changes
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-25 15:00:19 +03:00
Kawrakow
9a08eaf3c4
Another speed gain for Q4_0 and Q4_1 on Metal ( #2375 )
...
* Another speed gain for Q4_0 and Q4_1 on Metal
* Have N_DST, etc., be template parameters
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-25 13:48:29 +03:00
Kawrakow
129d844c87
Fix Q4_K and Q5_K for QK_K = 64 on CUDA ( #2359 )
...
* Fix Q4_K and Q5_K for QK_K = 64
* Very slightly better Q5_K bit fiddling
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2023-07-25 13:48:04 +03:00