klosax
|
f175b05872
|
Makefile : add gptneox gguf example
|
2023-07-30 15:08:37 +02:00 |
|
klosax
|
e9192b0135
|
add gptneox gguf example
|
2023-07-30 15:05:37 +02:00 |
|
klosax
|
4ed98bf1ab
|
Update convert-llama-h5-to-gguf.py
|
2023-07-30 15:01:47 +02:00 |
|
klosax
|
b19c11750b
|
ggml.c : add gguf_get_arr_n
|
2023-07-30 14:58:50 +02:00 |
|
klosax
|
b4676ee447
|
ggml.h : increase GGML_MAX_NAME to 64
|
2023-07-30 14:51:37 +02:00 |
|
klosax
|
ccd81a751b
|
gguf.py : add layer norm eps and merges
|
2023-07-30 14:48:14 +02:00 |
|
klosax
|
0790c121aa
|
constants.py : add layer norm eps
|
2023-07-30 14:46:36 +02:00 |
|
M. Yusuf Sarıgöz
|
87c34e4dd4
|
gguf : update convert-llama-h5-to-gguf.py
|
2023-07-30 01:09:22 +03:00 |
|
M. Yusuf Sarıgöz
|
32e037ffbe
|
gguf : fix set is not subscriptable
|
2023-07-30 01:01:13 +03:00 |
|
Johannes Gäßler
|
11f3ca06b8
|
CUDA: Quantized matrix matrix multiplication (#2160)
* mmq implementation for non k-quants
* q6_K
* q2_K
* q3_k
* q4_K
* vdr
* q5_K
* faster q8_1 loading
* loop unrolling
* add __restrict__
* q2_K sc_high
* GGML_CUDA_MMQ_Y
* Updated Makefile
* Update Makefile
* DMMV_F16 -> F16
* Updated README, CMakeLists
* Fix CMakeLists.txt
* Fix CMakeLists.txt
* Fix multi GPU out-of-bounds
|
2023-07-29 23:04:44 +02:00 |
|
Johannes Gäßler
|
9baf9ef304
|
CUDA: faster multi GPU synchronization (#2448)
|
2023-07-29 23:04:10 +02:00 |
|
klosax
|
06c3e4a1a7
|
Update convert-llama-h5-to-gguf.py
|
2023-07-29 21:38:01 +02:00 |
|
klosax
|
9577821487
|
gguf.py : support any type
|
2023-07-29 21:29:07 +02:00 |
|
klosax
|
2c22e3bcdb
|
ggml.c : get arr str and f32
|
2023-07-29 20:37:47 +02:00 |
|
klosax
|
34469b9ea7
|
ggml.h : get array str and f32
|
2023-07-29 20:36:06 +02:00 |
|
M. Yusuf Sarıgöz
|
0f5e57f01d
|
gguf : handle already encoded string
|
2023-07-29 19:56:06 +03:00 |
|
klosax
|
8ad7cd49fb
|
Update convert-llama-h5-to-gguf.py
|
2023-07-29 16:47:00 +02:00 |
|
M. Yusuf Sarıgöz
|
0317c41d98
|
gguf : upd gguf conversion script
|
2023-07-29 13:31:07 +03:00 |
|
M. Yusuf Sarıgöz
|
cc3dd7f042
|
gguf : write tokenizer data
|
2023-07-29 13:30:22 +03:00 |
|
M. Yusuf Sarıgöz
|
8a76dd8a85
|
gguf : write tensors one by one
|
2023-07-29 13:17:28 +03:00 |
|
M. Yusuf Sarıgöz
|
c861e234f4
|
gguf : write tensors one by one
|
2023-07-29 12:49:01 +03:00 |
|
M. Yusuf Sarıgöz
|
0c219fb5b5
|
gguf : fix writing gguf arrays
|
2023-07-29 12:42:54 +03:00 |
|
M. Yusuf Sarıgöz
|
93f7f7aef7
|
gguf : write tensors one by one and code reuse
|
2023-07-29 12:34:35 +03:00 |
|
M. Yusuf Sarıgöz
|
aa99562d70
|
Merge branch 'gguf' of https://github.com//ggerganov/llama.cpp into gguf
|
2023-07-29 12:26:11 +03:00 |
|
M. Yusuf Sarıgöz
|
ea5f9ad2ca
|
gguf : fix writing gguf arrays
|
2023-07-29 12:25:43 +03:00 |
|
klosax
|
999431c4b6
|
quick and dirty conversion example
|
2023-07-29 11:20:05 +02:00 |
|
M. Yusuf Sarıgöz
|
d54f53ca51
|
gguf : add tokenization constants
|
2023-07-29 12:04:45 +03:00 |
|
M. Yusuf Sarıgöz
|
06f423a8e1
|
gguf : write sample tensors to read
|
2023-07-29 10:26:26 +03:00 |
|
M. Yusuf Sarıgöz
|
08dc8fd884
|
gguf : do not hardcode tensor names to read
|
2023-07-29 10:24:46 +03:00 |
|
M. Yusuf Sarıgöz
|
9475cdb7a3
|
Merge branch 'gguf-write-tokenization' into gguf
|
2023-07-29 00:36:35 +03:00 |
|
M. Yusuf Sarıgöz
|
1495735aac
|
gguf : fix writing tensors
|
2023-07-29 00:26:22 +03:00 |
|
klosax
|
3492f848d7
|
gguf : add gguf_find_key (#2438)
* gguf.cpp : find key example
* ggml.h : add gguf_find_key
* ggml.c : add gguf_find_key
|
2023-07-28 23:45:24 +03:00 |
|
klosax
|
8a88e5855c
|
perplexity : add Hellaswag calculation (#2389)
* common.h : add hellaswag / remove perplexity-lines
* common.cpp : add hellaswag / remove perplexity-lines
* perplexity.cpp : add hellaswag scores / remove perplexity-lines
* perplexity.cpp : clean up
* common.h : change default param value
* common.cpp : Change default param
* perplexity.cpp : alter wording
* common.h : alter wording
* common.cpp : alter wording
|
2023-07-28 21:25:36 +03:00 |
|
Lee
|
a9559bf77b
|
ggml : workaround for missing _mm256_setr_m128i in GCC < 8 in k_quants.c (#2405)
|
2023-07-28 21:17:45 +03:00 |
|
eric8607242
|
ee1b497c98
|
llama : support more diverse tokenizers? (#2420)
* supporting more diverse tokenizers
* Update llama.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
2023-07-28 21:10:05 +03:00 |
|
Georgi Gerganov
|
d73b8d48b4
|
examples : fix whitespace
|
2023-07-28 21:05:08 +03:00 |
|
nhamanasu
|
34ae1caf7f
|
examples : server chat mode with llama2 (#2400)
* add: server chat mode with llama2
* fix: remove the unnecessary last \n
|
2023-07-28 21:02:10 +03:00 |
|
Weird Constructor
|
d91f3f0c55
|
readme : fix the description of the Tail free sampling (TFS) method (#2431)
|
2023-07-28 11:44:43 +03:00 |
|
Rand Xie
|
65cdf34bdc
|
llama : use n_embd_gqa instead of n_embd to handle llama-2 70B (#2433)
|
2023-07-28 11:42:53 +03:00 |
|
M. Yusuf Sarıgöz
|
11ef380c2a
|
GGUF : write tensor (#2426)
* WIP: Write tensor
* GGUF : Support writing tensors in Python
* refactor : rm unused import and upd todos
* fix : fix errors upd writing example
* rm example.gguf
* gitignore *.gguf
* undo formatting
|
2023-07-28 11:34:16 +03:00 |
|
niansa/tuxifan
|
edcc7ae7d2
|
Obtaining LLaMA 2 instructions (#2308)
* Obtaining LLaMA 2 instructions
* Removed sharing warning for LLaMA 2
* Linked TheBloke's GGML repos
* Add LLaMA 2 to list of supported models
* Added LLaMA 2 usage instructions
* Added links to LLaMA 2 70B models
|
2023-07-28 03:14:11 +02:00 |
|
mj-shifu
|
7c529cede6
|
convert.py : Update to support 70B HF format model files (#2427)
* convert.py : fix llama 2 70b conversion from Huggingface
|
2023-07-27 14:39:17 -06:00 |
|
Georgi Gerganov
|
d2bb3ac10b
|
convert.py : remove GGML vocab + other obsolete stuff
|
2023-07-27 16:36:35 +03:00 |
|
Georgi Gerganov
|
68f53485e4
|
convert.py : start a new simplified implementation by removing old stuff
|
2023-07-27 15:56:53 +03:00 |
|
Georgi Gerganov
|
158be8f7f4
|
gguf.py : some code style changes
|
2023-07-27 15:37:06 +03:00 |
|
Georgi Gerganov
|
d2b6ca13ad
|
gguf : add array support
|
2023-07-27 14:53:07 +03:00 |
|
Georgi Gerganov
|
d89533dff6
|
gguf : expose the gguf_type enum through the API for now
|
2023-07-27 11:10:34 +03:00 |
|
Georgi Gerganov
|
1a941869cb
|
metal : disable graph concurrency optimization due to bug (#2413)
|
2023-07-27 11:00:54 +03:00 |
|
M. Yusuf Sarıgöz
|
c85d3178b3
|
refactor : reduce code duplication and better API (#2415)
|
2023-07-27 10:29:29 +03:00 |
|
slaren
|
b5472ea0ad
|
ggml : fix assert in ggml_set_unary_op (#2410)
|
2023-07-26 23:57:23 +02:00 |
|