llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git (synced 2024-12-27 03:44:35 +00:00)

Author	SHA1	Message	Date
M. Yusuf Sarıgöz	1d93d04ce2	gguf : refactor pth to gguf conversion script	2023-08-17 19:58:27 +03:00
M. Yusuf Sarıgöz	f71704177f	gguf : rename h5 to hf (for HuggingFace)	2023-08-17 19:49:15 +03:00
M. Yusuf Sarıgöz	9f02694c91	gguf : refactor gptneox conversion script	2023-08-17 19:45:06 +03:00
M. Yusuf Sarıgöz	22c61c5b45	gguf : style fixes in simple conversion script	2023-08-17 19:05:43 +03:00
M. Yusuf Sarıgöz	2f8fc92d86	gguf : fix conflicts	2023-08-17 18:51:14 +03:00
klosax	d646c4efce	convert.py : n_head_kv optional and .gguf file extension	2023-08-17 17:20:36 +02:00
Georgi Gerganov	dd016cc246	Revert "ci : disable CI temporary to not waste energy" This reverts commit `7e82d25f40`.	2023-08-17 17:23:16 +03:00
Georgi Gerganov	2ddd9681d6	convert.py : update to support GGUF output	2023-08-17 17:22:43 +03:00
Georgi Gerganov	e0429d38e4	convert-new.py : output gguf (#2635) * convert-new.py : output gguf (WIP) * convert-new.py : add gguf key-value pairs * llama : add hparams.ctx_train + no longer print ftype * convert-new.py : minor fixes * convert-new.py : vocab-only option should work now * llama : fix tokenizer to use llama_char_to_byte * tests : add new ggml-vocab-llama.gguf * convert-new.py : tensor name mapping * convert-new.py : add map for skipping tensor serialization * convert-new.py : convert script now works * gguf.py : pick some of the refactoring from #2644 * convert-new.py : minor fixes	2023-08-17 17:19:52 +03:00
M. Yusuf Sarıgöz	5f97a48fc1	gguf : single pass for writing tensors + refactoring writer	2023-08-17 16:57:50 +03:00
M. Yusuf Sarıgöz	dce07c3121	gguf : single pass for writing tensors + refactoring writer	2023-08-17 16:48:49 +03:00
klosax	d6fd53afd6	llama.cpp : use ggml_elements()	2023-08-17 15:24:35 +02:00
klosax	5a0a2c5685	llama.cpp : print actual model size	2023-08-17 15:18:16 +02:00
M. Yusuf Sarıgöz	f31e9230ad	gguf : single pass for writing tensors + refactoring writer	2023-08-17 15:19:30 +03:00
M. Yusuf Sarıgöz	42f8fe1927	examples/gguf : no need to keep q option for quantization any more	2023-08-17 08:56:42 +03:00
Georgi Gerganov	5ec18934ad	convert-new.py : pick #2427 for HF 70B support	2023-08-16 20:16:15 +03:00
Georgi Gerganov	c8ee87f141	gguf.py : merge all files in gguf.py	2023-08-16 19:55:49 +03:00
Georgi Gerganov	88b5769487	gguf : deduplicate (#2629) * gguf : better type names * dedup : CPU + Metal is working * ggml : fix warnings about unused results * llama.cpp : fix line feed and compiler warning * llama : fix strncpy warning + note token_to_str does not write null * llama : restore the original load/save session implementation Will migrate this to GGUF in the future * convert-llama-h5-to-gguf.py : support alt ctx param name * ggml : assert when using ggml_mul with non-F32 src1 * examples : dedup simple --------- Co-authored-by: klosax <131523366+klosax@users.noreply.github.com>	2023-08-16 19:25:29 +03:00
Georgi Gerganov	758ff1bbb5	llama : refactor model loading code (#2620) * llama : style formatting + remove helper methods * llama : fix quantization using gguf tool * llama : simplify gguf_file_saver * llama : fix method names * llama : simplify write_header() * llama : no need to pass full file loader to the file saver just gguf_ctx * llama : gguf_file_saver write I32 * llama : refactor tensor names (#2622) * gguf: update tensor names searched in quantization * gguf : define tensor names as constants * gguf : initial write API (not tested yet) * gguf : write to file API (not tested) * gguf : initial write API ready + example * gguf : fix header write * gguf : fixes + simplify example + add ggml_nbytes_pad() * gguf : minor * llama : replace gguf_file_saver with new gguf write API * gguf : streaming support when writing files * gguf : remove oboslete write methods * gguf : remove obosolete gguf_get_arr_xxx API * llama : simplify gguf_file_loader * llama : move hparams and vocab from gguf_file_loader to llama_model_loader * llama : merge gguf-util.h in llama.cpp * llama : reorder definitions in .cpp to match .h * llama : minor simplifications * llama : refactor llama_model_loader (WIP) wip : remove ggml_ctx from llama_model_loader wip : merge gguf_file_loader in llama_model_loader * llama : fix shape prints * llama : fix Windows build + fix norm_rms_eps key * llama : throw error on missing KV paris in model meta data * llama : improve printing + log meta data * llama : switch print order of meta data --------- Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>	2023-08-16 14:34:03 +03:00
klosax	ea5615a03a	convert-llama-h5-to-gguf.py : clarify the reverse permute	2023-08-16 11:23:15 +02:00
klosax	4a1741aa2d	gptneox-main.cpp : add tensor data layout	2023-08-15 19:56:19 +02:00
klosax	2ae0e985b3	convert-llama-7b-pth-to-gguf.py : add tensor data layout	2023-08-15 19:55:13 +02:00
klosax	66756c82af	convert-llama-h5-to-gguf.py : add tensor data layout	2023-08-15 19:54:33 +02:00
klosax	b6056c3db8	gguf.py : add tensor data layout	2023-08-15 19:53:44 +02:00
klosax	2dd5d2c92c	convert-llama-h5-to-gguf.py : add 70b gqa support	2023-08-15 00:43:10 +02:00
klosax	ca4758290c	gguf-llama.cpp : fix n_head_kv	2023-08-14 23:18:41 +02:00
klosax	ab2cbd03ca	convert-llama-7b-pth-to-gguf.py : add token types	2023-08-14 22:10:50 +02:00
klosax	cedb4870c6	gguf.py : add token types	2023-08-14 22:08:40 +02:00
klosax	5d518d421f	constants.py : add token types	2023-08-14 22:07:53 +02:00
klosax	7ec125b1dc	convert-llama-h5-to-gguf.py : add token types	2023-08-14 22:06:33 +02:00
Georgi Gerganov	6c63550f63	llama : update tokenizer style	2023-08-14 22:11:57 +03:00
Georgi Gerganov	7494c78428	llama : sync gguf-llama with llama (#2613) * llama : sync gguf-llama with llama * tests : fix build + warnings (test-tokenizer-1 still fails) * tests : fix wstring_convert * convert : fix layer names * llama : sync gguf-llama.cpp * convert : update HF converter to new tokenizer voodoo magics	2023-08-14 21:33:33 +03:00
goerch	afc4ca2889	convert : update convert-new.py with tokenizer fixes (#2614) * Merge tokenizer fixes into the gguf branch. * Add test vocabularies * Adapt convert-new.py (and fix a clang-cl compiler error on windows)	2023-08-14 20:20:04 +03:00
goerch	ec1b100720	llama : tokenizer fixes (#2549) * Merge tokenizer fixes into the gguf branch. * Add test vocabularies	2023-08-14 19:30:28 +03:00
Georgi Gerganov	8af3a99ff1	Merge branch 'master' into gguf	2023-08-14 16:39:18 +03:00
Georgi Gerganov	6f14854880	gitignore : add gptneox-main	2023-08-14 16:39:02 +03:00
Jhen-Jie Hong	d783f7982e	metal : return null instead of exit(1) (#2573)	2023-08-14 16:37:39 +03:00
Cheng Shao	d75561df20	server : add --numa support (#2524)	2023-08-14 16:36:42 +03:00
Kamil Tomšík	348acf188c	llama : add missing enum keyword in function signatures (#2610)	2023-08-14 16:35:16 +03:00
Georgi Gerganov	f00780b2ee	llama : sync gguf-llama.cpp with latest llama.cpp (#2608) * llama : sync gguf-llama.cpp with latest llama.cpp * minor : indentation + assert * llama : refactor gguf_buffer and gguf_ctx_buffer * llama : minor	2023-08-14 16:28:44 +03:00
klosax	6f64b6c0f8	Create convert-llama-7b-pth-to-gguf.py	2023-08-14 13:51:09 +02:00
Georgi Gerganov	62490f1380	gguf : use UNIX line ending	2023-08-14 13:04:35 +03:00
Georgi Gerganov	0c19ae70d5	simple : minor style changes	2023-08-14 12:58:12 +03:00
klosax	5c5a95ba2d	gguf.py : dont add empty strings	2023-08-14 11:22:06 +02:00
klosax	a7d226f871	convert-llama-h5-to-gguf.py : fixes	2023-08-14 11:14:24 +02:00
klosax	d753dfbcc8	gptneox-main.cpp : tensor name map changes	2023-08-14 10:59:18 +02:00
klosax	806a15749d	Delete gguf_tensor_map.py	2023-08-14 10:57:19 +02:00
klosax	51939d7d1b	Create gguf_namemap.py : tensor name map changes	2023-08-14 10:56:59 +02:00
klosax	5d22a9db13	convert-gptneox-h5-to-gguf.py : tensor name map changes	2023-08-14 10:55:44 +02:00
Johannes Gäßler	1cd06fa25e	CUDA: launch_bounds, small q4_K, q5_K mmq refactor (#2596)	2023-08-14 10:41:22 +02:00
1172 Commits