llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-27 03:44:35 +00:00

Author	SHA1	Message	Date
Kerfuffle	e06cbcee73	gguf : add Python script to convert GGMLv3 LLaMA models to GGUF (#2682) * First pass at converting GGMLv3 LLaMA models to GGUF * Cleanups, better output during conversion * Fix vocab space conversion logic * More vocab conversion fixes * Add description to converted GGUF files * Improve help text, expand warning * Allow specifying name and description for output GGUF * Allow overriding vocab and hyperparams from original model metadata * Use correct params override var name * Fix wrong type size for Q8_K Better handling of original style metadata * Set default value for gguf add_tensor raw_shape KW arg	2023-08-21 17:45:52 +03:00
Georgi Gerganov	6490ff7198	py : fix whitespace	2023-08-21 16:42:27 +03:00
Georgi Gerganov	1e7a0092dd	Merge branch 'master' into gguf ggml-ci	2023-08-21 16:28:30 +03:00
klosax	7a7d1ba68a	convert-llama-hf-to-gguf.py : rope scale fix	2023-08-21 14:12:02 +02:00
klosax	9070e330ab	convert-llama-7b-pth-to-gguf.py : rope scale fix	2023-08-21 14:11:22 +02:00
klosax	c082b9fa0b	llama.cpp : use rope scale kv	2023-08-21 13:30:03 +02:00
klosax	dc1f051013	convert-llama-7b-pth-to-gguf.py : rope scale and added tokens	2023-08-21 13:27:53 +02:00
klosax	5f6ff387ca	convert-llama-hf-to-gguf.py : rope scale and added tokens	2023-08-21 13:25:14 +02:00
klosax	6a69a693cb	gguf.py : fix rope scale kv	2023-08-21 13:23:10 +02:00
Shouzheng Liu	dadbed99e6	metal : fix synchronization in new matrix multiplication kernel (#2686)	2023-08-21 13:59:29 +03:00
Kawrakow	cb1c0727bd	HellaSwag: split token evaluation into batches if needed (#2681) Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2023-08-21 11:11:31 +03:00
klosax	c818c405e0	convert-llama-hf-to-gguf.py : fix attn_q permute	2023-08-21 04:42:09 +02:00
klosax	58bde5c5c1	Delete convert-permute-debug.py	2023-08-21 04:35:06 +02:00
klosax	287db51015	Delete convert-permute-debug-master.py	2023-08-21 04:34:39 +02:00
klosax	d5c8fcfd8a	convert.py : 70b model working (change attn_q permute)	2023-08-21 04:33:33 +02:00
klosax	7de7cb4bd8	convert-permute-debug.py : change permute type of attn_q	2023-08-21 04:06:59 +02:00
klosax	4f92488dd6	convert-permute-debug-master.py : permute debug for master	2023-08-21 03:44:16 +02:00
klosax	5a02b9625a	convert-permute-debug.py : permute debug print	2023-08-21 03:24:29 +02:00
slaren	9e232f0234	ggml : move all type info to ggml_type_traits (#2663)	2023-08-20 22:17:53 +02:00
klosax	f838faa874	convert-llama-7b-pth-to-gguf.py : special tokens	2023-08-20 16:56:48 +02:00
klosax	76b46627e2	convert-llama-hf-to-gguf.py : special tokens	2023-08-20 16:54:42 +02:00
Kawrakow	5e9ff54a67	More efficient Hellaswag implementation (#2677) Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2023-08-20 16:44:46 +03:00
klosax	28b8c265eb	cmpnct_gpt2bpe.hpp : cleanup	2023-08-19 18:26:51 +02:00
klosax	c0a1269b7f	Update examples/server/README.md Co-authored-by: slaren <slarengh@gmail.com>	2023-08-19 15:27:37 +02:00
klosax	6a2e520095	cmpnct_gpt2bpe.hpp : remove non-general stuff	2023-08-19 13:19:02 +02:00
klosax	8945d47f52	gptneox-main.cpp : fixes	2023-08-19 12:09:24 +02:00
klosax	781bf2481f	falcon-main.cpp : fixes	2023-08-19 12:08:17 +02:00
klosax	dadf098b5a	cmpnct_gpt2bpe.hpp : fixes	2023-08-19 12:06:22 +02:00
klosax	b3a7a2b486	convert-falcon-hf-to-gguf.py : add tensor data layout	2023-08-19 12:05:11 +02:00
klosax	2c8055b65b	convert-falcon-hf-to-gguf.py : update ref	2023-08-19 01:08:39 +02:00
klosax	1d80eea574	falcon-main.cpp : fix for falcon 40b	2023-08-19 01:03:37 +02:00
klosax	bd5a57901b	gguf.py : fix for falcon 40b	2023-08-19 01:01:52 +02:00
klosax	281d6d1105	convert-llama-hf-to-gguf.py : remove extra kv	2023-08-19 00:32:56 +02:00
klosax	593b04fdcd	convert-llama-7b-pth-to-gguf.py : remove extra kv	2023-08-19 00:32:27 +02:00
klosax	c0e4ca630b	convert-gptneox-hf-to-gguf.py : remove extra kv	2023-08-19 00:31:56 +02:00
klosax	16ab9ba3b3	convert-falcon-hf-to-gguf.py : remove extra kv	2023-08-19 00:31:28 +02:00
klosax	d5e976c12b	falcon-main.cpp : falcon inference example	2023-08-19 00:02:18 +02:00
Georgi Gerganov	1f0bccb279	server : better default prompt (#2646)	2023-08-19 05:45:36 +08:00
Jhen-Jie Hong	f63564adfa	server : update xxd usage for older versions compatibility (#2649) * server : update xxd usage for older versions compatibility * remove unused $func	2023-08-19 05:41:32 +08:00
Adrian	2d8b76a110	Add link to clojure bindings to Readme. (#2659)	2023-08-18 21:39:22 +02:00
klosax	fb7c883cd3	convert-falcon-hf-to-gguf.py : falcon HF --> gguf conversion, not tested	2023-08-18 20:14:01 +02:00
Georgi Gerganov	25b8a8922d	llama : introduce enum llama_vocab_type + remove hardcoded string constants	2023-08-18 18:46:38 +03:00
Georgi Gerganov	7af633aec3	readme : incoming BREAKING CHANGE	2023-08-18 17:48:31 +03:00
Georgi Gerganov	a4ad2bf35c	llama : fix MPI build ggml-ci	2023-08-18 17:34:27 +03:00
Georgi Gerganov	5d2656d670	llama : avoid hardcoded special tokens	2023-08-18 17:29:20 +03:00
Georgi Gerganov	035d511457	llama : minor API updates	2023-08-18 17:10:20 +03:00
Georgi Gerganov	2d6c2c757c	llama : remove C++ API + reorganize common source in /common dir	2023-08-18 16:22:48 +03:00
Georgi Gerganov	38016ed9ec	Merge branch 'master' into gguf	2023-08-18 15:21:48 +03:00
Georgi Gerganov	660ca9bbca	llama : re-order functions	2023-08-18 14:56:36 +03:00
slaren	097e121e2f	llama : add benchmark example (#2626) * llama : add benchmark example * add to examples CMakeLists.txt * fix msvc build * add missing include * add Bessel's correction to stdev calculation Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * improve markdown formatting * add missing include * print warning is NDEBUG is not defined * remove n_prompt and n_gen from the matrix, use each value separately instead * better checks for non-optimized builds * llama.cpp : fix MEM_REQ_SCRATCH0 reusing the value of n_ctx of the first call * fix json formatting * add sql output * add basic cpu and gpu info (linx/cuda only) * markdown: also show values that differ from the default * markdown: add build id * cleanup * improve formatting * formatting --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2023-08-18 12:44:58 +02:00

1259 Commits