llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-27 20:04:35 +00:00

Author	SHA1	Message	Date
goerch	49c25cce19	tests : use new tokenizer type API (#2692) * Merge tokenizer fixes into the gguf branch. * Add test vocabularies * Adapt convert-new.py (and fix a clang-cl compiler error on Windows) * Improved tokenizer test. But does it work on macOS? * Improve token type support - Added @klosax code to convert.py - Improved token type support in vocabulary * Exclude platform-dependent tests * More sentencepiece compatibility by eliminating magic numbers * Restored accidentally removed comment * Improve commentary * Use token type API in test-tokenizer-1.cpp	2023-08-21 20:11:14 +03:00
Georgi Gerganov	0b53b8b08d	llama : add API for token type ggml-ci	2023-08-21 19:35:31 +03:00
goerch	8d177eddeb	llama : improve token type support (#2668) * Merge tokenizer fixes into the gguf branch. * Add test vocabularies * Adapt convert-new.py (and fix a clang-cl compiler error on Windows) * Improved tokenizer test. But does it work on macOS? * Improve token type support - Added @klosax code to convert.py - Improved token type support in vocabulary * Exclude platform-dependent tests * More sentencepiece compatibility by eliminating magic numbers * Restored accidentally removed comment	2023-08-21 18:56:02 +03:00
klosax	d5c8fcfd8a	convert.py : 70b model working (change attn_q permute)	2023-08-21 04:33:33 +02:00
Georgi Gerganov	acaa98234a	convert.py : fix HF tensor permuting / unpacking ggml-ci	2023-08-17 21:06:45 +03:00
Georgi Gerganov	8ace03ad3d	convert.py : better always have n_head_kv and default it to n_head	2023-08-17 18:47:06 +03:00
klosax	d646c4efce	convert.py : n_head_kv optional and .gguf file extension	2023-08-17 17:20:36 +02:00
Georgi Gerganov	2ddd9681d6	convert.py : update to support GGUF output	2023-08-17 17:22:43 +03:00
goerch	ec1b100720	llama : tokenizer fixes (#2549) * Merge tokenizer fixes into the gguf branch. * Add test vocabularies	2023-08-14 19:30:28 +03:00
Keiichi Tabata	2e8265ae17	convert.py : add missing abstract methods for quantized data (#2491)	2023-08-06 09:34:05 +03:00
mj-shifu	7c529cede6	convert.py : Update to support 70B HF format model files (#2427) * convert.py : fix llama 2 70b conversion from Huggingface	2023-07-27 14:39:17 -06:00
ldwang	fce48caf9a	convert.py : support bpe tokenizer (#2228) * support bpe tokenizer in convert Signed-off-by: ldwang <ftgreat@gmail.com> * support bpe tokenizer in convert Signed-off-by: ldwang <ftgreat@gmail.com> * support bpe tokenizer in convert, fix Signed-off-by: ldwang <ftgreat@gmail.com> --------- Signed-off-by: ldwang <ftgreat@gmail.com> Co-authored-by: ldwang <ftgreat@gmail.com>	2023-07-25 16:22:09 +03:00
Georgi Gerganov	e76d630df1	llama : grouped-query attention + LLaMAv2 70B support (#2276) * CUDA: GQA implementation * llama : support for GQA and LLaMAv2 70B ggml-ci * py : fix hparams parsing (if-else blocks) ggml-ci * py : oh boy .. ggml-ci * help : fix gqa value for 70B ggml-ci --------- Co-authored-by: JohannesGaessler <johannesg@5d6.de>	2023-07-23 15:09:47 +03:00
wzy	b1f4290953	cmake : install targets (#2256) fix #2252	2023-07-19 10:01:11 +03:00
Aarni Koskela	3e08ae99ce	convert.py: add mapping for safetensors bf16 (#1598) Fixes #1473	2023-07-07 09:12:49 -04:00
Judd	36680f6e40	convert : update for baichuan (#2081) 1. guess n_layers; 2. relax warnings on context size; 3. add a note that its derivatives are also supported. Co-authored-by: Judd <foldl@boxvest.com>	2023-07-06 19:23:49 +03:00
Judd	471aab6e4c	convert : add support of baichuan-7b (#2055) Co-authored-by: Judd <foldl@boxvest.com>	2023-07-01 20:00:25 +03:00
AN Long	c943d823c1	convert : fix invalid params in write_vocab_only (#1975)	2023-06-24 14:02:06 +03:00
Erik Scholz	7487137227	rework convert.py to read hyper-parameters from config.json (#1958) * Read hyper-parameters from HuggingFace-transformer config.json, if they exist, and fall back to guessing, like before otherwise. This allows converting open_llama 3B and other non-standard model designs.	2023-06-22 14:20:47 +02:00
Jiří Podivín	5ddf7ea1fb	hooks : setting up flake8 and pre-commit hooks (#1681) Small, non-functional changes were made to non-compliant files. These include breaking up long lines, whitespace sanitation and unused import removal. Maximum line length in python files was set to a generous 125 chars, in order to minimize number of changes needed in scripts and general annoyance. The "txt" prompts directory is excluded from the checks as it may contain oddly formatted files and strings for a good reason. Signed-off-by: Jiri Podivin <jpodivin@gmail.com>	2023-06-17 13:32:48 +03:00
Tom Jobbins	2b2646931b	convert.py: Support models which are stored in a single pytorch_model.bin (#1469) * Support models in a single pytorch_model.bin * Remove spurious line with typo	2023-05-17 00:04:35 +02:00
ubik2	95078cc554	convert: add ability to convert safetensors files (#1276) * when loading a safetensors file, ignore the metadata header * check for safetensors files first, and only use PyTorch versions when safetensors aren't available	2023-05-08 13:54:26 +02:00
Benjamin Lecaillon	a90e96b266	Convert.py @staticmethod (#1327) * Line 698 has one @staticmethod and should not otherwise throw error at unpickle.load() as not callable * Update convert.py --------- Co-authored-by: Ivan Stepanov <ivanstepanovftw@gmail.com>	2023-05-05 03:17:07 +03:00
Ivan Stepanov	d3e8093e9b	convert: support DT_BF16 tensors (#1309) Co-authored-by: Pavol Rusnak <pavol@rusnak.io>	2023-05-04 18:54:37 +02:00
Cameron	4ad73137a1	add 4_0 to default outfile namestr dict (#1031) this came up when trying to convert the gpt4all-lora-unfiltered-quantized.bin file	2023-04-17 20:26:23 +02:00
Georgi Gerganov	3173a62eb9	stdout : vertically align outputs for better readability	2023-04-16 13:59:27 +03:00
comex	74f5899df4	convert.py: Fix loading safetensors and ggml format on Windows (#991) Calling `mmap.mmap` on Windows apparently resets the file offset of the raw file object (and makes the BufferedReader return a negative file offset). For safetensors, avoid using the file offset after calling mmap. For GGML format, explicitly save and restore the offset. Fixes #966.	2023-04-15 23:53:21 +02:00
Pavol Rusnak	43ffdefb74	py : fix flake8 and isort nitpicks (#960)	2023-04-14 14:23:21 +02:00
comex	723dac55fa	py : new conversion script (#545) Current status: Working, except for the latest GPTQ-for-LLaMa format that includes `g_idx`. This turns out to require changes to GGML, so for now it only works if you use the `--outtype` option to dequantize it back to f16 (which is pointless except for debugging). I also included some cleanup for the C++ code. This script is meant to replace all the existing conversion scripts (including the ones that convert from older GGML formats), while also adding support for some new formats. Specifically, I've tested with: - [x] `LLaMA` (original) - [x] `llama-65b-4bit` - [x] `alpaca-native` - [x] `alpaca-native-4bit` - [x] LLaMA converted to 'transformers' format using `convert_llama_weights_to_hf.py` - [x] `alpaca-native` quantized with `--true-sequential --act-order --groupsize 128` (dequantized only) - [x] same as above plus `--save_safetensors` - [x] GPT4All - [x] stock unversioned ggml - [x] ggmh There's enough overlap in the logic needed to handle these different cases that it seemed best to move to a single script. I haven't tried this with Alpaca-LoRA because I don't know where to find it. Useful features: - Uses multiple threads for a speedup in some cases (though the Python GIL limits the gain, and sometimes it's disk-bound anyway). - Combines split models into a single file (both the intra-tensor split of the original and the inter-tensor split of 'transformers' format files). Single files are more convenient to work with and more friendly to future changes to use memory mapping on the C++ side. To accomplish this without increasing memory requirements, it has some custom loading code which avoids loading whole input files into memory at once. - Because of the custom loading code, it no longer depends on PyTorch, which might make installing dependencies slightly easier or faster... although it still depends on NumPy and sentencepiece, so I don't know if there's any meaningful difference. In any case, I also added a requirements.txt file to lock the dependency versions in case of any future breaking changes. - Type annotations checked with mypy. - Some attempts to be extra user-friendly: - The script tries to be forgiving with arguments, e.g. you can specify either the model file itself or the directory containing it. - The script doesn't depend on config.json / params.json, just in case the user downloaded files individually and doesn't have those handy. But you still need tokenizer.model and, for Alpaca, added_tokens.json. - The script tries to give a helpful error message if added_tokens.json is missing.	2023-04-14 10:03:03 +03:00

29 Commits