mirror of
https://github.com/ggerganov/llama.cpp.git
synced 2024-12-25 02:44:36 +00:00
py : switch to snake_case (#8305)
* py : switch to snake_case ggml-ci
* cont ggml-ci
* cont ggml-ci
* cont : fix link
* gguf-py : use snake_case in scripts entrypoint export
* py : rename requirements for convert_legacy_llama.py
  Needed for scripts/check-requirements.sh

---------

Co-authored-by: Francis Couture-Harpin <git@compilade.net>
parent
f09b7cb609
commit
e235b267a2
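The rename applied throughout this commit can be summarized as: hyphens in Python script file names become underscores, while directory names and non-Python files keep their spelling. A minimal sketch of that mapping follows; the helper name `to_snake_case_script` is hypothetical and is not part of the repository.

```python
from pathlib import PurePosixPath


def to_snake_case_script(path: str) -> str:
    """Replace hyphens with underscores in the file name of a .py path.

    Hypothetical helper for illustration only. Directory components
    (e.g. "gguf-py") are left untouched, as are non-Python files.
    """
    p = PurePosixPath(path)
    if p.suffix != ".py":
        return path  # leave non-Python files alone
    return str(p.with_name(p.name.replace("-", "_")))


if __name__ == "__main__":
    for old in (
        "convert-hf-to-gguf.py",
        "examples/convert-legacy-llama.py",
        "gguf-py/scripts/gguf-dump.py",
    ):
        print(old, "->", to_snake_case_script(old))
```

Note that the requirements files in this commit keep their `requirements-` prefix and only change the script-name part, so they do not follow this helper exactly.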
@ -26,7 +26,7 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)
|
|||||||
|
|
||||||
### Hot topics
|
### Hot topics
|
||||||
|
|
||||||
- **`convert.py` has been deprecated and moved to `examples/convert-legacy-llama.py`, please use `convert-hf-to-gguf.py`** https://github.com/ggerganov/llama.cpp/pull/7430
|
- **`convert.py` has been deprecated and moved to `examples/convert_legacy_llama.py`, please use `convert_hf_to_gguf.py`** https://github.com/ggerganov/llama.cpp/pull/7430
|
||||||
- Initial Flash-Attention support: https://github.com/ggerganov/llama.cpp/pull/5021
|
- Initial Flash-Attention support: https://github.com/ggerganov/llama.cpp/pull/5021
|
||||||
- BPE pre-tokenization support has been added: https://github.com/ggerganov/llama.cpp/pull/6920
|
- BPE pre-tokenization support has been added: https://github.com/ggerganov/llama.cpp/pull/6920
|
||||||
- MoE memory layout has been updated - reconvert models for `mmap` support and regenerate `imatrix` https://github.com/ggerganov/llama.cpp/pull/6387
|
- MoE memory layout has been updated - reconvert models for `mmap` support and regenerate `imatrix` https://github.com/ggerganov/llama.cpp/pull/6387
|
||||||
@ -636,8 +636,8 @@ Building the program with BLAS support may lead to some performance improvements
|
|||||||
|
|
||||||
To obtain the official LLaMA 2 weights please see the <a href="#obtaining-and-using-the-facebook-llama-2-model">Obtaining and using the Facebook LLaMA 2 model</a> section. There is also a large selection of pre-quantized `gguf` models available on Hugging Face.
|
To obtain the official LLaMA 2 weights please see the <a href="#obtaining-and-using-the-facebook-llama-2-model">Obtaining and using the Facebook LLaMA 2 model</a> section. There is also a large selection of pre-quantized `gguf` models available on Hugging Face.
|
||||||
|
|
||||||
Note: `convert.py` has been moved to `examples/convert-legacy-llama.py` and shouldn't be used for anything other than `Llama/Llama2/Mistral` models and their derivatives.
|
Note: `convert.py` has been moved to `examples/convert_legacy_llama.py` and shouldn't be used for anything other than `Llama/Llama2/Mistral` models and their derivatives.
|
||||||
It does not support LLaMA 3, you can use `convert-hf-to-gguf.py` with LLaMA 3 downloaded from Hugging Face.
|
It does not support LLaMA 3, you can use `convert_hf_to_gguf.py` with LLaMA 3 downloaded from Hugging Face.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# obtain the official LLaMA model weights and place them in ./models
|
# obtain the official LLaMA model weights and place them in ./models
|
||||||
@ -654,7 +654,7 @@ ls ./models
|
|||||||
python3 -m pip install -r requirements.txt
|
python3 -m pip install -r requirements.txt
|
||||||
|
|
||||||
# convert the model to ggml FP16 format
|
# convert the model to ggml FP16 format
|
||||||
python3 convert-hf-to-gguf.py models/mymodel/
|
python3 convert_hf_to_gguf.py models/mymodel/
|
||||||
|
|
||||||
# quantize the model to 4-bits (using Q4_K_M method)
|
# quantize the model to 4-bits (using Q4_K_M method)
|
||||||
./llama-quantize ./models/mymodel/ggml-model-f16.gguf ./models/mymodel/ggml-model-Q4_K_M.gguf Q4_K_M
|
./llama-quantize ./models/mymodel/ggml-model-f16.gguf ./models/mymodel/ggml-model-Q4_K_M.gguf Q4_K_M
|
||||||
|
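The README hunk above describes a two-step flow: convert a Hugging Face model directory to an FP16 GGUF file, then quantize it. The sketch below only builds the two command lines with the renamed script; it does not run them. The function name `conversion_commands` is hypothetical, and passing `--outfile` to `convert_hf_to_gguf.py` is taken from the CI script hunks in this commit rather than from the README text.

```python
from pathlib import PurePosixPath


def conversion_commands(model_dir: str, quant: str = "Q4_K_M") -> list[list[str]]:
    """Build the convert and quantize commands for one model directory.

    Hypothetical helper: it returns argument lists and leaves execution
    (for example via subprocess.run) to the caller.
    """
    model = PurePosixPath(model_dir)
    f16 = model / "ggml-model-f16.gguf"
    out = model / f"ggml-model-{quant}.gguf"
    return [
        # Step 1: convert the model to an FP16 GGUF file.
        ["python3", "convert_hf_to_gguf.py", str(model), "--outfile", str(f16)],
        # Step 2: quantize the FP16 file with the chosen method.
        ["./llama-quantize", str(f16), str(out), quant],
    ]


if __name__ == "__main__":
    for cmd in conversion_commands("models/mymodel/"):
        print(" ".join(cmd))
```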
@ -287,7 +287,7 @@ function gg_run_open_llama_7b_v2 {
|
|||||||
(time cmake -DCMAKE_BUILD_TYPE=Release ${CMAKE_EXTRA} -DGGML_CUDA=1 .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
|
(time cmake -DCMAKE_BUILD_TYPE=Release ${CMAKE_EXTRA} -DGGML_CUDA=1 .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
|
||||||
(time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log
|
(time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log
|
||||||
|
|
||||||
python3 ../examples/convert-legacy-llama.py ${path_models} --outfile ${path_models}/ggml-model-f16.gguf
|
python3 ../examples/convert_legacy_llama.py ${path_models} --outfile ${path_models}/ggml-model-f16.gguf
|
||||||
|
|
||||||
model_f16="${path_models}/ggml-model-f16.gguf"
|
model_f16="${path_models}/ggml-model-f16.gguf"
|
||||||
model_q8_0="${path_models}/ggml-model-q8_0.gguf"
|
model_q8_0="${path_models}/ggml-model-q8_0.gguf"
|
||||||
@ -421,7 +421,7 @@ function gg_run_pythia_1_4b {
|
|||||||
(time cmake -DCMAKE_BUILD_TYPE=Release ${CMAKE_EXTRA} .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
|
(time cmake -DCMAKE_BUILD_TYPE=Release ${CMAKE_EXTRA} .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
|
||||||
(time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log
|
(time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log
|
||||||
|
|
||||||
python3 ../convert-hf-to-gguf.py ${path_models} --outfile ${path_models}/ggml-model-f16.gguf
|
python3 ../convert_hf_to_gguf.py ${path_models} --outfile ${path_models}/ggml-model-f16.gguf
|
||||||
|
|
||||||
model_f16="${path_models}/ggml-model-f16.gguf"
|
model_f16="${path_models}/ggml-model-f16.gguf"
|
||||||
model_q8_0="${path_models}/ggml-model-q8_0.gguf"
|
model_q8_0="${path_models}/ggml-model-q8_0.gguf"
|
||||||
@ -553,7 +553,7 @@ function gg_run_pythia_2_8b {
|
|||||||
(time cmake -DCMAKE_BUILD_TYPE=Release ${CMAKE_EXTRA} -DGGML_CUDA=1 .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
|
(time cmake -DCMAKE_BUILD_TYPE=Release ${CMAKE_EXTRA} -DGGML_CUDA=1 .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
|
||||||
(time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log
|
(time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log
|
||||||
|
|
||||||
python3 ../convert-hf-to-gguf.py ${path_models} --outfile ${path_models}/ggml-model-f16.gguf
|
python3 ../convert_hf_to_gguf.py ${path_models} --outfile ${path_models}/ggml-model-f16.gguf
|
||||||
|
|
||||||
model_f16="${path_models}/ggml-model-f16.gguf"
|
model_f16="${path_models}/ggml-model-f16.gguf"
|
||||||
model_q8_0="${path_models}/ggml-model-q8_0.gguf"
|
model_q8_0="${path_models}/ggml-model-q8_0.gguf"
|
||||||
|
@ -404,7 +404,7 @@ class Model:
|
|||||||
|
|
||||||
return tokens, toktypes, tokpre
|
return tokens, toktypes, tokpre
|
||||||
|
|
||||||
# NOTE: this function is generated by convert-hf-to-gguf-update.py
|
# NOTE: this function is generated by convert_hf_to_gguf_update.py
|
||||||
# do not modify it manually!
|
# do not modify it manually!
|
||||||
# ref: https://github.com/ggerganov/llama.cpp/pull/6920
|
# ref: https://github.com/ggerganov/llama.cpp/pull/6920
|
||||||
# Marker: Start get_vocab_base_pre
|
# Marker: Start get_vocab_base_pre
|
||||||
@ -424,7 +424,7 @@ class Model:
|
|||||||
|
|
||||||
res = None
|
res = None
|
||||||
|
|
||||||
# NOTE: if you get an error here, you need to update the convert-hf-to-gguf-update.py script
|
# NOTE: if you get an error here, you need to update the convert_hf_to_gguf_update.py script
|
||||||
# or pull the latest version of the model from Huggingface
|
# or pull the latest version of the model from Huggingface
|
||||||
# don't edit the hashes manually!
|
# don't edit the hashes manually!
|
||||||
if chkhsh == "0ef9807a4087ebef797fc749390439009c3b9eda9ad1a097abbe738f486c01e5":
|
if chkhsh == "0ef9807a4087ebef797fc749390439009c3b9eda9ad1a097abbe738f486c01e5":
|
||||||
@ -499,9 +499,9 @@ class Model:
|
|||||||
logger.warning("**************************************************************************************")
|
logger.warning("**************************************************************************************")
|
||||||
logger.warning("** WARNING: The BPE pre-tokenizer was not recognized!")
|
logger.warning("** WARNING: The BPE pre-tokenizer was not recognized!")
|
||||||
logger.warning("** There are 2 possible reasons for this:")
|
logger.warning("** There are 2 possible reasons for this:")
|
||||||
logger.warning("** - the model has not been added to convert-hf-to-gguf-update.py yet")
|
logger.warning("** - the model has not been added to convert_hf_to_gguf_update.py yet")
|
||||||
logger.warning("** - the pre-tokenization config has changed upstream")
|
logger.warning("** - the pre-tokenization config has changed upstream")
|
||||||
logger.warning("** Check your model files and convert-hf-to-gguf-update.py and update them accordingly.")
|
logger.warning("** Check your model files and convert_hf_to_gguf_update.py and update them accordingly.")
|
||||||
logger.warning("** ref: https://github.com/ggerganov/llama.cpp/pull/6920")
|
logger.warning("** ref: https://github.com/ggerganov/llama.cpp/pull/6920")
|
||||||
logger.warning("**")
|
logger.warning("**")
|
||||||
logger.warning(f"** chkhsh: {chkhsh}")
|
logger.warning(f"** chkhsh: {chkhsh}")
|
||||||
|
@ -2,7 +2,7 @@
|
|||||||
# -*- coding: utf-8 -*-
|
# -*- coding: utf-8 -*-
|
||||||
|
|
||||||
# This script downloads the tokenizer models of the specified models from Huggingface and
|
# This script downloads the tokenizer models of the specified models from Huggingface and
|
||||||
# generates the get_vocab_base_pre() function for convert-hf-to-gguf.py
|
# generates the get_vocab_base_pre() function for convert_hf_to_gguf.py
|
||||||
#
|
#
|
||||||
# This is necessary in order to analyze the type of pre-tokenizer used by the model and
|
# This is necessary in order to analyze the type of pre-tokenizer used by the model and
|
||||||
# provide the necessary information to llama.cpp via the GGUF header in order to implement
|
# provide the necessary information to llama.cpp via the GGUF header in order to implement
|
||||||
@ -15,9 +15,9 @@
|
|||||||
# - Add a new model to the "models" list
|
# - Add a new model to the "models" list
|
||||||
# - Run the script with your huggingface token:
|
# - Run the script with your huggingface token:
|
||||||
#
|
#
|
||||||
# python3 convert-hf-to-gguf-update.py <huggingface_token>
|
# python3 convert_hf_to_gguf_update.py <huggingface_token>
|
||||||
#
|
#
|
||||||
# - Copy-paste the generated get_vocab_base_pre() function into convert-hf-to-gguf.py
|
# - Copy-paste the generated get_vocab_base_pre() function into convert_hf_to_gguf.py
|
||||||
# - Update llama.cpp with the new pre-tokenizer if necessary
|
# - Update llama.cpp with the new pre-tokenizer if necessary
|
||||||
#
|
#
|
||||||
# TODO: generate tokenizer tests for llama.cpp
|
# TODO: generate tokenizer tests for llama.cpp
|
||||||
@ -37,7 +37,7 @@ from enum import IntEnum, auto
|
|||||||
from transformers import AutoTokenizer
|
from transformers import AutoTokenizer
|
||||||
|
|
||||||
logging.basicConfig(level=logging.DEBUG)
|
logging.basicConfig(level=logging.DEBUG)
|
||||||
logger = logging.getLogger("convert-hf-to-gguf-update")
|
logger = logging.getLogger("convert_hf_to_gguf_update")
|
||||||
sess = requests.Session()
|
sess = requests.Session()
|
||||||
|
|
||||||
|
|
||||||
@ -56,10 +56,10 @@ if len(sys.argv) == 2:
|
|||||||
token = sys.argv[1]
|
token = sys.argv[1]
|
||||||
if not token.startswith("hf_"):
|
if not token.startswith("hf_"):
|
||||||
logger.info("Huggingface token seems invalid")
|
logger.info("Huggingface token seems invalid")
|
||||||
logger.info("Usage: python convert-hf-to-gguf-update.py <huggingface_token>")
|
logger.info("Usage: python convert_hf_to_gguf_update.py <huggingface_token>")
|
||||||
sys.exit(1)
|
sys.exit(1)
|
||||||
else:
|
else:
|
||||||
logger.info("Usage: python convert-hf-to-gguf-update.py <huggingface_token>")
|
logger.info("Usage: python convert_hf_to_gguf_update.py <huggingface_token>")
|
||||||
sys.exit(1)
|
sys.exit(1)
|
||||||
|
|
||||||
# TODO: add models here, base models preferred
|
# TODO: add models here, base models preferred
|
||||||
@ -134,7 +134,7 @@ for model in models:
|
|||||||
logger.error(f"Failed to download model {model['name']}. Error: {e}")
|
logger.error(f"Failed to download model {model['name']}. Error: {e}")
|
||||||
|
|
||||||
|
|
||||||
# generate the source code for the convert-hf-to-gguf.py:get_vocab_base_pre() function:
|
# generate the source code for the convert_hf_to_gguf.py:get_vocab_base_pre() function:
|
||||||
|
|
||||||
src_ifs = ""
|
src_ifs = ""
|
||||||
for model in models:
|
for model in models:
|
||||||
@ -201,7 +201,7 @@ src_func = f"""
|
|||||||
|
|
||||||
res = None
|
res = None
|
||||||
|
|
||||||
# NOTE: if you get an error here, you need to update the convert-hf-to-gguf-update.py script
|
# NOTE: if you get an error here, you need to update the convert_hf_to_gguf_update.py script
|
||||||
# or pull the latest version of the model from Huggingface
|
# or pull the latest version of the model from Huggingface
|
||||||
# don't edit the hashes manually!
|
# don't edit the hashes manually!
|
||||||
{src_ifs}
|
{src_ifs}
|
||||||
@ -210,9 +210,9 @@ src_func = f"""
|
|||||||
logger.warning("**************************************************************************************")
|
logger.warning("**************************************************************************************")
|
||||||
logger.warning("** WARNING: The BPE pre-tokenizer was not recognized!")
|
logger.warning("** WARNING: The BPE pre-tokenizer was not recognized!")
|
||||||
logger.warning("** There are 2 possible reasons for this:")
|
logger.warning("** There are 2 possible reasons for this:")
|
||||||
logger.warning("** - the model has not been added to convert-hf-to-gguf-update.py yet")
|
logger.warning("** - the model has not been added to convert_hf_to_gguf_update.py yet")
|
||||||
logger.warning("** - the pre-tokenization config has changed upstream")
|
logger.warning("** - the pre-tokenization config has changed upstream")
|
||||||
logger.warning("** Check your model files and convert-hf-to-gguf-update.py and update them accordingly.")
|
logger.warning("** Check your model files and convert_hf_to_gguf_update.py and update them accordingly.")
|
||||||
logger.warning("** ref: https://github.com/ggerganov/llama.cpp/pull/6920")
|
logger.warning("** ref: https://github.com/ggerganov/llama.cpp/pull/6920")
|
||||||
logger.warning("**")
|
logger.warning("**")
|
||||||
logger.warning(f"** chkhsh: {{chkhsh}}")
|
logger.warning(f"** chkhsh: {{chkhsh}}")
|
||||||
@ -226,7 +226,7 @@ src_func = f"""
|
|||||||
return res
|
return res
|
||||||
"""
|
"""
|
||||||
|
|
||||||
convert_py_pth = pathlib.Path("convert-hf-to-gguf.py")
|
convert_py_pth = pathlib.Path("convert_hf_to_gguf.py")
|
||||||
convert_py = convert_py_pth.read_text(encoding="utf-8")
|
convert_py = convert_py_pth.read_text(encoding="utf-8")
|
||||||
convert_py = re.sub(
|
convert_py = re.sub(
|
||||||
r"(# Marker: Start get_vocab_base_pre)(.+?)( +# Marker: End get_vocab_base_pre)",
|
r"(# Marker: Start get_vocab_base_pre)(.+?)( +# Marker: End get_vocab_base_pre)",
|
||||||
@ -237,7 +237,7 @@ convert_py = re.sub(
|
|||||||
|
|
||||||
convert_py_pth.write_text(convert_py, encoding="utf-8")
|
convert_py_pth.write_text(convert_py, encoding="utf-8")
|
||||||
|
|
||||||
logger.info("+++ convert-hf-to-gguf.py was updated")
|
logger.info("+++ convert_hf_to_gguf.py was updated")
|
||||||
|
|
||||||
# generate tests for each tokenizer model
|
# generate tests for each tokenizer model
|
||||||
|
|
||||||
@ -343,6 +343,6 @@ logger.info("\nRun the following commands to generate the vocab files for testin
|
|||||||
for model in models:
|
for model in models:
|
||||||
name = model["name"]
|
name = model["name"]
|
||||||
|
|
||||||
print(f"python3 convert-hf-to-gguf.py models/tokenizers/{name}/ --outfile models/ggml-vocab-{name}.gguf --vocab-only") # noqa: NP100
|
print(f"python3 convert_hf_to_gguf.py models/tokenizers/{name}/ --outfile models/ggml-vocab-{name}.gguf --vocab-only") # noqa: NP100
|
||||||
|
|
||||||
logger.info("\n")
|
logger.info("\n")
|
||||||
|
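The update script rewrites the region of `convert_hf_to_gguf.py` between the `# Marker: Start get_vocab_base_pre` and `# Marker: End get_vocab_base_pre` comments. The hunk shows only the pattern, so the sketch below is an assumption about how such a replacement can work: the `re.DOTALL` flag, the lambda replacement, and the name `replace_between_markers` are illustrative and not copied from the script.

```python
import re

# Same pattern as in the hunk above; DOTALL lets ".+?" span multiple lines.
MARKER_RE = re.compile(
    r"(# Marker: Start get_vocab_base_pre)(.+?)( +# Marker: End get_vocab_base_pre)",
    flags=re.DOTALL,
)


def replace_between_markers(source: str, new_body: str) -> str:
    """Replace everything between the two marker comments with new_body.

    A callable replacement is used so that backslashes in new_body are
    inserted literally instead of being read as regex escapes.
    """
    return MARKER_RE.sub(lambda m: m.group(1) + new_body + m.group(3), source, count=1)


if __name__ == "__main__":
    example = (
        "class Model:\n"
        "    # Marker: Start get_vocab_base_pre\n"
        "    def get_vocab_base_pre(self):\n"
        "        return 'old'\n"
        "    # Marker: End get_vocab_base_pre\n"
    )
    body = "\n    def get_vocab_base_pre(self):\n        return 'new'\n"
    print(replace_between_markers(example, body))
```

Keeping both markers in the output means the script can be run again later and will find the same region.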
@ -17,7 +17,7 @@ Also, it is important to check that the examples and main ggml backends (CUDA, M
|
|||||||
### 1. Convert the model to GGUF
|
### 1. Convert the model to GGUF
|
||||||
|
|
||||||
This step is done in python with a `convert` script using the [gguf](https://pypi.org/project/gguf/) library.
|
This step is done in python with a `convert` script using the [gguf](https://pypi.org/project/gguf/) library.
|
||||||
Depending on the model architecture, you can use either [convert-hf-to-gguf.py](../convert-hf-to-gguf.py) or [examples/convert-legacy-llama.py](../examples/convert-legacy-llama.py) (for `llama/llama2` models in `.pth` format).
|
Depending on the model architecture, you can use either [convert_hf_to_gguf.py](../convert_hf_to_gguf.py) or [examples/convert_legacy_llama.py](../examples/convert_legacy_llama.py) (for `llama/llama2` models in `.pth` format).
|
||||||
|
|
||||||
The convert script reads the model configuration, tokenizer, tensor names+data and converts them to GGUF metadata and tensors.
|
The convert script reads the model configuration, tokenizer, tensor names+data and converts them to GGUF metadata and tensors.
|
||||||
|
|
||||||
|
@ -1,7 +1,7 @@
|
|||||||
# Usage:
|
# Usage:
|
||||||
#! ./llama-server -m some-model.gguf &
|
#! ./llama-server -m some-model.gguf &
|
||||||
#! pip install pydantic
|
#! pip install pydantic
|
||||||
#! python json-schema-pydantic-example.py
|
#! python json_schema_pydantic_example.py
|
||||||
|
|
||||||
from pydantic import BaseModel, Extra, TypeAdapter
|
from pydantic import BaseModel, Extra, TypeAdapter
|
||||||
from annotated_types import MinLen
|
from annotated_types import MinLen
|
@ -30,16 +30,16 @@ git clone https://huggingface.co/mtgv/MobileVLM-1.7B
|
|||||||
git clone https://huggingface.co/openai/clip-vit-large-patch14-336
|
git clone https://huggingface.co/openai/clip-vit-large-patch14-336
|
||||||
```
|
```
|
||||||
|
|
||||||
2. Use `llava-surgery.py` to split the LLaVA model to LLaMA and multimodel projector constituents:
|
2. Use `llava_surgery.py` to split the LLaVA model to LLaMA and multimodel projector constituents:
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
python ./examples/llava/llava-surgery.py -m path/to/MobileVLM-1.7B
|
python ./examples/llava/llava_surgery.py -m path/to/MobileVLM-1.7B
|
||||||
```
|
```
|
||||||
|
|
||||||
3. Use `convert-image-encoder-to-gguf.py` with `--projector-type ldp` (for **V2** please use `--projector-type ldpv2`) to convert the LLaVA image encoder to GGUF:
|
3. Use `convert_image_encoder_to_gguf.py` with `--projector-type ldp` (for **V2** please use `--projector-type ldpv2`) to convert the LLaVA image encoder to GGUF:
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
python ./examples/llava/convert-image-encoder-to-gguf \
|
python ./examples/llava/convert_image_encoder_to_gguf \
|
||||||
-m path/to/clip-vit-large-patch14-336 \
|
-m path/to/clip-vit-large-patch14-336 \
|
||||||
--llava-projector path/to/MobileVLM-1.7B/llava.projector \
|
--llava-projector path/to/MobileVLM-1.7B/llava.projector \
|
||||||
--output-dir path/to/MobileVLM-1.7B \
|
--output-dir path/to/MobileVLM-1.7B \
|
||||||
@ -47,17 +47,17 @@ python ./examples/llava/convert-image-encoder-to-gguf \
|
|||||||
```
|
```
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
python ./examples/llava/convert-image-encoder-to-gguf \
|
python ./examples/llava/convert_image_encoder_to_gguf \
|
||||||
-m path/to/clip-vit-large-patch14-336 \
|
-m path/to/clip-vit-large-patch14-336 \
|
||||||
--llava-projector path/to/MobileVLM-1.7B_V2/llava.projector \
|
--llava-projector path/to/MobileVLM-1.7B_V2/llava.projector \
|
||||||
--output-dir path/to/MobileVLM-1.7B_V2 \
|
--output-dir path/to/MobileVLM-1.7B_V2 \
|
||||||
--projector-type ldpv2
|
--projector-type ldpv2
|
||||||
```
|
```
|
||||||
|
|
||||||
4. Use `examples/convert-legacy-llama.py` to convert the LLaMA part of LLaVA to GGUF:
|
4. Use `examples/convert_legacy_llama.py` to convert the LLaMA part of LLaVA to GGUF:
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
python ./examples/convert-legacy-llama.py path/to/MobileVLM-1.7B
|
python ./examples/convert_legacy_llama.py path/to/MobileVLM-1.7B
|
||||||
```
|
```
|
||||||
|
|
||||||
5. Use `quantize` to convert LLaMA part's DataType from `fp16` to `q4_k`
|
5. Use `quantize` to convert LLaMA part's DataType from `fp16` to `q4_k`
|
||||||
|
@ -38,22 +38,22 @@ git clone https://huggingface.co/openai/clip-vit-large-patch14-336
|
|||||||
pip install -r examples/llava/requirements.txt
|
pip install -r examples/llava/requirements.txt
|
||||||
```
|
```
|
||||||
|
|
||||||
3. Use `llava-surgery.py` to split the LLaVA model to LLaMA and multimodel projector constituents:
|
3. Use `llava_surgery.py` to split the LLaVA model to LLaMA and multimodel projector constituents:
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
python ./examples/llava/llava-surgery.py -m ../llava-v1.5-7b
|
python ./examples/llava/llava_surgery.py -m ../llava-v1.5-7b
|
||||||
```
|
```
|
||||||
|
|
||||||
4. Use `convert-image-encoder-to-gguf.py` to convert the LLaVA image encoder to GGUF:
|
4. Use `convert_image_encoder_to_gguf.py` to convert the LLaVA image encoder to GGUF:
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
python ./examples/llava/convert-image-encoder-to-gguf.py -m ../clip-vit-large-patch14-336 --llava-projector ../llava-v1.5-7b/llava.projector --output-dir ../llava-v1.5-7b
|
python ./examples/llava/convert_image_encoder_to_gguf.py -m ../clip-vit-large-patch14-336 --llava-projector ../llava-v1.5-7b/llava.projector --output-dir ../llava-v1.5-7b
|
||||||
```
|
```
|
||||||
|
|
||||||
5. Use `examples/convert-legacy-llama.py` to convert the LLaMA part of LLaVA to GGUF:
|
5. Use `examples/convert_legacy_llama.py` to convert the LLaMA part of LLaVA to GGUF:
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
python ./examples/convert-legacy-llama.py ../llava-v1.5-7b --skip-unknown
|
python ./examples/convert_legacy_llama.py ../llava-v1.5-7b --skip-unknown
|
||||||
```
|
```
|
||||||
|
|
||||||
Now both the LLaMA part and the image encoder are in the `llava-v1.5-7b` directory.
|
Now both the LLaMA part and the image encoder are in the `llava-v1.5-7b` directory.
|
||||||
@ -70,9 +70,9 @@ git clone https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b
|
|||||||
pip install -r examples/llava/requirements.txt
|
pip install -r examples/llava/requirements.txt
|
||||||
```
|
```
|
||||||
|
|
||||||
3) Use `llava-surgery-v2.py` which also supports llava-1.5 variants pytorch as well as safetensor models:
|
3) Use `llava_surgery_v2.py` which also supports llava-1.5 variants pytorch as well as safetensor models:
|
||||||
```console
|
```console
|
||||||
python examples/llava/llava-surgery-v2.py -C -m ../llava-v1.6-vicuna-7b/
|
python examples/llava/llava_surgery_v2.py -C -m ../llava-v1.6-vicuna-7b/
|
||||||
```
|
```
|
||||||
- you will find a llava.projector and a llava.clip file in your model directory
|
- you will find a llava.projector and a llava.clip file in your model directory
|
||||||
|
|
||||||
@ -86,13 +86,13 @@ curl -s -q https://huggingface.co/cmp-nct/llava-1.6-gguf/raw/main/config_vit.jso
|
|||||||
|
|
||||||
5) Create the visual gguf model:
|
5) Create the visual gguf model:
|
||||||
```console
|
```console
|
||||||
python ./examples/llava/convert-image-encoder-to-gguf.py -m vit --llava-projector vit/llava.projector --output-dir vit --clip-model-is-vision
|
python ./examples/llava/convert_image_encoder_to_gguf.py -m vit --llava-projector vit/llava.projector --output-dir vit --clip-model-is-vision
|
||||||
```
|
```
|
||||||
- This is similar to llava-1.5, the difference is that we tell the encoder that we are working with the pure vision model part of CLIP
|
- This is similar to llava-1.5, the difference is that we tell the encoder that we are working with the pure vision model part of CLIP
|
||||||
|
|
||||||
6) Then convert the model to gguf format:
|
6) Then convert the model to gguf format:
|
||||||
```console
|
```console
|
||||||
python ./examples/convert-legacy-llama.py ../llava-v1.6-vicuna-7b/ --skip-unknown
|
python ./examples/convert_legacy_llama.py ../llava-v1.6-vicuna-7b/ --skip-unknown
|
||||||
```
|
```
|
||||||
|
|
||||||
7) And finally we can run the llava cli using the 1.6 model version:
|
7) And finally we can run the llava cli using the 1.6 model version:
|
||||||
|
@ -1,3 +1,3 @@
|
|||||||
-r ../../requirements/requirements-convert-legacy-llama.txt
|
-r ../../requirements/requirements-convert_legacy_llama.txt
|
||||||
pillow~=10.2.0
|
pillow~=10.2.0
|
||||||
torch~=2.2.1
|
torch~=2.2.1
|
||||||
|
@ -3,7 +3,7 @@
|
|||||||
This is a Python package for writing binary files in the [GGUF](https://github.com/ggerganov/ggml/pull/302)
|
This is a Python package for writing binary files in the [GGUF](https://github.com/ggerganov/ggml/pull/302)
|
||||||
(GGML Universal File) format.
|
(GGML Universal File) format.
|
||||||
|
|
||||||
See [convert-llama-hf-to-gguf.py](https://github.com/ggerganov/llama.cpp/blob/master/convert-hf-to-gguf.py)
|
See [convert_hf_to_gguf.py](https://github.com/ggerganov/llama.cpp/blob/master/convert_hf_to_gguf.py)
|
||||||
as an example for its usage.
|
as an example for its usage.
|
||||||
|
|
||||||
## Installation
|
## Installation
|
||||||
@ -15,13 +15,13 @@ pip install gguf
|
|||||||
|
|
||||||
[examples/writer.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/examples/writer.py) — Generates `example.gguf` in the current directory to demonstrate generating a GGUF file. Note that this file cannot be used as a model.
|
[examples/writer.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/examples/writer.py) — Generates `example.gguf` in the current directory to demonstrate generating a GGUF file. Note that this file cannot be used as a model.
|
||||||
|
|
||||||
[scripts/gguf-dump.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/scripts/gguf-dump.py) — Dumps a GGUF file's metadata to the console.
|
[scripts/gguf_dump.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/scripts/gguf_dump.py) — Dumps a GGUF file's metadata to the console.
|
||||||
|
|
||||||
[scripts/gguf-set-metadata.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/scripts/gguf-set-metadata.py) — Allows changing simple metadata values in a GGUF file by key.
|
[scripts/gguf_set_metadata.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/scripts/gguf_set_metadata.py) — Allows changing simple metadata values in a GGUF file by key.
|
||||||
|
|
||||||
[scripts/gguf-convert-endian.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/scripts/gguf-convert-endian.py) — Allows converting the endianness of GGUF files.
|
[scripts/gguf_convert_endian.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/scripts/gguf_convert_endian.py) — Allows converting the endianness of GGUF files.
|
||||||
|
|
||||||
[scripts/gguf-new-metadata.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/scripts/gguf-new-metadata.py) — Copies a GGUF file with added/modified/removed metadata values.
|
[scripts/gguf_new_metadata.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/scripts/gguf_new_metadata.py) — Copies a GGUF file with added/modified/removed metadata values.
|
||||||
|
|
||||||
## Development
|
## Development
|
||||||
Maintainers who participate in development of this package are advised to install it in editable mode:
|
Maintainers who participate in development of this package are advised to install it in editable mode:
|
||||||
|
@ -1,13 +1,4 @@
|
|||||||
import os
|
from .gguf_convert_endian import main as gguf_convert_endian_entrypoint
|
||||||
|
from .gguf_dump import main as gguf_dump_entrypoint
|
||||||
from importlib import import_module
|
from .gguf_set_metadata import main as gguf_set_metadata_entrypoint
|
||||||
|
from .gguf_new_metadata import main as gguf_new_metadata_entrypoint
|
||||||
|
|
||||||
os.environ["NO_LOCAL_GGUF"] = "TRUE"
|
|
||||||
|
|
||||||
gguf_convert_endian_entrypoint = import_module("scripts.gguf-convert-endian").main
|
|
||||||
gguf_dump_entrypoint = import_module("scripts.gguf-dump").main
|
|
||||||
gguf_set_metadata_entrypoint = import_module("scripts.gguf-set-metadata").main
|
|
||||||
gguf_new_metadata_entrypoint = import_module("scripts.gguf-new-metadata").main
|
|
||||||
|
|
||||||
del import_module, os
|
|
||||||
|
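The old entrypoint module had to go through `import_module` with string names because a hyphenated file name such as `gguf-dump` is not a valid Python identifier and cannot appear in an `import` statement. The snake_case names can be imported directly, which is what the new relative imports rely on. A small check of that property, using only the standard library (the helper name `importable_module_name` is hypothetical):

```python
import keyword


def importable_module_name(stem: str) -> bool:
    """Return True if stem can be used as a module name in an import statement."""
    return stem.isidentifier() and not keyword.iskeyword(stem)


if __name__ == "__main__":
    old_names = ["gguf-convert-endian", "gguf-dump", "gguf-set-metadata", "gguf-new-metadata"]
    for old in old_names:
        new = old.replace("-", "_")
        print(f"{old}: {importable_module_name(old)}  {new}: {importable_module_name(new)}")
```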
@ -4,7 +4,7 @@
|
|||||||
# Package versions must stay compatible across all top-level python scripts.
|
# Package versions must stay compatible across all top-level python scripts.
|
||||||
#
|
#
|
||||||
|
|
||||||
-r ./requirements/requirements-convert-legacy-llama.txt
|
-r ./requirements/requirements-convert_legacy_llama.txt
|
||||||
|
|
||||||
-r ./requirements/requirements-convert_hf_to_gguf.txt
|
-r ./requirements/requirements-convert_hf_to_gguf.txt
|
||||||
-r ./requirements/requirements-convert_hf_to_gguf_update.txt
|
-r ./requirements/requirements-convert_hf_to_gguf_update.txt
|
||||||
|
@ -1,2 +1,2 @@
|
|||||||
-r ./requirements-convert-legacy-llama.txt
|
-r ./requirements-convert_legacy_llama.txt
|
||||||
torch~=2.2.1
|
torch~=2.2.1
|
||||||
|
@ -1,2 +1,2 @@
|
|||||||
-r ./requirements-convert-legacy-llama.txt
|
-r ./requirements-convert_legacy_llama.txt
|
||||||
torch~=2.2.1
|
torch~=2.2.1
|
||||||
|
@ -1 +1 @@
|
|||||||
-r ./requirements-convert-legacy-llama.txt
|
-r ./requirements-convert_legacy_llama.txt
|
||||||
|
@ -97,9 +97,9 @@ check_requirements() {
|
|||||||
}
|
}
|
||||||
|
|
||||||
check_convert_script() {
|
check_convert_script() {
|
||||||
local py=$1 # e.g. ./convert-hf-to-gguf.py
|
local py=$1 # e.g. ./convert_hf_to_gguf.py
|
||||||
local pyname=${py##*/} # e.g. convert-hf-to-gguf.py
|
local pyname=${py##*/} # e.g. convert_hf_to_gguf.py
|
||||||
pyname=${pyname%.py} # e.g. convert-hf-to-gguf
|
pyname=${pyname%.py} # e.g. convert_hf_to_gguf
|
||||||
|
|
||||||
info "$py: beginning check"
|
info "$py: beginning check"
|
||||||
|
|
||||||
@ -166,9 +166,9 @@ if (( do_cleanup )); then
|
|||||||
rm -rf -- "$all_venv"
|
rm -rf -- "$all_venv"
|
||||||
fi
|
fi
|
||||||
|
|
||||||
check_convert_script examples/convert-legacy-llama.py
|
check_convert_script examples/convert_legacy_llama.py
|
||||||
for py in convert_*.py; do
|
for py in convert_*.py; do
|
||||||
# skip convert-hf-to-gguf-update.py
|
# skip convert_hf_to_gguf_update.py
|
||||||
# TODO: the check is failing for some reason:
|
# TODO: the check is failing for some reason:
|
||||||
# https://github.com/ggerganov/llama.cpp/actions/runs/8875330981/job/24364557177?pr=6920
|
# https://github.com/ggerganov/llama.cpp/actions/runs/8875330981/job/24364557177?pr=6920
|
||||||
[[ $py == convert_hf_to_gguf_update.py ]] && continue
|
[[ $py == convert_hf_to_gguf_update.py ]] && continue
|
||||||
|
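`check_convert_script` derives `pyname` by stripping the directory (`${py##*/}`) and the `.py` suffix (`${pyname%.py}`). The commit message says the requirements rename is needed by this script, which suggests that `pyname` is used to locate a matching requirements file. The sketch below mirrors that derivation in Python; the `requirements/requirements-<pyname>.txt` layout is inferred from the renamed files in this commit and should be treated as an assumption, and `requirements_for` is a hypothetical name.

```python
from pathlib import PurePosixPath


def requirements_for(script: str, reqs_dir: str = "requirements") -> str:
    """Map a convert script path to its assumed per-script requirements file."""
    # .stem drops the directory and the final ".py" suffix,
    # like ${py##*/} followed by ${pyname%.py} in the shell script.
    pyname = PurePosixPath(script).stem
    return f"{reqs_dir}/requirements-{pyname}.txt"


if __name__ == "__main__":
    print(requirements_for("./convert_hf_to_gguf.py"))
    print(requirements_for("examples/convert_legacy_llama.py"))
```

Under this assumption, renaming a script without renaming its requirements file would make the lookup fail, which matches the reason given in the commit message.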
@ -1,26 +0,0 @@
|
|||||||
#!/bin/bash
|
|
||||||
|
|
||||||
set -e
|
|
||||||
|
|
||||||
# LLaMA v1
|
|
||||||
python3 examples/convert-legacy-llama.py ../llama1/7B --outfile models/llama-7b/ggml-model-f16.gguf --outtype f16
|
|
||||||
python3 examples/convert-legacy-llama.py ../llama1/13B --outfile models/llama-13b/ggml-model-f16.gguf --outtype f16
|
|
||||||
python3 examples/convert-legacy-llama.py ../llama1/30B --outfile models/llama-30b/ggml-model-f16.gguf --outtype f16
|
|
||||||
python3 examples/convert-legacy-llama.py ../llama1/65B --outfile models/llama-65b/ggml-model-f16.gguf --outtype f16
|
|
||||||
|
|
||||||
# LLaMA v2
|
|
||||||
python3 examples/convert-legacy-llama.py ../llama2/llama-2-7b --outfile models/llama-7b-v2/ggml-model-f16.gguf --outtype f16
|
|
||||||
python3 examples/convert-legacy-llama.py ../llama2/llama-2-13b --outfile models/llama-13b-v2/ggml-model-f16.gguf --outtype f16
|
|
||||||
python3 examples/convert-legacy-llama.py ../llama2/llama-2-70b --outfile models/llama-70b-v2/ggml-model-f16.gguf --outtype f16
|
|
||||||
|
|
||||||
# Code Llama
|
|
||||||
python3 examples/convert-legacy-llama.py ../codellama/CodeLlama-7b/ --outfile models/codellama-7b/ggml-model-f16.gguf --outtype f16
|
|
||||||
python3 examples/convert-legacy-llama.py ../codellama/CodeLlama-13b/ --outfile models/codellama-13b/ggml-model-f16.gguf --outtype f16
|
|
||||||
python3 examples/convert-legacy-llama.py ../codellama/CodeLlama-34b/ --outfile models/codellama-34b/ggml-model-f16.gguf --outtype f16
|
|
||||||
|
|
||||||
# Falcon
|
|
||||||
python3 convert-falcon-hf-to-gguf.py ../falcon/falcon-7b 1
|
|
||||||
mv -v ../falcon/falcon-7b/ggml-model-f16.gguf models/falcon-7b/ggml-model-f16.gguf
|
|
||||||
|
|
||||||
python3 convert-falcon-hf-to-gguf.py ../falcon/falcon-40b 1
|
|
||||||
mv -v ../falcon/falcon-40b/ggml-model-f16.gguf models/falcon-40b/ggml-model-f16.gguf
|
|
@ -75,7 +75,7 @@ if [ "$1" -eq "1" ]; then
|
|||||||
|
|
||||||
cd /workspace/llama.cpp
|
cd /workspace/llama.cpp
|
||||||
|
|
||||||
python3 examples/convert-legacy-llama.py ./models/tinyllama-1b --outfile ./models/tinyllama-1b/ggml-model-f16.gguf --outtype f16
|
python3 examples/convert_legacy_llama.py ./models/tinyllama-1b --outfile ./models/tinyllama-1b/ggml-model-f16.gguf --outtype f16
|
||||||
|
|
||||||
./llama-quantize ./models/tinyllama-1b/ggml-model-f16.gguf ./models/tinyllama-1b/ggml-model-q4_0.gguf q4_0
|
./llama-quantize ./models/tinyllama-1b/ggml-model-f16.gguf ./models/tinyllama-1b/ggml-model-q4_0.gguf q4_0
|
||||||
./llama-quantize ./models/tinyllama-1b/ggml-model-f16.gguf ./models/tinyllama-1b/ggml-model-q4_k.gguf q4_k
|
./llama-quantize ./models/tinyllama-1b/ggml-model-f16.gguf ./models/tinyllama-1b/ggml-model-q4_k.gguf q4_k
|
||||||
@ -90,7 +90,7 @@ if [ "$1" -eq "2" ]; then
|
|||||||
|
|
||||||
cd /workspace/llama.cpp
|
cd /workspace/llama.cpp
|
||||||
|
|
||||||
python3 examples/convert-legacy-llama.py ./models/codellama-7b --outfile ./models/codellama-7b/ggml-model-f16.gguf --outtype f16
|
python3 examples/convert_legacy_llama.py ./models/codellama-7b --outfile ./models/codellama-7b/ggml-model-f16.gguf --outtype f16
|
||||||
|
|
||||||
./llama-quantize ./models/codellama-7b/ggml-model-f16.gguf ./models/codellama-7b/ggml-model-q4_0.gguf q4_0
|
./llama-quantize ./models/codellama-7b/ggml-model-f16.gguf ./models/codellama-7b/ggml-model-q4_0.gguf q4_0
|
||||||
./llama-quantize ./models/codellama-7b/ggml-model-f16.gguf ./models/codellama-7b/ggml-model-q4_k.gguf q4_k
|
./llama-quantize ./models/codellama-7b/ggml-model-f16.gguf ./models/codellama-7b/ggml-model-q4_k.gguf q4_k
|
||||||
@ -105,7 +105,7 @@ if [ "$1" -eq "3" ]; then
|
|||||||
|
|
||||||
cd /workspace/llama.cpp
|
cd /workspace/llama.cpp
|
||||||
|
|
||||||
python3 examples/convert-legacy-llama.py ./models/codellama-13b --outfile ./models/codellama-13b/ggml-model-f16.gguf --outtype f16
|
python3 examples/convert_legacy_llama.py ./models/codellama-13b --outfile ./models/codellama-13b/ggml-model-f16.gguf --outtype f16
|
||||||
|
|
||||||
./llama-quantize ./models/codellama-13b/ggml-model-f16.gguf ./models/codellama-13b/ggml-model-q4_0.gguf q4_0
|
./llama-quantize ./models/codellama-13b/ggml-model-f16.gguf ./models/codellama-13b/ggml-model-q4_0.gguf q4_0
|
||||||
./llama-quantize ./models/codellama-13b/ggml-model-f16.gguf ./models/codellama-13b/ggml-model-q4_k.gguf q4_k
|
./llama-quantize ./models/codellama-13b/ggml-model-f16.gguf ./models/codellama-13b/ggml-model-q4_k.gguf q4_k
|
||||||
@ -120,7 +120,7 @@ if [ "$1" -eq "4" ]; then
|
|||||||
|
|
||||||
cd /workspace/llama.cpp
|
cd /workspace/llama.cpp
|
||||||
|
|
||||||
python3 examples/convert-legacy-llama.py ./models/codellama-34b --outfile ./models/codellama-34b/ggml-model-f16.gguf --outtype f16
|
python3 examples/convert_legacy_llama.py ./models/codellama-34b --outfile ./models/codellama-34b/ggml-model-f16.gguf --outtype f16
|
||||||
|
|
||||||
./llama-quantize ./models/codellama-34b/ggml-model-f16.gguf ./models/codellama-34b/ggml-model-q4_0.gguf q4_0
|
./llama-quantize ./models/codellama-34b/ggml-model-f16.gguf ./models/codellama-34b/ggml-model-q4_0.gguf q4_0
|
||||||
./llama-quantize ./models/codellama-34b/ggml-model-f16.gguf ./models/codellama-34b/ggml-model-q4_k.gguf q4_k
|
./llama-quantize ./models/codellama-34b/ggml-model-f16.gguf ./models/codellama-34b/ggml-model-q4_k.gguf q4_k
|
||||||
@ -135,7 +135,7 @@ if [ "$1" -eq "5" ]; then
|
|||||||
|
|
||||||
cd /workspace/llama.cpp
|
cd /workspace/llama.cpp
|
||||||
|
|
||||||
python3 examples/convert-legacy-llama.py ./models/codellama-7b-instruct --outfile ./models/codellama-7b-instruct/ggml-model-f16.gguf --outtype f16
|
python3 examples/convert_legacy_llama.py ./models/codellama-7b-instruct --outfile ./models/codellama-7b-instruct/ggml-model-f16.gguf --outtype f16
|
||||||
|
|
||||||
./llama-quantize ./models/codellama-7b-instruct/ggml-model-f16.gguf ./models/codellama-7b-instruct/ggml-model-q4_0.gguf q4_0
|
./llama-quantize ./models/codellama-7b-instruct/ggml-model-f16.gguf ./models/codellama-7b-instruct/ggml-model-q4_0.gguf q4_0
|
||||||
./llama-quantize ./models/codellama-7b-instruct/ggml-model-f16.gguf ./models/codellama-7b-instruct/ggml-model-q4_k.gguf q4_k
|
./llama-quantize ./models/codellama-7b-instruct/ggml-model-f16.gguf ./models/codellama-7b-instruct/ggml-model-q4_k.gguf q4_k
|
||||||
@ -150,7 +150,7 @@ if [ "$1" -eq "6" ]; then
|
|||||||
|
|
||||||
cd /workspace/llama.cpp
|
cd /workspace/llama.cpp
|
||||||
|
|
||||||
python3 examples/convert-legacy-llama.py ./models/codellama-13b-instruct --outfile ./models/codellama-13b-instruct/ggml-model-f16.gguf --outtype f16
|
python3 examples/convert_legacy_llama.py ./models/codellama-13b-instruct --outfile ./models/codellama-13b-instruct/ggml-model-f16.gguf --outtype f16
|
||||||
|
|
||||||
./llama-quantize ./models/codellama-13b-instruct/ggml-model-f16.gguf ./models/codellama-13b-instruct/ggml-model-q4_0.gguf q4_0
|
./llama-quantize ./models/codellama-13b-instruct/ggml-model-f16.gguf ./models/codellama-13b-instruct/ggml-model-q4_0.gguf q4_0
|
||||||
./llama-quantize ./models/codellama-13b-instruct/ggml-model-f16.gguf ./models/codellama-13b-instruct/ggml-model-q4_k.gguf q4_k
|
./llama-quantize ./models/codellama-13b-instruct/ggml-model-f16.gguf ./models/codellama-13b-instruct/ggml-model-q4_k.gguf q4_k
|
||||||
@ -165,7 +165,7 @@ if [ "$1" -eq "7" ]; then
|
|||||||
|
|
||||||
cd /workspace/llama.cpp
|
cd /workspace/llama.cpp
|
||||||
|
|
||||||
python3 examples/convert-legacy-llama.py ./models/codellama-34b-instruct --outfile ./models/codellama-34b-instruct/ggml-model-f16.gguf --outtype f16
|
python3 examples/convert_legacy_llama.py ./models/codellama-34b-instruct --outfile ./models/codellama-34b-instruct/ggml-model-f16.gguf --outtype f16
|
||||||
|
|
||||||
./llama-quantize ./models/codellama-34b-instruct/ggml-model-f16.gguf ./models/codellama-34b-instruct/ggml-model-q4_0.gguf q4_0
|
./llama-quantize ./models/codellama-34b-instruct/ggml-model-f16.gguf ./models/codellama-34b-instruct/ggml-model-q4_0.gguf q4_0
|
||||||
./llama-quantize ./models/codellama-34b-instruct/ggml-model-f16.gguf ./models/codellama-34b-instruct/ggml-model-q4_k.gguf q4_k
|
./llama-quantize ./models/codellama-34b-instruct/ggml-model-f16.gguf ./models/codellama-34b-instruct/ggml-model-q4_k.gguf q4_k
|
||||||
|