llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-27 20:04:35 +00:00


gguf

This is a Python package for writing binary files in the GGUF (GGML Universal File) format.

See convert-llama-hf-to-gguf.py as an example for its usage.
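To give a rough idea of what "binary files in the GGUF format" means, here is a minimal sketch that packs and unpacks only the fixed-size file header by hand, using the standard library. It does not use this package. It assumes the little-endian header layout of GGUF version 2 (four magic bytes, a 32-bit version, a 64-bit tensor count, and a 64-bit metadata key/value count), and the function names are hypothetical. For real files, use the writer provided by this package, which also handles metadata and tensor data.

```python
import struct

# Assumption: a GGUF file starts with the four ASCII bytes "GGUF".
GGUF_MAGIC = b"GGUF"

# Assumption: version 2 header fields, little-endian, no padding:
# uint32 version, uint64 tensor_count, uint64 metadata_kv_count.
HEADER_FORMAT = "<IQQ"


def pack_gguf_header(version: int, tensor_count: int, kv_count: int) -> bytes:
    """Return the fixed-size header bytes (hypothetical helper)."""
    return GGUF_MAGIC + struct.pack(HEADER_FORMAT, version, tensor_count, kv_count)


def unpack_gguf_header(data: bytes) -> tuple:
    """Return (version, tensor_count, kv_count) from header bytes."""
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF header: bad magic bytes")
    return struct.unpack_from(HEADER_FORMAT, data, len(GGUF_MAGIC))


if __name__ == "__main__":
    header = pack_gguf_header(version=2, tensor_count=0, kv_count=0)
    print(len(header))  # 24
    print(unpack_gguf_header(header))  # (2, 0, 0)
```

A file containing only this header is not a usable model; the sketch is meant to show the kind of low-level packing the package takes care of.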

Installation

pip install gguf

Development

Maintainers who participate in development of this package are advised to install it in editable mode:

cd /path/to/llama.cpp/gguf-py

pip install --editable .

Note: You may need to upgrade Pip first. Older versions fail with a message saying that editable installation currently requires setup.py. In that case, upgrade Pip to the latest version:

pip install --upgrade pip

Automatic publishing with CI

There's a GitHub workflow to make a release automatically upon creation of tags in a specified format.

Bump the version in pyproject.toml.
Create a tag named gguf-vx.x.x, where x.x.x is the semantic version number. It should match the version set in pyproject.toml.

git tag -a gguf-v1.0.0 -m "Version 1.0 release"

Push the tags.

git push origin --tags
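Because the tag name and the version in pyproject.toml are edited separately, it is easy for them to drift apart. The following is a minimal sketch of a local pre-flight check, not part of the GitHub workflow. It assumes the tag format gguf-vX.Y.Z with purely numeric components and that pyproject.toml contains a line of the form version = "X.Y.Z". All function names are hypothetical.

```python
import re
from typing import Optional

# Assumption: tags look like "gguf-v1.0.0" with numeric components only.
TAG_RE = re.compile(r"^gguf-v(\d+\.\d+\.\d+)$")

# Assumption: pyproject.toml has a line such as: version = "1.0.0"
VERSION_LINE_RE = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)


def version_from_tag(tag: str) -> Optional[str]:
    """Return "X.Y.Z" if the tag has the expected format, else None."""
    match = TAG_RE.match(tag)
    return match.group(1) if match else None


def version_from_pyproject(text: str) -> Optional[str]:
    """Return the first version string found in pyproject.toml text."""
    match = VERSION_LINE_RE.search(text)
    return match.group(1) if match else None


def tag_matches_pyproject(tag: str, pyproject_text: str) -> bool:
    """True only if the tag is well formed and both versions agree."""
    tag_version = version_from_tag(tag)
    return tag_version is not None and tag_version == version_from_pyproject(pyproject_text)
```

A typical use would be to read pyproject.toml as text and call tag_matches_pyproject with the tag you are about to create, before running git tag.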

Manual publishing

If you want to publish the package manually for any reason, you need to have twine and build installed:

pip install build twine

Then, follow these steps to release a new version:

Bump the version in pyproject.toml.
Build the package:

python -m build

Upload the generated distribution archives:

python -m twine upload dist/*
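Since the upload command takes everything matching dist/*, files left over from an earlier build are uploaded too. Here is a minimal sketch of a check to run between the build and upload steps. It assumes the default output of python -m build (one .tar.gz source distribution and one .whl wheel, both named after the package and version). The function name and the example version are hypothetical.

```python
from typing import List


def check_dist(filenames: List[str], name: str, version: str) -> List[str]:
    """Return a list of problems found in the dist/ listing (empty if none)."""
    prefix = f"{name}-{version}"
    sdist = f"{prefix}.tar.gz"

    def is_current_wheel(filename: str) -> bool:
        # Wheel names continue after the version, e.g. "-py3-none-any.whl".
        return filename.startswith(prefix + "-") and filename.endswith(".whl")

    problems = []
    if sdist not in filenames:
        problems.append(f"missing sdist {sdist}")
    if not any(is_current_wheel(f) for f in filenames):
        problems.append(f"missing wheel for {prefix}")
    for f in filenames:
        # Anything that is neither the current sdist nor a current wheel
        # would still be picked up by "dist/*".
        if f != sdist and not is_current_wheel(f):
            problems.append(f"unexpected file {f}")
    return problems
```

For example, check_dist(os.listdir("dist"), "gguf", "0.3.2") returns an empty list when dist/ holds exactly the two expected archives for that version.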

TODO

Add tests.
Include conversion scripts as command line entry points in this package.
Add CI workflow for releasing the package.