llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-09 18:21:45 +00:00


gguf

This is a Python package for writing binary files in the GGUF (GGML Universal File) format.

See convert_hf_to_gguf.py for an example of its usage.

Installation

pip install gguf

API Examples/Simple Tools

examples/writer.py — Generates example.gguf in the current directory to demonstrate generating a GGUF file. Note that this file cannot be used as a model.

scripts/gguf_dump.py — Dumps a GGUF file's metadata to the console.

scripts/gguf_set_metadata.py — Allows changing simple metadata values in a GGUF file by key.

scripts/gguf_convert_endian.py — Allows converting the endianness of GGUF files.

scripts/gguf_new_metadata.py — Copies a GGUF file with added/modified/removed metadata values.
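To give a rough idea of what the dump and endianness tools operate on, here is a minimal standard-library sketch of the fixed-size header at the start of a GGUF file. It assumes the header layout of recent GGUF versions (four magic bytes, a uint32 version, a uint64 tensor count, a uint64 metadata key/value count). The helpers `pack_header`, `unpack_header` and `swap_header_endianness` are hypothetical names for illustration and are not part of the gguf package. The real scripts also handle the metadata and tensor data that follow the header.

```python
import struct

# Assumption: a GGUF file starts with these four magic bytes.
GGUF_MAGIC = b"GGUF"


def pack_header(version, tensor_count, kv_count, byte_order="<"):
    """Pack magic + uint32 version + uint64 tensor count + uint64 KV count.

    byte_order is "<" for little-endian or ">" for big-endian.
    """
    return GGUF_MAGIC + struct.pack(f"{byte_order}IQQ", version, tensor_count, kv_count)


def unpack_header(data, byte_order="<"):
    """Read the header fields back into a dict, checking the magic bytes first."""
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF header")
    version, tensor_count, kv_count = struct.unpack_from(f"{byte_order}IQQ", data, 4)
    return {"version": version, "tensor_count": tensor_count, "kv_count": kv_count}


def swap_header_endianness(data, src="<", dst=">"):
    """Re-encode the header fields in the other byte order (header only)."""
    fields = unpack_header(data, src)
    return pack_header(fields["version"], fields["tensor_count"], fields["kv_count"], dst)


little = pack_header(version=3, tensor_count=2, kv_count=5)
big = swap_header_endianness(little)

print(unpack_header(little))                  # fields read as little-endian
print(little[4:8].hex(), big[4:8].hex())      # version bytes in each byte order
```

The field values are the same in both encodings; only the byte order of each integer changes, which is why a file must be read with the byte order it was written in.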

Development

Maintainers who participate in development of this package are advised to install it in editable mode:

cd /path/to/llama.cpp/gguf-py

pip install --editable .

Note: This may require upgrading your pip installation. An outdated pip fails with a message saying that editable installation currently requires setup.py. In that case, upgrade pip to the latest version:

pip install --upgrade pip

Automatic publishing with CI

There's a GitHub workflow to make a release automatically upon creation of tags in a specified format.

Bump the version in pyproject.toml.
Create a tag named gguf-vx.x.x where x.x.x is the semantic version number.

git tag -a gguf-v1.0.0 -m "Version 1.0 release"

Push the tags.

git push origin --tags
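Before pushing, it can help to check that the tag name follows the gguf-vx.x.x pattern and agrees with the version in pyproject.toml. The following is a minimal sketch, assuming the version consists of three dot-separated integers. `version_from_tag` and `tag_matches_version` are hypothetical helpers, not part of the gguf package or of the release workflow.

```python
import re

# Assumption: tags look like gguf-v<major>.<minor>.<patch>, digits only.
TAG_PATTERN = re.compile(r"gguf-v(\d+)\.(\d+)\.(\d+)")


def version_from_tag(tag):
    """Return the 'x.x.x' part of a release tag, or None if the tag is malformed."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        return None
    return ".".join(match.groups())


def tag_matches_version(tag, pyproject_version):
    """True when the tag is well-formed and carries the given version string."""
    return version_from_tag(tag) == pyproject_version


print(version_from_tag("gguf-v1.0.0"))
print(tag_matches_version("gguf-v1.0.0", "1.0.0"))
print(version_from_tag("v1.0.0"))
```

A tag that lacks the gguf-v prefix, such as v1.0.0, yields None here, which makes a mistyped tag easy to catch before it is pushed.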

Manual publishing

If you want to publish the package manually for any reason, you need to have twine and build installed:

pip install build twine

Then, follow these steps to release a new version:

Bump the version in pyproject.toml.
Build the package:

python -m build

Upload the generated distribution archives:

python -m twine upload dist/*

Run Unit Tests

From the root of this repository, run the following command to run all the unit tests:

python -m unittest discover ./gguf-py -v
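For reference, the same discovery can be driven from Python. The sketch below is a programmatic counterpart of the command above, using only the standard unittest module. So that it runs without a llama.cpp checkout, it discovers a hypothetical stand-in test file written to a temporary directory; `run_discovered_tests` and `test_sample.py` are illustrative names, not files from this repository.

```python
import tempfile
import textwrap
import unittest
from pathlib import Path


def run_discovered_tests(start_dir):
    """Discover test*.py files under start_dir and run them verbosely."""
    suite = unittest.defaultTestLoader.discover(str(start_dir))  # default pattern: test*.py
    return unittest.TextTestRunner(verbosity=2).run(suite)       # verbosity=2 corresponds to -v


# Hypothetical stand-in test module, used only to make the sketch self-contained.
SAMPLE_TEST = textwrap.dedent('''
    import unittest

    class TestSample(unittest.TestCase):
        def test_addition(self):
            self.assertEqual(1 + 1, 2)
''')

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "test_sample.py").write_text(SAMPLE_TEST)
    result = run_discovered_tests(tmp)

print(result.testsRun, result.wasSuccessful())
```

Discovery only picks up files whose names match the pattern (test*.py by default), so new test modules need to follow that naming to be included.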

TODO

Include conversion scripts as command line entry points in this package.