llama.cpp/gguf-py
Gabe Goodhart 54ce8cd5d6 fix(granitemoe convert): Split the double-sized input layer into gate and up
After a lot of staring and squinting, it's clear that the standard mixtral
expert implementation is equivalent to the vectorized parallel experts in
granite. The difference is that in granite, the w1 and w3 are concatenated
into a single tensor "input_linear." Rather than reimplementing all of the
math on the llama.cpp side, the much simpler route is to just split this
tensor during conversion and follow the standard mixtral route.

Branch: GraniteMoE

Co-Authored-By: alex.brooks@ibm.com

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2024-09-17 06:45:55 -06:00
..
examples gguf-py : fix double call to add_architecture() (#8952) 2024-08-10 08:58:49 +03:00
gguf fix(granitemoe convert): Split the double-sized input layer into gate and up 2024-09-17 06:45:55 -06:00
scripts gguf_dump.py: fix markddown kv array print (#8588) 2024-07-20 17:35:25 +10:00
tests ggml-quants : ternary packing for TriLMs and BitNet b1.58 (#8151) 2024-09-05 21:48:47 -04:00
LICENSE gguf : make gguf pip-installable 2023-08-25 09:26:05 +03:00
pyproject.toml build(nix): Package gguf-py (#5664) 2024-09-02 14:21:01 +03:00
README.md convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor (#7499) 2024-07-18 20:40:15 +10:00

gguf

This is a Python package for writing binary files in the GGUF (GGML Universal File) format.

See convert_hf_to_gguf.py as an example for its usage.

Installation

pip install gguf

API Examples/Simple Tools

examples/writer.py — Generates example.gguf in the current directory to demonstrate generating a GGUF file. Note that this file cannot be used as a model.

scripts/gguf_dump.py — Dumps a GGUF file's metadata to the console.

scripts/gguf_set_metadata.py — Allows changing simple metadata values in a GGUF file by key.

scripts/gguf_convert_endian.py — Allows converting the endianness of GGUF files.

scripts/gguf_new_metadata.py — Copies a GGUF file with added/modified/removed metadata values.

Development

Maintainers who participate in development of this package are advised to install it in editable mode:

cd /path/to/llama.cpp/gguf-py

pip install --editable .

Note: This may require to upgrade your Pip installation, with a message saying that editable installation currently requires setup.py. In this case, upgrade Pip to the latest:

pip install --upgrade pip

Automatic publishing with CI

There's a GitHub workflow to make a release automatically upon creation of tags in a specified format.

  1. Bump the version in pyproject.toml.
  2. Create a tag named gguf-vx.x.x where x.x.x is the semantic version number.
git tag -a gguf-v1.0.0 -m "Version 1.0 release"
  1. Push the tags.
git push origin --tags

Manual publishing

If you want to publish the package manually for any reason, you need to have twine and build installed:

pip install build twine

Then, follow these steps to release a new version:

  1. Bump the version in pyproject.toml.
  2. Build the package:
python -m build
  1. Upload the generated distribution archives:
python -m twine upload dist/*

Run Unit Tests

From root of this repository you can run this command to run all the unit tests

python -m unittest discover ./gguf-py -v

TODO

  • Include conversion scripts as command line entry points in this package.