From d1f224712d78ab2cbb78777acfeb6739f660eb96 Mon Sep 17 00:00:00 2001
From: Pavol Rusnak
Date: Mon, 13 Mar 2023 17:15:20 +0100
Subject: [PATCH] Add quantize script for batch quantization (#92)

* Add quantize script for batch quantization

* Indentation

* README for new quantize.sh

* Fix script name

* Fix file list on Mac OS

---------

Co-authored-by: Georgi Gerganov
---
 README.md   | 34 +++-------------------------------
 quantize.sh | 15 +++++++++++++++
 2 files changed, 18 insertions(+), 31 deletions(-)
 create mode 100755 quantize.sh

diff --git a/README.md b/README.md
index 3a6d757d6..65be1a687 100644
--- a/README.md
+++ b/README.md
@@ -145,44 +145,16 @@ python3 -m pip install torch numpy sentencepiece
 python3 convert-pth-to-ggml.py models/7B/ 1
 
 # quantize the model to 4-bits
-./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
+./quantize.sh 7B
 
 # run the inference
 ./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128
 ```
 
-For the bigger models, there are a few extra quantization steps. For example, for LLaMA-13B, converting to FP16 format
-will create 2 ggml files, instead of one:
-
-```bash
-ggml-model-f16.bin
-ggml-model-f16.bin.1
-```
-
-You need to quantize each of them separately like this:
-
-```bash
-./quantize ./models/13B/ggml-model-f16.bin ./models/13B/ggml-model-q4_0.bin 2
-./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
-```
-
-Everything else is the same. Simply run:
-
-```bash
-./main -m ./models/13B/ggml-model-q4_0.bin -t 8 -n 128
-```
-
-The number of files generated for each model is as follows:
-
-```
-7B -> 1 file
-13B -> 2 files
-30B -> 4 files
-65B -> 8 files
-```
-
 When running the larger models, make sure you have enough disk space to store all the intermediate files.
 
+TODO: add model disk/mem requirements
+
 ### Interactive mode
 
 If you want a more ChatGPT-like experience, you can run in interactive mode by passing `-i` as a parameter.
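The README change above collapses the per-model quantize invocations into a single call taking a size argument (`7B`, `13B`, `30B`, `65B`). Not part of the patch, but as a sketch, the size argument can be validated with the same bash `=~` regex the new script introduces below (`check_size` is a hypothetical helper name):

```shell
#!/usr/bin/env bash

# Accept only a model-size argument of the form 7B, 13B, 30B, 65B:
# one or two digits followed by a literal "B".
check_size() {
    [[ "$1" =~ ^[0-9]{1,2}B$ ]]
}

check_size 7B  && echo "7B accepted"
check_size 65B && echo "65B accepted"
check_size 7   || echo "bare '7' rejected"
```

Note that `=~` must appear inside `[[ ... ]]` and the pattern is a POSIX extended regex, not a glob.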
diff --git a/quantize.sh b/quantize.sh
new file mode 100755
index 000000000..6194649b3
--- /dev/null
+++ b/quantize.sh
@@ -0,0 +1,15 @@
+#!/usr/bin/env bash
+
+if ! [[ "$1" =~ ^[0-9]{1,2}B$ ]]; then
+    echo
+    echo "Usage: quantize.sh 7B|13B|30B|65B [--remove-f16]"
+    echo
+    exit 1
+fi
+
+for i in `ls models/$1/ggml-model-f16.bin*`; do
+    ./quantize "$i" "${i/f16/q4_0}" 2
+    if [[ "$2" == "--remove-f16" ]]; then
+        rm "$i"
+    fi
+done
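The script derives each output filename with bash pattern substitution, `${i/f16/q4_0}`, which replaces the first occurrence of `f16` in the path. A minimal standalone sketch of that expansion (the paths here are just the naming convention used in the patch):

```shell
#!/usr/bin/env bash

# ${var/pattern/replacement} replaces the FIRST occurrence of the
# pattern; this is how quantize.sh maps an f16 model path to its
# q4_0 counterpart, including numbered shards like .bin.1.
src="models/7B/ggml-model-f16.bin"
dst="${src/f16/q4_0}"
echo "$dst"    # prints models/7B/ggml-model-q4_0.bin
```

Using `${var//pattern/replacement}` (double slash) would replace every occurrence instead; a single replacement is enough here because `f16` appears once in these paths.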