Mirror of https://github.com/ggerganov/llama.cpp.git (synced 2024-11-11 21:39:52 +00:00)
Add quantize script for batch quantization (#92)

* Add quantize script for batch quantization
* Indentation
* README for new quantize.sh
* Fix script name
* Fix file list on Mac OS

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
parent 1808ee0500
commit d1f224712d
34
README.md
@@ -145,44 +145,16 @@ python3 -m pip install torch numpy sentencepiece
 python3 convert-pth-to-ggml.py models/7B/ 1
 
 # quantize the model to 4-bits
-./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
+./quantize.sh 7B
 
 # run the inference
 ./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128
 ```
 
-For the bigger models, there are a few extra quantization steps. For example, for LLaMA-13B, converting to FP16 format
-will create 2 ggml files, instead of one:
-
-```bash
-ggml-model-f16.bin
-ggml-model-f16.bin.1
-```
-
-You need to quantize each of them separately like this:
-
-```bash
-./quantize ./models/13B/ggml-model-f16.bin ./models/13B/ggml-model-q4_0.bin 2
-./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
-```
-
-Everything else is the same. Simply run:
-
-```bash
-./main -m ./models/13B/ggml-model-q4_0.bin -t 8 -n 128
-```
-
-The number of files generated for each model is as follows:
-
-```
-7B -> 1 file
-13B -> 2 files
-30B -> 4 files
-65B -> 8 files
-```
-
 When running the larger models, make sure you have enough disk space to store all the intermediate files.
 
+TODO: add model disk/mem requirements
+
 ### Interactive mode
 
 If you want a more ChatGPT-like experience, you can run in interactive mode by passing `-i` as a parameter.
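
The per-part commands in the README hunk above follow one pattern: every FP16 part maps to a q4_0 file with the same suffix. The sketch below prints those commands for a given model size without running `./quantize`. It assumes the part counts from the README table (7B: 1, 13B: 2, 30B: 4, 65B: 8) and assumes that parts beyond the second continue the `.1` naming as `.2`, `.3`, and so on, which the hunk only shows for 13B. `part_count` and `print_quantize_commands` are hypothetical helper names, not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repository): print the ./quantize
# command for every FP16 part of a model. Part counts are taken from the
# README table; the .2, .3, ... suffixes are an assumption extrapolated
# from the 13B example.

part_count() {
    case "$1" in
        7B)  echo 1 ;;
        13B) echo 2 ;;
        30B) echo 4 ;;
        65B) echo 8 ;;
        *)   return 1 ;;
    esac
}

print_quantize_commands() {
    local model="$1" n i suffix src
    n=$(part_count "$model") || { echo "unknown model: $model" >&2; return 1; }
    for ((i = 0; i < n; i++)); do
        # The first part has no suffix; later parts end in .1, .2, ...
        suffix=""
        if ((i > 0)); then suffix=".$i"; fi
        src="./models/$model/ggml-model-f16.bin$suffix"
        # Same f16 -> q4_0 substitution that quantize.sh applies to each path
        echo "./quantize $src ${src/f16/q4_0} 2"
    done
}

print_quantize_commands 13B
```

For `13B` this prints the same two commands that the old README listed by hand, which is the repetition the batch script is meant to replace.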
15
quantize.sh
Executable file
@@ -0,0 +1,15 @@
+#!/usr/bin/env bash
+
+if ! [[ "$1" =~ ^[0-9]{1,2}B$ ]]; then
+    echo
+    echo "Usage: quantize.sh 7B|13B|30B|65B [--remove-f16]"
+    echo
+    exit 1
+fi
+
+for i in `ls models/$1/ggml-model-f16.bin*`; do
+    ./quantize "$i" "${i/f16/q4_0}" 2
+    if [[ "$2" == "--remove-f16" ]]; then
+        rm "$i"
+    fi
+done
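
`quantize.sh` leans on two bash features: the `[[ ... =~ ... ]]` regex match that validates `$1`, and the `${i/f16/q4_0}` pattern substitution, which replaces the first occurrence of `f16` in the path. The sketch below exercises both in isolation without invoking `./quantize`. `is_valid_model` and `quantized_name` are illustrative names that do not appear in the script.

```bash
#!/usr/bin/env bash
# Illustrative helpers only; these names do not appear in quantize.sh.

# Same check as the script: one or two digits followed by an uppercase "B".
is_valid_model() {
    [[ "$1" =~ ^[0-9]{1,2}B$ ]]
}

# Same substitution as the script: replace the first "f16" in the path.
quantized_name() {
    echo "${1/f16/q4_0}"
}

for m in 7B 65B 3B 130B 7b; do
    if is_valid_model "$m"; then
        echo "$m: accepted"
    else
        echo "$m: rejected"
    fi
done

quantized_name "models/13B/ggml-model-f16.bin.1"
```

Note that the regex accepts any one- or two-digit size (for example `3B`), not only the four sizes named in the usage message; an unknown size would then fail later when `ls` finds no matching files. Iterating over a glob directly (`for i in models/$1/ggml-model-f16.bin*`) would be one way to avoid parsing `ls` output, though it would need its own check for the no-match case.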