mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-28 12:24:35 +00:00

Francis Couture-Harpin bd807499f7 ggml-quants : 1.625 bpw ternary packing for BitNet 1.58b		2024-06-27 02:06:22 -04:00
CMakeLists.txt	`build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)	2024-06-13 00:41:52 +01:00
quantize.cpp	ggml-quants : 1.625 bpw ternary packing for BitNet 1.58b	2024-06-27 02:06:22 -04:00
README.md	doc: add references to hugging face GGUF-my-repo quantisation web tool. (#7288)	2024-05-16 15:38:43 +10:00
tests.sh	`build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)	2024-06-13 00:41:52 +01:00

quantize

You can also use the GGUF-my-repo space on Hugging Face to build your own quants without any setup.

Note: The GGUF-my-repo space is synced from the llama.cpp main branch every 6 hours.

Llama 2 7B