From ad52d5c259344888b06fd5acd3344c663dd0621d Mon Sep 17 00:00:00 2001
From: Vaibhav Srivastav
Date: Thu, 16 May 2024 07:38:43 +0200
Subject: [PATCH] doc: add references to hugging face GGUF-my-repo quantisation web tool. (#7288)

* chore: add references to the quantisation space.

* fix grammer lol.

* Update README.md

Co-authored-by: Julien Chaumond

* Update README.md

Co-authored-by: Georgi Gerganov

---------

Co-authored-by: Julien Chaumond
Co-authored-by: Georgi Gerganov
---
 README.md                   | 3 +++
 examples/quantize/README.md | 4 +++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index ecbe802df..5d6217d13 100644
--- a/README.md
+++ b/README.md
@@ -712,6 +712,9 @@ Building the program with BLAS support may lead to some performance improvements
 
 ### Prepare and Quantize
 
+> [!NOTE]
+> You can use the [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space on Hugging Face to quantise your model weights without any setup too. It is synced from `llama.cpp` main every 6 hours.
+
 To obtain the official LLaMA 2 weights please see the Obtaining and using the Facebook LLaMA 2 model section. There is also a large selection of pre-quantized `gguf` models available on Hugging Face.
 
 Note: `convert.py` does not support LLaMA 3, you can use `convert-hf-to-gguf.py` with LLaMA 3 downloaded from Hugging Face.
diff --git a/examples/quantize/README.md b/examples/quantize/README.md
index 8a10365c0..b78ece4e7 100644
--- a/examples/quantize/README.md
+++ b/examples/quantize/README.md
@@ -1,6 +1,8 @@
 # quantize
 
-TODO
+You can also use the [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space on Hugging Face to build your own quants without any setup.
+
+Note: It is synced from llama.cpp `main` every 6 hours.
 
 ## Llama 2 7B