Mirror of https://github.com/ggerganov/llama.cpp.git, synced 2024-12-25 02:44:36 +00:00
docs : Quantum -> Quantized (#8666)

* docfix: imatrix readme, quantum models -> quantized models.
* docfix: server readme: quantum models -> quantized models.

commit 4b0eff3df5 (parent 8a4bad50a8)
examples/imatrix/README.md
@@ -1,6 +1,6 @@
 # llama.cpp/examples/imatrix

-Compute an importance matrix for a model and given text dataset. Can be used during quantization to enchance the quality of the quantum models.
+Compute an importance matrix for a model and given text dataset. Can be used during quantization to enchance the quality of the quantized models.

 More information is available here: https://github.com/ggerganov/llama.cpp/pull/4861

 ## Usage
examples/server/README.md
@@ -5,7 +5,7 @@ Fast, lightweight, pure C/C++ HTTP server based on [httplib](https://github.com/
 Set of LLM REST APIs and a simple web front end to interact with llama.cpp.

 **Features:**

-* LLM inference of F16 and quantum models on GPU and CPU
+* LLM inference of F16 and quantized models on GPU and CPU
 * [OpenAI API](https://github.com/openai/openai-openapi) compatible chat completions and embeddings routes
 * Parallel decoding with multi-user support
 * Continuous batching