mirror of
https://github.com/ggerganov/llama.cpp.git
synced 2024-11-11 21:39:52 +00:00
574406dc7e
* ggml : add Q5_0 quantization (cuBLAS only) * ggml : fix Q5_0 qh -> uint32_t * ggml : fix q5_0 histogram stats * ggml : q5_0 scalar dot product * ggml : q5_0 ARM NEON dot * ggml : q5_0 more efficient ARM NEON using uint64_t masks * ggml : rename Q5_0 -> Q5_1 * ggml : adding Q5_0 mode * quantize : add Q5_0 and Q5_1 to map * ggml : AVX2 optimizations for Q5_0, Q5_1 (#1195) --------- Co-authored-by: Stephan Walter <stephan@walter.name> |
||
---|---|---|
.. | ||
benchmark | ||
embedding | ||
main | ||
perplexity | ||
quantize | ||
quantize-stats | ||
save-load-state | ||
alpaca.sh | ||
chat-13B.bat | ||
chat-13B.sh | ||
chat.sh | ||
CMakeLists.txt | ||
common.cpp | ||
common.h | ||
gpt4all.sh | ||
Miku.sh | ||
reason-act.sh |