Commit Graph

8 Commits

Author SHA1 Message Date
Djip007
19d8762ab6
ggml : refactor online repacking (#10446)
* rename ggml-cpu-aarch64.c to .cpp

* reformat extra cpu backend.

- clean Q4_0_N_M and IQ4_0_N_M
  - remove from "file" tensor type
  - allow only with dynamic repack

- extract cpu extra bufts and convert to C++
  - hbm
  - "aarch64"

- more generic use of extra buffer
  - generalise extra_supports_op
  - new API for "cpu-accel":
     - amx
     - aarch64

* clang-format

* Clean Q4_0_N_M ref

Enable restrict on C++

* add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack

* added/corrected control on tensor size for Q4 repacking.

* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* add debug logs on repacks.

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-07 14:37:50 +02:00
Dan Johansson
b2e89a3274
Arm AArch64: Documentation updates (#9321)
* Arm AArch64: Documentation updates

* Update docs/build.md to include information on how to enable the Arm optimized gemm/gemv kernels

* Update examples/quantize/README.md with information on the Q4_0_4_4, Q4_0_4_8 and Q4_0_8_8 formats

* Add newline to the end of docs/build.md
2024-09-09 10:02:45 +03:00
Aisuko
c8ddce8560
Fix inference example lacks required parameters (#9035)
Signed-off-by: Aisuko <urakiny@gmail.com>
2024-08-16 11:08:59 +02:00
Xuan Son Nguyen
be20e7f49d
Reorganize documentation pages (#8325)
* re-organize docs

* add link among docs

* add link to build docs

* fix style

* de-duplicate sections
2024-07-05 18:08:32 +02:00
Vaibhav Srivastav
ad52d5c259
doc: add references to hugging face GGUF-my-repo quantisation web tool. (#7288)
* chore: add references to the quantisation space.

* fix grammer lol.

* Update README.md

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Update README.md

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-16 15:38:43 +10:00
Rene Leonhardt
5c4d767ac0
chore: Fix markdown warnings (#6625) 2024-04-12 10:52:36 +02:00
BarfingLemurs
ffe88a36a9
readme : add some recent perplexity and bpw measurements to READMES, link for k-quants (#3340)
* Update README.md

* Update README.md

* Update README.md with k-quants bpw measurements
2023-09-27 18:30:36 +03:00
Georgi Gerganov
a316a425d0
Overhaul the examples structure
- main -> examples
- utils -> examples (renamed to "common")
- quantize -> examples
- separate tools for "perplexity" and "embedding"

Hope I didn't break something !
2023-03-25 20:26:40 +02:00