Francis Couture-Harpin
7ef4254a92
ggml-quants : faster 1.625 bpw AVX2 vec_dot
Not using a lookup table anymore makes it match q4_0 speed.
* gguf-py : fix formatting
* llama : remove spaces on empty line
2024-06-27 02:06:28 -04:00
Francis Couture-Harpin
bd807499f7
ggml-quants : 1.625 bpw ternary packing for BitNet 1.58b
2024-06-27 02:06:22 -04:00
compilade
b83bab15a5
gguf-py : fix and simplify quantized shape round-trip (#7483)
* gguf-py : fix and simplify quantized shape round-trip
* gguf-py : remove unused import
2024-05-25 11:11:48 +10:00
compilade
ee52225067
convert-hf : support direct Q8_0 conversion (#7234)
* convert-hf : support q8_0 conversion
* convert-hf : add missing ftype
This was messing with the checksums otherwise.
* convert-hf : add missing ftype to Baichuan and Xverse
I didn't notice these on my first pass.
2024-05-13 14:10:51 -04:00