Francis Couture-Harpin
0996149911
convert-hf : allow converting the weird BitNet 1.3B
Its FFN size is 5460, which is not a multiple of any of the quant block sizes,
so the offending tensors are kept in F16,
which makes the final model 5.01 bpw.
2024-06-27 02:06:28 -04:00
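A minimal sketch of the fallback rule described above, assuming q1_3 uses 64-element blocks (which 13 bytes at 1.625 bpw implies); `pick_dtype` is a hypothetical helper, not the actual convert-hf code:

```python
# Hedged sketch: a row must contain a whole number of quant blocks,
# otherwise the tensor falls back to F16. Block size 64 is assumed from
# 1.625 bpw * 64 = 104 bits = 13 bytes per block.
def pick_dtype(shape: tuple[int, ...], block_size: int = 64) -> str:
    n_per_row = shape[-1]  # quantization runs along the last dimension
    if n_per_row % block_size != 0:
        return "F16"       # 5460 % 64 == 20, so the FFN tensors fall back
    return "Q1_3"

print(pick_dtype((1536, 5460)))  # hypothetical FFN tensor shape -> F16
```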
Francis Couture-Harpin
89dc3b254c
ggml-quants : use ceiling division when quantizing q1_3
2024-06-27 02:06:28 -04:00
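To see why ceiling division matters here, consider a base-3 fixed-point packing where one byte stores five ternary digits and decoding multiplies by 3 and reads the high byte. This is a hedged reconstruction of the kind of scheme the commit suggests, not a copy of the ggml-quants code:

```python
def pack5(digits):             # digits[0] is most significant, each in {0, 1, 2}
    x = 0
    for d in digits:
        x = x * 3 + d          # x in [0, 242]
    return -(-x * 256 // 243)  # ceiling division: ceil(x * 256 / 243)

def unpack5(q):
    out = []
    for _ in range(5):
        q *= 3
        out.append(q >> 8)     # integer part of the fixed-point product
        q &= 0xFF              # keep the fractional part
    return out

assert unpack5(pack5([0, 0, 0, 1, 0])) == [0, 0, 0, 1, 0]
# With floor division, [0, 0, 0, 1, 0] packs to q = 3 and the fourth digit
# decodes as 0 (3 * 81 = 243 never reaches 256); ceiling division fixes it.
```

Rounding up biases each stored fraction just enough that every multiply-by-3 step crosses the 256 threshold exactly when the corresponding digit is nonzero.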
Francis Couture-Harpin
7ef4254a92
ggml-quants : faster 1.625 bpw AVX2 vec_dot
Dropping the lookup table makes it match q4_0 speed.
* gguf-py : fix formatting
* llama : remove spaces on empty line
2024-06-27 02:06:28 -04:00
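The scalar model below shows why the decode needs no table: each trit comes out of a multiply-by-3 and a shift, operations that vectorize well. It is a sketch of the idea only; the real kernel is AVX2 C and its block layout differs:

```python
def vec_dot_ternary(packed: bytes, y: list[int]) -> int:
    # packed: bytes holding 5 trits each (values 0..2 meaning -1, 0, +1),
    # as produced by pack5() above; y: int8 activations, 5 per byte.
    acc = 0
    i = 0
    for q in packed:
        for _ in range(5):
            q *= 3                        # q * 3 == (q << 1) + q: shift + add
            acc += ((q >> 8) - 1) * y[i]  # high byte is the trit; -1 recenters
            q &= 0xFF                     # keep the fractional part
            i += 1
    return acc

# 0xFF packs five 2-trits, i.e. five +1 weights:
assert vec_dot_ternary(bytes([0xFF] * 12), [1] * 60) == 60
```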
Francis Couture-Harpin
bd807499f7
ggml-quants : 1.625 bpw ternary packing for BitNet 1.58b
2024-06-27 02:06:22 -04:00
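1.625 bpw works out to 13 bytes per 64 weights: twelve bytes of five trits cover 60 weights, and a thirteenth, zero-padded byte carries the last four. This is one plausible layout consistent with the bit budget; the actual q1_3 layout may differ. BitNet b1.58 weights are already ternary, so quantization is just the {-1, 0, +1} to {0, 1, 2} remap plus the packing:

```python
import numpy as np

def pack_group(trits: list[int]) -> int:    # up to 5 trits, MSB first
    trits = trits + [0] * (5 - len(trits))  # zero-pad short groups
    x = 0
    for d in trits:
        x = x * 3 + d                       # x in [0, 242]
    return -(-x * 256 // 243)               # ceiling division, as above

def quantize_row_ternary(row: np.ndarray) -> bytes:
    # row: 64 weights already ternarized to {-1.0, 0.0, +1.0}; BitNet b1.58
    # keeps a separate per-tensor scale, not stored here.
    trits = (row.astype(np.int8) + 1).tolist()          # {-1,0,1} -> {0,1,2}
    groups = [trits[i:i + 5] for i in range(0, 64, 5)]  # 12 full + 1 of 4
    return bytes(pack_group(g) for g in groups)

assert len(quantize_row_ternary(np.zeros(64))) == 13  # 13 B / 64 w = 1.625 bpw
```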
compilade
b83bab15a5
gguf-py : fix and simplify quantized shape round-trip (#7483)
* gguf-py : fix and simplify quantized shape round-trip
* gguf-py : remove unused import
2024-05-25 11:11:48 +10:00
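The round-trip in question: GGUF stores quantized tensors as raw bytes, so the last dimension has to convert from elements to bytes on write and back on read without ambiguity. Below is a sketch with Q8_0's actual numbers (32 elements per block, 34 bytes per block: a 2-byte f16 scale plus 32 int8 values); the helper names are illustrative, not gguf-py's API:

```python
# Q8_0 layout: 32 elements per block, 34 bytes per block.
BLOCK_SIZE, TYPE_SIZE = 32, 34

def shape_to_bytes(shape):
    *rest, n = shape
    assert n % BLOCK_SIZE == 0   # whole number of blocks per row
    return (*rest, n // BLOCK_SIZE * TYPE_SIZE)

def shape_from_bytes(shape):
    *rest, n = shape
    assert n % TYPE_SIZE == 0    # whole number of blocks per row
    return (*rest, n // TYPE_SIZE * BLOCK_SIZE)

# The round-trip must be lossless in both directions:
assert shape_from_bytes(shape_to_bytes((4096, 14336))) == (4096, 14336)
```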
compilade
ee52225067
convert-hf : support direct Q8_0 conversion (#7234)
* convert-hf : support q8_0 conversion
* convert-hf : add missing ftype
This was messing with the checksums otherwise.
* convert-hf : add missing ftype to Baichuan and Xverse
I didn't notice these on my first pass.
2024-05-13 14:10:51 -04:00
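Direct Q8_0 conversion means quantizing with numpy at convert time instead of going through an intermediate F16 file. The sketch below follows the real ggml Q8_0 format (34-byte blocks); the actual convert-hf/gguf-py code is structured differently:

```python
import numpy as np

QK8_0 = 32  # ggml's Q8_0 block: float16 scale + 32 int8 quants = 34 bytes

def quantize_q8_0(x: np.ndarray) -> bytes:
    blocks = x.reshape(-1, QK8_0).astype(np.float32)
    d = np.abs(blocks).max(axis=1, keepdims=True) / 127.0  # per-block scale
    with np.errstate(divide="ignore", invalid="ignore"):
        q = np.where(d > 0, np.round(blocks / d), 0).astype(np.int8)
    out = bytearray()
    for scale, qs in zip(d.astype(np.float16).ravel(), q):
        out += scale.tobytes() + qs.tobytes()  # 2 + 32 bytes per block
    return bytes(out)

print(len(quantize_q8_0(np.random.randn(64))))  # 2 blocks * 34 = 68 bytes
```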