Updated Tensor Encoding Schemes (markdown)
parent 79fd1841b1
commit 99cc856d71
@@ -56,3 +56,4 @@ This is not definitive, but is helpful when reading source code or console output
| IQ4_NL | GGML_FTYPE_MOSTLY_IQ4_NL | GGML_TYPE_IQ4_NL | 4.5 | i-quantization | Blocks of 32 weights, each with its own fp16 scale (no superblocks) | w = [non linear mapping of quants to weights] | [llama.cpp PR: IQ4_NL: 4-bit non-linear quants with blocks of 32 #5590](https://github.com/ggerganov/llama.cpp/pull/5590) |
| IQ4_XS | GGML_FTYPE_MOSTLY_IQ4_XS | GGML_TYPE_IQ4_XS | 4.25 | i-quantization | Superblocks with 8 blocks, each block has 32 weights | w = func(superblock_scale, importance_matrix) | [llama.cpp PR: IQ4_XS: a 4.25 bpw quantization #5747](https://github.com/ggerganov/llama.cpp/pull/5747) |
* All superblocks have an fp16 scaling factor and contain up to 256 weights. The number of weights in a tensor row must be divisible by 256. (To be confirmed; see the layout sketch below.)
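
As a reading aid for the two rows above, here is a minimal, self-contained C sketch of the block layouts and of how the 4.5 and 4.25 bits-per-weight figures follow from the struct sizes. The struct shapes and the `QK4_NL` / `QK_K` constants are modelled on `ggml-common.h` in llama.cpp, but treat the field names and comments as an illustrative approximation rather than the canonical definitions.

```c
// Illustrative sketch (not the canonical ggml definitions): block layouts
// modelled on llama.cpp's block_iq4_nl and block_iq4_xs, used here only to
// reproduce the bits-per-weight numbers from the table.
#include <stdint.h>
#include <stdio.h>

#define QK4_NL 32   /* weights per IQ4_NL block (no superblocks)      */
#define QK_K   256  /* weights per superblock for IQ4_XS and k-quants */

typedef struct {
    uint16_t d;                   /* fp16 block scale (stored as raw bits)   */
    uint8_t  qs[QK4_NL / 2];      /* 32 x 4-bit indices into a non-linear
                                     lookup table, two weights per byte      */
} block_iq4_nl;                   /* 2 + 16 = 18 bytes -> 4.5 bpw            */

typedef struct {
    uint16_t d;                   /* fp16 superblock scale                   */
    uint16_t scales_h;            /* high bits of the 8 per-block scales     */
    uint8_t  scales_l[QK_K / 64]; /* low 4 bits of the 8 per-block scales    */
    uint8_t  qs[QK_K / 2];        /* 256 x 4-bit non-linear quants           */
} block_iq4_xs;                   /* 2 + 2 + 4 + 128 = 136 bytes -> 4.25 bpw */

int main(void) {
    printf("IQ4_NL: %.2f bits per weight\n", 8.0 * sizeof(block_iq4_nl) / QK4_NL);
    printf("IQ4_XS: %.2f bits per weight\n", 8.0 * sizeof(block_iq4_xs) / QK_K);
    return 0;
}
```

Compiled with any C99 compiler, this prints `4.50` and `4.25`, matching the bits-per-weight values in the table.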