Default Branch

master

9ba399dfa7 · server : add support for "encoding_format": "base64" to the */embeddings endpoints (#10967) · Updated 2024-12-24 20:33:04 +00:00

Branches

f0cbb6ddf6 · iq1_s: turn off SIMD implementation for QK_K = 64 (it does not work) · Updated 2024-02-28 06:28:10 +00:00    root

2110
6

14d757066b · llama : add llama_kv_cache_compress (EXPERIMENTAL) · Updated 2024-02-27 14:24:40 +00:00    root

2111
1

608f449880 · swift : fix build · Updated 2024-02-23 17:02:09 +00:00    root

2142
4

56c047156a · py : minor fixes · Updated 2024-02-22 17:22:56 +00:00    root

2151
1

5271c75666 · llama : fix K-shift with quantized K (wip) · Updated 2024-02-21 23:28:42 +00:00    root

2159
1

f249c997a8 · llama : adapt to F16 KQ_pos · Updated 2024-02-19 11:31:02 +00:00    root

2197
62

412735ec70 · Merge branch 'master' into gg/metal-batched · Updated 2024-02-19 09:25:24 +00:00    root

2197
6

47c662b0de · fix some spaces added by IDE in math op · Updated 2024-02-18 20:40:35 +00:00    root

2207
4

974e3cadff · ggml : try another fix · Updated 2024-02-17 16:14:35 +00:00    root

2226
2

e856bfed3b · hf : add support for --repo and --file · Updated 2024-02-15 13:05:15 +00:00    root

2240
3

ccd757a174 · convert : fix mistakes from refactoring · Updated 2024-02-13 17:01:30 +00:00    root

2248
4

5c977221d2 · iq1_s: slightly faster dot product · Updated 2024-02-13 13:18:27 +00:00    root

2254
15

4246b71ad7 · Fix compiler warnings (shadow variable) · Updated 2024-02-13 06:44:56 +00:00    root

2257
1

7286b83d3f · BERT WIP · Updated 2024-02-06 22:10:11 +00:00    root

2310
1

adcf16fd68 · py : fix empty bytes arg · Updated 2024-02-05 17:53:07 +00:00    root

2320
2

91c453fb11 · One cannot possibly be defining static_assert in a C++ compilation · Updated 2024-02-05 11:22:14 +00:00    root

2325
2

49a483e0f2 · wip · Updated 2024-02-04 10:34:36 +00:00    root

2351
60

a647257b47 · cuda : express strides with helper constants · Updated 2024-02-04 09:45:26 +00:00    root

2351
60

b957b8f5f6 · cuda : add flash_attn kernel (wip) · Updated 2024-02-01 17:49:57 +00:00    root

2355
39

ac26f27028 · cuda : increase C to 128 for better performance · Updated 2024-02-01 15:08:29 +00:00    root

2355
61