Default Branch

master

9ba399dfa7 · server : add support for "encoding_format": "base64" to the */embeddings endpoints (#10967) · Updated 2024-12-24 20:33:04 +00:00

Branches

1ad42b1f1e · ggml : ggml_soft_max uses F16 mask · Updated 2024-01-31 18:33:59 +00:00 · root · 2355 behind, 36 ahead
719a087138 · iq3_xxs: forgotten update of the grid points · Updated 2024-01-30 16:39:07 +00:00 · root · 2369 behind, 1 ahead
2bf91c5306 · metal : clean up · Updated 2024-01-25 11:29:45 +00:00 · root · 2465 behind, 23 ahead
6ccbd1777a · wip · Updated 2024-01-24 13:45:04 +00:00 · root · 2465 behind, 18 ahead
da23b56f25 · wip : no ic 8 step · Updated 2024-01-24 11:25:34 +00:00 · root · 2465 behind, 18 ahead
06c2d0d117 · wip · Updated 2024-01-23 20:42:43 +00:00 · root · 2465 behind, 14 ahead
a9681febd6 · ggml : online attention (CPU) · Updated 2024-01-20 14:45:41 +00:00 · root · 2465 behind, 4 ahead
32a392fe68 · try a differerent fix · Updated 2024-01-19 22:10:23 +00:00 · root · 2466 behind, 2 ahead
4a3bc1522e · py : linting with mypy and isort · Updated 2024-01-19 20:18:58 +00:00 · root · 2467 behind, 3 ahead
1453215165 · kompute : fix ggml_add kernel · Updated 2024-01-18 22:09:16 +00:00 · root · 2583 behind, 105 ahead
ccc78a200e · hellaswag: speed up even more by parallelizing log-prob evaluation · Updated 2024-01-18 16:25:29 +00:00 · root · 2483 behind, 1 ahead
2917e6b528 · Merge branch 'master' into gg/imatrix-gpu-4931 · Updated 2024-01-17 16:43:45 +00:00 · root · 2490 behind, 10 ahead
23742deb5b · py : fix padded dummy tokens (I hope) · Updated 2024-01-17 13:44:22 +00:00 · root · 2509 behind, 4 ahead
9fd1e83f6d · Use Q4_K for attn_v for Q2_K_S when n_gqa >= 4 · Updated 2024-01-17 10:16:08 +00:00 · root · 2495 behind, 1 ahead
49bafe0986 · tests : avoid creating RNGs for each tensor · Updated 2024-01-17 08:40:55 +00:00 · root · 2498 behind, 6 ahead
bb9abb5cd8 · imatrix: guard Q4_0/Q5_0 against ffn_down craziness · Updated 2024-01-16 07:56:05 +00:00 · root · 2512 behind, 2 ahead
9998ecd191 · llama : add phixtral support (wip) · Updated 2024-01-13 12:24:07 +00:00 · root · 2542 behind, 1 ahead
1fb563ebdc · py : try to fix flake stuff · Updated 2024-01-13 11:42:35 +00:00 · root · 2536 behind, 2 ahead
9bfcb16fd3 · Add llama enum for IQ2_XS · Updated 2024-01-11 16:24:12 +00:00 · root · 2585 behind, 11 ahead
24096933b0 · server : try to fix infill when prompt is empty · Updated 2024-01-09 09:27:29 +00:00 · root · 2587 behind, 1 ahead