Commit Graph

  • e52797162e
    spectrum processing Georgi Gerganov 2024-12-11 16:35:10 +0200
  • 5a1c98e8d2
    fft Georgi Gerganov 2024-12-11 14:51:17 +0200
  • e728cfd297
    compute hann window Georgi Gerganov 2024-12-11 12:35:47 +0200
  • a1f08ad338
    fix n_embd + remove llama.cpp hacks Georgi Gerganov 2024-12-11 10:59:08 +0200
  • eb1b70f42a
    hann window Georgi Gerganov 2024-12-11 10:52:07 +0200
  • 839035d1bb
    head Georgi Gerganov 2024-12-11 10:22:12 +0200
  • fe6dd5aa61
    convnext Georgi Gerganov 2024-12-11 10:06:48 +0200
  • b3ba05e5bc
    layer norm Georgi Gerganov 2024-12-10 22:37:26 +0200
  • 435cfd788b
    pos net Georgi Gerganov 2024-12-10 22:05:44 +0200
  • 3046fde420
    attn Georgi Gerganov 2024-12-10 21:59:45 +0200
  • 13dd8941a4
    resnet Georgi Gerganov 2024-12-10 20:50:13 +0200
  • 3d08d62b6c
    resnet conv Georgi Gerganov 2024-12-10 20:40:52 +0200
  • 5296c96ca8
    group norm Georgi Gerganov 2024-12-10 20:33:29 +0200
  • 6ef14091c0
    first conv Georgi Gerganov 2024-12-10 19:18:04 +0200
  • aac7e04953
    extract features Georgi Gerganov 2024-12-10 18:23:10 +0200
  • ff2ea75fb4
    wip Georgi Gerganov 2024-12-10 16:31:02 +0200
  • f169965158
    llama : add OuteTTS support (wip) Georgi Gerganov 2024-12-10 14:40:03 +0200
  • e65556f174
    server : do not normalize embeddings when there is no pooling Georgi Gerganov 2024-12-17 13:36:32 +0200
  • 1b18b2d7b0
    server : be explicit about the pooling type in the tests Georgi Gerganov 2024-12-17 11:45:18 +0200
  • 06e85401b0
    server : output embeddings for all tokens when pooling = none Georgi Gerganov 2024-12-17 10:56:20 +0200
  • 89eaf5036a
    server : add "tokens" output Georgi Gerganov 2024-12-16 21:03:24 +0200
  • c0cca53d85
    Merge branch 'master' into xsn/fix_logprobs Xuan Son Nguyen 2024-12-18 12:46:50 +0100
  • 50b3813319
    rebuild Xuan Son Nguyen 2024-12-18 12:41:05 +0100
  • 152610eda9
    server : output embeddings for all tokens when pooling = none (#10861) Georgi Gerganov 2024-12-18 13:01:41 +0200
  • 2dcdd483d4
    server : remove rebase artifact Georgi Gerganov 2024-12-18 12:43:04 +0200
  • 84bbd366c1
    tests: disable GGUF test for bad value size Johannes Gäßler 2024-12-18 11:12:11 +0100
  • 4a7843cdab
    Merge c76851eeb0 into 0e70ba686e Daniel Bevenius 2024-12-18 01:57:33 -0800
  • 6448dbc496
    Merge e0580f9d66 into 0e70ba686e Sigbjørn Skjæret 2024-12-18 01:57:33 -0800
  • 126883acf2
    Merge a6648b9df7 into 0e70ba686e Georgi Gerganov 2024-12-18 10:56:29 +0100
  • 600cebc9a8
    server : update readme [no ci] Georgi Gerganov 2024-12-18 11:55:28 +0200
  • 2a5510ed82
    tests : update server tests Georgi Gerganov 2024-12-18 11:33:46 +0200
  • 87df60166d
    server : fixes Georgi Gerganov 2024-12-18 11:13:29 +0200
  • 3a7c001fe3
    server : update readme Georgi Gerganov 2024-12-17 16:12:15 +0200
  • 7e693f92d7
    server : do not normalize embeddings when there is no pooling Georgi Gerganov 2024-12-17 13:36:32 +0200
  • abf33e2017
    server : update /embeddings and /v1/embeddings endpoints Georgi Gerganov 2024-12-17 15:59:55 +0200
  • 2a94c33028
    server : be explicit about the pooling type in the tests Georgi Gerganov 2024-12-17 11:45:18 +0200
  • 2dea48758e
    server : fix spacing [no ci] Georgi Gerganov 2024-12-17 11:37:08 +0200
  • d424afac5f
    server : update readme [no ci] Georgi Gerganov 2024-12-17 11:01:29 +0200
  • 07946a3a30
    server : output embeddings for all tokens when pooling = none Georgi Gerganov 2024-12-17 10:56:20 +0200
  • 44eeb6a88e
    server : add "tokens" output Georgi Gerganov 2024-12-16 21:03:24 +0200
  • 0e70ba686e
    server : add "tokens" output (#10853) b4354 Georgi Gerganov 2024-12-18 11:05:29 +0200
  • 46828872c3
    server : (embeddings) using same format for "input" and "content" (#10872) b4353 Xuan Son Nguyen 2024-12-18 09:55:09 +0100
  • 6b064c92b4
    docs: Fix HIP (née hipBLAS) in README (#10880) redbeard 2024-12-18 00:35:00 -0800
  • 92e41ec4b9
    Update log to only print when input and output characters are different Billel Mokeddem 2024-12-18 08:20:28 +0000
  • 99cb6be1d3
    server : remove "tokens" from the OAI endpoint Georgi Gerganov 2024-12-18 10:16:46 +0200
  • 1dae1d884f
    ggml: Show detected features with GGML_NATIVE Adrien Gallouët 2024-12-17 12:11:30 +0100
  • 7eb81e1603
    ggml: GGML_NATIVE uses -mcpu=native on ARM Adrien Gallouët 2024-12-10 11:08:02 +0000
  • 5bf29af841
    tests : improve "tokens" type check Georgi Gerganov 2024-12-18 10:02:01 +0200
  • fe9235d795
    Force max subgroup size for coopmat shaders 0cc4m/vulkan-coopmat-amd-windows 0cc4m 2024-12-10 20:27:04 +0000
  • d8d2f370dc
    Add a log message to better track the when the following line of code is triggered Billel Mokeddem 2024-12-18 07:23:35 +0000
  • b3d022aa1a
    Add comment explaining the logic behind the if statement Billel Mokeddem 2024-12-18 05:46:07 +0000
  • fc055407b7
    Add fix for adding bos to added special tokens Billel Mokeddem 2024-12-18 04:58:00 +0000
  • 36423273dc
    docs: Fix HIP (née hipBLAS) in README Brian 'redbeard' Harrington 2024-12-17 20:48:15 -0800
  • a20dde36ff
    SYCL: reg_get_proc_address func, update to the current func signature Akarshan Biswas 2024-12-18 09:20:52 +0530
  • 82ce602ee7
    SYCL: Use GGML_SYCL_DEBUG after reverting Akarshan Biswas 2024-12-18 09:19:43 +0530
  • eeb04751d9
    Revert "SYCL: Integrate debug logs with GGML_LOG and other fixes" Akarshan Biswas 2024-12-18 09:11:17 +0530
  • bfa0298900
    server: avoid overwriting Authorization header Gaetan Bisson 2024-12-17 16:08:48 -1000
  • 4da69d1abd
    Revert "llama : add Falcon3 support (#10864)" (#10876) b4351 Diego Devesa 2024-12-18 01:36:46 +0100
  • e10dc009b5
    Revert "llama : add Falcon3 support (#10864)" Diego Devesa 2024-12-17 23:25:36 +0100
  • d62b532c52
    Use model->gguf_kv for loading the template instead of using the C API. (#10868) b4350 DAN™ 2024-12-17 17:24:22 -0500
  • 2e04ccf4e6
    llama_server_response_fields nvrxq 2024-12-18 01:21:44 +0300
  • 101e772c73
    fix test Xuan Son Nguyen 2024-12-17 22:28:53 +0100
  • d4b9ec098b
    handle empty input case Xuan Son Nguyen 2024-12-17 21:50:36 +0100
  • a2d4b6fc81
    llama: Ensure KV cache is fully defragmented. Jesse Gross 2024-12-13 16:11:59 -0800
  • 9a566806f0
    fix test case Xuan Son Nguyen 2024-12-17 21:36:50 +0100
  • d4e0bad0ae
    server : (embeddings) using same format for "input" and "content" Xuan Son Nguyen 2024-12-17 21:33:29 +0100
  • 8bcfc5551e
    server : return tokens ids only if requested Georgi Gerganov 2024-12-17 21:44:09 +0200
  • 52bfa235e3
    Use model->gguf_kv for efficiency. DAN™ 2024-12-17 14:00:45 -0500
  • bf51f65a1c
    Improve progress bar Eric Curtin 2024-12-13 22:46:13 +0000
  • 081b29bd2a
    tests: add tests for GGUF (#10830) b4349 Johannes Gäßler 2024-12-17 19:09:35 +0100
  • 919fe432c3
    Bump model_template to 16384 bytes to support larger chat templates. DAN™ 2024-12-17 11:02:26 -0500
  • 5437d4aaf5
    sync : ggml b4348 Georgi Gerganov 2024-12-17 18:36:02 +0200
  • 78f766768d
    cmake : fix "amd64" processor string (whisper/2638) Georgi Gerganov 2024-12-17 18:34:32 +0200
  • 8dd19a4812
    vulkan : fix soft_max.comp division by zero (whisper/2633) gn64 2024-12-16 19:34:38 +0900
  • 130d0c90bd
    ggml : remove return from ggml_gallocr_allocate_node (ggml/1048) Daniel Bevenius 2024-12-14 03:23:08 +0100
  • 3919da8e33
    ggml : add check for grad_accs (ggml/1046) Daniel Bevenius 2024-12-13 08:19:38 +0100
  • 0006f5a74a
    ggml : update ggml_backend_cpu_device_supports_op (#10867) b4343 Georgi Gerganov 2024-12-17 18:35:42 +0200
  • 4fbb801a9d
    ggml : update ggml_backend_cpu_device_supports_op gg/cpu-fix-cpy-iq Georgi Gerganov 2024-12-17 18:09:02 +0200
  • fe67caaca5
    docs: update link to ramalama on readme Charlie Drage 2024-12-17 11:07:38 -0500
  • 8cc7145cc7
    ggml : disable tests involving i-matrix quantization Georgi Gerganov 2024-12-17 18:03:47 +0200
  • 05c3a444b8
    server : fill usage info in embeddings and rerank responses (#10852) b4342 krystiancha 2024-12-17 16:00:24 +0000
  • b0597b1493
    ggml : fix cpy op for IQ-quants to use reference impl Georgi Gerganov 2024-12-17 17:54:04 +0200
  • 382bc7f2e8
    llama : add Falcon3 support (#10864) b4341 Billel Mokeddem 2024-12-17 19:24:56 +0400
  • 88cc9719c4
    server : fill usage info in reranking response Krystian Chachuła 2024-12-16 14:45:06 +0100
  • 357a7bac41
    server : fill usage info in embeddings response Krystian Chachuła 2024-12-16 14:42:41 +0100
  • 38725ef6da
    server : add bad input handling in embeddings Krystian Chachuła 2024-12-17 13:04:02 +0100
  • d2b1a41a2c
    Merge 4c7195e839 into 4f51968aca Ilan F. S. Theodoro 2024-12-17 11:16:40 +0100
  • 4f51968aca
    readme : update typos (#10863) Ruan 2024-12-17 17:47:20 +0800
  • d146334c11
    Add Falcon3 model support Billel Mokeddem 2024-12-17 09:46:19 +0000
  • 8f1330666c
    readme : update typos Ruan 2024-12-17 17:29:50 +0800
  • 0463b42cf6
    Merge 63978cb6dc into 227d7c5a7f Zhenwei Jin 2024-12-17 10:26:15 +0100
  • 227d7c5a7f
    server : (UI) fix missing async generator on safari (#10857) Xuan Son Nguyen 2024-12-17 09:52:09 +0100
  • 6ad1f8dae9
    fix Xuan Son Nguyen 2024-12-17 09:45:32 +0100
  • 7b1ec53f56
    vulkan: bugfixes for small subgroup size systems + llvmpipe test (#10809) b4338 Eve 2024-12-17 05:52:55 +0000
  • 5ba1fda5cf
    server : (UI) fix missing async generator on safari Xuan Son Nguyen 2024-12-17 00:38:08 +0100
  • d5f69e8a43
    fixes to position embeddings Sukriti-Sharma4 2024-12-16 15:28:09 -0700
  • 22bea1d791
    vulkan: optimize coopmat2 dequant functions Jeff Bolz 2024-12-07 13:21:10 -0600
  • 7e4d5acfbc
    Merge 74342d48c2 into 160bc039c8 Eugeniusz 2024-12-16 22:27:52 +0100
  • 160bc039c8
    rwkv6: add wkv6 support for Vulkan backend (#10829) b4337 Zhiyuan Li 2024-12-17 05:00:46 +0800
  • d58f8a1b6b
    server : update readme Georgi Gerganov 2024-12-16 21:05:19 +0200