Commit Graph

  • f2478bcab5 fix: Get n_head_kv per-layer in build_bamba Gabe Goodhart 2024-12-09 13:43:27 -0700
  • e7b1abbc0a feat(bamba): Partially complete work on constructing the forward graph Gabe Goodhart 2024-12-05 11:04:54 -0700
  • 41fc019057 fix(bamba): Remove ssm_head_count and ssm_chunk_size in llama.cpp Gabe Goodhart 2024-12-05 11:01:02 -0700
  • dfe8d3ddb8 fix(bamba conv): Remove chunk size and consolidate head count w/ time step rank Gabe Goodhart 2024-12-05 10:59:21 -0700
  • 3ee0ae3b90 feat(bamba): Full tensor parsing for bamba Gabe Goodhart 2024-12-04 12:01:45 -0700
  • fd3bb30118 fix(bamba conv): Fizes in tensor name and hparam conversion for llama.cpp parsing Gabe Goodhart 2024-12-04 12:00:46 -0700
  • e0af809b05 feat(bamba): hparam parsing in llama.cpp Gabe Goodhart 2024-12-03 16:29:32 -0700
  • 1c1e0080ed fix(bamba): Jamba->Bamba in llama.cpp Gabe Goodhart 2024-12-03 16:29:13 -0700
  • fd98682ec3 fix(bamba conv): Jamba -> Bamba Gabe Goodhart 2024-12-03 16:27:29 -0700
  • e3525e9e50 feat(convert): Full pass at hparam conversion Gabe Goodhart 2024-12-02 16:27:19 -0700
  • 246dfdba65 feat(jamba): Add jamba architecture to llama.cpp enums Gabe Goodhart 2024-11-26 14:30:12 -0700
  • 9a68f7537b feat(jamba): First pass at GGUF conversion for Jamba models Gabe Goodhart 2024-11-26 14:29:24 -0700
  • c17956e5e9 contrib : add ngxson as codeowner Xuan Son Nguyen 2024-12-12 20:03:02 +0100
  • 8faa1d4dd4 CUDA: faster non-contiguous concat (#10760) b4315 a3sh 2024-12-13 02:09:50 +0800
  • cb13ef85a4 remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797) b4314 Diego Devesa 2024-12-12 19:02:49 +0100
  • 4064c0e3b6 Vulkan: Use improved q4_k and q5_k dequant code in dequant shaders (#10798) 0cc4m 2024-12-12 18:36:00 +0100
  • dc5301d565 Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats (#10721) b4312 0cc4m 2024-12-12 18:35:37 +0100
  • ddb5a0254f Merge 94d814c559 into 9fdb124304 Isotr0py 2024-12-12 18:35:08 +0100
  • 59940ef310 Documentation: Add ggml_type value choices for KV cache data type in README. MichelleTPY 2024-12-12 17:27:11 +0000
  • 9fdb124304 common : add missing env var for speculative (#10801) b4311 Xuan Son Nguyen 2024-12-12 16:57:32 +0100
  • 2190dd5dec common : add missing env var for speculative Xuan Son Nguyen 2024-12-12 16:40:26 +0100
  • 6b0848ceaf SYCL ggml-sycl: pool2D use sycl::nan and remove if-else block Akarshan Biswas 2024-12-12 19:44:06 +0530
  • d231c1b1c3 remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS other windows build fixes slaren 2024-12-12 14:12:52 +0100
  • dddbdb134f Vulkan: Use improved q4_k and q5_k dequant code in dequant shaders 0cc4m 2024-12-12 13:13:04 +0000
  • ed7f2d5756 set p for sampled token Xuan Son Nguyen 2024-12-12 14:05:38 +0100
  • 22b72c8574 fix test Xuan Son Nguyen 2024-12-12 14:05:29 +0100
  • 396ade0b02 add comment Xuan Son Nguyen 2024-12-12 13:50:42 +0100
  • 29c1495afa sort before apply softmax Xuan Son Nguyen 2024-12-12 13:47:43 +0100
  • cc90cdbc33 return pre-sampling p Xuan Son Nguyen 2024-12-12 13:44:30 +0100
  • 01afafef93 add std::log Xuan Son Nguyen 2024-12-12 11:16:12 +0100
  • 0b4d9ab8f8 nix: allow to override rocm gpu targets Evgeny Kurnevsky 2024-12-12 10:48:19 +0100
  • b828f4aa5f remove prints for CI Abhilash Majumder 2024-12-12 14:57:10 +0530
  • 524acb4279 use sycl printf over fprintf Abhilash Majumder 2024-12-12 14:48:23 +0530
  • 14f64dab74 Merge branch 'ggerganov:master' into cuda-build-doc Yann Follet 2024-12-12 17:15:04 +0800
  • ba661a4df5 add a stdout for unsupported op Abhilash Majumder 2024-12-12 13:36:42 +0530
  • 1c40582610 Also disable coopmats on amdvlk 0cc4m 2024-12-12 07:58:19 +0000
  • 9131c592f7 Fix subgroup size control extension support check 0cc4m 2024-12-12 07:01:07 +0000
  • 5a9d9f7600 Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats 0cc4m 2024-12-08 14:41:40 +0000
  • ffd7c1d04c cleanup spaces Abhilash Majumder 2024-12-12 13:13:15 +0530
  • 46bcfe4c30 Remove TODO Akarshan Biswas 2024-12-12 12:58:11 +0530
  • 90fe556e6d SYCL: remove extra empty lines and a comment Akarshan Biswas 2024-12-12 12:54:36 +0530
  • 8dfac46dad SYCL: Use GGML_UNUSED for unused variables Akarshan Biswas 2024-12-12 12:32:00 +0530
  • 6b6f756bf9 Merge branch 'ggerganov:master' into vulkan_llvmpipe Eve 2024-12-12 03:57:19 +0000
  • 064360aa00 ensure mul mat shaders work on systems with subgroup size less than 32 Eve 2024-12-11 21:50:24 -0500
  • ce70d910d9 docs: update server streaming mode documentation (#9519) CentricStorm 2024-12-11 22:40:40 +0000
  • 3b2601d07c Enable rule key ordering for grammars ParthSareen 2024-12-11 17:20:41 -0800
  • a90d3e05b3 Merge 928aa66a92 into 5555c0c1f6 Brian 2024-12-12 00:17:03 +0100
  • 5555c0c1f6 docs: update server streaming mode documentation (#9519) CentricStorm 2024-12-11 22:40:40 +0000
  • 40c07240cb docs: update server streaming mode documentation CentricStorm 2024-09-17 05:33:44 +0100
  • 973f328b1e Merge pull request #10788 from ggerganov/gg/gguf-py-0.11.0 Georgi Gerganov 2024-12-11 23:14:46 +0200
  • fb18934a97 gguf-py : bump version to 0.11.0 gguf-v0.11.0 gg/gguf-py-0.11.0 Georgi Gerganov 2024-12-11 23:13:31 +0200
  • 235f6e14bf server : (UI) add tok/s, get rid of completion.js (#10786) gguf-py gguf ggu Xuan Son Nguyen 2024-12-11 20:52:14 +0100
  • e4aca8845f fix auto scroll Xuan Son Nguyen 2024-12-11 20:36:45 +0100
  • ab1f7e0326 only extract timings when it's enabled Xuan Son Nguyen 2024-12-11 20:32:24 +0100
  • 10f773415c fix BASE_URL Xuan Son Nguyen 2024-12-11 19:38:10 +0100
  • 4219698eb0 sync Xuan Son Nguyen 2024-12-11 19:31:27 +0100
  • dd09094a22 add tok/s info Xuan Son Nguyen 2024-12-11 19:28:13 +0100
  • 95e294b19d extract chat bubble to a component Xuan Son Nguyen 2024-12-11 18:55:05 +0100
  • bd2f59e50a get rid of completion.js Xuan Son Nguyen 2024-12-11 17:56:43 +0100
  • 1a31d0dc00 Update README.md (#10772) qingy1337 2024-12-11 07:16:32 -0800
  • 92f77a640f ci : pin nodejs to 22.11.0 (#10779) Xuan Son Nguyen 2024-12-11 14:59:41 +0100
  • 484d2f31ae bug-fix: snprintf prints NULL in place of the last character (#10419) b4304 kallewoof 2024-12-11 22:48:04 +0900
  • 7828013689 update docs Xuan Son Nguyen 2024-12-11 14:47:49 +0100
  • 74dc729c0b server : fix logprobs, make it openai-compatible Xuan Son Nguyen 2024-12-11 14:38:57 +0100
  • 4b4d92b098 docs: fix server documentation formatting (#10776) CentricStorm 2024-12-11 10:47:43 +0000
  • 39b4c47b01 SYCL: clean comments and variables step 3 Akarshan Biswas 2024-12-11 16:10:36 +0530
  • 8f123ae71d SYCL: clean comments step 2 Akarshan Biswas 2024-12-11 15:58:40 +0530
  • d7edc55003 Merge branch 'master' into qwen2-vl HimariO 2024-12-11 17:42:21 +0800
  • 53c8765fb2 ci : pin nodejs to 22.11.0 Xuan Son Nguyen 2024-12-11 08:54:56 +0100
  • 7006dd784c server: Propagate standby_timeout after it has been initialized johannes 2024-12-11 08:41:51 +0100
  • 4fd58a8013 server: Initialize standby_timeout over constructor instead of passing as argument johannes 2024-12-11 08:33:24 +0100
  • acbac00f0d server: Return shutdown_handler to its initial state and use running = false for termination johannes 2024-12-11 08:32:12 +0100
  • cb0daca00b SYCL: wkv6 remove a comment Akarshan Biswas 2024-12-11 11:29:04 +0530
  • b0e27ad9ec SYCL gemm.hpp: use const cast to properly support dnnl::memory Akarshan Biswas 2024-12-11 11:27:13 +0530
  • 274842d976 SYCL gemm.hpp: remove pragma directives Akarshan Biswas 2024-12-11 11:11:03 +0530
  • 5a766c12ae Merge branch 'master' into refactor Akarshan Biswas 2024-12-11 11:08:54 +0530
  • cc7cd62ee7 SYCL poo2d kernel: set NAN for invalid pooling op Akarshan Biswas 2024-12-11 11:07:32 +0530
  • 42eec5d60e docs: fix server documentation formatting CentricStorm 2024-12-11 05:04:20 +0000
  • 7dda9aad23 SYCL: remove the unused variables instead of commenting it out Akarshan Biswas 2024-12-11 08:54:19 +0530
  • 4b5470fcdd ggml-sycl.cpp: fix some trailing whitespaces Akarshan Biswas 2024-12-11 08:46:57 +0530
  • 8564c35ac4 Update README.md qingy1337 2024-12-10 17:41:49 -0800
  • 9fdf8ad826 add constexpr and static assert lihan 2024-12-11 09:38:33 +0800
  • 618708c549 Update ggml/src/ggml-cuda/concat.cu Diego Devesa 2024-12-11 02:25:48 +0100
  • f8a5b04441 Use a lambda to avoid code duplication a3sh 2024-12-11 09:14:30 +0800
  • 43041d2eb3 ggml: load all backends from a user-provided search path (#10699) b4302 Gilad S. 2024-12-11 02:47:21 +0200
  • e950fe63ae fix: change NULL to nullptr Gilad S 2024-12-11 02:14:13 +0200
  • c4b78a035e fix: change NULL to nullptr Gilad S. 2024-12-11 02:09:30 +0200
  • 84c0ef9f4b Merge ee1c6a4d89 into b685daf386 Amit Kumar Jha 2024-12-11 01:01:22 +0100
  • 51b9545fcd Removes spurious \r in output that causes logging in journalctl to treat lines as binary and therefore hidden by default Charles Darke 2024-12-10 22:46:43 +0100
  • 4f3a7e279b Force max subgroup size for coopmat shaders 0cc4m/vulkan-subgroup-size-control-amd 0cc4m 2024-12-10 20:27:04 +0000
  • b685daf386 vulkan: request round-to-even for fp16 in im2col/rope_head (#10767) b4301 Jeff Bolz 2024-12-10 14:23:17 -0600
  • 2dc175fb2b Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats 0cc4m 2024-12-08 14:41:40 +0000
  • dafae66cc2 vulkan: dynamic subgroup size for the remaining k quants (#10745) b4300 Eve 2024-12-10 19:33:23 +0000
  • 140ef267b5 vulkan: request round-to-even for fp16 in im2col/rope_head Jeff Bolz 2024-12-10 11:59:24 -0600
  • f5f3fca063 Add chat template Billel Mokeddem 2024-12-10 17:36:39 +0000
  • 5d31c23f5e Use llama vocab Billel Mokeddem 2024-12-10 17:24:57 +0000
  • ae4b922614 imatrix : Add imatrix to --no-context-shift (#10766) b4299 Bartowski 2024-12-10 12:23:50 -0500
  • 750cb3e246 CUDA: rename macros to avoid conflicts with WinAPI (#10736) b4298 Andreas Kieslinger 2024-12-10 18:23:24 +0100
  • a86ad841f1 server : add flag to disable the web-ui (#10762) (#10751) b4297 Yüg 2024-12-10 17:22:34 +0000
  • a05e2afcc2 vulkan: disable spirv-opt for coopmat shaders (#10763) b4296 Jeff Bolz 2024-12-10 11:22:20 -0600