Commit Graph

  • 2948768e25
    common : reimplement the logger Georgi Gerganov 2024-09-10 20:40:43 +0300
  • 0792375c66
    metal : handle zero-sized allocs Georgi Gerganov 2024-09-13 10:21:55 +0300
  • 0abc6a2c25
    llama : llama_perf + option to disable timings during decode (#9355) b3750 Georgi Gerganov 2024-09-13 09:53:38 +0300
  • 19ecca1946
    cmake : use list(APPEND ...) instead of set() + dedup linker Georgi Gerganov 2024-09-13 09:44:55 +0300
  • 3a3c9ae8af Implement OLMoE architecture Shane A 2024-09-12 22:52:49 -0700
  • 739ea75015 made loading message more descriptive VJHack 2024-09-12 23:14:29 -0500
  • df9f16747f removed print statement VJHack 2024-09-12 23:04:53 -0500
  • e51eb59861 revert changes to pre-commit VJHack 2024-09-12 22:27:34 -0500
  • cd80fce5e8 eol fix VJHack 2024-09-12 22:16:45 -0500
  • 69c97bbead
    Merge branch 'ggerganov:master' into master Vinesh Janarthanan 2024-09-12 22:14:53 -0500
  • 42abdd0207 precommit corrections VJHack 2024-09-12 22:04:08 -0500
  • b3b84732f9
    Prevent crash on quantization executable Yuri Khrustalev 2024-09-12 22:46:34 -0400
  • cb13382136 account for both api and web browser requests VJHack 2024-09-12 21:44:52 -0500
  • 7c39f2d3ab ggml: rwkv_wkv op CUDA impl Molly Sophia 2024-09-06 16:33:46 +0800
  • daf64fc4a9 revert test VJHack 2024-09-12 20:57:51 -0500
  • bd35cb0ae3
    feat: remove a sampler from a chain (#9445) b3749 Gilad S. 2024-09-13 04:54:49 +0300
  • 8b7daaaef2 catch 503 before parsing json VJHack 2024-09-12 20:44:45 -0500
  • 7da90fb350
    fix: safer casting Gilad S. 2024-09-13 04:38:55 +0300
  • 0b174abc3d ggml: CUDA unary op EXP Molly Sophia 2024-09-05 18:18:51 +0800
  • 78203641fe
    server : Add option to return token pieces in /tokenize endpoint (#9108) Mathijs Henquet 2024-09-12 22:30:11 +0200
  • 661a740d55 maybe this fix windows ci? Xuan Son Nguyen 2024-09-12 21:58:24 +0200
  • 444b757bce
    perf : abort on invalid sampler pointer Georgi Gerganov 2024-09-12 15:08:48 +0300
  • ad971140c3 Merge branch 'master' into feature/tokenize-with-pieces Xuan Son Nguyen 2024-09-12 13:49:52 +0200
  • e6b7801bd1
    cann: Add host buffer type for Ascend NPU (#9406) b3747 Dou Xinpeng 2024-09-12 19:46:43 +0800
  • e665744317
    llava : fix the script error in MobileVLM README (#9054) fengerhu1 2024-09-12 19:34:22 +0800
  • d4c3c10fad
    lora : raise error if lm_head is ignored (#9103) Xuan Son Nguyen 2024-09-12 13:33:57 +0200
  • 2a825116b6
    cmake : fix for builds without GGML_CDEF_PUBLIC (#9338) b3744 Michael Podvitskiy 2024-09-12 13:30:01 +0200
  • 4dc4f5f14a
    ci : update HIP SDK to 24.Q3 (ROCm 6.1) (#9329) b3743 Huang Qi 2024-09-12 19:28:43 +0800
  • c837981bba
    py : add Phi-1.5/Phi-2 tokenizer (#9361) daminho 2024-09-12 20:28:20 +0900
  • 3c26a1644d
    ci : bump actions/checkout to v4 (#9377) Trivikram Kamat 2024-09-12 04:27:45 -0700
  • ff76e18516
    cmake : fixed the order of linking libraries for llama-quantize (#9450) b3740 Michael Podvitskiy 2024-09-12 13:27:14 +0200
  • 39f852f440
    py : add special tokens in hf_converter for RWKV v6 (#9428) Molly Sophia 2024-09-12 19:25:16 +0800
  • 2b00fa7997
    riscv : modify Makefile and add a RISCV_VECT to print log info (#9442) b3738 Ahmad Tameem 2024-09-12 16:24:31 +0500
  • d6a04f872d
    ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408) b3737 Georgi Gerganov 2024-09-12 14:23:49 +0300
  • decff4309e fixed the order of linking libraries for llama-quantize Michael Podvitskiy 2024-09-12 11:56:52 +0200
  • c9c8575a1a
    enhance run script to be easy to change the parameters (#9448) Neo Zhang Jianyu 2024-09-12 17:44:17 +0800
  • b8c9b8e468
    ggml : add assert upon adding nodes Georgi Gerganov 2024-09-12 09:31:39 +0300
  • f35e9b87cd
    minor : better local var name Georgi Gerganov 2024-09-12 09:23:50 +0300
  • 44f0218532
    Merge branch 'master' into gg/llama-perf Georgi Gerganov 2024-09-12 09:21:42 +0300
  • 7362f28833
    perf : safer pointer handling + naming update Georgi Gerganov 2024-09-12 09:19:41 +0300
  • 9f330e7273 Add a few comments dou 2024-09-12 12:21:41 +0800
  • d9132f7846
    Merge branch 'ggerganov:master' into fix Xinpeng Dou 2024-09-12 12:18:39 +0800
  • 161bf2205d updated server test to handle 503 HTML VJHack 2024-09-11 23:11:03 -0500
  • 5ca179c899 updated server test to handle 503 HTML VJHack 2024-09-11 22:54:52 -0500
  • 7a6f9e0be2 enhance run script to be easy to change the parameters arthw 2024-09-12 10:43:39 +0800
  • df4b7945ae
    cann: Fix error when running a non-exist op (#9424) b3735 Xinpeng Dou 2024-09-12 09:02:35 +0800
  • 4410aa09fb
    fix: return removed sampler Gilad S. 2024-09-12 03:57:46 +0300
  • 449ccfb6f5
    Add Jais to list of supported models (#9439) Faisal Zaghloul 2024-09-11 20:29:53 -0400
  • 4a2e5e0dc5
    feat: remove a sampler from a chain Gilad S. 2024-09-12 01:46:29 +0300
  • d7c042d1ae
    ggml : make n_threads_cur atomic_int gg/ggml-atomic-int Georgi Gerganov 2024-09-11 21:12:11 +0300
  • 71cf0e1c0f remove prints from the low-level code Charles Xu 2024-09-11 19:10:30 +0200
  • e5701063f8 Modify Makefile for RISCV and add RISCV-V print System Info - Added ggml_cpu_has_riscv_v() in GGML to print system info in log - Modified Makefile to only use flag when cross compiling for RISC-V AhmadTameem 2024-09-11 22:09:39 +0500
  • 7c5c9d7713 Add Jais to list of supported models fmz 2024-09-11 09:50:42 -0700
  • 1b28061400
    llama : skip token bounds check when evaluating embeddings (#9437) b3733 slaren 2024-09-11 17:52:13 +0200
  • a74c029f76 llama : skip token bounds check when evaluating embeddings slaren 2024-09-11 17:28:35 +0200
  • 3dbd2eeb34 llama : return enum for llama_decode and llama_encode Xuan Son Nguyen 2024-09-11 15:37:22 +0200
  • 8db003a19d
    py : support converting local models (#7547) Pavel Zloi 2024-09-11 15:29:51 +0300
  • 98038bdc8a shutil added to imports pasha 2024-09-11 14:16:32 +0300
  • 0996c5597f
    llava : correct args for minicpmv-cli (#9429) b3731 Xuan Son Nguyen 2024-09-11 12:59:13 +0200
  • 810dc7d034 Merge conflict solved pasha 2024-09-11 13:35:31 +0300
  • f9968f661d
    ggml : update comments [no ci] gg/ggml-rework-cgraph Georgi Gerganov 2024-09-11 13:16:39 +0300
  • 119e0bc9ae
    ggml : remove ggml_cplan + rework ggml_cgraph Georgi Gerganov 2024-09-11 13:05:10 +0300
  • ee154457dd
    ggml : fix compiler warnings Georgi Gerganov 2024-09-11 13:03:18 +0300
  • 5bb2c5dbd2
    files : remove accidentally added lora_test submodule (#9430) Xuan Son Nguyen 2024-09-11 12:02:09 +0200
  • 443219e9c7 remove accidentally commited lora_test submodule Xuan Son Nguyen 2024-09-11 11:25:46 +0200
  • 67155ab7f5
    feat: Implements retrying logic for downloading models using --model-url flag (#9255) b3729 Farbod Bijary 2024-09-11 12:52:37 +0330
  • 29717e8d8f llava : correct args for minicpmv-cli Xuan Son Nguyen 2024-09-11 11:21:18 +0200
  • 691b0f94fa llama: Add special tokens in hf_converter for RWKV v6 Molly Sophia 2024-09-11 16:31:30 +0800
  • 5af118efda
    CUDA: fix --split-mode row race condition (#9413) b3728 Johannes Gäßler 2024-09-11 10:22:40 +0200
  • 92a96865cd
    ggml : add ggml-impl.h to backends Georgi Gerganov 2024-09-11 10:07:21 +0300
  • d2b496bff4
    batched-bench : remove unused code (#9305) b3727 Georgi Gerganov 2024-09-11 10:03:54 +0300
  • f42de2426e
    perf : separate functions in the API Georgi Gerganov 2024-09-11 09:56:41 +0300
  • fb24e846a9
    Merge branch 'ggerganov:master' into npu01 Xinpeng Dou 2024-09-11 14:36:05 +0800
  • 03b52bf908
    Merge branch 'ggerganov:master' into fix Xinpeng Dou 2024-09-11 14:33:08 +0800
  • 6e1aeaf7ab fix: Error when running a non-exist op for Ascend NPU(#9303) dou 2024-09-11 11:30:24 +0800
  • 9d3424a3c1 cleanup for PR removed error VJHack 2024-09-10 22:26:58 -0500
  • 1ff1aa722a cleaned up whitespace VJHack 2024-09-10 22:25:32 -0500
  • 3dd73ca662 updated makefile VJHack 2024-09-10 22:24:39 -0500
  • 125737a255 updated cmakelist VJHack 2024-09-10 22:23:57 -0500
  • 19bc86307f removed loading html file VJHack 2024-09-10 22:23:09 -0500
  • dab4b49f04 set content when model is loading VJHack 2024-09-10 22:20:11 -0500
  • cd99605276 fix some checking errors dou 2024-09-11 10:45:02 +0800
  • cf589c60d8
    Merge 6e075c8849 into b34e023480 Daniel Bevenius 2024-09-10 19:37:38 -0700
  • b34e023480
    musa: remove Clang builtins mapping (#9421) b3726 R0CKSTAR 2024-09-11 09:46:55 +0800
  • d635c75b85
    Merge branch 'ggerganov:master' into avx_optimizations Eve 2024-09-11 01:40:58 +0000
  • a753b25933 remove f16c iq4_nl as i cant make it faster than before Eve 2024-09-10 21:31:09 -0400
  • a201c6b5f7 shuffle Eve 2024-09-10 21:01:57 -0400
  • 51b6038636
    sycl : update support conditions (#9394) b3725 Alberto Cabrera Pérez 2024-09-11 01:53:42 +0100
  • 2963165ced musa: remove Clang builtins mapping Xiaodong Ye 2024-09-11 08:38:29 +0800
  • cb9c933eb2
    flake.lock: Update (#9360) Georgi Gerganov 2024-09-11 01:46:59 +0300
  • f4c67610eb Merge branch 'master' into add-retry-to-model-download Xuan Son Nguyen 2024-09-10 22:46:30 +0200
  • 7272c6fb99 change function name Xuan Son Nguyen 2024-09-10 22:46:20 +0200
  • 6cd4e03444
    arg : bring back missing ifdef (#9411) b3723 Xuan Son Nguyen 2024-09-10 22:41:29 +0200
  • 8d300bd35f
    enable --special arg for llama-server (#9419) b3722 matteo 2024-09-10 22:40:59 +0200
  • 93d2e28b8b
    Merge f286589a32 into 49006c67b4 Olivier Chafik 2024-09-11 02:32:44 +0800
  • 24d903bef7 enable --special arg for llama-server matteo serva 2024-09-10 20:05:38 +0200
  • 7e84f921c5 CUDA: fix --split-mode row race condition Johannes Gäßler 2024-09-10 18:10:40 +0200
  • 2d79a7077c quantize : use unused imatrix chunk_size with LLAMA_TRACE compilade/imatrix-batched-chunks Francis Couture-Harpin 2024-09-10 12:09:17 -0400
  • 49006c67b4
    llama : move random seed generation to the samplers (#9398) b3721 slaren 2024-09-10 18:04:25 +0200
  • 8c13e16bb0 imatrix : allow loading mis-ordered tensors Francis Couture-Harpin 2024-09-10 11:31:49 -0400