Commit Graph

  • 7f7e684c5e CUDA: fix sum.cu compilation for CUDA < 11.7 Johannes Gäßler 2024-09-20 09:57:01 +0200
  • e9d8ebaa2c examples : flush log upon ctrl+c Georgi Gerganov 2024-09-20 10:06:38 +0300
  • d949c5844d refactor tokenizer zhenweijin 2024-09-11 09:42:55 +0800
  • 722ec1eb51 perplexity : do not escape input data by default (#9548) b3788 Sigbjørn Skjæret 2024-09-20 08:38:10 +0200
  • f557ccfd2c update oneapi to 2024.2 arthw 2024-09-20 10:47:38 +0800
  • 0940460774 baby-llama : rename llama_layer to baby_llama_layer Daniel Bevenius 2024-09-20 04:41:29 +0200
  • 34857422b1 Merge a0bd8f0343 into 6026da52d6 Michael Podvitskiy 2024-09-19 20:26:09 -0500
  • 26aac8e289 Soften the token embeddings bump for experts >= 4 Nexesenex 2024-08-25 14:42:33 +0200
  • 5644d4ca01 Merge branch 'master' into pr/8836 Nexesenex 2024-09-20 01:38:20 +0200
  • 19ea00858e Merge 4e23f8a81b into 6026da52d6 Chad Brewbaker 2024-09-19 12:57:25 -0400
  • 5f95dccea8 server : add rerank endpoint gg/rerank Georgi Gerganov 2024-09-19 16:18:30 +0300
  • c6b3ea6595 Avoid using saved CUDA graph if scale changes and reset nodes/params on update Alan Gray 2024-09-17 08:47:06 -0700
  • 333a84ada5 Merge db4939040f into 6026da52d6 Alexey Parfenov 2024-09-19 15:26:46 +0300
  • f03bcd84e7 llama : add "rank" pooling type Georgi Gerganov 2024-09-19 13:21:15 +0300
  • b276c09b6f Merge 6f9d1275a0 into 6026da52d6 Bruno Pio 2024-09-19 10:45:01 +0100
  • 6026da52d6 server : clean-up completed tasks from waiting list (#9531) b3787 Georgi Gerganov 2024-09-19 12:44:53 +0300
  • da00027c4b Perplexity input data should not be unescaped Sigbjørn Skjæret 2024-09-19 10:50:19 +0200
  • eca0fab44e imatrix : disable prompt escape by default (#9543) b3786 Sigbjørn Skjæret 2024-09-19 09:58:14 +0200
  • 5e1a23adb0 fix function params Jia Liu 2024-09-19 15:46:17 +0800
  • ff231de553 llama-bench : add time-to-first-byte stat gg/ttfb Georgi Gerganov 2024-09-19 09:15:29 +0300
  • 216e7d9648 fix llama_reset_model_time Jia Liu 2024-09-19 11:30:47 +0800
  • 61b155b366 Merge d32c74d1f2 into 64c6af3195 Brian 2024-09-19 11:27:51 +0800
  • 24bea1549b add llama_model_reset_time API Jia Liu 2024-09-19 11:06:47 +0800
  • 568886416d allow disable context shift for server VJHack 2024-09-18 19:34:05 -0500
  • a537aaa87b Imatrix input data should not be unescaped Sigbjørn Skjæret 2024-09-19 01:26:30 +0200
  • 6f9d1275a0 Update convert_hf_to_gguf.py Bruno Pio 2024-09-18 20:00:25 -0300
  • c42ec2f8bb add solar pro support Michael Yang 2024-09-16 15:53:16 -0700
  • f3bd9f3aab Merge 60e6e2af36 into 64c6af3195 Justine Tunney 2024-09-18 20:52:06 +0200
  • 152e90331e llama : add classification head (wip) [no ci] Georgi Gerganov 2024-09-18 21:20:21 +0300
  • 64c6af3195 ggml : fix n_threads_cur initialization with one thread (#9538) b3785 slaren 2024-09-18 19:13:08 +0200
  • 64975aa250 Merge a7f5c74795 into 0d2f22e45c JohnnyB 2024-09-18 18:35:19 +0200
  • 6b0248c29a Update ggml/src/ggml.c sl/fix-omp-one-thread Max Krasnyansky 2024-09-18 09:00:26 -0700
  • 0d2f22e45c scripts : verify py deps at the start of compare (#9520) Georgi Gerganov 2024-09-18 18:34:32 +0300
  • 56be864758 Merge faaac59d16 into 6443ddd985 compilade 2024-09-18 16:46:25 +0200
  • f9196c9174 ggml : fix n_threads_cur initialization with one thread slaren 2024-09-18 14:58:49 +0200
  • 6443ddd985 llama : use reserve/emplace_back in sampler_sample (#9534) b3783 Daniel Bevenius 2024-09-18 13:42:36 +0200
  • e08b907760 Update clip.cpp Tejaakshaykumar 2024-09-18 15:57:22 +0530
  • abeceace88 Merge a7618821d5 into 8a308354f6 Brian 2024-09-18 11:49:03 +0200
  • 87c5161b0c llama : use reserve/emplace_back in sampler_sample Daniel Bevenius 2024-09-18 11:42:07 +0200
  • a829583c97 AVX512 version of ggml_gemm_q4_0_8x8_q8_0 Srihari-mcw 2024-09-18 00:55:41 -0700
  • c90a43a237 minor change pr_add_intel_amx_support mingfeima 2024-09-18 00:31:08 -0700
  • e01cdda168 server : clean-up completed tasks from waiting list Georgi Gerganov 2024-09-18 10:20:41 +0300
  • 38b955cf36 update CMakeLists.txt mingfeima 2024-09-18 00:12:02 -0700
  • 8a308354f6 server : match OAI structured output response (#9527) b3782 Vinesh Janarthanan 2024-09-18 01:50:34 -0500
  • f799155ab8 server : fix OpenSSL build (remove obsolete LOG_INFO) (#9529) b3781 Eric Zhang 2024-09-18 14:28:20 +0800
  • 9504b9f9ae minor change mingfeima 2024-09-17 23:20:01 -0700
  • 47b1a743b8 update header file, the old path for immintrin.h has changed to ggml-cpu-impl.h mingfeima 2024-09-17 23:10:58 -0700
  • 5107df7671 add amx as a ggml-backend mingfeima 2024-09-17 22:56:03 -0700
  • 7921032af5 update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16 mingfeima 2024-08-14 19:56:41 -0700
  • fc236ac85d move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp mingfeima 2024-08-13 22:06:53 -0700
  • e38be3dac1 minor change mingfeima 2024-08-12 00:38:26 -0700
  • 82a1ab80f6 fix compiler warning when amx is not enabled mingfeima 2024-08-12 00:11:02 -0700
  • ac00d707d5 fix some compilation warning mingfeima 2024-08-08 23:33:32 -0700
  • c6cc006a1d update README mingfeima 2024-07-24 01:17:37 -0700
  • daefbe463b update CMakeList mingfeima 2024-07-24 01:16:06 -0700
  • 60249b52f1 add amx kernel for gemm mingfeima 2024-04-06 19:57:25 -0700
  • 1a88cff6ac server : fix openssl build by removing invalid LOG_INFO references EZForever 2024-09-18 10:52:13 +0800
  • 556d4b6292 cleaned up pr VJHack 2024-09-17 21:13:46 -0500
  • dc288da026 Merge 74342d48c2 into faf67b3de4 Eugeniusz 2024-09-18 10:12:50 +0800
  • faf67b3de4 [SYCL] set context default value to avoid memory issue, update guide (#9476) Neo Zhang Jianyu 2024-09-18 08:30:31 +0800
  • 37704e04cc Merge 61221221d7 into 7be099fa81 Herman Semenoff 2024-09-17 20:20:06 -0400
  • 1fd5de37c8 Merge 853dbf17cd into 7be099fa81 JohnnyB 2024-09-18 02:06:35 +0200
  • 7be099fa81 llama-bench: correct argument parsing error message (#9524) b3779 Michael Podvitskiy 2024-09-17 22:41:38 +0200
  • 95ce058c2b llama: propagating the results of graph_compute to the user interface Michael Podvitskiy 2024-09-17 21:43:01 +0200
  • 7e7f8b91d6 llama-bench: correct argument parsing error message Michael Podvitskiy 2024-09-17 20:30:01 +0200
  • 4a2e37e443 Merge 924c832461 into 8b836ae731 fedric95 2024-09-17 22:46:41 +0800
  • 00f40ae0ef llama : read new cls tensors [no ci] Georgi Gerganov 2024-09-17 16:38:38 +0300
  • 8b836ae731 arg : add env variable for parallel (#9513) b3778 Bert Wagner 2024-09-17 09:35:38 -0400
  • 2615459bb2 feat(granitemoe): Implement granitemoe Gabe Goodhart 2024-09-10 16:35:14 -0600
  • 54ce8cd5d6 fix(granitemoe convert): Split the double-sized input layer into gate and up Gabe Goodhart 2024-09-11 10:03:43 -0600
  • 178821231e feat(convert_hf_to_gguf): Add GraniteMoeModel Gabe Goodhart 2024-09-10 14:48:30 -0600
  • 70f19efc40 feat(gguf-py): Add granitemoe architecture Gabe Goodhart 2024-09-10 14:45:51 -0600
  • a5307f5acf py : fix position embeddings chop [no ci] Georgi Gerganov 2024-09-17 13:53:19 +0300
  • cbef812dcd Update README.md with env: LLAMA_ARG_N_PARALLEL Bert Wagner 2024-09-17 06:50:54 -0400
  • fbbb64fffe py : fix scalar-tensor conversion [no ci] Georgi Gerganov 2024-09-17 13:40:52 +0300
  • 8344ef58f8 llama : fix n_vocab init for 'no_vocab' case (#9511) b3777 Michael Podvitskiy 2024-09-17 12:18:22 +0200
  • cba0340871 Refactored error handling for hyperparameter validation in clip.cpp Tejaakshaykumar 2024-09-17 15:46:59 +0530
  • 28d1c4566a Update examples/llava/clip.cpp Tejaakshaykumar 2024-09-17 15:13:48 +0530
  • 93ef595b4b llama: correct vocab size for logging Michael Podvitskiy 2024-09-17 11:23:52 +0200
  • aab436c58d ggml: Added run-time detection of neon, i8mm and sve Dan Johansson 2024-08-08 13:52:59 +0200
  • a6a8f8d09c Update docs/backend/SYCL.md fix_ctx_default Neo Zhang Jianyu 2024-09-17 16:25:43 +0800
  • d15e19dd74 Merge a6821563e8 into 0226613853 Nick Crews 2024-09-17 15:24:55 +0700
  • 0226613853 threadpool : skip polling for unused threads (#9461) Max Krasnyansky 2024-09-17 01:19:46 -0700
  • cbfa2fcbdc scripts : verify py deps at the start of compare Georgi Gerganov 2024-09-17 11:05:10 +0300
  • 503147a9f9 unicode : add <algorithm> (#9508) b3775 Yuri Khrustalev 2024-09-17 02:51:15 -0400
  • 0d2ec43833 llama : support IBM Granite architecture (#9412) b3774 Gabe Goodhart 2024-09-17 00:44:58 -0600
  • 37f3a3810e llama : add llama_n_head() (#9512) Michael Podvitskiy 2024-09-17 08:23:30 +0200
  • d2c0111da1 docs: add server utils.hpp comment CentricStorm 2024-09-17 05:50:02 +0100
  • c58105120e Merge 3277bb88e5 into 23e0d70bac hackingthekernel 2024-09-17 07:42:59 +0300
  • c23bfc5000 docs: update server streaming mode documentation CentricStorm 2024-09-17 05:33:44 +0100
  • eb550592e4 test-barrier: release threadpool before releasing the context Max Krasnyansky 2024-09-16 16:41:41 -0700
  • a8095187d8 threadpool: improve abort handling Max Krasnyansky 2024-09-16 15:25:20 -0700
  • b9763b3301 threadpool: improve thread sync for new-graphs Max Krasnyansky 2024-09-16 14:35:09 -0700
  • 695c4483a0 wip sl/test-backend-ops-perf-flops slaren 2024-09-16 23:28:36 +0200
  • e83d2707d3 convert : adapt MiniCPM3 to separate rope_freqs insertion Francis Couture-Harpin 2024-09-16 12:05:29 -0400
  • a25f838e53 add env variable for parallel Bert Wagner 2024-09-16 15:00:49 -0400
  • c4411d5b5f threads: add simple barrier test Max Krasnyansky 2024-09-15 11:58:21 -0700
  • ed094a5211 threadpool: further simplify and improve ggml_barrier Max Krasnyansky 2024-09-13 11:21:59 -0700
  • 2bd9f47800 threadpool: skip polling for unused threads Max Krasnyansky 2024-09-12 21:28:45 -0700
  • 9704f0e928 llama: log warning if there's no vocab_size in metadata Michael Podvitskiy 2024-09-16 19:30:07 +0200