Commit Graph

  • 428e734bb6 1) change c++14 request from global to grpc-server only 2) change proto to llama/v1 dir according to lint suggestion Liu Ming 2023-06-12 09:57:47 +0800
  • 7be3222b64 avoid creating unnecessary grad tensors xaedes 2023-06-12 00:01:18 +0200
  • 59544f0cdf remove unnecessary scratch buffer 0 xaedes 2023-06-11 23:23:06 +0200
  • efd7314d27 scratch buffer bug fixes in forward_batch_wo_cache_flash_attn_train xaedes 2023-06-11 23:10:41 +0200
  • a6812a1dd6 metal : fixed accidentally broken Q2_K Iwan Kawrakow 2023-06-11 23:22:16 +0300
  • aa23744759 metal : Q3_K cleanup Iwan Kawrakow 2023-06-11 22:45:24 +0300
  • eee8b28d36 Merge pull request #20 from SlyEcho/server_refactor Randall Fitzgerald 2023-06-11 15:17:46 -0400
  • fdeb99784a bug fix in forward_batch_wo_cache_flash_attn_train xaedes 2023-06-11 19:58:36 +0200
  • df2c1dc738 metal : Q3_K second optimization pass - 29.6 ms/token Iwan Kawrakow 2023-06-11 19:56:47 +0300
  • 3d5ff127ca metal : Q3_K 1st optimization pass Iwan Kawrakow 2023-06-11 19:25:36 +0300
  • 27a69d6a75 metal : q3_K finally working Iwan Kawrakow 2023-06-11 19:00:57 +0300
  • df57fcb9c8 llama : update internal API declaration Didzis Gosko 2023-06-11 19:00:17 +0300
  • b9a4da3c6f Merge branch 'master' into concedo_experimental Concedo 2023-06-11 23:27:28 +0800
  • c44b9c3ecf added the llama_v2 cuda back (+2 squashed commit) Concedo 2023-06-11 23:18:03 +0800
  • edf6fc252a store view offset like in master branch xaedes 2023-06-11 17:07:44 +0200
  • 982c7cf5cc metal : q3_K still not working Iwan Kawrakow 2023-06-11 12:48:24 +0300
  • 3b4f5e167c metal : optimize Q5_K Iwan Kawrakow 2023-06-10 19:50:55 +0300
  • 3bd1608c08 metal : yet another failed attempt to make q3_K work Iwan Kawrakow 2023-06-10 09:02:17 +0300
  • 66dddda5cb Minor Iwan Kawrakow 2023-06-09 22:16:33 +0300
  • bdf3a66fcd metal : still not able to figure out why q3_K does not work Iwan Kawrakow 2023-06-09 17:04:26 +0300
  • cda2b7b4c2 metal : Q5_K support Iwan Kawrakow 2023-06-09 13:50:05 +0300
  • f5b6ed315e metal : Q3_K support Iwan Kawrakow 2023-06-09 08:35:49 +0300
  • 355e8c6e95 metal : some more optimizations Iwan Kawrakow 2023-06-10 19:05:32 +0300
  • fff0e4f9d8 metal : still optimizing Q4_K Iwan Kawrakow 2023-06-10 15:57:33 +0300
  • 5e2f67fe00 metal : small improvement for Q4_K Iwan Kawrakow 2023-06-10 12:04:13 +0300
  • a75c12932d metal : improve q4_K Iwan Kawrakow 2023-06-10 11:31:12 +0300
  • 7aa10d0518 fix bug in threaded indices calculation of ggml_compute_forward_flash_attn_back_f32 xaedes 2023-06-11 16:50:41 +0200
  • fa84c4b3e8 Fix issue where interactive mode crashes when input exceeds ctx size (#1789) master-fa84c4b Kerfuffle 2023-06-11 08:19:17 -0600
  • 855ede7436 Add a comment clarifying where n_ctx - 4 came from KerfuffleV2 2023-06-11 08:17:25 -0600
  • 6518f9c482 build settings Henri Vasserman 2023-06-11 16:32:53 +0300
  • c2522f005e Fix CI build when changing only the CUDA sources slaren 2023-06-11 15:27:44 +0200
  • 12b063f0ec Fixed WSL cuda's OOM error (#1594) Kyle Liang 2023-06-11 21:20:52 +0800
  • 9612d12fbf big logging update Henri Vasserman 2023-06-11 16:18:39 +0300
  • 2c00bf855d more formatting changes Henri Vasserman 2023-06-11 14:01:42 +0300
  • 31d2b5f4a4 Update SHA256SUMS with current hashes for models quantized using q4_0 (#1798) Ryan Landay 2023-06-11 17:38:53 +0800
  • e829421eda minor : fix compile warnings + minor style changes Georgi Gerganov 2023-06-11 11:49:01 +0300
  • 25b3a83425 Merge branch 'ggerganov:master' into master l3utterfly 2023-06-11 11:30:09 +0800
  • 50b966292c Update SHA256SUMS with current hashes for models quantized using q4_0 Ryan Landay 2023-06-11 10:59:35 +0800
  • 22904afaeb llama : minor cleanup Didzis Gosko 2023-06-11 05:24:17 +0300
  • 0a30fc99fd llama : make model stateless and context stateful Didzis Gosko 2023-06-11 04:56:17 +0300
  • 9a2f20362f make : find include dir for OpenBLAS header file katsu560 2023-06-11 06:15:06 +0900
  • 4de0334f5c cmake : fix Metal build (close #1791) master-4de0334 Georgi Gerganov 2023-06-10 22:56:53 +0300
  • 3f1223155a k-quants : GCC12 compilation fix (#1792) master-3f12231 Artyom Lebedev 2023-06-10 22:51:36 +0300
  • 82645fcd01 [#1783] GCC12 compilation fix. Artyom Lebedev 2023-06-10 17:33:17 +0000
  • f5a790f761 Use n_ctx - 4 for max_embd_size to match existing behavior KerfuffleV2 2023-06-10 09:43:26 -0600
  • e1f0de2fa1 Move add_library(llama, ...) to above first_use Spencer Sutton 2023-06-10 11:17:18 -0400
  • fb67506c1b Merge branch 'master' into concedo_experimental Concedo 2023-06-10 23:04:48 +0800
  • 303f5809f1 metal : fix issue with ggml-metal.metal path. Closes #1769 (#1782) master-303f580 Andrei 2023-06-10 10:47:34 -0400
  • 0c9cd39259 lowered streaming tickrate for greater efficiency Concedo 2023-06-10 22:12:01 +0800
  • 059e99066d doc : fix wrong address of BLIS.md (#1772) Aisuko 2023-06-11 00:08:11 +1000
  • 34ca572e84 Show progress Howard Su 2023-06-10 21:47:07 +0800
  • 910fb8b683 Fix issue where interactive mode crashes when input exceeds ctx size KerfuffleV2 2023-06-10 07:44:16 -0600
  • 921d87cad8 Rebase to latest Howard Su 2023-06-10 21:33:51 +0800
  • b9f74db89e Merge branch 'master' into concedo_experimental Concedo 2023-06-10 21:07:20 +0800
  • fa64971881 encoding Concedo 2023-06-10 21:05:35 +0800
  • 66a3f4e421 added support for lora base Concedo 2023-06-10 19:29:45 +0800
  • 375540837e updated lite Concedo 2023-06-10 19:16:29 +0800
  • a68fcfe738 only start a new thread when using sse Concedo 2023-06-10 19:03:41 +0800
  • 43f7e40470 added extra endpoints for abort gen and polled streaming Concedo 2023-06-10 18:13:26 +0800
  • bac0ddb58f Merge branch 'ggerganov:master' into master Randall Fitzgerald 2023-06-10 06:11:31 -0400
  • 17c10acfb4 ggml : force no_alloc == false when creating opt tensors (close #1699) master-17c10ac Georgi Gerganov 2023-06-10 12:06:45 +0300
  • e9b66ee982 metal : add Q4_1 implementation (#1785) Kawrakow 2023-06-10 11:28:11 +0300
  • 4f0154b0ba llama : support requantizing models instead of only allowing quantization from 16/32bit (#1691) master-4f0154b Kerfuffle 2023-06-10 01:59:17 -0600
  • ef3171d162 ggml : workaround for missing _mm256_setr_m128i in GCC < 8 (#1638) master-ef3171d Xingchen Song(宋星辰) 2023-06-10 15:49:40 +0800
  • 555275a693 make : add SSSE3 compilation use case (#1659) master-555275a rankaiyx 2023-06-10 14:41:59 +0800
  • 4a5ecde3f1 metal : add Q4_1 implementation Iwan Kawrakow 2023-06-10 09:32:59 +0300
  • d6d263fc4f Merge pull request #19 from lesaun/master Randall Fitzgerald 2023-06-09 23:11:02 -0400
  • 917540ce43 Clarify build instructions in README. Lesaun Harvey 2023-06-09 19:06:09 -0700
  • 1a9141b6c3 Remove model assign in main(). Clarified stop in README. Randall Fitzgerald 2023-06-09 16:29:10 -0400
  • 921a8e483b Update flake.nix metal kernel substitution Andrei 2023-06-09 12:58:33 -0400
  • 43fe774573 Add ggml-metal.metal as a resource for llama target Andrei 2023-06-09 12:28:58 -0400
  • 98ed165574 OpenCL: Add release memory (#1741) master-98ed165 Robert Sung-wook Shin 2023-06-10 01:24:40 +0900
  • c6dd99a023 Fix issue with ggml-metal.metal path Andrei 2023-06-09 12:15:59 -0400
  • 5bd9cef9fa merging Proper SSE Token Streaming #220 with end connection fix test Concedo 2023-06-09 23:22:16 +0800
  • 0522e794af Case-insensitive quantize CLI arguments JohannesGaessler 2023-06-09 16:56:30 +0200
  • b92f9fe3a2 Merge remote-tracking branch 'sammcheese/sammcheese/tokenstreaming' into concedo_experimental Concedo 2023-06-09 20:41:02 +0800
  • 507939c135 Merge branch 'master' into concedo_experimental Concedo 2023-06-09 20:20:04 +0800
  • 788784179a Merge branch 'concedo' into concedo_experimental Concedo 2023-06-09 20:19:56 +0800
  • e1ab14c4ab fix format string vulnerability (#223) 12Boti 2023-06-09 14:16:03 +0200
  • ae9663f188 Windows nvcc workaround (#1753) master-ae9663f Johannes Gäßler 2023-06-09 13:58:15 +0200
  • 57b0b53b54 fix kobold lite generation SammCheese 2023-06-09 12:39:35 +0200
  • c99ab9df33 Revert "Squashed commit of the following:" SammCheese 2023-06-09 12:19:08 +0200
  • e6231c3055 back to http.server, improved implementation SammCheese 2023-06-09 12:17:55 +0200
  • d28ed99e59 remove unused declarations Concedo 2023-06-09 18:01:55 +0800
  • 7cdeb08483 More formatting cleanup Randall Fitzgerald 2023-06-09 05:12:16 -0400
  • 889d9044bf Merge branch 'master' of https://github.com/digiwombat/llama.cpp Randall Fitzgerald 2023-06-09 04:57:21 -0400
  • 7580427837 Resolving some review comments Randall Fitzgerald 2023-06-09 04:56:31 -0400
  • 4f665cd63d Squashed commit of the following: SammCheese 2023-06-09 10:55:07 +0200
  • 23a1b1841e Merge branch 'ggerganov:master' into master Randall Fitzgerald 2023-06-09 04:51:20 -0400
  • cc2b33649d Missed a pair of catch statements for formatting. Randall Fitzgerald 2023-06-09 04:50:31 -0400
  • a9c34779f6 Spaces to 4 and other code style cleanup. Notes in README. Randall Fitzgerald 2023-06-09 04:47:18 -0400
  • b33dee282f metal : fix build "tanhf" -> "tanh" Georgi Gerganov 2023-06-09 11:11:04 +0300
  • b617f2847b Merge branch 'master' into concedo_experimental Concedo 2023-06-09 16:10:35 +0800
  • 73cc5b88fb added warning message for unsupported K quants Concedo 2023-06-09 16:09:23 +0800
  • 92f44ff7f7 metal : add GELU implementation (#1770) AT 2023-06-09 04:00:51 -0400
  • 245fc3c37d metal : faster q4_0 (#1775) Kawrakow 2023-06-09 10:39:59 +0300
  • 9b4de68e73 metal : 17% faster Q4_0 Iwan Kawrakow 2023-06-09 10:26:42 +0300
  • 01dc509038 Merge branch 'master' into concedo_experimental Concedo 2023-06-09 14:53:35 +0800
  • 0833845268 merged metal patch directly into the file Concedo 2023-06-09 14:38:31 +0800
  • 090710e485 metal : 8% faster q4_0 Iwan Kawrakow 2023-06-09 09:11:18 +0300