Commit Graph

  • 358dcf0934 CUDA: mul_mat_vec_q kernels for k-quants JohannesGaessler 2023-07-11 20:58:44 +0200
  • 3a13d1e829 Apply formatting. Henri Vasserman 2023-07-13 20:24:31 +0300
  • 32c5411631 Revert "Support using mmap when applying LoRA (#2095)" (#2206) master-32c5411 Howard Su 2023-07-13 21:58:25 +0800
  • ff5d58faec Fix compile error on Windows CUDA (#2207) master-ff5d58f Howard Su 2023-07-13 21:58:09 +0800
  • b782422a3e devops : add missing quotes to bash script (#2193) Bodo Graumann 2023-07-13 15:49:14 +0200
  • 495245bc32 examples: fixed path typos in embd-input Shangning Xu 2023-07-13 21:05:51 +0800
  • 2ec4466db5 Update build flags. Henri Vasserman 2023-07-13 13:44:02 +0300
  • cd36b185ff Merge 'origin/master' into hipblas Henri Vasserman 2023-07-13 13:03:01 +0300
  • 08812fe2e6 Oops, forgot to delete the original Q4_1 kernel Iwan Kawrakow 2023-07-13 11:33:34 +0200
  • 0f7967089f 7-25% faster Q4_1 on Metal Iwan Kawrakow 2023-07-13 10:51:45 +0200
  • 585ac35b42 3-5% faster Q4_0 on Metal Iwan Kawrakow 2023-07-13 10:32:19 +0200
  • 6fb06123dc Enable LLAMA_METAL and LLAMA_MPI in Makefile James Reynolds 2023-07-12 19:41:46 -0600
  • cac6746cc6 Fix compile error on Windows CUDA Howard Su 2023-07-13 08:19:37 +0800
  • 183de43647 Revert "Support using mmap when applying LoRA (#2095)" Howard Su 2023-07-13 07:52:29 +0800
  • 7e01bc5ec6 Use loader field for clarity. Spencer Sutton 2023-07-12 18:33:59 -0400
  • 421cc6cc01 Change comment Spencer Sutton 2023-07-12 18:26:06 -0400
  • c14cde156e Rename parameter Spencer Sutton 2023-07-12 18:07:33 -0400
  • f6c4e8dd6a Set different mmap flags for lora/non-lora Spencer Sutton 2023-07-12 17:56:54 -0400
  • b3c1434d2e Merge branch 'ggerganov:master' into master m3ndax 2023-07-12 22:12:18 +0200
  • 48e3e99ea0 fix codre readability mendax0110 2023-07-12 22:11:24 +0200
  • 1cbf561466 metal : new q4_0 matrix-vector kernel (#2188) Shouzheng Liu 2023-07-12 16:10:55 -0400
  • 5150582bb3 metal: New q4_0 matrix-vector kernel lshzh-ww 2023-07-12 14:43:10 -0400
  • 975221e954 ggml : broadcast mul_mat + conv batch support (#2199) Georgi Gerganov 2023-07-12 20:51:29 +0300
  • 521bc7b4c5 ggml : apply mul_mat broadcast fix by @jploski Georgi Gerganov 2023-07-12 20:49:54 +0300
  • 2e3326a939 ggml : broadcast mul_mat + conv batch support Georgi Gerganov 2023-07-12 20:41:05 +0300
  • 4523d10d0c ggml : add ggml_pool_1d and ggml_pool_2d Georgi Gerganov 2023-07-12 20:27:03 +0300
  • 680e6f9177 cuda : add gelu support Georgi Gerganov 2023-07-12 20:26:18 +0300
  • 64e4602463 ggml : add ggml_pool_1d and ggml_pool_2d Georgi Gerganov 2023-07-12 20:27:03 +0300
  • f3b5b4f9f2 cuda : add gelu support Georgi Gerganov 2023-07-12 20:26:18 +0300
  • 1a1c6d9c2b Add functions that works directly with model Bach Le 2023-07-12 23:29:13 +0800
  • 4d3ce352eb Remove vocab reference from context Bach Le 2023-07-12 23:09:58 +0800
  • b723fe7028 Track and free temporary ggml_tensor_extra_gpu struct created during eval Bach Le 2023-07-12 22:59:04 +0800
  • 2cca222a54 Add missing quotes to bash script Bodo Graumann 2023-07-12 16:45:08 +0200
  • 4e7464ef88 FP16 is supported in CM=6.0 (#2177) master-4e7464e Howard Su 2023-07-12 20:18:40 +0800
  • a53a59afe3 Support broadcast add & mul on CUDA (fixed) lijiahao 2023-07-12 18:21:31 +0800
  • a95e105acd building PTX code for both of 60 and 61 Howard Su 2023-07-12 16:41:56 +0800
  • 2b5eb72e10 Fixed __dp4a compute capability: 6.0 -> 6.1 (#2189) master-2b5eb72 Johannes Gäßler 2023-07-12 10:38:52 +0200
  • f7d278faf3 ggml : revert CUDA broadcast changes from #2183 (#2191) master-f7d278f Georgi Gerganov 2023-07-12 10:54:19 +0300
  • 95fff695cb ggml : revert CUDA broadcast changes from #2183 Georgi Gerganov 2023-07-12 10:49:27 +0300
  • dfdadc0dd9 Hotfix for the prompt being ignored with CUDA JohannesGaessler 2023-07-12 09:36:39 +0200
  • 69391e09fa Fixed __dp4a compute capability: 6.0 -> 6.1 JohannesGaessler 2023-07-12 09:13:00 +0200
  • 5941514e95 Merge commit '5bf2a2771886ee86137e01dbc7492f78fb392066' into concedo_experimental Concedo 2023-07-12 13:05:16 +0800
  • 8f4ed0d18c fixed cmake, 8bit MMV should be working now Concedo 2023-07-12 11:22:55 +0800
  • 1c3ab205d0 add top-k to web ui Yazan Agha-Schrader 2023-07-12 05:10:22 +0200
  • 7516488550 fix compilation (#313) Sammy 2023-07-12 04:44:56 +0200
  • b2e071dd86 Merge remote-tracking branch 'upstream/master' into grammar Evan Jones 2023-07-11 21:51:50 -0400
  • 20d7740a9b ggml : sync (abort callback, mul / add broadcast, fix alibi) (#2183) master-20d7740 Georgi Gerganov 2023-07-11 22:53:34 +0300
  • f43d6c7c46 ggml : sync (abort callback, mul / add broadcast, fix alibi) Georgi Gerganov 2023-07-11 22:22:19 +0300
  • 5bf2a27718 ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178) master-5bf2a27 Spencer Sutton 2023-07-11 12:31:10 -0400
  • e902c49d24 mpi : adapt to new ggml_tensor->src Georgi Gerganov 2023-07-11 19:24:08 +0300
  • c9c74b4e3f llama : add classifier-free guidance (#2135) master-c9c74b4 Bach Le 2023-07-12 00:18:43 +0800
  • 3ec7e596b2 docker : add '--server' option (#2174) Jinwoo Jeong 2023-07-12 01:12:35 +0900
  • 917831c63a readme : fix zig build instructions (#2171) Chad Brewbaker 2023-07-11 11:03:06 -0500
  • 7c0fbc2f12 Update train-text-from-scratch for change Spencer Sutton 2023-07-11 11:19:29 -0400
  • e7251ab827 Add ggml changes Spencer Sutton 2023-07-11 11:18:33 -0400
  • afcb8fe0c4 Add new config option Henri Vasserman 2023-07-11 18:09:27 +0300
  • 8c2c4978a3 Merge 'origin/master' into hipblas Henri Vasserman 2023-07-11 17:53:54 +0300
  • e610466307 Expand arch list and make it overrideable Henri Vasserman 2023-07-11 17:53:14 +0300
  • 2347463201 Support using mmap when applying LoRA (#2095) master-2347463 Howard Su 2023-07-11 22:37:01 +0800
  • bbef28218f Possible solution to allow K-quants on models with n_vocab!=32000 (#2148) master-bbef282 LostRuins 2023-07-11 22:01:08 +0800
  • a286776435 updated lite Concedo 2023-07-11 21:48:01 +0800
  • 1d1111e10f expose timing info in web api Concedo 2023-07-11 18:56:06 +0800
  • 7222877069 Merge remote-tracking branch 'ren/concedo' into concedo_experimental Concedo 2023-07-11 18:45:36 +0800
  • 5ca204d527 Merge remote-tracking branch 'yellowrose/pr/open/LostRuins/koboldcpp/multigpu-cuda-gui' into concedo_experimental Concedo 2023-07-11 18:22:54 +0800
  • 4be167915a added linear rope option, added warning for bad samplers Concedo 2023-07-11 18:08:19 +0800
  • 397da62002 FP16 is supported in CM=6.0 Howard Su 2023-07-11 17:43:10 +0800
  • 2ab2da2eb4 Update comment to reflect the support lora with mmap Howard Su 2023-07-05 14:21:31 +0800
  • 1d4b687ee6 Fix Linux Howard Su 2023-07-04 18:16:04 +0800
  • d4e58cbf94 Support using mmap when applying LoRA Howard Su 2023-07-04 16:05:26 +0800
  • b0b131499f Merge branch 'master' into concedo_experimental Concedo 2023-07-11 16:12:15 +0800
  • 694fce3a0b Add '--server' option to run './server' script Jinwoo Jeong 2023-07-11 14:19:15 +0900
  • 014fbfd4a9 add unicode escapes Evan Jones 2023-07-10 23:26:09 -0400
  • b9fa12d360 fix zig build readme Chad Brewbaker 2023-07-10 22:05:07 -0500
  • 2777168618 Porting MPI PR to Darwin OpenMPI Chad Brewbaker 2023-07-10 17:49:14 -0500
  • 45e5df66da XgenVocab fix from @smdesai Aman Karmani 2023-07-10 11:06:05 -0700
  • abf164d71e Fix styling based on review Bach Le 2023-07-10 23:50:17 +0800
  • 5656d10599 mpi : add support for distributed inference via MPI (#2099) master-5656d10 Evan Miller 2023-07-10 11:49:56 -0400
  • eaef2d0e76 mpi : extend API to allow usage with outer backends (e.g. Metal) Georgi Gerganov 2023-07-10 18:47:24 +0300
  • c3c3ef11a6 mpi : factor out recv / send in functions and reuse Georgi Gerganov 2023-07-10 18:35:38 +0300
  • 11ebfea8c0 Merge branch 'kquant_vocab_fix' into concedo_experimental Concedo 2023-07-10 23:28:48 +0800
  • fd9a2fdfe2 As an alternative, to avoid failing on Metal due to lack of Q8_0 support, instead quantize tok_embeddings.weight to Q4_0 and retain output.weight as F16. This results in a net gain of about 55mb for a 7B model compared to previous approach, but should minimize adverse impact to model quality. Concedo 2023-07-10 23:22:45 +0800
  • 048dca9809 Fix indentation LostRuins 2023-07-10 22:57:15 +0800
  • 9324cb804a reimplemented save and load Concedo 2023-07-10 22:49:27 +0800
  • 50097e6c7f Merge branch 'master' into concedo_experimental Concedo 2023-07-10 20:08:27 +0800
  • 523fc3be52 fixed rwkv, standardized new ctx usage Concedo 2023-07-10 20:05:53 +0800
  • 2827920044 fix compile errors, rwkv not working Concedo 2023-07-10 18:23:25 +0800
  • f1014f3cc7 remove unused .re YellowRoseCx 2023-07-10 00:26:40 -0500
  • 80e4e548bf Merge 'origin/master' into hipblas Henri Vasserman 2023-07-10 02:09:28 +0300
  • 242f01e983 Add Multi-GPU CuBLAS support in the new GUI YellowRoseCx 2023-07-09 17:10:14 -0500
  • ada1a2aa8b [mpi] use MPI_INT32_T Evan Miller 2023-07-09 15:37:33 -0400
  • b18e4ad4ac Merge branch 'mpi' of github.com:evanmiller/llama.cpp into mpi Evan Miller 2023-07-09 15:32:36 -0400
  • 666a15aeb4 Merge remote-tracking branch 'refs/remotes/origin/mpi' into mpi Evan Miller 2023-07-09 15:32:10 -0400
  • 00b8aa1e66 tests : fix new llama_backend API Georgi Gerganov 2023-07-09 22:31:54 +0300
  • f085a57d1a [mpi] Link MPI C++ libraries to fix OpenMPI Evan Miller 2023-07-09 15:31:53 -0400
  • 166db36c51 mpi : fix after master merge Georgi Gerganov 2023-07-09 22:23:04 +0300
  • 0492363137 mpi : fix after master merge refactor-mpi Georgi Gerganov 2023-07-09 22:23:04 +0300
  • 1c3a15c5d4 Merge pull request #1 from ggerganov/refactor-mpi Evan Miller 2023-07-09 15:23:04 -0400
  • 81c5ddd532 Merge branch 'mpi' into refactor-mpi Georgi Gerganov 2023-07-09 22:20:14 +0300
  • 03cc12be0d [mpi] continue-on-error: true Evan Miller 2023-07-09 15:10:43 -0400
  • 4a9a4748e9 Add OpenMPI to GH action Evan Miller 2023-07-09 15:05:58 -0400