Commit Graph

  • 82d74ca1a6 Merge branch 'master' into concedo Concedo 2023-04-21 16:24:30 +0800
  • 3687db7cf7 cublas is not feasible at this time. removed for now Concedo 2023-04-21 16:14:23 +0800
  • d40fded93e llama : fix comment for "output.weight" tensor master-d40fded Georgi Gerganov 2023-04-21 10:23:36 +0300
  • 7c36e03dfb too many hooved animals! Barton Rhodes 2023-04-21 05:07:57 +0000
  • d1d76e24f2 Show perplexity ETA in hours and minutes Slaren 2023-04-21 03:50:20 +0200
  • c832e7c793 Add CXX flags to nvcc Slaren 2023-04-21 03:39:04 +0200
  • 94cb00a3cf alternate implementation of setting different n_batch for BLAS eiery 2023-04-20 20:57:16 -0400
  • d3e1984ce0 add rpath Henri Vasserman 2023-04-21 03:32:06 +0300
  • 0e005f7793 Build file changes Henri Vasserman 2023-04-21 02:13:00 +0300
  • 641e9a0c52 Move cuda specific definitions to ggml-cuda.h/cu Slaren 2023-04-21 00:58:26 +0200
  • 6ffe4680ca fiat nexus ▦ Barton Rhodes 2023-04-20 22:38:56 +0000
  • 041627284d Merge branch 'ggerganov:master' into main barton ⊛ 2023-04-20 22:21:31 +0000
  • e8797a9aed Improve cuBLAS performance by using a memory pool Slaren 2023-04-21 00:09:14 +0200
  • 2510c1831f Add ggml-model-*.bin checksums for 7B, 13B, 30B, 65B (#1088) Stephan Walter 2023-04-20 21:56:44 +0000
  • d58c304a45 Add ggml-model-*.bin checksums for 65B Pavol Rusnak 2023-04-20 23:55:04 +0200
  • c6dfc44a37 spacing eiery 2023-04-20 17:06:34 -0400
  • 4b781c2055 set default n_batch to 512 when using BLAS eiery 2023-04-20 17:04:31 -0400
  • 12b5900dbc ggml : sync ggml (add GPT-NeoX RoPE implementation) master-12b5900 Georgi Gerganov 2023-04-20 23:32:59 +0300
  • 7aa501cd1c Faster q3_0 implementation, using two planes, by @pubby pubby 2023-04-17 10:38:45 -0500
  • 8c90a860cc More AVX2 optimizations Stephan Walter 2023-04-16 15:36:36 +0200
  • c29ab90e06 Q2 AVX2: do two blocks at a time, by @slaren Stephan Walter 2023-04-16 09:55:39 +0200
  • 6fc51a8c05 Q2 and Q3 quantization Stephan Walter 2023-03-24 17:32:35 +0100
  • d54dcbcc3b Add ggml-model-*.bin checksums for 7B, 13B, 30B Stephan Walter 2023-04-20 21:50:25 +0200
  • 54a63c10e8 Update Makefile for the Cuda kernels Henri Vasserman 2023-04-20 22:19:22 +0300
  • 9ff334f3c9 ggml : fix bug in ggml_compute_forward_dup_f32() master-9ff334f Georgi Gerganov 2023-04-20 21:58:05 +0300
  • 0fd8363adc use hipblas based on cublas Henri Vasserman 2023-04-20 02:04:00 +0300
  • 2005469ea1 Add Q4_3 support to cuBLAS (#1086) master-2005469 slaren 2023-04-20 20:49:53 +0200
  • 8a1756abdf ggml : do not break cuBLAS build (Q4_3 is not yet implemented) master-8a1756a Georgi Gerganov 2023-04-20 21:43:50 +0300
  • 7aba7cae29 Add Q4_3 support to cuBLAS Slaren 2023-04-20 20:34:35 +0200
  • 66aab46079 ggml : fix Q4_3 quantization master-66aab46 Georgi Gerganov 2023-04-20 20:44:05 +0300
  • 38de86a711 llama : multi-threaded quantization (#1075) master-38de86a Kawrakow 2023-04-20 19:42:27 +0200
  • b3545d9a2a Merge branch 'master' into multi-thread-quantize Georgi Gerganov 2023-04-20 20:41:29 +0300
  • e0305ead3a ggml : add Q4_3 quantization (#1082) master-e0305ea Georgi Gerganov 2023-04-20 20:35:53 +0300
  • 515ccfd2b6 ggml : add Q4_3 quantization Georgi Gerganov 2023-04-20 18:58:35 +0300
  • 0ae02ebb40 Still fighting with lambda captures in MSVC Iwan Kawrakow 2023-04-20 18:50:27 +0200
  • 7fae1c4ee2 Avoiding compiler confusion Iwan Kawrakow 2023-04-20 18:40:33 +0200
  • 07bb31b034 wip dont use Concedo 2023-04-21 00:35:54 +0800
  • b65e559a68 Reviewer comments Iwan Kawrakow 2023-04-20 18:17:56 +0200
  • 6a9661ea5a ci : remove the LLAMA_ACCELERATE matrix dimension from Ubuntu builds in the CI (#1074) master-6a9661e Ivan Komarov 2023-04-20 17:15:18 +0200
  • 7ba36c2c6c trying to put out penguin based fires. sorry for inconvenience Concedo 2023-04-20 23:15:07 +0800
  • 5addcb120c fix: LLAMA_CUBLAS=1 undefined reference 'shm_open' (#1080) master-5addcb1 源文雨 2023-04-20 21:28:43 +0800
  • 113de2bb06 fix: LLAMA_CUBLAS=1 undefined reference 'shm_open' 源文雨 2023-04-20 19:09:44 +0800
  • 49697d86d8 adjusted down the buf memory allocation now that realloc seems to work Concedo 2023-04-20 17:51:13 +0800
  • 4605074245 Merge branch 'master' into concedo_experimental Concedo 2023-04-20 17:30:54 +0800
  • 3e88616439 fixed WONKY CODE Concedo 2023-04-20 16:41:32 +0800
  • 0b08ec7c5d forgot to remove this Concedo 2023-04-20 16:28:47 +0800
  • 346cd68903 make linux and OSX build process equal to windows. Now it will build all applicable libraries, for a full build do make LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 Concedo 2023-04-20 15:53:55 +0800
  • c8c2c52482 AVX2 optimization for vec_dot_q4_2_q8_0 (#1068) master-c8c2c52 Stephan Walter 2023-04-20 06:45:41 +0000
  • ce05fc0a67 Multi-threading for quantize-stats Iwan Kawrakow 2023-04-20 07:25:13 +0200
  • 2732a6b84a Merge remote-tracking branch 'upstream/master' into eval-thread-count ml6 2023-04-19 21:43:40 -0700
  • 93761e7baf slightly clarified the library replacement steps - replacing the dll is necessary in addition to replacing the library imports Concedo 2023-04-20 12:23:54 +0800
  • 5ca2d774cc doc - explanation of how to use a custom version of the windows libraries at the lib folder. (#92) Gustavo Rocha Dias 2023-04-20 01:20:11 -0300
  • e488db9fd9 Remove the LLAMA_ACCELERATE matrix dimension from Ubuntu builds in the CI Ivan Komarov 2023-04-20 04:23:19 +0200
  • 02d6988121 Improve cuBLAS performance by dequantizing on the GPU (#1065) master-02d6988 slaren 2023-04-20 03:14:14 +0200
  • 18337719e0 Fix windows build Slaren 2023-04-20 01:03:44 +0200
  • 95cf9597aa Fix possible synchronization issue Slaren 2023-04-19 23:01:53 +0200
  • 834695fe3a Minor: Readme fixed grammar, spelling, and misc updates (#1071) CRD716 2023-04-19 14:52:14 -0500
  • 5b7ff8234f editorconfig check CRD716 2023-04-19 14:35:58 -0500
  • 48f6664589 trailing CRD716 2023-04-19 14:31:01 -0500
  • 0731d4147e Update README.md CRD716 2023-04-19 14:29:40 -0500
  • 72028641ca AVX2 optimization for vec_dot_q4_2_q8_0 Stephan Walter 2023-04-19 20:41:55 +0200
  • d2f9266200 Multi-threading quantization. Iwan Kawrakow 2023-04-19 20:20:44 +0200
  • f7d05095b4 Q4_2 quantization with rmse-optimized scale and quants (#1062) master-f7d0509 Kawrakow 2023-04-19 20:20:14 +0200
  • fe14e7c522 Re-add dropped Darwin-only flag. Corbin 2023-04-19 10:53:42 -0700
  • 35b0bf0585 Merge remote-tracking branch 'upstream/master' into more_responsive Jeffersoncgo 2023-04-19 13:44:25 -0400
  • 14a4fc874b Nix flake: Use Makefile instead of CMake Corbin 2023-04-19 10:23:47 -0700
  • 884e7d7a2b ggml : use 8-bit precision for Q4_1 intermediate results (#1047) master-884e7d7 Georgi Gerganov 2023-04-19 20:10:08 +0300
  • 96d84438bc Fixed type as per reviewer comment Iwan Kawrakow 2023-04-19 18:57:07 +0200
  • 49beb2cdb8 Better follow ggml conventions for function names Iwan Kawrakow 2023-04-19 18:46:44 +0200
  • e582f2ad60 gitignore : ignore ppl-*.txt files Georgi Gerganov 2023-04-19 19:31:44 +0300
  • ad7007aa21 ggml : AVX2 implementation of ggml_vec_dot_q4_1_q8_0 (#1051) slaren 2023-04-19 18:29:02 +0200
  • 426230525c ggml : optimize ggml_vec_dot_q4_1_q8_0() via vmalq_n_f32 Georgi Gerganov 2023-04-18 23:33:03 +0300
  • e9c07f72cb ggml : use 8-bit precision for Q4_1 intermediate results (ARM) Georgi Gerganov 2023-04-18 22:12:19 +0300
  • 6d36a51fa5 ggml : satisfy the sanitizer builds Georgi Gerganov 2023-04-19 19:18:28 +0300
  • 891af05e7d Remove unused parameters Slaren 2023-04-19 18:11:54 +0200
  • 7cd5c4a3e9 readme : add warning about Q4_2 and Q4_3 Georgi Gerganov 2023-04-19 19:07:54 +0300
  • f3d4edf504 ggml : Q4 cleanup - remove 4-bit dot product code (#1061) master-f3d4edf Stephan Walter 2023-04-19 16:06:37 +0000
  • 359b056034 Improve cuBLAS performance with quantized models by dequantizing on the GPU Slaren 2023-04-19 18:01:39 +0200
  • e9657b20e8 Remove unused AVX512 Q4_0 code Stephan Walter 2023-04-19 17:31:02 +0200
  • 6eec06081b Q4_2 quantization with rmse-optimized scale and quants Iwan Kawrakow 2023-04-19 17:10:58 +0200
  • 21ee6d97cc Q4 cleanup Stephan Walter 2023-04-19 16:15:24 +0200
  • 275f1bdf13 Added tokens to identify if is loading or ready Jeffersoncgo 2023-04-19 09:08:32 -0400
  • be1222c36e Merged the upstream cublas feature, Concedo 2023-04-19 20:45:37 +0800
  • cc407f283a messing around with memory allocation to bandaid the random ooms with various gpt2 and gptj models Concedo 2023-04-19 20:18:55 +0800
  • 99eafe908f more_responsive Jeffersoncgo 2023-04-19 08:01:35 -0400
  • 8944a13296 Add NVIDIA cuBLAS support (#1044) master-8944a13 slaren 2023-04-19 11:22:45 +0200
  • f662a9a230 Merge branch 'master' into concedo Concedo 2023-04-19 16:34:51 +0800
  • 65bfcdb1cc Merge branch 'concedo_experimental' into concedo Concedo 2023-04-19 15:35:48 +0800
  • 45ec09d31b fast forwarding for rwkv for unmodified contexts Concedo 2023-04-19 15:09:35 +0800
  • 116488af66 Create make_pyinstaller.sh (#89) AlpinDale 2023-04-19 07:27:07 +0430
  • 142c38a4f3 AVX2 implementation of ggml_vec_dot_q4_1_q8_0 Slaren 2023-04-19 03:13:20 +0200
  • 6667401238 Multi-threaded ggml_cpy (#1035) master-6667401 slaren 2023-04-19 00:53:24 +0200
  • b9e99cd1fd Also fix wdata offset in ggml_compute_forward_add_q_f32 Slaren 2023-04-18 22:27:50 +0200
  • 8bd47a8bda Update ggml.c slaren 2023-04-18 19:19:01 +0200
  • 0f8b1df18f Multi-threaded ggml_cpy Slaren 2023-04-18 01:47:34 +0200
  • 40846bd28d Cleanup cublas comments Slaren 2023-04-19 00:37:33 +0200
  • 5fc6799f05 Add support to cmake Slaren 2023-04-18 23:20:11 +0200
  • 77a73403ca ggml : add new Q4_2 quantization (ARM only) (#1046) master-77a7340 Georgi Gerganov 2023-04-18 23:54:57 +0300
  • ed24225917 ggml : optimize ggml_vec_dot_q4_1_q8_0() via vmalq_n_f32 Georgi Gerganov 2023-04-18 23:33:03 +0300
  • 3ceb0733a6 Merge branch 'master' into q4_1xq8_0 Georgi Gerganov 2023-04-18 23:13:21 +0300