Commit Graph

  • bc04508666 Use MTLDevice.newBufferWithBytesNoCopy to share buffers between CPU and GPU Kilty McGowan 2023-06-04 21:30:54 -0700
  • 9270056269 fixed compile error in cmake VS Concedo 2023-06-05 11:48:04 +0800
  • 80891d1591 Update REAMDE.md (#1673) qingfengfenga 2023-06-05 11:06:18 +0800
  • 5eed33f3b3 Merge branch 'ggerganov:master' into master qingfengfenga 2023-06-05 11:00:12 +0800
  • 073b4b8ba5 fix(avx): workaround for missing _mm256_setr_m128i in GCC < 8 xingchensong 2023-06-05 09:27:07 +0800
  • 4f9640b8fe Tensor parallelism JohannesGaessler 2023-05-24 14:29:21 +0200
  • 971920e935 ggml_cuda_compute_forward JohannesGaessler 2023-05-24 12:55:50 +0200
  • 071dcd351b CUDA op template JohannesGaessler 2023-05-23 09:17:31 +0200
  • 827f5eda91 readme : update hot topics Georgi Gerganov 2023-06-04 23:38:19 +0300
  • ecb217db4f llama : Metal inference (#1642) master-ecb217d Georgi Gerganov 2023-06-04 23:34:30 +0300
  • 95eaed63a7 Merge ac7a69fa33 into dcb2ed4826 Howard Su 2023-06-04 22:50:31 +0300
  • 82cfd1b395 Added tensor layer numbers Daniel Kuntz 2023-06-04 15:06:05 -0400
  • 324e823afd readme : add example for main Georgi Gerganov 2023-06-04 18:50:09 +0300
  • e33002d42e readme : add Metal instructions Georgi Gerganov 2023-06-04 18:48:35 +0300
  • db3db9e774 metal : clean-up stuff, fix typos Georgi Gerganov 2023-06-04 18:19:08 +0300
  • b252acbcb6 metal : add comments Georgi Gerganov 2023-06-04 18:10:28 +0300
  • d8a7486d17 Revert "ci : disable temporary" Georgi Gerganov 2023-06-04 17:58:23 +0300
  • a7fb899c53 metal : final refactoring and simplification Georgi Gerganov 2023-06-04 17:57:02 +0300
  • 32a5f3a601 Had unintentionally committed the Makefile with -Ofast enabled Iwan Kawrakow 2023-06-04 17:35:56 +0300
  • b7fb1aa233 removed build info in cmake Concedo 2023-06-04 22:34:27 +0800
  • 6f66e4c4a5 updated lite Concedo 2023-06-04 22:27:15 +0800
  • 9aa2d8535b hide gpu input box when dropdown not selected, minor memory fix for neox and gptj Concedo 2023-06-04 21:47:17 +0800
  • b4aad3add4 Merge a1cdd29cd2 into dcb2ed4826 Georgi Gerganov 2023-06-04 06:02:26 -0500
  • 1ddbb9acd9 Merge branch 'concedo-opencl-dev' into concedo_experimental Concedo 2023-06-04 18:07:27 +0800
  • 64e3e74556 change max value size_t to use limits Concedo 2023-06-04 18:04:52 +0800
  • 2b700749e5 Merge branch 'master' into concedo-opencl-dev LostRuins 2023-06-04 18:00:06 +0800
  • dd4b5c64b8 Merge branch 'master' into concedo_experimental Concedo 2023-06-04 17:38:22 +0800
  • 431693cb10 Added forgotten ggml.o dependence on k_quants.h to the Makefile Iwan Kawrakow 2023-06-04 11:28:50 +0300
  • e26cd6b483 mtl : remove temp / debug code Georgi Gerganov 2023-06-04 11:23:36 +0300
  • e4b522232c mtl : clean-up ggml mtl interface + suport scratch / inplace Georgi Gerganov 2023-06-04 10:38:21 +0300
  • 18e482a89c mtl : preparing for merge Georgi Gerganov 2023-06-04 09:27:27 +0300
  • dcb2ed4826 OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653) master-dcb2ed4 0cc4m 2023-06-04 08:12:05 +0200
  • 88919095b5 edit readme Concedo 2023-06-04 12:09:49 +0800
  • c3c05fc33b further cleanup, refactor renamemode to hordeconfig Concedo 2023-06-04 11:57:46 +0800
  • 2868fac676 Merge branch 'master' into concedo_experimental Concedo 2023-06-04 11:07:07 +0800
  • 20803c221e cleaning up some old junk Concedo 2023-06-04 11:05:46 +0800
  • b62279cb39 buf size for starcoder still not good Concedo 2023-06-04 00:41:08 +0800
  • 0a71a4e6d3 Fix docker build Iwan Kawrakow 2023-06-03 19:03:15 +0300
  • 6ef13823b8 Fix quantization error test Iwan Kawrakow 2023-06-03 18:41:45 +0300
  • d8bd0013e8 Add info about CUDA_VISIBLE_DEVICES (#1682) Henri Vasserman 2023-06-03 16:35:20 +0300
  • 4c88864715 Add info about CUDA_VISIBLE_DEVICES Henri Vasserman 2023-06-03 16:23:33 +0300
  • b5c85468a3 Docker: change to calling convert.py (#1641) Jiří Podivín 2023-06-03 14:11:53 +0200
  • 6a14cd4d3e Setting up flake8 and pre-commit hooks Jiri Podivin 2023-06-03 11:54:24 +0200
  • 8f5d42db9b Minor Iwan Kawrakow 2023-06-03 14:46:57 +0300
  • abd99a89a7 A slightly faster ARM_NEON A4_K dot product Iwan Kawrakow 2023-06-03 11:37:53 +0300
  • 894210a351 A slightly daster Q4_K AVX2 dot product Iwan Kawrakow 2023-06-02 17:28:38 +0300
  • 9a9c5a0c80 A 10% faster CUDA vector dot kernel for Q3_K Iwan Kawrakow 2023-06-01 15:22:12 +0300
  • c5959d53ff Don't print zeros/NaNs when no count histogram has been collected Iwan Kawrakow 2023-06-01 14:07:42 +0300
  • e51ce72e03 Fixed bug in Q2_K CUDA dot product kernel Iwan Kawrakow 2023-06-01 14:01:25 +0300
  • 7bcc37676a A slightly faster ARM_NEON Q2_K dot Iwan Kawrakow 2023-06-01 11:27:14 +0300
  • 6ec70579cb Adding ARM_NEON Q2_K dot Iwan Kawrakow 2023-06-01 00:31:52 +0300
  • 8516fdf728 Adding scalar and AVX2 Q2_K dot Iwan Kawrakow 2023-05-31 22:28:55 +0300
  • b439efb712 Adding Q2_K - just CUDA for now Iwan Kawrakow 2023-05-31 18:09:31 +0300
  • 4faa040c20 A very slightly faster ARM_NEON Q3_K dot Iwan Kawrakow 2023-05-31 08:46:30 +0300
  • 13264fa067 Adding Q3_K dot for ARM_NEON Iwan Kawrakow 2023-05-30 14:18:47 +0300
  • a197eb50d1 Q5_K dot product for ARM_NEON Iwan Kawrakow 2023-05-30 12:31:21 +0300
  • 5ca15ce155 Q6_K dot product for ARM_NEON Iwan Kawrakow 2023-05-30 11:22:53 +0300
  • a2533a72a3 Q4_K dot product for ARM_NEON Iwan Kawrakow 2023-05-30 10:02:54 +0300
  • 54f808db2b Quantization mixes: didn't quite get what I wanted in the last commit Iwan Kawrakow 2023-05-29 22:09:46 +0300
  • d537b97cb8 Adding quantization mixes Iwan Kawrakow 2023-05-29 20:10:56 +0300
  • 5c5191ab68 Per convention, all QX_K quantizations use Q5_K for output.weight Iwan Kawrakow 2023-05-29 19:32:43 +0300
  • b835d0f49f Adding Q5_K - scalar, AVX2, CUDA Iwan Kawrakow 2023-05-29 18:57:04 +0300
  • cf221afb55 Adding Q6_K - scalar, AVX2, CUDA Iwan Kawrakow 2023-05-29 16:02:54 +0300
  • a0b8e9f3c9 Adding Q4_K - scalar, AVX2, CUDA Iwan Kawrakow 2023-05-29 14:30:17 +0300
  • 3d8b1de3f7 Some more CUDA optimizations for Q3_K Iwan Kawrakow 2023-05-29 09:16:45 +0300
  • a3c0673089 Some improvement for Q3_K on CUDA Iwan Kawrakow 2023-05-28 22:21:25 +0300
  • c93cce3a45 Q3_K now working on CUDA and AVX2/scalar Iwan Kawrakow 2023-05-28 21:38:00 +0300
  • b4f71347ff Adding Q3_K and Q8_K (de)-quantization Iwan Kawrakow 2023-05-27 20:26:36 +0300
  • 8673a41385 Starting to add k-quantization to ggml Iwan Kawrakow 2023-05-27 19:10:49 +0300
  • 136476e898 Fix prompt cache saving and chat-persistent rollover (#1678) master-136476e Evan Jones 2023-06-03 07:28:45 -0400
  • fb14faf6b0 clang-tidy Evan Jones 2023-06-03 07:00:42 -0400
  • 50ce29667f add interface for float input ningshanwutuobang 2023-06-03 18:51:58 +0800
  • c1b293d31a fixed MPT ooms Concedo 2023-06-03 18:37:13 +0800
  • 8bd9a3a48b updated readme, improved simple launcher Concedo 2023-06-03 17:17:15 +0800
  • 6f82e17b7a added MPT support Concedo 2023-06-03 16:14:08 +0800
  • 4df2ef3161 mtl : make it work with main example Georgi Gerganov 2023-06-03 09:11:15 +0300
  • c812ff2b8a Fix prompt cache saving and chat-persistent rollover (fixes #1670) Evan Jones 2023-06-03 00:41:16 -0400
  • df2ecc942a Merge pull request #18 from anon998/update-readme Randall Fitzgerald 2023-06-02 17:04:25 -0400
  • 98ae2de017 parse --mlock and --no-mmap + format anon 2023-06-02 17:54:46 -0300
  • 05a5a485b8 make help text load faster anon 2023-06-02 17:52:04 -0300
  • a6ed390cc6 update readme anon 2023-06-02 17:48:29 -0300
  • e1e2be2146 remove --keep from help text anon 2023-06-02 17:47:42 -0300
  • 2f4e9d19cc mtl : plug Metal inference into llama.cpp (very quick-n-dirty) Georgi Gerganov 2023-06-02 21:52:11 +0300
  • 640a889632 mtl : add save/load vocab to ggml file Georgi Gerganov 2023-06-02 21:00:30 +0300
  • 03c2d72867 mtl : simplify implementation Georgi Gerganov 2023-06-02 20:36:26 +0300
  • 627605732c mtl : remove printfs from inner loop Georgi Gerganov 2023-06-02 19:58:08 +0300
  • 9839259b63 allow specifying the horde limit as well Concedo 2023-06-03 00:55:44 +0800
  • b088e14a7e mtl : more threads for rms_norm + better timing Georgi Gerganov 2023-06-02 19:26:58 +0300
  • 70c3387726 mtl : fix kernel signature + roll inner loop Georgi Gerganov 2023-06-02 19:11:39 +0300
  • b58d73ca8c ci : disable temporary Georgi Gerganov 2023-05-29 20:57:24 +0300
  • 847bbfe9e6 mtl : faster mul_mat_q4_0_f32 kernel Georgi Gerganov 2023-06-02 18:28:31 +0300
  • 5758e9f09b Removed embedding from flags. Randall Fitzgerald 2023-06-02 08:31:12 -0700
  • 310bf61496 Merge pull request #17 from SlyEcho/server_refactor Randall Fitzgerald 2023-06-02 11:25:01 -0400
  • de6df486e9 Removed embedding from README Randall Fitzgerald 2023-06-02 08:24:46 -0700
  • 33671460b0 mtl : fix bug in f16 x f32 mul mat + speed-up computation Georgi Gerganov 2023-06-02 18:23:51 +0300
  • bcd616700e improve docs and example Henri Vasserman 2023-06-02 18:04:46 +0300
  • 96b0e536b7 Merge branch 'opencl-dev-concedo' into concedo_experimental Concedo 2023-06-02 22:12:14 +0800
  • 59fe16877d Clblast fixes + enhancements to save VRAM: Concedo 2023-06-02 22:10:49 +0800
  • 7cebe2eaf8 Merge branch 'master' of https://github.com/digiwombat/llama.cpp digiwombat 2023-06-02 10:06:04 -0400
  • 16e1c9813a Removed the embedding api endpoint and associated code. digiwombat 2023-06-02 10:05:52 -0400