Commit Graph

  • ac793a21e8 Fix for #2023 goerch 2023-07-22 00:32:09 +0200
  • 934eeb43d9 CUDA: Fixed 7b q3_K_S with mul_mat_vec_q JohannesGaessler 2023-07-22 00:30:05 +0200
  • 43833f6c83 Merge 39edee5136 into 7d5f18468c dewijones92 2023-07-21 23:13:14 +0200
  • 545862ae48 Update perplexity.cpp klosax 2023-07-21 21:25:44 +0200
  • ebf009f63e Update common.cpp klosax 2023-07-21 21:21:35 +0200
  • 68d2ca65e6 Update common.h klosax 2023-07-21 21:19:45 +0200
  • 7d5f18468c examples : add easy python script to create quantized (k-bit support) GGML models from local HF Transformer models (#2311) Richard Roberson 2023-07-21 13:01:10 -0600
  • 1faad6ddac examples : rename to use dash instead of underscore Georgi Gerganov 2023-07-21 21:58:21 +0300
  • a363c2bc60 Resync my fork with new llama.cpp commits richardr1126 2023-07-21 12:35:21 -0600
  • 75064b4ada wip on embedded horde worker Concedo 2023-07-22 01:30:25 +0800
  • 807ef887b2 fix white spaces lshzh-ww 2023-07-21 12:39:44 -0400
  • 6ee897a501 metal: issue operations concurrently if possible lshzh-ww 2023-07-21 11:23:51 -0400
  • 1c3030ee41 ggml: try to issue operations concurrently on GPU lshzh-ww 2023-07-21 11:23:18 -0400
  • c8e6ef1846 metal: only encode in one command buffer lshzh-ww 2023-07-21 11:17:48 -0400
  • fe8b79255b Obtaining LLaMA 2 instructions niansa/tuxifan 2023-07-21 17:16:40 +0200
  • e87840f9fd allocator: automatic inplace operations slaren 2023-07-21 16:51:50 +0200
  • d924522a46 Custom RoPE + bettter memory management for CUDA (#2295) master-d924522 Kawrakow 2023-07-21 17:27:51 +0300
  • 4d76a5f49b Faster Q3_K implementation on Metal (#2307) Kawrakow 2023-07-21 17:05:30 +0300
  • 0db14fef06 ggml : fix the rope fix (513f861953) master-0db14fe Georgi Gerganov 2023-07-21 15:16:55 +0300
  • 11315b1d61 llama : minor style changes Georgi Gerganov 2023-07-21 15:11:23 +0300
  • 03e566977b examples : fix typo in minigpt4.py (#2298) Ikko Eltociear Ashimine 2023-07-21 20:53:07 +0900
  • 513f861953 ggml : fix rope args order + assert (#2054) master-513f861 Georgi Gerganov 2023-07-21 14:51:34 +0300
  • 3973b25a64 gitignore : fix final newline Georgi Gerganov 2023-07-21 14:42:41 +0300
  • 3d679827e7 improved memory management fixes slaren 2023-07-21 12:41:46 +0200
  • ab0e26bdfb llama : remove cfg smooth factor as it is only a reparameterization of the guidance scale (#2280) master-ab0e26b Guillaume "Vermeille" Sanchez 2023-07-21 12:58:36 +0200
  • 73643f5fb1 gitignore : changes for Poetry users + chat examples (#2284) master-73643f5 Jose Maldonado 2023-07-21 06:53:27 -0400
  • eef66e1d2e Merge branch 'master' into master Georgi Gerganov 2023-07-21 13:52:25 +0300
  • a814d04f81 make : fix indentation master-a814d04 Georgi Gerganov 2023-07-21 13:50:55 +0300
  • 4c013bb738 ci : fix MNT realpath usage (#2250) Georgi Gerganov 2023-07-21 13:48:18 +0300
  • 56e9ae062c llama.cpp: partially restore state support, graph export slaren 2023-07-21 12:39:51 +0200
  • 42c7c2e2e9 make : support customized LLAMA_CUDA_NVCC and LLAMA_CUDA_CCBIN (#2275) master-42c7c2e Sky Yan 2023-07-21 18:38:57 +0800
  • 78a3d13424 flake : remove intel mkl from flake.nix due to missing files (#2277) master-78a3d13 wzy 2023-07-21 18:26:34 +0800
  • ae178ab46b llama : make tensor_split ptr instead of array (#2272) master-ae178ab Georgi Gerganov 2023-07-21 13:10:51 +0300
  • 54e3bc76fe make : add new target for test binaries (#2244) master-54e3bc7 Jiří Podivín 2023-07-21 12:09:16 +0200
  • 647cef8bbd Merge branch 'master' into testtarget-removal Georgi Gerganov 2023-07-21 13:08:59 +0300
  • b068f2f4b5 Adjusted look ahead in ggml_cuda_pool_malloc to 5% Iwan Kawrakow 2023-07-21 11:58:52 +0300
  • 019fe257bb MIKU MAYHEM: Upgrading the Default Model for Maximum Fun 🎉 (#2287) Hatsune Miku 2023-07-21 08:13:18 +0000
  • d3c3624c7b Better Q3_K for QK_K = 64 Iwan Kawrakow 2023-07-21 09:38:38 +0300
  • 0099570f04 Q3_K for QK_K = 64 Iwan Kawrakow 2023-07-21 09:14:12 +0300
  • 8dba28c00a Additional Q3_K speedup on Metal Iwan Kawrakow 2023-07-20 20:28:28 +0300
  • 5bb23b5ab5 Faster Q3_K on Metal Iwan Kawrakow 2023-07-20 20:07:38 +0300
  • e68c96f7fe Faster Q2_K on Metal (#2297) Kawrakow 2023-07-21 10:44:40 +0300
  • 9cf022a188 make : fix embdinput library and server examples building on MSYS2 (#2235) master-9cf022a Przemysław Pawełczyk 2023-07-21 09:42:21 +0200
  • bdf3b6e0d7 Fixed bug in new metal Q2_K implementation Iwan Kawrakow 2023-07-21 10:21:19 +0300
  • 1a61c1a5e1 server: allow json array in prompt or content Xiao-Yong Jin 2023-07-21 00:35:58 -0500
  • 47031e4094 add --in-prefix-bos to prefix BOS to user inputs; keep EOS Xiao-Yong Jin 2023-07-20 22:04:06 -0500
  • c047e8aec2 only sample full tokens (no peeking or truncation) Evan Jones 2023-07-19 23:28:57 -0400
  • 37d3f6a260 remove unused code slaren 2023-07-21 02:33:06 +0200
  • cd6f5dec92 improved memory management slaren 2023-07-21 00:28:49 +0200
  • ad97ee3676 Fix flake build on darwin Charles Duffy 2023-07-20 14:52:46 -0500
  • 3432e378d5 Replace VMA library with native Vulkan buffer management 0cc4m 2023-07-20 21:57:33 +0200
  • d45c1631bc metal : rewrite to fit new backend interface correctly (WIP) ggml-backends-metal Georgi Gerganov 2023-07-20 16:36:33 +0300
  • b5b133723a Don't free before queue done 0cc4m 2023-07-20 19:32:17 +0200
  • 66c2632f1d examples : fix typo in minigpt4.py Ikko Eltociear Ashimine 2023-07-21 02:32:00 +0900
  • 417546c8c3 Deleting unnoticed and dangereous trailing white space Iwan Kawrakow 2023-07-20 20:30:50 +0300
  • 09b51fc648 Faster Q2_K on Metal Iwan Kawrakow 2023-07-20 18:53:45 +0300
  • 0f8d5aa091 Update README.md repo-reviews 2023-07-20 17:26:08 +0200
  • e782c9e735 Faster Q5_K and Q6_K on Metal (#2294) Kawrakow 2023-07-20 18:19:45 +0300
  • 1cdbbbb37c Custom RoPE + bettter memory management for CUDA Iwan Kawrakow 2023-07-20 17:52:27 +0300
  • de69f8f20d initial implementation of delayed graph allocation slaren 2023-07-20 15:57:48 +0200
  • 5f2e4bd8ba Another Q5_K speedup Iwan Kawrakow 2023-07-20 16:33:15 +0300
  • 463f420710 Faster Q5_K on Metal Iwan Kawrakow 2023-07-20 16:09:40 +0300
  • 06c08576f7 Merge remote-tracking branch 'origin/master' into concedo_experimental Concedo 2023-07-20 21:02:40 +0800
  • f036109110 script for henky Concedo 2023-07-20 21:02:12 +0800
  • fa9d54e36e Faster Q6_K on Metal Iwan Kawrakow 2023-07-20 16:00:22 +0300
  • e4db70720d [wip] chat now has parameter and cfg Henri Vasserman 2023-07-20 15:37:31 +0300
  • 785829dfe8 Faster Q4_K on Metal (#2290) Kawrakow 2023-07-20 15:18:43 +0300
  • cb82adadb8 metal : first working version of the inference without prompt processing Georgi Gerganov 2023-07-20 14:56:29 +0300
  • 290cb700bf metal : map the CPU buffers to Metal buffers (WIP) Georgi Gerganov 2023-07-20 14:30:34 +0300
  • 8e03cfcb6a Faster Q4_K on Metal Iwan Kawrakow 2023-07-20 14:19:08 +0300
  • fff0e0eafe llama : fix regression from #2000 - could not load no-mmap models master-fff0e0e Georgi Gerganov 2023-07-20 13:47:26 +0300
  • 417a85a001 metal: minor q4 optimization and reduce code size (#2248) Shouzheng Liu 2023-07-20 06:32:22 -0400
  • e85557f798 launcher for rope Concedo 2023-07-20 17:45:50 +0800
  • 4379ed7085 Miku.sh: Switch sampler to mirostat_v2 and tiny prompt improvements at8u 2023-07-20 08:13:21 +0100
  • 39dc1a46c4 added token count, updated lite Concedo 2023-07-20 14:41:06 +0800
  • 569916ab44 Little changes in .gitignore for Poetry users A fix in Makefile for FreeBSD users. In the platfrom x86_64 is amd64. This fix resolve compilation using CFLAGS and CXXFLAGS with -march=native and -mtune=native Add two examples for interactive mode using Llama2 models (thx TheBloke for models) Jose Yukiteru Amano 2023-07-20 00:11:30 -0400
  • f3f2e8eee3 metal: use template to reduce size lshzh-ww 2023-07-19 23:16:18 -0400
  • ea0ea9ad36 Merge branch 'ggerganov:master' into server-improve-yazan Yazan Agha-Schrader 2023-07-20 05:00:47 +0200
  • 082dd81286 [wip] chat improvements Henri Vasserman 2023-07-20 03:48:48 +0300
  • cb205c0d13 automatically calculate compute buffer sizes (without graph allocator) slaren 2023-07-20 02:22:54 +0200
  • 77ac8deaf1 llama.cpp: remove backend-specific code where possible slaren 2023-07-20 00:59:26 +0200
  • 43694ca867 consistent semicolons Henri Vasserman 2023-07-20 00:58:16 +0300
  • 890d1b8446 Merge master into server-cfg Henri Vasserman 2023-07-20 00:48:03 +0300
  • dd3cf5760a last n tokens done Henri Vasserman 2023-07-20 00:36:36 +0300
  • 42591a0acd remove "smooth factor" Henri Vasserman 2023-07-20 00:02:13 +0300
  • 2cb8469e7f refactor evaluation logic Henri Vasserman 2023-07-19 23:45:40 +0300
  • 9e97cb0baf Don't force aligned matmul 0cc4m 2023-07-19 21:59:03 +0200
  • 105fd199be Use pinned memory for f16 preprocessing 0cc4m 2023-07-19 21:03:11 +0200
  • 1e78b1b0a1 remove cfg smooth factor as it is only a reparameterization of the guidance scale Guillaume Sanchez 2023-07-19 16:50:32 +0000
  • f38433ef5d Merge remote-tracking branch 'origin/ggml-backends' into ggml-backends-metal Georgi Gerganov 2023-07-19 17:45:45 +0300
  • 02de94ef82 Remove intel mkl from flake.nix due to missing files Wu Zhenyu 2023-07-19 21:33:08 +0800
  • 70c55c17c7 metal : create backend, mostly reuse CPU backend interface Georgi Gerganov 2023-07-19 16:47:43 +0300
  • c49a469a79 updated lite Concedo 2023-07-19 21:13:00 +0800
  • 6065f5dd15 Miku.sh: Add in-prefix/in-suffix opts at8u 2023-07-19 13:48:27 +0100
  • 187b7dd297 Miku.sh: Set ctx_size to 4096 at8u 2023-07-19 13:37:39 +0100
  • df15fcb598 Support customized LLAMA_CUDA_NVCC and LLAMA_CUDA_CCBIN Yan Lin 2023-07-19 20:25:49 +0800
  • 79479bd201 Miku.sh: Set default model to llama-2-7b-chat at8u 2023-07-19 13:20:54 +0100
  • 2a88d6d3ec Merge remote-tracking branch 'ycros/api-modelbusy-fix' into concedo_experimental Concedo 2023-07-19 18:32:13 +0800
  • 13e34d5058 Merge remote-tracking branch 'origin/master' into concedo_experimental Concedo 2023-07-19 18:28:29 +0800
  • e9467f5a44 auto rope scale adjustments, added sched yield fix for apple, adjust warning for mirostat Concedo 2023-07-19 16:44:44 +0800