Commit Graph

  • 99ef967d42 add static prefix to the other functions too anon 2023-06-13 14:17:22 -0300
  • 1f3945236a remove old verbose variable anon 2023-06-13 14:14:29 -0300
  • bbe9c59618 Update Makefile for minimalist example SuperUserNameMan 2023-06-13 19:12:45 +0200
  • ba636acb1f minimalist example CMakeLists.txt SuperUserNameMan 2023-06-13 19:09:44 +0200
  • 1659d77515 Create simple.cpp SuperUserNameMan 2023-06-13 19:08:37 +0200
  • cc60183c5f VRAM KV cache based on -ngl, fixed info prints JohannesGaessler 2023-06-13 17:39:32 +0200
  • 15de626b3a double max nodes again Concedo 2023-06-13 23:51:10 +0800
  • 82cf97ce92 hotfix for rwkv Concedo 2023-06-13 23:38:41 +0800
  • e8528d4d2f Fixed incorrect index when going out of context JohannesGaessler 2023-06-13 16:15:11 +0200
  • 20e76a0764 Free CUDA scratch buffer upon llama_model deletion JohannesGaessler 2023-06-13 13:03:25 +0200
  • ed6587491c Free KV cache CUDA buffers upon deletion JohannesGaessler 2023-06-13 11:15:30 +0200
  • 8e3057b24b Removed obsolete code, fixed multi GPU JohannesGaessler 2023-06-12 20:23:16 +0200
  • 95120f1365 flatten rows for ggml_cuda_op JohannesGaessler 2023-06-12 19:52:38 +0200
  • 3b6a2ee414 ggml_cuda_cpy for f32 -> f32 JohannesGaessler 2023-06-12 17:33:48 +0200
  • cf5ae8635a KV cache v works, perf. bad, # after 64 tokens JohannesGaessler 2023-06-12 10:07:27 +0200
  • 9a85d913ee Refactored ggml_cuda_cpy JohannesGaessler 2023-06-12 09:19:11 +0200
  • 19c0bf5c86 ggml_is_permuted JohannesGaessler 2023-06-11 19:53:43 +0200
  • 8d648a34d8 ggml_cuda_diag_mask_inf JohannesGaessler 2023-06-10 22:09:56 +0200
  • 6b46870fea ggml_cuda_scale JohannesGaessler 2023-06-10 20:56:40 +0200
  • b87178b558 ggml_cuda_mul_mat_vec_p021 JohannesGaessler 2023-06-09 20:01:54 +0200
  • 8c6bd319db Fixed CUDA RoPE JohannesGaessler 2023-06-07 11:47:52 +0200
  • 9db2ec068f cuda build file Concedo 2023-06-13 22:29:38 +0800
  • 6119b8a3d0 new vocab files Concedo 2023-06-13 22:22:04 +0800
  • 0e3cc8e6f7 Improve code formatting 0cc4m 2023-06-13 16:10:25 +0200
  • f1ac03ed37 Shorten switch statements 0cc4m 2023-06-13 15:21:44 +0200
  • f345347e5c updated lite Concedo 2023-06-13 20:44:22 +0800
  • 561ce6a153 Merge remote-tracking branch 'occam/kquant-opencl' into concedo_experimental Concedo 2023-06-13 20:27:11 +0800
  • 67559a15f3 Merge branch 'master' into concedo_experimental Concedo 2023-06-13 20:26:51 +0800
  • 871009dfab integrated world tokenizer for RWKV Concedo 2023-06-13 20:06:19 +0800
  • 6627a02540 Allow overriding the server address Henri Vasserman 2023-06-13 13:36:31 +0300
  • 74d4cfa343 Allow "quantizing" to f16 and f32 (#1787) master-74d4cfa Kerfuffle 2023-06-13 04:23:23 -0600
  • 2a972f3649 Fix q3_k 0cc4m 2023-06-13 08:25:32 +0200
  • fc8c823f34 Fix q2_k, improve code 0cc4m 2023-06-12 20:02:56 +0200
  • 6e20827f93 Added OpenCL DMMV kernels Concedo 2023-06-12 19:31:09 +0800
  • f558e4c297 Finish dequant kernels Concedo 2023-06-12 14:55:21 +0800
  • 56151bb875 Replace uchar with uint8_t Concedo 2023-06-12 14:20:44 +0800
  • a4ee2b89d2 Fix q4_k opencl struct order 0cc4m 2023-06-12 08:13:05 +0200
  • 1506affd0a Added q6_k kernel Concedo 2023-06-11 22:29:43 +0800
  • 44422fd567 Set global and local sizes for kernel calls for dequantizing k-quants 0cc4m 2023-06-11 12:47:21 +0200
  • 9b41865312 Porting q2_k kernel to OpenCL Concedo 2023-06-10 21:52:32 +0800
  • b8b8a6ed00 Add log flush Henri Vasserman 2023-06-13 12:58:02 +0300
  • 3655865e15 Merge 1e06f12714 into 74a6d922f1 milka :) 2023-06-13 10:39:13 +0100
  • 1e06f12714 Removed trailing whitespaces, removed variable-length arrays, removed debug print Amy 2023-06-13 10:39:04 +0100
  • b3ea026c92 Improve help output in quantize tool KerfuffleV2 2023-06-13 03:14:06 -0600
  • 124b4172ef Fixed warnings Amy 2023-06-13 09:56:37 +0100
  • 9830871d0f pulled all Occam's fixes and the kquants are all working now Concedo 2023-06-13 16:15:13 +0800
  • 298ff34221 clarified dynamic precision picking in QX Amy 2023-06-13 09:11:42 +0100
  • 9b6c35b651 rwkv speed enhancements (batch processing), fixed a rwkv token processing bug Concedo 2023-06-13 16:02:12 +0800
  • 4cd885beb5 added comments and scalar implementation for vec_dot_qx Amy 2023-06-13 08:59:03 +0100
  • 854a1e805d Add ggml.h to spm public Headers Vogel Frederik 2023-06-13 14:12:07 +0900
  • a28c5f95f7 Ignore metal file in spm Vogel Frederik 2023-06-13 14:11:33 +0900
  • d820918f18 Set include path for OpenBlas Igor Okulist 2023-06-12 23:25:06 -0500
  • e5274378f7 cleaned-up implementation of QX mixed quantization Amy 2023-06-13 05:08:57 +0100
  • 4f2f619e31 docs - Alternative way to build at Android, with CLBlast. gustrd 2023-06-12 22:38:33 -0300
  • 909970921e Merge pull request #22 from anon998/bash-trim Randall Fitzgerald 2023-06-12 21:06:50 -0400
  • 9d564db9ae trim response and trim trailing space in prompt anon 2023-06-12 21:30:33 -0300
  • 6d72f0f070 Make chat shell script work by piping the content out of the subshell. Randall Fitzgerald 2023-06-12 19:44:53 -0400
  • 1bedac6ff0 Remove iostream usage from quantize tool KerfuffleV2 2023-06-12 13:03:59 -0600
  • 3cb9de2c4a Clean up size output, use uppercase for quant types KerfuffleV2 2023-06-10 09:01:41 -0600
  • 1e361c531c Allow "quantizing" to f16 and f32 KerfuffleV2 2023-06-10 05:03:16 -0600
  • fc78910bc3 Merge branch 'ggerganov:master' into master Randall Fitzgerald 2023-06-12 16:18:13 -0400
  • 50e7c5434f Merge pull request #21 from SlyEcho/server_refactor Randall Fitzgerald 2023-06-12 16:16:20 -0400
  • f344d090f7 streaming shell script Henri Vasserman 2023-06-12 22:49:08 +0300
  • 74a6d922f1 Metal implementation for all k_quants (#1807) master-74a6d92 Kawrakow 2023-06-12 22:39:21 +0300
  • 335cc1eb3a Update baby-llama.cpp 0xspringtime 2023-06-12 14:50:43 -0400
  • 55290ba801 Update baby-llama.cpp 0xspringtime 2023-06-12 14:45:19 -0400
  • cb469f7efb zero initialize gfbuf and gbbuf xaedes 2023-06-12 20:43:48 +0200
  • 32dc227284 print used training seed xaedes 2023-06-12 20:42:44 +0200
  • 350784d9a0 Merge branch 'ggerganov:master' into metal-max-buffer-workaround kiltyj 2023-06-12 11:39:53 -0700
  • 223bd36370 Merge remote-tracking branch 'upstream/master' into text-from-scratch xaedes 2023-06-12 20:39:47 +0200
  • 2d247e3c11 removed NDK check for armv8 because we are passing options from the parent project instead l3utterfly 2023-06-13 02:05:08 +0800
  • 0aed791064 llama : do a warm-up eval at start for better timings Georgi Gerganov 2023-06-12 20:56:06 +0300
  • 429ed950af move CPPHTTPLIB settings inside server Henri Vasserman 2023-06-12 20:46:53 +0300
  • e4caa8da59 ci : run when changing only the CUDA sources (#1800) master-e4caa8d slaren 2023-06-12 19:12:47 +0200
  • 28694f7ac9 add a simple bash script too Henri Vasserman 2023-06-12 19:53:13 +0300
  • fc4264d14a api url Henri Vasserman 2023-06-12 18:43:40 +0300
  • 1510337901 fix make flags propagation Henri Vasserman 2023-06-12 18:34:12 +0300
  • b91200a2e5 javascript chat update. Henri Vasserman 2023-06-12 18:34:01 +0300
  • 860fb026df rwkv compile fix (+1 squashed commits) Concedo 2023-06-12 22:40:45 +0800
  • 13cf6929b7 more json changes and stop info Henri Vasserman 2023-06-12 17:29:25 +0300
  • 9301ff0624 Update baby-llama.cpp 0xspringtime 2023-06-12 09:58:07 -0400
  • 120851df53 prevent gpu offload if kquant is selected with clblast for now Concedo 2023-06-12 21:57:31 +0800
  • 215edf420b Merge branch 'master' into concedo_experimental Concedo 2023-06-12 21:53:13 +0800
  • dff11a14d2 json parsing improvements Henri Vasserman 2023-06-12 16:52:21 +0300
  • 9c08017051 this patch is a work in progress implementation for the k-quants. the dequant kernels are working, but the DMMV ones are not. Concedo 2023-06-12 21:47:57 +0800
  • 58970a4c39 Leverage mmap for offloading tensors to GPU (#1597) master-58970a4 Howard Su 2023-06-12 20:44:16 +0800
  • 61726bd942 Add assert to make sure we only allocate temp buffer for non-CPU backend tensor Howard Su 2023-06-12 20:19:26 +0800
  • 8d9465b22e Inital Commit with our data Seif-Sallam 2023-06-12 15:09:23 +0300
  • 8c0a10e64d metal : fix failure to load model (#1817) Kawrakow 2023-06-12 14:31:36 +0300
  • 661777d636 metal : fix failure to load model Iwan Kawrakow 2023-06-12 13:07:10 +0300
  • 21f093c7f1 Merge 6bc86c95de into fa84c4b3e8 JackJollimore 2023-06-12 03:06:10 -0700
  • b2c0973b44 Workaround Metal maxBufferLength Kilty McGowan 2023-06-12 02:00:22 -0700
  • 4148b9bd03 remove void Henri Vasserman 2023-06-12 10:28:17 +0300
  • 3e78f0071a add missing include Evan Jones 2023-06-12 00:13:27 -0400
  • 56904cae00 Merge remote-tracking branch 'refs/remotes/upstream/master' into grammar Evan Jones 2023-06-12 00:07:13 -0400
  • 98a9587ce4 add comments to grammar syntax and allow newlines where unambiguous Evan Jones 2023-06-11 23:41:25 -0400
  • 674bb08b20 handle & print parser errors Evan Jones 2023-06-11 22:40:01 -0400
  • 9e77f42ef7 fix whitespace errors Evan Jones 2023-06-11 22:38:13 -0400
  • 834d423edf allow loading grammar from file Evan Jones 2023-06-11 22:37:16 -0400
  • 5963a0ae3d Nix flake: Build and install libllama.so Corbin 2023-06-11 19:07:30 -0700