Commit Graph

  • cc45a7feb8 Fix crash of test-tokenizer-0 under Debug build (#2064) Howard Su 2023-07-04 02:43:55 +0800
  • ca9a11697c possibly slower, but cannot use larger batches without modifying ggml library. Concedo 2023-07-04 00:35:02 +0800
  • 889c0aaa9b fix index for ne03 value gotope 2023-07-03 23:32:37 +0800
  • 2f764b5e8e Add instructions for downloading weights to README Jonathan Allen Grant 2023-07-03 07:54:35 -0700
  • bfeb3471d7 fix typos Concedo 2023-07-03 21:36:42 +0800
  • a75ddde952 Change per comment Howard Su 2023-07-03 20:12:40 +0800
  • 55dbb915cc [llama] No need to check file version when loading vocab score (#2079) Howard Su 2023-07-03 19:58:58 +0800
  • fdbf3982e2 Fixed OpenCL offloading prints JohannesGaessler 2023-07-03 09:51:47 +0200
  • 26c4b23320 update for baichuan: Judd 2023-07-03 11:25:02 +0800
  • 73352bee66 Don't need check version here Howard Su 2023-07-03 09:51:39 +0800
  • b48bef8074 Fix style Howard Su 2023-07-03 09:22:45 +0800
  • 202ed75ad4 Fix abs() warning Evan Miller 2023-07-02 21:19:31 -0400
  • 98bbd73b69 fix server crashes Henri Vasserman 2023-07-03 03:37:49 +0300
  • a58c1ee863 Update model file name in examples/alpaca.sh tslmy 2023-07-02 15:41:03 -0700
  • d7d2e6a0f0 server: add option to output probabilities for completion (#1962) master-d7d2e6a WangHaoranRobin 2023-07-03 05:38:44 +0800
  • 24eeb97d13 Add bounds checking to matmul kernels, improve implementation, fix command buffers not freed properly 0cc4m 2023-07-02 22:11:58 +0200
  • f9c585f008 Generalize quantize_fns for simpler FP16 handling Stephan Walter 2023-04-29 19:46:37 +0200
  • 309534dcd0 implement sampler order, expose sampler order and mirostat in api Ycros 2023-07-02 18:15:34 +0000
  • e5e7183299 use const in methods mendax0110 2023-07-02 18:43:29 +0200
  • f713dd515d add /v1/ endpoints binding jwj7140 2023-07-03 00:50:46 +0900
  • 685d236d8b Add BPE dropout support, use it in training. Howard Su 2023-07-02 22:57:14 +0800
  • c3e3733c61 ROCm fixes Henri Vasserman 2023-07-02 15:51:31 +0300
  • 15db19ae7b Merge 'origin/master' into hipblas Henri Vasserman 2023-07-02 15:39:57 +0300
  • 7dcffd7a03 set n_keep to -1 jwj7140 2023-07-02 20:45:29 +0900
  • 377ecf9e9b fix bugs jwj7140 2023-07-02 20:17:03 +0900
  • 3d2907d208 make gptneox and gptj work with extended context too Concedo 2023-07-02 18:28:09 +0800
  • d6b47e6a5b Merge branch 'master' into concedo_experimental Concedo 2023-07-02 17:26:39 +0800
  • e17c8497cf switched to NTK aware scaling Concedo 2023-07-02 17:25:08 +0800
  • 9bfaf7ddd0 Merge branch 'ggerganov:master' into master m3ndax 2023-07-02 10:29:26 +0200
  • e19483ca0f increase scratch for above 4096 Concedo 2023-07-02 14:55:08 +0800
  • 46088f7231 ggml : fix build with OpenBLAS (close #2066) master-46088f7 Georgi Gerganov 2023-07-02 09:46:46 +0300
  • b85ea580d3 Merge branch 'master' into concedo_experimental Concedo 2023-07-02 14:45:25 +0800
  • da7d2f9587 Adjust Metal buffer allocation to avoid allocating beyond MTLDevice.recommendedMaxWorkingSetSize Kilty McGowan 2023-07-01 21:33:16 -0700
  • cc06f1171b Fix crash of test-tokenizer-0 under Debug build Howard Su 2023-07-01 22:37:26 +0800
  • cc3c86f6ea Merge pull request #9 from WangHaoranRobin/robin_fork_master WangHaoranRobin 2023-07-02 08:02:14 +0800
  • 71f829678a examples/common.h: put all bool variables in gpt_params together Wang Haoran(Robin) 2023-07-02 08:01:19 +0800
  • 1a70a80369 examples/common.h: put all bool variables in gpt_params together Wang Haoran(Robin) 2023-07-02 08:00:13 +0800
  • ad807731d9 Merge branch 'ggerganov:master' into master WangHaoranRobin 2023-07-02 07:54:40 +0800
  • adb97e8818 Merge branch 'ggerganov:master' into master m3ndax 2023-07-01 23:42:15 +0200
  • 0bc2cdfc87 Better CUDA synchronization logic (#2057) master-0bc2cdf Johannes Gäßler 2023-07-01 21:49:44 +0200
  • befb3a3562 Test-based VRAM scratch size + context adjustment (#2056) Johannes Gäßler 2023-07-01 21:47:26 +0200
  • b213227067 cmake : don't force -mcpu=native on aarch64 (#2063) Daniel Drake 2023-07-01 20:31:44 +0200
  • 2f8cd979ec metal : release buffers when freeing metal context (#2062) master-2f8cd97 Aaron Miller 2023-07-01 11:14:59 -0700
  • 471aab6e4c convert : add support of baichuan-7b (#2055) Judd 2023-07-02 01:00:25 +0800
  • ef3b8dc0d9 GPU accel for rwkv is slow, disable it Concedo 2023-07-02 00:41:46 +0800
  • e1a7042943 try out the new rwkv but it seems worse, may revert Concedo 2023-07-02 00:10:56 +0800
  • 463f2f4c4f llama : fix return value of llama_load_session_file_internal (#2022) Georgi Gerganov 2023-07-01 19:05:09 +0300
  • cb44dbc7de llama : catch llama_load_session_file_internal exceptions (#2022) Rand Xie 2023-07-02 00:02:58 +0800
  • 79f634a19d embd-input : fix returning ptr to temporary master-79f634a Georgi Gerganov 2023-07-01 18:46:00 +0300
  • 04606a1599 train : fix compile warning Georgi Gerganov 2023-07-01 18:45:44 +0300
  • b1ca8f36a9 ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default (#1995) Qingyou Meng 2023-07-01 23:42:43 +0800
  • 2353509b78 cmake: don't force -mcpu=native on aarch64 Daniel Drake 2023-07-01 09:12:16 +0200
  • 632bf27b65 more granular context size selections Concedo 2023-07-01 11:02:44 +0800
  • 1a3e8ad6db release metal buffers when freeing metal context Aaron Miller 2023-06-30 16:08:37 -0700
  • d412bbbcdc Merge branch 'ggerganov:master' into master m3ndax 2023-06-30 22:55:21 +0200
  • 94ba56184e Better CUDA synchronization logic JohannesGaessler 2023-06-30 19:19:43 +0200
  • 36cd5d85e9 Avoid requesting dedicated memory, VMA can decide that by itself 0cc4m 2023-06-30 21:20:19 +0200
  • 4ea9b2fd4b Add VMA library 0cc4m 2023-06-30 21:15:06 +0200
  • c8ff09bdc7 dequant_q4_0 kernel 0cc4m 2023-06-30 20:48:42 +0200
  • cb5cb4d6e2 Fix f16_to_f32 kernel 0cc4m 2023-06-30 20:14:11 +0200
  • df3cdbdac7 Output FP32 in fp16 matmul shader 0cc4m 2023-06-29 20:15:39 +0200
  • 40c8f843f2 Fix mulmat_f16 0cc4m 2023-06-29 20:04:36 +0200
  • c31e14b2fd Enable device extensions properly, restore fp16 matmul op 0cc4m 2023-06-29 06:46:17 +0200
  • fc5bb53b32 Code abstraction, FP16 implementation, fix kernel, add FP16 to FP32 kernel 0cc4m 2023-06-28 20:18:55 +0200
  • 3adc7b1d60 First FP16 attempt, disabled for now 0cc4m 2023-06-28 07:36:56 +0200
  • 2c70df985a Continue vulkan implementation and optimization 0cc4m 2023-06-25 15:17:23 +0200
  • 0c9cca00bd Write coalescing 0cc4m 2023-06-25 09:54:40 +0200
  • 7c6860b483 2D Blocktiling 0cc4m 2023-06-24 18:40:11 +0200
  • 1b4863c2b9 1D Blocktiling 0cc4m 2023-06-24 08:01:43 +0200
  • baf9ff536b GEMM Kernel optimization 0cc4m 2023-06-23 14:43:57 +0200
  • a42376e7ec First matmul success 0cc4m 2023-06-22 09:46:00 +0200
  • 8ce84c2747 Continue implementation 0cc4m 2023-06-21 00:26:48 +0200
  • 2471728a9d Add aligned malloc and free for VMA 0cc4m 2023-06-13 12:00:06 +0200
  • fc4f207cfb Matmul call 0cc4m 2023-06-12 09:57:26 +0200
  • b0e65855d1 Vulkan development 0cc4m 2023-06-12 08:01:38 +0200
  • a4004d4fa8 Vulkan memory management 0cc4m 2023-06-11 19:26:52 +0200
  • 88d4ec05a8 Continue implementation 0cc4m 2023-06-11 08:49:43 +0200
  • 4a96d0eb7f Fix matmul kernel, continue implementation 0cc4m 2023-06-10 16:24:37 +0200
  • 061246fb07 Vulkan loader code 0cc4m 2023-05-07 07:22:12 +0200
  • eda663f15f update lite and up ver Concedo 2023-07-01 00:15:26 +0800
  • 0cb8a9eab3 Merge remote-tracking branch 'Johannes/cuda-scratch-size-adjust' into concedo_experimental Concedo 2023-06-30 23:29:38 +0800
  • 67cb0b2760 Merge branch 'master' into concedo_experimental Concedo 2023-06-30 23:25:40 +0800
  • d16926dff4 Merge branch 'concedo' into concedo_experimental Concedo 2023-06-30 23:06:21 +0800
  • baf6325907 added flag for building kquants in tools Concedo 2023-06-30 23:06:11 +0800
  • 30ea774e2c Update CMakeLists.txt with dmmv_x/y/f16 (#277) YellowRoseCx 2023-06-30 09:52:32 -0500
  • 1129d66ca9 To fix build problem on Apple Metal LLAMA_METAL=1 (#282) bebopkim 2023-06-30 23:50:38 +0900
  • f0e1429d7f Implemented RMS_NORM niansa 2023-06-30 16:01:08 +0200
  • d1f84db4b6 Implemented GGML_OP_NORM niansa 2023-06-30 15:18:10 +0200
  • 8fa60134b1 Added missing break to mul_mat_f16 case niansa 2023-06-30 12:47:17 +0200
  • 0dc5f2f2ba Fixed mul mat dispatch size niansa 2023-06-30 12:31:13 +0200
  • f093bf2e5e Minor MUL_MAT fix and implemented DIAG_MASK_INF niansa 2023-06-30 12:19:29 +0200
  • 964fe8c546 Added mul_mat (needs fixes) niansa 2023-06-30 11:47:10 +0200
  • 600bf6d929 Test-based VRAM scratch size + context adjustment JohannesGaessler 2023-06-30 11:35:30 +0200
  • 8e215e4d9f add support of baichuan-7b Judd 2023-06-30 15:29:26 +0800
  • 86469d15c4 fix for yr-rocm, large gpu scratch Concedo 2023-06-30 12:40:08 +0800
  • dedd2067e8 convert: spike out xgen support Aman Karmani 2023-06-29 19:08:57 -0700
  • b95016c19b add newline jwj7140 2023-06-30 01:08:34 +0900
  • 1347d3acc0 another missing flag? Concedo 2023-06-30 00:02:18 +0800
  • 396f857021 make platform appropriate library Concedo 2023-06-29 23:50:48 +0800
  • f50c73a0b2 readme Concedo 2023-06-29 23:45:57 +0800