Commit Graph

  • beadbf3380 mpi : fix inference Georgi Gerganov 2023-07-09 18:26:20 +0300
  • ef37dd14e7 mpi : fix output tensor after MPI compute (still not working) Georgi Gerganov 2023-07-09 17:01:08 +0300
  • 8dd585e8cb Variable matmul kernel using specialization constants 0cc4m 2023-07-09 15:50:28 +0200
  • c717c5185f mpi : various fixes - communication now works but results are wrong Georgi Gerganov 2023-07-09 16:40:16 +0300
  • 01abb3b3b9 mpi : move all MPI logic into ggml-mpi Georgi Gerganov 2023-07-09 16:04:27 +0300
  • e339d35579 mpi : add names for layer inputs + prep ggml_mpi_graph_compute() Georgi Gerganov 2023-07-09 14:42:36 +0300
  • 3232db628c mpi : trying to move more MPI stuff into ggml-mpi (WIP) (#2099) Georgi Gerganov 2023-07-09 14:08:53 +0300
  • 3bc7a80ca6 Rework command buffer handling 0cc4m 2023-07-09 11:37:32 +0200
  • 1d16309969 llama : remove "first token must be BOS" restriction (#2153) master-1d16309 oobabooga 2023-07-09 05:59:53 -0300
  • db4047ad5c main : escape prompt prefix/suffix (#2151) master-db4047a Nigel Bosch 2023-07-09 03:56:18 -0500
  • 18780e0a5e readme : update Termux instructions (#2147) JackJollimore 2023-07-09 05:20:43 -0300
  • 3bbc1a11f0 ggml : fix buidling with Intel MKL but ask for "cblas.h" issue (#2104) (#2115) master-3bbc1a1 clyang 2023-07-09 16:12:20 +0800
  • 2492a53fd0 readme : add more docs indexes (#2127) rankaiyx 2023-07-09 15:38:42 +0800
  • 3d9871715a Remove "first token must be BOS" restriction oobabooga 2023-07-08 23:50:19 -0300
  • 83db5cffed Escape prompt prefix/suffix Nigel Bosch 2023-07-08 18:51:02 -0500
  • b90c80bdbf Add __restrict__ to dequantize_mul_mat kernels JohannesGaessler 2023-07-08 22:53:43 +0200
  • 0ef62f511a Fix validation errors, improve compatibility with AMD GPUs 0cc4m 2023-07-08 20:40:19 +0200
  • 64639555ff Fixed OpenLLaMA 3b CUDA mul_mat_vec_q (#2144) master-6463955 Johannes Gäßler 2023-07-08 20:01:44 +0200
  • a7ce53f763 Fixed OpenLLaMA 3b CUDA mul_mat_vec_q JohannesGaessler 2023-07-08 10:50:59 +0200
  • c7c761a2b7 Add split-k optimization for small matrix multiplication 0cc4m 2023-07-08 17:27:05 +0200
  • 325fc88141 Shift all values by the max value before applying logsoftmax Bach Le 2023-07-08 00:10:26 +0800
  • 8e66e59cdd Record sampling time in llama_sample_classifier_free_guidance Bach Le 2023-07-08 00:07:49 +0800
  • 66eb048470 Correct typo. CFG already means context-free grammar. Bach Le 2023-07-07 23:48:07 +0800
  • 422a7ffdaf Make Classifier-Free Guidance a sampling function Bach Le 2023-07-07 23:45:37 +0800
  • 114d4c5389 Make freeing of guidance_ctx conditional Bach Le 2023-07-07 23:10:47 +0800
  • 8f91b52fdf Free guidance context Bach Le 2023-07-07 22:50:42 +0800
  • 8ba5b137c8 Restore signature of llama_init_from_gpt_params Bach Le 2023-07-07 22:25:00 +0800
  • 478630019b Remove debug print Bach Le 2023-07-07 22:20:04 +0800
  • d09d5ed640 Initial implementation Bach Le 2023-07-07 21:35:46 +0800
  • 83ca507cfd Merge branch 'ggerganov:master' into master m3ndax 2023-07-08 15:34:22 +0200
  • b180adccc3 Update README.md JackJollimore 2023-07-08 09:45:47 -0300
  • 15576bc865 Merge branch 'kquant_vocab_fix' into concedo_experimental Concedo 2023-07-08 20:43:20 +0800
  • 1854168841 This allows LLAMA models that were previously incompatible with K quants to function mostly as normal. This happens when a model has a vocab != 32000, e.g 32001 which means it's not divisible by 256 or 64. Since the problematic dimensions only apply for tok_embeddings.weight and output.weight (dimentions 4096 x n_vocab), we can simply quantize these layers to Q8_0 whereas the majority of the hidden layers are still K-quanted since they have compatible dimensions. Concedo 2023-07-08 20:31:49 +0800
  • 4749543a88 fix ggml_tensor_extra_gpu memory leak eajechiloae 2023-07-08 15:08:05 +0300
  • e344540620 ci: add linux binaries to release build ci_cublas_linux-e344540 Green Sky 2023-05-05 00:01:30 +0200
  • 4e46673f80 Merge branch 'LostRuins:concedo' into concedo callMeMakerRen 2023-07-08 09:33:26 +0800
  • 061f5f8d21 CUDA: add __restrict__ to mul mat vec kernels (#2140) master-061f5f8 Johannes Gäßler 2023-07-08 00:25:15 +0200
  • c8abd83c55 CUDA: add __restrict__ to mul mat vec kernels JohannesGaessler 2023-07-07 13:51:31 +0200
  • 84525e7962 docker : add support for CUDA in docker (#1461) master-84525e7 dylan 2023-07-07 11:25:25 -0700
  • a7e20edf22 ci : switch threads to 1 (#2138) master-a7e20ed Georgi Gerganov 2023-07-07 21:23:57 +0300
  • 5d0e752724 Merge branch 'master' into feat/docker-cuda Georgi Gerganov 2023-07-07 21:23:38 +0300
  • bf7d02d965 ci : switch threads to 1 Georgi Gerganov 2023-07-07 21:11:36 +0300
  • 1d656d6360 ggml : change ggml_graph_compute() API to not require context (#1999) Qingyou Meng 2023-07-08 00:24:01 +0800
  • c15833c8d6 ggml : remove comments from source file and match order in header Georgi Gerganov 2023-07-07 19:13:26 +0300
  • 98d129cd06 Use angle brackets to indicate the system library clyang 2023-07-07 23:57:16 +0800
  • 8edcb337c6 added ability to select "all devices" Concedo 2023-07-07 23:37:55 +0800
  • 7242140283 ggml : remove sched_yield() call in ggml_graph_compute_thread() (#2134) master-7242140 Georgi Gerganov 2023-07-07 18:36:37 +0300
  • ddaa4f2a26 fix cuda garbage results and gpu selection issues Concedo 2023-07-07 22:14:14 +0800
  • 3e08ae99ce convert.py: add mapping for safetensors bf16 (#1598) Aarni Koskela 2023-07-07 16:12:49 +0300
  • ef61acfbf5 Add info to README Evan Miller 2023-07-07 09:02:23 -0400
  • 95eca51bef add gpu choice for GUI for cuda Concedo 2023-07-07 18:39:47 +0800
  • a689a66068 make it work with pyinstaller Concedo 2023-07-07 17:52:34 +0800
  • 9ee9a77f12 warn outdated GUI (+1 squashed commits) Concedo 2023-07-07 16:25:37 +0800
  • 32102c2064 Merge branch 'master' into concedo_experimental Concedo 2023-07-07 14:15:39 +0800
  • a3b4d93285 server: use proper Content-Type in curl examples Xiao-Yong Jin 2023-07-07 00:52:06 -0500
  • c3d947510b Optimize warptile matmul shader, replace blocktile with it 0cc4m 2023-07-07 07:13:47 +0200
  • 894c72819c Merge branch 'concedo' of https://github.com/callMeMakerRen/koboldcpp into concedo shutup 2023-07-07 11:57:25 +0800
  • 6d5a0ada8c Merge pull request #2 from SlyEcho/vulkan 0cc4m 2023-07-07 05:53:11 +0200
  • 1727e652f1 expose some useful info that can be used in statistics of performence shutup 2023-07-07 11:52:58 +0800
  • ea06a2c321 Disable glslc optimization for CMake 0cc4m 2023-07-07 05:52:33 +0200
  • 481f793acc Fix opencl by wrap #if-else-endif with \n (#2086) master-481f793 Howard Su 2023-07-07 11:34:18 +0800
  • 88910c30df Update README.md to add more docs indexes rankaiyx 2023-07-07 11:13:29 +0800
  • a728a0d185 llama: make MEM_REQ_EVAL depend on n_ctx Xiao-Yong Jin 2023-07-06 21:40:29 -0500
  • 5c6eed39ee llama: increase MEM_REQ_EVAL for MODEL_3B Xiao-Yong Jin 2023-07-03 21:31:34 -0500
  • 41819b0bd7 common: fix argument names in help Xiao-Yong Jin 2023-06-30 00:46:57 -0500
  • 1ae4318ddb ggml-metal: fix custom rope Xiao-Yong Jin 2023-06-30 00:45:34 -0500
  • dc0d0eb6a9 Implement customizable RoPE Xiao-Yong Jin 2023-06-29 23:16:04 -0500
  • b76e3d676e Update README.md to add more docs indexes rankaiyx 2023-07-07 10:42:59 +0800
  • 55207ba2b8 Add GH workflow, fix test Evan Miller 2023-07-06 21:40:18 -0400
  • 1f0a2cfeda Update CMakeLists.txt Evan Miller 2023-07-06 21:25:34 -0400
  • 06a239343c PR comments Evan Miller 2023-07-06 20:18:41 -0400
  • 32deabfdc8 Merge branch 'master' into mpi Evan Miller 2023-07-06 19:04:50 -0400
  • 58d663d327 hack in empty tokens for unknown vocab Aman Karmani 2023-07-06 14:08:32 -0700
  • f789f2cef2 llama : avoid unnecessary bool Georgi Gerganov 2023-07-06 21:54:04 +0300
  • 551ed08234 ggml : fix indentation in switch Georgi Gerganov 2023-07-06 21:35:22 +0300
  • 8dc7f104f8 ggml : remove obsolete assert + refactor n_tasks section Georgi Gerganov 2023-07-06 21:28:10 +0300
  • 9c9bdaf0b8 llama : fix duplicate symbols + refactor example benchmark Georgi Gerganov 2023-07-06 21:18:42 +0300
  • 8fdf86dd25 ci : fix env Georgi Gerganov 2023-07-06 21:15:17 +0300
  • 2d3a5252f9 llama : factor out plan stuff into a helper function Georgi Gerganov 2023-07-06 21:12:25 +0300
  • a67404e749 examples : factor out plan allocation into a helper function Georgi Gerganov 2023-07-06 21:08:25 +0300
  • 1b9994f809 ci : enable test-grad0 Georgi Gerganov 2023-07-06 20:57:12 +0300
  • 2392f7a9cd ggml : add ggml_graph_compute_with_ctx() Georgi Gerganov 2023-07-06 20:43:43 +0300
  • 8e1f0b6865 tests : disable grad / opt + minor naming changes Georgi Gerganov 2023-07-06 20:30:40 +0300
  • 4646cc2cf1 ggml : fix docs Georgi Gerganov 2023-07-06 20:25:27 +0300
  • 53cfb4b995 ggml : more consistent naming + metal fixes Georgi Gerganov 2023-07-06 20:23:08 +0300
  • dfd9fce6d6 ggml : fix restrict usage master-dfd9fce Georgi Gerganov 2023-07-06 19:41:31 +0300
  • 36680f6e40 convert : update for baichuan (#2081) master-36680f6 Judd 2023-07-07 00:23:49 +0800
  • a17a2683d8 alpaca.sh : update model file name (#2074) tslmy 2023-07-06 09:17:50 -0700
  • 8424a35c62 added the ability to ban any substring tokens Concedo 2023-07-06 23:24:21 +0800
  • 27a0907cfa backport MM256_SET_M128I to ggml_v2, updated lite, added support for selecting the GPU for cublas Concedo 2023-07-06 22:33:46 +0800
  • f607bd1217 Add new APIs Howard Su 2023-07-06 21:12:44 +0800
  • cd8a59be71 Fix opencl by wrap #if-else-endif with \n Howard Su 2023-07-04 07:18:56 +0800
  • 220aa707e6 Merge branch 'master' into concedo_experimental Concedo 2023-07-06 15:40:40 +0800
  • 4d1700b172 adjust some ui sizing Concedo 2023-07-06 15:17:47 +0800
  • 1c80002310 New UI using customtkinter (#284) Vali-98 2023-07-06 15:00:57 +0800
  • b1331d7e60 reusable buffers mqy 2023-07-04 20:38:46 +0800
  • cb1dec0ec0 minor: update comments mqy 2023-07-03 23:58:31 +0800
  • 2b502c32ca add static ggml_graph_compute_sugar() mqy 2023-07-03 20:28:07 +0800
  • db81f33ef2 remove ggml_graph_compute from tests/test-grad0.c, but current change breaks backward mqy 2023-07-03 18:10:00 +0800
  • a37de23953 minor: rename ctx as plan; const mqy 2023-07-03 16:22:52 +0800