Commit Graph

  • 5dc35d3b59 Disable _O_WTEXT when using main in MinGW asctime 2023-06-17 02:07:18 +1000
  • dd0ff3a5f4 metal : fix prints for overlapping views Georgi Gerganov 2023-06-16 18:57:20 +0300
  • 71443d757e metal : print more verbose device info + handle errors Georgi Gerganov 2023-06-16 18:53:18 +0300
  • 741c19a756 Maybe libggml has to be static KerfuffleV2 2023-06-16 09:40:59 -0600
  • 12156fb4a8 metal : handle buffers larger than device's maxBufferLength Georgi Gerganov 2023-06-12 22:06:57 +0300
  • 48f17f2ff8 Allow cmake to build ggml as a library KerfuffleV2 2023-06-16 09:16:54 -0600
  • 3778836046 Work in progress. Added falcon main and library based on llama.cpp CPU inference works (getting ~260ms/token on 7B 16 bit falcon) Tested with 7B 16 bit and the two shakespear models (both in 16 bit precisiononly) John 2023-06-16 16:31:02 +0200
  • c341915b11 Fixed a mismatch format FrankHB 2023-06-16 22:22:43 +0800
  • d50bfb2630 Fixed possible macro redefinition FrankHB 2023-06-16 22:19:53 +0800
  • 1170a95732 Fixed embd when offloading non-repeating layers JohannesGaessler 2023-06-16 15:34:46 +0200
  • 7b737917d1 Fix build and remove warning Howard Su 2023-06-16 21:09:45 +0800
  • fb49c05351 Merge branch 'ggerganov:master' into master Randall Fitzgerald 2023-06-16 07:54:45 -0400
  • 31b20758c8 Add LLAMA_CUDA_KQUANTS_ITER to CMakeLists.txt and Makefile Iwan Kawrakow 2023-06-16 12:55:50 +0300
  • 3edee085ea Imrove Q6_K dot kernel on older GPUs Iwan Kawrakow 2023-06-16 12:43:20 +0300
  • 7ef8d740b9 Merge branch 'master' into concedo_experimental Concedo 2023-06-16 16:37:14 +0800
  • ae88eec40b updated lite Concedo 2023-06-16 16:27:23 +0800
  • 7ced197127 Imrove Q2_K dot kernel on older GPUs Iwan Kawrakow 2023-06-16 11:27:09 +0300
  • e828c0e52a build : fix and ignore MSVC warnings Borislav Stanimirov 2023-06-16 10:18:17 +0300
  • 602c748863 gitignore : add several entries specific to Visual Studio (#1888) Borislav Stanimirov 2023-06-16 09:58:11 +0300
  • 576c8e6186 gitignore: add several entries specific to Visual Studio Borislav Stanimirov 2023-06-16 09:07:51 +0300
  • 4eff089421 examples : add JSON schema grammars Evan Jones 2023-06-10 00:08:05 -0400
  • 8cc5bed078 fix: add auto detection on the BLAS_INCLUDE_DIRS zenix 2023-06-16 13:19:56 +0900
  • 488c62acf9 Merge remote-tracking branch 'upstream/master' Randall Fitzgerald 2023-06-15 17:12:29 -0400
  • a09f9195be Fixed CUDA runtime version check (#1879) master-a09f919 Johannes Gäßler 2023-06-15 21:49:08 +0200
  • bed9275617 cmake : remove whitespaces master-bed9275 Georgi Gerganov 2023-06-15 21:56:50 +0300
  • c36e81da62 examples : add chat-vicuna.sh (#1854) master-c36e81d yangli2 2023-06-15 11:05:53 -0700
  • dd13ec96ed Fixed CUDA runtime version check JohannesGaessler 2023-06-15 19:55:36 +0200
  • 3559433fec cmake : set include path for OpenBlas (#1830) master-3559433 Igor Okulist 2023-06-15 12:51:26 -0500
  • 69b34a0e80 swift : Package compile breaks due to ggml-metal.metal (#1831) master-69b34a0 Frederik Vogel 2023-06-16 02:47:04 +0900
  • cf267d1c71 make : add train-text-from-scratch (#1850) master-cf267d1 daboe01 2023-06-15 19:42:48 +0200
  • 4af9a7d6d9 Merge branch 'master' into finetuning-acessability Georgi Gerganov 2023-06-15 20:42:26 +0300
  • 9a60bbe8de Update examples/train-text-from-scratch/README.md Georgi Gerganov 2023-06-15 20:41:58 +0300
  • 9dda13e5e1 readme : server compile flag (#1874) Srinivas Billa 2023-06-15 18:36:38 +0100
  • 37e257c48e make : clean *.so files (#1857) master-37e257c sandyiscool 2023-06-15 23:06:06 +0530
  • 64cc19b4fe Fix the validation of main device (#1872) master-64cc19b Howard Su 2023-06-16 01:29:59 +0800
  • 4bfcc855ab metal : parallel command buffer encoding (#1860) master-4bfcc85 Georgi Gerganov 2023-06-15 20:29:48 +0300
  • ef43a62289 metal : determine number of command buffers based on gf->n_threads Georgi Gerganov 2023-06-15 20:26:56 +0300
  • 6b8312e797 Better error when using both LoRA + GPU layers (#1861) master-6b8312e Johannes Gäßler 2023-06-15 19:06:46 +0200
  • 0971f83bca added eos token id handling for starcoder models, as they use a different EOS ID Concedo 2023-06-15 22:57:14 +0800
  • 2d5e8b2ca0 Add newline at end of file Howard Su 2023-06-15 21:39:21 +0800
  • 013992e280 Update README.md Srinivas Billa 2023-06-15 14:38:20 +0100
  • 77ab0c0f3d Fix embedding when embedding layer on GPU Howard Su 2023-06-15 21:33:58 +0800
  • 4e2e286cca Fix the validation of main device Howard Su 2023-06-15 21:31:12 +0800
  • 8a2a73102c Fixes CMake style to use lowercase like everywhere else Jeremy Dunn 2023-06-15 08:28:58 -0500
  • 6b764ee72f Merge pull request #2 from ggerganov/master l3utterfly 2023-06-15 21:22:51 +0800
  • eff0834249 Merge branch 'ggerganov:master' into master yangli2 2023-06-15 06:19:39 -0700
  • 3649d35cca Merge branch 'master' into concedo_experimental Concedo 2023-06-15 18:24:31 +0800
  • aee859519e Update README.md Randall Fitzgerald 2023-06-15 01:50:54 -0700
  • 6a113eeec8 Merge branch 'concedo' into concedo_experimental Concedo 2023-06-15 14:47:32 +0800
  • b1b8dc32c9 Fix Makefile for CUBLAS. (#241) Ycros 2023-06-15 16:46:47 +1000
  • 23710144dc fixed clean target daboe01 2023-06-15 07:02:00 +0200
  • 8ff35ef944 updated lite Concedo 2023-06-15 12:13:55 +0800
  • 414f25104b remove swp file Evan Jones 2023-06-15 00:13:13 -0400
  • 58ca9bc6c0 adjust JSON grammar Evan Jones 2023-06-15 00:06:54 -0400
  • b876d19cff fix bugs with empty token and EOS Evan Jones 2023-06-14 23:53:55 -0400
  • 421c6e1ca1 support alternates in root rule Evan Jones 2023-06-14 23:53:12 -0400
  • 50537b471d exposed modules so that they can be invoked by nix run github:ggerganov/llama.cpp#server etc Faez Shakil 2023-06-15 01:49:03 +0500
  • f858cd64d4 Merge remote-tracking branch 'upstream/master' Randall Fitzgerald 2023-06-14 16:48:01 -0400
  • 5e107c2aac Merge pull request #24 from anon998/logit-bias Randall Fitzgerald 2023-06-14 16:27:43 -0400
  • 61df8e9217 add cudaMemset Henri Vasserman 2023-06-14 22:46:10 +0300
  • a836529996 Merge 'origin/master' into hipblas Henri Vasserman 2023-06-14 22:41:55 +0300
  • fea717efa8 Better error when using both LoRA + GPU layers JohannesGaessler 2023-06-14 21:21:28 +0200
  • dc67f1a06e cuda : faster k-quant dot kernels Iwan Kawrakow 2023-06-14 22:14:21 +0300
  • 1556bbb6a3 block_q5_k const hoist Steven Roussey 2023-06-14 12:02:40 -0700
  • 69bae5d277 metal : parallel command buffer encoding Georgi Gerganov 2023-06-14 21:01:48 +0300
  • 254a7a7a5f CUDA full GPU acceleration, KV cache in VRAM (#1827) master-254a7a7 Johannes Gäßler 2023-06-14 19:47:19 +0200
  • bd81096927 fix typo in readme + don't ignore integers anon 2023-06-14 13:29:05 -0300
  • 2048c061d5 Update Makefile to clean *.so files too. Sandeep 2023-06-14 21:42:09 +0530
  • b783da97a6 Fixed LLAMA_CUDA_DMMV_Y > 1 for WizardLM JohannesGaessler 2023-06-14 16:36:46 +0200
  • 546f850796 Update examples/server/server.cpp Henri Vasserman 2023-06-14 17:41:58 +0300
  • 6f54ad042b fixed: model path was wrong daboe01 2023-06-14 14:26:07 +0200
  • 3ed3e7b7e2 reverted sequence mode for rwkv due to multiple issues with speed loss with bigger quantized models Concedo 2023-06-14 20:03:14 +0800
  • addd592828 fixed: naming of binary daboe01 2023-06-14 12:42:12 +0200
  • 1e9980e20a fixed: name of executable was wrong daboe01 2023-06-14 12:30:07 +0200
  • 7fff8782e3 fixed: targed was in wrong line daboe01 2023-06-14 12:27:35 +0200
  • 09f0a94519 make finetuning example accessible daboe01 2023-06-14 12:25:49 +0200
  • 9af5ab5c1e Add an example script that works with the Vicuna model. Adding a small bit of documenting comment in llama.h. Yang Li 2023-06-14 01:09:21 -0700
  • 8f65eecf20 typo and comments simple.cpp SuperUserNameMan 2023-06-14 09:33:31 +0200
  • 9d2f4a8000 Used local copy of CLBlast instead of installed one l3utterfly 2023-06-14 15:09:05 +0800
  • 7a4f712a29 removed trailing white spaces simple.cpp SuperUserNameMan 2023-06-14 08:58:18 +0200
  • f83b66606b Merge branch 'concedo' into concedo_experimental Concedo 2023-06-14 11:50:24 +0800
  • 443903fa0f up ver with these minor improvements Concedo 2023-06-14 11:50:13 +0800
  • ce36167976 fix Fix the link on the Mac platform OpenCL method (#227) tqcq 2023-06-14 11:41:39 +0800
  • f5247be0d7 Merge branch 'master' into concedo_experimental Concedo 2023-06-14 11:35:43 +0800
  • 2b4a286e56 Merge remote-tracking branch 'occam/kquant-opencl' into concedo_experimental Concedo 2023-06-14 11:34:53 +0800
  • e4265198ed added cublas back into the makefile as some people requested Concedo 2023-06-14 11:34:40 +0800
  • a47072b85d Cleaned up code, added comments JohannesGaessler 2023-06-14 00:00:53 +0200
  • 51830ee5e6 Fixed Windows performance JohannesGaessler 2023-06-13 22:37:17 +0200
  • 222c679842 Move a repeated calc to const Steven Roussey 2023-06-13 12:51:08 -0700
  • d9f38465b7 ci: add linux binaries to release build ci_cublas_linux-d9f3846 Green Sky 2023-05-05 00:01:30 +0200
  • dba14529de Added a --low-vram option JohannesGaessler 2023-06-13 21:40:33 +0200
  • 9254920265 baby-llama : fix operator!= (#1821) master-9254920 0xspringtime 2023-06-13 15:37:54 -0400
  • e32089b2c2 train : improved training-from-scratch example (#1652) master-e32089b xaedes 2023-06-13 21:04:40 +0200
  • d4b6438708 ci : re-enable workflows + add README for training Georgi Gerganov 2023-06-13 21:38:00 +0300
  • 6075d7862d Merge pull request #23 from anon998/fix-linter-warnings Randall Fitzgerald 2023-06-13 14:32:19 -0400
  • 7a48ade7ef fix comment indentation anon 2023-06-13 14:46:40 -0300
  • c369d11905 remove 273: Trailing whitespace SuperUserNameMan 2023-06-13 19:36:27 +0200
  • 7df316b728 fix linter warnings + make variables const anon 2023-06-13 14:28:52 -0300
  • 575cf23862 remove json_indent variable anon 2023-06-13 14:21:40 -0300
  • 2347e45e7b llama : do a warm-up eval at start for better timings (#1824) master-2347e45 Georgi Gerganov 2023-06-13 20:20:07 +0300