Commit Graph

  • 4caebf6d40 gitignore : vdot Georgi Gerganov 2023-04-18 23:00:08 +0300
  • dcdd65e296 ggml : optimize ggml_vec_dot_q4_0_q8_0() using vectorized accumulators master-dcdd65e Georgi Gerganov 2023-04-18 22:59:17 +0300
  • 7840f6637c ggml : use 8-bit precision for Q4_1 intermediate results (ARM) Georgi Gerganov 2023-04-18 22:12:19 +0300
  • 5ecff35151 Adding a simple program to measure speed of dot products (#1041) master-5ecff35 Kawrakow 2023-04-18 21:00:14 +0200
  • e8061e6990 Merge 3dc5243b1b into 7faa7460f0 Jan Bielak 2023-04-18 20:35:14 +0200
  • 5725eec429 Update CMakeLists.txt 源文雨 2023-04-19 01:10:49 +0800
  • 7faa7460f0 readme : update hot topics about new LoRA functionality Georgi Gerganov 2023-04-18 20:10:26 +0300
  • 5af8e32238 ci : do not run on drafts master-5af8e32 Georgi Gerganov 2023-04-17 18:00:10 +0300
  • d7c53a084e Update CMakeLists.txt 源文雨 2023-04-19 00:45:50 +0800
  • fdb55c9a01 Update CMakeLists.txt 源文雨 2023-04-19 00:45:29 +0800
  • 72cd433066 ggml : test dot product q4_0 x f32 Georgi Gerganov 2023-04-18 19:20:37 +0300
  • 4440d198c0 Add NVIDIA cuBLAS support Slaren 2023-04-18 18:16:27 +0200
  • d4dd743d6f fix: ld link test-tokenizer-0 error 源文雨 2023-04-19 00:13:59 +0800
  • baee7684df Adding a POC dot product for Q4_1 quantization Iwan Kawrakow 2023-04-18 17:24:45 +0200
  • 42031dac73 Adding a simple program to measure speed of dot products Iwan Kawrakow 2023-04-18 16:14:01 +0200
  • f39def81d4 Update readme with more info Concedo 2023-04-18 21:44:26 +0800
  • 3614956bc7 update readme Concedo 2023-04-18 21:39:05 +0800
  • ea01771dd5 rwkv is done Concedo 2023-04-18 20:55:01 +0800
  • 391c5a247a remove redundant free memory call. wbpxre150 2023-04-18 19:11:59 +0800
  • a76b15b581 Merge branch 'concedo' into concedo_experimental Concedo 2023-04-18 17:42:43 +0800
  • ed5b5c45a9 doc - enhanced readme explaing how to compile at Windows. (#80) Gustavo Rocha Dias 2023-04-18 06:40:04 -0300
  • a9253cdfba fix - at some OSs the PyInstaller command is case sensitive, at lowercase it doen't work. (#81) Gustavo Rocha Dias 2023-04-18 06:39:06 -0300
  • ac61e34d5f Merge branch 'master' into concedo_experimental Concedo 2023-04-18 17:38:10 +0800
  • c200b674f4 updated kobold lite, work on rwkv, added exe path to model load params, added launch parameter Concedo 2023-04-18 17:36:44 +0800
  • 42747220b4 Do not close file after mmap (Windows version) (#1034) master-4274722 Ivan Komarov 2023-04-18 03:15:50 +0200
  • bf1a24aceb Do not close file after mmap (Windows version) Ivan Komarov 2023-04-18 02:19:43 +0200
  • d1f02102f8 examples : evaluate tokens in batches after swapping context grencez 2023-04-17 13:01:17 -0700
  • e9298af389 readme : add Ruby bindings (#1029) Atsushi Tatsuma 2023-04-18 04:34:35 +0900
  • d1b51ceb8e ggml : minor Georgi Gerganov 2023-04-17 21:59:58 +0300
  • 4ad73137a1 add 4_0 to default outfile namestr dict (#1031) Cameron 2023-04-17 11:26:23 -0700
  • 2299b0a5f5 Use VLAs instead of alloca, if possible. Olaf Seibert 2023-04-12 20:37:41 +0200
  • 7905368223 --amend Cammy 2023-04-17 11:12:25 -0700
  • 74e28db50f add 4_0 to default outfile namestr dict Cammy 2023-04-17 09:11:14 -0700
  • a8592fbc11 ggml : explicit assignment of deltas Georgi Gerganov 2023-04-17 18:41:14 +0300
  • 315a95a4d3 Add LoRA support (#820) master-315a95a slaren 2023-04-17 17:28:55 +0200
  • 9d0a3a71b0 readme : add Ruby bindings yoshoku 2023-04-18 00:15:20 +0900
  • 0f5ee9e13f ci : do not run on drafts Georgi Gerganov 2023-04-17 18:00:10 +0300
  • 48ab0963ae No need to copy tokens Howard Su 2023-04-17 23:07:30 +0800
  • 7da998da01 ggml : initial ARM_NEON 2x F16 Q4_0 implementation Georgi Gerganov 2023-04-17 16:17:06 +0300
  • 331343ab0e Make reverse prompt option act as a stop token in non-interactive scenarios Jason McCartney 2023-04-17 07:51:31 -0700
  • 6a4fa4d95a Speedup the AVX-512 implementation of ggml_vec_dot_q4_0() Ivan Komarov 2023-04-13 23:23:01 +0200
  • efd05648c8 llama : well-defined static initialization of complex objects (#927) master-efd0564 Arik Poznanski 2023-04-17 17:41:53 +0300
  • eb17a026fd quantize-stats : fix bug in --type argument master-eb17a02 Georgi Gerganov 2023-04-17 17:31:06 +0300
  • 8e923dc6e9 updated kobold lite Concedo 2023-04-17 21:33:57 +0800
  • 69b740289f ggml : avoid using ggml_fp16_to_fp32() and ggml_fp32_to_fp16() in ggml.c master-69b7402 Georgi Gerganov 2023-04-17 16:16:23 +0300
  • f266259ad9 Speedup the AVX-512 implementation of ggml_vec_dot_q4_0() (#933) master-f266259 Ivan Komarov 2023-04-17 15:10:57 +0200
  • c6b1e55b54 Tailing white spaces mqy 2023-04-17 19:36:17 +0800
  • 1f4a69c051 version number api Concedo 2023-04-17 19:31:15 +0800
  • 29a49c653e try fix windows compile error: pthread_mutext_t and others mqy 2023-04-17 18:59:13 +0800
  • 481c6d517b Fixed a segment fault when n_threads == 1 and GGML_GLOBAL_THREADS is defined mqy 2023-04-17 18:40:54 +0800
  • f57ff961fd try fix windows compile error: atomic_flag mqy 2023-04-17 18:20:54 +0800
  • 364e2736c9 Merge branch 'master' into concedo Concedo 2023-04-17 17:34:50 +0800
  • 763ad172c0 arranged files, updated kobold lite, modified makefile for extra link args on linux, started RWKV implementation Concedo 2023-04-17 17:31:45 +0800
  • 3dc5243b1b Incorporate feedback from @howard0su Jan Bielak 2023-04-17 10:14:29 +0200
  • cba49ad48b try fix windows compile errors: undefined c11 atomic_flag_*() mqy 2023-04-17 15:59:06 +0800
  • f0ccde172a Add command mode to interactive mode. wbpxre150 2023-04-17 15:21:31 +0800
  • a1ee3b4e47 print timings on ctrl+c exit wbpxre150 2023-04-17 15:04:35 +0800
  • 6b515403c8 threading: preemptive, local/global mqy 2023-04-17 14:28:39 +0800
  • c770e0145f Merge remote-tracking branch 'upstream/master' into eval-thread-count ml6 2023-04-16 14:03:18 -0700
  • 47f61aaa5f Fix: do not close file on mmap (#1017) master-47f61aa slaren 2023-04-16 21:27:38 +0200
  • 9331fad93b Fix: do not close file on mmap Slaren 2023-04-16 20:38:52 +0200
  • 8d37db3cdf ggml_add: Add more checks Slaren 2023-04-16 18:53:44 +0200
  • 0a6d5ad7cc Reuse definitions from convert.py Slaren 2023-04-16 18:52:22 +0200
  • 63da54e016 Only attempt to use mmap for the lora base model if it is supported Slaren 2023-04-16 18:30:27 +0200
  • 3df343b4f0 ggml_cpy: use the work buffer instead of alloca when quantizing Slaren 2023-04-15 20:29:05 +0200
  • 14858ba2bf Show warning when using a quantized base model Slaren 2023-04-15 20:11:07 +0200
  • fc89916002 Fix windows build Slaren 2023-04-15 19:54:56 +0200
  • c150e1b0c3 Add support for using a different base model Slaren 2023-04-15 19:45:00 +0200
  • 57627f0e5f Rebase to master Slaren 2023-04-13 18:06:33 +0200
  • c45868ba9f Support more layer types, fix memory and generation issues Slaren 2023-04-11 23:15:29 +0200
  • c920f00136 Add compatibility with #801 Slaren 2023-04-10 21:52:10 +0200
  • 87c518bb3d Update exporter and support scaling Slaren 2023-04-08 19:39:24 +0200
  • dc65707130 Use the work buffer instead to fix MSVC build Slaren 2023-04-08 13:41:57 +0200
  • 7136adac8a Add support for quantized models Slaren 2023-04-08 13:12:44 +0200
  • ac3fbe492a Export lora A matrix pre-transposed Slaren 2023-04-08 03:37:12 +0200
  • f52101e889 Add lora support Slaren 2023-04-06 23:18:59 +0200
  • 1506737499 Add mmap pages stats (disabled by default) mmap-pages-stats Pavol Rusnak 2023-04-16 18:19:50 +0200
  • 9581171a9f updated embedded lite again Concedo 2023-04-16 22:42:51 +0800
  • df2d350c00 minor jon-chuang 2023-04-16 22:10:26 +0800
  • bee6a401fd slight clarity fix Concedo 2023-04-16 22:04:19 +0800
  • 96fb12cfa2 Merge branch 'master' into concedo Concedo 2023-04-16 21:59:05 +0800
  • c757fbee1d fixes to stopper tokens, fixed BLAS mode for GPT2 and GPTJ, updated kobold lite Concedo 2023-04-16 21:54:18 +0800
  • 9ee4719ee9 fix windows jon-chuang 2023-04-16 21:19:34 +0800
  • 6548d3b3fb Added prints for stopping sequences, made makefile 1% friendlier to arch linux users Concedo 2023-04-16 20:43:17 +0800
  • 3173a62eb9 stdout : vertical align outputs for better readibility master-3173a62 Georgi Gerganov 2023-04-16 13:58:48 +0300
  • 525184930d added a kobold API compatible implementation of stopping sequences Concedo 2023-04-16 18:37:49 +0800
  • 489537e6cf examples: add missing <ctime> include for time() (#1011) master-489537e Pavol Rusnak 2023-04-16 12:13:00 +0200
  • 038e9a3b6d examples: add missing <ctime> include for time() Pavol Rusnak 2023-04-16 11:16:08 +0200
  • 2d3481c721 Fix msys2 build error and warnings (#1009) master-2d3481c nanahi 2023-04-16 17:13:42 +0800
  • 8bf2e50a11 converted the cl file to be a string literal instead Concedo 2023-04-16 15:57:30 +0800
  • 5a4d1b5d15 Merge branch 'master' into concedo Concedo 2023-04-16 14:08:23 +0800
  • 8e9b093a4a Fix msys2 build error and warnings nanahi 2023-04-16 10:38:32 +0800
  • eea154c9c8 project updated Paulo Coutinho 2023-04-15 21:32:10 -0300
  • e95a8336d5 test-quantize: fix for q8_0 intermediates Håkon H. Hitland 2023-04-16 00:37:16 +0200
  • 6071228818 test-quantize: remove Håkon H. Hitland 2023-04-14 23:19:36 +0200
  • 8bd7dd64ba test-quantize-fns: CI fixes Håkon H. Hitland 2023-04-14 23:15:13 +0200
  • ebee501cca Unit test for quantization functions Håkon H. Hitland 2023-04-14 00:34:33 +0200
  • 74f5899df4 convert.py: Fix loading safetensors and ggml format on Windows (#991) comex 2023-04-15 14:53:21 -0700
  • 1a6c8cf72c remove color jon-chuang 2023-04-16 02:47:32 +0800
  • 2f7c8e014e Fix potential int8 overflow in non-SIMD vec_dot (#986) master-2f7c8e0 Stephan Walter 2023-04-15 18:28:56 +0000