Commit Graph

  • e4903957ec Add vectorized loading and zeropadding for matrix multiplication 0cc4m 2023-07-19 10:13:51 +0200
  • 63ba9f3306 llama : make tensor_split ptr instead of array Georgi Gerganov 2023-07-19 10:25:41 +0300
  • 294f424554 llama : extend API to get max devices at runtime (#2253) master-294f424 Rinne 2023-07-19 15:06:40 +0800
  • 45a1b07e9b flake : update flake.nix (#2270) master-45a1b07 wzy 2023-07-19 15:01:55 +0800
  • b1f4290953 cmake : install targets (#2256) master-b1f4290 wzy 2023-07-19 15:01:11 +0800
  • 3eefb221b0 Update flake.nix Wu Zhenyu 2023-07-19 14:12:53 +0800
  • 0d7240b320 modified rope for cuda Concedo 2023-07-19 14:16:27 +0800
  • 8d37755bdc add inverse char ranges Evan Jones 2023-07-18 21:54:44 -0400
  • 295f85654a allocators wip; renamed ggml_backend functions; changed ggml_buffer and ggml_backend to always be used as pointers; rename ggml_tensor::params -> op_params slaren 2023-07-17 19:03:51 +0200
  • 374fffb9c6 Reworking rope WIP Concedo 2023-07-19 00:54:41 +0800
  • 63ec354ad1 Fix #2252, add install() to CMakeLists.txt Wu Zhenyu 2023-07-18 15:51:06 +0800
  • ed960fa1ab llama : separate compute buffer for metal Georgi Gerganov 2023-07-18 19:19:59 +0300
  • 652c849643 ggml : add is_ram_shared to ggml_backend Georgi Gerganov 2023-07-18 18:51:02 +0300
  • 90503f150d llama : init metal backend as CPU backend for now Georgi Gerganov 2023-07-18 17:52:13 +0300
  • 0a3861c47b metal : adapting to ggml_backend (WIP) Georgi Gerganov 2023-07-18 16:54:41 +0300
  • 0a11f50da8 reenabled sched_yield, reduced sampler warning msg to once per session Concedo 2023-07-18 20:26:18 +0800
  • d01bccde9f ci : integrate with ggml-org/ci (#2250) master-d01bccd Georgi Gerganov 2023-07-18 14:24:43 +0300
  • 6d32e7fc8b Merge commit 'a6803cab946c817fb7aaf2a40b317f5d3e373bd1' into concedo_experimental Concedo 2023-07-18 19:12:06 +0800
  • 775fb18857 ci : update README Georgi Gerganov 2023-07-18 13:58:13 +0300
  • 37855781fd updated runtimes to henky version Concedo 2023-07-18 18:48:54 +0800
  • 1f3512de68 ppl : add --chunks argument to limit max number of chunks Georgi Gerganov 2023-07-18 13:48:00 +0300
  • da4a773cbc ci : add README.md Georgi Gerganov 2023-07-18 13:30:26 +0300
  • 9e8392a0c0 ci : add short perplexity tests Georgi Gerganov 2023-07-18 12:55:21 +0300
  • fd90d52127 API: Replace modelbusy bool with a lock. Ycros 2023-07-18 20:09:50 +1000
  • 3d90f9f166 ci : add K-quants Georgi Gerganov 2023-07-18 11:47:45 +0300
  • a404142aec tests : try to fix tail free sampling test Georgi Gerganov 2023-07-17 17:35:01 +0300
  • d7d1828613 ci : add open llama 3B-v2 tg tests for q4 and q5 quantizations Georgi Gerganov 2023-07-17 17:13:11 +0300
  • 5fd6650af4 ci : disable wget progress output Georgi Gerganov 2023-07-17 17:03:41 +0300
  • 68d4dd301d ci : add open llama 3B-v2 tests Georgi Gerganov 2023-07-17 16:46:56 +0300
  • d2c3214a1a ci : run ctest Georgi Gerganov 2023-07-17 16:25:50 +0300
  • 6cbf9dfb32 llama : shorten quantization descriptions master-6cbf9df Georgi Gerganov 2023-07-18 11:50:49 +0300
  • 64b8aafce1 support bpe tokenizer in convert, fix ldwang 2023-07-18 11:18:12 +0800
  • 3db70b5f0a Merge 'origin/master' into hipblas Henri Vasserman 2023-07-18 01:54:17 +0300
  • 8d351b8bd8 Merge upstream changes, fix conflict 0cc4m 2023-07-17 22:14:22 +0200
  • 7568d1a2b2 Support dup & cont ops on CUDA (#2242) master-7568d1a Jiahao Li 2023-07-18 01:39:29 +0800
  • f0a8ba0414 Add an api to get max devices at runtime. Yaohui Liu 2023-07-17 22:59:54 +0800
  • 1102ff56db fix double-free with --no-mmap slaren 2023-07-17 12:00:17 +0200
  • 4e94af3060 improve layer backend printing with ranges slaren 2023-07-17 11:53:01 +0200
  • c2beeb8e3a only allocate as much memory as is required in each backend for the model slaren 2023-07-17 11:18:19 +0200
  • 4088df14ca metal: update rms_norm kernel lshzh-ww 2023-07-16 22:28:59 -0400
  • 2c9385289e More changes Howard Su 2023-07-17 09:47:31 +0800
  • b7647436cc llama : fix t_start_sample_us initialization warning (#2238) master-b764743 Alex Klinkhamer 2023-07-16 14:01:45 -0700
  • 1ea010a5c3 llama : fix t_start_sample_us initialization warning grencez 2023-07-16 13:57:17 -0700
  • 672dda10e4 ggml : fixed runtime bugs and compile errors related to GGML_PERF and GGML_DEBUG (#2219) master-672dda1 Qingyou Meng 2023-07-17 03:57:28 +0800
  • 27ab66e437 py : turn verify-checksum-models.py into executable (#2245) Jiří Podivín 2023-07-16 21:54:47 +0200
  • 36ffa16130 Turning verify-checksum-models.py into executable Jiri Podivin 2023-07-16 21:40:53 +0200
  • 6035abe170 Setting new target for test binaries Jiri Podivin 2023-07-16 17:46:37 +0200
  • 9c72e7e916 rebase to master (except ggml-cuda) slaren 2023-07-16 14:36:32 +0200
  • 33ab185dd1 fix NVCC version on Makefile, __halves2half2 -> make_half2 slaren 2023-07-16 00:20:43 +0200
  • 24cc6f008f minor fixes slaren 2023-07-15 19:04:37 +0200
  • 5765d7a587 restore simple.cpp for now slaren 2023-07-15 12:44:47 +0200
  • 0d2b66c638 ggml backend interface wip slaren 2023-07-10 17:32:06 +0200
  • 929ae2017f Support dup & cont ops on CUDA lijiahao 2023-07-16 19:37:14 +0800
  • 362da7b310 cmake : fix server example building on MSYS2 Przemyslaw Pawelczyk 2023-07-16 10:52:24 +0200
  • 931a8921de Fix F32 matmul 0cc4m 2023-07-16 07:55:53 +0200
  • cdba17d262 chore (ci): remove release from ci Hunter LaTourette 2023-07-15 22:12:53 -0400
  • b477e17406 docs (gh): remove ISSUE_TEMPLATE Hunter LaTourette 2023-07-15 22:12:35 -0400
  • ff29017f48 chore (*): remove pocs Hunter LaTourette 2023-07-15 21:41:56 -0400
  • 656449e4ea make : fix embdinput library and server examples building on MSYS2 Przemyslaw Pawelczyk 2023-07-16 01:25:56 +0200
  • 6bfbdf84ce fix NVCC version on Makefile, __halves2half2 -> make_half2 slaren 2023-07-16 00:20:43 +0200
  • 88c88778ad Fix macro expansion on gcc grahameth 2023-07-15 23:02:53 +0200
  • f58fa51fd0 Increase matmul test runs for consistent results 0cc4m 2023-07-15 22:46:24 +0200
  • 2dfe0aefc6 add log_callback to llama_context_params for custom logging. grahameth 2023-07-15 20:48:36 +0200
  • 22a4cb7f03 Handle stage flags during command buffer submission properly 0cc4m 2023-07-15 22:00:47 +0200
  • 39edee5136 Add flag to make reverse prompt case insensitive Dewi Jones 2023-07-15 19:56:57 +0000
  • 83595ecbd6 minor fixes slaren 2023-07-15 19:04:37 +0200
  • 5d03303bdc remove ifdef GGML_PERF; update fmt mqy 2023-07-15 17:36:39 +0800
  • 09ab5c1718 restore simple.cpp for now slaren 2023-07-15 12:44:47 +0200
  • fea4e9d25e ggml backend interface wip slaren 2023-07-10 17:32:06 +0200
  • 6e7cca4047 llama : add custom RoPE (#2054) master-6e7cca4 Xiao-Yong Jin 2023-07-15 06:34:16 -0400
  • 6024bccdb9 ggml : fix asserts Georgi Gerganov 2023-07-15 11:28:37 +0300
  • ad3d28ee0a Reuse semaphores 0cc4m 2023-07-15 09:18:54 +0200
  • 0c4d841cf1 Fix synchronization on AMD, add barriers for buffer ownership transfer, add debug flag and prints 0cc4m 2023-07-15 09:06:53 +0200
  • d0b6c942fc style : minor fixes, mostly indentations Georgi Gerganov 2023-07-15 09:57:35 +0300
  • bbce392890 metal: use uint16_t instead of uint8_t. lshzh-ww 2023-07-15 02:15:02 -0400
  • ee6bc1426e support bpe tokenizer in convert ldwang 2023-07-15 14:14:00 +0800
  • d7aab2e900 support bpe tokenizer in convert ldwang 2023-07-15 14:12:25 +0800
  • a6803cab94 flake : add runHook preInstall/postInstall to installPhase so hooks function (#2224) Dave Della Costa 2023-07-14 15:13:38 -0400
  • 7dabc66f3c make : use pkg-config for OpenBLAS (#2222) master-7dabc66 wzy 2023-07-15 03:05:08 +0800
  • 7cdd30bf1f cuda : allocate all temporary ggml_tensor_extra_gpu from a fixed-size buffer (#2220) master-7cdd30b Bach Le 2023-07-15 03:00:58 +0800
  • e8035f141e ggml : fix static_assert with older compilers #2024 (#2218) master-e8035f1 Evan Miller 2023-07-14 14:55:56 -0400
  • 7513b7b0a1 llama : add functions that work directly on model (#2197) master-7513b7b Bach Le 2023-07-15 02:55:24 +0800
  • de8342423d build.zig : install config header (#2216) Ali Chraghi 2023-07-14 11:50:58 -0700
  • c48c525f87 examples : fixed path typos in embd-input (#2214) Shangning Xu 2023-07-15 02:40:05 +0800
  • 206e01de11 cuda : support broadcast add & mul (#2192) master-206e01d Jiahao Li 2023-07-15 02:38:24 +0800
  • 4fc401434e Merge branch 'master' into bcast-cuda Georgi Gerganov 2023-07-14 21:33:19 +0300
  • 4abdcd5479 adds runHook preInstall/postInstall to installPhase so hooks function Dave Della Costa 2023-07-14 13:59:44 -0400
  • 4304bd3cde CUDA: mul_mat_vec_q kernels for k-quants (#2203) master-4304bd3 Johannes Gäßler 2023-07-14 19:44:08 +0200
  • 229aab351c make : fix combination of LLAMA_METAL and LLAMA_MPI (#2208) master-229aab3 James Reynolds 2023-07-14 11:34:40 -0600
  • b16785f713 Merge 2777168618 into 697966680b Chad Brewbaker 2023-07-14 18:49:17 +0200
  • 2a1add9374 Fix #2221, use pkg-config Wu Zhenyu 2023-07-14 22:24:36 +0800
  • 268d03b447 Allocate all temporary ggml_tensor_extra_gpu from a fixed-size buffer Bach Le 2023-07-14 21:59:29 +0800
  • 697966680b ggml : sync (ggml_conv_2d, fix mul_mat bug, CUDA GLM rope) master-6979666 Georgi Gerganov 2023-07-14 16:36:41 +0300
  • 218ab9ef89 fixed runtime bugs and compile errors related to GGML_PERF and GGML_DEBUG mqy 2023-07-14 20:21:53 +0800
  • 9234f32bea Fix static_assert with older compilers #2024 Evan Miller 2023-07-14 06:57:02 -0400
  • 27ad57a69b Metal: faster Q4_0 and Q4_1 matrix x vector kernels (#2212) Kawrakow 2023-07-14 12:46:21 +0300
  • da730c53bf Merge branch 'custom_rope' of github.com:jxy/llama.cpp into custom_rope Xiao-Yong Jin 2023-07-13 19:54:50 -0400
  • a6b5695764 Merge remote-tracking branch 'upstream/master' into custom_rope Xiao-Yong Jin 2023-07-13 19:52:28 -0400
  • 4cae9f5673 Port CFG to server. Henri Vasserman 2023-07-13 22:37:57 +0300
  • 0c5a305d9d build.zig: install config header Ali Chraghi 2023-07-13 22:58:49 +0330