Commit Graph

  • f3de876a12 fix : update convert-llama-h5-to-gguf.py M. Yusuf Sarıgöz 2023-07-31 23:58:29 +0300
  • 995c2204e5 Added ne03==ne13 assertion Matteo Boschini 2023-07-31 21:25:09 +0200
  • 49e7cb5bb1 CUDA: fixed LLAMA_FAST compilation option (#2473) master-49e7cb5 Johannes Gäßler 2023-07-31 21:02:19 +0200
  • 5be88d1a30 CUDA: fixed LLAMA_FAST compilation option JohannesGaessler 2023-07-31 20:17:23 +0200
  • b772bba42e CUDA: fixed cmake F16 option (#2471) master-b772bba Johannes Gäßler 2023-07-31 19:52:22 +0200
  • d91456aaf1 fix half2 decomposition ardfork 2023-07-31 20:35:00 +0300
  • c1cb70d64d new build arg LLAMA_CUDA_MMQ_Y Henri Vasserman 2023-07-31 19:56:44 +0300
  • f1c03f4b16 more bug fixn Aniket 2023-07-31 13:20:32 -0400
  • 971464b920 CUDA: fixed cmake F16 option JohannesGaessler 2023-07-31 18:40:11 +0200
  • c1664a00ae Merge 'origin/master' into hipblas Henri Vasserman 2023-07-31 19:32:27 +0300
  • e221843147 trying out mmq Concedo 2023-07-31 22:51:15 +0800
  • bb42aefaeb gguf : mmap tensor data example M. Yusuf Sarıgöz 2023-07-31 17:46:12 +0300
  • 3e370f83ef Warning: Very experimental merge, do not use until confirmed stable. Concedo 2023-07-31 22:33:43 +0800
  • 0728c5a8b9 CUDA: mmq CLI option, fixed mmq build issues (#2453) master-0728c5a Johannes Gäßler 2023-07-31 15:44:35 +0200
  • aebccdbf00 fixing bug that didnt unroll the 1d karpathy arrays Aniket 2023-07-31 09:33:57 -0400
  • b26f5b2e43 gguf : fix typo in function call M. Yusuf Sarıgöz 2023-07-31 16:23:54 +0300
  • d28b07ca7c Extend kernel_mul_mat_f16_f32 to handle gqa broadcast Matteo Boschini 2023-07-31 14:41:23 +0200
  • 1b09b9439f Merge 204f76d52e into 1215ed7d5c maddes8cht 2023-07-31 14:37:52 +0200
  • 1215ed7d5c CUDA: Implemented row flattening for non-glm RoPE (#2468) master-1215ed7 Johannes Gäßler 2023-07-31 14:32:30 +0200
  • 5b5f04be97 CUDA: mmq CLI option, fixed mmq build issues JohannesGaessler 2023-07-30 12:34:18 +0200
  • fee39ecd48 Update ggml-metal.m Matteo Boschini 2023-07-31 07:52:46 +0200
  • ae58ac7dd4 Added gqa8 kernel to allow llama-2-70B on metal Matteo Boschini 2023-07-31 00:02:04 +0200
  • 204f76d52e Fix: possible out-of-bounds error, remove default_params Mathias Bachmann 2023-07-31 13:19:13 +0200
  • 2dbf518911 CUDA: fewer memory bank conflicts for mul_mat_q (#2458) master-2dbf518 Johannes Gäßler 2023-07-31 13:18:51 +0200
  • 58ff5e17e1 CUDA: Implemented row flattening for non-glm RoPE JohannesGaessler 2023-07-31 12:21:51 +0200
  • 4d92be8813 the cur parameter is missing gklab 2023-07-31 17:41:34 +0800
  • 84ce184c4f layout Concedo 2023-07-31 17:33:31 +0800
  • 9d2382b3e4 Fix Metal backend broken from the allocator changes (#2455) master-9d2382b slaren 2023-07-31 11:02:53 +0200
  • f27972777f correct semantic error in import_vars (#355) YellowRoseCx 2023-07-31 02:51:35 -0500
  • 7aa0a0e7f7 gguf : support custom alignment value M. Yusuf Sarıgöz 2023-07-31 09:59:36 +0300
  • eab8335e33 use memcpy in test-double-float.c netrunnereve 2023-07-30 23:12:25 -0400
  • 5ad9c2f320 Fix broken build for LLAMA_METAL 唐鳳 2023-07-31 09:35:55 +0800
  • 6b3a7b9f4f Update convert-llama-h5-to-gguf.py klosax 2023-07-31 03:02:00 +0200
  • 4f5b6224be Update convert-gptneox-h5-to-gguf.py klosax 2023-07-31 03:00:20 +0200
  • 5073b0f5d8 CUDA: fewer memory bank conflicts for mul_mat_q JohannesGaessler 2023-07-30 13:17:29 +0200
  • 74fb31bd35 move asserts slaren 2023-07-30 19:50:57 +0200
  • dc6e677c40 Reduce overhead of mul_f32 calls by using a single command buffer 0cc4m 2023-07-30 19:10:15 +0200
  • 485bbe1a78 fix Metal backend broken from the allocator changes slaren 2023-07-30 18:26:22 +0200
  • 2a0914673c Update convert-gptneox-h5-to-gguf.py klosax 2023-07-30 17:31:11 +0200
  • 068a8e0fbe Update convert-llama-h5-to-gguf.py klosax 2023-07-30 17:29:56 +0200
  • a37d31f29b use the appropriate format specifier for size_t, which is %zu mendax0110 2023-07-30 17:06:10 +0200
  • 30c4ea47e6 add gptneox gguf example klosax 2023-07-30 16:59:26 +0200
  • 20bd792736 make auto const mendax0110 2023-07-30 16:59:17 +0200
  • da8fe7ac02 Merge branch 'ggerganov:master' into master m3ndax 2023-07-30 16:58:17 +0200
  • 5ea5d19d6a SSE emoji fix Concedo 2023-07-30 22:31:20 +0800
  • a9a2647536 fixed whitespace maddes8cht 2023-07-30 16:30:53 +0200
  • 2fabc176ce Update convert-llama-h5-to-gguf.py klosax 2023-07-30 16:28:08 +0200
  • 582c825738 Use single command buffer for matrix vector multiplication ops 0cc4m 2023-07-30 16:25:58 +0200
  • a113689571 ggml : add graph tensor allocator (#2411) master-a113689 slaren 2023-07-30 15:58:01 +0200
  • 570aa7ceeb rename ggml_allocator to ggml_allocr slaren 2023-07-29 15:01:43 +0200
  • 9df732dae4 introduce validate_params, use it in gpt_params_parse. maddes8cht 2023-07-30 15:28:23 +0200
  • f175b05872 Makefile : add gptneox gguf example klosax 2023-07-30 15:08:37 +0200
  • e9192b0135 add gptneox gguf example klosax 2023-07-30 15:05:37 +0200
  • 4ed98bf1ab Update convert-llama-h5-to-gguf.py klosax 2023-07-30 15:01:47 +0200
  • b19c11750b ggml.c : add gguf_get_arr_n klosax 2023-07-30 14:58:50 +0200
  • b4676ee447 ggml.h : increase GGML_MAX_NAME to 64 klosax 2023-07-30 14:51:37 +0200
  • ccd81a751b gguf.py : add layer norm eps and merges klosax 2023-07-30 14:48:14 +0200
  • 0790c121aa constants.py : add layer norm eps klosax 2023-07-30 14:46:36 +0200
  • 82d0695f0f Merge commit '9baf9ef304f330009d5a93b7390280a0fd27c9a1' into concedo_experimental Concedo 2023-07-30 18:18:23 +0800
  • 90a37d63d5 up ver, added warning for max context Concedo 2023-07-30 18:07:14 +0800
  • c8af65760f Hide unavailable backends & Add tooltip over backend count (#352) YellowRoseCx 2023-07-30 04:50:55 -0500
  • 45456fa6ca switch noavx2 to not use openblas, as it has incompatible instructions Concedo 2023-07-30 16:47:33 +0800
  • 23825abee1 fix wrong key Concedo 2023-07-30 14:30:46 +0800
  • 87c34e4dd4 gguf : update convert-llama-h5-to-gguf.py M. Yusuf Sarıgöz 2023-07-30 01:09:22 +0300
  • 32e037ffbe gguf : fix set is not subscriptable M. Yusuf Sarıgöz 2023-07-30 01:01:13 +0300
  • 11f3ca06b8 CUDA: Quantized matrix matrix multiplication (#2160) master-11f3ca0 Johannes Gäßler 2023-07-29 23:04:44 +0200
  • 9baf9ef304 CUDA: faster multi GPU synchronization (#2448) master-9baf9ef Johannes Gäßler 2023-07-29 23:04:10 +0200
  • 06c3e4a1a7 Update convert-llama-h5-to-gguf.py klosax 2023-07-29 21:38:01 +0200
  • d641b80660 CUDA: faster multi GPU synchronization JohannesGaessler 2023-07-29 20:53:30 +0200
  • 9577821487 gguf.py : support any type klosax 2023-07-29 21:29:07 +0200
  • 0b206788dc add static to test-grad0.c internal functions netrunnereve 2023-07-29 15:12:05 -0400
  • 2c22e3bcdb ggml.c : get arr str and f32 klosax 2023-07-29 20:37:47 +0200
  • 49580fe816 c++11 cannot use designated initializers netrunnereve 2023-07-29 14:36:33 -0400
  • 34469b9ea7 ggml.h : get array str and f32 klosax 2023-07-29 20:36:06 +0200
  • dc9b9f3272 fix hellaswag print format, cast away warning in test-double-float netrunnereve 2023-07-29 13:55:53 -0400
  • 0bb22bb4df Fix multi GPU out-of-bounds JohannesGaessler 2023-07-29 19:31:30 +0200
  • 0f5e57f01d gguf : handle already encoded string M. Yusuf Sarıgöz 2023-07-29 19:56:06 +0300
  • 0b5f989122 Fix CMakeLists.txt JohannesGaessler 2023-07-29 17:45:13 +0200
  • 4336231a32 add hipBLAS to README Henri Vasserman 2023-07-29 18:35:56 +0300
  • 8ad7cd49fb Update convert-llama-h5-to-gguf.py klosax 2023-07-29 16:47:00 +0200
  • c0dfd5a5e0 Fix CMakeLists.txt JohannesGaessler 2023-07-29 16:04:19 +0200
  • 592594f110 Merge branch 'ggerganov:master' into develop Stephen Nichols 2023-07-29 08:17:32 -0500
  • f8e3fc6c74 rocblas init stuff Henri Vasserman 2023-07-29 14:16:46 +0300
  • 0317c41d98 gguf : upd gguf conversion script M. Yusuf Sarıgöz 2023-07-29 13:31:07 +0300
  • cc3dd7f042 gguf : write tokenizer data M. Yusuf Sarıgöz 2023-07-29 13:30:22 +0300
  • 8a76dd8a85 gguf : write tensors one by one M. Yusuf Sarıgöz 2023-07-29 13:17:28 +0300
  • d2ade639f4 Merge 'origin/master' into hipblas Henri Vasserman 2023-07-29 12:59:48 +0300
  • c861e234f4 gguf : write tensors one by one M. Yusuf Sarıgöz 2023-07-29 12:49:01 +0300
  • cde3760e52 Merge branch 'master' into concedo_experimental Concedo 2023-07-29 17:47:00 +0800
  • 0c219fb5b5 gguf : fix writing gguf arrays M. Yusuf Sarıgöz 2023-07-29 12:42:54 +0300
  • aa4b2c9375 Updated README, CMakeLists JohannesGaessler 2023-07-29 11:40:56 +0200
  • 93f7f7aef7 gguf : write tensors one by one and code reuse M. Yusuf Sarıgöz 2023-07-29 12:34:35 +0300
  • 9589d52079 added help link Concedo 2023-07-29 17:33:15 +0800
  • aa99562d70 Merge branch 'gguf' of https://github.com//ggerganov/llama.cpp into gguf M. Yusuf Sarıgöz 2023-07-29 12:26:11 +0300
  • ea5f9ad2ca gguf : fix writing gguf arrays M. Yusuf Sarıgöz 2023-07-29 12:25:43 +0300
  • 999431c4b6 quick and dirty conversion example klosax 2023-07-29 11:20:05 +0200
  • d54f53ca51 gguf : add tokenization constants M. Yusuf Sarıgöz 2023-07-29 12:04:45 +0300
  • a4e9c92292 Merge branch 'ggerganov:master' into master m3ndax 2023-07-29 10:15:57 +0200
  • 06f423a8e1 gguf : write sample tensors to read M. Yusuf Sarıgöz 2023-07-29 10:26:26 +0300
  • 08dc8fd884 gguf : do not hardcode tensor names to read M. Yusuf Sarıgöz 2023-07-29 10:24:46 +0300