Commit Graph

  • 5a5aeb1e91
    llama : fix unused warning master-5a5aeb1 Georgi Gerganov 2023-05-13 16:55:14 +0300
  • 66841fdb0e
    ggml : multi-thread mul and diag_mask ops (#1428) master-66841fd Georgi Gerganov 2023-05-13 16:48:03 +0300
  • b849461e62
    ggml : fix clang-tidy warning Georgi Gerganov 2023-05-13 16:47:39 +0300
  • 6d7c47b8de
    ggml : multi-thread mul and diag_mask ops Georgi Gerganov 2023-05-13 11:29:32 +0300
  • 905d87b70a
    ggml : GPU-accelerated token generation (#1412) master-905d87b Johannes Gäßler 2023-05-13 15:38:36 +0200
  • ad8a9e6971
    llama : offload "output" tensor to GPU too + coding style fixes Georgi Gerganov 2023-05-13 16:35:21 +0300
  • 7b6f3f3970
    ggml : add AVX support based on AVX2 code katsu560 2023-05-13 22:26:58 +0900
  • f954edda93
    ggml : implement backward pass for llama + small training-llama-from-scratch example (#1360) master-f954edd xaedes 2023-05-13 14:56:40 +0200
  • dae6ba2abe
    baby-llama : couple of clang-tidy warnings Georgi Gerganov 2023-05-13 15:38:50 +0300
  • ef3d42a3aa
    ggml : fix clang-tidy warnings Georgi Gerganov 2023-05-13 15:34:56 +0300
  • 95a487a17e
    ggml : remove Q4_2 remnants Georgi Gerganov 2023-05-13 15:22:24 +0300
  • 092913ecea
    Merge remote-tracking branch 'origin/master' into HEAD Georgi Gerganov 2023-05-13 15:20:22 +0300
  • 2956630a3d
    Merge 'origin/master' into hipblas Henri Vasserman 2023-05-13 13:12:52 +0300
  • f048af0230
    ggml : sync alibi fix from ggml repo master-f048af0 Georgi Gerganov 2023-05-13 11:54:33 +0300
  • ac0cd259d5
    Adding SSE instructions to ggml_vec_dot_q4_0_q8_0 (#1413) master-ac0cd25 3ooabkhxtn 2023-05-13 10:43:33 +0200
  • 0cd22e190a
    llama : fix various warnings master-0cd22e1 Georgi Gerganov 2023-05-13 11:23:15 +0300
  • c9eb2ba1c5
    Merge branch 'master' into concedo_experimental Concedo 2023-05-13 15:51:05 +0800
  • b6594ab91e
    do not show tokenizer warning Concedo 2023-05-13 15:48:17 +0800
  • 6456a4eb9f
    embedding : remove unused code (#1426) master-6456a4e Rinne 2023-05-13 15:24:20 +0800
  • 0fa4624cb7
    Remove extra code of embedding example. Yaohui Liu 2023-05-13 15:19:45 +0800
  • 33034cfede
    ggml : fix null ptr deref in backward pass Georgi Gerganov 2023-05-13 10:08:01 +0300
  • bb0993ed48
    dequantize_mul_mat_vec kernels for q5_1, q8_0, f16 JohannesGaessler 2023-05-13 08:10:38 +0200
  • f977243ded
    minor : fix compiler warnings + indentation style Georgi Gerganov 2023-05-13 09:55:17 +0300
  • cdd5350892
    readme : update Q4_0 perplexities Georgi Gerganov 2023-05-13 09:12:44 +0300
  • 738ace394a
    llama : free ggml context in set / copy state data (close #1425) master-738ace3 Georgi Gerganov 2023-05-13 09:08:52 +0300
  • 699b1ad7fe
    opencl : fix kernels for the new formats (#1422) master-699b1ad Henri Vasserman 2023-05-13 09:01:15 +0300
  • 5a0ecf768d
    More readable dequantize_mul_mat_vec logic JohannesGaessler 2023-05-13 07:14:27 +0200
  • 9da44fdcb3
    q5_0 dequantize_mul_mat kernel JohannesGaessler 2023-05-12 23:57:10 +0200
  • 0986c2f44e
    Shorter dequantize_mul_mat_vec line JohannesGaessler 2023-05-12 23:30:17 +0200
  • cee8042793
    integrated new version of clblast kernels as a separate file Concedo 2023-05-13 12:53:28 +0800
  • 017023e477
    updated kobold lite Concedo 2023-05-13 12:12:20 +0800
  • 53e7256a25
    should be good to merge, only thing missing is clblast new quants Concedo 2023-05-13 12:07:29 +0800
  • 098277cf5e
    ADD Chatbot UI example Brendan Hubble 2023-05-13 11:16:29 +1000
  • 05cf5f7d6e
    partially working, but the blas matmul is broken Concedo 2023-05-13 11:35:38 +0800
  • cc798cc08c
    Fix Q5_0 alignment issues. Henri Vasserman 2023-05-13 03:37:28 +0300
  • 3243b9943a
    Fix OpenCL kernels for the new formats Henri Vasserman 2023-05-13 01:02:36 +0300
  • 1a8f93442f
    Add files via upload morpheus2448 2023-05-12 22:28:31 +0100
  • 25b448a32f
    - rearranged defines, SSSE3 function only compiled if used 3ooabkhxtn 2023-05-12 20:48:41 +0000
  • f0af475739
    --gpu_layers -> --gpu-layers JohannesGaessler 2023-05-12 21:43:47 +0200
  • 7dc2f57e5e
    Added missing __syncthreads(); JohannesGaessler 2023-05-12 21:37:34 +0200
  • 12fc292ee6
    Added q4_1 via template JohannesGaessler 2023-05-12 12:42:09 +0200
  • 637be12f16
    CUDA kernel for q4_0 dequant. + mat. vec. mult. JohannesGaessler 2023-05-08 22:21:03 +0200
  • fb62f92433
    llama : fix --mtest option (close #1414) master-fb62f92 Georgi Gerganov 2023-05-12 21:44:20 +0300
  • b335f73a60
    BACKWARDS COMPAT QUANT SHIM is ready, but upstream model converter is BORKED. BORK BORK. Concedo 2023-05-13 01:30:11 +0800
  • 08810d5fee
    interim merge. do not use Concedo 2023-05-13 00:33:55 +0800
  • e9caff1cda
    Interim merge. Do not use. Concedo 2023-05-12 23:20:27 +0800
  • 773ee249fb
    CLI args use - instead of _, backwards compatible (#1416) master-773ee24 Johannes Gäßler 2023-05-12 16:34:55 +0200
  • 0fe6384755
    fix makefile Henri Vasserman 2023-05-12 17:22:11 +0300
  • a3e6d62283
    cuda : alternative q4_q8 kernel dequantize-matmul-3-gg Georgi Gerganov 2023-05-12 15:54:07 +0300
  • fc26f54e74
    - Put the whole line into defined() - Use __SSSE3__ instead of __SSE__ 3ooabkhxtn 2023-05-12 13:59:20 +0000
  • e3c7dcf5c1
    CLI args use - instead of _, backwards compatible JohannesGaessler 2023-05-12 15:23:49 +0200
  • 553fd4d4b5
    Add clang-tidy reviews to CI (#1407) master-553fd4d slaren 2023-05-12 15:40:53 +0200
  • 70c2b6c696
    Put __SSE3__ into defined() 3ooabkhxtn 2023-05-12 13:32:00 +0000
  • 605560d9ec
    Merge 'origin/master' into hipblas Henri Vasserman 2023-05-12 16:12:53 +0300
  • ca54314a2f
    - Improved prefetching 3ooabkhxtn 2023-05-12 10:17:13 +0000
  • 8699fd0d43
    - Cleanup 3ooabkhxtn 2023-05-12 09:25:00 +0000
  • 7379dd2dba
    - Added prefetch 3ooabkhxtn 2023-05-12 09:20:48 +0000
  • 78bbb3cdfe
    - Use 4 accumulations instead of 2 - Removed first accumulation 3ooabkhxtn 2023-05-12 09:15:46 +0000
  • 607b9c7373
    - Split multiplication and addition to make it easier for the compiler to optimise - Accumulate two acc instead of one 3ooabkhxtn 2023-05-12 08:04:54 +0000
  • 524d6c9447
    - added sse instructions for ggml_vec_dot_q4_0_q8_0 3ooabkhxtn 2023-05-12 07:54:33 +0000
  • e7b9d97bae
    More int mult, less float mult, worse performance JohannesGaessler 2023-05-12 09:11:47 +0200
  • 089b1c93ba
    readme : add C#/.NET bindings repo (#1409) Rinne 2023-05-12 13:39:40 +0800
  • e052d53e51
    Update gpt_params_parse and fix a merge error take 2 Jason McCartney 2023-05-11 21:17:04 -0700
  • 121c986d02
    Revert "Update gpt_params_parse and fix a merge error" Jason McCartney 2023-05-11 21:11:55 -0700
  • 2bb2ff1748
    Update gpt_params_parse and fix a merge error Jason McCartney 2023-05-11 21:00:11 -0700
  • ddc64202f6
    Add the dotnet binding info. Yaohui Liu 2023-05-12 11:44:30 +0800
  • d882d1c2fe
    Performance no longer terrible JohannesGaessler 2023-05-11 23:27:06 +0200
  • b9fd7eee57
    ggml : remove bit shuffling (#1405) master-b9fd7ee Georgi Gerganov 2023-05-12 00:23:08 +0300
  • cbb6a3a7e8
    llama : fix return for unknown version Georgi Gerganov 2023-05-12 00:08:36 +0300
  • b58b1f4bf6
    readme : add note that Q4 and Q5 have been changed Georgi Gerganov 2023-05-12 00:00:40 +0300
  • 4b12881329
    WAKE ME UP JohannesGaessler 2023-05-11 22:47:38 +0200
  • ca7f069f39
    ggml : back to original bit order Georgi Gerganov 2023-05-11 23:33:07 +0300
  • f92faf50ce
    Add clang-tidy reviews to CI slaren 2023-05-10 23:27:59 +0200
  • 832c53f427
    ggml : fix WASM comments Georgi Gerganov 2023-05-11 21:59:25 +0300
  • 1c87847b6b
    llama : update v2 PR number to 1405 Georgi Gerganov 2023-05-11 21:48:56 +0300
  • 927afddf95
    Merge branch 'master' into add_stop_token Jason McCartney 2023-05-11 11:40:17 -0700
  • 51c25fd995
    readme : update timings + remove warning banner Georgi Gerganov 2023-05-11 21:38:47 +0300
  • e038e01e28
    sha : update hashes for 7B and 13B Georgi Gerganov 2023-05-11 21:33:29 +0300
  • 5bc286ab18
    ggml : fix AVX2 implementation Georgi Gerganov 2023-05-11 21:22:27 +0300
  • bd5e373058
    Revert "AVX implementations (#1370)" Georgi Gerganov 2023-05-11 20:57:28 +0300
  • 6680244838
    ggml : fix Q8_0 and Q8_1 rounding Georgi Gerganov 2023-05-11 20:47:41 +0300
  • 582a39fff5
    ggml : simplify Q8_1 - no need for low / high sums anymore Georgi Gerganov 2023-05-11 20:11:37 +0300
  • 695f3963b1
    ggml : preserve old Q4 and Q5 formats Georgi Gerganov 2023-05-11 19:46:11 +0300
  • b7ad385d42
    ggml : speed-up Q5_0 + Q5_1 at 4 threads Georgi Gerganov 2023-05-10 22:58:45 +0300
  • 09032e0290
    llama : fix model magic/version write Georgi Gerganov 2023-05-09 18:25:28 +0300
  • d52172a509
    llama : produce error upon loading old model files Georgi Gerganov 2023-05-09 18:19:13 +0300
  • 489bd13fad
    ggml : uniform 5th bit extraction Georgi Gerganov 2023-05-08 22:18:15 +0300
  • 9e49d20150
    AVX implementations (#1370) Stephan Walter 2023-05-08 19:14:06 +0000
  • 928d2f335f
    scripts : add script for measuring the time per token Georgi Gerganov 2023-05-08 22:06:54 +0300
  • 83674556b8
    ggml : fix Q5_0 quantization Georgi Gerganov 2023-05-07 20:26:02 +0300
  • b08c39b16c
    ggml : minor formatting Georgi Gerganov 2023-05-07 20:00:01 +0300
  • 4bf1c8a43e
    ggml : remove Q4_2 mode Georgi Gerganov 2023-05-07 18:26:59 +0300
  • cdc9607329
    ggml : update cuBLAS + normalize variable names Georgi Gerganov 2023-05-07 18:23:59 +0300
  • 9472d0ea8b
    ggml : fix Q4_1 quantization Georgi Gerganov 2023-05-07 18:07:11 +0300
  • 0add6402bd
    ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit Georgi Gerganov 2023-05-05 17:23:41 +0300
  • caaacd5765
    ggml : simplify scalar dot Georgi Gerganov 2023-05-05 17:12:58 +0300
  • 292a778ca2
    ggml : remove Q5_1 bit shuffling (ARM NEON + scalar) Georgi Gerganov 2023-05-05 17:09:11 +0300
  • b37a08f646
    ggml : 2x faster scalar implementations Georgi Gerganov 2023-05-04 23:31:35 +0300
  • aa78dfed7d
    ggml : remove Q5_0 bit shuffling (ARM NEON) Georgi Gerganov 2023-05-04 22:55:10 +0300
  • 9f3285f741
    ggml : remove Q4_2 bit shuffling (WIP, BROKEN) Georgi Gerganov 2023-05-04 22:07:40 +0300