Commit Graph

  • d390f4f7dd ggml : q5_0 more efficient ARM NEON using uint64_t masks Georgi Gerganov 2023-04-26 16:32:33 +0300
  • b294b7fdc0 ggml : q5_0 ARM NEON dot Georgi Gerganov 2023-04-26 16:24:27 +0300
  • ef8e3ee6f5 ggml : q5_0 scalar dot product Georgi Gerganov 2023-04-26 13:58:47 +0300
  • 99238e4c28 ggml : fix q5_0 histogram stats Georgi Gerganov 2023-04-26 13:37:57 +0300
  • 2576c16f00 ggml : fix Q5_0 qh -> uint32_t Georgi Gerganov 2023-04-26 10:43:26 +0300
  • 5bebc0a6e2 ggml : add Q5_0 quantization (cuBLAS only) Georgi Gerganov 2023-04-26 10:33:57 +0300
  • 859fee6dfb quantize : use map to assign quantization type from string (#1191) master-859fee6 Pavol Rusnak 2023-04-26 18:43:27 +0200
  • 6383bbfa5f fix jon-chuang 2023-04-27 00:42:41 +0800
  • 9eda98d14b fix jon-chuang 2023-04-27 00:41:12 +0800
  • ce97a807cb Simplify code, fix include 0cc4m 2023-04-26 18:39:04 +0200
  • b746458281 Use c compiler for opencl files 0cc4m 2023-04-26 18:38:31 +0200
  • d3e9a5c415 quantize : use map to assign quantization type from string Pavol Rusnak 2023-04-26 18:06:10 +0200
  • 101f7a6e73 updated readme Concedo 2023-04-26 23:50:00 +0800
  • b80bc36ab0 minor jon-chuang 2023-04-26 23:33:24 +0800
  • 7ffbcbdfa3 fix jon-chuang 2023-04-26 23:29:26 +0800
  • 8cead20746 done jon-chuang 2023-04-26 23:03:54 +0800
  • 8ead56c03a fix jon-chuang 2023-04-26 22:58:20 +0800
  • 5bb5327833 minor jon-chuang 2023-04-26 22:48:15 +0800
  • afe94e878b Merge branch 'jon/tall-and-skinny-matmul' of https://github.com/jon-chuang/llama.cpp into jon/tall-and-skinny-matmul jon-chuang 2023-04-26 22:46:48 +0800
  • 0a320ed274 Merge branch 'master' of https://github.com/ggerganov/llama.cpp into jon/tall-and-skinny-matmul jon-chuang 2023-04-26 22:45:58 +0800
  • 4a98a0f21a fix jon-chuang 2023-04-26 22:37:52 +0800
  • 42c297b926 Merge branch 'master' of https://github.com/ggerganov/llama.cpp into jon/use-hardware-cores jon-chuang 2023-04-26 22:21:52 +0800
  • 0bf8b5bdeb Fixing typo DaniAndTheWeb 2023-04-26 15:51:55 +0200
  • 8136918974 Windows Make instructions DaniAndTheWeb 2023-04-26 15:45:01 +0200
  • 93a8e00dfa Merge branch 'master' into concedo Concedo 2023-04-26 18:01:35 +0800
  • ef51e9ecac Merge branch 'ggerganov:master' into hipblas Henri Vasserman 2023-04-26 12:46:26 +0300
  • 27bc29128e Update README.md (#120) Disty0 2023-04-26 12:33:34 +0300
  • 2b0c6a56f9 Improve code quality 0cc4m 2023-04-26 07:48:04 +0200
  • 741bb67445 Allow setting the rng seed after initialization. Asgeir Bjarni Ingvarsson 2023-04-25 23:23:15 +0000
  • 2ca73cb6ea Clarify the effect of BLAS DaniAndTheWeb 2023-04-26 01:40:00 +0200
  • b6904cc79f BLAS for Mac DaniAndTheWeb 2023-04-26 01:33:25 +0200
  • 2ff156d463 Better BLAS explanation DaniAndTheWeb 2023-04-26 01:27:04 +0200
  • 5ac9074a7c Better BLAS explanation DaniAndTheWeb 2023-04-26 01:07:34 +0200
  • e1b704b44c Update information about BLAS DaniAndTheWeb 2023-04-26 00:38:40 +0200
  • ab07da07c1 Update README.md DaniAndTheWeb 2023-04-26 00:17:51 +0200
  • e2bb127fd8 Updated build information DaniAndTheWeb 2023-04-26 00:15:56 +0200
  • 4afcc37869 Update SHA256SUMS after quantization change (#1181) Stephan Walter 2023-04-25 21:41:56 +0000
  • e4e868e2e5 Update SHA256SUMS after quantization change (65B) Pavol Rusnak 2023-04-25 23:40:57 +0200
  • 667c501334 py : cast lora_alpha to int in convert-lora-to-ggml (#1170) ostix360 2023-04-25 23:33:08 +0200
  • bb98e77be7 nix: use convert.py instead of legacy wrapper convert-pth-to-ggml.py (#981) Pavol Rusnak 2023-04-25 23:19:57 +0200
  • 7a32fcb3b2 ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (#1179) master-7a32fcb Georgi Gerganov 2023-04-25 23:40:51 +0300
  • e8c3731764 ggml : fix assert using wrong QK4_2 instead of QK4_3 Georgi Gerganov 2023-04-25 23:25:03 +0300
  • 4ddb983a02 ggml : fix Q8_0 to use 255 values out of 256 Georgi Gerganov 2023-04-25 23:23:05 +0300
  • 91bfa51dca ggml : extend quantize_fns_t with "vec_dot_type" Georgi Gerganov 2023-04-25 22:47:50 +0300
  • 46fc696dea ggml : fix bug - using wrong block type Georgi Gerganov 2023-04-25 22:28:26 +0300
  • 6e0f0b6ff1 ggml : Q8_0 unroll x2 Georgi Gerganov 2023-04-25 22:21:57 +0300
  • 88618ab7f5 ggml : fix Q8_0 dot product bug (ARM) Georgi Gerganov 2023-04-25 22:14:25 +0300
  • 6496b79e8e ggml : use q4_0_q8_0 and q4_2_q8_0 Georgi Gerganov 2023-04-25 22:08:44 +0300
  • d8bf7207f1 ggml : finalize Q8_0 implementation Georgi Gerganov 2023-04-25 22:03:08 +0300
  • 79cfdf5e23 tests : fix test-quantize-fns Georgi Gerganov 2023-04-25 21:55:15 +0300
  • 95c6f85ae3 Update SHA256SUMS after quantization change Stephan Walter 2023-04-25 20:51:41 +0200
  • f83c321c47 ggml : add Q8_0 quantization format (rename the old one to Q8_1) Georgi Gerganov 2023-04-25 21:39:06 +0300
  • d571d1629f Merge 'origin/master' into hipblas Henri Vasserman 2023-04-25 21:15:33 +0300
  • 608aa33d9f change default GPU arch to match CMake Henri Vasserman 2023-04-25 21:15:04 +0300
  • b73c19201f Merge branch 'ggerganov:master' into master CRD716 2023-04-25 12:56:06 -0500
  • 618fda5009 examples : switch input_noecho to input_echo to remove negation deadprogram 2023-04-25 19:55:25 +0200
  • 137071003c Improve btype dequant kernel selection code, add error if type is unsupported 0cc4m 2023-04-25 19:40:54 +0200
  • ecff6723d1 Update convert-lora-to-ggml.py Pavol Rusnak 2023-04-25 19:29:11 +0200
  • dd0eabc049 ggml : use full range for Q4_0 and Q4_2 quantization (#729) master-dd0eabc unbounded 2023-04-25 19:20:46 +0200
  • 36bfb3c158 Fix typos, use GGML_TYPE defines, improve code 0cc4m 2023-04-25 18:43:31 +0200
  • 0aa3d839fb free old ctx on retry Concedo 2023-04-25 23:42:57 +0800
  • 6454855ae3 Update convert-lora-to-ggml.py ostix360 2023-04-25 17:33:52 +0200
  • a696b0a16c missed another thing Concedo 2023-04-25 23:16:04 +0800
  • 8c9c218609 missed a thing Concedo 2023-04-25 23:02:08 +0800
  • 235daf4016 Merge branch 'master' into concedo Concedo 2023-04-25 20:44:22 +0800
  • 72b2331ad6 edge cases with mem crash? need verify Concedo 2023-04-25 20:42:30 +0800
  • 5eec5d6ed9 Added backwards compatibility to an earlier version of NeoX. Concedo 2023-04-25 20:34:18 +0800
  • bff998f871 Slight refactor of the python code: credits to @LuxF3rre Concedo 2023-04-25 19:20:14 +0800
  • 9bfc54373c force int caste .0 in the config file for the lora_alpha param ostix360 2023-04-25 07:19:46 +0000
  • 9143ccefa0 introduction to give more consistent results CRD716 2023-04-24 22:55:43 -0500
  • e3159c018f editorcheck CRD716 2023-04-24 22:01:05 -0500
  • e82439a36e Prefixes, Line separators, etc CRD716 2023-04-24 21:59:06 -0500
  • 7f58f2cca0 llama : add session file format and saved sessions in main Evan Jones 2023-04-24 20:56:45 -0400
  • 7fd88f445b Prevent Results.txt from coming up CRD716 2023-04-24 19:13:08 -0500
  • ccf900240d Basic Setup CRD716 2023-04-24 19:12:46 -0500
  • 54bb60e268 ggml : fix bug in ggml_compute_forward_sum_f32 (#1162) master-54bb60e xaedes 2023-04-24 23:02:02 +0200
  • daa5df51f7 Replace buffer pool with static buffers a, b, qb, c 0cc4m 2023-04-24 22:08:51 +0200
  • ae73887fb9 Add CLBlast to CMakeLists.txt 0cc4m 2023-04-24 21:22:41 +0200
  • 18cc05bde4 Fix cast in opencl kernels 0cc4m 2023-04-24 16:13:43 +0200
  • 8603c25e3c Fix device selection env variable names 0cc4m 2023-04-24 15:53:48 +0200
  • f469d9afa0 Double CLBlast speed by disabling OpenBLAS thread workaround 0cc4m 2023-04-24 15:15:23 +0200
  • 309af7fce9 Add q4_2 and q4_3 CLBlast support, improve code 0cc4m 2023-04-24 07:16:43 +0200
  • 1b16b8c90d Move CLBlast implementation to separate file 0cc4m 2023-04-23 09:59:45 +0200
  • 6f66870726 Finish merge of ClBlast support 0cc4m 2023-04-15 12:03:11 +0200
  • b7143c1a2e Improve ClBlast implementation, avoid recreating buffers, remove redundant transfers 0cc4m 2023-04-11 21:53:50 +0200
  • a908c37ce9 Allow use of OpenCL GPU-based BLAS using ClBlast instead of OpenBLAS for context processing 0cc4m 2023-04-10 09:49:40 +0200
  • e24ecd2cc9 fix bug in ggml_compute_forward_sum_f32 xaedes 2023-04-24 21:13:22 +0200
  • 8a0f8673ba ggml : export symbols (#1155) master-8a0f867 Georgi Gerganov 2023-04-24 22:18:25 +0300
  • 09ae3044f4 small update on the readme KASR 2023-04-24 20:56:15 +0200
  • 5808fcf7ac Use full range for q4_2 quantization Håkon H. Hitland 2023-04-24 20:54:51 +0200
  • 735c77acf1 move powershell script & update readme KASR 2023-04-24 20:49:43 +0200
  • e5bbecaf2d Merge branch 'ggerganov:master' into master KASR 2023-04-24 20:32:39 +0200
  • d09f97e28f Update quantize_row_q4_0 for PowerPC Håkon H. Hitland 2023-04-05 02:48:51 +0200
  • fea8d10107 Update quantize_row_q4_0 for Arm NEON Håkon H. Hitland 2023-04-05 02:37:20 +0200
  • 73a92d2d3c Update quantize_row_q4_0 for WASM Håkon H. Hitland 2023-04-05 01:18:42 +0200
  • 84aa7d83c4 Update quantize_row_q4_0 for AVX/AVX2 Håkon H. Hitland 2023-04-05 01:02:43 +0200
  • f57433c44f Use full range for q4_0 quantization Håkon H. Hitland 2023-04-03 03:02:26 +0200
  • 0c5692345d examples : add save_load_state example (#1150) master-0c56923 xaedes 2023-04-24 18:23:31 +0200
  • 00ef34dea1 renamed save-load-state example files replacing underscores by dashes xaedes 2023-04-24 18:20:10 +0200
  • e8a156ab50 ggml : export symbols Georgi Gerganov 2023-04-24 18:55:18 +0300
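
A note on the Q5_0 work at the top of the graph (5bebc0a6e2 through d390f4f7dd): these commits introduce a 5-bit block quantization format plus scalar and ARM NEON dot-product kernels for it. As a rough illustration only, not the actual ggml code, the sketch below assumes the layout implied by the commit messages: a per-block scale, a uint32_t qh holding the fifth bit of each quant (see 2576c16f00), and 16 bytes of packed 4-bit nibbles. The real struct stores the scale as fp16; a plain float is used here to keep the sketch self-contained.

    #include <stdint.h>

    #define QK5_0 32  /* values per block (assumed; matches the 32-value blocks of other ggml formats) */

    /* Hypothetical Q5_0 block layout, inferred from the commit messages above. */
    typedef struct {
        float    d;              /* per-block scale (delta) */
        uint32_t qh;             /* fifth (high) bit of each of the 32 quants */
        uint8_t  qs[QK5_0 / 2];  /* lower 4 bits, two quants packed per byte */
    } block_q5_0;

    /* Scalar dequantization sketch: rebuild each 5-bit value from its nibble plus
     * the matching bit of qh, shift it into [-16, 15], and scale by d. */
    static void dequantize_block_q5_0(const block_q5_0 *x, float *y) {
        for (int j = 0; j < QK5_0 / 2; ++j) {
            const uint8_t xh_0 = (uint8_t)(((x->qh >> j)               & 1u) << 4);  /* high bit of quant j      */
            const uint8_t xh_1 = (uint8_t)(((x->qh >> (j + QK5_0 / 2)) & 1u) << 4);  /* high bit of quant j + 16 */

            const int32_t v0 = (int32_t)((x->qs[j] & 0x0F) | xh_0) - 16;
            const int32_t v1 = (int32_t)((x->qs[j] >>   4) | xh_1) - 16;

            y[j]             = v0 * x->d;
            y[j + QK5_0 / 2] = v1 * x->d;
        }
    }

The scalar dot product in ef8e3ee6f5 unpacks blocks in roughly this way on the fly and accumulates the products; the NEON kernels (b294b7fdc0, and d390f4f7dd with its uint64_t masks, per the commit message) vectorize the same bit reconstruction.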