Commit Graph

  • 8342fe81b1 revert the wstring tokenization. coherency was affected Concedo 2023-06-24 12:58:49 +0800
  • 6da38b0d40 up ver Concedo 2023-06-24 12:30:38 +0800
  • 0485fa65a2 wstring convert for mpt Concedo 2023-06-24 11:43:42 +0800
  • 072007b1e8 Add buffer qualifiers niansa 2023-06-23 21:21:16 +0200
  • 48125f7221 fix name of copies slaren 2023-06-23 20:41:24 +0200
  • acb7d90398 Reenabled unknown op message niansa 2023-06-23 20:39:32 +0200
  • 5d5f66d1d9 More little fixes and stuff niansa 2023-06-23 20:37:58 +0200
  • b19334ec76 add more automatic names to view ops slaren 2023-06-23 20:27:43 +0200
  • e0814f86a2 Free vk context niansa 2023-06-23 20:02:46 +0200
  • 55815b67f4 Improved memory safety niansa 2023-06-23 19:58:41 +0200
  • 9600ded125 upgrade zig build system support sjinzh 2023-06-24 01:32:43 +0800
  • 4b267e88b6 Temporarily care for all layers niansa 2023-06-23 18:40:58 +0200
  • 40621ea0ec Added more debugging niansa 2023-06-23 18:26:21 +0200
  • e6da9bd96b Added ggml_vk_mem_used() niansa 2023-06-23 17:57:09 +0200
  • 6d718525c4 Merge branch 'optimize_quants_upstream' into concedo_experimental Concedo 2023-06-23 23:56:31 +0800
  • f7b096374d fixed string too long CI issue Concedo 2023-06-23 23:56:22 +0800
  • 0cc5c5325c Improve ggml_graph_dump_dot, add ggml_format_name slaren 2023-06-23 17:18:17 +0200
  • 1a68195408 Add mutexes for gpu tensors niansa 2023-06-23 17:46:09 +0200
  • 0f07f37b88 applies the top k sampler first anon 2023-06-23 12:40:41 -0300
  • 46f577bfc1 h2d tensors during loadup niansa 2023-06-23 17:10:45 +0200
  • 490cf395f8 better alloc error Concedo 2023-06-23 22:51:51 +0800
  • 98e588c6eb Fix ggml_vk_h2d_tensor throwing on second call niansa 2023-06-23 16:50:37 +0200
  • ece453ed09 Merge branch 'master' into concedo_experimental Concedo 2023-06-23 22:46:54 +0800
  • f39a746089 bug fixes for openblas Concedo 2023-06-23 22:45:22 +0800
  • 328dea41ba convert : fix invalid params in write_vocab_only AN Long 2023-06-23 22:45:08 +0800
  • 09b0b3a49b Wait for all threads to finish niansa 2023-06-23 16:13:32 +0200
  • 2589cb0c70 Prevent compileSource race niansa 2023-06-23 16:02:49 +0200
  • 5c0d8dd0f2 Specify program output size niansa 2023-06-23 15:58:13 +0200
  • e830264c92 Share sequence to functions and add scale() niansa 2023-06-23 15:10:24 +0200
  • 5e9403342b Minor fixes niansa 2023-06-23 15:01:09 +0200
  • b6264542b7 Added vk_mul to ggml_vk_graph_compute niansa 2023-06-23 14:19:31 +0200
  • 18d6f7f8da More progress... niansa 2023-06-23 14:08:45 +0200
  • d539247996 Began implementing ggml_graph_compute niansa 2023-06-23 14:03:33 +0200
  • 43c2891afa option to not use scratch Concedo 2023-06-23 19:01:36 +0800
  • d5e4cf7ffe handle ctx manip Concedo 2023-06-23 19:01:15 +0800
  • df9135e3a9 fixing memory bugs Concedo 2023-06-23 18:41:23 +0800
  • b8a4594f89 More fixes... niansa 2023-06-23 12:19:33 +0200
  • 9d643755a6 Fixed compile error niansa 2023-06-23 11:51:25 +0200
  • 339bc36cdd Added more functions from Metal niansa 2023-06-23 11:50:30 +0200
  • d7b7484f74 Add OpenLLaMA instructions to the README (#1954) eiery 2023-06-23 04:38:01 -0400
  • 5d077341f7 Fix ggml-metal.metal path and run nixfmt novafacing 2023-06-23 00:31:36 -0700
  • 6dd5bd7e43 readme : fixed termux instructions Alberto 2023-06-23 09:15:34 +0200
  • 6c76c31184 Merge branch 'ggerganov:master' into master WangHaoranRobin 2023-06-22 22:14:41 -0700
  • 7cd8fc20d0 Merge pull request #4 from WangHaoranRobin/robin_fork_master WangHaoranRobin 2023-06-22 22:00:21 -0700
  • 7b93b248ef server: fix some beginner mistakes Wang Haoran(Robin) 2023-06-22 21:59:12 -0700
  • bdb710efa2 Merge pull request #3 from WangHaoranRobin/robin_fork_master WangHaoranRobin 2023-06-22 21:36:50 -0700
  • cf76195223 server: fix issue when handling probability output for incomplete tokens for multibyte character generation Wang Haoran(Robin) 2023-06-22 21:35:37 -0700
  • 3349c01357 3b works now eiery 2023-06-22 22:51:52 -0400
  • e92795f2f4 Add CUDA and hopefully Metal support for p_scale KerfuffleV2 2023-06-22 14:06:13 -0600
  • df7346ccd5 Merge 'origin/master' into hipblas Henri Vasserman 2023-06-22 20:51:09 +0300
  • 926664c229 Merge pull request #2 from WangHaoranRobin/robin_fork_master WangHaoranRobin 2023-06-22 09:01:42 -0700
  • ccf254bd44 server: fix comment about max n_probs Wang Haoran(Robin) 2023-06-22 08:57:35 -0700
  • 9cdaea9240 Implemented dequantize_row_q4_1 niansa 2023-06-22 16:30:36 +0200
  • 887694acfd Handle rope params in CUDA, Metal KerfuffleV2 2023-06-22 08:18:01 -0600
  • b0f11fa9c1 More code cleanups niansa 2023-06-22 16:05:56 +0200
  • 4bf45a7dbe Helps to pass args in the correct order KerfuffleV2 2023-06-22 06:37:21 -0600
  • 7487137227 rework convert.py to read hyper-parameters from config.json (#1958) master-7487137 Erik Scholz 2023-06-22 14:20:47 +0200
  • 3b3d30e4ad Cleanups niansa 2023-06-22 13:55:25 +0200
  • 2f3fe0c0a4 Updated gitignore niansa 2023-06-22 12:58:33 +0200
  • 4f598dd973 Initial working stuff niansa 2023-06-22 12:58:07 +0200
  • bc17e11590 Allow specifying p scale factor for ggml rope and rope_back ops KerfuffleV2 2023-06-22 05:29:11 -0600
  • 0eedccaf06 Merge branch 'master' into optimize_quants_upstream Concedo 2023-06-22 17:59:58 +0800
  • e6ddb15c3a cleanup Concedo 2023-06-22 10:38:27 +0800
  • bbca06e269 cmake: revert CUDA arch default to 52, 61 if f16 (#1959) master-bbca06e Johannes Gäßler 2023-06-21 23:49:25 +0200
  • fb98254f99 Fix typo in README.md (#1961) Rahul Vivek Nair 2023-06-22 03:18:43 +0530
  • 8004e673f0 Merge pull request #1 from WangHaoranRobin/robin_fork_master WangHaoranRobin 2023-06-21 14:28:46 -0700
  • ba210e4bc7 server: add option to output probabilities for completion Wang Haoran(Robin) 2023-06-21 14:21:35 -0700
  • c7dc5a37c3 remove 3b eiery 2023-06-21 16:28:56 -0400
  • dbf02472bd cmake: revert CUDA arch default to 52, 61 if f16 JohannesGaessler 2023-06-21 15:28:55 +0200
  • 022d099376 Fix typo in README.md RahulVivekNair 2023-06-21 23:54:35 +0530
  • 0141e6395c clean up previous hack Green Sky 2023-06-21 19:52:40 +0200
  • 1b71752a9f Implemented basic GPU offloading for MPT, GPT-2, GPT-J and GPT-NeoX Concedo 2023-06-22 00:43:25 +0800
  • b1f00fa9cc Fix hordeconfig max context setting, and add Makefile flags for cuda F16/KQuants per iter. (#252) Ycros 2023-06-22 01:01:46 +1000
  • 72397fbe63 read params from hftransformer config.json Green Sky 2023-06-21 15:12:09 +0200
  • dfdd20240c gpt j use scratch buffers Concedo 2023-06-21 16:10:31 +0800
  • 2880f43b7f add test for correct top-p behavior Alex Renda 2023-06-20 20:49:43 -0400
  • 407b77cdb3 top-p: correct gt to gte Alex Renda 2023-06-20 20:48:09 -0400
  • d7714a8f80 Deprecate public API function llama_apply_lora_from_file Didzis Gosko 2023-06-21 00:08:49 +0300
  • 69f776282b Update public API use cases: move away from deprecated llama_init_from_file Didzis Gosko 2023-06-20 23:47:33 +0300
  • 6ef282f2b8 spacing eiery 2023-06-20 16:16:05 -0400
  • aa4df44134 whitespace eiery 2023-06-20 16:14:11 -0400
  • ff24bd7667 table of contents eiery 2023-06-20 15:47:05 -0400
  • f5a276b265 add openllama to readme eiery 2023-06-20 15:41:30 -0400
  • 1e7755cfcb Fix top-p sampling to match the standard definition (smallest set that has probability mass at least p, not largest set with probability mass less than p) Alex Renda 2023-06-20 14:38:13 -0400 (see the sketch after this list)
  • a9eb1e73e9 Fix typo Xiake Sun 2023-06-20 12:47:22 -0400
  • 049aa16b8c readme : add link to p1 Georgi Gerganov 2023-06-20 19:05:54 +0300
  • 0dcfe45c1c Fix crash when running train with CUDA enabled Howard Su 2023-06-20 23:58:37 +0800
  • 53dfbbf553 add example of PandaGPT ningshanwutuobang 2023-06-20 22:57:21 +0800
  • 266d47a4b9 Merge branch 'optimize_quants_upstream' into concedo_experimental Concedo 2023-06-20 22:46:35 +0800
  • da668e685f fixing address spaces Concedo 2023-06-20 22:45:16 +0800
  • cce6e67f44 fixing address spaces Concedo 2023-06-20 22:45:16 +0800
  • 1f1735f5ad Merge branch 'optimize_quants_upstream' into concedo_experimental Concedo 2023-06-20 21:39:35 +0800
  • 6b75fc48b9 fixed global const struct types Concedo 2023-06-20 21:38:48 +0800
  • 2322ec223a Fix typo (#1949) Xiake Sun 2023-06-20 05:42:40 -0700
  • 537ff22ec9 fixed a bug with token timings, updated lite Concedo 2023-06-20 20:41:42 +0800
  • c5ae3f50a7 Merge branch 'optimize_quants_upstream' into concedo_experimental Concedo 2023-06-20 18:41:13 +0800
  • a6e8b0216d remove old dot kernels and template Concedo 2023-06-20 18:34:46 +0800
  • 93247a11cd ported q2k and q5k speedups Concedo 2023-06-20 18:30:30 +0800
  • 029bed6446 ported q3k speedup successfully Concedo 2023-06-20 17:57:44 +0800
  • d754915269 Merge branch 'optimize_quants_upstream' into concedo_experimental Concedo 2023-06-20 17:26:39 +0800
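
The top-p commits above (2880f43b7f, 407b77cdb3, 1e7755cfcb) all revolve around the standard nucleus-sampling definition: keep the smallest prefix of the probability-sorted candidates whose cumulative mass is at least p, using >= rather than > for the cutoff. Below is a minimal standalone sketch of that rule, assuming an already-normalized probability list; it is not the actual llama.cpp sampler code, and the function name and types are illustrative only.

```cpp
#include <algorithm>
#include <cstdio>
#include <functional>
#include <vector>

// Illustrative top-p (nucleus) cutoff: keep the smallest prefix of the
// probability-sorted candidates whose cumulative mass is >= p.
// Standalone sketch of the "standard definition" referenced in 1e7755cfcb,
// not the llama.cpp implementation.
std::vector<float> top_p_filter(std::vector<float> probs, float p) {
    std::sort(probs.begin(), probs.end(), std::greater<float>());
    float cum = 0.0f;
    size_t keep = probs.size();
    for (size_t i = 0; i < probs.size(); ++i) {
        cum += probs[i];
        if (cum >= p) {     // ">=" rather than ">" ("gt to gte", 407b77cdb3)
            keep = i + 1;   // smallest set with mass at least p
            break;
        }
    }
    probs.resize(keep);
    // Renormalize the surviving probabilities.
    float total = 0.0f;
    for (float q : probs) total += q;
    for (float &q : probs) q /= total;
    return probs;
}

int main() {
    std::vector<float> kept = top_p_filter({0.5f, 0.3f, 0.15f, 0.05f}, 0.7f);
    for (float q : kept) std::printf("%.3f\n", q);
}
```

With p = 0.7 and probabilities {0.5, 0.3, 0.15, 0.05}, the first two candidates are kept because 0.5 + 0.3 = 0.8 >= 0.7; under the older "largest set with mass less than p" reading, only the first candidate would survive.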