Commit Graph

  • f4c1c6b97a Updated preloader to use multithreading - currently set to 50% of the available threads on the system Tested on Windows - a small performance during loading is not avoidable but this is the best possible solution On Linux John 2023-04-11 00:28:04 +0200
  • 74b92ff6b8 Add helper script to convert hf (pytorch) models into ggml format aeslampanah 2023-04-10 17:21:33 -0400
  • 0f934b79de Merge 94ddd6204c into a0caa34b16 Howard Su 2023-04-10 22:58:20 +0200
  • a0caa34b16 Add BAIR's Koala to supported models (#877) qouoq 2023-04-11 04:41:53 +0800
  • 461ba9e66e ggml : fix WASM build master-461ba9e Georgi Gerganov 2023-04-10 23:20:01 +0300
  • 55ffe2e46c Fix whitespace, add .editorconfig, add GitHub workflow Pavol Rusnak 2023-04-10 22:09:54 +0200
  • c3ac702e5e ggml : add ggml_cont() + optimize ggml_cpy() for contiguous dst master-c3ac702 Georgi Gerganov 2023-04-10 22:40:28 +0300
  • 9d634ef452 ggml : remove trailing whitespaces Georgi Gerganov 2023-04-10 19:32:45 +0300
  • d9a239c410 Simplify to include lower-case windows.h always, fix compile on mingw32 (#747) master-d9a239c Marco Matthies 2023-04-10 19:57:59 +0200
  • 417e3f409d Merge branch 'master' into fix-mingw32-includes Pavol Rusnak 2023-04-10 19:55:10 +0200
  • 5f7b3837a6 Add BAIR's Koala to supported models qouoq 2023-04-11 01:53:45 +0800
  • 684da25926 ggml : fix quantize_row_q4_1() ARM_NEON (close #876) master-684da25 Georgi Gerganov 2023-04-10 19:29:48 +0300
  • c3db99ea32 Allow use of OpenCL GPU-based BLAS using ClBlast instead of OpenBLAS for context processing 0cc4m 2023-04-10 09:49:40 +0200
  • 69b85f5b61 fixed a few OOM errors with larger contexts - I cannot figure out why they happen, so I am forced to increase the buffer size. Concedo 2023-04-11 00:14:57 +0800
  • 776b2cb135 Add enum llama_ftype, sync ggml_type to model files Stephan Walter 2023-04-02 13:37:23 +0200
  • 94ddd6204c Simplify the logic of scheduling Howard Su 2023-04-10 22:37:37 +0800
  • 6d18c6ea3e Fix the number of forward looking nodes Howard Su 2023-04-10 22:37:10 +0800
  • 6f2a61eb4f Rework scheduling algorithm. Howard Su 2023-04-10 22:24:27 +0800
  • f74aeef379 update readme zjli2019 2023-04-10 17:50:51 +0800
  • 80d468e681 update zjli2019 2023-04-10 17:02:54 +0800
  • f53238f570 Merged the upstream updates for model loading code, and ditched the legacy llama loaders since they were no longer needed. Concedo 2023-04-10 12:00:34 +0800
  • 56b6fa5397 linux will need unistd.h John 2023-04-10 05:23:53 +0200
  • 5010b6ae84 Adds _PRELOAD_MMAP_FILE flag to fully preload the model even when using mmap(). This brings back consistency so benchmarking token inference does not depend on ssd/disk speed anymore. John 2023-04-10 05:14:35 +0200
  • e163b7375f add more configs Tomáš Pazdiora 2023-04-10 00:33:35 +0200
  • 3d2bf47f24 default multiline mode Tomáš Pazdiora 2023-04-09 21:56:16 +0200
  • c1be1ee073 bugfix Tomáš Pazdiora 2023-04-09 21:20:39 +0200
  • 73966bc983 Change --multiline implementation to be toggled by EOF. Tomáš Pazdiora 2023-04-09 21:05:11 +0200
  • d98f613cc4 typos Tomáš Pazdiora 2023-04-09 20:19:57 +0200
  • 7c60721217 update implementation Tomáš Pazdiora 2023-04-09 19:49:33 +0200
  • 0dc9dcff13 bugfix Tomáš Pazdiora 2023-04-09 18:38:23 +0200
  • ddf6bd6497 Add multiline mode, update text input. Tomáš Pazdiora 2023-04-09 18:36:23 +0200
  • 80181c0712 Add support for configs, add configurable prefixes / suffixes, deprecate instruct mode, add stop prompt Tomáš Pazdiora 2023-04-09 06:11:29 +0200
  • 180b693a47 Print model version. master-180b693 comex 2023-04-08 13:08:21 -0700
  • f963b63afa Rewrite loading code to try to satisfy everyone: comex 2023-04-08 12:24:37 -0700
  • a78c42d5da Added token conversion script to convert from tokenizer.json format to tokenizer.model format, tested with bigscience models aeslampanah 2023-04-09 18:33:03 -0400
  • b543273254 initialize ggml_compute_state_shared with designated initializers, thanks @sw mqy 2023-04-10 02:53:57 +0800
  • faa3dde7b8 remove feature flag DISABLE_GGML_COMPUTE_SPIN_V2 mqy 2023-04-10 02:37:31 +0800
  • 18a154715e added version label, improved file type checks Concedo 2023-04-10 01:03:09 +0800
  • 2035a3cc29 avoid to change ggml_task_type Howard Su 2023-04-09 22:11:24 +0800
  • 1543c700d8 added a missing endpoint for tavern Concedo 2023-04-09 17:41:33 +0800
  • b91abc3316 increase default blas batch size Concedo 2023-04-09 15:27:43 +0800
  • 4d1825263b Merge branch 'master' into concedo Concedo 2023-04-09 13:22:40 +0800
  • 7b418494bd Print model version. comex 2023-04-08 13:08:21 -0700
  • e2cb5ab1bf Rewrite loading code to try to satisfy everyone: comex 2023-04-08 12:24:37 -0700
  • 26a7933084 hide the tiny tkinter window Concedo 2023-04-09 01:01:34 +0800
  • a0c015bab0 Merge 678e138970 into aaf3b23deb Stephan Walter 2023-04-08 18:26:39 +0200
  • aaf3b23deb fix for windows utf-8 input (#840) master-aaf3b23 Tomáš Pazdiora 2023-04-08 17:49:39 +0200
  • d1315a34aa Fix wasm build after breaking it in #356 Stephan Walter 2023-04-08 16:06:56 +0200
  • ac7a69fa33 Run several single thread operator in worker threads Howard Su 2023-04-08 20:46:11 +0800
  • 3b03df5c05 look forward more Howard Su 2023-04-08 19:55:29 +0800
  • 4598aa8ab5 cleanup Tomáš Pazdiora 2023-04-07 23:49:23 +0200
  • ff21b16ea4 Use UTF-16 as input on Windows, since UTF-8 does not work and reads multibyte characters as zeros. Tomáš Pazdiora 2023-04-07 23:44:20 +0200
  • f2d1c47294 cmake should link openblas properly with -lopenblas like how it's done in the makefile (#839) master-f2d1c47 eiery 2023-04-08 07:15:17 -0400
  • a55f190249 flake.nix: add all binaries from bin Pavol Rusnak 2023-04-08 12:49:06 +0200
  • 67a0878f99 fix endline Claude Doppler 2023-04-08 10:05:34 +0000
  • 317fb12fbd Add new binaries to flake.nix (#847) lon 2023-04-08 07:04:23 -0300
  • d335fae7c4 missed a print statement Concedo 2023-04-08 17:59:53 +0800
  • 71ad5c1e22 Add new binaries to flake.nix Claude Doppler 2023-04-08 09:51:26 +0000
  • 0b904e12db Merge branch 'master' into concedo Concedo 2023-04-08 17:42:09 +0800
  • 5dd610032e Merge pull request #27 from ariez-xyz/patch-1 LostRuins 2023-04-08 17:37:39 +0800
  • d8e37bfe75 new gpt2 format supported Concedo 2023-04-08 17:35:36 +0800
  • 678e138970 Update stats tool for unbounded's method Stephan Walter 2023-04-08 10:46:49 +0200
  • b48255db19 add more precise instructions for arch ariez-xyz 2023-04-08 10:41:57 +0200
  • 4dc62e78d8 Really slow RMS "optimal" scaling for q4_0 Håkon H. Hitland 2023-04-07 17:13:29 +0200
  • 40ebf819b0 Q4_0 scale selection using RMSE Stephan Walter 2023-04-07 13:49:51 +0200
  • 1369b46bb7 notice about false positives Concedo 2023-04-08 12:20:48 +0800
  • 9fd062fd2e feat: add "stop" keywords as alternative to eot token Claude Doppler 2023-04-04 20:33:09 +0000
  • 62cfc54f77 Add quantize-stats command for testing quantization (#728) master-62cfc54 unbounded 2023-04-08 00:09:18 +0200
  • 5e5a653555 Interleave threads Will Beddow 2023-04-07 17:00:37 -0400
  • 45e532eb5d cmake should link openblas properly with -lopenblas like how it's done in the makefile eiery 2023-04-07 16:15:03 -0400
  • d1c957ee64 strip symbols Concedo 2023-04-08 00:59:34 +0800
  • 921296c0d5 avoid malloc/free in critical path Howard Su 2023-04-08 00:47:19 +0800
  • 455f6f79bc Try find other single threaded operator to run Howard Su 2023-04-08 00:34:05 +0800
  • 698f7b5d63 make : add libllama.so target for llama-cpp-python (#797) master-698f7b5 bhubbb 2023-04-08 02:11:58 +1000
  • c1950c3431 zig : don't link examples/common.cpp for non-example (#814) iacore 2023-04-07 16:05:29 +0000
  • 9b41742d88 try to fix cblas_sgemm call in ggml_compute_forward_mul_mat_* Vladimir 2023-04-07 18:00:44 +0200
  • 4953e9007f llama : always sort logits before nucleus sampling (#812) master-4953e90 Ivan Stepanov 2023-04-07 19:02:12 +0300
  • 43dde039b0 Run second operator when possible Howard Su 2023-04-07 23:51:46 +0800
  • 289c40df94 updated embedded kobold Concedo 2023-04-07 22:39:20 +0800
  • c640d2a4bd Remove finalizer Howard Su 2023-04-07 22:24:14 +0800
  • b8c9b27452 Merge remote-tracking branch 'tp/Pithikos-C-Thread-Pool2' into tp_schedule Howard Su 2023-04-07 21:31:07 +0800
  • 5ad9e9531f Only check hardware when option is ON Howard Su 2023-04-07 21:04:47 +0800
  • 1abcdb2394 should not be static Concedo 2023-04-07 20:35:19 +0800
  • 43949f7c7c Merge branch 'master' into concedo Concedo 2023-04-07 20:34:06 +0800
  • f322a5820e fixed positional port arg Concedo 2023-04-07 17:46:33 +0800
  • 1d48db4f63 dont build quantize Concedo 2023-04-07 17:11:26 +0800
  • edd57a186c Update README.md saharNooby 2023-04-07 10:16:12 +0400
  • e26b408ea7 Add Q4_1_O test saharNooby 2023-04-07 10:12:19 +0400
  • 18bf02fea4 Use ggml function for parameter size calculation saharNooby 2023-04-07 10:01:04 +0400
  • c40941d9d0 Add Q4_1_O format saharNooby 2023-04-07 09:55:39 +0400
  • 1b6fd5470b WIP on f16 Will Beddow 2023-04-07 00:49:19 -0400
  • 32d0654dd0 Impl for 32 Will Beddow 2023-04-06 23:46:22 -0400
  • 9603f7f5bf ggml: refactor compute thread: merge three spin variables into one mqy 2023-04-07 05:18:37 +0800
  • ec99bc1765 Do not quantize head saharNooby 2023-04-06 16:26:18 +0400
  • 058b5cd1e6 Show file compression ratio saharNooby 2023-04-04 20:20:34 +0400
  • fa9ad13a39 Free ggml context when model is garbage collected saharNooby 2023-04-05 15:55:47 +0400
  • ad3a4ebc57 Add missing labels and symbols for new operators saharNooby 2023-04-06 20:26:31 +0400
  • cc9cee8e9e Do not crash when it has nothing to say. (#796) master-cc9cee8 Sergey Alirzaev 2023-04-06 17:59:11 +0200
  • 2ceeccf8dc remove second normalization Ivan Stepanov 2023-04-06 18:03:37 +0300
  • 97443359c3 Don't link examples/common.cpp for non-example Locria Cyber 2023-04-06 14:16:50 +0000