Commit Graph

  • 6d18c6ea3e Fix the number of forward looking nodes Howard Su 2023-04-10 22:37:10 +0800
  • 6f2a61eb4f Rework scheduling algorithm. Howard Su 2023-04-10 22:24:27 +0800
  • f74aeef379 update readme zjli2019 2023-04-10 17:50:51 +0800
  • 80d468e681 update zjli2019 2023-04-10 17:02:54 +0800
  • f53238f570 Merged the upstream updates for model loading code, and ditched the legacy llama loaders since they were no longer needed. Concedo 2023-04-10 12:00:34 +0800
  • 56b6fa5397 linux will need unistd.h John 2023-04-10 05:23:53 +0200
  • 5010b6ae84 Adds _PRELOAD_MMAP_FILE flag to fully preload the model even when using mmap(). This brings back consistency so benchmarking token inference does not depend on ssd/disk speed anymore. John 2023-04-10 05:14:35 +0200
  • e163b7375f add more configs Tomáš Pazdiora 2023-04-10 00:33:35 +0200
  • 3d2bf47f24 default multiline mode Tomáš Pazdiora 2023-04-09 21:56:16 +0200
  • c1be1ee073 bugfix Tomáš Pazdiora 2023-04-09 21:20:39 +0200
  • 73966bc983 Change --multiline implementation to be toggled by EOF. Tomáš Pazdiora 2023-04-09 21:05:11 +0200
  • d98f613cc4 typos Tomáš Pazdiora 2023-04-09 20:19:57 +0200
  • 7c60721217 update implementation Tomáš Pazdiora 2023-04-09 19:49:33 +0200
  • 0dc9dcff13 bugfix Tomáš Pazdiora 2023-04-09 18:38:23 +0200
  • ddf6bd6497 Add multiline mode, update text input. Tomáš Pazdiora 2023-04-09 18:36:23 +0200
  • 80181c0712 Add support for configs, add configurable prefixes / suffixes, deprecate instruct mode, add stop prompt Tomáš Pazdiora 2023-04-09 06:11:29 +0200
  • 180b693a47 Print model version. master-180b693 comex 2023-04-08 13:08:21 -0700
  • f963b63afa Rewrite loading code to try to satisfy everyone: comex 2023-04-08 12:24:37 -0700
  • a78c42d5da Added token conversion script to convert from tokenizer.json format to tokenizer.model format, tested with bigscience models aeslampanah 2023-04-09 18:33:03 -0400
  • b543273254 initialize ggml_compute_state_shared with designated initializers, thanks @sw mqy 2023-04-10 02:53:57 +0800
  • faa3dde7b8 remove feature flag DISABLE_GGML_COMPUTE_SPIN_V2 mqy 2023-04-10 02:37:31 +0800
  • 18a154715e added version label, improved file type checks Concedo 2023-04-10 01:03:09 +0800
  • 2035a3cc29 avoid to change ggml_task_type Howard Su 2023-04-09 22:11:24 +0800
  • 1543c700d8 added a missing endpoint for tavern Concedo 2023-04-09 17:41:33 +0800
  • b91abc3316 increase default blas batch size Concedo 2023-04-09 15:27:43 +0800
  • 4d1825263b Merge branch 'master' into concedo Concedo 2023-04-09 13:22:40 +0800
  • 7b418494bd Print model version. comex 2023-04-08 13:08:21 -0700
  • e2cb5ab1bf Rewrite loading code to try to satisfy everyone: comex 2023-04-08 12:24:37 -0700
  • 26a7933084 hide the tiny tkinter window Concedo 2023-04-09 01:01:34 +0800
  • a0c015bab0 Merge 678e138970 into aaf3b23deb Stephan Walter 2023-04-08 18:26:39 +0200
  • aaf3b23deb fix for windows utf-8 input (#840) master-aaf3b23 Tomáš Pazdiora 2023-04-08 17:49:39 +0200
  • d1315a34aa Fix wasm build after breaking it in #356 Stephan Walter 2023-04-08 16:06:56 +0200
  • ac7a69fa33 Run several single thread operator in worker threads Howard Su 2023-04-08 20:46:11 +0800
  • 3b03df5c05 look forward more Howard Su 2023-04-08 19:55:29 +0800
  • 4598aa8ab5 cleanup Tomáš Pazdiora 2023-04-07 23:49:23 +0200
  • ff21b16ea4 Use UTF-16 as input on Windows, since UTF-8 does not work and reads multibyte characters as zeros. Tomáš Pazdiora 2023-04-07 23:44:20 +0200
  • f2d1c47294 cmake should link openblas properly with -lopenblas like how it's done in the makefile (#839) master-f2d1c47 eiery 2023-04-08 07:15:17 -0400
  • a55f190249 flake.nix: add all binaries from bin Pavol Rusnak 2023-04-08 12:49:06 +0200
  • 67a0878f99 fix endline Claude Doppler 2023-04-08 10:05:34 +0000
  • 317fb12fbd Add new binaries to flake.nix (#847) lon 2023-04-08 07:04:23 -0300
  • d335fae7c4 missed a print statement Concedo 2023-04-08 17:59:53 +0800
  • 71ad5c1e22 Add new binaries to flake.nix Claude Doppler 2023-04-08 09:51:26 +0000
  • 0b904e12db Merge branch 'master' into concedo Concedo 2023-04-08 17:42:09 +0800
  • 5dd610032e Merge pull request #27 from ariez-xyz/patch-1 LostRuins 2023-04-08 17:37:39 +0800
  • d8e37bfe75 new gpt2 format supported Concedo 2023-04-08 17:35:36 +0800
  • 678e138970 Update stats tool for unbounded's method Stephan Walter 2023-04-08 10:46:49 +0200
  • b48255db19 add more precise instructions for arch ariez-xyz 2023-04-08 10:41:57 +0200
  • 4dc62e78d8 Really slow RMS "optimal" scaling for q4_0 Håkon H. Hitland 2023-04-07 17:13:29 +0200
  • 40ebf819b0 Q4_0 scale selection using RMSE Stephan Walter 2023-04-07 13:49:51 +0200
  • 1369b46bb7 notice about false positives Concedo 2023-04-08 12:20:48 +0800
  • 9fd062fd2e feat: add "stop" keywords as alternative to eot token Claude Doppler 2023-04-04 20:33:09 +0000
  • 62cfc54f77 Add quantize-stats command for testing quantization (#728) master-62cfc54 unbounded 2023-04-08 00:09:18 +0200
  • 5e5a653555 Interleave threads Will Beddow 2023-04-07 17:00:37 -0400
  • 45e532eb5d cmake should link openblas properly with -lopenblas like how it's done in the makefile eiery 2023-04-07 16:15:03 -0400
  • d1c957ee64 strip symbols Concedo 2023-04-08 00:59:34 +0800
  • 921296c0d5 avoid malloc/free in critial path Howard Su 2023-04-08 00:47:19 +0800
  • 455f6f79bc Try find other single threaded operator to run Howard Su 2023-04-08 00:34:05 +0800
  • 698f7b5d63 make : add libllama.so target for llama-cpp-python (#797) master-698f7b5 bhubbb 2023-04-08 02:11:58 +1000
  • c1950c3431 zig : don't link examples/common.cpp for non-example (#814) iacore 2023-04-07 16:05:29 +0000
  • 9b41742d88 try to fix cblas_sgemm call in ggml_compute_forward_mul_mat_* Vladimir 2023-04-07 18:00:44 +0200
  • 4953e9007f llama : always sort logits before nucleus sampling (#812) master-4953e90 Ivan Stepanov 2023-04-07 19:02:12 +0300
  • 43dde039b0 Run second operator when possible Howard Su 2023-04-07 23:51:46 +0800
  • 289c40df94 updated embedded kobold Concedo 2023-04-07 22:39:20 +0800
  • c640d2a4bd Remove finalizer Howard Su 2023-04-07 22:24:14 +0800
  • b8c9b27452 Merge remote-tracking branch 'tp/Pithikos-C-Thread-Pool2' into tp_schedule Howard Su 2023-04-07 21:31:07 +0800
  • 5ad9e9531f Only check hardware when option is ON Howard Su 2023-04-07 21:04:47 +0800
  • 1abcdb2394 should not be static Concedo 2023-04-07 20:35:19 +0800
  • 43949f7c7c Merge branch 'master' into concedo Concedo 2023-04-07 20:34:06 +0800
  • f322a5820e fixed positional port arg Concedo 2023-04-07 17:46:33 +0800
  • 1d48db4f63 dont build quantize Concedo 2023-04-07 17:11:26 +0800
  • edd57a186c Update README.md saharNooby 2023-04-07 10:16:12 +0400
  • e26b408ea7 Add Q4_1_O test saharNooby 2023-04-07 10:12:19 +0400
  • 18bf02fea4 Use ggml function for parameter size calculation saharNooby 2023-04-07 10:01:04 +0400
  • c40941d9d0 Add Q4_1_O format saharNooby 2023-04-07 09:55:39 +0400
  • 1b6fd5470b WIP on f16 Will Beddow 2023-04-07 00:49:19 -0400
  • 32d0654dd0 Impl for 32 Will Beddow 2023-04-06 23:46:22 -0400
  • 9603f7f5bf ggml: refactor compute thread: merge three spin variables into one mqy 2023-04-07 05:18:37 +0800
  • ec99bc1765 Do not quantize head saharNooby 2023-04-06 16:26:18 +0400
  • 058b5cd1e6 Show file compression ratio saharNooby 2023-04-04 20:20:34 +0400
  • fa9ad13a39 Free ggml context when model is garbage collected saharNooby 2023-04-05 15:55:47 +0400
  • ad3a4ebc57 Add missing labels and symbols for new operators saharNooby 2023-04-06 20:26:31 +0400
  • cc9cee8e9e Do not crash when it has nothing to say. (#796) master-cc9cee8 Sergey Alirzaev 2023-04-06 17:59:11 +0200
  • 2ceeccf8dc remove second normalization Ivan Stepanov 2023-04-06 18:03:37 +0300
  • 97443359c3 Don't link examples/common.cpp for non-example Locria Cyber 2023-04-06 14:16:50 +0000
  • 709a958a2f Optimize locking behavior Jan Bielak 2023-04-06 16:18:49 +0200
  • 9f0c7cdb8f Always sort logits before nucleus sampling Ivan Stepanov 2023-04-06 17:05:38 +0300
  • 4f5faf9612 some users report that this repo is now being flagged as malicious? no idea why, but I am removing all prebuilt binaries except libopenblas. windows users can still obtain it from /releases and osx and linux users can rebuild from source code. Concedo 2023-04-06 21:49:43 +0800
  • 997c749065 Add detection code for avx Howard Su 2023-04-01 16:32:14 +0800
  • 3b3caf2e64 optimize rope function to avoid call powf in the tight loop Howard Su 2023-04-06 19:50:09 +0800
  • b56f872b61 update embedded kobold lite Concedo 2023-04-06 16:34:51 +0800
  • d2beca95dc Make docker instructions more explicit (#785) Pavol Rusnak 2023-04-06 08:56:58 +0200
  • 0e889ed6db Merge branch 'master' into concedo Concedo 2023-04-06 11:14:44 +0800
  • 3d650d0e25 remove dependency of psutil, fixed compile error on WSL, handle exceptions when sending http response, added multiline for embedded kobold Concedo 2023-04-06 11:08:19 +0800
  • 79ed023891 Do not crash when it has nothing to say. Sergey Alirzaev 2023-04-06 00:57:31 +0200
  • e65e832082 ADD libllama.so target for llama-cpp-python Brendan Hubble 2023-04-06 08:56:53 +1000
  • 41d4a863c9 Remove "internal" header files Håkon H. Hitland 2023-04-05 22:18:58 +0200
  • 4778f93611 Merge branch 'master' into eval-thread-count ml6 2023-04-05 12:44:50 -0700
  • 36ddd12924 llama : add flash attention (demo) flash-attn Georgi Gerganov 2023-04-05 18:28:01 +0300
  • eeaa7b0492 ggml : multi-thread ggml_rope() (~3-4 times faster on M1) (#781) master-eeaa7b0 Georgi Gerganov 2023-04-05 22:11:03 +0300
  • 372b70c39d ggml : multi-thread ggml_rope() (~3-4 times faster on M1) Georgi Gerganov 2023-04-05 19:15:21 +0300