Commit Graph

  • ff966e7ca6 build : fix several cast and printf warnings (#2499) master-ff966e7 Borislav Stanimirov 2023-08-04 13:07:21 +0300
  • db5618ad99 cmpnct_gpt2bpe.hpp : comments klosax 2023-08-04 04:57:51 +0200
  • f0764c6cfb fix indentation, increase server thread count Concedo 2023-08-04 10:29:56 +0800
  • d09e54aad1 Merge remote-tracking branch 'duncan/api-stream-double-write-fix' into concedo_experimental Concedo 2023-08-04 10:22:53 +0800
  • 278ada9572 gguf.py : bytesarray for gpt2bpe tokenizer klosax 2023-08-04 04:07:57 +0200
  • fb0b243705 Makefile : remove gptneox-common klosax 2023-08-04 04:02:10 +0200
  • 5d98989cf6 gpt2 bpe tokenizer (handles merges and unicode) klosax 2023-08-04 03:58:44 +0200
  • e6f19ba240 gptneox-main.cpp : gpt2 bpe tokenizer klosax 2023-08-04 03:56:37 +0200
  • 2922280a1a convert-gptneox-h5-to-gguf.py : gpt2bpe tokenizer klosax 2023-08-04 03:55:23 +0200
  • 6691aa8797 Delete gptneox-common.h klosax 2023-08-04 03:52:01 +0200
  • 23abbe8e00 Delete gptneox-common.cpp klosax 2023-08-04 03:51:43 +0200
  • c1320fd54a CUDA: use min compute capability of GPUs actually used Cebtenzzre 2023-08-03 16:40:34 -0400
  • c79c66bdf3 CUDA: check if event is NULL before cudaStreamWaitEvent Cebtenzzre 2023-08-03 15:04:24 -0400
  • d4a126cdb5 removed unused llama-util.h include l3utterfly 2023-08-03 21:03:54 +0800
  • 142da8f9dc fixed whitepace l3utterfly 2023-08-03 21:03:29 +0800
  • 30144f7634 restored save load state example l3utterfly 2023-08-03 21:03:15 +0800
  • cffe923ea3 fixed function declaration order l3utterfly 2023-08-03 20:52:54 +0800
  • 7c5b2b57b0 - restored breakage of the llama_copy_state_data API - moved new logic for copying llama state data to internal function l3utterfly 2023-08-03 20:47:35 +0800
  • 63ec711a70 fix: still send full result after streaming duncannah 2023-08-03 14:35:43 +0200
  • 601eef7f06 Add --simple-io option for subprocesses and break out console.h and cpp Danny Daemonic 2023-05-21 22:35:32 -0700
  • 86ac49c5c5 build : fix several cast and printf warnings Borislav Stanimirov 2023-08-03 13:40:00 +0300
  • 1ffa6be726 updated save load state to use public function in llama.cpp l3utterfly 2023-08-03 17:30:13 +0800
  • 987387859b fixed save load state example l3utterfly 2023-08-03 17:21:55 +0800
  • 81f347c19d fixed trailing whitespaces l3utterfly 2023-08-03 17:13:12 +0800
  • 87fcdd971c added comments explaining how copy_state_data works l3utterfly 2023-08-03 17:11:17 +0800
  • 3cebd6e4b7 generalised copying state data to file or buffer l3utterfly 2023-08-03 17:02:29 +0800
  • 4709545c06 Merge remote-tracking branch 'duncan/api-stream-double-write-fix' into concedo_experimental Concedo 2023-08-03 12:52:43 +0800
  • ba2040d1df compile fix for ARM NEON Concedo 2023-08-03 12:52:06 +0800
  • 3fa6befdaf increase max free blocks Concedo 2023-08-03 10:50:16 +0800
  • 34e60be41a compile fix Concedo 2023-08-03 10:36:14 +0800
  • 8183159cf3 examples : generate JSON according to schema (#1887) Evan Jones 2023-08-02 22:05:44 -0400
  • 034894f590 support integer type and adjust usage text Evan Jones 2023-08-02 21:11:15 -0400
  • 9281c2801f fix: don't send headers twice when streaming duncannah 2023-08-02 23:42:43 +0200
  • f8f0d59765 Update Vim plugin Austin Mroz 2023-08-02 16:33:17 -0500
  • 44bbc85aaf Add missing barrier 0cc4m 2023-08-02 22:04:43 +0200
  • 468ea24fb4 CUDA: faster non k-quant mul_mat_q kernels (#2483) master-468ea24 Johannes Gäßler 2023-08-02 18:04:04 +0200
  • eec1ef8738 Fix missing abstract methods Keiichi TABATA 2023-08-03 00:14:30 +0900
  • d6154f5b3a CUDA: faster non k-quant mul_mat_q kernels JohannesGaessler 2023-08-01 09:22:31 +0200
  • b2eaec4261 updated lite Concedo 2023-08-02 22:54:17 +0800
  • 4f6b60c776 CUDA: Fix models with output size != 32000 (#2480) master-4f6b60c Johannes Gäßler 2023-08-02 16:48:10 +0200
  • 4c90fdc5cd Merge remote-tracking branch 'johannes/cuda-fix-output-size' into concedo_experimental Concedo 2023-08-02 22:37:41 +0800
  • 6fe92318f8 Merge branch 'master' into concedo_experimental Concedo 2023-08-02 22:36:00 +0800
  • df659f6bef cleaning up code a little bit with removing extra printfs needed during debug Aniket 2023-08-02 09:16:00 -0400
  • 48bea64d47 server : update index.html.hpp Jhen 2023-08-02 18:01:58 +0800
  • c5ba5efda2 convert-llama-h5-to-gguf.py : special tokens klosax 2023-08-02 11:26:07 +0200
  • e1e9b28547 convert-llama-h5-to-gguf.py : accumulate kv / ti + special tokens klosax 2023-08-02 11:15:33 +0200
  • cc1ae32d41 server : Fix regenerated prompt Jhen 2023-08-02 16:34:00 +0800
  • 6c798db041 added stream saving context data to file to avoid allocating unnecessary amounts of memory l3utterfly 2023-08-02 16:41:25 +0800
  • 8772c255ab make use_buff and get_buf_max_mem static mendax0110 2023-08-02 10:33:16 +0200
  • 750299726d server : adjust for dark/light mode Jhen 2023-08-02 16:29:15 +0800
  • a51d1a416c Merge branch 'ggerganov:master' into master m3ndax 2023-08-02 10:28:12 +0200
  • 1e64d511d5 CUDA: Fix models with output size != 32000 JohannesGaessler 2023-08-01 15:11:36 +0200
  • 220d931864 readme : add Aquila-7B model series to supported models (#2487) ldwang 2023-08-02 16:21:11 +0800
  • 368c41cb5b server : make n_probs max to 10 for easy scroll Jhen 2023-08-02 16:14:40 +0800
  • c3a65c4bbe gguf-util.h : update note M. Yusuf Sarıgöz 2023-08-02 11:16:23 +0300
  • 7f02fead8c server : handle bytes Jhen 2023-08-02 16:14:10 +0800
  • cf365fbc20 gguf : gguf counterpart of llama-util.h M. Yusuf Sarıgöz 2023-08-02 11:13:56 +0300
  • 81844fbcfd tests : Fix compilation warnings (Linux/GCC) (#2451) master-81844fb Eve 2023-08-02 04:06:19 -0400
  • d37be8dc9e server : implement Probabilites Jhen 2023-08-02 15:57:19 +0800
  • b9b6cd2f21 server : fix completion_probabilities undefined if not set n_probs Jhen 2023-08-02 15:53:10 +0800
  • 2a5bab4c9f server : add simple popover component Jhen 2023-08-02 15:46:49 +0800
  • 6a8b9c27d4 server : keep message data array & show in probabilites component Jhen 2023-08-02 15:43:33 +0800
  • 7862118886 server : add n_probs param in chat UI Jhen 2023-08-02 15:33:47 +0800
  • 803c2ff7bf Up Aquila-7B models in README.md ldwang 2023-08-02 15:18:36 +0800
  • 35ed27b1af Add Aquila-7B models in README.md ldwang 2023-08-02 14:48:03 +0800
  • 128b2f1e47 Merge branch 'ggerganov:master' into master ldwang 2023-08-02 14:19:50 +0800
  • a312193e18 readme : Add Chinese LLaMA-2 / Alpaca-2 to supported models (#2475) Yiming Cui 2023-08-02 14:18:31 +0800
  • 455b8d58b3 Merge 5dc35d3b59 into c574bddb36 asctime 2023-08-02 01:46:41 -0400
  • 8ee4cef747 Merge branch 'ggerganov:master' into master Richard Roberson 2023-08-01 23:27:04 -0600
  • 24dcf26b83 remove white spaces ymcui 2023-08-02 12:39:10 +0800
  • 73b6402cff Merge remote-tracking branch 'upstream/master' Upstream merge staviq 2023-08-02 04:56:19 +0200
  • 847f0af99c support for templates in browser LocalStorage staviq 2023-08-02 05:48:20 +0200
  • 712c2e90b1 Use conv.set_system_message from upstream Elsa 2023-08-02 09:33:23 +0800
  • 59484c6121 Merge remote-tracking branch 'origin/master' Elsa 2023-08-02 09:25:36 +0800
  • 98369f62c5 Add comment for llama_log_callback and replace remaining printf calls grahameth 2023-08-02 01:16:12 +0200
  • c857a33b19 Add back all the new lines in the logging strings grahameth 2023-08-02 00:50:31 +0200
  • e39e45493c Merge branch 'master' into logging_callback grahameth 2023-08-02 00:33:57 +0200
  • e23ba19da1 use initializer list for ggml_init_params netrunnereve 2023-08-01 17:53:14 -0400
  • 1b4f9c8eb9 convert-gptneox-h5-to-gguf.py : accumulate kv and ti + special tokens klosax 2023-08-01 23:40:50 +0200
  • 49380a23a3 gguf.py : accumulate kv and tensor info data + special tokens klosax 2023-08-01 23:37:48 +0200
  • ff1cb02397 constants.py : special tokens klosax 2023-08-01 23:17:21 +0200
  • 0081032087 Update Makefile Alex 2023-08-01 14:13:17 -0700
  • c638955bfa Fix tests 0cc4m 2023-08-01 21:52:57 +0200
  • 75788fe9b0 Add submission batching to mul_f32 0cc4m 2023-08-01 21:28:40 +0200
  • f33b3dc306 Merge branch 'ggerganov:master' into master Eve 2023-08-01 14:36:46 -0400
  • c574bddb36 fix a typo in examples/server/README.md (#2478) Bono Lv 2023-08-01 20:54:28 +0800
  • 36a36c32a3 Update gptneox-main.cpp klosax 2023-08-01 14:44:28 +0200
  • c77fabb1f9 gptneox-main.cpp : special tokens klosax 2023-08-01 14:32:53 +0200
  • e7a741695c convert-gptneox-h5-to-gguf.py : Special tokens klosax 2023-08-01 14:30:00 +0200
  • 556134cfe1 fix a typo Bono Lv 2023-08-01 20:00:01 +0800
  • 5b61ec41e0 Replaced the usage of ReadConsoleInputW and fgetwc with standard C++ input functions to make getchar32() work consistently in all environments, including cases when stdin is redirected. Kerim Büyükakyüz 2023-08-01 14:05:46 +0300
  • c58ffc92e5 fixed compile error Concedo 2023-08-01 18:28:49 +0800
  • 84b28c4282 Merge branch 'master' into concedo_experimental Concedo 2023-08-01 18:13:27 +0800
  • 46682e5cb3 added mmq launch flag Concedo 2023-08-01 17:57:13 +0800
  • 86aeb27734 server : Support dark mode (#2414) master-86aeb27 ebraminio 2023-08-01 01:56:23 -0700
  • 78edf98735 Setting correct format string for long unsigned. Jiri Podivin 2023-08-01 10:52:25 +0200
  • 1873ff586b metal : add gqa8 kernel to allow llama-2-70B on metal (#2459) Matteo Boschini 2023-08-01 09:43:12 +0200
  • 92e60dba8b port c tests to c++ netrunnereve 2023-07-31 20:59:43 -0400
  • 0131ac0484 add support for chinese llama-2 / alpaca-2 ymcui 2023-08-01 08:07:05 +0800
  • da4900e835 Update convert-llama-h5-to-gguf.py klosax 2023-07-31 23:04:03 +0200
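A listing with these columns (abbreviated hash, subject line, author name, author date with UTC offset) can be reproduced directly from a checkout. This is a minimal sketch, assuming the graph above was generated from an ordinary local git repository rather than a hosting UI:

```shell
# Each format field corresponds to a column in the listing above:
#   %h  - abbreviated commit hash
#   %s  - commit subject line
#   %an - author name
#   %ad - author date; --date=iso prints e.g. "2023-08-04 13:07:21 +0300"
git log --graph --pretty=format:'%h %s %an %ad' --date=iso
```

The `--graph` flag draws the branch/merge topology in the left margin, which is where the bullet markers in a viewer-exported graph come from.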