Commit Graph

  • f1f85de815 Split BPE and SentencePiece vocabularies goerch 2023-08-08 07:23:01 +0200
  • 2f9181f235 Trim whitespace from first 2 displayed tokens crasm 2023-08-07 20:11:47 -0400
  • cfdc3494e3 Always print num tokens crasm 2023-08-07 19:03:28 -0400
  • ca32203c88 CUDA: tuned mul_mat_q kernels JohannesGaessler 2023-08-06 23:07:54 +0200
  • 4fc3776ceb Add another test case Igor Pissolati 2023-08-07 18:30:24 -0300
  • 236c838d23 server : fix Probabilites not used if included empty str Jhen 2023-08-08 04:40:41 +0800
  • 6f7dabab44 Add simple test for special tokens Igor Pissolati 2023-08-07 17:31:13 -0300
  • d9791bb48b Add C API for adding special tokens Igor Pissolati 2023-08-07 17:30:12 -0300
  • 65559a23c8 Update gptneox-main.cpp klosax 2023-08-07 22:28:43 +0200
  • 38fbb74038 Merge branch 'master' into fix-2023 goerch 2023-08-07 21:24:22 +0200
  • 00e9a228dc Remove getbufoneline usage, Add input bind example. Austin Mroz 2023-08-07 12:36:17 -0500
  • f3c3b4b167 Add --rope-scale parameter (#2544) master-f3c3b4b klosax 2023-08-07 19:07:19 +0200
  • 3554080502 fixed blasbatchmul multiplier Concedo 2023-08-08 00:41:02 +0800
  • 28ad80b6e4 Merge branch 'master' into concedo_experimental Concedo 2023-08-08 00:34:10 +0800
  • 3c7d938d95 update lite, resize scratch buffers for blasbatch 2048 Concedo 2023-08-08 00:32:51 +0800
  • e1175b8314 README.md : Add info about using linear rope scaling klosax 2023-08-07 18:27:49 +0200
  • 9348aa4df9 Metal implementation Cebtenzzre 2023-07-21 17:10:57 -0400
  • 6aeb46b343 CUDA implementation Cebtenzzre 2023-07-18 22:28:27 -0400
  • 8dec38c35c llama: implement NTK-By-Parts (NTKv2) RoPE scaling Cebtenzzre 2023-07-17 20:07:15 -0400
  • 30b63f71bd multiline autocompletion, get rid of "^@" chaihahaha 2023-08-08 00:08:49 +0800
  • 099119f532 Fixes to rebase Igor Pissolati 2023-08-07 12:59:11 -0300
  • 8083ae347a gguf : minor stuff Georgi Gerganov 2023-08-07 19:02:18 +0300
  • b2417d0dfb common.cpp : Add --rope-scale parameter klosax 2023-08-07 17:57:36 +0200
  • f6d5fe3afc Use some tricks to eliminate the necessity for a new format Igor Pissolati 2023-06-22 11:29:51 -0300
  • 1da82c551f Merge branch 'master' into gguf Georgi Gerganov 2023-08-07 18:53:03 +0300
  • 41a2ed03e7 Ignore unusable json values Igor Pissolati 2023-06-20 19:20:53 -0300
  • ca1fc20508 Fix issues revealed by CI Igor Pissolati 2023-06-20 01:27:36 -0300
  • e468e75515 Remove trailing whitespaces Igor Pissolati 2023-06-19 23:03:58 -0300
  • 7f9d720105 Better loading of special tokens from jsons Igor Pissolati 2023-06-19 16:00:13 -0300
  • 0c14627438 Code cleanup Igor Pissolati 2023-06-19 14:52:57 -0300
  • 61a98bc30a Improve support for special tokens Igor Pissolati 2023-06-18 20:11:01 -0300
  • 4357e692ac gguf.py : use custom alignment if present klosax 2023-08-07 13:51:26 +0200
  • 93356bdb7a ggml : mul mat tweaks (#2372) master-93356bd Georgi Gerganov 2023-08-07 14:25:58 +0300
  • 60baff7c85 ggml : pad result of ggml_nbytes() master-60baff7 Georgi Gerganov 2023-08-07 14:24:42 +0300
  • 9082b5dfbf ggml : change params pointer (style change) (#2539) master-9082b5d Georgi Gerganov 2023-08-07 13:55:18 +0300
  • dd50b77d37 ggml : fix params pointer Georgi Gerganov 2023-08-07 13:26:56 +0300
  • 99d29c0094 ggml : sync (custom ops) (#2537) master-99d29c0 Georgi Gerganov 2023-08-07 13:20:09 +0300
  • ea73dace98 Fix when stop in request is null Elsa 2023-08-07 18:09:15 +0800
  • 6ae3702f69 Merge remote-tracking branch 'origin/master' Elsa 2023-08-07 18:08:17 +0800
  • b6524985df Include review comments Martin Krasser 2023-08-07 11:46:26 +0200
  • 5ddfbffbaf llama : replace (permute + reshape + view_1d) with (view_3d) Georgi Gerganov 2023-08-07 12:32:58 +0300
  • 9133e456d2 Merge branch 'master' into concedo_experimental Concedo 2023-08-07 17:33:42 +0800
  • cae6a847ad cuda free only for non mmq (+2 squashed commit) Concedo 2023-08-07 16:40:13 +0800
  • 9b643601e6 ggml : sync (custom ops) Georgi Gerganov 2023-08-07 11:52:32 +0300
  • 3d9a551816 Fixed mmap prefetch for GPU offloading (#2529) master-3d9a551 Johannes Gäßler 2023-08-07 10:09:40 +0200
  • f6f9896ac3 metal : fix out-of-bounds access + inc concurrency nodes (#2416) Georgi Gerganov 2023-08-07 10:52:57 +0300
  • 30ea0e1685 metal : increase concurrency nodes to 2*GGML_MAX_NODES Georgi Gerganov 2023-08-07 10:52:13 +0300
  • 9f16a4c4ef switch to upstream implementation of pool malloc Concedo 2023-08-07 15:16:37 +0800
  • 34a14b28ff [Makefile] Move ARM CFLAGS before compilation (#2536) master-34a14b2 GiviMAD 2023-08-06 23:21:46 -0700
  • 7297128db8 [Zig] Rewrite build for Zig 0.11 (#2514) Henri Vasserman 2023-08-07 08:35:53 +0300
  • e660943d3d Add further ops 0cc4m 2023-08-07 06:02:57 +0200
  • 6659652c9f lower actual temp used when temp=0 Concedo 2023-08-07 11:05:06 +0800
  • 0e41b94f40 improve detection for 70B. Concedo 2023-08-07 10:43:06 +0800
  • fb44d72a78 Merge remote-tracking branch 'johannes/cuda-fix-mmap-prefetch' into concedo_experimental Concedo 2023-08-07 10:17:43 +0800
  • 559c0e2d1f updated lite again, fix for wi Concedo 2023-08-07 10:15:20 +0800
  • 0b8c9efe8f Refactor makefile fix build with CLBlast in arm Miguel Álvarez 2023-08-07 01:01:30 +0200
  • 2bf422eafd add train function using automatic gradient checkpointing backward pass and allocator xaedes 2023-08-06 23:07:57 +0200
  • d9024df759 Fixed mmap prefetch for GPU offloading JohannesGaessler 2023-08-06 10:18:05 +0200
  • 68365e2291 one can now specify where ggml-metal.metal file is with en variable GGML_METAL_PATH Marc 2023-08-06 18:20:59 +0200
  • d43af4b543 Merge branch 'master' into pr-train-mem-usage-improvements xaedes 2023-08-06 17:30:17 +0200
  • d442888626 Merge branch 'master' into concedo_experimental Concedo 2023-08-06 22:47:33 +0800
  • 198cc826fc updated lite Concedo 2023-08-06 22:19:18 +0800
  • 5d52192f73 Remove inactive code. goerch 2023-08-06 13:51:26 +0200
  • bb6a58d0c3 Simplifying an expression. goerch 2023-08-06 13:35:27 +0200
  • 19e950f051 Adding support for Aquila (GPT2?) tokenizer. goerch 2023-08-06 13:24:05 +0200
  • e99416cdfe blasbatchsize Concedo 2023-08-06 17:47:59 +0800
  • bcfdd0e662 fixed bbs -1 and allow bbs = 2048 Concedo 2023-08-06 17:47:05 +0800
  • 86c3219895 console : fix issue related to Windows 11 PowerShell console mode persistence (#2521) master-86c3219 DannyDaemonic 2023-08-05 23:49:34 -0700
  • 2e8265ae17 convert.py : add missing abstract methods for quantized data (#2491) Keiichi Tabata 2023-08-06 15:34:05 +0900
  • 1b5442923a Fix tokenizer regression in convert.py and improve CPP interface for llama_tokenize goerch 2023-08-06 07:47:55 +0200
  • d9f75f3ccf Allow passing grammar to completion endpoint Martin Krasser 2023-08-05 14:05:15 +0200
  • 0480362f12 remove from llama_context_params netrunnereve 2023-08-06 00:44:29 -0400
  • ce6d86ec41 fix netrunnereve 2023-08-06 00:40:13 -0400
  • 215e2f21d0 only activate pp_threads for main for now netrunnereve 2023-08-06 00:22:14 -0400
  • 590feeac1d add printout of pp_threads netrunnereve 2023-08-06 00:13:02 -0400
  • 30a0e4ccba Fixing function ordering issue goerch 2023-08-06 05:55:14 +0200
  • 1de711d4f8 builds fine netrunnereve 2023-08-05 23:45:58 -0400
  • ccd2592782 Add further missing barrier 0cc4m 2023-08-06 05:25:33 +0200
  • 5f022185a1 test pp_threads netrunnereve 2023-08-05 22:39:44 -0400
  • f514d1b306 CUDA: faster k-quant mul_mat_q kernels (#2525) master-f514d1b Johannes Gäßler 2023-08-05 18:20:44 +0200
  • fe6a8f80ff CUDA: faster k-quant mul_mat_q kernels JohannesGaessler 2023-08-02 15:54:53 +0200
  • b139ca4e94 server: add --numa support Cheng Shao 2023-08-05 12:36:25 +0000
  • c760fd2452 Fix issue related to Windows 11 PowerShell console mode persistence Danny Daemonic 2023-08-04 21:27:51 -0700
  • 04b6f2ce20 server : convert prob to percentage + show original value as div title Jhen 2023-08-05 07:11:28 +0800
  • 3e1e86d89c Merge branch 'master' into server-probs Jhen 2023-08-05 07:10:14 +0800
  • 332311234a fix firefox autoscroll (#2519) master-3323112 Jonas Wunderlich 2023-08-04 20:16:11 +0000
  • dff68fd968 fix firefox autoscroll Jonas Wunderlich 2023-08-04 21:54:43 +0200
  • 182af739c4 server: regenerate completion.js.hpp (#2515) master-182af73 Cebtenzzre 2023-08-04 15:00:57 -0400
  • 6fc2847aaf simplify object var names Henri Vasserman 2023-08-04 21:43:15 +0300
  • c1cb4c11be Disable LTO on Windows. Henri Vasserman 2023-08-04 21:19:37 +0300
  • 555d132d2a server: regenerate completion.js.hpp Cebtenzzre 2023-08-04 12:16:18 -0400
  • 4329d1acb0 CUDA: use min compute capability of GPUs actually used (#2506) master-4329d1a Cebtenzzre 2023-08-04 11:35:22 -0400
  • 02f9d96a86 CUDA: check if event is NULL before cudaStreamWaitEvent (#2505) master-02f9d96 Cebtenzzre 2023-08-04 11:34:32 -0400
  • 3498588e0f Add --simple-io option for subprocesses and break out console.h and cpp (#1558) master-3498588 DannyDaemonic 2023-08-04 08:20:12 -0700
  • a36255062f zig build fixes Henri Vasserman 2023-08-04 18:16:24 +0300
  • 18bb0ab127 up ver, support 16k ctx Concedo 2023-08-04 21:47:17 +0800
  • 5f631c2679 Fixing race condition in server and partial stream handling in frontend. (#2391) master-5f631c2 Stephen Nichols 2023-08-04 06:37:24 -0500
  • 415e99fec2 Stream save llama context data to file instead of allocating entire buffer upfront (#2488) master-415e99f l3utterfly 2023-08-04 19:29:52 +0800
  • d6360ade08 Apply code review suggestions l3utterfly 2023-08-04 19:15:22 +0800
  • e74d42dfff Apply suggestions from code review l3utterfly 2023-08-04 19:14:26 +0800