Commit Graph

  • 91b8be0877 refactor probs render & make pColor transparent if not found Jhen 2023-08-22 06:41:42 +0800
  • 423db742e7 Merge 'origin/master' into hipblas Henri Vasserman 2023-08-22 01:03:44 +0300
  • 5bc418fa18 llama-bench : minor fixes slaren 2023-08-22 00:00:24 +0200
  • 76515b7574 Merge remote-tracking branch 'origin/master' into cuda-graph-allocr slaren 2023-08-21 23:50:24 +0200
  • 2d86b2e219 Add --config argument Pontus Mårdnäs 2023-08-21 23:46:56 +0200
  • 2bfb39ac1d ggml-cuda : use graph allocator slaren 2023-08-20 21:31:53 +0200
  • 2932a5516a metal: add missing barriers for mul-mat lshzh-ww 2023-08-21 16:43:47 -0400
  • c8dba409e6 py : remove obsolete script Georgi Gerganov 2023-08-21 23:40:22 +0300
  • 6381d4e110 gguf : new file format with flexible meta data (beta) (#2398) master-6381d4e Georgi Gerganov 2023-08-21 23:07:43 +0300
  • 66a66a05a8 readme : add notice about new file format gguf Georgi Gerganov 2023-08-21 22:11:00 +0300
  • 811f653f95 py : cosmetics Georgi Gerganov 2023-08-21 20:40:08 +0300
  • 49c25cce19 tests : use new tokenizer type API (#2692) goerch 2023-08-21 19:11:14 +0200
  • 11e3806be4 Use token type API in test-tokenizer-1.cpp goerch 2023-08-21 19:04:03 +0200
  • d3f5fbef6c main : flush stdout Georgi Gerganov 2023-08-21 19:52:51 +0300
  • a856685648 Merge branch 'gguf' of https://github.com/goerch/llama.cpp into gguf goerch 2023-08-21 18:48:23 +0200
  • 0b53b8b08d llama : add API for token type Georgi Gerganov 2023-08-21 19:35:31 +0300
  • 8d177eddeb llama : improve token type support (#2668) goerch 2023-08-21 17:56:02 +0200
  • e06cbcee73 gguf : add Python script to convert GGMLv3 LLaMA models to GGUF (#2682) Kerfuffle 2023-08-21 08:45:52 -0600
  • 054776049e Set default value for gguf add_tensor raw_shape KW arg KerfuffleV2 2023-08-21 08:27:31 -0600
  • 6490ff7198 py : fix whitespace Georgi Gerganov 2023-08-21 16:42:27 +0300
  • e3da126f2a main : inject reverse prompt after EOS + update examples/chat.sh Georgi Gerganov 2023-08-21 16:41:27 +0300
  • 1e7a0092dd Merge branch 'master' into gguf Georgi Gerganov 2023-08-21 16:27:51 +0300
  • 2177142b49 Merge e9c17039db into dadbed99e6 Lionel Cheng 2023-08-21 08:07:30 -0500
  • 8af1991e2a main : restore old EOS behavior in interactive mode Georgi Gerganov 2023-08-21 15:40:51 +0300
  • 7a7d1ba68a convert-llama-hf-to-gguf.py : rope scale fix klosax 2023-08-21 14:12:02 +0200
  • 9070e330ab convert-llama-7b-pth-to-gguf.py : rope scale fix klosax 2023-08-21 14:11:22 +0200
  • c082b9fa0b llama.cpp : use rope scale kv klosax 2023-08-21 13:30:03 +0200
  • dc1f051013 convert-llama-7b-pth-to-gguf.py : rope scale and added tokens klosax 2023-08-21 13:27:53 +0200
  • 5f6ff387ca convert-llama-hf-to-gguf.py : rope scale and added tokens klosax 2023-08-21 13:25:14 +0200
  • 6a69a693cb gguf.py : fix rope scale kv klosax 2023-08-21 13:23:10 +0200
  • dadbed99e6 metal : fix synchronization in new matrix multiplication kernel (#2686) Shouzheng Liu 2023-08-21 06:59:29 -0400
  • f68aef5473 Fix wrong type size for Q8_K KerfuffleV2 2023-08-21 04:19:17 -0600
  • 996aaca1d4 Use correct params override var name KerfuffleV2 2023-08-20 16:06:23 -0600
  • e854cd7dc6 Allow overriding vocab and hyperparams from original model metadata KerfuffleV2 2023-08-20 15:58:02 -0600
  • f56db2164a Allow specifying name and description for output GGUF KerfuffleV2 2023-08-20 14:24:26 -0600
  • 80912f0741 Improve help text, expand warning KerfuffleV2 2023-08-20 13:15:01 -0600
  • ff25134390 Add description to converted GGUF files KerfuffleV2 2023-08-20 13:03:19 -0600
  • 8083e20d19 More vocab conversion fixes KerfuffleV2 2023-08-20 11:23:13 -0600
  • 08959c88c2 Fix vocab space conversion logic KerfuffleV2 2023-08-20 10:36:57 -0600
  • f7e61fd1a9 Cleanups, better output during conversion KerfuffleV2 2023-08-20 10:26:43 -0600
  • 8afc1ef312 First pass at converting GGMLv3 LLaMA models to GGUF KerfuffleV2 2023-08-20 09:34:48 -0600
  • cb1c0727bd HellaSwag: split token evaluation into batches if needed (#2681) master-cb1c072 Kawrakow 2023-08-21 11:11:31 +0300
  • 1f373e349e server : do not overwrite 404 if status is 500 from exception_handler jhen 2023-08-21 16:02:32 +0800
  • 3f9fa77fe0 server : fallback to default if client param is null jhen 2023-08-21 15:37:51 +0800
  • af1ea58b60 fix content of format_final_response Jhen 2023-08-21 13:42:06 +0800
  • 1bef2dcf87 fix typo Jhen 2023-08-21 13:33:33 +0800
  • 14ac9dadc4 metal: fix-test lshzh-ww 2023-08-21 01:27:37 -0400
  • 1e9fe8a954 always send partial response for get correct probs of last to_send Jhen 2023-08-21 13:26:23 +0800
  • 371cc14815 remove unused function Jhen 2023-08-21 13:06:59 +0800
  • b7ddf04a26 correct probabilities usage Jhen 2023-08-21 12:47:39 +0800
  • a7042c187f revert unnecessary change Jhen 2023-08-21 11:49:48 +0800
  • 54f9f3c107 use final response to show probabilities on stop Jhen 2023-08-21 11:49:04 +0800
  • e4c04c242d fix incorrect prob convert if the str is already a known token Jhen 2023-08-21 11:48:51 +0800
  • c818c405e0 convert-llama-hf-to-gguf.py : fix attn_q permute klosax 2023-08-21 04:42:09 +0200
  • 58bde5c5c1 Delete convert-permute-debug.py klosax 2023-08-21 04:35:06 +0200
  • 287db51015 Delete convert-permute-debug-master.py klosax 2023-08-21 04:34:39 +0200
  • d5c8fcfd8a convert.py : 70b model working (change attn_q permute) klosax 2023-08-21 04:33:33 +0200
  • 7de7cb4bd8 convert-permute-debug.py : change permute type of attn_q klosax 2023-08-21 04:06:59 +0200
  • 4f92488dd6 convert-permute-debug-master.py : permute debug for master klosax 2023-08-21 03:44:16 +0200
  • 5a02b9625a convert-permute-debug.py : permute debug print klosax 2023-08-21 03:24:29 +0200
  • 8b4106ae33 also save latest finetune output with ITERATION="LATEST" and print where files are saved xaedes 2023-08-21 02:24:25 +0200
  • 11e863651d generate index.html.hpp Jhen 2023-08-21 06:39:21 +0800
  • 25e6747a56 Merge branch 'master' into server-probs Jhen 2023-08-21 06:39:00 +0800
  • 7ec9c22249 skip empty array or byte pair (> 1) in Probabilities Jhen 2023-08-21 06:38:22 +0800
  • 44dd9ed287 Improve commentary goerch 2023-08-21 00:28:31 +0200
  • 6586487e62 Restored accidentally removed comment goerch 2023-08-21 00:13:25 +0200
  • dea1e4c03e Merge branch 'gguf' of https://github.com/ggerganov/llama.cpp into gguf goerch 2023-08-21 00:12:47 +0200
  • 9e232f0234 ggml : move all type info to ggml_type_traits (#2663) master-9e232f0 slaren 2023-08-20 22:17:53 +0200
  • 27c24ffa1b add option to save finetune output every N iterations xaedes 2023-08-20 20:16:46 +0200
  • d61ed6b431 mixing multiple LORA adapters is now possible xaedes 2023-08-20 18:36:20 +0200
  • f838faa874 convert-llama-7b-pth-to-gguf.py : special tokens klosax 2023-08-20 16:56:48 +0200
  • 76b46627e2 convert-llama-hf-to-gguf.py : special tokens klosax 2023-08-20 16:54:42 +0200
  • c9d9b05281 HellaSwag: split token evaluation into batches if needed Iwan Kawrakow 2023-08-20 17:41:13 +0300
  • 5e9ff54a67 More efficient Hellaswag implementation (#2677) master-5e9ff54 Kawrakow 2023-08-20 16:44:46 +0300
  • 05ef02aec3 More efficient Hellaswag implementation Iwan Kawrakow 2023-08-20 10:05:29 +0300
  • 3d8e255514 make scripts executable Cebtenzzre 2023-08-18 17:49:25 -0400
  • 01046648cf ggml: create thread pool lazily JohannesGaessler 2023-08-19 19:54:56 +0200
  • 16ab5f1b18 ggml: use __CUDACC__ to recognise nvcc compiler Kylin 2023-08-20 01:57:24 +0800
  • 28b8c265eb cmpnct_gpt2bpe.hpp : cleanup gguf-28b8c26 klosax 2023-08-19 18:26:51 +0200
  • 8a25bd41b3 ggml: support CUDA's half type for aarch64 in ggml_fp16_t definition (#1455) Kylin 2023-08-20 00:07:50 +0800
  • 5ae5d2bd5b Remove unnecessary scalar layout extension 0cc4m 2023-08-19 17:53:48 +0200
  • 2faad208ae CUDA: fix __builtin_assume for CUDA < 11.2 JohannesGaessler 2023-08-19 17:17:55 +0200
  • aea173f5af More sentencepiece compatibility by eliminating magic numbers goerch 2023-08-19 16:50:29 +0200
  • c0a1269b7f Update examples/server/README.md klosax 2023-08-19 15:27:37 +0200
  • da837401cd Exclude platform dependent tests goerch 2023-08-19 14:50:32 +0200
  • dc65fb3044 Merge branch 'gguf' of https://github.com/goerch/llama.cpp into gguf goerch 2023-08-19 14:40:21 +0200
  • 370a95f524 Improve token type support goerch 2023-08-19 14:39:33 +0200
  • 21d88645fc Merge branch 'gguf' of https://github.com/ggerganov/llama.cpp into gguf goerch 2023-08-19 13:37:04 +0200
  • c16ea8e193 Merge branch 'ggerganov:gguf' into gguf goerch 2023-08-19 13:36:05 +0200
  • 6a2e520095 cmpnct_gpt2bpe.hpp : remove non-general stuff klosax 2023-08-19 13:19:02 +0200
  • 8945d47f52 gptneox-main.cpp : fixes klosax 2023-08-19 12:09:24 +0200
  • 781bf2481f falcon-main.cpp : fixes klosax 2023-08-19 12:08:17 +0200
  • dadf098b5a cmpnct_gpt2bpe.hpp : fixes klosax 2023-08-19 12:06:22 +0200
  • b3a7a2b486 convert-falcon-hf-to-gguf.py : add tensor data layout klosax 2023-08-19 12:05:11 +0200
  • 12e4284c31 Fix CUDA softmax by subtracting max value before exp lijiahao 2023-08-19 11:55:01 +0800
  • 946e3138a4 ggml : move all type info to ggml_type_traits slaren 2023-08-19 02:54:25 +0200
  • 2c8055b65b convert-falcon-hf-to-gguf.py : update ref klosax 2023-08-19 01:08:39 +0200
  • 1d80eea574 falcon-main.cpp : fix for falcon 40b klosax 2023-08-19 01:03:37 +0200
  • bd5a57901b gguf.py : fix for falcon 40b klosax 2023-08-19 01:01:52 +0200
  • 281d6d1105 convert-llama-hf-to-gguf.py : remove extra kv klosax 2023-08-19 00:32:56 +0200