Commit Graph

  • 9a08eaf3c4 Another speed gain for Q4_0 and Q4_1 on Metal (#2375) Kawrakow 2023-07-25 13:48:29 +0300
  • 129d844c87 Fix Q4_K and Q5_K for QK_K = 64 on CUDA (#2359) master-129d844 Kawrakow 2023-07-25 13:48:04 +0300
  • 450a7c76de ggml : mul_mat threads yield Georgi Gerganov 2023-07-25 13:26:32 +0300
  • 66e4b5141e fix horde worker host and client agent Concedo 2023-07-25 18:18:41 +0800
  • d5512b782b server: add rms_norm_eps parameter (#2380) master-d5512b7 slaren 2023-07-25 11:36:17 +0200
  • 2076a9b3d9 ggml : mul_mat block tiling attempt Georgi Gerganov 2023-07-25 11:34:32 +0300
  • c798308e3a [Server] Escape HTML in webchat (#2368) master-c798308 Henri Vasserman 2023-07-25 10:27:34 +0300
  • 69554cee9e Add fallback for devices only supporting one DescriptorSet per DescriptorPool 0cc4m 2023-07-25 07:01:02 +0200
  • b12859937f Merge remote-tracking branch 'origin/master' into prefix-bos Xiao-Yong Jin 2023-07-24 21:51:48 -0500
  • 11d2405486 examples/common: move input_prefix_bos to other bools Xiao-Yong Jin 2023-07-24 21:48:44 -0500
  • bba27edadc Merge remote-tracking branch 'origin/master' into prompt-array Xiao-Yong Jin 2023-07-24 21:40:19 -0500
  • 97deb25398 server: use tokenizePrompt(json) and default "" if empty prompt Xiao-Yong Jin 2023-07-24 21:39:35 -0500
  • f4519830ed first crack at lamma2.c model conversion Aniket 2023-07-24 22:29:30 -0400
  • 010a3cbe81 added Dockerfile for server John Jones 2023-07-24 20:34:48 -0400
  • 4e580284c0 Allow parallel execution of kernels, parallelize third and fourth dimension calls 0cc4m 2023-07-24 22:51:19 +0200
  • 3d4359e21b server: add rms_norm_eps parameter slaren 2023-07-24 22:45:35 +0200
  • 48c27a9ce1 hotfix for 70b broadcast issues Concedo 2023-07-25 01:32:47 +0800
  • 9731682ad6 Update Makefile (#345) Александр Герман 2023-07-24 21:21:32 +0500
  • 7f98561243 Have N_DST, etc., be template parameters Iwan Kawrakow 2023-07-24 19:05:44 +0300
  • 6f489a77dd metal: don't call find_concurrency automatically. lshzh-ww 2023-07-24 11:59:12 -0400
  • 41c674161f make rms_norm_eps a parameter (#2374) master-41c6741 slaren 2023-07-24 17:57:12 +0200
  • b759afaa2a Another speed gain for Q4_0 and Q4_1 on Metal Iwan Kawrakow 2023-07-24 18:32:41 +0300
  • 27d0fcc344 Merge remote-tracking branch 'origin/master' into webchat-escape-html Henri Vasserman 2023-07-24 18:12:52 +0300
  • d8d2449bfb better label (+1 squashed commits) Concedo 2023-07-24 22:46:53 +0800
  • b3f138d058 Chat UI extras (#2366) master-b3f138d Aarni Koskela 2023-07-24 17:54:22 +0300
  • 3855ea36a4 use scientific notation for eps param in the help slaren 2023-07-24 16:43:41 +0200
  • 7555dae4cc ditch advanced subparsers Concedo 2023-07-24 22:40:36 +0800
  • 8d7cfb42b7 fix baby llama, test-grad0 slaren 2023-07-24 16:38:37 +0200
  • 24e53a1466 add rms_norm_eps to command line slaren 2023-07-24 16:34:38 +0200
  • 9fe47c747f make rms_norm_eps a parameter slaren 2023-07-24 16:18:40 +0200
  • a2eb57e796 ggml : alternative thread distribution for mul_mat Georgi Gerganov 2023-07-24 16:35:34 +0300
  • 0822d27613 ggml : mul mat wip Georgi Gerganov 2023-07-24 15:42:59 +0300
  • 8a9b40840b Merge branch 'master' into concedo_experimental Concedo 2023-07-24 20:51:28 +0800
  • 6d71e100fe buff buffers Concedo 2023-07-24 20:33:17 +0800
  • 2af540d3e1 Merge ca2467d12c into 5b2b2dc6ae Henri Vasserman 2023-07-24 13:53:54 +0200
  • 5b2b2dc6ae ggml : sync (unary ops refactor, static-correctness) (#2370) master-5b2b2dc Georgi Gerganov 2023-07-24 14:46:21 +0300
  • 8253a534eb Fix test goerch 2023-07-24 13:38:25 +0200
  • a3d880382c add amp Henri Vasserman 2023-07-24 14:27:30 +0300
  • 68c9fca9c2 tests : remove unnecessary funcs Georgi Gerganov 2023-07-24 14:25:35 +0300
  • 6c5d496e69 Relax contiguous contraints in activation function lijiahao 2023-07-24 18:23:27 +0800
  • fe7508c408 Fix review remarks. goerch 2023-07-24 13:21:24 +0200
  • ca2467d12c chat css Henri Vasserman 2023-07-24 14:09:05 +0300
  • f77972f9af Merge remote-tracking branch 'origin/master' into server-cfg Henri Vasserman 2023-07-24 14:08:40 +0300
  • 971c689178 ggml : sync (unary ops, tests) Georgi Gerganov 2023-07-24 13:52:54 +0300
  • 825e34baa3 default horde name and better handling for horde (+3 squashed commit) Concedo 2023-07-24 17:37:26 +0800
  • fd2849f018 server: embetter mirostat fields Aarni Koskela 2023-07-24 13:15:12 +0300
  • 42f70cb2f6 Fix scalar version of Q5_K when QK_K = 64 (#2362) master-42f70cb Kawrakow 2023-07-24 12:55:02 +0300
  • 6d89bd9c6a escape HTML in webchat Henri Vasserman 2023-07-24 12:52:53 +0300
  • cde52d6a63 Merge 'origin/master' into hipblas Henri Vasserman 2023-07-24 12:22:58 +0300
  • 8e8054ad83 Add rocblas to build files Henri Vasserman 2023-07-24 12:20:49 +0300
  • c7136f03d9 added support for tensor_split parameter as an advanced parameter. Concedo 2023-07-24 17:16:19 +0800
  • 1dbdd2310c server: expose remaining generation params, for the adventurous Aarni Koskela 2023-07-24 12:11:53 +0300
  • 1f6294dc44 Fix multi GPU on multiple amd architectures with rocblas_initialize() (#5) YellowRoseCx 2023-07-24 03:52:01 -0500
  • a1a6b68dca server: expose all currently configured generation params in UI Aarni Koskela 2023-07-24 11:42:07 +0300
  • a37a543e67 server: tighten settings layout a little Aarni Koskela 2023-07-24 11:31:28 +0300
  • 90fafe9439 makefile: correct deps for server Aarni Koskela 2023-07-24 11:24:06 +0300
  • 39c9a3b553 Added test cases goerch 2023-07-24 10:20:25 +0200
  • a0d28b250c Remove comment goerch 2023-07-24 09:48:51 +0200
  • 281a4b4f27 Fixing tests goerch 2023-07-24 09:45:20 +0200
  • 66328fcd80 Merge branch 'master' into concedo_experimental Concedo 2023-07-24 15:44:26 +0800
  • b32538da12 Fix scalar version of Q5_K when QK_K = 64 Iwan Kawrakow 2023-07-24 10:43:54 +0300
  • 94499dba25 added support for 70b llama 2 Concedo 2023-07-24 15:20:18 +0800
  • 81fae1dc8f Fixing llama_token_to_str for the different sentence_piece token types goerch 2023-07-24 09:05:21 +0200
  • e6dd6bc567 Very slightly better Q5_K bit fiddling Iwan Kawrakow 2023-07-24 09:31:07 +0300
  • b97a505c5d Fix C linkage for llama_token_to_str goerch 2023-07-24 08:05:16 +0200
  • 7f96ff9a1b Fix Q4_K and Q5_K for QK_K = 64 Iwan Kawrakow 2023-07-24 08:59:18 +0300
  • 5d0dabe19c metal: concurrently dispatch commands lshzh-ww 2023-07-24 01:30:01 -0400
  • 7846dbdbc9 Add C grammar Ben Siraphob 2023-07-24 11:22:31 +0700
  • 993ba3b026 Merge branch 'master' into concedo_experimental Concedo 2023-07-24 11:59:00 +0800
  • 84e09a7d8b llama : add grammar-based sampling (#1773) master-84e09a7 Evan Jones 2023-07-23 23:58:10 -0400
  • 280abaf029 added stop reason in the perf endpoint Concedo 2023-07-24 11:55:35 +0800
  • ffc73fe35d we don't need to remove f16c in windows Eve 2023-07-23 23:44:49 -0400
  • 2582c31878 noavx build and test Eve 2023-07-23 23:41:48 -0400
  • 4cd9711dac add warning message if EOS is disabled Evan Jones 2023-07-23 22:58:56 -0400
  • f7f1d266e3 update help text Evan Jones 2023-07-23 21:45:48 -0400
  • 8145bca2c9 Merge remote-tracking branch 'upstream/master' into grammar Evan Jones 2023-07-23 23:14:54 -0400
  • 6542a035f9 use a hash table instead slaren 2023-07-24 01:18:28 +0200
  • e371b716ca ggml_tensor : use 1 bit per flag slaren 2023-07-23 18:04:33 +0200
  • 261fdaae80 improve graph build time slaren 2023-07-22 21:48:57 +0200
  • 2f9cf974a0 Some more Q4_K and Q5_K speedup on CUDA (#2346) master-2f9cf97 Kawrakow 2023-07-24 00:19:47 +0300
  • 4f06592cc6 Add gqa parameter support to the server (#2351) master-4f06592 IgnacioFDM 2023-07-23 17:31:17 -0300
  • f3a92117a7 Add some comments to satisfy PR reviewer Iwan Kawrakow 2023-07-23 23:26:03 +0300
  • f1fc2db772 Fix missing static grahameth 2023-07-23 22:24:25 +0200
  • a3c6a8b698 Fix log calls after merge grahameth 2023-07-23 22:21:39 +0200
  • 152b633691 Merge branch 'master' into logging_callback grahameth 2023-07-23 22:14:05 +0200
  • 9f894cca30 Merge branch 'ggerganov:master' into master m3ndax 2023-07-23 20:34:17 +0200
  • ef75c45a0a build number line break removal Hesen Peng 2023-07-23 11:01:28 -0700
  • 7a2d2dd3ab Fix enum and initialize g_state Helmut 2023-07-23 19:27:03 +0200
  • 33b4202403 Convert remaining fprintf(stderr, ...) calls to use new macros. Helmut 2023-07-23 19:16:43 +0200
  • a0c5113766 Change help from stderr to stdout Ignacio DM 2023-07-23 14:16:21 -0300
  • 161c2c69f2 Add gqa parameter support to the server Ignacio DM 2023-07-23 13:49:26 -0300
  • 6baa4ead58 Address PR comments Iwan Kawrakow 2023-07-23 20:13:23 +0300
  • 21d1e8bab2 Remove model_for_logging parameter (not needed anymore) Helmut 2023-07-23 19:05:03 +0200
  • fc5586677e Turn log level into enum and some minor changes. Helmut 2023-07-23 19:01:37 +0200
  • 671ec2c588 Add struct llama_state for global variables and move log_callback there Helmut 2023-07-23 18:52:09 +0200
  • dba8369a39 One more test case... goerch 2023-07-23 18:46:29 +0200
  • e6b1a5003e Fix for #2310 goerch 2023-07-23 18:17:32 +0200
  • 70d26ac388 Fix __dp4a documentation (#2348) Johannes Gäßler 2023-07-23 17:49:06 +0200
  • 95daa52875 fix line breaking Hesen Peng 2023-07-23 08:38:55 -0700
  • 309a58b3cf Fix __dp4a documentation JohannesGaessler 2023-07-23 17:17:59 +0200