Commit Graph

  • b11ac01f6b rewrite: no longer consider backward compitability; plan and make_plan mqy 2023-07-03 16:00:47 +0800
  • a1e7c69228 ggml_graph_compute: deprecate using ggml_context, try resolve issue #287 mqy 2023-06-27 05:47:08 +0800
  • 38fbd4001e Merge remote-tracking branch 'upstream/master' into grammar Evan Jones 2023-07-05 22:07:51 -0400
  • b0ac2bcaab Delete codacy.yml m3ndax 2023-07-05 23:22:27 +0200
  • 413d2bdcaf Merge branch 'ggerganov:master' into master m3ndax 2023-07-05 23:20:51 +0200
  • de71010c4a Merge da7d2f9587 into 31cfbb1013 kiltyj 2023-07-05 17:14:53 -0400
  • 31cfbb1013 Expose generation timings from server & update completions.js (#2116) master-31cfbb1 Tobias Lütke 2023-07-05 16:51:13 -0400
  • 869ae76764 Disable glslc optimization 0cc4m 2023-07-05 22:23:07 +0200
  • 26cc1bd7a2 llama : uniform variable names + struct init llama_server_completions Georgi Gerganov 2023-07-05 23:15:54 +0300
  • 244939029d Add WIP warp tile mat mul shaders 0cc4m 2023-07-05 22:17:58 +0200
  • 4fe95c6985 update readme, update baked includes Tobias Lütke 2023-07-05 15:20:40 -0400
  • 30d973dc42 export llama_timings as struct and expose them in server Tobias Lütke 2023-07-04 21:52:04 -0400
  • a76ce02a6c use javascript generators as much cleaner API Tobias Lütke 2023-07-05 15:03:01 -0400
  • ff6e39f138 use javascript generators as much cleaner API llama_server_timings Tobias Lütke 2023-07-05 15:03:01 -0400
  • e6d1c4fe32 Fix buidling with Intel MKL but ask for "cblas.h" issue clyang 2023-07-06 02:20:43 +0800
  • 983b555e9d Update Server Instructions (#2113) Jesse Jojo Johnson 2023-07-05 18:03:19 +0000
  • ec326d350c ggml : fix bug introduced in #1237 master-ec326d3 Georgi Gerganov 2023-07-05 20:44:11 +0300
  • 1b6efeab82 tests : fix test-grad0 master-1b6efea Georgi Gerganov 2023-07-05 20:20:05 +0300
  • 3a47811cbe Fix error from editorconfig checker Jesse Johnson 2023-07-05 16:26:53 +0000
  • 1b107b8550 ggml : generalize quantize_fns for simpler FP16 handling (#1237) master-1b107b8 Stephan Walter 2023-07-05 16:13:06 +0000
  • bd08fa9522 Fix duplicate text Jesse Johnson 2023-07-05 16:03:39 +0000
  • 775a299a4b Remove duplicate OAI instructions Jesse Johnson 2023-07-05 16:00:22 +0000
  • 1dd61d2aa3 Merge branch 'master' into update-server-instructions Jesse Johnson 2023-07-05 15:53:54 +0000
  • fa4a48cacc Update server README Jesse Johnson 2023-07-05 15:42:26 +0000
  • 745e89ea24 ci : disable FMA for mac os actions Georgi Gerganov 2023-07-05 18:35:13 +0300
  • f46db27ea0 ci : disable FMA on Mac OS test-mac-os-ci Georgi Gerganov 2023-07-05 18:29:08 +0300
  • 8567c76b53 Update server instructions for web front end (#2103) Jesse Jojo Johnson 2023-07-05 15:13:35 +0000
  • 83efc92053 ci : print mac os "sysctl -a" Georgi Gerganov 2023-07-05 18:06:32 +0300
  • 44d214c040 Only warn if __STDC_IEC_559__ isn't defined niansa 2023-07-05 14:34:18 +0200
  • 77ebe46966 Fixed case order in ggml_vk_graph_compute niansa 2023-07-05 14:21:16 +0200
  • 924dd22fd3 Quantized dot products for CUDA mul mat vec (#2067) master-924dd22 Johannes Gäßler 2023-07-05 14:19:42 +0200
  • 856b7589e9 Optimized ggml_vk_mul_mat_f16 argument count niansa 2023-07-05 13:34:01 +0200
  • 6be93e6071 Ported mat mul from Metal niansa 2023-07-05 13:28:40 +0200
  • 051c70dcd5 llama: Don't double count the sampling time (#2107) master-051c70d Howard Su 2023-07-05 18:31:23 +0800
  • ea79e549f0 fixed refusing to quantize some models Concedo 2023-07-05 17:29:35 +0800
  • 2fc8249ba3 Simple mul_mat_f16 for speed and removal of unused mul_mat_f32 niansa 2023-07-05 10:59:38 +0200
  • 9e4475f5cf Fixed OpenCL offloading prints (#2082) master-9e4475f Johannes Gäßler 2023-07-05 08:58:05 +0200
  • 681f1801fe Quantized dot products for CUDA mul mat vec JohannesGaessler 2023-07-01 17:29:54 +0200
  • efa86bf2a6 export llama_timings as struct and expose them in server Tobias Lütke 2023-07-04 21:52:04 -0400
  • 7b007d4b5e Don't double count the sample time Howard Su 2023-07-05 09:10:54 +0800
  • 7f0e9a775e embd-input: Fix input embedding example unsigned int seed (#2105) master-7f0e9a7 Nigel Bosch 2023-07-04 18:33:33 -0500
  • 80b17e2f66 Fix trailing whitespace in vk_mem_alloc.h 0cc4m 2023-07-04 23:01:32 +0200
  • e35d28fec3 Fix queue selection for AMD RADV 0cc4m 2023-07-04 22:57:08 +0200
  • ae7325fdff Fix 2d write 0cc4m 2023-07-04 22:42:07 +0200
  • ade9555c48 Add 2d write operation, profiling code 0cc4m 2023-07-04 22:31:47 +0200
  • b472f3fca5 readme : add link web chat PR Georgi Gerganov 2023-07-04 22:25:22 +0300
  • 8e9af803ba Merge branch 'master' into HEAD Georgi Gerganov 2023-07-04 22:02:38 +0300
  • ed9a54e512 ggml : sync latest (new ops, macros, refactoring) (#2106) master-ed9a54e Georgi Gerganov 2023-07-04 21:54:11 +0300
  • 577d0a53dd ggml : sync latest (new ops, macros, refactoring) Georgi Gerganov 2023-07-04 21:26:37 +0300
  • 8239ae8101 Fix input embedding example unsigned int seed Nigel Bosch 2023-07-04 13:14:34 -0500
  • f257fd2550 Add an API example using server.cpp similar to OAI. (#2009) master-f257fd2 jwj7140 2023-07-05 03:06:12 +0900
  • 81f28f2539 Remove call to ggml_cuda_mul_mat_get_wsize Stephan Walter 2023-07-04 19:15:57 +0200
  • 93e69abe12 print json jwj7140 2023-07-05 01:08:25 +0900
  • 41f7a5004a fix bug & add truncation return jwj7140 2023-07-05 01:02:05 +0900
  • 99cf7bcc9a Update server instructions for web front end Jesse Johnson 2023-07-04 15:54:40 +0000
  • 7735c5a9af Merge 'origin/master' into hipblas Henri Vasserman 2023-07-04 17:09:16 +0300
  • 7ee76e45af Simple webchat for server (#1998) master-7ee76e4 Tobias Lütke 2023-07-04 10:05:27 -0400
  • 3d7d8d00a4 add cmake commands Henri Vasserman 2023-07-04 17:02:22 +0300
  • c19daa4eb5 basic response formatting Tobias Lütke 2023-07-03 15:53:01 -0400
  • eee6d69e39 fix mobile, fix missing prompt cache Tobias Lütke 2023-07-03 12:21:41 -0400
  • fedce007c0 rework state management into session, expose historyTemplate to settings Tobias Lütke 2023-07-03 10:55:15 -0400
  • 98e612cefd slightly nicer css Tobias Lütke 2023-07-02 17:46:00 -0400
  • dd1df3f31c add /completion.js file to make it easy to use the server from js Tobias Lütke 2023-07-02 15:56:10 -0400
  • 8e1b04d319 enable server in Makefiles Tobias Lütke 2023-07-02 14:55:16 -0400
  • dc7dd0886a let's try this with the xxd tool instead and see if msvc is happier with that Tobias Lütke 2023-07-02 14:50:14 -0400
  • 34fc3c7e9f remove need for @microsoft/fetch-event-source dep (-7kb) Tobias Lütke 2023-07-02 14:30:23 -0400
  • e192f950a3 revert log format changes Tobias Lütke 2023-06-27 16:23:22 -0400
  • 0f95689c17 improvements Tobias Lütke 2023-06-27 15:14:15 -0400
  • 7a3895641c allow server to multithread Tobias Lütke 2023-06-27 13:19:24 -0400
  • a30d4b2a8f switched to fprintf logging and to access_log Tobias Lütke 2023-06-27 13:13:01 -0400
  • c8cedf5684 newline police tobi lutke 2023-06-26 20:45:22 -0400
  • 022bf2bb48 embed index and add --path for choosing static dir tobi lutke 2023-06-26 20:36:42 -0400
  • e3fba85d14 minor aesthetic fixes tobi lutke 2023-06-26 19:20:28 -0400
  • c1cb0e1db2 server : clear trailing whitespace Georgi Gerganov 2023-06-26 10:42:28 +0300
  • b07b271358 tighter tobi lutke 2023-06-25 21:19:03 -0400
  • 627d3ba8b5 expose simple web interface on root domain tobi lutke 2023-06-25 20:56:00 -0400
  • 5f04a5d877 Add test Howard Su 2023-07-04 21:08:48 +0800
  • ca150b7725 More tests Howard Su 2023-07-04 18:15:25 +0800
  • 751e51ceda special test Howard Su 2023-07-03 09:51:04 +0800
  • 6caa06638f Add tests Howard Su 2023-07-02 22:06:10 +0800
  • e818537027 [llama] Add resegment post processing of tokenizer Howard Su 2023-07-02 22:06:03 +0800
  • acc111caf9 Allow old Make to build server. (#2098) master-acc111c Henri Vasserman 2023-07-04 15:38:04 +0300
  • 23c7c6fc91 Update Makefile: clean simple (#2097) master-23c7c6f ZhouYuChen 2023-07-04 20:15:16 +0800
  • 1cc82900e3 Allow old Make to build server. Henri Vasserman 2023-07-04 14:51:08 +0300
  • fe9d3daea3 Update Makefile: clean simple ZhouYuChen 2023-07-04 19:48:52 +0800
  • 69add28324 Merge branch 'master' into concedo_experimental Concedo 2023-07-04 18:51:42 +0800
  • 00e35d0bbf Merge branch 'concedo' into concedo_experimental Concedo 2023-07-04 18:46:40 +0800
  • f9108ba401 Make koboldcpp.py executable on Linux (#293) Michael Moon 2023-07-04 18:46:08 +0800
  • fff705d4f6 Merge remote-tracking branch 'ycros/improve-sampler-api-access' into concedo_experimental Concedo 2023-07-04 18:42:02 +0800
  • c6c0afdf18 refactor to avoid code duplication Concedo 2023-07-04 18:35:03 +0800
  • 784628a2be Merge remote-tracking branch 'ycros/improve-sampler-api-access' into concedo_experimental Concedo 2023-07-04 16:38:32 +0800
  • 042c5b278f wrap includes Evan Miller 2023-07-04 00:13:20 -0400
  • 668ba5fe0b fixes Evan Miller 2023-07-04 00:09:02 -0400
  • d05ca74dd8 fix warnings, update README Evan Miller 2023-07-03 23:53:43 -0400
  • f85785f650 MPI support, first cut Evan Miller 2023-07-03 21:51:05 -0400
  • 698efad5fb CI: make the brew update temporarily optional. (#2092) master-698efad Erik Scholz 2023-07-04 01:50:12 +0200
  • 14a2cc71f6 [ggml] fix index for ne03 value in ggml_cl_mul_f32 (#2088) Govlzkoy 2023-07-04 07:50:00 +0800
  • 20d4a48d72 CI: make the brew update temporarily optional. until they decide to fix the brew installation in the macos runners. see the open issues. eg https://github.com/actions/runner-images/pull/7710 Green Sky 2023-07-04 01:35:39 +0200
  • 56336afb50 Merge branch 'ggerganov:master' into master m3ndax 2023-07-04 00:45:21 +0200
  • 1cf14ccef1 fix server crashes (#2076) Henri Vasserman 2023-07-04 00:05:23 +0300