2024-06-12 23:41:52 +00:00
|
|
|
set(TARGET llama-server)
|
Server Example Refactor and Improvements (#1570)
A major rewrite for the server example.
Note that if you have built something on the previous server API, it will probably be incompatible.
Check out the examples for how a typical chat app could work.
This took a lot of effort, there are 24 PR's closed in the submitter's repo alone, over 160 commits and a lot of comments and testing.
Summary of the changes:
- adds missing generation parameters: tfs_z, typical_p, repeat_last_n, repeat_penalty, presence_penalty, frequency_penalty, mirostat, penalize_nl, seed, ignore_eos
- applies missing top k sampler
- removes interactive mode/terminal-like behavior, removes exclude parameter
- moves threads and batch size to server command-line parameters
- adds LoRA loading and matches command line parameters with main example
- fixes stopping on EOS token and with the specified token amount with n_predict
- adds server timeouts, host, and port settings
- adds expanded generation complete response; adds generation settings, stop reason, prompt truncated, model used, and final text
- sets defaults for unspecified parameters between requests
- removes /next-token endpoint and as_loop parameter, adds stream parameter and server-sent events for streaming
- adds CORS headers to responses
- adds request logging, exception printing and optional verbose logging
- adds better stopping words handling when matching multiple tokens and while streaming, or when it finishes on a partial stop string
- adds printing an error when it can't bind to the host/port specified
- fixes multi-byte character handling and replaces invalid UTF-8 characters on responses
- prints timing and build info on startup
- adds logit bias to request parameters
- removes embedding mode
- updates documentation; adds streaming Node.js and Bash examples
- fixes code formatting
- sets server threads to 1 since the current global state doesn't work well with simultaneous requests
- adds truncation of the input prompt and better context reset
- removes token limit from the input prompt
- significantly simplified the logic and removed a lot of variables
---------
Co-authored-by: anon998 <131767832+anon998@users.noreply.github.com>
Co-authored-by: Henri Vasserman <henv@hot.ee>
Co-authored-by: Felix Hellmann <privat@cirk2.de>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Lesaun Harvey <Lesaun@gmail.com>
2023-06-17 11:53:04 +00:00
|
|
|
option(LLAMA_SERVER_VERBOSE "Build verbose logging option for Server" ON)
|
2024-06-26 15:33:02 +00:00
|
|
|
option(LLAMA_SERVER_SSL "Build SSL support for the server" OFF)
|
|
|
|
|
2024-04-21 17:48:53 +00:00
|
|
|
include_directories(${CMAKE_CURRENT_SOURCE_DIR} ${CMAKE_CURRENT_BINARY_DIR})
|
2024-06-26 15:33:02 +00:00
|
|
|
|
2024-04-21 17:48:53 +00:00
|
|
|
set(TARGET_SRCS
|
json-schema-to-grammar improvements (+ added to server) (#5978)
* json: fix arrays (disallow `[,1]`)
* json: support tuple types (`[number, string]`)
* json: support additionalProperties (`{[k: string]: [string,number][]}`)
* json: support required / optional properties
* json: add support for pattern
* json: resolve $ref (and support https schema urls)
* json: fix $ref resolution
* join: support union types (mostly for nullable types I think)
* json: support allOf + nested anyOf
* json: support any (`{}` or `{type: object}`)
* json: fix merge
* json: temp fix for escapes
* json: spaces in output and unrestricted output spaces
* json: add typings
* json:fix typo
* Create ts-type-to-grammar.sh
* json: fix _format_literal (json.dumps already escapes quotes)
* json: merge lit sequences and handle negatives
{"type": "string", "pattern": "^({\"question\": \"[^\"]+\", \"response\": \"[^\"]+\"}\\n)+$"}
* json: handle pattern repetitions
* Update json-schema-to-grammar.mjs
* Create regex-to-grammar.py
* json: extract repeated regexp patterns to subrule
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* json: handle schema from pydantic Optional fields
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* Update ts-type-to-grammar.sh
* Update ts-type-to-grammar.sh
* json: simplify nullable fields handling
* json: accept duplicate identical rules
* json: revert space to 1 at most
* json: reuse regexp pattern subrules
* json: handle uuid string format
* json: fix literal escapes
* json: add --allow-fetch
* json: simplify range escapes
* json: support negative ranges in patterns
* Delete commit.txt
* json: custom regex parser, adds dot support & JS-portable
* json: rm trailing spaces
* Update json-schema-to-grammar.mjs
* json: updated server & chat `( cd examples/server && ./deps.sh )`
* json: port fixes from mjs to python
* Update ts-type-to-grammar.sh
* json: support prefixItems alongside array items
* json: add date format + fix uuid
* json: add date, time, date-time formats
* json: preserve order of props from TS defs
* json: port schema converter to C++, wire in ./server
* json: nits
* Update json-schema-to-grammar.cpp
* Update json-schema-to-grammar.cpp
* Update json-schema-to-grammar.cpp
* json: fix mjs implementation + align outputs
* Update json-schema-to-grammar.mjs.hpp
* json: test C++, JS & Python versions
* json: nits + regen deps
* json: cleanup test
* json: revert from c++17 to 11
* json: nit fixes
* json: dirty include for test
* json: fix zig build
* json: pass static command to std::system in tests (fixed temp files)
* json: fix top-level $refs
* json: don't use c++20 designated initializers
* nit
* json: basic support for reserved names `{number:{number:{root:number}}}`
* Revamp test cmake to allow args (WORKING_DIRECTORY needed for JSON test)
* json: re-ran server deps.sh
* json: simplify test
* json: support mix of additional props & required/optional
* json: add tests for some expected failures
* json: fix type=const in c++, add failure expectations for non-str const&enum
* json: test (& simplify output of) empty schema
* json: check parsing in test + fix value & string refs
* json: add server tests for OAI JSON response_format
* json: test/fix top-level anyOf
* json: improve grammar parsing failures
* json: test/fix additional props corner cases
* json: fix string patterns (was missing quotes)
* json: ws nit
* json: fix json handling in server when there's no response_format
* json: catch schema conversion errors in server
* json: don't complain about unknown format type in server if unset
* json: cleaner build of test
* json: create examples/json-schema-pydantic-example.py
* json: fix date pattern
* json: move json.hpp & json-schema-to-grammar.{cpp,h} to common
* json: indent 4 spaces
* json: fix naming of top-level c++ function (+ drop unused one)
* json: avoid using namespace std
* json: fix zig build
* Update server.feature
* json: iostream -> fprintf
* json: space before & refs for consistency
* json: nits
2024-03-21 11:50:43 +00:00
|
|
|
server.cpp
|
|
|
|
utils.hpp
|
|
|
|
httplib.h
|
|
|
|
)
|
2024-04-21 17:48:53 +00:00
|
|
|
set(PUBLIC_ASSETS
|
2024-06-01 19:31:48 +00:00
|
|
|
colorthemes.css
|
|
|
|
style.css
|
|
|
|
theme-beeninorder.css
|
|
|
|
theme-ketivah.css
|
|
|
|
theme-mangotango.css
|
|
|
|
theme-playground.css
|
|
|
|
theme-polarnight.css
|
|
|
|
theme-snowstorm.css
|
2024-04-21 17:48:53 +00:00
|
|
|
index.html
|
2024-06-01 19:31:48 +00:00
|
|
|
index-new.html
|
2024-04-21 17:48:53 +00:00
|
|
|
index.js
|
|
|
|
completion.js
|
2024-06-01 19:31:48 +00:00
|
|
|
system-prompts.js
|
|
|
|
prompt-formats.js
|
2024-04-21 17:48:53 +00:00
|
|
|
json-schema-to-grammar.mjs
|
|
|
|
)
|
2024-06-26 15:33:02 +00:00
|
|
|
|
2024-04-21 17:48:53 +00:00
|
|
|
foreach(asset ${PUBLIC_ASSETS})
|
|
|
|
set(input "${CMAKE_CURRENT_SOURCE_DIR}/public/${asset}")
|
|
|
|
set(output "${CMAKE_CURRENT_BINARY_DIR}/${asset}.hpp")
|
|
|
|
list(APPEND TARGET_SRCS ${output})
|
|
|
|
add_custom_command(
|
|
|
|
DEPENDS "${input}"
|
|
|
|
OUTPUT "${output}"
|
|
|
|
COMMAND "${CMAKE_COMMAND}" "-DINPUT=${input}" "-DOUTPUT=${output}" -P "${PROJECT_SOURCE_DIR}/scripts/xxd.cmake"
|
|
|
|
)
|
|
|
|
endforeach()
|
2024-06-26 15:33:02 +00:00
|
|
|
|
2024-04-21 17:48:53 +00:00
|
|
|
add_executable(${TARGET} ${TARGET_SRCS})
|
2023-07-19 07:01:11 +00:00
|
|
|
install(TARGETS ${TARGET} RUNTIME)
|
Server Example Refactor and Improvements (#1570)
A major rewrite for the server example.
Note that if you have built something on the previous server API, it will probably be incompatible.
Check out the examples for how a typical chat app could work.
This took a lot of effort, there are 24 PR's closed in the submitter's repo alone, over 160 commits and a lot of comments and testing.
Summary of the changes:
- adds missing generation parameters: tfs_z, typical_p, repeat_last_n, repeat_penalty, presence_penalty, frequency_penalty, mirostat, penalize_nl, seed, ignore_eos
- applies missing top k sampler
- removes interactive mode/terminal-like behavior, removes exclude parameter
- moves threads and batch size to server command-line parameters
- adds LoRA loading and matches command line parameters with main example
- fixes stopping on EOS token and with the specified token amount with n_predict
- adds server timeouts, host, and port settings
- adds expanded generation complete response; adds generation settings, stop reason, prompt truncated, model used, and final text
- sets defaults for unspecified parameters between requests
- removes /next-token endpoint and as_loop parameter, adds stream parameter and server-sent events for streaming
- adds CORS headers to responses
- adds request logging, exception printing and optional verbose logging
- adds better stopping words handling when matching multiple tokens and while streaming, or when it finishes on a partial stop string
- adds printing an error when it can't bind to the host/port specified
- fixes multi-byte character handling and replaces invalid UTF-8 characters on responses
- prints timing and build info on startup
- adds logit bias to request parameters
- removes embedding mode
- updates documentation; adds streaming Node.js and Bash examples
- fixes code formatting
- sets server threads to 1 since the current global state doesn't work well with simultaneous requests
- adds truncation of the input prompt and better context reset
- removes token limit from the input prompt
- significantly simplified the logic and removed a lot of variables
---------
Co-authored-by: anon998 <131767832+anon998@users.noreply.github.com>
Co-authored-by: Henri Vasserman <henv@hot.ee>
Co-authored-by: Felix Hellmann <privat@cirk2.de>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Lesaun Harvey <Lesaun@gmail.com>
2023-06-17 11:53:04 +00:00
|
|
|
target_compile_definitions(${TARGET} PRIVATE
|
|
|
|
SERVER_VERBOSE=$<BOOL:${LLAMA_SERVER_VERBOSE}>
|
|
|
|
)
|
2024-06-26 15:33:02 +00:00
|
|
|
|
2024-04-15 17:35:21 +00:00
|
|
|
target_link_libraries(${TARGET} PRIVATE common ${CMAKE_THREAD_LIBS_INIT})
|
2024-06-26 15:33:02 +00:00
|
|
|
|
2024-03-09 09:57:09 +00:00
|
|
|
if (LLAMA_SERVER_SSL)
|
|
|
|
find_package(OpenSSL REQUIRED)
|
|
|
|
target_link_libraries(${TARGET} PRIVATE OpenSSL::SSL OpenSSL::Crypto)
|
|
|
|
target_compile_definitions(${TARGET} PRIVATE CPPHTTPLIB_OPENSSL_SUPPORT)
|
|
|
|
endif()
|
2024-06-26 15:33:02 +00:00
|
|
|
|
2023-07-21 07:42:21 +00:00
|
|
|
if (WIN32)
|
2024-07-03 04:54:56 +00:00
|
|
|
add_compile_definitions(-D_WIN32_WINNT=${GGML_WIN_VER} -DWINVER=${GGML_WIN_VER})
|
2023-07-21 07:42:21 +00:00
|
|
|
TARGET_LINK_LIBRARIES(${TARGET} PRIVATE ws2_32)
|
|
|
|
endif()
|
2024-06-26 15:33:02 +00:00
|
|
|
|
2023-05-21 17:51:18 +00:00
|
|
|
target_compile_features(${TARGET} PRIVATE cxx_std_11)
|