Xuan Son Nguyen
9c405c9f9a
server : use llama_chat_apply_template (#5593)
* server: use llama_chat_apply_template
* server: remove trailing space
* server: fix format_chat
* server: fix help message
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server: fix formatted_chat
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-20 15:58:27 +01:00
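For context, llama_chat_apply_template is the llama.h API this commit switches the server to for chat formatting. Below is a minimal sketch of a server-side formatter built on it; the helper name and the grow-and-retry buffer handling are illustrative assumptions, not the commit's exact code:

```cpp
// Sketch: format a message list via llama_chat_apply_template (llama.h API).
// The retry loop on a too-small buffer is an assumption for illustration.
#include "llama.h"
#include <algorithm>
#include <string>
#include <vector>

static std::string format_chat(const llama_model * model,
                               const std::string & tmpl, // "" -> use the model's built-in template
                               const std::vector<llama_chat_message> & msgs) {
    std::vector<char> buf(msgs.size() * 256); // initial guess, grown below if needed
    int32_t res = llama_chat_apply_template(
        model,
        tmpl.empty() ? nullptr : tmpl.c_str(),
        msgs.data(), msgs.size(),
        /*add_ass=*/true, // append the assistant prompt prefix
        buf.data(), (int32_t) buf.size());
    if (res > (int32_t) buf.size()) {
        // output did not fit: resize to the reported length and retry
        buf.resize(res);
        res = llama_chat_apply_template(model,
            tmpl.empty() ? nullptr : tmpl.c_str(),
            msgs.data(), msgs.size(), true,
            buf.data(), (int32_t) buf.size());
    }
    return std::string(buf.data(), std::max(res, (int32_t) 0));
}
```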
Daniel Hiltgen
66c1968f7a
server : graceful server shutdown (#5244)
This updates the server queue to support graceful shutdown of the server on signals.
2024-02-18 18:23:16 +02:00
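A minimal sketch of the pattern this commit describes: a signal handler requests shutdown and the server's processing loop exits cleanly instead of the process dying mid-task. The queue's terminate() call is a hypothetical name for illustration:

```cpp
// Sketch of signal-driven graceful shutdown. Only a flag is set inside the
// handler, since signal handlers must stay async-signal-safe.
#include <atomic>
#include <csignal>
#include <cstdio>

static std::atomic<bool> g_shutdown{false};

extern "C" void handle_signal(int) {
    g_shutdown.store(true); // picked up by the main loop below
}

int main() {
    std::signal(SIGINT,  handle_signal);
    std::signal(SIGTERM, handle_signal);

    while (!g_shutdown.load()) {
        // process one queued task, or wait briefly when idle ...
    }
    // queue.terminate(); // hypothetical: wake blocked workers and drain tasks
    std::puts("server shutting down gracefully");
    return 0;
}
```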
Xuan Son Nguyen
907e08c110
server : add llama2 chat template (#5425)
* server: add mistral chat template
* server: fix typo
* server: rename template mistral to llama2
* server: format_llama2: remove BOS
* server: validate "--chat-template" argument
* server: clean up using_chatml variable
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
---------
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2024-02-11 12:16:22 +02:00
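The llama2 template wraps each user turn in [INST] ... [/INST], folding an optional <<SYS>> block into the first turn, and, per the "remove BOS" bullet above, leaves the leading BOS token for the tokenizer to add. A rough sketch of such a formatter; the helper name mirrors the bullet list, and the exact turn handling is an assumption:

```cpp
// Sketch of llama2-style chat formatting. BOS is intentionally omitted
// (the tokenizer is expected to prepend it). Details are illustrative.
#include <sstream>
#include <string>
#include <vector>

struct chat_msg { std::string role, content; };

static std::string format_llama2(const std::vector<chat_msg> & msgs) {
    std::ostringstream out;
    std::string system;
    for (const auto & m : msgs) {
        if (m.role == "system") {
            system = m.content; // folded into the next [INST] block
        } else if (m.role == "user") {
            out << "[INST] ";
            if (!system.empty()) {
                out << "<<SYS>>\n" << system << "\n<</SYS>>\n\n";
                system.clear();
            }
            out << m.content << " [/INST]";
        } else { // assistant
            out << " " << m.content << " </s><s>";
        }
    }
    return out.str();
}
```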
Georgi Gerganov
753eafed0e
sync : ggml
2024-01-27 17:00:24 +02:00
Xuan Son Nguyen
48c857aa10
server : refactor the task processing logic (#5065)
* server: add llama_server_queue struct
* server: add llama_server_response_event
* server: add comments
* server: move all mutexes away from server.cpp
* server: correct multitask response
* server: only add back deferred tasks when one slot is available
* server: fix a race condition caused by "request_completion"
2024-01-26 14:42:20 +02:00
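A minimal sketch of the queue pattern the bullets above describe: one struct owns the mutex and condition variable (keeping them out of server.cpp), callers post tasks, and a deferred task is re-queued only when a slot frees up. The struct name follows the commit log; the member names and bodies are assumptions:

```cpp
// Sketch of a thread-safe task queue with deferral, per the commit bullets.
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>

struct task { int id; std::function<void()> run; };

struct llama_server_queue {
    std::mutex mtx;
    std::condition_variable cv;
    std::deque<task> tasks;    // ready to run
    std::deque<task> deferred; // waiting for a free slot
    bool running = true;

    void post(task t) {
        std::lock_guard<std::mutex> lk(mtx);
        tasks.push_back(std::move(t));
        cv.notify_one();
    }
    // Called when a slot becomes free: move one deferred task back.
    void notify_slot_changed() {
        std::lock_guard<std::mutex> lk(mtx);
        if (!deferred.empty()) {
            tasks.push_back(std::move(deferred.front()));
            deferred.pop_front();
            cv.notify_one();
        }
    }
    // Graceful stop: wake all waiters so start_loop() can return.
    void terminate() {
        std::lock_guard<std::mutex> lk(mtx);
        running = false;
        cv.notify_all();
    }
    void start_loop() {
        for (;;) {
            std::unique_lock<std::mutex> lk(mtx);
            cv.wait(lk, [&] { return !tasks.empty() || !running; });
            if (!running) return;
            task t = std::move(tasks.front());
            tasks.pop_front();
            lk.unlock();
            t.run(); // run outside the lock to avoid blocking posters
        }
    }
};
```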