mirror of https://github.com/ggerganov/llama.cpp.git (synced 2024-12-26 11:24:35 +00:00)
3a0dcb3920
This new mode works by first loading the model, then listening for TCP connections on a port. When a connection is received, arguments are parsed using a simple protocol:

- First the number of arguments is read, followed by a newline character.
- Then each argument is read, separated by the 0 byte.

With this we build an argument vector, similar to what is passed to the program entry point, and pass it to gpt_params_parse. Finally, `run` is executed with the input/output streams connected to the socket.

Signed-off-by: Thiago Padilha <thiago@padilha.cc>
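A minimal client sketch of this protocol, assuming a server started on localhost:8080. The example flags (-p, -n) are illustrative, and the sketch assumes the server expects only the real arguments, with no argv[0] placeholder:

#!/usr/bin/env bash
# Hypothetical client for the protocol above: send the argument count
# followed by a newline, then each argument terminated by a NUL byte
# (the commit says "separated"; a trailing NUL is assumed to be accepted).
ARGS=(-p "Hello" -n 64)
{
  printf '%d\n' "${#ARGS[@]}"   # argument count, newline-terminated
  printf '%s\0' "${ARGS[@]}"    # arguments, NUL-delimited
} | nc localhost 8080           # `run` output streams back over the socket
# Note: depending on the nc variant, an option such as -q may be needed
# so the connection stays open for the response after stdin closes.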
7 lines | 115 B | Bash | Executable File
#!/usr/bin/env bash

# TCP port to listen on (default: 8080).
PORT=${PORT:-8080}

# Model file to load (default: the 4-bit quantized 7B model).
MODEL=${MODEL:-models/7B/ggml-model-q4_0.bin}

# Launch main in TCP listening mode (-l) with the chosen model.
./main -l "${PORT}" -m "${MODEL}"
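Both variables can be overridden from the environment when invoking the script; for example (the script path shown here is hypothetical):

PORT=9000 MODEL=models/13B/ggml-model-q4_0.bin ./chat-tcp.sh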