mirror of https://github.com/ggerganov/llama.cpp.git (synced 2024-12-26 11:24:35 +00:00)
3a0dcb3920
This new mode works by first loading the model, then listening for TCP connections on a port. When a connection is received, arguments are parsed using a simple protocol:

- First the number of arguments is read, followed by a newline character.
- Then each argument is read, separated by the 0 byte.

With this we build an argument vector, similar to what is passed to the program entry point, and pass it to gpt_params_parse. Finally, `run` is executed with the input/output streams connected to the socket.

Signed-off-by: Thiago Padilha <thiago@padilha.cc>
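A minimal client sketch of this protocol, assuming a server started on localhost:8080. The example flags (-p, -n) are illustrative, and the sketch assumes the server expects only the real arguments, with no argv[0] placeholder:

#!/usr/bin/env bash
# Hypothetical client for the protocol above: send the argument count
# followed by a newline, then each argument terminated by a NUL byte
# (the commit says "separated"; a trailing NUL is assumed to be accepted).
ARGS=(-p "Hello" -n 64)
{
  printf '%d\n' "${#ARGS[@]}"   # argument count, newline-terminated
  printf '%s\0' "${ARGS[@]}"    # arguments, NUL-delimited
} | nc localhost 8080           # `run` output streams back over the socket
# Note: depending on the nc variant, an option such as -q may be needed
# so the connection stays open for the response after stdin closes.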
7 lines | 115 B | Bash | Executable File
#!/usr/bin/env bash

# TCP port to listen on (default: 8080).
PORT=${PORT:-8080}

# Model file to load (default: the 4-bit quantized 7B model).
MODEL=${MODEL:-models/7B/ggml-model-q4_0.bin}

# Launch main in TCP listening mode (-l) with the chosen model.
./main -l "${PORT}" -m "${MODEL}"
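Both variables can be overridden from the environment when invoking the script; for example (the script path shown here is hypothetical):

PORT=9000 MODEL=models/13B/ggml-model-q4_0.bin ./chat-tcp.sh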