This new mode works by first loading the model and then listening for TCP
connections on a port. When a connection is received, its arguments are
parsed using a simple protocol:
- First, the number of arguments is read, followed by a newline
character.
- Then each argument is read; arguments are separated by a 0 (NUL) byte.
- From these we build an argument vector, similar to the `argv` passed to
the program entry point, and pass it to `gpt_params_parse`.
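The protocol above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the code from this commit: the count is assumed to be a decimal number, every argument (including the last) is assumed to be followed by a 0 byte, and `encode_request`, `read_arguments` and `make_argv` are hypothetical names. It reads from a `std::istream` rather than a socket so it can be exercised without networking; choosing the placeholder `argv[0]` and calling `gpt_params_parse` are left to the caller.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical client-side encoder for the assumed wire format:
// "<count>\n" followed by each argument and a terminating 0 byte.
std::string encode_request(const std::vector<std::string> & args) {
    std::string out = std::to_string(args.size()) + "\n";
    for (const auto & a : args) {
        out += a;
        out.push_back('\0');  // 0 byte after every argument
    }
    return out;
}

// Hypothetical server-side decoder: reads the count line, then `count`
// arguments delimited by 0 bytes (the last one may also end at EOF).
std::vector<std::string> read_arguments(std::istream & in) {
    std::string count_line;
    if (!std::getline(in, count_line)) {
        throw std::runtime_error("missing argument count");
    }
    std::size_t used = 0;
    // Assumption: the count is sent as a decimal number.
    const unsigned long count = std::stoul(count_line, &used);
    if (used != count_line.size()) {
        throw std::runtime_error("malformed argument count");
    }
    std::vector<std::string> args;
    for (unsigned long i = 0; i < count; ++i) {
        std::string arg;
        // Extracts up to and including the next 0 byte; an empty argument
        // (a lone 0 byte) is accepted.
        if (!std::getline(in, arg, '\0')) {
            throw std::runtime_error("truncated argument list");
        }
        args.push_back(std::move(arg));
    }
    return args;
}

// Build a main()-style argv from `storage` (whose first element the caller
// sets to a placeholder program name), terminated by nullptr like main()'s.
// The returned pointers borrow from `storage`, which must outlive them.
std::vector<char *> make_argv(std::vector<std::string> & storage) {
    std::vector<char *> argv;
    argv.reserve(storage.size() + 1);
    for (auto & s : storage) {
        argv.push_back(s.data());  // non-const data() requires C++17
    }
    argv.push_back(nullptr);
    return argv;
}
```

A caller would insert a placeholder program name at the front of the decoded vector, call `make_argv`, and hand the element count (excluding the trailing `nullptr`) and `argv.data()` to the argument parser. Because the 0 byte is the delimiter, arguments themselves cannot contain it.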
Finally, `run` is executed with its input/output streams connected
to the socket.
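How the streams are attached to the socket is not spelled out here. One common POSIX approach, sketched below as an assumption rather than as this commit's code, is to `dup()` the accepted descriptor and wrap each copy with `fdopen()`, giving separate `FILE *` streams for reading and writing. `SocketStreams`, `open_socket_streams` and `close_socket_streams` are hypothetical names.

```cpp
#include <cstdio>
#include <sys/socket.h>  // accept()/socketpair() for callers; not used directly here
#include <unistd.h>

// Hypothetical pair of stdio streams backed by one connected socket.
struct SocketStreams {
    FILE * in  = nullptr;
    FILE * out = nullptr;
};

// Takes ownership of `sock`. A duplicate descriptor backs the output stream
// so that each FILE* owns its own fd and both can be fclose()'d safely.
bool open_socket_streams(int sock, SocketStreams & s) {
    int out_fd = dup(sock);
    if (out_fd < 0) {
        close(sock);
        return false;
    }
    s.in  = fdopen(sock, "r");
    s.out = fdopen(out_fd, "w");
    if (s.in == nullptr || s.out == nullptr) {
        // Release whatever was acquired: a stream if fdopen succeeded,
        // otherwise the raw descriptor.
        if (s.in != nullptr) { std::fclose(s.in); } else { close(sock); }
        if (s.out != nullptr) { std::fclose(s.out); } else { close(out_fd); }
        s.in = s.out = nullptr;
        return false;
    }
    return true;
}

// Flushes pending output and closes both streams (and thus both fds).
void close_socket_streams(SocketStreams & s) {
    if (s.out != nullptr) { std::fclose(s.out); }
    if (s.in != nullptr) { std::fclose(s.in); }
    s.in = s.out = nullptr;
}
```

Since a socket stream opened this way is typically fully buffered, output written to `out` should be flushed with `fflush` when the client needs to see it promptly.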

Signed-off-by: Thiago Padilha <thiago@padilha.cc>