server : update readme
ggml-ci
parent 79a8176883, commit d58f8a1b6b
**Response format**

- Note: In streaming mode (`stream`), only `content`, `tokens` and `stop` will be returned until the end of the completion. Responses are sent using the [Server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html) standard. The browser's `EventSource` interface cannot be used, since it does not support `POST` requests.
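
For illustration, here is a minimal TypeScript sketch of a streaming client: it uses `fetch` and parses the `data:` lines of the event stream by hand, since `EventSource` cannot issue the required `POST`. The host and port (`localhost:8080`) and the request fields are assumptions for the sketch, not a definitive client:

```ts
// Minimal sketch: stream a completion and accumulate the generated text.
async function streamCompletion(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:8080/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, stream: true }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  let text = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";                 // keep any incomplete trailing line
    for (const line of lines) {
      if (!line.startsWith("data: ")) continue; // SSE payload lines only
      const chunk = JSON.parse(line.slice("data: ".length));
      text += chunk.content;                    // next token as a string
      if (chunk.stop) return text;              // generation has stopped
    }
  }
  return text;
}
```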

- `completion_probabilities`: An array of token probabilities for each completion. The array's length is `n_predict`. Each item in the array has the following structure:

```json
{
  "content": "<the token generated by the model>",
  "tokens": [ generated token ids ],
  "probs": [
    {
      "prob": float,
      ...
    },
    ...
  ]
}
```

Notice that each `probs` is an array of length `n_probs`.
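
For reference, the item structure above can be transcribed into TypeScript types. This is a sketch covering only the fields visible in the excerpt; the remaining per-candidate fields are elided here, as above:

```ts
// One entry of `completion_probabilities`, per the JSON sketch above.
interface TokenProbability {
  prob: number; // probability assigned to one candidate token
  // ...further per-candidate fields elided in the excerpt above
}

interface CompletionProbability {
  content: string;           // the token generated by the model
  tokens: number[];          // generated token ids
  probs: TokenProbability[]; // always `n_probs` entries
}
```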

- `content`: Completion result as a string (excluding `stopping_word`, if any). In streaming mode, this will contain the next token as a string.
- `tokens`: Same as `content`, but represented as raw token ids.
- `stop`: Boolean for use with `stream` to check whether generation has stopped (note: this is not related to the stopping words array `stop` from the input options)
- `generation_settings`: The provided options above, excluding `prompt` but including `n_ctx` and `model`. These options may differ from the original ones in some ways (e.g. bad values filtered out, strings converted to tokens, etc.).
- `model`: The path to the model loaded with `-m`
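
Putting the fields above together, a minimal non-streaming TypeScript client might look like the sketch below; the host, port and `n_predict` value are assumptions rather than prescribed values:

```ts
// Minimal sketch: one blocking request, then read the documented fields.
async function complete(prompt: string): Promise<void> {
  const res = await fetch("http://localhost:8080/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, n_predict: 64 }),
  });
  const data = await res.json();
  console.log(data.content);             // completion as a string
  console.log(data.tokens);              // the same completion as raw token ids
  console.log(data.stop);                // true once generation has stopped
  console.log(data.generation_settings); // effective options, post filtering
  console.log(data.model);               // path of the model loaded with -m
}
```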