llama.cpp/examples/server/tests

Server tests

Python-based server test scenarios written with BDD and behave.

Tests target GitHub workflows job runners with 4 vCPU.

Requests are made with aiohttp, an asyncio-based HTTP client.

Note: If inference on the host is faster than on the GitHub runners, parallel scenarios may fail randomly. To mitigate this, increase the values of n_predict and kv_size.

Install dependencies

pip install -r requirements.txt

Run tests

  1. Build the server
cd ../../..
cmake -B build -DLLAMA_CURL=ON
cmake --build build --target llama-server
  2. Start the tests: ./tests.sh

Some scenario step values can be overridden with environment variables:

| variable | description |
|---|---|
| PORT | sets context.server_port, the listening port of the server during a scenario; default: 8080 |
| LLAMA_SERVER_BIN_PATH | changes the server binary path; default: ../../../build/bin/llama-server |
| DEBUG | "ON" enables verbose mode (--verbose) for the steps and the server |
| SERVER_LOG_FORMAT_JSON | if set, switches the server logs to JSON format |
| N_GPU_LAYERS | number of model layers to offload to VRAM (-ngl, --n-gpu-layers) |
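
As a rough illustration of how these overrides could be consumed, the sketch below collects them into one dictionary. The helper name `load_overrides` and the dictionary keys are hypothetical and are not taken from steps.py; only the variable names and the two documented defaults come from the table above. Treating any value of SERVER_LOG_FORMAT_JSON, even an empty one, as "set" is an assumption.

```python
import os
from typing import Mapping, Optional


def load_overrides(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the documented environment overrides (hypothetical helper)."""
    if env is None:
        env = os.environ
    n_gpu_layers = env.get("N_GPU_LAYERS")
    return {
        # Defaults for PORT and LLAMA_SERVER_BIN_PATH are the documented ones.
        "server_port": int(env.get("PORT", "8080")),
        "server_bin_path": env.get(
            "LLAMA_SERVER_BIN_PATH", "../../../build/bin/llama-server"
        ),
        # Verbose mode is enabled only when DEBUG is exactly "ON".
        "debug": env.get("DEBUG") == "ON",
        # Assumption: presence alone switches the log format.
        "log_format_json": "SERVER_LOG_FORMAT_JSON" in env,
        # No default is documented, so stay unset when the variable is absent.
        "n_gpu_layers": int(n_gpu_layers) if n_gpu_layers is not None else None,
    }
```

Passing a plain dict instead of relying on `os.environ` keeps the helper easy to check in isolation, for example `load_overrides({"PORT": "9090"})`.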

Run @bug, @wip or @wrong_usage annotated scenarios

A Feature or Scenario must be annotated with @llama.cpp to be included in the default scope.

  • @bug links a scenario to a GitHub issue.
  • @wrong_usage marks scenarios that show user issues which are actually expected behavior.
  • @wip focuses on a scenario that is a work in progress.
  • @slow marks a heavy test, disabled by default.

To run a scenario annotated with @bug, start:

DEBUG=ON ./tests.sh --no-skipped --tags bug --stop

After changing logic in steps.py, ensure that the @bug and @wrong_usage scenarios are updated.

./tests.sh --no-skipped --tags bug,wrong_usage || echo "should fail but compile"
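
The comma in `--tags bug,wrong_usage` is read here as "either tag", which is how behave's classic tag expressions treat a comma-separated list as far as we understand them. The sketch below mimics only that one rule; `matches_any` is a hypothetical helper, not part of behave or of this test suite, and it ignores negation and multiple `--tags` options.

```python
from typing import Iterable


def matches_any(scenario_tags: Iterable[str], selected: str) -> bool:
    """Return True if the scenario carries at least one of the selected tags."""
    # Accept tags written with or without the leading "@".
    wanted = {t.strip().lstrip("@") for t in selected.split(",") if t.strip()}
    have = {t.lstrip("@") for t in scenario_tags}
    return bool(wanted & have)
```

Under this reading, a scenario tagged `@llama.cpp @bug` is selected by `bug,wrong_usage`, while a scenario tagged only `@llama.cpp` is not.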