llama.cpp/examples/server/tests/features/lora.feature

@llama.cpp
@lora
Feature: llama.cpp server

  Background: Server startup
    Given a server listening on localhost:8080
    And   a model url https://huggingface.co/ggml-org/stories15M_MOE/resolve/main/stories15M_MOE-F16.gguf
    And   a model file stories15M_MOE-F16.gguf
    And   a model alias stories15M_MOE
    And   a lora adapter file from https://huggingface.co/ggml-org/stories15M_MOE/resolve/main/moe_shakespeare15M.gguf
    And   42 as server seed
    And   1024 as batch size
    And   1024 as ubatch size
    And   2048 KV cache size
    And   64 max tokens to predict
    And   0.0 temperature
    Then  the server is starting
    Then  the server is healthy

  Scenario: Completion LoRA disabled
    Given switch off lora adapter 0
    Given a prompt:
    """
    Look in thy glass
    """
    And   a completion request with no api error
    Then  64 tokens are predicted matching little|girl|three|years|old

  Scenario: Completion LoRA enabled
    Given switch on lora adapter 0
    Given a prompt:
    """
    Look in thy glass
    """
    And   a completion request with no api error
    Then  64 tokens are predicted matching eye|love|glass|sun

# Commit: server : add lora hotswap endpoint (WIP) (#8857), 2024-08-06 15:33:39 +00:00
#   * server : add lora hotswap endpoint
#   * handle lora_no_apply
#   * fix build
#   * update docs
#   * clean up struct def
#   * fix build
#   * add LoRA test
#   * fix style