@llama.cpp
@lora
Feature: llama.cpp server

  Background: Server startup
    Given a server listening on localhost:8080
    And a model url https://huggingface.co/ggml-org/stories15M_MOE/resolve/main/stories15M_MOE-F16.gguf
    And a model file stories15M_MOE-F16.gguf
    And a model alias stories15M_MOE
    And a lora adapter file from https://huggingface.co/ggml-org/stories15M_MOE/resolve/main/moe_shakespeare15M.gguf
    And 42 as server seed
    And 1024 as batch size
    And 1024 as ubatch size
    And 2048 KV cache size
    And 64 max tokens to predict
    And 0.0 temperature
    Then the server is starting
    Then the server is healthy

  Scenario: Completion LoRA disabled
    Given switch off lora adapter 0
    Given a prompt:
    """
    Look in thy glass
    """
    And a completion request with no api error
    Then 64 tokens are predicted matching little|girl|three|years|old

  Scenario: Completion LoRA enabled
    Given switch on lora adapter 0
    Given a prompt:
    """
    Look in thy glass
    """
    And a completion request with no api error
    Then 64 tokens are predicted matching eye|love|glass|sun
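
# The Background above maps onto a llama-server invocation. A minimal sketch,
# assuming the standard llama.cpp server CLI flags (--lora, --alias, -c, -b,
# -ub, --seed); adjust the binary name and file paths to your build and
# download locations:
#
#   ./llama-server --host localhost --port 8080 \
#       -m stories15M_MOE-F16.gguf --alias stories15M_MOE \
#       --lora moe_shakespeare15M.gguf \
#       --seed 42 -b 1024 -ub 1024 -c 2048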
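
# The "switch on/off lora adapter 0" steps correspond to the server's
# /lora-adapters endpoint, which accepts a JSON array of {id, scale} pairs
# (a sketch of the equivalent raw requests; verify the endpoint against
# your llama.cpp server version):
#
#   # disable adapter 0 (scale 0.0), as in the first scenario
#   curl -X POST http://localhost:8080/lora-adapters \
#        -H "Content-Type: application/json" \
#        -d '[{"id": 0, "scale": 0.0}]'
#
#   # re-enable adapter 0 (scale 1.0), as in the second scenario
#   curl -X POST http://localhost:8080/lora-adapters \
#        -H "Content-Type: application/json" \
#        -d '[{"id": 0, "scale": 1.0}]'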