llama.cpp/examples/export-lora
Xuan Son Nguyen 1e6f6554aa
server : add lora hotswap endpoint (WIP) (#8857)
* server : add lora hotswap endpoint

* handle lora_no_apply

* fix build

* updae docs

* clean up struct def

* fix build

* add LoRA test

* fix style
2024-08-06 17:33:39 +02:00
..
CMakeLists.txt build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) 2024-06-13 00:41:52 +01:00
export-lora.cpp server : add lora hotswap endpoint (WIP) (#8857) 2024-08-06 17:33:39 +02:00
README.md examples : remove finetune and train-text-from-scratch (#8669) 2024-07-25 10:39:04 +02:00

export-lora

Apply LORA adapters to base model and export the resulting model.

usage: llama-export-lora [options]

options:
  -m,    --model                  model path from which to load base model (default '')
         --lora FNAME             path to LoRA adapter  (can be repeated to use multiple adapters)
         --lora-scaled FNAME S    path to LoRA adapter with user defined scaling S  (can be repeated to use multiple adapters)
  -t,    --threads N              number of threads to use during computation (default: 4)
  -o,    --output FNAME           output file (default: 'ggml-lora-merged-f16.gguf')

For example:

./bin/llama-export-lora \
    -m open-llama-3b-v2-q8_0.gguf \
    -o open-llama-3b-v2-q8_0-english2tokipona-chat.gguf \
    --lora lora-open-llama-3b-v2-q8_0-english2tokipona-chat-LATEST.gguf

Multiple LORA adapters can be applied by passing multiple --lora FNAME or --lora-scaled FNAME S command line parameters:

./bin/llama-export-lora \
    -m your_base_model.gguf \
    -o your_merged_model.gguf \
    --lora-scaled lora_task_A.gguf 0.5 \
    --lora-scaled lora_task_B.gguf 0.5