# llama.cpp/example/run

The purpose of this example is to demonstrate a minimal usage of llama.cpp for running models.
```bash
llama-run granite-code
```
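The full set of options and supported model sources can be listed with `llama-run -h`: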
```bash
llama-run -h
Description:
  Runs a llm

Usage:
  llama-run [options] model [prompt]

Options:
  -c, --context-size <value>
      Context size (default: 2048)
  -n, --ngl <value>
      Number of GPU layers (default: 0)
  -v, --verbose, --log-verbose
      Set verbosity level to infinity (i.e. log all messages, useful for debugging)
  -h, --help
      Show help message

Commands:
  model
      Model is a string with an optional prefix of
      huggingface:// (hf://), ollama://, https:// or file://.
      If no protocol is specified and a file exists in the specified
      path, file:// is assumed, otherwise if a file does not exist in
      the specified path, ollama:// is assumed. Models that are being
      pulled are downloaded with .partial extension while being
      downloaded and then renamed as the file without the .partial
      extension when complete.

Examples:
  llama-run llama3
  llama-run ollama://granite-code
  llama-run ollama://smollm:135m
  llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
  llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
  llama-run https://example.com/some-file1.gguf
  llama-run some-file2.gguf
  llama-run file://some-file3.gguf
  llama-run --ngl 999 some-file4.gguf
  llama-run --ngl 999 some-file5.gguf Hello World
```
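The options above can be combined in a single invocation. As an illustrative sketch (the model name and prompt are only examples; the flags and the ollama:// source are the ones documented in the help output above), a run that pulls a model from the Ollama registry, offloads layers to the GPU, and passes a prompt might look like this:

```bash
# Illustrative only: pulls ollama://smollm:135m (listed in the examples above),
# offloads up to 999 layers to the GPU, uses a 4096-token context, and sends a prompt.
llama-run -c 4096 --ngl 999 ollama://smollm:135m "Write a haiku about llamas"
```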