1c641e6aac
* `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew
* server: update refs -> llama-server
gitignore llama-server
* server: simplify nix package
* main: update refs -> llama
fix examples/main ref
* main/server: fix targets
* update more names
* Update build.yml
* rm accidentally checked in bins
* update straggling refs
* Update .gitignore
* Update server-llm.sh
* main: target name -> llama-cli
* Prefix all example bins w/ llama-
* fix main refs
* rename {main->llama}-cmake-pkg binary
* prefix more cmake targets w/ llama-
* add/fix gbnf-validator subfolder to cmake
* sort cmake example subdirs
* rm bin files
* fix llama-lookup-* Makefile rules
* gitignore /llama-*
* rename Dockerfiles
* rename llama|main -> llama-cli; consistent RPM bin prefixes
* fix some missing -cli suffixes
* rename dockerfile w/ llama-cli
* rename(make): llama-baby-llama
* update dockerfile refs
* more llama-cli(.exe)
* fix test-eval-callback
* rename: llama-cli-cmake-pkg(.exe)
* address gbnf-validator unused fread warning (switched to C++ / ifstream)
* add two missing llama- prefixes
* Updating docs for eval-callback binary to use new `llama-` prefix.
* Updating a few lingering doc references for rename of main to llama-cli
* Updating `run-with-preset.py` to use new binary names.
Updating docs around `perplexity` binary rename.
* Updating documentation references for lookup-merge and export-lora
* Updating two small `main` references missed earlier in the finetune docs.
* Update apps.nix
* update grammar/README.md w/ new llama-* names
* update llama-rpc-server bin name + doc
* Revert "update llama-rpc-server bin name + doc"
This reverts commit
|
||
---|---|---|
.. | ||
arithmetic.gbnf | ||
c.gbnf | ||
chess.gbnf | ||
japanese.gbnf | ||
json_arr.gbnf | ||
json.gbnf | ||
list.gbnf | ||
README.md |
GBNF Guide
GBNF (GGML BNF) is a format for defining formal grammars to constrain model outputs in llama.cpp
. For example, you can use it to force the model to generate valid JSON, or speak only in emojis. GBNF grammars are supported in various ways in examples/main
and examples/server
.
Background
Bakus-Naur Form (BNF) is a notation for describing the syntax of formal languages like programming languages, file formats, and protocols. GBNF is an extension of BNF that primarily adds a few modern regex-like features.
Basics
In GBNF, we define production rules that specify how a non-terminal (rule name) can be replaced with sequences of terminals (characters, specifically Unicode code points) and other non-terminals. The basic format of a production rule is nonterminal ::= sequence...
.
Example
Before going deeper, let's look at some of the features demonstrated in grammars/chess.gbnf
, a small chess notation grammar:
# `root` specifies the pattern for the overall output
root ::= (
# it must start with the characters "1. " followed by a sequence
# of characters that match the `move` rule, followed by a space, followed
# by another move, and then a newline
"1. " move " " move "\n"
# it's followed by one or more subsequent moves, numbered with one or two digits
([1-9] [0-9]? ". " move " " move "\n")+
)
# `move` is an abstract representation, which can be a pawn, nonpawn, or castle.
# The `[+#]?` denotes the possibility of checking or mate signs after moves
move ::= (pawn | nonpawn | castle) [+#]?
pawn ::= ...
nonpawn ::= ...
castle ::= ...
Non-Terminals and Terminals
Non-terminal symbols (rule names) stand for a pattern of terminals and other non-terminals. They are required to be a dashed lowercase word, like move
, castle
, or check-mate
.
Terminals are actual characters (code points). They can be specified as a sequence like "1"
or "O-O"
or as ranges like [1-9]
or [NBKQR]
.
Characters and character ranges
Terminals support the full range of Unicode. Unicode characters can be specified directly in the grammar, for example hiragana ::= [ぁ-ゟ]
, or with escapes: 8-bit (\xXX
), 16-bit (\uXXXX
) or 32-bit (\UXXXXXXXX
).
Character ranges can be negated with ^
:
single-line ::= [^\n]+ "\n"`
Sequences and Alternatives
The order of symbols in a sequence matters. For example, in "1. " move " " move "\n"
, the "1. "
must come before the first move
, etc.
Alternatives, denoted by |
, give different sequences that are acceptable. For example, in move ::= pawn | nonpawn | castle
, move
can be a pawn
move, a nonpawn
move, or a castle
.
Parentheses ()
can be used to group sequences, which allows for embedding alternatives in a larger rule or applying repetition and optional symbols (below) to a sequence.
Repetition and Optional Symbols
*
after a symbol or sequence means that it can be repeated zero or more times (equivalent to{0,}
).+
denotes that the symbol or sequence should appear one or more times (equivalent to{1,}
).?
makes the preceding symbol or sequence optional (equivalent to{0,1}
).{m}
repeats the precedent symbol or sequence exactlym
times{m,}
repeats the precedent symbol or sequence at leastm
times{m,n}
repeats the precedent symbol or sequence at betweenm
andn
times (included){0,n}
repeats the precedent symbol or sequence at mostn
times (included)
Comments and newlines
Comments can be specified with #
:
# defines optional whitespace
ws ::= [ \t\n]+
Newlines are allowed between rules and between symbols or sequences nested inside parentheses. Additionally, a newline after an alternate marker |
will continue the current rule, even outside of parentheses.
The root rule
In a full grammar, the root
rule always defines the starting point of the grammar. In other words, it specifies what the entire output must match.
# a grammar for lists
root ::= ("- " item)+
item ::= [^\n]+ "\n"
Next steps
This guide provides a brief overview. Check out the GBNF files in this directory (grammars/
) for examples of full grammars. You can try them out with:
./llama-cli -m <model> --grammar-file grammars/some-grammar.gbnf -p 'Some prompt'
llama.cpp
can also convert JSON schemas to grammars either ahead of time or at each request, see below.
Troubleshooting
Grammars currently have performance gotchas (see https://github.com/ggerganov/llama.cpp/issues/4218).
Efficient optional repetitions
A common pattern is to allow repetitions of a pattern x
up to N times.
While semantically correct, the syntax x? x? x?.... x?
(with N repetitions) may result in extremely slow sampling. Instead, you can write x{0,N}
(or (x (x (x ... (x)?...)?)?)?
w/ N-deep nesting in earlier llama.cpp versions).
Using GBNF grammars
You can use GBNF grammars:
- In llama-server's completion endpoints, passed as the
grammar
body field - In llama-cli, passed as the
--grammar
&--grammar-file
flags - With llama-gbnf-validator tool, to test them against strings.
JSON Schemas → GBNF
llama.cpp
supports converting a subset of https://json-schema.org/ to GBNF grammars:
- In llama-server:
- For any completion endpoints, passed as the
json_schema
body field - For the
/chat/completions
endpoint, passed inside theresult_format
body field (e.g.{"type", "json_object", "schema": {"items": {}}}
)
- For any completion endpoints, passed as the
- In llama-cli, passed as the
--json
/-j
flag - To convert to a grammar ahead of time:
- in CLI, with examples/json_schema_to_grammar.py
- in JavaScript with json-schema-to-grammar.mjs (this is used by the server's Web UI)
Take a look at tests to see which features are likely supported (you'll also find usage examples in https://github.com/ggerganov/llama.cpp/pull/5978, https://github.com/ggerganov/llama.cpp/pull/6659 & https://github.com/ggerganov/llama.cpp/pull/6555).
Here is also a non-exhaustive list of unsupported features:
additionalProperties
: to be fixed in https://github.com/ggerganov/llama.cpp/pull/7840minimum
,exclusiveMinimum
,maximum
,exclusiveMaximum
integer
constraints to be implemented in https://github.com/ggerganov/llama.cpp/pull/7797
- Remote
$ref
s in the C++ version (Python & JavaScript versions fetch https refs) - Mixing
properties
w/anyOf
/oneOf
in the same type (https://github.com/ggerganov/llama.cpp/issues/7703) string
formatsuri
,email
contains
/minContains
uniqueItems
$anchor
(cf. dereferencing)not
- Conditionals
if
/then
/else
/dependentSchemas
patternProperties