
Agents / Tool Calling w/ llama.cpp

  • Install prerequisite: uv (used to simplify Python deps)

  • Run llama-server w/ jinja templates. Note that most models need a template override (the HF to GGUF conversion only retains a single chat_template, but some models only support tool calls in an alternative chat template); a sketch of what the template-fetching helper does follows the commands below.

    make -j LLAMA_CURL=1 llama-server
    
    # Nous Hermes 2 Pro Llama 3 8B
    ./llama-server --jinja -fa --verbose \
      -hfr NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF -hff Hermes-2-Pro-Llama-3-8B-Q8_0.gguf \
      --chat-template "$( python scripts/get_hf_chat_template.py NousResearch/Hermes-2-Pro-Llama-3-8B tool_use )"
    
    # Llama 3.1 8B
    ./llama-server --jinja -fa --verbose \
      -hfr lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF -hff Meta-Llama-3.1-8B-Instruct-Q5_K_M.gguf
    
    # Llama 3.1 70B
    ./llama-server --jinja -fa --verbose \
      -hfr lmstudio-community/Meta-Llama-3.1-70B-Instruct-GGUF -hff Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf
    
    # functionary-small-v3.2 (reuses the functionary-medium-v3.2 template)
    ./llama-server --jinja -fa --verbose \
      -hfr meetkai/functionary-small-v3.2-GGUF -hff functionary-small-v3.2.Q4_0.gguf \
      --chat-template "$( python scripts/get_hf_chat_template.py meetkai/functionary-medium-v3.2 )"
    
    # Llama 3.2 3B (poor adherence)
    ./llama-server --jinja -fa --verbose \
      -hfr lmstudio-community/Llama-3.2-3B-Instruct-GGUF -hff Llama-3.2-3B-Instruct-Q6_K_L.gguf \
      --chat-template "$( python scripts/get_hf_chat_template.py meta-llama/Llama-3.2-3B-Instruct )"
    
    # Llama 3.2 1B (very poor adherence)
    ./llama-server --jinja -fa --verbose \
      -hfr lmstudio-community/Llama-3.2-1B-Instruct-GGUF -hff Llama-3.2-1B-Instruct-Q4_K_M.gguf \
      --chat-template "$( python scripts/get_hf_chat_template.py meta-llama/Llama-3.2-3B-Instruct )"
    
  • Run the tools in examples/agent/tools inside a Docker container (check http://localhost:8088/docs once running; a sketch of a hypothetical tool module follows the warning below):

    docker run -p 8088:8088 -w /src -v $PWD/examples/agent:/src \
      --env BRAVE_SEARCH_API_KEY=$BRAVE_SEARCH_API_KEY \
      --rm -it ghcr.io/astral-sh/uv:python3.12-alpine \
      uv run fastify.py --port 8088 tools/
    

    Warning

    The command above gives tools (and your agent) access to the web, and read-only access to examples/agent/**. If you're concerned about unleashing a rogue agent on the web, please explore setting up proxies for your Docker container (and contribute back!)
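
    To get a feel for the tool format, here's a hypothetical tool module in the style of examples/agent/tools, assuming fastify.py exposes top-level typed Python functions as HTTP endpoints and derives each tool's JSON schema from its signature and docstring (the exact convention may differ):

    # tools/calc.py -- hypothetical tool; the name and conventions are illustrative.
    def add(a: float, b: float) -> float:
        '''Return the sum of two numbers.'''
        # fastify.py would expose this as an endpoint (e.g. POST /add) and
        # advertise the signature and docstring to the model as a JSON schema.
        return a + b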

  • Run the agent with a given goal (a sketch of the underlying tool-calling loop follows these examples):

    uv run examples/agent/run.py --tools http://localhost:8088 \
      "What is the sum of 2535 squared and 32222000403?"
    
    uv run examples/agent/run.py --tools http://localhost:8088 \
      "What is the best BBQ join in Laguna Beach?"
    
    uv run examples/agent/run.py --tools http://localhost:8088 \
      "Search for, fetch and summarize the homepage of llama.cpp"
    

TODO

  • Implement code_interpreter using whichever tools are built-in for a given model.