mirror of
https://github.com/ggerganov/llama.cpp.git
synced 2024-12-24 02:14:35 +00:00
1b78ed2081
1. Add a `LLAMA_SUPPORTS_GPU_OFFLOAD` define to `llama.h` (defined when compiled with CLBlast or cuBLAS) 2. Update the argument handling in the common example code to only show the `-ngl`, `--n-gpu-layers` option when GPU offload is possible. 3. Add an entry for the `-ngl`, `--n-gpu-layers` option to the `main` and `server` examples documentation 4. Update `main` and `server` examples documentation to use the new style dash separator argument format 5. Update the `server` example to use dash separators for its arguments and adds `-ngl` to `--help` (only shown when compiled with appropriate support). It will still support `--memory_f32` and `--ctx_size` for compatibility. 6. Add a warning discouraging use of `--memory-f32` for the `main` and `server` examples `--help` text as well as documentation. Rationale: https://github.com/ggerganov/llama.cpp/discussions/1593#discussioncomment-6004356 |
||
---|---|---|
.. | ||
baby-llama | ||
benchmark | ||
embedding | ||
jeopardy | ||
main | ||
perplexity | ||
quantize | ||
quantize-stats | ||
save-load-state | ||
server | ||
alpaca.sh | ||
chat-13B.bat | ||
chat-13B.sh | ||
chat-persistent.sh | ||
chat.sh | ||
CMakeLists.txt | ||
common.cpp | ||
common.h | ||
gpt4all.sh | ||
Miku.sh | ||
reason-act.sh |