mirror of
https://github.com/ggerganov/llama.cpp.git
synced 2024-11-13 14:29:52 +00:00
438c2ca830
* implementing parallel decoding in server example * crash fixed * save dev progress * refactored sampling function * completion endpoint working * multiple client support * grammar + no stream completion * cached prompt support * chat.mjs support cached prompt + some fixes * server ui now support multiple clients * unused change reverted * fixed timings per slot * add context swap * add changes to README.md * llava multimodal integration * fixed tokens probs * add multimodal input - alfa * refactor code + remove unused comments + improved README.md * fix compilation errors with llvm * notify the user from server ui that multimodality is unavialable * some ci fixes * fix ci make build undefined ref errors * fix long prompt than ctx proposed in #3639 * fixed premature end due stop word * context shift fixed * fix llava implementation * sync README.md changes * readme change * update api like OpenAI * multimodal support enabled by default * fix make bui;d errors * fix multiple clients * fix zig build * new sampling API * latest changes of sampling API * server : coding-style normalization * server : coding-style normalization (part 2) * server : remove beam-search functionality * server : bug fix in ingest_images n_tokens is incremented internally by llama_batch_add * server : use refs + use llama_batch_clear() * server : snake case * server : minor sync * added thread safe pipeline * server : bach has to be allocated for n_parallel sequences * server : no need for atomic int - already using mutex * server : logs + minor code style * server : fix multibyte handle in partial response (#3706) * fix image load + view image in chat * make : silence stb warnings * clip : link to ggml, not to llama * server : fix switch fallthrough * server : fix crash in Debug on macOS (I have no idea why this fixes it!?) * server : refactor ctx_sampling init + n_ctx + names * server : bug fix for prompt caching * Do not save/load image_data to localStorage * editorconfig : new line in index.html * server : completion requests remember slot_id * Update readme to document multimodal in server * server : minor style * Update readme to document multimodal in server * server : hide ctx_sampling->prev behind API (#3696) * server : apply fix from #3722 * server : fix slot reuse * server : add comment about changing slot_state to bool --------- Co-authored-by: FSSRepo <go778sgt@gmail.com> Co-authored-by: Damian Stewart <d@damianstewart.com> Co-authored-by: Steward Garcia <57494570+FSSRepo@users.noreply.github.com> Co-authored-by: Jhen-Jie Hong <iainst0409@gmail.com> Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com> |
||
---|---|---|
.. | ||
clip.cpp | ||
clip.h | ||
CMakeLists.txt | ||
convert-image-encoder-to-gguf.py | ||
llava-surgery.py | ||
llava-utils.h | ||
llava.cpp | ||
README.md |
LLaVA
Currently this implementation supports llava-v1.5 variants.
The pre-converted 7b and 13b models are available.
After API is confirmed, more models will be supported / uploaded.
Usage
Build with cmake or run make llava
to build it.
After building, run: ./llava
to see the usage. For example:
./llava -m llava-v1.5-7b/ggml-model-q5_k.gguf --mmproj llava-v1.5-7b/mmproj-model-f16.gguf --image path/to/an/image.jpg
note: A lower temperature like 0.1 is recommended for better quality. add --temp 0.1
to the command to do so.
Model conversion
- Clone
llava-v15-7b`` and
clip-vit-large-patch14-336`` locally:
git clone https://huggingface.co/liuhaotian/llava-v1.5-7b
git clone https://huggingface.co/openai/clip-vit-large-patch14-336
- Use
llava-surgery.py
to split the LLaVA model to LLaMA and multimodel projector constituents:
python ./examples/llava/llava-surgery.py -m ../llava-v1.5-7b
- Use
convert-image-encoder-to-gguf.py
to convert the LLaVA image encoder to GGUF:
python ./examples/llava/convert-image-encoder-to-gguf -m ../clip-vit-large-patch14-336 --llava-projector ../llava-v1.5-7b/llava.projector --output-dir ../llava-v1.5-7b
- Use
convert.py
to convert the LLaMA part of LLaVA to GGUF:
python ./convert.py ../llava-v1.5-7b
Now both the LLaMA part and the image encoder is in the llava-v1.5-7b
directory.
TODO
- Support server mode.
- Support non-CPU backend for the image encoding part.
- Support different sampling methods.
- Support more model variants.