llama.cpp/ggml/src
HimariO ba1cb19cdd
llama : add Qwen2VL support + multimodal RoPE (#10361)
* Barebones Qwen2VL LLM converter

* Add Qwen2VL cli entrypoint

* [WIP] add qwen2vl arch

* Verify m-rope output

* Add vl-rope/2d-rope support for qwen2vl ViT

* update qwen2vl cli tool

* update 5D tensor op workaround

* [WIP] qwen2vl vision model

* make batch and clip utils compatible with qwen2vl

* [WIP] create inference workflow and gguf convert script; fixes still needed

* correct vision-rope behavior, add the missing last layer back to ViT

* add arg parser to qwen2vl_surgery

* replace variable size array with vector

* cuda-gdb cmake preset

* add fp32 mrope, vision rope kernel

* add fp16 support for qwen2vl and m-rope

* add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION` (see the rope-mode sketch after the commit list)

* fix rope op mode switching, outdated func args

* update `llama_hparams`

* update to keep up with upstream changes

* resolve linter, test errors

* add makefile entry, update special image padding token

* add mrope unit test, fix a few compiler warnings

* rename `mrope`-related functions and params

* minor updates to debug util, bug fixes

* add `m-rope` testcase to `test-backend-ops`

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* fix trailing whitespace

* store `llama_hparams.rope_sections` in a fixed-size array (see the hparams sketch after the commit list)

* update position id tensor size check in GGML_OP_ROPE (see the position-check sketch after the commit list)

* minor updates

* update `ggml_backend_*_supports_op` of unsupported backends (see the supports_op sketch after the commit list)

* remove old `rope_section` compare operator

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-14 14:43:46 +02:00
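
For orientation, here is a sketch of how the two new rope modes relate and how backend code recovers the mode from a graph node. The constant values mirror `ggml.h` as of this commit; the helper function is illustrative, not verbatim PR code.

```cpp
#include <stdint.h>
#include "ggml.h"

// Rope mode flags as defined in ggml.h at this commit. Note that
// GGML_ROPE_TYPE_VISION (24) also sets the GGML_ROPE_TYPE_MROPE bit (8),
// so vision rope is a specialization of multi-section rope:
//   GGML_ROPE_TYPE_NEOX   =  2
//   GGML_ROPE_TYPE_MROPE  =  8
//   GGML_ROPE_TYPE_VISION = 24

// For a GGML_OP_ROPE node the mode is stored at op_params[2]; this is the
// lookup the backend kernels and supports_op checks use to dispatch.
static inline int32_t rope_mode(const ggml_tensor * node) {
    return ((const int32_t *) node->op_params)[2];
}
```
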
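Two of the commits above touch the same data path: `llama_hparams` now keeps the section split in a fixed-size array, which matches the `int sections[4]` parameter of the new `ggml_rope_multi` API and lets the built-in `std::array` equality replace the removed compare operator. A minimal sketch, assuming the `ggml_rope_multi` signature added by this PR; the section values and hyperparameters below are illustrative, not model-exact.

```cpp
#include <array>
#include "ggml.h"

// hparams excerpt (illustrative, not the full struct): a 4-way split of the
// rotary dimensions into temporal/height/width/extra sections, fixed-size so
// std::array's operator== covers it without a custom comparator.
struct hparams_excerpt {
    std::array<int, 4> rope_sections = {16, 24, 24, 0}; // illustrative values
};

// Building an M-RoPE graph node from those sections. `cur` is a query or key
// tensor [n_embd_head, n_head, n_tokens]; `pos` holds 4*n_tokens int32
// position ids (see the position-check sketch below for the layout).
static ggml_tensor * build_mrope(ggml_context * ctx, ggml_tensor * cur,
                                 ggml_tensor * pos, hparams_excerpt & hp) {
    return ggml_rope_multi(
        ctx, cur, pos, /*freq_factors*/ nullptr,
        /*n_dims*/ 128, hp.rope_sections.data(), GGML_ROPE_TYPE_MROPE,
        /*n_ctx_orig*/ 32768,
        /*freq_base*/ 1000000.0f, /*freq_scale*/ 1.0f,
        /*ext_factor*/ 0.0f, /*attn_factor*/ 1.0f,
        /*beta_fast*/ 32.0f, /*beta_slow*/ 1.0f);
}
```
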
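The updated size check in GGML_OP_ROPE encodes the M-RoPE position layout: four position ids per token rather than one. A paraphrase of the invariant, not verbatim ggml.c:

```cpp
#include "ggml.h"

// Paraphrased invariant behind the updated GGML_OP_ROPE position check:
// standard rope takes one position id per token, while M-RoPE (and vision
// rope, which sets the mrope bit) takes four per token, stored
// section-major as [t0..tN, h0..hN, w0..wN, e0..eN].
static void check_rope_pos(const ggml_tensor * a,  // activations [.., .., n_tokens, ..]
                           const ggml_tensor * b,  // int32 position ids
                           int mode) {
    if (mode & GGML_ROPE_TYPE_MROPE) {
        GGML_ASSERT(b->ne[0] == 4*a->ne[2] && "mrope expects 4 position ids per token");
    } else {
        GGML_ASSERT(b->ne[0] ==   a->ne[2]);
    }
}
```
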
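Finally, backends that did not gain an M-RoPE kernel decline the op instead of silently miscomputing it. A sketch of the supports_op gating pattern; each backend's actual function differs in detail:

```cpp
#include "ggml.h"

// Sketch of the gating added to ggml_backend_*_supports_op for backends
// without a multi-section rope kernel: recover the rope mode from
// op_params and report the op as unsupported, so the scheduler can fall
// back to a backend (e.g. CPU) that implements it.
static bool example_supports_op(const ggml_tensor * op) {
    switch (op->op) {
        case GGML_OP_ROPE: {
            const int mode = ((const int32_t *) op->op_params)[2];
            return mode != GGML_ROPE_TYPE_MROPE && mode != GGML_ROPE_TYPE_VISION;
        }
        default:
            return true; // everything else elided for this sketch
    }
}
```
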
ggml-blas ggml : add support for dynamic loading of backends (#10469) 2024-11-25 15:13:39 +01:00
ggml-cann llama : add Qwen2VL support + multimodal RoPE (#10361) 2024-12-14 14:43:46 +02:00
ggml-cpu llama : add Qwen2VL support + multimodal RoPE (#10361) 2024-12-14 14:43:46 +02:00
ggml-cuda llama : add Qwen2VL support + multimodal RoPE (#10361) 2024-12-14 14:43:46 +02:00
ggml-hip ggml : add support for dynamic loading of backends (#10469) 2024-11-25 15:13:39 +01:00
ggml-kompute llama : add Qwen2VL support + multimodal RoPE (#10361) 2024-12-14 14:43:46 +02:00
ggml-metal llama : add Qwen2VL support + multimodal RoPE (#10361) 2024-12-14 14:43:46 +02:00
ggml-musa mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make (#10516) 2024-11-26 17:00:41 +01:00
ggml-opencl Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (#10693) 2024-12-13 12:23:52 -08:00
ggml-rpc ggml : add support for dynamic loading of backends (#10469) 2024-11-25 15:13:39 +01:00
ggml-sycl llama : add Qwen2VL support + multimodal RoPE (#10361) 2024-12-14 14:43:46 +02:00
ggml-vulkan llama : add Qwen2VL support + multimodal RoPE (#10361) 2024-12-14 14:43:46 +02:00
CMakeLists.txt Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (#10693) 2024-12-13 12:23:52 -08:00
ggml-alloc.c ggml: new optimization interface (ggml/988) 2024-11-17 08:30:29 +02:00
ggml-backend-impl.h ggml : automatic selection of best CPU backend (#10606) 2024-12-01 16:12:41 +01:00
ggml-backend-reg.cpp Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (#10693) 2024-12-13 12:23:52 -08:00
ggml-backend.cpp ggml : move AMX to the CPU backend (#10570) 2024-11-29 21:54:58 +01:00
ggml-common.h CUDA: rename macros to avoid conflicts with WinAPI (#10736) 2024-12-10 18:23:24 +01:00
ggml-impl.h remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797) 2024-12-12 19:02:49 +01:00
ggml-opt.cpp ggml-opt: fix data corruption (ggml/1022) 2024-11-21 09:22:02 +02:00
ggml-quants.c ggml : refactor online repacking (#10446) 2024-12-07 14:37:50 +02:00
ggml-quants.h ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-threading.cpp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-threading.h remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797) 2024-12-12 19:02:49 +01:00
ggml.c llama : add Qwen2VL support + multimodal RoPE (#10361) 2024-12-14 14:43:46 +02:00