llama.cpp/ggml/src/ggml-sycl
HimariO ba1cb19cdd
llama : add Qwen2VL support + multimodal RoPE (#10361)
* Barebones Qwen2VL LLM converter

* Add Qwen2VL cli entrypoint

* [WIP] add qwen2vl arch

* Verify m-rope output

* Add vl-rope/2d-rope support for qwen2vl ViT

* update qwen2vl cli tool

* update 5D tensor op workaround

* [WIP] qwen2vl vision model

* make batch and clip utils compatible with qwen2vl

* [WIP] create inference workflow, gguf convert script bug fix

* correcting vision-rope behavior, add the missing last layer back to ViT

* add arg parser to qwen2vl_surgery

* replace variable size array with vector

* cuda-gdb cmake preset

* add fp32 mrope, vision rope kernel

* add fp16 support for qwen2vl and m-rope

* add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION`

* fix rope op mode switching, outdated func args

* update `llama_hparams`

* update to keep up with upstream changes

* resolve linter, test errors

* add makefile entry, update special image padding token

* add mrope unit test, fix few compiler warnings

* rename `mrope` related function, params

* minor updates on debug util, bug fixes

* add `m-rope` testcase to `test-backend-ops`

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* fix trailing whitespace

* store `llama_hparams.rope_sections` with fixed size array

* update position id tensor size check in GGML_OP_ROPE

* minor updates

* update `ggml_backend_*_supports_op` of unsupported backends

* remove old `rope_section` compare operator

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-14 14:43:46 +02:00
dpct SYCL: Reduce most of the compiler warnings (#10748) 2024-12-13 12:12:15 +05:30
backend.hpp Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133) 2024-11-07 15:19:10 +08:00
CMakeLists.txt SYCL : Move to compile time oneMKL interface backend selection for NVIDIA backend (#10584) 2024-12-04 09:29:20 +08:00
common.cpp SYCL: Reduce most of the compiler warnings (#10748) 2024-12-13 12:12:15 +05:30
common.hpp SYCL: Reduce most of the compiler warnings (#10748) 2024-12-13 12:12:15 +05:30
concat.cpp SYCL: Reduce most of the compiler warnings (#10748) 2024-12-13 12:12:15 +05:30
concat.hpp [SYCL] add concat through dim 1/2 (#8483) 2024-07-15 19:32:15 +08:00
conv.cpp [SYCL] add conv support (#8688) 2024-07-29 10:50:27 +08:00
conv.hpp [SYCL] add conv support (#8688) 2024-07-29 10:50:27 +08:00
convert.cpp SYCL: Reduce most of the compiler warnings (#10748) 2024-12-13 12:12:15 +05:30
convert.hpp [SYCL] Fix SYCL im2col and convert Overflow with Large Dims (#9052) 2024-08-20 23:06:51 +08:00
dequantize.hpp Fixed dequant precision issues in Q4_1 and Q5_1 (#9711) 2024-10-03 07:50:44 +01:00
dmmv.cpp SYCL: Reduce most of the compiler warnings (#10748) 2024-12-13 12:12:15 +05:30
dmmv.hpp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
element_wise.cpp SYCL: Reduce most of the compiler warnings (#10748) 2024-12-13 12:12:15 +05:30
element_wise.hpp Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133) 2024-11-07 15:19:10 +08:00
gemm.hpp SYCL: Reduce most of the compiler warnings (#10748) 2024-12-13 12:12:15 +05:30
ggml-sycl.cpp llama : add Qwen2VL support + multimodal RoPE (#10361) 2024-12-14 14:43:46 +02:00
im2col.cpp SYCL: Reduce most of the compiler warnings (#10748) 2024-12-13 12:12:15 +05:30
im2col.hpp [SYCL] Fix SYCL im2col and convert Overflow with Large Dims (#9052) 2024-08-20 23:06:51 +08:00
mmq.cpp SYCL: Reduce most of the compiler warnings (#10748) 2024-12-13 12:12:15 +05:30
mmq.hpp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
mmvq.cpp SYCL: Reduce most of the compiler warnings (#10748) 2024-12-13 12:12:15 +05:30
mmvq.hpp llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
norm.cpp SYCL: Reduce most of the compiler warnings (#10748) 2024-12-13 12:12:15 +05:30
norm.hpp [SYCL] Fix the sub group size of Intel (#8106) 2024-07-02 10:16:00 +08:00
outprod.cpp SYCL : Move to compile time oneMKL interface backend selection for NVIDIA backend (#10584) 2024-12-04 09:29:20 +08:00
outprod.hpp Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133) 2024-11-07 15:19:10 +08:00
presets.hpp Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133) 2024-11-07 15:19:10 +08:00
rope.cpp SYCL: Reduce most of the compiler warnings (#10748) 2024-12-13 12:12:15 +05:30
rope.hpp [SYCL] Update SYCL-Rope op and Refactor (#8157) 2024-07-01 19:39:06 +08:00
softmax.cpp SYCL: Reduce most of the compiler warnings (#10748) 2024-12-13 12:12:15 +05:30
softmax.hpp [SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266) 2024-07-05 13:06:13 +08:00
tsembd.cpp SYCL: Reduce most of the compiler warnings (#10748) 2024-12-13 12:12:15 +05:30
tsembd.hpp [SYCL] Add TIMESTEP_EMBEDDING OP (#8707) 2024-07-30 14:56:51 +08:00
vecdotq.hpp sycl: Use syclcompat::dp4a (#10267) 2024-11-15 11:09:12 +08:00
wkv6.cpp SYCL: Reduce most of the compiler warnings (#10748) 2024-12-13 12:12:15 +05:30
wkv6.hpp Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133) 2024-11-07 15:19:10 +08:00