llama.cpp/fattn.cuh at 013721df2bf2936ec6d6493671ec9df3f70f1c13

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-30 21:34:36 +00:00

Georgi Gerganov 013721df2b

Merge branch 'master' into gg/flash-attn

2024-03-27 10:24:09 +02:00


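#include "common.cuh"

// Entry point for the CUDA flash-attention op: reads the Q, K, V and mask
// tensors and writes the attention result into KQV.
void ggml_cuda_flash_attn_ext(
        ggml_backend_cuda_context & ctx,
        const ggml_tensor * Q, const ggml_tensor * K, const ggml_tensor * V,
        const ggml_tensor * mask, ggml_tensor * KQV);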