| Name | Last commit message | Last commit date |
| --- | --- | --- |
| template-instances | CUDA: refactor mmq, dmmv, mmvq (#7716) | 2024-06-05 16:53:00 +02:00 |
| acc.cu | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| acc.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| arange.cu | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| arange.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| argsort.cu | CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921) | 2024-06-14 18:41:49 +02:00 |
| argsort.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| binbcast.cu | ggml : group all experts in a single ggml_mul_mat_id (#6505) | 2024-04-18 15:18:48 +02:00 |
| binbcast.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| clamp.cu | Introduction of CUDA Graphs to LLama.cpp (#6766) | 2024-05-08 22:55:49 +02:00 |
| clamp.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| common.cuh | CUDA: use MMQ instead of cuBLAS by default (#8075) | 2024-06-24 17:43:42 +02:00 |
| concat.cu | cuda : non-cont concat support (#7610) | 2024-05-29 15:38:26 +03:00 |
| concat.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| convert.cu | ggml : drop support for QK_K=64 (#7473) | 2024-05-23 10:00:21 +03:00 |
| convert.cuh | llama : add Command R Plus support (#6491) | 2024-04-09 11:16:13 +03:00 |
| cpy.cu | Introduction of CUDA Graphs to LLama.cpp (#6766) | 2024-05-08 22:55:49 +02:00 |
| cpy.cuh | Introduction of CUDA Graphs to LLama.cpp (#6766) | 2024-05-08 22:55:49 +02:00 |
| dequantize.cuh | llama : add Command R Plus support (#6491) | 2024-04-09 11:16:13 +03:00 |
| diagmask.cu | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| diagmask.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| dmmv.cu | CUDA: refactor mmq, dmmv, mmvq (#7716) | 2024-06-05 16:53:00 +02:00 |
| dmmv.cuh | sync : ggml (#6351) | 2024-03-29 17:45:46 +02:00 |
| fattn-common.cuh | CUDA: use tensor cores for MMQ (#7676) | 2024-06-10 11:45:13 +02:00 |
| fattn-tile-f16.cu | CUDA: use tensor cores for MMQ (#7676) | 2024-06-10 11:45:13 +02:00 |
| fattn-tile-f16.cuh | CUDA: faster large batch FA without tensor cores (#7314) | 2024-05-17 18:54:52 +02:00 |
| fattn-tile-f32.cu | CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681) | 2024-06-01 15:47:04 +02:00 |
| fattn-tile-f32.cuh | CUDA: faster large batch FA without tensor cores (#7314) | 2024-05-17 18:54:52 +02:00 |
| fattn-vec-f16.cuh | CUDA: use tensor cores for MMQ (#7676) | 2024-06-10 11:45:13 +02:00 |
| fattn-vec-f32.cuh | CUDA: fix broken oob check for FA vec f32 kernel (#7904) | 2024-06-12 17:41:51 +02:00 |
| fattn-wmma-f16.cuh | CUDA: use tensor cores for MMQ (#7676) | 2024-06-10 11:45:13 +02:00 |
| fattn.cu | CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681) | 2024-06-01 15:47:04 +02:00 |
| fattn.cuh | ggml : add Flash Attention (#5021) | 2024-04-30 12:16:08 +03:00 |
| getrows.cu | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| getrows.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| im2col.cu | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| im2col.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| mma.cuh | CUDA: optimize MMQ int8 tensor core performance (#8062) | 2024-06-24 12:41:23 +02:00 |
| mmq.cu | CUDA: use MMQ instead of cuBLAS by default (#8075) | 2024-06-24 17:43:42 +02:00 |
| mmq.cuh | CUDA: use MMQ instead of cuBLAS by default (#8075) | 2024-06-24 17:43:42 +02:00 |
| mmvq.cu | cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231) | 2024-06-16 20:32:49 +03:00 |
| mmvq.cuh | CUDA: use MMQ instead of cuBLAS by default (#8075) | 2024-06-24 17:43:42 +02:00 |
| norm.cu | ggml : fix YARN + add tests + add asserts (#7617) | 2024-05-29 20:17:31 +03:00 |
| norm.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| pad.cu | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| pad.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| pool2d.cu | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| pool2d.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| quantize.cu | CUDA: revise q8_1 data layout for mul_mat_q (#7824) | 2024-06-09 09:42:25 +02:00 |
| quantize.cuh | CUDA: revise q8_1 data layout for mul_mat_q (#7824) | 2024-06-09 09:42:25 +02:00 |
| rope.cu | ggml : refactor rope norm/neox (#7634) | 2024-06-05 11:29:20 +03:00 |
| rope.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| scale.cu | Introduction of CUDA Graphs to LLama.cpp (#6766) | 2024-05-08 22:55:49 +02:00 |
| scale.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| softmax.cu | CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921) | 2024-06-14 18:41:49 +02:00 |
| softmax.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| sumrows.cu | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| sumrows.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| tsembd.cu | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| tsembd.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| unary.cu | Add support for sqrt on CUDA (#7953) | 2024-06-17 00:23:04 +02:00 |
| unary.cuh | Add support for sqrt on CUDA (#7953) | 2024-06-17 00:23:04 +02:00 |
| upscale.cu | ggml : add ggml_upscale_ext (ggml/814) | 2024-05-15 13:23:33 +03:00 |
| upscale.cuh | cuda : refactor into multiple files (#6269) | 2024-03-25 13:50:23 +01:00 |
| vecdotq.cuh | CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921) | 2024-06-14 18:41:49 +02:00 |