963552903f | 2024-06-12 17:41:51 +02:00 | Johannes Gäßler
CUDA: fix broken oob check for FA vec f32 kernel (#7904)

e141ce624a | 2024-06-01 23:26:10 +02:00 | Johannes Gäßler
Fix FlashAttention debug test, FP32 assert (#7684)

750f60c03e | 2024-06-01 15:47:04 +02:00 | Johannes Gäßler
CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681)

9b596417af | 2024-06-01 08:44:14 +02:00 | Johannes Gäßler
CUDA: quantized KV support for FA vec (#7527)
  * CUDA: quantized KV support for FA vec
  * try CI fix
  * fix commented-out kernel variants
  * add q8_0 q4_0 tests
  * fix nwarps > batch size
  * split fattn compile via extern templates
  * fix flake8
  * fix metal tests
  * fix cmake
  * make generate_cu_files.py executable
  * add autogenerated .cu files
  * fix AMD
  * error if type_v != FP16 and not flash_attn
  * remove obsolete code

dc685be466 | 2024-05-12 19:40:45 +02:00 | Johannes Gäßler
CUDA: add FP32 FlashAttention vector kernel (#7188)
  * CUDA: add FP32 FlashAttention vector kernel
  * fixup! CUDA: add FP32 FlashAttention vector kernel
  * fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
  * fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel