Jared Van Bortel
208cd52f7d
vulkan : implement YaRN RoPE scaling ( #2268 )
The NeoX cur_rot part is different because I'm pretty sure my original
implementation was wrong.
2023-11-23 17:22:09 -05:00
Jared Van Bortel
9c4dfd06e8
mention skipped change
2023-11-23 17:22:05 -05:00
Jared Van Bortel
6474fc879a
vulkan : handle ggml_scale for n%8 != 0
ref ggerganov/llama.cpp#3754
2023-11-23 17:22:00 -05:00
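A minimal sketch of the general idea behind handling `n % 8 != 0`, assuming (the commit does not confirm this) that the fast path handles eight elements per step and the remaining elements need a separate tail path. This is plain Python for illustration, not the Vulkan shader; `scale_in_chunks` and its parameters are hypothetical names.

```python
def scale_in_chunks(values, factor, chunk=8):
    """Scale every element, using full chunks first and then a tail loop.

    Hypothetical illustration: `chunk` stands in for the number of
    elements a single kernel invocation would process.
    """
    n = len(values)
    out = [0.0] * n
    full = n - (n % chunk)  # elements covered by complete chunks

    # Fast path: complete chunks of `chunk` elements each.
    for start in range(0, full, chunk):
        for i in range(start, start + chunk):
            out[i] = values[i] * factor

    # Tail path: the remaining n % chunk elements, handled one by one.
    for i in range(full, n):
        out[i] = values[i] * factor

    return out
```

Without the tail loop, an input whose length is not a multiple of the chunk size would leave its last `n % chunk` elements unscaled.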
Jared Van Bortel
39abedd1d7
vulkan : optimize workgroup sizes
2023-11-23 17:18:48 -05:00
Jared Van Bortel
84f7fc4553
vulkan : rope n_past is now KQ_pos, f16 rope kernel
2023-11-23 17:18:42 -05:00
Jared Van Bortel
71565eb0c3
vulkan : replace ggml_diag_mask_inf with ggml_add (custom -inf mask)
2023-11-23 17:18:27 -05:00
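A rough sketch of what replacing a diagonal-mask operation with an addition can look like: build a mask that is `0` where attention is allowed and `-inf` elsewhere, then add it to the attention scores. This is plain Python, not ggml code; `causal_mask`, `add_mask`, and the `n_past` parameter as used here are illustrative assumptions.

```python
import math


def causal_mask(n_tokens, n_past=0):
    """Return an n_tokens x (n_past + n_tokens) additive mask.

    Query row i may attend to key positions 0..n_past+i (mask value 0.0);
    all later positions get -inf so they vanish after softmax.
    """
    n_kv = n_past + n_tokens
    return [
        [0.0 if j <= n_past + i else -math.inf for j in range(n_kv)]
        for i in range(n_tokens)
    ]


def add_mask(scores, mask):
    """Element-wise addition of scores and mask (the 'add' step)."""
    return [
        [s + m for s, m in zip(score_row, mask_row)]
        for score_row, mask_row in zip(scores, mask)
    ]
```

For example, `add_mask([[1.0, 2.0], [3.0, 4.0]], causal_mask(2))` leaves the lower triangle untouched and turns the single future position into `-inf`.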
Jared Van Bortel
c438c16896
fix build with external fmtlib (v10)
Co-authored-by: ToKiNoBug <tokinobug@163.com>
2023-11-08 16:31:29 -05:00
Jared Van Bortel
a8cac53207
kompute : fix issues with debug layers
2023-11-08 16:31:29 -05:00
Adam Treat
ffd0624be2
Remove this debug code.
2023-11-03 17:22:22 -04:00
Adam Treat
e006d377dd
Scale the workgroup count down to allow correct generation for falcon with AMD radeon cards with lower workgroup count limit
Partially fixes #1581
2023-11-03 17:22:22 -04:00
Adam Treat
74ddf0f17d
Fix synchronization problem for AMD Radeon with amdvlk driver or Windows drivers. Does not have any performance or fidelity effect on other GPU/driver combos I've tested.
FIXES: https://github.com/nomic-ai/gpt4all/issues/1507
2023-11-03 17:22:22 -04:00
Adam Treat
8d9efbf97a
Lower the workgroup count for some shaders by providing a loop that processes four floats at a time.
2023-11-03 17:22:22 -04:00
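To illustrate why a per-invocation loop lowers the workgroup count: if each invocation handles four floats instead of one, the number of dispatched groups drops by roughly a factor of four. The sketch below is plain Python under that assumption; `workgroup_count`, `run_invocation`, and `dispatch` are hypothetical names, and the "add 1.0" operation is only a placeholder for the real shader work.

```python
def workgroup_count(n_elements, per_invocation=4):
    """Ceiling division: groups needed when each handles `per_invocation` floats."""
    return (n_elements + per_invocation - 1) // per_invocation


def run_invocation(data, out, index, per_invocation=4):
    """One simulated invocation: loop over up to `per_invocation` floats."""
    start = index * per_invocation
    # Clamp the end so the last invocation does not read past the buffer.
    for i in range(start, min(start + per_invocation, len(data))):
        out[i] = data[i] + 1.0  # placeholder operation


def dispatch(data, per_invocation=4):
    """Simulate dispatching all invocations sequentially."""
    out = [0.0] * len(data)
    for index in range(workgroup_count(len(data), per_invocation)):
        run_invocation(data, out, index, per_invocation)
    return out
```

With ten elements this dispatches three groups rather than ten, which matters on devices with a low workgroup count limit.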
Adam Treat
752f7ebd61
Remove unused push constant that was giving validation errors.
2023-11-03 17:22:22 -04:00
cebtenzzre
cbc0d1af79
kompute : make scripts executable
2023-11-03 17:22:22 -04:00
cebtenzzre
21841d3163
kompute : enable kp_logger and make it static ( #8 )
2023-11-03 17:22:22 -04:00
Aaron Miller
cc05a602d6
use mat*vec shaders for mat*mat
I wrote the mat*mat shaders from scratch so I understand them better, but
they are currently not significantly faster than just invoking the mat*vec
shaders multiple times. So, except for f32, which needed a new shader,
revert to the m*v ones here.
2023-11-03 17:22:22 -04:00
Aaron Miller
c1fd64548d
attempted speedups 2
2023-11-03 17:22:22 -04:00
Aaron Miller
9bc52ebae3
attempted speedups
2023-11-03 17:22:22 -04:00
Aaron Miller
cd0257ed0d
q4_1 mat*mat
2023-11-03 17:22:22 -04:00
Aaron Miller
4809890d80
rm commented dbg print
2023-11-03 17:22:22 -04:00
Aaron Miller
b78a94bc6d
q6k mm works
2023-11-03 17:22:22 -04:00
Aaron Miller
3327d84a7f
perf: use bigger threadgroups in mm
2023-11-03 17:22:22 -04:00
Aaron Miller
46385ee0d5
misc vulkan cleanup
make push constants consistent with dispatch, avoid a double free
2023-11-03 17:22:22 -04:00
Aaron Miller
f0cd38b9ad
add mat*mat ops
2023-11-03 17:22:22 -04:00
Aaron Miller
020b1745a0
vulkan: implement neox mode for rope
2023-11-03 17:22:21 -04:00
Aaron Miller
ff4212d20f
q8 mat*vec
2023-11-03 17:22:21 -04:00
Aaron Miller
9db90cbe12
f16 mv broadcasting fix (gqa fix)
2023-11-03 17:22:21 -04:00
Adam Treat
bc4b5ed1cb
Fixes for subgroup size to bring AMD and NVIDIA in line with each other for all kernels.
2023-11-03 17:22:21 -04:00
Adam Treat
32289aa447
Fixes for norm.
2023-11-03 17:22:21 -04:00
Adam Treat
06d4b21598
Fix offset into the qh; Vulkan acceleration now works for GGUF'd llama.
2023-11-03 17:22:21 -04:00
Adam Treat
f1c9bc1821
Add q6_k getrows and mul*vec kernel.
2023-11-03 17:22:21 -04:00
Adam Treat
4b223ec432
Refactor getrows to use common code and get ready for q6_k.
2023-11-03 17:22:21 -04:00
Adam Treat
601905e75e
Move the subgroups and printf into common.
2023-11-03 17:22:21 -04:00
Adam Treat
93306f16d0
Consolidate code for mat x vec kernels and use subgroups more extensively.
2023-11-03 17:22:21 -04:00
Adam Treat
77135a3bf5
Add common boilerplate code via include and eliminate copy-paste
2023-11-03 17:22:21 -04:00
Cebtenzzre
6b6c73a9e3
kompute : don't fail build because of -Warray-bounds
There are some warnings in debug builds that are likely to be false
positives.
2023-11-03 17:22:21 -04:00
Adam Treat
2c24d67e7b
Don't crash on available devices if we can't even create an instance.
2023-10-05 13:39:18 -04:00
Adam Treat
bd5f6399bb
Don't try and install kompute artifacts.
2023-10-05 13:39:18 -04:00
Aaron Miller
beee57266f
Make kompute actually include external SDK headers when requested
2023-10-05 13:39:18 -04:00
Adam Treat
b7e2e691d4
Completely revamp how we do object management with the vulkan backend and stop using so many static objects so we can tear down and bring up vulkan on new devices in the same runtime.
2023-10-05 13:39:18 -04:00
Adam Treat
45c8778b49
Switch to a dynamic dispatch table instead of linking hard against libvulkan.
2023-10-05 13:39:18 -04:00
Aaron Miller
8563fa001f
remove dynamic deps from kompute build
should no longer have new external deps other than libvulkan
```
ubuntu@ip-172-31-1-24:~/repo/gpt4all/gpt4all-backend/build$ ldd ./libllamamodel-mainline-avxonly.so
linux-vdso.so.1 (0x00007ffcb53bb000)
libvulkan.so.1 => /lib/x86_64-linux-gnu/libvulkan.so.1 (0x00007f239dab5000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f239d800000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f239d719000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f239da95000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f239d400000)
/lib64/ld-linux-x86-64.so.2 (0x00007f239dd1d000)
```
2023-10-05 13:39:18 -04:00
niansa
ba15dfd0be
Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0.
2023-10-05 13:39:18 -04:00