Jared Van Bortel
f194e1b6a6
Merge commit 'fcca0a700487999d52a525c96d6661e9f6a8703a' into nomic-vulkan
2023-11-23 17:21:59 -05:00
Jared Van Bortel
39abedd1d7
vulkan : optimize workgroup sizes
2023-11-23 17:18:48 -05:00
Jared Van Bortel
84f7fc4553
vulkan : rope n_past is now KQ_pos, f16 rope kernel
2023-11-23 17:18:42 -05:00
Jared Van Bortel
71565eb0c3
vulkan : replace ggml_diag_mask_inf with ggml_add (custom -inf mask)
2023-11-23 17:18:27 -05:00
Jared Van Bortel
af00cca08e
Merge commit 'ec893798b7a2a803466cc8f063051499ec3d96f7' into HEAD
2023-11-08 16:36:00 -05:00
Jared Van Bortel
c438c16896
fix build with external fmtlib (v10)
...
Co-authored-by: ToKiNoBug <tokinobug@163.com>
2023-11-08 16:31:29 -05:00
Jared Van Bortel
a8cac53207
kompute : fix issues with debug layers
2023-11-08 16:31:29 -05:00
cebtenzzre
f88b198885
llama : fix Vulkan whitelist (#11)
2023-11-03 17:22:22 -04:00
Adam Treat
ffd0624be2
Remove this debug code.
2023-11-03 17:22:22 -04:00
Adam Treat
a5eb001eab
Revert the prompt processing on gpu for now.
...
Fixes issues #1580 and #1581
2023-11-03 17:22:22 -04:00
Adam Treat
e006d377dd
Scale the workgroup count down to allow correct generation for Falcon on
...
AMD Radeon cards with a lower workgroup count limit
Partially fixes #1581
2023-11-03 17:22:22 -04:00
cebtenzzre
89b71278ff
llama : decide to disable Vulkan before loading tensors (#7)
2023-11-03 17:22:22 -04:00
cebtenzzre
1c17010188
vulkan : fix missing break in matmul selection (#9)
2023-11-03 17:22:22 -04:00
Adam Treat
74ddf0f17d
Fix synchronization problem for AMD Radeon with the amdvlk driver or Windows
...
drivers. Has no performance or fidelity effect on the other GPU/driver
combos I've tested.
FIXES: https://github.com/nomic-ai/gpt4all/issues/1507
2023-11-03 17:22:22 -04:00
Adam Treat
8d9efbf97a
Lower the workgroup count for some shaders by providing a loop that processes
...
four floats at a time.
2023-11-03 17:22:22 -04:00
Adam Treat
752f7ebd61
Remove unused push constant that was giving validation errors.
2023-11-03 17:22:22 -04:00
Adam Treat
8400015337
Don't try an allocation on a heap that is smaller than the size we require.
2023-11-03 17:22:22 -04:00
cebtenzzre
cbc0d1af79
kompute : make scripts executable
2023-11-03 17:22:22 -04:00
cebtenzzre
21841d3163
kompute : enable kp_logger and make it static (#8)
2023-11-03 17:22:22 -04:00
Aaron Miller
cc05a602d6
use mat*vec shaders for mat*mat
...
I wrote the mat*mat shaders from scratch so I understand them better but
they are currently not faster than just multiply-invoking the mat*vec
shaders, by a significant degree - so, except for f32 which needed a new
shader, revert to the m*v ones here.
2023-11-03 17:22:22 -04:00
Aaron Miller
c1fd64548d
attempted speedups 2
2023-11-03 17:22:22 -04:00
Aaron Miller
9bc52ebae3
attempted speedups
2023-11-03 17:22:22 -04:00
Aaron Miller
8dc79ac380
clean up vulkan/cpu switch
2023-11-03 17:22:22 -04:00
Aaron Miller
cd0257ed0d
q4_1 mat*mat
2023-11-03 17:22:22 -04:00
Aaron Miller
4809890d80
rm commented dbg print
2023-11-03 17:22:22 -04:00
Aaron Miller
b78a94bc6d
q6k mm works
2023-11-03 17:22:22 -04:00
Aaron Miller
d5741c07a5
use op param epsilon for norms
2023-11-03 17:22:22 -04:00
Aaron Miller
3327d84a7f
perf: use bigger threadgroups in mm
2023-11-03 17:22:22 -04:00
Aaron Miller
46385ee0d5
misc vulkan cleanup
...
make push constants consistent w/ dispatch, avoid a double free
2023-11-03 17:22:22 -04:00
Aaron Miller
f0cd38b9ad
add mat*mat ops
2023-11-03 17:22:22 -04:00
Adam Treat
09d83f0401
Delete TODO now that we have q8_0.
2023-11-03 17:22:22 -04:00
Aaron Miller
8564f79036
falcon h2d + reenable vulkan
2023-11-03 17:22:22 -04:00
Aaron Miller
020b1745a0
vulkan: implement neox mode for rope
2023-11-03 17:22:21 -04:00
Aaron Miller
ff4212d20f
q8 mat*vec
2023-11-03 17:22:21 -04:00
Aaron Miller
9db90cbe12
f16 mv broadcasting fix (gqa fix)
2023-11-03 17:22:21 -04:00
Cebtenzzre
3d850db767
kompute : remove Q6_K from list of supported quant types
2023-11-03 17:22:21 -04:00
Cebtenzzre
24a4a5956a
kompute : only try to use Vulkan for LLaMA itself
2023-11-03 17:22:21 -04:00
Adam Treat
bc4b5ed1cb
Fixes for subgroup size to bring AMD and NVIDIA in line with each other for all kernels.
2023-11-03 17:22:21 -04:00
Adam Treat
de589ced7c
Change this back to be in agreement with metal and our previous softmax kernel.
2023-11-03 17:22:21 -04:00
Adam Treat
6ac39752bf
Fix up the upstream CMakeLists.txt so we can build just llama.cpp with our branch.
2023-11-03 17:22:21 -04:00
Adam Treat
32289aa447
Fixes for norm.
2023-11-03 17:22:21 -04:00
Adam Treat
06d4b21598
Fix offset into qh; now we have working Vulkan acceleration for GGUF'd llama.
2023-11-03 17:22:21 -04:00
Adam Treat
f1c9bc1821
Add q6_k getrows and mul*vec kernel.
2023-11-03 17:22:21 -04:00
Adam Treat
4b223ec432
Refactor getrows to use common code and get ready for q6_k.
2023-11-03 17:22:21 -04:00
Adam Treat
5509f74318
Minor cleanup.
2023-11-03 17:22:21 -04:00
Adam Treat
601905e75e
Move the subgroups and printf into common.
2023-11-03 17:22:21 -04:00
Adam Treat
93306f16d0
Consolidate code for mat x vec kernels and use subgroups more extensively.
2023-11-03 17:22:21 -04:00
Adam Treat
77135a3bf5
Add common boilerplate code via include and eliminate copy-pasta
2023-11-03 17:22:21 -04:00
Adam Treat
9e4f8b4acc
Upload immediately to device.
2023-11-03 17:22:21 -04:00
Cebtenzzre
6b6c73a9e3
kompute : don't fail build because of -Warray-bounds
...
There are some warnings in debug builds that are likely to be false
positives.
2023-11-03 17:22:21 -04:00