Adam Treat
a5eb001eab
Revert the prompt processing on gpu for now.
Fixes issues #1580 and #1581
2023-11-03 17:22:22 -04:00
Adam Treat
e006d377dd
Scale the workgroup count down to allow correct generation for falcon with
AMD radeon cards with lower workgroup count limit
Partially fixes #1581
2023-11-03 17:22:22 -04:00
cebtenzzre
89b71278ff
llama : decide to disable Vulkan before loading tensors (#7)
2023-11-03 17:22:22 -04:00
cebtenzzre
1c17010188
vulkan : fix missing break in matmul selection (#9)
2023-11-03 17:22:22 -04:00
Adam Treat
74ddf0f17d
Fix synchronization problem for AMD Radeon with amdvlk driver or windows
drivers. Does not have any performance or fidelity effect on other gpu/driver
combos I've tested.
FIXES: https://github.com/nomic-ai/gpt4all/issues/1507
2023-11-03 17:22:22 -04:00
Adam Treat
8d9efbf97a
Lower the workgroup count for some shaders by providing a loop that processes
four floats at a time.
2023-11-03 17:22:22 -04:00
Adam Treat
752f7ebd61
Remove unused push constant that was giving validation errors.
2023-11-03 17:22:22 -04:00
Adam Treat
8400015337
Don't try an allocation on a heap that is smaller than the size we require.
2023-11-03 17:22:22 -04:00
cebtenzzre
cbc0d1af79
kompute : make scripts executable
2023-11-03 17:22:22 -04:00
cebtenzzre
21841d3163
kompute : enable kp_logger and make it static (#8)
2023-11-03 17:22:22 -04:00
Aaron Miller
cc05a602d6
use mat*vec shaders for mat*mat
I wrote the mat*mat shaders from scratch so I understand them better but
they are currently not faster than just multiply-invoking the mat*vec
shaders, by a significant degree - so, except for f32 which needed a new
shader, revert to the m*v ones here.
2023-11-03 17:22:22 -04:00
Aaron Miller
c1fd64548d
attempted speedups 2
2023-11-03 17:22:22 -04:00
Aaron Miller
9bc52ebae3
attempted speedups
2023-11-03 17:22:22 -04:00
Aaron Miller
8dc79ac380
clean up vulkan/cpu switch
2023-11-03 17:22:22 -04:00
Aaron Miller
cd0257ed0d
q4_1 mat*mat
2023-11-03 17:22:22 -04:00
Aaron Miller
4809890d80
rm commented dbg print
2023-11-03 17:22:22 -04:00
Aaron Miller
b78a94bc6d
q6k mm works
2023-11-03 17:22:22 -04:00
Aaron Miller
d5741c07a5
use op param epsilon for norms
2023-11-03 17:22:22 -04:00
Aaron Miller
3327d84a7f
perf: use bigger threadgroups in mm
2023-11-03 17:22:22 -04:00
Aaron Miller
46385ee0d5
misc vulkan cleanup
make pushconts consistent w/ dispatch, avoid a double free
2023-11-03 17:22:22 -04:00
Aaron Miller
f0cd38b9ad
add mat*mat ops
2023-11-03 17:22:22 -04:00
Adam Treat
09d83f0401
Delete TODO now that we have q8_0.
2023-11-03 17:22:22 -04:00
Aaron Miller
8564f79036
falcon h2d + reenable vulkan
2023-11-03 17:22:22 -04:00
Aaron Miller
020b1745a0
vulkan: implement neox mode for rope
2023-11-03 17:22:21 -04:00
Aaron Miller
ff4212d20f
q8 mat*vec
2023-11-03 17:22:21 -04:00
Aaron Miller
9db90cbe12
f16 mv broadcasting fix (gqa fix)
2023-11-03 17:22:21 -04:00
Cebtenzzre
3d850db767
kompute : remove Q6_K from list of supported quant types
2023-11-03 17:22:21 -04:00
Cebtenzzre
24a4a5956a
kompute : only try to use Vulkan for LLaMA itself
2023-11-03 17:22:21 -04:00
Adam Treat
bc4b5ed1cb
Fixes for subgroup size to bring AMD and NVIDIA in line with each other for all kernels.
2023-11-03 17:22:21 -04:00
Adam Treat
de589ced7c
Change this back to be in agreement with metal and our previous softmax kernel.
2023-11-03 17:22:21 -04:00
Adam Treat
6ac39752bf
Fix up the upstream CMakeLists.txt so we can build just llama.cpp with our branch.
2023-11-03 17:22:21 -04:00
Adam Treat
32289aa447
Fixes for norm.
2023-11-03 17:22:21 -04:00
Adam Treat
06d4b21598
Fix offset into the qh; Vulkan acceleration now works for gguf'd llama.
2023-11-03 17:22:21 -04:00
Adam Treat
f1c9bc1821
Add q6_k getrows and mul*vec kernel.
2023-11-03 17:22:21 -04:00
Adam Treat
4b223ec432
Refactor getrows to use common code and get ready for q6_k.
2023-11-03 17:22:21 -04:00
Adam Treat
5509f74318
Minor cleanup.
2023-11-03 17:22:21 -04:00
Adam Treat
601905e75e
Move the subgroups and printf into common.
2023-11-03 17:22:21 -04:00
Adam Treat
93306f16d0
Consolidate code for mat x vec kernels and use subgroups more extensively.
2023-11-03 17:22:21 -04:00
Adam Treat
77135a3bf5
Add common boilerplate code via include and eliminate copy-paste.
2023-11-03 17:22:21 -04:00
Adam Treat
9e4f8b4acc
Upload immediately to device.
2023-11-03 17:22:21 -04:00
Cebtenzzre
6b6c73a9e3
kompute : don't fail build because of -Warray-bounds
There are some warnings in debug builds that are likely to be false
positives.
2023-11-03 17:22:21 -04:00
Adam Treat
1b1416d7b7
Support for gguf.
2023-11-03 17:22:20 -04:00
Adam Treat
2c24d67e7b
Don't crash on available devices if we can't even create an instance.
2023-10-05 13:39:18 -04:00
Adam Treat
addac25293
Set the singleton to nullptr here.
2023-10-05 13:39:18 -04:00
Adam Treat
68aca6be08
Only use vulkan with known quants that work.
2023-10-05 13:39:18 -04:00
Adam Treat
4ed25b2f88
Sync from device back to host at the beginning of a new prompt.
2023-10-05 13:39:18 -04:00
Adam Treat
bd5f6399bb
Don't try and install kompute artifacts.
2023-10-05 13:39:18 -04:00
Aaron Miller
8bea719879
vulkan: disambiguate gpus with the same name
2023-10-05 13:39:18 -04:00
Adam Treat
68cf1df6fb
Throw an exception when allocation fails for vulkan.
2023-10-05 13:39:18 -04:00
Aaron Miller
beee57266f
Make kompute actually include external SDK headers when requested
2023-10-05 13:39:18 -04:00