cebtenzzre
21841d3163
kompute : enable kp_logger and make it static ( #8 )
2023-11-03 17:22:22 -04:00
Aaron Miller
cc05a602d6
use mat*vec shaders for mat*mat
...
I wrote the mat*mat shaders from scratch so I understand them better, but
they are currently not faster, by any significant degree, than just invoking
the mat*vec shaders multiple times. So, except for f32, which needed a new
shader, revert to the m*v ones here.
2023-11-03 17:22:22 -04:00
Aaron Miller
c1fd64548d
attempted speedups 2
2023-11-03 17:22:22 -04:00
Aaron Miller
9bc52ebae3
attempted speedups
2023-11-03 17:22:22 -04:00
Aaron Miller
8dc79ac380
clean up vulkan/cpu switch
2023-11-03 17:22:22 -04:00
Aaron Miller
cd0257ed0d
q4_1 mat*mat
2023-11-03 17:22:22 -04:00
Aaron Miller
4809890d80
rm commented dbg print
2023-11-03 17:22:22 -04:00
Aaron Miller
b78a94bc6d
q6k mm works
2023-11-03 17:22:22 -04:00
Aaron Miller
d5741c07a5
use op param epsilon for norms
2023-11-03 17:22:22 -04:00
Aaron Miller
3327d84a7f
perf: use bigger threadgroups in mm
2023-11-03 17:22:22 -04:00
Aaron Miller
46385ee0d5
misc vulkan cleanup
...
make pushconts consistent w/ dispatch, avoid a double free
2023-11-03 17:22:22 -04:00
Aaron Miller
f0cd38b9ad
add mat*mat ops
2023-11-03 17:22:22 -04:00
Adam Treat
09d83f0401
Delete TODO now that we have q8_0.
2023-11-03 17:22:22 -04:00
Aaron Miller
8564f79036
falcon h2d + reenable vulkan
2023-11-03 17:22:22 -04:00
Aaron Miller
020b1745a0
vulkan: implement neox mode for rope
2023-11-03 17:22:21 -04:00
Aaron Miller
ff4212d20f
q8 mat*vec
2023-11-03 17:22:21 -04:00
Aaron Miller
9db90cbe12
f16 mv broadcasting fix (gqa fix)
2023-11-03 17:22:21 -04:00
Cebtenzzre
3d850db767
kompute : remove Q6_K from list of supported quant types
2023-11-03 17:22:21 -04:00
Cebtenzzre
24a4a5956a
kompute : only try to use Vulkan for LLaMA itself
2023-11-03 17:22:21 -04:00
Adam Treat
bc4b5ed1cb
Fixes for subgroup size to bring AMD and NVIDIA in line with each other for all kernels.
2023-11-03 17:22:21 -04:00
Adam Treat
de589ced7c
Change this back to be in agreement with metal and our previous softmax kernel.
2023-11-03 17:22:21 -04:00
Adam Treat
6ac39752bf
Fix up the upstream CMakeLists.txt so we can build just llama.cpp with our branch.
2023-11-03 17:22:21 -04:00
Adam Treat
32289aa447
Fixes for norm.
2023-11-03 17:22:21 -04:00
Adam Treat
06d4b21598
Fix offset into the qh; we now have working Vulkan acceleration for gguf'd llama.
2023-11-03 17:22:21 -04:00
Adam Treat
f1c9bc1821
Add q6_k getrows and mul*vec kernel.
2023-11-03 17:22:21 -04:00
Adam Treat
4b223ec432
Refactor getrows to use common code and get ready for q6_k.
2023-11-03 17:22:21 -04:00
Adam Treat
5509f74318
Minor cleanup.
2023-11-03 17:22:21 -04:00
Adam Treat
601905e75e
Move the subgroups and printf into common.
2023-11-03 17:22:21 -04:00
Adam Treat
93306f16d0
Consolidate code for mat x vec kernels and use subgroups more extensively.
2023-11-03 17:22:21 -04:00
Adam Treat
77135a3bf5
Add common boilerplate code via include and eliminate copy-paste
2023-11-03 17:22:21 -04:00
Adam Treat
9e4f8b4acc
Upload immediately to device.
2023-11-03 17:22:21 -04:00
Cebtenzzre
6b6c73a9e3
kompute : don't fail build because of -Warray-bounds
...
There are some warnings in debug builds that are likely to be false
positives.
2023-11-03 17:22:21 -04:00
Adam Treat
1b1416d7b7
Support for gguf.
2023-11-03 17:22:20 -04:00
Peter Sugihara
d9b33fe95b
metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion ( #3938 )
2023-11-03 21:18:18 +02:00
Xiao-Yong Jin
5ba3746171
ggml-metal: fix yarn rope ( #3937 )
2023-11-03 14:00:31 -04:00
slaren
abb77e7319
ggml-cuda : move row numbers to x grid dim in mmv kernels ( #3921 )
2023-11-03 12:13:09 +01:00
Georgi Gerganov
8f961abdc4
speculative : change default p_accept to 0.5 + CLI args ( #3919 )
...
ggml-ci
2023-11-03 09:41:56 +02:00
Georgi Gerganov
05816027d6
common : YAYF (yet another YARN fix) ( #3925 )
...
ggml-ci
2023-11-03 09:24:00 +02:00
cebtenzzre
3fdbe6b66b
llama : change yarn_ext_factor placeholder to -1 ( #3922 )
2023-11-03 08:31:58 +02:00
Kerfuffle
629f917cd6
cuda : add ROCM aliases for CUDA pool stuff ( #3918 )
2023-11-02 21:58:22 +02:00
Andrei
51b2fc11f7
cmake : fix relative path to git submodule index ( #3915 )
2023-11-02 21:40:31 +02:00
Georgi Gerganov
224e7d5b14
readme : add notice about #3912
2023-11-02 20:44:12 +02:00
Georgi Gerganov
c7743fe1c1
cuda : fix const ptrs warning causing ROCm build issues ( #3913 )
2023-11-02 20:32:11 +02:00
Oleksii Maryshchenko
d6069051de
cuda : use CUDA memory pool with async memory allocation/deallocation when available ( #3903 )
...
* Using cuda memory pools for async alloc/dealloc.
* If the CUDA device doesn't support memory pools, then use the old implementation.
* Removed redundant cublasSetStream
---------
Co-authored-by: Oleksii Maryshchenko <omaryshchenko@dtis.com>
2023-11-02 19:10:39 +02:00
Georgi Gerganov
4ff1046d75
gguf : print error for GGUFv1 files ( #3908 )
2023-11-02 16:22:30 +02:00
slaren
21958bb393
cmake : disable LLAMA_NATIVE by default ( #3906 )
2023-11-02 14:10:33 +02:00
Georgi Gerganov
2756c4fbff
gguf : remove special-case code for GGUFv1 ( #3901 )
...
ggml-ci
2023-11-02 11:20:21 +02:00
Georgi Gerganov
1efae9b7dc
llm : prevent 1-D tensors from being GPU split ( #3697 )
2023-11-02 09:54:44 +02:00
cebtenzzre
b12fa0d1c1
build : link against build info instead of compiling against it ( #3879 )
...
* cmake : fix build when .git does not exist
* cmake : simplify BUILD_INFO target
* cmake : add missing dependencies on BUILD_INFO
* build : link against build info instead of compiling against it
* zig : make build info a .cpp source instead of a header
Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>
* cmake : revert change to CMP0115
---------
Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>
2023-11-02 08:50:16 +02:00
Georgi Gerganov
4d719a6d4e
cuda : check if this fixes Pascal card regression ( #3882 )
2023-11-02 08:35:10 +02:00