llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-05 00:04:36 +00:00

Author	SHA1	Message	Date
Jared Van Bortel	208cd52f7d	vulkan : implement YaRN RoPE scaling (#2268) The NeoX cur_rot part is different because I'm pretty sure my original implementation was wrong.	2023-11-23 17:22:09 -05:00
Jared Van Bortel	9c4dfd06e8	mention skipped change	2023-11-23 17:22:05 -05:00
Jared Van Bortel	6474fc879a	vulkan : handle ggml_scale for n%8 != 0 ref ggerganov/llama.cpp#3754	2023-11-23 17:22:00 -05:00
Jared Van Bortel	39abedd1d7	vulkan : optimize workgroup sizes	2023-11-23 17:18:48 -05:00
Jared Van Bortel	84f7fc4553	vulkan : rope n_past is now KQ_pos, f16 rope kernel	2023-11-23 17:18:42 -05:00
Jared Van Bortel	71565eb0c3	vulkan : replace ggml_diag_mask_inf with ggml_add (custom -inf mask)	2023-11-23 17:18:27 -05:00
Jared Van Bortel	c438c16896	fix build with external fmtlib (v10) Co-authored-by: ToKiNoBug <tokinobug@163.com>	2023-11-08 16:31:29 -05:00
Jared Van Bortel	a8cac53207	kompute : fix issues with debug layers	2023-11-08 16:31:29 -05:00
Adam Treat	ffd0624be2	Remove this debug code.	2023-11-03 17:22:22 -04:00
Adam Treat	e006d377dd	Scale the workgroup count down to allow correct generation for falcon with AMD radeon cards with lower workgroup count limit Partially fixes #1581	2023-11-03 17:22:22 -04:00
Adam Treat	74ddf0f17d	Fix synchronization problem for AMD Radeon with amdvlk driver or windows drivers. Does not have any performance or fidelity effect on other gpu/driver combos I've tested. FIXES: https://github.com/nomic-ai/gpt4all/issues/1507	2023-11-03 17:22:22 -04:00
Adam Treat	8d9efbf97a	Lower the workgroup count for some shaders by providing a loop that processes four floats at a time.	2023-11-03 17:22:22 -04:00
Adam Treat	752f7ebd61	Remove unused push constant that was giving validation errors.	2023-11-03 17:22:22 -04:00
cebtenzzre	cbc0d1af79	kompute : make scripts executable	2023-11-03 17:22:22 -04:00
cebtenzzre	21841d3163	kompute : enable kp_logger and make it static (#8)	2023-11-03 17:22:22 -04:00
Aaron Miller	cc05a602d6	use matvec shaders for matmat I wrote the matmat shaders from scratch so I understand them better but they are currently not faster than just multiply-invoking the matvec shaders, by a significant degree - so, except for f32 which needed a new shader, revert to the m*v ones here.	2023-11-03 17:22:22 -04:00
Aaron Miller	c1fd64548d	attempted speedups 2	2023-11-03 17:22:22 -04:00
Aaron Miller	9bc52ebae3	attempted speedups	2023-11-03 17:22:22 -04:00
Aaron Miller	cd0257ed0d	q4_1 mat*mat	2023-11-03 17:22:22 -04:00
Aaron Miller	4809890d80	rm commented dbg print	2023-11-03 17:22:22 -04:00
Aaron Miller	b78a94bc6d	q6k mm works	2023-11-03 17:22:22 -04:00
Aaron Miller	3327d84a7f	perf: use bigger threadgroups in mm	2023-11-03 17:22:22 -04:00
Aaron Miller	46385ee0d5	misc vulkan cleanup make pushconts consistent w/ dispatch, avoid a double free	2023-11-03 17:22:22 -04:00
Aaron Miller	f0cd38b9ad	add mat*mat ops	2023-11-03 17:22:22 -04:00
Aaron Miller	020b1745a0	vulkan: implement neox mode for rope	2023-11-03 17:22:21 -04:00
Aaron Miller	ff4212d20f	q8 mat*vec	2023-11-03 17:22:21 -04:00
Aaron Miller	9db90cbe12	f16 mv broadcasting fix (gqa fix)	2023-11-03 17:22:21 -04:00
Adam Treat	bc4b5ed1cb	Fixes for subgroup size to bring AMD and NVIDIA in line with each other for all kernels.	2023-11-03 17:22:21 -04:00
Adam Treat	32289aa447	Fixes for norm.	2023-11-03 17:22:21 -04:00
Adam Treat	06d4b21598	Fix offset into the qh and now we have working vulkan accelerated for gguff'd llama.	2023-11-03 17:22:21 -04:00
Adam Treat	f1c9bc1821	Add q6_k getrows and mul*vec kernel.	2023-11-03 17:22:21 -04:00
Adam Treat	4b223ec432	Refactor getrows to use common code and get ready for q6_k.	2023-11-03 17:22:21 -04:00
Adam Treat	601905e75e	Move the subgroups and printf into common.	2023-11-03 17:22:21 -04:00
Adam Treat	93306f16d0	Consolidate code for mat x vec kernels and use subgroups more extensively.	2023-11-03 17:22:21 -04:00
Adam Treat	77135a3bf5	Add a common boilerplate code via include and elim copy pasta	2023-11-03 17:22:21 -04:00
Cebtenzzre	6b6c73a9e3	kompute : don't fail build because of -Warray-bounds There are some warnings in debug builds that are likely to be false positives.	2023-11-03 17:22:21 -04:00
Adam Treat	2c24d67e7b	Don't crash on available devices if we can't even create an instance.	2023-10-05 13:39:18 -04:00
Adam Treat	bd5f6399bb	Don't try and install kompute artifacts.	2023-10-05 13:39:18 -04:00
Aaron Miller	beee57266f	Make kompute actually include external SDK headers when requested	2023-10-05 13:39:18 -04:00
Adam Treat	b7e2e691d4	Completely revamp how we do object management with the vulkan backend and stop using so many static objects so we can tear down and bring up vulkan on new devices in the same runtime.	2023-10-05 13:39:18 -04:00
Adam Treat	45c8778b49	Switch to a dynamic dispatch table instead of linking hard against libvulkan.	2023-10-05 13:39:18 -04:00
Aaron Miller	8563fa001f	remove dynamic deps from kompute build should no longer have new external deps other than libvulkan ``` ubuntu@ip-172-31-1-24:~/repo/gpt4all/gpt4all-backend/build$ ldd ./libllamamodel-mainline-avxonly.so linux-vdso.so.1 (0x00007ffcb53bb000) libvulkan.so.1 => /lib/x86_64-linux-gnu/libvulkan.so.1 (0x00007f239dab5000) libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f239d800000) libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f239d719000) libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f239da95000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f239d400000) /lib64/ld-linux-x86-64.so.2 (0x00007f239dd1d000) ```	2023-10-05 13:39:18 -04:00
niansa	ba15dfd0be	Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0.	2023-10-05 13:39:18 -04:00

43 Commits