llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-10 10:41:47 +00:00

Author	SHA1	Message	Date
Meng, Hengyu	50addec9a5	[SYCL] fallback mmvq (#9088) * fallback mmvq to mul_mat * mmvq in cuda path * Update ggml/src/ggml-sycl.cpp Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@codeplay.com> --------- Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@codeplay.com>	2024-08-20 23:50:17 +08:00
zhentaoyu	4f8d19ff17	[SYCL] Fix SYCL `im2col` and `convert` Overflow with Large Dims (#9052) * sycl: fix im2col overflow and sync with cuda Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * sycl: fix convert overflow Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * sycl: fix convert and dequantize Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * sycl: fix ib in dmmv Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * sycl:refine convert Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * sycl: move downsample global_range into common Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * test: add im2col and convert test cases Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * test: make new cases only in sycl Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * test: comment new test_cases for only local testing Signed-off-by: zhentaoyu <zhentao.yu@intel.com> --------- Signed-off-by: zhentaoyu <zhentao.yu@intel.com>	2024-08-20 23:06:51 +08:00
slaren	2b1f616b20	ggml : reduce hash table reset cost (#8698) * ggml : reduce hash table reset cost * fix unreachable code warnings after GGML_ASSERT(false) * GGML_ASSERT(false) -> GGML_ABORT("fatal error") * GGML_ABORT use format string	2024-07-27 04:41:55 +02:00
Chen Xi	ed67bcb24f	[SYCL] fix multi-gpu issue on sycl (#8554) --------- Signed-off-by: Chen Xi <xi2chen@intel.com> Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>	2024-07-25 19:45:18 +08:00
AidanBeltonS	f4444d992c	[SYCL] Use multi_ptr to clean up deprecated warnings (#8256)	2024-07-10 16:10:49 +01:00
Neo Zhang Jianyu	f09b7cb609	rm get_work_group_size() by local cache for performance (#8286) Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2024-07-05 10:32:29 +08:00
AidanBeltonS	fadde67135	Dequant improvements rebase (#8255) * Single load for half2 * Store scales in local mem * Vec load quantized values	2024-07-03 09:55:34 +08:00
luoyu-intel	d08c20edde	[SYCL] Fix the sub group size of Intel (#8106) * use warp_size macro for all sycl kernels * fix mask of permute_sub_group_by_xor * fix rms_norm with correct warp number * fix rms_norm_f32/group_norm_f32 * move norm to norm.cpp file * fix quantize bug * fix mmvq's batch size	2024-07-02 10:16:00 +08:00
Georgi Gerganov	f3f65429c4	llama : reorganize source code + improve CMake (#8006) * scripts : update sync [no ci] * files : relocate [no ci] * ci : disable kompute build [no ci] * cmake : fixes [no ci] * server : fix mingw build ggml-ci * cmake : minor [no ci] * cmake : link math library [no ci] * cmake : build normal ggml library (not object library) [no ci] * cmake : fix kompute build ggml-ci * make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE ggml-ci * move public backend headers to the public include directory (#8122) * move public backend headers to the public include directory * nix test * spm : fix metal header --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * scripts : fix sync paths [no ci] * scripts : sync ggml-blas.h [no ci] --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-26 18:33:02 +03:00

9 Commits