Commit Graph

125 Commits

Author SHA1 Message Date
luoyu-intel
a9554e20b6
[SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266)
* fix group_norm ut

* split softmax

* fix softmax

* add concat support condition

* revert debug code

* move QK_WARP_SIZE to presets.hpp
2024-07-05 13:06:13 +08:00
Neo Zhang Jianyu
f09b7cb609
rm get_work_group_size() by local cache for performance (#8286)
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-07-05 10:32:29 +08:00
AidanBeltonS
f619024764
[SYCL] Remove unneeded semicolons (#8280) 2024-07-04 09:07:19 +08:00
Daniele
d23287f122
Define and optimize RDNA1 (#8085) 2024-07-04 01:02:58 +02:00
Judd
f8d6a23804
fix typo (#8267)
Co-authored-by: Judd <foldl@boxvest.com>
2024-07-03 14:40:16 +02:00
AidanBeltonS
fadde67135
Dequant improvements rebase (#8255)
* Single load for half2

* Store scales in local mem

* Vec load quantized values
2024-07-03 09:55:34 +08:00
Clint Herron
07a3fc0608
Removes multiple newlines at the end of files that are breaking the editorconfig step of CI. (#8258) 2024-07-02 12:18:10 -04:00
slaren
0e0590adab
cuda : update supports_op for matrix multiplication (#8245) 2024-07-02 09:39:38 +03:00
luoyu-intel
a9f3b10215
[SYCL] Fix win build conflict of math library (#8230)
* fix win build conflict of math library

* fix the condition: !(win32 & SYCL)

* revert warp_size=16
2024-07-02 12:50:07 +08:00
luoyu-intel
d08c20edde
[SYCL] Fix the sub group size of Intel (#8106)
* use warp_size macro for all sycl kernels

* fix mask of permute_sub_group_by_xor

* fix rms_norm with correct warp number

* fix rms_norm_f32/group_norm_f32

* move norm to norm.cpp file

* fix quantize bug

* fix mmvq's batch size
2024-07-02 10:16:00 +08:00
Johannes Gäßler
cb5fad4c6c
CUDA: refactor and optimize IQ MMVQ (#8215)
* CUDA: refactor and optimize IQ MMVQ

* uint -> uint32_t

* __dp4a -> ggml_cuda_dp4a

* remove MIN_CC_DP4A checks

* change default

* try CI fix
2024-07-01 20:39:06 +02:00
zhentaoyu
197fe6c1d7
[SYCL] Update SYCL-Rope op and Refactor (#8157)
* align with rope.cu and move sycl-op to a single file
2024-07-01 19:39:06 +08:00
Francis Couture-Harpin
8fbd59308b
ggml-quants : attempt to fix Arm 32-bit support 2024-06-28 22:52:57 -04:00
Francis Couture-Harpin
ec50944bf6
ggml-quants : fix build failure on Windows 2024-06-28 20:41:13 -04:00
Francis Couture-Harpin
bfd2f21fb4
bitnet : replace 1.58b with b1.58, as in the paper 2024-06-28 20:38:12 -04:00
Johannes Gäßler
85a267daaa
CUDA: fix MMQ stream-k for --split-mode row (#8167) 2024-06-27 16:26:05 +02:00
Francis Couture-Harpin
89dc3b254c
ggml-quants : use ceiling division when quantizing q1_3 2024-06-27 02:06:28 -04:00
Francis Couture-Harpin
9465ec6e12
ggml-quants : ARM NEON vec_dot for q2_2 and q1_3 2024-06-27 02:06:28 -04:00
Francis Couture-Harpin
638ad52f87
ggml-quants : cleanup Q1_3 code formatting 2024-06-27 02:06:28 -04:00
Francis Couture-Harpin
ef1e345c85
ggml-quants : Q2_2 now faster than Q4_K with AVX2 2024-06-27 02:06:28 -04:00
Francis Couture-Harpin
48b73b8498
ggml-quants : subtract 1 when back in epi8
This makes the 1.625 bpw type go faster than q4_0. Still not the fastest.
2024-06-27 02:06:28 -04:00
Francis Couture-Harpin
7ef4254a92
ggml-quants : faster 1.625 bpw AVX2 vec_dot
Not using a lookup table anymore makes it match q4_0 speed.

* gguf-py : fix formatting

* llama : remove spaces on empty line
2024-06-27 02:06:28 -04:00
Francis Couture-Harpin
bd807499f7
ggml-quants : 1.625 bpw ternary packing for BitNet 1.58b 2024-06-27 02:06:22 -04:00
slaren
31ec3993f6
ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (#8140) 2024-06-26 21:34:14 +02:00
Georgi Gerganov
f3f65429c4
llama : reorganize source code + improve CMake (#8006)
* scripts : update sync [no ci]

* files : relocate [no ci]

* ci : disable kompute build [no ci]

* cmake : fixes [no ci]

* server : fix mingw build

ggml-ci

* cmake : minor [no ci]

* cmake : link math library [no ci]

* cmake : build normal ggml library (not object library) [no ci]

* cmake : fix kompute build

ggml-ci

* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE

ggml-ci

* move public backend headers to the public include directory (#8122)

* move public backend headers to the public include directory

* nix test

* spm : fix metal header

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* scripts : fix sync paths [no ci]

* scripts : sync ggml-blas.h [no ci]

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00
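
Note on the ggml-quants entries (1.625 bpw, ceiling division): the log does not show the actual block layout, so the following is only an illustrative sketch of the arithmetic. It assumes ternary digits are packed five to a byte as a base-3 number, which may differ from what the commits do. Under that assumption, 64 weights need ceil(64 / 5) = 13 bytes, which works out to 1.625 bits per weight. The names ceil_div, pack_trits5, unpack_trits5 and bits_per_weight are hypothetical and do not come from ggml.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Ceiling division: the smallest q with q * d >= n (requires d > 0).
constexpr std::size_t ceil_div(std::size_t n, std::size_t d) {
    return (n + d - 1) / d;
}

// Hypothetical helper: pack five ternary digits (each 0, 1 or 2) into one
// byte as a base-3 number. The largest value is 3^5 - 1 = 242, so it fits.
inline std::uint8_t pack_trits5(const std::array<std::uint8_t, 5>& t) {
    unsigned v = 0;
    for (std::size_t i = 5; i-- > 0;) {
        assert(t[i] < 3);      // each digit must be ternary
        v = v * 3 + t[i];      // t[0] ends up as the least significant digit
    }
    return static_cast<std::uint8_t>(v);
}

// Inverse of pack_trits5: recover the five ternary digits from one byte.
inline std::array<std::uint8_t, 5> unpack_trits5(std::uint8_t b) {
    std::array<std::uint8_t, 5> t{};
    unsigned v = b;
    for (std::size_t i = 0; i < 5; ++i) {
        t[i] = static_cast<std::uint8_t>(v % 3);
        v /= 3;
    }
    return t;
}

// Storage cost in bits per weight for a block of `weights` values
// stored in `bytes` bytes.
constexpr double bits_per_weight(std::size_t bytes, std::size_t weights) {
    return 8.0 * static_cast<double>(bytes) / static_cast<double>(weights);
}

// 64 ternary weights at five per byte need 13 bytes (assumed block size).
static_assert(ceil_div(64, 5) == 13, "64 trits need 13 bytes");
static_assert(bits_per_weight(13, 64) == 1.625, "13 bytes / 64 weights");
```

Plain truncating division would give 64 / 5 = 12 bytes and silently drop the last four weights, which is the general kind of error that rounding up avoids.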