Diego Devesa
c5b0f4b5d9
llama : refactor model loader with backend registry (#10026)
2024-10-30 02:01:23 +01:00
Ouadie EL FAROUKI
87421a23e8
[SYCL] Add SYCL Backend registry, device and Event Interfaces (#9705)
...
* implemented missing SYCL event APIs
* sycl : Added device and backend reg interfaces
* Restructured ggml-sycl.cpp
2024-10-18 06:46:16 +01:00
Diego Devesa
c83ad6d01e
ggml-backend : add device and backend reg interfaces (#9707)
...
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-10-03 01:49:47 +02:00
Akarshan Biswas
e62e9789cd
Revert "[SYCL] fallback mmvq (#9088)" (#9579)
...
This reverts commit 50addec9a5.
2024-09-23 11:28:06 +08:00
Georgi Gerganov
d13edb17ed
ggml : fix builds (#0)
...
ggml-ci
2024-09-20 21:15:05 +03:00
Johannes Gäßler
424c5d00a9
ggml/examples: add backend support for numerical optimization (ggml/949)
...
* CUDA eval works
* stochastic gradient descent op
* Adam except decay
* CUDA CROSS_ENTROPY_LOSS_BACK
* CUDA mnist-fc training works
* backend CLI arg
* refactor gguf load
* remove sched from opt_step_adam
* implement l1 regularization (weight decay)
* extra call to add optimizer
* initialize gradients with ggml_graph_reset
* gradient accumulation
* increment iter per eval instead of epoch
* adjust backend interfaces
* fix ggml_graph_reset without backend
* fix ggml graph export/import
* fixup
* rename
* revert ggml_opt changes
* more general CUDA repeat_back
* update documentation, fix CNN
* validation split
* add clarifying comment
* optimize PyTorch training
* adjust buffer size, thread count
* fix 0.0f validation split
* Update examples/mnist/mnist-common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix gradient accumulation
* tensor flag for accumulators -> tensor hash set
* Update include/ggml.h
Co-authored-by: slaren <slarengh@gmail.com>
* Update tests/test-backend-ops.cpp
Co-authored-by: slaren <slarengh@gmail.com>
* Update tests/test-backend-ops.cpp
Co-authored-by: slaren <slarengh@gmail.com>
* fix test prints
* Update src/ggml-backend.c
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* better CUDA support for noncontiguous out_prod
* add comment
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2024-09-20 21:15:05 +03:00
Georgi Gerganov
d6a04f872d
ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408)
...
* ggml : hide ggml_object, ggml_cgraph, ggml_hash_set
ggml-ci
* ggml : add ggml-impl.h to backends
* ggml : fix compiler warnings
ggml-ci
* ggml : add assert upon adding nodes
2024-09-12 14:23:49 +03:00
Alberto Cabrera Pérez
51b6038636
sycl : update support conditions (#9394)
...
* sycl : update support condition to im2col
Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
* Added TODO to remind supporting FP32 im2col
---------
Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
2024-09-11 08:53:42 +08:00
Neo Zhang Jianyu
2a358fb0c4
[SYCL] add check malloc result on device (#9346)
...
* add check malloc result on device
* update for review comments, check all malloc_device() result
---------
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-09-08 19:05:29 +08:00
luoyu-intel
1731d4238f
[SYCL] Add oneDNN primitive support (#9091)
...
* add onednn
* add sycl_f16
* add dnnl stream
* add engine map
* use dnnl for intel only
* use fp16fp16fp16
* update doc
2024-08-22 12:50:10 +08:00
Meng, Hengyu
50addec9a5
[SYCL] fallback mmvq (#9088)
...
* fallback mmvq to mul_mat
* mmvq in cuda path
* Update ggml/src/ggml-sycl.cpp
Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@codeplay.com>
---------
Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@codeplay.com>
2024-08-20 23:50:17 +08:00
zhentaoyu
4f8d19ff17
[SYCL] Fix SYCL im2col and convert Overflow with Large Dims (#9052)
...
* sycl: fix im2col overflow and sync with cuda
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* sycl: fix convert overflow
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* sycl: fix convert and dequantize
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* sycl: fix ib in dmmv
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* sycl:refine convert
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* sycl: move downsample global_range into common
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* test: add im2col and convert test cases
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* test: make new cases only in sycl
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* test: comment new test_cases for only local testing
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
---------
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
2024-08-20 23:06:51 +08:00
zhentaoyu
c887d8b017
[SYCL] Add TIMESTEP_EMBEDDING OP (#8707)
...
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
2024-07-30 14:56:51 +08:00
Meng, Hengyu
0832de7236
[SYCL] add conv support (#8688)
2024-07-29 10:50:27 +08:00
slaren
2b1f616b20
ggml : reduce hash table reset cost (#8698)
...
* ggml : reduce hash table reset cost
* fix unreachable code warnings after GGML_ASSERT(false)
* GGML_ASSERT(false) -> GGML_ABORT("fatal error")
* GGML_ABORT use format string
2024-07-27 04:41:55 +02:00
Meng, Hengyu
16bdfa42ac
[SYCL] add concat through dim 1/2 (#8483)
...
* add concat through dim 1/2
2024-07-15 19:32:15 +08:00
Chen Xi
b549a1bbef
[SYCL] fix the mul_mat_id ut issues (#8427)
...
* fix part of mul_mat_id
* skip the bfloat 16 sycl ut
Signed-off-by: Chen Xi <xi2chen@intel.com>
---------
Signed-off-by: Chen Xi <xi2chen@intel.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
Co-authored-by: Chen Xi <xi2chen@intel.com>
2024-07-12 08:52:04 +08:00
Alberto Cabrera Pérez
5b0b8d8cfb
sycl : Reenabled mmvq path for the SYCL Nvidia Backend (#8372)
...
* SYCL : Reenabled mmvq path for the SYCL Nvidia Backend
* Reduced verbosity of comment
2024-07-09 22:03:15 +08:00
Ouadie EL FAROUKI
1f3e1b66e2
Enabled more data types for oneMKL gemm_batch (#8236)
2024-07-05 13:23:25 +01:00
luoyu-intel
a9554e20b6
[SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266)
...
* fix group_norm ut
* split softmax
* fix softmax
* add concat support condition
* revert debug code
* move QK_WARP_SIZE to presets.hpp
2024-07-05 13:06:13 +08:00
Neo Zhang Jianyu
f09b7cb609
rm get_work_group_size() by local cache for performance (#8286)
...
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-07-05 10:32:29 +08:00
luoyu-intel
d08c20edde
[SYCL] Fix the sub group size of Intel (#8106)
...
* use warp_size macro for all sycl kernels
* fix mask of permute_sub_group_by_xor
* fix rms_norm with correct warp number
* fix rms_norm_f32/group_norm_f32
* move norm to norm.cpp file
* fix quantize bug
* fix mmvq's batch size
2024-07-02 10:16:00 +08:00
zhentaoyu
197fe6c1d7
[SYCL] Update SYCL-Rope op and Refactor (#8157)
...
* align with rope.cu and move sycl-op to a single file
2024-07-01 19:39:06 +08:00
Georgi Gerganov
f3f65429c4
llama : reorganize source code + improve CMake (#8006)
...
* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build
ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build
ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE
ggml-ci
* move public backend headers to the public include directory (#8122)
* move public backend headers to the public include directory
* nix test
* spm : fix metal header
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00