Commit Graph

19 Commits

Author SHA1 Message Date
Neo Zhang Jianyu
faf67b3de4
[SYCL]set context default value to avoid memory issue, update guide (#9476)
Some checks are pending
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full-cuda.Dockerfile platforms:linux/amd64 tag:full-cuda]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full.Dockerfile platforms:linux/amd64,linux/arm64 tag:full]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-cuda.Dockerfile platforms:linux/amd64 tag:light-cuda]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-intel.Dockerfile platforms:linux/amd64 tag:light-intel]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli.Dockerfile platforms:linux/amd64,linux/arm64 tag:light]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-cuda.Dockerfile platforms:linux/amd64 tag:server-cuda]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-intel.Dockerfile platforms:linux/amd64 tag:server-intel]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server.Dockerfile platforms:linux/amd64,linux/arm64 tag:server]) (push) Waiting to run
Nix CI / nix-eval (macos-latest) (push) Waiting to run
Nix CI / nix-eval (ubuntu-latest) (push) Waiting to run
Nix CI / nix-build (macos-latest) (push) Waiting to run
Nix CI / nix-build (ubuntu-latest) (push) Waiting to run
flake8 Lint / Lint (push) Waiting to run
* set context default to avoid memory issue, update guide

* Update docs/backend/SYCL.md

Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>

---------

Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
2024-09-18 08:30:31 +08:00
Neo Zhang Jianyu
c9c8575a1a
enhance run script to be easy to change the parameters (#9448)
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-09-12 17:44:17 +08:00
Ouadie EL FAROUKI
0478174d59
[SYCL] Updated SYCL device filtering (#8901)
* Updated device filter to depend on default_selector (fixes non-intel device issues)
* Small related update to example/sycl Readme
2024-08-07 11:25:36 +01:00
Neo Zhang
d4ff847153
[SYCL] correct cmd name (#8877) 2024-08-06 09:09:12 +08:00
Clint Herron
07a3fc0608
Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258) 2024-07-02 12:18:10 -04:00
Georgi Gerganov
f3f65429c4
llama : reorganize source code + improve CMake (#8006)
* scripts : update sync [no ci]

* files : relocate [no ci]

* ci : disable kompute build [no ci]

* cmake : fixes [no ci]

* server : fix mingw build

ggml-ci

* cmake : minor [no ci]

* cmake : link math library [no ci]

* cmake : build normal ggml library (not object library) [no ci]

* cmake : fix kompute build

ggml-ci

* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE

ggml-ci

* move public backend headers to the public include directory (#8122)

* move public backend headers to the public include directory

* nix test

* spm : fix metal header

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* scripts : fix sync paths [no ci]

* scripts : sync ggml-blas.h [no ci]

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00
luoyu-intel
de391e4c80
[SYCL] Fix windows build and inference (#8003)
* add sycl preset

* fix debug link error. fix windows crash

* update README
2024-06-20 21:19:05 +08:00
Olivier Chafik
1c641e6aac
build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)
* `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew

* server: update refs -> llama-server

gitignore llama-server

* server: simplify nix package

* main: update refs -> llama

fix examples/main ref

* main/server: fix targets

* update more names

* Update build.yml

* rm accidentally checked in bins

* update straggling refs

* Update .gitignore

* Update server-llm.sh

* main: target name -> llama-cli

* Prefix all example bins w/ llama-

* fix main refs

* rename {main->llama}-cmake-pkg binary

* prefix more cmake targets w/ llama-

* add/fix gbnf-validator subfolder to cmake

* sort cmake example subdirs

* rm bin files

* fix llama-lookup-* Makefile rules

* gitignore /llama-*

* rename Dockerfiles

* rename llama|main -> llama-cli; consistent RPM bin prefixes

* fix some missing -cli suffixes

* rename dockerfile w/ llama-cli

* rename(make): llama-baby-llama

* update dockerfile refs

* more llama-cli(.exe)

* fix test-eval-callback

* rename: llama-cli-cmake-pkg(.exe)

* address gbnf-validator unused fread warning (switched to C++ / ifstream)

* add two missing llama- prefixes

* Updating docs for eval-callback binary to use new `llama-` prefix.

* Updating a few lingering doc references for rename of main to llama-cli

* Updating `run-with-preset.py` to use new binary names.
Updating docs around `perplexity` binary rename.

* Updating documentation references for lookup-merge and export-lora

* Updating two small `main` references missed earlier in the finetune docs.

* Update apps.nix

* update grammar/README.md w/ new llama-* names

* update llama-rpc-server bin name + doc

* Revert "update llama-rpc-server bin name + doc"

This reverts commit e474ef1df4.

* add hot topic notice to README.md

* Update README.md

* Update README.md

* rename gguf-split & quantize bins refs in **/tests.sh

---------

Co-authored-by: HanClinto <hanclinto@gmail.com>
2024-06-13 00:41:52 +01:00
Neo Zhang
0df0aa8e43
add build shared lib in win release package (#7438) 2024-05-24 10:06:56 +08:00
omahs
04976db7a8
docs: fix typos (#7124)
* fix typo

* fix typos

* fix typo

* fix typos

* fix typo

* fix typos
2024-05-07 18:20:33 +03:00
Neo Zhang Jianyu
de17e3f745
fix memcpy() crash, add missed cmd in guide, fix softmax (#6622)
* disable mmap to fix memcpy crash, add missed cmd in guide, fix softmax

* refactor to disable mmap for SYCL backend

* fix compile error in other os

* refactor the solution, use host buf to fix it, instead of disable mmap

* keep to support mmap()

* use host buff to reduce malloc times

* revert to malloc/free solution, for threaad safe
2024-04-14 10:42:29 +08:00
Neo Zhang Jianyu
95ad616cdd
[SYCL] fix SYCL backend build on windows is break by LOG() error (#6290)
* fix LOG() error for SYCL, enhance erro check by CI

* rollback to bash

* add newline at end of file
2024-03-25 15:52:41 +08:00
Neo Zhang Jianyu
6c0b287748
update readme sycl for new update (#6151)
* update readme sycl for new update

* Update README-sycl.md

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>

* Update README-sycl.md

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>

* Update README-sycl.md

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>

* Update README-sycl.md

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>

* Update README-sycl.md

Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>

* Update README-sycl.md

Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>

* update by review comments

* update w64devkit link

* update for verify device id part

* Update README-sycl.md

Co-authored-by: Meng, Hengyu <airdldl@163.com>

---------

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
Co-authored-by: Meng, Hengyu <airdldl@163.com>
2024-03-20 11:21:41 +08:00
Neo Zhang Jianyu
46acb36767
fix set main gpu error (#6073) 2024-03-15 18:53:53 +08:00
Neo Zhang Jianyu
715641391d
Support multiple GPUs (split mode) on SYCL backend (#5806)
* suport multiple cards: split-mode - layer|row

* rm warning

* rebase with master, support tow new OPs, close feature for -sm=row, fix for unit test

* update news

* fix merge error

* update according to review comments
2024-03-02 19:49:30 +08:00
Neo Zhang Jianyu
af3ba5d946
[SYCL] update guide of SYCL backend (#5254)
* update guide for make installation, memory, gguf model link,  rm todo for windows build

* add vs install requirement

* update for gpu device check

* update help of llama-bench

* fix grammer issues
2024-02-02 15:53:27 +08:00
Neo Zhang Jianyu
b2b9f025e7
format license text, restore apache license by legal suggestion (#5233) 2024-01-31 18:34:46 +05:30
Neo Zhang Jianyu
01684139c3
support SYCL backend windows build (#5208)
* support SYCL backend windows build

* add windows build in CI

* add for win build CI

* correct install oneMKL

* fix install issue

* fix ci

* fix install cmd

* fix install cmd

* fix install cmd

* fix install cmd

* fix install cmd

* fix win build

* fix win build

* fix win build

* restore other CI part

* restore as base

* rm no new line

* fix no new line issue, add -j

* fix grammer issue

* allow to trigger manually, fix format issue

* fix format

* add newline

* fix format

* fix format

* fix format issuse

---------

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-01-31 08:08:07 +05:30
Abhilash Majumder
0f648573dd
ggml : add unified SYCL backend for Intel GPUs (#2690)
* first update for migration

* update init_cublas

* add debug functio, commit all help code

* step 1

* step 2

* step3 add fp16, slower 31->28

* add GGML_LIST_DEVICE function

* step 5 format device and print

* step6, enhance error check, remove CUDA macro, enhance device id to fix none-zero id issue

* support main device is non-zero

* step7 add debug for code path, rm log

* step 8, rename all macro & func from cuda by sycl

* fix error of select non-zero device, format device list

* ren ggml-sycl.hpp -> ggml-sycl.h

* clear CMAKE to rm unused lib and options

* correct queue: rm dtct:get_queue

* add print tensor function to debug

* fix error: wrong result in 658746bb26702e50f2c59c0e4ada8e9da6010481

* summary dpct definition in one header file to replace folder:dpct

* refactor device log

* mv dpct definition from folder dpct to ggml-sycl.h

* update readme, refactor build script

* fix build with sycl

* set nthread=1 when sycl, increase performance

* add run script, comment debug code

* add ls-sycl-device tool

* add ls-sycl-device, rm unused files

* rm rear space

* dos2unix

* Update README_sycl.md

* fix return type

* remove sycl version from include path

* restore rm code to fix hang issue

* add syc and link for sycl readme

* rm original sycl code before refactor

* fix code err

* add know issue for pvc hang issue

* enable SYCL_F16 support

* align pr4766

* check for sycl blas, better performance

* cleanup 1

* remove extra endif

* add build&run script, clean CMakefile, update guide by review comments

* rename macro to intel hardware

* editor config format

* format fixes

* format fixes

* editor format fix

* Remove unused headers

* skip build sycl tool for other code path

* replace tab by space

* fix blas matmul function

* fix mac build

* restore hip dependency

* fix conflict

* ren as review comments

* mv internal function to .cpp file

* export funciton print_sycl_devices(), mv class dpct definition to source file

* update CI/action for sycl code, fix CI error of repeat/dup

* fix action ID format issue

* rm unused strategy

* enable llama_f16 in ci

* fix conflict

* fix build break on MacOS, due to CI of MacOS depend on external ggml, instead of internal ggml

* fix ci cases for unsupported data type

* revert unrelated changed in cuda cmake
remove useless nommq
fix typo of GGML_USE_CLBLAS_SYCL

* revert hip cmake changes

* fix indent

* add prefix in func name

* revert no mmq

* rm cpu blas duplicate

* fix no_new_line

* fix src1->type==F16 bug.

* pass batch offset for F16 src1

* fix batch error

* fix wrong code

* revert sycl checking in test-sampling

* pass void as arguments of ggml_backend_sycl_print_sycl_devices

* remove extra blank line in test-sampling

* revert setting n_threads in sycl

* implement std::isinf for icpx with fast math.

* Update ci/run.sh

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update examples/sycl/run-llama2.sh

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update examples/sycl/run-llama2.sh

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update CMakeLists.txt

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update CMakeLists.txt

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update CMakeLists.txt

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update CMakeLists.txt

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* add copyright and MIT license declare

* update the cmd example

---------

Co-authored-by: jianyuzh <jianyu.zhang@intel.com>
Co-authored-by: luoyu-intel <yu.luo@intel.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-28 17:56:23 +02:00