Commit Graph

112 Commits

Author SHA1 Message Date
Georgi Gerganov
6262d13e0b
common : reimplement logging (#9418)
Some checks are pending
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full-cuda.Dockerfile platforms:linux/amd64 tag:full-cuda]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/full.Dockerfile platforms:linux/amd64,linux/arm64 tag:full]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-cuda.Dockerfile platforms:linux/amd64 tag:light-cuda]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli-intel.Dockerfile platforms:linux/amd64 tag:light-intel]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-cli.Dockerfile platforms:linux/amd64,linux/arm64 tag:light]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-cuda.Dockerfile platforms:linux/amd64 tag:server-cuda]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server-intel.Dockerfile platforms:linux/amd64 tag:server-intel]) (push) Waiting to run
Publish Docker image / Push Docker image to Docker Hub (map[dockerfile:.devops/llama-server.Dockerfile platforms:linux/amd64,linux/arm64 tag:server]) (push) Waiting to run
Nix CI / nix-eval (macos-latest) (push) Waiting to run
Nix CI / nix-eval (ubuntu-latest) (push) Waiting to run
Nix CI / nix-build (macos-latest) (push) Waiting to run
Nix CI / nix-build (ubuntu-latest) (push) Waiting to run
flake8 Lint / Lint (push) Waiting to run
Python Type-Check / pyright type-check (push) Waiting to run
https://github.com/ggerganov/llama.cpp/pull/9418
2024-09-15 20:46:12 +03:00
Huang Qi
4dc4f5f14a
ci : update HIP SDK to 24.Q3 (ROCm 6.1) (#9329) 2024-09-12 14:28:43 +03:00
Trivikram Kamat
3c26a1644d
ci : bump actions/checkout to v4 (#9377) 2024-09-12 14:27:45 +03:00
awatuna
32b2ec88bc
Update build.yml (#9184)
build rpc-server for windows cuda
2024-09-06 00:34:36 +02:00
Radoslav Gerganov
1f67436c5e
ci : enable RPC in all of the released builds (#9006)
ref: #8912
2024-08-12 19:17:03 +03:00
Johannes Gäßler
6eeaeba126
cmake: use 1 more thread for non-ggml in CI (#8740) 2024-07-28 22:32:44 +02:00
Johannes Gäßler
69c487f4ed
CUDA: MMQ code deduplication + iquant support (#8495)
* CUDA: MMQ code deduplication + iquant support

* 1 less parallel job for CI build
2024-07-20 22:25:26 +02:00
bandoti
17eb6aa8a9
vulkan : cmake integration (#8119)
* Add Vulkan to CMake pkg

* Add Sycl to CMake pkg

* Add OpenMP to CMake pkg

* Split generated shader file into separate translation unit

* Add CMake target for Vulkan shaders

* Update README.md

* Add make target for Vulkan shaders

* Use pkg-config to locate vulkan library

* Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow

* Clean up tabs

* Move sudo to apt-key invocation

* Forward GGML_EXTRA_LIBS to CMake config pkg

* Update vulkan obj file paths

* Add shaderc to nix pkg

* Add python3 to Vulkan nix build

* Link against ggml in cmake pkg

* Remove Python dependency from Vulkan build

* code review changes

* Remove trailing newline

* Add cflags from pkg-config to fix w64devkit build

* Update README.md

* Remove trailing whitespace

* Update README.md

* Remove trailing whitespace

* Fix doc heading

* Make glslc required Vulkan component

* remove clblast from nix pkg
2024-07-13 18:12:39 +02:00
Olivier Chafik
8748d8ac6f
json: attempt to skip slow tests when running under emulator (#8189) 2024-06-28 18:02:05 +01:00
loonerin
558f44bf83
CI: fix release build (Ubuntu+Mac) (#8170)
* CI: fix release build (Ubuntu)

PR #8006 changes defaults to build shared libs. However, CI for releases
expects static builds.

* CI: fix release build (Mac)

---------

Co-authored-by: loonerin <loonerin@users.noreply.github.com>
2024-06-27 21:01:23 +02:00
slaren
ae5d0f4b89
ci : publish new docker images only when the files change (#8142) 2024-06-26 21:59:28 +02:00
Georgi Gerganov
f3f65429c4
llama : reorganize source code + improve CMake (#8006)
* scripts : update sync [no ci]

* files : relocate [no ci]

* ci : disable kompute build [no ci]

* cmake : fixes [no ci]

* server : fix mingw build

ggml-ci

* cmake : minor [no ci]

* cmake : link math library [no ci]

* cmake : build normal ggml library (not object library) [no ci]

* cmake : fix kompute build

ggml-ci

* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE

ggml-ci

* move public backend headers to the public include directory (#8122)

* move public backend headers to the public include directory

* nix test

* spm : fix metal header

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* scripts : fix sync paths [no ci]

* scripts : sync ggml-blas.h [no ci]

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00
olexiyb
f8ec8877b7
ci : fix macos x86 build (#7940)
In order to use old `macos-latest` we should use `macos-12`

Potentially will fix: https://github.com/ggerganov/llama.cpp/issues/6975
2024-06-14 20:28:34 +03:00
Olivier Chafik
1c641e6aac
build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)
* `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew

* server: update refs -> llama-server

gitignore llama-server

* server: simplify nix package

* main: update refs -> llama

fix examples/main ref

* main/server: fix targets

* update more names

* Update build.yml

* rm accidentally checked in bins

* update straggling refs

* Update .gitignore

* Update server-llm.sh

* main: target name -> llama-cli

* Prefix all example bins w/ llama-

* fix main refs

* rename {main->llama}-cmake-pkg binary

* prefix more cmake targets w/ llama-

* add/fix gbnf-validator subfolder to cmake

* sort cmake example subdirs

* rm bin files

* fix llama-lookup-* Makefile rules

* gitignore /llama-*

* rename Dockerfiles

* rename llama|main -> llama-cli; consistent RPM bin prefixes

* fix some missing -cli suffixes

* rename dockerfile w/ llama-cli

* rename(make): llama-baby-llama

* update dockerfile refs

* more llama-cli(.exe)

* fix test-eval-callback

* rename: llama-cli-cmake-pkg(.exe)

* address gbnf-validator unused fread warning (switched to C++ / ifstream)

* add two missing llama- prefixes

* Updating docs for eval-callback binary to use new `llama-` prefix.

* Updating a few lingering doc references for rename of main to llama-cli

* Updating `run-with-preset.py` to use new binary names.
Updating docs around `perplexity` binary rename.

* Updating documentation references for lookup-merge and export-lora

* Updating two small `main` references missed earlier in the finetune docs.

* Update apps.nix

* update grammar/README.md w/ new llama-* names

* update llama-rpc-server bin name + doc

* Revert "update llama-rpc-server bin name + doc"

This reverts commit e474ef1df4.

* add hot topic notice to README.md

* Update README.md

* Update README.md

* rename gguf-split & quantize bins refs in **/tests.sh

---------

Co-authored-by: HanClinto <hanclinto@gmail.com>
2024-06-13 00:41:52 +01:00
slaren
c2ce6c47e4
fix CUDA CI by using a windows-2019 image (#7861)
* try to fix CUDA ci with --allow-unsupported-compiler

* trigger when build.yml changes

* another test

* try exllama/bdashore3 method

* install vs build tools before cuda toolkit

* try win-2019
2024-06-11 08:59:20 +03:00
Georgi Gerganov
554c247caf
ggml : remove OpenCL (#7735)
ggml-ci
2024-06-04 21:23:20 +03:00
Masaya, Kato
a5735e4426
ggml : use OpenMP as a thread pool (#7606)
* ggml: Added OpenMP for multi-threads processing

* ggml : Limit the number of threads used to avoid deadlock

* update shared state n_threads in parallel region

* clear numa affinity for main thread even with openmp

* enable openmp by default

* fix msvc build

* disable openmp on macos

* ci : disable openmp with thread sanitizer

* Update ggml.c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-03 17:14:15 +02:00
slaren
d359f30921
llama : remove MPI backend (#7395) 2024-05-20 01:17:03 +02:00
Georgi Gerganov
059031b8c4
ci : re-enable sanitizer runs (#7358)
* Revert "ci : temporary disable sanitizer builds (#6128)"

This reverts commit 4f6d1337ca.

* ci : trigger
2024-05-18 18:55:54 +03:00
Gavin Zhao
82ca83db3c
ROCm: use native CMake HIP support (#5966)
Supercedes #4024 and #4813.

CMake's native HIP support has become the
recommended way to add HIP code into a project (see
[here](https://rocm.docs.amd.com/en/docs-6.0.0/conceptual/cmake-packages.html#using-hip-in-cmake)).
This PR makes the following changes:

1. The environment variable `HIPCXX` or CMake option
`CMAKE_HIP_COMPILER` should be used to specify the HIP
compiler. Notably this shouldn't be `hipcc`, but ROCm's clang,
which usually resides in `$ROCM_PATH/llvm/bin/clang`. Previously
this was control by `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER`.
Note that since native CMake HIP support is not yet available on
Windows, on Windows we fall back to the old behavior.

2. CMake option `CMAKE_HIP_ARCHITECTURES` is used to control the
GPU architectures to build for. Previously this was controled by
`GPU_TARGETS`.

3. Updated the Nix recipe to account for these new changes.

4. The GPU targets to build against in the Nix recipe is now
consistent with the supported GPU targets in nixpkgs.

5. Added CI checks for HIP on both Linux and Windows. On Linux, we test
both the new and old behavior.

The most important part about this PR is the separation of the
HIP compiler and the C/C++ compiler. This allows users to choose
a different C/C++ compiler if desired, compared to the current
situation where when building for ROCm support, everything must be
compiled with ROCm's clang.

~~Makefile is unchanged. Please let me know if we want to be
consistent on variables' naming because Makefile still uses
`GPU_TARGETS` to control architectures to build for, but I feel
like setting `CMAKE_HIP_ARCHITECTURES` is a bit awkward when you're
calling `make`.~~ Makefile used `GPU_TARGETS` but the README says
to use `AMDGPU_TARGETS`. For consistency with CMake, all usage of
`GPU_TARGETS` in Makefile has been updated to `AMDGPU_TARGETS`.

Thanks to the suggestion of @jin-eld, to maintain backwards
compatibility (and not break too many downstream users' builds), if
`CMAKE_CXX_COMPILER` ends with `hipcc`, then we still compile using
the original behavior and emit a warning that recommends switching
to the new HIP support. Similarly, if `AMDGPU_TARGETS` is set but
`CMAKE_HIP_ARCHITECTURES` is not, then we forward `AMDGPU_TARGETS`
to `CMAKE_HIP_ARCHITECTURES` to ease the transition to the new
HIP support.

Signed-off-by: Gavin Zhao <git@gzgz.dev>
2024-05-17 17:03:03 +02:00
Max Krasnyansky
172b78210a
ci: fix bin/Release path for windows-arm64 builds (#7317)
Switch to Ninja Multi-Config CMake generator to resurect bin/Release path
that broke artifact packaging in CI.
2024-05-16 15:36:43 +10:00
Max Krasnyansky
13ad16af12
Add support for properly optimized Windows ARM64 builds with LLVM and MSVC (#7191)
* logging: add proper checks for clang to avoid errors and warnings with VA_ARGS

* build: add CMake Presets and toolchian files for Windows ARM64

* matmul-int8: enable matmul-int8 with MSVC and fix Clang warnings

* ci: add support for optimized Windows ARM64 builds with MSVC and LLVM

* matmul-int8: fixed typos in q8_0_q8_0 matmuls

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* matmul-int8: remove unnecessary casts in q8_0_q8_0

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-16 12:47:36 +10:00
Radoslav Gerganov
5e31828d3e
ggml : add RPC backend (#6829)
* ggml : add RPC backend

The RPC backend proxies all operations to a remote server which runs a
regular backend (CPU, CUDA, Metal, etc).

* set TCP_NODELAY

* add CI workflows

* Address review comments

* fix warning

* implement llama_max_devices() for RPC

* Address review comments

* Address review comments

* wrap sockfd into a struct

* implement get_alignment and get_max_size

* add get_device_memory

* fix warning

* win32 support

* add README

* readme : trim trailing whitespace

* Address review comments

* win32 fix

* Address review comments

* fix compile warnings on macos
2024-05-14 14:27:19 +03:00
Neo Zhang
cbf75894d2
[SYCL] Add oneapi runtime dll files to win release package (#7241)
* add oneapi running time dlls to release package

* fix path

* fix path

* fix path

* fix path

* fix path

---------

Co-authored-by: Zhang <jianyu.zhang@intel.com>
2024-05-13 08:04:29 +08:00
Neo Zhang
0d5cef78ae
[SYCL] update CI with oneapi 2024.1 (#7235)
Co-authored-by: Zhang <jianyu.zhang@intel.com>
2024-05-13 08:02:55 +08:00
Przemysław Pawełczyk
ca7f29f568
ci : add building in MSYS2 environments (Windows) (#6967) 2024-04-29 15:59:47 +03:00
loonerin
0e4802b2ec
ci: add ubuntu latest release and fix missing build number (mac & ubuntu) (#6748) 2024-04-19 19:03:35 +02:00
Georgi Gerganov
9ed2737acc
ci : disable Metal for macOS-latest-cmake-x64 (#6628) 2024-04-12 11:15:05 +03:00
Hugo Roussel
1bbdaf6ecd
ci: download artifacts to release directory (#6612)
When action download-artifact was updated to v4, the default download path changed.
This fix binaries not being uploaded to releases.
2024-04-11 19:52:21 +02:00
Pierrick Hymbert
b804b1ef77
eval-callback: Example how to use eval callback for debugging (#6576)
* gguf-debug: Example how to use ggml callback for debugging

* gguf-debug: no mutex, verify type, fix stride.

* llama: cv eval: move cb eval field in common gpt_params

* ggml_debug: use common gpt_params to pass cb eval.
Fix get tensor SIGV random.

* ggml_debug: ci: add tests

* ggml_debug: EOL in CMakeLists.txt

* ggml_debug: Remove unused param n_batch, no batching here

* ggml_debug: fix trailing spaces

* ggml_debug: fix trailing spaces

* common: fix cb_eval and user data not initialized

* ci: build revert label

* ggml_debug: add main test label

* doc: add a model: add a link to ggml-debug

* ggml-debug: add to make toolchain

* ggml-debug: tests add the main label

* ggml-debug: ci add test curl label

* common: allow the warmup to be disabled in llama_init_from_gpt_params

* ci: add curl test

* ggml-debug: better tensor type support

* gitignore : ggml-debug

* ggml-debug: printing also the sum of each tensor

* ggml-debug: remove block size

* eval-callback: renamed from ggml-debug

* eval-callback: fix make toolchain

---------

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-11 14:51:07 +02:00
Minsoo Cheong
7dda1b727e
ci: exempt master branch workflows from getting cancelled (#6486)
* ci: exempt master branch workflows from getting cancelled

* apply to bench.yml
2024-04-04 18:30:53 +02:00
Ewout ter Hoeven
c666ba26c3
build CI: Name artifacts (#6482)
Name the artifacts in the build CI, so that they get uploaded with separate names, instead of all put into the same `artifact` ZIP.

It might be possible to further simplify the packing step (in future PRs).
2024-04-04 17:08:55 +02:00
Ewout ter Hoeven
9f62c0173d
ci : update checkout, setup-python and upload-artifact to latest (#6456)
* CI: Update actions/checkout to v4

* CI: Update actions/setup-python to v5

* CI: Update actions/upload-artifact to v4
2024-04-03 21:01:13 +03:00
Neo Zhang Jianyu
a4f569e8a3
[SYCL] fix no file in win rel (#6314) 2024-03-27 09:47:06 +08:00
slaren
280345968d
cuda : rename build flag to LLAMA_CUDA (#6299) 2024-03-26 01:16:01 +01:00
Neo Zhang Jianyu
d03224ac98
Support build win release for SYCL (#6241)
* support release win

* fix value

* fix value

* fix value

* fix error

* fix error

* fix format
2024-03-24 09:44:01 +08:00
fraxy-v
92397d87a4
convert-llama2c-to-ggml : enable conversion of GQA models (#6237)
* convert-llama2c-to-ggml: enable conversion of multiqueries, #5608

* add test in build action

* Update build.yml

* Update build.yml

* Update build.yml

* gg patch
2024-03-22 20:49:06 +02:00
Minsoo Cheong
ee804f6223
ci: apply concurrency limit for github workflows (#6243) 2024-03-22 19:15:06 +02:00
Olivier Chafik
f77a8ffd3b
tests : conditional python & node json schema tests (#6207)
* json: only attempt python & node schema conversion tests if their bins are present

Tests introduced in https://github.com/ggerganov/llama.cpp/pull/5978
disabled in https://github.com/ggerganov/llama.cpp/pull/6198

* json: orange warnings when tests skipped

* json: ensure py/js schema conv tested on ubuntu-focal-make

* json: print env vars in test
2024-03-22 15:09:07 +02:00
Vaibhav Srivastav
b2075fd6a5
ci : add CURL flag for the mac builds (#6214) 2024-03-22 09:53:43 +02:00
Vaibhav Srivastav
1943c01981
ci : fix indentation error (#6195) 2024-03-21 11:30:40 +02:00
Vaibhav Srivastav
5e43ba8742
build : add mac pre-build binaries (#6182)
* Initial commit - add mac prebuilds.

* forward contribution credits for building the workflow.

* minor : remove trailing whitespaces

---------

Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-21 11:13:12 +02:00
Georgi Gerganov
4f6d1337ca
ci : temporary disable sanitizer builds (#6128) 2024-03-18 13:45:27 +02:00
Pierrick Hymbert
d01b3c4c32
common: llama_load_model_from_url using --model-url (#6098)
* common: llama_load_model_from_url with libcurl dependency

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-17 19:12:37 +01:00
Georgi Gerganov
381da2d9f0
metal : build metallib + fix embed path (#6015)
* metal : build metallib + fix embed path

ggml-ci

* metal : fix embed build + update library load logic

ggml-ci

* metal : fix embeded library build

ggml-ci

* ci : fix iOS builds to use embedded library
2024-03-14 11:55:23 +02:00
Michael Podvitskiy
3202361c5b
ggml, ci : Windows ARM runner and build fixes (#5979)
* windows arm ci

* fix `error C2078: too many initializers` with ggml_vld1q_u32 macro for MSVC ARM64

* fix `warning C4146: unary minus operator applied to unsigned type, result still unsigned`

* fix `error C2065: '__fp16': undeclared identifier`
2024-03-11 11:28:51 +02:00
Eve
6ea0f010ff
ci : add Ubuntu 22 Vulkan CI run (#5789) 2024-03-01 10:54:53 +02:00
Radosław Gryta
abbabc5e51
ggml-quants : provide ggml_vqtbl1q_u8 for 64bit compatibility (#5711)
* [ggml-quants] Provide ggml_vqtbl1q_u8 for 64bit compatibility

vqtbl1q_u8 is not part of arm v7 neon library

* [android-example] Remove abi filter after arm v7a fix

* [github-workflows] Do not skip Android armeabi-v7a build
2024-02-25 20:43:00 +02:00
Ananta Bastola
6e4e973b26
ci : add an option to fail on compile warning (#3952)
* feat(ci): add an option to fail on compile warning

* Update CMakeLists.txt

* minor : fix compile warnings

ggml-ci

* ggml : fix unreachable code warnings

ggml-ci

* ci : disable fatal warnings for windows, ios and tvos

* ggml : fix strncpy warning

* ci : disable fatal warnings for MPI build

* ci : add fatal warnings to ggml-ci

ggml-ci

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-17 23:03:14 +02:00
Abhilash Majumder
6e99f2a04f
Fix f16_sycl cpy call from Arc (#5411)
* fix f16_sycl cpy call

* rm old logic

* add fp16 build CI

* use macro

* format fix
2024-02-08 22:39:10 +05:30