llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-26 19:34:35 +00:00

Author	SHA1	Message	Date
compilade	3fd62a6b1c	py : type-check all Python scripts with Pyright (#8341 ) * py : type-check all Python scripts with Pyright * server-tests : use trailing slash in openai base_url * server-tests : add more type annotations * server-tests : strip "chat" from base_url in oai_chat_completions * server-tests : model metadata is a dict * ci : disable pip cache in type-check workflow The cache is not shared between branches, and it's 250MB in size, so it would become quite a big part of the 10GB cache limit of the repo. * py : fix new type errors from master branch * tests : fix test-tokenizer-random.py Apparently, gcc applies optimisations even when pre-processing, which confuses pycparser. * ci : only show warnings and errors in python type-check The "information" level otherwise has entries from 'examples/pydantic_models_to_grammar.py', which could be confusing for someone trying to figure out what failed, considering that these messages can safely be ignored even though they look like errors.	2024-07-07 15:04:39 -04:00
Michael Francis	3840b6f593	nix : enable curl (#8043 ) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-07-01 14:47:04 +03:00
Georgi Gerganov	257f8e41e2	nix : remove OpenCL remnants (#8235 ) * nix : remove OpenCL remnants * minor : remove parentheses	2024-07-01 14:46:18 +03:00
Georgi Gerganov	0e814dfc42	devops : remove clblast + LLAMA_CUDA -> GGML_CUDA (#8139 ) ggml-ci	2024-06-26 19:32:07 +03:00
Georgi Gerganov	f3f65429c4	llama : reorganize source code + improve CMake (#8006 ) * scripts : update sync [no ci] * files : relocate [no ci] * ci : disable kompute build [no ci] * cmake : fixes [no ci] * server : fix mingw build ggml-ci * cmake : minor [no ci] * cmake : link math library [no ci] * cmake : build normal ggml library (not object library) [no ci] * cmake : fix kompute build ggml-ci * make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE ggml-ci * move public backend headers to the public include directory (#8122) * move public backend headers to the public include directory * nix test * spm : fix metal header --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * scripts : fix sync paths [no ci] * scripts : sync ggml-blas.h [no ci] --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-26 18:33:02 +03:00
joecryptotoo	925c30956d	Add healthchecks to llama-server containers (#8081 ) * added healthcheck * added healthcheck * added healthcheck * added healthcheck * added healthcheck * moved curl to base * moved curl to base	2024-06-25 17:13:27 +02:00
Olivier Chafik	1c641e6aac	`build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809 ) * `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew * server: update refs -> llama-server gitignore llama-server * server: simplify nix package * main: update refs -> llama fix examples/main ref * main/server: fix targets * update more names * Update build.yml * rm accidentally checked in bins * update straggling refs * Update .gitignore * Update server-llm.sh * main: target name -> llama-cli * Prefix all example bins w/ llama- * fix main refs * rename {main->llama}-cmake-pkg binary * prefix more cmake targets w/ llama- * add/fix gbnf-validator subfolder to cmake * sort cmake example subdirs * rm bin files * fix llama-lookup-* Makefile rules * gitignore /llama-* * rename Dockerfiles * rename llama\|main -> llama-cli; consistent RPM bin prefixes * fix some missing -cli suffixes * rename dockerfile w/ llama-cli * rename(make): llama-baby-llama * update dockerfile refs * more llama-cli(.exe) * fix test-eval-callback * rename: llama-cli-cmake-pkg(.exe) * address gbnf-validator unused fread warning (switched to C++ / ifstream) * add two missing llama- prefixes * Updating docs for eval-callback binary to use new `llama-` prefix. * Updating a few lingering doc references for rename of main to llama-cli * Updating `run-with-preset.py` to use new binary names. Updating docs around `perplexity` binary rename. * Updating documentation references for lookup-merge and export-lora * Updating two small `main` references missed earlier in the finetune docs. * Update apps.nix * update grammar/README.md w/ new llama-* names * update llama-rpc-server bin name + doc * Revert "update llama-rpc-server bin name + doc" This reverts commit `e474ef1df4`. * add hot topic notice to README.md * Update README.md * Update README.md * rename gguf-split & quantize bins refs in **/tests.sh --------- Co-authored-by: HanClinto <hanclinto@gmail.com>	2024-06-13 00:41:52 +01:00
Meng, Hengyu	dcf752707d	update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 (#7894 ) In addition this reverts a workaround we had to do to workaround the upstream issue with expired intel GPG package keys in 2024.0.1-devel-ubuntu22.04	2024-06-12 19:05:35 +10:00
slaren	2d08b7fbb4	docker : build only main and server in their images (#7782 ) * add openmp lib to dockerfiles * build only main and server in their docker images	2024-06-06 08:19:49 +03:00
slaren	d67caea0d6	docker : add openmp lib (#7780 )	2024-06-06 08:17:21 +03:00
JohnnyB	9022c33646	Fixed painfully slow single process builds. (#7326 ) * Fixed painfully slow single process builds. * Added nproc for systems that don't default to nproc	2024-05-30 22:32:38 +02:00
Galunid	9c4c9cc83f	Move convert.py to examples/convert-legacy-llama.py (#7430 ) * Move convert.py to examples/convert-no-torch.py * Fix CI, scripts, readme files * convert-no-torch -> convert-legacy-llama * Move vocab thing to vocab.py * Fix convert-no-torch -> convert-legacy-llama * Fix lost convert.py in ci/run.sh * Fix imports * Fix gguf not imported correctly * Fix flake8 complaints * Fix check-requirements.sh * Get rid of ADDED_TOKENS_FILE, FAST_TOKENIZER_FILE * Review fixes	2024-05-30 21:40:00 +10:00
Meng, Hengyu	3854c9d07f	[SYCL] fix intel docker (#7630 ) * Update main-intel.Dockerfile * workaround for https://github.com/intel/oneapi-containers/issues/70 * reset intel docker in CI * add missed in server	2024-05-30 16:19:08 +10:00
slaren	d359f30921	llama : remove MPI backend (#7395 )	2024-05-20 01:17:03 +02:00
Gavin Zhao	82ca83db3c	ROCm: use native CMake HIP support (#5966 ) Supercedes #4024 and #4813. CMake's native HIP support has become the recommended way to add HIP code into a project (see [here](https://rocm.docs.amd.com/en/docs-6.0.0/conceptual/cmake-packages.html#using-hip-in-cmake)). This PR makes the following changes: 1. The environment variable `HIPCXX` or CMake option `CMAKE_HIP_COMPILER` should be used to specify the HIP compiler. Notably this shouldn't be `hipcc`, but ROCm's clang, which usually resides in `$ROCM_PATH/llvm/bin/clang`. Previously this was control by `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER`. Note that since native CMake HIP support is not yet available on Windows, on Windows we fall back to the old behavior. 2. CMake option `CMAKE_HIP_ARCHITECTURES` is used to control the GPU architectures to build for. Previously this was controled by `GPU_TARGETS`. 3. Updated the Nix recipe to account for these new changes. 4. The GPU targets to build against in the Nix recipe is now consistent with the supported GPU targets in nixpkgs. 5. Added CI checks for HIP on both Linux and Windows. On Linux, we test both the new and old behavior. The most important part about this PR is the separation of the HIP compiler and the C/C++ compiler. This allows users to choose a different C/C++ compiler if desired, compared to the current situation where when building for ROCm support, everything must be compiled with ROCm's clang. ~~Makefile is unchanged. Please let me know if we want to be consistent on variables' naming because Makefile still uses `GPU_TARGETS` to control architectures to build for, but I feel like setting `CMAKE_HIP_ARCHITECTURES` is a bit awkward when you're calling `make`.~~ Makefile used `GPU_TARGETS` but the README says to use `AMDGPU_TARGETS`. For consistency with CMake, all usage of `GPU_TARGETS` in Makefile has been updated to `AMDGPU_TARGETS`. Thanks to the suggestion of @jin-eld, to maintain backwards compatibility (and not break too many downstream users' builds), if `CMAKE_CXX_COMPILER` ends with `hipcc`, then we still compile using the original behavior and emit a warning that recommends switching to the new HIP support. Similarly, if `AMDGPU_TARGETS` is set but `CMAKE_HIP_ARCHITECTURES` is not, then we forward `AMDGPU_TARGETS` to `CMAKE_HIP_ARCHITECTURES` to ease the transition to the new HIP support. Signed-off-by: Gavin Zhao <git@gzgz.dev>	2024-05-17 17:03:03 +02:00
Olivier Chafik	b8a7a5a90f	build(cmake): simplify instructions (`cmake -B build && cmake --build build ...`) (#6964 ) * readme: cmake . -B build && cmake --build build * build: fix typo Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * build: drop implicit . from cmake config command * build: remove another superfluous . * build: update MinGW cmake commands * Update README-sycl.md Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com> * build: reinstate --config Release as not the default w/ some generators + document how to build Debug * build: revert more --config Release * build: nit / remove -H from cmake example * build: reword debug instructions around single/multi config split --------- Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>	2024-04-29 17:02:45 +01:00
Ed Lepedus	0a1d889e27	server: add cURL support to server Dockerfiles (#6474 ) * server: add cURL support to `full.Dockerfile` * server: add cURL support to `full-cuda.Dockerfile` and `server-cuda.Dockerfile` * server: add cURL support to `full-rocm.Dockerfile` and `server-rocm.Dockerfile` * server: add cURL support to `server-intel.Dockerfile` * server: add cURL support to `server-vulkan.Dockerfile` * fix typo in `server-vulkan.Dockerfile` Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-04 18:31:22 +02:00
Ed Lepedus	5d4f12e462	server: add cURL support to `server.Dockerfile` (#6461 )	2024-04-03 19:56:37 +02:00
Mohammadreza Hendiani	c342d070c6	Fedora build update (#6388 ) * fixed deprecated address * fixed deprecated address * fixed deprecated address * Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions * Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions * Added 'Apache-2.0' SPDX license identifier due to 'kompute.cc' submodule licensing. Explanation of licensing method: https://docs.fedoraproject.org/en-US/legal/spdx/#_and_expressions * reverted back to only the MIT license	2024-03-29 22:59:56 +01:00
hutli	d2d8f38996	nix: removed unnessesary indentation	2024-03-28 07:48:27 +00:00
hutli	d39b308eaf	nix: moved blas availability check to package inputs so it is still overridable	2024-03-28 07:48:27 +00:00
hutli	c873976649	using blas.meta.available to check host platform	2024-03-28 07:48:27 +00:00
hutli	dbb03e2b9c	only using explicit blas if hostPlatform is allowed	2024-03-28 07:48:27 +00:00
Someone Serge	22a462cc1f	nix: package: don't introduce the dependency on python - The generic /usr/bin/env shebangs are good enough - Python deps are provisioned in the devShells - We need to be able to leave python out at least on windows (currently breaks eval)	2024-03-28 07:48:27 +00:00
hutli	f6a0f5c642	nix: .#widnows: init initial nix build for windows using zig mingwW64 build removes nix zig windows build removes nix zig windows build removed unnessesary glibc.static removed unnessesary import of pkgs in nix fixed missing trailing newline on non-windows nix builds overriding stdenv when building for crosscompiling to windows in nix better variables when crosscompiling windows in nix cross compile windows on macos removed trailing whitespace remove unnessesary overwrite of "CMAKE_SYSTEM_NAME" in nix windows build nix: keep file extension when copying result files during cross compile for windows nix: better checking for file extensions when using MinGW nix: using hostPlatform instead of targetPlatform when cross compiling for Windows using hostPlatform.extensions.executable to extract executable format	2024-03-28 07:48:27 +00:00
Joseph Stahl	e190f1fca6	nix: make `xcrun` visible in Nix sandbox for precompiling Metal shaders (#6118 ) * Symlink to /usr/bin/xcrun so that `xcrun` binary is usable during build (used for compiling Metal shaders) Fixes https://github.com/ggerganov/llama.cpp/issues/6117 * cmake - copy default.metallib to install directory When metal files are compiled to default.metallib, Cmake needs to add this to the install directory so that it's visible to llama-cpp Also, update package.nix to use absolute path for default.metallib (it's not finding the bundle) * add `precompileMetalShaders` flag (defaults to false) to disable precompilation of metal shader Precompilation requires Xcode to be installed and requires disable sandbox on nix-darwin	2024-03-25 17:51:46 -07:00
slaren	280345968d	cuda : rename build flag to LLAMA_CUDA (#6299 )	2024-03-26 01:16:01 +01:00
Christian Kögler	b06c16ef9f	nix: fix blas support (#6281 ) Since no blas was provided to buildInputs, the executable is built without blas support. This is a backport of NixOS/nixpkgs#298567	2024-03-25 10:52:45 -07:00
Minsoo Cheong	6a87ac3a52	fix editorconfig check break (#5879 )	2024-03-05 11:42:23 +05:30
hutli	1d41d6f7c2	nix: static build (#5814 )	2024-03-04 17:33:08 -08:00
Tushar	cb5e8f7fc4	build(nix): Introduce flake.formatter for `nix fmt` (#5687 ) * build(nix): Introduce flake.formatter for `nix fmt` * chore: Switch to pkgs.nixfmt-rfc-style	2024-03-01 15:18:26 -08:00
Someone	201294ae17	nix: init singularity and docker images (#5056 ) Exposes a few attributes demonstrating how to build [singularity](https://docs.sylabs.io/guides/latest/user-guide/)/[apptainer](https://apptainer.org/) and Docker images re-using llama.cpp's Nix expression. Built locally on `x86_64-linux` with `nix build github:someoneserge/llama.cpp/feat/nix/images#llamaPackages.{docker,docker-min,sif,llama-cpp}` and it's fast and effective.	2024-02-22 11:44:10 -08:00
0cc4m	22f83f0c38	Enable Vulkan MacOS CI	2024-02-19 14:49:49 -08:00
Martin Schwaighofer	60ecf099ed	add Vulkan support to Nix flake	2024-02-03 13:13:07 -06:00
Xuan Son Nguyen	6b91b1e0a9	docker : add build for SYCL, Vulkan + update readme (#5228 ) * add vulkan dockerfile * intel dockerfile: compile sycl by default * fix vulkan dockerfile * add docs for vulkan * docs: sycl build in docker * docs: remove trailing spaces * docs: sycl: add docker section * docs: clarify install vulkan SDK outside docker * sycl: use intel/oneapi-basekit docker image * docs: correct TOC * docs: correct docker image for Intel oneMKL	2024-02-02 09:56:31 +02:00
Kyle Mistele	39baaf55a1	docker : add server-first container images (#5157 ) * feat: add Dockerfiles for each platform that user ./server instead of ./main * feat: update .github/workflows/docker.yml to build server-first docker containers * doc: add information about running the server with Docker to README.md * doc: add information about running with docker to the server README * doc: update n-gpu-layers to show correct GPU usage * fix(doc): update container tag from `server` to `server-cuda` for README example on running server container with CUDA	2024-01-28 09:55:31 +02:00
Michael Hueschen	c9b316c78f	nix-shell: use addToSearchPath thx to @SomeoneSerge for the suggestion!	2024-01-24 12:39:29 +00:00
Michael Hueschen	bf63d695b8	nix: add cc to devShell LD_LIBRARY_PATH this fixes the error I encountered when trying to run the convert.py script in a venv: ``` $ nix develop [...]$ source .venv/bin/activate (.venv) [...]$ pip3 install -r requirements.txt <... clipped ...> [...]$ python3 ./convert.py Traceback (most recent call last): File "/home/mhueschen/projects-reference/llama.cpp/./convert.py", line 40, in <module> from sentencepiece import SentencePieceProcessor File "/home/mhueschen/projects-reference/llama.cpp/.venv/lib/python3.11/site-packages/sentencepiece/__init__.py", line 13, in <module> from . import _sentencepiece ImportError: libstdc++.so.6: cannot open shared object file: No such file or directory ``` however, I am not sure this is the cleanest way to address this linker issue...	2024-01-24 12:39:29 +00:00
Xuan Son Nguyen	2bed4aa3f3	devops : add intel oneapi dockerfile (#5068 ) Co-authored-by: Xuan Son Nguyen <xuanson.nguyen@snowpack.eu>	2024-01-23 09:11:39 +02:00
Someone Serge	28603cd283	nix: add a comment on the many nixpkgs-with-cuda instances	2024-01-22 12:19:30 +00:00
Someone Serge	5e97ec91ae	nix: add a comment about makeScope	2024-01-22 12:19:30 +00:00
Someone Serge	7251870780	nix: refactor the cleanSource rules	2024-01-22 12:19:30 +00:00
compilade	d6bd4d46dd	llama : support StableLM 2 1.6B (#5052 ) * llama : support StableLM 2 1.6B * convert : fix Qwen's set_vocab wrongly naming all special tokens [PAD{id}] * convert : refactor Qwen's set_vocab to use it for StableLM 2 too * nix : add tiktoken to llama-python-extra * convert : use presence of tokenizer.json to determine StableLM tokenizer loader It's a less arbitrary heuristic than the vocab size.	2024-01-22 13:21:52 +02:00
iSma	504dc37be8	Revert LLAMA_NATIVE to OFF in flake.nix (#5066 )	2024-01-21 21:37:13 +00:00
Ikko Eltociear Ashimine	be36bb946a	flake.nix : fix typo (#4700 ) betwen -> between	2024-01-05 18:02:44 +02:00
Someone Serge	1e3900ebac	flake.nix: expose full scope in legacyPackages	2023-12-31 13:14:58 -08:00
crasm	04ac0607e9	python : add check-requirements.sh and GitHub workflow (#4585 ) * python: add check-requirements.sh and GitHub workflow This script and workflow forces package versions to remain compatible across all convert.py scripts, while allowing secondary convert scripts to import dependencies not wanted in convert.py. Move requirements into ./requirements * Fail on "==" being used for package requirements (but can be suppressed) * Enforce "compatible release" syntax instead of == * Update workflow * Add upper version bound for transformers and protobuf * improve check-requirements.sh * small syntax change * don't remove venvs if nocleanup is passed * See if this fixes docker workflow * Move check-requirements.sh into ./scripts/ --------- Co-authored-by: Jared Van Bortel <jared@nomic.ai>	2023-12-29 16:50:29 +02:00
Philip Taron	68eccbdc5b	flake.nix : rewrite (#4605 ) * flake.lock: update to hotfix CUDA::cuda_driver Required to support https://github.com/ggerganov/llama.cpp/pull/4606 * flake.nix: rewrite 1. Split into separate files per output. 2. Added overlays, so that this flake can be integrated into others. The names in the overlay are `llama-cpp`, `llama-cpp-opencl`, `llama-cpp-cuda`, and `llama-cpp-rocm` so that they fit into the broader set of Nix packages from [nixpkgs](https://github.com/nixos/nixpkgs). 3. Use [callPackage](https://summer.nixos.org/blog/callpackage-a-tool-for-the-lazy/) rather than `with pkgs;` so that there's dependency injection rather than dependency lookup. 4. Add a description and meta information for each package. The description includes a bit about what's trying to accelerate each one. 5. Use specific CUDA packages instead of cudatoolkit on the advice of SomeoneSerge. 6. Format with `serokell/nixfmt` for a consistent style. 7. Update `flake.lock` with the latest goods. * flake.nix: use finalPackage instead of passing it manually * nix: unclutter darwin support * nix: pass most darwin frameworks unconditionally ...for simplicity * .nix: nixfmt nix shell github:piegamesde/nixfmt/rfc101-style --command \ nixfmt flake.nix .devops/nix/.nix * flake.nix: add maintainers * nix: move meta down to follow Nixpkgs style more closely * nix: add missing meta attributes nix: clarify the interpretation of meta.maintainers nix: clarify the meaning of "broken" and "badPlatforms" nix: passthru: expose the use* flags for inspection E.g.: ``` ❯ nix eval .#cuda.useCuda true ``` * flake.nix: avoid re-evaluating nixpkgs too many times * flake.nix: use flake-parts * nix: migrate to pname+version * flake.nix: overlay: expose both the namespace and the default attribute * ci: add the (Nix) flakestry workflow * nix: cmakeFlags: explicit OFF bools * nix: cuda: reduce runtime closure * nix: fewer rebuilds * nix: respect config.cudaCapabilities * nix: add the impure driver's location to the DT_RUNPATHs * nix: clean sources more thoroughly ...this way outPaths change less frequently, and so there are fewer rebuilds * nix: explicit mpi support * nix: explicit jetson support * flake.nix: darwin: only expose the default --------- Co-authored-by: Someone Serge <sergei.kozlukov@aalto.fi>	2023-12-29 16:42:26 +02:00
Juraj Bednar	3bd2c7ce1b	docker : add finetune option (#4211 )	2023-11-30 23:46:01 +02:00
Ali Tariq	c2ab6fe661	ci : Cloud-V for RISC-V builds (#3160 ) * Added Cloud-V File * Replaced Makefile with original one --------- Co-authored-by: moiz.hussain <moiz.hussain@10xengineers.ai>	2023-09-15 11:06:56 +03:00

69 Commits