llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-27 03:44:35 +00:00

Author	SHA1	Message	Date
Minsoo Cheong	6a87ac3a52	fix editorconfig check break (#5879)	2024-03-05 11:42:23 +05:30
hutli	1d41d6f7c2	nix: static build (#5814)	2024-03-04 17:33:08 -08:00
Tushar	cb5e8f7fc4	build(nix): Introduce flake.formatter for `nix fmt` (#5687) * build(nix): Introduce flake.formatter for `nix fmt` * chore: Switch to pkgs.nixfmt-rfc-style	2024-03-01 15:18:26 -08:00
Someone	201294ae17	nix: init singularity and docker images (#5056 ) Exposes a few attributes demonstrating how to build [singularity](https://docs.sylabs.io/guides/latest/user-guide/)/[apptainer](https://apptainer.org/) and Docker images re-using llama.cpp's Nix expression. Built locally on `x86_64-linux` with `nix build github:someoneserge/llama.cpp/feat/nix/images#llamaPackages.{docker,docker-min,sif,llama-cpp}` and it's fast and effective.	2024-02-22 11:44:10 -08:00
0cc4m	22f83f0c38	Enable Vulkan MacOS CI	2024-02-19 14:49:49 -08:00
Martin Schwaighofer	60ecf099ed	add Vulkan support to Nix flake	2024-02-03 13:13:07 -06:00
Xuan Son Nguyen	6b91b1e0a9	docker : add build for SYCL, Vulkan + update readme (#5228 ) * add vulkan dockerfile * intel dockerfile: compile sycl by default * fix vulkan dockerfile * add docs for vulkan * docs: sycl build in docker * docs: remove trailing spaces * docs: sycl: add docker section * docs: clarify install vulkan SDK outside docker * sycl: use intel/oneapi-basekit docker image * docs: correct TOC * docs: correct docker image for Intel oneMKL	2024-02-02 09:56:31 +02:00
Kyle Mistele	39baaf55a1	docker : add server-first container images (#5157 ) * feat: add Dockerfiles for each platform that user ./server instead of ./main * feat: update .github/workflows/docker.yml to build server-first docker containers * doc: add information about running the server with Docker to README.md * doc: add information about running with docker to the server README * doc: update n-gpu-layers to show correct GPU usage * fix(doc): update container tag from `server` to `server-cuda` for README example on running server container with CUDA	2024-01-28 09:55:31 +02:00
Michael Hueschen	c9b316c78f	nix-shell: use addToSearchPath thx to @SomeoneSerge for the suggestion!	2024-01-24 12:39:29 +00:00
Michael Hueschen	bf63d695b8	nix: add cc to devShell LD_LIBRARY_PATH this fixes the error I encountered when trying to run the convert.py script in a venv: ``` $ nix develop [...]$ source .venv/bin/activate (.venv) [...]$ pip3 install -r requirements.txt <... clipped ...> [...]$ python3 ./convert.py Traceback (most recent call last): File "/home/mhueschen/projects-reference/llama.cpp/./convert.py", line 40, in <module> from sentencepiece import SentencePieceProcessor File "/home/mhueschen/projects-reference/llama.cpp/.venv/lib/python3.11/site-packages/sentencepiece/__init__.py", line 13, in <module> from . import _sentencepiece ImportError: libstdc++.so.6: cannot open shared object file: No such file or directory ``` however, I am not sure this is the cleanest way to address this linker issue...	2024-01-24 12:39:29 +00:00
Xuan Son Nguyen	2bed4aa3f3	devops : add intel oneapi dockerfile (#5068) Co-authored-by: Xuan Son Nguyen <xuanson.nguyen@snowpack.eu>	2024-01-23 09:11:39 +02:00
Someone Serge	28603cd283	nix: add a comment on the many nixpkgs-with-cuda instances	2024-01-22 12:19:30 +00:00
Someone Serge	5e97ec91ae	nix: add a comment about makeScope	2024-01-22 12:19:30 +00:00
Someone Serge	7251870780	nix: refactor the cleanSource rules	2024-01-22 12:19:30 +00:00
compilade	d6bd4d46dd	llama : support StableLM 2 1.6B (#5052 ) * llama : support StableLM 2 1.6B * convert : fix Qwen's set_vocab wrongly naming all special tokens [PAD{id}] * convert : refactor Qwen's set_vocab to use it for StableLM 2 too * nix : add tiktoken to llama-python-extra * convert : use presence of tokenizer.json to determine StableLM tokenizer loader It's a less arbitrary heuristic than the vocab size.	2024-01-22 13:21:52 +02:00
iSma	504dc37be8	Revert LLAMA_NATIVE to OFF in flake.nix (#5066)	2024-01-21 21:37:13 +00:00
Ikko Eltociear Ashimine	be36bb946a	flake.nix : fix typo (#4700) betwen -> between	2024-01-05 18:02:44 +02:00
Someone Serge	1e3900ebac	flake.nix: expose full scope in legacyPackages	2023-12-31 13:14:58 -08:00
crasm	04ac0607e9	python : add check-requirements.sh and GitHub workflow (#4585 ) * python: add check-requirements.sh and GitHub workflow This script and workflow forces package versions to remain compatible across all convert.py scripts, while allowing secondary convert scripts to import dependencies not wanted in convert.py. Move requirements into ./requirements * Fail on "==" being used for package requirements (but can be suppressed) * Enforce "compatible release" syntax instead of == * Update workflow * Add upper version bound for transformers and protobuf * improve check-requirements.sh * small syntax change * don't remove venvs if nocleanup is passed * See if this fixes docker workflow * Move check-requirements.sh into ./scripts/ --------- Co-authored-by: Jared Van Bortel <jared@nomic.ai>	2023-12-29 16:50:29 +02:00
Philip Taron	68eccbdc5b	flake.nix : rewrite (#4605 ) * flake.lock: update to hotfix CUDA::cuda_driver Required to support https://github.com/ggerganov/llama.cpp/pull/4606 * flake.nix: rewrite 1. Split into separate files per output. 2. Added overlays, so that this flake can be integrated into others. The names in the overlay are `llama-cpp`, `llama-cpp-opencl`, `llama-cpp-cuda`, and `llama-cpp-rocm` so that they fit into the broader set of Nix packages from [nixpkgs](https://github.com/nixos/nixpkgs). 3. Use [callPackage](https://summer.nixos.org/blog/callpackage-a-tool-for-the-lazy/) rather than `with pkgs;` so that there's dependency injection rather than dependency lookup. 4. Add a description and meta information for each package. The description includes a bit about what's trying to accelerate each one. 5. Use specific CUDA packages instead of cudatoolkit on the advice of SomeoneSerge. 6. Format with `serokell/nixfmt` for a consistent style. 7. Update `flake.lock` with the latest goods. 
* flake.nix: use finalPackage instead of passing it manually * nix: unclutter darwin support * nix: pass most darwin frameworks unconditionally ...for simplicity * .nix: nixfmt nix shell github:piegamesde/nixfmt/rfc101-style --command \ nixfmt flake.nix .devops/nix/.nix * flake.nix: add maintainers * nix: move meta down to follow Nixpkgs style more closely * nix: add missing meta attributes nix: clarify the interpretation of meta.maintainers nix: clarify the meaning of "broken" and "badPlatforms" nix: passthru: expose the use* flags for inspection E.g.: ``` ❯ nix eval .#cuda.useCuda true ``` * flake.nix: avoid re-evaluating nixpkgs too many times * flake.nix: use flake-parts * nix: migrate to pname+version * flake.nix: overlay: expose both the namespace and the default attribute * ci: add the (Nix) flakestry workflow * nix: cmakeFlags: explicit OFF bools * nix: cuda: reduce runtime closure * nix: fewer rebuilds * nix: respect config.cudaCapabilities * nix: add the impure driver's location to the DT_RUNPATHs * nix: clean sources more thoroughly ...this way outPaths change less frequently, and so there are fewer rebuilds * nix: explicit mpi support * nix: explicit jetson support * flake.nix: darwin: only expose the default --------- Co-authored-by: Someone Serge <sergei.kozlukov@aalto.fi>	2023-12-29 16:42:26 +02:00
Juraj Bednar	3bd2c7ce1b	docker : add finetune option (#4211)	2023-11-30 23:46:01 +02:00
Ali Tariq	c2ab6fe661	ci : Cloud-V for RISC-V builds (#3160) * Added Cloud-V File * Replaced Makefile with original one --------- Co-authored-by: moiz.hussain <moiz.hussain@10xengineers.ai>	2023-09-15 11:06:56 +03:00
hongbo.mo	a21baeb122	docker : add git to full-cuda.Dockerfile main-cuda.Dockerfile (#3044)	2023-09-08 13:57:55 +03:00
Henri Vasserman	71d6975559	[Docker] fix tools.sh argument passing. (#2884) * [Docker] fix tools.sh argument passing. This should allow passing multiple arguments to containers with the full image that are using the tools.sh frontend. Fix from https://github.com/ggerganov/llama.cpp/issues/2535#issuecomment-1697091734	2023-08-30 19:14:53 +03:00
JohnnyB	3e8ff47af6	devops : added systemd units and set versioning to use date. (#2835) * Corrections and systemd units * Missing dependency clblast	2023-08-28 09:31:24 +03:00
Henri Vasserman	6bbc598a63	ROCm Port (#1087 ) * use hipblas based on cublas * Update Makefile for the Cuda kernels * Expand arch list and make it overrideable * Fix multi GPU on multiple amd architectures with rocblas_initialize() (#5) * add hipBLAS to README * new build arg LLAMA_CUDA_MMQ_Y * fix half2 decomposition * Add intrinsics polyfills for AMD * AMD assembly optimized __dp4a * Allow overriding CC_TURING * use "ROCm" instead of "CUDA" * ignore all build dirs * Add Dockerfiles * fix llama-bench * fix -nommq help for non CUDA/HIP --------- Co-authored-by: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com> Co-authored-by: ardfork <134447697+ardfork@users.noreply.github.com> Co-authored-by: funnbot <22226942+funnbot@users.noreply.github.com> Co-authored-by: Engininja2 <139037756+Engininja2@users.noreply.github.com> Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com> Co-authored-by: jammm <2500920+jammm@users.noreply.github.com> Co-authored-by: jdecourval <7315817+jdecourval@users.noreply.github.com>	2023-08-25 12:09:42 +03:00
JohnnyB	f19dca04ea	devops : RPM Specs (#2723 ) * Create llama-cpp.srpm * Rename llama-cpp.srpm to llama-cpp.srpm.spec Correcting extension. * Tested spec success. * Update llama-cpp.srpm.spec * Create lamma-cpp-cublas.srpm.spec * Create lamma-cpp-clblast.srpm.spec * Update lamma-cpp-cublas.srpm.spec Added BuildRequires * Moved to devops dir	2023-08-23 17:28:22 +03:00
Bodo Graumann	b782422a3e	devops : add missing quotes to bash script (#2193) This prevents accidentally expanding arguments that contain spaces.	2023-07-13 16:49:14 +03:00
Jinwoo Jeong	3ec7e596b2	docker : add '--server' option (#2174)	2023-07-11 19:12:35 +03:00
dylan	84525e7962	docker : add support for CUDA in docker (#1461) Co-authored-by: canardleteer <eris.has.a.dad+github@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-07-07 21:25:25 +03:00
qingfengfenga	8fc8179919	Add llama.cpp docker support for non-latin languages (#1673) * Modify Dockerfile default character set to improve compatibility (#1673)	2023-06-08 00:58:53 -07:00
Jiří Podivín	b5c85468a3	Docker: change to calling convert.py (#1641) Deprecation disclaimer was added to convert-pth-to-ggml.py	2023-06-03 15:11:53 +03:00
Jiří Podivín	0e730dd23b	Adding git in container package dependencies (#1621) Git added to build packages for version information in docker image Signed-off-by: Jiri Podivin <jpodivin@gmail.com>	2023-05-28 21:45:50 -07:00
Pavol Rusnak	859fee6dfb	quantize : use `map` to assign quantization type from `string` (#1191 ) instead of `int` (while `int` option still being supported) This allows the following usage: `./quantize ggml-model-f16.bin ggml-model-q4_0.bin q4_0` instead of: `./quantize ggml-model-f16.bin ggml-model-q4_0.bin 2`	2023-04-26 18:43:27 +02:00
Pavol Rusnak	a32f7acc9f	py : cleanup dependencies (#962) after #545 we do not need torch, tqdm and requests in the dependencies	2023-04-14 15:37:11 +02:00
Pavol Rusnak	8b679987cd	Fix whitespace, add .editorconfig, add GitHub workflow (#883)	2023-04-11 19:45:44 +00:00
bsilvereagle	a0c0516416	Remove torch GPU dependencies from the Docker.full image (#665 ) By using `pip install torch --index-url https://download.pytorch.org/whl/cpu` instead of `pip install torch` we can specify we want to install a CPU-only version of PyTorch without any GPU dependencies. This reduces the size of the Docker image from 7.32 GB to 1.62 GB	2023-04-03 00:13:03 +02:00
Georgi Gerganov	4cc053b6d5	Remove oboslete command from Docker script	2023-03-23 22:39:44 +02:00
Stephan Walter	5cb63e2493	Add tqdm to Python requirements (#293) * Add tqdm to Python requirements * Remove torchvision torchaudio, add requests	2023-03-20 09:24:11 +01:00
Stephan Walter	367946c668	Don't tell users to use a bad number of threads (#243) The readme tells people to use the command line option "-t 8", causing 8 threads to be started. On systems with fewer than 8 cores, this causes a significant slowdown. Remove the option from the example command lines and use /proc/cpuinfo on Linux to determine a sensible default.	2023-03-17 19:47:35 +02:00
Bernat Vadell	2af23d3043	🚀 Dockerize llamacpp (#132 ) * feat: dockerize llamacpp * feat: split build & runtime stages * split dockerfile into main & tools * add quantize into tool docker image * Update .devops/tools.sh Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add docker action pipeline * change CI to publish at github docker registry * fix name runs-on macOS-latest is macos-latest (lowercase) * include docker versioned images * fix github action docker * fix docker.yml * feat: include all-in-one command tool & update readme.md --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-17 10:47:06 +01:00
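
Commit 367946c668 above describes replacing a hard-coded `-t 8` with a default derived from /proc/cpuinfo on Linux. The sketch below illustrates that idea only and is not the code from that commit: the function names `count_processors` and `default_thread_count` are hypothetical, and the fallback to `os.cpu_count()` is an assumption added here for systems without /proc/cpuinfo.

```python
import os


def count_processors(cpuinfo_text: str) -> int:
    """Count 'processor' entries in /proc/cpuinfo-style text."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        # Each logical CPU is introduced by a line such as "processor : 0".
        if line.split(":")[0].strip() == "processor"
    )


def default_thread_count(cpuinfo_path: str = "/proc/cpuinfo") -> int:
    """Pick a thread count no larger than the number of logical CPUs."""
    try:
        with open(cpuinfo_path, encoding="utf-8") as f:
            n = count_processors(f.read())
    except OSError:
        # Assumption: the file is missing (e.g. non-Linux), so nothing was counted.
        n = 0
    # Fall back to os.cpu_count(), and finally to a single thread.
    return n or os.cpu_count() or 1
```

On a machine with fewer than 8 logical CPUs, a default chosen this way avoids the oversubscription slowdown the commit message mentions.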


91 Commits