* style: format with nixfmt/rfc101-style
* build(nix): Package gguf-py
* build(nix): Refactor to new scope for gguf-py
* build(nix): Exclude gguf-py from devShells
* build(nix): Refactor gguf-py derivation to take in exact deps
* build(nix): Enable pytestCheckHook and pythonImportsCheck for gguf-py
* build(python): Package python scripts with pyproject.toml
* chore: Cleanup
* dev(nix): Break up python/C devShells
* build(python): Relax pytorch version constraint
Nix has an older version
* chore: Move cmake to nativeBuildInputs for devShell
* fmt: Reconcile formatting with rebase
* style: nix fmt
* cleanup: Remove unncessary __init__.py
* chore: Suggestions from review
- Filter out non-source files from llama-scripts flake derivation
- Clean up unused closure
- Remove scripts devShell
* revert: Bad changes
* dev: Simplify devShells, restore the -extra devShell
* build(nix): Add pyyaml for gguf-py
* chore: Remove some unused bindings
* dev: Add tiktoken to -extra devShells
initial nix build for windows using zig
mingwW64 build
removes nix zig windows build
removes nix zig windows build
removed unnessesary glibc.static
removed unnessesary import of pkgs in nix
fixed missing trailing newline on non-windows nix builds
overriding stdenv when building for crosscompiling to windows in nix
better variables when crosscompiling windows in nix
cross compile windows on macos
removed trailing whitespace
remove unnessesary overwrite of "CMAKE_SYSTEM_NAME" in nix windows build
nix: keep file extension when copying result files during cross compile for windows
nix: better checking for file extensions when using MinGW
nix: using hostPlatform instead of targetPlatform when cross compiling for Windows
using hostPlatform.extensions.executable to extract executable format
* flake.lock: update to hotfix CUDA::cuda_driver
Required to support https://github.com/ggerganov/llama.cpp/pull/4606
* flake.nix: rewrite
1. Split into separate files per output.
2. Added overlays, so that this flake can be integrated into others.
The names in the overlay are `llama-cpp`, `llama-cpp-opencl`,
`llama-cpp-cuda`, and `llama-cpp-rocm` so that they fit into the
broader set of Nix packages from [nixpkgs](https://github.com/nixos/nixpkgs).
3. Use [callPackage](https://summer.nixos.org/blog/callpackage-a-tool-for-the-lazy/)
rather than `with pkgs;` so that there's dependency injection rather
than dependency lookup.
4. Add a description and meta information for each package.
The description includes a bit about what's trying to accelerate each one.
5. Use specific CUDA packages instead of cudatoolkit on the advice of SomeoneSerge.
6. Format with `serokell/nixfmt` for a consistent style.
7. Update `flake.lock` with the latest goods.
* flake.nix: use finalPackage instead of passing it manually
* nix: unclutter darwin support
* nix: pass most darwin frameworks unconditionally
...for simplicity
* *.nix: nixfmt
nix shell github:piegamesde/nixfmt/rfc101-style --command \
nixfmt flake.nix .devops/nix/*.nix
* flake.nix: add maintainers
* nix: move meta down to follow Nixpkgs style more closely
* nix: add missing meta attributes
nix: clarify the interpretation of meta.maintainers
nix: clarify the meaning of "broken" and "badPlatforms"
nix: passthru: expose the use* flags for inspection
E.g.:
```
❯ nix eval .#cuda.useCuda
true
```
* flake.nix: avoid re-evaluating nixpkgs too many times
* flake.nix: use flake-parts
* nix: migrate to pname+version
* flake.nix: overlay: expose both the namespace and the default attribute
* ci: add the (Nix) flakestry workflow
* nix: cmakeFlags: explicit OFF bools
* nix: cuda: reduce runtime closure
* nix: fewer rebuilds
* nix: respect config.cudaCapabilities
* nix: add the impure driver's location to the DT_RUNPATHs
* nix: clean sources more thoroughly
...this way outPaths change less frequently,
and so there are fewer rebuilds
* nix: explicit mpi support
* nix: explicit jetson support
* flake.nix: darwin: only expose the default
---------
Co-authored-by: Someone Serge <sergei.kozlukov@aalto.fi>
* fix LLAMA_NATIVE
* syntax
* alternate implementation
* my eyes must be getting bad...
* set cmake LLAMA_NATIVE=ON by default
* march=native doesn't work for ios/tvos, so disable for those targets. also see what happens if we use it on msvc
* revert 8283237 and only allow LLAMA_NATIVE on x86 like the Makefile
* remove -DLLAMA_MPI=ON
---------
Co-authored-by: netrunnereve <netrunnereve@users.noreply.github.com>
Problem
-------
`nix build` fails with missing `Accelerate.h`.
Changes
-------
- Fix build of the llama.cpp with nix for Intel: add the same SDK frameworks as
for ARM
- Add `quantize` app to the output of nix flake
- Extend nix devShell with llama-python so we can use convertScript
Testing
-------
Testing the steps with nix:
1. `nix build`
Get the model and then
2. `nix develop` and then `python convert.py models/llama-2-7b.ggmlv3.q4_0.bin`
3. `nix run llama.cpp#quantize -- open_llama_7b/ggml-model-f16.gguf ./models/ggml-model-q4_0.bin 2`
4. `nix run llama.cpp#llama -- -m models/ggml-model-q4_0.bin -p "What is nix?" -n 400 --temp 0.8 -e -t 8`
Co-authored-by: Volodymyr Vitvitskyi <volodymyrvitvitskyi@SamsungPro.local>
* metal: matrix-matrix multiplication kernel
This commit removes MPS and uses custom matrix-matrix multiplication
kernels for all quantization types. This commit also adds grouped-query
attention to support llama2 70B.
* metal: fix performance degradation from gqa
Integers are slow on the GPU, and 64-bit divides are extremely slow.
In the context of GQA, we introduce a 64-bit divide that cannot be
optimized out by the compiler, which results in a decrease of ~8% in
inference performance. This commit fixes that issue by calculating a
part of the offset with a 32-bit divide. Naturally, this limits the
size of a single matrix to ~4GB. However, this limitation should
suffice for the near future.
* metal: fix bugs for GQA and perplexity test.
I mixed up ne02 and nb02 in previous commit.
NixOS's mkl misses some libraries like mkl-sdl.pc. See #2261
Currently NixOS doesn't have intel C compiler (icx, icpx). See https://discourse.nixos.org/t/packaging-intel-math-kernel-libraries-mkl/975
So remove it from flake.nix
Some minor changes:
- Change pkgs.python310 to pkgs.python3 to keep latest
- Add pkgconfig to devShells.default
- Remove installPhase because we have `cmake --install` from #2256
* Nix flake
* Nix: only add Accelerate framework on macOS
* Nix: development shel, direnv and compatibility
* Nix: use python packages supplied by withPackages
* Nix: remove channel compatibility
* Nix: fix ARM neon dotproduct on macOS
---------
Co-authored-by: Pavol Rusnak <pavol@rusnak.io>