Commit Graph

142 Commits

Author SHA1 Message Date
Pierrick Hymbert
4bd0f93e4a
model: support arch DbrxForCausalLM (#6515)
* model: dbrx convert to gguf
#6344

* llama: support dbrx
#6344

* doc: dbrx: add the model as supported

* scripts: get-wikitext-2 add unzip

* llama: increase maximum experts allowed

* llama: factorize moe graph implementation between grok, mixtral and dbrx


---------

Co-authored-by: Megha Agarwal <16129366+megha95@users.noreply.github.com>
2024-04-13 11:33:52 +02:00
Daniel Bevenius
f4183afe6a
scripts : add --outdir option to hf.sh (#6600)
* scripts : add --outdir option to hf.sh

This commit adds an option to the hf.sh script that allows the user to
specify an output directory for the downloaded file.

The motivation for this changes is that examples that use the hf.sh
script to download models from huggingface can now specify the output
directory, perhaps to the `models` directory to keep them in one place
and not clutter the root directory.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! scripts : add --outdir option to hf.sh

Fix format of the --outdir option in the usage message.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-11 16:22:47 +03:00
Georgi Gerganov
c4a3a4ff47
sync : ggml 2024-04-09 20:29:06 +03:00
Georgi Gerganov
e11a8999b5
license : update copyright notice + add AUTHORS (#6405)
* license : add AUTHORS

* authors : update

* scipts : add LICENSE and gen-authors.sh to sync
2024-04-09 09:23:19 +03:00
Georgi Gerganov
c37247796b
sync : ggml 2024-04-07 17:05:51 +03:00
Georgi Gerganov
43e8995e75
scripts : sync ggml-cuda folder 2024-04-07 16:08:12 +03:00
Georgi Gerganov
54ea0698fb
sync : ggml 2024-04-06 18:27:46 +03:00
Johannes Gäßler
33a5244806
compare-llama-bench.py: fix long hexsha args (#6424) 2024-04-01 13:30:43 +02:00
Georgi Gerganov
d48ccf3ad4
sync : ggml (#6351)
* sync : ggml

ggml-ci

* cuda : move GGML_CUDA_DMMV constants to dmmv.cuh

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-03-29 17:45:46 +02:00
slaren
280345968d
cuda : rename build flag to LLAMA_CUDA (#6299) 2024-03-26 01:16:01 +01:00
Johannes Gäßler
50ccaf5eac
lookup: complement data from context with general text statistics (#5479)
* lookup: evaluation tools, use corpus/previous gens

* fixup! lookup: evaluation tools, use corpus/previous gens

* fixup! lookup: evaluation tools, use corpus/previous gens

* fixup! lookup: evaluation tools, use corpus/previous gens

* fixup! lookup: evaluation tools, use corpus/previous gens
2024-03-23 01:24:36 +01:00
Georgi Gerganov
b838b53ad6
sync : ggml 2024-03-10 20:10:46 +02:00
Georgi Gerganov
8a3012a4ad
ggml : add ggml-common.h to deduplicate shared code (#5940)
* ggml : add ggml-common.h to shared code

ggml-ci

* scripts : update sync scripts

* sycl : reuse quantum tables

ggml-ci

* ggml : minor

* ggml : minor

* sycl : try to fix build
2024-03-09 12:47:57 +02:00
slaren
652ca2bded
compare-llama-bench.py : remove mul_mat_q (#5892) 2024-03-05 22:27:29 +01:00
Georgi Gerganov
efd8533ef8 sync : ggml
ggml-ci
2024-03-04 20:54:23 +02:00
Georgi Gerganov
a0fc62661f
sync : ggml 2024-03-04 10:40:04 +02:00
Georgi Gerganov
ef2cd694c4
scripts : add pod-llama.sh 2024-03-02 16:54:20 +02:00
Pierrick Hymbert
3ab8b3a92e
llama : cleanup unused mmq flags (#5772)
* cleanup unused --no-mul-mat-q,-nommq, -mmq, --mul-mat-q, mul_mat_q

* remove: mul_mat_q in compare llama bench and usage

* update llama-bench

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-03-01 13:39:06 +02:00
Georgi Gerganov
8c0e8f4e73
sync : ggml 2024-02-28 11:17:32 +02:00
Georgi Gerganov
334f76fa38
sync : ggml 2024-02-22 23:21:05 +02:00
Georgi Gerganov
5022cf242d
sync : ggml 2024-02-21 16:52:52 +02:00
Georgi Gerganov
eccd7a26dd
sync : ggml (#5633)
* ggml : fix conv_2d batch mode (ggml/737)

Co-authored-by: bssrdf <bssrdf@gmail.com>

* ggml : compute forward no longer pass src tensors (ggml/729)

* sync : ggml

ggml-ci

---------

Co-authored-by: bssrdf <merlintiger@hotmail.com>
Co-authored-by: bssrdf <bssrdf@gmail.com>
2024-02-21 16:17:10 +02:00
Georgi Gerganov
337c9cbd52 sync : ggml
ggml-ci
2024-02-19 15:09:43 +02:00
Jared Van Bortel
a0c2dad9d4
build : pass all warning flags to nvcc via -Xcompiler (#5570)
* build : pass all warning flags to nvcc via -Xcompiler
* make : fix apparent mis-merge from #3952
* make : fix incorrect GF_CC_VER for CUDA host compiler
2024-02-18 16:21:52 -05:00
Georgi Gerganov
b1de96824b
ci : fix wikitext url + compile warnings (#5569)
ggml-ci
2024-02-18 22:39:30 +02:00
Georgi Gerganov
d2819d5577
scripts : add helpers script for bench comparing commits (#5521)
* scripts : add helpers script for bench comparing commits

* scripts : detect CUDA

* set flags after checking the command line

* fix make flags

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-02-16 15:14:40 +02:00
Georgi Gerganov
9350a1cf21
scripts : add hf.sh helper script (#5501)
* scripts : add hf.sh helper scripts

* hf : add error logs

* hf : add support for --repo and --file
2024-02-15 15:41:15 +02:00
Georgi Gerganov
3b169441df
sync : ggml (#5452)
* ggml-alloc : v3 (ggml/727)

* ggml-alloc v3

ggml-ci

* fix ci

ggml-ci

* whisper : check for backend buffer allocation failures

* whisper : avoid leaks when initialization fails

* cleanup

ggml-ci

* style fixes

ggml-ci

* sync : ggml

* update llama.cpp, clip.cpp, export-lora.cpp

* update finetune.cpp, train-text-from-scratch.cpp

ggml-ci

* ggml-backend : reduce alignment to 32 to match gguf and fix mmap

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-02-12 09:16:06 +02:00
Georgi Gerganov
cd9aea63b5
scripts : update sync scripts with new backends 2024-02-10 09:53:05 +02:00
Georgi Gerganov
43b65f5eb8
sync : ggml 2024-02-10 09:30:36 +02:00
Georgi Gerganov
30679d438d
scripts : fix typos, cleanup (#5303) 2024-02-05 09:48:03 +02:00
Нияз Гарифзянов
4be04c8965
scripts : add non-interactive server-llm.sh (#5303)
* Update server-llm.sh

Add flag --non-interactive that allows run script without asking a permission

* Update scripts/server-llm.sh

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-05 09:43:57 +02:00
Georgi Gerganov
e437b37fd0
scripts : parse wtype in server-llm.sh (#5167)
* scripts : parse wtype in server-llm.sh

* scripts : fix check for wfile
2024-02-02 14:23:40 +02:00
Neo Zhang Jianyu
01684139c3
support SYCL backend windows build (#5208)
* support SYCL backend windows build

* add windows build in CI

* add for win build CI

* correct install oneMKL

* fix install issue

* fix ci

* fix install cmd

* fix install cmd

* fix install cmd

* fix install cmd

* fix install cmd

* fix win build

* fix win build

* fix win build

* restore other CI part

* restore as base

* rm no new line

* fix no new line issue, add -j

* fix grammer issue

* allow to trigger manually, fix format issue

* fix format

* add newline

* fix format

* fix format

* fix format issuse

---------

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-01-31 08:08:07 +05:30
Georgi Gerganov
8f8ddfcfad
sync : ggml (#0) 2024-01-30 16:21:57 +02:00
Georgi Gerganov
35dec26cc2
sync : ggml 2024-01-28 19:48:05 +02:00
Georgi Gerganov
753eafed0e
sync : ggml 2024-01-27 17:00:24 +02:00
Georgi Gerganov
5f1925a8ce
scripts : move run-with-preset.py from root to scripts folder 2024-01-26 17:09:44 +02:00
crasm
413e7b0559
ci : add model tests + script wrapper (#4586)
* scripts : add lib.sh and lib_test.sh

* scripts : stub out new ci-run.sh script

* scripts : switch to PascalCase for functions

This looks a little odd at first, but I find it very useful as a
convention to know if a command is part of our code vs a builtin.

* scripts : add some fancy conversion from snake_case to PascalCase

* Add venv to ci/run.sh

* Revert scripts work

* scripts : add wrapper script for local use of ci/run.sh

* Simplify .gitignore for tests, clang-tidy fixes

* Label all ctest tests

* ci : ctest uses -L main

* Attempt at writing ctest_with_model

* Update test-model-load-cancel

* ci : add ctest_with_model for debug and release

ggml-ci

* Fix gg_get_model function

ggml-ci

* got stuck on CMake

* Add get_model.cpp to tests/CMakeLists.txt

ggml-ci

* Fix README.md output for ctest_with_model

ggml-ci

* workflows : use `-L main` for all ctest

ggml-ci

* Fixes

* GG_RUN_CTEST_MODELFILE => LLAMACPP_TESTMODELFILE
* Always show warning rather than failing if model file variable is not
  set

* scripts : update usage text for ci-run.sh
2024-01-26 14:18:00 +02:00
Georgi Gerganov
e9240cdfa0
scripts : add get-winogrande.sh 2024-01-18 20:45:39 +02:00
Georgi Gerganov
dcad445d0c
scritps : add helper script to get hellaswag data in txt format 2024-01-18 11:44:49 +02:00
Georgi Gerganov
6b6916b215
sync : ggml 2024-01-17 20:54:50 +02:00
Georgi Gerganov
9408cfdad6
scripts : sync-ggml-am.sh option to skip commits 2024-01-14 11:08:41 +02:00
Georgi Gerganov
76484fbfd3
sync : ggml 2024-01-14 00:14:46 +02:00
Johannes Gäßler
7dc78764e2
compare-llama-bench: tweak output format (#4910) 2024-01-13 15:52:53 +01:00
Georgi Gerganov
de473f5f8e
sync : ggml 2024-01-12 22:02:43 +02:00
Georgi Gerganov
64802ec00d
sync : ggml 2024-01-11 09:39:08 +02:00
Johannes Gäßler
4f56458d34
Python script to compare commits with llama-bench (#4844) 2024-01-10 01:04:33 +01:00
Georgi Gerganov
9a818f7c42
scripts : improve get-pg.sh (#4838) 2024-01-09 19:21:13 +02:00
Georgi Gerganov
d9653894df
scripts : script to get Paul Graham essays in txt format (#4838) 2024-01-09 16:23:05 +02:00
Georgi Gerganov
91d38876df metal : switch back to default.metallib (ggml/681)
ggml-ci
2024-01-05 18:02:06 +02:00
Georgi Gerganov
7bed7eba35 cuda : simplify expression
Co-authored-by: slaren <slarengh@gmail.com>
2024-01-03 14:38:38 +02:00
Georgi Gerganov
75e3fd8581 sync : ggml
ggml-ci
2024-01-03 14:38:38 +02:00
Georgi Gerganov
ab62fc3e55 scripts : fix sync order + metal sed 2024-01-03 14:38:38 +02:00
crasm
04ac0607e9
python : add check-requirements.sh and GitHub workflow (#4585)
* python: add check-requirements.sh and GitHub workflow

This script and workflow forces package versions to remain compatible
across all convert*.py scripts, while allowing secondary convert scripts
to import dependencies not wanted in convert.py.

* Move requirements into ./requirements

* Fail on "==" being used for package requirements (but can be suppressed)

* Enforce "compatible release" syntax instead of ==

* Update workflow

* Add upper version bound for transformers and protobuf

* improve check-requirements.sh

* small syntax change

* don't remove venvs if nocleanup is passed

* See if this fixes docker workflow

* Move check-requirements.sh into ./scripts/

---------

Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2023-12-29 16:50:29 +02:00
Georgi Gerganov
c8255f8a6b
scripts : print list of sync commits 2023-12-29 15:12:35 +02:00
Georgi Gerganov
38b3de4658
sync : ggml 2023-12-29 14:56:41 +02:00
Georgi Gerganov
ca38b8d334
scripts : do not sync commits from this repo 2023-12-29 14:54:05 +02:00
Georgi Gerganov
b47879b0dd
scripts : add sync-ggml-am.sh 2023-12-27 11:44:22 +02:00
Jared Van Bortel
70f806b821
build : detect host compiler and cuda compiler separately (#4414) 2023-12-13 12:10:10 -05:00
Georgi Gerganov
fe680e3d10
sync : ggml (new ops, tests, backend, etc.) (#4359)
* sync : ggml (part 1)

* sync : ggml (part 2, CUDA)

* sync : ggml (part 3, Metal)

* ggml : build fixes

ggml-ci

* cuda : restore lost changes

* cuda : restore lost changes (StableLM rope)

* cmake : enable separable compilation for CUDA

ggml-ci

* ggml-cuda : remove device side dequantize

* Revert "cmake : enable separable compilation for CUDA"

This reverts commit 09e35d04b1.

* cuda : remove assert for rope

* tests : add test-backend-ops

* ggml : fix bug in ggml_concat

* ggml : restore `ggml_get_n_tasks()` logic in `ggml_graph_plan()`

* ci : try to fix macOS

* ggml-backend : remove backend self-registration

* ci : disable Metal for macOS cmake build

ggml-ci

* metal : fix "supports family" call

* metal : fix assert

* metal : print resource path

ggml-ci

---------

Co-authored-by: slaren <slarengh@gmail.com>
2023-12-07 22:26:54 +02:00
bandoti
b38a16dfcf
cmake : fix issue with version info not getting baked into LlamaConfig.cmake (#3970)
* Split CPP generation from build-info query

* Remove blank lines

* Add BUILD_SHARED_LIBS option
2023-11-27 21:25:42 +02:00
Georgi Gerganov
4760e7cc0b
sync : ggml (backend v2) (#3912)
* sync : ggml (backend v2) (wip)

* sync : migrate examples and llama.cpp to dynamic graphs (wip)

* sync : update tests + fix max op params to 64

ggml-ci

* sync : ggml-cuda

ggml-ci

* llama : fix save/load state context size

ggml-ci

* sync : try to fix build on tvOS

* sync : pass custom graph sizes in training examples

* sync : update graph copies to new ggml API

* sync : update sync-ggml.sh with new files

* scripts : fix header in sync script

* train : fix context size calculations

* llama : increase inference graph size up to 4096 nodes

* train : allocate grads for backward graphs

* train : allocate grads for gb_tmp
2023-11-13 14:16:23 +02:00
cebtenzzre
b12fa0d1c1
build : link against build info instead of compiling against it (#3879)
* cmake : fix build when .git does not exist

* cmake : simplify BUILD_INFO target

* cmake : add missing dependencies on BUILD_INFO

* build : link against build info instead of compiling against it

* zig : make build info a .cpp source instead of a header

Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>

* cmake : revert change to CMP0115

---------

Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>
2023-11-02 08:50:16 +02:00
Georgi Gerganov
f0e209324a
scripts : add server-llm.sh (#3868)
* scripts : add deploy-server.sh

* scripts : rename to server-llm.sh

* scripts : working curl pipe
2023-11-01 11:29:07 +02:00
Georgi Gerganov
db3abcc114
sync : ggml (ggml-backend) (#3548)
* sync : ggml (ggml-backend)

ggml-ci

* zig : add ggml-backend to the build
2023-10-08 20:19:14 +03:00
bandoti
095231dfd3
cmake : fix transient definitions in find pkg (#3411) 2023-10-02 12:51:49 +03:00
DAN™
99115f3fa6
cmake : fix build-info.h on MSVC (#3309) 2023-09-25 18:45:33 -04:00
Kevin Ji
bedb92b603
scripts : use /usr/bin/env in shebang (#3313) 2023-09-22 23:52:23 -04:00
Cebtenzzre
e6616cf0db
examples : add compiler version and target to build info (#2998) 2023-09-15 16:59:49 -04:00
bandoti
990a5e226a
cmake : add relocatable Llama package (#2960)
* Keep static libs and headers with install

* Add logic to generate Config package

* Use proper build info

* Add llama as import library

* Prefix target with package name

* Add example project using CMake package

* Update README

* Update README

* Remove trailing whitespace
2023-09-14 20:04:40 +03:00
Georgi Gerganov
611363ac79 scripts : add pipefail 2023-08-29 10:50:30 +03:00
Georgi Gerganov
25423e9185
scripts : helper convert script 2023-08-27 15:24:58 +03:00
Georgi Gerganov
01f2224682
falcon : write file type 2023-08-24 19:58:30 +03:00
Georgi Gerganov
8f8c28e89c
convert : auto-determine model name based on dir + scripts update 2023-08-24 19:26:47 +03:00
Cebtenzzre
7c2227a197
chmod : make scripts executable (#2675) 2023-08-23 17:29:09 +03:00
Georgi Gerganov
ef3f333d37
ggml : sync latest (SAM + SD operators, CUDA alibi) (#2709)
* ggml : sync latest (SAM + SD operators, CUDA alibi)

ggml-ci

* ggml : fix tabs
2023-08-22 14:22:08 +03:00
Georgi Gerganov
b5ffb2849d
scripts : add helper script to get wikitext 2023-08-15 10:05:25 +03:00
Eve
81844fbcfd
tests : Fix compilation warnings (Linux/GCC) (#2451)
* fix hellaswag print format, cast away warning in test-double-float

* c++11 cannot use designated initializers

* add static to test-grad0.c internal functions

* use memcpy in test-double-float.c

* port c tests to c++

* use initializer list for ggml_init_params
2023-08-02 11:06:19 +03:00
Hesen Peng
82552b7f54
build : fix line breaking error in build-info.sh (#2349)
* fix line breaking

* build number line break removal
2023-07-25 15:24:09 +03:00
Jiří Podivín
27ab66e437
py : turn verify-checksum-models.py into executable (#2245)
README.md was adjusted to reflect the change.

Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
2023-07-16 22:54:47 +03:00
Georgi Gerganov
1b6efeab82
tests : fix test-grad0 2023-07-05 20:20:25 +03:00
Georgi Gerganov
ed9a54e512
ggml : sync latest (new ops, macros, refactoring) (#2106)
- add ggml_argmax()
- add ggml_tanh()
- add ggml_elu()
- refactor ggml_conv_1d() and variants
- refactor ggml_conv_2d() and variants
- add helper macros to reduce code duplication in ggml.c
2023-07-04 21:54:11 +03:00
Jiří Podivín
5ddf7ea1fb
hooks : setting up flake8 and pre-commit hooks (#1681)
Small, non-functional changes were made to non-compliant files.
These include breaking up long lines, whitespace sanitation and
unused import removal.

Maximum line length in python files was set to a generous 125 chars,
in order to minimize number of changes needed in scripts and general
annoyance. The "txt" prompts directory is excluded from the checks
as it may contain oddly formatted files and strings for a good reason.

Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
2023-06-17 13:32:48 +03:00
Georgi Gerganov
b9fd7eee57
ggml : remove bit shuffling (#1405)
* ggml : remove Q4_0 bit shufling (ARM NEON)

* ggml : remove Q4_1 bit shuffling (ARM NEON + reference)

* ggml : nibbles_from_floats() + bytes_from_nibbles() (ARM NEON)

* ggml : remove Q4_2 bit shuffling (WIP, BROKEN)

* ggml : remove Q5_0 bit shuffling (ARM NEON)

* ggml : 2x faster scalar implementations

* ggml : remove Q5_1 bit shuffling (ARM NEON + scalar)

* ggml : simplify scalar dot

* ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit

* ggml : fix Q4_1 quantization

* ggml : update cuBLAS + normalize variable names

* ggml : remove Q4_2 mode

* ggml : minor formatting

* ggml : fix Q5_0 quantization

* scripts : add script for measuring the time per token

* AVX implementations (#1370)

* ggml : uniform 5th bit extraction

* llama : produce error upon loading old model files

* llama : fix model magic/version write

* ggml : speed-up Q5_0 + Q5_1 at 4 threads

* ggml : preserve old Q4 and Q5 formats

* ggml : simplify Q8_1 - no need for low / high sums anymore

* ggml : fix Q8_0 and Q8_1 rounding

* Revert "AVX implementations (#1370)"

This reverts commit 948d124837.

* ggml : fix AVX2 implementation

* sha : update hashes for 7B and 13B

* readme : update timings + remove warning banner

* llama : update v2 PR number to 1405

* ggml : fix WASM comments

* ggml : back to original bit order

* readme : add note that Q4 and Q5 have been changed

* llama : fix return for unknown version

---------

Co-authored-by: Stephan Walter <stephan@walter.name>
2023-05-12 00:23:08 +03:00
Georgi Gerganov
f9a6364912
llama : require first token to be BOS (#1303)
* llama : require first token to be BOS

* scripts : add ppl-run-all.sh

* perplexity : add BOS for each chunk

* readme : update perplexity values after BOS fix

* perplexity : add clarifying comments
2023-05-08 17:41:54 +03:00
Georgi Gerganov
bca9ad938a
minor : fix whitespaces (#1302) 2023-05-03 20:09:42 +03:00
Georgi Gerganov
e2a937ca6a
minor : fix trailing whitespaces 2023-05-03 18:43:23 +03:00
KASR
b0c71c7b6d
scripts : platform independent script to verify sha256 checksums (#1203)
* python script to verify the checksum of the llama models

Added Python script for verifying SHA256 checksums of files in a directory, which can run on multiple platforms. Improved the formatting of the output results for better readability.

* Update README.md

update to the readme for improved readability and to explain the usage of the python checksum verification script

* update the verification script

I've extended the script based on suggestions by @prusnak

The script now checks the available RAM, is there is enough to check the file at once it will do so. If not the file is read in chunks.

* minor improvment

small change so that the available ram is checked and not the total ram

* remove the part of the code that reads the file at once if enough ram is available

based on suggestions from @prusnak i removed the part of the code that checks whether the user had enough ram to read the entire model at once. the file is now always read in chunks.

* Update verify-checksum-models.py

quick fix to pass the git check
2023-05-03 18:31:28 +03:00
kuvaus
9daff419f6
fix build-info.h for git submodules (#1289)
* make git build info work with submodules

---------

Co-authored-by: Green Sky <green@g-s.xyz>
2023-05-03 02:43:43 +02:00
DannyDaemonic
f4cef87edf
Add git-based build information for better issue tracking (#1232)
* Add git-based build information for better issue tracking

* macOS fix

* "build (hash)" and "CMAKE_SOURCE_DIR" changes

* Redo "CMAKE_CURRENT_SOURCE_DIR" and clearer build messages

* Fix conditional dependency on missing target

* Broke out build-info.cmake, added find_package fallback, and added build into to all examples, added dependencies to Makefile

* 4 space indenting for cmake, attempt to clean up my mess in Makefile

* Short hash, less fancy Makefile, and don't modify build-info.h if it wouldn't change it
2023-05-01 18:23:47 +02:00
Georgi Gerganov
284685f169
scripts : add helper scripts to synch ggml repo 2023-04-23 19:57:09 +03:00