Commit Graph

1764 Commits

Author SHA1 Message Date
Kerfuffle
34b0a08207
gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981)
* gguf-py: Refactor and add file reading support

* Replay changes from #3871

Credit to @cebtenzzre for that pull

* Various type annotation fixes.

* sort imports with isort (again)

* Fix missing return statement in add_tensor

* style cleanup with flake8

* fix NamedTuple and Enum usage

* Fix an issue with state init in GGUFReader

Move examples to an examples/ directory

Clean up examples

Add an example of modifying keys in a GGUF file

Update documentation with info on examples

Try to support people importing gguf/gguf.py directly

* Damagage is not a word.

* Clean up gguf-py/examples/modify_gguf.py whitespace

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* Update gguf-py/examples/modify_gguf.py formatting

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* Update gguf-py/gguf/gguf_reader.py type hint

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* Make examples executable, formatting changes

* Add more information to GGUFReader and examples comments

* Include a gguf Python package version bump

* Add convert-gguf-endian.py script

* cleanup

* gguf-py : bump minor version

* Reorganize scripts

* Make GGUFReader endian detection less arbitrary

* Add JSON dumping support to gguf-dump.py

Which I kind of regret now

* A few for gguf-dump.py cleanups

* Murder accidental tuple in gguf-py/scripts/gguf-dump.py

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* cleanup

* constants : remove unneeded type annotations

* fix python 3.8 compat

* Set up gguf- scripts in pyproject.toml

* And include scripts/__init__.py, derp

* convert.py: We can't currently support Q8_0 on big endian.

* gguf-py: SpecialVocab: Always try available sources for special token ids

gguf-py: SpecialVocab: Try to load merges from merges.txt if not in tokenizer.json

gguf-py: SpecialVocab: Add 'add_bos_token' type bools to GGUF metadata
u

* cleanup

* Promote add_X_token to GGUF metadata for BOS and EOS

---------

Co-authored-by: Jared Van Bortel <jared@nomic.ai>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2023-11-11 08:04:50 +03:00
Jhen-Jie Hong
4a4fd3eefa
server : allow continue edit on completion mode (#3950)
* server : allow continue edit on completion mode

* server : handle abort case in runCompletion

* server : style improvement
2023-11-10 16:49:33 -06:00
Galunid
df9d1293de
Unbreak persimmon after #3837 (#4010) 2023-11-10 14:24:54 +01:00
Galunid
a75fa576ab
scripts: Generalize convert scripts (#3838)
* Replace convert-*-hf-to-gguf.py files with convert-hf-to-gguf.py
2023-11-09 11:09:29 +01:00
Mihai
57ad015dc3
server : add min_p param (#3877)
* Update server.cpp with min_p after it was introduced in https://github.com/ggerganov/llama.cpp/pull/3841

* Use spaces instead of tabs

* Update index.html.hpp after running deps.sh

* Fix test - fix line ending
2023-11-08 20:00:34 -06:00
Jared Van Bortel
af00cca08e Merge commit 'ec893798b7a2a803466cc8f063051499ec3d96f7' into HEAD 2023-11-08 16:36:00 -05:00
Jared Van Bortel
c438c16896 fix build with external fmtlib (v10)
Co-authored-by: ToKiNoBug <tokinobug@163.com>
2023-11-08 16:31:29 -05:00
Jared Van Bortel
a8cac53207 kompute : fix issues with debug layers 2023-11-08 16:31:29 -05:00
slaren
875fb42871
ggml-alloc : fix backend assignments of views (#3982) 2023-11-08 13:15:14 +01:00
Jared Van Bortel
0a7c980b6f
gguf : track writer state, free unneeded tensors, cleanup (#3871) 2023-11-07 12:43:04 -05:00
Georgi Gerganov
413503d4b9
make : do not add linker flags when compiling static llava lib (#3977) 2023-11-07 20:25:32 +03:00
xaedes
e9c1cecb9d
ggml : fix backward rope after YaRN (#3974)
* fix backward process of rope

rope backward process was broken after YaRN RoPE (#2268) implementation, due to missing changes in backward functions.

the code for the backward process is nearly identically to the forward process:
the only difference is the sign of the sin-values.

to avoid future regressions remove the near-duplicate backward functions and reuse the forward code:

for this a new function argument `bool forward` was added to `ggml_compute_forward_rope_f32` and `ggml_compute_forward_rope_f16`.
the sin-values will be negated when forward is false.

* fix finetune rope call to use correct default attn_factor of 1.0f

* remove unused `ggml_rope_xpos_back`

it is better to have only one `ggml_rope_back` function that accepts all rope parameters, so that `ggml_compute_backward` can propagate all parameters without having to switch between different rope_back variants.

* fix comments explaining the sinus sign in ggml_forward_rope

* add missing function arguments in declaration

* fix function argument type in declaration
2023-11-07 10:04:51 +02:00
Matthew Tejo
54b4df8886
Use params when loading models in llava-cli (#3976)
llava-cli was loading models with default params and ignoring settings
from the cli. This switches to a generic function to load the params
from the cli options.
2023-11-07 10:43:59 +03:00
Meng Zhang
46876d2a2c
cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946)
* protyping the idea that supports running on CPU for a GGML_USE_CUBLAS=on build

* doc: add comments to ggml_cublas_loaded()

* fix defined(...)
2023-11-07 08:49:08 +02:00
Damian Stewart
381efbf480
llava : expose as a shared library for downstream projects (#3613)
* wip llava python bindings compatibility

* add external llava API

* add base64 in-prompt image support

* wip refactor image loading

* refactor image load out of llava init

* cleanup

* further cleanup; move llava-cli into its own file and rename

* move base64.hpp into common/

* collapse clip and llava libraries

* move llava into its own subdir

* wip

* fix bug where base64 string was not removed from the prompt

* get libllava to output in the right place

* expose llava methods in libllama.dylib

* cleanup memory usage around clip_image_*

* cleanup and refactor *again*

* update headerdoc

* build with cmake, not tested (WIP)

* Editorconfig

* Editorconfig

* Build with make

* Build with make

* Fix cyclical depts on Windows

* attempt to fix build on Windows

* attempt to fix build on Windows

* Upd TODOs

* attempt to fix build on Windows+CUDA

* Revert changes in cmake

* Fix according to review comments

* Support building as a shared library

* address review comments

---------

Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2023-11-07 00:36:23 +03:00
slaren
2833a6f63c
ggml-cuda : fix f16 mul mat (#3961)
* ggml-cuda : fix f16 mul mat

ggml-ci

* silence common.cpp warning (bonus)
2023-11-05 18:45:16 +01:00
Kerfuffle
d9ccce2e33
Allow common process_escapes to handle \x sequences (#3928)
* Allow common process_escapes to handle \x sequences

* Fix edge case when second hex digit is NUL
2023-11-05 10:06:06 -07:00
Thái Hoàng Tâm
bb60fd0bf6
server : fix typo for --alias shortcut from -m to -a (#3958) 2023-11-05 18:15:27 +02:00
Jared Van Bortel
132d25b8a6
cuda : fix disabling device with --tensor-split 1,0 (#3951)
Co-authored-by: slaren <slarengh@gmail.com>
2023-11-05 10:08:57 -05:00
Meng Zhang
3d48f42efc
llama : mark LLM_ARCH_STARCODER as full offload supported (#3945)
as done in https://github.com/ggerganov/llama.cpp/pull/3827
2023-11-05 14:40:08 +02:00
Eve
c41ea36eaa
cmake : MSVC instruction detection (fixed up #809) (#3923)
* Add detection code for avx

* Only check hardware when option is ON

* Modify per code review sugguestions

* Build locally will detect CPU

* Fixes CMake style to use lowercase like everywhere else

* cleanup

* fix merge

* linux/gcc version for testing

* msvc combines avx2 and fma into /arch:AVX2 so check for both

* cleanup

* msvc only version

* style

* Update FindSIMD.cmake

---------

Co-authored-by: Howard Su <howard0su@gmail.com>
Co-authored-by: Jeremy Dunn <jeremydunn123@gmail.com>
2023-11-05 10:03:09 +02:00
Eve
a7fac013cf
ci : use intel sde when ci cpu doesn't support avx512 (#3949) 2023-11-05 09:46:44 +02:00
slaren
48ade94538
cuda : revert CUDA pool stuff (#3944)
* Revert "cuda : add ROCM aliases for CUDA pool stuff (#3918)"

This reverts commit 629f917cd6.

* Revert "cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)"

This reverts commit d6069051de.

ggml-ci
2023-11-05 09:12:13 +02:00
Kerfuffle
f28af0d81a
gguf-py: Support 01.AI Yi models (#3943) 2023-11-04 16:20:34 -06:00
cebtenzzre
f88b198885 llama : fix Vulkan whitelist (#11) 2023-11-03 17:22:22 -04:00
Adam Treat
ffd0624be2 Remove this debug code. 2023-11-03 17:22:22 -04:00
Adam Treat
a5eb001eab Revert the prompt processing on gpu for now.
Fixes issues #1580 and #1581
2023-11-03 17:22:22 -04:00
Adam Treat
e006d377dd Scale the workgroup count down to allow correct generation for falcon with
AMD radeon cards with lower workgroup count limit

Partially fixes #1581
2023-11-03 17:22:22 -04:00
cebtenzzre
89b71278ff llama : decide to disable Vulkan before loading tensors (#7) 2023-11-03 17:22:22 -04:00
cebtenzzre
1c17010188 vulkan : fix missing break in matmul selection (#9) 2023-11-03 17:22:22 -04:00
Adam Treat
74ddf0f17d Fix synchronization problem for AMD Radeon with amdvlk driver or windows
drivers. Does not have any performance or fidelity effect on other gpu/driver
combos I've tested.

FIXES: https://github.com/nomic-ai/gpt4all/issues/1507
2023-11-03 17:22:22 -04:00
Adam Treat
8d9efbf97a Lower the workgroup count for some shaders by providing a loop that processes
four floats at a time.
2023-11-03 17:22:22 -04:00
Adam Treat
752f7ebd61 Remove unused push constant that was giving validation errors. 2023-11-03 17:22:22 -04:00
Adam Treat
8400015337 Don't try an allocation on a heap that is smaller than the size we require. 2023-11-03 17:22:22 -04:00
cebtenzzre
cbc0d1af79 kompute : make scripts executable 2023-11-03 17:22:22 -04:00
cebtenzzre
21841d3163 kompute : enable kp_logger and make it static (#8) 2023-11-03 17:22:22 -04:00
Aaron Miller
cc05a602d6 use mat*vec shaders for mat*mat
I wrote the mat*mat shaders from scratch so I understand them better but
they are currently not faster than just multiply-invoking the mat*vec
shaders, by a significant degree - so, except for f32 which needed a new
shader, revert to the m*v ones here.
2023-11-03 17:22:22 -04:00
Aaron Miller
c1fd64548d attempted speedups 2 2023-11-03 17:22:22 -04:00
Aaron Miller
9bc52ebae3 attempted speedups 2023-11-03 17:22:22 -04:00
Aaron Miller
8dc79ac380 clean up vulkan/cpu switch 2023-11-03 17:22:22 -04:00
Aaron Miller
cd0257ed0d q4_1 mat*mat 2023-11-03 17:22:22 -04:00
Aaron Miller
4809890d80 rm commented dbg print 2023-11-03 17:22:22 -04:00
Aaron Miller
b78a94bc6d q6k mm works 2023-11-03 17:22:22 -04:00
Aaron Miller
d5741c07a5 use op param epsilon for norms 2023-11-03 17:22:22 -04:00
Aaron Miller
3327d84a7f perf: use bigger threadgroups in mm 2023-11-03 17:22:22 -04:00
Aaron Miller
46385ee0d5 misc vulkan cleanup
make pushconts consistent w/ dispatch, avoid a double free
2023-11-03 17:22:22 -04:00
Aaron Miller
f0cd38b9ad add mat*mat ops 2023-11-03 17:22:22 -04:00
Adam Treat
09d83f0401 Delete TODO now that we have q8_0. 2023-11-03 17:22:22 -04:00
Aaron Miller
8564f79036 falcon h2d + reenable vulkan 2023-11-03 17:22:22 -04:00
Aaron Miller
020b1745a0 vulkan: implement neox mode for rope 2023-11-03 17:22:21 -04:00