Commit Graph

314 Commits

Author SHA1 Message Date
Howard Su
94ddd6204c Simplify the logic of scheduling 2023-04-10 22:37:37 +08:00
Howard Su
6d18c6ea3e Fix the number of forward looking nodes 2023-04-10 22:37:10 +08:00
Howard Su
6f2a61eb4f Rework scheduling algorithm. 2023-04-10 22:24:27 +08:00
Howard Su
2035a3cc29 avoid to change ggml_task_type 2023-04-09 22:11:24 +08:00
Howard Su
3b03df5c05 look forward more 2023-04-08 19:55:29 +08:00
Howard Su
921296c0d5 avoid malloc/free in critial path 2023-04-08 00:47:19 +08:00
Howard Su
455f6f79bc Try find other single threaded operator to run 2023-04-08 00:34:05 +08:00
Howard Su
43dde039b0 Run second operator when possible 2023-04-07 23:51:46 +08:00
Howard Su
c640d2a4bd Remove finalizer 2023-04-07 22:24:14 +08:00
Howard Su
b8c9b27452 Merge remote-tracking branch 'tp/Pithikos-C-Thread-Pool2' into tp_schedule 2023-04-07 21:31:07 +08:00
Howard Su
5ad9e9531f Only check hardware when option is ON 2023-04-07 21:04:47 +08:00
Howard Su
997c749065 Add detection code for avx 2023-04-06 20:46:07 +08:00
Pavol Rusnak
d2beca95dc
Make docker instructions more explicit (#785) 2023-04-06 08:56:58 +02:00
Georgi Gerganov
eeaa7b0492
ggml : multi-thread ggml_rope() (~3-4 times faster on M1) (#781) 2023-04-05 22:11:03 +03:00
Georgi Gerganov
986b6ce9f9
ggml, llama : avoid heavy V transpose + improvements (#775)
ggml :

- added ggml_view_3d()
- ggml_view_tensor() now inherits the stride too
- reimplement ggml_cpy() to account for dst stride
- no longer require tensor->data to be memory aligned

llama :

- compute RoPE on 32-bit tensors (should be more accurate)
- store RoPE-ed K in the KV cache
- store transposed V in the KV cache (significant speed-up)
- avoid unnecessary Q copy
2023-04-05 22:07:33 +03:00
Georgi Gerganov
3416298929
Update README.md 2023-04-05 19:54:30 +03:00
Ivan Stepanov
5a8c4f6240
llama : define non-positive top_k; top_k range check (#779)
* Define non-positive top_k; top_k range check

* minor : brackets

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-05 19:20:05 +03:00
at8u
ff05d05c96
miku.sh : add executable bit (#780) 2023-04-05 18:59:13 +03:00
Georgi Gerganov
62b3e81aae
media : add logos and banners 2023-04-05 18:58:31 +03:00
Georgi Gerganov
8d10406d6e
readme : change logo + add bindings + add uis + add wiki 2023-04-05 18:56:20 +03:00
iacore
ed1c214e66
zig : add build.zig (#773)
Co-authored-by: Locria Cyber <74560659+locriacyber@users.noreply.github.com>
2023-04-05 18:06:02 +03:00
Ivan Stepanov
0c44427df1
make : missing host optimizations in CXXFLAGS (#763) 2023-04-05 17:38:37 +03:00
Adithya Balaji
594cc95fab
readme : update with CMake and windows example (#748)
* README: Update with CMake and windows example

* README: update with code-review for cmake build
2023-04-05 17:36:12 +03:00
at8u
88ed5761b8
examples : add Miku.sh (#724)
* Add Miku.sh to examples

* Add missing line to prompt in Miku.sh

* Add --keep param to Miku.sh

* Remove '[end_of_conversation]' line from Miku.sh

No longer is necessary.
2023-04-05 17:32:42 +03:00
Andrew Duffy
58c438cf7d
Add Accelerate/BLAS when using Swift (#765) 2023-04-05 06:44:24 -04:00
mgroeber9110
53dbba7695
Windows: reactive sigint handler after each Ctrl-C (#736) 2023-04-03 18:00:55 +02:00
SebastianApel
437e77855a
10+% performance improvement of ggml_vec_dot_q4_0 on AVX2 (#654)
* Performance improvement of AVX2 code
* Fixed problem with MSVC compiler
* Reviewer comments: removed double semicolon, deleted empty line 1962
2023-04-03 09:52:28 +02:00
Ivan Stepanov
cd7fa95690
Define non-positive temperature behavior (#720) 2023-04-03 02:19:04 +02:00
bsilvereagle
a0c0516416
Remove torch GPU dependencies from the Docker.full image (#665)
By using `pip install torch --index-url https://download.pytorch.org/whl/cpu`
instead of `pip install torch` we can specify we want to install a CPU-only version
of PyTorch without any GPU dependencies. This reduces the size of the Docker image
from 7.32 GB to 1.62 GB
2023-04-03 00:13:03 +02:00
Thatcher Chamberlin
d8d4e865cd
Add a missing step to the gpt4all instructions (#690)
`migrate-ggml-2023-03-30-pr613.py` is needed to get gpt4all running.
2023-04-02 12:48:57 +02:00
Christian Falch
e986f94829
Added api for getting/setting the kv_cache (#685)
The api provides access methods for retrieving the current memory buffer for the kv_cache and its token number.
It also contains a method for setting the kv_cache from a memory buffer.

This makes it possible to load/save history - maybe support --cache-prompt paramater as well?

Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
2023-04-02 12:23:04 +02:00
Marian Cepok
c0bb1d3ce2
ggml : change ne to int64_t (#626) 2023-04-02 13:21:31 +03:00
Leonardo Neumann
6e7801d08d
examples : add gpt4all script (#658) 2023-04-02 10:56:20 +03:00
Stephan Walter
81040f10aa
llama : do not allocate KV cache for "vocab_only == true" (#682)
Fixes sanitizer CI
2023-04-02 10:18:53 +03:00
Fabian
c4f89d8d73
make : use -march=native -mtune=native on x86 (#609) 2023-04-02 10:17:05 +03:00
Murilo Santana
5b70e7de4c
fix default params for examples/main (#697) 2023-04-02 04:41:12 +02:00
Vladimir
d3bc4df97d fix windows build 2023-04-01 20:18:04 +02:00
Vladimir
a65d37ad36 using github Pithikos/C-Thread-Pool for threading 2023-04-01 20:18:04 +02:00
Vladimir
21e88c8b0f
run sanitizers in release, otherwise too slow (#5) 2023-04-01 20:16:36 +02:00
Ikko Eltociear Ashimine
a717cba844
py: huggingface -> Hugging Face (#686) 2023-04-01 18:38:18 +02:00
rimoliga
d0a7f742e7
readme: replace termux links with homepage, play store is deprecated (#680) 2023-04-01 16:57:30 +02:00
Slaren
0d054e292e Show error message when -f fails 2023-04-01 16:08:40 +02:00
Stephan Walter
3525899277
Enable -std= for cmake builds, fix warnings (#598) 2023-03-31 19:19:16 +00:00
slaren
1d08882afa
Optimize AVX2 ggml_vec_dot_q4_0 (#642) 2023-03-31 15:55:52 +00:00
perserk
02c5b27e91
Add AVX acceleration (#617)
* ggml : add AVX quantize_row_q4_0()

* ggml : add AVX ggml_vec_dot_q4_0()

* ggml : refactor AVX part of ggml_vec_dot_q4_0()

https://github.com/ggerganov/llama.cpp/pull/617#issuecomment-1489985645
2023-03-31 13:55:44 +02:00
Pavol Rusnak
cbef542879 py : cleanup the code
- use f-strings where possible
- drop first param of encode/decode functions since "utf-8" is the default
2023-03-31 10:32:01 +02:00
Pavol Rusnak
9733104be5 drop quantize.py (now that models are using a single file) 2023-03-31 01:07:32 +02:00
Georgi Gerganov
3df890aef4
readme : update supported models 2023-03-30 22:31:54 +03:00
Justine Tunney
ee0c40dd6d Introduce GGML migration tool for new file format
If you deleted your old Meta LLaMA .pth files, then the
migrate-ggml-2023-03-30-pr613.py script will allow you to convert your
old ggml files into the new mmap()'able format.

See #613
2023-03-30 12:28:25 -07:00
Justine Tunney
6f23ba5ee2 Ensure --mlock works properly with mmap() support 2023-03-30 12:28:25 -07:00