Howard Su
94ddd6204c
Simplify the logic of scheduling
2023-04-10 22:37:37 +08:00
Howard Su
6d18c6ea3e
Fix the number of forward looking nodes
2023-04-10 22:37:10 +08:00
Howard Su
6f2a61eb4f
Rework scheduling algorithm.
2023-04-10 22:24:27 +08:00
Howard Su
2035a3cc29
avoid changing ggml_task_type
2023-04-09 22:11:24 +08:00
Howard Su
3b03df5c05
look forward more
2023-04-08 19:55:29 +08:00
Howard Su
921296c0d5
avoid malloc/free in critical path
2023-04-08 00:47:19 +08:00
Howard Su
455f6f79bc
Try to find another single-threaded operator to run
2023-04-08 00:34:05 +08:00
Howard Su
43dde039b0
Run second operator when possible
2023-04-07 23:51:46 +08:00
Howard Su
c640d2a4bd
Remove finalizer
2023-04-07 22:24:14 +08:00
Howard Su
b8c9b27452
Merge remote-tracking branch 'tp/Pithikos-C-Thread-Pool2' into tp_schedule
2023-04-07 21:31:07 +08:00
Howard Su
5ad9e9531f
Only check hardware when option is ON
2023-04-07 21:04:47 +08:00
Howard Su
997c749065
Add detection code for AVX
2023-04-06 20:46:07 +08:00
Pavol Rusnak
d2beca95dc
Make docker instructions more explicit (#785)
2023-04-06 08:56:58 +02:00
Georgi Gerganov
eeaa7b0492
ggml : multi-thread ggml_rope() (~3-4 times faster on M1) (#781)
2023-04-05 22:11:03 +03:00
Georgi Gerganov
986b6ce9f9
ggml, llama : avoid heavy V transpose + improvements (#775)
...
ggml :
- added ggml_view_3d()
- ggml_view_tensor() now inherits the stride too
- reimplement ggml_cpy() to account for dst stride
- no longer require tensor->data to be memory aligned
llama :
- compute RoPE on 32-bit tensors (should be more accurate)
- store RoPE-ed K in the KV cache
- store transposed V in the KV cache (significant speed-up)
- avoid unnecessary Q copy
2023-04-05 22:07:33 +03:00
Georgi Gerganov
3416298929
Update README.md
2023-04-05 19:54:30 +03:00
Ivan Stepanov
5a8c4f6240
llama : define non-positive top_k; top_k range check (#779)
...
* Define non-positive top_k; top_k range check
* minor : brackets
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-05 19:20:05 +03:00
at8u
ff05d05c96
miku.sh : add executable bit (#780)
2023-04-05 18:59:13 +03:00
Georgi Gerganov
62b3e81aae
media : add logos and banners
2023-04-05 18:58:31 +03:00
Georgi Gerganov
8d10406d6e
readme : change logo + add bindings + add uis + add wiki
2023-04-05 18:56:20 +03:00
iacore
ed1c214e66
zig : add build.zig (#773)
...
Co-authored-by: Locria Cyber <74560659+locriacyber@users.noreply.github.com>
2023-04-05 18:06:02 +03:00
Ivan Stepanov
0c44427df1
make : missing host optimizations in CXXFLAGS (#763)
2023-04-05 17:38:37 +03:00
Adithya Balaji
594cc95fab
readme : update with CMake and Windows example (#748)
...
* README: Update with CMake and windows example
* README: update with code-review for cmake build
2023-04-05 17:36:12 +03:00
at8u
88ed5761b8
examples : add Miku.sh (#724)
...
* Add Miku.sh to examples
* Add missing line to prompt in Miku.sh
* Add --keep param to Miku.sh
* Remove '[end_of_conversation]' line from Miku.sh
It is no longer necessary.
2023-04-05 17:32:42 +03:00
Andrew Duffy
58c438cf7d
Add Accelerate/BLAS when using Swift (#765)
2023-04-05 06:44:24 -04:00
mgroeber9110
53dbba7695
Windows: reactivate sigint handler after each Ctrl-C (#736)
2023-04-03 18:00:55 +02:00
SebastianApel
437e77855a
10+% performance improvement of ggml_vec_dot_q4_0 on AVX2 (#654)
...
* Performance improvement of AVX2 code
* Fixed problem with MSVC compiler
* Reviewer comments: removed double semicolon, deleted empty line 1962
2023-04-03 09:52:28 +02:00
Ivan Stepanov
cd7fa95690
Define non-positive temperature behavior (#720)
2023-04-03 02:19:04 +02:00
bsilvereagle
a0c0516416
Remove torch GPU dependencies from the Docker.full image (#665)
...
By using `pip install torch --index-url https://download.pytorch.org/whl/cpu`
instead of `pip install torch`, we can specify that we want to install a CPU-only version
of PyTorch without any GPU dependencies. This reduces the size of the Docker image
from 7.32 GB to 1.62 GB.
2023-04-03 00:13:03 +02:00
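The CPU-only install described in the commit above can be sketched as a Dockerfile fragment. This is a minimal illustration only; the base image, file layout, and image name are assumptions, not the repository's actual Dockerfile:

```Dockerfile
# Assumed base image; the project's real full image may differ.
FROM python:3.10-slim

# Install the CPU-only PyTorch wheel so no CUDA/GPU dependencies are
# pulled in, which is what shrinks the image as described above.
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu
```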
Thatcher Chamberlin
d8d4e865cd
Add a missing step to the gpt4all instructions (#690)
...
`migrate-ggml-2023-03-30-pr613.py` is needed to get gpt4all running.
2023-04-02 12:48:57 +02:00
Christian Falch
e986f94829
Added API for getting/setting the kv_cache (#685)
...
The API provides access methods for retrieving the current memory buffer for the kv_cache and its token count.
It also contains a method for setting the kv_cache from a memory buffer.
This makes it possible to load/save history - maybe support a --cache-prompt parameter as well?
Co-authored-by: Pavol Rusnak <pavol@rusnak.io>
2023-04-02 12:23:04 +02:00
Marian Cepok
c0bb1d3ce2
ggml : change ne to int64_t (#626)
2023-04-02 13:21:31 +03:00
Leonardo Neumann
6e7801d08d
examples : add gpt4all script (#658)
2023-04-02 10:56:20 +03:00
Stephan Walter
81040f10aa
llama : do not allocate KV cache for "vocab_only == true" (#682)
...
Fixes sanitizer CI
2023-04-02 10:18:53 +03:00
Fabian
c4f89d8d73
make : use -march=native -mtune=native on x86 (#609)
2023-04-02 10:17:05 +03:00
Murilo Santana
5b70e7de4c
fix default params for examples/main (#697)
2023-04-02 04:41:12 +02:00
Vladimir
d3bc4df97d
fix Windows build
2023-04-01 20:18:04 +02:00
Vladimir
a65d37ad36
using GitHub's Pithikos/C-Thread-Pool for threading
2023-04-01 20:18:04 +02:00
Vladimir
21e88c8b0f
run sanitizers in release, otherwise too slow (#5)
2023-04-01 20:16:36 +02:00
Ikko Eltociear Ashimine
a717cba844
py: huggingface -> Hugging Face (#686)
2023-04-01 18:38:18 +02:00
rimoliga
d0a7f742e7
readme: replace Termux links with homepage, Play Store is deprecated (#680)
2023-04-01 16:57:30 +02:00
Slaren
0d054e292e
Show error message when -f fails
2023-04-01 16:08:40 +02:00
Stephan Walter
3525899277
Enable -std= for cmake builds, fix warnings (#598)
2023-03-31 19:19:16 +00:00
slaren
1d08882afa
Optimize AVX2 ggml_vec_dot_q4_0 (#642)
2023-03-31 15:55:52 +00:00
perserk
02c5b27e91
Add AVX acceleration (#617)
...
* ggml : add AVX quantize_row_q4_0()
* ggml : add AVX ggml_vec_dot_q4_0()
* ggml : refactor AVX part of ggml_vec_dot_q4_0()
https://github.com/ggerganov/llama.cpp/pull/617#issuecomment-1489985645
2023-03-31 13:55:44 +02:00
Pavol Rusnak
cbef542879
py : cleanup the code
...
- use f-strings where possible
- drop first param of encode/decode functions since "utf-8" is the default
2023-03-31 10:32:01 +02:00
Pavol Rusnak
9733104be5
drop quantize.py (now that models are using a single file)
2023-03-31 01:07:32 +02:00
Georgi Gerganov
3df890aef4
readme : update supported models
2023-03-30 22:31:54 +03:00
Justine Tunney
ee0c40dd6d
Introduce GGML migration tool for new file format
...
If you deleted your old Meta LLaMA .pth files, then the
migrate-ggml-2023-03-30-pr613.py script will allow you to convert your
old ggml files into the new mmap()'able format.
See #613
2023-03-30 12:28:25 -07:00
Justine Tunney
6f23ba5ee2
Ensure --mlock works properly with mmap() support
2023-03-30 12:28:25 -07:00