390 Commits

Author SHA1 Message Date
FSSRepo
c02c52efb5 fix multiple clients 2023-10-17 17:54:56 -04:00
FSSRepo
d2b1fac6c7 fix make bui;d errors 2023-10-17 17:18:56 -04:00
FSSRepo
ed0c11cb83 multimodal support enabled by default 2023-10-17 16:58:20 -04:00
FSSRepo
6c277eaab5 update api like OpenAI 2023-10-17 16:53:38 -04:00
FSSRepo
58f8ae9bfe readme change 2023-10-17 16:32:19 -04:00
FSSRepo
fa0f22f14f Merge remote-tracking branch 'upstream/master' 2023-10-17 16:31:33 -04:00
FSSRepo
aa2268f4cd sync README.md changes 2023-10-17 16:21:05 -04:00
Georgi Gerganov
e1675d133c
llama : avoid fprintf in favor of LLAMA_LOG (#3538) 2023-10-17 22:34:26 +03:00
slaren
a5e8c1d8c7
train-text-from-scratch : fix assert failure in ggml-alloc (#3618) 2023-10-17 20:00:58 +03:00
Georgi Gerganov
e74c705e15
editorconfig : remove trailing spaces 2023-10-17 19:52:53 +03:00
coezbek
3ad1e3f1a1
server : documentation of JSON return value of /completion endpoint (#3632)
* Added documentation of JSON return value of /completion endpoint

* Update examples/server/README.md

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 19:51:02 +03:00
Georgi Gerganov
1142013da4
save-load-state : fix example + add ci test (#3655)
* save-load-state : fix example (close #3606)

* ci : add test for save-load-state example

ggml-ci
2023-10-17 19:12:46 +03:00
staviq
1a159553f9
tokenizer : special token handling (#3538)
* Rewrite special token handling from #1931

* shorten param name, add st verification by type

* use offsets instead of copy by substr

* formatting, remove copying iterator on delete

* llama : normalize code-style

* swift fix

* print pfx/sfx if verb, main: split pfx input sfx

* dont add space when using special tokens

* minor : comment + spacing

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 18:11:01 +03:00
Georgi Gerganov
940efa95fe
llava : fix tokenization to not add bos between image embeddings and user prompt (#3645)
* llava : fix tokenization to not add bos after system prompt

* set seed

---------

Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
2023-10-16 23:58:00 +03:00
FSSRepo
4d1804330e fix llava implementation 2023-10-16 16:31:17 -04:00
FSSRepo
d7eca255d7 context shift fixed 2023-10-16 14:43:10 -04:00
FSSRepo
2d9f11db28 fixed premature end due stop word 2023-10-16 12:36:05 -04:00
FSSRepo
fd64f04fc2 fix long prompt than ctx proposed in #3639 2023-10-15 19:07:18 -04:00
FSSRepo
b727e022d6 fix ci make build undefined ref errors 2023-10-15 18:53:48 -04:00
FSSRepo
ce961a304b some ci fixes 2023-10-15 18:46:01 -04:00
Steward Garcia
9035978aae
Merge pull request #6 from damian0815/fssrepo_mac_fixes
fix compilation errors with llvm
2023-10-15 18:38:52 -04:00
Steward Garcia
f47fd17b73
Merge branch 'ggerganov:master' into master 2023-10-15 18:23:47 -04:00
FSSRepo
4e5c5c451c notify the user from server ui that multimodality is unavialable 2023-10-14 08:28:49 -04:00
M. Yusuf Sarıgöz
11dc1091f6
Honor -ngl option for Cuda offloading in llava (#3621) 2023-10-14 04:52:44 -06:00
Damian Stewart
299f6b54d8 fix compilation errors with llvm 2023-10-14 11:17:38 +02:00
FSSRepo
7e64bfe060 refactor code + remove unused comments + improved README.md 2023-10-14 00:31:34 -04:00
FSSRepo
9f72b44635 add multimodal input - alfa 2023-10-13 23:36:32 -04:00
FSSRepo
de35b47908 fixed tokens probs 2023-10-13 19:55:25 -04:00
FSSRepo
9d98cdda2c llava multimodal integration 2023-10-13 18:42:44 -04:00
FSSRepo
eb08201227 add changes to README.md 2023-10-13 14:28:06 -04:00
FSSRepo
a2c2d98c16 add context swap 2023-10-13 14:12:50 -04:00
FSSRepo
b6d9e212e5 fixed timings per slot 2023-10-13 13:10:38 -04:00
FSSRepo
a410a9e300 unused change reverted 2023-10-13 12:23:58 -04:00
FSSRepo
6358ae5f48 server ui now support multiple clients 2023-10-13 12:22:54 -04:00
FSSRepo
4ba5a5013d chat.mjs support cached prompt + some fixes 2023-10-13 11:06:41 -04:00
slaren
424b6381c4
ggml : add context enumeration functions (#3605)
finetune : fix assert failure in ggml-alloc
2023-10-13 12:23:10 +02:00
FSSRepo
500ac7120e cached prompt support 2023-10-12 21:16:12 -04:00
FSSRepo
83c2b3553a grammar + no stream completion 2023-10-12 18:43:57 -04:00
FSSRepo
5b8e29de53 multiple client support 2023-10-12 17:09:12 -04:00
FSSRepo
81484805f0 completion endpoint working 2023-10-12 16:17:27 -04:00
FSSRepo
29c8cdd65d refactored sampling function 2023-10-12 15:02:19 -04:00
FSSRepo
b716eeb72a Merge branch 'master' of https://github.com/ggerganov/llama.cpp 2023-10-12 12:55:08 -04:00
FSSRepo
78504218b9 save dev progress 2023-10-12 12:51:48 -04:00
M. Yusuf Sarıgöz
370359e5ba
examples: support LLaVA v1.5 (multimodal model) (#3436)
* WIP: start implementing LLaVA

* rm scratch buf for now, will revert after cleanup

* LLaVA image encoder is working. will combine with llama

* Add llava inference code, but it's buggy. debugging

* LLaVA is working e2e, needs to optimize memory allocation + cleanup

* Use ggml_allocr + rm unnecessary code

* fix: crlf -> lf

* fix: new line at EoF

* fix: trailing whitespace

* Add readme

* Update readme

* Some cleanup

* Are you happy editorconfig?

* rm unused batch image preprocessing

* rm unused import

* fix: rm designated initializers

* introduce pad-to-square mode for non-square images

* are you happy editorconfig?

* gitignore /llava

* Handle cases where image file does not exist

* add llava target to Makefile

* add support for 13b model variant

* Maybe seed is unlucky?

* Check if apples are compared to apples

* are you happy editorconfig?

* Use temperature = 0.1 by default

* command line: use gpt_params_parse()

* minor

* handle default n_predict

* fix typo

* llava : code formatting, rename files, fix compile warnings

* do not use Wno-cast-qual for MSVC

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-12 18:23:18 +03:00
Aarni Koskela
b016596d90
server : add completion mode (no chat) (#3582) 2023-10-12 09:51:53 +03:00
Georgi Gerganov
57dd55e2c7
server : fix kv cache management (#3588) 2023-10-12 09:29:04 +03:00
FSSRepo
471230202d crash fixed 2023-10-11 19:48:15 -04:00
FSSRepo
63f99b1ea6 implementing parallel decoding in server example 2023-10-11 18:14:11 -04:00
Georgi Gerganov
b8fe4b5cc9
main : fix session loading bug (#3400) 2023-10-11 23:55:41 +03:00
Michael Coppola
a8bdd65525
server : add parameter -tb N, --threads-batch N (#3584)
Co-authored-by: Michael Coppola <info@michaeljcoppola.com>
2023-10-11 22:42:22 +03:00