Commit Graph

1463 Commits

Author SHA1 Message Date
Georgi Gerganov
176993c871
Merge branch 'master' into server-rev 2023-10-22 15:04:16 +03:00
Georgi Gerganov
22c69a2794
batched : add len CLI argument 2023-10-22 08:37:20 +03:00
FSSRepo
2eb4c11ec5
fix image load + view image in chat 2023-10-21 14:34:19 -04:00
Jhen-Jie Hong
17b23eb9cb
server : fix multibyte handle in partial response (#3706) 2023-10-21 14:58:03 +03:00
shibe2
465219b914
CLBlast: Add outer loops over src0 for broadcasting in mulmat
Reduce repeated dequantization of the same data.
2023-10-20 22:30:52 +04:00
Georgi Gerganov
d1031cf49c
sampling : refactor init to use llama_sampling_params (#3696)
* sampling : refactor init to use llama_sampling_params

* llama : combine repetition, frequency and presence penalties in 1 call

* examples : remove embd-input and gptneox-wip

* sampling : rename penalty params + reduce size of "prev" vector

* sampling : add llama_sampling_print helper

* sampling : hide prev behind API and apply #3661

ggml-ci
2023-10-20 21:07:23 +03:00
Georgi Gerganov
778c070d1b
server : logs + minor code style 2023-10-20 20:44:51 +03:00
Georgi Gerganov
5d540e80d1
server : no need for atomic int - already using mutex 2023-10-20 20:44:29 +03:00
Georgi Gerganov
113dd60005
server : batch has to be allocated for n_parallel sequences 2023-10-20 20:42:45 +03:00
FSSRepo
6b2437e32d
added thread safe pipeline 2023-10-20 12:07:32 -04:00
Qin Yue Chen
8cf19d60dc
gguf : support big endian platform (#3552)
* check whether platform is s390x; if yes, do not import immintrin.h

* support s390x big endian

* support --bigendian option for s390x
1. verified with baichuan7b-chat with float 16 on s390x
2. verified with baichuan7b-chat
3. verified with chinese-alpaca-2-13b-f16

* update format based on editor-config checker result

* Update convert-baichuan-hf-to-gguf.py

* 1. check in ggml.c whether the endianness does not match
2. update GGUF version
3. change get_pack_prefix to property
4. update information log

* always use "GGUF" as beginning of GGUF file

* Compare "GGUF" with file header char by char
1. Set GGUF_MAGIC to "GGUF" string instead of int value
2. Compare "GGUF" char by char to ensure its byte order
3. Move bytes swap code from convert.py to gguf.py write_tensor_data

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-20 14:19:40 +03:00
Georgi Gerganov
a0edf73bda
server : fix uninitialized sampling context (close #3685) 2023-10-20 13:06:10 +03:00
Herman Semenov
f439e506e8
ggml : fix rope + llama minor optimizations (#3560)
* Minor fixes and fixed memleak

* Using const auto references in range-based loop C++17
2023-10-20 13:02:12 +03:00
cebtenzzre
e78f3ef24a
convert : restore compat with old Falcon models (#3680) 2023-10-20 08:32:08 +03:00
M. Yusuf Sarıgöz
f3b25e4043
multimodal : add BakLLaVA conversion support (#3682) 2023-10-19 19:40:41 +03:00
M. Yusuf Sarıgöz
60abea9798
llava : avoid segfault in case of non-existent mmproj file (#3674) 2023-10-19 16:59:11 +03:00
Georgi Gerganov
325d1793f7
server : minor sync 2023-10-19 15:03:24 +03:00
Georgi Gerganov
9740824ba5
server : snake case 2023-10-19 14:44:37 +03:00
Georgi Gerganov
e3a2c3fe32
server : use refs + use llama_batch_clear() 2023-10-19 14:44:04 +03:00
Georgi Gerganov
3d5929e8ee
server : bug fix in ingest_images
n_tokens is incremented internally by llama_batch_add
2023-10-19 14:43:19 +03:00
Georgi Gerganov
a8c981b734
server : remove beam-search functionality 2023-10-19 14:10:37 +03:00
Georgi Gerganov
654e0a1fe0
server : coding-style normalization (part 2) 2023-10-19 14:09:45 +03:00
Georgi Gerganov
e44ed60187
server : coding-style normalization 2023-10-19 13:50:23 +03:00
FSSRepo
ab2fc00224
latest changes of sampling API 2023-10-18 16:57:48 -04:00
FSSRepo
8540568c48
Merge branch 'master' of https://github.com/ggerganov/llama.cpp 2023-10-18 16:55:26 -04:00
FSSRepo
7196c4e08a
new sampling API 2023-10-18 16:50:09 -04:00
Georgi Gerganov
004797f6ac
readme : update hot topics 2023-10-18 21:44:43 +03:00
Georgi Gerganov
4e82b2ea3f
speculative : bug fixes 2023-10-18 18:49:40 +03:00
Georgi Gerganov
0e89203b51
speculative : add tree-based sampling example (#3624)
* sampling : one sequence per sampling context

ggml-ci

* speculative : add tree-based sampling support

ggml-ci

* speculative : reuse the n_parallel CLI param

* speculative : refactor sampling

* examples : fix build after sampling refactoring

ggml-ci

* batched : fix n_seq_id

* sampling : fix malloc

ggml-ci

* swift : fix build

ggml-ci

* swift : try to fix build

ggml-ci

* prompts : add assistant.txt

* common : add llama_batch_add() and llama_batch_clear() helpers

* speculative : minor refactor

ggml-ci

* minor : comments + rename

ggml-ci

* speculative : fix off-by-one for n_drafted

* speculative : fix the n_drafted fix + p constants
2023-10-18 16:21:57 +03:00
Steward Garcia
84b8f2b060
Merge branch 'ggerganov:master' into master 2023-10-18 08:43:17 -04:00
Jhen-Jie Hong
c67fe68e41
metal : implement q5_0 and q5_1 kernels (#3648)
* metal : implement dequantize_q5_0

* metal : block_q_n_dot_y for block_q5_0 (broken)

* metal : revert unnecessary change

* metal : implement dequantize_q5_1

* metal : block_q_n_dot_y for q5_1 (broken)

* metal : fix block_q_n_dot_y

* minor : spaces / formatting

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-18 15:21:48 +03:00
shibe2
1117d06607
opencl : fix element-wise multiplication (#3656) 2023-10-18 15:09:22 +03:00
FSSRepo
35fd37430f
fix zig build 2023-10-17 18:04:26 -04:00
FSSRepo
c02c52efb5
fix multiple clients 2023-10-17 17:54:56 -04:00
FSSRepo
d2b1fac6c7
fix make build errors 2023-10-17 17:18:56 -04:00
FSSRepo
ed0c11cb83
multimodal support enabled by default 2023-10-17 16:58:20 -04:00
FSSRepo
6c277eaab5
update api like OpenAI 2023-10-17 16:53:38 -04:00
FSSRepo
58f8ae9bfe
readme change 2023-10-17 16:32:19 -04:00
FSSRepo
fa0f22f14f
Merge remote-tracking branch 'upstream/master' 2023-10-17 16:31:33 -04:00
slaren
cb33f43a2a
fix embeddings when using CUDA (#3657) 2023-10-17 22:24:50 +02:00
FSSRepo
aa2268f4cd
sync README.md changes 2023-10-17 16:21:05 -04:00
Georgi Gerganov
e1675d133c
llama : avoid fprintf in favor of LLAMA_LOG (#3538) 2023-10-17 22:34:26 +03:00
BarfingLemurs
8402566a7c
readme : update hot-topics & models, detail windows release in usage (#3615)
* Update README.md

* Update README.md

* Update README.md

* move "Running on Windows" section below "Prepare data and run"

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 21:13:21 +03:00
shibe2
40e5ce054f
CLBlast: Fix temporary buffer size for f16 conversion (wsize)
Fix buffer overflow.
Reduce the size to fit just one 2D slice.
Assert sufficient size.
2023-10-17 21:02:30 +04:00
slaren
a5e8c1d8c7
train-text-from-scratch : fix assert failure in ggml-alloc (#3618) 2023-10-17 20:00:58 +03:00
Georgi Gerganov
e74c705e15
editorconfig : remove trailing spaces 2023-10-17 19:52:53 +03:00
coezbek
3ad1e3f1a1
server : documentation of JSON return value of /completion endpoint (#3632)
* Added documentation of JSON return value of /completion endpoint

* Update examples/server/README.md

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 19:51:02 +03:00
Georgi Gerganov
1142013da4
save-load-state : fix example + add ci test (#3655)
* save-load-state : fix example (close #3606)

* ci : add test for save-load-state example

ggml-ci
2023-10-17 19:12:46 +03:00
ldwang
5fe268a4d9
readme : add Aquila2 links (#3610)
Signed-off-by: ldwang <ftgreat@gmail.com>
Co-authored-by: ldwang <ftgreat@gmail.com>
2023-10-17 18:52:33 +03:00
staviq
1a159553f9
tokenizer : special token handling (#3538)
* Rewrite special token handling from #1931

* shorten param name, add st verification by type

* use offsets instead of copy by substr

* formatting, remove copying iterator on delete

* llama : normalize code-style

* swift fix

* print pfx/sfx if verb, main: split pfx input sfx

* don't add space when using special tokens

* minor : comment + spacing

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-17 18:11:01 +03:00