Georgi Gerganov
83e1490187
server : fix slot reuse
2023-10-22 21:57:23 +03:00
Georgi Gerganov
8fe7ca4875
server : apply fix from #3722
2023-10-22 21:05:45 +03:00
Georgi Gerganov
00ae55b388
server : hide ctx_sampling->prev behind API ( #3696 )
2023-10-22 20:09:25 +03:00
M. Yusuf Sarıgöz
3d6a687f1d
Update readme to document multimodal in server
2023-10-22 20:03:35 +03:00
Georgi Gerganov
dd1af2ed35
server : minor style
2023-10-22 19:52:50 +03:00
M. Yusuf Sarıgöz
a4d69d8b81
Merge branch 'server-rev' of https://github.com//ggerganov/llama.cpp into server-rev
2023-10-22 19:49:48 +03:00
M. Yusuf Sarıgöz
2679c432d5
Update readme to document multimodal in server
2023-10-22 19:49:33 +03:00
Georgi Gerganov
a8063171bd
server : completion requests remember slot_id
2023-10-22 19:34:48 +03:00
Georgi Gerganov
f305d6434f
editorconfig : new line in index.html
2023-10-22 19:10:30 +03:00
M. Yusuf Sarıgöz
5359fb9267
Do not save/load image_data to localStorage
2023-10-22 19:08:09 +03:00
Georgi Gerganov
f67d971344
server : bug fix for prompt caching
2023-10-22 17:52:59 +03:00
Georgi Gerganov
569ebf11cf
server : refactor ctx_sampling init + n_ctx + names
2023-10-22 16:57:05 +03:00
Georgi Gerganov
ef18f4d579
server : fix crash in Debug on macOS (I have no idea why this fixes it!?)
2023-10-22 16:55:40 +03:00
Georgi Gerganov
197a0a9e23
server : fix switch fallthrough
2023-10-22 16:55:05 +03:00
Georgi Gerganov
715f384a6b
clip : link to ggml, not to llama
2023-10-22 16:52:12 +03:00
Georgi Gerganov
4b4ab722ab
make : silence stb warnings
2023-10-22 16:51:59 +03:00
Georgi Gerganov
176993c871
Merge branch 'master' into server-rev
2023-10-22 15:04:16 +03:00
Georgi Gerganov
22c69a2794
batched : add len CLI argument
2023-10-22 08:37:20 +03:00
FSSRepo
2eb4c11ec5
fix image load + view image in chat
2023-10-21 14:34:19 -04:00
Jhen-Jie Hong
17b23eb9cb
server : fix multibyte handle in partial response ( #3706 )
2023-10-21 14:58:03 +03:00
shibe2
465219b914
CLBlast: Add outer loops over src0 for broadcasting in mulmat
Reduce repeated dequantization of the same data.
2023-10-20 22:30:52 +04:00
Georgi Gerganov
d1031cf49c
sampling : refactor init to use llama_sampling_params ( #3696 )
* sampling : refactor init to use llama_sampling_params
* llama : combine repetition, frequency and presence penalties in 1 call
* examples : remove embd-input and gptneox-wip
* sampling : rename penalty params + reduce size of "prev" vector
* sampling : add llama_sampling_print helper
* sampling : hide prev behind API and apply #3661
ggml-ci
2023-10-20 21:07:23 +03:00
Georgi Gerganov
778c070d1b
server : logs + minor code style
2023-10-20 20:44:51 +03:00
Georgi Gerganov
5d540e80d1
server : no need for atomic int - already using mutex
2023-10-20 20:44:29 +03:00
Georgi Gerganov
113dd60005
server : batch has to be allocated for n_parallel sequences
2023-10-20 20:42:45 +03:00
FSSRepo
6b2437e32d
added thread safe pipeline
2023-10-20 12:07:32 -04:00
Qin Yue Chen
8cf19d60dc
gguf : support big endian platform ( #3552 )
* check whether platform is s390x; if yes, do not import immintrin.h
* support s390x big endian
* support --bigendian option for s390x
1. verified with baichuan7b-chat with float 16 on s390x
2. verified with baichuan7b-chat
3. verified with chinese-alpaca-2-13b-f16
* update format based on editor-config checker result
* Update convert-baichuan-hf-to-gguf.py
* 1. check in ggml.c if endianness does not match
2. update GGUF version
3. change get_pack_prefix to property
4. update information log
* always use "GGUF" as beginning of GGUF file
* Compare "GGUF" with file header char by char
1. Set GGUF_MAGIC to "GGUF" string instead of int value
2. Compare "GGUF" char by char to ensure its byte order
3. Move bytes swap code from convert.py to gguf.py write_tensor_data
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-20 14:19:40 +03:00
Georgi Gerganov
a0edf73bda
server : fix uninitialized sampling context ( close #3685 )
2023-10-20 13:06:10 +03:00
Herman Semenov
f439e506e8
ggml : fix rope + llama minor optimizations ( #3560 )
* Minor fixes and fixed memleak
* Using const auto references in range-based loop C++17
2023-10-20 13:02:12 +03:00
cebtenzzre
e78f3ef24a
convert : restore compat with old Falcon models ( #3680 )
2023-10-20 08:32:08 +03:00
M. Yusuf Sarıgöz
f3b25e4043
multimodal : add BakLLaVA conversion support ( #3682 )
2023-10-19 19:40:41 +03:00
M. Yusuf Sarıgöz
60abea9798
llava : avoid segfault in case of non-existent mmproj file ( #3674 )
2023-10-19 16:59:11 +03:00
Georgi Gerganov
325d1793f7
server : minor sync
2023-10-19 15:03:24 +03:00
Georgi Gerganov
9740824ba5
server : snake case
2023-10-19 14:44:37 +03:00
Georgi Gerganov
e3a2c3fe32
server : use refs + use llama_batch_clear()
2023-10-19 14:44:04 +03:00
Georgi Gerganov
3d5929e8ee
server : bug fix in ingest_images
n_tokens is incremented internally by llama_batch_add
2023-10-19 14:43:19 +03:00
Georgi Gerganov
a8c981b734
server : remove beam-search functionality
2023-10-19 14:10:37 +03:00
Georgi Gerganov
654e0a1fe0
server : coding-style normalization (part 2)
2023-10-19 14:09:45 +03:00
Georgi Gerganov
e44ed60187
server : coding-style normalization
2023-10-19 13:50:23 +03:00
FSSRepo
ab2fc00224
latest changes of sampling API
2023-10-18 16:57:48 -04:00
FSSRepo
8540568c48
Merge branch 'master' of https://github.com/ggerganov/llama.cpp
2023-10-18 16:55:26 -04:00
FSSRepo
7196c4e08a
new sampling API
2023-10-18 16:50:09 -04:00
Georgi Gerganov
004797f6ac
readme : update hot topics
2023-10-18 21:44:43 +03:00
Georgi Gerganov
4e82b2ea3f
speculative : bug fixes
2023-10-18 18:49:40 +03:00
Georgi Gerganov
0e89203b51
speculative : add tree-based sampling example ( #3624 )
* sampling : one sequence per sampling context
ggml-ci
* speculative : add tree-based sampling support
ggml-ci
* speculative : reuse the n_parallel CLI param
* speculative : refactor sampling
* examples : fix build after sampling refactoring
ggml-ci
* batched : fix n_seq_id
* sampling : fix malloc
ggml-ci
* swift : fix build
ggml-ci
* swift : try to fix build
ggml-ci
* prompts : add assistant.txt
* common : add llama_batch_add() and llama_batch_clear() helpers
* speculative : minor refactor
ggml-ci
* minor : comments + rename
ggml-ci
* speculative : fix off-by-one for n_drafted
* speculative : fix the n_drafted fix + p constants
2023-10-18 16:21:57 +03:00
Steward Garcia
84b8f2b060
Merge branch 'ggerganov:master' into master
2023-10-18 08:43:17 -04:00
Jhen-Jie Hong
c67fe68e41
metal : implement q5_0 and q5_1 kernels ( #3648 )
* metal : implement dequantize_q5_0
* metal : block_q_n_dot_y for block_q5_0 (broken)
* metal : revert unnecessary change
* metal : implement dequantize_q5_1
* metal : block_q_n_dot_y for q5_1 (broken)
* metal : fix block_q_n_dot_y
* minor : spaces / formatting
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-10-18 15:21:48 +03:00
shibe2
1117d06607
opencl : fix element-wise multiplication ( #3656 )
2023-10-18 15:09:22 +03:00
FSSRepo
35fd37430f
fix zig build
2023-10-17 18:04:26 -04:00
FSSRepo
c02c52efb5
fix multiple clients
2023-10-17 17:54:56 -04:00