llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-27 20:04:35 +00:00

Author	SHA1	Message	Date
M. Yusuf Sarıgöz	5359fb9267	Do not save/load image_data to localStorage	2023-10-22 19:08:09 +03:00
Georgi Gerganov	f67d971344	server : bug fix for prompt caching	2023-10-22 17:52:59 +03:00
Georgi Gerganov	569ebf11cf	server : refactor ctx_sampling init + n_ctx + names	2023-10-22 16:57:05 +03:00
Georgi Gerganov	ef18f4d579	server : fix crash in Debug on macOS (I have no idea why this fixes it!?)	2023-10-22 16:55:40 +03:00
Georgi Gerganov	197a0a9e23	server : fix switch fallthrough	2023-10-22 16:55:05 +03:00
Georgi Gerganov	715f384a6b	clip : link to ggml, not to llama	2023-10-22 16:52:12 +03:00
Georgi Gerganov	176993c871	Merge branch 'master' into server-rev	2023-10-22 15:04:16 +03:00
Georgi Gerganov	22c69a2794	batched : add len CLI argument	2023-10-22 08:37:20 +03:00
FSSRepo	2eb4c11ec5	fix image load + view image in chat	2023-10-21 14:34:19 -04:00
Jhen-Jie Hong	17b23eb9cb	server : fix multibyte handle in partial response (#3706)	2023-10-21 14:58:03 +03:00
Georgi Gerganov	d1031cf49c	sampling : refactor init to use llama_sampling_params (#3696) * sampling : refactor init to use llama_sampling_params * llama : combine repetition, frequency and presence penalties in 1 call * examples : remove embd-input and gptneox-wip * sampling : rename penalty params + reduce size of "prev" vector * sampling : add llama_sampling_print helper * sampling : hide prev behind API and apply #3661 ggml-ci	2023-10-20 21:07:23 +03:00
Georgi Gerganov	778c070d1b	server : logs + minor code style	2023-10-20 20:44:51 +03:00
Georgi Gerganov	5d540e80d1	server : no need for atomic int - already using mutex	2023-10-20 20:44:29 +03:00
Georgi Gerganov	113dd60005	server : batch has to be allocated for n_parallel sequences	2023-10-20 20:42:45 +03:00
FSSRepo	6b2437e32d	added thread safe pipeline	2023-10-20 12:07:32 -04:00
Qin Yue Chen	8cf19d60dc	gguf : support big endian platform (#3552) * check whether platform is 390x if yes->do not import immintrin.h * support s390x big endian * support --bigendian option for s390x 1. verified with baichuan7b-chat with float 16 on s390x 2. verified with baichuan7b-chat 3. verified with chinese-alpaca-2-13b-f16 * update format based on editor-config checker result * Update convert-baichuan-hf-to-gguf.py * 1. check in ggml.c if endianness is not match 2. update GGUF version 3. change get_pack_prefix to property 4. update information log * always use "GGUF" as beginning of GGUF file * Compare "GGUF" with file header char by char 1. Set GGUF_MAGIC to "GGUF" string instead of int value 2. Compare "GGUF" char by char to ensure its byte order 3. Move bytes swap code from convert.py to gguf.py write_tensor_data --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-20 14:19:40 +03:00
Georgi Gerganov	a0edf73bda	server : fix uninitialized sampling context (close #3685)	2023-10-20 13:06:10 +03:00
M. Yusuf Sarıgöz	f3b25e4043	multimodal : add BakLLaVA conversion support (#3682)	2023-10-19 19:40:41 +03:00
M. Yusuf Sarıgöz	60abea9798	llava : avoid segfault in case of non-existent mmproj file (#3674)	2023-10-19 16:59:11 +03:00
Georgi Gerganov	325d1793f7	server : minor sync	2023-10-19 15:03:24 +03:00
Georgi Gerganov	9740824ba5	server : snake case	2023-10-19 14:44:37 +03:00
Georgi Gerganov	e3a2c3fe32	server : use refs + use llama_batch_clear()	2023-10-19 14:44:04 +03:00
Georgi Gerganov	3d5929e8ee	server : bug fix in ingest_images n_tokens is incremented internally by llama_batch_add	2023-10-19 14:43:19 +03:00
Georgi Gerganov	a8c981b734	server : remove beam-search functionality	2023-10-19 14:10:37 +03:00
Georgi Gerganov	654e0a1fe0	server : coding-style normalization (part 2)	2023-10-19 14:09:45 +03:00
Georgi Gerganov	e44ed60187	server : coding-style normalization	2023-10-19 13:50:23 +03:00
FSSRepo	ab2fc00224	latest changes of sampling API	2023-10-18 16:57:48 -04:00
FSSRepo	8540568c48	Merge branch 'master' of https://github.com/ggerganov/llama.cpp	2023-10-18 16:55:26 -04:00
FSSRepo	7196c4e08a	new sampling API	2023-10-18 16:50:09 -04:00
Georgi Gerganov	4e82b2ea3f	speculative : bug fixes	2023-10-18 18:49:40 +03:00
Georgi Gerganov	0e89203b51	speculative : add tree-based sampling example (#3624) * sampling : one sequence per sampling context ggml-ci * speculative : add tree-based sampling support ggml-ci * speculative : reuse the n_parallel CLI param * speculative : refactor sampling * examples : fix build after sampling refactoring ggml-ci * batched : fix n_seq_id * sampling : fix malloc ggml-ci * swift : fix build ggml-ci * swift : try to fix build ggml-ci * prompts : add assistant.txt * common : add llama_batch_add() and llama_batch_clear() helpers * speculative : minor refactor ggml-ci * minor : comments + rename ggml-ci * speculative : fix off-by-one for n_drafted * speculative : fix the n_drafted fix + p constants	2023-10-18 16:21:57 +03:00
FSSRepo	c02c52efb5	fix multiple clients	2023-10-17 17:54:56 -04:00
FSSRepo	d2b1fac6c7	fix make build errors	2023-10-17 17:18:56 -04:00
FSSRepo	ed0c11cb83	multimodal support enabled by default	2023-10-17 16:58:20 -04:00
FSSRepo	6c277eaab5	update api like OpenAI	2023-10-17 16:53:38 -04:00
FSSRepo	58f8ae9bfe	readme change	2023-10-17 16:32:19 -04:00
FSSRepo	fa0f22f14f	Merge remote-tracking branch 'upstream/master'	2023-10-17 16:31:33 -04:00
FSSRepo	aa2268f4cd	sync README.md changes	2023-10-17 16:21:05 -04:00
Georgi Gerganov	e1675d133c	llama : avoid fprintf in favor of LLAMA_LOG (#3538)	2023-10-17 22:34:26 +03:00
slaren	a5e8c1d8c7	train-text-from-scratch : fix assert failure in ggml-alloc (#3618)	2023-10-17 20:00:58 +03:00
Georgi Gerganov	e74c705e15	editorconfig : remove trailing spaces	2023-10-17 19:52:53 +03:00
coezbek	3ad1e3f1a1	server : documentation of JSON return value of /completion endpoint (#3632) * Added documentation of JSON return value of /completion endpoint * Update examples/server/README.md --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-17 19:51:02 +03:00
Georgi Gerganov	1142013da4	save-load-state : fix example + add ci test (#3655) * save-load-state : fix example (close #3606) * ci : add test for save-load-state example ggml-ci	2023-10-17 19:12:46 +03:00
staviq	1a159553f9	tokenizer : special token handling (#3538) * Rewrite special token handling from #1931 * shorten param name, add st verification by type * use offsets instead of copy by substr * formatting, remove copying iterator on delete * llama : normalize code-style * swift fix * print pfx/sfx if verb, main: split pfx input sfx * dont add space when using special tokens * minor : comment + spacing --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-17 18:11:01 +03:00
Georgi Gerganov	940efa95fe	llava : fix tokenization to not add bos between image embeddings and user prompt (#3645) * llava : fix tokenization to not add bos after system prompt * set seed --------- Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>	2023-10-16 23:58:00 +03:00
FSSRepo	4d1804330e	fix llava implementation	2023-10-16 16:31:17 -04:00
FSSRepo	d7eca255d7	context shift fixed	2023-10-16 14:43:10 -04:00
FSSRepo	2d9f11db28	fixed premature end due stop word	2023-10-16 12:36:05 -04:00
FSSRepo	fd64f04fc2	fix long prompt than ctx proposed in #3639	2023-10-15 19:07:18 -04:00
FSSRepo	b727e022d6	fix ci make build undefined ref errors	2023-10-15 18:53:48 -04:00


421 Commits