llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-27 20:04:35 +00:00

Author	SHA1	Message	Date
Georgi Gerganov	5d540e80d1	server : no need for atomic int - already using mutex	2023-10-20 20:44:29 +03:00
Georgi Gerganov	113dd60005	server : batch has to be allocated for n_parallel sequences	2023-10-20 20:42:45 +03:00
FSSRepo	6b2437e32d	added thread safe pipeline	2023-10-20 12:07:32 -04:00
Georgi Gerganov	325d1793f7	server : minor sync	2023-10-19 15:03:24 +03:00
Georgi Gerganov	9740824ba5	server : snake case	2023-10-19 14:44:37 +03:00
Georgi Gerganov	e3a2c3fe32	server : use refs + use llama_batch_clear()	2023-10-19 14:44:04 +03:00
Georgi Gerganov	3d5929e8ee	server : bug fix in ingest_images n_tokens is incremented internally by llama_batch_add	2023-10-19 14:43:19 +03:00
Georgi Gerganov	a8c981b734	server : remove beam-search functionality	2023-10-19 14:10:37 +03:00
Georgi Gerganov	654e0a1fe0	server : coding-style normalization (part 2)	2023-10-19 14:09:45 +03:00
Georgi Gerganov	e44ed60187	server : coding-style normalization	2023-10-19 13:50:23 +03:00
FSSRepo	ab2fc00224	latest changes of sampling API	2023-10-18 16:57:48 -04:00
FSSRepo	8540568c48	Merge branch 'master' of https://github.com/ggerganov/llama.cpp	2023-10-18 16:55:26 -04:00
FSSRepo	7196c4e08a	new sampling API	2023-10-18 16:50:09 -04:00
Georgi Gerganov	4e82b2ea3f	speculative : bug fixes	2023-10-18 18:49:40 +03:00
Georgi Gerganov	0e89203b51	speculative : add tree-based sampling example (#3624) * sampling : one sequence per sampling context ggml-ci * speculative : add tree-based sampling support ggml-ci * speculative : reuse the n_parallel CLI param * speculative : refactor sampling * examples : fix build after sampling refactoring ggml-ci * batched : fix n_seq_id * sampling : fix malloc ggml-ci * swift : fix build ggml-ci * swift : try to fix build ggml-ci * prompts : add assistant.txt * common : add llama_batch_add() and llama_batch_clear() helpers * speculative : minor refactor ggml-ci * minor : comments + rename ggml-ci * speculative : fix off-by-one for n_drafted * speculative : fix the n_drafted fix + p constants	2023-10-18 16:21:57 +03:00
FSSRepo	c02c52efb5	fix multiple clients	2023-10-17 17:54:56 -04:00
FSSRepo	d2b1fac6c7	fix make build errors	2023-10-17 17:18:56 -04:00
FSSRepo	ed0c11cb83	multimodal support enabled by default	2023-10-17 16:58:20 -04:00
FSSRepo	6c277eaab5	update api like OpenAI	2023-10-17 16:53:38 -04:00
FSSRepo	58f8ae9bfe	readme change	2023-10-17 16:32:19 -04:00
FSSRepo	fa0f22f14f	Merge remote-tracking branch 'upstream/master'	2023-10-17 16:31:33 -04:00
FSSRepo	aa2268f4cd	sync README.md changes	2023-10-17 16:21:05 -04:00
Georgi Gerganov	e1675d133c	llama : avoid fprintf in favor of LLAMA_LOG (#3538)	2023-10-17 22:34:26 +03:00
slaren	a5e8c1d8c7	train-text-from-scratch : fix assert failure in ggml-alloc (#3618)	2023-10-17 20:00:58 +03:00
Georgi Gerganov	e74c705e15	editorconfig : remove trailing spaces	2023-10-17 19:52:53 +03:00
coezbek	3ad1e3f1a1	server : documentation of JSON return value of /completion endpoint (#3632) * Added documentation of JSON return value of /completion endpoint * Update examples/server/README.md --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-17 19:51:02 +03:00
Georgi Gerganov	1142013da4	save-load-state : fix example + add ci test (#3655) * save-load-state : fix example (close #3606) * ci : add test for save-load-state example ggml-ci	2023-10-17 19:12:46 +03:00
staviq	1a159553f9	tokenizer : special token handling (#3538) * Rewrite special token handling from #1931 * shorten param name, add st verification by type * use offsets instead of copy by substr * formatting, remove copying iterator on delete * llama : normalize code-style * swift fix * print pfx/sfx if verb, main: split pfx input sfx * dont add space when using special tokens * minor : comment + spacing --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-17 18:11:01 +03:00
Georgi Gerganov	940efa95fe	llava : fix tokenization to not add bos between image embeddings and user prompt (#3645) * llava : fix tokenization to not add bos after system prompt * set seed --------- Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>	2023-10-16 23:58:00 +03:00
FSSRepo	4d1804330e	fix llava implementation	2023-10-16 16:31:17 -04:00
FSSRepo	d7eca255d7	context shift fixed	2023-10-16 14:43:10 -04:00
FSSRepo	2d9f11db28	fixed premature end due stop word	2023-10-16 12:36:05 -04:00
FSSRepo	fd64f04fc2	fix long prompt than ctx proposed in #3639	2023-10-15 19:07:18 -04:00
FSSRepo	b727e022d6	fix ci make build undefined ref errors	2023-10-15 18:53:48 -04:00
FSSRepo	ce961a304b	some ci fixes	2023-10-15 18:46:01 -04:00
Steward Garcia	9035978aae	Merge pull request #6 from damian0815/fssrepo_mac_fixes fix compilation errors with llvm	2023-10-15 18:38:52 -04:00
Steward Garcia	f47fd17b73	Merge branch 'ggerganov:master' into master	2023-10-15 18:23:47 -04:00
FSSRepo	4e5c5c451c	notify the user from server ui that multimodality is unavailable	2023-10-14 08:28:49 -04:00
M. Yusuf Sarıgöz	11dc1091f6	Honor -ngl option for Cuda offloading in llava (#3621)	2023-10-14 04:52:44 -06:00
Damian Stewart	299f6b54d8	fix compilation errors with llvm	2023-10-14 11:17:38 +02:00
FSSRepo	7e64bfe060	refactor code + remove unused comments + improved README.md	2023-10-14 00:31:34 -04:00
FSSRepo	9f72b44635	add multimodal input - alfa	2023-10-13 23:36:32 -04:00
FSSRepo	de35b47908	fixed tokens probs	2023-10-13 19:55:25 -04:00
FSSRepo	9d98cdda2c	llava multimodal integration	2023-10-13 18:42:44 -04:00
FSSRepo	eb08201227	add changes to README.md	2023-10-13 14:28:06 -04:00
FSSRepo	a2c2d98c16	add context swap	2023-10-13 14:12:50 -04:00
FSSRepo	b6d9e212e5	fixed timings per slot	2023-10-13 13:10:38 -04:00
FSSRepo	a410a9e300	unused change reverted	2023-10-13 12:23:58 -04:00
FSSRepo	6358ae5f48	server ui now support multiple clients	2023-10-13 12:22:54 -04:00
FSSRepo	4ba5a5013d	chat.mjs support cached prompt + some fixes	2023-10-13 11:06:41 -04:00

405 Commits