llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-28 12:24:35 +00:00

Author	SHA1	Message	Date
Steward Garcia	f47fd17b73	Merge branch 'ggerganov:master' into master	2023-10-15 18:23:47 -04:00
FSSRepo	4e5c5c451c	notify the user from server ui that multimodality is unavailable	2023-10-14 08:28:49 -04:00
M. Yusuf Sarıgöz	11dc1091f6	Honor -ngl option for Cuda offloading in llava (#3621)	2023-10-14 04:52:44 -06:00
FSSRepo	7e64bfe060	refactor code + remove unused comments + improved README.md	2023-10-14 00:31:34 -04:00
FSSRepo	9f72b44635	add multimodal input - alfa	2023-10-13 23:36:32 -04:00
FSSRepo	de35b47908	fixed tokens probs	2023-10-13 19:55:25 -04:00
FSSRepo	9d98cdda2c	llava multimodal integration	2023-10-13 18:42:44 -04:00
FSSRepo	eb08201227	add changes to README.md	2023-10-13 14:28:06 -04:00
FSSRepo	a2c2d98c16	add context swap	2023-10-13 14:12:50 -04:00
FSSRepo	b6d9e212e5	fixed timings per slot	2023-10-13 13:10:38 -04:00
FSSRepo	a410a9e300	unused change reverted	2023-10-13 12:23:58 -04:00
FSSRepo	6358ae5f48	server ui now support multiple clients	2023-10-13 12:22:54 -04:00
FSSRepo	4ba5a5013d	chat.mjs support cached prompt + some fixes	2023-10-13 11:06:41 -04:00
slaren	424b6381c4	ggml : add context enumeration functions (#3605) finetune : fix assert failure in ggml-alloc	2023-10-13 12:23:10 +02:00
FSSRepo	500ac7120e	cached prompt support	2023-10-12 21:16:12 -04:00
FSSRepo	83c2b3553a	grammar + no stream completion	2023-10-12 18:43:57 -04:00
FSSRepo	5b8e29de53	multiple client support	2023-10-12 17:09:12 -04:00
FSSRepo	81484805f0	completion endpoint working	2023-10-12 16:17:27 -04:00
FSSRepo	29c8cdd65d	refactored sampling function	2023-10-12 15:02:19 -04:00
FSSRepo	b716eeb72a	Merge branch 'master' of https://github.com/ggerganov/llama.cpp	2023-10-12 12:55:08 -04:00
FSSRepo	78504218b9	save dev progress	2023-10-12 12:51:48 -04:00
M. Yusuf Sarıgöz	370359e5ba	examples: support LLaVA v1.5 (multimodal model) (#3436) * WIP: start implementing LLaVA * rm scratch buf for now, will revert after cleanup * LLaVA image encoder is working. will combine with llama * Add llava inference code, but it's buggy. debugging * LLaVA is working e2e, needs to optimize memory allocation + cleanup * Use ggml_allocr + rm unnecessary code * fix: crlf -> lf * fix: new line at EoF * fix: trailing whitespace * Add readme * Update readme * Some cleanup * Are you happy editorconfig? * rm unused batch image preprocessing * rm unused import * fix: rm designated initializers * introduce pad-to-square mode for non-square images * are you happy editorconfig? * gitignore /llava * Handle cases where image file does not exist * add llava target to Makefile * add support for 13b model variant * Maybe seed is unlucky? * Check if apples are compared to apples * are you happy editorconfig? * Use temperature = 0.1 by default * command line: use gpt_params_parse() * minor * handle default n_predict * fix typo * llava : code formatting, rename files, fix compile warnings * do not use Wno-cast-qual for MSVC --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-12 18:23:18 +03:00
Aarni Koskela	b016596d90	server : add completion mode (no chat) (#3582)	2023-10-12 09:51:53 +03:00
Georgi Gerganov	57dd55e2c7	server : fix kv cache management (#3588)	2023-10-12 09:29:04 +03:00
FSSRepo	471230202d	crash fixed	2023-10-11 19:48:15 -04:00
FSSRepo	63f99b1ea6	implementing parallel decoding in server example	2023-10-11 18:14:11 -04:00
Georgi Gerganov	b8fe4b5cc9	main : fix session loading bug (#3400)	2023-10-11 23:55:41 +03:00
Michael Coppola	a8bdd65525	server : add parameter -tb N, --threads-batch N (#3584) Co-authored-by: Michael Coppola <info@michaeljcoppola.com>	2023-10-11 22:42:22 +03:00
Kerfuffle	70c29da118	common : fix mirostat state when using multiple sequences (#3543) * Fix mirostat state when using multiple sequences * Fix mirostat by completely refactoring sampling! * Try to fix zig build. * Export function to fetch/create default sampler states Code formatting cleanups and add some comments Silence a warning about id not being used when logging is disabled * Apply some renaming suggestions. Fix comments that were out of sync with the pull. * Use more consistent naming convention for sampling contexts	2023-10-11 22:35:46 +03:00
Georgi Gerganov	8c70a5ff25	batched : add bench tool (#3545) * batched : add bench tool * batched : minor fix table * batched-bench : add readme + n_kv_max is now configurable * batched-bench : init warm-up batch * batched-bench : pass custom set of PP, TG and PL * batched-bench : add mmq CLI arg	2023-10-11 21:25:33 +03:00
Zane Shannon	24ba3d829e	examples : add batched.swift + improve CI for swift (#3562)	2023-10-11 06:14:05 -05:00
vvhg1	11ea5c7d96	infill. : fix tokenization (#3508) * infill tokens correction * serverinfill tokens correction * removing any leading whitespace from infill suffix and removing leading space token from suffix when params.escape * removing any leading whitespace from infill suffix and removing leading space token from suffix when params.escape * only rm when params.escape, rm space if possible which is added back or rm added space token * only rm when params.escape, rm space if possible which is added back or rm added space token * Revert "only rm when params.escape, rm space if possible which is added back or rm added space token" This reverts commit `63ba0b621f`. * fix interactive prompt escaping and fix server infill leading space handling * rm unnecessary bool check	2023-10-10 10:31:21 +03:00
Georgi Gerganov	fcca0a7004	refact : fix convert script + zero out KV cache to avoid nans (#3523) * refact : fix convert script + zero out KV cache to avoid nans * ggml : silu(-inf) should never happen * metal : assert various kernel requirements	2023-10-09 14:32:17 +03:00
Ryder Wishart	8e6716a102	api_like_OAI.py : compat with Microsoft Guidance (#2746) Check for None in addition to empty string check in all request params Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-08 13:55:58 +03:00
arcrank	9c38d181d4	api_like_OAI.py : simplify function (#2796) Simplify function	2023-10-08 13:52:57 +03:00
Mihai	cb13d73a72	server : docs fix default values and add n_probs (#3506)	2023-10-06 21:39:33 +03:00
pudepiedj	a8777ad84e	parallel : add option to load external prompt file (#3416) * Enable external file and add datestamp * Add name of external file at end * Upload ToK2024 * Delete ToK2024.txt * Experiments with jeopardy * Move ParallelQuestions to /prompts and rename * Interim commit * Interim commit * Final revision * Remove trailing whitespace * remove cmake_all.sh * Remove cmake_all.sh * Changed .gitignore * Improved reporting and new question files. * Corrected typo * More LLM questions * Update LLM-questions.txt * Yet more LLM-questions * Remove jeopardy results file * Reinstate original jeopardy.sh * Update examples/parallel/parallel.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-06 16:16:38 +03:00
Jhen-Jie Hong	97af49fa39	server : reuse llama_sample_token common util (#3494) * server : reuse llama_sample_token common function * common : use n_probs for temperature sampling	2023-10-06 15:44:24 +03:00
Kenvix ⭐	45eba9369f	build : use std::make_tuple() for compatibility with older GCC versions (#3488)	2023-10-05 20:16:39 +03:00
Jhen-Jie Hong	e8b8d32e86	server : fix incorrect num_tokens_predicted (#3480)	2023-10-05 17:02:55 +03:00
Merrick Christensen	f72f8f22c9	finetune : readme fix typo (#3465) Fix small typo	2023-10-04 09:33:13 +03:00
h-h-h-h	8186242b6d	main : consistent prefix/suffix coloring (#3425) * Typo * No `--in-prefix` coloring The `--in-prefix` text was inconsistently colored. Now, it's never colored, just like the `--in-suffix` text.	2023-10-03 21:16:15 +03:00
Georgi Gerganov	ac2219fef3	llama : fix session saving/loading (#3400) * llama : fix session saving/loading * llama : temp fix for clearing "future" tokens from the KV cache * llama : fix handling of "future" tokens when loading sessions * llama : fix comments for llama_kv_cache API	2023-10-03 21:04:01 +03:00
cebtenzzre	0fe321031a	gguf : general usability improvements (#3409)	2023-10-02 14:58:46 -04:00
xaedes	a03ce38455	finetune : fix #3404 (#3437) the shapes for init model of gqa models was wrong	2023-10-02 16:15:45 +03:00
bandoti	095231dfd3	cmake : fix transient definitions in find pkg (#3411)	2023-10-02 12:51:49 +03:00
vvhg1	c97f01c362	infill : add new example + extend server API (#3296) * vvhg-code-infill (#1) * infill in separate example (#2) * reverted changes to main and added infill example * cleanup * naming improvement * make : add missing blank line * fix missing semicolon * brought infill up to current main code * cleanup --------- Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>	2023-10-02 10:42:02 +03:00
Georgi Gerganov	bc34dd4f5b	train : fix KQ_pos allocation (#3392) * train : fix KQ_pos allocation * make sure KQ_pos is not reallocated in finetune --------- Co-authored-by: xaedes <xaedes@gmail.com>	2023-09-29 19:05:18 +03:00
Cebtenzzre	bc39553c90	build : enable more non-default compiler warnings (#3200)	2023-09-28 17:41:44 -04:00
slaren	16bc66d947	llama.cpp : split llama_context_params into model and context params (#3301) * llama.cpp : split llama_context_params into model and context params ggml-ci * fix metal build * fix freq_base/scale default to model value * llama-bench : keep the same model between tests when possible * move n_threads to llama_context_params, add n_threads_batch * fix mpi build * remove kv_size(), cuda scratch fixes * remove low-vram option * add n_threads_batch to system info, refactor to get_system_info() * add documentation about --threads-batch to the READMEs * llama-bench fix * main : fix rope freq/scale warning * llama.cpp : add llama_get_model common : add llama_tokenize from model * remove duplicated ctx/model functions ggml-ci * cuda : print total VRAM used	2023-09-28 22:42:38 +03:00


368 Commits