llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-12 11:40:17 +00:00

Author	SHA1	Message	Date
Georgi Gerganov	f55b647300	llama : minor indentation during tensor loading ggml-ci	2024-07-04 19:34:04 +03:00
Francis Couture-Harpin	18e92879d5	llama : fix t5 uses of n_head and n_ff	2024-07-04 11:52:48 -04:00
Francis Couture-Harpin	c6ac198424	Merge branch 'master' into openelm	2024-07-04 11:45:21 -04:00
Francis Couture-Harpin	269e07bb00	llama : use const ref for print_f and fix division by zero	2024-07-04 11:39:32 -04:00
ditsuke	51d2ebadbb	build: Export hf-to-gguf as snakecase	2024-07-04 15:39:13 +00:00
ditsuke	1e920018d3	doc: Add context for why we add an explicit pytorch source	2024-07-04 15:39:13 +00:00
ditsuke	01a5f06550	chore: Remove rebase artifacts	2024-07-04 15:39:13 +00:00
ditsuke	07786a61a2	chore: Fixup requirements and build	2024-07-04 15:39:13 +00:00
ditsuke	de14e2ea2b	chore: ignore all __pycache__	2024-07-04 15:39:13 +00:00
ditsuke	821922916f	fix: Update script paths in CI scripts	2024-07-04 15:39:13 +00:00
ditsuke	b1c3f26e5e	fix: Actually include scripts in build Not namespaced though :(	2024-07-04 15:39:13 +00:00
ditsuke	b0a46993df	build(python): Package scripts with pip-0517 compliance	2024-07-04 15:39:13 +00:00
Georgi Gerganov	199d0fb0c9	Merge branch 'master' into pr/7359	2024-07-04 18:25:16 +03:00
Georgi Gerganov	3fe395d220	llama : handle n_head == 0	2024-07-04 18:23:17 +03:00
fairydreaming	807b0c49ff	Inference support for T5 and FLAN-T5 model families (#5763 ) * llama : add inference support and model types for T5 and FLAN-T5 model families * llama : add new API functions to support encoder-decoder models: llama_encode(), llama_model_has_encoder(), llama_model_decoder_start_token() * common, llama-cli, llama-batched : add support for encoder-decoder models * convert-hf : handle shared token embeddings tensors in T5Model * convert-hf : add support for SentencePiece BPE tokenizer in T5Model (for Pile-T5 models) * convert-hf : add MT5ForConditionalGeneration and UMT5ForConditionalGeneration to architectures supported by T5Model * convert : add t5 tokenizer tests, use "slow" HF tokenizer for t5 --------- Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-07-04 15:46:11 +02:00
Georgi Gerganov	22a648f8cc	Merge branch 'master' into pr/7359	2024-07-04 16:42:13 +03:00
Georgi Gerganov	9971c38ada	llama : do not print hparams for vocab-only models	2024-07-04 16:39:02 +03:00
Georgi Gerganov	b59ddf945e	llama : fix save/load state	2024-07-04 15:55:23 +03:00
Georgi Gerganov	29ab5a0ed1	llama : use std::array for per-layer hparams	2024-07-04 15:35:15 +03:00
Daniel Bevenius	f8c4c0738d	tests : add _CRT_SECURE_NO_WARNINGS for WIN32 (#8231 ) This commit adds the compile definition `_CRT_SECURE_NO_WARNINGS` to the root cmake subproject. The motivation for this is that currently the following warnings are displayed when compiling the tests and common cmake subprojects: ```console test-llama-grammar.cpp C:\llama.cpp\src\.\llama.cpp(1406,77): warning C4996: 'strerror': This function or variable may be unsafe. Consider using strerror_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details. [C:\llama.cpp\build\tests\test-llama-grammar.vcxproj] ... ``` This compile definition is currently set for the `src` subproject and this change moves into the root cmake project so that it is applied to all cmake subprojects.	2024-07-04 13:53:42 +03:00
Daniel Bevenius	402d6feffa	llama : suppress unref var in Windows MSVC (#8150 ) * llama : suppress unref var in Windows MSVC This commit suppresses two warnings that are currently generated for src/llama.cpp when building on Windows MSVC ```console C:\llama.cpp\src\llama.cpp(14349,45): warning C4101: 'ex': unreferenced local variable [C:\llama.cpp\build\src\llama.vcxproj] C:\llama.cpp\src\llama.cpp(19285,44): warning C4101: 'e': unreferenced local variable [C:\llama.cpp\build\src\llama.vcxproj] ``` * Update src/llama.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-07-04 13:50:57 +03:00
Georgi Gerganov	20fc3804bf	convert : fix gemma v1 tokenizer convert (#8248 ) ggml-ci	2024-07-04 10:41:03 +03:00
AidanBeltonS	f619024764	[SYCL] Remove unneeded semicolons (#8280 )	2024-07-04 09:07:19 +08:00
Daniele	d23287f122	Define and optimize RDNA1 (#8085 )	2024-07-04 01:02:58 +02:00
slaren	5f2d4e60e2	ppl : fix n_seq_max for perplexity (#8277 ) * ppl : fix n_seq_max for perplexity * use 1 seq for kl_divergence	2024-07-03 20:33:31 +03:00
Xuan Son Nguyen	916248af1f	fix phi 3 conversion (#8262 )	2024-07-03 16:01:54 +02:00
Judd	f8d6a23804	fix typo (#8267 ) Co-authored-by: Judd <foldl@boxvest.com>	2024-07-03 14:40:16 +02:00
AidanBeltonS	fadde67135	Dequant improvements rebase (#8255 ) * Single load for half2 * Store scales in local mem * Vec load quantized values	2024-07-03 09:55:34 +08:00
MistApproach	a27152b602	fix: add missing short command line argument -mli for multiline-input (#8261 )	2024-07-02 22:56:46 +02:00
Clint Herron	3e2618bc7b	Adding step to `clean` target to remove legacy binary names to reduce upgrade / migration confusion arising from #7809 . (#8257 )	2024-07-02 13:19:56 -04:00
Clint Herron	07a3fc0608	Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258 )	2024-07-02 12:18:10 -04:00
Faisal Zaghloul	968967376d	Add `JAIS` model(s) (#8118 ) * Add `JAIS` model(s) * cleanup * address review comments * remove hack * un-hardcode max-alibi-bias * minor tweaks --------- Co-authored-by: fmz <quic_fzaghlou@quic.com>	2024-07-02 16:36:00 +02:00
Daniel Bevenius	023b8807e1	convert-hf : print output file name when completed (#8181 ) * convert-hf : print output file name when completed This commit adds the output file name to the log message when the conversion is completed. The motivation for this change is that when `--outfile` option is not specified it might not be obvious where the output file is written. With this change the output of running the script will be something like the following: ```console INFO:hf-to-gguf:Model successfully exported to models/gemma-2-9b-it.gguf. ``` Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> * squash! convert-hf : print output file name when completed Updates the output to support printing the directory if the output is split into multiple files. Also the output file name is now retrieved from the model_instance object. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> * squash! convert-hf : print output file name when completed Use parent attribute of Path object and string interpolation. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> * squash! convert-hf : print output file name when completed Use os.sep instead of hardcoding the path separator. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com> --------- Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-07-02 09:40:49 +03:00
slaren	0e0590adab	cuda : update supports_op for matrix multiplication (#8245 )	2024-07-02 09:39:38 +03:00
luoyu-intel	a9f3b10215	[SYCL] Fix win build conflict of math library (#8230 ) * fix win build conflict of math library * fix the condition: !(win32 & SYCL) * revert warp_size=16	2024-07-02 12:50:07 +08:00
luoyu-intel	d08c20edde	[SYCL] Fix the sub group size of Intel (#8106 ) * use warp_size macro for all sycl kernels * fix mask of permute_sub_group_by_xor * fix rms_norm with correct warp number * fix rms_norm_f32/group_norm_f32 * move norm to norm.cpp file * fix quantize bug * fix mmvq's batch size	2024-07-02 10:16:00 +08:00
Xuan Son Nguyen	5fac350b9c	Fix gemma2 tokenizer convert (#8244 ) * fix gemma2 tokenizer convert * remove scores * improve code, fix new line issue	2024-07-02 01:07:23 +02:00
compilade	e3e33c0cbc	llama : minor spacing changes Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-07-01 15:23:02 -04:00
Johannes Gäßler	cb5fad4c6c	CUDA: refactor and optimize IQ MMVQ (#8215 ) * CUDA: refactor and optimize IQ MMVQ * uint -> uint32_t * __dp4a -> ggml_cuda_dp4a * remove MIN_CC_DP4A checks * change default * try CI fix	2024-07-01 20:39:06 +02:00
Mateusz Charytoniuk	dae57a1ebc	readme: add Paddler to the list of projects (#8239 )	2024-07-01 20:13:22 +03:00
Xuan Son Nguyen	49122a873f	gemma2: add sliding window mask (#8227 ) * gemma2: add sliding window mask * fix data_swa uninitialized * better naming * add co-author Co-authored-by: Arlo Phoenix <arlo-phoenix@users.noreply.github.com> * replace list with single tensor * update * llama : minor styling * convert : add sanity check for query_pre_attn_scalar * fix small typo in README --------- Co-authored-by: Arlo Phoenix <arlo-phoenix@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-07-01 18:48:34 +02:00
Roni	0ddeff1023	readme : update tool list (#8209 ) * Added gppm to Tool list in README * Update README.md --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-07-01 15:48:16 +03:00
Michael Francis	3840b6f593	nix : enable curl (#8043 ) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-07-01 14:47:04 +03:00
Georgi Gerganov	257f8e41e2	nix : remove OpenCL remnants (#8235 ) * nix : remove OpenCL remnants * minor : remove parentheses	2024-07-01 14:46:18 +03:00
iacore	694c59cb42	Document BERT support. (#8205 ) * Update README.md document BERT support * Update README.md	2024-07-01 13:40:58 +02:00
zhentaoyu	197fe6c1d7	[SYCL] Update SYCL-Rope op and Refactor (#8157 ) * align with rope.cu and move sycl-op to a single file	2024-07-01 19:39:06 +08:00
Francis Couture-Harpin	c8cdb48d10	llama : support all OpenELM models * llama : add variable GQA and variable FFN sizes Some metadata keys can now also be arrays to support setting their value per-layer for models like OpenELM.	2024-06-30 23:14:01 -04:00
Georgi Gerganov	d0a7145ba9	flake.lock: Update (#8218 )	2024-06-30 16:09:34 -07:00
Francis Couture-Harpin	51b2577dd4	Merge branch 'master' into openelm	2024-06-30 16:22:07 -04:00
Xuan Son Nguyen	9ef0780062	Fix new line issue with chat template, disable template when in-prefix/suffix is set (#8203 ) * preserve new line llama_chat_format_single * disable chat template if in-prefix/suffix is set * remove redundant change	2024-06-30 20:27:13 +02:00

3319 Commits