llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-27 20:04:35 +00:00

Author	SHA1	Message	Date
klosax	1b4f9c8eb9	convert-gptneox-h5-to-gguf.py : accumulate kv and ti + special tokens	2023-08-01 23:40:50 +02:00
klosax	49380a23a3	gguf.py : accumulate kv and tensor info data + special tokens	2023-08-01 23:37:48 +02:00
klosax	ff1cb02397	constants.py : special tokens	2023-08-01 23:17:21 +02:00
Bono Lv	c574bddb36	fix a typo in examples/server/README.md (#2478)	2023-08-01 14:54:28 +02:00
klosax	36a36c32a3	Update gptneox-main.cpp	2023-08-01 14:44:28 +02:00
klosax	c77fabb1f9	gptneox-main.cpp : special tokens	2023-08-01 14:32:53 +02:00
klosax	e7a741695c	convert-gptneox-h5-to-gguf.py : Special tokens	2023-08-01 14:30:00 +02:00
ebraminio	86aeb27734	server : Support dark mode (#2414) * server : Support dark mode So it respects user system light / dark settings. * Update index.html.hpp by running ./deps.sh	2023-08-01 10:56:23 +02:00
Matteo Boschini	1873ff586b	metal : add gqa8 kernel to allow llama-2-70B on metal (#2459) * Added gqa8 kernel to allow llama-2-70B on metal * Update ggml-metal.m Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com> * Extend kernel_mul_mat_f16_f32 to handle gqa broadcast * Added ne03==ne13 assertion --------- Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>	2023-08-01 10:43:12 +03:00
klosax	da4900e835	Update convert-llama-h5-to-gguf.py	2023-07-31 23:04:03 +02:00
M. Yusuf Sarıgöz	f3de876a12	fix : update convert-llama-h5-to-gguf.py	2023-07-31 23:58:29 +03:00
Johannes Gäßler	49e7cb5bb1	CUDA: fixed LLAMA_FAST compilation option (#2473)	2023-07-31 21:02:19 +02:00
Johannes Gäßler	b772bba42e	CUDA: fixed cmake F16 option (#2471)	2023-07-31 19:52:22 +02:00
M. Yusuf Sarıgöz	bb42aefaeb	gguf : mmap tensor data example	2023-07-31 17:46:12 +03:00
Johannes Gäßler	0728c5a8b9	CUDA: mmq CLI option, fixed mmq build issues (#2453)	2023-07-31 15:44:35 +02:00
M. Yusuf Sarıgöz	b26f5b2e43	gguf : fix typo in function call	2023-07-31 16:23:54 +03:00
Johannes Gäßler	1215ed7d5c	CUDA: Implemented row flattening for non-glm RoPE (#2468)	2023-07-31 14:32:30 +02:00
Johannes Gäßler	2dbf518911	CUDA: fewer memory bank conflicts for mul_mat_q (#2458)	2023-07-31 13:18:51 +02:00
slaren	9d2382b3e4	Fix Metal backend broken from the allocator changes (#2455) * fix Metal backend broken from the allocator changes	2023-07-31 11:02:53 +02:00
M. Yusuf Sarıgöz	7aa0a0e7f7	gguf : support custom alignment value	2023-07-31 09:59:36 +03:00
klosax	6b3a7b9f4f	Update convert-llama-h5-to-gguf.py	2023-07-31 03:02:00 +02:00
klosax	4f5b6224be	Update convert-gptneox-h5-to-gguf.py	2023-07-31 03:00:20 +02:00
klosax	2a0914673c	Update convert-gptneox-h5-to-gguf.py	2023-07-30 17:31:11 +02:00
klosax	068a8e0fbe	Update convert-llama-h5-to-gguf.py	2023-07-30 17:29:56 +02:00
klosax	30c4ea47e6	add gptneox gguf example	2023-07-30 16:59:26 +02:00
klosax	2fabc176ce	Update convert-llama-h5-to-gguf.py	2023-07-30 16:28:08 +02:00
slaren	a113689571	ggml : add graph tensor allocator (#2411) * ggml : add graph tensor allocator * ggml : don't calculate data pointer of unallocated tensors when creating a view with an offset * ggml : refactor ggml_view_Nd into ggml_view_tensor_offset	2023-07-30 15:58:01 +02:00
klosax	f175b05872	Makefile : add gptneox gguf example	2023-07-30 15:08:37 +02:00
klosax	e9192b0135	add gptneox gguf example	2023-07-30 15:05:37 +02:00
klosax	4ed98bf1ab	Update convert-llama-h5-to-gguf.py	2023-07-30 15:01:47 +02:00
klosax	b19c11750b	ggml.c : add gguf_get_arr_n	2023-07-30 14:58:50 +02:00
klosax	b4676ee447	ggml.h : increase GGML_MAX_NAME to 64	2023-07-30 14:51:37 +02:00
klosax	ccd81a751b	gguf.py : add layer norm eps and merges	2023-07-30 14:48:14 +02:00
klosax	0790c121aa	constants.py : add layer norm eps	2023-07-30 14:46:36 +02:00
M. Yusuf Sarıgöz	87c34e4dd4	gguf : update convert-llama-h5-to-gguf.py	2023-07-30 01:09:22 +03:00
M. Yusuf Sarıgöz	32e037ffbe	gguf : fix set is not subscriptable	2023-07-30 01:01:13 +03:00
Johannes Gäßler	11f3ca06b8	CUDA: Quantized matrix matrix multiplication (#2160) * mmq implementation for non k-quants * q6_K * q2_K * q3_k * q4_K * vdr * q5_K * faster q8_1 loading * loop unrolling * add __restrict__ * q2_K sc_high * GGML_CUDA_MMQ_Y * Updated Makefile * Update Makefile * DMMV_F16 -> F16 * Updated README, CMakeLists * Fix CMakeLists.txt * Fix CMakeLists.txt * Fix multi GPU out-of-bounds	2023-07-29 23:04:44 +02:00
Johannes Gäßler	9baf9ef304	CUDA: faster multi GPU synchronization (#2448)	2023-07-29 23:04:10 +02:00
klosax	06c3e4a1a7	Update convert-llama-h5-to-gguf.py	2023-07-29 21:38:01 +02:00
klosax	9577821487	gguf.py : support any type	2023-07-29 21:29:07 +02:00
klosax	2c22e3bcdb	ggml.c : get arr str and f32	2023-07-29 20:37:47 +02:00
klosax	34469b9ea7	ggml.h : get array str and f32	2023-07-29 20:36:06 +02:00
M. Yusuf Sarıgöz	0f5e57f01d	gguf : handle already encoded string	2023-07-29 19:56:06 +03:00
klosax	8ad7cd49fb	Update convert-llama-h5-to-gguf.py	2023-07-29 16:47:00 +02:00
M. Yusuf Sarıgöz	0317c41d98	gguf : upd gguf conversion script	2023-07-29 13:31:07 +03:00
M. Yusuf Sarıgöz	cc3dd7f042	gguf : write tokenizer data	2023-07-29 13:30:22 +03:00
M. Yusuf Sarıgöz	8a76dd8a85	gguf : write tensors one by one	2023-07-29 13:17:28 +03:00
M. Yusuf Sarıgöz	c861e234f4	gguf : write tensors one by one	2023-07-29 12:49:01 +03:00
M. Yusuf Sarıgöz	0c219fb5b5	gguf : fix writing gguf arrays	2023-07-29 12:42:54 +03:00
M. Yusuf Sarıgöz	93f7f7aef7	gguf : write tensors one by one and code reuse	2023-07-29 12:34:35 +03:00

1155 Commits