llama.cpp

Mirror of https://github.com/ggerganov/llama.cpp.git, synced 2024-12-27 03:44:35 +00:00

Author	SHA1	Message	Date
slaren	d273bfd2c9	allocator: cleanup, more comments	2023-07-22 15:05:24 +02:00
slaren	e2b9575951	allocator cleanup	2023-07-22 13:29:44 +02:00
slaren	7de7882537	allocator: fix partial offloading	2023-07-22 02:34:21 +02:00
slaren	e87840f9fd	allocator: automatic inplace operations	2023-07-21 16:51:50 +02:00
slaren	3d679827e7	improved memory management fixes	2023-07-21 12:59:26 +02:00
slaren	cd6f5dec92	improved memory management	2023-07-21 00:44:35 +02:00
slaren	de69f8f20d	initial implementation of delayed graph allocation	2023-07-20 15:57:48 +02:00
slaren	cb205c0d13	automatically calculate compute buffer sizes (without graph allocator)	2023-07-20 02:42:36 +02:00
slaren	295f85654a	allocators wip; renamed ggml_backend functions; changed ggml_buffer and ggml_backend to always be used as pointers; rename ggml_tensor::params -> op_params	2023-07-19 02:43:44 +02:00
slaren	9c72e7e916	rebase to master (except ggml-cuda)	2023-07-16 15:10:46 +02:00
slaren	24cc6f008f	minor fixes	2023-07-16 14:56:52 +02:00
slaren	0d2b66c638	ggml backend interface wip; refactor ggml-cuda	2023-07-16 14:56:46 +02:00