mirror of
https://github.com/ggerganov/llama.cpp.git
synced 2024-12-28 12:24:35 +00:00
5b8023d935
This change uses a custom malloc() implementation to transactionally capture to a file the dynamic memory created during the loading process. That includes (1) the malloc() allocation for mem_buffer and (2) all the C++ STL objects. On my $1000 personal computer, this change lets me run ./main to generate a single token (-n 1) using the float16 7B model (~12 GB) in one second. To do that, there's a one-time cost: a 13 GB capture file must first be generated. This change rocks, but it shouldn't be necessary to do something this heroic. We should instead change the file format so that tensors don't need reshaping and realignment in order to be loaded.
25 lines
232 B
Plaintext
*.o
*.a
.cache/
.vs/
.vscode/
.DS_Store

build/
build-em/
build-debug/
build-release/
build-static/
build-no-accel/
build-sanitize-addr/
build-sanitize-thread/

models/*

/main
/quantize
/magic.dat

arm_neon.h
compile_commands.json