llama.cpp

root/llama.cpp

Fork 0

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-12 03:31:46 +00:00

Commit Graph

Author SHA1 Message Date

Author	SHA1	Message	Date
Georgi Gerganov	47182dd03f	llama : update llama_model API names (#11063 ) * llama : deprecate llama_free_model, add llama_model_free ggml-ci * llama : change `llama_load_model_from_file` -> `llama_model_load_from_file` ggml-ci	2025-01-06 10:55:18 +02:00
Georgi Gerganov	f66f582927	llama : refactor `src/llama.cpp` (#10902 ) * llama : scatter llama.cpp into multiple modules (wip) * llama : control-vector -> adapter * llama : arch * llama : mmap ggml-ci * ci : remove BUILD_SHARED_LIBS=OFF ggml-ci * llama : arch (cont) ggml-ci * llama : chat ggml-ci * llama : model ggml-ci * llama : hparams ggml-ci * llama : adapter ggml-ci * examples : fix ggml-ci * rebase ggml-ci * minor * llama : kv cache ggml-ci * llama : impl ggml-ci * llama : batch ggml-ci * cont ggml-ci * llama : context ggml-ci * minor * llama : context (cont) ggml-ci * llama : model loader ggml-ci * common : update lora ggml-ci * llama : quant ggml-ci * llama : quant (cont) ggml-ci * minor [no ci]	2025-01-03 10:18:53 +02:00
Eric Curtin	0cc63754b8	Introduce llama-run (#10291 ) It's like simple-chat but it uses smart pointers to avoid manual memory cleanups. Less memory leaks in the code now. Avoid printing multiple dots. Split code into smaller functions. Uses no exception handling. Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2024-11-25 22:56:24 +01:00

Georgi Gerganov

47182dd03f

llama : update llama_model API names (#11063 )

* llama : deprecate llama_free_model, add llama_model_free

ggml-ci

* llama : change `llama_load_model_from_file` -> `llama_model_load_from_file`

ggml-ci

2025-01-06 10:55:18 +02:00

Georgi Gerganov

f66f582927

llama : refactor src/llama.cpp (#10902 )

* llama : scatter llama.cpp into multiple modules (wip)

* llama : control-vector -> adapter

* llama : arch

* llama : mmap

ggml-ci

* ci : remove BUILD_SHARED_LIBS=OFF

ggml-ci

* llama : arch (cont)

ggml-ci

* llama : chat

ggml-ci

* llama : model

ggml-ci

* llama : hparams

ggml-ci

* llama : adapter

ggml-ci

* examples : fix

ggml-ci

* rebase

ggml-ci

* minor

* llama : kv cache

ggml-ci

* llama : impl

ggml-ci

* llama : batch

ggml-ci

* cont

ggml-ci

* llama : context

ggml-ci

* minor

* llama : context (cont)

ggml-ci

* llama : model loader

ggml-ci

* common : update lora

ggml-ci

* llama : quant

ggml-ci

* llama : quant (cont)

ggml-ci

* minor [no ci]

2025-01-03 10:18:53 +02:00

Eric Curtin

0cc63754b8

Introduce llama-run (#10291 )

It's like simple-chat but it uses smart pointers to avoid manual
memory cleanups. Less memory leaks in the code now. Avoid printing
multiple dots. Split code into smaller functions. Uses no exception
handling.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>

2024-11-25 22:56:24 +01:00

3 Commits