Mirror of https://github.com/ggerganov/llama.cpp.git (synced 2024-11-14 23:09:53 +00:00)
Commit fb4a0ec083

* llama: propagate the results of `graph_compute` to the user interface
* llama: revert the kv_cache in case of failed compute
* llama: remove `llama_kv_cache_state`; only the result of `llama_graph_compute` is returned
* llama: restore the kv_cache in case of failed computation
* llama: correctly revert the entire batch; also update `llama_kv_cache_find_slot` so it correctly counts the number of `used` cells for recurrent models
* llama: update comments
* llama: add comments about KV cache state after error

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Files changed: llama.h
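
Since this change means a compute failure is reported to the caller while the KV cache is rewound to its pre-call state, a minimal sketch in C of how a caller might react, assuming the return-code convention documented in `llama.h` (0 = success, 1 = no KV slot found for the batch, negative = error); `decode_checked` is a hypothetical helper, not part of the library:

```c
#include "llama.h"
#include <stdio.h>

// Hypothetical wrapper around llama_decode() that interprets its
// return codes. With this commit, a negative return means the compute
// failed and the KV cache was restored, so the caller may retry the
// same batch (same positions) without corrupting the cache state.
static int decode_checked(struct llama_context * ctx, struct llama_batch batch) {
    const int ret = llama_decode(ctx, batch);
    if (ret == 0) {
        return 0; // success: KV cache updated with the batch
    }
    if (ret == 1) {
        // No KV slot found: the cache is unchanged; the caller can
        // reduce the batch size or free cache space and try again.
        fprintf(stderr, "decode: could not find a KV slot for the batch\n");
    } else {
        // ret < 0: graph_compute failed; per this commit the KV cache
        // has been reverted to its state before the call.
        fprintf(stderr, "decode: compute failed with code %d\n", ret);
    }
    return ret;
}
```

The design choice the commit settles on is worth noting: instead of exposing a separate `llama_kv_cache_state` object for callers to save and restore, the library restores the cache internally on failure and simply surfaces the `graph_compute` result, keeping the public API a single return code.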