wip: llama : separate recurrent states from the KV cache

This will be necessary to support Jamba
(and other recurrent models mixed with Attention).

This doesn't compile yet, and slot finding isn't yet done correctly for recurrent states.
Francis Couture-Harpin 2024-04-03 11:07:16 -04:00
parent 5fb1574c81
commit 271104c65c

llama.cpp | 1424 lines changed
