Kerfuffle
6e08281e58
Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843)
* Extend llama_kv_cache_seq_rm to allow matching any sequence
* Replace llama_kv_cache_tokens_rm with llama_kv_cache_clear
Use llama_kv_cache_clear for cache clearing
Change calls to llama_kv_cache_tokens_rm that want to delete by position to use llama_kv_cache_seq_rm functionality
2023-10-29 11:31:40 -06:00
Georgi Gerganov
6961c4bd0b
batched-bench : print params at start
2023-10-25 10:26:27 +03:00
Georgi Gerganov
0e89203b51
speculative : add tree-based sampling example (#3624)
* sampling : one sequence per sampling context
ggml-ci
* speculative : add tree-based sampling support
ggml-ci
* speculative : reuse the n_parallel CLI param
* speculative : refactor sampling
* examples : fix build after sampling refactoring
ggml-ci
* batched : fix n_seq_id
* sampling : fix malloc
ggml-ci
* swift : fix build
ggml-ci
* swift : try to fix build
ggml-ci
* prompts : add assistant.txt
* common : add llama_batch_add() and llama_batch_clear() helpers
* speculative : minor refactor
ggml-ci
* minor : comments + rename
ggml-ci
* speculative : fix off-by-one for n_drafted
* speculative : fix the n_drafted fix + p constants
2023-10-18 16:21:57 +03:00
Georgi Gerganov
8c70a5ff25
batched : add bench tool (#3545)
* batched : add bench tool
* batched : minor fix table
* batched-bench : add readme + n_kv_max is now configurable
* batched-bench : init warm-up batch
* batched-bench : pass custom set of PP, TG and PL
* batched-bench : add mmq CLI arg
2023-10-11 21:25:33 +03:00