llama.cpp/examples/server/tests/unit

Latest commit: 1da7b76569 by Georgi Gerganov, 2024-12-04 22:38:20 +02:00
server : fix speculative decoding with context shift (#10641)

* server : fix speculative decoding with context shift
* server : take into account speculative limits
* server : add tests
File                       Last commit                                                            Date
test_basic.py              server : add more test cases (#10569)                                  2024-11-29 21:48:56 +01:00
test_chat_completion.py    server: Add "tokens per second" information in the backend (#10548)    2024-12-02 14:45:54 +01:00
test_completion.py         server : replace behave with pytest (#10416)                           2024-11-26 16:20:18 +01:00
test_ctx_shift.py          server : replace behave with pytest (#10416)                           2024-11-26 16:20:18 +01:00
test_embedding.py          server : replace behave with pytest (#10416)                           2024-11-26 16:20:18 +01:00
test_infill.py             server : add more test cases (#10569)                                  2024-11-29 21:48:56 +01:00
test_lora.py               server : replace behave with pytest (#10416)                           2024-11-26 16:20:18 +01:00
test_rerank.py             server : add more test cases (#10569)                                  2024-11-29 21:48:56 +01:00
test_security.py           server : replace behave with pytest (#10416)                           2024-11-26 16:20:18 +01:00
test_slot_save.py          server : replace behave with pytest (#10416)                           2024-11-26 16:20:18 +01:00
test_speculative.py        server : fix speculative decoding with context shift (#10641)          2024-12-04 22:38:20 +02:00
test_tokenize.py           server : replace behave with pytest (#10416)                           2024-11-26 16:20:18 +01:00