llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-27 20:04:35 +00:00

Georgi Gerganov d1031cf49c
sampling : refactor init to use llama_sampling_params (#3696)

* sampling : refactor init to use llama_sampling_params
* llama : combine repetition, frequency and presence penalties in 1 call
* examples : remove embd-input and gptneox-wip
* sampling : rename penalty params + reduce size of "prev" vector
* sampling : add llama_sampling_print helper
* sampling : hide prev behind API and apply #3661

ggml-ci
2023-10-20 21:07:23 +03:00
CMakeLists.txt	speculative : PoC for speeding-up inference via speculative sampling (#2926)	2023-09-03 15:12:08 +03:00
speculative.cpp	sampling : refactor init to use llama_sampling_params (#3696)	2023-10-20 21:07:23 +03:00
