mirror of
https://github.com/ggerganov/llama.cpp.git
synced 2024-12-26 19:34:35 +00:00
6d341ab6c5
* (WIP) Implement stochastic speculative decoding
* sample from residual distribution on draft accept failure
* fix #5657: force greedy sampling with probs when temp is 0
* remove p_accept parameter
* fix style
* remove unused variables
* add srand() in speculative.cpp
* replace use of rand() with mt19937 sampling
* fixes based on review (@JohannesGaessler)
* fix r random generation
* randomly select next sequence to verify + fix bug in memory freeing
* fix bug in active_seqs sync
* fix uniform int distribution initialization
* remove warnings from comparison between int and size_t
* check grammar in `llama_sample_probability_distribution_impl`
* remove malloc code by utilizing vectors
* add PR link to README
# llama.cpp/examples/speculative

Demonstration of speculative decoding and tree-based speculative decoding techniques

More info:

- https://github.com/ggerganov/llama.cpp/pull/2926
- https://github.com/ggerganov/llama.cpp/pull/3624
- https://github.com/ggerganov/llama.cpp/pull/5625
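
For readers new to the technique, the accept/reject step of stochastic speculative decoding can be sketched as below. This is an illustrative sketch only, not the code from this example: `verify_result` and `verify_draft_token` are hypothetical names, and the sketch assumes the target and draft models expose full probability vectors over the same vocabulary. A drafted token is accepted with probability `min(1, p_target / p_draft)`; on rejection, a replacement is sampled from the renormalized residual `max(0, p_target - p_draft)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical result type (not a llama.cpp type).
struct verify_result {
    bool accepted; // true if the drafted token was kept
    int  token;    // token to emit (the drafted one, or a resampled one)
};

// Hypothetical helper (not a llama.cpp function).
// p_target: target-model probabilities, p_draft: draft-model probabilities,
// both over the same vocabulary. drafted: index of the token the draft model proposed.
verify_result verify_draft_token(
        const std::vector<float> & p_target,
        const std::vector<float> & p_draft,
        int drafted,
        std::mt19937 & rng) {
    assert(p_target.size() == p_draft.size());
    assert(drafted >= 0 && (std::size_t) drafted < p_target.size());

    // Accept with probability min(1, p_target / p_draft).
    // Written as r * p_draft < p_target to avoid dividing by zero.
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    const double r = unif(rng);
    if (r * p_draft[drafted] < p_target[drafted]) {
        return { true, drafted };
    }

    // Rejected: build the residual distribution max(0, p_target - p_draft).
    std::vector<double> residual(p_target.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < residual.size(); ++i) {
        residual[i] = std::max(0.0, (double) p_target[i] - (double) p_draft[i]);
        sum += residual[i];
    }

    // std::discrete_distribution normalizes the weights itself but needs a
    // positive sum; fall back to the target distribution if the residual is empty.
    if (sum <= 0.0) {
        std::discrete_distribution<int> from_target(p_target.begin(), p_target.end());
        return { false, from_target(rng) };
    }
    std::discrete_distribution<int> from_residual(residual.begin(), residual.end());
    return { false, from_residual(rng) };
}
```

In a full decoding loop this check would run once per drafted token, stopping at the first rejection; the linked pull requests describe how the example itself handles batching and tree-based drafts.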