* Introduce the new Min-P sampler by @kalomaze
The Min-P sampling method was designed as an alternative to Top-P, and aims to ensure a balance of quality and variety. The parameter *p* represents the minimum probability for a token to be considered, relative to the probability of the most likely token.
* Min-P enabled and set to 0.05 default
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
* added `llama_model_token_*` variants to all the `llama_token_*` functions.
* added `LLAMA_API`
* formatting
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* removed old `llama_token` functions
* changed 3 more functions to take in model
- `llama_token_get_text`
- `llama_token_get_score`
- `llama_token_get_type`
* added back docs
* fixed main.cpp
* changed token functions to use new model variants
* changed token functions to use new model variants
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Fix mirostat state when using multiple sequences
* Fix mirostat by completely refactoring sampling!
* Try to fix zig build.
* Export function to fetch/create default sampler states
Code formatting cleanups and add some comments
Silence a warning about id not being used when logging is disabled
* Apply some renaming suggestions.
Fix comments that were out of sync with the pull.
* Use more consistant naming convention for sampling contexts