* Initial XTC commit
Adds XTC sampler, not activated by default, but recommended settings by default.
* Cleanup
* Simplified chances calculation
To be more inline with the original implementation, chance is calculated once at the beginning.
* First fixes by comments
Still need to look into sorting
* Fixed trailing backspaces
* Fixed RNG to be reproduceable
Thanks to @slaren for directions
* Fixed forgotten header
* Moved `min_keep`
Moved from conditions to a simple check at the end.
* Fixed broken randomization
Thanks to @slaren for explanation
* Swapped sorting for a custom algorithm
Shifts tokens to remove the penalized ones, then puts the penalized at the back. Should make `min_keep` still viable.
* Algorithm rework
1. Scan token from top till the first non-penalizable
2. Remove the last captured token (the least probable above threshold)
3. Shift all tokens to override the remaining penalizable
4. Penalize and put them at the the bottom.
* Added XTC to `test-sampling`
* Simplified algorithm and more tests
* Updated info in common and args
* Merged back lost commits in common and arg
* Update dump info in common
* Fixed incorrect min_keep check
* Added XTC to README
* Renamed parameters, fixed info and defaults
* probability is at 0 by default, but XTC is included in sampling queue
* threshold higher than 0.5 switches XTC off
* Initial server support
* Added XTC to server UIs
* Fixed labels in old server UI
* Made algorithm safer and more readable
* Removed xtc_threshold_max
* Fixed arg after update
* Quick fixes by comments
* Simplified algorithm since threshold_max is removed
* Renamed random distribution
* Fixed tests and outdated README
* Small fixes
* Adding loading page for '/' server requests
* set content when model is loading
* removed loading html file
* updated cmakelist
* updated makefile
* cleaned up whitespace
* cleanup for PR removed error
* updated server test to handle 503 HTML
* updated server test to handle 503 HTML
* ca†ch 503 before parsing json
* revert test
* account for both api and web browser requests
* precommit corrections
* eol fix
* revert changes to pre-commit
* removed print statement
* made loading message more descriptive
* also support .html files
---------
Co-authored-by: VJHack <flymyplane21@gmail.com>
Co-authored-by: Vinesh Janarthanan <36610342+VJHack@users.noreply.github.com>
* json: default additionalProperty to true
* json: don't force additional props after normal properties!
* json: allow space after enum/const
* json: update pydantic example to set additionalProperties: false
* json: prevent additional props to redefine a typed prop
* port not_strings to python, add trailing space
* fix not_strings & port to js+py
* Update json-schema-to-grammar.cpp
* fix _not_strings for substring overlaps
* json: fix additionalProperties default, uncomment tests
* json: add integ. test case for additionalProperties
* json: nit: simplify condition
* reformat grammar integ tests w/ R"""()""" strings where there's escapes
* update # tokens in server test: consts can now have trailing space
* ic
* migrate my eary work
* add the belonging stuff: css,favicon etc
* de prompts
* chore: Update HTML meta tags in index.html file
* add api-key css classes
* some necessary fixes
* Add API key CSS classes and update styling in style.css
* clean the code
* move API to the top, rearrange param sliders. update css
* add tooltips to the parameters with comprehensible explanations
* fix FloatField and BoolField tooltips
* fix grammar field width
* use template literales for promptFormats.js
* update const ModelGenerationInfo
* remove ms per token, since not relevant for most webui users and use cases
* add phi-3 prompt template
* add phi3 to dropdown
* add css class
* update forgotten css theme
* add user message suffix
* fix chatml & add llama3 format
* fix llama3 prompt template
* more prompt format fixes
* add more comon stop tokens
* add missing char
* do not separate with new line or comma
* move prompt style
* add hacky llama2 prompt solution, reduce redundancy in promptFormats.js
* fix toggle state localstorage
* add cmd-r prompt et reduce redundancy
* set default prompt to empty
* move files, clean code
* fix css path
* add a button to the new ui
* move new ui to "/public" due to otherwise problematic CORS behaviour
* include new ui in cpp
* fix wrong link to old ui
* renaming to ensure consistency
* fix typos "prompt-format" -> "prompt-formats"
* use correct indent
* add new ui files to makefile
* fix typo
* Added themes support with two sample themes and a favicon.
* Newline
* Newline
* Newline
* Trailing whitespace
* Increased opacity for contrast
* Increase opacity.
Check actions cancelled for some other priority job and I can't seem to manually re-run them, so MOAR OPACITY
* Opacity action trigger.
Trying to re-trigger the cancelled action.
* One more opacity adjustment
This Actions pipeline is failing for random issues.
* Delete examples/server/themes/buttons_top/completion.js
This will be served from the static string built-in to server.
* Delete examples/server/themes/buttons_top/index.js
This will be served from the static string built-in to server.
* Delete examples/server/themes/wild/completion.js
This will be served from the static string built-in to server.
* Delete examples/server/themes/buttons_top/json-schema-to-grammar.mjs
This will be served from the static string built-in to server.
* Delete examples/server/themes/wild/index.js
This will be served from the static string built-in to server.
* Delete examples/server/themes/wild/json-schema-to-grammar.mjs
This will be served from the static string built-in to server.
* Replaced underscore.
* server: format error to json
* server: do not crash on grammar error
* fix api key test case
* revert limit max n_predict
* small fix
* correct coding style
* update completion.js
* launch_slot_with_task
* update docs
* update_slots
* update webui
* update readme
* Add API key authentication for enhanced server-client security
* server : to snake_case
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update server.cpp with min_p after it was introduced in https://github.com/ggerganov/llama.cpp/pull/3841
* Use spaces instead of tabs
* Update index.html.hpp after running deps.sh
* Fix test - fix line ending
* implementing parallel decoding in server example
* crash fixed
* save dev progress
* refactored sampling function
* completion endpoint working
* multiple client support
* grammar + no stream completion
* cached prompt support
* chat.mjs support cached prompt + some fixes
* server ui now support multiple clients
* unused change reverted
* fixed timings per slot
* add context swap
* add changes to README.md
* llava multimodal integration
* fixed tokens probs
* add multimodal input - alfa
* refactor code + remove unused comments + improved README.md
* fix compilation errors with llvm
* notify the user from server ui that multimodality is unavialable
* some ci fixes
* fix ci make build undefined ref errors
* fix long prompt than ctx proposed in #3639
* fixed premature end due stop word
* context shift fixed
* fix llava implementation
* sync README.md changes
* readme change
* update api like OpenAI
* multimodal support enabled by default
* fix make bui;d errors
* fix multiple clients
* fix zig build
* new sampling API
* latest changes of sampling API
* server : coding-style normalization
* server : coding-style normalization (part 2)
* server : remove beam-search functionality
* server : bug fix in ingest_images
n_tokens is incremented internally by llama_batch_add
* server : use refs + use llama_batch_clear()
* server : snake case
* server : minor sync
* added thread safe pipeline
* server : bach has to be allocated for n_parallel sequences
* server : no need for atomic int - already using mutex
* server : logs + minor code style
* server : fix multibyte handle in partial response (#3706)
* fix image load + view image in chat
* make : silence stb warnings
* clip : link to ggml, not to llama
* server : fix switch fallthrough
* server : fix crash in Debug on macOS (I have no idea why this fixes it!?)
* server : refactor ctx_sampling init + n_ctx + names
* server : bug fix for prompt caching
* Do not save/load image_data to localStorage
* editorconfig : new line in index.html
* server : completion requests remember slot_id
* Update readme to document multimodal in server
* server : minor style
* Update readme to document multimodal in server
* server : hide ctx_sampling->prev behind API (#3696)
* server : apply fix from #3722
* server : fix slot reuse
* server : add comment about changing slot_state to bool
---------
Co-authored-by: FSSRepo <go778sgt@gmail.com>
Co-authored-by: Damian Stewart <d@damianstewart.com>
Co-authored-by: Steward Garcia <57494570+FSSRepo@users.noreply.github.com>
Co-authored-by: Jhen-Jie Hong <iainst0409@gmail.com>
Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
* server : add n_probs param in chat UI
* server : keep message data array & show in probabilites component
* server : add simple popover component
* server : fix completion_probabilities undefined if not set n_probs
* server : implement Probabilites
* server : handle bytes
* server : make n_probs max to 10 for easy scroll
* server : adjust for dark/light mode
* server : Fix regenerated prompt
* server : update index.html.hpp
* server : convert prob to percentage + show original value as div title
* server : fix Probabilites not used if included empty str
* server : skip byte pair in display probabilites
* server : remove array check of completion_probabilities in messages
* skip empty array or byte pair (> 1) in Probabilites
* generate index.html.hpp
* fix incorrect prob convert if the str is already a known token
* use final response to show probabilities on stop
* revert unnecessary change
* correct probabilites usage
* remove unused function
* always send partial response for get correct probs of last to_send
* fix typo
* fix content of format_final_response
* refactor probs render & make pColor transparent if not found
* send empty string when got stop_pos in partial
* avoid unnecessary empty data event & send rest of partial tokens on stop
* use <br /> for new line
* skip -1 tok in loop to avoid send '' on end
* trim last new lines on stop
* revert unnecessary change
* support for templates in browser LocalStorage
* sync accepted #2409 fix from upstream
* convert autosave invocation to useEffect
* Apply suggestions from code review
Co-authored-by: Jhen-Jie Hong <iainst0409@gmail.com>
* Regen index.html.cpp, suggested from code review
---------
Co-authored-by: Jhen-Jie Hong <iainst0409@gmail.com>
* server : implement json-schema-to-grammar.mjs by follow python impl
* server : add grammar support in chat.mjs
* server : implement grammer param in the UI
* server : generate .hpp
* server : remove trailing whitespaces
* server : generate .hpp
* server : fix sort of prop pairs
* server : optimize regex & iteration