Commit Graph

2 Commits

Author SHA1 Message Date
Pierrick Hymbert
a016026a3a
server: continuous performance monitoring and PR comment (#6283)
* server: bench: init

* server: bench: reduce list of GPU nodes

* server: bench: fix graph, fix output artifact

* ci: bench: add mermaid in case of image cannot be uploaded

* ci: bench: more resilient, more metrics

* ci: bench: trigger build

* ci: bench: fix duration

* ci: bench: fix typo

* ci: bench: fix mermaid values, markdown generated

* typo on the step name

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* ci: bench: trailing spaces

* ci: bench: move images in a details section

* ci: bench: reduce bullet point size

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-03-27 20:26:49 +01:00
Pierrick Hymbert
621e86b331
server: benchmark: chat/completions scenario and other llm servers comparison (#5941)
* server: bench: Init a bench scenario with K6
See #5827

* server: bench: EOL EOF

* server: bench: PR feedback and improved k6 script configuration

* server: bench: remove llamacpp_completions_tokens_seconds as it include prompt processing time and it's misleading

server: bench: add max_tokens from SERVER_BENCH_MAX_TOKENS

server: bench: increase truncated rate to 80% before failing

* server: bench: fix doc

* server: bench: change gauge custom metrics to trend

* server: bench: change gauge custom metrics to trend
server: bench: add trend custom metrics for total tokens per second average

* server: bench: doc add an option to debug http request

* server: bench: filter dataset too short and too long sequences

* server: bench: allow to filter out conversation in the dataset based on env variable

* server: bench: fix assistant message sent instead of user message

* server: bench: fix assistant message sent instead of user message

* server : add defrag thold parameter

* server: bench: select prompts based on the current iteration id not randomly to make the bench more reproducible

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-09 23:41:49 +01:00