* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : update readme [no ci]
* server : fix spacing [no ci]
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* server : be explicit about the pooling type in the tests
ggml-ci
* server : update /embeddings and /v1/embeddings endpoints
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* server : update readme
ggml-ci
* server : fixes
* tests : update server tests
ggml-ci
* server : update readme [no ci]
* server : remove rebase artifact
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
- Add `struct llama_sampler` and `struct llama_sampler_i`
- Add `llama_sampler_` API
- Add `llama_sampler_chain_` API for chaining multiple samplers
- Remove `LLAMA_API_INTERNAL`
- Add `llama_perf_` API and remove old `llama_print_timings` and `llama_reset_timings`
* create append_pooling operation; allow to specify attention_type; add last token pooling; update examples
* find result_norm/result_embd tensors properly; update output allocation logic
* only use embd output for pooling_type NONE
* get rid of old causal_attn accessor
* take out attention_type; add in llama_set_embeddings
* bypass logits when doing non-NONE pooling
* common : gpt_params_parse do not print usage
* common : rework usage print (wip)
* common : valign
* common : rework print_usage
* infill : remove cfg support
* common : reorder args
* server : deduplicate parameters
ggml-ci
* common : add missing header
ggml-ci
* common : remote --random-prompt usages
ggml-ci
* examples : migrate to gpt_params
ggml-ci
* batched-bench : migrate to gpt_params
* retrieval : migrate to gpt_params
* common : change defaults for escape and n_ctx
* common : remove chatml and instruct params
ggml-ci
* common : passkey use gpt_params
This commit updates the hf.sh script usage to include the --outdir option
and specifies the models directory as the output directory.
The motivation for this is to avoid cluttering the root directory with
model files.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* gritlm: add initial README.md to examples/gritlm
This commit adds a suggestion for an initial README.md for the gritlm
example.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* squash! gritlm: add initial README.md to examples/gritlm
Use the `scripts/hf.sh` script to download the model file.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* squash! gritlm: add initial README.md to examples/gritlm
Fix editorconfig-checker error in examples/gritlm/README.md.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
---------
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* add gritlm example
* gritlm results match
* tabs to spaces
* comment out debug printing
* rebase to new embed
* gritlm embeddings are back babeee
* add to gitignore
* allow to toggle embedding mode
* Clean-up GritLM sample code.
* Fix types.
* Flush stdout and output ending newline if streaming.
* mostly style fixes; correct KQ_mask comment
* add causal_attn flag to llama_cparams
* gritml : minor
* llama : minor
---------
Co-authored-by: Douglas Hanley <thesecretaryofwar@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>