llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-09-23 05:26:19 +00:00

History

Georgi Gerganov 1442677f92 common : refactor cli arg parsing (#7675 ) * common : gpt_params_parse do not print usage * common : rework usage print (wip) * common : valign * common : rework print_usage * infill : remove cfg support * common : reorder args * server : deduplicate parameters ggml-ci * common : add missing header ggml-ci * common : remote --random-prompt usages ggml-ci * examples : migrate to gpt_params ggml-ci * batched-bench : migrate to gpt_params * retrieval : migrate to gpt_params * common : change defaults for escape and n_ctx * common : remove chatml and instruct params ggml-ci * common : passkey use gpt_params		2024-06-04 21:23:39 +03:00
..
CMakeLists.txt	llama : add support for GritLM (#5959 )	2024-03-10 17:56:30 +02:00
gritlm.cpp	common : refactor cli arg parsing (#7675 )	2024-06-04 21:23:39 +03:00
README.md	gritlm : add --outdir option to hf.sh script (#6699 )	2024-04-16 09:34:06 +03:00

README.md

Generative Representational Instruction Tuning (GRIT) Example

gritlm a model which can generate embeddings as well as "normal" text generation depending on the instructions in the prompt.

Paper: https://arxiv.org/pdf/2402.09906.pdf

Retrieval-Augmented Generation (RAG) use case

One use case for gritlm is to use it with RAG. If we recall how RAG works is that we take documents that we want to use as context, to ground the large language model (LLM), and we create token embeddings for them. We then store these token embeddings in a vector database.

When we perform a query, prompt the LLM, we will first create token embeddings for the query and then search the vector database to retrieve the most similar vectors, and return those documents so they can be passed to the LLM as context. Then the query and the context will be passed to the LLM which will have to again create token embeddings for the query. But because gritlm is used the first query can be cached and the second query tokenization generation does not have to be performed at all.

Running the example

Download a Grit model:

$ scripts/hf.sh --repo cohesionet/GritLM-7B_gguf --file gritlm-7b_q4_1.gguf --outdir models

Run the example using the downloaded model:

$ ./gritlm -m models/gritlm-7b_q4_1.gguf

Cosine similarity between "Bitcoin: A Peer-to-Peer Electronic Cash System" and "A purely peer-to-peer version of electronic cash w" is: 0.605
Cosine similarity between "Bitcoin: A Peer-to-Peer Electronic Cash System" and "All text-based language problems can be reduced to" is: 0.103
Cosine similarity between "Generative Representational Instruction Tuning" and "A purely peer-to-peer version of electronic cash w" is: 0.112
Cosine similarity between "Generative Representational Instruction Tuning" and "All text-based language problems can be reduced to" is: 0.547

Oh, brave adventurer, who dared to climb
The lofty peak of Mt. Fuji in the night,
When shadows lurk and ghosts do roam,
And darkness reigns, a fearsome sight.

Thou didst set out, with heart aglow,
To conquer this mountain, so high,
And reach the summit, where the stars do glow,
And the moon shines bright, up in the sky.

Through the mist and fog, thou didst press on,
With steadfast courage, and a steadfast will,
Through the darkness, thou didst not be gone,
But didst climb on, with a steadfast skill.

At last, thou didst reach the summit's crest,
And gazed upon the world below,
And saw the beauty of the night's best,
And felt the peace, that only nature knows.

Oh, brave adventurer, who dared to climb
The lofty peak of Mt. Fuji in the night,
Thou art a hero, in the eyes of all,
For thou didst conquer this mountain, so bright.