Mirror of https://github.com/ggerganov/llama.cpp.git, synced 2024-11-13 14:29:52 +00:00

update main readme (#8333)

Commit: 60d83a0149
Parent: 87e25a1d1b

Changed file: README.md (43 lines changed)
@@ -391,28 +391,21 @@ The `grammars/` folder contains a handful of sample grammars. To write your own,
 For authoring more complex JSON grammars, you can also check out https://grammar.intrinsiclabs.ai/, a browser app that lets you write TypeScript interfaces which it compiles to GBNF grammars that you can save for local use. Note that the app is built and maintained by members of the community, please file any issues or FRs on [its repo](http://github.com/intrinsiclabsai/gbnfgen) and not this one.
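The grammar workflow mentioned in the context line above can be sketched as follows. This is an illustrative example, not part of the diff; the grammar body, file name, and `--grammar-file` flag are assumptions to verify against the repo's grammar docs for your version:

```shell
# Illustrative sketch only (not part of this diff): write a tiny GBNF grammar
# that restricts generation to the literal strings "yes" or "no".
cat > yesno.gbnf <<'EOF'
root ::= "yes" | "no"
EOF

# Hypothetical invocation; the binary name and the --grammar-file flag should
# be checked against the docs for your llama.cpp version:
#   ./llama-cli -m model.gguf --grammar-file yesno.gbnf -p "Is the sky blue?"
```

The grammar file is plain text, so it can be kept alongside the samples in `grammars/` and reused across runs.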
-### Obtaining and using the Facebook LLaMA 2 model
-
-- Refer to [Facebook's LLaMA download page](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) if you want to access the model data.
-- Alternatively, if you want to save time and space, you can download already converted and quantized models from [TheBloke](https://huggingface.co/TheBloke), including:
-  - [LLaMA 2 7B base](https://huggingface.co/TheBloke/Llama-2-7B-GGUF)
-  - [LLaMA 2 13B base](https://huggingface.co/TheBloke/Llama-2-13B-GGUF)
-  - [LLaMA 2 70B base](https://huggingface.co/TheBloke/Llama-2-70B-GGUF)
-  - [LLaMA 2 7B chat](https://huggingface.co/TheBloke/Llama-2-7B-chat-GGUF)
-  - [LLaMA 2 13B chat](https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF)
-  - [LLaMA 2 70B chat](https://huggingface.co/TheBloke/Llama-2-70B-chat-GGUF)
+## Build
+
+Please refer to [Build llama.cpp locally](./docs/build.md)
-### Seminal papers and background on the models
-
-If your issue is with model generation quality, then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT:
-
-- LLaMA:
-  - [Introducing LLaMA: A foundational, 65-billion-parameter large language model](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
-  - [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
-- GPT-3
-  - [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
-- GPT-3.5 / InstructGPT / ChatGPT:
-  - [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
-  - [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
+## Supported backends
+
+| Backend | Target devices |
+| --- | --- |
+| [Metal](./docs/build.md#metal-build) | Apple Silicon |
+| [BLAS](./docs/build.md#blas-build) | All |
+| [BLIS](./docs/backend/BLIS.md) | All |
+| [SYCL](./docs/backend/SYCL.md) | Intel and Nvidia GPU |
+| [CUDA](./docs/build.md#cuda) | Nvidia GPU |
+| [hipBLAS](./docs/build.md#hipblas) | AMD GPU |
+| [Vulkan](./docs/build.md#vulkan) | GPU |
 
 ## Tools
@@ -460,3 +453,15 @@ To learn more how to measure perplexity using llama.cpp, [read this documentatio
 - [Build on Android](./docs/android.md)
 - [Performance troubleshooting](./docs/token_generation_performance_tips.md)
 - [GGML tips & tricks](https://github.com/ggerganov/llama.cpp/wiki/GGML-Tips-&-Tricks)
+
+**Seminal papers and background on the models**
+
+If your issue is with model generation quality, then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT:
+- LLaMA:
+  - [Introducing LLaMA: A foundational, 65-billion-parameter large language model](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
+  - [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
+- GPT-3
+  - [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
+- GPT-3.5 / InstructGPT / ChatGPT:
+  - [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
+  - [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
|
@@ -85,7 +85,7 @@ Building the program with BLAS support may lead to some performance improvements
 ### Accelerate Framework:
 
 This is only available on Mac PCs and it's enabled by default. You can just build using the normal instructions.
 
 ### OpenBLAS:
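The two BLAS variants in the context lines above differ only in CMake configuration. A hedged sketch, assuming the `GGML_BLAS` option names from docs/build.md of this period (option names have been renamed across releases, so verify against your checkout; the commands are echoed here rather than run):

```shell
# Sketch of the two BLAS-backed configurations described above. The -D option
# names below are assumptions from docs/build.md of this era, not taken from
# this diff; check them against your llama.cpp version.

# Accelerate (macOS): enabled by default, so the plain build suffices.
echo "cmake -B build && cmake --build build --config Release"

# OpenBLAS (other platforms): select the BLAS vendor explicitly.
BLAS_FLAGS="-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS"
echo "cmake -B build $BLAS_FLAGS && cmake --build build --config Release"
```

Either configuration produces the same binaries; only the math library linked for prompt-processing GEMM differs.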