Mirror of https://github.com/ggerganov/llama.cpp.git (synced 2025-01-11 19:21:46 +00:00)
Add Misc section + update hot topics + minor fixes
parent 2f700a2738
commit 97ab2b2578
21
README.md
@@ -5,12 +5,17 @@
 
 Inference of [Facebook's LLaMA](https://github.com/facebookresearch/llama) model in pure C/C++
 
+**Hot topics:**
+
+- Cache input prompts for faster initialization: https://github.com/ggerganov/llama.cpp/issues/64
+- Create a `llama.cpp` logo: https://github.com/ggerganov/llama.cpp/issues/105
+
 ## Description
 
 The main goal is to run the model using 4-bit quantization on a MacBook
 
 - Plain C/C++ implementation without dependencies
-- Apple silicon first-class citizen - optimized via Arm Neon and Accelerate framework
+- Apple silicon first-class citizen - optimized via ARM NEON
 - AVX2 support for x86 architectures
 - Mixed F16 / F32 precision
 - 4-bit quantization support
@@ -174,7 +179,7 @@ Note the use of `--color` to distinguish between user input and generated text.
 
 ## Limitations
 
-- I don't know yet how much the quantization affects the quality of the generated text
+- We don't know yet how much the quantization affects the quality of the generated text
 - Probably the token sampling can be improved
 - The Accelerate framework is actually currently unused since I found that for tensor shapes typical for the Decoder,
 there is no benefit compared to the ARM_NEON intrinsics implementation. Of course, it's possible that I simlpy don't
@@ -187,11 +192,15 @@ Note the use of `--color` to distinguish between user input and generated text.
 - Collaborators can push to branches in the `llama.cpp` repo
 - Collaborators will be invited based on contributions
 
-### Coding guide-lines
+### Coding guidelines
 
 - Avoid adding third-party dependencies, extra files, extra headers, etc.
 - Always consider cross-compatibility with other operating systems and architectures
-- Avoid fancy looking modern STL constructs, use basic for loops, avoid templates, keep it simple
+- Avoid fancy looking modern STL constructs, use basic `for` loops, avoid templates, keep it simple
 - There are no strict rules for the code style, but try to follow the patterns in the code (indentation, spaces, etc.). Vertical alignment makes things more readable and easier to batch edit
-- Clean-up any tailing whitespaces, use 4 spaces indentation, brackets on same line, `int * var`
-- Look at the [good first issues](https://github.com/ggerganov/llama.cpp/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) for tasks
+- Clean-up any trailing whitespaces, use 4 spaces indentation, brackets on same line, `void * ptr`, `int & a`
+- See [good first issues](https://github.com/ggerganov/llama.cpp/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) for tasks suitable for first contributions
+
+### Misc
+
+- Practice your C++ typing skills: https://typing-battles.ggerganov.com
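As a rough illustration of the formatting rules in the updated coding guidelines (4-space indentation, brackets on the same line, `void * ptr` and `int & a` spacing, basic `for` loops, vertical alignment), here is a short sketch. The function `sum_f32` and its parameters are hypothetical and exist only to show the style; they are not part of the repository.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper used only to demonstrate the formatting conventions.
// Sums n floats behind an untyped pointer and counts the negative entries.
float sum_f32(const void * ptr, std::size_t n, int & n_negative) {
    assert(ptr != nullptr);

    const float * data = (const float *) ptr;
    float         sum  = 0.0f;

    n_negative = 0;

    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] < 0.0f) {
            n_negative++;
        }
        sum += data[i];
    }

    return sum;
}
```

Note the spaces on both sides of `*` and `&` in declarations, the opening brace on the same line as the statement, and the aligned local declarations.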