Molly Sophia
4ff7fe1fb3
llama : add chat template for RWKV-World + fix EOT ( #9968 )
...
* Add chat template for RWKV-World
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV: Fix the chat template not being used
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV v6: Set EOT token to `\n\n`
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* readme: add rwkv into supported model list
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-10-22 13:33:37 +03:00
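As background for this entry, here is a minimal sketch of applying a chat template stored in a model's GGUF metadata (such as the RWKV-World template added here) through the public llama.h API of this period. The model path, messages, and rendered output are illustrative assumptions, not code from the PR:

```cpp
#include "llama.h"

#include <cstdio>
#include <vector>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    // hypothetical model file; any GGUF that carries a chat template in its metadata works
    llama_model * model = llama_load_model_from_file("rwkv-v6-world.gguf", mparams);
    if (model == nullptr) {
        return 1;
    }

    llama_chat_message msgs[] = {
        { "user",      "What is RWKV?"                        },
        { "assistant", "A linear-attention RNN architecture." },
        { "user",      "Thanks!"                              },
    };

    std::vector<char> buf(4096);
    // tmpl == nullptr -> use the chat template stored in the model's metadata
    const int32_t n = llama_chat_apply_template(model, /*tmpl =*/ nullptr,
            msgs, sizeof(msgs)/sizeof(msgs[0]), /*add_ass =*/ true,
            buf.data(), (int32_t) buf.size());
    if (n > 0 && n <= (int32_t) buf.size()) {
        // for RWKV-World the result should look roughly like:
        //   User: What is RWKV?\n\nAssistant: ...\n\nUser: Thanks!\n\nAssistant:
        printf("%.*s\n", n, buf.data());
    }

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

A negative return value means the template was not recognized; a result larger than the buffer means the buffer should be grown and the call retried. The `\n\n` EOT fix matters here because generation under this template should stop at the blank line that terminates each turn.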
Asghar Ghorbani
994cfb1acb
readme : update UI list ( #9972 )
...
add PocketPal AI app
2024-10-21 21:20:59 +03:00
Loïc Carrère
45f097645e
readme : update bindings list ( #9951 )
...
Update the binding list by adding LM-Kit.NET (C# & VB.NET)
2024-10-20 19:25:41 +03:00
icppWorld
7cab2083c7
readme : update infra list ( #9942 )
...
llama_cpp_canister allows you to run llama.cpp as a Smart Contract on the Internet Computer. The smart contract runs as WebAssembly in a so-called 'canister'.
2024-10-20 19:01:34 +03:00
Ma Mingfei
60ce97c9d8
add amx kernel for gemm ( #8998 )
...
add intel amx isa detection
add vnni kernel for gemv cases
add vnni and amx kernel support for block_q8_0
code cleanup
fix packing B issue
enable openmp
fine tune amx kernel
switch to aten parallel pattern
add error message for nested parallelism
code cleanup
add f16 support in ggml-amx
add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS
update CMakeLists
update README
fix some compilation warnings
fix compiler warning when amx is not enabled
minor change
ggml-ci
move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp
ggml-ci
update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16
ggml-ci
add amx as a ggml-backend
update header file; the old path for immintrin.h has changed to ggml-cpu-impl.h
minor change
update CMakeLists.txt
minor change
apply weight prepacking in set_tensor method in ggml-backend
fix compile error
ggml-ci
minor change
ggml-ci
update CMakeLists.txt
ggml-ci
add march dependency
minor change
ggml-ci
change ggml_backend_buffer_is_host to return false for amx backend
ggml-ci
fix supports_op
use device reg for AMX backend
ggml-ci
minor change
ggml-ci
minor change
fix rebase
set .buffer_from_host_ptr to be false for AMX backend
2024-10-18 13:34:36 +08:00
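As context for the entry above, a self-contained sketch of the Intel AMX tile primitives these GEMM kernels build on. This is not the ggml-amx code itself; the 16x16 micro-kernel shape is an illustrative assumption. It is Linux-specific (the arch_prctl permission request) and needs -mamx-tile and -mamx-int8, matching the flags added to CMakeLists above:

```cpp
#include <immintrin.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <cstdint>
#include <cstring>

#define ARCH_REQ_XCOMP_PERM 0x1023
#define XFEATURE_XTILEDATA  18

// the 64-byte tile configuration consumed by ldtilecfg
struct alignas(64) tile_config_t {
    uint8_t  palette_id;   // must be 1
    uint8_t  start_row;
    uint8_t  reserved[14];
    uint16_t colsb[16];    // bytes per row of each tile register
    uint8_t  rows[16];     // rows of each tile register
};

// C (16x16 int32) += A (16x64 int8) * B (64x16 int8, VNNI-packed into 16 rows of 64 bytes)
static void amx_gemm_16x16(int32_t * C, const int8_t * A, const int8_t * B) {
    // on Linux the kernel must grant the AMX tile-data state before first use
    syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA);

    tile_config_t cfg;
    std::memset(&cfg, 0, sizeof(cfg));
    cfg.palette_id = 1;
    for (int t = 0; t < 3; t++) {
        cfg.colsb[t] = 64; // full-width tiles: 16 rows x 64 bytes
        cfg.rows[t]  = 16;
    }
    _tile_loadconfig(&cfg);

    _tile_zero(0);          // tmm0: int32 accumulator
    _tile_loadd(1, A, 64);  // tmm1: A, 64-byte row stride
    _tile_loadd(2, B, 64);  // tmm2: B, already VNNI-packed
    _tile_dpbssd(0, 1, 2);  // tmm0 += tmm1 * tmm2 (signed int8 dot products)
    _tile_stored(0, C, 64); // write back the 16x16 int32 result
    _tile_release();
}
```

The VNNI packing requirement on B is why the commit applies weight prepacking in the backend's set_tensor: rearranging B once at load time keeps the hot loop to plain tile loads and dot-products.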
Tim Wang
3752217ed5
readme : update bindings list ( #9918 )
...
Co-authored-by: Tim Wang <tim.wang@ing.com>
2024-10-17 09:57:14 +03:00
Michał Tuszyński
4c42f93b22
readme : update bindings list ( #9889 )
2024-10-15 11:20:34 +03:00
R0CKSTAR
943d20b411
musa : update doc ( #9856 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-10-12 08:09:53 +03:00
Viet-Anh NGUYEN (Andrew)
71967c2a6d
Add Llama Assistant ( #9744 )
2024-10-04 20:29:35 +02:00
Paweł Wodnicki
3f1ae2e32c
Update README.md ( #9591 )
...
Add Bielik model.
2024-10-01 19:18:46 +02:00
Georgi Gerganov
589b48d41e
contrib : add Resources section ( #9675 )
2024-09-29 14:38:18 +03:00
Aarni Koskela
43bcdd9703
readme : add tool ( #9655 )
2024-09-28 15:07:14 +03:00
Georgi Gerganov
b5de3b74a5
readme : update hot topics
2024-09-27 20:57:51 +03:00
Riceball LEE
1d48e98e4f
readme : add programmable prompt engine language CLI ( #9599 )
2024-09-23 18:58:17 +03:00
Shane A
0aadac10c7
llama : support OLMoE ( #9462 )
2024-09-16 09:47:37 +03:00
OSecret
d6b37c881f
readme : update tools list ( #9475 )
...
* Added a link to a proprietary wrapper for Unity3d to README.md
The wrapper has a prebuilt library and was tested on the iOS, Android, WebGL, PC, and Mac platforms; it has online demos like [this](https://d23myu0xfn2ttc.cloudfront.net/rich/index.html) and [that](https://d23myu0xfn2ttc.cloudfront.net/).
* Update README.md
Fixes upon review
2024-09-15 10:36:53 +03:00
Faisal Zaghloul
449ccfb6f5
Add Jais to list of supported models ( #9439 )
...
Co-authored-by: fmz <quic_fzaghlou@quic.com>
2024-09-12 02:29:53 +02:00
Georgi Gerganov
38ca6f644b
readme : update hot topics
2024-09-09 15:51:37 +03:00
Antonis Makropoulos
5ed087573e
readme : add LLMUnity to UI projects ( #9381 )
...
* add LLMUnity to UI projects
* add newline to examples/rpc/README.md to fix editorconfig-checker unit test
2024-09-09 14:21:38 +03:00
Georgi Gerganov
b69a480af4
readme : refactor API section + remove old hot topics
2024-09-03 10:00:36 +03:00
Younes Belkada
b40eb84895
llama : support for falcon-mamba architecture ( #9074 )
...
* feat: initial support for llama.cpp
* fix: lint
* refactor: better refactor
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* fix: address comments
* Update convert_hf_to_gguf.py
Co-authored-by: compilade <git@compilade.net>
* fix: add more cleanup and harmonization
* fix: lint
* Update gguf-py/gguf/gguf_writer.py
Co-authored-by: compilade <git@compilade.net>
* fix: change name
* Apply suggestions from code review
Co-authored-by: compilade <git@compilade.net>
* add in operator
* fix: add `dt_b_c_rms` in `llm_load_print_meta`
* fix: correct printf format for bool
* fix: correct print format
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* llama : quantize more Mamba tensors
* llama : use f16 as the fallback of fallback quant types
---------
Co-authored-by: compilade <git@compilade.net>
2024-08-21 11:06:36 +03:00
wangshuai09
cfac111e2b
cann: add doc for cann backend ( #8867 )
...
Co-authored-by: xuedinge233 <damow890@gmail.com>
Co-authored-by: hipudding <huafengchun@gmail.com>
2024-08-19 16:46:38 +08:00
Minsoo Cheong
c679e0cb5c
llama : add EXAONE model support ( #9025 )
...
* add exaone model support
* add chat template
* fix whitespace
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add ftype
* add exaone pre-tokenizer in `llama-vocab.cpp`
Co-Authored-By: compilade <113953597+compilade@users.noreply.github.com>
* fix lint
Co-Authored-By: compilade <113953597+compilade@users.noreply.github.com>
* add `EXAONE` to supported models in `README.md`
* fix space
Co-authored-by: compilade <git@compilade.net>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <113953597+compilade@users.noreply.github.com>
Co-authored-by: compilade <git@compilade.net>
2024-08-16 09:35:18 +03:00
Frank Mai
84eb2f4fad
docs: introduce gpustack and gguf-parser ( #8873 )
...
* readme: introduce gpustack
GPUStack is an open-source GPU cluster manager for running large language models; it uses llama.cpp as the backend.
Signed-off-by: thxCode <thxcode0824@gmail.com>
* readme: introduce gguf-parser
GGUF Parser is a tool to review/check a GGUF file and estimate its memory usage without downloading the whole model.
Signed-off-by: thxCode <thxcode0824@gmail.com>
---------
Signed-off-by: thxCode <thxcode0824@gmail.com>
2024-08-12 14:45:50 +02:00
Eric Curtin
b42978e7e4
readme : add ramalama to the available UIs ( #8811 )
...
ramalama is a repo-agnostic, boring CLI tool that supports pulling from ollama, huggingface, and oci registries.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2024-08-05 15:45:01 +03:00
BarfingLemurs
400ae6f65f
readme : update model list ( #8851 )
2024-08-05 08:54:10 +03:00
R0CKSTAR
e54c35e4fb
feat: Support Moore Threads GPU ( #8383 )
...
* Update doc for MUSA
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Add GGML_MUSA in Makefile
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Add GGML_MUSA in CMake
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* CUDA => MUSA
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* MUSA adds support for __vsubss4
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Fix CI build failure
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-07-28 01:41:25 +02:00
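A note on the __vsubss4 item above: it is a CUDA device intrinsic performing lane-wise saturating subtraction of four signed 8-bit values packed into one 32-bit integer, which the MUSA toolchain apparently lacked at the time. A plain C++ emulation, written here only to illustrate the semantics:

```cpp
#include <algorithm>
#include <cstdint>

// emulate __vsubss4: subtract four packed signed 8-bit lanes with saturation
int32_t vsubss4_emulated(int32_t a, int32_t b) {
    uint32_t out = 0;
    for (int lane = 0; lane < 4; lane++) {
        const int8_t va = (int8_t) ((uint32_t) a >> (8 * lane));
        const int8_t vb = (int8_t) ((uint32_t) b >> (8 * lane));
        int d = (int) va - (int) vb;
        d = std::min(127, std::max(-128, d));  // clamp to the int8 range
        out |= (uint32_t) (uint8_t) d << (8 * lane);
    }
    return (int32_t) out;
}
```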
MorganRO8
68504f0970
readme : update games list ( #8673 )
...
Added a link to a game I made that depends on llama
2024-07-24 19:48:00 +03:00
Thorsten Sommer
3a7ac5300a
readme : update UI list [no ci] ( #8505 )
2024-07-24 15:52:30 +03:00
Georgi Gerganov
be0cfb4175
readme : fix server badge
2024-07-19 14:34:55 +03:00
Andy Salerno
fd560fe680
Update README.md to fix broken link to docs ( #8399 )
...
Update the "Performance troubleshooting" doc link to be correct - the file was moved into a dir called 'development'
2024-07-09 14:58:44 -04:00
b4b4o
c4dd11d1d3
readme : fix web link error [no ci] ( #8347 )
2024-07-08 17:19:24 +03:00
toyer
04ce3a8b19
readme : add supported glm models ( #8360 )
2024-07-08 08:57:19 +03:00
Andy Tai
f1948f1e10
readme : update bindings list ( #8222 )
...
* adding guile_llama_cpp to binding list
* fix formatting
* fix formatting
2024-07-07 16:21:37 +03:00
Xuan Son Nguyen
60d83a0149
update main readme ( #8333 )
2024-07-06 19:01:23 +02:00
Xuan Son Nguyen
be20e7f49d
Reorganize documentation pages ( #8325 )
...
* re-organize docs
* add link among docs
* add link to build docs
* fix style
* de-duplicate sections
2024-07-05 18:08:32 +02:00
Georgi Gerganov
6c05752c50
contributing : update guidelines ( #8316 )
2024-07-05 09:09:47 +03:00
Georgi Gerganov
e235b267a2
py : switch to snake_case ( #8305 )
...
* py : switch to snake_case
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* cont : fix link
* gguf-py : use snake_case in scripts entrypoint export
* py : rename requirements for convert_legacy_llama.py
Needed for scripts/check-requirements.sh
---------
Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-07-05 07:53:33 +03:00
Mateusz Charytoniuk
dae57a1ebc
readme: add Paddler to the list of projects ( #8239 )
2024-07-01 20:13:22 +03:00
Xuan Son Nguyen
49122a873f
gemma2: add sliding window mask ( #8227 )
...
* gemma2: add sliding window mask
* fix data_swa uninitialized
* better naming
* add co-author
Co-authored-by: Arlo Phoenix <arlo-phoenix@users.noreply.github.com>
* replace list with single tensor
* update
* llama : minor styling
* convert : add sanity check for query_pre_attn_scalar
* fix small typo in README
---------
Co-authored-by: Arlo Phoenix <arlo-phoenix@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-01 18:48:34 +02:00
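The idea behind the sliding-window mask added here, sketched in plain C++ (the dense-mask representation and names are illustrative; the real code builds the mask inside the ggml graph): position i may attend to position j only when j <= i (causal) and i - j < window, with window = 4096 on Gemma 2's local-attention layers.

```cpp
#include <cfloat>
#include <vector>

// additive attention mask: 0.0f where attention is allowed,
// -FLT_MAX (effectively -inf before the softmax) where it is masked out
std::vector<float> build_swa_mask(int n_tokens, int window) {
    std::vector<float> mask((size_t) n_tokens * n_tokens, -FLT_MAX);
    for (int i = 0; i < n_tokens; i++) {
        for (int j = 0; j <= i; j++) {  // causal: no future positions
            if (i - j < window) {       // SWA: only the last `window` positions
                mask[(size_t) i * n_tokens + j] = 0.0f;
            }
        }
    }
    return mask;
}
```

In Gemma 2 this local mask applies on alternating layers, while the remaining layers keep the full causal mask.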
Roni
0ddeff1023
readme : update tool list ( #8209 )
...
* Added gppm to Tool list in README
* Update README.md
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-01 15:48:16 +03:00
iacore
694c59cb42
Document BERT support. ( #8205 )
...
* Update README.md
document BERT support
* Update README.md
2024-07-01 13:40:58 +02:00
Georgi Gerganov
a95631ee97
readme : update API notes
2024-06-26 19:26:13 +03:00
Georgi Gerganov
f3f65429c4
llama : reorganize source code + improve CMake ( #8006 )
...
* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build
ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build
ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE
ggml-ci
* move public backend headers to the public include directory (#8122 )
* move public backend headers to the public include directory
* nix test
* spm : fix metal header
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00
Isaac McFadyen
8854044561
Clarify default MMQ for CUDA and LLAMA_CUDA_FORCE_MMQ flag ( #8115 )
...
* Add message about int8 support
* Add suggestions from review
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-06-26 08:29:28 +02:00
Johannes Gäßler
a818f3028d
CUDA: use MMQ instead of cuBLAS by default ( #8075 )
2024-06-24 17:43:42 +02:00
Abheek Gulati
1193778105
readme : update UI list ( #7943 )
2024-06-18 09:57:41 +03:00
Bryan Honof
b473e95084
Add Nix and Flox install instructions ( #7899 )
2024-06-17 09:37:55 -06:00
hopkins385
6fe1c62741
readme : update UI list [no ci] ( #7958 )
2024-06-16 14:51:18 +03:00
Galunid
a55eb1bf0f
readme : Remove outdated instructions from README.md ( #7914 ) [no ci]
2024-06-13 09:42:41 +02:00