HanClinto
72660c357c
Updating run-with-preset.py to use new binary names.
...
Updating docs around `perplexity` binary rename.
2024-06-10 15:23:32 -07:00
HanClinto
2fd66b2ce2
Updating a few lingering doc references for rename of main to llama-cli
2024-06-10 14:53:23 -07:00
HanClinto
e7e03733b2
Updating docs for eval-callback binary to use new llama- prefix.
2024-06-10 14:44:46 -07:00
ochafik
0be5f399c4
add two missing llama- prefixes
2024-06-10 22:00:28 +01:00
Olivier Chafik
f9cfd04bd4
address gbnf-validator unused fread warning (switched to C++ / ifstream)
2024-06-10 17:38:36 +01:00
Olivier Chafik
b8436395b4
rename: llama-cli-cmake-pkg(.exe)
2024-06-10 16:23:45 +01:00
Olivier Chafik
4881a94bee
fix test-eval-callback
2024-06-10 16:21:14 +01:00
Olivier Chafik
b8cb44e812
more llama-cli(.exe)
2024-06-10 16:08:06 +01:00
Olivier Chafik
051633ed2d
update dockerfile refs
2024-06-10 16:05:11 +01:00
Olivier Chafik
1cc651446d
rename(make): llama-baby-llama
2024-06-10 16:03:18 +01:00
Olivier Chafik
0fcf2c328e
rename dockerfile w/ llama-cli
2024-06-10 15:44:49 +01:00
Olivier Chafik
0bb2a3f233
fix some missing -cli suffixes
2024-06-10 15:42:20 +01:00
Olivier Chafik
daeaeb1222
Merge remote-tracking branch 'origin/master' into bins
2024-06-10 15:38:41 +01:00
Olivier Chafik
5265c15d4c
rename llama|main -> llama-cli; consistent RPM bin prefixes
2024-06-10 15:34:14 +01:00
slaren
fd5ea0f897
ci : try win-2019 on server windows test ( #7854 )
2024-06-10 15:18:41 +03:00
Georgi Gerganov
c28a83902c
examples : remove --instruct remnants ( #7846 )
2024-06-10 15:00:15 +03:00
Georgi Gerganov
d9da0e4986
server : improve "prompt" handling ( #7847 )
2024-06-10 14:59:55 +03:00
Johannes Gäßler
1f0dabda8d
CUDA: use tensor cores for MMQ ( #7676 )
...
* CUDA: int8 tensor cores for MMQ (legacy quants)
* fix out-of-bounds writes
* __builtin_assume -> GGML_CUDA_ASSUME
* fix writeback returning too early
2024-06-10 11:45:13 +02:00
Ben Ashbaugh
af4ae502dd
use the correct SYCL context for host USM allocations ( #7777 )
...
Signed-off-by: Ben Ashbaugh <ben.ashbaugh@intel.com>
2024-06-10 10:21:31 +01:00
Georgi Gerganov
10ceba354a
flake.lock: Update ( #7838 )
...
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29)
→ 'github:NixOS/nixpkgs/051f920625ab5aabe37c920346e3e69d7d34400e?narHash=sha256-4q0s6m0GUcN7q%2BY2DqD27iLvbcd1G50T2lv08kKxkSI%3D' (2024-06-07)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-06-09 16:04:50 -07:00
Georgi Gerganov
e95beeb1fc
imatrix : handle partial entries ( #7833 )
2024-06-09 20:19:35 +03:00
Nicolás Pérez
57bf62ce7c
docs: Added initial PR template with directions for doc only changes and squash merges [no ci] ( #7700 )
...
This commit adds pull_request_template.md and CONTRIBUTING.md . It focuses on explaining to contributors the need to rate PR complexity level, when to add [no ci] and how to format PR title and descriptions.
Co-authored-by: Brian <mofosyne@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2024-06-10 01:24:29 +10:00
mgroeber9110
3e2ee44315
server: do not remove whitespace at the start of a completion chunk ( #7830 )
2024-06-09 20:50:35 +10:00
Johannes Gäßler
42b53d192f
CUDA: revise q8_1 data layout for mul_mat_q ( #7824 )
2024-06-09 09:42:25 +02:00
sasha0552
2decf57bc6
convert-hf : set the model name based on cli arg, if present ( #7693 )
...
`--model-name` argument was added a while ago but did not do anything.
This commit fixes this issue and enables this feature.
2024-06-09 16:39:25 +10:00
compilade
5795b94182
convert-hf : match model part name prefix and suffix ( #7687 )
...
In #7075 , to fix the conversion of (some) models using model-00001-of-00001.safetensors instead of model.safetensors for a single model part we simply used the same logic as the part count to get the part names.
But this doesn't always work correctly, like when unusual additional model files like consolidated.safetensors in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 are present.
This commit matches both the prefix and the suffix of the model part names, which should fix this problem without breaking any previously-supported upstream models. According to a report by @teleprint-me, some persistent problem remains, but this shall do in the meantime.
2024-06-09 12:47:25 +10:00
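The prefix-and-suffix matching described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the converter's actual code: the helper name `match_part_names` and the sample file list are assumptions made up for the example.

```python
def match_part_names(filenames, prefix, suffix):
    """Return the sorted names that start with `prefix` AND end with `suffix`.

    Hypothetical helper for illustration only.
    """
    return sorted(
        name for name in filenames
        if name.startswith(prefix) and name.endswith(suffix)
    )


# Sample directory listing (made up for this example).
files = [
    "config.json",
    "consolidated.safetensors",          # extra file that must be ignored
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
]

parts = match_part_names(files, "model", ".safetensors")
single = match_part_names(["model.safetensors", "consolidated.safetensors"],
                          "model", ".safetensors")
```

Matching on the suffix alone would also pick up `consolidated.safetensors`; requiring the prefix as well keeps both the single-file and the multi-part naming schemes while skipping the extra file.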
compilade
ed9f252118
gguf-py : decouple adding metadata from writing in GGUFWriter ( #7827 )
...
The main change of this PR is to consolidate GGUFWriter.add_key and GGUFWriter.add_val into GGUFWriter.add_key_value.
In addition, use_temp_file is now opt-in instead of opt-out, defaulting to False.
GGUFWriter also no longer requires the output file name until it actually writes to it.
And GGUFWriter doesn't really need to eagerly prepare the data layout of the metadata.
2024-06-09 12:34:29 +10:00
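The idea of decoupling "adding metadata" from "writing" in the commit above can be illustrated with a small stand-in class. This is only a sketch under stated assumptions: `DeferredWriter`, its `write_to_file` method, and the JSON output are hypothetical and are not the gguf-py API or the GGUF file format. Only the method name `add_key_value` is borrowed from the commit message.

```python
import json
import os
import tempfile


class DeferredWriter:
    """Hypothetical writer: collects metadata first, needs a path only on write."""

    def __init__(self):
        # No output path and no data layout are prepared up front.
        self.kv = {}

    def add_key_value(self, key, value):
        # One call records both key and value (instead of separate add_key/add_val).
        if key in self.kv:
            raise ValueError(f"duplicate key: {key!r}")
        self.kv[key] = value

    def write_to_file(self, path):
        # The layout is decided here, at write time (JSON stands in for a real format).
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.kv, f)


writer = DeferredWriter()
writer.add_key_value("general.name", "demo")
writer.add_key_value("general.file_type", 1)

# The output location is chosen only now, after all metadata has been added.
out_path = os.path.join(tempfile.mkdtemp(), "meta.json")
writer.write_to_file(out_path)
```

Keeping the metadata in a plain dictionary until write time is what makes it possible to defer both the file name and the layout decisions.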
slaren
fe1e3917cf
Revert "[SYCL] Update rpc-server.cpp to include SYCL backend ( #7682 )" ( #7808 )
...
This reverts commit 9422c5e34b.
2024-06-09 01:43:39 +02:00
Olivier Chafik
d4d915d351
url: save -mu downloads to new cache location ( #7826 )
...
* url: save -mu download to new cache location
* url: fs_get_cache_file_path util
* url: tweak sig of fs_get_cache_file
2024-06-08 21:21:08 +02:00
Olivier Chafik
347f30803f
rename Dockerfiles
2024-06-08 15:10:32 +01:00
Olivier Chafik
78eae7f3ba
gitignore /llama-*
2024-06-08 14:29:35 +01:00
Olivier Chafik
efaa441233
fix llama-lookup-* Makefile rules
2024-06-08 14:26:11 +01:00
Olivier Chafik
b0eb3b88e9
rm bin files
2024-06-08 14:16:32 +01:00
Olivier Chafik
eef922e02e
sort cmake example subdirs
2024-06-08 14:09:28 +01:00
Olivier Chafik
b648243496
add/fix gbnf-validator subfolder to cmake
2024-06-08 14:07:56 +01:00
Olivier Chafik
81222f02db
prefix more cmake targets w/ llama-
2024-06-08 14:05:34 +01:00
Olivier Chafik
10650b692d
rename {main->llama}-cmake-pkg binary
2024-06-08 13:57:06 +01:00
Olivier Chafik
78bca8cb07
fix main refs
2024-06-08 13:52:03 +01:00
Olivier Chafik
ab5efbb3b6
Prefix all example bins w/ llama-
2024-06-08 13:42:01 +01:00
Olivier Chafik
23d0df5bd5
main: target name -> llama-cli
2024-06-08 12:50:35 +01:00
Olivier Chafik
fe93cc96cc
Merge remote-tracking branch 'origin/master' into bins
2024-06-08 12:04:52 +01:00
sasha0552
7a16ce7db2
server : smart slot selection using Longest Common Prefix ( #7728 )
...
* server : Smart selection of available slot using Longest Common Substring
* add usage
* remove trailing whitespaces
* Use Longest Common Prefix (LCP) instead of LCS
* Rename argument
2024-06-08 10:50:31 +03:00
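The slot selection by Longest Common Prefix (LCP) named in the commit above can be sketched as follows. This is a minimal illustration, not the server's implementation: the function names, the `min_similarity` threshold, and the choice to measure similarity as the LCP length divided by the cached prompt length are all assumptions for the example.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_slot(slot_prompts, new_prompt, min_similarity=0.5):
    """Return the id of the slot whose cached prompt shares the longest prefix
    with `new_prompt`, or None if no slot is similar enough.

    `slot_prompts` maps slot id -> cached prompt (hypothetical structure).
    """
    best_id, best_len = None, 0
    for slot_id, cached in slot_prompts.items():
        if not cached:
            continue  # an empty slot has nothing to reuse
        lcp = common_prefix_len(cached, new_prompt)
        similarity = lcp / len(cached)  # assumed similarity measure
        if lcp > best_len and similarity > min_similarity:
            best_id, best_len = slot_id, lcp
    return best_id


slots = {
    0: "You are a helpful assistant. Hi",
    1: "Translate to French: cat",
}
chosen = pick_slot(slots, "You are a helpful assistant. Hello")
no_match = pick_slot(slots, "Summarize this article")
```

A prefix, unlike a common substring in the middle of the prompt, corresponds to the part of a cached prompt that can be reused without reprocessing, which is presumably why the commit switched from LCS to LCP.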
slaren
da799b4189
vulkan : reuse parent extra for views ( #7806 )
...
* vulkan : reuse parent extra for views
* Fix validation error when multiple compute contexts are used in a graph
---------
Co-authored-by: 0cc4m <picard12@live.de>
2024-06-07 19:47:49 +02:00
Christian Zhou-Zheng
c00fad71e5
gguf-split : change binary multi-byte units to decimal ( #7803 )
2024-06-07 15:56:01 +03:00
intelmatt
27615f5ab2
cmake : fix BUILD_SHARED_LIBS=ON build ( #7784 )
...
common depends on pthreads in Linux
2024-06-07 15:15:07 +03:00
Olivier Chafik
0dba58269f
Update server-llm.sh
2024-06-07 11:52:40 +01:00
Johannes Gäßler
7027b27d76
server: update cache_prompt documentation [no ci] ( #7745 )
2024-06-07 11:15:49 +02:00
ochafik
af8f0169da
Update .gitignore
2024-06-07 10:14:03 +01:00
ochafik
7fbe6006c9
update straggling refs
2024-06-07 09:42:21 +01:00
ochafik
99df4cc091
rm accidentally checked in bins
2024-06-07 09:40:09 +01:00