Olivier Chafik
eef922e02e
sort cmake example subdirs
2024-06-08 14:09:28 +01:00
Olivier Chafik
b648243496
add/fix gbnf-validator subfolder to cmake
2024-06-08 14:07:56 +01:00
Olivier Chafik
81222f02db
prefix more cmake targets w/ llama-
2024-06-08 14:05:34 +01:00
Olivier Chafik
10650b692d
rename {main->llama}-cmake-pkg binary
2024-06-08 13:57:06 +01:00
Olivier Chafik
78bca8cb07
fix main refs
2024-06-08 13:52:03 +01:00
Olivier Chafik
ab5efbb3b6
Prefix all example bins w/ llama-
2024-06-08 13:42:01 +01:00
Olivier Chafik
23d0df5bd5
main: target name -> llama-cli
2024-06-08 12:50:35 +01:00
Olivier Chafik
fe93cc96cc
Merge remote-tracking branch 'origin/master' into bins
2024-06-08 12:04:52 +01:00
sasha0552
7a16ce7db2
server : smart slot selection using Longest Common Prefix ( #7728 )
...
* server : Smart selection of available slot using Longest Common Substring
* add usage
* remove trailing whitespaces
* Use Longest Common Prefix (LCP) instead of LCS
* Rename argument
2024-06-08 10:50:31 +03:00
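The slot-selection change above picks the server slot whose cached prompt shares the longest common prefix with the incoming request, so the most cached tokens can be reused. A minimal sketch of that idea (hypothetical helper names, not the server's actual C++ code):

```python
def common_prefix_len(a: list[int], b: list[int]) -> int:
    """Length of the shared token prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def pick_slot(slot_caches: list[list[int]], prompt: list[int]) -> int:
    """Return the index of the slot whose cached tokens share the
    longest common prefix with the new prompt (LCP heuristic)."""
    return max(range(len(slot_caches)),
               key=lambda i: common_prefix_len(slot_caches[i], prompt))
```

As the commit notes, an earlier revision used Longest Common Substring before settling on the simpler prefix comparison, which matches how KV-cache reuse actually works (only a shared prefix is reusable).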
slaren
da799b4189
vulkan : reuse parent extra for views ( #7806 )
...
* vulkan : reuse parent extra for views
* Fix validation error when multiple compute contexts are used in a graph
---------
Co-authored-by: 0cc4m <picard12@live.de>
2024-06-07 19:47:49 +02:00
Christian Zhou-Zheng
c00fad71e5
gguf-split : change binary multi-byte units to decimal ( #7803 )
2024-06-07 15:56:01 +03:00
intelmatt
27615f5ab2
cmake : fix BUILD_SHARED_LIBS=ON build ( #7784 )
...
common depends on pthreads in Linux
2024-06-07 15:15:07 +03:00
Olivier Chafik
0dba58269f
Update server-llm.sh
2024-06-07 11:52:40 +01:00
Johannes Gäßler
7027b27d76
server: update cache_prompt documentation [no ci] ( #7745 )
2024-06-07 11:15:49 +02:00
ochafik
af8f0169da
Update .gitignore
2024-06-07 10:14:03 +01:00
ochafik
7fbe6006c9
update straggling refs
2024-06-07 09:42:21 +01:00
ochafik
99df4cc091
rm accidentally checked in bins
2024-06-07 09:40:09 +01:00
woodx
a5cabd7649
server : do not get prompt in infill mode ( #7286 )
...
* avoid to get prompt in infill mode and embedding mode
* remove embedding mode
* refactor format
---------
Co-authored-by: wudexiang <wudexiang@bytedance.com>
2024-06-07 10:09:45 +03:00
pengxin99
d5c938cd77
[SYCL] fix softmax r2r result wrong issue ( #7811 )
2024-06-07 14:28:26 +08:00
slaren
c9ee7118d5
check for nans in imatrix and quantize ( #7807 )
...
* imatrix : detect nan/inf values
* quantize : check imatrix for nan/inf values
2024-06-07 09:01:29 +03:00
ochafik
fbd83131f5
Merge remote-tracking branch 'origin/master' into bins
2024-06-07 00:51:31 +01:00
ochafik
a0a7f2b031
Update build.yml
2024-06-07 00:38:05 +01:00
ochafik
8695baebc0
update more names
2024-06-07 00:21:01 +01:00
Georgi Gerganov
ee459f40f6
server : fix --threads-http arg ( #7801 )
2024-06-06 19:19:59 +03:00
Olivier Chafik
9a03341094
main/server: fix targets
2024-06-06 15:53:25 +01:00
Olivier Chafik
8b7c734473
main: update refs -> llama
...
fix examples/main ref
2024-06-06 15:44:51 +01:00
Olivier Chafik
f5f19a236f
server: simplify nix package
2024-06-06 15:44:40 +01:00
Olivier Chafik
f298cc63d2
server: update refs -> llama-server
...
gitignore llama-server
2024-06-06 15:44:40 +01:00
Olivier Chafik
849842916d
main/server: rename to llama / llama-server for consistency w/ homebrew
2024-06-06 15:28:27 +01:00

Georgi Gerganov
f83351f9a6
imatrix : migrate to gpt_params ( #7771 )
...
* imatrix : migrate to gpt_params
ggml-ci
* imatrix : add --save-frequency cli arg
* common : fix --no-ppl
2024-06-06 16:30:58 +03:00
Clint Herron
ad675e1c67
Added support for . (any character) token in grammar engine. ( #6467 )
...
* Added support for . (any character) token in grammar engine.
* Add integration tests for any-character symbol.
2024-06-06 06:08:52 -07:00
Mattheus Chediak
a143c04375
README minor fixes ( #7798 ) [no ci]
...
derievatives --> derivatives
2024-06-06 22:17:54 +10:00
Olivier Chafik
55b2d0849d
grammars: x{min,max} repetition operator ( #6640 )
...
* grammars: x{min,max} repetition operator + tweak +/*/? to avoid duplication of original over alternates
* grammars: handle `x{n}` and fix `x{n,n}`
* grammars: document new repetition operators
* grammars: uniform use of int for min & max
* grammars: refactor parser test
* grammar: parsing tests w/ natural pretty print of updated expectations
* grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all)
* grammars: improve test pretty print again
* grammars: pretty print rules and chars
* grammars: fix copy rule skipping
* grammars: disallow `a{,}` (not allowed in regexps)
* Update common/grammar-parser.cpp
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* grammars: fix copy rule skipping (again) & display of expectations
* grammars: more test cases
* grammars: update reps parsing to bring ? / * / + closer to before
* json: use new GBNF repetitions{m,n} syntax
* grammars: update performance gotchas w/ repetition advice
* Update examples/json_schema_to_grammar.py
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* Update examples/server/public/json-schema-to-grammar.mjs
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* grammars: comment on rule repetitions
* grammars: ensure unambiguous number alternatives
* grammar: nit typo switched error msgs
* grammar: nit numbering in comment
* json: update numeric rule to be unambiguous
* Apply suggestions from code review
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* Update examples/server/public/json-schema-to-grammar.mjs
Co-authored-by: Clint Herron <hanclinto@gmail.com>
* json: fix integral-part
* grammar: add repetition tests
---------
Co-authored-by: Clint Herron <hanclinto@gmail.com>
2024-06-06 10:07:06 +01:00
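The repetition-operator commit above extends GBNF with a regexp-style `x{min,max}` form alongside `?`, `*`, and `+`. A hypothetical grammar fragment illustrating the new syntax (rule names are invented for illustration):

```
# Sketch of GBNF rules using the {min,max} repetition operator.
root  ::= digit{1,3}       # between 1 and 3 digits
hexid ::= [0-9a-f]{8}      # exactly 8 hex characters
word  ::= [a-z]{2,}        # two or more letters (open-ended max)
digit ::= [0-9]
```

Note that, per the commit log, the empty form `a{,}` is deliberately disallowed, matching regexp conventions.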
Joan Fontanals
f5d7b268ec
llama : add jina v2 base code ( #7596 )
...
* feat: add changes to handle jina v2 base code
* fix: do not complicate things
* fix: fix the usage of the code model
* fix: fix comments
* fix: fix linting issues
* fix: remove ollama patches
* style : minor
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-06 10:22:41 +03:00
slaren
2d08b7fbb4
docker : build only main and server in their images ( #7782 )
...
* add openmp lib to dockerfiles
* build only main and server in their docker images
2024-06-06 08:19:49 +03:00
slaren
d67caea0d6
docker : add openmp lib ( #7780 )
2024-06-06 08:17:21 +03:00
Galunid
7672adeec7
Fix encoding in python scripts ( #7733 )
2024-06-06 03:07:24 +10:00
Johannes Gäßler
7d1a378b8f
CUDA: refactor mmq, dmmv, mmvq ( #7716 )
...
* CUDA: refactor mmq, dmmv, mmvq
* fix out-of-bounds write
* struct for qk, qr, qi
* fix cmake build
* mmq_type_traits
2024-06-05 16:53:00 +02:00
Georgi Gerganov
2b3389677a
ggml : refactor rope norm/neox ( #7634 )
...
* ggml : unify rope norm/neox (CPU)
* ggml : fix compile warning
* ggml : remove GLM rope mode
ggml-ci
* metal : better rope implementation
ggml-ci
* cuda : better rope implementation
ggml-ci
* naming : n_orig_ctx -> n_ctx_orig
ggml-ci
* dev : add reminders to update backends
ggml-ci
* vulkan : fix ggml_rope_ext() usage
* cuda : fix array size + indents
ggml-ci
2024-06-05 11:29:20 +03:00
arch-btw
9973e81c5c
readme : remove -ins ( #7759 )
...
-ins and --instruct were moved in https://github.com/ggerganov/llama.cpp/pull/7675
I have adjusted the README accordingly.
There was no trace of --chatml in the README.
2024-06-05 09:40:49 +03:00
jaime-m-p
c90dbe026b
Fix per token attributes bits ( #7749 )
2024-06-05 01:26:14 +02:00
agray3
b90dc566c1
Allow number of nodes in CUDA graph to change ( #7738 )
...
Previously the code would have failed to cope in the case that the
number of nodes changes in an existing CUDA graph. This fixes the
issue by removing an unnecessary conditional.
2024-06-04 22:06:49 +02:00
Georgi Gerganov
1442677f92
common : refactor cli arg parsing ( #7675 )
...
* common : gpt_params_parse do not print usage
* common : rework usage print (wip)
* common : valign
* common : rework print_usage
* infill : remove cfg support
* common : reorder args
* server : deduplicate parameters
ggml-ci
* common : add missing header
ggml-ci
* common : remove --random-prompt usages
ggml-ci
* examples : migrate to gpt_params
ggml-ci
* batched-bench : migrate to gpt_params
* retrieval : migrate to gpt_params
* common : change defaults for escape and n_ctx
* common : remove chatml and instruct params
ggml-ci
* common : passkey use gpt_params
2024-06-04 21:23:39 +03:00
Georgi Gerganov
554c247caf
ggml : remove OpenCL ( #7735 )
...
ggml-ci
2024-06-04 21:23:20 +03:00
Georgi Gerganov
0cd6bd3483
llama : remove beam search ( #7736 )
2024-06-04 21:23:05 +03:00
Georgi Gerganov
5ca0944a15
readme : remove obsolete Zig instructions ( #7471 )
2024-06-04 19:43:01 +03:00
slaren
adc9ff3841
llama-bench : allow using a different printer for stderr with -oe ( #7722 )
...
compare-commits.sh : hide stdout, use -oe to print markdown
2024-06-04 14:32:42 +02:00
Daniele
987d743d6b
Improve hipBLAS support in CMake ( #7696 )
...
* Improve hipBLAS support in CMake
This improves the detection of the correct CMAKE_PREFIX_PATH when using different distributions or a self-built ROCm SDK.
* Set ROCM_PATH correctly
2024-06-04 14:09:15 +02:00
zhouwg
b226c1227b
refine .gitignore ( #7688 )
...
This adds tags and android ndk into the git ignore list
2024-06-04 21:21:26 +10:00
jaime-m-p
3b38d48609
Per token attributes ( #7685 )
...
* Add per token attributes enum
* Using phi-3 for testing 'rstrip'
* Using jina-v2 for testing 'lstrip'
* Brute force test for 'lstrip' and 'rstrip'
* Implement 'rstrip' and 'lstrip'
* Update phi-3 GGUF file (obsolete since 917dc8c)
* Replace llama_token_type with llama_token_attribs
2024-06-04 09:17:17 +02:00