Commit Graph

1705 Commits

Author SHA1 Message Date
slaren
f4d973cecb
convert.py : fix llama/llama2 conversion due to vocab_size=-1 (#4258) 2023-11-30 23:42:23 +02:00
tarcey
954e22858c
llama : fix typical sampling (#4261)
Typical sampling was broken because after copying new_candidates into canditates, the "sorted" bool is left at "true", but the new data is no longer sorted according to probability. Patch to set "sorted" to false.

Test: Generating with temp=0.0001 (approx. argmax)  should generate the same sequence at typical>=1.0 and typical=0.9999 (approx. disabled, but enters the typical sampling codepath).
2023-11-30 23:40:23 +02:00
rhjdvsgsgks
e2bd725f4b
py : fix oai proxy (#3972)
* fix oai proxy

fix generation not stoped while bot stop talking in chat mode

fix possible `slot_id` not exist

response for cors (and pre flight)

* oai proxy: workaround for some client (such as Chatbox)

* use stop as separator to replace hardcoded `\n`
2023-11-30 22:50:40 +02:00
Georgi Gerganov
1f5cd83275
examples : add readme files 2023-11-29 11:00:17 +02:00
Peter Sugihara
4fea3420ee
readme : add FreeChat (#4248) 2023-11-29 09:16:34 +02:00
Jared Van Bortel
64e64aa255
ggml : restore abort() in GGML_ASSERT (#4242) 2023-11-28 11:51:11 +02:00
Georgi Gerganov
8406b0924b
ggml : re-enable BLAS for CPU when src0 != F32 + remove redundant full offload checks in llama.cpp (#4240)
* ggml : use blas even if src0 is not F32

* llama : use n_threads_batch only when n_tokens >= 32

ggml-ci

* llama : revert n_threads_batch logic

ggml-ci
2023-11-28 10:32:03 +02:00
bandoti
b38a16dfcf
cmake : fix issue with version info not getting baked into LlamaConfig.cmake (#3970)
* Split CPP generation from build-info query

* Remove blank lines

* Add BUILD_SHARED_LIBS option
2023-11-27 21:25:42 +02:00
Kasumi
0dab8cd7cc
readme : add Amica to UI list (#4230) 2023-11-27 19:39:42 +02:00
Bailey Chittle
bb03290c17
examples : iOS example with swift ui (#4159)
* copy to llama.cpp as subdir

* attempt enabling metal, fails

* ggml metal compiles!

* Update README.md

* initial conversion to new format, utf8 errors?

* bug fixes, but now has an invalid memory access :(

* added O3, now has insufficient memory access

* begin sync with master

* update to match latest code, new errors

* fixed it!

* fix for loop conditionals, increase result size

* fix current workflow errors

* attempt a llama.swiftui workflow

* Update .github/workflows/build.yml

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-27 16:56:52 +02:00
Jared Van Bortel
f3b269813f
ggml : fix -Warray-bounds warning with gcc (#4231) 2023-11-26 22:58:43 -05:00
Georgi Gerganov
3e73d31d9c
lookahead : support -n -1 infinite generation 2023-11-26 21:52:23 +02:00
Georgi Gerganov
9656026b53
readme : update hot topics 2023-11-26 20:42:51 +02:00
Georgi Gerganov
922754a8d6
lookahead : add example for lookahead decoding (#4207)
* lookahead : init

* lookahead : generate and store n-grams

* lookahead : use loop instead recursion to generate n-grams

* lookahead : initial working implementation

* lookahead : filter repeating n-grams

* lookahead : use deterministic init

* lookahead : add to Makefile

* lookahead : fix a bug in the seq_id of the lookahead tokens

* lookahead : add comments

---------

Co-authored-by: slaren <slarengh@gmail.com>
2023-11-26 20:33:07 +02:00
Xiao-Yong Jin
22da05536f
metal : fix yarn (#4220)
get the correct n_orig_ctx in metal
2023-11-26 10:30:02 +02:00
Galunid
1ddb52ec38
scripts : Use mmap in torch load (#4202)
* Use mmap in torch load, prefer .bin files when loading

* Revert .bin > .safetensors preference
2023-11-25 22:45:02 +01:00
Marcus Dunn
f837c3a992
llama : grammar reserve space in decode_utf8 (#4210)
* reserve space for codepoints

* improvement for the appended 0
2023-11-25 18:58:23 +02:00
crasm
3014b5415d
Update docs for yarn_ext_factor <0.0 as unspecified instead of NaN (#4189) 2023-11-25 10:47:07 -05:00
Georgi Gerganov
04814e718e
readme : update hot topics 2023-11-25 12:02:13 +02:00
Georgi Gerganov
af19d35734
server : OAI API compatibility (#4198)
* Add openai-compatible POST /v1/chat/completions API endpoint to server example

* fix code style

* Update server README.md

* Improve server README.md

* Fix server.cpp code style according to review

* server : some style changes

* server : indentation

* server : enable special tokens during tokenization by default

* server : minor code style

* server : change random string generator

* straightforward /v1/models endpoint

---------

Co-authored-by: kir-gadjello <111190790+kir-gadjello@users.noreply.github.com>
Co-authored-by: Tobi Lütke <tobi@Tobis-MacBook-Pro.local>
2023-11-25 11:29:06 +02:00
slaren
e9c13ff781
llama : set metal log callback correctly (#4204) 2023-11-24 18:10:01 +01:00
slaren
8a052c131e
ggml-cuda : support stablelm rope (#4156)
* ggml-cuda : support stablelm rope

* remove unused freq_base kernel parameter

* add n_dims parameter to llm_build_k_shift, default to n_rot via overload

* llama : fix llm_build_k_shift args

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-24 18:04:31 +01:00
Galunid
189d68446e
convert : fix tensors using grad in some models (#4173) 2023-11-24 15:02:49 +01:00
eastriver
2568a4bf54
main.swift : fix eos checking (#4197)
llama_token_eos(const struct llama_model *) is currently getting struct llama_context type variable context as a parameter.
2023-11-24 11:25:10 +02:00
Aaryaman Vasishta
b35f3d0def
readme : use PATH for Windows ROCm (#4195)
* Update README.md to use PATH for Windows ROCm

* Update README.md

* Update README.md
2023-11-24 09:52:39 +02:00
Jared Van Bortel
9ae88baf38 Merge remote-tracking branch 'upstream/master' into nomic-vulkan-redo 2023-11-23 17:22:09 -05:00
Jared Van Bortel
a4bb9c5ced vulkan : sync with "migrate to dynamic graphs" 2023-11-23 17:22:09 -05:00
Jared Van Bortel
23f6d51f68 Merge commit '4760e7cc0b68570d58f55e8dda469805d1759d0d' into nomic-vulkan 2023-11-23 17:22:09 -05:00
Jared Van Bortel
208cd52f7d vulkan : implement YaRN RoPE scaling (#2268)
The NeoX cur_rot part is different because I'm pretty sure my original
implementation was wrong.
2023-11-23 17:22:09 -05:00
Jared Van Bortel
1829f1d7be Merge commit '4760e7cc0b68570d58f55e8dda469805d1759d0d~' into nomic-vulkan 2023-11-23 17:22:08 -05:00
Jared Van Bortel
02c3309f6d merge fixup (e16b9fa4ba) 2023-11-23 17:22:05 -05:00
Jared Van Bortel
9c4dfd06e8 mention skipped change 2023-11-23 17:22:05 -05:00
Jared Van Bortel
fe26e6adff Merge commit 'e16b9fa4baa8a09c6619b116159830e898050942' into nomic-vulkan 2023-11-23 17:22:04 -05:00
Jared Van Bortel
6474fc879a vulkan : handle ggml_scale for n%8 != 0
ref ggerganov/llama.cpp#3754
2023-11-23 17:22:00 -05:00
Jared Van Bortel
2a41ba7258 Merge commit '469c9addef75893e6be12edda852d12e840bf064' into nomic-vulkan 2023-11-23 17:22:00 -05:00
Jared Van Bortel
a934b2cb8a vulkan : assert various kernel requirements 2023-11-23 17:22:00 -05:00
Jared Van Bortel
f194e1b6a6 Merge commit 'fcca0a700487999d52a525c96d6661e9f6a8703a' into nomic-vulkan 2023-11-23 17:21:59 -05:00
Jared Van Bortel
39abedd1d7 vulkan : optimize workgroup sizes 2023-11-23 17:18:48 -05:00
Jared Van Bortel
84f7fc4553 vulkan : rope n_past is now KQ_pos, f16 rope kernel 2023-11-23 17:18:42 -05:00
Jared Van Bortel
71565eb0c3 vulkan : replace ggml_diag_mask_inf with ggml_add (custom -inf mask) 2023-11-23 17:18:27 -05:00
Haohui Mai
55978ce09b
Fix incorrect format strings and uninitialized variables. (#4133)
* Fix incorrect format strings and uninitialized variables.

* Address comments

* Add the missing include statement
2023-11-23 22:56:53 +01:00
Georgi Gerganov
6b0a7420d0
llama : KV cache view API + better KV cache management (#4170)
* llama : keep track of used KV cells + better KV cache management

* llama : zero KV cache used upon clear

ggml-ci

* llama : allow exporting a view of the KV cache (#4180)

* Allow exporting a view of the KV cache

* Allow dumping the sequences per cell in common

* Track max contiguous cells value and position as well

* Fix max contiguous empty cells index calculation

Make dump functions deal with lengths or sequences counts > 10 better

* Fix off by one error in dump_kv_cache_view

* Add doc comments for KV cache view functions

Eliminate cell sequence struct; use llama_seq_id directly

Minor cleanups

* common : add -dkvc arg for enabling kv cache dumps

---------

Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
2023-11-23 19:07:56 +02:00
Georgi Gerganov
d103d935c0
readme : update hot topics 2023-11-23 13:51:22 +02:00
Daniel Bevenius
9d5949f04b
examples : fix typo in parallel example doc comment (#4181)
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2023-11-23 13:34:20 +02:00
Georgi Gerganov
ff8238f71d
docs : add llama-star arch idea 2023-11-23 11:35:04 +02:00
Galunid
8e672efe63
stablelm : simplify + speedup generation (#4153) 2023-11-21 16:22:30 +01:00
Galunid
0b871f1a04
finetune - update readme to mention llama support only (#4148) 2023-11-20 19:30:00 +01:00
Aaryaman Vasishta
dfc7cd48b1
readme : update ROCm Windows instructions (#4122)
* Update README.md

* Update README.md

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

---------

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2023-11-20 17:02:46 +02:00
Seb C
881800d1f0
main : Add ChatML functionality to main example (#4046)
Co-authored-by: Sebastian Cramond <sebby37@users.noreply.github.com>
2023-11-20 14:56:59 +01:00
Galunid
f23c0359a3
ci : add flake8 to github actions (python linting) (#4129)
Disabled rules:

* E203 Whitespace before ':' - disabled because we often use 'C' Style where values are aligned

* E211 Whitespace before '(' (E211) - disabled because we often use 'C' Style where values are aligned

* E221 Multiple spaces before operator - disabled because we often use 'C' Style where values are aligned

* E225 Missing whitespace around operator - disabled because it's broken so often it seems like a standard

* E231 Missing whitespace after ',', ';', or ':' - disabled because we often use 'C' Style where values are aligned

* E241 Multiple spaces after ',' - disabled because we often use 'C' Style where values are aligned

* E251 Unexpected spaces around keyword / parameter equals - disabled because it's broken so often it seems like a standard

* E261 At least two spaces before inline comment - disabled because it's broken so often it seems like a standard

* E266 Too many leading '#' for block comment - sometimes used as "section" separator

* E501 Line too long - disabled because it's broken so often it seems like a standard

* E701 Multiple statements on one line (colon) - broken only in convert.py when defining abstract methods (we can use# noqa instead)

* E704 Multiple statements on one line - broken only in convert.py when defining abstract methods (we can use# noqa instead)
2023-11-20 11:35:47 +01:00