llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-12 19:50:17 +00:00

compilade a1631e53f6 llama : simplify Mamba with advanced batch splits (#8526)	2024-08-21 17:58:11 -04:00

* llama : advanced batch splits

  This includes equal-sequence-length batch splits which are useful to simplify recurrent model operators.

* llama : always make recurrent state slots contiguous

* ggml : simplify mamba operators

* llama : fix integer signedness mixing

* llama : logits_all has priority over batch->logits

  Otherwise, the server embeddings tests failed. This was likely an existing problem but was only detected here because of an additional assertion.

* llama : apply suggestions

  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* llama : fix t5 segfault

* llama : fix Mamba session save and restore

* llama : minor cosmetic changes

* llama : rename llama_reorder_outputs to llama_output_reorder

  Also move it closer to llama_output_reserve.

* llama : fix pooled embeddings when using batches with equal_seqs

* minor : add struct members for clarity

  ggml-ci

* llama : fix T5 segfault again

* llama : fix Mamba pooled embeddings with multiple sequences

  Until the pooled embeddings are refactored to allow splitting across ubatches for causal embeddings, recurrent models can only process a single sequence per ubatch when calculating pooled embeddings.

* llama : add llama_model_is_recurrent to simplify figuring that out

  This will make it easier to more cleanly support RWKV-v6 and Mamba-2.

* llama : fix simple splits when the batch contains embeddings

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
CMakeLists.txt	llama : move vocab, grammar and sampling into separate files (#8508)	2024-07-23 13:10:17 +03:00
llama-grammar.cpp	ggml : reduce hash table reset cost (#8698)	2024-07-27 04:41:55 +02:00
llama-grammar.h	llama : fix build + fix fabs compile warnings (#8683)	2024-07-25 19:57:31 +03:00
llama-impl.h	llama : better replace_all (cont) (#8926)	2024-08-09 18:23:52 +03:00
llama-sampling.cpp	Fix a spelling mistake (#9001)	2024-08-12 11:46:03 +02:00
llama-sampling.h	llama : move vocab, grammar and sampling into separate files (#8508)	2024-07-23 13:10:17 +03:00
llama-vocab.cpp	llama : std::move llm_bigram_bpe from work_queue (#9062)	2024-08-21 10:32:58 +03:00
llama-vocab.h	common : remove duplicate function llama_should_add_bos_token (#8778)	2024-08-15 10:23:23 +03:00
llama.cpp	llama : simplify Mamba with advanced batch splits (#8526)	2024-08-21 17:58:11 -04:00
unicode-data.cpp	Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258)	2024-07-02 12:18:10 -04:00
unicode-data.h	llama : reorganize source code + improve CMake (#8006)	2024-06-26 18:33:02 +03:00
unicode.cpp	llama : move vocab, grammar and sampling into separate files (#8508)	2024-07-23 13:10:17 +03:00
unicode.h	llama : move vocab, grammar and sampling into separate files (#8508)	2024-07-23 13:10:17 +03:00