llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2025-01-10 02:31:46 +00:00

Author	SHA1	Message	Date
TheNotary	0e41441fa1	moves ggml-vocab.bin into test folder where it's used. It appears this file is only used during tests as of now. Removing it from the model folder makes it more flexible for how users load their model data into the project (e.g. are they using Docker bind mounts, are they using symlinks, are they downloading models directly into this folder?). By moving this, the instructions for getting started can be safely simplified to: $ rm models/.gitkeep $ rm -r models $ ln -s /mnt/c/ai/models/LLaMA $(pwd)/models I think it's a good idea because the model files are quite large and can be useful across multiple projects, so symlinks shine in this use case without creating too much confusion for the onboardee.	2023-04-26 16:20:42 -05:00
unbounded	5f939498d5	ggml : unit test for quantization functions (#953) * Unit test for quantization functions Use the ggml_internal_get_quantize_fn function to loop through all quantization formats and run a sanity check on the result. Also add a microbenchmark that times these functions directly without running the rest of the GGML graph. * test-quantize-fns: CI fixes Fix issues uncovered in CI - need to use sizes divisible by 32*8 for loop unrolling - use intrinsic header that should work on Mac test-quantize: remove Per PR comment, subsumed by test-quantize-fns * test-quantize: fix for q8_0 intermediates	2023-04-22 12:10:39 +03:00
Stephan Walter	436e561931	all : be more strict about converting float to double (#458) * Be more strict about converting float to double * Test equivalence of round, SILU implementations Test module is commented out in CMakeLists.txt because the tests may take a long time, depending on how much the compiler optimizes. * Fix softmax in perplexity.cpp * all : prefer float over double where appropriate * perplexity : add <cmath> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-28 19:48:20 +03:00
Georgi Gerganov	a316a425d0	Overhaul the examples structure - main -> examples - utils -> examples (renamed to "common") - quantize -> examples - separate tools for "perplexity" and "embedding" Hope I didn't break something !	2023-03-25 20:26:40 +02:00
Stephan Walter	69c92298a9	Deduplicate q4 quantization functions (#383) * Deduplicate q4 quantization functions * Use const; add basic test * Re-enable quantization test * Disable AVX2 flags in CI --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-03-22 19:29:06 +02:00
Georgi Gerganov	f5a77a629b	Introduce C-style API (#370) * Major refactoring - introduce C-style API * Clean up * Add <cassert> * Add <iterator> * Add <algorithm> .... * Fix timing reporting and accumulation * Measure eval time only for single-token calls * Change llama_tokenize return meaning	2023-03-22 07:32:36 +02:00
Georgi Gerganov	eb34620aec	Add tokenizer test + revert to C++11 (#355) * Add test-tokenizer-0 to do a few tokenizations - feel free to expand * Added option to convert-pth-to-ggml.py script to dump just the vocabulary * Added ./models/ggml-vocab.bin containing just LLaMA vocab data (used for tests) * Added utility to load vocabulary file from previous point (temporary implementation) * Avoid using std::string_view and drop back to C++11 (hope I didn't break something) * Rename gpt_vocab -> llama_vocab * All CMake binaries go into ./bin/ now	2023-03-21 17:29:41 +02:00

7 Commits