llama.cpp

mirror of https://github.com/ggerganov/llama.cpp.git synced 2024-12-30 21:34:36 +00:00

Author	SHA1	Message	Date
Kerfuffle	34b0a08207	gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981) * gguf-py: Refactor and add file reading support * Replay changes from #3871 Credit to @cebtenzzre for that pull * Various type annotation fixes. * sort imports with isort (again) * Fix missing return statement in add_tensor * style cleanup with flake8 * fix NamedTuple and Enum usage * Fix an issue with state init in GGUFReader Move examples to an examples/ directory Clean up examples Add an example of modifying keys in a GGUF file Update documentation with info on examples Try to support people importing gguf/gguf.py directly * Damagage is not a word. * Clean up gguf-py/examples/modify_gguf.py whitespace Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update gguf-py/examples/modify_gguf.py formatting Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update gguf-py/gguf/gguf_reader.py type hint Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Make examples executable, formatting changes * Add more information to GGUFReader and examples comments * Include a gguf Python package version bump * Add convert-gguf-endian.py script * cleanup * gguf-py : bump minor version * Reorganize scripts * Make GGUFReader endian detection less arbitrary * Add JSON dumping support to gguf-dump.py Which I kind of regret now * A few for gguf-dump.py cleanups * Murder accidental tuple in gguf-py/scripts/gguf-dump.py Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * cleanup * constants : remove unneeded type annotations * fix python 3.8 compat * Set up gguf- scripts in pyproject.toml * And include scripts/__init__.py, derp * convert.py: We can't currently support Q8_0 on big endian. * gguf-py: SpecialVocab: Always try available sources for special token ids gguf-py: SpecialVocab: Try to load merges from merges.txt if not in tokenizer.json gguf-py: SpecialVocab: Add 'add_bos_token' type bools to GGUF metadata u * cleanup * Promote add_X_token to GGUF metadata for BOS and EOS --------- Co-authored-by: Jared Van Bortel <jared@nomic.ai> Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>	2023-11-11 08:04:50 +03:00
Jhen-Jie Hong	4a4fd3eefa	server : allow continue edit on completion mode (#3950) * server : allow continue edit on completion mode * server : handle abort case in runCompletion * server : style improvement	2023-11-10 16:49:33 -06:00
Galunid	df9d1293de	Unbreak persimmon after #3837 (#4010)	2023-11-10 14:24:54 +01:00
Galunid	a75fa576ab	scripts: Generalize convert scripts (#3838) * Replace convert-*-hf-to-gguf.py files with convert-hf-to-gguf.py	2023-11-09 11:09:29 +01:00
Mihai	57ad015dc3	server : add min_p param (#3877) * Update server.cpp with min_p after it was introduced in https://github.com/ggerganov/llama.cpp/pull/3841 * Use spaces instead of tabs * Update index.html.hpp after running deps.sh * Fix test - fix line ending	2023-11-08 20:00:34 -06:00
Jared Van Bortel	af00cca08e	Merge commit 'ec893798b7a2a803466cc8f063051499ec3d96f7' into HEAD	2023-11-08 16:36:00 -05:00
Jared Van Bortel	c438c16896	fix build with external fmtlib (v10) Co-authored-by: ToKiNoBug <tokinobug@163.com>	2023-11-08 16:31:29 -05:00
Jared Van Bortel	a8cac53207	kompute : fix issues with debug layers	2023-11-08 16:31:29 -05:00
slaren	875fb42871	ggml-alloc : fix backend assignments of views (#3982)	2023-11-08 13:15:14 +01:00
Jared Van Bortel	0a7c980b6f	gguf : track writer state, free unneeded tensors, cleanup (#3871)	2023-11-07 12:43:04 -05:00
Georgi Gerganov	413503d4b9	make : do not add linker flags when compiling static llava lib (#3977)	2023-11-07 20:25:32 +03:00
xaedes	e9c1cecb9d	ggml : fix backward rope after YaRN (#3974) * fix backward process of rope rope backward process was broken after YaRN RoPE (#2268) implementation, due to missing changes in backward functions. the code for the backward process is nearly identical to the forward process: the only difference is the sign of the sin-values. to avoid future regressions remove the near-duplicate backward functions and reuse the forward code: for this a new function argument `bool forward` was added to `ggml_compute_forward_rope_f32` and `ggml_compute_forward_rope_f16`. the sin-values will be negated when forward is false. * fix finetune rope call to use correct default attn_factor of 1.0f * remove unused `ggml_rope_xpos_back` it is better to have only one `ggml_rope_back` function that accepts all rope parameters, so that `ggml_compute_backward` can propagate all parameters without having to switch between different rope_back variants. * fix comments explaining the sinus sign in ggml_forward_rope * add missing function arguments in declaration * fix function argument type in declaration	2023-11-07 10:04:51 +02:00
Matthew Tejo	54b4df8886	Use params when loading models in llava-cli (#3976) llava-cli was loading models with default params and ignoring settings from the cli. This switches to a generic function to load the params from the cli options.	2023-11-07 10:43:59 +03:00
Meng Zhang	46876d2a2c	cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946) * prototyping the idea that supports running on CPU for a GGML_USE_CUBLAS=on build * doc: add comments to ggml_cublas_loaded() * fix defined(...)	2023-11-07 08:49:08 +02:00
Damian Stewart	381efbf480	llava : expose as a shared library for downstream projects (#3613) * wip llava python bindings compatibility * add external llava API * add base64 in-prompt image support * wip refactor image loading * refactor image load out of llava init * cleanup * further cleanup; move llava-cli into its own file and rename * move base64.hpp into common/ * collapse clip and llava libraries * move llava into its own subdir * wip * fix bug where base64 string was not removed from the prompt * get libllava to output in the right place * expose llava methods in libllama.dylib * cleanup memory usage around clip_image_* * cleanup and refactor again * update headerdoc * build with cmake, not tested (WIP) * Editorconfig * Editorconfig * Build with make * Build with make * Fix cyclical depts on Windows * attempt to fix build on Windows * attempt to fix build on Windows * Upd TODOs * attempt to fix build on Windows+CUDA * Revert changes in cmake * Fix according to review comments * Support building as a shared library * address review comments --------- Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com> Co-authored-by: Jared Van Bortel <jared@nomic.ai>	2023-11-07 00:36:23 +03:00
slaren	2833a6f63c	ggml-cuda : fix f16 mul mat (#3961) * ggml-cuda : fix f16 mul mat ggml-ci * silence common.cpp warning (bonus)	2023-11-05 18:45:16 +01:00
Kerfuffle	d9ccce2e33	Allow common process_escapes to handle \x sequences (#3928) * Allow common process_escapes to handle \x sequences * Fix edge case when second hex digit is NUL	2023-11-05 10:06:06 -07:00
Thái Hoàng Tâm	bb60fd0bf6	server : fix typo for --alias shortcut from -m to -a (#3958)	2023-11-05 18:15:27 +02:00
Jared Van Bortel	132d25b8a6	cuda : fix disabling device with --tensor-split 1,0 (#3951) Co-authored-by: slaren <slarengh@gmail.com>	2023-11-05 10:08:57 -05:00
Meng Zhang	3d48f42efc	llama : mark LLM_ARCH_STARCODER as full offload supported (#3945) as done in https://github.com/ggerganov/llama.cpp/pull/3827	2023-11-05 14:40:08 +02:00
Eve	c41ea36eaa	cmake : MSVC instruction detection (fixed up #809) (#3923) * Add detection code for avx * Only check hardware when option is ON * Modify per code review suggestions * Build locally will detect CPU * Fixes CMake style to use lowercase like everywhere else * cleanup * fix merge * linux/gcc version for testing * msvc combines avx2 and fma into /arch:AVX2 so check for both * cleanup * msvc only version * style * Update FindSIMD.cmake --------- Co-authored-by: Howard Su <howard0su@gmail.com> Co-authored-by: Jeremy Dunn <jeremydunn123@gmail.com>	2023-11-05 10:03:09 +02:00
Eve	a7fac013cf	ci : use intel sde when ci cpu doesn't support avx512 (#3949)	2023-11-05 09:46:44 +02:00
slaren	48ade94538	cuda : revert CUDA pool stuff (#3944) * Revert "cuda : add ROCM aliases for CUDA pool stuff (#3918)" This reverts commit `629f917cd6`. * Revert "cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903)" This reverts commit `d6069051de`. ggml-ci	2023-11-05 09:12:13 +02:00
Kerfuffle	f28af0d81a	gguf-py: Support 01.AI Yi models (#3943)	2023-11-04 16:20:34 -06:00
cebtenzzre	f88b198885	llama : fix Vulkan whitelist (#11)	2023-11-03 17:22:22 -04:00
Adam Treat	ffd0624be2	Remove this debug code.	2023-11-03 17:22:22 -04:00
Adam Treat	a5eb001eab	Revert the prompt processing on gpu for now. Fixes issues #1580 and #1581	2023-11-03 17:22:22 -04:00
Adam Treat	e006d377dd	Scale the workgroup count down to allow correct generation for falcon with AMD radeon cards with lower workgroup count limit Partially fixes #1581	2023-11-03 17:22:22 -04:00
cebtenzzre	89b71278ff	llama : decide to disable Vulkan before loading tensors (#7)	2023-11-03 17:22:22 -04:00
cebtenzzre	1c17010188	vulkan : fix missing break in matmul selection (#9)	2023-11-03 17:22:22 -04:00
Adam Treat	74ddf0f17d	Fix synchronization problem for AMD Radeon with amdvlk driver or windows drivers. Does not have any performance or fidelity effect on other gpu/driver combos I've tested. FIXES: https://github.com/nomic-ai/gpt4all/issues/1507	2023-11-03 17:22:22 -04:00
Adam Treat	8d9efbf97a	Lower the workgroup count for some shaders by providing a loop that processes four floats at a time.	2023-11-03 17:22:22 -04:00
Adam Treat	752f7ebd61	Remove unused push constant that was giving validation errors.	2023-11-03 17:22:22 -04:00
Adam Treat	8400015337	Don't try an allocation on a heap that is smaller than the size we require.	2023-11-03 17:22:22 -04:00
cebtenzzre	cbc0d1af79	kompute : make scripts executable	2023-11-03 17:22:22 -04:00
cebtenzzre	21841d3163	kompute : enable kp_logger and make it static (#8)	2023-11-03 17:22:22 -04:00
Aaron Miller	cc05a602d6	use matvec shaders for matmat I wrote the matmat shaders from scratch so I understand them better but they are currently not faster than just multiply-invoking the matvec shaders, by a significant degree - so, except for f32 which needed a new shader, revert to the m*v ones here.	2023-11-03 17:22:22 -04:00
Aaron Miller	c1fd64548d	attempted speedups 2	2023-11-03 17:22:22 -04:00
Aaron Miller	9bc52ebae3	attempted speedups	2023-11-03 17:22:22 -04:00
Aaron Miller	8dc79ac380	clean up vulkan/cpu switch	2023-11-03 17:22:22 -04:00
Aaron Miller	cd0257ed0d	q4_1 mat*mat	2023-11-03 17:22:22 -04:00
Aaron Miller	4809890d80	rm commented dbg print	2023-11-03 17:22:22 -04:00
Aaron Miller	b78a94bc6d	q6k mm works	2023-11-03 17:22:22 -04:00
Aaron Miller	d5741c07a5	use op param epsilon for norms	2023-11-03 17:22:22 -04:00
Aaron Miller	3327d84a7f	perf: use bigger threadgroups in mm	2023-11-03 17:22:22 -04:00
Aaron Miller	46385ee0d5	misc vulkan cleanup make pushconts consistent w/ dispatch, avoid a double free	2023-11-03 17:22:22 -04:00
Aaron Miller	f0cd38b9ad	add mat*mat ops	2023-11-03 17:22:22 -04:00
Adam Treat	09d83f0401	Delete TODO now that we have q8_0.	2023-11-03 17:22:22 -04:00
Aaron Miller	8564f79036	falcon h2d + reenable vulkan	2023-11-03 17:22:22 -04:00
Aaron Miller	020b1745a0	vulkan: implement neox mode for rope	2023-11-03 17:22:21 -04:00
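
The message of commit e9c1cecb9d in the list above describes the RoPE backward pass as the forward pass with the sign of the sin values flipped. The following is a minimal Python sketch of that idea on a single pair of values. The `rope_pair` helper, its arguments, and the plain 2D rotation are illustrative assumptions; this is not ggml's `ggml_compute_forward_rope_f32` code and it omits YaRN scaling entirely.

```python
import math


def rope_pair(x0, x1, theta, forward=True):
    """Rotate the pair (x0, x1) by angle theta.

    Hypothetical helper: when forward is False the sin value is negated,
    which rotates by -theta (the transpose of the forward rotation).
    """
    cos_t = math.cos(theta)
    sin_t = math.sin(theta)
    if not forward:
        sin_t = -sin_t  # the only difference between the two directions
    return (x0 * cos_t - x1 * sin_t, x0 * sin_t + x1 * cos_t)


# Rotating forward and then "backward" with the same angle restores the input,
# because a rotation's transpose is also its inverse.
y0, y1 = rope_pair(1.0, 2.0, 0.3, forward=True)
g0, g1 = rope_pair(y0, y1, 0.3, forward=False)
```

Sharing one code path behind a direction flag, as the commit message describes, means a later change to the rotation only has to be made in one place.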

1764 Commits