Mirror of https://github.com/ggerganov/llama.cpp.git, synced 2024-11-15 07:19:53 +00:00
Commit 0d2ec43833
* feat(gguf-py): Add Granite model and params to gguf-py

  Branch: GraniteLM
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat(convert_hf_to_gguf): Add registration and param setup for Granite

  Branch: GraniteLM
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat(llama.cpp): Add config parsing for Granite multiplier params

  Branch: GraniteLM
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat(llama.cpp): First pass at full port of granite deviations from llama

  Something is still not working right since the results are mostly terrible,
  but on occasion it's producing relevant results at this point, so
  _something_ is working.

  Branch: GraniteLM
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(llama.cpp): Determine granite language 3b instruct by vocab size

  Branch: GraniteLM
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(convert_hf_to_gguf): Use LlamaModel as base for GraniteModel

  The defaults in LlamaModel are needed for Granite as well.

  Branch: GraniteLM
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(llama.cpp): Switch Granite param names to use _scale for consistency

  Other scalar multipliers are called *_scale, so this provides a more
  consistent naming convention.

  Branch: GraniteLM
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(convert_hf_to_gguf/gguf-py): _multiplier -> _scale

  The transformers names with _multiplier will now be converted to the _scale
  equivalent during conversion.

  Branch: GraniteLM
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(llama.cpp): Use separate switch clause for granite in llm_load_hparams

  Branch: GraniteLM
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

---------

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
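To make the `_multiplier -> _scale` rename in the commit message concrete, here is a minimal, hypothetical Python sketch of the idea: transformers-style `*_multiplier` hyperparameter names are rewritten to the `*_scale` naming used on the GGUF side. This is not the actual convert_hf_to_gguf.py code, and the hparam names in the example are placeholders chosen for illustration.

```python
# Illustrative sketch only -- not the actual convert_hf_to_gguf.py logic.
# Rename transformers-style "*_multiplier" hparams to the "*_scale" naming
# convention described in the commit message.
def multipliers_to_scales(hparams: dict) -> dict:
    renamed = {}
    for key, value in hparams.items():
        if key.endswith("_multiplier"):
            key = key[: -len("_multiplier")] + "_scale"
        renamed[key] = value
    return renamed

# Hypothetical hparam names, used only to demonstrate the rename:
print(multipliers_to_scales({"embedding_multiplier": 12.0, "residual_multiplier": 0.22}))
# {'embedding_scale': 12.0, 'residual_scale': 0.22}
```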
Directory listing (gguf-py/gguf):

* __init__.py
* constants.py
* gguf_reader.py
* gguf_writer.py
* gguf.py
* lazy.py
* metadata.py
* py.typed
* quants.py
* tensor_mapping.py
* utility.py
* vocab.py
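These files make up the `gguf` Python package that the commit extends. As a rough usage sketch of the package's public `GGUFWriter` class (the architecture name, metadata value, and tensor below are made-up placeholders, not taken from a real model), writing a GGUF file looks roughly like this:

```python
import numpy as np
from gguf import GGUFWriter

# Rough usage sketch of the gguf package; values are placeholders.
writer = GGUFWriter("example.gguf", "llama")
writer.add_block_count(12)                                # example metadata key/value
writer.add_tensor("tensor1", np.ones(32, dtype=np.float32))  # example tensor

# Serialize header, key/value metadata, and tensor data to disk.
writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
```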