HuggingFace Model Card Metadata Interoperability Consideration
Brian edited this page 2024-11-21 01:41:30 +11:00
Below is the agreed-upon mapping between GGUF KV keys and Hugging Face model card fields. It came out of a discussion with Hugging Face about coordinating on extending the handling of base model sources and dataset sources.
| GGUF KV Key | HF Model Card Field | Notes |
|---|---|---|
| general.name | model_name | Name of the model. |
| general.license | license | License identifier. |
| general.license.name | license_name | Full name of the license. |
| general.license.link | license_link | URL to the license text. |
| general.base_model.{id}.name | base_model | Simpler field: array of model IDs on the HF Hub. |
| general.base_model.{id}.name | base_model_sources[].name | Extension: detailed description of base models. |
| general.base_model.{id}.author | base_model_sources[].author | Author of the parent/base model (extension field). |
| general.base_model.{id}.version | base_model_sources[].version | Version of the parent/base model (extension field). |
| general.base_model.{id}.organization | base_model_sources[].organization | Organization responsible for the parent/base model (extension field). |
| general.base_model.{id}.description | base_model_sources[].description | Description of the parent/base model (extension field). |
| general.base_model.{id}.url | base_model_sources[].url | URL for more information about the parent/base model (extension field). |
| general.base_model.{id}.doi | base_model_sources[].doi | DOI of the parent/base model (extension field). |
| general.base_model.{id}.uuid | base_model_sources[].uuid | UUID of the parent/base model (extension field). |
| general.base_model.{id}.repo_url | base_model_sources[].repo_url | Repository URL of the parent/base model (extension field). |
| general.dataset.{id}.name | datasets | Simpler field: array of dataset IDs on the HF Hub. |
| general.dataset.{id}.name | dataset_sources[].name | Extension: detailed description of datasets. |
| general.dataset.{id}.author | dataset_sources[].author | Author of the dataset (extension field). |
| general.dataset.{id}.version | dataset_sources[].version | Version of the dataset (extension field). |
| general.dataset.{id}.organization | dataset_sources[].organization | Organization responsible for the dataset (extension field). |
| general.dataset.{id}.description | dataset_sources[].description | Description of the dataset (extension field). |
| general.dataset.{id}.url | dataset_sources[].url | URL for more information about the dataset (extension field). |
| general.dataset.{id}.doi | dataset_sources[].doi | DOI of the dataset (extension field). |
| general.dataset.{id}.uuid | dataset_sources[].uuid | UUID of the dataset (extension field). |
| general.dataset.{id}.repo_url | dataset_sources[].repo_url | Repository URL of the dataset (extension field). |
| general.tags | tags | Tags describing the model. |
| general.languages | language | Languages supported by the model. |
| general.description | Not explicitly mapped for now | Can be included in a custom "description" field in the model card. |
| general.url | Not explicitly mapped for now | General URL for further information about the model. |
| general.repo_url | Not explicitly mapped for now | Repository URL for the model. |
| general.doi | Not explicitly mapped for now | DOI of the model. |
| general.uuid | Not explicitly mapped for now | UUID of the model. |
| general.size_label | Not explicitly mapped for now | Size class of the model, such as the number of weights and experts (e.g. 8x7B). |
| general.quantized_by | Not explicitly mapped for now | Who performed the quantization. |
| general.alignment | Not explicitly mapped for now | Global byte alignment of tensor data in the GGUF file; unrelated to training-alignment objectives such as RLHF. |
| general.file_type | Not explicitly mapped for now | Enumerated type of the majority of tensors in the file (e.g. F16, or a quantization type such as Q4_0). |
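To illustrate how the table can be applied, here is a minimal Python sketch that converts flat GGUF KV metadata into model card fields. It is not llama.cpp's converter: the helper name `gguf_kv_to_model_card` is hypothetical, it assumes `{id}` is a zero-based integer index (e.g. `general.base_model.0.name`), and it assumes the metadata has already been read into a plain `dict`. Keys marked "Not explicitly mapped for now" are skipped.

```python
import re
from typing import Any

# Scalar/list GGUF keys with a direct model card counterpart.
SCALAR_MAP = {
    "general.name": "model_name",
    "general.license": "license",
    "general.license.name": "license_name",
    "general.license.link": "license_link",
    "general.tags": "tags",
    "general.languages": "language",
}

# Sub-fields allowed under general.base_model.{id}.* and general.dataset.{id}.*
SOURCE_FIELDS = {"name", "author", "version", "organization", "description",
                 "url", "doi", "uuid", "repo_url"}

# Assumption: {id} is a zero-based integer index.
INDEXED_KEY = re.compile(r"^general\.(base_model|dataset)\.(\d+)\.([a-z_]+)$")

# GGUF group -> (simple list field, detailed list-of-dicts field)
GROUP_MAP = {
    "base_model": ("base_model", "base_model_sources"),
    "dataset": ("datasets", "dataset_sources"),
}


def gguf_kv_to_model_card(kv: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: convert flat GGUF KV metadata into model card fields."""
    card: dict[str, Any] = {}
    groups: dict[str, dict[int, dict[str, Any]]] = {"base_model": {}, "dataset": {}}

    for key, value in kv.items():
        if key in SCALAR_MAP:
            card[SCALAR_MAP[key]] = value
            continue
        m = INDEXED_KEY.match(key)
        if m and m.group(3) in SOURCE_FIELDS:
            group, idx, field = m.group(1), int(m.group(2)), m.group(3)
            groups[group].setdefault(idx, {})[field] = value
        # Anything else (general.url, general.uuid, ...) is not mapped and is skipped.

    for group, entries in groups.items():
        if not entries:
            continue
        simple_field, detailed_field = GROUP_MAP[group]
        ordered = [entries[i] for i in sorted(entries)]
        card[detailed_field] = ordered
        # Per the table, the simpler list field is filled from the .name entries.
        names = [e["name"] for e in ordered if "name" in e]
        if names:
            card[simple_field] = names
    return card
```

Collecting the indexed keys by `{id}` first and sorting afterwards keeps the order of `base_model_sources` and `dataset_sources` stable even when the KV pairs are stored out of order.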
The example below shows how a model card using this mapping may look:
```yaml
# Model Card Fields
model_name: Example Model Six
# Licensing details
license: apache-2.0
license_name: Apache License Version 2.0, January 2004
license_link: https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md
# Simple Model (singular or list of Hugging Face model IDs)
base_model: stabilityai/stable-diffusion-xl-base-1.0
# Detailed Model Parents (merges, pre-tuning, etc.) (list of dicts)
base_model_sources:
  - name: GPT-3
    author: OpenAI
    version: '3.0'
    organization: OpenAI
    description: A large language model capable of performing a wide variety of language tasks.
    url: 'https://openai.com/research/gpt-3'
    doi: 10.5555/gpt3doi123456
    uuid: 123e4567-e89b-12d3-a456-426614174000
    repo_url: 'https://github.com/openai/gpt-3'
  - name: BERT
    author: Google AI Language
    version: '1.0'
    organization: Google
    description: A transformer-based model pretrained on English to achieve state-of-the-art performance on a range of NLP tasks.
    url: 'https://github.com/google-research/bert'
    doi: 10.5555/bertdoi789012
    uuid: 987e6543-e21a-43f3-a356-527614173999
    repo_url: 'https://github.com/google-research/bert'
# Simple Dataset (singular or list of Hugging Face dataset IDs)
datasets: common_voice
# Detailed Model Datasets Used (training data, etc.) (list of dicts)
dataset_sources:
  - name: Wikipedia Corpus
    author: Wikimedia Foundation
    version: '2021-06'
    organization: Wikimedia
    description: A dataset comprising the full English Wikipedia, used to train models in a range of natural language tasks.
    url: 'https://dumps.wikimedia.org/enwiki/'
    doi: 10.5555/wikidoi234567
    uuid: 234e5678-f90a-12d3-c567-426614172345
    repo_url: 'https://github.com/wikimedia/wikipedia-corpus'
  - name: Common Crawl
    author: Common Crawl Foundation
    version: '2021-04'
    organization: Common Crawl
    description: A dataset containing web-crawled data from various domains, providing a broad range of text.
    url: 'https://commoncrawl.org'
    doi: 10.5555/ccdoi345678
    uuid: 345e6789-f90b-34d5-d678-426614173456
    repo_url: 'https://github.com/commoncrawl/cc-crawl-data'
# Model Content Metadata
tags:
  - text generation
  - transformer
  - llama
  - tiny
  - tiny model
language:
  - en
```
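Going the other way, a model card like the one above can be flattened into indexed GGUF keys. The Python sketch below assumes the YAML has already been loaded into a `dict` (for example with a YAML library) and uses a hypothetical `model_card_to_gguf_kv` helper. It makes one design choice this page does not specify: when a detailed `*_sources` list is present, only that list is written; otherwise each ID in the simple field (`base_model` or `datasets`, which may be a single ID or a list) becomes a `general.{group}.{id}.name` entry, matching the "simpler field" rows of the table.

```python
from typing import Any

# Model card field -> GGUF KV key, for the directly mapped fields.
CARD_TO_GGUF = {
    "model_name": "general.name",
    "license": "general.license",
    "license_name": "general.license.name",
    "license_link": "general.license.link",
    "tags": "general.tags",
    "language": "general.languages",
}

# Sub-fields of each base_model_sources / dataset_sources entry, in output order.
SOURCE_FIELDS = ("name", "author", "version", "organization", "description",
                 "url", "doi", "uuid", "repo_url")

# (GGUF group, simple card field, detailed card field)
GROUPS = (
    ("base_model", "base_model", "base_model_sources"),
    ("dataset", "datasets", "dataset_sources"),
)


def as_list(value: Any) -> list[Any]:
    """Simple fields accept a single Hub ID or a list of IDs; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def model_card_to_gguf_kv(card: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: flatten parsed model card metadata into GGUF KV pairs."""
    kv: dict[str, Any] = {}
    for card_field, gguf_key in CARD_TO_GGUF.items():
        if card_field in card:
            kv[gguf_key] = card[card_field]

    for group, simple_field, detailed_field in GROUPS:
        # Design choice (assumption): prefer the detailed list; otherwise fall back
        # to the simple Hub IDs, each written as general.{group}.{id}.name.
        sources = card.get(detailed_field) or [
            {"name": hub_id} for hub_id in as_list(card.get(simple_field))
        ]
        for idx, source in enumerate(sources):
            for field in SOURCE_FIELDS:
                if field in source:
                    kv[f"general.{group}.{idx}.{field}"] = source[field]
    return kv
```

Preferring one list over the other avoids writing two unrelated entries under the same `{id}` when a card, like the example above, fills both the simple and the detailed field with different models.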