Updated Templates supported by llama_chat_apply_template (markdown)

2024-12-25 10:54:36 +00:00 · 2024-04-02 23:32:18 +08:00 · 42a21c9ab0
commit 42a21c9ab0
parent f29513fdfc
1 changed files with 82 additions and 60 deletions
--- a/Templates-supported-by-llama_chat_apply_template.md
+++ b/Templates-supported-by-llama_chat_apply_template.md
@@ -6,50 +6,6 @@ This is the list of templates currently supported by `llama_apply_chat_template`

 ## Supported templates

-<details>
-<summary>Python code</summary>
-
-```python
-from transformers import AutoTokenizer
-
-VARIANTS_TO_TEST = [
-    'teknium/OpenHermes-2.5-Mistral-7B',
-    'mistralai/Mistral-7B-Instruct-v0.2',
-    'TheBloke/FusionNet_34Bx2_MoE-AWQ',
-    'bofenghuang/vigogne-2-70b-chat',
-    'mlabonne/AlphaMonarch-7B',
-    'google/gemma-7b-it',
-    'OrionStarAI/Orion-14B-Chat',
-    'openbmb/MiniCPM-2B-dpo-fp32',
-]
-
-HISTORY = [
-    { 'role': 'system', 'content': 'test' },
-    { 'role': 'user', 'content': 'hello' },
-    { 'role': 'assistant', 'content': 'response' },
-    { 'role': 'user', 'content': 'again' },
-    { 'role': 'assistant', 'content': 'response' },
-]
-
-for variant in VARIANTS_TO_TEST:
-    history = [m for m in HISTORY] # copy
-    if 'Mistral' in variant or 'gemma' in variant:
-        history.pop(0) # no system prompt for mistral and gemma
-    if 'gemma' in variant:
-        # GemmaTokenizer is quite buggy, let's hard code the template here
-        GEMMA_TMLP = "{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if (message['role'] == 'assistant') %}{% set role = 'model' %}{% else %}{% set role = message['role'] %}{% endif %}{{ '<start_of_turn>' + role + '\n' + message['content'] | trim + '<end_of_turn>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"
-        print('Gemma')
-        output = AutoTokenizer.from_pretrained(VARIANTS_TO_TEST[0]).apply_chat_template(history, tokenize=False, chat_template=GEMMA_TMLP)
-        print(output)
-        print('-' * 30)
-    else:
-        print(variant)
-        tokenizer = AutoTokenizer.from_pretrained(variant)
-        print(tokenizer.apply_chat_template(history, tokenize=False))
-        print('-' * 30)
-```
-</details>
-
 ```
 Usage: ./server -m ... --chat-template chatml
 teknium/OpenHermes-2.5-Mistral-7B
@@ -126,22 +82,6 @@ Assistant: </s>response</s>Human: again
 Assistant: </s>response</s>
 ```

-Additionally, we also support zephyr template (I cannot find it on huggingface, but have seen in [this list](https://github.com/ggerganov/llama.cpp/blob/c8d847d57efdc0f9bbbf881d48c645e151b36fd8/examples/server/public/promptFormats.js) )
-
-```
-Usage: ./server -m ... --chat-template zephyr
-<|system|>
-test<|endoftext|>
-<|user|>
-hello<|endoftext|>
-<|assistant|>
-response<|endoftext|>
-<|user|>
-again<|endoftext|>
-<|assistant|>
-response<|endoftext|>
-```
-
 ```
 Usage: ./server -m ... --chat-template openchat
 openchat/openchat-3.5-0106
@@ -192,6 +132,88 @@ Another question

 ```

+Additionally, we also support zephyr template (I cannot find it on huggingface, but have seen in [this list](https://github.com/ggerganov/llama.cpp/blob/c8d847d57efdc0f9bbbf881d48c645e151b36fd8/examples/server/public/promptFormats.js) )
+
+```
+Usage: ./server -m ... --chat-template zephyr
+<|system|>
+test<|endoftext|>
+<|user|>
+hello<|endoftext|>
+<|assistant|>
+response<|endoftext|>
+<|user|>
+again<|endoftext|>
+<|assistant|>
+response<|endoftext|>
+```
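
The zephyr layout listed above can be reproduced in a few lines of Python. This is a minimal sketch rather than the llama.cpp implementation: `format_zephyr` is a hypothetical helper, and both the newline after each `<|endoftext|>` and the `<|assistant|>` generation prompt are assumptions inferred from the listing.

```python
def format_zephyr(messages, add_generation_prompt=False):
    """Render a list of {'role', 'content'} dicts in the zephyr layout.

    Assumption: every turn ends with '<|endoftext|>' followed by a newline,
    as the listing above suggests.
    """
    parts = []
    for message in messages:
        # One turn: role tag on its own line, then content and the end marker.
        parts.append(f"<|{message['role']}|>\n{message['content']}<|endoftext|>\n")
    if add_generation_prompt:
        # Assumption: an open assistant tag cues the model to respond.
        parts.append("<|assistant|>\n")
    return "".join(parts)


# Same conversation as the listing above.
HISTORY = [
    {'role': 'system', 'content': 'test'},
    {'role': 'user', 'content': 'hello'},
    {'role': 'assistant', 'content': 'response'},
    {'role': 'user', 'content': 'again'},
    {'role': 'assistant', 'content': 'response'},
]

print(format_zephyr(HISTORY), end="")
```

Calling `format_zephyr(HISTORY, add_generation_prompt=True)` would append an open `<|assistant|>` turn, which is the form needed when prompting a model for its next reply.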
+
+## How to add a new template
+
+1. Check the `chat_template` in the model's HuggingFace `tokenizer_config.json` [(example)](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/blob/main/tokenizer_config.json#L42).
+	- If there isn't one, open an issue first to discuss. Some older models actually predate chat templates and multi-turn responses and would be difficult to support.
+2. Use the following Python script to generate a test conversation.
+    <details>
+    <summary>Script</summary>
+
+	```python
+    from transformers import AutoTokenizer
+
+    VARIANTS_TO_TEST = [
+        'teknium/OpenHermes-2.5-Mistral-7B',
+        'mistralai/Mistral-7B-Instruct-v0.2',
+        'TheBloke/FusionNet_34Bx2_MoE-AWQ',
+        'bofenghuang/vigogne-2-70b-chat',
+        'mlabonne/AlphaMonarch-7B',
+        'google/gemma-7b-it',
+        'OrionStarAI/Orion-14B-Chat',
+        'openbmb/MiniCPM-2B-dpo-fp32',
+        'openchat/openchat-3.5-0106',
+        'deepseek-ai/deepseek-coder-33b-instruct',
+        # Replace with your model's HuggingFace name
+    ]
+
+    HISTORY = [
+        { 'role': 'system', 'content': 'You are a helpful assistant' },
+        { 'role': 'user', 'content': 'Hello' },
+        { 'role': 'assistant', 'content': 'Hi there' },
+        { 'role': 'user', 'content': 'Who are you' },
+        { 'role': 'assistant', 'content': '   I am an assistant   ' },
+        { 'role': 'user', 'content': 'Another question' },
+    ]
+
+    for variant in VARIANTS_TO_TEST:
+        history = [m for m in HISTORY] # copy
+        if 'Mistral' in variant or 'gemma' in variant:
+            history.pop(0) # no system prompt for mistral and gemma
+        if 'gemma' in variant:
+            # GemmaTokenizer is quite buggy, let's hard code the template here
+            GEMMA_TMPL = "{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if (message['role'] == 'assistant') %}{% set role = 'model' %}{% else %}{% set role = message['role'] %}{% endif %}{{ '<start_of_turn>' + role + '\n' + message['content'] | trim + '<end_of_turn>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"
+            print("\n----- Gemma -----")
+            # The template is passed explicitly, so the first variant's tokenizer only serves as a host
+            output = AutoTokenizer.from_pretrained(VARIANTS_TO_TEST[0]).apply_chat_template(history, tokenize=False, add_generation_prompt=True, chat_template=GEMMA_TMPL)
+            print(output)
+            print("\n[Test String]\n// google/gemma-7b-it")
+            print('"' + output.replace("\n", "\\n") + '",')
+        else:
+            print("\n----- " + variant + " -----")
+            tokenizer = AutoTokenizer.from_pretrained(variant)
+            output = tokenizer.apply_chat_template(history, tokenize=False, add_generation_prompt=True)
+            print(output)
+            print("\n[Test String]\n// " + variant)
+            print('"' + output.replace("\n", "\\n") + '",')
+	```
+	</details>
+
+3. Copy both the `chat_template` from HuggingFace and the formatted text below `[Test String]` into [tests/test-chat-template.cpp](https://github.com/ggerganov/llama.cpp/blob/master/tests/test-chat-template.cpp).
+
+4. Run `make tests/test-chat-template`. You can now use this test to verify that your template implementation is identical to the original.
+
+5. Implement your template in llama.cpp (search for `llama_chat_apply_template_internal`).
+	- When no template is specified, this function tries to detect it from the model's `chat_template` metadata, so match on a pattern that is unique to your template.
+
+6. `make` and run the test. Repeat until the output matches the original!
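
To see why step 5 asks for a unique pattern, the detection idea can be sketched in Python. This is an illustration only, not the C++ code in `llama_chat_apply_template_internal`: `MARKERS` and `detect_template` are hypothetical names, and the marker strings are assumptions chosen from the templates shown on this page.

```python
from typing import Optional

# Hypothetical marker table: (substring expected to be unique, template name).
# Assumption: each marker appears in only one family of chat templates.
MARKERS = [
    ("<|im_start|>", "chatml"),
    ("<start_of_turn>", "gemma"),
    ("<|user|>", "zephyr"),
]


def detect_template(chat_template: str) -> Optional[str]:
    """Return the first template name whose marker occurs in the Jinja source."""
    for marker, name in MARKERS:
        if marker in chat_template:
            return name
    # No marker matched: the template is not recognized.
    return None


# A fragment of the Gemma Jinja template used in the script above.
gemma_fragment = "{{ '<start_of_turn>' + role + '\\n' + message['content'] | trim }}"
print(detect_template(gemma_fragment))
print(detect_template("{{ message['content'] }}"))
```

Because the first matching marker wins, a marker that also occurs in another model's template would cause that model to be rendered with the wrong format, which is the failure a unique pattern avoids.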
+
 ## Custom chat templates

 Currently, it's not possible to use your own chat template with llama.cpp server's `/chat/completions`