Skip to content

Commit f160bf7

Browse files
tobocop2abetlen
andauthored
Fix: model fails to load when chat template uses HuggingFace generation tags (abetlen#2226)
* chat-format: ignore HuggingFace's {% generation %} chat-template tag HuggingFace's transformers chat-template extension adds {% generation %} and {% endgeneration %} tags so trainers can mark generation spans for loss masking. The tags ship in GGUF tokenizer.chat_template metadata (SmolLM3 et al), but jinja2's default environment doesn't recognize them, so Llama() raises TemplateSyntaxError at init for any affected GGUF, even when the caller passes an explicit chat_format override. Register a minimal Jinja extension that treats both tags as inert wrappers: the body between them renders as-is, the markers themselves emit nothing. No behavioral change for templates that don't use the tags. Prior art: PR abetlen#2082 attempted the same approach but referenced an unimported 'nodes' module and didn't consume the body or closing tag. * fix: simplify generation tag handling * refactor: rename generation tag extension --------- Co-authored-by: abetlen <abetlen@gmail.com>
1 parent 2c455a5 commit f160bf7

2 files changed

Lines changed: 12 additions & 0 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
88
## [Unreleased]
99

1010
- feat: Update llama.cpp to ggml-org/llama.cpp@d749821db
11+
- fix: model fails to load when chat template uses HuggingFace generation tags by @tobocop2 in #2226
1112
- docs: add contributing guide by @abetlen in #2229
1213
- chore: Migrate llama.cpp submodule URL to ggml-org/llama.cpp by @shalinib-ibm in #2034
1314
- fix: Enable unified KV cache for embedding contexts to preserve full per-sequence context in batch embedding calls by @SanjanaB123 in #2217

llama_cpp/llama_chat_format.py

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,7 @@
2424
)
2525

2626
import jinja2
27+
from jinja2.ext import Extension
2728
from jinja2.sandbox import ImmutableSandboxedEnvironment
2829

2930
import numpy as np
@@ -192,6 +193,15 @@ def __call__(
192193

193194

194195
class Jinja2ChatFormatter(ChatFormatter):
196+
class IgnoreGenerationTags(Extension):
197+
"""Pass-through for HuggingFace's ``{% generation %}`` chat-template tag."""
198+
199+
tags = {"generation"}
200+
201+
def parse(self, parser: jinja2.parser.Parser):
202+
parser.stream.skip(1)
203+
return parser.parse_statements(("name:endgeneration",), drop_needle=True)
204+
195205
def __init__(
196206
self,
197207
template: str,
@@ -213,6 +223,7 @@ def __init__(
213223
loader=jinja2.BaseLoader(),
214224
trim_blocks=True,
215225
lstrip_blocks=True,
226+
extensions=[Jinja2ChatFormatter.IgnoreGenerationTags],
216227
).from_string(self.template)
217228

218229
@staticmethod

0 commit comments

Comments
 (0)