fix(template): preserve non-string prefix elements in _replace_system#9612
fix(template): preserve non-string prefix elements in _replace_system#9612he-yufeng wants to merge 2 commits into
Conversation
_replace_system replaced {{SYSTEM}} in string prefix elements but filtered out
non-string elements entirely. Token-ID lists such as ['bos_token_id'] are
valid prefix components used by ziya, bluelm, emu3_chat and other templates
(prefix=[['bos_token_id'], '{{SYSTEM}}']). When no system message is given,
those elements were silently dropped, causing every training/inference sequence
for those templates to start without the expected BOS token.
Change the filter comprehension to a conditional expression so that
non-strings pass through unchanged.
There was a problem hiding this comment.
Code Review
This pull request fixes a bug in _replace_system where non-string elements (such as list placeholders like ['bos_token_id']) were being filtered out from the template prefix. The method now correctly preserves non-string elements while stripping the {{SYSTEM}} placeholder from string elements. A corresponding unit test has been added to verify this behavior. I have no further feedback to provide.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
thanks! |
|
Please pass the lint checks: pip install pre-commit
pre-commit run --all-files |
_replace_systemis responsible for building the no-system variant ofprefixby replacing the{{SYSTEM}}placeholder in string elements. The current implementation uses a filter comprehension:The
if isinstance(p, str)clause silently drops every non-string element. Token-ID lists like['bos_token_id']are legitimate prefix components — templates such asziya,bluelm, andemu3_chat(and others inbaai.py) useprefix=[['bos_token_id'], '{{SYSTEM}}']. When_replace_systemruns (i.e. no system message is provided), the BOS token-ID entry disappears frommeta.prefix, so every training and inference sequence for these templates starts without the expected leading token.Fix: change the filter to a conditional expression so non-strings pass through unchanged:
Adds a focused unit test in
tests/general/test_template_meta.pythat instantiates aTemplateMetawith a mixed prefix and asserts the list element survives__post_init__.