Skip to content

fix(template): preserve non-string prefix elements in _replace_system#9612

Open
he-yufeng wants to merge 2 commits into
modelscope:mainfrom
he-yufeng:fix/replace-system-preserves-token-lists
Open

fix(template): preserve non-string prefix elements in _replace_system#9612
he-yufeng wants to merge 2 commits into
modelscope:mainfrom
he-yufeng:fix/replace-system-preserves-token-lists

Conversation

@he-yufeng

Copy link
Copy Markdown

_replace_system is responsible for building the no-system variant of prefix by replacing the {{SYSTEM}} placeholder in string elements. The current implementation uses a filter comprehension:

# before
return [p.replace('{{SYSTEM}}', '') for p in prefix if isinstance(p, str)]

The if isinstance(p, str) clause silently drops every non-string element. Token-ID lists like ['bos_token_id'] are legitimate prefix components — templates such as ziya, bluelm, and emu3_chat (and others in baai.py) use prefix=[['bos_token_id'], '{{SYSTEM}}']. When _replace_system runs (i.e. no system message is provided), the BOS token-ID entry disappears from meta.prefix, so every training and inference sequence for these templates starts without the expected leading token.

Fix: change the filter to a conditional expression so non-strings pass through unchanged:

# after
return [p.replace('{{SYSTEM}}', '') if isinstance(p, str) else p for p in prefix]

Adds a focused unit test in tests/general/test_template_meta.py that instantiates a TemplateMeta with a mixed prefix and asserts the list element survives __post_init__.

_replace_system replaced {{SYSTEM}} in string prefix elements but filtered out
non-string elements entirely.  Token-ID lists such as ['bos_token_id'] are
valid prefix components used by ziya, bluelm, emu3_chat and other templates
(prefix=[['bos_token_id'], '{{SYSTEM}}']).  When no system message is given,
those elements were silently dropped, causing every training/inference sequence
for those templates to start without the expected BOS token.

Change the filter comprehension to a conditional expression so that
non-strings pass through unchanged.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes a bug in _replace_system where non-string elements (such as list placeholders like ['bos_token_id']) were being filtered out from the template prefix. The method now correctly preserves non-string elements while stripping the {{SYSTEM}} placeholder from string elements. A corresponding unit test has been added to verify this behavior. I have no further feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@Jintao-Huang

Copy link
Copy Markdown
Collaborator

thanks!

@Jintao-Huang

Copy link
Copy Markdown
Collaborator

Please pass the lint checks:

pip install pre-commit
pre-commit run --all-files

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants