Skip to content

fix(dataset): propagate system message to rejected_messages in DPO preprocessing#9611

Open
he-yufeng wants to merge 1 commit into
modelscope:mainfrom
he-yufeng:fix/dpo-system-in-rejected-messages
Open

fix(dataset): propagate system message to rejected_messages in DPO preprocessing#9611
he-yufeng wants to merge 1 commit into
modelscope:mainfrom
he-yufeng:fix/dpo-system-in-rejected-messages

Conversation

@he-yufeng

Copy link
Copy Markdown

When a DPO dataset row has both a system key and rejected_messages, the system message is never added to rejected_messages. The recursive call to preprocess only passes {'messages': row['rejected_messages']} — the system key is dropped on the way in.

DPO training requires chosen and rejected to have the same prompt context; this silently produces asymmetric encoding. Anything downstream that encodes both sides will see different conversation prefixes, which corrupts the training signal.

One-line fix: pass 'system': row.get('system') to the recursive call so the system message is forwarded.

Also adds a regression test covering the scenario.

…eprocessing

When a DPO dataset row has both a `system` key and `rejected_messages`, the
recursive preprocess() call only forwarded `{'messages': row['rejected_messages']}`,
dropping the system key. This meant system was prepended to chosen messages but
never to rejected messages, producing asymmetric conversation prefixes and
corrupting the DPO training signal.

Pass `'system': row.get('system')` to the recursive call so both sides receive
the same system prefix. Also adds a regression test.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request ensures that the system message is propagated to rejected_messages during preprocessing in swift/dataset/preprocessor/core.py, and adds a corresponding unit test to prevent regression. The reviewer pointed out a potential TypeError if the recursive preprocessing of rejected_messages returns None, suggesting a robust check to handle this case gracefully.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines 515 to +517
if 'rejected_messages' in row:
row['rejected_messages'] = MessagesPreprocessor.preprocess(
self, {'messages': row['rejected_messages']})['messages']
self, {'messages': row['rejected_messages'], 'system': row.get('system')})['messages']

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If MessagesPreprocessor.preprocess returns None (which happens if rejected_messages is empty, invalid, or cannot be repaired), attempting to access ['messages'] directly on the result will raise a TypeError: 'NoneType' object is not subscriptable.

To make this preprocessing step more robust and prevent potential crashes, we should check if the preprocessed rejected row is None and return None for the entire row to filter it out gracefully.

Suggested change
if 'rejected_messages' in row:
row['rejected_messages'] = MessagesPreprocessor.preprocess(
self, {'messages': row['rejected_messages']})['messages']
self, {'messages': row['rejected_messages'], 'system': row.get('system')})['messages']
if 'rejected_messages' in row:
rejected_row = MessagesPreprocessor.preprocess(
self, {'messages': row['rejected_messages'], 'system': row.get('system')})
if rejected_row is None:
return None
row['rejected_messages'] = rejected_row['messages']

@Jintao-Huang

Copy link
Copy Markdown
Collaborator

The system content is usually placed inside messages and rejected_messages.

@he-yufeng

Copy link
Copy Markdown
Author

You're right that in the common format the system turn is already embedded as the first message in both messages and rejected_messages, and there this change is a no-op.

The case it fixes is when the dataset carries system as a separate top-level column (the system / system_prompt mapping). For those rows preprocess pops system and prepends it to messages via to_std_messages / sharegpt_to_messages, but the recursive call for rejected_messages was built as {'messages': row['rejected_messages']} without system, so the system turn only lands on the chosen side. DPO then trains on a pair whose chosen and rejected halves have different prefixes.

The regression test builds exactly such a row: with the fix, system is first in both lists; reverting the one-line change makes it fail with 'user' != 'system' on rejected_messages[0].

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants