fix(dataset): propagate system message to rejected_messages in DPO preprocessing by he-yufeng · Pull Request #9611 · modelscope/ms-swift

he-yufeng · 2026-06-22T09:30:42Z

When a DPO dataset row has both a system key and rejected_messages, the system message is never added to rejected_messages. The recursive call to preprocess only passes {'messages': row['rejected_messages']} — the system key is dropped on the way in.

DPO training requires chosen and rejected to have the same prompt context; this silently produces asymmetric encoding. Anything downstream that encodes both sides will see different conversation prefixes, which corrupts the training signal.

One-line fix: pass 'system': row.get('system') to the recursive call so the system message is forwarded.

Also adds a regression test covering the scenario.

…eprocessing When a DPO dataset row has both a `system` key and `rejected_messages`, the recursive preprocess() call only forwarded `{'messages': row['rejected_messages']}`, dropping the system key. This meant system was prepended to chosen messages but never to rejected messages, producing asymmetric conversation prefixes and corrupting the DPO training signal. Pass `'system': row.get('system')` to the recursive call so both sides receive the same system prefix. Also adds a regression test.

gemini-code-assist

Code Review

This pull request ensures that the system message is propagated to rejected_messages during preprocessing in swift/dataset/preprocessor/core.py, and adds a corresponding unit test to prevent regression. The reviewer pointed out a potential TypeError if the recursive preprocessing of rejected_messages returns None, suggesting a robust check to handle this case gracefully.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-22T09:31:18Z

        if 'rejected_messages' in row:
            row['rejected_messages'] = MessagesPreprocessor.preprocess(
-                self, {'messages': row['rejected_messages']})['messages']
+                self, {'messages': row['rejected_messages'], 'system': row.get('system')})['messages']


If MessagesPreprocessor.preprocess returns None (which happens if rejected_messages is empty, invalid, or cannot be repaired), attempting to access ['messages'] directly on the result will raise a TypeError: 'NoneType' object is not subscriptable.

To make this preprocessing step more robust and prevent potential crashes, we should check if the preprocessed rejected row is None and return None for the entire row to filter it out gracefully.

Suggested change

if 'rejected_messages' in row:

row['rejected_messages'] = MessagesPreprocessor.preprocess(

self, {'messages': row['rejected_messages']})['messages']

self, {'messages': row['rejected_messages'], 'system': row.get('system')})['messages']

if 'rejected_messages' in row:

rejected_row = MessagesPreprocessor.preprocess(

self, {'messages': row['rejected_messages'], 'system': row.get('system')})

if rejected_row is None:

return None

row['rejected_messages'] = rejected_row['messages']

Jintao-Huang · 2026-06-24T02:39:16Z

The system content is usually placed inside messages and rejected_messages.

he-yufeng · 2026-06-24T12:08:10Z

You're right that in the common format the system turn is already embedded as the first message in both messages and rejected_messages, and there this change is a no-op.

The case it fixes is when the dataset carries system as a separate top-level column (the system / system_prompt mapping). For those rows preprocess pops system and prepends it to messages via to_std_messages / sharegpt_to_messages, but the recursive call for rejected_messages was built as {'messages': row['rejected_messages']} without system, so the system turn only lands on the chosen side. DPO then trains on a pair whose chosen and rejected halves have different prefixes.

The regression test builds exactly such a row: with the fix, system is first in both lists; reverting the one-line change makes it fail with 'user' != 'system' on rejected_messages[0].

gemini-code-assist Bot reviewed Jun 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(dataset): propagate system message to rejected_messages in DPO preprocessing#9611

fix(dataset): propagate system message to rejected_messages in DPO preprocessing#9611
he-yufeng wants to merge 1 commit into
modelscope:mainfrom
he-yufeng:fix/dpo-system-in-rejected-messages

he-yufeng commented Jun 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 22, 2026

Uh oh!

Jintao-Huang commented Jun 24, 2026

Uh oh!

he-yufeng commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

he-yufeng commented Jun 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

Jintao-Huang commented Jun 24, 2026

Uh oh!

he-yufeng commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants