fix(dataset): propagate system message to rejected_messages in DPO preprocessing#9611
fix(dataset): propagate system message to rejected_messages in DPO preprocessing#9611he-yufeng wants to merge 1 commit into
Conversation
…eprocessing
When a DPO dataset row has both a `system` key and `rejected_messages`, the
recursive preprocess() call only forwarded `{'messages': row['rejected_messages']}`,
dropping the system key. This meant system was prepended to chosen messages but
never to rejected messages, producing asymmetric conversation prefixes and
corrupting the DPO training signal.
Pass `'system': row.get('system')` to the recursive call so both sides receive
the same system prefix. Also adds a regression test.
There was a problem hiding this comment.
Code Review
This pull request ensures that the system message is propagated to rejected_messages during preprocessing in swift/dataset/preprocessor/core.py, and adds a corresponding unit test to prevent regression. The reviewer pointed out a potential TypeError if the recursive preprocessing of rejected_messages returns None, suggesting a robust check to handle this case gracefully.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if 'rejected_messages' in row: | ||
| row['rejected_messages'] = MessagesPreprocessor.preprocess( | ||
| self, {'messages': row['rejected_messages']})['messages'] | ||
| self, {'messages': row['rejected_messages'], 'system': row.get('system')})['messages'] |
There was a problem hiding this comment.
If MessagesPreprocessor.preprocess returns None (which happens if rejected_messages is empty, invalid, or cannot be repaired), attempting to access ['messages'] directly on the result will raise a TypeError: 'NoneType' object is not subscriptable.
To make this preprocessing step more robust and prevent potential crashes, we should check if the preprocessed rejected row is None and return None for the entire row to filter it out gracefully.
| if 'rejected_messages' in row: | |
| row['rejected_messages'] = MessagesPreprocessor.preprocess( | |
| self, {'messages': row['rejected_messages']})['messages'] | |
| self, {'messages': row['rejected_messages'], 'system': row.get('system')})['messages'] | |
| if 'rejected_messages' in row: | |
| rejected_row = MessagesPreprocessor.preprocess( | |
| self, {'messages': row['rejected_messages'], 'system': row.get('system')}) | |
| if rejected_row is None: | |
| return None | |
| row['rejected_messages'] = rejected_row['messages'] |
|
The system content is usually placed inside |
|
You're right that in the common format the system turn is already embedded as the first message in both The case it fixes is when the dataset carries The regression test builds exactly such a row: with the fix, system is first in both lists; reverting the one-line change makes it fail with |
When a DPO dataset row has both a
systemkey andrejected_messages, the system message is never added torejected_messages. The recursive call topreprocessonly passes{'messages': row['rejected_messages']}— thesystemkey is dropped on the way in.DPO training requires chosen and rejected to have the same prompt context; this silently produces asymmetric encoding. Anything downstream that encodes both sides will see different conversation prefixes, which corrupts the training signal.
One-line fix: pass
'system': row.get('system')to the recursive call so the system message is forwarded.Also adds a regression test covering the scenario.