feat: support new DPO data format and update SFT config to use override API by arendu · Pull Request #405 · NVIDIA/NeMo-Aligner

arendu · 2024-11-14T23:06:51Z

What does this PR do ?

This PR makes the DPO dataset take its chat format tokens from the model's config YAML instead of hardcoding chat/special tokens in the JSONL data file.

Currently, each datapoint inside a DPO JSONL data file looks like this:

{
  "prompt": "<extra_id_0>System\n\n<extra_id_1>User\nbacillus subtilus\n<extra_id_1>Assistant\n",
  "chosen_response": "Bacillus ... and industry alike.\n<extra_id_1>",
  "rejected_response": "The Bacillus ... fields of study.\n<extra_id_1>",
  "rejected_reward": 3,
  "chosen_reward": 4
}

With this PR it should be like this (OpenAI list of messages format with no chat/formatting tokens):

{
  "prompt": [
    {
      "role": "system",
      "content": ""
    },
    {
      "role": "user",
      "content": "bacillus subtilus"
    }
  ],
  "chosen_response": {
    "role": "assistant",
    "content": "Bacillus ... and industry alike."
  },
  "rejected_response": {
    "role": "assistant",
    "content": "The Bacillus ... fields of study."
  },
  "chosen_reward": 4,
  "rejected_reward": 3
}
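
To illustrate how the two formats relate, here is a minimal sketch that renders a new-format datapoint back into the old token-laden strings. It is not the NeMo-Aligner implementation: the `ChatTokens` class, its field names, and the `render_prompt` / `render_response` helpers are hypothetical, and the default token values are simply read off the old-format example above. In the PR itself these tokens come from the model's config YAML.

```python
from dataclasses import dataclass


@dataclass
class ChatTokens:
    # Hypothetical container; these are NOT NeMo-Aligner config keys.
    # Defaults are inferred from the old-format example in this description.
    system_turn_start: str = "<extra_id_0>"
    turn_start: str = "<extra_id_1>"
    end_of_name: str = "\n"
    end_of_turn: str = "\n"


# Display names used in the old-format strings for each OpenAI-style role.
ROLE_NAMES = {"system": "System", "user": "User", "assistant": "Assistant"}


def render_prompt(messages, tokens):
    """Join a list of {"role", "content"} messages into one prompt string."""
    parts = []
    for message in messages:
        role = message["role"]
        start = tokens.system_turn_start if role == "system" else tokens.turn_start
        parts.append(
            f"{start}{ROLE_NAMES[role]}{tokens.end_of_name}"
            f"{message['content']}{tokens.end_of_turn}"
        )
    # Open the assistant turn so the response can be appended directly.
    parts.append(f"{tokens.turn_start}Assistant{tokens.end_of_name}")
    return "".join(parts)


def render_response(message, tokens):
    """Append the end-of-turn marker and the next turn-start token to a response."""
    return f"{message['content']}{tokens.end_of_turn}{tokens.turn_start}"


tokens = ChatTokens()
prompt = render_prompt(
    [
        {"role": "system", "content": ""},
        {"role": "user", "content": "bacillus subtilus"},
    ],
    tokens,
)
chosen = render_response(
    {"role": "assistant", "content": "Bacillus ... and industry alike."}, tokens
)
```

Because the tokens live in one place, switching to a model with different special tokens would only require a different `ChatTokens` instance, not a rewritten data file.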

Additionally, a script is added to convert old data files into the new format:

python nemo_aligner/data/nlp/scripts/undo_special_tokens.py <path_to_old_format_dpo_jsonl_file>

A new file will be written in the same location as the old format file.
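
For readers who want a feel for what such a conversion involves, the following is a rough sketch only. It does not reproduce `undo_special_tokens.py`, whose contents are not shown in this description; the regular expressions and the `convert_prompt`, `convert_response`, and `convert_record` helpers are assumptions based solely on the `<extra_id_0>` / `<extra_id_1>` layout of the old-format example above.

```python
import re

# Assumed layout, taken from the old-format example in this description:
#   <extra_id_0>System\n{system}\n<extra_id_1>User\n{user}\n<extra_id_1>Assistant\n
SYSTEM_RE = re.compile(r"<extra_id_0>System\n(.*?)\n(?=<extra_id_1>)", re.DOTALL)
TURN_RE = re.compile(r"<extra_id_1>(User|Assistant)\n(.*?)\n(?=<extra_id_1>)", re.DOTALL)
RESPONSE_SUFFIX = "\n<extra_id_1>"


def convert_prompt(prompt):
    """Split an old-format prompt string into a list of role/content messages."""
    messages = []
    system = SYSTEM_RE.match(prompt)
    if system:
        messages.append({"role": "system", "content": system.group(1)})
    # The trailing "<extra_id_1>Assistant\n" has no following turn token,
    # so the lookahead fails and it is dropped as a generation prompt.
    for role, content in TURN_RE.findall(prompt):
        messages.append({"role": role.lower(), "content": content})
    return messages


def convert_response(text):
    """Strip the trailing turn token and wrap the response as an assistant message."""
    if text.endswith(RESPONSE_SUFFIX):
        text = text[: -len(RESPONSE_SUFFIX)]
    return {"role": "assistant", "content": text}


def convert_record(record):
    """Convert one old-format datapoint; other keys (e.g. rewards) are copied as-is."""
    converted = dict(record)
    converted["prompt"] = convert_prompt(record["prompt"])
    converted["chosen_response"] = convert_response(record["chosen_response"])
    converted["rejected_response"] = convert_response(record["rejected_response"])
    return converted


old = {
    "prompt": "<extra_id_0>System\n\n<extra_id_1>User\nbacillus subtilus\n<extra_id_1>Assistant\n",
    "chosen_response": "Bacillus ... and industry alike.\n<extra_id_1>",
    "rejected_response": "The Bacillus ... fields of study.\n<extra_id_1>",
    "rejected_reward": 3,
    "chosen_reward": 4,
}
new = convert_record(old)
```

Applied line by line to a JSONL file (one `json.loads` and `json.dumps` per line), this would yield records in the messages format shown earlier, at least for data that follows the assumed token layout.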

Changelog

Please update the CHANGELOG.md under next version with high level changes in this PR.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation? Make sure to also update the NeMo Framework User Guide which contains the tutorials

Checklist when contributing a new algorithm

Does the trainer resume and restore model state all states?
Does the trainer support all parallelism techniques(PP, TP, DP)?
Does the trainer support max_steps=-1 and validation?
Does the trainer only call APIs defined in alignable_interface.py?
Does the trainer have proper logging?

Additional Information

Related to # (issue)

examples/nlp/gpt/conf/gpt_dpo.yaml

examples/nlp/gpt/train_gpt_sft.py

nemo_aligner/data/nlp/builders.py

nemo_aligner/data/nlp/datasets.py

nemo_aligner/data/nlp/scripts/undo_special_tokens.py

nemo_aligner/data/nlp/datasets.py

examples/nlp/gpt/train_gpt_dpo.py

terrykong

TODOs

compatibility test
Stretch (update the dpo.sh template test script to convert the train data json into this new format)

nemo_aligner/data/nlp/datasets.py

nemo_aligner/data/nlp/scripts/undo_special_tokens.py

nemo_aligner/data/nlp/datasets.py

tests/test_datasets.py

Signed-off-by: Terry Kong <terryk@nvidia.com>

terrykong · 2024-11-22T01:55:06Z

closing in favor of #403

…de API (#405) Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: arendu <adithya.r@gmail.com> Signed-off-by: NeMo-Aligner CI <nemo-aligner-ci@nvidia.com> Co-authored-by: Terry Kong <terryk@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Terry Kong <terryk@nvidia.com>

ccclyu · 2024-12-05T22:45:09Z

nemo_aligner/data/nlp/datasets.py

+
+        return output_dict
+
+    def convert(self, messages):


Do you think it could support apply_chat_template (https://huggingface.co/docs/transformers/main/en/chat_templating) for Hugging Face tokenizers, which are adopted by most open-source LLMs?

github-actions bot added the Algorithms label Nov 14, 2024

arendu requested review from gshennvm and terrykong November 15, 2024 06:33

arendu marked this pull request as ready for review November 15, 2024 06:36

arendu changed the title ~~Adithyare/dpo data refac~~ DPO data format refactor Nov 15, 2024

github-actions bot removed the Algorithms label Nov 15, 2024

terrykong suggested changes Nov 15, 2024

arendu requested a review from terrykong November 18, 2024 22:38

terrykong changed the title ~~DPO data format refactor~~ feat: support new DPO data format Nov 21, 2024

terrykong suggested changes Nov 21, 2024

arendu requested a review from terrykong November 21, 2024 05:29

arendu added the CI label Nov 21, 2024

github-actions bot removed the CI label Nov 21, 2024

terrykong reviewed Nov 21, 2024

tests/test_datasets.py (outdated, resolved)

terrykong reviewed Nov 21, 2024

tests/test_datasets.py (outdated, resolved)

terrykong force-pushed the adithyare/dpo_data_refac branch from d32515c to a112c19 Compare November 22, 2024 01:49

feat: dpo dataset new openai chat completion format

0d3b8ee

Signed-off-by: Terry Kong <terryk@nvidia.com>

terrykong force-pushed the adithyare/dpo_data_refac branch from a112c19 to 0d3b8ee Compare November 22, 2024 01:50

terrykong closed this Nov 22, 2024

terrykong mentioned this pull request Nov 22, 2024

Nemotron5 features #403

Draft

8 tasks

terrykong reopened this Nov 22, 2024

Update test_datasets.py

db3eb40

terrykong changed the title ~~feat: support new DPO data format~~ feat: support new DPO data format and update SFT config to use override API Dec 3, 2024

arendu and others added 2 commits December 3, 2024 23:20

updated to use importskip

adb8130

Signed-off-by: arendu <adithya.r@gmail.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

1d732ad

for more information, see https://pre-commit.ci Signed-off-by: NeMo-Aligner CI <nemo-aligner-ci@nvidia.com>

terrykong previously approved these changes Dec 3, 2024

Merge branch 'main' into adithyare/dpo_data_refac

613a63a

terrykong added the Run CICD Set + un-set to retrigger (add after r*.*.* labels) label Dec 3, 2024

terrykong mentioned this pull request Dec 4, 2024

feat: add context parallel support for SFT #420

Closed

8 tasks

arendu added 2 commits December 4, 2024 01:48

fix for batch size misconfiguration

a76c29a

Signed-off-by: arendu <adithya.r@gmail.com>

Merge branch 'adithyare/dpo_data_refac' of https://github.com/NVIDIA/NeMo-Aligner into adithyare/dpo_data_refac

e3d1192

arendu dismissed terrykong’s stale review via e3d1192 December 4, 2024 01:49

arendu added Run CICD Set + un-set to retrigger (add after r*.*.* labels) and removed Run CICD Set + un-set to retrigger (add after r*.*.* labels) labels Dec 4, 2024

Update gpt_sft.yaml removed comment

db1d5f1

terrykong approved these changes Dec 4, 2024

arendu added Run CICD Set + un-set to retrigger (add after r*.*.* labels) and removed Run CICD Set + un-set to retrigger (add after r*.*.* labels) labels Dec 4, 2024

terrykong enabled auto-merge (squash) December 4, 2024 01:58

terrykong merged commit 5d4b2a7 into main Dec 4, 2024

terrykong deleted the adithyare/dpo_data_refac branch December 4, 2024 02:20

ccclyu reviewed Dec 5, 2024

terrykong merged 8 commits into main from adithyare/dpo_data_refac

4 participants
