Fix for incorrect sequence of prompts being sent over for inference #147

Merged
ethantang-db merged 110 commits into single-controller-hackathon from ethantang-db/fix_multi_node
Aug 14, 2025

@ethantang-db ethantang-db commented Aug 12, 2025

This fixes an issue where the sequence of prompts we send to vLLM is ordered incorrectly: the generations for each unique prompt end up scattered throughout the sequence, when what we want is for all generations of a given unique prompt to be grouped together. Down the line, this issue causes the model to train poorly in certain setups; the fix should correct its reward and advantage calculations.
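To illustrate the ordering problem described above, here is a minimal sketch in plain Python. It is not the code from this PR. The helper names (`interleaved_order`, `grouped_order`, `group_advantages`), the toy rewards, and the assumption that advantages are computed by subtracting the mean over contiguous blocks of generations are all hypothetical, and the interleaved layout is only one possible example of a scattered sequence.

```python
from statistics import mean


def interleaved_order(prompts, generations_per_prompt):
    """Scattered layout (assumed example): p0, p1, ..., p0, p1, ..."""
    return [p for _ in range(generations_per_prompt) for p in prompts]


def grouped_order(prompts, generations_per_prompt):
    """Grouped layout: all generations of one prompt are adjacent."""
    return [p for p in prompts for _ in range(generations_per_prompt)]


def group_advantages(rewards, generations_per_prompt):
    """Subtract the mean of each contiguous block of rewards.

    Assumption: the consumer treats every contiguous block of
    `generations_per_prompt` entries as belonging to one prompt.
    """
    advantages = []
    for start in range(0, len(rewards), generations_per_prompt):
        block = rewards[start:start + generations_per_prompt]
        baseline = mean(block)
        advantages.extend(r - baseline for r in block)
    return advantages


prompts = ["a", "b"]
G = 2  # generations per unique prompt
# Toy rewards: every generation of a prompt scores the same.
toy_reward = {"a": 1.0, "b": 0.0}

grouped = grouped_order(prompts, G)
scattered = interleaved_order(prompts, G)

grouped_adv = group_advantages([toy_reward[p] for p in grouped], G)
scattered_adv = group_advantages([toy_reward[p] for p in scattered], G)

print(grouped, grouped_adv)
print(scattered, scattered_adv)
```

With the grouped layout, each block contains only one prompt's generations, so the baseline is computed per prompt. With the scattered layout, each block mixes generations from different prompts, so the per-block baseline no longer corresponds to a single prompt and the resulting advantages differ.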

Runs with the fix:

Single node run: https://dbc-559ffd80-2bfc.cloud.databricks.com/ml/experiments/3786561063037191/runs/6921b098214b4b0c88397c5ce0b170a2?o=7395834863327820
2 nodes run: https://dbc-559ffd80-2bfc.cloud.databricks.com/ml/experiments/3786561063037191/runs/a9a74b1a4b2a4561941513d37c305550?o=7395834863327820

@ethantang-db changed the title from "Attempt of fixing multi node" to "Fix for incorrect sequence of prompts being sent over for inference" on Aug 14, 2025

@abaheti95 abaheti95 left a comment

Awesome. This is a fine fix for now, as long as generations per prompt <= rollouts per train actor. But we will need a longer-term fix once we move the Rewards into the RolloutActor.

@rithwik-db rithwik-db left a comment

Left one comment, but LGTM, thanks for catching this!

@bowenyang008 bowenyang008 left a comment

Thanks @ethantang-db, this is great!

@ethantang-db ethantang-db merged commit 1823225 into single-controller-hackathon Aug 14, 2025

4 participants