Fix for incorrect sequence of prompts being sent over for inference by ethantang-db · Pull Request #147 · databricks/compose-rl

ethantang-db · 2025-08-12T17:25:12Z

This fixes an issue where the prompts we send to vLLM for inference are in the wrong order. The generations for each unique prompt are scattered throughout the sequence, when they should be grouped together per unique prompt. Downstream, this causes some setups to train poorly; this fix should also correct the reward and advantage calculations.
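To illustrate the ordering problem, here is a minimal sketch (not the code changed in this PR) that contrasts the two orderings for a batch where each prompt gets `n` generations. The helper names `repeat_grouped`, `repeat_tiled`, and `group_advantages` are hypothetical. The advantage function assumes a GRPO-style group-mean baseline, which may differ from compose-rl's actual reward and advantage code.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def repeat_grouped(prompts: Sequence[T], n: int) -> List[T]:
    """Repeat each prompt n times back to back: [a, b] -> [a, a, b, b]."""
    return [p for p in prompts for _ in range(n)]


def repeat_tiled(prompts: Sequence[T], n: int) -> List[T]:
    """Tile the whole prompt list n times: [a, b] -> [a, b, a, b].

    This is the "scattered" ordering: copies of one prompt are spread
    through the batch instead of sitting next to each other.
    """
    return [p for _ in range(n) for p in prompts]


def group_advantages(rewards: Sequence[float], n: int) -> List[float]:
    """Group-mean-baselined advantages, assuming contiguous groups of size n.

    If the rewards arrive in tiled order, each chunk mixes different prompts
    and the baseline is computed over the wrong set of generations.
    """
    if n <= 0 or len(rewards) % n != 0:
        raise ValueError("len(rewards) must be a positive multiple of n")
    advantages: List[float] = []
    for start in range(0, len(rewards), n):
        group = rewards[start:start + n]
        baseline = sum(group) / n
        advantages.extend(r - baseline for r in group)
    return advantages


if __name__ == "__main__":
    prompts = ["a", "b"]
    print(repeat_grouped(prompts, 2))  # ['a', 'a', 'b', 'b']
    print(repeat_tiled(prompts, 2))    # ['a', 'b', 'a', 'b']
```

Consider rewards `[1.0, 0.0]` for prompt `a` and `[1.0, 1.0]` for prompt `b`. In grouped order, `[1.0, 0.0, 1.0, 1.0]` yields advantages `[0.5, -0.5, 0.0, 0.0]`. The same rewards in tiled order, `[1.0, 1.0, 0.0, 1.0]`, yield `[0.0, 0.0, -0.5, 0.5]`, because each chunk now averages over one generation from each prompt.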

Runs with the fix:

Single-node run: https://dbc-559ffd80-2bfc.cloud.databricks.com/ml/experiments/3786561063037191/runs/6921b098214b4b0c88397c5ce0b170a2?o=7395834863327820
2-node run: https://dbc-559ffd80-2bfc.cloud.databricks.com/ml/experiments/3786561063037191/runs/a9a74b1a4b2a4561941513d37c305550?o=7395834863327820

abaheti95

Awesome. This is a fine fix for now, as long as generations per prompt <= rollouts per train actor. But we will need a long-term fix once we move the rewards into the RolloutActor.
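The following is a minimal sketch of the kind of config check the reviewer's constraint implies. It assumes, hypothetically, that each train actor receives a contiguous slice of the grouped rollouts. `validate_rollout_config` and `split_across_actors` are illustrative names, not the checks actually added in this PR.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def validate_rollout_config(generations_per_prompt: int,
                            rollouts_per_train_actor: int) -> None:
    """Reject configs outside the range the review says this fix covers."""
    if generations_per_prompt < 1 or rollouts_per_train_actor < 1:
        raise ValueError("both values must be positive")
    if generations_per_prompt > rollouts_per_train_actor:
        raise ValueError(
            f"generations_per_prompt ({generations_per_prompt}) must be <= "
            f"rollouts_per_train_actor ({rollouts_per_train_actor})")


def split_across_actors(rollouts: Sequence[T],
                        rollouts_per_train_actor: int) -> List[List[T]]:
    """Hypothetical dispatch: hand each actor a contiguous slice of rollouts."""
    return [list(rollouts[i:i + rollouts_per_train_actor])
            for i in range(0, len(rollouts), rollouts_per_train_actor)]


if __name__ == "__main__":
    validate_rollout_config(2, 2)  # passes
    # Each prompt's group of 2 generations lands on a single actor.
    print(split_across_actors(["a", "a", "b", "b"], 2))
```

Under this assumption, a prompt group larger than one actor's slice necessarily spans more than one actor. That is one plausible reason to reject `generations_per_prompt > rollouts_per_train_actor`. The check encodes only the condition stated in the review; by itself it does not guarantee that every group lands on a single actor.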

compose_rl/algorithms/online/callback_utils.py

rithwik-db

Left one comment, but LGTM, thanks for catching this!

compose_rl/algorithms/online/callback_utils.py

test_single_controller_ppo.py

bowenyang008

Thanks @ethantang-db, this is great!

ethantang-db added 30 commits August 9, 2025 11:44

15c6ae3 adding logging to understand weight updates
4ed2322 assert false
453a610 logging more updateS
5c15cb8 trying out llama 1b
2bd7bc9 fix loading
a8d817b different dataset
5d64218 revert to r1
3cf4432 trying out 2 nodes
f7184be test
7e66c92 log worker_wrap logic
7e1d365 force crash
7c10670 removing assert
b0a3467 jank logging
ccd2e4c try gloo?
34ce4fb revert back to nccl
536e513 try out cpu and gloo
2f4de01 log tensors to file
c96a8d1 better logging
ac6b8d9 removed redundent debugging
24b0d93 rank
f95a801 try env vars
b0a3441 try out other place for nccl
5598942 f...
febdc69 further debugging
05085f3 log what weights are updated
0ebb94b log weight updates
2673c66 update weights
d3c1d20 this is trippin
840e0d8 better weight logging
b9dfda7 like cursor bruh?

ethantang-db added 21 commits August 13, 2025 12:52

594f87e isolate which var it is
dde0142 trying out 16 samples
26b6cc4 more debug
2e1b64a trying out something else
7042619 further debugging
a768a0e debug
136d687 double checking type
d46ae8d more check
2289ee7 flipping this
46e72b6 more corrections
27a52f2 more debugging
a394e79 stack
bdeb027 fix bs
2647386 more fix
c0c3dae most random comma
45b08fe double checking shapes
f8bfcce should be stack
4075679 cleaning up stuff
7b62267 let's do a run
748168e change cluster
06248b7 the fix

ethantang-db changed the title from "Attempt of fixing multi node" to "Fix for incorrect sequence of prompts being sent over for inference" Aug 14, 2025

d1bf233 white space

abaheti95 approved these changes Aug 14, 2025

compose_rl/algorithms/online/callback_utils.py

rithwik-db approved these changes Aug 14, 2025

bowenyang008 reviewed Aug 14, 2025

compose_rl/algorithms/online/callback_utils.py

9af0e4d added checks for proper configs

bowenyang008 reviewed Aug 14, 2025

test_single_controller_ppo.py

bowenyang008 approved these changes Aug 14, 2025

ethantang-db merged commit 1823225 into single-controller-hackathon Aug 14, 2025

ethantang-db merged 110 commits into single-controller-hackathon from ethantang-db/fix_multi_node