[2/n][rollout] feat: flowgrpo - add diffusion agent loop support #53

Closed
AndyZhou952 wants to merge 118 commits into zhtmike:main from AndyZhou952:diffusion-agent-loop-pr

Conversation

AndyZhou952 commented Mar 17, 2026

What does this PR do?

Add diffusion agent loop support

Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results such as training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

knlnguyen1802 and others added 19 commits March 13, 2026 10:57
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
AndyZhou952 marked this pull request as ready for review March 17, 2026 03:50
)


class DiffusionAgentLoopWorker:
Owner

Could this inherit from AgentLoopWorker? Please check for repeated methods.

Author

It feels more sensible to keep separate classes, considering the number of overrides needed (trace removed, DiffusersModelConfig, sampling params, etc.).

zhtmike (Owner) commented Mar 17, 2026

Please fix CI.

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

Copilot AI left a comment

Pull request overview

This PR introduces diffusion-model support to the experimental agent loop stack, adding a diffusion-specific single-turn agent loop, a diffusion worker pipeline for batching/postprocessing, and an initial test targeting diffusion rollouts.

Changes:

  • Add DiffusionSingleTurnAgentLoop and register it as diffusion_single_turn_agent.
  • Extend the agent loop manager/worker codepath with DiffusionAgentLoopWorker and diffusion-specific output schemas.
  • Add a new diffusion agent loop test under tests/experimental/agent_loop.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| verl/experimental/agent_loop/single_turn_agent_loop.py | Adds a diffusion-oriented single-turn agent loop that passes negative prompts and expects image outputs. |
| verl/experimental/agent_loop/agent_loop.py | Adds diffusion output models, diffusion worker implementation, and routes manager to the diffusion worker based on model_type. |
| verl/experimental/agent_loop/__init__.py | Exposes DiffusionAgentLoopWorker in the package exports. |
| tests/experimental/agent_loop/test_diffusion_agent_loop.py | Adds an end-to-end-style diffusion rollout test. |

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
AndyZhou952 changed the title from "[agent loop] feat: [1/n] flowgrpo - add diffusion agent loop support" to "[rollout] feat: [1/n] flowgrpo - add diffusion agent loop support" Mar 23, 2026
zhtmike changed the title from "[rollout] feat: [1/n] flowgrpo - add diffusion agent loop support" to "[2/n][rollout] feat: flowgrpo - add diffusion agent loop support" Mar 23, 2026
zhtmike (Owner) commented Mar 24, 2026

move to verl PR

zhtmike closed this Mar 24, 2026
6 participants