[4/n][trainer] feat: flowgrpo - add diffusers + fsdp engine support#50

Open
zhtmike wants to merge 37 commits into main from diffusers_engine

Conversation

Owner

@zhtmike zhtmike commented Mar 12, 2026

What does this PR do?

  • Add Diffusers with FSDP as the training engine, for diffusion model RL.
  • Add the FlowGRPO algorithm (loss only, for unit testing).
  • SP support will be provided in a separate PR.
  • Putting the trainer and the engine in one PR might be too big, so we split them into two. The next PR will be the final trainer PR, and the last one will be the doc PR for the FlowGRPO algorithm.

Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching
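
The title rule above can be sketched as a small validator. This is only an illustration built from the checklist text; the actual CI check is not shown here, and `check_title`, `MODULES`, `TYPES`, and `TITLE_RE` are hypothetical names.

```python
import re

# Module and type names copied from the checklist above.
MODULES = {
    "fsdp", "megatron", "veomni", "sglang", "vllm", "rollout", "trainer", "ci",
    "training_utils", "recipe", "hardware", "deployment", "ray", "worker",
    "single_controller", "misc", "perf", "model", "algo", "env", "tool", "ckpt",
    "doc", "data", "cfg", "reward", "fully_async", "one_step_off",
}
TYPES = {"feat", "fix", "refactor", "chore", "test"}

# Optional [BREAKING] tag, then [modules], then "type: description".
TITLE_RE = re.compile(
    r"^(?P<breaking>\[BREAKING\])?"
    r"\[(?P<modules>[^\]]+)\] "
    r"(?P<type>\w+): "
    r"(?P<description>\S.*)$"
)


def check_title(title: str) -> bool:
    """Return True if the title follows [{modules}] {type}: {description}."""
    match = TITLE_RE.match(title)
    if match is None:
        return False
    # Multiple modules are separated by commas inside one pair of brackets.
    modules = [name.strip() for name in match.group("modules").split(",")]
    return all(name in MODULES for name in modules) and match.group("type") in TYPES
```

For example, `check_title("[BREAKING][fsdp, megatron] feat: dynamic batching")` passes. The sketch does not handle series prefixes such as `[4/n]`; whether the real check accepts them is not stated here.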

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@zhtmike zhtmike force-pushed the diffusers_engine branch 4 times, most recently from 50425ca to ba27f6a March 13, 2026 07:53
@zhtmike zhtmike marked this pull request as ready for review March 13, 2026 07:53
@zhtmike zhtmike requested a review from Copilot March 13, 2026 07:53

Copilot AI left a comment

Pull request overview

Adds a Diffusers-based diffusion model training path to VERL’s trainer stack, integrating a new FSDP/FSDP2 engine implementation and FlowGRPO-specific loss/scheduler utilities.

Changes:

  • Introduces DiffusersFSDPEngine (FSDP/FSDP2) for diffusion-model training/inference, including checkpointing and LoRA handling.
  • Adds diffusion-specific padding + loss utilities and FlowGRPO policy loss / image KL helper.
  • Adds Diffusers model config + scheduler/model helpers and a sanity test for the new engine.
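
For readers unfamiliar with GRPO-style objectives, the policy loss mentioned above can be illustrated with a generic clipped surrogate over group-normalized advantages. This is a minimal sketch in plain Python under that assumption; it is not the code added to `core_algos.py`, which may differ, and `group_advantages` and `clipped_policy_loss` are hypothetical names.

```python
import math


def group_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one sample group to zero mean and unit std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_policy_loss(log_probs, old_log_probs, advantages, clip_ratio=0.2):
    """PPO-style clipped surrogate loss, averaged over all entries."""
    losses = []
    for lp, old_lp, adv in zip(log_probs, old_log_probs, advantages):
        # Importance ratio between the current and the old policy.
        ratio = math.exp(lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_ratio), 1.0 + clip_ratio)
        # Take the more pessimistic of the unclipped and clipped objectives.
        losses.append(max(-adv * ratio, -adv * clipped))
    return sum(losses) / len(losses)
```

In a diffusion setting each entry would presumably correspond to one denoising step of one sampled image, but that mapping is an assumption and is not described in this summary.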

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| verl/workers/utils/padding.py | Adds prompt-embed padding→no-padding conversion for diffusion batches. |
| verl/workers/utils/losses.py | Adds diffusion_loss and imports kl_penalty_image. |
| verl/workers/engine_workers.py | Makes worker logic tolerant of diffusion configs (missing HF fields / input_ids). |
| verl/workers/engine/fsdp/diffusers_impl.py | New Diffusers FSDP engine implementation (core of the PR). |
| verl/workers/engine/fsdp/__init__.py | Conditionally exports the Diffusers FSDP engine. |
| verl/workers/engine/__init__.py | Exposes Diffusers engine at the package level when available. |
| verl/workers/config/model.py | Adds DiffusersModelConfig dataclass. |
| verl/workers/config/engine.py | Allows TrainingWorkerConfig.model_config to be HF or Diffusers config. |
| verl/utils/fsdp_utils.py | Extends LoRA param collection to support diffusers module naming/prefixes. |
| verl/trainer/ppo/core_algos.py | Adds FlowGRPO policy loss + kl_penalty_image. |
| verl/trainer/config/model/diffusers_model.yaml | New Hydra config template for Diffusers models. |
| verl/models/diffusers_model/* | Adds diffusion-model abstraction utilities, Qwen-Image adapter, and FlowMatch SDE scheduler. |
| tests/special_sanity/check_device_api_usage.py | Updates allowlist for new engine file. |
| tests/models/test_diffusers_fsdp_engine.py | Adds a Ray-based integration test for diffusion FSDP engine. |


Copilot AI left a comment

Pull request overview

Adds first-pass Diffusers (diffusion model) training support on the existing worker/engine stack by introducing an FSDP/FSDP2 engine implementation and the accompanying FlowGRPO loss/scheduler plumbing.

Changes:

  • Introduce DiffusersFSDPEngine (FSDP/FSDP2) plus diffusion-model utilities (scheduler + model-specific hooks).
  • Add FlowGRPO policy loss + image KL helper and a diffusion-specific loss wrapper.
  • Add prompt-embed padding→no-padding conversion and new CPU/GPU tests + config YAML for diffusers models.
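
The padding→no-padding conversion can be pictured as dropping masked positions and recording cumulative sequence lengths. The sketch below uses nested Python lists instead of tensors to stay dependency-free; the real `embeds_padding_2_no_padding` signature and output layout are not shown in this summary, so `embeds_to_no_padding` and its return format are assumptions.

```python
def embeds_to_no_padding(embeds, mask):
    """Flatten padded prompt embeddings into one packed sequence.

    embeds: [batch][seq_len][hidden] nested lists of floats.
    mask:   [batch][seq_len] with 1 for real tokens and 0 for padding.
    Returns (flat_tokens, cu_seqlens), where cu_seqlens[i] is the offset
    of sample i in flat_tokens and cu_seqlens[-1] is the total length.
    """
    flat_tokens = []
    cu_seqlens = [0]
    for row, row_mask in zip(embeds, mask):
        # Keep only the positions marked as real tokens.
        kept = [token for token, keep in zip(row, row_mask) if keep]
        flat_tokens.extend(kept)
        cu_seqlens.append(cu_seqlens[-1] + len(kept))
    return flat_tokens, cu_seqlens


# Two prompts of true lengths 2 and 1, padded to length 3.
embeds = [[[1.0], [2.0], [0.0]], [[3.0], [0.0], [0.0]]]
mask = [[1, 1, 0], [1, 0, 0]]
flat, offsets = embeds_to_no_padding(embeds, mask)
```

Here `flat` holds the three real token embeddings and `offsets` is `[0, 2, 3]`, so sample `i` occupies `flat[offsets[i]:offsets[i + 1]]`.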

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| verl/workers/utils/padding.py | Add embeds_padding_2_no_padding for diffusion prompt embeds. |
| verl/workers/utils/losses.py | Add diffusion_loss and image KL integration. |
| verl/workers/engine_workers.py | Make worker logic robust when model_config lacks LLM-only fields (e.g., hf_config, input_ids). |
| verl/workers/engine/fsdp/diffusers_impl.py | New Diffusers FSDP/FSDP2 engine implementation. |
| verl/workers/engine/fsdp/__init__.py | Conditional export of DiffusersFSDPEngine. |
| verl/workers/engine/__init__.py | Export DiffusersFSDPEngine when available. |
| verl/workers/config/model.py | Add DiffusersModelConfig. |
| verl/workers/config/engine.py | Allow TrainingWorkerConfig.model_config to be HF or Diffusers config. |
| verl/utils/fsdp_utils.py | Extend LoRA param collection to support diffusers module structure. |
| verl/trainer/ppo/core_algos.py | Register flow_grpo policy loss + add kl_penalty_image. |
| verl/trainer/config/model/diffusers_model.yaml | Add Hydra config template for diffusers models. |
| verl/models/diffusers_model/* | New diffusion model base/registry, QwenImage hook, scheduler impl, and utils. |
| tests/utils/test_padding_on_cpu.py | Unit test for embed padding→no-padding conversion. |
| tests/trainer/ppo/test_core_algos_on_cpu.py | Unit test for FlowGRPO policy loss. |
| tests/special_sanity/check_device_api_usage.py | Allowlist new engine file. |
| tests/models/test_diffusers_fsdp_engine.py | End-to-end-ish smoke test for diffusers engine (fsdp/fsdp2). |
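
As background for the image KL helper listed above: when the policy and the reference both produce an isotropic Gaussian with the same standard deviation at a denoising step, the KL divergence reduces to a scaled squared distance between the means. The sketch below shows only that closed form. Whether `kl_penalty_image` computes exactly this is an assumption, and `gaussian_kl_same_std` is a hypothetical name.

```python
def gaussian_kl_same_std(mean, ref_mean, std):
    """KL(N(mean, std^2 I) || N(ref_mean, std^2 I)) for flat lists of floats."""
    # With a shared std, the log-variance and trace terms cancel.
    squared_distance = sum((m - r) ** 2 for m, r in zip(mean, ref_mean))
    return squared_distance / (2.0 * std ** 2)
```

The value is zero when the two means coincide and grows quadratically as the policy's predicted mean drifts away from the reference.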


@zhtmike zhtmike changed the title from "[trainer] feat: [1/n] flowgrpo - add diffusers + fsdp engine support" to "[4/n][trainer] feat: flowgrpo - add diffusers + fsdp engine support" Mar 23, 2026

3 participants