[4/n][trainer] feat: flowgrpo - add diffusers + fsdp engine support by zhtmike · Pull Request #50 · zhtmike/verl

zhtmike · 2026-03-12T09:47:51Z

What does this PR do?

Add Diffusers with FSDP as the training engine for diffusion-model RL.
Add the FlowGRPO algorithm (loss only, for unit testing).
SP support will be provided in a separate PR.
Combining the trainer and the engine in one PR would be too large, so we split them into two. The next PR will be the final trainer PR, and the last one will be the doc PR for the FlowGRPO algorithm.

Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

Checklist Before Starting

Search for similar PRs. Paste at least one query link here: ...
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
- {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
- If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
- {type} is in feat, fix, refactor, chore, test
- If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
- Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add / Update the documentation.
Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
If your PR is related to the recipe submodule, please also update the reference to the submodule commit via git submodule update --remote or cd recipe && git pull origin main.

Copilot

Pull request overview

Adds a Diffusers-based diffusion model training path to VERL’s trainer stack, integrating a new FSDP/FSDP2 engine implementation and FlowGRPO-specific loss/scheduler utilities.

Changes:

Introduces DiffusersFSDPEngine (FSDP/FSDP2) for diffusion-model training/inference, including checkpointing and LoRA handling.
Adds diffusion-specific padding + loss utilities and FlowGRPO policy loss / image KL helper.
Adds Diffusers model config + scheduler/model helpers and a sanity test for the new engine.
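For readers unfamiliar with GRPO-style objectives, the sketch below shows a generic clipped surrogate policy loss of the kind the "FlowGRPO policy loss" item refers to. It is an illustration only, not the code in this PR: the function name `clipped_policy_loss`, the flat-list inputs, and the default `clip_ratio` are assumptions, and the PR's implementation in `verl/trainer/ppo/core_algos.py` may differ in masking, aggregation, and KL handling.

```python
import math


def clipped_policy_loss(log_prob, old_log_prob, advantages, clip_ratio=0.2):
    """Mean PPO/GRPO-style clipped surrogate loss over flat lists of floats.

    Hypothetical helper for illustration; not the PR's actual API.
    """
    losses = []
    for lp, old_lp, adv in zip(log_prob, old_log_prob, advantages):
        # Importance ratio between the current and the sampling policy.
        ratio = math.exp(lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_ratio), 1.0 + clip_ratio)
        # Pessimistic bound: keep the larger of the two negated objectives.
        losses.append(max(-adv * ratio, -adv * clipped))
    return sum(losses) / len(losses)
```

With identical log-probabilities the ratio is 1 and the loss reduces to the negated mean advantage; clipping only takes effect once the ratio leaves `[1 - clip_ratio, 1 + clip_ratio]`.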

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 11 comments.

Show a summary per file

File	Description
`verl/workers/utils/padding.py`	Adds prompt-embed padding→no-padding conversion for diffusion batches.
`verl/workers/utils/losses.py`	Adds `diffusion_loss` and imports `kl_penalty_image`.
`verl/workers/engine_workers.py`	Makes worker logic tolerant of diffusion configs (missing HF fields / `input_ids`).
`verl/workers/engine/fsdp/diffusers_impl.py`	New Diffusers FSDP engine implementation (core of the PR).
`verl/workers/engine/fsdp/__init__.py`	Conditionally exports the Diffusers FSDP engine.
`verl/workers/engine/__init__.py`	Exposes Diffusers engine at the package level when available.
`verl/workers/config/model.py`	Adds `DiffusersModelConfig` dataclass.
`verl/workers/config/engine.py`	Allows `TrainingWorkerConfig.model_config` to be HF or Diffusers config.
`verl/utils/fsdp_utils.py`	Extends LoRA param collection to support diffusers module naming/prefixes.
`verl/trainer/ppo/core_algos.py`	Adds FlowGRPO policy loss + `kl_penalty_image`.
`verl/trainer/config/model/diffusers_model.yaml`	New Hydra config template for Diffusers models.
`verl/models/diffusers_model/*`	Adds diffusion-model abstraction utilities, Qwen-Image adapter, and FlowMatch SDE scheduler.
`tests/special_sanity/check_device_api_usage.py`	Updates allowlist for new engine file.
`tests/models/test_diffusers_fsdp_engine.py`	Adds a Ray-based integration test for diffusion FSDP engine.

verl/workers/engine/fsdp/diffusers_impl.py

verl/workers/utils/padding.py

verl/workers/engine/fsdp/diffusers_impl.py

examples/flowgrpo_trainer/diffusers/qwen_image.py

tests/models/test_diffusers_fsdp_engine.py

verl/workers/utils/padding.py

verl/workers/engine/fsdp/diffusers_impl.py

Copilot

Pull request overview

Adds first-pass Diffusers (diffusion model) training support on the existing worker/engine stack by introducing an FSDP/FSDP2 engine implementation and the accompanying FlowGRPO loss/scheduler plumbing.

Changes:

Introduce DiffusersFSDPEngine (FSDP/FSDP2) plus diffusion-model utilities (scheduler + model-specific hooks).
Add FlowGRPO policy loss + image KL helper and a diffusion-specific loss wrapper.
Add prompt-embed padding→no-padding conversion and new CPU/GPU tests + config YAML for diffusers models.
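The padding→no-padding conversion mentioned above can be pictured as dropping masked positions and packing the remaining embedding rows back to back. The sketch below uses plain Python lists so it runs without any framework; the function name `embeds_to_no_padding` and the cumulative-length output are assumptions for illustration, and the PR's own helper operates on tensors and may return a different structure.

```python
def embeds_to_no_padding(padded_embeds, attention_mask):
    """Drop padded positions and pack the remaining rows back to back.

    padded_embeds: list of B sequences, each a list of L embedding vectors.
    attention_mask: list of B lists of L ints (1 = real token, 0 = padding).
    Returns (packed, cu_seqlens); cu_seqlens[i]:cu_seqlens[i + 1] slices sample i.
    """
    packed, cu_seqlens = [], [0]
    for seq, mask in zip(padded_embeds, attention_mask):
        kept = [vec for vec, keep in zip(seq, mask) if keep]
        packed.extend(kept)
        cu_seqlens.append(cu_seqlens[-1] + len(kept))
    return packed, cu_seqlens


# Two prompts padded to length 3; the second has a single real token.
embeds = [
    [[1.0, 1.0], [2.0, 2.0], [0.0, 0.0]],
    [[3.0, 3.0], [0.0, 0.0], [0.0, 0.0]],
]
mask = [[1, 1, 0], [1, 0, 0]]
packed, cu_seqlens = embeds_to_no_padding(embeds, mask)
```

Carrying cumulative sequence lengths alongside the packed rows lets later code recover per-sample boundaries without re-padding.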

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
verl/workers/utils/padding.py	Add `embeds_padding_2_no_padding` for diffusion prompt embeds.
verl/workers/utils/losses.py	Add `diffusion_loss` and image KL integration.
verl/workers/engine_workers.py	Make worker logic robust when `model_config` lacks LLM-only fields (e.g., `hf_config`, `input_ids`).
verl/workers/engine/fsdp/diffusers_impl.py	New Diffusers FSDP/FSDP2 engine implementation.
verl/workers/engine/fsdp/__init__.py	Conditional export of `DiffusersFSDPEngine`.
verl/workers/engine/__init__.py	Export `DiffusersFSDPEngine` when available.
verl/workers/config/model.py	Add `DiffusersModelConfig`.
verl/workers/config/engine.py	Allow `TrainingWorkerConfig.model_config` to be HF or Diffusers config.
verl/utils/fsdp_utils.py	Extend LoRA param collection to support diffusers module structure.
verl/trainer/ppo/core_algos.py	Register `flow_grpo` policy loss + add `kl_penalty_image`.
verl/trainer/config/model/diffusers_model.yaml	Add Hydra config template for diffusers models.
verl/models/diffusers_model/*	New diffusion model base/registry, QwenImage hook, scheduler impl, and utils.
tests/utils/test_padding_on_cpu.py	Unit test for embed padding→no-padding conversion.
tests/trainer/ppo/test_core_algos_on_cpu.py	Unit test for FlowGRPO policy loss.
tests/special_sanity/check_device_api_usage.py	Allowlist new engine file.
tests/models/test_diffusers_fsdp_engine.py	End-to-end-ish smoke test for diffusers engine (fsdp/fsdp2).
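"Register `flow_grpo` policy loss" suggests a name-to-function registry, so that a config string can select the loss. A minimal sketch of such a registry is shown below; every name in it (`POLICY_LOSS_REGISTRY`, `register_policy_loss`, `get_policy_loss`, and the placeholder loss body) is an assumption for illustration and should not be read as the signatures used in `core_algos.py`.

```python
# Hypothetical registry mapping a loss name to its implementation.
POLICY_LOSS_REGISTRY = {}


def register_policy_loss(name):
    """Decorator that stores the decorated function under `name`."""
    def decorator(fn):
        if name in POLICY_LOSS_REGISTRY:
            raise ValueError(f"policy loss {name!r} is already registered")
        POLICY_LOSS_REGISTRY[name] = fn
        return fn
    return decorator


def get_policy_loss(name):
    """Look up a registered loss; raises KeyError for unknown names."""
    return POLICY_LOSS_REGISTRY[name]


@register_policy_loss("flow_grpo")
def placeholder_flow_grpo_loss(ratio, advantage):
    # Placeholder body for the sketch: an unclipped surrogate term.
    return -ratio * advantage


loss_fn = get_policy_loss("flow_grpo")
```

A registry like this lets a trainer pick the loss from configuration without importing each implementation explicitly.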

verl/trainer/ppo/core_algos.py

verl/workers/engine/fsdp/diffusers_impl.py

zhtmike force-pushed the diffusers_engine branch 4 times, most recently from 50425ca to ba27f6a Compare March 13, 2026 07:53

zhtmike marked this pull request as ready for review March 13, 2026 07:53

zhtmike requested a review from Copilot March 13, 2026 07:53

Copilot started reviewing on behalf of zhtmike March 13, 2026 07:54

Copilot AI reviewed Mar 13, 2026

zhtmike requested a review from Copilot March 16, 2026 05:08

Copilot started reviewing on behalf of zhtmike March 16, 2026 05:08

Copilot AI reviewed Mar 16, 2026

verl/trainer/ppo/core_algos.py (outdated, resolved)

verl/workers/engine/fsdp/diffusers_impl.py (outdated, resolved)

verl/workers/engine/fsdp/diffusers_impl.py (resolved)

verl/workers/engine/fsdp/diffusers_impl.py (resolved)

zhtmike added 14 commits March 16, 2026 14:39

add DiffusionFSDP engine

341e11f

refactor

87391de

refactor

3de2b55

fix bug

57d7093

clean

980e227

update dosctring

d6555d5

fix bug

482e5bf

add dosctring

39bb098

add UTs

66f7bed

refactor & add docstring for metaclass

de1ea60

refactor

120b97f

refactor

fe013ac

better name

5880a28

fix arrcording to suggestion

a6d12a8

zhtmike force-pushed the diffusers_engine branch from 77db4b9 to a6d12a8 Compare March 16, 2026 06:39

chenyingshu approved these changes Mar 16, 2026

zhtmike mentioned this pull request Mar 18, 2026

[2/n][rollout] feat: flowgrpo - add diffusion agent loop support #53

Closed

8 tasks

zhtmike added 2 commits March 20, 2026 16:20

move context manager to outside

6ff361d

Merge branch 'main' into diffusers_engine

8bda360

zhtmike changed the title ~~[trainer] feat: [1/n] flowgrpo - add diffusers + fsdp engine support~~ [4/n][trainer] feat: flowgrpo - add diffusers + fsdp engine support Mar 23, 2026

zhtmike added 3 commits March 27, 2026 11:51

Merge branch 'main' into diffusers_engine

c1ea735

refactor script

aaf5b6b

update CI

b3496f7

zhtmike force-pushed the diffusers_engine branch from 7b1b0c5 to b3496f7 Compare March 27, 2026 07:48

zhtmike added 18 commits March 30, 2026 14:35

Merge branch 'main' into diffusers_engine

da924c2

merge main

ef89bac

move loss according to suggetion

d11588a

simply import

cc226cb

clean code

a38b6ff

Merge branch 'main' into diffusers_engine

212b868

clean diffusion agent loop

58286ee

add lora config back for vllm-compatibility

c9a88e6

add comment

78335be

fix loss normalization

3fc5197

extract prepare_model_inputs

e6374e8

clean

0e7fe3e

refactor

75a945c

fix test

b6771ed

fix img_shapes with zero_cond_t is enabled

cff1fdd

add protection to force scheduler calculated in fp32 in training loop

2d5a25a

clean & add metric

0aa772c

Merge branch 'main' into diffusers_engine

bd9b8d7

zhtmike wants to merge 37 commits into main from diffusers_engine

3 participants
