Force-pushed from `40ecc20` to `381a0fd`.
Force-pushed from `381a0fd` to `c014371`.
Pull request overview
Adds Ulysses Sequence Parallel (SP) support for Diffusers-based training in the FSDP engine, including runtime monkey-patches to work around current upstream Diffusers limitations.
Changes:

- Enable Ulysses SP device mesh initialization and Diffusers context-parallel enablement in `DiffusersFSDPEngine`.
- Introduce Diffusers monkey-patches (mesh shape, attention mask shape, attention backward) plus a helper to fix hook mesh wiring.
- Add distributed equivalence tests (fwd, fwd+bwd, and FSDP-wrapped) and an example training script enabling `ulysses_sequence_parallel_size=2`.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `verl/workers/engine/fsdp/diffusers_impl.py` | Turns on the Ulysses SP path in the Diffusers FSDP engine, pads seq-lens for SP alignment, and wires the Diffusers context-parallel config/hooks. |
| `verl/models/diffusers/monkey_patch.py` | Adds the monkey-patches and mesh-fix utilities needed for Diffusers Ulysses SP to work. |
| `verl/models/diffusers/__init__.py` | Exposes the monkey-patch entrypoint as a package API. |
| `tests/models/test_diffusers_ulysses.py` | Adds multi-GPU distributed tests checking SP vs. non-SP forward/backward equivalence (plus an FSDP-wrapped case). |
| `examples/flowgrpo_trainer/run_flowgrpo_sp2.sh` | Provides a runnable example config for training with `ulysses_sequence_parallel_size=2`. |
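The monkey-patch module listed above adjusts Diffusers internals (mesh shape, attention mask shape, attention backward) at import time. As a rough illustration of the pattern — with stand-in names, not the actual diffusers internals — a patch replaces a module-level function with a wrapper while keeping a handle to the original:

```python
import types

# Stand-in "library" module; the real patches target diffusers-internal
# functions (all names here are illustrative, not the diffusers API).
fake_diffusers = types.ModuleType("fake_diffusers")
fake_diffusers.mesh_shape = lambda world_size: (world_size,)  # 1-D mesh

def apply_mesh_shape_patch(mod, sp_size):
    """Monkey-patch: replace mesh_shape with a (dp, ulysses) 2-D version."""
    original = mod.mesh_shape

    def patched(world_size):
        dp = world_size // sp_size
        return (dp, sp_size)

    patched.__wrapped__ = original  # keep a handle for unpatching/debugging
    mod.mesh_shape = patched

apply_mesh_shape_patch(fake_diffusers, sp_size=2)
print(fake_diffusers.mesh_shape(8))  # (4, 2)
```

Keeping the original on `__wrapped__` makes the patch reversible, which matters when the upstream limitation is eventually fixed and the patch can be dropped.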
Force-pushed from `a5ca841` to `e3aa352`.
Pull request overview
Adds Ulysses sequence parallel (SP) support for Diffusers-based training, including device-mesh setup and input sequence alignment, plus accompanying distributed tests and an example launch script.
Changes:

- Enable Ulysses SP in `DiffusersFSDPEngine` by creating a (dp, ring, ulysses) device mesh, wiring the SP process group, and padding text embeddings to SP-aligned lengths.
- Update the vLLM-Omni async server to use `OmniEngineArgs` for engine arg construction.
- Add distributed tests for Diffusers Ulysses SP forward/backward equivalence and an example script enabling `ulysses_sequence_parallel_size=2`.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `verl/workers/rollout/vllm_rollout/vllm_omni_async_server.py` | Switches the vLLM-Omni engine args type used to build the async engine config. |
| `verl/workers/engine/fsdp/diffusers_impl.py` | Implements Diffusers Ulysses SP wiring (mesh + group), enables parallelism, and pads text inputs to SP-friendly lengths. |
| `tests/models/test_diffusers_ulysses.py` | Adds distributed tests validating SP vs. non-SP forward/backward behavior (including an FSDP-wrapped case). |
| `examples/flowgrpo_trainer/run_flowgrpo_sp2.sh` | Example FlowGRPO training invocation with Diffusers Ulysses SP size set to 2. |
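Padding text inputs to "SP-friendly lengths" means extending the sequence dimension so it divides evenly across the Ulysses ranks. A minimal sketch of that arithmetic (a hypothetical helper, not the engine's actual function name):

```python
def sp_pad_amount(seq_len: int, sp_size: int) -> int:
    """Number of extra pad positions needed so seq_len splits evenly
    across sp_size Ulysses ranks (hypothetical helper illustrating the
    SP-alignment padding described above)."""
    return (-seq_len) % sp_size

# A 77-token text embedding with sp_size=2 needs 1 pad position;
# an 80-token one with sp_size=4 needs none.
print(sp_pad_amount(77, 2), sp_pad_amount(80, 4))  # 1 0
```

The engine would then pad the embedding tensor along the sequence dimension by this amount before sharding it across the SP group.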
```python
module_sp = AutoModel.from_config(cfg, torch_dtype=torch.bfloat16)
module_sp.enable_parallelism(config=ContextParallelConfig(ulysses_degree=sp_size, mesh=ulysses_device_mesh))
module_sp = module_sp.to(device=device, dtype=torch.bfloat16)
```
Same issue as above: `ContextParallelConfig(..., mesh=ulysses_device_mesh)` is not compatible with currently released diffusers APIs and is likely to raise a `TypeError`. It would be better to share a single helper that builds the config in a version-tolerant way, so all tests stay in sync with the production engine behavior.
```python
module_sp = AutoModel.from_config(cfg, torch_dtype=torch.bfloat16)
module_sp.enable_parallelism(config=ContextParallelConfig(ulysses_degree=sp_size, mesh=ulysses_device_mesh))
module_sp = module_sp.to(device=device, dtype=torch.bfloat16)
```
Same issue as above: `ContextParallelConfig(..., mesh=ulysses_device_mesh)` may not be a valid constructor call in released diffusers versions and can fail at import/runtime. Consider adding a small compatibility helper (or reusing the engine's logic) so the test works across supported diffusers versions.
```python
module.enable_parallelism(
    config=ContextParallelConfig(ulysses_degree=sp_size, mesh=self.ulysses_device_mesh)
)
```
The `ContextParallelConfig` construction here passes `mesh=...`, but in current diffusers releases the public constructor does not accept a `mesh` kwarg (the mesh is handled internally / via a different API). This is likely to raise a `TypeError` at runtime. To keep compatibility, consider a version/feature-gated path (e.g. try constructing without `mesh` and set the mesh via the supported attribute or method, or rely on `enable_parallelism()` to create the mesh when possible).
Suggested change:

```diff
-module.enable_parallelism(
-    config=ContextParallelConfig(ulysses_degree=sp_size, mesh=self.ulysses_device_mesh)
-)
+cp_config = ContextParallelConfig(ulysses_degree=sp_size)
+# Some diffusers versions do not accept `mesh` in the constructor;
+# set it via attribute when available for compatibility.
+if hasattr(cp_config, "mesh"):
+    cp_config.mesh = self.ulysses_device_mesh
+module.enable_parallelism(config=cp_config)
```
```python
module_sp = AutoModel.from_config(cfg, torch_dtype=torch.bfloat16)
module_sp.enable_parallelism(config=ContextParallelConfig(ulysses_degree=sp_size, mesh=ulysses_device_mesh))
module_sp = module_sp.to(device=device, dtype=torch.bfloat16)
```
`ContextParallelConfig(ulysses_degree=..., mesh=...)` assumes the constructor accepts a `mesh` kwarg, which is not true for current diffusers releases. This will make the test error before it can run. Consider version/feature-gating this and constructing the config using the API supported by released diffusers (or mirroring whatever compatibility shim is used in the engine).
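The version-tolerant helper the review asks for can be sketched with signature introspection: pass `mesh` to the constructor only when its signature accepts it, and otherwise fall back to attribute assignment. The stand-in config classes below illustrate the two constructor shapes; they are assumptions for the demo, not diffusers classes.

```python
import inspect

def build_cp_config(config_cls, ulysses_degree, mesh=None):
    """Build a context-parallel config across library versions: pass `mesh`
    to the constructor only when the signature accepts it, else set it as
    an attribute when one exists."""
    params = inspect.signature(config_cls).parameters
    if mesh is not None and "mesh" in params:
        return config_cls(ulysses_degree=ulysses_degree, mesh=mesh)
    cfg = config_cls(ulysses_degree=ulysses_degree)
    if mesh is not None and hasattr(cfg, "mesh"):
        cfg.mesh = mesh
    return cfg

# Stand-ins for the two constructor shapes seen across library versions.
class OldStyleConfig:
    def __init__(self, ulysses_degree):
        self.ulysses_degree, self.mesh = ulysses_degree, None

class NewStyleConfig:
    def __init__(self, ulysses_degree, mesh=None):
        self.ulysses_degree, self.mesh = ulysses_degree, mesh

print(build_cp_config(OldStyleConfig, 2, mesh="mesh0").mesh)  # mesh0
print(build_cp_config(NewStyleConfig, 2, mesh="mesh0").mesh)  # mesh0
```

Using one such helper in both the engine and the tests keeps the two code paths from drifting apart as diffusers evolves.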
Force-pushed from `eb393e7` to `0cbbe96`.
I wasn't able to find your contact email, so I'm reaching out here. I'm interested in learning more about the technical details and discussing contributions. My email is 1639206518@qq.com. I'd greatly appreciate it if you could reach out, and I'd be happy to connect and discuss further.
What does this PR do?
Memory:

Val Reward:

Performance:

Checklist Before Starting

- [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `veomni`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`, `fully_async`, `one_step_off`, like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR involves a breaking change, add `[BREAKING]` to the beginning of the title, e.g. `[BREAKING][fsdp, megatron] feat: dynamic batching`

Test

API and Usage Example

```python
# Add code snippet or script demonstrating how to use this
```

Design & Code Changes

Checklist Before Submitting

Important: Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [ ] Apply pre-commit checks: `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Once your PR is ready for CI, send a message in the `ci-request` channel in the `verl` Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
- [ ] If this PR involves the `recipe` submodule, please also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`.