[trainer] feat: support sequence parallel #45

Open
zhtmike wants to merge 11 commits into verl-omni from verl-omni-dev-sp

Conversation


@zhtmike commented Mar 5, 2026

What does this PR do?

  • Support sequence parallel for the Diffusers SDPA attention backend
  • Requires diffusers >= 0.38
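Conceptually, Ulysses SP shards the sequence dimension across SP ranks, and all-to-all collectives around the attention call regroup the shards so each rank attends over the full sequence for a subset of heads. A toy, collective-free sketch of the sharding step (the helper name is hypothetical, not code from this PR):

```python
def ulysses_shard(seq, sp_size, rank):
    """Return the contiguous slice of `seq` held by `rank` under Ulysses SP.

    Toy sketch: real Ulysses exchanges these shards via all-to-all around
    the attention call, so each rank sees the full sequence for a subset
    of attention heads.
    """
    assert len(seq) % sp_size == 0, "sequence length must be divisible by sp_size"
    chunk = len(seq) // sp_size
    return seq[rank * chunk : (rank + 1) * chunk]
```

Concatenating every rank's shard recovers the original sequence, which is what the distributed equivalence tests below rely on.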

Memory:
Screenshot 2026-03-20 9:45:27 AM

Val Reward:
Screenshot 2026-03-20 9:46:04 AM

Performance:
Screenshot 2026-03-20 9:46:47 AM

Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that cannot be tested by CI (e.g., algorithm implementations or new model support), validate by experiment and show results such as training curve plots or evaluation results.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@zhtmike marked this pull request as ready for review March 19, 2026 09:06
@zhtmike requested a review from Copilot March 19, 2026 09:06

Copilot AI left a comment


Pull request overview

Adds Ulysses Sequence Parallel (SP) support for Diffusers-based training in the FSDP engine, including runtime monkey-patches to work around current upstream Diffusers limitations.

Changes:

  • Enable Ulysses SP device mesh initialization and Diffusers context-parallel enablement in DiffusersFSDPEngine.
  • Introduce Diffusers monkey-patches (mesh shape, attention mask shape, attention backward) plus a helper to fix hook mesh wiring.
  • Add distributed equivalence tests (fwd, fwd+bwd, and FSDP-wrapped) and an example training script enabling ulysses_sequence_parallel_size=2.
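The device-mesh initialization mentioned above factorizes the world size into a (dp, ring, ulysses) shape, with data parallelism absorbing whatever ranks remain. A minimal sketch of that computation (an illustrative helper, not the PR's code; ring degree defaults to 1):

```python
def ulysses_mesh_shape(world_size: int, sp_size: int, ring_size: int = 1) -> tuple:
    """Shape of a (dp, ring, ulysses) device mesh: the data-parallel degree
    is whatever is left of the world after the sequence-parallel dimensions
    are carved out."""
    assert world_size % (sp_size * ring_size) == 0, "world size must factor evenly"
    return (world_size // (sp_size * ring_size), ring_size, sp_size)
```

In the engine, a shape like this would presumably feed `torch.distributed.device_mesh.init_device_mesh` with matching mesh dimension names.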

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Summary per file:

  • verl/workers/engine/fsdp/diffusers_impl.py: Turns on the Ulysses SP path in the Diffusers FSDP engine, pads sequence lengths for SP alignment, and wires Diffusers context-parallel config/hooks.
  • verl/models/diffusers/monkey_patch.py: Adds the monkey-patches and mesh-fix utilities needed for Diffusers Ulysses SP to work.
  • verl/models/diffusers/__init__.py: Exposes the monkey-patch entrypoint as a package API.
  • tests/models/test_diffusers_ulysses.py: Adds multi-GPU distributed tests checking SP vs non-SP forward/backward equivalence (plus an FSDP-wrapped case).
  • examples/flowgrpo_trainer/run_flowgrpo_sp2.sh: Provides a runnable example config for training with ulysses_sequence_parallel_size=2.



Copilot AI left a comment


Pull request overview

Adds Ulysses sequence parallel (SP) support for Diffusers-based training, including device-mesh setup and input sequence alignment, plus accompanying distributed tests and an example launch script.

Changes:

  • Enable Ulysses SP in DiffusersFSDPEngine by creating a (dp, ring, ulysses) device mesh, wiring the SP process group, and padding text embeddings to SP-aligned lengths.
  • Update vLLM-Omni async server to use OmniEngineArgs for engine arg construction.
  • Add distributed tests for Diffusers Ulysses SP forward/backward equivalence and an example script enabling ulysses_sequence_parallel_size=2.
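Padding text embeddings to SP-aligned lengths amounts to rounding the sequence length up to the next multiple of the SP degree, so each rank receives an equal shard. A sketch of the length computation (hypothetical helper name, not the PR's code):

```python
def sp_aligned_len(seq_len: int, sp_size: int) -> int:
    """Smallest length >= seq_len divisible by sp_size, so the sequence
    splits into equal shards across SP ranks."""
    # -(-a // b) is ceiling division using only floor division
    return -(-seq_len // sp_size) * sp_size
```

For example, a 77-token CLIP text embedding would be padded to 78 tokens at SP degree 2 and to 80 tokens at SP degree 4.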

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

Summary per file:

  • verl/workers/rollout/vllm_rollout/vllm_omni_async_server.py: Switches the vLLM-Omni engine args type used to build the async engine config.
  • verl/workers/engine/fsdp/diffusers_impl.py: Implements Diffusers Ulysses SP wiring (mesh + group), enables parallelism, and pads text inputs to SP-friendly lengths.
  • tests/models/test_diffusers_ulysses.py: Adds distributed tests validating SP vs non-SP forward/backward behavior (including an FSDP-wrapped case).
  • examples/flowgrpo_trainer/run_flowgrpo_sp2.sh: Example FlowGRPO training invocation with the Diffusers Ulysses SP size set to 2.


Comment on lines +218 to +220
module_sp = AutoModel.from_config(cfg, torch_dtype=torch.bfloat16)
module_sp.enable_parallelism(config=ContextParallelConfig(ulysses_degree=sp_size, mesh=ulysses_device_mesh))
module_sp = module_sp.to(device=device, dtype=torch.bfloat16)

Copilot AI Mar 30, 2026


Same issue as above: ContextParallelConfig(..., mesh=ulysses_device_mesh) is not compatible with current released diffusers APIs and is likely to raise TypeError. It would be better to share a single helper to build the config in a version-tolerant way so all tests stay in sync with the production engine behavior.

Comment on lines +367 to +369
module_sp = AutoModel.from_config(cfg, torch_dtype=torch.bfloat16)
module_sp.enable_parallelism(config=ContextParallelConfig(ulysses_degree=sp_size, mesh=ulysses_device_mesh))
module_sp = module_sp.to(device=device, dtype=torch.bfloat16)

Copilot AI Mar 30, 2026


Same issue as above: ContextParallelConfig(..., mesh=ulysses_device_mesh) may not be a valid constructor call in released diffusers versions and can fail at import/runtime. Consider adding a small compatibility helper (or reusing the engine’s logic) so the test works across supported diffusers versions.

Comment on lines +249 to +251
module.enable_parallelism(
    config=ContextParallelConfig(ulysses_degree=sp_size, mesh=self.ulysses_device_mesh)
)

Copilot AI Mar 30, 2026


ContextParallelConfig construction here passes mesh=..., but in current diffusers releases the public constructor does not accept a mesh kwarg (the mesh is handled internally / via different API). This is likely to raise a TypeError at runtime. To keep compatibility, consider using a version/feature-gated path (e.g., try constructing without mesh and set the mesh via the supported attribute/method, or rely on enable_parallelism() to create the mesh when possible).

Suggested change
- module.enable_parallelism(
-     config=ContextParallelConfig(ulysses_degree=sp_size, mesh=self.ulysses_device_mesh)
- )
+ cp_config = ContextParallelConfig(ulysses_degree=sp_size)
+ # Some diffusers versions do not accept `mesh` in the constructor;
+ # set it via attribute when available for compatibility.
+ if hasattr(cp_config, "mesh"):
+     cp_config.mesh = self.ulysses_device_mesh
+ module.enable_parallelism(config=cp_config)

Comment on lines +110 to +112
module_sp = AutoModel.from_config(cfg, torch_dtype=torch.bfloat16)
module_sp.enable_parallelism(config=ContextParallelConfig(ulysses_degree=sp_size, mesh=ulysses_device_mesh))
module_sp = module_sp.to(device=device, dtype=torch.bfloat16)

Copilot AI Mar 30, 2026


ContextParallelConfig(ulysses_degree=..., mesh=...) assumes the constructor accepts a mesh kwarg, which is not true for current diffusers releases. This will make the test error before it can run. Consider version/feature-gating this and constructing the config using the API supported by released diffusers (or mirroring whatever compatibility shim is used in the engine).
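The version-tolerant construction these comments ask for can be sketched as a small builder that only passes `mesh` when the constructor accepts it, and otherwise falls back to attribute assignment. This is an illustrative helper, not code from the PR; `config_cls` stands in for diffusers' ContextParallelConfig:

```python
import inspect


def build_cp_config(config_cls, ulysses_degree, mesh=None):
    """Build a ContextParallelConfig-like object across library versions.

    Pass `mesh` to the constructor only when its signature accepts it;
    otherwise construct without it and set the attribute when present.
    (Illustrative compatibility shim; `config_cls` is a stand-in for
    diffusers' ContextParallelConfig.)
    """
    params = inspect.signature(config_cls).parameters
    if "mesh" in params:
        return config_cls(ulysses_degree=ulysses_degree, mesh=mesh)
    cfg = config_cls(ulysses_degree=ulysses_degree)
    if mesh is not None and hasattr(cfg, "mesh"):
        cfg.mesh = mesh
    return cfg
```

Sharing one such helper between the engine and the tests would keep both on the same code path, which is the reviewer's suggestion here.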

@zhtmike zhtmike force-pushed the verl-omni branch 6 times, most recently from eb393e7 to 0cbbe96 Compare April 2, 2026 01:48
@niehen6174

I wasn’t able to find your contact email, so I’m reaching out here.

I’m interested in learning more about the technical details and discussing contributions.

My email is 1639206518@qq.com. I’d greatly appreciate it if you could reach out. I’d be happy to connect and discuss further.

