From f62bc1490b8a341f50738c6d18ac9e986fb49ea7 Mon Sep 17 00:00:00 2001 From: yiyixuxu Date: Tue, 23 Jun 2026 01:51:00 +0000 Subject: [PATCH 1/8] [.ai] document single-file model layout and "don't reimplement DiffusionPipeline" Two conventions that weren't written down, surfaced while reviewing a new model/pipeline port: - models.md: add a "Single-file model layout" section. A model lives in one self-contained `transformer_.py`; reuse shared blocks from normalization.py/attention.py/embeddings.py and inline model-specific variants with `# Copied from` rather than creating per-model `attention_processor_.py` / `block_.py` / `rope_.py` companion files (no precedent for these in the repo). - pipelines.md: add gotcha #7, "Don't reimplement DiffusionPipeline." A subclass adds only pipeline-specific steps; device placement, offloading, execution-device resolution, and module registration already exist on the base class. No custom device/offload manager, no `device=` arg on `__call__`, no `set_` setters. Co-Authored-By: Claude Opus 4.8 --- .ai/models.md | 10 ++++++++++ .ai/pipelines.md | 8 ++++++++ 2 files changed, 18 insertions(+) diff --git a/.ai/models.md b/.ai/models.md index 40df77a728a9..7029e54dcb77 100644 --- a/.ai/models.md +++ b/.ai/models.md @@ -15,6 +15,16 @@ Linked from `AGENTS.md`, `skills/model-integration/SKILL.md`, and `review-rules. * When adding a new transformer (or reviewing one), skim `src/diffusers/models/transformers/transformer_flux.py`, `src/diffusers/models/transformers/transformer_flux2.py`, `src/diffusers/models/transformers/transformer_qwenimage.py`, and `src/diffusers/models/transformers/transformer_wan.py` first to establish the pattern. Most conventions (mixin set, file structure, naming, gradient-checkpointing implementation, `_no_split_modules` settings, etc.) are easiest to internalize by comparison rather than from a fixed list. * **Loading goes through `from_pretrained` / `from_single_file`.** Weights and configs load through the standard paths — never fetched or imported out-of-band at runtime. Don't override or add a custom `from_pretrained`, and don't load weights manually (`load_file(...)`, `hf_hub_download(...)`, or `sys.path.insert(...)` to import a reference repo). For an original-format single checkpoint, add `from_single_file` support (mixin + weight-mapping). +## Single-file model layout + +A model's implementation is **self-contained in one file** — `transformer_.py` (or `unet_.py`). Don't split it into per-model companion modules; multi-file layouts are a research-repo habit, not the diffusers one. There is no precedent in `src/diffusers/models/` for `attention_processor_.py`, `block_.py`, or `rope_.py`. + +- **Attention** — the `Attention` class and its processor live in the model file (see [Attention pattern](#attention-pattern)), never a separate `attention_processor_.py`. +- **Blocks / norms / FFNs** — reuse the canonical shared classes from `normalization.py`, `attention.py`, and `embeddings.py` (e.g. `LuminaRMSNormZero` and `LuminaLayerNormContinuous` live in `normalization.py`, `LuminaFeedForward` in `attention.py`). If you need a model-specific variant, define it **inline in the model file** with a `# Copied from ...` header — don't fork the class into a new `block_.py` module. +- **RoPE** — define the rotary-embedding class inline in `transformer_.py` (see `transformer_cogview4.py`, `transformer_hunyuan_video.py`, `transformer_cosmos.py`), reusing the shared `embeddings.py` helpers. + +This does mean some duplication across model files — that's the deliberate diffusers tradeoff, and `# Copied from` keeps the copies in sync (see [AGENTS.md](AGENTS.md)). A reader should be able to understand a model from its single file without chasing per-model side modules. + ## Attention pattern Attention must follow the diffusers pattern: both the `Attention` class and its processor are defined in the model file. The processor's `__call__` handles the actual compute and must use `dispatch_attention_fn` rather than calling `F.scaled_dot_product_attention` directly. The attention class inherits `AttentionModuleMixin` and declares `_default_processor_cls` and `_available_processors`. diff --git a/.ai/pipelines.md b/.ai/pipelines.md index f25df556c44a..acc9d29a6b0a 100644 --- a/.ai/pipelines.md +++ b/.ai/pipelines.md @@ -76,3 +76,11 @@ src/diffusers/pipelines// 6. **Be deliberate about methods on the pipeline.** `__call__` is the user's mental model. The methods on the class are how they navigate it. Diffusers convention (flux, sdxl, wan, qwenimage) is a flat class body of public lifecycle methods (`__init__`, `check_inputs`, `encode_prompt`, `prepare_latents`, `__call__`). Two principles, not strict rules — use judgment: - **If a method is called from `__call__`, and it's a step in the pipeline lifecycle, make it public.** Each call from `__call__` should correspond to a step a user can identify: either a standard one (`encode_prompt`, `prepare_latents`, `set_timesteps`, …) or a pipeline-specific one (`prepare_src_latents`, `prepare_reference_audio_latents`, …). Don't gate these behind a `_`; they're part of the pipeline's API surface alongside their standard siblings. - **If a method is only used by another method, make it private (`_foo`) or lift it to a module-level function — and keep the count down.** Before adding one, see if the logic can be absorbed into its caller. Unless you expect the helper to be reused by another method (or another task pipeline), absorbing is usually the better call — especially when the body is small. Avoid a pipeline class littered with private helpers that bury the lifecycle.. + +7. **Don't reimplement `DiffusionPipeline`.** A pipeline subclass adds only *pipeline-specific* steps (`__call__`, `check_inputs`, `encode_prompt`, `prepare_latents`, …). Device and lifecycle management already live on the base class — use it, don't shadow it: + - **Device placement** — `pipe.to(device)`. Don't add a `device=` argument to `__call__` or write a custom mover; `__call__` resolves the device via `self._execution_device`. + - **Offloading** — `enable_model_cpu_offload()` / `enable_sequential_cpu_offload()` / `enable_group_offload()`. Don't keep parallel `enable_*_flag` booleans or a hand-rolled device/offload manager; the base already enforces one strategy at a time. + - **Execution device** — read `self._execution_device`; don't maintain your own `self.execution_device`. + - **Components** — registered once via `register_modules(...)` in `__init__` and accessed as attributes. There is no `set_` setter convention; to swap a component, reconstruct the pipeline or re-register — don't ship `set_processor` / `set_transformer` / etc. + + If you're writing device, offload, or module-registration plumbing, it almost certainly already exists on `DiffusionPipeline` — reuse it. From f5d4350229fa657d21e5881aba15ce43a240e4fb Mon Sep 17 00:00:00 2001 From: yiyixuxu Date: Tue, 23 Jun 2026 01:54:47 +0000 Subject: [PATCH 2/8] [.ai] generalize single-file model layout section Drop the implementation-specific examples (Lumina class names, per-model transformer_*.py citations) and state the single-file policy generally: everything (attention, blocks, RoPE, model-specific layers) in one model file, with shared blocks either imported from the common modules or brought in via `# Copied from ... with Old->New`. Co-Authored-By: Claude Opus 4.8 --- .ai/models.md | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/.ai/models.md b/.ai/models.md index 7029e54dcb77..540eb379638c 100644 --- a/.ai/models.md +++ b/.ai/models.md @@ -17,13 +17,11 @@ Linked from `AGENTS.md`, `skills/model-integration/SKILL.md`, and `review-rules. ## Single-file model layout -A model's implementation is **self-contained in one file** — `transformer_.py` (or `unet_.py`). Don't split it into per-model companion modules; multi-file layouts are a research-repo habit, not the diffusers one. There is no precedent in `src/diffusers/models/` for `attention_processor_.py`, `block_.py`, or `rope_.py`. +A model follows the **single-file policy**: its full implementation lives in one `transformer_.py` (or `unet_.py`) — attention (the `Attention` class and its processor), transformer blocks, RoPE, and any model-specific layers all in that file. Don't split it into per-model companion modules (`attention_processor_.py`, `block_.py`, `rope_.py`); multi-file layouts are a research-repo habit, not the diffusers one. -- **Attention** — the `Attention` class and its processor live in the model file (see [Attention pattern](#attention-pattern)), never a separate `attention_processor_.py`. -- **Blocks / norms / FFNs** — reuse the canonical shared classes from `normalization.py`, `attention.py`, and `embeddings.py` (e.g. `LuminaRMSNormZero` and `LuminaLayerNormContinuous` live in `normalization.py`, `LuminaFeedForward` in `attention.py`). If you need a model-specific variant, define it **inline in the model file** with a `# Copied from ...` header — don't fork the class into a new `block_.py` module. -- **RoPE** — define the rotary-embedding class inline in `transformer_.py` (see `transformer_cogview4.py`, `transformer_hunyuan_video.py`, `transformer_cosmos.py`), reusing the shared `embeddings.py` helpers. - -This does mean some duplication across model files — that's the deliberate diffusers tradeoff, and `# Copied from` keeps the copies in sync (see [AGENTS.md](AGENTS.md)). A reader should be able to understand a model from its single file without chasing per-model side modules. +For shared building blocks, either: +- **import** a common layer from `normalization.py`, `attention.py`, or `embeddings.py`, or +- **`# Copied from`** a class in another model and rename (`# Copied from ...transformer_other.OtherBlock with Other->This`), so `make fix-copies` keeps the copies in sync. ## Attention pattern From 04f9096533318e2d646f836281c85e1138f3e224 Mon Sep 17 00:00:00 2001 From: yiyixuxu Date: Tue, 23 Jun 2026 01:59:43 +0000 Subject: [PATCH 3/8] [.ai] trim redundant don't-split sentence from single-file section The positive statement (everything in one model file) already conveys it; drop the companion-module filename list and the editorializing. Co-Authored-By: Claude Opus 4.8 --- .ai/models.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.ai/models.md b/.ai/models.md index 540eb379638c..9503d059bf29 100644 --- a/.ai/models.md +++ b/.ai/models.md @@ -17,7 +17,7 @@ Linked from `AGENTS.md`, `skills/model-integration/SKILL.md`, and `review-rules. ## Single-file model layout -A model follows the **single-file policy**: its full implementation lives in one `transformer_.py` (or `unet_.py`) — attention (the `Attention` class and its processor), transformer blocks, RoPE, and any model-specific layers all in that file. Don't split it into per-model companion modules (`attention_processor_.py`, `block_.py`, `rope_.py`); multi-file layouts are a research-repo habit, not the diffusers one. +A model follows the **single-file policy**: its full implementation lives in one `transformer_.py` (or `unet_.py`) — attention (the `Attention` class and its processor), transformer blocks, RoPE, and any model-specific layers all in that file. For shared building blocks, either: - **import** a common layer from `normalization.py`, `attention.py`, or `embeddings.py`, or From 29452c84cdf782e1826dd1417ec132fb897768c6 Mon Sep 17 00:00:00 2001 From: yiyixuxu Date: Tue, 23 Jun 2026 02:09:53 +0000 Subject: [PATCH 4/8] [.ai] generalize the "don't reimplement DiffusionPipeline" gotcha State the principle and the canonical base APIs; drop the implementation- specific anti-pattern callouts (enable_*_flag booleans, set_ setters, the device= arg, a shadow self.execution_device). Co-Authored-By: Claude Opus 4.8 --- .ai/pipelines.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/.ai/pipelines.md b/.ai/pipelines.md index acc9d29a6b0a..a303f7c74202 100644 --- a/.ai/pipelines.md +++ b/.ai/pipelines.md @@ -77,10 +77,9 @@ src/diffusers/pipelines// - **If a method is called from `__call__`, and it's a step in the pipeline lifecycle, make it public.** Each call from `__call__` should correspond to a step a user can identify: either a standard one (`encode_prompt`, `prepare_latents`, `set_timesteps`, …) or a pipeline-specific one (`prepare_src_latents`, `prepare_reference_audio_latents`, …). Don't gate these behind a `_`; they're part of the pipeline's API surface alongside their standard siblings. - **If a method is only used by another method, make it private (`_foo`) or lift it to a module-level function — and keep the count down.** Before adding one, see if the logic can be absorbed into its caller. Unless you expect the helper to be reused by another method (or another task pipeline), absorbing is usually the better call — especially when the body is small. Avoid a pipeline class littered with private helpers that bury the lifecycle.. -7. **Don't reimplement `DiffusionPipeline`.** A pipeline subclass adds only *pipeline-specific* steps (`__call__`, `check_inputs`, `encode_prompt`, `prepare_latents`, …). Device and lifecycle management already live on the base class — use it, don't shadow it: - - **Device placement** — `pipe.to(device)`. Don't add a `device=` argument to `__call__` or write a custom mover; `__call__` resolves the device via `self._execution_device`. - - **Offloading** — `enable_model_cpu_offload()` / `enable_sequential_cpu_offload()` / `enable_group_offload()`. Don't keep parallel `enable_*_flag` booleans or a hand-rolled device/offload manager; the base already enforces one strategy at a time. - - **Execution device** — read `self._execution_device`; don't maintain your own `self.execution_device`. - - **Components** — registered once via `register_modules(...)` in `__init__` and accessed as attributes. There is no `set_` setter convention; to swap a component, reconstruct the pipeline or re-register — don't ship `set_processor` / `set_transformer` / etc. +7. **Don't reimplement `DiffusionPipeline`.** A pipeline subclass adds only *pipeline-specific* steps (`__call__`, `check_inputs`, `encode_prompt`, `prepare_latents`, …). Device handling, offloading, execution-device resolution, and component registration already live on the base class — use them rather than adding a parallel implementation on the subclass: + - **Device placement** — `pipe.to(device)`; `__call__` reads `self._execution_device`. + - **Offloading** — `enable_model_cpu_offload()` / `enable_sequential_cpu_offload()` / `enable_group_offload()`. + - **Components** — registered once via `register_modules(...)` in `__init__` and accessed as attributes. - If you're writing device, offload, or module-registration plumbing, it almost certainly already exists on `DiffusionPipeline` — reuse it. + If you find yourself writing device, offload, or module-registration plumbing on a pipeline, it almost certainly already exists on `DiffusionPipeline` — reuse it. From bd9df684eccb20f636917c0a9f9ef11f39039d68 Mon Sep 17 00:00:00 2001 From: yiyixuxu Date: Tue, 23 Jun 2026 02:13:08 +0000 Subject: [PATCH 5/8] [.ai] pare gotcha #7 down to the principle MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Drop the per-category API bullets; one sentence — don't add device placement, offloading, or component loading/registration logic on a subclass, it's on the base class. Can expand later if needed. Co-Authored-By: Claude Opus 4.8 --- .ai/pipelines.md | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/.ai/pipelines.md b/.ai/pipelines.md index a303f7c74202..8b485c245f0b 100644 --- a/.ai/pipelines.md +++ b/.ai/pipelines.md @@ -77,9 +77,4 @@ src/diffusers/pipelines// - **If a method is called from `__call__`, and it's a step in the pipeline lifecycle, make it public.** Each call from `__call__` should correspond to a step a user can identify: either a standard one (`encode_prompt`, `prepare_latents`, `set_timesteps`, …) or a pipeline-specific one (`prepare_src_latents`, `prepare_reference_audio_latents`, …). Don't gate these behind a `_`; they're part of the pipeline's API surface alongside their standard siblings. - **If a method is only used by another method, make it private (`_foo`) or lift it to a module-level function — and keep the count down.** Before adding one, see if the logic can be absorbed into its caller. Unless you expect the helper to be reused by another method (or another task pipeline), absorbing is usually the better call — especially when the body is small. Avoid a pipeline class littered with private helpers that bury the lifecycle.. -7. **Don't reimplement `DiffusionPipeline`.** A pipeline subclass adds only *pipeline-specific* steps (`__call__`, `check_inputs`, `encode_prompt`, `prepare_latents`, …). Device handling, offloading, execution-device resolution, and component registration already live on the base class — use them rather than adding a parallel implementation on the subclass: - - **Device placement** — `pipe.to(device)`; `__call__` reads `self._execution_device`. - - **Offloading** — `enable_model_cpu_offload()` / `enable_sequential_cpu_offload()` / `enable_group_offload()`. - - **Components** — registered once via `register_modules(...)` in `__init__` and accessed as attributes. - - If you find yourself writing device, offload, or module-registration plumbing on a pipeline, it almost certainly already exists on `DiffusionPipeline` — reuse it. +7. **Don't reimplement `DiffusionPipeline`.** A pipeline subclass adds only *pipeline-specific* steps (`__call__`, `check_inputs`, `encode_prompt`, `prepare_latents`, …). Device placement, offloading, and component loading/registration already live on the base class — don't add your own; use what's there. From de6999bf8f86937854e3e38d4bb99640248e4cb7 Mon Sep 17 00:00:00 2001 From: yiyixuxu Date: Tue, 23 Jun 2026 02:20:44 +0000 Subject: [PATCH 6/8] [.ai] renumber gotchas after merge: component-mutation #7, reimplement #8 main's newly merged "don't modify registered component on the fly" keeps #7; our "don't reimplement DiffusionPipeline" becomes #8 (appended after it). Co-Authored-By: Claude Opus 4.8 --- .ai/pipelines.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.ai/pipelines.md b/.ai/pipelines.md index 09e9916ff24d..b2d6e065b894 100644 --- a/.ai/pipelines.md +++ b/.ai/pipelines.md @@ -77,5 +77,5 @@ src/diffusers/pipelines// - **If a method is called from `__call__`, and it's a step in the pipeline lifecycle, make it public.** Each call from `__call__` should correspond to a step a user can identify: either a standard one (`encode_prompt`, `prepare_latents`, `set_timesteps`, …) or a pipeline-specific one (`prepare_src_latents`, `prepare_reference_audio_latents`, …). Don't gate these behind a `_`; they're part of the pipeline's API surface alongside their standard siblings. - **If a method is only used by another method, make it private (`_foo`) or lift it to a module-level function — and keep the count down.** Before adding one, see if the logic can be absorbed into its caller. Unless you expect the helper to be reused by another method (or another task pipeline), absorbing is usually the better call — especially when the body is small. Avoid a pipeline class littered with private helpers that bury the lifecycle.. -7. **Don't reimplement `DiffusionPipeline`.** A pipeline subclass adds only *pipeline-specific* steps (`__call__`, `check_inputs`, `encode_prompt`, `prepare_latents`, …). Device placement, offloading, and component loading/registration already live on the base class — don't add your own; use what's there. -8. **Don't modify the state of a registered component on the fly.** From inside `__call__` or other helper methods, don't change the state of `self.text_encoder` / `self.transformer` / `self.vae` — no in-place `.to(dtype/device)`, no setting attributes/buffers or swapping submodules. Components are shared and routinely reused across pipelines, so a per-call mutation may silently change another pipeline's outputs. You should pass a component that's already in the right state, and document that expectation explicitly. Only when that's genuinely inconvenient and you must change state for the duration of a call — e.g. swapping in an attention processor — save the original first and restore it before returning, so the component is left exactly as you found it. The PAG pipelines are the reference for this: `pipeline_pag_sd.py` snapshots `original_attn_proc = self.unet.attn_processors`, installs the PAG processors for the denoising loop, then calls `self.unet.set_attn_processor(original_attn_proc)` at the end of `__call__`. +7. **Don't modify the state of a registered component on the fly.** From inside `__call__` or other helper methods, don't change the state of `self.text_encoder` / `self.transformer` / `self.vae` — no in-place `.to(dtype/device)`, no setting attributes/buffers or swapping submodules. Components are shared and routinely reused across pipelines, so a per-call mutation may silently change another pipeline's outputs. You should pass a component that's already in the right state, and document that expectation explicitly. Only when that's genuinely inconvenient and you must change state for the duration of a call — e.g. swapping in an attention processor — save the original first and restore it before returning, so the component is left exactly as you found it. The PAG pipelines are the reference for this: `pipeline_pag_sd.py` snapshots `original_attn_proc = self.unet.attn_processors`, installs the PAG processors for the denoising loop, then calls `self.unet.set_attn_processor(original_attn_proc)` at the end of `__call__`. +8. **Don't reimplement `DiffusionPipeline`.** A pipeline subclass adds only *pipeline-specific* steps (`__call__`, `check_inputs`, `encode_prompt`, `prepare_latents`, …). Device placement, offloading, and component loading/registration already live on the base class — don't add your own; use what's there. From c33ab92da731e7daea61a021a9b17d4d4e488e79 Mon Sep 17 00:00:00 2001 From: yiyixuxu Date: Tue, 23 Jun 2026 02:22:22 +0000 Subject: [PATCH 7/8] [.ai] add blank line between gotchas 7 and 8 Co-Authored-By: Claude Opus 4.8 --- .ai/pipelines.md | 1 + 1 file changed, 1 insertion(+) diff --git a/.ai/pipelines.md b/.ai/pipelines.md index b2d6e065b894..eed9a1be5ba5 100644 --- a/.ai/pipelines.md +++ b/.ai/pipelines.md @@ -78,4 +78,5 @@ src/diffusers/pipelines// - **If a method is only used by another method, make it private (`_foo`) or lift it to a module-level function — and keep the count down.** Before adding one, see if the logic can be absorbed into its caller. Unless you expect the helper to be reused by another method (or another task pipeline), absorbing is usually the better call — especially when the body is small. Avoid a pipeline class littered with private helpers that bury the lifecycle.. 7. **Don't modify the state of a registered component on the fly.** From inside `__call__` or other helper methods, don't change the state of `self.text_encoder` / `self.transformer` / `self.vae` — no in-place `.to(dtype/device)`, no setting attributes/buffers or swapping submodules. Components are shared and routinely reused across pipelines, so a per-call mutation may silently change another pipeline's outputs. You should pass a component that's already in the right state, and document that expectation explicitly. Only when that's genuinely inconvenient and you must change state for the duration of a call — e.g. swapping in an attention processor — save the original first and restore it before returning, so the component is left exactly as you found it. The PAG pipelines are the reference for this: `pipeline_pag_sd.py` snapshots `original_attn_proc = self.unet.attn_processors`, installs the PAG processors for the denoising loop, then calls `self.unet.set_attn_processor(original_attn_proc)` at the end of `__call__`. + 8. **Don't reimplement `DiffusionPipeline`.** A pipeline subclass adds only *pipeline-specific* steps (`__call__`, `check_inputs`, `encode_prompt`, `prepare_latents`, …). Device placement, offloading, and component loading/registration already live on the base class — don't add your own; use what's there. From 79a3e77584b6fed2c5c939f5586eead6aa074796 Mon Sep 17 00:00:00 2001 From: YiYi Xu Date: Tue, 23 Jun 2026 13:54:16 -1000 Subject: [PATCH 8/8] Update .ai/models.md Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com> --- .ai/models.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.ai/models.md b/.ai/models.md index 5701945f7c69..744c6b3a5234 100644 --- a/.ai/models.md +++ b/.ai/models.md @@ -17,7 +17,7 @@ Linked from `AGENTS.md`, `skills/model-integration/SKILL.md`, and `review-rules. ## Single-file model layout -A model follows the **single-file policy**: its full implementation lives in one `transformer_.py` (or `unet_.py`) — attention (the `Attention` class and its processor), transformer blocks, RoPE, and any model-specific layers all in that file. +A model follows the **single-file policy**: its full implementation lives in one `transformer_.py` (or `unet_.py`) — attention (the `Attention` class and its processor), transformer blocks, RoPE, and any model-specific layers should all be in that file. For shared building blocks, either: - **import** a common layer from `normalization.py`, `attention.py`, or `embeddings.py`, or