diff --git a/.ai/models.md b/.ai/models.md
index 0729a9b799e1..744c6b3a5234 100644
--- a/.ai/models.md
+++ b/.ai/models.md
@@ -15,6 +15,14 @@ Linked from `AGENTS.md`, `skills/model-integration/SKILL.md`, and `review-rules.
 * When adding a new transformer (or reviewing one), skim `src/diffusers/models/transformers/transformer_flux.py`, `src/diffusers/models/transformers/transformer_flux2.py`, `src/diffusers/models/transformers/transformer_qwenimage.py`, and `src/diffusers/models/transformers/transformer_wan.py` first to establish the pattern. Most conventions (mixin set, file structure, naming, gradient-checkpointing implementation, `_no_split_modules` settings, etc.) are easiest to internalize by comparison rather than from a fixed list.
 * **Loading goes through `from_pretrained` / `from_single_file`.** Weights and configs load through the standard paths — never fetched or imported out-of-band at runtime. Don't override or add a custom `from_pretrained`, and don't load weights manually (`load_file(...)`, `hf_hub_download(...)`, or `sys.path.insert(...)` to import a reference repo). For an original-format single checkpoint, add `from_single_file` support (mixin + weight-mapping).
 
+## Single-file model layout
+
+A model follows the **single-file policy**: its full implementation lives in one `transformer_<name>.py` (or `unet_<name>.py`) — attention (the `Attention` class and its processor), transformer blocks, RoPE, and any model-specific layers should all be in that file.
+
+For shared building blocks, either:
+- **import** a common layer from `normalization.py`, `attention.py`, or `embeddings.py`, or
+- **`# Copied from`** a class in another model and rename (`# Copied from ...transformer_other.OtherBlock with Other->This`), so `make fix-copies` keeps the copies in sync.
+
 ## Attention pattern
 
 Attention must follow the diffusers pattern: both the `Attention` class and its processor are defined in the model file. The processor's `__call__` handles the actual compute and must use `dispatch_attention_fn` rather than calling `F.scaled_dot_product_attention` directly. The attention class inherits `AttentionModuleMixin` and declares `_default_processor_cls` and `_available_processors`.
diff --git a/.ai/pipelines.md b/.ai/pipelines.md
index 16e0ace48fca..eed9a1be5ba5 100644
--- a/.ai/pipelines.md
+++ b/.ai/pipelines.md
@@ -78,3 +78,5 @@ src/diffusers/pipelines/<model>/
     - **If a method is only used by another method, make it private (`_foo`) or lift it to a module-level function — and keep the count down.** Before adding one, see if the logic can be absorbed into its caller. Unless you expect the helper to be reused by another method (or another task pipeline), absorbing is usually the better call — especially when the body is small. Avoid a pipeline class littered with private helpers that bury the lifecycle..
 
 7. **Don't modify the state of a registered component on the fly.** From inside `__call__` or other helper methods, don't change the state of `self.text_encoder` / `self.transformer` / `self.vae` — no in-place `.to(dtype/device)`, no setting attributes/buffers or swapping submodules. Components are shared and routinely reused across pipelines, so a per-call mutation may silently change another pipeline's outputs. You should pass a component that's already in the right state, and document that expectation explicitly. Only when that's genuinely inconvenient and you must change state for the duration of a call — e.g. swapping in an attention processor — save the original first and restore it before returning, so the component is left exactly as you found it. The PAG pipelines are the reference for this: `pipeline_pag_sd.py` snapshots `original_attn_proc = self.unet.attn_processors`, installs the PAG processors for the denoising loop, then calls `self.unet.set_attn_processor(original_attn_proc)` at the end of `__call__`.
+
+8. **Don't reimplement `DiffusionPipeline`.** A pipeline subclass adds only *pipeline-specific* steps (`__call__`, `check_inputs`, `encode_prompt`, `prepare_latents`, …). Device placement, offloading, and component loading/registration already live on the base class — don't add your own; use what's there.