huggingface · Boogu-Team · Jun 18, 2026 · Jun 22, 2026
diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
@@ -491,6 +491,8 @@
         title: AnimateDiff
       - local: api/pipelines/aura_flow
         title: AuraFlow
+      - local: api/pipelines/boogu
+        title: Boogu-Image
       - local: api/pipelines/bria_3_2
         title: Bria 3.2
       - local: api/pipelines/bria_fibo

diff --git a/docs/source/en/api/pipelines/boogu.md b/docs/source/en/api/pipelines/boogu.md
@@ -0,0 +1,153 @@
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+-->
+
+# Boogu-Image
+
+## Overview
+
+Boogu-Image is an instruction-driven image generation and editing model. Rather than a
+plain text prompt, it is conditioned on a natural-language *instruction* that is encoded
+by a Qwen3-VL multimodal LLM, which can also attend to optional reference images. A
+single/double-stream transformer denoiser then predicts the latent updates, and a
+flow-matching scheduler with training-aligned time shifting controls the denoising
+trajectory. The VAE maps between image and latent space.
+
+The model is released in several variants:
+
+- **Base** (`Boogu/Boogu-Image-0.1-Base`) — text-to-image, full sampling schedule.
+- **Turbo** (`Boogu/Boogu-Image-0.1-Turbo`) — DMD student model for few-step
+  text-to-image generation.
+- **Edit** (`Boogu/Boogu-Image-0.1-Edit`) — instruction-based image editing conditioned
+  on one or more reference images.
+
+FP8-quantized checkpoints are also available for each variant under the same repository name with a `-fp8` suffix (for example, `Boogu/Boogu-Image-0.1-Base-fp8`).
+
+There are two pipeline classes:
+
+- [`BooguImagePipeline`] — text-to-image and instruction editing.
+- [`BooguImageTurboPipeline`] — a subclass adding the DMD few-step inference path. It
+  defaults the guidance scales to the DMD-required values (`text_guidance_scale=1.0`,
+  `image_guidance_scale=1.0`, `empty_instruction_guidance_scale=0.0`).
+
+## Usage examples
+
+### Text-to-image
+
+```python
+import torch
+from diffusers.pipelines.boogu import BooguImagePipeline
+
+pipe = BooguImagePipeline.from_pretrained("Boogu/Boogu-Image-0.1-Base", torch_dtype=torch.bfloat16)
+pipe = pipe.to("cuda")
+
+image = pipe(
+    instruction="A serene Chinese ink-wash landscape of the Guilin mountains bathed in golden light, layered peaks, mirror-like river, glowing golden contours.",
+    height=1024,
+    width=1024,
+    num_inference_steps=50,
+    text_guidance_scale=4.0,
+).images[0]
+
+image.save("base.png")
+```
+
+### Few-step generation (Turbo)
+
+```python
+import torch
+from diffusers.pipelines.boogu import BooguImageTurboPipeline
+
+pipe = BooguImageTurboPipeline.from_pretrained("Boogu/Boogu-Image-0.1-Turbo", torch_dtype=torch.bfloat16)
+pipe = pipe.to("cuda")
+
+image = pipe(
+    instruction="A serene Chinese ink-wash landscape of the Guilin mountains bathed in golden light.",
+    height=1024,
+    width=1024,
+    num_inference_steps=4,
+).images[0]
+
+image.save("turbo.png")
+```
+
+### Instruction-based editing
+
+Pass one or more reference images through `input_images`:
+
+```python
+import torch
+from PIL import Image
+from diffusers.pipelines.boogu import BooguImagePipeline
+
+pipe = BooguImagePipeline.from_pretrained("Boogu/Boogu-Image-0.1-Edit", torch_dtype=torch.bfloat16)
+pipe = pipe.to("cuda")
+
+image = pipe(
+    instruction="Turn the image into a colored-pencil illustration.",
+    input_images=[Image.open("base.png").convert("RGB")],
+    height=1024,
+    width=1024,
+    num_inference_steps=50,
+    text_guidance_scale=4.0,
+    image_guidance_scale=1.0,
+).images[0]
+
+image.save("edit.png")
+```
+
+### FP8 checkpoints
+
+FP8 weights are stored in a non-safetensors format, so load the transformer separately
+with `use_safetensors=False` and pass it to the pipeline:
+
+```python
+import torch
+from diffusers import BooguImageTransformer2DModel
+from diffusers.pipelines.boogu import BooguImagePipeline
+
+transformer = BooguImageTransformer2DModel.from_pretrained(
+    "Boogu/Boogu-Image-0.1-Base-fp8",
+    subfolder="transformer",
+    torch_dtype=torch.bfloat16,
+    use_safetensors=False,
+)
+pipe = BooguImagePipeline.from_pretrained(
+    "Boogu/Boogu-Image-0.1-Base-fp8", torch_dtype=torch.bfloat16, transformer=transformer
+)
+pipe = pipe.to("cuda")
+```
+
+Runnable scripts for every variant are available in
+[`examples/boogu`](https://github.com/huggingface/diffusers/tree/main/examples/boogu).
+
+> [!TIP]
+> The transformer uses fused `triton` (RMSNorm) and `flash_attn` (SwiGLU, variable-length
+> attention) kernels when they are installed, and falls back to pure PyTorch otherwise.
+
+## BooguImagePipeline
+
+[[autodoc]] pipelines.boogu.pipeline_boogu.BooguImagePipeline
+  - all
+  - __call__
+
+## BooguImageTurboPipeline
+
+[[autodoc]] pipelines.boogu.pipeline_boogu_turbo.BooguImageTurboPipeline
+  - all
+  - __call__
+
+## FMPipelineOutput
+
+[[autodoc]] pipelines.boogu.pipeline_boogu.FMPipelineOutput
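A guidance formula helps explain why `BooguImageTurboPipeline` defaults `text_guidance_scale` and `image_guidance_scale` to 1.0. The sketch below assumes an InstructPix2Pix-style combination of three predictions (unconditional, reference-image-only, and image plus instruction). That formula is an assumption for illustration and is not taken from the Boogu pipeline source, and `empty_instruction_guidance_scale` is not modeled at all. Under this assumption, setting both scales to 1.0 collapses the combination to the fully conditioned prediction, so a distilled few-step student is sampled without any guidance extrapolation.

```python
from typing import Sequence

Vector = Sequence[float]


def dual_guidance(
    uncond: Vector,
    image_cond: Vector,
    full_cond: Vector,
    text_guidance_scale: float,
    image_guidance_scale: float,
) -> list[float]:
    """Combine predictions InstructPix2Pix-style (an assumed formula, not Boogu's verified one).

    uncond:     prediction with neither instruction nor reference image
    image_cond: prediction conditioned on the reference image only
    full_cond:  prediction conditioned on the reference image and the instruction
    """
    return [
        u + image_guidance_scale * (i - u) + text_guidance_scale * (f - i)
        for u, i, f in zip(uncond, image_cond, full_cond)
    ]


if __name__ == "__main__":
    u, i, f = [0.0, 1.0], [0.5, 1.5], [1.0, 3.0]
    # Turbo-style defaults: both scales at 1.0 return full_cond unchanged.
    print(dual_guidance(u, i, f, text_guidance_scale=1.0, image_guidance_scale=1.0))  # [1.0, 3.0]
    # Base-style text guidance of 4.0 extrapolates along the instruction direction.
    print(dual_guidance(u, i, f, text_guidance_scale=4.0, image_guidance_scale=1.0))  # [2.5, 7.5]
```

With both scales at 0.0 the same function returns the unconditional prediction, which makes the two endpoints of the guidance range easy to check.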
diff --git a/examples/boogu/README.md b/examples/boogu/README.md
@@ -0,0 +1,81 @@
+# Boogu-Image
+
+[Boogu-Image](https://huggingface.co/Boogu) is an instruction-driven image generation and editing model. It pairs a Qwen3-VL multimodal LLM (instruction encoder) with a single/double-stream transformer denoiser and a flow-matching scheduler with training-aligned time shifting.
+
+This directory contains minimal inference scripts for the released checkpoints.
+
+## Environment installation
+Set up the environment with the [Boogu-Image quick-start script](https://github.com/boogu-project/Boogu-Image/blob/main/quick_start.sh).
+
+## Pipelines
+
+| Pipeline | Class | Use case |
+|---|---|---|
+| Base | `BooguImagePipeline` | Text-to-image (50 steps) |
+| Turbo | `BooguImageTurboPipeline` | Few-step DMD text-to-image (4 steps) |
+| Edit | `BooguImagePipeline` | Instruction-based image editing (pass `input_images`) |
+
+## Scripts
+
+| Script | Checkpoint |
+|---|---|
+| `inference_base.py` | `Boogu/Boogu-Image-0.1-Base` |
+| `inference_turbo.py` | `Boogu/Boogu-Image-0.1-Turbo` |
+| `inference_edit.py` | `Boogu/Boogu-Image-0.1-Edit` |
+| `inference_base_fp8.py` | `Boogu/Boogu-Image-0.1-Base-fp8` |
+| `inference_turbo_fp8.py` | `Boogu/Boogu-Image-0.1-Turbo-fp8` |
+| `inference_edit_fp8.py` | `Boogu/Boogu-Image-0.1-Edit-fp8` |
+
+## Usage
+
+Text-to-image:
+
+```bash
+python inference_base.py
+```
+
+Few-step (Turbo):
+
+```bash
+python inference_turbo.py
+```
+
+Image editing (reads `base.png` as the reference image, so run `inference_base.py` first):
+
+```bash
+python inference_edit.py
+```
+
+## FP8 checkpoints
+
+FP8 weights are stored in a non-safetensors format, so the transformer is loaded
+separately with `use_safetensors=False` and passed to the pipeline:
+
+```python
+import torch
+from diffusers import BooguImageTransformer2DModel
+from diffusers.pipelines.boogu import BooguImagePipeline
+
+transformer = BooguImageTransformer2DModel.from_pretrained(
+    "Boogu/Boogu-Image-0.1-Base-fp8",
+    subfolder="transformer",
+    torch_dtype=torch.bfloat16,
+    use_safetensors=False,
+)
+pipe = BooguImagePipeline.from_pretrained(
+    "Boogu/Boogu-Image-0.1-Base-fp8", torch_dtype=torch.bfloat16, transformer=transformer
+)
+pipe = pipe.to("cuda")
+```
+
+The FP8 scripts also disable the DeepGEMM kernel used by the FP8 Qwen3-VL instruction encoder,
+forcing the Triton finegrained-fp8 fallback for broader hardware compatibility. See
+`_disable_deepgemm_for_fp8_vlm()` in each FP8 script for the version-specific workarounds.
+
+## Optional performance dependencies
+
+The transformer can use fused kernels when available; without them it falls back to
+pure PyTorch and prints a one-time warning:
+
+- `triton` — fused RMSNorm
+- `flash_attn` — fused SwiGLU and variable-length flash attention
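The fallback behavior described above can be sketched with `importlib.util.find_spec`, which reports whether a package is importable without actually importing it. This is a hypothetical helper showing the pattern; it is not the detection code inside the Boogu transformer (which this PR excerpt does not show), and the function names and warning text are illustrative.

```python
import importlib.util
import warnings

_warned = False  # module-level flag so the warning fires at most once per process


def available_kernels() -> dict[str, bool]:
    """Report which optional fused-kernel packages are importable, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in ("triton", "flash_attn")}


def warn_once_if_missing() -> list[str]:
    """Warn a single time about missing packages and return their names on every call."""
    global _warned
    missing = [name for name, ok in available_kernels().items() if not ok]
    if missing and not _warned:
        warnings.warn(
            f"Optional kernels not found ({', '.join(missing)}); falling back to pure PyTorch.",
            stacklevel=2,
        )
        _warned = True
    return missing


if __name__ == "__main__":
    print(available_kernels())
    warn_once_if_missing()
    warn_once_if_missing()  # silent: the warning was already emitted (if anything was missing)
```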
diff --git a/examples/boogu/inference_base.py b/examples/boogu/inference_base.py
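@@ -0,0 +1,24 @@
+import torch
+
+from diffusers.pipelines.boogu import BooguImagePipeline
+
+
+MODEL_PATH = "Boogu/Boogu-Image-0.1-Base"
+
+pipe = BooguImagePipeline.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16)
+pipe = pipe.to("cuda")
+
+images = pipe(
+    # English: "A Chinese-style gilded landscape painting of Guilin's mountains and rivers bathed in golden
+    # light: layered distant peaks, a mirror-like river, peak edges traced with glowing gold lines,
+    # azurite and malachite mineral pigments combined with gilding, impasto oil-paint strokes in places,
+    # and golden particles floating in the air, evoking a dreamy, hazy, yet grand atmosphere."
+    instruction="一幅国风琉金风格的山水画作，展现了桂林山水在金光普照下的壮丽景象。远山层叠，江水如镜，山峰边缘勾勒着发光的金色线条。画面采用石青石绿岩彩与鎏金质感相结合，局部有厚涂油画笔触，空中飘浮着金色粒子，营造出梦幻朦胧而又磅礴大气的意境。",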
+    height=1024,
+    width=1024,
+    num_inference_steps=50,
+    text_guidance_scale=4.0,
+).images
+
+images[0].save("base.png")
+print("Inference OK, saved base.png")
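`num_inference_steps=50` sets how many steps the flow-matching scheduler takes, and the README describes that scheduler as using training-aligned time shifting. The exact Boogu shift rule and value are not part of this diff, so the sketch below substitutes the SD3-style shift used by diffusers' `FlowMatchEulerDiscreteScheduler`, `sigma' = shift * sigma / (1 + (shift - 1) * sigma)`, with an arbitrary placeholder `shift=3.0`, plus a plain Euler integrator over the resulting noise levels. Vectors are plain Python lists so the example runs without PyTorch.

```python
from typing import Callable, Sequence

Velocity = Callable[[list[float], float], Sequence[float]]


def shifted_sigmas(num_inference_steps: int, shift: float = 3.0) -> list[float]:
    """Return num_inference_steps + 1 noise levels running from 1.0 (pure noise) to 0.0 (clean).

    Applies the SD3-style time shift sigma' = shift * sigma / (1 + (shift - 1) * sigma).
    shift > 1 spends more of the steps at high noise; 3.0 is a placeholder, not Boogu's value.
    """
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be >= 1")
    sigmas = [1.0 - k / num_inference_steps for k in range(num_inference_steps + 1)]
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in sigmas]


def euler_sample(x: list[float], velocity_fn: Velocity, sigmas: list[float]) -> list[float]:
    """Integrate dx/dsigma = v(x, sigma) from sigmas[0] to sigmas[-1] with explicit Euler steps."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = velocity_fn(x, sigma)
        x = [xi + (sigma_next - sigma) * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    print([round(s, 3) for s in shifted_sigmas(4)])  # [1.0, 0.9, 0.75, 0.5, 0.0]
    # Toy straight path x(sigma) = (1 - sigma) * x0 + sigma * noise, whose velocity is noise - x0.
    noise, x0 = [1.0, -2.0], [0.5, 0.25]
    velocity = [n - c for n, c in zip(noise, x0)]
    print(euler_sample(noise, lambda x, sigma: velocity, shifted_sigmas(50)))
```

With a constant velocity (an exactly straight path) the Euler steps land on `x0` up to floating-point error for any step count; step count only matters once the velocity comes from a learned model.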
diff --git a/examples/boogu/inference_base_fp8.py b/examples/boogu/inference_base_fp8.py
@@ -0,0 +1,57 @@
+import os
+
+import torch
+
+from diffusers import BooguImageTransformer2DModel
+from diffusers.pipelines.boogu import BooguImagePipeline
+
+
+def _disable_deepgemm_for_fp8_vlm() -> None:
+    """Disable DeepGEMM so transformers' finegrained-fp8 layers fall back to the Triton kernel."""
+    # For transformers >= 5.11.0
+    os.environ["TRANSFORMERS_DISABLE_DEEPGEMM_LINEAR"] = "1"
+
+    try:
+        import transformers.integrations.finegrained_fp8 as fg_fp8
+    except Exception:
+        return
+
+    def _raise_import_error(*args, **kwargs):
+        raise ImportError("DeepGEMM disabled; forcing Triton finegrained-fp8 fallback.")
+
+    if hasattr(fg_fp8, "deepgemm_fp8_fp4_linear"):
+        # For 5.10.1 <= transformers < 5.11.0
+        fg_fp8.deepgemm_fp8_fp4_linear = _raise_import_error
+    elif hasattr(fg_fp8, "_load_deepgemm_kernel"):
+        # For 5.5.0 <= transformers < 5.10.1
+        fg_fp8._load_deepgemm_kernel = _raise_import_error
+
+
+_disable_deepgemm_for_fp8_vlm()
+
+MODEL_PATH = "Boogu/Boogu-Image-0.1-Base-fp8"
+
+transformer = BooguImageTransformer2DModel.from_pretrained(
+    MODEL_PATH,
+    subfolder="transformer",
+    torch_dtype=torch.bfloat16,
+    use_safetensors=False,
+)
+pipe = BooguImagePipeline.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16, transformer=transformer)
+pipe = pipe.to("cuda")
+
+images = pipe(
+    # English: "A Chinese-style gilded landscape painting of Guilin's mountains and rivers bathed in golden
+    # light: layered distant peaks, a mirror-like river, peak edges traced with glowing gold lines,
+    # azurite and malachite mineral pigments combined with gilding, impasto oil-paint strokes in places,
+    # and golden particles floating in the air, evoking a dreamy, hazy, yet grand atmosphere."
+    instruction="一幅国风琉金风格的山水画作，展现了桂林山水在金光普照下的壮丽景象。远山层叠，江水如镜，山峰边缘勾勒着发光的金色线条。画面采用石青石绿岩彩与鎏金质感相结合，局部有厚涂油画笔触，空中飘浮着金色粒子，营造出梦幻朦胧而又磅礴大气的意境。",
+    height=1024,
+    width=1024,
+    num_inference_steps=50,
+    text_guidance_scale=4.0,
+).images
+
+assert len(images) == 1
+images[0].save("base_fp8.png")
+print("Inference OK, saved base_fp8.png")
diff --git a/examples/boogu/inference_edit.py b/examples/boogu/inference_edit.py
@@ -0,0 +1,39 @@
+import os
+
+import torch
+from PIL import Image
+
+from diffusers.pipelines.boogu import BooguImagePipeline
+
+
+MODEL_PATH = "Boogu/Boogu-Image-0.1-Edit"
+
+# Negative instruction that steers generation away from common artifacts. With text_guidance_scale > 1,
+# guidance pushes the sample away from this instruction, which noticeably improves style adherence.
+NEGATIVE_INSTRUCTION = (
+    "(((deformed))), blurry, over saturation, bad anatomy, disfigured, poorly drawn face, "
+    "mutation, mutated, (extra_limb), (ugly), (poorly drawn hands), fused fingers, messy drawing, "
+    "broken legs censor, censored, censor_bar"
+)
+
+if not os.path.exists("base.png"):
+    raise FileNotFoundError("base.png not found — run inference_base.py first to generate the reference image.")
+
+pipe = BooguImagePipeline.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16)
+pipe = pipe.to("cuda")
+
+images = pipe(
+    # English: "Change the image style to a colored-pencil illustration."
+    instruction="把图片风格调整为彩铅插画。",
+    negative_instruction=NEGATIVE_INSTRUCTION,
+    input_images=[Image.open("base.png").convert("RGB")],
+    height=1024,
+    width=1024,
+    num_inference_steps=50,
+    text_guidance_scale=4.0,
+    image_guidance_scale=1.0,
+).images
+
+assert len(images) == 1
+images[0].save("edit.png")
+print("Inference OK, saved edit.png")
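The edit script requests a fixed 1024×1024 output regardless of the reference image's shape. If you want the output to follow the reference aspect ratio instead, a hypothetical helper like the one below picks a width and height near a target pixel area. The roughly 1024×1024 pixel budget and rounding each side down to a multiple of 16 are illustrative assumptions, not constraints documented in this PR; check the pipeline's own input validation before relying on them.

```python
import math


def fit_to_area(
    ref_width: int, ref_height: int, target_area: int = 1024 * 1024, multiple: int = 16
) -> tuple[int, int]:
    """Return (width, height) that keeps the reference aspect ratio at roughly target_area pixels.

    Each side is rounded down to a multiple of `multiple`. Both the 1024 * 1024 budget and the
    multiple of 16 are illustrative assumptions, not limits documented for Boogu-Image.
    """
    if ref_width <= 0 or ref_height <= 0:
        raise ValueError("reference dimensions must be positive")
    scale = math.sqrt(target_area / (ref_width * ref_height))
    width = max(multiple, int(ref_width * scale) // multiple * multiple)
    height = max(multiple, int(ref_height * scale) // multiple * multiple)
    return width, height


if __name__ == "__main__":
    print(fit_to_area(1024, 1024))  # (1024, 1024)
    print(fit_to_area(1920, 1080))  # landscape: width > height, both multiples of 16
```

For a PIL image `ref`, `width, height = fit_to_area(*ref.size)` yields values to pass as `width=` and `height=` in the edit call; `Image.size` is already ordered as (width, height).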