2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -491,6 +491,8 @@
  title: AnimateDiff
- local: api/pipelines/aura_flow
  title: AuraFlow
- local: api/pipelines/boogu
  title: Boogu-Image
- local: api/pipelines/bria_3_2
  title: Bria 3.2
- local: api/pipelines/bria_fibo
153 changes: 153 additions & 0 deletions docs/source/en/api/pipelines/boogu.md
@@ -0,0 +1,153 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
-->

# Boogu-Image

## Overview

Boogu-Image is an instruction-driven image generation and editing model. Rather than a
plain text prompt, it is conditioned on a natural-language *instruction* that is encoded
by a Qwen3-VL multimodal LLM, which can also attend to optional reference images. A
single/double-stream transformer denoiser then predicts the latent updates, and a
flow-matching scheduler with training-aligned time shifting controls the denoising
trajectory. The VAE maps between image and latent space.
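
To make "time shifting" more concrete, here is a minimal sketch of one shift rule that is commonly used with flow-matching samplers, `sigma' = shift * sigma / (1 + (shift - 1) * sigma)`. This is an assumption for illustration only: this page does not state which formula or shift value Boogu-Image uses, and `linear_sigmas`, `shift_sigmas`, and `shift=3.0` are hypothetical names and values.

```python
def linear_sigmas(num_steps):
    """Uniform noise levels from 1.0 (pure noise) down to 0.0 (clean latent)."""
    return [1.0 - i / num_steps for i in range(num_steps + 1)]


def shift_sigmas(sigmas, shift=3.0):
    """Warp noise levels so more steps are spent at high noise (illustrative only).

    The endpoints 0.0 and 1.0 are preserved; shift=1.0 leaves the schedule unchanged.
    """
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in sigmas]


sigmas = linear_sigmas(4)
shifted = shift_sigmas(sigmas)
for before, after in zip(sigmas, shifted):
    print(f"{before:.2f} -> {after:.3f}")
```

With a shift greater than 1, intermediate noise levels move toward 1.0, so a few-step schedule concentrates its steps in the high-noise region.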

The model is released in several variants:

- **Base** (`Boogu/Boogu-Image-0.1-Base`) — text-to-image, full sampling schedule.
- **Turbo** (`Boogu/Boogu-Image-0.1-Turbo`) — DMD student model for few-step
text-to-image generation.
- **Edit** (`Boogu/Boogu-Image-0.1-Edit`) — instruction-based image editing conditioned
on one or more reference images.

FP8-quantized checkpoints are also available for each variant (the `-fp8` suffix).
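
The naming scheme above can be captured in a small lookup. This is a sketch only: `VARIANTS` and `checkpoint_id` are hypothetical helpers and not part of `diffusers`; the repository names are the ones listed above.

```python
# Hypothetical helper: maps a variant name to its Hub repository id.
VARIANTS = {
    "base": "Boogu/Boogu-Image-0.1-Base",
    "turbo": "Boogu/Boogu-Image-0.1-Turbo",
    "edit": "Boogu/Boogu-Image-0.1-Edit",
}


def checkpoint_id(variant, fp8=False):
    """Return the repository id for a variant, appending `-fp8` when requested."""
    try:
        repo = VARIANTS[variant.lower()]
    except KeyError:
        raise ValueError(f"Unknown variant {variant!r}; expected one of {sorted(VARIANTS)}") from None
    return f"{repo}-fp8" if fp8 else repo


print(checkpoint_id("turbo", fp8=True))
```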

There are two pipeline classes:

- [`BooguImagePipeline`] — text-to-image and instruction editing.
- [`BooguImageTurboPipeline`] — a subclass adding the DMD few-step inference path. It
defaults the guidance scales to the DMD-required values (`text_guidance_scale=1.0`,
`image_guidance_scale=1.0`, `empty_instruction_guidance_scale=0.0`).
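
Why a guidance scale of 1.0 amounts to "guidance off" can be seen from the standard classifier-free guidance formula. The sketch below is generic and works on plain lists of floats. How Boogu-Image combines its three scales internally is not described on this page, so `apply_guidance` is a hypothetical helper, not the pipeline's code.

```python
def apply_guidance(cond, uncond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


cond = [1.0, 2.0]    # prediction conditioned on the instruction
uncond = [0.5, 0.0]  # prediction without the instruction

# scale == 1.0 returns the conditional prediction unchanged.
print(apply_guidance(cond, uncond, 1.0))
# scale > 1.0 extrapolates away from the unconditional prediction.
print(apply_guidance(cond, uncond, 4.0))
```

Under this formula, a distilled few-step model that expects `scale=1.0` simply uses its conditional prediction as-is.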

## Usage examples

### Text-to-image

```python
import torch
from diffusers.pipelines.boogu import BooguImagePipeline

pipe = BooguImagePipeline.from_pretrained("Boogu/Boogu-Image-0.1-Base", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

image = pipe(
    instruction="A serene Chinese ink-wash landscape of the Guilin mountains bathed in golden light, layered peaks, mirror-like river, glowing golden contours.",
    height=1024,
    width=1024,
    num_inference_steps=50,
    text_guidance_scale=4.0,
).images[0]

image.save("base.png")
```

### Few-step generation (Turbo)

```python
import torch
from diffusers.pipelines.boogu import BooguImageTurboPipeline

pipe = BooguImageTurboPipeline.from_pretrained("Boogu/Boogu-Image-0.1-Turbo", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

image = pipe(
    instruction="A serene Chinese ink-wash landscape of the Guilin mountains bathed in golden light.",
    height=1024,
    width=1024,
    num_inference_steps=4,
).images[0]
image.save("turbo.png")
```

### Instruction-based editing

Pass one or more reference images through `input_images`:

```python
import torch
from PIL import Image
from diffusers.pipelines.boogu import BooguImagePipeline

pipe = BooguImagePipeline.from_pretrained("Boogu/Boogu-Image-0.1-Edit", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

image = pipe(
    instruction="Turn the image into a colored-pencil illustration.",
    input_images=[Image.open("base.png").convert("RGB")],
    height=1024,
    width=1024,
    num_inference_steps=50,
    text_guidance_scale=4.0,
    image_guidance_scale=1.0,
).images[0]

image.save("edit.png")
```

### FP8 checkpoints

FP8 weights are stored in a non-safetensors format, so load the transformer separately
with `use_safetensors=False` and pass it to the pipeline:

```python
import torch
from diffusers import BooguImageTransformer2DModel
from diffusers.pipelines.boogu import BooguImagePipeline

transformer = BooguImageTransformer2DModel.from_pretrained(
    "Boogu/Boogu-Image-0.1-Base-fp8",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    use_safetensors=False,
)
pipe = BooguImagePipeline.from_pretrained(
    "Boogu/Boogu-Image-0.1-Base-fp8", torch_dtype=torch.bfloat16, transformer=transformer
)
pipe = pipe.to("cuda")
```

Runnable scripts for every variant are available in
[`examples/boogu`](https://github.com/huggingface/diffusers/tree/main/examples/boogu).

> [!TIP]
> The transformer uses fused `triton` (RMSNorm) and `flash_attn` (SwiGLU, variable-length
> attention) kernels when they are installed, and falls back to pure PyTorch otherwise.

## BooguImagePipeline

[[autodoc]] pipelines.boogu.pipeline_boogu.BooguImagePipeline
  - all
  - __call__

## BooguImageTurboPipeline

[[autodoc]] pipelines.boogu.pipeline_boogu_turbo.BooguImageTurboPipeline
  - all
  - __call__

## FMPipelineOutput

[[autodoc]] pipelines.boogu.pipeline_boogu.FMPipelineOutput
81 changes: 81 additions & 0 deletions examples/boogu/README.md
@@ -0,0 +1,81 @@
# Boogu-Image

[Boogu-Image](https://huggingface.co/Boogu) is an instruction-driven image generation and editing model. It pairs a Qwen3-VL multimodal LLM (instruction encoder) with a single/double-stream transformer denoiser and a flow-matching scheduler with training-aligned time shifting.

This directory contains minimal inference scripts for the released checkpoints.

## Environment installation

To set up the environment, follow the [Boogu-Image quick-start script](https://github.com/boogu-project/Boogu-Image/blob/main/quick_start.sh).

## Pipelines

| Pipeline | Class | Use case |
|---|---|---|
| Base | `BooguImagePipeline` | Text-to-image (50 steps) |
| Turbo | `BooguImageTurboPipeline` | Few-step DMD text-to-image (4 steps) |
| Edit | `BooguImagePipeline` | Instruction-based image editing (pass `input_images`) |

## Scripts

| Script | Checkpoint |
|---|---|
| `inference_base.py` | `Boogu/Boogu-Image-0.1-Base` |
| `inference_turbo.py` | `Boogu/Boogu-Image-0.1-Turbo` |
| `inference_edit.py` | `Boogu/Boogu-Image-0.1-Edit` |
| `inference_base_fp8.py` | `Boogu/Boogu-Image-0.1-Base-fp8` |
| `inference_turbo_fp8.py` | `Boogu/Boogu-Image-0.1-Turbo-fp8` |
| `inference_edit_fp8.py` | `Boogu/Boogu-Image-0.1-Edit-fp8` |

## Usage

Text-to-image:

```bash
python inference_base.py
```

Few-step (Turbo):

```bash
python inference_turbo.py
```

Image editing (reads `base.png` as the reference image, so run `inference_base.py` first):

```bash
python inference_edit.py
```

## FP8 checkpoints

FP8 weights are stored in a non-safetensors format, so the transformer is loaded
separately with `use_safetensors=False` and passed to the pipeline:

```python
import torch
from diffusers import BooguImageTransformer2DModel
from diffusers.pipelines.boogu import BooguImagePipeline

transformer = BooguImageTransformer2DModel.from_pretrained(
    "Boogu/Boogu-Image-0.1-Base-fp8",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    use_safetensors=False,
)
pipe = BooguImagePipeline.from_pretrained(
    "Boogu/Boogu-Image-0.1-Base-fp8", torch_dtype=torch.bfloat16, transformer=transformer
)
pipe = pipe.to("cuda")
```

The FP8 scripts also disable the DeepGEMM kernel for the FP8 VLM (forcing a Triton
finegrained-fp8 fallback) for broader hardware compatibility — see
`_disable_deepgemm_for_fp8_vlm()` in each FP8 script.

## Optional performance dependencies

The transformer can use fused kernels when available; without them it falls back to
pure PyTorch and prints a one-time warning:

- `triton` — fused RMSNorm
- `flash_attn` — fused SwiGLU and variable-length flash attention
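
A fallback of this kind is usually built on an import probe. The following is a minimal sketch using only the standard library; `has_optional` is a hypothetical helper and does not reproduce the transformer's actual detection or warning code.

```python
import importlib.util
import warnings

# Modules we have already warned about, so each warning is emitted only once.
_warned = set()


def has_optional(module_name):
    """Return True if a top-level module is importable; warn once if it is not."""
    available = importlib.util.find_spec(module_name) is not None
    if not available and module_name not in _warned:
        _warned.add(module_name)
        warnings.warn(f"{module_name} is not installed; falling back to pure PyTorch.")
    return available


USE_TRITON = has_optional("triton")
USE_FLASH_ATTN = has_optional("flash_attn")
print(f"triton: {USE_TRITON}, flash_attn: {USE_FLASH_ATTN}")
```

Probing with `find_spec` avoids importing a heavy package just to check whether it is present.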
20 changes: 20 additions & 0 deletions examples/boogu/inference_base.py
@@ -0,0 +1,20 @@
import torch

from diffusers.pipelines.boogu import BooguImagePipeline


MODEL_PATH = "Boogu/Boogu-Image-0.1-Base"

pipe = BooguImagePipeline.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

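images = pipe(
    instruction="A Chinese-style gilded landscape painting showing the magnificent scenery of Guilin's mountains and rivers bathed in golden light. Distant mountains rise in layers, the river is mirror-like, and the edges of the peaks are outlined with glowing golden lines. The painting combines azurite and malachite mineral pigments with a gilded texture, with thick impasto oil-painting brushstrokes in places and golden particles floating in the air, creating a dreamy, hazy yet majestic atmosphere.",
    height=1024,
    width=1024,
    num_inference_steps=50,
    text_guidance_scale=4.0,
).images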

images[0].save("base.png")
print("Inference OK, saved base.png")
52 changes: 52 additions & 0 deletions examples/boogu/inference_base_fp8.py
@@ -0,0 +1,52 @@
import os

import torch

from diffusers import BooguImageTransformer2DModel
from diffusers.pipelines.boogu import BooguImagePipeline


def _disable_deepgemm_for_fp8_vlm() -> None:
    """Disable the DeepGEMM kernel for the FP8 VLM, forcing the Triton finegrained-fp8 fallback."""
    # For transformers >= 5.11.0
    os.environ["TRANSFORMERS_DISABLE_DEEPGEMM_LINEAR"] = "1"

    try:
        import transformers.integrations.finegrained_fp8 as fg_fp8
    except Exception:
        # The integration module is unavailable, so there is nothing to patch.
        return

    def _raise_import_error(*args, **kwargs):
        raise ImportError("DeepGEMM disabled; forcing Triton finegrained-fp8 fallback.")

    if hasattr(fg_fp8, "deepgemm_fp8_fp4_linear"):
        # For 5.10.1 <= transformers < 5.11.0
        fg_fp8.deepgemm_fp8_fp4_linear = _raise_import_error
    elif hasattr(fg_fp8, "_load_deepgemm_kernel"):
        # For 5.5.0 <= transformers < 5.10.1
        fg_fp8._load_deepgemm_kernel = _raise_import_error


_disable_deepgemm_for_fp8_vlm()

MODEL_PATH = "Boogu/Boogu-Image-0.1-Base-fp8"

transformer = BooguImageTransformer2DModel.from_pretrained(
    MODEL_PATH,
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    use_safetensors=False,
)
pipe = BooguImagePipeline.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16, transformer=transformer)
pipe = pipe.to("cuda")

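images = pipe(
    instruction="A Chinese-style gilded landscape painting showing the magnificent scenery of Guilin's mountains and rivers bathed in golden light. Distant mountains rise in layers, the river is mirror-like, and the edges of the peaks are outlined with glowing golden lines. The painting combines azurite and malachite mineral pigments with a gilded texture, with thick impasto oil-painting brushstrokes in places and golden particles floating in the air, creating a dreamy, hazy yet majestic atmosphere.",
    height=1024,
    width=1024,
    num_inference_steps=50,
    text_guidance_scale=4.0,
).images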

assert len(images) == 1
images[0].save("base_fp8.png")
print("Inference OK, saved base_fp8.png")
38 changes: 38 additions & 0 deletions examples/boogu/inference_edit.py
@@ -0,0 +1,38 @@
import os

import torch
from PIL import Image

from diffusers.pipelines.boogu import BooguImagePipeline


MODEL_PATH = "Boogu/Boogu-Image-0.1-Edit"

# Negative instruction that steers generation away from common artifacts. When
# text_guidance_scale > 1, guidance pushes the output away from this prompt, which
# noticeably improves style adherence.
NEGATIVE_INSTRUCTION = (
    "(((deformed))), blurry, over saturation, bad anatomy, disfigured, poorly drawn face, "
    "mutation, mutated, (extra_limb), (ugly), (poorly drawn hands), fused fingers, messy drawing, "
    "broken legs, censor, censored, censor_bar"
)

if not os.path.exists("base.png"):
raise FileNotFoundError("base.png not found — run inference_base.py first to generate the reference image.")

pipe = BooguImagePipeline.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

images = pipe(
    instruction="Change the image style to a colored-pencil illustration.",
    negative_instruction=NEGATIVE_INSTRUCTION,
    input_images=[Image.open("base.png").convert("RGB")],
    height=1024,
    width=1024,
    num_inference_steps=50,
    text_guidance_scale=4.0,
    image_guidance_scale=1.0,
).images

assert len(images) == 1
images[0].save("edit.png")
print("Inference OK, saved edit.png")