Z-Image-Turbo Implementation Notes

Reference document for building a Svelte-based image generation app with canvas/inpainting capabilities.

Stack Decisions

  • Frontend: Svelte 5 + Vite (no SvelteKit, no TypeScript)
  • Styling: Tailwind CSS (possibly with shadcn-svelte)
  • Language: JavaScript only
  • Backend: FastAPI (Python)
  • UI Design: Custom/bespoke (not replicating original)

1. BACKEND ARCHITECTURE

Core Setup (FastAPI)

from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
import uvicorn
import torch
import json
import os
from io import BytesIO
import base64

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Restrict in production
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Global pipeline - lazy loaded
pipe = None

Configuration Persistence

CONFIG_FILE = "config.json"

def load_config():
    if os.path.exists(CONFIG_FILE):
        try:
            with open(CONFIG_FILE, "r") as f:
                return json.load(f)
        except Exception:
            pass
    return {
        "cache_dir": None,
        "model_id": "Tongyi-MAI/Z-Image-Turbo",
        "cpu_offload": False
    }

def save_config(config):
    with open(CONFIG_FILE, "w") as f:
        json.dump(config, f, indent=4)

Pipeline Management

def get_pipeline():
    global pipe
    if pipe is None:
        from diffusers import ZImagePipeline

        device = "cuda" if torch.cuda.is_available() else "cpu"
        dtype = torch.bfloat16 if device == "cuda" else torch.float32

        config = load_config()
        pipe = ZImagePipeline.from_pretrained(
            config['model_id'],
            torch_dtype=dtype,
            low_cpu_mem_usage=False,
            cache_dir=config.get('cache_dir')
        )

        if config.get("cpu_offload", False) and device == "cuda":
            pipe.enable_model_cpu_offload()
        else:
            pipe.to(device)

    return pipe

API Endpoints

Request/Response Models

class GenerateRequest(BaseModel):
    prompt: str
    height: int = 1024
    width: int = 1024
    steps: int = 8
    guidance_scale: float = 0.0
    seed: int = -1

class SettingsRequest(BaseModel):
    cache_dir: str
    cpu_offload: bool = False

Endpoints

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /health | GET | Returns {"status": "ok"} |
| /settings | GET | Returns current config |
| /settings/model-path | POST | Updates config, sets pipe = None to force reload |
| /generate | POST | Main generation endpoint |

Generation Endpoint Logic

@app.post("/generate")
def generate(req: GenerateRequest):
    # Plain def rather than async def: FastAPI runs sync handlers in a
    # threadpool, so the blocking pipeline call doesn't stall the event loop
    # Validate dimensions
    if req.height % 16 != 0 or req.width % 16 != 0:
        raise HTTPException(400, "Height and Width must be divisible by 16")

    pipeline = get_pipeline()
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Handle seed
    generator = None
    if req.seed != -1:
        generator = torch.Generator(device).manual_seed(req.seed)

    # Generate
    image = pipeline(
        prompt=req.prompt,
        height=req.height,
        width=req.width,
        num_inference_steps=req.steps,
        guidance_scale=req.guidance_scale,
        generator=generator,
    ).images[0]

    # Convert to base64
    buffer = BytesIO()
    image.save(buffer, format="PNG")
    img_str = base64.b64encode(buffer.getvalue()).decode()

    return {"image": f"data:image/png;base64,{img_str}"}

2. MODEL DETAILS (Z-Image-Turbo)

Specifications

  • Model ID: Tongyi-MAI/Z-Image-Turbo
  • Parameters: 6 billion
  • Architecture: S3-DiT (Scalable Single-Stream Diffusion Transformer)
  • Text Encoder: Qwen 4B
  • VAE: Flux Autoencoder
  • Optimized Steps: 8 (distilled model)

Generation Parameters

| Parameter | Type | Range | Default | Notes |
| --- | --- | --- | --- | --- |
| prompt | str | Any | Required | Text description |
| height | int | 256-2048 | 1024 | Must be divisible by 16 |
| width | int | 256-2048 | 1024 | Must be divisible by 16 |
| num_inference_steps | int | 1-50 | 8 | More steps = higher quality, slower |
| guidance_scale | float | 0.0-10.0 | 0.0 | 0 = no guidance |
| generator | Generator | - | None | For seed control |
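The constraints in the table can be enforced in one place before the request ever touches the pipeline. A minimal sketch (the function name and error style are illustrative, not part of any API):

```python
def validate_generation_params(height, width, steps, guidance_scale):
    """Raise ValueError if any parameter falls outside the documented ranges."""
    errors = []
    for name, value in (("height", height), ("width", width)):
        if not 256 <= value <= 2048:
            errors.append(f"{name} must be between 256 and 2048")
        if value % 16 != 0:
            errors.append(f"{name} must be divisible by 16")
    if not 1 <= steps <= 50:
        errors.append("num_inference_steps must be between 1 and 50")
    if not 0.0 <= guidance_scale <= 10.0:
        errors.append("guidance_scale must be between 0.0 and 10.0")
    if errors:
        raise ValueError("; ".join(errors))
```

Collecting all violations into one message, instead of failing on the first, lets the UI report every problem in a single round trip.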

Memory Requirements

  • Recommended VRAM: 16GB+
  • CUDA Precision: bfloat16
  • CPU Precision: float32
  • CPU Offload: Available for low-VRAM GPUs (trades speed for memory)

3. GENERATION PARAMETERS (UI Controls Needed)

These are the parameters the UI needs to expose for image generation:

Inference Steps

  • Range: 1-50
  • Default: 8

Guidance Scale

  • Range: 0-10 (0.1 increments)
  • Default: 0.0

Dimensions

  • Width: 256-2048px (must be divisible by 16)
  • Height: 256-2048px (must be divisible by 16)

Useful Aspect Ratio Presets:

| Name | Ratio | Dimensions |
| --- | --- | --- |
| Square | 1:1 | 1024×1024 |
| Portrait | 3:4 | 896×1152 |
| Landscape | 4:3 | 1152×896 |
| Wide | 16:9 | 1344×768 |
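Arbitrary slider values won't generally satisfy the divisible-by-16 rule, so the UI can snap dimensions before sending them. A small helper sketch (the name is illustrative; the same logic works in JavaScript on the frontend):

```python
def snap_dimension(value, step=16, lo=256, hi=2048):
    """Round a requested dimension to the nearest multiple of `step`,
    clamped to the valid range."""
    snapped = round(value / step) * step
    return max(lo, min(hi, snapped))
```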

Seed

  • -1 = random
  • Any non-negative integer for reproducibility
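One way to handle the -1 convention on the backend is to resolve it to a concrete value up front, so the seed that was actually used can be returned to the UI and any random result reproduced later. A sketch (the function name is illustrative):

```python
import random

def resolve_seed(seed):
    """Turn the UI's seed value into a concrete seed.

    -1 means "random": draw one and return it so the caller can report
    which seed was actually used for the generation.
    """
    if seed == -1:
        return random.randint(0, 2**32 - 1)
    return seed
```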

4. STATE MANAGEMENT (Svelte Stores)

Core State

// Using Svelte writable stores (src/lib/stores/)

// generation.js
import { writable } from 'svelte/store'

export const prompt = writable('')
export const generatedImage = writable(null)  // data:image/png;base64,...
export const loading = writable(false)

export const settings = writable({
  steps: 8,
  guidance_scale: 0.0,
  width: 1024,
  height: 1024,
  seed: -1
})

API Communication Pattern

// src/lib/api.js
const API_BASE = 'http://localhost:8000'

export async function generateImage(prompt, settings) {
  const res = await fetch(`${API_BASE}/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt, ...settings })
  })

  if (!res.ok) {
    const error = await res.json()
    throw new Error(error.detail || 'Generation failed')
  }

  return res.json()  // { image: "data:image/png;base64,..." }
}

export async function getSettings() {
  const res = await fetch(`${API_BASE}/settings`)
  return res.json()
}

export async function updateSettings(settings) {
  const res = await fetch(`${API_BASE}/settings/model-path`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(settings)
  })
  return res.json()
}

5. PYTHON DEPENDENCIES

# requirements.txt
fastapi
uvicorn
torch
transformers
accelerate
protobuf
sentencepiece
git+https://github.com/huggingface/diffusers.git

Notes:

  • Diffusers must be installed from git to get ZImagePipeline
  • PyTorch installation depends on CUDA version
  • Recommend Python 3.8+

6. FRONTEND SETUP (Svelte + Vite + Tailwind)

Project Initialization

# Create Svelte project with Vite
npm create vite@latest frontend -- --template svelte

cd frontend

# Install Tailwind
npm install -D tailwindcss postcss autoprefixer
npx tailwindcss init -p

# Install shadcn-svelte (optional - for UI components)
npx shadcn-svelte@latest init

# Icons
npm install lucide-svelte

# Canvas library (for later phases)
npm install konva svelte-konva

Recommended Dependencies

{
  "dependencies": {
    "lucide-svelte": "^0.460.0",
    "konva": "^9.3.0",
    "svelte-konva": "^1.0.0"
  },
  "devDependencies": {
    "svelte": "^5.0.0",
    "vite": "^6.0.0",
    "tailwindcss": "^3.4.0"
  }
}

7. ERROR HANDLING

Backend Pattern

# Validation (400)
if req.height % 16 != 0:
    raise HTTPException(400, "Height must be divisible by 16")

# Server Error (500)
try:
    result = operation()
except Exception as e:
    print(f"Error: {e}")
    raise HTTPException(500, str(e))

Frontend Pattern

try {
  const res = await fetch(...)
  if (!res.ok) throw new Error('Failed')
  // success handling
} catch (e) {
  console.error(e)
  alert('User-friendly error message')
} finally {
  loading.set(false)  // `loading` is the writable store from section 4
}

8. DOWNLOAD FUNCTIONALITY

function downloadImage(image) {
  // `image` is the data URI from the generatedImage store
  const link = document.createElement('a')
  link.href = image
  link.download = `z-image-${Date.now()}.png`
  link.click()
}

9. CONFIGURATION LIFECYCLE

  1. Load: Read config.json on startup, fallback to defaults
  2. Display: Populate settings UI from config
  3. Update: POST to /settings/model-path with new values
  4. Persist: Backend saves to config.json
  5. Reload: Set pipe = None to force model reload on next request
  6. Feedback: Alert user that model will reload
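Steps 3-5 of the lifecycle can be condensed into one helper. A minimal sketch with the pipeline reset stubbed out (the function name is illustrative):

```python
import json
import os

pipe = object()  # stands in for the loaded pipeline

def apply_model_settings(config_path, cache_dir, cpu_offload=False):
    """Merge new values into the config, persist it, and force a reload."""
    global pipe
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)
    config["cache_dir"] = cache_dir
    config["cpu_offload"] = cpu_offload
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    pipe = None  # next get_pipeline() call reloads with the new settings
    return config
```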

10. NOTES FOR CANVAS/INPAINTING EXTENSION

Additional Endpoints Needed

| Endpoint | Purpose |
| --- | --- |
| POST /inpaint | Inpainting with mask |
| POST /img2img | Image-to-image generation |
| POST /outpaint | Extend canvas beyond boundaries |

Additional Request Fields

class InpaintRequest(BaseModel):
    prompt: str
    image: str          # Base64 encoded source image
    mask: str           # Base64 encoded mask (white = regenerate)
    height: int = 1024
    width: int = 1024
    steps: int = 8
    guidance_scale: float = 0.0
    seed: int = -1
    strength: float = 0.8  # How much to change (0-1)
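The base64 image and mask fields may arrive either bare or as full data URLs, depending on how the frontend serializes the canvas. A small decoding sketch (stdlib only; the returned bytes would then be opened with PIL via Image.open(BytesIO(raw))):

```python
import base64

def decode_image_field(data):
    """Decode an `image` or `mask` request field.

    Accepts either a bare base64 string or a full data URL
    ("data:image/png;base64,...") and returns the raw image bytes.
    """
    if data.startswith("data:"):
        data = data.split(",", 1)[1]  # drop the "data:image/png;base64" prefix
    return base64.b64decode(data)
```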

Canvas State Additions

// Layers
layers: Layer[] = []
activeLayerId: string | null = null

// Canvas
canvasWidth: number = 1024
canvasHeight: number = 1024
zoom: number = 1.0
panX: number = 0
panY: number = 0

// Tools
activeTool: 'brush' | 'eraser' | 'select' | 'move' | 'pan' = 'brush'
brushSize: number = 50
brushHardness: number = 100

// Mask
maskCanvas: HTMLCanvasElement | null = null
showMask: boolean = true
maskOpacity: number = 0.5

// History
undoStack: CanvasState[] = []
redoStack: CanvasState[] = []
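The two-stack undo/redo scheme above is language-agnostic; a sketch in Python for brevity (the real implementation would live in the canvas store, with serialized canvas snapshots as the states):

```python
class History:
    """Snapshot-based undo/redo: each edit pushes a full canvas state."""

    def __init__(self, initial):
        self.undo_stack = [initial]
        self.redo_stack = []

    @property
    def current(self):
        return self.undo_stack[-1]

    def push(self, state):
        self.undo_stack.append(state)
        self.redo_stack.clear()  # a new edit invalidates the redo branch

    def undo(self):
        if len(self.undo_stack) > 1:  # never pop the initial state
            self.redo_stack.append(self.undo_stack.pop())
        return self.current

    def redo(self):
        if self.redo_stack:
            self.undo_stack.append(self.redo_stack.pop())
        return self.current
```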

InvokeAI-Style Features to Implement

  1. Layer System

    • Multiple image layers
    • Layer visibility toggles
    • Layer opacity controls
    • Layer reordering (drag & drop)
    • Layer merge/flatten
  2. Selection Tools

    • Rectangular selection
    • Lasso selection
    • Magic wand (color-based)
    • Selection invert/expand/contract
  3. Brush/Mask Tools

    • Variable size brush
    • Soft/hard edge options
    • Mask painting mode
    • Quick mask visualization
  4. Canvas Navigation

    • Pan (middle mouse / space+drag)
    • Zoom (scroll wheel / +/- keys)
    • Fit to screen
    • Reset view
  5. Inpainting Workflow

    • Paint mask over areas to regenerate
    • Mask feathering options
    • Preserve composition checkbox
    • Mask invert option
  6. History

    • Undo/redo stack
    • History panel with thumbnails
    • Snapshot system
  7. Export Options

    • Export single layer
    • Export merged/flattened
    • Export with transparency
    • Export mask only

11. RECOMMENDED LIBRARIES

Canvas

  • Konva.js + svelte-konva - Canvas library with built-in layer support, transforms, events
  • Fabric.js - Alternative with more object manipulation features

UI Components

  • shadcn-svelte - Tailwind-based UI components (optional)
  • lucide-svelte - Icon library

Utilities

  • tailwind-merge or clsx - Class name utilities

12. FILE STRUCTURE (Svelte + Vite)

project-root/
├── backend/
│   ├── main.py           # FastAPI application
│   ├── routes/
│   │   ├── generate.py   # Generation endpoints
│   │   ├── inpaint.py    # Inpainting endpoints
│   │   └── settings.py   # Settings endpoints
│   ├── services/
│   │   ├── pipeline.py   # Pipeline management
│   │   └── image.py      # Image processing utilities
│   └── config.py         # Configuration management
│
├── frontend/
│   ├── src/
│   │   ├── lib/
│   │   │   ├── components/
│   │   │   │   ├── Canvas.svelte
│   │   │   │   ├── LayerPanel.svelte
│   │   │   │   ├── ToolPanel.svelte
│   │   │   │   └── ...
│   │   │   ├── stores/
│   │   │   │   ├── canvas.js     # Canvas state
│   │   │   │   ├── layers.js     # Layer management
│   │   │   │   ├── tools.js      # Tool state
│   │   │   │   └── settings.js   # App settings
│   │   │   └── utils/
│   │   │       ├── api.js        # API communication
│   │   │       ├── canvas.js     # Canvas utilities
│   │   │       └── image.js      # Image processing
│   │   ├── App.svelte            # Main app component
│   │   ├── main.js               # Entry point
│   │   └── app.css               # Tailwind imports
│   ├── index.html
│   ├── vite.config.js
│   ├── tailwind.config.js
│   └── package.json
│
├── config.json
├── requirements.txt
└── README.md