Z-Image-Turbo Implementation Notes

Reference document for building a Svelte-based image generation app with canvas/inpainting capabilities.

Stack Decisions

  • Frontend: Svelte 5 + Vite (no SvelteKit, no TypeScript)
  • Styling: Tailwind CSS (possibly with shadcn-svelte)
  • Language: JavaScript only
  • Backend: FastAPI (Python)
  • UI Design: Custom/bespoke (not replicating original)

1. BACKEND ARCHITECTURE

Core Setup (FastAPI)

from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
import uvicorn
import torch
import json
import os
from io import BytesIO
import base64

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Restrict in production
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Global pipeline - lazy loaded
pipe = None

Configuration Persistence

CONFIG_FILE = "config.json"

def load_config():
    if os.path.exists(CONFIG_FILE):
        try:
            with open(CONFIG_FILE, "r") as f:
                return json.load(f)
        except Exception:
            pass
    return {
        "cache_dir": None,
        "model_id": "Tongyi-MAI/Z-Image-Turbo",
        "cpu_offload": False
    }

def save_config(config):
    with open(CONFIG_FILE, "w") as f:
        json.dump(config, f, indent=4)

Pipeline Management

def get_pipeline():
    global pipe
    if pipe is None:
        from diffusers import ZImagePipeline

        device = "cuda" if torch.cuda.is_available() else "cpu"
        dtype = torch.bfloat16 if device == "cuda" else torch.float32

        config = load_config()
        pipe = ZImagePipeline.from_pretrained(
            config['model_id'],
            torch_dtype=dtype,
            low_cpu_mem_usage=False,
            cache_dir=config.get('cache_dir')
        )

        if config.get("cpu_offload", False) and device == "cuda":
            pipe.enable_model_cpu_offload()
        else:
            pipe.to(device)

    return pipe

API Endpoints

Request/Response Models

class GenerateRequest(BaseModel):
    prompt: str
    height: int = 1024
    width: int = 1024
    steps: int = 8
    guidance_scale: float = 0.0
    seed: int = -1

class SettingsRequest(BaseModel):
    cache_dir: str
    cpu_offload: bool = False

Endpoints

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /health | GET | Returns {"status": "ok"} |
| /settings | GET | Returns current config |
| /settings/model-path | POST | Updates config, sets pipe = None to force reload |
| /generate | POST | Main generation endpoint |

Generation Endpoint Logic

@app.post("/generate")
def generate(req: GenerateRequest):
    # Plain def rather than async def: FastAPI runs sync handlers in a
    # threadpool, so the blocking pipeline call doesn't stall the event loop
    # Validate dimensions
    if req.height % 16 != 0 or req.width % 16 != 0:
        raise HTTPException(400, "Height and Width must be divisible by 16")

    pipeline = get_pipeline()
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Handle seed
    generator = None
    if req.seed != -1:
        generator = torch.Generator(device).manual_seed(req.seed)

    # Generate
    image = pipeline(
        prompt=req.prompt,
        height=req.height,
        width=req.width,
        num_inference_steps=req.steps,
        guidance_scale=req.guidance_scale,
        generator=generator,
    ).images[0]

    # Convert to base64
    buffer = BytesIO()
    image.save(buffer, format="PNG")
    img_str = base64.b64encode(buffer.getvalue()).decode()

    return {"image": f"data:image/png;base64,{img_str}"}

2. MODEL DETAILS (Z-Image-Turbo)

Specifications

  • Model ID: Tongyi-MAI/Z-Image-Turbo
  • Parameters: 6 billion
  • Architecture: S3-DiT (Scalable Single-Stream Diffusion Transformer)
  • Text Encoder: Qwen 4B
  • VAE: Flux Autoencoder
  • Optimized Steps: 8 (distilled model)

Generation Parameters

| Parameter | Type | Range | Default | Notes |
| --- | --- | --- | --- | --- |
| prompt | str | Any | Required | Text description |
| height | int | 256-2048 | 1024 | Must be divisible by 16 |
| width | int | 256-2048 | 1024 | Must be divisible by 16 |
| num_inference_steps | int | 1-50 | 8 | More steps = higher quality, slower |
| guidance_scale | float | 0.0-10.0 | 0.0 | 0 = no guidance |
| generator | Generator | - | None | For seed control |
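The constraints in the table can be enforced in one place before the request ever touches the pipeline. A minimal sketch (the function name and error style are illustrative, not part of any API):

```python
def validate_generation_params(height, width, steps, guidance_scale):
    """Raise ValueError if any parameter falls outside the documented ranges."""
    errors = []
    for name, value in (("height", height), ("width", width)):
        if not 256 <= value <= 2048:
            errors.append(f"{name} must be between 256 and 2048")
        if value % 16 != 0:
            errors.append(f"{name} must be divisible by 16")
    if not 1 <= steps <= 50:
        errors.append("num_inference_steps must be between 1 and 50")
    if not 0.0 <= guidance_scale <= 10.0:
        errors.append("guidance_scale must be between 0.0 and 10.0")
    if errors:
        raise ValueError("; ".join(errors))
```

Collecting all violations into one message, instead of failing on the first, lets the UI report every problem in a single round trip.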

Memory Requirements

  • Recommended VRAM: 16GB+
  • CUDA Precision: bfloat16
  • CPU Precision: float32
  • CPU Offload: Available for low-VRAM GPUs (trades speed for memory)

3. GENERATION PARAMETERS (UI Controls Needed)

These are the parameters the UI needs to expose for image generation:

Inference Steps

  • Range: 1-50
  • Default: 8

Guidance Scale

  • Range: 0-10 (0.1 increments)
  • Default: 0.0

Dimensions

  • Width: 256-2048px (must be divisible by 16)
  • Height: 256-2048px (must be divisible by 16)

Useful Aspect Ratio Presets:

| Name | Ratio | Dimensions |
| --- | --- | --- |
| Square | 1:1 | 1024×1024 |
| Portrait | 3:4 | 896×1152 |
| Landscape | 4:3 | 1152×896 |
| Wide | 16:9 | 1344×768 |
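Arbitrary slider values won't generally satisfy the divisible-by-16 rule, so the UI can snap dimensions before sending them. A small helper sketch (the name is illustrative; the same logic works in JavaScript on the frontend):

```python
def snap_dimension(value, step=16, lo=256, hi=2048):
    """Round a requested dimension to the nearest multiple of `step`,
    clamped to the valid range."""
    snapped = round(value / step) * step
    return max(lo, min(hi, snapped))
```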

Seed

  • -1 = random
  • Any non-negative integer for reproducibility
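One way to handle the -1 convention on the backend is to resolve it to a concrete value up front, so the seed that was actually used can be returned to the UI and any random result reproduced later. A sketch (the function name is illustrative):

```python
import random

def resolve_seed(seed):
    """Turn the UI's seed value into a concrete seed.

    -1 means "random": draw one and return it so the caller can report
    which seed was actually used for the generation.
    """
    if seed == -1:
        return random.randint(0, 2**32 - 1)
    return seed
```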

4. STATE MANAGEMENT (Svelte Stores)

Core State

// Using Svelte writable stores (src/lib/stores/)

// generation.js
import { writable } from 'svelte/store'

export const prompt = writable('')
export const generatedImage = writable(null)  // data:image/png;base64,...
export const loading = writable(false)

export const settings = writable({
  steps: 8,
  guidance_scale: 0.0,
  width: 1024,
  height: 1024,
  seed: -1
})

API Communication Pattern

// src/lib/api.js
const API_BASE = 'http://localhost:8000'

export async function generateImage(prompt, settings) {
  const res = await fetch(`${API_BASE}/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt, ...settings })
  })

  if (!res.ok) {
    const error = await res.json()
    throw new Error(error.detail || 'Generation failed')
  }

  return res.json()  // { image: "data:image/png;base64,..." }
}

export async function getSettings() {
  const res = await fetch(`${API_BASE}/settings`)
  return res.json()
}

export async function updateSettings(settings) {
  const res = await fetch(`${API_BASE}/settings/model-path`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(settings)
  })
  return res.json()
}

5. PYTHON DEPENDENCIES

# requirements.txt
fastapi
uvicorn
torch
transformers
accelerate
protobuf
sentencepiece
git+https://github.com/huggingface/diffusers.git

Notes:

  • Diffusers must be installed from git to get ZImagePipeline
  • PyTorch installation depends on CUDA version
  • Recommend Python 3.8+

6. FRONTEND SETUP (Svelte + Vite + Tailwind)

Project Initialization

# Create Svelte project with Vite
npm create vite@latest frontend -- --template svelte

cd frontend

# Install Tailwind
npm install -D tailwindcss postcss autoprefixer
npx tailwindcss init -p

# Install shadcn-svelte (optional - for UI components)
npx shadcn-svelte@latest init

# Icons
npm install lucide-svelte

# Canvas library (for later phases)
npm install konva svelte-konva

Recommended Dependencies

{
  "dependencies": {
    "lucide-svelte": "^0.460.0",
    "konva": "^9.3.0",
    "svelte-konva": "^1.0.0"
  },
  "devDependencies": {
    "svelte": "^5.0.0",
    "vite": "^6.0.0",
    "tailwindcss": "^3.4.0"
  }
}

7. ERROR HANDLING

Backend Pattern

# Validation (400)
if req.height % 16 != 0:
    raise HTTPException(400, "Height must be divisible by 16")

# Server Error (500)
try:
    result = operation()
except Exception as e:
    print(f"Error: {e}")
    raise HTTPException(500, str(e))

Frontend Pattern

try {
  const res = await fetch(...)
  if (!res.ok) throw new Error('Failed')
  // success handling
} catch (e) {
  console.error(e)
  alert('User-friendly error message')
} finally {
  loading.set(false)  // `loading` is the writable store from section 4
}

8. DOWNLOAD FUNCTIONALITY

function downloadImage(image) {
  // `image` is the data URI from the generatedImage store
  const link = document.createElement('a')
  link.href = image
  link.download = `z-image-${Date.now()}.png`
  link.click()
}

9. CONFIGURATION LIFECYCLE

  1. Load: Read config.json on startup, fallback to defaults
  2. Display: Populate settings UI from config
  3. Update: POST to /settings/model-path with new values
  4. Persist: Backend saves to config.json
  5. Reload: Set pipe = None to force model reload on next request
  6. Feedback: Alert user that model will reload
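Steps 3-5 of the lifecycle can be condensed into one helper. A minimal sketch with the pipeline reset stubbed out (the function name is illustrative):

```python
import json
import os

pipe = object()  # stands in for the loaded pipeline

def apply_model_settings(config_path, cache_dir, cpu_offload=False):
    """Merge new values into the config, persist it, and force a reload."""
    global pipe
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)
    config["cache_dir"] = cache_dir
    config["cpu_offload"] = cpu_offload
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    pipe = None  # next get_pipeline() call reloads with the new settings
    return config
```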

10. NOTES FOR CANVAS/INPAINTING EXTENSION

Additional Endpoints Needed

| Endpoint | Purpose |
| --- | --- |
| POST /inpaint | Inpainting with mask |
| POST /img2img | Image-to-image generation |
| POST /outpaint | Extend canvas beyond boundaries |

Additional Request Fields

class InpaintRequest(BaseModel):
    prompt: str
    image: str          # Base64 encoded source image
    mask: str           # Base64 encoded mask (white = regenerate)
    height: int = 1024
    width: int = 1024
    steps: int = 8
    guidance_scale: float = 0.0
    seed: int = -1
    strength: float = 0.8  # How much to change (0-1)
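The base64 image and mask fields may arrive either bare or as full data URLs, depending on how the frontend serializes the canvas. A small decoding sketch (stdlib only; the returned bytes would then be opened with PIL via Image.open(BytesIO(raw))):

```python
import base64

def decode_image_field(data):
    """Decode an `image` or `mask` request field.

    Accepts either a bare base64 string or a full data URL
    ("data:image/png;base64,...") and returns the raw image bytes.
    """
    if data.startswith("data:"):
        data = data.split(",", 1)[1]  # drop the "data:image/png;base64" prefix
    return base64.b64decode(data)
```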

Canvas State Additions

// Layers
layers: Layer[] = []
activeLayerId: string | null = null

// Canvas
canvasWidth: number = 1024
canvasHeight: number = 1024
zoom: number = 1.0
panX: number = 0
panY: number = 0

// Tools
activeTool: 'brush' | 'eraser' | 'select' | 'move' | 'pan' = 'brush'
brushSize: number = 50
brushHardness: number = 100

// Mask
maskCanvas: HTMLCanvasElement | null = null
showMask: boolean = true
maskOpacity: number = 0.5

// History
undoStack: CanvasState[] = []
redoStack: CanvasState[] = []
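The two-stack undo/redo scheme above is language-agnostic; a sketch in Python for brevity (the real implementation would live in the canvas store, with serialized canvas snapshots as the states):

```python
class History:
    """Snapshot-based undo/redo: each edit pushes a full canvas state."""

    def __init__(self, initial):
        self.undo_stack = [initial]
        self.redo_stack = []

    @property
    def current(self):
        return self.undo_stack[-1]

    def push(self, state):
        self.undo_stack.append(state)
        self.redo_stack.clear()  # a new edit invalidates the redo branch

    def undo(self):
        if len(self.undo_stack) > 1:  # never pop the initial state
            self.redo_stack.append(self.undo_stack.pop())
        return self.current

    def redo(self):
        if self.redo_stack:
            self.undo_stack.append(self.redo_stack.pop())
        return self.current
```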

InvokeAI-Style Features to Implement

  1. Layer System

    • Multiple image layers
    • Layer visibility toggles
    • Layer opacity controls
    • Layer reordering (drag & drop)
    • Layer merge/flatten
  2. Selection Tools

    • Rectangular selection
    • Lasso selection
    • Magic wand (color-based)
    • Selection invert/expand/contract
  3. Brush/Mask Tools

    • Variable size brush
    • Soft/hard edge options
    • Mask painting mode
    • Quick mask visualization
  4. Canvas Navigation

    • Pan (middle mouse / space+drag)
    • Zoom (scroll wheel / +/- keys)
    • Fit to screen
    • Reset view
  5. Inpainting Workflow

    • Paint mask over areas to regenerate
    • Mask feathering options
    • Preserve composition checkbox
    • Mask invert option
  6. History

    • Undo/redo stack
    • History panel with thumbnails
    • Snapshot system
  7. Export Options

    • Export single layer
    • Export merged/flattened
    • Export with transparency
    • Export mask only

11. RECOMMENDED LIBRARIES

Canvas

  • Konva.js + svelte-konva - Canvas library with built-in layer support, transforms, events
  • Fabric.js - Alternative with more object manipulation features

UI Components

  • shadcn-svelte - Tailwind-based UI components (optional)
  • lucide-svelte - Icon library

Utilities

  • tailwind-merge or clsx - Class name utilities

12. FILE STRUCTURE (Svelte + Vite)

project-root/
├── backend/
│   ├── main.py           # FastAPI application
│   ├── routes/
│   │   ├── generate.py   # Generation endpoints
│   │   ├── inpaint.py    # Inpainting endpoints
│   │   └── settings.py   # Settings endpoints
│   ├── services/
│   │   ├── pipeline.py   # Pipeline management
│   │   └── image.py      # Image processing utilities
│   └── config.py         # Configuration management
│
├── frontend/
│   ├── src/
│   │   ├── lib/
│   │   │   ├── components/
│   │   │   │   ├── Canvas.svelte
│   │   │   │   ├── LayerPanel.svelte
│   │   │   │   ├── ToolPanel.svelte
│   │   │   │   └── ...
│   │   │   ├── stores/
│   │   │   │   ├── canvas.js     # Canvas state
│   │   │   │   ├── layers.js     # Layer management
│   │   │   │   ├── tools.js      # Tool state
│   │   │   │   └── settings.js   # App settings
│   │   │   └── utils/
│   │   │       ├── api.js        # API communication
│   │   │       ├── canvas.js     # Canvas utilities
│   │   │       └── image.js      # Image processing
│   │   ├── App.svelte            # Main app component
│   │   ├── main.js               # Entry point
│   │   └── app.css               # Tailwind imports
│   ├── index.html
│   ├── vite.config.js
│   ├── tailwind.config.js
│   └── package.json
│
├── config.json
├── requirements.txt
└── README.md