Ideogram 4 on Apple Silicon (MPS)

Run Ideogram 4 on a MacBook with MPS — no CUDA, no NVIDIA GPU needed.

FP8 weights are dequantized to bf16 on CPU, then the full model is loaded onto MPS. Three non-obvious tricks make this work: a monkey-patch around MPS's missing ndtri op, manual fp8→bf16 dequant that avoids bitsandbytes entirely, and loading the Qwen3-VL text encoder without its vision components.
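The fp8→bf16 step can be illustrated without any framework. The sketch below decodes FP8 bytes by hand. It assumes the checkpoint uses the E4M3FN layout (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7) and an optional per-tensor scale. Both are assumptions about the weights, and the function names are hypothetical. The real script operates on whole tensors rather than single bytes.

```python
import math

def decode_fp8_e4m3fn(byte: int) -> float:
    """Decode one FP8 E4M3FN byte into a Python float."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F
    mantissa = byte & 0x07
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan  # E4M3FN has no infinities; this bit pattern is NaN
    if exponent == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias
        return sign * (mantissa / 8.0) * 2.0 ** (1 - 7)
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)

def dequantize(raw: bytes, scale: float = 1.0) -> list[float]:
    """Decode every byte and apply a (hypothetical) per-tensor scale."""
    return [decode_fp8_e4m3fn(b) * scale for b in raw]

print(dequantize(bytes([0x38, 0xB8, 0x7E, 0x01])))
# prints [1.0, -1.0, 448.0, 0.001953125]
```

Every finite E4M3FN value fits exactly in bf16, which has more exponent and mantissa bits, so the format conversion itself is lossless. Any rounding would come from applying a scale afterwards.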

A WebUI (FastAPI + React + SQLite) is included for interactive use with structured prompt composition, generation progress tracking, image history, and prompt history.

Quick start

CLI (single image)

# 1. Create venv and install deps
python3 -m venv .venv && source .venv/bin/activate
pip install git+https://github.com/ideogram-oss/ideogram4.git
pip install -r server/requirements.txt

# 2. Log in and accept the gated repo terms at
#    https://huggingface.co/ideogram-ai/ideogram-4-fp8
hf auth login

# 3. Generate
python ideogram4_mps.py \
  --prompt-file examples/caption.json \
  --resolution 1024 \
  --preset V4_QUALITY_48 \
  --out examples/result.png

WebUI (full stack)

# 1. Create venv and install Python deps
python3 -m venv .venv && source .venv/bin/activate
pip install git+https://github.com/ideogram-oss/ideogram4.git
pip install -r server/requirements.txt

# 2. Install Node deps
cd webui && pnpm install && cd ..

# 3. Configure Quick Prompt (optional but recommended)
cp .env.example .env
# Edit .env: set IDEOGRAM4_MAGIC_PROMPT_API_KEY

# 4. Log in to HuggingFace
hf auth login

# 5. Launch
./run.sh

Then open http://localhost:5173.

Note: ideogram4 is not published on PyPI. pip install git+... pulls it directly from the official GitHub repo. huggingface-cli login is deprecated — use hf auth login instead.

Model download

The model weights (~26 GB, FP8 safetensors) are not included in this repo. They are downloaded automatically from HuggingFace on first pipeline load — you don't need to run a separate download command. Weights are cached to ~/.cache/huggingface/hub/.

To pre-download without running inference:

source .venv/bin/activate
hf download ideogram-ai/ideogram-4-fp8

The download above is optional. The model auto-downloads on first load either way.

Prerequisites

License: Accept the terms at https://huggingface.co/ideogram-ai/ideogram-4-fp8 ("Agree and access repository" button).
Token permissions: If using a fine-grained token, you must enable "Read access to contents of all public gated repos you can access" in your token settings. The simplest option is to create a Read-scoped token (not fine-grained) and use it with hf auth login.

Architecture

Browser (localhost:5173 by default)
    │
    │ HTTP (Vite dev proxy /api → localhost:8000 by default)
    ▼
FastAPI Server (server/main.py, port 8000 by default)
    │
    ├── model_daemon.py    ← model lifecycle, LoRA, get_pipeline()
    │     ├── LoRA apply/remove (server/apply_lora.py)
    │     │     Lokr / standard weight merge → load_state_dict()
    │     └── Ideogram4Pipeline (MPS)
    │           FP8 → bf16 on CPU → MPS
    │           Qwen3-VL text encoder (text-only)
    │           Conditional + Unconditional transformers
    │           VAE autoencoder
    │
    ├── magic_prompt.py    ← POST /api/magic-prompt → commandcode.ai
    │
    ├── config.py          ← env var config (paths, ports, defaults)
    │
    ├── db.py              ← SQLite (images, prompts, form state)
    │
    └── logger.py          ← structured logs → logs/

Key ports

Default port	Variable	Process	Role
8000	`IDEOGRAM4_SERVER_PORT`	`main.py`	FastAPI server, pipeline owner, SQLite
5173	`IDEOGRAM4_WEBUI_PORT`	Vite dev server	React WebUI with proxy to `IDEOGRAM4_SERVER_PORT`

Startup flow (`./run.sh`)

Installs Python + Node dependencies
Loads .env from project root (if present)
Stops existing processes on the configured server/webui ports (graceful stop first, force stop only if needed)
Starts server and webui on the configured ports in parallel
Cleans up all processes on SIGINT / SIGTERM / EXIT
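The "graceful stop first, force stop only if needed" step follows a common pattern. The sketch below shows it in Python for a child process handle. This is an illustration, not the contents of run.sh, which is a shell script and locates processes by port. The function name and the grace period are assumptions.

```python
import subprocess
import sys

def stop_process(proc: subprocess.Popen, grace_seconds: float = 5.0) -> str:
    """Ask a child process to exit, and force it only if it ignores the request."""
    if proc.poll() is not None:
        return "already-exited"
    proc.terminate()  # SIGTERM on POSIX: the graceful request
    try:
        proc.wait(timeout=grace_seconds)
        return "graceful"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
        proc.wait()
        return "forced"

if __name__ == "__main__":
    # Stand-in for a long-running server process.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_process(child))
```

Waiting with a timeout before escalating gives the server a chance to release the model and close SQLite cleanly.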

Manual startup (for debugging)

# Terminal 1: API server (load env vars first)
set -a && source .env && set +a
python server/main.py

# Terminal 2: WebUI (load the same env vars so the port override applies)
set -a && source .env && set +a
cd webui && pnpm dev -- --port "${IDEOGRAM4_WEBUI_PORT:-5173}"

WebUI features

Model Panel — Load / Unload controls with live status indicator (idle / loading / loaded)
Quick Prompt — Natural language → structured caption via LLM (MiniMaxAI/MiniMax-M3). Supports text-only and text+image (drag-drop, multi-image). Auto-populates all form fields including style settings.
Caption Editor — Tabbed interface: structured form (scene, style, composition) or raw JSON, with bidirectional real-time sync
Raw JSON mode — If raw JSON is present, generation submits that JSON object directly rather than rebuilding it from form fields
Style Settings — Aesthetics, lighting, medium (photograph / illustration / 3d_render / painting / graphic_design), camera or art style, color palette
Composition — Background description + dynamic element list (type: obj/text, bbox, description)
LoRA — Apply/remove LoRA weights (Lokr or standard format) with strength control. Auto-detected from models/loras/ (gitignored).
Generation Settings — 7 aspect ratio presets with visual preview, custom width/height (128–2048px, snapped to 128), quality preset (Turbo / Default / Quality), seed, estimated generation time
Status Overlay — Progress bar with percentage during generation, error state with dismiss
Prompt History — Sidebar with persistent URLs (/history/$promptId), click to restore form + view result, auto-refresh on generation
Auto-save — Form state persisted via server API (SQLite) with localStorage fallback

Full WebUI spec: docs/WEBUI_SPEC.md (Korean)
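The custom width/height rule in Generation Settings (128–2048px, snapped to 128) can be sketched as below. Whether the WebUI rounds to the nearest multiple or rounds down is not stated, so rounding to nearest with ties going up is an assumption, and `snap_side` is a hypothetical name.

```python
MIN_SIDE = 128
MAX_SIDE = 2048
STEP = 128

def snap_side(pixels: int) -> int:
    """Round to the nearest multiple of STEP (ties go up), then clamp to the allowed range."""
    snapped = (pixels + STEP // 2) // STEP * STEP
    return max(MIN_SIDE, min(MAX_SIDE, snapped))

for requested in (100, 1000, 1250, 5000):
    print(requested, "->", snap_side(requested))
```

Integer arithmetic is used instead of `round()` because Python rounds exact halves to the nearest even value, which would snap 192 down to 128.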

CLI options

Flag	Default	Description
`--prompt`	—	JSON caption string (inline)
`--prompt-file`	—	File containing JSON caption
`--repo`	`ideogram-ai/ideogram-4-fp8`	HuggingFace repo ID
`--width`	—	Output width, multiple of 16 (overrides `--resolution`)
`--height`	—	Output height, multiple of 16 (overrides `--resolution`)
`--resolution`	`1024`	Square output (multiple of 16). Ignored if `--width`/`--height` set
`--preset`	`V4_QUALITY_48`	`V4_QUALITY_48` / `V4_DEFAULT_20` / `V4_TURBO_12`
`--seed`	`20260608`	Random seed
`--format`	`png`	Output format: `png` / `webp` / `jpeg`
`--quality`	—	Lossy quality 1-100 (webp/jpeg only; default: lossless)
`--lora`	—	Path to LoRA `.safetensors` to apply (Lokr or standard)
`--lora-strength`	`0.6`	LoRA merge strength
`--out`	required	Output image path
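How `--width`, `--height`, and `--resolution` interact can be sketched with argparse. This is not the parser from ideogram4_mps.py. In particular, the behavior when only one of `--width` or `--height` is given is not documented, so falling back to `--resolution` for the missing side is an assumption.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="size-resolution sketch")
    parser.add_argument("--width", type=int)
    parser.add_argument("--height", type=int)
    parser.add_argument("--resolution", type=int, default=1024)
    return parser

def resolve_size(args: argparse.Namespace) -> tuple[int, int]:
    """Return (width, height), letting explicit sides override --resolution."""
    # Assumption: a missing side falls back to --resolution.
    width = args.width if args.width is not None else args.resolution
    height = args.height if args.height is not None else args.resolution
    for name, value in (("width", width), ("height", height)):
        if value <= 0 or value % 16 != 0:
            raise ValueError(f"{name} must be a positive multiple of 16, got {value}")
    return width, height

print(resolve_size(build_parser().parse_args(["--width", "832", "--height", "1248"])))
# prints (832, 1248)
```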

JSON caption format

Ideogram 4 needs structured JSON captions. See examples/caption.json for a complete example. Minimal example:

{
  "compositional_deconstruction": {
    "background": "Seoul alleyway at dusk, warm neon signs, wet pavement",
    "elements": [
      {"type": "obj", "desc": "A young Korean woman holding a sign"},
      {"type": "text", "desc": "The sign reads '사랑합니다' in clean Hangul"}
    ]
  }
}

Full format reference: https://github.com/ideogram-oss/ideogram4/blob/main/docs/prompting.md
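A caption file like the minimal example can be generated from a script. The sketch below covers only the fields shown in that example (`background`, `elements`, `type`, `desc`). The helper name `make_caption` is hypothetical, and the full format has more fields than this handles.

```python
import json

def make_caption(background: str, elements: list[tuple[str, str]]) -> dict:
    """Build a caption dict; each element is a (type, desc) pair."""
    allowed_types = {"obj", "text"}
    items = []
    for kind, desc in elements:
        if kind not in allowed_types:
            raise ValueError(f"unknown element type: {kind!r}")
        items.append({"type": kind, "desc": desc})
    return {"compositional_deconstruction": {"background": background, "elements": items}}

caption = make_caption(
    "Seoul alleyway at dusk, warm neon signs, wet pavement",
    [("obj", "A young Korean woman holding a sign"),
     ("text", "The sign reads '사랑합니다' in clean Hangul")],
)
# ensure_ascii=False keeps the Hangul readable instead of \u escapes
caption_json = json.dumps(caption, ensure_ascii=False, indent=2)
print(caption_json)
```

Writing `caption_json` to a file (with UTF-8 encoding) gives something that can be passed to `--prompt-file`.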

API endpoints

Method	Path	Description
`GET`	`/api/model/status`	Model state (`idle` / `loading` / `loaded`)
`POST`	`/api/model/load`	Trigger model load
`POST`	`/api/model/unload`	Unload model from memory
`POST`	`/api/magic-prompt`	Natural language → structured caption via LLM
`POST`	`/api/generate`	Submit generation task (JSON caption + params). Local single-generation slot; returns `409` if another generation is running
`GET`	`/api/status/{task_id}`	Poll generation progress and result
`POST`	`/api/verify`	Validate a JSON caption without generating
`GET`	`/api/lora/status`	List available LoRAs + currently applied
`POST`	`/api/lora/apply`	Apply LoRA by name with strength
`POST`	`/api/lora/remove`	Restore original weights
`GET`	`/api/images`	List generated images
`DELETE`	`/api/images/{id}`	Delete a generated image
`GET`	`/api/prompts`	List saved prompts
`GET`	`/api/prompts/{id}`	Get single prompt by ID
`DELETE`	`/api/prompts/{id}`	Delete a saved prompt
`GET`	`/api/form`	Load last saved form state
`POST`	`/api/form`	Save form state

Runtime concurrency

This is a local single-user app. Model load, unload, LoRA apply/remove, and generation share one in-process pipeline and are protected by a pipeline operation lock. Generation runs in a daemon thread, but only one generation is accepted at a time; extra /api/generate requests return 409 instead of queuing unbounded work. Completed task status entries are kept briefly for polling and cleaned up after about one hour.
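The single-generation slot can be approximated with a non-blocking lock. The following is a minimal sketch of the pattern, not the server's code: the class name is hypothetical, and returning 202 for an accepted request is an assumption (the source only specifies 409 for rejection).

```python
import threading

class GenerationSlot:
    """Single-generation slot: at most one job runs; extra requests are rejected."""

    def __init__(self) -> None:
        self._busy = threading.Lock()

    def submit(self, job) -> int:
        # Non-blocking acquire: fail fast instead of queuing behind the running job.
        if not self._busy.acquire(blocking=False):
            return 409

        def run() -> None:
            try:
                job()
            finally:
                self._busy.release()  # free the slot even if the job raises

        threading.Thread(target=run, daemon=True).start()
        return 202

if __name__ == "__main__":
    slot = GenerationSlot()
    release = threading.Event()
    print(slot.submit(release.wait))  # accepted: the slot was free
    print(slot.submit(lambda: None))  # rejected: the first job is still running
    release.set()
```

A plain `threading.Lock` is used rather than `RLock` because it is acquired in the request thread and released in the worker thread, which `RLock` does not allow.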

Memory & speed

Common baseline (V4_QUALITY_48):

Disk: ~26 GB model weights (FP8 safetensors)
Total model params: ~26.8B (2× 9.3B transformers + 8B text encoder + VAE)
Peak memory (M5 Max, no swap): ~50 GB
Peak memory (M1 Max 64 GB, heavy swap): 63–68 GB
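The disk and memory figures are roughly what the parameter count predicts. The check below assumes every parameter is stored as 1 byte (FP8) on disk and 2 bytes (bf16) in memory, and it ignores activations, the MPS allocator, and any layers kept at other precisions, so it is a lower-bound estimate only.

```python
GIB = 2**30

total_params = 26.8e9          # total parameter count quoted above
fp8_bytes = total_params * 1   # 1 byte per FP8 weight
bf16_bytes = total_params * 2  # 2 bytes per bf16 weight

print(f"FP8 weights:  {fp8_bytes / GIB:.1f} GiB")
print(f"bf16 weights: {bf16_bytes / GIB:.1f} GiB")
```

This gives about 25 GiB for the FP8 weights and about 50 GiB for the bf16 weights, in line with the ~26 GB on disk and ~50 GB peak reported here.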

M5 Max (128 GB unified memory)

Resolution	Load	Generation	Peak MPS mem
1024×1024	~197 s	~408 s	~50 GB

Pipeline load breakdown

All times from bench_load.py run on each machine. M5 Max numbers are with PYTORCH_MPS_FAST_MATH=1.

Step	M5 Max (128 GB)	M1 Max (64 GB)
Text encoder (CPU dequant → MPS)	77 s	128 s
Conditional transformer (9.3B)	74 s	84 s
Unconditional transformer (9.3B)	38 s	84 s
VAE	2 s	19 s
MPSGraph warmup (first inference)	5 s	88 s
Pipeline load total	197 s	315 s
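The totals can be cross-checked against the per-step times. The snippet below sums the rows of the table, with and without the warmup step. The dictionary keys are shorthand labels chosen for this example.

```python
# Step times in seconds, copied from the table above.
steps = {
    "M5 Max": {"text_encoder": 77, "conditional": 74, "unconditional": 38, "vae": 2, "warmup": 5},
    "M1 Max": {"text_encoder": 128, "conditional": 84, "unconditional": 84, "vae": 19, "warmup": 88},
}
reported_total = {"M5 Max": 197, "M1 Max": 315}

sums = {}
for chip, times in steps.items():
    with_warmup = sum(times.values())
    without_warmup = with_warmup - times["warmup"]
    sums[chip] = (with_warmup, without_warmup)
    print(chip, "with warmup:", with_warmup, "without:", without_warmup,
          "reported:", reported_total[chip])
```

The M1 Max total equals the sum without warmup (315 s; 403 s with it), while the M5 Max total is closest to the sum with warmup (196 s; 191 s without). The table does not say which convention applies, so the totals are best read as approximate.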

M1 Max (64 GB unified memory)

Cross-chip comparison

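All at V4_QUALITY_48, same caption prompt. Only 1024² generation was measured on both machines. The 512² M5 Max time is an estimate derived from the M1 Max resolution scaling, so the identical ratio at both resolutions is a consequence of that estimate, not independent evidence that the slowdown is resolution-independent.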

Metric	M5 Max (128 GB)	M1 Max (64 GB)	Ratio (M1/M5)
Pipeline load	197 s	315 s	1.6×
Generation 1024²	408 s	2240 s	5.5×
Generation 512²	~149 s*	818 s	5.5×
Peak memory 1024²	~50 GB	68.4 GB	swap
Peak memory 512²	—	63.7 GB	swap

* 512² on M5 Max is estimated (408 / 2.74 scaling based on M1 resolution ratio).
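The ratios and the 512² estimate in the table can be reproduced from the measured times. The variable names below are illustrative.

```python
m5 = {"load": 197, "gen_1024": 408}                   # seconds, measured
m1 = {"load": 315, "gen_1024": 2240, "gen_512": 818}  # seconds, measured

load_ratio = m1["load"] / m5["load"]
gen_ratio_1024 = m1["gen_1024"] / m5["gen_1024"]
resolution_scaling = m1["gen_1024"] / m1["gen_512"]   # M1 Max: 1024² time / 512² time
m5_gen_512_estimate = m5["gen_1024"] / resolution_scaling

print(f"load ratio:          {load_ratio:.1f}x")
print(f"generation ratio:    {gen_ratio_1024:.1f}x")
print(f"resolution scaling:  {resolution_scaling:.2f}")
print(f"M5 Max 512² estimate: {m5_gen_512_estimate:.0f} s")
```

Because the M5 Max 512² value is computed from the M1 Max scaling, dividing the M1 Max 512² time by it returns the 1024² ratio exactly. Measuring 512² directly on the M5 Max would be needed to test whether the ratio really holds across resolutions.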

Analysis

Pipeline load is 1.6× slower on M1 Max. Load time is dominated by CPU dequantization and transfer to MPS (the 8B text encoder is the largest single step), not by GPU compute.
Generation at 1024² is ~5.5× slower on M1 Max. This reflects the combined effect of lower MPS compute throughput, narrower memory bandwidth, and swap pressure on the 64 GB machine. The 512² ratio is not an independent measurement, because the M5 Max 512² time was derived from the M1 Max resolution scaling.
On M1 Max 64 GB, even 512×512 peaks at 63.7 GB, which leaves essentially no headroom within 64 GB of physical RAM once macOS itself is counted. Swap is unavoidable at any resolution with V4_QUALITY_48.

Recommendations for M1 Max 64 GB

Goal	Suggested config	Est. time
Best quality without swap	768×768 + V4_DEFAULT_20	~300–400 s
Fast generation	512×512 + V4_TURBO_12	~200–300 s
Maximum quality (accept swap)	1024×1024 + V4_QUALITY_48	~2240 s

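Upgrading to 96 GB+ unified memory should eliminate swap, since the measured peak on the M5 Max is ~50 GB, and is expected to bring generation time closer to the ~2–3× chip-gap ratio. This has not been measured on a 96 GB machine.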

Logging

All processes write structured runtime logs to logs/ (gitignored):

Process	Log file pattern	Content
CLI (`ideogram4_mps.py`)	`logs/ideogram4_mps-<ts>.log`	Download, dequant, loading, generation, output
Server (`main.py`)	`logs/server-<ts>.log`	HTTP requests, model lifecycle, generation, uvicorn

Logs include timestamps, severity level, and structured messages. Set IDEOGRAM4_LOG_DIR to override the default logs/ directory.

Generation metadata files such as examples/result.log also use the .log suffix. A .gitignore exception keeps them tracked in git, while runtime logs under logs/ stay ignored.

Configuration

All settings are read from environment variables at import time by server/config.py. run.sh auto-loads .env from the project root. See .env.example for all options.

Variable	Default	Description
`IDEOGRAM4_MAGIC_PROMPT_API_KEY`	—	LLM API key for Quick Prompt (required)
`IDEOGRAM4_MAGIC_PROMPT_MODEL`	`MiniMaxAI/MiniMax-M3`	LLM model for prompt expansion
`IDEOGRAM4_MAGIC_PROMPT_BASE_URL`	`https://api.commandcode.ai/provider/v1`	LLM provider base URL
`IDEOGRAM4_MAGIC_PROMPT_TIMEOUT`	`120`	LLM request timeout (seconds)
`IDEOGRAM4_MAGIC_PROMPT_MAX_TOKENS`	`16384`	LLM max response tokens
`IDEOGRAM4_MAGIC_PROMPT_TEMPERATURE`	`1.0`	LLM temperature
`IDEOGRAM4_SERVER_HOST`	`0.0.0.0`	FastAPI bind host
`IDEOGRAM4_SERVER_PORT`	`8000`	FastAPI listen port
`IDEOGRAM4_WEBUI_PORT`	`5173`	Vite WebUI dev server port used by `run.sh`
`IDEOGRAM4_SERVER_LOG_LEVEL`	`info`	Uvicorn log level
`IDEOGRAM4_CORS_ORIGINS`	`*`	CORS allow-origins
`IDEOGRAM4_MODEL_REPO`	`ideogram-ai/ideogram-4-fp8`	HuggingFace model repo
`IDEOGRAM4_DEFAULT_PRESET`	`V4_QUALITY_48`	Default generation preset
`IDEOGRAM4_DEFAULT_FORMAT`	`webp`	Default output format (server)
`IDEOGRAM4_DEFAULT_SEED`	`20260608`	Default generation seed
`IDEOGRAM4_IMAGE_QUALITY_WEBP`	`90`	WebP lossy quality
`IDEOGRAM4_IMAGE_QUALITY_JPEG`	`95`	JPEG lossy quality
`IDEOGRAM4_LOG_DIR`	`logs/`	Log output directory
`IDEOGRAM4_DB_PATH`	`server/data/ideogram4.db`	SQLite database path
`IDEOGRAM4_OUTPUT_DIR`	`server/output/`	Generated image output dir
`IDEOGRAM4_LORA_DIR`	`models/loras/`	LoRA weight files dir
`IDEOGRAM4_LORA_STRENGTH`	`0.6`	Default LoRA merge strength
`IDEOGRAM4_WARMUP_SIZE`	`64`	Warmup resolution (width=height)
`IDEOGRAM4_WARMUP_STEPS`	`2`	Warmup step count
`IDEOGRAM4_DB_QUERY_LIMIT`	`50`	Default row limit for DB queries
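Reading settings from environment variables with fallbacks can look like the sketch below. It is not the contents of server/config.py: the `ServerConfig` class and `load_config` function are hypothetical, and only four of the variables from the table are shown. Unlike the real module, which reads the environment at import time, this version takes the mapping as an argument so it is easy to test.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class ServerConfig:
    host: str
    port: int
    default_preset: str
    lora_strength: float

def load_config(env=None) -> ServerConfig:
    """Read settings from an environment mapping, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return ServerConfig(
        host=env.get("IDEOGRAM4_SERVER_HOST", "0.0.0.0"),
        port=int(env.get("IDEOGRAM4_SERVER_PORT", "8000")),
        default_preset=env.get("IDEOGRAM4_DEFAULT_PRESET", "V4_QUALITY_48"),
        lora_strength=float(env.get("IDEOGRAM4_LORA_STRENGTH", "0.6")),
    )

print(load_config({"IDEOGRAM4_SERVER_PORT": "9000"}))
```

Converting to `int` and `float` at load time means a malformed value fails at startup rather than in the middle of a generation.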

System requirements:

Apple Silicon Mac (M1/M2/M3/M4/M5)
Python 3.11+ with pip
Node.js 20+ with pnpm
PYTORCH_ENABLE_MPS_FALLBACK=1 (set automatically)
PYTORCH_MPS_FAST_MATH=1 (set automatically)
~50 GB unified memory for 1024×1024 V4_QUALITY_48 (smaller resolutions / presets may work with less)
~26 GB free disk space for FP8 model weights
HuggingFace account with access to the gated repo ideogram-ai/ideogram-4-fp8

Example output

Woman in hanbok, garden at dawn (V4_QUALITY_48, 1024×1024)

Hanok village at dusk (V4_QUALITY_48, 1024×1024)

Korean traditional folk pattern illustration (V4_QUALITY_48, 832×1248)

License

This project is MIT. The Ideogram 4 model weights are under the Ideogram 4 Non-Commercial License.
