Run Ideogram 4 on a MacBook with MPS — no CUDA, no NVIDIA GPU needed.
FP8 weights are dequantized to bf16 on CPU, then the full model is loaded onto
MPS. Three non-obvious tricks make this work: a monkey-patch around MPS's missing
ndtri op, manual fp8→bf16 dequant that avoids bitsandbytes entirely, and
loading the Qwen3-VL text encoder without its vision components.
A WebUI (FastAPI + React + SQLite) is included for interactive use with structured prompt composition, generation progress tracking, image history, and prompt history.
# 1. Create venv and install deps
python3 -m venv .venv && source .venv/bin/activate
pip install git+https://github.com/ideogram-oss/ideogram4.git
pip install -r server/requirements.txt
# 2. Log in and accept the gated repo terms at
# https://huggingface.co/ideogram-ai/ideogram-4-fp8
hf auth login
# 3. Generate
python ideogram4_mps.py \
--prompt-file examples/caption.json \
--resolution 1024 \
--preset V4_QUALITY_48 \
--out examples/result.png# 1. Create venv and install Python deps
python3 -m venv .venv && source .venv/bin/activate
pip install git+https://github.com/ideogram-oss/ideogram4.git
pip install -r server/requirements.txt
# 2. Install Node deps
cd webui && pnpm install && cd ..
# 3. Configure Quick Prompt (optional but recommended)
cp .env.example .env
# Edit .env: set IDEOGRAM4_MAGIC_PROMPT_API_KEY
# 4. Log in to HuggingFace
hf auth login
# 5. Launch
./run.shThen open http://localhost:5173.
Note:
ideogram4is not published on PyPI.pip install git+...pulls it directly from the official GitHub repo.huggingface-cli loginis deprecated — usehf auth logininstead.
The model weights (~26 GB, FP8 safetensors) are not included in this repo.
They are downloaded automatically from HuggingFace on first pipeline load — you
don't need to run a separate download command. Weights are cached to
~/.cache/huggingface/hub/.
To pre-download without running inference:
source .venv/bin/activate
hf download ideogram-ai/ideogram-4-fp8The download above is optional. The model auto-downloads on first load either way.
-
License: Accept the terms at https://huggingface.co/ideogram-ai/ideogram-4-fp8 ("Agree and access repository" button).
-
Token permissions: If using a fine-grained token, you must enable "Read access to contents of all public gated repos you can access" in your token settings. The simplest option is to create a Read-scoped token (not fine-grained) and use it with
hf auth login.
Browser (localhost:5173 by default)
│
│ HTTP (Vite dev proxy /api → localhost:8000 by default)
▼
FastAPI Server (server/main.py, port 8000 by default)
│
├── model_daemon.py ← model lifecycle, LoRA, get_pipeline()
│ ├── LoRA apply/remove (server/apply_lora.py)
│ │ Lokr / standard weight merge → load_state_dict()
│ └── Ideogram4Pipeline (MPS)
│ FP8 → bf16 on CPU → MPS
│ Qwen3-VL text encoder (text-only)
│ Conditional + Unconditional transformers
│ VAE autoencoder
│
├── magic_prompt.py ← POST /api/magic-prompt → commandcode.ai
│
├── config.py ← env var config (paths, ports, defaults)
│
├── db.py ← SQLite (images, prompts, form state)
│
└── logger.py ← structured logs → logs/
| Default port | Variable | Process | Role |
|---|---|---|---|
| 8000 | IDEOGRAM4_SERVER_PORT |
main.py |
FastAPI server, pipeline owner, SQLite |
| 5173 | IDEOGRAM4_WEBUI_PORT |
Vite dev server | React WebUI with proxy to IDEOGRAM4_SERVER_PORT |
- Installs Python + Node dependencies
- Loads
.envfrom project root (if present) - Stops existing processes on the configured server/webui ports (graceful stop first, force stop only if needed)
- Starts server and webui on the configured ports in parallel
- Cleans up all processes on SIGINT / SIGTERM / EXIT
# Load env vars, then:
# Terminal 1: API Server
set -a && source .env && set +a
python server/main.py
# Terminal 2: WebUI
cd webui && pnpm dev -- --port "${IDEOGRAM4_WEBUI_PORT:-5173}"- Model Panel — Load / Unload controls with live status indicator (idle / loading / loaded)
- Quick Prompt — Natural language → structured caption via LLM (MiniMaxAI/MiniMax-M3). Supports text-only and text+image (drag-drop, multi-image). Auto-populates all form fields including style settings.
- Caption Editor — Tabbed interface: structured form (scene, style, composition) or raw JSON, with bidirectional real-time sync
- Raw JSON mode — If raw JSON is present, generation submits that JSON object directly rather than rebuilding it from form fields
- Style Settings — Aesthetics, lighting, medium (photograph / illustration / 3d_render / painting / graphic_design), camera or art style, color palette
- Composition — Background description + dynamic element list (type: obj/text, bbox, description)
- LoRA — Apply/remove LoRA weights (Lokr or standard format) with strength control. Auto-detected from
models/loras/(gitignored). - Generation Settings — 7 aspect ratio presets with visual preview, custom width/height (128–2048px, snapped to 128), quality preset (Turbo / Default / Quality), seed, estimated generation time
- Status Overlay — Progress bar with percentage during generation, error state with dismiss
- Prompt History — Sidebar with persistent URLs (
/history/$promptId), click to restore form + view result, auto-refresh on generation - Auto-save — Form state persisted via server API (SQLite) with localStorage fallback
Full WebUI spec: docs/WEBUI_SPEC.md (Korean)
| Flag | Default | Description |
|---|---|---|
--prompt |
— | JSON caption string (inline) |
--prompt-file |
— | File containing JSON caption |
--repo |
ideogram-ai/ideogram-4-fp8 |
HuggingFace repo ID |
--width |
— | Output width, multiple of 16 (overrides --resolution) |
--height |
— | Output height, multiple of 16 (overrides --resolution) |
--resolution |
1024 |
Square output (multiple of 16). Ignored if --width/--height set |
--preset |
V4_QUALITY_48 |
V4_QUALITY_48 / V4_DEFAULT_20 / V4_TURBO_12 |
--seed |
20260608 |
Random seed |
--format |
png |
Output format: png / webp / jpeg |
--quality |
— | Lossy quality 1-100 (webp/jpeg only; default: lossless) |
--lora |
— | Path to LoRA .safetensors to apply (Lokr or standard) |
--lora-strength |
0.6 |
LoRA merge strength |
--out |
required | Output image path |
Ideogram 4 needs structured JSON captions. See examples/caption.json for a
complete example. Minimal example:
{
"compositional_deconstruction": {
"background": "Seoul alleyway at dusk, warm neon signs, wet pavement",
"elements": [
{"type": "obj", "desc": "A young Korean woman holding a sign"},
{"type": "text", "desc": "The sign reads '사랑합니다' in clean Hangul"}
]
}
}Full format reference: https://github.com/ideogram-oss/ideogram4/blob/main/docs/prompting.md
| Method | Path | Description |
|---|---|---|
GET |
/api/model/status |
Model state (idle / loading / loaded) |
POST |
/api/model/load |
Trigger model load |
POST |
/api/model/unload |
Unload model from memory |
POST |
/api/magic-prompt |
Natural language → structured caption via LLM |
POST |
/api/generate |
Submit generation task (JSON caption + params). Local single-generation slot; returns 409 if another generation is running |
GET |
/api/status/{task_id} |
Poll generation progress and result |
POST |
/api/verify |
Validate a JSON caption without generating |
GET |
/api/lora/status |
List available LoRAs + currently applied |
POST |
/api/lora/apply |
Apply LoRA by name with strength |
POST |
/api/lora/remove |
Restore original weights |
GET |
/api/images |
List generated images |
DELETE |
/api/images/{id} |
Delete a generated image |
GET |
/api/prompts |
List saved prompts |
GET |
/api/prompts/{id} |
Get single prompt by ID |
DELETE |
/api/prompts/{id} |
Delete a saved prompt |
GET |
/api/form |
Load last saved form state |
POST |
/api/form |
Save form state |
This is a local single-user app. Model load, unload, LoRA apply/remove, and
generation share one in-process pipeline and are protected by a pipeline
operation lock. Generation runs in a daemon thread, but only one generation is
accepted at a time; extra /api/generate requests return 409 instead of
queuing unbounded work. Completed task status entries are kept briefly for
polling and cleaned up after about one hour.
Common baseline (V4_QUALITY_48):
- Disk: ~26 GB model weights (FP8 safetensors)
- Total model params: ~26.8B (2× 9.3B transformers + 8B text encoder + VAE)
- Peak memory (M5 Max, no swap): ~50 GB
- Peak memory (M1 Max 64 GB, heavy swap): 63–68 GB
| Resolution | Load | Generation | Peak MPS mem |
|---|---|---|---|
| 1024×1024 | ~197 s | ~408 s | ~50 GB |
All times from bench_load.py run on each machine. M5 Max numbers are with PYTORCH_MPS_FAST_MATH=1.
| Step | M5 Max (128 GB) | M1 Max (64 GB) |
|---|---|---|
| Text encoder (CPU dequant → MPS) | 77 s | 128 s |
| Conditional transformer (9.3B) | 74 s | 84 s |
| Unconditional transformer (9.3B) | 38 s | 84 s |
| VAE | 2 s | 19 s |
| MPSGraph warmup (first inference) | 5 s | 88 s |
| Pipeline load total | 197 s | 315 s |
All at V4_QUALITY_48, same caption prompt. Ratios are consistent across resolutions — generation slowdown is fixed per-step, not pixel-dependent.
| Metric | M5 Max (128 GB) | M1 Max (64 GB) | Ratio (M1/M5) |
|---|---|---|---|
| Pipeline load | 197 s | 315 s | 1.6× |
| Generation 1024² | 408 s | 2240 s | 5.5× |
| Generation 512² | ~149 s* | 818 s | 5.5× |
| Peak memory 1024² | ~50 GB | 68.4 GB | swap |
| Peak memory 512² | — | 63.7 GB | swap |
* 512² on M5 Max is estimated (408 / 2.74 scaling based on M1 resolution ratio).
- Pipeline load is 1.6× slower on M1 Max — dominated by CPU dequant + MPS transfer (text encoder 8B), not GPU compute.
- Generation is consistently ~5.5× slower regardless of resolution (512² and 1024² show the same ratio). This reflects the combined effect of lower MPS compute throughput, narrower memory bandwidth, and swap pressure on the 64 GB machine.
- On M1 Max 64 GB, even 512×512 exceeds physical RAM (63.7 GB peak). Swap is unavoidable at any resolution with V4_QUALITY_48.
| Goal | Suggested config | Est. time |
|---|---|---|
| Best quality without swap | 768×768 + V4_DEFAULT_20 | ~300–400 s |
| Fast generation | 512×512 + V4_TURBO_12 | ~200–300 s |
| Maximum quality (accept swap) | 1024×1024 + V4_QUALITY_48 | ~2240 s |
Upgrading to 96 GB+ unified memory eliminates swap entirely and brings generation time closer to the ~2-3× chip-gap ratio.
All processes write structured runtime logs to logs/ (gitignored):
| Process | Log file pattern | Content |
|---|---|---|
CLI (ideogram4_mps.py) |
logs/ideogram4_mps-<ts>.log |
Download, dequant, loading, generation, output |
Server (main.py) |
logs/server-<ts>.log |
HTTP requests, model lifecycle, generation, uvicorn |
Logs include timestamps, severity level, and structured messages. Set
IDEOGRAM4_LOG_DIR to override the default logs/ directory.
The .log suffix from generation metadata (examples/result.log) is kept in git via
.gitignore exclusion while runtime logs are ignored.
All settings are read from environment variables at import time by server/config.py.
run.sh auto-loads .env from the project root. See .env.example for all options.
| Variable | Default | Description |
|---|---|---|
IDEOGRAM4_MAGIC_PROMPT_API_KEY |
— | LLM API key for Quick Prompt (required) |
IDEOGRAM4_MAGIC_PROMPT_MODEL |
MiniMaxAI/MiniMax-M3 |
LLM model for prompt expansion |
IDEOGRAM4_MAGIC_PROMPT_BASE_URL |
https://api.commandcode.ai/provider/v1 |
LLM provider base URL |
IDEOGRAM4_MAGIC_PROMPT_TIMEOUT |
120 |
LLM request timeout (seconds) |
IDEOGRAM4_MAGIC_PROMPT_MAX_TOKENS |
16384 |
LLM max response tokens |
IDEOGRAM4_MAGIC_PROMPT_TEMPERATURE |
1.0 |
LLM temperature |
IDEOGRAM4_SERVER_HOST |
0.0.0.0 |
FastAPI bind host |
IDEOGRAM4_SERVER_PORT |
8000 |
FastAPI listen port |
IDEOGRAM4_WEBUI_PORT |
5173 |
Vite WebUI dev server port used by run.sh |
IDEOGRAM4_SERVER_LOG_LEVEL |
info |
Uvicorn log level |
IDEOGRAM4_CORS_ORIGINS |
* |
CORS allow-origins |
IDEOGRAM4_MODEL_REPO |
ideogram-ai/ideogram-4-fp8 |
HuggingFace model repo |
IDEOGRAM4_DEFAULT_PRESET |
V4_QUALITY_48 |
Default generation preset |
IDEOGRAM4_DEFAULT_FORMAT |
webp |
Default output format (server) |
IDEOGRAM4_DEFAULT_SEED |
20260608 |
Default generation seed |
IDEOGRAM4_IMAGE_QUALITY_WEBP |
90 |
WebP lossy quality |
IDEOGRAM4_IMAGE_QUALITY_JPEG |
95 |
JPEG lossy quality |
IDEOGRAM4_LOG_DIR |
logs/ |
Log output directory |
IDEOGRAM4_DB_PATH |
server/data/ideogram4.db |
SQLite database path |
IDEOGRAM4_OUTPUT_DIR |
server/output/ |
Generated image output dir |
IDEOGRAM4_LORA_DIR |
models/loras/ |
LoRA weight files dir |
IDEOGRAM4_LORA_STRENGTH |
0.6 |
Default LoRA merge strength |
IDEOGRAM4_WARMUP_SIZE |
64 |
Warmup resolution (width=height) |
IDEOGRAM4_WARMUP_STEPS |
2 |
Warmup step count |
IDEOGRAM4_DB_QUERY_LIMIT |
50 |
Default row limit for DB queries |
- Apple Silicon Mac (M1/M2/M3/M4/M5)
- Python 3.11+ with pip
- Node.js 20+ with pnpm
PYTORCH_ENABLE_MPS_FALLBACK=1(set automatically)PYTORCH_MPS_FAST_MATH=1(set automatically)- ~50 GB unified memory for 1024×1024 V4_QUALITY_48 (smaller resolutions / presets may work with less)
- ~26 GB free disk space for FP8 model weights
- HuggingFace account with access to the gated repo
ideogram-ai/ideogram-4-fp8
![]() 한복 여인, 새벽 정원 (V4_QUALITY_48, 1024×1024) |
![]() 황혼 녘 한옥마을 (V4_QUALITY_48, 1024×1024) |
![]() 전통 문양 일러스트 (V4_QUALITY_48, 832×1248) |
This project is MIT. The Ideogram 4 model weights are under the Ideogram 4 Non-Commercial License.



