Anime Character Consistency Engine — generate studio-quality anime video from a text story while keeping the same character face, identity, and style across every shot.
AnimeLoom orchestrates a multi-stage pipeline (story decomposition, identity-locked keyframes, motion synthesis, face restoration, post-processing) to turn a one-paragraph prompt into a smooth anime sequence on a single GPU. Built around the latest 2026 open-source video models with a face-lock pass that pastes the character's face from a reference image into every output frame.
| Stage | v1 (older Colab path) | v1.5 | v2 (current) |
|---|---|---|---|
| Keyframes | SDXL + LoRA | SDXL + LoRA + IP-Adapter SDXL + img2img chaining | unchanged from v1.5 |
| Story decomposition | Rule-based | Gemini Flash | Two-stage Gemini -> Claude refinement |
| Video model | CogVideoX 1.5 | Wan2.1-I2V-14B | Wan2.2-I2V-A14B (MoE) + anime LoRA |
| Face consistency | None | GFPGAN per frame | Wan2.2-Animate face lock (face pasted from keyframe at every frame) |
| Post-processing | GFPGAN + Real-ESRGAN | + two-pass temporal smoothing, every-2nd-frame GFPGAN | unchanged from v1.5 |
| Identity consistency | ~70% | ~85% | ~95%+ |
The v2 path runs Wan2.2 in two passes per shot:
- Phase 3a — Wan2.2-I2V-A14B (with anime LoRA) generates a driving clip that captures motion only
- Phase 3b — Wan2.2-Animate-14B takes the SDXL keyframe as reference and the driving clip as motion source, producing a face-locked output where the character's face is literally pasted from the keyframe at every frame
This decoupled design comes from the Wan-Animate paper (arXiv 2509.14055) and is the single biggest available improvement in consistent anime faces among open-source models as of 2026.
Story (text)
|
v
Phase 1: Decompose (Gemini story planning -> Claude cinematic refinement)
|
v
Phase 2: SDXL + LoRA + IP-Adapter -> identity-locked keyframes (img2img chaining,
adaptive strength decay, dynamic anchor refresh,
quality gate with drift detection)
|
v
Phase 3a: Wan2.2-I2V-A14B (+ anime LoRA) -> driving clips (motion source)
|
v
Phase 3b: Wan2.2-Animate-14B(reference=keyframe, driving=clip) -> face-locked frames
|
v
Phase 4: GFPGAN every-2nd-frame face restoration + two-pass temporal smoothing
|
v
Phase 5: RIFE temporal upscale (16fps -> 24fps) + Real-ESRGAN spatial upscale
|
v
Phase 6: Cross-dissolve assembly -> final mp4
- Recommended GPU: NVIDIA RTX A6000 (48GB VRAM). Each Wan2.2 14B variant peaks around 42-46GB with model CPU offload.
- Minimum GPU: any 24GB+ card with sequential offload (slower; 480x640 max resolution).
- Python 3.10+, PyTorch 2.5.1 + CUDA 12.4, ffmpeg, Redis (optional, for Celery).
- API keys (optional but recommended): Gemini (free, 1500 req/day at aistudio.google.com/apikey) and Anthropic Claude.
- Spin up an A6000 (48GB) pod on RunPod with the PyTorch image
- Open Jupyter, then open `notebooks/AnimeLoom_RunPod.ipynb`
- Run cells in order:
- Cell 1 — installs pinned deps (torch 2.5.1+cu124, diffusers 0.36, ftfy, gfpgan, facexlib, etc.)
- Cell 2 — downloads a character LoRA from HuggingFace (default: `joelthomas77/animeloom-sakura-haruno`)
- Cell 2.5 — patches the story decomposer for two-stage Gemini->Claude refinement
- Cell 3 — runs the full v2 pipeline (Phase 1 -> 6) and renders the final video
Configure in Cell 3:
- `STORY_TEXT` — your one-paragraph story
- `CHARACTER_NAME` — must match the LoRA from Cell 2
- `GEMINI_API_KEY` / `ANTHROPIC_API_KEY` — both optional; falls back gracefully
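An illustrative Cell 3 configuration — the values here are examples, not the notebook's shipped defaults:

```python
STORY_TEXT = (
    "A girl walks through a cherry blossom forest at sunset, "
    "stops at a wooden bridge, and watches petals drift past."
)
CHARACTER_NAME = "sakura"  # must match the LoRA downloaded in Cell 2
GEMINI_API_KEY = ""        # optional; blank falls back to rule-based decomposition
ANTHROPIC_API_KEY = ""     # optional; blank skips the Claude refinement stage
```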
```shell
git clone https://github.com/JoelJohnsonThomas/AnimeLoom.git
cd AnimeLoom
chmod +x setup.sh
./setup.sh

python main.py --text "A girl walks through a cherry blossom forest at sunset"
python main.py --script script.txt --quality high
python main.py --api   # FastAPI server
python main.py --test  # smoke test
```
+------------------------------------------------------------+
| DirectorAgent |
| +-----------+ +---------------+ +--------------------+ |
| | Story |->| WorkflowGraph |->| Shot Executor | |
| | Decomposer| | (DAG) | | + Checkpointing | |
| +-----------+ +---------------+ +---------+----------+ |
+------------------------------------------------+------------+
| | | |
v v v v
+---------------+ +-----------+ +-----------+ +--------------+
| Character | | Animator | | Evaluator | | Asset |
| Agent | | Agent | | Agent | | MemoryBank |
| | | | | | | |
| * LoRA train | | * Wan2.2 | | * Identity| | * LoRAs |
| * IP-Adapter | | * Animate | | * Motion | | * Embeddings |
| * Consistency | | * RIFE | | * Visual | | * Scenes |
+---------------+ +-----------+ +-----------+ +--------------+
animeloom/
├── director/
│ ├── agent.py # main orchestrator (script parsing, shot execution)
│ ├── workflow.py # shot dependency DAG with topological ordering
│ └── memory_bank.py # persistent character/scene/shot storage
├── agents/
│ ├── story/
│ │ └── decomposer.py # two-stage Gemini -> Claude story decomposer
│ ├── character/
│ │ ├── trainer.py # LoRA fine-tuning (PEFT, rank 16-32)
│ │ ├── lora_manager.py # adapter load/unload
│ │ ├── ip_adapter.py # IPAdapterConditioner (SDXL face-image conditioning)
│ │ └── consistency.py # GroundingDINO + SAM + CLIP identity validation
│ ├── animator/
│ │ ├── wan_wrapper.py # multi-backend video wrapper
│ │ ├── wan_animate.py # Wan2.2-Animate-14B face-lock wrapper (NEW in v2)
│ │ ├── cogvideo_wrapper.py # CogVideoX fallback
│ │ ├── pixverse.py # PixVerse external fallback
│ │ └── controlnet.py # OpenPose pose conditioning
│ ├── postprocess/
│ │ ├── upscaler.py # RIFE temporal + Real-ESRGAN spatial
│ │ ├── face_restore.py # GFPGAN/CodeFormer face restoration
│ │ ├── color_grade.py # anime LUT grading
│ │ └── transitions.py # cross-dissolve assembly
│ └── evaluator/
│ ├── character_score.py # CLIP-based identity consistency
│ ├── motion_score.py # optical flow motion fidelity
│ └── visual_score.py # sharpness, colour, smoothness
├── api/
│ ├── app.py # FastAPI application
│ ├── routes/{characters,generation}.py
│ └── schemas/models.py
├── jobs/
│ ├── worker.py # Celery async worker
│ └── tasks/{training,generation}.py
├── cloud/
│ ├── colab_survival.py # 4-min keep-alive + 5-min checkpointing
│ ├── kaggle_trainer.py # Kaggle P100 trainer
│ └── gcp_setup.sh # GCP T4 VM provisioning
├── notebooks/
│ └── AnimeLoom_RunPod.ipynb # primary v2 pipeline notebook
├── warehouse/ # runtime asset storage
│ ├── models/ # base model weights
│ ├── lora/ # character LoRA adapters
│ ├── outputs/ # generated videos
│ └── checkpoints/ # resume checkpoints
├── main.py # CLI entry point
├── setup.sh
├── requirements.txt
└── sample_script.txt
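The shot-dependency ordering in `workflow.py` can be illustrated with the standard library's `graphlib`; this is a minimal sketch, not the project's actual DAG implementation, and the shot names are made up:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key lists the shots it depends on.
deps = {
    "shot_2": {"shot_1"},      # shot 2 chains off shot 1's identity anchor
    "shot_3": {"shot_1"},
    "assemble": {"shot_2", "shot_3"},
}

execution_order = list(TopologicalSorter(deps).static_order())
# shot_1 is scheduled first, assemble last
```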
| Stage | Model | Purpose | VRAM |
|---|---|---|---|
| Keyframes | cagliostrolab/animagine-xl-3.1 (SDXL) + character LoRA | identity-locked anime stills | ~12GB |
| Identity conditioning | h94/IP-Adapter ip-adapter_sdxl.bin | image-to-image face anchoring | shares SDXL UNet |
| Story decomposer | Gemini 2.5 Flash + Claude Sonnet 4.6 | shot list + cinematic refinement | API only |
| Driving clip | Wan-AI/Wan2.2-I2V-A14B-Diffusers (MoE 14B) | motion source for Phase 3b | ~42GB peak |
| Anime style | Wan 2.2 anime LoRA (Civitai community) | anime aesthetic on Wan output | <1GB |
| Face lock | Wan-AI/Wan2.2-Animate-14B | reference face + driving motion -> output | ~46GB peak |
| Face restore | GFPGAN v1.4 | every-2nd-frame face cleanup | ~3GB |
| Temporal upscale | RIFE 4.x | 16fps -> 24fps interpolation | ~6GB |
| Spatial upscale | Real-ESRGAN x4plus_anime_6B | 480p -> 720p+ sharpening | ~6GB |
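The RIFE stage's 16fps -> 24fps interpolation implies a 1.5x frame count. A quick sketch of the arithmetic (the exact rounding convention is an assumption):

```python
def interpolated_frame_count(src_frames: int, src_fps: int = 16, dst_fps: int = 24) -> int:
    """Frame count after interpolation, holding clip duration constant."""
    return round((src_frames - 1) * dst_fps / src_fps) + 1

# A 33-frame Wan2.2 clip (~2s at 16fps) becomes 49 frames at 24fps.
print(interpolated_frame_count(33))  # 49
```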
Each phase fully unloads before the next loads, so peak VRAM stays within A6000 limits.
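A minimal sketch of that load-run-unload discipline; it is framework-agnostic, so the GPU cache call is shown only as a comment:

```python
import gc

def run_phase(load_model, run, *args):
    """Load one stage's model, run it, and fully release it before the next stage."""
    model = load_model()
    try:
        return run(model, *args)
    finally:
        del model
        gc.collect()
        # On GPU, also call torch.cuda.empty_cache() to return freed VRAM to the driver.
```

Funnelling every phase through a wrapper like this keeps only one 14B model resident at a time.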
| Category | Tools |
|---|---|
| ML | PyTorch 2.5.1+cu124, Diffusers 0.36, PEFT, Transformers, Accelerate |
| Video | Wan2.2-I2V-A14B, Wan2.2-Animate-14B, CogVideoX-2B (fallback), AnimateDiff (fallback) |
| Identity | IP-Adapter SDXL, character LoRA, GroundingDINO + SAM + CLIP |
| NLP | Gemini 2.5 Flash, Claude Sonnet 4.6, rule-based fallback |
| Post | RIFE, Real-ESRGAN, GFPGAN, OpenCV, ffmpeg |
| API | FastAPI, Uvicorn, Pydantic |
| Queue | Celery, Redis |
| Infra | RunPod (primary), Google Colab, Kaggle, GCP |
| Parameter | Default | Description |
|---|---|---|
| `IMAGE_WIDTH` x `IMAGE_HEIGHT` | 768 x 1152 | SDXL keyframe resolution (portrait) |
| `SDXL_STEPS` | 35 | SDXL inference steps |
| `SDXL_GUIDANCE` | 7.0 | SDXL guidance scale |
| `LORA_SCALE` | 1.15 | character LoRA scale (early shots; relaxed to 1.0 after shot 2) |
| `WAN_W` x `WAN_H` | 480x832 (auto-detected from VRAM) | Wan2.2 output resolution |
| `NUM_FRAMES` | 33 | frames per Wan2.2 clip |
| `WAN_STEPS` | 30 | Wan2.2 inference steps |
| `WAN_GUIDANCE` | 3.0 | lower = more motion freedom |
| `FPS` -> `TARGET_FPS` | 16 -> 24 | source fps and RIFE-interpolated fps |
| `FACE_RESTORE` | True | GFPGAN every-2nd-frame face restoration |
| `SPATIAL_UPSCALE` | True | Real-ESRGAN x4plus_anime_6B |
| `COLOR_GRADE` | True | anime LUT grading |
| `WAN_ANIME_LORA_REPO` | Kijai/wan22-anime-style | Wan2.2 anime style LoRA repo (skip on failure) |
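The VRAM-based auto-detection of the Wan2.2 output resolution can be sketched as a simple threshold; the 40GB cutoff here is an assumption based on the ~42-46GB peaks quoted for the 14B models:

```python
def wan_resolution(free_vram_gb: float) -> tuple[int, int]:
    """Pick Wan2.2 output resolution from available VRAM.
    48GB-class cards (A6000) run 480x832; 24GB cards fall back to 480x640."""
    return (480, 832) if free_vram_gb >= 40 else (480, 640)
```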
SCENE: Character introduction
CHAR: Sakura
A young woman with pink hair walks through a cherry blossom forest
SCENE: Bridge
CHAR: Sakura
She stops at a wooden bridge and looks at the river below
SCENE: Wind
CHAR: Sakura
The wind gently moves her hair as petals fall around her
Directives: `SCENE:` (or `SHOT:`) starts a new shot, `CHAR:` lists character names, `POSE:` references a pose video, and free text is the prompt.
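A minimal parser for this directive format might look like the following; it illustrates the grammar above and is not the project's actual decomposer:

```python
def parse_script(text: str) -> list[dict]:
    """Split a SCENE:/CHAR:/POSE: script into per-shot dicts."""
    shots, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(("SCENE:", "SHOT:")):
            current = {"scene": line.split(":", 1)[1].strip(), "chars": [], "prompt": ""}
            shots.append(current)
        elif line.startswith("CHAR:") and current:
            current["chars"] += [c.strip() for c in line.split(":", 1)[1].split(",")]
        elif line.startswith("POSE:") and current:
            current["pose"] = line.split(":", 1)[1].strip()
        elif current:  # free text accumulates into the shot prompt
            current["prompt"] = (current["prompt"] + " " + line).strip()
    return shots
```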
| Image count | Use case |
|---|---|
| 10-15 | prototyping; identity may drift on extreme angles |
| 20-30 | studio quality; cover front, 3/4, side, expressions, lighting |
| 30+ | diminishing returns |
Best practices: official screencaps over fan art, mix front + 3/4 + side views, mix expressions, include full-body and close-up shots, use 512px+ on the shortest side.
| Method | Endpoint | Description |
|---|---|---|
| POST | /character/create | upload character sheet, train LoRA |
| GET | /character/list | list all characters |
| GET | /character/{id} | get character details |
| DELETE | /character/{id} | delete a character |
| POST | /generate/shot | generate single shot |
| POST | /generate/sequence | generate multi-shot sequence |
| POST | /generate/text-to-anime | full text -> anime video |
| GET | /job/{job_id} | check generation job status |
| Variable | Default | Description |
|---|---|---|
| `AI_CACHE_ROOT` | ./warehouse | root directory for all assets |
| `GEMINI_API_KEY` | — | Gemini Flash API key (free tier sufficient) |
| `ANTHROPIC_API_KEY` | — | Claude Sonnet API key (~$0.003 per story) |
| `REDIS_URL` | redis://localhost:6379/0 | Celery job queue URL |
| `PIXVERSE_API_KEY` | — | PixVerse fallback API key (optional) |
| `API_HOST` / `API_PORT` | 0.0.0.0 / 8080 | FastAPI bind |
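A typical shell setup, with placeholder values (the keys shown are not real):

```shell
export AI_CACHE_ROOT=./warehouse
export GEMINI_API_KEY=your-gemini-key        # free tier is sufficient
export ANTHROPIC_API_KEY=your-anthropic-key  # optional
export REDIS_URL=redis://localhost:6379/0    # only needed for Celery
export API_HOST=0.0.0.0
export API_PORT=8080
```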
| Metric | v1 | v1.5 | v2 (current) |
|---|---|---|---|
| Identity consistency | ~70% | ~85% | ~95%+ |
| Face stability across shots | low | medium | near-perfect (face pasted from keyframe) |
| Motion smoothness | ok | good | better (Wan2.2 MoE temporal attention) |
| Anime aesthetic | good | good | stronger (Wan2.2 anime LoRA) |
| Visual quality | 6-7/10 | 7.5-8/10 | 8.7-9.2/10 |
To break 9.5/10, the next paradigm shift is HunyuanVideo full fp16 on H100 (80GB), or temporal-conditioning models like Sora / Veo 2 — both outside the A6000 envelope.
- Story Decomposition — Gemini 2.5 Flash plans a structured shot list (SCENE/CHAR/ACTION/CAMERA/MOOD per shot, all sharing one environment). Claude Sonnet 4.6 refines each shot into cinematic anime language with required body movement. Falls back to rule-based on missing keys.
- Keyframe Generation — SDXL + animagine-xl-3.1 + character LoRA generate keyframe 0 with text2img. Keyframes 1+ use img2img chaining (`StableDiffusionXLImg2ImgPipeline`) with adaptive strength decay (0.40 -> 0.25). IP-Adapter SDXL conditions every shot on the identity anchor (refreshed every 3 shots). A pixel-drift quality gate regenerates outliers.
- Driving Clip (Phase 3a) — Wan2.2-I2V-A14B (Mixture-of-Experts) plus an optional Wan 2.2 anime LoRA generates a short clip per shot. Center-crop resize avoids face-proportion distortion. Face quality at this stage is irrelevant; it gets overwritten in Phase 3b.
- Face Lock (Phase 3b) — Wan2.2-Animate-14B decouples skeleton (body motion) from facial expression. The driving clip provides motion; the SDXL keyframe is the face reference. Output has the keyframe's face at every frame with the driving clip's motion. Falls back to Track A (driving clips become final) if Animate is unavailable.
- Face Restoration — Two-pass temporal smoothing wraps a face-region-only GFPGAN pass applied to every 2nd frame (prevents identity drift from over-restoration; preserves anime texture).
- Temporal + Spatial Upscale — RIFE interpolates 16fps -> 24fps; Real-ESRGAN x4plus_anime_6B sharpens each frame.
- Color Grading — anime LUT grading with palette presets (warm, cool, vibrant, muted).
- Assembly — cross-dissolve between adjacent clips, final mp4 written via OpenCV.
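The keyframe stage's adaptive strength decay (0.40 -> 0.25) can be sketched as a linear schedule; the number of shots over which it decays is an assumption, as the README only states the endpoints:

```python
def img2img_strength(shot_idx: int, start: float = 0.40, end: float = 0.25,
                     decay_shots: int = 6) -> float:
    """Denoising strength for img2img chaining: high early (more creative freedom),
    low later (tighter identity lock). The linear ramp is illustrative only."""
    t = min(shot_idx / max(decay_shots - 1, 1), 1.0)
    return start + (end - start) * t
```

Later shots then clamp at the floor value, so identity drift cannot re-accumulate over long sequences.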
Contributions welcome.
- Fork the repository
- Create a feature branch (`git checkout -b feature/your-feature`)
- Commit your changes
- Push to the branch
- Open a pull request