
Awesome World Models Survey

1. Core Concepts & General World Models
  • World-Simulator · [Paper] [Code]
  • Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond · [Paper] [Code]
  • Understanding World or Predicting Future? A Comprehensive Survey of World Models · [Paper] [Code]
  • World Models in AI: Like a Child · [Paper] [Code]
  • The Trinity of Consistency as a Defining Principle for General World Models · [Paper] [Code]
  • Learning to Model the World: A Survey of World Models in Artificial Intelligence · [Paper]
2. World Representation & Generation
3. Application: Embodied AI
4. Application: Autonomous Driving
  • The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey · [Paper] [Code]
  • A Survey of World Models for Autonomous Driving · [Paper] [Code]
  • World Models for Autonomous Driving: An Initial Survey · [Paper] [Code]
  • Interplay Between Video Generation and World Models in Autonomous Driving · [Paper] [Code]
5. Safety, Efficiency & Learning Methods
6. Awesome Lists
7. Position Papers
  • A Path Towards Autonomous Machine Intelligence · [Paper]
  • Critiques of World Models · [Paper]
  • Positional Encoding Field · [Paper] [Code]
  • Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models · [Paper] [Code]
  • Research on World Models Is Not Injecting World Knowledge into Specific Tasks · [Paper] [Code]

World Model - Reasoning

spatial reasoning

spatial reasoning details
  • Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence · [Paper] [Code]
  • SpatialPIN: Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors · [Paper] [Code]
  • SpatialBot: Precise Spatial Understanding with Vision Language Models · [Paper] [Code]
  • Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models · [Paper] [Code]
  • SpatialCoT: Advancing Spatial Reasoning through Coordinate Alignment and Chain-of-Thought for Embodied Task Planning · [Paper] [Code]
  • SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities · [Paper] [Code]
  • SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models · [Paper] [Code]
  • SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning · [Paper] [Code]
  • Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models · [Paper] [Code]
  • LEO-VL: Efficient Scene Representation for Scalable 3D Vision-Language Learning · [Paper] [Code]
  • SD-VLM: Spatial Measuring and Understanding with Depth-Encoded Vision-Language Models · [Paper] [Code]
  • SpaceVista: All-Scale Visual Spatial Reasoning from mm to km · [Paper] [Code]
  • Grounded Reinforcement Learning for Visual Reasoning · [Paper] [Code]
  • SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning · [Paper] [Code]
  • 3D Aware Region Prompted Vision Language Model · [Paper] [Code]
  • 3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding · [Paper] [Code]
  • RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics · [Paper] [Code]
  • Continuous 3D Perception Model with Persistent State · [Paper] [Code]
  • Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs · [Paper] [Code]
  • RL makes MLLMs see better than SFT · [Paper] [Code]
  • Identifying and Mitigating Position Bias of Multi-image Vision-Language Models · [Paper] [Code]
  • Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces · [Paper] [Code]
  • Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes · [Paper] [Code]
  • COARSE CORRESPONDENCES Boost Spatial-Temporal Reasoning in Multimodal Language Model · [Paper] [Code]
  • Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models · [Paper] [Code]
  • Cambrian-S · [Paper] [Code]
  • GS-reasoner · [Paper] [Code]
  • UniUGG · [Paper] [Code]
  • SenseNova-SI · [Paper] [Code]

omni reasoning

omni reasoning Details

World Model - Multimodal Synthesis

interactive video generation

interactive video generation Details
video camera view editing details

navigation video generation

navigation video generation Details

(long-term) video generation

general video generation model

audio generation

audio generation details

brain signal

brain signal details
  • Artificial Hippocampus Networks for Efficient Long-Context Modeling · [Paper] [Code]

World Model - Simulator and Representation

Feature Matching & Point Tracking

feature matching & point tracking details

Multi-View Stereo (MVS)

multi-view stereo details

3D generation

general 3D generation details

4D generation

general 4D generation details

Simulator

simulator details

Joint-Embedding Predictive Architecture (JEPA)

JEPA Family Models

World Model - Memory

reasoning memory

reasoning memory details

synthesis memory

synthesis memory details

World Model - VLA

embodied ai

embodied ai models details
embodied ai with video generation
embodied ai with 3D generation

auto-driving

auto-driving details

Other World Model-Related Works

Datasets

Dataset details
data curation framework
  • Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning · [Paper]

Benchmark

video/image reasoning benchmark details
interactive video/image generation benchmark details
navigation generation benchmark details
3D/4D reasoning benchmark details
3D/4D generation benchmark details
spatial intelligence

World Knowledge Model

World Knowledge Editing
Code generation
Detection

World Model Training

world model training methods details