A modular PyTorch implementation of World Models (Ha & Schmidhuber, 2018) for the CarRacing-v3 environment. This repository breaks down the architecture into three distinct stages—Perception, Memory, and Action—with integrated experiment tracking via Weights & Biases (W&B).
Date: February 19, 2026 Subject: Implementation and Empirical Analysis of World Models for Autonomous Navigation
This project represents my implementation and rigorous testing of the "World Model" architecture within the CarRacing-v3 environment. The core challenge in autonomous navigation is the high-dimensional nature of visual data. By decoupling perception from reasoning, we aim to build an agent that understands the underlying physics of its world rather than just reacting to pixels.
This re-implementation serves as a "baptism by fire" in debugging a seminal AI milestone. Building a z=32 latent bottleneck from scratch on a modern, sometimes unstable stack like Python 3.14 provided a research-driven stress test for the theory. Navigating the cuBLAS initialization minefield and the thermal limits of an RTX 3070 Ti was the price of admission to prove that a neural hallucination can effectively master reality.
In traditional Reinforcement Learning (RL), agents often struggle with the "curse of dimensionality" when processing raw pixel data. Calculating gradients across thousands of pixels while simultaneously learning physics and long-term strategy is computationally expensive and prone to high variance.
Our goal was to solve the CarRacing-v3 task by building a surrogate environment. By predicting the consequences of its actions, the agent can play the game millions of times "in its head" (the dream state), transferring that learned intuition back to the real track.
The Vision model is a Variational Autoencoder (VAE) designed to compress RGB images into a latent vector .
- Architecture: The encoder uses four convolutional layers with stride-2 to downsample the input, while the decoder utilizes transposed convolutions to reconstruct the original frame.
- The Latent Bottleneck: We utilize the reparameterization trick to allow backpropagation through the stochastic sampling process.
- Empirical Findings: Training logs show a validation loss reduction from 18.0 to 14.5. The VAE successfully discards high-frequency noise like grass textures while preserving the structural "signal" necessary for navigation, such as track curvature and car orientation.
The Memory component is a Mixture Density Network combined with a Long Short-Term Memory (MDN-RNN).
- Predictive Power: The model predicts , where is the hidden state of the LSTM.
- Hyperparameter Selection: We configured the MDN head with 5 Gaussian components to handle the multi-modal nature of the environment, such as choosing between a left or right turn at a fork.
- Loss Dynamics: The MDN loss dropped to approximately -155. This negative value indicates that the Gaussian distributions became highly peaked, signifying the model has mastered the "internal physics" of the track.
The Controller () is a minimalist linear layer mapping to action vectors. We utilized CMA-ES (Covariance Matrix Adaptation Evolution Strategy) to optimize the weights.
In "Dream Mode," the agent never interacts with the actual Gym environment. Instead, the RNN predicts the next latent vector and reward, which are fed back into the Controller.
- Hurdle: A significant risk is "model exploitation," where the Controller learns to exploit inaccuracies in the RNN's predictions.
- Stability: To counter this, we used a deterministic strategy during training, taking the mean of the most likely Gaussian component to stabilize the evolution.
During training, we monitored the RTX 3070 Ti Laptop GPU closely.
- GPU Utilization: Utilization remained constant at 90-100%, indicating a well-saturated compute pipeline.
- The "CUBLAS" Hurdle: We encountered
CUBLAS_STATUS_NOT_INITIALIZEDerrors traced to the TensorFloat-32 (TF32) kernel path. Disabling TF32 (torch.backends.cuda.matmul.allow_tf32 = False) ensured stability at the cost of slight performance overhead. - Thermal Constraints: The GPU temperature plateaued at 85°C. This thermal load suggests that sequential LSTM operations put a sustained stress on the silicon compared to more "bursty" convolutional workloads.
This implementation reveals a high-performing architecture that solves the CarRacing-v3 task by successfully decoupling perception, memory, and control. By compressing environment observations into a z=32 latent vector, the model prioritized essential features over decorative noise. The VAE's stable convergence indicates that this bottleneck is sufficient for low-complexity physics.
The MDN-RNN demonstrated a powerful ability to "hallucinate" transitions, with reward prediction MSE hitting a near-zero 0.00004. This allows the agent to effectively "feel" incoming rewards within its dream state.
The ultimate success is evident in the Controller’s performance, achieving a "solved" state with rewards exceeding 780. The agent developed an aggressive, high-velocity racing style characterized by tight cornering and intentional drifting. However, the hardware constraints and stability issues in the Python 3.14 environment suggest that the next step for this research is moving toward Transformer-based world models to bypass the sequential compute bottlenecks inherent in RNNs.
World_Models/
├── data/ # Dataset storage
│ ├── recordings/ # Raw rollouts (.npz)
│ ├── processed/ # Preprocessed latent vectors (dataset.pt)
│ └── weights/ # Saved model checkpoints
├── src/
│ ├── action/ # (C) Controller Component
│ ├── memory/ # (M) Memory Component
│ ├── perception/ # (V) Vision Component
│ ├── helpers.py # Global logging & utilities
│ └── play_world_model.py # Inference/Demo script
Collect raw rollouts using multiprocessing:
python generate_data.py
Train the VAE to compress 64x64 frames into a z=32 vector:
python src/perception/train_vae.py --epochs 20
Convert raw images into latent sequences () to speed up RNN training:
python src/memory/preprocess_rnn.py
Train the MDN-RNN to predict the future latent state:
python src/memory/train_rnn.py --epochs 20
Evolve the Controller using CMA-ES:
python src/action/train_controller.py --generations 100
Watch your fully trained agent play in real-time:
python src/play_world_model.py --render_mode human
This project uses Weights & Biases for logging:
- Loss Curves: Monitor VAE reconstruction and RNN log-likelihood.
- Video: The
train_controller.pyscript automatically uploads video replays of the best agents.
To disable logging: wandb disabled


