This project implements infinite-horizon stochastic optimal control algorithms for trajectory tracking of a differential-drive robot navigating a 2D environment with static circular obstacles.
We compare three control approaches:
- Certainty Equivalent Control (CEC) — solved via nonlinear optimization using CasADi.
- Generalized Policy Iteration (GPI), deterministic — assumes noise-free transitions.
- Generalized Policy Iteration (GPI) — with full probabilistic modeling and value iteration.
The goal is to track a periodic reference trajectory over a 100-step horizon, minimizing tracking error, control effort, and proximity to obstacles.
.
├── part1.py               # Certainty Equivalent Control (CEC)
├── part2-deterministic.py # GPI assuming deterministic transitions
├── part2.py               # Full GPI with stochastic transitions
├── utils.py               # Utility functions (dynamics, reference trajectory, plotting)
├── mujoco_car.py          # Optional MuJoCo simulator wrapper
├── fig/, meshes/, env.xml # Visualization and MuJoCo environment files
├── ECE276B_PR3.pdf        # Final project report
└── README.txt             # This file
- Python 3.8+
- NumPy
- CasADi
- Ray
- tqdm
- Matplotlib (for visualization)
- MuJoCo (optional, for physics-based validation)
pip install numpy casadi ray tqdm matplotlib
To use MuJoCo (optional):
- Download it from https://mujoco.org/
- Install it under the ~/.mujoco folder and set the required environment variables.
Certainty Equivalent Control (CEC): solves a receding-horizon nonlinear program at each time step using CasADi's nlpsol.
python part1.py
Deterministic GPI: Generalized Policy Iteration assuming a deterministic transition model.
python part2-deterministic.py
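With deterministic transitions, GPI reduces to value iteration over a lookup table. A toy sketch on a 1-D discretized tracking-error state (illustrative only; part2-deterministic.py discretizes the full (x, y, theta) error state):

```python
import numpy as np

n_states, n_actions = 11, 3            # error bins; actions move the bin by {-1, 0, +1}
actions = np.array([-1, 0, 1])
gamma = 0.95

# Deterministic transition table: next error bin = clip(current bin + action)
nxt = np.clip(np.arange(n_states)[:, None] + actions[None, :], 0, n_states - 1)
# Stage cost: squared tracking error (distance from center bin) + control effort
stage = (np.arange(n_states) - n_states // 2)[:, None] ** 2 \
        + 0.1 * actions[None, :] ** 2

V = np.zeros(n_states)
for _ in range(500):                   # value iteration to (near) convergence
    V_new = (stage + gamma * V[nxt]).min(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

# Greedy policy extraction from the converged value function
policy = (stage + gamma * V[nxt]).argmin(axis=1)
```

The resulting policy pushes the error toward the zero-error bin from either side, which is the qualitative behavior expected of the tracking controller.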
Full Stochastic GPI: implements full GPI with stochastic transitions, obstacle-aware stage costs, and policy iteration.
python part2.py
Use the --use-mujoco flag to run the simulation in MuJoCo (if installed):
python part2.py --use-mujoco
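In the stochastic case, each action induces a probability distribution over next states, and policy iteration alternates exact policy evaluation with greedy improvement. A self-contained sketch on a random toy chain (part2.py instead builds the transition matrix over the discretized error grid from the motion-noise model):

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 5, 2, 0.9

# Random row-stochastic transition matrices P[a, s, s'] and stage costs c[s, a]
P = rng.random((nA, nS, nS))
P /= P.sum(axis=2, keepdims=True)
c = rng.random((nS, nA))

policy = np.zeros(nS, dtype=int)
for _ in range(50):
    # Policy evaluation: solve the linear system (I - gamma * P_pi) V = c_pi
    P_pi = P[policy, np.arange(nS)]          # transition matrix under the policy
    c_pi = c[np.arange(nS), policy]
    V = np.linalg.solve(np.eye(nS) - gamma * P_pi, c_pi)
    # Policy improvement: greedy w.r.t. the one-step lookahead Q-values
    Q = c + gamma * np.einsum("asn,n->sa", P, V)
    new_policy = Q.argmin(axis=1)
    if np.array_equal(new_policy, policy):   # converged: policy is greedy w.r.t. its own V
        break
    policy = new_policy
```

Exact evaluation via a linear solve is practical here because the discretized state space is finite; for larger grids, truncated iterative evaluation gives the "generalized" in GPI.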
- Discretization of state and control spaces with configurable resolution.
- Periodic reference trajectory via Lissajous curve.
- Obstacle avoidance via exponential or hard penalty functions.
- Terminal cost enforcement.
- Parallel computation of transition matrices using Ray.
- Support for both simulation-only and physics-based environments.
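Two of the features above, the Lissajous reference and the exponential obstacle penalty, can be sketched as follows. Function names, amplitudes, and penalty parameters are illustrative assumptions; the project's actual versions live in utils.py and the stage-cost code:

```python
import numpy as np

def lissajous_ref(t, A=2.0, B=2.0, a=1.0, b=2.0, period=100):
    """Reference position at discrete step t on a periodic Lissajous curve."""
    k = 2 * np.pi * t / period
    return np.array([A * np.sin(a * k), B * np.sin(b * k)])

def obstacle_penalty(p, centers, radii, weight=10.0, sharpness=5.0):
    """Exponential penalty that grows as position p nears any circular obstacle."""
    # Signed clearance to each obstacle boundary (negative means inside)
    d = np.linalg.norm(p[None, :] - centers, axis=1) - radii
    return weight * np.sum(np.exp(-sharpness * d))
```

The exponential form penalizes proximity smoothly, which keeps the NLP in the CEC controller differentiable; a hard penalty instead adds a large constant cost inside a safety margin.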
- Simulation logs (position, orientation, control actions)
- Visualizations of robot trajectory and reference path
- Performance metrics: tracking error, total cost, loop time
All visualizations are saved automatically to the fig/ directory when the simulation completes.
For detailed methodology, equations, and evaluation results, see: ECE276B_PR3.pdf
- Pedram Aghazadeh
UC San Diego, ECE 276B — Planning & Learning in Robotics
This project is for academic and research use only. No commercial use is permitted without permission.