Gymnasium environments for self-propelling spheres on a flat ground plane:
- `hyrosphere-v0`: tetrahedral rigid body with 4 point masses driven by angular accelerations (action dim 4, obs dim 41).
- `linearsphere-v0`: sphere with 6 point masses sliding along the cardinal axes (action dim 6, obs dim 63).
Reward is `(z - radius) + 0.1 * v_z`: the goal is to jump as high as possible, with a small upward-velocity term for early-training shaping. Episodes truncate at 10 s of simulated time (`dt = 0.01`, 1000 steps).
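The reward and time limit described above can be written out as a small sketch. The names `jump_reward`, `DT`, `TIME_LIMIT_S`, and `MAX_STEPS` are illustrative and not taken from the package source, and the sketch assumes `z` is the height of the sphere's centre, so that `z - radius` is its clearance above the ground.

```python
# Illustrative sketch only: names are hypothetical, not the package's own.
DT = 0.01            # simulation step in seconds (from the text above)
TIME_LIMIT_S = 10.0  # episode length in simulated seconds (from the text above)
MAX_STEPS = round(TIME_LIMIT_S / DT)  # number of steps before truncation


def jump_reward(z: float, v_z: float, radius: float) -> float:
    """Clearance above the ground plus a small upward-velocity bonus.

    Assumes z is the height of the sphere's centre and v_z its vertical velocity.
    """
    return (z - radius) + 0.1 * v_z
```

A sphere resting on the ground with zero vertical velocity gets zero reward under this formula; the velocity term rewards upward motion before any height has been gained.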
Requires Python 3.12 (via pyenv or system) and Poetry ≥ 1.9.
Torch is pulled from the `pytorch-cu128` index (the CUDA 12.8 build), which is needed for RTX 5090 / Blackwell (`sm_120`) GPUs.
```sh
poetry install
```

```python
import gymnasium as gym
import gym_hyrosphere  # registers hyrosphere-v0 and linearsphere-v0

env = gym.make("hyrosphere-v0")
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```

```sh
poetry run python viewer.py [--env hyro|linear]                # interactive OpenGL viewer
poetry run python train.py --env hyro --timesteps 2_000_000    # PPO training
poetry run python play.py [--run runs/<dir>]                   # watch trained policy
poetry run tensorboard --logdir runs/                          # training curves
poetry run python benchmark.py                                 # 10k physics-step timing
poetry run python plot-progress.py -f progress.csv -n 100      # live-plot CSV
```

`train.py` runs PPO with a `VecNormalize` wrapper (observation and reward normalization) and writes checkpoints and TensorBoard logs to `runs/<env>-<timestamp>/`. `play.py` picks up the most recent run by default.
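As a rough illustration of how a "most recent run" default could work, here is a minimal sketch using only the standard library. The function `latest_run` is hypothetical and not taken from `play.py`; it assumes "most recent" means the latest modification time, whereas the real script might instead sort by the timestamp embedded in the directory name.

```python
from pathlib import Path
from typing import Optional


def latest_run(runs_root: str = "runs") -> Optional[Path]:
    """Return the most recently modified subdirectory of runs_root, or None."""
    root = Path(runs_root)
    if not root.is_dir():
        return None
    run_dirs = [p for p in root.iterdir() if p.is_dir()]
    if not run_dirs:
        return None
    # Assumption: recency is judged by modification time, not by the
    # <timestamp> part of the directory name.
    return max(run_dirs, key=lambda p: p.stat().st_mtime)
```

Returning `None` rather than raising lets a caller print a helpful message when no training run exists yet.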
The math behind the simulator is in `docs/`: start with `overview.md`, then `dynamics.md` for the shared rigid-body equations and `hyrosphere.md` / `linearsphere.md` for per-model specifics.
- The original repo trained with OpenAI Baselines (`python -m baselines.run --alg=acktr ...`). That package is unmaintained and the saved checkpoints (`hyrosphere-acktr`, `linearsphere-acktr`) are not compatible with this Gymnasium-based rewrite. Use `stable-baselines3` or another modern RL library to train against the env.
- `step()` returns the 5-tuple `(obs, reward, terminated, truncated, info)`; `reset(seed=...)` returns `(obs, info)`. There is no terminal condition, only `truncated` after 10 s.
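The `step()` / `reset()` contract above implies a rollout loop along the following lines. This is a sketch, not code from the repository: `run_episode` is a hypothetical helper that relies only on the Gymnasium-style return tuples, so it should work with any environment exposing that interface.

```python
from typing import Any, Callable, Tuple


def run_episode(env: Any, policy: Callable[[Any], Any], seed: int = 0) -> Tuple[float, int]:
    """Roll out one episode and return (total_reward, number_of_steps)."""
    obs, info = env.reset(seed=seed)
    total_reward, steps = 0.0, 0
    while True:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total_reward += float(reward)
        steps += 1
        # Checking both flags keeps the loop correct for any Gymnasium-style
        # env, even though these envs are described as only setting `truncated`.
        if terminated or truncated:
            return total_reward, steps
```

With the package installed, `env = gym.make("hyrosphere-v0")` followed by `run_episode(env, lambda _obs: env.action_space.sample())` should run until the 10 s time limit is reached.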