grauwolf32/openai-physics

gym-hyrosphere

Gymnasium environments for self-propelling spheres on a flat ground plane:

  • hyrosphere-v0 — tetrahedral rigid body with 4 point masses driven by angular accelerations (action dim 4, obs dim 41).
  • linearsphere-v0 — sphere with 6 point masses sliding along the cardinal axes (action dim 6, obs dim 63).

Reward is (z - radius) + 0.1 * v_z: the goal is to jump as high as possible, and the small upward-velocity term provides shaping early in training. Episodes are truncated after 10 s of simulated time (dt = 0.01 s, 1000 steps).
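The reward and episode length can be written out as a short sketch. The names `jump_reward`, `DT`, `EPISODE_SECONDS` and `MAX_STEPS` are illustrative and not taken from the package; only the formula and the numbers come from the description above. The sketch also assumes z is the height of the sphere's center, so that a sphere resting on the ground scores zero.

```python
# Illustrative names; only the formula and constants come from the README text.
DT = 0.01               # simulation time step, in seconds
EPISODE_SECONDS = 10.0  # episodes are truncated after this much simulated time
MAX_STEPS = round(EPISODE_SECONDS / DT)  # number of steps per episode


def jump_reward(z: float, v_z: float, radius: float) -> float:
    """Height above the resting position plus a small upward-velocity bonus.

    Assumes `z` is the height of the sphere's center (an assumption of this
    sketch), so a sphere at rest on the ground has z == radius.
    """
    return (z - radius) + 0.1 * v_z
```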

Setup

Requires Python 3.12 (via pyenv or system) and Poetry ≥ 1.9. Torch is pulled from the pytorch-cu128 index — CUDA 12.8 build, needed for RTX 5090 / Blackwell (sm_120).

poetry install

Use

import gymnasium as gym
import gym_hyrosphere  # registers hyrosphere-v0 and linearsphere-v0

env = gym.make("hyrosphere-v0")
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
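The single step above extends naturally to a full episode. The following is a minimal sketch, not part of the package: `rollout` is a hypothetical helper that works with any object exposing the Gymnasium `reset`/`step`/`action_space` interface, and it falls back to random actions when no policy is given.

```python
def rollout(env, policy=None, seed=0):
    """Run one episode and return (total_reward, steps).

    `env` is any Gymnasium-style environment. `policy` maps an observation
    to an action; when it is None, actions are sampled at random.
    """
    obs, info = env.reset(seed=seed)
    total_reward, steps = 0.0, 0
    done = False
    while not done:
        action = env.action_space.sample() if policy is None else policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        steps += 1
        # Checking both flags keeps the loop valid for any Gymnasium
        # environment, including ones that end only by truncation.
        done = terminated or truncated
    return total_reward, steps
```

With the package installed, `rollout(gym.make("hyrosphere-v0"))` would be expected to stop once the environment reports truncation.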

Scripts

poetry run python viewer.py [--env hyro|linear]              # interactive OpenGL viewer
poetry run python train.py  --env hyro --timesteps 2_000_000 # PPO training
poetry run python play.py   [--run runs/<dir>]               # watch trained policy
poetry run tensorboard --logdir runs/                        # training curves
poetry run python benchmark.py                               # 10k physics-step timing
poetry run python plot-progress.py -f progress.csv -n 100    # live-plot CSV

train.py runs PPO with a VecNormalize wrapper (observation and reward normalization) and writes checkpoints and TensorBoard logs to runs/<env>-<timestamp>/. play.py loads the most recent run by default.
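How play.py selects the most recent run is not shown here. One plausible approach, sketched below, is to take the subdirectory of runs/ with the newest modification time. The helper name `latest_run` and the use of modification time (rather than parsing the timestamp in the directory name) are assumptions of this sketch.

```python
from pathlib import Path
from typing import Optional


def latest_run(root: str = "runs") -> Optional[Path]:
    """Return the most recently modified subdirectory of `root`, or None.

    Hypothetical helper: ordering by modification time is an assumption,
    not necessarily what play.py does.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return None
    run_dirs = [p for p in root_path.iterdir() if p.is_dir()]
    if not run_dirs:
        return None
    return max(run_dirs, key=lambda p: p.stat().st_mtime)
```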

Docs

The math behind the simulator is in docs/: start with overview.md, then dynamics.md for the shared rigid-body equations and hyrosphere.md / linearsphere.md for per-model specifics.

Notes

  • The original repo trained with OpenAI Baselines (python -m baselines.run --alg=acktr ...). That package is unmaintained and the saved checkpoints (hyrosphere-acktr, linearsphere-acktr) are not compatible with this Gymnasium-based rewrite. Use stable-baselines3 or another modern RL library to train against the env.
  • step() returns the 5-tuple (obs, reward, terminated, truncated, info); reset(seed=...) returns (obs, info). There is no terminal condition, so terminated is never set; an episode ends only when truncated becomes true after 10 s.
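For readers who want to train with stable-baselines3 directly rather than through train.py, a minimal sketch might look like the following. It is not the contents of train.py: the function name `train_ppo_sketch`, the choice of four parallel environments, the output file names and the `runs/sketch` directory are all assumptions made for illustration. Imports sit inside the function so it can be defined without the training dependencies installed.

```python
import os


def train_ppo_sketch(env_id="hyrosphere-v0", total_timesteps=2_000_000,
                     log_dir="runs/sketch"):
    """Train PPO with observation and reward normalization (illustrative only)."""
    # Local imports: these require the project's dependencies to be installed.
    import gym_hyrosphere  # noqa: F401  (registers the environments)
    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env
    from stable_baselines3.common.vec_env import VecNormalize

    os.makedirs(log_dir, exist_ok=True)

    # n_envs=4 is an arbitrary choice for this sketch.
    venv = make_vec_env(env_id, n_envs=4)
    venv = VecNormalize(venv, norm_obs=True, norm_reward=True)

    model = PPO("MlpPolicy", venv, tensorboard_log=log_dir, verbose=1)
    model.learn(total_timesteps=total_timesteps)

    # Save the policy and the normalization statistics; both are needed
    # to evaluate the trained policy later.
    model.save(os.path.join(log_dir, "ppo_model"))
    venv.save(os.path.join(log_dir, "vecnormalize.pkl"))
    return model
```

Saving the VecNormalize statistics alongside the model matters because a policy trained on normalized observations will see differently scaled inputs if evaluated on the raw environment.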
