Update quickstart guide for Stable-Baselines3 #9
@@ -4,44 +4,70 @@
Getting Started
===============

Most of the library tries to follow a sklearn-like syntax for the Reinforcement Learning algorithms.
Stable-Baselines3 follows a scikit-learn–like interface for Reinforcement Learning algorithms.
Models are created, trained using `.learn()`, and used for prediction via `.predict()`.

Here is a quick example of how to train and run PPO2 on a cartpole environment:
Quick Example: Train PPO on CartPole
=====================================

Before running the example, install the required dependencies:

.. code-block:: bash

    pip install stable-baselines3[extra]
    pip install gymnasium[box2d]

Comment on lines +17 to +18

Basic Training Example
----------------------

.. code-block:: python

    import gym
    from stable_baselines3.common.vec_env import DummyVecEnv

    from stable_baselines3 import PPO

    import gymnasium as gym

    # Create a vectorized environment
    env = DummyVecEnv([lambda: gym.make("CartPole-v1")])

    # Create the model
    model = PPO("MlpPolicy", env, verbose=1)

    from stable_baselines.common.policies import MlpPolicy
    from stable_baselines.common.vec_env import DummyVecEnv
    from stable_baselines import PPO2
    # Train the model
    model.learn(total_timesteps=10000)

    env = gym.make('CartPole-v1')
    # Optional: PPO2 requires a vectorized environment to run
    # the env is now wrapped automatically when passing it to the constructor
    # env = DummyVecEnv([lambda: env])
    # Test the trained model
    obs = env.reset()
    for _ in range(1000):
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, done, info = env.step(action)

Suggested change (original line, then proposed replacement):
obs, reward, done, info = env.step(action)
obs, rewards, dones, infos = env.step(action)
Copilot AI, Feb 26, 2026
The comment states that DummyVecEnv is "required by Stable-Baselines3", but this repository is stable-baselines (v2). The explanation should reference stable-baselines, not stable-baselines3. Additionally, in stable-baselines v2, vectorized environments are optional for PPO2 (the environment is wrapped automatically when passing it to the constructor).
Suggested change (original line, then proposed replacement):
- ``DummyVecEnv`` wraps the environment into a vectorized format required by Stable-Baselines3.
- ``DummyVecEnv`` wraps the environment into a vectorized format used by stable-baselines; for PPO2 in stable-baselines (v2), this wrapping is optional because the environment is automatically vectorized when passed to the constructor.
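To illustrate what "a vectorized format" means here, and why a vectorized `step` returns plural `rewards`, `dones`, and `infos`, the following is a rough, standard-library-only sketch. `ToyEnv` and `ToyVecEnv` are hypothetical stand-ins written for this note, not classes from stable-baselines or Gym. The auto-reset on `done` mirrors how `DummyVecEnv` is generally understood to behave and should be treated as an assumption.

```python
class ToyEnv:
    """Hypothetical stand-in for a single Gym-style environment."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        # The action is ignored; the episode simply ends after three steps.
        self.t += 1
        obs = float(self.t)
        reward = 1.0
        done = self.t >= 3
        return obs, reward, done, {}


class ToyVecEnv:
    """Hypothetical batching wrapper: one list entry per wrapped environment."""

    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        results = []
        for env, action in zip(self.envs, actions):
            obs, reward, done, info = env.step(action)
            if done:
                # Assumption: finished sub-environments are reset automatically.
                obs = env.reset()
            results.append((obs, reward, done, info))
        # Transpose the per-env tuples into batched lists.
        obs, rewards, dones, infos = (list(column) for column in zip(*results))
        return obs, rewards, dones, infos


vec_env = ToyVecEnv([ToyEnv])
obs = vec_env.reset()
obs, rewards, dones, infos = vec_env.step([0])
```

Even with a single wrapped environment, every returned value is a batch of length one, which is the reason for the plural names in the suggested `env.step` line.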
Copilot AI, Feb 26, 2026
The comment references "Gymnasium", but stable-baselines (v2) uses OpenAI Gym, not Gymnasium. Gymnasium is used by stable-baselines3, not this library.
Copilot AI, Feb 26, 2026
The import uses "from stable_baselines3 import PPO", but this repository is stable-baselines (v2) which uses "from stable_baselines import PPO2" (note the different module name and algorithm class name).
This documentation update appears to be for the wrong repository. This is the stable-baselines (v2) repository, which is in maintenance mode and uses the old API with "stable_baselines" imports and "gym" environments. However, this PR updates the documentation to use stable-baselines3 API patterns. The description mentions stable-baselines3, but this should remain as stable-baselines v2 documentation. Please review the entire change and ensure it matches the correct library version.
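For reference, a minimal sketch of what the quickstart could look like if it stays on the stable-baselines (v2) API, pieced together from the `stable_baselines`, `PPO2`, `MlpPolicy`, and plain `gym` lines visible in the diff. The function name `train_ppo2_cartpole` and its `eval_steps` parameter are hypothetical. The imports sit inside the function so the snippet can be loaded without the library installed. Running it assumes stable-baselines v2 and a `gym` release with the classic four-value `step` API.

```python
def train_ppo2_cartpole(total_timesteps=10000, eval_steps=1000):
    """Train PPO2 on CartPole-v1, then roll out the policy (hypothetical helper)."""
    # Local imports on purpose: they require stable-baselines (v2) and classic gym.
    import gym
    from stable_baselines import PPO2
    from stable_baselines.common.policies import MlpPolicy

    env = gym.make("CartPole-v1")

    # Per the review comments, PPO2 wraps a plain env in a vectorized env
    # automatically, so no explicit DummyVecEnv is needed here.
    model = PPO2(MlpPolicy, env, verbose=1)
    model.learn(total_timesteps=total_timesteps)

    # Roll out the trained policy on the unwrapped environment.
    obs = env.reset()
    for _ in range(eval_steps):
        action, _states = model.predict(obs)
        obs, reward, done, info = env.step(action)
        if done:
            obs = env.reset()
    return model
```

Calling `train_ppo2_cartpole()` would train for 10000 timesteps, matching the `total_timesteps` value used in the PR.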