72 changes: 49 additions & 23 deletions docs/guide/quickstart.rst
@@ -4,44 +4,70 @@
Getting Started
===============

Most of the library tries to follow a sklearn-like syntax for the Reinforcement Learning algorithms.
Stable-Baselines3 follows a scikit-learn–like interface for Reinforcement Learning algorithms.
Models are created, trained using `.learn()`, and used for prediction via `.predict()`.
Comment on lines +7 to +8
Copilot AI Feb 26, 2026

This documentation update appears to be for the wrong repository. This is the stable-baselines (v2) repository, which is in maintenance mode and uses the old API with "stable_baselines" imports and "gym" environments. However, this PR updates the documentation to use stable-baselines3 API patterns. The description mentions stable-baselines3, but this should remain as stable-baselines v2 documentation. Please review the entire change and ensure it matches the correct library version.

Copilot uses AI. Check for mistakes.

Here is a quick example of how to train and run PPO2 on a cartpole environment:
Quick Example: Train PPO on CartPole
=====================================

Before running the example, install the required dependencies:

.. code-block:: bash

pip install stable-baselines3[extra]
pip install gymnasium[box2d]
Comment on lines +17 to +18
Copilot AI Feb 26, 2026

The installation instructions reference stable-baselines3, but this repository is for stable-baselines (v2). The correct installation for this library would be "pip install stable-baselines" not "pip install stable-baselines3[extra]".


Basic Training Example
----------------------

.. code-block:: python

import gym
from stable_baselines3.common.vec_env import DummyVecEnv
Copilot AI Feb 26, 2026

The import statement uses "from stable_baselines3.common.vec_env import DummyVecEnv", but this repository is stable-baselines (v2) which uses "from stable_baselines.common.vec_env import DummyVecEnv" (note: stable_baselines, not stable_baselines3).

from stable_baselines3 import PPO
Copilot AI Feb 26, 2026

The import statement uses "from stable_baselines3 import PPO", but this repository is stable-baselines (v2) which uses "from stable_baselines import PPO2" (note: PPO2, not PPO). This import will fail for users of this library.

import gymnasium as gym
Copilot AI Feb 26, 2026

The import uses "gymnasium as gym", but stable-baselines (v2) is designed to work with the old "gym" library (OpenAI Gym), not "gymnasium". Throughout the rest of the documentation in this repository, "import gym" is used consistently. Gymnasium is used by stable-baselines3, not stable-baselines v2.


# Create a vectorized environment
env = DummyVecEnv([lambda: gym.make("CartPole-v1")])

# Create the model
model = PPO("MlpPolicy", env, verbose=1)

from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines import PPO2
# Train the model
model.learn(total_timesteps=10000)

env = gym.make('CartPole-v1')
# Optional: PPO2 requires a vectorized environment to run
# the env is now wrapped automatically when passing it to the constructor
# env = DummyVecEnv([lambda: env])
# Test the trained model
obs = env.reset()
for _ in range(1000):
action, _ = model.predict(obs, deterministic=True)
obs, reward, done, info = env.step(action)
Copilot AI Feb 26, 2026

When using vectorized environments in stable-baselines (v2), the step() method returns arrays with pluralized variable names: "obs, rewards, dones, info". This code uses singular "reward, done" which would cause confusion. Check examples.rst lines 87, 145, etc. for the correct pattern with vectorized environments in stable-baselines v2.

Suggested change
obs, reward, done, info = env.step(action)
obs, rewards, dones, infos = env.step(action)

env.render()

model = PPO2(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=10000)
Explanation
-----------

obs = env.reset()
for i in range(1000):
action, _states = model.predict(obs)
obs, rewards, dones, info = env.step(action)
env.render()
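The idea behind a vectorized environment such as ``DummyVecEnv`` — several environments stepping in lockstep, with observations, rewards, and dones returned as batches — can be sketched in plain Python. ``ToyVecEnv`` and ``ToyEnv`` below are toy illustrations, not the actual library classes:

```python
# Toy illustration of vectorized stepping (not the real DummyVecEnv).
class ToyVecEnv:
    def __init__(self, env_fns):
        # Like DummyVecEnv, takes a list of callables that build environments.
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        results = [env.step(a) for a, env in zip(actions, self.envs)]
        # Transpose per-env tuples into batched lists.
        obs, rewards, dones, infos = map(list, zip(*results))
        return obs, rewards, dones, infos


class ToyEnv:
    def reset(self):
        return 0.0

    def step(self, action):
        return 0.0, 1.0, False, {}  # obs, reward, done, info


vec_env = ToyVecEnv([ToyEnv for _ in range(4)])
obs = vec_env.reset()  # batch of 4 observations
obs, rewards, dones, infos = vec_env.step([0, 0, 0, 0])
```

This batched return shape is why, with a vectorized environment, ``step()`` yields pluralized arrays (``rewards``, ``dones``) rather than scalars.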
- ``DummyVecEnv`` wraps the environment into a vectorized format required by Stable-Baselines3.
Copilot AI Feb 26, 2026

The comment states that DummyVecEnv is "required by Stable-Baselines3", but this repository is stable-baselines (v2). The explanation should reference stable-baselines, not stable-baselines3. Additionally, in stable-baselines v2, vectorized environments are optional for PPO2 (the environment is wrapped automatically when passing it to the constructor).

Suggested change
- ``DummyVecEnv`` wraps the environment into a vectorized format required by Stable-Baselines3.
- ``DummyVecEnv`` wraps the environment into a vectorized format used by stable-baselines; for PPO2 in stable-baselines (v2), this wrapping is optional because the environment is automatically vectorized when passed to the constructor.

- ``"MlpPolicy"`` specifies a Multi-Layer Perceptron policy (default for non-image tasks).
- ``learn(total_timesteps=10000)`` trains the agent.
- ``predict()`` selects actions from the trained model.
- ``deterministic=True`` ensures consistent evaluation behavior.
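The effect of ``deterministic=True`` can be illustrated with a toy discrete policy: stochastic prediction samples an action from the policy's distribution, while deterministic prediction always picks the most likely action. This is a sketch of the concept, not the library's implementation:

```python
import random

# Toy policy: a fixed probability distribution over two discrete actions.
probs = [0.8, 0.2]


def predict(obs, deterministic=False):
    if deterministic:
        # Always return the highest-probability action.
        return max(range(len(probs)), key=lambda a: probs[a])
    # Sample an action according to the distribution.
    return random.choices(range(len(probs)), weights=probs)[0]


# Deterministic calls always agree, which makes evaluation reproducible;
# stochastic calls may vary from run to run.
assert all(predict(None, deterministic=True) == 0 for _ in range(10))
```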

One-Liner Training
==================

Or just train a model with a one liner if
`the environment is registered in Gym <https://github.com/openai/gym/wiki/Environments>`_ and if
`the policy is registered <custom_policy.html>`_:
If the environment is registered in Gymnasium and the default policy is appropriate,
Copilot AI Feb 26, 2026

The comment references "Gymnasium", but stable-baselines (v2) uses OpenAI Gym, not Gymnasium. Gymnasium is used by stable-baselines3, not this library.

a model can be trained in a single line:

.. code-block:: python

from stable_baselines import PPO2
from stable_baselines3 import PPO
Copilot AI Feb 26, 2026

The import uses "from stable_baselines3 import PPO", but this repository is stable-baselines (v2) which uses "from stable_baselines import PPO2" (note the different module name and algorithm class name).


model = PPO2('MlpPolicy', 'CartPole-v1').learn(10000)
model = PPO("MlpPolicy", "CartPole-v1").learn(10000)

This automatically:

- Creates the environment
- Wraps it properly
- Trains the model
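The one-liner is possible because ``learn()`` returns the model itself, so construction and training chain into a single expression. A minimal sketch of that chaining pattern, using a hypothetical ``ChainableModel``:

```python
# Hypothetical sketch of why the one-liner works: learn() returns self.
class ChainableModel:
    def __init__(self, policy, env_id):
        self.policy = policy
        self.env_id = env_id
        self.steps = 0

    def learn(self, total_timesteps):
        self.steps += total_timesteps
        return self  # enables Model(...).learn(...) in one expression


model = ChainableModel("MlpPolicy", "CartPole-v1").learn(10000)
```

Because ``learn()`` hands back the trained model, the single expression both builds and trains it, and the result can be kept for later ``predict()`` calls.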

.. figure:: https://cdn-images-1.medium.com/max/960/1*R_VMmdgKAY0EDhEjHVelzw.gif

Define and train a RL agent in one line of code!
Define and train a Reinforcement Learning agent in one line of code!