72 changes: 49 additions & 23 deletions docs/guide/quickstart.rst
@@ -4,44 +4,70 @@
Getting Started
===============

Most of the library tries to follow a sklearn-like syntax for the Reinforcement Learning algorithms.
Stable-Baselines3 follows a scikit-learn–like interface for Reinforcement Learning algorithms.
Models are created, trained using `.learn()`, and used for prediction via `.predict()`.
Comment on lines +7 to +8
Copilot AI Feb 26, 2026

This documentation update appears to be for the wrong repository. This is the stable-baselines (v2) repository, which is in maintenance mode and uses the old API with "stable_baselines" imports and "gym" environments. However, this PR updates the documentation to use stable-baselines3 API patterns. The description mentions stable-baselines3, but this should remain as stable-baselines v2 documentation. Please review the entire change and ensure it matches the correct library version.

Copilot uses AI. Check for mistakes.

Here is a quick example of how to train and run PPO2 on a cartpole environment:
Quick Example: Train PPO on CartPole
=====================================

Before running the example, install the required dependencies:

.. code-block:: bash

pip install stable-baselines3[extra]
pip install gymnasium[box2d]
Comment on lines +17 to +18
Copilot AI Feb 26, 2026

The installation instructions reference stable-baselines3, but this repository is for stable-baselines (v2). The correct installation for this library would be "pip install stable-baselines" not "pip install stable-baselines3[extra]".


Basic Training Example
----------------------

.. code-block:: python

import gym
from stable_baselines3.common.vec_env import DummyVecEnv
Copilot AI Feb 26, 2026

The import statement uses "from stable_baselines3.common.vec_env import DummyVecEnv", but this repository is stable-baselines (v2) which uses "from stable_baselines.common.vec_env import DummyVecEnv" (note: stable_baselines, not stable_baselines3).

from stable_baselines3 import PPO
Copilot AI Feb 26, 2026

The import statement uses "from stable_baselines3 import PPO", but this repository is stable-baselines (v2) which uses "from stable_baselines import PPO2" (note: PPO2, not PPO). This import will fail for users of this library.

import gymnasium as gym
Copilot AI Feb 26, 2026

The import uses "gymnasium as gym", but stable-baselines (v2) is designed to work with the old "gym" library (OpenAI Gym), not "gymnasium". Throughout the rest of the documentation in this repository, "import gym" is used consistently. Gymnasium is used by stable-baselines3, not stable-baselines v2.


# Create a vectorized environment
env = DummyVecEnv([lambda: gym.make("CartPole-v1")])

# Create the model
model = PPO("MlpPolicy", env, verbose=1)

from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines import PPO2
# Train the model
model.learn(total_timesteps=10000)

env = gym.make('CartPole-v1')
# Optional: PPO2 requires a vectorized environment to run
# the env is now wrapped automatically when passing it to the constructor
# env = DummyVecEnv([lambda: env])
# Test the trained model
obs = env.reset()
for _ in range(1000):
action, _ = model.predict(obs, deterministic=True)
obs, reward, done, info = env.step(action)
Copilot AI Feb 26, 2026

When using vectorized environments in stable-baselines (v2), the step() method returns arrays with pluralized variable names: "obs, rewards, dones, info". This code uses singular "reward, done" which would cause confusion. Check examples.rst lines 87, 145, etc. for the correct pattern with vectorized environments in stable-baselines v2.

Suggested change
obs, reward, done, info = env.step(action)
obs, rewards, dones, infos = env.step(action)

env.render()

model = PPO2(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=10000)
Explanation
-----------

obs = env.reset()
for i in range(1000):
action, _states = model.predict(obs)
obs, rewards, dones, info = env.step(action)
env.render()
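The idea behind a vectorized environment such as ``DummyVecEnv`` — several environments stepping in lockstep, with observations, rewards, and dones returned as batches — can be sketched in plain Python. ``ToyVecEnv`` and ``ToyEnv`` below are toy illustrations, not the actual library classes:

```python
# Toy illustration of vectorized stepping (not the real DummyVecEnv).
class ToyVecEnv:
    def __init__(self, env_fns):
        # Like DummyVecEnv, takes a list of callables that build environments.
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        results = [env.step(a) for a, env in zip(actions, self.envs)]
        # Transpose per-env tuples into batched lists.
        obs, rewards, dones, infos = map(list, zip(*results))
        return obs, rewards, dones, infos


class ToyEnv:
    def reset(self):
        return 0.0

    def step(self, action):
        return 0.0, 1.0, False, {}  # obs, reward, done, info


vec_env = ToyVecEnv([ToyEnv for _ in range(4)])
obs = vec_env.reset()  # batch of 4 observations
obs, rewards, dones, infos = vec_env.step([0, 0, 0, 0])
```

This batched return shape is why, with a vectorized environment, ``step()`` yields pluralized arrays (``rewards``, ``dones``) rather than scalars.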
- ``DummyVecEnv`` wraps the environment into a vectorized format required by Stable-Baselines3.
Copilot AI Feb 26, 2026

The comment states that DummyVecEnv is "required by Stable-Baselines3", but this repository is stable-baselines (v2). The explanation should reference stable-baselines, not stable-baselines3. Additionally, in stable-baselines v2, vectorized environments are optional for PPO2 (the environment is wrapped automatically when passing it to the constructor).

Suggested change
- ``DummyVecEnv`` wraps the environment into a vectorized format required by Stable-Baselines3.
- ``DummyVecEnv`` wraps the environment into a vectorized format used by stable-baselines; for PPO2 in stable-baselines (v2), this wrapping is optional because the environment is automatically vectorized when passed to the constructor.

- ``"MlpPolicy"`` specifies a Multi-Layer Perceptron policy (default for non-image tasks).
- ``learn(total_timesteps=10000)`` trains the agent.
- ``predict()`` selects actions from the trained model.
- ``deterministic=True`` ensures consistent evaluation behavior.
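The effect of ``deterministic=True`` can be illustrated with a toy discrete policy: stochastic prediction samples an action from the policy's distribution, while deterministic prediction always picks the most likely action. This is a sketch of the concept, not the library's implementation:

```python
import random

# Toy policy: a fixed probability distribution over two discrete actions.
probs = [0.8, 0.2]


def predict(obs, deterministic=False):
    if deterministic:
        # Always return the highest-probability action.
        return max(range(len(probs)), key=lambda a: probs[a])
    # Sample an action according to the distribution.
    return random.choices(range(len(probs)), weights=probs)[0]


# Deterministic calls always agree, which makes evaluation reproducible;
# stochastic calls may vary from run to run.
assert all(predict(None, deterministic=True) == 0 for _ in range(10))
```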

One-Liner Training
==================

Or just train a model with a one liner if
`the environment is registered in Gym <https://github.com/openai/gym/wiki/Environments>`_ and if
`the policy is registered <custom_policy.html>`_:
If the environment is registered in Gymnasium and the default policy is appropriate,
Copilot AI Feb 26, 2026

The comment references "Gymnasium", but stable-baselines (v2) uses OpenAI Gym, not Gymnasium. Gymnasium is used by stable-baselines3, not this library.

a model can be trained in a single line:

.. code-block:: python

from stable_baselines import PPO2
from stable_baselines3 import PPO
Copilot AI Feb 26, 2026

The import uses "from stable_baselines3 import PPO", but this repository is stable-baselines (v2) which uses "from stable_baselines import PPO2" (note the different module name and algorithm class name).


model = PPO2('MlpPolicy', 'CartPole-v1').learn(10000)
model = PPO("MlpPolicy", "CartPole-v1").learn(10000)

This automatically:

- Creates the environment
- Wraps it properly
- Trains the model
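The one-liner is possible because ``learn()`` returns the model itself, so construction and training chain into a single expression. A minimal sketch of that chaining pattern, using a hypothetical ``ChainableModel``:

```python
# Hypothetical sketch of why the one-liner works: learn() returns self.
class ChainableModel:
    def __init__(self, policy, env_id):
        self.policy = policy
        self.env_id = env_id
        self.steps = 0

    def learn(self, total_timesteps):
        self.steps += total_timesteps
        return self  # enables Model(...).learn(...) in one expression


model = ChainableModel("MlpPolicy", "CartPole-v1").learn(10000)
```

Because ``learn()`` hands back the trained model, the single expression both builds and trains it, and the result can be kept for later ``predict()`` calls.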

.. figure:: https://cdn-images-1.medium.com/max/960/1*R_VMmdgKAY0EDhEjHVelzw.gif

Define and train a RL agent in one line of code!
Define and train a Reinforcement Learning agent in one line of code!