GEM

General Gym - A comprehensive framework for reinforcement learning environments that provides a unified interface for various RL tasks.

Advanced GEM features, custom environments, and training.
GEM makes it simple to create custom environments. To create a new environment, simply add .reset() and .step() methods, and then register the environment. See the examples for more information.

gem.core.Env.reset()

Returns:

- obs (str) - Initial question/observation from the environment.
- info (dict) - Any extra info, e.g. for logging or to aid debugging.

gem.core.Env.step(action)

Returns:

- obs (str) - Next observation/output from the environment.
- reward (float) - Reward for the action.
- terminated (bool) - Whether the episode has ended.
- truncated (bool) - Following Gym environments but currently unused.
- info (dict) - Any extra info.

Your environment should extend the base class gem.core.Env and implement the .reset() and .step() logic:

```python
import random
import string

from gem.core import Env
from gem.envs.registration import register

# Note: TERMINAL_STATE and extract_last_boxed_answer are GEM helpers;
# import them from the corresponding gem modules.


class ReverseStringEnv(Env):
    def __init__(self, str_len: int = 5):
        super().__init__()
        self.str_len = str_len

    def _get_instructions(self) -> str:
        return (
            "You are tasked to reverse a given string.\n"
            "You may provide your response in any manner. Only the content wrapped inside \\boxed{} will be considered as your final answer.\n"
            f"Please reverse the string: {self.gt_str}.\n"
        )

    def reset(self, seed=None):
        super().reset(seed)
        characters = string.ascii_letters + string.digits  # A-Z, a-z, 0-9
        self.gt_str = "".join(random.choices(characters, k=self.str_len))
        return self._get_instructions(), {}

    def step(self, action):
        clean_action = extract_last_boxed_answer(action)
        if clean_action is None:
            reward = 0.0
        else:
            reward = float(clean_action[::-1] == self.gt_str)
        return TERMINAL_STATE, reward, True, True, {}


# Register your environment
register("custom:ReverseString", ReverseStringEnv)
```
GEM supports a diverse range of environments and makes it easy to add your own. GEM provides four main categories of environments, each designed for different types of agent training and evaluation.

All GEM environments follow a consistent interface pattern:

- env.reset() - Start a new episode and return the first observation
- env.step(action) - Apply an action and return the next observation, reward, and episode status
- env.sample_random_action() - Get a random valid action

This design closely follows the Gymnasium standard, making it easy to integrate with existing RL frameworks and tools.
Interactive game environments including Sudoku, Minesweeper, Wordle, and more from the TextArena collection.

We maintain local versions of many of the TextArena games with (i) improved dense game reward design and (ii) a compatible gym-style interface.
| Environment | Description |
|---|---|
| `game:GuessTheNumber` | The agent has multiple guesses to find the hidden number. After each guess, the agent is told whether the hidden number is higher or lower than its guess. |
| `game:Mastermind` | The agent has multiple guesses to find the hidden code. The agent receives black and white pegs depending on the number of correct digits in the right and wrong places. |
| `game:Minesweeper` | The agent must reveal all safe grid squares without revealing a mine. For each revealed square the agent receives the number of adjacent squares that contain mines. |
| `game:Wordle` | The agent must guess the hidden word. After each turn the agent receives feedback ("G" = correct letter + correct position, "Y" = correct letter + incorrect position, "X" = incorrect letter). |
| `game:FifteenPuzzle` | Arrange tiles on the board into ascending order by using the empty space to slide tiles into different positions. |
| `game:Hangman` | Guess the hidden word by providing single-letter guesses or the entire word. |
| `game:Sudoku` | Classic Sudoku game. The `easy` variant renders a 4x4 board. |
| `game:TowerofHanoi` | A classic single-player puzzle game where the objective is to move a stack of disks from one tower to another following specific rules. |
Each environment additionally has `-easy`, `-hard`, and `-random` variants, where `-random` sets the environment to a random difficulty level at each reset.

Adding new games is easy: simply implement the .reset() and .step() methods and register the environment under a new name.
Mathematical reasoning environments with automatic answer parsing and checking, compatible with various math datasets.
GEM’s math environment class includes automatic answer parsing and checking, and is designed to be compatible with any math dataset. To add a new environment, simply register the dataset. A typical use case is combining these environments with access to the Python tool, to train the agent to utilize code.
| Environment | Dataset |
|---|---|
| `math:ASDIV2k` | ASDIV-2k |
| `math:GSM8k` | GSM-8k |
| `math:Math12k` | MATH-12k |
| `math:ORZ57k` | ORZ-57k |
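The automatic answer checking used by the math environments can be sketched as follows. This is an illustrative stand-alone re-implementation, not GEM's actual parser (which handles many more answer formats); the helper name mirrors GEM's but the body here is an assumption.

```python
import re


def extract_last_boxed_answer(text):
    """Return the content of the last \\boxed{...} in the text, or None.
    Simplified sketch: does not handle nested braces."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None


def math_reward(response, ground_truth):
    """Reward 1.0 if the boxed answer matches the ground truth, else 0.0."""
    answer = extract_last_boxed_answer(response)
    if answer is None:
        return 0.0
    return float(answer.strip() == ground_truth.strip())


print(math_reward(r"The sum is \boxed{42}.", "42"))  # 1.0
```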
Code generation and evaluation environments that automatically test solutions in sandboxed environments.
GEM’s code environment class automatically evaluates success by running the test cases in a sandbox. It can be used with any code dataset consisting of tasks and test cases.
| Environment | Dataset |
|---|---|
| `code:CodeContest` | CodeContest |
| `code:Taco8k` | TACO-8k |
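Conceptually, the evaluation step runs the candidate program against each test case and scores the fraction that pass. Below is a minimal sketch using a plain subprocess; GEM's actual sandbox is assumed to provide stronger isolation and resource limits than this.

```python
import subprocess
import sys


def run_tests(solution_code, test_cases, timeout=5.0):
    """Run stdin/stdout test cases against a candidate Python program
    in a subprocess. Returns the fraction of test cases passed."""
    passed = 0
    for stdin_data, expected_stdout in test_cases:
        try:
            result = subprocess.run(
                [sys.executable, "-c", solution_code],
                input=stdin_data,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            if result.stdout.strip() == expected_stdout.strip():
                passed += 1
        except subprocess.TimeoutExpired:
            pass  # a hung solution counts as a failure
    return passed / len(test_cases)


# A doubling program checked against two test cases
score = run_tests("print(int(input()) * 2)", [("2", "4"), ("10", "20")])
print(score)  # 1.0
```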
QA environments designed for integrated search tool usage, to train agents in information retrieval and reasoning.

GEM’s question-answering environments are designed to allow integrated search tool usage, training the agent to use search functionality. Additional question-answering environments can be added by simply registering the dataset.
| Environment | Dataset |
|---|---|
| `qa:NaturalQuestions` | NaturalQuestions |
| `qa:HotpotQA` | HotpotQA |
| `logic:RuleTaker-d0` | RuleTaker-d0-70k |
| `logic:RuleTaker-d1` | RuleTaker-d1-70k |
| `logic:RuleTaker-d2` | RuleTaker-d2-70k |
| `logic:RuleTaker-d3` | RuleTaker-d3-70k |
| `logic:RuleTaker-d5` | RuleTaker-d5-70k |
We include all tasks from Reasoning Gym in our package; they can be used simply by calling `make(rg:[sub_task_name])`.
Following the Gym interface, GEM provides wrappers to easily add and change functionality. Wrappers are registered in the WRAPPER_FACTORY.
The main wrapper types are: observation wrappers, tool wrappers, and episode tracking wrappers.
Observation wrappers convert the sequence of game states and agent actions into a string, which is used as the prompt for the LLM agent at the next step.
| Wrapper name | Description | Example (Mastermind) |
|---|---|---|
| no wrapper | The observation string from the environment. | "At turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s)." |
| `concat` | The sequence of environment observation strings from all previous steps concatenated together. | "You are playing Mastermind. [instructions]... Enter your first guess to start the game.\nAt turn 1, you guessed 123. This guess receives 1 black peg(s) and 1 white peg(s).\nAt turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s)." |
| `concat_with_action` | The sequence of [environment observation string, agent action, environment observation string, etc.] from all previous steps concatenated together. | "You are playing Mastermind. [instructions]... Enter your first guess to start the game.\nOkay, I will guess a random 3 digit number for now. My first guess will be \\boxed{123}.\nAt turn 1, you guessed 123. This guess receives 1 black peg(s) and 1 white peg(s).\nOkay, let's think. One digit is in the correct place. Perhaps this is 3. One digit is completely incorrect. Let's try switching 1 for 4 and moving the 2. My next guess will be \\boxed{243}.\nAt turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s)." |
| `concat_chat` (default) | The sequence of [environment observation string, agent action, environment observation string, etc.] from all previous steps concatenated together with the chat template applied to denote "user" (environment) vs "assistant" (agent) turns. | "<\|im_start\|>user\nYou are playing Mastermind. [instructions]... Enter your first guess to start the game.<\|im_end\|>\n<\|im_start\|>assistant\nOkay, I will guess a random 3 digit number for now. My first guess will be \\boxed{123}<\|im_end\|> <\|im_start\|>user\nAt turn 1, you guessed 123. This guess receives 1 black peg(s) and 1 white peg(s).<\|im_end\|>\n<\|im_start\|>assistant\nOkay, let's think. One digit is in the correct place. Perhaps this is 3. One digit is completely incorrect. Let's try switching 1 for 4 and moving the 2. My next guess will be \\boxed{243}.<\|im_end\|>\n<\|im_start\|>user\nAt turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s).<\|im_end\|>\n<\|im_start\|>assistant" |
| `concat_chat_on_reset` | Same as `concat_with_action` but the chat template tag is applied at the start. | "<\|im_start\|>user\nYou are playing Mastermind. [instructions]... Enter your first guess to start the game.\nOkay, I will guess a random 3 digit number for now. My first guess will be \\boxed{123}.\nAt turn 1, you guessed 123. This guess receives 1 black peg(s) and 1 white peg(s).\nOkay, let's think. One digit is in the correct place. Perhaps this is 3. One digit is completely incorrect. Let's try switching 1 for 4 and moving the 2. My next guess will be \\boxed{243}.\nAt turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s)." |
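As an illustration, a `concat`-style observation wrapper can be sketched in a few lines. This is a toy version for clarity, not GEM's implementation (GEM's wrappers are registered in the WRAPPER_FACTORY and also handle chat templates); `EchoEnv` is a hypothetical stand-in environment.

```python
class EchoEnv:
    """Toy stand-in environment, for demonstration only."""

    def reset(self, seed=None):
        return "start", {}

    def step(self, action):
        return f"you said {action}", 0.0, False, False, {}


class ConcatObservationWrapper:
    """Sketch of a `concat`-style wrapper: the prompt at each step is the
    concatenation of all environment observations so far."""

    def __init__(self, env):
        self.env = env
        self._history = []

    def reset(self, seed=None):
        obs, info = self.env.reset(seed)
        self._history = [obs]
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._history.append(obs)
        # The LLM prompt for the next turn is all observations so far.
        return "\n".join(self._history), reward, terminated, truncated, info


env = ConcatObservationWrapper(EchoEnv())
obs, _ = env.reset()
obs, *_ = env.step("hi")
print(obs)  # "start" and "you said hi" on two lines
```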
GEM supports integrating multiple tools into the same agent. Tools are handled by the tool wrapper.

The input to env.step() is "action", a string which is typically the response from the LLM. With the tool wrapper, when env.step(action) is called, the wrapper iterates through each tool and attempts to parse and execute the action. If any tool executes successfully, the observation from that tool is returned. If no tool executes successfully, the action is passed to the wrapped environment.
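The dispatch logic just described can be sketched as follows. `UpperTool` and `DummyEnv` are hypothetical stand-ins for illustration; GEM's actual tool interface may differ.

```python
def tool_step(tools, env, action):
    """Sketch of the tool-wrapper dispatch: each tool tries to parse the
    action; the first tool that succeeds produces the next observation.
    Otherwise the action is forwarded to the wrapped environment."""
    for tool in tools:
        parsed = tool.parse(action)  # e.g. extract a query or code block
        if parsed is not None:
            obs = tool.execute(parsed)
            # Tool call succeeded: return its output, episode continues.
            return obs, 0.0, False, False, {"tool": tool.name}
    # No tool matched: treat the action as a normal environment step.
    return env.step(action)


class UpperTool:
    """Hypothetical tool that upper-cases text wrapped in <upper> tags."""

    name = "upper"

    def parse(self, action):
        if action.startswith("<upper>") and action.endswith("</upper>"):
            return action[len("<upper>"):-len("</upper>")]
        return None

    def execute(self, text):
        return text.upper()


class DummyEnv:
    def step(self, action):
        return f"env saw: {action}", 1.0, True, True, {}


obs, *_ = tool_step([UpperTool()], DummyEnv(), "<upper>hello</upper>")
print(obs)  # HELLO
```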
The tracking wrapper logs statistics over the episode, such as cumulative rewards. It is not required but can be useful for debugging.
GEM supports collecting multiple episodes in parallel, including asynchronously stepping each of the environments (which may include tool calls etc.). VectorEnv environments automatically reset: when an episode in one of the parallel environments ends, that environment resets and begins the next episode.

Use make_vec() instead of make() when creating environments:
```python
import gem

# Create vectorized environment with 8 parallel instances
vec_env = gem.make_vec("game:GuessTheNumber-v0", num_envs=8)

# Reset all environments
observations, infos = vec_env.reset()

# Step all environments
actions = [vec_env.sample_random_action() for _ in range(8)]
observations, rewards, terminated, truncated, infos = vec_env.step(actions)
```
Vectorization is particularly useful for high-throughput experience collection during agent training.
© 2025 Axon-RL. All rights reserved.
GEM is a diverse collection of environments for training LLM agents in the era of experience. The library includes math, code, general reasoning, and question-answering environments, as well as a suite of games (Mastermind, Minesweeper, Hangman, etc.). GEM also features fully integrated Python and search tool use.
`pip install gem-llm`

Here’s a simple example to get you started. The interface closely follows Gym and other popular RL environment suites.

Environments can be initialized with make() (or make_vec() for parallelization), and each environment has Env.reset(), Env.step() and Env.sample_random_action() functions.

```python
import gem

# Initialize the environment
env = gem.make("game:GuessTheNumber-v0")

# Reset the environment to generate the first observation
observation, info = env.reset()
for _ in range(30):
    action = env.sample_random_action()  # insert policy here

    # apply action and receive next observation, reward
    # and whether the episode has ended
    observation, reward, terminated, truncated, info = env.step(action)

    # If the episode has ended then reset to start a new episode
    if terminated or truncated:
        observation, info = env.reset()
```
GEM includes single-file examples for training an LLM agent through the oat or verl framework.

The OAT framework provides a comprehensive solution for training language model agents in reinforcement learning environments.

The VERL framework offers another approach to training agents, with different optimization strategies and capabilities.
GEM provides tools to enhance agent capabilities and enable more sophisticated problem-solving approaches. It currently supports Python and search tools.
Allows agents to write and execute Python code, enabling computational problem-solving and data manipulation capabilities.
GEM’s Python code tool allows the agent to learn to write code. The tool parses code blocks from the agent’s response, runs them, and returns the result.
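The parse-run-return loop can be sketched as below. This is an illustrative stand-in, not GEM's implementation; in particular, a real tool would execute the code in a sandbox rather than calling exec directly.

```python
import contextlib
import io
import re

FENCE = "`" * 3  # a literal triple backtick


def run_python_blocks(response):
    """Extract the last python code fence from the agent's response,
    execute it, and return the captured stdout (or None if no code)."""
    pattern = re.escape(FENCE + "python") + r"\n(.*?)" + re.escape(FENCE)
    blocks = re.findall(pattern, response, flags=re.DOTALL)
    if not blocks:
        return None  # no code found; the action goes to the environment instead
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(blocks[-1], {})  # WARNING: no sandboxing in this sketch
    return buffer.getvalue()


response = "Let me compute it:\n" + FENCE + "python\nprint(2 + 3)\n" + FENCE
print(run_python_blocks(response))  # prints "5"
```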
GEM includes a search tool, enabling the agent to learn to call search engines for information retrieval and knowledge enhancement.

Agents can use the search tool by including search queries in their responses using the <search></search> tags. The tool parses the <search> content and returns the result of the search if a valid search call is found.
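Parsing the search call can be sketched as follows. This is illustrative only; GEM's actual tool also executes the query against a search backend and returns the results as the next observation.

```python
import re


def parse_search_query(action):
    """Extract the query from the last <search>...</search> call, if any."""
    matches = re.findall(r"<search>(.*?)</search>", action, flags=re.DOTALL)
    return matches[-1].strip() if matches else None


print(parse_search_query("I should check. <search>capital of France</search>"))
# prints "capital of France"
```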
Wiring general intelligence through reinforcement learning

General Gym - A comprehensive framework for reinforcement learning environments that provides a unified interface for various RL tasks.

We introduce GEM, our open-source efforts to build a General Experience Maker.

We're building a team of passionate researchers and developers dedicated to advancing reinforcement learning. More information about our team members will be available soon.
+© 2025 Axon-RL. All rights reserved.
+ +