diff --git a/content/gem/_index.md b/content/gem/_index.md index 014d01d..80f5701 100644 --- a/content/gem/_index.md +++ b/content/gem/_index.md @@ -56,7 +56,7 @@ for _ in range(30): GEM includes __single file__ examples for training an LLM agent through `oat` or `verl` framework.
- train with OAT + train with OAT
The [OAT](https://github.com/sail-sg/oat) framework provides a comprehensive solution for training language model agents in reinforcement learning environments. diff --git a/layouts/gem/single.html b/layouts/gem/single.html index ac2d66b..bce422c 100644 --- a/layouts/gem/single.html +++ b/layouts/gem/single.html @@ -34,7 +34,7 @@

✨ Features

🧱 Advanced

diff --git a/public/categories/index.xml b/public/categories/index.xml index 3a12913..4976d4f 100644 --- a/public/categories/index.xml +++ b/public/categories/index.xml @@ -1,11 +1 @@ - - - - Categories on Axon-RL - Wiring General Intelligence Through Reinforcement Learning - http://localhost:53236/categories/ - Recent content in Categories on Axon-RL - Wiring General Intelligence Through Reinforcement Learning - Hugo - en-us - - - +Categories on Axon-RL - Wiring General Intelligence Through Reinforcement Learninghttps://axon-rl.github.io/categories/Recent content in Categories on Axon-RL - Wiring General Intelligence Through Reinforcement LearningHugo -- gohugo.ioen-us \ No newline at end of file diff --git a/public/gem/advanced/index.html b/public/gem/advanced/index.html index 7b32768..ed0084e 100644 --- a/public/gem/advanced/index.html +++ b/public/gem/advanced/index.html @@ -1,98 +1,81 @@ - - - - - - - Axon-RL - Wiring General Intelligence Through Reinforcement Learning - - - - - - + + + + +Axon-RL - Wiring General Intelligence Through Reinforcement Learning + + + + + - -
- -
- - -
- - -
-

🧱 Advanced

- - - -
-

Overview

+
+ +
+
+ +
+

🧱 Advanced

+
+

Overview

Advanced GEM features, custom environments, and training.

-

Custom Environments

-

GEM makes it simple to create custom environments. To create a new environment, simply add .reset() and .step() methods, and then register the environment here. See examples for more information.

-

gem.core.Env.reset()

+

Custom Environments

+

GEM makes it simple to create custom environments. To create a new environment, simply add .reset() and .step() methods, and then register the environment here. See examples for more information.

+

gem.core.Env.reset()

Returns:

  • obs (str) - Initial question/observation from the environment.
  • info (dict) - Any extra info e.g. for logging or to aid debugging.
-

gem.core.Env.step(action)

+

gem.core.Env.step(action)

Returns:

  • obs (str) - Next observation/output from the environment.
  • @@ -101,47 +84,47 @@

    gem.core.Env.step(action)

  • truncated (bool) - Following Gym environments but currently unused.
  • info (dict) - Any extra info.
-

Creating a Custom Environment

+

Creating a Custom Environment

  1. Inherit from gem.core.Env: Your environment should extend the base environment class
  2. Implement Required Methods: Add your custom .reset() and .step() logic
  3. Register the Environment: Add your environment to the registry for easy access
  4. Test and Validate: Ensure your environment works correctly with GEM’s ecosystem
-

Example Structure

-
from gem.core import Env
-from gem.envs.registration import register
-
-class ReverseStringEnv(Env):
-    def __init__(self, str_len: int = 5):
-        super().__init__()
-        self.str_len = str_len
-
-    def _get_instructions(self) -> str:
-        return (
-            "You are tasked to reverse a given string.\n"
-            "You may provide your response in any manner. Only the content wrapped inside \\boxed{} will be considered as your final answer.\n"
-            f"Please reverse the string: {self.gt_str}.\n"
-        )
-
-    def reset(self, seed=None):
-        super().reset(seed)
-        characters = string.ascii_letters + string.digits  # A-Z, a-z, 0-9
-        self.gt_str = "".join(random.choices(characters, k=self.str_len))
-        return self._get_instructions(), {}
-
-    def step(self, action):
-        clean_action = extract_last_boxed_answer(action)
-        if clean_action is None:
-            reward = 0
-        else:
-            reward = float(clean_action[::-1] == self.gt_str)
-        return TERMINAL_STATE, reward, True, True, {}
-
-
-# Register your environment
-register("custom:ReverseString", ReverseStringEnv)
-

Best Practices

+

Example Structure

+
from gem.core import Env
+from gem.envs.registration import register
+import random
+import string
+# extract_last_boxed_answer and TERMINAL_STATE are helpers assumed to be
+# provided by the gem package.
+
+class ReverseStringEnv(Env):
+    def __init__(self, str_len: int = 5):
+        super().__init__()
+        self.str_len = str_len
+
+    def _get_instructions(self) -> str:
+        return (
+            "You are tasked to reverse a given string.\n"
+            "You may provide your response in any manner. Only the content wrapped inside \\boxed{} will be considered as your final answer.\n"
+            f"Please reverse the string: {self.gt_str}.\n"
+        )
+
+    def reset(self, seed=None):
+        super().reset(seed)
+        characters = string.ascii_letters + string.digits  # A-Z, a-z, 0-9
+        self.gt_str = "".join(random.choices(characters, k=self.str_len))
+        return self._get_instructions(), {}
+
+    def step(self, action):
+        clean_action = extract_last_boxed_answer(action)
+        if clean_action is None:
+            reward = 0
+        else:
+            reward = float(clean_action[::-1] == self.gt_str)
+        return TERMINAL_STATE, reward, True, True, {}
+
+
+# Register your environment
+register("custom:ReverseString", ReverseStringEnv)
+

Best Practices

  • Clear Instructions: Provide clear, unambiguous instructions in your observations
  • Consistent Rewards: Design a reward structure that encourages desired behavior
  • @@ -149,30 +132,20 @@

    Example Structure

  • Informative Output: Use the info dictionary to provide debugging information
  • Documentation: Document your environment’s behavior and expected usage
- -
- -
- - -
-
-

© 2025 Axon-RL. All rights reserved.

- -
-
- - - - - - - - - +
+
+ + + + + - - + \ No newline at end of file diff --git a/public/gem/advanced/index.xml b/public/gem/advanced/index.xml index 87ae121..438e5c3 100644 --- a/public/gem/advanced/index.xml +++ b/public/gem/advanced/index.xml @@ -1,11 +1 @@ - - - - 🧱 Advanced on Axon-RL - Wiring General Intelligence Through Reinforcement Learning - http://localhost:53236/gem/advanced/ - Recent content in 🧱 Advanced on Axon-RL - Wiring General Intelligence Through Reinforcement Learning - Hugo - en-us - - - +🧱 Advanced on Axon-RL - Wiring General Intelligence Through Reinforcement Learninghttps://axon-rl.github.io/gem/advanced/Recent content in 🧱 Advanced on Axon-RL - Wiring General Intelligence Through Reinforcement LearningHugo -- gohugo.ioen-us \ No newline at end of file diff --git a/public/gem/environments/index.html b/public/gem/environments/index.html index 25f05ad..46b7ffa 100644 --- a/public/gem/environments/index.html +++ b/public/gem/environments/index.html @@ -1,88 +1,71 @@ - - - - - - - Axon-RL - Wiring General Intelligence Through Reinforcement Learning - - - - - - + + + + +Axon-RL - Wiring General Intelligence Through Reinforcement Learning + + + + + - -
- -
- - -
- - -
-

🌍 Environments

- - - -
-

Overview

+
+ +
+
+ +
+

🌍 Environments

+
+

Overview

GEM supports a diverse range of environments and makes it easy to add your own. GEM provides four main categories of environments, each designed for different types of agent training and evaluation.

All GEM environments follow a consistent interface pattern:

    @@ -91,201 +74,191 @@

    Overview

  • env.sample_random_action() - Get a random valid action

This design closely follows the Gymnasium standard, making it easy to integrate with existing RL frameworks and tools.
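As a sketch of that contract, a minimal toy environment (illustrative only, not GEM code) needs little more than these three methods:

```python
import random

class GuessParityEnv:
    """Toy environment following the reset/step/sample_random_action contract."""

    def reset(self, seed=None):
        # Returns the initial observation string and an info dict.
        self._rng = random.Random(seed)
        self._n = self._rng.randint(1, 100)
        return "Is the hidden number even or odd?", {}

    def step(self, action: str):
        # Returns (observation, reward, terminated, truncated, info).
        reward = float((self._n % 2 == 0) == (action.strip() == "even"))
        return "Episode over.", reward, True, False, {}

    def sample_random_action(self) -> str:
        return self._rng.choice(["even", "odd"])

env = GuessParityEnv()
observation, info = env.reset(seed=0)
observation, reward, terminated, truncated, info = env.step(env.sample_random_action())
```

Any environment shaped like this slots into the same training loops that Gymnasium-style tooling expects.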

-

Games

+

Games

Interactive game environments including Sudoku, Minesweeper, Wordle, and more from the TextArena collection.

-

We maintain local versions of many of the TextArena games with (i) improved dense game reward design and (ii) compatible gym-style interface.

-

Available Game Environments

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +

We maintain local versions of many of the TextArena games with (i) improved dense game reward design and (ii) compatible gym-style interface.

+

Available Game Environments

+
EnvironmentDescription
game:GuessTheNumberThe agent has multiple guesses to guess the hidden number. The agent receives whether the hidden number is higher or lower than its guess.
game:MastermindThe agent has multiple guesses to guess the hidden code. The agent receives black and white pegs depending on the number of correct digits in the right and wrong places.
game:MinesweeperThe agent must reveal all safe grid squares without revealing a mine. For each revealed square the agent receives the number of adjacent squares that contain mines.
game:WordleThe agent must guess the hidden word. After each turn the agent receives feedback ("G"=correct letter + correct position, "Y"=correct letter + incorrect position, "X"=incorrect letter).
game:FifteenPuzzleArrange tiles on the board into ascending order using the empty space to slide tiles into different positions.
game:HangmanThe objective of the game is to guess the word by providing one letter guesses or the entire word.
game:SudokuClassic Sudoku Game. `easy` version renders a 4x4 board.
game:TowerofHanoia classic single-player puzzle game where the objective is to move a stack of disks from one tower to another following specific rules.
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
EnvironmentDescription
game:GuessTheNumberThe agent has multiple guesses to guess the hidden number. The agent receives whether the hidden number is higher or lower than its guess.
game:MastermindThe agent has multiple guesses to guess the hidden code. The agent receives black and white pegs depending on the number of correct digits in the right and wrong places.
game:MinesweeperThe agent must reveal all safe grid squares without revealing a mine. For each revealed square the agent receives the number of adjacent squares that contain mines.
game:WordleThe agent must guess the hidden word. After each turn the agent receives feedback ("G"=correct letter + correct position, "Y"=correct letter + incorrect position, "X"=incorrect letter).
game:FifteenPuzzleArrange tiles on the board into ascending order using the empty space to slide tiles into different positions.
game:HangmanThe objective of the game is to guess the word by providing one letter guesses or the entire word.
game:SudokuClassic Sudoku Game. `easy` version renders a 4x4 board.
game:TowerofHanoiA classic single-player puzzle game where the objective is to move a stack of disks from one tower to another following specific rules.
-

Difficulty Variants

+

Difficulty Variants

Each environment additionally has -easy, -hard, and -random variants, where -random denotes that the environment is set to a random level of difficulty at each reset.

-

Adding New Games

+

Adding New Games

Adding new games is easy. Simply include .step(), .reset() functions and register the environment with a new name.

-

Math

+

Math

Mathematical reasoning environments with automatic answer parsing and checking, compatible with various math datasets.

GEM’s math environment class includes automatic answer parsing and checking and is designed to be compatible with any math dataset. To add a new environment, simply register the dataset. A typical use case is combining these environments with access to the Python tool to train the agent to utilize code.
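As an illustration of the boxed-answer convention used throughout GEM's examples, a simplified parser and checker might look like this (GEM's built-in parsing is more robust; the function names here are illustrative):

```python
import re

def extract_last_boxed_answer(text: str):
    """Return the content of the last \\boxed{...} in text, or None.

    Simplified sketch: the real parser also handles nested braces,
    LaTeX spacing, and other edge cases.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None

def check_answer(response: str, ground_truth: str) -> float:
    # Reward 1.0 if the parsed answer matches the ground truth, else 0.0.
    parsed = extract_last_boxed_answer(response)
    return float(parsed is not None and parsed.strip() == ground_truth.strip())

reward = check_answer(r"The total is 12, so \boxed{12}.", "12")  # 1.0
```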

-

Available Math Environments

- - - - - - - - - - - - - - - - - - - - - - - - - +

Available Math Environments

+
EnvironmentDataset
math:ASDIV2kASDIV-2k
math:GSM8kGSM-8k
math:Math12kMATH-12k
math:ORZ57kORZ-57k
+ + + + + + + + + + + + + + + + + + + + + + + +
EnvironmentDataset
math:ASDIV2kASDIV-2k
math:GSM8kGSM-8k
math:Math12kMATH-12k
math:ORZ57kORZ-57k
-

Features

+

Features

  • Automatic Answer Parsing: Built-in parsing for mathematical expressions and numerical answers
  • Answer Checking: Automatic validation of agent responses against ground truth
  • Dataset Compatibility: Works with any math dataset that follows the standard format
  • Tool Integration: Designed to work seamlessly with Python tool for computational assistance
-

Code

+

Code

Code generation and evaluation environments that automatically test solutions in sandboxed environments.

GEM’s code environment class automatically evaluates success by running the test cases in a sandbox. This class can be used with any code dataset consisting of the task and test cases.
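A minimal sketch of subprocess-based test-case evaluation (a simplified stand-in with no real sandbox isolation; names are illustrative, not GEM's API):

```python
import subprocess
import sys

def run_tests(solution_code: str, test_code: str, timeout: float = 5.0) -> bool:
    """Run the candidate solution followed by its test assertions in a
    fresh interpreter; a non-zero exit code means a test failed."""
    program = solution_code + "\n\n" + test_code
    result = subprocess.run(
        [sys.executable, "-c", program],
        capture_output=True,
        timeout=timeout,
    )
    return result.returncode == 0

solution = "def add(a, b):\n    return a + b"
passed = run_tests(solution, "assert add(2, 3) == 5")  # True
```

GEM's bubblewrap option adds real filesystem and process isolation on top of this basic pattern.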

-

Available Code Environments

- - - - - - - - - - - - - - - - - +

Available Code Environments

+
EnvironmentDataset
code:CodeContestCodeContest
code:Taco8kTACO-8k
+ + + + + + + + + + + + + + + +
EnvironmentDataset
code:CodeContestCodeContest
code:Taco8kTACO-8k
-

Features

+

Features

  • Automatic Code Evaluation: Runs test cases in a secure sandbox environment
  • Test Case Validation: Compares agent-generated code against provided test cases
  • Sandbox Diversity: Two execution options are available.
      -
    • Sandboxed environment using bubblewrap
    • +
    • Sandboxed environment using bubblewrap
    • Implementation with Python’s subprocess code.
  • Dataset Diversity: Compatible with any code dataset that includes problems and test cases
-

Question-Answering

+

Question-Answering

QA environments designed for integrated search tool usage to train agents in information retrieval and reasoning.

GEM’s question-answering environments are designed to allow integrated search tool usage to train the agent to use search functionality. Additional question-answering environments can be added by simply registering the dataset.

-

Available QA Environments

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +

Available QA Environments

+
EnvironmentDataset
qa:NaturalQuestionsNaturalQuestions
qa:HotpotQAHotpotQA
logic:RuleTaker-d0RuleTaker-d0-70k
logic:RuleTaker-d1RuleTaker-d1-70k
logic:RuleTaker-d2RuleTaker-d2-70k
logic:RuleTaker-d3RuleTaker-d3-70k
logic:RuleTaker-d5RuleTaker-d5-70k
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
EnvironmentDataset
qa:NaturalQuestionsNaturalQuestions
qa:HotpotQAHotpotQA
logic:RuleTaker-d0RuleTaker-d0-70k
logic:RuleTaker-d1RuleTaker-d1-70k
logic:RuleTaker-d2RuleTaker-d2-70k
logic:RuleTaker-d3RuleTaker-d3-70k
logic:RuleTaker-d5RuleTaker-d5-70k
-

Environment Types

+

Environment Types

  • Natural Questions: Real-world questions that people ask search engines, requiring factual knowledge and reasoning
  • HotpotQA: Multi-hop reasoning questions that require gathering information from multiple sources
  • RuleTaker: Logical reasoning environments with varying complexity levels (d0 through d5), where agents must apply rules to derive conclusions
-

Reasoning Gym

-

We include all tasks in Reasoning Gym in our package, which could be simply used by calling make(rg:[sub_task_name]).

- -
- -
+

Reasoning Gym

+

&#13; We include all tasks from Reasoning Gym in our package; they can be used simply by calling make("rg:[sub_task_name]").

+
+
+
+ + + + + - - + \ No newline at end of file diff --git a/public/gem/environments/index.xml b/public/gem/environments/index.xml index a787a62..2d854f7 100644 --- a/public/gem/environments/index.xml +++ b/public/gem/environments/index.xml @@ -1,11 +1 @@ - - - - 🌍 Environments on Axon-RL - Wiring General Intelligence Through Reinforcement Learning - http://localhost:53236/gem/environments/ - Recent content in 🌍 Environments on Axon-RL - Wiring General Intelligence Through Reinforcement Learning - Hugo - en-us - - - +🌍 Environments on Axon-RL - Wiring General Intelligence Through Reinforcement Learninghttps://axon-rl.github.io/gem/environments/Recent content in 🌍 Environments on Axon-RL - Wiring General Intelligence Through Reinforcement LearningHugo -- gohugo.ioen-us \ No newline at end of file diff --git a/public/gem/features/index.html b/public/gem/features/index.html index f1ea3b2..9dbb16e 100644 --- a/public/gem/features/index.html +++ b/public/gem/features/index.html @@ -1,252 +1,235 @@ - - - - - - - Axon-RL - Wiring General Intelligence Through Reinforcement Learning - - - - - - + + + + +Axon-RL - Wiring General Intelligence Through Reinforcement Learning + + + + + - -
- -
- - -
- - -
-

✨ Features

- - - -
-

Wrappers

-

Following the Gym interface, GEM provides wrappers to easily add and change functionality. Wrappers are registered in the WRAPPER_FACTORY.

+
+ +
+
+ +
+

✨ Features

+
+

Wrappers

+

Following the Gym interface, GEM provides wrappers to easily add and change functionality. Wrappers are registered in the WRAPPER_FACTORY.

The main wrapper types are: observation wrappers, tool wrappers, and episode tracking wrappers.

-
- Note: Order is important! Wrappers should be added in the following order:
- tool env wrapper (optional) β†’ observation wrapper (optional) β†’ episode tracking wrapper (optional). +
+Note: Order is important! Wrappers should be added in the following order:
+tool env wrapper (optional) β†’ observation wrapper (optional) β†’ episode tracking wrapper (optional).
-

Observation Wrappers

+

Observation Wrappers

Observation wrappers are used to convert the sequence of game states and agent actions into a string which is used as the prompt for the LLM agent at the next step.

-

Observation Wrapper Examples

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +

Observation Wrapper Examples

+
Wrapper nameDescriptionExample (Mastermind)
no wrapperThe observation string from the environment."At turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s)."
concatThe sequence of environment observation strings from all previous steps concatenated together."You are playing Mastermind. [instructions]... Enter your first guess to start the game.\nAt turn 1, you guessed 123. This guess receives 1 black peg(s) and 1 white peg(s).\nAt turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s)."
concat_with_actionThe sequence of [environment observation string, agent action, environment observation string, etc.] from all previous steps concatenated together."You are playing Mastermind. [instructions]... Enter your first guess to start the game.\nOkay, I will guess a random 3 digit number for now. My first guess will be \\boxed{123}.\nAt turn 1, you guessed 123. This guess receives 1 black peg(s) and 1 white peg(s).\nOkay, let's think. One digit is in the correct place. Perhaps this is 3. One digit is completely incorrect. Let's try switching 1 for 4 and moving the 2. My next guess will be \\boxed{243}.\nAt turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s)."
concat_chat (default)The sequence of [environment observation string, agent action, environment observation string, etc.] from all previous steps concatenated together with the chat template applied to denote "user" (environment) vs "assistant" (agent) turns."<|im_start|>user\nYou are playing Mastermind. [instructions]... Enter your first guess to start the game.<|im_end|>\n<|im_start|>assistant\nOkay, I will guess a random 3 digit number for now. My first guess will be \\boxed{123}<|im_end|> <|im_start|>user\nAt turn 1, you guessed 123. This guess receives 1 black peg(s) and 1 white peg(s).<|im_end|>\n<|im_start|>assistant\nOkay, let's think. One digit is in the correct place. Perhaps this is 3. One digit is completely incorrect. Let's try switching 1 for 4 and moving the 2. My next guess will be \\boxed{243}.<|im_end|>\n<|im_start|>user\nAt turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s).<|im_end|>\n<|im_start|>assistant"
concat_chat_on_resetSame as concat_with_action but the chat template tag is applied at the start."<|im_start|>user\nYou are playing Mastermind. [instructions]... Enter your first guess to start the game.\nOkay, I will guess a random 3 digit number for now. My first guess will be \\boxed{123}.\nAt turn 1, you guessed 123. This guess receives 1 black peg(s) and 1 white peg(s).\nOkay, let's think. One digit is in the correct place. Perhaps this is 3. One digit is completely incorrect. Let's try switching 1 for 4 and moving the 2. My next guess will be \\boxed{243}.\nAt turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s)."
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Wrapper nameDescriptionExample (Mastermind)
no wrapperThe observation string from the environment."At turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s)."
concatThe sequence of environment observation strings from all previous steps concatenated together."You are playing Mastermind. [instructions]... Enter your first guess to start the game.\nAt turn 1, you guessed 123. This guess receives 1 black peg(s) and 1 white peg(s).\nAt turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s)."
concat_with_actionThe sequence of [environment observation string, agent action, environment observation string, etc.] from all previous steps concatenated together."You are playing Mastermind. [instructions]... Enter your first guess to start the game.\nOkay, I will guess a random 3 digit number for now. My first guess will be \\boxed{123}.\nAt turn 1, you guessed 123. This guess receives 1 black peg(s) and 1 white peg(s).\nOkay, let's think. One digit is in the correct place. Perhaps this is 3. One digit is completely incorrect. Let's try switching 1 for 4 and moving the 2. My next guess will be \\boxed{243}.\nAt turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s)."
concat_chat (default)The sequence of [environment observation string, agent action, environment observation string, etc.] from all previous steps concatenated together with the chat template applied to denote "user" (environment) vs "assistant" (agent) turns."<|im_start|>user\nYou are playing Mastermind. [instructions]... Enter your first guess to start the game.<|im_end|>\n<|im_start|>assistant\nOkay, I will guess a random 3 digit number for now. My first guess will be \\boxed{123}<|im_end|> <|im_start|>user\nAt turn 1, you guessed 123. This guess receives 1 black peg(s) and 1 white peg(s).<|im_end|>\n<|im_start|>assistant\nOkay, let's think. One digit is in the correct place. Perhaps this is 3. One digit is completely incorrect. Let's try switching 1 for 4 and moving the 2. My next guess will be \\boxed{243}.<|im_end|>\n<|im_start|>user\nAt turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s).<|im_end|>\n<|im_start|>assistant"
concat_chat_on_resetSame as concat_with_action but the chat template tag is applied at the start."<|im_start|>user\nYou are playing Mastermind. [instructions]... Enter your first guess to start the game.\nOkay, I will guess a random 3 digit number for now. My first guess will be \\boxed{123}.\nAt turn 1, you guessed 123. This guess receives 1 black peg(s) and 1 white peg(s).\nOkay, let's think. One digit is in the correct place. Perhaps this is 3. One digit is completely incorrect. Let's try switching 1 for 4 and moving the 2. My next guess will be \\boxed{243}.\nAt turn 2, you guessed 243. This guess receives 1 black peg(s) and 2 white peg(s)."
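The concat_with_action behavior in the table above can be sketched with a toy wrapper (illustrative structure only; GEM's implementation also handles chat templates and the other wrapper variants):

```python
class ConcatWithActionWrapper:
    """Interleave prior observations and actions into one prompt string."""

    def __init__(self, env):
        self.env = env
        self._history = []

    def reset(self, seed=None):
        obs, info = self.env.reset(seed)
        self._history = [obs]
        return obs, info

    def step(self, action):
        self._history.append(action)
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._history.append(obs)
        # The prompt for the next turn is the full interleaved transcript.
        prompt = "\n".join(self._history)
        return prompt, reward, terminated, truncated, info

class EchoEnv:
    """Trivial inner environment used only to demonstrate the wrapper."""
    def __init__(self):
        self.turn = 0
    def reset(self, seed=None):
        self.turn = 0
        return "Guess a number.", {}
    def step(self, action):
        self.turn += 1
        return f"Turn {self.turn} done.", 0.0, False, False, {}

env = ConcatWithActionWrapper(EchoEnv())
obs, _ = env.reset()
prompt, *_ = env.step("My guess is 5.")
```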
-

Tool Env Wrapper

+

Tool Env Wrapper

GEM supports integrating multiple tools to the same agent. Tools are handled by the tool wrapper.

The input to env.step() is “action”, a string which is typically the response from the LLM. With the tool env wrapper, when env.step(action) is called, the tool env wrapper iterates through each tool and attempts to parse and execute the action. If any tool is executed successfully, the observation from that tool is returned. If no tool is executed successfully, the action is passed to the wrapped environment.

-
-
-

gem.tools.tool_env_wrapper.ToolEnvWrapper

-
-
-
-

Attributes

-
    -
  • - env -
    The wrapped environment.
    -
  • -
  • - tools (List[BaseTool]) -
    A list of tools.
    -
  • -
  • - tool_reward (float = 0.05) -
    Reward if a tool is called.
    -
  • -
  • - tool_success_reward (float = 0.05) -
    Additional reward if the tool call is executed without errors.
    -
  • -
  • - max_tool_uses (int = 10) -
    Maximum number of tool uses allowed.
    -
  • -
-
-
.reset()
-
-

Returns

-
    -
  • - obs (str) -
    The ToolEnvWrapper.env.reset() output (ie. the environment question), with a list of the available tools and instructions concatenated onto the end.
    -
  • -
  • - info (dict) -
    Extra info about the episode state.
    -
  • -
-
-
.step(action: str)
-
-

Parameters

-
    -
  • - action (str) -
    The response from the LLM agent.
    -
  • -
-
-
-

Returns

-
    -
  • - observation (str) -
    The output of the tool call if a tool call is found, otherwise the observation from ToolEnvWrapper.env.step().
    -
  • -
  • - reward (float) -
    tool_reward if a tool call is found (+ tool_success_reward if the tool call is executed without errors), otherwise the reward from ToolEnvWrapper.env.step()
    -
  • -
  • - terminated (bool) -
    Whether the episode is terminated.
    -
  • -
  • - truncated (bool) -
    Whether the episode is truncated.
    -
  • -
  • - info (dict) -
    Extra info about the episode state.
    -
  • -
-
-
+
+
+

gem.tools.tool_env_wrapper.ToolEnvWrapper

-

Episode Tracking Wrapper

+
+
+

Attributes

+
    +
  • +env +
    The wrapped environment.
    +
  • +
  • +tools (List[BaseTool]) +
    A list of tools.
    +
  • +
  • +tool_reward (float = 0.05) +
    Reward if a tool is called.
    +
  • +
  • +tool_success_reward (float = 0.05) +
    Additional reward if the tool call is executed without errors.
    +
  • +
  • +max_tool_uses (int = 10) +
    Maximum number of tool uses allowed.
    +
  • +
+
+
.reset()
+
+

Returns

+
    +
  • +obs (str) +
    The ToolEnvWrapper.env.reset() output (ie. the environment question), with a list of the available tools and instructions concatenated onto the end.
    +
  • +
  • +info (dict) +
    Extra info about the episode state.
    +
  • +
+
+
.step(action: str)
+
+

Parameters

+
    +
  • +action (str) +
    The response from the LLM agent.
    +
  • +
+
+
+

Returns

+
    +
  • +observation (str) +
    The output of the tool call if a tool call is found, otherwise the observation from ToolEnvWrapper.env.step().
    +
  • +
  • +reward (float) +
    tool_reward if a tool call is found (+ tool_success_reward if the tool call is executed without errors), otherwise the reward from ToolEnvWrapper.env.step()
    +
  • +
  • +terminated (bool) +
    Whether the episode is terminated.
    +
  • +
  • +truncated (bool) +
    Whether the episode is truncated.
    +
  • +
  • +info (dict) +
    Extra info about the episode state.
    +
  • +
+
+
+
+

Episode Tracking Wrapper

The tracking wrapper logs statistics over the episode, including cumulative_rewards etc. It is not required but can be useful for debugging.
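A minimal sketch of such a tracking wrapper (illustrative; not GEM's implementation):

```python
class EpisodeTrackingWrapper:
    """Accumulate per-episode statistics for logging."""

    def __init__(self, env):
        self.env = env
        self.cumulative_reward = 0.0
        self.num_steps = 0

    def reset(self, seed=None):
        self.cumulative_reward = 0.0
        self.num_steps = 0
        return self.env.reset(seed)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.cumulative_reward += reward
        self.num_steps += 1
        # Surface the running statistics through the info dict.
        info = {**info,
                "cumulative_reward": self.cumulative_reward,
                "num_steps": self.num_steps}
        return obs, reward, terminated, truncated, info

class ConstRewardEnv:
    """Trivial environment that pays reward 1.0 per step."""
    def reset(self, seed=None):
        return "start", {}
    def step(self, action):
        return "next", 1.0, False, False, {}

env = EpisodeTrackingWrapper(ConstRewardEnv())
env.reset()
_, _, _, _, info = env.step("a")
_, _, _, _, info = env.step("b")
```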

-

Vectorization

+

Vectorization

GEM supports collecting multiple episodes in parallel, including asynchronously stepping each of the environments (which may include tool calls, etc.). VectorEnv environments auto-reset: when an episode in one of the parallel environments ends, that environment automatically resets and begins the next episode.

-
- Performance tip: Use vectorization for better throughput when training agents on multiple episodes simultaneously. +
+Performance tip: Use vectorization for better throughput when training agents on multiple episodes simultaneously.
-

Benefits

+

Benefits

  • Improved Throughput: Run multiple environments simultaneously for faster data collection
  • Automatic Reset: Environments automatically reset when episodes end, ensuring continuous operation
  • Asynchronous Execution: Each environment can step independently, maximizing efficiency
  • Tool Support: Vectorized environments fully support tool usage across all parallel instances
-

Usage

+

Usage

Use make_vec() instead of make() when creating environments:

-
import gem
-
-# Create vectorized environment with 8 parallel instances
-vec_env = gem.make_vec("game:GuessTheNumber-v0", num_envs=8)
-
-# Reset all environments
-observations, infos = vec_env.reset()
-
-# Step all environments
-actions = [env.sample_random_action() for _ in range(8)]
-observations, rewards, terminated, truncated, infos = vec_env.step(actions)
-

Key Features

+
import gem
+
+# Create vectorized environment with 8 parallel instances
+vec_env = gem.make_vec("game:GuessTheNumber-v0", num_envs=8)
+
+# Reset all environments
+observations, infos = vec_env.reset()
+
+# Step all environments
+actions = [env.sample_random_action() for _ in range(8)]
+observations, rewards, terminated, truncated, infos = vec_env.step(actions)
+

Key Features

  • Automatic Management: No need to manually handle environment resets
  • Scalable: Easily adjust the number of parallel environments based on your computational resources
  • Compatible: Works with all GEM environments, tools, and wrappers
  • Efficient: Optimized for minimal overhead in parallel execution
-

Use Cases

+

Use Cases

Vectorization is particularly useful for:

  • Training reinforcement learning agents
  • @@ -254,30 +237,20 @@

    Use Cases

  • Running evaluation experiments across multiple episodes
  • Testing agent performance with statistical significance
- -
- -
- - -
-
-

© 2025 Axon-RL. All rights reserved.

- -
-
- - - - - - - - - +
+
+ + + + + - - + \ No newline at end of file diff --git a/public/gem/features/index.xml b/public/gem/features/index.xml index 8db6229..572aa4f 100644 --- a/public/gem/features/index.xml +++ b/public/gem/features/index.xml @@ -1,11 +1 @@ - - - - ✨ Features on Axon-RL - Wiring General Intelligence Through Reinforcement Learning - http://localhost:53236/gem/features/ - Recent content in ✨ Features on Axon-RL - Wiring General Intelligence Through Reinforcement Learning - Hugo - en-us - - - +✨ Features on Axon-RL - Wiring General Intelligence Through Reinforcement Learninghttps://axon-rl.github.io/gem/features/Recent content in ✨ Features on Axon-RL - Wiring General Intelligence Through Reinforcement LearningHugo -- gohugo.ioen-us \ No newline at end of file diff --git a/public/gem/index.html b/public/gem/index.html index 2af3898..e01a1fe 100644 --- a/public/gem/index.html +++ b/public/gem/index.html @@ -1,151 +1,124 @@ - - - - - - - Axon-RL - Wiring General Intelligence Through Reinforcement Learning - - - - - - + + + + +Axon-RL - Wiring General Intelligence Through Reinforcement Learning + + + + + - -
- -
- - -
- - -
-

πŸš€ Getting Started

- - - -
-

Overview

+
+ +
+
+ +
+

πŸš€ Getting Started

+
+

Overview

GEM is a diverse collection of environments for training LLM agents in the era of experience. The library includes Math, Code, general reasoning, and question-answering environments, as well as a suite of games (Mastermind, Minesweeper, Hangman, etc.). GEM also features fully integrated Python and search tool use.

-
- New to GEM? Start with our Quick Start guide below to get started and running in minutes. +
+&#13;
+New to GEM? Start with our Quick Start guide below to get up and running in minutes.
-

Installation

-
pip install gem-llm
-

Quick Start

-

Here’s a simple example to get you started. The interface closely follows Gym and other popular RL environment suites.

-

Environments can be initialized with make() (or make_vec() for parallelization) and each environment hasΒ Env.reset(),Β Env.step()Β andΒ Env.sample_random_action() functions.

-
import gem
-
-# Initialize the environment
-env = make("game:GuessTheNumber-v0")
-
-# Reset the environment to generate the first observation
-observation, info = env.reset()
-for _ in range(30):
-    action = env.sample_random_action() # insert policy here
-
-    # apply action and receive next observation, reward
-    # and whether the episode has ended
-    observation, reward, terminated, truncated, info = env.step(action)
-
-    # If the episode has ended then reset to start a new episode
-    if terminated or truncated:
-        observation, info = env.reset()
-
- Please see further documentation for details of vectorized environments, automated resetting, different observation/chat templates, and integrated tools. +

Installation

+
pip install gem-llm
+

Quick Start

+

Here’s a simple example to get you started. The interface closely follows Gym and other popular RL environment suites.

+

&#xA;Environments can be initialized with make() (or make_vec() for parallelization) and each environment has Env.reset(), Env.step() and Env.sample_random_action() functions.&#xA;

+
import gem
+
+# Initialize the environment
+env = gem.make("game:GuessTheNumber-v0")&#xA;
+
+# Reset the environment to generate the first observation
+observation, info = env.reset()
+for _ in range(30):
+    action = env.sample_random_action() # insert policy here
+
+    # apply action and receive next observation, reward
+    # and whether the episode has ended
+    observation, reward, terminated, truncated, info = env.step(action)
+
+    # If the episode has ended then reset to start a new episode
+    if terminated or truncated:
+        observation, info = env.reset()
+
+Please see further documentation for details of vectorized environments, automated resetting, different observation/chat templates, and integrated tools.
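The snippet above requires the gem package. As an illustration only, the same Gym-style contract can be mocked with a tiny hand-written environment — a hypothetical stand-in, not GEM's actual GuessTheNumber implementation — that supports the identical loop:

```python
import random

# Hypothetical stand-in illustrating the Gym-style interface GEM
# environments follow (reset/step/sample_random_action). Not GEM's
# actual implementation.
class GuessTheNumberEnv:
    def __init__(self, low=1, high=100, max_turns=30):
        self.low, self.high, self.max_turns = low, high, max_turns

    def reset(self):
        self.target = random.randint(self.low, self.high)
        self.turns = 0
        return f"Guess a number between {self.low} and {self.high}.", {}

    def sample_random_action(self):
        return str(random.randint(self.low, self.high))

    def step(self, action):
        self.turns += 1
        guess = int(action)
        terminated = guess == self.target          # episode ends on a correct guess
        truncated = self.turns >= self.max_turns   # or when the turn budget runs out
        reward = 1.0 if terminated else 0.0
        hint = "correct!" if terminated else ("higher" if guess < self.target else "lower")
        return hint, reward, terminated, truncated, {}

env = GuessTheNumberEnv()
observation, info = env.reset()
for _ in range(30):
    action = env.sample_random_action()  # insert policy here
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()
```

Because the interface is the only contract, a policy written against this sketch transfers directly to real GEM environments.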
-

Training Agents

+

Training Agents

GEM includes single-file examples for training an LLM agent through the oat or verl framework.&#xA;

-

The OAT framework provides a comprehensive solution for training language model agents in reinforcement learning environments.

-
- train with verl +

The OAT framework provides a comprehensive solution for training language model agents in reinforcement learning environments.

+ -

The VERL framework offers another approach to training agents with different optimization strategies and capabilities.

- -
- -
+

The VERL framework offers another approach to training agents with different optimization strategies and capabilities.

- - -
-
-

© 2025 Axon-RL. All rights reserved.

- -
-
- - - - - - - - - +
+
+ + + + + - - + \ No newline at end of file diff --git a/public/gem/index.xml b/public/gem/index.xml index 8aac96a..5794dfb 100644 --- a/public/gem/index.xml +++ b/public/gem/index.xml @@ -1,11 +1 @@ - - - - πŸš€ Getting Started on Axon-RL - Wiring General Intelligence Through Reinforcement Learning - http://localhost:53236/gem/ - Recent content in πŸš€ Getting Started on Axon-RL - Wiring General Intelligence Through Reinforcement Learning - Hugo - en-us - - - +πŸš€ Getting Started on Axon-RL - Wiring General Intelligence Through Reinforcement Learninghttps://axon-rl.github.io/gem/Recent content in πŸš€ Getting Started on Axon-RL - Wiring General Intelligence Through Reinforcement LearningHugo -- gohugo.ioen-us \ No newline at end of file diff --git a/public/gem/overview/index.html b/public/gem/overview/index.html index 5b98e1a..1dcbf4c 100644 --- a/public/gem/overview/index.html +++ b/public/gem/overview/index.html @@ -1,10 +1 @@ - - - - http://localhost:53236/gem/ - - - - - - +https://axon-rl.github.io/gem/ \ No newline at end of file diff --git a/public/gem/tools/index.html b/public/gem/tools/index.html index 4ee03a2..d620a0b 100644 --- a/public/gem/tools/index.html +++ b/public/gem/tools/index.html @@ -1,236 +1,209 @@ - - - - - - - Axon-RL - Wiring General Intelligence Through Reinforcement Learning - - - - - - + + + + +Axon-RL - Wiring General Intelligence Through Reinforcement Learning + + + + + - -
- -
- - -
- - -
-

πŸ› οΈ Tools

- - - -
-

Overview

+
+ +
+
+ +
+

πŸ› οΈ Tools

+
+

Overview

GEM provides a comprehensive set of tools to enhance agent capabilities and enable more sophisticated problem-solving approaches; it currently supports Python and search tools.&#xA;

-

Python Tool

+

Python Tool

Allows agents to write and execute Python code, enabling computational problem-solving and data manipulation capabilities.

GEM’s Python code tool allows the agent to learn to write code. The tool parses code blocks, runs them, and returns the result.&#xA;
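As a rough illustration of this parse-run-return cycle, the following self-contained sketch mimics the documented execute_action semantics (is_valid, has_error, observation, parsed_action). It is a hypothetical reconstruction, not GEM's actual PythonCodeTool:

```python
import contextlib
import io
import re

# Match the first fenced code block in the agent's response.
CODE_BLOCK = re.compile(r"```(?:python)?\n(.*?)```", re.DOTALL)

def execute_action(action: str):
    """Hypothetical sketch: run the first code block found in `action`."""
    match = CODE_BLOCK.search(action)
    if match is None:
        # No valid code block: keep the original action unchanged.
        return False, False, "", action
    code = match.group(1)
    parsed_action = action[: match.end()]  # truncate at end of first block
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {})
        return True, False, buf.getvalue(), parsed_action
    except Exception as e:
        return True, True, f"{type(e).__name__}: {e}", parsed_action
```

For example, an action containing ```` ```python\nprint(2 + 2)\n``` ```` would yield `(True, False, "4\n", ...)`, while an action with no code block is passed through untouched.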

-

API Reference

-
-
-

gem.tools.python_code_tool.PythonCodeTool

-
-
-
.execute_action(action: str)
-
- Parses the action to find the first complete code block. If a valid code block is found the code is run and the output is returned. -
-
-

Parameters

-
    -
  • - action (str) -
    The response from the LLM agent.
    -
  • -
-
-
-

Returns

-
    -
  • - is_valid (bool) -
    Whether a valid code block is found.
    -
  • -
  • - has_error (bool) -
    Whether the code gave an error.
    -
  • -
  • - observation (str) -
    The output of running the code if a valid code block is found, otherwise an empty string.
    -
  • -
  • - parsed_action (str) -
    The action truncated at the end of the first valid code block. If no code block is found then parsed_action is set to the input action.
    -
  • -
-
-
.instruction_string()
-
- A string for adding to the prompt to instruct the agent that the python code tool is available. -
-
-

Returns

-
    -
  • - str -
    Instruction string for the agent
    -
  • -
-
-
-
-

Search Tool

+

API Reference

+
+
+

gem.tools.python_code_tool.PythonCodeTool

+
+
+
.execute_action(action: str)
+
+Parses the action to find the first complete code block. If a valid code block is found the code is run and the output is returned. +
+
+

Parameters

+
    +
  • +action (str) +
    The response from the LLM agent.
    +
  • +
+
+
+

Returns

+
    +
  • +is_valid (bool) +
    Whether a valid code block is found.
    +
  • +
  • +has_error (bool) +
    Whether the code gave an error.
    +
  • +
  • +observation (str) +
    The output of running the code if a valid code block is found, otherwise an empty string.
    +
  • +
  • +parsed_action (str) +
    The action truncated at the end of the first valid code block. If no code block is found then parsed_action is set to the input action.
    +
  • +
+
+
.instruction_string()
+
+A string for adding to the prompt to instruct the agent that the python code tool is available. +
+
+

Returns

+
    +
  • +str +
    Instruction string for the agent
    +
  • +
+
+
+
+

Search Tool

GEM includes a search tool, enabling the agent to learn to call search engines for information retrieval and knowledge enhancement.

-

API Reference

-
-
-

gem.tools.search_tool.SearchTool

-
-
-
.execute_action(action: str)
-
- Parses the action to find the first complete extract the <search> content. Returns the result of the search if a valid search call is found. -
-
-

Parameters

-
    -
  • - action (str) -
    The response from the LLM agent.
    -
  • -
-
-
-

Returns

-
    -
  • - is_valid (bool) -
    Whether a valid <search></search> call is found.
    -
  • -
  • - has_error (bool) -
    Whether the search engine gave an error.
    -
  • -
  • - observation (str) -
    The output of running the search if a valid search call is found, otherwise an empty string.
    -
  • -
  • - parsed_action (str) -
    The action truncated at the end of the first valid search call. If no search call is found then parsed_action is set to the input action.
    -
  • -
-
-
.instruction_string()
-
- A string for adding to the prompt to instruct the agent that the search tool is available. -
-
-

Returns

-
    -
  • - str -
    Instruction string for the agent
    -
  • -
-
-
-
-

Usage

-

Agents can use the search tool by including search queries in their responses using the <search></search> tags. The tool will:

+

API Reference

+
+
+

gem.tools.search_tool.SearchTool

+
+
+
.execute_action(action: str)
+
+Parses the action to extract the content of the first complete <search> block. Returns the result of the search if a valid search call is found. -&#xA;
+
+

Parameters

+
    +
  • +action (str) +
    The response from the LLM agent.
    +
  • +
+
+
+

Returns

+
    +
  • +is_valid (bool) +
    Whether a valid <search></search> call is found.
    +
  • +
  • +has_error (bool) +
    Whether the search engine gave an error.
    +
  • +
  • +observation (str) +
    The output of running the search if a valid search call is found, otherwise an empty string.
    +
  • +
  • +parsed_action (str) +
    The action truncated at the end of the first valid search call. If no search call is found then parsed_action is set to the input action.
    +
  • +
+
+
.instruction_string()
+
+A string for adding to the prompt to instruct the agent that the search tool is available. +
+
+

Returns

+
    +
  • +str +
    Instruction string for the agent
    +
  • +
+
+
+
+

Usage

+

Agents can use the search tool by including search queries in their responses using the <search></search> tags. The tool will:

  1. Parse the search query from the agent’s response
  2. Execute the search using the configured search engine
  3. Return the search results to the agent
  4. Allow the agent to use this information for better responses
- -
- -
-
- - -
-
-

© 2025 Axon-RL. All rights reserved.

- -
-
- - - - - - - - - +
+
+
+ + + + + - - + \ No newline at end of file diff --git a/public/gem/tools/index.xml b/public/gem/tools/index.xml index 3de0d41..141ecc0 100644 --- a/public/gem/tools/index.xml +++ b/public/gem/tools/index.xml @@ -1,11 +1 @@ - - - - πŸ› οΈ Tools on Axon-RL - Wiring General Intelligence Through Reinforcement Learning - http://localhost:53236/gem/tools/ - Recent content in πŸ› οΈ Tools on Axon-RL - Wiring General Intelligence Through Reinforcement Learning - Hugo - en-us - - - +πŸ› οΈ Tools on Axon-RL - Wiring General Intelligence Through Reinforcement Learninghttps://axon-rl.github.io/gem/tools/Recent content in πŸ› οΈ Tools on Axon-RL - Wiring General Intelligence Through Reinforcement LearningHugo -- gohugo.ioen-us \ No newline at end of file diff --git a/public/index.html b/public/index.html index a6439eb..916f225 100644 --- a/public/index.html +++ b/public/index.html @@ -1,122 +1,97 @@ - - - + - - - - Axon-RL - Wiring General Intelligence Through Reinforcement Learning - - - - - - + + + +Axon-RL - Wiring General Intelligence Through Reinforcement Learning + + + + + - -
- -
- - -
- -
- -

Wiring general intelligence through reinforcement learning

- -
- - -
-

Projects

-
-

πŸ’Ž GEM

- -

General Gym - A comprehensive framework for reinforcement learning environments that - provides a unified interface for various RL tasks.

- Learn more -
-
- - -
-

Blogs

- -
- - -
-

Research Group

-
-

About Our Team

- -

We're building a team of passionate researchers and developers dedicated to advancing reinforcement - learning. More information about our team members will be available soon.

-
-
+
+ +
+
+
+ +

Wiring general intelligence through reinforcement learning

+ +
+
+

Projects

+
+

πŸ’Ž GEM

+ +

General Gym - A comprehensive framework for reinforcement learning environments that +provides a unified interface for various RL tasks.

+Learn more +
+
+
+

Blogs

+ +
+
+

Research Group

+
+

About Our Team

+ +

We're building a team of passionate researchers and developers dedicated to advancing reinforcement +learning. More information about our team members will be available soon.

+
+
+
+ + + + + - - + \ No newline at end of file diff --git a/public/index.xml b/public/index.xml index 4b14142..d9f58a3 100644 --- a/public/index.xml +++ b/public/index.xml @@ -1,11 +1 @@ - - - - Axon-RL - Wiring General Intelligence Through Reinforcement Learning - http://localhost:53236/ - Recent content on Axon-RL - Wiring General Intelligence Through Reinforcement Learning - Hugo - en-us - - - +Axon-RL - Wiring General Intelligence Through Reinforcement Learninghttps://axon-rl.github.io/Recent content on Axon-RL - Wiring General Intelligence Through Reinforcement LearningHugo -- gohugo.ioen-us \ No newline at end of file diff --git a/public/js/code-copy.min.js b/public/js/code-copy.min.js index b061f56..2361715 100644 --- a/public/js/code-copy.min.js +++ b/public/js/code-copy.min.js @@ -1,4 +1,4 @@ -document.addEventListener("DOMContentLoaded",function(){function e(){const e=document.querySelectorAll("pre:has(code), .highlight");e.forEach(function(e){if(e.querySelector(".copy-button"))return;const t=document.createElement("button");t.className="copy-button",t.innerHTML=` +document.addEventListener('DOMContentLoaded',function(){function a(){const a=document.querySelectorAll('pre:has(code), .highlight');a.forEach(function(b){if(b.querySelector('.copy-button'))return;const a=document.createElement('button');a.className='copy-button',a.innerHTML=` @@ -7,4 +7,4 @@ document.addEventListener("DOMContentLoaded",function(){function e(){const e=doc Copy - `,t.setAttribute("aria-label","Copy code to clipboard"),t.setAttribute("title","Copy code"),e.style.position="relative",t.addEventListener("click",function(){o(e,t)}),e.appendChild(t)})}function o(e,s){let o=e.querySelector("code");o||(o=e);const i=o.textContent||o.innerText;navigator.clipboard&&window.isSecureContext?navigator.clipboard.writeText(i).then(function(){n(s)}).catch(function(e){console.error("Failed to copy code: ",e),t(i,s)}):t(i,s)}function t(e,t){const 
o=document.createElement("textarea");o.value=e,o.style.position="fixed",o.style.left="-999999px",o.style.top="-999999px",document.body.appendChild(o),o.focus(),o.select();try{const e=document.execCommand("copy");e?n(t):s(t)}catch(e){console.error("Fallback: Oops, unable to copy",e),s(t)}document.body.removeChild(o)}function n(e){const t=e.querySelector(".copy-icon"),n=e.querySelector(".check-icon"),s=e.querySelector(".copy-text");t.style.display="none",n.style.display="block",s.textContent="Copied!",e.classList.add("copied"),setTimeout(function(){t.style.display="block",n.style.display="none",s.textContent="Copy",e.classList.remove("copied")},2e3)}function s(e){const t=e.querySelector(".copy-text");t.textContent="Failed",e.classList.add("error"),setTimeout(function(){t.textContent="Copy",e.classList.remove("error")},2e3)}e();const i=new MutationObserver(function(t){t.forEach(function(t){t.type==="childList"&&e()})});i.observe(document.body,{childList:!0,subtree:!0})}) \ No newline at end of file + `,a.setAttribute('aria-label','Copy code to clipboard'),a.setAttribute('title','Copy code'),b.style.position='relative',a.addEventListener('click',function(){e(b,a)}),b.appendChild(a)})}function e(f,d){let a=f.querySelector('code');a||(a=f);const e=a.textContent||a.innerText;navigator.clipboard&&window.isSecureContext?navigator.clipboard.writeText(e).then(function(){c(d)}).catch(function(a){console.error('Failed to copy code: ',a),b(e,d)}):b(e,d)}function b(e,b){const a=document.createElement('textarea');a.value=e,a.style.position='fixed',a.style.left='-999999px',a.style.top='-999999px',document.body.appendChild(a),a.focus(),a.select();try{const a=document.execCommand('copy');a?c(b):d(b)}catch(a){console.error('Fallback: Oops, unable to copy',a),d(b)}document.body.removeChild(a)}function c(a){const 
b=a.querySelector('.copy-icon'),c=a.querySelector('.check-icon'),d=a.querySelector('.copy-text');b.style.display='none',c.style.display='block',d.textContent='Copied!',a.classList.add('copied'),setTimeout(function(){b.style.display='block',c.style.display='none',d.textContent='Copy',a.classList.remove('copied')},2e3)}function d(a){const b=a.querySelector('.copy-text');b.textContent='Failed',a.classList.add('error'),setTimeout(function(){b.textContent='Copy',a.classList.remove('error')},2e3)}a();const f=new MutationObserver(function(b){b.forEach(function(b){b.type==='childList'&&a()})});f.observe(document.body,{childList:!0,subtree:!0})}) \ No newline at end of file diff --git a/public/js/smooth-scroll.min.js b/public/js/smooth-scroll.min.js index 3e364aa..c1fb4c3 100644 --- a/public/js/smooth-scroll.min.js +++ b/public/js/smooth-scroll.min.js @@ -1 +1 @@ -document.addEventListener("DOMContentLoaded",function(){function s(e,t=300){const s=document.querySelector(e);if(!s)return console.warn(`Target element not found: ${e}`),!1;const o=window.pageYOffset,a=s.getBoundingClientRect().top+window.pageYOffset,r=a-120,c=r-o;let n=null;function i(e){n===null&&(n=e);const s=e-n,a=l(s,o,c,t);window.scrollTo(0,a),s{t=null},0),document.querySelectorAll(".gem-sidebar a").forEach(e=>{e.classList.remove("active")}),n.classList.add("active"),e(n)}function e(e){const t=document.querySelector(".gem-sidebar");if(!t||!e)return;const n=e.offsetTop,s=e.offsetHeight,o=t.clientHeight,i=n-o/2+s/2;t.scrollTo({top:i,behavior:"smooth"})}document.querySelectorAll('a[href*="#"]').forEach(e=>{e.addEventListener("click",i)}),window.addEventListener("popstate",function(){if(window.location.hash){const o=window.location.hash.substring(1),t=document.querySelector(`.gem-sidebar a[href$="#${o}"]`);t&&(document.querySelectorAll(".gem-sidebar a").forEach(e=>{e.classList.remove("active")}),t.classList.add("active"),n=o,e(t)),s(window.location.hash)}});const a={root:null,rootMargin:"-120px 0px -60% 
0px",threshold:[0,.1,.5,1]};let n=null,t=null;const r=new IntersectionObserver(s=>{if(t)return;const o=s.filter(e=>e.isIntersecting).sort((e,t)=>t.intersectionRatio-e.intersectionRatio);if(o.length>0){const s=o[0],t=s.target.getAttribute("id");if(t&&t!==n){n=t;const s=document.querySelector(`.gem-sidebar a[href$="#${t}"]`);s&&(document.querySelectorAll(".gem-sidebar a").forEach(e=>{e.classList.remove("active")}),s.classList.add("active"),e(s),window.history&&window.history.replaceState&&window.history.replaceState(null,null,`#${t}`))}}},a);document.querySelectorAll("h2[id], h3[id], section[id], .expandable-section[id]").forEach(e=>{r.observe(e)});function c(){const o=window.location.pathname,i=window.location.hash;if(i){const o=i.substring(1),t=document.querySelector(`.gem-sidebar a[href$="#${o}"]`);if(t){t.classList.add("active"),n=o,e(t),s(i);return}}let t=null;if(t=document.querySelector(`.gem-sidebar a[href="${o}"]`),!t){const e=o.replace(/\/$/,"");t=document.querySelector(`.gem-sidebar a[href="${e}"]`)}if(!t&&!o.endsWith("/")&&(t=document.querySelector(`.gem-sidebar a[href="${o}/"]`)),t)t.classList.add("active"),e(t);else{const t=document.querySelector(".gem-sidebar a");t&&(t.classList.add("active"),e(t))}}c()}) \ No newline at end of file +document.addEventListener('DOMContentLoaded',function(){function d(b,e=300){const c=document.querySelector(b);if(!c)return console.warn(`Target element not found: ${b}`),!1;const d=window.pageYOffset,g=c.getBoundingClientRect().top+window.pageYOffset,h=g-120,i=h-d;let a=null;function f(b){a===null&&(a=b);const c=b-a,g=j(c,d,i,e);window.scrollTo(0,g),c{b=null},0),document.querySelectorAll('.gem-sidebar a').forEach(a=>{a.classList.remove('active')}),c.classList.add('active'),a(c)}function a(a){const b=document.querySelector('.gem-sidebar');if(!b||!a)return;const 
c=a.offsetTop,d=a.offsetHeight,e=b.clientHeight,f=c-e/2+d/2;b.scrollTo({top:f,behavior:'smooth'})}document.querySelectorAll('a[href*="#"]').forEach(a=>{a.addEventListener('click',f)}),window.addEventListener('popstate',function(){if(window.location.hash){const e=window.location.hash.substring(1),b=document.querySelector(`.gem-sidebar a[href$="#${e}"]`);b&&(document.querySelectorAll('.gem-sidebar a').forEach(a=>{a.classList.remove('active')}),b.classList.add('active'),c=e,a(b)),d(window.location.hash)}});const g={root:null,rootMargin:'-120px 0px -60% 0px',threshold:[0,.1,.5,1]};let c=null,b=null;const h=new IntersectionObserver(e=>{if(b)return;const d=e.filter(a=>a.isIntersecting).sort((a,b)=>b.intersectionRatio-a.intersectionRatio);if(d.length>0){const e=d[0],b=e.target.getAttribute('id');if(b&&b!==c){c=b;const d=document.querySelector(`.gem-sidebar a[href$="#${b}"]`);d&&(document.querySelectorAll('.gem-sidebar a').forEach(a=>{a.classList.remove('active')}),d.classList.add('active'),a(d),window.history&&window.history.replaceState&&window.history.replaceState(null,null,`#${b}`))}}},g);document.querySelectorAll('h2[id], h3[id], section[id], .expandable-section[id]').forEach(a=>{h.observe(a)});function i(){const e=window.location.pathname,f=window.location.hash;if(f){const e=f.substring(1),b=document.querySelector(`.gem-sidebar a[href$="#${e}"]`);if(b){b.classList.add('active'),c=e,a(b),d(f);return}}let b=null;if(b=document.querySelector(`.gem-sidebar a[href="${e}"]`),!b){const a=e.replace(/\/$/,'');b=document.querySelector(`.gem-sidebar a[href="${a}"]`)}if(!b&&!e.endsWith('/')&&(b=document.querySelector(`.gem-sidebar a[href="${e}/"]`)),b)b.classList.add('active'),a(b);else{const b=document.querySelector('.gem-sidebar a');b&&(b.classList.add('active'),a(b))}}i()}) \ No newline at end of file diff --git a/public/js/theme-toggle.min.js b/public/js/theme-toggle.min.js index 9f86a76..c03f377 100644 --- a/public/js/theme-toggle.min.js +++ b/public/js/theme-toggle.min.js 
@@ -1 +1 @@ -class ThemeToggle{constructor(){this.button=document.getElementById("theme-toggle"),this.icon=document.querySelector(".theme-icon"),this.currentTheme=localStorage.getItem("theme")||"light",this.init()}init(){this.setTheme(this.currentTheme),this.button.addEventListener("click",()=>{this.toggleTheme()})}toggleTheme(){this.currentTheme=this.currentTheme==="light"?"dark":"light",this.setTheme(this.currentTheme),localStorage.setItem("theme",this.currentTheme)}setTheme(e){document.body.setAttribute("data-theme",e),this.icon.textContent=e==="light"?"πŸŒ™":"β˜€οΈ"}}document.addEventListener("DOMContentLoaded",()=>{new ThemeToggle}) \ No newline at end of file +class ThemeToggle{constructor(){this.button=document.getElementById('theme-toggle'),this.icon=document.querySelector('.theme-icon'),this.currentTheme=localStorage.getItem('theme')||'light',this.init()}init(){this.setTheme(this.currentTheme),this.button.addEventListener('click',()=>{this.toggleTheme()})}toggleTheme(){this.currentTheme=this.currentTheme==='light'?'dark':'light',this.setTheme(this.currentTheme),localStorage.setItem('theme',this.currentTheme)}setTheme(a){document.body.setAttribute('data-theme',a),this.icon.textContent=a==='light'?'πŸŒ™':'β˜€οΈ'}}document.addEventListener('DOMContentLoaded',()=>{new ThemeToggle}) \ No newline at end of file diff --git a/public/js/typing-animation.min.js b/public/js/typing-animation.min.js index 151cac2..820f137 100644 --- a/public/js/typing-animation.min.js +++ b/public/js/typing-animation.min.js @@ -1 +1 @@ -class 
TypingAnimation{constructor(e,t,n=100){this.element=document.getElementById(e),this.text=t,this.speed=n,this.currentIndex=0,this.isTyping=!1}start(){if(this.isTyping||!this.element)return;this.isTyping=!0,this.element.textContent="",this.element.style.opacity="1",this.element.classList.add("typing-cursor"),this.typeNextCharacter()}typeNextCharacter(){this.currentIndex{this.typeNextCharacter()},this.speed)):(this.isTyping=!1,setTimeout(()=>{this.element.classList.remove("typing-cursor")},1e3))}reset(){this.currentIndex=0,this.isTyping=!1,this.element.textContent="",this.element.classList.remove("typing-cursor")}}document.addEventListener("DOMContentLoaded",()=>{const e=new TypingAnimation("typing-text","Wiring general intelligence through reinforcement learning",80);setTimeout(()=>{e.start()},500)}) \ No newline at end of file +class TypingAnimation{constructor(a,b,c=100){this.element=document.getElementById(a),this.text=b,this.speed=c,this.currentIndex=0,this.isTyping=!1}start(){if(this.isTyping||!this.element)return;this.isTyping=!0,this.element.textContent='',this.element.style.opacity='1',this.element.classList.add('typing-cursor'),this.typeNextCharacter()}typeNextCharacter(){this.currentIndex{this.typeNextCharacter()},this.speed)):(this.isTyping=!1,setTimeout(()=>{this.element.classList.remove('typing-cursor')},1e3))}reset(){this.currentIndex=0,this.isTyping=!1,this.element.textContent='',this.element.classList.remove('typing-cursor')}}document.addEventListener('DOMContentLoaded',()=>{const a=new TypingAnimation('typing-text','Wiring general intelligence through reinforcement learning',80);setTimeout(()=>{a.start()},500)}) \ No newline at end of file diff --git a/public/sitemap.xml b/public/sitemap.xml index 36d45a7..4574d02 100644 --- a/public/sitemap.xml +++ b/public/sitemap.xml @@ -1,21 +1 @@ - - - - http://localhost:53236/gem/features/ - - http://localhost:53236/gem/environments/ - - http://localhost:53236/gem/ - - http://localhost:53236/ - - 
http://localhost:53236/categories/ - - http://localhost:53236/tags/ - - http://localhost:53236/gem/tools/ - - http://localhost:53236/gem/advanced/ - - +https://axon-rl.github.io/https://axon-rl.github.io/categories/https://axon-rl.github.io/tags/https://axon-rl.github.io/gem/features/https://axon-rl.github.io/gem/environments/https://axon-rl.github.io/gem/https://axon-rl.github.io/gem/tools/https://axon-rl.github.io/gem/advanced/ \ No newline at end of file diff --git a/public/tags/index.xml b/public/tags/index.xml index 9a79580..98692cc 100644 --- a/public/tags/index.xml +++ b/public/tags/index.xml @@ -1,11 +1 @@ - - - - Tags on Axon-RL - Wiring General Intelligence Through Reinforcement Learning - http://localhost:53236/tags/ - Recent content in Tags on Axon-RL - Wiring General Intelligence Through Reinforcement Learning - Hugo - en-us - - - +Tags on Axon-RL - Wiring General Intelligence Through Reinforcement Learninghttps://axon-rl.github.io/tags/Recent content in Tags on Axon-RL - Wiring General Intelligence Through Reinforcement LearningHugo -- gohugo.ioen-us \ No newline at end of file