This guide provides two distinct paths for training an agent:
- Basic GRPO Training: A standard method to get started quickly.
- AgentEvolver Training: An advanced method that supports self-evolving agent training.
Before you begin, run these commands in your terminal to configure your environment. You only need to do this once.
- Initialize Conda
source <YOUR_CONDA_PATH>/etc/profile.d/conda.sh
- Configure API Endpoints
export DASHSCOPE_API_KEY="<YOUR_API_KEY>"
export HF_ENDPOINT=https://hf-mirror.com
💡 Tip: Add the export commands to your ~/.bashrc or ~/.zshrc file to set them automatically in new terminal sessions.
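Before starting any service, it can help to confirm that the variables above are actually set in the current shell. The following is a minimal sketch; the function name `check_required_env` is hypothetical and not part of AgentEvolver.

```shell
# Hypothetical helper: report which of the named environment variables are unset or empty.
check_required_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup that works in POSIX sh as well as bash.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    # Returns 0 when every variable is set, 1 otherwise.
    return "$missing"
}
```

Running `check_required_env DASHSCOPE_API_KEY HF_ENDPOINT` prints the name of each unset variable to stderr and returns a non-zero status if any is missing.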
This launches the simulation environment (e.g., AppWorld) where the agent will operate. This service will run in the background. You'll need a new terminal for the next step.
conda activate appworld
bash env_service/launch_script/appworld.sh
This command starts the training process using the GRPO method.
conda activate agentevolver
bash examples/run_basic.sh
Just like in basic training, this launches the agent's simulation environment. This service will run in the background. You'll need a new terminal for the next step.
conda activate appworld
bash env_service/launch_script/appworld.sh
The ReMe service gives the agent long-term memory and the ability to reflect on past actions. Once started, it listens for requests on http://127.0.0.1:8001. Keep this terminal open.
Configure API Endpoints:
export FLOW_EMBEDDING_API_KEY="<YOUR_API_KEY>"
export FLOW_EMBEDDING_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
export FLOW_LLM_API_KEY="<YOUR_API_KEY>"
export FLOW_LLM_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
conda activate reme
cd external/reme
reme \
config=default \
backend=http \
thread_pool_max_workers=256 \
http.host="127.0.0.1" \
http.port=8001 \
http.limit_concurrency=256 \
llm.default.model_name=qwen-max-2025-01-25 \
embedding_model.default.model_name=text-embedding-v4 \
vector_store.default.backend=local \
op.rerank_memory_op.params.enable_llm_rerank=false
With the environment and ReMe services running, start the AgentEvolver training.
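If you launch training from a script, you may want to wait until the ReMe service accepts connections on 127.0.0.1:8001 first. The sketch below is one way to do that; `wait_for_port` is a hypothetical helper, and it relies on bash's `/dev/tcp` feature, so it assumes the script runs under bash.

```shell
# Hypothetical helper: poll until a TCP port accepts connections, or give up after a timeout.
wait_for_port() {
    host=$1
    port=$2
    timeout=${3:-60}   # seconds to keep trying (default is an arbitrary choice)
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # /dev/tcp is a bash feature; the subshell keeps the descriptor local.
        if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

For example, `wait_for_port 127.0.0.1 8001 120 && echo "ReMe is reachable"` prints the message only once the port is open, and returns a non-zero status if it is still closed after 120 seconds.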
conda activate agentevolver
bash examples/run_overall.sh
Please read Part A and Part B first.
To run AgentEvolver and train agents on multiple nodes, we need some extra setup.
Follow the instructions in Configuration to prepare the training environment on all nodes.
In Part A and Part B, AgentEvolver starts the Ray cluster itself. For multi-node training, we need to start the Ray cluster manually.
Start Ray on all nodes:
# all Ray nodes must use the same conda environment
conda activate agentevolver
# node 1 as head
ray start --head
# other nodes as followers
ray start --address='<head addr>'
Start envservice on one node:
conda activate appworld
cd env_service
bash launch_script/appworld.sh
Once you see the successful startup logs, you're good to go. The address of this node will be used in the next step.
If ReMe-Service is needed, start it on one node.
Create a launch script based on run_basic.sh or run_overall.sh:
- Update env_url to the address of the envservice node
- Adjust other training parameters as needed
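The env_url update can be scripted instead of edited by hand. The sketch below assumes the launch script sets the value in the form `env_url=<value>` with no spaces inside the value; that format, and the helper name `make_multinode_script`, are assumptions, so check your copy of the script before relying on it.

```shell
# Hypothetical helper: copy a launch script and point env_url at another node.
# Assumes the script sets the value as `...env_url=<value>` with no spaces in <value>.
make_multinode_script() {
    src=$1   # existing launch script, e.g. run_basic.sh or run_overall.sh
    dst=$2   # new script to create
    url=$3   # address of the envservice node; must not contain '|' or '&'
    # Replace everything after "env_url=" up to the next whitespace.
    sed "s|\(env_url=\)[^[:space:]]*|\1${url}|" "$src" > "$dst"
}
```

A call such as `make_multinode_script examples/run_basic.sh run_multinode.sh "<envservice addr>"` leaves the original script untouched; comparing the two files with `diff` shows whether the substitution matched anything.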
Then start training from one node:
conda activate agentevolver
bash run_basic.sh
AgentEvolver will automatically use the Ray cluster and spread the training tasks to all nodes.
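When joining follower nodes, the `<head addr>` placeholder is normally the head node's host and port. As a small sketch, the hypothetical helper below builds the join command from a head IP, assuming the head was started with plain `ray start --head`, which listens on port 6379 by default.

```shell
# Hypothetical helper: print the command a follower node would run to join the cluster.
ray_join_command() {
    head_ip=$1
    port=${2:-6379}   # 6379 is the port `ray start --head` uses by default
    printf "ray start --address='%s:%s'\n" "$head_ip" "$port"
}
```

Pass a second argument if the head node was started with a different port. After all nodes have joined, running `ray status` on any node shows the cluster's current nodes and resources.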