HarmonyGuard: Toward Safety and Utility in Web Agents via Adaptive Policy Enhancement and Dual-Objective Optimization
A multi-agent collaborative framework for balancing safety and utility in web environments
📄 Paper | 🤗 HuggingFace | 🐦 X (Twitter)
Large language models enable agents to autonomously perform tasks in open web environments. However, as hidden threats within the web evolve, web agents face the challenge of balancing task performance with emerging risks during long-sequence operations. Although this challenge is critical, current research remains limited to single-objective optimization or single-turn scenarios, lacking the capability for collaborative optimization of both safety and utility in web environments.
To address this gap, we propose HarmonyGuard, a multi-agent collaborative framework that leverages policy enhancement and objective optimization to jointly improve both utility and safety. HarmonyGuard features a multi-agent architecture characterized by two fundamental capabilities:
| Component | Description |
|---|---|
| 🛡️ Policy Agent | LLM-based agent for processing and updating security policies |
| ⚡ Utility Agent | Agent that evaluates and corrects the web agent's reasoning |
| 🔌 MCP Server | Model Context Protocol server used for agent tool calls |
- 🔄 Adaptive Policy Enhancement: We introduce the Policy Agent within HarmonyGuard, which automatically extracts and maintains structured security policies from unstructured external documents, while continuously updating policies in response to evolving threats.
- ⚖️ Dual-Objective Optimization: Based on the dual objectives of safety and utility, the Utility Agent integrated within HarmonyGuard performs Markovian real-time reasoning to evaluate these objectives and uses metacognitive capabilities to optimize them.
📊 Performance: Extensive evaluations on multiple benchmarks show that HarmonyGuard improves policy compliance by up to 38% and task completion by up to 20% over existing baselines, while achieving over 90% policy compliance across all tasks.
If you find our work valuable for your research or applications, we would greatly appreciate a star ⭐ and a citation using the BibTeX entry provided below.
```bibtex
@article{chen2025harmonyguard,
  title={HarmonyGuard: Toward Safety and Utility in Web Agents via Adaptive Policy Enhancement and Dual-Objective Optimization},
  author={Chen, Yurun and Hu, Xavier and Liu, Yuhan and Yin, Keting and Li, Juncheng and Zhang, Zhuosheng and Zhang, Shengyu},
  journal={arXiv preprint arXiv:2508.04010},
  year={2025}
}
```
| Feature | Description |
|---|---|
| 🏆 Multi-Benchmark Support | Integrated evaluation with ST-WebAgentBench and WASP |
| 🛡️ Policy-Aware Evaluation | Six-dimensional safety assessment (User-Consent, Boundary, Strict Execution, Hierarchy, Robustness, Error Handling) |
| 🤖 Multi-Model Support | Compatible with OpenAI GPT models, Anthropic Claude, and Alibaba Qwen |
| 📄 Automated Policy Processing | PDF and webpage policy extraction capabilities |
| 📝 Comprehensive Logging | Detailed logging and thought process tracking |
```
HarmonyGuard/
├── 🛡️ harmony_agents/              # Core agent implementations
│   ├── policy_agent.py             # Policy processing agent
│   ├── utility_agent.py            # Utility functions agent
│   └── mcp_server.py               # MCP server implementation
├── 🏆 benchmark/                   # Benchmark suites
│   ├── ST-WebAgentBench/           # ST-WebAgentBench benchmark
│   └── wasp/                       # WASP benchmark
├── 🔧 utility/                     # Utility modules
│   ├── config_loader.py            # Configuration management
│   ├── logger.py                   # Logging utilities
│   └── tools.py                    # Common tools
├── 📚 policy_docs/                 # External policy documentation
├── 📊 policy_processing_output/    # Structured policy outputs
├── 📈 output/                      # Evaluation results
│   ├── stweb/                      # ST-WebAgentBench results
│   └── wasp/                       # WASP results
├── 📊 evaluate/                    # Evaluation tools
│   ├── evaluate_wasp.py            # WASP evaluation tool
│   ├── evaluate_stweb.py           # ST-Web evaluation tool
│   ├── Results/                    # Evaluation results directory
│   │   ├── WASP/                   # WASP results to be evaluated
│   │   └── stweb/                  # ST-Web results to be evaluated
│   └── README.md                   # Evaluation documentation
└── 📝 materials/                   # Project materials
    ├── icon.png                    # Project icon
    └── config_explanation_en.md    # Configuration documentation
```
- ✅ Python 3.10 or higher
- 🐳 Docker (for ST-WebAgentBench)
- ☁️ AWS EC2
```shell
git clone git@github.com:YurunChen/HarmonyGuard.git
cd HarmonyGuard
```

Then edit the `env.example` file with your actual API keys:

```shell
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_API_BASE=your_openai_base_url_here

# Anthropic Configuration
ANTHROPIC_API_KEY=your_anthropic_api_key_here
ANTHROPIC_API_BASE=your_anthropic_base_url_here

# Alibaba DashScope Configuration
DASHSCOPE_API_KEY=your_dashscope_api_key_here
DASHSCOPE_API_BASE=your_dashscope_base_url_here
```

Copy the example environment file and configure it:

```shell
cp env.example .env
```

We evaluated the performance of HarmonyGuard on two benchmarks. The benchmark environments are set up with two scripts, each creating its own conda environment.
```shell
chmod +x setup_stweb.sh
./setup_stweb.sh
```

This creates a conda environment named `harmonyguard-stweb`.
```shell
chmod +x setup_wasp.sh
./setup_wasp.sh
```

This creates a conda environment named `harmonyguard-wasp`.
Deploy the benchmark websites on AWS EC2 based on the following tutorials:

- ST-WebAgentBench:
- WASP: visualwebarena
⚠️ Notice: Once the ST-WebAgentBench website is running, several website URLs need to be configured in `HarmonyGuard/benchmark/ST-WebAgentBench/.env`.
The project uses `config.yaml` for configuration management; a detailed explanation is provided in `materials/config_explanation_en.md`. Key configuration sections:
```yaml
openai:
  policy_agent:
    api_key: "${OPENAI_API_KEY}"    # read from .env
    base_url: "${OPENAI_API_BASE}"  # read from .env
    model: "gpt-4o"
    max_tokens: 2048
    temperature: 0
  # ...

mcp_server:
  openai:
    api_key: "${OPENAI_API_KEY}"    # read from .env
    base_url: "${OPENAI_API_BASE}"  # read from .env
    model: "gpt-4o"
    max_tokens: 8000
    temperature: 0

logging:
  level: "INFO"
  format: "%(asctime)s - %(levelname)s - %(message)s"
  console:
    enabled: true
  file:
    enabled: false
    path: "logs"
```

To process the policy files located in the root directory, execute the following command:
```shell
cd harmony_agents
python policy_agent.py \
    -i your_policy_file_path \
    -org "organization" \
    -desc "Description of the policy files" \
    -subject "Agent"
```

✅ The processed results will be saved in the `policy_processing_output` directory.
Among the output files, the `xxx_policies.json` file is the parsed result. Use the path of this file to replace the placeholder in your configuration:

```yaml
policy:
  risk_cat_path: "xxx_policies.json"
```
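The `${...}` placeholders shown in `config.yaml` are resolved from environment variables loaded from `.env`. A minimal sketch of how such expansion can work, assuming the project's `config_loader.py` does something along these lines (the regex and fallback behavior here are illustrative, not the actual implementation):

```python
import os
import re

# Matches ${VAR_NAME} placeholders in config values.
PLACEHOLDER = re.compile(r"\$\{([A-Z0-9_]+)\}")

def expand_env(value: str) -> str:
    """Replace each ${VAR} with its value from the environment.

    Unset variables expand to an empty string (an assumption; the real
    loader may raise an error instead).
    """
    return PLACEHOLDER.sub(lambda m: os.environ.get(m.group(1), ""), value)

# Example: expand the api_key field the way a config loader might.
os.environ["OPENAI_API_KEY"] = "sk-demo"
print(expand_env("${OPENAI_API_KEY}"))  # -> sk-demo
```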
Run the following command in the root directory:
```shell
# Remember to set .env in ST-WebAgentBench
cd benchmark/ST-WebAgentBench
conda activate harmonyguard-stweb
python st_bench_loop.py  # You can set the evaluation range in this file.
```

Run WASP prompt injection tests:
```shell
cd benchmark/wasp/webarena_prompt_injections
conda activate harmonyguard-wasp
export DATASET=webarena_prompt_injections
export REDDIT="Put your Reddit website URL here."
export GITLAB="Put your Gitlab website URL here."
python run.py \
    --config configs/experiment_config.raw.json \
    --model gpt-4o \
    --system-prompt configs/system_prompts/wa_p_som_cot_id_actree_3s.json \
    --output-dir ../../../output/wasp/ \
    --output-format webarena
```

We recommend saving all output files in the `HarmonyGuard/output` directory.
Store WASP results in `HarmonyGuard/output/wasp` and ST-WebAgentBench results in `HarmonyGuard/output/stweb`. WASP results are printed to the console, so make sure to save the complete execution log.
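Since the WASP results only appear on the console, one way to keep a complete copy is to duplicate the output with `tee` (the log filename below is an assumption, not part of the benchmark):

```shell
# Pipe the benchmark command through tee to both display and save the
# console output, e.g.:
#   python run.py ... 2>&1 | tee ../../../output/wasp/run.log
# Demonstration with echo standing in for the benchmark command:
echo "wasp console output" 2>&1 | tee wasp_run.log
```

`2>&1` merges stderr into stdout so warnings and errors land in the same log file.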
We provide the evaluation code in the `evaluate` folder. Detailed instructions can be found in `evaluate/README.md`.
| Project | Description |
|---|---|
| ST-WebAgentBench | For the safety and trustworthiness evaluation framework |
| WASP | For the web agent security benchmark |
| BrowserGym | For the web automation infrastructure |
For questions, issues, or contributions:
- 📧 Email: yurunchen.research@gmail.com
- 🐛 Issues: GitHub Issues
This project is licensed under the MIT License.