Browser Agent Leaderboard

This repository tracks the current standings of web agents evaluated on the WebVoyager benchmark (paper). The benchmark comprises 643 tasks across 15 popular websites and assesses agents' ability to perform diverse web navigation and interaction tasks.


Steel is an open-source browser API purpose-built for AI agents.

Leaderboard

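| Rank | Agent | Organization | WebVoyager Score | Source | Open Source | New | SOTA |
|------|-------|--------------|------------------|--------|-------------|-----|------|
| 1 | Alumnium | Alumnium | 98.5% | Source | Yes | Yes | Yes |
| 2 | Surfer 2 | H Company | 97.1% | Source | No | Yes | Yes |
| 3 | Magnitude | Magnitude | 93.9% | Source | Yes | No | |
| 4 | AIME Browser-Use | Aime | 92.34% | Source | No | Yes | |
| 5 | Surfer-H + Holo1 | H Company | 92.2% | Source | No | Yes | |
| 6 | Browserable | Browserable | 90.4% | Source | Yes | Yes | |
| 7 | Browser Use | Browser Use | 89.1% | Source | Yes | No | |
| 8 | Operator | OpenAI | 87% | Source | No | No | |
| 9 | Skyvern 2.0 | Skyvern | 85.85% | Source | Yes | No | |
| 10 | Project Mariner | Google | 83.5% | Source | No | No | |
| 11 | Notte | Notte | 73.1% | Source | Yes | Yes | |
| 12 | Agent-E | Emergence AI | 73.1% | Source | No | No | |
| 13 | WebSight | Academic Research | 68% | Source | No | No | |
| 14 | Runner H 0.1 | H Company | 67% | Source | No | No | |
| 15 | WebVoyager | Academic Research | 59.1% | Source | Yes | No | |
| 16 | WILBUR | Academic Research | 53% | Source | No | No | |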

Notes:

  • Open Source: Indicates whether the agent's source code is publicly available.
  • New: Denotes recently introduced agents.
  • SOTA: Signifies agents that have achieved state-of-the-art performance.

Contributing

We encourage contributions to keep this leaderboard up to date. If you have information about new agents or updated scores, please submit a pull request or open an issue.
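
When adding or updating a row, it may help to confirm where the new score falls in the ranking. The following is a minimal sketch, not tooling shipped with this repository: the `Entry` type and `rank_entries` function are hypothetical names introduced for illustration, and it assumes ranks are assigned purely by descending WebVoyager score, with ties kept in their listed order.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """Hypothetical record for one leaderboard row (illustration only)."""
    agent: str
    score: float  # WebVoyager score in percent


def rank_entries(entries: list[Entry]) -> list[tuple[int, Entry]]:
    """Sort entries by score, highest first, and number them from 1.

    sorted() is stable, so entries with equal scores keep their input
    order (assumed here to match how tied rows are listed).
    """
    ordered = sorted(entries, key=lambda e: e.score, reverse=True)
    return list(enumerate(ordered, start=1))


# A few rows taken from the table, deliberately out of order.
entries = [
    Entry("Operator", 87.0),
    Entry("Alumnium", 98.5),
    Entry("Notte", 73.1),
    Entry("Agent-E", 73.1),
]

ranked = rank_entries(entries)
for rank, entry in ranked:
    print(f"{rank} {entry.agent} {entry.score}%")
```

Running the sketch on a copy of the table with the new row included shows the rank the row would receive, which can then be reflected in the pull request.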

License

This project is licensed under the MIT License.