[codex] feat(data): LexBench-Browser multi-system expansion #58

Draft
jiaxinwang-sherry wants to merge 2 commits into lexmount:main from jiaxinwang-sherry:codex/lexbench-multi-system-expansion-agent-bench

@jiaxinwang-sherry

What changed

This PR ports the LexBench-Browser multi-system data expansion into the public browseruse-agent-bench repo.

It adds new dataset splits and metadata for:

  • rpa
  • im
  • cross_system
  • email
  • login_required

It also updates VERSION_HISTORY.md and merges the new split definitions into the current data_info.json structure used on main.

Why it changed

The data expansion was already prepared in staging, but that branch shares no history with browseruse-agent-bench.
This PR cherry-picks the dataset-only change onto lexmount/browseruse-agent-bench so it can be reviewed and merged cleanly in the public repo.

Impact

  • Public LexBench-Browser data now exposes more task slices for evaluation and analysis.
  • Existing task_all and sample50 metadata on main are preserved.
  • The PR stays scoped to dataset files only.
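
The preservation guarantee above can be pictured as a non-destructive merge. This is only an illustrative sketch: it assumes the split definitions form a flat mapping from split name to metadata, which may not match the real layout of data_info.json. The helper `merge_split_definitions` and the placeholder metadata values are hypothetical and are not part of this PR.

```python
def merge_split_definitions(existing, incoming):
    """Return a new mapping with the incoming splits added.

    Existing entries are never overwritten: a name collision raises
    instead of silently replacing metadata already on main.
    """
    collisions = sorted(set(existing) & set(incoming))
    if collisions:
        raise ValueError(f"refusing to overwrite existing splits: {collisions}")
    merged = dict(existing)   # copy, so the input mapping is left untouched
    merged.update(incoming)
    return merged


# Placeholder metadata only; the real entries live in data_info.json.
existing = {
    "task_all": {"note": "placeholder"},
    "sample50": {"note": "placeholder"},
}
incoming = {
    name: {"note": "placeholder"}
    for name in ("rpa", "im", "cross_system", "email", "login_required")
}
merged = merge_split_definitions(existing, incoming)
```

Raising on a collision, rather than letting the later value win, makes an accidental overwrite of task_all or sample50 visible at merge time.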

Validation

  • python3 -m json.tool browseruse_bench/data/LexBench-Browser/data_info.json
  • git diff --stat agent-bench/main..HEAD
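
The json.tool check above only confirms that the file parses. A possible follow-up check, sketched here under assumptions, also confirms that each new split name appears somewhere in the file. Because the schema of data_info.json is not shown in this PR, the sketch avoids assuming one: it walks the whole document and collects every key and string value. The names `collect_names` and `missing_splits` are hypothetical helpers, not existing repo code.

```python
import json

# Split names taken from the "What changed" list in this PR.
EXPECTED_SPLITS = ("rpa", "im", "cross_system", "email", "login_required")


def collect_names(node, found=None):
    """Collect every dict key and string value in a parsed JSON document."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            found.add(key)
            collect_names(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_names(item, found)
    elif isinstance(node, str):
        found.add(node)
    return found


def missing_splits(json_text, expected=EXPECTED_SPLITS):
    """Return the expected split names that never appear in the JSON text."""
    data = json.loads(json_text)  # raises json.JSONDecodeError if malformed
    names = collect_names(data)
    return [name for name in expected if name not in names]
```

To run it against the repo, read browseruse_bench/data/LexBench-Browser/data_info.json as UTF-8 text and pass the contents to `missing_splits`. An empty list means every expected split name was found.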

王嘉欣 and others added 2 commits June 16, 2026 10:18
…-system + email + login_required)

- Add RPA split (22 tasks): Odoo/Amazon/Trip/GitHub/Yahoo/npm/Trello/Wikipedia/HN/Dolibarr/Mailinator/GuerrillaMail
- Add IM split (10 tasks): Zulip channel monitoring, topic aggregation, emoji sentiment, cross-channel search, contributor tracking
- Add cross_system split (10 tasks): Zulip→Odoo across CRM/Sales/HR/Accounting/Project/Inventory/Employees/Purchase/Calendar
- Add email split (10 tasks): YOPmail inbox/OTP/aggregation + YOPmail→Odoo cross-system
- Add login_required split (7 tasks): Discord/Element/Rocket.Chat/Telegram/Reddit (pending test accounts)
- Update data_info.json with all 6 splits and version info

Co-Authored-By: Sherry Wang <sherry@wangjiaxindeMacBook-Air.local>
(cherry picked from commit c90adaaa7dd3ebdfb4cb6fa7a3219f75072b8203)