[codex] feat(data): LexBench-Browser multi-system expansion#58
Draft
jiaxinwang-sherry wants to merge 2 commits into
…-system + email + login_required)

- Add RPA split (22 tasks): Odoo/Amazon/Trip/GitHub/Yahoo/npm/Trello/Wikipedia/HN/Dolibarr/Mailinator/GuerrillaMail
- Add IM split (10 tasks): Zulip channel monitoring, topic aggregation, emoji sentiment, cross-channel search, contributor tracking
- Add cross_system split (10 tasks): Zulip→Odoo across CRM/Sales/HR/Accounting/Project/Inventory/Employees/Purchase/Calendar
- Add email split (10 tasks): YOPmail inbox/OTP/aggregation + YOPmail→Odoo cross-system
- Add login_required split (7 tasks): Discord/Element/Rocket.Chat/Telegram/Reddit (pending test accounts)
- Update data_info.json with all 6 splits and version info

Co-Authored-By: Sherry Wang <sherry@wangjiaxindeMacBook-Air.local>
(cherry picked from commit c90adaaa7dd3ebdfb4cb6fa7a3219f75072b8203)
What changed
This PR ports the LexBench-Browser multi-system data expansion into the public `browseruse-agent-bench` repo.

It adds new dataset splits and metadata for:
- `rpa`
- `im`
- `cross_system`
- `email`
- `login_required`

It also updates `VERSION_HISTORY.md` and merges the new split definitions into the current `data_info.json` structure used on `main`.

Why it changed
The data expansion was already prepared in staging, but that branch lives on unrelated history relative to `browseruse-agent-bench`.
This PR cherry-picks the dataset-only change onto `lexmount/browseruse-agent-bench` so the public repo can review and merge it cleanly.

Impact
`task_all` and `sample50` metadata on `main` are preserved.

Validation
- `python3 -m json.tool browseruse_bench/data/LexBench-Browser/data_info.json`
- `git diff --stat agent-bench/main..HEAD`
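
The `json.tool` check confirms only that `data_info.json` parses. A reviewer who also wants to confirm that the new split names and the preserved `task_all` / `sample50` entries are present could use something like the sketch below. The layout of `data_info.json` is not shown in this PR, so the search is deliberately structure-agnostic: it looks for each name anywhere among the keys and string values. The helper names `collect_names`, `missing_names`, and `check_file` are hypothetical and not part of the repo.

```python
import json
from pathlib import Path

# Split names as listed in the PR description. How they are stored inside
# data_info.json is an assumption, so nothing below depends on the schema.
NEW_SPLITS = ["rpa", "im", "cross_system", "email", "login_required"]
PRESERVED = ["task_all", "sample50"]


def collect_names(node):
    """Return every dict key and string value found anywhere in parsed JSON."""
    names = set()
    if isinstance(node, dict):
        for key, value in node.items():
            names.add(key)
            names |= collect_names(value)
    elif isinstance(node, list):
        for item in node:
            names |= collect_names(item)
    elif isinstance(node, str):
        names.add(node)
    return names


def missing_names(data_info, expected):
    """List the expected names that appear nowhere in the parsed document."""
    found = collect_names(data_info)
    return [name for name in expected if name not in found]


def check_file(path):
    """Parse the file at `path` and report which expected names are absent."""
    data_info = json.loads(Path(path).read_text(encoding="utf-8"))
    return missing_names(data_info, NEW_SPLITS + PRESERVED)
```

Calling `check_file("browseruse_bench/data/LexBench-Browser/data_info.json")` from the repo root would return an empty list if every name is present somewhere in the file. Because the match is by name only, it cannot tell whether a split's task list or metadata is complete.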