[Demo] Adding code base for Evaluation leaderboard UI#16

Merged
minpyaemoe merged 10 commits into main from demo/add-leaderboard-ui on Apr 20, 2026

Conversation

@minpyaemoe
Collaborator

No description provided.

Comment thread demo/agent-eval-dashboard/scripts/download_hf_dataset.py Outdated
Comment thread demo/agent-eval-dashboard/scripts/add_results.py
Comment thread demo/agent-eval-dashboard/README.md
Comment thread README.md
Comment thread demo/agent-eval-dashboard/scripts/regenerate_detail_pages.py
Collaborator

@billyhoce billyhoce left a comment

Overall LGTM; one file needs to be removed, plus some small code-quality nits.

Comment thread paper_experiments/view_results.py Outdated
pennychong94 previously approved these changes Apr 20, 2026

@pennychong94 pennychong94 left a comment

LGTM

Collaborator

@Marcus-Duigan Marcus-Duigan left a comment

LGTM

Comment thread README.md
```

Replace `<timestamp>` with the actual timestamp of your output directory. This loads the pickled results and starts a Streamlit app at `http://localhost:8501` that visualizes error categories and per-sample diagnostics. More details are in [paper_experiments/readme.md](paper_experiments/readme.md).
2. Then, to view the dashboard locally, run the following command:
Collaborator

nit: no need to add the "Then"; since it's already a bullet point list, it follows sequentially

@minpyaemoe minpyaemoe requested a review from pennychong94 April 20, 2026 05:39
@minpyaemoe minpyaemoe removed the request for review from billyhoce April 20, 2026 05:47
@minpyaemoe minpyaemoe merged commit 54db965 into main Apr 20, 2026
2 checks passed
4 participants