add #92

Closed
mayinghan wants to merge 26 commits into v0.2.10-bf from main

Conversation

@mayinghan
Collaborator


name: Pull Request
about: Propose changes to the codebase
title: "Brief description of changes"
labels: ''
assignees: ''


Description

Please include a summary of the change and which issue is fixed or feature is implemented. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)
Implements # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Refactoring/Code cleanup
  • Build/CI/CD related changes
  • Other (please describe):

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

  • Test A
  • Test B

Test Configuration:

  • Firmware version:
  • Hardware:
  • Toolchain:
  • SDK:

Checklist:

  • My code follows the style guidelines of this project (ran black ., isort ., flake8 .)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Screenshots (if applicable)

If applicable, add screenshots to help showcase your changes.

Additional context

Add any other context about the PR here.

xzrderek and others added 26 commits August 12, 2025 17:07
* fix failure reason

* update
* add live bench

* fix live bench and rollout processor
* Add AIME2025, GPQA, HealthBench evaluation_test suites; unify row-limiting via pytest flag; clean up examples

* evaluation with aggregated scores

* WIP: vibe coded as an mvp

* merge

* remove

* updated logger

* formatting

* formatting

* fixing tests

---------

Co-authored-by: benjibc <youfychenbc5000@gmail.com>
* e2e smoke test

* temp adding

* update

* test

* adjust bounds

* change back to regular schedule

* final
* convert rollout_input_params to completion_params

* fix

* DISABLE_EP_SQLITE_LOG

* fix kwargs access to "model"

* DRY completion params and make it a dict

* fix tests

* revert

* fix

* ensure logging

* fix smoke test params
* "Copy" button

* consolidate filter configurations

* filter button works

* extract tooltip into its own component

* vite build

* Refactor AddFilterButton layout for improved styling and structure

* vite build
* Finished Error Handling

* Address comments

* Changing the rollout processors

* cleaning up mcp gym

* remove import

* Update

* failing test

* fixing flaky test

* update comments
* livesvgbench + metadata fix

* bugs in retry processor
* works

* vite build / fix warnings

* don't show totals / fix warnings / vite build

* styling

* no black border / vite build
* BigQuery

* removing unneeded
…te CI; exclude vite dist; restore deleted files from main (bigquery adapter + vite src/readme) (#74)
@mayinghan closed this on Aug 18, 2025

3 participants