Releases: GoogleCloudPlatform/evalbench
Releases · GoogleCloudPlatform/evalbench
v1.9.0
1.9.0 (2026-06-11)
Features
- add AgyCliGenerator support to evaluator and models, including test suite and configuration datasets (9e6d71f)
- add AgyCliGenerator support to evaluator and models, including test suite and configuration datasets (9e6d71f)
- add Antigravity agent tab and update CLI version retrieval logic (fe1a0ae)
- add Antigravity agent tab and update CLI version retrieval logic (147680b)
- dea: support YAML-only configuration and penguins dataset for DEA conversational evaluation (#410) (2e37c1c)
- recover resolved model label from agy cli logs for statistics bucket tagging (ee7635f)
- update MCP config translation to support both serverUrl and url fields in agy cli (685b1b2)
Bug Fixes
- dea: resolve concurrency deadlock in GcpAdcCredentialService (#422) (6619ae5)
- pin pyopenssl<26.2 to avoid google-auth mTLS regression (#424) (c8916ae)
- update error message to list supported AgentCliGenerator subclasses (dfe1e8e)
- update generator key path in trends data mapping to model_config.generator (3739f9d)
v1.8.0
1.8.0 (2026-06-03)
Features
- add Codex to agent filters in viewer (a2be23c)
- add Codex to agent filters in viewer (#387) (2ac6c1f)
- add filter_native_tools option to trajectory_matcher to optionally ignore native harness tools during scoring (2c34d94)
- add packaged console script entrypoint to support
uvxexecution (#385) (8ea07f8) - dea: define EvalDeaRequest input model for conversational evaluations (#407) (04b91cf)
- opt-in function-calling for the Gemini SDK judge (#409) (d97f511)
- Rename package to google-evalbench and decouple viewer dependencies (#390) (0d75811)
- scorers: filter native tools in trajectory_matcher with opt-out flag (2c1ab58)
- stabilize Cloud Run deployment and polish standalone CLI UX (#389) (4720eef)
- support work_dir for claude code eval (#403) (179e0d3)
Bug Fixes
- add --no-sync flag to runtime uv run commands to prevent PyPI timeouts (#392) (0c4783c)
- allow-list files in fake home directory for Gemini CLI (#395) (734bc2a)
- fix Mesop event routing bug in trends dropdown (023d150)
- gemini-cli: support 'name' parameter key in skill extraction (#378) (62400da)
- patch absl help output when running via uvx/launcher (85048e2)
- prevent silent errors on DB query timeouts and extend deadline (#406) (fbbd31d)
- surface eval failures instead of silently terminating or crashing (#398) (9c36108)
v1.7.1
v1.7.0
1.7.0 (2026-05-05)
Features
- add Dataform scorers and plumb isolated fake_home workspace directory tracking (#349) (b4ddfda)
- Add dbt Scorers for Agent Evaluations (#367) (5496d59)
- Add GCS Artifacts Reporter for Agent Evaluations (#366) (11def06)
- implement lifecycle execution for setup and teardown scripts (#360) (9de38da)
Bug Fixes
v1.6.0
1.6.0 (2026-04-30)
Features
- add exponential backoff for Gemini 429 resource exhausted errors (#352) (11d4236)
- allow execution of mixed DDL and DML statements in Spanner driver (#351) (64c0b63)
- chunk Spanner DDL statements into groups of 10 to avoid limits (#353) (306caad)
- handle quoted fields in CSV setup data (#350) (3ba2334)
- handle recursive dependency cleanup in Spanner table drop (#356) (3eb4d45)
- use parameterized queries for data insertion in MySQL, Postgres, and SQLite (#358) (733724d)
Bug Fixes
v1.5.0
v1.4.0
v1.3.1
v1.3.0
v1.2.0
1.2.0 (2026-04-07)
Features
- adc: support ADC for database authentication (#306) (6cb05e6)
- add Cloud Run support with entrypoint script, custom CSS, and environment-based XSRF configuration (82fdeca)
- add UV_NO_SYNC support to run script and update Dockerfile and cloudbuild configuration accordingly (43731f9)
- allow database name mapping via config (#303) (3e8d25a)
- geminicli: populate adc in fake home (01c9c5b)
- geminicli: populate adc in fake home (ce06c9b)
- implement on_load logic to auto-select job directory from query parameters (4691de4)