Add AIME2025, GPQA, HealthBench evaluation_test suites; unify row-limiting via pytest flag; clean up examples #199

Triggered via pull request August 10, 2025 17:04
Status Success
Total duration 9m 18s
Artifacts 4

ci.yml

on: pull_request
Lint & Type Check (55s)
Matrix: test-core
Batch Evaluation Tests (1m 36s)
MCP End-to-End Tests (46s)
Upload Coverage (8s)

Annotations

1 warning
MCP End-to-End Tests
No files were found with the provided path: coverage.xml. No artifacts will be uploaded.

Artifacts

Produced during runtime
Name                            Size     Digest
coverage-batch-eval (Expired)   31.1 KB  sha256:181615560b48f4a3d86f7e10e58312efb89b6a38fc5f431248f725d2e572ebdf
coverage-core-3.10 (Expired)    37 KB    sha256:98d26fff306fbe4582e84dba54d32596c2f40b76a60a26d3c8a2fe1711bbd5ba
coverage-core-3.11 (Expired)    37 KB    sha256:9792257363e948352faa9723ab0e5ed2299227e0b2f6fdc3912e0de148796df3
coverage-core-3.12 (Expired)    37 KB    sha256:7011f8b79fe3d4201fd26f1e175bea26caf1dd8aa673e8e484de78fe7a469872