External evidence: review one benchmark claim boundary #48

@weich97

Goal

Review one public benchmark or README claim and classify it as an engineering claim, benchmark claim, or scientific claim under docs/claim_boundaries.md.

Starter commands

python -m pip install -e ".[dev]"
python scripts/check_release_readiness.py

Then inspect one target claim in README.md, docs/results/, or the leaderboard registry.

Acceptance criteria

Please submit an issue or PR that includes:

  • the exact claim text or file path;
  • the proposed claim class: engineering, benchmark, or scientific;
  • the evidence currently supporting it;
  • evidence labels that should appear on the row or section;
  • whether the wording should be weakened, strengthened, or left unchanged;
  • one runnable command or artifact path that a reviewer can use to verify the claim.
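
The checklist above can be sketched as a small record with a completeness check. This is only an illustration of the fields a submission needs, not part of the repository: `ClaimReview`, `problems`, the field names, and the allowed wording values are all hypothetical, and the example values are placeholders rather than a real review.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical names throughout; none of these come from the repository.
CLAIM_CLASSES = {"engineering", "benchmark", "scientific"}
WORDING_ACTIONS = {"weaken", "strengthen", "unchanged"}


@dataclass
class ClaimReview:
    claim: str        # exact claim text or file path
    claim_class: str  # engineering, benchmark, or scientific
    evidence: str     # evidence currently supporting the claim
    evidence_labels: list[str] = field(default_factory=list)
    wording: str = "unchanged"  # weaken, strengthen, or unchanged
    verify: str = ""  # one runnable command or artifact path

    def problems(self) -> list[str]:
        """Return the checklist items that are missing or invalid."""
        issues = []
        if not self.claim.strip():
            issues.append("claim text or file path is missing")
        if self.claim_class not in CLAIM_CLASSES:
            issues.append(f"unknown claim class: {self.claim_class!r}")
        if not self.evidence.strip():
            issues.append("supporting evidence is not described")
        if not self.evidence_labels:
            issues.append("no evidence labels proposed")
        if self.wording not in WORDING_ACTIONS:
            issues.append(f"unknown wording action: {self.wording!r}")
        if not self.verify.strip():
            issues.append("no verification command or artifact path")
        return issues


if __name__ == "__main__":
    # Placeholder values for illustration only, not an actual review.
    review = ClaimReview(
        claim="README.md",
        claim_class="benchmark",
        evidence="docs/results/",
        evidence_labels=["placeholder-label"],
        wording="weaken",
        verify="python scripts/check_release_readiness.py",
    )
    print(review.problems() or "complete")
```

An empty list from `problems()` means every acceptance criterion has an entry; it says nothing about whether the review itself is correct.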

This task is intentionally about falsifiability. A useful review can say that a claim is too strong.
