Goal
Review one public benchmark or README claim and classify it as an engineering claim, benchmark claim, or scientific claim under docs/claim_boundaries.md.
Starter commands
python -m pip install -e ".[dev]"
python scripts/check_release_readiness.py
Then inspect one target claim in README.md, docs/results/, or the leaderboard registry.
Acceptance criteria
Please submit an issue or PR that includes:
- the exact claim text or file path;
- the proposed claim class: engineering, benchmark, or scientific;
- the evidence currently supporting it;
- evidence labels that should appear on the row or section;
- whether the wording should be weakened, strengthened, or left unchanged;
- one runnable command or artifact path that a reviewer can use to verify the claim.
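One way to check that a submission covers every item above is to write the review as a small record before opening the issue or PR. The following is a minimal sketch, not part of the repository: `ClaimReview`, its field names, the `WORDING_ACTIONS` values, and the rule that at least one evidence label is required are all assumptions made for illustration, and the values in the example are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The three claim classes named in the task description.
CLAIM_CLASSES = {"engineering", "benchmark", "scientific"}
# Hypothetical short names for "weakened, strengthened, or left unchanged".
WORDING_ACTIONS = {"weaken", "strengthen", "unchanged"}


@dataclass
class ClaimReview:
    """Hypothetical record mirroring the acceptance criteria."""

    claim: str                     # exact claim text or file path
    claim_class: str               # engineering, benchmark, or scientific
    current_evidence: str          # evidence currently supporting the claim
    evidence_labels: list[str] = field(default_factory=list)
    wording_action: str = "unchanged"
    verification: str = ""         # runnable command or artifact path

    def problems(self) -> list[str]:
        """Return a list of missing or invalid items (empty if complete)."""
        issues = []
        if not self.claim.strip():
            issues.append("missing claim text or file path")
        if self.claim_class not in CLAIM_CLASSES:
            issues.append(f"unknown claim class: {self.claim_class!r}")
        if not self.current_evidence.strip():
            issues.append("missing description of current evidence")
        if not self.evidence_labels:  # assumption: at least one label expected
            issues.append("no evidence labels proposed")
        if self.wording_action not in WORDING_ACTIONS:
            issues.append(f"unknown wording action: {self.wording_action!r}")
        if not self.verification.strip():
            issues.append("missing verification command or artifact path")
        return issues


# Placeholder values only; replace them with the claim under review.
review = ClaimReview(
    claim="README.md",
    claim_class="benchmark",
    current_evidence="placeholder description of supporting evidence",
    evidence_labels=["placeholder-label"],
    wording_action="weaken",
    verification="python scripts/check_release_readiness.py",
)
print(review.problems())  # prints []
```

An empty list means every acceptance item has a value. It does not mean the claim is well supported, which is still the reviewer's judgment.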
This task is intentionally about falsifiability. A useful review can say that a claim is too strong.