Demo: block unverifiable AI eval claim #17

Open
AzurLiu wants to merge 3 commits into main from demo/ai-eval-claim-gate

@AzurLiu AzurLiu commented May 31, 2026

This PR is the public Falsiflow launch demo.

It intentionally starts with placeholder AI eval evidence so the claim gate fails with claim_check_blocked. The follow-up commit replaces falsiflow_ai_eval/evidence.csv with source-backed evidence so the same PR turns green with claim_check_ready.

Claim under review:

candidate_model improves answer quality over baseline_model on the pinned claims_eval_v2026_05_26 task set.

Canonical evidence:

Expected blocked behavior:

  • AI Eval Claim Gate Demo fails strict CI
  • status: claim_check_blocked
  • blocking stage: gate_evidence
  • repair actions ask for source-backed eval evidence

Expected ready behavior:

  • AI Eval Claim Gate Demo passes
  • status: claim_check_ready
  • verification: bundle_verified
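
For readers who want a feel for the two outcomes above, here is a minimal Python sketch of an evidence gate. It is not Falsiflow's implementation: the `check_evidence` function, the `metric`/`value`/`source` column names, the placeholder markers, and the sample rows are all assumptions made for illustration. Only the status strings (`claim_check_blocked`, `claim_check_ready`, `gate_evidence`, `bundle_verified`) come from this PR.

```python
import csv
import io

# Hypothetical markers for "placeholder" evidence; the real gate may use other rules.
PLACEHOLDER_MARKERS = {"", "todo", "tbd", "placeholder"}


def check_evidence(csv_text: str) -> dict:
    """Return a gate result for evidence given as CSV text.

    Assumes (hypothetically) that each row has 'metric', 'value', and
    'source' columns, and that a row is source-backed when 'source' is
    not a placeholder marker.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    unsourced = [
        row.get("metric") or "?"
        for row in rows
        if (row.get("source") or "").strip().lower() in PLACEHOLDER_MARKERS
    ]
    if not rows or unsourced:
        repairs = [
            f"supply source-backed eval evidence for {metric}"
            for metric in unsourced
        ] or ["add at least one evidence row"]
        return {
            "status": "claim_check_blocked",
            "blocking_stage": "gate_evidence",
            "repair_actions": repairs,
        }
    # This sketch only echoes the PR's status string; it performs no
    # bundle verification of its own.
    return {"status": "claim_check_ready", "verification": "bundle_verified"}


# Illustrative rows only; these are not the contents of evidence.csv.
placeholder_csv = "metric,value,source\nanswer_quality,0.9,TODO\n"
sourced_csv = "metric,value,source\nanswer_quality,0.9,runs/eval.json\n"

print(check_evidence(placeholder_csv)["status"])
print(check_evidence(sourced_csv)["status"])
```

The design point the sketch tries to capture is that the gate inspects where the evidence comes from, not whether the reported numbers look good.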

This PR is deliberately not about proving the model is good. It demonstrates that unverifiable eval claims do not pass CI until the repository supplies pinned, source-backed evidence.
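
One common way to make evidence "pinned" is to record a content digest and refuse anything that does not match it. The Python sketch below shows that idea with SHA-256. It is an assumption about what pinning could look like, not a description of how Falsiflow pins evidence, and the `sha256_hex` and `is_pinned` helpers are hypothetical names.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def is_pinned(evidence: bytes, expected_digest: str) -> bool:
    """Check that evidence bytes match a previously recorded digest.

    Hypothetical helper: the expected digest would be committed alongside
    the claim, so any later change to the evidence file is detected.
    """
    return hmac.compare_digest(sha256_hex(evidence), expected_digest.lower())


# Illustrative bytes only; not the real evidence file.
original = b"metric,value,source\nanswer_quality,0.9,runs/eval.json\n"
pinned_digest = sha256_hex(original)

print(is_pinned(original, pinned_digest))
print(is_pinned(original + b"edited", pinned_digest))
```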
