[codex] Add observation-aware eval manifest #158

Merged
DavidJBianco merged 1 commit into dev from codex/observation-aware-eval on May 15, 2026

@DavidJBianco
Collaborator

Summary

  • Emit OBSERVATION_MANIFEST.json beside generated ground truth with per-storyline and red-herring source evidence status.
  • Teach eforge eval to load the manifest and apply observation-aware adjustment only to coverage-style causality metrics while preserving strict correctness and contradiction checks.
  • Update CLI sidecar handling, evaluation reports, docs, skill references, TODO, and regression coverage.
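
The sidecar handling in the first two bullets can be pictured roughly as follows. This is a minimal sketch, not the code from this PR: only the file name OBSERVATION_MANIFEST.json comes from the summary, while the JSON layout, the status strings, and the helper names write_manifest and load_manifest are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

MANIFEST_NAME = "OBSERVATION_MANIFEST.json"  # file name taken from the PR summary


def write_manifest(output_dir: Path, storylines: dict, red_herrings: dict) -> Path:
    """Write the manifest next to the generated ground truth (hypothetical layout)."""
    manifest = {"storylines": storylines, "red_herrings": red_herrings}
    path = output_dir / MANIFEST_NAME
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")
    return path


def load_manifest(ground_truth_path: Path) -> Optional[dict]:
    """Look for the sidecar beside the ground truth file; return None if it is absent."""
    path = ground_truth_path.parent / MANIFEST_NAME
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


# Demo: round-trip a tiny manifest through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    write_manifest(
        out_dir,
        storylines={"storyline-1": {"ev-1": "observed", "ev-2": "filtered"}},
        red_herrings={"herring-1": {"ev-9": "dropped"}},
    )
    # The ground truth file name here is a placeholder, not the project's real name.
    demo_manifest = load_manifest(out_dir / "ground_truth.md")
```

Returning None when the sidecar is missing lets an evaluator fall back to unadjusted scoring for older outputs, which is one plausible reading of "CLI sidecar handling" but is not confirmed by the PR text.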

Why

Observation profiles intentionally make some evidence filtered, delayed, dropped, or out of window. Automated evaluation should distinguish those expected collection gaps from generator failures without simply lowering pass thresholds or hiding real contradictions.
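
One way to read this distinction is as a change to the denominator of a coverage metric rather than to its pass threshold. The sketch below assumes a flat mapping from evidence ID to status and the status names "filtered", "delayed", "dropped", and "out_of_window". The function coverage and the class CoverageResult are hypothetical and do not reflect the actual eforge implementation.

```python
from dataclasses import dataclass

# Statuses treated as expected collection gaps (assumed names, based on the wording above).
EXPECTED_GAP_STATUSES = {"filtered", "delayed", "dropped", "out_of_window"}


@dataclass(frozen=True)
class CoverageResult:
    strict: float    # found / all evidence, ignoring the manifest
    adjusted: float  # found / evidence the observation profile should have let through


def coverage(evidence_status: dict, found: set) -> CoverageResult:
    """Compute strict and observation-aware coverage for one storyline."""
    total = len(evidence_status)
    strict_hits = sum(1 for ev in evidence_status if ev in found)
    strict = strict_hits / total if total else 1.0

    # Only evidence that was expected to be observable counts toward the adjusted score.
    observable = [
        ev for ev, status in evidence_status.items()
        if status not in EXPECTED_GAP_STATUSES
    ]
    adjusted_hits = sum(1 for ev in observable if ev in found)
    adjusted = adjusted_hits / len(observable) if observable else 1.0
    return CoverageResult(strict=strict, adjusted=adjusted)


result = coverage(
    {"ev-1": "observed", "ev-2": "filtered", "ev-3": "observed", "ev-4": "dropped"},
    found={"ev-1"},
)
print(result)  # CoverageResult(strict=0.25, adjusted=0.5)
```

In this example the two expected gaps (ev-2, ev-4) stop counting against the score, while the genuinely missing ev-3 still does, so a real generator failure stays visible. Correctness and contradiction checks would run unchanged alongside this and are deliberately left out of the sketch.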

Validation

  • uv run eforge validate-config
  • uv run ruff check .
  • uv run ruff format --check .
  • Focused manifest/eval tests: 145 passed
  • uv run pytest -v: 3047 passed, 15 skipped

Slow tests were not run for this branch after the final eval-manifest changes.

DavidJBianco marked this pull request as ready for review on May 15, 2026 at 15:54
DavidJBianco merged commit 77653ce into dev on May 15, 2026
3 checks passed