Skip to content

Add: 'Verify on real data, not just unit tests' to lessons learned #475

@braboj

Description

@braboj

Source

me-fuji session 150 (#1170 partial fix in PR #1173).

Observation

When fixing a specific in-the-wild bug, all indirect green signals can lie:

  • Unit tests pass (synthetic geometry was correctly handled by the new code)
  • Full test suite passes (no regression)
  • `--check` flags pass (no committed artifacts changed)

But running the actual processing pipeline on the targeted input showed byte-identical output to pre-fix — the new code never fired on the case it was supposed to fix.

In the session this happened on the af-75 stopped freq30 MTF chart. The post-DP V-crossing detector had a clean unit-test pass but never fired on the real chart because the DP output didn't have the shape the detector required. Caught by running `py -m mtfdigitizer.diagnose ttartisan-af-75mm-f2-0` and comparing numbers to the committed digitization-log values.

Proposed template change

Add to `base/workflow/ai-workflow.md` Lessons Learned section, or `base/core/testing.md` near the testability/verification rules:

Verify the fix fires on real data. When a fix is meant to close a specific in-the-wild case, run the actual processing pipeline on that input and compare numbers to the committed/expected output. Unit tests on synthetic inputs and CI-staleness checks can both pass while the fix never executes on the targeted case. `./run-pipeline.sh | diff - ` is the test that matters — it confirms code reached the bug site AND took the new branch.

Generalizes S149's "pixel-level chart probe is the truth" — both point at the same antipattern of trusting indirect green signals over direct measurement.

Alternatives considered

  • Add as a MUST in `testing.md` — too strong; some fixes don't have a singular "real case" to point at.
  • Stack as an inline note inside the existing `Verify agent calculations against the system` section in `ai-workflow.md` — could work; that section is about agent math errors, this is about a different failure mode (fix never fires) but same root pattern (trust direct measurement over inference).

Preference: standalone bullet in Lessons Learned. Concrete enough to remember, general enough to apply.

Metadata

Metadata

Assignees

No one assigned

    Labels

    P3Low — nice to havetaskAtomic implementable work

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions