
Add 'Spot-check + global metric for bulk regeneration after a shared-path fix' to ai-workflow.md #470

@braboj

Pattern from wuseria S147 (#1122 cohort log refresh): a cost-effective shape for bulk-artifact regeneration.

Pattern

When a fix touches a shared code path with N downstream artifacts to regenerate, full per-artifact inspection is expensive (N visual checks) and unnecessary. Pair a small spot-check (3-5 representative cases) with a global numeric metric (reference-set calibration, full test suite, link check, accessibility score). The spot-check catches obviously broken regeneration; the global metric catches drift across the population. Together they cover the same regression class as per-artifact review at a fraction of the time cost.

Each half used alone has its own failure mode. A global metric with no spot-check misses regeneration bugs that pass numeric thresholds while producing visibly wrong artifacts (e.g. an SVG renderer that swaps a curve identity but keeps the same number of polylines). A full per-artifact glance with no global metric discovers only after merge that the change had a population-level effect outside the inspected set.
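A minimal Python sketch of the pattern, assuming each regenerated artifact can be described by a hypothetical `Artifact` record with a fix-target flag and a `feature_count` standing in for how artifact-rich it is. `select_spot_checks` picks the representative cases this issue recommends (fix target, richest case, clean baseline), and `verify_bulk_regeneration` passes only when both the spot-check and the global metric pass. The names, the `looks_right` callback (in practice a human glance at the output) and the threshold are illustrative assumptions, not an existing API.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Artifact:
    # Hypothetical record for one regenerated artifact (e.g. one lens log).
    name: str
    is_fix_target: bool  # the case the shared-path fix was written for
    feature_count: int   # proxy for how "artifact-rich" the case is


def select_spot_checks(artifacts: Sequence[Artifact], k: int = 3) -> List[Artifact]:
    """Pick up to k cases: the fix target, the richest case, the cleanest baseline."""
    if not artifacts:
        return []
    by_richness = sorted(artifacts, key=lambda a: a.feature_count)
    candidates = (
        [a for a in artifacts if a.is_fix_target][:1]  # primary target, if any
        + [by_richness[-1], by_richness[0]]            # richest, then cleanest
        + list(reversed(by_richness))                  # extra slots (k > 3), richest first
    )
    picks: List[Artifact] = []
    for a in candidates:
        if a not in picks:
            picks.append(a)
        if len(picks) == k:
            break
    return picks


def verify_bulk_regeneration(
    spot_checks: Sequence[Artifact],
    looks_right: Callable[[Artifact], bool],
    global_metric: float,
    threshold: float,
) -> bool:
    """Pass only if every spot-check looks right AND the population metric holds."""
    spot_ok = all(looks_right(a) for a in spot_checks)  # catches visibly wrong output
    metric_ok = global_metric >= threshold              # catches population-level drift
    return spot_ok and metric_ok
```

Keeping the gate a single AND is the point of pairing the two checks: a high metric cannot compensate for one visibly broken spot-check, and a clean spot-check cannot compensate for drift across the population.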

Concrete example

wuseria/me-fuji#1147 refreshed 12 TTartisan production digitization logs after the #1122 chrome-strip fix. A full glance would have meant inspecting 24 overlay PNGs (two per lens: maximum aperture and stopped down). Instead:

  • Spot-check 3 lenses: the fix's primary target, the most artifact-rich case, and a clean baseline. All three traced cleanly under visual inspection.
  • Global metric: reference-set calibration showed 583/626 paired comparisons (93.1%) within ±0.05, with the only changed chart being viltrox freq30S (NOT a TTartisan lens), confirming the change is safe across the full TTartisan cohort population.
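The global-metric bullet combines two checks: the share of paired comparisons within ±0.05 of the reference set, and which charts changed at all between the pre-fix and post-fix runs. A sketch of both, assuming a hypothetical `Comparison` row per paired comparison (the real calibration output format is not described in this issue). `cohort_affected` is False when every chart that moved belongs to a lens outside the cohort, which is the situation reported above for viltrox freq30S.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Comparison:
    # Hypothetical shape for one paired reference-set comparison.
    chart: str
    lens: str
    reference: float   # reference-set value for this chart
    before_fix: float  # value produced before the shared-path fix
    after_fix: float   # value produced after regeneration


def calibration_report(
    rows: Iterable[Comparison],
    cohort: Set[str],
    tolerance: float = 0.05,
    eps: float = 1e-9,
) -> Dict[str, object]:
    rows = list(rows)
    # Population metric: share of comparisons within ±tolerance of the reference.
    within = sum(abs(r.after_fix - r.reference) <= tolerance for r in rows)
    # Drift check: which comparisons moved at all between the two runs.
    moved = [r for r in rows if abs(r.after_fix - r.before_fix) > eps]
    return {
        "within": within,
        "total": len(rows),
        "rate": within / len(rows) if rows else 0.0,
        "changed_charts": sorted({r.chart for r in moved}),
        "cohort_affected": any(r.lens in cohort for r in moved),
    }
```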

Total verification cost: ~5 minutes. Per-artifact equivalent would have been ~45 minutes. Same regression coverage.

When to use

  • Code change affects N artifacts that share a generator/renderer
  • A global numeric metric exists that aggregates across N
  • The artifacts have a visual or structural dimension the metric doesn't capture (overlay PNGs, layout SVGs, rendered HTML)

When NOT to use

  • Each artifact has independent semantics the global metric can't summarize
  • The "shared path" has different effects on each artifact (so there is no uniform population for a global metric to summarize)
  • N is small enough (< 5) that a per-artifact glance is already cheap
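The two checklists above can be read as a single predicate. A sketch, with hypothetical `BulkChange` fields that mirror the bullets one-to-one and the < 5 cutoff exposed as a default parameter:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BulkChange:
    # Hypothetical description of a change; field names are illustrative.
    n_artifacts: int
    shares_generator: bool       # all N come from one generator/renderer
    has_global_metric: bool      # a numeric metric aggregates across N
    metric_misses_visual: bool   # artifacts have a visual/structural side the metric can't see
    uniform_effect: bool         # the shared path affects every artifact the same way
    independent_semantics: bool  # each artifact means something the metric can't summarize


def use_spot_check_plus_metric(c: BulkChange, small_n: int = 5) -> bool:
    if c.n_artifacts < small_n:
        return False  # per-artifact glance is already cheap
    if c.independent_semantics or not c.uniform_effect:
        return False  # no population for a global metric to summarize
    return c.shares_generator and c.has_global_metric and c.metric_misses_visual
```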

Why this belongs upstream

templates/base/workflow/ai-workflow.md already has "Review continuously": review after each task, not after queuing up 10. The spot-check + global metric pattern is the operational form of that principle for bulk-artifact cases, where reviewing "after each" artifact would dominate the change's actual work.

Proposed wording

Add to the Lessons Learned section of templates/base/workflow/ai-workflow.md (alongside the candidates in #468):

Spot-check + global metric for bulk regeneration after a shared-path fix

When a fix touches a shared code path with N downstream artifacts to regenerate, full per-artifact inspection is expensive and unnecessary. Pair a small spot-check (3-5 representative cases — the fix's primary target, the most artifact-rich case, a clean baseline) with a global numeric metric that aggregates across the population (test suite pass rate, calibration result, link check, accessibility score). The spot-check catches obviously broken regeneration; the metric catches drift across the population. Same regression coverage as per-artifact review at a fraction of the cost.

Metadata

Labels: P3 (Low, nice to have), spike (Research or exploration)