Pattern from wuseria S147 (#1122) — small but valuable addition to quality conventions.
Pattern
When introducing a numeric threshold to a filter (cutoff, bound, tolerance), measure the actual distribution of the filtered quantity in the working data set (p95, p99, max-legitimate) before picking the threshold. Record the data source in the constant's docstring so future maintainers can re-validate the threshold against fresh data without having to reconstruct how the number was derived.
The alternative — picking thresholds from first principles or "feels right" — produces thresholds that mask future bugs by being too loose or break edge cases by being too tight, with no record of how the number was chosen.
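A minimal sketch of the measurement step, assuming the filtered quantity is available as a plain list of numbers. The helper names `nearest_rank_percentile` and `summarize_distribution` and the sample values are hypothetical, not taken from the referenced PR, and nearest-rank is only one of several percentile definitions:

```python
from typing import Dict, Sequence


def nearest_rank_percentile(values: Sequence[float], pct: int) -> float:
    """Smallest observed value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be an integer in [1, 100]")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding float rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize_distribution(values: Sequence[float]) -> Dict[str, float]:
    """Collect the figures worth quoting in the constant's docstring."""
    return {
        "median": nearest_rank_percentile(values, 50),
        "p95": nearest_rank_percentile(values, 95),
        "p99": nearest_rank_percentile(values, 99),
        "max": max(values),
    }


if __name__ == "__main__":
    # Hypothetical measurements; substitute the real working data set.
    sample_run_lengths = [1, 2, 2, 2, 3, 3, 13, 17]
    print(summarize_distribution(sample_run_lengths))
```

The summary is deliberately small: these are the numbers a reviewer needs in order to judge whether a proposed threshold leaves slack above the legitimate tail.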
Concrete example
wuseria/me-fuji#1122 added _MAX_RUN_LENGTH = 20 to filter vertical chrome out of ridge candidates. The threshold was sized from the reference-set column-run histogram:
- real curve ridges: median=2 px, p95=3 px, p99=13 px, max legitimate ~17 px
- chrome runs measured: 50 px and 265 px
A threshold of 20 px admits every real curve ridge (it sits above both p99 = 13 px and the ~17 px legitimate maximum) but drops chrome (the smallest chrome run, 50 px, is 2.5× the threshold). Calibration validated: only one chart changed, by losing a previously out-of-tolerance reading.
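The margin arithmetic behind that choice can be written down as a quick check. This is a sketch only: `threshold_margins` is a hypothetical helper, and it assumes the filter drops runs strictly longer than the threshold, as the ">20 px" wording in the constant's comment suggests.

```python
from typing import Dict


def threshold_margins(
    threshold: float, max_legitimate: float, min_unwanted: float
) -> Dict[str, float]:
    """Report how much room a threshold leaves on each side.

    Raises ValueError if the threshold does not separate the two populations.
    """
    if not max_legitimate <= threshold < min_unwanted:
        raise ValueError("threshold does not separate legitimate from unwanted values")
    return {
        # Slack above the largest legitimate value seen.
        "headroom_px": threshold - max_legitimate,
        # How many times larger than the threshold the smallest unwanted value is.
        "unwanted_ratio": min_unwanted / threshold,
    }


# Numbers from the example: max legitimate ~17 px, smallest chrome run 50 px.
print(threshold_margins(threshold=20, max_legitimate=17, min_unwanted=50))
# {'headroom_px': 3, 'unwanted_ratio': 2.5}
```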
The docstring on the constant records the data source so a future maintainer can rerun the histogram against new reference data and re-validate:
# Real MTF curve ridges measured across the reference set fit within ~3 px
# at p95 and ~13 px at p99; runs >20 px are chrome.
_MAX_RUN_LENGTH: int = 20
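Re-validation against fresh reference data could look roughly like the following. This is a sketch under stated assumptions, not the project's actual tooling: `revalidate_threshold` is a hypothetical helper, the run lengths are assumed to be already measured and labelled as legitimate or chrome, and the filter is assumed to drop runs strictly greater than the constant.

```python
from typing import List, Sequence, Tuple

_MAX_RUN_LENGTH: int = 20  # value from the example above


def revalidate_threshold(
    legitimate_runs: Sequence[int],
    chrome_runs: Sequence[int],
    threshold: int = _MAX_RUN_LENGTH,
) -> Tuple[List[int], List[int]]:
    """Return (legitimate runs the filter would drop, chrome runs it would admit).

    Both lists empty means the threshold still separates the two populations.
    """
    wrongly_dropped = [run for run in legitimate_runs if run > threshold]
    wrongly_admitted = [run for run in chrome_runs if run <= threshold]
    return wrongly_dropped, wrongly_admitted


if __name__ == "__main__":
    # Figures quoted in the example; replace with freshly measured runs.
    dropped, admitted = revalidate_threshold([2, 3, 13, 17], [50, 265])
    print("legitimate runs dropped:", dropped)
    print("chrome runs admitted:", admitted)
```

If either list comes back non-empty, the threshold needs resizing from the new histogram rather than a manual nudge.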
Why this belongs upstream
templates/base/core/quality.md §"Magic numbers and magic strings must be named constants" is one half — the other half is how the number gets picked. The named-constant rule prevents scattered literals; the distribution-sized rule prevents arbitrary thresholds masquerading as principled ones.
Proposed wording
Add to templates/base/core/quality.md under the existing "Magic numbers" item or as a sibling item:
- When introducing a numeric threshold to a filter, cutoff, or bound, size it from the actual distribution of the relevant quantity in the working data set (p95, p99, max-legitimate) — not from first principles or intuition. Document the data source in the constant's docstring so a future maintainer can rerun the analysis against fresh data and re-validate the threshold.
Related to upstream-flag in #468
Companion to #468 ("probe before trusting issue body's mechanism"). Both came from the same session's mtfdigitizer work; both are about grounding decisions in measured runtime data instead of inferred premises.