fix: below+bidirect edit distance takes max across strands #84

Open
aviezerl wants to merge 4 commits into master from
fix/below-bidirect-strand-max

Conversation

@aviezerl
Collaborator

Summary

  • Fixed direction="below" with bidirect=TRUE taking the min across strands instead of the max. A genomic substitution changes both strands simultaneously, so disrupting a motif requires both strands to fall below the threshold; the strand that needs more edits (the max) determines the answer.
  • Rewrote edit distance documentation into a dedicated section with clear direction/strand semantics, parameter reference, and worked examples.

Details

In evaluate_windows(), the maybe_update_min lambda always kept the global minimum edit distance across all (position, strand) pairs. For direction="above" this is correct (a match on either strand suffices). For direction="below", the correct per-position combination is max across strands, then min across positions.

The new below_bidirect code path computes the edit distance for both strands at each offset, combines them with max (NaN-aware: if only one strand passes score.min, that strand is used alone), then feeds the combined result into the existing global-min tracking.
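The combination rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the package's C++ code: the function names `combine_strands` and `window_edit_distance` are hypothetical, and NaN is assumed to mark a strand that did not pass score.min at that offset.

```python
import math

def combine_strands(fwd, rev, direction):
    """Combine the two per-strand edit distances at one offset.

    NaN is assumed to mean "this strand was filtered out".
    If only one strand has a value, that value is used alone.
    """
    vals = [d for d in (fwd, rev) if not math.isnan(d)]
    if not vals:
        return math.nan
    # "below": both strands must drop under the threshold, so take the max.
    # "above": a match on either strand suffices, so take the min.
    return max(vals) if direction == "below" else min(vals)

def window_edit_distance(fwd_dists, rev_dists, direction):
    """Min across offsets of the per-offset strand combination."""
    best = math.nan
    for f, r in zip(fwd_dists, rev_dists):
        d = combine_strands(f, r, direction)
        if not math.isnan(d) and (math.isnan(best) or d < best):
            best = d
    return best

# Hypothetical per-offset distances for a three-offset window.
fwd = [3.0, 1.0, math.nan]
rev = [2.0, 4.0, 5.0]
below = window_edit_distance(fwd, rev, "below")  # max per offset, then min
above = window_edit_distance(fwd, rev, "above")  # min per offset, then min
```

With these made-up inputs, a global minimum over all (position, strand) pairs would give 1 for both directions, while the max-then-min rule gives 3 for direction="below".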

Test plan

  • Existing test updated: bidirectional assertion changed from min(fwd, rev) to >= max(fwd, rev)
  • New test: 1bp iterator verifying bidi == max(fwd, rev) at each position
  • All 1280 PWM tests pass
  • Validated on PrimatesAnc069 CTCF query — hits correctly show bidirectional scores above score.min

aviezerl and others added 4 commits April 12, 2026 19:27
When direction="below" and bidirect=TRUE, the edit distance was
incorrectly taking the minimum across forward and reverse strands.
Since a genomic substitution changes both strands simultaneously,
disrupting a motif site requires bringing *both* strands below the
threshold — so the harder strand (max) should determine the answer.

For direction="above", min remains correct: a match on either strand
suffices.

Also rewrites the edit distance documentation into a dedicated section
with clear explanation of direction/strand semantics, parameter
reference, and worked examples using the examples DB.

Previously, direction="below" silently set score.min = score.thresh
when score.min was NULL. This hidden default was a footgun: users
who pre-screened for strong matches and then ran edit distance would
get unexpected NAs without realizing score.min was being set behind
their back.

score.min now defaults to NULL (no filter) for both directions,
matching score.max behavior. Users who want filtering must set
score.min explicitly.
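The change in defaults can be illustrated with a small Python sketch. It is a sketch under stated assumptions, not the package's implementation: `effective_score_min` and `passes_prefilter` are hypothetical helper names, `None` stands in for R's NULL, and treating the bounds as inclusive is an assumption.

```python
def effective_score_min(direction, score_thresh, score_min=None, legacy=False):
    """Return the score.min actually applied.

    legacy=True mimics the old hidden default described above:
    direction="below" with score.min unset used score.thresh instead.
    """
    if legacy and direction == "below" and score_min is None:
        return score_thresh
    return score_min  # new behavior: None means no filter

def passes_prefilter(window_score, score_min=None, score_max=None):
    """Pre-filter deciding whether a window is evaluated at all.

    Inclusive bounds are an assumption made for this sketch.
    """
    if score_min is not None and window_score < score_min:
        return False
    if score_max is not None and window_score > score_max:
        return False
    return True

# A window scoring 4.0 with score.thresh = 6.0 and score.min left unset.
old_min = effective_score_min("below", 6.0, legacy=True)
new_min = effective_score_min("below", 6.0)
old_evaluated = passes_prefilter(4.0, score_min=old_min)  # filtered out, result would be NA
new_evaluated = passes_prefilter(4.0, score_min=new_min)  # evaluated
```

Under the old default the window is silently dropped before any edit distance is computed; under the new default it is evaluated unless the caller sets score.min.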

Also updates docs to clearly explain score.min vs score.thresh:
score.min is a pre-filter (which windows to evaluate),
score.thresh is the target (what score to reach).

Count single-base substitutions that independently cross a PWM score
threshold. For each window, computes how many (position, alt_base) pairs
would each individually change the score past the threshold. Returns 0
if threshold already satisfied, NA if no single edit suffices.
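The counting rule can be sketched in Python as follows. This is a minimal single-strand illustration, not the PWMEditDistanceScorer code: the names `pwm_score` and `count_crossing_substitutions` are hypothetical, the PWM is represented as a list of per-position score dictionaries, `None` stands in for NA, and the exact boundary conditions (>= for "above", < for "below") are assumptions.

```python
BASES = "ACGT"

def pwm_score(pwm, seq):
    """Additive PWM score: sum of the per-position score of each base."""
    return sum(pwm[i][b] for i, b in enumerate(seq))

def count_crossing_substitutions(pwm, seq, score_thresh, direction="above"):
    """Count (position, alt_base) pairs that alone cross the threshold.

    Returns 0 if the window already satisfies the threshold and
    None (standing in for NA) if no single substitution suffices.
    """
    if direction == "above":
        satisfied = lambda s: s >= score_thresh  # assumed boundary
    else:
        satisfied = lambda s: s < score_thresh   # assumed boundary
    score = pwm_score(pwm, seq)
    if satisfied(score):
        return 0
    n = 0
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue
            # Score after swapping only position i from ref to alt.
            if satisfied(score - pwm[i][ref] + pwm[i][alt]):
                n += 1
    return n if n > 0 else None

# Toy two-position PWM whose best-scoring sequence is "AC".
toy_pwm = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
]
one_fix = count_crossing_substitutions(toy_pwm, "AG", 2.0, "above")
already = count_crossing_substitutions(toy_pwm, "AC", 2.0, "above")
no_fix = count_crossing_substitutions(toy_pwm, "GG", 2.0, "above")
disrupt = count_crossing_substitutions(toy_pwm, "AC", 1.0, "below")
```

Because PWM scores are additive across positions, each candidate substitution can be scored from the current total by swapping one term, which is why no rescoring of the full window is needed.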

Builds on existing PWMEditDistanceScorer infrastructure (gain tables,
direction handling, score filtering). Subs-only, no indels.

R API: gvtrack.create("name", NULL, "pwm.n_mutations", pssm=...,
       score.thresh=..., direction=..., bidirect=..., score.min=...,
       score.max=...)

Tests: 20 tests, 94 assertions covering correctness, direction,
bidirectional, score filters, integration, and parameter validation.