fix: below+bidirect edit distance takes max across strands#84
Open
fix: below+bidirect edit distance takes max across strands#84
Conversation
When direction="below" and bidirect=TRUE, the edit distance was incorrectly taking the minimum across forward and reverse strands. Since a genomic substitution changes both strands simultaneously, disrupting a motif site requires bringing *both* strands below the threshold — so the harder strand (max) should determine the answer. For direction="above", min remains correct: a match on either strand suffices. Also rewrites the edit distance documentation into a dedicated section with clear explanation of direction/strand semantics, parameter reference, and worked examples using the examples DB.
Previously, direction="below" silently set score.min = score.thresh when score.min was NULL. This hidden default was a footgun: users who pre-screened for strong matches and then ran edit distance would get unexpected NAs without realizing score.min was being set behind their back. score.min now defaults to NULL (no filter) for both directions, matching score.max behavior. Users who want filtering must set score.min explicitly. Also updates docs to clearly explain score.min vs score.thresh: score.min is a pre-filter (which windows to evaluate), score.thresh is the target (what score to reach).
Count single-base substitutions that independently cross a PWM score
threshold. For each window, computes how many (position, alt_base) pairs
would each individually change the score past the threshold. Returns 0
if threshold already satisfied, NA if no single edit suffices.
Builds on existing PWMEditDistanceScorer infrastructure (gain tables,
direction handling, score filtering). Subs-only, no indels.
R API: gvtrack.create("name", NULL, "pwm.n_mutations", pssm=...,
score.thresh=..., direction=..., bidirect=..., score.min=...,
score.max=...)
Tests: 20 tests, 94 assertions covering correctness, direction,
bidirectional, score filters, integration, and parameter validation.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
direction="below"withbidirect=TRUEtaking min across strands instead of max. A genomic substitution changes both strands simultaneously, so disrupting a motif requires both strands to fall below the threshold — the harder strand determines the answer.Details
In
evaluate_windows(), themaybe_update_minlambda always kept the global minimum edit distance across all (position, strand) pairs. Fordirection="above"this is correct (a match on either strand suffices). Fordirection="below", the correct per-position combination is max across strands, then min across positions.New
below_bidirectcode path computes both strands at each offset, combines with max (NaN-aware: if only one strand passesscore.min, uses that strand alone), then feeds the combined result into the existing global-min tracking.Test plan
min(fwd, rev)to>= max(fwd, rev)bidi == max(fwd, rev)at each positionscore.min