Skip to content

fix: resolve direction calculator caching bugs and prevent stale queries on hot-swap#797

Open
3rabiii wants to merge 9 commits intoOneBusAway:mainfrom
3rabiii:fix/direction-calculator-caching
Open

fix: resolve direction calculator caching bugs and prevent stale queries on hot-swap#797
3rabiii wants to merge 9 commits intoOneBusAway:mainfrom
3rabiii:fix/direction-calculator-caching

Conversation

@3rabiii
Copy link
Copy Markdown
Contributor

@3rabiii 3rabiii commented Mar 26, 2026

Description

This PR addresses several reliability and correctness bugs within the AdvancedDirectionCalculator. Previously, transient database errors could permanently poison the direction cache, insufficient shape points defaulted to an invalid "East" orientation, and the calculator retained a stale database pointer after a GTFS hot-swap.

Changes Included

  • Prevent Transient Error Caching: computeFromShapes now returns (string, error). The loop tracks transient DB errors (lastTransientErr) and propagates them to the caller, ensuring we only cache valid directions or legitimate "no data" states, avoiding permanent cache poisoning during DB hiccups.
  • Fix Bogus Orientations: Explicitly return sql.ErrNoRows when there are fewer than 2 shape points. This prevents the caller from incorrectly interpreting a nil error as a valid 0 radian (East) orientation.
  • Add Error Visibility: Added slog.Warn for non-ErrNoRows errors in the orientation calculation loop to surface silent failures like DB connection drops or timeouts.
  • Resolve Stale Queries on Hot-Swap: - Added a sync.RWMutex to protect the queries pointer in AdvancedDirectionCalculator.
    • Added an UpdateQueries() method to atomically swap the DB pointer and safely evict the entire direction result cache.
    • Wired the calculator into gtfs.Manager during startup so that ForceUpdate can automatically trigger the refresh after a successful GTFS reload.

Verification

image

fixes: #747

Copy link
Copy Markdown

@fletcherw fletcherw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Adel, this is a good improvement to the caching logic. I have some feedback but no large changes are needed.

adc.directionResults.Store(stopID, computedDir)
}

return computedDir, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here, even if computeFromShapes returned an err, you're returning computedDir and a nil error. If the intention is to ignore errors, you should leave a comment here explaining why it's safe to do so.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional so we gracefully fallback to an empty direction "" without failing the API. Returning nil also ensures singleflight correctly shares this fallback with concurrent callers. I've added a comment to clarify this. Thanks!

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there an easy way you can add a test for the new behavior you've added, verifying that a returned error value isn't cached?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! I've added TestTransientDBError_NotCached. It simulates a db failure by closing an in-memory database and verifies that the resulting empty string is not cached

@3rabiii 3rabiii requested a review from fletcherw March 31, 2026 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Direction calculator: permanent negative cache on transient DB errors

2 participants