feat: refactor GTFS realtime data storage to use a FeedData struct per feed with granular locking #480

mann-patwa wants to merge 4 commits into OneBusAway:main
manager.feedMapMutex.Lock()
feed := manager.feedData[feedID]
if feed == nil {
    feed = &FeedData{
        VehicleLastSeen: make(map[string]time.Time),
    }
    manager.feedData[feedID] = feed
}
manager.feedMapMutex.Unlock()
We acquire the feedMapMutex to take a snapshot of the feedData map and create a new FeedData struct if this is a new feed.
@aaronbrethorst, need a review!
@aaronbrethorst would love a review!
aaronbrethorst left a comment
Mann, this is an ambitious and well-thought-out refactoring. The FeedData struct is a clean encapsulation, and the overall lock granularity improvement is the right direction for multi-feed deployments. The new tests for error retention and empty payloads are a great addition. There's one race condition that needs fixing before this can ship.
Critical Issues (1 found)
1. Data race: feedLastUpdate deleted without lock in clearFeedData
internal/gtfs/gtfs_manager.go:131 — delete(manager.feedLastUpdate, feedID) is called with no lock held. This is a regression: the old code held realTimeMutex.Lock() when deleting from this map.
GetFeedUpdateTimes() (line 989) reads feedLastUpdate under realTimeMutex.RLock(), and SetFeedUpdateTime() (line 1002) writes under realTimeMutex.Lock(). The unprotected delete in clearFeedData creates a concurrent map read/write race that will crash the process with a fatal panic — Go's runtime kills the process on concurrent map access, and recover() cannot catch it.
Fix:
// After feed.mu.Unlock()
manager.realTimeMutex.Lock()
delete(manager.feedLastUpdate, feedID)
manager.realTimeMutex.Unlock()
manager.buildMergedRealtime()Important Issues (2 found)
1. MockAddAlert holds feedMapMutex while acquiring feed.mu
internal/gtfs/gtfs_manager_mock.go:147-159 — MockAddAlert acquires feedMapMutex.Lock() and then feed.mu.Lock() without releasing feedMapMutex first. This differs from the pattern in updateFeedRealtime (lines 279-289), which releases feedMapMutex before acquiring feed.mu. While both acquire in the same order (no deadlock today), holding feedMapMutex longer than necessary blocks all other feed operations. More importantly, the inconsistent nesting style is a maintenance hazard — a future developer might assume either pattern is canonical.
Fix: Release feedMapMutex before locking feed.mu, matching updateFeedRealtime:
m.feedMapMutex.Unlock() // Move this before feed.mu.Lock()
feed.mu.Lock()
feed.Alerts = append(feed.Alerts, alert)
feed.mu.Unlock()

2. MockAddTripUpdate and MockResetRealTimeData are now inconsistent with feedData model
internal/gtfs/gtfs_manager_mock.go:130-174 — These pre-existing methods weren't updated in this PR. MockAddTripUpdate writes directly to the merged realTimeTrips slice, bypassing feedData entirely. A subsequent buildMergedRealtime() call (e.g., from MockAddAlert or a real feed update) will silently overwrite the injected data. Similarly, MockResetRealTimeData clears merged views but not feedData, so the next merge repopulates them.
These aren't regressions you introduced (they were written against the old model), but this PR changes the invariant they relied on. Worth updating to go through feedData["_test"] for consistency, or at minimum adding a // TODO comment noting the inconsistency.
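One possible shape for routing the mock data through feedData, sketched here with a simplified Manager and string slices in place of the real entities. The "_test" key and the reduced buildMergedRealtime are hypothetical, not the package's actual API.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Minimal stand-ins for the real types.
type FeedData struct {
	mu              sync.RWMutex
	Trips           []string
	VehicleLastSeen map[string]time.Time
}

type Manager struct {
	feedMapMutex  sync.RWMutex
	feedData      map[string]*FeedData
	realTimeMutex sync.RWMutex
	realTimeTrips []string
}

// buildMergedRealtime is reduced to the bare merge for this sketch.
func (m *Manager) buildMergedRealtime() {
	m.feedMapMutex.RLock()
	var merged []string
	for _, f := range m.feedData {
		f.mu.RLock()
		merged = append(merged, f.Trips...)
		f.mu.RUnlock()
	}
	m.feedMapMutex.RUnlock()

	m.realTimeMutex.Lock()
	m.realTimeTrips = merged
	m.realTimeMutex.Unlock()
}

// MockAddTripUpdate routes injected data through a synthetic "_test" feed,
// so a later merge rebuilds it instead of silently discarding it.
func (m *Manager) MockAddTripUpdate(trip string) {
	m.feedMapMutex.Lock()
	feed := m.feedData["_test"]
	if feed == nil {
		feed = &FeedData{VehicleLastSeen: make(map[string]time.Time)}
		m.feedData["_test"] = feed
	}
	m.feedMapMutex.Unlock()

	feed.mu.Lock()
	feed.Trips = append(feed.Trips, trip)
	feed.mu.Unlock()

	m.buildMergedRealtime()
}

func main() {
	m := &Manager{feedData: make(map[string]*FeedData)}
	m.MockAddTripUpdate("trip-1")
	m.buildMergedRealtime() // a second merge keeps the injected data
	fmt.Println(len(m.realTimeTrips)) // 1
}
```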
Suggestions (3 found)
1. Misleading comment in buildMergedRealtime
internal/gtfs/realtime.go:618-619 — The comment says "the pointers in the map are never overwritten, only appended, so this is safe." Pointers are overwritten when a new FeedData is inserted for a previously unseen feed (line 285). The actual safety comes from feedMapMutex.RLock(). Consider rewording to: "Snapshot feed pointers under feedMapMutex.RLock; writers use feedMapMutex.Lock when inserting new entries."
2. FeedData exported fields with unexported mutex
internal/gtfs/gtfs_manager.go:103-111 — FeedData has exported fields (Trips, Vehicles, etc.) but an unexported mutex (mu). Code outside the package (or even test code within it) can read/write these fields without acquiring mu. Several tests already do this directly (e.g., multi_feed_test.go:442). Consider either unexporting the fields with accessor methods, or documenting the locking requirements clearly on the struct.
3. MockAddAlert doesn't initialize VehicleLastSeen
internal/gtfs/gtfs_manager_mock.go:153 — Creates &FeedData{} without initializing VehicleLastSeen, while all other creation sites use make(map[string]time.Time). This is safe today because cleanupExpiredVehicles checks for nil, but it's an inconsistency that could bite if that nil check is ever removed.
Strengths
- Clean structural refactoring: Collapsing four separate maps into a single FeedData struct per feed is a meaningful improvement in code organization and maintainability
- Lock granularity is correct: The per-feed mu allows truly parallel feed ingestion, while feedMapMutex protects the map structure and mergeMutex serializes rebuilds; this layered approach is well-designed
- Early unlock pattern: Capturing counts before feed.mu.Unlock() (lines 431-435) to keep the critical section short is a nice touch
- Improved error handling: The hadDataBefore distinction (line 291) to differentiate "first failure" from "retaining stale data" is a user-facing improvement in observability
- New test coverage: TestUpdateFeedRealtime_RetainsOldDataOnError and TestUpdateFeedRealtime_SuccessReplacesOld are valuable additions that test the data retention semantics
- No deadlock risk: Lock ordering is consistent across all production code paths (feedMapMutex -> feed.mu -> mergeMutex -> realTimeMutex)
Recommended Action
- Fix the feedLastUpdate race in clearFeedData; this is the only blocker
- Address the MockAddAlert lock nesting inconsistency
- Consider updating MockAddTripUpdate / MockResetRealTimeData for consistency
- After the race fix, this is ready to merge
@aaronbrethorst,
Fixes #479
Description
This PR refactors the GTFS Manager to use fine-grained, per-feed locking for realtime data ingestion.
Previously, a global reader/writer lock (realTimeMutex) was held for the entire duration of a feed update. In environments with multiple realtime feeds updating concurrently, this caused significant lock contention: a slow feed update would block all other feeds from writing and could block readers trying to access the current state.

This change introduces a localized FeedData struct with its own sync.RWMutex. This allows individual feeds to process and store their realtime data independently and concurrently. The global lock is now acquired only briefly during the final aggregation step (buildMergedRealtime), where pointers to the updated feed data are merged into the global system view.

Changes Proposed
1. Introduced FeedData Struct

Replaced the separate feedTrips, feedVehicles, feedAlerts, and feedVehicleLastSeen maps with a single per-feed FeedData struct.
This bundles a feed's realtime state into a cohesive structure, improving maintainability and encapsulation.
2. Added Per-Feed Locks
Added a sync.RWMutex inside FeedData. This enables true parallel processing of realtime feeds.
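As a sketch of the concurrency this enables, assuming a simplified FeedData with illustrative types: each goroutine takes only its own feed's lock, so updates to different feeds proceed in parallel.

```go
package main

import (
	"fmt"
	"sync"
)

// Illustrative per-feed state; only the lock and one slice are shown.
type FeedData struct {
	mu    sync.RWMutex
	Trips []string
}

// updateFeedsConcurrently appends one trip to every feed, each in its own
// goroutine; goroutines contend only on their own feed's lock.
func updateFeedsConcurrently(feeds map[string]*FeedData) {
	var wg sync.WaitGroup
	for name, f := range feeds {
		wg.Add(1)
		go func(name string, f *FeedData) {
			defer wg.Done()
			f.mu.Lock()
			f.Trips = append(f.Trips, name+"-trip")
			f.mu.Unlock()
		}(name, f)
	}
	wg.Wait()
}

func main() {
	feeds := map[string]*FeedData{"feed-a": {}, "feed-b": {}}
	updateFeedsConcurrently(feeds)
	fmt.Println(len(feeds["feed-a"].Trips) + len(feeds["feed-b"].Trips)) // 2
}
```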
3. Map Synchronization
Added feedMapMutex to protect the map[string]*FeedData. This guarantees thread-safe feed lifecycle management.
4. Optimized Merging (buildMergedRealtime)

a) Merge Serialization: Added mergeMutex to serialize concurrent merge operations.

b) Feed-Level Locking During Merge: Each feed's RWMutex is held only long enough to copy its data slices. This ensures minimal blocking at the feed level.
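A sketch of this layered merge, with string slices standing in for the real GTFS-realtime entities and the actual manager fields simplified: snapshot the feed pointers under feedMapMutex, copy each feed's slices under its own lock, then swap the merged view under realTimeMutex.

```go
package main

import (
	"fmt"
	"sync"
)

// Illustrative per-feed state.
type FeedData struct {
	mu    sync.RWMutex
	Trips []string
}

// Manager is reduced to the locking-relevant fields for this sketch.
type Manager struct {
	feedMapMutex  sync.RWMutex
	feedData      map[string]*FeedData
	mergeMutex    sync.Mutex
	realTimeMutex sync.RWMutex
	realTimeTrips []string
}

func (m *Manager) buildMergedRealtime() {
	m.mergeMutex.Lock() // serialize concurrent merges
	defer m.mergeMutex.Unlock()

	// Snapshot the feed pointers so the map lock is held only briefly.
	m.feedMapMutex.RLock()
	feeds := make([]*FeedData, 0, len(m.feedData))
	for _, f := range m.feedData {
		feeds = append(feeds, f)
	}
	m.feedMapMutex.RUnlock()

	var merged []string
	for _, f := range feeds {
		f.mu.RLock() // hold each feed's lock only long enough to copy
		merged = append(merged, f.Trips...)
		f.mu.RUnlock()
	}

	// Swap the global view in one short critical section.
	m.realTimeMutex.Lock()
	m.realTimeTrips = merged
	m.realTimeMutex.Unlock()
}

func main() {
	m := &Manager{feedData: map[string]*FeedData{
		"a": {Trips: []string{"t1"}},
		"b": {Trips: []string{"t2", "t3"}},
	}}
	m.buildMergedRealtime()
	fmt.Println(len(m.realTimeTrips)) // 3
}
```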
5. Updated Tests
Existing tests were updated to work with the new FeedData architecture.

6. Added Test Coverage
New test cases were added to validate:

- Old realtime data is retained when a feed update fails (TestUpdateFeedRealtime_RetainsOldDataOnError)
- A successful update replaces the previous data (TestUpdateFeedRealtime_SuccessReplacesOld)
- Empty payloads are handled correctly