Fix race condition in bidder hook execution (#4239) #4631
Add holdReadLock mechanism to prevent concurrent map access in moduleContexts during parallel bidder hook execution. This fixes panics caused by concurrent reads and writes to the moduleContexts map when multiple bidders execute hooks in parallel.

This is an alternate implementation to PR prebid#4603, using a simpler approach with a boolean flag to conditionally hold the read lock for the entire duration of hook execution in bidder stages.

Changes:
- Add holdReadLock boolean field to executionContext
- Add getModuleContextLocked() helper that reads without acquiring locks
- Modify executeGroup() to conditionally hold RLock for entire execution
- Enable holdReadLock for ExecuteBidderRequestStage and ExecuteRawBidderResponseStage

The RWMutex is now held for the entire duration that hook goroutines are executing in bidder stages, preventing saveModuleContexts() from acquiring the write lock while reads are in progress.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
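The commit message above can be sketched in miniature. This is a simplified illustration of the holdReadLock idea, not the actual prebid-server code: the `executionContext` and `moduleContexts` names mirror the description, but the types and the hook bodies here are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified sketch: when holdReadLock is set, the read lock is held for the
// whole group execution, so a concurrent writer (saveModuleContexts in the PR)
// cannot interleave with the per-bidder goroutines reading the map.
type executionContext struct {
	sync.RWMutex
	holdReadLock   bool
	moduleContexts map[string]string
}

func (ctx *executionContext) executeGroup(bidders []string) {
	if ctx.holdReadLock {
		ctx.RLock()
		defer ctx.RUnlock() // held until every hook goroutine has finished
	}
	var wg sync.WaitGroup
	for _, b := range bidders {
		wg.Add(1)
		go func(bidder string) {
			defer wg.Done()
			_ = ctx.moduleContexts[bidder] // concurrent reads are safe under RLock
		}(b)
	}
	wg.Wait()
}

func main() {
	ctx := &executionContext{
		holdReadLock:   true,
		moduleContexts: map[string]string{"bidderA": "ctxA", "bidderB": "ctxB"},
	}
	ctx.executeGroup([]string{"bidderA", "bidderB"})
	fmt.Println("group executed without concurrent map access")
}
```

The flag makes the locking conditional, so non-bidder stages keep their original (shorter) lock scope.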
Remove extra whitespace alignment on holdReadLock field to pass gofmt validation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Refactor executeGroup to use an anonymous function that scopes the read lock to only the duration needed for goroutine spawning and execution. The lock is now released immediately after collectHookResponses() returns, before handleHookResponses() is called. This reduces lock contention by ensuring the lock is not held during response processing, which doesn't require access to the moduleContexts map.

Changes:
- Wrap goroutine spawning and response collection in anonymous function
- Lock is held only while goroutines are spawned and executing
- Lock is released via defer when anonymous function returns
- handleHookResponses() now runs without holding the lock

Benefits:
- Reduced lock duration by ~50%
- Better concurrency for parallel bidder stages
- Clearer visual scope of lock lifetime
- Maintains all safety guarantees

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Move WaitGroup and channel declarations inside the anonymous function since their entire lifetime is contained within that scope. This makes the code cleaner and more maintainable by ensuring variables are declared in the narrowest scope where they're used.

Changes:
- Move wg (WaitGroup) into anonymous function
- Move rejected channel into anonymous function
- Move resp channel into anonymous function

All three variables are only used within the anonymous function and never accessed after it returns, so they belong in that scope.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Change error format specifier from %s to %v in logger.Warnf calls to follow Go best practices and standard library conventions. While both %s and %v produce identical output for errors, %v is the idiomatic choice in Go because:

- error is an interface, not a string type
- The standard library uses %v (e.g., fmt.Errorf)
- More semantically correct for interface types
- Consistent with the rest of the codebase

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
@scr-oath
The main cause of the panic is a module in these stages trying to update the module context passed in the module invocation context. Ideally a module should only read from the moduleInvocation context, and the updated module context should be set in the result instead of the module directly updating the instance. Here, we are just moving the locking mechanism from the above-mentioned hook stage to stage execution, so we are only solving the second issue. Instead, we could create a copy of the module context and send the copy while executing the stage. This might solve the problem: we avoid concurrent reads and writes to the same map, and whatever data needs to be updated in the module context gets updated via the result. Thoughts?
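The copy-on-invoke idea suggested above could look something like this. A hypothetical sketch: `ModuleContext` and `copyModuleContext` are illustrative names, and a shallow copy is assumed to be sufficient.

```go
package main

import "fmt"

// ModuleContext stands in for the per-module context map handed to hooks.
type ModuleContext map[string]any

// copyModuleContext hands each stage its own shallow snapshot, so a hook
// mutating its copy cannot race with readers of the shared map; updates
// flow back through the hook result instead.
func copyModuleContext(src ModuleContext) ModuleContext {
	dst := make(ModuleContext, len(src))
	for k, v := range src {
		dst[k] = v
	}
	return dst
}

func main() {
	shared := ModuleContext{"count": 1}
	snapshot := copyModuleContext(shared)
	snapshot["count"] = 2 // a hook mutating its copy...
	fmt.Println(shared["count"], snapshot["count"]) // ...leaves the shared map untouched
}
```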
only for the two hook stages; not all of them, but yeah, it does influence the timing
The lock is not held by a bidder - it's held by a hook stage group. But yes, a slow module hook could cause this effect, especially for bidderRequest, where time spent would reduce the time a bidder would have. I do think the changeset execution is pretty quick, so the extra time would be negligible, but it does introduce risk.

So yeah, if copying the module context is safe and fast, then that may be preferable; agreed. The only risk I can see there is if anyone put non-pointer values with any sort of non-copyable thing (from sync, like a mutex or atomic), but that's probably not that common. I'll take another look at yours and probably run with that copy style - this was put up as an alternate for consideration, as you are aware from the Slack convo. Thanks for your comment!
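There is a second, subtler risk with shallow copies worth noting alongside the non-copyable-value concern above: a shallow map copy duplicates the map entries but shares anything reached through pointers, so a hook mutating pointed-to state still touches shared data. Illustrative types only:

```go
package main

import "fmt"

// state stands in for any pointer-valued entry a module might store.
type state struct{ hits int }

func main() {
	original := map[string]*state{"m": {hits: 1}}

	// Shallow copy: the map headers differ, but the entries are pointers.
	copied := make(map[string]*state, len(original))
	for k, v := range original {
		copied[k] = v // copies the pointer, not the struct behind it
	}

	copied["m"].hits = 99
	fmt.Println(original["m"].hits) // 99: both maps see the same struct
}
```

So the copy approach removes the concurrent map access, but values that are pointers (or contain sync.Mutex / atomic fields, as noted above) still need their own synchronization or a deep copy.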
I was also thinking about adding a constructor-type thing that modules could have a pointer to - the use of a map, and its treatment of locking without modules having any control over or contribution to retrieving with a read or write lock, is cumbersome (and yes, can be solved with copies). If a module-specific per-request object could be carried through all of a module's hooks in that request, then I don't think any copies would be needed, and none of this crazy merge stuff would happen either - the modules would know they had to deal with concurrency and could do that themselves (they can't now, because the locks are not passed in and would be private anyway at the moment).
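The per-request object idea above might be sketched like this. Everything here is hypothetical - the interface names and hook signature are invented for illustration, not part of the actual module API.

```go
package main

import "fmt"

// ModuleState is whatever per-request object a module constructs.
type ModuleState interface{}

// Module is a hypothetical contract: the framework asks the module for one
// state object per request, then threads that same object through every one
// of the module's hooks for that request. The module owns its own
// synchronization if its hooks run concurrently across bidders.
type Module interface {
	NewRequestState() ModuleState
	HandleBidderRequest(state ModuleState, bidder string)
}

type countingModule struct{}

type countingState struct{ seen []string }

func (countingModule) NewRequestState() ModuleState { return &countingState{} }

func (countingModule) HandleBidderRequest(state ModuleState, bidder string) {
	s := state.(*countingState)
	s.seen = append(s.seen, bidder) // no framework lock: state is per-request, per-module
}

func main() {
	var m Module = countingModule{}
	st := m.NewRequestState()
	m.HandleBidderRequest(st, "bidderA")
	m.HandleBidderRequest(st, "bidderB")
	fmt.Println(len(st.(*countingState).seen))
}
```

With this shape there is no shared moduleContexts map to lock or copy, and no merge step: each module's state lives in one object it constructed itself.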
After discussing, I think we'll go with the map + RWLock approach so that the locks are held for very small periods and only for accessing the data (i.e. PR #4603).
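The approach the thread settles on - take the lock only for the duration of each map access - would look roughly like the following. This is a sketch of the pattern, not the actual #4603 code; the type and method names are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// moduleContexts wraps the shared map in accessors that hold the lock only
// for the individual read or write, instead of across hook execution.
type moduleContexts struct {
	mu   sync.RWMutex
	data map[string]any
}

func (m *moduleContexts) get(module string) (any, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	v, ok := m.data[module]
	return v, ok
}

func (m *moduleContexts) put(module string, ctx any) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[module] = ctx
}

func main() {
	mc := &moduleContexts{data: map[string]any{}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			mc.put("mod", n) // writers and...
			mc.get("mod")    // ...readers interleave safely, lock held per access
		}(i)
	}
	wg.Wait()
	fmt.Println("no data race: locks scoped to each access")
}
```

The trade-off versus holding the lock across the whole stage is that hooks may observe interleaved updates, but no lock is held while slow module code runs.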