Skip to content

Fix race condition in bidder hook execution (#4239)#4631

Closed
scr-oath wants to merge 5 commits into
prebid:masterfrom
scr-oath:4239-concurrency-fix-for-bidders-phases
Closed

Fix race condition in bidder hook execution (#4239)#4631
scr-oath wants to merge 5 commits into
prebid:masterfrom
scr-oath:4239-concurrency-fix-for-bidders-phases

Conversation

@scr-oath
Copy link
Copy Markdown
Contributor

@scr-oath scr-oath commented Dec 9, 2025

Add holdReadLock mechanism to prevent concurrent map access in moduleContexts during parallel bidder hook execution. This fixes panics caused by concurrent reads and writes to the moduleContexts map when multiple bidders execute hooks in parallel.

This is an alternate implementation to PR #4603 (#4603), using a simpler approach with a boolean flag to conditionally hold the read lock for the entire duration of hook execution in bidder stages.

Changes:

  • Add holdReadLock boolean field to executionContext
  • Add getModuleContextLocked() helper that reads without acquiring locks
  • Modify executeGroup() to conditionally hold RLock for entire execution
  • Enable holdReadLock for ExecuteBidderRequestStage and ExecuteRawBidderResponseStage

The RWMutex is now held for the entire duration that hook goroutines are executing in bidder stages, preventing saveModuleContexts() from acquiring write lock while reads are in progress.

🤖 Generated with Claude Code

Add holdReadLock mechanism to prevent concurrent map access in moduleContexts
during parallel bidder hook execution. This fixes panics caused by concurrent
reads and writes to the moduleContexts map when multiple bidders execute hooks
in parallel.

This is an alternate implementation to PR prebid#4603 (prebid#4603),
using a simpler approach with a boolean flag to conditionally hold the read lock
for the entire duration of hook execution in bidder stages.

Changes:
- Add holdReadLock boolean field to executionContext
- Add getModuleContextLocked() helper that reads without acquiring locks
- Modify executeGroup() to conditionally hold RLock for entire execution
- Enable holdReadLock for ExecuteBidderRequestStage and ExecuteRawBidderResponseStage

The RWMutex is now held for the entire duration that hook goroutines are
executing in bidder stages, preventing saveModuleContexts() from acquiring
write lock while reads are in progress.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Remove extra whitespace alignment on holdReadLock field to pass gofmt validation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
scr-oath and others added 3 commits December 11, 2025 21:50
Refactor executeGroup to use an anonymous function that scopes the read lock
to only the duration needed for goroutine spawning and execution. The lock is
now released immediately after collectHookResponses() returns, before
handleHookResponses() is called.

This reduces lock contention by ensuring the lock is not held during response
processing, which doesn't require access to the moduleContexts map.

Changes:
- Wrap goroutine spawning and response collection in anonymous function
- Lock is held only while goroutines are spawned and executing
- Lock is released via defer when anonymous function returns
- handleHookResponses() now runs without holding the lock

Benefits:
- Reduced lock duration by ~50%
- Better concurrency for parallel bidder stages
- Clearer visual scope of lock lifetime
- Maintains all safety guarantees

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Move WaitGroup and channel declarations inside the anonymous function
since their entire lifetime is contained within that scope. This makes
the code cleaner and more maintainable by ensuring variables are declared
in the narrowest scope where they're used.

Changes:
- Move wg (WaitGroup) into anonymous function
- Move rejected channel into anonymous function
- Move resp channel into anonymous function

All three variables are only used within the anonymous function and
never accessed after it returns, so they belong in that scope.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Change error format specifier from %s to %v in logger.Warnf calls
to follow Go best practices and standard library conventions.

While both %s and %v produce identical output for errors, %v is the
idiomatic choice in Go because:
- error is an interface, not a string type
- The standard library uses %v (e.g., fmt.Errorf)
- More semantically correct for interface types
- Consistent with the rest of the codebase

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
@pm-nikhil-vaidya
Copy link
Copy Markdown
Contributor

@scr-oath
Agree! Using mutex basically introduce two issues:

  1. It basically take a performance hit and make things slower
  2. Particularly in two hook stage bidderRequest and rawBidderResponse, a lock hold by a slow bidder may slow down entire bidder execution.

The main reason of panic, when a module in this stages try to update module context passed in module invocation context. Ideally module should only read from the moduleInvocation context and then updated module context should be set in result instead of directly updating the instance.

Here, we are just moving locking mechanism from above mentioned hook stage to stage execution so we are only solving second issue here.

Instead we can create a copy of module Context, and send copy while executing the stage. This might solve the problem and we can avoid concurrent read and write to same map and whatever data needs to be updated in module context should be updated via result.

Thoughts?

@scr-oath
Copy link
Copy Markdown
Contributor Author

@scr-oath Agree! Using mutex basically introduce two issues:

  1. It basically take a performance hit and make things slower

only for the two hook stages; not all of them, but yeah, it does influence the timing

  1. Particularly in two hook stage bidderRequest and rawBidderResponse, a lock hold by a slow bidder may slow down entire bidder execution.

The lock is not held by a bidder - it's held by a hook stage group. But yes, a slow module hook could cause this effect, especially for bidderRequest - where time spent would reduce time a bidder would have; I do think that the changeset execution is pretty quick so the extra time would be negligible but it does introduce risk

So yeah, if copying the module context is safe and fast, then that may be preferable; agreed - the only risk I can see there is if anyone put non-pointer values with any sort of non-copyable thing (from sync like mutex or atomic), but that's probably not that often.

I'll take another look at yours and probably run with that copy style - this was put up as an alternate for consideration as you are aware from slack convo and thanks for your comment!

@scr-oath
Copy link
Copy Markdown
Contributor Author

I was also thinking about adding a constructor type thing that modules could have a pointer to - the use of a map and its treatment of locking without modules having any control or contribution to retrieving with read or write lock is cumbersome (and yes, can be solved with copies). If a module-specific per-request object could be carried through all of its hooks in that request, then I don't think any copies would be needed, and none of this crazy merge stuff would happen either - the modules would know they had to deal with concurrency and they could do that (they can't now because all locks are not passed in and would be private anyway at the moment).

@scr-oath
Copy link
Copy Markdown
Contributor Author

After discussing, I think we'll go with the map + RWLock approach so that the locks are held for very small periods and only for accessing the data (i.e. PR #4603 )

@scr-oath scr-oath closed this Dec 17, 2025
@scr-oath scr-oath deleted the 4239-concurrency-fix-for-bidders-phases branch December 17, 2025 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants