Modules: Concurrency issue with module context fix #4603
pm-nikhil-vaidya wants to merge 16 commits into
Conversation
Add a holdReadLock mechanism to prevent concurrent map access in moduleContexts during parallel bidder hook execution. This fixes panics caused by concurrent reads and writes to the moduleContexts map when multiple bidders execute hooks in parallel. This is an alternate implementation to PR prebid#4603, using a simpler approach: a boolean flag that conditionally holds the read lock for the entire duration of hook execution in bidder stages.

Changes:
- Add a holdReadLock boolean field to executionContext
- Add a getModuleContextLocked() helper that reads without acquiring locks
- Modify executeGroup() to conditionally hold RLock for the entire execution
- Enable holdReadLock for ExecuteBidderRequestStage and ExecuteRawBidderResponseStage

The RWMutex is now held for the entire duration that hook goroutines are executing in bidder stages, preventing saveModuleContexts() from acquiring the write lock while reads are in progress.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Please see #4631 as an alternative approach, which uses the read-lock when calling the parallel bidder hook stages. NOTE: I wrote it with Claude Code and need to review it myself - it may not be exactly what I wanted - I may want it to have better control over the lock than it currently does. I just walked through it and it seems fine: each of the hookExecutor methods calls saveModuleContexts AFTER calling executeStage, so holding a single lock for the duration of executeGroup is fine. I might have grabbed and held the lock at each hook invocation, but conceptually you can't return until all hooks are called, so it's perfectly fine to hold only one read lock. 👍
CORRECTION from Gemini: implementing a lock around the map is apparently the best/fastest option.
I'll have to take another look at this when I'm fresh. Could be ready to go 🤞
@postindustria-code can you please review?
@pm-nikhil-vaidya - Happy New Year - can you take a look at the diff and see if you are comfortable applying it?
@scr-oath I have implemented the solution you suggested, as it removes the intermediate copy while copying the module context and removes the extra lock mechanism while reading. I have also added a check in the Insert() function in case the map is nil:

```go
func (mc *ModuleContext) Insert(seq iter.Seq2[string, any]) {
	if mc == nil {
		return
	}
	mc.Lock()
	defer mc.Unlock()
	if mc.data == nil {
		mc.data = make(map[string]any)
	}
	maps.Insert(mc.data, seq)
}
```

You can review and let me know if any more changes are required.
I understand the motivation behind introducing All() for performance reasons. That makes sense, especially given how hot this path can be. However, I see a possible deadlock issue with its usage. ModuleContext.All() holds an RLock for the entire duration of iteration and executes caller-provided code while that lock is held. This creates a structural deadlock hazard.
For instance, this pattern is very natural and currently deadlocks: GetAll() seems to be safer to use, although it is a bit slower and uses copies. @pm-nikhil-vaidya @scr-oath what are your thoughts on this?
Quick follow-up / clarification: The previous comment was also from me (@legendko) — just posted from my personal GitHub account by mistake. Apologies for any confusion; I wanted to continue the discussion here from this account. While we’re on the topic, I’d like to raise one additional concern, since this is a breaking API change for module authors:
@bsardo @scr-oath @postindustria-code I’ve rolled back to the previous implementation. I’ve also retained the approach suggested by @scr-oath for cases where a user prefers performance over the potential risk of a deadlock.
```go
mc.Lock()
defer mc.Unlock()
if mc.data == nil {
	mc.data = make(map[string]any)
```

suggestion: Make with the right size for the copy if nil.

```diff
-	mc.data = make(map[string]any)
+	mc.data = make(map[string]any, len(data))
```
```go
var emptyMapIter iter.Seq2[string, any] = func(yield func(string, any) bool) {}

// All returns an iterator over key-value pairs from the module context with read lock held
func (mc *ModuleContext) All() iter.Seq2[string, any] {
```

suggestion: remove this

I thought we had decided against exposing All due to its risk of deadlock if held in reverse order with another lock.
```go
}

// Insert adds the key-value pairs from seq to the module context with write lock held
func (mc *ModuleContext) Insert(seq iter.Seq2[string, any]) {
```

suggestion: remove this

I thought we decided against exposing Insert also for deadlock concerns.
@pm-nikhil-vaidya we discussed @scr-oath's comments today. Let's remove All() and Insert().
scr-oath left a comment
All three previous change requests have been addressed:
- ✅ make(map[string]any, len(data)) — properly sized allocation in SetAll when data is nil
- ✅ All() iterator removed — eliminates deadlock risk from holding the lock across iteration
- ✅ Insert() removed — same concern addressed
Concurrency analysis of put(): The copy-based approach (existingCtx.SetAll(mCtx.GetAll())) is safe. Go evaluates mCtx.GetAll() first (acquires/releases mCtx.RLock), then passes the returned plain map to existingCtx.SetAll() (acquires/releases existingCtx.Lock). No two ModuleContext locks are ever held simultaneously. No reverse-order deadlock is possible since ModuleContext methods never reference back to moduleContexts.
nitpick (non-blocking): SetAll comment says "replaces all data" but maps.Copy merges into the existing map (existing keys not in data are preserved). The merge behavior is correct for this use case (accumulating context across stages), but the comment could say "merges data into the context" to be precise.
LGTM — the RWMutex + copy-based approach is sound, modules are all updated correctly, and tests are comprehensive.
bf0b766
scr-oath left a comment
Re-reviewing after my previous approval was dismissed. The only delta since my last review is the v3→v4 import path bump from the master merge — the logic is unchanged.
One blocking issue found: data race in Get/GetAll where mc.data is read outside the lock.
Previous change-requests (sized allocation in SetAll, removed All() iterator, removed Insert()) are all still correctly addressed. The concurrency design is sound aside from the pre-lock nil check.
```go
// Get retrieves a value from the module context with read lock
func (mc *ModuleContext) Get(key string) (any, bool) {
	if mc == nil || mc.data == nil {
```
issue (blocking): mc.data == nil read outside the lock is a data race.
Set (line 72) writes mc.data under a write lock. Reading mc.data here without holding any lock is a concurrent read/write — go test -race will flag this. Same issue in GetAll at line 79.
Fix: move the data == nil check inside the lock, or drop it entirely since NewModuleContext() always initializes data:
func (mc *ModuleContext) Get(key string) (any, bool) {
if mc == nil {
return nil, false
}
mc.RLock()
defer mc.RUnlock()
if mc.data == nil {
return nil, false
}
val, ok := mc.data[key]
return val, ok
}Apply the same pattern to GetAll.
```go
	return result
}

// SetAll replaces all data in the context
```
issue (non-blocking): Comment says "replaces all data" but maps.Copy merges — existing keys not present in data are preserved.
Since the only call site (moduleContexts.put) wants merge semantics, the implementation is correct. Just fix the comment to say "merges data into the context" to avoid misleading module authors.
```go
mc.ctxs[moduleName] = newCtx

// Add new data to existing context
existingCtx.SetAll(mCtx.GetAll())
```
suggestion (non-blocking): Extract mCtx.GetAll() before acquiring moduleContexts.Lock to avoid nested locking.
Currently safe because GetAll copies-then-releases and SetAll acquires its own lock — no two ModuleContext locks are ever held simultaneously. But holding moduleContexts.Lock while calling into ModuleContext methods is a fragile invariant. Moving the snapshot outside the lock makes it explicit:
```go
func (mc *moduleContexts) put(moduleName string, mCtx *hookstage.ModuleContext) {
	if mc == nil {
		return
	}
	newData := mCtx.GetAll()
	mc.Lock()
	defer mc.Unlock()
	existingCtx, ok := mc.ctxs[moduleName]
	if !ok {
		mc.ctxs[moduleName] = mCtx
		return
	}
	existingCtx.SetAll(newData)
}
```

This will partially address the nested locking issue. Since SetAll also acquires a lock, nested locking won’t be completely eliminated. Therefore, we can retain the existing implementation.
```go
	}},
}
return ret, nil
}
```

suggestion (non-blocking): The "get async request from module context" + error-analytics pattern is duplicated across HandleProcessedAuctionHook and HandleAuctionResponseHook (4 blocks total). Consider extracting a helper to cut ~40 lines of duplication:

```go
func getAsyncRequest(mc *hookstage.ModuleContext, prefix string) (*AsyncRequest, *hookanalytics.Analytics) { ... }
```
This change is a refactoring of the RTD module; we can take it in a separate PR. This PR mainly targets the concurrency issue.
CI build failure — two lines still use the old ModuleContext map indexing. The same pattern was already fixed on lines 364, 547, and 801. Apply the same migration to these two:

```go
// Lines 262 and 692 — change:
assert.NotNil(t, entrypointResult.ModuleContext[asyncRequestKey])
// to:
asyncRequest, _ := entrypointResult.ModuleContext.Get(asyncRequestKey)
assert.NotNil(t, asyncRequest)
```
scr-oath left a comment
Re-reviewing after the fix commits.
The prior blocking race is fixed. Get and GetAll in hooks/hookstage/invocation.go now check mc.data == nil inside the read lock, ordered behind Set/SetAll's lazy init under the write lock. The SetAll doc comment was also corrected from "replaces" to "merges". The two CI-breaking ModuleContext[asyncRequestKey] map-indexes in modules/scope3/rtd/module_test.go are now .Get(...). The new wurfl device-detection migration is a clean, mechanical port that mirrors the pattern used by the other modules in this PR.
Verification:
- Race fix — invocation.go lines 58–65 (Get) and 85–92 (GetAll): RLock taken, then nil check, then map read. Symmetric on the write side at lines 72–78 (Set) and 100–106 (SetAll). No window remains.
- put() in hooks/hookexecution/context.go:63 — the copy-based merge is still deadlock-free: mCtx and existingCtx are distinct *ModuleContext instances, and ModuleContext methods never call back into moduleContexts. The lock graph is acyclic.
- CI fixes — both leftover map-index sites in module_test.go migrated; the assertion semantics (assert.NotNil on interface nil) are preserved because Get returns untyped nil on a miss.
LGTM. One non-blocking nit on the wurfl error message left as a line comment.
Reviewer notes:
- The two carryover non-blocking suggestions from my prior review (extract the snapshot before put's outer lock; a helper for the duplicated scope3 Get + type-assert pattern) remain unaddressed. Author's call — neither blocks merge.
```diff
-	rawHeaders, ok := invocationCtx.ModuleContext[wurflHeaderCtxKey].(map[string]string)
+	headers, ok := invocationCtx.ModuleContext.Get(wurflHeaderCtxKey)
 	if !ok {
 		return result, hookexecution.NewFailure("invalid module context type")
```
nitpick (non-blocking): error message is misleading on this branch.
Get returning ok=false means the key is missing, not that the value's type is wrong. The next branch (line 110) handles the actual type-assertion failure with the same string, so the two failure modes are no longer distinguishable.
Consider:
```go
headers, ok := invocationCtx.ModuleContext.Get(wurflHeaderCtxKey)
if !ok {
	return result, hookexecution.NewFailure("missing module context entry")
}
rawHeaders, ok := headers.(map[string]string)
if !ok {
	return result, hookexecution.NewFailure("invalid module context type")
}
```
#4239