Skip to content

Modules: Concurrency issue with module context fix#4603

Open
pm-nikhil-vaidya wants to merge 16 commits into
prebid:masterfrom
pm-nikhil-vaidya:module-context-concurrency
Open

Modules: Concurrency issue with module context fix#4603
pm-nikhil-vaidya wants to merge 16 commits into
prebid:masterfrom
pm-nikhil-vaidya:module-context-concurrency

Conversation

@pm-nikhil-vaidya
Copy link
Copy Markdown
Contributor

@pm-nikhil-vaidya pm-nikhil-vaidya commented Nov 11, 2025

Comment thread hooks/hookstage/invocation.go
scr-oath added a commit to scr-oath/prebid-server that referenced this pull request Dec 9, 2025
Add holdReadLock mechanism to prevent concurrent map access in moduleContexts
during parallel bidder hook execution. This fixes panics caused by concurrent
reads and writes to the moduleContexts map when multiple bidders execute hooks
in parallel.

This is an alternate implementation to PR prebid#4603 (prebid#4603),
using a simpler approach with a boolean flag to conditionally hold the read lock
for the entire duration of hook execution in bidder stages.

Changes:
- Add holdReadLock boolean field to executionContext
- Add getModuleContextLocked() helper that reads without acquiring locks
- Modify executeGroup() to conditionally hold RLock for entire execution
- Enable holdReadLock for ExecuteBidderRequestStage and ExecuteRawBidderResponseStage

The RWMutex is now held for the entire duration that hook goroutines are
executing in bidder stages, preventing saveModuleContexts() from acquiring
write lock while reads are in progress.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
@scr-oath
Copy link
Copy Markdown
Contributor

scr-oath commented Dec 9, 2025

Please see #4631 as an alternative approach, which uses the read-lock when calling the parallel bidder hook stages.

NOTE: I wrote it with Claude code and need to review myself - it may not be exactly what I wanted - I may want it to have better control over the lock than it currently does.

I just walked through it - it seems fine - each of the hookExecutor methods seems to call saveModuleContexts AFTER calling the executeStage and therefore holding a single lock for the duration of the executeGroup is fine - I might have grabbed and held at each hook invocation, but conceptually you can't return until all hooks are called so it's perfectly fine to only have one read lock held, so 👍

@scr-oath
Copy link
Copy Markdown
Contributor

scr-oath commented Dec 17, 2025

Ok - so taking another look - I think that *sync.Map would be a better option - I believe that models Java's ConcurrentHashMap where it tries to do things without locking if possible but resorts to locking in sticky situations. You're basically inventing sync.Map without that faster path here by forcing users to use Get/Set where they could be using Load/Store to accomplish the same effect and without reinventing any wheels.

CORRECTION from gemini: Implementing something with lock around map is probably best/fastest apparently.

Go 1.24+ Update: The standard Go map implementation was upgraded to Swiss Tables (a high-performance hash table design by Google). This makes the standard map + RWMutex significantly faster for lookups and inserts, further widening the gap between it and sync.Map for general use.

I'll have to take another look at this when I'm fresh. Could be ready to go 🤞

@scr-oath scr-oath self-requested a review December 17, 2025 15:51
Comment thread hooks/hookexecution/context.go
@bsardo
Copy link
Copy Markdown
Collaborator

bsardo commented Jan 6, 2026

@postindustria-code can you please review?

@scr-oath
Copy link
Copy Markdown
Contributor

scr-oath commented Jan 6, 2026

@pm-nikhil-vaidya - Happy New Year - can you take a look at the diff and see if you are comfortable applying it?
#4603 (comment)

@pm-nikhil-vaidya
Copy link
Copy Markdown
Contributor Author

@scr-oath I have implemented the solution you suggested as it removes intermediate copy while copying the module context and removes extra lock mechanism while reading. I have just added a check in the Insert() function in case of map map is empty.

func (mc *ModuleContext) Insert(seq iter.Seq2[string, any]) {
	if mc == nil {
		return
	}

	mc.Lock()
	defer mc.Unlock()

	if mc.data == nil {
		mc.data = make(map[string]any)
	}

	maps.Insert(mc.data, seq)
}

You can review and let me know if any more changes required

@legendko
Copy link
Copy Markdown
Contributor

legendko commented Jan 8, 2026

I understand the motivation behind introducing All() for performance reasons. That makes sense, especially given how hot this path can be. However I see a possible deadlock issue with its usage.

ModuleContext.All() holds an RLock for the entire duration of iteration and executes caller-provided code while that lock is held. This creates a structural deadlock hazard:

  • If code inside the iteration attempts to call Set(), Insert(), or any write operation → guaranteed deadlock
  • Multiple concurrent bidders might trigger this without any module author error
  • Plus if iteration is slow or blocks → writer starvation for all concurrent bidders

F.i. this pattern is very natural and currently deadlocks:

for k, v := range mc.All() {
    if shouldUpdate(k, v) {
        mc.Set(k, newValue) // deadlocks: tries to upgrade RLock → Lock
    }
}

GetAll() seems to be safer to use although being a bit slower and using copies

@pm-nikhil-vaidya @scr-oath what are you thoughts on this?

@postindustria-code
Copy link
Copy Markdown
Contributor

Quick follow-up / clarification:

The previous comment was also from me (@legendko) — just posted from my personal GitHub account by mistake. Apologies for any confusion; I wanted to continue the discussion here from this account.

While we’re on the topic, I’d like to raise one additional concern, since this is a breaking API change for module authors:

  • Should this be explicitly treated as a breaking change (e.g. version bump / release notes)?
  • A migration guide for external modules? This exact issue arised from a custom module

@pm-nikhil-vaidya
Copy link
Copy Markdown
Contributor Author

@bsardo @scr-oath @postindustria-code

I’ve rolled back to the previous implementation. I’ve also retained the approach suggested by @scr-oath for cases where a user prefers performance over the potential risk of a deadlock.

Comment thread hooks/hookexecution/context.go
Comment thread hooks/hookstage/invocation.go Outdated
mc.Lock()
defer mc.Unlock()
if mc.data == nil {
mc.data = make(map[string]any)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Make with the right size for copy if nil.

Suggested change
mc.data = make(map[string]any)
mc.data = make(map[string]any, len(data))

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread hooks/hookstage/invocation.go Outdated
var emptyMapIter iter.Seq2[string, any] = func(yield func(string, any) bool) {}

// All returns an iterator over key-value pairs from the module context with read lock held
func (mc *ModuleContext) All() iter.Seq2[string, any] {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: remove this

I thought we had decided against exposing All due to its risk of deadlock if held in reverse order with another

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread hooks/hookstage/invocation.go Outdated
}

// Insert adds the key-value pairs from seq to the module context with write lock held
func (mc *ModuleContext) Insert(seq iter.Seq2[string, any]) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: remove this

I thought we decided against exporing Insert also for deadlock concerns.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@bsardo
Copy link
Copy Markdown
Collaborator

bsardo commented Feb 3, 2026

@pm-nikhil-vaidya we discussed @scr-oath's comments today. Let's remove Insert and All as he suggested and then we should be good.

scr-oath
scr-oath previously approved these changes Feb 10, 2026
Copy link
Copy Markdown
Contributor

@scr-oath scr-oath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All three previous change requests have been addressed:

  1. make(map[string]any, len(data)) — properly sized allocation in SetAll when data is nil
  2. All() iterator removed — eliminates deadlock risk from holding lock across iteration
  3. Insert() removed — same concern addressed

Concurrency analysis of put(): The copy-based approach (existingCtx.SetAll(mCtx.GetAll())) is safe. Go evaluates mCtx.GetAll() first (acquires/releases mCtx.RLock), then passes the returned plain map to existingCtx.SetAll() (acquires/releases existingCtx.Lock). No two ModuleContext locks are ever held simultaneously. No reverse-order deadlock is possible since ModuleContext methods never reference back to moduleContexts.

nitpick (non-blocking): SetAll comment says "replaces all data" but maps.Copy merges into the existing map (existing keys not in data are preserved). The merge behavior is correct for this use case (accumulating context across stages), but the comment could say "merges data into the context" to be precise.

LGTM — the RWMutex + copy-based approach is sound, modules are all updated correctly, and tests are comprehensive.

Copy link
Copy Markdown
Contributor

@scr-oath scr-oath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewing after my previous approval was dismissed. The only delta since my last review is the v3→v4 import path bump from the master merge — the logic is unchanged.

One blocking issue found: data race in Get/GetAll where mc.data is read outside the lock.

Previous change-requests (sized allocation in SetAll, removed All() iterator, removed Insert()) are all still correctly addressed. The concurrency design is sound aside from the pre-lock nil check.

Comment thread hooks/hookstage/invocation.go Outdated

// Get retrieves a value from the module context with read lock
func (mc *ModuleContext) Get(key string) (any, bool) {
if mc == nil || mc.data == nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): mc.data == nil read outside the lock is a data race.

Set (line 72) writes mc.data under a write lock. Reading mc.data here without holding any lock is a concurrent read/write — go test -race will flag this. Same issue in GetAll at line 79.

Fix: move the data == nil check inside the lock, or drop it entirely since NewModuleContext() always initializes data:

func (mc *ModuleContext) Get(key string) (any, bool) {
	if mc == nil {
		return nil, false
	}
	mc.RLock()
	defer mc.RUnlock()
	if mc.data == nil {
		return nil, false
	}
	val, ok := mc.data[key]
	return val, ok
}

Apply the same pattern to GetAll.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread hooks/hookstage/invocation.go Outdated
return result
}

// SetAll replaces all data in the context
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (non-blocking): Comment says "replaces all data" but maps.Copy merges — existing keys not present in data are preserved.

Since the only call site (moduleContexts.put) wants merge semantics, the implementation is correct. Just fix the comment to say "merges data into the context" to avoid misleading module authors.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

mc.ctxs[moduleName] = newCtx

// Add new data to existing context
existingCtx.SetAll(mCtx.GetAll())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (non-blocking): Extract mCtx.GetAll() before acquiring moduleContexts.Lock to avoid nested locking.

Currently safe because GetAll copies-then-releases and SetAll acquires its own lock — no two ModuleContext locks are ever held simultaneously. But holding moduleContexts.Lock while calling into ModuleContext methods is a fragile invariant. Moving the snapshot outside the lock makes it explicit:

func (mc *moduleContexts) put(moduleName string, mCtx *hookstage.ModuleContext) {
	if mc == nil {
		return
	}
	newData := mCtx.GetAll()

	mc.Lock()
	defer mc.Unlock()

	existingCtx, ok := mc.ctxs[moduleName]
	if !ok {
		mc.ctxs[moduleName] = mCtx
		return
	}
	existingCtx.SetAll(newData)
}

Copy link
Copy Markdown
Contributor Author

@pm-nikhil-vaidya pm-nikhil-vaidya Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will partially address the nested locking issue. Since setAll also acquires a lock, nested locking won’t be completely eliminated. Therefore, we can retain the existing implementation.

}},
}
return ret, nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (non-blocking): The "get async request from module context" + error-analytics pattern is duplicated across HandleProcessedAuctionHook and HandleAuctionResponseHook (4 blocks total). Consider extracting a helper to cut ~40 lines of duplication:

func getAsyncRequest(mc *hookstage.ModuleContext, prefix string) (*AsyncRequest, *hookanalytics.Analytics) { ... }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is refactoring of the RTD module, we can take this change in separate PR. This PR mainly target the concurrency issue.

@scr-oath
Copy link
Copy Markdown
Contributor

CI build failure — two lines in modules/scope3/rtd/module_test.go still use map-index syntax on *ModuleContext:

module_test.go:262: cannot index entrypointResult.ModuleContext (variable of type *hookstage.ModuleContext)
module_test.go:692: cannot index entrypointResult.ModuleContext (variable of type *hookstage.ModuleContext)

The same pattern was already fixed on lines 364, 547, and 801. Apply the same migration to these two:

// Lines 262 and 692 — change:
assert.NotNil(t, entrypointResult.ModuleContext[asyncRequestKey])
// to:
asyncRequest, _ := entrypointResult.ModuleContext.Get(asyncRequestKey)
assert.NotNil(t, asyncRequest)

@bsardo bsardo assigned postindustria-code and unassigned bsardo Apr 16, 2026
Copy link
Copy Markdown
Contributor

@scr-oath scr-oath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewing after the fix commits.

The prior blocking race is fixed. Get and GetAll in hooks/hookstage/invocation.go now check mc.data == nil inside the read lock, ordered behind Set/SetAll's lazy init under the write lock. The SetAll doc comment was also corrected from "replaces" to "merges". The two CI-breaking ModuleContext[asyncRequestKey] map-indexes in modules/scope3/rtd/module_test.go are now .Get(...). The new wurfl device-detection migration is a clean, mechanical port that mirrors the pattern used by the other modules in this PR.

Verification:

  1. Race fix — invocation.go lines 58–65 (Get) and 85–92 (GetAll): RLock taken, then nil check, then map read. Symmetric on the write side at lines 72–78 (Set) and 100–106 (SetAll). No window remains.
  2. put() in hooks/hookexecution/context.go:63 — copy-based merge is still deadlock-free: mCtx and existingCtx are distinct *ModuleContext instances, and ModuleContext methods never call back into moduleContexts. Lock graph is acyclic.
  3. CI fixes — both leftover map-index sites in module_test.go migrated; the assertion semantics (assert.NotNil on interface nil) are preserved because Get returns untyped-nil on miss.

LGTM. One non-blocking nit on the wurfl error message left as a line comment.

Reviewer notes:

  • The two carryover non-blocking suggestions from my prior review (extract snapshot before put's outer lock; helper for the duplicated scope3 Get+type-assert pattern) remain unaddressed. Author's call — neither blocks merge.

rawHeaders, ok := invocationCtx.ModuleContext[wurflHeaderCtxKey].(map[string]string)
headers, ok := invocationCtx.ModuleContext.Get(wurflHeaderCtxKey)
if !ok {
return result, hookexecution.NewFailure("invalid module context type")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick (non-blocking): error message is misleading on this branch.

Get returning ok=false means the key is missing, not that the value's type is wrong. The next branch (line 110) handles the actual type-assertion failure with the same string, so the two failure modes are no longer distinguishable.

Consider:

headers, ok := invocationCtx.ModuleContext.Get(wurflHeaderCtxKey)
if !ok {
    return result, hookexecution.NewFailure("missing module context entry")
}
rawHeaders, ok := headers.(map[string]string)
if !ok {
    return result, hookexecution.NewFailure("invalid module context type")
}

@scr-oath scr-oath requested a review from bsardo April 28, 2026 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants