[Bug] eager backend: guard groups/tasks maps with a single mutex#54
Merged
Merged
Conversation
The eager backend's groups and tasks maps were read without holding any lock (only updateState acquired stateMutex while writing tasks), so concurrent callers could trip Go's "concurrent map read and map write" fatal. Adds a single sync.Mutex covering both maps and introduces getStateLocked so the group methods can re-read tasks without re-acquiring the lock. Same root cause as upstream RichardKnop/machinery#737 (PR #740). A coarse single mutex is intentional: the eager backend is for tests and local development, contention is low, and one lock keeps the "single caller at a time" invariant easy to reason about. Adds TestEagerBackend_ConcurrentAccess which exercises every public surface — without the fix, this test deterministically trips the race detector.
b05d611 to
0a6c995
Compare
There was a problem hiding this comment.
Pull request overview
This PR makes the v1 eager in-memory backend safe for concurrent use by serializing access to its task and group metadata maps.
Changes:
- Replaces the previous task-only mutex with a single mutex guarding both
groupsandtasks. - Avoids recursive locking by adding
getStateLockedfor callers that already hold the mutex. - Adds a concurrent access regression test covering the eager backend’s map read/write paths.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
v1/backends/eager/eager.go |
Adds mutex protection around eager backend group/task map access. |
v1/backends/eager/eager_test.go |
Adds a concurrency regression test for eager backend public operations. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
markt-sublime
approved these changes
May 19, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
groupsandtasksmaps were read without holding any lock —updateStatewas the only place that acquiredstateMutex, and only while writingtasks. Concurrent callers can trip Go's "concurrent map read and map write" fatal.stateMutexwith a singlesync.Mutex(mu) that guards both maps across every public method, and introducesgetStateLockedsoGroupCompleted/GroupTaskStatescan re-readtaskswithout re-acquiring the lock.This is the same root cause reported upstream as RichardKnop/machinery#737 (PR #740). Upstream's fix uses two
sync.RWMutex(one per map). Ours is simpler and also catches the v1PurgeGroupMetacase that upstream PR #740 missed.v2 status
The same bug exists in
v2/backends/eager/eager.gobut is not addressed here. The v2 module has a pre-existing build break (v2/config/config.go:65referencesMongoDBConfig, which was deleted in commit8617130"Drop mongo support" #23 and never replaced). CI'sgo list ./...runs from the root module and doesn't descend into v2, so v2 is effectively unmaintained dead code in this fork. The v2 race-fix would need to come with a v2 build fix; leaving that as a separate decision.How did you validate this change
Added
TestEagerBackend_ConcurrentAccess(v1/backends/eager/eager_test.go) — spawns 50 goroutines x 200 iterations and exercises every public surface:SetState{Pending,Received,Started,Success},GetState,InitGroup,GroupCompleted,GroupTaskStates,PurgeState,PurgeGroupMeta. Without the mutex changes this test deterministically tripsgo test -race; with them it passes cleanly.