Skip to content

[Bug] eager backend: guard groups/tasks maps with a single mutex#54

Merged
madirey merged 1 commit into
masterfrom
madi/fix-eager-backend-concurrent-map-race
May 20, 2026
Merged

[Bug] eager backend: guard groups/tasks maps with a single mutex#54
madirey merged 1 commit into
masterfrom
madi/fix-eager-backend-concurrent-map-race

Conversation

@madirey

@madirey madirey commented May 19, 2026

Copy link
Copy Markdown
Member

Summary

  • The v1 eager backend's groups and tasks maps were read without holding any lock — updateState was the only place that acquired stateMutex, and only while writing tasks. Concurrent callers can trip Go's "concurrent map read and map write" fatal.
  • Replaces stateMutex with a single sync.Mutex (mu) that guards both maps across every public method, and introduces getStateLocked so GroupCompleted/GroupTaskStates can re-read tasks without re-acquiring the lock.
  • A coarse single mutex is intentional: the eager backend is for tests and local development, contention is low, and one lock keeps the "single caller at a time touches internal state" invariant easy to reason about.

This is the same root cause reported upstream as RichardKnop/machinery#737 (PR #740). Upstream's fix uses two sync.RWMutex (one per map). Ours is simpler and also catches the v1 PurgeGroupMeta case that upstream PR #740 missed.

v2 status

The same bug exists in v2/backends/eager/eager.go but is not addressed here. The v2 module has a pre-existing build break (v2/config/config.go:65 references MongoDBConfig, which was deleted in commit 8617130 "Drop mongo support" #23 and never replaced). CI's go list ./... runs from the root module and doesn't descend into v2, so v2 is effectively unmaintained dead code in this fork. The v2 race-fix would need to come with a v2 build fix; leaving that as a separate decision.

How did you validate this change

Added TestEagerBackend_ConcurrentAccess (v1/backends/eager/eager_test.go) — spawns 50 goroutines x 200 iterations and exercises every public surface: SetState{Pending,Received,Started,Success}, GetState, InitGroup, GroupCompleted, GroupTaskStates, PurgeState, PurgeGroupMeta. Without the mutex changes this test deterministically trips go test -race; with them it passes cleanly.

  • Assisted by Claude 🤖

The eager backend's groups and tasks maps were read without holding any
lock (only updateState acquired stateMutex while writing tasks), so
concurrent callers could trip Go's "concurrent map read and map write"
fatal. Adds a single sync.Mutex covering both maps and introduces
getStateLocked so the group methods can re-read tasks without re-acquiring
the lock.

Same root cause as upstream RichardKnop/machinery#737 (PR #740). A coarse
single mutex is intentional: the eager backend is for tests and local
development, contention is low, and one lock keeps the "single caller at
a time" invariant easy to reason about.

Adds TestEagerBackend_ConcurrentAccess which exercises every public
surface — without the fix, this test deterministically trips the race
detector.
@madirey madirey force-pushed the madi/fix-eager-backend-concurrent-map-race branch from b05d611 to 0a6c995 Compare May 19, 2026 02:03
@madirey madirey requested a review from Copilot May 19, 2026 02:18

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes the v1 eager in-memory backend safe for concurrent use by serializing access to its task and group metadata maps.

Changes:

  • Replaces the previous task-only mutex with a single mutex guarding both groups and tasks.
  • Avoids recursive locking by adding getStateLocked for callers that already hold the mutex.
  • Adds a concurrent access regression test covering the eager backend’s map read/write paths.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
v1/backends/eager/eager.go Adds mutex protection around eager backend group/task map access.
v1/backends/eager/eager_test.go Adds a concurrency regression test for eager backend public operations.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@madirey madirey marked this pull request as ready for review May 19, 2026 02:22
@madirey madirey merged commit 8784e0b into master May 20, 2026
4 checks passed
@madirey madirey deleted the madi/fix-eager-backend-concurrent-map-race branch May 20, 2026 02:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants