Skip to content

fix: don't mutate shared matrix AST when resolving ref: rows#2894

Open
amitmishra11 wants to merge 2 commits into
go-task:mainfrom
amitmishra11:fix/matrix-ref-race-2890
Open

fix: don't mutate shared matrix AST when resolving ref: rows#2894
amitmishra11 wants to merge 2 commits into
go-task:mainfrom
amitmishra11:fix/matrix-ref-race-2890

Conversation

@amitmishra11

Copy link
Copy Markdown

Fixes #2890

Bug

resolveMatrixRefs mutated the *ast.Matrix passed to it in place. That Matrix lives on the cached, shared *ast.Task (the same *ast.For/*ast.Matrix object is reused on every invocation of a task), so when a task with a for: matrix: ref: loop is invoked concurrently — e.g. by two parallel deps that call the same internal task with different vars — both calls race on the row.Value = value assignment in resolveMatrixRefs.

Reproduction from the issue:

tasks:
  build-bins:
    internal: true
    cmds:
      - for:
          matrix:
            GOARCH: { ref: .ARCH_VAR }
            BIN:   { ref: .BINS_VAR }
        cmd: 'echo "built GOARCH={{.ITEM.GOARCH}} BIN={{.ITEM.BIN}}"'
  build-amd64:
    cmds: [{ task: build-bins, vars: { ARCH_VAR: [amd64], BINS_VAR: [a, b, c] } }]
  build-arm64:
    cmds: [{ task: build-bins, vars: { ARCH_VAR: [arm64], BINS_VAR: [x, y] } }]
  default:
    deps: [build-amd64, build-arm64]

Intermittently, ARCH_VAR/BINS_VAR resolved for one caller leak into the other caller's matrix expansion, silently producing wrong output with no error.

Root cause (two layers)

  1. resolveMatrixRefs resolved ref: rows directly on the matrix it was given, instead of a copy.
  2. Matrix.DeepCopy() already existed, but didn't actually deep-copy the rows: deepcopy.OrderedMap only deep-copies a value if it implements the Copier interface (DeepCopy() T), and *MatrixRow didn't implement it. So Matrix.DeepCopy()'s "copy" still pointed at the exact same *MatrixRow objects as the original — using it as-is would not have fixed the race.

Fix

  • Added MatrixRow.DeepCopy() so Matrix.DeepCopy() produces independent rows.
  • resolveMatrixRefs now deep-copies the matrix first, resolves refs into the copy, and returns the copy. The caller (itemsFromFor) uses the returned, resolved matrix instead of the original.

Tests

Added TestResolveMatrixRefsDoesNotMutateSharedMatrix (variables_internal_test.go), which:

  • Drives resolveMatrixRefs from 200 goroutines against the same shared *ast.Matrix, each with a different ref value, and asserts every caller gets back its own value (no cross-contamination).
  • Asserts the original input matrix is never mutated (Ref and Value unchanged after all concurrent calls).

I verified the test actually catches the regression: temporarily reverted the fix (mutate-in-place + no deep copy) and reran the test — it failed reliably and reproduced cross-contamination with the exact symptom from the issue (e.g. "got ARCH=amd64, want arm64"), plus a direct assertion that the shared matrix had been mutated. Restoring the fix makes it pass consistently across repeated runs.

Note: I could not run this under go test -race in my environment (no 64-bit cgo-capable C compiler available), so I can't show the race detector flagging it directly — the regression test instead proves the bug via direct mutation/cross-contamination assertions, which catch the issue deterministically regardless of goroutine scheduling.

Verification

  • go build ./... — clean
  • go vet ./... — clean
  • go test . / go test ./taskfile/... — all pass (one pre-existing, unrelated failure in TestEvaluateSymlinksInPaths due to no symlink permissions on Windows; confirmed it fails identically on unmodified main)
  • Existing TestForCmds/TestForDeps matrix-ref test cases (loop-matrix, loop-matrix-ref, loop-matrix-ref-error) still pass unchanged

resolveMatrixRefs mutated the *ast.Matrix passed to it in place. That
Matrix is part of the cached, shared Task AST (the same *ast.For/*ast.Matrix
is reused for every call to a task), so concurrent invocations of a task
with a `for: matrix: ref:` loop - e.g. via parallel deps that call the same
internal task with different vars - raced on the row.Value assignment.
The result was silent cross-contamination: one caller's resolved ref value
could leak into another caller's matrix expansion, non-deterministically.

Matrix.DeepCopy() already existed but didn't actually deep-copy anything:
deepcopy.OrderedMap only deep-copies a value if it implements the Copier
interface, and *MatrixRow did not, so the "copy" still shared the same
*MatrixRow pointers as the original.

Fix:
- Add MatrixRow.DeepCopy(), so Matrix.DeepCopy() produces independent rows.
- resolveMatrixRefs now deep-copies the matrix before resolving refs into
  it, and returns the copy instead of mutating its input.

Added a regression test that drives resolveMatrixRefs concurrently from
many goroutines against the same shared matrix with different ref values,
and asserts both that each caller gets its own value back and that the
original shared matrix is never mutated. Reverting the fix reliably
reproduces the exact cross-contamination described in the issue.

Fixes go-task#2890

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a concurrency bug where resolveMatrixRefs mutated the shared/cached Task AST matrix in-place, allowing ref: row values to leak across concurrent invocations of the same task.

Changes:

  • Make matrix ref: resolution operate on a deep-copied matrix and return the resolved copy to callers.
  • Implement MatrixRow.DeepCopy() so Matrix.DeepCopy() produces truly independent row objects.
  • Add a regression test that drives concurrent resolveMatrixRefs calls against the same shared *ast.Matrix and asserts no cross-contamination / no input mutation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
variables.go Updates itemsFromFor to use a resolved matrix copy and changes resolveMatrixRefs to return a deep-copied resolved matrix.
variables_internal_test.go Adds a concurrent regression test ensuring matrix ref resolution doesn’t mutate shared AST and doesn’t leak values across goroutines.
taskfile/ast/matrix.go Adds MatrixRow.DeepCopy() so Matrix.DeepCopy() actually deep-copies rows (avoiding shared row pointers).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread variables.go
Comment thread variables.go
Comment thread variables_internal_test.go
- Skip the deep-copy in resolveMatrixRefs when the matrix has no
  ref: rows, avoiding unnecessary allocation on every invocation of
  a static matrix loop.
- Check cache.Err() right after templater.ResolveRef, matching the
  existing pattern in resolveEnumRefs, so a real templating error
  isn't masked as "must resolve to a list".
- In the regression test, record an explicit error instead of
  silently leaving a result empty when a goroutine's row lookup
  fails, so such failures aren't misreported as cross-contamination.
@amitmishra11

Copy link
Copy Markdown
Author

Addressed all three review comments in 4641a56:

  • resolveMatrixRefs now skips the deep-copy entirely when the matrix has no ref: rows, so static matrices (the common case) don't pay for an allocation on every invocation.
  • Added a cache.Err() check immediately after templater.ResolveRef, mirroring the existing pattern in resolveEnumRefs, so a real templating error in a ref isn't masked behind the generic "must resolve to a list" message.
  • In the regression test, a goroutine that can't find/parse its row now records an explicit error instead of silently leaving its result empty, so that failure mode is reported clearly instead of being misread as cross-contamination.

go build, go vet, and the full matrix-ref test suite still pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Data race: for matrix ref: variables leak between concurrent dependency invocations of the same task

2 participants