Skip to content

Fix races in AWS compose up#2038

Merged
lionello merged 12 commits intomainfrom
lio/fix-race
Apr 9, 2026
Merged

Fix races in AWS compose up#2038
lionello merged 12 commits intomainfrom
lio/fix-race

Conversation

@lionello
Copy link
Copy Markdown
Member

@lionello lionello commented Apr 9, 2026

Description

  • LoadConfig would overwrite Account and Region while others could be reading those fields
  • FillOutputs would overwrite the CloudFormation stack outputs, while Tail/Subscribe would read from go-routine
  • SetUTCMode would race with httptest server during tests
  • use atomic.Int32 for increments in test go-routines
==================
WARNING: DATA RACE
Write at 0x00c000d2ebb0 by goroutine 491:
  github.com/DefangLabs/defang/src/pkg/clouds/aws/codebuild/cfn.(*AwsCfn).fillWithOutputs()
      /Users/llunesu/dev/defang/src/pkg/clouds/aws/codebuild/cfn/setup.go:215 +0xf0
  github.com/DefangLabs/defang/src/pkg/clouds/aws/codebuild/cfn.(*AwsCfn).FillOutputs()
      /Users/llunesu/dev/defang/src/pkg/clouds/aws/codebuild/cfn/setup.go:204 +0x358
  github.com/DefangLabs/defang/src/pkg/clouds/aws/codebuild/cfn.(*AwsCfn).FillOutputs()
      /Users/llunesu/dev/defang/src/pkg/clouds/aws/codebuild/cfn/setup.go:192 +0x158
  github.com/DefangLabs/defang/src/pkg/clouds/aws/codebuild/cfn.(*AwsCfn).FillOutputs()
      /Users/llunesu/dev/defang/src/pkg/clouds/aws/codebuild/cfn/setup.go:186 +0x3c
  github.com/DefangLabs/defang/src/pkg/cli/client/byoc/aws.(*ByocAws).Subscribe()
      /Users/llunesu/dev/defang/src/pkg/cli/client/byoc/aws/byoc.go:964 +0x5c
  github.com/DefangLabs/defang/src/pkg/cli.WaitServiceState()
      /Users/llunesu/dev/defang/src/pkg/cli/subscribe.go:35 +0x1b4
  github.com/DefangLabs/defang/src/pkg/cli.TailAndMonitor.func1()
      /Users/llunesu/dev/defang/src/pkg/cli/tailAndMonitor.go:48 +0x124

Previous write at 0x00c000d2ebb0 by main goroutine:
  github.com/DefangLabs/defang/src/pkg/clouds/aws/codebuild/cfn.(*AwsCfn).fillWithOutputs()
      /Users/llunesu/dev/defang/src/pkg/clouds/aws/codebuild/cfn/setup.go:215 +0xf0
  github.com/DefangLabs/defang/src/pkg/clouds/aws/codebuild/cfn.(*AwsCfn).FillOutputs()
      /Users/llunesu/dev/defang/src/pkg/clouds/aws/codebuild/cfn/setup.go:204 +0x358
  github.com/DefangLabs/defang/src/pkg/clouds/aws/codebuild/cfn.(*AwsCfn).FillOutputs()
      /Users/llunesu/dev/defang/src/pkg/clouds/aws/codebuild/cfn/setup.go:192 +0x158
  github.com/DefangLabs/defang/src/pkg/clouds/aws/codebuild/cfn.(*AwsCfn).FillOutputs()
      /Users/llunesu/dev/defang/src/pkg/clouds/aws/codebuild/cfn/setup.go:186 +0x3c
  github.com/DefangLabs/defang/src/pkg/cli/client/byoc/aws.(*ByocAws).QueryLogs()
      /Users/llunesu/dev/defang/src/pkg/cli/client/byoc/aws/byoc.go:698 +0x5c
  github.com/DefangLabs/defang/src/pkg/cli.streamLogs()
      /Users/llunesu/dev/defang/src/pkg/cli/tail.go:238 +0x7ec
  github.com/DefangLabs/defang/src/pkg/cli.Tail()
      /Users/llunesu/dev/defang/src/pkg/cli/tail.go:170 +0x1b8
  github.com/DefangLabs/defang/src/pkg/cli.TailAndMonitor()
      /Users/llunesu/dev/defang/src/pkg/cli/tailAndMonitor.go:72 +0x860
  github.com/DefangLabs/defang/src/cmd/cli/command.makeComposeUpCmd.func1()
      /Users/llunesu/dev/defang/src/cmd/cli/command/compose.go:166 +0x1304
  github.com/DefangLabs/defang/src/pkg/cli.ComposeUp()
      /Users/llunesu/dev/defang/src/pkg/cli/composeUp.go:199 +0xd00
  github.com/DefangLabs/defang/src/pkg/cli/client/byoc/aws.(*ByocAws).runCdCommand()
      /Users/llunesu/dev/defang/src/pkg/cli/client/byoc/aws/byoc.go:552 +0x2a8
  github.com/DefangLabs/defang/src/pkg/cli/client/byoc/aws.(*ByocAws).deploy()
      /Users/llunesu/dev/defang/src/pkg/cli/client/byoc/aws/byoc.go:281 +0xa00
  github.com/DefangLabs/defang/src/pkg/cli/client/byoc/aws.(*ByocAws).deploy()
      /Users/llunesu/dev/defang/src/pkg/cli/client/byoc/aws/byoc.go:240 +0x588
==================
==================
WARNING: DATA RACE
Read at 0x00c000d2eb50 by goroutine 491:
  github.com/DefangLabs/defang/src/pkg/clouds/aws.(*Aws).LoadConfig()
      /Users/llunesu/dev/defang/src/pkg/clouds/aws/common.go:31 +0x60
  github.com/DefangLabs/defang/src/pkg/cli/client/byoc/aws.(*ByocAws).Subscribe()
      /Users/llunesu/dev/defang/src/pkg/cli/client/byoc/aws/byoc.go:968 +0x10c
  github.com/DefangLabs/defang/src/pkg/clouds/aws/codebuild/cfn.(*AwsCfn).FillOutputs()
      /Users/llunesu/dev/defang/src/pkg/clouds/aws/codebuild/cfn/setup.go:186 +0x3c
  github.com/DefangLabs/defang/src/pkg/cli/client/byoc/aws.(*ByocAws).Subscribe()
      /Users/llunesu/dev/defang/src/pkg/cli/client/byoc/aws/byoc.go:964 +0x5c
  github.com/DefangLabs/defang/src/pkg/cli.WaitServiceState()
      /Users/llunesu/dev/defang/src/pkg/cli/subscribe.go:35 +0x1b4
  github.com/DefangLabs/defang/src/pkg/cli.TailAndMonitor.func1()
      /Users/llunesu/dev/defang/src/pkg/cli/tailAndMonitor.go:48 +0x124

Previous write at 0x00c000d2eb50 by main goroutine:
  github.com/DefangLabs/defang/src/pkg/clouds/aws.(*Aws).LoadConfig()
      /Users/llunesu/dev/defang/src/pkg/clouds/aws/common.go:38 +0x1e8
  github.com/DefangLabs/defang/src/pkg/cli/client/byoc/aws.(*ByocAws).QueryLogs()
      /Users/llunesu/dev/defang/src/pkg/cli/client/byoc/aws/byoc.go:702 +0x10c
  github.com/DefangLabs/defang/src/pkg/clouds/aws/codebuild/cfn.(*AwsCfn).FillOutputs()
      /Users/llunesu/dev/defang/src/pkg/clouds/aws/codebuild/cfn/setup.go:186 +0x3c
  github.com/DefangLabs/defang/src/pkg/cli/client/byoc/aws.(*ByocAws).QueryLogs()
      /Users/llunesu/dev/defang/src/pkg/cli/client/byoc/aws/byoc.go:698 +0x5c
  github.com/DefangLabs/defang/src/pkg/cli.streamLogs()
      /Users/llunesu/dev/defang/src/pkg/cli/tail.go:238 +0x7ec
  github.com/DefangLabs/defang/src/pkg/cli.Tail()
      /Users/llunesu/dev/defang/src/pkg/cli/tail.go:170 +0x1b8
  github.com/DefangLabs/defang/src/pkg/cli.TailAndMonitor()
      /Users/llunesu/dev/defang/src/pkg/cli/tailAndMonitor.go:72 +0x860
  github.com/DefangLabs/defang/src/cmd/cli/command.makeComposeUpCmd.func1()

Linked Issues

Checklist

  • I have performed a self-review of my code
  • I have added appropriate tests
  • I have updated the Defang CLI docs and/or README to reflect my changes, if necessary

Summary by CodeRabbit

  • Refactor

    • Consolidated AWS region handling into core AWS logic.
    • Added CloudFormation output caching for faster repeated operations.
    • Safer UTC mode toggling to reduce unintended global time changes.
  • Bug Fixes

    • Log group ARNs now use the regional format (fixes log querying/tailing/subscription).
  • Chores

    • Removed BYOC debug endpoint.
    • Updated linting comments and CI/test commands to run with the race detector.
  • Telemetry

    • Command flags are now recorded as string values for tracking.

@lionello lionello requested a review from jordanstephens as a code owner April 9, 2026 20:27
@lionello lionello requested a review from edwardrf April 9, 2026 20:28
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 9, 2026

📝 Walkthrough

Walkthrough

Consolidates AWS region/ARN handling and adds one-time caching for AWS config and CloudFormation outputs; updates BYOC clients (AWS log-group ARN generation and GCP setup flag visibility) and removes a BYOC debug endpoint; minor lint and test adjustments.

Changes

Cohort / File(s) Summary
AWS common & ARN helpers
src/pkg/clouds/aws/common.go
Added cached LoadConfig via sync.Once, introduced MakeRegionalARN and MakeARN helpers for consistent ARN construction.
Region package removal
src/pkg/clouds/aws/region/region.go
Deleted the region package (type alias, region constants, FromArn).
BYOC AWS changes
src/pkg/cli/client/byoc/aws/byoc.go, src/pkg/cli/client/byoc/aws/list.go
Switched makeLogGroupARN to use MakeRegionalARN; replaced fallback region.USEast1 with "us-east-1" string.
BYOC base & GCP client
src/pkg/cli/client/byoc/baseclient.go, src/pkg/cli/client/byoc/gcp/byoc.go
Removed ByocBaseClient.Debug method; made ByocGcp.setupDone exported as SetupDone and updated usages.
AWS CodeBuild / CloudFormation
src/pkg/clouds/aws/codebuild/cfn/setup.go, src/pkg/clouds/aws/codebuild/cfn/setup_test.go, src/pkg/clouds/aws/codebuild/common.go
Reworked constructor signature to accept aws.Region, added sync.Once memoization for FillOutputs (and extracted helper), removed AwsCodeBuild.MakeARN. Test updated to use region string literal.
AWS package init & helpers ripple
src/pkg/clouds/aws/login.go, src/pkg/clouds/aws/common.go, src/pkg/clouds/aws/...
Codebase updated to use new Aws fields/methods (config caching, ARN helpers); adjusted small linter comments.
Misc: build/tests/formatting
.github/workflows/go.yml, src/Makefile, src/pkg/auth/auth_test.go, src/pkg/cli/state.go, src/pkg/cli/tail.go, src/pkg/track/track.go
Enabled race detector in CI and Make targets; test atomic counter replaced; added package-level sync.Once for state init; conditional UTC mode set; flag values stored as strings.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • raphaeltm
  • jordanstephens

Poem

🐰
I hopped through regions, strings in tow,
ARNs now local, tidy, and so—
Configs cached once, no extra race,
BYOC fields primped for a shinier place,
Happy hops, the build keeps pace!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 23.08% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Fix races in AWS compose up' directly and clearly summarizes the main objective: addressing race conditions in AWS compose functionality.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch lio/fix-race

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/pkg/clouds/aws/common.go (1)

31-36: Context capture in sync.Once - verify acceptability.

The first caller's ctx is captured and used for the config loading. Subsequent callers' contexts are ignored, meaning:

  1. If the first caller's context is canceled mid-load, subsequent callers get the cancellation error even if their contexts are still valid.
  2. Subsequent callers cannot provide a longer timeout if the first caller had a short one.

This is a known trade-off with sync.Once and is generally acceptable for one-time initialization like config loading. Just flagging for awareness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/pkg/clouds/aws/common.go` around lines 31 - 36, LoadConfig currently
captures the first caller's ctx inside a.cfgOnce.Do which can cause later
callers to inherit cancellations/timeouts; to fix, make the one-time init use a
non-cancellable context (e.g., call a.loadConfig(context.Background()) inside
the a.cfgOnce.Do) or explicitly document the behavior so callers know the first
ctx wins; update the Aws.LoadConfig implementation (and any comments)
referencing a.cfgOnce and a.loadConfig to either use context.Background() for
the Do call or add documentation explaining the captured-context tradeoff.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/pkg/clouds/aws/common.go`:
- Around line 31-36: LoadConfig currently captures the first caller's ctx inside
a.cfgOnce.Do which can cause later callers to inherit cancellations/timeouts; to
fix, make the one-time init use a non-cancellable context (e.g., call
a.loadConfig(context.Background()) inside the a.cfgOnce.Do) or explicitly
document the behavior so callers know the first ctx wins; update the
Aws.LoadConfig implementation (and any comments) referencing a.cfgOnce and
a.loadConfig to either use context.Background() for the Do call or add
documentation explaining the captured-context tradeoff.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6c9758ce-39ea-4932-b04c-7796877da3e4

📥 Commits

Reviewing files that changed from the base of the PR and between 2043c75 and 1788fe3.

📒 Files selected for processing (11)
  • src/pkg/cli/client/byoc/aws/byoc.go
  • src/pkg/cli/client/byoc/aws/list.go
  • src/pkg/cli/client/byoc/baseclient.go
  • src/pkg/cli/client/byoc/gcp/byoc.go
  • src/pkg/clouds/aws/codebuild/cfn/setup.go
  • src/pkg/clouds/aws/codebuild/cfn/setup_test.go
  • src/pkg/clouds/aws/codebuild/common.go
  • src/pkg/clouds/aws/common.go
  • src/pkg/clouds/aws/login.go
  • src/pkg/clouds/aws/region/region.go
  • src/pkg/clouds/gcp/login.go
💤 Files with no reviewable changes (3)
  • src/pkg/clouds/aws/codebuild/common.go
  • src/pkg/cli/client/byoc/baseclient.go
  • src/pkg/clouds/aws/region/region.go

Comment thread src/pkg/cli/client/byoc/aws/list.go
Comment thread src/pkg/cli/client/byoc/gcp/byoc.go
@lionello lionello requested a review from raphaeltm as a code owner April 9, 2026 22:42
@lionello lionello merged commit 05bc28b into main Apr 9, 2026
14 of 16 checks passed
@lionello lionello deleted the lio/fix-race branch April 9, 2026 22:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants