
fix(e2e): Fix 409 on label, dispatch ref, cleanup gaps for nightly/push/release runs #88

Merged
yasserfaraazkhan merged 7 commits into master from SEC-10102 on Apr 17, 2026

Conversation

Contributor

@yasserfaraazkhan yasserfaraazkhan commented Apr 16, 2026

Fixes for E2E server creation affecting the desktop and mobile repos.

  • Duplicate webhook deliveries no longer race to create conflicting cloud installations (per-PR in-progress guard plus a nanosecond-based uid).
  • Non-PR workflow dispatches were passing a commit SHA as ref, which GitHub rejects with a 422; they now pass the branch name, so nightly, push-to-master, and release-branch runs actually execute.
  • Instance cleanup after test completion was silently failing due to a SHA mismatch between the tracking key and the dispatched workflow's head_sha.
  • The tracking key is now embedded as mw_tracking_key in dispatch inputs and used for direct lookup on completion.
  • The orphaned-instance scan is now configurable (default 3h, down from 8h) and runs periodically rather than only on startup.
  • Added an E2E/Reset-Servers label. PRs that stay open for a long time tend to accumulate test data; this label destroys the servers, and the E2E/Run label can then be added again to run tests on fresh servers.
NONE

@mm-cloud-bot mm-cloud-bot added the release-note-none Denotes a PR that doesn't merit a release note. label Apr 16, 2026
@coderabbitai

coderabbitai Bot commented Apr 16, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds tracking-key propagation for E2E workflows, per-PR+platform in-memory deduplication and generation-based cancellation for in-flight provisioning, configurable non-PR instance max-age and periodic stale-instance cleanup, and a label-triggered E2E reset path.

Changes

Cohort / File(s) — Summary

  • Configuration — server/config.go: Added E2EResetServersLabel string and E2EInstanceMaxAge int to MatterwickConfig (0 falls back to 3 hours).
  • Server core & lifecycle — server/server.go: Added e2eInProgress, e2eInProgressLock, e2ePRCleanupGeneration, stopCh, and stopOnce; initialize them in New(); replace the one-time startup cleanup with an immediate cleanupStaleNonPRE2EInstances() plus a background ticker that re-runs cleanup at ~maxAge/2 (min 30m); Stop() closes stopCh before shutdown.
  • E2E test management — server/e2e_tests.go: e2eUniqueSuffix() now uses cloudModel.NewID()[:8]; added per-PR+platform in-memory dedupe (e2eInProgress) in handleE2ETestRequest; added an e2ePRCleanupGeneration snapshot/check to abort storing newly provisioned instances if cleanup ran during provisioning; replaced the startup cleanup entrypoint with cleanupStaleNonPRE2EInstances() and introduced e2eInstanceMaxAge(); dispatch helpers accept trackingKey and inject mw_tracking_key when set.
  • Push event dispatch — server/push_events.go: Abort dispatch if sha is empty (log and destroy created instances); compute trackingKey as fmt.Sprintf("%s-push-%s-%s", repoName, branch, sha); extended trigger helpers to accept and forward trackingKey; dispatch ref changed from sha to branch.
  • Workflow run & completion handling — server/workflow_run.go: Completion now prefers lookup via payload.WorkflowRun.Inputs["mw_tracking_key"] to find and destroy instances (with SHA fallback); nightly/CMT dispatchers pass mw_tracking_key; dispatchCMTWorkflow now accepts trackingKey and dispatches with ref=branch.
  • Pull request label handling — server/pull_request.go: On a labeled action, if the label equals s.Config.E2EResetServersLabel, asynchronously call handleE2ECleanup(pr) and return early. Minor comment punctuation tweaks.

Sequence Diagram

sequenceDiagram
    participant Webhook as "Webhook (Push/PR)"
    participant Server as "Matterwick Server"
    participant GH as "GitHub API / Actions"
    participant Workflow as "Workflow Run"
    participant Cleanup as "Cleanup Handler"

    Webhook->>Server: Receive push or PR (repo, branch, sha)
    Server->>Server: Compute trackingKey (repo-push-branch-sha)
    Server->>Server: Check e2eInProgress (dedupe per PR+platform)
    Server->>GH: Dispatch workflow (ref=branch) with input mw_tracking_key
    GH->>Workflow: Start workflow run
    Workflow->>Server: Send workflow_run completion (includes mw_tracking_key)
    Server->>Server: Lookup & remove trackingKey under lock
    Server->>Cleanup: Destroy associated E2E instances
    Server->>Cleanup: Fallback: scan & destroy by SHA if mw_tracking_key missing

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 75.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed: The title accurately describes the main changes: fixing E2E server creation issues (409 on label, dispatch ref problems, and cleanup gaps) across nightly/push/release runs.
  • Description check — ✅ Passed: The description is directly related to the changeset, providing context for the fixes and explaining the rationale behind key changes like the tracking key mechanism and configurable cleanup.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/e2e_tests.go`:
- Around line 100-115: The current in-progress guard uses s.e2eInProgress keyed
only by repo/pr (variable key), causing label-specific runs to be dropped;
update the locking logic to distinguish platforms by including the effective
test platform in the key (e.g., build a composite key like repo/pr/platform or
repo/pr/labelVariant) and use that composite when checking/setting
s.e2eInProgress, s.e2eInProgressLock, and when deleting in the deferred cleanup;
alternatively, if you prefer one PR-wide slot, change the map to track a set of
pending platforms per PR (e.g., map[string]map[string]bool) and enqueue the
incoming platform when it differs so it will run after the current provisioning
completes.
- Around line 37-42: The function e2eUniqueSuffix currently truncates
time.Now().UnixNano() to uint32 which wraps every ~4.3s; replace that with a
non-truncating random source: import crypto/rand, read 4 bytes into a slice,
convert to a uint32 (e.g. binary.BigEndian.Uint32) and return
fmt.Sprintf("%08x", value) so the suffix is derived from
cryptographically-random bytes instead of the low 32 bits of the timestamp;
update e2eUniqueSuffix to use this random-read approach and handle any read
error (propagate or panic/log) as appropriate.

In `@server/push_events.go`:
- Around line 125-127: Add a fail-fast check for an empty commit SHA before
generating tracking keys or calling s.triggerE2EWorkflowForPushEvent: if sha is
empty (zero-length), log an error/warning and return without creating the
tracking key or dispatching (so MOBILE_VERSION won't be sent empty) instead of
proceeding to call triggerE2EWorkflowForPushEvent(repoName, instanceType,
branch, sha, key, instances); ensure the guard is placed immediately before the
tracking/dispatch logic that builds the key and invokes
triggerE2EWorkflowForPushEvent.

In `@server/server.go`:
- Around line 123-139: The periodic cleanup goroutine uses a ticker but never
observes shutdown; change it to listen to a server-scoped stop signal (e.g., an
existing s.stopCh or a context.Context on the server) so it exits on Stop().
Specifically, run the ticker loop with a select that waits on ticker.C and the
server stop signal, call ticker.Stop() when the stop signal is received, and
ensure Stop() closes/signals the same stop channel or cancels the context so the
goroutine ends; the relevant symbols are s.cleanupStaleNonPRE2EInstances(),
s.e2eInstanceMaxAge(), and Stop().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d7ee7b83-0ebe-448d-9395-54511cdf8d62

📥 Commits

Reviewing files that changed from the base of the PR and between 6181c43 and eca75bd.

📒 Files selected for processing (5)
  • server/config.go
  • server/e2e_tests.go
  • server/push_events.go
  • server/server.go
  • server/workflow_run.go

yasserfaraazkhan and others added 2 commits April 16, 2026 18:17
…r reliable instance lifecycle

- Fix 409 on PR E2E label: per-PR in-progress guard + nanosecond uid prevents
  duplicate webhook deliveries from racing to create conflicting installations
- Fix nightly/push/release E2E not running: dispatch ref was commit SHA; GitHub
  workflow_dispatch requires branch name — all non-PR dispatch paths updated
- Fix cleanup gaps: embed mw_tracking_key in dispatch inputs for direct key-based
  cleanup on workflow_run completion, eliminating SHA mismatch failures
- Add E2EResetServersLabel: single label to destroy existing PR servers so
  developer can add E2E/Run to get a clean-state environment
- Add E2EInstanceMaxAge config (default 3h) and periodic orphan scan (every
  maxAge/2) replacing startup-only 8h hardcoded threshold

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- e2eUniqueSuffix: replace uint32(UnixNano) with cloudModel.NewID()[:8]
  (crypto/rand-based) to eliminate ~4.3s wrap-around collision risk
- in-progress guard: include testPlatform in key so different mobile
  platform labels (ios vs android) are not incorrectly dropped as duplicates
- push_events: fail fast and destroy instances when push event SHA is empty
  to avoid malformed tracking keys and empty MOBILE_VERSION dispatch
- server: add stopCh to Server struct; periodic cleanup goroutine now
  observes Stop() via select on stopCh, exiting cleanly on shutdown

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
server/e2e_tests.go (2)

98-115: ⚠️ Potential issue | 🟠 Major

Don’t use one in-progress slot for every mobile label on the PR.

The guard is still keyed only by repo/pr, but mobile runs are label-specific (ios, android, both). If a second mobile label arrives while the first request is provisioning, it is dropped as a duplicate and that platform never runs. Key the guard by repo/pr/testPlatform, or track pending platforms separately while keeping s.e2eInstances PR-scoped for reuse.

Suggested fix
 	key := fmt.Sprintf("%s-pr-%d", pr.RepoName, pr.Number)
+	inProgressKey := key
+	if instanceType == "mobile" {
+		inProgressKey = fmt.Sprintf("%s-%s", key, testPlatform)
+	}
 
 	// Guard against duplicate webhook deliveries: if another goroutine is already
 	// handling this PR key (check, cloud lookup, or 30-min creation), drop the
 	// duplicate rather than racing to create conflicting cloud installations.
 	s.e2eInProgressLock.Lock()
-	if s.e2eInProgress[key] {
+	if s.e2eInProgress[inProgressKey] {
 		s.e2eInProgressLock.Unlock()
 		logger.Warn("E2E instance creation already in progress for this PR, skipping duplicate request")
 		return
 	}
-	s.e2eInProgress[key] = true
+	s.e2eInProgress[inProgressKey] = true
 	s.e2eInProgressLock.Unlock()
 	defer func() {
 		s.e2eInProgressLock.Lock()
-		delete(s.e2eInProgress, key)
+		delete(s.e2eInProgress, inProgressKey)
 		s.e2eInProgressLock.Unlock()
 	}()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/e2e_tests.go` around lines 98 - 115, The current in-progress guard
uses a PR-scoped key (key := fmt.Sprintf("%s-pr-%d", pr.RepoName, pr.Number)) so
s.e2eInProgress blocks all platforms for that PR; change the guard to be
platform-aware by including the test platform (e.g., ios/android/both) in the
key (e.g., fmt.Sprintf("%s-pr-%d-%s", pr.RepoName, pr.Number, testPlatform)) or
alternatively change s.e2eInProgress to map PR -> set of pending platforms and
check/add the specific platform there; update all references to key and the
defer cleanup to remove only the platform entry while keeping s.e2eInstances
PR-scoped for reuse (symbols: key, s.e2eInProgress, s.e2eInstances).

37-42: ⚠️ Potential issue | 🟠 Major

Use a non-wrapping unique source for the suffix.

Casting UnixNano() to uint32 still repeats the suffix space every ~4.29 seconds, so identical PR/platform name patterns can collide again on delayed redeliveries and reintroduce the 409s this change is meant to avoid.

Suggested fix
-// e2eUniqueSuffix returns an 8-character hex suffix for instance name uniqueness.
-// Uses the lower 32 bits of the nanosecond clock so that two calls within the
-// same second produce different values, preventing 409 conflicts when duplicate
-// webhook deliveries trigger concurrent instance creation.
+// e2eUniqueSuffix returns a short random suffix for instance name uniqueness.
 func e2eUniqueSuffix() string {
-	return fmt.Sprintf("%08x", uint32(time.Now().UnixNano()))
+	return cloudModel.NewID()[:8]
 }
#!/bin/bash
python - <<'PY'
base = 1_700_000_000_000_000_000
a = base & 0xffffffff
b = (base + 2**32) & 0xffffffff
print({"same_low_32_bits": a == b, "wrap_seconds": (2**32) / 1e9})
PY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/e2e_tests.go` around lines 37 - 42, The suffix currently uses
uint32(time.Now().UnixNano()) in e2eUniqueSuffix which wraps every ~4.29s and
can reintroduce collisions; change e2eUniqueSuffix to produce 4 random bytes
instead (use crypto/rand to Read 4 bytes and encoding/binary to convert to
uint32) and fmt.Sprintf("%08x", ...) that value so the 8-char hex is drawn from
a non-wrapping random source rather than the low 32 bits of the clock. Ensure
you add the crypto/rand and encoding/binary imports and handle/read errors
appropriately.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/pull_request.go`:
- Around line 49-52: The reset path races with in-flight provisioning:
handleE2ECleanup can destroy instances while handleE2ETestRequest later writes
new ones into s.e2eInstances, undoing the reset. Fix by serializing per-PR E2E
state (e.g., introduce or use a per-PR struct like an E2EState with a mutex and
a cancel/reset flag) and update handleE2ECleanup to acquire that lock and set
the reset/canceled flag before destroying instances; likewise modify
handleE2ETestRequest to acquire the same lock (or check the flag under the lock)
before storing into s.e2eInstances or dispatching workflows so provisioning
aborts when reset is active. Ensure both paths reference the same per-PR key (PR
identifier) in s.e2eInstances and that handleE2ECleanup waits or short-circuits
provisioning based on the flag.

---

Duplicate comments:
In `@server/e2e_tests.go`:
- Around line 98-115: The current in-progress guard uses a PR-scoped key (key :=
fmt.Sprintf("%s-pr-%d", pr.RepoName, pr.Number)) so s.e2eInProgress blocks all
platforms for that PR; change the guard to be platform-aware by including the
test platform (e.g., ios/android/both) in the key (e.g.,
fmt.Sprintf("%s-pr-%d-%s", pr.RepoName, pr.Number, testPlatform)) or
alternatively change s.e2eInProgress to map PR -> set of pending platforms and
check/add the specific platform there; update all references to key and the
defer cleanup to remove only the platform entry while keeping s.e2eInstances
PR-scoped for reuse (symbols: key, s.e2eInProgress, s.e2eInstances).
- Around line 37-42: The suffix currently uses uint32(time.Now().UnixNano()) in
e2eUniqueSuffix which wraps every ~4.29s and can reintroduce collisions; change
e2eUniqueSuffix to produce 4 random bytes instead (use crypto/rand to Read 4
bytes and encoding/binary to convert to uint32) and fmt.Sprintf("%08x", ...)
that value so the 8-char hex is drawn from a non-wrapping random source rather
than the low 32 bits of the clock. Ensure you add the crypto/rand and
encoding/binary imports and handle/read errors appropriately.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 31e6afb1-002e-406a-a48e-c95c90665bfc

📥 Commits

Reviewing files that changed from the base of the PR and between eca75bd and 6302d49.

📒 Files selected for processing (3)
  • server/config.go
  • server/e2e_tests.go
  • server/pull_request.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • server/config.go


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
server/push_events.go (1)

118-125: Move the empty SHA check before instance creation to avoid wasted cloud resources.

The fail-fast guard correctly prevents dispatch with an empty SHA and cleans up instances. However, the check occurs after createMultipleE2EInstancesForPushEvent (line 105), meaning cloud instances are provisioned and then immediately destroyed when SHA is empty. Consider moving this validation earlier in the function, before line 102.

♻️ Suggested refactor

Move the check to just before instance creation:

 	if !isDesktop && !isMobile {
 		logger.Warn("Repository is neither desktop nor mobile, skipping E2E tests")
 		return
 	}

+	// Fail fast if the push event carried no commit SHA — an empty SHA would
+	// produce a malformed tracking key and send an empty MOBILE_VERSION to the
+	// test workflow.
+	if sha == "" {
+		logger.Error("Push event has no commit SHA, skipping E2E dispatch")
+		return
+	}
+
 	// Create E2E instances for testing
 	instanceType := "desktop"

Then remove the check from lines 118-125.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/push_events.go` around lines 118 - 125, The empty-commit SHA guard
should be moved so we validate sha before provisioning instances: check the sha
variable (the current if sha == "" branch) before calling
createMultipleE2EInstancesForPushEvent, log the error with logger and return
early, and then remove the duplicate check and destroyE2EInstances call that
currently exists after instance creation; this ensures we never call
createMultipleE2EInstancesForPushEvent when sha is empty and avoids unnecessary
calls to destroyE2EInstances.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@server/push_events.go`:
- Around line 118-125: The empty-commit SHA guard should be moved so we validate
sha before provisioning instances: check the sha variable (the current if sha ==
"" branch) before calling createMultipleE2EInstancesForPushEvent, log the error
with logger and return early, and then remove the duplicate check and
destroyE2EInstances call that currently exists after instance creation; this
ensures we never call createMultipleE2EInstancesForPushEvent when sha is empty
and avoids unnecessary calls to destroyE2EInstances.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 14e7fa98-2627-439e-b95f-bd66f7cbb691

📥 Commits

Reviewing files that changed from the base of the PR and between 6302d49 and a9fd961.

📒 Files selected for processing (3)
  • server/e2e_tests.go
  • server/push_events.go
  • server/server.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • server/e2e_tests.go
  • server/server.go

yasserfaraazkhan and others added 2 commits April 16, 2026 18:31
Adds a per-PR cleanup generation counter to prevent a race where
handleE2ECleanup (triggered by E2EResetServersLabel) destroys cloud
installations while handleE2ETestRequest is still provisioning, causing
the provisioning goroutine to later store deleted instances in the tracking
map and dispatch a workflow against non-existent servers.

handleE2ECleanup increments e2ePRCleanupGeneration[key] before destroying.
handleE2ETestRequest snapshots the counter before provisioning starts and
checks it after creation completes — if the counter advanced, the fresh
instances are discarded instead of being stored and dispatched.

Also skips duplicate-comment findings (uid and in-progress guard) which
were already fixed in the previous commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ventE2E

Previously the check fired after createMultipleE2EInstancesForPushEvent,
meaning 3 cloud installations were provisioned and then immediately
destroyed when the push event carried no commit SHA. Move the guard to
before the provisioning call so no instances are ever created for
empty-SHA events, and remove the now-unnecessary destroyE2EInstances call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
server/server.go (1)

173-177: Consider guarding against double-close if Stop() could be called multiple times.

Closing an already-closed channel panics. If Stop() is guaranteed to be called exactly once, this is fine. If there's any possibility of multiple calls (e.g., signal handlers, test teardown), consider using sync.Once to guard the close.

Optional defensive pattern
 type Server struct {
+	stopOnce sync.Once
 	stopCh   chan struct{}
 }

 func (s *Server) Stop() {
 	s.Logger.Info("Stopping MatterWick")
-	close(s.stopCh)
+	s.stopOnce.Do(func() { close(s.stopCh) })
 	manners.Close()
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/server.go` around lines 173 - 177, The Stop method can panic if called
more than once because it closes s.stopCh unconditionally; protect the close by
adding a guard (e.g., a sync.Once field on Server or an isStopped bool protected
by a mutex) and use that guard when invoking close(s.stopCh) inside Server.Stop,
leaving s.Logger.Info("Stopping MatterWick") and manners.Close() intact; update
the Server struct to include the guard (e.g., stopOnce sync.Once) and call
stopOnce.Do(func(){ close(s.stopCh) }) from Stop.
server/e2e_tests.go (1)

58-63: Consider periodic cleanup of e2ePRCleanupGeneration entries.

The e2ePRCleanupGeneration map accumulates entries (one per PR key) that are never removed. While each entry is small (~40 bytes), over very long server uptimes with many PRs this could grow unboundedly.

Consider clearing entries in handleE2ECleanup after the cleanup completes, or implementing periodic pruning of keys no longer in e2eInstances.

Optional cleanup in handleE2ECleanup
 func (s *Server) handleE2ECleanup(pr *model.PullRequest) {
     // ... existing cleanup logic ...
     
     // Fallback: catch orphans from restarts, map overwrites, or failed goroutines
     s.cleanupOrphanedE2EInstances(pr, logger)
+
+    // Remove the generation entry now that cleanup is complete.
+    // Any in-flight provisioning will have already captured its snapshot.
+    s.e2ePRCleanupGenerationLock.Lock()
+    delete(s.e2ePRCleanupGeneration, key)
+    s.e2ePRCleanupGenerationLock.Unlock()
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/e2e_tests.go` around lines 58 - 63, The e2ePRCleanupGeneration map
grows unbounded because entries are never removed; update handleE2ECleanup to
remove the PR key from e2ePRCleanupGeneration after a successful cleanup (or
implement a periodic pruning pass that iterates e2ePRCleanupGeneration and
deletes keys that no longer exist in e2eInstances), ensuring any necessary
synchronization around e2ePRCleanupGeneration/e2eInstances is respected (use the
existing mutex or add one if needed) and keeping the cleanup logic bounded.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@server/e2e_tests.go`:
- Around line 58-63: The e2ePRCleanupGeneration map grows unbounded because
entries are never removed; update handleE2ECleanup to remove the PR key from
e2ePRCleanupGeneration after a successful cleanup (or implement a periodic
pruning pass that iterates e2ePRCleanupGeneration and deletes keys that no
longer exist in e2eInstances), ensuring any necessary synchronization around
e2ePRCleanupGeneration/e2eInstances is respected (use the existing mutex or add
one if needed) and keeping the cleanup logic bounded.

In `@server/server.go`:
- Around line 173-177: The Stop method can panic if called more than once
because it closes s.stopCh unconditionally; protect the close by adding a guard
(e.g., a sync.Once field on Server or an isStopped bool protected by a mutex)
and use that guard when invoking close(s.stopCh) inside Server.Stop, leaving
s.Logger.Info("Stopping MatterWick") and manners.Close() intact; update the
Server struct to include the guard (e.g., stopOnce sync.Once) and call
stopOnce.Do(func(){ close(s.stopCh) }) from Stop.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a9782ff5-cde1-48a4-9795-11ba3b5d015d

📥 Commits

Reviewing files that changed from the base of the PR and between a9fd961 and c9d2779.

📒 Files selected for processing (3)
  • server/e2e_tests.go
  • server/push_events.go
  • server/server.go
✅ Files skipped from review due to trivial changes (1)
  • server/push_events.go

…ouble-close

Delete the PR key from e2ePRCleanupGeneration at the end of handleE2ECleanup
to prevent the map growing unbounded (one entry per cleaned-up PR forever).

Guard Server.Stop against a second call by routing close(s.stopCh) through
a sync.Once field, eliminating the panic on "close of closed channel".

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/server.go`:
- Around line 58-63: The cleanup-generation map e2ePRCleanupGeneration suffers
from zero-value ambiguity when handleE2ECleanup increments then deletes a PR
key: reads in handleE2ETestRequest see 0 and cannot distinguish "never ran" from
"was deleted", breaking resetDuringProvisioning (which compares
e2ePRCleanupGeneration[key] != startGeneration). Fix by switching to a
non-ambiguous scheme: either (1) use a tombstone sentinel value (e.g., -1 or a
dedicated constant) and have handleE2ECleanup write that sentinel instead of
deleting the key until all in-flight provisioning (startGeneration checks) can
observe the change, or (2) make e2ePRCleanupGeneration a monotonic counter that
is only incremented (never deleted) during the provisioning window so
startGeneration comparisons remain valid; update handleE2ECleanup and
handleE2ETestRequest logic to set/interpret the new sentinel or monotonic
semantics accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bb4b0708-96de-4502-a452-eb3390edfe02

📥 Commits

Reviewing files that changed from the base of the PR and between c9d2779 and 681094e.

📒 Files selected for processing (2)
  • server/e2e_tests.go
  • server/server.go
✅ Files skipped from review due to trivial changes (1)
  • server/e2e_tests.go

…antics

Deleting the key after cleanup reintroduces the zero-value ambiguity:
a provisioning run that started before any cleanup captures startGeneration=0,
then cleanup increments to 1 and deletes the key (value silently back to 0),
so the final check 0!=0 is false and stale deleted instances get stored.

Keep the counter monotonic (never delete) so the != comparison always fires
when cleanup ran during provisioning. The map holds at most one int64 per
PR key ever cleaned up, which is negligible.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@yasserfaraazkhan yasserfaraazkhan merged commit d8bc3bd into master Apr 17, 2026
3 checks passed
@yasserfaraazkhan yasserfaraazkhan deleted the SEC-10102 branch April 17, 2026 05:40
