Automate stage reset with opt-out label to reduce coordination overhead #8837

@linear

Description

problem

Resetting the shared stage branch is currently manual and disruptive. Engineers have to ask in a channel before resetting to avoid interrupting active testing, which slows recovery when stage is broken.

In practice, most resets should proceed immediately, but there is no lightweight mechanism to protect the minority of PRs that are actively testing.

solution

Introduce an automated stage-reset workflow with default-allow behavior and explicit opt-out:

  • Add a new PR label: stage-reset-blocked
  • When a stage reset is requested, automation checks the open PRs that carry the on stage label
  • If none of them has stage-reset-blocked, automatically reset stage to main, then re-merge the on stage PR branches in a deterministic order
  • If one or more of them have stage-reset-blocked, skip the reset and notify the PR owners and Slack with the list of blocked PRs
  • Always post a completion/status summary to Slack and keep full audit trail in Actions logs

This reduces coordination cost in the common case while preserving safety for active testers.
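
The default-allow / opt-out decision above can be sketched as a small pure function. This is a minimal illustration, not the workflow's actual implementation: the `PullRequest` type, the `partition` and `decide` helpers, and the choice of PR number as the deterministic order are all assumptions made for the example. Only the two label names come from this issue.

```python
from dataclasses import dataclass, field

ON_STAGE = "on stage"            # existing label, as written in this issue
BLOCKED = "stage-reset-blocked"  # new opt-out label proposed in this issue


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical, minimal view of an open PR (not a GitHub API type)."""
    number: int
    branch: str
    owner: str
    labels: frozenset = field(default_factory=frozenset)


def partition(prs):
    """Split on-stage PRs into (blocked, unblocked).

    Sorting by PR number is an assumed ordering; the issue only asks
    for the order to be deterministic.
    """
    on_stage = sorted(
        (pr for pr in prs if ON_STAGE in pr.labels),
        key=lambda pr: pr.number,
    )
    blocked = [pr for pr in on_stage if BLOCKED in pr.labels]
    unblocked = [pr for pr in on_stage if BLOCKED not in pr.labels]
    return blocked, unblocked


def decide(prs):
    """Default-allow: reset unless at least one on-stage PR opted out."""
    blocked, unblocked = partition(prs)
    if blocked:
        return {
            "action": "skip",
            "blocked": [pr.number for pr in blocked],
            "notify": sorted({pr.owner for pr in blocked}),
        }
    return {"action": "reset", "remerge": [pr.branch for pr in unblocked]}
```

Keeping the decision free of side effects means the same function can feed both the Slack notification and the reset step, and it can be unit-tested without touching a repository.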

technical

Implement via GitHub Actions + existing merge-bot conventions:

  1. Trigger paths
    • Manual workflow_dispatch for reset requests (initial rollout)
    • Optional automatic trigger on merge-bot conflict / failed stage deploy (later)
  2. Data collection
    • Query open PRs that carry the on stage label
    • Partition into blocked/unblocked based on stage-reset-blocked
  3. Reset execution rules
    • Proceed only when blocked list is empty (or when explicit maintainer override input is set)
    • Reset branch: align stage to main
    • Re-merge each on stage PR branch into stage, in deterministic order
    • Push branch and report per-PR merge outcomes
  4. Notifications
    • Pre-action Slack message (reason, on-stage PRs, blocked PRs)
    • Post-action Slack message (success/failure, re-applied PRs, failures requiring manual follow-up)
  5. Guardrails
    • Dry-run mode for initial rollout
    • Protected release-window bypass rules
    • Full run logs and deterministic ordering for repeatability
  6. Docs alignment
    • Keep apps/docs/docs/04-engineering-practices/02-deployment/index.md in sync with workflow behavior
    • Include policy examples for when to apply/remove stage-reset-blocked
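
As a rough sketch of the reset execution rules and guardrails listed above (override input, dry-run mode, deterministic ordering, per-PR outcomes), the core step could look like the following. The function names, the result dictionary shape, the alphabetical branch order, and the `origin` remote name are assumptions for illustration. The git invocations themselves are standard commands, but whether the real workflow resets with `git checkout -B` and pushes with `--force-with-lease` is not specified in this issue.

```python
import subprocess


def run_git(args):
    """Run one git command; return True on exit code 0."""
    return subprocess.run(args, capture_output=True, text=True).returncode == 0


def reset_stage(branches, blocked=(), override=False, dry_run=True, run=run_git):
    """Reset stage to main and re-merge on-stage branches.

    branches: names of PR branches to re-apply.
    blocked:  identifiers of PRs carrying stage-reset-blocked.
    run:      injectable command runner (hypothetical seam for tests).
    """
    ordered = sorted(branches)  # assumed deterministic order: alphabetical
    result = {"status": "", "planned": ordered, "merged": [], "failed": []}

    # Proceed only when nothing is blocked, unless a maintainer overrides.
    if blocked and not override:
        result["status"] = "skipped"
        return result

    # Dry-run reports what would happen without touching the repository.
    if dry_run:
        result["status"] = "dry-run"
        return result

    # Align the local stage branch to the tip of main.
    if not (run(["git", "fetch", "origin"])
            and run(["git", "checkout", "-B", "stage", "origin/main"])):
        result["status"] = "failed"
        return result

    # Re-merge each branch and record the outcome per branch.
    for branch in ordered:
        if run(["git", "merge", "--no-ff", "--no-edit", f"origin/{branch}"]):
            result["merged"].append(branch)
        else:
            run(["git", "merge", "--abort"])  # leave stage clean for the next merge
            result["failed"].append(branch)

    pushed = run(["git", "push", "--force-with-lease", "origin", "stage"])
    result["status"] = "ok" if pushed and not result["failed"] else "needs-follow-up"
    return result
```

Defaulting `dry_run` to `True` matches the initial-rollout guardrail: a caller has to opt in before anything is pushed. The returned `merged` and `failed` lists map directly onto the post-action Slack message.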

Related context: ENG-3570
