app-service: allow install to recover an app stuck in upgradeFailed #3437

Closed
pengpeng wants to merge 1 commit into main from fix/app-service-recover-upgradefailed-via-install

pengpeng (Member) commented Jun 21, 2026

  • Background

When an upgrade fails because the helm release is missing — e.g. the release was lost while the ApplicationManager still believes the app is installed (orphaned/incomplete prior install) — the app lands in upgradeFailed. Today that state only allows UpgradeOp/UninstallOp:

  • upgrade keeps failing early at GetDeployedReleaseVersion with release: not found,
  • install is rejected outright (install operation is not allowed for upgradeFailed state),
  • cancel is not allowed either,

so the app is wedged with no CLI recovery path.

This PR mirrors the existing InstallFailed recovery: allow InstallOp from upgradeFailed and add the upgradeFailed -> Pending transition the install handler writes. install() already tolerates an existing release (driver.ErrReleaseExists), so when an old release is still present this is effectively a no-op; when the release is gone it reinstalls cleanly.
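
The "tolerates an existing release" behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the app-service code: `errReleaseExists` is a local stand-in for helm's `driver.ErrReleaseExists`, and `releaseStore` and `installTolerant` are hypothetical names introduced only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// errReleaseExists is a local stand-in for helm's driver.ErrReleaseExists,
// so this sketch builds without external modules.
var errReleaseExists = errors.New("release: already exists")

// releaseStore is a hypothetical in-memory substitute for helm release storage.
type releaseStore map[string]bool

// create records a release and fails if one with the same name already exists.
func (s releaseStore) create(name string) error {
	if s[name] {
		return fmt.Errorf("create %q: %w", name, errReleaseExists)
	}
	s[name] = true
	return nil
}

// installTolerant treats "release already exists" as success without creating
// anything, and reports whether a new release was actually created.
func installTolerant(s releaseStore, name string) (created bool, err error) {
	err = s.create(name)
	switch {
	case err == nil:
		return true, nil // release was gone: clean reinstall
	case errors.Is(err, errReleaseExists):
		return false, nil // old release still present: effectively a no-op
	default:
		return false, err
	}
}

func main() {
	store := releaseStore{"app-with-release": true}
	for _, name := range []string{"app-with-release", "app-without-release"} {
		created, err := installTolerant(store, name)
		fmt.Println(name, "created:", created, "err:", err)
	}
}
```

Under these assumptions, both cases from the description end without an error: the app whose release survived is left untouched, and the app whose release vanished gets a fresh one.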

IsTerminalReinstallable intentionally still excludes upgradeFailed (it may hold a live previous release, so checkAppNameConflict must keep treating it as occupied); only its doc comment is updated.
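
To illustrate why keeping upgradeFailed out of IsTerminalReinstallable matters, here is a hedged sketch of a name-conflict check. The set of reinstallable states is an assumption (only `installFailed` is used here; the real set is not shown in this PR), and `nameOccupied` is a hypothetical reduction of checkAppNameConflict, not its actual signature.

```go
package main

import "fmt"

// State is a simplified stand-in for the app state type.
type State string

const (
	InstallFailed State = "installFailed"
	UpgradeFailed State = "upgradeFailed"
	Running       State = "running"
)

// isTerminalReinstallable reports whether a state holds no live release and
// may be overwritten by a fresh install.
// ASSUMPTION: only installFailed qualifies in this sketch.
func isTerminalReinstallable(s State) bool {
	switch s {
	case InstallFailed:
		return true
	default:
		// upgradeFailed stays here on purpose: it may still hold a live
		// previous release.
		return false
	}
}

// nameOccupied is a hypothetical reduction of checkAppNameConflict: a name is
// occupied when an app with that name exists in a non-reinstallable state.
func nameOccupied(existing map[string]State, name string) bool {
	s, ok := existing[name]
	return ok && !isTerminalReinstallable(s)
}

func main() {
	apps := map[string]State{"a": InstallFailed, "b": UpgradeFailed, "c": Running}
	for _, name := range []string{"a", "b", "c", "d"} {
		fmt.Println(name, "occupied:", nameOccupied(apps, name))
	}
}
```

In this sketch an app in upgradeFailed still blocks another app from taking its name, while the InstallOp permission added by this PR is decided separately by the operation table.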

Changes:

  • OperationAllowedInState[UpgradeFailed]: add InstallOp.
  • StateTransitions[UpgradeFailed]: add Pending (required by TestOperationAllowedAlignsWithStateTransitions).
  • tests: TestIsOperationAllowed / TestIsStateTransitionValid cases for the new edges.

go test ./pkg/appstate/... passes.
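
The two table edits and the alignment requirement can be sketched as below. This is a simplified model rather than the contents of pkg/appstate: the string values, the `Upgrading` and `Uninstalling` targets, the `firstStateWrittenBy` map, and the function signatures are assumptions made for illustration. Only the `InstallOp` entry and the `Pending` transition for `UpgradeFailed` come from this PR.

```go
package main

import "fmt"

// State and Operation are simplified stand-ins for the pkg/appstate types.
type State string
type Operation string

const (
	Pending       State = "pending"
	Upgrading     State = "upgrading"    // assumed target of UpgradeOp
	Uninstalling  State = "uninstalling" // assumed target of UninstallOp
	UpgradeFailed State = "upgradeFailed"
)

const (
	InstallOp   Operation = "install"
	UpgradeOp   Operation = "upgrade"
	UninstallOp Operation = "uninstall"
	CancelOp    Operation = "cancel"
)

// OperationAllowedInState lists accepted operations per state.
// InstallOp is the entry this PR adds for UpgradeFailed.
var OperationAllowedInState = map[State][]Operation{
	UpgradeFailed: {UpgradeOp, UninstallOp, InstallOp},
}

// StateTransitions lists valid next states.
// Pending is the edge this PR adds for UpgradeFailed.
var StateTransitions = map[State][]State{
	UpgradeFailed: {Upgrading, Uninstalling, Pending},
}

// firstStateWrittenBy is a hypothetical map from an operation to the state
// its handler writes first.
var firstStateWrittenBy = map[Operation]State{
	InstallOp: Pending, UpgradeOp: Upgrading, UninstallOp: Uninstalling,
}

// IsOperationAllowed reports whether op is accepted while the app is in s.
func IsOperationAllowed(s State, op Operation) bool {
	for _, allowed := range OperationAllowedInState[s] {
		if allowed == op {
			return true
		}
	}
	return false
}

// IsStateTransitionValid reports whether from -> to is a listed transition.
func IsStateTransitionValid(from, to State) bool {
	for _, next := range StateTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

// misalignedEdges returns every allowed state/operation pair whose first
// handler transition is not listed in StateTransitions.
func misalignedEdges() []string {
	var missing []string
	for state, ops := range OperationAllowedInState {
		for _, op := range ops {
			target, ok := firstStateWrittenBy[op]
			if !ok || !IsStateTransitionValid(state, target) {
				missing = append(missing, fmt.Sprintf("%s/%s", state, op))
			}
		}
	}
	return missing
}

func main() {
	fmt.Println("install allowed:", IsOperationAllowed(UpgradeFailed, InstallOp))
	fmt.Println("cancel allowed:", IsOperationAllowed(UpgradeFailed, CancelOp))
	fmt.Println("upgradeFailed -> pending valid:", IsStateTransitionValid(UpgradeFailed, Pending))
	fmt.Println("misaligned edges:", len(misalignedEdges()))
}
```

Removing `Pending` from the `UpgradeFailed` transitions in this model makes `misalignedEdges` report `upgradeFailed/install`, which is the kind of mismatch an alignment test such as TestOperationAllowedAlignsWithStateTransitions is meant to catch.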

  • Target Version for Merge

main

  • Related Issues

N/A

  • PRs Involving Sub-Systems

This is approach A (minimal recovery-side fix). Approach B (#3438) makes upgrade self-heal a missing release by installing it, so upgradeFailed never happens for a vanished release. The two are independent and can coexist; both were opened so maintainers can pick the preferred direction (or take both).

  • Other information:

app-service is a core component; the change takes effect after app-service is rebuilt/redeployed.

When an upgrade fails because the helm release is missing (e.g. the release
was lost while the ApplicationManager still believes the app is installed),
the app lands in upgradeFailed. That state only allowed UpgradeOp/UninstallOp,
so `upgrade` keeps failing at GetDeployedReleaseVersion with "release: not
found" and `install` is rejected outright ("install operation is not allowed
for upgradeFailed state") — leaving the app wedged.

Mirror the existing InstallFailed recovery: allow InstallOp from upgradeFailed
and add the upgradeFailed -> Pending transition the install handler needs.
install() already tolerates an existing release (ErrReleaseExists), so when an
old release is still present this is effectively a no-op.

IsTerminalReinstallable intentionally still excludes upgradeFailed (it may hold
a live previous release, so checkAppNameConflict must keep treating it as
occupied); its doc comment is updated to reflect that.

Co-authored-by: Cursor <cursoragent@cursor.com>

vercel Bot commented Jun 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
olares | Ready | Preview, Comment | Jun 21, 2026 10:20am

1 Skipped Deployment

Project | Deployment | Actions | Updated (UTC)
olares-docs | Ignored | | Jun 21, 2026 10:20am
