app-service: allow install to recover an app stuck in upgradeFailed #3437
Closed
pengpeng wants to merge 1 commit into
When an upgrade fails because the helm release is missing (e.g. the release
was lost while the ApplicationManager still believes the app is installed),
the app lands in upgradeFailed. That state only allowed UpgradeOp/UninstallOp,
so `upgrade` keeps failing at GetDeployedReleaseVersion with "release: not
found" and `install` is rejected outright ("install operation is not allowed
for upgradeFailed state") — leaving the app wedged.
Mirror the existing InstallFailed recovery: allow InstallOp from upgradeFailed
and add the upgradeFailed -> Pending transition the install handler needs.
install() already tolerates an existing release (ErrReleaseExists), so when an
old release is still present this is effectively a no-op.
IsTerminalReinstallable intentionally still excludes upgradeFailed (it may hold
a live previous release, so checkAppNameConflict must keep treating it as
occupied); its doc comment is updated to reflect that.
Co-authored-by: Cursor <cursoragent@cursor.com>
When an upgrade fails because the helm release is missing, e.g. the release was lost while the `ApplicationManager` still believes the app is installed (orphaned/incomplete prior install), the app lands in `upgradeFailed`. Today that state only allows `UpgradeOp`/`UninstallOp`:

- `upgrade` keeps failing early at `GetDeployedReleaseVersion` with `release: not found`,
- `install` is rejected outright (`install operation is not allowed for upgradeFailed state`),
- `cancel` is not allowed either,

so the app is wedged with no CLI recovery path.
This PR mirrors the existing `InstallFailed` recovery: allow `InstallOp` from `upgradeFailed` and add the `upgradeFailed -> Pending` transition the install handler writes. `install()` already tolerates an existing release (`driver.ErrReleaseExists`), so when an old release is still present this is effectively a no-op; when the release is gone it reinstalls cleanly.

`IsTerminalReinstallable` intentionally still excludes `upgradeFailed` (it may hold a live previous release, so `checkAppNameConflict` must keep treating it as occupied); only its doc comment is updated.

Changes:

- `OperationAllowedInState[UpgradeFailed]`: add `InstallOp`.
- `StateTransitions[UpgradeFailed]`: add `Pending` (required by `TestOperationAllowedAlignsWithStateTransitions`).
- `TestIsOperationAllowed` / `TestIsStateTransitionValid` cases for the new edges.

`go test ./pkg/appstate/...` passes.

main
N/A
This is approach A (minimal recovery-side fix). Approach B (#3438) makes `upgrade` self-heal a missing release by installing it, so `upgradeFailed` never happens for a vanished release. The two are independent and can coexist; opened both so maintainers can pick the preferred direction (or take both).

app-service is a core component; the change takes effect after app-service is rebuilt/redeployed.
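
For readers without the diff open, the two table edits listed under Changes can be sketched roughly as follows. This is a simplified stand-in, not the actual `pkg/appstate` code: the `State` and `Operation` types, their string values, and the signatures of `IsOperationAllowed` and `IsStateTransitionValid` are assumptions inferred from the identifier and test names in this PR, and the real tables contain more states and edges than shown here.

```go
package main

import "fmt"

// State and Operation are hypothetical stand-ins for the appstate types.
type State string
type Operation string

const (
	Pending       State = "pending"       // string values are assumed
	UpgradeFailed State = "upgradeFailed"
)

const (
	InstallOp   Operation = "install"
	UpgradeOp   Operation = "upgrade"
	UninstallOp Operation = "uninstall"
)

// OperationAllowedInState lists the operations accepted in each state.
// Only the UpgradeFailed row is shown; InstallOp is the entry this PR adds.
var OperationAllowedInState = map[State][]Operation{
	UpgradeFailed: {UpgradeOp, UninstallOp, InstallOp},
}

// StateTransitions lists the valid next states for each state.
// Only the new UpgradeFailed -> Pending edge is shown; other targets are omitted.
var StateTransitions = map[State][]State{
	UpgradeFailed: {Pending},
}

// IsOperationAllowed reports whether op may be started while in state s.
func IsOperationAllowed(s State, op Operation) bool {
	for _, allowed := range OperationAllowedInState[s] {
		if allowed == op {
			return true
		}
	}
	return false
}

// IsStateTransitionValid reports whether from -> to is a listed edge.
func IsStateTransitionValid(from, to State) bool {
	for _, next := range StateTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(IsOperationAllowed(UpgradeFailed, InstallOp))   // prints true
	fmt.Println(IsStateTransitionValid(UpgradeFailed, Pending)) // prints true
}
```

Both tables have to change together: allowing `InstallOp` without the `upgradeFailed -> Pending` edge would let the operation start but leave the install handler unable to write its first state, which is presumably the mismatch `TestOperationAllowedAlignsWithStateTransitions` guards against.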