Request Type
Performance improvement
Affected Workflow (if applicable)
Infrastructure (gitops-update, build, helm-update-chart, dispatch-helm)
Problem / Motivation
The github-actions-argocd-sync action fails on some runs. The failure was first attributed to the ArgoCD server taking longer than the CLI's default timeout (~90s) to respond to a sync request. It was observed on the Reporter repo (run #23253916955), where firmino-reporter-dev failed after 5 sync attempts even though the sync had been applied successfully on ArgoCD and the new image was running correctly in dev.
Root cause (confirmed): The argocd app sync command completes the sync successfully (successfully synced (all tasks run)), but the CLI exits with code 1 because there are orphaned resources that require pruning:
{"level":"fatal","msg":"2 resources require pruning","time":"2026-03-18T14:39:07-03:00"}
The orphaned resources are:
ClusterRole reporter-manager-midaz-plugins-dev
ClusterRoleBinding reporter-manager-midaz-plugins-dev
These were left behind after a rename from namespace-suffixed names to plain reporter-manager. The current entrypoint.sh redirects all output to /dev/null, hiding this error. It then retries 5 times — each retry successfully syncs but also exits 1 due to the same pruning requirement.
Initial hypothesis (timeout) was incorrect. The ~1min per attempt was the actual sync duration, not a timeout. The exit code 1 was from the pruning fatal log, not a gRPC timeout.
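To make this failure mode easy to spot in a log, the captured CLI output can be checked for the pruning message quoted above. The following is a minimal sketch, not part of the current entrypoint.sh; the function name classify_sync_failure and its return labels are hypothetical, and it only matches the "require pruning" text seen in this incident.

```shell
#!/bin/sh
# Hypothetical helper: label a failed `argocd app sync` run from its captured output.
# It only recognises the message observed in this issue; anything else is "other".
classify_sync_failure() {
  case "$1" in
    *"require pruning"*) echo "pruning-required" ;;
    *) echo "other" ;;
  esac
}
```

For example, passing the fatal log line from the failing run would print pruning-required, which points to orphaned resources rather than a timeout.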
Proposed Solution
Changes to github-actions-argocd-sync/entrypoint.sh:
- Remove > /dev/null 2>&1 from the sync command: expose the actual error message so failures are diagnosable from the GitHub Actions log. This is the most critical change; without it, the real error is invisible.
- Add --prune flag support: new optional input prune (default: false). When enabled, pass --prune to argocd app sync so orphaned resources are cleaned up automatically during sync. This prevents the "resources require pruning" fatal from causing false failures.
- Use --async on argocd app sync: fire the sync without waiting for completion. The script already has an argocd app wait step afterward that handles the confirmation. This separates sync dispatch from sync verification.
- Increase retry interval from 5s to 30s: give time for a previous sync attempt to complete before retrying.
- Add explicit --timeout to the sync and wait commands (e.g., --timeout 180) for predictable behavior regardless of CLI defaults.
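Taken together, the changes above could look roughly like the sketch below. It is an illustration under assumptions, not the actual entrypoint.sh: the function name sync_app and the variables PRUNE, SYNC_TIMEOUT, MAX_RETRIES, and RETRY_INTERVAL are hypothetical, and the --health and --sync flags on argocd app wait are one possible way to confirm the result.

```shell
#!/bin/sh
# Sketch only. Variable and function names are assumptions, not the action's real inputs.
sync_app() {
  app="$1"
  prune="${PRUNE:-false}"            # opt-in pruning, off by default
  timeout="${SYNC_TIMEOUT:-180}"     # explicit timeout instead of the CLI default
  max_retries="${MAX_RETRIES:-5}"
  interval="${RETRY_INTERVAL:-30}"   # seconds between attempts

  # Build the sync arguments; --async dispatches the sync without waiting.
  set -- app sync "$app" --async --timeout "$timeout"
  if [ "$prune" = "true" ]; then
    set -- "$@" --prune
  fi

  attempt=1
  while [ "$attempt" -le "$max_retries" ]; do
    # Output is deliberately not redirected, so CLI errors reach the Actions log.
    if argocd "$@"; then
      # Verification is a separate step: wait until the app is synced and healthy.
      argocd app wait "$app" --health --sync --timeout "$timeout"
      return $?
    fi
    echo "sync attempt $attempt/$max_retries failed" >&2
    attempt=$((attempt + 1))
    if [ "$attempt" -le "$max_retries" ]; then
      sleep "$interval"
    fi
  done
  return 1
}
```

With PRUNE=true set, a call such as sync_app firmino-reporter-dev would dispatch the sync with --prune and then wait for it. In this sketch only the dispatch is retried; a failed wait is returned as-is, which is one possible design choice rather than a requirement of the proposal.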
Alternatives Considered
- Only removing /dev/null (helps diagnosis but doesn't prevent the failure)
- Always pruning (risky in production — better as opt-in flag)
- Adding --force to sync retries (risky, could cause unintended overwrites)
Example Usage
# Existing usage remains the same (backward compatible)
- uses: LerianStudio/github-actions-argocd-sync@main
  with:
    app-name: firmino-reporter
    argo-cd-token: ${{ secrets.ARGOCD_TOKEN }}
    argo-cd-url: ${{ secrets.ARGOCD_URL }}
    env-prefix: dev
    skip-if-not-exists: true

# New: with safe pruning enabled
- uses: LerianStudio/github-actions-argocd-sync@main
  with:
    app-name: firmino-reporter
    argo-cd-token: ${{ secrets.ARGOCD_TOKEN }}
    argo-cd-url: ${{ secrets.ARGOCD_URL }}
    env-prefix: dev
    skip-if-not-exists: true
    prune: true
Would This Be a Breaking Change?
No — fully backward compatible
Checklist
Additional Context
- Related Jira ticket: DSINT-860
- Reported by Arthur Ribeiro in #devops-team
- Investigated by Lucas Bedatty — confirmed root cause via local sync without /dev/null redirect
- Orphaned resources from PR fix/reporter-cluster-role-unique-names (March 12): namespace suffix added then reverted, old resources left behind