
fix: correct order-service image name typo causing ImagePullBackOff (fixes #27)#28

Open
github-actions[bot] wants to merge 1 commit into main from fix/cluster-doctor/order-service-typo-issue-27

Summary

Fixes #27: corrects a typo in the order-service container image name. The typo causes ImagePullBackOff, which in turn makes ArgoCD report the application as Degraded.

Root Cause Analysis

Issue: Pod order-service-5d9867645-ffsrd is stuck in ImagePullBackOff in namespace default on cluster msftgbb (resource group: agentic-platform-engineering).

Root Cause: Commit 5d5dfc7 ("intentionally degrade service for testing") introduced a typo in the order-service image name in Act-3/argocd/apps/broken-aks-store-all-in-one.yaml:

Image reference:
  Broken: ghcr.io/azure-samples/aks-store-demo/orde-service:2.1.0
  Fixed:  ghcr.io/azure-samples/aks-store-demo/order-service:2.1.0

No package exists at the misspelled path, so ghcr.io rejects the kubelet's anonymous token request with 403 Forbidden (not 404). Every pull attempt therefore fails, and the pod ends up in ImagePullBackOff.
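One way to confirm the registry behavior independently of the cluster is to request an anonymous pull token for each path directly. This is a minimal sketch that assumes ghcr.io's standard registry token endpoint (`https://ghcr.io/token?scope=repository:<repo>:pull`). The helper names `ghcr_token_url` and `ghcr_token_status` are hypothetical.

```shell
# Hypothetical helpers. The token endpoint format is an assumption based on
# the standard registry token flow that ghcr.io implements.

# Build the anonymous pull-token URL for a repository path on ghcr.io.
ghcr_token_url() {
  local repo="$1"   # e.g. azure-samples/aks-store-demo/order-service
  printf 'https://ghcr.io/token?scope=repository:%s:pull\n' "$repo"
}

# Print only the HTTP status code ghcr.io returns for that token request.
ghcr_token_status() {
  curl -s -o /dev/null -w '%{http_code}' "$(ghcr_token_url "$1")"
}

# Usage (requires network access):
#   ghcr_token_status azure-samples/aks-store-demo/orde-service
#   ghcr_token_status azure-samples/aks-store-demo/order-service
```

If the root cause is correct, the misspelled path should reproduce the 403 from the kubelet event, and the corrected public package should be issued a token.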

Evidence Collected from Cluster

Pod: order-service-5d9867645-ffsrd (namespace: default)
Status: Pending / ImagePullBackOff

Events:
  Warning  Failed  kubelet  Failed to pull image "ghcr.io/azure-samples/aks-store-demo/orde-service:2.1.0":
           failed to authorize: failed to fetch anonymous token: unexpected status from GET request:
           403 Forbidden
  Warning  Failed  kubelet  Error: ErrImagePull  (x5)
  Warning  Failed  kubelet  Error: ImagePullBackOff  (x42)
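These event lines can be reduced to the exact image reference that the kubelet tried to pull, which makes a one-letter typo easier to spot. This is a minimal sketch. `failed_images` is a hypothetical helper, and it assumes the kubelet message format shown above (`Failed to pull image "<ref>"` on a single line).

```shell
# Hypothetical helper: read event text on stdin and print each distinct image
# reference that appears in a "Failed to pull image" message.
failed_images() {
  sed -n 's/.*Failed to pull image "\([^"]*\)".*/\1/p' | sort -u
}

# Usage:
#   kubectl describe pod -n default -l app=order-service | failed_images
```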

Cluster Identity Verified (2 signals):

  1. Cluster FQDN: msftgbb-dns-ic5hvueo.hcp.canadacentral.azmk8s.io
  2. Resource UID: 6993952c6544f50001ed50e9 in subscription ed38c53b-b762-4ab9-954d-a05e40f9458a
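The FQDN signal can be checked by comparing the API server URL in the current kubeconfig context against the FQDN that Azure reports for the cluster. This is a sketch under stated assumptions. `server_matches_fqdn` is hypothetical, and the comparison assumes a public AKS API server whose kubeconfig URL has the form `https://<fqdn>:443`.

```shell
# Hypothetical helper: succeed only if the host part of an API server URL
# equals the expected AKS FQDN.
server_matches_fqdn() {
  local expected_fqdn="$1" server_url="$2"
  local host="${server_url#*://}"   # drop the scheme
  host="${host%%/*}"                # drop any path
  host="${host%%:*}"                # drop the port
  [ "$host" = "$expected_fqdn" ]
}

# Usage:
#   server_matches_fqdn \
#     "$(az aks show -g agentic-platform-engineering -n msftgbb --query fqdn -o tsv)" \
#     "$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')"
```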

ArgoCD State:

  • App agentic-platform-engineering-demo: Synced ✅ / Health: Degraded ❌
  • Syncing Act-3/argocd/apps from main
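A Synced / Degraded combination means ArgoCD applied Git as written, but the resulting workload is unhealthy. That points at the manifest content rather than at sync or drift. The sketch below encodes that reading. `argocd_verdict` and its output labels are hypothetical, and the usage line assumes the Application resource lives in the `argocd` namespace.

```shell
# Hypothetical helper: map an Argo CD app's sync and health status to a
# short diagnosis label.
argocd_verdict() {
  local sync="$1" health="$2"
  case "$sync/$health" in
    Synced/Healthy)  echo "ok" ;;
    Synced/Degraded) echo "manifest-applied-but-unhealthy" ;;
    OutOfSync/*)     echo "out-of-sync" ;;
    *)               echo "check-manually" ;;
  esac
}

# Usage (namespace "argocd" is an assumption):
#   argocd_verdict $(kubectl get application agentic-platform-engineering-demo \
#     -n argocd -o jsonpath='{.status.sync.status} {.status.health.status}')
```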

Changes Made

  • File: Act-3/argocd/apps/broken-aks-store-all-in-one.yaml
  • Line: Changed orde-service → order-service in the image reference of the order-service Deployment
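The one-line change can be reproduced mechanically. This is a minimal sketch. `fix_order_service_image` is a hypothetical helper and not the agent's actual tooling. It uses `sed -i.bak`, which both GNU and BSD sed accept, and the pattern cannot match the already-correct `order-service:`, so rerunning it is harmless.

```shell
# Hypothetical helper: rewrite the misspelled repository segment in place,
# then delete the backup copy that sed -i.bak leaves behind.
fix_order_service_image() {
  local manifest="$1"
  sed -i.bak 's#aks-store-demo/orde-service:#aks-store-demo/order-service:#g' "$manifest" &&
    rm -f "$manifest.bak"
}

# Usage:
#   fix_order_service_image Act-3/argocd/apps/broken-aks-store-all-in-one.yaml
```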

Test Plan

After merging, ArgoCD's automated sync (selfHeal: true) applies the change from main. Validate with:

# Watch the new pod come up
kubectl get pods -n default -l app=order-service -w

# Confirm image pull succeeds
kubectl describe pod -n default -l app=order-service | grep -A 10 "Events:"

# Verify deployment is healthy
kubectl rollout status deployment/order-service -n default

# Confirm ArgoCD health
argocd app get agentic-platform-engineering-demo

Expected outcomes:

  1. New pod successfully pulls order-service:2.1.0
  2. Deployment reaches 1/1 ready replicas ✅
  3. ArgoCD health transitions: Degraded → Healthy
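The validation steps and expected outcomes can be folded into a single blocking check that fails if either condition is not met. This is a sketch only. `verify_fix` is hypothetical, and the 180 s and 300 s timeouts are assumptions. It relies on `kubectl wait --for=condition=Available` for the Deployment and `argocd app wait --health` for the application.

```shell
# Hypothetical post-merge check: wait for the Deployment to become Available,
# then wait for Argo CD to report the app Healthy. Timeouts are assumptions.
verify_fix() {
  local app="${1:-agentic-platform-engineering-demo}"
  kubectl wait --for=condition=Available deployment/order-service \
    -n default --timeout=180s || return 1
  argocd app wait "$app" --health --timeout 300 || return 1
  echo "order-service available and $app healthy"
}

# Usage:
#   verify_fix
```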

Rollback Plan

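kubectl rollout undo deployment/order-service -n default

Note: with selfHeal: true, ArgoCD treats this manual change as drift and reapplies the Git state, so the undo only holds if automated sync is disabled first. Reverting this PR on main is the GitOps-native rollback.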
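A manual rollback under selfHeal can be sketched as a helper that switches the application's sync policy to `none` before undoing the rollout. This prevents ArgoCD from immediately reapplying the Git state. `rollback_order_service` is hypothetical, and its default app name is taken from this PR. Once Git matches the intended state again, re-enable automated sync with `argocd app set <app> --sync-policy automated --self-heal`.

```shell
# Hypothetical rollback helper: disable automated sync so selfHeal does not
# reapply Git, then roll the Deployment back to its previous revision.
rollback_order_service() {
  local app="${1:-agentic-platform-engineering-demo}"
  argocd app set "$app" --sync-policy none || return 1
  kubectl rollout undo deployment/order-service -n default
}

# Usage:
#   rollback_order_service
```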


Automated fix created by Cluster Doctor Agent (Issue #27)

Fixes #27 - ArgoCD deployment degraded due to ImagePullBackOff on
cluster msftgbb (resource group: agentic-platform-engineering).

The container image name had a typo ('orde-service' instead of
'order-service'), causing ghcr.io to return 403 Forbidden during
anonymous token fetch, resulting in ImagePullBackOff.

  Before: ghcr.io/azure-samples/aks-store-demo/orde-service:2.1.0
  After:  ghcr.io/azure-samples/aks-store-demo/order-service:2.1.0

Evidence:
  Failed to pull image: failed to authorize: failed to fetch anonymous
  token: unexpected status from GET request to ghcr.io: 403 Forbidden

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Development

Merging this pull request may close the linked issue: 🚨 ArgoCD Deployment Failed: agentic-platform-engineering-demo
