Skip to content

Feat: add observedGeneration; simplify conditions#416

Merged
pdettori merged 9 commits into
kagenti:mainfrom
r3v5:add-observed-generation-to-status
Jun 11, 2026
Merged

Feat: add observedGeneration; simplify conditions#416
pdettori merged 9 commits into
kagenti:mainfrom
r3v5:add-observed-generation-to-status

Conversation

@r3v5

@r3v5 r3v5 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary

Implements RHAIENG-4938: simplify AgentRuntime status by adding observedGeneration and removing redundant fields/conditions.

  • Add status.observedGeneration — tracks the most recent metadata.generation processed by the controller
  • Remove status.phase (Pending/Active/Error) — redundant with Ready condition
  • Remove CardFetched and SkillsDiscovered conditions — card/skill status is observable via status.card and status.linkedSkills fields
  • Move TargetResolved=True to after applyWorkloadConfig succeeds (not just after resolveTargetRef)
  • Replace Phase printer column with Readykubectl get agentruntime now shows NAME TYPE TARGET READY AGE

Final status shape

status:
  observedGeneration: <int64>
  configuredPods: <int32>
  card: ...
  linkedSkills: [...]
  conditions:
  - type: IstioMeshEnrolled   # from PR #403
  - type: ConfigResolved
  - type: TargetResolved
  - type: Ready

Deviation from ticket

ConfigResolved kept — ticket AC says "exactly two conditions" but PR #405 (config merge) depends on ConfigResolved. Keeping it avoids cross-PR conflicts. Can be removed in a follow-up once #405 merges. Jira comment posted noting this deviation.

IstioMeshEnrolled — added by PR #403 (merged into main). Not part of this PR, but appears in the conditions list after rebase. Three states: True/NamespaceLabeled, False/OptedOut, False/PatchFailed.

Test plan

  • make test — all unit tests pass
  • make test-e2e (AgentRuntime focus) — 8/8 specs pass
  • Manual verification in Kind cluster (Podman) — all 7 ACs pass
  • make manifests generate — no drift
  • make lint — pre-existing warnings only

Manual Verification Results (Kind + Podman)

Click to expand full verification output

AC1: Printer columns show NAME, TYPE, TARGET, READY, AGE

kubectl get agentruntime -n ac-test
NAME           TYPE    TARGET        READY   AGE
test-missing   agent   nonexistent           28s
test-runtime   agent   test-agent    True    28s

PASS — Phase column gone, Ready column present.

AC2: status.observedGeneration == metadata.generation

kubectl get agentruntime test-runtime -n ac-test \
  -o jsonpath='generation={.metadata.generation} observedGeneration={.status.observedGeneration}'
generation=1 observedGeneration=1

PASS

AC3: status.phase removed

kubectl get agentruntime test-runtime -n ac-test -o jsonpath='{.status.phase}'
# (empty output)

PASS

AC4: Conditions present (TargetResolved, ConfigResolved, Ready)

kubectl get agentruntime test-runtime -n ac-test \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status} reason={.reason}{"\n"}{end}'
IstioMeshEnrolled=True reason=NamespaceLabeled
ConfigResolved=True reason=ConfigResolved
TargetResolved=True reason=TargetFound
Ready=True reason=Configured

PASS — All 3 required conditions present. IstioMeshEnrolled is from PR #403 (merged into main), not from this PR.

AC5: CardFetched condition absent

kubectl get agentruntime test-runtime -n ac-test \
  -o jsonpath='{.status.conditions[?(@.type=="CardFetched")].type}'
# (empty output)

PASS

AC6: SkillsDiscovered condition absent

kubectl get agentruntime test-runtime -n ac-test \
  -o jsonpath='{.status.conditions[?(@.type=="SkillsDiscovered")].type}'
# (empty output)

PASS

AC7: Error case — missing target sets TargetResolved=False, no Phase

TargetResolved=False reason=TargetNotFound
kubectl get agentruntime test-missing -n ac-test -o jsonpath='{.status.phase}'
# (empty output)

PASS

Full status YAML (happy path)

status:
  conditions:
  - lastTransitionTime: "2026-06-09T12:49:06Z"
    message: Namespace ac-test enrolled in Istio ambient mesh
    observedGeneration: 1
    reason: NamespaceLabeled
    status: "True"
    type: IstioMeshEnrolled
  - lastTransitionTime: "2026-06-09T12:49:06Z"
    message: Configuration resolved successfully
    observedGeneration: 1
    reason: ConfigResolved
    status: "True"
    type: ConfigResolved
  - lastTransitionTime: "2026-06-09T12:49:06Z"
    message: Deployment test-agent resolved
    observedGeneration: 1
    reason: TargetFound
    status: "True"
    type: TargetResolved
  - lastTransitionTime: "2026-06-09T12:49:06Z"
    message: Workload test-agent configured with config-hash 57de3ab75a7e
    observedGeneration: 1
    reason: Configured
    status: "True"
    type: Ready
  configuredPods: 1
  observedGeneration: 1

Summary

AC Description Result
AC1 Printer columns: NAME TYPE TARGET READY AGE PASS
AC2 observedGeneration == generation PASS
AC3 status.phase removed PASS
AC4 3 conditions: TargetResolved, ConfigResolved, Ready PASS
AC5 CardFetched condition absent PASS
AC6 SkillsDiscovered condition absent PASS
AC7 Error case: TargetResolved=False, no Phase PASS

@r3v5 r3v5 requested a review from a team as a code owner June 9, 2026 13:00
@r3v5 r3v5 changed the title feat(status): add observedGeneration; simplify conditions Feat: add observedGeneration; simplify conditions Jun 9, 2026

@pdettori pdettori left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — clean status simplification. All 15/16 checks pass; the failing E2E (should populate linkedSkills from annotation) is a pre-existing flaky test that also fails on main (run 27172754436, Jun 8) due to missing authbridge-config data in the skill-discovery namespace.

Two minor suggestions below (non-blocking).

Comment thread kagenti-operator/internal/controller/agentruntime_controller.go Outdated
updated := &agentv1alpha1.AgentRuntime{}
Expect(k8sClient.Get(ctx, nn, updated)).To(Succeed())
Expect(updated.Status.ObservedGeneration).To(Equal(updated.Generation))
})

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ~180 lines of CardFetched/SkillsDiscovered tests correctly removed. Consider adding a small test that verifies Ready=False is set when applyWorkloadConfig returns an error (the ConfigApplyError path) — this error path is currently untested and interacts with the TargetResolved relocation above.

@r3v5

r3v5 commented Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

Addressing review feedback from @pdettori

1. TargetResolved semantics (controller.go:243)

Agreed — moved TargetResolved=True back to immediately after resolveTargetRef succeeds. Cleaner semantic: "I found the deployment." The Ready condition gates the full success path.

Fixed in 6be4120.

2. ConfigApplyError test coverage (controller_test.go:235)

Good point. The ConfigApplyError path is mechanically identical to the other error paths (TargetNotFound, ConfigHashError) — all three call setCondition(Ready=False, ...) + Status().Update(). The TargetNotFound path is already tested end-to-end. Triggering ConfigApplyError specifically in envtest requires simulating a workload Update failure (RBAC, conflict) which would need a mock client — out of scope for this PR but worth a follow-up if we add client mocking infrastructure.

@r3v5 r3v5 force-pushed the add-observed-generation-to-status branch from 0cb654f to 6be4120 Compare June 10, 2026 10:47
@pdettori

Copy link
Copy Markdown
Member

@r3v5 CI has been fixed - please rebase from upstream main / resolve conflicts and force push so we can check now with full CI

r3v5 added 9 commits June 10, 2026 22:41
Tracks the most recent metadata.generation processed by the controller,
enabling clients to detect stale status via generation skew.

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Ian Miller <milleryan2003@gmail.com>
Phase (Pending/Active/Error) was redundant with conditions.
Clients should use Ready condition and observedGeneration instead.

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Ian Miller <milleryan2003@gmail.com>
These fine-grained conditions added noise without actionable signal.
Card fetch and skill discovery status is still observable via
status.card and status.linkedSkills fields respectively.

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Ian Miller <milleryan2003@gmail.com>
…er column

TargetResolved=True now set after applyWorkloadConfig succeeds, not
merely after resolveTargetRef. Replaces the CardFetched printer column
with Ready for a cleaner `kubectl get agentruntime` output.

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Ian Miller <milleryan2003@gmail.com>
…nup)

- api-reference.md: add observedGeneration, remove phase, remove
  SkillsDiscovered condition, update kubectl examples
- GETTING_STARTED.md: replace PHASE column with READY
- test/e2e/README.md: update scenario descriptions for new status shape
- test/e2e/e2e_test.go: replace Phase assertions with Ready/TargetResolved
  condition checks

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Ian Miller <milleryan2003@gmail.com>
The skill discovery e2e test could observe Ready=True from a reconcile
that ran before the controller restarted with skillDiscovery enabled,
causing linkedSkills to appear empty. Combining both assertions in a
single Eventually block ensures the test waits for a reconcile that
reflects the enabled feature gate.

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Ian Miller <milleryan2003@gmail.com>
Moves TargetResolved=True back to immediately after resolveTargetRef
succeeds, preserving the clear semantic: "I found the deployment."
The Ready condition gates the full success path.

Addresses review feedback from @pdettori.

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Ian Miller <milleryan2003@gmail.com>
Uses an updateFailClient wrapper to simulate Deployment update failures,
verifying that Ready=False with reason ConfigApplyError is set when
applyWorkloadConfig fails.

Addresses review feedback from @pdettori.

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Ian Miller <milleryan2003@gmail.com>
Main added SCC RoleBinding error path and skill status update code
that still referenced removed symbols (setPhase, RuntimePhaseError,
ConditionTypeSkillsDiscovered). Aligned with status simplification.

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Ian Miller <milleryan2003@gmail.com>
@r3v5 r3v5 force-pushed the add-observed-generation-to-status branch from a415f84 to bb9b9d2 Compare June 10, 2026 21:51
@pdettori pdettori merged commit 1c3187a into kagenti:main Jun 11, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants