Skip to content

HYPERFLEET-733 - fix: sentinel delays first event publish for newly created clusters#77

Merged
openshift-merge-bot[bot] merged 2 commits intoopenshift-hyperfleet:mainfrom
86254860:HYPERFLEET-733
Mar 20, 2026
Merged

HYPERFLEET-733 - fix: sentinel delays first event publish for newly created clusters#77
openshift-merge-bot[bot] merged 2 commits intoopenshift-hyperfleet:mainfrom
86254860:HYPERFLEET-733

Conversation

@86254860
Copy link
Contributor

@86254860 86254860 commented Mar 19, 2026

Summary

Adds a new first-priority reconciliation check that immediately publishes events for resources that have never been processed by an adapter (status.LastUpdated is zero). This eliminates first-event latency where new clusters would wait for max age intervals (10s for not-ready) instead of publishing within one poll interval (~5s).

Changes

  • Add ReasonNeverProcessed constant and never-processed reconciliation in decision engine
  • Remove created_time fallback logic for max age calculations
  • Update tests to reflect never-processed reconciliation behavior
  • Update operator guide align with the latest changes

Testing

make test-all 0 issue.

Summary by CodeRabbit

  • New Features

    • Added a "never processed" reconciliation trigger that immediately publishes resources with no prior updates.
  • Improvements

    • Reordered reconciliation priority: never-processed, generation/spec-change, then time-based max-age; time-based checks now always use last-updated as the reference.
    • Published metric now counts events for "generation changed", "never processed", or "max age exceeded".
  • Documentation

    • Updated operator guide, decision flow diagram, and CloudEvents reason details.
  • Tests

    • Updated and added tests to validate the new precedence and behaviors.

@openshift-ci openshift-ci bot requested review from ciaranRoche and vkareh March 19, 2026 10:16
@coderabbitai
Copy link

coderabbitai bot commented Mar 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b5666459-8cf9-4087-972a-2a0348031549

📥 Commits

Reviewing files that changed from the base of the PR and between d4ac25e and e89f70e.

📒 Files selected for processing (1)
  • internal/engine/decision_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/engine/decision_test.go

Walkthrough

This change introduces a prioritized three-part decision strategy in the reconciliation engine. It adds a highest-priority "never-processed" check that immediately publishes when status.last_updated is zero, adds the exported constant ReasonNeverProcessed, reorders evaluation to check generation mismatch (generation > ObservedGeneration) second, and uses status.last_updated as the sole timestamp for max-age decisions (removing the created_time fallback). Tests and documentation were updated to reflect the new priority and reasons.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: introducing a never-processed reconciliation check that publishes events immediately for newly created clusters, eliminating first-event latency.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/engine/decision_test.go (1)

430-447: Add one precedence test where both triggers are true.

Please add a case with lastUpdated = time.Time{} and generation > observedGeneration to lock in that never-processed has priority over generation-based reconciliation.

Suggested table case
+		{
+			name:               "never processed takes priority over generation mismatch",
+			createdTime:        now.Add(-1 * time.Hour),
+			lastUpdated:        time.Time{},
+			ready:              true,
+			wantShouldPublish:  true,
+			wantReasonContains: ReasonNeverProcessed,
+			description:        "When both conditions match, never-processed should win by priority",
+		},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/engine/decision_test.go` around lines 430 - 447, Add a new test case
in the decision_test.go table (the slice of test cases used by the decision
evaluation) named e.g. "never processed takes precedence over generation" where
lastUpdated is time.Time{} (zero), generation is greater than
observedGeneration, ready can be true or false, wantShouldPublish is true and
wantReasonContains is ReasonNeverProcessed; this ensures the never-processed
branch (lastUpdated == zero) is preferred over a generation-based reconciliation
trigger by asserting the decision returns publish=true and the reason contains
ReasonNeverProcessed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/sentinel-operator-guide.md`:
- Around line 84-89: Update the intro paragraph that currently calls the model a
“dual-trigger” system to match the three-part reconciliation strategy described
in the section; in the "Sentinel's decision engine" intro replace "dual-trigger"
and the two-trigger list with wording that states a three-part decision strategy
and enumerates the three triggers: Never-Processed Reconciliation, State-Based
Reconciliation, and Time-Based Reconciliation so the intro and the detailed list
are consistent.

---

Nitpick comments:
In `@internal/engine/decision_test.go`:
- Around line 430-447: Add a new test case in the decision_test.go table (the
slice of test cases used by the decision evaluation) named e.g. "never processed
takes precedence over generation" where lastUpdated is time.Time{} (zero),
generation is greater than observedGeneration, ready can be true or false,
wantShouldPublish is true and wantReasonContains is ReasonNeverProcessed; this
ensures the never-processed branch (lastUpdated == zero) is preferred over a
generation-based reconciliation trigger by asserting the decision returns
publish=true and the reason contains ReasonNeverProcessed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a04a5f87-b88b-4286-9da0-646ffa09feee

📥 Commits

Reviewing files that changed from the base of the PR and between 3f193c7 and b5acc74.

📒 Files selected for processing (3)
  • docs/sentinel-operator-guide.md
  • internal/engine/decision.go
  • internal/engine/decision_test.go

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/sentinel-operator-guide.md`:
- Line 104: The docs use mixed casing for the same status field—replace all
occurrences of status.LastUpdated and any "LastUpdated" in mermaid diagrams with
snake_case status.last_updated to match the API/CEL conventions (e.g.,
created_time) and update the CEL variables table and diagram labels accordingly
so the field name is consistently last_updated throughout the document.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: eaafd01f-8926-419c-9318-aca242aba0d6

📥 Commits

Reviewing files that changed from the base of the PR and between b5acc74 and baafeb8.

📒 Files selected for processing (3)
  • docs/sentinel-operator-guide.md
  • internal/engine/decision.go
  • internal/engine/decision_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/engine/decision.go
  • internal/engine/decision_test.go

Copy link
Contributor

@ciaranRoche ciaranRoche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Contributor

@xueli181114 xueli181114 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-structured change. The three-part decision priority (never-processed → generation → max-age) is sound and the code reads clearly.

What's good:

  • The LastUpdated.IsZero() check before the generation check is the right call — it eliminates the created_time fallback which was a leaky abstraction (mixing API creation metadata with adapter processing state)
  • Removing the referenceTime fallback to CreatedTime simplifies the max-age path and eliminates a subtle bug where newly created resources with Ready=true would wait 30m instead of publishing immediately
  • Precedence test (TestDecisionEngine_Evaluate_NeverProcessedPrecedence) locks in the ordering — good defensive testing
  • Doc updates are thorough and the mermaid diagram is clearer than the previous split diagrams

Minor observations (non-blocking):

  1. decision.go:95 — The referenceTime variable is now redundant. Since the IsZero() early return guarantees LastUpdated is non-zero at this point, you could use resource.Status.LastUpdated directly in the Add(maxAge) call and drop the local variable. Not worth a respin, but worth noting.

  2. The CreatedTime field on Resource is now unused by the decision engine. If no other code path references it for decision-making, consider documenting that CreatedTime is purely informational (not used in reconciliation logic) to prevent future developers from re-introducing the fallback pattern.

/lgtm

@openshift-ci
Copy link

openshift-ci bot commented Mar 20, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ciaranRoche, xueli181114

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [ciaranRoche,xueli181114]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit a86043c into openshift-hyperfleet:main Mar 20, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants