Skip to content

HYPERFLEET-537 - docs: configurable conditions-based decision making#102

Open
rafabene wants to merge 1 commit intomainfrom
HYPERFLEET-537
Open

HYPERFLEET-537 - docs: configurable conditions-based decision making#102
rafabene wants to merge 1 commit intomainfrom
HYPERFLEET-537

Conversation

@rafabene
Copy link
Contributor

@rafabene rafabene commented Mar 11, 2026

Summary

  • Replace hardcoded max_age_not_ready/max_age_ready with configurable conditions section using CEL expressions
  • Update sentinel flowchart to include CEL compilation at startup and conditions-based evaluation flow
  • Update Decision Engine description, test scenarios, and config examples across all sentinel docs
  • Update architecture-summary.md for consistency

Files Changed

File Change
sentinel-config.yaml New conditions block replaces max_age_not_ready/max_age_ready
sentinel.md Flowchart, Decision Engine, Decision Logic, Config Loader, Resource Filtering examples, Test Scenarios
sentinel-deployment.md ConfigMap updated with new config format
architecture-summary.md Config examples and Decision Algorithm updated

Related

Test plan

  • No remaining references to max_age_not_ready/max_age_ready in the repository
  • All YAML config examples use consistent conditions format
  • Flowchart renders correctly with new CEL compilation step
  • Test scenarios include new edge cases (no rule match fallback, CEL compilation failure)

Summary by CodeRabbit

  • Documentation

    • Updated Sentinel docs, diagrams and deployment notes for CEL-based decision rules, reference_time semantics, generation-priority reconciliation, startup validation, and updated metrics/labels.
  • New Features

    • Condition-driven reconciliation using configurable CEL expressions (message_decision) and immediate publish when resource generation advances.
    • Region-specific decision parameters, templated CloudEvent/message composition, and broker/config separation to support multiple broker backends.
  • Chores

    • Metrics now report rule-based reasons and a new config_loads_total metric.

@openshift-ci openshift-ci bot requested review from rh-amarin and xueli181114 March 11, 2026 19:32
@openshift-ci
Copy link

openshift-ci bot commented Mar 11, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign tirthct for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 11, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Replaces fixed max-age timeouts with a configurable CEL-based message_decision block that defines named params (ref_time, is_ready, age_exceeded_ready, age_exceeded_not_ready) and a boolean result expression to decide publish vs skip. Adds immediate reconciliation when resource.generation > condition(resource, "Available").observed_generation by publishing a CreateEvent without waiting for age logic. CEL expressions are compiled at startup (fail-fast). Sentinel configs (including regional variants) now include message_decision; broker/config separation introduced; metrics and skip-reason labels updated to reflect generation_mismatch and message_decision outcomes.

Sequence Diagram(s)

sequenceDiagram
    participant Config as Configuration
    participant CELEngine as CEL Engine
    participant Monitor as Resource Monitor
    participant Decision as Decision Loop
    participant Broker as Event Broker

    Config->>CELEngine: Load & compile message_decision (fail-fast)
    CELEngine-->>Config: Compilation OK

    Monitor->>Decision: Poll resource / read conditions
    Decision->>Decision: Check resource.generation vs condition(...).observed_generation

    alt generation > observed_generation
        Decision->>Broker: Publish CreateEvent (immediate)
    else generation == observed_generation
        Decision->>CELEngine: Evaluate message_decision params (ref_time,is_ready,age_exceeded_*)
        CELEngine-->>Decision: Boolean result
        alt result == true
            Decision->>Broker: Publish Reconciliation Event
        else result == false
            Decision-->>Decision: Skip (record reason: message_decision)
        end
    end

    Broker-->>Broker: Queue/forward event
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: replacing hardcoded max-age configuration with configurable, conditions-based decision-making using CEL expressions.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch HYPERFLEET-537
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
hyperfleet/components/sentinel/sentinel.md (1)

608-638: ⚠️ Potential issue | 🟠 Major

Use one observed_generation path throughout the spec.

This section compares against resource.status.conditions.Ready.observed_generation, but the rest of the page and the test scenarios use resource.status.observed_generation. Please normalize on a single contract here; otherwise the Decision Engine description is ambiguous.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/components/sentinel/sentinel.md` around lines 608 - 638, The
document inconsistently references two observed_generation paths; normalize to a
single contract by replacing the use of
resource.status.conditions.Ready.observed_generation with the canonical
resource.status.observed_generation (or vice versa if you prefer that canonical
name) everywhere in the Decision Logic and Implementation Requirements so the
generation check in Evaluate(resource, now) consistently compares
resource.generation against the same observed_generation path; update all
references, examples, and test scenario mentions and ensure the text and logging
(e.g., the generation mismatch reason string) reflect the chosen single field
name.
hyperfleet/components/sentinel/sentinel-deployment.md (1)

43-61: ⚠️ Potential issue | 🟠 Major

The pod spec still doesn't consume the documented broker ConfigMap.

This file now introduces hyperfleet-sentinel-broker, but the Deployment only pulls GCP_PROJECT_ID from sentinel-config and never imports any BROKER_* keys. As written, the RabbitMQ example never reaches the container environment, and Pub/Sub has two competing config sources.

Also applies to: 121-177

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/components/sentinel/sentinel-deployment.md` around lines 43 - 61,
The Deployment pod spec is missing env entries that consume the new
hyperfleet-sentinel-broker ConfigMap so BROKER_* values never reach the
container; update the env block to add the required BROKER_* env vars (e.g.,
BROKER_TYPE, BROKER_URL, BROKER_USER, BROKER_PASSWORD or whatever keys your
broker consumer expects) with valueFrom: configMapKeyRef name:
hyperfleet-sentinel-broker and the appropriate key names, ensure
BROKER_CREDENTIALS (if still needed) remains sourced from sentinel-secrets
(secretKeyRef) only when required, and remove or reconcile any duplicate BROKER
sources so Pub/Sub continues to use GCP_PROJECT_ID from sentinel-config while
RabbitMQ/other brokers use hyperfleet-sentinel-broker.
🧹 Nitpick comments (1)
hyperfleet/components/sentinel/sentinel-deployment.md (1)

126-134: Expose rule-oriented labels in the metrics spec.

Once skip/publish decisions come from arbitrary CEL rules, a fixed ready_state={ready,not_ready} label no longer tells operators which rule matched and cannot represent the no-match fallback. A matched_rule or decision_reason label would fit the new configuration model much better.

Also applies to: 209-225

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/components/sentinel/sentinel-deployment.md` around lines 126 -
134, The metrics spec currently emits a fixed ready_state label which can't
convey which CEL rule matched or represent a no-match fallback; update the
metrics emission to include a rule-oriented label (e.g., matched_rule or
decision_reason) populated with the matching rule name (use the rule "name"
values such as isReady and isNotReady) and ensure a deterministic fallback value
(e.g., "no_match") when no rule matches; apply the same change to the other
rules block referenced (lines ~209-225) so all rule-based decisions expose the
matching rule name and a clear no-match reason in metrics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hyperfleet/architecture/architecture-summary.md`:
- Around line 268-279: Update the later sequence diagrams so their decision
boxes match the documented Decision Algorithm: change the checks to first
compare resource.generation vs resource.status.observed_generation
(generation-first reconciliation), then evaluate CEL condition rules to select
matched_rule.max_age (or smallest max_age if none), compute reference_time from
the rule expression, and use now >= reference_time + max_age to trigger
PublishEvent(CreateEvent(resource)); replace any hardcoded phase != "Ready" or
last_updated_time + 10s logic with these same symbols (FetchResources,
resource.generation, resource.status.observed_generation, matched_rule.max_age,
reference_time, PublishEvent, CreateEvent, CEL) so the diagrams and prose are
consistent.

In `@hyperfleet/components/sentinel/sentinel.md`:
- Around line 238-255: Update the doc to specify the timestamp fallback used
when no rule matches: state that reference_time (used by nextEventTime) will use
conditionTime(resource, "<conditionName>") if available, otherwise fall back to
resource.metadata.creationTimestamp (and if that is missing, use the resource's
observedGeneration/now as a last resort); mention this behavior wherever
reference_time, nextEventTime, or conditionTime are described (including the
default-rule/no-match section) so Test 9’s scenario is fully specified. Ensure
the prose notes that rules still pick the smallest max_age when none match but
that the timestamp for ROI calculation falls back as described.

---

Outside diff comments:
In `@hyperfleet/components/sentinel/sentinel-deployment.md`:
- Around line 43-61: The Deployment pod spec is missing env entries that consume
the new hyperfleet-sentinel-broker ConfigMap so BROKER_* values never reach the
container; update the env block to add the required BROKER_* env vars (e.g.,
BROKER_TYPE, BROKER_URL, BROKER_USER, BROKER_PASSWORD or whatever keys your
broker consumer expects) with valueFrom: configMapKeyRef name:
hyperfleet-sentinel-broker and the appropriate key names, ensure
BROKER_CREDENTIALS (if still needed) remains sourced from sentinel-secrets
(secretKeyRef) only when required, and remove or reconcile any duplicate BROKER
sources so Pub/Sub continues to use GCP_PROJECT_ID from sentinel-config while
RabbitMQ/other brokers use hyperfleet-sentinel-broker.

In `@hyperfleet/components/sentinel/sentinel.md`:
- Around line 608-638: The document inconsistently references two
observed_generation paths; normalize to a single contract by replacing the use
of resource.status.conditions.Ready.observed_generation with the canonical
resource.status.observed_generation (or vice versa if you prefer that canonical
name) everywhere in the Decision Logic and Implementation Requirements so the
generation check in Evaluate(resource, now) consistently compares
resource.generation against the same observed_generation path; update all
references, examples, and test scenario mentions and ensure the text and logging
(e.g., the generation mismatch reason string) reflect the chosen single field
name.

---

Nitpick comments:
In `@hyperfleet/components/sentinel/sentinel-deployment.md`:
- Around line 126-134: The metrics spec currently emits a fixed ready_state
label which can't convey which CEL rule matched or represent a no-match
fallback; update the metrics emission to include a rule-oriented label (e.g.,
matched_rule or decision_reason) populated with the matching rule name (use the
rule "name" values such as isReady and isNotReady) and ensure a deterministic
fallback value (e.g., "no_match") when no rule matches; apply the same change to
the other rules block referenced (lines ~209-225) so all rule-based decisions
expose the matching rule name and a clear no-match reason in metrics.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d3a74e28-76b6-4274-99be-601c3ab31699

📥 Commits

Reviewing files that changed from the base of the PR and between e40d5bc and a8fbe6b.

📒 Files selected for processing (4)
  • hyperfleet/architecture/architecture-summary.md
  • hyperfleet/components/sentinel/sentinel-config.yaml
  • hyperfleet/components/sentinel/sentinel-deployment.md
  • hyperfleet/components/sentinel/sentinel.md

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
hyperfleet/components/sentinel/sentinel.md (1)

905-916: Test 9 could be clearer about reference_time fallback behavior.

Test 9 documents the scenario where no condition rule matches, correctly stating it falls back to the smallest max_age. However, the test states reference_time = now() - 15s without explaining how this value is determined when the resource "has no Ready condition (or unexpected status)."

According to line 249 in this file, when the reference_time expression evaluation fails, it should fall back to created_time. Test 9 should clarify this to fully document the edge case behavior mentioned in the past review.

🧪 Suggested test clarification
 **Test 9: No condition rule matches → use smallest max_age as fallback**

Given:

  • Resource has no Ready condition (or unexpected status)
  • resource.generation = 1
  • resource.status.observed_generation = 1
    • reference_time = now() - 15s
    • reference_time = resource.metadata.created_time = now() - 15s # Falls back to created_time when conditionTime() fails
    • No rule matches → fallback to smallest max_age (10s)
      Then:
    • Decision: PUBLISH
    • Reason: "max age expired (fallback: no rule matched)"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/components/sentinel/sentinel.md` around lines 905 - 916, Update
Test 9 in sentinel.md to explicitly state that when the resource "has no Ready
condition (or unexpected status)" the reference_time expression evaluation fails
and therefore reference_time falls back to resource.metadata.created_time (e.g.,
"reference_time = resource.metadata.created_time = now() - 15s  # Falls back to
created_time when conditionTime() fails"); keep the rest of the scenario and
expected Decision/Reason unchanged so readers understand the fallback behavior
described around line 249.
hyperfleet/architecture/architecture-summary.md (2)

268-279: Clarify reference_time behavior when no rule matches.

The decision algorithm mentions using "smallest max_age if no match" (line 275) but doesn't specify what reference_time value is used in that scenario. Since reference_time is typically derived from a CEL expression like conditionTime(resource, "Ready"), readers may wonder what timestamp applies when no rule matches (e.g., when the resource lacks the expected condition).

Based on the past review comment indicating this was addressed in sentinel.md, consider adding a brief note here for completeness.

📝 Suggested addition
   Evaluate condition rules (CEL expressions) to find matching rule
   max_age = matched_rule.max_age (or smallest max_age if no match)
-  reference_time = evaluate reference_time expression
+  reference_time = evaluate reference_time expression (falls back to created_time if evaluation fails)

   IF now >= reference_time + max_age:
     PublishEvent(broker, CreateEvent(resource))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/architecture/architecture-summary.md` around lines 268 - 279, The
Decision Algorithm text is unclear about which reference_time to use when no
rule matches; update the description for reference_time (the symbol
reference_time) to state the fallback behavior: when matched_rule is absent,
evaluate reference_time using the same default expression rules (e.g., evaluate
the CEL expression if provided, otherwise use
resource.status.observed_generation timestamp or
resource.metadata.creationTimestamp as a final fallback) and document that
max_age uses the smallest configured value when no match; mention the relevant
symbols FetchResources, matched_rule, max_age, reference_time, PublishEvent and
CreateEvent so readers can find and confirm the logic.

232-240: Document CEL custom functions or cross-reference their definitions.

The configuration introduces CEL expressions using custom functions like conditionTime(resource, "Ready") and status(resource, "Ready"), but this file doesn't explain what these functions do or their return types. Readers encountering this architecture summary may struggle to understand the decision logic without knowing the function semantics.

📚 Suggested clarification

Add a brief note after the configuration example explaining the custom CEL functions:

 conditions:
   reference_time: 'conditionTime(resource, "Ready")'
   rules:
     - name: isReady
       expression: 'status(resource, "Ready") == "True"'
       max_age: 30m
     - name: isNotReady
       expression: 'status(resource, "Ready") == "False"'
       max_age: 10s
+
+# CEL Custom Functions:
+# - conditionTime(resource, "conditionName"): Returns the last_updated_time of the specified condition
+# - status(resource, "conditionName"): Returns the status value ("True", "False", "Unknown") of the specified condition
+# See sentinel.md for complete CEL function documentation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/architecture/architecture-summary.md` around lines 232 - 240, Add
a short explanatory note after the conditions example that documents the custom
CEL functions conditionTime(resource, "Ready") and status(resource, "Ready"):
state what each function returns (e.g., conditionTime returns an RFC3339
timestamp or duration since the named condition was set; status returns the
string value of the named condition such as "True"/"False"/"Unknown"), describe
expected parameter types (resource object and condition name string), and either
link or cross-reference the function implementations/definitions so readers can
find the exact semantics and return types used by the CEL expressions in this
file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@hyperfleet/architecture/architecture-summary.md`:
- Around line 268-279: The Decision Algorithm text is unclear about which
reference_time to use when no rule matches; update the description for
reference_time (the symbol reference_time) to state the fallback behavior: when
matched_rule is absent, evaluate reference_time using the same default
expression rules (e.g., evaluate the CEL expression if provided, otherwise use
resource.status.observed_generation timestamp or
resource.metadata.creationTimestamp as a final fallback) and document that
max_age uses the smallest configured value when no match; mention the relevant
symbols FetchResources, matched_rule, max_age, reference_time, PublishEvent and
CreateEvent so readers can find and confirm the logic.
- Around line 232-240: Add a short explanatory note after the conditions example
that documents the custom CEL functions conditionTime(resource, "Ready") and
status(resource, "Ready"): state what each function returns (e.g., conditionTime
returns an RFC3339 timestamp or duration since the named condition was set;
status returns the string value of the named condition such as
"True"/"False"/"Unknown"), describe expected parameter types (resource object
and condition name string), and either link or cross-reference the function
implementations/definitions so readers can find the exact semantics and return
types used by the CEL expressions in this file.

In `@hyperfleet/components/sentinel/sentinel.md`:
- Around line 905-916: Update Test 9 in sentinel.md to explicitly state that
when the resource "has no Ready condition (or unexpected status)" the
reference_time expression evaluation fails and therefore reference_time falls
back to resource.metadata.created_time (e.g., "reference_time =
resource.metadata.created_time = now() - 15s  # Falls back to created_time when
conditionTime() fails"); keep the rest of the scenario and expected
Decision/Reason unchanged so readers understand the fallback behavior described
around line 249.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6df6ec8f-eafa-48ee-a059-4c1608fcea0d

📥 Commits

Reviewing files that changed from the base of the PR and between a8fbe6b and d2f2665.

📒 Files selected for processing (3)
  • hyperfleet/architecture/architecture-summary.md
  • hyperfleet/components/sentinel/sentinel-deployment.md
  • hyperfleet/components/sentinel/sentinel.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • hyperfleet/components/sentinel/sentinel-deployment.md

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hyperfleet/architecture/architecture-summary.md`:
- Around line 274-279: Update the summary algorithm to specify the same fallback
used in sentinel.md for computing reference_time: when evaluating
conditionTime(resource, "Ready") returns no timestamp or no rule matches (so
matched_rule is absent), fall back to resource.created_time (or the documented
alternative in sentinel.md) before computing reference_time + max_age;
explicitly mention reference_time, matched_rule, max_age, and
conditionTime(resource, "Ready") in the text so the no-match and no-timestamp
paths produce a well-defined reference_time used in the IF now >= reference_time
+ max_age check.
- Around line 218-222: Update the responsibilities list to reflect the new
generation-first decision model: replace references to the old trigger-based
checks (status.phase and status.last_updated_time) with the new reconciliation
decision flow that uses generation-first reconciliation plus CEL-selected
max_age and reference_time; specifically, modify the Decision Logic bullet to
state that reconciliation is triggered by new generation detection and/or
CEL-evaluated max_age/reference_time rules (using CEL expressions) and remove or
deprecate the prior reliance on status.phase/status.last_updated_time so the
summary uses a single, consistent decision model.

In `@hyperfleet/components/sentinel/sentinel.md`:
- Around line 215-218: State in the main algorithm that when evaluating
condition(resource, "Available").observed_generation it must default to 0 if the
Available condition is missing (to avoid CEL/runtime errors), and treat
resource.generation > observed_generation using that default; update the rule
that references resource.generation and condition(resource,
"Available").observed_generation to explicitly say "missing Available condition
= 0 (observed_generation = 0)" so new resources are handled deterministically
(this mirrors the behavior described later in Test 8).
- Around line 24-26: Clarify ownership of timing/generation fields by explicitly
stating whether adapters write per-condition fields
(status.conditions[].last_updated_time and
status.conditions[].observed_generation) or only the top-level observed_time /
observed_generation and the API derives the per-condition values; update the
sentinel.md text to pick one model (recommended: adapters POST top-level
observed_time/observed_generation and the API populates status.conditions[]
last_updated_time and observed_generation) and change the lines describing "Uses
adapter status updates" and any subsequent contract wording to reflect that
single, unambiguous flow, referencing the symbols
status.conditions[].last_updated_time, status.conditions[].observed_generation,
observed_time, and observed_generation so readers know which side is
authoritative.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cc00851b-dcd1-4c20-952a-ecaf7ebcb1d1

📥 Commits

Reviewing files that changed from the base of the PR and between d2f2665 and c4a18ec.

📒 Files selected for processing (2)
  • hyperfleet/architecture/architecture-summary.md
  • hyperfleet/components/sentinel/sentinel.md

@rafabene rafabene marked this pull request as draft March 12, 2026 18:26
@rafabene rafabene marked this pull request as ready for review March 17, 2026 19:25
@openshift-ci openshift-ci bot requested review from 86254860 and Mischulee March 17, 2026 19:25
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@hyperfleet/components/sentinel/sentinel.md`:
- Around line 215-226: Add explicit wording to the Decision Logic section
clarifying missing-Available behavior: state that when condition(resource,
"Available") is absent its observed_generation should be treated as 0 (so
condition(resource, "Available").observed_generation defaults to 0), which
causes the Generation Mismatch check to trigger immediate reconciliation for new
resources; also reference that this aligns with message_decision.params and
message_decision.result evaluation flow so readers understand the precedence.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4d00326d-9dd5-46bc-a61e-b8141e6798f0

📥 Commits

Reviewing files that changed from the base of the PR and between c4a18ec and fcd347c.

📒 Files selected for processing (5)
  • hyperfleet/architecture/architecture-summary.md
  • hyperfleet/components/adapter/framework/adapter-flow-diagrams.md
  • hyperfleet/components/sentinel/sentinel-config.yaml
  • hyperfleet/components/sentinel/sentinel-deployment.md
  • hyperfleet/components/sentinel/sentinel.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • hyperfleet/components/sentinel/sentinel-config.yaml

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
hyperfleet/components/sentinel/sentinel-deployment.md (2)

215-226: ⚠️ Potential issue | 🟠 Major

reason label semantics are internally inconsistent.

Line 215 describes skipped reasons as "matched rule name"/"no_match", while Line 226 describes decision reasons like "generation_mismatch"/"message_decision" and param-specific skip reasons. Please define one canonical taxonomy to avoid divergent metric implementations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/components/sentinel/sentinel-deployment.md` around lines 215 -
226, Unify the `reason` label taxonomy across the sentinel metrics: choose and
document one canonical set of reason values (e.g., "no_match",
"rule:<rule_name>", "generation_mismatch", "message_decision",
"param:<param_name>") and update the metric docs for
`hyperfleet_sentinel_resources_skipped_total` and any decision-related
descriptions so they reference this single taxonomy; ensure the `reason`
semantics are explicitly described (how to encode rule names vs param-specific
skips) and referenced in the Implementation Requirements so all code paths
updating `reason` (the skipped counter and decision recording) use the same
canonical values.

60-65: ⚠️ Potential issue | 🟠 Major

BROKER_CREDENTIALS is still required in the manifest despite being documented as optional.

Line 60 says credentials are only needed for some brokers, but Lines 61-65 always require sentinel-secrets/broker-credentials. This can block startup when that key is intentionally absent.

Suggested doc fix
         - name: BROKER_CREDENTIALS
           valueFrom:
             secretKeyRef:
               name: sentinel-secrets
               key: broker-credentials
+              optional: true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/components/sentinel/sentinel-deployment.md` around lines 60 - 65,
The manifest currently requires BROKER_CREDENTIALS unconditionally (env var
BROKER_CREDENTIALS using secretKeyRef sentinel-secrets → broker-credentials);
make this optional by either removing/commenting the hard requirement in the
example and documenting it as optional, or update the secret reference to allow
absence by setting the SecretKeySelector to optional (i.e., mark the
secretKeyRef for broker-credentials as optional) so the pod can start when
sentinel-secrets/broker-credentials is intentionally absent; update the text
around BROKER_CREDENTIALS to reflect the optional behavior.
♻️ Duplicate comments (2)
hyperfleet/components/sentinel/sentinel.md (1)

24-25: ⚠️ Potential issue | 🟠 Major

Field ownership is misstated for condition-level status data.

Line 24/Line 42 currently say adapters update status.conditions[].last_updated_time and status.conditions[].observed_generation directly. That conflicts with the documented contract where adapters POST status reports and the API produces aggregated condition fields Sentinel reads. Please make ownership explicit to avoid implementation drift.

Also applies to: 42-42

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/components/sentinel/sentinel.md` around lines 24 - 25, The doc
incorrectly states that adapters update status.conditions[].last_updated_time
and status.conditions[].observed_generation directly; instead, state that
adapters submit status reports (e.g., via POST) and the API aggregates those
reports into the condition-level fields that Sentinel reads; update the
sentences mentioning status.conditions[].last_updated_time and
status.conditions[].observed_generation (and any text about adapters updating
them) to clarify that adapters produce reports and the API owns and sets the
aggregated condition fields which Sentinel consumes.
hyperfleet/architecture/architecture-summary.md (1)

219-220: ⚠️ Potential issue | 🟠 Major

Use the concrete observed-generation source, not ambiguous observedGeneration.

At Line 219 and Line 268, observedGeneration is undefined/ambiguous. This should explicitly reference condition(resource, "Available").observed_generation (with the documented default behavior when missing) so the algorithm matches the API contract and the sentinel docs.

Also applies to: 268-273

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hyperfleet/architecture/architecture-summary.md` around lines 219 - 220, The
docs reference an ambiguous observedGeneration; update the text and any
pseudo-code that uses resource.generation > observedGeneration to instead use
the concrete API field condition(resource, "Available").observed_generation (and
mention the documented default behavior when that field is missing), and apply
this change to both places that mention observedGeneration (the generation check
and the later algorithm block that spans the configurable message decision), so
the algorithm explicitly matches the API contract and sentinel docs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@hyperfleet/components/sentinel/sentinel-deployment.md`:
- Around line 215-226: Unify the `reason` label taxonomy across the sentinel
metrics: choose and document one canonical set of reason values (e.g.,
"no_match", "rule:<rule_name>", "generation_mismatch", "message_decision",
"param:<param_name>") and update the metric docs for
`hyperfleet_sentinel_resources_skipped_total` and any decision-related
descriptions so they reference this single taxonomy; ensure the `reason`
semantics are explicitly described (how to encode rule names vs param-specific
skips) and referenced in the Implementation Requirements so all code paths
updating `reason` (the skipped counter and decision recording) use the same
canonical values.
- Around line 60-65: The manifest currently requires BROKER_CREDENTIALS
unconditionally (env var BROKER_CREDENTIALS using secretKeyRef sentinel-secrets
→ broker-credentials); make this optional by either removing/commenting the hard
requirement in the example and documenting it as optional, or update the secret
reference to allow absence by setting the SecretKeySelector to optional (i.e.,
mark the secretKeyRef for broker-credentials as optional) so the pod can start
when sentinel-secrets/broker-credentials is intentionally absent; update the
text around BROKER_CREDENTIALS to reflect the optional behavior.

---

Duplicate comments:
In `@hyperfleet/architecture/architecture-summary.md`:
- Around line 219-220: The docs reference an ambiguous observedGeneration;
update the text and any pseudo-code that uses resource.generation >
observedGeneration to instead use the concrete API field condition(resource,
"Available").observed_generation (and mention the documented default behavior
when that field is missing), and apply this change to both places that mention
observedGeneration (the generation check and the later algorithm block that
spans the configurable message decision), so the algorithm explicitly matches
the API contract and sentinel docs.

In `@hyperfleet/components/sentinel/sentinel.md`:
- Around line 24-25: The doc incorrectly states that adapters update
status.conditions[].last_updated_time and
status.conditions[].observed_generation directly; instead, state that adapters
submit status reports (e.g., via POST) and the API aggregates those reports into
the condition-level fields that Sentinel reads; update the sentences mentioning
status.conditions[].last_updated_time and
status.conditions[].observed_generation (and any text about adapters updating
them) to clarify that adapters produce reports and the API owns and sets the
aggregated condition fields which Sentinel consumes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4d5f6306-2133-46cc-ade8-1d3641a17a32

📥 Commits

Reviewing files that changed from the base of the PR and between fcd347c and 4ac6092.

📒 Files selected for processing (5)
  • hyperfleet/architecture/architecture-summary.md
  • hyperfleet/components/adapter/framework/adapter-flow-diagrams.md
  • hyperfleet/components/sentinel/sentinel-config.yaml
  • hyperfleet/components/sentinel/sentinel-deployment.md
  • hyperfleet/components/sentinel/sentinel.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • hyperfleet/components/adapter/framework/adapter-flow-diagrams.md
  • hyperfleet/components/sentinel/sentinel-config.yaml

Copy link
Contributor

@xueli181114 xueli181114 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Focusing on technical gaps and consistency issues.

Overall direction is sound — making the decision engine configurable aligns with the adapter framework's CEL patterns. But the documentation needs tightening before it becomes the implementation contract.

- Introduce message_decision with CEL params and boolean result expression
- Single condition(name) CEL function for accessing resource status
- Pure CEL operators (no AND/OR aliases)
- Unified decision logic (no hardcoded generation check)
- Ready==False covers generation mismatch (API computes it)
- Remove sentinel-config.yaml and sentinel-deployment.md to avoid drift
- Update architecture-summary.md to match
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants