Separate concerns for compute-pcrs by Jakob-Naucke · Pull Request #294 · trusted-execution-clusters/operator

Jakob-Naucke · 2026-06-26T10:20:05Z

retry image pcrs update in compute-pcrs
do not update status in compute-pcrs
refactor for things that don't need to be shared and functions that don't need to be macros

related #269

Summary by Sourcery

Separate status management from PCR computation and make ConfigMap updates for image PCRs more robust.

New Features:

Initialize structured logging in the compute-pcrs binary.

Bug Fixes:

Retry image PCR ConfigMap updates in compute-pcrs to handle update conflicts.
Ensure completed PCR compute jobs update ApprovedImage status conditions in the operator rather than in the compute-pcrs binary.

Enhancements:

Replace macros for status and image PCR updates with reusable async helper functions in the operator.
Move condition and transition-time utilities from the shared library into the operator where they are used.
Make ImagePcr clonable to support reuse during ConfigMap update retries.
Add dedicated helper and tests for updating image PCRs ConfigMaps from the operator.

If another process had written, replace will fail. Instead, loop and retry on conflict. Add comments on "if this becomes a performance issue". Signed-off-by: Jakob Naucke <jnaucke@redhat.com> Assisted-by: Opus 4.6

and turn into a function, no longer needed in compute-pcrs. Signed-off-by: Jakob Naucke <jnaucke@redhat.com> Assisted-by: Opus 4.6

openshift-ci · 2026-06-26T10:20:09Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Jakob-Naucke

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sourcery-ai · 2026-06-26T10:20:12Z

Reviewer's Guide

Refactors status/condition helpers and image PCRs update logic to live in the operator crate, replaces macros with regular functions, and changes compute-pcrs so it only updates the ConfigMap with retry-on-conflict while the operator is responsible for status updates on ApprovedImage and cluster resources.

Sequence diagram for compute-pcrs ConfigMap update with retry-on-conflict

sequenceDiagram
    actor JobPod
    participant ComputePcrs as compute_pcrs_main
    participant KubeAPI as Kube_API
    participant ImagePcrsCM as ConfigMap_image_pcrs

    JobPod->>ComputePcrs: main()
    loop until ConfigMap replace succeeds
        ComputePcrs->>KubeAPI: ConfigMap::get(PCR_CONFIG_MAP)
        KubeAPI-->>ComputePcrs: image_pcrs_map
        ComputePcrs->>ComputePcrs: serde_json::from_str(ImagePcrs)
        ComputePcrs->>ComputePcrs: image_pcrs.0.insert(resource_name, ImagePcr)
        ComputePcrs->>KubeAPI: ConfigMap::replace(PCR_CONFIG_MAP, image_pcrs_map)
        alt conflict 409
            KubeAPI-->>ComputePcrs: Error Api{code:409}
            ComputePcrs->>ComputePcrs: retry
        else success
            KubeAPI-->>ComputePcrs: Ok
            ComputePcrs->>JobPod: Ok(())
        end
    end

Sequence diagram for operator job_reconcile updating ApprovedImage status

sequenceDiagram
    participant Operator as job_reconcile
    participant KubeAPI as Kube_API
    participant ApprovedImage as ApprovedImage
    participant ImagePcrsCM as ConfigMap_image_pcrs
    participant Trustee as trustee_update_reference_values

    Operator->>KubeAPI: Job::status
    KubeAPI-->>Operator: Job{status:Complete}
    Operator->>KubeAPI: Job::metadata.owner_references
    KubeAPI-->>Operator: OwnerReference{name}

    Operator->>KubeAPI: Api<ApprovedImage>::get(name)
    KubeAPI-->>Operator: ApprovedImage
    Operator->>Operator: committed_condition(INSTALLED_REASON, generation, None)
    Operator->>Operator: ApprovedImageStatus{conditions}
    Operator->>KubeAPI: update_status(Api<ApprovedImage>, name, status)
    KubeAPI-->>Operator: Ok(())

    Operator->>KubeAPI: Api<ConfigMap>::get(PCR_CONFIG_MAP)
    KubeAPI-->>Operator: image_pcrs_map
    Operator->>Operator: update_image_pcrs(config_maps, map, ImagePcrs)
    Operator->>KubeAPI: ConfigMap::replace(PCR_CONFIG_MAP, map)
    KubeAPI-->>Operator: Ok(())

    Operator->>Trustee: update_reference_values(Client)
    Trustee-->>Operator: Ok(())

    Operator->>KubeAPI: Job::delete(name, DeleteParams::foreground)
    KubeAPI-->>Operator: Ok(())

File-Level Changes

Change	Details	Files
Move status/condition helper code from the shared lib into the operator crate and replace macro-based status updates with a typed async helper.	Remove update_status macro, condition helpers, and trait/impls from lib.rs Introduce async update_status function and condition helpers (condition_status, Conditions trait, transition_time, committed_condition) in operator/src/lib.rs Wire operator main and conditions modules to use the new update_status function and locally defined helpers instead of the removed lib exports	`lib/src/lib.rs` `operator/src/lib.rs` `operator/src/main.rs` `operator/src/conditions.rs`
Refactor image PCRs update logic into a reusable async function in the operator and adjust callers and tests accordingly.	Remove update_image_pcrs macro from lib/reference_values.rs and make ImagePcr clonable Add async update_image_pcrs function in operator/reference_values.rs that serializes ImagePcrs and replaces the PCR ConfigMap Update handle_new_image and disallow_image to call update_image_pcrs and return proper errors Extend tests to cover update_image_pcrs success/error paths and adjust existing tests for new behavior (e.g., ownerReferences, new API call order)	`lib/src/reference_values.rs` `operator/src/reference_values.rs`
Change compute-pcrs binary to only update image PCRs in the ConfigMap with retry-on-conflict and no longer manipulate ApprovedImage status.	Initialize env_logger for compute-pcrs for runtime logging Fetch and update the image-pcrs ConfigMap in a retry loop, handling 409 conflict responses by re-reading and retrying the replace Stop fetching ApprovedImage resources and stop updating their status conditions from compute-pcrs; only modify the ConfigMap Add Clone derive to ImagePcr to support cloning when updating the map	`compute-pcrs/src/main.rs` `lib/src/reference_values.rs` `compute-pcrs/Cargo.toml`
Ensure ApprovedImage status is updated by the operator when compute-pcrs jobs complete instead of by the compute-pcrs binary itself.	Extend job_reconcile to resolve the owning ApprovedImage from the Job ownerReference Compute a committed condition using committed_condition and update ApprovedImageStatus via update_status Adapt job_reconcile test expectations to the new API call order and behavior	`operator/src/reference_values.rs`
Update attestation key approval to use the new async status update helper.	Replace update_status macro usage with update_status function when approving AttestationKey resources	`operator/src/attestation_key_register.rs`

Possibly linked issues

feat: migrate to Server-Side Apply (SSA) and enforce controller owner… #269: PR directly targets compute-pcrs failure/robustness by retried ConfigMap updates and moving status updates from the job.

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey - I've found 2 issues, and left some high level feedback:

The retry loop in compute-pcrs for handling 409 conflicts on the ConfigMap update has no backoff or max retry limit; consider adding a small delay and/or attempt cap to avoid a tight infinite loop if the conflict persists.
In update_status, the generic bounds on K are quite broad and Debug is not used; you may be able to simplify the signature (and potentially move the constraint to the Api<K> caller) to make the helper more generally reusable and easier to read.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The retry loop in `compute-pcrs` for handling 409 conflicts on the ConfigMap update has no backoff or max retry limit; consider adding a small delay and/or attempt cap to avoid a tight infinite loop if the conflict persists.
- In `update_status`, the generic bounds on `K` are quite broad and `Debug` is not used; you may be able to simplify the signature (and potentially move the constraint to the `Api<K>` caller) to make the helper more generally reusable and easier to read.

## Individual Comments

### Comment 1
<location path="operator/src/reference_values.rs" line_range="129-130" />
<code_context>
         return Ok(Action::requeue(Duration::from_secs(300)));
     }
+
+    let err = "Completed job had no owner reference";
+    let resource_name = &job.meta().owner_references.as_ref().context(err)?[0].name;
+    let approved_images: Api<ApprovedImage> = Api::default_namespaced(kube_client.clone());
+    let try_image = approved_images.get(resource_name).await;
</code_context>
<issue_to_address>
**issue (bug_risk):** Accessing the first owner reference by index can panic if the vector is empty.

`context(err)?` only guards the `Option`; it doesn’t prevent `owner_references` from being `Some(vec![])`, which will panic on `[0]`. Use something like `and_then(|v| v.first())` to ensure a non-empty vector while still surfacing the error via `Context`, e.g.:

```rust
let owner = job
    .meta()
    .owner_references
    .as_ref()
    .and_then(|v| v.first())
    .context(err)?;
let resource_name = &owner.name;
```
</issue_to_address>

### Comment 2
<location path="compute-pcrs/src/main.rs" line_range="66-75" />
<code_context>
+    // If we see this causing performance problems, consider NoSQL
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The tight retry loop on ConfigMap conflicts lacks backoff or a retry limit.

Under high contention this can devolve into a busy-wait and generate excessive API calls. Add a brief backoff (e.g., `tokio::time::sleep`) and/or a maximum retry count so that persistent conflicts fail with an error instead of looping indefinitely.

Suggested implementation:

```rust
    // If we see this causing performance problems, consider NoSQL
    const MAX_PCR_CONFIGMAP_RETRIES: u32 = 10;
    let mut pcr_configmap_retries: u32 = 0;

    loop {
        if pcr_configmap_retries >= MAX_PCR_CONFIGMAP_RETRIES {
            anyhow::bail!(
                "failed to update PCR ConfigMap after {MAX_PCR_CONFIGMAP_RETRIES} retries due to conflicts"
            );
        }

        // simple linear backoff to avoid busy-waiting under contention
        if pcr_configmap_retries > 0 {
            // scale by attempt to slightly increase delay on repeated conflicts
            sleep(Duration::from_millis(50 * pcr_configmap_retries as u64)).await;
        }

        pcr_configmap_retries += 1;

        let mut image_pcrs_map = config_maps.get(PCR_CONFIG_MAP).await?;

```

To compile and run correctly, you will also need to:

1. Import the backoff primitives near the top of `compute-pcrs/src/main.rs`:
   - `use std::time::Duration;`
   - `use tokio::time::sleep;`

2. Ensure that the loop breaks out on a successful ConfigMap update (e.g., after a successful `config_maps.replace(...)` call). The retry counter and backoff only help if the code:
   - `continue`s the loop specifically on conflict (HTTP 409) from the Kubernetes API, and
   - `break`s or returns on success or non-retryable errors.

If the conflict handling logic is not yet present, you should:
- Match on the `kube::Error` from the update call,
- `continue` the loop only when the error is a 409 conflict,
- Return the error immediately for all other failure modes.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

not from compute-pcrs job for separation of concerns. Signed-off-by: Jakob Naucke <jnaucke@redhat.com> Assisted-by: Opus 4.6

no longer needed, makes lib/ slimmer Signed-off-by: Jakob Naucke <jnaucke@redhat.com> Assisted-by: Opus 4.6

Jakob-Naucke · 2026-06-26T11:12:55Z

-    let status = ApprovedImageStatus { conditions };
-    update_status!(approved_images, &args.resource_name, status)?;
+    // If we see this causing performance problems, consider NoSQL
+    loop {


suggestion from @alicefr: instead of "managing" concurrent writes from the job, write back into an extra status field of the ApprovedImage and let the operator collect

openshift-ci · 2026-06-26T11:44:49Z

@Jakob-Naucke: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/operator-lifecycle-verify	`3409d7a`	link	true	`/test operator-lifecycle-verify`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Jakob-Naucke added 2 commits June 26, 2026 11:54

compute-pcrs: loop-retry configmap replacement

cbf8d8b

If another process had written, replace will fail. Instead, loop and retry on conflict. Add comments on "if this becomes a performance issue". Signed-off-by: Jakob Naucke <jnaucke@redhat.com> Assisted-by: Opus 4.6

Move update_image_pcrs! into operator

58ad716

and turn into a function, no longer needed in compute-pcrs. Signed-off-by: Jakob Naucke <jnaucke@redhat.com> Assisted-by: Opus 4.6

Jakob-Naucke requested review from SpaceFace02 and alicefr June 26, 2026 10:20

sourcery-ai Bot reviewed Jun 26, 2026

View reviewed changes

Comment thread operator/src/reference_values.rs Outdated

Comment thread compute-pcrs/src/main.rs

Jakob-Naucke force-pushed the sep-compute-pcrs-concerns branch from ee4443f to 69932a0 Compare June 26, 2026 10:30

Update ApprovedImage status from operator

14b589b

not from compute-pcrs job for separation of concerns. Signed-off-by: Jakob Naucke <jnaucke@redhat.com> Assisted-by: Opus 4.6

Jakob-Naucke force-pushed the sep-compute-pcrs-concerns branch from 69932a0 to d0441b1 Compare June 26, 2026 10:37

Move status functionality out of lib

3409d7a

no longer needed, makes lib/ slimmer Signed-off-by: Jakob Naucke <jnaucke@redhat.com> Assisted-by: Opus 4.6

Jakob-Naucke force-pushed the sep-compute-pcrs-concerns branch from d0441b1 to 3409d7a Compare June 26, 2026 10:49

Jakob-Naucke commented Jun 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Separate concerns for compute-pcrs#294

Separate concerns for compute-pcrs#294
Jakob-Naucke wants to merge 4 commits into
trusted-execution-clusters:mainfrom
Jakob-Naucke:sep-compute-pcrs-concerns

Jakob-Naucke commented Jun 26, 2026 •

edited by sourcery-ai Bot

Loading

Uh oh!

openshift-ci Bot commented Jun 26, 2026

Uh oh!

sourcery-ai Bot commented Jun 26, 2026 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Jakob-Naucke Jun 26, 2026

Uh oh!

openshift-ci Bot commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Jakob-Naucke commented Jun 26, 2026 • edited by sourcery-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by Sourcery

Uh oh!

openshift-ci Bot commented Jun 26, 2026

Uh oh!

sourcery-ai Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for compute-pcrs ConfigMap update with retry-on-conflict

Sequence diagram for operator job_reconcile updating ApprovedImage status

File-Level Changes

Possibly linked issues

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Jakob-Naucke Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

openshift-ci Bot commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Jakob-Naucke commented Jun 26, 2026 •

edited by sourcery-ai Bot

Loading

sourcery-ai Bot commented Jun 26, 2026 •

edited

Loading