fix: resolve concurrent state corruption in NetworkController #1082

Open
pulkitvats2007-crypto wants to merge 2 commits into nephio-project:main from pulkitvats2007-crypto:fix-network-controller-race

@pulkitvats2007-crypto
Contributor

Bug: Concurrent State Corruption in NetworkController

The NetworkController had a critical concurrency flaw: per-reconciliation state was stored on the reconciler struct itself. Because the reconciler is a shared singleton, concurrent reconciliation loops mutated the same resources field (and, previously, the devices field).

In a real-world Nephio setup, where multiple Network CRs are reconciled simultaneously, this caused state from one reconciliation to leak into another. As a result, configuration such as IPAM allocations, VLAN assignments, or OpenConfig data intended for one network could be applied to a different one.

This issue was particularly dangerous because it often failed silently: the controller would report a successful reconciliation while leaving the underlying infrastructure in an inconsistent or incorrect state. In some cases, concurrent writes to shared fields could also trigger race conditions and lead to controller panics.
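
To make the failure mode concrete, here is a minimal sketch of the pre-fix shape. It is not the actual NetworkController code: the sharedReconciler type, the prepare and apply methods, and the string-based resources are hypothetical stand-ins. Instead of spawning goroutines, it replays one unlucky interleaving sequentially so the outcome is deterministic.

```go
package main

import "fmt"

// resources is a hypothetical stand-in for the per-reconciliation state.
type resources struct {
	items []string
}

// sharedReconciler mimics the pre-fix shape (assumption): per-reconciliation
// state lives on the struct, and every reconciliation shares that struct.
type sharedReconciler struct {
	resources *resources
}

// prepare is the first half of a reconciliation: it resets the shared field
// and records what this network needs.
func (r *sharedReconciler) prepare(network string) {
	r.resources = &resources{}
	r.resources.items = append(r.resources.items, network+"/vlan")
}

// apply is the second half: it reads whatever the shared field holds now,
// which may belong to a different reconciliation.
func (r *sharedReconciler) apply(network string) string {
	return fmt.Sprintf("%s applied %v", network, r.resources.items)
}

// interleave replays one possible schedule of two reconciliations:
// both run prepare before either runs apply.
func interleave() (string, string) {
	r := &sharedReconciler{}
	r.prepare("net-a")
	r.prepare("net-b") // overwrites net-a's state on the shared struct
	return r.apply("net-a"), r.apply("net-b")
}

func main() {
	a, b := interleave()
	fmt.Println(a) // net-a applied [net-b/vlan]
	fmt.Println(b) // net-b applied [net-b/vlan]
}
```

In this sketch net-a reports success while applying net-b's data, which is the silent failure described above. With real goroutines the same access pattern is also a data race, which Go's race detector (go test -race) is designed to report.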


Fix: Isolate Reconciliation State & Ensure Thread Safety

This PR resolves the issue by eliminating shared mutable state from the reconciler and ensuring that each reconciliation loop operates on its own isolated data.

Key changes:

  • Removed shared fields: Eliminated resources and unused devices from the reconciler struct to prevent cross-request mutation.
  • Scoped state locally: Moved initialization of the resources object into the Reconcile function so each invocation gets a fresh instance.
  • Explicit dependency passing: Updated helper functions like applyInitialResources and getNewResources to accept resources as a parameter instead of relying on struct-level state.
  • Cleaned up dead code: Removed the unused devices field to simplify the controller and reduce confusion.
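
The changes listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the helper names applyInitialResources and getNewResources come from the description, but their signatures, the resources type, and the simplified Reconcile(network string) method are hypothetical (the real controller-runtime Reconcile takes a context and a request).

```go
package main

import (
	"fmt"
	"sync"
)

// resources is a hypothetical stand-in for the per-reconciliation state.
type resources struct {
	items []string
}

// reconciler no longer carries per-reconciliation fields, so sharing one
// instance across goroutines is safe in this sketch.
type reconciler struct{}

// applyInitialResources receives the state explicitly (assumed signature).
func (r *reconciler) applyInitialResources(res *resources, network string) {
	res.items = append(res.items, network+"/initial")
}

// getNewResources receives the state explicitly (assumed signature).
func (r *reconciler) getNewResources(res *resources, network string) {
	res.items = append(res.items, network+"/vlan")
}

// Reconcile creates a fresh resources value on every call and passes it to
// the helpers, so nothing leaks between invocations.
func (r *reconciler) Reconcile(network string) []string {
	res := &resources{}
	r.applyInitialResources(res, network)
	r.getNewResources(res, network)
	return res.items
}

// reconcileAll runs one reconciliation per network concurrently on a single
// shared reconciler. Each goroutine writes only to its own slice index.
func reconcileAll(networks []string) [][]string {
	r := &reconciler{}
	out := make([][]string, len(networks))
	var wg sync.WaitGroup
	for i, n := range networks {
		wg.Add(1)
		go func(i int, n string) {
			defer wg.Done()
			out[i] = r.Reconcile(n)
		}(i, n)
	}
	wg.Wait()
	return out
}

func main() {
	for _, items := range reconcileAll([]string{"net-a", "net-b", "net-c"}) {
		fmt.Println(items)
	}
}
```

Because every call owns its resources value, each result contains only entries for its own network regardless of how the goroutines are scheduled.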

Result

The NetworkController no longer shares per-reconciliation state, so it is thread-safe and behaves correctly under concurrent reconciliation. Each Network CR is processed independently, which eliminates cross-resource contamination and keeps infrastructure configuration reliable and predictable. This matters especially in Nephio’s GitOps-driven workflows, where batch updates are common.

Signed-off-by: pulkitvats2007-crypto <pulkitvats2007@gmail.com>
@nephio-prow nephio-prow Bot requested review from efiacor and liamfallon March 29, 2026 08:12
@nephio-prow
Contributor

nephio-prow Bot commented Mar 29, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign liamfallon for approval by writing /assign @liamfallon in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@linux-foundation-easycla

linux-foundation-easycla Bot commented Mar 29, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: pulkitvats2007-crypto / name: pulkitvats2007-crypto (655a84c, 7eff209)

@nephio-prow
Contributor

nephio-prow Bot commented Mar 29, 2026

Hi @pulkitvats2007-crypto. Thanks for your PR.

I'm waiting for a nephio-project member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@efiacor
Collaborator

efiacor commented Mar 29, 2026

/ok-to-test

Signed-off-by: pulkitvats2007-crypto <pulkitvats2007@gmail.com>
@pulkitvats2007-crypto
Contributor Author

/assign @liamfallon
