fix: resolve concurrent state corruption in NetworkController #1082
pulkitvats2007-crypto wants to merge 2 commits into
Signed-off-by: pulkitvats2007-crypto <pulkitvats2007@gmail.com>
/ok-to-test
/assign @liamfallon
Bug: Concurrent State Corruption in NetworkController
The NetworkController had a critical concurrency flaw: per-reconciliation state was stored on the reconciler struct itself. Since the reconciler is a shared singleton, concurrent reconciliation loops ended up mutating the same resources (and previously devices) fields.
In a real-world Nephio setup, where multiple Network CRs are reconciled simultaneously, this caused state from one reconciliation to leak into another. As a result, configurations such as IPAM allocations, VLAN assignments, or OpenConfig data intended for one network could be incorrectly applied to a different one.
This issue was particularly dangerous because it often failed silently: the controller would report a successful reconciliation while leaving the underlying infrastructure in an inconsistent or incorrect state. In some cases, concurrent writes to shared fields could also trigger race conditions and lead to controller panics.
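The leak described above can be illustrated with a minimal sketch. It is not the NetworkController code: sharedReconciler, start, collect, apply, and the net-a/net-b names are all hypothetical, and the two "concurrent" reconciliations are replayed in one fixed order on a single goroutine so the outcome is deterministic.

```go
package main

import "fmt"

// sharedReconciler is a hypothetical stand-in for a reconciler that keeps
// per-reconciliation state on the struct shared by every reconcile loop.
type sharedReconciler struct {
	resources []string
}

// start resets the shared field at the beginning of a reconciliation.
func (r *sharedReconciler) start() { r.resources = nil }

// collect records a resource that should be applied for the given network.
func (r *sharedReconciler) collect(network, kind string) {
	r.resources = append(r.resources, network+"/"+kind)
}

// apply returns a copy of whatever the shared field holds when it is called.
func (r *sharedReconciler) apply() []string {
	return append([]string(nil), r.resources...)
}

// interleave replays one possible ordering of two overlapping
// reconciliations against the same reconciler instance.
func interleave() (appliedForA, appliedForB []string) {
	r := &sharedReconciler{}

	r.start() // reconciliation of net-a begins
	r.collect("net-a", "vlan")

	r.start() // reconciliation of net-b begins and wipes net-a's state
	r.collect("net-b", "vlan")

	appliedForA = r.apply() // net-a now applies net-b's resources
	appliedForB = r.apply()
	return appliedForA, appliedForB
}

func main() {
	a, b := interleave()
	fmt.Println("applied for net-a:", a) // prints "applied for net-a: [net-b/vlan]"
	fmt.Println("applied for net-b:", b) // prints "applied for net-b: [net-b/vlan]"
}
```

Both reconciliations would report success here, even though net-a ends up with net-b's resources, which mirrors the silent failure mode described above. With real goroutines the same access pattern is also a data race, so the result would vary from run to run.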
Fix: Isolate Reconciliation State & Ensure Thread Safety
This PR resolves the issue by eliminating shared mutable state from the reconciler and ensuring that each reconciliation loop operates on its own isolated data.
Key changes:
- Removed resources and the unused devices field from the reconciler struct to prevent cross-request mutation.
- Moved the resources object into the Reconcile function so each invocation gets a fresh instance.
- Updated applyInitialResources and getNewResources to accept resources as a parameter instead of relying on struct-level state.
- Removed the devices field to simplify the controller and reduce confusion.
Result
The NetworkController is now thread-safe and behaves correctly under concurrent reconciliation scenarios. Each Network CR is processed independently, eliminating cross-resource contamination and ensuring reliable, predictable infrastructure configuration. This is especially important in Nephio’s GitOps-driven workflows, where batch updates are common.
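The shape of the fix can be sketched roughly as follows. This is a simplified illustration under assumptions, not the PR's diff: the function names applyInitialResources and getNewResources come from the description, but their signatures, the resources type, the string-based Reconcile argument, and the reconcileAll harness are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// resources is a hypothetical per-reconciliation container.
type resources struct {
	items []string
}

// reconciler carries no per-reconciliation fields, so sharing one instance
// across goroutines is safe.
type reconciler struct{}

// getNewResources fills the container supplied by the caller
// (signature assumed for illustration).
func (r *reconciler) getNewResources(res *resources, network string) {
	res.items = append(res.items, network+"/vlan", network+"/ipam")
}

// applyInitialResources reports what would be applied from the container
// supplied by the caller (signature assumed for illustration).
func (r *reconciler) applyInitialResources(res *resources) []string {
	return res.items
}

// Reconcile creates a fresh resources value on every invocation and passes
// it down explicitly instead of storing it on the struct.
func (r *reconciler) Reconcile(network string) []string {
	res := &resources{}
	r.getNewResources(res, network)
	return r.applyInitialResources(res)
}

// reconcileAll runs one goroutine per network against a single shared
// reconciler. The mutex guards only the result map of this demo harness.
func reconcileAll(networks []string) map[string][]string {
	r := &reconciler{}
	out := make(map[string][]string, len(networks))
	var mu sync.Mutex
	var wg sync.WaitGroup
	for _, n := range networks {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			applied := r.Reconcile(name)
			mu.Lock()
			out[name] = applied
			mu.Unlock()
		}(n)
	}
	wg.Wait()
	return out
}

func main() {
	for name, applied := range reconcileAll([]string{"net-a", "net-b"}) {
		fmt.Println(name, applied)
	}
}
```

Because each call owns its resources value, the result for a given network no longer depends on how the goroutines interleave. Running a sketch like this with `go run -race` is one way to check that no shared field is written concurrently.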