Problem
When the NSO sets initial Unknown/Pending conditions on a Gateway and then hits an error, it does not requeue the reconcile. Because the gateway spec hasn't changed, no subsequent watch event fires, and the gateway is never retried.
Discovered during incident datum-cloud/engineering#258. Even after the root GatewayClass issue was fixed via a patch to the gateway resource, the NSO did not retry; the gateway had to be deleted and recreated to force an ADDED event.
Proposed Fix
Ensure that any error during initial gateway reconciliation results in a requeue with backoff, rather than relying solely on watch events to re-trigger reconciliation.