We have a plugin that monitors for RunPodSandbox events. We observed that if a RunPodSandbox request is in flight while the NRI plugin starts up and registers, the pod sandbox event is missed: it is delivered in neither Synchronize nor RunPodSandbox.
Here's the timeline:
- Kubelet issues a RunPodSandbox creation event
- Containerd starts to process the RunPodSandbox, and creates a pod sandbox in sandboxstore.StateUnknown
- Containerd doesn't send a RunPodSandbox NRI event (because no NRI plugin is registered just yet)
- The NRI plugin starts up and registers
- Containerd registers the plugin and synchronizes its state. As part of doing so, it lists all the pod sandboxes, but note that it filters out sandboxes in sandboxstore.StateUnknown
- The NRI plugin receives the synchronized list of pod sandboxes, but it misses the pod from the first step because that sandbox was in the Unknown state
- The RunPodSandbox completes
- The RunPodSandbox event was missed by both the Synchronize call and the RunPodSandbox NRI events!
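
The timeline above can be replayed with a small toy model. This is only a sketch to make the ordering concrete: the Runtime and Plugin types, their methods, and the pod ID are all hypothetical and are not containerd's or NRI's actual API. The one behavior taken from the report is that synchronization skips sandboxes still in the Unknown state.

```go
package main

import "fmt"

// State is a simplified stand-in for containerd's sandbox states.
type State int

const (
	StateUnknown State = iota // creation still in flight
	StateReady
)

// Sandbox is a hypothetical, minimal pod sandbox record.
type Sandbox struct {
	ID    string
	State State
}

// Plugin records every sandbox ID it learns about, from either path.
type Plugin struct {
	seen map[string]bool
}

// Runtime is a toy model of the runtime side; it is not containerd's API.
type Runtime struct {
	sandboxes map[string]*Sandbox
	plugin    *Plugin // nil until a plugin registers
}

// BeginRunPodSandbox stores the sandbox as Unknown and emits the
// RunPodSandbox event only if a plugin is already registered.
func (r *Runtime) BeginRunPodSandbox(id string) {
	r.sandboxes[id] = &Sandbox{ID: id, State: StateUnknown}
	if r.plugin != nil {
		r.plugin.seen[id] = true
	}
}

// FinishRunPodSandbox marks the sandbox Ready; no event is sent here.
func (r *Runtime) FinishRunPodSandbox(id string) {
	r.sandboxes[id].State = StateReady
}

// Register attaches the plugin and synchronizes state, skipping
// sandboxes that are still Unknown (the filter described in the report).
func (r *Runtime) Register(p *Plugin) {
	r.plugin = p
	for id, sb := range r.sandboxes {
		if sb.State == StateUnknown {
			continue
		}
		p.seen[id] = true
	}
}

// simulate replays the timeline and reports whether the plugin saw the pod.
func simulate() bool {
	r := &Runtime{sandboxes: map[string]*Sandbox{}}
	p := &Plugin{seen: map[string]bool{}}
	r.BeginRunPodSandbox("pod-a") // request arrives, no plugin registered yet
	r.Register(p)                 // plugin starts up while the request is in flight
	r.FinishRunPodSandbox("pod-a")
	return p.seen["pod-a"]
}

func main() {
	fmt.Println("plugin saw pod-a:", simulate()) // prints "plugin saw pod-a: false"
}
```

In this model the plugin never learns about pod-a, even though the sandbox ends up Ready.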
Expected behavior:
I would expect every pod sandbox event to be delivered in either Synchronize or RunPodSandbox. One approach to consider is for Synchronize to return pod sandbox creations that are in flight (i.e. don't exclude Unknown state pod sandboxes).
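
As a rough illustration of the suggested approach, the sketch below compares a listing that drops Unknown sandboxes with one that keeps them. The listForSync function and its includeInFlight parameter are hypothetical names invented for this example; they are assumptions, not part of containerd or NRI.

```go
package main

import (
	"fmt"
	"sort"
)

// State is a simplified stand-in for containerd's sandbox states.
type State int

const (
	StateUnknown State = iota // creation still in flight
	StateReady
)

// Sandbox is a hypothetical, minimal pod sandbox record.
type Sandbox struct {
	ID    string
	State State
}

// listForSync returns the sandbox IDs to report in Synchronize.
// When includeInFlight is true, sandboxes still in StateUnknown are kept.
func listForSync(all []Sandbox, includeInFlight bool) []string {
	var ids []string
	for _, sb := range all {
		if sb.State == StateUnknown && !includeInFlight {
			continue
		}
		ids = append(ids, sb.ID)
	}
	sort.Strings(ids) // stable order for display
	return ids
}

func main() {
	all := []Sandbox{
		{ID: "pod-a", State: StateUnknown},
		{ID: "pod-b", State: StateReady},
	}
	fmt.Println(listForSync(all, false)) // filtering as described: [pod-b]
	fmt.Println(listForSync(all, true))  // suggested behavior: [pod-a pod-b]
}
```

If in-flight sandboxes were reported this way, a plugin might see the same sandbox in both Synchronize and a later RunPodSandbox event, so it would probably need to deduplicate by sandbox ID.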