network: add cleanupOrphanTaps() to remove orphan TAPs by carrier state #407
base: main
Conversation
network: add cleanupOrphanTaps() to remove orphan TAPs by carrier state

- Call the new function cleanupOrphanTaps() at the start of DynamicNetwork.NetworkSetup().
- Add cleanupOrphanTaps(): scan the netns for interfaces matching ^tap.*_urunc$ and use the kernel carrier/operational state as the sole criterion:
  - NO-CARRIER => delete the orphan (remove TC/qdisc, then delete the link)
  - LOWER_UP / operational up / FlagRunning => treat as in-use and abort
- Do not scan /proc or check /dev/net/tun; do not attempt to reuse TAPs.
- Skip cleanup when no container interface (e.g. no eth0) is present.
- Remove PID/FD based checks and the netns flock; document the single-unikernel-per-netns assumption.
- Preserve networkSetup() create-only semantics and ensure TC/qdisc cleanup before link deletion.

This resolves an issue on Kubernetes where restarting urunc left orphan TAP devices in the pod network namespace and prevented subsequent network setup.

Signed-off-by: sidneychang <2190206983@qq.com>
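For context, a minimal Go sketch of what such a carrier-state cleanup pass could look like, assuming the github.com/vishvananda/netlink and golang.org/x/sys/unix packages; the package name, error messages, and exact in-use test are illustrative assumptions, not the PR's actual code:

```go
// Hypothetical sketch of an orphan-TAP cleanup pass; names are illustrative.
package network

import (
	"fmt"
	"net"
	"regexp"

	"github.com/vishvananda/netlink"
	"golang.org/x/sys/unix"
)

var uruncTapRe = regexp.MustCompile(`^tap.*_urunc$`)

// cleanupOrphanTaps scans the current network namespace for urunc TAP
// devices and removes those whose carrier is down (NO-CARRIER), i.e. no
// hypervisor holds the TAP open. A TAP that reports LOWER_UP / RUNNING
// is treated as in use and aborts the setup.
func cleanupOrphanTaps() error {
	links, err := netlink.LinkList()
	if err != nil {
		return fmt.Errorf("listing links: %w", err)
	}
	for _, link := range links {
		attrs := link.Attrs()
		if !uruncTapRe.MatchString(attrs.Name) {
			continue
		}
		// IFF_LOWER_UP / RUNNING / operstate up mean the TAP has a
		// carrier, so a unikernel is presumably still attached to it.
		inUse := attrs.RawFlags&unix.IFF_LOWER_UP != 0 ||
			attrs.Flags&net.FlagRunning != 0 ||
			attrs.OperState == netlink.OperUp
		if inUse {
			return fmt.Errorf("TAP %s is still in use", attrs.Name)
		}
		// NO-CARRIER: drop any qdiscs first, then the link itself.
		if qdiscs, err := netlink.QdiscList(link); err == nil {
			for _, q := range qdiscs {
				_ = netlink.QdiscDel(q) // best effort
			}
		}
		if err := netlink.LinkDel(link); err != nil {
			return fmt.Errorf("deleting orphan TAP %s: %w", attrs.Name, err)
		}
	}
	return nil
}
```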
cmainas
left a comment
Hello @sidneychang ,
thank you for this fix. This proposal, however, affects all deployments with urunc, adding some overhead even to deployments that are not affected by the issue. Therefore, I would prefer to dig a bit further and identify why reusing the TAP device does not work. I will follow up here with more info as soon as I find time to investigate this in more detail.
Hello, I’ve verified locally that reusing an existing TAP device is feasible and works well in Pod restart scenarios, avoiding failures caused by leftover TAP devices.

Currently, NetworkSetup blocks startup based solely on the presence of a TAP device. This approach infers unikernel liveness from device existence, which can lead to false positives after crashes or restarts. My proposal is to decouple these concerns by checking for an active hypervisor process in the same network namespace to determine whether a unikernel is actually running. If no hypervisor is present and exactly one TAP exists, we treat this as a restart path and reuse the TAP; only when a hypervisor is still active do we reject the setup, to prevent multiple concurrent unikernels. From an implementation perspective, this would mainly involve detecting whether a hypervisor process is still attached to the pod's network namespace before deciding to reuse or reject.

I’d appreciate your feedback on this approach, or on any better way to detect an already-running unikernel in the same namespace.
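For illustration, a hedged sketch of the netns-based liveness check described above, using plain /proc inspection; the helper name, the hypervisor process names, and the use of /proc/self/ns/net are assumptions, not urunc's actual implementation:

```go
// Hypothetical netns-based hypervisor detection via /proc; illustrative only.
package network

import (
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// hypervisorRunningInNetns reports whether any process whose name looks
// like a hypervisor shares the network namespace of the current process.
func hypervisorRunningInNetns() (bool, error) {
	// Two processes are in the same netns when their /proc/<pid>/ns/net
	// symlinks resolve to the same "net:[inode]" target.
	selfNS, err := os.Readlink("/proc/self/ns/net")
	if err != nil {
		return false, err
	}
	hypervisors := []string{"qemu", "firecracker", "solo5-hvt", "solo5-spt"}

	entries, err := os.ReadDir("/proc")
	if err != nil {
		return false, err
	}
	for _, e := range entries {
		pid, err := strconv.Atoi(e.Name())
		if err != nil || pid == os.Getpid() {
			continue // not a PID directory, or ourselves
		}
		ns, err := os.Readlink(filepath.Join("/proc", e.Name(), "ns", "net"))
		if err != nil || ns != selfNS {
			continue // process gone, inaccessible, or in another netns
		}
		comm, err := os.ReadFile(filepath.Join("/proc", e.Name(), "comm"))
		if err != nil {
			continue
		}
		name := strings.TrimSpace(string(comm))
		for _, h := range hypervisors {
			if strings.Contains(name, h) {
				return true, nil
			}
		}
	}
	return false, nil
}
```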
- Call cleanupOrphanTaps() at the start of DynamicNetwork.NetworkSetup().
- cleanupOrphanTaps() (new): scan the current netns for interfaces matching ^tap.*_urunc$ and use the kernel carrier/operational state as the sole criterion.

Description
In Kubernetes setups, when a pod is restarted, the network namespace (created by the pause container) remains active and hence the tap0_urunc device still exists. Therefore, when urunc (re)creates the container, it finds the existing tap0_urunc device and does not recreate it.
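As a minimal illustration (not urunc's actual code) of the create-only check that trips over the leftover device, assuming github.com/vishvananda/netlink:

```go
// Sketch of create-only semantics: if the TAP name already resolves in the
// still-alive netns, setup refuses to proceed instead of reusing the device.
package network

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

func ensureTapAbsent(name string) error {
	if _, err := netlink.LinkByName(name); err == nil {
		return fmt.Errorf("%s already exists, refusing network setup", name)
	}
	return nil
}
```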
Related issues
- tap*_urunc left after urunc restart in Kubernetes, preventing NetworkSetup (#406)

How was this tested?
Then observe the Pod status and restart:
LLM usage
I used an LLM to discuss how to identify an orphan TAP device.
Checklist
- Linting passes (make lint).
- Tests pass (make test_ctr, make test_nerdctl, make test_docker, make test_crictl).