doublezerod: add periodic kernel route reconciliation #3672
Open
Add a reconciliation loop to the liveness manager that periodically scans the kernel routing table for missing BGP routes and reinstalls them, mitigating connectivity loss caused by external processes removing routes. Also promote liveness session down logs from DEBUG to INFO for passive/peer-passive modes so operators can see the full up/down lifecycle.
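A minimal sketch of how a periodic reconciliation loop of this kind can be driven, assuming a plain `time.Ticker` and context cancellation. The names `config`, `validate`, `runReconciler`, and `demo` are hypothetical and are not taken from the PR. Treating a zero interval as "disabled" is also an assumption. The only behavior taken from this PR is that a negative interval is rejected.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// config is a hypothetical stand-in for the liveness manager configuration.
type config struct {
	RouteReconcileInterval time.Duration
}

// validate rejects negative intervals. Zero is accepted here and treated as
// "reconciliation disabled", which is an assumption of this sketch.
func (c config) validate() error {
	if c.RouteReconcileInterval < 0 {
		return errors.New("route reconcile interval must not be negative")
	}
	return nil
}

// runReconciler calls reconcile on every tick until ctx is cancelled.
// It returns immediately when the interval is not positive.
func runReconciler(ctx context.Context, interval time.Duration, reconcile func()) {
	if interval <= 0 {
		return
	}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			reconcile()
		}
	}
}

// demo exercises both pieces: it reports whether a negative interval was
// rejected and how many times the reconcile callback ran in a short window.
func demo() (rejected bool, runs int) {
	rejected = config{RouteReconcileInterval: -time.Second}.validate() != nil

	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()
	// runReconciler blocks until the context expires, so runs needs no locking.
	runReconciler(ctx, 10*time.Millisecond, func() { runs++ })
	return rejected, runs
}

func main() {
	rejected, runs := demo()
	fmt.Println("negative interval rejected:", rejected)
	fmt.Println("reconcile ran at least once:", runs > 0)
}
```

In a real daemon the loop would run in its own goroutine and stop when the manager's context is cancelled. A very long interval (for example `time.Hour`) effectively silences the loop, which is useful when a test wants to call the reconcile function directly.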
Increment RouteInstallFailures counter when a reconciliation reinstall fails, matching the observability pattern in onSessionUp. Also pre-allocate the toCheck slice.
- Re-check installed state under lock before RouteAdd to prevent resurrecting routes intentionally withdrawn by onSessionDown
- Add SrcIP to kernel route lookup key for tighter matching in multi-interface setups
- Reject negative RouteReconcileInterval in Validate()
- Use named const for reconcile interval flag default
- Log when route reconciliation is enabled at startup
Resolves: #3669
Summary of Changes
- … `--route-liveness-reconcile-interval`), detects BGP routes that should already be installed but are missing, and reinstalls them
- … `doublezero_liveness_route_reinstalls_total` and `doublezero_liveness_route_install_failures_total` Prometheus metrics to track reinstalls and failures
- … `installed` state under lock before each reinstall so `reconcileRoutes` cannot resurrect a route that `onSessionDown` intentionally withdrew between snapshot and reinstall
- … `(table, dst, nexthop)` but different source IPs are matched independently in multi-interface setups

Diff Breakdown
Bulk of the change is the reconciliation loop and its tests.
Key files
- `client/doublezerod/internal/liveness/manager.go`: `reconcileRoutes()` implementation with TOCTOU guard and src-aware kernel key, config field + validation, goroutine launch, startup log, Debug→Info log level change
- `client/doublezerod/internal/liveness/manager_test.go`: unit tests for route reconciliation (missing route reinstall, present route skip, uninstalled route skip, install failure metric)
- `client/doublezerod/internal/liveness/metrics.go`: `RouteReinstalls` counter and `routeReinstall` helper
- `client/doublezerod/cmd/doublezerod/main.go`: `--route-liveness-reconcile-interval` flag wiring with named default const

Testing Verification
- … `RouteAdd` error
- … `Netlinker` to simulate kernel route state; reconciliation ticker set to `time.Hour` in tests to prevent background interference while calling `reconcileRoutes()` directly
- `go vet` and `go build` clean