zebra: defer RIB sweep until metaqueue is drained #27093
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
Pull request overview
This PR updates the SONiC FRR patchset to address a zebra startup timing issue where rib_sweep_route() can run before queued kernel routes have been processed into the RIB, potentially leaving stale/orphaned kernel routes after an ungraceful restart.
Changes:
- Add a zebra patch that defers `rib_sweep_route()` execution while the zebra metaqueue still has pending work.
- Update the FRR patch `series` to include the new patch in the applied patchset.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/sonic-frr/patch/series | Adds the new zebra patch to the ordered FRR patch series. |
| src/sonic-frr/patch/0107-zebra-defer-rib-sweep-until-metaqueue-drained.patch | Defers RIB sweep until the metaqueue is drained by rescheduling the sweep timer. |
The change is not in msft-202412 yet. @deepak-singhal0408, please manually create the cherry pick PR for branch msft-202412. ---Powered by SONiC BuildBot
The change is not in 202511 yet. @deepak-singhal0408, please manually create the cherry pick PR for branch 202511. ---Powered by SONiC BuildBot
Upstream PR FRRouting/frr#21550 merged (2026-05-03). This is a temporary carry patch that will be dropped on the next FRR tag bump. The patch defers the post-startup RIB sweep in zebra until the metaqueue is fully drained, preventing premature deletion of routes that are still being processed from bgpd/staticd.
Signed-off-by: Deepak Singhal <deepsinghal@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: mhchann <mhchann082@gmail.com>
Cherry-pick of #27093 to 202511 branch. Patch renumbered from 0108 to 0107 to match 202511 series. Signed-off-by: Deepak Singhal <deepsinghal@microsoft.com>
…#27210) Cherry-pick of sonic-net#27093 to 202511 branch. Patch renumbered from 0108 to 0107 to match 202511 series. Signed-off-by: Deepak Singhal <deepsinghal@microsoft.com> Signed-off-by: Kebo Liu <kebol@nvidia.com>
Cherry-pick of sonic-net/sonic-buildimage#27093 to msft-202412 branch. Patch renumbered from 0108 to 0147 to match msft-202412 series. Signed-off-by: Deepak Singhal <deepsinghal@microsoft.com>
What I did
SONiC-only carry patch for FRR zebra: defer the RIB sweep until the metaqueue drains. Adds bounded retry logic (50 retries × 20 ms, which caps the added delay at about 1 s).
To be dropped when the upstream startup-ordering rework (FRRouting/frr#21550) lands in a future FRR tag bump.
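The bounded retry described above can be sketched roughly as follows. This is an illustrative model only, not the code from the patch: `sweep_decide`, `struct sweep_state`, `enum sweep_action`, `SWEEP_RETRY_DELAY_MS`, and `SWEEP_MAX_RETRIES` are hypothetical names, and the metaqueue size is passed in as a plain integer instead of being read from zebra's data structures. Only `ZEBRA_RIB_PROCESS_HOLD_TIME` and the numeric values (10 ms hold, 20 ms retry delay, 50 retries) come from this PR's description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the PR description. Names other than
 * ZEBRA_RIB_PROCESS_HOLD_TIME are hypothetical. */
#define ZEBRA_RIB_PROCESS_HOLD_TIME 10 /* ms, metaqueue batching hold */
#define SWEEP_RETRY_DELAY_MS (2 * ZEBRA_RIB_PROCESS_HOLD_TIME)
#define SWEEP_MAX_RETRIES 50

enum sweep_action {
	SWEEP_DEFER,      /* metaqueue still busy: re-arm the sweep timer */
	SWEEP_RUN,        /* metaqueue empty: sweep now */
	SWEEP_RUN_FORCED, /* retry budget spent: sweep anyway, log a warning */
};

/* Hypothetical per-startup state: how many times the sweep was deferred. */
struct sweep_state {
	uint32_t retries;
};

/* Decide what the sweep timer callback should do, given the current
 * number of pending metaqueue entries. */
enum sweep_action sweep_decide(struct sweep_state *st, uint32_t mq_size)
{
	assert(st != NULL);

	if (mq_size == 0)
		return SWEEP_RUN;

	if (st->retries >= SWEEP_MAX_RETRIES)
		return SWEEP_RUN_FORCED;

	st->retries++;
	return SWEEP_DEFER;
}

/* Worst-case delay added before the sweep runs, in milliseconds. */
uint32_t sweep_max_delay_ms(void)
{
	return SWEEP_MAX_RETRIES * SWEEP_RETRY_DELAY_MS;
}
```

In this model, a `SWEEP_DEFER` result stands in for rescheduling the sweep timer `SWEEP_RETRY_DELAY_MS` later, and `sweep_max_delay_ms()` reproduces the 50 × 20 ms arithmetic behind the roughly one-second cap.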
How I did it
When zebra starts without `-K` (`graceful_restart=0`), the sweep timer fires with a 0-second delay, but the metaqueue `work_queue` has a 10 ms batching hold (`ZEBRA_RIB_PROCESS_HOLD_TIME`). This causes the sweep to walk an empty RIB and miss stale routes still queued in the metaqueue.
The patch checks `zrouter.mq->size > 0` before sweeping. If the metaqueue is non-empty, it reschedules the sweep at 2× the hold time (20 ms), up to 50 retries (~1 second). After the maximum number of retries, it sweeps anyway and logs a warning.
How to verify it
Check for `RIB sweep deferred` debug messages (with `debug rib` enabled).
Which release branch to backport
202511, msft-202412
Cherry-pick tracking:
- master
- 202511
- msft-202412
Upstream references