
zebra: defer RIB sweep until metaqueue is drained #27093

Merged
eddieruan-alibaba merged 1 commit into sonic-net:master from deepak-singhal0408:fix/zebra-sweep-metaqueue-drain
May 6, 2026


@deepak-singhal0408 deepak-singhal0408 commented Apr 30, 2026

What I did

SONiC-only carry patch for FRR zebra: defer the RIB sweep until the metaqueue drains. Adds bounded retry logic (50 retries × 20 ms ≈ 1 s cap).

To be dropped when the upstream startup-ordering rework (FRRouting/frr#21550) lands in a future FRR tag bump.

How I did it

When zebra starts without -K (graceful_restart=0), the sweep timer fires with a 0-second delay, but the metaqueue work_queue has a 10 ms batching hold (ZEBRA_RIB_PROCESS_HOLD_TIME). As a result, the sweep walks a RIB that has not been populated yet and misses stale routes that are still queued in the metaqueue.
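The race can be illustrated with a toy timing model. This is a minimal sketch under stated assumptions, not FRR code: `struct startup_model`, `advance()`, `routes_seen_by_sweep()`, and the macro names are hypothetical, and the model assumes every queued kernel route lands in the RIB exactly when the 10 ms hold expires.

```c
/* Hypothetical timing model (not FRR code). The constants mirror the
 * numbers quoted in this PR: the sweep timer fires after 0 ms when zebra
 * starts without -K, while the metaqueue batches work for 10 ms
 * (ZEBRA_RIB_PROCESS_HOLD_TIME). */
#define SWEEP_DELAY_MS 0
#define MQ_HOLD_MS 10

struct startup_model {
	int queued_routes; /* kernel routes read at startup, still in the metaqueue */
	int rib_routes;    /* routes already installed in the RIB */
};

/* Advance the model to time now_ms: once the hold time has elapsed, the
 * work queue runs and moves every queued route into the RIB. */
static void advance(struct startup_model *m, int now_ms)
{
	if (now_ms >= MQ_HOLD_MS) {
		m->rib_routes += m->queued_routes;
		m->queued_routes = 0;
	}
}

/* Number of routes a sweep started at now_ms would actually walk.
 * The model is passed by value so the caller's copy is not modified. */
static int routes_seen_by_sweep(struct startup_model m, int now_ms)
{
	advance(&m, now_ms);
	return m.rib_routes;
}
```

With three stale routes queued, a sweep at `SWEEP_DELAY_MS` (0 ms) walks none of them, while a sweep at twice the hold time (20 ms) sees all three.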

The patch checks zrouter.mq->size > 0 before sweeping. If the metaqueue is non-empty, it reschedules the sweep after 2× the hold time (20 ms), for up to 50 retries (about 1 second). Once the retry limit is reached, it sweeps anyway and logs a warning.
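The bounded deferral can be sketched as a pure decision function. This is a hypothetical illustration, not the patch itself: `sweep_decide()`, the `enum sweep_action` values, the macro names, and the warning text are assumptions; `mq_size` stands in for `zrouter.mq->size`, and re-arming the real timer is left to the caller.

```c
#include <stdio.h>

/* Values quoted in the PR; the names below are hypothetical. */
#define RIB_PROCESS_HOLD_MS 10                   /* ZEBRA_RIB_PROCESS_HOLD_TIME */
#define SWEEP_RETRY_MS (2 * RIB_PROCESS_HOLD_MS) /* 20 ms between checks */
#define SWEEP_MAX_RETRIES 50                     /* 50 x 20 ms = 1000 ms cap */

enum sweep_action { SWEEP_NOW, SWEEP_DEFER, SWEEP_FORCED };

/* Decide what a sweep timer callback should do, given the current metaqueue
 * depth (stand-in for zrouter.mq->size) and how many times it has already
 * deferred. Increments *retries each time it defers. */
static enum sweep_action sweep_decide(unsigned mq_size, int *retries)
{
	if (mq_size == 0)
		return SWEEP_NOW;   /* queue drained: safe to sweep */

	if (*retries < SWEEP_MAX_RETRIES) {
		(*retries)++;
		return SWEEP_DEFER; /* caller re-arms its timer for SWEEP_RETRY_MS */
	}

	/* Retry cap reached: sweep anyway, but leave a trace (illustrative text). */
	fprintf(stderr, "metaqueue still has %u items after %d retries, sweeping anyway\n",
		mq_size, *retries);
	return SWEEP_FORCED;
}
```

On `SWEEP_DEFER` a caller would re-arm its timer for `SWEEP_RETRY_MS`; on the other two outcomes it would run the sweep. Counting retries rather than tracking wall-clock time keeps the worst-case delay easy to reason about: 50 × 20 ms = 1000 ms.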

How to verify it

  1. Boot a SONiC VS image with this patch
  2. Verify stale kernel routes are properly swept after startup
  3. Check syslog for RIB sweep deferred debug messages (with debug rib enabled)

Which release branch to backport

202511, msft-202412

Cherry-pick tracking:

| Branch | PR | Patch # | Status |
|---|---|---|---|
| master | #27093 | 0108 | ✅ Merged |
| 202511 | #27210 | 0107 | ⏳ CI pending |
| msft-202412 | Azure/sonic-buildimage-msft#2260 | 0147 | ⏳ CI pending |

Upstream references

Copilot AI review requested due to automatic review settings April 30, 2026 09:26
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).


Copilot AI left a comment


Pull request overview

This PR updates the SONiC FRR patchset to address a zebra startup timing issue where rib_sweep_route() can run before queued kernel routes have been processed into the RIB, potentially leaving stale/orphaned kernel routes after an ungraceful restart.

Changes:

  • Add a zebra patch that defers rib_sweep_route() execution while the zebra metaqueue still has pending work.
  • Update the FRR patch series to include the new patch in the applied patchset.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| src/sonic-frr/patch/series | Adds the new zebra patch to the ordered FRR patch series. |
| src/sonic-frr/patch/0107-zebra-defer-rib-sweep-until-metaqueue-drained.patch | Defers the RIB sweep until the metaqueue is drained by rescheduling the sweep timer. |

Comment thread src/sonic-frr/patch/0107-zebra-defer-rib-sweep-until-metaqueue-drained.patch Outdated
@deepak-singhal0408 deepak-singhal0408 force-pushed the fix/zebra-sweep-metaqueue-drain branch from b5cdf91 to 977c06d April 30, 2026 15:58
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).


Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@deepak-singhal0408 deepak-singhal0408 force-pushed the fix/zebra-sweep-metaqueue-drain branch from 842b4c5 to 2ad4c84 May 5, 2026 05:30
Copilot AI review requested due to automatic review settings May 5, 2026 05:33
@deepak-singhal0408 deepak-singhal0408 force-pushed the fix/zebra-sweep-metaqueue-drain branch from 2ad4c84 to 7fd24c2 May 5, 2026 05:33

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

@deepak-singhal0408 deepak-singhal0408 force-pushed the fix/zebra-sweep-metaqueue-drain branch from 7fd24c2 to 0525ff4 May 5, 2026 05:39
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).


Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage


@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).


@deepak-singhal0408
Contributor Author

/azpw run Azure.sonic-buildimage

@mssonicbld
Collaborator

⚠️ Notice: /azpw run only runs failed jobs now. If you want to trigger a whole pipeline run, please rebase your branch or close and reopen the PR.
💡 Tip: You can also use /azpw retry to retry failed jobs directly.

Retrying failed(or canceled) jobs...

@eddieruan-alibaba eddieruan-alibaba merged commit 4d2fd24 into sonic-net:master May 6, 2026
29 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in SONiC Routing Dashboard May 6, 2026
@mssonicbld
Collaborator

The change is not in msft-202412 yet. @deepak-singhal0408, please manually create the cherry pick PR for branch msft-202412.
You can ping the release branch owner(github account: r12f) to approve your cherry pick PR.
If this change is already in msft-202412, please comment "already in msft-202412". Thanks!

---Powered by SONiC BuildBot

@mssonicbld
Collaborator

The change is not in 202511 yet. @deepak-singhal0408, please manually create the cherry pick PR for branch 202511.
You can ping the release branch owner(github account: vmittal-msft) to approve your cherry pick PR.
If this change is already in 202511, please comment "already in 202511". Thanks!


mhchann pushed a commit to mhchann/sonic-buildimage that referenced this pull request May 7, 2026
Upstream PR FRRouting/frr#21550 merged (2026-05-03). This is a temporary
carry patch that will be dropped on the next FRR tag bump.

The patch defers the post-startup RIB sweep in zebra until the metaqueue
is fully drained, preventing premature deletion of routes that are still
being processed from bgpd/staticd.

Signed-off-by: Deepak Singhal <deepsinghal@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: mhchann <mhchann082@gmail.com>
vmittal-msft pushed a commit that referenced this pull request May 7, 2026
Cherry-pick of #27093 to 202511 branch.
Patch renumbered from 0108 to 0107 to match 202511 series.

Signed-off-by: Deepak Singhal <deepsinghal@microsoft.com>
keboliu pushed a commit to keboliu/sonic-buildimage that referenced this pull request May 8, 2026
…#27210)

Cherry-pick of sonic-net#27093 to 202511 branch.
Patch renumbered from 0108 to 0107 to match 202511 series.

Signed-off-by: Deepak Singhal <deepsinghal@microsoft.com>
Signed-off-by: Kebo Liu <kebol@nvidia.com>
yxieca pushed a commit that referenced this pull request May 9, 2026
Cherry-pick of #27093 to 202511 branch.
Patch renumbered from 0108 to 0107 to match 202511 series.

Signed-off-by: Deepak Singhal <deepsinghal@microsoft.com>
deepak-singhal0408 added a commit to deepak-singhal0408/sonic-buildimage-msft that referenced this pull request May 11, 2026
Cherry-pick of sonic-net/sonic-buildimage#27093 to msft-202412 branch.
Patch renumbered from 0108 to 0147 to match msft-202412 series.

Signed-off-by: Deepak Singhal <deepsinghal@microsoft.com>

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Bug: FRR failed to clean up static blackhole route when recovering from ungraceful exits.

7 participants