Skip to content

Add support for multi-process port sharing with CIBIR.#5798

Merged
ProjectsByJackHe merged 40 commits intomainfrom
jackhe/sql-cibir-fix-sock-reservation
Apr 13, 2026
Merged

Add support for multi-process port sharing with CIBIR.#5798
ProjectsByJackHe merged 40 commits intomainfrom
jackhe/sql-cibir-fix-sock-reservation

Conversation

@ProjectsByJackHe
Copy link
Copy Markdown
Contributor

@ProjectsByJackHe ProjectsByJackHe commented Feb 18, 2026

Description

Fixes #5795

The XDP datapath can be configured to intercept packets based on QUIC Connection ID instead of local port.
This behavior existed in MsQuic but was not heavily exercised until recently.
One issue was that MsQuic always attempted to reserve UDP / TCP sockets for each application server process.
But for multiple server processes that may want to share a single port, we would run into port collision errors.
This PR adds support for CIBIR across multiple processes on the same port and document the behavior

Potential options to allow for multi-process port sharing:

  • Option 1: MsQuic delegates port protection to applications and provides best practice recommendations.

Analysis: Deferring the responsibility of port protection and isolation to the application has the upside of enabling the most potential scenarios but could also be a footgun.

  • Option 2: MsQuic makes sure some persistent reservation exists at a port.

Analysis: Note that LookUpPersistentReservation does not require admin privileges, but CreatePersistentReservation does require admin. This is useful in that if any reservation exists on a port, we can reasonably trust that an admin knew what they were doing when they created it. safety, and ensure the consumers of CIBIR must know what they are doing.

  • Option 3: MsQuic creates per-proc sockets with SIO_CPU_AFFINITY, but does not reserve the port.

Analysis: If another unrelated app creates a socket with SIO_CPU_AFFINITY, then they can bind to the CIBIR shared port. But for all other apps, trying to bind a socket to a CIBIR port will result in a collision.

Option chosen: 1

MsQuic's stance is that the application takes responsibility for book-keeping and protecting sharing shared local ports when using XDP + CIBIR.

  • Multiple MsQuic processes in Cibir+XDP mode can share a local port for server sockets only.
  • Applications should also not assume the shared port is safe from other non-Msquic processes binding to it.

MsQuic will NOT make an OS port reservation for server sockets when CIBIR+XDP is enabled. Clients on the other hand, MsQuic will always make OS port reservations.

  • Applications using server sockets + CIBIR/XDP must specify a well-known local port.

What changed

  • Server sockets with XDP+CIBIR both enabled/available will skip OS port reservation and OS socket creation to rely on XDP.

any failures plumbing xdp rules will bubble up as a socket creation error to the app. Can't fall back to OS sockets.

  • Client sockets with XDP+CIBIR both enabled/available will still do OS port reservation and socket creation but rely on XDP.

any failures plumbing xdp rules will silently fall back to using OS sockets. CIBIR transport negotiation can still work without XDP.

  • Server sockets with CIBIR enabled but XDP not available/enabled will do OS port reservation and fall back to OS sockets
  • Client sockets with CIBIR enabled but XDP not available/enabled will do OS port reservation and fall back to OS sockets

Port protection options

Testing

A new DataPathTest was added.

Documentation

Settings.md

@ProjectsByJackHe ProjectsByJackHe requested a review from a team as a code owner February 18, 2026 02:02
@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.08%. Comparing base (37d6673) to head (3b550f8).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5798      +/-   ##
==========================================
+ Coverage   84.80%   85.08%   +0.27%     
==========================================
  Files          60       60              
  Lines       18731    18731              
==========================================
+ Hits        15885    15937      +52     
+ Misses       2846     2794      -52     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread src/platform/datapath_raw_socket.c Outdated
Comment thread src/platform/datapath_raw_socket.c Outdated
Comment thread src/platform/datapath_raw_socket.c Outdated
Comment thread src/platform/datapath_winuser.c Outdated
Comment thread src/platform/datapath_winuser.c Outdated
Comment thread src/platform/platform_internal.h
Comment thread src/platform/platform_internal.h Outdated
Comment thread docs/Settings.md Outdated
Copy link
Copy Markdown
Collaborator

@guhetier guhetier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I am concerned that we keep adding incremental exceptions to the port reservation logic to solve the next issue, but without having a clear design goal.
This code will be hard to maintain and confusing for apps as there is no simple rule about what can be done with ports.

I think we need to take the time soon to come up with a clear story about when can port be shared and when they can't + document it + check we implement it.

Comment thread docs/Settings.md Outdated
Comment thread src/platform/platform_internal.h
Comment thread src/platform/datapath_winuser.c Outdated
Comment thread src/platform/datapath_winuser.c Outdated
Comment thread src/platform/datapath_winuser.c
@ProjectsByJackHe
Copy link
Copy Markdown
Contributor Author

ProjectsByJackHe commented Feb 19, 2026

In general, I am concerned that we keep adding incremental exceptions to the port reservation logic to solve the next issue, but without having a clear design goal. This code will be hard to maintain and confusing for apps as there is no simple rule about what can be done with ports.

I think we need to take the time soon to come up with a clear story about when can port be shared and when they can't + document it + check we implement it.

I agree. I added a CIBIR.md.

@ProjectsByJackHe
Copy link
Copy Markdown
Contributor Author

In general, I am concerned that we keep adding incremental exceptions to the port reservation logic to solve the next issue, but without having a clear design goal. This code will be hard to maintain and confusing for apps as there is no simple rule about what can be done with ports.

I think we need to take the time soon to come up with a clear story about when can port be shared and when they can't + document it + check we implement it.

please see the updated XDP.md and CIBIR.md

@nibanks
Copy link
Copy Markdown
Collaborator

nibanks commented Mar 11, 2026

Maybe we should actually go through the rest of the standardization process for https://datatracker.ietf.org/doc/draft-banks-quic-cibir/ since you're still using it....

@guhetier
Copy link
Copy Markdown
Collaborator

Maybe we should actually go through the rest of the standardization process for https://datatracker.ietf.org/doc/draft-banks-quic-cibir/ since you're still using it....

Do you have more context about it? I am interested in understanding whether we should work on standardizing CIBIR or if another solution was adopted by other deployments that we should align with.

@nibanks
Copy link
Copy Markdown
Collaborator

nibanks commented Mar 11, 2026

Maybe we should actually go through the rest of the standardization process for https://datatracker.ietf.org/doc/draft-banks-quic-cibir/ since you're still using it....

Do you have more context about it? I am interested in understanding whether we should work on standardizing CIBIR or if another solution was adopted by other deployments that we should align with.

I know of no other solution/proposal that achieves what CIBIR does. We can chat more on Discord about the process.

Comment thread src/test/lib/TestHelpers.h Outdated
Comment thread src/test/lib/TestHelpers.h Outdated
Comment thread src/test/lib/HandshakeTest.cpp Outdated
Comment thread src/test/lib/HandshakeTest.cpp Outdated
Comment thread src/platform/datapath_xplat.c
Comment thread src/platform/datapath_xplat.c Outdated
Comment thread docs/CIBIR.md Outdated
Comment thread docs/CIBIR.md
Comment thread docs/CIBIR.md Outdated
Comment thread docs/CIBIR.md Outdated
Comment thread docs/XDP.md Outdated
Comment thread docs/XDP.md
Copy link
Copy Markdown
Contributor

@mtfriesen mtfriesen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking close, aside from the refinements requested by various comments.

@mtfriesen mtfriesen dismissed their stale review April 9, 2026 12:06

No longer blocking merging.

Comment thread docs/CIBIR.md Outdated
Comment thread docs/CIBIR.md Outdated
Comment thread src/test/lib/TestHelpers.h Outdated
Comment thread src/test/lib/TestHelpers.h Outdated
Comment thread src/test/lib/TestHelpers.h Outdated
Comment thread src/test/lib/TestHelpers.h Outdated
Comment thread src/platform/datapath_xplat.c Outdated
Comment thread src/platform/datapath_winuser.c Outdated
Comment thread src/platform/unittest/DataPathTest.cpp Outdated
Comment thread docs/CIBIR.md Outdated
Comment thread src/manifest/clog.sidecar Outdated
@guhetier
Copy link
Copy Markdown
Collaborator

guhetier commented Apr 10, 2026

The PR description says:

any failures plumbing xdp rules will bubble up as a socket creation error to the app. Can't fall back to OS sockets.

But errors when setting up XDP rules seems to be silently eaten:

   RawSocketCreateUdp                     → returns QUIC_STATUS
     └─ CxPlatTryAddSocket(...)           → returns QUIC_STATUS (checked ✅)
     └─ CxPlatDpRawPlumbRulesOnSocket(..) → void (NOT checked ❌)
          └─ CxPlatDpRawInterfaceAddRules → void, silently fails on alloc failure or rule overflow
               └─ CxPlatDpRawInterfaceUpdateRules → void, has TODO: "Figure out how to better handle failure"
                    └─ XdpCreateProgram → returns QUIC_STATUS (failure logged, then `continue`)

When XDP is not backed up with OS sockets, we would silently ignore all traffic. Probably applies to QTIP too.

@ProjectsByJackHe
Copy link
Copy Markdown
Contributor Author

ProjectsByJackHe commented Apr 10, 2026

The PR description says:

any failures plumbing xdp rules will bubble up as a socket creation error to the app. Can't fall back to OS sockets.

But errors when setting up XDP rules seems to be silently eaten:

   RawSocketCreateUdp                     → returns QUIC_STATUS
     └─ CxPlatTryAddSocket(...)           → returns QUIC_STATUS (checked ✅)
     └─ CxPlatDpRawPlumbRulesOnSocket(..) → void (NOT checked ❌)
          └─ CxPlatDpRawInterfaceAddRules → void, silently fails on alloc failure or rule overflow
               └─ CxPlatDpRawInterfaceUpdateRules → void, has TODO: "Figure out how to better handle failure"
                    └─ XdpCreateProgram → returns QUIC_STATUS (failure logged, then `continue`)

When XDP is not backed up with OS sockets, we would silently ignore all traffic. Probably applies to QTIP too.

Yeah, looking through the code, xdp failures in rule plumbing get silently ignored. This is a latent bug impacting QTIP (now CIBIR as well). Let's create a separate issue for this and address it in a follow up PR. This PR is starting to balloon into a very hard to review size.

@ProjectsByJackHe
Copy link
Copy Markdown
Contributor Author

The PR description says:

any failures plumbing xdp rules will bubble up as a socket creation error to the app. Can't fall back to OS sockets.

But errors when setting up XDP rules seems to be silently eaten:

   RawSocketCreateUdp                     → returns QUIC_STATUS
     └─ CxPlatTryAddSocket(...)           → returns QUIC_STATUS (checked ✅)
     └─ CxPlatDpRawPlumbRulesOnSocket(..) → void (NOT checked ❌)
          └─ CxPlatDpRawInterfaceAddRules → void, silently fails on alloc failure or rule overflow
               └─ CxPlatDpRawInterfaceUpdateRules → void, has TODO: "Figure out how to better handle failure"
                    └─ XdpCreateProgram → returns QUIC_STATUS (failure logged, then `continue`)

When XDP is not backed up with OS sockets, we would silently ignore all traffic. Probably applies to QTIP too.

#5935

Copy link
Copy Markdown
Collaborator

@guhetier guhetier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, two minor nits and it is ready to go.

Comment thread src/platform/datapath_xplat.c Outdated
Comment thread src/test/lib/HandshakeTest.cpp
@ProjectsByJackHe ProjectsByJackHe merged commit 5e99ea7 into main Apr 13, 2026
506 of 509 checks passed
@ProjectsByJackHe ProjectsByJackHe deleted the jackhe/sql-cibir-fix-sock-reservation branch April 13, 2026 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Partner-SQL] Multiple Quic listeners needed on same port on Windows

5 participants