1210: Fix coredump in http_client during rapid connection destruction by baemyung · Pull Request #1476 · ibm-openbmc/bmcweb

baemyung · 2026-05-21T00:16:32Z

Issue:
BMCWeb crashes with segmentation fault when HTTP client connections are rapidly created and destroyed while async operations are pending. This occurs during event subscription deletion, BMCWeb shutdown, or network timeout scenarios where ConnectionPool is destroyed while callbacks are still executing.

Root Cause:
Race condition between async callback execution and ConnectionPool destruction leads to three failure modes:

Out-of-bounds vector access in sendNext() when connId is invalid
Null pointer dereference when connection pointer is null
Use-after-free when callback executes after pool destruction

The issue manifests as:

Segmentation fault in sendNext()
Crash in afterSendData() when accessing destroyed pool
Invalid memory access during callback execution

Key Fixes:

Bounds Checking
- Validate connId < connections.size() before vector access
- Prevents segmentation fault from out-of-bounds access
- Returns early with error log if connId invalid
Null Pointer Validation
- Check if connection pointer is null before dereferencing
- Provides graceful error handling instead of crash
- Logs error and returns safely
Reordered afterSendData()
- Check ConnectionPool validity BEFORE calling user callback
- Prevents callback execution if pool has been destroyed
- Ensures safe early return if pool no longer exists

Testing:
Reproduced coredump by simulating aggressive client behavior:

Client rapidly creates HTTP connections to BMCWeb
Client sends POST requests to create event subscriptions
Client abruptly closes connections during SSL handshake
Client destroys sessions while async operations are pending
This triggers race condition where callbacks execute after ConnectionPool destruction

Example Python code that reproduces the issue:

  for i in range(100):
      session = requests.Session()
      session.post('https://bmc/redfish/v1/EventService/Subscriptions',
                   json={'Destination': 'http://192.0.2.1:8080/events'})
      time.sleep(0.01)  # Brief delay
      session.close()   # Abrupt destruction while request pending
      del session

Before fix:

Segmentation fault in sendNext()
BMCWeb process crashes
Coredump generated

After fix:

No coredumps detected
BMCWeb process remains stable
Graceful error handling with log messages: "Invalid connId: X (size: Y)" "Connection at index X is null" "Failed to capture connection"

Expected logs during aggressive client testing (normal behavior):

SSL handshake failures (client abruptly disconnects)
Authentication failures (connections close during auth)
No segmentation faults or process crashes

Verification:
coredumpctl list bmcweb # No new coredumps
systemctl status bmcweb # Process still running
dmesg | grep segfault # No segmentation faults

baemyung · 2026-05-21T11:16:52Z

Defect: https://jazz07.rchland.ibm.com:13443/jazz/web/projects/FPS%20Collaboration#action=com.ibm.team.workitem.viewWorkItem&id=778179

upstream: https://gerrit.openbmc.org/c/openbmc/bmcweb/+/90540, https://gerrit.openbmc.org/c/openbmc/bmcweb/+/90541
1110: #1477
1120: #1475
1210: #1476

jnin-dev

I don't see anything wrong, looks good to me

Issue: BMCWeb crashes with segmentation fault when HTTP client connections are rapidly created and destroyed while async operations are pending. This occurs during event subscription deletion, BMCWeb shutdown, or network timeout scenarios where ConnectionPool is destroyed while callbacks are still executing. Root Cause: Race condition between async callback execution and ConnectionPool destruction leads to three failure modes: 1. Out-of-bounds vector access in sendNext() when connId is invalid 2. Null pointer dereference when connection pointer is null 3. Use-after-free when callback executes after pool destruction The issue manifests as: - Segmentation fault in sendNext() - Crash in afterSendData() when accessing destroyed pool - Invalid memory access during callback execution Key Fixes: 1. Bounds Checking - Validate connId < connections.size() before vector access - Prevents segmentation fault from out-of-bounds access - Returns early with error log if connId invalid 2. Null Pointer Validation - Check if connection pointer is null before dereferencing - Provides graceful error handling instead of crash - Logs error and returns safely 3. Reordered afterSendData() - Check ConnectionPool validity BEFORE calling user callback - Prevents callback execution if pool has been destroyed - Ensures safe early return if pool no longer exists Testing: Reproduced coredump by simulating aggressive client behavior: - Client rapidly creates HTTP connections to BMCWeb - Client sends POST requests to create event subscriptions - Client abruptly closes connections during SSL handshake - Client destroys sessions while async operations are pending - This triggers race condition where callbacks execute after ConnectionPool destruction Example Python code that reproduces the issue: ``` for i in range(100): session = requests.Session() session.post('https://bmc/redfish/v1/EventService/Subscriptions', json={'Destination': 'http://192.0.2.1:8080/events'}) time.sleep(0.01) # Brief delay session.close() # Abrupt destruction while request pending del session ``` Before fix: - Segmentation fault in sendNext() - BMCWeb process crashes - Coredump generated After fix: - No coredumps detected - BMCWeb process remains stable - Graceful error handling with log messages: "Invalid connId: X (size: Y)" "Connection at index X is null" "Failed to capture connection" Expected logs during aggressive client testing (normal behavior): - SSL handshake failures (client abruptly disconnects) - Authentication failures (connections close during auth) - No segmentation faults or process crashes Verification: coredumpctl list bmcweb # No new coredumps systemctl status bmcweb # Process still running dmesg | grep segfault # No segmentation faults Change-Id: I7b7a61aa8fcce1e32b5e3893da5ff0d5b3e3344f Signed-off-by: Myung Bae <myungbae@us.ibm.com>

Some http_client async callbacks in ConnectionInfo used raw `this` pointers in std::bind_front(), causing use-after-free when the object was destroyed before callbacks completed. The crash occurs at: ``` ibm-openbmc#8 std::__throw_bad_function_call() ibm-openbmc#9 crow::ConnectionInfo::afterRead (this=0x14dc1a0, ec=..., bytesTransferred=<optimized out>) at /usr/src/debug/bmcweb/1.0+git/http/http_client.hpp:390 ``` The matching code is ``` void afterRead(const std::shared_ptr<ConnectionInfo>& /*self*/, const boost::beast::error_code& ec, const std::size_t bytesTransferred) { ... // Make sure the received response code is valid as defined by // the associated retry policy if (connPolicy->invalidResp(respCode)) <--- { ``` In summary, ``` if (connPolicy->invalidResp(respCode)) The std::bad_function_call exception implies that the function is either invalid, or its anchored object is invalid. In this case, it seems the object (e.g. this or connPolicy) is invalid. The code uses: ``` std::bind_front(&ConnectionInfo::afterResolve, this, shared_from_this()) ``` While shared_from_this() is passed as a parameter, std::bind_front binds this (raw pointer) as the implicit first parameter for the member function call (e.g. afterResolve). By making sure `this (self)` comes from as `shared_from_this()`, `this`will be valid as long as `shared_from_this()` is valid. `` [self = shared_from_this()](const boost::beast::error_code& ec, size_t bytesTransferred) { self->afterRead(self, ec, bytesTransferred); } ``` Tested: - Redfish Service Validator passes - It is not easy to reproduce the issue manually, but I made sure that it is working under stress run to invoke many of the client connection and disconnection (as much as possible). Change-Id: I9a840f4f99521038a40064c889710c35e6be2480 Signed-off-by: Myung Bae <myungbae@us.ibm.com>

baemyung mentioned this pull request May 21, 2026

1120: Fix coredump in http_client during rapid connection destruction #1475

Open

baemyung requested review from abiasojo, gtmills, jeaaustx and jnin-dev May 21, 2026 11:17

baemyung marked this pull request as draft May 21, 2026 12:46

baemyung marked this pull request as ready for review May 22, 2026 00:01

baemyung force-pushed the 1210-fix-http-client-coredump-by-rapid-conn-close branch from e922687 to 008fff0 Compare May 26, 2026 17:48

baemyung mentioned this pull request May 26, 2026

1110: Fix coredump in http_client during rapid connection destruction #1477

Open

jeaaustx approved these changes May 27, 2026

View reviewed changes

jnin-dev approved these changes May 27, 2026

View reviewed changes

abiasojo approved these changes May 28, 2026

View reviewed changes

baemyung added 2 commits May 29, 2026 08:08

baemyung force-pushed the 1210-fix-http-client-coredump-by-rapid-conn-close branch from 008fff0 to c049e43 Compare May 29, 2026 13:08

gtmills approved these changes May 29, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

1210: Fix coredump in http_client during rapid connection destruction#1476

1210: Fix coredump in http_client during rapid connection destruction#1476
baemyung wants to merge 2 commits into
ibm-openbmc:1210from
baemyung:1210-fix-http-client-coredump-by-rapid-conn-close

baemyung commented May 21, 2026

Uh oh!

baemyung commented May 21, 2026 •

edited

Loading

Uh oh!

jnin-dev left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

baemyung commented May 21, 2026

Uh oh!

baemyung commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jnin-dev left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

baemyung commented May 21, 2026 •

edited

Loading