1210: Fix coredump in http_client during rapid connection destruction#1476
Open
baemyung wants to merge 2 commits into
Open
1210: Fix coredump in http_client during rapid connection destruction#1476baemyung wants to merge 2 commits into
baemyung wants to merge 2 commits into
Conversation
Contributor
Author
e922687 to
008fff0
Compare
jeaaustx
approved these changes
May 27, 2026
jnin-dev
approved these changes
May 27, 2026
Contributor
jnin-dev
left a comment
There was a problem hiding this comment.
I don't see anything wrong, looks good to me
abiasojo
approved these changes
May 28, 2026
Issue:
BMCWeb crashes with segmentation fault when HTTP client connections are
rapidly created and destroyed while async operations are pending. This
occurs during event subscription deletion, BMCWeb shutdown, or network
timeout scenarios where ConnectionPool is destroyed while callbacks are
still executing.
Root Cause:
Race condition between async callback execution and ConnectionPool
destruction leads to three failure modes:
1. Out-of-bounds vector access in sendNext() when connId is invalid
2. Null pointer dereference when connection pointer is null
3. Use-after-free when callback executes after pool destruction
The issue manifests as:
- Segmentation fault in sendNext()
- Crash in afterSendData() when accessing destroyed pool
- Invalid memory access during callback execution
Key Fixes:
1. Bounds Checking
- Validate connId < connections.size() before vector access
- Prevents segmentation fault from out-of-bounds access
- Returns early with error log if connId invalid
2. Null Pointer Validation
- Check if connection pointer is null before dereferencing
- Provides graceful error handling instead of crash
- Logs error and returns safely
3. Reordered afterSendData()
- Check ConnectionPool validity BEFORE calling user callback
- Prevents callback execution if pool has been destroyed
- Ensures safe early return if pool no longer exists
Testing:
Reproduced coredump by simulating aggressive client behavior:
- Client rapidly creates HTTP connections to BMCWeb
- Client sends POST requests to create event subscriptions
- Client abruptly closes connections during SSL handshake
- Client destroys sessions while async operations are pending
- This triggers race condition where callbacks execute after
ConnectionPool destruction
Example Python code that reproduces the issue:
```
for i in range(100):
session = requests.Session()
session.post('https://bmc/redfish/v1/EventService/Subscriptions',
json={'Destination': 'http://192.0.2.1:8080/events'})
time.sleep(0.01) # Brief delay
session.close() # Abrupt destruction while request pending
del session
```
Before fix:
- Segmentation fault in sendNext()
- BMCWeb process crashes
- Coredump generated
After fix:
- No coredumps detected
- BMCWeb process remains stable
- Graceful error handling with log messages:
"Invalid connId: X (size: Y)"
"Connection at index X is null"
"Failed to capture connection"
Expected logs during aggressive client testing (normal behavior):
- SSL handshake failures (client abruptly disconnects)
- Authentication failures (connections close during auth)
- No segmentation faults or process crashes
Verification:
coredumpctl list bmcweb # No new coredumps
systemctl status bmcweb # Process still running
dmesg | grep segfault # No segmentation faults
Change-Id: I7b7a61aa8fcce1e32b5e3893da5ff0d5b3e3344f
Signed-off-by: Myung Bae <myungbae@us.ibm.com>
Some http_client async callbacks in ConnectionInfo used raw `this` pointers in std::bind_front(), causing use-after-free when the object was destroyed before callbacks completed. The crash occurs at: ``` ibm-openbmc#8 std::__throw_bad_function_call() ibm-openbmc#9 crow::ConnectionInfo::afterRead (this=0x14dc1a0, ec=..., bytesTransferred=<optimized out>) at /usr/src/debug/bmcweb/1.0+git/http/http_client.hpp:390 ``` The matching code is ``` void afterRead(const std::shared_ptr<ConnectionInfo>& /*self*/, const boost::beast::error_code& ec, const std::size_t bytesTransferred) { ... // Make sure the received response code is valid as defined by // the associated retry policy if (connPolicy->invalidResp(respCode)) <--- { ``` In summary, ``` if (connPolicy->invalidResp(respCode)) The std::bad_function_call exception implies that the function is either invalid, or its anchored object is invalid. In this case, it seems the object (e.g. this or connPolicy) is invalid. The code uses: ``` std::bind_front(&ConnectionInfo::afterResolve, this, shared_from_this()) ``` While shared_from_this() is passed as a parameter, std::bind_front binds this (raw pointer) as the implicit first parameter for the member function call (e.g. afterResolve). By making sure `this (self)` comes from as `shared_from_this()`, `this`will be valid as long as `shared_from_this()` is valid. `` [self = shared_from_this()](const boost::beast::error_code& ec, size_t bytesTransferred) { self->afterRead(self, ec, bytesTransferred); } ``` Tested: - Redfish Service Validator passes - It is not easy to reproduce the issue manually, but I made sure that it is working under stress run to invoke many of the client connection and disconnection (as much as possible). Change-Id: I9a840f4f99521038a40064c889710c35e6be2480 Signed-off-by: Myung Bae <myungbae@us.ibm.com>
008fff0 to
c049e43
Compare
gtmills
approved these changes
May 29, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Issue:
BMCWeb crashes with segmentation fault when HTTP client connections are rapidly created and destroyed while async operations are pending. This occurs during event subscription deletion, BMCWeb shutdown, or network timeout scenarios where ConnectionPool is destroyed while callbacks are still executing.
Root Cause:
Race condition between async callback execution and ConnectionPool destruction leads to three failure modes:
The issue manifests as:
Key Fixes:
Bounds Checking
Null Pointer Validation
Reordered afterSendData()
Testing:
Reproduced coredump by simulating aggressive client behavior:
Example Python code that reproduces the issue:
Before fix:
After fix:
Expected logs during aggressive client testing (normal behavior):
Verification:
coredumpctl list bmcweb # No new coredumps
systemctl status bmcweb # Process still running
dmesg | grep segfault # No segmentation faults