Implement Network Recovery: Linux chip-tool#43115
Implement Network Recovery: Linux chip-tool#43115chulspro wants to merge 1 commit intoproject-chip:masterfrom
Conversation
This change updates the following Network Recovery feature implementation PR to the latest master branch of open source. project-chip#38421 Signed-off-by: Charles Kim <chulspro.kim@samsung.com>
|
|
There was a problem hiding this comment.
Code Review
This pull request introduces the Network Recovery feature for the Linux chip-tool, enabling discovery of recoverable nodes and network credential updates over BLE. While the implementation is comprehensive, a security audit revealed a high-severity use-after-free vulnerability and a medium-severity memory leak in AutoNetworkRecover.cpp that must be addressed for stability and security. Furthermore, the review identified bugs in the public API and error handling, along with several TODO comments and a minor typo. All comments have been retained, with some enhanced by references to established code review rules.
| } | ||
| else | ||
| { | ||
| self->mLastNetworkID = networkID.Value(); |
There was a problem hiding this comment.
In NetworkRecoverBase::OnSuccessReadLastNetworkID, the member variable mLastNetworkID (which is a chip::ByteSpan) is assigned a value from the networkID argument. networkID is a reference to a ByteSpan that points into a transient network response buffer. This buffer is only valid for the duration of the callback. However, mLastNetworkID is used later in the SendRemoveNetwork function (line 78), which is called after a new CASE session has been established. By this time, the original response buffer has been freed or reused, leading to a use-after-free. This can cause a crash or lead to the RemoveNetwork command being sent with corrupted data, potentially affecting the wrong network configuration.
| chip::ByteSpan mPassword; | ||
| chip::ByteSpan mOperationalDataset; | ||
| uint64_t mBreadcrumb; | ||
| chip::Optional<uint16_t> mTimeout; |
| else if (self->mNextStep == Step::kSendDisArmFailSafe) | ||
| { | ||
| err = CHIP_ERROR_INTERNAL; | ||
| } |
There was a problem hiding this comment.
There appears to be a bug in the error handling logic. When a failsafe disarm is successful (mNextStep == Step::kSendDisArmFailSafe), this code sets err = CHIP_ERROR_INTERNAL. This causes the original error that triggered the disarm to be lost and CHIP_ERROR_INTERNAL to be reported instead. A successful disarm should not be treated as an error. The recovery process should terminate by reporting the original error.
References
- The current error handling logic incorrectly sets
err = CHIP_ERROR_INTERNALon a successful disarm, which does not reflect the intended behavior of preserving the original error or indicating success, thus contradicting the principle of ensuring delegate methods reflect their intended behavior.
| CHIP_ERROR DiscoverRecoverableNodes(uint16_t timeout); | ||
|
|
||
| CHIP_ERROR RecoverNode(NodeId remoteId, uint64_t recoveryId, WiFiCredentials wiFiCreds, uint64_t breadcrumb = 0); | ||
| CHIP_ERROR RecoverNode(NodeId remoteId, uint64_t recoveryId, ByteSpan operationaDataset, uint64_t breadcrumb = 0); |
| err = self->mController->GetConnectedDevice(self->mRemoteNodeId, &self->mOnDeviceConnectedCallback, | ||
| &self->mOnDeviceConnectionFailureCallback); | ||
| if (err != CHIP_NO_ERROR) | ||
| { | ||
| ChipLogError(Controller, "Failed to get device connection: %" CHIP_ERROR_FORMAT, err.Format()); | ||
| } | ||
| return; |
There was a problem hiding this comment.
In NetworkRecoverBase::FinishRecoverNetwork, if an error occurs and the state is not already kSendDisArmFailSafe or kSendCommissioningComplete, the code attempts to initiate a disarm command by calling mController->GetConnectedDevice (line 383). If this call fails immediately (returns an error), the function logs the error and returns without calling mNetworkRecoverCallback. In the AutoNetworkRecover implementation, this callback is responsible for deleting the AutoNetworkRecover instance. Consequently, the instance is leaked.
| * TODO: | ||
| */ |
| * TODO: | ||
| */ |
| // out of a peripheral that matches the given discriminator. | ||
| virtual void NewConnection(BleLayer * bleLayer, void * appState, const SetupDiscriminator & connDiscriminator) = 0; | ||
|
|
||
| // TODO: |
| mDiscoveredRecoveryIds.push_back(params.GetRecoveryId()); | ||
| } | ||
|
|
||
| // TODO: |
|
|
||
| chip::Ble::ChipBLEDeviceIdentificationInfo deviceInfo; | ||
|
|
||
| // TODO: |
|
PR #43115: Size comparison from 429a4d7 to 3a256ac Increases above 0.2%:
Full report (10 builds for cc13x4_26x4, cc32xx, nrfconnect, realtek, stm32)
|
| kRecover, | ||
| }; | ||
|
|
||
| class DLL_EXPORT NetworkRecover |
There was a problem hiding this comment.
could we have some doc comments on this: what is the purpose of this class, how is it used?
andy31415
left a comment
There was a problem hiding this comment.
Request changes on testing: just saying manually validated is too brief.
https://project-chip.github.io/connectedhomeip-doc/contributing/pull_request_guidelines.html#testing
- need justification on why automated testing is impossible
- given manual tests, we need sufficient details for someone to reproduce (and understand how it was tested). Need commands, observed results and how this proves that the feature works
Strongly prefer automation though if possible
| namespace chip { | ||
| namespace Controller { | ||
|
|
||
| class DeviceCommissioner; |
There was a problem hiding this comment.
why was forward declaration required?
Worried about us layering inversion if we use forward declarations, since GN only validates layering based on includes.
| TransportPayloadCapability transportPayloadCapability = TransportPayloadCapability::kMRPPayload); | ||
|
|
||
| /** | ||
| * TODO: |
There was a problem hiding this comment.
TODO seems empty ... please fill in with details.
| // out of a peripheral that matches the given discriminator. | ||
| virtual void NewConnection(BleLayer * bleLayer, void * appState, const SetupDiscriminator & connDiscriminator) = 0; | ||
|
|
||
| // TODO: |
There was a problem hiding this comment.
same here - no empty todos please.
This change brings up the Network Recovery feature implementation PR #38421 to the latest master branch of open source.
Usage:
For discovering nodes who advertising network recovery message in BLE
If don't specific the timeout value, it will take 30 seconds as default for Network Recovery scan.
Once scan completed, the RecoveryIdentifier value from each recoverable device will be listed in the chip-tool log, for example:
For recovering a node's Wi-Fi network in BLE
chip-tool networkrecovery recover-wifi node-id recovery-identifier ssid password breadcrumb, for example:
Testing
manually validated with WiFi Linux All Clusters App