Skip to content

Implement Network Recovery: Linux chip-tool#43115

Open
chulspro wants to merge 1 commit intoproject-chip:masterfrom
chulspro:feature/contribution-network-recovery-1
Open

Implement Network Recovery: Linux chip-tool#43115
chulspro wants to merge 1 commit intoproject-chip:masterfrom
chulspro:feature/contribution-network-recovery-1

Conversation

@chulspro
Copy link
Copy Markdown
Contributor

This change brings up the Network Recovery feature implementation PR #38421 to the latest master branch of open source.

Usage:

For discovering nodes who advertising network recovery message in BLE

chip-tool networkrecovery discover --timeout 10

If don't specific the timeout value, it will take 30 seconds as default for Network Recovery scan.
Once scan completed, the RecoveryIdentifier value from each recoverable device will be listed in the chip-tool log, for example:

[1744899521.197] [1746:1751] [CTL] Discovery timed out
[1744899521.198] [1746:1751] [TOO] Find recoverable devices:
[1744899521.198] [1746:1751] [TOO] 0x8877665544332211
[1744899521.199] [1746:1747] [BLE] BLE removing known devices
[1744899521.202] [1746:1747] [BLE] BLE initiating scan
[1744899521.217] [1746:1751] [BLE] ChipDeviceScanner has started scanning!
[1744899521.218] [1746:1746] [CTL] Shutting down the commissioner

For recovering a node's Wi-Fi network in BLE
chip-tool networkrecovery recover-wifi node-id recovery-identifier ssid password breadcrumb, for example:

chip-tool networkrecovery recover-wifi 1 9833440827789222417 my-ssid my-password 0

Testing

manually validated with WiFi Linux All Clusters App

This change updates the following Network Recovery feature
implementation PR to the latest master branch of open source.

project-chip#38421
Signed-off-by: Charles Kim <chulspro.kim@samsung.com>
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@chulspro chulspro added v1.6 and removed tizen For Tizen platform labels Feb 12, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the Network Recovery feature for the Linux chip-tool, enabling discovery of recoverable nodes and network credential updates over BLE. While the implementation is comprehensive, a security audit revealed a high-severity use-after-free vulnerability and a medium-severity memory leak in AutoNetworkRecover.cpp that must be addressed for stability and security. Furthermore, the review identified bugs in the public API and error handling, along with several TODO comments and a minor typo. All comments have been retained, with some enhanced by references to established code review rules.

}
else
{
self->mLastNetworkID = networkID.Value();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

In NetworkRecoverBase::OnSuccessReadLastNetworkID, the member variable mLastNetworkID (which is a chip::ByteSpan) is assigned a value from the networkID argument. networkID is a reference to a ByteSpan that points into a transient network response buffer. This buffer is only valid for the duration of the callback. However, mLastNetworkID is used later in the SendRemoveNetwork function (line 78), which is called after a new CASE session has been established. By this time, the original response buffer has been freed or reused, leading to a use-after-free. This can cause a crash or lead to the RemoveNetwork command being sent with corrupted data, potentially affecting the wrong network configuration.

chip::ByteSpan mPassword;
chip::ByteSpan mOperationalDataset;
uint64_t mBreadcrumb;
chip::Optional<uint16_t> mTimeout;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The member variable mTimeout is defined in NetworkRecoveryRecoverCommand but is never used. It seems to be a leftover from a copy-paste and should be removed to avoid confusion.

Comment on lines +231 to +234
else if (self->mNextStep == Step::kSendDisArmFailSafe)
{
err = CHIP_ERROR_INTERNAL;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There appears to be a bug in the error handling logic. When a failsafe disarm is successful (mNextStep == Step::kSendDisArmFailSafe), this code sets err = CHIP_ERROR_INTERNAL. This causes the original error that triggered the disarm to be lost and CHIP_ERROR_INTERNAL to be reported instead. A successful disarm should not be treated as an error. The recovery process should terminate by reporting the original error.

References
  1. The current error handling logic incorrectly sets err = CHIP_ERROR_INTERNAL on a successful disarm, which does not reflect the intended behavior of preserving the original error or indicating success, thus contradicting the principle of ensuring delegate methods reflect their intended behavior.

CHIP_ERROR DiscoverRecoverableNodes(uint16_t timeout);

CHIP_ERROR RecoverNode(NodeId remoteId, uint64_t recoveryId, WiFiCredentials wiFiCreds, uint64_t breadcrumb = 0);
CHIP_ERROR RecoverNode(NodeId remoteId, uint64_t recoveryId, ByteSpan operationaDataset, uint64_t breadcrumb = 0);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There is a typo in the parameter name operationaDataset. It should be operationalDataset.

    CHIP_ERROR RecoverNode(NodeId remoteId, uint64_t recoveryId, ByteSpan operationalDataset, uint64_t breadcrumb = 0);

Comment on lines +383 to +389
err = self->mController->GetConnectedDevice(self->mRemoteNodeId, &self->mOnDeviceConnectedCallback,
&self->mOnDeviceConnectionFailureCallback);
if (err != CHIP_NO_ERROR)
{
ChipLogError(Controller, "Failed to get device connection: %" CHIP_ERROR_FORMAT, err.Format());
}
return;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

In NetworkRecoverBase::FinishRecoverNetwork, if an error occurs and the state is not already kSendDisArmFailSafe or kSendCommissioningComplete, the code attempts to initiate a disarm command by calling mController->GetConnectedDevice (line 383). If this call fails immediately (returns an error), the function logs the error and returns without calling mNetworkRecoverCallback. In the AutoNetworkRecover implementation, this callback is responsible for deleting the AutoNetworkRecover instance. Consequently, the instance is leaked.

Comment on lines +91 to +92
* TODO:
*/
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is a TODO comment here. Please add the appropriate documentation for this new FindOrEstablishSession overload.

Comment on lines +222 to +223
* TODO:
*/
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is a TODO comment here. Please add the appropriate documentation for this new Connect overload.

// out of a peripheral that matches the given discriminator.
virtual void NewConnection(BleLayer * bleLayer, void * appState, const SetupDiscriminator & connDiscriminator) = 0;

// TODO:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

A TODO comment is present. Please add documentation for the new NewConnection overload or remove the comment if it's no longer applicable.

mDiscoveredRecoveryIds.push_back(params.GetRecoveryId());
}

// TODO:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

A TODO comment is present. Please address it, either by implementing the intended logic or by adding a more descriptive comment about what needs to be done.


chip::Ble::ChipBLEDeviceIdentificationInfo deviceInfo;

// TODO:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is a TODO comment here. Please address it. If the current logic is correct, the comment should be removed.

@github-actions
Copy link
Copy Markdown

PR #43115: Size comparison from 429a4d7 to 3a256ac

Increases above 0.2%:

platform target config section 429a4d7 3a256ac change % change
stm32 light STM32WB5MM-DK FLASH 472220 473184 964 0.2
Full report (10 builds for cc13x4_26x4, cc32xx, nrfconnect, realtek, stm32)
platform target config section 429a4d7 3a256ac change % change
cc13x4_26x4 lighting-app LP_EM_CC1354P10_6 FLASH 772440 773404 964 0.1
RAM 103200 103200 0 0.0
lock-ftd LP_EM_CC1354P10_6 FLASH 785248 786212 964 0.1
RAM 108480 108480 0 0.0
pump-app LP_EM_CC1354P10_6 FLASH 730336 731300 964 0.1
RAM 97236 97236 0 0.0
pump-controller-app LP_EM_CC1354P10_6 FLASH 714856 715812 956 0.1
RAM 97436 97436 0 0.0
cc32xx air-purifier CC3235SF_LAUNCHXL FLASH 555776 556740 964 0.2
RAM 204432 204432 0 0.0
lock CC3235SF_LAUNCHXL FLASH 589964 590928 964 0.2
RAM 204720 204720 0 0.0
nrfconnect all-clusters-app nrf52840dk_nrf52840 FLASH 954168 955120 952 0.1
RAM 161925 161925 0 0.0
realtek light-switch-app rtl8777g FLASH 703440 704592 1152 0.2
RAM 113356 113356 0 0.0
lighting-app rtl8777g FLASH 745336 746488 1152 0.2
RAM 114564 114564 0 0.0
stm32 light STM32WB5MM-DK FLASH 472220 473184 964 0.2
RAM 141208 141208 0 0.0

kRecover,
};

class DLL_EXPORT NetworkRecover
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we have some doc comments on this: what is the purpose of this class, how is it used?

Copy link
Copy Markdown
Contributor

@andy31415 andy31415 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chulspro are we unable to add automated tests for this? we added new files: network recover and auto network recover, but no unit tests for them

Copy link
Copy Markdown
Contributor

@andy31415 andy31415 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request changes on testing: just saying manually validated is too brief.

https://project-chip.github.io/connectedhomeip-doc/contributing/pull_request_guidelines.html#testing

  • need justification on why automated testing is impossible
  • given manual tests, we need sufficient details for someone to reproduce (and understand how it was tested). Need commands, observed results and how this proves that the feature works

Strongly prefer automation though if possible

@github-project-automation github-project-automation Bot moved this from Todo to In Progress in [Platform] Darwin Feb 12, 2026
namespace chip {
namespace Controller {

class DeviceCommissioner;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was forward declaration required?

Worried about us layering inversion if we use forward declarations, since GN only validates layering based on includes.

TransportPayloadCapability transportPayloadCapability = TransportPayloadCapability::kMRPPayload);

/**
* TODO:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO seems empty ... please fill in with details.

// out of a peripheral that matches the given discriminator.
virtual void NewConnection(BleLayer * bleLayer, void * appState, const SetupDiscriminator & connDiscriminator) = 0;

// TODO:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here - no empty todos please.

@andy31415
Copy link
Copy Markdown
Contributor

@chulspro this PR had no updates in a couple of months and currently has failing CI and conflicts.

is this being worked on or is it stale / should be closed?

I closed a 1-year old PR that was related as stale: #38421 (which seems to be correct since this PR is described as superseeding it).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

4 participants