Implement Network Recovery: Linux chip-tool by chulspro · Pull Request #43115 · project-chip/connectedhomeip

chulspro · 2026-02-12T12:59:31Z

This change brings up the Network Recovery feature implementation PR #38421 to the latest master branch of open source.

Usage:

For discovering nodes who advertising network recovery message in BLE

chip-tool networkrecovery discover --timeout 10

If don't specific the timeout value, it will take 30 seconds as default for Network Recovery scan.
Once scan completed, the RecoveryIdentifier value from each recoverable device will be listed in the chip-tool log, for example:

[1744899521.197] [1746:1751] [CTL] Discovery timed out
[1744899521.198] [1746:1751] [TOO] Find recoverable devices:
[1744899521.198] [1746:1751] [TOO] 0x8877665544332211
[1744899521.199] [1746:1747] [BLE] BLE removing known devices
[1744899521.202] [1746:1747] [BLE] BLE initiating scan
[1744899521.217] [1746:1751] [BLE] ChipDeviceScanner has started scanning!
[1744899521.218] [1746:1746] [CTL] Shutting down the commissioner

For recovering a node's Wi-Fi network in BLE
chip-tool networkrecovery recover-wifi node-id recovery-identifier ssid password breadcrumb, for example:

chip-tool networkrecovery recover-wifi 1 9833440827789222417 my-ssid my-password 0

Testing

manually validated with WiFi Linux All Clusters App

This change updates the following Network Recovery feature implementation PR to the latest master branch of open source. project-chip#38421 Signed-off-by: Charles Kim <chulspro.kim@samsung.com>

CLAassistant · 2026-02-12T12:59:42Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

gemini-code-assist

Code Review

This pull request introduces the Network Recovery feature for the Linux chip-tool, enabling discovery of recoverable nodes and network credential updates over BLE. While the implementation is comprehensive, a security audit revealed a high-severity use-after-free vulnerability and a medium-severity memory leak in AutoNetworkRecover.cpp that must be addressed for stability and security. Furthermore, the review identified bugs in the public API and error handling, along with several TODO comments and a minor typo. All comments have been retained, with some enhanced by references to established code review rules.

gemini-code-assist · 2026-02-12T13:02:54Z

+    }
+    else
+    {
+        self->mLastNetworkID = networkID.Value();


In NetworkRecoverBase::OnSuccessReadLastNetworkID, the member variable mLastNetworkID (which is a chip::ByteSpan) is assigned a value from the networkID argument. networkID is a reference to a ByteSpan that points into a transient network response buffer. This buffer is only valid for the duration of the callback. However, mLastNetworkID is used later in the SendRemoveNetwork function (line 78), which is called after a new CASE session has been established. By this time, the original response buffer has been freed or reused, leading to a use-after-free. This can cause a crash or lead to the RemoveNetwork command being sent with corrupted data, potentially affecting the wrong network configuration.

gemini-code-assist · 2026-02-12T13:02:54Z

+    chip::ByteSpan mPassword;
+    chip::ByteSpan mOperationalDataset;
+    uint64_t mBreadcrumb;
+    chip::Optional<uint16_t> mTimeout;


The member variable mTimeout is defined in NetworkRecoveryRecoverCommand but is never used. It seems to be a leftover from a copy-paste and should be removed to avoid confusion.

gemini-code-assist · 2026-02-12T13:02:54Z

+    else if (self->mNextStep == Step::kSendDisArmFailSafe)
+    {
+        err = CHIP_ERROR_INTERNAL;
+    }


There appears to be a bug in the error handling logic. When a failsafe disarm is successful (mNextStep == Step::kSendDisArmFailSafe), this code sets err = CHIP_ERROR_INTERNAL. This causes the original error that triggered the disarm to be lost and CHIP_ERROR_INTERNAL to be reported instead. A successful disarm should not be treated as an error. The recovery process should terminate by reporting the original error.

References

The current error handling logic incorrectly sets err = CHIP_ERROR_INTERNAL on a successful disarm, which does not reflect the intended behavior of preserving the original error or indicating success, thus contradicting the principle of ensuring delegate methods reflect their intended behavior.

gemini-code-assist · 2026-02-12T13:02:54Z

+    CHIP_ERROR DiscoverRecoverableNodes(uint16_t timeout);
+
+    CHIP_ERROR RecoverNode(NodeId remoteId, uint64_t recoveryId, WiFiCredentials wiFiCreds, uint64_t breadcrumb = 0);
+    CHIP_ERROR RecoverNode(NodeId remoteId, uint64_t recoveryId, ByteSpan operationaDataset, uint64_t breadcrumb = 0);


There is a typo in the parameter name operationaDataset. It should be operationalDataset.

CHIP_ERROR RecoverNode(NodeId remoteId, uint64_t recoveryId, ByteSpan operationalDataset, uint64_t breadcrumb = 0);

gemini-code-assist · 2026-02-12T13:02:54Z

+            err = self->mController->GetConnectedDevice(self->mRemoteNodeId, &self->mOnDeviceConnectedCallback,
+                                                        &self->mOnDeviceConnectionFailureCallback);
+            if (err != CHIP_NO_ERROR)
+            {
+                ChipLogError(Controller, "Failed to get device connection: %" CHIP_ERROR_FORMAT, err.Format());
+            }
+            return;


In NetworkRecoverBase::FinishRecoverNetwork, if an error occurs and the state is not already kSendDisArmFailSafe or kSendCommissioningComplete, the code attempts to initiate a disarm command by calling mController->GetConnectedDevice (line 383). If this call fails immediately (returns an error), the function logs the error and returns without calling mNetworkRecoverCallback. In the AutoNetworkRecover implementation, this callback is responsible for deleting the AutoNetworkRecover instance. Consequently, the instance is leaked.

gemini-code-assist · 2026-02-12T13:02:54Z

+     * TODO:
+     */


There is a TODO comment here. Please add the appropriate documentation for this new FindOrEstablishSession overload.

gemini-code-assist · 2026-02-12T13:02:54Z

+     * TODO:
+     */


There is a TODO comment here. Please add the appropriate documentation for this new Connect overload.

gemini-code-assist · 2026-02-12T13:02:54Z

    // out of a peripheral that matches the given discriminator.
    virtual void NewConnection(BleLayer * bleLayer, void * appState, const SetupDiscriminator & connDiscriminator) = 0;

+    // TODO:


A TODO comment is present. Please add documentation for the new NewConnection overload or remove the comment if it's no longer applicable.

gemini-code-assist · 2026-02-12T13:02:54Z

+                mDiscoveredRecoveryIds.push_back(params.GetRecoveryId());
+            }
+
+            // TODO:


A TODO comment is present. Please address it, either by implementing the intended logic or by adding a more descriptive comment about what needs to be done.

gemini-code-assist · 2026-02-12T13:02:54Z


    chip::Ble::ChipBLEDeviceIdentificationInfo deviceInfo;

+    // TODO:


There is a TODO comment here. Please address it. If the current logic is correct, the comment should be removed.

github-actions · 2026-02-12T13:26:46Z

PR #43115: Size comparison from 429a4d7 to 3a256ac

Increases above 0.2%:

platform	target	config	section	`429a4d7`	`3a256ac`	change	% change
stm32	light	STM32WB5MM-DK	FLASH	472220	473184	964	0.2

Full report (10 builds for cc13x4_26x4, cc32xx, nrfconnect, realtek, stm32)

platform	target	config	section	`429a4d7`	`3a256ac`	change	% change
cc13x4_26x4	lighting-app	LP_EM_CC1354P10_6	FLASH	772440	773404	964	0.1
			RAM	103200	103200	0	0.0
	lock-ftd	LP_EM_CC1354P10_6	FLASH	785248	786212	964	0.1
			RAM	108480	108480	0	0.0
	pump-app	LP_EM_CC1354P10_6	FLASH	730336	731300	964	0.1
			RAM	97236	97236	0	0.0
	pump-controller-app	LP_EM_CC1354P10_6	FLASH	714856	715812	956	0.1
			RAM	97436	97436	0	0.0
cc32xx	air-purifier	CC3235SF_LAUNCHXL	FLASH	555776	556740	964	0.2
			RAM	204432	204432	0	0.0
	lock	CC3235SF_LAUNCHXL	FLASH	589964	590928	964	0.2
			RAM	204720	204720	0	0.0
nrfconnect	all-clusters-app	nrf52840dk_nrf52840	FLASH	954168	955120	952	0.1
			RAM	161925	161925	0	0.0
realtek	light-switch-app	rtl8777g	FLASH	703440	704592	1152	0.2
			RAM	113356	113356	0	0.0
	lighting-app	rtl8777g	FLASH	745336	746488	1152	0.2
			RAM	114564	114564	0	0.0
stm32	light	STM32WB5MM-DK	FLASH	472220	473184	964	0.2
			RAM	141208	141208	0	0.0

andy31415 · 2026-02-12T18:37:27Z

+    kRecover,
+};
+
+class DLL_EXPORT NetworkRecover


could we have some doc comments on this: what is the purpose of this class, how is it used?

andy31415

@chulspro are we unable to add automated tests for this? we added new files: network recover and auto network recover, but no unit tests for them

andy31415

Request changes on testing: just saying manually validated is too brief.

https://project-chip.github.io/connectedhomeip-doc/contributing/pull_request_guidelines.html#testing

need justification on why automated testing is impossible
given manual tests, we need sufficient details for someone to reproduce (and understand how it was tested). Need commands, observed results and how this proves that the feature works

Strongly prefer automation though if possible

andy31415 · 2026-02-12T18:42:11Z

+namespace chip {
+namespace Controller {
+
+class DeviceCommissioner;


why was forward declaration required?

Worried about us layering inversion if we use forward declarations, since GN only validates layering based on includes.

andy31415 · 2026-02-12T18:42:30Z

                 TransportPayloadCapability transportPayloadCapability = TransportPayloadCapability::kMRPPayload);

+    /**
+     * TODO:


TODO seems empty ... please fill in with details.

andy31415 · 2026-02-12T18:42:41Z

    // out of a peripheral that matches the given discriminator.
    virtual void NewConnection(BleLayer * bleLayer, void * appState, const SetupDiscriminator & connDiscriminator) = 0;

+    // TODO:


same here - no empty todos please.

andy31415 · 2026-04-30T19:26:13Z

@chulspro this PR had no updates in a couple of months and currently has failing CI and conflicts.

is this being worked on or is it stale / should be closed?

I closed a 1-year old PR that was related as stale: #38421 (which seems to be correct since this PR is described as superseeding it).

Implement Network Recovery: Linux chip-tool

3a256ac

This change updates the following Network Recovery feature implementation PR to the latest master branch of open source. project-chip#38421 Signed-off-by: Charles Kim <chulspro.kim@samsung.com>

chulspro requested review from marktrayer and robszewczyk February 12, 2026 12:59

chulspro requested review from a team, arkq and joonhaengHeo as code owners February 12, 2026 12:59

github-actions Bot added tools examples scripts platform darwin esp32 linux controller ble app tizen For Tizen platform labels Feb 12, 2026

chulspro removed tools examples scripts platform darwin esp32 linux controller ble app labels Feb 12, 2026

chulspro added v1.6 and removed tizen For Tizen platform labels Feb 12, 2026

github-project-automation Bot added this to [Platform] Darwin Feb 12, 2026

github-project-automation Bot moved this to Todo in [Platform] Darwin Feb 12, 2026

gemini-code-assist Bot reviewed Feb 12, 2026

View reviewed changes

andy31415 reviewed Feb 12, 2026

View reviewed changes

andy31415 requested changes Feb 12, 2026

View reviewed changes

github-project-automation Bot moved this from Todo to In Progress in [Platform] Darwin Feb 12, 2026

andy31415 reviewed Feb 12, 2026

View reviewed changes


		chip::Ble::ChipBLEDeviceIdentificationInfo deviceInfo;

		// TODO:

Conversation

chulspro commented Feb 12, 2026

Usage:

Testing

Uh oh!

CLAassistant commented Feb 12, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Feb 12, 2026

Uh oh!

andy31415 Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

andy31415 left a comment

Choose a reason for hiding this comment

Uh oh!

andy31415 left a comment

Choose a reason for hiding this comment

Uh oh!

andy31415 Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

andy31415 Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

andy31415 Feb 12, 2026

Choose a reason for hiding this comment

Uh oh!

andy31415 commented Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants