Skip to content

Implement bounded local client app metadata storage#10392

Open
gotnull wants to merge 3 commits into
meshtastic:developfrom
gotnull:feature/client-app-data-store
Open

Implement bounded local client app metadata storage#10392
gotnull wants to merge 3 commits into
meshtastic:developfrom
gotnull:feature/client-app-data-store

Conversation

@gotnull
Copy link
Copy Markdown

@gotnull gotnull commented May 5, 2026

Summary

Implement the firmware side of the bounded local client-app metadata store proposed in Meshtastic/rfcs#12 and meshtastic/protobufs#907. Adds a tiny new module (ClientAppDataStore), wires it into AdminModule for the four new oneof tags 104..107, persists records to /prefs/clientappdata.proto, and ships native test coverage.

This is intentionally namespaced, not owned: the firmware enforces shape and capacity, but does NOT authenticate which companion app is writing. Any admin-capable client may overwrite or delete any app_id. See the linked RFC for the full namespaced-not-owned framing.

Firmware behavior

Three commits on this branch:

  1. firmware: implement local client app data store (a833cfc)
  2. firmware: handle client app data admin requests (d054049)
  3. firmware: add client app data tests (5979032)

Storage path and lifecycle

  • Persisted to /prefs/clientappdata.proto via the existing NodeDB::saveProto helper (atomic via SafeFile, SPI-locked, power-safe gated).
  • Loaded on boot via ClientAppDataStore::init(), which is called from main.cpp right after nodeDB = new NodeDB and TransmitHistory::loadFromDisk(), alongside the other /prefs-reading initializers.
  • Missing or undecodable file at boot leaves the store empty (no log noise on first boot; only LOG_WARN for genuine decode failures).
  • Cleared automatically by NodeDB::factoryReset() through the existing rmDir("/prefs") path. No new firmware code required.

Local-only set/delete, remote authenticated read

In AdminModule::handleReceivedProtobuf:

  • set_client_app_data (tag 104): rejected with Routing_Error_NOT_AUTHORIZED if mp.from != 0. Local writes call ClientAppDataStore::set and rely on the existing auto-ACK path to emit Routing_Error_NONE.
  • get_client_app_data_request (tag 105): allowed for local clients and for authenticated remote admin (matches every other get_*_request tag). Replies with get_client_app_data_response. On miss the response carries an empty app_id sentinel.
  • delete_client_app_data_request (tag 107): rejected with NOT_AUTHORIZED if remote. Treats delete-of-missing as success (idempotent), matching remove_by_nodenum, remove_favorite_node, and delete_file_request precedent.
  • messageIsRequest() and messageIsResponse() allowlists are extended to enumerate the new request/response tags so the auth gate does not mis-classify them as state-changing writes.
  • setPassKey() is called on the get response, mirroring handleGetOwner and handleGetDeviceUIConfig. PKI encryption flag is propagated.

Error mapping via existing Routing_Error

No new status enum, no new response envelope. All failures map to existing routing-layer errors:

Store result Wire reply
Ok (set/delete) auto-ACK Routing_Error_NONE
Ok (get) populated get_client_app_data_response
NotFound (get) get_client_app_data_response with empty app_id
NotFound (delete) auto-ACK Routing_Error_NONE (idempotent)
InvalidAppId / PayloadTooLarge / NoSpace / StorageError Routing_Error_BAD_REQUEST
Remote sender on set/delete Routing_Error_NOT_AUTHORIZED

Store-layer logs (LOG_WARN / LOG_ERROR) disambiguate the BAD_REQUEST cases for operators.

Limits enforced

  • app_id matches ^[a-z0-9._-]{1,32}$ (pure char-by-char check, no <regex>)
  • payload size <= 512 bytes (rejected at the store boundary even when nanopb's fixed buffer would otherwise truncate)
  • Max 4 records per node, statically allocated as meshtastic_LocalClientAppData store_;. No heap.
  • Overwriting an existing app_id reuses the slot (no extra capacity consumed).
  • remove() compacts trailing records to free the slot for re-use.

updated_at semantics

Always server-set via getValidTime(RTCQualityDevice). Caller-supplied values are ignored. Returns 0 if the firmware has no valid wall-clock time at write. Matches the spec in the proto comment.

NodeDB.h virtual change

loadProto and saveProto now have virtual (added in commit 3, scoped to that commit specifically). Mirrors the existing MockNodeDB::getMeshNode override pattern in test/test_traffic_management. Cost: one indirect call per save/load (negligible vs flash I/O). Benefit: enables the test suite's MockNodeDB to substitute these helpers without touching the real filesystem.

Portable struct zeroing

Three sites in ClientAppDataStore.cpp (one in init(), one in clear(), one in remove()) zero a nanopb-generated POD slot back to its empty state. They use memset(&dst, 0, sizeof(dst)) rather than the more idiomatic dst = meshtastic_ClientAppData_init_zero; because the nRF52 toolchain (arm-none-eabi-gcc 9.2.1) rejects brace-init-list assignment to an existing POD struct via operator=, while host gcc (used by the Portduino test env) accepts it. memset is portable across both, and is byte-equivalent for these init-zero macros (every field's zero value is byte-zero).

Tests run and results

$ bash bin/test-native-docker.sh -f test_client_app_data
================= 27 test cases: 27 succeeded in 00:03:12.789 =================

$ bash bin/test-native-docker.sh -f test_default
================= 14 test cases: 14 succeeded in 00:03:22.012 =================

Both runs: zero compile errors, zero linker errors, zero warnings on any feature file. Validated on the Docker-based [env:coverage] (Portduino-based native, Python 3.14 + PlatformIO 6.1.19).

The 27 new tests cover: isValidAppId accept/reject matrix (conventional names, null, empty, uppercase, whitespace, oversize, forbidden chars), set/get round trip with byte-exact verification, payload-size bounds (at-max, over-max), slot accounting (overwrite reuses, max-records enforced, delete frees), delete-then-get NotFound, delete-of-missing semantic, server-set updated_at (caller value ignored, real epoch when test RTC is seeded), persistence-survives-reinit (mock retains saved snapshot across init reset), storage-error injection on set and remove, factory-reset semantic (modeled by mock returning OTHER_FAILURE, matching the post-rmDir("/prefs") world), and first-boot empty store.

Dispatch test harness limitation

Direct dispatch tests for the four new AdminModule cases are deliberately omitted. The case bodies are thin glue (validation, Result -> Routing_Error mapping, MeshPacket reply construction). Exercising them in isolation would require standing up channels / service / MeshPacket fakes that the native test harness does not provide, and would re-test logic the store-level cases below already cover. The dispatch glue compiles and links via the same coverage build that runs the test suite. Documented inline at the top of test/test_client_app_data/test_main.cpp.

Security caveat: namespaced, not owned

Repeated here for visibility on the firmware PR specifically:

The firmware validates app_id shape, payload size, and record limits, but it does not authenticate that a specific companion app owns a given app_id. Any admin-capable client may be able to write or delete any app_id unless future firmware policy adds stronger ownership controls. Clients must therefore treat stored metadata as untrusted, optional, and recoverable. Sensitive data, secrets, paid entitlement state, identity keys, session keys, trust authority, blocklists, and security-critical decisions must not be stored here.

session_passkey is a node-minted nonce (replay protection, not per-client identity). config.security.admin_key[3] identifies operators who can issue PKC-encrypted remote admin (not per companion app). PhoneAPI sees a single undifferentiated stream from the connected phone. There is no firmware primitive today that could be repurposed for per-app ownership; adding one would be a separate, larger RFC.

Compatibility

Wire-additive only. Existing clients are unaffected. Companion apps built against this PR will get either Routing_Error_NOT_AUTHORIZED (today's catch-all for unknown admin payloads on older builds) or no reply at all when talking to firmware that doesn't implement the feature; both must be treated as "feature unavailable, fall back to app-local storage."

No version bump. No device_state migration. The on-disk file is a fresh /prefs/clientappdata.proto; absence is the expected first-boot state.

Hardware E2E validation plan

Native tests pass on the Docker coverage env (Portduino-based). The nRF52 build of this branch (arm-none-eabi-gcc 9.2.1) was also validated locally for the WisMesh Tag environment (bash bin/build-nrf52.sh rak_wismeshtag, SUCCESS, UF2 + DFU OTA zip generated). Real-device validation has been executed against a flashed RAK WisMesh Tag and passed all checklist steps, with results captured in the "Hardware E2E validation result" subsection below.

Target device

RAK WisMesh Tag (an nRF52840 RAK4631-family tag from RAKwireless / MOKO Smart that the developer has on hand). Single representative target; if maintainers want broader coverage before merge, happy to extend to a second target.

Why this is not part of the current native/unit phase

The native coverage env exercises every line of the new module and dispatch glue under a real (mocked-NodeDB) lifecycle, but it does not execute the embedded toolchain, real flash I/O, the actual SafeFile atomic-write code path, or any BLE/serial transport. Hardware E2E catches: real-flash write/read round-trips, BLE GATT transport sizing for the 512-byte payload, real RTC behavior for updated_at (vs the host-clock seed in the test harness), and any platform-specific #ifdef paths that the Portduino build skips.

Splitting it off keeps the merge-blocking phase (CI-friendly native tests) cheap and runnable by maintainers without dedicated hardware, and lets the real-device pass land as a comment on the PR with logs and an artifact reference.

Validation checklist

  1. Build the firmware for the WisMesh Tag PlatformIO environment: bash bin/build-nrf52.sh rak_wismeshtag. Produces UF2 + DFU OTA zip + ELF + manifest under release/.
  2. Flash the device using the repo-supported artifact and documented flashing path (the variant's documented flow, or the official Meshtastic flasher).
  3. Verify boot: device comes up, reaches normal idle, no boot-loop, no startup error log.
  4. Connect locally over BLE (preferred for the GATT-sizing check) or USB serial. Use the official Meshtastic CLI / phone app pre-bond as needed.
  5. Send set_client_app_data with the test record:
    • app_id = "app.mesh.test"
    • version = 1
    • payload = a deterministic 64-byte pattern (e.g. 0..63)
    • leave updated_at zero on the wire (firmware overwrites it)
  6. Read it back via get_client_app_data_request. Verify byte-for-byte equality on payload, version == 1, app_id == "app.mesh.test", updated_at is a real epoch (or 0 if RTC is uninitialized on this build).
  7. Reboot the device (admin reboot or power cycle). After re-bond, repeat the read. Verify the record persists.
  8. Delete via delete_client_app_data_request("app.mesh.test"). Repeat the read; verify the response is a get_client_app_data_response with empty app_id (the NOT_FOUND sentinel).
  9. Factory reset via the existing admin path (factory_reset_config is enough; factory_reset_device also fine). Re-bond. Repeat the read; verify the record is still absent (factory reset cleared /prefs including our file).
  10. Confirm normal Meshtastic behavior is unaffected: send/receive a text on the primary channel, verify NodeInfo broadcast still happens on schedule, verify position fix and telemetry still surface as expected. The feature is local-only, so any regression here would point at unrelated breakage from the submodule bump or the regenerated headers.

Use the test app_id app.mesh.test throughout. Do not use any production app_id during validation.

Rollback / safety checklist

  • Back up the current node config first if practical: use admin backup_preferences to flash, or pull config via the Meshtastic CLI to a local file.
  • Do not store secrets in the validation payload. Use the deterministic byte pattern above; nothing sensitive.
  • Do not store entitlement, identity, trust, routing, or auth state in the validation payload (matches the namespaced-not-owned rule from the proto comment).
  • Confirm the rollback path before flashing: keep a known-good release artifact for the WisMesh Tag on hand (downloaded from meshtastic.org/downloads/ or the latest GitHub Release), and verify the official Meshtastic flasher can re-flash it from a clean device. If anything goes sideways, this is the recovery path.
  • Record in the validation note posted to this PR: the exact firmware artifact (commit SHA + env name), the flashing method (CLI tool + version, or web flasher build date), and the captured logs from each step. If anything fails, attach the log excerpt verbatim instead of paraphrasing.

Where this happens

After the RFC PR and Protobufs PR are open and linked. Either before or during firmware PR review, depending on maintainer cadence. The validation note will be appended to this PR as a comment with an artifact link and a checklist of pass/fail per step.

Hardware E2E validation result

Field Value
Date 2026-05-05
Target RAK WisMesh Tag
Nodenum 5aad5ed6
Firmware artifact firmware-rak_wismeshtag-2.7.23.5979032.uf2
UF2 MD5 960e8c28fa1db02f6b0bf035a272e932
Transport local connected device path used by the diagnostic client (BLE)
Test app_id app.mesh.test
Test payload deterministic 64-byte pattern, bytes 0..63

Per-step results:

Step Result
Get probe before set Pass (empty app_id sentinel returned, NOT_FOUND)
Set + verify Pass (response carried app_id="app.mesh.test", version=1, payload=64B, byte-for-byte match against the deterministic pattern)
Verify after reboot Pass (after manual power cycle and BLE reconnect, response still carried the same app_id, version, 64-byte payload, and the firmware-set updated_at)
Delete + verify Pass (follow-up get returned the empty app_id sentinel)
Get probe after delete Pass (empty app_id sentinel)

Notes:

  • Set transport flush: 52 ms.
  • Delete transport flush: 42 ms.
  • Persistence proof. The firmware-set updated_at value remained byte-equal across the manual reboot, and the 64-byte payload verified byte-for-byte after reconnect. This confirms NodeDB::saveProto("/prefs/clientappdata.proto", ...) survived the power cycle and was re-loaded by ClientAppDataStore::init() on boot.
  • Empty app_id sentinel confirmed after delete and after a clean factory state, exactly as documented for tag 106.
  • Safety. The validation payload contained no secrets, no entitlement state, no identity material, no trust state, no routing state, and no auth state. It is a deterministic 64-byte test pattern.
  • The diagnostic client used to drive the validation is a local debug-only harness in the companion app; it is not shipped as a user feature and is not required to verify or reproduce these results.

Linked

gotnull added 3 commits May 5, 2026 17:14
Add a small bounded local-node-only metadata store for companion apps,
exposed through the new AdminMessage tags 104..107. Each record carries
an opaque payload up to 512 bytes keyed by app_id, persisted to
/prefs/clientappdata.proto via the existing NodeDB::saveProto helper,
and cleared automatically by factoryReset() through the existing
rmDir("/prefs") path.

Includes:
  - protobufs submodule bump to the matching feature commit
  - regenerated nanopb output for admin and localonly
  - clientAppDataFileName constant in NodeDB.h
  - new ClientAppDataStore class (modules/ClientAppDataStore.{h,cpp})
    with strict app_id validation (^[a-z0-9._-]{1,32}$), bounded slot
    table (max 4 records, no heap), in-place overwrite, slot compaction
    on remove, server-set updated_at via getValidTime(RTCQualityDevice)
  - ClientAppDataStore::init() wired into main.cpp after nodeDB init,
    alongside the other /prefs-reading initializers

The store is namespaced, not owned: firmware enforces shape and
capacity but does not authenticate which client is writing. Callers
must treat payloads as untrusted, optional, and recoverable.

AdminModule dispatch wiring and tests follow in subsequent commits.
Wire AdminMessage dispatch for the four new client_app_data tags
(104..107) into AdminModule::handleReceivedProtobuf:

  - set_client_app_data: local-only (mp.from != 0 -> NOT_AUTHORIZED).
    On valid input: ClientAppDataStore::set; relies on the existing
    auto-ACK path to emit Routing_Error_NONE. Validation, payload-size,
    no-space, and storage failures map to Routing_Error_BAD_REQUEST.

  - get_client_app_data_request: validates app_id and replies with
    get_client_app_data_response. On miss the response carries an
    empty app_id sentinel (the firmware returns the _init_default
    record because the store leaves *out untouched on NotFound).
    Calls setPassKey on the response, mirroring handleGetOwner /
    handleGetDeviceUIConfig style. Honors mp.pki_encrypted.

  - delete_client_app_data_request: local-only. Treats delete-of-
    missing as success (idempotent), matching remove_by_nodenum /
    remove_favorite_node / delete_file_request precedent.

Also extend messageIsRequest()/messageIsResponse() to enumerate the
new request/response tags so the auth path does not mis-classify them
as state-changing writes (which would force a session_passkey check).

No new private methods on AdminModule; all logic is inline in the
case bodies, matching the existing handler style.
Add test/test_client_app_data/test_main.cpp covering ClientAppDataStore
end-to-end against a MockNodeDB that intercepts loadProto/saveProto for
the LocalClientAppData descriptor. 27 tests across:

  - isValidAppId() accept/reject matrix (conventional names, null,
    empty, uppercase, whitespace, oversize, forbidden chars)
  - set/get round trip + payload-size bounds (at-max, over-max)
  - slot accounting (overwrite reuses, delete frees, max-records enforced)
  - delete-then-get NotFound, delete-of-missing semantic
  - server-set updated_at (caller-supplied value is ignored, real
    epoch when test RTC is seeded by initializeTestEnvironment)
  - persistence-survives-reinit (mock retains saved snapshot)
  - storage error injection on set and remove
  - factory-reset semantic (modeled by mock returning OTHER_FAILURE,
    matching the post-rmDir("/prefs") world)
  - first-boot empty store

Also mark NodeDB::loadProto and NodeDB::saveProto virtual so tests can
substitute MockNodeDB. Mirrors the existing MockNodeDB::getMeshNode
override pattern in test/test_traffic_management. No production behavior
change; one indirect call per save/load is negligible vs flash I/O.

Dispatch-path tests (AdminModule cases for tags 104..107) are
deliberately omitted as harness-limited: exercising them in isolation
would require standing up channels/service/MeshPacket fakes that the
native test harness does not provide, and would re-test glue logic
the store-level cases already cover. Documented inline at the top
of test_main.cpp.

Both bin/test-native-docker.sh -f test_client_app_data and
bin/test-native-docker.sh -f test_default pass clean. Zero compile
errors, zero linker errors, zero warnings on any feature file.
@github-actions github-actions Bot added the needs-review Needs human review label May 5, 2026
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented May 5, 2026

CLA assistant check
All committers have signed the CLA.

@gotnull
Copy link
Copy Markdown
Author

gotnull commented May 12, 2026

Field-validated this branch end-to-end from an iOS companion app driving the new admin tags over BLE. Posting numbers in case they help on Q3 of the protobufs PR ("acceptable across constrained targets?").

Hardware / firmware. RAK WisMesh Tag (nRF52840) flashed from this PR branch. Companion app: an iOS Flutter client that talks AdminMessage 104..107 directly over the BLE service.

App-side payload. A fixed 60-byte structure (2 magic bytes, version, length, flags, seen-bits, hints, timestamps, CRC32). Sits at ~12% of the 512-byte cap; well below worst-case LocalClientAppData size 2258 bytes even with all four slots used.

What was exercised.

Path Result
First read on unwritten slot → empty-app_id sentinel OK
set_client_app_data flush → ACK OK, 60-byte payload accepted
get_client_app_data_request → response with payload + firmware-set updated_at OK
Re-write with different bytes → ACK + read-back match OK
delete_client_app_data_request → ACK → re-read returns empty-app_id sentinel OK
BLE link cycle (forced disconnect + auto-reconnect) → slot state preserved OK
Device hard power-cycle → re-read returns byte-identical payload OK, updated_at preserved, no re-stamp on read

The persistence-over-reboot leg is the one not explicitly covered in the PR body's Validation section, so flagging it specifically: payload bytes and the firmware-stamped updated_at survived a full power-off / power-on cycle without modification.

Behaviours worth confirming I'm reading right:

  1. Firmware does not re-stamp updated_at on a get_client_app_data_request. Only writes stamp it. Verified by snapshot diff across reboot.
  2. The empty-app_id response from a get-miss is the same shape whether the slot was never written or was just deleted. Both behave identically client-side, which keeps the consumer simple.

Happy to share the diagnostic-side trace if it would be useful for the maintainer review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request needs-review Needs human review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants