
Add OTA failure reason reporting and extensible status details #84

Open
MathiasKoch wants to merge 4 commits into feature/mqtt-trait from rustot-ota-reasons
@MathiasKoch
Member

Summary

Adds granular failure diagnostics to OTA job status updates. When an OTA operation fails, the job status now includes a human-readable reason string and numeric error_code in the status details sent to AWS IoT — enabling cloud-side root cause analysis without device log access.

Introduces the StatusDetailsExt trait, allowing PAL implementations to inject custom key-value pairs (device temperature, battery level, firmware version, etc.) into every job status update alongside the base OTA fields.

Also adds self-managed integration test infrastructure that automates OTA job provisioning, S3 upload, and cloud-side assertion against AWS IoT.

Note: This is PR 3 of 4 in a stacked series. Depends on #83 (generic MqttClient trait).

Design

Failure Reason Flow

OtaError::Pal(pal_err)
  → ImageStateReason::Pal(OtaPalError)
    → JobStatusReason::Aborted(Some(reason))
      → OtaStatusDetails { reason: "sig_check_failed", error_code: 1001 }
        → MQTT job update with status details
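The first three steps of this chain can be sketched as below. This is a minimal illustration, not the crate's actual definitions: the variant names `OtaError::Pal`, `ImageStateReason::Pal`, and `JobStatusReason::Aborted` come from the flow above, while the `OtaError::Other` variant, the derives, and the `job_status_reason` helper are hypothetical and exist only to show the contrast with the old `Aborted(None)` behaviour.

```rust
/// Two of the PAL failure modes named in this PR; the real enum has more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OtaPalError {
    SignatureCheckFailed,
    BadImageState,
}

/// `Other` is a hypothetical stand-in for every non-PAL error.
#[derive(Debug)]
pub enum OtaError {
    Pal(OtaPalError),
    Other,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ImageStateReason {
    Pal(OtaPalError),
}

#[derive(Debug, PartialEq, Eq)]
pub enum JobStatusReason {
    Aborted(Option<ImageStateReason>),
}

/// Hypothetical helper: keeps the inner PAL error instead of dropping it.
pub fn job_status_reason(err: &OtaError) -> JobStatusReason {
    match err {
        // The PAL error is carried through, so the cloud side can see why.
        OtaError::Pal(pal_err) => {
            JobStatusReason::Aborted(Some(ImageStateReason::Pal(*pal_err)))
        }
        // No specific reason is available for other errors.
        OtaError::Other => JobStatusReason::Aborted(None),
    }
}
```

With this shape, `job_status_reason(&OtaError::Pal(OtaPalError::SignatureCheckFailed))` yields `Aborted(Some(Pal(SignatureCheckFailed)))` rather than `Aborted(None)`. The final step, turning the reason into a string and numeric code, is left out here.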

OtaPalError variants map to reason strings and error codes:

  • SignatureCheckFailed → "sig_check_failed" / 1001
  • FileWriteFailed → "file_write_failed" / 1002
  • FileCloseFailed → "file_close_failed" / 1003
  • BadImageState → "bad_image_state" / 1004
  • And others for each PAL failure mode
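As a rough sketch of the mapping listed above, the two methods named in the changelog could look like this. Only the four listed variants are included, and the signatures are assumptions (in particular the `u16` return type and the `&'static str` reason); the actual crate may differ.

```rust
/// Subset of the PAL failure modes; the real enum covers more cases.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OtaPalError {
    SignatureCheckFailed,
    FileWriteFailed,
    FileCloseFailed,
    BadImageState,
}

impl OtaPalError {
    /// Short machine-friendly reason string sent in the status details.
    pub fn as_reason_str(&self) -> &'static str {
        match self {
            OtaPalError::SignatureCheckFailed => "sig_check_failed",
            OtaPalError::FileWriteFailed => "file_write_failed",
            OtaPalError::FileCloseFailed => "file_close_failed",
            OtaPalError::BadImageState => "bad_image_state",
        }
    }

    /// Numeric code paired with the reason string (return type assumed).
    pub fn error_code(&self) -> u16 {
        match self {
            OtaPalError::SignatureCheckFailed => 1001,
            OtaPalError::FileWriteFailed => 1002,
            OtaPalError::FileCloseFailed => 1003,
            OtaPalError::BadImageState => 1004,
        }
    }
}
```

Keeping both mappings as exhaustive `match` expressions means the compiler flags any new variant that lacks a reason string or code.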

StatusDetailsExt

PAL implementations declare an associated type:

trait OtaPal {
    type StatusDetailsExt: StatusDetailsExt + Clone;
    fn status_details(&self) -> Self::StatusDetailsExt;
}

Custom fields are combined with the base OTA fields via CombinedStatusDetails<E>, which flattens both into a single JSON object during serialization. The default impl for () provides a zero-cost opt-out.
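The flattening idea can be illustrated with a std-only sketch. It deliberately swaps the real serialization for a plain `BTreeMap` of key-value pairs, so everything except the names `StatusDetailsExt`, `CombinedStatusDetails`, and `OtaStatusDetails` is hypothetical: the `extend` method, the `to_flat_map` helper, the field types, and the `DeviceHealth` example type are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical trait shape: an extension appends its own key-value pairs.
pub trait StatusDetailsExt {
    fn extend(&self, out: &mut BTreeMap<String, String>);
}

/// Opt-out: the unit type contributes no fields.
impl StatusDetailsExt for () {
    fn extend(&self, _out: &mut BTreeMap<String, String>) {}
}

/// Base OTA fields (progress omitted to keep the sketch short).
pub struct OtaStatusDetails {
    pub reason: Option<&'static str>,
    pub error_code: Option<u16>,
}

pub struct CombinedStatusDetails<E: StatusDetailsExt> {
    pub base: OtaStatusDetails,
    pub ext: E,
}

impl<E: StatusDetailsExt> CombinedStatusDetails<E> {
    /// Merge base and extension fields into one flat map.
    pub fn to_flat_map(&self) -> BTreeMap<String, String> {
        let mut out = BTreeMap::new();
        if let Some(reason) = self.base.reason {
            out.insert("reason".to_string(), reason.to_string());
        }
        if let Some(code) = self.base.error_code {
            out.insert("error_code".to_string(), code.to_string());
        }
        // Extension fields are added last and share the same namespace.
        self.ext.extend(&mut out);
        out
    }
}

/// Hypothetical PAL-specific extension reporting battery level.
#[derive(Clone)]
pub struct DeviceHealth {
    pub battery_pct: u8,
}

impl StatusDetailsExt for DeviceHealth {
    fn extend(&self, out: &mut BTreeMap<String, String>) {
        out.insert("battery_pct".to_string(), self.battery_pct.to_string());
    }
}
```

In this sketch an extension that reuses a base key such as `reason` would overwrite it, so extension keys should be chosen to avoid collisions. How the real implementation handles that case is not stated in this PR.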

Integration Test Infrastructure (tests/common/aws_ota.rs)

  • Loads management credentials from the environment, then assumes a role in the target IoT account
  • Cleans up stale QUEUED/IN_PROGRESS jobs before each run
  • Uploads firmware to S3, creates OTA update, polls until job execution appears
  • Post-test: describe_job_execution() asserts cloud-side status and status details
  • Guaranteed cleanup (cancel jobs + delete S3) even on test panic
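One common way to get cleanup that survives a test panic in Rust is a guard type whose `Drop` impl performs the teardown. Whether tests/common/aws_ota.rs uses this exact pattern is an assumption. The sketch below uses a hypothetical `OtaTestGuard` that records the cleanup actions in a shared log instead of calling AWS, so it stays self-contained.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical guard: stands in for "cancel jobs + delete S3" teardown.
pub struct OtaTestGuard {
    job_id: String,
    log: Arc<Mutex<Vec<String>>>,
}

impl OtaTestGuard {
    pub fn new(job_id: &str, log: Arc<Mutex<Vec<String>>>) -> Self {
        OtaTestGuard {
            job_id: job_id.to_string(),
            log,
        }
    }
}

impl Drop for OtaTestGuard {
    fn drop(&mut self) {
        // Drop also runs while a panic unwinds, so this executes on failure.
        // A poisoned lock is recovered rather than panicking a second time.
        let mut log = self.log.lock().unwrap_or_else(|e| e.into_inner());
        log.push(format!("cancel_job:{}", self.job_id));
        log.push(format!("delete_s3_object:{}", self.job_id));
    }
}
```

A test would create the guard right after provisioning and keep it alive for the test body, for example `let _guard = OtaTestGuard::new("job-1", log.clone());`. Two caveats apply: `Drop` does not run when the binary is built with `panic = "abort"`, and `Drop` cannot await, so async cleanup calls need some blocking bridge that this sketch does not cover.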

Changelog

  • Add OtaPalError variants with error_code() and as_reason_str() methods
  • Add OtaStatusDetails struct with reason, error_code, and progress fields
  • Add StatusDetailsExt trait and CombinedStatusDetails<E> for extensible job status details
  • Add StatusDetailsExt associated type to OtaPal trait
  • Preserve OtaPalError through error handling chain instead of discarding to Aborted(None)
  • Include block progress count in Failed status details
  • Add self-managed OTA integration test infrastructure with AWS credential management and cleanup

MathiasKoch and others added 4 commits February 6, 2026 11:02
Extend OTA status updates to include detailed failure reasons (e.g.
SignatureCheckFailed, BadImageState) in job status details. Rework
integration tests to self-provision OTA jobs via AWS SDK instead of
relying on external shell scripts, with automatic cleanup.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

OtaError::Pal was discarding the inner OtaPalError, so the cleanup path
reported Aborted(None) — losing the failure reason (e.g.
SignatureCheckFailed) from job status details. Now OtaError::Pal carries
the OtaPalError through to the job status update.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Progress was only refreshed for InProgress updates, so Failed status
carried the stale value from the last periodic update. Now progress is
also updated on Failed, giving an accurate block count at failure time.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Allow PAL implementations to provide custom key-value pairs that are
included in every job status update. The PAL defines a StatusDetails
associated type implementing StatusDetailsExt, which gets serialized
alongside the base OTA status fields via CombinedStatusDetails.

This threads the extra context through ProgressState, ControlInterface,
and DataInterface as a generic parameter (defaulting to ()).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>