ros-controls · bmagyar · Jun 25, 2025 · Jun 18, 2025 · Jun 19, 2025 · Jun 19, 2025
diff --git a/design_drafts/hardware_status.md b/design_drafts/hardware_status.md
@@ -0,0 +1,160 @@
+# Standardized Hardware Status Reporting
+
+## Short description
+
+This proposal is an attempt at a standardized way for hardware components in `ros2_control` to report their status.
+
+Right now, if you're writing a hardware interface, how you report things like health, errors, or connectivity is pretty much up to you. This usually means everyone rolls their own custom messages. While that works for a single project, it makes it really tough to build generic, reusable tools on top of `ros2_control` (and even internally!). This proposal is a first-pass attempt at defining a generic `HardwareStatus` message. The main goal is to find a good balance between a structured, predictable format that tools can rely on, and the flexibility needed to report all the weird, wonderful, and specific details of different hardware.
+
+This is very much a draft to get the conversation started, not a final solution!
+
+Here's a diagram I put together to visualize it:
+
+![hardware_status](images/hardware_status.png)
+
+Let's discuss this in slightly more detail.
+
+## 1. The Idea: Structured vs. Unstructured
+
+The core idea is to split status reporting into two complementary parts.
+
+1. **Structured Status:**
+- A fixed set of fields covering \~80% of common hardware needs—machine-readable, reliable, and directly consumable by controllers, watchdogs, automation tools and even for us internally.
+- A compact, general-purpose block of enums and identifiers. If a device can’t fill one of these fields, it simply reports `UNKNOWN`.
+
+2. **Unstructured Status:**
+- A free-form array of key/value diagnostics for everything else—geared toward logs, dashboards, and human inspection only.
+- A slower, richer stream of `diagnostic_msgs/KeyValue[]`, strictly for debugging and UI, ideally not parsed by control loops.
+
+## 2. Example Message Topology
+
+We separate **real-time status** (fast, small) from **detailed diagnostics** (bulkier, slower) on two topics:
+
+| Topic                          | Msg Type                           | Rate     | Intended Use                                |
+| ------------------------------ | ---------------------------------- | -------- | ------------------------------------------- |
+| `/hardware_status`             | `control_msgs/HardwareStatus`      | 1–50 Hz  | Health, ops & safety logic, auto‐monitoring |
+| `/hardware_diagnostics` | `control_msgs/HardwareDiagnostics` | 0.1–1 Hz | GUI dashboards, logs, debugging             |
+
+## 3. Structured Status: `HardwareStatus`
+
+```
+# control_msgs/msg/HardwareStatus
+
+std_msgs/Header header        # timestamp + frame_id (optional)
+string           hardware_id  # unique per‐instance, e.g. "left_wheel/driver"
+
+# ——— Health & Error ——————————————————————————————————————————————
+uint8  health_status         # see HealthStatus enum
+uint8  error_domain          # see ErrorDomain enum
+
+# ——— Operational State ———————————————————————————————————————————
+uint8  operational_mode      # see ModeStatus enum
+uint8  power_state           # see PowerState enum
+uint8  connectivity_status   # see ConnectivityStatus enum
+
+# ——— Vendor & Version Info ————————————————————————————————————————
+string manufacturer          # e.g. "Bosch"
+string model                 # e.g. "Lidar-XYZ-v2"
+string firmware_version      # e.g. "1.2.3"
+
+# ——— Optional Details for Context —————————————————————————————————
+# Provides specific quantitative values related to the enums above.
+# e.g., for power_state, could have {key: "voltage", value: "24.1"}
+# e.g., for connectivity, could have {key: "signal_strength", value: "-55dBm"}
+diagnostic_msgs/KeyValue[] state_details
+```
+
+### 3.1. Enums
+
+```
+# control_msgs/msg/HardwareStatus (continued)
+
+# High-level health
+uint8 HEALTH_UNKNOWN=0
+uint8 HEALTH_OK     =1
+uint8 HEALTH_DEGRADED=2
+uint8 HEALTH_WARNING =3
+uint8 HEALTH_ERROR   =4
+uint8 HEALTH_FATAL   =5
+
+# Error category
+uint8 ERROR_NONE    =0
+uint8 ERROR_UNKNOWN =1
+uint8 ERROR_HW      =2
+uint8 ERROR_FW      =3
+uint8 ERROR_COMM    =4
+uint8 ERROR_POWER   =5
+
+# Mode of operation
+uint8 MODE_UNKNOWN    =0
+uint8 MODE_MANUAL     =1
+uint8 MODE_AUTONOMOUS =2
+uint8 MODE_SAFE       =3
+uint8 MODE_MAINTENANCE=4
+
+# Power states
+uint8 POWER_UNKNOWN   =0
+uint8 POWER_OFF       =1
+uint8 POWER_STANDBY   =2
+uint8 POWER_ON        =3
+uint8 POWER_SLEEP     =4
+
+# Connectivity
+uint8 CONNECT_UNKNOWN =0
+uint8 CONNECT_UP      =1
+uint8 CONNECT_DOWN    =2
+uint8 CONNECT_FAILURE =3
+```
+
+#### 3.2. A Note on a Future Addition 
+
+A potential limitation of the single-value enums above is that a component can only report one state per category at a time. Consider the `error_domain`: what happens if a hardware fault (`ERROR_HW`) immediately causes a communication failure (`ERROR_COMM`)? With the current design, the hardware driver must choose to (or is limited to) report only one.
+
+A potential solution for this in a future iteration would be to define some enums as **bitfields**. This would involve assigning values as powers of 2, allowing multiple states to be combined using a bitwise `OR` operation.
+
+For example, the `ErrorDomain` enum could be redefined as a bitmask:
+```
+# Example ErrorDomain as a bitfield
+uint8 ERROR_NONE    = 0  # 0b00000000
+uint8 ERROR_HW      = 1  # 0b00000001
+uint8 ERROR_FW      = 2  # 0b00000010
+uint8 ERROR_COMM    = 4  # 0b00000100
+uint8 ERROR_POWER   = 8  # 0b00001000
+# ... up to 4 more flags
+```
+
+A publisher could then report both a hardware and power fault simultaneously by setting the value to `ERROR_HW | ERROR_POWER` (which is `9`, or `0b00001001`). A subscriber could then check for a specific error using a bitwise `AND` (e.g., `if (status.error_domain & ERROR_HW)`).
+
+The primary trade-off is that we would be limited by the size of the enum's underlying type. A `uint8` allows for exactly 8 unique flags. While this may be sufficient for now, it's a constraint to keep in mind as we finalize this design. We can add this if we hear from the community that this is needed.
+
+## 4. Unstructured Diagnostics: `HardwareDiagnostics`
+
+```
+# control_msgs/msg/HardwareDiagnostics
+
+std_msgs/Header    header
+string             hardware_id
+KeyValue[]         entries   # diagnostic_msgs/KeyValue[]
+```
+
+> **Example Entry**
+>
+> ```yaml
+> header:
+>   stamp: {sec: 1625563200, nanosec: 0}
+> hardware_id: "arm_controller"
+> entries:
+>   - {key: "cpu_temp",       value: "72.5°C"}
+>   - {key: "voltage_input",  value: "24.1V"}
+>   - {key: "last_error_code",value: "0x1A3F"}
+> ```
+
+## 5. Open Questions & Discussion
+
+1. Could we reuse `lifecycle_msgs/State` for `operational_mode`, or is a dedicated enum preferable for clarity?
+2. I left some question marks in the diagrams, any categories we are missing?
+3. Should `HardwareStatus` include a short `string error_message`, or strictly push error details into diagnostics only?
+4. Also another thing, maybe we use one big message (`control_msgs/HardwareStatus`) to make it simpler rather than publish structured vs. unstructured data on separate topics (`/hardware_status` and `/hardware_diagnostics`)?
+5. And the main questions that I have, Is this whole approach overly complicated, let's avoid that pitfall.
+
+Looking forward to hearing what everyone thinks!
diff --git a/design_drafts/images/hardware_status.png b/design_drafts/images/hardware_status.png