diff --git a/design_drafts/hardware_status.md b/design_drafts/hardware_status.md index c76fcbd..ecdeb0d 100644 --- a/design_drafts/hardware_status.md +++ b/design_drafts/hardware_status.md @@ -4,7 +4,7 @@ This proposal is an attempt at a standardized way for hardware components in `ros2_control` to report their status. -Right now, if you're writing a hardware interface, how you report things like health, errors, or connectivity is pretty much up to you. This usually means everyone rolls their own custom messages. While that works for a single project, it makes it really tough to build generic, reusable tools on top of `ros2_control` (and even internally!). This proposal is a first-pass attempt at defining a generic `HardwareStatus` message. The main goal is to find a good balance between a structured, predictable format that tools can rely on, and the flexibility needed to report all the weird, wonderful, and specific details of different hardware. +Right now, if you're writing a hardware component, how you report things like health, errors, or connectivity is pretty much up to you. This usually means everyone rolls their own custom messages. While that works for a single project, it makes it really tough to build generic, reusable tools on top of `ros2_control` (and even internally!). This proposal is a first-pass attempt at defining a generic `HardwareStatus` message. The main goal is to find a good balance between a structured, predictable format that tools can rely on, and the flexibility needed to report all the weird, wonderful, and specific details of different hardware. This is very much a draft to get the conversation started, not a final solution! @@ -14,17 +14,24 @@ Here's a diagram I put together to visualize it: Let's discuss this in slightly more detail. +## 0. Note on Terminology +- Hardware Component: A hardware interface written for a single component. It currently takes the form of a `System`, `Actuator`, or `Sensor`, and can have multiple sub-components. 
+ - Ex - "pal_arm" +- Device: A sub-component of a Hardware Component. + - Ex - "base_motor" + ## 1. The Idea: Structured vs. Unstructured The core idea is to split status reporting into two complementary parts. -1. **Structured Status:** -- A fixed set of fields covering \~80% of common hardware needs—machine-readable, reliable, and directly consumable by controllers, watchdogs, automation tools and even for us internally. -- A compact, general-purpose block of enums and identifiers. If a device can’t fill one of these fields, it simply reports `UNKNOWN`. +1. **Structured, Standards-Based Status:** + - A fixed set of fields covering \~80% of common hardware needs - machine-readable, reliable, and directly consumable by controllers, watchdogs, automation tools and even for us internally. + - A collection of status messages, where each message type corresponds to a specific industry standard (e.g., `CANopenState`), providing a machine-readable and reliable format. + - Each device in a hardware component populates only the status blocks relevant to it within a single, device-specific message (`HardwareDeviceStatus`). These are then aggregated into one message (`HardwareStatus`) that contains all the devices in the hardware, covering the common hardware needs for controllers, watchdogs, and automation tools. -2. **Unstructured Status:** -- A free-form array of key/value diagnostics for everything else—geared toward logs, dashboards, and human inspection only. -- A slower, richer stream of `diagnostic_msgs/KeyValue[]`, strictly for debugging and UI, ideally not parsed by control loops. +2. **Unstructured Status:** + - A free-form array of key/value diagnostics for everything else, geared toward logs, dashboards, and human inspection only. + - A slower, richer stream of `diagnostic_msgs/KeyValue[]`, strictly for debugging and UI, ideally not parsed by control loops. ## 2. 
Example Message Topology @@ -37,37 +44,68 @@ We separate **real-time status** (fast, small) from **detailed diagnostics** (bu ## 3. Structured Status: `HardwareStatus` +The foundation of this approach is the `HardwareStatus` message. A single publisher per hardware component would publish `HardwareStatus` messages on the `/hardware_status` topic. Each message carries an array of `HardwareDeviceStatus` messages, each of which holds the per-standard state messages of a single device in the hardware component. + ``` # control_msgs/msg/HardwareStatus std_msgs/Header header # timestamp + frame_id (optional) -string hardware_id # unique per‐instance, e.g. "left_wheel/driver" +string hardware_id # unique per-hardware-component, ideally the name of the hardware derived from HardwareInfo, e.g. "pal_arm" + +# --- Device Status Aggregation --------------------------------- +# An array containing the status of individual devices in the hardware component +HardwareDeviceStatus[] hardware_device_states +``` +``` +# control_msgs/msg/HardwareDeviceStatus +string device_id # unique per-device, e.g. "base_motor" + +# --- Standard-Specific States -------------------------------------- +# States populated based on the standards relevant to this device. +# A device only fills the arrays for the standards it implements; the rest stay empty +GenericState[] generic_hardware_status +CANopenState[] canopen_states +EtherCATState[] ethercat_states +VDA5050State[] vda5050_states +``` + +### 3.1. Standardized State Messages + +Below are the proposed initial standard-specific messages, based on widely used industrial standards. Additions and opinions here would be really appreciated! + +--- + +**`ros2_control` Generic State** + +This message encapsulates the general-purpose status fields, serving as a baseline for any hardware component. 
-# ——— Health & Error —————————————————————————————————————————————— +``` +# control_msgs/msg/GenericState + +# --- Health & Error ---------------------------------------------- uint8 health_status # see HealthStatus enum -uint8[] error_domain # Array of device errors, because hardware can throw more than one, see ErrorDomain enum +uint8[] error_domain # Array of device errors, see ErrorDomain enum -# ——— Operational State ——————————————————————————————————————————— +# --- Operational State ------------------------------------------- uint8 operational_mode # see ModeStatus enum uint8 power_state # see PowerState enum uint8 connectivity_status # see ConnectivityStatus enum -# ——— Vendor & Version Info ———————————————————————————————————————— +# --- Vendor & Version Info ---------------------------------------- string manufacturer # e.g. "Bosch" string model # e.g. "Lidar-XYZ-v2" string firmware_version # e.g. "1.2.3" -# ——— Optional Details for Context ————————————————————————————————— +# --- Optional Details for Context --------------------------------- # Provides specific quantitative values related to the enums above. # e.g., for power_state, could have {key: "voltage", value: "24.1"} # e.g., for connectivity, could have {key: "signal_strength", value: "-55dBm"} diagnostic_msgs/KeyValue[] state_details ``` -### 3.1. Enums - +#### `ROS2ControlState` Enums ``` -# control_msgs/msg/HardwareStatus (continued) +# control_msgs/msg/GenericState (enums) # High-level health uint8 HEALTH_UNKNOWN=0 @@ -90,10 +128,7 @@ uint8 EMERGENCY_STOP_HW # state of the emergency stop hardware (i.e. e-stop butt uint8 EMERGENCY_STOP_SW # state of the emergency stop software system (over travel, pinch point) uint8 PROTECTIVE_STOP_HW # state of the protective stop hardware (i.e. 
safety field state) uint8 PROTECTIVE_STOP_SW # state of the software protective stop -# Some protective stop errors need to be acknowledged before the hardware can reactivate -# see https://docs.universal-robots.com/Universal_Robots_ROS2_Documentation/doc/ur_robot_driver/ur_robot_driver/doc/dashboard_client.html#unlock-protective-stop-std-srvs-trigger uint8 SAFETY_STOP -# Some hardware requires calibration on startup (for example a linear rail or quadruped) uint8 CALIBRATION_REQUIRED @@ -133,27 +168,72 @@ uint8 CONNECT_FAILURE =3 uint8 CONNECTION_SLOW # to tell the controlling system it is struggling to communicate at rate ``` -#### 3.2. A Note on a Future Addition +--- -A potential limitation of the single-value enums above is that a component can only report one state per category at a time. Consider the `error_domain`: what happens if a hardware fault (`ERROR_HW`) immediately causes a communication failure (`ERROR_COMM`)? With the current design, the hardware driver must choose to (or is limited to) report only one. -That or return an array of errors and let mission control sort out the correct action to recover. +**CANopen State** -A potential solution for this in a future iteration would be to define some enums as **bitfields**. This would involve assigning values as powers of 2, allowing multiple states to be combined using a bitwise `OR` operation. +Reports state according to CiA 301 and CiA 402, common for motor drives and I/O. +- **Source:** [CAN in Automation (CiA)](https://www.can-cia.org/) - CiA 301 & 402 specifications. -For example, the `ErrorDomain` enum could be redefined as a bitmask: ``` -# Example ErrorDomain as a bitfield (why only an 8 bit number?) -uint8 ERROR_NONE = 0 # 0b00000000 -uint8 ERROR_HW = 1 # 0b00000001 -uint8 ERROR_FW = 2 # 0b00000010 -uint8 ERROR_COMM = 4 # 0b00000100 -uint8 ERROR_POWER = 8 # 0b00001000 -# ... 
up to 4 more flags +# control_msgs/msg/CANopenState + +uint8 node_id # The CANopen node ID of the device + +# --- CiA 301 State ------------------------------------------------- +uint8 nmt_state # Network Management state (e.g., OPERATIONAL) + +# --- CiA 402 State (for drives) ------------------------------------ +uint8 dsp_402_state # Drive state machine state (e.g., OPERATION_ENABLED) + +# --- Error Reporting ----------------------------------------------- +uint32 last_emcy_code # Last Emergency (EMCY) error code received ``` -A publisher could then report both a hardware and power fault simultaneously by setting the value to `ERROR_HW | ERROR_POWER` (which is `9`, or `0b00001001`). A subscriber could then check for a specific error using a bitwise `AND` (e.g., `if (status.error_domain & ERROR_HW)`). +--- + +**EtherCAT State** + +Reports the EtherCAT slave state according to the EtherCAT State Machine (ESM). +- **Source:** [EtherCAT Technology Group (ETG)](https://www.ethercat.org/en/downloads.html) - ETG.1000.4 EtherCAT Protocol Specifications. + +``` +# control_msgs/msg/EtherCATState -The primary trade-off is that we would be limited by the size of the enum's underlying type. A `uint8` allows for exactly 8 unique flags. While this may be sufficient for now, it's a constraint to keep in mind as we finalize this design. We can add this if we hear from the community that this is needed. +uint16 slave_position # Position of the slave on the bus (0, 1, 2...) +string vendor_id # Unique vendor identifier +string product_code # Unique product code for the device + +# --- EtherCAT State Machine (ESM) ---------------------------------- +uint8 al_state # Application Layer state (INIT, PREOP, SAFEOP, OP) +bool has_error # True if the slave is in an error state +uint16 al_status_code # AL Status Code indicating the reason for an error +``` + +--- + +**VDA5050 State** + +For AGVs and AMRs compliant with VDA5050, this provides a snapshot of the vehicle's high-level status. 
+- **Source:** [Verband der Automobilindustrie (VDA)](https://github.com/VDA5050/VDA5050) - VDA 5050 Specification. + +``` +# control_msgs/msg/VDA5050State + +# --- Order and Action Status --------------------------------------- +string order_id # ID of the currently executed order +string action_status # e.g., RUNNING, PAUSED, FINISHED, FAILED +uint32 last_node_id # ID of the last reached node in the topology + +# --- Vehicle State ------------------------------------------------- +bool driving # True if the vehicle's drives are active +float64 battery_charge # Current battery charge in percent +string operating_mode # e.g., MANUAL, AUTOMATIC, SERVICE + +# --- Error Reporting ----------------------------------------------- +string error_type +string error_description +``` ## 4. Unstructured Diagnostics: `HardwareDiagnostics` @@ -178,60 +258,38 @@ KeyValue[] entries # diagnostic_msgs/KeyValue[] > ``` ## 5. Open Questions & Discussion +1. Is the current list of standardized state messages (`CANopen`, `EtherCAT`, `VDA5050`, `ISO10218`) a good starting point? Are there other non-proprietary standards that are critical to include? +2. Is this whole approach overly complicated? It would be good to avoid that pitfall. -1. Could we reuse `lifecycle_msgs/State` for `operational_mode`, or is a dedicated enum preferable for clarity? -2. I left some question marks in the diagrams, any categories we are missing? -3. Should `HardwareStatus` include a short `string error_message`, or strictly push error details into diagnostics only? -4. Also another thing, maybe we use one big message (`control_msgs/HardwareStatus`) to make it simpler rather than publish structured vs. unstructured data on separate topics (`/hardware_status` and `/hardware_diagnostics`)? -5. And the main questions that I have, Is this whole approach overly complicated, let's avoid that pitfall. +## 6. Alternative Publishing Strategies -Looking forward to hearing what everyone thinks! 
+While this proposal centers on a single topic with an array of device statuses, it's worth discussing the trade-offs of other possible architectures. How else could we structure the flow of status information? -## Hardware Status Interface -What does configuring the Hardware Status (per hardware because a mobile_base is likely different than the arm mounted on top of it) look like? -Should we have blocks of state (i.e. standardized messages) that can be added together if the hardware offers X, Y and Z features? -(For example my robot arm has a `safety interface` for e-stop/p-stop and a `hardware_status` interface to report power, operating mode and ...) -What does it look like at the interface level? Is there a separate read (and maybe write) method for status reporting and reconfiguration? -For example standard safety status (E-stop, P-stop), operating mode, [battery state](https://docs.ros2.org/foxy/api/sensor_msgs/msg/BatteryState.html). - -JointState has been the standard ROS2 control works with. What about GPIO, SafetyStatus, BatteryState, .... these are interfaces that hardware frequently provides. -What if ros2_control made a set of messages to standardize it's interfaces for each subcategory? -[SensorMsgs](https://docs.ros2.org/foxy/api/sensor_msgs/index-msg.html) is a start of what we need. -For example the UR controller exposes lots of interfaces via [GPIO](https://github.com/UniversalRobots/Universal_Robots_ROS2_Description/blob/85d2ad8d1526ee6c0f21dca94e1e697c83706b71/urdf/ur.ros2_control.xacro#L294-L311) but not in a standardized way so if someone wanted to control it and then switch robots their codebase would likely need to change to handle auxiliary control and monitoring. - -Some errors or states will be set as the hardware stops functioning. -Should the status broadcaster hold and continue to publish last known state? -Should the status broadcaster offer statistics on hardware DEACTIVATE/ERROR and ACTIVATIONS? 
-Lots of industrial applications would like to know how many e-stops, number of controller errors/faults, ____ per shift, week or some period of time -Could this open the option for custom or standard controllers to monitor and keep the system healthy? i.e. automatic arm fault reset controller, +- **Per Device Messages** + - One issue I see with the current aggregated status message approach is that it seems a tad complicated for simple systems. What if a hardware component has only one actuator? + - What if, instead of a single aggregated message, each device in a hardware component published its own `HardwareDeviceStatus` message on the same `/hardware_status` topic, which would then be of type `HardwareDeviceStatus`? + - Receivers would listen to the same `/hardware_status` topic as before and only have to check the `device_id` to see whether the data is relevant. Similarly, publishers would only have to fill in a `HardwareDeviceStatus` message and send it, with no aggregation needed. +## References Links of hardware interfaces and their attempt to convey hardware status and support other control modes - ### UR [hardware_interface](https://github.com/UniversalRobots/Universal_Robots_ROS2_Driver/blob/868f240bc8578ebfa1d19b94f8a6a1ad62fa0bd1/ur_robot_driver/src/hardware_interface.cpp#L266-L270) [SafetyMode.msg](https://github.com/UniversalRobots/Universal_Robots_ROS2_Driver/blob/main/ur_dashboard_msgs/msg/SafetyMode.msg) [RobotMode.msg](https://github.com/UniversalRobots/Universal_Robots_ROS2_Driver/blob/main/ur_dashboard_msgs/msg/RobotMode.msg) [control.xacro](https://github.com/UniversalRobots/Universal_Robots_ROS2_Description/blob/85d2ad8d1526ee6c0f21dca94e1e697c83706b71/urdf/ur.ros2_control.xacro#L294-L311) - ### Kuka [hardware_interface](https://github.com/lbr-stack/lbr_fri_ros2_stack/blob/f2784b86e5975eddc9b5eab901baaca329306653/lbr_ros2_control/include/lbr_ros2_control/system_interface_type_values.hpp#L8-L27) - ### Kinova [fault_reset 
controller](https://github.com/Kinovarobotics/ros2_kortex/blob/main/kortex_description/arms/gen3/7dof/config/ros2_controllers.yaml#L17-L18) to report and reset faults. [twist_controller](https://github.com/Kinovarobotics/ros2_kortex/blob/309f9c9d4a277970e542e5ac1fe260ced0630f65/kortex_description/arms/gen3/7dof/config/ros2_controllers.yaml#L11-L12) - ### Dynamixel [hardware_interface](https://github.com/ROBOTIS-GIT/dynamixel_hardware_interface/blob/02841dd2ae422676e5dc0fea37057bdec3be8cc1/include/dynamixel_hardware_interface/dynamixel_hardware_interface.hpp#L53-L91) - ### Robotiq [hardware_interface](https://github.com/PickNikRobotics/ros2_robotiq_gripper/blob/12e623212e6891a5fcc9af94d67b07e640916394/robotiq_driver/include/robotiq_driver/driver.hpp#L41-L66) [activation_controller](https://github.com/PickNikRobotics/ros2_robotiq_gripper/blob/main/robotiq_controllers/src/robotiq_activation_controller.cpp) - ### Ethercat [hardware_interface](https://github.com/ICube-Robotics/ethercat_driver_ros2/blob/52be2c2ed163bab25d46c402ddb4e7216c0a0ec3/ethercat_generic_plugins/ethercat_generic_cia402_drive/include/ethercat_generic_plugins/cia402_common_defs.hpp#L31-L56) - ### ROS2 canopen driver https://github.com/ros-industrial/ros2_canopen/tree/master - ### Picknik Twist & Fault controllers -https://github.com/PickNikRobotics/picknik_controllers +https://github.com/PickNikRobotics/picknik_controllers \ No newline at end of file diff --git a/design_drafts/images/hardware_status.png b/design_drafts/images/hardware_status.png index 1b45982..96fe4df 100644 Binary files a/design_drafts/images/hardware_status.png and b/design_drafts/images/hardware_status.png differ
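To make the aggregated layout in this change easier to reason about, here is a rough consumer-side sketch. It is not part of the patch and does not use any generated ROS 2 message classes: the dataclasses below are plain-Python stand-ins that mirror only a few of the proposed fields (`hardware_id`, `hardware_device_states`, `device_id`, `canopen_states`, `ethercat_states`). The helpers `find_device` and `devices_reporting`, and the device name "wrist_motor", are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Plain-Python stand-ins for a subset of the proposed messages (assumption:
# the real types would be generated from the .msg definitions in the draft).
@dataclass
class CANopenState:
    node_id: int = 0
    nmt_state: int = 0


@dataclass
class EtherCATState:
    slave_position: int = 0
    al_state: int = 0


@dataclass
class HardwareDeviceStatus:
    device_id: str = ""
    canopen_states: List[CANopenState] = field(default_factory=list)
    ethercat_states: List[EtherCATState] = field(default_factory=list)


@dataclass
class HardwareStatus:
    hardware_id: str = ""
    hardware_device_states: List[HardwareDeviceStatus] = field(default_factory=list)


def find_device(status: HardwareStatus, device_id: str) -> Optional[HardwareDeviceStatus]:
    """Return the entry for one device from the aggregated message, if present."""
    for device in status.hardware_device_states:
        if device.device_id == device_id:
            return device
    return None


def devices_reporting(status: HardwareStatus, field_name: str) -> List[str]:
    """List the devices that populated a given per-standard array (empty means unused)."""
    return [d.device_id for d in status.hardware_device_states if getattr(d, field_name)]


# One aggregated message for a hardware component with two devices.
status = HardwareStatus(
    hardware_id="pal_arm",
    hardware_device_states=[
        HardwareDeviceStatus("base_motor", canopen_states=[CANopenState(node_id=1)]),
        # "wrist_motor" is a made-up second device for this example.
        HardwareDeviceStatus("wrist_motor", ethercat_states=[EtherCATState(slave_position=0)]),
    ],
)

base = find_device(status, "base_motor")
canopen_devices = devices_reporting(status, "canopen_states")
```

Under the per-device alternative discussed in section 6, `find_device` would likely collapse into a simple `device_id` comparison on each incoming message, which is roughly the trade-off that section is asking about.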