Skip to content

Commit f25d169

Browse files
committed
docs: add IPC naming convention and debounce semantics documentation
- Document ipc_service_name module with naming convention (<subsystem>/<channel>[/<qualifier>]), constraints, and usage table - Expand HoldTime doc comment with full semantics, when-to-use guidance, and edge cases (zero duration, reset behavior) - Document EdgeWithCooldown::reset() explaining that it anchors a new cooldown window rather than returning to the initial state - Add Send + Sync supertraits to the Debounce trait and remove redundant bounds from downstream usages - Create docs/open-points.md covering three unresolved design items: KVS lossy integer casts, PreFailed aging clock reset behavior, and MAX_PATH_LENGTH coupling with LongString capacity
1 parent 3726eac commit f25d169

5 files changed

Lines changed: 168 additions & 5 deletions

File tree

docs/open-points.md

Lines changed: 85 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,85 @@
1+
# Open Points
2+
3+
Items requiring design decisions or future attention. Each entry includes
4+
the problem, code location, options considered, and a recommendation.
5+
6+
---
7+
8+
## 1. KVS `as` casts — lossy integer conversions
9+
10+
**Location:** `src/dfm_lib/src/sovd_fault_storage.rs` lines 107-111 (serialize), 132-135 (deserialize)
11+
12+
**Problem:**
13+
`rust_kvs` only supports `i64` as its numeric type. Fault counters are
14+
stored as `u32`/`u64`, so serialization uses `as i64` (potentially
15+
truncating high-bit values) and deserialization uses `as u32`/`as u64`
16+
(wrapping on negative values, defaulting to 0 on missing keys).
17+
18+
Affected fields:
19+
- `occurrence_counter` (`u32``i64``u32`)
20+
- `aging_counter` (`u32``i64``u32`)
21+
- `healing_counter` (`u32``i64``u32`)
22+
- `first_occurrence_secs` (`u64``i64``u64`)
23+
- `last_occurrence_secs` (`u64``i64``u64`)
24+
25+
**Options:**
26+
1. **`TryFrom` with fallback to 0** — safe, but silently loses data on overflow.
27+
2. **`TryFrom` with `StorageError` propagation** — surfaces the problem to
28+
29+
the caller; forces handling of corrupted/out-of-range storage.
30+
31+
**Recommendation:** Option 2 — propagate as `StorageError`. Silent
32+
fallback to 0 can mask data corruption in long-running automotive
33+
systems where counters may accumulate over thousands of operation
34+
cycles.
35+
36+
---
37+
38+
## 2. PreFailed resets the aging clock
39+
40+
**Location:** `src/dfm_lib/src/fault_record_processor.rs` line 175 (`mark_aging_active`)
41+
42+
**Problem:**
43+
When a `PreFailed` event arrives, `mark_aging_active` is called, which
44+
resets the aging clock to the current operation cycle. This means a
45+
fault that oscillates between `PreFailed` and `PrePassed` will never
46+
age out, even if it never reaches a full `Failed` confirmation.
47+
48+
This is the **conservative** (automotive-safe) approach: a fault that is
49+
still intermittently signaling `PreFailed` should not be silently
50+
cleared. However, it deviates from a strict reading of ISO 14229 where
51+
only confirmed DTCs (`Failed`) start the aging counter.
52+
53+
**Options:**
54+
1. **Keep current behavior** — conservative; any sign of fault activity
55+
keeps the DTC alive.
56+
2. **Only reset aging on `Failed`** — strict ISO 14229 interpretation;
57+
`PreFailed` alone does not prevent aging.
58+
59+
**Recommendation:** Option 1 — maintain the current conservative
60+
behavior. In automotive safety contexts, it is preferable to keep a
61+
DTC visible longer than to risk clearing it while the underlying
62+
condition is still intermittently present.
63+
64+
---
65+
66+
## 3. MAX_PATH_LENGTH coupling with LongString capacity
67+
68+
**Location:** `src/fault_lib/src/fault_manager_sink.rs` line 215
69+
70+
**Problem:**
71+
`MAX_PATH_LENGTH` is a magic constant set to `128`, which must match
72+
the capacity of `LongString::StaticString<128>` used for IPC. If either
73+
value changes independently, path validation and IPC serialization will
74+
disagree, potentially causing silent truncation or runtime panics.
75+
76+
**Options:**
77+
1. **`const_assert!`** — add a compile-time assertion that
78+
`MAX_PATH_LENGTH == LongString::CAPACITY` so any mismatch fails the
79+
build.
80+
2. **Doc comment only** — document the coupling (already done in T55)
81+
and rely on code review to keep them in sync.
82+
83+
**Recommendation:** Option 1 if `LongString` exposes a `CAPACITY`
84+
constant. Otherwise, the existing doc comment (added in T55) is
85+
sufficient until the upstream type provides a programmatic bound.

src/common/src/debounce.rs

Lines changed: 40 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -69,7 +69,7 @@ pub enum DebounceMode {
6969

7070
impl DebounceMode {
7171
/// Create a boxed [`Debounce`] implementation matching this mode.
72-
pub fn into_debouncer(self) -> Box<dyn Debounce + Send + Sync> {
72+
pub fn into_debouncer(self) -> Box<dyn Debounce> {
7373
match self {
7474
DebounceMode::CountWithinWindow { min_count, window } => Box::new(CountWithinWindow::new(min_count, window.into())),
7575
DebounceMode::HoldTime { duration } => Box::new(HoldTime::new(duration.into())),
@@ -89,7 +89,10 @@ pub struct DebouncePolicy {
8989
}
9090

9191
/// Trait for debounce algorithm implementations.
92-
pub trait Debounce {
92+
///
93+
/// All implementations must be [`Send`] + [`Sync`] to support use in
94+
/// multi-threaded fault-reporting pipelines (e.g. behind `Arc<Mutex<_>>`).
95+
pub trait Debounce: Send + Sync {
9396
/// Called on each fault occurrence. Returns true if the event should be reported.
9497
fn on_event(&mut self, now: Instant) -> bool;
9598

@@ -141,6 +144,27 @@ impl Debounce for CountWithinWindow {
141144
///
142145
/// Confirms a fault only after the condition persists continuously for
143146
/// the configured duration.
147+
///
148+
/// # Semantics
149+
///
150+
/// - The first call to [`on_event`](Debounce::on_event) starts an internal
151+
/// timer but returns `false` (not yet confirmed).
152+
/// - Subsequent calls return `true` only when the elapsed time since the
153+
/// first event meets or exceeds `duration`.
154+
/// - If [`reset`](Debounce::reset) is called (e.g. the fault condition
155+
/// clears), the timer restarts from zero on the next `on_event`.
156+
///
157+
/// # When to use
158+
///
159+
/// Use `HoldTime` for "stuck-at" faults where a condition must remain
160+
/// continuously active for a minimum period before it should be reported.
161+
/// This prevents transient glitches from triggering fault confirmation.
162+
///
163+
/// # Edge cases
164+
///
165+
/// - A `duration` of zero confirms on the **second** call (the first
166+
/// call always records the start time and returns `false`).
167+
/// - Calling `reset` and then immediately `on_event` restarts the timer.
144168
pub struct HoldTime {
145169
duration: Duration,
146170
start_time: Option<Instant>,
@@ -196,6 +220,20 @@ impl Debounce for EdgeWithCooldown {
196220
}
197221
}
198222

223+
/// Reset the edge-with-cooldown debouncer.
224+
///
225+
/// Sets `last_report` to `now`, which means the cooldown window
226+
/// restarts from this instant. Any `on_event` call arriving before
227+
/// `now + cooldown` will be suppressed.
228+
///
229+
/// This is the correct behavior when the underlying fault clears:
230+
/// the reporter should not immediately re-fire on the next edge.
231+
///
232+
/// # Difference from other debouncers
233+
///
234+
/// Unlike [`HoldTime::reset`] (which clears the timer entirely),
235+
/// this method *anchors* a new cooldown window — the debouncer
236+
/// does **not** return to the "never reported" initial state.
199237
fn reset(&mut self, now: Instant) {
200238
self.last_report = Some(now);
201239
}

src/common/src/ipc_service_name.rs

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,9 +10,49 @@
1010
// SPDX-License-Identifier: Apache-2.0
1111
//
1212

13+
//! IPC service name constants for iceoryx2 publish-subscribe channels.
14+
//!
15+
//! # Naming convention
16+
//!
17+
//! All service names use a hierarchical slash-separated format:
18+
//!
19+
//! ```text
20+
//! <subsystem>/<channel>[/<qualifier>]
21+
//! ```
22+
//!
23+
//! - **`<subsystem>`** — logical component, e.g. `dfm` (Diagnostic Fault Manager).
24+
//! - **`<channel>`** — communication purpose, e.g. `event`, `enabling_condition`.
25+
//! - **`<qualifier>`** — optional sub-channel, e.g. `hash/response`, `notification`.
26+
//!
27+
//! # Constraints
28+
//!
29+
//! - Must be valid iceoryx2 [`ServiceName`](iceoryx2::prelude::ServiceName) values.
30+
//! - Maximum length is 255 bytes (iceoryx2 limit).
31+
//! - Characters allowed: alphanumeric, `/`, `_`, `-`, `.`.
32+
//! - Must not start or end with `/`.
33+
//!
34+
//! # Examples
35+
//!
36+
//! | Constant | Value | Direction |
37+
//! |----------|-------|-----------|
38+
//! | `DIAGNOSTIC_FAULT_MANAGER_EVENT_SERVICE_NAME` | `dfm/event` | reporter → DFM |
39+
//! | `DIAGNOSTIC_FAULT_MANAGER_HASH_CHECK_RESPONSE_SERVICE_NAME` | `dfm/event/hash/response` | DFM → reporter |
40+
//! | `ENABLING_CONDITION_NOTIFICATION_SERVICE_NAME` | `dfm/enabling_condition/notification` | DFM → reporters |
41+
1342
/// Iceoryx2 service name for the main diagnostic-event channel (reporter → DFM).
43+
///
44+
/// Reporters publish [`DiagnosticEvent`](crate::types::DiagnosticEvent) messages on this
45+
/// channel. The Diagnostic Fault Manager subscribes and processes them.
1446
pub const DIAGNOSTIC_FAULT_MANAGER_EVENT_SERVICE_NAME: &str = "dfm/event";
47+
1548
/// Iceoryx2 service name for hash-check responses (DFM → reporter).
49+
///
50+
/// After a reporter registers, the DFM replies on this channel with a
51+
/// hash-check response confirming catalog consistency.
1652
pub const DIAGNOSTIC_FAULT_MANAGER_HASH_CHECK_RESPONSE_SERVICE_NAME: &str = "dfm/event/hash/response";
53+
1754
/// Iceoryx2 service name for enabling-condition notifications (DFM → reporters).
55+
///
56+
/// The DFM publishes [`EnablingConditionNotification`](crate::EnablingConditionNotification)
57+
/// messages whenever an enabling condition changes state.
1858
pub const ENABLING_CONDITION_NOTIFICATION_SERVICE_NAME: &str = "dfm/enabling_condition/notification";

src/dfm_lib/src/fault_record_processor.rs

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -55,7 +55,7 @@ pub struct FaultRecordProcessor<S: SovdFaultStateStorage> {
5555
storage: Arc<S>,
5656
catalog_registry: Arc<FaultCatalogRegistry>,
5757
/// Per-source, per-fault debounce state. Lazily populated from catalog on first event.
58-
debouncers: HashMap<FaultKey, Box<dyn Debounce + Send>>,
58+
debouncers: HashMap<FaultKey, Box<dyn Debounce>>,
5959
/// Tracks last confirmed (post-debounce) lifecycle stage per fault key.
6060
/// Used to detect transitions that should reset the debouncer.
6161
last_stages: HashMap<FaultKey, LifecycleStage>,
@@ -217,7 +217,7 @@ impl<S: SovdFaultStateStorage> FaultRecordProcessor<S> {
217217
/// Lazily looks up the debouncer for a fault key, creating it from
218218
/// the catalog's `manager_side_debounce` config on first access.
219219
/// Returns `None` when no manager-side debounce is configured.
220-
fn get_or_create_debouncer(&mut self, key: &FaultKey, fault_id: &FaultId) -> Option<&mut Box<dyn Debounce + Send>> {
220+
fn get_or_create_debouncer(&mut self, key: &FaultKey, fault_id: &FaultId) -> Option<&mut Box<dyn Debounce>> {
221221
if self.debouncers.contains_key(key) {
222222
return self.debouncers.get_mut(key);
223223
}

src/fault_lib/src/reporter.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -39,7 +39,7 @@ pub struct Reporter {
3939
/// Runtime debounce state derived from `descriptor.reporter_side_debounce`.
4040
/// When `Some`, `publish()` filters events through the debouncer before
4141
/// forwarding to the sink, reducing IPC traffic to the DFM.
42-
pub(crate) debouncer: Option<Box<dyn Debounce + Send + Sync>>,
42+
pub(crate) debouncer: Option<Box<dyn Debounce>>,
4343
/// Tracks the last published lifecycle stage so that stage transitions
4444
/// (e.g. Passed → Failed) can reset the debouncer.
4545
pub(crate) last_stage: LifecycleStage,

0 commit comments

Comments
 (0)