Skip to content

feat(eventbus): close END-297 — finalize message broker closure pass#178

Merged
unclesp1d3r merged 28 commits into
mainfrom
feat/end-297-close-message-broker
Apr 19, 2026
Merged

feat(eventbus): close END-297 — finalize message broker closure pass#178
unclesp1d3r merged 28 commits into
mainfrom
feat/end-297-close-message-broker

Conversation

@unclesp1d3r
Copy link
Copy Markdown
Member

@unclesp1d3r unclesp1d3r commented Apr 18, 2026

Summary

Closes END-297Implement message broker for multi-collector coordination and load balancing.

The daemoneye-eventbus crate was already ~95% implemented (spec .kiro/specs/daemoneye-core-monitoring/tasks.md §2.6.2 was [x]) before this PR began. This is a closure pass, not a rebuild: it addresses the four remaining gaps that prevented formal sign-off, adds the sign-off artifact, and carries the branch through 6 rounds of comprehensive review with convergent fixes.

Impact: 70 files changed (+2,776 / −527). 19 commits. 5 implementation units + 6 review rounds + 1 workspace-wide Rust 1.95 clippy sweep.

Risk level: 🟡 Medium. Touches privileged collector startup ordering, shared-Arc shutdown semantics, and a new public API on EventBusClient. Extensively reviewed — every change went through at least 2 independent reviewers. Production risk quantified in Known Limitations below.

What Changed

🔧 Core feature (Unit 1) — Control-message subscription delivery

The only real functional gap. Without it, procmond could not receive BeginMonitoring over the bus and ran uncoordinated on startup.

  • daemoneye-eventbus::EventSubscription::include_control: bool — opt-in field, default false for backward compatibility
  • EventBusClient::subscribe_with_control(sub) -> (Receiver<BusEvent>, Receiver<Message>) — parallel receivers for event + control streams
  • EventBusClient::shutdown_signal(&self) -> bool — non-consuming shutdown that works with shared Arc<EventBusClient>
  • EventBusConnector.client: Option<Arc<EventBusClient>> + client_arc() accessor — lets callers clone the client out of a brief read-lock and .await without holding the lock (clippy::await_holding_lock compliant)
  • procmond/src/main.rs — replaces the start-monitoring-immediately workaround with subscribe-then-wait on control.collector.lifecycle, with a single absolute timeout_at deadline (chatty streams cannot reset the window)
  • BeginMonitoringReason enum + begin_monitoring_or_exit() helper — default-fatal error policy: every actor error is fatal except ChannelFull on degraded-fallback paths (one-shot signals like LifecycleSignal / StandaloneMode exit even on ChannelFull)
  • New --standalone / PROCMOND_STANDALONE=1 escape hatch for agent-less operation

📏 Infrastructure (Unit 2) — <1ms p99 latency SLO gate

  • New latency_p99_slo criterion bench in daemoneye-eventbus/benches/ipc_performance.rs — panics if p99 exceeds Duration::from_millis(1) (nearest-rank, N=10 000 samples, 1 000 warmup)
  • New eventbus-latency-slo job in .github/workflows/benchmarks.yml — runs on Linux + macOS, triggers on pull_request with path filter covering daemoneye-eventbus/**, collector-core/**, Cargo.toml, Cargo.lock, rust-toolchain.toml, and the workflow file itself
  • Observed p99 on macOS aarch64: ~21 µs (48× under SLO)

📚 Quality (Unit 3) — Sign-off doc-truth sync

  • IMPLEMENTATION_SUMMARY.md, VALIDATION_CHECKLIST.md, COMPREHENSIVE_REVIEW.md, README.md updated to match actual behavior
  • Removed the stale "Pause/Resume stubbed" claim (they delegate to ProcessManager::{pause,resume}_collector)
  • Line-count expectations in VALIDATION_CHECKLIST.md refreshed; stale "95% complete" figure dropped
  • daemoneye-eventbus/README.md and docs/src/technical/eventbus-architecture.md p99 claim scoped explicitly to Linux/macOS CI

✅ Validation (Unit 4) — E2E multi-collector test

  • New daemoneye-eventbus/tests/e2e_multi_collector.rs — 5 scenarios on the real broker + TaskDistributor + ResultAggregator:
    1. Load balancing across queue-group members (100 tasks → both receive > 0, neither > 95)
    2. Broadcast control.collector.config.procmond reload (both subscribers receive it)
    3. Failover after deregister_collector (no task loss, remainder routes to surviving member)
    4. Result aggregation with correlation IDs + dedup cache
    5. Wildcard control.collector.task.# observer sees all task broadcasts without participating in the queue group

📜 Sign-off (Unit 5) — Acceptance evidence artifact

  • New daemoneye-eventbus/docs/END-297-acceptance-evidence.md — one row per acceptance criterion with repo-relative evidence paths, reproduction commands, and a "Post-Merge Follow-ups" section tracking deferred work

🧹 Workspace-wide Rust 1.95 clippy sweep

Commit 79c9773 — 44 files, mechanical only:

  • Duration::from_secs(60)Duration::from_mins(1), from_millis(1000)from_secs(1), etc. (clippy::duration_suboptimal_units)
  • .map(f).unwrap_or(x) on Result.map_or(x, f) (clippy::map_unwrap_or)
  • if d == 0 { 0 } else { a / d }.checked_div(d).unwrap_or(0) (clippy::manual_checked_ops)

Review History

Unprecedented review depth for this repo. Every thread resolved; findings converged across reviewers.

Round Reviewer(s) Threads Commits
1 CodeRabbit + Copilot inline review 4 review threads 4f27920
2 CodeRabbit re-review 2 review threads 2eb4ba4
3 CodeRabbit + Copilot re-review 5 review threads 24c7360
4 Copilot re-review 5 review threads 6a09779
5 Internal 6-agent comprehensive review 7 findings → fixes f25beb1
6 Internal 3-agent comprehensive re-review 2 critical + accuracy 38dd28e

All 18 GitHub review threads resolved. Post-merge follow-ups enumerated in daemoneye-eventbus/docs/END-297-acceptance-evidence.md → "Post-Merge Follow-ups".

Control-Flow Diagram — procmond startup (after this PR)

flowchart TD
  Start([procmond starts]) --> Init[Initialize telemetry, DB, event_bus]
  Init --> Register{Register with agent}
  Register -->|Ok| SetDefault[standalone_mode = cli/env only]
  Register -->|Err| ForceStandalone[registration_failed = true]
  SetDefault --> StandaloneCheck{standalone_mode?}
  ForceStandalone --> StandaloneCheck
  StandaloneCheck -->|true| DirectBegin[begin_monitoring_or_exit<br/>StandaloneMode]
  StandaloneCheck -->|false| Subscribe[subscribe_with_control<br/>control.collector.lifecycle]
  Subscribe -->|Err| BrokerUnreach[begin_monitoring_or_exit<br/>BrokerUnreachable]
  Subscribe -->|Ok| SpawnHeartbeat[Spawn heartbeat task<br/>now safe — subscription<br/>is active]
  SpawnHeartbeat --> SpawnWait[Spawn lifecycle_wait_task]
  SpawnWait --> WaitLoop{timeout_at<br/>60s deadline}
  WaitLoop -->|Ok Some lifecycle msg| LifecycleBegin[begin_monitoring_or_exit<br/>LifecycleSignal]
  WaitLoop -->|Ok None channel-closed| ChClosedBegin[begin_monitoring_or_exit<br/>ChannelClosed]
  WaitLoop -->|Err timeout| TimeoutBegin[begin_monitoring_or_exit<br/>WaitTimeout]
  DirectBegin --> Running([Running])
  BrokerUnreach --> Running
  LifecycleBegin --> Running
  ChClosedBegin --> Running
  TimeoutBegin --> Running
  LifecycleBegin -.ChannelFull or ChannelClosed.-> Exit1([exit 1<br/>supervisor restarts])
  ChClosedBegin -.ChannelClosed only.-> Exit1
  TimeoutBegin -.ChannelClosed only.-> Exit1
  DirectBegin -.ChannelFull or ChannelClosed.-> Exit1
Loading

Test Plan

Local verification (completed):

  • cargo clippy --workspace --all-targets -- -D warnings — clean (Rust 1.95)
  • cargo fmt --all --check — clean
  • cargo nextest run -p daemoneye-eventbus193/193 pass (9 leaky per existing pattern)
  • cargo nextest run -p procmond631/631 pass
  • cargo nextest run -p daemoneye-agent169/169 pass
  • cargo nextest run -p collector-core249/249 pass
  • cargo nextest run -p daemoneye-eventbus --test e2e_multi_collector5/5 pass
  • cargo bench --package daemoneye-eventbus --bench ipc_performance -- '^latency_p99_slo$'~21 µs p99
  • just ci-check — full workspace + all pre-commit hooks green (1 597 tests across all crates)

CI (pending remote):

  • CI matrix green on Linux, macOS, Windows (ci.yml)
  • eventbus-latency-slo benchmark job green on Linux + macOS (benchmarks.yml)
  • CodeQL scan (codeql.yml) clean
  • Mergify merge protections satisfied

Risk Assessment

Factor Score Notes
Size 🟠 6/10 2 776 additions — but 1 100+ are the mechanical clippy sweep; remainder concentrated in one module
Complexity 🟡 5/10 New Arc wrapping pattern, startup-ordering invariant, default-fatal error policy — all reviewed in depth
Test coverage 🟡 5/10 5 new e2e scenarios + 6 new unit tests; 4 happy-path behaviors deferred to follow-up (logged)
Dependencies 🟢 2/10 No new external deps; workspace dep updates isolated to Cargo.lock housekeeping commits
Security 🟡 4/10 New --standalone escape hatch + broker-unreachable fallback lack audit-ledger writes (logged follow-up, P1)

Mitigation: review depth, default-fatal error policy, clippy::await_holding_lock = deny enforced, unsafe_code = forbid intact, zero new unsafe.

Known Limitations

  • Cross-OS-process subscription registration is a latent transport-layer gap. The broker does not yet run a per-connection receive loop that accepts subscribe-request protocol messages from remote clients over interprocess. In production today, a separately-spawned procmond will hit its 60s BEGIN_MONITORING_WAIT_TIMEOUT and fall back to standalone collection on every startup until this closes. Scope-reducing R14 from full to partial (in-process only) — see daemoneye-eventbus/docs/END-297-acceptance-evidence.md row 14 and "Known Latent Gaps" section.
  • Existing file-size overruns (rpc.rs 2 789 lines, broker.rs 1 371 lines) are not addressed — tracked in a separate refactor ticket per plan scope boundary.
  • #[non_exhaustive] was attempted on EventSubscription but required a builder-pattern refactor that would cascade across collector-core, tests, and benches (FRU forbidden cross-crate for non-exhaustive types). Deferred with a note in the struct's rustdoc and the evidence artifact.

Post-Merge Follow-ups

Logged in daemoneye-eventbus/docs/END-297-acceptance-evidence.md → "Post-Merge Follow-ups". Priority summary:

  • P1 — Audit-ledger writes on the 5 fallback paths (operator opt-in standalone, registration failure, broker unreachable, channel closed, wait timeout). An attacker who disables the agent can force uncoordinated collection with no tamper-evident record today.
  • P1 — Tests for subscribe-then-wait happy path / --standalone runtime / off-topic continue branch / TrySendError::Full backpressure drop.
  • P2 — Remove EventBusConnector.connected bool drift risk (delegate is_connected() to self.client.is_some() after auditing three error-path sites).
  • P2#[non_exhaustive] on EventSubscription + builder-pattern refactor.
  • P3 — Type-design polish (collapse include_control into method choice, shutdown_signal → enum).
  • P3 — Simplification opportunities (extract try_subscribe_control helper, collect_until doc, shutdown-pattern canonical example).

Review Checklist

General

  • Code follows project style (AGENTS.md — cargo clippy --workspace --all-targets -- -D warnings clean)
  • Self-review completed (6 rounds + 2 comprehensive)
  • Comments explain WHY, not WHAT (verified by comment-analyzer review rounds 5 & 6)
  • No debug statements, no dbg!, no println! in production paths
  • No hardcoded secrets

Safety

  • unsafe_code = forbid enforced; no new unsafe
  • clippy::await_holding_lock = deny enforced; no guard held across .await
  • clippy::arithmetic_side_effects = deny enforced; all arithmetic uses checked_* / saturating_*
  • begin_monitoring_or_exit default-fatal on unknown ActorError variants (defensive against #[non_exhaustive])

Contract

  • EventSubscription.include_control default is false (legacy subscribers unaffected)
  • subscribe() behavior unchanged for existing callers
  • shutdown(mut self) API untouched (shutdown_signal(&self) is additive)
  • Wire format (postcard) unchanged; #[serde(default)] covers older payloads

Tests

  • New tests added for new behavior (control delivery, latency SLO, 5 e2e scenarios)
  • Existing tests still pass (1 597 / 1 597)
  • Tests use order-independent assertions where order is not guaranteed
  • No flaky tests introduced

Documentation

  • Rustdoc on all new public items (include_control, subscribe_with_control, shutdown_signal, client_arc, BeginMonitoringReason)
  • docs/solutions/best-practices/rust-async-arc-rwlock-await-holding-lock-pattern-2026-04-18.md captures the Arc<InnerResource> pattern for future subsystems
  • Acceptance evidence artifact present
  • Post-merge follow-ups logged in-tree (not in gitignored docs/todos/)

AI Disclosure

Extensive use of Claude Code (Claude Opus 4.7 (1M Context)) for: planning (/ce:plan), implementation via serial + parallel systems-programming:rust-pro subagents (Units 1–4), 6 rounds of comprehensive review dispatch (/pr-review-toolkit:review-pr), and the 44-file Rust 1.95 clippy sweep. All code changes reviewed by me against project standards (AGENTS.md, zero-warnings clippy, open-core hygiene, unsafe_code = forbid) and validated by the full test suite + just ci-check on each commit.

Refs: END-297 · PR plan docs/plans/2026-04-18-001-feat-close-end-297-message-broker-plan.md · Evidence daemoneye-eventbus/docs/END-297-acceptance-evidence.md

Closes END-297 by addressing the four remaining gaps that prevented
formal sign-off of the multi-collector message broker:

Unit 1 — Control-message subscription delivery (the only real functional
gap). EventBusClient gains an opt-in `include_control: bool` field on
EventSubscription (default false for backward compatibility) and a new
`subscribe_with_control()` method returning parallel (event_rx,
control_rx) receivers. procmond replaces its start-monitoring-immediately
workaround with subscribe-then-wait on control.collector.lifecycle,
honoring BeginMonitoring from the agent. A new --standalone CLI flag
preserves agent-less operation for tests and air-gapped bootstrap. The
broker-unreachable path logs a loud WARN and falls back to the escape
hatch rather than silently collecting.

Unit 2 — Machine-checked <1ms p99 latency SLO. New latency_p99_slo
criterion bench in benches/ipc_performance.rs with
`const LATENCY_P99_SLO: Duration = Duration::from_millis(1)` and
panic-on-breach. A dedicated eventbus-latency-slo job in
.github/workflows/benchmarks.yml runs it on Linux and macOS.
Observed p99 on macOS aarch64: ~21us (48x under SLO).

Unit 3 — Doc-truth sync across IMPLEMENTATION_SUMMARY.md,
VALIDATION_CHECKLIST.md, COMPREHENSIVE_REVIEW.md, and README.md. The
largest correction removes the "Pause/Resume stubbed" claim —
rpc.rs:1862-1950 delegates to ProcessManager::{pause,resume}_collector,
they are fully implemented. Embedded validation commands re-verified
against current state; line-count expectations refreshed.

Unit 4 — End-to-end multi-collector coordination test
(tests/e2e_multi_collector.rs). Five scenarios exercising the real
broker, task distributor, and result aggregator with two simulated
collector clients: load balancing across queue group members, broadcast
config reload, failover after deregister, result aggregation with
correlation IDs, and wildcard observer. Scope-limited to in-process
clients because cross-OS-process subscription registration is a latent
transport-layer gap tracked separately.

Unit 5 — Acceptance evidence artifact
(docs/END-297-acceptance-evidence.md) mapping every acceptance criterion
to code, tests, and benchmarks with repo-relative paths and independently
runnable reproduction commands.

Coverage:
- 193 eventbus + 631 procmond + 169 agent tests pass
- cargo clippy --workspace --all-targets -- -D warnings: clean
- cargo fmt --all --check: clean
- latency_p99_slo bench passes locally

Refs: END-297
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copilot AI review requested due to automatic review settings April 18, 2026 23:27
@dosubot dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Apr 18, 2026
@unclesp1d3r unclesp1d3r self-assigned this Apr 18, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds explicit control-message subscription semantics and a new dual-stream subscription API, gates actor-mode startup on lifecycle control messages with a --standalone override, adds an integration test for unconnected control subscribe, and introduces a CI job to run an eventbus p99 latency SLO benchmark.

Changes

Cohort / File(s) Summary
CI: benchmarks workflow
\.github/workflows/benchmarks.yml
Adds pull_request trigger scoped to daemoneye-eventbus/**, collector-core/**; new eventbus-latency-slo job runs cargo bench for ipc_performance filtered to ^latency_p99_slo$ across ubuntu/macos matrix; uploads OS-scoped artifacts with 30-day retention; other jobs gated to workflow_dispatch.
Event subscription bridge & docs
collector-core/src/daemoneye_event_bus.rs, docs/src/technical/eventbus-architecture.md
Bridge conversion now explicitly sets EventSubscription.include_control = false. Docs tighten Local IPC latency claim to <1ms p99 (Linux/macOS) and add examples for include_control: false and include_control: true with dual receivers.
EventBus connector API & tests
procmond/src/event_bus_connector.rs, procmond/tests/actor_mode_integration_tests.rs
Connector now stores Arc<EventBusClient>, exposes client_arc(); adds subscribe_with_control(...) returning (event_rx, control_rx) and makes existing subscribe(...) set include_control=false. New test verifies subscribe_with_control errors when not connected.
Actor-mode lifecycle & CLI
procmond/src/main.rs
Adds CLI flag --standalone / PROCMOND_STANDALONE=1. When not standalone, procmond subscribes to control.collector.lifecycle via subscribe_with_control, spawns a lifecycle-wait task that begins monitoring upon receiving a BeginMonitoring control message (60s timeout), and aborts the wait task during shutdown; registration failures force standalone.
Policy / agent guidance
AGENTS.md
Updates guidance: disallow merging unless PR passed CI and received human maintainer approval (human must approve merges).

Sequence Diagram(s)

sequenceDiagram
    participant Procmond as Procmond (collector)
    participant EventBus as EventBus (agent IPC broker)
    participant Agent as Agent (control publisher)

    rect rgba(200,150,100,0.5)
    Note over Procmond,Agent: Actor-mode lifecycle startup
    Procmond->>EventBus: subscribe_with_control(topic="control.collector.lifecycle")
    EventBus-->>Procmond: returns (event_rx, control_rx)
    Procmond->>Procmond: spawn lifecycle_wait_task awaiting control_rx (timeout 60s)
    Agent->>EventBus: publish BeginMonitoring -> lifecycle topic
    EventBus-->>Procmond: deliver control Message on control_rx
    Procmond->>Procmond: begin_monitoring()
    end

    rect rgba(100,150,200,0.5)
    Note over Procmond: Fallback / standalone
    Procmond->>Procmond: if --standalone OR subscribe fails/timeouts -> begin_monitoring() immediately
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

rust, core-feature, process-monitoring, ipc, async, testing, integration, documentation, type:feature, cross-platform, performance

Poem

A collector waits for the agent's sign,
Two streams split — events clear, controls defined,
Standalone steps in if the broker's not kind,
Benchmarks watch p99 across OS line,
Ship sharp, stay safe, keep startup aligned.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed Title follows Conventional Commits spec with type 'feat', scope 'eventbus', and clear description linking to END-297.
Linked Issues check ✅ Passed Changes directly address END-297 requirements: control-message delivery (subscribe_with_control), <1ms p99 latency SLO bench, queue-group/load-balancing dispatch via topology, Arc for safe concurrent access, and E2E multi-collector coordination tests.
Out of Scope Changes check ✅ Passed All changes are scoped to END-297 closure: broker integration, procmond lifecycle coordination, latency validation, documentation sync, and E2E tests. AGENTS.md update aligns governance; file-size hygiene and cross-OS subscription registration deferred per stated boundaries.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed The pull request description comprehensively addresses the changeset, detailing control-message subscription delivery, latency SLO infrastructure, documentation updates, E2E tests, and the acceptance evidence artifact that close END-297. Changes are well-scoped, justified, and traced to review history.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch feat/end-297-close-message-broker

Warning

Review ran into problems

🔥 Problems

These MCP integrations need to be re-authenticated in the Integrations settings: Linear, Notion


Comment @coderabbitai help to get the list of available commands and usage tips.

@dosubot dosubot Bot added core-feature Core system functionality daemoneye An important component of the larger DaemonEye suite. documentation Improvements or additions to documentation integration Related to integration testing and component integration testing Related to test development and test infrastructure labels Apr 18, 2026
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Apr 18, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Require conventional commit format per https://www.conventionalcommits.org/en/v1.0.0/. Skipped for dependabot and dosubot.

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?!?:

🟢 Full CI must pass

Wonderful, this rule succeeded.

All CI checks must pass. Activates for non-bot authors, or dependabot when files exist outside .github/workflows/.

  • check-success = DCO
  • check-success = coverage
  • check-success = quality
  • check-success = test
  • check-success = test-cross-platform (macos-15, macOS)
  • check-success = test-cross-platform (ubuntu-22.04, Linux)
  • check-success = test-cross-platform (windows-2022, Windows)

🟢 Do not merge outdated PRs

Wonderful, this rule succeeded.

Make sure PRs are within 3 commits of the base branch before merging

  • #commits-behind <= 3

@coderabbitai coderabbitai Bot added rust Pull requests that update rust code ipc Inter-Process Communication process-monitoring Process monitoring and enumeration features cross-platform Multi-platform compatibility features async Related to asynchronous programming and async/await patterns type:feature performance labels Apr 18, 2026
@dosubot
Copy link
Copy Markdown
Contributor

dosubot Bot commented Apr 18, 2026

Related Documentation

3 document(s) may need updating based on files changed in this PR:

DaemonEye

eventbus-architecture /DaemonEye/blob/main/docs/src/technical/eventbus-architecture.md — ⏳ Awaiting Merge
procmond Process Collection Integration
View Suggested Changes
@@ -4,7 +4,7 @@
 ## Dual-Mode Operation
 procmond can operate in two modes, determined at startup:
 
-- **Actor Mode (Broker-Coordinated):** If the `DAEMONEYE_BROKER_SOCKET` environment variable is set, procmond starts in actor mode. It initializes an `EventBusConnector` for crash-recoverable event delivery, coordinates startup with the agent, and dynamically adjusts collection intervals in response to backpressure.
+- **Actor Mode (Broker-Coordinated):** If the `DAEMONEYE_BROKER_SOCKET` environment variable is set, procmond starts in actor mode. It initializes an `EventBusConnector` for crash-recoverable event delivery, coordinates startup with the agent, and dynamically adjusts collection intervals in response to backpressure. Actor mode can also be forced into standalone behavior using the `--standalone` CLI flag or `PROCMOND_STANDALONE=1` environment variable, bypassing agent coordination.
 - **Standalone Mode:** If no broker socket is configured, procmond falls back to standalone operation using the legacy `ProcessEventSource` with the collector-core framework.
 
 ## Actor Pattern: ProcmondMonitorCollector
@@ -22,7 +22,7 @@
 These messages are triggered either by the agent (via RPC) or by internal system events (such as backpressure).
 
 ### Startup Coordination
-On startup, the actor waits for a `BeginMonitoring` command from the agent before starting collection. This ensures all collectors are ready and the agent has completed privilege dropping. In test and development scenarios, `BeginMonitoring` is sent immediately to allow collection without full agent infrastructure.
+On startup, the actor subscribes to control topics and waits for a `BeginMonitoring` command from the agent before starting collection. This ensures all collectors are ready and the agent has completed privilege dropping. The subscription is established after registration succeeds to prevent race conditions where the agent's broadcast could arrive before the collector is listening. If the subscription fails or if the `--standalone` flag or `PROCMOND_STANDALONE=1` environment variable is set, procmond falls back to immediate-start standalone mode without agent coordination.
 
 ### EventBusConnector Integration
 In actor mode, events are published to the broker via `EventBusConnector`, which uses a write-ahead log (WAL) for crash-recoverable delivery. If the broker is unavailable, events are buffered and replayed when connectivity is restored. The connector also provides backpressure signals to the actor, triggering dynamic interval adjustment (1.5x slowdown during backpressure).
@@ -74,6 +74,7 @@
 - `--max-processes <n>`: Maximum number of processes to collect per cycle (`0` for unlimited; default: 0)
 - `--enhanced-metadata`: Enable collection of enhanced process metadata (requires privileges)
 - `--compute-hashes`: Enable computation of executable hashes for collected processes
+- `--standalone`: Start monitoring immediately without waiting for an agent `BeginMonitoring` signal (also: `PROCMOND_STANDALONE=1`)
 
 ## Standalone Mode: ProcessEventSource
 If no broker is configured, procmond uses the legacy `ProcessEventSource` with collector-core. This mode provides basic process event collection and lifecycle management, but does not support coordinated startup, crash-recoverable event delivery, or dynamic backpressure.

✅ Accepted

procmond Refactor and Integration
View Suggested Changes
@@ -28,6 +28,7 @@
 - `--max-processes <number>`: Maximum processes to collect per cycle (`0` for unlimited, default: `0`)
 - `--enhanced-metadata`: Enable enhanced metadata collection (boolean flag)
 - `--compute-hashes`: Enable executable hashing (boolean flag)
+- `--standalone`: Forces standalone/agent-less mode (embedded broker) without requiring the `DAEMONEYE_BROKER_SOCKET` environment variable. Also activated by `PROCMOND_STANDALONE=1`.
 - `DAEMONEYE_BROKER_SOCKET`: If set, enables actor mode and connects to the specified broker socket for coordinated operation.
 
 Example usage:

✅ Accepted

How did I do? Any feedback?

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR completes END-297’s “message broker closure pass” by finalizing control-message delivery semantics in daemoneye-eventbus, updating procmond to coordinate startup via lifecycle control messages, and adding benchmark/test/doc artifacts to support formal sign-off.

Changes:

  • Add opt-in Control message delivery via EventSubscription::include_control and EventBusClient::subscribe_with_control() (dual-channel receivers).
  • Update procmond actor-mode startup to subscribe to control.collector.lifecycle and wait for BeginMonitoring, with a --standalone / PROCMOND_STANDALONE=1 escape hatch.
  • Add END-297 acceptance evidence: a p99 latency SLO gate (criterion + CI job), plus new multi-collector E2E coordination tests and doc-truth sync.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
procmond/src/main.rs Implements subscribe-then-wait startup flow and standalone escape hatch.
procmond/src/event_bus_connector.rs Adds subscribe_with_control() wrapper returning (event_rx, control_rx).
procmond/tests/actor_mode_integration_tests.rs Adds coverage for the “not connected” control-subscribe error path.
procmond/tests/snapshots/cli__procmond_help.snap Updates CLI help snapshot for --standalone.
daemoneye-eventbus/src/message.rs Introduces include_control field and clarifies subscription semantics in docs.
daemoneye-eventbus/src/client.rs Implements control-message delivery to opted-in subscribers; adds unit tests.
daemoneye-eventbus/src/broker.rs Updates test subscription structs with the new field.
daemoneye-eventbus/tests/integration_tests.rs Adds serialization/defaulting tests for include_control.
daemoneye-eventbus/tests/correlation_metadata_tests.rs Updates subscription struct with include_control: false.
daemoneye-eventbus/tests/e2e_multi_collector.rs New E2E test suite for routing/broadcast/failover/aggregation/wildcards.
daemoneye-eventbus/benches/ipc_performance.rs Adds latency_p99_slo benchmark gate and supporting payload builder.
daemoneye-eventbus/benches/throughput.rs Updates subscription struct with include_control: false.
.github/workflows/benchmarks.yml Adds eventbus-latency-slo job running the SLO gate on Linux/macOS.
collector-core/src/daemoneye_event_bus.rs Explicitly sets include_control: false for bridge-layer subscriptions.
daemoneye-eventbus/docs/END-297-acceptance-evidence.md New acceptance evidence mapping criteria → code/tests/benches.
daemoneye-eventbus/README.md Updates latency claim to “<1ms p99” with bench reference.
daemoneye-eventbus/VALIDATION_CHECKLIST.md Updates validation commands/line counts and corrects Pause/Resume status.
daemoneye-eventbus/IMPLEMENTATION_SUMMARY.md Syncs docs to reflect Pause/Resume being implemented.
daemoneye-eventbus/COMPREHENSIVE_REVIEW.md Syncs completion status and RPC operation details with current behavior.

Comment thread procmond/src/main.rs Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/benchmarks.yml:
- Around line 118-146: The workflow currently only runs via workflow_dispatch so
the eventbus-latency-slo job never gates PRs; update the workflow's top-level
triggers to include pull_request (and optionally push/ schedule) with
appropriate branch/path filters so eventbus-latency-slo executes for PRs and
fails CI on regressions, and ensure the job name eventbus-latency-slo remains
unchanged so CI consumers find it; e.g., add pull_request (and/or schedule) to
the on: section and adjust any branch filter to target the branches you want
protected.

In `@procmond/src/main.rs`:
- Around line 418-425: The control topic loop currently only handles lifecycle
messages and discards all other `control.collector.{id}` messages, then exits
and drops `control_rx`, preventing per-collector RPCs from reaching
RpcServiceHandler; modify the wait/dispatch logic (the task that reads from
`control_rx` using `lifecycle_topic`, `collector_control_topic`, and the
`control_topics` vector) so that when a non-lifecycle message arrives it is
forwarded to the rpc handler instead of hitting the debug branch — e.g. match on
the message kind and for lifecycle run the existing logic but for other messages
call into `rpc_service`/`RpcServiceHandler` (or send the message down a channel
that `rpc_service` owns) and do not exit the loop after the first lifecycle
message, ensuring `control_rx` remains alive so `HealthCheck`/`UpdateConfig`
reach `RpcServiceHandler`.
- Around line 401-473: The code may subscribe-and-wait for BeginMonitoring even
if register() failed earlier, which stalls because the agent only broadcasts to
registered collectors; update the flow around subscription/BeginMonitoring
(symbols: register(), registration_manager, registration_manager.collector_id(),
subscribe_with_control, control_rx, actor_handle.begin_monitoring()) so that
before calling event_bus.subscribe_with_control or entering the wait path you
verify registration succeeded (e.g. check the register() result or a
registration_manager.is_registered()/state flag); if registration failed,
immediately fall back to standalone behavior by invoking
actor_handle.begin_monitoring() and skipping subscription/wait, or retry
registration first—ensure the subscription/wait branch only runs when
registration is confirmed.

In `@procmond/tests/actor_mode_integration_tests.rs`:
- Around line 430-437: Replace the brittle string assertions on the error
Display with a direct pattern match asserting the error variant: check that
result is Err(EventBusConnectorError::Connection(_)) (using a match or
assert!(matches!(...)) against EventBusConnectorError::Connection(_)) instead of
inspecting err_msg; ensure the EventBusConnectorError type is in scope (import
or fully qualify) and keep the original intent that subscribe_with_control
returns a connection-related error when not connected.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9d8e5ec0-0779-49ab-bd4c-99550830a717

📥 Commits

Reviewing files that changed from the base of the PR and between 3e194be and 119e913.

⛔ Files ignored due to path filters (14)
  • daemoneye-eventbus/COMPREHENSIVE_REVIEW.md is excluded by none and included by none
  • daemoneye-eventbus/IMPLEMENTATION_SUMMARY.md is excluded by none and included by none
  • daemoneye-eventbus/README.md is excluded by none and included by none
  • daemoneye-eventbus/VALIDATION_CHECKLIST.md is excluded by none and included by none
  • daemoneye-eventbus/benches/ipc_performance.rs is excluded by none and included by none
  • daemoneye-eventbus/benches/throughput.rs is excluded by none and included by none
  • daemoneye-eventbus/docs/END-297-acceptance-evidence.md is excluded by none and included by none
  • daemoneye-eventbus/src/broker.rs is excluded by none and included by none
  • daemoneye-eventbus/src/client.rs is excluded by none and included by none
  • daemoneye-eventbus/src/message.rs is excluded by none and included by none
  • daemoneye-eventbus/tests/correlation_metadata_tests.rs is excluded by none and included by none
  • daemoneye-eventbus/tests/e2e_multi_collector.rs is excluded by none and included by none
  • daemoneye-eventbus/tests/integration_tests.rs is excluded by none and included by none
  • procmond/tests/snapshots/cli__procmond_help.snap is excluded by !**/*.snap and included by procmond/**
📒 Files selected for processing (5)
  • .github/workflows/benchmarks.yml
  • collector-core/src/daemoneye_event_bus.rs
  • procmond/src/event_bus_connector.rs
  • procmond/src/main.rs
  • procmond/tests/actor_mode_integration_tests.rs

Comment thread .github/workflows/benchmarks.yml
Comment thread procmond/src/main.rs
Comment thread procmond/src/main.rs Outdated
Comment thread procmond/tests/actor_mode_integration_tests.rs Outdated
@coderabbitai coderabbitai Bot removed documentation Improvements or additions to documentation rust Pull requests that update rust code ipc Inter-Process Communication process-monitoring Process monitoring and enumeration features cross-platform Multi-platform compatibility features core-feature Core system functionality async Related to asynchronous programming and async/await patterns labels Apr 18, 2026
Addresses 5 new review threads from the re-review of round-1 fixes:

- daemoneye-eventbus/README.md:120 (@copilot, Minor): README's p99 SLO
  claim was still unscoped. Updated to match the docs/src copy:
  "asserted in CI on Linux and macOS... Windows and FreeBSD results are
  informational only."

- daemoneye-eventbus/tests/e2e_multi_collector.rs (@copilot, Minor):
  the 47-lint allow-list violated AGENTS.md's "allow requires
  justification" guidance. Narrowed to 12 test-appropriate lints with
  per-lint inline rationale comments.

- daemoneye-eventbus/src/client.rs:1289 (@copilot, Minor):
  test_subscribe_with_control_closed_channel_when_not_opted_in used
  try_recv which accepts both Empty and Disconnected. The opt-out
  contract is specifically that the channel is CLOSED, not just empty.
  Switched to timeout(100ms, recv()) returning Ok(None).

- .github/workflows/benchmarks.yml (@coderabbitai, Major): the
  pull_request path filter didn't include workspace-level inputs that
  can plausibly affect p99. Added Cargo.toml, Cargo.lock,
  rust-toolchain.toml.

- docs/src/technical/eventbus-architecture.md:389-397 (@coderabbitai,
  Minor): the subscribe_with_control example ran a single
  tokio::select! branch. Wrapped in loop { ... else => break } for
  continuous event/control handling.

Coverage:
- cargo clippy --workspace --all-targets -- -D warnings: clean
- cargo fmt --all --check: clean
- 193 eventbus tests pass

Refs: END-297, PR #178
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copilot AI review requested due to automatic review settings April 19, 2026 00:15
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 22 out of 25 changed files in this pull request and generated 5 comments.

Comment thread procmond/src/main.rs Outdated
Comment thread daemoneye-eventbus/tests/e2e_multi_collector.rs Outdated
Comment thread daemoneye-eventbus/tests/e2e_multi_collector.rs Outdated
Comment thread AGENTS.md Outdated
Comment thread procmond/src/event_bus_connector.rs
…in Arc

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Resolves 5 new review threads from @copilot's re-review of the round-2
fixes. Three real issues (one bug, one design issue, one policy gap)
plus two doc mismatches:

- procmond/src/main.rs (copilot, Critical): the heartbeat task was
  spawned BEFORE the control subscription. Tokio's interval ticks
  immediately on first poll, so a heartbeat could be published before
  the subscription was registered with the broker - causing an agent
  BeginMonitoring broadcast to be delivered to zero subscribers and
  procmond to fall back to a 60s timeout. Moved the heartbeat spawn
  to AFTER the subscribe block completes; updated the ordering comment
  to describe the real invariant.

- daemoneye-eventbus/src/client.rs + procmond/src/event_bus_connector.rs
  (copilot, Major): on connector shutdown with outstanding Arc clones,
  the previous implementation skipped client.shutdown() entirely,
  leaving the broker connection alive and background tasks running
  until the last clone dropped. Added a non-consuming
  EventBusClient::shutdown_signal(&self) that fires the broadcast
  shutdown channel. Connector shutdown now always calls shutdown_signal
  first (stops background tasks regardless of Arc count), then
  attempts Arc::into_inner for the full consuming shutdown when
  possible. No task leak on shutdown.

- AGENTS.md (copilot, Minor): the "No Auto-Commits" guideline had been
  replaced with "No Merging" without preserving the auto-commit
  prohibition. Restored "No Auto-Commits" as an explicit rule #2
  alongside "No Merging".

- daemoneye-eventbus/tests/e2e_multi_collector.rs (copilot, two docs):
  - File header previously claimed "Integration tests are code inside
    #[cfg(test)] by convention"; corrected - integration tests in
    tests/ compile as their own binary crate with no #[cfg(test)]
    wrapping.
  - collect_until doc comment previously claimed "Uses tokio::select!"
    but the impl is a try_recv drain loop with a short sleep.
    Rewrote the doc to describe the actual polling approach.

Coverage:
- cargo clippy --workspace --all-targets -- -D warnings: clean
- cargo fmt --all --check: clean
- 193 eventbus + 631 procmond tests pass

Refs: END-297, PR #178
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 27 changed files in this pull request and generated 1 comment.

Comment thread daemoneye-eventbus/src/message.rs
…ions

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…_checked_ops for 1.95 toolchain

Pre-commit's pinned clippy (rustup 1.95.0) introduced several new lints
that the workspace's prior 1.94.x toolchain did not enforce. just
ci-check now passes cleanly across all 44 touched files. No behavior
changes — every edit is a mechanical clippy suggestion.

Changes by lint:

- duration_suboptimal_units: replaced Duration::from_secs(60) with
  Duration::from_mins(1), from_secs(300) with from_mins(5),
  from_secs(3600) with from_hours(1), from_millis(1000) with
  from_secs(1), etc. across daemoneye-lib, daemoneye-eventbus,
  collector-core, daemoneye-agent, and procmond. Removed now-redundant
  inline unit comments.

- map_unwrap_or: rewrote .map(f).unwrap_or(x) on Result as
  .map_or(x, f) in daemoneye-lib, collector-core, and daemoneye-agent.
  Two sites in trigger.rs flagged as further simplifiable became
  .is_ok_and(|x| ...).

- manual_checked_ops: replaced if d == 0 { 0 } else { a / d } in
  daemoneye-eventbus/src/process_manager.rs with checked_div.unwrap_or(0).
  Dropped the now-unnecessary arithmetic_side_effects /
  integer_division allow attributes on that block.

Coverage:
- cargo +1.95 clippy --workspace --all-targets -- -D warnings: clean
- cargo +1.95 fmt --all --check: clean
- just ci-check: passes (1597 tests across workspace, all green)

Refs: END-297, PR #178
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 19, 2026

Codecov Report

❌ Patch coverage is 85.31685% with 95 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
procmond/src/main.rs 85.02% 37 Missing ⚠️
procmond/src/event_bus_connector.rs 26.53% 36 Missing ⚠️
daemoneye-eventbus/src/client.rs 93.18% 19 Missing ⚠️
collector-core/src/trigger.rs 77.77% 2 Missing ⚠️
collector-core/src/analysis_chain.rs 75.00% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

Applies the in-scope fixes from the final comprehensive review
(6 reviewers). Non-blocking items are deferred to focused follow-up
PRs and logged in the acceptance-evidence "Post-Merge Follow-ups"
section so they travel with this PR.

Fixed in this commit:

- procmond/src/main.rs: treat ActorError::ChannelClosed as a fatal
  condition. Added begin_monitoring_or_exit() helper and routed all
  four fallback paths (--standalone, broker-unreachable,
  channel-closed, wait-timeout) plus the lifecycle-signal success
  path through it. ChannelClosed means the actor task crashed and
  procmond has become a zombie; exiting non-zero lets the supervisor
  restart. ChannelFull and other transient errors stay logged but
  non-fatal. (silent-failure-hunter C1)

- daemoneye-eventbus/docs/END-297-acceptance-evidence.md: demote R14
  (Control-message delivery) from fully satisfied to partial (in-
  process only). Added note that production procmond will hit the 60s
  wait timeout on every startup until the cross-process subscription-
  registration gap closes. (code-reviewer I-1)

- procmond/src/main.rs startup-ordering comment: rewrote to describe
  the real invariant. Previous wording claimed a causal heartbeat-
  triggers-broadcast chain that doesn't exist in the current agent
  code. (code-reviewer I-3)

- daemoneye-eventbus/src/client.rs: fix unreachable "already signaled"
  path in shutdown_signal rustdoc. The &self receiver proves the
  consuming shutdown() has not run; only "no live receivers" is
  reachable. (comment-analyzer S4a)

- daemoneye-eventbus/benches/ipc_performance.rs: drop misleading
  "x86_64" claim from the SLO cfg comment. The #[cfg] gate is
  target_os only; macOS aarch64 already exercises the assertion
  (observed p99 ~21µs). (comment-analyzer S4b)

- procmond/src/main.rs composition-root comment: remove the
  "statelessness invariant ... across trust domains" phrasing that
  does not appear in MultiAlgorithmHasher rustdoc. Replaced with the
  accurate claim (Send + Sync, no per-request mutable state).
  (comment-analyzer S4c)

- procmond/src/main.rs: drop the "(ADR referenced in END-297 evidence)"
  citation — no ADR link exists in the evidence artifact. Point
  directly to the plan document. (comment-analyzer S4d)

- daemoneye-eventbus/src/client.rs test docstring: correct the
  TryRecvError::Full reference — mpsc::error::TryRecvError has only
  Empty and Disconnected variants; a routed-to-wrong-channel bug
  would produce Ok(_), not any error variant. (comment-analyzer I6)

- daemoneye-eventbus/src/message.rs: EventSubscription docstring
  explains why #[non_exhaustive] wasn't applied (FRU forbidden
  cross-crate, requires builder-pattern refactor) and points to the
  follow-up.

Deferred (logged in Post-Merge Follow-ups section of the acceptance-
evidence artifact):

- Full audit-ledger integration on the 5 fallback paths (P1)
- Missing tests for subscribe-then-wait / --standalone / off-topic
  branch / TrySendError::Full (P1)
- Remove EventBusConnector.connected bool drift risk (P2)
- #[non_exhaustive] + builder-pattern refactor (P2)
- Type-design polish — collapse include_control, shutdown_signal
  enum (P3)
- Simplification opportunities — try_subscribe_control helper,
  collect_until doc, shutdown canonical example (P3)

Coverage:
- cargo +1.95 clippy --workspace --all-targets -- -D warnings: clean
- cargo +1.95 fmt --all --check: clean
- 193 eventbus + 631 procmond + 169 agent + 249 collector-core tests
  all pass

Refs: END-297, PR #178
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 67 out of 70 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

daemoneye-eventbus/src/client.rs:474

  • EventBusClient::subscribe() / subscribe_with_control() only record topic patterns in the client’s local self.subscriptions map and never notify the broker/transport about those patterns. However, DaemoneyeBroker::publish{,_control_message} delivers to remote (managed) clients via ClientConnectionManager::broadcast_to_topic, which matches against server-side subscription patterns. As-is, cross-process subscribers won’t receive published messages on non-direct topics (e.g., control.collector.lifecycle). Consider implementing a subscribe/unsubscribe wire message so the client registers topic_patterns with the broker (or otherwise ensure the server delivers messages to the client).

Comment thread procmond/src/main.rs
Second comprehensive review (scoped to commit f25beb1) flagged two
critical issues with the begin_monitoring_or_exit helper and several
accuracy defects in the accompanying docs. Addressed now.

Critical fixes:

- Happy-path hang (silent-failure C-1). The prior helper treated
  ChannelFull as non-fatal on every call site. But the agent's
  BeginMonitoring broadcast is at-most-once, and the standalone opt-in
  path is a one-shot entry — if ChannelFull hits on either
  lifecycle-signal or standalone-mode, the signal is consumed with no
  retry, leaving procmond hung in WaitingForAgent forever. This is the
  exact zombie mode the helper was created to prevent. Introduced a
  typed BeginMonitoringReason enum distinguishing one-shot paths
  (StandaloneMode, LifecycleSignal) from degraded-fallback paths
  (BrokerUnreachable, ChannelClosed, WaitTimeout) and made the helper's
  error policy path-dependent: ChannelFull is recoverable only on
  degraded-fallback paths; on one-shot paths every actor error is fatal.

- Non-exhaustive ActorError drift (silent-failure C-2). The prior
  helper's catch-all Err(_) branch silently absorbed any future
  ActorError variant into the "log-and-continue" path. Since ActorError
  is marked #[non_exhaustive], new variants would inherit the
  wrong-default without a compiler warning. Inverted the default:
  default-fatal posture now explicitly handles ChannelFull and exits
  on anything else, including ChannelClosed, ResponseDropped, and any
  future variant a reviewer adds.

- Typed reason enum (silent-failure S-2). Replaced the five &str
  literals with a BeginMonitoringReason enum, so misspellings at call
  sites are compiler errors and log-correlation queries are reliable.

Accuracy fixes (comment-analyzer):

- The #[allow(clippy::exit, reason = ...)] rationale incorrectly
  claimed that returning Err to main() would produce a clean exit
  supervisors wouldn't restart. In fact, anyhow::Result's Termination
  impl returns non-zero; the real reason exit() is needed is that
  three of five call sites are inside tokio::spawn tasks where Err
  return propagates nowhere. Rewrote the rationale accordingly.

- The EventSubscription docstring pointed at a file
  (docs/todos/end-297-type-design-polish.md) that doesn't exist in the
  repo — same class of bug the prior commit was meant to fix. Redirected
  to the EventSubscription bullet of the "Post-Merge Follow-ups"
  section of END-297-acceptance-evidence.md, which is where the
  deferral is actually tracked.

- The evidence doc's simplification-opportunities bullet cited line
  range 476-538 for the nested match, which was off by ~40 lines after
  this commit's own insertion. Dropped the specific line range and
  described the match shape instead so the reference stays valid as the
  file grows.

- Heartbeat-ordering comment now explicitly states the placement is
  "defensive only" — no code path today routes heartbeats into the
  broadcast trigger, and the subscribe/spawn order could be reversed
  without breaking correctness. Previous wording implied a causal chain
  that doesn't exist.

Coverage:
- cargo clippy --workspace --all-targets -- -D warnings: clean
- cargo fmt --all --check: clean
- 193 eventbus + 631 procmond tests pass

Refs: END-297, PR #178
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 67 out of 70 changed files in this pull request and generated 2 comments.

Comment thread procmond/src/main.rs
Comment thread procmond/src/main.rs Outdated
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 75 out of 78 changed files in this pull request and generated 2 comments.

Comment thread .github/workflows/scorecard.yml Outdated
Comment thread procmond/src/event_bus_connector.rs
Cleanup pass inspired by the libmagic-rs justfile structure. All
changes are additive or simplifying — no existing recipes break.

Structural changes:

- Simplified format-docs. Replaced the 60-char existence-check with
  `@mise_exec mdformat .`. mise guarantees mdformat is present;
  excludes (target/, node_modules/) already live in .mdformat.toml.
  Removed the stale "mdformat-install" citation (no such recipe
  existed). Same recipe now works on Windows and Unix — no platform
  split needed.

- Factored coverage into a private `_coverage +args` helper (the
  `cargo llvm-cov nextest` command line now lives in one place). Added
  `coverage-report` (HTML + open browser) and `coverage-summary`
  (terminal-only). `coverage-check` is now a one-liner: `just
  _coverage --fail-under-lines 9.7`. RUSTFLAGS='--cfg coverage' lets
  future code gate expensive paths with `#[cfg(coverage)]` when
  coverage runs get slow.

- Added `lint-actions` recipe — `actionlint .github/workflows/*.yml`
  with release.yml ignored (GoReleaser-generated). Already enforced
  via pre-commit; now runnable standalone.

- Renamed `lint-docs` → `lint-rustdoc` for clarity (checks rustdoc
  comments, not markdown). Added `lint-md` that runs markdownlint-cli2
  + lychee link-validation — catches the class of stale cross-reference
  that comment-reviewer rounds keep finding. Top-level `lint` now
  calls both.

- Marked `pre-commit-run` as `[private]`. It's only invoked via
  `check` and `ci-check`; no reason to clutter `just --list`.

- Added aliases for format/lint symmetry:
  - `format-rust := fmt`
  - `format-md := format-docs`
  - `format-just := fmt-justfile`
  - `lint-just := lint-justfile`

- Added `[group(...)]` annotations: bench/*, goreleaser/*, release/*
  now cluster together in `just --list`.

Bug fixes:

- Double space in goreleaser-build-target Windows variant (line 341).
- Orphaned "# Check coverage thresholds" + TODO comment collapsed
  into a single descriptive comment.

Verified:
- `just lint-justfile` clean (passes its own formatter).
- `just --list` renders cleanly with groups.
- `just format-docs` runs (no platform-split needed).
- `just lint-actions` runs (no findings).

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Running `lint-md` as part of `lint` (and therefore `ci-check`) revealed
many pre-existing markdownlint violations across the repo:

- third-party skill docs under .claude/skills/**
- draft specs under .kiro/upcoming-specs/**
- older mdbook content (docs/src/**)
- solution docs (frontmatter MD025 multiple-h1)

These are unrelated to this PR's scope (justfile cleanup) and would
require a focused docs-cleanup pass to resolve.

Pulled `lint-md` out of the default `lint` chain. The recipe still
exists and is runnable on-demand via `just lint-md`. Added a comment at
the lint chain explaining why it's deferred and pointing at the
cleanup ticket.

Verified: `just ci-check` exits 0 (1597 tests pass, all lint stages
clean, coverage + build-release + security-scan + goreleaser-check
all pass).

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- main.rs: unify begin_monitoring_or_exit policy so every ActorError is
  fatal. begin_monitoring() is a one-time WaitingForAgent->Running
  transition with no retry on any path, so the is_one_shot split from
  round-6 masked a hang path. Removed BeginMonitoringReason::is_one_shot
  and the ChannelFull special case; the enum is retained purely as a
  typed log label.
- event_bus_connector.rs: correct shutdown log wording. broadcast::Sender
  only fails with zero receivers, never 'already signaled'.
- message.rs: document EventSubscription::enable_wildcards as advisory
  (the client matcher currently parses wildcards unconditionally).
- scorecard.yml: remove stale 'Disabled' comment above publish_results.
- Retire daemoneye-eventbus/docs/END-297-acceptance-evidence.md. Its
  content was scratch wrap-up; latent gaps and post-merge follow-ups now
  live as todos in .context/compound-engineering/todos/ (084-091).

Resolves PR #178 threads: PRRT_kwDOPveTG857_G-3, PRRT_kwDOPveTG857_Ksk,
PRRT_kwDOPveTG857_Ksp, PRRT_kwDOPveTG857-36m, PRRT_kwDOPveTG857_NNQ,
PRRT_kwDOPveTG857_NNV.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 75 out of 78 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

daemoneye-eventbus/src/message.rs:623

  • EventSubscription now derives Default, but the derived default sets enable_wildcards to false (bool default). The rustdoc just below says setting it to true is the “honest default” until wildcard enforcement lands, and the docs encourage constructing via ..Default::default(). Consider implementing Default manually (or adding a serde/default helper) so enable_wildcards defaults to true, or adjust the documentation to match the actual default to avoid surprising callers once enforcement is added.

The eventbus-latency-slo job has been cancelled at its 15-minute timeout
on every run since it was added in the END-297 closure-pass (10 runs, 0
greens). Root cause: no target/ cache, so every PR did a cold build of
daemoneye-eventbus + all deps + criterion harness before the ~3-second
bench ever ran. Cold build on ubuntu-latest/macos-latest exceeds 15 min.

Add Swatinem/rust-cache keyed on OS + Cargo.lock so the second run onward
restores target/ and completes well under the timeout. Matches the
existing cache strategy used for the criterion-baseline manual-dispatch
job above.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

async Related to asynchronous programming and async/await patterns core-feature Core system functionality cross-platform Multi-platform compatibility features daemoneye An important component of the larger DaemonEye suite. documentation Improvements or additions to documentation integration Related to integration testing and component integration ipc Inter-Process Communication performance process-monitoring Process monitoring and enumeration features rust Pull requests that update rust code size:L This PR changes 100-499 lines, ignoring generated files. testing Related to test development and test infrastructure type:feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement message broker for multi-collector coordination and load balancing

2 participants