52 commits
3eb49b2
feat(prediction): add predictive cooldown based on historical usage p…
owaindjones May 1, 2026
34e8a0c
refactor(prediction): use Duration type for max_extension_time and al…
owaindjones May 1, 2026
393ebc3
chore: add .sisyphus to gitignore
owaindjones May 1, 2026
3c9c945
feat(prediction): add debug logging, documentation, and wire prune in…
owaindjones May 1, 2026
1f3af88
fix(preview): improve accumulation logging, auto-correct prediction i…
owaindjones May 1, 2026
bfd4e54
docs(prediction): update log format example to match accumulated_tick…
owaindjones May 1, 2026
f1aadcd
refactor(prediction): simplify model to use inhibited flag, fix --pri…
owaindjones May 2, 2026
043a8b9
refactor(inhibit): remove dead is_auth_error function after retry log…
owaindjones May 2, 2026
7782e8e
fix(prediction): update debug log to show full TimeKey, fix inhibit c…
owaindjones May 2, 2026
5688bac
fix(prediction): constrain proximity search by year/week, fix inhibit…
owaindjones May 2, 2026
e5ff8f1
fix(prediction): fix ISO week wraparound in proximity search, expand …
owaindjones May 2, 2026
2da23ab
fmt: reformat code to match rustfmt conventions
owaindjones May 2, 2026
e5be9a2
refactor(prediction): remove noisy debug logs, change seconds_into_we…
owaindjones May 2, 2026
dccf78e
fix(systemd): use StateDirectory for persistent history data with rea…
owaindjones May 2, 2026
ff4a218
fix(service): fix predictive cooldown logging and extension not applying
owaindjones May 2, 2026
32cdfbf
feat(prediction): add delta features, gap detection and zero-fill int…
owaindjones May 2, 2026
9f2fbdc
fix(prediction): compute delta features in production and consume tre…
owaindjones May 2, 2026
067cade
fix(history): fall back to file modification time for non-YYYYMMDD fi…
owaindjones May 2, 2026
84363e4
docs(history): document Linux creation time limitation in file sort f…
owaindjones May 2, 2026
aad16f8
fix(prediction): recompute deltas for entries after gap-filled synthe…
owaindjones May 2, 2026
6b1585d
fix(prediction): fix network delta averaging and add gap-fill delta r…
owaindjones May 2, 2026
606fbb6
fix(prediction): fix clippy manual_range_contains lint in test
owaindjones May 2, 2026
107d21c
chore: add branch-only work rule and simplify systemd service ExecStart
owaindjones May 2, 2026
0623176
docs(systemd): remove stale --config paths from ExecStart examples
owaindjones May 2, 2026
6d5f97c
feat(tracing): two-phase init with reloadable filter and log level pr…
owaindjones May 2, 2026
9f86164
feat(prediction): remove delta fields from history, use timestamp-bas…
owaindjones May 2, 2026
b87d411
docs(prediction): fix stale references to removed constants and old b…
owaindjones May 2, 2026
4f5d165
feat(prediction): re-evaluate cooldown extension every tick during wa…
owaindjones May 2, 2026
6e8dc6a
refactor(prediction): remove dead cooldown_extension_applied field an…
owaindjones May 2, 2026
66f2a60
fix(prediction): online model updates and fix prediction overwrite bug
owaindjones May 7, 2026
45e0e90
feat(gpu): add aggregate GPU metrics with per-GPU + total average thr…
owaindjones May 7, 2026
887f39f
docs(configuration): document dual GPU thresholds with per-GPU + tota…
owaindjones May 7, 2026
b45d1f5
docs(gpu): update GPU docs to reflect dual-threshold aggregate metrics
owaindjones May 7, 2026
4673a9a
config: update default GPU thresholds to per_gpu=25.0, total=40.0 and…
owaindjones May 7, 2026
e74f603
refactor(config): remove dead default functions, use hardcoded values…
owaindjones May 7, 2026
0186493
docs(agents): fix stale reference to removed default helper functions
owaindjones May 7, 2026
b6bd2de
fmt: fix derive macro formatting on Metrics struct
owaindjones May 7, 2026
1c8fcd7
refactor(config): remove redundant default_what() helper function
owaindjones May 7, 2026
484d826
fix(config): align Default impl values with config/rouser.toml
owaindjones May 7, 2026
1eaf63e
fmt: format vec! macro for exclude_device_prefixes
owaindjones May 7, 2026
4d7d7fb
fix(gpu): invert has_gpus() logic — was returning true when NO GPUs e…
owaindjones May 7, 2026
4f50e41
feat(debug): show aggregate GPU metrics in debug log, add has_gpus tests
owaindjones May 7, 2026
1d20ad9
fmt: fix trailing whitespace and blank line formatting in gpu.rs test…
owaindjones May 7, 2026
92fc7f9
test(gpu): add GpuAggregate unit tests for empty, single, and multi-G…
owaindjones May 7, 2026
7f15241
feat(prediction): include GPU aggregate metrics in snapshot debug log
owaindjones May 7, 2026
b9e9c66
fmt: fix indentation in model.rs snapshot log formatting
owaindjones May 7, 2026
87cdc2b
feat(prediction): add GPU deltas to EntryDeltas and TrendSignal
owaindjones May 7, 2026
78d576e
docs(prediction): rewrite prediction model docs with ML architecture
owaindjones May 7, 2026
03c8e47
docs: add comprehensive prediction model refactoring TODO and AGENTS.…
owaindjones May 7, 2026
64a71ef
fix: correct typo in AGENTS.md predotion→prediction reference
owaindjones May 7, 2026
56ae9dd
feat(config): add ML model parameters to PredictionConfig
owaindjones May 7, 2026
d6ed09d
feat(prediction): add ML model module with NG-RC wrapper and feature …
owaindjones May 7, 2026
1 change: 1 addition & 0 deletions .gitignore
@@ -36,4 +36,5 @@ site/

# Scratch directories
.scratch/
+.sisyphus/
scratch/
28 changes: 25 additions & 3 deletions AGENTS.md
@@ -5,6 +5,7 @@ These guidelines are specific to **AI/LLM agents** working on this codebase. Hum
## Core Principles

- **Read CONTRIBUTING.md first**: Before making changes, read [CONTRIBUTING.md](./CONTRIBUTING.md) for coding standards, testing conventions, and documentation sync rules that apply to all contributors (agents included). AGENTS.md covers agent-specific behavior; CONTRIBUTING.md covers everything else.
+- **Work in branches only**: All work must be done in feature or topic branches unless the user explicitly specifies otherwise. Commits directly to `main` are forbidden without explicit instruction. Before beginning any task, check what branch you're on and create a new one if needed (e.g., `feat/description`, `fix/description`).
- **Build before committing**: The code MUST compile (`cargo build`), pass all tests (`cargo test --all-targets`), and be clean under clippy (`cargo clippy --all-targets -- -D warnings`) before any git commit. Never ship broken code. Always match CI commands exactly — `--all-targets` includes test targets which may have lint warnings not visible otherwise.
- **Conventional commits**: All git commit messages follow [Conventional Commits](https://www.conventionalcommits.org/) format: `type(scope): description`. See section below.
- **Commit frequently when stable**: Make atomic, logical commits whenever the codebase is in a working state (builds, tests pass). Do not batch unrelated changes into a single commit. Each commit should represent one coherent unit of change.
@@ -13,6 +14,9 @@ These guidelines are specific to **AI/LLM agents** working on this codebase. Hum
- For larger units of work (major refactoring, big new feature), split into small, manageable commits rather than one massive commit to preserve history granularity and make rollbacks easier.
- **Follow existing patterns first**: Before proposing new patterns or structures, search for and follow established conventions in the codebase. When in doubt, match what's already there.
- **Graceful degradation over panics**: Metric collectors return `Result` types and fall back to zero values on failure. The daemon continues operating even when individual metrics are unavailable.
+- **Descriptive comments are encouraged**: Comments that explain non-obvious intent, arithmetic expectations, or why a particular approach was chosen should be kept — especially in tests where the "what" is clear but the "why" and expected values may not be. Docstrings on public APIs and complex algorithms (e.g., accumulation logic, security-critical code) are welcome. Avoid comments that merely restate what the code already says ("increment counter by one"), but keep those that add context a reader wouldn't get from reading alone.
+- **Docs document current state only**: All documentation must describe how things work now — never reference "previous behaviour", "this replaces", or any historical comparison. Documentation is read against the current codebase; past implementation details belong in git history, not docs.
+- **Use todos tool for task tracking**: Always use the `todos` tool to track tasks and keep it updated as you progress. When interrupted or new requests are made during work, update the todos list ordering by priority. This ensures continuity across session boundaries and prevents lost context on resumption.
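
Reviewer sketch: the graceful-degradation rule in the list above could look roughly like this. It is a minimal, std-only illustration and not the project's collector code. The names `parse_usage` and `usage_or_zero` are hypothetical, and `eprintln!` stands in for the `tracing` macros that the logging conventions call for.

```rust
use std::num::ParseFloatError;

/// Hypothetical collector: parses a raw text reading into a percentage.
fn parse_usage(raw: &str) -> Result<f64, ParseFloatError> {
    raw.trim().parse::<f64>()
}

/// Falls back to zero when the collector fails, so the caller keeps running
/// instead of panicking.
fn usage_or_zero(raw: &str) -> f64 {
    match parse_usage(raw) {
        Ok(value) => value,
        Err(err) => {
            // Assumption: the real daemon would emit this via `tracing::warn!`.
            eprintln!("metric unavailable, using 0.0: {err}");
            0.0
        }
    }
}
```

With this shape, a malformed reading degrades to an idle-looking sample rather than stopping the daemon.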

### Agent-Specific Rules (do NOT apply to human developers)

@@ -96,6 +100,7 @@ Use the affected module as scope: `service`, `config`, `gpu`, `cpu`, `network`,
## Logging Conventions

- Use the `tracing` crate (`debug!`, `info!`, `warn!`, `error!` macros).
+- **Log level priority chain**: When resolving the effective tracing log level, always follow this exact order: CLI `-l` flag > RUST_LOG env var > config.log_level from any loaded config file > default of `'info'`. Never reorder these — the function `resolve_tracing_log_level()` in main.rs implements this and must not be changed.
- **State-change-only logging**: When tracking persistent states (inhibition, connection status), only emit INFO logs on actual state transitions. Do not log every polling cycle when state is unchanged. Track previous state and compare at the end of each tick/loop iteration.
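
Reviewer sketch: the priority chain in the list above (CLI flag, then `RUST_LOG`, then the config file, then `info`) can be expressed as a chain of `Option::or` calls. The signature of the real `resolve_tracing_log_level()` is not shown in this diff, so `resolve_level_sketch` and its parameters are hypothetical. Inputs are passed explicitly rather than read from the environment to keep the order easy to test.

```rust
/// Resolves the effective log level: CLI flag > RUST_LOG > config file > "info".
/// Hypothetical helper; not the project's `resolve_tracing_log_level()`.
fn resolve_level_sketch(
    cli_flag: Option<&str>,
    rust_log_env: Option<&str>,
    config_level: Option<&str>,
) -> String {
    cli_flag
        .or(rust_log_env) // used only when no CLI flag was given
        .or(config_level) // used only when RUST_LOG is also absent
        .unwrap_or("info") // final default
        .to_string()
}
```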

## Error Handling Conventions
@@ -114,8 +119,8 @@ Use the affected module as scope: `service`, `config`, `gpu`, `cpu`, `network`,
## Configuration Conventions

- TOML format via the `toml` crate with serde derive macros.
-- All config values have sensible defaults defined as `fn default_*() -> T` helper functions.
-- Optional fields use `#[serde(default)]`; required overrides use `#[serde(default = "default_fn")]`.
+- All config values have sensible defaults defined in `config/rouser.toml`, embedded at compile time via `include_str!()`. Struct fields use bare `#[serde(default)]`; Duration fields may need explicit helper functions only when humantime_serde requires a function-typed default (e.g., `default_history_length()` for 30-day history).
+- Explicit `Default` trait impls on config structs hardcode values from `config/rouser.toml`. Never add `fn default_*() -> T` helper functions — the TOML file is the single source of truth.
- Duration parsing uses `humantime_serde` for human-readable format (e.g., `"5s"`, `"30m"`).

## XDG Base Directory Compliance
@@ -219,7 +224,7 @@ The old `/org/freedesktop/PowerManagement.Inhibit` API is obsolete (deprecated ~
`config/rouser.toml` is the single source of truth for all configuration defaults — not `src/config.rs`, not documentation, not code comments. When updating default values:

1. **Always update `config/rouser.toml` first** with the new default value
-2. Then update `src/config.rs` to match (default helper functions like `default_ema_alpha_cpu()`)
+2. Then update `src/config.rs` to match (hardcoded values in `Default` trait impls)
3. Then update all documentation (`docs/configuration.md`, `docs/metrics-overview.md`, etc.)

The code defaults in `config/rouser.toml` are embedded at compile time via `include_str!()` and served as both the shipped config file AND the binary's built-in fallback. Never change a default value without updating all three locations simultaneously.
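
Reviewer sketch: one way to keep a hardcoded `Default` impl from drifting away from the embedded TOML is to compare the two in a test. The example below is an assumption-laden illustration: `GpuConfigSketch` and `lookup_f64` are hypothetical, the inline constant stands in for the `include_str!()` embedding, and the naive `key = value` scan replaces the `toml` crate that the project actually uses. The three values are the `[metrics.gpu]` defaults from this PR.

```rust
/// Stand-in for the `include_str!()` embedding; inlined so the sketch is self-contained.
const EMBEDDED_DEFAULTS: &str = "\
[metrics.gpu]
per_gpu_threshold = 25.0 # per-GPU trigger
total_threshold = 40.0
ema_alpha = 0.7
";

/// Hypothetical config struct; field names mirror the TOML keys.
#[derive(Debug, PartialEq)]
struct GpuConfigSketch {
    per_gpu_threshold: f64,
    total_threshold: f64,
    ema_alpha: f64,
}

impl Default for GpuConfigSketch {
    fn default() -> Self {
        // Hardcoded to match the TOML file, which remains the source of truth.
        Self { per_gpu_threshold: 25.0, total_threshold: 40.0, ema_alpha: 0.7 }
    }
}

/// Naive `key = value` lookup, just enough to compare against the Default impl.
fn lookup_f64(toml_text: &str, key: &str) -> Option<f64> {
    toml_text.lines().find_map(|line| {
        let (k, v) = line.split_once('=')?;
        if k.trim() != key {
            return None;
        }
        // Drop any trailing `# comment` before parsing the number.
        v.split('#').next()?.trim().parse().ok()
    })
}
```

A unit test that asserts each field of `GpuConfigSketch::default()` against `lookup_f64(EMBEDDED_DEFAULTS, ...)` would fail as soon as one location is updated without the other.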
@@ -299,3 +304,20 @@ echo "https://github.com/{owner}/{repo}/actions/runs/RUN_ID"
- **Missing `needs` dependencies**: If a job references another via `needs: [foo]`, and `foo` is conditional (`if:`), the dependent job inherits that condition — it will skip if the dependency was skipped. Always verify both jobs have matching trigger conditions.
- **Container vs runner environment mismatch**: Steps running in containers (e.g., `container: fedora:latest`) cannot access tools on the host runner (like `gh` CLI). Split containerized build steps from upload/CLI steps that run on `ubuntu-latest` without a container.
- **Artifact download path defaults to `.`**: When using `actions/download-artifact@v4`, always specify `path: some-dir/` explicitly, then move files with `mv some-dir/* .` before consuming them — default behavior may merge artifacts unpredictably.

+## XDG State Directory Migration
+
+History data was migrated from `$XDG_DATA_HOME/rouser` (or `~/.local/share/rouser`) to `$XDG_STATE_HOME/rouser` (or `~/.local/state/rouser`). This is a breaking change: existing history files at the old path are not read by new binaries. The fallback for read-only `/home` with no writable state dir uses `/tmp/rouser-history.<pid>` with 0700 permissions to minimize TOCTOU risk on shared systems. When updating config defaults or docs, always reference `XDG_STATE_HOME`, never `XDG_DATA_HOME`.
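
Reviewer sketch: the path selection described above (prefer `$XDG_STATE_HOME/rouser`, otherwise `~/.local/state/rouser`) might be written as below. `state_dir_sketch` is a hypothetical name, the environment values are passed in as parameters for testability, and the `/tmp/rouser-history.<pid>` fallback is deliberately not modelled. Treating an empty variable as unset follows the XDG Base Directory Specification.

```rust
use std::path::PathBuf;

/// Resolves the history directory from explicit inputs (normally read from
/// the `XDG_STATE_HOME` and `HOME` environment variables).
/// Returns `None` when neither value is usable.
fn state_dir_sketch(xdg_state_home: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    match xdg_state_home.filter(|v| !v.is_empty()) {
        // An explicit, non-empty XDG_STATE_HOME wins.
        Some(base) => Some(PathBuf::from(base).join("rouser")),
        // Otherwise fall back to the spec default under the home directory.
        None => home
            .filter(|v| !v.is_empty())
            .map(|h| PathBuf::from(h).join(".local/state/rouser")),
    }
}
```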

+## Prediction Model Refactoring (In Progress)
+
+The prediction module is undergoing a major refactoring to replace the histogram-based TimeKey approach with an unsupervised ML model using NG-RC reservoir computing from the [irithyll](https://crates.io/crates/irithyll) crate. See [`docs/prediction-todo.md`](./docs/prediction-todo.md) for the complete task tracker and architecture decisions.
+
+**Key changes:**
+- **TimeKey deprecation**: The `(year, week_of_year, seconds_into_week)` histogram key is being removed. Year provides no pattern-matching value (it's monotonically increasing), and 604800 buckets/week is wasteful for sparse data. The ML approach eliminates bucketing entirely — each history entry becomes a feature vector.
+- **Feature vectors**: Six normalized values per entry: CPU max, CPU avg, GPU max, GPU avg, network MB/s, disk MB/s. No time-key bucketing; temporal patterns learned via reservoir delay embeddings.
+- **Unsupervised learning**: NG-RC updates weights at each prediction `update_interval` (default 30s) without labeled data. Anomaly score maps to cooldown extension.
+- **Gap-filled entries preserved**: Unlike the previous approach that filtered out zero-value gap entries, these represent valid idle states and contribute to baseline anomaly scoring.
+- **GPU deltas added**: EntryDeltas now includes `gpu_delta_per_gpu_max` and `gpu_delta_total_average`, updated in TrendSignal alongside CPU/network/disk trends.
+
+**Config changes:** New fields planned for `[prediction]`: `hidden_dim: usize (default 16)`, `delay_buffer_size: usize (default 8)` to control reservoir capacity.
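
Reviewer sketch: the diff does not show how entries are normalized or how the anomaly score becomes an extension, so the following is only one plausible reading. Both function names are hypothetical. Dividing by a caller-supplied ceiling, the linear score-to-duration mapping, and the assumption that a higher score means a longer extension are all guesses made for illustration. The NG-RC model from `irithyll` is not used here.

```rust
use std::time::Duration;

/// Builds the six-value feature vector (CPU max, CPU avg, GPU max, GPU avg,
/// network MB/s, disk MB/s). Assumption: each value is scaled into [0, 1]
/// by dividing by a per-feature ceiling.
fn feature_vector_sketch(raw: [f64; 6], ceilings: [f64; 6]) -> [f64; 6] {
    let mut out = [0.0; 6];
    for (slot, (value, ceiling)) in out.iter_mut().zip(raw.iter().zip(ceilings.iter())) {
        // A non-positive ceiling would divide by zero or flip the sign; emit 0.0 instead.
        *slot = if *ceiling > 0.0 { (value / ceiling).clamp(0.0, 1.0) } else { 0.0 };
    }
    out
}

/// Assumption: a score in [0, 1] maps linearly onto [0, max_extension].
fn extension_from_score_sketch(score: f64, max_extension: Duration) -> Duration {
    // NaN is treated as "no signal"; out-of-range scores are clamped.
    let clamped = if score.is_nan() { 0.0 } else { score.clamp(0.0, 1.0) };
    max_extension.mul_f64(clamped)
}
```

With `max_extension_time = "1h"`, a score of 0.5 under this mapping would yield a 30-minute extension, and any score at or above 1.0 would yield the full hour.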
3 changes: 1 addition & 2 deletions CONTRIBUTING.md
@@ -155,8 +155,7 @@ Include contextual identifiers in log messages: GPU device IDs (`card0(nvidia)`)
### Configuration Conventions

- TOML format via the `toml` crate with serde derive macros.
-- All config values have sensible defaults defined as `fn default_*() -> T` helper functions.
-- Optional fields use `#[serde(default)]`; required overrides use `#[serde(default = "default_fn")]`.
+- All config values have sensible defaults defined in `config/rouser.toml`, embedded at compile time via `include_str!()`. Struct fields use bare `#[serde(default)]`; explicit `Default` trait impls on config structs hardcode these same values. Never add `fn default_*() -> T` helper functions — the TOML file is the single source of truth.
- Duration parsing uses `humantime_serde` for human-readable format (e.g., `"5s"`, `"30m"`).

---
6 changes: 6 additions & 0 deletions Cargo.toml
@@ -41,10 +41,16 @@ libc = "0.2"
serde = { version = "1.0", features = ["derive"] }
humantime-serde = "1.0"

+# Binary serialization for history log (lightweight, serde-compatible via bincode v2)
+bincode = { version = "2", features = ["serde"] }

# CLI parsing
clap = { version = "4", features = ["derive"] }
humantime = "2.1"

+# Streaming machine learning (unsupervised NG-RC reservoir computing for cooldown prediction)
+irithyll = { version = "9.9", features = ["serde-bincode"] }


[dev-dependencies]
tempfile = "3.0"
2 changes: 2 additions & 0 deletions README.md
@@ -19,6 +19,7 @@ rouser keeps headless servers and desktops awake during active use. It monitors
- **Multi-metric monitoring**: CPU (per-core frequency-weighted), GPU (NVIDIA/AMD/Intel), network I/O, disk activity
- **Configurable thresholds**: Independent per-core and total-CPU thresholds, per-GPU reporting
- **EMA smoothing**: Per-metric exponential moving average for stable readings
+- **Predictive cooldown**: Learns from historical usage patterns to extend idle cooldown duration, reducing false-positive sleep inhibition during typical active-use hours
- **Systemd integration**: Uses `org.freedesktop.login1.Manager.Inhibit` D-Bus API
- **TOML configuration**: Embedded default config; auto-installs to user or system paths on first run, merges `/etc/rouser/config.toml` and `~/.config/rouser/config.toml` if present
- **Dry-run mode**: Test without inhibiting sleep
@@ -71,6 +72,7 @@ See [Configuration Reference](docs/configuration.md) for all options with defaul
| [Configuration Reference](docs/configuration.md) | All config options with embedded-default values |
| [Command Line](docs/command-line.md) | CLI arguments and usage examples |
| [Metrics Overview](docs/metrics-overview.md) | How CPU, GPU, network, disk metrics are collected |
+| [Prediction Model](docs/prediction-model.md) | How adaptive cooldown extension works from historical patterns |
| [GPU Usage Measurement](docs/gpu-usage-measurement.md) | What NVML, amdgpu, and i915 actually measure |
| [D-Bus Inhibition](docs/d-bus-inhibition.md) | How sleep inhibition works under the hood |

14 changes: 12 additions & 2 deletions config/rouser.toml
@@ -13,8 +13,9 @@ total_threshold = 25.0
ema_alpha = 0.7

[metrics.gpu]
-threshold = 15.0 # GPU usage threshold (percentage)
-ema_alpha = 0.7 # EMA smoothing factor
+per_gpu_threshold = 25.0 # Per-GPU utilization percentage that triggers inhibition
+total_threshold = 40.0 # System-wide average GPU utilization threshold (both thresholds use OR logic)
+ema_alpha = 0.7 # EMA smoothing factor

[metrics.network]
threshold = 10.0 # Network I/O threshold (Mbps)
@@ -35,3 +36,12 @@ cooldown_duration = "10s" # Time below threshold before releasing inhibition
[inhibitor]
what = "shutdown:idle" # Lock type: idle, sleep, suspend, shutdown (colon-separated)
mode = "block" # Mode: block, delay, block-weak

+# Predictive cooldown — learns from historical usage patterns to dynamically extend or reduce the cooldown duration.
+# Requires a longer history (days/weeks of data). Disabled by default; set update_interval to enable.
+[prediction]
+update_interval = "30s" # Seconds between averaged snapshots written to history log; must be >= root update_interval
+history_length = "30d" # Keep this much historical data; older entries are pruned periodically
+max_extension_time = "1h" # Maximum additional time for predictive cooldown extension
+ml_hidden_dim = 16 # Number of hidden neurons in NG-RC reservoir computing model (controls capacity, O(n^2) memory)
+ml_delay_buffer_size = 8 # Size of delay buffer for temporal feature creation from past states
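
Reviewer sketch: the comment on `update_interval` above states that the prediction interval must be at least the root `update_interval`. A check for that constraint could be as small as the function below. `validate_prediction_interval_sketch` is a hypothetical name, and whether the project rejects or auto-corrects a violating value is not visible in this diff.

```rust
use std::time::Duration;

/// Enforces the documented constraint: the prediction interval must not be
/// shorter than the root polling interval.
fn validate_prediction_interval_sketch(
    root: Duration,
    prediction: Duration,
) -> Result<(), String> {
    if prediction >= root {
        Ok(())
    } else {
        Err(format!(
            "prediction.update_interval ({prediction:?}) must be >= update_interval ({root:?})"
        ))
    }
}
```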
29 changes: 17 additions & 12 deletions docs/averaging.md
@@ -94,8 +94,9 @@ threshold = 80.0
ema_alpha = 0.1 # Default smoothing for CPU

[metrics.gpu]
-threshold = 90.0
-ema_alpha = 0.2 # More responsive for GPU
+per_gpu_threshold = 90.0 # Per-GPU max usage threshold
+total_threshold = 85.0 # System-wide average threshold (both use OR logic)
+ema_alpha = 0.2 # More responsive for GPU

[metrics.network]
threshold = 100.0
@@ -117,15 +118,16 @@ ema_alpha = 0.1 # Standard smoothing for disk I/O

### Per-GPU EMA Smoothing

-Each detected GPU applies the same `ema_alpha` from `[metrics.gpu]`, but independently. There is no per-GPU config override — the threshold and smoothing factor apply uniformly to all GPUs:
+Each detected GPU applies the same `ema_alpha` from `[metrics.gpu]`, but independently. There is no per-GPU config override — both thresholds and the smoothing factor apply uniformly to all GPUs:

```toml
[metrics.gpu]
-threshold = 90.0 # Applies to ALL detected GPUs
-ema_alpha = 0.2 # Applied per-device, not globally averaged
+per_gpu_threshold = 90.0 # Per-GPU max usage threshold (applies to ALL detected GPUs)
+total_threshold = 85.0 # System-wide average threshold
+ema_alpha = 0.2 # Applied per-device, not globally averaged
```

-This means card0(nvidia) at 95% and card1(amdgpu) at 87% are each compared against the same threshold independently — one exceeding it triggers inhibition regardless of the other's state.
+This means card0(nvidia) at 95% and card1(amdgpu) at 87% are each compared against the `per_gpu_threshold` independently — one exceeding it triggers inhibition regardless of the other's state. The system-wide average is also checked: if both GPUs hover near `total_threshold`, that alone can trigger inhibition even if neither per-GPU value exceeds its threshold.
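
Reviewer sketch: the dual-threshold OR logic described above can be illustrated with a small function. `gpu_triggers_sketch` is a hypothetical name, the inputs are assumed to be already-smoothed utilization percentages, and the strict `>` comparison is an assumption based on the word "exceeding" (the diff does not show whether the real check is inclusive).

```rust
/// Returns true when any single GPU exceeds `per_gpu_threshold` OR the
/// average across all GPUs exceeds `total_threshold`.
fn gpu_triggers_sketch(usages: &[f64], per_gpu_threshold: f64, total_threshold: f64) -> bool {
    if usages.is_empty() {
        return false; // No GPUs detected: nothing to trigger on.
    }
    // Highest per-device reading.
    let max = usages.iter().copied().fold(f64::MIN, f64::max);
    // System-wide average across all devices.
    let average = usages.iter().sum::<f64>() / usages.len() as f64;
    max > per_gpu_threshold || average > total_threshold
}
```

With `per_gpu_threshold = 90.0` and `total_threshold = 85.0`, readings of 95% and 87% trigger through the per-GPU check, while two GPUs at 86% each trigger only through the average.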

## Threshold Evaluation

@@ -323,8 +325,9 @@ total_threshold = 60.0
ema_alpha = 0.1 # Default smoothing for CPU

[metrics.gpu]
-threshold = 90.0
-ema_alpha = 0.2
+per_gpu_threshold = 90.0 # Per-GPU max usage threshold
+total_threshold = 75.0 # System-wide average threshold (both use OR logic)
+ema_alpha = 0.2 # EMA smoothing for GPU

[metrics.network]
threshold = 50.0
@@ -354,8 +357,9 @@ total_threshold = 70.0
ema_alpha = 0.15 # More responsive for compilation bursts

[metrics.gpu]
-threshold = 95.0
-ema_alpha = 0.2 # Responsive for GPU workloads
+per_gpu_threshold = 95.0 # Per-GPU max usage threshold (high for gaming)
+total_threshold = 80.0 # System-wide average threshold (both use OR logic)
+ema_alpha = 0.2 # EMA smoothing for GPU

[metrics.network]
threshold = 100.0
@@ -385,8 +389,9 @@ total_threshold = 60.0
ema_alpha = 0.2 # Quick spike detection

[metrics.gpu]
-threshold = 90.0
-ema_alpha = 0.25 # Very responsive for gaming GPU activity
+per_gpu_threshold = 90.0 # Per-GPU max usage threshold (high for gaming)
+total_threshold = 85.0 # System-wide average threshold (both use OR logic)
+ema_alpha = 0.25 # Very responsive for gaming GPU activity

[timing]
duration_threshold = "15s" # Shorter threshold — gamers prefer instant response