Feature: Predictive cooldown based on usage patterns #18

@owaindjones

Overview

Desktop environments give us the option to automatically put the machine into standby after set amounts of time. For example, my main development workstation and LLM machine is configured in KDE to sleep after 15 minutes of being idle.

User story

Depending on the usage pattern, a fixed timeout may be too long or too short - here are some examples:

  • Machine being woken in the middle of the night to label photos as part of another server's cron job: Batch task, an immediate burst of solid above-inhibition-threshold activity for ~10 minutes, and then back to a long period of idle time (sub-inhibition-threshold activity). Machine could sleep after 5 minutes of inactivity.
  • Machine being woken and immediately used for a gaming session: A period of sub-threshold activity (streaming starting up, Steam loading, game updates downloading for ~10 minutes), followed by continuous high activity for a long period of time (maybe 20 minutes, maybe 3 hours, depending on the human and how much free time they have!). Again, after they've finished with the game, the machine drops back to idle after some shorter bursts (more updates, game saves syncing, user closing down Steam), but the system shouldn't be too quick to suspend. A cooldown of ~15 minutes avoids interrupting the user while they browse Steam within the headless game stream, which is a low-intensity activity.
  • Machine being woken and continuing an OpenCode session: A mixture of sub-threshold activity for minutes at a time (user is writing things to the LLM), followed by a mixture of bursty and continuous above-threshold activity as the LLM works on the given task, with pauses as it waits on execution of commands or for the user to respond to questions it's asked. Here, it's helpful to have a longer cooldown of ~30-60 minutes.

Proposal

My idea is thus - a dynamic, predictive cooldown duration which grows or shrinks depending on time of day and the metrics pattern. This would be based on historical data and using a simple statistical model, e.g. logistic regression (or whatever is most suitable - research is advisable). The model must be very computationally efficient - being fed new training data on every tick of rouser and making a prediction of the desirable remaining cooldown duration in seconds.
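To give a feel for how cheap a per-tick update can be, here is a minimal sketch of online logistic regression trained with one stochastic-gradient step per sample. It is only one candidate, and it estimates a probability (for instance, that activity resumes soon) rather than a duration in seconds, so it would need a further step to produce a cooldown. The `OnlineLogit` type, the feature layout and the meaning of the label are assumptions made for illustration, not part of rouser.

```rust
/// Hypothetical online logistic-regression model (illustration only, not rouser's API).
/// Each update is O(n) in the number of features and allocates nothing.
pub struct OnlineLogit {
    weights: Vec<f64>,
    bias: f64,
    learning_rate: f64,
}

impl OnlineLogit {
    pub fn new(n_features: usize, learning_rate: f64) -> Self {
        Self { weights: vec![0.0; n_features], bias: 0.0, learning_rate }
    }

    /// Estimated probability that the label is `true` for these features.
    pub fn predict(&self, features: &[f64]) -> f64 {
        let z: f64 = self.bias
            + self.weights.iter().zip(features).map(|(w, x)| w * x).sum::<f64>();
        1.0 / (1.0 + (-z).exp())
    }

    /// One stochastic-gradient step on a single (features, label) sample.
    pub fn update(&mut self, features: &[f64], label: bool) {
        let target = if label { 1.0 } else { 0.0 };
        let error = target - self.predict(features);
        for (w, x) in self.weights.iter_mut().zip(features) {
            *w += self.learning_rate * error * x;
        }
        self.bias += self.learning_rate * error;
    }
}
```

Memory use is fixed at one `f64` per feature plus the bias, which fits the CPU and memory constraints, but whether this model is accurate enough is exactly the research question raised above.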

The training must be entirely passive, based on the metrics rouser already gathers - the user can keep the machine awake through other means (by using the desktop environment, or running systemd-inhibit themselves). The predictive cooldown model rouser uses should be multidimensional, based on time, the final (i.e. EMA-smoothed) metrics used to decide the current inhibition state, and the current inhibition state itself (inhibited/not inhibited). It should infer when the machine has been suspended or shut down from gaps in time within the logged data.
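Inferring suspend or shutdown periods from the log could be as simple as looking for consecutive records that are further apart than the write interval allows. The sketch below assumes timestamps are stored as ascending seconds since the Unix epoch; `Gap`, `find_gaps` and the threshold parameter are hypothetical names.

```rust
/// A stretch of missing samples, interpreted as the machine being suspended or off.
#[derive(Debug, PartialEq)]
pub struct Gap {
    pub start_secs: f64,
    pub end_secs: f64,
}

/// Scan ascending timestamps (seconds since the Unix epoch) and report every
/// pair of consecutive records that are more than `max_expected_secs` apart.
pub fn find_gaps(timestamps: &[f64], max_expected_secs: f64) -> Vec<Gap> {
    timestamps
        .windows(2)
        .filter(|pair| pair[1] - pair[0] > max_expected_secs)
        .map(|pair| Gap { start_secs: pair[0], end_secs: pair[1] })
        .collect()
}
```

A threshold of a few multiples of the write interval would tolerate a late tick without misreading it as a suspend; the right multiple is a tuning choice.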

The predicted desirable cooldown will be bounded below by the configurable timing.cooldown_duration parameter - the value rouser actually uses must be greater than or equal to it. If the config parameter is set to "10s" and the predicted cooldown reaches 0 seconds, the actual runtime cooldown value rouser uses will be clamped to a minimum of 10 seconds.
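The clamping rule can be sketched in a few lines. This assumes the model emits its prediction as a floating-point number of seconds, which may be zero, negative or NaN; the function name is hypothetical.

```rust
use std::time::Duration;

/// Turn a raw model output (seconds) into the cooldown rouser would use,
/// never going below the configured `timing.cooldown_duration`.
/// Negative, NaN or unrepresentably large predictions fall back to the minimum.
pub fn cooldown_from_prediction(predicted_secs: f64, configured_min: Duration) -> Duration {
    Duration::try_from_secs_f64(predicted_secs)
        .map(|predicted| predicted.max(configured_min))
        .unwrap_or(configured_min)
}
```

Falling back to the configured minimum on a bad prediction keeps the behaviour identical to today's fixed cooldown when the model misbehaves.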

The estimated cooldown must be updated on every tick whilst inhibition is inactive. It may increase or decrease dynamically on every tick, so rouser must account for that each tick when making the decision on whether to drop the inhibition lock.
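Because the cooldown can move on every tick, the release decision has to compare the time spent below the threshold against the current estimate rather than against a deadline fixed when activity stopped. A minimal sketch, with `CooldownState` and its fields being hypothetical and the cooldown passed in already clamped:

```rust
use std::time::Duration;

/// Hypothetical per-tick cooldown state (illustration only).
#[derive(Default)]
pub struct CooldownState {
    /// How long the metrics have been continuously below the threshold.
    below_for: Duration,
}

impl CooldownState {
    /// Advance by one tick and return `true` if the inhibition lock should stay held.
    /// `cooldown` is the freshly predicted value for this tick.
    pub fn tick(&mut self, above_threshold: bool, tick_len: Duration, cooldown: Duration) -> bool {
        if above_threshold {
            // Activity resets the idle timer and always keeps the lock.
            self.below_for = Duration::ZERO;
            return true;
        }
        self.below_for = self.below_for.saturating_add(tick_len);
        self.below_for < cooldown
    }
}
```

With this shape, a prediction that shrinks mid-cooldown can release the lock on the very next tick, and one that grows simply extends the wait.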

The model must be CPU and memory efficient and not rely on GPU acceleration. We could take cues from how mobile and embedded devices predict battery discharge rate or charging time based on usage patterns.

Expected behaviour

Rouser dynamically decides how long to keep suspension inhibited when the metrics fall below the inhibition threshold, based on previous usage patterns. It should adapt to current workload but should also take into account the typical hourly/daily/weekly usage for the current time and metric pattern.
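One common way to feed "time of day" and "day of week" into a simple model is a cyclical sine/cosine encoding, so that 23:59 and 00:00 end up close together instead of at opposite ends of a scale. The sketch below is an assumption about feature design, not something rouser does today, and it works in UTC; using local time would need a timezone offset.

```rust
use std::f64::consts::TAU;

const SECS_PER_DAY: u64 = 86_400;
const SECS_PER_WEEK: u64 = 7 * SECS_PER_DAY;

/// Map a Unix timestamp (UTC seconds) to four features in [-1, 1]:
/// sin/cos of the time of day, then sin/cos of the position in the week.
pub fn time_features(unix_secs: u64) -> [f64; 4] {
    let day_angle = TAU * (unix_secs % SECS_PER_DAY) as f64 / SECS_PER_DAY as f64;
    let week_angle = TAU * (unix_secs % SECS_PER_WEEK) as f64 / SECS_PER_WEEK as f64;
    [day_angle.sin(), day_angle.cos(), week_angle.sin(), week_angle.cos()]
}
```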

Prediction log file format

Because we're writing historical data (days, weeks, or months of it) which we must load efficiently at startup, I suggest the log file format be a binary one. To allow us to include more metrics in future without breaking the prediction log format, I suggest using a structured format like MessagePack, BSON, protobuf or Cap'n Proto -- or something else -- whichever is the best choice for this Rust project, keeping dependencies to a minimum while still being safe, maintainable and simple to implement. The log file may benefit from being wrapped in a compression format too - not sure.

The log format must include:

  • Timestamp (with sub-second granularity)
  • All EMA-smoothed metrics
  • Current inhibition flag (do we have inhibition lock: true/false)
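To make the three fields concrete, here is a sketch of a record type with a hand-rolled, dependency-free binary encoding. This is a stand-in for illustration, not one of the formats named above; the struct, its field names and the byte layout are all assumptions. The metric count is stored per record so that more metrics can be appended later without breaking older entries.

```rust
/// One history entry (field names are illustrative, not rouser's actual types).
#[derive(Debug, Clone, PartialEq)]
pub struct HistoryRecord {
    /// Nanoseconds since the Unix epoch, giving sub-second granularity.
    pub timestamp_ns: u64,
    /// Whether rouser held the inhibition lock at this tick.
    pub inhibited: bool,
    /// EMA-smoothed metrics, in a fixed order (at most 65535 of them).
    pub metrics: Vec<f32>,
}

impl HistoryRecord {
    /// Layout: u64 timestamp | u8 flag | u16 metric count | f32 metrics (little-endian).
    pub fn encode(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(11 + 4 * self.metrics.len());
        out.extend_from_slice(&self.timestamp_ns.to_le_bytes());
        out.push(self.inhibited as u8);
        out.extend_from_slice(&(self.metrics.len() as u16).to_le_bytes());
        for m in &self.metrics {
            out.extend_from_slice(&m.to_le_bytes());
        }
        out
    }

    /// Decode one record from the front of `buf`, returning it with the bytes consumed.
    /// Returns `None` if `buf` is too short (e.g. a partially written final record).
    pub fn decode(buf: &[u8]) -> Option<(Self, usize)> {
        let mut ts = [0u8; 8];
        ts.copy_from_slice(buf.get(0..8)?);
        let inhibited = *buf.get(8)? != 0;
        let count_bytes = buf.get(9..11)?;
        let count = u16::from_le_bytes([count_bytes[0], count_bytes[1]]) as usize;
        let end = 11 + 4 * count;
        let metrics = buf
            .get(11..end)?
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect();
        Some((Self { timestamp_ns: u64::from_le_bytes(ts), inhibited, metrics }, end))
    }
}
```

A real implementation would probably also want a file header with a format version; a library format such as MessagePack would provide schema evolution with less custom code.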

The file format must support efficient truncation from the "top" of the file; we want to maintain an on-disk persistent circular buffer which retains a fixed amount of gathered data, which will be configured by a history_length parameter set to a long period of time (like 30 days). The log doesn't need to be pruned on every tick but should be pruned periodically, e.g. once every 12 hours. The log file must not be allowed to grow indefinitely, as this would eventually use up all available disk space.

It may be more efficient to do "log rotation" and partition the data across multiple files - split on date - and then we can efficiently prune by removing files whose date is older than the history_length cutoff.
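With date-partitioned files, pruning reduces to comparing each file's date suffix against a cutoff. The sketch below assumes the `history.log.YYYYMMDD` naming and takes the cutoff as an already computed YYYYMMDD number; deriving that cutoff from history_length needs calendar arithmetic and is left out. Both function names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

const PREFIX: &str = "history.log.";

/// Extract the YYYYMMDD suffix of a history file name as a number, if well-formed.
/// Read as integers, YYYYMMDD values sort in chronological order.
pub fn file_date(file_name: &str) -> Option<u32> {
    let suffix = file_name.strip_prefix(PREFIX)?;
    if suffix.len() == 8 && suffix.bytes().all(|b| b.is_ascii_digit()) {
        suffix.parse().ok()
    } else {
        None
    }
}

/// Delete history files dated strictly before `cutoff` (also YYYYMMDD).
/// Returns how many files were removed; unrelated files are left alone.
pub fn prune_dir(dir: &Path, cutoff: u32) -> io::Result<usize> {
    let mut removed = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let name = entry.file_name();
        let is_old = name.to_str().and_then(file_date).map_or(false, |d| d < cutoff);
        if is_old {
            fs::remove_file(entry.path())?;
            removed += 1;
        }
    }
    Ok(removed)
}
```

Removing whole files avoids rewriting a large log to drop its oldest entries, at the cost of retention being accurate only to the day.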

File location

When running as a non-privileged user process, rouser should write to a rouser directory under the XDG data directory, i.e. $XDG_DATA_HOME, defaulting to ~/.local/share/ if that variable is unset. When running as a root/privileged process, it should write to /var/lib/rouser.

  • Running as user: ~/.local/share/rouser/history.log.YYYYMMDD (suffix with current date)
  • Running as root: /var/lib/rouser/history.log.YYYYMMDD (suffix with current date)
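The directory choice can be sketched as a pure function, assuming the variable in question is `XDG_DATA_HOME` (which the XDG Base Directory spec defaults to `~/.local/share`). The function name and the decision to pass the environment in as arguments are illustrative; how rouser detects that it runs as root is left to the caller.

```rust
use std::path::{Path, PathBuf};

/// Pick the history directory. Inputs are passed in rather than read from the
/// environment so the logic is easy to unit-test.
/// Returns `None` for a non-root user when neither variable gives a usable base.
pub fn history_dir(
    is_root: bool,
    xdg_data_home: Option<&str>,
    home: Option<&str>,
) -> Option<PathBuf> {
    if is_root {
        return Some(PathBuf::from("/var/lib/rouser"));
    }
    // The XDG spec says empty or relative values should be ignored.
    let base = match xdg_data_home {
        Some(p) if Path::new(p).is_absolute() => PathBuf::from(p),
        _ => Path::new(home?).join(".local/share"),
    };
    Some(base.join("rouser"))
}
```

A caller might pass `std::env::var("XDG_DATA_HOME").ok().as_deref()` and `std::env::var("HOME").ok().as_deref()` for the two optional arguments.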

Configuration file additions

[prediction]
update_interval: "30s"   # separate from the main update interval and generally longer - clamped so it is never shorter than the root update_interval, as we are still bounded by tick rate. This is the rate at which rouser writes history entries
history_length: "30d"  # how long the history is allowed to get - entries older than this are pruned periodically
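A sketch of how these two values might be interpreted, assuming duration strings are a whole number followed by one of the units s, m, h or d (only "s" and "d" actually appear in this issue, so the rest of the unit set is a guess), and that "clamped to the root update_interval" means the prediction interval can never be shorter than the main tick. Function names are hypothetical.

```rust
use std::time::Duration;

/// Parse strings such as "30s", "15m", "12h" or "30d".
/// Returns `None` for missing digits, unknown units, or overflow.
pub fn parse_duration(text: &str) -> Option<Duration> {
    let text = text.trim();
    let unit_start = text.find(|c: char| !c.is_ascii_digit())?;
    let (digits, unit) = text.split_at(unit_start);
    let value: u64 = digits.parse().ok()?;
    let secs_per_unit = match unit {
        "s" => 1,
        "m" => 60,
        "h" => 3_600,
        "d" => 86_400,
        _ => return None,
    };
    value.checked_mul(secs_per_unit).map(Duration::from_secs)
}

/// History writes cannot happen more often than the main tick,
/// so the longer of the two intervals wins.
pub fn effective_prediction_interval(prediction: Duration, root: Duration) -> Duration {
    prediction.max(root)
}
```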

Testing

The prediction model, file format and configuration are easy to test in isolation as they don't rely on manual QA of the live state of a machine - all input data for these new pieces of code can be synthesized for unit tests, and history files produced in unit tests should be ephemeral, using tempfiles. During manual QA, history files can be written to a temp dir, e.g. under /tmp.

Labels: enhancement (New feature or request)