
Investigate overview page cache TTL and data freshness #2

@maxlyth


Context

The overview page uses @st.cache_data(ttl=300) on timeline and config loaders, with _QUANT = 300 timestamp quantization. This means displayed health data can be up to 5 minutes stale.
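As a rough illustration of what timestamp quantization does to cache keys, here is a minimal sketch. It assumes the quantization rounds down to the start of a 300-second bucket; the issue only names `_QUANT = 300`, so the rounding direction and the helper name `quantize` are assumptions.

```python
QUANT = 300  # seconds; mirrors the _QUANT value named in the issue


def quantize(ts: float, quant: int = QUANT) -> int:
    """Round a Unix timestamp down to the start of its quant-second bucket."""
    whole = int(ts)
    return whole - whole % quant


# Two requests 240 seconds apart inside one bucket yield the same value,
# so a cache keyed on the quantized timestamp is reused.
a = quantize(1_700_000_100)
b = quantize(1_700_000_340)

# A request in the next bucket yields a new value and therefore a new cache key.
c = quantize(1_700_000_400)
```

Under this assumption, any argument derived from the current time only changes once per bucket, which is what lets the cached loaders be reused between reruns.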

Since v0.4.10, the time window is computed dynamically (benchmarked against an 8-second page-load target). The window computation is also cached with the same TTL.

Investigation items

1. Cache TTL appropriateness

The 300s TTL was chosen as a general-purpose trade-off. Evaluate whether it should be:

  • Shorter for data that changes frequently (coverage, fire frequency shift as entities change state)
  • Longer for data that rarely changes (sensor config, observation list)
  • Different per cache: currently every cache except _get_sensor_ids (120s) uses the same 300s TTL

| Cache function | Current TTL | Data | Staleness concern |
|---|---|---|---|
| _get_sensor_ids | 120s | Sensor list | Sensors added/removed |
| _get_config | 300s | Prior, threshold, obs list, source | User edits YAML or UI config |
| _load_sensor_timelines | 300s | Entity state history | New state changes arrive |
| _get_window | 300s | Dynamic window calculation | Benchmark may not reflect current load |
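One way to experiment with per-cache TTLs is sketched below. It uses a small stdlib stand-in for `st.cache_data` so the behaviour can be checked without Streamlit; in the app each value would instead be passed as `st.cache_data(ttl=...)`. The `TTLS` values other than the existing 120s are placeholders to tune, not recommendations, and `ttl_cache` is a hypothetical helper.

```python
import time
from functools import wraps

# Hypothetical per-cache TTLs in seconds; only sensor_ids matches the current value.
TTLS = {"sensor_ids": 120, "config": 600, "timelines": 60, "window": 900}


def ttl_cache(ttl, clock=time.monotonic):
    """Cache results per positional-argument tuple for at most ttl seconds."""
    def decorator(fn):
        store = {}  # args -> (time stored, value)

        @wraps(fn)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl:
                return hit[1]  # still fresh: reuse the cached value
            value = fn(*args)
            store[args] = (now, value)
            return value

        return wrapper
    return decorator
```

The `clock` parameter exists only so expiry can be exercised with a fake clock; it has no counterpart in `st.cache_data`.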

2. Benchmark sensor representativeness

The dynamic window is benchmarked using only the first valid sensor, which may not be representative:

  • A sensor with few observations may benchmark optimistically
  • A sensor with many template observations may benchmark pessimistically
  • Consider benchmarking with the median or sampling multiple sensors
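The median-of-several-sensors idea could look roughly like the sketch below. The function name `benchmark_median`, the `load_one` callback, and the default sample size of 5 are assumptions for illustration; the real benchmark code is not shown in this issue.

```python
import statistics
import time
from typing import Callable, Sequence


def benchmark_median(
    sensor_ids: Sequence[str],
    load_one: Callable[[str], object],
    sample_size: int = 5,
    timer: Callable[[], float] = time.perf_counter,
) -> float:
    """Time load_one on an evenly spaced sample of sensors; return the median seconds."""
    if not sensor_ids:
        raise ValueError("no sensors to benchmark")
    # Take evenly spaced sensors rather than only the first one.
    step = max(1, len(sensor_ids) // sample_size)
    sample = list(sensor_ids[::step])[:sample_size]
    durations = []
    for sensor_id in sample:
        start = timer()
        load_one(sensor_id)
        durations.append(timer() - start)
    return statistics.median(durations)
```

The median keeps a single unusually light or heavy sensor from dominating the result, at the cost of running the benchmark load several times.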

3. Window stability across cache boundaries

When caches expire (every ~5 min), the window is recomputed. If system load fluctuates, the window may oscillate between values, causing start_ts to change and invalidating all timeline caches. Consider:

  • Hysteresis: only change window if the new value differs by >20%
  • Persisting the window in st.session_state with a longer lifetime
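A minimal sketch of the hysteresis rule follows. The helper name `stabilize_window` and the use of hours as the unit are assumptions; the previous value would typically come from wherever the window is persisted, such as `st.session_state`.

```python
from typing import Optional


def stabilize_window(
    previous_hours: Optional[float],
    candidate_hours: float,
    tolerance: float = 0.20,
) -> float:
    """Keep the previous window unless the candidate differs by more than tolerance."""
    if previous_hours is None or previous_hours <= 0:
        return candidate_hours  # nothing to compare against yet
    relative_change = abs(candidate_hours - previous_hours) / previous_hours
    if relative_change > tolerance:
        return candidate_hours
    return previous_hours
```

With this rule a recomputed window of 6.9 hours against a previous 6.0 hours (15% change) keeps the old value, so `start_ts` and the timeline caches keyed on it stay unchanged.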

4. Coverage and fire frequency accuracy at short windows

At shorter dynamic windows (e.g., 1–2 hours), coverage percentage may not be meaningful — an entity with a 6-hour update interval would show 0% coverage in a 1-hour window even though it's healthy. Consider minimum window thresholds per metric.
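One possible shape for a per-metric minimum window is sketched below: a metric is reported only when the window is long enough to contain a minimum number of expected updates. The metric keys, the thresholds in `MIN_EXPECTED_UPDATES`, and the function name are hypothetical.

```python
from typing import Optional

# Hypothetical thresholds: expected updates the window must span per metric.
MIN_EXPECTED_UPDATES = {"coverage": 2, "fire_frequency": 3}


def metric_value_or_none(
    metric: str,
    value: float,
    window_hours: float,
    update_interval_hours: float,
) -> Optional[float]:
    """Return value, or None when the window is too short for the metric to be meaningful."""
    expected_updates = window_hours / update_interval_hours
    if expected_updates < MIN_EXPECTED_UPDATES[metric]:
        return None  # caller can render "n/a" instead of a misleading 0%
    return value
```

For the example above, an entity with a 6-hour update interval in a 1-hour window expects about 0.17 updates, so its coverage would be suppressed rather than shown as 0%.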

Labels

investigation, performance
