Context
The overview page uses @st.cache_data(ttl=300) on timeline and config loaders, with _QUANT = 300 timestamp quantization. This means displayed health data can be up to 5 minutes stale.
Since v0.4.10, the time window is computed dynamically (benchmarked against an 8-second page-load target). The window computation is also cached with the same TTL.
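To make the quantization concrete, here is a minimal sketch. It assumes `_QUANT` is applied by flooring a Unix timestamp to a multiple of 300 seconds; the flooring behaviour and the helper name `quantize_ts` are assumptions, since the issue only states the constant.

```python
_QUANT = 300  # seconds; value taken from the issue text


def quantize_ts(ts: float, quant: int = _QUANT) -> int:
    """Floor a Unix timestamp to the nearest lower multiple of `quant`.

    Hypothetical helper: the real overview page may quantize differently.
    """
    return int(ts) // quant * quant


# Timestamps inside the same 300 s bucket map to the same value, so a
# cached loader keyed on the quantized timestamp is reused for them.
same_bucket = quantize_ts(1_700_000_100) == quantize_ts(1_700_000_399)

# One second later the bucket rolls over and the cache key changes.
next_bucket = quantize_ts(1_700_000_400)
```

Under this assumption, a request arriving just before a bucket boundary is served data keyed to a timestamp almost 300 seconds old, which matches the 5-minute staleness described above.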
Investigation items
1. Cache TTL appropriateness
The 300s TTL was chosen as a general-purpose trade-off. Evaluate whether it should be:
- Shorter for data that changes frequently (coverage, fire frequency shift as entities change state)
- Longer for data that rarely changes (sensor config, observation list)
- Different per cache (currently every cache except _get_sensor_ids, listed at 120s below, shares the 300s TTL)
| Cache function | Current TTL | Data | Staleness concern |
|---|---|---|---|
| _get_sensor_ids | 120s | Sensor list | Sensors added/removed |
| _get_config | 300s | Prior, threshold, obs list, source | User edits YAML or UI config |
| _load_sensor_timelines | 300s | Entity state history | New state changes arrive |
| _get_window | 300s | Dynamic window calculation | Benchmark may not reflect current load |
2. Benchmark sensor representativeness
The dynamic window benchmarks using the first valid sensor. This may not be representative:
- A sensor with few observations may benchmark optimistically
- A sensor with many template observations may benchmark pessimistically
- Consider benchmarking with the median or sampling multiple sensors
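The median-over-a-sample idea could look roughly like the sketch below. All names (`time_call`, `median_benchmark`, `sample_size`, `seed`) are hypothetical and not taken from the codebase; the measurement function is passed in so the sampling logic can be exercised without real sensors.

```python
import random
import statistics
import time
from typing import Callable, Sequence


def time_call(load: Callable[[str], object], sensor_id: str) -> float:
    """Return wall-clock seconds taken by one call to `load(sensor_id)`."""
    start = time.perf_counter()
    load(sensor_id)
    return time.perf_counter() - start


def median_benchmark(
    sensor_ids: Sequence[str],
    measure: Callable[[str], float],
    sample_size: int = 5,  # assumed default, not from the issue
    seed: int = 0,
) -> float:
    """Median per-sensor cost over a random sample of sensors.

    Using the median keeps one unusually cheap or expensive sensor
    from dominating the estimate.
    """
    if not sensor_ids:
        raise ValueError("no sensors to benchmark")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    k = min(sample_size, len(sensor_ids))
    sample = rng.sample(list(sensor_ids), k)
    return statistics.median(measure(s) for s in sample)
```

In the page itself, `measure` might be something like `lambda s: time_call(load_one_timeline, s)`, where `load_one_timeline` stands for whatever loader the benchmark currently times.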
3. Window stability across cache boundaries
When caches expire (every ~5 min), the window is recomputed. If system load fluctuates, the window may oscillate between values, causing start_ts to change and invalidating all timeline caches. Consider:
- Hysteresis: only change window if the new value differs by >20%
- Persisting the window in st.session_state with a longer lifetime
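A minimal sketch of the two mitigations combined, assuming the window is expressed in hours. The function names and the `"window_hours"` key are hypothetical, and a plain dict stands in for `st.session_state` so the example runs without Streamlit.

```python
from typing import MutableMapping, Optional


def stable_window(
    previous_hours: Optional[float],
    proposed_hours: float,
    threshold: float = 0.20,  # the >20% figure suggested above
) -> float:
    """Keep the previous window unless the proposal differs by more than `threshold`."""
    if previous_hours is None or previous_hours <= 0:
        return proposed_hours
    relative_change = abs(proposed_hours - previous_hours) / previous_hours
    return proposed_hours if relative_change > threshold else previous_hours


def remember_window(state: MutableMapping, proposed_hours: float) -> float:
    """Apply hysteresis against the value stored in `state` and store the result."""
    chosen = stable_window(state.get("window_hours"), proposed_hours)
    state["window_hours"] = chosen
    return chosen


session = {}  # stand-in for st.session_state
first = remember_window(session, 6.0)   # nothing stored yet: accept 6.0
second = remember_window(session, 6.5)  # ~8% change: keep 6.0
third = remember_window(session, 8.0)   # ~33% change: switch to 8.0
```

Because small fluctuations no longer change the window, `start_ts` stays put across cache expiries and the timeline caches are not invalidated unnecessarily.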
4. Coverage and fire frequency accuracy at short windows
At shorter dynamic windows (e.g., 1–2 hours), coverage percentage may not be meaningful — an entity with a 6-hour update interval would show 0% coverage in a 1-hour window even though it's healthy. Consider minimum window thresholds per metric.
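One possible form of a per-metric minimum threshold is sketched below: coverage is only reported when the window spans at least a few expected update intervals. The function name and the `min_intervals` default are assumptions chosen for illustration, not values from the project.

```python
def coverage_is_meaningful(
    window_hours: float,
    update_interval_hours: float,
    min_intervals: float = 2.0,  # assumed: require at least two expected updates
) -> bool:
    """Return True when the window is long enough to judge coverage fairly."""
    return window_hours >= min_intervals * update_interval_hours


# The example from the text: a 6-hour entity in a 1-hour window.
too_short = coverage_is_meaningful(window_hours=1, update_interval_hours=6)

# The same entity in a 12-hour window has room for two expected updates.
long_enough = coverage_is_meaningful(window_hours=12, update_interval_hours=6)
```

When the check fails, the page could show the metric as "not enough data" rather than 0%, which avoids flagging a healthy entity.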
Labels
investigation, performance