Commit 4dcf26e

Add 1h pre-aggregated data, improve dashboards, and speed up web queries (#27)
## Summary

### Changes

- **TimescaleDB**: Added `cml_data_1h` continuous aggregate (min/max/avg per sublink per hour) with an auto-refresh policy, reducing query load for historical data.
- **Grafana real-time dashboard**: Rewrote RSL/TSL panels to show a shaded min/max band with mean line. Automatically switches between raw data (≤3 days) and 1h aggregates (>3 days). Added an interval variable (Auto/Raw). Removed Altair dependency.
- **Grafana archive dashboard**: New `cml-archive` dashboard with two bar chart panels (active sublinks/hour and approximate data points/hour), embedded in the archive page via iframe.
- **Webserver performance**: Replaced full-table `COUNT(*)` with `approximate_row_count()`, replaced `MIN/MAX(time) FROM cml_data` with `MIN/MAX(bucket) FROM cml_data_1h`, and removed the per-CML `GROUP BY` stats query. Fully removed pandas and Altair from the webserver.

### Commit messages

* Add 1h pre-aggregated data and improve real-time dashboard
  - Add TimescaleDB continuous aggregate cml_data_1h (1h min/max/avg)
  - Remove Altair time-series from real-time route; keep for archive only
  - Grafana: show 1h min/max band always, avg/raw switching on zoom level
  - Grafana: per-sublink colors, interval selector (Auto/Raw)
* Avoid full table scans on cml_data for stats queries
  - Replace COUNT(*) with approximate_row_count() for record totals
  - Replace MIN/MAX(time) FROM cml_data with MIN/MAX(bucket) FROM cml_data_1h
  - Replace per-CML COUNT GROUP BY with cml_data_1h aggregate estimate
* Replace Altair archive chart with embedded Grafana dashboard
  - Remove generate_archive_charts() and pandas/altair imports
  - Add cml-archive Grafana dashboard (active sublinks + data points per hour)
  - Simplify archive page: remove header, metric cards and top-CML table
  - Add compact summary bar with record count, CML count, date range
* Add database migration guide for `cml_data_1h` continuous aggregate
* Remove unused dependencies from requirements and tests
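
The raw/aggregate switching described for the real-time dashboard can be sketched in a few lines. This is only an illustration of the rule stated above (raw data up to 3 days, `cml_data_1h` beyond that); the real logic lives in the Grafana panel queries, which are not part of this excerpt. The function name `choose_source` and the assumption that the `Raw` interval setting always forces the raw table are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RAW_TABLE = "cml_data"
HOURLY_VIEW = "cml_data_1h"
RAW_MAX_RANGE = timedelta(days=3)  # threshold taken from the summary above


def choose_source(time_from: datetime, time_to: datetime, interval: str = "auto") -> str:
    """Return the relation a panel query would read from (hypothetical helper)."""
    if time_to < time_from:
        raise ValueError("time_to must not be earlier than time_from")
    if interval == "raw":
        # Assumption: the "Raw" setting bypasses the aggregate entirely.
        return RAW_TABLE
    if interval != "auto":
        raise ValueError(f"unknown interval: {interval!r}")
    # "Auto": raw data for ranges of up to 3 days, hourly aggregate otherwise.
    if time_to - time_from <= RAW_MAX_RANGE:
        return RAW_TABLE
    return HOURLY_VIEW
```

With `interval="auto"`, a one-day range would read from `cml_data`, while a 30-day range would read from `cml_data_1h`.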
1 parent d136099 commit 4dcf26e

9 files changed

Lines changed: 841 additions & 357 deletions

database/MIGRATION.md

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+# Database Migration Guide
+
+## `cml_data_1h` continuous aggregate
+
+**Branch:** `feature/performance-and-grafana-improvements`
+
+`init.sql` only runs on a fresh database volume, so when deploying this branch
+to a machine that already has data you must apply the migration manually.
+
+### Steps
+
+**1. Pull and redeploy the application**
+
+```bash
+git pull origin main
+docker compose up -d --build
+```
+
+**2. Create the continuous aggregate**
+
+```bash
+docker compose exec database psql -U myuser -d mydatabase -c "
+CREATE MATERIALIZED VIEW cml_data_1h
+WITH (timescaledb.continuous) AS
+SELECT
+    time_bucket('1 hour', time) AS bucket,
+    cml_id,
+    sublink_id,
+    MIN(rsl) AS rsl_min,
+    MAX(rsl) AS rsl_max,
+    AVG(rsl) AS rsl_avg,
+    MIN(tsl) AS tsl_min,
+    MAX(tsl) AS tsl_max,
+    AVG(tsl) AS tsl_avg
+FROM cml_data
+GROUP BY bucket, cml_id, sublink_id
+WITH NO DATA;
+
+SELECT add_continuous_aggregate_policy('cml_data_1h',
+    start_offset => INTERVAL '2 days',
+    end_offset => INTERVAL '1 hour',
+    schedule_interval => INTERVAL '1 hour'
+);
+"
+```
+
+**3. Backfill historical data (one-time)**
+
+```bash
+docker compose exec database psql -U myuser -d mydatabase -c "
+CALL refresh_continuous_aggregate('cml_data_1h', NULL, NULL);
+"
+```
+
+This may take a few seconds depending on how much data is present. After it
+completes the refresh policy keeps the view up to date automatically.
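
A short sketch of why step 3 is needed on a machine that already holds data: the policy from step 2 only refreshes a sliding window between `start_offset` and `end_offset`, so anything older than 2 days would never be materialised without the one-time `refresh_continuous_aggregate(..., NULL, NULL)` call. The helper names below are hypothetical, and the window arithmetic is simplified (TimescaleDB additionally aligns the window to bucket boundaries).

```python
from datetime import datetime, timedelta, timezone

# Offsets copied from the add_continuous_aggregate_policy call in step 2.
START_OFFSET = timedelta(days=2)
END_OFFSET = timedelta(hours=1)


def policy_window(now: datetime) -> tuple[datetime, datetime]:
    """Approximate time range a scheduled policy run would refresh."""
    return now - START_OFFSET, now - END_OFFSET


def needs_manual_backfill(oldest_row: datetime, now: datetime) -> bool:
    """True if some raw data is older than the policy window start."""
    window_start, _ = policy_window(now)
    return oldest_row < window_start
```

For example, with raw data going back a month, `needs_manual_backfill` returns `True`, which corresponds to the situation this migration guide addresses.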

database/init.sql

Lines changed: 32 additions & 1 deletion
@@ -81,4 +81,35 @@ SELECT create_hypertable('cml_data', 'time');
 
 -- Index is created by the archive_loader service after bulk data load (faster COPY).
 -- If no archive data is loaded, create it manually:
--- CREATE INDEX idx_cml_data_cml_id ON cml_data (cml_id, time DESC);
+-- CREATE INDEX idx_cml_data_cml_id ON cml_data (cml_id, time DESC);
+
+-- ---------------------------------------------------------------------------
+-- 1-hour continuous aggregate for fast queries over large time ranges.
+-- Grafana and the webserver automatically switch to this view when the
+-- requested time range exceeds 3 days, reducing the scanned row count
+-- by ~360x (10-second raw data → 1-hour buckets).
+-- ---------------------------------------------------------------------------
+CREATE MATERIALIZED VIEW cml_data_1h
+WITH (timescaledb.continuous) AS
+SELECT
+    time_bucket('1 hour', time) AS bucket,
+    cml_id,
+    sublink_id,
+    MIN(rsl) AS rsl_min,
+    MAX(rsl) AS rsl_max,
+    AVG(rsl) AS rsl_avg,
+    MIN(tsl) AS tsl_min,
+    MAX(tsl) AS tsl_max,
+    AVG(tsl) AS tsl_avg
+FROM cml_data
+GROUP BY bucket, cml_id, sublink_id
+WITH NO DATA;
+
+-- Automatically refresh every hour, covering up to 2 days of history.
+-- The 1-hour end_offset prevents partial (in-progress) buckets from being
+-- materialised prematurely; very recent data reads through to raw cml_data.
+SELECT add_continuous_aggregate_policy('cml_data_1h',
+    start_offset => INTERVAL '2 days',
+    end_offset => INTERVAL '1 hour',
+    schedule_interval => INTERVAL '1 hour'
+);
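
To make the view definition concrete, here is a minimal in-memory sketch of what `cml_data_1h` computes: rows are grouped by hour bucket, `cml_id`, and `sublink_id`, and min/max/avg are taken for `rsl` and `tsl`. The function names are hypothetical, and truncating to the hour only approximates `time_bucket('1 hour', time)` (it matches for UTC timestamps); like the SQL aggregates, `None` values are skipped.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def _stats(values):
    """Return (min, max, avg), ignoring None like SQL aggregates do."""
    present = [v for v in values if v is not None]
    if not present:
        return None, None, None
    return min(present), max(present), sum(present) / len(present)


def aggregate_1h(rows):
    """rows: iterable of (time, cml_id, sublink_id, rsl, tsl) tuples."""
    groups = defaultdict(list)
    for ts, cml_id, sublink_id, rsl, tsl in rows:
        groups[(hour_bucket(ts), cml_id, sublink_id)].append((rsl, tsl))

    result = {}
    for key, samples in groups.items():
        rsl_min, rsl_max, rsl_avg = _stats([s[0] for s in samples])
        tsl_min, tsl_max, tsl_avg = _stats([s[1] for s in samples])
        result[key] = {
            "rsl_min": rsl_min, "rsl_max": rsl_max, "rsl_avg": rsl_avg,
            "tsl_min": tsl_min, "tsl_max": tsl_max, "tsl_avg": tsl_avg,
        }
    return result
```

Each key of the returned dictionary corresponds to one row of the continuous aggregate.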

database/init_archive_data.sh

Lines changed: 9 additions & 0 deletions
@@ -84,3 +84,12 @@ EOSQL
 
 echo "Archive data successfully loaded!"
 # Note: cml_stats is populated by the parser's background stats thread on startup.
+
+# Refresh the 1-hour continuous aggregate so that Grafana and the webserver can
+# immediately serve pre-aggregated data for large time ranges without scanning
+# the full raw cml_data table.
+echo "Refreshing 1h continuous aggregate (cml_data_1h)..."
+psql $PSQL_FLAGS <<-EOSQL
+CALL refresh_continuous_aggregate('cml_data_1h', NULL, NULL);
+EOSQL
+echo "Continuous aggregate refresh complete."
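
Once the aggregate has been refreshed, the summary queries named in the commit message can be served without touching the raw table. The sketch below shows one way the webserver side might look; `fetch_archive_summary` is a hypothetical name, and the cursor is assumed to be any DB-API style cursor (for example from psycopg2). The webserver code itself is not part of this excerpt, so only the two SQL statements are taken from the commit description.

```python
from typing import Any, Dict

# Estimated row count from TimescaleDB instead of a full-table COUNT(*).
ROW_COUNT_SQL = "SELECT approximate_row_count('cml_data')"
# Date range read from the hourly aggregate instead of the raw hypertable.
DATE_RANGE_SQL = "SELECT MIN(bucket), MAX(bucket) FROM cml_data_1h"


def fetch_archive_summary(cursor: Any) -> Dict[str, Any]:
    """Collect the record count and date range for the archive summary bar."""
    cursor.execute(ROW_COUNT_SQL)
    (approx_rows,) = cursor.fetchone()

    cursor.execute(DATE_RANGE_SQL)
    first_bucket, last_bucket = cursor.fetchone()

    return {
        "approx_rows": approx_rows,
        "first_bucket": first_bucket,
        # Start of the last hourly bucket, so up to one hour before the
        # newest raw sample.
        "last_bucket": last_bucket,
    }
```

Both values are approximations by design: the row count is an estimate, and the date range has hourly resolution.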
Lines changed: 196 additions & 0 deletions
@@ -0,0 +1,196 @@
+{
+  "id": null,
+  "uid": "cml-archive",
+  "title": "CML Archive",
+  "tags": [],
+  "timezone": "browser",
+  "schemaVersion": 36,
+  "version": 1,
+  "refresh": "",
+  "time": {
+    "from": "now-1M",
+    "to": "now"
+  },
+  "panels": [
+    {
+      "id": 1,
+      "title": "Active Sublinks per Hour",
+      "type": "timeseries",
+      "datasource": {
+        "type": "grafana-postgresql-datasource",
+        "uid": "PostgreSQL"
+      },
+      "gridPos": {
+        "h": 9,
+        "w": 24,
+        "x": 0,
+        "y": 0
+      },
+      "targets": [
+        {
+          "datasource": {
+            "type": "grafana-postgresql-datasource",
+            "uid": "PostgreSQL"
+          },
+          "format": "time_series",
+          "rawQuery": true,
+          "rawSql": "SELECT\n bucket AS \"time\",\n 'sublinks' AS metric,\n COUNT(*) AS value\nFROM cml_data_1h\nWHERE bucket >= $__timeFrom()::timestamptz\n AND bucket <= $__timeTo()::timestamptz\nGROUP BY bucket\nORDER BY 1 ASC",
+          "refId": "A"
+        }
+      ],
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "fixedColor": "blue",
+            "mode": "fixed"
+          },
+          "custom": {
+            "drawStyle": "bars",
+            "barAlignment": 0,
+            "lineWidth": 1,
+            "fillOpacity": 60,
+            "gradientMode": "none",
+            "spanNulls": false,
+            "showPoints": "never",
+            "stacking": {
+              "mode": "none",
+              "group": "A"
+            },
+            "axisCenteredZero": false,
+            "axisColorMode": "text",
+            "axisLabel": "Sublinks",
+            "axisPlacement": "auto",
+            "scaleDistribution": {
+              "type": "linear"
+            },
+            "hideFrom": {
+              "tooltip": false,
+              "viz": false,
+              "legend": false
+            },
+            "thresholdsStyle": {
+              "mode": "off"
+            }
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "green",
+                "value": null
+              }
+            ]
+          },
+          "unit": "short",
+          "decimals": 0,
+          "displayName": "Active sublinks"
+        },
+        "overrides": []
+      },
+      "options": {
+        "tooltip": {
+          "mode": "single",
+          "sort": "none"
+        },
+        "legend": {
+          "displayMode": "hidden",
+          "placement": "bottom"
+        }
+      }
+    },
+    {
+      "id": 2,
+      "title": "Approximate Data Points per Hour",
+      "type": "timeseries",
+      "datasource": {
+        "type": "grafana-postgresql-datasource",
+        "uid": "PostgreSQL"
+      },
+      "gridPos": {
+        "h": 9,
+        "w": 24,
+        "x": 0,
+        "y": 9
+      },
+      "targets": [
+        {
+          "datasource": {
+            "type": "grafana-postgresql-datasource",
+            "uid": "PostgreSQL"
+          },
+          "format": "time_series",
+          "rawQuery": true,
+          "rawSql": "SELECT\n bucket AS \"time\",\n 'data points' AS metric,\n COUNT(*) * 360 AS value\nFROM cml_data_1h\nWHERE bucket >= $__timeFrom()::timestamptz\n AND bucket <= $__timeTo()::timestamptz\nGROUP BY bucket\nORDER BY 1 ASC",
+          "refId": "A"
+        }
+      ],
+      "fieldConfig": {
+        "defaults": {
+          "color": {
+            "fixedColor": "semi-dark-green",
+            "mode": "fixed"
+          },
+          "custom": {
+            "drawStyle": "bars",
+            "barAlignment": 0,
+            "lineWidth": 1,
+            "fillOpacity": 60,
+            "gradientMode": "none",
+            "spanNulls": false,
+            "showPoints": "never",
+            "stacking": {
+              "mode": "none",
+              "group": "A"
+            },
+            "axisCenteredZero": false,
+            "axisColorMode": "text",
+            "axisLabel": "Data points",
+            "axisPlacement": "auto",
+            "scaleDistribution": {
+              "type": "linear"
+            },
+            "hideFrom": {
+              "tooltip": false,
+              "viz": false,
+              "legend": false
+            },
+            "thresholdsStyle": {
+              "mode": "off"
+            }
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {
+                "color": "green",
+                "value": null
+              }
+            ]
+          },
+          "unit": "short",
+          "decimals": 0,
+          "displayName": "Approx. data points"
+        },
+        "overrides": []
+      },
+      "options": {
+        "tooltip": {
+          "mode": "single",
+          "sort": "none"
+        },
+        "legend": {
+          "displayMode": "hidden",
+          "placement": "bottom"
+        }
+      }
+    }
+  ],
+  "templating": {
+    "list": []
+  },
+  "annotations": {
+    "list": []
+  }
+}
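
The two panel queries above differ only in a constant factor, which the following sketch makes explicit. Each row of `cml_data_1h` stands for one (bucket, `cml_id`, `sublink_id`) combination, so counting rows per bucket gives the number of active sublinks, and multiplying by 360 assumes every active sublink delivered a sample every 10 seconds for the full hour. The function name `panel_series` is hypothetical.

```python
from collections import Counter

# 3600 s per hour / 10 s sampling interval, as assumed by the panel query.
SAMPLES_PER_HOUR = 3600 // 10


def panel_series(aggregate_rows):
    """aggregate_rows: iterable of (bucket, cml_id, sublink_id) tuples.

    Returns a list of (bucket, active_sublinks, approx_data_points),
    ordered by bucket like the ORDER BY in the dashboard queries.
    """
    counts = Counter(bucket for bucket, _cml_id, _sublink_id in aggregate_rows)
    return [
        (bucket, n, n * SAMPLES_PER_HOUR)
        for bucket, n in sorted(counts.items())
    ]
```

Because the factor is fixed, the second panel overstates the true count for any hour in which a sublink reported only part of the time, which is why it is labelled approximate.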
