database/MIGRATION.md (122 additions, 0 deletions)
@@ -2,6 +2,128 @@

---

## PR `feat/db-roles-rls` — Create database roles and enable Row-Level Security

**Branch:** `feat/db-roles-rls`

`init.sql` only runs on a fresh database volume, so when deploying this branch
to a machine that already has data you must apply the migration files below
**after** migrations 001–003 from `feat/db-add-user-id` have been applied.

### Changes

| File | What it does |
|------|-------------|
| `migrations/004_add_roles_rls.sql` | Creates `user1` and `webserver_role` login roles; grants table/function permissions; enables RLS on `cml_metadata` and `cml_stats`; creates `current_user`-based policies; creates `cml_data_secure` and `cml_data_1h_secure` security-barrier views |
| `migrations/005_drop_sublink_from_segmentby.sql` | Removes `sublink_id` from `compress_segmentby`; new setting is `'user_id, cml_id'`; reduces average decompression work per CML query by ~2–4× |
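
One way to confirm the `compress_segmentby` setting after migration 005 is to query TimescaleDB's informational views. This is a sketch, not part of the migration: it assumes the installed TimescaleDB 2.x release still ships the `timescaledb_information.compression_settings` view (newer releases may expose the same information under a different view name).

```sql
-- Assumption: timescaledb_information.compression_settings exists in the
-- installed TimescaleDB version. Run as myuser.
SELECT attname, segmentby_column_index, orderby_column_index
FROM timescaledb_information.compression_settings
WHERE hypertable_name = 'cml_data'
ORDER BY segmentby_column_index NULLS LAST, orderby_column_index;
```

If the migration applied, `user_id` and `cml_id` should be the only columns with a non-null `segmentby_column_index`.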

### Backward compatibility

These migrations are **fully backward-compatible** with the existing services:

- `myuser` (PostgreSQL superuser) bypasses RLS by default. The parser and
webserver still connect as `myuser` and see all data unchanged until
PR3 (`feat/parser-user-id`) and PR5 (`feat/webserver-auth`) wire up the
new role credentials.
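- No table schema changes: only roles, grants, policies, and views are added,
  plus the `compress_segmentby` setting change in migration 005.
- Rollback is possible: drop the views and policies, disable RLS, revoke
  grants, then drop the roles (see **Rollback** below). The rollback script
  does not revert the `compress_segmentby` change from migration 005.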
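
The first point relies on standard PostgreSQL behaviour: superusers and roles with the `BYPASSRLS` attribute are not subject to row-level policies. A minimal sketch for checking which of the roles named in this document fall into that category (run as `myuser`):

```sql
-- rolsuper = t or rolbypassrls = t means the role bypasses RLS policies.
SELECT rolname, rolsuper, rolbypassrls
FROM pg_roles
WHERE rolname IN ('myuser', 'user1', 'webserver_role');
```

After migration 004, `myuser` is expected to show `rolsuper = t`, while the two new roles should be subject to the policies.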

### Note on `cml_data` isolation

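TimescaleDB does not allow RLS on a compressed hypertable, and compression
cannot be set on an RLS-enabled table: the two features are mutually exclusive.
`cml_data` keeps compression; per-user isolation is provided by the
`cml_data_secure` and `cml_data_1h_secure` security-barrier views.
The views isolate only queries that go through them: `user1` also holds direct
`SELECT, INSERT, UPDATE` on `cml_data` (see `init.sql`), so clients must read
via `cml_data_secure` for the filter to apply.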
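
To check that both views were created with the `security_barrier` option, the view options can be read from the `pg_class` catalog. A small sketch, assuming the views live in a schema on the current `search_path`:

```sql
-- relkind 'v' marks a plain view; reloptions should list security_barrier.
SELECT relname, relkind, reloptions
FROM pg_class
WHERE relname IN ('cml_data_secure', 'cml_data_1h_secure');
```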

### Note on `cml_data_1h` (continuous aggregate)

PostgreSQL RLS cannot be applied to materialized views, so `cml_data_1h` itself
cannot carry row-level policies. The same security-barrier view approach used for
`cml_data` is applied here: `cml_data_1h_secure` is a `security_barrier`
view that filters `WHERE user_id = current_user`, providing the same automatic
per-user isolation. `user1` is granted access to `cml_data_1h_secure` only.
`webserver_role` is granted `SELECT` on both the view and the underlying
`cml_data_1h` aggregate; its direct queries on `cml_data_1h` (admin/cross-user
aggregates) are not filtered and still need an application-level `user_id` filter.
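
The scoped-read path for the webserver can be sketched as follows. This illustrates the `SET ROLE` pattern described in `init.sql`, not the webserver's actual code; it assumes the session is connected as `webserver_role`, which is a member of `user1`.

```sql
BEGIN;
SET LOCAL ROLE user1;                      -- current_user becomes 'user1'
SELECT current_user;                       -- sanity check of the active role
SELECT count(*) FROM cml_data_1h_secure;   -- filtered by user_id = current_user
COMMIT;                                    -- SET LOCAL reverts at transaction end
```

Using `SET LOCAL` inside a transaction keeps the role change from leaking into later queries on a pooled connection.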

### Steps

**1. Back up the database**

```bash
# -T disables TTY allocation so the redirected dump is not altered by the terminal
docker compose exec -T database pg_dump -U myuser -d mydatabase \
  > backup_pre_roles_rls_$(date +%Y%m%d_%H%M%S).sql
```

**2. Pull and rebuild**

```bash
git pull origin feat/db-roles-rls # or merge to main first
docker compose up -d --build
```

**3. Apply the migrations in order**

```bash
docker compose exec -T database psql -U myuser -d mydatabase \
< database/migrations/004_add_roles_rls.sql

docker compose exec -T database psql -U myuser -d mydatabase \
< database/migrations/005_drop_sublink_from_segmentby.sql
```

**4. Verify**

```bash
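# List the new roles (\du accepts only one pattern, so query pg_roles instead)
docker compose exec database psql -U myuser -d mydatabase \
  -c "SELECT rolname, rolcanlogin FROM pg_roles WHERE rolname IN ('user1','webserver_role');"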

# Confirm RLS is enabled on cml_metadata and cml_stats
# (cml_data intentionally shows f — compression and RLS are mutually exclusive)
docker compose exec database psql -U myuser -d mydatabase \
-c "SELECT relname, relrowsecurity FROM pg_class \
WHERE relname IN ('cml_data','cml_metadata','cml_stats');"

# Smoke-test: user1 should see only their own rows via the secure view
docker compose exec database psql \
-U user1 -d mydatabase \
-c "SELECT count(*) FROM cml_data_secure;"
```
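
The checks above confirm that RLS is switched on, but not which policies exist. A sketch of an additional check against the standard `pg_policies` view (run as `myuser`); the expected policy names are taken from the rollback script and `init.sql`, on the assumption that migration 004 creates the same set:

```sql
-- Expect user_* policies (cmd = ALL) and webserver_* policies (cmd = SELECT)
-- on both tables.
SELECT tablename, policyname, cmd, roles, qual, with_check
FROM pg_policies
WHERE tablename IN ('cml_metadata', 'cml_stats')
ORDER BY tablename, policyname;
```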

**Rollback:**

```bash
docker compose exec database psql -U myuser -d mydatabase -c "
-- Drop security-barrier views
DROP VIEW IF EXISTS cml_data_secure;
DROP VIEW IF EXISTS cml_data_1h_secure;

-- Drop policies (only cml_metadata and cml_stats have RLS)
DROP POLICY IF EXISTS user_cml_metadata_policy ON cml_metadata;
DROP POLICY IF EXISTS user_cml_stats_policy ON cml_stats;
DROP POLICY IF EXISTS webserver_cml_metadata_policy ON cml_metadata;
DROP POLICY IF EXISTS webserver_cml_stats_policy ON cml_stats;

-- Disable RLS (cml_data was never RLS-enabled)
ALTER TABLE cml_metadata DISABLE ROW LEVEL SECURITY;
ALTER TABLE cml_stats DISABLE ROW LEVEL SECURITY;

-- Revoke grants
REVOKE ALL ON cml_data, cml_metadata, cml_stats, cml_data_1h
FROM user1, webserver_role;
REVOKE EXECUTE ON FUNCTION update_cml_stats(TEXT, TEXT)
FROM user1;
REVOKE user1 FROM webserver_role;
REVOKE USAGE ON SCHEMA public FROM user1, webserver_role;

-- Drop roles
DROP ROLE IF EXISTS user1;
DROP ROLE IF EXISTS webserver_role;
"
```
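
After a rollback, a few catalog queries can confirm that nothing was left behind. This is a sketch to run as `myuser`, not part of the rollback script itself:

```sql
-- Should return no rows once both roles are dropped.
SELECT rolname FROM pg_roles
WHERE rolname IN ('user1', 'webserver_role');

-- Should return no rows once all four policies are dropped.
SELECT tablename, policyname FROM pg_policies
WHERE tablename IN ('cml_metadata', 'cml_stats');

-- relrowsecurity should be f for both tables.
SELECT relname, relrowsecurity FROM pg_class
WHERE relname IN ('cml_metadata', 'cml_stats');
```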

---

## PR `feat/db-add-user-id` — Add `user_id` for multi-user RLS support

**Branch:** `feat/db-add-user-id`
database/init.sql (111 additions, 5 deletions)
@@ -147,9 +147,13 @@ SELECT add_continuous_aggregate_policy('cml_data_1h',
-- ---------------------------------------------------------------------------
-- Compression for cml_data chunks older than 7 days.
--
-- compress_segmentby: one compressed segment per (user_id, cml_id).
-- user_id is the leading key so a per-user query skips all other users'
-- segments entirely. sublink_id is intentionally omitted: ~80% of CMLs
-- have 2 sublinks and ~15% have 4; keeping sublinks together in one
-- segment roughly halves decompression work per CML query vs. splitting
-- by sublink. Filtering to a specific sublink after decompression is a
-- trivial CPU operation on already-decompressed columnar data.
-- compress_orderby: matches the query pattern (time range scans), allowing
-- skip-scan decompression for narrow time windows within a segment.
--
@@ -158,10 +162,112 @@ SELECT add_continuous_aggregate_policy('cml_data_1h',
-- The current uncompressed week chunk is left untouched so real-time ingestion
-- and detail-view queries on recent data have no decompression overhead.
-- ---------------------------------------------------------------------------
-- Note: TimescaleDB does not allow ENABLE ROW LEVEL SECURITY on a compressed
-- hypertable, and compression cannot be set on an RLS-enabled table. These
-- two features are mutually exclusive on the same hypertable. Per-user
-- isolation for cml_data is provided by the cml_data_secure view below.
ALTER TABLE cml_data SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'user_id, cml_id',
    timescaledb.compress_orderby = 'time DESC'
);

SELECT add_compression_policy('cml_data', INTERVAL '7 days');

-- ---------------------------------------------------------------------------
-- Database roles and Row-Level Security (PR feat/db-roles-rls)
--
-- Role naming convention: PG login role name = user_id value in the data.
-- "user1" role ↔ user_id = 'user1' — enables current_user-based RLS
-- policies and the cml_data_1h_secure security-barrier view below.
--
-- user1: used by the user1 parser instance (writes) and by the webserver
-- (via SET ROLE) for DB-enforced scoped reads.
-- webserver_role: used by the webserver process. Has a read-all RLS policy
-- for admin/aggregate queries; SET ROLEs to a user role for scoped reads.
--
-- Passwords shown here are development defaults.
-- Override them via environment variables or a secrets manager in production.
-- ---------------------------------------------------------------------------

CREATE ROLE user1 LOGIN PASSWORD 'user1password';
CREATE ROLE webserver_role LOGIN PASSWORD 'webserverpassword';

-- Allow webserver_role to impersonate user roles (SET ROLE user1).
GRANT user1 TO webserver_role;

-- Schema access.
GRANT USAGE ON SCHEMA public TO user1, webserver_role;

-- Table permissions.
GRANT SELECT, INSERT, UPDATE ON cml_data TO user1;
GRANT SELECT, INSERT, UPDATE ON cml_metadata TO user1;
GRANT SELECT, INSERT, UPDATE ON cml_stats TO user1;

GRANT SELECT ON cml_data TO webserver_role;
GRANT SELECT ON cml_metadata TO webserver_role;
GRANT SELECT ON cml_stats TO webserver_role;

-- Parser calls update_cml_stats() to upsert per-CML statistics.
GRANT EXECUTE ON FUNCTION update_cml_stats(TEXT, TEXT) TO user1;

-- Row-Level Security on cml_metadata and cml_stats.
-- cml_data is excluded: TimescaleDB does not allow RLS on compressed
-- hypertables (and compression cannot be set on an RLS-enabled table).
-- Per-user isolation for raw cml_data queries is provided by the
-- cml_data_secure security-barrier view defined below.
ALTER TABLE cml_metadata ENABLE ROW LEVEL SECURITY;
ALTER TABLE cml_stats ENABLE ROW LEVEL SECURITY;

-- Generic current_user policies for cml_metadata and cml_stats.
-- Because role name = user_id value, one policy per table covers all users.
CREATE POLICY user_cml_metadata_policy ON cml_metadata
FOR ALL
USING (user_id = current_user)
WITH CHECK (user_id = current_user);

CREATE POLICY user_cml_stats_policy ON cml_stats
FOR ALL
USING (user_id = current_user)
WITH CHECK (user_id = current_user);

-- Permissive read-all policies for webserver_role (admin / cross-user use).
CREATE POLICY webserver_cml_metadata_policy ON cml_metadata
FOR SELECT TO webserver_role
USING (true);

CREATE POLICY webserver_cml_stats_policy ON cml_stats
FOR SELECT TO webserver_role
USING (true);

-- Security-barrier view over cml_data (compressed hypertable).
-- Provides per-user isolation for raw cml_data queries via
-- WHERE user_id = current_user. The security_barrier option prevents the
-- planner from evaluating caller-supplied predicates or functions before
-- the view's own filter, so a leaky function in a caller's WHERE clause
-- cannot observe other users' rows (this is not SQL injection protection).
-- WITH CHECK OPTION rejects writes through this view where
-- user_id != current_user.
CREATE VIEW cml_data_secure WITH (security_barrier) AS
    SELECT * FROM cml_data
    WHERE user_id = current_user
    WITH CHECK OPTION;

GRANT SELECT ON cml_data_secure TO user1;
GRANT SELECT ON cml_data_secure TO webserver_role;

-- Security-barrier view over cml_data_1h (continuous aggregate).
--
-- PostgreSQL cannot apply RLS to materialized views. This view wraps
-- cml_data_1h with WHERE user_id = current_user and security_barrier,
-- providing DB-enforced per-user filtering with no application WHERE clause.
--
-- User roles query cml_data_1h_secure (auto-filtered).
-- webserver_role queries cml_data_1h_secure after SET ROLE for user pages;
-- queries cml_data_1h directly (as webserver_role) for admin/cross-user
-- aggregates — those paths still need WHERE user_id = ? in the app.
CREATE VIEW cml_data_1h_secure WITH (security_barrier) AS
SELECT * FROM cml_data_1h
WHERE user_id = current_user;

GRANT SELECT ON cml_data_1h_secure TO user1;
GRANT SELECT ON cml_data_1h TO webserver_role;
GRANT SELECT ON cml_data_1h_secure TO webserver_role;
database/migrations/002_update_compression_segmentby.sql (4 additions, 2 deletions)
@@ -3,8 +3,10 @@
-- Part of PR feat/db-add-user-id.
-- Run this AFTER 001_add_user_id.sql.
--
-- Adds user_id as the leading segmentby key so that per-user range scans
-- decompress only the relevant segments instead of the full chunk.
-- sublink_id is included here alongside cml_id; it was later dropped in
-- migration 005 (feat/db-roles-rls) — see that file for the rationale.
-- The decompress → alter → recompress cycle is non-destructive; no data
-- is lost if the process is interrupted (TimescaleDB keeps the original
-- uncompressed chunks until recompression succeeds).