Commit a07b0f6
feat: queries benchmarking and optimization (#68) (#534)
* feat(db): add hot-path indexes and query benchmark (#68)
* docs: added changeset (#68)
* chore: addressed review comments (#68)
* fix: bug fixes in benchmark-queries (#68)
1 parent b3effd1 commit a07b0f6

9 files changed

Lines changed: 835 additions & 0 deletions

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
---
"nostream": minor
---

Add hot-path PostgreSQL indexes for subscription, vanish, retention, and invoice queries; add `db:benchmark` and `db:verify-index-impact` tooling; document index rationale and benchmarking. Closes #68.

CONFIGURATION.md

Lines changed: 30 additions & 0 deletions
@@ -78,6 +78,36 @@ The i2pd web console (tunnel status, `.b32.i2p` destinations) is published to th
If you've set READ_REPLICAS to 4, you should configure RR0_ through RR3_.

## Database indexes and benchmarking

The schema ships with a small, query-driven set of indexes. The most important ones for relay hot paths are:

| Index | Covers |
|-------|--------|
| `events_active_pubkey_kind_created_at_idx` | `REQ` with `authors`+`kinds` ordered by `created_at DESC, event_id ASC`; `hasActiveRequestToVanish`; by-pubkey deletes. Composite key `(event_pubkey, event_kind, event_created_at DESC, event_id)` so the ORDER BY tie-breaker is satisfied from the index without a sort step. |
| `events_deleted_at_partial_idx` | Retention purge over soft-deleted rows. Partial on `deleted_at IS NOT NULL`. |
| `invoices_pending_created_at_idx` | `findPendingInvoices` poll (`ORDER BY created_at ASC`). Partial on `status = 'pending'`. |
| `event_tags (tag_name, tag_value)` | NIP-01 generic tag filters (`#e`, `#p`, …) via the normalized `event_tags` table. |
| `events_event_created_at_index` | Time-range scans (`since` / `until`). |
| `events_event_kind_index` | Kind-only filters and purge kind-whitelist logic. |
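To make the first table row concrete, the query shape that `events_active_pubkey_kind_created_at_idx` is built for can be sketched as below. This is illustrative TypeScript, not the relay's actual repository code; `buildReqQuery` is a hypothetical helper:

```typescript
// Sketch of the hot-path REQ query shape served by
// events_active_pubkey_kind_created_at_idx. The WHERE and ORDER BY mirror
// the description above; this is illustrative, not the relay's real code.
function buildReqQuery(kinds: number[], limit: number): string {
  const placeholders = kinds.map(() => '?').join(', ')
  return [
    'SELECT * FROM events',
    'WHERE event_pubkey = ?',
    `AND event_kind IN (${placeholders})`,
    // DESC on created_at plus ASC on event_id matches the composite key
    // (event_pubkey, event_kind, event_created_at DESC, event_id), so the
    // planner can satisfy LIMIT directly from the index without a sort.
    'ORDER BY event_created_at DESC, event_id ASC',
    `LIMIT ${limit}`,
  ].join('\n')
}
```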
Run the read-only benchmark against your own database to confirm the planner is using the expected indexes and to record baseline latencies:

```sh
npm run db:benchmark
npm run db:benchmark -- --runs 5 --kind 1 --limit 500
```
The `db:benchmark` script loads the local `.env` file automatically (via `node --env-file-if-exists=.env`), using the same `DB_HOST`/`DB_PORT`/`DB_USER`/`DB_PASSWORD`/`DB_NAME` variables as the relay. The benchmark issues only `EXPLAIN (ANALYZE, BUFFERS)` and `SELECT` statements — it never writes. Flags: `--runs <n>` (default 3), `--kind <n>` (default 1 / `TEXT_NOTE`; pass `0` for `SET_METADATA`), `--limit <n>` (default 500), `--horizon-days <n>` (default 7), `--help`.
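The documented defaults can be summarized as a small flag-parser sketch; `parseBenchmarkFlags` is a hypothetical helper for illustration, not the script's actual implementation:

```typescript
// Illustrative parser for the documented db:benchmark flags. Only the flag
// names and default values are taken from the documentation; the real
// script's internals may differ.
interface BenchmarkFlags {
  runs: number
  kind: number
  limit: number
  horizonDays: number
}

function parseBenchmarkFlags(argv: string[]): BenchmarkFlags {
  // Defaults as documented: 3 runs, kind 1 (TEXT_NOTE), limit 500, 7 days.
  const flags: BenchmarkFlags = { runs: 3, kind: 1, limit: 500, horizonDays: 7 }
  for (let i = 0; i < argv.length; i += 2) {
    const value = Number(argv[i + 1])
    switch (argv[i]) {
      case '--runs': flags.runs = value; break
      case '--kind': flags.kind = value; break
      case '--limit': flags.limit = value; break
      case '--horizon-days': flags.horizonDays = value; break
    }
  }
  return flags
}
```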
For a full before/after proof of the index impact (seeds a throwaway dataset, drops and recreates the indexes, and prints a BEFORE/AFTER table), use:

```sh
npm run db:verify-index-impact
```

The hot-path index migration (`20260420_120000_add_hot_path_indexes.js`) uses `CREATE INDEX CONCURRENTLY`, so it can be applied to a running relay without taking `ACCESS EXCLUSIVE` locks on the `events` or `invoices` tables.
# Settings

Running `nostream` for the first time creates the settings file in `<project_root>/.nostr/settings.yaml`. If the file is not created and an error is thrown, ensure that the `<project_root>/.nostr` folder exists. The configuration directory can be changed by setting the `NOSTR_CONFIG_DIR` environment variable. `nostream` will pick up any changes to this settings file without needing to restart.

README.md

Lines changed: 20 additions & 0 deletions
@@ -650,6 +650,26 @@ npm run export -- backup-2024-01-01.jsonl # custom filename
```

The script reads the same `DB_*` environment variables used by the relay (see [CONFIGURATION.md](CONFIGURATION.md)).

## Benchmark Database Queries

Run the read-only query benchmark to record the planner's choices and timings for the relay's hot-path queries (REQ subscriptions, vanish checks, purge scans, pending-invoice polls):

```
npm run db:benchmark
npm run db:benchmark -- --runs 5 --kind 1 --limit 500
```

The benchmark only issues `EXPLAIN (ANALYZE, BUFFERS)` and `SELECT` statements against your configured database — it never writes. It loads `DB_*` variables from `.env` automatically (via `node --env-file-if-exists=.env`), so no extra setup is required beyond what you already need to run the relay. Use it to confirm the `events_active_pubkey_kind_created_at_idx`, `events_deleted_at_partial_idx`, and `invoices_pending_created_at_idx` indexes are being picked up.
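If you capture the plan text yourself, one way to spot the chosen indexes programmatically is to scan for PostgreSQL's `Index Scan using <name>` plan nodes. A minimal sketch, assuming you already have the `EXPLAIN` text in hand (`indexesUsed` is a hypothetical helper, not part of the benchmark script):

```typescript
// Extract index names from PostgreSQL EXPLAIN text output. Plan nodes that
// use an index appear as "Index Scan using <name> on <table>" or
// "Index Only Scan using <name> on <table>".
function indexesUsed(planText: string): string[] {
  const re = /Index (?:Only )?Scan using (\S+)/g
  const names: string[] = []
  let m: RegExpExecArray | null
  while ((m = re.exec(planText)) !== null) {
    // Deduplicate: the same index can appear in several plan nodes.
    if (!names.includes(m[1])) names.push(m[1])
  }
  return names
}
```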
664+
665+
For a reproducible before/after proof on a throwaway dataset, run:
666+
667+
```
668+
npm run db:verify-index-impact
669+
```
670+
671+
It seeds ~200k synthetic events, drops the hot-path indexes, runs EXPLAIN (ANALYZE, BUFFERS) for each hot query, recreates the indexes, and prints a BEFORE/AFTER table. See the *Database indexes and benchmarking* section of [CONFIGURATION.md](CONFIGURATION.md).
672+
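The speedup column in such a table is just the before/after ratio. A minimal sketch of that arithmetic (names and layout are illustrative, not the script's actual output format):

```typescript
// Compute and format one row of a BEFORE/AFTER comparison: execution time
// without the index divided by execution time with it. Illustrative only;
// verify-index-impact's own formatting may differ.
interface QueryTiming {
  query: string
  beforeMs: number
  afterMs: number
}

function speedup(t: QueryTiming): number {
  return t.beforeMs / t.afterMs
}

function formatRow(t: QueryTiming): string {
  return `${t.query}  before=${t.beforeMs}ms  after=${t.afterMs}ms  ${speedup(t).toFixed(1)}x`
}
```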
## Relay Maintenance

Use `clean-db` to wipe or prune `events` table data. This also removes
20260420_120000_add_hot_path_indexes.js

Lines changed: 76 additions & 0 deletions

@@ -0,0 +1,76 @@
```js
/**
 * Add narrow, query-driven indexes to cover the hottest read paths.
 *
 * Each index is created with CREATE INDEX CONCURRENTLY so the migration can be
 * applied to a running relay without taking an ACCESS EXCLUSIVE lock on the
 * events table. CONCURRENTLY is not allowed inside a transaction, so this
 * migration opts out of Knex's default transactional wrapper via
 * `exports.config.transaction = false`.
 *
 * Rationale for each index is documented inline. See also:
 * https://devcenter.heroku.com/articles/postgresql-indexes
 */

exports.config = { transaction: false }

exports.up = async function (knex) {
  // Covers the hottest subscription / per-message reads:
  //
  // 1. NIP-01 REQ with `authors` + `kinds` ordered by created_at DESC
  //    (see EventRepository.findByFilters):
  //      WHERE event_pubkey = ? AND event_kind IN (...)
  //      ORDER BY event_created_at DESC, event_id ASC LIMIT N
  //
  // 2. `EventRepository.hasActiveRequestToVanish(pubkey)` — invoked on every
  //    inbound event via UserRepository.isVanished:
  //      WHERE event_pubkey = ? AND event_kind = 62 AND deleted_at IS NULL
  //
  // 3. `EventRepository.deleteByPubkeyExceptKinds(pubkey, kinds)`:
  //      WHERE event_pubkey = ? AND event_kind NOT IN (...) AND deleted_at IS NULL
  //
  // The index is intentionally NOT partial on `deleted_at IS NULL`: the REQ
  // subscription path in findByFilters does not currently add that predicate,
  // so a partial index would be ineligible for the most important query shape.
  // Soft-deleted rows are a small fraction of total rows in practice (they get
  // hard-deleted by the retention sweep), so the bloat is negligible compared
  // to the benefit of the index being usable by the hot path.
  //
  // Including `event_id` as the final column makes the composite key match the
  // full ORDER BY (created_at DESC, event_id ASC) used by findByFilters, so the
  // planner can satisfy LIMIT N directly from the index without an extra sort
  // step for the tie-breaker.
  await knex.raw(`
    CREATE INDEX CONCURRENTLY IF NOT EXISTS events_active_pubkey_kind_created_at_idx
      ON events (event_pubkey, event_kind, event_created_at DESC, event_id)
  `)

  // Supports the retention / purge scan in `deleteExpiredAndRetained` and the
  // vanish hard-delete follow-up:
  //   WHERE deleted_at IS NOT NULL
  // Partial index is tiny because well-maintained relays hard-delete these
  // rows periodically and the vast majority of events have deleted_at IS NULL.
  await knex.raw(`
    CREATE INDEX CONCURRENTLY IF NOT EXISTS events_deleted_at_partial_idx
      ON events (deleted_at)
      WHERE deleted_at IS NOT NULL
  `)

  // Supports `InvoiceRepository.findPendingInvoices`, which is polled by the
  // maintenance worker to detect settled invoices:
  //   WHERE status = 'pending' ORDER BY created_at ASC OFFSET ? LIMIT ?
  // Partial on status = 'pending' so the index only contains the rows the
  // poller actually scans. Keyed on `created_at` so the planner can satisfy
  // the ORDER BY straight from the index (FIFO polling, bounded tail latency
  // even with large pending backlogs).
  await knex.raw(`
    CREATE INDEX CONCURRENTLY IF NOT EXISTS invoices_pending_created_at_idx
      ON invoices (created_at)
      WHERE status = 'pending'
  `)
}

exports.down = async function (knex) {
  await knex.raw('DROP INDEX CONCURRENTLY IF EXISTS invoices_pending_created_at_idx')
  await knex.raw('DROP INDEX CONCURRENTLY IF EXISTS events_deleted_at_partial_idx')
  await knex.raw('DROP INDEX CONCURRENTLY IF EXISTS events_active_pubkey_kind_created_at_idx')
}
```

package.json

Lines changed: 2 additions & 0 deletions
@@ -43,6 +43,8 @@
    "db:migrate": "knex migrate:latest",
    "db:migrate:rollback": "knex migrate:rollback",
    "db:seed": "knex seed:run",
    "db:benchmark": "node --env-file-if-exists=.env -r ts-node/register src/scripts/benchmark-queries.ts",
    "db:verify-index-impact": "node --env-file-if-exists=.env -r ts-node/register scripts/verify-index-impact.ts",
    "pretest:unit": "node -e \"require('fs').mkdirSync('.test-reports/unit', {recursive: true})\"",
    "test:unit": "mocha 'test/**/*.spec.ts'",
    "test:unit:watch": "npm run test:unit -- --min --watch --watch-files src/**/*,test/**/*",
