Dry runs #351

Open
Scr4tch587 wants to merge 9 commits into main from kai/dry-runs

@Scr4tch587

What

Add ?dry-run=true to POST /build. Runs the full build (including all dependencies) against an in-memory store, writes nothing to the DB, and returns the produced rows so builder logic can be validated.

How

  • New Store ABC over the four build-path data ops
  • PostgresStore — thin shell over core.db.datasets (real builds)
  • MemoryStore — in-process dict backend (dry runs)
  • store threaded through run_build → execute_job → _fetch_dep_data
  • worker no longer calls core.db.datasets directly
  • build_dataset(dry_run=...) is the single boundary that selects the store
  • MemoryStore starts empty, so a dry run rebuilds the whole graph in isolation
  • no build lock (nullcontext) — a dry run can't corrupt real data
  • no cleanup — store is GC'd when the request ends
  • json round-trip on insert mirrors Postgres Jsonb serialization
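
A minimal sketch of the store split described above. The PR does not list the four build-path operations, so `insert_rows` and `fetch_rows` are hypothetical stand-ins, as is the `(dataset, version)` key. The point is only to show how a JSON round-trip on insert could make an in-memory backend return the same shapes a Jsonb column would.

```python
from __future__ import annotations

import json
from abc import ABC, abstractmethod
from typing import Any


class Store(ABC):
    """Hypothetical interface; the real Store ABC covers four build-path ops."""

    @abstractmethod
    def insert_rows(self, dataset: str, version: str, rows: list[dict[str, Any]]) -> None:
        ...

    @abstractmethod
    def fetch_rows(self, dataset: str, version: str) -> list[dict[str, Any]]:
        ...


class MemoryStore(Store):
    """In-process dict backend; starts empty and never opens a DB connection."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, str], list[dict[str, Any]]] = {}

    def insert_rows(self, dataset: str, version: str, rows: list[dict[str, Any]]) -> None:
        # Round-trip through JSON so tuples become lists and values that are
        # not JSON-serializable fail here, as they would on a real insert.
        normalized = json.loads(json.dumps(rows))
        self._data.setdefault((dataset, version), []).extend(normalized)

    def fetch_rows(self, dataset: str, version: str) -> list[dict[str, Any]]:
        # Return a copy of the list so callers cannot reorder stored rows.
        return list(self._data.get((dataset, version), []))


store = MemoryStore()
store.insert_rows("prices", "v1", [{"id": 1, "tags": ("a", "b")}])
rows = store.fetch_rows("prices", "v1")
```

In this sketch the tuple `("a", "b")` comes back as a list, which is the kind of difference the round-trip is meant to surface during a dry run rather than after a real write.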

Response

  • real build: {"status": "ok"}
  • dry run: {dataset_name, dataset_version, dry_run, rows}
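
For illustration, the two response shapes can be told apart on the client side roughly as follows. The field types and the sample values are assumptions; the PR only names the keys.

```python
from __future__ import annotations

from typing import Any, TypedDict


class DryRunResponse(TypedDict):
    # Key names come from the PR description; the value types are assumed.
    dataset_name: str
    dataset_version: str
    dry_run: bool
    rows: list[dict[str, Any]]


def is_dry_run_response(body: dict[str, Any]) -> bool:
    """Return True when the body looks like a dry-run result, not a real build."""
    return body.get("dry_run") is True and "rows" in body


# Sample payloads with made-up values.
real_body = {"status": "ok"}
dry_body: DryRunResponse = {
    "dataset_name": "prices",
    "dataset_version": "v1",
    "dry_run": True,
    "rows": [{"id": 1}],
}
```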

Why

  • validate builder output without touching real data
  • same build logic runs for real and dry runs — store is the only difference

Testing

  • MemoryStore unit suite + PostgresStore/MemoryStore read-method parity
  • dry-run integration: DB untouched, builders actually run, chain + lookback graphs
  • full suite green against real Docker Postgres

Scr4tch587 and others added 7 commits June 17, 2026 21:25
Introduce a Store ABC with PostgresStore (forwards to core.db.datasets) and
MemoryStore (in-process dict, no DB connection) backends, and thread a store
through run_build -> execute_job -> _fetch_dep_data. The store also owns the
build lock: PostgresStore uses the shared per-dataset lock, MemoryStore uses a
nullcontext. Real builds default to PostgresStore, so behavior is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
build_dataset(dry_run=...) is the single boundary that reads the flag: it
constructs a PostgresStore (real build, returns None) or a fresh MemoryStore
(dry run) and passes it into run_build. A dry run rebuilds the whole graph in
isolation and returns the produced rows for the requested dataset.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
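
The single-boundary idea from the commit above might look roughly like this. `DictStore`, `run_build_sketch`, and `build_dataset_sketch` are hypothetical names and signatures, and a plain dict stands in for Postgres so the example runs without a database.

```python
from __future__ import annotations

from typing import Any, Optional


class DictStore:
    """Stand-in for both backends; the real PostgresStore talks to the DB."""

    def __init__(self) -> None:
        self.rows: dict[str, list[dict[str, Any]]] = {}


def run_build_sketch(store: DictStore, dataset: str) -> None:
    # Same build logic regardless of which store was passed in.
    store.rows[dataset] = [{"dataset": dataset, "value": 42}]


def build_dataset_sketch(
    dataset: str, real_store: DictStore, dry_run: bool = False
) -> Optional[list[dict[str, Any]]]:
    # The only place that reads the flag: pick a fresh throwaway store for a
    # dry run, or the persistent store for a real build.
    store = DictStore() if dry_run else real_store
    run_build_sketch(store, dataset)
    # Real builds return None; dry runs return the produced rows.
    return store.rows[dataset] if dry_run else None


real_store = DictStore()
dry_result = build_dataset_sketch("prices", real_store, dry_run=True)
real_untouched = "prices" not in real_store.rows
real_result = build_dataset_sketch("prices", real_store)
```
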
POST /build accepts ?dry-run=true (default false). A dry run runs builders over
the whole dependency graph in memory without writing to the DB and returns the
produced rows; real builds still return {"status": "ok"}. Errors keep the same
400/422/500 semantics. Adds integration coverage for DB-unchanged, builders
running, chain/lookback graphs, and PostgresStore/MemoryStore read parity.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
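
As an illustration of the `?dry-run=true` flag with a default of false, the parsing could be modelled with the standard library as below. The PR does not say which web framework handles the request or how invalid values are treated, so `parse_dry_run` and its strict true/false check are assumptions.

```python
from urllib.parse import parse_qs, urlsplit


def parse_dry_run(url: str) -> bool:
    """Read the dry-run query flag from a URL, defaulting to False."""
    query = parse_qs(urlsplit(url).query)
    # Use the last occurrence if the flag is repeated; absent means "false".
    value = query.get("dry-run", ["false"])[-1].lower()
    if value not in {"true", "false"}:
        raise ValueError(f"invalid dry-run value: {value!r}")
    return value == "true"


flag_on = parse_dry_run("/build?dry-run=true")
flag_default = parse_dry_run("/build")
```
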
Scr4tch587 requested a review from Blackgaurd on June 22, 2026 04:05