Merged
117 changes: 117 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,117 @@
# Changelog

All notable changes to sow are documented here. The format is loosely based on
[Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and the project follows
[Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

The next release lands the launch positioning ("Stop letting Claude touch your
prod database") plus the rest of the eng-review plan as five parallel PRs:

### Added (planned)

- **`sow sandbox`** — flagship zero-config command. Auto-detects your project's
Postgres source, samples + sanitizes, spins up a local sandbox, and patches
`.env.local` with the new `DATABASE_URL`. One command from clone to working
sandbox. (PR #4)
- **`sow env revert`** — restores `.env.local` from the `.env.local.sow.bak`
backup that `sow sandbox` writes. (PR #4)
- **JSONB sanitization.** sow now walks JSONB columns recursively and replaces
values whose key matches a PII pattern. Closes the biggest PII leak vector in
modern Postgres schemas. (PR #3)
- **Postgres type coverage.** Built-in transformers for `inet`, `cidr`,
`macaddr`, `macaddr8`, plus passthrough handling for `bytea`, `xml`, `money`,
`interval`, range types, array types, and custom enums. (PR #3)
- **`--allow-unsafe` flag.** sow's sanitizer is now fail-closed: it aborts
`sow connect` if it sees a Postgres type it can't verify. Pass `--allow-unsafe`
to NULL out unhandled columns instead. (PR #3)
- **`sow doctor <connector>`** — drill into a single connector's referential
integrity warnings. Surfaces orphaned FKs, transient read errors, and
sanitization warnings. (PR #6)
- **Tag-driven release workflow.** New `version-bump.yml` workflow lets you cut
a major/minor/patch/prerelease via the GitHub Actions UI; the existing
`release.yml` is now triggered only by tag pushes (not every merge to main).
Prevents accidental releases on README typos. (PR #5)
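
The JSONB sanitization entry above describes a recursive walk that replaces values under PII-looking keys. A minimal sketch of that idea follows, assuming the column value is already parsed into plain JSON. The `PII_KEY` pattern, the `sanitizeJson` name, and the `fake` callback are hypothetical and are not taken from sow's source.

```typescript
// Hypothetical key pattern for illustration; sow's real PII patterns are not listed in this changelog.
const PII_KEY = /email|phone|ssn|address|name/i;

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Recursively walk a parsed JSONB value and replace every value whose key matches PII_KEY.
// `fake` is a caller-supplied (hypothetical) function that produces the replacement value.
function sanitizeJson(value: Json, fake: (key: string, original: Json) => Json): Json {
  if (Array.isArray(value)) {
    // Arrays have no keys of their own, so just recurse into each element.
    return value.map((item) => sanitizeJson(item, fake));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      const child = value[key];
      // A matching key replaces the whole subtree; otherwise keep walking.
      out[key] = PII_KEY.test(key) ? fake(key, child) : sanitizeJson(child, fake);
    }
    return out;
  }
  // Scalars that are not under a PII key pass through unchanged.
  return value;
}

// Usage: redact a nested document with a constant placeholder.
const exampleRow: Json = {
  id: 7,
  profile: { email: "jane@example.com", prefs: { theme: "dark" } },
  contacts: [{ phone: "555-0100" }],
};
const sanitizedRow = sanitizeJson(exampleRow, () => "[redacted]");
```

Matching on keys rather than values keeps the walk cheap, but it only catches PII stored under a recognizable key, so the pattern list is the part that would need the most care.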

### Changed (planned)

- **`sow branch reset` is now sub-second** on a 10k-row schema. Refactored the
Docker provider to use Postgres template databases (one long-lived container
per connector, N branch databases inside). Old reset path was 5-15s; new path
is ~200-800ms. Enables tight agent reset loops (50 iterations in a minute).
(PR #2)
- **Sampler integrity warnings** — the referential-integrity pass now collects
structured warnings (`parent_fetch_failed`, `parent_not_found`,
`child_fetch_failed`, `implicit_ref_fetch_failed`) instead of silently
swallowing them in `catch {}` blocks. Surfaced via `sow doctor <connector>`.
(PR #6)
- **Implicit reference resolution is now batched.** The sampler used to fire
one query per (source_table, source_column) pair when resolving implicit FKs;
it now collects missing ids by target table across all sources and fires one
`IN (...)` query per target. ~10x reduction in `sow connect` round-trips on a
50-table schema. (PR #6)
- **Skip-list for implicit references is now dynamic.** The old hardcoded
English-only `["id", "user_id", "owner_id", "created_by"]` set is replaced
with a dynamic check against the actual formal Relationships from the
schema. Works for non-English column names and unusual FK layouts. (PR #6)
- **MCP tool count corrected.** Package descriptions now state 22 tools
(previously listed as 15).
- **README repositioned** around "Stop letting Claude touch your prod database"
with new sections on the agent reset loop, the cookbook of three workflows,
and a docs index.
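
Two of the changes above can be sketched briefly: resetting a branch by re-cloning it from a Postgres template database, and grouping missing ids by target table so each target needs one `IN (...)` lookup. This is an illustration under stated assumptions, not sow's implementation. The function names, the `MissingRef` shape, and the rule that database names are simple lowercase identifiers are all hypothetical. `DROP DATABASE ... WITH (FORCE)` requires PostgreSQL 13 or newer.

```typescript
// Assumption: branch and template names are restricted to simple lowercase identifiers,
// so they can be embedded in DDL without quoting. sow's real naming rules are not shown here.
function assertSimpleName(name: string): string {
  if (!/^[a-z_][a-z0-9_]*$/.test(name)) {
    throw new Error("unsupported database name: " + name);
  }
  return name;
}

// Statements that reset a branch by dropping it and re-cloning it from a template database.
// Both run inside the same long-lived Postgres server, so no container restart is involved.
function resetBranchSql(branchDb: string, templateDb: string): string[] {
  const branch = assertSimpleName(branchDb);
  const template = assertSimpleName(templateDb);
  return [
    `DROP DATABASE IF EXISTS ${branch} WITH (FORCE)`,
    `CREATE DATABASE ${branch} TEMPLATE ${template}`,
  ];
}

// Hypothetical shape for an implicit reference whose target row is missing from the sample.
interface MissingRef {
  sourceTable: string;
  sourceColumn: string;
  targetTable: string;
  id: string;
}

// Collect missing ids per target table across all sources, de-duplicated,
// so each target table needs only one IN (...) lookup.
function groupMissingIdsByTarget(refs: MissingRef[]): { [targetTable: string]: string[] } {
  const grouped: { [targetTable: string]: string[] } = Object.create(null);
  for (const ref of refs) {
    const ids = grouped[ref.targetTable] || (grouped[ref.targetTable] = []);
    if (ids.indexOf(ref.id) === -1) ids.push(ref.id);
  }
  return grouped;
}
```

Postgres refuses to copy a template database while other sessions are connected to it, so a setup like this would keep the template idle and point agents only at the branch databases.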

## [0.1.14] — 2026-04-06

### Fixed

- **SQL injection across the sampler and branching layer (security).** A class
of bugs where dynamic SQL was built by string-interpolating values from
sampled source data has been closed. Seven call sites parameterized:
  - `packages/core/src/sampler/referential.ts` — three formal-FK and
    implicit-reference call sites (regression: a text PK like `O'Brien` used
    to crash silently and drop the parent row)
  - `packages/core/src/branching/manager.ts:getBranchSample` — the `table`
    argument from user/agent input is now `quoteIdent`-quoted, the `limit` is
    bound via `$1`
  - `packages/core/src/branching/providers/supabase.ts:fetchAuthUserMappings`
    — the `IN (...)` clause now uses `$1, $2, ...` placeholders, batched at
    1000 ids per query, with UUID-shape pre-filter
  - `packages/core/src/branching/supabase.ts` — eight RLS DDL and auth-user
    INSERT/DELETE sites now use parameterized values and `quoteIdent`
    identifiers
- **`packages/core/src/adapters/postgres.ts`** — the `query()` method's
`params` argument was previously declared in the interface but silently
dropped at runtime (`_params?: unknown[]`). Now actually passes through to
`postgres@3`'s `sql.unsafe(query, parameters)` for real bind-parameter
safety.
- **Fail-safe RLS setup in the Supabase provider.** A previous structure
could DISABLE row-level security on a table when a transient introspection
error occurred during sandbox setup. RLS introspection now lives in its own
per-table try block that `continue`s on error rather than falling into the
policy-disable fallback path.
- **Identifier quoting helper** — new `packages/core/src/sql/identifiers.ts`
exports `quoteIdent()`, the SQL-standard double-quote escape used wherever
table or column names are interpolated into dynamic SQL. Throws on empty
identifiers and embedded NUL bytes.
- **`sow branch sample` limit clamping** — accepts `LIMIT 0` (a valid request
for an empty result set), falls back to the documented default of 5 for
non-finite inputs, and clamps the upper bound at 100.
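
The fixes above combine three small techniques: double-quote identifier escaping, limit clamping, and numbered placeholders for batched `IN (...)` lists. The sketch below shows one way they could fit together. It is not the contents of `identifiers.ts` or `manager.ts`; `clampLimit`, `placeholders`, `chunk`, and `sampleQuery` are hypothetical names, and the handling of negative or fractional limits is an assumption the changelog does not state.

```typescript
// Sketch of a SQL-standard double-quote identifier escape; not the actual source of identifiers.ts.
function quoteIdent(name: string): string {
  if (name.length === 0) throw new Error("identifier must not be empty");
  if (name.indexOf("\0") !== -1) throw new Error("identifier must not contain NUL bytes");
  // Embedded double quotes are doubled, then the whole name is wrapped in double quotes.
  return '"' + name.replace(/"/g, '""') + '"';
}

// 0 is a valid limit, non-finite input falls back to the default, and the upper bound is capped.
// Assumption: fractional inputs are floored and negative inputs are raised to 0.
function clampLimit(input: number, fallback: number = 5, max: number = 100): number {
  if (!isFinite(input)) return fallback;
  return Math.min(max, Math.max(0, Math.floor(input)));
}

// Build a "$1, $2, ..." placeholder list for `count` bound values.
function placeholders(count: number): string {
  const parts: string[] = [];
  for (let i = 1; i <= count; i++) parts.push("$" + i);
  return parts.join(", ");
}

// Split ids into fixed-size batches (the changelog mentions 1000 ids per query).
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) batches.push(items.slice(i, i + size));
  return batches;
}

// The table name is quoted as an identifier; the limit travels as a bind parameter.
function sampleQuery(table: string, limit: number): { text: string; params: number[] } {
  return {
    text: "SELECT * FROM " + quoteIdent(table) + " LIMIT $1",
    params: [clampLimit(limit)],
  };
}
```

Identifiers cannot be sent as bind parameters, which is why the table name is escaped and interpolated while the limit and id values are bound.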

### Tests

- 89 unit tests passing. 10 new regression tests in
`packages/core/src/sampler/referential.test.ts` covering `quoteIdent`
edge cases, the `O'Brien` single-quote regression, composite FK
parameterization, and hostile-payload defense.
- Cross-model adversarial review (Claude + Codex) — both passes clean,
Codex structured P1 gate passed.

## [0.1.13] — earlier

Initial public release. Functional CLI, MCP server, Docker-backed branches,
deterministic PII sanitization, schema introspection, edge-case sampling,
checkpoint save/load, branch diff. Auto-detection from env files and the
common ORMs (Prisma, Drizzle, Knex, TypeORM, Sequelize, Docker Compose).
Provider hints for Supabase, Neon, Vercel Postgres, and Railway.
76 changes: 53 additions & 23 deletions README.md
@@ -9,7 +9,7 @@
╚══════╝ ╚═════╝ ╚══╝╚══╝
```

**Safe test databases from production Postgres.**
**Stop letting Claude touch your prod database.**

[![GitHub stars](https://img.shields.io/github/stars/Bugsterapp/sow)](https://github.com/Bugsterapp/sow)
[![npm version](https://img.shields.io/npm/v/@sowdb/cli)](https://www.npmjs.com/package/@sowdb/cli)
@@ -20,46 +20,52 @@

</div>

sow connects to your production Postgres, samples representative data with edge cases, replaces all PII with realistic fakes, and gives you isolated database branches that start in seconds. 100% local, zero API calls, zero cost.
You're using Claude Code or Cursor against a real codebase with a real database. Every time the agent is about to do something database-adjacent, you feel that quiet pang of "wait, should I let it do that?"

sow is the safety layer. One command points it at your prod Postgres, samples the data, scrubs every PII column with realistic fakes, and gives your coding agent a sandboxed local copy to hammer. Prod never gets touched. The sandbox starts in seconds and resets in under one. 100% local. Zero API calls. Zero cost. Never writes to your source database.

## Install & First Use

```bash
npm install -g @sowdb/cli
sow connect postgresql://user:pass@host:5432/mydb
sow branch create my-feature
# -> postgresql://sow:sow@localhost:54320/sow
cd your-project
sow sandbox
```

## Why sow?
`sow sandbox` auto-detects your database from your project's env files, samples it, sanitizes PII, and patches `.env.local` with a safe `DATABASE_URL`. Now any coding agent on your laptop talks to the sandbox instead of prod.

## Why sow

- **PII Safe** — All personal data is detected and replaced with realistic fakes.
- **Agent-First** — MCP server, `--json` mode, SKILL.md for agent context.
- **Fast** — First snapshot in 30-60s. Branches in ~5s. Resets in ~1s.
- **Checkpoints** — Save and restore branch state instantly.
- **Diff** — See exactly what changed: rows added, deleted, modified, schema changes.
- **Deterministic** — Same seed produces identical output every time.
- **Read-Only** — sow never writes to your source database.
- **Auto-Detect** — Scans .env files, Prisma, Drizzle, Knex, TypeORM, Sequelize, Docker Compose.
- **Built for coding agents.** MCP server with 22 tools, `--json` mode for every command, `SKILL.md` for agent context, deterministic seeds so bugs reproduce across sessions.
- **PII-safe by default.** Detects emails, phones, names, addresses, SSNs, JSONB-embedded fields. Fail-closed: aborts if it sees a Postgres type it can't verify, with `--allow-unsafe` to override explicitly.
- **Reset in under 1 second.** Postgres template-database backed. Your agent can try a destructive change, verify the result, reset, try again — 50 iterations in a minute.
- **Zero config.** Auto-detects env files, Prisma, Drizzle, Knex, TypeORM, Sequelize, Docker Compose. Identifies Supabase, Neon, Vercel Postgres, and Railway projects.
- **Read-only on the source.** sow never writes to your production database. Parameterized queries, identifier escaping, and a security-audited code path verified by both Claude and Codex adversarial review.
- **100% local.** No cloud round-trip, no third party holding your sanitized data, no account, no API key. The sandbox lives on your laptop.

## Quick Start

```bash
# Zero-config: detect your DB, sample, sanitize, patch .env.local
sow sandbox

# Or do it explicitly
sow connect postgresql://user:pass@host:5432/mydb # analyze, sample, sanitize
sow branch create my-feature # isolated Postgres in ~5s
DATABASE_URL=postgresql://sow:sow@localhost:54320/sow npm run dev
sow branch diff my-feature # see what changed
sow branch reset my-feature # back to seed state in <1s
sow branch diff my-feature # see what your agent changed
sow branch delete my-feature # clean up
```

## For AI Agents

```bash
npm install -g @sowdb/mcp
sow mcp --agent cursor # or claude-code, windsurf, codex
sow mcp --agent claude-code # or cursor, windsurf, codex
```

Or add manually to your MCP config:
Or add to your MCP config manually:

```json
{
@@ -75,26 +81,50 @@ Install the agent skill for context:
npx skills add Bugsterapp/sow
```

The MCP server exposes 22 tools: `sow_sandbox`, `sow_connect`, `sow_detect`, `sow_branch_create`, `sow_branch_reset`, `sow_branch_diff`, `sow_branch_save`, `sow_branch_load`, `sow_branch_exec`, `sow_branch_users`, `sow_branch_tables`, `sow_branch_sample`, and more. Every tool returns structured JSON. Agents drive the full sample → branch → exec → diff → reset loop without a human in the middle.

## How It Works

```
Production DB sow Pipeline Local Branches
Production DB sow Pipeline Local Sandbox

┌──────────┐ ┌──────────────────────┐ ┌──────────────┐
│ Schema │ │ 1. Analyze │ │ Branch A │
│ Stats │────>│ 2. Sample (N rows) │────>│ :54320
│ Stats │────>│ 2. Sample (N rows) │────>│ :54320/A
│ Data │ │ 3. Sanitize PII │ │ │
│ (read │ │ 4. Save snapshot │ │ Branch B │
│ only) │ │ (~2 MB) │ │ :54321 │
└──────────┘ └──────────────────────┘ └──────────────┘
Provider-managed
│ only) │ │ (~2 MB) │ │ :54320/B │
└──────────┘ └──────────────────────┘ │ │
│ Branch C │
│ :54320/C │
└──────────────┘
One container
per connector,
N branch DBs,
reset in <1s.
```

## Cookbook

Three workflows that show the full agent loop. See [`docs/cookbook.md`](docs/cookbook.md) for the prompts and full walkthrough.

1. **Let Claude refactor your schema without fear** — `sow sandbox`, then ask Claude to add a column, drop an index, rename a table. Verify, reset, try a different approach.
2. **Let Cursor generate seed data for a new feature** — point your agent at the sandbox and ask for "100 realistic users with orders." Inspect with `sow branch sample`. Reset and ask for a different distribution.
3. **Let your coding agent debug a failing migration** — replay your last migration on the sandbox. If it fails, reset and try a fix. No prod risk.

## Documentation

- [`docs/sandbox.md`](docs/sandbox.md) — the `sow sandbox` flagship command, flags, and `.env.local` patching with backup/revert
- [`docs/sanitization.md`](docs/sanitization.md) — what sow sanitizes, the fail-closed gate, JSONB handling, and the `--allow-unsafe` flag
- [`docs/cookbook.md`](docs/cookbook.md) — three end-to-end workflows for coding agents
- [`CHANGELOG.md`](CHANGELOG.md) — release history
- [`CONTRIBUTING.md`](CONTRIBUTING.md) — building from source, running tests, the lane structure

## sow Cloud — coming soon

sow CLI is free, open source, and works 100% locally. Always will be.

sow Cloud is for teams: shared connectors, CI/CD without Docker-in-Docker, compliance (data never touches dev laptops), and a team dashboard.
sow Cloud is for teams: shared connectors, CI/CD without Docker-in-Docker, compliance (sanitized data never touches dev laptops), and a team dashboard.

[Join the waitlist →](https://tally.so/r/0QvzZN)
