refactor(core): sampler warnings + dynamic skip + batched fetch (Lane A rest)#6
Merged
Merged
Conversation
… fetch Three related fixes to packages/core/src/sampler/referential.ts, surfaced by the eng review as Issues #3, #7, and #8. Same module, one coordinated refactor, one test pass. Issue #3 — warnings collection (was: silent catch{} blocks) The three try/catch blocks in ensureReferentialIntegrity silently swallowed any failure to fetch a missing parent, an orphaned child, or an implicit-reference batch. Users got branches with dangling FKs and no way to know. Replaced with structured warning collection into a new IntegrityWarning[] array. Four kinds: - parent_fetch_failed: the missing-parent SELECT threw - parent_not_found: the parent genuinely doesn't exist in source - child_fetch_failed: the ensure-1-child-per-parent SELECT threw - implicit_ref_fetch_failed: the implicit IN(...) batch threw Warnings are deduped by (source, columns, target) fingerprint so a schema with thousands of orphaned rows emits one line per relationship, not one per row. Error messages are truncated to 140 chars to keep metadata bounded. ensureReferentialIntegrity now returns { tables, warnings } instead of a bare Map. Callers (sampler/index.ts) thread the warnings through SamplingResult.integrityWarnings, which flows into ConnectorMetadata.integrityWarnings for later surfacing by the CLI. Issue #7 — dynamic SKIP_IMPLICIT_COLUMNS (was: English-only hardcoded list) The old hardcoded ["id", "user_id", "owner_id", "created_by"] set was English-biased and conflated "already handled by formal FK" with "should never be followed". A Spanish codebase with `creado_por`, or an app with `author_id`/`reviewer_id`/`assignee_id` would fall through the gap. Replaced with a dynamic check: compute the set of (source_table, source_column) pairs that appear in any formal Relationship, and skip THOSE in the implicit-reference pass. Works for any language and any schema. The old symbolic "id" skip is preserved by the inferrer itself (inferTargetTable returns null for columns that don't end in _id or can't be mapped to a target table). Issue #8 — batched implicit references (was: N+1 per source-target pair) The old resolveImplicitReferences walked (table, column) pairs and fired one SELECT per pair, even when multiple source tables all referenced the same parent. On a 50-table schema over a VPN that produced 100+ round-trips (~30-60s). Rewritten to collect all missing ids grouped by target table across ALL source tables first, then issue one IN($1,$2,...) query per target (still chunked at 100 ids per query to stay well under Postgres's 65,535 bind-param limit). Deduplicates ids so three source rows referencing the same parent only request it once. Tests 10 new tests in packages/core/src/sampler/referential.test.ts: - 5 for warnings (happy path, parent_fetch_failed, parent_not_found, implicit_ref_fetch_failed, reason truncation) - 2 for dynamic skip (formal FK takes precedence, non-English columns don't throw) - 3 for batching (one query per target across sources, >100 id chunking, dedup) Total: 99 passing (was 89, +10 new). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wires the new SamplingResult.integrityWarnings (added in the previous
commit) through the CLI surface. Zero new core logic — just plumbing
the warning count and details out to the user.
- packages/cli/src/commands/connect.ts
After a successful sow connect, if the snapshot has any integrity
warnings, print one summary line:
⚠ N referential integrity warning(s) — run `sow doctor <name>` for details
- packages/cli/src/commands/doctor.ts
New describeConnectorWarnings() returns the per-connector report
(tables, rows, snapshot size, full warning list) by reading
ConnectorMetadata. The generic doctor checkConnectors() now flags
any connector with warnings as a "warn" status with a hint pointing
at sow doctor <name>, so users discover them even without knowing
to ask.
- packages/cli/src/commands/runner.ts
runDoctor() now accepts an optional connectorName positional arg.
When passed, prints the detailed warnings report instead of the
generic system check list. Format groups warnings by kind and
explains the root causes (parent_not_found = source data has
orphans, *_fetch_failed = transient read error, retry with sow
connector refresh). JSON mode supported.
- packages/cli/src/cli.ts
Help text for doctor updated to "doctor [connector]" so users
discover the new positional.
Tests: still 99 passing — this commit is wiring only, no new logic
worth a unit test (would amount to mocking everything).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
faculopezscala
added a commit
that referenced
this pull request
Apr 6, 2026
The 5 code PRs (#2-#6) deliver the engineering for the launch. This PR delivers everything user-facing: the README pitch, the package metadata, the docs/ folder, and the CHANGELOG. This PR depends on PRs #2-#6 being merged first because the new README references commands those PRs add (sow sandbox, sow doctor <connector>, the --allow-unsafe flag, the sub-second reset). Land them first, rebase this against main, then merge. README.md Hero rewritten from "Safe test databases from production Postgres" to "Stop letting Claude touch your prod database". Body explains the anxiety-reduction pitch: a coding agent is about to do something database-adjacent and you feel that quiet pang. sow is the safety layer. New "Why sow" section. New "How It Works" diagram showing the template-DB shape (one container per connector, N branch DBs, reset in <1s). New "Cookbook" stub linking to docs/cookbook.md. New "Documentation" section with the docs/ index. packages/cli/package.json Description: "Stop letting Claude touch your prod database. PII-safe local Postgres sandbox for coding agents." Keywords: added ai-agents, coding-agents, claude-code, cursor, sandbox, mcp. packages/core/package.json Description: "sow core engine — analyze, sample, sanitize, and branch Postgres databases for safe coding-agent sandboxes" Keywords: added ai-agents, coding-agents, sandbox. packages/mcp/package.json Description corrected from "15 tools" to "22 tools" (the actual count in packages/mcp/src/index.ts) and repositioned: "sow MCP server — 22 tools for coding agents (Claude Code, Cursor, Codex) to safely manage Postgres sandboxes" Keywords: added claude-code, cursor, codex, coding-agents, sandbox. docs/sandbox.md (new) The sow sandbox flagship command — what it does, the flags, the .env.local backup/revert flow, when not to use it, and what's actually in the sandbox. docs/sanitization.md (new) What sow sanitizes (the PII type table), how JSONB walking works, the fail-closed gate, the --allow-unsafe escape hatch, custom rules via .sow.yml, what sow does NOT do (free-text NER, etc.), and the read-only-on-the-source guarantee. docs/cookbook.md (new) Three end-to-end workflows with concrete prompts: 1. Let Claude refactor your schema without fear 2. Let Cursor generate seed data for a new feature 3. Let your coding agent debug a failing migration Plus the "agent reset loop" pattern diagram, the MCP tool list, and operational tips (one long-running sandbox per project, checkpoints for known-good states, sow doctor as the inspection surface, principle of least privilege on the source DB user). CHANGELOG.md (new) Scaffold with three sections: - [Unreleased] documenting the planned PR #2-#6 features under Added/Changed - [0.1.14] documenting the SQL injection security fix that already shipped (PR #1, merged earlier in the session) - [0.1.13] one-line summary of the initial public release Test/build/lint all clean (89/89 tests, 3/3 packages built, no source code changed in this PR). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
9 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the rest of Lane A from the eng review plan: three related fixes to the referential sampler that the security PR (#1) deliberately deferred. All three live in the same module and rewrite the same core function, so they ship together.
Issues addressed:
catch {}blocks with structuredIntegrityWarning[]collection. Surfaced viasow doctor <connector>.["id","user_id","owner_id","created_by"]set with a dynamic check against the actual formal Relationships.IN (...)query per target instead of one per source-target pair. ~30-60s → 1 round-trip per target on a 50-table schema over a VPN.Commits (bisectable):
4af4c41—refactor(core): referential sampler — warnings, dynamic skip, batched fetche169d3a—feat(cli): surface sampler integrity warnings via sow doctorWhat changed
Issue #3 — warnings, not silence
ensureReferentialIntegritynow returns{ tables, warnings }instead of a bareMap. Four warning kinds:parent_fetch_failedparent_not_foundchild_fetch_failedimplicit_ref_fetch_failedWarnings are deduped by
(source, columns, target)fingerprint so a schema with thousands of orphaned rows emits one line per relationship, not one per row. Error messages truncated at 140 chars.The warnings flow:
ensureReferentialIntegrity→SamplingResult.integrityWarnings→ConnectorMetadata.integrityWarnings(persisted to disk) →sow doctor <connector>→ user sees them.Issue #7 — dynamic skip
Old hardcoded list:
New dynamic check:
Works for any language, any column name, any FK shape. The old "id" implicit-skip is preserved naturally because
inferTargetTablereturns null for plain"id"columns (no_idsuffix).Issue #8 — batched implicit references
Before:
After:
50-table schema impact: ~250 round-trips → ~50 round-trips. With dedup, often much fewer (multiple source rows referencing the same parent collapse into one bind param).
Test Coverage
Tests: 89 → 99 (+10 new)— all inpackages/core/src/sampler/referential.test.ts.Coverage: 10/10 new paths directly tested. 100%.
Pre-Landing Review
No CRITICAL findings. The sampler refactor was triple-checked against the existing 10 SQL-injection regression tests (all still pass), the test pattern is the same SpyAdapter approach used in PR #1, and the wiring through
ConnectorMetadataonly adds an optional field (backwards compat for metadata written by older sow versions).Adversarial review notes
Self-reviewed with the same lens as the prior adversarial passes:
warnedRelationshipsSet fingerprinted by(source, columns, target).(source, columns, target)tuples. Different relationships still emit independent warnings.relationshipsarray (their tests are about composite/text PK handling, not formal-FK skipping).sourceTable === targetTable(self-reference). The outerfor (const rel of relationships)still hasif (rel.sourceTable === rel.targetTable) continuefrom the original code. Preserved.Plan Completion
This PR closes 3 of the remaining 10 issues from
/plan-eng-review. The other lanes ship as PRs #2 (Lane B template-DB), #3 (Lane C sanitizer), #4 (Lane D sandbox+env-patch), #5 (Lane E release workflow). All five lanes are independent and can merge in any order.Test plan
bunx vitest run— 99/99 tests passing (89 pre-existing + 10 new)bunx turbo build— 3/3 packages cleanbunx eslinton every touched file — clean (4 pre-existing warnings unrelated)sow connectagainst a schema with intentional orphaned FKs and verify the summary line +sow doctor <connector>flowsow connectagainst a 50-table schema and watch the round-trip count drop from ~250 to ~50 (Issue #8 measurable win)🤖 Generated with Claude Code