Merged
4 changes: 2 additions & 2 deletions README.md
@@ -66,14 +66,14 @@ seedstorm gaps \

## Features

- **Schema self-discovery** — introspects tables, columns, PKs, FKs, enum values, UNIQUE and CHECK constraints; no manual editing required
- **Schema self-discovery** — introspects tables, columns, PKs, FKs, enum values, UNIQUE and CHECK constraints, generated columns, comments, defaults, and indexes; no manual editing required
- **FK-aware seeding** — topological sort guarantees parent tables are seeded before children; handles nullable and non-nullable self-referential FKs with bounded depth, near-cycles, junction tables, and deep multi-level chains
- **Constraint-aware faker mapping** — UNIQUE → `uuid`, CHECK IN → `randomstring(a,b,c)`, CHECK range → `number(min,max)`; seed data always satisfies your constraints
- **Semantic faker** — maps column names (`email`, `first_name`, `price`, `city`…) to realistic `gofakeit` generators automatically
- **Enum coverage** — every enum value appears at least `--rows` times, independently per column
- **AI enrichment** — Gemini rewrites faker hints for domain-meaningful data; supply `--prompt` for richer context
- **Gap analysis** — `gaps` shows which tables are empty with row counts and FK context; `--fill` seeds only the empty ones
- **Schema clone for test DBs** — copy schema-only structure from one connected Postgres/MySQL database into another matching local target, then seed it with safe fake data
- **Schema clone for test DBs** — copy schema-only structure from one connected Postgres/MySQL database into another matching local target, preserving compatible table metadata before seeding it with safe fake data
- **Interactive TUI** — wizard for table selection, global config, self-reference depth, per-table row volumes, and review before seeding
- **Web UI** — `seedstorm serve` exposes an interactive graph workspace with click-to-select tables, self-reference depth, per-table row overrides, truncate-only runs (`Rows = 0` + `truncate`), live SSE job logs, schema clone between connected DBs, multi-DB session switcher, and connection presets in `localStorage`
- **Dry-run** — preview the seed plan and INSERT SQL without touching the database
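
The FK-aware seeding bullet above relies on a topological sort so that parent tables are seeded before their children. A minimal sketch of that idea in Go, using Kahn's algorithm, might look like the following. The `seedOrder` function and the shape of the `deps` map are hypothetical and not taken from seedstorm's source; self-referential FKs are simply skipped here, whereas the feature list says seedstorm handles them with bounded depth.

```go
package main

import (
	"fmt"
	"sort"
)

// seedOrder returns table names ordered so that every parent table comes
// before the tables whose FKs point at it. deps maps a table name to the
// tables it references. Hypothetical helper, not seedstorm's actual API.
func seedOrder(deps map[string][]string) ([]string, error) {
	indegree := make(map[string]int)      // number of unseeded parents per table
	children := make(map[string][]string) // parent -> tables that reference it
	for tbl, parents := range deps {
		if _, ok := indegree[tbl]; !ok {
			indegree[tbl] = 0
		}
		for _, p := range parents {
			if p == tbl {
				continue // self-reference: ignored in this sketch
			}
			if _, ok := indegree[p]; !ok {
				indegree[p] = 0
			}
			indegree[tbl]++
			children[p] = append(children[p], tbl)
		}
	}

	// Start with tables that have no parents; sort for deterministic output.
	var ready []string
	for tbl, n := range indegree {
		if n == 0 {
			ready = append(ready, tbl)
		}
	}
	sort.Strings(ready)

	order := make([]string, 0, len(indegree))
	for len(ready) > 0 {
		tbl := ready[0]
		ready = ready[1:]
		order = append(order, tbl)
		for _, c := range children[tbl] {
			indegree[c]--
			if indegree[c] == 0 {
				ready = append(ready, c)
			}
		}
		sort.Strings(ready)
	}

	// Any table left over is part of a cycle that was never released.
	if len(order) != len(indegree) {
		return nil, fmt.Errorf("FK cycle detected among %d tables", len(indegree)-len(order))
	}
	return order, nil
}

func main() {
	order, err := seedOrder(map[string][]string{
		"users":       nil,
		"products":    nil,
		"orders":      {"users"},
		"order_items": {"orders", "products"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(order)
}
```

A true cycle (for example two tables that reference each other through non-nullable FKs) surfaces as an error in this sketch; a real seeder would need a strategy for breaking such cycles.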
4 changes: 3 additions & 1 deletion docs/commands.md
@@ -274,7 +274,7 @@ seedstorm export --data data.yaml --format csv --out data.csv

## `clone-schema`

Copies schema-only table structure from a source database into a target database of the same engine. This is designed for local/test database setup before running `seed`; it recreates the metadata seedstorm understands: tables, columns, nullability, PKs, FKs, single-column UNIQUE constraints, enum values, and simple CHECK constraints.
Copies schema-only table structure from a source database into a target database of the same engine. This is designed for local/test database setup before running `seed`; it recreates compatible table metadata seedstorm understands: tables, columns, exact introspected column DDL types, nullability, defaults, stored generated columns, PKs, FKs, single-column UNIQUE constraints, multi-column indexes, enum values, simple CHECK constraints, and table/column comments.
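
To illustrate how defaults and stored generated columns could be carried into the recreated DDL, here is a rough Go sketch. The `column` struct and `columnDDL` function are hypothetical stand-ins, not seedstorm's real types; the sketch assumes the introspected type, default, and generation expression are already valid SQL fragments for the target engine, and it does not quote identifiers.

```go
package main

import (
	"fmt"
	"strings"
)

// column is a hypothetical stand-in for introspected column metadata.
type column struct {
	Name      string
	DDLType   string // exact type as reported by the source database
	Nullable  bool
	Default   string // raw default expression, empty if none
	Generated string // raw generation expression, empty if none
}

// columnDDL renders one column definition for a CREATE TABLE statement.
func columnDDL(c column) string {
	var b strings.Builder
	fmt.Fprintf(&b, "%s %s", c.Name, c.DDLType)
	if c.Generated != "" {
		// A generated column cannot also carry a DEFAULT, so the
		// default is ignored on this branch.
		fmt.Fprintf(&b, " GENERATED ALWAYS AS (%s) STORED", c.Generated)
		if !c.Nullable {
			b.WriteString(" NOT NULL")
		}
		return b.String()
	}
	if !c.Nullable {
		b.WriteString(" NOT NULL")
	}
	if c.Default != "" {
		fmt.Fprintf(&b, " DEFAULT %s", c.Default)
	}
	return b.String()
}

func main() {
	cols := []column{
		{Name: "status", DDLType: "varchar(20)", Default: "'active'"},
		{Name: "total", DDLType: "numeric(10,2)", Nullable: true, Generated: "subtotal + tax"},
	}
	for _, c := range cols {
		fmt.Println(columnDDL(c))
	}
}
```

Keeping the expressions as opaque strings fits the same-engine-only scope of the command: nothing has to be parsed or translated, only replayed against a target of the same engine.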

```bash
seedstorm clone-schema \
@@ -310,6 +310,8 @@ seedstorm clone-schema \
| `--dry-run` / `-n` | false | Print generated DDL, do not execute |
| `--interactive` / `-i` | false | Confirm the clone in the terminal UI |

Boundaries: `clone-schema` is same-engine only. It does not attempt cross-engine translation, and it does not clone views, triggers, functions/procedures, partial/expression indexes, grants, ownership, or non-public/non-current schemas.

---

## `serve`
1 change: 1 addition & 0 deletions docs/schema.md
@@ -46,6 +46,7 @@ tables:
| `faker` | Faker hint (see table below). Auto-assigned by `introspect`; overridden by `ai-enrich`. |
| `nullable` | `true` if the column allows NULL |
| `unique` | `true` if the column has a UNIQUE constraint (auto-sets faker to `uuid`) |
| `generated` | `true` for database-generated columns; seedstorm keeps them out of generated INSERT rows |

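The `generated` flag matters at INSERT time: the database computes those values itself and typically rejects explicit values for them. A minimal Go sketch of filtering them out when building an INSERT statement is shown below. The `column` struct, `insertColumns`, and `insertSQL` are hypothetical names for illustration, and the `?` placeholders follow MySQL style (Postgres drivers usually expect `$1`, `$2`, and so on).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// column mirrors only the schema fields this sketch needs (hypothetical type).
type column struct {
	Type      string
	Generated bool
}

// insertColumns returns the sorted names of columns that may receive values,
// leaving out database-generated ones.
func insertColumns(cols map[string]column) []string {
	names := make([]string, 0, len(cols))
	for name, c := range cols {
		if c.Generated {
			continue // the database computes this value itself
		}
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// insertSQL builds a parameterized INSERT for the given column names.
func insertSQL(table string, cols []string) string {
	placeholders := make([]string, len(cols))
	for i := range cols {
		placeholders[i] = "?"
	}
	return fmt.Sprintf("INSERT INTO %s (%s) VALUES (%s)",
		table, strings.Join(cols, ", "), strings.Join(placeholders, ", "))
}

func main() {
	orders := map[string]column{
		"id":       {Type: "integer"},
		"user_id":  {Type: "integer"},
		"subtotal": {Type: "numeric(10,2)"},
		"tax":      {Type: "numeric(10,2)"},
		"total":    {Type: "numeric(10,2)", Generated: true},
		"quantity": {Type: "integer"},
	}
	fmt.Println(insertSQL("clone_orders", insertColumns(orders)))
}
```
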
## Faker Hints Reference

151 changes: 124 additions & 27 deletions integration/integration_test.go
@@ -187,13 +187,21 @@ func cloneSmokeSchema(t *testing.T, driver string, conn *sql.DB) {
CREATE TABLE clone_users (
id integer PRIMARY KEY,
email varchar(255) NOT NULL UNIQUE,
status varchar(20) NOT NULL CHECK (status IN ('active', 'blocked'))
status varchar(20) NOT NULL DEFAULT 'active' CHECK (status IN ('active', 'blocked')),
full_label varchar(300) GENERATED ALWAYS AS (email || ':' || status) STORED
);
CREATE TABLE clone_orders (
id integer PRIMARY KEY,
user_id integer NOT NULL REFERENCES clone_users(id),
total integer NOT NULL CHECK (total BETWEEN 1 AND 500)
subtotal numeric(10,2) NOT NULL DEFAULT 10.00,
tax numeric(10,2) NOT NULL DEFAULT 0.00,
total numeric(10,2) GENERATED ALWAYS AS (subtotal + tax) STORED,
quantity integer NOT NULL CHECK (quantity BETWEEN 1 AND 500)
);
CREATE INDEX idx_clone_orders_user_total ON clone_orders(user_id, total);
CREATE UNIQUE INDEX uq_clone_users_status_email ON clone_users(status, email);
COMMENT ON TABLE clone_users IS 'clone source users';
COMMENT ON COLUMN clone_users.status IS 'workflow state';
`)
return
}
@@ -205,14 +213,22 @@ func cloneSmokeSchema(t *testing.T, driver string, conn *sql.DB) {
CREATE TABLE clone_users (
id integer PRIMARY KEY,
email varchar(255) NOT NULL UNIQUE,
status varchar(20) NOT NULL CHECK (status IN ('active', 'blocked'))
);
status varchar(20) NOT NULL DEFAULT 'active' CHECK (status IN ('active', 'blocked')),
full_label varchar(300) GENERATED ALWAYS AS (concat(email, ':', status)) STORED,
UNIQUE KEY uq_clone_users_status_email (status, email)
) COMMENT='clone source users';
ALTER TABLE clone_users MODIFY status varchar(20) NOT NULL DEFAULT 'active' COMMENT 'workflow state';
CREATE INDEX idx_clone_users_status ON clone_users(status);
CREATE TABLE clone_orders (
id integer PRIMARY KEY,
user_id integer NOT NULL,
total integer NOT NULL CHECK (total BETWEEN 1 AND 500),
subtotal decimal(10,2) NOT NULL DEFAULT 10.00,
tax decimal(10,2) NOT NULL DEFAULT 0.00,
total decimal(10,2) GENERATED ALWAYS AS (subtotal + tax) STORED,
quantity integer NOT NULL CHECK (quantity BETWEEN 1 AND 500),
FOREIGN KEY (user_id) REFERENCES clone_users(id)
);
CREATE INDEX idx_clone_orders_user_total ON clone_orders(user_id, total);
`)
}

@@ -232,10 +248,11 @@ func assertCloneSchemaCanSeed(t *testing.T, driver string, conn *sql.DB, tables
st := schema.Table{Columns: make(map[string]schema.Column, len(tbl.Columns))}
for _, col := range tbl.Columns {
sc := schema.Column{
Type: col.Type,
PK: col.IsPK,
Nullable: col.IsNullable,
Faker: faker.MapColumnToFaker(driver, col),
Type: col.Type,
PK: col.IsPK,
Nullable: col.IsNullable,
Generated: col.Generated != "",
Faker: faker.MapColumnToFaker(driver, col),
}
if col.Name == "email" {
sc.Faker = "email"
@@ -272,6 +289,80 @@ func assertCloneSchemaCanSeed(t *testing.T, driver string, conn *sql.DB, tables
}
}

func assertCloneMetadata(t *testing.T, tables []db.Table) {
t.Helper()
byName := make(map[string]db.Table, len(tables))
for _, tbl := range tables {
byName[tbl.Name] = tbl
}
users, ok := byName["clone_users"]
if !ok {
t.Fatal("clone_users missing from cloned metadata")
}
if users.Comment != "clone source users" {
t.Fatalf("clone_users comment = %q", users.Comment)
}
var status, label db.Column
for _, col := range users.Columns {
switch col.Name {
case "status":
status = col
case "full_label":
label = col
}
}
if status.Default == "" {
t.Fatal("clone_users.status default was not preserved")
}
if status.Comment != "workflow state" {
t.Fatalf("clone_users.status comment = %q", status.Comment)
}
if label.Generated == "" {
t.Fatal("clone_users.full_label generated expression was not preserved")
}
if !hasIndex(users.Indexes, "uq_clone_users_status_email", true, []string{"status", "email"}) {
t.Fatalf("multi-column unique index not preserved: %#v", users.Indexes)
}
orders := byName["clone_orders"]
if !hasIndex(orders.Indexes, "idx_clone_orders_user_total", false, []string{"user_id", "total"}) {
t.Fatalf("multi-column index not preserved: %#v", orders.Indexes)
}
var subtotal, total db.Column
for _, col := range orders.Columns {
switch col.Name {
case "subtotal":
subtotal = col
case "total":
total = col
}
}
if subtotal.Default == "" {
t.Fatal("clone_orders.subtotal default was not preserved")
}
if total.Generated == "" {
t.Fatal("clone_orders.total generated expression was not preserved")
}
}

func hasIndex(indexes []db.Index, name string, unique bool, columns []string) bool {
for _, idx := range indexes {
if idx.Name != name || idx.Unique != unique || len(idx.Columns) != len(columns) {
continue
}
match := true
for i := range columns {
if idx.Columns[i] != columns[i] {
match = false
break
}
}
if match {
return true
}
}
return false
}

// buildAndSeed runs the full introspect → build schema → generate → seed pipeline.
// It prints a summary at the end (not per-row during insert).
func buildAndSeed(t *testing.T, label, driver, dsn string, conn *sql.DB) map[string][]map[string]interface{} {
@@ -292,10 +383,11 @@ func buildAndSeed(t *testing.T, label, driver, dsn string, conn *sql.DB) map[str
st := schema.Table{Columns: make(map[string]schema.Column, len(tbl.Columns))}
for _, col := range tbl.Columns {
sc := schema.Column{
Type: col.Type,
PK: col.IsPK,
Nullable: col.IsNullable,
Faker: faker.MapColumnToFaker(driver, col),
Type: col.Type,
PK: col.IsPK,
Nullable: col.IsNullable,
Generated: col.Generated != "",
Faker: faker.MapColumnToFaker(driver, col),
}
if col.FK != nil {
sc.FK = fmt.Sprintf("%s.%s", col.FK.TableName, col.FK.ColumnName)
@@ -1508,7 +1600,7 @@ func TestPostgresIntegration(t *testing.T) {
for _, tbl := range tables {
st := schema.Table{Columns: make(map[string]schema.Column, len(tbl.Columns))}
for _, col := range tbl.Columns {
sc := schema.Column{Type: col.Type, PK: col.IsPK, Nullable: col.IsNullable}
sc := schema.Column{Type: col.Type, PK: col.IsPK, Nullable: col.IsNullable, Generated: col.Generated != ""}
if col.FK != nil {
sc.FK = fmt.Sprintf("%s.%s", col.FK.TableName, col.FK.ColumnName)
}
@@ -1603,6 +1695,7 @@ func TestPostgresSchemaCloneDDL(t *testing.T) {
if len(cloned) != 2 {
t.Fatalf("cloned tables = %d, want 2", len(cloned))
}
assertCloneMetadata(t, cloned)
assertCloneSchemaCanSeed(t, postgresDriver, conn, cloned)
dropCloneSmokeSchema(t, postgresDriver, conn)
}
@@ -2744,7 +2837,7 @@ func TestMySQLIntegration(t *testing.T) {
for _, tbl := range tables {
st := schema.Table{Columns: make(map[string]schema.Column, len(tbl.Columns))}
for _, col := range tbl.Columns {
sc := schema.Column{Type: col.Type, PK: col.IsPK, Nullable: col.IsNullable}
sc := schema.Column{Type: col.Type, PK: col.IsPK, Nullable: col.IsNullable, Generated: col.Generated != ""}
if col.FK != nil {
sc.FK = fmt.Sprintf("%s.%s", col.FK.TableName, col.FK.ColumnName)
}
@@ -2839,6 +2932,7 @@ func TestMySQLSchemaCloneDDL(t *testing.T) {
if len(cloned) != 2 {
t.Fatalf("cloned tables = %d, want 2", len(cloned))
}
assertCloneMetadata(t, cloned)
assertCloneSchemaCanSeed(t, mysqlDriver, conn, cloned)
dropCloneSmokeSchema(t, mysqlDriver, conn)
}
@@ -2882,10 +2976,11 @@ func seedL0(t *testing.T, driver, dsn string, conn *sql.DB) {
st := schema.Table{Columns: make(map[string]schema.Column, len(tbl.Columns))}
for _, col := range tbl.Columns {
sc := schema.Column{
Type: col.Type,
PK: col.IsPK,
Nullable: col.IsNullable,
Faker: faker.MapColumnToFaker(driver, col),
Type: col.Type,
PK: col.IsPK,
Nullable: col.IsNullable,
Generated: col.Generated != "",
Faker: faker.MapColumnToFaker(driver, col),
}
if col.FK != nil {
sc.FK = fmt.Sprintf("%s.%s", col.FK.TableName, col.FK.ColumnName)
@@ -3001,10 +3096,11 @@ func TestPostgresGaps(t *testing.T) {
st := schema.Table{Columns: make(map[string]schema.Column, len(tbl.Columns))}
for _, col := range tbl.Columns {
sc := schema.Column{
Type: col.Type,
PK: col.IsPK,
Nullable: col.IsNullable,
Faker: faker.MapColumnToFaker(postgresDriver, col),
Type: col.Type,
PK: col.IsPK,
Nullable: col.IsNullable,
Generated: col.Generated != "",
Faker: faker.MapColumnToFaker(postgresDriver, col),
}
if col.FK != nil {
sc.FK = fmt.Sprintf("%s.%s", col.FK.TableName, col.FK.ColumnName)
@@ -3185,10 +3281,11 @@ func TestMySQLGaps(t *testing.T) {
st := schema.Table{Columns: make(map[string]schema.Column, len(tbl.Columns))}
for _, col := range tbl.Columns {
sc := schema.Column{
Type: col.Type,
PK: col.IsPK,
Nullable: col.IsNullable,
Faker: faker.MapColumnToFaker(mysqlDriver, col),
Type: col.Type,
PK: col.IsPK,
Nullable: col.IsNullable,
Generated: col.Generated != "",
Faker: faker.MapColumnToFaker(mysqlDriver, col),
}
if col.FK != nil {
sc.FK = fmt.Sprintf("%s.%s", col.FK.TableName, col.FK.ColumnName)
9 changes: 5 additions & 4 deletions internal/cli/introspect.go
@@ -67,10 +67,11 @@ Outputs a schema.yaml that can be used for seeding or AI enrichment.`,
}
for _, c := range t.Columns {
sc := schema.Column{
Type: c.Type,
PK: c.IsPK,
Nullable: c.IsNullable,
Faker: faker.MapColumnToFaker(dbType, c),
Type: c.Type,
PK: c.IsPK,
Nullable: c.IsNullable,
Generated: c.Generated != "",
Faker: faker.MapColumnToFaker(dbType, c),
}
if c.FK != nil {
sc.FK = fmt.Sprintf("%s.%s", c.FK.TableName, c.FK.ColumnName)