This package implements a schema migration system that compares an ORM model against a live database and generates the SQL commands needed to bring the database into sync with the ORM.
The system is built around a normalized JSON structure that acts as a database-agnostic contract between three actors:
- ORM Extractor — reads the application model and produces the JSON
- DB Extractor — reads the actual database and produces the same JSON
- Diff Engine + Command Builder — compares the two JSONs and generates SQL
Because both extractors produce the same format, the diff engine does not need to know anything about ORM internals or database-specific catalogs.
- Every object is an entity with `entity`, `entity_name` and `attributes`
- The hierarchy is navigable via nested dictionaries
- Names of constraints, indexes and FKs are hashed for deterministic comparison
- Attributes are cleaned (no None, no False, no empty values) to avoid spurious diffs
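As a sketch of the cleaning invariant (the helper name `clean_attributes` is hypothetical, not the actual function in the package):

```python
def clean_attributes(attributes):
    """Drop None, False and empty values so that an absent attribute
    and an attribute set to its 'nothing' value compare equal
    across the two extractors, avoiding spurious diffs."""
    return {k: v for k, v in attributes.items()
            if v is not None and v is not False and v != "" and v != {} and v != []}
```

Both extractors would apply the same cleaning before the structures reach the diff engine.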
root
├── entity: "db"
├── entity_name: <database_name>
├── schemas: {<schema_name>: <schema_item>}
├── extensions: {<ext_name>: <extension_item>}
└── event_triggers: {<trigger_name>: <event_trigger_item>}
schema_item
├── entity: "schema"
├── entity_name: <schema_name>
└── tables: {<table_name>: <table_item>}
table_item
├── entity: "table"
├── entity_name: <table_name>
├── attributes: {pkeys: "col1,col2" | None}
├── columns: {<col_name>: <column_item>}
├── relations: {<hashed_fk_name>: <relation_item>}
├── constraints: {<hashed_cst_name>: <constraint_item>}
└── indexes: {<hashed_idx_name>: <index_item>}
column_item
├── entity: "column"
├── entity_name: <column_name>
└── attributes:
├── dtype: str — Genropy internal type (T, I, N, R, D, DHZ, etc.)
├── sql_type: str — SQL standard type (text, integer, numeric, etc.)
├── size: str|None — "100", "10,2", "0:200"
├── notnull: bool|"_auto_" — True, or "_auto_" for PK columns
├── sqldefault: str|None — SQL DEFAULT expression
├── unique: bool|None
├── extra_sql: str|None — extra SQL appended to column definition
└── generated_expression: str|None — GENERATED ALWAYS AS expression
relation_item (foreign key)
├── entity: "relation"
├── entity_name: <hashed_name>
└── attributes:
├── columns: [str] — source columns
├── related_table: str — target table name
├── related_schema: str — target schema name
├── related_columns: [str] — target columns
├── constraint_name: str — actual constraint name
├── constraint_type: "FOREIGN KEY"
├── on_delete: str|None — RESTRICT, CASCADE, SET NULL, SET DEFAULT, NO ACTION
├── on_update: str|None
├── deferrable: bool|None
└── initially_deferred: bool|None
constraint_item
├── entity: "constraint"
├── entity_name: <hashed_name>
└── attributes:
├── columns: [str]
├── constraint_name: str
└── constraint_type: "UNIQUE"|"CHECK"
index_item
├── entity: "index"
├── entity_name: <hashed_name>
└── attributes:
├── columns: {col_name: sort_order|None}
├── index_name: str
├── method: str|None — btree, gin, gist, brin, hash
├── with_options: dict|None
├── tablespace: str|None
├── unique: bool|None
└── where: str|None — partial index condition
extension_item
├── entity: "extension"
├── entity_name: <extension_name>
└── attributes: {}
event_trigger_item
├── entity: "event_trigger"
├── entity_name: <trigger_name>
└── attributes: {<event_trigger_specific_attrs>}
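Putting the pieces together, a minimal normalized structure for a one-table database might look like this (all names are illustrative):

```python
# Minimal normalized JSON structure: one schema, one table, one column.
structure = {
    "entity": "db",
    "entity_name": "mydb",
    "schemas": {
        "public": {
            "entity": "schema",
            "entity_name": "public",
            "tables": {
                "customer": {
                    "entity": "table",
                    "entity_name": "customer",
                    "attributes": {"pkeys": "id"},
                    "columns": {
                        "id": {
                            "entity": "column",
                            "entity_name": "id",
                            "attributes": {"dtype": "T", "sql_type": "text",
                                           "notnull": "_auto_"},
                        },
                    },
                    "relations": {},
                    "constraints": {},
                    "indexes": {},
                },
            },
        },
    },
    "extensions": {},
    "event_triggers": {},
}
```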
Column attributes are filtered through COL_JSON_KEYS to ensure only
comparable attributes are kept. Both extractors apply the same filter,
guaranteeing that the diff engine sees homogeneous data.
Constraints, foreign keys and indexes use deterministic hashed names
({type}_{md5_8chars}) computed from schema + table + columns + type.
This allows the diff engine to match entities across ORM and DB
regardless of the actual constraint name in the database.
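A sketch of the deterministic naming scheme (the exact field order, separator and column sorting used by `structures.py` are assumptions):

```python
import hashlib

def hashed_name(entity_type, schema, table, columns, constraint_type):
    """Build a deterministic {type}_{md5_8chars} identifier from the
    entity's identity, so the ORM and DB extractors produce the same
    key regardless of the real constraint name in the catalog."""
    key = "|".join([schema, table, ",".join(sorted(columns)), constraint_type])
    digest = hashlib.md5(key.encode()).hexdigest()[:8]
    return f"{entity_type}_{digest}"
```

Because the name is derived only from the entity's identity, the same constraint yields the same key on both sides of the diff.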
The architecture is designed for extension. Adding a new entity type requires:
- A factory function in `structures.py` (e.g. `new_function_item()`)
- Extraction logic in the ORM and/or DB extractor
- Handler methods in the command builder (`added_<entity>`, `changed_<entity>`, `removed_<entity>`)
- An entry in `ENTITY_TREE` if the entity is part of the navigable hierarchy
The diff engine and executor work generically and require no changes.
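For example, the factory step could look like this sketch of `new_function_item()` (the name is the one suggested above; its exact signature and defaults are assumptions):

```python
def new_function_item(name, language="plpgsql", return_type="void",
                      arguments="", body="", volatility="VOLATILE"):
    """Factory producing a function_item in the shared
    entity/entity_name/attributes shape used by both extractors."""
    return {
        "entity": "function",
        "entity_name": name,
        "attributes": {
            "language": language,
            "return_type": return_type,
            "arguments": arguments,
            "body": body,
            "volatility": volatility,
        },
    }
```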
The following entities are not yet implemented but the structure naturally
accommodates them. Each would follow the same entity/entity_name/attributes
pattern.
schema_item
└── functions: {<func_name>: <function_item>}
function_item
├── entity: "function"
├── entity_name: <function_name>
└── attributes:
├── language: str — plpgsql, sql, python, etc.
├── return_type: str
├── arguments: str — full argument signature
├── body: str — function body (hash for comparison)
├── volatility: str — VOLATILE, STABLE, IMMUTABLE
├── security: str|None — SECURITY DEFINER | INVOKER
└── is_procedure: bool — True for PROCEDURE, False for FUNCTION
ORM counterpart needed: a way to declare functions in the Genropy model,
possibly via a @sql_function decorator or a functions section in the
package/table model.
schema_item
└── views: {<view_name>: <view_item>}
view_item
├── entity: "view"
├── entity_name: <view_name>
└── attributes:
├── definition: str — SELECT query (hash for comparison)
├── materialized: bool — True for MATERIALIZED VIEW
├── columns: [str] — output column names
└── with_data: bool|None — WITH DATA for materialized views
ORM counterpart needed: a views section in the package model with
the SQL definition as a string or callable.
table_item
└── triggers: {<trigger_name>: <trigger_item>}
trigger_item
├── entity: "trigger"
├── entity_name: <trigger_name>
└── attributes:
├── timing: str — BEFORE, AFTER, INSTEAD OF
├── events: [str] — [INSERT, UPDATE, DELETE, TRUNCATE]
├── for_each: str — ROW | STATEMENT
├── function_name: str — function to execute
├── function_schema: str
├── condition: str|None — WHEN clause
└── arguments: str|None — arguments passed to the function
ORM counterpart needed: a trigger declaration in the table model, referencing a function entity.
schema_item
└── types: {<type_name>: <type_item>}
type_item
├── entity: "type"
├── entity_name: <type_name>
└── attributes:
├── type_kind: str — ENUM, COMPOSITE, DOMAIN, RANGE
├── enum_values: [str]|None — for ENUM types
├── columns: dict|None — for COMPOSITE types {name: type}
├── base_type: str|None — for DOMAIN types
└── constraint: str|None — for DOMAIN types (CHECK)
ORM counterpart needed: a types section in the package model.
ENUM types are the most common use case.
schema_item
└── sequences: {<seq_name>: <sequence_item>}
sequence_item
├── entity: "sequence"
├── entity_name: <sequence_name>
└── attributes:
├── start_value: int
├── increment: int
├── min_value: int|None
├── max_value: int|None
├── cycle: bool
└── owned_by: str|None — schema.table.column
Note: serial columns and GENERATED AS IDENTITY implicitly create sequences. Only standalone sequences (not tied to a column) need this.
These are typically managed outside the schema migration workflow:
- Permissions (GRANT/REVOKE) — managed by deployment/ops tooling
- Tablespaces — infrastructure-level, not schema-level
- Publications/Subscriptions — logical replication configuration
- Foreign Data Wrappers — external data source configuration
- Collations — rarely changed after DB creation
Decision: The normalized JSON structure uses SQL standard types
(integer, text, timestamp with time zone, etc.) instead of Genropy
internal dtypes (I, T, DHZ, etc.).
Rationale:
- The JSON structure is meant to be a database-agnostic contract.
  SQL standard types (`information_schema` types) are universally understood and self-documenting.
- Genropy dtypes are framework-specific and cause information loss:
  `smallint` and `integer` both map to `I`, `real` and `double precision` both map to `R`.
- A standalone migration package (`genro-sqlmigration`) must work without any
  knowledge of Genropy internals. Using SQL standard types makes the JSON format truly portable.
Implementation:
- `sql_type` is added alongside `dtype`, not as a replacement. `dtype` remains
  the primary Genropy attribute for internal use (UI, validation, application logic).
- The column attribute used for type comparison is `sql_type` (included in
  `COL_JSON_KEYS`). `dtype` is excluded from `COL_JSON_KEYS` and does not participate in the diff.
- The DB extractor returns the `data_type` from `information_schema` directly
  as `sql_type`, without converting to Genropy dtypes.
- The ORM extractor keeps the original `dtype` and adds `sql_type` by translating
  via `revTypesDict` from the base adapter (`_gnrbaseadapter.py`). This keeps the
  mapping in one place within Genropy, without duplicating it in the migration package.
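A sketch of the ORM-side translation (the mapping shown here is a small illustrative subset, not the full `revTypesDict` from `_gnrbaseadapter.py`):

```python
# Illustrative subset of the Genropy dtype -> SQL standard type mapping.
revTypesDict = {"T": "text", "I": "integer", "N": "numeric",
                "R": "double precision", "D": "date",
                "DHZ": "timestamp with time zone"}

def add_sql_type(column_item):
    """Keep the original Genropy dtype and add the portable sql_type."""
    attrs = column_item["attributes"]
    attrs["sql_type"] = revTypesDict[attrs["dtype"]]
    return column_item
```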
Consequences:
- The diff engine compares `sql_type` values, which are human-readable and unambiguous.
- The command builder receives SQL standard types and can generate DDL directly,
  without needing `revTypesDict` for type resolution.
- `TYPE_CONVERSIONS` keys will use SQL standard type pairs instead of Genropy dtype pairs.
- The standalone `genro-sqlmigration` package receives JSON with `sql_type` already
  in SQL standard format; it is Genropy's responsibility (via its ORM extractor)
  to produce the correct mapping.
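For instance, a `TYPE_CONVERSIONS` table keyed on SQL standard type pairs might look like the following sketch (the entries and classification values are illustrative assumptions, not the package's actual table):

```python
# Hypothetical classification of column type changes, keyed on
# (old_sql_type, new_sql_type) pairs in SQL standard form.
TYPE_CONVERSIONS = {
    ("smallint", "integer"): "safe",
    ("integer", "bigint"): "safe",
    ("integer", "text"): "needs_using",   # would require a USING clause
    ("text", "integer"): "unsafe",
}
```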
The constraint_item already has `constraint_type: "CHECK"` and the factory
accepts a `check_clause` parameter, but `process_constraints` in `db_extractor.py`
silently ignores CHECK constraints. Implementation steps:
- In `db_extractor.py`, process CHECK constraints like UNIQUE constraints
- Add `check_clause` to the constraint attributes
- In `command_builder.py`, handle CHECK in `added_constraint` and `changed_constraint`
- In the ORM, allow declaring CHECK constraints on tables
Database comments are useful for documentation and can be extracted from
pg_catalog.pg_description. They could be stored as an additional column
attribute:
```python
COL_JSON_KEYS = (
    "sql_type", "notnull", "sqldefault", "size",
    "unique", "extra_sql", "generated_expression",
    "comment",  # <-- new
)
```

The command builder would generate `COMMENT ON COLUMN schema.table.column IS '...'`
for added/changed comments.
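A minimal sketch of that generation step (the function name is hypothetical):

```python
def comment_on_column_sql(schema, table, column, comment):
    """Render COMMENT ON COLUMN, doubling single quotes for SQL."""
    escaped = comment.replace("'", "''")
    return f"COMMENT ON COLUMN {schema}.{table}.{column} IS '{escaped}'"
```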
ORM counterpart: the column `doc` attribute in Genropy could map to this.
Similarly, table-level comments could be an attribute of table_item:

```python
table_item["attributes"]["comment"] = "Description of the table"
```

The current `ENTITY_TREE` only describes the navigable hierarchy for the
`json_to_tree` UI function. When new entity types are added, it needs
to be updated:
```python
ENTITY_TREE = {
    'schemas': {
        'tables': {
            'columns': None,
            'relations': None,
            'constraints': None,
            'indexes': None,
            'triggers': None,    # future
        },
        'views': None,           # future
        'functions': None,       # future
        'types': None,           # future
        'sequences': None,       # future
    }
}
```

Currently there is no validation of the JSON structure. A `validate_structure()`
function could verify that:
- Every entity has `entity`, `entity_name` and `attributes`
- Column `sql_type` values are valid SQL standard types
- FK targets exist in the structure
- Hashed names are consistent
This would help catch bugs in custom extractors (especially for the standalone
genro-sqlmigration package where third-party extractors may produce the JSON).
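A minimal sketch of such a validator, checking only the first invariant (the recursive walk over child containers is an assumption; the attributes check is relaxed because root and schema items carry no `attributes` in the hierarchy above):

```python
# Container keys that hold child entities in the normalized structure.
CHILD_KEYS = ("schemas", "tables", "columns", "relations",
              "constraints", "indexes", "extensions", "event_triggers")

def validate_structure(item, path="root"):
    """Recursively check that every entity carries the mandatory
    identity keys; return a list of human-readable problems."""
    problems = []
    for key in ("entity", "entity_name"):
        if key not in item:
            problems.append(f"{path}: missing '{key}'")
    for child_key in CHILD_KEYS:
        for name, child in item.get(child_key, {}).items():
            problems.extend(validate_structure(child, f"{path}.{child_key}.{name}"))
    return problems
```

The remaining checks (valid `sql_type` values, FK targets, hash consistency) would extend the same walk.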
| Module | Responsibility |
|---|---|
| `structures.py` | Constants, factory functions, utilities |
| `orm_extractor.py` | ORM model → normalized JSON |
| `db_extractor.py` | Live database → normalized JSON (via adapter) |
| `diff_engine.py` | Compare two JSONs → typed events (added/changed/removed) |
| `command_builder.py` | Events → SQL command fragments |
| `executor.py` | Assemble and execute SQL, verify backups |
| `migrator.py` | Orchestrator composing all components |