feat(wren-core): support composite (multi-column) primary keys#2345
feat(wren-core): support composite (multi-column) primary keys#2345goldmedal wants to merge 3 commits into
Conversation
MDL models could only declare a single primary-key column. PR #2323 made relationship JOIN conditions handle composite keys, but the `primary_key` field itself stayed single-column, so calculated fields over a TO_MANY relationship could not work for composite-keyed models, and the OSI importer had to downgrade composite keys to one column. Represent the primary key as a `#[serde(untagged)]` enum so a JSON string stays a string (backward compatible) and an array is the composite form. The calculation join now emits `pk1 = pk1 AND pk2 = pk2`, reusing the existing downstream `collect_join_keys` splitter. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (4)
WalkthroughAdds composite primary key support across the stack: MDL schema accepts arrays, a typed ChangesComposite Primary Key Type System and Query Planning
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/wren/src/wren/osi.py (1)
432-499:⚠️ Potential issue | 🟠 Major | ⚡ Quick winValidate OSI primary-key members against dataset fields before emitting MDL.
This branch preserves composite keys, but it never checks that each
pk_namesmember exists inds.fields. Withprimary_key: ["id", "missing"], the emitted model still carries["id", "missing"]while onlyidcan ever be marked on a field, solint_osi_file()misses a broken PK definition and the converter produces an inconsistent manifest.🛠️ Suggested check
# Convert fields columns: list[dict] = [] untyped: list[str] = [] sm_column_types = wren_cfg.column_types.get(name) or {} @@ columns.append(col) if type_override is None: dim = f.get("dimension") if not (isinstance(dim, dict) and dim.get("is_time")): untyped.append(fname) + + field_names = {column["name"] for column in columns} + invalid_pks = [pk_name for pk_name in pk_names if pk_name not in field_names] + for pk_name in invalid_pks: + errors.append( + ValidationError( + "error", + f"dataset '{name}'", + f"primary_key '{pk_name}' not found in fields", + ) + ) + if invalid_pks: + pk_names = [] @@ if pk_names: model["primary_key"] = pk_names[0] if len(pk_names) == 1 else pk_names🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@core/wren/src/wren/osi.py` around lines 432 - 499, The code builds pk_names but never verifies those names exist in the dataset fields, so add a validation step (before the final primary_key assignment) that collects actual field names from ds (e.g., set of fname values from the for f in ds.get("fields") loop or from the built columns list), then compute valid_pk = [p for p in pk_names if p in field_name_set] and missing_pk = [p for p in pk_names if p not in field_name_set]; if missing_pk is non-empty append a ValidationError (same shape as other errors) indicating the missing PK members for dataset name, and set model["primary_key"] using valid_pk (single value if len==1, list if >1); if valid_pk is empty omit model["primary_key"] entirely. Ensure you reference pk_names, ds.get("fields"), model["primary_key"], and ValidationError when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@core/wren-core-base/src/mdl/builder.rs`:
- Around line 146-156: The builder currently allows empty primary-key values
which breaks Model.primary_key invariants; update the methods primary_key and
primary_keys to validate inputs: in primary_key(&str) assert the provided string
is non-empty (panic with a clear message like "primary_key must be non-empty")
before setting PrimaryKey::Single, and in primary_keys(&[&str]) assert the slice
is non-empty and that none of the entries are empty (panic with messages like
"primary_keys must contain at least one column" and "primary_keys entries must
be non-empty") before setting PrimaryKey::Composite; this preserves in-memory
invariants for Model.primary_key.
In `@core/wren-mdl/mdl.schema.json`:
- Line 314: Replace the typo "OEN_TO_MANY" with the correct "ONE_TO_MANY" in the
model description string that begins "the primary key of the model. A single
column (string) or a composite key (array of columns)..." inside the
mdl.schema.json so the relationship name is spelled correctly; update the
description text exactly where that string is defined (search for the phrase
"the primary key of the model" to locate the description) and save the schema.
- Around line 315-323: The schema's oneOf branch currently permits empty strings
for primary-key column names; update the JSON schema in
core/wren-mdl/mdl.schema.json so the string variant includes "minLength": 1 and
the array variant's "items" schema also includes "minLength": 1 (i.e., add
"minLength":1 to the object with "type":"string" and to the "items" object under
the array) to prevent "" as a valid column name.
In `@core/wren/src/wren/context.py`:
- Around line 866-876: The primary_key normalization in validate_project
currently assumes non-string truthy values are iterable, causing TypeError and
allowing invalid types; change the logic around model.get("primary_key") (pk)
and pk_cols so you first validate shape: accept only a non-empty str or a
list/tuple whose every element is a non-empty str; if pk is any other type or a
list containing non-str/empty elements, append a ValidationError (same pattern
as existing errors) indicating malformed primary_key and skip iteration; only
after passing these shape checks set pk_cols to [pk] for strings or the
validated list for lists and then iterate over pk_cols to check membership in
col_names.
---
Outside diff comments:
In `@core/wren/src/wren/osi.py`:
- Around line 432-499: The code builds pk_names but never verifies those names
exist in the dataset fields, so add a validation step (before the final
primary_key assignment) that collects actual field names from ds (e.g., set of
fname values from the for f in ds.get("fields") loop or from the built columns
list), then compute valid_pk = [p for p in pk_names if p in field_name_set] and
missing_pk = [p for p in pk_names if p not in field_name_set]; if missing_pk is
non-empty append a ValidationError (same shape as other errors) indicating the
missing PK members for dataset name, and set model["primary_key"] using valid_pk
(single value if len==1, list if >1); if valid_pk is empty omit
model["primary_key"] entirely. Ensure you reference pk_names, ds.get("fields"),
model["primary_key"], and ValidationError when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: d65b5f91-0a7d-43e6-a7b1-ad2e59b4caf3
📒 Files selected for processing (11)
core/wren-core-base/manifest-macro/src/lib.rscore/wren-core-base/src/mdl/builder.rscore/wren-core-base/src/mdl/manifest.rscore/wren-core/core/src/logical_plan/analyze/model_generation.rscore/wren-core/core/src/logical_plan/analyze/plan.rscore/wren-core/core/src/mdl/mod.rscore/wren-mdl/mdl.schema.jsoncore/wren/src/wren/context.pycore/wren/src/wren/osi.pycore/wren/tests/unit/test_context.pycore/wren/tests/unit/test_osi.py
A manifest using an array `primaryKey` is forward-incompatible: older engines expecting `Option<String>` fail to deserialize it. Mirror the `dialect` precedent so the new wire form is reflected in the versions. - MAX_SUPPORTED_LAYOUT_VERSION 2 -> 3, no-op migrate_v2_to_v3 - schema_version 4 (-> layoutVersion 3); validate warns when a list-form primary_key is used below schema_version 4 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- builder: reject empty/blank PK column names; single-element primary_keys() collapses to PrimaryKey::Single - mdl.schema.json: minLength 1 on PK string + array items; fix OEN_TO_MANY -> ONE_TO_MANY typo - context.py: reject malformed primary_key shapes (non-str/list, empty entries) instead of crashing on TypeError Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
MDL models could previously declare only a single primary-key column (
primaryKey: Option<String>). #2323 taught relationship JOIN conditions to handle a conjunction of equalities, but theprimary_keyfield itself stayed single-column. As a result:[a, b]) to one column with a warning.This PR adds first-class composite primary key support.
Approach
Model.primary_keybecomes a#[serde(untagged)]enum:"primaryKey": "c_custkey"stay a string — fully backward compatible."primaryKey": ["ps_partkey", "ps_suppkey"].The calculation join now emits
pk1 = pk1 AND pk2 = pk2, reusing the existing downstreamcollect_join_keyssplitter from #2323.Changes
Rust core
wren-core-base: newPrimaryKeyenum;primary_key()returns the first column (back-compat), newprimary_keys() -> Vec<&str>;ModelBuilder::primary_keys(&[&str]).wren-core/plan.rs: the three PK consumption sites (TO_MANY required fields, calc join condition,CalculationPlanNodeoutput fields + dimensions) iterate all PK columns.wren-core/model_generation.rs: calculation SQL generation now groups by / projects all PK dimensions (previously hard-codeddimensions[0], which only projected the first PK).Schema & Python
mdl.schema.json:primaryKey→oneOf [string, array].context.py: validation checks every PK column.osi.py: composite keys are now preserved as a list (downgrade warning removed); the WRENprimary_keyoverride is kept as an explicit-narrowing escape hatch.dbt.py: intentionally unchanged — per-columnuniquetests denote independently unique columns, not a composite key.Tests
Single, array →Composite; accessors).GROUP BY p_partkey, p_suppkeyand the join backON ... p_partkey = ... AND ... p_suppkey = ....Verification
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Schema
Tests
Migrations
Versioning (mirrors the
dialectprecedent)An array
primaryKeyis forward-incompatible — an older engine expectingOption<String>fails to deserialize it. Following how the optionaldialectfield bumped the versions:MAX_SUPPORTED_LAYOUT_VERSION2 → 3, with a no-opmigrate_v2_to_v3(single-string PKs remain valid v3).schema_version4 added (→ layoutVersion 3).wren context validatewarns when a list-formprimary_keyis used belowschema_version 4;wren context upgradenow targets 4 by default.mdl.schema.jsonalready acceptsstring | array.