Skip to content

feat(wren-core): support composite (multi-column) primary keys#2345

Open
goldmedal wants to merge 3 commits into
mainfrom
feat/composite-primary-key
Open

feat(wren-core): support composite (multi-column) primary keys#2345
goldmedal wants to merge 3 commits into
mainfrom
feat/composite-primary-key

Conversation

@goldmedal
Copy link
Copy Markdown
Collaborator

@goldmedal goldmedal commented Jun 5, 2026

Summary

MDL models could previously declare only a single primary-key column (primaryKey: Option<String>). #2323 taught relationship JOIN conditions to handle a conjunction of equalities, but the primary_key field itself stayed single-column. As a result:

  • calculated fields over a TO_MANY relationship couldn't work for composite-keyed models, and
  • the OSI importer had to downgrade composite keys ([a, b]) to one column with a warning.

This PR adds first-class composite primary key support.

Approach

Model.primary_key becomes a #[serde(untagged)] enum:

enum PrimaryKey { Single(String), Composite(Vec<String>) }
  • Existing manifests "primaryKey": "c_custkey" stay a string — fully backward compatible.
  • New composite form: "primaryKey": ["ps_partkey", "ps_suppkey"].

The calculation join now emits pk1 = pk1 AND pk2 = pk2, reusing the existing downstream collect_join_keys splitter from #2323.

Changes

Rust core

  • wren-core-base: new PrimaryKey enum; primary_key() returns the first column (back-compat), new primary_keys() -> Vec<&str>; ModelBuilder::primary_keys(&[&str]).
  • wren-core/plan.rs: the three PK consumption sites (TO_MANY required fields, calc join condition, CalculationPlanNode output fields + dimensions) iterate all PK columns.
  • wren-core/model_generation.rs: calculation SQL generation now groups by / projects all PK dimensions (previously hard-coded dimensions[0], which only projected the first PK).

Schema & Python

  • mdl.schema.json: primaryKeyoneOf [string, array].
  • context.py: validation checks every PK column.
  • osi.py: composite keys are now preserved as a list (downgrade warning removed); the WREN primary_key override is kept as an explicit-narrowing escape hatch.
  • dbt.py: intentionally unchanged — per-column unique tests denote independently unique columns, not a composite key.

Tests

  • Rust serde round-trip (string → Single, array → Composite; accessors).
  • Rust snapshot test for a composite-PK TO_MANY calculation — verifies GROUP BY p_partkey, p_suppkey and the join back ON ... p_partkey = ... AND ... p_suppkey = ....
  • OSI: inverted the old downgrade-warning test to assert the composite key is preserved.
  • Python validation tests for composite PK (missing column → error; all present → ok).

Verification

Suite Result
wren-core lib + sqllogictests pass
wren-core clippy / fmt clean
wren-core-base (both feature configs) pass
wren-core-py rust / python 28 / 33 pass
wren unit tests 551 pass, lint clean

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Models now support composite primary keys (single or multiple columns); calculations and joins use all declared key columns.
  • Schema

    • Model schema accepts either a single column name or a non-empty array for primary keys; project handling recognizes the newer schema/layout version (defaults to v4).
  • Tests

    • Added tests for composite PK validation, serialization, migration, and SQL join generation.
  • Migrations

    • Added a no-op migration step for the new layout version.

Versioning (mirrors the dialect precedent)

An array primaryKey is forward-incompatible — an older engine expecting Option<String> fails to deserialize it. Following how the optional dialect field bumped the versions:

  • MAX_SUPPORTED_LAYOUT_VERSION 2 → 3, with a no-op migrate_v2_to_v3 (single-string PKs remain valid v3).
  • schema_version 4 added (→ layoutVersion 3). wren context validate warns when a list-form primary_key is used below schema_version 4; wren context upgrade now targets 4 by default.
  • mdl.schema.json already accepts string | array.

MDL models could only declare a single primary-key column. PR #2323 made
relationship JOIN conditions handle composite keys, but the `primary_key`
field itself stayed single-column, so calculated fields over a TO_MANY
relationship could not work for composite-keyed models, and the OSI
importer had to downgrade composite keys to one column.

Represent the primary key as a `#[serde(untagged)]` enum so a JSON string
stays a string (backward compatible) and an array is the composite form.
The calculation join now emits `pk1 = pk1 AND pk2 = pk2`, reusing the
existing downstream `collect_join_keys` splitter.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added python Pull requests that update Python code rust Pull requests that update rust code core labels Jun 5, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 5, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7098f40a-49a7-4be4-90fd-6db8d97b62ed

📥 Commits

Reviewing files that changed from the base of the PR and between 424aaa5 and f22a627.

📒 Files selected for processing (5)
  • core/wren-core-base/src/mdl/builder.rs
  • core/wren-core-base/src/mdl/manifest.rs
  • core/wren-mdl/mdl.schema.json
  • core/wren/src/wren/context.py
  • core/wren/tests/unit/test_context.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • core/wren-core-base/src/mdl/builder.rs
  • core/wren/tests/unit/test_context.py
  • core/wren-core-base/src/mdl/manifest.rs
  • core/wren/src/wren/context.py

Walkthrough

Adds composite primary key support across the stack: MDL schema accepts arrays, a typed PrimaryKey replaces raw strings in manifests, builders/OSI conversion/validation preserve composites, migrations include a v2→v3 no-op step, and query planning and tests are updated to join on all key columns.

Changes

Composite Primary Key Type System and Query Planning

Layer / File(s) Summary
MDL schema definition for composite keys
core/wren-mdl/mdl.schema.json
primaryKey now accepts either a string or a non-empty array of strings.
PrimaryKey enum and Model API
core/wren-core-base/manifest-macro/src/lib.rs, core/wren-core-base/src/mdl/manifest.rs
Introduce PrimaryKey enum (serde untagged), add PrimaryKey::columns(), change generated Model.primary_key field to Option<PrimaryKey>, update Model::primary_key() and add Model::primary_keys(), and add serde tests.
ModelBuilder API for composite keys
core/wren-core-base/src/mdl/builder.rs
ModelBuilder::primary_key wraps singles as PrimaryKey::Single; new ModelBuilder::primary_keys(&[&str]) produces PrimaryKey::Composite with validation and single-element collapse.
OSI conversion and validation
core/wren/src/wren/osi.py, core/wren/src/wren/context.py, core/wren/tests/unit/*, core/wren/tests/unit/test_context_cli.py
OSI parsing preserves composite primary_key lists or single strings; _convert_field marks fields by membership in a set of primary-key names; validate_project() accepts lists, validates each candidate column, emits compatibility warning for composite keys on schema_version < 4, and CLI/tests updated to expect schema_version 4 as latest.
Manifest migrations
core/wren-core-base/src/mdl/migration.rs
Add v2→v3 migration branch and a documented no-op migrate_v2_to_v3 preserving existing primaryKey forms; add tests for migrating to layoutVersion 3 and preserving composite arrays.
Query planning for composite primary keys
core/wren-core/core/src/logical_plan/analyze/plan.rs, core/wren-core/core/src/logical_plan/analyze/model_generation.rs, core/wren-core/core/src/mdl/mod.rs
Refactor calculation and plan builders to collect all visible primary key columns for TO_MANY joins, expand output schema and dimensions per pk column, build per-column equality predicates combined with AND, and aggregate/group by all rebased dimensions. Add snapshot test asserting joins include all pk equality predicates.
Integration and validation tests
core/wren/tests/unit/test_context.py, core/wren/tests/unit/test_osi.py, core/wren/tests/unit/test_context_cli.py
Add unit tests for composite primary_key validation (missing/present columns, schema-version gating), update CLI upgrade tests to expect default target schema_version 4, and adjust OSI tests to assert composite preservation and field flags.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Canner/WrenAI#2323: Both PRs update join predicate construction to handle multiple key-column equalities.
  • Canner/WrenAI#2322: Extends OSI→MDL conversion to parse and emit primary_key as either a single column or a composite list, aligning with this PR.

Suggested reviewers

  • douenergy

"I'm a rabbit with a key so spry,
Strings and arrays now both can fly.
From schema to plan the columns sing,
Joined together, strong they bring.
Hoppity hops — composite joy!"

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately summarizes the main change: adding support for composite (multi-column) primary keys to the wren-core system.
Docstring Coverage ✅ Passed Docstring coverage is 93.75% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/composite-primary-key

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/wren/src/wren/osi.py (1)

432-499: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate OSI primary-key members against dataset fields before emitting MDL.

This branch preserves composite keys, but it never checks that each pk_names member exists in ds.fields. With primary_key: ["id", "missing"], the emitted model still carries ["id", "missing"] while only id can ever be marked on a field, so lint_osi_file() misses a broken PK definition and the converter produces an inconsistent manifest.

🛠️ Suggested check
     # Convert fields
     columns: list[dict] = []
     untyped: list[str] = []
     sm_column_types = wren_cfg.column_types.get(name) or {}
@@
         columns.append(col)
         if type_override is None:
             dim = f.get("dimension")
             if not (isinstance(dim, dict) and dim.get("is_time")):
                 untyped.append(fname)
+
+    field_names = {column["name"] for column in columns}
+    invalid_pks = [pk_name for pk_name in pk_names if pk_name not in field_names]
+    for pk_name in invalid_pks:
+        errors.append(
+            ValidationError(
+                "error",
+                f"dataset '{name}'",
+                f"primary_key '{pk_name}' not found in fields",
+            )
+        )
+    if invalid_pks:
+        pk_names = []
@@
     if pk_names:
         model["primary_key"] = pk_names[0] if len(pk_names) == 1 else pk_names
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/wren/src/wren/osi.py` around lines 432 - 499, The code builds pk_names
but never verifies those names exist in the dataset fields, so add a validation
step (before the final primary_key assignment) that collects actual field names
from ds (e.g., set of fname values from the for f in ds.get("fields") loop or
from the built columns list), then compute valid_pk = [p for p in pk_names if p
in field_name_set] and missing_pk = [p for p in pk_names if p not in
field_name_set]; if missing_pk is non-empty append a ValidationError (same shape
as other errors) indicating the missing PK members for dataset name, and set
model["primary_key"] using valid_pk (single value if len==1, list if >1); if
valid_pk is empty omit model["primary_key"] entirely. Ensure you reference
pk_names, ds.get("fields"), model["primary_key"], and ValidationError when
making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/wren-core-base/src/mdl/builder.rs`:
- Around line 146-156: The builder currently allows empty primary-key values
which breaks Model.primary_key invariants; update the methods primary_key and
primary_keys to validate inputs: in primary_key(&str) assert the provided string
is non-empty (panic with a clear message like "primary_key must be non-empty")
before setting PrimaryKey::Single, and in primary_keys(&[&str]) assert the slice
is non-empty and that none of the entries are empty (panic with messages like
"primary_keys must contain at least one column" and "primary_keys entries must
be non-empty") before setting PrimaryKey::Composite; this preserves in-memory
invariants for Model.primary_key.

In `@core/wren-mdl/mdl.schema.json`:
- Line 314: Replace the typo "OEN_TO_MANY" with the correct "ONE_TO_MANY" in the
model description string that begins "the primary key of the model. A single
column (string) or a composite key (array of columns)..." inside the
mdl.schema.json so the relationship name is spelled correctly; update the
description text exactly where that string is defined (search for the phrase
"the primary key of the model" to locate the description) and save the schema.
- Around line 315-323: The schema's oneOf branch currently permits empty strings
for primary-key column names; update the JSON schema in
core/wren-mdl/mdl.schema.json so the string variant includes "minLength": 1 and
the array variant's "items" schema also includes "minLength": 1 (i.e., add
"minLength":1 to the object with "type":"string" and to the "items" object under
the array) to prevent "" as a valid column name.

In `@core/wren/src/wren/context.py`:
- Around line 866-876: The primary_key normalization in validate_project
currently assumes non-string truthy values are iterable, causing TypeError and
allowing invalid types; change the logic around model.get("primary_key") (pk)
and pk_cols so you first validate shape: accept only a non-empty str or a
list/tuple whose every element is a non-empty str; if pk is any other type or a
list containing non-str/empty elements, append a ValidationError (same pattern
as existing errors) indicating malformed primary_key and skip iteration; only
after passing these shape checks set pk_cols to [pk] for strings or the
validated list for lists and then iterate over pk_cols to check membership in
col_names.

---

Outside diff comments:
In `@core/wren/src/wren/osi.py`:
- Around line 432-499: The code builds pk_names but never verifies those names
exist in the dataset fields, so add a validation step (before the final
primary_key assignment) that collects actual field names from ds (e.g., set of
fname values from the for f in ds.get("fields") loop or from the built columns
list), then compute valid_pk = [p for p in pk_names if p in field_name_set] and
missing_pk = [p for p in pk_names if p not in field_name_set]; if missing_pk is
non-empty append a ValidationError (same shape as other errors) indicating the
missing PK members for dataset name, and set model["primary_key"] using valid_pk
(single value if len==1, list if >1); if valid_pk is empty omit
model["primary_key"] entirely. Ensure you reference pk_names, ds.get("fields"),
model["primary_key"], and ValidationError when making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d65b5f91-0a7d-43e6-a7b1-ad2e59b4caf3

📥 Commits

Reviewing files that changed from the base of the PR and between 5fd8870 and f51f7c1.

📒 Files selected for processing (11)
  • core/wren-core-base/manifest-macro/src/lib.rs
  • core/wren-core-base/src/mdl/builder.rs
  • core/wren-core-base/src/mdl/manifest.rs
  • core/wren-core/core/src/logical_plan/analyze/model_generation.rs
  • core/wren-core/core/src/logical_plan/analyze/plan.rs
  • core/wren-core/core/src/mdl/mod.rs
  • core/wren-mdl/mdl.schema.json
  • core/wren/src/wren/context.py
  • core/wren/src/wren/osi.py
  • core/wren/tests/unit/test_context.py
  • core/wren/tests/unit/test_osi.py

Comment thread core/wren-core-base/src/mdl/builder.rs
Comment thread core/wren-mdl/mdl.schema.json Outdated
Comment thread core/wren-mdl/mdl.schema.json
Comment thread core/wren/src/wren/context.py
goldmedal and others added 2 commits June 5, 2026 12:29
A manifest using an array `primaryKey` is forward-incompatible: older
engines expecting `Option<String>` fail to deserialize it. Mirror the
`dialect` precedent so the new wire form is reflected in the versions.

- MAX_SUPPORTED_LAYOUT_VERSION 2 -> 3, no-op migrate_v2_to_v3
- schema_version 4 (-> layoutVersion 3); validate warns when a list-form
  primary_key is used below schema_version 4

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- builder: reject empty/blank PK column names; single-element
  primary_keys() collapses to PrimaryKey::Single
- mdl.schema.json: minLength 1 on PK string + array items; fix
  OEN_TO_MANY -> ONE_TO_MANY typo
- context.py: reject malformed primary_key shapes (non-str/list,
  empty entries) instead of crashing on TypeError

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@goldmedal goldmedal requested a review from douenergy June 5, 2026 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core python Pull requests that update Python code rust Pull requests that update rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant