Skip to content

feat(wren-core): auto-register unknown MDL functions as inferred bypass UDFs#2338

Draft
goldmedal wants to merge 2 commits into
mainfrom
worktree-plan-0529-173748
Draft

feat(wren-core): auto-register unknown MDL functions as inferred bypass UDFs#2338
goldmedal wants to merge 2 commits into
mainfrom
worktree-plan-0529-173748

Conversation

@goldmedal
Copy link
Copy Markdown
Collaborator

@goldmedal goldmedal commented Jun 1, 2026

Problem

When wren-core processes an MDL, function calls inside column expressions and RLAC conditions must resolve against a function registered in the wren-core context (built-ins, dialect functions, or a CSV-declared remote function). Any data-source-native function the engine doesn't know about fails planning with Invalid function '...', forcing authors to declare every such function up front.

Change

Scan the manifest's free-form SQL expressions and auto-register any unknown function as a bypass UDF — a no-op UDF that passes planning and is unparsed back to the source untouched (the same mechanism remote functions already use, but discovered automatically).

Scope of the scan (all free-form expression sources):

  • calculated column expressions
  • RLAC conditions
  • cube measure / dimension / time-dimension expressions
  • relationship conditions

OVER (...) → registered as a window UDF; otherwise scalar. Unknown aggregates still need an explicit remote-function declaration (they can't be told apart from scalars by syntax). Explicitly declared remote functions, built-ins, and dialect functions are never overridden.

Inferred return type

Instead of unconditionally returning Utf8, a new ReturnType::Inferred derives the type from the actual argument types at planning time:

  • no args → Utf8
  • one arg → that argument's type (covers round/lower/abs/custom transforms)
  • multiple args → common super-type via type_union_resolution, falling back to the first arg, then Utf8 (covers coalesce/greatest/nvl)

Original casing is preserved for unparsing (e.g. toYear), with a lowercase alias for DataFusion's name resolution.

Registration point

Registration runs at the start of apply_wren_on_ctx — the choke point shared by every planning path: transform_sql, the Python SessionContext load (both the unparse ctx and the LocalRuntime exec_ctx), dry_run, and permission analysis. It must register on the input ctx before the ModelAnalyzeRule state snapshot is taken, since that rule resolves functions against the snapshot. This keeps all paths consistent and requires no changes to the Python bindings.

Notes / limitations

  • CLAC has no scannable expression — its threshold is a normalized literal (Numeric/String), not free-form SQL.
  • Executing an unknown function via local DataFusion (query()) still fails at execution time (This function should not be called) — bypass UDFs can only be unparsed/pushed down, never executed. This is by design.

Testing

  • ReturnType::Inferred unit test (0 / 1 / many args)
  • column expression with an unknown scalar function → passes planning, unparses through with original casing
  • RLAC condition using a bypass function
  • full lib suite green (136 tests), clippy clean, cargo fmt applied, wren-core-py compiles

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Unknown functions in manifest expressions are auto-registered as bypass UDFs (scalar or window) so expressions remain usable.
    • Inferred return types are supported and computed from call arguments.
    • Original function name casing is preserved in transformed SQL; parser aliases are added for resolution.
  • Tests

    • Added tests for RLAC with bypass functions and for inferred-bypass registration and return-type behavior.

…ss UDFs

MDL column expressions and RLAC conditions previously only accepted
functions registered in the wren-core context, forcing authors to declare
every data-source-native function as a remote function. Scan the manifest's
free-form expressions (column expressions, RLAC conditions, cube
measure/dimension/time-dimension expressions, relationship conditions) and
auto-register any unknown function as a bypass UDF so it passes planning and
is unparsed back to the source.

Return type is inferred from the actual argument types at planning time
(no args -> Utf8; one arg -> that type; many -> common super-type via
type_union_resolution) rather than unconditionally Utf8.

Registration happens in apply_wren_on_ctx, the choke point shared by every
planning path (transform, Python session load, dry-run, permission analysis),
so all paths are consistent and the Python bindings need no changes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added rust Pull requests that update rust code core labels Jun 1, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d1cc492e-9f37-45f9-a9ad-5ba6399bf1a1

📥 Commits

Reviewing files that changed from the base of the PR and between 95a906f and 9b254f8.

📒 Files selected for processing (3)
  • core/wren-core/core/src/mdl/context.rs
  • core/wren-core/core/src/mdl/function/remote_function.rs
  • core/wren-core/core/src/mdl/mod.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • core/wren-core/core/src/mdl/function/remote_function.rs

Walkthrough

Auto-register unknown functions found in MDL manifest expressions as inferred bypass UDFs, add ReturnType::Inferred for planning-time return-type computation, register inferred scalar/window UDFs by scanning manifest expressions, integrate registration into apply_wren_on_ctx, and add tests covering inference and registration behavior.

Changes

Inferred Bypass UDF Registration

Layer / File(s) Summary
Return type inference and inferred UDF constructors
core/wren-core/core/src/mdl/function/remote_function.rs
ReturnType gains an Inferred variant computed at planning time: zero args default to Utf8, one arg returns that type, multiple args use union resolution. bypass_any_signature() added. ByPassScalarUDF::new_inferred() and ByPassWindowFunction::new_inferred() create bypass UDFs with inferred return types, preserve original case for SQL, and register lowercase aliases. Unit tests added for inferred return-type logic and casing/alias behavior.
Manifest scanning and auto-registration mechanism
core/wren-core/core/src/mdl/mod.rs
register_inferred_bypass_for_manifest() scans manifest expression strings (calculated columns, RLAC conditions, cube measures/dimensions/time-dimensions, relationships), parses them into AST, discovers function calls, and registers previously-unknown functions as inferred scalar or window bypass UDFs. Documentation in transform_sql_with_ctx clarifies this behavior.
Integration into analyzer context
core/wren-core/core/src/mdl/context.rs
apply_wren_on_ctx() derives a mutable base_state, calls register_inferred_bypass_for_manifest() on it, and then uses that state when creating reset_default_catalog_schema, ensuring inferred UDFs are available during analyzer configuration.
Test coverage for inferred bypass UDFs
core/wren-core/core/src/mdl/mod.rs, core/wren-core/core/src/logical_plan/analyze/access_control.rs
Tests verify return type inference logic, confirm auto-registration preserves original function names/casing in transformed SQL, assert scalar vs window registrations are independent, and validate RLAC filter expression generation using an inferred bypass scalar UDF.

🎯 3 (Moderate) | ⏱️ ~22 minutes

🐰 Unknown functions bloom in the manifest's garden,
Auto-registered without a formal pardon,
Inferred types adapt to arguments with grace,
The analyzer learns new functions apace!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title directly and clearly describes the main change: auto-registering unknown MDL functions as inferred bypass UDFs, which is the core objective across all modified files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch worktree-plan-0529-173748

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/wren-core/core/src/mdl/context.rs`:
- Around line 50-61: apply_wren_on_ctx currently calls
crate::mdl::register_inferred_bypass_for_manifest(ctx, ...) which mutates the
caller's shared SessionContext (ctx) and causes inferred UDFs to leak across
requests; instead, create a derived context/state copy and register inferred
bypass UDFs on that copy before taking the state snapshot so the original ctx is
not mutated. Locate the call to register_inferred_bypass_for_manifest (and any
helper that updates ctx) and change the flow to clone or derive a new
context/state (e.g., clone ctx or call a context-with-temporary-state helper),
run register_inferred_bypass_for_manifest against the derived context, and pass
that derived context into the ModelAnalyzeRule snapshot/resolve path (preserving
apply_wren_on_ctx semantics but avoiding permanent mutation of the input ctx).

In `@core/wren-core/core/src/mdl/function/remote_function.rs`:
- Around line 506-515: The new_inferred constructor currently only stores a
single name which forces callers to register lowercase lookup names and loses
original casing; update the ByPassWindowUDF/ByPassScalarUDF pattern by storing
both the original source spelling and a lookup-normalized spelling (e.g., keep a
new field like original_name or display_name alongside the existing name used
for lookup) so new_inferred(name: &str) preserves the incoming casing while
still populating the lookup-safe form (lowercased or otherwise normalized) for
resolution; modify the constructor (function new_inferred) to set both fields
and ensure ReturnType::Inferred and signature:bypass_any_signature() are still
used.

In `@core/wren-core/core/src/mdl/mod.rs`:
- Around line 692-730: The current single `registered` set collapses scalar and
window auto-registrations causing the first-seen kind to block the other; update
the logic in the visit_expressions closure to keep two separate
auto-registration trackers (e.g., registered_scalar: HashSet<String> and
registered_window: HashSet<String>) while leaving the existing `is_known` check
global. When encountering ast::Expr::Function, compute the same `original` and
`normalized` names, then before calling ctx.register_udf or ctx.register_udwf
check/insert into the appropriate per-kind set (use normalized for window and
normalized/original as you currently use for scalar) so a previously
auto-registered scalar doesn’t prevent a subsequent window registration and vice
versa; keep using ByPassScalarUDF::new_inferred(original) and
ByPassWindowFunction::new_inferred(&normalized) and the same conditions
(func.over.is_some()) to decide which set to consult and which register_* call
to make.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d380ca17-04b4-4f26-bc9f-3431d90f8d86

📥 Commits

Reviewing files that changed from the base of the PR and between 12f30ef and 95a906f.

📒 Files selected for processing (4)
  • core/wren-core/core/src/logical_plan/analyze/access_control.rs
  • core/wren-core/core/src/mdl/context.rs
  • core/wren-core/core/src/mdl/function/remote_function.rs
  • core/wren-core/core/src/mdl/mod.rs

Comment thread core/wren-core/core/src/mdl/context.rs Outdated
Comment thread core/wren-core/core/src/mdl/function/remote_function.rs
Comment thread core/wren-core/core/src/mdl/mod.rs Outdated
- Register inferred bypass UDFs into a locally-derived SessionState instead
  of mutating the caller's shared SessionContext, so functions can't leak
  across manifests/requests reusing the same context.
- Track auto-registered scalar and window variants separately so the same
  name used both plainly and with OVER(...) registers both kinds.
- Preserve original casing for inferred window UDFs via a lowercase alias
  (mirroring the scalar path), keeping case-sensitive backends correct.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@goldmedal goldmedal marked this pull request as draft June 5, 2026 04:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core rust Pull requests that update rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant