feat(wren-core): auto-register unknown MDL functions as inferred bypass UDFs#2338
feat(wren-core): auto-register unknown MDL functions as inferred bypass UDFs#2338goldmedal wants to merge 2 commits into
Conversation
…ss UDFs MDL column expressions and RLAC conditions previously only accepted functions registered in the wren-core context, forcing authors to declare every data-source-native function as a remote function. Scan the manifest's free-form expressions (column expressions, RLAC conditions, cube measure/dimension/time-dimension expressions, relationship conditions) and auto-register any unknown function as a bypass UDF so it passes planning and is unparsed back to the source. Return type is inferred from the actual argument types at planning time (no args -> Utf8; one arg -> that type; many -> common super-type via type_union_resolution) rather than unconditionally Utf8. Registration happens in apply_wren_on_ctx, the choke point shared by every planning path (transform, Python session load, dry-run, permission analysis), so all paths are consistent and the Python bindings need no changes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughAuto-register unknown functions found in MDL manifest expressions as inferred bypass UDFs, add ReturnType::Inferred for planning-time return-type computation, register inferred scalar/window UDFs by scanning manifest expressions, integrate registration into apply_wren_on_ctx, and add tests covering inference and registration behavior. ChangesInferred Bypass UDF Registration
🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@core/wren-core/core/src/mdl/context.rs`:
- Around line 50-61: apply_wren_on_ctx currently calls
crate::mdl::register_inferred_bypass_for_manifest(ctx, ...) which mutates the
caller's shared SessionContext (ctx) and causes inferred UDFs to leak across
requests; instead, create a derived context/state copy and register inferred
bypass UDFs on that copy before taking the state snapshot so the original ctx is
not mutated. Locate the call to register_inferred_bypass_for_manifest (and any
helper that updates ctx) and change the flow to clone or derive a new
context/state (e.g., clone ctx or call a context-with-temporary-state helper),
run register_inferred_bypass_for_manifest against the derived context, and pass
that derived context into the ModelAnalyzeRule snapshot/resolve path (preserving
apply_wren_on_ctx semantics but avoiding permanent mutation of the input ctx).
In `@core/wren-core/core/src/mdl/function/remote_function.rs`:
- Around line 506-515: The new_inferred constructor currently only stores a
single name which forces callers to register lowercase lookup names and loses
original casing; update the ByPassWindowUDF/ByPassScalarUDF pattern by storing
both the original source spelling and a lookup-normalized spelling (e.g., keep a
new field like original_name or display_name alongside the existing name used
for lookup) so new_inferred(name: &str) preserves the incoming casing while
still populating the lookup-safe form (lowercased or otherwise normalized) for
resolution; modify the constructor (function new_inferred) to set both fields
and ensure ReturnType::Inferred and signature:bypass_any_signature() are still
used.
In `@core/wren-core/core/src/mdl/mod.rs`:
- Around line 692-730: The current single `registered` set collapses scalar and
window auto-registrations causing the first-seen kind to block the other; update
the logic in the visit_expressions closure to keep two separate
auto-registration trackers (e.g., registered_scalar: HashSet<String> and
registered_window: HashSet<String>) while leaving the existing `is_known` check
global. When encountering ast::Expr::Function, compute the same `original` and
`normalized` names, then before calling ctx.register_udf or ctx.register_udwf
check/insert into the appropriate per-kind set (use normalized for window and
normalized/original as you currently use for scalar) so a previously
auto-registered scalar doesn’t prevent a subsequent window registration and vice
versa; keep using ByPassScalarUDF::new_inferred(original) and
ByPassWindowFunction::new_inferred(&normalized) and the same conditions
(func.over.is_some()) to decide which set to consult and which register_* call
to make.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: d380ca17-04b4-4f26-bc9f-3431d90f8d86
📒 Files selected for processing (4)
core/wren-core/core/src/logical_plan/analyze/access_control.rscore/wren-core/core/src/mdl/context.rscore/wren-core/core/src/mdl/function/remote_function.rscore/wren-core/core/src/mdl/mod.rs
- Register inferred bypass UDFs into a locally-derived SessionState instead of mutating the caller's shared SessionContext, so functions can't leak across manifests/requests reusing the same context. - Track auto-registered scalar and window variants separately so the same name used both plainly and with OVER(...) registers both kinds. - Preserve original casing for inferred window UDFs via a lowercase alias (mirroring the scalar path), keeping case-sensitive backends correct. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
When wren-core processes an MDL, function calls inside column expressions and RLAC conditions must resolve against a function registered in the wren-core context (built-ins, dialect functions, or a CSV-declared remote function). Any data-source-native function the engine doesn't know about fails planning with
Invalid function '...', forcing authors to declare every such function up front.Change
Scan the manifest's free-form SQL expressions and auto-register any unknown function as a bypass UDF — a no-op UDF that passes planning and is unparsed back to the source untouched (the same mechanism remote functions already use, but discovered automatically).
Scope of the scan (all free-form expression sources):
OVER (...)→ registered as a window UDF; otherwise scalar. Unknown aggregates still need an explicit remote-function declaration (they can't be told apart from scalars by syntax). Explicitly declared remote functions, built-ins, and dialect functions are never overridden.Inferred return type
Instead of unconditionally returning
Utf8, a newReturnType::Inferredderives the type from the actual argument types at planning time:Utf8round/lower/abs/custom transforms)type_union_resolution, falling back to the first arg, thenUtf8(coverscoalesce/greatest/nvl)Original casing is preserved for unparsing (e.g.
toYear), with a lowercase alias for DataFusion's name resolution.Registration point
Registration runs at the start of
apply_wren_on_ctx— the choke point shared by every planning path:transform_sql, the PythonSessionContextload (both the unparsectxand the LocalRuntimeexec_ctx),dry_run, and permission analysis. It must register on the input ctx before theModelAnalyzeRulestate snapshot is taken, since that rule resolves functions against the snapshot. This keeps all paths consistent and requires no changes to the Python bindings.Notes / limitations
thresholdis a normalized literal (Numeric/String), not free-form SQL.query()) still fails at execution time (This function should not be called) — bypass UDFs can only be unparsed/pushed down, never executed. This is by design.Testing
ReturnType::Inferredunit test (0 / 1 / many args)cargo fmtapplied, wren-core-py compiles🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Tests