Generate the Spark Connect registrar from the canonical named surface (stacks on #25) #26
Open
estebanzimanyi wants to merge 14 commits into
… surface Bump codegen/input/meos-idl.json to the MEOS-API IDL and regenerate functions.GeneratedFunctions over the full consolidated superset: mul_* (incl. tbigint); minDistance; the circular-buffer and network-point MF-JSON readers; the ever- and always-covers families (ecovers_*/acovers_*); trgeo_*; the H3 / th3index family (ever_eq_h3indexset_th3index, h3index_in/out, H3Index lowered to long); PostgreSQL type I/O; tgeogpoint_great_circle_distance; meos_initialize_noexit_error_handler. 2916 functions.
…ld flags The functions.GeneratedFunctions facade is generated at build time from the MEOS IDL with the optional type families selected by the same flag names and ON|OFF (also 1|0) values as the MobilityDB/MEOS build: -DCBUFFER, -DNPOINT, -DPOSE, -DRGEO, -DH3. Every family is included by default; passing -DCBUFFER=OFF (or =0) drops that family's functions from the generated binding so a subset jar ships without it (RGEO needs POSE). FunctionsGenerator maps each function's source header to its family and omits excluded families; jmeos-core runs the generator at generate-sources (so the flag flows through mvn) and compiles the generated functions.GeneratedFunctions.
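The family selection described above can be sketched as follows. This is only an illustration in Python, not the FunctionsGenerator code: the names `parse_flag` and `enabled_families` are hypothetical, and rejecting RGEO without POSE with an error is an assumption (the source only states that RGEO needs POSE).

```python
FAMILIES = ("CBUFFER", "NPOINT", "POSE", "RGEO", "H3")
REQUIRES = {"RGEO": "POSE"}  # RGEO needs POSE (stated in the commit message)


def parse_flag(value):
    """Accept ON|OFF and 1|0, the value forms the commit message lists."""
    v = value.strip().upper()
    if v in ("ON", "1"):
        return True
    if v in ("OFF", "0"):
        return False
    raise ValueError(f"expected ON|OFF|1|0, got {value!r}")


def enabled_families(flags):
    """Return the set of families to keep; every family is on by default."""
    enabled = {family: True for family in FAMILIES}
    for name, value in flags.items():
        if name not in enabled:
            raise KeyError(f"unknown family flag: {name}")
        enabled[name] = parse_flag(value)
    # Assumption: an unsatisfied dependency is reported rather than silently fixed.
    for family, dependency in REQUIRES.items():
        if enabled[family] and not enabled[dependency]:
            raise ValueError(f"{family} requires {dependency}")
    return {family for family in FAMILIES if enabled[family]}
```

With this sketch, `enabled_families({"CBUFFER": "OFF"})` keeps the other four families, and dropping POSE requires dropping RGEO as well.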
…try C API MobilityDB #1137 renamed the public rigid-geometry C API from trgeo to trgeometry. The MEOS IDL the facade is generated from adopts the new names (verified 1:1 against the master meos_rgeo.h: 67 trgeo->trgeometry; the trgeoinst_make instant constructor is unchanged, matching master), so the generated functions.GeneratedFunctions and the bundled jar resolve against a post-#1137 libmeos.
Bumps codegen/input/meos-idl.json to the public+bound MEOS surface of the ecosystem pin: the set-set spatial-join family (edwithin/tdwithin/adisjoint _tgeoarr_tgeoarr), the mindistance_tgeoarr_tgeoarr rename, the trgeometry analytics (frechet/hausdorff/dyntimewarp/centroid/length/speed), tpose and tnpoint value accessors, tcbuffer traversed-area, and the aggregate combine functions. 3031 bound functions (was 2916).
Hoists the tier-aware MeosOps* facade (62 classes) into JMEOS so every JVM binding inherits the one canonical Java idiom from the shared jar instead of duplicating it per engine. The facade forwards to functions.GeneratedFunctions under a package-private MeosOpsRuntime probe gated by the canonical -Dmeos.enabled property; javadoc is engine-neutral. Relocates the maintained generator (regen_facade_from_jar + the gap / sql / tbigint / h3 emitters + parity_audit + meos-ref) under jmeos-core/tools so the facade stays regenerated, not hand-edited; regeneration is idempotent against the pin jar.
MeosSetSetJoin exposes the MEOS *_tgeoarr_tgeoarr family as eDwithinPairs / tDwithinPairs / aDisjointPairs over two arrays of temporal-geometry handles: it marshals the native pointer arrays the kernel prunes in C, keeps them reachable across the call with reachabilityFence, and reads back the flattened 0-based index pairs (and, for tDwithin, the per-pair tstzspanset of in-range times). Both JVM engines call it from the shared org.mobilitydb.meos layer, so the NxN spatial-join surface derives once. Verified against libmeos.
The IDL and bundled libmeos carry the 54a9d4bc54 public surface: the per-thread PROJ context, the box3d_in/gbox_in parsers, and tpose_to_tpoint. The parity-gap forwarders bind the value-at-timestamptz wrappers through their result-returning form and drop the pointcloud initializer absent from the surface.
Compile jmeos-core for Java 17 and rewrite the type-pattern switches in STBox and the time types as instanceof-pattern if/else chains (pattern matching for instanceof is available in Java 17, whereas pattern matching for switch was only finalized in Java 21). The facade bytecode then loads on the Spark Connect server's Java 17 runtime (a JRE that Spark 3.5 supports), and still runs on later runtimes.
Add extract_named_surface.py, which produces meos-named-surface.json from the two canonical sources already in the MobilityDB tree: the SQL CREATE FUNCTION catalog (named functions, overloads, per-argument DEFAULTs -> valid call arities) and the doxygen chain (@sqlfn on the PG wrapper, @csqlfn on the MEOS function) linking each SQL name to its PG and MEOS C functions. This is the layer above the C-FFI IDL from which a binding's named surface and its Spark Connect registrar are generated, rather than hand-maintained. The extraction yields 1284 named functions; asMFJSON, for instance, resolves to temporal_as_mfjson with minArity 1 / maxArity 4.
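How per-argument DEFAULTs turn into valid call arities can be sketched like this. It is a minimal illustration, not the code of extract_named_surface.py: `call_arities` is a hypothetical helper, and the parameter names and default literals in the usage note are placeholders chosen only to reproduce the 1 / 4 arities quoted above.

```python
def call_arities(params):
    """Compute (minArity, maxArity) for one SQL overload.

    params is a list of (name, default) pairs in declaration order, where
    default is the SQL DEFAULT literal as a string, or None when the
    parameter has no default. SQL requires every parameter after a
    defaulted one to be defaulted too, so only the trailing run counts.
    """
    max_arity = len(params)
    min_arity = max_arity
    for _name, default in reversed(params):
        if default is None:
            break
        min_arity -= 1
    return min_arity, max_arity
```

For an overload with one required argument followed by three defaulted ones, such as `[("temp", None), ("opt_a", "0"), ("opt_b", "0"), ("opt_c", "15")]`, the helper returns `(1, 4)`, so calls with one to four arguments are valid.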
…tter extract_spark_impls.py scans the MobilitySpark UDFs (register name + field + body GeneratedFunctions call) and joins on the named surface's SQL->MEOS C linkage to recover canonical name -> Spark impl mechanically, so the emitter needs no hand-written remap. The join classifies each function for emission: single-impl (identity name over one impl), multi-impl (identity name with a WKB-type-tag dispatch builder), and join gaps to close.
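The join and its three-way classification could look roughly like the sketch below. The data shapes are assumptions made for illustration (a dict from canonical name to its MEOS C functions, and a list of (Spark register name, MEOS C function) pairs); the real scripts may represent both differently, and `classify` is a hypothetical name.

```python
def classify(named_surface, spark_impls):
    """Join canonical names to Spark impls through the shared MEOS C function.

    named_surface: {canonical SQL name: set of MEOS C function names}
    spark_impls:   [(Spark register name, MEOS C function it calls)]
    Returns {canonical name: (kind, sorted impl names)} where kind is
    "single", "multi", or "gap" (no impl found, a join gap to close).
    """
    impls_by_c_fn = {}
    for impl_name, c_fn in spark_impls:
        impls_by_c_fn.setdefault(c_fn, []).append(impl_name)

    result = {}
    for canonical, c_fns in named_surface.items():
        impls = sorted(i for c in c_fns for i in impls_by_c_fn.get(c, []))
        if not impls:
            kind = "gap"
        elif len(impls) == 1:
            kind = "single"
        else:
            kind = "multi"
        result[canonical] = (kind, impls)
    return result
```

Because the join key is the MEOS C function rather than the name, asMFJSON is recovered from an impl registered as temporalAsMfjson without any remap table.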
…face generate_spark_registrar.py joins the canonical named surface with the Spark impl scan and emits MobilitySparkConnectExtensionsGen.scala: a SparkSessionExtensions that injects each canonical function under its identity name (asMFJSON, not temporalAsMfjson), no hand-written remap. Shipped ScalaUDF closures live in a companion object so they capture only the serializable UDF; the builder null-pads the impl's optional args to the call-site arity. The 81 single-impl functions are generated, compiled, and serve live over Spark Connect under their identity names; the 139 multi-impl names are listed for the per-row meos_typeof_hexwkb dispatch.
…trar A multi-impl canonical name (one SQL name over several type-specific Spark impls) whose first argument differs in MEOS type is emitted as a single ScalaUDF that peeks meostype_name(meos_typeof_hexwkb(arg0)) per row and routes to the impl whose receiver type matches, with the Temporal-receiver impl as the catch-all default. The receiver category is read from the first C-parameter type of the impl's primary MEOS function (the last non-marshaling GeneratedFunctions call in the UDF body). The registrar serves the /items-collection OGC function set under identity names: asMFJSON, stbox, the Xmin/Ymin/Xmax/Ymax/Tmin/Tmax accessors, numSequences, sequenceN, trajectory. Functions that differ only on a later argument (atTime on its time argument) are listed for the arg-N dispatch extension.
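The per-row routing can be sketched as a closure over a type-name-to-impl table. This is a Python illustration of the idea rather than the emitted Scala: `make_dispatcher` is hypothetical, the `type_name_of` callable stands in for meostype_name(meos_typeof_hexwkb(arg0)), and returning None for a null first argument is an assumption.

```python
def make_dispatcher(routes, default_impl, type_name_of):
    """Build one callable that routes each row on the type tag of arg0.

    routes:       {MEOS type name: impl callable}
    default_impl: the Temporal-receiver impl used as the catch-all
    type_name_of: callable returning the MEOS type name of a hex-WKB value
    """
    def dispatch(*args):
        if not args or args[0] is None:
            return None  # assumption: a null receiver yields a null result
        impl = routes.get(type_name_of(args[0]), default_impl)
        return impl(*args)

    return dispatch
```

The table lookup keeps the generated UDF uniform: adding a type-specific impl adds one route, and any type without a route falls through to the Temporal impl.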
…th the SQL default Several MobilitySpark UDFs register one MEOS operation under both a bare name and a camelCase name (asText/tpointAsText, getTime/time, cumulativeLength/...). When a canonical name's impls all share one primary MEOS function, bind the identity name to a single impl rather than treating it as a type dispatch. Capture each optional argument's SQL DEFAULT literal in the named surface and fill an omitted optional argument with it, but only when the impl exposes a full overload's worth of arguments (impl arity equals that overload's maxArity, so the positions align); otherwise null-pad and let the impl's own default hold. This serves asText/asEWKT (maxdecimaldigits default 15) while keeping asMFJSON at full coordinate precision (its impl exposes a non-leading argument subset).
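The fill rule for omitted optional arguments can be sketched as below. It is an illustration under assumptions: `pad_call_args` is a hypothetical helper, defaults are keyed by 0-based argument position, and the arities used in the usage note (asText with two arguments, an asMFJSON impl exposing two of four) are placeholders rather than values taken from the code.

```python
def pad_call_args(call_args, impl_arity, overload_max_arity, sql_defaults):
    """Pad a call's arguments up to the impl's arity.

    An omitted position is filled with its SQL DEFAULT literal only when the
    impl exposes the full overload (impl_arity == overload_max_arity), since
    only then do impl positions line up with SQL positions. Otherwise the
    position is null-padded and the impl's own default applies.
    sql_defaults: {0-based position: default value}
    """
    aligned = impl_arity == overload_max_arity
    padded = list(call_args)
    for pos in range(len(call_args), impl_arity):
        padded.append(sql_defaults.get(pos) if aligned else None)
    return padded
```

Under those placeholder arities, `pad_call_args(["wkb"], 2, 2, {1: 15})` gives `["wkb", 15]` (the asText case), while `pad_call_args(["wkb"], 2, 4, {1: 0, 2: 0, 3: 15})` gives `["wkb", None]`, leaving precision to the impl (the asMFJSON case).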
Generalize the multi-impl dispatch from arg0 to the first argument position at which the type-specific impls differ, peeking that argument's MEOS type tag: atTime routes on its time argument (tstzspan/tstzset/tstzspanset), duration routes on its span argument. The concrete type for a generic Span/Set receiver is taken from the MEOS type-name embedded in the impl's primary function name. Each dispatch route carries its impl's arity and SQL-default fills, so an omitted optional argument of the chosen impl is filled with the canonical default rather than a null pad (duration on a tstzspanset supplies boundspan=FALSE); boolean SQL defaults are emitted alongside integer ones. The named surface is regenerated from the pin, so speed(tgeompoint) resolves through the dedicated Tpoint_speed wrapper to tpoint_speed and binds to the speed impl.
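Finding the argument position to dispatch on can be sketched as a scan over the impls' type signatures. The helper name `dispatch_position` and the tuple-of-type-names representation are assumptions for illustration; the generator derives these types from the MEOS function names and C parameters as described above.

```python
def dispatch_position(signatures):
    """Return the first argument position at which the impls' types differ.

    signatures: one tuple of MEOS type names per impl, in argument order.
    Returns None when no shared position distinguishes the impls.
    """
    if not signatures:
        return None
    shared_len = min(len(sig) for sig in signatures)
    for pos in range(shared_len):
        if len({sig[pos] for sig in signatures}) > 1:
            return pos
    return None
```

For atTime, the signatures `("temporal", "tstzspan")`, `("temporal", "tstzset")`, and `("temporal", "tstzspanset")` agree at position 0 and differ at position 1, so the dispatcher peeks the type tag of the time argument.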
Extract the canonical named-operation surface from the MobilityDB SQL CREATE FUNCTION catalog and the doxygen @sqlfn/@csqlfn chain, derive the canonical-name to Spark-impl mapping from the MobilitySpark UDF bodies, and emit MobilitySparkConnectExtensionsGen.scala: a SparkSessionExtensions that injects each canonical function under its identity name with no camelCase remap and no hand-written table. Single-impl functions bind to the one impl; multi-impl names whose first argument differs in MEOS type dispatch per row on the arg0 WKB type tag (meos_typeof_hexwkb) to the type-matching impl, with the Temporal-receiver impl as the catch-all default. The facade targets Java 17 so it runs on the Spark 3.5 runtime. The registrar serves the OGC reads (asMFJSON, stbox, the Xmin/Ymin/Xmax/Ymax/Tmin/Tmax accessors, numSequences, sequenceN, trajectory) under identity names over Spark Connect.