103 commits
e17f8ca
PeriodUDT simple implementation.
Aug 7, 2023
822ae6d
Added comments and definitions for PeriodUDT class.
Aug 7, 2023
88d7b1b
Deleted JMEOS jars from git tracking, added to gitignore. Added Regis…
Aug 7, 2023
69b8234
Partially implemented UDFs for PeriodUDT class.
Aug 8, 2023
5975da9
Add tgeompointinst UDF
Aug 2, 2023
97cd188
Add PeriodSet
Aug 4, 2023
747c016
Update pom
Aug 4, 2023
62a51c9
Modify main
Aug 7, 2023
f5de6a6
Add period set UDT
Aug 8, 2023
6639c3f
Finished UDFs and UDF registrator.
Aug 9, 2023
2fdc6cd
Finished UDFs and UDF registrator.
Aug 9, 2023
0e192f9
Added sample tests and testing utility.
Aug 10, 2023
8e699f2
Started working on TimestampSet UDTs
Aug 10, 2023
3cf33a6
Merge pull request #2 from satriabw/period-implementation
Action52 Aug 10, 2023
3b1c81c
Add PeriodSet registrator for UDT and UDF
Aug 8, 2023
c36010b
Add the rest of PointUDF implemenation
Aug 9, 2023
3fa4056
Implement Period Set UDF
Aug 10, 2023
c6240cb
Implemented basic version of TimestampSetUDT. Also modified examples …
Aug 11, 2023
c9b4e45
Implement PeriodSet
Aug 11, 2023
c33e269
Skip the test for now
Aug 11, 2023
a3b2309
Merge pull request #3 from satriabw/satria/poc
Aug 11, 2023
0505997
Implemented changes on Period using Binaries.
Aug 16, 2023
24b5271
Implemented new structure with MeosDatatype as parent class for Spark…
Aug 16, 2023
f290586
Reformated Factory to minimize reduncancies
Aug 17, 2023
45d193f
Merge pull request #4 from satriabw/meos-datatype
Action52 Aug 17, 2023
de5ae39
Merge branch 'develop' into timestampset-implementation
Action52 Aug 17, 2023
ca441df
Merge pull request #6 from satriabw/timestampset-implementation
Aug 17, 2023
78180d4
Add implementation for ais dataset
Aug 25, 2023
91afa4c
Tidy up implementation
Aug 25, 2023
7724506
Finish AISDataExample implementation
Aug 28, 2023
7b1109e
Merge pull request #7 from satriabw/feature/tpoint
Aug 28, 2023
212192f
feat(spark): JMEOS 1.3 + BerlinMOD Q1-Q17 + edge-to-cloud pipeline — …
estebanzimanyi May 7, 2026
3e17bda
ci: exclude legacy sources from license check; add header to Main.java
estebanzimanyi May 8, 2026
c8b182a
test: don't call meos_finalize in unit test teardown
estebanzimanyi May 8, 2026
f13da5c
test: remove all meos_finalize/ms.close calls from test teardown
estebanzimanyi May 8, 2026
59ebace
fix(meos): bundle spatial_ref_sys.csv and register it on session create
estebanzimanyi May 8, 2026
e939f29
feat(parquet): add tintFromBinary, tfloatFromBinary, tboolFromBinary,…
estebanzimanyi May 8, 2026
9a45b95
feat(parquet): add span/spanset fromBinary UDFs (tstzspan, intspan, f…
estebanzimanyi May 8, 2026
d026fde
feat(parquet): add tgeompointFromBinary + tgeogpointFromBinary; updat…
estebanzimanyi May 8, 2026
5a5ff76
fix(test): drop tgeogpoint unit test requiring SRS setup; fix README …
estebanzimanyi May 8, 2026
a25da95
fix(test): register spatial_ref_sys.csv in @BeforeAll to enable geode…
estebanzimanyi May 8, 2026
092ee60
feat(platform): add macOS and Windows support via patched JMEOS-1.4.jar
estebanzimanyi May 8, 2026
33b6d2d
fix(ci): remove invalid shell: pwsh on uses: step in Windows job
estebanzimanyi May 8, 2026
e07dc3f
feat(spark): BerlinMOD Q1-Q17 + UDFs + benchmark + JVM crash fixes
estebanzimanyi May 9, 2026
c2b93c4
fix(bench): use local[2], ulimit -c 0, and pin java.library.path
estebanzimanyi May 9, 2026
522cf6c
fix(build): pin java.library.path to /usr/local/lib in surefire; upda…
estebanzimanyi May 9, 2026
473404f
refactor(jmeos): rename JMEOS-1.5 → JMEOS-1.4 to match MEOS API versi…
estebanzimanyi May 9, 2026
2892eb0
fix(bench): flush results to JSON after each query; use atomic write
estebanzimanyi May 9, 2026
1b853e6
feat(bench): add --quick flag (--runs 1) for crash-safety verification
estebanzimanyi May 9, 2026
cbf79ff
feat(bench): add --queries range selector for targeted crash bisection
estebanzimanyi May 9, 2026
77dbe5e
fix(memory): free MEOS native objects in all UDFs to prevent OOM crash
estebanzimanyi May 9, 2026
8420c09
test(memory): add NativeMemoryLeakTest — VmRSS-based native leak dete…
estebanzimanyi May 9, 2026
1a47ed9
fix(bench): use tdwithin_tgeo_tgeo in tDwithin UDF (q10 fix)
estebanzimanyi May 9, 2026
c7d55e4
fix(berlinmod): ORDER BY alias in q12 + richer error output in bench
estebanzimanyi May 9, 2026
85f915d
feat(udfs): add 5 UDF groups for full operator parity — 166 tests green
estebanzimanyi May 10, 2026
62d2c28
feat(udfs): add 4 UDF groups + 13 UDAFs — 235 tests green
estebanzimanyi May 10, 2026
c449e17
fix(build): prioritise bundled lib/libmeos.so in surefire java.librar…
estebanzimanyi May 10, 2026
18d380b
feat(udfs): add DistanceUDFs, extend RestrictionUDFs and TransformUDF…
estebanzimanyi May 10, 2026
5e3c5c0
feat(udfs): add transcendental math, trend, tboolWhenTrue, tpointIsSi…
estebanzimanyi May 10, 2026
d6409ff
feat(udfs): add span/spanset/stbox/elevation restriction UDFs — 265 t…
estebanzimanyi May 10, 2026
176c03d
feat(udfs): add tintAtValue, tnumber span/spanset restriction, tgeoMi…
estebanzimanyi May 10, 2026
0a11203
feat(udfs): add cumulative length, traversed area, shift/scale time —…
estebanzimanyi May 10, 2026
2536c74
feat(geo): add StaticGeoUDFs — 17 static geometry predicates/metrics/…
estebanzimanyi May 10, 2026
c8bbb7e
feat(temporal): add 10 UDFs — temporal comparisons, tintToTfloat, tpr…
estebanzimanyi May 10, 2026
5569168
feat(geo): add 6 STBox analytics UDFs — area, perimeter, volume, isGe…
estebanzimanyi May 10, 2026
17528f8
feat(temporal): add TBoxUDFs — 13 TBox accessor/span-conversion UDFs …
estebanzimanyi May 10, 2026
382eaf4
feat(geo): add ever/always scalar predicates + tgeo×tgeo temporal rels
estebanzimanyi May 10, 2026
548755c
feat(temporal): MFJSON I/O, text output, and tint shift/scale UDFs
estebanzimanyi May 10, 2026
1e4b7a9
feat(temporal): ever_ne/always_ne predicates + value_at_timestamptz a…
estebanzimanyi May 10, 2026
c887dc4
feat(temporal): tintValueN, tintMinusValue, temporalDeleteTimestamptz…
estebanzimanyi May 10, 2026
45e91a1
feat(temporal): parity batch — 85 new UDFs, 642 tests green
estebanzimanyi May 10, 2026
2f9ccea
feat(udfs): set value accessors, ttext_values, geo I/O UDFs (701 tests)
estebanzimanyi May 10, 2026
6af23fe
feat(udfs): tstzspanset extra accessors + tpointFromBaseTemp construc…
estebanzimanyi May 10, 2026
bbdbd58
feat(udfs): parity batch — Transform/Restriction/Similarity/SpanAlgeb…
estebanzimanyi May 10, 2026
3c93c3a
fix(safety): replace local[*] with local[2] in all configs and docs
estebanzimanyi May 10, 2026
5965800
chore(libs): remove stale JMEOS jars — only JMEOS-1.4.jar is active
estebanzimanyi May 10, 2026
e77ecf1
feat(geo): add tpoint I/O, SRID, round, bounding-box, and convex-hull…
estebanzimanyi May 10, 2026
18d6b66
fix(demo): update BerlinMOD UDFs for MEOS 1.4 renamed symbols
estebanzimanyi May 10, 2026
841dc22
feat(bench): resumable BerlinMOD benchmark + --queries selector
estebanzimanyi May 10, 2026
8fda0fd
feat(parity): MobilityDB SQL surface parity at 100% (858/858)
estebanzimanyi May 10, 2026
038501e
feat(perf): th3index spatial prefilter for cross-join queries (Stage 2)
estebanzimanyi May 10, 2026
a986758
feat(perf): extend th3index prefilter to trip×trip cross-joins (Q5/Q6…
estebanzimanyi May 10, 2026
b133a57
feat(perf): cross-platform th3index prefilter — portable SQL + PG GiS…
estebanzimanyi May 10, 2026
e9871d6
feat(perf): include trip_h3 in setup/generate_data.sh trips.csv override
estebanzimanyi May 10, 2026
c12e257
feat(h3): 100% public-API parity in Th3IndexUDFs (86 UDFs)
estebanzimanyi May 10, 2026
6b238bc
feat(perf): polygon-side prefilter — adopt MobilityDB #938's static-g…
estebanzimanyi May 11, 2026
8b5c612
Add minDistance UDFs and adopt spatial-min Q5 form
estebanzimanyi May 14, 2026
d5724ac
Uplift MobilitySpark to compile against post-regen JMEOS 1.4
estebanzimanyi May 15, 2026
9e7b153
Reactivate Th3IndexUDFs after JMEOS regen exposes th3index surface
estebanzimanyi May 15, 2026
f7b5ce5
Uplift Th3IndexUDFs for post-regen JMEOS surface drift
estebanzimanyi May 15, 2026
c6e78f5
Materialise trip_h3 without a self-referential view
estebanzimanyi May 15, 2026
0edd39f
Vendor the regenerated JMEOS jar with the th3index surface
estebanzimanyi May 15, 2026
2018ee4
Assert tnumberTrend(tint) returns null
estebanzimanyi May 15, 2026
6414dda
Vendor the libmeos.so carrying the th3index and minDistance surface
estebanzimanyi May 15, 2026
aa420a7
Vendor the split-interface JMEOS jar to fit the JVM proxy clinit limit
estebanzimanyi May 15, 2026
62a9bcd
Install libh3 in CI so the th3index libmeos resolves at runtime
estebanzimanyi May 15, 2026
27a9348
Run surefire fork IPC over a process pipe to survive native stderr wr…
estebanzimanyi May 15, 2026
1e1abf3
Print JVM and surefire crash dumps when unit tests fail
estebanzimanyi May 15, 2026
ef92da0
Install the no-exit MEOS error handler in ConstructorUDFsExtTest
estebanzimanyi May 15, 2026
4a366fd
Install proj-data in CI so geodetic MFJSON resolves its CRS
estebanzimanyi May 15, 2026
17b3eae
Surface the MEOS errno for the geodetic MFJSON call on CI
estebanzimanyi May 15, 2026
f6a79d8
Provide the MEOS spatial_ref_sys table in CI for geodetic input
estebanzimanyi May 15, 2026
4d8d32c
Make the th3index family selectable at build time with the H3 flag
estebanzimanyi May 29, 2026
110 changes: 110 additions & 0 deletions .github/workflows/maven.yml
@@ -0,0 +1,110 @@
name: Maven CI

on:
  push:
    branches: ["main", "feat/**", "fix/**"]
    paths-ignore:
      - "**/*.md"
      - "doc/**"
  pull_request:
    branches: ["main", "feat/**", "fix/**"]
    paths-ignore:
      - "**/*.md"
      - "doc/**"
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  build:
    name: Build and test (Java 21 / Spark 3.5)
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Java 21
        uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: "21"
          cache: maven

      - name: Install libmeos runtime dependencies
        run: |
          sudo apt-get update -qq
          sudo apt-get install -y libjson-c5 libgeos-c1t64 libproj25 proj-data libgsl27 libh3-1

      - name: Set up libmeos.so and LD_LIBRARY_PATH
        run: |
          mkdir -p /tmp/libmeos
          cp "$GITHUB_WORKSPACE/lib/libmeos.so" /tmp/libmeos/libmeos.so
          echo "LD_LIBRARY_PATH=/tmp/libmeos" >> "$GITHUB_ENV"

      - name: Provide the MEOS spatial_ref_sys table
        run: |
          # libmeos resolves SRID metadata (geodetic recognition for
          # tgeogpoint, transforms) from its built-in default path
          # /usr/local/share/spatial_ref_sys.csv. The vendored .so ships
          # without that data file, so geography input fails with
          # MEOS_ERR_INVALID_ARG. Fetch the canonical table to the default
          # path, as a runtime data dependency rather than a vendored blob.
          sudo mkdir -p /usr/local/share
          sudo curl -fsSL \
            https://raw.githubusercontent.com/MobilityDB/MobilityDB/master/meos/src/geo/spatial_ref_sys.csv \
            -o /usr/local/share/spatial_ref_sys.csv
          test -s /usr/local/share/spatial_ref_sys.csv

      - name: Install JMEOS 1.4 to local Maven repository
        run: |
          mvn install:install-file \
            -Dfile=libs/JMEOS-1.4.jar \
            -DgroupId=org.jmeos \
            -DartifactId=jmeos \
            -Dversion=1.4 \
            -Dpackaging=jar \
            -q

      - name: License header check
        run: bash tools/scripts/check_license.sh

      - name: Compile
        run: mvn -B compile

      - name: Verify the H3 flag excludes the th3index package
        run: |
          mvn -B clean compile -DH3=OFF
          for f in h3 demo; do
            if [ -d "target/classes/org/mobilitydb/spark/$f" ]; then
              echo "ERROR: $f package compiled despite -DH3=OFF"; exit 1
            fi
          done
          mvn -B clean compile

      - name: Unit tests
        run: mvn -B test

      - name: Native crash diagnostics
        if: failure()
        run: |
          echo "===== JVM fatal error logs ====="
          find . -name 'hs_err_pid*.log' -print -exec cat {} \; 2>/dev/null || true
          echo "===== surefire dump streams ====="
          for f in $(find . -path '*/surefire-reports/*' \( -name '*.dump' -o -name '*.dumpstream' -o -name '*-jvmRun*.dump' \) 2>/dev/null); do
            echo "----- $f -----"; cat "$f" 2>/dev/null || true
          done
          echo "===== surefire reports + test stdout ====="
          for f in $(find target/surefire-reports -name '*.txt' 2>/dev/null); do
            echo "----- $f -----"; cat "$f" 2>/dev/null || true
          done

      - name: Package (fat jar)
        run: mvn -B package -DskipTests

      - name: Upload fat jar
        uses: actions/upload-artifact@v4
        with:
          name: mobilityspark-spark.jar
          path: target/*-spark.jar
11 changes: 9 additions & 2 deletions .gitignore
@@ -3,14 +3,21 @@
.project
.settings/

# Intellij
# IntelliJ IDEA
.idea/
*.iml
*.iws
*.ipr

# Mac
# macOS
.DS_Store
**/.DS_Store

# Maven
log/
target/

# Large BerlinMOD benchmark data (generated locally — too large for GitHub)
berlinmod/data/trips.csv
dependency-reduced-pom.xml
hs_err_pid*.log
22 changes: 22 additions & 0 deletions LICENSE
@@ -0,0 +1,22 @@
-------------------------------------------------------------------------------
This MobilityDB code is provided under The PostgreSQL License.

Copyright (c) 2020-2025, Université libre de Bruxelles and MobilityDB
contributors

Permission to use, copy, modify, and distribute this software and its
documentation for any purpose, without fee, and without a written agreement is
hereby granted, provided that the above copyright notice and this paragraph and
the following two paragraphs appear in all copies.

IN NO EVENT SHALL UNIVERSITE LIBRE DE BRUXELLES BE LIABLE TO ANY PARTY FOR
DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION,
EVEN IF UNIVERSITE LIBRE DE BRUXELLES HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.

UNIVERSITE LIBRE DE BRUXELLES SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING,
BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND
UNIVERSITE LIBRE DE BRUXELLES HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
175 changes: 175 additions & 0 deletions README.md
@@ -0,0 +1,175 @@
MobilitySpark
=============

[MEOS (Mobility Engine, Open Source)](https://www.libmeos.org/) is a C library that enables the
manipulation of temporal and spatiotemporal data based on
[MobilityDB](https://mobilitydb.com/)'s data types and functions.

MobilitySpark is a binding for [Apache Spark](https://spark.apache.org/) built on top of MEOS
via [JMEOS](https://github.com/MobilityDB/JMEOS) (the Java binding for MEOS).

<img src="doc/images/mobilitydb-logo.svg" width="200" alt="MobilityDB Logo" />

The MobilityDB project is developed by the Computer & Decision Engineering Department of the
[Université libre de Bruxelles](https://www.ulb.be/) (ULB) under the direction of
[Prof. Esteban Zimányi](http://cs.ulb.ac.be/members/esteban/).
ULB is an OGC Associate Member and a member of the OGC Moving Features Standard Working Group
([MF-SWG](https://www.ogc.org/projects/groups/movfeatswg)).

<img src="doc/images/OGC_Associate_Member_3DR.png" width="100" alt="OGC Associate Member Logo" />

---

## 1. Requirements

- **Java 21** (any OpenJDK build, e.g. Eclipse Temurin)
- **Apache Maven 3.8+**
- **Apache Spark 3.5** (provided at runtime; not needed to compile)
- **JMEOS 1.4** — bundled in `libs/JMEOS-1.4.jar`
(the Java binding for MEOS 1.4; includes `libmeos.so` for Linux).
A small `MeosNative.java` supplement covers ~70 MEOS-1.4-renamed
symbols not yet in the released JAR.

---

## 2. Building MobilitySpark

### Clone the repository

```sh
git clone https://github.com/MobilityDB/MobilitySpark.git
cd MobilitySpark
```

### Compile

```sh
mvn compile
```

### Package (fat jar for `spark-submit`)

```sh
mvn package -DskipTests
```

The fat jar is written to `target/mobilityspark-0.1.0-SNAPSHOT-spark.jar`.

---

## 3. Using MobilitySpark

### 3.1. Initialise MEOS and register UDFs

```java
import org.apache.spark.sql.SparkSession;
import org.mobilitydb.spark.MobilitySparkSession;

SparkSession spark = SparkSession.builder().master("local[2]").getOrCreate();
try (MobilitySparkSession ms = MobilitySparkSession.create(spark)) {
    // All UDFs are now available in Spark SQL
    spark.sql("SELECT atTime(trip, TIMESTAMP '2020-01-01 00:30:00') FROM trips").show();
}
```

### 3.2. Available UDFs

MobilitySpark covers **100% of MobilityDB's active addressable SQL surface**
(858/858 functions) — the same audit methodology runs against MobilityDuck
to keep both bindings in lockstep. See
[`docs/parity-100.md`](docs/parity-100.md) for the achievement note,
[`docs/parity-status.md`](docs/parity-status.md) for the per-section
coverage report (regenerable via `python3 scripts/parity-audit.py`),
and the comprehensive UDF inventory in [PR #5](https://github.com/MobilityDB/MobilitySpark/pull/5).

A small sample (every UDF group in MobilityDB has a Spark equivalent):

| UDF | Signature | Description |
|-----|-----------|-------------|
| `atTime` | `(STRING, TIMESTAMP) → STRING` | Restrict tgeompoint to a timestamp |
| `eIntersects` | `(STRING, STRING) → BOOLEAN` | Ever intersects a geometry |
| `nearestApproachDistance` | `(STRING, STRING) → DOUBLE` | Min distance at any common instant |
| `eDwithin` | `(STRING, STRING, DOUBLE) → BOOLEAN` | Ever within given distance |
| `spaceTimeTiles` | `(STRING, DOUBLE×3, STRING, STRING, TIMESTAMP, BOOLEAN) → ARRAY<STRING>` | Multidimensional tiling for parallel partitioning |
| `tfloatSeqSetGaps` | `(ARRAY<STRING>, STRING, DOUBLE, STRING) → STRING` | Build a tfloat sequence-set with gap detection |

All tgeompoint values are stored as **hex-WKB strings** (output of `temporal_as_hexwkb`).
Geometry values are **hex-EWKB strings** (output of `geo_as_hexewkb`).
Set / span / spanset / tbox / stbox values follow the same hex-encoding convention.
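
The hex-string convention above can be illustrated without MEOS at all. The following is a minimal sketch, using only the JDK's `java.util.HexFormat`, of how a binary payload round-trips through a `STRING` column value. The class name `HexStringSketch`, the upper-case choice, and the three placeholder bytes are assumptions made for illustration; the bytes are not a valid WKB value, and real values would come from the MEOS output functions named above.

```java
import java.util.Arrays;
import java.util.HexFormat;

// Illustrative helper, not part of MobilitySpark: converts between raw bytes
// and the hex-string form that can be stored in a Spark STRING column.
final class HexStringSketch {
    // Upper-case output is an arbitrary choice for this sketch.
    private static final HexFormat HEX = HexFormat.of().withUpperCase();

    /** Encodes raw bytes as an upper-case hex string. */
    static String toHex(byte[] bytes) {
        return HEX.formatHex(bytes);
    }

    /** Decodes a hex string (upper or lower case) back into bytes. */
    static byte[] fromHex(String hex) {
        return HexFormat.of().parseHex(hex);
    }

    public static void main(String[] args) {
        // Placeholder bytes only; NOT a valid WKB-encoded temporal value.
        byte[] payload = {0x01, (byte) 0xA0, 0x7F};
        String hex = toHex(payload);
        System.out.println(hex);                                   // 01A07F
        System.out.println(Arrays.equals(payload, fromHex(hex)));  // true
    }
}
```

Because the column type is plain `STRING`, such values survive any Spark operation that preserves strings (joins, shuffles, Parquet writes) without a custom data type.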

### 3.3. Portable SQL (BerlinMOD benchmark)

The SQL queries in `berlinmod/` use **named functions only** — no operator symbols — so the
same file runs unchanged on MobilityDB (PostgreSQL), MobilityDuck (DuckDB), and MobilitySpark
(Spark SQL). This is the portability contract defined in
[Discussion #861](https://github.com/MobilityDB/MobilityDB/discussions/861).
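
One way to guard the "named functions only" rule is a small lint over the SQL text. The sketch below is hypothetical and not part of the repository: the class name `PortableSqlCheckSketch` is invented here, and the operator list is a short, incomplete sample of MobilityDB-style operator tokens chosen for illustration. It strips `--` line comments and single-quoted literals before scanning, and it does not handle block comments or a `--` that appears inside a literal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lint, not part of MobilitySpark: reports operator tokens that
// would make a query non-portable across the three engines.
final class PortableSqlCheckSketch {
    // Illustrative and incomplete sample of operator tokens (an assumption).
    private static final List<String> OPERATORS =
        List.of("&&", "<->", "@>", "<@", "|=|");

    /** Returns the operator tokens found outside comments and string literals. */
    static List<String> findOperators(String sql) {
        String stripped = sql
            .replaceAll("--[^\\n]*", " ")        // drop line comments
            .replaceAll("'(?:[^']|'')*'", " ");  // drop single-quoted literals
        List<String> found = new ArrayList<>();
        for (String op : OPERATORS) {
            if (stripped.contains(op)) {
                found.add(op);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String portable = "SELECT eIntersects(t.trip, p.geom) FROM Trips t, QueryPoints p";
        String nonPortable = "SELECT 1 FROM Trips t, QueryPoints p WHERE t.trip && p.geom";
        System.out.println(findOperators(portable));     // []
        System.out.println(findOperators(nonPortable));  // [&&]
    }
}
```

Ordinary comparison operators such as `<` are deliberately left out of the list, since standard SQL comparisons are portable across all three engines.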

Run the demo:

```sh
spark-submit --class org.mobilitydb.spark.demo.BerlinMODDemo \
target/mobilityspark-0.1.0-SNAPSHOT-spark.jar
```

### 3.4. Sample queries

Restrict a trip to a query instant:
```sql
SELECT atTime(trip, TIMESTAMP '2020-01-01 00:30:00+00') AS pos FROM Trips;
```

Find vehicles that ever passed a query point:
```sql
SELECT DISTINCT v.licence
FROM Vehicles v JOIN Trips t ON t.vehId = v.vehId
JOIN QueryPoints p ON eIntersects(t.trip, p.geom);
```

Minimum nearest-approach distance between vehicle pairs:
```sql
SELECT MIN(nearestApproachDistance(t1.trip, t2.trip)) AS min_dist
FROM Trips t1 JOIN Trips t2 ON t1.vehId < t2.vehId;
```

---

## 4. Running the tests

Unit tests run without a Spark session (each UDF is a plain Java lambda):

```sh
mvn test
```
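
To see why no Spark session is needed, note that Spark's Java UDF interfaces (`org.apache.spark.sql.api.java.UDF1`, `UDF2`, and so on) are single-method interfaces, so a UDF body is an ordinary object that a test can call directly. The sketch below illustrates the idea with `java.util.function.BiFunction` as a dependency-free stand-in; the class `PlainLambdaUdfSketch` and its `CONCAT_OR_NULL` body are hypothetical and are not real MobilitySpark UDFs.

```java
import java.util.function.BiFunction;

// Hypothetical stand-in for a UDF body: shows that a lambda can be exercised
// directly, with no SparkSession and no SQL parsing involved.
final class PlainLambdaUdfSketch {
    // Null-propagating body, mirroring the usual SQL convention that a NULL
    // argument yields a NULL result (an assumption for this sketch).
    static final BiFunction<String, String, String> CONCAT_OR_NULL =
        (a, b) -> (a == null || b == null) ? null : a + b;

    public static void main(String[] args) {
        System.out.println(CONCAT_OR_NULL.apply("ab", "cd"));  // abcd
        System.out.println(CONCAT_OR_NULL.apply(null, "cd"));  // null
    }
}
```

The real UDFs still call into MEOS, so the native library must be loadable by the test JVM even though Spark itself is not started.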

---

## 5. Examples

Numbered examples mirror the MEOS C examples (`meos/examples/01_hello_world.c`, etc.):

| Class | Description |
|-------|-------------|
| `N01HelloWorld` | Round-trip a tgeompoint through hex-WKB |
| `N03BerlinMOD` | BerlinMOD Q1/Q3/Q4/Q5/Q6 portable SQL |

Run with:
```sh
spark-submit --class org.mobilitydb.spark.examples.N01HelloWorld \
target/mobilityspark-0.1.0-SNAPSHOT-spark.jar
```

---

## 6. Project structure

```
src/main/java/org/mobilitydb/spark/
MobilitySparkSession.java — entry point: init MEOS + register all UDFs
temporal/TemporalUDFs.java — atTime and other time-axis UDFs
geo/GeoUDFs.java — eIntersects, nearestApproachDistance, eDwithin
examples/N01HelloWorld.java — hello-world example
examples/N03BerlinMOD.java — BerlinMOD portable SQL demo
demo/BerlinMODDemo.java — Q1/Q3/Q4/Q5/Q6 implementation
udfs/TemporalUDFs.java — convenience facade (registerAll)

berlinmod/ — portable SQL files (RFC #861)
libs/JMEOS-1.4.jar — JMEOS 1.4 (includes libmeos.so)
tools/scripts/ — license header checker
```