From 6ee075e5807d9005c45c5972d88fa8a84dc299d1 Mon Sep 17 00:00:00 2001 From: Esteban Zimanyi Date: Sun, 10 May 2026 16:28:44 +0200 Subject: [PATCH 1/5] docs+scripts: PR coordination policy + audit/lint tooling refresh (consolidates #115 + #119) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Three related artefacts that prevent the same class of failure (parallel work on the same surface drifting in policy and offset state): docs/PR-COORDINATION.md — ecosystem policy: gh pr list is the first step before any code change; minimise PR count by folding into existing PRs; squash each PR to a single commit before review. docs/CONSOLIDATION-PLAN.md — file-level overlap matrix between commits already on main and the open consolidate/* parity PRs. scripts/lint-tz-pinned-tests.py — flags every line in an expected-output block that carries a hardcoded UTC offset. scripts/parity-audit.py — three improvements: - Strip single-line SQL comments before matching CREATE FUNCTION, so '-- CREATE FUNCTION tdirection(...)' placeholders for unimplemented MobilityDB functions don't inflate the missing-list. - OUT_OF_SCOPE_NAMES bucket for domain-specific names that are PG-only or have no DuckDB equivalent: transform_gk (Gauss-Krüger for SECONDO), create_trip (BerlinMOD generator), _edisjoint (PG SQL-wrapper internal helper), box2d/box3d (PostGIS-only bbox types), range/multirange (PG range/multirange types). Headline 90.3% → 92.8% addressable (combined with the parity-adding PRs in flight). --- docs/CONSOLIDATION-PLAN.md | 76 +++++++++++++++++ docs/PR-COORDINATION.md | 145 ++++++++++++++++++++++++++++++++ scripts/lint-tz-pinned-tests.py | 109 ++++++++++++++++++++++++ scripts/parity-audit.py | 34 +++++++- 4 files changed, 363 insertions(+), 1 deletion(-) create mode 100644 docs/CONSOLIDATION-PLAN.md create mode 100644 docs/PR-COORDINATION.md create mode 100755 scripts/lint-tz-pinned-tests.py diff --git a/docs/CONSOLIDATION-PLAN.md b/docs/CONSOLIDATION-PLAN.md new file mode 100644 index 00000000..9fa0d1d1 --- /dev/null +++ b/docs/CONSOLIDATION-PLAN.md @@ -0,0 +1,76 @@ +# Consolidation plan — open parity work + +This document captures the file-level overlap between the parity commits +already on `main` and the open `consolidate/*` PRs, and suggests the +resolution path. Maintained as a working artefact during consolidation; +delete once everything is merged. + +## Direct overlaps + +### Per-thread MEOS init (PR #111) vs. single-timezone commits on `main` + +| On main | PR #111 | +|---|---| +| `39921f1 fix(tz): single-timezone model` adds `meos_initialize_timezone("Europe/Brussels")` + `AutoLoadExtension(icu)` + `SetOptionByName("TimeZone", "Europe/Brussels")` to `LoadInternal`. | `c237f6c fix(threads): replace global mutex with per-thread MEOS initialization` REMOVES the entire `meos_initialize()` block from `LoadInternal`; per-thread guard initializes MEOS lazily on first use. | +| `08a5598 docs(tz): clarify two-timezone reality in comments` comments framing the Brussels override. | `9dd765a test(stbox): make timestamp assertions timezone-neutral` establishes the project policy: tests use `stbox_eq()` / `=` / `asText(...)` comparisons, never offset-bearing string matches. | + +**Resolution**: when PR #111 merges, revert `39921f1` and `08a5598` from +`main`. Any test files pinned to `+01` that the single-timezone +commits introduced (`040_tgeometry_parity.test`, `041_tgeography_parity.test`, +`042_tgeogpoint_parity.test`, plus the `update_test_expected.py` rewrite of +~25 test files) need to be rewritten as timezone-neutral, **not flipped +back to `+00`**. + +### Parity work on `main` vs. `consolidate/*` PRs + +| Main commit | Files | Overlapping PR | +|---|---|---| +| `c8cad6d feat(parity): Binary/HexWKB I/O for sets, spans, spansets` | `src/temporal/{set,span,spanset}{,_functions}.cpp` | #103 `consolidate/aggregates-parity` | +| `91102ae feat(parity): tgeometry/tgeography Binary/HexWKB/MFJSON/Text parsers` | `src/geo/{tgeometry,tgeography}_in_out.cpp` | #102 `consolidate/tiles-bins-parity`, #104 `consolidate/geo-types-parity` | +| `afac6eb feat(parity): tbool/tint/tfloat/ttext FromHexWKB and FromMFJSON parsers` | `src/temporal/temporal.cpp` | #97 `consolidate/temporal-ops-parity`, #100 `consolidate/analytics-parity`, #112 `feat/wkb-roundtrip-all-types` | +| `88227cd feat(parity): tgeo_teq/tne for tgeometry and tgeography` | `src/geo/{tgeometry,tgeography}_ops.cpp` | #98 `consolidate/spatial-predicates-parity`, #104 | +| `e958b59 feat(parity): tgeo_teq/tne aliases + audit fixes` | `src/geo/tgeompoint.cpp`, `scripts/parity-audit.py` | #98, #99 `consolidate/tgeompoint-ops-parity` | +| `e41c8d9 feat(parity): tbool_and/or/not, ttext_cat, mobilitydb_version aliases` | `src/temporal/temporal.cpp`, `src/mobilityduck_extension.cpp` | #97, #99, #102, #103 | +| `cb88cc0 docs(parity): exclude PG-only entries from headline coverage` | `scripts/parity-audit.py`, `docs/parity-status.md` | none direct | + +**Resolution options** (per consolidation PR — pick one): + +1. **Rebase the consolidation PR on top of the main commits**, then drop + the now-duplicate registrations from the PR diff. Cleanest when the + consolidation PR is the larger / more comprehensive surface. + +2. **Revert the main commit and fold its registrations into the + consolidation PR**. Cleanest when the consolidation PR has not yet + added a particular alias but the main commit has. + +3. **Keep both, accept the diff churn**. Only viable when the main + commit and the consolidation PR add the same name-list and the diff + is truly identical — which is rare given audit/policy drift between + the two streams. + +The maintainer makes the call per PR. `cb88cc0` (audit script +infrastructure) is independent of the consolidations and can stay on +`main` regardless. + +## Independent items (no current overlap) + +These pushed branches do not collide with any open PR and can land +independently: + +- `docs/pr-coordination-policy` (this batch) — adds + `docs/PR-COORDINATION.md`. + +## Notes for the maintainer + +- The audit script `scripts/parity-audit.py` is the source of truth for + coverage tracking. Regenerate `docs/parity-status.md` after each + consolidation merge so the headline number reflects the merged + surface. At time of writing the audit reports 90.3% addressable + parity on `main`. + +- Cross-platform fanout — every name renamed or removed from MEOS + (`meosType → MeosType`, `tcontains_geo_tgeo` arg-count drop, etc.) + needs a corresponding sync in MobilityDuck. The parallel branch + `perf/duck-stack-attempt2` already has these (`7a8cc22`, `c37b9e2`); + these need to land on `main` too, ideally before the consolidation + PRs. diff --git a/docs/PR-COORDINATION.md b/docs/PR-COORDINATION.md new file mode 100644 index 00000000..46d56b02 --- /dev/null +++ b/docs/PR-COORDINATION.md @@ -0,0 +1,145 @@ +# PR coordination policy + +Before changing any source file in MobilityDuck, **check the open PR list first**. +The same applies across the ecosystem (MobilityDB, JMEOS, PyMEOS, MobilitySpark, +MEOS-API, MobilityDB-Deck, etc.) when a change in one repo could land before +related changes in a sibling. + +## The rule + +``` +gh -R MobilityDB/MobilityDuck pr list --state open --limit 30 +``` + +Read the titles. Open the diff of any PR whose title touches your area. Read +the body for policy decisions. Only then make a code change. + +## Why + +Two recent failure modes that this policy prevents: + +1. **Duplicated work.** Several `consolidate/*` parity PRs were open + (`#97`, `#98`, `#99`, `#100`, `#102`, `#103`, `#104`) covering temporal-ops, + tspatial-predicates, tgeompoint-ops, analytics, tile/bin, aggregates, and + the geo type triple. A parallel parity stream pushed equivalent commits + directly to `main`, creating a 6-commit overlap that has to be untangled + before either side can merge. + +2. **Reverted policies reintroduced.** PR #111 (`fix/per-thread-meos-init`) + moved MEOS init out of `LoadInternal` and onto a per-thread guard, + *and* established the project policy of timezone-neutral test assertions + (`stbox_eq()`, `=` round-trips, `asText(...)` comparisons). A separate + stream simultaneously committed `fix(tz): single-timezone model — extension + forces both MEOS and DuckDB to Europe/Brussels` to `LoadInternal` — + exactly the policy PR #111 was reverting. The two will conflict at merge, + and any test files pinned to `+01` need rewriting either way. + +The PR queue carries the project's current direction. The cost of skimming it +is a few seconds; the cost of a merge conflict on a 50-file consolidation PR +is hours. + +## How to apply + +1. **Before any commit**: + - `gh pr list --state open` in the affected repo. + - For PRs whose titles touch your area, run `gh pr view --json title,body,files --jq '{title, body, files: [.files[].path]}'`. + - If your change duplicates the surface or conflicts with a policy + decision: stop, comment on the PR or coordinate, do not push. + +2. **Before pushing new commits to `main`**: confirm none of the open + `consolidate/*` PRs cover the same surface. Treat the consolidation + branches as pending merges. + +3. **Before adding a new policy** (init order, threading, error handling, + timezone, type-rename status): grep open PRs for the same area. If a PR + is changing the policy you're about to set, don't. + +4. **When parallel sessions are mentioned**: assume one is currently running + in the same checkout. Use `git worktree list`, `git status`, + `gh pr list` to see what they're touching. Don't push to `main` of an + org repo while a parallel session is doing the same. + +5. **For follow-up commits to a PR you've already opened**: also check + whether other open PRs would conflict with your follow-up — the original + PR's existence doesn't immunize follow-ups. + +## What this policy is NOT + +- It is **not** "ask the user before every commit." Local exploratory work + that you don't push is fine. +- It is **not** "wait for all PRs to merge." Independent work that doesn't + touch the same files or policies is fair game. +- The trigger is **file-level overlap** with open PRs, plus **policy-level + overlap** (init order, error handling, type-rename status, test patterns). + +## Cross-ecosystem variant + +The same rule applies across the ecosystem when a change in one repo could +land before related changes in a sibling repo: + +- MobilityDB MEOS API change → check JMEOS / PyMEOS / MobilityDuck / + MEOS-API for in-flight syncs. +- MobilityDuck binding addition → check MobilitySpark / MobilityPySpark for + parity expectations. +- Rename or removal in MEOS C → check all binding repos' open PRs for + cherry-picks of the rename. + +The "Cross-platform uniformity" policy covers the *post-merge* obligation +(update all bindings before removing a name); this policy covers the +*pre-commit* obligation (don't start work that an open PR has already +covered or is reversing). + +## One PR = one commit = one feature + +Two complementary obligations apply to every PR: + +1. **Minimise PR count.** Before opening a new PR, check open PRs + (`gh pr list`) and consolidate the new work into an existing PR if + the surface is topic-coherent. Five small PRs that add binding + registrations for the same family of MEOS functions should be one PR, + not five. + +2. **Squash each PR to a single commit before review.** The squashed + message becomes the merge commit message, so write it carefully: + subject = the PR title's "what changed", body = rationale and + reviewer-facing notes. + +### Squash recipe + +```sh +git checkout +tree=$(git write-tree) +parent=$(git merge-base HEAD origin/main) +msg=$(cat <<'EOF' + + + +EOF +) +newhash=$(git commit-tree -p $parent -m "$msg" $tree) +git reset --hard $newhash +git push --force-with-lease origin +``` + +This preserves authorship metadata (every commit's author stays as the +original author) while collapsing the history. + +### Why + +Reviewer cost is the dominant cost. One consolidated PR with one commit +means the maintainer reads one diff, interprets one CI run, makes one +merge decision. Three small PRs with three commits each multiplies that +by nine. Single-commit PRs also make `git revert` precise (one commit = +one feature = one revert) and eliminate ordering ambiguity inside the +PR's history. + +### Exceptions + +- Branches already exceeding 20 commits — merging as-is is cheaper than + squashing. +- Cherry-picked commits that need to remain attributed to their original + author with the original commit hash for archaeology. +- Truly orthogonal work (a docs PR and an unrelated code PR should stay + separate). +- Dependency-chained PRs where the second PR genuinely needs to merge + after the first. diff --git a/scripts/lint-tz-pinned-tests.py b/scripts/lint-tz-pinned-tests.py new file mode 100755 index 00000000..176d48de --- /dev/null +++ b/scripts/lint-tz-pinned-tests.py @@ -0,0 +1,109 @@ +#!/usr/bin/env python3 +"""Flag timezone-pinned expected values in DuckDB sqllogic tests. + +A test that hardcodes a timezone offset (`+00`, `+01`, `+02`, …) in an +expected output line is fragile: MEOS renders timestamps using the +process's TZ, and any divergence between the developer's machine and CI +flips the expected offset. The project policy is to write +timezone-neutral assertions instead — value equality (`= tstzspan +'...'`), accessor functions (`stbox_eq`, `numSpans`, `numValues`, +`startTimestamp() = …`), or `asText(...)` round-trips on both sides. + +This script walks `test/sql/**/*.test`, finds every line in an +expected-output block (the lines after `----`) that contains a +`±NN` offset, and prints the file:line and a short snippet. Returns +non-zero if any are found, so it can be used as a pre-commit gate or a +CI lint step. + +Inputs that look like literal-offset SQL (`tstzspan '[2000-01-01 +00:00:00+00, ...]'`) are part of a query line, not an expected value, +and are skipped — only lines that follow a `----` separator within a +test block are checked. + +Usage: + python3 scripts/lint-tz-pinned-tests.py [--root test] +""" + +import argparse +import glob +import os +import re +import sys + + +# Match a UTC-offset literal at the tail of a TIMESTAMPTZ rendering. +# Matches `+00`, `+05:30`, `-08`, etc. The negative lookbehind for +# `[eE]` avoids scientific-notation false positives (`1.5e+00`). +TZ_OFFSET_RE = re.compile(r"(? 100: + snippet = snippet[:97] + "..." + print(f"{rel}:{lineno}: {snippet}") + + if pinned_count: + print() + print( + f"Found {pinned_count} timezone-pinned expected values in " + f"{len(pinned_files)} files. Rewrite as value-equality " + f"(`= tstzspan '...'`), accessor (`numSpans`, `startTimestamp`), " + f"or `asText(...)` round-trip assertions per the project " + f"timezone-neutral policy (PR #111 / commit 9dd765a)." + ) + return 1 + + print("No timezone-pinned expected values found.") + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/scripts/parity-audit.py b/scripts/parity-audit.py index 90ce34a3..52c83cf8 100755 --- a/scripts/parity-audit.py +++ b/scripts/parity-audit.py @@ -83,9 +83,41 @@ ) +# Individual function names that are out-of-scope by domain — not part of +# the MobilityDuck surface because they exist for a MobilityDB-specific +# integration or PG-internal pattern that DuckDB doesn't share. +OUT_OF_SCOPE_NAMES = { + # Gauss-Krüger projection — added to MobilityDB to connect with the + # SECONDO platform. No equivalent need in MobilityDuck. + "transform_gk", + # Synthetic-trip generator for the BerlinMOD benchmark generator. + # The generator runs in MobilityDB / SECONDO; the parquet artefacts + # it produces are what MobilityDuck consumes. + "create_trip", + # Internal C helper composed by the `edisjoint` SQL wrapper for the + # bbox short-circuit pattern (`NOT(bbox && temp) OR _edisjoint(...)`). + # MobilityDuck registers `eDisjoint` directly against the MEOS C + # export, so the PG SQL-wrapper helper has no DuckDB equivalent. + "_edisjoint", + # PostGIS GEOMETRY-flavored bounding-box types (box2d / box3d) — + # PostgreSQL-only, no DuckDB equivalent. STBox covers the + # spatiotemporal bbox surface in MobilityDuck. + "box2d", + "box3d", + # PG range / multirange types — PostgreSQL-specific. DuckDB has no + # range type, so range(intspan) / multirange(intspanset) casts have + # no target type to land in. + "range", + "multirange", +} + + def is_out_of_scope_name(fname): - """Return True for PG-only helper names (suffix match).""" + """Return True for PG-only helper names (suffix match) or + domain-specific names that have no MobilityDuck equivalent.""" lower = fname.lower() + if lower in {n.lower() for n in OUT_OF_SCOPE_NAMES}: + return True # All suffixes start with `_`, so a non-empty prefix means the suffix # matched a "_" shape (e.g. tnumber_in, temporal_sel). for suf in OUT_OF_SCOPE_NAME_SUFFIXES: From cf36886adb6682a56f3324d709bd1b1340af4be9 Mon Sep 17 00:00:00 2001 From: Esteban Zimanyi Date: Wed, 20 May 2026 11:55:01 +0200 Subject: [PATCH 2/5] fix: drop premature `using meosType = MeosType;` alias in tydef.hpp MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `meosType` (lower-case) is the **pre-consolidation** MEOS type name; `MeosType` (upper-case) is the **post-consolidation** target that the upstream rename sweep has not yet reached. The current vcpkg pin (`vcpkg_ports/meos/portfile.cmake` REF f11b7443ee98…) is still pre-consolidation: `meos/include/temporal/meos_catalog.h` line 121 declares the typedef as `} meosType;` and every MEOS API uses the lower-case spelling. MobilityDuck's source code consistently uses `meosType` to match — `grep -rn '\bMeosType\b' src/` finds the name only on the alias line and its comment, nowhere else. c8cad6d added `using meosType = MeosType;` as a forward-looking bridge for the eventual consolidation bump. That bridge points at `MeosType`, which the current pin does NOT yet expose, so it breaks every PR's Linux arm64 build with: /duckdb_build_dir/src/include/tydef.hpp:18:18: error: ‘MeosType’ does not name a type; did you mean ‘meosType’? The fix is to drop the premature alias and replace the misleading comment with one that documents the pre/post-consolidation distinction and the resume path for the next pin bump — at that point a reviewer can either restore the bridge (this time it'll be valid because `MeosType` will exist) or sweep the MobilityDuck source from `meosType` to `MeosType` in a single PR. Unblocks every in-flight PR's Linux arm64 build: #126, #130, #149, #158, #159, #160, plus the entire `feat/*_port_core` extended-type stack (#148/#150/#151/#153/#155/#156). --- src/include/tydef.hpp | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/src/include/tydef.hpp b/src/include/tydef.hpp index b7b28109..b9860ca0 100644 --- a/src/include/tydef.hpp +++ b/src/include/tydef.hpp @@ -11,11 +11,27 @@ extern "C" { #include } -// Forward-compat alias for the meosType → MeosType rename (MobilityDB -// pr785-sync-script). Vcpkg's MEOS exposes `MeosType`; existing -// MobilityDuck code still uses `meosType`. This alias bridges the two -// without touching every reference site. -using meosType = MeosType; +// MEOS naming history: `meosType` is the **pre-consolidation** spelling +// and `MeosType` is the **post-consolidation** target (the rename is +// part of the upstream consolidation sweep, not yet reached by the +// vcpkg pin). The current pin +// (`vcpkg_ports/meos/portfile.cmake` REF f11b7443ee98…) is still +// pre-consolidation and exposes `meosType` — see +// meos/include/temporal/meos_catalog.h, where line 121 declares +// `} meosType;`. MobilityDuck's source consistently uses +// `meosType` (verified via `grep -rn '\bmeosType\b' src/`), which +// matches the pin, so no alias is needed today. +// +// An earlier version of this file added `using meosType = MeosType;` +// as a forward-looking bridge for the eventual consolidation bump. +// That alias references `MeosType`, which the current pin does NOT +// yet expose, so it broke the build: +// "'MeosType' does not name a type; did you mean 'meosType'?". +// +// When the MEOS pin is bumped past the consolidation point, restore +// a bridge here (`using meosType = MeosType;` becomes valid then) or +// sweep the source `meosType → MeosType` in one PR — whichever the +// project prefers at that time. namespace duckdb { From c7816dabead836de248a64246e91ba04316e03fb Mon Sep 17 00:00:00 2001 From: Esteban Zimanyi Date: Fri, 15 May 2026 08:49:29 +0200 Subject: [PATCH 3/5] polyfill(ci): stage icu extension for amd64 docker tests (from open PR #136) Cherry-picked from open PR #136 (commit 9e1d7a6) so this PR's CI goes green before #136 lands. When #136 reaches main, the rebase will collapse this commit to a no-op and it will drop out. --- original commit body --- Pre-stage icu extension for amd64 docker tests LoadInternal calls ExtensionHelper::AutoLoadExtension(db, "icu") so the Europe/Brussels timezone option is honoured. Inside the linux_amd64 test docker container there is no network egress and the local extension directory is empty, so the autoload fails. Copy the icu.duckdb_extension that was just built locally (declared in extension_config.cmake) into the expected path before running the unittester. --- Makefile | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index bc17d050..ab8cf2ee 100644 --- a/Makefile +++ b/Makefile @@ -11,9 +11,32 @@ include extension-ci-tools/makefiles/duckdb_extension.Makefile # both MEOS (meos_initialize_timezone) and DuckDB (DBConfig::SetOptionByName # "TimeZone") to Europe/Brussels. Tests pass on any OS timezone — the # extension is the single source of truth, no TZ env var needed. +# +# LoadInternal also calls ExtensionHelper::AutoLoadExtension(db, "icu") so +# the timezone option is honoured. Autoload looks for the extension on disk +# at $HOME/.duckdb/extensions///icu.duckdb_extension +# and falls back to a hub download. Inside the linux_amd64 test docker +# container that path is empty and there is no network egress, so the +# autoload fails. We copy the icu.duckdb_extension that was built locally +# as part of this extension's build (declared in extension_config.cmake) +# into the expected path before running the unittester. +DUCKDB_VERSION_TAG := v1.4.4 + +define stage_icu + @if [ -f ./build/$(1)/extension/icu/icu.duckdb_extension ]; then \ + platform=$$(uname -m | sed 's/x86_64/linux_amd64/;s/aarch64/linux_arm64/'); \ + target=$$HOME/.duckdb/extensions/$(DUCKDB_VERSION_TAG)/$$platform; \ + mkdir -p "$$target" && cp -f ./build/$(1)/extension/icu/icu.duckdb_extension "$$target/" && \ + echo "Staged icu.duckdb_extension at $$target/"; \ + fi +endef + test_release_internal: + $(call stage_icu,release) ./build/release/$(TEST_PATH) "$(PROJ_DIR)test/*" test_debug_internal: + $(call stage_icu,debug) ./build/debug/$(TEST_PATH) "$(PROJ_DIR)test/*" test_reldebug_internal: - ./build/reldebug/$(TEST_PATH) "$(PROJ_DIR)test/*" \ No newline at end of file + $(call stage_icu,reldebug) + ./build/reldebug/$(TEST_PATH) "$(PROJ_DIR)test/*" From 0df3c6091c6e7ff65a99c44dfedb417a80129c9c Mon Sep 17 00:00:00 2001 From: Esteban Zimanyi Date: Wed, 20 May 2026 19:21:51 +0200 Subject: [PATCH 4/5] polyfill(macos/wasm): bigint_to_set_duckdb int64_t forwarder (from open PR #140) On macOS LP64 and Wasm/emscripten, int64 (long) and int64_t (long long) are the same width but distinct types, so clang rejects passing bigint_to_set where a Set *(*)(int64_t) is expected as a non-type template arg. Cherry- picked from open PR #140 (a8b1755) so this PR goes green on osx_amd64, osx_arm64, and wasm_mvp before #140 lands. The cast is a no-op on Linux, where int64 and int64_t are both long. When #140 reaches main the rebase collapses this commit to a no-op. --- src/temporal/set.cpp | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/src/temporal/set.cpp b/src/temporal/set.cpp index b803498a..1858e489 100644 --- a/src/temporal/set.cpp +++ b/src/temporal/set.cpp @@ -945,6 +945,14 @@ static inline Set *date_to_set_duckdb(DateADT d) { return date_to_set(ToMeosDate(duckdb::date_t(d))); } +// macOS LP64: int64 (long) and int64_t (long long) are the same width but +// distinct types, so clang rejects passing bigint_to_set where a +// Set *(*)(int64_t) is expected as a non-type template arg. The cast is a +// no-op on Linux. See SetUnionScalarFunction below. +static inline Set *bigint_to_set_duckdb(int64_t i) { + return bigint_to_set(static_cast(i)); +} + struct SetPtrState { Set *accumulated; }; @@ -1069,7 +1077,7 @@ void SetTypes::RegisterSetUnionAgg(ExtensionLoader &loader) { LogicalType::INTEGER, SetTypes::intset())); set_union_set.AddFunction( AggregateFunction::UnaryAggregateDestructor>( + SetUnionScalarFunction>( LogicalType::BIGINT, SetTypes::bigintset())); set_union_set.AddFunction( AggregateFunction::UnaryAggregateDestructor Date: Sat, 16 May 2026 11:04:41 +0200 Subject: [PATCH 5/5] Stage icu for the macOS osx_arm64 test path too The stage_icu helper mapped only the Linux uname values, so on the macOS arm64 test runner uname -m returned "arm64" and the icu extension was copied to .duckdb/extensions/v1.4.4/arm64 instead of .../osx_arm64, where DuckDB's autoload looks. The hub fallback is not reliably resolvable on that runner, so the osx_arm64 Test step failed to load the extension. Map the OS and architecture to the DuckDB platform string (linux_amd64, linux_arm64, osx_amd64, osx_arm64) so the locally built icu is staged at the path autoload expects on every tested platform; the Linux mapping is unchanged. Co-Authored-By: Claude Opus 4.7 --- Makefile | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/Makefile b/Makefile index ab8cf2ee..a9485498 100644 --- a/Makefile +++ b/Makefile @@ -15,16 +15,23 @@ include extension-ci-tools/makefiles/duckdb_extension.Makefile # LoadInternal also calls ExtensionHelper::AutoLoadExtension(db, "icu") so # the timezone option is honoured. Autoload looks for the extension on disk # at $HOME/.duckdb/extensions///icu.duckdb_extension -# and falls back to a hub download. Inside the linux_amd64 test docker -# container that path is empty and there is no network egress, so the -# autoload fails. We copy the icu.duckdb_extension that was built locally -# as part of this extension's build (declared in extension_config.cmake) -# into the expected path before running the unittester. +# and falls back to a hub download. That fails both inside the linux_amd64 +# test docker container (empty path, no network egress) and on the macOS +# osx_arm64 test runner (hub icu not reliably resolvable). We copy the +# icu.duckdb_extension that was built locally as part of this extension's +# build (declared in extension_config.cmake) into the expected path, +# matched to the DuckDB platform string, before running the unittester. DUCKDB_VERSION_TAG := v1.4.4 define stage_icu @if [ -f ./build/$(1)/extension/icu/icu.duckdb_extension ]; then \ - platform=$$(uname -m | sed 's/x86_64/linux_amd64/;s/aarch64/linux_arm64/'); \ + case "$$(uname -s)-$$(uname -m)" in \ + Linux-x86_64) platform=linux_amd64 ;; \ + Linux-aarch64) platform=linux_arm64 ;; \ + Darwin-arm64) platform=osx_arm64 ;; \ + Darwin-x86_64) platform=osx_amd64 ;; \ + *) platform=$$(uname -m) ;; \ + esac; \ target=$$HOME/.duckdb/extensions/$(DUCKDB_VERSION_TAG)/$$platform; \ mkdir -p "$$target" && cp -f ./build/$(1)/extension/icu/icu.duckdb_extension "$$target/" && \ echo "Staged icu.duckdb_extension at $$target/"; \