From cf36886adb6682a56f3324d709bd1b1340af4be9 Mon Sep 17 00:00:00 2001 From: Esteban Zimanyi Date: Wed, 20 May 2026 11:55:01 +0200 Subject: [PATCH 1/5] fix: drop premature `using meosType = MeosType;` alias in tydef.hpp MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `meosType` (lower-case) is the **pre-consolidation** MEOS type name; `MeosType` (upper-case) is the **post-consolidation** target that the upstream rename sweep has not yet reached. The current vcpkg pin (`vcpkg_ports/meos/portfile.cmake` REF f11b7443ee98…) is still pre-consolidation: `meos/include/temporal/meos_catalog.h` line 121 declares the typedef as `} meosType;` and every MEOS API uses the lower-case spelling. MobilityDuck's source code consistently uses `meosType` to match — `grep -rn '\bMeosType\b' src/` finds the name only on the alias line and its comment, nowhere else. c8cad6d added `using meosType = MeosType;` as a forward-looking bridge for the eventual consolidation bump. That bridge points at `MeosType`, which the current pin does NOT yet expose, so it breaks every PR's Linux arm64 build with: /duckdb_build_dir/src/include/tydef.hpp:18:18: error: ‘MeosType’ does not name a type; did you mean ‘meosType’? The fix is to drop the premature alias and replace the misleading comment with one that documents the pre/post-consolidation distinction and the resume path for the next pin bump — at that point a reviewer can either restore the bridge (this time it'll be valid because `MeosType` will exist) or sweep the MobilityDuck source from `meosType` to `MeosType` in a single PR. Unblocks every in-flight PR's Linux arm64 build: #126, #130, #149, #158, #159, #160, plus the entire `feat/*_port_core` extended-type stack (#148/#150/#151/#153/#155/#156). --- src/include/tydef.hpp | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/src/include/tydef.hpp b/src/include/tydef.hpp index b7b28109..b9860ca0 100644 --- a/src/include/tydef.hpp +++ b/src/include/tydef.hpp @@ -11,11 +11,27 @@ extern "C" { #include } -// Forward-compat alias for the meosType → MeosType rename (MobilityDB -// pr785-sync-script). Vcpkg's MEOS exposes `MeosType`; existing -// MobilityDuck code still uses `meosType`. This alias bridges the two -// without touching every reference site. -using meosType = MeosType; +// MEOS naming history: `meosType` is the **pre-consolidation** spelling +// and `MeosType` is the **post-consolidation** target (the rename is +// part of the upstream consolidation sweep, not yet reached by the +// vcpkg pin). The current pin +// (`vcpkg_ports/meos/portfile.cmake` REF f11b7443ee98…) is still +// pre-consolidation and exposes `meosType` — see +// meos/include/temporal/meos_catalog.h, where line 121 declares +// `} meosType;`. MobilityDuck's source consistently uses +// `meosType` (verified via `grep -rn '\bmeosType\b' src/`), which +// matches the pin, so no alias is needed today. +// +// An earlier version of this file added `using meosType = MeosType;` +// as a forward-looking bridge for the eventual consolidation bump. +// That alias references `MeosType`, which the current pin does NOT +// yet expose, so it broke the build: +// "'MeosType' does not name a type; did you mean 'meosType'?". +// +// When the MEOS pin is bumped past the consolidation point, restore +// a bridge here (`using meosType = MeosType;` becomes valid then) or +// sweep the source `meosType → MeosType` in one PR — whichever the +// project prefers at that time. namespace duckdb { From 2b7322a006ab575f942d281c1029656d425c7873 Mon Sep 17 00:00:00 2001 From: Esteban Zimanyi Date: Wed, 20 May 2026 12:25:38 +0200 Subject: [PATCH 2/5] diag: surface odd-length hex output from MEOS *_as_hexwkb on macOS arm64 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit PRs #126 and #130 have been failing the macOS osx_arm64 unittest with: Invalid hex string, length (69) has to be a multiple of two! That message is from DuckDB's hex/blob decoder rejecting an odd-length string — meaning one of the MEOS `*_as_hexwkb` producers is emitting a 69-char hex string on that platform. Hex strings of WKB-encoded MEOS values must always be even-length (each byte is encoded as 2 hex chars), so the producer is either: - producing a literal odd-length string (MEOS-side bug), or - producing an even-length string with an embedded null at an odd position (DuckDB sees that null as the string end). This commit wraps every `*_as_hexwkb` call-site with a diagnostic-grade precondition check that runs immediately after the MEOS call returns. When the producer emits odd-length output, the exception message now includes both `strlen(hex)` and the `sz` out-parameter reported by MEOS — pinpointing which side is wrong and what the truncation point was. Sites instrumented (7 producers): * src/geo/tgeompoint.cpp — TgeoAsHexWkbExec (temporal_as_hexwkb) * src/geo/stbox_functions.cpp — Stbox_as_hexwkb (stbox_as_hexwkb) * src/temporal/span_functions.cpp — Span_as_hexwkb * src/temporal/spanset_functions.cpp — Spanset_as_hexwkb * src/temporal/set_functions.cpp — Set_as_hexwkb * src/temporal/tbox_functions.cpp — Tbox_as_hexwkb * src/temporal/temporal_functions.cpp — Temporal_as_hexwkb Each site also switches `StringVector::AddString(result, hex)` (which uses `strlen(hex)` implicitly) to `AddString(result, hex, actual)` (which uses the explicit length we just measured), so any latent embedded-null bug becomes detectable at the producer rather than the consumer. The change is intentionally a no-op on green paths: hex strings that are already even-length are passed through with no extra behaviour change, just an explicit length argument. Refs: #126, #130, #158 (touches some of the same files; sequence after #158 lands). --- src/geo/stbox_functions.cpp | 14 ++++++++++++-- src/geo/tgeompoint.cpp | 19 +++++++++++++++++-- src/temporal/set_functions.cpp | 13 +++++++++++-- src/temporal/span_functions.cpp | 13 +++++++++++-- src/temporal/spanset_functions.cpp | 13 +++++++++++-- src/temporal/tbox_functions.cpp | 13 +++++++++++-- src/temporal/temporal_functions.cpp | 16 ++++++++++++++-- 7 files changed, 87 insertions(+), 14 deletions(-) diff --git a/src/geo/stbox_functions.cpp b/src/geo/stbox_functions.cpp index b5398f02..5985ba5d 100644 --- a/src/geo/stbox_functions.cpp +++ b/src/geo/stbox_functions.cpp @@ -310,8 +310,18 @@ void StboxFunctions::Stbox_as_hexwkb(DataChunk &args, ExpressionState &state, Ve throw InternalException("Failure in Stbox_as_hexwkb: unable to cast stbox to hexwkb"); return string_t(); } - string_t ret_str(wkb); - string_t stored_data = StringVector::AddStringOrBlob(result, ret_str); + // Diagnostic: hex strings must be even-length. See + // src/geo/tgeompoint.cpp TgeoAsHexWkbExec for context. + size_t actual = strlen(wkb); + if (actual % 2 != 0) { + std::string diag = "Stbox_as_hexwkb produced odd-length string: " + "strlen=" + std::to_string(actual) + + " sz=" + std::to_string(wkb_size); + free(wkb); + free(stbox); + throw InternalException(diag); + } + string_t stored_data = StringVector::AddStringOrBlob(result, wkb, actual); free(wkb); free(stbox); return stored_data; diff --git a/src/geo/tgeompoint.cpp b/src/geo/tgeompoint.cpp index 2bc5e6a0..e69411c2 100644 --- a/src/geo/tgeompoint.cpp +++ b/src/geo/tgeompoint.cpp @@ -1933,10 +1933,25 @@ void TgeoAsHexWkbExec(DataChunk &args, ExpressionState &state, Vector &result, u Temporal *t = BlobToTemp(input); size_t sz = 0; char *hex = temporal_as_hexwkb(t, variant, &sz); - (void) sz; free(t); if (!hex) throw InternalException("temporal_as_hexwkb returned null"); - string_t stored = StringVector::AddString(result, hex); + // Diagnostic: hex strings must be even-length (2 chars / byte). + // The macOS-arm64 CI on PRs #126/#130 has been failing with + // "Invalid hex string, length (69)" from DuckDB's decoder, which + // implies the producer is occasionally emitting odd-length output + // on that platform. Report both `strlen` and `sz` so the next + // failure pinpoints whether MEOS itself produced the odd length + // or whether something downstream truncates at an embedded null. + size_t actual = strlen(hex); + if (actual % 2 != 0) { + std::string diag = "temporal_as_hexwkb produced odd-length string: " + "strlen=" + std::to_string(actual) + + " sz=" + std::to_string(sz) + + " variant=" + std::to_string((int) variant); + free(hex); + throw InternalException(diag); + } + string_t stored = StringVector::AddString(result, hex, actual); free(hex); return stored; }); diff --git a/src/temporal/set_functions.cpp b/src/temporal/set_functions.cpp index 71407eee..293310ca 100644 --- a/src/temporal/set_functions.cpp +++ b/src/temporal/set_functions.cpp @@ -121,12 +121,21 @@ void SetFunctions::Set_as_hexwkb(DataChunk &args, ExpressionState &state, Vector Set *s = reinterpret_cast(data_copy); size_t hex_size = 0; char *hex = set_as_hexwkb(s, WKB_EXTENDED, &hex_size); - (void)hex_size; free(data_copy); if (!hex) { throw InternalException("asHexWKB: set_as_hexwkb failed"); } - string_t stored = StringVector::AddString(result, hex); + // Diagnostic: hex strings must be even-length. See + // src/geo/tgeompoint.cpp TgeoAsHexWkbExec for context. + size_t actual = strlen(hex); + if (actual % 2 != 0) { + std::string diag = "asHexWKB: set_as_hexwkb produced odd-length string: " + "strlen=" + std::to_string(actual) + + " sz=" + std::to_string(hex_size); + free(hex); + throw InternalException(diag); + } + string_t stored = StringVector::AddString(result, hex, actual); free(hex); return stored; } diff --git a/src/temporal/span_functions.cpp b/src/temporal/span_functions.cpp index 8c6f5bdb..3e4eea52 100644 --- a/src/temporal/span_functions.cpp +++ b/src/temporal/span_functions.cpp @@ -152,10 +152,19 @@ void SpanFunctions::Span_as_hexwkb(DataChunk &args, ExpressionState &state, Vect Span *s = reinterpret_cast(copy); size_t hex_size = 0; char *hex = span_as_hexwkb(s, WKB_EXTENDED, &hex_size); - (void)hex_size; free(copy); if (!hex) throw InternalException("asHexWKB: span_as_hexwkb failed"); - string_t stored = StringVector::AddString(result, hex); + // Diagnostic: hex strings must be even-length. See + // src/geo/tgeompoint.cpp TgeoAsHexWkbExec for context. + size_t actual = strlen(hex); + if (actual % 2 != 0) { + std::string diag = "asHexWKB: span_as_hexwkb produced odd-length string: " + "strlen=" + std::to_string(actual) + + " sz=" + std::to_string(hex_size); + free(hex); + throw InternalException(diag); + } + string_t stored = StringVector::AddString(result, hex, actual); free(hex); return stored; }); diff --git a/src/temporal/spanset_functions.cpp b/src/temporal/spanset_functions.cpp index 4010f4d9..64f74434 100644 --- a/src/temporal/spanset_functions.cpp +++ b/src/temporal/spanset_functions.cpp @@ -110,10 +110,19 @@ void SpansetFunctions::Spanset_as_hexwkb(DataChunk &args, ExpressionState &state SpanSet *ss = reinterpret_cast(copy); size_t hex_size = 0; char *hex = spanset_as_hexwkb(ss, WKB_EXTENDED, &hex_size); - (void)hex_size; free(copy); if (!hex) throw InternalException("asHexWKB: spanset_as_hexwkb failed"); - string_t stored = StringVector::AddString(result, hex); + // Diagnostic: hex strings must be even-length. See + // src/geo/tgeompoint.cpp TgeoAsHexWkbExec for context. + size_t actual = strlen(hex); + if (actual % 2 != 0) { + std::string diag = "asHexWKB: spanset_as_hexwkb produced odd-length string: " + "strlen=" + std::to_string(actual) + + " sz=" + std::to_string(hex_size); + free(hex); + throw InternalException(diag); + } + string_t stored = StringVector::AddString(result, hex, actual); free(hex); return stored; }); diff --git a/src/temporal/tbox_functions.cpp b/src/temporal/tbox_functions.cpp index fd17bacf..4ea413ef 100644 --- a/src/temporal/tbox_functions.cpp +++ b/src/temporal/tbox_functions.cpp @@ -1905,12 +1905,21 @@ void TboxFunctions::Tbox_as_hexwkb(DataChunk &args, ExpressionState &state, Vect TBox *tbox = reinterpret_cast(copy); size_t hex_size = 0; char *hex = tbox_as_hexwkb(tbox, WKB_EXTENDED, &hex_size); - (void)hex_size; free(copy); if (!hex) { throw InternalException("asHexWKB: tbox_as_hexwkb failed"); } - string_t stored = StringVector::AddString(result, hex); + // Diagnostic: hex strings must be even-length. See + // src/geo/tgeompoint.cpp TgeoAsHexWkbExec for context. + size_t actual = strlen(hex); + if (actual % 2 != 0) { + std::string diag = "asHexWKB: tbox_as_hexwkb produced odd-length string: " + "strlen=" + std::to_string(actual) + + " sz=" + std::to_string(hex_size); + free(hex); + throw InternalException(diag); + } + string_t stored = StringVector::AddString(result, hex, actual); free(hex); return stored; } diff --git a/src/temporal/temporal_functions.cpp b/src/temporal/temporal_functions.cpp index 7e5cc932..acafbb3b 100644 --- a/src/temporal/temporal_functions.cpp +++ b/src/temporal/temporal_functions.cpp @@ -6105,8 +6105,20 @@ void TemporalFunctions::Temporal_as_hexwkb(DataChunk &args, ExpressionState &sta free(temp); throw InternalException("[Temporal_as_hexwkb] temporal_as_hexwkb returned NULL"); } - std::string ret(hex); - string_t stored = StringVector::AddStringOrBlob(result, ret); + // Diagnostic: hex strings must be even-length (2 chars / byte). + // See tgeompoint.cpp TgeoAsHexWkbExec for context on the + // macOS-arm64 "Invalid hex string, length (69)" failure. + size_t actual = strlen(hex); + if (actual % 2 != 0) { + std::string diag = "[Temporal_as_hexwkb] odd-length string: " + "strlen=" + std::to_string(actual) + + " sz=" + std::to_string(hex_size) + + " variant=" + std::to_string((int) variant); + free(hex); + free(temp); + throw InternalException(diag); + } + string_t stored = StringVector::AddStringOrBlob(result, hex, actual); free(hex); free(temp); From 3ffa1917b25cfbe1d5dde3b693c8f1e00a21334e Mon Sep 17 00:00:00 2001 From: Esteban Zimanyi Date: Fri, 15 May 2026 08:49:29 +0200 Subject: [PATCH 3/5] polyfill(ci): stage icu extension for amd64 docker tests (from open PR #136) Cherry-picked from open PR #136 (commit 9e1d7a6) so this PR's CI goes green before #136 lands. When #136 reaches main, the rebase will collapse this commit to a no-op and it will drop out. --- original commit body --- Pre-stage icu extension for amd64 docker tests LoadInternal calls ExtensionHelper::AutoLoadExtension(db, "icu") so the Europe/Brussels timezone option is honoured. Inside the linux_amd64 test docker container there is no network egress and the local extension directory is empty, so the autoload fails. Copy the icu.duckdb_extension that was just built locally (declared in extension_config.cmake) into the expected path before running the unittester. --- Makefile | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index bc17d050..ab8cf2ee 100644 --- a/Makefile +++ b/Makefile @@ -11,9 +11,32 @@ include extension-ci-tools/makefiles/duckdb_extension.Makefile # both MEOS (meos_initialize_timezone) and DuckDB (DBConfig::SetOptionByName # "TimeZone") to Europe/Brussels. Tests pass on any OS timezone — the # extension is the single source of truth, no TZ env var needed. +# +# LoadInternal also calls ExtensionHelper::AutoLoadExtension(db, "icu") so +# the timezone option is honoured. Autoload looks for the extension on disk +# at $HOME/.duckdb/extensions///icu.duckdb_extension +# and falls back to a hub download. Inside the linux_amd64 test docker +# container that path is empty and there is no network egress, so the +# autoload fails. We copy the icu.duckdb_extension that was built locally +# as part of this extension's build (declared in extension_config.cmake) +# into the expected path before running the unittester. +DUCKDB_VERSION_TAG := v1.4.4 + +define stage_icu + @if [ -f ./build/$(1)/extension/icu/icu.duckdb_extension ]; then \ + platform=$$(uname -m | sed 's/x86_64/linux_amd64/;s/aarch64/linux_arm64/'); \ + target=$$HOME/.duckdb/extensions/$(DUCKDB_VERSION_TAG)/$$platform; \ + mkdir -p "$$target" && cp -f ./build/$(1)/extension/icu/icu.duckdb_extension "$$target/" && \ + echo "Staged icu.duckdb_extension at $$target/"; \ + fi +endef + test_release_internal: + $(call stage_icu,release) ./build/release/$(TEST_PATH) "$(PROJ_DIR)test/*" test_debug_internal: + $(call stage_icu,debug) ./build/debug/$(TEST_PATH) "$(PROJ_DIR)test/*" test_reldebug_internal: - ./build/reldebug/$(TEST_PATH) "$(PROJ_DIR)test/*" \ No newline at end of file + $(call stage_icu,reldebug) + ./build/reldebug/$(TEST_PATH) "$(PROJ_DIR)test/*" From bc480c4b39a734d5e6ad9bdf9d38711e44ac472b Mon Sep 17 00:00:00 2001 From: Esteban Zimanyi Date: Wed, 20 May 2026 19:21:55 +0200 Subject: [PATCH 4/5] polyfill(macos/wasm): bigint_to_set_duckdb int64_t forwarder (from open PR #140) On macOS LP64 and Wasm/emscripten, int64 (long) and int64_t (long long) are the same width but distinct types, so clang rejects passing bigint_to_set where a Set *(*)(int64_t) is expected as a non-type template arg. Cherry- picked from open PR #140 (a8b1755) so this PR goes green on osx_amd64, osx_arm64, and wasm_mvp before #140 lands. The cast is a no-op on Linux, where int64 and int64_t are both long. When #140 reaches main the rebase collapses this commit to a no-op. --- src/temporal/set.cpp | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/src/temporal/set.cpp b/src/temporal/set.cpp index b803498a..1858e489 100644 --- a/src/temporal/set.cpp +++ b/src/temporal/set.cpp @@ -945,6 +945,14 @@ static inline Set *date_to_set_duckdb(DateADT d) { return date_to_set(ToMeosDate(duckdb::date_t(d))); } +// macOS LP64: int64 (long) and int64_t (long long) are the same width but +// distinct types, so clang rejects passing bigint_to_set where a +// Set *(*)(int64_t) is expected as a non-type template arg. The cast is a +// no-op on Linux. See SetUnionScalarFunction below. +static inline Set *bigint_to_set_duckdb(int64_t i) { + return bigint_to_set(static_cast(i)); +} + struct SetPtrState { Set *accumulated; }; @@ -1069,7 +1077,7 @@ void SetTypes::RegisterSetUnionAgg(ExtensionLoader &loader) { LogicalType::INTEGER, SetTypes::intset())); set_union_set.AddFunction( AggregateFunction::UnaryAggregateDestructor>( + SetUnionScalarFunction>( LogicalType::BIGINT, SetTypes::bigintset())); set_union_set.AddFunction( AggregateFunction::UnaryAggregateDestructor Date: Sat, 16 May 2026 11:04:41 +0200 Subject: [PATCH 5/5] Stage icu for the macOS osx_arm64 test path too The stage_icu helper mapped only the Linux uname values, so on the macOS arm64 test runner uname -m returned "arm64" and the icu extension was copied to .duckdb/extensions/v1.4.4/arm64 instead of .../osx_arm64, where DuckDB's autoload looks. The hub fallback is not reliably resolvable on that runner, so the osx_arm64 Test step failed to load the extension. Map the OS and architecture to the DuckDB platform string (linux_amd64, linux_arm64, osx_amd64, osx_arm64) so the locally built icu is staged at the path autoload expects on every tested platform; the Linux mapping is unchanged. Co-Authored-By: Claude Opus 4.7 --- Makefile | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/Makefile b/Makefile index ab8cf2ee..a9485498 100644 --- a/Makefile +++ b/Makefile @@ -15,16 +15,23 @@ include extension-ci-tools/makefiles/duckdb_extension.Makefile # LoadInternal also calls ExtensionHelper::AutoLoadExtension(db, "icu") so # the timezone option is honoured. Autoload looks for the extension on disk # at $HOME/.duckdb/extensions///icu.duckdb_extension -# and falls back to a hub download. Inside the linux_amd64 test docker -# container that path is empty and there is no network egress, so the -# autoload fails. We copy the icu.duckdb_extension that was built locally -# as part of this extension's build (declared in extension_config.cmake) -# into the expected path before running the unittester. +# and falls back to a hub download. That fails both inside the linux_amd64 +# test docker container (empty path, no network egress) and on the macOS +# osx_arm64 test runner (hub icu not reliably resolvable). We copy the +# icu.duckdb_extension that was built locally as part of this extension's +# build (declared in extension_config.cmake) into the expected path, +# matched to the DuckDB platform string, before running the unittester. DUCKDB_VERSION_TAG := v1.4.4 define stage_icu @if [ -f ./build/$(1)/extension/icu/icu.duckdb_extension ]; then \ - platform=$$(uname -m | sed 's/x86_64/linux_amd64/;s/aarch64/linux_arm64/'); \ + case "$$(uname -s)-$$(uname -m)" in \ + Linux-x86_64) platform=linux_amd64 ;; \ + Linux-aarch64) platform=linux_arm64 ;; \ + Darwin-arm64) platform=osx_arm64 ;; \ + Darwin-x86_64) platform=osx_amd64 ;; \ + *) platform=$$(uname -m) ;; \ + esac; \ target=$$HOME/.duckdb/extensions/$(DUCKDB_VERSION_TAG)/$$platform; \ mkdir -p "$$target" && cp -f ./build/$(1)/extension/icu/icu.duckdb_extension "$$target/" && \ echo "Staged icu.duckdb_extension at $$target/"; \