Parquet: Read and write geometry and geography WKB values #16982

Open
huan233usc wants to merge 1 commit into apache:main from huan233usc:geo-parquet-value-path

@huan233usc

Contributor

Geometry and geography columns map to a BINARY Parquet column carrying a
geometry/geography logical type; #16765 added that schema mapping. That PR
deliberately left the value path as a follow-up: the writer threw
UnsupportedOperationException and the reader failed on the unsupported logical
type. This PR wires up the value path.

Geo values are pure WKB, and Iceberg represents them in memory as a ByteBuffer
(Type.TypeID.GEOMETRY and Type.TypeID.GEOGRAPHY both map to ByteBuffer.class),
so the reader and writer reuse ParquetValueReaders.byteBuffers and
ParquetValueWriters.byteBuffers, the same primitive already used for BSON. The
change is in BaseParquetReaders and BaseParquetWriter, so both the generic and
internal object models inherit it.
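
For readers unfamiliar with WKB, here is a minimal sketch of what such a ByteBuffer value might contain for a 2D point. It uses only java.nio and is not code from this PR; the class and method names (WkbPointSketch, encodePoint, decodePoint) are hypothetical and exist only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: not part of Iceberg or of this PR.
final class WkbPointSketch {
  private static final byte LITTLE_ENDIAN = 1; // WKB byte-order marker (0 = big endian)
  private static final int WKB_POINT = 1;      // WKB geometry type code for Point

  // Encodes a 2D point as WKB: 1 byte order + 4 bytes type + 2 * 8 bytes coordinates.
  static ByteBuffer encodePoint(double x, double y) {
    ByteBuffer buf = ByteBuffer.allocate(21).order(ByteOrder.LITTLE_ENDIAN);
    buf.put(LITTLE_ENDIAN).putInt(WKB_POINT).putDouble(x).putDouble(y);
    buf.flip(); // make the written bytes readable from position 0
    return buf;
  }

  // Decodes a WKB point without moving the caller's buffer position.
  static double[] decodePoint(ByteBuffer wkb) {
    ByteBuffer buf = wkb.duplicate();
    ByteOrder order = buf.get() == LITTLE_ENDIAN ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN;
    buf.order(order);
    int type = buf.getInt();
    if (type != WKB_POINT) {
      throw new IllegalArgumentException("Not a WKB point, type code: " + type);
    }
    return new double[] {buf.getDouble(), buf.getDouble()};
  }

  public static void main(String[] args) {
    ByteBuffer wkb = encodePoint(1.5, -2.0);
    double[] xy = decodePoint(wkb);
    System.out.println(wkb.remaining() + " bytes, x=" + xy[0] + ", y=" + xy[1]);
  }
}
```

The value path described above treats these bytes as opaque: nothing in the reader or writer needs to parse the geometry, which is why a plain byte-buffer primitive is enough.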

Testing:

  • Enables the shared DataTest round-trip coverage for geospatial types in the
    generic Parquet reader/writer (supportsGeospatial()), exercising geometry and
    geography across multiple CRS and edge algorithms with randomly generated WKB
    values.
  • Adds an explicit WKB round-trip test (TestParquetDataWriter) covering
    geometry, geography, and null values through the DataWriter path.
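
The shape of such a round-trip check can be sketched without Parquet at all. The snippet below is an assumption-laden stand-in, not the test added in this PR: toBinary and fromBinary are hypothetical helpers that imitate storing a value as raw bytes and reading it back, and the class name WkbRoundTripSketch is made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: an in-memory stand-in for a write/read cycle.
final class WkbRoundTripSketch {
  // Hypothetical write side: copy the remaining bytes without moving the caller's position.
  static byte[] toBinary(ByteBuffer value) {
    if (value == null) {
      return null; // null values pass through untouched
    }
    byte[] bytes = new byte[value.remaining()];
    value.duplicate().get(bytes);
    return bytes;
  }

  // Hypothetical read side: wrap the stored bytes in a fresh ByteBuffer.
  static ByteBuffer fromBinary(byte[] stored) {
    return stored == null ? null : ByteBuffer.wrap(stored);
  }

  static List<ByteBuffer> roundTrip(List<ByteBuffer> values) {
    List<ByteBuffer> result = new ArrayList<>();
    for (ByteBuffer value : values) {
      result.add(fromBinary(toBinary(value)));
    }
    return result;
  }

  public static void main(String[] args) {
    // Placeholder bytes, not a valid geometry; a real test would use generated WKB.
    List<ByteBuffer> values = Arrays.asList(ByteBuffer.wrap(new byte[] {1, 1, 0, 0, 0}), null);
    System.out.println("round trip equal: " + values.equals(roundTrip(values)));
  }
}
```

ByteBuffer.equals compares the remaining bytes, so a comparison like this only holds if neither side has consumed its buffer; that is why the helper reads through a duplicate.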

Geometry and geography columns are stored as pure WKB in a BINARY
Parquet column (the logical-type mapping landed in apache#16765). Wire the
value path through ParquetValueReaders/Writers.byteBuffers, the same
primitive used for BSON, since the in-memory representation is a WKB
ByteBuffer. This replaces the temporary UnsupportedOperationException
stubs left for the writer and the unsupported-logical-type failure on
the reader.

Enable the shared DataTest round-trip coverage for geospatial types in
the generic Parquet reader/writer and add an explicit WKB round-trip
test, including null values.

@huan233usc force-pushed the geo-parquet-value-path branch from d05928f to 7ee817f on June 27, 2026 22:02