Parquet: Read and write geometry and geography WKB values #16982
Open
huan233usc wants to merge 1 commit into
Geometry and geography columns are stored as pure WKB in a BINARY Parquet column (the logical-type mapping landed in apache#16765). Wire the value path through ParquetValueReaders/Writers.byteBuffers, the same primitive used for BSON, since the in-memory representation is a WKB ByteBuffer. This replaces the temporary UnsupportedOperationException stubs left for the writer and the unsupported-logical-type failure on the reader. Enable the shared DataTest round-trip coverage for geospatial types in the generic Parquet reader/writer and add an explicit WKB round-trip test, including null values.
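As a rough illustration of the wiring described above, the sketch below routes geometry and geography through the same byte-buffer writer as BSON and keeps the exception only for other types. It is not the Iceberg code: `GeoValuePathSketch`, `LogicalKind`, `writerFor`, and this `byteBuffers()` are hypothetical stand-ins written for this example, and the real `ParquetValueWriters.byteBuffers` signature is not reproduced here.

```java
import java.nio.ByteBuffer;
import java.util.function.Function;

final class GeoValuePathSketch {
  // Hypothetical stand-in for the logical type annotating a BINARY column.
  enum LogicalKind { BSON, GEOMETRY, GEOGRAPHY, OTHER }

  // Stand-in for a shared byte-buffer writer: copies the remaining bytes
  // through a duplicate so the caller's buffer position is left untouched.
  static Function<ByteBuffer, byte[]> byteBuffers() {
    return value -> {
      byte[] bytes = new byte[value.remaining()];
      value.duplicate().get(bytes);
      return bytes;
    };
  }

  // Geo values share the BSON path because both are opaque bytes in a ByteBuffer.
  static Function<ByteBuffer, byte[]> writerFor(LogicalKind kind) {
    switch (kind) {
      case BSON:
      case GEOMETRY:
      case GEOGRAPHY:
        return byteBuffers();
      default:
        throw new UnsupportedOperationException("Unsupported logical type: " + kind);
    }
  }

  public static void main(String[] args) {
    ByteBuffer wkb = ByteBuffer.wrap(new byte[] {1, 1, 0, 0, 0});
    byte[] stored = writerFor(LogicalKind.GEOMETRY).apply(wkb);
    System.out.println(stored.length + " bytes written, position still " + wkb.position());
  }
}
```

The point of the sketch is only that no geo-specific encoding step is needed: the WKB bytes pass through unchanged.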
Geometry and geography columns map to a BINARY Parquet column carrying a
geometry/geography logical type, the schema mapping added in #16765. That PR
deliberately left the value path as a follow-up: the writer threw
UnsupportedOperationException and the reader failed on the unsupported logical
type. This PR wires up the value path.
Geo values are pure WKB, and Iceberg represents them in memory as a
ByteBuffer (Type.TypeID.GEOMETRY/GEOGRAPHY map to ByteBuffer.class), so the
reader and writer reuse ParquetValueReaders/ParquetValueWriters.byteBuffers,
the same primitive already used for BSON. The change is in
BaseParquetReaders/BaseParquetWriter, so both the generic and internal object
models inherit it.

Testing:
- Enable the shared DataTest round-trip coverage for geospatial types in the
  generic Parquet reader/writer (supportsGeospatial()), exercising geometry and
  geography across multiple CRS and edge algorithms with randomly generated WKB
  values.
- Add an explicit WKB round-trip test (TestParquetDataWriter) covering
  geometry, geography, and null values through the DataWriter path.
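
For readers unfamiliar with the value shape, here is a minimal sketch of what a WKB round trip with nulls looks like at the ByteBuffer level. It uses only java.nio and does not touch Parquet or Iceberg: `WkbRoundTripSketch`, `pointWkb`, `write`, `read`, and `roundTrip` are hypothetical names for this example, and a byte array stands in for the stored BINARY value. It is not the test added in this PR.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class WkbRoundTripSketch {
  // Encode a 2D point as little-endian WKB:
  // 1-byte order flag (1 = little endian), 4-byte type (1 = Point), two 8-byte doubles.
  static ByteBuffer pointWkb(double x, double y) {
    ByteBuffer buf = ByteBuffer.allocate(21).order(ByteOrder.LITTLE_ENDIAN);
    buf.put((byte) 1);
    buf.putInt(1);
    buf.putDouble(x);
    buf.putDouble(y);
    buf.flip();
    return buf;
  }

  // Stand-in for the write side: copy the bytes, keep null as null.
  static byte[] write(ByteBuffer value) {
    if (value == null) {
      return null;
    }
    byte[] bytes = new byte[value.remaining()];
    value.duplicate().get(bytes);
    return bytes;
  }

  // Stand-in for the read side: wrap the stored bytes back into a ByteBuffer.
  static ByteBuffer read(byte[] stored) {
    return stored == null ? null : ByteBuffer.wrap(stored);
  }

  static List<ByteBuffer> roundTrip(List<ByteBuffer> values) {
    List<ByteBuffer> result = new ArrayList<>();
    for (ByteBuffer value : values) {
      result.add(read(write(value)));
    }
    return result;
  }

  public static void main(String[] args) {
    List<ByteBuffer> input = Arrays.asList(pointWkb(1.0, 2.0), null, pointWkb(-122.4, 37.8));
    List<ByteBuffer> output = roundTrip(input);
    for (int i = 0; i < input.size(); i++) {
      // ByteBuffer.equals compares the remaining bytes, so this checks the WKB payload.
      System.out.println("row " + i + " preserved: " + Objects.equals(input.get(i), output.get(i)));
    }
  }
}
```

The comparison relies on ByteBuffer.equals, which compares remaining bytes regardless of each buffer's byte order setting, so a little-endian source buffer and a wrapped result compare equal when the payload is identical.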