feat(writer): embed iceberg.schema in Parquet footer metadata by viirya · Pull Request #2724 · apache/iceberg-rust

viirya · 2026-06-27T16:20:14Z

Which issue does this PR close?

Closes #2184 (Add iceberg.schema to footer for engine compatibility).

What changes are included in this PR?

Engines such as Snowflake resolve an Iceberg table's schema from the iceberg.schema key in a Parquet file's footer key-value metadata. iceberg-rust didn't write this key, so Parquet files it produced (or files produced by nimtable compaction on top of it) were rejected by those engines.

This PR writes the Iceberg schema as JSON under the iceberg.schema footer key when the Parquet writer is initialized, matching iceberg-java. I verified the Java behavior against the source:

- parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java writes meta("iceberg.schema", SchemaParser.toJson(schema)) unconditionally in WriteBuilder.build().
- The value is the full Iceberg Schema JSON, the same representation that appears in table metadata's schemas array. iceberg-rust's serde_json::to_string(&schema) produces that same JSON.

Implementation: in ParquetWriter, right after the underlying AsyncArrowWriter is lazily created, append_key_value_metadata is called with iceberg.schema → schema JSON. It's unconditional, matching Java (the schema is always present).
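The control flow described above can be sketched without any dependencies. This is only a model, not the code in this PR: `InnerWriter` is a hypothetical stand-in for parquet's AsyncArrowWriter, `ParquetWriterModel` and `Footer` are illustrative names, and the schema JSON is assumed to be serialized already and passed in as a string.

```rust
// Dependency-free model of "append the footer key once, when the inner
// writer is lazily created". All types here are hypothetical stand-ins.
const ICEBERG_SCHEMA_KEY: &str = "iceberg.schema";

/// Stand-in for the underlying Arrow/Parquet writer.
#[derive(Default)]
struct InnerWriter {
    key_value_metadata: Vec<(String, String)>,
    num_rows: usize,
}

impl InnerWriter {
    fn append_key_value_metadata(&mut self, key: &str, value: &str) {
        self.key_value_metadata
            .push((key.to_string(), value.to_string()));
    }
}

/// What a closed file exposes in this model.
struct Footer {
    num_rows: usize,
    key_value_metadata: Vec<(String, String)>,
}

struct ParquetWriterModel {
    schema_json: String,
    inner: Option<InnerWriter>,
}

impl ParquetWriterModel {
    fn new(schema_json: impl Into<String>) -> Self {
        Self {
            schema_json: schema_json.into(),
            inner: None,
        }
    }

    /// The first call creates the inner writer and attaches the schema;
    /// later calls reuse it, so the key is appended exactly once.
    fn write(&mut self, rows: usize) {
        if self.inner.is_none() {
            let mut writer = InnerWriter::default();
            // Unconditional: the schema is always available at this point.
            writer.append_key_value_metadata(ICEBERG_SCHEMA_KEY, &self.schema_json);
            self.inner = Some(writer);
        }
        if let Some(writer) = self.inner.as_mut() {
            writer.num_rows += rows;
        }
    }

    /// Returns `None` if nothing was ever written (no file was produced).
    fn close(self) -> Option<Footer> {
        self.inner.map(|w| Footer {
            num_rows: w.num_rows,
            key_value_metadata: w.key_value_metadata,
        })
    }
}
```

In this model, two `write` calls followed by `close` yield a footer with a single `iceberg.schema` entry, and closing a writer that never wrote anything yields no footer at all.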

Scope

Parquet writer only. iceberg-java writes the same iceberg.schema key from its Avro writer as well, but in iceberg-rust the Avro writer produces manifests (metadata), not the data files these engines query, so it's out of scope here. Adding it to the Avro path can be a follow-up if there's a need.

Are these changes tested?

New test test_parquet_writer_embeds_iceberg_schema_in_footer: writes a Parquet file through ParquetWriter, reads the footer back, asserts the iceberg.schema key is present, and that its JSON value round-trips to the written Schema.
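The shape of that check can be illustrated with a small dependency-free sketch. It is not the test added in this PR: `KeyValue` here is a hypothetical model of a footer entry (the value is optional, as in Parquet's key-value metadata), `footer_value` and `check_embedded_schema` are illustrative names, and a plain string comparison stands in for deserializing the JSON back into a Schema.

```rust
/// Hypothetical model of one footer key-value entry.
struct KeyValue {
    key: String,
    value: Option<String>,
}

/// Look up a key in the footer entries, returning its value if present.
fn footer_value<'a>(entries: &'a [KeyValue], key: &str) -> Option<&'a str> {
    entries
        .iter()
        .find(|kv| kv.key == key)
        .and_then(|kv| kv.value.as_deref())
}

/// Assert-style check: the key must exist and match the expected JSON.
/// The real test would parse the JSON and compare Schema values instead.
fn check_embedded_schema(entries: &[KeyValue], expected_json: &str) -> Result<(), String> {
    let found = footer_value(entries, "iceberg.schema")
        .ok_or_else(|| "iceberg.schema missing from footer".to_string())?;
    if found == expected_json {
        Ok(())
    } else {
        Err(format!(
            "schema mismatch: expected {}, found {}",
            expected_json, found
        ))
    }
}
```

Comparing parsed schemas rather than raw strings, as the PR's test does, avoids false failures from differences in key order or whitespace.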

All writer::file_writer::parquet_writer tests pass (no regression), the full iceberg lib suite (1372 tests) passes, and clippy and rustfmt are clean.

Engines such as Snowflake resolve an Iceberg table's schema from the `iceberg.schema` key in a Parquet file's footer key-value metadata. iceberg-rust did not write this key, so Parquet files it produced (or files produced by nimtable compaction on top of it) were rejected by those engines.

Write the Iceberg schema as JSON under the `iceberg.schema` footer key when the Parquet writer is initialized, matching iceberg-java (`Parquet.java`). The value is the same schema JSON that appears in table metadata's `schemas`, produced via `serde_json::to_string`.

Scope is the Parquet writer only. iceberg-java writes the same key from its Avro writer too, but in iceberg-rust the Avro writer produces manifests (metadata), not the data files these engines query, so that is left as a follow-up.

Closes apache#2184

viirya wants to merge 1 commit into apache:main from viirya:fix/2184-iceberg-schema-footer
