feat(datafusion): Serializable Codec for Distributed Engines #2727

Draft
NoahKusaba wants to merge 1 commit into apache:main from NoahKusaba:feature/datafusion-catalog-config
@NoahKusaba NoahKusaba commented Jun 28, 2026

Contributor

Which issue does this PR close?

Adds enhancements to the datafusion-iceberg crate for a future ballista-iceberg integration (which will live in another repo).
#2613

Iceberg's DataFusion physical plan nodes hold live, non-serializable handles: an Arc, a Table with an open FileIO, a PartitionValueCalculator. A distributed engine such as Ballista has to serialize plans and ship them to remote executors, which it cannot do while the nodes hold those handles. This branch adds the minimal serializable seed needed to reconstruct the handles on a remote node, plus the public API surface an external codec needs to read that seed out of a node and rebuild the node from it. Everything is additive and opt-in: the config is always an Option that defaults to None, so existing single-node behavior is unchanged.
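
For reviewers unfamiliar with the pattern, here is a minimal, std-only sketch of the idea. It is not the code in this branch and does not use the iceberg-rust or DataFusion APIs: `CatalogSeed`, `ScanNode`, `encode_node`, and `decode_node` are hypothetical names, and the line-based byte format is a toy stand-in for whatever encoding a real codec would use.

```rust
/// Hypothetical serializable "seed": plain data only, enough for a remote
/// node to rebuild its live handles. Not a type from this PR.
#[derive(Debug, Clone, PartialEq)]
pub struct CatalogSeed {
    pub catalog_name: String,
    pub props: Vec<(String, String)>,
}

impl CatalogSeed {
    /// Toy encoding: catalog name on the first line, then `key=value` lines.
    /// Assumes names and keys contain no newline and keys contain no '='.
    pub fn encode(&self) -> Vec<u8> {
        let mut out = format!("{}\n", self.catalog_name);
        for (k, v) in &self.props {
            out.push_str(&format!("{k}={v}\n"));
        }
        out.into_bytes()
    }

    /// Inverse of `encode`; returns None on malformed input.
    pub fn decode(bytes: &[u8]) -> Option<Self> {
        let text = std::str::from_utf8(bytes).ok()?;
        let mut lines = text.lines();
        let catalog_name = lines.next()?.to_string();
        let mut props = Vec::new();
        for line in lines {
            let (k, v) = line.split_once('=')?;
            props.push((k.to_string(), v.to_string()));
        }
        Some(Self { catalog_name, props })
    }
}

/// Hypothetical stand-in for a physical plan node. The seed is optional and
/// defaults to None, so single-node use never has to provide one.
pub struct ScanNode {
    pub table: String,
    seed: Option<CatalogSeed>,
}

impl ScanNode {
    pub fn new(table: &str) -> Self {
        Self { table: table.to_string(), seed: None }
    }

    /// Opt in to distributed execution by attaching a seed.
    pub fn with_seed(mut self, seed: CatalogSeed) -> Self {
        self.seed = Some(seed);
        self
    }

    /// Public accessor an external codec can use to read the seed.
    pub fn seed(&self) -> Option<&CatalogSeed> {
        self.seed.as_ref()
    }
}

/// External codec, encode side: fails if the node carries no seed.
pub fn encode_node(node: &ScanNode) -> Result<Vec<u8>, String> {
    let seed = node
        .seed()
        .ok_or_else(|| format!("node for table {} has no seed", node.table))?;
    let mut out = format!("{}\n", node.table).into_bytes();
    out.extend(seed.encode());
    Ok(out)
}

/// External codec, decode side: rebuilds the node from the shipped bytes.
pub fn decode_node(bytes: &[u8]) -> Option<ScanNode> {
    let text = std::str::from_utf8(bytes).ok()?;
    let (table, rest) = text.split_once('\n')?;
    let seed = CatalogSeed::decode(rest.as_bytes())?;
    Some(ScanNode::new(table).with_seed(seed))
}
```

In this sketch a node built with `ScanNode::new` alone behaves as before and simply cannot be encoded, while a node built with `with_seed` round-trips through `encode_node` and `decode_node`. A real executor would then use the decoded seed to reopen the catalog and table, which the sketch leaves out.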

Are these changes tested?

Yes

@NoahKusaba NoahKusaba marked this pull request as draft June 28, 2026 01:46
@NoahKusaba NoahKusaba changed the title from "feat(datafusion)" to "feat(datafusion): Serializable Codec for Distributed Engines" Jun 28, 2026