Spark: Add support for 4.2.0 (RC) #14984

Draft
manuzhang wants to merge 3 commits into apache:main from manuzhang:spark4.2-preview

Conversation

manuzhang commented Jan 7, 2026

Member

This PR contains three commits. The first two are procedural commits that preserve file history; the main changes are in the third commit:

  • Adds/registers Spark 4.2 modules in Gradle/settings/CI.
  • Updates Spark 4.2 view handling to use Spark 4.2’s ViewInfo-based APIs.
  • Reworks Iceberg Spark view commands, including renamed Iceberg-specific exec classes.
  • Removes the old SupportsReplaceView adapter and routes view create/replace through Spark 4.2 catalog interfaces.
  • Fixes ALTER VIEW SET TBLPROPERTIES to use native view.updateProperties().set(...).commit() for Iceberg-backed catalogs instead of replacing the full view.
  • Restores coverage for fully qualified function identifiers not being rewritten, scoped to catalogs where Spark 4.2 still fails as expected.
  • Improves the unchecked exceptions that SparkCatalog throws from unreachable checked-exception branches by giving them operation-specific messages.
  • Updates Spark 4.2 tests/benchmarks for API and error-message changes.
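
To illustrate the ALTER VIEW SET TBLPROPERTIES item above, here is a minimal, self-contained sketch of the "set properties and commit" pattern, as opposed to replacing the whole view. `SketchView` and `PropertyUpdate` are hypothetical stand-ins written for this example; they only mirror the shape of the `view.updateProperties().set(...).commit()` call chain and are not the Iceberg classes touched by this PR.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ViewPropertiesSketch {

    // Hypothetical stand-in for a catalog view: a SQL definition plus a property map.
    static final class SketchView {
        private final String sql;
        private Map<String, String> properties = new HashMap<>();

        SketchView(String sql) {
            this.sql = sql;
        }

        String sql() {
            return sql;
        }

        Map<String, String> properties() {
            return Collections.unmodifiableMap(properties);
        }

        // Starts a pending property update; nothing changes until commit().
        PropertyUpdate updateProperties() {
            return new PropertyUpdate(this);
        }
    }

    // Hypothetical pending update that collects key/value pairs before applying them.
    static final class PropertyUpdate {
        private final SketchView view;
        private final Map<String, String> pending = new HashMap<>();

        PropertyUpdate(SketchView view) {
            this.view = view;
        }

        PropertyUpdate set(String key, String value) {
            pending.put(key, value);
            return this;
        }

        // Merges the pending entries into the view's properties.
        // The SQL definition is never touched, unlike a full view replace.
        void commit() {
            Map<String, String> merged = new HashMap<>(view.properties);
            merged.putAll(pending);
            view.properties = merged;
        }
    }

    public static void main(String[] args) {
        SketchView view = new SketchView("SELECT id FROM db.tbl");
        view.updateProperties().set("owner", "analytics").commit();
        System.out.println(view.properties());
        System.out.println(view.sql());
    }
}
```

The point of the pattern is that a property change stays a small metadata update: the view definition is left as it was rather than being rewritten.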

Co-authored-by: @codex

@manuzhang manuzhang force-pushed the spark4.2-preview branch 7 times, most recently from 0d5d05d to 330955b Compare January 8, 2026 15:39
@manuzhang manuzhang force-pushed the spark4.2-preview branch 2 times, most recently from bd2bff7 to af86915 Compare February 7, 2026 15:06
manuzhang commented Feb 9, 2026

Member Author

The following failure, seen when testing against Spark 4.2.0-preview2, is caused by apache/spark#53788. After that change, Spark throws an AnalysisException for Iceberg metadata tables such as default.table.partitions.

TestAddFilesProcedure > addPartitionsWithNullValueShouldAddFilesToNullPartition() > catalogName = spark_catalog, implementation = org.apache.iceberg.spark.SparkSessionCatalog, config = {type=hive, default-namespace=default, parquet-enabled=true, cache-enabled=false}, formatVersion = 2 FAILED
    org.apache.spark.sql.AnalysisException: [REQUIRES_SINGLE_PART_NAMESPACE] spark_catalog requires a single-part namespace, but got `default`.`table`. SQLSTATE: 42K05
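
As a rough illustration of why the metadata table name trips this check: when default.table.partitions is read as a plain identifier, the last part is the object name and everything before it is the namespace, which gives the two-part namespace `default`.`table` quoted in the error. The sketch below is an assumption-laden model of that split, not Spark's resolution code; the class and method names are hypothetical, and the naive split on dots ignores backtick quoting.

```java
import java.util.Arrays;
import java.util.List;

public class MetadataTableNameSketch {

    // Hypothetical helper: everything before the last dot-separated part is the namespace.
    static List<String> namespaceOf(String dottedName) {
        String[] parts = dottedName.split("\\.");
        return Arrays.asList(parts).subList(0, parts.length - 1);
    }

    // A catalog that requires a single-part namespace would reject anything else.
    static boolean hasSinglePartNamespace(String dottedName) {
        return namespaceOf(dottedName).size() == 1;
    }

    public static void main(String[] args) {
        // Regular table: namespace is [default].
        System.out.println(namespaceOf("default.table"));
        // Metadata table: namespace is [default, table], which is two parts.
        System.out.println(namespaceOf("default.table.partitions"));
        System.out.println(hasSinglePartNamespace("default.table.partitions"));
    }
}
```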

@manuzhang

Member Author

Failed tests after upgrading to Spark 4.2.0-preview3-rc1:

  1. testJoinsHourToDays() in TestStoragePartitionedJoins.java:
     Assertion failed: "SPJ should not change query output: number of results should match"
     The actual and expected query result sizes differ, indicating that either the join logic or the test data setup causes a mismatch.
  2. readFromViewReferencingTempFunction() in TestViews.java:
     Assertion failed: expected a specific "routine not found" error, but got an AnalysisException with different message details.
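
For context on the first failure, a storage-partitioned join between an hour-partitioned and a day-partitioned table relies on reducing hour partition values to day values so that both sides line up. Below is a minimal sketch of that reduction, assuming hour and day values are ordinals counted from the Unix epoch. The class and method names are hypothetical, and this is not the HourToDaysReducer implementation from the PR.

```java
public class HourToDaySketch {

    // Hypothetical helper: maps an hour ordinal (hours since the Unix epoch)
    // to the ordinal of the day that contains it (days since the Unix epoch).
    // floorDiv keeps pre-epoch hours in the correct day, where plain integer
    // division would round toward zero.
    static int hoursToDays(int hours) {
        return Math.floorDiv(hours, 24);
    }

    public static void main(String[] args) {
        System.out.println(hoursToDays(0));   // first hour of day 0
        System.out.println(hoursToDays(23));  // last hour of day 0
        System.out.println(hoursToDays(24));  // first hour of day 1
        System.out.println(hoursToDays(-1));  // last hour of day -1
    }
}
```

If the two sides of a join disagree on this mapping, rows that belong together end up in different partitions, which is one way a result-count mismatch like the one above could arise.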

@manuzhang

Member Author

apache/spark#54884 has been opened to fix the first failure.

@manuzhang

Member Author

I will update HourToDaysReducer to follow the interface changes from apache/spark#54884 once they are available in the next preview release. All other test failures have been fixed.

@szehon-ho szehon-ho left a comment

Member

leave a note here to implement the new method in the Reducer once apache/spark#54884 is in (next Spark 4.2 preview)

Sorry just saw, it is the same comment

@manuzhang

Member Author

The failed tests in 4.2.0-preview3 have been fixed in 4.2.0-preview4.

@szehon-ho

Member

What's the plan with this branch? @rahulsmahadev and I want to start consuming some of our new DSv2 features from 4.2. Will we merge this first, based on 4.2 RC0, so that we can work in parallel?

@manuzhang

Member Author

@szehon-ho First, please help review the last commit and check whether the changes are OK. Also, do we want to maintain four Spark versions?

@szehon-ho

Member

@huaxingao @anuragmantri Any thoughts on this? According to Anurag's discuss thread, we keep around two minor versions, so maybe we can drop Spark 4.0 once Spark 4.2 is released?

@anuragmantri

Collaborator

I'm +1 on adding 4.2 and removing 4.0. We need to discuss the sync and get consensus though
https://lists.apache.org/thread/6kmh92wl6qkw08dpgv04bl51v590phbl

Please respond there and I can start a vote as well.

@manuzhang manuzhang force-pushed the spark4.2-preview branch 4 times, most recently from 5aa6f15 to ec9ea3a Compare June 15, 2026 10:27
Comment thread dev/stage-binaries.sh
Comment thread spark/v4.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkView.java Outdated
@manuzhang manuzhang force-pushed the spark4.2-preview branch 4 times, most recently from 3759e1f to 1db4a96 Compare June 18, 2026 04:35
giggsoff pushed a commit to giggsoff/iceberg that referenced this pull request Jun 23, 2026
The Spark 4.2 version bump (4.2.0.1-4.3.0-0 -> -1) adopts vanilla Apache
Spark 4.2.0 APIs, breaking compilation of the v4.2 tree. Port the required
changes from apache#14984, scoped to v4.2 only:

- Views: migrate ViewCatalog + SupportsReplaceView to TableViewCatalog /
  ViewInfo across BaseCatalog, SparkCatalog, SparkSessionCatalog and SparkView
  (now a static ViewInfo converter); add loadTableOrView/commitView; remove
  SupportsReplaceView. Catalog edits are surgical so the fork's ADH purge
  directory-delete and existing time-travel code are preserved.
- spark-extensions: drop the old Create/Alter view execs, rename
  Describe/ShowCreate/ShowV2ViewProperties to Iceberg* variants, and update
  the view analysis/strategy rules to the native DSv2 view API.
- Geo: replace GeographyVal/GeometryVal with BinaryView + Geography/Geometry
  types and getBinaryView in StructInternalRow and SparkParquetReaders.
- Add reportDriverMetrics() to StagedSparkTable and RollbackStagedTable
  (StagedTable/TruncatableTable now declare it).
- Netty: Spark 4.2 calls PlatformDependent.hasDirectByteBufferAddress (added
  in Netty 4.2.12; SPARK-56817 ships 4.2.13) but iceberg-arrow pins 4.2.4.
  Force netty-buffer/netty-common to 4.2.13 in the Spark 4.2 build only,
  leaving the shared pin and other Spark versions untouched.
- Tests: update TestViews expectations for the new messages and SHOW CREATE
  output, and override loadTableOrView in TestSparkCatalog.
@aokolnychyi

Contributor

Did we convert this into a draft as we anticipate another RC with changes around view management?

manuzhang and others added 2 commits June 27, 2026 09:13
Co-authored-by: Codex <codex@openai.com>
Co-authored-by: Codex <codex@openai.com>
@manuzhang manuzhang force-pushed the spark4.2-preview branch 2 times, most recently from d42a203 to e8d1500 Compare June 27, 2026 03:20
manuzhang commented Jun 27, 2026

Member Author

@aokolnychyi Nope, I've always kept it as a draft. My plan is to wait until everything has settled on the Spark side. Nevertheless, your review and comments are always welcome.

Add Spark 4.2 module support and include follow-up compatibility fixes for Spark 4.2 view command handling and test expectations.

Co-authored-by: Codex <codex@openai.com>
8 participants