Spark 4.1: Fix Partition info not displayed in job description for RewritePositionDeleteFilesSparkAction#15175
Spark 4.1: Fix Partition info not displayed in job description for RewritePositionDeleteFilesSparkAction#15175manuzhang wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes an issue where partition information was not properly displayed in job descriptions for the RewritePositionDeleteFilesSparkAction in Spark 4.1. The fix adds a description field to StructProjection to carry partition information through the projection process.
Changes:
- Modified
StructProjectionto store and return a description string - Updated
PartitionUtil.coercePartition()to pass partition description when wrappingPartitionData - Modified
RewritePositionDeleteFilesSparkActionto extract and use the partition description in job descriptions
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| api/src/main/java/org/apache/iceberg/util/StructProjection.java | Added description field and getter method, plus overloaded wrap method to accept description |
| core/src/main/java/org/apache/iceberg/util/PartitionUtil.java | Updated to pass partition description when wrapping PartitionData instances |
| spark/v4.1/spark/src/main/java/org/apache/iceberg/spark/actions/RewritePositionDeleteFilesSparkAction.java | Added helper method to extract partition description for job descriptions |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
be1b165 to
5beb72d
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
5beb72d to
e72feb1
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
f85cd05 to
85a351f
Compare
85a351f to
cd74849
Compare
377bc26 to
bcf599e
Compare
| @@ -377,4 +378,8 @@ private String jobDesc( | |||
| table.name()); | |||
| } | |||
| } | |||
|
|
|||
| private String partitionDescription(StructLike struct) { | |||
| return struct instanceof StructProjection ? ((StructProjection) struct).description() : ""; | |||
There was a problem hiding this comment.
we don't really need to cast with JDK 17 anymore. we can define the variable name already as part of the instanceof
e0aad64 to
b80e536
Compare
| @@ -362,7 +363,7 @@ private String jobDesc( | |||
| runner.description(), | |||
| group.info().globalIndex(), | |||
| plan.totalGroupCount(), | |||
| partition, | |||
| partitionDescription(partition), | |||
There was a problem hiding this comment.
I think we have the same issue in RewriteDataFilesSparkAction?
There was a problem hiding this comment.
I don't know why, but RewriteDataFilesSparkAction doesn't call coercePartition.
There was a problem hiding this comment.
this is the places that I'm referring to:
There was a problem hiding this comment.
Without coercePartition, the partition info can be displayed in job description.
There was a problem hiding this comment.
ah I see, got it, thanks. Nevertheless, I think we should still aim for implementing a proper toString() method
|
|
||
| public StructProjection copyFor(StructLike newStruct) { | ||
| return new StructProjection(type, positionMap, nestedProjections).wrap(newStruct); | ||
| return new StructProjection(type, positionMap, nestedProjections).wrap(newStruct, description); |
There was a problem hiding this comment.
thinking more about this, I'm not sure that's the right approach going forward. Maybe it's better to implement toString and defer to the underlying struct in the toString implementation.
@Override
public String toString() {
return null != struct ? struct.toString() : null;
}
that way we don't need any of the other changes and we can also make sure that this is properly handled not only for RewritePositionDeleteFilesAction but also for the action for rewriting data files
There was a problem hiding this comment.
Do you mean like https://github.com/apache/iceberg/pull/13251/changes?
There was a problem hiding this comment.
yes, like in that PR, although @stevenzwu has a point in that it should only include the projected fields
b80e536 to
18f54f4
Compare
|
@nastra Please take another look. I followed the handling of coerced PartitionData in https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/PartitionStatsHandler.java#L510 and no other changes are needed now. |
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
|
@nastra @huaxingao gentle ping for review. Thanks! |
18f54f4 to
a6fb373
Compare
…writePositionDeleteFilesSparkAction
a6fb373 to
f53c61d
Compare
Closes #12414
After Fix:
