HIVE-29546: Iceberg: [V3] Support of ROW LINEAGE in COMPACTION #6407
kokila-19 wants to merge 2 commits into apache:master
    private static void setRowLineageConfFlag(Configuration conf, boolean enabled) {
      if (enabled) {
        conf.setBoolean(SessionStateUtil.ROW_LINEAGE, true);
      } else {
        conf.unset(SessionStateUtil.ROW_LINEAGE);
      }
    }

    /**
     * Enable the row lineage session flag for the current statement execution.
     * Returns {@code true} if the flag was enabled
     */
    public static void enableRowLineage(SessionState sessionState) {
      setRowLineageConfFlag(sessionState.getConf(), true);
    }

    public static void disableRowLineage(SessionState sessionState) {
      setRowLineageConfFlag(sessionState.getConf(), false);
    }
Why can't we do this directly?

    public static void enableRowLineage(SessionState sessionState) {
      sessionState.getConf().setBoolean(SessionStateUtil.ROW_LINEAGE, true);
    }
    public static void disableRowLineage(SessionState sessionState) {
      sessionState.getConf().setBoolean(SessionStateUtil.ROW_LINEAGE, false);
    }

You have Javadoc on one method and not on the others; since this is a util class, we can drop it.
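One point worth weighing for the suggestion above: clearing a key and storing an explicit `false` are not equivalent when a reader supplies a default. The sketch below illustrates this with a plain `Map` standing in for Hadoop's `Configuration`; the key name and helper methods are hypothetical, and whether the difference matters for this PR depends on how the flag is read, which the thread does not show.

```java
import java.util.HashMap;
import java.util.Map;

final class FlagSemanticsSketch {
  // Hypothetical key; the PR uses SessionStateUtil.ROW_LINEAGE.
  static final String ROW_LINEAGE = "sketch.row.lineage";

  // Simplified stand-in for a getBoolean(name, defaultValue) lookup:
  // a missing or unparseable value falls back to the default.
  static boolean getBoolean(Map<String, String> conf, String key, boolean defaultValue) {
    String value = conf.get(key);
    if ("true".equalsIgnoreCase(value)) {
      return true;
    }
    if ("false".equalsIgnoreCase(value)) {
      return false;
    }
    return defaultValue;
  }

  // Mirrors the PR's approach: remove the key entirely.
  static void disableByUnset(Map<String, String> conf) {
    conf.remove(ROW_LINEAGE);
  }

  // Mirrors the reviewer's suggestion: store an explicit false.
  static void disableBySetFalse(Map<String, String> conf) {
    conf.put(ROW_LINEAGE, "false");
  }
}
```

With a default of `false` on the read side, both variants behave the same; they only diverge if some reader passes `true` as the default or checks whether the key is present.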
    private static String buildSelectColumnList(Table icebergTable, HiveConf conf) {
      return icebergTable.schema().columns().stream()
          .map(Types.NestedField::name)
          .map(col -> HiveUtils.unparseIdentifier(col, conf))
          .collect(Collectors.joining(", "));
I don't think this logic should kick in unconditionally. If rowLineage isn't enabled, it should just return *, as before.
If rowLineage is enabled, append the column names from ROW_LINEAGE_COLUMNS_TO_FILE_NAME.
Reply: I’ve refactored the code so that the row lineage check is performed only once and all related changes hang off that single check.
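A minimal sketch of the behavior requested in this thread, assuming the select list is plain `*` unless row lineage is enabled. The class, method, and column names here are placeholders (the column names follow the Iceberg v3 spec's `_row_id` and `_last_updated_sequence_number`); the PR itself takes its names from `VirtualColumn`, and the actual refactored code is not shown in this conversation.

```java
import java.util.List;

final class CompactionSelectSketch {
  // Placeholder names; the PR reads them from VirtualColumn.ROW_LINEAGE_ID
  // and VirtualColumn.LAST_UPDATED_SEQUENCE_NUMBER.
  static final List<String> ROW_LINEAGE_COLUMNS =
      List.of("_row_id", "_last_updated_sequence_number");

  // Returns "*" when row lineage is off, otherwise "*" followed by the
  // row lineage columns so their values are carried into the rewrite.
  static String selectList(boolean rowLineageEnabled) {
    if (!rowLineageEnabled) {
      return "*";
    }
    return "*, " + String.join(", ", ROW_LINEAGE_COLUMNS);
  }
}
```

Keeping `*` as the default path means tables without row lineage get the same compaction query as before.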
    public static String getRowLineageSelectColumns(boolean rowLineageEnabled) {
      return rowLineageEnabled
          ? ", " + VirtualColumn.ROW_LINEAGE_ID.getName() + ", " + VirtualColumn.LAST_UPDATED_SEQUENCE_NUMBER.getName()
          : "";
Can you rename it to getRowLineageColumnsForCompaction?
    if (rowLineageEnabled) {
      RowLineageUtils.enableRowLineage(sessionState);
      LOG.debug("Row lineage flag set for compaction of table {}", compactTableName);
    }
Can't we do this at the point where we add the row lineage columns? That would avoid checking rowLineageEnabled a second time.
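One way to read this suggestion is a single branch that both appends the lineage columns and sets the session flag. The sketch below assumes that reading; it uses a `Map` in place of the session configuration, and the key, column names, and method are hypothetical rather than taken from the PR.

```java
import java.util.HashMap;
import java.util.Map;

final class SingleCheckSketch {
  // Hypothetical key; the PR uses SessionStateUtil.ROW_LINEAGE.
  static final String ROW_LINEAGE = "sketch.row.lineage";

  // Builds the select list and, in the same branch, marks the session
  // so the write path knows to carry row lineage values through.
  static String buildSelectList(boolean rowLineageEnabled, Map<String, String> sessionConf) {
    StringBuilder select = new StringBuilder("*");
    if (rowLineageEnabled) {
      // Placeholder column names standing in for the VirtualColumn names.
      select.append(", _row_id, _last_updated_sequence_number");
      sessionConf.put(ROW_LINEAGE, "true");
    }
    return select.toString();
  }
}
```

The trade-off is that a query-building helper now has a side effect on session state, so the single check comes at the cost of a less obvious method contract.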
What changes were proposed in this pull request?
Preserve Iceberg v3 row lineage during compaction by generating compaction rewrite queries that carry row-lineage values correctly.
Propagate the row-lineage flag reliably into write job properties using RowLineageUtils.isRowLineageInsert(conf).
Why are the changes needed?
Compaction is implemented as INSERT OVERWRITE; without special handling, it can rewrite the data with new row-lineage values instead of preserving the existing ones.
Does this PR introduce any user-facing change?
No, this PR does not introduce any user-facing changes. It adds internal support for row lineage during compaction.
How was this patch tested?
qtest