Skip to content

Core: Add ManifestFile adapter for v4 tracked files#16867

Open
anoopj wants to merge 7 commits into
apache:mainfrom
anoopj:manifestfile-adapter
Open

Core: Add ManifestFile adapter for v4 tracked files#16867
anoopj wants to merge 7 commits into
apache:mainfrom
anoopj:manifestfile-adapter

Conversation

@anoopj

@anoopj anoopj commented Jun 18, 2026

Copy link
Copy Markdown
Member

Add TrackedFileAdapters.asManifestFile to wrap a DATA_MANIFEST or DELETE_MANIFEST TrackedFile as a ManifestFile.

The ManifestFile adapter will only be used by tests. Note that ManifestFile doesn't support manifest DVs. So in the actual manifest read path, the reader resolves the MDV first and produces the resulting TrackedFiles, which are then adapted to data/delete files.

Closes #16227

@anoopj anoopj moved this to In review in V4: metadata tree Jun 18, 2026
@github-actions github-actions Bot added the core label Jun 18, 2026
@anoopj anoopj force-pushed the manifestfile-adapter branch 2 times, most recently from b50410f to 718ae34 Compare June 19, 2026 00:42
Add TrackedFileAdapters.asManifestFile to wrap a DATA_MANIFEST or
DELETE_MANIFEST TrackedFile as a ManifestFile.

Closes apache#16227
@anoopj anoopj force-pushed the manifestfile-adapter branch from 718ae34 to 93368cb Compare June 19, 2026 00:47

@wombatu-kun wombatu-kun left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks correct: field delegation, equals/hashCode (consistent with GenericManifestFile), constructor preconditions, and the test coverage are all solid.

Non-blocking discussion point: partitionSpecId() throws UnsupportedOperationException. That is defensible since a v4 manifest is not bound to a single spec. But @rdblue pushed back on #16100 against these adapters exposing an invalid or arbitrary spec ID, so it may be worth confirming with reviewers that a throw is the intended contract for any future caller that reads partitionSpecId().

Comment thread core/src/main/java/org/apache/iceberg/TrackedFileAdapters.java
Comment thread core/src/main/java/org/apache/iceberg/TrackedFileAdapters.java
NO_PARTITION,
8L,
2048L);
file.set(MANIFEST_INFO_ORDINAL, manifestInfo);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can use the builder from @gaborkaszab 's PR: #16769

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I can make the change if #16769 gets merged first or do this as a followup PR if this gets merged first.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, in the meantime the builder got merged

@gaborkaszab gaborkaszab left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR, @anoopj !
I took a Quick Look and left some minor comment.


private TrackedManifestFile(TrackedFile file) {
Preconditions.checkArgument(
file.tracking() != null, "Cannot create manifest file: no tracking");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: other adapters don't check if tracking is null. Not sure we have to here. It's a required field anyway

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

private TrackedManifestFile(TrackedFile file) {
Preconditions.checkArgument(
file.tracking() != null, "Cannot create manifest file: no tracking");
Preconditions.checkArgument(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the spec proposal data seq num is 'Optional for leaf manifests, required for root.'. I think it's valid to have it as null here. Seems also optional in ManifestFile

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is for root-level manifest tracking where null is truly an invalid state. We can either validate or just have it fail with an NPE.

Seems also optional in ManifestFile

It is required for a manifest file: ManifestFile.sequenceNumber() returns a primitive

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to add on top of what Anoop said. this is for leaf manifest file entry (TrackedFile) in the root manifest file. Hence dataSequenceNumber should be required

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation @anoopj and @stevenzwu !

file.tracking().snapshotId() != null, "Cannot create manifest file: no snapshot ID");
Preconditions.checkArgument(
file.manifestInfo() != null, "Cannot create manifest file: no manifest info");
Preconditions.checkArgument(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This checks an internal invariant for a TrackedFile. As per the spec proposal a DELETE_MANIFEST shouldn't have firstRowId, so I'd assume that this is a contract to keep when we construct a TrackedFile and set this value. In fact this is check in the relevant PR

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

}

@Override
public boolean equals(Object other) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other adapters don't override equals and hashCode. Is there a particular reason it's needed here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to be consistent with the semantics of GenericManifestFile.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually use equals and hashCode? We try to avoid implementing those methods where there are different interpretations of what "equality" might mean.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't add any code that uses it. I was trying to be consistent with the behavior of the GenericManifestFile which defines equality with the location. I have removed it.

@anoopj anoopj requested review from gaborkaszab and stevenzwu June 22, 2026 17:38

@stevenzwu stevenzwu left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I will merge Gabor's TrackedFileBuilder once CI passed. then let's rebase this one and use the builder as applicable.

private TrackedManifestFile(TrackedFile file) {
Preconditions.checkArgument(
file.tracking() != null, "Cannot create manifest file: no tracking");
Preconditions.checkArgument(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to add on top of what Anoop said. this is for leaf manifest file entry (TrackedFile) in the root manifest file. Hence dataSequenceNumber should be required

private TrackedManifestFile(TrackedFile file) {
Preconditions.checkArgument(
file.tracking().dataSequenceNumber() != null,
"Cannot create manifest file: no data sequence number");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think we prefer the slightly more specific "Invalid data sequence number: null" because "no X" doesn't necessarily mean "x is null".

Also, even though the attempt is to create a manifest file, that's not the actual problem. It's debatable whether to include what is being attempted, "Cannot create manifest file", but I would opt to leave it out because that's clear from other context (that the check is in asManifestFile and the TrackedManifestFile constructor).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Override
public int partitionSpecId() {
throw new UnsupportedOperationException(
"Tracked manifests are not bound to a single partition spec");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that it is correct to say "Tracked manifests": what is a "tracked" manifest and why is one from a manifest list not "tracked"?

I think it is better to say "v4 manifests are not bound to a single partition spec".

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. I was trying to avoid having too many v4 references.

public ManifestContent content() {
return file.contentType() == FileContent.DATA_MANIFEST
? ManifestContent.DATA
: ManifestContent.DELETES;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I think it is safer to avoid a dependency between this and the constructor's verification. I would recommend using a switch statement:

switch (file.contentType()) {
  case DATA_MANIFEST:
    return ManifestContent.DATA;
  case DELETE_MANIFEST:
    return ManifestContent.DELETES;
  default:
    throw new UnsupportedOperationException("Unsupported content type for manifests: " + file.contentType());
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

public boolean equals(Object other) {
if (this == other) {
return true;
} else if (!(other instanceof TrackedManifestFile)) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that the implementation of GenericManifestFile is very helpful. It considers two files equal if they have the same path (which I think is reasonable) but it also excludes any implementation other than GenericManifestFile, which means that they aren't actually interchangeable as the interface suggests.

In addition, this creates a situation where a.equals(b) may be true, but b.equals(a) can be false for reasonable implementations.

I'd prefer to remove this and fix places where it is assumed or used. Good catch, @gaborkaszab!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

"Cannot create manifest file: no data sequence number");
Preconditions.checkArgument(
file.tracking().snapshotId() != null, "Cannot create manifest file: no snapshot ID");
Preconditions.checkArgument(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also seems an internal invariant for TrackedFile. The new TrackedFileBuilder guarantees that manifestInfo is set for manifest entries.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Removed.

NO_PARTITION,
8L,
2048L);
file.set(MANIFEST_INFO_ORDINAL, manifestInfo);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, in the meantime the builder got merged

private TrackedManifestFile(TrackedFile file) {
Preconditions.checkArgument(
file.tracking() != null, "Cannot create manifest file: no tracking");
Preconditions.checkArgument(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation @anoopj and @stevenzwu !

void testManifestFileAdapterDelegation() {
Tracking tracking = createTracking();
ManifestInfo manifestInfo = createManifestInfo();
TrackedFileStruct file =

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we hide the construction of the input TrackedFile into a separate function? Like dataManifestFile() it could get ManifestInfo, FileContent, and/or Tracking parameters if that matters for the test.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. extracted manifestBuilder(FileContent) / manifestFile(FileContent, ManifestInfo) and now tests build through them.

}

@Test
void testManifestFileAdapterDelegation() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe parameterize the test for data + delete manifest. The testDeleteManifestContentMapsToDeletes test could be dropped then

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good suggesiton. Done.

@anoopj anoopj requested a review from rdblue June 23, 2026 18:36

@gaborkaszab gaborkaszab left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just some minor comments. Other than them, LGTM!

.manifestInfo(manifestInfo)
.keyMetadata(ByteBuffer.wrap(new byte[] {7, 8, 9}))
.build();
populateTrackingFields(file);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI: in the follow-up for the TrackedFileBuilder PR, we decided to drop populateTrackingFields and went to call the constructor of TrackedFile instead using the createTracking (revived in the follow-up PR) method.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. I can rebase when the followup PR is merged.

manifestBuilder(FileContent.DATA_MANIFEST).manifestInfo(createManifestInfo()).build();
// Inheritance fills the data sequence number, which the adapter requires; first row ID and
// key metadata stay unset.
((TrackingStruct) file.tracking()).set(DATA_SEQUENCE_NUMBER_ORDINAL, DATA_SEQUENCE_NUMBER);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of using the ordinal, we can use the inheritFrom(some_manifest) method to do that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Routing through inheritFrom() will require building a parent Tzracking with several nulls. I prefer to keep it since this is simpler.

Comment thread core/src/test/java/org/apache/iceberg/TestTrackedFileAdapters.java Outdated
Comment thread core/src/test/java/org/apache/iceberg/TestTrackedFileAdapters.java Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

v4: add wrapper for ManifestFile

5 participants