Skip to content

fix(hadoop): use codec-aware metadata filenames#1326

Open
fallintoplace wants to merge 6 commits into
apache:mainfrom
fallintoplace:fix/hadoop-compressed-metadata-paths
Open

fix(hadoop): use codec-aware metadata filenames#1326
fallintoplace wants to merge 6 commits into
apache:mainfrom
fallintoplace:fix/hadoop-compressed-metadata-paths

Conversation

@fallintoplace

Copy link
Copy Markdown
Contributor

Summary

  • write Hadoop metadata files with suffixes that match the configured metadata compression codec
  • load the actual discovered metadata file path instead of reconstructing an uncompressed-looking path
  • teach Hadoop metadata discovery to recognize plain, gzip, and zstd metadata files consistently

Why

Hadoop currently always writes metadata to vN.metadata.json even when write.metadata.compression-codec requests gzip or zstd. The table metadata reader infers decompression from the filename suffix, so compressed bytes behind a plain .metadata.json path can fail to reload.

Using codec-aware filenames on write and carrying the discovered metadata path through LoadTable keeps create, commit, and reload consistent for plain, gzip, and zstd metadata.

Testing

  • go test ./catalog/hadoop -run 'TestHadoopCatalogTestSuite/(TestMetadataFilePathForCompression|TestVersionPatternMatches|TestIsTableDirWithZstdMetadata|TestFindVersionGzipOnlyWithHint|TestFindVersionMixedGzipAndPlain|TestFindVersionZstdOnlyWithHint|TestFindVersionMixedZstdAndPlain|TestCreateTableGzipMetadata|TestCreateTableZstdMetadata|TestCommitTableGzipMetadata|TestCommitTableZstdMetadata)$' -count=1
  • go test ./catalog/hadoop -count=1
  • go test ./catalog/... -count=1
  • go test ./... -run '^$' -count=1
  • go run github.com/golangci/golangci-lint/v2/cmd/golangci-lint@v2.11.4 run --timeout=10m ./catalog/hadoop/...
  • git diff --check

@fallintoplace fallintoplace requested a review from zeroshade as a code owner June 27, 2026 09:55

@tanmayrauth tanmayrauth left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good — write suffix now matches the read-path codec inference in table.go, and the create→load / commit→load round-trip tests cover both gzip and zstd. One tiny nit.

Comment thread catalog/hadoop/hadoop.go Outdated
@@ -188,6 +190,22 @@ func (c *Catalog) metadataFilePath(ident table.Identifier, version int) string {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this change metadataFilePath no longer has a production caller, CreateTable/CommitTable/LoadTable all go through metadataFilePathForCompression / findMetadataLocation now. It's still referenced from the tests so it's not dead, but might be worth either keeping it explicitly as a test-only helper or pointing those tests at the new helper to avoid drift.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a valid comment, I will amend the PR. Have a nice weekend.

@zeroshade

Copy link
Copy Markdown
Member

This LGTM on review, but it currently has merge conflicts with main (the metadata-discovery code moved). Could you rebase onto the latest main? Once it's clean I'll get it merged. Thanks @fallintoplace!

@laskoviymishka laskoviymishka left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch on the underlying bug — CreateTable/CommitTable were writing to vN.metadata.json regardless of the codec, so a gzip file behind a plain filename would fail to parse on the next LoadTable, and routing the discovered path from findMetadataLocation straight into NewFromLocation is the right fix. The gzip side of this is solid.

I'd hold it before merging though. The zstd half goes beyond what the ecosystem supports: Java's Codec enum is only NONE/GZIP and never probes for .zstd, PyIceberg falls back to NOOP_COMPRESSOR and reads the bytes as garbage, and iceberg-rust rejects zstd outright. So a Hadoop-catalog table we write with zstd is unreadable by every other client — and the Hadoop catalog is exactly the shared-filesystem case where that matters. I'd gate zstd at this layer (same error you already raise for brotli) unless the intent is an explicit go-only extension, in which case it needs a loud warning in the PR description and CHANGES.md.

The other blocker is independent of zstd: on commit we only stat the new codec's own filename, and on load we re-probe with a fixed plain→gz→zstd order even though WalkDir already saw the real filename. Both let a stale plain vN shadow a newer compressed vN, which silently defeats a compression migration and can produce two different files at the same version under a concurrent cross-codec write.

A few things I'd want before merge:

  • gate (or loudly document) zstd for the Hadoop catalog
  • make the commit conflict check probe all codec variants for the new version
  • thread the filename WalkDir already found into scanForward instead of re-probing, and add a test pinning which file wins when vN.metadata.json and vN.gz.metadata.json both exist
  • assert the returned path (not just the version) in the gzip/zstd findVersion tests

Once those are addressed, happy to take another pass and approve.

Comment thread catalog/hadoop/hadoop.go Outdated
case table.MetadataCompressionCodecGzip:
suffix = ".gz.metadata.json"
case table.MetadataCompressionCodecZstd:
suffix = ".zstd.metadata.json"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the zstd branch here is a cross-client trap. zstd isn't a value Java's TableMetadataParser.Codec understands — its enum is only NONE("") and GZIP(".gz"), and HadoopTableOperations.getMetadataFile only ever probes Codec.values(), so a vN.zstd.metadata.json is never even looked for. PyIceberg falls back to NOOP_COMPRESSOR for any non-.gz suffix and reads the compressed bytes as garbage, and iceberg-rust rejects zstd outright in parse_metadata_file_compression.

So a Hadoop-catalog table we write with zstd is invisible-or-corrupt to every other Iceberg client, and the Hadoop catalog is precisely the shared-filesystem case where that interop matters most.

I'd gate zstd at this layer for the Hadoop catalog — return the same "unsupported codec" error you already produce for brotli, so the failure is loud at write time rather than a silent unreadable file. If the intent is genuinely a go-to-go-only extension, that needs a prominent warning in the PR description and CHANGES.md and a clear note in the code that this is non-portable. wdyt?

Comment thread catalog/hadoop/hadoop.go
tempPath := filepath.Join(metaDir, uuid.New().String()+".metadata.json")

compression := updated.Properties().Get(table.MetadataCompressionKey, table.MetadataCompressionDefault)
newMetaPath, err := c.metadataFilePathForCompression(ident, newVersion, compression)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conflict guard below only stats the codec-specific path for newVersion, so cross-codec races slip through. If writer A plans v2.gz.metadata.json and writer B plans v2.metadata.json concurrently, neither Stat sees the other and both commit — two different files both claiming version 2, breaking the monotonic chain.

Java's getMetadataFile iterates every codec variant, so its conflict detection is codec-agnostic by construction.

I'd make the guard probe all variants: call metadataVersionLocation(ident, newVersion) and fail the commit if it finds any file for that version, regardless of suffix. That also covers the migration case where someone switches codecs between commits.

Comment thread catalog/hadoop/hadoop.go Outdated
strings.Join(ident, "."), catalog.ErrNoSuchTable)
}

metaPath, ok := c.metadataVersionLocation(ident, maxVer)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The WalkDir above already saw the exact filename for maxVer (it matched versionPattern to compute the version), but we throw the name away and only keep the integer — then re-probe here with the fixed plain→gz→zstd order.

That re-probe is where stale data wins: if both v5.metadata.json (old) and v5.gz.metadata.json (new) exist, plain is probed first and silently shadows the compressed current version, so the catalog serves stale uncompressed metadata even though the authoritative version is the gzip one. A compression migration would be quietly defeated.

I'd capture the matched filename in the WalkDir callback (it's the path param) when v > maxVer, and thread that straight into scanForward so we never re-probe a file we just saw. That removes the shadowing and drops the extra Stats per lookup on the slow path.

Comment thread catalog/hadoop/hadoop_test.go Outdated

ver, err := s.cat.findVersion(ident)
s.Require().NoError(err)
s.Equal(2, ver)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestFindVersionZstdOnlyWithHint and TestFindVersionMixedZstdAndPlain only assert the version number, never the returned path — so the part of the fix that actually matters (that findMetadataLocation returns the codec-suffixed path, not a reconstructed plain one) isn't directly exercised here. The empty nil files are fine since these call only Stat, but the path is the thing the LoadTable fix hinges on.

I'd add a direct findMetadataLocation assertion that the returned path ends in .zstd.metadata.json for the zstd-only case and .gz.metadata.json for a gzip-only one. That's also the natural place to add the shadowing case from the comment on findMetadataLocation — write both v3.metadata.json and v3.gz.metadata.json and assert which path wins, so the resolution is pinned by a test.

Comment thread catalog/hadoop/hadoop_test.go Outdated

path, err = s.cat.metadataFilePathForCompression(ident, 1, table.MetadataCompressionCodecZstd)
s.Require().NoError(err)
s.Equal(filepath.Join(s.warehouse, "ns", "tbl", "metadata", "v1.zstd.metadata.json"), path)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old TestMetadataFilePath covered version 1 and version 42; the rewrite only keeps version 1 across the three codecs. The multi-digit case is the normal path for any table past v9, and fmt.Sprintf("v%d%s", ...) is exactly where a width/format slip would hide.

I'd carry a version=42 assertion over for at least one codec while we're here.

Comment thread catalog/hadoop/hadoop.go Outdated
// number and the UUID is in canonical 8-4-4-4-12 hex format.
var uuidMetadataPattern = regexp.MustCompile(
`^[0-9]{5}-[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(?:\.gz)?\.metadata\.json$`,
`^[0-9]{5}-[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(?:\.(?:gz|zstd))?\.metadata\.json$`,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small things here. The // Java/PyIceberg catalogs comment above is now inaccurate for the zstd alternative — neither Java nor PyIceberg ever emits 00000-<uuid>.zstd.metadata.json, so I'd note the zstd part is an iceberg-go-only extension rather than implying parity.

And recognizing UUID-style zstd here (used by isTableDir/ListTables) while findMetadataLocation's WalkDir still keys off versionPattern only creates a confusing split: a user could create a table in Python, see it in ListTables, then get ErrNoSuchTable on LoadTable. That load gap is pre-existing, but I'd at least not widen the recognition surface without a // recognized but not loadable note. The new zstd UUID branch also has no test backing it.

Comment thread catalog/hadoop/hadoop.go Outdated
filepath.Join(dir, fmt.Sprintf("v%d.metadata.json", version)),
filepath.Join(dir, fmt.Sprintf("v%d.gz.metadata.json", version)),
filepath.Join(dir, fmt.Sprintf("v%d.zstd.metadata.json", version)),
} {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth a one-line comment that any non-ErrNotExist Stat error (permission denied, transient FS error) is treated as "not found" here. It's the same behavior as the old metadataVersionExists, so not a regression, but on a slow filesystem a transient error could make a real version look absent — calling it out keeps the next reader from assuming it's only an existence check.

s.Equal(len(created.Schema().Fields()), len(loaded.Schema().Fields()))
}

func (s *HadoopCatalogTestSuite) TestCreateTableGzipMetadata() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These four new integration tests (TestCreateTableGzipMetadata, TestCreateTableZstdMetadata, TestCommitTableGzipMetadata, TestCommitTableZstdMetadata) repeat the same os.Mkdir + props map + CreateTable + FileExists boilerplate, where the existing suite leans on a createTestTable helper.

I'd fold the four into a createTestTableWithCompression helper or table-driven subtests to match the rest of the suite, and while you're adding zstd coverage, a TestIsTableDirTrueUUIDMetadataZstd to mirror the existing UUID+gzip case would catch a ztsd-style typo in uuidMetadataPattern. Low priority next to the blockers above.

@fallintoplace fallintoplace force-pushed the fix/hadoop-compressed-metadata-paths branch from 03d947b to 3512a3f Compare June 29, 2026 22:14

@laskoviymishka laskoviymishka left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix itself is right — the gzip round-trip tests exercise the previously-broken write-then-reload path with real compressed bytes and assert UUID equality, so I'm confident the reported bug (compressed bytes behind a plain filename) is closed. The zstd hard-reject on write is a nice improvement over silently writing something that fails on read.

One thing I'd fix before this lands: findMetadataLocation returns maxPath straight from the WalkDir callback, so when both v{n}.metadata.json and v{n}.gz.metadata.json exist for a version, the file we load depends on listing order. On local FS the .gz variant happens to sort first so the test passes, but on a remote/NFS listing it's arbitrary — and it disagrees with the hint path, which resolves through metadataVersionLocation and deterministically prefers plain. Same table, two discovery branches, potentially two different files. Canonicalizing on metadataVersionLocation(ident, maxVer) after the walk closes that.

Left the detail inline. Fix that and I think this is good to land.

Comment thread catalog/hadoop/hadoop.go Outdated
v, _ := strconv.Atoi(matches[1])
if v > maxVer {
maxVer = v
maxPath = path

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maxPath is set to whichever variant WalkDir visits first, so when both v3.metadata.json and v3.gz.metadata.json exist for a version the loaded file depends on listing order. On local FS .gz sorts ahead of .metadata so it's stable and the test passes, but on a remote/NFS listing it's arbitrary — and it disagrees with the hint path, which goes through metadataVersionLocation and always prefers plain. Same table can then resolve to two different metadata files depending on which discovery branch runs.

I'd drop maxPath and call metadataVersionLocation(ident, maxVer) after the walk so both discovery paths land on the same file. wdyt?

@zeroshade zeroshade left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remaining blockers are concurrency races in the new cross-codec Hadoop metadata paths. laskoviymishka's zstd-rejection and path-canonicalization threads appear addressed; items below are the remaining blockers (concurrency).

  • [MAJOR] CommitTable cross-codec version conflict check is non-atomic; add a codec-independent per-version claim and mixed-codec concurrent commit test.
  • [MAJOR] CreateTable has the same default/gzip duplicate-v1 race; reuse the claim and add a concurrent create test.

Comment thread catalog/hadoop/hadoop.go Outdated
}

if existingPath, exists := c.metadataVersionLocation(ident, newVersion); exists {
_ = c.filesystem.Remove(tempPath)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MAJOR] The cross-codec commit conflict check is non-atomic. Two concurrent writers can both compute the same newVersion, both pass metadataVersionLocation(ident, newVersion) before either publishes, then one RenameNoReplace can publish vN.metadata.json while the other publishes vN.gz.metadata.json; both succeed, leaving two files claiming the same version. Fix by taking an atomic per-version claim independent of codec suffix before publishing (for example a RenameNoReplace sentinel for vN), and add a concurrent mixed-codec commit test where both writers pass preflight before either final rename.

Comment thread catalog/hadoop/hadoop.go
}
}

metaPath, err := c.metadataFilePathForCompression(ident, version, compression)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MAJOR] CreateTable has the same duplicate-version race with different compression. Two concurrent creators can both pass isTableDir, then publish v1.metadata.json and v1.gz.metadata.json successfully. Reuse a codec-independent per-version/table-creation claim before publishing metadata, and add a concurrent create test with default and gzip writers.

@zeroshade zeroshade left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update — the two earlier concurrency concerns are addressed now: the codec-independent vN.metadata.json.claim path closes the mixed-codec version race, and the new concurrent mixed-codec create/commit tests cover the real scenarios. Nice work.

I found one new follow-up around crash recovery for retained claim files that should be addressed before merge.

Comment thread catalog/hadoop/hadoop.go
if err := c.filesystem.RenameNoReplace(tempPath, metaPath); err != nil {
func (c *Catalog) commitMetadataFile(ident table.Identifier, version int, tempPath, metaPath string, conflictErr error) error {
claimPath := c.metadataVersionClaimPath(ident, version)
if err := c.filesystem.RenameNoReplace(tempPath, claimPath); err != nil {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MAJOR] A stale .claim file can permanently block this metadata version. The claim is created here with RenameNoReplace(tempPath, claimPath), then is only removed by the deferred cleanup when removeClaim remains true or intentionally retained after the final rename. If the process dies after taking the claim but before publishing the final metadata file, vN.metadata.json.claim remains with no final vN.*.metadata.json, so every future attempt for that version hits fs.ErrExist and returns a conflict even though metadataVersionLocation finds no actual metadata file. Please make stale claims recoverable: when claimPath already exists, check whether any final metadata variant exists via metadataVersionLocation; if none exists, treat the claim as abandoned by reclaiming/removing it, or encode enough information to distinguish in-progress vs abandoned. Add a regression test that leaves a vN.metadata.json.claim with no vN.*.metadata.json and verifies recovery.

@fallintoplace fallintoplace force-pushed the fix/hadoop-compressed-metadata-paths branch from 1fabc33 to a61039c Compare July 4, 2026 10:47
@zeroshade

Copy link
Copy Markdown
Member

Following up after the latest commit: it only addresses the nlreturn lint, so my earlier request-changes still stands. The stale-claim recovery gap is unaddressed — if the process dies between taking the vN.metadata.json.claim and the final publish, that claim permanently blocks the version because the fs.ErrExist branch never checks whether a final metadata file actually exists. Please add the recovery check (on fs.ErrExist, call metadataVersionLocation and reclaim or remove an abandoned claim when no final metadata exists) plus a regression test that leaves a claim without final metadata.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants