HCR bug fix: Retention job should use Hadoop Path to write data_manifest.json#633

Merged
jiang95-dev merged 1 commit into linkedin:main from jiang95-dev:lejiang/tmp-retention
Jun 16, 2026

Conversation

@jiang95-dev jiang95-dev commented Jun 16, 2026

Collaborator

Summary

Retention with backup enabled failed with Permission denied ... access=WRITE, inode="/" when writing data_manifest.json. The root cause is that backup-manifest path derivation used java.nio.file.Paths, which is not URI-aware and collapses hdfs://authority/... into hdfs:/authority/..., dropping the authority. The mangled path then resolved against the default filesystem root and tried to mkdirs a bogus top-level directory. This affected tables whose newer data files are stored as fully-qualified hdfs:// URIs while older partitions use bare paths (a mix Iceberg persists verbatim in manifests). The fix derives the parent directory with Hadoop Path, which preserves scheme + authority.
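To make the collapse concrete, here is a minimal JDK-only sketch; it is not code from this PR. It assumes a POSIX JVM (Linux or macOS), and the authority `namenode`, the file path, and the class and method names are all hypothetical. It derives a parent directory with java.nio.file.Paths, then parses the result as a URI to show that the authority is gone and the former host name has become the first path segment.

```java
import java.net.URI;
import java.nio.file.Paths;

// Hypothetical demo class; names are illustrative only.
final class NioParentDemo {

    // Derives the parent directory the way a java.nio.file.Paths-based helper would.
    static String nioParent(String dataFile) {
        return Paths.get(dataFile).getParent().toString();
    }

    public static void main(String[] args) {
        // Fully-qualified form: the POSIX path parser treats "//" as a redundant separator.
        String mangled = nioParent("hdfs://namenode/data/db/tbl/part-0.parquet");
        System.out.println(mangled); // hdfs:/namenode/data/db/tbl

        // Parsed as a URI, the mangled string has no authority, and the old
        // host name is now the first segment of the path.
        URI parsed = URI.create(mangled);
        System.out.println(parsed.getAuthority()); // null
        System.out.println(parsed.getPath());      // /namenode/data/db/tbl

        // Bare form: no scheme or authority to lose, so the result is unaffected.
        System.out.println(nioParent("/data/db/tbl/part-0.parquet")); // /data/db/tbl
    }
}
```

With no authority left in the string, a filesystem client has nothing to tell it which cluster to target, which is consistent with the write landing under the default filesystem root as described above.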

Changes

  • Bug Fixes
  • Tests

Bug Fixes

  • Operations.prepareBackupDataManifests now derives a data file's parent via a dataFileParent() helper backed by Hadoop Path instead of java.nio.file.Paths, preserving scheme + authority for fully-qualified hdfs:// paths.

Tests

  • Added OperationsBackupPathTest, a pure-Java (no Spark) regression test covering both fully-qualified hdfs://authority/... and bare /data/... data file forms, asserting the authority is preserved and the backup dir is inserted correctly.
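The property the test pins for the two path forms can be sketched without Hadoop on the classpath. The example below uses java.net.URI purely as a stand-in for Hadoop Path; it is not the PR's dataFileParent() helper, and the class name, method name, authority `namenode`, and file paths are hypothetical. It only illustrates what "URI-aware parent derivation" means: strip the last path segment while leaving scheme and authority untouched.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical stand-in for a URI-aware parent helper; not the PR's implementation.
final class UriAwareParentDemo {

    // Removes the last path segment and reassembles the URI with the
    // original scheme and authority (both may be null for bare paths).
    static String uriAwareParent(String dataFile) {
        URI uri = URI.create(dataFile);
        String path = uri.getPath();
        int lastSlash = path.lastIndexOf('/');
        String parentPath = lastSlash <= 0 ? "/" : path.substring(0, lastSlash);
        try {
            return new URI(uri.getScheme(), uri.getAuthority(), parentPath, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot derive parent of " + dataFile, e);
        }
    }

    public static void main(String[] args) {
        // Fully-qualified form: scheme and authority survive.
        System.out.println(uriAwareParent("hdfs://namenode/data/db/tbl/part-0.parquet"));
        // prints hdfs://namenode/data/db/tbl

        // Bare form: stays a bare path.
        System.out.println(uriAwareParent("/data/db/tbl/part-0.parquet"));
        // prints /data/db/tbl
    }
}
```

A regression test in this style needs no cluster or Spark session, because it checks only string-level path derivation.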

Testing Done

  • Added new tests for the changes made.

OperationsBackupPathTest (4 cases) passes on both the Spark 3.1 and 3.5 modules. Reverting dataFileParent to the old java.nio.file.Paths implementation makes the two authority-bearing cases fail (hdfs://ltx1-holdem/... → hdfs:/ltx1-holdem/...), confirming the tests catch the regression; the bare-path cases continue to pass. (Note: the in-JVM test harness uses a local file:/// filesystem with no authority, so the hdfs:// authority drop cannot be reproduced through the Spark integration path without a MiniDFSCluster + HDFS-backed storage; the pure-Java test pins the path-derivation behavior deterministically instead.)

  • Local code review completed

…st paths

Retention backup-manifest writing used java.nio.file.Paths to derive the
parent directory of a data file. Paths is not URI-aware: it collapses
hdfs://authority/... into hdfs:/authority/..., dropping the authority. The
mangled path then resolved against the default filesystem root and failed
with "Permission denied ... inode=/". This bit tables whose newer data files
are stored as fully-qualified hdfs:// URIs while older ones are bare paths.

Use Hadoop Path (which preserves scheme + authority) via a testable
dataFileParent() helper. OperationsBackupPathTest covers the qualified and
bare path forms, plus a negative test that exercises the java.nio.file.Paths
derivation and asserts it produces the broken, root-rooted destination path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jiang95-dev jiang95-dev force-pushed the lejiang/tmp-retention branch from d2af8d5 to 93415f4 Compare June 16, 2026 20:55
@jiang95-dev jiang95-dev changed the title fix(retention): preserve scheme/authority when deriving backup manifest paths HCR bug fix: Retention job should use Hadoop Path to write manifest.json Jun 16, 2026
@jiang95-dev jiang95-dev changed the title HCR bug fix: Retention job should use Hadoop Path to write manifest.json HCR bug fix: Retention job should use Hadoop Path to write data_manifest.json Jun 16, 2026
@jiang95-dev jiang95-dev merged commit 3e8680a into linkedin:main Jun 16, 2026
1 check passed