Skip to content

fix(retention): preserve scheme/authority when deriving backup manifest paths#1

Closed
jiang95-dev wants to merge 1 commit into
mainfrom
lejiang/tmp-retention
Closed

fix(retention): preserve scheme/authority when deriving backup manifest paths#1
jiang95-dev wants to merge 1 commit into
mainfrom
lejiang/tmp-retention

Conversation

@jiang95-dev

Copy link
Copy Markdown
Owner

Summary

Retention with backup enabled failed with Permission denied ... access=WRITE, inode="/" when writing data_manifest.json. The root cause is that backup-manifest path derivation used java.nio.file.Paths, which is not URI-aware and collapses hdfs://authority/... into hdfs:/authority/..., dropping the authority. The mangled path then resolved against the default filesystem root and tried to mkdirs a bogus top-level directory. This affected tables whose newer data files are stored as fully-qualified hdfs:// URIs while older partitions use bare paths (a mix Iceberg persists verbatim in manifests). The fix derives the parent directory with Hadoop Path, which preserves scheme + authority.

Changes

  • Bug Fixes
  • Tests

Bug Fixes

  • Operations.prepareBackupDataManifests now derives a data file's parent via a new dataFileParent() helper backed by Hadoop Path instead of java.nio.file.Paths, preserving scheme + authority for fully-qualified hdfs:// paths.
  • getTrashPath(String, String, String) is now a static @VisibleForTesting pure function so the path math is directly unit-testable.

Tests

  • Added OperationsBackupPathTest, a pure-Java (no Spark) regression test covering both fully-qualified hdfs://authority/... and bare /data/... data file forms, asserting the authority is preserved and the backup dir is inserted correctly.

Testing Done

  • Added new tests for the changes made.

OperationsBackupPathTest (4 cases) passes on the fix. Reverting dataFileParent to the old java.nio.file.Paths implementation makes the two authority-bearing cases fail (hdfs://ltx1-holdem/...hdfs:/ltx1-holdem/...), confirming the tests catch the regression; the bare-path cases continue to pass. Existing testRetentionDataManifestWith* and testRetentionWithBackupFails* integration tests (which exercise backupEnabled=true) pass on both the Spark 3.1 and 3.5 modules.

  • Local code review completed

…st paths

Retention backup-manifest writing used java.nio.file.Paths to derive the
parent directory of a data file. Paths is not URI-aware: it collapses
hdfs://authority/... into hdfs:/authority/..., dropping the authority. The
mangled path then resolved against the default filesystem root and failed
with "Permission denied ... inode=/". This bit tables whose newer data files
are stored as fully-qualified hdfs:// URIs while older ones are bare paths.

Use Hadoop Path (which preserves scheme + authority) via a testable
dataFileParent() helper, and add OperationsBackupPathTest covering both the
qualified and bare path forms.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jiang95-dev

Copy link
Copy Markdown
Owner Author

Superseded by linkedin#633 (opened against upstream).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant