S3 listing bugfix#151
Merged
Merged
Conversation
urlquote() was using std::to_string(val) to produce the percent-encoded character value, which emits decimal (e.g. space became "%32" instead of "%20"). This is incorrect per RFC 3986, which specifies uppercase hex digits. Switch to snprintf with "%%%02X" to produce correct percent-encoded output.
When encoding-type=url is set in the ListBucket request (which it is by default), S3 returns percent-encoded Key and Prefix values in its XML response. The code was not decoding these, so object names containing special characters (or even path separators) appeared with raw percent-encoding in directory listings and PROPFIND responses. Add a urlunquote() function to reverse percent-encoding, and apply it to Key and Prefix values immediately after XML parsing. Refactor Results() to delegate to a static ParseListBucketResult() method, enabling unit testing of the XML parsing logic without requiring a live S3 connection. Fixes: PelicanPlatform#146
Some S3-compatible backends such as OpenStack Swift create zero-byte objects with trailing slashes (e.g. "dir/") to represent directories. These placeholder objects caused two problems: In Stat: a HEAD on "dir" returns 404 (the actual key is "dir/"), so the code falls through to a list-based directory check. But listing with prefix="dir/" and max-keys=1 returns only the placeholder itself, which consumed the single result slot hiding real children, and matched the old foundObj check which returned ENOENT. Fix by bumping max-keys to 2, filtering out placeholders via remove_if, and recognizing a removed placeholder as proof the directory exists (even if empty). When HEAD returns 200 with size 0, the old code unconditionally declared the object a regular file. Now zero-byte objects (other than .pelican_dir_marker) fall through to the list-based check: if children exist the object is a directory, otherwise it is a genuine zero-byte file. In Readdir: the placeholder "dir/" appears in the object list from ListBucket. After filename extraction (rfind + trimslashes), it produces an empty string, which XRootD interprets as end-of-directory and stops iteration before reaching actual files. Fix by filtering placeholder objects from the listing results in ListS3Dir.
Add 7 tests for urlquote/urlunquote covering hex encoding correctness, unreserved character passthrough, special character encoding, decoding, mixed case hex digits, invalid sequence handling, and round-trip consistency. Add 6 tests for ParseListBucketResult covering basic key/prefix extraction, URL-encoded key decoding, continuation token handling, truncation flag behavior, invalid XML error reporting, and wrong root element detection.
There was a problem hiding this comment.
Pull request overview
Fixes S3-backed directory listing correctness in the XRootD S3 OSS plugin when backends return URL-encoded keys/prefixes and/or use zero-byte “directory placeholder” objects.
Changes:
- Fix
urlquote()percent-escape formatting and addurlunquote()for decoding S3encoding-type=urlXML responses. - Refactor S3 ListObjectsV2 XML parsing into
AmazonS3List::ParseListBucketResult()and decode parsed Keys/Prefixes. - Filter zero-byte placeholder objects from
StatandReaddirlisting results to avoid masking children / premature end-of-directory.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| test/s3_tests.cc | Adds unit tests for urlquote/urlunquote and ParseListBucketResult behavior. |
| src/stl_string_utils.hh | Declares new urlunquote() helper. |
| src/stl_string_utils.cc | Fixes urlquote() escaping and implements urlunquote(). |
| src/S3FileSystem.cc | Adjusts Stat() behavior to handle zero-byte directory placeholders and avoid false ENOENT. |
| src/S3Directory.cc | Filters placeholder objects from directory listings to prevent empty-name entries terminating Readdir. |
| src/S3Commands.hh | Exposes AmazonS3List::ParseListBucketResult() for reuse/testing. |
| src/S3Commands.cc | Refactors list XML parsing and applies URL-decoding to parsed Key/Prefix values. |
Non-zero objects whose key ends with '/' should return ENOENT on Stat, not be silently promoted to directories. Restrict the placeholder filter to size == 0 objects, which is the OpenStack Swift convention.
When Stat lists with prefix "foo/" to check for a directory, the list may return the key "foo/" itself alongside any children. Previously only zero-byte self-objects were filtered, so a non-zero trailing-slash key (e.g. a 1024-byte object named "foo/") would survive the filter, be treated as a directory child, and cause Stat to return success instead of ENOENT. Now all self-referencing objects (key == prefix) are removed from the result set before the directory/ENOENT decision. Only zero-byte self-objects set the removedPlaceholder flag, preserving the empty- directory behaviour for OpenStack Swift-style placeholders.
Closed
- S3Directory.cc: update filter comment to accurately describe that any self-key object (not just zero-byte ones) is removed to prevent empty filenames in Readdir - S3FileSystem.cc: update stale "single item" comment to explain why maxKeys=2 is requested (placeholder + one child) - s3_unit_tests.cc: add ZeroByteDirectoryPlaceholder fixture test covering three cases: empty placeholder-only directory, placeholder with a child file, and a genuine zero-byte regular file - s3_unit_tests.cc: add ReadirSkipsPlaceholder fixture test verifying that a zero-byte "dir/" entry does not cause Readdir to terminate early before returning actual children
Member
Author
|
Per the original issue, here's the example XML I get from propfinding the endpoint (note that I spun up test instance where my fed prefix is Otherwise, I've reviewed all the code and the new tests, and this looks good to me so I'm going to merge. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #146
Two bugs prevented correct directory listings against S3-compatible
backends that use zero-byte placeholder objects for directories:
URL-encoded keys not decoded: The ListBucket request uses
encoding-type=urlso S3 percent-encodes Key/Prefix values for XMLsafety. The code never decoded them, so object names appeared raw
(e.g.
testfolder%2Ffile1.txt) in PROPFIND responses. This addsurlunquote()and applies it after XML parsing. Also fixes a pre-existingbug in
urlquote()that emitted decimal rather than hex escapes(
%32instead of%20).Zero-byte placeholder objects broke Stat and Readdir:
Some backends create zero-byte keys with trailing slashes (e.g.
testfolder/)as directory markers. In
Stat, these caused ENOENT because theplaceholder consumed the single-item list probe result. In
Readdir,the placeholder produced an empty filename that XRootD interpreted as
end-of-directory, hiding all children. Fix: filter placeholders from
list results in both code paths, using their presence as proof the
directory exists.
Includes 13 unit tests for
urlquote/urlunquoteand the refactoredParseListBucketResult. Manually verified against the UMass OSN backendfrom the issue.