Skip to content

S3 listing bugfix#151

Merged
jhiemstrawisc merged 9 commits into
PelicanPlatform:mainfrom
jhiemstrawisc:issue-146
Apr 3, 2026
Merged

S3 listing bugfix#151
jhiemstrawisc merged 9 commits into
PelicanPlatform:mainfrom
jhiemstrawisc:issue-146

Conversation

@jhiemstrawisc
Copy link
Copy Markdown
Member

Fixes #146

Two bugs prevented correct directory listings against S3-compatible
backends that use zero-byte placeholder objects for directories:

URL-encoded keys not decoded: The ListBucket request uses
encoding-type=url so S3 percent-encodes Key/Prefix values for XML
safety. The code never decoded them, so object names appeared raw
(e.g. testfolder%2Ffile1.txt) in PROPFIND responses. This adds
urlunquote() and applies it after XML parsing. Also fixes a pre-existing
bug in urlquote() that emitted decimal rather than hex escapes
(%32 instead of %20).

Zero-byte placeholder objects broke Stat and Readdir:
Some backends create zero-byte keys with trailing slashes (e.g. testfolder/)
as directory markers. In Stat, these caused ENOENT because the
placeholder consumed the single-item list probe result. In Readdir,
the placeholder produced an empty filename that XRootD interpreted as
end-of-directory, hiding all children. Fix: filter placeholders from
list results in both code paths, using their presence as proof the
directory exists.

Includes 13 unit tests for urlquote/urlunquote and the refactored
ParseListBucketResult. Manually verified against the UMass OSN backend
from the issue.

urlquote() was using std::to_string(val) to produce the percent-encoded
character value, which emits decimal (e.g. space became "%32" instead
of "%20"). This is incorrect per RFC 3986, which specifies uppercase
hex digits.

Switch to snprintf with "%%%02X" to produce correct percent-encoded
output.
When encoding-type=url is set in the ListBucket request (which it is
by default), S3 returns percent-encoded Key and Prefix values in its
XML response. The code was not decoding these, so object names
containing special characters (or even path separators) appeared with
raw percent-encoding in directory listings and PROPFIND responses.

Add a urlunquote() function to reverse percent-encoding, and apply it
to Key and Prefix values immediately after XML parsing.

Refactor Results() to delegate to a static ParseListBucketResult()
method, enabling unit testing of the XML parsing logic without
requiring a live S3 connection.

Fixes: PelicanPlatform#146
Some S3-compatible backends such as OpenStack Swift create zero-byte
objects with trailing slashes (e.g. "dir/") to represent directories.
These placeholder objects caused two problems:

In Stat: a HEAD on "dir" returns 404 (the actual key is "dir/"), so
the code falls through to a list-based directory check. But listing
with prefix="dir/" and max-keys=1 returns only the placeholder itself,
which consumed the single result slot hiding real children, and matched
the old foundObj check which returned ENOENT. Fix by bumping max-keys
to 2, filtering out placeholders via remove_if, and recognizing a
removed placeholder as proof the directory exists (even if empty).

When HEAD returns 200 with size 0, the old code unconditionally
declared the object a regular file. Now zero-byte objects (other than
.pelican_dir_marker) fall through to the list-based check: if children
exist the object is a directory, otherwise it is a genuine zero-byte
file.

In Readdir: the placeholder "dir/" appears in the object list from
ListBucket. After filename extraction (rfind + trimslashes), it
produces an empty string, which XRootD interprets as end-of-directory
and stops iteration before reaching actual files. Fix by filtering
placeholder objects from the listing results in ListS3Dir.
Add 7 tests for urlquote/urlunquote covering hex encoding correctness,
unreserved character passthrough, special character encoding, decoding,
mixed case hex digits, invalid sequence handling, and round-trip
consistency.

Add 6 tests for ParseListBucketResult covering basic key/prefix
extraction, URL-encoded key decoding, continuation token handling,
truncation flag behavior, invalid XML error reporting, and wrong root
element detection.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes S3-backed directory listing correctness in the XRootD S3 OSS plugin when backends return URL-encoded keys/prefixes and/or use zero-byte “directory placeholder” objects.

Changes:

  • Fix urlquote() percent-escape formatting and add urlunquote() for decoding S3 encoding-type=url XML responses.
  • Refactor S3 ListObjectsV2 XML parsing into AmazonS3List::ParseListBucketResult() and decode parsed Keys/Prefixes.
  • Filter zero-byte placeholder objects from Stat and Readdir listing results to avoid masking children / premature end-of-directory.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
test/s3_tests.cc Adds unit tests for urlquote/urlunquote and ParseListBucketResult behavior.
src/stl_string_utils.hh Declares new urlunquote() helper.
src/stl_string_utils.cc Fixes urlquote() escaping and implements urlunquote().
src/S3FileSystem.cc Adjusts Stat() behavior to handle zero-byte directory placeholders and avoid false ENOENT.
src/S3Directory.cc Filters placeholder objects from directory listings to prevent empty-name entries terminating Readdir.
src/S3Commands.hh Exposes AmazonS3List::ParseListBucketResult() for reuse/testing.
src/S3Commands.cc Refactors list XML parsing and applies URL-decoding to parsed Key/Prefix values.

Comment thread src/S3Commands.cc
Comment thread src/S3FileSystem.cc Outdated
Comment thread test/s3_tests.cc
Non-zero objects whose key ends with '/' should return ENOENT on Stat,
not be silently promoted to directories. Restrict the placeholder filter
to size == 0 objects, which is the OpenStack Swift convention.
When Stat lists with prefix "foo/" to check for a directory, the list
may return the key "foo/" itself alongside any children.  Previously
only zero-byte self-objects were filtered, so a non-zero trailing-slash
key (e.g. a 1024-byte object named "foo/") would survive the filter,
be treated as a directory child, and cause Stat to return success
instead of ENOENT.

Now all self-referencing objects (key == prefix) are removed from the
result set before the directory/ENOENT decision.  Only zero-byte
self-objects set the removedPlaceholder flag, preserving the empty-
directory behaviour for OpenStack Swift-style placeholders.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Comment thread src/S3FileSystem.cc
Comment thread src/S3Directory.cc Outdated
Comment thread src/S3FileSystem.cc
Comment thread src/S3Directory.cc Outdated
Comment thread src/S3FileSystem.cc Outdated
- S3Directory.cc: update filter comment to accurately describe that
  any self-key object (not just zero-byte ones) is removed to prevent
  empty filenames in Readdir

- S3FileSystem.cc: update stale "single item" comment to explain why
  maxKeys=2 is requested (placeholder + one child)

- s3_unit_tests.cc: add ZeroByteDirectoryPlaceholder fixture test
  covering three cases: empty placeholder-only directory, placeholder
  with a child file, and a genuine zero-byte regular file

- s3_unit_tests.cc: add ReadirSkipsPlaceholder fixture test verifying
  that a zero-byte "dir/" entry does not cause Readdir to terminate
  early before returning actual children
@jhiemstrawisc
Copy link
Copy Markdown
Member Author

Per the original issue, here's the example XML I get from propfinding the endpoint (note that I spun up test instance where my fed prefix is /my-prefix, so this doesn't match exactly:

$ curl -k https://`hostname`:8443/my-prefix/testfolder -X PROPFIND -H "Depth: 1"
<?xml version="1.0" encoding="utf-8"?>
<D:multistatus xmlns:D="DAV:" xmlns:ns1="http://apache.org/dav/props/" xmlns:ns0="DAV:">
<D:response xmlns:lp1="DAV:" xmlns:lp2="http://apache.org/dav/props/" xmlns:lp3="LCGDM:">
<D:href>/my-prefix/testfolder</D:href>
<D:propstat>
<D:prop>
<lp1:getcontentlength>4096</lp1:getcontentlength>
<lp1:getlastmodified>Thu, 01 Jan 1970 00:00:00 GMT</lp1:getlastmodified>
<lp1:resourcetype><D:collection/></lp1:resourcetype>
<lp1:iscollection>1</lp1:iscollection>
<lp1:executable>F</lp1:executable>
</D:prop>
<D:status>HTTP/1.1 200 OK</D:status>
</D:propstat>
</D:response>
<D:response xmlns:lp1="DAV:" xmlns:lp2="http://apache.org/dav/props/" xmlns:lp3="LCGDM:">
<D:href>/my-prefix/testfolder/file1.txt</D:href>
<D:propstat>
<D:prop>
<lp1:getcontentlength>0</lp1:getcontentlength>
<lp1:getlastmodified>Thu, 01 Jan 1970 00:00:00 GMT</lp1:getlastmodified>
<lp1:resourcetype/>
<lp1:iscollection>0</lp1:iscollection>
<lp1:executable>F</lp1:executable>
</D:prop>
<D:status>HTTP/1.1 200 OK</D:status>
</D:propstat>
</D:response>
<D:response xmlns:lp1="DAV:" xmlns:lp2="http://apache.org/dav/props/" xmlns:lp3="LCGDM:">
<D:href>/my-prefix/testfolder/file2.txt</D:href>
<D:propstat>
<D:prop>
<lp1:getcontentlength>0</lp1:getcontentlength>
<lp1:getlastmodified>Thu, 01 Jan 1970 00:00:00 GMT</lp1:getlastmodified>
<lp1:resourcetype/>
<lp1:iscollection>0</lp1:iscollection>
<lp1:executable>F</lp1:executable>
</D:prop>
<D:status>HTTP/1.1 200 OK</D:status>
</D:propstat>
</D:response>
<D:response xmlns:lp1="DAV:" xmlns:lp2="http://apache.org/dav/props/" xmlns:lp3="LCGDM:">
<D:href>/my-prefix/testfolder/testfolder</D:href>
<D:propstat>
<D:prop>
<lp1:getcontentlength>4096</lp1:getcontentlength>
<lp1:getlastmodified>Thu, 01 Jan 1970 00:00:00 GMT</lp1:getlastmodified>
<lp1:resourcetype><D:collection/></lp1:resourcetype>
<lp1:iscollection>1</lp1:iscollection>
<lp1:executable>F</lp1:executable>
</D:prop>
<D:status>HTTP/1.1 200 OK</D:status>
</D:propstat>
</D:response>
</D:multistatus>

Otherwise, I've reviewed all the code and the new tests, and this looks good to me so I'm going to merge.

@jhiemstrawisc jhiemstrawisc merged commit cacc723 into PelicanPlatform:main Apr 3, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PROPFIND broken on S3 backed Origin

2 participants