Skip to content

Fix file glob not correctly reading sdfs with prefixed files#106

Open
JoelLucaAdams wants to merge 2 commits intomainfrom
bug-file-glob-failing-with-prefixed-paths
Open

Fix file glob not correctly reading sdfs with prefixed files#106
JoelLucaAdams wants to merge 2 commits intomainfrom
bug-file-glob-failing-with-prefixed-paths

Conversation

@JoelLucaAdams
Copy link
Collaborator

@JoelLucaAdams JoelLucaAdams commented Mar 13, 2026

If you attempt to load files from a folder that have used the output block's file_prefix then you might have some files that look like this:

normal_0000.sdf
normal_0001.sdf
full_0000.sdf

Previously we were only loading the glob if the final component of the path they supplied was *.sdf, now we check it ends with it instead to allow for filtering on file prefixes. Prior to this fix attempting to load a dataset by supplying something along the lines of "normal_*.sdf" would fail

try:
p = Path(path_glob)
paths = list(p.parent.glob(p.name)) if p.name == "*.sdf" else list(p)
paths = list(p.parent.glob(p.name)) if p.name.endswith("*.sdf") else list(p)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be generalised further? Would this not work?

paths = [x for x in p.parent.glob(p.name)]

This still leaves out the possibility of globs like **/*.sdf, if that would ever be useful, but I'm not sure how to generalise to that using Path.glob. You might have to use the plain glob library if you want that, but I don't think it works well with Path.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have generalised it more...and somehow made it longer and do more checks

Comment on lines +125 to +126
if p.suffix.lower() != ".sdf":
raise FileNotFoundError(f"{p.as_posix()} is not an SDF file.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it matter if users are renaming their sdf files? The previous behaviour was to enforce the use of .sdf as a file extension, so happy for you to keep this if you prefer, but I'd be tempted to let people pass whatever filenames they want.

paths = list(p.parent.glob(p.name)) if p.name == "*.sdf" else list(p)
except TypeError:
if p.is_dir():
raise TypeError(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is less an issue of typing and more an issue of passing an incompatible value.

Suggested change
raise TypeError(
raise ValueError(

"""

try:
if isinstance(path_glob, PathLike):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this change. I'd normally trust duck typing over direct type checks. Trying p = Path(path_glob) and catching the exception if it fails is probably safer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants