Fix file glob not correctly reading sdfs with prefixed files#106
Fix file glob not correctly reading sdfs with prefixed files#106JoelLucaAdams wants to merge 2 commits intomainfrom
Conversation
src/sdf_xarray/__init__.py
Outdated
| try: | ||
| p = Path(path_glob) | ||
| paths = list(p.parent.glob(p.name)) if p.name == "*.sdf" else list(p) | ||
| paths = list(p.parent.glob(p.name)) if p.name.endswith("*.sdf") else list(p) |
There was a problem hiding this comment.
Could this be generalised further? Would this not work?
paths = [x for x in p.parent.glob(p.name)]This still leaves out the possibility of globs like **/*.sdf, if that would ever be useful, but I'm not sure how to generalise to that using Path.glob. You might have to use the plain glob library if you want that, but I don't think it works well with Path.
There was a problem hiding this comment.
I have generalised it more...and somehow made it longer and do more checks
| if p.suffix.lower() != ".sdf": | ||
| raise FileNotFoundError(f"{p.as_posix()} is not an SDF file.") |
There was a problem hiding this comment.
Should it matter if users are renaming their sdf files? The previous behaviour was to enforce the use of .sdf as a file extension, so happy for you to keep this if you prefer, but I'd be tempted to let people pass whatever filenames they want.
| paths = list(p.parent.glob(p.name)) if p.name == "*.sdf" else list(p) | ||
| except TypeError: | ||
| if p.is_dir(): | ||
| raise TypeError( |
There was a problem hiding this comment.
This is less an issue of typing and more an issue of passing an incompatible value.
| raise TypeError( | |
| raise ValueError( |
| """ | ||
|
|
||
| try: | ||
| if isinstance(path_glob, PathLike): |
There was a problem hiding this comment.
Not sure about this change. I'd normally trust duck typing over direct type checks. Trying p = Path(path_glob) and catching the exception if it fails is probably safer.
If you attempt to load files from a folder that have used the
outputblock'sfile_prefixthen you might have some files that look like this:Previously we were only loading the glob if the final component of the path they supplied was
*.sdf, now we check it ends with it instead to allow for filtering on file prefixes. Prior to this fix attempting to load a dataset by supplying something along the lines of "normal_*.sdf" would fail