Summary
There are currently a number of file-format-specific read_ functions:
read_mtz
read_cif
read_crystfel
read_precognition
read_csv
read_pickle
These should obviously all continue to exist, for use cases where one and only one file format is an acceptable input. However, there are many cases where any of these inputs are acceptable, and this ends up requiring a bunch of control logic to check what file format the user has supplied. I think a more mature API should have a single rs.read (or perhaps rs.read_data or something) function which would have the call signature
rs.read(filename, fileformat='auto')
When fileformat is left as "auto", the function would check for an expected set of file extensions (which we would want to carefully curate) and if none of those are found, raise an error and tell the user to supply the file format.
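To make the "auto" behavior concrete, here is a rough sketch of the dispatch I have in mind. The names `EXTENSION_TO_FORMAT`, `KNOWN_FORMATS`, `infer_format`, and the `readers` argument are all hypothetical, and the extension table (including `.pkl` for pickles) is just a placeholder for whatever set we end up curating. `.hkl` is deliberately left out so that it falls through to the error.

```python
from pathlib import Path

# Hypothetical, uncurated table of unambiguous extensions.
EXTENSION_TO_FORMAT = {
    ".mtz": "mtz",
    ".cif": "cif",
    ".csv": "csv",
    ".pkl": "pickle",
}

# Format names a caller may pass explicitly (also hypothetical).
KNOWN_FORMATS = {"mtz", "cif", "crystfel", "precognition", "csv", "pickle"}


def infer_format(filename, fileformat="auto"):
    """Return a format name, guessing from the extension when fileformat='auto'."""
    if fileformat != "auto":
        if fileformat not in KNOWN_FORMATS:
            raise ValueError(f"Unknown fileformat {fileformat!r}")
        return fileformat
    ext = Path(filename).suffix.lower()
    if ext not in EXTENSION_TO_FORMAT:
        raise ValueError(
            f"Cannot infer file format from extension {ext!r}; "
            f"please pass fileformat as one of {sorted(KNOWN_FORMATS)}"
        )
    return EXTENSION_TO_FORMAT[ext]


def read(filename, fileformat="auto", readers=None, **kwargs):
    """Dispatch to a format-specific reader; extra kwargs are passed through."""
    fmt = infer_format(filename, fileformat)
    if readers is None:
        # Imported lazily so the inference logic can be tested on its own.
        import reciprocalspaceship as rs

        readers = {
            "mtz": rs.read_mtz,
            "cif": rs.read_cif,
            "crystfel": rs.read_crystfel,
            "precognition": rs.read_precognition,
            "csv": rs.read_csv,
            "pickle": rs.read_pickle,
        }
    return readers[fmt](filename, **kwargs)
```

Passing `**kwargs` straight through is one way to keep format-specific options reachable without growing the signature, though it does mean a misplaced keyword only fails inside the underlying reader.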
Example implementation elsewhere
A good example of this type of API is mdtraj's mdtraj.load, which wraps around a number of format-specific loading functions.
Design choices
- What file formats are accepted, and what do we call these formats?
- I believe that both rs.read_crystfel and rs.read_precognition accept .hkl files. Do we pick one or the other to be the default? Require specification? Are there any other similar duplicates? Is this an insurmountable issue for implementing this function?
- Are there situations where the downstream code needs to know which file format was read? I tend to think not; code may need to check for things like "are there phases?", "is this merged?" etc., but in principle none of those are format-specific. But, in theory, the function could take an optional argument like return_extension and then return a tuple of (rs.DataSet, filetype). That feels overcomplicated to me; a developer can always just choose to keep using the format-specific load functions if needed.
- Are there any other weird edge cases to consider?
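For the shared-extension question, one option is to refuse to guess. Below is a minimal sketch of that policy, assuming the .hkl overlap described above is real; `EXTENSION_CANDIDATES` and `resolve_format` are hypothetical names, and the table is illustrative rather than complete.

```python
from pathlib import Path

# Hypothetical table: each extension lists every format that might use it.
EXTENSION_CANDIDATES = {
    ".mtz": ("mtz",),
    ".cif": ("cif",),
    ".hkl": ("crystfel", "precognition"),  # assumed overlap, not verified
}


def resolve_format(filename, fileformat="auto"):
    """Pick a format, raising if the extension is unknown or ambiguous."""
    if fileformat != "auto":
        return fileformat  # an explicit choice always wins
    ext = Path(filename).suffix.lower()
    candidates = EXTENSION_CANDIDATES.get(ext, ())
    if len(candidates) == 1:
        return candidates[0]
    if not candidates:
        raise ValueError(
            f"Unrecognized extension {ext!r}; please pass fileformat explicitly"
        )
    raise ValueError(
        f"Extension {ext!r} is ambiguous ({', '.join(candidates)}); "
        "please pass fileformat explicitly"
    )
```

With this policy the ambiguity costs the user one extra keyword argument rather than blocking the feature, and picking a default later would only mean shrinking the tuple for that extension.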
I'm happy to take a stab at this and make a PR if that's of interest.