
Single rs.read function that detects file extension #296

@dennisbrookner

Summary

There are currently a number of file-format-specific read_ functions:

  • read_mtz
  • read_cif
  • read_crystfel
  • read_precognition
  • read_csv
  • read_pickle

These should obviously all continue to exist, for use cases where one and only one file format is an acceptable input. However, there are many cases where any of these inputs are acceptable, and this ends up requiring a bunch of control logic to check what file format the user has supplied. I think a more mature API should have a single rs.read (or perhaps rs.read_data or something) function which would have the call signature

rs.read(filename, fileformat='auto')

When fileformat is left as "auto", the function would check for an expected set of file extensions (which we would want to carefully curate) and if none of those are found, raise an error and tell the user to supply the file format.
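A minimal sketch of the dispatch described above, not a proposed final implementation. The extension table, the `detect_format` helper, and the `readers` argument are all hypothetical names introduced for illustration; `readers` exists only so the sketch runs without importing reciprocalspaceship.

```python
from pathlib import Path

# Assumed extension table, for illustration only; the real list would
# need to be curated as discussed above.
EXTENSION_TO_FORMAT = {
    ".mtz": "mtz",
    ".cif": "cif",
    ".csv": "csv",
    ".pkl": "pickle",
}


def detect_format(filename):
    """Map a filename to a format name, or raise if the extension is unknown."""
    suffix = Path(filename).suffix.lower()
    if suffix not in EXTENSION_TO_FORMAT:
        raise ValueError(
            f"Cannot infer file format from extension {suffix!r}; "
            "please pass fileformat explicitly."
        )
    return EXTENSION_TO_FORMAT[suffix]


def read(filename, fileformat="auto", readers=None):
    """Dispatch to a format-specific reader.

    `readers` maps format names to callables, e.g. {"mtz": rs.read_mtz}.
    It is a hypothetical hook, not part of the proposed signature.
    """
    readers = readers or {}
    if fileformat == "auto":
        fileformat = detect_format(filename)
    if fileformat not in readers:
        raise ValueError(f"Unsupported file format: {fileformat!r}")
    return readers[fileformat](filename)
```

In a real implementation the `readers` mapping would presumably be fixed inside the library and point at the existing `read_` functions, so the public signature would stay `rs.read(filename, fileformat='auto')`.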

Example implementation elsewhere

A good example of this type of API is mdtraj's mdtraj.load, which wraps around a number of format-specific loading functions.

Design choices

  • What file formats are accepted, and what do we call these formats?
  • I believe that both rs.read_crystfel and rs.read_precognition accept .hkl files. Do we pick one or the other to be the default? Require specification? Are there any other similar duplicates? Is this an insurmountable issue for implementing this function?
  • Are there situations where the downstream code needs to know which file format was read? I tend to think not; code may need to check for things like "are there phases?", "is this merged?" etc., but in principle none of those are format-specific. But, in theory, the function could take an optional argument like return_extension and then return a tuple of (rs.DataSet, filetype). That feels overcomplicated to me; a developer can always just choose to keep using the format-specific read functions if needed.
  • Are there any other weird edge cases to consider?
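One way the ambiguous-extension and return_extension questions could play out is sketched below, assuming the "require specification" option for .hkl. The two tables, `resolve_format`, and the `readers` hook are hypothetical, and the .hkl overlap is taken from the bullet above rather than verified against the readers themselves.

```python
from pathlib import Path

# Assumed tables for illustration only.
UNAMBIGUOUS = {".mtz": "mtz", ".cif": "cif"}
AMBIGUOUS = {".hkl": ("crystfel", "precognition")}


def resolve_format(filename, fileformat="auto"):
    """Return a format name, refusing to guess for ambiguous extensions."""
    if fileformat != "auto":
        return fileformat
    suffix = Path(filename).suffix.lower()
    if suffix in AMBIGUOUS:
        options = ", ".join(AMBIGUOUS[suffix])
        raise ValueError(
            f"Extension {suffix!r} is ambiguous; "
            f"pass fileformat as one of: {options}."
        )
    if suffix in UNAMBIGUOUS:
        return UNAMBIGUOUS[suffix]
    raise ValueError(f"Cannot infer file format from extension {suffix!r}.")


def read(filename, fileformat="auto", return_extension=False, readers=None):
    """Read a file, optionally also returning the format that was used.

    `readers` is a hypothetical mapping of format name to reader callable.
    """
    readers = readers or {}
    fileformat = resolve_format(filename, fileformat)
    if fileformat not in readers:
        raise ValueError(f"Unsupported file format: {fileformat!r}")
    dataset = readers[fileformat](filename)
    return (dataset, fileformat) if return_extension else dataset
```

Raising on .hkl keeps the behavior predictable at the cost of an extra argument for those files; picking a default instead would only require moving .hkl into the unambiguous table.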

I'm happy to take a stab at this and make a PR if that's of interest.

Labels: API (Interface Decisions), enhancement (Improvement to existing feature)
