In meteor, we found it very useful to implement a Map class. I actually tried to avoid doing this, but eventually found it necessary to keep the rest of the code clean/sane. Otherwise, you start to pass around a huge number of column labels, and the entire thing becomes a crazy bookkeeping exercise.
We have an implementation that should be considered a functional prototype, not a final version:
https://github.com/rs-station/meteor/blob/main/meteor/rsmap.py
The current implementation has a good interface, but the implementation is sub-optimal. It is basically a subclass of rs.DataSet that restricts the columns, only allowing amplitudes, phases, uncertainties, and H/K/L. This has a big downside in that it can affect inherited pandas.DataFrame or rs.DataSet methods in unexpected ways, making for a less than ideal user and developer experience.
That said, such a class could be generally useful to rs users, so it would be great to push a better version of this "upstream" into rs, and have meteor simply use that implementation.
what is a Map?
We can conceptualize it as a special case of an rs.DataSet, where you flag specific columns as:
- amplitudes
- phases
- uncertainties (optional)
And you can only have ONE of each of these special columns. This opens up a lot of nice logic: converting to real space and back, comparing two map objects (think of a scaling function: scale(map1, map2)), etc. The simple big feature is that by declaring which columns are amplitudes/phases at the time of instantiation, you get around having to keep track of column label names.
what decisions do we face?
First, feedback would be appreciated as to if we want to implement a Map object in rs. I am a yes vote on this. But obviously we need some consensus before proceeding.
If we decide to do it, we need to decide on a general implementation strategy. Here are the possibilities as I see them -- if I didn't have the imagination to come up with a good option, please speak up!
- Subclass
rs.DataSet and simply annotate what columns are amplitudes, phases, uncertainties. Place dtype requirements on these columns. The instance, though, can have other, unrelated columns (this differs from current implementation). This is probably the simplest model to implement.
- Build a new class from the ground up. This is more code, but provides for maximum control, and would allow us to be strict (only keep columns we want).
- Try and clean up the current model. Subclass
rs.DataSet, be strict about what columns are allowed. Aggressively test inherited methods to ensure they work.
These appear in my order of preference. Each model has advantages and disadvantages, though. Feedback wanted.
@JBGreisman @kmdalton @alisiafadini
In
meteor, we found it very useful to implement aMapclass. I actually tried to avoid doing this, but eventually found it necessary to keep the rest of the code clean/sane. Otherwise, you start to pass around a huge number of column labels, and the entire thing becomes a crazy bookkeeping exercise.We have an implementation that should be considered a functional prototype, not a final version:
https://github.com/rs-station/meteor/blob/main/meteor/rsmap.py
The current implementation has a good interface, but the implementation is sub-optimal. It is basically a subclass of
rs.DataSetthat restricts the columns, only allowing amplitudes, phases, uncertainties, and H/K/L. This has a big downside in that it can affect inheritedpandas.DataFrameorrs.DataSetmethods in unexpected ways, making for a less than ideal user and developer experience.That said, such a class could be generally useful to
rsusers, so it would be great to push a better version of this "upstream" intors, and havemeteorsimply use that implementation.what is a
Map?We can conceptualize it as a special case of an
rs.DataSet, where you flag specific columns as:And you can only have ONE of each of these special columns. This opens up a lot of nice logic: converting to real space and back, comparing two map objects (think of a scaling function:
scale(map1, map2)), etc. The simple big feature is that by declaring which columns are amplitudes/phases at the time of instantiation, you get around having to keep track of column label names.what decisions do we face?
First, feedback would be appreciated as to if we want to implement a
Mapobject inrs. I am a yes vote on this. But obviously we need some consensus before proceeding.If we decide to do it, we need to decide on a general implementation strategy. Here are the possibilities as I see them -- if I didn't have the imagination to come up with a good option, please speak up!
rs.DataSetand simply annotate what columns are amplitudes, phases, uncertainties. Place dtype requirements on these columns. The instance, though, can have other, unrelated columns (this differs from current implementation). This is probably the simplest model to implement.rs.DataSet, be strict about what columns are allowed. Aggressively test inherited methods to ensure they work.These appear in my order of preference. Each model has advantages and disadvantages, though. Feedback wanted.
@JBGreisman @kmdalton @alisiafadini