This issue should discuss the general structure of our Python modules, the functions, and our internal data model. This will be entangled with the discussion and decisions about the implementation details of the indicator correlation functions (see #6), but the general layout of the modules can already be discussed now.
It is not yet clear to me how we logically split up the code from PWSQC and pws-pyqc. Since both have a part that does flagging of faulty periods and a part that does bias correction, it seems logical to also have separate modules for flagging and bias correction. But it is also logical to keep the code from PWSQC and pws-pyqc close together.
## First draft of module structure
```
flagging.py  <-- not sure if this is the correct English term for adding the flags to the PWS time series
├── fz_filter()
├── hi_filter()
├── station_outlier_filter()        <-- maybe needs a more descriptive name that indicates which method is used
├── indicator_correlation_filter()  <-- not sure about this one and the next
├── calc_indicator_correlation()
└── ...

bias_correct.py
├── quantile_mapping()  <-- use what is in pws-pyqc, but not yet sure about details
├── ...                 <-- something from the PWSQC bias correction that is done in the station-outlier filter?
└── ...
```
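To make the discussion of `calc_indicator_correlation()` more concrete, here is a minimal sketch of what such a function could look like. The signature and the `prob` parameter are placeholders, not the actual PWSQC/pws-pyqc implementation; the details are exactly what #6 should decide.

```python
import numpy as np


def calc_indicator_correlation(ts_a, ts_b, prob=0.99):
    """Sketch: indicator correlation of two 1D rainfall time series.

    Values above the `prob` quantile are mapped to 1, all others to 0,
    and the Pearson correlation of the two binary series is returned.
    Signature and threshold definition are placeholders.
    """
    ts_a = np.asarray(ts_a, dtype=float)
    ts_b = np.asarray(ts_b, dtype=float)
    # keep only time steps where both sensors report data
    valid = ~np.isnan(ts_a) & ~np.isnan(ts_b)
    a, b = ts_a[valid], ts_b[valid]
    # binary indicator series: 1 where the value exceeds the quantile threshold
    ind_a = (a > np.quantile(a, prob)).astype(float)
    ind_b = (b > np.quantile(b, prob)).astype(float)
    return np.corrcoef(ind_a, ind_b)[0, 1]
```

Note that this works on plain 1D arrays, which ties into the data-model question below: `indicator_correlation_filter()` could then loop over neighboring sensor pairs and apply this function, whatever container the data lives in.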
Note that we can reference the papers and the original codebase for each function in its docstring. Hence, we do not have to hint at the origin of the methods in their function names.
Also note that finding neighboring sensors shall be done with the implementation from poligrain, see OpenSenseAction/poligrain#15
## Data model
Since we fully embrace xarray and xarray.Dataset in poligrain, it seems logical to also rely on them here. I would, however, first do some experiments once example data and an example workflow are ready. If we can write simple functions that work with 1D time series, we could just pass np.arrays and would have much more generic code. We can still use xarray.Datasets for loading and handling the data, but when passing data to functions we would not have to rely on them and could just use the underlying numpy.arrays. But, let's do some experiments first.
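As a tiny illustration of this idea, a hypothetical flagging helper could operate on a plain 1D array, so callers can pass either a bare np.ndarray or the `.values` of an xarray.DataArray. The function name and the "long run of zeros" criterion below are made up for the example; they are not the actual PWSQC FZ definition.

```python
import numpy as np


def flag_zero_runs(rain, n_zero_steps=10):
    """Sketch: flag time steps inside runs of >= n_zero_steps consecutive zeros.

    Criterion and name are placeholders; the point is only that the
    function works on any 1D array-like input.
    """
    rain = np.asarray(rain, dtype=float)
    is_zero = rain == 0
    flags = np.zeros(rain.shape, dtype=bool)
    run_start = None
    for i, z in enumerate(is_zero):
        if z and run_start is None:
            run_start = i  # a run of zeros begins here
        elif not z and run_start is not None:
            if i - run_start >= n_zero_steps:
                flags[run_start:i] = True  # run was long enough, flag it
            run_start = None
    # handle a run of zeros that extends to the end of the series
    if run_start is not None and len(rain) - run_start >= n_zero_steps:
        flags[run_start:] = True
    return flags
```

With such a signature, `flag_zero_runs(ds.rainfall.isel(id=0).values)` (assuming a Dataset with a `rainfall` variable and an `id` dimension) works just as well as passing a plain array, so the xarray dependency stays in the data-handling layer.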