
3. Data preprocessing

abdelkaderm edited this page Aug 4, 2025 · 5 revisions

Data Preprocessing with the esd Package

The esd package provides flexible tools for data preprocessing: subsetting, aggregating, computing anomalies, regridding, and computing empirical orthogonal functions (EOFs) or performing principal component analysis (PCA). How each function behaves depends on the type of the input object (for example, a station or a field object).


Main Functionalities

1. Subsetting

To extract a subset of data in time and space, use the subset() function:

subset(x, it, is)
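  • it: Time index, for example a range of years such as it = c(1958, 2002)
  • is: Space index, for example a longitude/latitude range such as is = list(lon = c(-40, 40), lat = c(40, 75))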
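To make the idea of time and space indexes concrete, here is a minimal base-R sketch that does not use esd at all. It assumes a toy "field" stored as a plain matrix (rows are time steps, columns are grid points); the variable names are illustrative only, and subset() in esd works on its own object classes rather than on bare matrices.

```r
# Toy monthly "field": rows are time steps, columns are grid points.
years <- rep(1950:2009, each = 12)
lon <- c(-20, 0, 20, 40)
lat <- c(50, 50, 60, 60)
set.seed(1)
x <- matrix(rnorm(length(years) * length(lon)), nrow = length(years))

# Time index: keep the years 1958 to 2002 inclusive,
# analogous in spirit to it = c(1958, 2002).
keep_time <- years >= 1958 & years <= 2002

# Space index: keep grid points inside a longitude/latitude box.
keep_space <- lon >= -10 & lon <= 30 & lat >= 45 & lat <= 55

# Apply both selections at once; drop = FALSE keeps the matrix shape.
y <- x[keep_time, keep_space, drop = FALSE]
dim(y)
```

The point of the sketch is only that "it" selects along the time dimension and "is" selects along the spatial dimension.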

2. Aggregating

To compute aggregates of a dataset in time or space, use the following commands:

Aggregating in Time:

aggregate(x, it, is, ...)

Convenience functions built on aggregate() are also available:

  • as.monthly(): Compute monthly aggregates.
  • as.4seasons(): Compute seasonal aggregates.
  • as.annual(): Compute annual aggregates.
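The following base-R sketch illustrates what temporal aggregation means for a monthly series. It is a conceptual illustration on a toy vector, not the esd implementation; in particular, counting December towards the following year's winter is an assumption made here so that DJF is a contiguous season.

```r
# Toy monthly series, January 2000 to December 2002, with values 1..36
# so the aggregates are easy to check by hand.
dates <- seq(as.Date("2000-01-01"), as.Date("2002-12-01"), by = "month")
x <- seq_along(dates)
yr <- as.integer(format(dates, "%Y"))
mo <- as.integer(format(dates, "%m"))

# Annual means: one value per calendar year.
annual <- tapply(x, yr, mean)

# Seasonal means: label each month with its season.
season <- c("DJF", "DJF", "MAM", "MAM", "MAM", "JJA",
            "JJA", "JJA", "SON", "SON", "SON", "DJF")[mo]

# Assumption: December belongs to the winter of the following year.
season_year <- ifelse(mo == 12, yr + 1L, yr)
seasonal <- tapply(x, list(season_year, season), mean)

annual
seasonal
```

Note that the first and last winters in such a table are built from incomplete seasons (two months and one month here), which is worth keeping in mind when comparing seasonal aggregates at the ends of a record.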

Aggregating in Space:

aggregate.area(x, it, is, ...)
  • it: Time index
  • is: Space index
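As a conceptual illustration of spatial aggregation, the sketch below computes an area-weighted mean of a toy field in base R. It assumes a regular longitude/latitude grid, where cell area scales with the cosine of latitude; whether aggregate.area() uses exactly this weighting is not stated here, so treat the weights as an assumption.

```r
# Toy field on a regular lon-lat grid:
# rows are time steps, columns are grid points.
lon <- seq(-30, 50, by = 10)
lat <- seq(40, 80, by = 10)
grid <- expand.grid(lon = lon, lat = lat)
set.seed(42)
field <- matrix(rnorm(24 * nrow(grid), mean = 5), nrow = 24)

# Assumption: on a regular grid, cell area is proportional to cos(latitude).
w <- cos(grid$lat * pi / 180)
w <- w / sum(w)                      # normalise so the weights sum to one

# Area-weighted mean for each time step.
area_mean <- as.numeric(field %*% w)
length(area_mean)
```

Without the weights, the small high-latitude cells would count as much as the larger cells further south.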

3. Computing Anomalies

To compute the anomaly of an object x, use the anomaly() function:

anomaly(x, ref, ...)
  • ref: Reference period (optional). If not specified, the entire time period of the dataset is used as the default.
  • it and is: Time and space indexes, respectively.
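The role of the reference period can be sketched in base R as follows. This is a conceptual example on a synthetic monthly series, assuming that the anomaly is taken relative to the mean of each calendar month over the reference years; it is not the esd code, and the reference years 1981 to 2000 are chosen only for illustration.

```r
# Synthetic monthly series: a seasonal cycle plus a slow linear trend.
dates <- seq(as.Date("1981-01-01"), as.Date("2010-12-01"), by = "month")
yr <- as.integer(format(dates, "%Y"))
mo <- as.integer(format(dates, "%m"))
x <- 10 * cos(2 * pi * (mo - 7) / 12) + 0.02 * (yr - 1981)

# Reference period (illustrative choice): 1981 to 2000.
in_ref <- yr >= 1981 & yr <= 2000

# Climatology: mean of each calendar month over the reference period.
clim <- as.numeric(tapply(x[in_ref], mo[in_ref], mean))

# Anomaly: remove the climatological value of the matching calendar month.
anom <- x - clim[mo]

range(anom)
```

The seasonal cycle disappears and only the trend remains, centred on zero over the reference period.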

4. Regridding

For gridding, regridding, or transforming grids (e.g., from higher to lower resolution or vice versa), use the regrid() function:

regrid(x, it, is, ...)
  • it and is: Time and space indexes.
  • ...: Additional arguments depending on the type of the input object.

Example:

eraint <- t2m.ERAINT(lon = c(-30, 50), lat = c(40, 80))
merra <- t2m.MERRA(lon = c(-30, 50), lat = c(40, 80))
merra.new <- regrid(merra, is = eraint, verbose = TRUE)

map(merra)
title("Original field")
dev.new()  # open a second graphics device (x11() only works on X11 systems)
map(merra.new)
title("Regridded field")
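For readers who want to see what regridding involves, the base-R sketch below performs bilinear interpolation from a coarse regular grid to a finer one. Bilinear interpolation is an assumption made for illustration; this page does not state which interpolation scheme regrid() applies, and the helper function f is hypothetical.

```r
# Source (coarse) and target (fine) regular grids.
lon_src <- seq(-30, 50, by = 10)
lat_src <- seq(40, 80, by = 10)
lon_new <- seq(-30, 50, by = 5)
lat_new <- seq(40, 80, by = 5)

# Hypothetical test field that is linear in lon and lat, so that
# bilinear interpolation should reproduce it exactly.
f <- function(lon, lat) 2 * lon + 3 * lat
z_src <- outer(lon_src, lat_src, f)

# Step 1: interpolate along longitude for every source latitude.
tmp <- sapply(seq_along(lat_src), function(j) {
  approx(lon_src, z_src[, j], xout = lon_new)$y
})

# Step 2: interpolate along latitude for every target longitude.
z_new <- t(sapply(seq_along(lon_new), function(i) {
  approx(lat_src, tmp[i, ], xout = lat_new)$y
}))

dim(z_new)
```

Interpolating to a finer grid adds no new information; it only makes two datasets comparable point by point.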

5. Empirical Orthogonal Functions (EOF)

To compute empirical orthogonal functions based on a field object, use the following:

EOF(x, it, is, ...)
  • it and is: Time and space indexes.
  • ...: Additional arguments depending on the type of the input object.
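The idea behind an EOF analysis can be sketched in base R with a singular value decomposition. This is a minimal illustration on synthetic data, not the esd implementation: it omits the area weighting that is usually applied to gridded fields, and all variable names are illustrative.

```r
# Synthetic field: one spatial pattern with a random amplitude in time,
# plus weak noise. Rows are time steps, columns are grid points.
set.seed(1)
nt <- 100
ns <- 6
pattern <- c(1, 2, 3, 3, 2, 1)
amp <- rnorm(nt)
X <- outer(amp, pattern) + matrix(rnorm(nt * ns, sd = 0.1), nt, ns)

# Remove the time mean at each grid point.
Xa <- sweep(X, 2, colMeans(X))

# SVD: Xa = U D V'. Columns of V are the spatial patterns (EOFs),
# U D gives the corresponding time series (principal components).
s <- svd(Xa)
eof <- s$v
pc <- s$u %*% diag(s$d)

# Fraction of variance explained by each mode.
var_frac <- s$d^2 / sum(s$d^2)
round(var_frac, 3)
```

Because the synthetic data contain a single dominant pattern, almost all of the variance ends up in the first mode.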

6. Principal Component Analysis (PCA)

To perform principal component analysis on a station object, use the following:

PCA(x, it, is, ...)
  • it and is: Time and space indexes.
  • ...: Additional arguments depending on the type of the input object.
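For a group of station series, the same decomposition can be illustrated with prcomp() from base R's stats package. The sketch below uses three synthetic stations that share a common signal; it is meant only to show the kind of output a PCA produces and makes no claim about how PCA() in esd is implemented.

```r
# Three synthetic station series sharing a common signal plus local noise.
set.seed(2)
nt <- 120
common <- rnorm(nt)
stations <- cbind(A = common + rnorm(nt, sd = 0.3),
                  B = common + rnorm(nt, sd = 0.3),
                  C = common + rnorm(nt, sd = 0.3))

# PCA on the centred (but not rescaled) series.
p <- prcomp(stations, center = TRUE, scale. = FALSE)

# Fraction of variance explained by each component.
explained <- p$sdev^2 / sum(p$sdev^2)
round(explained, 3)

# p$rotation holds the station loadings, p$x the component time series.
dim(p$x)
```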

Example: Comparing ERA40 and NCEP-NCAR 2 m Winter (DJF) Temperature

Step 1: Retrieve ERA40 and NCEP-NCAR Datasets

Retrieve datasets using the retrieve() function:

ncep <- retrieve('~/data/ncep/air.mon.mean.nc', lon = c(-40, 40), lat = c(40, 75))
era40 <- retrieve('~/data/ERA40/era40_t2m.nc', lon = c(-40, 40), lat = c(40, 75))

Step 2: Extract the Common Sub-Period (1958–2002)

ncep <- subset(ncep, it = c(1958, 2002))
era40 <- subset(era40, it = c(1958, 2002))

Step 3: Aggregate to Seasons and Extract Winter (DJF)

Check the data class:

class(era40)
class(ncep)

For NCEP (Monthly Data):

Convert from monthly to four seasons and extract winter (DJF):

NCEP <- as.4seasons(ncep)
DJF <- subset(NCEP, it = 1)

For ERA40 (Daily Data):

Convert from daily to monthly, then to four seasons, and extract winter (DJF):

Era40 <- as.monthly(era40)
ERA40 <- as.4seasons(Era40)
djf <- subset(ERA40, it = 1)

Step 4: Compute Anomalies, Average Spatially, Convert to Station Objects, and Combine

# ERA40: anomaly, spatial average, then convert to a station object
djfs <- as.station(spatial.avg.field(anomaly(djf)))
attr(djfs, "location") <- "ERA40"
# NCEP-NCAR: same processing chain
DJFs <- as.station(spatial.avg.field(anomaly(DJF)))
attr(DJFs, "location") <- "NCEP-NCAR"
# Combine both series into one object for comparison
com <- combine.stations(djfs, DJFs)

Step 5: Plot the Result

plot(com)

https://github.com/metno/esd/blob/master/img/era40vsncep.png
