
2. Data handling

abdelkaderm edited this page Aug 4, 2025 · 52 revisions

A fundamental principle of the esd package is the use of "smart data", where data is stored together with its metadata. The metadata is attached to the data as attributes, so it "tags" the data without interfering with operations on the values. This keeps the associated metadata available for reference at every step, making data processing and analysis more efficient and traceable.
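
As a rough illustration of the idea (plain base R, not the esd implementation; the attribute names and values below are made up for this example), metadata stored as attributes travels with the values through simple arithmetic:

```r
# Hypothetical toy series: five daily temperatures with metadata attached
t2m <- c(1.2, 0.8, -0.5, 2.1, 3.0)
attr(t2m, "variable") <- "t2m"
attr(t2m, "unit")     <- "degC"
attr(t2m, "location") <- "Oslo-Blindern"

# Element-wise arithmetic keeps the attributes of the input vector
anomaly <- t2m - mean(t2m)

attr(anomaly, "unit")       # the unit tag is still attached
names(attributes(anomaly))  # all three tags survive the subtraction
```

Base R drops custom attributes when a plain vector is subset with `[`, so this sketch only conveys the concept of tagging data with metadata.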

The functions in the esd package are specifically designed to take full advantage of this metadata, simplifying tasks such as data selection, processing, and analysis. Additionally, esd aligns with the FAIR principles (Findable, Accessible, Interoperable, and Reusable) to ensure transparency and trustworthiness in climate data.


Commonly Used Functions in esd

Here are some of the most commonly used functions in the esd package for working with climate data:

| Function | Description |
| --- | --- |
| select.station | Find or select one (or several) specific station(s) from existing metadata. |
| station | Retrieve data for a weather station from a specific dataset. |
| retrieve | Retrieve field data from a NetCDF file. |

Example Datasets in esd

This section provides examples of how to create or retrieve freely available global and national datasets using the esd package. The package includes two global datasets, the MET Norway archive, and data from Nordic programs. Below are the primary datasets supported by esd:

1. MET Norway Archive (KDVH)

  • Contains daily and monthly time step data.
  • Datasets are referred to as:
    • METNOD: daily data.
    • METNOM: monthly data.
  • Note: These datasets are accessible only within the MET Norway firewall.

2. Global Historical Climatology Network (GHCN)

  • Provides historical climate data on daily and monthly time steps.
  • Datasets are referred to as:
    • GHCND: daily data.
    • GHCNM: monthly data.

3. European Climate Assessment & Dataset (ECAD)

4. Nordic Monthly Datasets

  • Includes datasets from the following Nordic programs:
    • NACD: North Atlantic Climatological Dataset.
    • NARP: Nordic Arctic Research Program.

The sections below describe how to handle weather station data and perform more advanced data operations using esd:


Select Weather Stations from Existing Metadata

To select weather stations from one or more datasets mentioned previously, use the following command:

> ss <- select.station()

The empty parentheses () can include arguments for any of the search criteria listed below. These arguments allow you to filter weather stations based on location, altitude, metadata, or other attributes.

| Search Argument | Description |
| --- | --- |
| loc | Select stations by location name(s). |
| lon | Select stations by longitude. |
| lat | Select stations by latitude. |
| alt | Select stations by altitude: positive values select stations above the altitude; negative values select stations below it. |
| param | Select stations by recorded parameter or variable identifier (e.g., temperature, precipitation). |
| src | Limit the search to a specific data source (e.g., "NARP", "NACD", "NORDKLIMA", "GHCNM", "METNOM", etc.). |
| stid | Select stations by the identifier of the weather/climate station. |
| cntr | Select stations by country name. |
| it | Select stations for specific dates or a range of dates: an integer in [1:12] for months; a 4-digit integer for years (e.g., 2014); a vector of dates (e.g., "2014-01-01"). |
| nmin | Select stations with at least nmin years, months, or days of data (e.g., 30 years). |
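
The sign convention for alt and the range form of lon and lat can be sketched on a small metadata table. This is an illustrative base R filter, not the code that select.station runs: the helper name `filter_stations` is hypothetical, the strict inequalities for altitude are an assumption, and apart from Oslo-Blindern (whose coordinates appear further down this page) the stations are invented.

```r
# Toy station metadata (rows other than Oslo-Blindern are invented)
meta <- data.frame(
  loc = c("Oslo-Blindern", "Hypothetical-Fjell", "Hypothetical-Kyst"),
  lon = c(10.7, 8.2, 5.3),
  lat = c(59.9, 61.5, 60.4),
  alt = c(94, 1200, 12),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking the documented conventions:
#   lon, lat : c(min, max) ranges
#   alt      : positive = stations above alt, negative = stations below |alt|
filter_stations <- function(meta, lon = NULL, lat = NULL, alt = NULL) {
  keep <- rep(TRUE, nrow(meta))
  if (!is.null(lon)) keep <- keep & meta$lon >= min(lon) & meta$lon <= max(lon)
  if (!is.null(lat)) keep <- keep & meta$lat >= min(lat) & meta$lat <= max(lat)
  if (!is.null(alt)) {
    keep <- keep & (if (alt >= 0) meta$alt > alt else meta$alt < abs(alt))
  }
  meta[keep, ]
}

high <- filter_stations(meta, alt = 500)     # stations above 500 m
low  <- filter_stations(meta, alt = -100)    # stations below 100 m
box  <- filter_stations(meta, lon = c(5, 9), lat = c(60, 62))
```

Combining several criteria simply narrows the same logical mask, which is the behaviour the notes below describe for select.station.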

Notes:

  • The select.station function allows you to combine multiple arguments to refine your search.

  • For example, to select stations in Norway located above 500 m altitude, you can use:

    > ss <- select.station(cntr = "Norway", alt = 500)

  • The metadata-driven search ensures you can quickly locate the exact station data you need for analysis.

  • The following map shows all available weather stations recording 2 m temperature (t2m) within a spatial domain covering Scandinavia. It is obtained by typing:

> ss <- select.station(param='t2m',lon=c(-15,45),lat=c(55,80))
> map(ss,cex=.5,col="darkred",bg="red")

[Figure: map of the selected stations recording 2 m temperature]

  • The following map shows all available weather stations recording precipitation within a spatial domain covering Scandinavia. It is obtained by typing:

> ss <- select.station(param='precip',lon=c(-15,45),lat=c(55,80))
> map(ss,cex=.5,col="darkgreen",bg="green")

[Figure: map of the selected stations recording precipitation]
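
For the time-related search criteria (it and nmin), one way to picture the documented conventions is the base R sketch below. It works on a plain vector of dates; `select_it` and `n_years` are hypothetical helpers written for this illustration and are not part of esd. Treating a vector of dates as a time span is an assumption.

```r
# Synthetic daily dates covering 2012 to 2015
dates <- seq(as.Date("2012-01-01"), as.Date("2015-12-31"), by = "day")

# Hypothetical helper: interpret `it` as months (1-12), 4-digit years, or dates
select_it <- function(dates, it) {
  if (is.character(it) || inherits(it, "Date")) {
    rng <- range(as.Date(it))                      # date vector -> time span
    return(dates[dates >= rng[1] & dates <= rng[2]])
  }
  if (all(it %in% 1:12)) {
    dates[as.integer(format(dates, "%m")) %in% it] # months
  } else {
    dates[as.integer(format(dates, "%Y")) %in% it] # years
  }
}

winter <- select_it(dates, c(12, 1, 2))            # Dec, Jan, Feb of every year
y2014  <- select_it(dates, 2014)                   # one calendar year
span   <- select_it(dates, c("2014-01-01", "2014-01-31"))

# Hypothetical helper for nmin: number of distinct years with data
n_years <- function(dates) length(unique(format(dates, "%Y")))
n_years(dates) >= 3   # TRUE here, so this series would pass nmin = 3 (years)
```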

Retrieve Data from Predefined Datasets

Example: METNO

To retrieve the daily mean temperature for the "Oslo" station (18700) from the MET Norway archive (accessible only within the MET Norway firewall), use the following commands:

For Daily Data:

# Retrieve daily data
t2m.dly <- station(stid = '18700', param = 't2m', src = 'metnod')

This is equivalent to:

t2m.dly <- station.metnod(stid = '18700', param = 't2m')

For Monthly Data:

# Retrieve monthly data
t2m.mon <- station.metnom(stid = "18700", param = "t2m")

Aggregating and Plotting Annual Mean Data:

You can aggregate the daily data into annual mean values and plot the result:

# Aggregate to annual mean and plot
t2m.ann <- as.annual(t2m.dly, FUN = "mean")
plot(t2m.ann, ylim = c(2, 10))
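
Conceptually, this kind of aggregation groups the daily values by calendar year and applies FUN to each group. The base R sketch below does this on synthetic data; it is not how as.annual is implemented and it ignores details such as incomplete years.

```r
# Synthetic daily series: a seasonal cycle around 6 degC for 2001 to 2003
dates <- seq(as.Date("2001-01-01"), as.Date("2003-12-31"), by = "day")
doy   <- as.integer(format(dates, "%j"))
t2m   <- 6 - 10 * cos(2 * pi * doy / 365)

# Group by calendar year and take the mean of each group
year    <- format(dates, "%Y")
t2m_ann <- tapply(t2m, year, mean)
t2m_ann
```

The result has one value per year, named by the year, which is the shape of series that is then plotted and fitted with a trend.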

Adding a Linear Trend:

To add a linear trend line to the plot:

# Add a linear trend to the plot
lines(trend(t2m.ann), col = "red", lwd = 2)

Annual Mean Temperature for Oslo
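
A linear trend of this kind can be estimated with an ordinary least-squares fit of the annual values against the year. The sketch below uses base R's lm() on synthetic data. Whether trend() works the same way internally is an assumption, so treat this as an illustration of the concept only.

```r
# Synthetic annual means: 0.02 degC/year warming plus a small deterministic wiggle
year <- 1951:2020
t2m  <- 5 + 0.02 * (year - 1951) + 0.3 * sin(year)

# Least-squares fit of temperature against year
fit     <- lm(t2m ~ year)
slope   <- unname(coef(fit)["year"])   # degC per year
fit_val <- fitted(fit)                 # values of the fitted trend line

slope * 10                             # trend expressed in degC per decade
```

The fitted values could be overlaid on an existing plot with `lines(year, fit_val, col = "red", lwd = 2)`, in the same way as the trend line above.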


Create a Station Object from Scratch (Advanced Users)

Advanced users can create a station object from scratch using the as.station() function. Here's an example:

# Create a station object
s <- as.station(x = data, stid = "18700", loc = "Oslo-Blindern", lon = 10.7, lat = 59.9, alt = 94,
                param = c("t2m", "precip"))

Here, data is a data.frame containing ordered values recorded at the "Oslo-Blindern" weather station.

Example 1: Creating Monthly Data

# Generate synthetic monthly data
data <- round(matrix(rnorm(20 * 12), 20, 12), 2)
colnames(data) <- month.abb
x <- data.frame(year = 1981:2000, data)
X <- as.station(x, loc = "", param = "noise", unit = "none")
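
The input here is a wide table with one row per year and one column per month. As a way of picturing what that layout represents (a base R sketch, not the conversion code used by as.station), the same values can be unrolled into a single monthly series in chronological order:

```r
set.seed(1)
data <- round(matrix(rnorm(20 * 12), 20, 12), 2)
colnames(data) <- month.abb
x <- data.frame(year = 1981:2000, data)

# Transpose so that months vary fastest, then flatten year by year
values <- as.vector(t(as.matrix(x[, month.abb])))
dates  <- seq(as.Date("1981-01-01"), by = "month", length.out = length(values))

monthly <- data.frame(date = dates, value = values)
head(monthly, 3)
```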

Example 2: Creating Daily Data or Indexed Data from a Text File

# Read data from a text file
x <- read.table(file.name, header = TRUE, skip = 20, sep = ",")

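# Create a zoo object from the 'data' column, indexed by the 'date' column
# (as.Date() assumes ISO-formatted dates, YYYY-MM-DD; adapt for other index types)
z <- zoo(x$data, order.by = as.Date(x$date)) # Check ?zoo for additional info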

# Create a station object from the zoo object
# (stid, lon, lat, ... are placeholders for your own metadata values)
y <- as.station(z, stid, lon, lat, alt, param, calendar, quality, cntr, loc, src, url, unit, longname, reference, info)
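
To make the file-reading step concrete without relying on a real data file, the sketch below writes a small temporary file with 20 header lines followed by comma-separated date and data columns (an assumed layout that matches skip = 20 and sep = "," above), reads it back, and converts the dates. Only base R is used; the result is the kind of ordered date/value pair that can then be passed on to zoo() and as.station().

```r
# Write a hypothetical file: 20 free-text header lines, then a CSV table
file.name <- tempfile(fileext = ".txt")
header    <- paste("# header line", 1:20)
dates     <- seq(as.Date("2020-01-01"), by = "day", length.out = 5)
body      <- c("date,data", paste(dates, c(1.5, 2.0, -0.3, 0.7, 1.1), sep = ","))
writeLines(c(header, body), file.name)

# Skip the header block and read the table
x <- read.table(file.name, header = TRUE, skip = 20, sep = ",",
                stringsAsFactors = FALSE)

# Convert the text dates to Date objects and make sure the rows are ordered
x$date <- as.Date(x$date)
x      <- x[order(x$date), ]
str(x)
```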

Retrieve Field Data from NetCDF Files (e.g., Reanalysis, GCMs, RCMs)

The retrieve() function reads data from a NetCDF file and returns a zoo field object with attributes. It supports both regular and irregular (rotated) longitude-latitude grids.

Arguments for retrieve():

| Argument | Description |
| --- | --- |
| ncfile | Full path of the NetCDF file or an object of class ncdf or ncdf4. |
| lon | Select a specific grid cell or a subregion by longitude. |
| lat | Select a specific grid cell or a subregion by latitude. |
| lev | Select a specific vertical level. |
| time | Select a specific date or time span. |
| param | Climate parameter or variable (e.g., tas: surface temperature). |
| plot | Plot the retrieved object if set to TRUE. |
| greenwich | Convert longitudes to -180°E/180°E or center maps on the Greenwich meridian. |
| verbose | Display extra information on progress if set to TRUE. |
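
Two of these arguments are easy to picture on a toy grid: greenwich concerns the longitude convention (0 to 360 versus -180 to 180), and lon and lat pick out a subregion. The base R sketch below shows one way to do both on a small matrix. It illustrates the idea only and is not taken from the retrieve() source; the grid and its values are arbitrary.

```r
# Toy field on a 0..350 degE by -80..80 degN grid (values are arbitrary)
lon   <- seq(0, 350, by = 10)
lat   <- seq(-80, 80, by = 10)
field <- outer(lon, lat, function(x, y) x + y / 100)

# Convert longitudes to the -180..180 convention and reorder the rows to match
lon180 <- ((lon + 180) %% 360) - 180
ord    <- order(lon180)
lon180 <- lon180[ord]
field  <- field[ord, ]

# Select a subregion, here 15W to 45E and 55N to 80N
ilon <- which(lon180 >= -15 & lon180 <= 45)
ilat <- which(lat >= 55 & lat <= 80)
sub  <- field[ilon, ilat, drop = FALSE]
dim(sub)
```

Reordering the data together with the coordinates is the important step: shifting the longitude labels alone would leave the field misaligned with its grid.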

Examples of Retrieving Data from NetCDF Files

Example 1: Retrieve ERA40 Reanalysis

To retrieve reanalysis 2 m air temperature (tas) from the Climate Explorer website, follow these steps. Note that the URL below points to the NCEP/NCAR monthly mean file (air.2m.mon.mean.nc):

  1. Download the data and store it locally in a destination file:

    # mode = "wb" writes the NetCDF file as binary (required on Windows)
    download.file(url = "http://climexp.knmi.nl/NCEPNCAR40/air.2m.mon.mean.nc", 
                  destfile = "/tmp/air.2m.mon.mean.nc",
                  method = "auto", quiet = FALSE, mode = "wb", cacheOK = TRUE)

  2. Read the data into an object named eraint and plot the result:

    eraint <- retrieve('/tmp/air.2m.mon.mean.nc', plot = TRUE)

Here is an example of the resulting plot:

ERA40 Reanalysis Surface Temperature
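
Because reanalysis and model files can be large, it can help to skip the download when the file is already on disk. The helper below is a hypothetical convenience wrapper (`fetch_once` is not an esd function); it uses mode = "wb" so that the binary NetCDF file is not altered on Windows.

```r
# Hypothetical helper: download a file only if it is not already present
fetch_once <- function(url, destfile) {
  if (file.exists(destfile)) {
    message("Using existing file: ", destfile)
    return(invisible(destfile))
  }
  download.file(url = url, destfile = destfile, mode = "wb")
  invisible(destfile)
}

# Demonstration without network access: the target already exists,
# so no download is attempted (the URL is a placeholder)
existing <- tempfile(fileext = ".nc")
file.create(existing)
path <- fetch_once("http://example.invalid/file.nc", existing)
```

With the URL and destination file from step 1, the returned path could be passed straight to retrieve().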


Example 2: CMIP3/5 RCP Scenarios

To retrieve the surface air temperature (tas) for the RCP4.5 scenario simulated by the NorESM1-ME model from the Climate Explorer website, follow these steps:

  1. Download the data and store it locally in a destination file:

    # mode = "wb" writes the NetCDF file as binary (required on Windows)
    download.file(url = "http://climexp.knmi.nl/CMIP5/monthly/tas/tas_Amon_NorESM1-ME_rcp45_000.nc", 
                  destfile = "/tmp/tas_Amon_NorESM1-ME_rcp45_000.nc",
                  method = "auto", quiet = FALSE, mode = "wb", cacheOK = TRUE)

  2. Read the data into an object named gcm:

    gcm <- retrieve(ncfile = "/tmp/tas_Amon_NorESM1-ME_rcp45_000.nc", param = "tas", plot = TRUE)

  3. Map the results using the following command:

    map(gcm, projection = "lonlat")

Here is an example of the resulting map:

CMIP5 RCP 4.5 Surface Temperature Map

Summary

The esd package provides powerful tools for handling climate data by integrating metadata directly into the data structure. This approach simplifies data retrieval, processing, and analysis while adhering to the FAIR principles for trustworthy and reusable climate data.

Whether you are working with data from the MET Norway archive, GHCN, ECAD, the Nordic programs, reanalyses, or CMIP3/5/6 model output, esd makes it easier to manage and analyze climate data.
