Transforms in rslearn can be used to implement augmentations, to normalize data, to perform specialized data pre-processing, etc.
Transforms are specified via the SplitConfig object. The RslearnDataModule accepts
a default SplitConfig that is shared across all model stages, as well as a train,
val, test, and predict SplitConfig that overrides those default options. See
ModelConfig.md for more details. Note that there is currently no way
to append to the default transforms in a stage-specific transform list, so e.g. if you
have data pre-processing steps in the default SplitConfig but want to add
augmentations for training, you would need to repeat the pre-processing steps when
overriding the transforms in the train SplitConfig.
Transforms are applied after initial data loading and task processing. First, the
DataInputs in the model configuration file are read from the rslearn dataset. The
passthrough inputs are copied to an "input dict". Labels are passed to the Task
object(s), which process them into tensors suitable for training that are placed in the
"target dict".
The transforms are then applied on the per-example input and target dicts, and can make arbitrary adjustments to them.
Many transforms accept selectors that specify which key(s) in the input and/or target
dicts the transform should be applied on. Slashes are used to separate path components
in the selector, where "input/image" accesses input_dict["image"] while
"target/task1/boxes" accesses target_dict["task1"]["boxes"]. If the selector does not
start with either "input/" or "target/", it is assumed to access the input dict, so
"image" also accesses input_dict["image"].
The Concatenate transform concatenates bands across multiple image inputs. It can
also be used to copy an image under one name in the input dict to another name.
Here is a summary of the configuration options:
transforms:
- class_path: rslearn.train.transforms.concatenate.Concatenate
init_args:
# This is a map from selectors to a list of band indices that should be
# taken from that image. The images across the selectors are concatenated
# together. An empty list of band indices can be used to indicate that all
# bands from that image should be retained.
selections:
# Take first three bands from Sentinel-2.
sentinel2: [0, 1, 2]
# And all bands from Sentinel-1.
sentinel1: []
# The selector to write the output. Here it will be saved at
# input_dict["image"].
output_selector: imageSome decoders like FasterRCNN use input_dict["image"] to get the original input
dimensions. However, some encoders expect the input image to be named something else,
e.g. OlmoEarth expects the Sentinel-2 time series input to be called sentinel2_l2a.
Then, Concatenate can be used to copy it:
transforms:
# Normalize the Sentinel-2 images for the OlmoEarth model.
- class_path: rslearn.models.olmoearth_pretrain.norm.OlmoEarthNormalize
init_args:
band_names:
sentinel2_l2a: ["B02", "B03", "B04", "B08", "B05", "B06", "B07", "B8A", "B11", "B12", "B01", "B09"]
# Apply Concatenate to copy sentinel2_l2a -> image for FasterRCNN.
- class_path: rslearn.train.transforms.concatenate.Concatenate
init_args:
selections:
sentinel2_l2a: []
output_selector: imageThe Flip transform randomly flips inputs horizontally and/or vertically. Typically it
only used for training.
Here is a summary of the configuration options:
transforms:
# ...
- class_path: rslearn.train.transforms.flip.Flip
init_args:
# Selectors for images that should be flipped. It defaults to ["image"].
image_selectors: ["image"]
# Selectors for DetectionTask targets that should be flipped. Note that the
# DetectionTask targets are a dict with keys for boxes, labels, etc., but
# the selectors here should correspond to the overall dict. There example
# here would work for single-task training; if using MultiTask, then if the
# detection task name is "detect", you can set box_selectors to
# ["target/detect"] (not ["target/detect/boxes"]).
# The default is an empty list.
box_selectors: ["target/"]Here is an example to flip an input image along with segmentation targets:
transforms:
# ...
- class_path: rslearn.train.transforms.flip.Flip
init_args:
# Similar to DetectionTask, the SegmentationTask targets are a dict with
# the "classes" and "valid" keys. Unlike with box_selectors, we need to
# specify those images directly here. With MultiTask, there would be
# another level in the dict, e.g. "target/segment_task_name/classes".
image_selectors:
- image
- target/classes
- target/validThe Mask transform uses one (mask) image input to mask another (target) image input.
The value of the target image is set to the mask value everywhere where the mask image
is 0.
Here is a summary of the configuration options:
transforms:
- class_path: rslearn.train.transforms.mask.Mask
init_args:
# Selectors specifying which images to mask.
selectors: ["image"]
# The selector for the image to use as the mask.
mask_selector: "mask"
# The mask value. Each image in selectors will be set to this value where
# the mask_selector image is 0.
mask_value: 0The original use case for Mask is for cases where only a portion of the window was
labeled for bounding box detection tasks. For SegmentationTask, the invalid portions of
the label can be set to the NODATA value, but this is not possible for DetectionTask.
The example above could be coupled with a Task and DataInputs like this:
task:
class_path: rslearn.train.tasks.detection.DetectionTask
init_args:
property_name: "category"
classes: ["unknown", "platform", "turbine"]
box_size: 15
inputs:
image:
data_type: "raster"
layers: ["sentinel2"]
bands: ["B04", "B03", "B02", "B05", "B06", "B07", "B08", "B11", "B12"]
passthrough: true
dtype: FLOAT32
mask:
data_type: "raster"
layers: ["mask"]
bands: ["mask"]
passthrough: true
dtype: FLOAT32
is_target: true
targets:
data_type: "vector"
layers: ["label"]
is_target: trueThen, the dataset should contain a mask layer with a raster that is 0 at pixels that
were outside the context used for labeling, and 1 otherwise, and a sentinel2 layer
containing the input image. The image will be masked to 0 at regions not considered
during labeling, so that the model can learn to predict no boxes in those regions.
The Sentinel2SCLToMask transform converts a Sentinel-2 Scene Classification Layer
(SCL) raster into a binary mask image (1 = valid, 0 = masked). This can be used with
Mask to apply SCL-based masking at training time.
transforms:
- class_path: rslearn.train.transforms.sentinel2.Sentinel2SCLToMask
init_args:
scl_selector: "scl"
output_selector: "mask"
# Default: [3, 8, 9, 10] (cloud shadow, medium/high cloud, cirrus)
exclude_scl_values: [3, 8, 9, 10]
- class_path: rslearn.train.transforms.mask.Mask
init_args:
selectors: ["image"]
mask_selector: "mask"
mask_value: 0The Normalize transform implements linear normalization of images.
Here is a summary of the configuration options:
transforms:
- class_path: rslearn.train.transforms.normalize.Normalize
init_args:
# The mean and standard deviation for z-score normalization.
# The image will be normalized as (image - mean) / std.
# Both mean and std can either be one value (to apply on all bands), or a
# list with one value per band.
mean: 10
std: 200
# Optionally clip the result to this range after the linear rescaling. The
# default is null to not perform any clipping.
valid_range: [0, 1]
# The image selectors to apply the normalization on. The default is
# ["image"].
selectors: ["image"]
# Optionally limit the normalization to specific bands. This list specifies
# the bands to normalize. If mean/std are a list, they should have the same
# size as this bands list.
bands: [0, 1, 2]
# If normalizing specific bands in image time series, num_bands should be
# set to the number of bands at each timestep. If num_bands is set, then
# the bands is repeated for each timestep, e.g. if bands=[2] then we apply
# normalization on image[2], image[2+num_bands], etc. num_bands can also be
# set without setting bands if mean and std are lists, in which case those
# means and stds are repeated for each timestep.
num_bands: 9The Pad transform performs deterministic padding or cropping.
Here is a summary of the configuration options:
transforms:
- class_path: rslearn.train.transforms.pad.Pad
init_args:
# The size to pad to, or a min/max of pad sizes. If the image is larger
# than this size, it is cropped instead.
size: 32
# "topleft" (default) to only apply padding on the bottom and right sides,
# or "center" to apply padding equally on all sides.
mode: "topleft"
# The image and box selectors to apply padding on.
image_selectors: ["image"]
box_selectors: []- With topleft mode and an image smaller than
size, padding is added to the bottom and right so that the original image content is in the topleft. - With topleft mode and an image larger than
size, the image is cropped to the topleftsize x sizeportion. - With center mode and an image smaller than
size, padding is added equally on all sides so that the original image content is in the center. - With center mode and an image larger than
size, center cropping is applied to the centersize x sizeportion. - If
sizeis a list like[128, 256], then a random size between 128 and 256 is sampled for each example before applying the operation. This may not be useful currently since we do not support batch transforms yet.
The SelectBands transform selects a subset of bands from an image. Concatenate can
be used for this purpose too (with one selector), but SelectBands supports operating
over image time series whereas Concatenate does not.
Here is a summary of the configuration options:
transforms:
- class_path: rslearn.train.transforms.select_bands.SelectBands
init_args:
# The band indices to select.
band_indices: [0, 1, 2]
# The selector to read the input image, default "image".
input_selector: "image"
# The output selector under which to save the image, default "image".
output_selector: "image"
# Optional number of bands per image, to distinguish between stacked images
# in an image time series. If set, then the band_indices are selected for
# each image in the time series.
num_bands_per_timestep: 9The Sentinel1ToDecibels transform converts Sentinel-1 data from raw intensities
(linear scale) to or from decibels.
Here is a summary of the configuration options:
transforms:
- class_path: rslearn.train.transforms.sentinel1.Sentinel1ToDecibels
init_args:
# The image selectors to apply the conversion on.
selectors: ["image"]
# Set true to convert from decibels to linear scale.
from_decibels: false
# Values less than epsilon are clipped to epsilon when converting to
# decibels.
epsilon: 1e-6