IMCDataAnalysis/02-processing.Rmd at main · NicolasSompairac/IMCDataAnalysis · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
# Multi-channel image processing {#processing}

This book describes common analysis steps of spatially-resolved single-cell data
**after** image segmentation and feature extraction. The following sections
describe the processing of IMC raw data, including file type conversion, image
segmentation, feature extraction and data export. To obtain more detailed
information on the individual image processing approaches, please visit their
repositories:

[IMC segmentation
pipeline](https://github.com/BodenmillerGroup/ImcSegmentationPipeline): Raw IMC
data pre-processing is performed using the
[imctools](https://github.com/BodenmillerGroup/imctools) Python package to
convert raw MCD files into OME-TIFF and TIFF files. After image
cropping, an [ilastik](https://www.ilastik.org/) pixel classifier is trained for
image classification prior to image segmentation using
[CellProfiler](https://cellprofiler.org/). Features (e.g. mean pixel intensity)
of segmented objects (e.g. cells) are quantified and exported. Read more in the
[Docs](https://github.com/BodenmillerGroup/ImcSegmentationPipeline/blob/main/scripts/imc_preprocessing.ipynb).

[steinbock](https://github.com/BodenmillerGroup/steinbock): The `steinbock`
framework offers tools for multi-channel image processing using the command-line
or Python code [@Windhager2021]. Supported tasks include IMC data preprocessing,
supervised multi-channel image segmentation, object quantification and data
export to a variety of file formats. It supports functionality similar to those
of the IMC Segmentation Pipeline and further allows deep-learning enabled image
segmentation. The framework is available as platform-independent Docker
container, ensuring reproducibility and user-friendly installation. Read more in
the [Docs](https://bodenmillergroup.github.io/steinbock/latest/).

## Image pre-processing

To facilitate IMC data pre-processing, the
[readimc](https://github.com/BodenmillerGroup/readimc) and
[imctools](https://github.com/BodenmillerGroup/imctools) open-source Python
packages allow extracting the multi-modal (IMC acquisitions, panoramas),
multi-region, multi-channel information contained in raw IMC images. While
`imctools` contains functionality specific to the IMC Segmentation Pipeline, the
`readimc` package contains reader functions for IMC raw data and should be used
for this purpose.

Starting from IMC raw data and a "panel" file, individual acquisitions are
extracted as TIFF and OME-TIFF files. The panel contains information of
antibodies used in the experiment and the user can specify which channels to
keep for downstream analysis. When pixel classification-based image
segmentation is performed (see next section), random tiles are cropped from
images for convenience of pixel labelling.

## Image segmentation

The IMC Segmentation Pipeline supports pixel classification-based image
segmentation while `steinbock` supports pixel classification-based and deep
learning-based segmentation.

**Random forest-based** image segmentation is performed by training a classifier
using [Ilastik](https://www.ilastik.org/) on the randomly extracted image crops
and selected image channels. Pixels are classified as nuclear, cytoplasmic, or
background. Employing a customizable [CellProfiler](https://cellprofiler.org/)
pipeline, the probabilities are then thresholded for segmenting nuclei, and
nuclei are expanded into cytoplasmic regions to obtain cell masks.

**Deep learning-based** image segmentation is performed as presented by
[@Greenwald2021]. Briefly, `steinbock` first aggregates user-defined
image channels to generate two-channel images representing nuclear and
cytoplasmic signals. Next, the
[DeepCell](https://github.com/vanvalenlab/intro-to-deepcell) Python package is
used to run `Mesmer`, a deep learning-enabled segmentation algorithm pre-trained
on `TissueNet`, to automatically obtain cell masks without any further user
input.

Segmentation masks are single-channel images that match the input images in
size, with non-zero grayscale values indicating the IDs of segmented objects
(e.g. cells). These masks are written out as TIFF files after segmentation.

## Feature extraction {#feature-extraction}

Using the segmentation masks together with their corresponding multi-channel
images, the IMC Segmentation Pipeline as well as `steinbock` extract
object-specific features. These include the mean pixel intensity per object and
channel, morphological features (e.g. object area) and the objects' locations.
Object-specific features are written out as CSV files where rows represent
individual objects and columns represent features.

Furthermore, the IMC Segmentation Pipeline and `steinbock` compute _spatial
object graphs_, in which nodes correspond to objects, and nodes in spatial
proximity are connected by an edge. These graphs serve as a proxy for
interactions between neighboring cells. They are stored as edge list in form of
a CSV file.

Both approaches also write out image-specific metadata (e.g. width and height)
as CSV file.

## Data export

To further facilitate compatibility with downstream analysis, `steinbock`
exports data to a variety of file formats such as OME-TIFF for images, FCS for
single-cell data, the _anndata_ [@Wolf2018] format for data analysis in Python,
and various graph file formats for network analysis using software such as
[CytoScape](https://cytoscape.org/) [@Shannon2003]. For export to OME-TIFF,
steinbock uses [xtiff](https://github.com/BodenmillerGroup/xtiff), a Python
package developed for writing multi-channel TIFF stacks.

## Data import into R

In Section \@ref(read-data), we will highlight the use of the
[imcRtools](https://github.com/BodenmillerGroup/imcRtools) and
[cytomapper](https://github.com/BodenmillerGroup/cytomapper) packages to read in
single-cell and image data generated by the IMC Segmentation Pipeline and
`steinbock`. All further downstream analyses are performed in R and detailed in
the following sections.