Spatial/10-Map-Algebra.Rmd at main · mgimond/Spatial · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
# Map Algebra {#chp10_0}

## Introduction

Cartographic modeling is a structured approach to geographic analysis that uses a language of operations to interpret spatial data. These operations are organized into functional categories that describe how data values are transformed and combined across geographic space. The modeling system, as developed by **Dana Tomlin** [@DTomlin] is built around three core spatial contexts:

+ **Local operations**: Compute new values for each location based on data explicitly associated with that location, often across one or more layers.

+ **Focal operations**: Compute values based on the surrounding neighborhood of each location. These neighborhoods can be defined by distance, direction, travel cost, or visibility.

+  **Zonal operations**: Aggregate data across defined zones, which are mutually exclusive and collectively exhaustive areas of the study region.

In addition to these, O'Sullivan and Unwin [@Unwin1] propose a fourth category: **global operations**. These refer to analytical functions that consider *all* locations in a dataset simultaneously, such as calculating overall statistics, evaluating spatial autocorrelation, or computing Euclidean distances to nearest features.


```{r echo=FALSE}
library(raster)
library(rasterVis)

gp <- function(sg, nr=1, sfmt="%02.0f", fnts = 2.1){
  library(ggplot2)
  library(scales)
  r.df <- as.data.frame(cbind(xyFromCell(sg,1:ncell(sg)),getValues(sg)))
  plotData = tidyr::pivot_longer(r.df, cols = c(-x,-y) )
  ggplot(aes(x = x, y = y), data = plotData, environment = environment()) +
    geom_tile(aes(fill = value)) + facet_wrap(~ name, nrow=nr) +
    scale_colour_brewer(palette="Blues") +
    theme(strip.text.x = element_text(size=9, family="mono", face="bold"),
          legend.position = "none",
          axis.text=element_text(size=7),
          axis.title=element_text(size=8) ) +
    coord_equal() + geom_text(aes(label=sprintf(fmt=sfmt, value)), colour='white',size=fnts)
}

r = raster(ncol=10, nrow=10, xmn=0, xmx=100, ymn=0, ymx=100)
projection(r) = "+proj=utm +zone=19 +datum=NAD83"
values(r) = seq(1 : ncell(r))
```

## Local operations and functions

Local operations are the most fundamental type of cartographic modeling function. They compute a new value for each location based solely on the data explicitly associated with that same location—either from a single layer or from multiple layers.

For instance, starting with an original raster layer, we might apply a sequence of operations—first multiplying each cell value by 2, then adding 1. The result is a new raster in which each cell reflects the cumulative effect of those operations applied to the corresponding cell in the original layer. This is an example of a **unary operation**, where only a single raster is involved in the computation.

```{r f10-local01, echo=FALSE, fig.cap = "Example of a local operation where  `output=(2 * raster + 1)`.", fig.width=5, fig.height=3}

l1 <- r * 2 + 1   # Apply basic operations to r, assign output to l1

sg = stack(r,l1)
names(sg) = c("input","output")
gp(sg,1)

```

Local operations can also involve multiple raster layers. For example, two rasters can be added together by summing the values of overlapping cells, resulting in a new raster where each cell reflects the combined value from the corresponding cells in the input layers.

```{r f10-local02, echo=FALSE, fig.cap = "Example of a local operation where `output=(input1+input2)`. Note how each cell output only involves input raster cells that share the same exact location.", fig.width=7, fig.height=3}
l2 <- r - r + runif(length(r), 0,10)   # Apply basic operations to r, assign output to l1
l3 <- r + l2
sg = stack(r,l2,l3)
names(sg) = c("input1"," input2", "output")
gp(sg,1)
```


Local operations also include **reclassification**, where a range of input raster values is assigned a new, common value. This technique is useful for grouping data into meaningful categories. For example, we might reclassify the input raster values as follows:

 Original values  Reclassified values
----------------  --------------------
            0-25  25
           26-50  50
           51-75  75
          76-100  100

```{r f10-local03, echo=FALSE, fig.cap = "Example of a local operation where the output results from the reclassification of input values.", message=FALSE, warning=FALSE, fig.height=2.9, fig.width=5}

m    = c(0, 25, 25,  25, 50, 50,  50, 75,  75, 75, 100, 100)
m    = matrix(m, ncol=3, byrow = T)
l4   = reclassify(r, m, right = T)

l1 <- r * 2 + 1   # Apply basic operations to r, assign output to l1

sg = stack(r,l4)
names(sg) = c("input","output")
gp(sg,1)

```

## Focal operations and functions

Focal operations differ from local operations in that they consider the values of neighboring cells when computing a new value for each location. Instead of looking only at a single cell, focal functions examine a defined neighborhood--such as the eight adjacent cells surrounding a central location.

For example, a focal operation might calculate the average of all neighboring cell values and assign that average to the central cell. This approach is useful for **smoothing** data, detecting spatial patterns, and identifying transitions across a surface.

```{r f10-focal01, echo=FALSE, fig.cap = "Example of a focal operation where the output cell values take on the average value of neighboring cells from the input raster. Focal cells surrounded by non-existent cells are assigned an `NA` in this example.", message=FALSE, warning=FALSE, fig.height=2.9, fig.width=5}

r2 = raster(ncol=10, nrow=10, xmn=0, xmx=100, ymn=0, ymx=100)
set.seed(123)
values(r2) = round(runif(ncell(r2),1,100))
f1 = focal(r2, w=matrix(1/9,nrow=3,ncol=3), fun=sum)

sg = stack(r2,f1)
names(sg) = c("input","output")
gp(sg,1,"%2.1f")

```

Notice in the example above that the edge cells in the output raster have been assigned a value of `NA` (No Data). This occurs because the neighborhood around those edge cells extends beyond the raster’s boundary, where no values exist. Some GIS applications handle this differently—they may ignore the missing surrounding cells and compute the average using only the available neighboring values, as demonstrated in the next example.

```{r f10-focal02, echo=FALSE, fig.cap = "Example of a focal operation where the output cell values take on the average value of neighboring cells from the input raster. Surrounding non-existent cells are ignored.", message=FALSE, warning=FALSE, fig.height=3, fig.width=5}

r2 = raster(ncol=10, nrow=10, xmn=0, xmx=100, ymn=0, ymx=100)
set.seed(123)
values(r2) = round(runif(ncell(r2),1,100))
f1 = focal(r2, w=matrix(1,nrow=3,ncol=3)  , fun=mean, na.rm=TRUE, pad=TRUE)

sg = stack(r2, f1)
names(sg) = c("input","output")
gp(sg,1,"%2.1f")

```

Focal (or neighborhood) operations require the definition of a window region, commonly referred to as a **kernel**. In the examples above, a simple 3×3 kernel was used, where each cell’s value was influenced by its immediate neighbors. The kernel itself can vary in both size and shape—for instance, a 3×3 square where the central cell is excluded (resulting in eight neighbors), or a circular neighborhood defined by a specified radius.

```{r f10-focal03, echo=FALSE, fig.cap = "Example of a focal operation where the kernel is defined by a 3 by 3 cell *without* the center cell, and whose output cell takes on the average value of those neighboring cells. ", message=FALSE, warning=FALSE, fig.height=3, fig.width=5}

r2 = raster(ncol=10, nrow=10, xmn=0, xmx=100, ymn=0, ymx=100)
set.seed(123)
values(r2) = round(runif(ncell(r2),1,100))
m  <- matrix(c(1,1,1,1,0,1,1,1,1),nrow = 3) # This defines the 3x3 moving window
f1 = focal(r2, w=m  , fun=mean, na.rm=TRUE, pad=TRUE)

sg = stack(r2,f1)
names(sg) = c("input","output")
gp(sg,1,"%2.1f")

```

n addition to defining the shape and size of the neighborhood, a kernel also specifies the weight each neighboring cell contributes to the summary statistic. For example, in a 3×3 neighborhood, each cell might contribute an equal weight of 1/9^th^ to the final value. However, weights can also be defined using more complex functions--these are known as **kernel functions**. One commonly used kernel function is the **Gaussian**, which assigns greater weight to cells closer to the center and less to those farther away.

```{r f10-focal04, echo=FALSE, fig.cap = "Example of a focal operation where the kernel is defined by a Gaussian function whereby the closest cells are assigned a greater weight.", message=FALSE, warning=FALSE, fig.height=3, fig.width=5}

r2 = raster(ncol=10, nrow=10, xmn=0, xmx=100, ymn=0, ymx=100)
set.seed(123)
values(r2) = round(runif(ncell(r2),1,100))
m = focalWeight(r2, 4, "Gauss")
f1 = focal(r2, w=m  , fun=sum, na.rm=TRUE, pad=TRUE)

sg = stack(r2,f1)
names(sg) = c("input","output")
gp(sg,1,"%2.1f")

```

## Zonal operations and functions

Zonal operations differ from local and focal operations in that they summarize values across defined zones rather than individual cells or neighborhoods. A zone is a group of cells that share a common value on a separate raster layer—often representing administrative boundaries, land parcels, or other categorical regions.

In the following example, the cell values from the raster layer are aggregated into three zones whose boundaries are delineated by red polygons. Each output zone shows the average value of the cells within that zone.

```{r f10-zonal01, echo=FALSE, fig.cap = "Example of a zonal operation where the cell values are averaged for each of the three zones delineated in red.", message=FALSE, warning=FALSE, fig.height=3.5, fig.width=6}

library(gridExtra)
library(stars)
#library(broom)
cds1 <- rbind(c(0,50), c(0,100), c(50,100), c(50,50), c(0,50))
cds2 <- rbind(c(0,0), c(100,0), c(50,50), c(0,50), c(0,0))
cds3 <- rbind(c(100,0), c(50,50), c(50,100), c(100,100), c(100,0))
polys <- SpatialPolygons(list(Polygons(list(Polygon(cds1)), 1),
                              Polygons(list(Polygon(cds2)), 2),
                              Polygons(list(Polygon(cds3)), 3)))

polys <- st_sf(st_sfc(list(st_polygon(list(cds1)), st_polygon(list(cds2)), st_polygon(list(cds3)))), crs = crs(r, asText = TRUE))

v <- extract(r, polys, fun=mean, bind = TRUE)

dfx <- data.frame(rbind(cds1[1:4,],cds2[1:4,], cds3[1:4,]),
                  id = rep(1:3, each = 4),
                  val = rep(extract(r, polys, fun=mean, bind = TRUE), each = 4))
dfx_center <-  as.data.frame(st_coordinates(st_centroid(polys)))
dfx_center$val <- v

sg <- stack(r)
names(sg) <- "input"

p1 <- gp(sg) + geom_polygon(data=dfx, aes(x=X1, y = X2, group = id), colour = "red", fill = NA)

names(sg) <- "output"
p2 <- gp(sg) + geom_polygon(data=dfx, aes(x=X1, y = X2, group = id, fill = val),
                     alpha = 0.5, colour = "white") +
      geom_text(data=dfx_center,
                aes(x=X, y = Y,label=sprintf("%2.1f", val)),colour='white',size=4.1)


grid.arrange(p1, p2, ncol=2)

```

While rasters can be used to define zones, it's more common for a GIS to have a polygon layer define the zones.

## Global operations and functions

Global operations and functions may use some or all input cells when computing the value of each output cell. A common example is the **Euclidean Distance** tool, which calculates the shortest straight-line distance between each cell and a specified source or destination. In the example below, a new raster assigns to each cell the distance to the nearest cell with a value of 1--note that only two such cells exist in the input raster.

```{r f10-global01, echo=FALSE, fig.cap = "Example of a global function: the Euclidean distance. Each pixel is assigned its closest distance to one of the two source locations (defined in the input layer).", message=FALSE, warning=FALSE, fig.height=3, fig.width=5}

p           = raster(ncol=10, nrow=10, xmn=0, xmx=100, ymn=0, ymx=100) # Create an empty raster
p[]         = NA  # Assign NoData values to all pixels
p[c(20,85)] = 1   # Change the pixels 20 and 85  to 1
p.d = distance(p) # Compute the shortest distance between each output pixel in the
                  # output raster and pixels 20 and 85
sg = stack(p, p.d)
names(sg) = c("input","output")
gp(sg,1,"%2.0f")

```

Global operations and functions can also generate single value outputs such as the overall pixel mean or standard deviation.

Another popular use of global functions is in the mapping of least-cost paths where a cost surface raster is used to identify the shortest path between two locations which minimizes cost (in time or money).

## Operators and functions

Operations and functions applied to gridded data can be broken down into three groups: **mathematical**, **logical comparison** and **Boolean**.

### Mathematical operators and functions

Two mathematical operators have already been demonstrated in earlier sections: the *multiplier* and the *addition* operators. Other operators include division and the modulo (also known as the modulus) which is the remainder of a division. Mathematical *functions* can also be applied to gridded data manipulation. Examples are square root and sine functions. The following table showcases a few examples with ArcGIS and R syntax.

Operation | ArcGIS Syntax | R Syntax | Example
----------|--------|--------|--------
Addition | `+`     | `+` |  `input1 + input2`
Subtraction | `-`  | `-` | `input1 - input2`
Division | `/` | `/` | `input1 / input2`
Modulo | `Mod()` | `%%` | `Mod(input1, 100)`, `input1 %% 10`
Square root | `SquareRoot()` | `sqrt()`| `SquareRoot(input1)`, `sqrt(input1)`

### Logical comparison

The logical comparison operators evaluate a condition then output a value of `1` if the condition is true and `0` if the condition is false. Logical comparison operators consist of *greater than*, *less than*, *equal* and *not equal*.

Logical comparison | Syntax
-------------------|-------
Greater than  | `>`
Less than | `<`
Equal | `==`
Not equal | `!=`

For example, the following figure shows the output of the comparison between two rasters where we are assessing if cells in `input1` are *greater than* those in `input2` (on a cell-by-cell basis).


```{r f10-logical01, echo=FALSE, fig.cap = "Output of the operation `input1` &gt; `input2`. A value of 1 in the output raster indicates that the condition is true and a value of 0 indicates that the condition is false.", fig.height=2, fig.width=5}

ras1 <- raster(ncol=3, nrow=3, xmn=0, xmx=100, ymn=0, ymx=100)
ras2 <- ras1
ras.out <- ras1
projection(r) <- "+proj=utm +zone=19 +datum=NAD83"
set.seed(32)
values(ras1) <- round(runif(length(ras1),-5,5))
set.seed(5)
values(ras2) <- round(runif(length(ras2),-5,5))
ras.out <- (ras1 > ras2)
sg = stack(ras1,ras2,ras.out)
names(sg) <- c("input1","input2", "output")
gp(sg,1, sfmt="%2.0f", fnts=4)
```

When assessing whether two cells are equal, some programming environments such as R and ArcGIS's *Raster Calculator* require the use of the *double equality* syntax, `==`, as in `input1 == input2`. In these programming environments, the single equality syntax is usually interpreted as an *assignment operator* so `input1 = input2` would instruct the computer to assign the cell values in `input2` to `input1` (which is not what we want to do here).

Some applications make use of special functions to test a condition. For example, ArcGIS has a function called `Con(condition, out1, out2)` which assigns the value `out1` if the condition is met and a value of `out2` if it's not. For example, ArcGIS's raster calculator expression

```
Con( input1 > input2, 1, 0)
```
outputs a value of `1` if `input1` is greater than `input2` and `0` if not. It generates the same output as the one shown in the above figure. Note that in most programming environments (including ArcGIS), the expression `input1 > input2` produces the same output because the value `1` is the numeric representation of  `TRUE` and `0` that of `FALSE`.


### Boolean (or Logical) operators

In map algebra, *Boolean* operators are used to compare conditional states of a cell (i.e. TRUE or FALSE). The three Boolean operators are **AND**, **OR** and **NOT**.

Boolean  ArcGIS   R          Example
-------- -------- ---------  ------
AND      &       	&	         `input1 & input2`
OR       `|`     	`|`        `input1 | input2`
NOT	     `~`      `!`        `~input2`, `! input2`

> A "TRUE" state is usually encoded as a `1` or any *non-zero* integer while a "FALSE" state is usually encoded as a `0`.

For example, if `cell1=0` and `cell2=1`, the Boolean operation `cell1 AND cell2` results in a `FALSE` (or 0) output cell value. This Boolean operation can be translated into plain English as "are the cells 1 and 2 both TRUE?" to which we answer "No they are not" (cell1 is FALSE). The **OR** operator can be interpreted as "is x *or* y TRUE?" so that `cell1 OR cell2` would return `TRUE`. The **NOT** interpreter can be interpreted as "is x not TRUE?" so that `NOT cell1` would return `TRUE`.


```{r f10-boolean01, echo=FALSE, fig.cap = "Output of the operation `input1 AND input2`. A value of 1 in the output raster indicates that the condition is true and a value of 0 indicates that the condition is false. Note that many programming environments treat *any* none 0 values as TRUE so that `-3 AND -4` will return `TRUE`.", fig.height=2, fig.width=5}

ras1 <- raster(ncol=3, nrow=3, xmn=0, xmx=100, ymn=0, ymx=100)
ras2 <- ras1
ras.out <- ras1
projection(r) <- "+proj=utm +zone=19 +datum=NAD83"
set.seed(32)
values(ras1) <- round(runif(length(ras1),-5,5))
set.seed(5)
values(ras2) <- round(runif(length(ras2),-5,5))
ras.out <- (ras1 & ras2)
sg = stack(ras1,ras2,ras.out)
names(sg) <- c("input1","input2", "output")
gp(sg,1, sfmt="%2.0f", fnts=4)

```

```{r f10-boolean02, echo=FALSE, fig.cap = "Output of the operation `NOT input2`. A value of 1 in the output raster indicates that the input cell is NOT TRUE (i.e. has a value of 0).", fig.height=2, fig.width=5}

ras1 <- raster(ncol=3, nrow=3, xmn=0, xmx=100, ymn=0, ymx=100)
ras2 <- ras1
ras.out <- ras1
projection(r) <- "+proj=utm +zone=19 +datum=NAD83"
set.seed(32)
values(ras1) <- round(runif(length(ras1),-5,5))
set.seed(5)
values(ras2) <- round(runif(length(ras2),-5,5))
ras.out <- (! ras2)
sg = stack(ras2,ras.out)
names(sg) <- c("input2", "output")
gp(sg,1, sfmt="%2.0f", fnts=4)

```

### Combining operations

Both comparison and Boolean operations can be combined into a single expression. For example, we may wish to find locations (cells) that satisfy requirements from two different raster layers: e.g. `0<input1<4`  AND  `input2>0`. To satisfy the first requirement, we can write out the expression as `(input1>0) & (input1<4)`. Both comparisons (delimited by parentheses) return a `0` (FALSE) or a `1` (TRUE). The ampersand, `&`, is a Boolean operator that checks that both conditions are met and returns a 1 if yes  or a 0 if not. This expression is then combined with another comparison using another ampersand operator that assesses the criterion `input2>0`. The amalgamated expression is thus `((input1>0) & (input1<4)) & (input2>0)`.

```{r f10-combining01, echo=FALSE, fig.height=2, fig.width=5, fig.cap = "Output of the operation ((`input1`&gt;`0`) &amp; (`input1`&lt;`4`)) &amp; (`input2`&gt;`0`). A value of 1 in the output raster indicates that the condition is true and a value of 0 indicates that the condition is false."}

ras1 <- raster(ncol=3, nrow=3, xmn=0, xmx=100, ymn=0, ymx=100)
ras2 <- ras1
ras.out <- ras1
projection(r) <- "+proj=utm +zone=19 +datum=NAD83"
set.seed(32)
values(ras1) <- round(runif(length(ras1),-5,5))
set.seed(5)
values(ras2) <- round(runif(length(ras2),-5,5))
ras.out <- ((ras1 > 0) & (ras1 <4)) & (ras2 > 0)
sg <- stack(ras1,ras2,ras.out)
names(sg) <- c("input1","input2", "output")
gp(sg,1, sfmt="%2.0f", fnts=4)

```

> Note that most software environments assign the ampersand character, `&`, to the `AND` Boolean operator.

## Summary

In this chapter, we explored the foundational concepts of Map Algebra, a framework for performing spatial analysis using raster data. We examined how local, focal, zonal, and global operations each contribute to different types of spatial reasoning. Local operations focus on individual cells, focal operations incorporate neighborhood context, zonal operations summarize values across defined regions, and global operations consider the entire raster dataset.