mappingInR/section_D.Rmd at main · dcsuh/mappingInR · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
---
title: "section_D"
subtitle: "https://dcsuh.github.io/mappingInR/"
author: "Daniel Suh"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r, message=F}
library(sf)
library(terra)
library(tidyverse)
library(spData)
library(spDataLarge)
library(magrittr)
```


## Geometry operations

So far, we have learned how to perform operations on your data according to the attribute (non-spatial data) and then using the spatial data itself. This section will teach you how to perform operations on the actual spatial geometry data which will allow you to manipulate the geometries, add buffers, or create centroids.

Vector geometries can be manipulated using either unary or binary operations. Unary operations only use the geometry that you are manipulating. Binary operations rely on a separate geometry to make the changes to your target geometry.

Raster geometries can also be manipulated to change the resolution (number of pixels) and the extent and origin of the raster. This is useful when trying to align different raster data so that you can perform map algebra.

Finally, we will discuss raster-vector interactions.


## Geometric operations on vector data

This section focuses on operations performed on the geometry of 'sf' objects which means we will be working with objects of class 'sfc' in addition to objects of class 'sf'

## Simplification

Simplification generalizes lines and polygons to remove detail from the geometries. This is useful to reduce the amount of memory required to visualize these geometries. st_simplify() uses an algorithm for simplifying known as the Douglas-Peucker algorithm with an argument of dTolerance to control the intensity of the simplication.

```{r}
plot(st_geometry(seine))
seine_simp = st_simplify(seine, dTolerance = 2000)  # 2000 m
plot(st_geometry(seine_simp))
```

We can also do this with polygons

```{r}
us_states2163 = st_transform(us_states, 2163)
plot(st_geometry(us_states2163))
```

```{r}
us_states_simp1 = st_simplify(us_states2163, dTolerance = 100000)  # 100 km
plot(st_geometry(us_states_simp1))
```

One issue that you can see is that st_simplify() works on a per-geometry basis. In order to make the borders line up we can use ms_simplify() from 'rmapshaper'

```{r}
#install.packages("rmapshaper")
library(rmapshaper)
# proportion of points to retain (0-1; default 0.05)
us_states2163$AREA = as.numeric(us_states2163$AREA)
us_states_simp2 = rmapshaper::ms_simplify(us_states2163, keep = 0.01,
                                          keep_shapes = TRUE)
plot(st_geometry(us_states_simp2))
```


## Centroids

Centroid operations identify the center of geographic objects. Like the center of anything, there are different ways to define it (e.g. how would you define the center of your body?). One of the most common ways to define the center of geographic objects is to use the center of mass (imagine cutting out the map of Georgia from a piece of paper and balancing it on your finger). st_centroid() allows for calculation of geographic centroids.

```{r}
nz_centroid <- st_centroid(nz)
seine_centroid <- st_centroid(seine)
```

It's possible that the centroid could fall outside of the geometry (consider a donut). In this case, you can use st_point_on_surface which will ensure that points are on the surface of the geometry

```{r}
nz_pos = st_point_on_surface(nz)
seine_pos = st_point_on_surface(seine)
```


```{r}
library(tmap)
p_centr1 = tm_shape(nz) + tm_borders() +
  tm_shape(nz_centroid) + tm_symbols(shape = 1, col = "black", size = 0.5) +
  tm_shape(nz_pos) + tm_symbols(shape = 1, col = "red", size = 0.5)
p_centr2 = tm_shape(seine) + tm_lines() +
  tm_shape(seine_centroid) + tm_symbols(shape = 1, col = "black", size = 0.5) +
  tm_shape(seine_pos) + tm_symbols(shape = 1, col = "red", size = 0.5)
tmap_arrange(p_centr1, p_centr2, ncol = 2)
```


## Buffers

Buffers are polygons representing the area within a given distance of a geometric feature: regardless of whether the input is a point, line or polygon, the output is a polygon. Buffers are often used for data analysis when you want to know answers to questions such as: how many points are within this distance of this line?

```{r}
seine_buff_5km = st_buffer(seine, dist = 5000)
seine_buff_50km = st_buffer(seine, dist = 50000)
plot(st_geometry(seine))
plot(st_geometry(seine_buff_5km))
plot(st_geometry(seine_buff_50km))
```

## Affine transformations

Affine transformation is any transformation that preserves lines and parallelism. However, angles or length are not necessarily preserved. Affine transformations include, among others, shifting (translation), scaling and rotation. Additionally, it is possible to use any combination of these. Affine transformations are an essential part of geocomputation.

The ‘sf’ package implements affine transformation for objects of classes ‘sfg’ and ‘sfc’.

First, make an object of class ‘sfc’
```{r}
nz_sfc = st_geometry(nz)
```


Shifting moves every point by the same distance in map units.

```{r}
nz_shift = nz_sfc + c(0, 100000)
```

Scaling enlarges or shrinks objects by a factor. This can be done globally or locally. If global, the relative position of the geometry features will stay the same and be scaled together. If scaling locally, then each of the geometry features will scale relative to a point that you must define (usually the centroid).

```{r}
nz_centroid_sfc = st_centroid(nz_sfc)
nz_scale = (nz_sfc - nz_centroid_sfc) * 0.5 + nz_centroid_sfc
```

Rotating requires a rotation matrix to be defined

```{r}
rotation = function(a){
  r = a * pi / 180 #degrees to radians
  matrix(c(cos(r), sin(r), -sin(r), cos(r)), nrow = 2, ncol = 2)
}
```

nz_rotate can then be used with the defined rotation matrix wrapped in the rotation function

```{r}
nz_rotate = (nz_sfc - nz_centroid_sfc) * rotation(30) + nz_centroid_sfc
```

```{r}
st_crs(nz_shift) = st_crs(nz_sfc)
st_crs(nz_scale) = st_crs(nz_sfc)
st_crs(nz_rotate) = st_crs(nz_sfc)
p_at1 = tm_shape(nz_sfc) + tm_polygons() +
  tm_shape(nz_shift) + tm_polygons(col = "red") +
  tm_layout(main.title = "Shift")
p_at2 = tm_shape(nz_sfc) + tm_polygons() +
  tm_shape(nz_scale) + tm_polygons(col = "red") +
  tm_layout(main.title = "Scale")
p_at3 = tm_shape(nz_sfc) + tm_polygons() +
  tm_shape(nz_rotate) + tm_polygons(col = "red") +
  tm_layout(main.title = "Rotate")
tmap_arrange(p_at1, p_at2, p_at3, ncol = 3)
```


Finally, the newly created geometries can replace the old ones with the st_set_geometry() function:
```{r}
nz_scale_sf = st_set_geometry(nz, nz_scale)
```


## Clipping

Spatial clipping is a form of spatial subsetting that only affects the geometry columns of some of the affected features. Clipping only applies to features more complex than points.


This creates a venn diagram with two points x and y for us to use as an example

```{r}
b = st_sfc(st_point(c(0, 1)), st_point(c(1, 1))) # create 2 points
b = st_buffer(b, dist = 1) # convert points to circles
plot(b)
text(x = c(-0.5, 1.5), y = 1, labels = c("x", "y")) # add text

```

This selects the area that is shared between x and y

```{r}
x = b[1]
y = b[2]
x_and_y = st_intersection(x, y)
plot(b)
plot(x_and_y, col = "lightgrey", add = TRUE) # color intersecting area
```

Use these other functions instead of st_intersection() to try and select other areas of the venn diagram: st_union(), st_difference(), st_sym_difference()

```{r}

```

## Geometry Unions

When working with attribute data, we were still able to aggregate features and dissolve polygons together. When we did this we used summarize() and aggregate() but the function that was working behind the scenes to dissolve the polygons was actually st_union(). This function can take two geometries and then unite them.

```{r}
us_west = us_states[us_states$REGION == "West", ]
us_west_union = st_union(us_west)
```

## Type transformations

Geometry casting is a powerful operation that enables transformation of the geometry type. st_cast() is used for this operation and it is important to note that it will behave differently on ‘sfg’, ‘sfc’, and ‘sf’ objects. st_cast() will allow you to change between points, lines, and polygons. When changing between multi versions of these, it will take a multi-object and turn it into multiple different objects. Refer to the online textbook for more information on using st_cast and perform geometric type transformations.

## Geometric operations on raster data

Geometric raster operations include the shift, flipping, mirroring, scaling, rotation or warping of images.

There are certain geometric operations on raster data that R does not handle well so if you need to perform these then you can work with dedicated GIS softwares and the online textbook also goes in to bridging between R and GIS software in chapter 9.

However, R can handle operations such as changing the extent, resolution, and origin of a raster. This can be important when trying to match raster layers and it can also be useful to aggregate (lower the resolution) of a raster so that it is less computationally intensive.

## Geometric intersections
We previously saw how to extract values from a raster overlaid by other spatial objects. To retrieve a spatial output, we can use almost the same subsetting syntax. The only difference is that we have to make clear that we would like to keep the matrix structure by setting the drop argument to FALSE. This will return a raster object containing the cells whose midpoints overlap with clip.
```{r}
elev = rast(system.file("raster/elev.tif", package = "spData"))
clip = rast(xmin = 0.9, xmax = 1.8, ymin = -0.45, ymax = 0.45,
            resolution = 0.3, vals = rep(1, 9))
crop(elev,clip)
```
For the same operation we can also use the intersect() and crop() command.

## Extent and Origin

In order for rasters to match, the extent and origin of the raster need to match. When using publicly available data such as satellite imagery, the files may often come in different resolution or projections.

If the rasters only differ in their extent then you can use extend() with both rasters as the argument so that the smaller one extends to match the larger one. Also, terra will conveniently throw an error if two rasters do not match.

```{r}
elev = rast(system.file("raster/elev.tif", package = "spData"))
elev_2 = extend(elev, c(1, 2)) #extended to be longer than elev
```

Try to run this code chunk. What happens?
```{r}
#elev_3 = elev + elev_2
```

```{r}
elev_4 = extend(elev, elev_2)
```

By default, the origin is the cell corner closest to the coordinates (0,0). origin() will retrieve the origin of a raster.

```{r}
origin(elev_4)
```
elev_4 conveniently has a corner with coordinates (0,0)

If the origin does not match then the rasters will not match. This can be fixed by using origin() again with the raster as the argument.


```{r}
# change the origin
#origin(elev_4) <- c(0.25, 0.25)
plot(elev_4)
plot(elev, add = TRUE) # and add the original raster
```


## Aggregation and disaggregation

Raster datasets can also differ with regard to their resolution. To match resolutions, one can either decrease, using aggregate(), or increase, using disagg(), the resolution of one raster.

In this example, we decrease the resolution by aggregating by a factor of 5. The resulting cell uses the mean of the input cells but the function could be set to other functions such as median() or sum().

```{r}
data(dem, package="spDataLarge")
dem <- rast(dem)
dem_agg = aggregate(dem, fact = 5, fun = mean)
plot(dem)
plot(dem_agg)
```

By contrast, the disaggregate() function increases the resolution. However, we have to specify a method on how to fill the new cells. The disaggregate() function provides two methods. The default one (method = "near") simply gives all output cells the value of the input cell, and hence duplicates values which leads to a blocky output image.

```{r}
dem_disagg = terra::disaggregate(dem_agg, fact = 5, method = "near")
```

bilinear is another method that may produce better looking results.

```{r}
dem_disagg = terra::disaggregate(dem_agg, fact = 5, method = "bilinear")
```

However, with disaggregation we are always interpolating the values for the new cells so the outputs will always be limited by the lower resolution input. Unfortunately,there is no enhance() function.


![zoom and enhance](https://i.makeagif.com/media/12-29-2016/DhK9ZZ.gif)

## Resampling

Aggregation and disaggregation are useful when changing the resolution of a single raster. However, in other situations you may have multiple rasters with varying extent and resolution so you will want to resample which is a process for computing values for new pixel locations. There are a variety of methods to resample and these are detailed in the online textbook. The simplest versions are nearest neighbor and bilinear interpolation. Nearest neighbor assigns the value of the nearest cell of the original raster to the cell of the target one. It is fast and usually suitable for categorical rasters. Bilinear interpolation assigns a weighted average of the four nearest cells from the original raster to the cell of the target one. This is the fastest method for continuous rasters. To resample you can use the resample() function which allows you to designate the rasters you would like to work with and the method for resampling. For the exercises in this course you will not need to resample but in your own work with raster data you may need to and we suggest turning to the online textbook and other sources for more information on how to best do this for your situation.


## Raster-Vector Interactions

This section focuses on interactions between raster and vector geographic data models. It includes four main techniques: raster cropping and masking using vector objects; extracting raster values using different types of vector data; and raster-vector conversion. The above concepts are demonstrated using data used in previous chapters to understand their potential real-world applications.

## Raster cropping

The ability to crop and mask rasters using vector data is valuable because raster data is usually over a larger geographic area than your vector data. Limiting your raster data to the geographic area described by your vector geometries will be less computationally expensive and will help to develop nicer maps.

We will use these files to demonstrate raster cropping and masking
```{r}
srtm = rast(system.file("raster/srtm.tif", package = "spDataLarge"))
zion = st_read(system.file("vector/zion.gpkg", package = "spDataLarge"))
zion = st_transform(zion, crs(srtm))
```

We can start by using crop() to crop our raster using our vector data. Cropping will limit the raster to the smallest rectangular space that still captures the vector geometry
```{r}
srtm_cropped = crop(srtm, vect(zion))
```

After cropping, we can mask() the rest of the raster data outside the vector area so that it is NA
```{r}
srtm_masked = mask(srtm, vect(zion))
```

mask() can also be altered with the argument 'inverse' set to TRUE to turn all of the raster inside the vector into NA
```{r}
srtm_inv_masked = mask(srtm, vect(zion), inverse = TRUE)
```

Cropping and masking are typically used so that only the area of interest is left.

![](https://geocompr.robinlovelace.net/05-geometry-operations_files/figure-html/cropmask-1.png)


## Extraction

Raster extraction is the process of identifying and returning the values associated with a ‘target’ raster at specific locations, based on a (typically vector) geographic ‘selector’ object.

The basic example is of extracting the value of a raster cell at specific points. For this purpose, we will use zion_points, which contain a sample of 30 locations within the Zion National Park. The following command extracts elevation values from srtm and creates a data frame with points’ IDs (one value per vector’s row) and related srtm values for each point. Now, we can add the resulting object to our zion_points dataset with the cbind() function:
```{r}
data("zion_points", package = "spDataLarge")
elevation = terra::extract(srtm, vect(zion_points))
zion_points = cbind(zion_points, elevation)
```


Line selectors can also be used for extraction but it is often better to use a series of points rather than a line. This is achieved by making a line and then splitting it up into many points as demonstrated below:

Make line
```{r}
zion_transect = cbind(c(-113.2, -112.9), c(37.45, 37.2)) %>%
  st_linestring() %>%
  st_sfc(crs = crs(srtm)) %>%
  st_sf()
```

Split into points
```{r}
zion_transect$id = 1:nrow(zion_transect)
zion_transect = st_segmentize(zion_transect, dfMaxLength = 250)
zion_transect = st_cast(zion_transect, "POINT")
```

Find the distance between points
```{r}
zion_transect = zion_transect %>%
  group_by(id) %>%
  mutate(dist = st_distance(geometry)[, 1])
```

Extract elevation value for each point and combine with main object
```{r}
zion_elev = terra::extract(srtm, vect(zion_transect))
zion_transect = cbind(zion_transect, zion_elev)
```

What this creates is an elevation map for a straight line through Zion National Park

![](https://geocompr.robinlovelace.net/05-geometry-operations_files/figure-html/lineextr-1.png)


Finally, polygons can be used for raster extractions. Polygons can return a lot of values per polygon.
```{r}
zion_srtm_values = terra::extract(x = srtm, y = vect(zion))
```

This resulting data frame is useful for calculating summary statistics of certain polygons
```{r}
group_by(zion_srtm_values, ID) %>%
  summarize(across(srtm, list(min = min, mean = mean, max = max)))
```

## Rasterization

Rasterization is the conversion of vector objects into their representation in raster objects. The terra package contains the function rasterize() for doing this work. Its first two arguments are, x, vector object to be rasterized and, y, a ‘template raster’ object defining the extent, resolution and CRS of the output. Deciding on which resolution to use can be difficult because if it is too low then the result may miss the full geographic variability of the vector data; if it is too high, computational times may be excessive.


To demonstrate rasterization in action, we will use a template raster that has the same extent and CRS as the input vector data 'cycle_hire_osm_projected' and spatial resolution of 1000 meters:
```{r}
cycle_hire_osm_projected = st_transform(cycle_hire_osm, "EPSG:27700")
raster_template = rast(ext(cycle_hire_osm_projected), resolution = 1000,
                       crs = st_crs(cycle_hire_osm_projected)$wkt)
```

rasterize() is used for rasterization and is very flexible. Try to work out what each of these different methods is doing when rasterizing.

```{r}
ch_raster1 = rasterize(vect(cycle_hire_osm_projected), raster_template,
                       field = 1)
```

```{r}
ch_raster2 = rasterize(vect(cycle_hire_osm_projected), raster_template,
                       fun = "length")
```

```{r}
ch_raster3 = rasterize(vect(cycle_hire_osm_projected), raster_template,
                       field = "capacity", fun = sum)
```


## Spatial Vectorization

Vectorization is the conversion of raster objects into their representation in vector objects. It involves converting spatially continuous raster data into spatially discrete vector data such as points, lines or polygons.

The simplest form of vectorization is to convert the centroids of raster cells into points. as.points() does exactly this for all non-NA raster grid cells. Note, here we also used st_as_sf() to convert the resulting object to the sf class.

```{r}
elev_point = as.points(elev) %>%
  st_as_sf()
```


Another common type of spatial vectorization is the creation of contour lines representing lines of a continuous variable such as elevation or temperature.
```{r}
data(dem, package="spDataLarge")
dem <- rast(dem)
cl = as.contour(dem)
plot(dem, axes = FALSE)
plot(cl, add = TRUE)
```


The final type of vectorization involves conversion of rasters to polygons. This can be done with terra::as.polygons(), which converts each raster cell into a polygon consisting of five coordinates, all of which are stored in memory (explaining why rasters are often fast compared with vectors!).

This is illustrated below by converting the grain object into polygons and subsequently dissolving borders between polygons with the same attribute values (also see the dissolve argument in as.polygons()).


```{r}
grain = rast(system.file("raster/grain.tif", package = "spData"))
grain_poly = as.polygons(grain) %>%
  st_as_sf()
```


## Exercises

5.5 Exercises
Some of the exercises use a vector (zion_points) and raster dataset (srtm) from the spDataLarge package. They also use a polygonal ‘convex hull’ derived from the vector dataset (ch) to represent the area of interest:


```{r}
library(sf)
library(terra)
zion_points = read_sf(system.file("vector/zion_points.gpkg", package = "spDataLarge"))
srtm = rast(system.file("raster/srtm.tif", package = "spDataLarge"))
ch = st_combine(zion_points) %>%
  st_convex_hull() %>%
  st_as_sf()
```


E1. Generate and plot simplified versions of the nz dataset. Experiment with different values of keep (ranging from 0.5 to 0.00005) for ms_simplify() and dTolerance (from 100 to 100,000) st_simplify().

At what value does the form of the result start to break down for each method, making New Zealand unrecognizable?
Advanced: What is different about the geometry type of the results from st_simplify() compared with the geometry type of ms_simplify()? What problems does this create and how can this be resolved?

E2. It was previously established that Canterbury region had 70 of the 101 highest points in New Zealand. Using st_buffer(), how many points in nz_height are within 100 km of Canterbury?

E3. Find the geographic centroid of New Zealand. How far is it from the geographic centroid of Canterbury?

E4. Most world maps have a north-up orientation. A world map with a south-up orientation could be created by a reflection (one of the affine transformations not mentioned in this chapter) of the world object’s geometry. Write code to do so. Hint: you need to use a two-element vector for this transformation. Bonus: create an upside-down map of your country.

E5. Subset the point in p that is contained within x and y.

Using base subsetting operators.
Using an intermediary object created with st_intersection().

E6. Calculate the length of the boundary lines of US states in meters. Which state has the longest border and which has the shortest? Hint: The st_length function computes the length of a LINESTRING or MULTILINESTRING geometry.

E7. Crop the srtm raster using (1) the zion_points dataset and (2) the ch dataset. Are there any differences in the output maps? Next, mask srtm using these two datasets. Can you see any difference now? How can you explain that?

E8. Firstly, extract values from srtm at the points represented in zion_points. Next, extract average values of srtm using a 90 buffer around each point from zion_points and compare these two sets of values. When would extracting values by buffers be more suitable than by points alone?

E9. Subset points higher than 3100 meters in New Zealand (the nz_height object) and create a template raster with a resolution of 3 km. Using these objects:

Count numbers of the highest points in each grid cell.
Find the maximum elevation in each grid cell.

E10. Aggregate the raster counting high points in New Zealand (created in the previous exercise), reduce its geographic resolution by half (so cells are 6 by 6 km) and plot the result.

Resample the lower resolution raster back to a resolution of 3 km. How have the results changed?
Name two advantages and disadvantages of reducing raster resolution.

E11. Polygonize the grain dataset and filter all squares representing clay.

Name two advantages and disadvantages of vector data over raster data.
At which points would it be useful to convert rasters to vectors in your work?