---
params:
  lesson: "Lesson 6"
  title: "Lists, lists, lists and applying functions with `purrr`"
  bookchapter_name: "Cheat sheet for the `purrr` package"
  bookchapter_section: "https://purrr.tidyverse.org/"
  functions: "`map`, `pluck`, `keep`, `discard`, `compact`"
  packages: "`dplyr`, `purrr`"
# end inputs ---------------------------------------------------------------
header-includes: \usepackage{float}
always_allow_html: yes
output:
  html_document:
    code_folding: show
---
```{r, setup, echo = FALSE, cache = FALSE, include = FALSE}
options(width=100)
knitr::opts_chunk$set(
  eval = FALSE,     # do not run chunks by default (override per chunk with eval=T)
  echo = TRUE,      # show code chunks in output
  tidy = TRUE,      # tidy the displayed code
  message = FALSE,  # hide all messages
  warning = FALSE,  # hide all warnings
  comment = "",
  tidy.opts=list(width.cutoff=100), # set width of code chunks in output
  size="small"      # set code chunk size
)
```
\
<!-- install packages -->
```{r, load packages, eval=T, include=T, cache=F, message=F, warning=F, results='hide',echo=F}
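packages <- c("ggplot2","ggthemes","dplyr","tidyverse","zoo","RColorBrewer","viridis","plyr")
# install only the packages that are not installed yet
missing_packages <- packages[!packages %in% rownames(installed.packages())]
if (length(missing_packages) > 0) {
  install.packages(missing_packages, dependencies = TRUE)
}
# install tvthemes from GitHub if it is not available
if (!requireNamespace("tvthemes", quietly = TRUE)) {
  devtools::install_github("Ryo-N7/tvthemes")
}
# load all packages
lapply(packages, library, character.only = TRUE)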
```
<!-- ____________________________________________________________________________ -->
<!-- ____________________________________________________________________________ -->
<!-- ____________________________________________________________________________ -->
<!-- start body -->
# `r paste0(params$lesson,": ",params$title)`
\
Functions for `r params$lesson`
`r params$functions`
\
Packages for `r params$lesson`
`r params$packages`
\
# Agenda
Use the `purrr` package to apply functions to lists and vectors.
[`r params$bookchapter_name`](`r params$bookchapter_section`).
\
<!-- ----------------------- image --------------------------- -->
<div align="center">
<img src="img/purrr.jpg" style=width:50%>
</div>
<!-- ----------------------- image --------------------------- -->
\
<!-- end yaml template------------------------------------------------------- -->
# Do First
Recreate the plot below using the smaller NYC Airbnb dataset. The curve is a loess smoother (`method = "loess"`). To change the legend title, pass the (unintuitive) `colour = "your legend title"` argument to `labs()`.
```{r}
# smaller csv file (16 cols)
url <- "http://data.insideairbnb.com/united-states/ny/new-york-city/2021-04-07/data/listings.csv.gz"
nyc <- readr::read_csv(url)
nyc <- nyc[nyc$id < 1000000,] # keep a smaller subset of the data
```
```{r, echo=F, eval=T, out.width="100%"}
# require() loads one package per call, so load each package separately
library(ggthemes)
library(ggplot2)
library(readr)
# smaller csv file (16 cols)
url <- "http://data.insideairbnb.com/united-states/ny/new-york-city/2021-04-07/data/listings.csv.gz"
nyc <- read_csv(url)
nyc <- nyc[nyc$id < 1000000,] # keep a smaller subset of the data
colv = "#00060a"
ggplot(data = nyc, aes(x = reviews_per_month,y = number_of_reviews)) +
geom_point(aes(color=neighbourhood_group_cleansed),show.legend = T,alpha=0.5) +
geom_smooth(color = colv, fill = colv, method="loess",alpha=0.3) +
labs(title = "Reviews across NYC boroughs",
caption = "Source: NYC Airbnb data",
x = "Reviews per month",
y = "Number of reviews",
colour = "Boroughs") +
theme_classic()
```
# Create some data in a list
First, generate some random data.
```{r,eval=T}
s1 <- sample(10) # random permutation of the integers 1 to 10
s2 <- rnorm(10,500) # 10 normally distributed random numbers with mean 500 (default sd = 1)
s3 <- runif(10) # 10 uniformly distributed random numbers between 0 and 1
s1
s2
s3
```
Now combine these into a list using `list()`
```{r,eval=T}
ls1 <- list(s1,s2,s3) # create a list of these data
ls1
ls1 %>% str
```
# Exercise 0
## List indexing
Print the `ls1` list object and take note of the index and elements
```{r}
ls1
# index
ls1[1]
ls1[2]
ls1[3]
# elements
ls1[[1]][[1]]
ls1[[1]][[3]]
ls1[[2]][[10]]
ls1[[3]][[11]] # ??
ls1[[3]] %>% length
# what's the difference?
ls1[1]
ls1[[1]]
```
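
As a minimal sketch of the single versus double bracket distinction, the chunk below uses a small hypothetical list `x` (not part of the lesson data). Single brackets keep the list wrapper, while double brackets return the element itself.

```{r}
# hypothetical list used only for illustration
x <- list(c(5, 6, 7), letters[1:2])

sub <- x[1]   # single brackets: a list of length 1 that contains the vector
el <- x[[1]]  # double brackets: the numeric vector itself

class(sub)    # "list"
class(el)     # "numeric"
el[[2]]       # 6, the second value of the first element
```
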
## Apply functions
The `purrr` package provides functions that apply a function iteratively to each element of a list or vector.
`map` Apply a function to each element of a list
```{r}
library(purrr)
set.seed(12) # set a seed so that random data generated after this point is reproducible
map(ls1,mean) # get the mean
```
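
`map` always returns a list. When each result is a single value, the typed variants `map_dbl` and `map_int` return an atomic vector instead, which is often easier to work with. The sketch below uses a hypothetical list `scores` rather than the lesson's random data, so the results are predictable.

```{r}
library(purrr)

# hypothetical list used only for illustration
scores <- list(c(1, 2, 3), c(10, 20), c(4, 4, 4, 4))

means_list <- map(scores, mean)     # a list with one mean per element
means_vec <- map_dbl(scores, mean)  # a numeric vector: 2 15 4
lens <- map_int(scores, length)     # an integer vector: 3 2 4
```
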
# Exercise 1
Use `map` to apply the following summary functions to each element of the `ls1` list
* `sum`
* `summary`
* `max`
* `sqrt`
* `length` and `lengths`
What happens when you run the following and why?
```{r}
mean(ls1)
sum(ls1)
```
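
One way to explore this is sketched below with a hypothetical list `nums`. Base summary functions expect an atomic vector, so `mean` on a list returns `NA` with a warning and `sum` on a list raises an error. Either apply the function per element or flatten the list first.

```{r}
library(purrr)

# hypothetical list used only for illustration
nums <- list(c(1, 2, 3), c(10, 20))

mean_of_list <- suppressWarnings(mean(nums))  # NA: a list is not numeric
sum_of_list <- tryCatch(sum(nums), error = function(e) "error")  # sum() fails on a list

per_element <- map_dbl(nums, sum)  # sum within each element: 6 30
overall <- sum(unlist(nums))       # flatten first, then sum: 36
```
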
\
# Exercise 2
## Filter lists
* `pluck`: select an element by name or index
* `keep`: select elements that pass a logical test
* `discard`: select elements that do not pass a logical test
* `compact`: drop empty elements

```{r,eval=F}
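pluck(ls1,3) # returns the element itself (a numeric vector), not a sub-list
ls1[[3]] # base R equivalent of the above
func <- function(x) mean(x) > 10 # create a logical test (a predicate function)
keep(ls1, func) # elements whose mean is greater than 10
discard(ls1, func) # elements whose mean is not greater than 10
ls2 <- list(1,NA,NULL,integer(0),list()) # list of empty and null things
compact(ls2) # drops NULL and zero-length elements; NA is kept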
```
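
The logical test can also be written inline with `purrr`'s formula shorthand, where `.x` stands for the current element. This is a minimal sketch on a hypothetical list `values`, chaining `compact` with `keep` and `discard`.

```{r}
library(purrr)

# hypothetical list used only for illustration
values <- list(c(1, 2, 3), c(100, 200), numeric(0), NULL)

non_empty <- compact(values)                  # drops numeric(0) and NULL
big <- keep(non_empty, ~ mean(.x) > 10)       # keeps c(100, 200)
small <- discard(non_empty, ~ mean(.x) > 10)  # keeps c(1, 2, 3)
```
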
Store plots in lists for easy retrieval. Create two plots of the `ls1` data (called `ls1p` and `ls2p`) and store them in a list called `plot_list`.
\
First, turn the list into a data frame so that `ggplot` can work with it.
```{r}
ls1_df <- ls1 %>%
data.frame
names(ls1_df) <- c("A","B","C")
```
```{r,echo=F,eval=T}
set.seed(10)
ls1_df <- ls1 %>%
data.frame
# create some plots
names(ls1_df) <- c("A","B","C")
ls1p <- ggplot(ls1_df,aes(A,B,size=A))+geom_point(color="orange",show.legend = F) + geom_line(color="orange",show.legend = F) + labs(title="Access plots as list indices") + theme_minimal()
ls2p <- ggplot(ls1_df)+geom_point(aes(A,C,color=A,size=A),show.legend = F) + theme_minimal()
# store in list
plot_list <- list(ls1p,ls2p)
```
```{r,eval=F}
plot_list <- list(ls1p,ls2p)
```
Display one of your plots by extracting it from the `plot_list` object
```{r,echo=F,eval=T}
# pluck individual plots from list
names(plot_list) <- c("Plot1","Plot2")
pluck(plot_list,"Plot2")
```
# Exercise 3
## Summarise lists
* `every`: do all elements pass a test?
* `some`: do some elements pass a test?
* `has_element`: does a list contain an element?
* `detect`: find the first element to pass
* `detect_index`: find the index of the first element to pass
* `vec_depth`: return the depth (number of levels of indexes)

```{r, eval=F}
ls1 %>% every(is.character)
ls1 %>% some(is.character)
ls1 %>% has_element("foo")
ls1 %>% detect(is.character)
ls1 %>% detect_index(is.character)
ls1 %>% vec_depth # deprecated since purrr 1.0.0, where `pluck_depth()` replaces it
```
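
Because `ls1` contains only numeric vectors, most of the character tests above come back empty or `FALSE`. The sketch below uses a hypothetical mixed list `mixed` so that each summary function has something to find.

```{r}
library(purrr)

# hypothetical list used only for illustration
mixed <- list(1:3, "foo", c(2.5, 4))

all_numeric <- every(mixed, is.numeric)          # FALSE: "foo" is not numeric
any_character <- some(mixed, is.character)       # TRUE
has_foo <- has_element(mixed, "foo")             # TRUE
first_chr <- detect(mixed, is.character)         # "foo"
first_chr_at <- detect_index(mixed, is.character) # 2
```
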
# Exercise 4
## Transform lists
* `modify`: apply a function to each element
* `modify_at`: apply a function to elements selected by name or index
* `modify_if`: apply a function to elements that pass a test
* `modify_depth`: apply a function to each element at a given level of a list

```{r}
ls1_repeat <- list(list(list(ls1))) # nest ls1 inside three more lists
ls1_repeat %>% map(mean) # returns NA with a warning: the numeric vectors sit too deep for `map`
ls1_repeat %>% modify_depth(4,mean) # apply `mean` at depth 4, where the numeric vectors are
```
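
As a short sketch of the selective variants, the chunk below applies `modify_if` and `modify_at` to a hypothetical named list `items`. Elements that are not selected are returned unchanged.

```{r}
library(purrr)

# hypothetical named list used only for illustration
items <- list(a = 1:3, b = "text", c = c(0.5, 1.5))

doubled <- modify_if(items, is.numeric, function(x) x * 2)  # only a and c change
shouted <- modify_at(items, "b", toupper)                   # only b changes
```
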
# Further useful `purrr` functions
* `pmap`: apply a function to groups of elements from lists of lists
* `lmap`: apply a function to each list-element of a list or vector
* `imap`: apply a function to each element of a list or vector and its index

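
A minimal sketch of two of these, using hypothetical inputs: `pmap_dbl` walks several vectors in parallel and passes the matching elements as arguments, and `imap_chr` passes each element together with its name (or its position, if the input is unnamed).

```{r}
library(purrr)

# hypothetical inputs used only for illustration
args <- list(x = c(1, 2, 3), y = c(10, 20, 30))
totals <- pmap_dbl(args, function(x, y) x + y)  # 11 22 33

named <- list(a = 1, b = 2)
labels <- imap_chr(named, function(value, name) paste0(name, "=", value))  # "a=1" "b=2"
```
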