forked from anabento/R_Bootcamp
-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathM2DataTypes.Rmd
More file actions
308 lines (206 loc) · 9.33 KB
/
M2DataTypes.Rmd
File metadata and controls
308 lines (206 loc) · 9.33 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
---
title: ""
author: ""
date: "`r Sys.Date()`"
output:
html_document:
toc: true
toc_float:
collapsed: true
smooth_scroll: false
---
#<span style="color:cadetblue">Data Types</span>
***
This module will cover:
- assigning objects
- data structure and syntax
- troubleshooting through Rstudio
- data subsetting
```{r settings, echo=FALSE}
knitr::opts_chunk$set(echo = TRUE, eval=FALSE)
```
##<span style="color:cadetblue"> `R` Objects </span>
Everything in `R` is an object. But what does this mean to someone who doesn't have a computer science background?
`R` doesn't let users directly access the computer's memory, instead users access many different specialized data structures. You can think of this design as a filing cabinet. One drawer contains textbooks, another pictures, another data tables, and so on. In this analogy, each drawer is a different object class.
If we look in the textbook drawer, each book or object has a similar structure (front cover, title page, table of contents, etc.). Just like each book has a unique title, `R` objects have user assigned names.
Objects in `R` are often assigned a name by using an <span style="color:orangered">**arrow**</span> made from a the <span style="color:orangered">**less than**</span> sign and <span style="color:orangered">**dash**</span> <-
```{r make an object}
myname <- "Odum" #this line assigns "Odum" to an object 'myname'
myname #print the content of 'myname' object by running it in the console
```
The information in the object is retrived by calling the object's name.
## <span style="color:cadetblue">Data Types</span>
One of the first concepts we'll need to understand how `R` organizes information. This is <span style="color:orangered"> **_very important_**</span> because nearly every task requires manipulating data. `R` has 5 basic <span style="color:orangered"> ('*atomic*')</span> classes :
* logical (`TRUE`, `FALSE`)
* character ("blue", "purple")
* integer (1,2,3) *each* number must be followed by `L`
* numeric (1,1.5,2)
* complex (i)
<span style="color:orangered">**_Concept Check_**:</span> What is the class of `myname`?
Based on the decsription above, we can guess that is should be a character. We can use a few base `R` functions to get information about objects. Some helpful functions include:
* typeof()
* length()
* attributes()
* str()
<span style="color:orangered">**Practice**:</span> Verify `myname` class type and determine the length. You may need to pull up the documentation page for the functions.
```{r character class}
#What is the class
?typeof() #look up documentation for typeof()
typeof(myname) #return the object type of "myname"
#How long is the object
?length()
length(myname)
```
This object has a length of one. Why?
Let's make an numeric class object with a length greater than 1
```{r numeric class}
y <- c(1:10) #colon indicates '1 to 10'
typeof(y)
length(y)
attributes(y)
```
<span style="color:orangered">**_Concept Check_**:</span> How could we make `myname` longer than 1?
##<span style="color:cadetblue">Data Structure</span>
These 5 classes or modes can then be combind to create many different types of *data structure*. Common data structures include:
* vector
* matrix
* list
* factors
* data frame
These data structures can consist of either a single class or a mixture of data types. Let's look at examples of each type.
###<span style="color:cadetblue">Vectors</span>
```Vectors``` are the most common data structure. Vectors (the `R` data ) are vectors of elements can be character, logical, numeric or integer class.
```{r Vectors}
#concatenating objects into a vector using "c"
character_vector <- c("a", "b", "c")
numeric_vector <- c(1, 10, 49)
boolean_vector <- c("TRUE", "FALSE", "TRUE")
```
You can creating ```Vectors``` using handy functions
```{r Vectors1}
vec1 <- rep(1:4, 2)
vec2 <- rep(1:4, each = 2)
vec3<-seq(1, 15, by = 2)
```
You can transform and use operators to create other ```vectors```. Explore the difference between `vec4` and `vec5`. How was the addition done in both cases?
```{r Vectors2}
vec4<-vec1+vec2+vec3
vec5<-sum(vec1,vec2,vec3)
```
There are many different ways to manipulate `vectors`. These methods will come in handy when exploring and plotting data. Accessing specific ```elements``` in the ```vector``` can be done based on the value of the `element` or the `element` location in the string.
<span style="color:orangered">**Practice**:</span> Determine how the command is accesses the vector. Complete the comment where needed.
```{r Vectors3}
vec4[1] # first elements
vec4[1:3] # # What happened here?
vec4[c(1,2,4)] # first second and fourth elements
vec6 <- c(1:20) #creating a vector
x <- c(3, 5, 12) # creating another vector
vec6[x]# the elements of vec6 that coicinde with vector x
vec6[-x] # What happened here?
vec7 <- c(1:5,2,2,2,2,2) #creating a vector
vec7[vec7 > 4] # elements bigger than 4
vec7[vec7 < 4] # # What happened here?
which(vec7 < 4) # positions
which(vec7 == 2) # position
```
###<span style="color:cadetblue">Matrices</span>
```Matrices``` are a special type of vector. The data is organized into columns and rows giving the matrix dimensions. There are many ways to make a matrix, but the default is to fill column-wise.
```{r Matrices}
rnames <- c("R1", "R2")
cnames <- c("C1", "C2")
m<-matrix(data = NA, nrow = 2, ncol = 2, byrow=TRUE,
dimnames=list(rnames, cnames))
```
<span style="color:orangered">**Practice**:</span> Add more rows and columns to the **matrix** and fill it with 1s and change the names of rows and columns
```{r Matrices1}
m<-matrix(data = , nrow = , ncol = , byrow = TRUE, dimnames = list(rnames, cnames))
```
Another way to build a matrix is to combine vectors by rows (rbind) or columns (cbind)
```{r Matrices 2}
m1<-rbind(vec1,vec2)
m2<-cbind(vec2,vec3,vec4)
```
Subsetting from a ``` matrix```. Note that here we use two values for subsetting.
```{r Matrices 3}
# All elements in first row of m1
m1[1,]
#All elements in first column of m1
m1[,1]
```
<span style="color:orangered">**Practice**:</span> Write the command that is descibed in the comments.
```{r Matrices 4}
# All elements in second row of m2
#All elements in third column of m2
#The elements in the bottom 2 rows, and the second and third column of m2
```
###<span style="color:cadetblue">Lists</span>
```Lists``` are really containers for vectors and matrices. They can hold different data classes in each entry of the list including lists! This means you can make lists of lists, which is why lists are considered a recursive vector.
*Lists of lists of lists can be confusing*
```{r Lists}
#make a list
w <- list(name="Odum", mynumbers=vec1, mymatrix=m, age=5.3)
primes <- list(
2, 3, 4:5,
6:7, 8:11, 12:13,
14:17, 18:19, 20:23)
```
<span style="color:orangered">**Practice**:</span> Follow comments to interact with the list
```{r Lists 2}
#check the class of an entry
#determine the structure of w
#subsetting
w[[2]] ## what is this?
w[[2]][1] #what is this? How is it different than last command?
```
###<span style="color:cadetblue">Factors</span>
```Factors``` are used for ordered or unordered categorical data. This data type is important for modeling and plotting work. Factors can only contain pre-defined values and are really a way at add labels to integers.
order categories: `low`,`med`, `high`
unorder categories: `blue`, `red`
```{r Factors1}
dat<-c('low','med', 'high')
# check if it is a factor
print(is.factor(dat))
#Making the vector a into a factor
factor_dat <- factor(dat)
#check again
#Changing the order
new_order_dat <- factor(factor_dat,levels = c("med","high","low"))
```
<span style="color:orangered">**Practice**:</span> Making a ```vector``` into a ```factor```
```{r Factors2}
hues <- c(rep("blue",20), rep("red", 30))
# make hues a factor
#summarize hues
summary(hues)
```
###<span style="color:cadetblue">Data frames</span>
```Data frames``` are the most common way we organize ecological data. They have the freedom of lists and the organization of a matrix. Each column can be a different data class. This means for a single observation or row in a data frame categorical, numeric, factorial, etc. can be included.
```{r Data frames}
# creating vectors
n = c(2, 3, 5)
s = c("aa", "bb", "cc")
b = c(TRUE, FALSE, TRUE)
# creating the data frame
df = data.frame(n, s, b)
names(df) <- c("ID","Code","Passed") # variable names
```
Like with the ```matrix``` we can subset a ```data frame```
```{r Data frames1}
# subsetting a column
df[,1]
# subsetting a row
df[1,]
# the $ for subsetting
df$ID
```
##<span style="color:cadetblue"> Useful functions</span>
head() - see first 6 rows
tail() - see last 6 rows
dim() - see dimensions
nrow() - number of rows
ncol() - number of columns
str() - structure of each column
names() - will list the names attribute for a data frame (or any object really), which gives the column names.
## <span style="color:cadetblue">Going further</span>
The `R` community lead by Garrett Grolemund and Hadley Wickham has developed work flows that are more efficient for common data science manipulations. Check out the [Tidyverse](https://www.tidyverse.org/) and [R for Data Science](http://r4ds.had.co.nz/).
In general, if you're searching for a quick reference for `R` [statmethods](http://www.statmethods.net/index.html) is a good source.