diff --git a/modules/Subsetting_Data_in_R/Subsetting_Data_in_R.Rmd b/modules/Subsetting_Data_in_R/Subsetting_Data_in_R.Rmd
index 5f260c1e..aa29783c 100644
--- a/modules/Subsetting_Data_in_R/Subsetting_Data_in_R.Rmd
+++ b/modules/Subsetting_Data_in_R/Subsetting_Data_in_R.Rmd
@@ -28,7 +28,7 @@ We are constantly making improvements.
 - The sequence `seq()` function helps you create numeric vectors (`from`,`to`, `by`, and `length.out` arguments)
 - The repeat `rep()` function helps you create vectors with the `each` and `times` arguments
 - Reproducible science makes everyone's life easier!
-- `readr`has helpful functions like `read_csv()` that can help you import data into R
+- The `readr` package has helpful functions like `read_csv()` that can help you import data into R
 
 📃 [Day 2 Cheatsheet](https://daseh.org/modules/cheatsheets/Day-2.pdf)
 
@@ -204,6 +204,14 @@ head(er)
 
 # Renaming Columns
 
+## Why rename?
+
+Renaming can:
+
+- make it easier to work with your data
+- make your column names more compatible with R
+- make your column names more interpretable by others
+
 ## `rename` function
 
 ```{r, fig.alt="dplyr", out.width = "70%", echo = FALSE, fig.align='center'}
@@ -414,6 +422,20 @@ C. Keeping it as is and use quotes around the column name when you use it.
 
 # Subsetting Columns
 
+## Why Subset?
+
+Subsetting involves grabbing specific parts of your data to:
+
+- Produce a smaller dataset
+- Examine specific subsets of your data
+- Use a particular part of the data for a specific analysis/visualization
+
+
+Be cautious about removing columns/variables as you might find they are useful later.
+
+You should be guided by your questions of interest.
+
+
 ## Let's get our data again
 
 We'll work with the CO heat-related ER visits dataset again.