From b7e041ec968f17ecd525927968eb367c77cb6fdf Mon Sep 17 00:00:00 2001 From: Ansh Pathak Date: Fri, 18 Jul 2025 00:22:59 -0500 Subject: [PATCH 1/4] onea --- inst/tutorials/99-overview/tutorial.Rmd | 327 +++++++++++++++++------- 1 file changed, 235 insertions(+), 92 deletions(-) diff --git a/inst/tutorials/99-overview/tutorial.Rmd b/inst/tutorials/99-overview/tutorial.Rmd index 7d66662..81aedc6 100644 --- a/inst/tutorials/99-overview/tutorial.Rmd +++ b/inst/tutorials/99-overview/tutorial.Rmd @@ -74,15 +74,10 @@ options(tutorial.exercise.timelimit = 600, -The smallest unit at which data are made available from the decennial US Census is the block, and the smallest unit available in the ACS is the block group, which represents a collection of blocks. Other surveys are generally available at higher levels of aggregation. - +This tutorial covers an overview of [Analyzing US Census Data](https://walker-data.com/census-r/index.html) by Kyle Walker. You will learn about using the [**tidycensus**] package for collecting, interacting, and plotting US Census data. You will mainly focus on collecting data from the Decennial Census and the American Community Survey (ACS). - -## Texas Income -### - - + ### Exercise 1 @@ -106,7 +101,7 @@ If that fails, it is probably because you have not yet loaded `library(tutorial. CP/CR. -```{r texas-income-1} +```{r introduction-1} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -138,7 +133,7 @@ tutorial.helpers::show_file("TexasIncome.qmd", chunk = "Last") CP/CR. -```{r texas-income-2} +```{r introduction-2} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -159,7 +154,7 @@ Note that this causes `library(tidyverse)` to be copied down to the Console and CP/CR. 
-```{r texas-income-3} +```{r introduction-3} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -170,17 +165,25 @@ question_text(NULL, ### -Always pair `get_acs()` with **tidyverse** functions for filtering, transforming, and visualizing your data. +This is how professionals work, they type code in the quarto document and send it down to the console to execute it! -### Exercise 4 +## Texas Income + + +A critical part of the Census data analysis process is data visualization, where an analyst examines patterns and trends found in their data graphically. This first section illustrates some examples for getting started with exploratory Census data visualization with [ggplot2](https://ggplot2.tidyverse.org/). You will be using the `get_acs()` and the `geom_sf()` functions as well. + + + + +### Exercise 1 Ask AI to use **tidycensus** to get data on the median household income for all counties in Texas for 2020. -Put what it gives you in your code chunk and do `Ctrl/Cmd + Enter` to send it to the Console. +Put what it gives you in a new code chunk and do `Ctrl/Cmd + Enter` to send it to the Console. CP/CR. -```{r texas-income-4} +```{r texas-income-1} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -194,20 +197,29 @@ question_text(NULL, Our answer: ```` - +income_tx <- get_acs( + geography = "county", + variables = "B19013_001", + state = "TX", + year = 2020, + geometry = FALSE +) ```` -Tip: Always inspect your output using `glimpse()` or `head()` before plotting. +### Exercise 2 -### Exercise 5 +Copy and paste our code to replace it with what you have. -Now ask your AI to include `geometry = TRUE` -Put what it gives you in your code chunk and do `Ctrl/Cmd + Enter` to send it to the console +In the Console, run: -CP/CR +``` +tutorial.helpers::show_file("TexasIncome.qmd", chunk = "Last") +``` -```{r texas-income-5} +CP/CR. 
+ +```{r texas-income-2} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -218,12 +230,32 @@ question_text(NULL, ### - The `geometry = TRUE` argument returns spatial polygons, useful for maps and spatial analysis. +`get_acs()` is part of the **tidycensus** package and allows downloading American Community Survey (ACS) data. -### Exercise 6 +### Exercise 2 -Here is our code. It is okay if your code is different. That will happen when using AI! -Replace your code with what it gave you using this code: +Using our code, set `geometry = TRUE`. + +Take your code chunk and do `Ctrl/Cmd + Enter` to send it to the Console. + +In the Console, run: + +``` +tutorial.helpers::show_file("TexasIncome.qmd", chunk = "Last") +``` + +CP/CR. + +```{r texas-income-2} +question_text(NULL, + answer(NULL, correct = TRUE), + allow_retry = TRUE, + try_again_button = "Edit Answer", + incorrect = NULL, + rows = 6) +``` + +### ```` income_tx <- get_acs( @@ -231,17 +263,56 @@ income_tx <- get_acs( variables = "B19013_001", state = "TX", year = 2020, - geometry = FALSE + geometry = TRUE ) ```` + The `geometry = TRUE` argument returns spatial polygons, useful for maps and spatial analysis. + +### Exercise 4 + +Let's explore the data, run `glimpse(income_tx)` in the console. + +CP/CR. + +```{r texas-income-4} +question_text(NULL, + answer(NULL, correct = TRUE), + allow_retry = TRUE, + try_again_button = "Edit Answer", + incorrect = NULL, + rows = 3) +``` + ### -`get_acs()` is part of the tidycensus package and allows downloading American Community Survey (ACS) data. +### Exercise 4 + +Let's get some quick stats for each column of the data. -### Exercise 7 +Run `summary(income_tx)` in the console. + +CP/CR. -Now you will use AI to generate code that creates a plot of median household income in Texas counties. 
+ +```{r texas-income-4} +question_text(NULL, + answer(NULL, correct = TRUE), + allow_retry = TRUE, + try_again_button = "Edit Answer", + incorrect = NULL, + rows = 3) +``` + +### + + + + + +### Exercise 4 + +Use AI to generate code that creates a plot of median household income in Texas counties. Send our code to the console from the previous exercise. @@ -249,7 +320,7 @@ Now, type `income_tx` in the console. CP/CR the first few lines. -```{r texas-income-7} +```{r texas-income-4} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -263,16 +334,16 @@ question_text(NULL, `ggplot2` is ideal for Census data due to its support for basic charts, group comparisons, and geospatial visualizations. -### Exercise 8 +### Exercise 5 -Now copy the first few lines into your AI and say that you are working with tidyverse. Tell it to take the data in `income_tx` and make a choropleth map of median household income. +Now copy the first few lines into your AI and say that you are working with **tidyverse**. Tell it to take the data in `income_tx` and make a choropleth map of median household income. -Put what it gives you in your code chunk and do `Ctrl/Cmd + Enter` to send it to the console +Put what it gives you in a new code chunk and do `Ctrl/Cmd + Enter` to send it to the console -CP/CR +CP/CR. -```{r texas-income-8} +```{r texas-income-5} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -283,18 +354,23 @@ question_text(NULL, ### -You can create histograms, bar charts, and scatterplots of Census variables using `ggplot()`, often starting with `aes(x = estimate)` or `aes(x = var1, y = var2)` +Our answer: -### Exercise 9 +```` +ggplot(income_tx) + + geom_sf(aes(fill = estimate), color = "white", size = 0.2) +```` + +### Exercise 6 Now, ask the AI to color counties by the estimate column and add an approprite title and theme. 
-Put what it gives you in your code chunk and do `Ctrl/Cmd + Enter` to send it to the console +Replace your code with this code in your chunk and do `Ctrl/Cmd + Enter` to send it to the console -CP/CR +CP/CR. -```{r texas-income-9} +```{r texas-income-6} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -308,7 +384,7 @@ question_text(NULL, When mapping spatial Census data, use `geom_sf()` in `ggplot2` and apply `fill = estimate` to show choropleth patterns. -### Exercise 10 +### Exercise 7 Here is our code. It is okay if your code is different. That will happen when using AI! Replace your code with what it gave you using this code: @@ -339,14 +415,16 @@ ggplot(income_tx) + theme_minimal() ``` -### Exercise 11 +### Exercise 8 -1. In the Console, run the following command to display the last chunk of your `.qmd` file: CP/CR +In the Console, run the following command to display the last chunk of your `.qmd` file: CP/CR -tutorial.helpers::show_file("TexasIncome.qmd", chunk = "last") +``` +tutorial.helpers::show_file("TexasIncome.qmd", chunk = "Last") +``` -```{r texas-income-11} +```{r texas-income-8} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -357,12 +435,12 @@ question_text(NULL, ### -The `show_file()` function from tutorial.helpers is a convenient way to check the contents of files without leaving R. It helps confirm that your edits were saved properly. +The `show_file()` function from **tutorial.helpers** is a convenient way to check the contents of files without leaving R. It helps confirm that your edits were saved properly. ## California Bachelors Degree - +In this activity, we will be using **ggplot**, `geom_sf`, and be modifying data to turn it into percentages. ### Exercise 1 @@ -373,10 +451,7 @@ In this exercise, we’ll get the percentage of adults with a bachelor’s degre ### -Did you know? 
-Visualizing Maps with `geometry = TRUE` -Passing `geometry = TRUE` to `get_acs()` returns spatial geometry as an `sf` object, which works well with `ggplot2` for choropleth maps. ### Exercise 2 @@ -392,7 +467,7 @@ execute: In the Console, run: ``` -tutorial.helpers::show_file("CaliforniaBachelors.qmd", start = -5) +tutorial.helpers::show_file("CaliforniaBachelors.qmd", chunk = "Last") ``` CP/CR. @@ -433,9 +508,9 @@ Working in the console like this is how professionals work! ### Exercise 4 -- Ask an AI assistant (like ChatGPT) to generate R code that uses tidycensus to get educational attainment variables for all California counties in 2020 and save it in a variable called `edu_ca` +Ask AI to generate R code that uses tidycensus to get educational attainment variables for all California counties in 2020 and save it in a variable called `edu_ca` -Put what it gives you in your code chunk and do `Ctrl/Cmd + Enter` to send it to the console +Put what it gives you in a new code chunk and do `Ctrl/Cmd + Enter` to send it to the console CP/CR @@ -452,18 +527,34 @@ question_text(NULL, ### -Did you know? You can use `load_variables()` and filter/search the resulting data frame to explore variable descriptions and codes, such as "`B19013_001`" for median household income. +Our code: -### Exercise 5 +```` +edu_ca <- get_acs( + geography = "county", + variables = c("B15003_001", "B15003_022", "B15003_023", "B15003_024", "B15003_025"), + state = "CA", + year = 2020, + geometry = FALSE, + summary_var = "B15003_001" +) +```` -Now, tell your AI to add include `geometry = TRUE` +### Exercise 2 -Put what it gives you in your code chunk and do `Ctrl/Cmd + Enter` to send it to the console +Using our code, set `geometry = TRUE`. -CP/CR +Take your code chunk and do `Ctrl/Cmd + Enter` to send it to the Console. + +In the Console, run: + +``` +tutorial.helpers::show_file("CaliforniaBachelors.qmd", chunk = "Last") +``` +CP/CR. 
-```{r california-bachelors-degree-5} +```{r texas-income-2} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -474,14 +565,6 @@ question_text(NULL, ### -The American Community Survey (ACS) provides annual demographic, economic, and housing data based on samples, while the Decennial Census gives a complete count every 10 years. - -### Exercise 6 - -Here is our code. It is okay if your code is different. That will happen when using AI! -Replace your code with what it gave you using this code: - -```` edu_ca <- get_acs( geography = "county", variables = c("B15003_001", "B15003_022", "B15003_023", "B15003_024", "B15003_025"), @@ -490,23 +573,36 @@ edu_ca <- get_acs( geometry = TRUE, summary_var = "B15003_001" ) -```` -### +The American Community Survey (ACS) provides annual demographic, economic, and housing data based on samples, while the Decennial Census gives a complete count every 10 years. -The `get_acs()` function is powerful for pulling American Community Survey (ACS) data. +### Exercise 4 -### Exercise 7 +Let's explore the data, run `glimpse(edu_ca)` in the console. -We’ll now make a choropleth map of bachelor’s degree attainment across California counties. +CP/CR. -Send our code to the console from the previous exercise. +```{r texas-income-4} +question_text(NULL, + answer(NULL, correct = TRUE), + allow_retry = TRUE, + try_again_button = "Edit Answer", + incorrect = NULL, + rows = 3) +``` -Now, type `edu_ca` in the console. +### -CP/CR the first few lines. +### Exercise 4 -```{r california-bachelors-degree-7} +Let's get some quick stats for each column of the data. + +Run `summary(edu_ca)` in the console. + +CP/CR. + + +```{r texas-income-4} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -517,40 +613,38 @@ question_text(NULL, ### -Geometry must be `TRUE` if we want to map later. - -### Exercise 8 +### Exercise 7 -Now copy the first few lines into your AI and say that you are working with tidyverse. 
Tell it to take the data in `edu_ca` and make a choropleth map of at least bachelor’s degree attainment or higher across California counties. +We’ll now make a choropleth map of bachelor’s degree attainment across California counties. -Tell it to also use `mutate()` to calculate the percentage of the population with at least a bachelor’s degree and pipe it into a `ggplot()` +Send our code to the console from the previous exercise. -Put what it gives you in your code chunk and do `Ctrl/Cmd + Enter` to send it to the console +Now, type `edu_ca` in the console. -CP/CR +CP/CR the first few lines. -```{r california-bachelors-degree-8} +```{r california-bachelors-degree-7} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, - rows = 6) + rows = 3) ``` ### -For educational attainment, we use `B15003_022` through `B15003_025` to sum all individuals with a bachelor’s degree or more, then divide by the total population (variable `B15003_001`). +Geometry must be `TRUE` if we want to map later. -### Exercise 9 +### Exercise 8 -Now, tell it to also use `geom_sf(aes(fill = percent))` and a `scale_fill_viridis_c()` to make it look nice +Now copy the first few lines into your AI and say that you are working with tidyverse. 
Tell it to take the data in `edu_ca` and make a choropleth map of at least bachelor’s degree attainment or higher across California counties and use `mutate()` to calculate the percentage of the population with at least a bachelor’s degree and pipe it into a `ggplot()` -Put what it gives you in your code chunk and do `Ctrl/Cmd + Enter` to send it to the console +Put what it gives you in a new code chunk and do `Ctrl/Cmd + Enter` to send it to the console CP/CR -```{r california-bachelors-degree-9} +```{r california-bachelors-degree-8} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -561,11 +655,28 @@ question_text(NULL, ### -The `scale_fill_viridis_c()` function applies colorblind-friendly color scales to maps, using palettes like viridis, plasma, or magma. +Our Code: -### Exercise 10 +```` +edu_ca <- edu_ca %>% + group_by(GEOID) %>% + summarize( + percent = 100 * sum(estimate[variable != "B15003_001"]) / unique(summary_est) + ) + +ggplot(edu_ca) + + geom_sf(aes(fill = percent)) + + scale_fill_viridis_c(option = "C") + + labs(title = "Adult Percentage with at least a Bachelor's in CA (2020)", + fill = "% with Degree") + + theme_minimal() +```` + + +For educational attainment, we use `B15003_022` through `B15003_025` to sum all individuals with a bachelor’s degree or more, then divide by the total population (variable `B15003_001`). The `scale_fill_viridis_c()` function applies colorblind-friendly color scales to maps, using palettes like viridis, plasma, or magma. Here is our code. It is okay if your code is different. That will happen when using AI! + Replace your code with what it gave you using this code: ```` @@ -920,3 +1031,35 @@ The `show_file()` function from tutorial.helpers is a convenient way to check th ### Good Knowledge Drops +If the year is not specified, `get_acs()` defaults to the most recent five-year ACS sample. 
The data returned is similar in structure to that returned by `get_decennial()`, but includes an `estimate` column (for the ACS estimate) and `moe` column (for the margin of error around that estimate) instead of a `value` column.
+
+The geography parameter in `get_acs()` and `get_decennial()` allows users to request data aggregated to common Census enumeration units.
+Census blocks are available in `get_decennial()` but not in `get_acs()` as block-level data are not available from the American Community Survey. To request data within states and/or counties, state and county names can be supplied to the state and county parameters, respectively, [formatted in the way that they are accepted by the US Census Bureau API](https://walker-data.com/census-r/an-introduction-to-tidycensus.html#geography-and-variables-in-tidycensus).
+
+**tidycensus** accepts state names (e.g. "Wisconsin"), state postal codes (e.g. "WI"), and state FIPS codes (e.g. "55"), so an analyst can use what they are most comfortable with.
+
+This is the typical workflow. First, we figure out what variables we need and download them using a function like `get_acs()` or `get_decennial()`. We save the result to a permanent object like `az_race`. Then, we build pipes starting from that saved object.
+
+The `mutate()` function allows us to create, modify, and delete columns. More information about this function can be found [here](https://dplyr.tidyverse.org/reference/mutate.html).
+
+The attractive defaults of [ggplot2](https://ggplot2.tidyverse.org/) visualizations allow for the creation of legible graphics with little to no customization. This helps greatly with exploratory data analysis tasks where the primary audience is the analyst exploring the dataset. This section covers how to take a Census data visualization that is relatively illegible by default and polish it up for eventual presentation and export from R.
+
+While an analyst may be comfortable with the plot as-is, **ggplot2** allows for significant customization with respect to stylistic presentation. This includes styling the bars on the plot with a different color and internal transparency; changing the font; and customizing the axis tick labels.
+
+Once an analyst has settled on a visualization design, they may want to export their image from R to display on a website, in a blog post, or in a report. You can do so by clicking the button **Export** in the plot window and then **Save as Image**.
+
+The [ggsave()](https://ggplot2.tidyverse.org/reference/ggsave.html) function in ggplot2 will save the last plot generated to an image file in the user’s current working directory by default. The specified file extension will control the output image format, e.g. .png.
+
+Cleaning up the plot allows us to use some additional visualization options in **ggplot2**. In addition to specifying appropriate chart labels, we can format the axis tick labels by using appropriate `scale_*` functions in **ggplot2** and setting the X-axis limits to show both sides of 0 equally.
+
+We’ll also make use of an alternative **ggplot2** theme, `theme_minimal()`, which uses a white background with muted gridlines.
+
+The `separate()` function is used to split the `NAME` column into three separate columns: `TRACT`, `COUNTY`, and `STATE`. The `sep` argument is used to specify the character that separates the values in the `NAME` column.
+
+The `summarize()` function is used to calculate summary statistics for each group. In this case, we are calculating the minimum median home value for each county.
+
+The `scale_x_continuous()` function is used to format the x-axis labels as dollar values. The `breaks` argument is used to specify the breaks on the x-axis.
+
+The `scale_color_viridis_c()` function is used to adjust the color scheme of the plot. The `guide` argument is used to specify the type of legend to display.
+ +The `theme_void()` function strips the background grid and axis labels from the plot accordingly: \ No newline at end of file From c3cee2ccfa20ae02eae0b9f37eca14f5bcf46312 Mon Sep 17 00:00:00 2001 From: Ansh Pathak Date: Fri, 18 Jul 2025 12:15:50 -0500 Subject: [PATCH 2/4] added good knowledge drops and minor changes --- inst/tutorials/99-overview/tutorial.Rmd | 195 ++++++++++++++---------- 1 file changed, 116 insertions(+), 79 deletions(-) diff --git a/inst/tutorials/99-overview/tutorial.Rmd b/inst/tutorials/99-overview/tutorial.Rmd index 81aedc6..bf95ecc 100644 --- a/inst/tutorials/99-overview/tutorial.Rmd +++ b/inst/tutorials/99-overview/tutorial.Rmd @@ -232,7 +232,7 @@ question_text(NULL, `get_acs()` is part of the **tidycensus** package and allows downloading American Community Survey (ACS) data. -### Exercise 2 +### Exercise 3 Using our code, set `geometry = TRUE`. @@ -246,7 +246,7 @@ tutorial.helpers::show_file("TexasIncome.qmd", chunk = "Last") CP/CR. -```{r texas-income-2} +```{r texas-income-3} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -286,7 +286,7 @@ question_text(NULL, ### -### Exercise 4 +### Exercise 5 Let's get some quick stats for each column of the data. @@ -295,7 +295,7 @@ Run `summary(income_tx)` in the console. CP/CR. -```{r texas-income-4} +```{r texas-income-5} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -310,7 +310,7 @@ question_text(NULL, -### Exercise 4 +### Exercise 6 Use AI to generate code that creates a plot of median household income in Texas counties. @@ -320,7 +320,7 @@ Now, type `income_tx` in the console. CP/CR the first few lines. -```{r texas-income-4} +```{r texas-income-6} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -334,7 +334,7 @@ question_text(NULL, `ggplot2` is ideal for Census data due to its support for basic charts, group comparisons, and geospatial visualizations. 
-### Exercise 5 +### Exercise 7 Now copy the first few lines into your AI and say that you are working with **tidyverse**. Tell it to take the data in `income_tx` and make a choropleth map of median household income. @@ -343,7 +343,7 @@ Put what it gives you in a new code chunk and do `Ctrl/Cmd + Enter` to send it t CP/CR. -```{r texas-income-5} +```{r texas-income-7} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -361,16 +361,16 @@ ggplot(income_tx) + geom_sf(aes(fill = estimate), color = "white", size = 0.2) ```` -### Exercise 6 +### Exercise 8 -Now, ask the AI to color counties by the estimate column and add an approprite title and theme. +Now, tell the AI to color counties by the estimate column and add an approprite title and theme. Replace your code with this code in your chunk and do `Ctrl/Cmd + Enter` to send it to the console CP/CR. -```{r texas-income-6} +```{r texas-income-8} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -384,7 +384,7 @@ question_text(NULL, When mapping spatial Census data, use `geom_sf()` in `ggplot2` and apply `fill = estimate` to show choropleth patterns. -### Exercise 7 +### Exercise 9 Here is our code. It is okay if your code is different. That will happen when using AI! 
Replace your code with what it gave you using this code: @@ -415,7 +415,7 @@ ggplot(income_tx) + theme_minimal() ``` -### Exercise 8 +### Exercise 10 In the Console, run the following command to display the last chunk of your `.qmd` file: CP/CR @@ -424,7 +424,7 @@ In the Console, run the following command to display the last chunk of your `.qm tutorial.helpers::show_file("TexasIncome.qmd", chunk = "Last") ``` -```{r texas-income-8} +```{r texas-income-10} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -443,16 +443,12 @@ The `show_file()` function from **tutorial.helpers** is a convenient way to chec In this activity, we will be using **ggplot**, `geom_sf`, and be modifying data to turn it into percentages. -### Exercise 1 +### Select `File -> New File -> Quarto Document ...`. Provide a title -- `"CaliforniaBachelors"` -- and an author (you). Render the document and save it as `CaliforniaBachelors.qmd`. In this exercise, we’ll get the percentage of adults with a bachelor’s degree or higher in each California county. -### - - - ### Exercise 2 In your QMD, put `library(tidyverse)` and `library(tidycensus)` in a new code chunk. Press Ctrl/Cmd + Shift + K to render the file @@ -540,7 +536,7 @@ edu_ca <- get_acs( ) ```` -### Exercise 2 +### Exercise 5 Using our code, set `geometry = TRUE`. @@ -554,7 +550,7 @@ tutorial.helpers::show_file("CaliforniaBachelors.qmd", chunk = "Last") CP/CR. 
-```{r texas-income-2} +```{r california-bachelors-degree-5} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -565,6 +561,7 @@ question_text(NULL, ### +```` edu_ca <- get_acs( geography = "county", variables = c("B15003_001", "B15003_022", "B15003_023", "B15003_024", "B15003_025"), @@ -573,16 +570,17 @@ edu_ca <- get_acs( geometry = TRUE, summary_var = "B15003_001" ) +```` The American Community Survey (ACS) provides annual demographic, economic, and housing data based on samples, while the Decennial Census gives a complete count every 10 years. -### Exercise 4 +### Exercise 6 Let's explore the data, run `glimpse(edu_ca)` in the console. CP/CR. -```{r texas-income-4} +```{r california-bachelors-degree-6} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -593,7 +591,7 @@ question_text(NULL, ### -### Exercise 4 +### Exercise 7 Let's get some quick stats for each column of the data. @@ -602,7 +600,7 @@ Run `summary(edu_ca)` in the console. CP/CR. -```{r texas-income-4} +```{r california-bachelors-degree-7} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -613,7 +611,7 @@ question_text(NULL, ### -### Exercise 7 +### Exercise 8 We’ll now make a choropleth map of bachelor’s degree attainment across California counties. @@ -623,7 +621,7 @@ Now, type `edu_ca` in the console. CP/CR the first few lines. -```{r california-bachelors-degree-7} +```{r california-bachelors-degree-8} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -636,7 +634,7 @@ question_text(NULL, Geometry must be `TRUE` if we want to map later. -### Exercise 8 +### Exercise 9 Now copy the first few lines into your AI and say that you are working with tidyverse. 
Tell it to take the data in `edu_ca` and make a choropleth map of at least bachelor’s degree attainment or higher across California counties and use `mutate()` to calculate the percentage of the population with at least a bachelor’s degree and pipe it into a `ggplot()` @@ -644,7 +642,7 @@ Put what it gives you in a new code chunk and do `Ctrl/Cmd + Enter` to send it t CP/CR -```{r california-bachelors-degree-8} +```{r california-bachelors-degree-9} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -675,24 +673,8 @@ ggplot(edu_ca) + For educational attainment, we use `B15003_022` through `B15003_025` to sum all individuals with a bachelor’s degree or more, then divide by the total population (variable `B15003_001`). The `scale_fill_viridis_c()` function applies colorblind-friendly color scales to maps, using palettes like viridis, plasma, or magma. -Here is our code. It is okay if your code is different. That will happen when using AI! - -Replace your code with what it gave you using this code: -```` -edu_ca <- edu_ca %>% - group_by(GEOID) %>% - summarize( - percent = 100 * sum(estimate[variable != "B15003_001"]) / unique(summary_est) - ) -ggplot(edu_ca) + - geom_sf(aes(fill = percent)) + - scale_fill_viridis_c(option = "C") + - labs(title = "Adult Percentage with at least a Bachelor's in CA (2020)", - fill = "% with Degree") + - theme_minimal() -```` ```{r} #| message: false @@ -715,14 +697,14 @@ ggplot(edu_ca) + `ggplot2` can handle spatial data directly using `geom_sf()`. Use `mutate()` to calculate percentages, and pipe that into `ggplot()` for a map. -### Exercise 11 +### Exercise 10 1. 
In the Console, run the following command to display the last chunk of your `.qmd` file: CP/CR tutorial.helpers::show_file("CaliforniaBachelors.qmd", chunk = "last") -```{r california-bachelors-degree-11} +```{r california-bachelors-degree-10} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -742,13 +724,11 @@ The variables in `get_acs()` like `"B19013_001"` are codes that represent specif ## California Median Age -### Exercise 1 - -Select `File -> New File -> Quarto Document ...`. Provide a title -- `"CaliforniaAge"` -- and an author (you). Render the document and save it as `CaliforniaAge.qmd`. +### -In this exercise, you will collect median age data for all counties in California for the year 2020 using the `tidycensus` package. +In this exercise, you will collect and plot median age data for all counties in California for the year 2020 using the **tidycensus** package. -### +Select `File -> New File -> Quarto Document ...`. Provide a title -- `"CaliforniaAge"` -- and an author (you). Render the document and save it as `CaliforniaAge.qmd`. @@ -767,7 +747,7 @@ execute: In the Console, run: ``` -tutorial.helpers::show_file("CaliforniaAge.qmd", start = -5) +tutorial.helpers::show_file("CaliforniaAge.qmd", chunk = "last") ``` CP/CR. @@ -811,10 +791,10 @@ Working in the console like this is how professionals work! 
### Exercise 4 -Ask an AI assistant (such as ChatGPT) to generate R code that uses tidycensus data to get both the median age and populations for all counties in California for 2020 in a variable called `age_ca` +Ask AI to generate R code that uses tidycensus data to get both the median age and populations for all counties in California for 2020 in a variable called `age_ca` +Put what it gives you in a new code chunk and do `Ctrl/Cmd + Enter` to send it to the console -Put what it gives you in your code chunk and do `Ctrl/Cmd + Enter` to send it to the console CP/CR @@ -830,17 +810,35 @@ question_text(NULL, ### +Our code: + +```` +age_ca <- get_acs( + geography = "county", + variables = c(median_age = "B01002_001", population = "B01003_001"), + state = "CA", + year = 2020, +) +```` + AI is your best friend! Professionals use AI to generate code and plot. You however, need to be careful as it can and will generate much extra code that you may not need. + ### Exercise 5 -Now, tell your AI to set `geometry = FALSE` and only give the part to get the data into the variable. +Using our code, add `geometry = FALSE`. -Put what it gives you in your code chunk and do `Ctrl/Cmd + Enter` to send it to the console +Take your code chunk and do `Ctrl/Cmd + Enter` to send it to the Console. -CP/CR +In the Console, run: + +``` +tutorial.helpers::show_file("CaliforniaAge.qmd", chunk = "Last") +``` -```{r california-median-age-5} +CP/CR. + +```{r california-bachelors-degree-5} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -851,13 +849,6 @@ question_text(NULL, ### -Set geometry = TRUE in `get_acs()` or `get_decennial()` if you do not need spatial data. - -### Exercise 6 - -Here is our code. It is okay if your code is different. That will happen when using AI! 
-Replace your code with what it gave you using this code: - ```` age_ca <- get_acs( geography = "county", @@ -869,6 +860,43 @@ age_ca <- get_acs( ```` +The American Community Survey (ACS) provides annual demographic, economic, and housing data based on samples, while the Decennial Census gives a complete count every 10 years. + +### Exercise 6 + +Let's explore the data, run `glimpse(age_ca)` in the console. + +CP/CR. + +```{r california-bachelors-degree-6} +question_text(NULL, + answer(NULL, correct = TRUE), + allow_retry = TRUE, + try_again_button = "Edit Answer", + incorrect = NULL, + rows = 3) +``` + +### + +### Exercise 7 + +Let's get some quick stats for each column of the data. + +Run `summary(age_ca)` in the console. + +CP/CR. + + +```{r california-bachelors-degree-7} +question_text(NULL, + answer(NULL, correct = TRUE), + allow_retry = TRUE, + try_again_button = "Edit Answer", + incorrect = NULL, + rows = 3) +``` + ### `get_acs()` is your go-to for detailed annual demographic estimates from the American Community Survey. It returns both point estimates and margins of error (MOE) by default. @@ -901,7 +929,7 @@ Now, copy/paste those few lines into your AI. Tell it to use the `age_ca` data a -Put what it gives you in your code chunk and do `Ctrl/Cmd + Enter` to send it to the console +Put what it gives you in the code chunk and do `Ctrl/Cmd + Enter` to send it to the console CP/CR @@ -916,6 +944,19 @@ question_text(NULL, ### +Our Code: + +```` +age_ca_wide <- age_ca %>% + select(NAME, variable, estimate) %>% + pivot_wider(names_from = variable, values_from = estimate) + + +ggplot(age_ca_wide, aes(x = reorder(NAME, median_age), y = median_age)) + + geom_col(fill = "#4daf4a") + + coord_flip() +```` + Annotate charts with `geom_text()` or `labs()` to add clarity about what each axis, facet, or fill represents—especially useful for public-facing work. ### Exercise 9 @@ -924,7 +965,7 @@ Now we have a basic bar plot, but we have too much data points. 
Tell your AI to filter the dataset to the 15 most populous counties and add informative labels and a title with `labs()`. -Put what it gives you in your code chunk and do `Ctrl/Cmd + Enter` to send it to the console +Put what it gives you in the code chunk and do `Ctrl/Cmd + Enter` to send it to the console CP/CR @@ -939,11 +980,6 @@ question_text(NULL, ### -`theme_minimal()` from `ggplot2`: -Applies a clean, minimal theme to the plot for better readability. - -### Exercise 10 - Here is our code. It is okay if your code is different. That will happen when using AI! Replace your code with what it gave you using this code: @@ -972,6 +1008,9 @@ ggplot(largest_ca, aes(x = reorder(NAME, median_age), y = median_age)) + ) ```` +`theme_minimal()` from `ggplot2`: +Applies a clean, minimal theme to the plot for better readability. + ```{r} #| message: false age_ca_wide <- age_ca %>% @@ -998,11 +1037,6 @@ ggplot(largest_ca, aes(x = reorder(NAME, median_age), y = median_age)) + ) ``` -### - -`coord_flip()` from ggplot2: -Flips x and y axes, often used to make horizontal bar charts easier to read. - ### Exercise 11 @@ -1062,4 +1096,7 @@ The `scale_x_continuous()` function is used to format the x-axis labels as dolla The `scale_color_viridis_c()` function is used to adjust the color scheme of the plot. The `guide` argument is used to specify the type of legend to display. -The `theme_void()` function strips the background grid and axis labels from the plot accordingly: \ No newline at end of file +The `theme_void()` function strips the background grid and axis labels from the plot accordingly: + +`coord_flip()` from ggplot2: +Flips x and y axes, often used to make horizontal bar charts easier to read. 
\ No newline at end of file From 26ec05cd92e8940759653ca94cafe4fd48d77c0b Mon Sep 17 00:00:00 2001 From: Ansh Pathak Date: Fri, 18 Jul 2025 12:16:05 -0500 Subject: [PATCH 3/4] again --- inst/tutorials/99-overview/tutorial.Rmd | 66 ++++++++++++------------- 1 file changed, 33 insertions(+), 33 deletions(-) diff --git a/inst/tutorials/99-overview/tutorial.Rmd b/inst/tutorials/99-overview/tutorial.Rmd index bf95ecc..5945ecf 100644 --- a/inst/tutorials/99-overview/tutorial.Rmd +++ b/inst/tutorials/99-overview/tutorial.Rmd @@ -449,7 +449,7 @@ Select `File -> New File -> Quarto Document ...`. Provide a title -- `"Californi In this exercise, we’ll get the percentage of adults with a bachelor’s degree or higher in each California county. -### Exercise 2 +### Exercise 1 In your QMD, put `library(tidyverse)` and `library(tidycensus)` in a new code chunk. Press Ctrl/Cmd + Shift + K to render the file @@ -468,7 +468,7 @@ tutorial.helpers::show_file("CaliforniaBachelors.qmd", chunk = "Last") CP/CR. -```{r california-bachelors-degree-2} +```{r california-bachelors-degree-1} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -481,7 +481,7 @@ question_text(NULL, Render again. Everything looks nice, albeit empty, because we have added code to make the file look better and more professional. -### Exercise 3 +### Exercise 2 Place your cursor in the QMD file on the `library(tidyverse)` line. Use `Cmd/Ctrl + Enter` to execute that line. @@ -489,7 +489,7 @@ Note that this causes `library(tidyverse)` to be copied down to the Console and CP/CR. -```{r california-bachelors-degree-3} +```{r california-bachelors-degree-2} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -502,7 +502,7 @@ question_text(NULL, Working in the console like this is how professionals work! 
-### Exercise 4 +### Exercise 3 Ask AI to generate R code that uses tidycensus to get educational attainment variables for all California counties in 2020 and save it in a variable called `edu_ca` @@ -511,7 +511,7 @@ Put what it gives you in a new code chunk and do `Ctrl/Cmd + Enter` to send it t CP/CR -```{r california-bachelors-degree-4} +```{r california-bachelors-degree-3} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -536,7 +536,7 @@ edu_ca <- get_acs( ) ```` -### Exercise 5 +### Exercise 4 Using our code, set `geometry = TRUE`. @@ -550,7 +550,7 @@ tutorial.helpers::show_file("CaliforniaBachelors.qmd", chunk = "Last") CP/CR. -```{r california-bachelors-degree-5} +```{r california-bachelors-degree-4} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -574,13 +574,13 @@ edu_ca <- get_acs( The American Community Survey (ACS) provides annual demographic, economic, and housing data based on samples, while the Decennial Census gives a complete count every 10 years. -### Exercise 6 +### Exercise 5 Let's explore the data, run `glimpse(edu_ca)` in the console. CP/CR. -```{r california-bachelors-degree-6} +```{r california-bachelors-degree-5} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -591,7 +591,7 @@ question_text(NULL, ### -### Exercise 7 +### Exercise 6 Let's get some quick stats for each column of the data. @@ -600,7 +600,7 @@ Run `summary(edu_ca)` in the console. CP/CR. -```{r california-bachelors-degree-7} +```{r california-bachelors-degree-6} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -611,7 +611,7 @@ question_text(NULL, ### -### Exercise 8 +### Exercise 7 We’ll now make a choropleth map of bachelor’s degree attainment across California counties. @@ -621,7 +621,7 @@ Now, type `edu_ca` in the console. CP/CR the first few lines. 
-```{r california-bachelors-degree-8} +```{r california-bachelors-degree-7} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -634,7 +634,7 @@ question_text(NULL, Geometry must be `TRUE` if we want to map later. -### Exercise 9 +### Exercise 8 Now copy the first few lines into your AI and say that you are working with tidyverse. Tell it to take the data in `edu_ca` and make a choropleth map of at least bachelor’s degree attainment or higher across California counties and use `mutate()` to calculate the percentage of the population with at least a bachelor’s degree and pipe it into a `ggplot()` @@ -642,7 +642,7 @@ Put what it gives you in a new code chunk and do `Ctrl/Cmd + Enter` to send it t CP/CR -```{r california-bachelors-degree-9} +```{r california-bachelors-degree-8} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -697,14 +697,14 @@ ggplot(edu_ca) + `ggplot2` can handle spatial data directly using `geom_sf()`. Use `mutate()` to calculate percentages, and pipe that into `ggplot()` for a map. -### Exercise 10 +### Exercise 9 1. In the Console, run the following command to display the last chunk of your `.qmd` file: CP/CR tutorial.helpers::show_file("CaliforniaBachelors.qmd", chunk = "last") -```{r california-bachelors-degree-10} +```{r california-bachelors-degree-9} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -733,7 +733,7 @@ Select `File -> New File -> Quarto Document ...`. Provide a title -- `"Californi -### Exercise 2 +### Exercise 1 In your QMD, put `library(tidyverse)` and `library(tidycensus)` in a new code chunk. Press Ctrl/Cmd + Shift + K to render the file @@ -752,7 +752,7 @@ tutorial.helpers::show_file("CaliforniaAge.qmd", chunk = "last") CP/CR. -```{r california-median-age-2} +```{r california-median-age-1} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -765,7 +765,7 @@ question_text(NULL, Render again. 
Everything looks nice, albeit empty, because we have added code to make the file look better and more professional. -### Exercise 3 +### Exercise 2 Place your cursor in the QMD file on the `library(tidyverse)` line. Use `Cmd/Ctrl + Enter` to execute that line. @@ -773,7 +773,7 @@ Note that this causes `library(tidyverse)` to be copied down to the Console and CP/CR. -```{r california-median-age-3} +```{r california-median-age-2} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -788,7 +788,7 @@ Working in the console like this is how professionals work! -### Exercise 4 +### Exercise 3 Ask AI to generate R code that uses tidycensus data to get both the median age and populations for all counties in California for 2020 in a variable called `age_ca` @@ -798,7 +798,7 @@ Put what it gives you in a new code chunk and do `Ctrl/Cmd + Enter` to send it t CP/CR -```{r california-median-age-4} +```{r california-median-age-3} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -824,7 +824,7 @@ age_ca <- get_acs( AI is your best friend! Professionals use AI to generate code and plot. You however, need to be careful as it can and will generate much extra code that you may not need. -### Exercise 5 +### Exercise 4 Using our code, add `geometry = FALSE`. @@ -838,7 +838,7 @@ tutorial.helpers::show_file("CaliforniaAge.qmd", chunk = "Last") CP/CR. -```{r california-bachelors-degree-5} +```{r california-median-age-4} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -862,13 +862,13 @@ age_ca <- get_acs( The American Community Survey (ACS) provides annual demographic, economic, and housing data based on samples, while the Decennial Census gives a complete count every 10 years. -### Exercise 6 +### Exercise 5 Let's explore the data, run `glimpse(age_ca)` in the console. CP/CR. 
-```{r california-bachelors-degree-6} +```{r california-median-age-5} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -879,7 +879,7 @@ question_text(NULL, ### -### Exercise 7 +### Exercise 6 Let's get some quick stats for each column of the data. @@ -888,7 +888,7 @@ Run `summary(age_ca)` in the console. CP/CR. -```{r california-bachelors-degree-7} +```{r california-median-age-6} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -1038,14 +1038,14 @@ ggplot(largest_ca, aes(x = reorder(NAME, median_age), y = median_age)) + ``` -### Exercise 11 +### Exercise 10 1. In the Console, run the following command to display the last chunk of your `.qmd` file: CP/CR tutorial.helpers::show_file("CaliforniaAge.qmd", chunk = "last") -```{r california-median-age-11} +```{r california-median-age-10} question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, @@ -1099,4 +1099,4 @@ The `scale_color_viridis_c()` function is used to adjust the color scheme of the The `theme_void()` function strips the background grid and axis labels from the plot accordingly: `coord_flip()` from ggplot2: -Flips x and y axes, often used to make horizontal bar charts easier to read. \ No newline at end of file +Flips x and y axes, often used to make horizontal bar charts easier to read. 
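Review note on the bar-chart answer code referenced in this series: the reshaping and plotting steps can be checked without a Census API key. The sketch below is only an illustration under stated assumptions. `age_toy` is a hypothetical stand-in for the long-format tibble that `get_acs()` returns, and its county names and numbers are placeholders, not real ACS estimates. It uses `slice_max()` for the "most populous counties" step, which may differ from what the tutorial's own answer does.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for get_acs() output (one row per county and variable).
# All values are placeholders, not real ACS estimates.
age_toy <- tibble::tribble(
  ~NAME,      ~variable,    ~estimate,
  "County A", "median_age", 30,
  "County A", "population", 1000,
  "County B", "median_age", 40,
  "County B", "population", 5000,
  "County C", "median_age", 35,
  "County C", "population", 3000
)

# Long to wide: one row per county, one column per variable.
age_toy_wide <- age_toy %>%
  select(NAME, variable, estimate) %>%
  pivot_wider(names_from = variable, values_from = estimate)

# Keep the two most populous toy counties (the tutorial keeps 15).
largest_toy <- age_toy_wide %>%
  slice_max(population, n = 2)

# Horizontal bars ordered by median age, with labels added through labs().
p <- ggplot(largest_toy, aes(x = reorder(NAME, median_age), y = median_age)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Median age", title = "Median age (toy data)") +
  theme_minimal()
```

Printing `p` draws the chart. With real data, `age_toy` would be replaced by the `age_ca` object created earlier in the tutorial.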
From 986f58813971198efddc3a95924546f4e53f2f77 Mon Sep 17 00:00:00 2001 From: Ansh Pathak Date: Fri, 18 Jul 2025 12:24:36 -0500 Subject: [PATCH 4/4] 55 --- inst/tutorials/99-overview/tutorial.Rmd | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/inst/tutorials/99-overview/tutorial.Rmd b/inst/tutorials/99-overview/tutorial.Rmd index 5945ecf..58230e1 100644 --- a/inst/tutorials/99-overview/tutorial.Rmd +++ b/inst/tutorials/99-overview/tutorial.Rmd @@ -1063,6 +1063,17 @@ The `show_file()` function from tutorial.helpers is a convenient way to check th ``` +## Summary + +This tutorial covered an overview of [Analyzing US Census Data](https://walker-data.com/census-r/index.html) by Kyle Walker. You learned about using the [**tidycensus**] package for collecting, interacting, and plotting US Census data. You mainly focused on collecting data from the Decennial Census and the American Community Survey (ACS). + + + + + + + + ### Good Knowledge Drops If the year is not specified, `get_acs()` defaults to the most recent five-year ACS sample. The data returned is similar in structure to that returned by `get_decennial()`, but includes an `estimate` column (for the ACS estimate) and `moe` column (for the margin of error around that estimate) instead of a value column. @@ -1100,3 +1111,7 @@ The `theme_void()` function strips the background grid and axis labels from the `coord_flip()` from ggplot2: Flips x and y axes, often used to make horizontal bar charts easier to read. + + + +
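Review note on the `estimate` and `moe` columns described under "Good Knowledge Drops": a minimal sketch of how the two columns combine into an interval, assuming a tibble shaped like `get_acs()` output. `income_toy` and its values are hypothetical placeholders, not real ACS figures, and the column names `lower` and `upper` are chosen here only for illustration.

```r
library(dplyr)

# Hypothetical stand-in for get_acs() output; values are placeholders.
income_toy <- tibble::tibble(
  NAME     = c("County A", "County B"),
  variable = "B19013_001",
  estimate = c(50000, 62000),
  moe      = c(2500, 4000)
)

# The margin of error is symmetric around the estimate,
# so the interval is estimate - moe to estimate + moe.
income_bounds <- income_toy %>%
  mutate(
    lower = estimate - moe,
    upper = estimate + moe
  )
```

By default tidycensus reports margins of error at the 90 percent confidence level, so `lower` and `upper` here would bound a 90 percent interval when computed on real `get_acs()` output.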