course-spring2019/data.Rmd at master · ml4econ/course-spring2019 · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
---
title: "Data"
---

Throughout the course we will make use of he [`experimentdatar`](https://itamarcaspi.github.io/experimentdatar/) data package that contains publicly available datasets that were used in Susan Athey and Guido Imbens' course ["Machine Learning and Econometrics"](https://www.aeaweb.org/conference/cont-ed/2018-webcasts) (AEA continuing Education, 2018). The datasets are conveniently packed for R users.

### Installation

You can install the **development** version from
[GitHub](https://github.com/itamarcaspi/experimentdatar/)

```r
# install.packages("devtools")
devtools::install_github("itamarcaspi/experimentdatar")
```

### Usage

Load the __experimentdatar__ package
```{r load_package, message=FALSE, warning=FALSE}

library(experimentdatar)

```

Load the `social` dataset

```{r load_data}

data(social)

```

(You can find information about variable definitions using `?social`.)

Use `dataDetails()` function to open the paper associated with the `social` dataset

```{r details, eval=FALSE}
dataDetails("social")
```

Load the __dplyr__ and display the response (`outcome_vote`) and treatment variable (`treat_neighbors`)

```{r tibble, message=FALSE, warning=FALSE}
library(dplyr) # for data manipulations

social %>%
  select(outcome_voted, treat_neighbors) %>%
  head()
```

Use __ggplot2__ to plot the distribution of the median income of survey participants

```{r plot, message=FALSE, warning=FALSE}
library(ggplot2) # for generating plots

social %>%
  ggplot(aes(median_age)) +
  geom_histogram(binwidth = 1)

```


### List of available datasets

* `charitable`: Data used for the paper "Does Price matter in charitable giving? Evidence from a large-Scale Natural Field experiment"
by Karlan and List (2007).

* `IVdataset`: Data used for the paper "Does compulsory school attendance affect schooling and earnings?"
by Angrist and Krueger (1991) and related papers.

* `mobilization`: Data for the paper "Comparing Experimental and Matching Methods Using a Large-Scale Voter Mobilization Experiment"
by Arceneaux, Gerber, and Green (2006).

* `vouchers`: Data for the paper "Vouchers for Private Schooling in Colombia: Evidence from a Randomized Natural Experiment"
by Angrist, Bettinger, Bloom, King, and Kremer (2002).

* `secrecy`: Data for the paper "Ballot Secrecy Concerns and Voter Mobilization: New Experimental Evidence about Message Source, Context, and the Duration of Mobilization Effects"
by Gerber, Hubers, Biggers, and Hendry (2014).

* `social`: Data for the paper "Social Pressure and Voter Turnout: Evidence from a Large-Scale Field Experiment"
by Gerber, Green, and Larimer (2008).

* `welfare`: Data for the paper "Modeling heterogeneous treatment effects in survey experiments with Bayesian Additive Regression Trees"
by Green and Kern (2012).


### Resources

* The [ExperimentData repository](https://github.com/gsbDBI/ExperimentData) where you can find the original as well as new datasets and more documentations.