Empty file removed .Rhistory
Empty file.

This file was deleted.

42 changes: 37 additions & 5 deletions .Rproj.user/shared/notebooks/paths
Original file line number Diff line number Diff line change
@@ -1,11 +1,43 @@
/Users/zief0002/Documents/github/Statistical-Thinking/index.Rmd="D4BEC498"
/Users/zief0002/Documents/github/statistical-thinking/01-introduction.Rmd="A36A7474"
/Users/zief0002/Documents/github/epsy-5261/04-05-one-sample-test-assumptions.qmd="8E8D446D"
/Users/zief0002/Documents/github/epsy-5261/_quarto.yml="EC6B113F"
/Users/zief0002/Documents/github/epsy-5261/assets/epsy-5261.js="F35ECBE2"
/Users/zief0002/Documents/github/epsy-5261/assets/theme.scss="1685309C"
/Users/zief0002/Documents/github/modeling/_quarto.yml="AD1A2029"
/Users/zief0002/Documents/github/modeling/assets/theme.scss="776B071D"
/Users/zief0002/Documents/github/modeling/index.qmd="B1B5D332"
/Users/zief0002/Documents/github/statistical-thinking/.gitignore="C4E4EE26"
/Users/zief0002/Documents/github/statistical-thinking/01-introduction.qmd="D38DD5EF"
/Users/zief0002/Documents/github/statistical-thinking/02-modeling-and-simulation.Rmd="6A895574"
/Users/zief0002/Documents/github/statistical-thinking/02-modeling-and-simulation.qmd="F9D6A207"
/Users/zief0002/Documents/github/statistical-thinking/03-data-generation.Rmd="41928D7C"
/Users/zief0002/Documents/github/statistical-thinking/03-data-generation.qmd="EF9B4691"
/Users/zief0002/Documents/github/statistical-thinking/04-monte-carlo-simulation.Rmd="472FECCD"
/Users/zief0002/Documents/github/statistical-thinking/04-monte-carlo-simulation.qmd="42ED7D9B"
/Users/zief0002/Documents/github/statistical-thinking/05-modeling-sampling-variation.qmd="37F76736"
/Users/zief0002/Documents/github/statistical-thinking/06-describing-distributions.qmd="0B888E6F"
/Users/zief0002/Documents/github/statistical-thinking/07-experimental-variation.qmd="0C070902"
/Users/zief0002/Documents/github/statistical-thinking/08-p-value.qmd="9703ECFA"
/Users/zief0002/Documents/github/statistical-thinking/09-internal-validity-evidence.Rmd="1CC3F612"
/Users/zief0002/Documents/github/statistical-thinking/09-internal-validity-evidence.qmd="83DFFAA6"
/Users/zief0002/Documents/github/statistical-thinking/10-sampling-variation.qmd="E738589B"
/Users/zief0002/Documents/github/statistical-thinking/11-external-validity-evidence.Rmd="F0ECBDA4"
/Users/zief0002/Documents/github/statistical-thinking/11-external-validity-evidence.qmd="335ED2AB"
/Users/zief0002/Documents/github/statistical-thinking/12-validity-evidence-and-inferences.qmd="59DEEAE0"
/Users/zief0002/Documents/github/statistical-thinking/13-observational-studies.Rmd="60E51F04"
/Users/zief0002/Documents/github/statistical-thinking/13-observational-studies.qmd="A856E4B0"
/Users/zief0002/Documents/github/statistical-thinking/14-statistical-estimation.Rmd="8D252671"
/Users/zief0002/Documents/github/statistical-thinking/14-statistical-estimation.qmd="2CC004BF"
/Users/zief0002/Documents/github/statistical-thinking/15-uncertainty-and-bias.Rmd="E15FB9A"
/Users/zief0002/Documents/github/statistical-thinking/index.Rmd="30155EDD"
/Users/zief0002/Documents/github/statistical-thinking/statistical-thinking.css="7059DC70"
/Users/zief0002/Documents/github/statistical-thinking/statistical-thinking.js="B20A5CAD"
/Users/zief0002/Documents/github/statistical-thinking/15-uncertainty-and-bias.qmd="0EEE1FD4"
/Users/zief0002/Documents/github/statistical-thinking/README.md="CCFAA8DC"
/Users/zief0002/Documents/github/statistical-thinking/_quarto.yml="DCD25E8E"
/Users/zief0002/Documents/github/statistical-thinking/assets/statistical-thinking.css="7D5F7896"
/Users/zief0002/Documents/github/statistical-thinking/assets/statistical-thinking.js="D9297926"
/Users/zief0002/Documents/github/statistical-thinking/assets/sticky-notes.css="75380479"
/Users/zief0002/Documents/github/statistical-thinking/assets/theme.scss="751A2036"
/Users/zief0002/Documents/github/statistical-thinking/index.qmd="BA2AE753"
/Users/zief0002/Documents/github/statistical-thinking/spotify/hoffman-playlists.qmd="1F15AFBF"
/Users/zief0002/Documents/github/statistical-thinking/spotify/playlist-header.html="5C4813AF"
/Users/zief0002/Documents/github/statistical-thinking/spotify/playlists.css="A9D73F30"
/Users/zief0002/Documents/github/statistical-thinking/spotify/spotify-training-playlists.qmd="3DC0C1F2"
/Users/zief0002/Documents/github/statistical-thinking/spotify/spotify-validation-playlists.qmd="A9A12C03"
2 changes: 2 additions & 0 deletions .gitignore
@@ -3,3 +3,5 @@
.RData
.Ruserdata
.DS_Store

/.quarto/
19 changes: 16 additions & 3 deletions docs/01-introduction.Rmd → 01-introduction.qmd
@@ -4,25 +4,34 @@ Learning statistics is sexy. Hal Varian, Google's chief economist, believes this

Whether you believe it is the sexiest subject or not, it is incontrovertible that the use of statistics and data is prevalent in today's information age. Almost every person on earth will benefit from learning some foundational ideas of statistics. This is true because statistics forms the basis of our everyday world just as much as do science, technology, and politics. Google, Netflix, Twitter, Facebook, OKCupid, Match.com, Amazon, iTunes, and the Federal Government are just a handful of the companies and organizations that use statistics on a daily basis. Journalism, political science, biology, sociology, psychology, graphic design, economics, sports science, and dance are all disciplines that have made use of statistical methodology.

### Course Material {-}
<br />


## Course Material {-}

The materials on this website and in the lab manual will introduce you to the seminal ideas underlying the discipline of statistics. In addition, they have been designed with your learning in mind. For example, many of the class activities were developed using pedagogical principles, such as small group activities and discussion, that have been shown in research to improve student learning.

Course readings should be completed outside of class and are intended to help you learn and extend the ideas, skills, and concepts you learn in the classroom. The readings themselves are not all "traditional" readings in the sense of words written on the screen, but instead often link to video clips, blogs and other multimedia material.

<br />


### TinkerPlots 3&trade; Software {-}

Much of the material presented in the lab manual requires the use of TinkerPlots 3&trade;. This software can be downloaded (for Mac or PC), and a license can be purchased from [http://www.tinkerplots.com/](http://www.tinkerplots.com/).

<br />


### Lab Manual and Data Sets {-}

You will work from the lab manual every day in class. As such, you will need to bring a copy of the lab manual (physical or electronic) with you to class every day. To download a PDF copy of the lab manual, click this link: [https://github.com/zief0002/statistical-thinking/blob/master/statistical-thinking-v4_3.pdf?raw=true](https://github.com/zief0002/statistical-thinking/blob/master/statistical-thinking-v4_3.pdf?raw=true).
You will work from the lab manual every day in class. As such, you will need to bring a copy of the lab manual (physical or electronic) with you to class every day. To download a PDF copy of the lab manual, click this link: <https://github.com/zief0002/statistical-thinking/blob/main/statistical-thinking-v4_5.pdf?raw=true>.

There are several data sets used in the lab manual, as well as in EPsy 3264 assignments. To download a ZIP file to your computer that includes all the data sets, click one of the links below. Once the ZIP file has been downloaded to your computer, double-click the ZIP file to unzip it and access the materials.

- [TinkerPlots 3 Data Files](https://github.com/zief0002/statistical-thinking/blob/master/tp3-data.zip?raw=true)
- [TinkerPlots 3 Data Files](https://github.com/zief0002/statistical-thinking/blob/main/tp3-data.zip?raw=true)

<br />



@@ -33,3 +42,7 @@ The lab manual, instructors, and teaching assistants are all resources that are
Learning anything new takes time and effort and this is especially true of learning statistics, as you are not just learning a set of methods, but rather a disciplined way of thinking about the world. Changing your habits of mind will take continual practice. It will also take a great deal of patience and persistence.

As you engage in and use the skills, concepts and ideas introduced in the material, you will find yourself thinking about data and evidence in a different way. This may lead you to make different decisions or choices. But, even if this course does not change your world overnight, you will at the very least be able to critically think about inferences and conclusions drawn from data.

<br />


Original file line number Diff line number Diff line change
@@ -9,13 +9,20 @@ A model is a simplified representation of a system that can be used to promote a

Models have many purposes, but are primarily used to better understand phenomena in the real world. Common uses of models are for description, exploration, prediction, and classification. For example, Google builds models to understand and predict people's internet search tendencies. These models are then used to help Google carry out more efficient and better searches of information. As another example, Netflix builds models to understand the characteristics of movies that their customers have rated highly so that they can then recommend other movies that those customers may enjoy. Amazon and Apple iTunes both use models in similar ways.

### Outline and Goals of Unit 1 {-}
<br />


## Outline and Goals of Unit 1 {-}

The following schematic outlines the course readings, in-class activities, and assignments for Unit 1.

<br />

```{r out.width="50%", echo=FALSE, fig.align='left'}
```{r}
#| echo: false
#| out-width: "80%"
#| fig-align: 'left'
#| fig-alt: "Graphic outlining the course readings, in-class activities, and assignments for Unit 1"
knitr::include_graphics("img/unit-01-outline.png")
```

@@ -27,13 +34,16 @@ You will also be introduced to the Monte Carlo simulation process and learn how

As you progress through the unit, remember that the modeling process is a creative process that can often be very challenging. At times, this might lead to frustration as you are learning and practicing some of the material. But, as Mosteller et al. (1973) remind us, it is also a profitable experience since, "modeling is not only a technique of statistics&hellip;it is a method of study which can be applied in many other fields as well" (p. xii).^[Mosteller, F., Kruskal, W. H., Link, R. F., Pieters, R. S., &amp; Rising, G. R. (1973). *Statistics by example: Finding models.* Reading, MA: Addison–Wesley.]

### Randomness {-}
<br />


## Randomness {-}

One critical component of simulation is the random process used to generate data. To help you begin to understand randomness, watch the [Random Sequences: Human vs. Coin](https://www.youtube.com/watch?v=H2lJLXS3AYM) YouTube video.

<br />


<center>
<iframe width="560" height="315" src="https://www.youtube.com/embed/H2lJLXS3AYM" frameborder="q" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<iframe width="560" height="315" src="https://www.youtube.com/embed/H2lJLXS3AYM" title="Random Sequences: Human vs. Coin video" frameborder="q" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</center>
2 changes: 1 addition & 1 deletion docs/03-data-generation.Rmd → 03-data-generation.qmd
@@ -10,5 +10,5 @@ In the in-class activity, *Generating Random Data&mdash;Cat Factory*, you will c
<br />

<center>
<iframe width="560" height="420" src="https://www.youtube.com/embed/udeuvbJKGeI" frameborder="0" allowfullscreen></iframe>
<iframe width="560" height="420" src="https://www.youtube.com/embed/udeuvbJKGeI" title="Probability Simulation" frameborder="0" allowfullscreen></iframe>
</center>
Original file line number Diff line number Diff line change
@@ -4,8 +4,10 @@ Monte Carlo simulation is one method that statisticians use to understand real-w

- [The Beginning of the Monte Carlo Method](http://www.webpages.uidaho.edu/~stevel/565/literature/The%20Beginning%20of%20Monte%20Carlo%20Method.pdf)

<br />


### Example of a Monte Carlo Simulation Study {-}
## Example of a Monte Carlo Simulation Study {-}

In 1978, China introduced the "one-child" policy in order to alleviate social, economic, and environmental problems in China. According to Wikipedia,^[One-child policy. (2015, May 30). In Wikipedia, The Free Encyclopedia. Retrieved 18:02, June 1, 2015, from [http://en.wikipedia.org/w/index.php?title=One-child_policy&oldid=664745432](http://en.wikipedia.org/w/index.php?title=One-child_policy&oldid=664745432)]

@@ -33,7 +35,8 @@ One way to model this is to write the word **boy** on one index card and the wor

<br />

```{r echo=FALSE}
```{r}
#| echo: false
d = data.frame(
Family = c("Family #1", "Family #2", "Family #3"),
Girl = c("&#10004;", "", "&#10004;&#10004;"),
@@ -50,18 +53,25 @@ knitr::kable(

We could carry out this simulation for many families, say 500 families, and use the results to provide an answer to the research question. You can imagine that carrying out even this simple simulation would quickly become quite tedious. Simulation studies such as this one are typically carried out using computer programs. In this unit, you will learn to use a computer program called TinkerPlots&trade; to model processes in the real world and carry out simulation studies.
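
For readers curious how the card-drawing procedure might look in code, here is a minimal sketch in R (not the TinkerPlots&trade; workflow used in the course). It assumes the stopping rule implied by the "one son" example, namely that each family keeps having children until the first boy is born, along with a 50:50 chance of a boy or girl at each birth. The function name `simulate_family()`, the argument `p_boy`, and the seed are hypothetical choices made for illustration.

```r
# Minimal sketch of the "one son" simulation.
# Assumed rule: each family keeps having children until the first boy is born.
set.seed(3264)  # arbitrary seed so the run is reproducible

# Simulate one family by drawing a "card" until it reads "boy".
# Returns the number of girls born before the first boy.
simulate_family <- function(p_boy = 0.5) {
  girls <- 0
  repeat {
    child <- sample(c("boy", "girl"), size = 1, prob = c(p_boy, 1 - p_boy))
    if (child == "boy") break
    girls <- girls + 1
  }
  girls
}

n_families <- 500
girls_per_family <- replicate(n_families, simulate_family())

# Under the assumed rule, every family ends with exactly one boy
children_per_family <- girls_per_family + 1

mean(children_per_family)                          # average family size
sum(girls_per_family) / sum(children_per_family)   # overall proportion of girls
table(girls_per_family)                            # distribution of girls per family
```

Under these assumptions the number of girls per family follows a geometric distribution, so the average family size should settle near two children as the number of simulated families grows.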

### Monte Carlo Simulation Assumptions {-}
<br />


## Monte Carlo Simulation Assumptions {-}

"Wait," you say. "Even if I carried out this simulation, I still would not be able to provide an answer to the research question! It doesn't reflect reality! Some families may not want to have any children, while others might be happy to stop after a girl was born. What about multiple births?"

Maybe you are even questioning whether the probability of having a boy or having a girl is really 50:50. These are all valid points, and all would likely affect the results of the simulation, which in turn affects the inferences and conclusions that are drawn.

While the model used in the "one son" example is too simplistic to support meaningful conclusions about implementing such a policy in China, it could provide a useful starting point for introducing additional complexity. Even in the most enormously complicated modeling problem, researchers often make many simplifying assumptions. (Remember that all models, even those that seem quite complex, are simplifications of reality and get many things wrong.) With enough simplification, a model can be constructed and studied. The model is evaluated and often revised or updated as certain assumptions are deemed tenable and others are not. Because of this process, simulation studies are generally iterative in their development. This iteration process continues until an adequate level of understanding is developed and the research question can be answered.
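
One way to see how a single assumption feeds into the results is to rerun the simulation while changing only that assumption. The R sketch below varies the probability of a boy and reports the average family size under an assumed "stop at the first boy" rule. The function `mean_family_size()`, the probability values, and the number of families (10,000, chosen so that random noise does not mask the differences) are all hypothetical. The built-in `rgeom()` function stands in for drawing cards one at a time.

```r
# Sketch: how one assumption (the chance of a boy) changes the results.
# rgeom() counts failures before the first success, which here means
# girls born before the first boy under the assumed stopping rule.
set.seed(3264)  # arbitrary seed for reproducibility

mean_family_size <- function(p_boy, n_families = 10000) {
  girls <- rgeom(n_families, prob = p_boy)
  mean(girls + 1)  # each simulated family has exactly one boy
}

# Hypothetical values for the probability of a boy
p_values <- c(0.45, 0.50, 0.55)

results <- data.frame(
  p_boy = p_values,
  mean_children = sapply(p_values, mean_family_size)
)
results
```

Comparing the rows shows the direction of the effect: the less likely a boy is at each birth, the larger the simulated families become. The other objections (families who want no children, families who stop after a girl, multiple births) would each need their own change to the model.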

### Monte Carlo Simulation in Practice {-}
<br />


## Monte Carlo Simulation in Practice {-}

In practice, statisticians often use incredibly complex models to generate their data. As an example, Electronic Arts, the video game company behind titles such as *Madden*, *NHL* and *FIFA*, uses game telemetry (the transmission of data from a game executable for recording and analysis) to model the gameplay patterns of players and identify the elements of their games that are highly correlated with player retention.^[Weber, B. G., John, M., Mateas, M., &amp; Jhala, A. (2011). *Modeling player retention in Madden NFL 11.* Presented at Innovative Applications of Artificial Intelligence. [http://users.soe.ucsc.edu/~bweber/pubs/madden11retention.pdf](http://users.soe.ucsc.edu/~bweber/pubs/madden11retention.pdf)]

By understanding the behavior of players and the common patterns that are used, Electronic Arts game developers can focus their attention on more relevant features in future iterations of the game and ultimately reduce production costs. For example, in their examination of *Madden NFL 11*, Electronic Arts used 46 features to model players' preferences, including their control usage, performance, and play-calling style. This is but one example of using simulation in video games.

<br />
