Built site for gh-pages

Quarto GHA Workflow Runner · commit 5950af06e96d · 2025-08-21T02:53:38.000Z
diff --git a/.nojekyll b/.nojekyll
@@ -1 +1 @@
-774fbae8
+2030ac00
diff --git a/08_stats_sims.html b/08_stats_sims.html
@@ -496,10 +496,10 @@ <h4 class="unnumbered anchored" data-anchor-id="uniform-distribution">Uniform di
 <div class="cell">
 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="fu">runif</span>(<span class="at">n =</span> <span class="dv">20</span>, <span class="at">min =</span> <span class="sc">-</span><span class="dv">3</span>, <span class="at">max =</span> <span class="dv">3</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output cell-output-stdout">
-<pre><code> [1]  0.9151310 -2.7302919 -2.0926021  0.4233409 -1.4870530  0.5004577
- [7]  0.7759846  0.2851662  0.6160349  2.6545841 -2.8730297 -1.3263089
-[13]  1.8697259  0.5580634  1.4596908  2.1194205 -2.3670650 -0.8514127
-[19] -1.2995478 -2.2674588</code></pre>
+<pre><code> [1] -0.6033947 -0.6906502 -2.0110033  2.7114631 -0.4305185  0.9970371
+ [7] -2.5054233  2.4639020  2.1075071 -2.3923168  1.2850431  2.5855303
+[13] -0.5696733 -0.2806871 -2.3561868 -2.7265408  0.1665160 -2.0451474
+[19]  0.2929108  0.4752286</code></pre>
 </div>
 </div>
 <p>When we draw a million times from the distribution, we can then plot it and see that it does look as we would expect:</p>
diff --git a/Methods-Camp.pdf b/Methods-Camp.pdf
diff --git a/search.json b/search.json
@@ -454,7 +454,7 @@
     "href": "08_stats_sims.html#random-sampling",
     "title": "7  Statistics and simulations",
     "section": "",
-    "text": "7.1.1 Random sampling from theoretical distributions\n\nUniform distribution\nFor the uniform distribution, the arguments specify how many draws we want and the boundaries\n\nrunif(n = 20, min = -3, max = 3)\n\n [1]  0.9151310 -2.7302919 -2.0926021  0.4233409 -1.4870530  0.5004577\n [7]  0.7759846  0.2851662  0.6160349  2.6545841 -2.8730297 -1.3263089\n[13]  1.8697259  0.5580634  1.4596908  2.1194205 -2.3670650 -0.8514127\n[19] -1.2995478 -2.2674588\n\n\nWhen we draw a million times from the distribution, we can then plot it and see that it does look as we would expect:\n\nset.seed(123)\nmy_runif &lt;- runif(n = 1000000, min = -3, max = 3)\n\n\nggplot(data.frame(my_runif), aes(x = my_runif)) +\n  geom_histogram(binwidth = 0.25, boundary = 0, closed = \"right\") +\n  scale_x_continuous(breaks = seq(-5, 5, 1), limits = c(-5, 5))\n\n\n\n\n\n\n\n\n\n\nBinomial distribution\nFor the binomial distribution, we can specify the number of draws, how many trials each draw will have, and the probability of success.\nFor instance, we can ask R to do the following twenty times: flip a fair coin one hundred times, and count the number of tails.\n\nrbinom(n = 20, size = 100, prob = 0.5)\n\n [1] 48 45 54 50 58 50 42 58 48 57 53 49 52 51 49 40 57 53 52 41\n\n\nWith prob = , we can implement unfair coins:\n\nrbinom(n = 20, size = 100, prob = 0.9)\n\n [1] 88 87 93 95 93 92 91 94 87 91 90 92 93 89 90 95 91 90 86 88\n\n\n\n\nNormal distribution\nFor the Normal or Gaussian distribution, we specify the number of draws, the mean, and standard deviation:\n\nrnorm(n = 20, mean = 0, sd = 1)\n\n [1]  1.10455864  0.06386693 -1.59684275  1.86298270 -0.90428935 -1.55158044\n [7]  1.27986282 -0.32420495 -0.70015076  2.17271578  0.89778913 -0.01338538\n[13] -0.74074395  0.36772316 -0.66453402 -1.11498344 -1.15067439 -0.55098894\n[19]  0.10503154 -0.27183645\n\n\n\n\n\n\n\n\nExercise\n\n\n\nCompute and plot my_rnorm, a vector with 10,000 draws from a Normal distribution \\(X\\) with 
mean equal to -10 and standard deviation equal to 2 (\\(X\\sim N(-10,2)\\)). You can recycle code!\n\n\n\n\n\n7.1.2 Random sampling from data\nIn this section we will work with good ol’ mtcars, one of R’s most notable default datasets. We’ll assign it to an object so it shows in our Environment pane:\n\nmy_mtcars &lt;- mtcars\n\n\n\n\n\n\n\nTip\n\n\n\nDefault datasets such as mtcars and iris are useful because they are available to everyone, and once you become familiar with them, you can start thinking about the code instead of the intricacies of the data. These qualities also make default datasets ideal for building reproducible examples (see Wickham 2014)\n\n\nWe can use the function sample() to obtain random values from a vector. The size = argument specifies how many values we want. For example, let’s get one random value of the “mpg” column:\n\nsample(my_mtcars$mpg, size = 1)\n\n[1] 24.4\n\n\nEvery time we run this command, we can get a different result:\n\nsample(my_mtcars$mpg, size = 1)\n\n[1] 14.7\n\n\n\nsample(my_mtcars$mpg, size = 1)\n\n[1] 15.5\n\n\nIn some occasions we do want to get the same result consistently after running some random process multiple times. In this case, we set a seed, which takes advantage of R’s pseudo-random number generator capabilities. No matter how many times we run the following code block, the result will be the same:\n\nset.seed(123)\nsample(my_mtcars$mpg, size = 1)\n\n[1] 15\n\n\nSampling with replacement means that we can get the same value multiple times. 
For example:\n\nset.seed(12)\nsample(c(\"Banana\", \"Apple\", \"Orange\"), size = 3, replace = T)\n\n[1] \"Apple\"  \"Apple\"  \"Orange\"\n\n\n\nsample(my_mtcars$mpg, size = 100, replace = T)\n\n  [1] 26.0 15.2 18.7 18.7 30.4 21.0 24.4 26.0 32.4 15.8 32.4 19.2 18.1 16.4 19.2\n [16] 27.3 14.3 10.4 17.3 13.3 21.4 13.3 19.2 24.4 15.0 27.3 17.8 15.2 15.8 14.3\n [31] 19.7 16.4 18.7 15.8 19.2 21.0 14.3 15.2 14.3 27.3 21.4 33.9 33.9 21.4 30.4\n [46] 33.9 21.4 17.3 17.3 10.4 26.0 18.7 15.2 30.4 10.4 10.4 15.5 14.3 26.0 17.3\n [61] 33.9 26.0 24.4 18.7 30.4 32.4 21.5 30.4 15.2 27.3 13.3 17.3 21.4 24.4 13.3\n [76] 22.8 33.9 13.3 21.5 14.3 19.2 30.4 24.4 26.0 15.8 10.4 24.4 14.3 15.2 10.4\n [91] 19.2 21.0 16.4 19.2 24.4 19.7 18.7 10.4 18.7 17.8\n\n\nIn order to sample not from a vector but from a data frame’s rows, we can use the slice_sample() function from dplyr:\n\nmy_mtcars |&gt; \n  slice_sample(n = 2) # a number of rows\n\n                  mpg cyl disp  hp drat   wt  qsec vs am gear carb\nDodge Challenger 15.5   8  318 150 2.76 3.52 16.87  0  0    3    2\nDatsun 710       22.8   4  108  93 3.85 2.32 18.61  1  1    4    1\n\n\n\nmy_mtcars |&gt; \n  slice_sample(prop = 0.5) # a proportion of rows\n\n                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb\nToyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1\nFerrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6\nMerc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3\nHornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2\nMaserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8\nDatsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1\nFord Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4\nDodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2\nMerc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4\nLincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  
0  0    3    4\nValiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1\nFiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1\nMazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4\nMerc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2\nCamaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4\nCadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4\n\n\nAgain, we can also use seeds here to ensure that we’ll get the same result each time:\n\nset.seed(123)\nmy_mtcars |&gt; \n  slice_sample(prop = 0.5)\n\n                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb\nMaserati Bora      15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8\nCadillac Fleetwood 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4\nHonda Civic        30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2\nMerc 450SLC        15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3\nDatsun 710         22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1\nMerc 280           19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4\nFiat 128           32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1\nDodge Challenger   15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2\nMerc 280C          17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4\nHornet Sportabout  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2\nToyota Corolla     33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1\nFord Pantera L     15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4\nAMC Javelin        15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2\nFerrari Dino       19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6\nMerc 230           22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2\nLotus Europa       30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2\n\n\n\n\n\n\n\n\nExercise\n\n\n\nUse slice_sample() to sample 32 rows from mtcars with replacement.",
+    "text": "7.1.1 Random sampling from theoretical distributions\n\nUniform distribution\nFor the uniform distribution, the arguments specify how many draws we want and the boundaries\n\nrunif(n = 20, min = -3, max = 3)\n\n [1] -0.6033947 -0.6906502 -2.0110033  2.7114631 -0.4305185  0.9970371\n [7] -2.5054233  2.4639020  2.1075071 -2.3923168  1.2850431  2.5855303\n[13] -0.5696733 -0.2806871 -2.3561868 -2.7265408  0.1665160 -2.0451474\n[19]  0.2929108  0.4752286\n\n\nWhen we draw a million times from the distribution, we can then plot it and see that it does look as we would expect:\n\nset.seed(123)\nmy_runif &lt;- runif(n = 1000000, min = -3, max = 3)\n\n\nggplot(data.frame(my_runif), aes(x = my_runif)) +\n  geom_histogram(binwidth = 0.25, boundary = 0, closed = \"right\") +\n  scale_x_continuous(breaks = seq(-5, 5, 1), limits = c(-5, 5))\n\n\n\n\n\n\n\n\n\n\nBinomial distribution\nFor the binomial distribution, we can specify the number of draws, how many trials each draw will have, and the probability of success.\nFor instance, we can ask R to do the following twenty times: flip a fair coin one hundred times, and count the number of tails.\n\nrbinom(n = 20, size = 100, prob = 0.5)\n\n [1] 48 45 54 50 58 50 42 58 48 57 53 49 52 51 49 40 57 53 52 41\n\n\nWith prob = , we can implement unfair coins:\n\nrbinom(n = 20, size = 100, prob = 0.9)\n\n [1] 88 87 93 95 93 92 91 94 87 91 90 92 93 89 90 95 91 90 86 88\n\n\n\n\nNormal distribution\nFor the Normal or Gaussian distribution, we specify the number of draws, the mean, and standard deviation:\n\nrnorm(n = 20, mean = 0, sd = 1)\n\n [1]  1.10455864  0.06386693 -1.59684275  1.86298270 -0.90428935 -1.55158044\n [7]  1.27986282 -0.32420495 -0.70015076  2.17271578  0.89778913 -0.01338538\n[13] -0.74074395  0.36772316 -0.66453402 -1.11498344 -1.15067439 -0.55098894\n[19]  0.10503154 -0.27183645\n\n\n\n\n\n\n\n\nExercise\n\n\n\nCompute and plot my_rnorm, a vector with 10,000 draws from a Normal distribution \\(X\\) with 
mean equal to -10 and standard deviation equal to 2 (\\(X\\sim N(-10,2)\\)). You can recycle code!\n\n\n\n\n\n7.1.2 Random sampling from data\nIn this section we will work with good ol’ mtcars, one of R’s most notable default datasets. We’ll assign it to an object so it shows in our Environment pane:\n\nmy_mtcars &lt;- mtcars\n\n\n\n\n\n\n\nTip\n\n\n\nDefault datasets such as mtcars and iris are useful because they are available to everyone, and once you become familiar with them, you can start thinking about the code instead of the intricacies of the data. These qualities also make default datasets ideal for building reproducible examples (see Wickham 2014)\n\n\nWe can use the function sample() to obtain random values from a vector. The size = argument specifies how many values we want. For example, let’s get one random value of the “mpg” column:\n\nsample(my_mtcars$mpg, size = 1)\n\n[1] 24.4\n\n\nEvery time we run this command, we can get a different result:\n\nsample(my_mtcars$mpg, size = 1)\n\n[1] 14.7\n\n\n\nsample(my_mtcars$mpg, size = 1)\n\n[1] 15.5\n\n\nIn some occasions we do want to get the same result consistently after running some random process multiple times. In this case, we set a seed, which takes advantage of R’s pseudo-random number generator capabilities. No matter how many times we run the following code block, the result will be the same:\n\nset.seed(123)\nsample(my_mtcars$mpg, size = 1)\n\n[1] 15\n\n\nSampling with replacement means that we can get the same value multiple times. 
For example:\n\nset.seed(12)\nsample(c(\"Banana\", \"Apple\", \"Orange\"), size = 3, replace = T)\n\n[1] \"Apple\"  \"Apple\"  \"Orange\"\n\n\n\nsample(my_mtcars$mpg, size = 100, replace = T)\n\n  [1] 26.0 15.2 18.7 18.7 30.4 21.0 24.4 26.0 32.4 15.8 32.4 19.2 18.1 16.4 19.2\n [16] 27.3 14.3 10.4 17.3 13.3 21.4 13.3 19.2 24.4 15.0 27.3 17.8 15.2 15.8 14.3\n [31] 19.7 16.4 18.7 15.8 19.2 21.0 14.3 15.2 14.3 27.3 21.4 33.9 33.9 21.4 30.4\n [46] 33.9 21.4 17.3 17.3 10.4 26.0 18.7 15.2 30.4 10.4 10.4 15.5 14.3 26.0 17.3\n [61] 33.9 26.0 24.4 18.7 30.4 32.4 21.5 30.4 15.2 27.3 13.3 17.3 21.4 24.4 13.3\n [76] 22.8 33.9 13.3 21.5 14.3 19.2 30.4 24.4 26.0 15.8 10.4 24.4 14.3 15.2 10.4\n [91] 19.2 21.0 16.4 19.2 24.4 19.7 18.7 10.4 18.7 17.8\n\n\nIn order to sample not from a vector but from a data frame’s rows, we can use the slice_sample() function from dplyr:\n\nmy_mtcars |&gt; \n  slice_sample(n = 2) # a number of rows\n\n                  mpg cyl disp  hp drat   wt  qsec vs am gear carb\nDodge Challenger 15.5   8  318 150 2.76 3.52 16.87  0  0    3    2\nDatsun 710       22.8   4  108  93 3.85 2.32 18.61  1  1    4    1\n\n\n\nmy_mtcars |&gt; \n  slice_sample(prop = 0.5) # a proportion of rows\n\n                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb\nToyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1\nFerrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6\nMerc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3\nHornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2\nMaserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8\nDatsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1\nFord Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4\nDodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2\nMerc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4\nLincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  
0  0    3    4\nValiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1\nFiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1\nMazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4\nMerc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2\nCamaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4\nCadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4\n\n\nAgain, we can also use seeds here to ensure that we’ll get the same result each time:\n\nset.seed(123)\nmy_mtcars |&gt; \n  slice_sample(prop = 0.5)\n\n                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb\nMaserati Bora      15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8\nCadillac Fleetwood 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4\nHonda Civic        30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2\nMerc 450SLC        15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3\nDatsun 710         22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1\nMerc 280           19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4\nFiat 128           32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1\nDodge Challenger   15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2\nMerc 280C          17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4\nHornet Sportabout  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2\nToyota Corolla     33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1\nFord Pantera L     15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4\nAMC Javelin        15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2\nFerrari Dino       19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6\nMerc 230           22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2\nLotus Europa       30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2\n\n\n\n\n\n\n\n\nExercise\n\n\n\nUse slice_sample() to sample 32 rows from mtcars with replacement.",
     "crumbs": [
       "<span class='chapter-number'>7</span>  <span class='chapter-title'>Statistics and simulations</span>"
     ]
diff --git a/sitemap.xml b/sitemap.xml
@@ -2,54 +2,54 @@
 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
     <loc>https://methodscamp.github.io/index.html</loc>
-    <lastmod>2025-08-20T02:00:40.729Z</lastmod>
+    <lastmod>2025-08-21T02:49:34.785Z</lastmod>
   </url>
   <url>
     <loc>https://methodscamp.github.io/00_setup.html</loc>
-    <lastmod>2025-08-20T02:00:40.695Z</lastmod>
+    <lastmod>2025-08-21T02:49:34.751Z</lastmod>
   </url>
   <url>
     <loc>https://methodscamp.github.io/01_r_intro.html</loc>
-    <lastmod>2025-08-20T02:00:40.695Z</lastmod>
+    <lastmod>2025-08-21T02:49:34.751Z</lastmod>
   </url>
   <url>
     <loc>https://methodscamp.github.io/02_tidy_data1.html</loc>
-    <lastmod>2025-08-20T02:00:40.695Z</lastmod>
+    <lastmod>2025-08-21T02:49:34.751Z</lastmod>
   </url>
   <url>
     <loc>https://methodscamp.github.io/03_functions.html</loc>
-    <lastmod>2025-08-20T02:00:40.696Z</lastmod>
+    <lastmod>2025-08-21T02:49:34.751Z</lastmod>
   </url>
   <url>
     <loc>https://methodscamp.github.io/04_calculus.html</loc>
-    <lastmod>2025-08-20T02:00:40.696Z</lastmod>
+    <lastmod>2025-08-21T02:49:34.751Z</lastmod>
   </url>
   <url>
     <loc>https://methodscamp.github.io/06_tidy_data2.html</loc>
-    <lastmod>2025-08-20T02:00:40.696Z</lastmod>
+    <lastmod>2025-08-21T02:49:34.752Z</lastmod>
   </url>
   <url>
     <loc>https://methodscamp.github.io/07_probability.html</loc>
-    <lastmod>2025-08-20T02:00:40.696Z</lastmod>
+    <lastmod>2025-08-21T02:49:34.752Z</lastmod>
   </url>
   <url>
     <loc>https://methodscamp.github.io/08_stats_sims.html</loc>
-    <lastmod>2025-08-20T02:00:40.696Z</lastmod>
+    <lastmod>2025-08-21T02:49:34.752Z</lastmod>
   </url>
   <url>
     <loc>https://methodscamp.github.io/05_matrices.html</loc>
-    <lastmod>2025-08-20T02:00:40.696Z</lastmod>
+    <lastmod>2025-08-21T02:49:34.751Z</lastmod>
   </url>
   <url>
     <loc>https://methodscamp.github.io/11_wrapup.html</loc>
-    <lastmod>2025-08-20T02:00:40.697Z</lastmod>
+    <lastmod>2025-08-21T02:49:34.753Z</lastmod>
   </url>
   <url>
     <loc>https://methodscamp.github.io/references.html</loc>
-    <lastmod>2025-08-20T02:00:40.839Z</lastmod>
+    <lastmod>2025-08-21T02:49:34.896Z</lastmod>
   </url>
   <url>
     <loc>https://methodscamp.github.io/Methods-Camp.pdf</loc>
-    <lastmod>2025-08-20T02:04:36.275Z</lastmod>
+    <lastmod>2025-08-21T02:53:36.532Z</lastmod>
   </url>
 </urlset>
