- "text": "7.1.1 Random sampling from theoretical distributions\n\nUniform distribution\nFor the uniform distribution, the arguments specify the number of draws and the lower and upper bounds of the interval:\n\nrunif(n = 20, min = -3, max = 3)\n\n [1] 0.9151310 -2.7302919 -2.0926021 0.4233409 -1.4870530 0.5004577\n [7] 0.7759846 0.2851662 0.6160349 2.6545841 -2.8730297 -1.3263089\n[13] 1.8697259 0.5580634 1.4596908 2.1194205 -2.3670650 -0.8514127\n[19] -1.2995478 -2.2674588\n\n\nWhen we draw a million times from the distribution and plot the result, the histogram is approximately flat between -3 and 3, as we would expect:\n\nset.seed(123)\nmy_runif <- runif(n = 1000000, min = -3, max = 3)\n\n\nggplot(data.frame(my_runif), aes(x = my_runif)) +\n geom_histogram(binwidth = 0.25, boundary = 0, closed = \"right\") +\n scale_x_continuous(breaks = seq(-5, 5, 1), limits = c(-5, 5))\n\n\n\n\n\n\n\n\n\n\nBinomial distribution\nFor the binomial distribution, we can specify the number of draws, how many trials each draw will have, and the probability of success.\nFor instance, we can ask R to do the following twenty times: flip a fair coin one hundred times, and count the number of tails.\n\nrbinom(n = 20, size = 100, prob = 0.5)\n\n [1] 48 45 54 50 58 50 42 58 48 57 53 49 52 51 49 40 57 53 52 41\n\n\nWith the prob = argument, we can model unfair coins:\n\nrbinom(n = 20, size = 100, prob = 0.9)\n\n [1] 88 87 93 95 93 92 91 94 87 91 90 92 93 89 90 95 91 90 86 88\n\n\n\n\nNormal distribution\nFor the Normal or Gaussian distribution, we specify the number of draws, the mean, and the standard deviation:\n\nrnorm(n = 20, mean = 0, sd = 1)\n\n [1] 1.10455864 0.06386693 -1.59684275 1.86298270 -0.90428935 -1.55158044\n [7] 1.27986282 -0.32420495 -0.70015076 2.17271578 0.89778913 -0.01338538\n[13] -0.74074395 0.36772316 -0.66453402 -1.11498344 -1.15067439 -0.55098894\n[19] 0.10503154 -0.27183645\n\n\n\n\n\n\n\n\nExercise\n\n\n\nCompute and plot my_rnorm, a vector with 10,000 draws from a Normal distribution \\(X\\) with mean equal to -10 and 
standard deviation equal to 2 (\\(X\\sim N(-10,2)\\)). You can recycle code!\n\n\n\n\n\n7.1.2 Random sampling from data\nIn this section we will work with good ol’ mtcars, one of R’s most notable default datasets. We’ll assign it to an object so it shows in our Environment pane:\n\nmy_mtcars <- mtcars\n\n\n\n\n\n\n\nTip\n\n\n\nDefault datasets such as mtcars and iris are useful because they are available to everyone, and once you become familiar with them, you can start thinking about the code instead of the intricacies of the data. These qualities also make default datasets ideal for building reproducible examples (see Wickham 2014).\n\n\nWe can use the function sample() to obtain random values from a vector. The size = argument specifies how many values we want. For example, let’s get one random value from the “mpg” column:\n\nsample(my_mtcars$mpg, size = 1)\n\n[1] 24.4\n\n\nEvery time we run this command, we can get a different result:\n\nsample(my_mtcars$mpg, size = 1)\n\n[1] 14.7\n\n\n\nsample(my_mtcars$mpg, size = 1)\n\n[1] 15.5\n\n\nOn some occasions we want a random process to return the same result every time we run it. In this case, we set a seed with set.seed(), which fixes the starting state of R’s pseudo-random number generator. No matter how many times we run the following code block, the result will be the same:\n\nset.seed(123)\nsample(my_mtcars$mpg, size = 1)\n\n[1] 15\n\n\nSampling with replacement means that we can get the same value multiple times. 
For example:\n\nset.seed(12)\nsample(c(\"Banana\", \"Apple\", \"Orange\"), size = 3, replace = TRUE)\n\n[1] \"Apple\" \"Apple\" \"Orange\"\n\n\n\nsample(my_mtcars$mpg, size = 100, replace = TRUE)\n\n [1] 26.0 15.2 18.7 18.7 30.4 21.0 24.4 26.0 32.4 15.8 32.4 19.2 18.1 16.4 19.2\n [16] 27.3 14.3 10.4 17.3 13.3 21.4 13.3 19.2 24.4 15.0 27.3 17.8 15.2 15.8 14.3\n [31] 19.7 16.4 18.7 15.8 19.2 21.0 14.3 15.2 14.3 27.3 21.4 33.9 33.9 21.4 30.4\n [46] 33.9 21.4 17.3 17.3 10.4 26.0 18.7 15.2 30.4 10.4 10.4 15.5 14.3 26.0 17.3\n [61] 33.9 26.0 24.4 18.7 30.4 32.4 21.5 30.4 15.2 27.3 13.3 17.3 21.4 24.4 13.3\n [76] 22.8 33.9 13.3 21.5 14.3 19.2 30.4 24.4 26.0 15.8 10.4 24.4 14.3 15.2 10.4\n [91] 19.2 21.0 16.4 19.2 24.4 19.7 18.7 10.4 18.7 17.8\n\n\nTo sample rows from a data frame rather than values from a vector, we can use the slice_sample() function from dplyr:\n\nmy_mtcars |> \n slice_sample(n = 2) # a number of rows\n\n mpg cyl disp hp drat wt qsec vs am gear carb\nDodge Challenger 15.5 8 318 150 2.76 3.52 16.87 0 0 3 2\nDatsun 710 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1\n\n\n\nmy_mtcars |> \n slice_sample(prop = 0.5) # a proportion of rows\n\n mpg cyl disp hp drat wt qsec vs am gear carb\nToyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1\nFerrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6\nMerc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3\nHornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2\nMaserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8\nDatsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1\nFord Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4\nDodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2\nMerc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4\nLincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4\nValiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1\nFiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1\nMazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4\nMerc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2\nCamaro Z28 13.3 8 
350.0 245 3.73 3.840 15.41 0 0 3 4\nCadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4\n\n\nAgain, we can also use seeds here to ensure that we’ll get the same result each time:\n\nset.seed(123)\nmy_mtcars |> \n slice_sample(prop = 0.5)\n\n mpg cyl disp hp drat wt qsec vs am gear carb\nMaserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8\nCadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4\nHonda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2\nMerc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3\nDatsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1\nMerc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4\nFiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1\nDodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2\nMerc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4\nHornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2\nToyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1\nFord Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4\nAMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2\nFerrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6\nMerc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2\nLotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2\n\n\n\n\n\n\n\n\nExercise\n\n\n\nUse slice_sample() to sample 32 rows from mtcars with replacement.",