diff --git a/001-introduction.Rmd b/001-introduction.Rmd index 4fc3351..be701ed 100644 --- a/001-introduction.Rmd +++ b/001-introduction.Rmd @@ -7,7 +7,7 @@ library( tidyverse ) # Introduction {#introduction} -Monte Carlo simulations are a tool for studying the behavior of random processes, such as the behavior of a statistical estimation procedure when applied to a sample of data. +Monte Carlo\index{Monte Carlo simulation} simulations are a tool for studying the behavior of random processes, such as the behavior of a statistical estimation procedure when applied to a sample of data. Within quantitatively oriented fields, researchers developing new statistical methods or evaluating the use of existing methods nearly always use Monte Carlo simulations as part of their research process. In the context of methodological development, researchers use simulations in a way analogous to how a chef would use their test kitchen to develop new recipes before putting them on the menu, how a manufacturer would use a materials testing laboratory to evaluate the safety and durability of a new consumer good before bringing it to market, or how an astronaut would prepare for a spacewalk by practicing the process in an underwater mock-up. Simulation studies provide a clean and controlled environment for testing out data analysis approaches before putting them to use with real empirical data. @@ -27,7 +27,7 @@ The basic approach for doing so is as follows: 3. Repeat Steps 1 and 2 many times. 4. Summarize the results across these repetitions in order to understand the general trends or patterns in how the method works. -Simulation is useful because one can control the data-generating process and therefore fully know the truth---something that is almost always uncertain when analyzing real, empirical data. 
+Simulation is useful because one can control the data-generating process\index{data-generating process} and therefore fully know the truth---something that is almost always uncertain when analyzing real, empirical data. Having full control of the data-generating process makes it possible to assess how well a procedure works by comparing the estimates produced by the data analysis procedure against this known truth. For instance, we can see if estimates from a statistical procedure are consistently too high or too low (i.e., whether an estimator is systematically biased). We can also compare multiple data analysis procedures by assessing the degree of error in each set of results to determine which procedure is generally more accurate when applied to the same collection of artificial datasets. @@ -110,7 +110,7 @@ During the process of proposing, seeking funding for, and planning an empirical Part of such justifications may involve a _power analysis_, or an approximation of the probability that the study will show a statistically significant effect, given assumptions about the magnitude of true effects and other aspects of the data-generating process. Researchers may also wish to compare the power of different possible designs in order to inform decisions about how to carry out the proposed study given a set of monetary and temporal constraints. -Many guidelines and tools are available for conducting power analysis for various research designs, including software such as [PowerUp!](https://www.causalevaluation.org/power-analysis.html) [@dong2013PowerUpToolCalculating], [the Generalizer](https://www.thegeneralizer.org/) [@tipton2014stratified], [G*Power](https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower) [@faul2009StatisticalPowerAnalyses], and [PUMP](https://cran.r-project.org/web//packages/PUMP/index.html) [@hunter2023PowerMultiplicityProject]. 
+Many guidelines and tools are available for conducting power analysis\index{power analysis} for various research designs, including software such as [PowerUp!](https://www.causalevaluation.org/power-analysis.html) [@dong2013PowerUpToolCalculating], [the Generalizer](https://www.thegeneralizer.org/) [@tipton2014stratified], [G*Power](https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower) [@faul2009StatisticalPowerAnalyses], and [PUMP](https://cran.r-project.org/web//packages/PUMP/index.html) [@hunter2023PowerMultiplicityProject]. These tools use analytic formulas for power, which are often derived using approximations and simplifying assumptions about a planned design. Simulation provides a very general-purpose alternative for power calculations, which can avoid such approximations and simplifications. By repeatedly simulating data based on a hypothetical process and then analyzing data following a specific protocol, one can _computationally_ approximate the power to detect an effect of a specified size. @@ -160,7 +160,7 @@ One can partially address questions of generalization by examining a wide range Even this strategy has limitations, though. Except for very simple processes, we can seldom consider every possible set of conditions. -As we will see in later chapters, the design of a simulation study typically entails making choices over very large spaces of possibility. +As we will see in later chapters, the design of a simulation study\index{simulation study} typically entails making choices over very large spaces of possibility. This flexibility leaves lots of room for discretion and judgement, and even for personal or professional biases [@boulesteix2020Replication]. Due to this flexibility, simulation findings are viewed with great skepticism by many.
The following motto summarizes the skeptic's concern: @@ -173,7 +173,7 @@ If our goal is to show something that we already believe is correct (e.g., that [^guest-speakers]: A comment from James: I recall attending seminars in the statistics department during graduate school, where guest speakers usually presented both some theory and some simulation results. A few years into my graduate studies, I realized that the simulation part of the presentation could nearly always be replaced with a single slide that said "we did some simulations and showed that our new method works better than old methods under conditions that we have cleverly selected to be favorable for our approach." I hope that my own work is not as boring or predictable as my memory of these seminars. -Critiques of simulation studies often revolve around the _realism_, _relevance_, or _generality_ of the data generating process. +Critiques of simulation studies often revolve around the _realism_, _relevance_, or _generality_ of the data-generating process\index{data-generating process}. Are the simulated data realistic, in the sense that they follow similar patterns to what one would see in real empirical data? Are the explored aspects of the simulation relevant to what we would expect to find in practice? Was the simulation systematic in exploring a wide variety of scenarios, so that general conclusions are warranted? @@ -268,12 +268,12 @@ We have also included many case studies throughout the book; these are designed We divided the book into several parts. In Part I (which you are reading now), we lay out our case for learning simulation, introduce some guidance on programming, and present an initial, very simple simulation to set the stage for later discussion of design principles. -Part II lays out the core components (generating artificial data, applying analytic procedures, executing the simulations, and analyzing the simulation results) for simulating a single scenario.
It then presents some more involved case studies that illustrate the principles of modular, tidy simulation design. +Part II lays out the core components (generating artificial data, applying analytic procedures, executing the simulations, and analyzing the simulation results) for simulating a single scenario. It then presents some more involved case studies that illustrate the principles of modular, tidy simulation design.\index{tidy design} Part III moves to multifactor simulations, meaning simulations that look at more than one scenario or context. Multifactor simulation is central to the design of more formal simulation studies because it is only by evaluating or comparing estimation procedures across multiple scenarios that we can begin to understand their general properties. The book closes with two final parts. Part IV covers some technical challenges that are commonly encountered when programming simulations, including reproducibility, parallel computing, and error handling. -Part V covers three extensions to specialized forms of simulation: simulating to calculate power, simulating within a potential outcomes framework for causal inference, and the parametric bootstrap. The specialized applications underscore how simulation can be used to answer a wide variety of questions across a range of contexts, thus connecting back to the broader purposes discussed above. +Part V covers three extensions to specialized forms of simulation: simulating to calculate power, simulating within a potential outcomes\index{potential outcomes framework} framework for causal inference, and the parametric bootstrap\index{parametric bootstrap}. The specialized applications underscore how simulation can be used to answer a wide variety of questions across a range of contexts, thus connecting back to the broader purposes discussed above. The book also includes appendices with some further guidance on writing R code and pointers to further resources. 
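The four-step recipe described in this chapter can be sketched in a few lines of R. The following is a toy illustration with made-up parameter values (a sample of size 25 from a normal distribution with mean 4), not code from the case studies that follow:

```{r}
# Toy sketch of the basic approach: (1) generate data from a known process,
# (2) apply an estimation procedure, (3) repeat many times, (4) summarize.
set.seed( 20 )
ests <- replicate( 1000, {
  dat <- rnorm( 25, mean = 4, sd = 2 )  # Step 1: true mean is known to be 4
  mean( dat )                           # Step 2: apply the estimator
} )                                     # Step 3: replicate() repeats Steps 1-2
mean( ests ) - 4                        # Step 4: summarize (estimated bias)
```

Because the true mean is known to be 4, the last line directly estimates the bias of the sample mean under this data-generating process.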
diff --git a/003-programming-preliminaries.Rmd b/003-programming-preliminaries.Rmd index ad96390..6f124df 100644 --- a/003-programming-preliminaries.Rmd +++ b/003-programming-preliminaries.Rmd @@ -56,10 +56,11 @@ In writing code for simulations, we will make extensive use of this particular f We will have more to say about random number generation later. ### Rolling your own {#rolling-your-own} +\index{functions!custom written} In R, you can create your own function by specifying the pieces of input information, the steps to follow in transforming the inputs, and the result to return as output. Learning to write your own functions to carry out calculations is an immensely useful skill that will greatly enhance your ability to accomplish a range of tasks. -Writing custom functions is also central to our approach to coding Monte Carlo simulations, and so we highlight some of the key considerations here. +Writing custom functions is also central to our approach to coding Monte Carlo\index{Monte Carlo simulation} simulations, and so we highlight some of the key considerations here. [Chapter 19 of R for Data Science (1st edition)](https://r4ds.had.co.nz/functions.html) provides an in-depth discussion of how to write your own functions. Here is an example of a custom function called `one_pval()`: @@ -260,7 +261,7 @@ Later chapters will have much more to say about the process of writing custom fu ### Function skeletons {#function-skeletons} -In discussing how to write functions for simulations, we will often present _function skeletons_. By a skeleton, we mean code that creates a function with a specific set of input arguments, but where the body is left partially or fully unspecified. +In discussing how to write functions for simulations, we will often present _function skeletons_. 
By a skeleton, we mean code that creates a function with a specific set of input arguments, but where the body is left partially or fully unspecified.\index{skeleton} Here is a cursory example of a function skeleton: ```{r} run_simulation <- function( N, J, mu, sigma, tau ) { @@ -322,7 +323,7 @@ Pipes are a very nice technique for writing clear code that is easy for others t As we will elaborate in subsequent chapters, we follow a modular approach to writing simulations, in which each component of the simulation is represented by its own custom function or its own object in R. This modular approach leads to code that always has the same broad structure and where the process of implementing the simulation follows a set sequence of steps. -We start by coding a data-generating process, then write one or more data-analysis methods, then determine how to evaluate the performance of the methods, and finally implement an experimental design to examine the performance of the methods across multiple scenarios. +We start by coding a data-generating process\index{data-generating process}, then write one or more data-analysis methods, then determine how to evaluate the performance of the methods, and finally implement an experimental design to examine the performance of the methods across multiple scenarios. Over the next several chapters, we will walk through this process several times. Although we always follow the same broad process, the case studies that we will present are _not_ intended as a cookbook that must be rigidly followed. @@ -360,4 +361,4 @@ Adapting a good example is usually much easier than starting from a blank screen } ``` -5. Modify `one_pval()` to allow the user to specify a hypothesized value for the population mean, to use when calculating the one-sample t-test results. \ No newline at end of file +5. Modify `one_pval()` to allow the user to specify a hypothesized value for the population mean, to use when calculating the one-sample t-test results. 
diff --git a/005-initial-t-test-simulation.Rmd b/005-initial-t-test-simulation.Rmd index 2f737c1..6873af7 100644 --- a/005-initial-t-test-simulation.Rmd +++ b/005-initial-t-test-simulation.Rmd @@ -1,9 +1,10 @@ # An initial simulation {#t-test-simulation} +\index{example!t-test simulation} To begin learning about simulation, a good starting place is to examine a small, concrete example. This example illustrates how simulation involves replicating the data-generating and data-analysis processes, followed by aggregating the results across replications. -Our little example encapsulates the bulk of our approach to Monte Carlo simulation, touching on all the main components involved. +Our little example encapsulates the bulk of our approach to Monte Carlo\index{Monte Carlo simulation} simulation, touching on all the main components involved. In subsequent chapters we will look at each of these components in greater detail. But first, let us look at a simulation of a very simple statistical problem. @@ -100,7 +101,7 @@ We got about 95% coverage, which is good news. In `r table(rps)[[1]]` out of the It is important to recognize that this set of simulation results, and our coverage rate of `r 100 * mean( rps == 1 )`%, itself has some uncertainty in it. Because we only repeated the simulation 1000 times, what we really have is a _sample_ of 1000 independent replications, out of an infinite number of possible simulation runs. Our coverage of `r round(100*mean(rps==1),digits=1)`% is an _estimate_ of what the true coverage would be, if we ran more and more replications. -The source of uncertainty of our estimate is called _Monte Carlo simulation error (MCSE)_. +The source of uncertainty of our estimate is called _Monte Carlo simulation error (MCSE\index{Monte Carlo standard error})_. We can actually assess the Monte Carlo simulation error in our simulation results using standard statistical procedures for independent and identically distributed data.
Here we use a proportion test to check whether the estimated coverage rate is consistent with a true coverage rate of 95%: ```{r} @@ -238,8 +239,8 @@ Although the general trend is pretty clear, the graph is still a bit messy because So far, we have executed a simple simulation to assess how well a statistical method works in a given circumstance, then expanded the simulation by running a single-factor experiment in which we varied the sample size to see how the method's performance changes. In our example, we found that coverage is below what it should be for small sample sizes, but improves for sample sizes in the 100's. -This example captures all the major steps of a simulation study, which we outlined at the start of Chapter \@ref(introduction). -We generated some hypothetical data according to a fully-specified data-generating process: we did both a normal distribution and a geometric distribution, each with a mean of 4. +This example captures all the major steps of a simulation study\index{simulation study}, which we outlined at the start of Chapter \@ref(introduction). +We generated some hypothetical data according to a fully-specified data-generating process\index{data-generating process}: we used both a normal distribution and a geometric distribution, each with a mean of 4. We applied a defined data-analysis procedure to the simulated data: we used a confidence interval based on the $t$ distribution. We assessed how well the procedure worked across replications of the data-generating and data-analysis processes: in this case we focused on the coverage rate of the confidence interval. After creating a function to implement this whole process for a single scenario, we investigated how the performance of the confidence interval changed depending on sample size.
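The Monte Carlo simulation error of an estimated coverage rate can also be approximated directly, using the usual standard-error formula for a proportion. Here is a small sketch with illustrative numbers (not the exact results from this chapter):

```{r}
# MCSE of an estimated proportion (such as a coverage rate) based on
# R replications is sqrt( p * (1 - p) / R ).
p_hat <- 0.95   # illustrative estimated coverage rate
R <- 1000       # number of replications
mcse <- sqrt( p_hat * (1 - p_hat) / R )
round( mcse, 4 )
```

With 1000 replications, a coverage estimate near 95% carries an MCSE of roughly 0.7 percentage points, which is why the estimated coverage should be interpreted with some caution.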
diff --git a/010-Simulation-structure.Rmd b/010-Simulation-structure.Rmd index 087d7a3..abbd81d 100644 --- a/010-Simulation-structure.Rmd +++ b/010-Simulation-structure.Rmd @@ -2,7 +2,7 @@ # Structure of a simulation study {#simulation-structure} -Monte Carlo simulation is a very flexible tool that researchers use to study a vast array of different models and topics. +Monte Carlo\index{Monte Carlo simulation} simulation is a very flexible tool that researchers use to study a vast array of different models and topics. Within the realm of methodological research, most simulations share a common structure, nearly always involving the same set of steps or component pieces. In learning to design your own simulations, it is very useful to recognize the core components that most simulation studies share. Identifying these components will help you to organize your work and structure the coding tasks involved in writing a simulation. @@ -139,7 +139,7 @@ We will now look at the inputs and outputs of each function to see how they align Subsequent chapters examine each piece in much greater detail---putting meat on the bones of each function skeleton, to torture our metaphor---and discuss specific statistical issues and programming techniques that are useful for designing each component. Besides illustrating the skeletal framework of a simulation, readers might find this framework useful as a template from which to start writing their own code.
-The `simhelpers` package includes the function `create_skeleton()`, which will open a new R script that contains a template for a simulation study, with sections corresponding to each component: +The `simhelpers` package\index{package!simhelpers} includes the function `create_skeleton()`,\index{create\_skeleton()} which will open a new R script that contains a template for a simulation study, with sections corresponding to each component: ```{r eval=FALSE} simhelpers::create_skeleton() ``` @@ -237,7 +237,7 @@ Here is a different version of the skeleton, which uses `replicate()` instead of This version of the skeleton does not create a `one_run()` helper function, but instead puts the code from the body of `one_run()` directly into the `expr` argument of `replicate()`. To learn about other ways of repeatedly evaluating the simulation process, see Appendix \@ref(repeating-oneself). We go into further detail about how to approach running the simulation process in Chapter \@ref(running-the-simulation-process). -Among other things, we will illustrate how to use the `bundle_sim()` function from the `simhelpers` package to automate the process of coding this step, thereby avoiding the need to write a `one_run()` helper function. +Among other things, we will illustrate how to use the `bundle_sim()` function from the `simhelpers` package to automate the process of coding this step, thereby avoiding the need to write a `one_run()` helper function.\index{bundle\_sim()}\index{package!simhelpers} ### Performance summaries @@ -265,7 +265,7 @@ We discuss the specifics of different performance measures and assessment of Mon ### Multifactor simulations -Thus far, we have sketched out the structure of a modular, tidy simulation for a single context. 
+Thus far, we have sketched out the structure of a modular, tidy simulation for a single context.\index{tidy design} In our $t$-test case study, for example, we might ask how well the $t$-test works when we have $n=100$ units and the observations follow a geometric distribution. However, we rarely want to examine a method only in a single context. Typically, we want to explore how well a procedure works across a range of different contexts. @@ -281,4 +281,4 @@ We will discuss multiple-scenario simulations in Part III (starting with Chapter ## Exercises -1. Look back at the $t$-test simulation presented in Chapter \@ref(t-test-simulation). The code presented there did not entirely follow the formal structure outlined in this chapter. Revise the code by creating separate functions for each of four components in the simulation skeleton. Using the functions, re-run the simulation and recreate one or more graphs from the exercises in the previous chapter. \ No newline at end of file +1. Look back at the $t$-test simulation presented in Chapter \@ref(t-test-simulation)\index{example!t-test simulation}. The code presented there did not entirely follow the formal structure outlined in this chapter. Revise the code by creating separate functions for each of the four components in the simulation skeleton. Using the functions, re-run the simulation and recreate one or more graphs from the exercises in the previous chapter. diff --git a/015-Case-study-ANOVA.Rmd b/015-Case-study-ANOVA.Rmd index 1ad77f1..cd745d7 100644 --- a/015-Case-study-ANOVA.Rmd +++ b/015-Case-study-ANOVA.Rmd @@ -1,11 +1,11 @@ # Case Study: Heteroskedastic ANOVA and Welch {#case-ANOVA} - +\index{example!heteroskedastic ANOVA (Welch)} ```{r include = FALSE} library(tidyverse) ``` -In this chapter, we present another detailed example of a simulation study to demonstrate how to put the principles of tidy, modular simulation into practice.
+In this chapter, we present another detailed example of a simulation study\index{simulation study} to demonstrate how to put the principles of tidy, modular simulation into practice. To illustrate the process of programming a simulation, we reconstruct the simulations from @brown1974SmallSampleBehavior. We will also use this case study as a recurring example in some of the following chapters. @@ -87,9 +87,9 @@ It is a measure of how sensitive a method is to violations of the null. Brown and Forsythe estimated error rates and power for nominal $\alpha$-levels of 1%, 5%, and 10%. Table 1 of their paper reports the simulation results for type-I error (labeled as "size"). -Our Table \@ref(tab:BF-table1) reports some of their results with respect to Type I error. +Our Table \@ref(tab:BF-table1) reports some of their results with respect to Type I error\index{performance measures!Type I error}. For a well-calibrated hypothesis testing method, the reported numbers should be very close to the desired alpha levels, as listed at the top of the table. -We can compare the four tests to each other across each row, where each row is a specific scenario defined by a specific data generating process. +We can compare the four tests to each other across each row, where each row is a specific scenario defined by a specific data-generating process\index{data-generating process}. Looking at ANOVA, for example, we see some scenarios with very elevated rates. For instance, in Scenario E, the ANOVA F-test has a 21.9% rejection rate when it should only have 10%. In contrast, the ANOVA F works fine under Scenario A, which is what one would expect because all the groups have the same variance. Brown and Forsythe's choice of scenarios here illustrates a broader design principle: to provide a full picture of the performance of a method or set of methods, it is wise to always evaluate them under conditions where we expect things to work, as well as conditions where we expect them not to work well.
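As a quick illustration of the contrast between the tests, both the classic F-test and the Welch test can be computed with `oneway.test()`. This sketch uses made-up sample sizes and variances (with equal means, so the null holds), not one of Brown and Forsythe's actual scenarios:

```{r}
# One heteroskedastic dataset with equal group means: group C is larger
# and has three times the standard deviation of groups A and B.
set.seed( 43 )
dat <- data.frame(
  group = rep( c("A", "B", "C"), times = c(10, 10, 40) ),
  x = rnorm( 60, mean = 0, sd = rep( c(1, 1, 3), times = c(10, 10, 40) ) )
)
oneway.test( x ~ group, data = dat, var.equal = TRUE )   # classic ANOVA F-test
oneway.test( x ~ group, data = dat, var.equal = FALSE )  # Welch-adjusted test
```

Estimating rejection rates like those in Table 1 amounts to repeating this process many times and recording how often each p-value falls below the nominal $\alpha$-level.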
@@ -183,6 +183,7 @@ To replicate the Brown and Forsythe simulation, we will first write functions to We will then use these functions to evaluate the hypothesis testing procedures in a specific scenario with a particular set of parameters (e.g., sample sizes, number of groups, and so forth). We will then scale up to execute the simulations for a range of scenarios that vary the parameters of the data-generating model, just as reported in Brown and Forsythe's paper. ## The data-generating model {#case-anova-DGP} +\index{data-generating process} In the heteroskedastic one-way ANOVA simulation, there are three sets of parameter values: population means, population variances, and sample sizes. Rather than attempting to write a general data-generating function immediately, it is often easier to write code for a specific case first and then use that code as a starting point for developing a function. @@ -241,7 +242,7 @@ The inputs to the function will be the parameters of the model that we specified ``` The function is simply the code we built previously, all bundled up. -We developed the function by first writing code to make the data-generating process to work once, the way we want, and only then turning the final code into a function for later reuse. +We developed the function by first writing code to make the data-generating process\index{data-generating process} work once, the way we want, and only then turning the final code into a function for later reuse. Once we have turned the code into a function, we can call it to get a new set of simulated data. For example, to generate a dataset with the same parameters as before, we can do: @@ -344,7 +345,7 @@ one_run( mu = mu, sigma_sq = sigma_sq, sample_size = sample_size ) This function implements a single simulation trial by generating artificial data and then analyzing the data, ending with a tidy dataset that has results for the single run.
We next call `one_run()` over and over; see Appendix \@ref(repeating-oneself) for some discussion of options. -The following uses `repeat_and_stack()` from `simhelpers` to evaluate `one_run()` 4 times and then stack the results into a single dataset: +The following uses `repeat_and_stack()` from `simhelpers` to evaluate `one_run()` 4 times and then stack the results into a single dataset:\index{package!simhelpers}\index{repeat\_and\_stack()} ```{r} library(simhelpers) @@ -409,7 +410,7 @@ mean(p_vals$Welch < 0.05) The Welch test does much better, although it appears to be a little bit in excess of 0.05. Note that these two numbers are quite close (though not quite identical) to the corresponding entries in Table 1 of Brown and Forsythe (1974). The difference is due to the fact that both Table 1 and our results are actually _estimated_ rejection rates, because we have not simulated an infinite number of replications. The estimation error arising from using a finite number of replications is called _simulation error_ (or _Monte Carlo error_). -In Chapter \@ref(performance-measures), we will look more at how to estimate and control the Monte Carlo simulation error in performance measures. +In Chapter \@ref(performance-measures), we will look more at how to estimate and control the Monte Carlo\index{Monte Carlo simulation} simulation error in performance measures. So there you have it! Each part of the simulation is a distinct block of code, and together we have a modular simulation that can be easily extended to other scenarios or other tests. The exercises at the end of this chapter ask you to extend the framework further.
diff --git a/020-Data-generating-models.Rmd b/020-Data-generating-models.Rmd index 5a654b3..a36edf8 100644 --- a/020-Data-generating-models.Rmd +++ b/020-Data-generating-models.Rmd @@ -14,7 +14,7 @@ theme_set( theme_classic() ) # Data-generating processes {#data-generating-processes} As we saw in Chapter \@ref(simulation-structure), the first step of a simulation is creating artificial data based on some process where we know (and can control) the truth. -This step is what we call the data generating process (DGP). +This step is what we call the data-generating process\index{data-generating process} (DGP). Think of it as a recipe for cooking up artificial data, which can be applied over and over, any time we're hungry for a new dataset. Like a good recipe, a good DGP needs to be complete---it cannot be missing ingredients and it cannot omit any steps. Unlike cooking or baking, however, DGPs are usually specified in terms of a statistical model, or a set of equations involving constants, parameter values, and random variables. @@ -33,7 +33,8 @@ Before diving in, it is helpful to consider a few examples that we will return t ### Example 1: One-way analysis of variance {#ANOVA-example} -We have already seen one example of a DGP in the ANOVA example from Chapter \@ref(case-ANOVA). Here, we consider observations on some variable $X$ drawn from a population consisting of $G$ groups, where group $g$ has population mean $\mu_g$ and population variance $\sigma_g^2$ for $g = 1,...,G$. +We have already seen one example of a DGP in the ANOVA example from Chapter \@ref(case-ANOVA).\index{example!heteroskedastic ANOVA (Welch)} +Here, we consider observations on some variable $X$ drawn from a population consisting of $G$ groups, where group $g$ has population mean $\mu_g$ and population variance $\sigma_g^2$ for $g = 1,...,G$.
A simulated dataset consists of $n_g$ observations from each group $g = 1,...,G$, where $X_{ig}$ is the measurement for observation $i$ in group $g$. The statistical model for these data can be written as follows: $$ @@ -47,6 +48,7 @@ $$ ### Example 2: Bivariate Poisson model {#BVPois-example} +\index{example!bivariate Poisson} As a second example, suppose that we want to understand how the usual Pearson sample correlation coefficient behaves with non-normal data and also to investigate how the Pearson correlation relates to Spearman's rank correlation coefficient. To look into such questions, one DGP we might entertain is a bivariate Poisson model, which is a distribution for a pair of counts, $C_1,C_2$, where each count follows a Poisson distribution and where the pair of counts may be correlated. @@ -72,14 +74,14 @@ An interesting feature of this model is that the range of possible correlations ### Example 3: Hierarchical linear model for a cluster-randomized trial {#CRT-example} -Cluster-randomized trials are randomized experiments where the unit of randomization is a _group_ of individuals, rather than the individuals themselves. +Cluster-randomized trials\index{cluster-randomized trial} are randomized experiments where the unit of randomization is a _group_ of individuals, rather than the individuals themselves. For example, suppose we have a collection of schools and the students within them. A cluster-randomized trial involves randomizing the _schools_ into treatment or control conditions and then measuring an outcome such as academic performance for the students within the schools. Typically, researchers will be interested in the extent to which average outcomes differ across schools assigned to different conditions, which captures the impact of the treatment relative to the control condition. We will index the schools using $j = 1,...,J$ and let $n_j$ denote the number of students observed in school $j$.
Say that $Y_{ij}$ is the outcome measure for student $i$ in school $j$, for $i = 1,...,n_j$ and $j = 1,...,J$, and let $Z_j$ be an indicator equal to 1 if school $j$ is assigned to the treatment condition and otherwise equal to 0. -A widely used approach for estimating impacts from cluster-randomized trials is heirarchical linear modeling (HLM). +A widely used approach for estimating impacts from cluster-randomized trials is hierarchical linear modeling (HLM).\index{multilevel model} One way to write an HLM is in two parts. First, we consider a regression model that describes the distribution of the outcomes across students within school $j$: $$ @@ -103,7 +105,7 @@ A DGP involves a statistical model with parameters and random variables, but it In statistical analysis of real data, we often use models that describe only _part_ of the distribution of the data, rather than its full, multivariate distribution. For instance, when conducting a regression analysis, we are analyzing the distribution of an outcome or response variable, conditional on a set of predictor variables. In other words, we take the predictor variables as _given_ or _fixed_, rather than modeling their distribution. -When using an item response theory (IRT) model, we use responses to a set of items to estimate individual ability levels, given the set of items on the test. +When using an item response theory (IRT) model,\index{item response theory} we use responses to a set of items to estimate individual ability levels, given the set of items on the test. We do not (usually) model the items themselves. In contrast, if we are going to _generate_ data for simulating a regression model or IRT model, we need to specify distributions for these additional features (the predictors in a regression model, the items in an IRT model); we can no longer just take them as given.
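To make this point concrete, consider simulating data for a simple regression analysis. The distribution of the predictor is now part of the DGP and must be chosen explicitly; the standard normal choice below is an arbitrary assumption for this sketch, not a recommendation:

```{r}
# When generating regression data, the predictor X must be simulated too,
# even though an analysis of real data would condition on X as given.
set.seed( 7 )
N <- 100
X <- rnorm( N )                        # predictor distribution: our choice
Y <- 2 + 0.5 * X + rnorm( N, sd = 1 )  # outcome model with known coefficients
coef( lm( Y ~ X ) )
```

When analyzing real data we would take `X` as given; to simulate, we had to commit to a distribution for it.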
@@ -183,7 +185,7 @@ In this case study, we developed the following function to generate data for a s This function takes both the focal model parameters (`mu`, `sigma_sq`) and other design parameters that one might not think of as parameters per se (`sample_size`). When simulating, we have to specify quantities that we take for granted when analyzing real data.
-How would we write a DGP function for the bivariate Poisson model? The equations in Section \@ref(DGP-examples) give us the recipe, so it just a matter of re-expressing them in code.
+How would we write a DGP function for the bivariate Poisson model? The equations in Section \@ref(DGP-examples) give us the recipe, so it is just a matter of re-expressing them in code.\index{example!bivariate Poisson}
For this model, the only design parameter is the sample size, $N$; the sole focal parameter is the correlation between the variates, $\rho$; and the auxiliary parameters are the expected counts $\mu_1$ and $\mu_2$. @@ -233,7 +235,7 @@ ggplot(ANOVA_dat) + theme(legend.position = "none") ```
-Here is a plot of 30 observations from the bivariate Poisson distribution with means $\mu_1 = 10, \mu_2 = 7$ and correlation $\rho = .65$ (points are jittered slightly to avoid over-plotting):
+Here is a plot of 30 observations from the bivariate Poisson distribution with means $\mu_1 = 10, \mu_2 = 7$ and correlation $\rho = .65$ (points are jittered slightly to avoid over-plotting):\index{example!bivariate Poisson}
```{r bivariate-Poisson-scatter} #| echo: false @@ -311,9 +313,10 @@ In models that are even a little bit more complex, it is quite easy for small th Even simple checks such as these can be quite helpful in catching such bugs. ## Example: Simulating clustered data {#case-cluster}
+\index{example!cluster randomized trial}
Writing code for a complicated DGP can feel like a daunting task, but if you first focus on a recipe for how the data is generated, it is often not too bad to then convert that recipe into code.
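In miniature, a recipe like "first draw a random effect for each school, then draw student outcomes around each school's mean" translates almost line-for-line into code. The function and parameter names in this sketch are illustrative, not the case study's:

```{r}
# Illustrative two-step recipe for clustered data.
r_two_level <- function(J, n_j, gamma0, tau, sigma) {
  u <- rnorm(J, mean = 0, sd = tau)     # step 1: a random effect for each school
  data.frame(
    sid  = rep(1:J, each = n_j),
    Yobs = gamma0 + rep(u, each = n_j) +
      rnorm(J * n_j, sd = sigma)        # step 2: student outcomes within schools
  )
}
```

Each line of the recipe becomes one draw from a distribution; more elaborate DGPs mostly add further steps of the same kind.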
-We now illustrate this process with a detailed case study involving a more complex data-generating process
+We now illustrate this process with a detailed case study involving a more complex data-generating process.\index{data-generating process}
Recent literature on multisite trials (where, for example, students are randomized to treatment or control within each of a series of sites) has explored how variation in the strength of effects across sites can affect how different data-analysis procedures behave [e.g., @miratrix2021applied; @Bloom2016using]. In this example, we are going to extend this work to explore best practices for estimating treatment effects in cluster randomized trials. In particular, we will investigate what happens when the treatment impact for each school is related to the size of the school. @@ -321,7 +324,7 @@ In particular, we will investigate what happens when the treatment impact for ea ### A design decision: What do we want to manipulate?
-In designing a simulation study, we need to find a DGP that will allow us to address the specific questions we are interested in investigating.
+In designing a simulation study\index{simulation study}, we need to find a DGP that will allow us to address the specific questions we are interested in investigating.
For instance, in the one-way ANOVA example, we wanted to see how different degrees of within-group variation impacted the performance of several hypothesis-testing procedures. We therefore needed a data generation process that allowed us to control the extent of within-group variation. @@ -498,6 +501,7 @@ The next step is to test the code, making sure it is doing what we think it is. In fact, it is not--there is a subtle bug that only appears under some specifications of the parameters; see the exercises for more on diagnosing and repairing this error.
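To give a sense of the kind of quick check we have in mind, here is a sketch using a stripped-down, hypothetical DGP (reusing the `mu` and `sigma_sq` argument names from the ANOVA example): generate a very large sample and confirm that the empirical group means and variances land close to the parameters fed in.

```{r}
# Hypothetical mini-DGP plus a moment check: with n large, the sample
# moments should sit near the specified parameters; big gaps signal a bug.
r_groups <- function(n, mu, sigma_sq) {
  G <- length(mu)
  data.frame(
    group = rep(seq_len(G), each = n),
    x = rnorm(n * G, mean = rep(mu, each = n), sd = rep(sqrt(sigma_sq), each = n))
  )
}

dat <- r_groups(10000, mu = c(1, 2, 5), sigma_sq = c(3, 2, 5))
aggregate(x ~ group, data = dat, FUN = mean)  # should land close to 1, 2, 5
```

The same pattern works for variances, correlations, or any other quantity with a known target value under the DGP.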
### Standardization in the DGP {#DGP-standardization}
+\index{reparameterization}
One difficulty with the current implementation of the model is that the magnitudes of the different parameters are interconnected. For instance, raising or lowering the within-school variance ($\sigma^2_u$) will change the overall variation in $Y$, and therefore affect the interpretation of the treatment effect parameters, because a given value of $\gamma_{1}$ will be less consequential if there is more overall variation. @@ -763,7 +767,7 @@ r_tss <- function(n, mean, sd, df) { r_tss(n = 8, mean = 3, sd = 2, df = 5) ```
-1. Modify the Welch simulation's `generate_ANOVA_data()` function from Chapter \@ref(Welch-DGP) to generate data from shifted-and-scaled $t$-distributions rather than from normal distributions. Include the degrees of freedom as an input argument.
+1. Modify the Welch simulation's `generate_ANOVA_data()` function from Chapter \@ref(Welch-DGP) to generate data from shifted-and-scaled $t$-distributions rather than from normal distributions. Include the degrees of freedom as an input argument.\index{example!heteroskedastic ANOVA (Welch)}
Note that, in your modified function, you will have to translate the original parameters (`mu`, `sigma_sq`) to appropriate parameters of `r_tss()`. With the new function in hand, simulate a dataset with low degrees of freedom and plot it to see if the data includes outliers. @@ -773,6 +777,7 @@ With new function in hand, simulate a dataset with low degrees of freedom and pl Do the results change substantially? ### Plot the bivariate Poisson
+\index{example!bivariate Poisson}
In Section \@ref(DGP-functions), we provided an example of a DGP function for the bivariate Poisson model. We demonstrated a plot of data simulated from this function in Section \@ref(DGP-plotting). @@ -862,6 +867,7 @@ r_bivariate_Poisson_raw <- function(N, alpha0, alpha1, alpha2, beta, delta) 5.
Consider target means of $\mathbb{E}(C_1) = \mathbb{E}(C_2) = 10$ and target variances of $\mathbb{V}(C_1) = \mathbb{V}(C_2) = 15$. What are the minimum and maximum possible correlations between $C_1$ and $C_2$? ### Another bivariate negative binomial distribution {#BVNB2}
+\index{example!bivariate Poisson}
Another way to generate bivariate counts with negative binomial marginal distributions is to use a Gaussian copula. Here is a mathematical recipe for this distribution, which will produce counts with marginal means $\mu_1$ and $\mu_2$ and marginal variances $\mu_1 + \mu_1^2 / p_1$ and $\mu_2 + \mu_2^2 / p_2$. @@ -900,7 +906,7 @@ In the following we run a few tests on the cluster RCT DGP code (`gen_cluster_RC 1. What is the variance of the outcomes generated by the model for the cluster-randomized trial if there are no treatment effects? (Try simulating data to check!) What other quick checks can you run on this DGP to make sure it is working correctly?
-2. It is prudent to prevent nonsensical values of parameters. For example, what happens if you put in `alpha = 1` or give an `alpha` larger than 1? Does it generate data? It can be useful to put in some error-catching code to prevent such values from being used. For example, you can have `stopifnot( n_min > 0 )` at the top of your function. Implement this error-catching code and check that it works.
+2. It is prudent to prevent nonsensical values of parameters. For example, what happens if you put in `alpha = 1` or give an `alpha` larger than 1? Does it generate data? It can be useful to put in some error-catching code to prevent such values from being used. For example, you can have `stopifnot( n_min > 0 )` at the top of your function. Implement this error-catching code and check that it works.\index{stopifnot()}
3. The code also rounds `alpha * n_bar`--but what happens if `alpha * n_bar` is not an integer?
What will happen if you specify values for `alpha` and `n_bar` that do not multiply to a whole number? Is this ok, or could it cause problems? Explain why or why not. @@ -986,6 +992,7 @@ Be sure to specify what your other parameters are, as you test how a target para ### Random effects meta-regression {#meta-regression-DGP} +\index{example!meta-regression} Meta-analysis involves working with quantitative findings reported by previous studies on a particular topic, in the form of effect size estimates and their standard errors, to generate integrative summaries and identify patterns in the findings that might not be evident from any previous study considered in isolation. One model that is widely used in meta-analysis is the random effects meta-regression, which relates the effect sizes to known characteristics of the studies that are encoded in the form of predictors. @@ -1028,4 +1035,4 @@ for selection probabilities $0 \leq \lambda_1 \leq 1$ and $0 \leq \lambda_2 \leq 3. Create several further plots using different values for $\lambda_1$ and $\lambda_2$. How do these parameters affect the shape of the distribution? -4. Use the `selmodel()` function from the `metafor` package to estimate the Vevea-Hedges selection model based on one of your simulated datasets. (See Exercise \@ref(Vevea-Hedges-estimation) for example syntax.) How do the estimates compare to the model parameters you've specified? \ No newline at end of file +4. Use the `selmodel()` function from the `metafor` package to estimate the Vevea-Hedges selection model based on one of your simulated datasets. (See Exercise \@ref(Vevea-Hedges-estimation) for example syntax.) How do the estimates compare to the model parameters you've specified? 
diff --git a/030-Estimation-procedures.Rmd b/030-Estimation-procedures.Rmd index 93b0a3c..dfa56ec 100644 --- a/030-Estimation-procedures.Rmd +++ b/030-Estimation-procedures.Rmd @@ -23,7 +23,7 @@ source("case_study_code/analyze_cluster_RCT.R") We do simulation studies to understand how to analyze data. Thus, the central object of study is a _data analysis procedure_, or a set of steps or calculations carried out on a dataset. We want to know how well a procedure would work when applied in practice, and the data generating processes we described in Chapter \@ref(data-generating-processes) provide a stand-in for real data.
-To revisit our consumer product testing analogy from Chapter \@ref(introduction), the data analysis procedure is the product, and the data generating process is the set of trials to which we subject the product.
+To revisit our consumer product testing analogy from Chapter \@ref(introduction), the data analysis procedure is the product, and the data generating process\index{data-generating process} is the set of trials to which we subject the product.
Depending on the research question, a data-analysis procedure might be very simple---as simple as just computing a sample correlation---or it might involve something more complex, such as fitting a multilevel model and generating a confidence interval for a specific coefficient. A data analysis procedure might even involve a combination of several components. @@ -70,7 +70,7 @@ In particular, it tells us that applying Fisher's $z$-transformation of Pearson' It also tells us that the empirical standard error of the $z$-transformed correlation coefficient is simply $1 / \sqrt{N - 3}$, and thus independent of the correlation parameter. This makes $z$-transformation very useful for computing confidence intervals, which can then be back-transformed to the Pearson-$r$ scale. However, these results are specific to bivariate normal distributions.
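Concretely, the interval construction just described can be sketched in a few lines of R (a bare-bones illustration with our own function name; the estimation functions used in the simulations return richer output):

```{r}
# Sketch of the z-transformation interval: transform r, build the interval
# on the z scale (SE = 1 / sqrt(N - 3)), then back-transform the endpoints.
r_CI_sketch <- function(x, y, conf = 0.95) {
  N <- length(x)
  z <- atanh(cor(x, y))                     # Fisher z-transformation of Pearson's r
  crit <- qnorm(1 - (1 - conf) / 2)
  tanh(z + c(-1, 1) * crit / sqrt(N - 3))   # back-transformed interval endpoints
}
```

For bivariate data, these endpoints should agree with the confidence interval that `cor.test()` reports for a Pearson correlation.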
-Would this transformation work well in the face of non-normal data, such as the bivariate Poisson distribution we coded in Chapter \@ref(data-generating-processes)?
+Would this transformation work well in the face of non-normal data, such as the bivariate Poisson distribution we coded in Chapter \@ref(data-generating-processes)?\index{example!bivariate Poisson}
For studying this question with simulation, a simple estimation function would take a dataset with two variables as input and compute the sample correlation and its $z$-transformation, compute confidence intervals for $z$, and then back-transform the confidence interval end-points. Here is an implementation of this sequence of calculations: @@ -109,8 +109,8 @@ Doing so makes it easier to add in additional methods as desired or to focus on Writing separate functions also leads to a code base that is flexible and useful for other purposes (such as analyzing real data). Finally (repeating one of our favorite mantras), separating functions makes debugging easier because it lets you focus attention on one thing at a time, without worrying about how errors in one area might propagate to others.
-To see how this works in practice, we return to the case study from Section \@ref(case-cluster), where we developed a data-generating function for simulating a cluster-randomized trial with student-level outcomes but school-level treatment assignment.
+To see how this works in practice, we return to the case study from Section \@ref(case-cluster),\index{example!cluster randomized trial} where we developed a data-generating function for simulating a cluster-randomized trial\index{cluster-randomized trial} with student-level outcomes but school-level treatment assignment.
+Our data-generating process\index{data-generating process} allowed for varying school sizes and heterogeneous treatment effects that are correlated with school size. Several different procedures might be used to estimate an overall average effect from a clustered experiment. We will consider three different procedures: * Fitting a multi-level regression model (also known as a hierarchical linear model), @@ -187,7 +187,7 @@ Note the `stopifnot` command: it will throw an error if the condition is not tru This `stopifnot` ensures we do not have both treatment and control students within a single school--if we did, we would have more aggregated values than school ids due to the grouping! Putting *assert statements* in your code like this is a good way to guarantee you are not introducing weird and hard-to-track errors. A `stopifnot` statement halts your code as soon as something goes wrong, rather than letting that initial error flow on to further work, potentially creating odd results that are hard to understand. -Here we are protecting ourselves from strange results if, for example, we messed up our DGP code to have treatment not nested within school, or if we were using data that did not actually come from a cluster randomized experiment. +Here we are protecting ourselves from strange results if, for example, we messed up our DGP\index{data-generating process} code to have treatment not nested within school, or if we were using data that did not actually come from a cluster randomized experiment. See Section \@ref(about-stopifnot) for more. 
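As a self-contained sketch of this assert-early pattern (with hypothetical column names `sid`, `Z`, and `Yobs`, and an aggregation step of our own devising), a function can carry its own invariant check:

```{r}
# Hypothetical illustration: if treatment is nested within school, then
# aggregating by school and treatment should yield one row per school.
aggregate_by_school <- function(dat) {
  stopifnot(all(c("sid", "Z", "Yobs") %in% names(dat)))
  agg <- aggregate(Yobs ~ sid + Z, data = dat, FUN = mean)
  stopifnot(nrow(agg) == length(unique(dat$sid)))  # fails if Z varies within a school
  agg
}
```

Passing a dataset in which some school contains both treated and control students trips the second `stopifnot()` immediately, rather than silently producing extra aggregated rows downstream.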
All of our functions produce output in the same format: @@ -219,7 +219,7 @@ For estimation functions that involve multi-step procedures or novel methods, ot ### Checking against existing implementations -In Chapter \@ref(case-ANOVA), we wrote a function called `ANOVA_Welch_F()` for computing $p$-values from two different procedures for testing equality of means in a heteroskedastic ANOVA: +In Chapter \@ref(case-ANOVA), we wrote a function called `ANOVA_Welch_F()` for computing $p$-values from two different procedures for testing equality of means in a heteroskedastic ANOVA:\index{example!heteroskedastic ANOVA (Welch)} ```{r, eval = FALSE, file = "case_study_code/ANOVA_Welch_F.R"} ``` @@ -242,6 +242,7 @@ all.equal(aov_results$p.value, Welch_results$Welch) We use `all.equal()` because it will check equality up to a tolerance in R, which can avoid some perplexing errors due to rounding. +\index{example!bivariate Poisson} For the bivariate correlation example introduced in Section \@ref(estimation-functions), we can check the output of `r_and_z()` against R's built-in `cor.test()` function: ```{r} Pois_dat <- r_bivariate_Poisson(15, rho = 0.6, mu1 = 14, mu2 = 8) @@ -385,6 +386,7 @@ But, just like many other research endeavors, simulations are often a highly ite ## Handling errors, warnings, and other hiccups {#error-handling} +\index{errors and warnings} Especially when working with more advanced estimation methods, it is possible that your estimation function will fail, throw an error, or return something uninterpretable for certain input datasets. For instance, maximum likelihood estimation often involves using iterative, numerical optimization algorithms that sometimes fail to converge. @@ -482,7 +484,9 @@ M1 <- quiet_safe_lmer( Yobs ~ 1 + Z + (1|sid), data=dat ) M1 ``` -In our estimation function, we replace the original `lmer()` call with our quiet-and-safe version, `quiet_safe_lmer()`. 
We then pick apart the pieces of its output to return a tidy dataset of results. (Separately, we also add in a generally preferred optimizer, bobyqa, to try and avoid the warnings and errors in the first place.) The resulting function is:
+In our estimation function, we replace the original `lmer()` call with our quiet-and-safe version, `quiet_safe_lmer()`.
+We then pick apart the pieces of its output to return a tidy dataset of results.
+(Separately, we also add in a generally preferred optimizer, bobyqa, to try and avoid the warnings and errors in the first place.) The resulting function is:
```{r} analysis_MLM_safe <- function( dat ) { @@ -698,7 +702,7 @@ Once written, applying the function will produce a tidy table of results that we ### More Heteroskedastic ANOVA {#BFFs-forever}
-In the classic simulation by @brown1974SmallSampleBehavior, they not only looked at the performance of the homoskedastic ANOVA-F test and Welch's heteroskedastic-F test, they also proposed their own new hypothesis testing procedure.
+In their classic simulation, @brown1974SmallSampleBehavior not only looked at the performance of the homoskedastic ANOVA-F test and Welch's heteroskedastic-F test, but also proposed their own new hypothesis testing procedure.\index{example!heteroskedastic ANOVA (Welch)}
1. This exercise continues the running example introduced in Chapter \@ref(ANOVA-hypothesis-testing-function). Write a function that implements the Brown-Forsythe F\* test (the BFF\* test!) as described on p. 130 of @brown1974SmallSampleBehavior, using the following code skeleton: ```{r} @@ -744,6 +748,7 @@ Welch_ANOVA_F <- function( data , pre_alpha = .05) { ### Check the cluster-RCT functions {#cross-check-CRT-estimators}
+\index{example!cluster randomized trial}
Section \@ref(multiple-estimation-procedures) presented functions implementing several different strategies for estimating an average treatment effect from a cluster-randomized trial.
Write some code to validate these functions by comparing their output to the results of other tools for doing the same calculation. @@ -791,6 +796,7 @@ If all clusters are singletons, skip the model-fitting step, return an `NA` valu 2. Revise `analyze_data()` to use your new version of `analysis_MLM_contingent()`. The revised function will now sometimes return `NA` values for the `MLM` results. To implement the strategy of falling-back on OLS regression, add some code in `analyze_data()` that replaces any `NA` values for the MLM results with corresponding results of `analysis_OLS()`. Test that the function is working as expected (in this case, if the MLM fails, the final result would have the same estimates for MLM and OLS as two separate rows). Compared to having `analysis_MLM_contingent()` run OLS internally, writing code this way improves efficiency because we do not fit the OLS regression multiple times on the same data; such an improvement could make a simulation faster to run. ### Estimating 3-parameter item response theory models {#IRT-3PL-estimation}
+\index{item response theory}
Exercise \@ref(IRT-DGP-parameters) asked you to write a data-generating function for the 3-parameter IRT model described in Section \@ref(three-parameter-IRT). Use your function to generate a large dataset. @@ -806,6 +812,7 @@ Using functions from the [`{ltm}`](https://cran.r-project.org/package=ltm), [`{m ### Meta-regression with selective reporting {#Vevea-Hedges-estimation}
+\index{example!meta-regression}
Exercise \@ref(Vevea-Hedges-DGP) asked you to write a data-generating function for the @vevea1995general selection model. The `{metafor}` package includes a function for fitting this model (as well as a variety of other selection models). @@ -852,4 +859,4 @@ fix_one <- selmodel(rma_fit, type = "step", steps = c(.025, .500), 4. The `selmodel()` function throws warnings when the dataset contains no observations with $p_i$ in one of the specified intervals.
Modify your estimation function to set the selection probability for an interval to $\lambda_1 = 0.1$ if there are no $p_i$ values between .025 and .500 and to set $\lambda_2 = 0.1$ if there are no $p_i$ values larger than .500. - Write code demonstrating that the function works as expected. \ No newline at end of file + Write code demonstrating that the function works as expected. diff --git a/035-running-simulation.Rmd b/035-running-simulation.Rmd index 082d2bd..62dbf04 100644 --- a/035-running-simulation.Rmd +++ b/035-running-simulation.Rmd @@ -20,16 +20,16 @@ source("case_study_code/analyze_cluster_RCT.R") # Running the Simulation Process {#running-the-simulation-process} In the prior two chapters we saw how to write functions that generate data according to a particular model and functions that implement data-analysis procedures on simulated data. -The next step in building a simulation involves putting these two pieces together, running the DGP function and the data-analysis function repeatedly to obtain results (in the form of point estimates, standard errors, confidence intervals, $p$-values, or other quantities) from many replications of the whole process. +The next step in building a simulation involves putting these two pieces together, running the DGP\index{data-generating process} function and the data-analysis function repeatedly to obtain results (in the form of point estimates, standard errors, confidence intervals, $p$-values, or other quantities) from many replications of the whole process. As with most things R-related, there are many different techniques that can be used to repeat a set of calculations over and over. In this chapter, we demonstrate two key techniques for doing so. -We then explain how to ensure reproducibility of simulation results by setting the seed used by R's random number generator. 
+We then explain how to ensure reproducibility of simulation results by setting the seed\index{random seed} used by R's random number generator. ## Repeating oneself {#repeating-oneself} Suppose that we want to repeatedly generate bivariate Poisson data and then use the data to estimate the correlation between the variates. -We saw a DGP function for the bivariate Poisson in Section \@ref(DGP-functions), and an estimation function in Section \@ref(estimation-functions). +We saw a DGP function for the bivariate Poisson in Section \@ref(DGP-functions), and an estimation function in Section \@ref(estimation-functions).\index{example!bivariate Poisson} To produce a simulated correlation coefficient, we need to run these two functions in turn: ```{r} dat <- r_bivariate_Poisson( N = 30, rho = 0.4, mu1 = 8, mu2 = 14 ) @@ -66,7 +66,7 @@ See Appendix \@ref(more-repeating-oneself) for more discussion of these options. ## One run at a time A slightly different technique for running multiple replications of a process is to first write a function that executes a single run of the simulation, and then repeatedly evaluate that single function. -For instance, here is a function that stitches together the two steps in the bivariate-Poisson correlation simulation: +For instance, here is a function that stitches together the two steps in the bivariate-Poisson correlation simulation:\index{example!bivariate Poisson} ```{r} one_bivariate_Poisson_r <- function(N, rho, mu1, mu2) { dat <- r_bivariate_Poisson( N = N, rho = rho, mu1 = mu1, mu2 = mu2 ) @@ -90,7 +90,7 @@ repeat_and_stack( This technique of wrapping the data-generating function and estimation function inside of another function might strike you as a bit cumbersome because the wrapper is only two lines of code and writing it requires repeating many of the function argument names when calling the data-generating function (`N = N, rho = rho`, etc.). 
However, the wrapper technique can be useful for more complicated simulations, such as those that involve comparison of multiple estimation methods.
-Consider the cluster-randomized experiment case study presented in Section \@ref(case-cluster) and \@ref(multiple-estimation-procedures).
+Consider the cluster-randomized experiment case study presented in Sections \@ref(case-cluster) and \@ref(multiple-estimation-procedures).\index{example!cluster randomized trial}
In this simulation, we are interested in comparing the performance of three different estimation methods: a multi-level model, a linear regression with clustered standard errors, and a linear regression on the data aggregated to the school level. A single replication of the simulation entails generating a dataset and then applying three different estimation functions to it. Here is a function that takes our simulation parameters and runs a single trial of the full process: @@ -134,7 +134,7 @@ runs <- repeat_and_stack( R, ``` Setting `id = "runID"` lets us keep track of which iteration number produced which result.
-Once our simulation is complete, we save our results to a file so that we can avoid having to re-run the full simulation if we want to explore the results in some future work session.
+Once our simulation is complete, we save our results to a file so that we can avoid having to re-run the full simulation if we want to explore the results in some future\index{future package} work session.
We now have results for each of our estimation methods applied to each of 1000 generated datasets. The next step is to evaluate how well the estimators did. @@ -142,6 +142,7 @@ For example, we will want to examine questions about bias, precision, and accura In Chapter \@ref(performance-measures), we will look systematically at ways to quantify the performance of estimation methods.
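To preview what that evaluation looks like, here is a sketch with a fabricated stand-in for the `runs` results (the column names and the normal draws are purely illustrative, not output from the actual estimators): because the DGP fixes the true parameter, the average error of the estimates directly estimates bias.

```{r}
# Illustrative only: fake "results" drawn from a normal distribution,
# standing in for estimates produced by the MLM and OLS procedures.
true_ATE <- 0.2
runs <- data.frame(
  method  = rep(c("MLM", "OLS"), each = 1000),
  ATE_hat = true_ATE + rnorm(2000, sd = 0.05)
)
with(runs, tapply(ATE_hat - true_ATE, method, mean))  # estimated bias by method
```

With real simulation output, the same one-liner summarizes each method's systematic error; precision and accuracy summaries follow the same pattern.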
## Reparameterizing {#one-run-reparameterization} +\index{reparameterization} In Section \@ref(DGP-standardization), we discussed how to index the DGP of the cluster-randomized experiment using an intra-class correlation (ICC) instead of using two separate variance components. This type of re-parameterization can be handled as part of writing a wrapper function for executing the DGP and estimation procedures. @@ -163,9 +164,9 @@ one_run <- function( } ``` -[^stopifnot]: Note the use of `stopifnot` in the body of `one_run()`. +[^stopifnot]: Note the use of `stopifnot` in the body of `one_run()`.\index{stopifnot()} It is wise to ensure our parameter transforms are all reasonable, so we do not get unexplained errors or strange results later on. -It is best if your code fails as soon as possible! Otherwise debugging can be quite hard. +It is best if your code fails as soon as possible! Otherwise debugging can be quite hard.\index{debugging code} For details on `quiet_analyze_data()`, see Section \@ref(capturing-errors-and-warnings). Also see Section \@ref(multiple-estimation-procedures) for how `quiet_analyze_data()` bundles the three different estimation methods we are evaluating. @@ -229,6 +230,7 @@ It runs simulations in effect size units, and provides "knobs" to the simulation ## Bundling simulations with `simhelpers` {#bundle-sim-demo} +\index{simulation driver}\index{package!simhelpers}\index{bundle\_sim()} The techniques that we have demonstrated for repeating a set of calculations each involve a very specific coding pattern, which will often have the same structure even if the details of the data-generating model or the names of the input parameters are very different from the examples we have presented. The `simhelpers` package provides a function `bundle_sim()` that abstracts this common pattern and allows you to automatically stitch together (or "bundle") a DGP function and an estimation function, so that they can be run once or multiple times. 
@@ -237,7 +239,7 @@ Thus, `bundle_sim()` provides a convenient alternative to writing your own `one_ The `bundle_sim()` function takes a DGP function and an estimation function as inputs and gives us back a new function that will run a simulation using whatever parameters we give it. The `bundle_sim()` function is a "factory" in that it creates a new function on the fly. -Here is a basic example, where we create a function for running our simulation to evaluate how well our Pearson correlation coefficient confidence interval works on a bivariate Poisson distribution: +Here is a basic example, where we create a function for running our simulation to evaluate how well our Pearson correlation coefficient confidence interval works on a bivariate Poisson distribution:\index{example!bivariate Poisson} ```{r} sim_r_Poisson <- bundle_sim( f_generate = r_bivariate_Poisson, f_analyze = r_and_z, @@ -287,7 +289,7 @@ See Exercise \@ref(reparameterization-redux) for further discussion of how to wo ## Seeds and pseudo-random number generators {#seeds-and-pseudo-RNGs} -In prior chapters, we used built-in functions to generate random numbers and wrote our own data-generating functions that produce artificial data following a specific random process. +In prior chapters, we used built-in functions to generate random numbers\index{pseudo-random number generator} and wrote our own data-generating functions that produce artificial data following a specific random process. With either type of function, re-running it with the exact same input parameters will produce different results. For instance, running the `rchisq` function with the same set of inputs will produce two different sequences of $\chi^2$ random variables: ```{r} @@ -296,7 +298,8 @@ c2 <- rchisq(4, df = 3) rbind(c1, c2) ``` If you run the same code as above, you will get different results from these. 
-Likewise, running the bivariate Poisson function from Section \@ref(DGP-functions) or the `sim_r_Poisson()` function from Section \@ref(bundle-sim-demo) multiple times will produce different datasets:
+Likewise, running the bivariate Poisson function from Section \@ref(DGP-functions) or the `sim_r_Poisson()` function from Section \@ref(bundle-sim-demo) multiple times will produce different datasets:\index{example!bivariate Poisson}
+
```{r} dat_A <- r_bivariate_Poisson(20, rho = 0.5, mu1 = 4, mu2 = 7) dat_B <- r_bivariate_Poisson(20, rho = 0.5, mu1 = 4, mu2 = 7) @@ -307,12 +310,12 @@ sim_B <- sim_r_Poisson( 10, N = 30, rho = 0.4, mu1 = 8, mu2 = 14) identical(sim_A, sim_B) ``` Of course, this is the intended behavior of these functions, but it has an important consequence that needs some care and attention.
-Using functions like `rchisq()`, `r_bivariate_Poisson()`, or `r_bivariate_Poisson()` in a simulation study means that the results will not be fully reproducible.
+Using functions like `rchisq()`, `r_bivariate_Poisson()`, or `sim_r_Poisson()` in a simulation study\index{simulation study} means that the results will not be fully reproducible.
If we run the simulation code and write up our results, and then someone else tries to run the same simulation using the very same code, they will not get the very same results. This undermines transparency and makes it hard to verify if a set of presented results really came from a given codebase. When using DGP functions for simulations, it is useful to be able to exactly control the process of generating random numbers.
-This is much more feasible than it sounds: although Monte Carlo simulations are random in theory, computers are deterministic.
+This is much more feasible than it sounds: although Monte Carlo\index{Monte Carlo simulation} simulations are random in theory, computers are deterministic.
When we use R to generate what we have been referring to as "random numbers," the functions produce what are actually _pseudo-random_ numbers.
Pseudo-random numbers are generated from chains of mathematical equations designed to produce sequences of numbers that appear random, but actually follow a deterministic sequence.
Each subsequent random number is calculated by starting from the previously generated value (i.e., the current state of the random number generator), applying a complicated function, and storing the result (i.e., updating the state).
@@ -389,7 +392,7 @@ If we had not set the seed, we would not know if we were just getting (un)lucky,
## Conclusions
-The act of running a simulation is not itself the goal of a simulation study. Rather, it is only the procedural mechanism by which we generate many realizations of a data analysis procedure under a specified data-generating process.
+The act of running a simulation is not itself the goal of a simulation study. Rather, it is only the procedural mechanism by which we generate many realizations of a data analysis procedure under a specified data-generating process\index{data-generating process}.
As we have seen in this chapter, doing so requires carefully stitching together the one-two step of data generation and analysis, repeating that combined process many times, and then organizing the resulting output in a form that can be easily and systematically evaluated.
Different coding strategies can be used to accomplish this, but they all serve the same purpose: to make repeated evaluation of a procedure reliable, transparent, and easy to extend.
Finally, because simulation studies rely on stochastic data generation, it is critical that we pay careful attention to reproducibility.
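As a concrete illustration of the determinism behind reproducibility, seeding the generator with the same value reproduces the same stream of draws exactly. A minimal base-R sketch (the seed value is arbitrary and purely illustrative):

```{r}
# Setting the same seed twice yields the identical pseudo-random stream
set.seed(20240314)  # arbitrary illustrative seed
x1 <- rnorm(5)
set.seed(20240314)
x2 <- rnorm(5)
identical(x1, x2)   # TRUE: both calls walk the same deterministic sequence
```

Recording the seed alongside the simulation code is what lets anyone regenerate the exact same stream of "random" numbers.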
@@ -402,7 +405,7 @@ With the tools of this chapter in place, we are now positioned to move from runn
### Welch simulations {#Welch-simulation}
-In the prior chapter's exercises (See \@ref(BFFs-forever)), you made a new `BF_F` function for the Welch simulation. Update the Welch simulation code to include your `BF_F` function, and then generate simulation results for this additional estimator. See Chapter @ref(case-ANOVA) for the original simulation and overall context.
+In the prior chapter's exercises (see \@ref(BFFs-forever)), you made a new `BF_F` function for the Welch simulation. Update the Welch simulation code to include your `BF_F` function, and then generate simulation results for this additional estimator. See Chapter \@ref(case-ANOVA) for the original simulation and overall context.\index{example!heteroskedastic ANOVA (Welch)}
### Compare sampling distributions of Pearson's correlation coefficients {#Pearson-sampling-distributions}
@@ -413,7 +416,7 @@ In Section \@ref(bundle-sim-demo), we made a bundled simulation function `sim_r_
sim_r_Poisson( 3, mu1 = 4, mu2 = 5, N = 40, rho = 0.7 )
```
-1. Use `sim_r_Poisson()` to generate 5000 replications of the Fisher-$z$-transformed correlation coefficient under a bivariate Poisson distribution with $\rho = 0.7$ for a sample of $N = 40$ observations. You pick the remaining parameters.
+1. Use `sim_r_Poisson()` to generate 5000 replications of the Fisher-$z$-transformed correlation coefficient under a bivariate Poisson distribution with $\rho = 0.7$ for a sample of $N = 40$ observations. You pick the remaining parameters.\index{example!bivariate Poisson}
2. Create a bundled simulation function by combining the data-generating function from Exercise \@ref(BVNB1) or \@ref(BVNB2) with the `r_and_z()` estimation function. Run the function to generate 5000 replications of the Fisher-$z$-transformed correlation coefficient under a bivariate negative binomial distribution, with the same parameter values as in part 1.
@@ -421,6 +424,7 @@ sim_r_Poisson( 3, mu1 = 4, mu2 = 5, N = 40, rho = 0.7 ) ### Reparameterization, redux {#reparameterization-redux} +\index{reparameterization} In Section \@ref(one-run-reparameterization), we illustrated how the `one_run()` simulation wrapper function could be tweaked in order to reparameterize the model in terms of a single intra-class correlation rather than two variance parameters. But what if you want to avoid having to write your own simulation wrapper and instead use `bundle_sim()`? @@ -428,8 +432,9 @@ Revise `gen_cluster_RCT()` to accomplish the reparameterization, then use `simhe ### Fancy clustered RCT simulations {#fancy-cluster-RCT-sims} +\index{example!cluster randomized trial} -In Exercise \@ref(cluster-RCT-baseline), your task was to write a data-generating function for a cluster-randomized trial that includes a baseline covariate. +In Exercise \@ref(cluster-RCT-baseline), your task was to write a data-generating function for a cluster-randomized trial\index{cluster-randomized trial} that includes a baseline covariate. Then in Exercise \@ref(CRT-ANCOVA-estimators), your task was to create an estimation function that could be applied to data from such a trial. 1. Using your work from these exercises, create a bundled simulation function `one_run_cov()` that combines the data-generating function and the estimation function. Use both `analyze_data()` and `analyze_data_cov()` to get estimates for all three approaches both with and without covariate adjustment. diff --git a/040-Performance-criteria.Rmd b/040-Performance-criteria.Rmd index 908c479..469733e 100644 --- a/040-Performance-criteria.Rmd +++ b/040-Performance-criteria.Rmd @@ -28,7 +28,7 @@ cluster_RCT_res <- Once we run a simulation, we end up with a pile of results to sort through. 
For example, Figure \@ref(fig:CRT-ATE-hist) depicts the distribution of average treatment effect estimates from the cluster-randomized experiment simulation we generated in Section \@ref(one-run-reparameterization). There are three different estimators, each with 1000 replications. -Each histogram is an approximation of the _sampling distribution_ of the estimator, meaning its distribution across repetitions of the data-generating process. +Each histogram is an approximation of the _sampling distribution_ of the estimator, meaning its distribution across repetitions of the data-generating process\index{data-generating process}. Visually, we see the three estimators look about the same---they sometimes estimate too low, and sometimes estimate too high. We also see linear regression is shifted up a bit, and aggregation and MLM might have a few more outlying values. These observations are initial impressions of how well each estimator performs---that is, how closely the estimator approximates the truth. @@ -61,7 +61,7 @@ ggplot(cluster_RCT_res) + ``` A _performance measure_ summarizes an estimator's sampling distribution to describe how an estimator would behave on average if we were to repeat the data-generating process an infinite number of times. -For example, the bias of an estimator is the difference between the average value of the estimator and the corresponding target parameter. +For example, the bias\index{performance measures!bias} of an estimator is the difference between the average value of the estimator and the corresponding target parameter. Bias measures the central tendency of the sampling distribution, capturing whether we are systematically off, on average, from the true parameter value. In Figure \@ref(fig:CRT-ATE-hist), the black dashed lines mark the true average treatment effect of 0.3 and the solid lines with the dot mark the means of the estimators. 
@@ -70,12 +70,12 @@ This distance is nearly zero for the aggregation estimator and the multilevel mo
Our performance measure of bias would be near zero for aggregation and multilevel modeling, but around 0.05 to 0.1 for linear regression.
Many performance measures exist.
-For point estimates, conventional performance measures include bias, variance, and root mean squared error.
+For point estimates, conventional performance measures include bias, variance, and root mean squared error\index{performance measures!RMSE}.
If the point estimates come with corresponding standard errors, then we may also want to evaluate how accurately the standard errors represent the true uncertainty of the point estimators; conventional performance measures for capturing this include the relative bias and relative root mean squared error of the variance estimator.
-For procedures that produce confidence intervals or other types of interval estimates, conventional performance measures include the coverage rate and average interval width.
-Finally, for inferential procedures that involve hypothesis tests (or more generally, classification tasks), conventional performance measures include Type I error rates and power.
+For procedures that produce confidence intervals or other types of interval estimates, conventional performance measures include the coverage\index{performance measures!coverage} rate and average interval width.
+Finally, for inferential procedures that involve hypothesis tests (or more generally, classification tasks), conventional performance measures include Type I error\index{performance measures!Type I error} rates and power.
We describe each of these measures in Sections \@ref(assessing-point-estimators) through \@ref(assessing-inferential-procedures).
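To make the point-estimator measures concrete, here is a minimal base-R sketch of how bias, variance, RMSE, and coverage each reduce a pile of replications to a single summary. The estimates and standard errors below are simulated placeholders for illustration, not output from the cluster RCT simulation:

```{r}
# Illustrative only: performance measures computed from 1000 replications
set.seed(101)
theta <- 0.3                               # true parameter, known by design
est <- rnorm(1000, mean = 0.32, sd = 0.1)  # placeholder point estimates
se  <- rep(0.1, 1000)                      # placeholder standard errors

bias     <- mean(est) - theta              # systematic over- or under-estimation
variance <- var(est)                       # spread of the sampling distribution
rmse     <- sqrt(mean((est - theta)^2))    # combines bias and variance
coverage <- mean(est - 1.96 * se <= theta & theta <= est + 1.96 * se)
round(c(bias = bias, variance = variance, rmse = rmse, coverage = coverage), 3)
```

Note the identity at work here: the squared RMSE decomposes into squared bias plus (sample) variance, so RMSE can never be smaller than the absolute bias.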
@@ -85,8 +85,8 @@ We describe each of these measures in Sections \@ref(assessing-point-estimators)
-For any particular measure, the actual performance of an estimator depends on the data generating process under which it is evaluated.
-Performance measures are defined with respect to the sampling distribution of an estimator, which is the distribution of values we would see across different datasets that all came from the same DGP.
+For any particular measure, the actual performance of an estimator depends on the data-generating process\index{data-generating process} under which it is evaluated.
+Performance measures are defined with respect to the sampling distribution of an estimator, which is the distribution of values we would see across different datasets that all came from the same DGP\index{data-generating process}.
We will therefore need to use some statistical notation to define properties of this sampling distribution.
In the following, we define many of the most common measures in terms of standard statistical quantities such as the mean, variance, and other moments of the sampling distribution.
Notation-wise, for a random variable $T$ (e.g., for some estimator $T$), we will use the expectation operator $\E(T)$ to denote the mean, $\M(T)$ to denote the median, $\Var(T)$ to denote the variance, and $\Prob()$ to denote probabilities of specific outcomes, all with respect to the sampling distribution of $T$.
@@ -102,8 +102,8 @@ In Figure \@ref(fig:CRT-ATE-hist), for example, we estimated the bias of each es
If we were to repeat the whole simulation (with a different seed), then our bias results would shift slightly because they are imperfect estimates of the actual bias.
In working with simulation results, it is important to track how much uncertainty we have in our performance measure estimates.
-We call such uncertainty _Monte Carlo error_ because it is the error arising from using a finite number of replications of the Monte Carlo simulation process.
-One way to quantify the simulation uncertainty is with the _Monte Carlo Standard Error (MCSE)_, which is the standard error of a performance estimate given a specific number of replications.
+We call such uncertainty _Monte Carlo error_ because it is the error arising from using a finite number of replications of the Monte Carlo\index{Monte Carlo simulation} simulation process.
+One way to quantify the simulation uncertainty is with the _Monte Carlo Standard Error (MCSE)_\index{Monte Carlo standard error}, which is the standard error of a performance estimate given a specific number of replications.
Just as when we analyze real data, we can apply statistical techniques to estimate the MCSE and even to generate confidence intervals for performance measures.
The magnitude of the MCSE is driven by how many replications we use: if we only use a few, we will have noisy estimates of performance with large MCSEs; if we use millions of replications, the MCSE will usually be tiny.
@@ -207,7 +207,7 @@ This is a consequence of how these performance measures are defined.
One might see this property as a limitation on the utility of using bias and RMSE to measure the performance of an estimator, because these measures can be quite sensitive to the metric of the parameter.-->
### Comparing the Performance of the Cluster RCT Estimation Procedures {#clusterRCTperformance}
-
+\index{example!cluster randomized trial}
```{r, include=FALSE}
runs <- readRDS( file = "results/cluster_RCT_simulation.rds" )
```
@@ -393,6 +393,7 @@ For this question we evaluate estimated standard errors in terms of effective de
### Calibration
+\index{performance measures!calibration}
Calibration is a relative measure, where we look at the ratio of the expected (average) value of an uncertainty estimator to the true degree of uncertainty.
The relative measure removes scale, which is important because the absolute magnitude of uncertainty will depend on many features of the data-generating process, such as sample size. @@ -521,7 +522,7 @@ For simple statistical methods in which $V$ is based on a sum-of-squares of norm Even with more complex methods, the degrees of freedom are interpretable: higher degrees of freedom imply that $V$ is based on more observations, and thus will be a more precise estimate of the actual degree of uncertainty in $T$. ### Assessing SEs for the Cluster RCT Simulation {#assess-cluster-RCT-SEs} - +\index{example!cluster randomized trial} Returning to the cluster RCT example, we will assess whether our estimated SEs are about right by comparing the average _estimated_ (squared) standard error to the empirical sampling variance. Our standard errors are _inflated_ if they are systematically larger than they should be, across the simulation runs. We will also look at how stable our variance estimates are by comparing their standard deviation to the empirical sampling variance and by computing the Satterthwaite degrees of freedom. @@ -604,6 +605,7 @@ Likewise, variance estimators that have relative bias below 1 will tend to produ Thus, confidence interval coverage captures multiple aspects of the performance of an estimation procedure. ### Confidence Intervals in the Cluster RCT Simulation {#cluster-RCT-CI-coverage} +\index{example!cluster randomized trial} Returning to the CRT simulation, we will examine the coverage and expected width of normal Wald-type confidence intervals for each of the estimators under consideration. To do this, we first have to calculate the confidence intervals because we did not do so in the estimation function. @@ -678,8 +680,8 @@ The only difference is the _conditions_ under which the data are generated. 
We often find it useful to think of power as a _function_ rather than as a single quantity because its absolute magnitude will generally depend on the sample size of a dataset and the magnitude of the effect of interest.
Because of this, power evaluations will typically involve examining a _sequence_ of data-generating scenarios with varying sample size or varying effect size.
-If we are actually planning a future analysis, we would then look to determine what sample size or effect size would give us 80% power (a traditional target power level used for planning in many cases).
-See Chapter \@ref(sec:power) for more on power analysis for study planning.
+If we are actually planning a future analysis, we would then look to determine what sample size or effect size would give us 80% power (a traditional target power level used for planning in many cases).
+See Chapter \@ref(sec:power) for more on power analysis\index{power analysis} for study planning.
Separately, if our goal is to compare testing procedures, then absolute power at substantively large effects is often less informative than the relative power of different testing procedures at substantively small effects, near to but not exactly at the null.
In this setting, attention naturally shifts to infinitesimal power---that is, how quickly power increases as we move from a null effect to very small alternatives.
@@ -722,6 +724,7 @@ For instance, a testing procedure with $\alpha = .05$ would be considered accept
### Inference in the Cluster RCT Simulation
+\index{example!cluster randomized trial}
Returning to the cluster RCT simulation, we evaluate the validity and power of hypothesis tests for the average treatment effect based on each of the three estimation methods.
The data used in previous sections of the chapter was simulated under a process with a non-null treatment effect parameter (equal to 0.3 SDs), so the null hypothesis of zero average treatment effect does not hold.
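Rejection rates such as power and Type I error are simple proportions across replications, which also makes their Monte Carlo uncertainty easy to quantify with the binomial formula. A minimal base-R sketch with placeholder p-values (simulated under a true null, so the rate should land near $\alpha = .05$):

```{r}
# Illustrative only: rejection rate and its MCSE from simulated p-values
set.seed(42)
R <- 2000
pvals <- runif(R)                    # placeholder p-values under a true null
rate <- mean(pvals < .05)            # estimated Type I error rate
mcse <- sqrt(rate * (1 - rate) / R)  # binomial MCSE of a proportion
round(c(rate = rate, MCSE = mcse), 4)
```

To estimate power rather than Type I error, the same calculation is applied to p-values generated under a non-null data-generating process.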
@@ -911,7 +914,7 @@ Whether this is misleading or not will depend on context.
## Estimands Not Represented By a Parameter {#implicit-estimands}
-In the Cluster RCT example, we focused on the estimand of the school-level ATE, represented by the model parameter $\gamma_1$.
+In the Cluster RCT example,\index{example!cluster randomized trial} we focused on the estimand of the school-level ATE, represented by the model parameter $\gamma_1$.
What if we were instead interested in the person-level average effect?
This estimand does not correspond to any input parameter in our data-generating process.
Instead, it is defined _implicitly_ by a combination of other parameters.
@@ -1106,7 +1109,7 @@ $$
MCSE\left(M_T\right) = \frac{T_{(R + 1 - c)} - T_{(c)}}{2 \times 1.96}. (\#eq:MCSE-median)
$$
-A Monte Carlo standard error for the median absolute deviation can be computed following the same approach, but substituting the order statistics of $E_r = | T_r - \theta|$ in place of those for $T_r$.
+A Monte Carlo standard error\index{Monte Carlo standard error} for the median absolute deviation can be computed following the same approach, but substituting the order statistics of $E_r = | T_r - \theta|$ in place of those for $T_r$.
Trimmed mean bias with trimming proportion $p$ is calculated by taking the mean of the middle $(1 - 2p) \times R$ observations, which we have denoted as $\tilde{T}_{\{p\}}$.
An MCSE for the trimmed mean (and for the trimmed mean bias) is
$$
@@ -1286,6 +1289,7 @@ It can be applied to assess MCSE for nearly any performance measure you might en
## Performance Calculations with the `simhelpers` Package
+\index{package!simhelpers}\index{calc\_absolute()}\index{calc\_relative\_var()}\index{calc\_rejection()}\index{calc\_coverage()}
The `simhelpers` package provides several functions for calculating most of the performance measures that we have reviewed, along with MCSEs for each of the performance measures.
The functions are easy to use.
@@ -1458,7 +1462,7 @@ Table: (\#tab:performance-measure-summary) Performance measures for evaluating p ### Brown and Forsythe (1974) results {#Brown-Forsythe-performance} -1. Use the `generate_ANOVA_data` data-generating function for one-way heteroskedastic ANOVA (Section \@ref(case-anova-DGP)) and the data-analysis function you wrote for Exercise \@ref(BFFs-forever) to create a simulation driver function for the Brown and Forsythe simulations. +1. Use the `generate_ANOVA_data` data-generating function for one-way heteroskedastic ANOVA (Section \@ref(case-anova-DGP)) and the data-analysis function you wrote for Exercise \@ref(BFFs-forever) to create a simulation driver function for the Brown and Forsythe simulations.\index{example!heteroskedastic ANOVA (Welch)} 2. Use your simulation driver to evaluate the Type-I error rate of the ANOVA $F$-test, Welch's test, and the BFF* test for a scenario with four groups, sample sizes $n_1 = 11$, $n_2 = 16$, $n_3 = 16$, $n_4 = 21$, equal group means $\mu_1 = \mu_2 = \mu_3 = \mu_4 = 0$, and group standard deviations $\sigma_1 = 3$, $\sigma_2 = 2$, $\sigma_3 = 2$, $\sigma_4 = 1$. Are all of the tests level-$\alpha$? @@ -1482,7 +1486,7 @@ Compute the size-adjusted power of the ANOVA $F$-test, Welch's test, and the BFF ### Three correlation estimators {#three-correlation-estimators} -Consider the bivariate negative binomial distribution model described in Exercise \@ref(BVNB2). +Consider the bivariate negative binomial distribution model described in Exercise \@ref(BVNB2).\index{example!bivariate Poisson} Suppose that we want to estimate the correlation parameter $\rho$ of the latent bivariate normal distribution. Without studying the statistical theory for this problem, we might think to use simulation to evaluate whether any common correlation measures work well for estimating this parameter. 
Potential candidate estimators include the usual sample Pearson's correlation, Spearman's rank correlation, or Kendall's $\tau$ coefficient.
diff --git a/060-multiple-scenarios.Rmd b/060-multiple-scenarios.Rmd
index e517459..ac3a62a 100644
--- a/060-multiple-scenarios.Rmd
+++ b/060-multiple-scenarios.Rmd
@@ -37,6 +37,7 @@ dat <- gen_cluster_RCT( n=5, J=3, p=0.5,
# (PART) Systematic Simulations {-}
# Simulating across multiple scenarios {#simulating-multiple-scenarios}
+\index{simulation driver}
In Chapter \@ref(simulation-structure), we described the general structure of basic simulations as following four steps: generate, analyze, repeat, and summarize.
The principles of tidy simulation suggest that each of these steps should be represented by its own function or set of code.
@@ -59,12 +60,12 @@ Even if we are only using simulation in an ad hoc, exploratory way, we will ofte
We have already seen examples of this in Chapter \@ref(t-test-simulation), where we looked at the coverage rate of a confidence interval for the mean of an exponential distribution.
In Section \@ref(simulating-across-different-scenarios), we applied a simulation driver function across a set of sample sizes ranging from 10 to 300, finding that the coverage rate improves towards the desired level as sample size increases.
Simple forms of systematic exploration such as this are useful in many situations.
-For instance, when using Monte Carlo simulation for study planning, we might examine simulated power over a range of the target parameter to identify the smallest parameter for power is above a desired level.
+For instance, when using Monte Carlo\index{Monte Carlo simulation} simulation for study planning, we might examine simulated power over a range of the target parameter to identify the smallest parameter value for which power is above a desired level.
If we are using simulation simply to study an unfamiliar model, we might vary a key parameter over a wide range to see how the performance of an estimator changes. These forms of exploration can be understood as single-factor simulations. To demonstrate a single-factor simulation, we revisit the case study on heteroskedastic analysis of variance, as studied by @brown1974SmallSampleBehavior and developed in Chapter \@ref(case-ANOVA). -Suppose that we want to understand how the power of Welch's test varies as a function of the maximum distance between group means. +Suppose that we want to understand how the power of Welch's test varies as a function of the maximum distance between group means.\index{example!heteroskedastic ANOVA (Welch)} The data-generating function `generate_ANOVA_data()` that we developed previously was set up to take a vector of means per group, so we re-parameterize the function to define the group means based on the maximum difference (`max_diff`), under the assumption that the means are equally spaced between zero and the maximum difference. We will also re-parameterize the function in terms of the total sample size and the fraction of observations allocated to each group. The revised function is @@ -185,7 +186,8 @@ power_levels Now that we have a function for carrying out the performance calculations, we could consider incorporating this step into the simulation driver function. That way, we can call the simulation driver function with a set of parameter values and it will return a table of performance summaries. 
-The `bundle_sim()` function from `simhelpers` will create such a function for us, by combining a performance calculation function with the data-generating and data-analysis functions: +The `bundle_sim()` function from `simhelpers` will create such a function for us, by combining a performance calculation function with the data-generating and data-analysis functions:\index{example!heteroskedastic ANOVA (Welch)}\index{bundle\_sim()}\index{package!simhelpers} + ```{r} sim_ANOVA_full <- bundle_sim( f_generate = generate_ANOVA_new, @@ -252,9 +254,9 @@ These questions can be examined by expanding the simulation design to further sc ## Simulating across multiple factors -Consider a simulation study examining the performance of confidence intervals for Pearson's correlation coefficient under a bivariate Poisson distribution. +Consider a simulation study\index{simulation study} examining the performance of confidence intervals for Pearson's correlation coefficient under a bivariate Poisson distribution. We examined this data-generating model in Section \@ref(BVPois-example), implementing it in the function `r_bivariate_Poisson()`. The model has three parameters (the means of each variate, $\mu_1, \mu_2$ and the correlation $\rho$) and there is one design parameter (sample size, $N$). -Thus, we could in principle examine up to four factors. +Thus, we could in principle examine up to four factors.\index{example!bivariate Poisson} Using these parameters directly as factors in the simulation design will lead to considerable redundancy because of the symmetry of the model: generating data with $\mu_1 = 10$ and $\mu_2 = 5$ would lead to identical correlations as using $\mu_1 = 5$ and $\mu_2 = 10$. It is useful to re-parameterize to reduce redundancy and simplify things. 
@@ -421,7 +423,7 @@ sim_results <- unnest(cols = res) ``` -As an alternative approach, the `evaluate_by_row()` function from the `simhelpers` package accomplishes the same thing as the `pmap()` calculations inside the `mutate()` step of the above code. Its syntax is a bit more concise: +As an alternative approach, the `evaluate_by_row()` function from the `simhelpers` package accomplishes the same thing as the `pmap()` calculations inside the `mutate()` step of the above code. Its syntax is a bit more concise:\index{evaluate\_by\_row()} ```{r, eval=FALSE} sim_results <- @@ -615,10 +617,10 @@ Allowing yourself the flexibility to re-calculate performance measures can be ve ## Summary -Multifactor simulations are simply a series of individual scenario simulations, where the set of scenarios are structured by systematically manipulating some of the parameters of the data-generating process. +Multifactor simulations are simply a series of individual scenario simulations, where the set of scenarios are structured by systematically manipulating some of the parameters of the data-generating process\index{data-generating process}. The overall workflow for implementing a multifactor simulation begins with identifying which parameters and which specific values of those parameters to explore. These parameters correspond to the factors of the simulation's design; the specific values correspond to the levels (or settings) of each factor. -Following the principles of tidy simulation, we represent these decisions as a dataset consisting of all the combinations of the factors that we wish to explore. +Following the principles of tidy simulation, we represent these decisions as a dataset consisting of all the combinations of the factors that we wish to explore.\index{tidy design} Think of this as a menu, or checklist, of simulation scenarios to run. The next step in the workflow is then to walk down the list, running a simulation of each scenario in turn. 
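The menu metaphor maps directly onto code: the set of scenarios is just a data frame with one row per combination of factor levels. A small base-R sketch (the book's tidyverse workflow would build the same object with `tidyr::expand_grid()`); the factor names and levels here are purely illustrative:

```{r}
# Illustrative design "menu": one row per simulation scenario
design <- expand.grid(
  N   = c(20, 50, 100),    # sample sizes to explore (illustrative)
  rho = c(0.2, 0.5, 0.8),  # true correlation levels (illustrative)
  mu  = c(4, 8)            # common mean of both variates (illustrative)
)
nrow(design)  # 3 x 3 x 2 = 18 scenarios to walk through, one at a time
```

Each row of this dataset then becomes one call to the simulation driver, with the row's values passed as the driver's parameters.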
@@ -661,7 +663,7 @@ The variance of a $t$ distribution is $df/(df-2)$, so when we divide our observa
square root of this, we standardize them so they have unit variance.
The estimand of interest here is `mu`, the center of the distribution.
The estimation methods of interest are the conventional (arithmetic) mean, a 10% trimmed mean, and the median of a sample of $n$ observations.
-For performance measures, focus on bias, true standard error, and root mean squared error.
+For performance measures, focus on bias, true standard error, and root mean squared error\index{performance measures!RMSE}.
1. Verify that `gen_scaled_t()` produces data with mean `mu` and standard deviation 1 for various `df0` values.
@@ -689,7 +691,7 @@ For performance measures, focus on bias, true standard error, and root mean squa
### Estimating latent correlations {#exercise:multifactor-latent-correlation}
-Exercise \@ref(BVNB2) introduced a bivariate negative binomial model and asked you to write a data-generating function that implements the model. Exercise \@ref(three-correlation-estimators) provided an estimation function (called `three_corrs`) that calculates three different types of correlation coefficients, and asked you to write a function for calculating the bias and RMSE of these measures.
+Exercise \@ref(BVNB2) introduced a bivariate negative binomial model and asked you to write a data-generating function that implements the model. Exercise \@ref(three-correlation-estimators) provided an estimation function (called `three_corrs`) that calculates three different types of correlation coefficients, and asked you to write a function for calculating the bias and RMSE of these measures.\index{example!bivariate Poisson}
1. Combine your data-generating function, `three_corrs()`, and your performance calculation into a simulation driver.
@@ -700,8 +702,9 @@ Exercise \@ref(BVNB2) introduced a bivariate negative binomial model and asked y
4.
Create a graph or graphs that depict the bias and RMSE of each correlation as a function of $\rho$ and any other key parameters.
### Meta-regression {#exercise:multifactor-meta-regression}
+\index{example!meta-regression}
-Exercise \@ref(meta-regression-DGP) described the random effects meta-regression model.
+Exercise \@ref(meta-regression-DGP)\index{data-generating process} described the random effects meta-regression model.
List the focal, auxiliary, and structural parameters of this model, and propose a set of design factors to use in a multifactor simulation of the model.
Create a list with one entry per factor, then create a dataset with one row for each simulation context that you propose to evaluate.
diff --git a/070-experimental-design.Rmd b/070-experimental-design.Rmd
index 1c4eea9..e91f622 100644
--- a/070-experimental-design.Rmd
+++ b/070-experimental-design.Rmd
@@ -49,7 +49,7 @@ As @little2013praise puts it:
How do we go about choosing parameter values to examine?
Choosing which parameters to use is a central part of good simulation design because the primary limitation of simulation studies is always their _generalizability_.
-On the one hand, it is difficult to extrapolate findings from a simulation study beyond the set of simulation conditions that were examined. On the other hand, it is often difficult or impossible to examine the full space of all possible parameter values, except for very simple problems.
+On the one hand, it is difficult to extrapolate findings from a simulation study\index{simulation study} beyond the set of simulation conditions that were examined. On the other hand, it is often difficult or impossible to examine the full space of all possible parameter values, except for very simple problems.
Even in the relatively straightforward Pearson correlation simulation, we have four factors and the last three could each take on an infinite number of possible levels.
How can we come up with a defensible set of levels to examine?
@@ -112,6 +112,7 @@ We illustrate extending to a full multifactor simulation next, and then continue ### Choosing parameters for the Clustered RCT +\index{example!cluster randomized trial} We begin by identifying some potential research questions suggested by our preliminary exploration. Regarding bias, we noticed in our initial simulation that Linear Regression targets a person-weighted average effect, so it would be considered biased for the cluster-average average treatment effect. @@ -125,7 +126,7 @@ We might then ask, are the standard errors always the right size, on average? W To answer all of these questions we need to more systematically explore the space of models. But we have a lot of knobs to turn. -In particular, our data-generating process will produce artificial cluster-randomized experiments where we can vary any of the following features: +In particular, our data-generating process\index{data-generating process} will produce artificial cluster-randomized experiments where we can vary any of the following features: - the number of clusters; - the proportion of clusters that receive treatment; @@ -148,7 +149,7 @@ For example, if we simply added more cross-cluster variation by directly increas If we then see that the performance of an estimator deteriorates as variation increases, we have a confound: is the cross-cluster variation causing the problem, or is it the total variation? To avoid this confound, we should vary cluster variation while holding the total variation fixed; this is why we use the ICC parameterization, as discussed in Section \@ref(case-cluster). 
-Given our research questions and the way we parameterize the DGP, we end up with the following factors and levels:
+Given our research questions and the way we parameterize the DGP\index{data-generating process}, we end up with the following factors and levels:

```{r CRT_factors}
crt_design_factors <- list(
@@ -207,7 +208,7 @@ if ( !file.exists( "results/simulation_CRT.rds" ) ) {
}
```

-Normally, we would speed up these calculations using parallel processing; we discuss how to do so in Chapter \@ref(parallel-processing).
+Normally, we would speed up these calculations using parallel processing\index{parallel processing}; we discuss how to do so in Chapter \@ref(parallel-processing).
Even under parallel processing, this simulation took overnight to run.

Our final results look like this:
@@ -280,7 +281,7 @@ sres %>%
Even with this very crude summary of the results, we see tantalizing hints of differences between the methods. Linear Regression does seem biased, on average, across the scenarios considered. But so does multilevel modeling---we did not see bias before, but apparently for some scenarios it is biased as well.
-Overall SE and RMSE seems roughly the same across scenarios, and it is clear that uncertainty trumps bias, at least on average.
+Overall SE and RMSE\index{performance measures!RMSE} seem roughly the same across scenarios, and it is clear that uncertainty trumps bias, at least on average.
However, this pattern could be driven by the scenarios with few clusters that are small in size.
In order to understand the full story of what is going on, we need to dig further into the results.
diff --git a/072-presentation-of-results.Rmd b/072-presentation-of-results.Rmd
index 695f5b6..dca75e3 100644
--- a/072-presentation-of-results.Rmd
+++ b/072-presentation-of-results.Rmd
@@ -31,7 +31,7 @@ sres
```

-Once we have our performance measures for each method examined for each of scenario of our study's design, the computationally challenging parts of a simulation study are complete, but several intellectually challenging tasks remain.
+Once we have our performance measures for each method examined for each scenario of our study's design, the computationally challenging parts of a simulation study\index{simulation study} are complete, but several intellectually challenging tasks remain.
The goal of a simulation study is to provide evidence to address a research question or questions, but performance measures (like numbers more generally) do not analyze themselves.
Rather, they require interpretation, analysis, and communication in order to identify findings and broad, potentially generalizable patterns of results that are relevant the research question(s).
@@ -39,7 +39,7 @@ Good analysis will provide a clear understanding of how one or more of the simul
In multi-factor simulations, the major challenge in analyzing simulation results is dealing with the multiplicity and dimensional nature of the results.
For instance, in our cluster RCT simulation, we calculated performance metrics in each of `r prettyNum( nrow(sres) / 3, big.mark=",")` different simulation scenarios, which vary along several factors.
-For each scenario, we calculated a whole suite of performance measures (bias, SE, RMSE, coverage, ...), and we have these performance measures for each of three estimation methods under consideration.
+For each scenario, we calculated a whole suite of performance measures (bias, SE, RMSE\index{performance measures!RMSE}, coverage, ...), and we have these performance measures for each of three estimation methods under consideration.
We organized all these results as a table with `r prettyNum( nrow(sres), big.mark=",")` rows (three rows per simulation scenario, with each row corresponding to a specific method) and one column per performance metric. Navigating all of this can feel somewhat overwhelming. How do we understand trends in this complex, multi-factor data structure? @@ -70,7 +70,7 @@ You are going to want to show, for example, the relative performance of alternat This means a lot of rows, and a lot of dimensions. Tables can do two dimensions; when you try to cram more than that into a table, no one is particularly well served. -Furthermore, in simulation, exact values for your bias/RMSE/type-I error, or whatever, are not usually of interest. And in fact, we rarely have them due to Monte Carlo simulation error. +Furthermore, in simulation, exact values for your bias/RMSE/type-I error, or whatever, are not usually of interest. And in fact, we rarely have them due to Monte Carlo\index{Monte Carlo simulation} simulation error. The tables provide a false sense of security, unless you include uncertainty, which clutters your table even further. Overall, tables and simulations do not particularly well mix. @@ -80,7 +80,8 @@ It is often more useful and insightful to present results in graphs [@gelman2002 To illustrate, consider the following table of simulation results showing the false rejection rate, against an $\alpha$ of $0.10$, for an estimator of an average treatment impact. We have two factors of interest, the treatment and control group sizes. - -To illustrate these tools, we walk through them using our running example of comparing methods for analyzing a Cluster RCT. +To illustrate these tools, we walk through them using our running example of comparing methods for analyzing a Cluster RCT.\index{example!cluster randomized trial} We will start with an investigation of bias. 
As a reminder, in our Cluster RCT example, we have three methods for estimating the average treatment effect: linear regression of the student-level outcome onto treatment (with cluster-robust standard errors); aggregation, where we regress the cluster-level average outcome onto treatment (with heteroskedastic robust standard errors); and multilevel modeling with random effects. We want to know if these methods are biased for our defined estimand, which is the cluster-average treatment effect. We have five simulation factors: school size (`n_bar`), number of schools (`J`), the intraclass correlation (`ICC`), the degree of variation in school size (`alpha`), and the relationship between school size and treatment effect (`size_coef`). -Once we go through the four core tools, we continue our evaluation of our simulation to show how we can assess other performance metrics of interest (true standard error, RMSE, estimated standard errors, and coverage) using these tools. +Once we go through the four core tools, we continue our evaluation of our simulation to show how we can assess other performance metrics of interest (true standard error, RMSE\index{performance measures!RMSE}, estimated standard errors, and coverage\index{performance measures!coverage}) using these tools. We do not dive deeply into validity or power; see Chapter \@ref(sec:power) for more on those measures. @@ -185,7 +185,7 @@ Each boxplot shows the central measure of how well an estimator worked across a If the boxes are narrow, then we know that the variation across simulations within the box did not impact performance much. If the boxes are wide, then we know that the factors that vary within the box matter a lot for performance. -With bundling, we generally need a good number of simulation runs per scenario, so that the MCSE in the performance measures does not make our boxplots look substantially more variable (wider) than the truth. 
+With bundling, we generally need a good number of simulation runs per scenario, so that the MCSE\index{Monte Carlo standard error} in the performance measures does not make our boxplots look substantially more variable (wider) than the truth. Consider a case where all the scenarios within a box have zero _true_ bias; if the MCSE were large, the _estimated_ biases would still vary and we would see a wide boxplot when we should not. To illustrate bundling, we replicate our small subset figure from above, but instead of each point (with a given `J`, `alpha`, and `size_coef`) just being the single scenario with `n_bar=80` and `ICC = 0.20`, we plot all the scenarios (across the `n_bar` and `ICC` values) in a boxplot at that location. @@ -342,7 +342,7 @@ ggplot( sres, aes( ICC, bias, col=method, Our story is fairly clear now: LR is increasingly biased as alpha grows large and the cluster size relates to impact. MLM is also biased in these cases, but only when ICC is low (this is because it is driving towards person-weighting when there is little cluster variation). -Our figure does have what appears to be a negative bias trend for `alpha = 0.5` and higher `ICC`---we expect this is Monte Carlo error, as we do not have a theoretical reason to expect bias here; simulations will often have unfortunate, and misleading, patterns that are driven by random chance. +Our figure does have what appears to be a negative bias trend for `alpha = 0.5` and higher `ICC`---we expect this is Monte Carlo\index{Monte Carlo simulation} error, as we do not have a theoretical reason to expect bias here; simulations will often have unfortunate, and misleading, patterns that are driven by random chance. We could (and probably should) run further analyses or a second simulation to check our assumptions here. @@ -720,7 +720,7 @@ ggplot( c_sub, aes( ICC, coverage, col=method, group=method ) ) + ``` Generally coverage is good unless $J$ is low or ICC is 0. 
-Monte Carlo standard error based confidence intervals on our performance metrics indicate that, in some settings, the observed coverage is reliably different from the nominal 95%, suggesting issues with estimator bias, standard error estimation, or both. +Monte Carlo standard error\index{Monte Carlo standard error} based confidence intervals on our performance metrics indicate that, in some settings, the observed coverage is reliably different from the nominal 95%, suggesting issues with estimator bias, standard error estimation, or both. We might then want to see if these results are general across the other simulation scenarios (see exercises). For confidence interval width, we can calculate the average width relative to the width of LR across all scenarios: @@ -831,7 +831,7 @@ Make a final plot and draft a short write-up that captures how coverage is vulne ### Pearson correlations with a bivariate Poisson distribution -In Section \@ref(using-pmap-to-run-multifactor-simulations), we generated results for a multifactor simulation of confidence intervals for Pearson's correlation coefficient under a bivariate Poisson data-generating process. Create a plot that depicts the coverage rate of the confidence intervals for $\rho$ across all four simulation factors. +In Section \@ref(using-pmap-to-run-multifactor-simulations), we generated results for a multifactor simulation of confidence intervals for Pearson's correlation coefficient under a bivariate Poisson data-generating process\index{data-generating process}. Create a plot that depicts the coverage rate of the confidence intervals for $\rho$ across all four simulation factors.\index{example!bivariate Poisson} Write a brief explanation of how the plot is laid out and explain why you chose to construct it as you did. 
diff --git a/075-special-topics-on-reporting.Rmd b/075-special-topics-on-reporting.Rmd index 5082011..7b4da95 100644 --- a/075-special-topics-on-reporting.Rmd +++ b/075-special-topics-on-reporting.Rmd @@ -55,6 +55,7 @@ In Chapter \@ref(presentation-of-results) we saw some examples of using regressi In this chapter we will provide some further in-depth examples along with the R code for doing this sort of thing. ### Example 1: Biserial, revisited +\index{example!Biserial correlation} As our first in depth example, we walk through the analysis that produces the final ANOVA summary table for the biserial correlation example in Chapter \@ref(presentation-of-results). In the visualization there, we saw that several factors appeared to impact bias. @@ -335,7 +336,7 @@ With more complex experiments, where the various factors are interacting with ea Another exploration approach we might use is regression trees. We wrote a utility method, a wrapper to the `rpart` package, to do this ([script here](code/create_analysis_tree.R)). -Here, for example, we see what predicts larger bias amounts: +Here, for example, we see what predicts larger bias amounts in our cluster RCT running example:\index{example!cluster randomized trial} ```{r, include=FALSE} # To silence loading messages @@ -388,13 +389,13 @@ Then, when $J$ is small, bias is even larger. Generally we would not use a tree like this for a final reporting of results, but they can be important tools for _understanding_ your results, which leads to how to make and select more conventional figures for an outward facing document. -## Analyzing results with few iterations per scenario +## Analyzing results with few iterations per scenario {#few-iterations} When each simulation iteration is expensive to run (e.g., if fitting your model takes several minutes), then running thousands of iterations for many scenarios may not be computationally feasible. 
But running simulations with only a small number of iterations will yield very noisy estimates of estimator performance for that scenario. Now, if the methods being evaluated are substantially different, then differences in performance might still be evident even with only a few iterations. -More generally, however, the Monte Carlo Standard Errors (MCSEs) may be so large that you will have a hard time discriminating between systematic patterns and noise. +More generally, however, the Monte Carlo\index{Monte Carlo simulation} Standard Errors (MCSEs) may be so large that you will have a hard time discriminating between systematic patterns and noise. One tool to handle few iterations is aggregation: if you average across scenarios, those averages will have more precise estimates of (average) performance than the estimates of performance within the scenarios. Do not, by contrast, trust the bundling approach--the MCSEs will make your boxes wider, and give the impression that there is more variation across scenarios than there really is. @@ -433,7 +434,7 @@ summary( ssres$R ) In the prior chapter we analyzed the results of our cluster RCT simulation with 1000 iterations per scenario. But say we only had 25 iterations per scenario. -Using the prior chapter as a guide, we next recreate some of the plots to show how MCSE can distort the picture of what is going on. +Using the prior chapter as a guide, we next recreate some of the plots to show how MCSE\index{Monte Carlo standard error} can distort the picture of what is going on. We start again with bias, and generate a single plot of the raw results for a subset of our scenarios, along with error bars from the MCSEs. @@ -561,9 +562,15 @@ Even with the additional replicates per point, we see noticeable noise in our pl Also note how our three methods continue to track each other up and down in the top row, due to shared error. This shared error can be deceptive. 
It can also be a boon: if we are explicitly comparing the performance of one method vs another, we can subtract out the shared error, similar to what happens in a blocked experiment [@gilbert2024multilevel].
+We discuss this approach next.

-One way to remove this error to get a more precise head to head comparison of the estimators is to fit a multilevel regression model to our raw simulation results with a random effect for dataset.
-We next fit such a model, taking advantage of the fact that estimated bias is simply the average of the error across replicates.
+## Multilevel meta-regressions {#multilevel-meta-regression}
+\index{meta-regression}
+
+One way to get a more precise head-to-head comparison of the estimators is to fit a multilevel regression model to our raw simulation results with a random effect for dataset.
+This can be especially useful as a tool for analyzing results with few iterations.
+
+We next continue the running example of the prior section and fit such a model, taking advantage of the fact that estimated bias is simply the average of the error across replicates.
We first make a unique ID for each scenario and dataset, and then fit the model with a random effect for both.
The first random effect allows for specific scenarios to have more or less bias beyond what our model predicts.
The second random effect allows for a given dataset to have a larger or smaller error than expected, shared across the three estimators.
@@ -659,6 +666,7 @@ Our model is explaining an estimated 16% of the variation across simulation scen

## What to do with warnings in simulations
+\index{errors and warnings}

Sometimes our analytic strategy might give some sort of warning (or fail altogether).
In Section \@ref(error-handling) we saw how to trap such warnings or errors.
diff --git a/077-case-study-comparing-estimators.Rmd b/077-case-study-comparing-estimators.Rmd index 5f88f93..897c6b8 100644 --- a/077-case-study-comparing-estimators.Rmd +++ b/077-case-study-comparing-estimators.Rmd @@ -84,7 +84,7 @@ Overall, the scaled error of the mean is stable across the different distributio The trimmed mean is advantageous when the degrees of freedom are small (note the blue line is far below the red line on the left hand plot): we are cropping outliers that destabilize our estimate, which leads to great wins. As the distribution grows more normal (i.e., as the degrees of freedom increases), this is no longer an advantage and the trimmed mean gets closer to the mean in terms of performance. -We might think we should be penalized slightly by having dropped 10% of our data, making the standard errors slightly larger: if this is the case, then it is not large as the MCSE swamps it (the red and blue line are basically overlapping). +We might think we should be penalized slightly by having dropped 10% of our data, making the standard errors slightly larger: if this is the case, then it is not large as the MCSE\index{Monte Carlo standard error} swamps it (the red and blue line are basically overlapping). The median is not able to take advantage of the nuances of the individual observations in the data because it is entirely determined by the middle value. When outliers cause real concern, this cost is minimal. When outliers are not a concern, the median is just worse. @@ -136,7 +136,7 @@ However, if the trimmed estimators are much more stable than the mean estimator, Before we just looked at the SE. But we actually want to -know the standard error, bias, and overall error (RMSE). +know the standard error, bias, and overall error (RMSE\index{performance measures!RMSE}). Can we plot all these on a single plot? Yes we can! 
To plot, we first gather the outcomes to make a long-form dataset of results: @@ -208,7 +208,7 @@ increases because the bias basically stays the same and the SE drops. But for smaller samples the trimming is superior. The median (essentially trimming 50% above and below) is overkill and has too much negative bias. -From a simulation study point of view, notice how we are looking at three +From a simulation study\index{simulation study} point of view, notice how we are looking at three different qualities of our estimators. Some people really care about bias, some care about RMSE. By presenting all results we are transparent about how the different estimators operate. diff --git a/080-simulations-as-evidence.Rmd b/080-simulations-as-evidence.Rmd index 077f9d7..e5d8951 100644 --- a/080-simulations-as-evidence.Rmd +++ b/080-simulations-as-evidence.Rmd @@ -34,7 +34,7 @@ Such extreme simulations might be considered overly _clean_ tests. Extreme and clean simulations are also usually easier to write, manipulate, and understand. Such simulations can help us learn about the estimators themselves. -For example, we can use simulation to uncover how different aspects of a data generating process affect the performance of an estimator. +For example, we can use simulation to uncover how different aspects of a data generating process\index{data-generating process} affect the performance of an estimator. To discover this, we would need a data generation process with clear levers controlling those aspects. Simple simulations can be used to push theory further---we can experiment to gain an inclination of whether a novel approach to something is a good idea at all, or to verify we are right about an intuition or derivation about how well an estimator can work. 
@@ -46,6 +46,8 @@ Unless the estimators being tested have truly universally stable properties, we
After exploring the overly extreme or overly clean, we have to circle back to providing testing of the sorts of situations an eventual user of our methods might encounter in practice.
So how can we make our simulations more relevant?

+## Making simulations relevant
+
In the following subsections we go through a range of general strategies for making relevant simulations:

1. Break symmetries and regularities.
@@ -56,7 +58,7 @@ In the following subsections we go through a range of general strategies for mak
6. Design a fully calibrated simulation.

-## Break symmetries and regularities
+### Break symmetries and regularities

In a series of famous causal inference papers [@lin2013agnostic; @freedman2008regression], researchers examined when linear regression adjustment of a randomized experiment (i.e., when controlling for baseline covariates in a randomized experiment) could cause problems.
Critically, if the treatment assignment is 50%, then the concerns that these researchers examined do not come into play, as asymmetries between the two groups get perfectly cancelled out.
@@ -73,18 +75,18 @@ If we had run simulations with equal site size or equal proportion treated, we w
Overall, it is important to make your data generation processes irregular.

-## Use extensive multi-factor simulations.
+### Use extensive multi-factor simulations

"If a single simulation is not convincing, use more of them," is one principle a researcher might take. By conducting extensive multifactor simulations, once can explore a large space of possible data generating scenarios.
If, across the full range of scenarios, a general story bears out, then perhaps that will be more convincing than a narrower range.

-Of course, the critic will claim that some aspect of your DGP that is not varying is the real culprit.
+Of course, the critic will claim that some aspect of your DGP\index{data-generating process} that is not varying is the real culprit.
If this aspect is unrealistic, then the findings, across the board, may be less relevant.
Thus, pick the factors one varies with care, and pick factors in anticipation of reviewer critique.

-## Use simulation DGPs from prior literature.
+### Use simulation DGPs from prior literature

If a relevant prior paper uses a simulation to make a case, one approach is to replicate that simulation, adding in the new estimator one wants to evaluate.
In effect, use previously published simulations to beat them at their own game.
@@ -93,7 +95,7 @@ By constraining oneself to published simulations, one has less wiggle room to ch
Indeed, such a benchmarking approach is often taken in the machine learning literature, where different canonical datasets are used to benchmark the performance of new methods.

-## Pick simulation factors based on real data
+### Pick simulation factors based on real data

Use real data to inform choice of factor levels or other data generation features.
For example, in James's work on designing methods for meta-analysis, there is often a question of how big simulated sample sizes should be and how many different simulated outcomes per study there should be when generating sets of hypothetical studies to be included in hypothetical meta-analyses.
@@ -105,7 +107,7 @@ For instance, say we find that the distribution of study sizes fits a Poisson(63
We might then then simulate study sizes using a Poisson with mean parameters of 40, 60, or 80 (where 40, 60, and 80 would be one factor in our multifactor experiment).

-## Resample real data directly to get authentic distributions
+### Resample real data directly to get authentic distributions

You can also use real data directly to avoid a parametric model entirely.
For example, say you need a population distribution for a slew of covariates.
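+As a minimal sketch (with `real_data` standing in for whatever empirical dataset is at hand), we can resample whole rows, which preserves the joint distribution of the covariates without fitting any parametric model:
+
+```{r}
+# Sketch: draw n covariate vectors by resampling rows of a real dataset.
+# `real_data` is a placeholder, not a dataset used elsewhere in this book.
+make_covariates <- function( n, real_data ) {
+  slice_sample( real_data, n = n, replace = TRUE )
+}
+```
+
+Sampling with replacement lets us generate datasets larger than the original while keeping every covariate profile authentic.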
@@ -121,7 +123,7 @@ Generating realistic leverage is hard to do with a parametric model, so instead

-## Fully calibrated simulations {#calibrated-simulations}
+### Build a fully calibrated simulation {#calibrated-simulations}

Extending some of the prior ideas even further, one practice in increasing vogue is to generate _calibrated simulations_.
See @Kern2014calibrated for an example of this in the context of generalizing experiments.
@@ -151,7 +153,8 @@ Calibrated simulations frequently do not lend themselves to a clear factor-based
In exchange for this loss of clean structure, we end up with a simulation that might be more faithfully capturing aspects of a specific context, making our simulations more relevant to answering the narrower question of "how will things work here?" even if they are less relevant for answering the question of "how will things work generally?"

-## Generating calibrated data with a copula
+## Case study: Using copulas to generate calibrated data
+\index{copula}

In this section, we borrow from a blog post^[See (https://cares-blog.gse.harvard.edu/post/copulas-for-simulation/)] to illustrate how to use copulas to generate data that is very calibrated to a target dataset.
@@ -377,10 +380,9 @@ new_data$Y = empDist[ new_data$r_Y ]

And we are done!

-### How did we do?
+### Assessing the copula's success

-Let's explore results: did we end up with something "authentic"?
-We first compare the distribution of our generated $Y$ to the true $Y$ from the real DGP:
+Let us next see how the distribution of our generated $Y$ compares to the true $Y$ from the real DGP:

```{r, echo=FALSE}
dtL = new_data %>%
@@ -436,5 +438,337 @@ We may still end up with relationships between a dropped $X$ and $Y$ due to the

+## Simulation sections in methods papers
+
+Simulations, as we have argued, have many uses, and appear in many places.
+One of the most prevalent is the simulation section of a statistical methods paper.
+In this section, we make a case for what kinds of simulation evidence a methods paper should have, and outline several principles to consider if you are writing up, or reviewing, such a section.
+
+
+In what follows, we discuss several related goals that simulations in
+methods papers should serve, along with practical guidance on how to accomplish
+each of them.
+We then give some advice on navigating the many design decisions of a simulation, and conclude with some guiding principles for achieving these ends.
+
+
+### The simulation's purpose
+
+Ideally, a simulation study\index{simulation study} in a methods paper is not decoration.
+In the best case, it provides evidence for an argument.
+These arguments can be pointed (the researcher has an agenda to prove) or more exploratory (the researcher is trying to learn something).
+The kind of argument being made generally corresponds to the kind of methods paper being written.
+
+Broadly speaking, there are two kinds of methods papers that involve simulation.
+The first is a methods development paper. These usually propose some novel approach, have some (mathematical) theory arguing for its utility, have a simulation to justify or illustrate the argument, and finally offer an applied example to demonstrate the method's use.
+For these papers, the argument of the simulation is generally something along the lines of "our proposed method is an improvement over whatever else is out there."
+The simulation evidence is then the empirical case for whatever claim is being made.
+
+The second type of methods paper is a "simulation paper," where the researchers are specifically using simulation to evaluate some set of research questions.
+Here there is not necessarily a specific case being made, but instead a research question such as, "which of a set of methods is most appropriate for our given context of interest?"
+Now the simulation results are evidence to help adjudicate which options are most promising, and when.
+
+In either case, the simulation evidence is only good insofar as the choice of data generating process, factors and levels, and performance metrics are relevant.
+
+
+#### Increase belief in the theory
+
+We generally hope that a mathematical theorem in a paper is correct, but few of us will spend the time carefully reading any accompanying proof or derivation to ensure this is so.
+In fact, many papers have these derivations in a supplement, and often they are cut down, without many of the intermediate steps.
+
+Simulation can provide a sanity check and reassurance in this case.
+One might even say that the most basic job of a simulation is to verify that the theory behind a method
+actually works.
+Such a "sanity check" simulation is often a simple one.
+A clean, stylized data generating process with the key structural features the
+theory assumes---perhaps independent errors, a specific parametric form for the
+outcome, or a particular covariate distribution---is entirely appropriate here.
+The goal is not to show that the method works in a realistic context; it is to show
+that the theory does what it says it does.
+For example, if your theory says the estimator has a bias that diminishes at rate $1/n$, you should see a linear relationship with slope $-1$ when plotting log bias against log $n$.
+
+
+For this type of simulation, one useful practice is to verify not just that the
+point estimates look right, but also that the uncertainty quantification is correct.
+Coverage of nominal 95% confidence intervals is a particularly demanding test:
+it requires both that the point estimate be close to unbiased and that the standard error be well estimated.
+Showing close-to-nominal coverage in a clean setting can be a quite meaningful form of theoretical verification.
+
+
+#### Increase the readability of your paper
+
+Methods papers can be extremely technical, and the theory can be difficult to understand.
+If the notation is not well laid out (which in many cases it is not), even deciphering what a theorem is saying, or what an estimator actually is, can be difficult.
+The simulation section often has to get down to brass tacks, laying out the problem cleanly.
+Thus simulations can be an accessible entry point into your paper, providing concrete examples of the context in which the given estimators apply, the assumptions they depend on, and so forth.
+
+While we advocate making the rest of your paper readable, we also note that a good simulation can help add clarity to your overall project.
+Clean simulations of this type are also useful for illustrating _how_ the method works.
+
+
+#### Demonstrate finite-sample relevance
+
+Most statistical theory is asymptotic: as $n \to \infty$, the
+estimator is consistent; as the number of studies grows, the coverage of a confidence
+interval approaches 95%; as the sample size grows, some test statistic converges to
+a chi-squared distribution.
+Such asymptotic results might say little about whether the method works at the sample sizes anyone would actually encounter.
+Simulations, on the other hand, can show that at some reasonable $n$, things are in fact close to what theory promises (for at least the contexts explored).
+For an alarming example of the perils of "asymptopia," @kang2018inference examines weak instruments for Instrumental Variable analysis.
+Even though asymptotic IV estimators guarantee 95% coverage _eventually_, Kang et al. find coverage below 50%!
+
+If you have derived a new estimator and proved it is consistent and asymptotically
+normal, a simulation showing that at $n = 500$ the bias is small, the sampling
+distribution is approximately normal, and the confidence intervals achieve close to
+nominal coverage could be reassuring to readers in a way that the asymptotic proof alone might not be.
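+As a sketch of such a check (using the sample mean of skewed data with true mean 1 as a stand-in estimator, since any specific method would be hypothetical here), we might verify bias and coverage at $n = 500$:
+
+```{r}
+# Sketch: finite-sample check of bias and CI coverage at n = 500.
+one_run <- function( n = 500 ) {
+  x  <- rexp( n )    # skewed data with true mean 1
+  ci <- mean( x ) + c( -1, 1 ) * qt( 0.975, df = n - 1 ) * sd( x ) / sqrt( n )
+  tibble( est = mean( x ), covered = ci[[1]] <= 1 & 1 <= ci[[2]] )
+}
+runs <- map_dfr( 1:1000, ~ one_run() )
+summarise( runs, bias = mean( est ) - 1, coverage = mean( covered ) )
+```
+
+We would hope to see bias near zero and coverage near 0.95; systematic departures would flag a gap between the asymptotic story and finite-sample behavior.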
+Readers are (rightly) skeptical of novel methods, and seeing the theory confirmed in a clean, interpretable simulation builds trust.
+
+A second major purpose of simulation is therefore to show that the method works---
+or to characterize how well it works---in the finite-sample conditions that are
+realistic for the problem.
+This is where the multifactor simulation framework developed earlier becomes essential.
+Finite-sample performance depends on more than just $n$.
+It depends on signal-to-noise ratios, covariate distributions, the complexity of
+the true outcome model, the degree of model misspecification, and a host of
+other features.
+A convincing finite-sample simulation should vary these features systematically,
+using levels that span the realistic range of what practitioners in the domain
+are likely to encounter.
+
+This requirement does put a burden on the researcher to understand their domain.
+What are typical sample sizes in your field?
+What kinds of covariate structures are common?
+How much model misspecification should one worry about?
+The simulation factors you choose, and the levels you assign to them, are an
+implicit argument about what a domain looks like.
+
+
+#### Motivate and contextualize the subsequent applied analysis
+
+A methods paper often includes an applied data analysis, usually to show that the
+methods work on real data and to argue that the methods are not completely useless or irrelevant.
+Simulations can play a supporting role here.
+If your applied analysis shows that methods A and B gave different answers, then you might need your simulation section to make the case that we should believe A and not B.
+
+Such a case is often best built via what one might call a _domain-calibrated simulation_, distinct from
+the multifactor simulation.
+Where the multifactor simulation explores many conditions to establish (ideally) general properties, a domain-calibrated simulation is specifically designed to mimic the applied dataset in the paper.
+It is built using the methods discussed above: the covariate distribution might be drawn from the actual data, the outcome model might be fit to the actual data, and key features of the real setting (sample size, proportion treated, etc.) would all be matched.
+
+The purpose of this calibrated simulation is not to add statistical generality---it may add very little---but to argue that, in a context that looks a lot like the problem we care about, we should prefer one method over another.
+This is especially important if the multifactor simulation shows that in some cases we might prefer the other method, or if in some cases there is no real difference between the two.
+
+A natural structure for a methods paper, then, is to present the multifactor
+simulation first to establish a general story, and then follow it with a
+calibrated simulation tied to the applied data.
+
+
+### Navigating the many design decisions
+
+We next discuss several decisions one needs to make when designing a simulation.
+
+#### What performance metrics?
+
+The choice of what to measure in a simulation is not neutral.
+Different metrics emphasize different aspects of performance, and a method that
+looks good on one metric can look bad on another.
+Bias measures systematic error; variance measures precision; mean squared error
+combines them.
+Coverage of confidence intervals assesses the full inferential pipeline.
+Type I error\index{performance measures!Type I error} rate assesses validity of tests under the null.
+Power assesses sensitivity.
+Computational time is sometimes also relevant.
+
+For a given paper, some of these matter more than others.
+If you are proposing a new estimator meant to correct a bias problem, bias is
+obviously central.
+If you are proposing a new test for a hypothesis where existing tests are known to
+be anti-conservative, Type I error rate is central.
+Think carefully about which metrics are directly relevant to the specific claim your
+paper is making, and report those prominently.
+Do not, of course, report only favorable metrics.
+Do report a fuller range of metrics in an appendix if you are only presenting a few in the main paper.
+
+Do not focus overly narrowly on your primary metric, however.
+For example, methods that control bias tightly can often have much larger variances.
+If you are introducing a new method that controls bias, you should first show bias reduction, but also show that your method achieves a favorable bias-variance trade-off.
+
+
+#### How many replications?
+
+Too few replications and your simulation results will be noisy---a reviewer may
+reasonably wonder whether your conclusions hold up, or whether you got lucky.
+Too many and you waste computational resources that could have been spent on more
+scenarios or factors.
+
+In your paper, you should compute and report _Monte Carlo standard errors_ for
+your key performance metrics.
+You can even use the MCSE\index{Monte Carlo standard error} formula to calculate a target number of replications.
+For example, the Monte Carlo\index{Monte Carlo simulation} standard error for an estimated coverage probability $\hat{p}$
+based on $B$ replications is approximately $\sqrt{\hat{p}(1 - \hat{p}) / B}$.
+For a nominal 95% confidence interval, this means that at $B = 1000$ replications, the
+Monte Carlo SE is about 0.7 percentage points---precise enough to distinguish
+coverage of 93% from 95% but not to distinguish 94% from 95%.
+If that level of precision is important for your argument, you need more
+replications.
+Reporting these Monte Carlo SEs, or at minimum thinking carefully about them when
+choosing $B$, signals methodological care to readers.
+
+All of this said, as a rule of thumb, $B = 1000$ is often good enough.
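+
+To make the arithmetic concrete, here is a small sketch (these helper functions are our own, not from any package) that computes the MCSE for a coverage estimate and inverts the formula to find the number of replications needed for a target precision:
+
+```{r mcse_planning}
+# MCSE of an estimated coverage probability p_hat based on B replications
+mcse_coverage <- function( p_hat, B ) {
+  sqrt( p_hat * (1 - p_hat) / B )
+}
+
+# Invert the formula: replications needed to hit a target MCSE
+B_for_mcse <- function( p_hat, target_mcse ) {
+  ceiling( p_hat * (1 - p_hat) / target_mcse^2 )
+}
+
+mcse_coverage( 0.95, B = 1000 )           # about 0.007, i.e., 0.7 percentage points
+B_for_mcse( 0.95, target_mcse = 0.0025 )  # B needed for a quarter percentage point
+```
+
+The same planning logic applies to other performance metrics, so long as you have a rough guess of the metric's variance across replications.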
+
+
+#### How many factors?
+
+Start with a few factors, run $B = 10$ or so replications, and time your code to see how long your simulation runs (see Section \@ref(profiling-code) for how).
+If your simulation runs quickly, you can add more factors in addition to scaling up the number of replications, and you probably should.
+
+
+
+### Principles of design and presentation
+
+We next dive into some suggested principles.
+
+
+#### Don't make your DGP stupid
+
+All too often, the simulation section of a methods paper uses a DGP that is so extreme it is hard to imagine how it can be useful for anything other than a quick check of theory.
+In reading papers, watch for exponential functions, extremely small amounts of noise relative to the variation of the covariates, or other features that make the DGP unrealistic.
+
+Do not do this without good reason.
+
+#### Figures, not Tables
+
+As we discussed extensively in prior chapters, use figures, not tables.
+Figures allow you to show trends (as something goes up, something else goes down, and so forth).
+Tables do not make trends apparent.
+
+That said, including all your raw numbers in a spreadsheet or CSV file as supplemental material is good for transparency, and lets interested readers see exactly what those numbers are.
+
+
+
+#### Avoid selective reporting
+
+Simulations are particularly susceptible to selective reporting.
+The combination of degrees of freedom available to the researcher and the pressure
+to present results favorably creates a real risk of findings that do not paint an accurate picture.
+
+There are three ways to defend against selective reporting.
+
+The first is advance planning.
+Before running your main set of simulations, think through the full set of factors you will vary
+and commit to a range of levels for each.
+If you deviate from this plan---because a factor turned out not to matter, or
+because a computational constraint forced a change---be transparent about why.
+Full factorial designs, with all combinations of all levels, are preferable to
+hand-selected sets of scenarios precisely because they leave less room for
+post-hoc selection.
+
+The second is code sharing.
+Code sharing is increasingly expected in many fields, and for simulation studies
+it is particularly valuable.
+Sharing the simulation code allows readers to verify results, extend the simulation
+to additional conditions, or reproduce your findings if they are skeptical.
+Beyond serving the field, code sharing also disciplines the researcher: knowing that others
+could read and run the code tends to make one more careful and more honest.
+In practice, this means keeping simulation code clean and well-documented, something
+that the software engineering practices discussed earlier in this book will help
+facilitate.
+
+The third is to find contexts where your methods fail.
+If you present simulations that include boundaries where your methods fail, then you are giving a more complete picture of the landscape of performance.
+You also gain credibility: by showing areas where things fall apart, you are signaling that you are not just cherry-picking favorable results.
+
+
+#### Tell a coherent story
+
+A good simulation section is not a mere list of very specific findings.
+This, for example, is poor:
+
+> In scenario A, method X had an RMSE\index{performance measures!RMSE} of 0.34, and method Y had an RMSE of 0.23, so Y was preferred, while in scenario B, method X's RMSE rose to 0.44 and Y rose to 0.52, which was higher. In scenario C, X's RMSE rose again to 0.55, higher than Y's 0.45, and in scenario D, X's RMSE was 0.40, lower than Y's 0.42.
+
+This is better:
+
+> Results were mixed: Method X outperformed Y in scenarios B and D, while Y outperformed X in scenarios A and C.
+
+And even better:
+
+> In Scenarios B and D, which had a nonlinear trend, method X outperformed Y, but Y was superior when the trend was linear.
+
+
+In the discussion, you might then have:
+
+> Earlier we saw that method X outperformed Y when the trend was nonlinear, but Y outperformed X when the trend was linear. This is consistent with the fact that method X is designed to capture nonlinear trends: it has a stability cost (note the higher SEs in Table Q), but that cost is worth it if there is something to correct for. We generated our nonlinear trend based on the model we fit to the applied example discussed in the next section; we therefore view the risk of increased bias in practical settings as quite real, and thus recommend using X over Y in practice, absent strong reason to believe a linear trend.
+
+
+Some people try to report all the results, without interpretation, and then have the discussion after.
+Under this approach, the reporting part can be extremely difficult to write, and extremely painful and boring to read.
+Keep it short and refer to the tables and figures for the details.
+Do not just list all those numbers without any contextualization.
+
+
+In fact, if you can get away with it, just present the figures and tables, and dive right into interpretation as you explain how to read those tables.
+
+E.g., continuing the running example from above:
+
+> Table Q1 shows method X, with the bias correction, outperforms Y in the "nonlinear" scenarios, but Y outperforms X when the trend is in fact linear. The higher RMSE in the linear scenarios is a stability cost (also note the higher SEs in Table Q2), but that cost is worth it if there is something to correct for. We generated our nonlinear trend based on the model we fit to the applied example discussed in the next section; we therefore view the risk of increased bias in practical settings as quite real, and thus recommend using X over Y in practice, absent strong reason to believe a linear trend.
+
+Overall, make an argument.
+Give the reader your best understanding of why the proposed method works, when it works, how much it matters,
+and when one should or should not worry.
+
+Your visualizations should be selected to support your argument.
+In the above, connect your linear scenarios with a line, and the nonlinear ones with a different line, to make it easy to see the pattern.
+Curate your visualizations to show the findings that matter, and put the remainder in the supplement for transparency.
+
+#### Give summaries, not details, in the main text
+
+Extending the above, the main text might give only summaries of performance.
+For example, in a large multifactor simulation of hundreds of scenarios, you might say:
+
+> Across the scenarios considered, the RMSE of method X ranged from 0.24 to 0.64. The other methods generally had higher RMSEs, ranging from 0.32 to 0.81. That said, in 15% of the scenarios explored, method X's RMSEs were more than 20% larger than the overall average, well beyond what could be explained by the Monte Carlo uncertainty.
+
+
+
+#### Keep the main text simple
+
+Not everything should be in the main text.
+It is entirely appropriate to put secondary results---additional factor levels,
+supplementary metrics, robustness checks---in an appendix or supplementary
+materials.
+What belongs in the main text are the figures that most directly
+support the central claim of the paper, presented as clearly and compactly as
+possible.
+
+
+#### Report MCSEs, but do it concisely
+
+Including error bars from MCSEs on your plots can increase clutter substantially.
+Instead, consider putting the range of MCSEs in the caption of your plot.
+
+In your text, do not report differences between methods that are smaller than the MCSEs you are calculating.
+If arguing equivalence, say that your simulations show the methods are about the same, but could differ by about twice your MCSE (getting a true MCSE on the difference can be surprisingly tricky, as the estimated performance measures of different methods will be correlated; see Section \@ref(multilevel-meta-regression) for further discussion).
+
+
+## Concluding Thoughts
+
+For a methods paper, simulations should generally serve at least three
+purposes: verifying that the theory is correct in clean conditions, demonstrating
+that the method works in finite samples at realistic scales for the domain, and
+motivating the applied analysis by showing how the method performs in a context
+that resembles the real application.
+The simulation work should use performance metrics that are directly relevant to the paper's central
+claims, report enough replications to make the results trustworthy, and be
+presented in a way that tells a coherent story rather than merely exhausting the reader with numbers and details.
+Getting all of this right is not easy, but it is what separates a simulation study
+that genuinely advances knowledge from one that merely fills space between the theory
+section and the data analysis.
+
+Simulations generally have a bad reputation, and that reputation is unfortunately often deserved.
+When reading methods papers, keep an eye out for extreme simulations that have no connection to reality, simulations that report only one metric, leaving others unstated, and reported measures where it is impossible to tell whether a difference is substantively meaningful.
+These examples are all too easy to find.
+Please do not add to the pool.
+ + + diff --git a/105-file-management.Rmd b/105-file-management.Rmd index 6f7ccba..1674f24 100644 --- a/105-file-management.Rmd +++ b/105-file-management.Rmd @@ -19,7 +19,7 @@ Fortunately, a bit of forethought and advance planning can make these types of p In the next several chapters, we describe some organizing principles and programming practices that will make it easier to handle complex simulation projects. In this chapter, we discuss project organization, file management, and storing simulation results. -In Chapter \@ref(parallel-processing), we demonstrate parallel processing methods that will help you to speed up the computation involved in multifactor simulations. +In Chapter \@ref(parallel-processing), we demonstrate parallel processing\index{parallel processing} methods that will help you to speed up the computation involved in multifactor simulations. Finally, in Chapter \@ref(debugging-and-testing), we introduce some basic debugging and testing techniques that we have found useful for keeping on top of complex code bases. ## Simulation project structure @@ -41,7 +41,7 @@ In practice, this means keeping the code for each phase in a separate file (or s In particular, dividing your simulations in this way allows for easily changing how one _analyzes_ an experiment without re-running the entire thing. The modular approach we propose is in keeping with a general principle for organizing large computational projects: keep the code for distinct tasks in different files. -In this chapter, we describe strategies for organizing the code and work products involved in a multifactor simulation study. +In this chapter, we describe strategies for organizing the code and work products involved in a multifactor simulation study\index{simulation study}. We start by offering guidance and recommendations for how to structure individual files and make use of code that is organized across more than one file. 
We then describe a directory structure that encourages a clear separation between phases of a project.
Finally, we discuss strategies for storing and organizing results when executing a simulation.
@@ -185,7 +185,7 @@ Also see Chapters 13 through 15 of @Wickham2023packages.

## Principled directory structures

-_Tidy home, tidy mind_, as the saying goes.
+_Tidy home, tidy mind_, as the saying goes.\index{tidy design}
With any computational project, our home is usually a directory on our computer (or in a cloud).
For multifactor simulation studies, we _strongly_ advocate using an organized and clearly labelled directory structure, which will facilitate more intentional---and easier to follow---coding practices.
We recommend using an integrated development environment such as RStudio, Positron, or VS Code and building a directory structure as follows:
@@ -313,6 +313,7 @@ source( "case_study_code/evaluate_CIs.R" )
```

+\index{simulation driver}
In Chapter \@ref(simulating-multiple-scenarios), we used `bundle_sim()` to create a simulation driver for executing the simulation for a given scenario and then used `evaluate_by_row()` to call the simulation driver for every condition listed in `params`:
```{r, eval = FALSE}
library(simhelpers)
diff --git a/120-parallel-processing.Rmd b/120-parallel-processing.Rmd
index cfc2caa..bc8eb3a 100644
--- a/120-parallel-processing.Rmd
+++ b/120-parallel-processing.Rmd
@@ -16,7 +16,7 @@
library(furrr)
```

-Especially if you take our advice of "when in doubt, go more general" and if are running enough replicates to get nice and samll Monte Carlo errors, you will quickly come up against the computational limits of your computer.
+Especially if you take our advice of "when in doubt, go more general" and if you are running enough replicates to get nice and small Monte Carlo\index{Monte Carlo simulation} errors, you will quickly come up against the computational limits of your computer.
Simulations can be incredibly computationally intensive, and there are a few means for dealing with that. The first is to optimize code by removing extraneous calculation (e.g., by writing methods from scratch rather than using the safety-checking and thus sometimes slower methods in R, or by saving calculations that are shared across different estimation approaches) to make it run faster. This approach is usually quite hard, and the benefits often minimal; see Appendix @ref(optimize-code) for further discussion and examples. @@ -63,7 +63,7 @@ parallel::detectCores() Normally, unless you tell it to do otherwise, __*R only uses one core*__. This is obviously a bit lazy on R's part. -But it is easy to take advantage of multiple cores using the `future` and `furrr` packages. +But it is easy to take advantage of multiple cores using the `future` and `furrr` packages.\index{package!future}\index{package!furrr} ```{r, eval = TRUE} library(future) @@ -71,7 +71,7 @@ library(furrr) ``` In particular, the `furrr` package replicates our `map` functions, but in parallel. -We first tell our R session what kind of parallel processing we want using the `future` package. +We first tell our R session what kind of parallel processing\index{parallel processing} we want using the `future` package. In general, using `plan(multisession)` is the cleanest: it will start one entire R session per core, and have each session do work for you. The alternative, `multicore` does not seem to work well with Windows machines, nor with RStudio in general. diff --git a/130-debugging_and_testing.Rmd b/130-debugging_and_testing.Rmd index ba1a2de..580e841 100644 --- a/130-debugging_and_testing.Rmd +++ b/130-debugging_and_testing.Rmd @@ -6,6 +6,7 @@ library( testthat ) # Debugging and Testing {#debugging-and-testing} +\index{debugging code} Writing code is not too hard. Writing _correct_ code, however, can be quite challenging. 
@@ -17,6 +18,7 @@ There are some tools, however, that can mitigate that to some extent. ## Debugging with `print()` {#about-print} +\index{print()} When you follow modular design, you will often have methods you wrote calling other methods you wrote, which call even more methods you wrote. When you get an error, you might not be sure where to look, or what is happening. @@ -57,6 +59,7 @@ We discuss these next. ## Debugging with `browser()` {#about-browser-debugging} +\index{browser()} Consider the following code taken from a simulation: @@ -81,6 +84,7 @@ This allows you to walk through the code step by step, seeing what happens as yo Much of the time, RStudio will even jump to the part of your script where you paused, so you can see the code that will be run with each step. ## Debugging with `debug()` +\index{debug()} Another useful debugging tool is the `debug()` function. This function allows you to set a breakpoint in your code, so that when you call the function, it will stop at the beginning of the function and put you in the same browser discussed above. @@ -96,6 +100,7 @@ Now, when `run_simulation()` eventually calls `gen_dat` the script will stop, an ## Protecting functions with `stopifnot()` {#about-stopifnot} +\index{stopifnot()} When writing functions, especially those that take a lot of parameters, it is often wise to include `stopifnot()` statements at the top to verify the function is getting what it expects. These are sometimes called "assert statements" and are a tool for making errors show up as early as possible. @@ -175,7 +180,7 @@ In R, the most common way of doing this is the `testthat` package. There are two general parts to `testthat`: the `expect_*()` methods and the `test_that()` function. 
-Consider the following simple DGP to generate an X and Y variable with a given relationship: +Consider the following simple DGP\index{data-generating process} to generate an X and Y variable with a given relationship: ```{r} my_DGP <- function( N, mu, beta ) { stopifnot( N > 0, beta <= 1 ) diff --git a/140-simulation-for-power-analysis.Rmd b/140-simulation-for-power-analysis.Rmd index 81675d0..f59830f 100644 --- a/140-simulation-for-power-analysis.Rmd +++ b/140-simulation-for-power-analysis.Rmd @@ -28,18 +28,18 @@ library(furrr) # Using simulation as a power calculator {#sec:power} We can use simulation as a power calculator. -In particular, to estimate power, we generate data according to our best guess as to what we might find in a planned evaluation, and then analyze these synthetic data and see if we detect the effect we built into our DGP. +In particular, to estimate power, we generate data according to our best guess as to what we might find in a planned evaluation, and then analyze these synthetic data and see if we detect the effect we built into our DGP\index{data-generating process}. We then do this repeatedly, and see how often we detect our effect. This is power. Now, if we are generally right about our guesses about our DGP and the associated parameters we plugged into it, in terms of some planned study, then our power will be right on. -This is all a power analysis is, using simulation or otherwise. +This is all a power analysis\index{power analysis} is, using simulation or otherwise. Simulation has benefits over using power calculators because we can take into account odd aspects of our modeling, and also do non-standard approaches to evaluation that we might not find in a normal power calculator. We illustrate this idea with a case study. 
In this example, we are planning a school-level intervention to reduce rates of discipline via a socio-emotional targeting intervention on both teachers and students, where we have strongly predictive historic data and a time-series component. -This is a planned RCT, where we will treat entire schools (so a cluster-randomized study). +This is a planned randomized controlled trial (RCT),\index{randomized controlled trial} where we will treat entire schools (so this is a cluster-randomized study). We are struggling because treating each school is very expensive (we have to run a large training and coaching of the staff), so each unit is a major decision. We want something like 4, 5, or maybe 6 treated schools. Our diving question is: Can we get away with this? @@ -411,7 +411,7 @@ library( lme4 ) Many power analyses are regarding fitting some sort of multilevel model to some type of structured data. For example, researchers frequently want to calculate power for multisite randomized trials, where each of a series of sites has students randomized to treatment, or not. -Our earlier cluster RCT case study is another example of this. +Our earlier cluster RCT case study (see, e.g., Chapter \@ref(case-cluster)) is another example of this. We can use the same simulation framework to calculate power for these types of models. We write a data generation function that generates our data given our target structure, and then repeatidly generate and analyze data to assess power, just as we have done. diff --git a/150-potential-outcomes-framework.Rmd b/150-potential-outcomes-framework.Rmd index 7e85aad..587a1fb 100644 --- a/150-potential-outcomes-framework.Rmd +++ b/150-potential-outcomes-framework.Rmd @@ -14,7 +14,7 @@ theme_set( theme_classic() ) ``` -If we are in the business of evaluating how various methods such as matching or propensity score weighting work in practice, we would probably turn to the potential outcomes framework for our simulations. 
+If we are in the business of evaluating how various methods such as matching or propensity score weighting work in practice, we would probably turn to the potential outcomes\index{potential outcomes framework} framework for our simulations. The potential outcomes framework is a framework typically used in the causal inference literature to make very explicit statements regarding the mechanics of causality and the associated estimands one might target when estimating causal effects. While we recommend reading, for a more thourough overview, either [CITE Raudenbush or Field Experiments textbook], we briefly outline this framework here to set out our notation. @@ -40,7 +40,7 @@ Consider a sample of $n$ units, $\mathcal{S}$, along with their set of potential We can talk about the true ATE of the sample, or, if we thought of the sample as being drawn from some larger population, we could talk about the true ATE of that larger population. This is a tension that often arises in potential outcomes based simulations: if we are focused on $ATE_{\mathcal{S}}$ then for each sample we generate, our estimand could be (maybe only slightly) different, depending on whether our sample has more or fewer units with high $\tau_i$. -If, on the other hand, we are focused on where the units came from (which is our data generating model), our estimand is a property of the DGP, and would be the same for each sample generated. +If, on the other hand, we are focused on where the units came from (which is our data generating model), our estimand is a property of the DGP\index{data-generating process}, and would be the same for each sample generated. The catch is when we calculate our performance metrics, we now have two possible targets to pick from. Furthermore, if we are targeting the superpopulation ATE, then our error in estimation may be due in part to the representativeness of the sample, _not_ the estimation or uncertainty due to the random assignment. 
@@ -72,7 +72,7 @@ We are driving both the level and the treatment effect with $X_i$, assuming $\be

One advantage of generating all the potential outcomes is we can then calculate the finite-sample estimands such as the true average treatment effect for the generated sample: we just take the average of $Y_i(1) - Y_i(0)$ for our sample.

-Here is some code to illustrate the first part of the data generating process (we leave treatment assignment to later):
+Here is some code to illustrate the first part of the data generating process\index{data-generating process} (we leave treatment assignment to later):
```{r PO_gen_data_function}
gen_data <- function( n = 100,
@@ -277,7 +277,7 @@ one_finite_run <- function( R0 = 3, n = 100, ... ) { }
```
-This driver also stores the finite sample ATE for future reference:
+This driver also stores the finite sample ATE\index{average treatment effect} for future reference:
```{r}
one_finite_run()
```
@@ -321,7 +321,7 @@ mean( sqrt( fruns$SE2 ) )
We can use our collection of mini-finite-sample runs to estimate superpopulation quantities as well.
Given that the simulation datasets are i.i.d. draws, we can simply take expectations across all our simulations.
-The only concern is our estimates of MCSE will be off due to the clustering in our simulation runs.
+The only concern is our estimates of MCSE\index{Monte Carlo standard error} will be off due to the clustering in our simulation runs.

Here we calculate superpopulation performance measures (both with the squared SE and without; we prefer the squared version):
diff --git a/160-parametric-bootstrap.Rmd b/160-parametric-bootstrap.Rmd
index 15defbb..a1f5db8 100644
--- a/160-parametric-bootstrap.Rmd
+++ b/160-parametric-bootstrap.Rmd
@@ -6,11 +6,11 @@ library( tidyverse )

# The Parametric bootstrap

-An inference procedure very much connected to simulation studies is the parametric bootstrap.
+An inference procedure very much connected to simulation studies is the parametric bootstrap\index{parametric bootstrap}. The parametric bootstrap is a bootstrap technique designed to obtain standard error estimates for an estimated parametric model. It can do better than the case-wise bootstrap in some circumstances, usually when there is need to avoid the discrete, chunky nature of a casewise bootstrap (which will only give values that exist in the original dataset). -For a parametric bootstrap, the core idea is to fit a given model to actual data, and then take the parameters we estimate from that model as the DGP parameters in a simulation study. +For a parametric bootstrap, the core idea is to fit a given model to actual data, and then take the parameters we estimate from that model as the DGP\index{data-generating process} parameters in a simulation study\index{simulation study}. The parametric bootstrap is a simulation study for a specific scenario, and our goal is to assess how variable (and, possibly, biased) our estimator is for this specific scenario. If the behavior of our estimator in our simulated scenario is similar to what it would be under repeated trials in the real world, then our bootstrap answers will be informative as to how well our original estimator performs in practice. This is the bootstrap principle, or analogy with an additional assumption that the real-world is effectively well specified as the parameteric model we are fitting. @@ -96,7 +96,7 @@ bind_rows( parametric = res_par, ``` -This is _also_ a simulation: our data generating process is a bit more vague, however, as we are just resampling the data. +This is _also_ a simulation: our data generating process\index{data-generating process} is a bit more vague, however, as we are just resampling the data. This means our estimands are not as clearly specified. For example, in our parameteric approach, our target parameter is known to be true. 
In the case-wise approach, the connection between our DGP and the parameter `theta.hat` is less explicit.
diff --git a/200-coding-tidbits.Rmd b/200-coding-tidbits.Rmd
index 1970404..6a90a2c 100644
--- a/200-coding-tidbits.Rmd
+++ b/200-coding-tidbits.Rmd
@@ -149,7 +149,7 @@ That didn't work, but let's provide some block sizes and see what happens:
generate_blocked_data( n_k = c( 3, 2 ) )
```
-Nice! We see that we have a block ID and the control and treatment potential outcomes. We also don't see a random assignment variable, so that tells us we probably need some other methods as well.
+Nice! We see that we have a block ID and the control and treatment potential outcomes\index{potential outcomes framework}. We also don't see a random assignment variable, so that tells us we probably need some other methods as well.
But we can play with this as it stands right away.
Next we can see that there are many things we might tune:
@@ -215,7 +215,7 @@ It is basically like `Sys.time()` but it has a few more features, such as being
The `bench` package provides some powerful tools for timing code, and is particularly good for comparing different ways of doing the same (or similar) thing.
`bench::mark()` runs each expression 10 times (by default) and tracks how long the computations take.
It then summarizes the distribution of timings.
-For example, we can time how long it takes to analyze some data from our cluster RCT experiment:
+For example, we can time how long it takes to analyze some data from the cluster RCT DGP of Section \@ref(case-cluster):\index{example!cluster randomized trial}
```{r}
library( bench )
@@ -375,13 +375,13 @@ Optimizing as you go usually means you will spend a lot of time wrestling with c
For example, often it is the estimation method that will take a lot of computational time, so having very fast data generation code will not help shorten the overall run time of a simulation much, as you are tweaking something that is only a small part of the overall pie, in terms of time.
Keep things simple; in general your time is more important than the computer's time.
-Overall, computational efficiency should usually be a secondary consideration when you are starting to design a simulation study.
+Overall, computational efficiency should usually be a secondary consideration when you are starting to design a simulation study\index{simulation study}.
It is better to produce accurate code, even if it is a bit slow, than to write code that is speedy but hard to follow (or even worse, that produces incorrect results).
With that warning made, in the next sections we will look at a few optimization efforts applied to the ANOVA example from Section \@ref(case-ANOVA) to illustrate some principles of optimization that come up a lot in simulation projects.
### Hand-building functions
-In our initial ANOVA simulation we used the system-implemented ANOVA.
+In our initial ANOVA simulation (see Chapter \@ref(case-ANOVA)) we used the system-implemented ANOVA.\index{example!heteroskedastic ANOVA (Welch)}
An alternative approach would be to "hand roll" the ANOVA F statistic and test directly.
Doing so by hand can set you up to implement modified versions of these tests later on.
Also, although hand-building a method does take more work to program, it can result in faster code (as it does here), which in turn can shorten the overall simulation.
diff --git a/210-futher-resources.Rmd b/210-futher-resources.Rmd
index 10afae7..21edbbc 100644
--- a/210-futher-resources.Rmd
+++ b/210-futher-resources.Rmd
@@ -12,7 +12,7 @@ We close with a list of things of interest we have discovered while writing this
- [SimDesign](https://github.com/philchalmers/SimDesign/wiki) R package (Chalmers, 2019) - Tools for building generic simulation workflows.
- - [Chalmers & Adkin (2019)](http://www.tqmp.org/RegularArticles/vol16-4/p248/). Writing effective and reliable Monte Carlo simulations with the SimDesign package.
+ - [Chalmers & Adkins (2019)](http://www.tqmp.org/RegularArticles/vol16-4/p248/). Writing effective and reliable Monte Carlo\index{Monte Carlo simulation} simulations with the SimDesign package.
- [DeclareDesign](https://declaredesign.org/) (Blair, Cooper, Coppock, & Humphreys)
diff --git a/220-index.Rmd b/220-index.Rmd
new file mode 100644
index 0000000..c8ae3c5
--- /dev/null
+++ b/220-index.Rmd
@@ -0,0 +1,21 @@
+
+
+```{r index-see-also, echo=FALSE, results='asis'}
+# Emit all |see{} cross-references in one place.
+# These produce no visible output and no page numbers — they only create
+# redirect entries in the printed index. Keeping them here (rather than
+# scattered through the prose) avoids duplicate "see X" entries in the
+# compiled index.
+if (knitr::is_latex_output()) { + cat(' +\\index{DGP|see{data-generating process}} +\\index{MCSE|see{Monte Carlo standard error}} +\\index{CRT|see{cluster-randomized trial}} +\\index{RCT|see{randomized controlled trial}} +\\index{IRT|see{item response theory}} +\\index{HLM|see{multilevel model}} +\\index{SE|see{standard error}} +\\printindex +') +} +``` diff --git a/extras/017-Addendum-to-case-Study.R b/attic/017-Addendum-to-case-Study.R similarity index 100% rename from extras/017-Addendum-to-case-Study.R rename to attic/017-Addendum-to-case-Study.R diff --git a/040-performance-criteria-in-depth-examples b/attic/040-performance-criteria-in-depth-examples similarity index 100% rename from 040-performance-criteria-in-depth-examples rename to attic/040-performance-criteria-in-depth-examples diff --git a/extras/042-cronbach-alpha-solutions.Rmd b/attic/042-cronbach-alpha-solutions.Rmd similarity index 100% rename from extras/042-cronbach-alpha-solutions.Rmd rename to attic/042-cronbach-alpha-solutions.Rmd diff --git a/extras/047-case-study-clustered-data.Rmd b/attic/047-case-study-clustered-data.Rmd similarity index 100% rename from extras/047-case-study-clustered-data.Rmd rename to attic/047-case-study-clustered-data.Rmd diff --git a/scraps/060-case-study-cronbach-alpha.Rmd b/attic/060-case-study-cronbach-alpha.Rmd similarity index 100% rename from scraps/060-case-study-cronbach-alpha.Rmd rename to attic/060-case-study-cronbach-alpha.Rmd diff --git a/105-file-management-scraps b/attic/105-file-management-scraps similarity index 99% rename from 105-file-management-scraps rename to attic/105-file-management-scraps index 8a15bae..111f9e3 100644 --- a/105-file-management-scraps +++ b/attic/105-file-management-scraps @@ -119,7 +119,7 @@ Note how we shuffle the rows of our task list so that which process gets what ta If some tasks are much longer (e.g., due to larger sample size) then this will get balanced out across our processes. 
See \@ref(parallel-processing) for more on parallel processing. -The `if-then` structure allows us to easily switch between parallel and nonparallel code. +The `if-then` structure allows us to easily switch between parallel and nonparallel code.\index{debugging code} This makes debugging easier: when running in parallel, stuff printed to the console does not show until the simulation is over. Plus it would be all mixed up since multiple processes are working simultaneously. diff --git a/performance-criteria-scraps b/attic/performance-criteria-scraps similarity index 100% rename from performance-criteria-scraps rename to attic/performance-criteria-scraps diff --git a/removed_fragments.txt b/attic/removed_fragments.txt similarity index 100% rename from removed_fragments.txt rename to attic/removed_fragments.txt diff --git a/dumb_job.R b/code/dumb_job.R similarity index 100% rename from dumb_job.R rename to code/dumb_job.R diff --git a/index_support/add_index_terms.R b/index_support/add_index_terms.R new file mode 100644 index 0000000..77c15df --- /dev/null +++ b/index_support/add_index_terms.R @@ -0,0 +1,478 @@ +################################################################################ +# add_index_terms.R +# +# Semi-automated indexing helper for bookdown projects. +# +# For each term in the INDEX_TERMS list below, this script finds occurrences +# in .Rmd files and appends \index{...} immediately after the term — but ONLY +# in prose text, never inside: +# - YAML front matter (---...---) +# - fenced code blocks (```...```) +# - inline code (`...`) +# - cross-reference spans (\@ref(...)) +# - markdown hyperlink URLs (...](...)) +# +# Usage: +# 1. Edit INDEX_TERMS (and optionally TERM_FILES) below. +# 2. Run in DRY_RUN mode first to preview what will change. +# 3. Set DRY_RUN <- FALSE to apply the changes. +# +# See the README section at the bottom for indexing conventions. 
+################################################################################
+
+library(stringr)
+
+# ── Configuration ──────────────────────────────────────────────────────────────
+
+DRY_RUN <- TRUE   # TRUE = preview only, FALSE = write changes to disk
+                  # (per Usage above: preview first, then flip to FALSE)
+FIRST_PER_FILE <- TRUE  # TRUE = index only the first match per file (common
+                        # for terms defined/introduced in a chapter)
+                        # FALSE = index every match (for important recurring terms)
+CASE_SENSITIVE <- TRUE  # TRUE = exact case match, FALSE = case-insensitive
+
+# Directory containing the .Rmd files
+RMD_DIR <- "."
+
+# ── Index term definitions ─────────────────────────────────────────────────────
+#
+# Each entry is a named list:
+#   term      - the exact text to search for in prose
+#   index_key - the LaTeX \index{} entry (use ! for subentries)
+#   files     - character vector of filenames to search, or NULL for all files
+#
+# Examples of index_key formats:
+#   "data-generating process"           → top-level entry
+#   "performance measures!bias"         → subentry: bias under performance measures
+#   "DGP|see{data-generating process}"  → cross-reference (appears in PDF index)
+#
+# TIP: Run with DRY_RUN <- TRUE first to preview changes before committing.

+INDEX_TERMS <- list(
+
+  # ── Core simulation vocabulary ──
+  list(
+    term = "data-generating process",
+    index_key = "data-generating process",
+    files = NULL  # all files
+  ),
+  list(
+    term = "data generating process",
+    index_key = "data-generating process",
+    files = NULL
+  ),
+  # NOTE: DGP, MCSE, and other abbreviation |see{} cross-references are
+  # defined in 220-index.Rmd, not here. Do not add |see{} entries to this
+  # list — they belong in that file to avoid duplicates in the printed index.
+  list(
+    term = "Monte Carlo",
+    index_key = "Monte Carlo simulation",
+    files = NULL
+  ),
+  list(
+    term = "simulation study",
+    index_key = "simulation study",
+    files = NULL
+  ),
+
+  # ── Performance measures ──
+  list(
+    term = "bias",
+    index_key = "performance measures!bias",
+    files = c("040-Performance-criteria.Rmd")
+  ),
+  list(
+    term = "root mean squared error",
+    index_key = "performance measures!RMSE",
+    files = NULL
+  ),
+  list(
+    term = "RMSE",
+    index_key = "performance measures!RMSE",
+    files = NULL
+  ),
+  list(
+    term = "coverage",
+    index_key = "performance measures!coverage",
+    files = c("040-Performance-criteria.Rmd", "074-building-good-vizualizations.Rmd")
+  ),
+  list(
+    term = "Type I error",
+    index_key = "performance measures!Type I error",
+    files = NULL
+  ),
+  list(
+    term = "statistical power",
+    index_key = "performance measures!power",
+    files = NULL
+  ),
+  list(
+    term = "Monte Carlo standard error",
+    index_key = "Monte Carlo standard error",
+    files = NULL
+  ),
+  # ── Packages ──
+  list(
+    term = "simhelpers",
+    index_key = "simhelpers package",
+    files = NULL
+  ),
+  list(
+    term = "furrr",
+    index_key = "furrr package",
+    files = NULL
+  ),
+  # CAUTION: "future" as a bare term also matches ordinary prose (e.g.,
+  # "for future reference"), not just the future package. Preview with
+  # DRY_RUN <- TRUE and prune any false positives before applying.
+  list(
+    term = "future",
+    index_key = "future package",
+    files = NULL
+  ),
+
+  # ── Computational topics ──
+  list(
+    term = "parallel processing",
+    index_key = "parallel processing",
+    files = NULL
+  ),
+  list(
+    term = "pseudo-random number",
+    index_key = "pseudo-random number generator",
+    files = NULL
+  ),
+  list(
+    term = "seed",
+    index_key = "random seed",
+    files = c("035-running-simulation.Rmd")
+  ),
+  list(
+    term = "reparameterization",
+    index_key = "reparameterization",
+    files = NULL
+  ),
+
+  # ── Statistical methods ──
+  list(
+    term = "parametric bootstrap",
+    index_key = "parametric bootstrap",
+    files = NULL
+  ),
+  list(
+    term = "potential outcomes",
+    index_key = "potential outcomes framework",
+    files = NULL
+  ),
+  list(
+    term = "cluster-randomized trial",
+    index_key
= "cluster-randomized trial",
+    files = NULL
+  ),
+  list(
+    term = "power analysis",
+    index_key = "power analysis",
+    files = NULL
+  )
+
+  # ── Add more entries here ──
+  # list(
+  #   term = "your term",
+  #   index_key = "your index entry",
+  #   files = NULL  # or c("specific-file.Rmd")
+  # )
+)
+
+
+# ── Helper functions ───────────────────────────────────────────────────────────
+
+# Split an Rmd file's lines into "prose" vs "code/metadata" regions.
+# Returns a logical vector: TRUE = prose line, FALSE = skip (code or YAML).
+# Skips:
+#   - YAML front matter (the opening --- ... --- or --- ... ... block)
+#   - HTML comment blocks (<!-- ... -->, possibly multi-line)
+#   - Fenced code blocks (```...```)
+#
+# NOTE on HTML comments: Rmd files sometimes contain commented-out code chunks
+# like <!-- ... -->.
+classify_lines <- function(lines) {
+  in_code_block <- FALSE
+  in_yaml <- FALSE
+  in_html_comment <- FALSE
+  is_prose <- logical(length(lines))
+
+  for (i in seq_along(lines)) {
+    line <- lines[[i]]
+
+    # ── YAML front matter: only the very first --- opens it ──
+    if (i == 1 && str_detect(line, "^---\\s*$")) {
+      in_yaml <- TRUE
+      is_prose[i] <- FALSE
+      next
+    }
+    if (in_yaml) {
+      is_prose[i] <- FALSE
+      # Closed by a second --- or a ... line
+      if (str_detect(line, "^(---|\\.\\.\\.)\\s*$")) {
+        in_yaml <- FALSE
+      }
+      next
+    }
+
+    # ── HTML comment blocks (<!-- ... -->) ──
+    # Handle open and close potentially on the same line.
+    if (in_html_comment) {
+      is_prose[i] <- FALSE
+      if (str_detect(line, "-->")) in_html_comment <- FALSE
+      next
+    }
+    if (str_detect(line, "<!--")) {
+      is_prose[i] <- FALSE
+      if (!str_detect(line, "-->")) in_html_comment <- TRUE
+      next
+    }
+
+    # ── Fenced code blocks ──
+    if (str_detect(line, "^\\s*```")) {
+      in_code_block <- !in_code_block
+      is_prose[i] <- FALSE  # the fence line itself is not prose
+    } else {
+      is_prose[i] <- !in_code_block
+    }
+  }
+  is_prose
+}
+
+# Within a prose line, apply replacement only outside protected spans:
+#   - inline code: `...`
+#   - cross-refs: \@ref(...)
+#   - hyperlinks: [...](...)
— the URL portion +replace_outside_inline_code <- function(line, pattern, replacement, first_only) { + # Collect all protected ranges + protected <- rbind( + str_locate_all(line, "`[^`]+`")[[1]], # inline code + str_locate_all(line, "\\\\@ref\\([^)]+\\)")[[1]], # \@ref(...) + str_locate_all(line, "\\]\\([^)]+\\)")[[1]] # markdown link URLs ](...) + ) + + # Build a mask of character positions that are "safe" + n <- nchar(line) + safe <- rep(TRUE, n) + if (nrow(protected) > 0) { + for (k in seq_len(nrow(protected))) { + safe[protected[k, "start"]:protected[k, "end"]] <- FALSE + } + } + + # Find all matches of pattern in the full line + m <- str_locate_all(line, pattern)[[1]] + if (nrow(m) == 0) return(list(line = line, changed = FALSE)) + + # Filter to matches that fall entirely within safe regions + safe_matches <- apply(m, 1, function(r) all(safe[r["start"]:r["end"]])) + m <- m[safe_matches, , drop = FALSE] + if (nrow(m) == 0) return(list(line = line, changed = FALSE)) + + if (first_only) m <- m[1, , drop = FALSE] + + # Apply replacements in reverse order so positions stay valid + for (k in rev(seq_len(nrow(m)))) { + start <- m[k, "start"] + end <- m[k, "end"] + original_match <- substr(line, start, end) + line <- paste0( + substr(line, 1, start - 1), + replacement(original_match), + substr(line, end + 1, nchar(line)) + ) + } + list(line = line, changed = TRUE) +} + + +# ── Main processing function ─────────────────────────────────────────────────── + +process_term <- function(entry, rmd_dir, dry_run, first_per_file, case_sensitive) { + term <- entry$term + index_key <- entry$index_key + files <- entry$files + + # |see{} and |seealso{} cross-references must appear exactly once in the + # whole book — multiple identical "see X" entries look terrible in the index. + # Auto-detect these and force first-in-book mode regardless of first_per_file. 
+ first_in_book <- str_detect(index_key, "\\|see") + found_anywhere <- FALSE # tracks across files when first_in_book is TRUE + + # Select target files + EXCLUDE <- c("Designing-Simulations-in-R.Rmd", "220-index.Rmd") + all_rmds <- list.files(rmd_dir, pattern = "\\.Rmd$", full.names = TRUE) + all_rmds <- all_rmds[!basename(all_rmds) %in% EXCLUDE] + if (!is.null(files)) { + all_rmds <- all_rmds[basename(all_rmds) %in% files] + } + if (length(all_rmds) == 0) { + message(" No matching files for term: ", term) + return(invisible(NULL)) + } + + # Build regex pattern (word boundary, optional case) + pattern_flags <- if (case_sensitive) "" else "(?i)" + pattern <- paste0(pattern_flags, "\\b", str_escape(term), "\\b") + + # Replacement function: append \index{} after the matched text + make_replacement <- function(matched_text) { + paste0(matched_text, "\\index{", index_key, "}") + } + + total_files_changed <- 0 + + for (rmd_path in all_rmds) { + if (first_in_book && found_anywhere) next # already placed, skip rest of files + + lines <- readLines(rmd_path, warn = FALSE) + is_prose <- classify_lines(lines) + + new_lines <- lines + file_changed <- FALSE + found_first <- FALSE + + for (i in seq_along(lines)) { + if (!is_prose[i]) next + if ((first_per_file || first_in_book) && found_first) next + + # Skip if \index{index_key} already follows this term on this line + already_indexed <- str_detect(lines[i], paste0( + str_escape(term), "\\\\index\\{", str_escape(index_key), "\\}" + )) + if (already_indexed) { + found_first <- TRUE + next + } + + result <- replace_outside_inline_code( + lines[i], pattern, make_replacement, first_only = (first_per_file && !found_first) + ) + + if (result$changed) { + new_lines[i] <- result$line + file_changed <- TRUE + found_first <- TRUE + found_anywhere <- TRUE + + if (dry_run) { + cat(sprintf("\n [%s] line %d\n", basename(rmd_path), i)) + cat(sprintf(" BEFORE: %s\n", lines[i])) + cat(sprintf(" AFTER: %s\n", result$line)) + } + } + } + + if 
(file_changed) {
+      total_files_changed <- total_files_changed + 1
+      if (!dry_run) {
+        writeLines(new_lines, rmd_path)
+      }
+    }
+  }
+
+  if (total_files_changed == 0) {
+    message("  (no matches found for: ", term, ")")
+  }
+
+  invisible(total_files_changed)
+}
+
+
+# ── Run ────────────────────────────────────────────────────────────────────────
+
+# NOTE: wrapped in `if (FALSE)` so that source()-ing this file does not
+# immediately rewrite the book; select and run the block manually.
+
+if (FALSE) {
+  if (DRY_RUN) {
+    cat("════════════════════════════════════════════════════════\n")
+    cat("  DRY RUN — no files will be modified\n")
+    cat("  Set DRY_RUN <- FALSE to apply changes\n")
+    cat("════════════════════════════════════════════════════════\n\n")
+  } else {
+    cat("════════════════════════════════════════════════════════\n")
+    cat("  APPLYING CHANGES to .Rmd files\n")
+    cat("════════════════════════════════════════════════════════\n\n")
+  }
+
+  for (entry in INDEX_TERMS) {
+    cat(sprintf("Term: \"%s\" → \\index{%s}\n", entry$term, entry$index_key))
+    process_term(entry, RMD_DIR, DRY_RUN, FIRST_PER_FILE, CASE_SENSITIVE)
+    cat("\n")
+  }
+
+  cat("Done.\n")
+  if (DRY_RUN) cat("Re-run with DRY_RUN <- FALSE to write changes.\n")
+}
+
+
+
+################################################################################
+# CLEANUP HELPER
+#
+# strip_to_first(index_cmd)
+#
+# Removes all occurrences of a literal \index{...} command from the .Rmd files
+# except the very first one found (in filename order). Useful for fixing
+# |see{} entries that were accidentally inserted multiple times.
+# +# Example: +# strip_to_first("\\index{DGP|see{data-generating process}}") +# strip_to_first("\\index{MCSE|see{Monte Carlo standard error}}") + +strip_to_first <- function(index_cmd, rmd_dir = ".") { + EXCLUDE <- c("Designing-Simulations-in-R.Rmd", "220-index.Rmd") + files <- sort(list.files(rmd_dir, pattern = "\\.Rmd$", full.names = TRUE)) + files <- files[!basename(files) %in% EXCLUDE] + kept <- FALSE + for (f in files) { + lines <- readLines(f, warn = FALSE) + hits <- which(grepl(index_cmd, lines, fixed = TRUE)) + if (length(hits) == 0) next + if (!kept) { + keep_line <- hits[1] + remove_pos <- hits[-1] + kept <- TRUE + } else { + remove_pos <- hits + } + if (length(remove_pos) > 0) { + lines[remove_pos] <- gsub(index_cmd, "", lines[remove_pos], fixed = TRUE) + writeLines(lines, f) + cat(sprintf(" Removed %d occurrence(s) from %s\n", length(remove_pos), basename(f))) + } + } + if (!kept) cat("Not found:", index_cmd, "\n") else cat("Done. First occurrence retained.\n") +} +################################################################################ + + +################################################################################ +# INDEXING CONVENTIONS (LaTeX \index{} format) +# +# Top-level entry: +# \index{data-generating process} +# +# Subentry (appears indented under the parent): +# \index{performance measures!bias} +# \index{performance measures!RMSE} +# +# Cross-reference ("see" note, no page number): +# \index{DGP|see{data-generating process}} +# +# Cross-reference with a page number ("see also"): +# \index{DGP|seealso{data-generating process}} +# +# Bold page number (marks the primary/defining occurrence): +# \index{data-generating process|textbf} +# +# Page range (wrap the relevant content): +# \index{long topic|(} ... content ... \index{long topic|)} +# +# TIP: Use \index{term|textbf} on the line where the term is DEFINED, +# and plain \index{term} everywhere else. This makes the defining +# page stand out in the printed index. 
+################################################################################ diff --git a/index_support/find_prose_lines.R b/index_support/find_prose_lines.R new file mode 100644 index 0000000..0482c27 --- /dev/null +++ b/index_support/find_prose_lines.R @@ -0,0 +1,202 @@ +################################################################################ +# find_prose_lines.R +# +# Search for phrases across .Rmd files, reporting only prose occurrences +# (skips YAML front matter, HTML comments, fenced code blocks, inline code, +# \@ref() spans, and markdown link URLs — same rules as add_index_terms.R). +# +# Usage: +# 1. Set SEARCH_TERMS to a pipe-separated string of phrases to search for. +# 2. Optionally restrict to specific files with SEARCH_FILES. +# 3. Source or run the script. Results are printed to the console. +# +# Output format (one block per search term): +# ── "cluster-randomized trial" ────────────────── +# 015-Case-study-ANOVA.Rmd : 185, 291 +# 020-Data-generating-models.Rmd : 73, 313, 359 +# +# When multiple terms are given, a combined summary is also printed showing +# every file and line that matched ANY of the terms. +################################################################################ + +library(stringr) + +# ── Configuration ────────────────────────────────────────────────────────────── + +# Pipe-separated phrases to search for. All are treated case-insensitively +# unless CASE_SENSITIVE is set to TRUE. +SEARCH_TERMS <- "IRT" + +# Restrict search to specific files, or NULL for all .Rmd files +SEARCH_FILES <- NULL # e.g. c("020-Data-generating-models.Rmd", "030-Estimation-procedures.Rmd") + +CASE_SENSITIVE <- FALSE # usually FALSE is more useful for search + +RMD_DIR <- "." 
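Both index_support scripts rely on the same "safe mask" trick: mark every character position covered by a protected span (inline code, `\@ref()` cross-references, link URLs) as unsafe, then keep only pattern matches that land entirely on safe positions. Here is a minimal, self-contained sketch of the idea; it uses base R's `gregexpr()` so it has no dependencies, and the helper name `count_prose_matches` is ours, purely for illustration (the real scripts do the masking with `str_locate_all()` and additionally restrict to prose lines via `classify_lines()`):

```r
# Count matches of `pattern` in `line`, ignoring any match that overlaps a
# protected span: inline code (`...`), \@ref(...) cross-refs, or ](...) URLs.
count_prose_matches <- function(line, pattern) {
  prot <- gregexpr("`[^`]+`|\\\\@ref\\([^)]+\\)|\\]\\([^)]+\\)", line)[[1]]

  # Safe mask: TRUE for character positions outside all protected spans.
  safe <- rep(TRUE, nchar(line))
  if (prot[1] != -1) {
    plen <- attr(prot, "match.length")
    for (k in seq_along(prot)) {
      safe[prot[k]:(prot[k] + plen[k] - 1)] <- FALSE
    }
  }

  m <- gregexpr(pattern, line, perl = TRUE)[[1]]
  if (m[1] == -1) return(0L)
  mlen <- attr(m, "match.length")

  # Keep only matches whose every character falls on a safe position.
  sum(vapply(seq_along(m),
             function(k) all(safe[m[k]:(m[k] + mlen[k] - 1)]),
             logical(1)))
}

count_prose_matches("Set the seed with `set.seed(42)`.", "\\bseed\\b")
# returns 1 -- the occurrence inside backticks is skipped
```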
+
+
+# ── Helper functions (shared with add_index_terms.R) ──────────────────────────
+
+classify_lines <- function(lines) {
+  in_code_block <- FALSE
+  in_yaml <- FALSE
+  in_html_comment <- FALSE
+  is_prose <- logical(length(lines))
+
+  for (i in seq_along(lines)) {
+    line <- lines[[i]]
+
+    if (i == 1 && str_detect(line, "^---\\s*$")) {
+      in_yaml <- TRUE; is_prose[i] <- FALSE; next
+    }
+    if (in_yaml) {
+      is_prose[i] <- FALSE
+      if (str_detect(line, "^(---|\\.\\.\\.)\\s*$")) in_yaml <- FALSE
+      next
+    }
+    if (in_html_comment) {
+      is_prose[i] <- FALSE
+      if (str_detect(line, "-->")) in_html_comment <- FALSE
+      next
+    }
+    if (str_detect(line, "<!--")) {
+      is_prose[i] <- FALSE
+      if (!str_detect(line, "-->")) in_html_comment <- TRUE
+      next
+    }
+    if (str_detect(line, "^\\s*```")) {
+      in_code_block <- !in_code_block
+      is_prose[i] <- FALSE
+    } else {
+      is_prose[i] <- !in_code_block
+    }
+  }
+  is_prose
+}
+
+# Return line numbers where `pattern` matches in prose, excluding protected spans.
+prose_match_lines <- function(lines, is_prose, pattern) {
+  protected_pattern <- paste0(
+    "`[^`]+`",               # inline code
+    "|\\\\@ref\\([^)]+\\)",  # \@ref(...)
+    "|\\]\\([^)]+\\)"        # markdown link URLs ](...)
+  )
+
+  hit_lines <- integer(0)
+
+  for (i in seq_along(lines)) {
+    if (!is_prose[i]) next
+    line <- lines[[i]]
+
+    # Build safe mask (exclude protected spans)
+    protected <- str_locate_all(line, protected_pattern)[[1]]
+    n <- nchar(line)
+    safe <- rep(TRUE, n)
+    if (nrow(protected) > 0) {
+      for (k in seq_len(nrow(protected)))
+        safe[protected[k, "start"]:protected[k, "end"]] <- FALSE
+    }
+
+    # Find matches of pattern
+    m <- str_locate_all(line, pattern)[[1]]
+    if (nrow(m) == 0) next
+
+    # Keep only matches entirely within safe regions
+    safe_hits <- apply(m, 1, function(r) all(safe[r["start"]:r["end"]]))
+    if (any(safe_hits)) hit_lines <- c(hit_lines, i)
+  }
+  hit_lines
+}
+
+
+# ── Format line numbers into gap-grouped ranges ───────────────────────────────
+# Lines within `gap` of each other are collapsed into a single "start-end" range.
+# E.g.
c(50,53,57,123,145,148) with gap=10 → "50-57, 123, 145-148" +format_lines <- function(lines, gap = 10) { + if (length(lines) == 0) return("") + lines <- sort(unique(lines)) + start <- lines[1]; end <- lines[1] + parts <- character(0) + for (i in seq_along(lines)[-1]) { + if (lines[i] - end <= gap) { + end <- lines[i] + } else { + parts <- c(parts, if (start == end) start else paste0(start, "-", end)) + start <- lines[i]; end <- lines[i] + } + } + parts <- c(parts, if (start == end) start else paste0(start, "-", end)) + paste(parts, collapse = ", ") +} + + +# ── Main search ──────────────────────────────────────────────────────────────── + +terms <- str_split(SEARCH_TERMS, "\\|")[[1]] |> str_trim() + +EXCLUDE <- c("Designing-Simulations-in-R.Rmd", "220-index.Rmd") +all_rmds <- list.files(RMD_DIR, pattern = "\\.Rmd$", full.names = FALSE) +all_rmds <- sort(all_rmds[!all_rmds %in% EXCLUDE]) +if (!is.null(SEARCH_FILES)) all_rmds <- all_rmds[all_rmds %in% SEARCH_FILES] + +# results[term][file] = integer vector of line numbers +results <- setNames(vector("list", length(terms)), terms) + +for (term in terms) { + flags <- if (CASE_SENSITIVE) "" else "(?i)" + pattern <- paste0(flags, "\\b", str_escape(term), "\\b") + file_hits <- list() + + for (fname in all_rmds) { + lines <- readLines(file.path(RMD_DIR, fname), warn = FALSE) + is_prose <- classify_lines(lines) + hits <- prose_match_lines(lines, is_prose, pattern) + if (length(hits) > 0) file_hits[[fname]] <- hits + } + results[[term]] <- file_hits +} + + +# ── Output ───────────────────────────────────────────────────────────────────── + +pad <- max(nchar(all_rmds)) + 2 + +for (term in terms) { + header <- paste0('\u2500\u2500 "', term, '" ', strrep("\u2500", max(0, 60 - nchar(term) - 5))) + cat(header, "\n") + + file_hits <- results[[term]] + if (length(file_hits) == 0) { + cat(" (no prose matches)\n") + } else { + for (fname in names(file_hits)) { + line_str <- format_lines(file_hits[[fname]]) + cat(sprintf(" %-*s %s\n", 
pad, fname, line_str)) + } + } + cat("\n") +} + +# Combined summary when multiple terms given +if (length(terms) > 1) { + cat(strrep("\u2500", 62), "\n") + cat(" COMBINED (any term)\n") + cat(strrep("\u2500", 62), "\n") + + # Merge all hits per file + combined <- list() + for (term in terms) { + for (fname in names(results[[term]])) { + combined[[fname]] <- sort(unique(c(combined[[fname]], results[[term]][[fname]]))) + } + } + if (length(combined) == 0) { + cat(" (no matches)\n") + } else { + for (fname in sort(names(combined))) { + line_str <- format_lines(combined[[fname]]) + cat(sprintf(" %-*s %s\n", pad, fname, line_str)) + } + } + cat("\n") +} diff --git a/index_support/index-planning.md b/index_support/index-planning.md new file mode 100644 index 0000000..9f66ceb --- /dev/null +++ b/index_support/index-planning.md @@ -0,0 +1,339 @@ +# Index Planning Document: *Designing Monte Carlo Simulations in R* + +## 1. Book Structure Overview + +### Part I: An Introductory Look + +**Chapter: Introduction** (`001-introduction.Rmd`) +- Some of simulation's many uses + - Comparing statistical approaches + - Assessing performance of complex pipelines + - Assessing performance under misspecification + - Assessing finite-sample performance + - Conducting Power Analyses + - Simulating processes +- The perils of simulation as evidence +- Simulating to learn +- Why R? 
+- Organization of the text
+
+**Chapter: Programming Preliminaries** (`003-programming-preliminaries.Rmd`)
+- Welcome to the tidyverse
+- Functions
+  - Rolling your own
+  - A dangerous function
+  - Using Named Arguments
+  - Argument Defaults
+  - Function skeletons
+- Pipe (`|>`) dreams
+- Recipes versus Patterns
+- Exercises
+
+**Chapter: An Initial Simulation** (`005-initial-t-test-simulation.Rmd`)
+- Simulating a single scenario
+- A non-normal population distribution
+- Simulating across different scenarios
+- Extending the simulation design
+- Exercises
+
+---
+
+### Part II: Structure and Mechanics of a Simulation Study
+
+**Chapter: Structure of a Simulation Study** (`010-Simulation-structure.Rmd`)
+- General structure of a simulation
+- Tidy, modular simulations
+- Skeleton of a simulation study
+  - Data-Generating Process
+  - Data Analysis Procedure
+  - Repetition
+  - Performance summaries
+  - Multifactor simulations
+- Exercises
+
+**Chapter: Case Study — Heteroskedastic ANOVA and Welch** (`015-Case-study-ANOVA.Rmd`)
+- The data-generating model
+  - Now make a function (Welch DGP)
+  - Cautious coding
+- The hypothesis testing procedures
+- Running the simulation
+- Summarizing test performance
+- Exercises (Other alphas, Compare results, Power, Wide or long, Other tests, etc.)
+ +**Chapter: Data-Generating Processes** (`020-Data-generating-models.Rmd`) +- Examples (ANOVA, Bivariate Poisson, Cluster RCT) +- Components of a DGP +- A statistical model is a recipe for data generation +- Plot the artificial data +- Check the data-generating function +- Example: Simulating clustered data + - Design decisions; model for cluster RCT; equations to code; standardization +- Sometimes a DGP is all you need (3-parameter IRT) +- More to explore / Exercises + +**Chapter: Estimation Procedures** (`030-Estimation-procedures.Rmd`) +- Writing estimation functions +- Including Multiple Data Analysis Procedures +- Validating an Estimation Function + - Checking against existing implementations + - Checking novel procedures + - Checking with simulations +- Handling errors, warnings, and other hiccups + - Capturing errors and warnings + - Estimator processes that anticipate errors and warnings + - The `safely()` option +- Exercises + +**Chapter: Running the Simulation Process** (`035-running-simulation.Rmd`) +- Repeating oneself +- One run at a time +- Reparameterizing +- Bundling simulations with `simhelpers` +- Seeds and pseudo-random number generators +- Conclusions / Exercises + +**Chapter: Performance Measures** (`040-Performance-criteria.Rmd`) +- Measures for Point Estimators + - Comparing Cluster RCT Estimation Procedures + - Robust Performance Measures +- Measures for Variance Estimators + - Calibration + - Stability + - Assessing SEs for the Cluster RCT Simulation +- Measures for Confidence Intervals + - Cluster RCT confidence interval coverage +- Measures for Inferential Procedures (Hypothesis Tests) + - Validity; Power; Rejection Rates; Cluster RCT inference +- Relative or Absolute Measures? 
+  - Performance relative to target parameter
+  - Performance relative to a benchmark estimator
+- Estimands Not Represented By a Parameter
+- Uncertainty in Performance Estimates (Monte Carlo Standard Error)
+  - Conventional MCSE for point estimators
+  - MCSEs for robust measures
+  - MCSE for CIs and Hypothesis Tests
+  - Jackknife and Bootstrap MCSEs
+- Performance Calculations with the `simhelpers` Package
+- Concluding thoughts / Exercises
+
+---
+
+### Part III: Systematic Simulations
+
+**Chapter: Multiple Scenarios** (`060-multiple-scenarios.Rmd`)
+- Simulating across levels of a single factor
+  - Performance summary function
+  - Adding performance calculations to the simulation driver
+- Simulating across multiple factors
+- Using `pmap` to run multifactor simulations
+- When to calculate performance metrics
+  - Aggregate as you simulate (inside)
+  - Keep all simulation runs (outside)
+  - Getting raw results ready for analysis
+- Summary / Exercises
+
+**Chapter: Designing Multifactor Simulations** (`070-experimental-design.Rmd`)
+- Choosing parameter combinations
+- Case Study: A multifactor evaluation of cluster RCT estimators
+  - Choosing parameters; Redundant factor combinations; Running; Calculating metrics
+- Conclusions
+
+**Chapter: Exploring and Presenting Simulation Results** (`072-presentation-of-results.Rmd`)
+- Tabulation
+  - Example: estimators of treatment variation
+- Visualization
+  - Examples: RMSE, biserial correlation, variance estimation, heat maps, relative performance
+- Modeling
+  - Examples: Biserial revisited; cross-classified data
+- Reporting
+
+**Chapter: Building Good Visualizations** (`074-building-good-vizualizations.Rmd`)
+- Subsetting and Many Small Multiples
+- Bundling
+- Aggregation
+- Comparing true SEs with standardization
+- The Bias-SE-RMSE plot
+- Assessing the quality of estimated SEs
+  - Stability of estimated SEs
+- Assessing confidence intervals
+- Conclusions / Exercises
+
+**Chapter: Special Topics on Reporting** (`075-special-topics-on-reporting.Rmd`)
+- Using regression to analyze simulation results
+  - Examples: Biserial; Cluster RCT
+- Using regression trees to find important factors
+- Analyzing results with few iterations per scenario
+- Multilevel Meta Regressions
+- What to do with warnings in simulations
+- Conclusions
+
+**Chapter: Case Study — Comparing Different Estimators** (`077-case-study-comparing-estimators.Rmd`)
+- Bias-variance tradeoffs
+
+**Chapter: Simulations as Evidence** (`080-simulations-as-evidence.Rmd`)
+- Making simulations relevant
+  - Break symmetries; Multi-factor simulations; DGPs from prior literature; etc.
+  - Build a fully calibrated simulation
+- Case study: Using copulas to generate calibrated data
+  - Steps 1–5 of copula-based calibration
+  - Assessing the copula's success; building multifactor simulation
+- Simulation Sections in Methods Papers
+  - Purpose; Design decisions; Principles of design and presentation
+- Concluding Thoughts
+
+---
+
+### Part IV: Computational Considerations
+
+**Chapter: File Management** (`105-file-management.Rmd`)
+- Simulation project structure
+- Well-structured code files
+  - Headers in .R files; the `source` command; storing testing code
+- Principled directory structures
+- Saving simulation results
+  - File formats; Saving simulations as you go
+
+**Chapter: Parallel Processing** (`120-parallel-processing.Rmd`)
+- Parallel on your computer
+- Parallel on a virtual machine
+- Parallel on a cluster
+  - CLI and CMD mode; task dispatchers; moving files; checking jobs; running lots of jobs
+  - Resources for Harvard's Odyssey
+
+**Chapter: Debugging and Testing** (`130-debugging_and_testing.Rmd`)
+- Debugging with `print()`
+- Debugging with `browser()`
+- Debugging with `debug()`
+- Protecting functions with `stopifnot()`
+- Testing code (unit testing)
+
+---
+
+### Part V: Complex Data Structures
+
+**Chapter: Simulation for Power Analysis** (`140-simulation-for-power-analysis.Rmd`)
+- Getting design parameters from pilot data
+- The data generating process
+- Running the simulation
+- Evaluating power
+  - Checking validity; Assessing Precision (SE); Assessing power; MDEs
+- Power for Multilevel Data
+
+**Chapter: Simulation under the Potential Outcomes Framework** (`150-potential-outcomes-framework.Rmd`)
+- Finite vs. Superpopulation inference
+- Data generation processes for potential outcomes
+- Finite sample performance measures
+- Nested finite simulation procedure
+- Calibrated simulations within the potential outcomes framework
+  - Matching-based imputation; Model-based imputation; Using imputed data
+
+**Chapter: The Parametric Bootstrap** (`160-parametric-bootstrap.Rmd`)
+- Air conditioners: a stolen case study
+
+---
+
+### Appendices
+
+**Appendix: Coding Tidbits** (`200-coding-tidbits.Rmd`)
+- How to repeat yourself (`replicate()`, `map()`, other approaches)
+- Default arguments for functions
+- Profiling Code (`Sys.time()`, `tictoc`, `bench`, `profvis`)
+- Optimizing code (and why you often shouldn't)
+  - Hand-building functions; computational efficiency; reusing code
+
+**Appendix: Further Readings and Resources** (`210-futher-resources.Rmd`)
+
+---
+
+## 2. Bookdown Index Options
+
+### The short answer
+
+**bookdown does not have built-in support for a traditional back-of-book index in HTML output.** However, it has solid support for PDF/LaTeX output via standard LaTeX indexing. Your options fall into three tiers:
+
+---
+
+### Option A: LaTeX Index for PDF Output (recommended starting point)
+
+This is the most natural and well-supported path. LaTeX has mature indexing infrastructure that works well with bookdown's `pdf_book` output.
+
+**How it works:**
+
+1. Add `\usepackage{imakeidx}` (or `makeidx`) to `preamble.tex`:
+   ```latex
+   \usepackage{imakeidx}
+   \makeindex
+   ```
+
+2. Add `\printindex` at the end of the book (e.g., in `210-futher-resources.Rmd` or a new file):
+   ```
+   \printindex
+   ```
+
+3. Sprinkle `\index{}` commands throughout the `.Rmd` files next to the terms you want indexed:
+   ```
+   The data-generating process \index{data-generating process} is the first step...
+   ```
+
+   For subentries:
+   ```
+   \index{performance measures!bias}
+   \index{performance measures!RMSE}
+   ```
+
+   For "see also" cross-references:
+   ```
+   \index{DGP|see{data-generating process}}
+   ```
+
+4. Build the PDF — the `pdflatex` + `makeindex` + `pdflatex` cycle handles the rest, and the `bookdown::pdf_book` output already runs these steps.
+
+**Pros:** Polished, standard, automatic page numbers, subentries, cross-references. No extra packages needed.
+**Cons:** The index only appears in the PDF, not in the HTML gitbook.
+
+---
+
+### Option B: HTML Index Page (manual or semi-automated)
+
+For the gitbook HTML output, bookdown has no built-in index mechanism (it relies on the search bar instead). But you can create a custom index page:
+
+1. Add a new `Rmd` file (e.g., `999-index-page.Rmd`) that serves as a manually curated "Index" chapter.
+2. Use R to partially automate it — scan each chapter for key terms and generate anchor links.
+3. Each entry links to a section anchor, e.g.:
+   ```markdown
+   **data-generating process** — [§ DGP functions](#DGP-functions), [§ DGP examples](#DGP-examples)
+   ```
+
+The existing `{#label}` anchors already in your headings (e.g., `{#data-generating-processes}`, `{#performance-measures}`) make this feasible.
+
+**Pros:** Works in HTML, visible to web readers.
+**Cons:** Requires significant manual curation; links go to sections, not page numbers.
+
+---
+
+### Option C: Both Together (most complete)
+
+Use LaTeX `\index{}` commands for the PDF index, and also build the HTML index page. The `\index{}` commands are silently ignored in HTML output, so there's no conflict. This is the approach used by several Chapman & Hall/CRC bookdown textbooks.
+
+---
+
+## 3. Recommended Next Steps
+
+Given where the book stands, here is a suggested workflow:
+
+1. **Start with `preamble.tex`** — add `\usepackage{imakeidx}` and `\makeindex` now, so the infrastructure is in place even before you add index entries.
+   DONE
+
+2. **Decide on key terms to index** — based on the chapter outline above, the natural first-pass index candidates include:
+   - Core concepts: data-generating process, estimation procedure, performance measures, Monte Carlo standard error, tidy modular simulation
+   - Performance metrics: bias, variance, RMSE, coverage, power, Type I error, rejection rate
+   - Methods/packages: `simhelpers`, `purrr`, `furrr`, `future`, `testthat`, `replicate()`, `map()`
+   - Simulation design concepts: multifactor simulation, seeds, parallel processing, reparameterization
+   - Statistical concepts: confidence interval, hypothesis test, bootstrap, potential outcomes
+
+3. **Add `\index{}` commands chapter by chapter** — start with the most-referenced chapters (the structure chapters 010–040) where the core vocabulary is defined.
+
+4. **Decide whether you want an HTML index** — if the book is primarily consumed as HTML (gitbook), this matters more than the PDF index.
+
+5. **Add `\printindex` to the book end** — either as its own `.Rmd` file or appended to the Further Resources chapter.
diff --git a/index_support/index-terms-list.md b/index_support/index-terms-list.md
new file mode 100644
index 0000000..78dcb77
--- /dev/null
+++ b/index_support/index-terms-list.md
@@ -0,0 +1,158 @@
+# Pending Index Terms
+
+NOTE: Candidate terms to add to the index; work in progress.
+
+---
+
+## Core Simulation Concepts (15 terms)
+
+7. `simulation skeleton`
+8. `single scenario simulation`
+9. `multifactor simulation`
+10. `simulation replication`
+12. `simulation as evidence`
+13. `calibrated simulation`
+14. `finite-sample performance`
+15. `misspecification`
+
+---
+
+## Performance Measures (20 terms)
+
+16. `performance measures`
+17. `performance measures!bias`
+18. `performance measures!variance`
+19. `performance measures!mean squared error`
+20. `performance measures!RMSE`
+21. `performance measures!relative bias`
+22. `performance measures!coverage`
+23. `performance measures!rejection rate`
+24. `performance measures!Type I error`
+25. `performance measures!power`
+26. `performance measures!confidence interval width`
+27. `performance measures!calibration`
+28. `performance measures!stability`
+29. `performance measures!robust measures`
+30. `Monte Carlo standard error`
+31. `MCSE|see{Monte Carlo standard error}`
+32. `jackknife (MCSE)`
+33. `bootstrap (MCSE)`
+34. `size-adjusted power`
+35. `estimand`
+
+---
+
+## Data-Generating Processes (10 terms)
+
+36. `data-generating process!components`
+37. `data-generating process!checking`
+38. `data-generating process!plotting`
+39. `data-generating process!standardization`
+40. `data-generating process!cluster-randomized trial`
+41. `data-generating process!ANOVA`
+42. `data-generating process!bivariate Poisson`
+43. `data-generating process!item response theory`
+44. `data-generating process!meta-regression`
+
+---
+
+## Estimation Procedures (8 terms)
+
+46. `estimation function`
+47. `estimation function!validating`
+48. `estimation function!multiple procedures`
+49. `error handling`
+50. `error handling!capturing warnings`
+51. `error handling!safely()`
+52. `contingent testing`
+53. `Welch t-test`
+
+---
+
+## Statistical Methods and Models (10 terms)
+
+54. `analysis of variance (ANOVA)`
+55. `heteroskedasticity`
+56. `parametric bootstrap`
+57. `potential outcomes framework`
+58. `finite population inference`
+59. `superpopulation inference`
+60. `cluster-randomized trial`
+61. `multilevel model`
+62. `meta-regression`
+63. `item response theory`
+
+---
+
+## R Programming Concepts (12 terms)
+
+64. `functions!writing`
+65. `functions!default arguments`
+66. `functions!named arguments`
+67. `functions!skeleton`
+68. `pipe operator`
+69. `purrr package`
+70. `purrr package!map()`
+71. `purrr package!pmap()`
+72. `replicate()`
+73. `tidyverse`
+74. `random seed`
+75. `pseudo-random number generator`
+
+---
+
+## R Packages (8 terms)
+
+76. `simhelpers package`
+77. `furrr package`
+78. `future package`
+79. `testthat package`
+80. `lme4 package`
+81. `estimatr package`
+82. `clubSandwich package`
+83. `kableExtra package`
+
+---
+
+## Computational and File Management (8 terms)
+
+84. `parallel processing`
+85. `parallel processing!on a local computer`
+86. `parallel processing!on a cluster`
+87. `parallel processing!on a virtual machine`
+88. `file management!project structure`
+89. `file management!saving results`
+90. `file management!source command`
+91. `profiling code`
+
+---
+
+## Debugging and Testing (4 terms)
+
+92. `debugging!print()`
+93. `debugging!browser()`
+94. `debugging!stopifnot()`
+95. `unit testing`
+
+---
+
+## Visualization and Reporting (5 terms)
+
+96. `visualization!many small multiples`
+97. `visualization!Bias-SE-RMSE plot`
+98. `visualization!heat maps`
+99. `regression tree (simulation analysis)`
+100. `simulation section (methods paper)`
+
+---
+
+## Notes on Use
+
+- Terms listed as subentries (e.g., `performance measures!bias`) will appear indented
+  under the parent heading in the index.
+- Cross-references (`|see{}`) produce a "see ..." note with no page number.
+- On the line where a term is *defined*, use `|textbf` to bold the page number:
+  `\index{data-generating process|textbf}`
+- Add these to `add_index_terms.R` as entries in the `INDEX_TERMS` list,
+  using the `term` field for the prose text to find and `index_key` for the
+  `\index{}` argument.
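For reference, a hypothetical sketch of what an `INDEX_TERMS`-driven insertion could look like. The actual `add_index_terms.R` is not shown here, so the entry shape below (fields `term` and `index_key`) simply mirrors the note above rather than the real script:

```r
# Sketch (not the actual add_index_terms.R): tag the first occurrence of each
# term in a vector of prose lines with a matching \index{} entry.
library(stringr)

INDEX_TERMS <- list(
  list(term = "data-generating process", index_key = "data-generating process"),
  list(term = "Monte Carlo standard error", index_key = "Monte Carlo standard error")
)

add_index_term <- function(lines, term, index_key) {
  hit <- which(str_detect(lines, fixed(term)))[1]
  if (is.na(hit)) return(lines)   # term never appears; leave lines untouched
  tagged <- paste0(term, "\\index{", index_key, "}")
  # fixed() pattern => replacement is taken literally (no backreference parsing)
  lines[hit] <- str_replace(lines[hit], fixed(term), tagged)
  lines
}

prose <- c("The data-generating process is the first step.")
for (tm in INDEX_TERMS) prose <- add_index_term(prose, tm$term, tm$index_key)
```

Tagging only the first occurrence per file keeps the index from filling with incidental page numbers; a real script would likely also skip code chunks and lines that already carry an `\index{}` entry.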
diff --git a/index_support/migrate_see_entries.R b/index_support/migrate_see_entries.R
new file mode 100644
index 0000000..68e3003
--- /dev/null
+++ b/index_support/migrate_see_entries.R
@@ -0,0 +1,110 @@
+################################################################################
+# migrate_see_entries.R
+#
+# One-time migration: finds every \index{A|see{B}} (and |seealso{B}) in the
+# main .Rmd files, replaces it with \index{B} (the canonical entry), and
+# ensures \index{A|see{B}} is present in 220-index.Rmd.
+#
+# After running this, prose files contain only canonical \index{term} entries.
+# All cross-references live exclusively in 220-index.Rmd.
+#
+# Usage:
+#   Set DRY_RUN <- TRUE to preview, then DRY_RUN <- FALSE to apply.
+################################################################################
+
+library(stringr)
+
+DRY_RUN <- TRUE   # preview by default; set to FALSE to apply changes
+RMD_DIR <- "."
+INDEX_FILE <- "220-index.Rmd"
+EXCLUDE <- c("Designing-Simulations-in-R.Rmd", INDEX_FILE)
+
+# ── Scan all prose files for \index{A|see{B}} ─────────────────────────────────
+
+files <- sort(list.files(RMD_DIR, pattern = "\\.Rmd$", full.names = TRUE))
+files <- files[!basename(files) %in% EXCLUDE]
+
+# Regex: captures A and B from \index{A|see{B}} or \index{A|seealso{B}}
+SEE_PATTERN <- "\\\\index\\{([^|}]+)\\|(see(?:also)?)\\{([^}]+)\\}\\}"
+
+found_pairs <- list()  # named list: "A|see{B}" -> TRUE, for deduplication
+
+for (f in files) {
+  lines <- readLines(f, warn = FALSE)
+  new_lines <- lines
+  changed <- FALSE
+
+  for (i in seq_along(lines)) {
+    line <- lines[[i]]
+    if (!str_detect(line, SEE_PATTERN)) next
+
+    # Collect all matches on this line
+    m <- str_match_all(line, SEE_PATTERN)[[1]]
+    # m columns: [full_match, A, see_or_seealso, B]
+
+    for (k in seq_len(nrow(m))) {
+      full_match <- m[k, 1]
+      A <- m[k, 2]
+      see_type <- m[k, 3]
+      B <- m[k, 4]
+
+      canonical_entry <- paste0("\\index{", B, "}")
+      see_entry <- paste0("\\index{", A, "|", see_type, "{", B, "}}")
+      found_pairs[[see_entry]] <- TRUE
+
+      if (DRY_RUN) {
+        cat(sprintf("\n  [%s] line %d\n", basename(f), i))
+        cat(sprintf("    REPLACE: %s\n", full_match))
+        cat(sprintf("    WITH:    %s\n", canonical_entry))
+      }
+
+      new_lines[[i]] <- str_replace(new_lines[[i]], fixed(full_match), canonical_entry)
+      changed <- TRUE
+    }
+  }
+
+  if (changed && !DRY_RUN) {
+    writeLines(new_lines, f)
+    cat(sprintf("  Updated: %s\n", basename(f)))
+  }
+}
+
+# ── Update 220-index.Rmd ──────────────────────────────────────────────────────
+
+index_path <- file.path(RMD_DIR, INDEX_FILE)
+index_lines <- readLines(index_path, warn = FALSE)
+
+# Find which see-entries are already present in the file
+already_present <- sapply(names(found_pairs), function(entry) {
+  any(str_detect(index_lines, fixed(entry)))
+})
+new_entries <- names(found_pairs)[!already_present]
+
+if (length(new_entries) == 0) {
+  cat("\n220-index.Rmd: all see-entries already present, nothing to add.\n")
+} else {
+  cat(sprintf("\n220-index.Rmd: adding %d new see-entries:\n", length(new_entries)))
+  for (e in new_entries) cat(sprintf("  %s\n", e))
+
+  if (!DRY_RUN) {
+    # Insert new entries just before the \printindex line
+    printindex_line <- which(str_detect(index_lines, fixed("\\printindex")))
+    if (length(printindex_line) == 0) {
+      stop("Could not find \\printindex in ", INDEX_FILE)
+    }
+    insert_at <- printindex_line[1]
+    index_lines <- c(
+      index_lines[seq_len(insert_at - 1)],
+      new_entries,
+      index_lines[insert_at:length(index_lines)]
+    )
+    writeLines(index_lines, index_path)
+    cat(sprintf("  Written to %s\n", INDEX_FILE))
+  }
+}
+
+if (DRY_RUN) {
+  cat("\n── DRY RUN complete. Set DRY_RUN <- FALSE to apply changes. ──\n")
+} else {
+  cat("\n── Migration complete. ──\n")
+}
diff --git a/index_support/running-examples.md b/index_support/running-examples.md
new file mode 100644
index 0000000..6a30637
--- /dev/null
+++ b/index_support/running-examples.md
@@ -0,0 +1,55 @@
+# Running Examples and Case Studies
+
+Index entries use the form `example!name` (lowercase, to sort correctly under a
+single "example" parent heading in the printed index).
+
+---
+
+## Primary Running Examples
+*Introduced early and revisited across many chapters.*
+
+| Example | `\index{}` key | Chapters |
+|---|---|---|
+| Initial t-test simulation | `example!t-test simulation` | 3, 4 |
+| Heteroskedastic ANOVA / Welch t-test (Brown & Forsythe 1974) | `example!heteroskedastic ANOVA (Welch)` | 5, 10, 12, exercises throughout |
+| Cluster-randomized trial | `example!cluster-randomized trial` | 6, 8, 9, 10, 12, 14, 15, 17 |
+| Bivariate Poisson / Pearson correlation CI | `example!bivariate Poisson` | 6, 8, 9, 10, 11, 12 |
+
+---
+
+## Secondary Examples
+*Introduced (often in exercises), revisited in a few later chapters.*
+
+| Example | `\index{}` key | Chapters |
+|---|---|---|
+| 3-parameter IRT model | `example!3-parameter IRT` | 6 (DGP), 8 (estimation) |
+| Random effects meta-regression | `example!random effects meta-regression` | 6, 8, 12 (exercises) |
+| Vevea-Hedges selection model | `example!Vevea-Hedges selection model` | 6, 8 (exercises, extends meta-regression) |
+| Biserial correlation | `example!biserial correlation` | 12, 15 |
+| Comparing treatment variation estimators | `example!treatment variation estimators` | 12, 15 |
+
+---
+
+## Self-Contained Chapter Case Studies
+*Appear primarily within a single chapter.*
+
+| Example | `\index{}` key | Chapter |
+|---|---|---|
+| Calibrated simulation using copulas | `example!calibrated simulation (copula)` | 16 |
+| Parametric bootstrap (air conditioner data) | `example!parametric bootstrap (air conditioner)` | 20 |
+| Power analysis for multilevel data | `example!power analysis (multilevel)` | 18 |
+| Potential outcomes / matching imputation | `example!potential outcomes (matching)` | 19 |
+
+---
+
+## Notes on Indexing
+
+- Add `\index{example!cluster-randomized trial}` etc. at the *start* of each
+  section or paragraph where a given example is substantively developed —
+  not just mentioned in passing.
+- For the primary running examples, consider using `|textbf` on the page where
+  the example is first fully introduced:
+  `\index{example!cluster-randomized trial|textbf}`
+- The `add_index_terms.R` script is not well-suited for these entries because
+  the example names don't appear as consistent verbatim phrases in the prose.
+  These are best added manually, section by section.
diff --git a/index_support/searching index terms.txt b/index_support/searching index terms.txt
new file mode 100644
index 0000000..6410d12
--- /dev/null
+++ b/index_support/searching index terms.txt
@@ -0,0 +1,12 @@
+
+grep -rnF "\\index{DGP|see{data-generating process}}" *.Rmd
+
+for (f in list.files(".", pattern = "\\.Rmd$", full.names = TRUE)) {
+  x <- readLines(f, warn = FALSE)
+  x2 <- gsub("\\index{DGP|see{data-generating process}}", "", x, fixed = TRUE)
+  if (!identical(x, x2)) writeLines(x2, f)
+}
+
+
+
+grep -rn "\\\\index{[^}]*|see" *.Rmd | grep -v "Designing-Simulations-in-R.Rmd" | grep -v "220-index.Rmd"
\ No newline at end of file
diff --git a/preamble.tex b/preamble.tex
index 5df9cb1..74da1f3 100644
--- a/preamble.tex
+++ b/preamble.tex
@@ -28,3 +28,7 @@
 \newcommand{\mat}[1]{\mathbf{#1}}
 \newcommand{\bs}{\boldsymbol}
 \newcommand{\trace}{\text{tr}}
+
+
+\usepackage{imakeidx}
+\makeindex
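As a rough companion to Option B above, here is a sketch of how the HTML index page could be semi-automated by scanning each chapter's headings for their `{#label}` anchors. The output file name `999-index-page.Rmd` follows the Option B suggestion; the heading-scanning heuristic is an untested assumption (it would also pick up example headings inside code chunks, so the result still needs manual curation):

```r
# Sketch: build a crude HTML index chapter by scanning .Rmd headings for
# explicit {#label} anchors and emitting one markdown entry per anchor.
library(stringr)

rmd_files <- list.files(".", pattern = "\\.Rmd$")
entries <- character(0)

for (f in rmd_files) {
  lines <- readLines(f, warn = FALSE)
  # Match markdown headings of the form "## Title {#label}"
  m <- str_match(lines, "^#+\\s+(.*?)\\s*\\{#([A-Za-z0-9_-]+)\\}")
  hits <- which(!is.na(m[, 1]))
  for (i in hits) {
    entries <- c(entries, sprintf("**%s** — [§](#%s)", m[i, 2], m[i, 3]))
  }
}

# Unnumbered "Index" chapter, sorted alphabetically by heading text
writeLines(c("# Index {-}", "", sort(entries)), "999-index-page.Rmd")
```

This only indexes section titles, not arbitrary terms; extending it to scan prose for the terms in `index-terms-list.md` would make it a true index generator.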