diff --git a/.github/workflows/deploy_bookdown.yml b/.github/workflows/deploy_bookdown.yml index 093c492..1c415e5 100644 --- a/.github/workflows/deploy_bookdown.yml +++ b/.github/workflows/deploy_bookdown.yml @@ -22,6 +22,8 @@ jobs: run: Rscript -e 'bookdown::render_book("index.Rmd", "bookdown::gitbook")' - name: Set up tinytex uses: r-lib/actions/setup-tinytex@v2 + env: + TINYTEX_INSTALLER: TinyTeX - name: Check latex installation run: tlmgr --version - name: Render pdf book diff --git a/001-introduction.Rmd b/001-introduction.Rmd index 28acb80..4fc3351 100644 --- a/001-introduction.Rmd +++ b/001-introduction.Rmd @@ -97,7 +97,7 @@ For example, this is of particular concern with hierarchical data structures tha Simulation is a tractable approach for assessing the small-sample performance of such estimation methods or for determining minimum required sample sizes for adequate performance. One example of a simulation investigating questions of finite-sample behavior comes from @longUsingHeteroscedasticityConsistent2000, whose evaluated the performance of heteroskedasticity-robust standard errors (HRSE) in linear regression models. -Asymptotic analysis indicates that HRSEs work well (in the sense of providing correct assessments of uncertainty) in sufficiently large samples (@White1980heteroskedasticity), but what about in realistic contexts where small samples occur? +Asymptotic analysis indicates that HRSEs work well (in the sense of providing correct assessments of uncertainty) in sufficiently large samples [@White1980heteroskedasticity], but what about in realistic contexts where small samples occur? @longUsingHeteroscedasticityConsistent2000 use extensive simulations to investigate the properties of different versions of HRSEs for linear regression across a range of sample sizes, demonstrating that the most commonly used form of these estimators often does _not_ work well with sample sizes found in typical social science applications. 
Via simulation, they provided compelling evidence about a problem without having to wade into a technical (and potentially inaccessible) mathematical analysis of the problem. @@ -161,7 +161,7 @@ Even this strategy has limitations, though. Except for very simple processes, we can seldom consider every possible set of conditions. As we will see in later chapters, the design of a simulation study typically entails making choices over very large spaces of possibility. -This flexibility leaves lots of room for discretion and judgement, and even for personal or professional biases [@boulesteix2020Replicationa]. +This flexibility leaves lots of room for discretion and judgement, and even for personal or professional biases [@boulesteix2020Replication]. Due to this flexibility, simulation findings are held in great skepticism by many. The following motto summarizes the skeptic's concern: diff --git a/020-Data-generating-models.Rmd b/020-Data-generating-models.Rmd index 96f7fa1..e0332da 100644 --- a/020-Data-generating-models.Rmd +++ b/020-Data-generating-models.Rmd @@ -238,7 +238,7 @@ Here is a plot of 30 observations from the bivariate Poisson distribution with m ```{r bivariate-Poisson-scatter} #| echo: false #| message: false -#| fig.cap: "$N = 30$ observations from the bivariate Poisson distribution with $\\mu_1 = 10, \\mu_2 = 7, \rho = .65$." +#| fig.cap: "$N = 30$ observations from the bivariate Poisson distribution with $\\mu_1 = 10, \\mu_2 = 7, \\rho = .65$." #| fig.width: 6 #| fig.height: 4 @@ -310,7 +310,7 @@ Even simple checks such as these can be quite helpful in catching such bugs. Writing code for a complicated DGP can feel like a daunting task, but if you first focus on a recipe for how the data is generated, it is often not too bad to then convert that recipe into code. 
We now illustrate this process with a detailed case study involving a more complex data-generating process. -Recent literature on multisite trials (where, for example, students are randomized to treatment or control within each of a series of sites) has explored how variation in the strength of effects across sites can affect how different data-analysis procedures behave [e.g., @miratrix2021applied; @Bloom:2016um]. +Recent literature on multisite trials (where, for example, students are randomized to treatment or control within each of a series of sites) has explored how variation in the strength of effects across sites can affect how different data-analysis procedures behave [e.g., @miratrix2021applied; @Bloom2016using]. In this example, we are going to extend this work to explore best practices for estimating treatment effects in cluster randomized trials. In particular, we will investigate what happens when the treatment impact for each school is related to the size of the school. @@ -585,16 +585,16 @@ For a particular fixed-length test, the set of item parameters would depend on t But we are not (yet) dealing with actual testing data, so we will need to make up an auxiliary model for these parameters. Perhaps we could just simulate some values? -Arbitrarily, let's draw the difficulty parameters from a normal distribution with mean $\mu_\alpha = 0$ and standard deviation $\tau_\alpha = 1$. +Arbitrarily, let's draw the difficulty parameters from a normal distribution with mean $\mu_\beta = 0$ and standard deviation $\tau_\beta = 1$. -The discrimination parameters have to be greater than zero, and values near $\beta_m = 1$ make the model simplify (in other words, if $\beta_1 = 1$ then we can drop the parameter from the model), so let's draw them from a gamma distribution with mean $\mu_\beta = 1$ and standard deviation $\tau_\beta = 0.2$. 
+The discrimination parameters have to be greater than zero, and values near $\alpha_m = 1$ make the model simplify (in other words, if $\alpha_1 = 1$ then we can drop the parameter from the model), so let's draw them from a gamma distribution with mean $\mu_\alpha = 1$ and standard deviation $\tau_\alpha = 0.2$. This decision requires a bit of work: gamma distributions are usually parameterized in terms of shape and rate, not mean and standard deviation. A bit of poking on Wikipedia gives us the answer, however: -shape is equal to $\mu_\beta^2 \tau_\beta^2 = 0.2^2$ and rate is equal to $\mu_\beta \tau_\beta^2 = 0.2^2$. +shape is equal to $\mu_\alpha^2 / \tau_\alpha^2 = 0.2^{-2} = 25$ and rate is equal to $\mu_\alpha / \tau_\alpha^2 = 0.2^{-2} = 25$. Finally, we imagine that all the test questions have four possible responses, and therefore set $\gamma_m = \frac{1}{4}$ for all the items, just like the instructor suggested. Each item requires three numbers; the easiest way to generate them is to let them all be independent of each other, so we do that. 
With that, let's make up some item parameters: ```{r} -alphas <- rnorm(M, mean = 0, sd = 1.5) # difficulty parameters -betas <- rgamma(M, shape = 0.2^2, rate = 0.2^2) # discrimination parameters +betas <- rnorm(M, mean = 0, sd = 1.5) # difficulty parameters +alphas <- rgamma(M, shape = 0.2^-2, rate = 0.2^-2) # discrimination parameters gammas <- rep(1 / 4, M) # guessing parameters ``` @@ -638,10 +638,13 @@ r_3PL_IRT <- function( thetas <- rnorm(N) # generate item parameters - alphas <- rnorm(M, mean = diff_M, sd = diff_SD) - betas <- rgamma(M, - shape = disc_M^2 * disc_SD^2, - rate = disc_M * disc_SD^2) + + alphas <- rgamma( + M, + shape = disc_M^2 / disc_SD^2, + rate = disc_M / disc_SD^2 + ) + betas <- rnorm(M, mean = diff_M, sd = diff_SD) gammas <- rep(1 / item_options, M) # simulate item responses @@ -830,7 +833,7 @@ Another model for generating bivariate counts with negative binomial marginal di $$ \left(\begin{array}{c}Z_1 \\ Z_2 \end{array}\right) \sim N\left(\left[\begin{array}{c}0 \\ 0\end{array}\right], \ \left[\begin{array}{cc}1 & \rho \\ \rho & 1\end{array}\right]\right) $$ -Now find $U_1 = \Phi(Z_1)$ and $U_1 = \Phi(Z_1)$, where $\Phi()$ is the standard normal cumulative distribution function (called `pnorm()` in R). +Now find $U_1 = \Phi(Z_1)$ and $U_2 = \Phi(Z_2)$, where $\Phi()$ is the standard normal cumulative distribution function (called `pnorm()` in R). Then generate the counts by evaluating $U_1$ and $U_2$ with the negative binomial quantile function, $F_{NB}^{-1}(x | \mu, p)$ with mean parameters $\mu$ and size parameter $p$ (this function is called `qnbinom()` in R): $$ C_1 = F_{NB}^{-1}(U_1 | \mu_1, p_1) \qquad C_2 = F_{NB}^{-1}(U_2 | \mu_2, p_2). 
diff --git a/072-presentation-of-results.Rmd b/072-presentation-of-results.Rmd index fcf5345..b30e37b 100644 --- a/072-presentation-of-results.Rmd +++ b/072-presentation-of-results.Rmd @@ -38,7 +38,7 @@ Good analysis will provide a clear understanding of how one or more of the simul In multi-factor simulations, the major challenge in analyzing simulation results is dealing with the multiplicity and dimensional nature of the results. For instance, in our cluster RCT simulation, we calculated performance metrics in each of `r prettyNum( nrow(sres) / 3, big.mark=",")` different simulation scenarios, which vary along several factors. For each scenario, we calculated a whole suite of performance measures (bias, SE, RMSE, coverage, ...), and we have these performance measures for each of three estimation methods under consideration. -We organizeed all these results as a table with `r prettyNum( nrow(sres), big.mark=",")` rows (three rows per simulation scenario, with each row corresponding to a specific method) and one column per performance metric. +We organized all these results as a table with `r prettyNum( nrow(sres), big.mark=",")` rows (three rows per simulation scenario, with each row corresponding to a specific method) and one column per performance metric. Navigating all of this can feel somewhat overwhelming. How do we understand trends in this complex, multi-factor data structure? @@ -379,7 +379,7 @@ The $x$-axis shows each of our five methods we are comparing. The boxplots are "holding" the other factors, and show the Type-I error rates for the different small-sample corrections across the covariates tested and degree of model misspecification. We add a line at the target 0.05 rejection rate to ease comparison. The reach of the boxes shows how some methods are more or less vulnerable to different types of misspecification. -Some estimators (e.g., $T^2_A$) are clearly hyper-conservitive, with very low rejection rates. 
+Some estimators (e.g., $T^2_A$) are clearly hyper-conservative, with very low rejection rates. Other methods (e.g., EDF), have a range of very high rejection rates when $m = 10$; the degree of rejection rate must depend on model mis-specification and number of covariates tested (the things in the boxes). @@ -568,8 +568,8 @@ Simulations are designed experiments, often with a full factorial structure. The results are datasets in their own right, just as if we had collected data in the wild. We can therefore leverage classic means for analyzing such full factorial experiments. For example, we can regress a performance measure against our factor levels to get the "main effects" of how the different levels impact performance, holding the other levels constant. -This type of regression is called a "meta regression" (@kleijnen1981regression, @friedman1988metamodel, @gilbert2024multilevel), as we are regressing on already processed results. -It also has ties to meta analysis (see, e.g., @borenstein2021introduction), where we look for trends across sets of experiments. +This type of regression is called a "meta regression" [@kleijnen1981regression; @friedman1988metamodel; @gilbert2024multilevel], as we are regressing on already processed results. +It also has ties to meta analysis [see, e.g., @borenstein2021introduction], where we look for trends across sets of experiments. In the language of a full factor experiment, we might be interested in the "main effects" and the "interaction effects." A main effect is whether, averaging across the other factors in our experiment, a factor of interest systematically impacts performance. @@ -583,14 +583,22 @@ We might expect, for example, that for all methods the true standard error goes Meta-regressions would also typically include interactions between method and factor, to see if some factors impact different methods differently. 
They can also include interactions between simulation factors, which allows us to explore how the impact of a factor can matter more or less, depending on other aspects of the context. +Using meta regression can also account for simulation uncertainty in some contexts, which can be especially important when the number of iterations per scenario is low. +See @gilbert2024multilevel for more on this. -### Example 1: Biserial, revisited -For example, consider the bias of the biserial correlation estimates from above. -Visually, we see that several factors appear to impact bias, but we might want to get a sense of how much. -In particular, how much does the population vs sample cutoff option matter for bias, across all the simulation factors considered? +### Example 1: Biserial, revisited +In the biserial correlation example above, we saw that bias can change notably across scenarios considered, and that several factors appear to be driving these changes. +These factors also seem to have complex interactions: note how when `p1` = 0.5, we get larger dips than when `p1` = 1/8. +The figure gives a sense of this complex, rich story, but we might also want to summarize our results to get a sense of overall trends, so we can provide a simpler story of what is going on. +We also might want to get a sense of the relative importance of various factors and their interactions. +For example, we might ask how much the population (top row) vs. sample (bottom row) cutoff option matters for bias, across all the simulation factors considered. +Is it a primary driver of when there is a lot of bias, or just one of many players of roughly equal import? 
+ ```{r setup_modeling_demonstration, warning=FALSE, include=FALSE} options(scipen = 5) mod = lm( bias ~ fixed + rho + I(rho^2) + p1 + n, data = r_F) @@ -598,6 +606,7 @@ broom::tidy(mod) %>% knitr::kable( digits = c( 0,4,4,1,2 ) ) ``` + -We can use ANOVA to decompose the variation in bias into components predicted by various combinations of the simulation factors. -Using ANOVA we can identify which factors have negligible/minor influence on the bias of an estimator, and which factors drive the variation we see. -We can then summarise our anova table to see the contribution of the various factors and interactions to the total amount of variation in performance: +ANOVA helps answer these sorts of questions. +In particular, with ANOVA, we can decompose how much bias changes across scenarios into components predicted by various combinations of the simulation factors. +We can do this with the `aov()` function in R, which is a wrapper around `lm()` that is designed for ANOVA. +We first fit a model regressing bias on all interactions of our four simulation factors. +In the R formula syntax, our model is `bias ~ rho * p1 * fixed * n`. + +The sum of squares ANOVA decomposition then provides a means for identifying which factors have negligible/minor influence on the bias of an estimator, and which factors drive the variation we see. +For example, the following "eta table" gives the contribution of the various factors and interactions to the total amount of variation in bias across scenarios: ```{r, warning=FALSE, echo=FALSE} anova_table <- aov(bias ~ rho * p1 * fixed * n, data = r_F) @@ -627,7 +641,7 @@ etaSquared(anova_table) %>% knitr::kable( digits = 2 ) ``` -Here we see which factors are explaining the most variation. E.g., `p1` is explaining 21% of the variation in bias across simulations. +The table shows which factors are explaining the most variation. E.g., `p1` is explaining 21% of the variation in bias across simulations. 
The contribution of any of the three- or four-way interactions are fairly minimal, by comparison, and could be dropped to simplify our model. Modeling summarizes overall trends, and ANOVA allows us to identify what factors are relatively more important for explaining variation in our performance measure. @@ -638,10 +652,10 @@ We could fit a regression model or ANOVA model for each performance measure in t @lee2023comparing were interested in evaluating how different modeling approaches perform when analyzing cross-classified data structures. To do this they conducted a multi-factor simulation to compare three methods: a method called CCREM, two-way OLS with cluster-robust variance estimation (CRVE), and two-way fixed effects with CRVE. The simulation was complex, involving several factors, so they fit an ANOVA model to understand which factors had the most influence on performance. -In particular, they ran _four_ multifactor simulations, each in a different set of conditions. +In particular, they ran _four_ multifactor simulations, each under a different broader context (those being assumptions met, homoscedasticity violated, exogeneity violated, and presence of random slopes). They then used ANOVA to explore how the simulation factors impacted bias within each of these contexts. -One of their tables in the supplementary materials (Table S5.2, see [here](https://osf.io/hy73g), page 20, and reproduced below) shows the results of these four ANOVA models, with each column being a simulation context (those being assumptions met, homoscedasticity violated, exogeneity violated, and presence of random slopes), and the rows corresponding to factors manipulated within the simulation. +One of their tables in the supplementary materials (Table S5.2, see [here](https://osf.io/hy73g), page 20, and reproduced below) shows the results of these four ANOVA models, with each column being a simulation context, and the rows corresponding to factors manipulated within that context. 
Small, medium, and large effects are marked to make them jump out to the eye. **ANOVA Results on Parameter Bias** @@ -668,7 +682,7 @@ We see that when model assumptions are met or only homoscedasticity is violated, choice of method (CCREM, two-way OLS-CRVE, FE-CRVE) has almost no impact on parameter bias ($\eta^2 = 0.000$ to 0.006). -However, under an exogeneity violation, method choice has a large effect ($\eta^2 = 0.995$), indicating that some methods (like OLS-CRVE) have much more bias than others. +However, under an exogeneity violation, method choice has a large effect ($\eta^2 = 0.995$), indicating that some methods (e.g., OLS-CRVE) have much more bias than others. Other factors such as the effect size of the parameter and the number of schools can also show moderate-to-large impacts on bias in several conditions. The table also shows how an interaction between simulation factors can matter. @@ -676,20 +690,17 @@ For example, interactions between method and number of schools, or students per Overall, the table shows how some aspects of the DGP matter more, and some less. -Using meta regresion can also account for simulation uncertainty in some contexts, which can be especially important when the number of iterations per scenario is low. -See @gilbert2024multilevel for more on this. ## Reporting -The final form of your report will typically -For your final write-up, you will not want to present everything. -A wall of numbers and observations only serves to pummel the reader. +There is a difference between the results you generate to understand what is going on in your simulation and the results you include in an outward-facing report. +Do not pummel your reader with a deluge of tables, figures, and observations. Instead, present selected results that clearly illustrate the main findings from the study, along with anything unusual or anomalous. 
Your presentation will typically be best served with a few well-chosen figures. Then, in the text of your write-up, you might include a few specific numerical comparisons. Do not include too many of these, and be sure to say why the numerical comparisons you include are important. -To form these final exhibits, you will likely have to generate a wide range of results that show different facets of your simulation. +To form your final exhibits, you will likely have to generate a wide range of results that show different aspects of your simulation. These are for you, and will help you deeply understand what is going on. You then try to simplify the story, in a way that is honest and transparent, by curating this full set of figures to your final ones. Some of the remainder will then become supplementary materials that contain further detail to both enrich your main narrative and demonstrate that you are not hiding anything. @@ -703,3 +714,4 @@ People will naturally think, "if that researcher is so willing to let me see wha + diff --git a/074-building-good-vizualizations.Rmd b/074-building-good-vizualizations.Rmd index f482d6b..206328d 100644 --- a/074-building-good-vizualizations.Rmd +++ b/074-building-good-vizualizations.Rmd @@ -39,7 +39,8 @@ sres <- res %>% RMSE_mcse = rmse_mcse, SE = stddev, SE_mcse = stddev_mcse ) %>% - dplyr::select( -K_relvar ) + dplyr::select( -K_relvar ) %>% + ungroup() sres # 1000 iterations per factor @@ -49,21 +50,20 @@ summary( sres$R ) # Building good visualizations {#building-good-visualization} Visualization should nearly always be the first step in analyzing simulation results. -In the prior chapter, we saw a series of visualizations that showed overall trends across a variety of examples. +In the prior chapter, we saw a variety of examples primarily taken from published work. Those visualizations were not the initial ones created for those research projects. 
-In practice, making a visualization often requires creating a _bunch_ of graphs to look at different aspects of the data. -From that pile of graphs, you would then refine ones that communicate the overall results most cleanly, and include those in your main write-up. +In practice, getting to a good visualization often requires creating _many_ different graphs to look at different aspects of the data. +From that pile of graphs, you would then curate and refine those that communicate the overall results most cleanly. -In our work, we find we often generate a series of R Markdown reports with comprehensive simulation results targeting our various research questions. +In our work, we find we often generate a series of R Markdown reports with comprehensive sets of charts targeting our various research questions. These initial documents are then discussed internally by the research team. -In this chapter we discuss a set of common tools that we frequently use to explore our simulation results. -In particular, we focus on four essential tools: +In this chapter we first discuss four essential tools that we frequently use to make these initial sets of graphs: 1. **Subsetting**: Multifactor simulations can be complex and confusing. Sometimes it is easier to first explore a subset of the simulation results, such as a single factor level. 2. **Many small multiples**: Plot many results in a single plot, with facets to break up the results by simulation factors. -3. **Bundling**: Group the results by a primary factor of interest, and then plotting the performance measure as a boxplot so you can see how much variation there is within that factor level. -4. **Aggregation**: Average the performance measure across some of the simulation factors, so you can see overall trends. +3. **Bundling**: Group the results by a primary factor of interest, and then plot the performance measure as a boxplot so you can see how much variation there is within that factor level. +4. 
**Aggregation**: Average the performance measure across some of the simulation factors, so you can see overall trends with respect to the remaining factors. Subsetting is a very useful tool, especially when the scope of the simulation feels overwhelming. -And as we just saw, it can also be used as a quick validity check: we can subset to a known context where we know nothing exciting should be happening, and then check that indeed nothing is there. +And as we just saw, it can also be used as a quick validity check: subset to a known context where we know nothing exciting should be happening to verify that indeed nothing is there. -Subsetting allows for a deep dive into a specific context. -It also can make it easier to think through what is happening in a complex context. -Sometimes we might even just report a subset in our final analysis. -In this case, we would consider the other levels as a "sensitivity" analysis vaguely alluded to in our main report and placed elsewhere, such as an online supplemental appendix. +Subsetting allows for a deep dive into a specific context. +It also can make it easier to think through what is happening in a complex context; think of it as a flashlight, shining light on one part of your overall simulation or another, to focus attention and reduce complexity. +Sometimes we might even just report the results for a subset in our final analysis and put the analysis of the remaining scenarios elsewhere, such as an online supplemental appendix. +In this case, it would then be our job to verify that our reported findings on the main results indeed were echoed in the set-aside runs. -It would be our job, in this case, to verify that our reported findings on the main results indeed were echoed in our other, set-aside, simulation runs. -In our case, as we see below, we will see little effect of the ICC on how one model performs relative to another; we thus might be able to safely ignore the ICC factor in our main report. 
+ -Subsetting is useful, but if you do want to look at all your simulation results at once, you need to somehow aggregate your results to make them all fit on the plot. -We next present bundling, a way of using the core idea of small multiples for showing all of the raw results, but in a semi-aggregated way. +Subsetting is useful, but if you do want to look at all your simulation results at once, you need to somehow aggregate or group your results to make them all fit on the plot. +We next present bundling, a way of keeping the core idea of small multiples to show all of the raw results, but now in a semi-aggregated way. ## Bundling When faced with many simulation factors, we can _bundle_ the simulations into groups defined by a selected primary factor of interest, and then plot each bundle with a boxplot of the distribution of a selected performance criteria. -Each boxplot shows the central measure of how well an estimator worked across a set of scenarios, along with a sense of how much that performance varied across those scenarios. +Each boxplot shows the central measure of how well an estimator worked across the set of scenarios in the box, along with a sense of how much that performance varied across those scenarios. If the boxes are narrow, then we know that the variation across simulations within the box did not impact performance much. If the boxes are wide, then we know that the factors that vary within the box matter a lot for performance. With bundling, we generally need a good number of simulation runs per scenario, so that the MCSE in the performance measures does not make our boxplots look substantially more variable (wider) than the truth. -(Consider a case where all the scenarios within a box have zero bias; if MCSE were large, we would see a wide boxplot when we should not.) 
+Consider a case where all the scenarios within a box have zero _true_ bias; if the MCSE were large, the _estimated_ biases would still vary and we would see a wide boxplot when we should not. -To illustrate bundling, we group our Cluster RCT results by method, ICC, the size coefficient (how strong the cluster size to treatment impact relationship is), and alpha (how much the cluster sizes vary). -For a specific ICC, size, and alpha, we will put the boxes for the three methods side-by-side to directly compare them: +To illustrate bundling, we replicate our small subset figure from above, but instead of each point (with a given `J`, `alpha`, and `size_coef`) just being the single scenario with `n_bar=80` and `ICC = 0.20`, we plot all the scenarios in a boxplot at that location. +We put the boxes for the three methods side-by-side to directly compare them: ```{r clusterRCT_plot_bias_v1} -ggplot( sres, aes( as.factor(alpha), bias, col=method, group=paste0(ICC,method) ) ) + - facet_grid( size_coef ~ ICC, labeller = label_both ) + - geom_boxplot(coef = Inf) + +ggplot( sres, aes( as.factor(J), bias, col=method, + group=paste0(method, J) ) ) + + facet_grid( size_coef ~ alpha, labeller = label_both ) + + geom_boxplot( coef = Inf, width=0.7, fill="grey" ) + geom_hline( yintercept = 0 ) + theme_minimal() + ``` -Each box is a collection of simulation trials. E.g., for `ICC = 0.6`, `size_coef = 0.2`, and `alpha = 0.8` each of the three boxes contains 9 scenarios representing the varying level 1 and level 2 sample sizes. -Here are the 9 for the Aggregation method: + + +All of our simulation trials are represented in this plot. +Each box is a collection of simulation trials. E.g., for `J = 5`, `size_coef = 0`, and `alpha = 0.8` each of the three boxes contains 15 scenarios representing the varying ICC and cluster size. 
+Here are the 15 results in the top right box for the Aggregation method: ```{r} filter( sres, - ICC == 0.6, - size_coef == 0.2, - alpha == 0.8, method=="Agg" ) %>% - dplyr::select( n_bar:alpha, bias ) %>% + J == 5, + size_coef == 0, + alpha == 0.8, + method=="Agg" ) %>% + dplyr::select( n_bar, J, size_coef, ICC, alpha, bias, bias_mcse ) %>% + arrange( bias ) %>% knitr::kable( digits = 2 ) ``` Our bias boxplot makes some trends clear. -For example, we see that there is virtually no bias for any method when the size coefficient is 0 and the ICC is 0. -It is a bit more unclear, but it seems there is also virtually no bias when the size coefficient is 0 regardless of ICC, but the boxes get wider as ICC increases, making us wonder if something else is potentially going on. -When alpha is 0 and the size coefficient is 0.2, all methods have a negative bias for most scenarios considered, as all boxes and almost all of the whiskers are below the 0 line (when ICC is 0.6 or 0.8 we may have some instances of 0 or positive bias, if that is not MCSE giving long tails). +For example, we see that there is no bias, on average, for any method when the size coefficient is 0 and alpha is 0, especially when $J = 80$. + +When the size coefficient is 0.2, we also see LR jump out from the others when `alpha` is not 0. -The apparent outliers (long tails) for some of the boxplots suggest that the other two factors (cluster size and number of clusters) do relate to the degree of bias. We could try bundling along different aspects to see if that explains these differences: +The apparent outliers (long tails) for some of the boxplots suggest that the two remaining factors (ICC and cluster size) could relate to the degree of bias. +They could also be due to MCSE, and given that we primarily see these tails when $J$ is small, this is a real concern. +MCSE aside, a long tail means that some scenario in the box had a high level of estimated bias. 
+We could try bundling along different aspects to see if either of the remaining factors (e.g., ICC) explains these differences. +Here we try bundling cluster size and number of clusters. ```{r clusterRCT_plot_bias_v2} -ggplot( sres, aes( as.factor(n_bar), bias, col=method, group=paste0(n_bar,method) ) ) + - facet_grid( alpha ~ size_coef, labeller = label_both ) + - geom_boxplot(coef = Inf) + +ggplot( sres, aes( as.factor(alpha), bias, col=method, + group=paste0(method, alpha) ) ) + + facet_grid( size_coef ~ ICC, labeller = label_both ) + + geom_boxplot( coef = Inf, width=0.7, fill="grey" ) + geom_hline( yintercept = 0 ) + - theme_minimal() + theme_minimal() ``` -No progress there; we have long tails suggesting something is allowing for large bias in some contexts. -This could be MCSE, with some of our bias estimates being large due to random chance. -Or it could be some specific combination of factors allows for large bias (e.g., perhaps small sample sizes makes our estimators more vulnerable to bias). +We have some progress now: the long tails appear primarily when the ICC is high, but we also see that MLM has bias when ICC is 0, if alpha is nonzero. + +We know things are more unstable in smaller sample sizes, so the tails could still be MCSE, with some of our bias estimates being large due to random chance. +Or perhaps there is still some specific combination of factors that allows for large bias (e.g., perhaps small sample sizes make our estimators more vulnerable to bias). In an actual analysis, we would make a note to investigate these anomalies later on. -In general, playing around with factors so that the boxes are generally narrow is a good idea; it means that you have found a representation of the data where the variation within your bundles is less important. 
+In general, trying to group your simulation scenarios so that their boxes are generally narrow is a good idea; narrow boxes mean that you have found a representation of the data where you know what is driving the variation in your performance measure, and that the factors bundled inside the boxes are less important. This might not always be possible, if all your factors matter; in this case the width of your boxes tells you to what extent the bundled factors matter relative to the factors explicitly present in your plot.
+One might wonder, with only a few trials per box, whether we should instead look at the individual scenarios.
+Unfortunately, that gets a bit cluttered:
+
+```{r}
+ggplot( sres, aes( as.factor(alpha), bias, col=method,
+                   group=paste0(alpha,ICC,method) ) ) +
+  facet_grid( size_coef ~ ICC, labeller = label_both ) +
+  geom_point( size = 0.5,
+              position = position_dodge(width=0.7 ) ) +
+  geom_hline( yintercept = 0 ) +
+  theme_minimal()
+```
+
+Using boxplots, even over so few points, notably clarifies a visualization.

## Aggregation

Boxplots can make seeing trends more difficult, as the eye is drawn to the boxes and tails, and the range of your plot axes can be large due to needing to accommodate the full tails and outliers of your results; this can compress the mean differences between groups, making them look small.
+These ranges can also be artificially inflated, especially if the MCSEs are large.

Instead of bundling, we can therefore aggregate, where we average all the scenarios within a box to get a single number of average performance.
This will show us overall trends rather than individual simulation variation.

@@ -249,33 +290,25 @@ Our conclusions would then be more general: if we had not explored more scenario

That said, if some of our scenarios had no bias, and some had large bias, when we aggregated we would report that there is generally a moderate amount of bias.
This would not be entirely faithful to the actual results.
-If, however, the initial boxplots show results generally in one direction or another, then aggregation will be more faithful to the spirit of the results.
-
-Also, aggregated results can be misleading if you have scaling issues or extreme outliers.
-With bias, our scale is fairly well set, so we are good.
-But if we were aggregating standard errors over different sample sizes, then the larger standard errors of the smaller sample size simulations (and the greater variability in estimating those standard errors) would swamp the standard errors of the larger sample sizes.
-Usually, with aggregation, we want to average over something we believe does not change massively over the marginalized-out factors.
-To achieve this, we can often average over a relative measure (such as standard error divided by the standard error of some baseline method), which tend to be more invariant and comparable across scenarios.
+But when the initial boxplots show results generally in one direction or another, then aggregation can be quite faithful to the spirit of the results.

A major advantage of aggregation over the bundling approach is that we can have fewer replications per scenario.
If the number of replicates within each scenario is small, then the performance measures for each scenario are estimated with a lot of error; the aggregate, by contrast, will be an average across many more replicates and thus give a good sense of _average_ performance.
The averaging, in effect, gives a lot more replications per aggregated performance measure.
- For our cluster RCT, we might aggregate our bias across our sample sizes as follows: ```{r} ssres <- sres %>% - group_by( method, ICC, alpha, size_coef ) %>% - summarise( bias = mean( bias ), - n = n() ) + group_by( method, size_coef, J, alpha ) %>% + summarise( bias = mean( bias ) ) ``` -We now have a single bias estimate for each combination of ICC, alpha, and size_coef; we have collapsed 9 scenarios into one overall scenario that generalizes bias across different sizes of experiment. +We now have a single bias estimate for each combination of size_coef, J, and alpha; we have collapsed 15 scenarios into one overall scenario that generalizes bias across different average cluster sizes and different ICCs. We can then plot, using many small multiples: ```{r agg_bias_plot_clusterRCT} -ggplot( ssres, aes( ICC, bias, col=method ) ) + +ggplot( ssres, aes( as.factor(J), bias, col=method, group=method ) ) + facet_grid( size_coef ~ alpha, labeller = label_both ) + geom_point( alpha=0.75 ) + geom_line( alpha=0.75 ) + @@ -283,14 +316,33 @@ ggplot( ssres, aes( ICC, bias, col=method ) ) + theme_minimal() ``` -We see more clearly that greater variation in cluster size (alpha) leads to greater bias for the linear regression estimator, but only if the coefficient for size is nonzero (which makes sense given our theoretical understanding of the problem---if size is not related to treatment effect, it is hard to imagine how varying cluster sizes would cause much bias). +We now see quite clearly that as `alpha` grows, linear regression gets more biased if cluster size relates to average impact in the cluster (`size_coef`). +Our finding makes sense given our theoretical understanding of the problem---if size is not related to treatment effect, it is hard to imagine how varying cluster sizes would cause much bias. 
+ We are looking at an interaction between our simulation factors: we only see bias for linear regression when cluster size relates to impact and there is variation in cluster size. -As ICC increases, we are not seeing any major differences in the pattern of our results -We also see that all the estimators have near zero bias when there is no variation in cluster size, with the overplotted lines on the top row of the figure. +We also see that all the estimators have near zero bias when there is no variation in cluster size or the cluster size does not relate to outcome, as shown by the top row and left column facets. +Finally, we see the methods all likely give the same answers when there is no cluster size variation, given the overplotted lines on the left column of the figure. + +We might take this figure as still too complex. +So far we have learned that MLM does seem to react to ICC, and that LR reacts to `alpha` and `size_coef` in combination. +More broadly, with many levels of a factor, as we have with ICC, we can let ggplot aggregate directly by taking advantage of `geom_smooth()`. +This leads to the following: + +```{r, echo=FALSE, message=FALSE} +ggplot( sres, aes( ICC, bias, col=as.factor(alpha), + group=interaction(method,alpha) ) ) + + facet_grid( size_coef ~ method, labeller = label_both ) + + geom_smooth( alpha=0.75, se=FALSE, method="loess", span=1.5 ) + + geom_hline( yintercept = 0 ) + + theme_minimal() +``` + +Our story is fairly clear now: LR is biased when alpha is large and the cluster size relates to impact. +MLM can be biased when ICC is low, if cluster size relates to impact (this is because it is driving towards person-weighting when there is little cluster variation). 
-If you have many levels of a factor, as we do with ICC, you can let ggplot aggregate directly by taking advantage of the smoothing options:
-```{r, message=FALSE, warning=FALSE}
+
+
+Aggregation is powerful, but it can be misleading if you have scaling issues or extreme outliers.
+With bias, our scale is fairly well set, so we are good.
+But if we were aggregating standard errors over different sample sizes, then the larger standard errors of the smaller sample size simulations (and the greater variability in estimating those standard errors) would swamp the standard errors of the larger sample sizes.
+Usually, with aggregation, we want to average over something we believe does not change massively over the marginalized-out factors.
+To achieve this, we can often average over a relative measure (such as standard error divided by the standard error of some baseline method), which tends to be more invariant and comparable across scenarios.
+We will see more examples of this kind of aggregation later on.

-#### A note on how to aggregate
+#### Some notes on how to aggregate

Some performance measures are biased with respect to the Monte Carlo uncertainty.
The estimated standard error, for example, is biased; the variance, by contrast, is not.

@@ -318,20 +379,26 @@ agg_perf <- sres %>%
  summarise( SE = sqrt( mean( SE^2 ) ) )
```

-Because bias is linear, you do not need to worry about the bias of the standard error.
+Because bias is linear, you do not need to worry about the MCSE.
But if you are looking at the magnitude of bias ($|bias|$), then you can run into issues when the biases are close to zero, if they are measured noisily.
-In this case, looking at average bias, not average $|bias|$, is safer.
+For example, imagine you have two scenarios with true bias of 0.0, but your MCSE is 0.02.
+In one scenario, you estimate a bias of 0.017, and in the other -0.023.
+If you average the estimated biases, you get -0.003, which suggests a small bias as we would wish.
+Averaging the absolute biases, on the other hand, gives you 0.02, which could be deceptive. +With high MCSE and small magnitudes of bias, looking at average bias, not average $|bias|$, is safer. +Alternatively, you can use the formula $RMSE^2 = Bias^2 + SE^2$ to back out the average absolute bias from the RMSE and SE. -## Assessing true SEs -We just did a deep dive into bias. +## Comparing true SEs with standardization + +We just did a deep dive into bias. Uncertainty (standard errors) is another primary performance criterion of interest. As an initial exploration, we plot the standard error estimates from our Cluster RCT simulation, using smoothed lines to visualize trends. We use `ggplot`'s `geom_smooth` to aggregate over `size_coef` and `alpha`, which we leave out of the plot. We include individual data points to visualize variation around the smoothed estimates: -```{r} +```{r, message=FALSE} ggplot( sres, aes( ICC, SE, col=method ) ) + facet_grid( n_bar ~ J, labeller = label_both ) + geom_jitter( height = 0, width = 0.05, alpha=0.5 ) + @@ -350,8 +417,8 @@ While we can extract all of these from the figure, the figure is still not ideal The dominant influence of design features like ICC and sample size obscures our ability to detect meaningful differences between methods. In other words, even though SE changes across scenarios, it’s difficult to tell which method is actually performing better within each scenario. -We can also view the same information using boxplots, effectively "bundling" over the left-out dimensions. -We also put `n_bar` in our bundles because maybe it does not matter that much: +We can also view the same information by bundling over the left-out dimensions. +We put `n_bar` in our bundles because maybe it does not matter that much: ```{r} ggplot( sres, aes( ICC, SE, col=method, group=paste0( ICC, method ) ) ) + facet_grid( . 
~ J, labeller = label_both ) + @@ -373,13 +440,10 @@ we want to conclude that: Simulation results are often driven by broad design effects, which can obscure the specific methodological questions we care about. Standardizing helps bring those comparisons to the forefront. Let's try that next. -#### Standardizing to compare across simulation scenarios ##### - One straightforward strategy for standardization is to compare each method’s performance to a designated baseline. In this example, we use Linear Regression (LR) as our baseline. -We focus on the standard error (SE) of each method’s estimate, rescaling it relative to LR. -We do this by, for each simulation scenario, dividing each method’s SE by the SE of LR, to produce `SE.scale`. -This relative measure, SE.scale, allows us to examine how much better or worse each method performs in terms of precision under varying conditions. +We standardize by, for each simulation scenario, dividing each method’s SE by the SE of LR, to produce `SE.scale`. +This relative measure, `SE.scale`, allows us to examine how much better or worse, across our scenarios, each method performs relative to a chosen reference method. ```{r} ssres <- @@ -389,8 +453,8 @@ ssres <- ungroup() ``` -We can then treat it as a measure like any other. -Here we bundle: +We can then treat `SE.scale` as a measure like any other. +Here we bundle, showing how relative SE changes by J, `n_bar` and ICC: ```{r} ggplot( ssres, aes( ICC, SE.scale, col=method, @@ -400,77 +464,71 @@ ggplot( ssres, aes( ICC, SE.scale, col=method, scale_y_continuous( labels = scales::percent_format() ) ``` -The figure above shows how each method compares to LR across simulation scenarios. We see that Aggregation performs worse than LR when the Intraclass Correlation Coefficient (ICC) is zero. However, when ICC is greater than zero, Aggregation yields improved precision. -The Multilevel Model (MLM), in contrast, appears more adaptive. 
It captures the benefits of aggregation when ICC is high, but avoids the precision cost when ICC is zero. This adaptivity makes MLM appealing in practice when ICC is unknown or variable across contexts. +The figure above shows how each method compares to LR across simulation scenarios. +Aggregation clearly performs worse than LR when the Intraclass Correlation Coefficient (ICC) is zero. However, when ICC is greater than zero, Aggregation yields improved precision. +The Multilevel Model (MLM), in contrast, appears more adaptive. +It captures the benefits of aggregation when ICC is high, but avoids the precision cost when ICC is zero. +This adaptivity makes MLM appealing in practice when ICC is unknown or variable across contexts. -Although faceting by n_bar and J helps reveal potential interaction effects, it may be more effective to collapse across these variables for a cleaner summary. -We are also not seeing how site size variation is impacting these results, and we might think that matters, especially for aggregation. - -As a warning regarding Monte Carlo uncertainty: when standardizing results, it is important to remember that uncertainty in the baseline measure (here, LR) propagates to the standardized values. This should be considered when interpreting variability in the scaled results. -Uncertaintly for relative performance is generally tricky to assess. - -To clarify the main patterns, we average our SE.scale across simulation settings---relative performance is on the same scale, so averaging is a natural thing to do. 
-
-```{r}
-s2 <-
-  ssres %>%
-  group_by( ICC, alpha, method ) %>%
-  summarise( SE.scale = mean( SE.scale ) )
-
-ggplot( s2, aes( ICC, SE.scale, col=method ) ) +
-  facet_wrap( ~ alpha ) +
-  geom_point() + geom_line() +
-  scale_y_continuous( labels = scales::percent_format() ) +
-  labs( title = "Average relative SE to Linear Regression",
-        y = "Relative Standard Error" )
-```
-
-Our aggregated plot of precision of aggregation and MLM relative to Linear Regression gives a simple story clearly told.
-The performance of aggregation improves with ICC.
-MLM also has benefits over LR, and does not pay much cost when ICC is low.
-
-We can also visualize the variability in relative standard errors across simulation scenarios using boxplots.
-This allows us to examine how consistent each method’s performance is under different ICC conditions.
+In looking at the plot we see essentially identical rows and fairly similar columns.
+This suggests we should bundle over `n_bar` to get a cleaner view of the main patterns, and that we can also bundle over `J` as well.
+We finally drop the `LR` results entirely, as it is the reference method and always has a relative SE of 1.

```{r}
ssres %>%
  filter( method != "LR" ) %>%
  ggplot( aes( ICC, SE.scale, col=method,
               group = interaction(ICC, method) ) ) +
-  facet_wrap( ~ alpha, nrow=1) +
+  facet_grid( size_coef ~ alpha, labeller = label_both ) +
  geom_hline( yintercept = 1 ) +
  geom_boxplot( position="dodge", width=0.1 ) +
+  scale_x_continuous( breaks=unique( ssres$ICC ) ) +
  scale_y_continuous( breaks=seq(90, 125, by=5 ) )
```

-These boxplots show the full distribution of relative standard errors across all simulated scenarios, separated by ICC level. We exclude LR as it is the reference method.
-
The pattern is clear: when ICC = 0, Aggregation performs worse than LR, and MLM performs about the same.
But as ICC increases, Aggregation and MLM both improve, and perform about the same as each other.
This highlights the robustness of MLM across diverse conditions. -We might also explore how uncertainty changes with other factors. -Here, we see whether cluster size meaningfully helps: + +As a warning regarding Monte Carlo uncertainty: when standardizing results, it is important to remember that uncertainty in the baseline measure (here, LR) propagates to the standardized values. +This should be considered when interpreting variability in the scaled results. +Uncertainty for relative performance is generally tricky to assess. + + +To clarify the main patterns, we then aggregate our SE.scale across the bundled simulation settings---relative performance is on the same scale, so averaging is now a natural thing to do. +We have aggregated out sample sizes, and we go further and remove `size_coef` since it does not seem to matter much, given the above plot: ```{r} -sres %>% - filter( alpha == 0.8, size_coef == 0.2 ) %>% -ggplot( aes( n_bar, SE, col=factor(ICC), group=ICC ) ) + - facet_grid( J ~ method, labeller = label_both, scales = "free") + - geom_point() + geom_line() +s2 <- + ssres %>% + group_by( ICC, alpha, method ) %>% + summarise( SE.scale = mean( SE.scale ) ) %>% + filter( method != "LR" ) + +ggplot( s2, aes( ICC, SE.scale, col=method ) ) + + facet_wrap( ~ alpha, labeller = label_both ) + + geom_hline( yintercept = 1 ) + + geom_point() + geom_line() + + scale_y_continuous( labels = scales::percent_format() ) + + labs( title = "Average relative SE to Linear Regression", + y = "Relative Standard Error" ) ``` -If the ICC is low, cluster size matters. Otherwise, the benefits are much more slim. +Our aggregated plot of the precision of aggregation and MLM relative to Linear Regression gives a simple story clearly told. +The performance of aggregation improves with ICC. +MLM also has benefits over LR, and does not pay much cost when ICC is low. 
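The earlier warning about Monte Carlo uncertainty of standardized measures can be made concrete: for a ratio of two noisy quantities, a first-order (delta method) approximation gives

$$
\text{Var}\!\left( \frac{X}{Y} \right) \approx \left( \frac{\mu_X}{\mu_Y} \right)^2 \left( \frac{\text{Var}(X)}{\mu_X^2} + \frac{\text{Var}(Y)}{\mu_Y^2} - \frac{2\,\text{Cov}(X, Y)}{\mu_X \mu_Y} \right),
$$

so the MCSE of a relative measure such as `SE.scale` involves the Monte Carlo variances of both the method's SE and the baseline's SE, plus their covariance (both are estimated from the same simulated scenario, so they are typically correlated). This standard approximation is offered here as a sketch of why the propagation is tricky, not as a prescribed calculation.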
## The Bias-SE-RMSE plot

-We can also visualize bias and standard error together, along with RMSE, to get a fuller picture of performance.
-To illustrate, we subset to our biggest scenarios, in terms of sample size, and no ICC:
+We can visualize bias and standard error together, along with RMSE, to get a rich picture of performance.
+To illustrate, we subset to our scenarios where there is real bias for both LR and MLM (i.e., when ICC is 0; see findings under bias from above).
+We also subset to our middle value of `n_bar = 80` and our large `J=80`, where uncertainty is small and thus the relative role of bias may be large.

```{r}
bsr <- sres %>%
-  filter( n_bar == 320, J==80, ICC == 0 )
+  filter( n_bar == 80, J==80, ICC == 0 )

bsr <- bsr %>%
  dplyr::select( -R, -power, -ESE_hat, -SD_SE_hat ) %>%
@@ -481,10 +539,12 @@ bsr <- bsr %>%
  summarise( value = mean( value ),
             n = n() )

-bsr$measure = factor( bsr$measure, levels=c("bias", "SE", "RMSE"),
-                      labels =c("|bias|", "SE", "RMSE" ) )
+bsr$measure = factor( bsr$measure,
+                      levels=c("bias", "SE", "RMSE"),
+                      labels =c("bias", "SE", "RMSE" ) )

-ggplot( bsr, aes( alpha, value, col=method )) +
+ggplot( bsr, aes( as.factor(alpha), value, col=method,
+                  group = method )) +
  facet_grid( size_coef ~ measure ) +
  geom_line() + geom_point() +
  labs( y = "", x = "Site Variation" ) +
@@ -497,31 +557,34 @@ ggplot( bsr, aes( alpha, value, col=method )) +
```

The combination of bias, standard error, and RMSE provides a rich and informative view of estimator performance.
+The top row represents settings where effect size is independent of cluster size, while the bottom row reflects a correlation between size and effect.
+We see how bias, SE and RMSE grow as site variation increases (moving rightward in each panel).
+Notably, when effect size is related to cluster size (bottom row), both linear regression and MLM exhibit significant bias, leading to a notable increase in RMSE over SE.
+In contrast, when effect size is unrelated to cluster size (top row), all methods show minimal bias, and the SEs are about the same; that said, we see aggregation paying a penalty as variation in cluster size increases.
+Overall, we see RMSE is primarily driven by SE.

-As an illustration, in the above plot, we focus on a specific simulation scenario with n_bar = 320, J = 80, and ICC = 0. The top row represents settings where effect size is independent of cluster size, while the bottom row reflects a correlation between size and effect.
-
-These types of visualizations directly illustrates the canonical relationship:
+The Bias-SE-RMSE visualization directly illustrates the canonical relationship:

$$
\text{RMSE}^2 = \text{Bias}^2 + \text{SE}^2
$$

-In the plot we get overall performance (RMSE) clearly decomposed into into its two fundamental components: systematic error (bias) and variability (standard error).
+The plot shows overall performance (RMSE) decomposed into its two fundamental components: systematic error (bias) and variability (standard error).
Here we see how bias for LR, for example, is dominant when site variation is high.
-The differences in SE are small and so not the main reason for differences in overall estimator performance; bias is the main driver.
+The differences in SE across methods are small and are thus not the main reason for differences in overall estimator performance; bias is the main driver.
This is the kind of diagnostic plot we often wish were included in more applied simulation studies.

-## Assessing estimated SEs
+## Assessing the quality of the estimated SEs

So far we have examined the performance of our _point estimators_.
We next look at ways to assess our _estimated_ standard errors.
A good first question is whether they are about the right size, on average, across all the scenarios.
-Here it is very important to see if they are _reliably_ the right size, so the bundling method is an especially important tool here.
+When assessing estimated standard errors, it is very important to see if they are _reliably_ the right size, making the bundling method an especially important tool here.

We first see if the average estimated SE, relative to the true SE, is usually around 1 across all scenarios:

-```{r}
+```{r, out.width="100%"}
sres <- sres %>%
  mutate( inflate = ESE_hat / SE )
@@ -529,27 +592,30 @@ ggplot( sres, aes( ICC, inflate, col=method,
                   group = interaction(ICC,method) ) ) +
  facet_grid( . ~ J, labeller = label_both) +
-  geom_boxplot( position="dodge" ) +
+  geom_boxplot( position="dodge", outlier.size=0.5 ) +
  geom_hline( yintercept=1 ) +
-  labs( color="n" ) +
+  labs( color="n", y = "Inflation" ) +
  scale_y_continuous( labels = scales::percent_format() )
```

-We see that our estimated SEs are about right, on average, across all scenarios.
-When ICC is 0 and J is small, the MLM SEs are a bit too high.
-When J is 5, the LR estimator can be a bit low under some circumstances.
-We can start exploring these trends to dig into why our are wide (suggesting that other factors dictate when the SEs are biased).
+We see that, for the most part, our estimated SEs are about right, on average, across all scenarios.
+When the ICC is 0 and J is small, the MLM SEs are clearly too high.
+We also see that when J is 5, the LR estimator tends to be a bit low.

-We can look at the $J = 80$ to see what MCSEs are like.
-The `simhelpers` `calc_relative_var()` method gives mcses for relative bias.
+We next start exploring to dig into why our boxplots are wide.
+In particular, we want to see if other factors dictate when the SEs are biased.
+We first subset to the $J = 80$ scenarios to see if those box widths could just be due to the MCSEs.
+The `simhelpers` `calc_relative_var()` method gives MCSEs for the relative bias of an estimated _variance_ to the true _variance_.
+We thus square our estimated SEs to get variance estimates, and then use that function to see if the relative variance estimates are biased:

```{r}
se_res <- res %>%
  group_by( n_bar, J, ATE, size_coef, ICC, alpha, method ) %>%
  summarize( calc_relative_var( estimates = ATE_hat,
-                                var_estimates = SE_hat^2,
-                                criteria = "relative bias" ) )
+                               var_estimates = SE_hat^2,
+                               criteria = "relative bias" ) )
+
se_res %>%
  filter( J == 80, n_bar == 80 ) %>%
  ggplot( aes( ICC, rel_bias_var, col=method ) ) +
@@ -563,12 +629,15 @@ se_res %>%
                 width = 0 )
```

-In looking at this plot, we see no real evidence of miscalibration.
-This makes us think the boxes in the prior plot are wide due to MCSE rather than other simulation factors driving some slight miscalibration in some scenarios when $J$ is high.
+In looking at this plot, we see no real evidence of miscalibration: our confidence intervals are generally covering 1, meaning our average estimated variance is about the same as the true variance.
+This makes us think the boxes for $J=80$ in the prior plot are wide due to MCSE rather than other simulation factors driving some slight miscalibration.
We might then assume this applies to the $J = 20$ case as well.

-Finally, we can look at how stable the estimated SEs are, relative to the actual uncertainty.
-We calculate the standard deviation of the estimated standard errors and compare that to the standard deviation of the point estimate.
+### Stability of estimated SEs
+
+We can also look at how stable the estimated SEs are, relative to the actual uncertainty they are trying to capture.
+We do this by calculating the standard deviation of the estimated standard errors and comparing it to the standard deviation of the point estimate.
+This is related to the coefficient of variation of `SE_hat`.

```{r}
sres <- mutate( sres,
@@ -580,12 +649,14 @@ ggplot( sres,
  facet_grid( .
~ J, labeller = label_both) +
  geom_boxplot( position="dodge" ) +
  labs( color="n" ) +
-  scale_y_continuous( labels = scales::percent_format() )
+  scale_y_continuous( labels = scales::percent_format() ) +
+  scale_x_continuous( breaks = unique( sres$ICC ) )
```

-It looks like MLM has more reliably estimated SEs than other methods when ICC is small.
-Aggregation has more trouble estimating uncertainty when J is small.
-Finally, LR's SEs are generally more unstable, relative to its performance, when $J$ is larger.
+Overall, we have a lot of variation in the estimated SEs, relative to the actual uncertainty.
+We also see that MLM has more reliably estimated SEs than other methods when ICC is small.
+Aggregation has relatively more trouble estimating uncertainty when J is small.
+Finally, LR's SEs are slightly more unstable, relative to the other methods, when $J$ is larger.

Assessing the stability of standard errors is usually deep in the weeds of a performance evaluation.
It is a tricky measure: if the true SE is high for a method, then the relative instability will be lower, even if the absolute instability is the same.

@@ -595,9 +666,9 @@ People often look at confidence interval coverage and confidence interval width,

## Assessing confidence intervals

-Coverage is a blend of how accurate our estimates are and how good our estimated SEs are.
+Coverage is a blend of how accurate (unbiased) our estimates are and how good our estimated SEs are.
To assess coverage, we first calculate confidence intervals using the estimated effect, estimated standard error, and degrees of freedom.
-Once we have our calculated t-based intervals, we can average them across runs to get average width and coverage using `simhelpers`.
+Once we have our calculated $t$-based intervals, we can average them across runs to get average width and coverage using `simhelpers`'s `calc_coverage()` method.
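For reference, coverage and its Monte Carlo uncertainty have simple standard forms (stated here as a reminder, not as code from the text): with $R$ replicates, true parameter $\theta$, and interval $[L_r, U_r]$ in replicate $r$,

$$
\widehat{\text{coverage}} = \frac{1}{R} \sum_{r=1}^R \mathbb{1}\{ L_r \le \theta \le U_r \}, \qquad \text{MCSE}\left( \widehat{\text{coverage}} \right) = \sqrt{ \frac{\hat{p}(1 - \hat{p})}{R} },
$$

where $\hat{p}$ is the estimated coverage itself. Coverage is a binomial proportion, so this binomial-proportion MCSE is the natural measure of its Monte Carlo uncertainty.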
A good confidence interval estimator would be one which is generally relatively short while maintaining proper coverage.
Our calculations are as follows:

@@ -626,16 +697,17 @@ c_sub <- covres %>%
ggplot( c_sub, aes( ICC, coverage, col=method, group=method ) ) +
  facet_grid( . ~ J, labeller = label_both ) +
-  geom_line() +
-  geom_point() +
+  geom_line( position = position_dodge( width=0.05)) +
+  geom_point( position = position_dodge( width=0.05) ) +
  geom_errorbar( aes( ymax = coverage + 2*coverage_mcse,
-                      ymin = coverage - 2*coverage_mcse ), width=0 ) +
+                     ymin = coverage - 2*coverage_mcse ), width=0,
+                position = position_dodge( width=0.05) ) +
  geom_hline( yintercept = 0.95 )
```

Generally coverage is good unless $J$ is low or ICC is 0.
-Monte Carlo standard errors indicate that, in some settings, the observed coverage is reliably different from the nominal 95%, suggesting issues with estimator bias, standard error estimation, or both.
-We might want to see if these results are general across other settings (see exercises).
+Confidence intervals based on the Monte Carlo standard errors of our performance metrics indicate that, in some settings, the observed coverage is reliably different from the nominal 95%, suggesting issues with estimator bias, standard error estimation, or both.
+We might then want to see if these results are general across the other simulation scenarios (see exercises).

For confidence interval width, we can calculate the average width relative to the width of LR across all scenarios:

@@ -663,6 +735,9 @@ ggplot( c_agg, aes( ICC, width_rel, col=method, group=method ) ) +

Confidence interval width serves as a proxy for precision. Narrow intervals suggest more precise estimates.
We see MLM has wider intervals, relative to LR, when ICC is low. When there is site variation, both Agg and MLM have shorter intervals.
+This plot essentially echoes our standard error findings, as expected.
+There are mild differences due to differences in how the degrees of freedom are calculated, however. + @@ -694,6 +769,10 @@ In Section \@ref(using-pmap-to-run-multifactor-simulations), we generated result Write a brief explanation of how the plot is laid out and explain why you chose to construct it as you did. +### Making another plot for assessing SEs + +In the main chapter we examined how SE changes as a function of various simulation factors. +Now generate a plot to see whether and when cluster size meaningfully helps precision, and explain what you find. diff --git a/075-special-topics-on-reporting.Rmd b/075-special-topics-on-reporting.Rmd index ee54c53..adfc1a6 100644 --- a/075-special-topics-on-reporting.Rmd +++ b/075-special-topics-on-reporting.Rmd @@ -8,10 +8,11 @@ editor_options: ```{r setup_exp_design_analysis, include=FALSE} library( tidyverse ) library( purrr ) +library( broom ) options(list(dplyr.summarise.inform = FALSE)) theme_set( theme_classic() ) - +source( "code/create_analysis_tree.R" ) ### Code for one of the running examples source( "case_study_code/clustered_data_simulation.R" ) @@ -34,7 +35,7 @@ sres <- ) sres -# 100 iterations per factor +# 1000 iterations per factor summary( sres$R ) ``` @@ -47,16 +48,17 @@ We then dive more deeply into what to do when you have only a few iterations per ## Using regression to analyze simulation results -In Chapter \@ref(presentation-of-results) we saw some examples of using regression and ANOVA to analyze simulation results. -We next provide some further in-depth examples that give the code for doing this sort of thing. +In Chapter \@ref(presentation-of-results) we saw some examples of using regression and ANOVA on a set of simulation results to summarize overall patterns across scenarios. +In this chapter we will provide some further in-depth examples along with the R code for doing this sort of thing. 
### Example 1: Biserial, revisited

-We first give the code that produced the final ANOVA summary table for the biserial correlation example in Chapter \@ref(presentation-of-results).
-In the visualization there, we saw that several factors appeared to impact bias, but we might want to get a sense of how much.
-Under modeling of that same chapter, we saw a table that partialed out the variance across several factors so we could see which simulation factors mattered most for bias.
+As our first in-depth example, we walk through the analysis that produces the final ANOVA summary table for the biserial correlation example in Chapter \@ref(presentation-of-results).
+In the visualization there, we saw that several factors appeared to impact bias.
+In the eta table presented later in that same chapter, we saw the variance decomposed across several factors so we could see which simulation factors mattered most for bias.

-To build that table, we first fit a regression model to see:
+To build that table, we first fit a regression model, regressing bias on all the simulation factors.
+Before fitting, we convert each factor to a factor variable, so that R does not assume a continuous relationship.

```{r, include=FALSE}
load("data/d2r results.rData")
@@ -65,10 +67,10 @@ allResults <-
allResults %>%
  mutate(
    n = ordered(n),
-    p_inv = p1,
    p1 = factor(p1, levels = c(2:5,8)) |> fct_relabel(\(x) paste0("p1 = 1/", x)),
-    fixed = factor(fixed, levels = c(TRUE,FALSE), c("Fixed percentiles","Sample percentiles"))
+    fixed = factor(fixed, levels = c(TRUE,FALSE),
+                   c("Fixed percentiles","Sample percentiles"))
  )

r_F <-
@@ -76,10 +78,10 @@ r_F <-
  filter(stat=="r.i" & design=="Extreme Group") %>%
  droplevels() %>%
  mutate(
-    fixed = fct_recode(fixed, "Pop. cut-off" = "Fixed percentiles", "Sample cut-off" = "Sample percentiles"),
-    bias = mean - rho,
-    bias.sm = mean.sm - rho,
-    rmse = sqrt(bias^2 + var)
+    fixed = fct_recode(fixed,
+                       "Pop. 
cut-off" = "Fixed percentiles", + "Sample cut-off" = "Sample percentiles"), + bias = mean - rho ) ``` @@ -89,25 +91,28 @@ mod = lm( bias ~ fixed + rho + I(rho^2) + p1 + n, data = r_F) summary(mod, digits=2) ``` -The above printout gives main effects for each factor, averaged across other factors. -Because `p1` and `n` are ordered factors, the `lm()` command automatically generates linear, quadradic, cubic and fourth order contrasts for them. +The above printout gives main effects for each factor, averaged across the others. +Because `p1` and `n` are ordered factors, the `lm()` command automatically generates linear, quadratic, cubic and fourth order contrasts for them. We smooth our `rho` factor, which has many levels of a continuous measure, with a quadratic curve. We could instead use splines or some local linear regression if we were worried about model fit for a complex relationship. The main effects are summaries of trends across contexts. For example, averaged across the other contexts, the "sample cutoff" condition is around 0.004 lower than the population (the baseline condition). -We can also use ANOVA to get a sense of the major sources of variation in the simulation results (e.g., identifying which factors have negligible/minor influence on the bias of an estimator). +As shown in Chapter \@ref(presentation-of-results), we can also use ANOVA to get a sense of the major sources of variation in the simulation results (e.g., identifying which factors have negligible/minor influence on the bias of an estimator). To do this, we use `aov()` to fit an analysis of variance model: ```{r anova_example, warning=FALSE} anova_table <- aov(bias ~ rho * p1 * fixed * n, data = r_F) -summary(anova_table) +knitr::kable( summary(anova_table)[[1]], + digits = c(0,4,4,1,5) ) ``` The advantage here is the multiple levels of our categorical factors get bundled together in our table of results, making a tidier display. 
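The eta-squared values behind such a variance decomposition are, at their core, each term's sum of squares taken as a share of the total sum of squares. A minimal sketch of that arithmetic, written in Python as a self-contained illustration with made-up sums of squares (the term names and values are hypothetical, not from the actual simulation):

```python
# Toy eta-squared decomposition: each term's share of the total
# sum of squares. All sums of squares below are made-up numbers.
ss = {
    "rho": 0.020,
    "p1": 0.010,
    "fixed": 0.008,
    "rho:p1": 0.004,
    "Residuals": 0.006,
}
total = sum(ss.values())
eta_sq = {term: v / total for term, v in ss.items()}

# Shares sum to one by construction; the largest share flags the
# most important simulation factor.
print(max(eta_sq, key=eta_sq.get))
```

The actual `lsr::etaSquared()` call computes its sums of squares from the fitted ANOVA, so the numbers will differ, but the share-of-total logic is the same.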
+Note we are including interactions between our simulation factors. +The prior linear regression model was just estimating main effects of the factors, and not estimating these more complex relationships. -The table in Chapter \@ref(presentation-of-results) is a summary of this anova table, which we generate as follows: +The eta table in Chapter \@ref(presentation-of-results) is a summary of this anova table, which we generate as follows: ```{r, warning=FALSE, eval=FALSE} library(lsr) @@ -117,25 +122,23 @@ etaSquared(anova_table) %>% mutate( order = 1 + str_count(source, ":" ) ) %>% group_by( order ) %>% arrange( -eta.sq, .by_group = TRUE ) %>% - relocate( order ) %>% - knitr::kable( digits = 2 ) + relocate( order ) ``` We group the results by the order of the interaction, so that we can see the main effects first, then two-way interactions, and so on. We then sort within each group to put the high importance factors first. -The resulting variance decomposition table (see Chapter \@ref(presentation-of-results)) shows the amount of variation explained by each combination of factors. +The resulting variance decomposition table shows the amount of variation explained by each combination of factors. ### Example 2: Cluster RCT example, revisited -When we have several methods to compare, we can also use meta-regression to understand how these methods change as other simulation factors change. -We next continue our running Cluster RCT example. +When we have several methods to compare, we can use meta-regression to understand how these methods change as other simulation factors change. +We next illustrate this with our running Cluster RCT example. 
We first turn our simulation levels (except for ICC, which has several levels) into factors, so R does not assume that sample size, for example, should be treated as a continuous variable:

```{r}
-
sres_f <-
  sres %>%
  mutate(
@@ -148,11 +151,12 @@ M <- lm( bias ~ (n_bar + J + size_coef + ICC + alpha) * method,
         data = sres_f )

# View the results
-stargazer::stargazer(M, type = "text", single.row = TRUE )
+tidy( M ) %>%
+  knitr::kable( digits = 3 )
```

-We can quickly generate a lot of regression coefficients, making our meta-regression somewhat hard to interpret.
-The above model does not even have interactions of the simulation factors, even though the plots we have seen strongly suggest interactions among the simulation factors.
+With even a modestly complex simulation, we can quickly generate a lot of regression coefficients, making our meta-regression somewhat hard to interpret.
+The above model does not even have interactions between the simulation factors, even though the plots we have seen strongly suggest interactions among them.
That said, picking out the significant coefficients is a quick way to obtain clues as to what is driving performance.
For instance, several features interact with the LR method for bias.
The other two methods seem less impacted.
@@ -162,23 +166,32 @@ The other two methods seem less impacted.

-We can simplify our model using LASSO regression, to drop coefficients that are less relevant.
+We can simplify a meta-regression model using LASSO regression, to drop coefficients that are less relevant.
This requires some work to make our model matrix of dummy variables with all the interactions.
+If using LASSO, we recommend fitting a separate model to each method being considered;
+the set of fitted LASSO models can then be compared to see which methods react to what factors, and how.
+
+We first illustrate with LR, and then extend to all three.
+To use the LASSO we have to prepare our data first by hand---this involves converting all our factors to sets of dummy variables for the regression. +We also generate all interaction terms up to the cubic level. ```{r} library(modelr) library(glmnet) -# Define formula with all three-way interactions -form <- bias ~ ( n_bar + J + size_coef + ICC + alpha) * method +sres_f_LR <- sres_f %>% + filter( method == "LR" ) # Create model matrix -X <- model.matrix(form, data = sres_f)[, -1] # drop intercept +form <- bias ~ ( n_bar + J + size_coef + ICC + alpha )^3 +X <- model.matrix(form, data = sres_f_LR)[, -1] +# The [,-1] drops the intercept +dim(X) # Fit LASSO -fit <- cv.glmnet(X, sres_f$bias, alpha = 1) +fit <- cv.glmnet(X, sres_f_LR$bias, alpha = 1) -# Coefficients +# Non-zero coefficients coef(fit, s = "lambda.1se") %>% as.matrix() %>% as.data.frame() %>% @@ -187,71 +200,37 @@ coef(fit, s = "lambda.1se") %>% knitr::kable(digits = 3) ``` -When using regression, and especially LASSO, which levels are baseline can impact the final results. -Here "Agg" is our baseline method, and so our coefficients are showing how other methods differ from the Agg method. -If we selected LR as baseline, then we might suddenly see Agg and MLM as having large coefficients. - -One trick is to give dummy variables for all the methods, and overload the `method` factor with the baseline method, so that it is always the first level. 
-```{r}
-form <- bias ~ 0 + ( n_bar + J + size_coef + ICC + alpha) * method
-sres_f$method <- factor(sres_f$method)
-vars = c("n_bar", "J", "size_coef", "alpha", "method")
-contr.identity <- function(x) {
-  n = nlevels(x)
-  m <- diag(n)
-  rownames(m) <- colnames(m) <- levels(x)
-
-  m
-}
-contr.identity(sres_f$n_bar)
-X <- model.matrix(~ 0 + ( n_bar + J + size_coef + alpha) * method,
-                  data = sres_f,
-                  contrasts.arg = lapply(sres_f[,vars],
-                                         \(x) contr.identity(x)))
+Note we have 71 covariates due to the many, many interactions and the fact that our sample sizes, etc., are all factors, not continuous.
-colnames(X)
-```
-
-Now do the LASSO on this colinear mess:
-```{r}
-fit <- cv.glmnet(X, sres_f$bias, alpha = 1)
-coef(fit, s = "lambda.1se") %>%
-  as.matrix() %>%
-  as.data.frame() %>%
-  rownames_to_column("term") %>%
-  filter(abs(lambda.1se) > 0) %>%
-  knitr::kable(digits = 3)
-```
-
-
-We can also extend to allow for pairwise interactions of simulation factors:
-```{r}
-form2 <- bias ~ ( n_bar + J + size_coef + ICC + alpha)^2 * method
-```
-
-Interestingly, we get basically the same result:
-```{r, echo=FALSE}
-X2 <- model.matrix(form2, data = sres_f)[, -1] # drop intercept
-fit2 <- cv.glmnet(X2, sres_f$bias, alpha = 1)
-coef(fit2, s = "lambda.1se") %>%
-  as.matrix() %>%
-  as.data.frame() %>%
-  rownames_to_column("term") %>%
-  filter(abs(lambda.1se) > 0) %>%
-  knitr::kable(digits = 3)
-```
-
-
-#### Fitting models to each method
+When using regression, and especially LASSO, which levels are baseline can impact the final results.
+We have our smallest sample sizes, no variation, 0 ICC, and no `size_coef` as baseline.
+We might imagine that other choices of baseline could suddenly make other factors appear with large coefficients.
+One trick to avoid selecting a baseline is to give dummy variables for all the factors, and fit LASSO with the collinear terms.
+Due to regularization, this would still work; we do not pursue this here, however.
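To make the no-baseline trick concrete, here is a toy sketch (in Python, with a hypothetical three-level factor) of coding one dummy per level instead of dropping a reference level. Every row sums to one, so the columns are collinear with an intercept; ordinary regression cannot tolerate that, but a regularized LASSO fit can:

```python
# One dummy per factor level, with no baseline level dropped.
levels = ["Agg", "LR", "MLM"]      # hypothetical method levels
obs = ["LR", "Agg", "MLM", "LR"]   # hypothetical observed values
X = [[1 if m == lv else 0 for lv in levels] for m in obs]

# Each row sums to 1: together the three columns reproduce the
# intercept, which is exactly the collinearity regularization tolerates.
assert all(sum(row) == 1 for row in X)
print(X)
```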
-We know each method responds differently to the simulation factors, so we could fit three models, one for each method, and compare them.
+We next bundle the above to make three models, one for each method.
+We first rescale ICC to be on a 5-point scale to control its relative coefficient size to the dummy variables, and then add a new feature of "zeroICC" as well (recalling the prior plots that showed ICC being 0 was unusual).

```{r}
meth = c( "LR", "MLM", "Agg" )
+sres_f$zeroICC = ifelse( sres_f$ICC == 0, 1, 0 )
+sres_f$ICCsc = sres_f$ICC * 5 # rescale ICC to be on a 5-point scale
+
models <-
  map( meth, function(m) {
-    M <- lm( bias ~ (n_bar + J + size_coef + ICC + alpha)^2,
-             data = sres_f %>% filter( method == m ) )
-    tidy( M )
+
+    sres_f_LR <- sres_f %>%
+      filter( method == m )
+
+    form <- bias ~ ( n_bar + J + size_coef + ICCsc + alpha + zeroICC )^3
+    X <- model.matrix(form, data = sres_f_LR)[, -1]
+    fit <- cv.glmnet(X, sres_f_LR$bias, alpha = 1)
+
+    coef(fit, s = "lambda.min") %>%
+      as.matrix() %>%
+      as.data.frame() %>%
+      rownames_to_column("term") %>%
+      rename( estimate = lambda.min ) %>%
+      filter(abs(estimate) > 0)
  } )

models <-
@@ -261,74 +240,80 @@ models <-

m_res <-
  models %>%
  dplyr::select( model, term, estimate ) %>%
-  pivot_wider( names_from="model", values_from="estimate" )
+  pivot_wider( names_from="model", values_from="estimate" ) %>%
+  mutate(order = str_count(term, ":")) %>%
+  arrange(order) %>%
+  relocate(order)

+options(knitr.kable.NA = '')
m_res %>%
-  knitr::kable( digits = 2 )
+  knitr::kable( digits = 3 ) %>%
+  print( na.print = "" )
```

-Of course, this is table is hard to read. Better to instead plot the coefficients or use LASSO to simplify the model specification.
+Of course, this table is hard to read. Better to instead plot the coefficients:

```{r}
+lvl = m_res$term
m_resL <- m_res %>%
-  pivot_longer( -term,
+  pivot_longer( -c( order, term ),
                names_to = "model",
                values_to = "estimate" ) %>%
-  mutate( term = factor(term, levels = unique(term)) ) %>%
-  mutate( has_nbar = str_detect(term, "n_bar" ),
-          has_J = str_detect(term, "J"),
-          has_size_coef = str_detect(term, "size_coef"),
-          has_ICC = str_detect(term, "ICC"),
-          has_alpha = str_detect(term, "alpha") )
+  mutate( term = factor(term, levels = rev(lvl) ) )

ggplot( m_resL, aes( x = term, y = estimate,
                     fill = model, group = model ) ) +
-  facet_wrap( ~ has_nbar, scales="free_y" ) +
+  facet_wrap( ~ model ) +
  geom_bar( stat = "identity", position = "dodge" ) +
+  geom_hline(yintercept = 0 ) +
  coord_flip()
```

-Here we see how LR stands out, but also how MLM stands out under different simulation factor combinations.
-Staring at this provides some understanding of how the methods are similar, and dissimilar.
+Here we see how LR stands out, but also how MLM stands out under different simulation factor combinations (see, e.g., the interaction of zeroICC, alpha being 0.8, and size_coef being 0.2).
+This aggregate plot provides some understanding of how the methods are similar, and dissimilar.

For another example we turn to the standard error.
-Here we regress $log(SE)$ onto the coefficients, and we rescale ICC to be on a 5 point scale to control it's relative coefficeint size to the dummy variables.
-We regress $log(SE)$ and then exponentiate the coefficients to get the relative change in SE.
-We can then interpret an exponentiated coefficient of, 0.64 for MLM for `n_bar80` as a 36% reduction of the standard error when we increase n_bar from the baseline of 20 to 80.
+Here we regress $log(SE)$ onto the coefficients.
+We then exponentiate the estimated coefficients to get the relative change in SE as a function of the factors.
+We can interpret an exponentiated coefficient of, for example, 0.64 for MLM for `n_bar80` as a 36% reduction of the standard error when we increase n_bar from the baseline of 20 to 80.
+We use ordinary least squares and include all interactions up to three-way interactions.
+We will then simply drop all the tiny coefficients, rather than use the full LASSO machinery, to simplify our output.
+This results in a plot similar to the above:

-Here we make a plot like above, but with these relative changes:
```{r, echo=FALSE}
meth = c( "LR", "MLM", "Agg" )
-sres_f$ICCsc = sres_f$ICC * 5 # rescale ICC to be on a 5 point scale
models <-
  map( meth, function(m) {
-    M <- lm( log(SE) ~ (n_bar + J + size_coef + ICCsc + alpha)^2,
+    M <- lm( log(SE) ~ (n_bar + J + size_coef + ICCsc + alpha)^3,
             data = sres_f %>% filter( method == m ) )
    tidy( M ) %>%
-      mutate( estimate =exp(estimate) - 1 )
+      mutate( estimate = exp(estimate) - 1 )
  } )
models <-
  models %>%
  set_names(meth) %>%
  bind_rows( .id = "model" )

m_res <-
  models %>%
-  dplyr::select( model, term, estimate ) %>%
+  mutate(order = str_count(term, ":")) %>%
+  dplyr::select( order, term, model, estimate ) %>%
  pivot_wider( names_from="model", values_from="estimate" )

m_resL <-
  m_res %>%
-  pivot_longer( -term,
+  pivot_longer( -c( order, term ),
                names_to = "model",
                values_to = "estimate" ) %>%
  mutate( term = factor(term, levels = unique(term)) ) %>%
-  mutate( has_nbar = str_detect(term, "n_bar" ),
-          has_J = str_detect(term, "J"),
-          has_size_coef = str_detect(term, "size_coef"),
-          has_ICC = str_detect(term, "ICC"),
-          has_alpha = str_detect(term, "alpha") )
+  group_by( term ) %>%
+  mutate( max_est = max( abs( log( estimate+1 ) ) ) ) %>%
+  ungroup() %>%
+  filter( max_est > 0.05,
+          term != "(Intercept)" )
+
+m_resL <- m_resL %>%
+  mutate( term = factor( term, rev( levels(term) ) ) )

ggplot( m_resL,
        aes( x = term, y = estimate, fill = model, group = model ) ) +
-  facet_wrap( ~ has_nbar, scales="free_y" ) +
  geom_bar( stat = "identity",
            position = "dodge" ) +
  geom_hline( yintercept = 0, linetype = "dashed" ) +
  labs( y = "Relative change in SE",
@@ -337,20 +322,22 @@ ggplot( m_resL,
  coord_flip()
```

-This clearly shows that the methods are basically the same in terms of uncertainty estimation.
-We also see some interesting trends, such as the impact of `n_bar` declines when ICC is higher (see the interaction terms at rigth of plot).
-
+Our plot clearly shows that the three methods are basically the same in terms of uncertainty estimation, with a few differences when alpha is 0.8.
+We also see some interesting trends, such as how the impact of `n_bar` declines when ICC is higher (see the positive interaction terms at right of the plot).

## Using regression trees to find important factors

-With more complex experiments, where the various factors are interacting with each other in strange ways, it can be a bit tricky to decipher which factors are important and identify stable patterns.
-Another approach we might use to explore is to fit a regression tree on the simulation results.
+With more complex experiments, where the various factors are interacting with each other in strange ways, it can be a bit tricky to decipher which factors are important and what patterns are stable.
+Another exploration approach we might use is regression trees.
+We wrote a utility function, a wrapper to the `rpart` package, to do this ([script here](code/create_analysis_tree.R)).
+Here, for example, we see what predicts larger bias amounts:

```{r}
source( here::here( "code/create_analysis_tree.R" ) )
-set.seed(4344443)
+
+set.seed(12411)
create_analysis_tree( sres_f, outcome = "bias",
                      predictor_vars = c("method", "n_bar", "J",
@@ -358,13 +345,10 @@ create_analysis_tree( sres_f,
                      tree_title = "Cluster RCT Bias Analysis Tree" )
```

-We will not walk through the tree code, but you can review it [here](code/create_analysis_tree.R).
-This function is a wrapper of the `rpart` package.
- -The default pruning is based on a cross-fitting evaluation, and our sample size is not too terribly high (just the number of simulation scenarios fit). -Rerunning the code with a different seed can give a different tree. -In general, it might be worth forcibly simplifying the tree. -Trees are built greedily, so forcibly trimming often gives you the big things. +The default pruning is based on a cross-fitting evaluation, but our sample size is not too terribly high (just the number of simulation scenarios fit) so this is quite unstable. +Rerunning the code with a different seed will generally give a different tree. +We find that it is often worth forcibly simplifying the tree. +Trees are built greedily, so forcibly trimming often leaves you only with the big things. For example: ```{r} @@ -372,10 +356,11 @@ create_analysis_tree( sres_f, outcome = "bias", predictor_vars = c("method", "n_bar", "J", "size_coef", "ICC", "alpha"), - tree_title = "Smaller Cluster RCT Bias Analysis Tree" ) + tree_title = "Smaller Cluster RCT Bias Analysis Tree", + min_leaves = 5, max_leaves = 10 ) ``` -A very straightforward story: if `size_coef` is not 0, we are using LR, and alpha is large, then we have large bias. +This tree gives a very straightforward story: if `size_coef` is not 0 and we are using LR, then alpha drives bias. We can also zero in on specific methods to understand how they engage with the simulation factors, like so: @@ -389,101 +374,199 @@ create_analysis_tree( filter( sres_f, method=="LR" ), ``` We force more leaves to get at some more nuance. -We again immediately see, for the LR method, that bias is large when we have non-zero size coefficient _and_ large alpha value. +We again immediately see, for the LR method, that bias is large when we have non-zero size coefficient _and_ a large alpha value. Then, when $J$ is small, bias is even larger. 
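Because trees are built greedily, the first split is always the single cut that most reduces the sum of squared errors, which is why forced trimming tends to keep the dominant factors. A toy sketch of that first-split search, in Python with made-up scenario-level bias values (the factor names and numbers are hypothetical):

```python
# Toy greedy split search: pick the one binary split that most
# reduces the sum of squared errors (SSE).
def sse(ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

# Hypothetical scenario-level bias values keyed by two factors.
data = [
    ({"size_coef": 0.0, "method": "LR"},  0.01),
    ({"size_coef": 0.2, "method": "LR"},  0.20),
    ({"size_coef": 0.0, "method": "MLM"}, 0.00),
    ({"size_coef": 0.2, "method": "MLM"}, 0.02),
]
ys_all = [y for _, y in data]

best = None
for var, val in [("size_coef", 0.2), ("method", "LR")]:
    left = [y for x, y in data if x[var] == val]
    right = [y for x, y in data if x[var] != val]
    gain = sse(ys_all) - (sse(left) + sse(right))
    if best is None or gain > best[0]:
        best = (gain, var, val)

print(best[1])  # the factor chosen for the very first split
```

Here the split on `size_coef` wins because it separates the high-bias scenarios most cleanly, mirroring how the fitted trees surface `size_coef` first.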
-Generally we would not use a tree like this for a final reporting of results, but they can be important tools for _understanding_ your results, which leads to how to make and select more conventional figures for final reporting.
+Generally we would not use a tree like this for a final reporting of results, but trees can be important tools for _understanding_ your results, which leads to how to make and select more conventional figures for an outward-facing document.

## Analyzing results with few iterations per scenario

-When your simulation iterations are expensive to run (i.e., when each model fitting takes several minutes), then running thousands of iterations for many scenarios may not be computationally feasible.
-But running simulations with a smaller number of iterations will yield very noisy estimates of estimator performance.
-For a given scenario, if the methods being evaluated are substantially different, then the main patterns in performance might become evident even with only a few iterations. More generally, however, the Monte Carlo Standard Errors (MCSEs) may be so large that you will have a hard time discriminating between systematic patterns and noise.
+When each simulation iteration is expensive to run (e.g., if fitting your model takes several minutes), then running thousands of iterations for many scenarios may not be computationally feasible.
+But running simulations with only a small number of iterations will yield very noisy estimates of estimator performance for that scenario.

-One tool to handle this is aggregation: if you use visualization methods that average across scenarios, those averages will have more precise estimates of (average) performance.
-Do not, by contrast, trust the bundling approaches--the MCSEs will make your boxes wider, and give the impression that there is more variation across scenarios than there really is.
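For a rough sense of the scale of that noise, the MCSE of an estimated bias is the standard deviation of the estimates across replicates divided by $\sqrt{R}$; a quick Python sketch with a made-up standard deviation:

```python
import math

sd = 0.15  # hypothetical SD of the point estimates across replicates
for R in (100, 400, 1000):
    mcse = sd / math.sqrt(R)  # Monte Carlo SE of the estimated bias
    print(R, round(mcse, 4))
```

Note that quadrupling the replicates from 100 to 400 only halves the MCSE, which is why simply "running a bit more" rarely rescues a noisy design.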
-Regression approaches can be particularly useful: the regressions will effectively average performance across scenario, and give summaries of overall trends.
+Now, if the methods being evaluated are substantially different, then differences in performance might still be evident even with only a few iterations.
+More generally, however, the Monte Carlo Standard Errors (MCSEs) may be so large that you will have a hard time discriminating between systematic patterns and noise.
+
+One tool to handle few iterations is aggregation: if you average across scenarios, those averages will have more precise estimates of (average) performance than the estimates of performance within the scenarios.
+Do not, by contrast, trust the bundling approach--the MCSEs will make your boxes wider, and give the impression that there is more variation across scenarios than there really is.
+
+Meta-regression approaches such as we saw above can be particularly useful: a regression will effectively average performance across scenarios, and give summaries of overall trends.
You can even fit random effects regression, specifically accounting for the noise in the scenario-specific performance measures.
-For more on this approach see @gilbert2024multilevel.
+For more on using random effects for your meta-regression see @gilbert2024multilevel.

### Example: ClusterRCT with only 100 replicates per scenario

-In the prior chapter we analyzed the results of our cluster RCT simulation with 1000 replicates per scenario.
-But say we only had 100 per scenario.
-Using the prior chapter as a guide, we recreate some of the plots to show how MCSE can distort the picture of what is going on.
+
+```{r, include=FALSE}
+set.seed( 40440 )
+
+# Make small dataset
+res_small <- res %>%
+  mutate( runID = as.numeric( runID ) ) %>%
+  filter( runID <= 100 )
+
+ssres <- res_small %>%
+  group_by( n_bar, J, ATE, size_coef, ICC, alpha, method ) %>%
+  summarise(
+    bias = mean(ATE_hat - ATE),
+    SE = sd( ATE_hat ),
+    RMSE = sqrt( mean( (ATE_hat - ATE )^2 ) ),
+    ESE_hat = sqrt( mean( SE_hat^2 ) ),
+    SD_SE_hat = sqrt( sd( SE_hat^2 ) ),
+    power = mean( p_value <= 0.05 ),
+    R = n(),
+    .groups = "drop"
+  )
+ssres
+
+summary( ssres$R )
+```
+
+In the prior chapter we analyzed the results of our cluster RCT simulation with 1000 iterations per scenario.
+But say we only had 100 per scenario.
+Using the prior chapter as a guide, we next recreate some of the plots to show how MCSE can distort the picture of what is going on.

First, we look at our single plot of the raw results.
Before we plot, however, we calculate MCSEs and add them to the plot as error bars.

```{r}
sres_sub <-
-  sres %>%
+  ssres %>%
  filter( n_bar == 320, J == 20 ) %>%
  mutate( bias.mcse = SE / sqrt( R ) )

+dodge <- position_dodge(width = 0.35)
ggplot( sres_sub, aes( as.factor(alpha), bias,
                       col=method, pch=method, group=method ) ) +
  facet_grid( size_coef ~ ICC, labeller = label_both ) +
-  geom_point() +
+  geom_point( position = dodge ) +
  geom_errorbar( aes( ymin = bias - 2*bias.mcse,
                      ymax = bias + 2*bias.mcse ),
-                width = 0 ) +
-  geom_line() +
+                width = 0,
+                position = dodge ) +
+  geom_line( position = dodge ) +
  geom_hline( yintercept = 0 ) +
-  theme_minimal()
+  theme_minimal() +
+  coord_cartesian( ylim = c(-0.10,0.10) )
```

-Aggregation should smooth out some of our uncertainty.
-When we aggregate across 9 scenarios, our number of replicates goes from 100 to 900; our MCSEs should be about a third the size.
-Here is our aggregated bias plot:
+Our uncertainty is much less when ICC is 0; this is because our estimators are far more precise due to not having cluster variation to contend with.
+Other than the ICC = 0 case, we see substantial amounts of uncertainty, making it very hard to tell the different estimators apart.
+In the top row, second plot from left, we see that the three estimators are co-dependent: they all react similarly to the same datasets, so if we end up with datasets that randomly lead to large estimates, all three will give large estimates.
+The shape we are seeing is not a systematic bias, but rather a shared random variation.
+
+Here is the same plot with the full 1000 replicates, with the 100 replicate results overlaid in light color for comparison:
+
+```{r, echo=FALSE}
+sres_sub_full <- sres %>%
+  filter( n_bar == 320, J == 20 ) %>%
+  mutate( bias.mcse = SE / sqrt( R ) )
+
+dodge <- position_dodge(width = 0.35)
+ggplot( sres_sub_full, aes( as.factor(alpha), bias,
+                            col=method, pch=method, group=method ) ) +
+  facet_grid( size_coef ~ ICC, labeller = label_both ) +
+  geom_point( position = dodge ) +
+  geom_errorbar( aes( ymin = bias - 2*bias.mcse,
+                      ymax = bias + 2*bias.mcse ),
+                 width = 0,
+                 position = dodge ) +
+  geom_line( position = dodge ) +
+  geom_point( data=sres_sub, position = dodge, alpha=0.2 ) +
+  # geom_errorbar( data=sres_sub, aes( ymin = bias - 2*bias.mcse,
+#                      ymax = bias + 2*bias.mcse ),
+  #                alpha=0.25,
+  #                width = 0,
+  #                position = dodge ) +
+  geom_line( data=sres_sub, position = dodge, alpha=0.2 ) +
+  geom_hline( yintercept = 0 ) +
+  theme_minimal() +
+  coord_cartesian( ylim = c(-0.10,0.10) )
+```
+
+The MCSEs have shrunk by a factor of around $1/\sqrt{10} \approx 0.32$, as we would expect (generally the MCSEs will be on the order of $1/\sqrt{R}$, where $R$ is the number of replicates, so to halve the MCSE you need to quadruple the number of replicates).
+Also note the ICC=0.2 top facet has shifted to a flat, slightly elevated line: we do not yet know if the elevation is real, just as we did not know if the dip in the prior plot was real.
+Our confidence intervals still include 0: it is possible there is no bias at all when the size coefficient is 0 (in fact, we are fairly confident this is the case).
+
+```{r, include=FALSE}
+# Checking MCSEs are smaller as expected
+summary( sres_sub_full$bias.mcse / sres_sub$bias.mcse )
+```
+
+Moving back to our "small replicates" simulation, we can use aggregation to smooth out some of our uncertainty.
+For example, if we aggregate across 9 scenarios, our number of replicates goes from 100 to 900; our MCSEs should then be about a third the size.
+To calculate an aggregated MCSE, we aggregate our scenario-specific MCSEs as follows:
+$$ MCSE_{agg} = \sqrt{ \frac{1}{K^2} \sum_{k=1}^{K} MCSE_k^2 } $$
+
+where $MCSE_k$ is the Monte Carlo Standard Error for scenario $k$, and $K$ is the number of scenarios being averaged.
+Assuming a collection of estimates are independent, the overall $SE^2$ of an average is the average $SE^2$ divided by $K$.
+In code we have:
+
+```{r, echo=FALSE}
+sres_sub2 <-
+  ssres %>%
  mutate( bias.mcse = SE / sqrt( R ) ) %>%
-  group_by( n_bar, J ) %>%
+  group_by( method, alpha, size_coef, ICC ) %>%
  summarise( bias = mean( bias ),
-             bias.mcse = sqrt( mean( bias.mcse^2 )) / sqrt(n()),
+             bias.mcse = sqrt( mean( bias.mcse^2 ) ) / sqrt( n() ),
+             K = n(),
              .groups = "drop" )
+```
+Recall that the `SE` variable is simply the standard deviation of the estimates.
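As a quick numerical check (in Python, with hypothetical per-scenario MCSE values), the aggregation formula above is algebraically the same as taking the root of the mean squared MCSE and dividing by $\sqrt{K}$:

```python
import math

# Hypothetical per-scenario MCSEs for K = 9 scenarios being averaged.
mcse = [0.05, 0.04, 0.06, 0.05, 0.03, 0.05, 0.04, 0.06, 0.05]
K = len(mcse)

# Direct form: sqrt( (1/K^2) * sum(MCSE_k^2) )
agg_direct = math.sqrt(sum(m ** 2 for m in mcse) / K ** 2)

# Equivalent form: sqrt( mean(MCSE_k^2) ) / sqrt(K)
agg_mean = math.sqrt(sum(m ** 2 for m in mcse) / K) / math.sqrt(K)

# The two forms agree, and the aggregate MCSE is roughly a third of
# a typical per-scenario MCSE, as averaging K = 9 scenarios predicts.
assert abs(agg_direct - agg_mean) < 1e-12
print(round(agg_direct, 4))
```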
-ggplot( sres_sub, aes( as.factor(alpha), bias, +We can then make our aggregated bias plot, aggregating across `n_bar` and `J`: + +```{r, echo=FALSE} +ggplot( sres_sub2, aes( as.factor(alpha), bias, col=method, pch=method, group=method ) ) + facet_grid( size_coef ~ ICC, labeller = label_both ) + - geom_point() + + geom_point( position = dodge ) + geom_errorbar( aes( ymin = bias - 2*bias.mcse, ymax = bias + 2*bias.mcse ), - width = 0 ) + - geom_line() + + width = 0, + position = dodge ) + + geom_line( position = dodge ) + geom_hline( yintercept = 0 ) + - theme_minimal() + theme_minimal() + + coord_cartesian( ylim = c(-0.10,0.10) ) ``` -To get aggregate MCSE, we aggregate our scenario-specific MCSEs as follows: -$$ MCSE_{agg} = \sqrt{ \frac{1}{K^2} \sum_{k=1}^{K} MCSE_k^2 } $$ +```{r, include=FALSE} +sres_sub2 +sres_sub +ss = left_join( sres_sub2, + sres_sub, + by = c("method", "alpha", "size_coef", "ICC") ) +# Off because we are comparing to only one scenario with very specific MCSE, but we need to look at average MCSE across the averaged scenarios. +summary( ss$bias.mcse.x / ss$bias.mcse.y ) + +``` -where $MCSE_i$ is the Monte Carlo Standard Error for scenario $i$, and $k$ is the number of scenarios. -Assuming a collection of estimates are independent, the overall $SE^2$ of the average is the average $SE^2$ divided by $K$. -Even with the additional replicates per point, we see noticable noise in our plot. -Note how our three methods track each other up and down in the zero-bias scenarios, giving a sense of a shared bias in some cases. +Even with the additional replicates per point, we see noticeable noise in our plot: look at the top-right ICC of 0.8 facet, for example. +Also note how our three methods continue to track each other up and down in top row, giving a sense of a shared error. This is because all methods are analyzing the same set of datasets; they have shared uncertainty. This uncertainty can be deceptive. 
It can also be a boon: if we are explicitly comparing the performance of one method vs another, the shared uncertainty can be subtracted out, similar to what happens in a blocked experiment [@gilbert2024multilevel]. -Here we fit a multilevel model to the data. +One way to take advantage of this is to fit a multilevel regression model to our raw simulation results with a random effect for dataset. +We next fit such a model, taking advantage of the fact that bias is simply the average of the error across replicates. +We first make a unique ID for each scenario and dataset, and then fit the model with a random effect for both. +The first random effect allows for specific scenarios to have more or less bias beyond what our model predicts. +The second random effect allows for a given dataset to have a larger or smaller error than expected, shared across the three estimators. ```{r} library(lme4) -sub_res <- - res %>% - filter( runID <= 100 ) %>% +res_small <- res_small %>% mutate( error = ATE_hat - ATE, simID = paste(n_bar, J, size_coef, ICC, alpha, sep = "_"), + dataID = paste( simID, runID, sep="_" ), J = as.factor(J), n_bar = as.factor(n_bar), alpha = as.factor(alpha), @@ -491,13 +574,13 @@ sub_res <- ) M <- lmer( - error ~ method*(J + n_bar + ICC + alpha + size_coef) + (1|runID) + (1|simID), - data = sub_res + error ~ method + (1|dataID) + (1|simID), + data = res_small ) arm::display(M) ``` -We can look at the random effects: +We can look at how much each source of variation explains the overall error: ```{r} ranef_vars <- as.data.frame(VarCorr(M)) %>% @@ -508,8 +591,42 @@ ranef_vars <- knitr::kable(ranef_vars, digits = 2) ``` -The above model is a multilevel model that allows us to estimate how bias varies with method and simulation factor, while accounting for the uncertainty in the simulation. -The random variation for `simID` captures unexplained variation due to the interactions of the simulation factors. 
We see a large value, indicating that many interactions are present, and our main effects are not fully capturing all trends. +The random variation for `simID` captures unexplained variation due to the interactions of the simulation factors. +It appears to be a trivial amount; almost all the variation is due to the dataset. +This makes sense: each datasets is unbalanced due to random assignment, and that estimation error is part of the dataset random effect. + +So far we have not included any simulation factors: we are pushing variation across simulation into the random effect terms. We can instead include the simulation factors as fixed effects, to see how they impact bias. + +```{r} +M2 <- lmer( + error ~ method*(J + n_bar + ICC + alpha + size_coef) + (1|dataID) + (1|simID), + data = res_small +) +texreg::screenreg(M2) +``` + +The above models allow us to estimate how bias varies with method and simulation factor, while accounting for the uncertainty in the simulation. + +Finally, we can see how much variation has been explained by comparing the random effect variances: +```{r} +ranef_vars1 <- + as.data.frame(VarCorr(M)) %>% + dplyr::select(grp = grp, sd = vcov) %>% + mutate( sd = sqrt(sd), + ICC = sd^2 / sum(sd^2 ) ) +ranef_vars2 <- + as.data.frame(VarCorr(M2)) %>% + dplyr::select(grp = grp, sd = vcov) %>% + mutate( sd = sqrt(sd), + ICC = sd^2 / sum(sd^2 ) ) +rr = left_join( ranef_vars1, ranef_vars2, by = "grp", + suffix = c(".null", ".full") ) +rr <- rr %>% + mutate( sd.red = sd.full / sd.null ) +knitr::kable(rr, digits = 2) +``` + + diff --git a/080-simulations-as-evidence.Rmd b/080-simulations-as-evidence.Rmd index c85ca22..d840551 100644 --- a/080-simulations-as-evidence.Rmd +++ b/080-simulations-as-evidence.Rmd @@ -40,14 +40,14 @@ In the following subsections we go through a range of general strategies for mak ### Break symmetries and regularities -In a series of famous causal inference papers (@lin2013agnostic, @freedman2008regression), researchers 
examined when linear regression adjustment of a randomized experiment (i.e., when controlling for baseline covariates in a randomized experiment) could cause problems.
+In a series of famous causal inference papers [@lin2013agnostic; @freedman2008regression], researchers examined when linear regression adjustment of a randomized experiment (i.e., when controlling for baseline covariates in a randomized experiment) could cause problems.
Critically, if the treatment assignment is 50%, then the concerns that these researchers examined do not come into play, as asymmetries between the two groups get perfectly cancelled out.
That said, if the treatment proportion is more lopsided, then under some circumstances you can get bias, and you can get invalid standard errors, depending on other structures of the data.
Simulations can be used to explore these issues, but only if we break the symmetry of the 50% treatment assignment.

When designing simulations, it is worth looking for places of symmetry, because in those contexts estimators will often work better than they might otherwise, and other factors may not have as much of an effect as anticipated.

-Similarly, in recent work on best practices for analyzing multisite experiments (@miratrix2021applied), we identified how different estimators could be targeting different estimands.
+Similarly, in recent work on best practices for analyzing multisite experiments [@miratrix2021applied], we identified how different estimators could be targeting different estimands.
In particular, some estimators target site-average treatment effects, some target person-average treatment effects, and some target a kind of precision-weighted blend of the two.
To see this play out in practice, our simulations needed the sizes of sites to vary, and also the proportion of treated within site to vary.
If we had run simulations with equal site size and equal proportion treated, we would not see the broader behavior that separates the estimators considered. @@ -114,7 +114,7 @@ It is very easy to accidentally put a very simple model in place for this final We next walk through how you might calibrate further in the context of evaluating estimators for some sort of causal inference context where we are assessing methods of estimating a treatment effect of some binary treatment. If we just resample our covariates, but then layer a constant treatment effect on top, we may be missing critical aspects of how our estimators might fail in practice. -In the area of causal inference, the potential outcomes framework provides a natural path for generating calibrated simulations [@Kern_calibrated]. +In the area of causal inference, the potential outcomes framework provides a natural path for generating calibrated simulations [@Kern2014calibrated]. Also see \@ref(potential-outcomes) for more discussion of simulations in the potential outcomes framework. Under this framework, we would take an existing randomized experiment or observational study and then impute all the missing potential outcomes under some specific scheme. This fully defines the sample of interest and thus any target parameters, such as a measure of heterogeneity, are then fully known. 
diff --git a/Designing-Simulations-in-R.toc b/Designing-Simulations-in-R.toc index 748da91..a71dcf1 100644 --- a/Designing-Simulations-in-R.toc +++ b/Designing-Simulations-in-R.toc @@ -1,275 +1,282 @@ -\contentsline {chapter}{Welcome}{9}{chapter*.2}% -\contentsline {section}{License}{10}{section*.3}% -\contentsline {section}{About the authors}{10}{section*.4}% -\contentsline {section}{Acknowledgements}{11}{section*.5}% -\contentsline {part}{I\hspace {1em}An Introductory Look}{13}{part.1}% -\contentsline {chapter}{\numberline {1}Introduction}{15}{chapter.1}% -\contentsline {section}{\numberline {1.1}Some of simulation's many uses}{16}{section.1.1}% -\contentsline {subsection}{\numberline {1.1.1}Comparing statistical approaches}{17}{subsection.1.1.1}% -\contentsline {subsection}{\numberline {1.1.2}Assessing performance of complex pipelines}{17}{subsection.1.1.2}% -\contentsline {subsection}{\numberline {1.1.3}Assessing performance under misspecification}{18}{subsection.1.1.3}% -\contentsline {subsection}{\numberline {1.1.4}Assessing the finite-sample performance of a statistical approach}{19}{subsection.1.1.4}% -\contentsline {subsection}{\numberline {1.1.5}Conducting Power Analyses}{19}{subsection.1.1.5}% -\contentsline {subsection}{\numberline {1.1.6}Simulating processess}{20}{subsection.1.1.6}% -\contentsline {section}{\numberline {1.2}The perils of simulation as evidence}{21}{section.1.2}% -\contentsline {section}{\numberline {1.3}Simulating to learn}{23}{section.1.3}% -\contentsline {section}{\numberline {1.4}Why R?}{24}{section.1.4}% -\contentsline {section}{\numberline {1.5}Organization of the text}{25}{section.1.5}% -\contentsline {chapter}{\numberline {2}Programming Preliminaries}{27}{chapter.2}% -\contentsline {section}{\numberline {2.1}Welcome to the tidyverse}{27}{section.2.1}% -\contentsline {section}{\numberline {2.2}Functions}{28}{section.2.2}% -\contentsline {subsection}{\numberline {2.2.1}Rolling your own}{29}{subsection.2.2.1}% -\contentsline 
{subsection}{\numberline {2.2.2}A dangerous function}{30}{subsection.2.2.2}% -\contentsline {subsection}{\numberline {2.2.3}Using Named Arguments}{33}{subsection.2.2.3}% -\contentsline {subsection}{\numberline {2.2.4}Argument Defaults}{34}{subsection.2.2.4}% -\contentsline {subsection}{\numberline {2.2.5}Function skeletons}{35}{subsection.2.2.5}% -\contentsline {section}{\numberline {2.3}\texttt {\textbackslash {}\textgreater {}} (Pipe) dreams}{35}{section.2.3}% -\contentsline {section}{\numberline {2.4}Recipes versus Patterns}{36}{section.2.4}% -\contentsline {section}{\numberline {2.5}Exercises}{37}{section.2.5}% -\contentsline {chapter}{\numberline {3}An initial simulation}{39}{chapter.3}% -\contentsline {section}{\numberline {3.1}Simulating a single scenario}{42}{section.3.1}% -\contentsline {section}{\numberline {3.2}A non-normal population distribution}{43}{section.3.2}% -\contentsline {section}{\numberline {3.3}Simulating across different scenarios}{45}{section.3.3}% -\contentsline {section}{\numberline {3.4}Extending the simulation design}{48}{section.3.4}% -\contentsline {section}{\numberline {3.5}Exercises}{48}{section.3.5}% -\contentsline {part}{II\hspace {1em}Structure and Mechanics of a Simulation Study}{51}{part.2}% -\contentsline {chapter}{\numberline {4}Structure of a simulation study}{53}{chapter.4}% -\contentsline {section}{\numberline {4.1}General structure of a simulation}{53}{section.4.1}% -\contentsline {section}{\numberline {4.2}Tidy, modular simulations}{55}{section.4.2}% -\contentsline {section}{\numberline {4.3}Skeleton of a simulation study}{56}{section.4.3}% -\contentsline {subsection}{\numberline {4.3.1}Data-Generating Process}{58}{subsection.4.3.1}% -\contentsline {subsection}{\numberline {4.3.2}Data Analysis Procedure}{59}{subsection.4.3.2}% -\contentsline {subsection}{\numberline {4.3.3}Repetition}{59}{subsection.4.3.3}% -\contentsline {subsection}{\numberline {4.3.4}Performance summaries}{61}{subsection.4.3.4}% -\contentsline 
{subsection}{\numberline {4.3.5}Multifactor simulations}{61}{subsection.4.3.5}% -\contentsline {section}{\numberline {4.4}Exercises}{62}{section.4.4}% -\contentsline {chapter}{\numberline {5}Case Study: Heteroskedastic ANOVA and Welch}{63}{chapter.5}% -\contentsline {section}{\numberline {5.1}The data-generating model}{66}{section.5.1}% -\contentsline {subsection}{\numberline {5.1.1}Now make a function}{68}{subsection.5.1.1}% -\contentsline {subsection}{\numberline {5.1.2}Cautious coding}{69}{subsection.5.1.2}% -\contentsline {section}{\numberline {5.2}The hypothesis testing procedures}{70}{section.5.2}% -\contentsline {section}{\numberline {5.3}Running the simulation}{71}{section.5.3}% -\contentsline {section}{\numberline {5.4}Summarizing test performance}{72}{section.5.4}% -\contentsline {section}{\numberline {5.5}Exercises}{74}{section.5.5}% -\contentsline {chapter}{\numberline {6}Data-generating processes}{77}{chapter.6}% -\contentsline {section}{\numberline {6.1}Examples}{77}{section.6.1}% -\contentsline {subsection}{\numberline {6.1.1}Example 1: One-way analysis of variance}{78}{subsection.6.1.1}% -\contentsline {subsection}{\numberline {6.1.2}Example 2: Bivariate Poisson model}{78}{subsection.6.1.2}% -\contentsline {subsection}{\numberline {6.1.3}Example 3: Hierarchical linear model for a cluster-randomized trial}{79}{subsection.6.1.3}% -\contentsline {section}{\numberline {6.2}Components of a DGP}{79}{section.6.2}% -\contentsline {section}{\numberline {6.3}A statistical model is a recipe for data generation}{82}{section.6.3}% -\contentsline {section}{\numberline {6.4}Plot the artificial data}{84}{section.6.4}% -\contentsline {section}{\numberline {6.5}Check the data-generating function}{86}{section.6.5}% -\contentsline {section}{\numberline {6.6}Example: Simulating clustered data}{87}{section.6.6}% -\contentsline {subsection}{\numberline {6.6.1}A design decision: What do we want to manipulate?}{88}{subsection.6.6.1}% -\contentsline {subsection}{\numberline 
{6.6.2}A model for a cluster RCT}{89}{subsection.6.6.2}% -\contentsline {subsection}{\numberline {6.6.3}From equations to code}{91}{subsection.6.6.3}% -\contentsline {subsection}{\numberline {6.6.4}Standardization in the DGP}{94}{subsection.6.6.4}% -\contentsline {section}{\numberline {6.7}Sometimes a DGP is all you need}{96}{section.6.7}% -\contentsline {section}{\numberline {6.8}More to explore}{101}{section.6.8}% -\contentsline {section}{\numberline {6.9}Exercises}{102}{section.6.9}% -\contentsline {subsection}{\numberline {6.9.1}The Welch test on a shifted-and-scaled \(t\) distribution}{102}{subsection.6.9.1}% -\contentsline {subsection}{\numberline {6.9.2}Plot the bivariate Poisson}{102}{subsection.6.9.2}% -\contentsline {subsection}{\numberline {6.9.3}Check the bivariate Poisson function}{103}{subsection.6.9.3}% -\contentsline {subsection}{\numberline {6.9.4}Add error-catching to the bivariate Poisson function}{103}{subsection.6.9.4}% -\contentsline {subsection}{\numberline {6.9.5}A bivariate negative binomial distribution}{104}{subsection.6.9.5}% -\contentsline {subsection}{\numberline {6.9.6}Another bivariate negative binomial distribution}{105}{subsection.6.9.6}% -\contentsline {subsection}{\numberline {6.9.7}Plot the data from a cluster-randomized trial}{105}{subsection.6.9.7}% -\contentsline {subsection}{\numberline {6.9.8}Checking the Cluster RCT DGP}{105}{subsection.6.9.8}% -\contentsline {subsection}{\numberline {6.9.9}More school-level variation}{106}{subsection.6.9.9}% -\contentsline {subsection}{\numberline {6.9.10}Cluster-randomized trial with baseline predictors}{106}{subsection.6.9.10}% -\contentsline {subsection}{\numberline {6.9.11}3-parameter IRT datasets}{106}{subsection.6.9.11}% -\contentsline {subsection}{\numberline {6.9.12}Check the 3-parameter IRT DGP}{107}{subsection.6.9.12}% -\contentsline {subsection}{\numberline {6.9.13}Explore the 3-parameter IRT model}{108}{subsection.6.9.13}% -\contentsline {subsection}{\numberline {6.9.14}Random 
effects meta-regression}{108}{subsection.6.9.14}% -\contentsline {subsection}{\numberline {6.9.15}Meta-regression with selective reporting}{109}{subsection.6.9.15}% -\contentsline {chapter}{\numberline {7}Data analysis procedures}{111}{chapter.7}% -\contentsline {section}{\numberline {7.1}Writing estimation functions}{112}{section.7.1}% -\contentsline {section}{\numberline {7.2}Including Multiple Data Analysis Procedures}{114}{section.7.2}% -\contentsline {section}{\numberline {7.3}Validating an Estimation Function}{119}{section.7.3}% -\contentsline {subsection}{\numberline {7.3.1}Checking against existing implementations}{119}{subsection.7.3.1}% -\contentsline {subsection}{\numberline {7.3.2}Checking novel procedures}{121}{subsection.7.3.2}% -\contentsline {subsection}{\numberline {7.3.3}Checking with simulations}{124}{subsection.7.3.3}% -\contentsline {section}{\numberline {7.4}Handling errors, warnings, and other hiccups}{125}{section.7.4}% -\contentsline {subsection}{\numberline {7.4.1}Capturing errors and warnings}{126}{subsection.7.4.1}% -\contentsline {subsection}{\numberline {7.4.2}Adapting estimation procedures for errors and warnings}{133}{subsection.7.4.2}% -\contentsline {section}{\numberline {7.5}Exercises}{136}{section.7.5}% -\contentsline {subsection}{\numberline {7.5.1}More Heteroskedastic ANOVA}{136}{subsection.7.5.1}% -\contentsline {subsection}{\numberline {7.5.2}Contingent testing}{136}{subsection.7.5.2}% -\contentsline {subsection}{\numberline {7.5.3}Check the cluster-RCT functions}{137}{subsection.7.5.3}% -\contentsline {subsection}{\numberline {7.5.4}Extending the cluster-RCT functions}{137}{subsection.7.5.4}% -\contentsline {subsection}{\numberline {7.5.5}Contingent estimator processing}{138}{subsection.7.5.5}% -\contentsline {subsection}{\numberline {7.5.6}Estimating 3-parameter item response theory models}{138}{subsection.7.5.6}% -\contentsline {subsection}{\numberline {7.5.7}Meta-regression with selective 
reporting}{139}{subsection.7.5.7}% -\contentsline {chapter}{\numberline {8}Running the Simulation Process}{143}{chapter.8}% -\contentsline {section}{\numberline {8.1}Repeating oneself}{143}{section.8.1}% -\contentsline {section}{\numberline {8.2}One run at a time}{144}{section.8.2}% -\contentsline {subsection}{\numberline {8.2.1}Reparameterizing}{147}{subsection.8.2.1}% -\contentsline {section}{\numberline {8.3}Bundling simulations with \texttt {simhelpers}}{148}{section.8.3}% -\contentsline {section}{\numberline {8.4}Seeds and pseudo-random number generators}{150}{section.8.4}% -\contentsline {section}{\numberline {8.5}Exercises}{153}{section.8.5}% -\contentsline {subsection}{\numberline {8.5.1}Welch simulations}{153}{subsection.8.5.1}% -\contentsline {subsection}{\numberline {8.5.2}Compare sampling distributions of Pearson's correlation coefficients}{153}{subsection.8.5.2}% -\contentsline {subsection}{\numberline {8.5.3}Reparameterization, redux}{153}{subsection.8.5.3}% -\contentsline {subsection}{\numberline {8.5.4}Fancy clustered RCT simulations}{153}{subsection.8.5.4}% -\contentsline {chapter}{\numberline {9}Performance metrics}{155}{chapter.9}% -\contentsline {section}{\numberline {9.1}Metrics for Point Estimators}{157}{section.9.1}% -\contentsline {subsection}{\numberline {9.1.1}Comparing the Performances of the Cluster RCT Estimation Procedures}{159}{subsection.9.1.1}% -\contentsline {subsubsection}{Are the estimators biased?}{160}{section*.12}% -\contentsline {subsubsection}{Which method has the smallest standard error?}{161}{section*.13}% -\contentsline {subsubsection}{Which method has the smallest Root Mean Squared Error?}{161}{section*.14}% -\contentsline {subsection}{\numberline {9.1.2}Less Conventional Performance metrics}{162}{subsection.9.1.2}% -\contentsline {section}{\numberline {9.2}Metrics for Standard Error Estimators}{164}{section.9.2}% -\contentsline {subsection}{\numberline {9.2.1}Assessing SEs for Our Cluster RCT 
Simulation}{166}{subsection.9.2.1}% -\contentsline {section}{\numberline {9.3}Metrics for Confidence Intervals}{167}{section.9.3}% -\contentsline {subsection}{\numberline {9.3.1}Confidence Intervals in our Cluster RCT Example}{168}{subsection.9.3.1}% -\contentsline {section}{\numberline {9.4}Metrics for Inferential Procedures (Hypothesis Tests)}{168}{section.9.4}% -\contentsline {subsection}{\numberline {9.4.1}Validity}{168}{subsection.9.4.1}% -\contentsline {subsection}{\numberline {9.4.2}Power}{169}{subsection.9.4.2}% -\contentsline {subsection}{\numberline {9.4.3}The Rejection Rate}{169}{subsection.9.4.3}% -\contentsline {subsection}{\numberline {9.4.4}Inference in our Cluster RCT Simulation}{170}{subsection.9.4.4}% -\contentsline {section}{\numberline {9.5}Selecting Relative vs.~Absolute Metrics}{171}{section.9.5}% -\contentsline {section}{\numberline {9.6}Summary of Peformance Measures}{173}{section.9.6}% -\contentsline {subsection}{\numberline {9.6.1}Windsorization to control outliers}{179}{subsection.9.6.1}% -\contentsline {subsection}{\numberline {9.6.2}Correlation measures vs absolute performance}{181}{subsection.9.6.2}% -\contentsline {section}{\numberline {9.7}Summary of Peformance Measures}{183}{section.9.7}% -\contentsline {section}{\numberline {9.8}Estimands Not Represented By a Parameter}{184}{section.9.8}% -\contentsline {section}{\numberline {9.9}Uncertainty in Performance Estimates (the Monte Carlo Standard Error)}{187}{section.9.9}% -\contentsline {subsection}{\numberline {9.9.1}MCSE for Relative Variance Estimators}{188}{subsection.9.9.1}% -\contentsline {subsection}{\numberline {9.9.2}Calculating MCSEs With the \texttt {simhelpers} Package}{189}{subsection.9.9.2}% -\contentsline {subsection}{\numberline {9.9.3}MCSE Calculation in our Cluster RCT Example}{191}{subsection.9.9.3}% -\contentsline {section}{\numberline {9.10}Concluding thoughts}{191}{section.9.10}% -\contentsline {section}{\numberline {9.11}Exercises}{192}{section.9.11}% 
-\contentsline {subsection}{\numberline {9.11.1}Brown and Forsythe (1974)}{192}{subsection.9.11.1}% -\contentsline {subsection}{\numberline {9.11.2}Jackknife calculation of MCSEs}{192}{subsection.9.11.2}% -\contentsline {subsection}{\numberline {9.11.3}Distribution theory for person-level average treatment effects}{192}{subsection.9.11.3}% -\contentsline {subsection}{\numberline {9.11.4}Multiple scenarios}{192}{subsection.9.11.4}% -\contentsline {part}{III\hspace {1em}Multifactor Simulations}{195}{part.3}% -\contentsline {chapter}{\numberline {10}Designing and executing multifactor simulations}{197}{chapter.10}% -\contentsline {section}{\numberline {10.1}Choosing parameter combinations}{199}{section.10.1}% -\contentsline {section}{\numberline {10.2}Using pmap to run multifactor simulations}{201}{section.10.2}% -\contentsline {section}{\numberline {10.3}When to calculate performance metrics}{206}{section.10.3}% -\contentsline {subsection}{\numberline {10.3.1}Aggregate as you simulate (inside)}{206}{subsection.10.3.1}% -\contentsline {subsection}{\numberline {10.3.2}Keep all simulation runs (outside)}{206}{subsection.10.3.2}% -\contentsline {subsection}{\numberline {10.3.3}Getting raw results ready for analysis}{208}{subsection.10.3.3}% -\contentsline {section}{\numberline {10.4}Summary}{210}{section.10.4}% -\contentsline {section}{\numberline {10.5}Case Study: A multifactor evaluation of cluster RCT estimators}{211}{section.10.5}% -\contentsline {subsection}{\numberline {10.5.1}Choosing parameters for the Clustered RCT}{211}{subsection.10.5.1}% -\contentsline {subsection}{\numberline {10.5.2}Redundant factor combinations}{213}{subsection.10.5.2}% -\contentsline {subsection}{\numberline {10.5.3}Running the simulations}{213}{subsection.10.5.3}% -\contentsline {subsection}{\numberline {10.5.4}Calculating performance metrics}{214}{subsection.10.5.4}% -\contentsline {section}{\numberline {10.6}Exercises}{216}{section.10.6}% -\contentsline {subsection}{\numberline 
{10.6.1}Brown and Forsythe redux}{216}{subsection.10.6.1}% -\contentsline {subsection}{\numberline {10.6.2}Meta-regression}{216}{subsection.10.6.2}% -\contentsline {subsection}{\numberline {10.6.3}Comparing the trimmed mean, median and mean}{216}{subsection.10.6.3}% -\contentsline {chapter}{\numberline {11}Exploring and presenting simulation results}{219}{chapter.11}% -\contentsline {section}{\numberline {11.1}Tabulation}{220}{section.11.1}% -\contentsline {subsection}{\numberline {11.1.1}Example: estimators of treatment variation}{222}{subsection.11.1.1}% -\contentsline {section}{\numberline {11.2}Visualization}{223}{section.11.2}% -\contentsline {subsection}{\numberline {11.2.1}Example 0: RMSE in Cluster RCTs}{224}{subsection.11.2.1}% -\contentsline {subsection}{\numberline {11.2.2}Example 1: Biserial correlation estimation}{225}{subsection.11.2.2}% -\contentsline {subsection}{\numberline {11.2.3}Example 2: Variance estimation and Meta-regression}{226}{subsection.11.2.3}% -\contentsline {subsection}{\numberline {11.2.4}Example 3: Heat maps of coverage}{226}{subsection.11.2.4}% -\contentsline {subsection}{\numberline {11.2.5}Example 4: Relative performance of treatment effect estimators}{228}{subsection.11.2.5}% -\contentsline {section}{\numberline {11.3}Modeling}{229}{section.11.3}% -\contentsline {subsection}{\numberline {11.3.1}Example 1: Biserial, revisited}{230}{subsection.11.3.1}% -\contentsline {subsection}{\numberline {11.3.2}Example 2: Comparing methods for cross-classified data}{231}{subsection.11.3.2}% -\contentsline {section}{\numberline {11.4}Reporting}{233}{section.11.4}% -\contentsline {chapter}{\numberline {12}Building good visualizations}{235}{chapter.12}% -\contentsline {section}{\numberline {12.1}Subsetting and Many Small Multiples}{236}{section.12.1}% -\contentsline {section}{\numberline {12.2}Bundling}{239}{section.12.2}% -\contentsline {section}{\numberline {12.3}Aggregation}{242}{section.12.3}% -\contentsline {subsubsection}{\numberline 
{12.3.0.1}A note on how to aggregate}{244}{subsubsection.12.3.0.1}% -\contentsline {section}{\numberline {12.4}Assessing true SEs}{245}{section.12.4}% -\contentsline {subsubsection}{\numberline {12.4.0.1}Standardizing to compare across simulation scenarios}{247}{subsubsection.12.4.0.1}% -\contentsline {section}{\numberline {12.5}The Bias-SE-RMSE plot}{251}{section.12.5}% -\contentsline {section}{\numberline {12.6}Assessing estimated SEs}{252}{section.12.6}% -\contentsline {section}{\numberline {12.7}Assessing confidence intervals}{255}{section.12.7}% -\contentsline {section}{\numberline {12.8}Exercises}{258}{section.12.8}% -\contentsline {subsection}{\numberline {12.8.1}Assessing uncertainty}{258}{subsection.12.8.1}% -\contentsline {subsection}{\numberline {12.8.2}Assessing power}{258}{subsection.12.8.2}% -\contentsline {subsection}{\numberline {12.8.3}Going deeper with coverage}{258}{subsection.12.8.3}% -\contentsline {subsection}{\numberline {12.8.4}Pearson correlations with a bivariate Poisson distribution}{258}{subsection.12.8.4}% -\contentsline {chapter}{\numberline {13}Special Topics on Reporting Simulation Results}{259}{chapter.13}% -\contentsline {section}{\numberline {13.1}Using regression to analyze simulation results}{259}{section.13.1}% -\contentsline {subsection}{\numberline {13.1.1}Example 1: Biserial, revisited}{259}{subsection.13.1.1}% -\contentsline {subsection}{\numberline {13.1.2}Example 2: Cluster RCT example, revisited}{262}{subsection.13.1.2}% -\contentsline {subsubsection}{\numberline {13.1.2.1}Using LASSO to simplify the model}{264}{subsubsection.13.1.2.1}% -\contentsline {subsubsection}{\numberline {13.1.2.2}Fitting models to each method}{267}{subsubsection.13.1.2.2}% -\contentsline {section}{\numberline {13.2}Using regression trees to find important factors}{271}{section.13.2}% -\contentsline {section}{\numberline {13.3}Analyzing results with few iterations per scenario}{273}{section.13.3}% -\contentsline {subsection}{\numberline 
{13.3.1}Example: ClusterRCT with only 100 replicates per scenario}{274}{subsection.13.3.1}% -\contentsline {section}{\numberline {13.4}What to do with warnings in simulations}{278}{section.13.4}% -\contentsline {chapter}{\numberline {14}Case study: Comparing different estimators}{283}{chapter.14}% -\contentsline {section}{\numberline {14.1}Bias-variance tradeoffs}{286}{section.14.1}% -\contentsline {chapter}{\numberline {15}Simulations as evidence}{291}{chapter.15}% -\contentsline {section}{\numberline {15.1}Strategies for making relevant simulations}{292}{section.15.1}% -\contentsline {subsection}{\numberline {15.1.1}Break symmetries and regularities}{292}{subsection.15.1.1}% -\contentsline {subsection}{\numberline {15.1.2}Make your simulation general with an extensive multi-factor experiment}{293}{subsection.15.1.2}% -\contentsline {subsection}{\numberline {15.1.3}Use previously published simulations to beat them at their own game}{293}{subsection.15.1.3}% -\contentsline {subsection}{\numberline {15.1.4}Calibrate simulation factors to real data}{293}{subsection.15.1.4}% -\contentsline {subsection}{\numberline {15.1.5}Use real data to obtain directly}{294}{subsection.15.1.5}% -\contentsline {subsection}{\numberline {15.1.6}Fully calibrated simulations}{294}{subsection.15.1.6}% -\contentsline {part}{IV\hspace {1em}Computational Considerations}{297}{part.4}% -\contentsline {chapter}{\numberline {16}Organizing a simulation project}{299}{chapter.16}% -\contentsline {section}{\numberline {16.1}Well structured R scripts}{300}{section.16.1}% -\contentsline {subsection}{\numberline {16.1.1}The source command}{300}{subsection.16.1.1}% -\contentsline {subsection}{\numberline {16.1.2}Putting headers in your .R file}{301}{subsection.16.1.2}% -\contentsline {subsection}{\numberline {16.1.3}Storing testing code in your scripts}{302}{subsection.16.1.3}% -\contentsline {section}{\numberline {16.2}Principled directory structures}{302}{section.16.2}% -\contentsline 
{section}{\numberline {16.3}Saving simulation results}{303}{section.16.3}% -\contentsline {subsection}{\numberline {16.3.1}Saving simulations in general}{303}{subsection.16.3.1}% -\contentsline {subsection}{\numberline {16.3.2}Saving simulations as you go}{304}{subsection.16.3.2}% -\contentsline {subsection}{\numberline {16.3.3}Dynamically making directories}{307}{subsection.16.3.3}% -\contentsline {subsection}{\numberline {16.3.4}Loading and combining files of simulation results}{308}{subsection.16.3.4}% -\contentsline {chapter}{\numberline {17}Parallel Processing}{309}{chapter.17}% -\contentsline {section}{\numberline {17.1}Parallel on your computer}{310}{section.17.1}% -\contentsline {section}{\numberline {17.2}Parallel on a virtual machine}{311}{section.17.2}% -\contentsline {section}{\numberline {17.3}Parallel on a cluster}{312}{section.17.3}% -\contentsline {subsection}{\numberline {17.3.1}What is a command-line interface?}{312}{subsection.17.3.1}% -\contentsline {subsection}{\numberline {17.3.2}Running a job on a cluster}{314}{subsection.17.3.2}% -\contentsline {subsection}{\numberline {17.3.3}Checking on a job}{316}{subsection.17.3.3}% -\contentsline {subsection}{\numberline {17.3.4}Running lots of jobs on a cluster}{317}{subsection.17.3.4}% -\contentsline {subsection}{\numberline {17.3.5}Resources for Harvard's Odyssey}{319}{subsection.17.3.5}% -\contentsline {subsection}{\numberline {17.3.6}Acknowledgements}{320}{subsection.17.3.6}% -\contentsline {chapter}{\numberline {18}Debugging and Testing}{321}{chapter.18}% -\contentsline {section}{\numberline {18.1}Debugging with \texttt {print()}}{321}{section.18.1}% -\contentsline {section}{\numberline {18.2}Debugging with \texttt {browser()}}{322}{section.18.2}% -\contentsline {section}{\numberline {18.3}Debugging with \texttt {debug()}}{323}{section.18.3}% -\contentsline {section}{\numberline {18.4}Protecting functions with \texttt {stop()}}{323}{section.18.4}% -\contentsline {section}{\numberline {18.5}Testing 
code}{325}{section.18.5}% -\contentsline {part}{V\hspace {1em}Complex Data Structures}{329}{part.5}% -\contentsline {chapter}{\numberline {19}Using simulation as a power calculator}{331}{chapter.19}% -\contentsline {section}{\numberline {19.1}Getting design parameters from pilot data}{332}{section.19.1}% -\contentsline {section}{\numberline {19.2}The data generating process}{333}{section.19.2}% -\contentsline {section}{\numberline {19.3}Running the simulation}{337}{section.19.3}% -\contentsline {section}{\numberline {19.4}Evaluating power}{338}{section.19.4}% -\contentsline {subsection}{\numberline {19.4.1}Checking validity of our models}{338}{subsection.19.4.1}% -\contentsline {subsection}{\numberline {19.4.2}Assessing Precision (SE)}{341}{subsection.19.4.2}% -\contentsline {subsection}{\numberline {19.4.3}Assessing power}{341}{subsection.19.4.3}% -\contentsline {subsection}{\numberline {19.4.4}Assessing Minimum Detectable Effects}{342}{subsection.19.4.4}% -\contentsline {section}{\numberline {19.5}Power for Multilevel Data}{343}{section.19.5}% -\contentsline {chapter}{\numberline {20}Simulation under the Potential Outcomes Framework}{347}{chapter.20}% -\contentsline {section}{\numberline {20.1}Finite vs.~Superpopulation inference}{348}{section.20.1}% -\contentsline {section}{\numberline {20.2}Data generation processes for potential outcomes}{348}{section.20.2}% -\contentsline {section}{\numberline {20.3}Finite sample performance measures}{351}{section.20.3}% -\contentsline {section}{\numberline {20.4}Nested finite simulation procedure}{354}{section.20.4}% -\contentsline {chapter}{\numberline {21}The Parametric bootstrap}{359}{chapter.21}% -\contentsline {section}{\numberline {21.1}Air conditioners: a stolen case study}{360}{section.21.1}% -\contentsline {chapter}{\numberline {A}Coding Reference}{363}{appendix.A}% -\contentsline {section}{\numberline {A.1}How to repeat yourself}{363}{section.A.1}% -\contentsline {subsection}{\numberline {A.1.1}Using \texttt 
{replicate()}}{363}{subsection.A.1.1}%
-\contentsline {subsection}{\numberline {A.1.2}Using \texttt {map()}}{365}{subsection.A.1.2}%
-\contentsline {subsection}{\numberline {A.1.3}map with no inputs}{366}{subsection.A.1.3}%
-\contentsline {subsection}{\numberline {A.1.4}Other approaches for repetition}{367}{subsection.A.1.4}%
-\contentsline {section}{\numberline {A.2}Default arguments for functions}{367}{section.A.2}%
-\contentsline {section}{\numberline {A.3}Profiling Code}{369}{section.A.3}%
-\contentsline {subsection}{\numberline {A.3.1}Using \texttt {Sys.time()} and \texttt {system.time()}}{369}{subsection.A.3.1}%
-\contentsline {subsection}{\numberline {A.3.2}The \texttt {tictoc} package}{370}{subsection.A.3.2}%
-\contentsline {subsection}{\numberline {A.3.3}The \texttt {bench} package}{370}{subsection.A.3.3}%
-\contentsline {subsection}{\numberline {A.3.4}Profiling with \texttt {profvis}}{373}{subsection.A.3.4}%
-\contentsline {section}{\numberline {A.4}Optimizing code (and why you often shouldn't)}{373}{section.A.4}%
-\contentsline {subsection}{\numberline {A.4.1}Hand-building functions}{374}{subsection.A.4.1}%
-\contentsline {subsection}{\numberline {A.4.2}Computational efficiency versus simplicity}{375}{subsection.A.4.2}%
-\contentsline {subsection}{\numberline {A.4.3}Reusing code to speed up computation}{376}{subsection.A.4.3}%
-\contentsline {chapter}{\numberline {B}Further readings and resources}{383}{appendix.B}%
+\contentsline {chapter}{Welcome}{7}{chapter*.2}%
+\contentsline {section}{License}{8}{section*.3}%
+\contentsline {section}{About the authors}{8}{section*.4}%
+\contentsline {section}{Acknowledgements}{9}{section*.5}%
+\contentsline {part}{I\hspace {1em}An Introductory Look}{11}{part.1}%
+\contentsline {chapter}{\numberline {1}Introduction}{13}{chapter.1}%
+\contentsline {section}{\numberline {1.1}Some of simulation's many uses}{14}{section.1.1}%
+\contentsline {subsection}{\numberline {1.1.1}Comparing statistical approaches}{15}{subsection.1.1.1}%
+\contentsline {subsection}{\numberline {1.1.2}Assessing performance of complex pipelines}{15}{subsection.1.1.2}%
+\contentsline {subsection}{\numberline {1.1.3}Assessing performance under misspecification}{16}{subsection.1.1.3}%
+\contentsline {subsection}{\numberline {1.1.4}Assessing the finite-sample performance of a statistical approach}{16}{subsection.1.1.4}%
+\contentsline {subsection}{\numberline {1.1.5}Conducting Power Analyses}{17}{subsection.1.1.5}%
+\contentsline {subsection}{\numberline {1.1.6}Simulating processess}{18}{subsection.1.1.6}%
+\contentsline {section}{\numberline {1.2}The perils of simulation as evidence}{19}{section.1.2}%
+\contentsline {section}{\numberline {1.3}Simulating to learn}{21}{section.1.3}%
+\contentsline {section}{\numberline {1.4}Why R?}{22}{section.1.4}%
+\contentsline {section}{\numberline {1.5}Organization of the text}{23}{section.1.5}%
+\contentsline {chapter}{\numberline {2}Programming Preliminaries}{25}{chapter.2}%
+\contentsline {section}{\numberline {2.1}Welcome to the tidyverse}{25}{section.2.1}%
+\contentsline {section}{\numberline {2.2}Functions}{26}{section.2.2}%
+\contentsline {subsection}{\numberline {2.2.1}Rolling your own}{26}{subsection.2.2.1}%
+\contentsline {subsection}{\numberline {2.2.2}A dangerous function}{27}{subsection.2.2.2}%
+\contentsline {subsection}{\numberline {2.2.3}Using Named Arguments}{30}{subsection.2.2.3}%
+\contentsline {subsection}{\numberline {2.2.4}Argument Defaults}{31}{subsection.2.2.4}%
+\contentsline {subsection}{\numberline {2.2.5}Function skeletons}{32}{subsection.2.2.5}%
+\contentsline {section}{\numberline {2.3}\texttt {\textbackslash {}\textgreater {}} (Pipe) dreams}{32}{section.2.3}%
+\contentsline {section}{\numberline {2.4}Recipes versus Patterns}{33}{section.2.4}%
+\contentsline {section}{\numberline {2.5}Exercises}{34}{section.2.5}%
+\contentsline {chapter}{\numberline {3}An initial simulation}{37}{chapter.3}%
+\contentsline {section}{\numberline {3.1}Simulating a single scenario}{39}{section.3.1}%
+\contentsline {section}{\numberline {3.2}A non-normal population distribution}{41}{section.3.2}%
+\contentsline {section}{\numberline {3.3}Simulating across different scenarios}{42}{section.3.3}%
+\contentsline {section}{\numberline {3.4}Extending the simulation design}{45}{section.3.4}%
+\contentsline {section}{\numberline {3.5}Exercises}{46}{section.3.5}%
+\contentsline {part}{II\hspace {1em}Structure and Mechanics of a Simulation Study}{49}{part.2}%
+\contentsline {chapter}{\numberline {4}Structure of a simulation study}{51}{chapter.4}%
+\contentsline {section}{\numberline {4.1}General structure of a simulation}{51}{section.4.1}%
+\contentsline {section}{\numberline {4.2}Tidy, modular simulations}{53}{section.4.2}%
+\contentsline {section}{\numberline {4.3}Skeleton of a simulation study}{54}{section.4.3}%
+\contentsline {subsection}{\numberline {4.3.1}Data-Generating Process}{56}{subsection.4.3.1}%
+\contentsline {subsection}{\numberline {4.3.2}Data Analysis Procedure}{56}{subsection.4.3.2}%
+\contentsline {subsection}{\numberline {4.3.3}Repetition}{57}{subsection.4.3.3}%
+\contentsline {subsection}{\numberline {4.3.4}Performance summaries}{58}{subsection.4.3.4}%
+\contentsline {subsection}{\numberline {4.3.5}Multifactor simulations}{59}{subsection.4.3.5}%
+\contentsline {section}{\numberline {4.4}Exercises}{60}{section.4.4}%
+\contentsline {chapter}{\numberline {5}Case Study: Heteroskedastic ANOVA and Welch}{61}{chapter.5}%
+\contentsline {section}{\numberline {5.1}The data-generating model}{64}{section.5.1}%
+\contentsline {subsection}{\numberline {5.1.1}Now make a function}{66}{subsection.5.1.1}%
+\contentsline {subsection}{\numberline {5.1.2}Cautious coding}{67}{subsection.5.1.2}%
+\contentsline {section}{\numberline {5.2}The hypothesis testing procedures}{68}{section.5.2}%
+\contentsline {section}{\numberline {5.3}Running the simulation}{69}{section.5.3}%
+\contentsline {section}{\numberline {5.4}Summarizing test performance}{70}{section.5.4}%
+\contentsline {section}{\numberline {5.5}Exercises}{72}{section.5.5}%
+\contentsline {subsection}{\numberline {5.5.1}Other \(\alpha \)'s}{72}{subsection.5.5.1}%
+\contentsline {subsection}{\numberline {5.5.2}Compare results}{72}{subsection.5.5.2}%
+\contentsline {subsection}{\numberline {5.5.3}Power}{72}{subsection.5.5.3}%
+\contentsline {subsection}{\numberline {5.5.4}Wide or long?}{72}{subsection.5.5.4}%
+\contentsline {subsection}{\numberline {5.5.5}Other tests}{73}{subsection.5.5.5}%
+\contentsline {subsection}{\numberline {5.5.6}Methodological extensions}{73}{subsection.5.5.6}%
+\contentsline {subsection}{\numberline {5.5.7}Power analysis}{73}{subsection.5.5.7}%
+\contentsline {chapter}{\numberline {6}Data-generating processes}{75}{chapter.6}%
+\contentsline {section}{\numberline {6.1}Examples}{75}{section.6.1}%
+\contentsline {subsection}{\numberline {6.1.1}Example 1: One-way analysis of variance}{76}{subsection.6.1.1}%
+\contentsline {subsection}{\numberline {6.1.2}Example 2: Bivariate Poisson model}{76}{subsection.6.1.2}%
+\contentsline {subsection}{\numberline {6.1.3}Example 3: Hierarchical linear model for a cluster-randomized trial}{76}{subsection.6.1.3}%
+\contentsline {section}{\numberline {6.2}Components of a DGP}{77}{section.6.2}%
+\contentsline {section}{\numberline {6.3}A statistical model is a recipe for data generation}{80}{section.6.3}%
+\contentsline {section}{\numberline {6.4}Plot the artificial data}{82}{section.6.4}%
+\contentsline {section}{\numberline {6.5}Check the data-generating function}{83}{section.6.5}%
+\contentsline {section}{\numberline {6.6}Example: Simulating clustered data}{85}{section.6.6}%
+\contentsline {subsection}{\numberline {6.6.1}A design decision: What do we want to manipulate?}{85}{subsection.6.6.1}%
+\contentsline {subsection}{\numberline {6.6.2}A model for a cluster RCT}{86}{subsection.6.6.2}%
+\contentsline {subsection}{\numberline {6.6.3}From equations to code}{89}{subsection.6.6.3}%
+\contentsline {subsection}{\numberline {6.6.4}Standardization in the DGP}{91}{subsection.6.6.4}%
+\contentsline {section}{\numberline {6.7}Sometimes a DGP is all you need}{93}{section.6.7}%
+\contentsline {section}{\numberline {6.8}More to explore}{98}{section.6.8}%
+\contentsline {section}{\numberline {6.9}Exercises}{98}{section.6.9}%
+\contentsline {subsection}{\numberline {6.9.1}The Welch test on a shifted-and-scaled \(t\) distribution}{98}{subsection.6.9.1}%
+\contentsline {subsection}{\numberline {6.9.2}Plot the bivariate Poisson}{99}{subsection.6.9.2}%
+\contentsline {subsection}{\numberline {6.9.3}Check the bivariate Poisson function}{99}{subsection.6.9.3}%
+\contentsline {subsection}{\numberline {6.9.4}Add error-catching to the bivariate Poisson function}{100}{subsection.6.9.4}%
+\contentsline {subsection}{\numberline {6.9.5}A bivariate negative binomial distribution}{100}{subsection.6.9.5}%
+\contentsline {subsection}{\numberline {6.9.6}Another bivariate negative binomial distribution}{101}{subsection.6.9.6}%
+\contentsline {subsection}{\numberline {6.9.7}Plot the data from a cluster-randomized trial}{102}{subsection.6.9.7}%
+\contentsline {subsection}{\numberline {6.9.8}Checking the Cluster RCT DGP}{102}{subsection.6.9.8}%
+\contentsline {subsection}{\numberline {6.9.9}More school-level variation}{102}{subsection.6.9.9}%
+\contentsline {subsection}{\numberline {6.9.10}Cluster-randomized trial with baseline predictors}{102}{subsection.6.9.10}%
+\contentsline {subsection}{\numberline {6.9.11}3-parameter IRT datasets}{103}{subsection.6.9.11}%
+\contentsline {subsection}{\numberline {6.9.12}Check the 3-parameter IRT DGP}{104}{subsection.6.9.12}%
+\contentsline {subsection}{\numberline {6.9.13}Explore the 3-parameter IRT model}{104}{subsection.6.9.13}%
+\contentsline {subsection}{\numberline {6.9.14}Random effects meta-regression}{104}{subsection.6.9.14}%
+\contentsline {subsection}{\numberline {6.9.15}Meta-regression with selective reporting}{105}{subsection.6.9.15}%
+\contentsline {chapter}{\numberline {7}Data analysis procedures}{107}{chapter.7}%
+\contentsline {section}{\numberline {7.1}Writing estimation functions}{108}{section.7.1}%
+\contentsline {section}{\numberline {7.2}Including Multiple Data Analysis Procedures}{110}{section.7.2}%
+\contentsline {section}{\numberline {7.3}Validating an Estimation Function}{114}{section.7.3}%
+\contentsline {subsection}{\numberline {7.3.1}Checking against existing implementations}{115}{subsection.7.3.1}%
+\contentsline {subsection}{\numberline {7.3.2}Checking novel procedures}{116}{subsection.7.3.2}%
+\contentsline {subsection}{\numberline {7.3.3}Checking with simulations}{119}{subsection.7.3.3}%
+\contentsline {section}{\numberline {7.4}Handling errors, warnings, and other hiccups}{121}{section.7.4}%
+\contentsline {subsection}{\numberline {7.4.1}Capturing errors and warnings}{121}{subsection.7.4.1}%
+\contentsline {subsection}{\numberline {7.4.2}Adapting estimation procedures for errors and warnings}{127}{subsection.7.4.2}%
+\contentsline {section}{\numberline {7.5}Exercises}{130}{section.7.5}%
+\contentsline {subsection}{\numberline {7.5.1}More Heteroskedastic ANOVA}{130}{subsection.7.5.1}%
+\contentsline {subsection}{\numberline {7.5.2}Contingent testing}{131}{subsection.7.5.2}%
+\contentsline {subsection}{\numberline {7.5.3}Check the cluster-RCT functions}{131}{subsection.7.5.3}%
+\contentsline {subsection}{\numberline {7.5.4}Extending the cluster-RCT functions}{131}{subsection.7.5.4}%
+\contentsline {subsection}{\numberline {7.5.5}Contingent estimator processing}{132}{subsection.7.5.5}%
+\contentsline {subsection}{\numberline {7.5.6}Estimating 3-parameter item response theory models}{132}{subsection.7.5.6}%
+\contentsline {subsection}{\numberline {7.5.7}Meta-regression with selective reporting}{133}{subsection.7.5.7}%
+\contentsline {chapter}{\numberline {8}Running the Simulation Process}{137}{chapter.8}%
+\contentsline {section}{\numberline {8.1}Repeating oneself}{137}{section.8.1}%
+\contentsline {section}{\numberline {8.2}One run at a time}{138}{section.8.2}%
+\contentsline {subsection}{\numberline {8.2.1}Reparameterizing}{141}{subsection.8.2.1}%
+\contentsline {section}{\numberline {8.3}Bundling simulations with \texttt {simhelpers}}{142}{section.8.3}%
+\contentsline {section}{\numberline {8.4}Seeds and pseudo-random number generators}{143}{section.8.4}%
+\contentsline {section}{\numberline {8.5}Exercises}{146}{section.8.5}%
+\contentsline {subsection}{\numberline {8.5.1}Welch simulations}{146}{subsection.8.5.1}%
+\contentsline {subsection}{\numberline {8.5.2}Compare sampling distributions of Pearson's correlation coefficients}{146}{subsection.8.5.2}%
+\contentsline {subsection}{\numberline {8.5.3}Reparameterization, redux}{147}{subsection.8.5.3}%
+\contentsline {subsection}{\numberline {8.5.4}Fancy clustered RCT simulations}{147}{subsection.8.5.4}%
+\contentsline {chapter}{\numberline {9}Performance metrics}{149}{chapter.9}%
+\contentsline {section}{\numberline {9.1}Metrics for Point Estimators}{151}{section.9.1}%
+\contentsline {subsection}{\numberline {9.1.1}Comparing the Performance of the Cluster RCT Estimation Procedures}{153}{subsection.9.1.1}%
+\contentsline {subsubsection}{Are the estimators biased?}{154}{section*.12}%
+\contentsline {subsubsection}{Which method has the smallest standard error?}{154}{section*.13}%
+\contentsline {subsubsection}{Which method has the smallest Root Mean Squared Error?}{155}{section*.14}%
+\contentsline {subsection}{\numberline {9.1.2}Less Conventional Performance metrics}{156}{subsection.9.1.2}%
+\contentsline {section}{\numberline {9.2}Metrics for Standard Error Estimators}{158}{section.9.2}%
+\contentsline {subsection}{\numberline {9.2.1}Satterthwaite degrees of freedom}{160}{subsection.9.2.1}%
+\contentsline {subsection}{\numberline {9.2.2}Assessing SEs for the Cluster RCT Simulation}{161}{subsection.9.2.2}%
+\contentsline {section}{\numberline {9.3}Metrics for Confidence Intervals}{162}{section.9.3}%
+\contentsline {subsection}{\numberline {9.3.1}Confidence Intervals in the Cluster RCT Simulation}{163}{subsection.9.3.1}%
+\contentsline {section}{\numberline {9.4}Metrics for Inferential Procedures (Hypothesis Tests)}{164}{section.9.4}%
+\contentsline {subsection}{\numberline {9.4.1}Validity}{165}{subsection.9.4.1}%
+\contentsline {subsection}{\numberline {9.4.2}Power}{165}{subsection.9.4.2}%
+\contentsline {subsection}{\numberline {9.4.3}The Rejection Rate}{166}{subsection.9.4.3}%
+\contentsline {subsection}{\numberline {9.4.4}Inference in the Cluster RCT Simulation}{167}{subsection.9.4.4}%
+\contentsline {section}{\numberline {9.5}Selecting Relative vs.\nobreakspace {}Absolute Metrics}{169}{section.9.5}%
+\contentsline {section}{\numberline {9.6}Estimands Not Represented By a Parameter}{170}{section.9.6}%
+\contentsline {section}{\numberline {9.7}Uncertainty in Performance Estimates (the Monte Carlo Standard Error)}{173}{section.9.7}%
+\contentsline {subsection}{\numberline {9.7.1}MCSE for Relative Variance Estimators}{174}{subsection.9.7.1}%
+\contentsline {subsection}{\numberline {9.7.2}Calculating MCSEs With the \texttt {simhelpers} Package}{175}{subsection.9.7.2}%
+\contentsline {subsection}{\numberline {9.7.3}MCSE Calculation in our Cluster RCT Example}{176}{subsection.9.7.3}%
+\contentsline {section}{\numberline {9.8}Summary of Peformance Measures}{177}{section.9.8}%
+\contentsline {section}{\numberline {9.9}Concluding thoughts}{178}{section.9.9}%
+\contentsline {section}{\numberline {9.10}Exercises}{178}{section.9.10}%
+\contentsline {subsection}{\numberline {9.10.1}Brown and Forsythe (1974)}{178}{subsection.9.10.1}%
+\contentsline {subsection}{\numberline {9.10.2}Better confidence intervals}{178}{subsection.9.10.2}%
+\contentsline {subsection}{\numberline {9.10.3}Cluster RCT simulation under a strong null hypothesis}{179}{subsection.9.10.3}%
+\contentsline {subsection}{\numberline {9.10.4}Jackknife calculation of MCSEs}{179}{subsection.9.10.4}%
+\contentsline {subsection}{\numberline {9.10.5}Distribution theory for person-level average treatment effects}{179}{subsection.9.10.5}%
+\contentsline {subsection}{\numberline {9.10.6}Multiple scenarios}{179}{subsection.9.10.6}%
+\contentsline {part}{III\hspace {1em}Multifactor Simulations}{181}{part.3}%
+\contentsline {chapter}{\numberline {10}Designing and executing multifactor simulations}{183}{chapter.10}%
+\contentsline {section}{\numberline {10.1}Choosing parameter combinations}{185}{section.10.1}%
+\contentsline {section}{\numberline {10.2}Using pmap to run multifactor simulations}{187}{section.10.2}%
+\contentsline {section}{\numberline {10.3}When to calculate performance metrics}{191}{section.10.3}%
+\contentsline {subsection}{\numberline {10.3.1}Aggregate as you simulate (inside)}{191}{subsection.10.3.1}%
+\contentsline {subsection}{\numberline {10.3.2}Keep all simulation runs (outside)}{192}{subsection.10.3.2}%
+\contentsline {subsection}{\numberline {10.3.3}Getting raw results ready for analysis}{193}{subsection.10.3.3}%
+\contentsline {section}{\numberline {10.4}Summary}{195}{section.10.4}%
+\contentsline {section}{\numberline {10.5}Case Study: A multifactor evaluation of cluster RCT estimators}{196}{section.10.5}%
+\contentsline {subsection}{\numberline {10.5.1}Choosing parameters for the Clustered RCT}{196}{subsection.10.5.1}%
+\contentsline {subsection}{\numberline {10.5.2}Redundant factor combinations}{198}{subsection.10.5.2}%
+\contentsline {subsection}{\numberline {10.5.3}Running the simulations}{198}{subsection.10.5.3}%
+\contentsline {subsection}{\numberline {10.5.4}Calculating performance metrics}{199}{subsection.10.5.4}%
+\contentsline {section}{\numberline {10.6}Exercises}{200}{section.10.6}%
+\contentsline {subsection}{\numberline {10.6.1}Brown and Forsythe redux}{200}{subsection.10.6.1}%
+\contentsline {subsection}{\numberline {10.6.2}Meta-regression}{201}{subsection.10.6.2}%
+\contentsline {subsection}{\numberline {10.6.3}Comparing the trimmed mean, median and mean}{201}{subsection.10.6.3}%
+\contentsline {chapter}{\numberline {11}Exploring and presenting simulation results}{203}{chapter.11}%
+\contentsline {section}{\numberline {11.1}Tabulation}{204}{section.11.1}%
+\contentsline {subsection}{\numberline {11.1.1}Example: estimators of treatment variation}{206}{subsection.11.1.1}%
+\contentsline {section}{\numberline {11.2}Visualization}{207}{section.11.2}%
+\contentsline {subsection}{\numberline {11.2.1}Example 0: RMSE in Cluster RCTs}{208}{subsection.11.2.1}%
+\contentsline {subsection}{\numberline {11.2.2}Example 1: Biserial correlation estimation}{209}{subsection.11.2.2}%
+\contentsline {subsection}{\numberline {11.2.3}Example 2: Variance estimation and Meta-regression}{209}{subsection.11.2.3}%
+\contentsline {subsection}{\numberline {11.2.4}Example 3: Heat maps of coverage}{210}{subsection.11.2.4}%
+\contentsline {subsection}{\numberline {11.2.5}Example 4: Relative performance of treatment effect estimators}{211}{subsection.11.2.5}%
+\contentsline {section}{\numberline {11.3}Modeling}{213}{section.11.3}%
+\contentsline {subsection}{\numberline {11.3.1}Example 1: Biserial, revisited}{214}{subsection.11.3.1}%
+\contentsline {subsection}{\numberline {11.3.2}Example 2: Comparing methods for cross-classified data}{215}{subsection.11.3.2}%
+\contentsline {section}{\numberline {11.4}Reporting}{216}{section.11.4}%
+\contentsline {chapter}{\numberline {12}Building good visualizations}{219}{chapter.12}%
+\contentsline {section}{\numberline {12.1}Subsetting and Many Small Multiples}{220}{section.12.1}%
+\contentsline {section}{\numberline {12.2}Bundling}{223}{section.12.2}%
+\contentsline {section}{\numberline {12.3}Aggregation}{227}{section.12.3}%
+\contentsline {subsubsection}{\numberline {12.3.0.1}Some notes on how to aggregate}{229}{subsubsection.12.3.0.1}%
+\contentsline {section}{\numberline {12.4}Comparing true SEs with standardization}{230}{section.12.4}%
+\contentsline {section}{\numberline {12.5}The Bias-SE-RMSE plot}{235}{section.12.5}%
+\contentsline {section}{\numberline {12.6}Assessing the quality of the estimated SEs}{237}{section.12.6}%
+\contentsline {subsection}{\numberline {12.6.1}Stability of estimated SEs}{239}{subsection.12.6.1}%
+\contentsline {section}{\numberline {12.7}Assessing confidence intervals}{240}{section.12.7}%
+\contentsline {section}{\numberline {12.8}Exercises}{242}{section.12.8}%
+\contentsline {subsection}{\numberline {12.8.1}Assessing uncertainty}{242}{subsection.12.8.1}%
+\contentsline {subsection}{\numberline {12.8.2}Assessing power}{242}{subsection.12.8.2}%
+\contentsline {subsection}{\numberline {12.8.3}Going deeper with coverage}{242}{subsection.12.8.3}%
+\contentsline {subsection}{\numberline {12.8.4}Pearson correlations with a bivariate Poisson distribution}{243}{subsection.12.8.4}%
+\contentsline {subsection}{\numberline {12.8.5}Making another plot for assessing SEs}{243}{subsection.12.8.5}%
+\contentsline {chapter}{\numberline {13}Special Topics on Reporting Simulation Results}{245}{chapter.13}%
+\contentsline {section}{\numberline {13.1}Using regression to analyze simulation results}{245}{section.13.1}%
+\contentsline {subsection}{\numberline {13.1.1}Example 1: Biserial, revisited}{245}{subsection.13.1.1}%
+\contentsline {subsection}{\numberline {13.1.2}Example 2: Cluster RCT example, revisited}{248}{subsection.13.1.2}%
+\contentsline {subsubsection}{\numberline {13.1.2.1}Using LASSO to simplify the model}{249}{subsubsection.13.1.2.1}%
+\contentsline {section}{\numberline {13.2}Using regression trees to find important factors}{254}{section.13.2}%
+\contentsline {section}{\numberline {13.3}Analyzing results with few iterations per scenario}{256}{section.13.3}%
+\contentsline {subsection}{\numberline {13.3.1}Example: ClusterRCT with only 100 replicates per scenario}{257}{subsection.13.3.1}%
+\contentsline {section}{\numberline {13.4}What to do with warnings in simulations}{263}{section.13.4}%
+\contentsline {chapter}{\numberline {14}Case study: Comparing different estimators}{267}{chapter.14}%
+\contentsline {section}{\numberline {14.1}Bias-variance tradeoffs}{270}{section.14.1}%
+\contentsline {chapter}{\numberline {15}Simulations as evidence}{275}{chapter.15}%
+\contentsline {section}{\numberline {15.1}Strategies for making relevant simulations}{276}{section.15.1}%
+\contentsline {subsection}{\numberline {15.1.1}Break symmetries and regularities}{276}{subsection.15.1.1}%
+\contentsline {subsection}{\numberline {15.1.2}Make your simulation general with an extensive multi-factor experiment}{277}{subsection.15.1.2}%
+\contentsline {subsection}{\numberline {15.1.3}Use previously published simulations to beat them at their own game}{277}{subsection.15.1.3}%
+\contentsline {subsection}{\numberline {15.1.4}Calibrate simulation factors to real data}{277}{subsection.15.1.4}%
+\contentsline {subsection}{\numberline {15.1.5}Use real data to obtain directly}{277}{subsection.15.1.5}%
+\contentsline {subsection}{\numberline {15.1.6}Fully calibrated simulations}{278}{subsection.15.1.6}%
+\contentsline {part}{IV\hspace {1em}Computational Considerations}{281}{part.4}%
+\contentsline {chapter}{\numberline {16}Organizing a simulation project}{283}{chapter.16}%
+\contentsline {section}{\numberline {16.1}Well structured R scripts}{284}{section.16.1}%
+\contentsline {subsection}{\numberline {16.1.1}The source command}{284}{subsection.16.1.1}%
+\contentsline {subsection}{\numberline {16.1.2}Putting headers in your .R file}{285}{subsection.16.1.2}%
+\contentsline {subsection}{\numberline {16.1.3}Storing testing code in your scripts}{286}{subsection.16.1.3}%
+\contentsline {section}{\numberline {16.2}Principled directory structures}{286}{section.16.2}%
+\contentsline {section}{\numberline {16.3}Saving simulation results}{287}{section.16.3}%
+\contentsline {subsection}{\numberline {16.3.1}Saving simulations in general}{287}{subsection.16.3.1}%
+\contentsline {subsection}{\numberline {16.3.2}Saving simulations as you go}{288}{subsection.16.3.2}%
+\contentsline {subsection}{\numberline {16.3.3}Dynamically making directories}{291}{subsection.16.3.3}%
+\contentsline {subsection}{\numberline {16.3.4}Loading and combining files of simulation results}{291}{subsection.16.3.4}%
+\contentsline {chapter}{\numberline {17}Parallel Processing}{293}{chapter.17}%
+\contentsline {section}{\numberline {17.1}Parallel on your computer}{294}{section.17.1}%
+\contentsline {section}{\numberline {17.2}Parallel on a virtual machine}{295}{section.17.2}%
+\contentsline {section}{\numberline {17.3}Parallel on a cluster}{295}{section.17.3}%
+\contentsline {subsection}{\numberline {17.3.1}What is a command-line interface?}{296}{subsection.17.3.1}%
+\contentsline {subsection}{\numberline {17.3.2}Running a job on a cluster}{298}{subsection.17.3.2}%
+\contentsline {subsection}{\numberline {17.3.3}Checking on a job}{300}{subsection.17.3.3}%
+\contentsline {subsection}{\numberline {17.3.4}Running lots of jobs on a cluster}{300}{subsection.17.3.4}%
+\contentsline {subsection}{\numberline {17.3.5}Resources for Harvard's Odyssey}{303}{subsection.17.3.5}%
+\contentsline {subsection}{\numberline {17.3.6}Acknowledgements}{303}{subsection.17.3.6}%
+\contentsline {chapter}{\numberline {18}Debugging and Testing}{305}{chapter.18}%
+\contentsline {section}{\numberline {18.1}Debugging with \texttt {print()}}{305}{section.18.1}%
+\contentsline {section}{\numberline {18.2}Debugging with \texttt {browser()}}{306}{section.18.2}%
+\contentsline {section}{\numberline {18.3}Debugging with \texttt {debug()}}{307}{section.18.3}%
+\contentsline {section}{\numberline {18.4}Protecting functions with \texttt {stop()}}{307}{section.18.4}%
+\contentsline {section}{\numberline {18.5}Testing code}{308}{section.18.5}%
+\contentsline {part}{V\hspace {1em}Complex Data Structures}{313}{part.5}%
+\contentsline {chapter}{\numberline {19}Using simulation as a power calculator}{315}{chapter.19}%
+\contentsline {section}{\numberline {19.1}Getting design parameters from pilot data}{316}{section.19.1}%
+\contentsline {section}{\numberline {19.2}The data generating process}{317}{section.19.2}%
+\contentsline {section}{\numberline {19.3}Running the simulation}{321}{section.19.3}%
+\contentsline {section}{\numberline {19.4}Evaluating power}{322}{section.19.4}%
+\contentsline {subsection}{\numberline {19.4.1}Checking validity of our models}{322}{subsection.19.4.1}%
+\contentsline {subsection}{\numberline {19.4.2}Assessing Precision (SE)}{324}{subsection.19.4.2}%
+\contentsline {subsection}{\numberline {19.4.3}Assessing power}{325}{subsection.19.4.3}%
+\contentsline {subsection}{\numberline {19.4.4}Assessing Minimum Detectable Effects}{326}{subsection.19.4.4}%
+\contentsline {section}{\numberline {19.5}Power for Multilevel Data}{327}{section.19.5}%
+\contentsline {chapter}{\numberline {20}Simulation under the Potential Outcomes Framework}{331}{chapter.20}%
+\contentsline {section}{\numberline {20.1}Finite vs.\nobreakspace {}Superpopulation inference}{332}{section.20.1}%
+\contentsline {section}{\numberline {20.2}Data generation processes for potential outcomes}{332}{section.20.2}%
+\contentsline {section}{\numberline {20.3}Finite sample performance measures}{335}{section.20.3}%
+\contentsline {section}{\numberline {20.4}Nested finite simulation procedure}{338}{section.20.4}%
+\contentsline {chapter}{\numberline {21}The Parametric bootstrap}{343}{chapter.21}%
+\contentsline {section}{\numberline {21.1}Air conditioners: a stolen case study}{344}{section.21.1}%
+\contentsline {chapter}{\numberline {A}Coding Reference}{347}{appendix.A}%
+\contentsline {section}{\numberline {A.1}How to repeat yourself}{347}{section.A.1}%
+\contentsline {subsection}{\numberline {A.1.1}Using \texttt {replicate()}}{347}{subsection.A.1.1}%
+\contentsline {subsection}{\numberline {A.1.2}Using \texttt {map()}}{348}{subsection.A.1.2}%
+\contentsline {subsection}{\numberline {A.1.3}map with no inputs}{350}{subsection.A.1.3}%
+\contentsline {subsection}{\numberline {A.1.4}Other approaches for repetition}{350}{subsection.A.1.4}%
+\contentsline {section}{\numberline {A.2}Default arguments for functions}{351}{section.A.2}%
+\contentsline {section}{\numberline {A.3}Profiling Code}{352}{section.A.3}%
+\contentsline {subsection}{\numberline {A.3.1}Using \texttt {Sys.time()} and \texttt {system.time()}}{352}{subsection.A.3.1}%
+\contentsline {subsection}{\numberline {A.3.2}The \texttt {tictoc} package}{353}{subsection.A.3.2}%
+\contentsline {subsection}{\numberline {A.3.3}The \texttt {bench} package}{353}{subsection.A.3.3}%
+\contentsline {subsection}{\numberline {A.3.4}Profiling with \texttt {profvis}}{356}{subsection.A.3.4}%
+\contentsline {section}{\numberline {A.4}Optimizing code (and why you often shouldn't)}{356}{section.A.4}%
+\contentsline {subsection}{\numberline {A.4.1}Hand-building functions}{357}{subsection.A.4.1}%
+\contentsline {subsection}{\numberline {A.4.2}Computational efficiency versus simplicity}{358}{subsection.A.4.2}%
+\contentsline {subsection}{\numberline {A.4.3}Reusing code to speed up computation}{360}{subsection.A.4.3}%
+\contentsline {chapter}{\numberline {B}Further readings and resources}{365}{appendix.B}%
diff --git a/Designing-Simulations-in-R_cache/latex/__packages b/Designing-Simulations-in-R_cache/latex/__packages
deleted file mode 100644
index 2d63cdf..0000000
--- a/Designing-Simulations-in-R_cache/latex/__packages
+++ /dev/null
@@ -1,48 +0,0 @@
-tidyverse
-ggplot2
-tibble
-tidyr
-readr
-purrr
-dplyr
-stringr
-forcats
-lubridate
-simhelpers
-psych
-mvtnorm
-Matrix
-lme4
-MASS
-arm
-lmerTest
-estimatr
-blkvar
-microbenchmark
-future
-furrr
-lsr
-bookdown
-knitr
-rmarkdown
-kableExtra
-ggridges
-metadat
-numDeriv
-metafor
-carData
-car
-zoo
-lmtest
-sandwich
-survival
-AER
-modelr
-glmnet
-rpart
-rpart.plot
-sn
-testthat
-mlmpower
-tictoc
-bench
diff --git a/Designing-Simulations-in-R_files/figure-latex/clusterRCT_plot_bias_v1-1.pdf b/Designing-Simulations-in-R_files/figure-latex/clusterRCT_plot_bias_v1-1.pdf
index 980c42e..9c5afeb 100644
Binary files a/Designing-Simulations-in-R_files/figure-latex/clusterRCT_plot_bias_v1-1.pdf and b/Designing-Simulations-in-R_files/figure-latex/clusterRCT_plot_bias_v1-1.pdf differ
diff --git a/Designing-Simulations-in-R_files/figure-latex/clusterRCT_plot_bias_v2-1.pdf b/Designing-Simulations-in-R_files/figure-latex/clusterRCT_plot_bias_v2-1.pdf
index 829ae74..42a1248 100644
Binary files a/Designing-Simulations-in-R_files/figure-latex/clusterRCT_plot_bias_v2-1.pdf and b/Designing-Simulations-in-R_files/figure-latex/clusterRCT_plot_bias_v2-1.pdf differ
diff --git a/Designing-Simulations-in-R_files/figure-latex/disc_mde-1.pdf b/Designing-Simulations-in-R_files/figure-latex/disc_mde-1.pdf
index dc6af86..ca2b705 100644
Binary files a/Designing-Simulations-in-R_files/figure-latex/disc_mde-1.pdf and b/Designing-Simulations-in-R_files/figure-latex/disc_mde-1.pdf differ
diff --git a/Designing-Simulations-in-R_files/figure-latex/disc_power-1.pdf b/Designing-Simulations-in-R_files/figure-latex/disc_power-1.pdf
index b53c2a5..80f8671 100644
Binary files a/Designing-Simulations-in-R_files/figure-latex/disc_power-1.pdf and b/Designing-Simulations-in-R_files/figure-latex/disc_power-1.pdf differ
diff --git a/Designing-Simulations-in-R_files/figure-latex/disc_precision-1.pdf b/Designing-Simulations-in-R_files/figure-latex/disc_precision-1.pdf
index 0b394f2..7eb6b2d 100644
Binary files a/Designing-Simulations-in-R_files/figure-latex/disc_precision-1.pdf and b/Designing-Simulations-in-R_files/figure-latex/disc_precision-1.pdf differ
diff --git a/Designing-Simulations-in-R_files/figure-latex/swan_example_setup-1.pdf b/Designing-Simulations-in-R_files/figure-latex/swan_example_setup-1.pdf
index 55df1b5..dd3f2f9 100644
Binary files a/Designing-Simulations-in-R_files/figure-latex/swan_example_setup-1.pdf and b/Designing-Simulations-in-R_files/figure-latex/swan_example_setup-1.pdf differ
diff --git a/Designing-Simulations-in-R_files/figure-latex/ttest_result_figure-1.pdf b/Designing-Simulations-in-R_files/figure-latex/ttest_result_figure-1.pdf
index ec50981..316bcaa 100644
Binary files a/Designing-Simulations-in-R_files/figure-latex/ttest_result_figure-1.pdf and b/Designing-Simulations-in-R_files/figure-latex/ttest_result_figure-1.pdf differ
diff --git a/Designing-Simulations-in-R_files/figure-latex/unnamed-chunk-2-1.pdf b/Designing-Simulations-in-R_files/figure-latex/unnamed-chunk-2-1.pdf
index 2bbcce8..040be88 100644
Binary files a/Designing-Simulations-in-R_files/figure-latex/unnamed-chunk-2-1.pdf and b/Designing-Simulations-in-R_files/figure-latex/unnamed-chunk-2-1.pdf differ
diff --git a/book.bib b/book.bib
index 72977bb..93b66fe 100644
--- a/book.bib
+++ b/book.bib
@@ -9,7 +9,6 @@ @article{Benjamin2017redefine,
 title = {Redefine Statistical Significance},
 author = {Benjamin, Daniel J. and Berger, James O. and Johannesson, Magnus and Nosek, Brian A. and Wagenmakers, E.-J. and Berk, Richard and Bollen, Kenneth A. and Brembs, Björn and Brown, Lawrence and Camerer, Colin and Cesarini, David and Chambers, Christopher D. and Clyde, Merlise and Cook, Thomas D. and De Boeck, Paul and Dienes, Zoltan and Dreber, Anna and Easwaran, Kenny and Efferson, Charles and Fehr, Ernst and Fidler, Fiona and Field, Andy P. and Forster, Malcolm and George, Edward I. and Gonzalez, Richard and Goodman, Steven and Green, Edwin and Green, Donald P. and Greenwald, Anthony G. and Hadfield, Jarrod D. and Hedges, Larry V. and Held, Leonhard and Hua Ho, Teck and Hoijtink, Herbert and Hruschka, Daniel J. and Imai, Kosuke and Imbens, Guido and Ioannidis, John P. A. and Jeon, Minjeong and Jones, James Holland and Kirchler, Michael and Laibson, David and List, John and Little, Roderick and Lupia, Arthur and Machery, Edouard and Maxwell, Scott E. and McCarthy, Michael and Moore, Don A. and Morgan, Stephen L. and Munafó, Marcus and Nakagawa, Shinichi and Nyhan, Brendan and Parker, Timothy H. and Pericchi, Luis and Perugini, Marco and Rouder, Jeff and Rousseau, Judith and Savalei, Victoria and Schönbrodt, Felix D. and Sellke, Thomas and Sinclair, Betsy and Tingley, Dustin and Van Zandt, Trisha and Vazire, Simine and Watts, Duncan J. and Winship, Christopher and Wolpert, Robert L. and Xie, Yu and Young, Cristobal and Zinman, Jonathan and Johnson, Valen E.},
- date = {2017-09-01},
 journaltitle = {Nature Human Behaviour},
 shortjournal = {Nat Hum Behav},
 volume = {2},
@@ -19,13 +18,14 @@ @article{Benjamin2017redefine
 doi = {10.1038/s41562-017-0189-z},
 url = {https://www.nature.com/articles/s41562-017-0189-z},
 urldate = {2025-09-19},
- langid = {english}
+ langid = {english},
+ year = {2017},
 }

 @article{Lakens2018justify,
 title = {Justify Your Alpha},
 author = {Lakens, Daniel and Adolfi, Federico G. and Albers, Casper J. and Anvari, Farid and Apps, Matthew A. J. and Argamon, Shlomo E. and Baguley, Thom and Becker, Raymond B. and Benning, Stephen D. and Bradford, Daniel E. and Buchanan, Erin M. and Caldwell, Aaron R. and Van Calster, Ben and Carlsson, Rickard and Chen, Sau-Chin and Chung, Bryan and Colling, Lincoln J. and Collins, Gary S. and Crook, Zander and Cross, Emily S. and Daniels, Sameera and Danielsson, Henrik and DeBruine, Lisa and Dunleavy, Daniel J. and Earp, Brian D. and Feist, Michele I. and Ferrell, Jason D. and Field, James G. and Fox, Nicholas W. and Friesen, Amanda and Gomes, Caio and Gonzalez-Marquez, Monica and Grange, James A. and Grieve, Andrew P. and Guggenberger, Robert and Grist, James and Van Harmelen, Anne-Laura and Hasselman, Fred and Hochard, Kevin D. and Hoffarth, Mark R. and Holmes, Nicholas P. and Ingre, Michael and Isager, Peder M. and Isotalus, Hanna K. and Johansson, Christer and Juszczyk, Konrad and Kenny, David A. and Khalil, Ahmed A. and Konat, Barbara and Lao, Junpeng and Larsen, Erik Gahner and Lodder, Gerine M. A. and Lukavský, Jiří and Madan, Christopher R. and Manheim, David and Martin, Stephen R. and Martin, Andrea E. and Mayo, Deborah G. and McCarthy, Randy J. and McConway, Kevin and McFarland, Colin and Nio, Amanda Q. X. and Nilsonne, Gustav and De Oliveira, Cilene Lino and De Xivry, Jean-Jacques Orban and Parsons, Sam and Pfuhl, Gerit and Quinn, Kimberly A. and Sakon, John J. and Saribay, S. Adil and Schneider, Iris K. and Selvaraju, Manojkumar and Sjoerds, Zsuzsika and Smith, Samuel G. and Smits, Tim and Spies, Jeffrey R. and Sreekumar, Vishnu and Steltenpohl, Crystal N. and Stenhouse, Neil and Świątkowski, Wojciech and Vadillo, Miguel A. and Van Assen, Marcel A. L. M. and Williams, Matt N. and Williams, Samantha E. and Williams, Donald R. and Yarkoni, Tal and Ziano, Ignazio and Zwaan, Rolf A.},
- date = {2018-02-26},
+ year = {2018},
 journaltitle = {Nature Human Behaviour},
 shortjournal = {Nat Hum Behav},
 volume = {2},
@@ -33,8 +33,6 @@
 pages = {168--171},
 issn = {2397-3374},
 doi = {10.1038/s41562-018-0311-x},
- url = {https://www.nature.com/articles/s41562-018-0311-x},
- urldate = {2025-09-19},
 langid = {english}
 }

@@ -206,10 +204,6 @@
 volume = {40},
 year = {2015}}

-@techreport{articletipton2015smalltitleSmall-sampleadjustmentsfortestsofmoderatorsandmodelfitusingrobustvarianceestimationinmeta-regressionauthorTiptonElizabethandPustejovskyJamesEjournalJournalofEducationalandBehavioralStatisticsvolume40number6pages604--634year2015publisherSagePublicationsSageCA:LosAngelesCA,
- date-added = {2025-05-22 14:42:33 -0400},
- date-modified = {2025-05-22 14:42:35 -0400}}
-
 @article{pustejovsky2014converting,
 author = {Pustejovsky, James E},
 date-added = {2025-05-22 14:33:07 -0400},
@@ -244,72 +238,53 @@ @article{pashley2024improving
 @article{boos2015Assessing,
 author = {Boos, Dennis D. and Osborne, Jason A.},
- date = {2015},
- date-modified = {2025-06-19 10:37:47 -0700},
 doi = {10.1111/insr.12087},
- issn = {1751-5823},
 journaltitle = {International Statistical Review},
 number = {2},
 pages = {228--238},
 title = {Assessing Variability of Complex Descriptive Statistics in {{Monte Carlo}} Studies Using Resampling Methods},
- url = {https://onlinelibrary.wiley.com/doi/abs/10.1111/insr.12087},
- urldate = {2025-02-20},
 volume = {83},
 year = {2015},
- bdsk-url-1 = {https://onlinelibrary.wiley.com/doi/abs/10.1111/insr.12087},
- bdsk-url-2 = {https://doi.org/10.1111/insr.12087}}
+}

-@article{boulesteix2020Replicationa,
+@article{boulesteix2020Replication,
 author = {Boulesteix, Anne-Laure and Hoffmann, Sabine and Charlton, Alethea and Seibold, Heidi},
- date = {2020-10-01},
- date-modified = {2025-06-19 10:37:54 -0700},
 doi = {10.1111/1740-9713.01444},
 journaltitle = {Significance},
 number = {5},
 pages = {18--21},
- shortjournal = {Significance},
 title = {A Replication Crisis in Methodological Research?},
 volume = {17},
 year = {2020},
- bdsk-url-1 = {https://doi.org/10.1111/1740-9713.01444}}
+}

 @article{boulesteix2013Plea,
 author = {Boulesteix, Anne-Laure and Lauer, Sabine and Eugster, Manuel J. A.},
- date = {2013-04-24},
- date-modified = {2025-06-19 10:38:06 -0700},
 doi = {10.1371/journal.pone.0061562},
 journaltitle = {PLOS ONE},
 number = {4},
 pages = {e61562},
- shortjournal = {PLOS ONE},
 title = {A Plea for Neutral Comparison Studies in Computational Sciences},
 volume = {8},
 year = {2013},
- bdsk-url-1 = {https://doi.org/10.1371/journal.pone.0061562}}
+}

 @article{boulesteix2017evidencebased,
 abstract = {The goal of medical research is to develop interventions that are in some sense superior, with respect to patient outcome, to interventions currently in use. Similarly, the goal of research in methodological computational statistics is to develop data analysis tools that are themselves superior to the existing tools. The methodology of the evaluation of medical interventions continues to be discussed extensively in the literature and it is now well accepted that medicine should be at least partly ``evidence-based''. Although we statisticians are convinced of the importance of unbiased, well-thought-out study designs and evidence-based approaches in the context of clinical research, we tend to ignore these principles when designing our own studies for evaluating statistical methods in the context of our methodological research. In this paper, we draw an analogy between clinical trials and real-data-based benchmarking experiments in methodological statistical science, with datasets playing the role of patients and methods playing the role of medical interventions. Through this analogy, we suggest directions for improvement in the design and interpretation of studies which use real data to evaluate statistical methods, in particular with respect to dataset inclusion criteria and the reduction of various forms of bias. More generally, we discuss the concept of ``evidence-based'' statistical research, its limitations and its impact on the design and interpretation of real-data-based benchmark experiments.
We suggest that benchmark studies---a method of assessment of statistical methods using real-world datasets---might benefit from adopting (some) concepts from evidence-based medicine towards the goal of more evidence-based statistical research.}, author = {Boulesteix, Anne-Laure and Wilson, Rory and Hapfelmeier, Alexander}, - date = {2017-12}, - date-modified = {2025-06-19 10:38:00 -0700}, doi = {10.1186/s12874-017-0417-2}, - issn = {1471-2288}, issue = {1}, journaltitle = {BMC Medical Research Methodology}, number = {1}, pages = {1--12}, - shortjournal = {BMC Med Res Methodol}, - shorttitle = {Towards Evidence-Based Computational Statistics}, title = {Towards Evidence-Based Computational Statistics: Lessons from Clinical Research on the Role and Design of Real-Data Benchmark Studies}, volume = {17}, year = {2017}, - bdsk-url-1 = {https://doi.org/10.1186/s12874-017-0417-2}} +} @book{borenstein2021introduction, address = {Chichester, UK}, author = {Borenstein, Michael and Hedges, Larry V. and Higgins, Julian P.T. and Rothstein, Hannah R.}, - date-added = {2025-03-18 10:53:36 -0400}, - date-modified = {2025-03-18 10:53:36 -0400}, edition = {3}, isbn = {978-1-119-55437-7}, publisher = {John Wiley \& Sons}, @@ -319,8 +294,6 @@ @book{borenstein2021introduction @article{Cho2023bivariate, abstract = {The zero-inflated negative binomial distribution has been widely used for count data analyses in various biomedical settings due to its capacity of modeling excess zeros and overdispersion. When there are correlated count variables, a bivariate model is essential for understanding their full distributional features. Examples include measuring correlation of two genes in sparse single-cell RNA sequencing data and modeling dental caries count indices on two different tooth surface types. 
For these purposes, we develop a richly parametrized bivariate zero-inflated negative binomial model that has a simple latent variable framework and eight free parameters with intuitive interpretations. In the scRNA-seq data example, the correlation is estimated after adjusting for the effects of dropout events represented by excess zeros. In the dental caries data, we analyze how the treatment with Xylitol lozenges affects the marginal mean and other patterns of response manifested in the two dental caries traits. An R package ``bzinb'' is available on Comprehensive R Archive Network.}, author = {Cho, Hunyong and Liu, Chuwen and Preisser, John S and Wu, Di}, - date = {2023-07}, - date-modified = {2025-06-19 10:38:14 -0700}, doi = {10.1177/09622802231172028}, journaltitle = {Statistical Methods in Medical Research}, number = {7}, @@ -328,7 +301,7 @@ @article{Cho2023bivariate title = {A Bivariate Zero-Inflated Negative Binomial Model and Its Applications to Biomedical Settings}, volume = {32}, year = {2023}, - bdsk-url-1 = {https://doi.org/10.1177/09622802231172028}} +} @article{gilbert2024multilevel, author = {Gilbert, Joshua and Miratrix, Luke}, @@ -448,56 +421,41 @@ @book{westfall2013understanding volume = {543}, year = {2013}} -@article{Bloom:2016um, +@article{Bloom2016using, abstract = {{The present article considers a fundamental question in evaluation research: ``By how much do program effects vary across sites?'' The article first presents a theoretical model of cross-site impact variation and a related estimation model with a random treatment coefficient and fixed site-specific intercepts. This approach eliminates several biases that can arise from unbalanced sample designs for multisite randomized trials. The article then describes how the approach operates, explores its assumptions, and applies the approach to data from three large welfare-to-work trials. 
The article also illustrates how to report cross-site impact findings and presents diagnostics for assessing these findings. To keep the article manageable, it focuses on experimental estimates of effects of program assignment (effects of intent to treat), although the ideas presented can be extended to analyses of multisite quasi-experiments and experimental estimates of effects of program participation (complier average causal effects).}}, author = {Bloom, Howard S. and Raudenbush, Stephen W. and Weiss, Michael J. and Porter, Kristin}, - date-added = {2024-06-11 06:55:53 -0400}, - date-modified = {2024-06-11 06:55:53 -0400}, doi = {10.1080/19345747.2016.1264518}, - issn = {1934-5747}, journal = {Journal of Research on Educational Effectiveness}, - keywords = {FIRC}, - local-url = {file://localhost/Users/lmiratrix/Documents/Papers%20Library/Bloom/Journal%20of%20Research%20on%20Educational%20Effectiveness_Porter_1.pdf}, month = {11}, number = {4}, pages = {0--0}, - rating = {5}, title = {{Using Multisite Experiments to Study Cross-Site Variation in Treatment Effects: A Hybrid Approach With Fixed Intercepts and a Random Treatment Coefficient}}, - url = {https://mail.google.com/mail/u/0/}, volume = {10}, year = {2016}, - bdsk-file-1 =
{<base64 BibDesk bookmark data omitted>}, - bdsk-file-2 = {<base64 BibDesk bookmark data omitted>}, - bdsk-url-1 = {https://mail.google.com/mail/u/0/}, - bdsk-url-2 = {https://doi.org/10.1080/19345747.2016.1264518}} +} @article{brown1974SmallSampleBehavior, author = {Brown, Morton B.
and Forsythe, Alan B.}, doi = {10.1080/00401706.1974.10489158}, - issn = {0040-1706, 1537-2723}, journal = {Technometrics}, langid = {english}, month = feb, number = {1}, pages = {129--132}, title = {The {{Small Sample Behavior}} of {{Some Statistics Which Test}} the {{Equality}} of {{Several Means}}}, - urldate = {2024-06-27}, volume = {16}, year = {1974}, - bdsk-url-1 = {https://doi.org/10.1080/00401706.1974.10489158}} +} @article{james1951ComparisonSeveralGroups, author = {James, G. S.}, doi = {10.2307/2332578}, - eprint = {2332578}, - eprinttype = {jstor}, - issn = {00063444}, journal = {Biometrika}, month = dec, number = {3/4}, pages = {324}, title = {The Comparison of Several Groups of Observations When the Ratios of the Population Variances Are Unknown}, - urldate = {2024-07-02}, volume = {38}, year = {1951}, bdsk-url-1 = {https://doi.org/10.2307/2332578}} @@ -505,24 +463,19 @@ @article{james1951ComparisonSeveralGroups @article{welch1951ComparisonSeveralMean, author = {Welch, B. L.}, doi = {10.2307/2332579}, - eprint = {2332579}, - eprinttype = {jstor}, - issn = {00063444}, journal = {Biometrika}, month = dec, number = {3/4}, pages = {330}, title = {On the Comparison of Several Mean Values: {{An}} Alternative Approach}, - urldate = {2024-07-02}, volume = {38}, year = {1951}, - bdsk-url-1 = {https://doi.org/10.2307/2332579}} +} @article{mehrotra1997ImprovingBrownforsytheSolution, abstract = {Over two decades ago, Brown and Forsythe (B-F) (1974) proposed an innovative solution to the problem of comparing independent normal means under heteroscedasticity. Since then, their testing procedure has gained in popularity and authors have published various articles in which the B-F test has formed the basis of their research. The purpose of this paper is to point out, and correct, a flaw in the B-F testing procedure. Specifically, it is shown that the approximation proposed by B-F for the null distribution of their test statistic is inadequate. 
An improved approximation is provided and the small sample null properties of the modified B-F test are studied via simulation. The empirical findings support the theoretical result that the modified B-F test does a better job of preserving the test size compared to the original B-F test.}, author = {Mehrotra, Devan V.}, doi = {10.1080/03610919708813431}, - issn = {0361-0918}, journal = {Communications in Statistics - Simulation and Computation}, keywords = {ANOVA,heteroscedasticity,Satterthwaite approximation}, month = jan, @@ -530,31 +483,22 @@ @article{mehrotra1997ImprovingBrownforsytheSolution pages = {1139--1145}, publisher = {Taylor \& Francis}, title = {Improving the Brown-Forsythe Solution to the Generalized Behrens-Fisher Problem}, - urldate = {2024-06-27}, volume = {26}, year = {1997}, - bdsk-url-1 = {https://doi.org/10.1080/03610919708813431}} +} -@article{Kern_calibrated, +@article{Kern2014calibrated, abstract = {{Randomized experiments are considered the gold standard for causal inference because they can provide unbiased estimates of treatment effects for the experimental participants. However, researchers and policymakers are often interested in using a specific experiment to inform decisions about other target populations. In education research, increasing attention is being paid to the potential lack of generalizability of randomized experiments because the experimental participants may be unrepresentative of the target population of interest. This article examines whether generalization may be assisted by statistical methods that adjust for observed differences between the experimental participants and members of a target population. The methods examined include approaches that reweight the experimental data so that participants more closely resemble the target population and methods that utilize models of the outcome. Two simulation studies and one empirical analysis investigate and compare the methods' performance. 
One simulation uses purely simulated data while the other utilizes data from an evaluation of a school-based dropout prevention program. Our simulations suggest that machine learning methods outperform regression-based methods when the required structural (ignorability) assumptions are satisfied. When these assumptions are violated, all of the methods examined perform poorly. Our empirical analysis uses data from a multisite experiment to assess how well results from a given site predict impacts in other sites. Using a variety of extrapolation methods, predicted effects for each site are compared to actual benchmarks. Flexible modeling approaches perform best, although linear regression is not far behind. Taken together, these results suggest that flexible modeling techniques can aid generalization while underscoring the fact that even state-of-the-art statistical techniques still rely on strong assumptions.}}, author = {Kern, Holger L. and Stuart, Elizabeth A. and Hill, Jennifer and Green, Donald P.}, - date-added = {2022-07-22 13:11:01 +0000}, - date-modified = {2022-07-22 13:11:09 +0000}, doi = {10.1080/19345747.2015.1060282}, - issn = {1934-5747}, journal = {Journal of Research on Educational Effectiveness}, - keywords = {generalizability}, - local-url = {file://localhost/Users/lmiratrix/Documents/Papers%20Library/Kern/Journal%20of%20Research%20on%20Educational%20Effectiveness_Green_1.pdf}, month = {03}, number = {1}, pages = {103--127}, - pmid = {27668031}, title = {{Assessing Methods for Generalizing Experimental Impact Estimates to Target Populations}}, volume = {9}, year = {2014}, - bdsk-file-1 = 
{<base64 BibDesk bookmark data omitted>}, - bdsk-file-2 = {<base64 BibDesk bookmark data omitted>}, - bdsk-url-1 = {https://doi.org/10.1080/19345747.2015.1060282}} +} @article{White1980heteroskedasticity, author = {White, Halbert}, @@ -568,75 +512,56 @@ @article{dong2013PowerUpToolCalculating, author = {Dong, Nianbo and
Maynard, Rebecca}, doi = {10.1080/19345747.2012.673143}, - file = {C:\Users\jamespustejovsky\Zotero\storage\KID6WKQ6\Dong and Maynard - 2013 - PowerUp! A Tool for Calculating Minimum D.pdf}, - issn = {1934-5747, 1934-5739}, journal = {Journal of Research on Educational Effectiveness}, langid = {english}, month = jan, number = {1}, pages = {24--67}, - shorttitle = {{{{\emph{PowerUp}}}}{\emph{!}}}, title = {{{{\emph{PowerUp}}}}{\emph{!}} : {{A Tool}} for {{Calculating Minimum Detectable Effect Sizes}} and {{Minimum Required Sample Sizes}} for {{Experimental}} and {{Quasi-Experimental Design Studies}}}, - urldate = {2024-06-27}, volume = {6}, year = {2013}, - bdsk-url-1 = {https://doi.org/10.1080/19345747.2012.673143}} +} @article{tipton2014stratified, abstract = { Background:An important question in the design of experiments is how to ensure that the findings from the experiment are generalizable to a larger population. This concern with generalizability is particularly important when treatment effects are heterogeneous and when selecting units into the experiment using random sampling is not possible---two conditions commonly met in large-scale educational experiments.Method:This article introduces a model-based balanced-sampling framework for improving generalizations, with a focus on developing methods that are robust to model misspecification. Additionally, the article provides a new method for sample selection within this framework: First units in an inference population are divided into relatively homogenous strata using cluster analysis, and then the sample is selected using distance rankings.Result:In order to demonstrate and evaluate the method, a reanalysis of a completed experiment is conducted. This example compares samples selected using the new method with the actual sample used in the experiment. 
Results indicate that even under high nonresponse, balance is better on most covariates and that fewer coverage errors result.Conclusion:The article concludes with a discussion of additional benefits and limitations of the method. }, author = {Elizabeth Tipton}, doi = {10.1177/0193841X13516324}, - eprint = {https://doi.org/10.1177/0193841X13516324}, journal = {Evaluation Review}, - note = {PMID: 24647924}, number = {2}, pages = {109-139}, title = {Stratified Sampling Using Cluster Analysis: A Sample Selection Strategy for Improved Generalizations From Experiments}, - url = {https://doi.org/10.1177/0193841X13516324}, volume = {37}, year = {2013}, - bdsk-url-1 = {https://doi.org/10.1177/0193841X13516324}} +} @article{faul2009StatisticalPowerAnalyses, author = {Faul, Franz and Erdfelder, Edgar and Buchner, Axel and Lang, Albert-Georg}, - copyright = {http://www.springer.com/tdm}, doi = {10.3758/BRM.41.4.1149}, - file = {C:\Users\jamespustejovsky\Zotero\storage\PH3MJZ2H\Faul et al. - 2009 - Statistical power analyses using GPower 3.1 Test.pdf}, - issn = {1554-351X, 1554-3528}, journal = {Behavior Research Methods}, langid = {english}, month = nov, number = {4}, pages = {1149--1160}, - shorttitle = {Statistical Power Analyses Using {{G}}*{{Power}} 3.1}, title = {Statistical Power Analyses Using {{G}}*{{Power}} 3.1: {{Tests}} for Correlation and Regression Analyses}, - urldate = {2024-06-27}, volume = {41}, year = {2009}, - bdsk-url-1 = {https://doi.org/10.3758/BRM.41.4.1149}} +} @article{longUsingHeteroscedasticityConsistent2000, abstract = {In the presence of heteroscedasticity, ordinary least squares (OLS) estimates are unbiased, but the usual tests of significance are generally inappropriate and their use can lead to incorrect inferences. Tests based on a heteroscedasticity consistent covariance matrix (HCCM), however, are consistent even in the presence of heteroscedasticity of an unknown form. 
Most applications that use a HCCM appear to rely on the asymptotic version known as HC0. Our Monte Carlo simulations show that HC0 often results in incorrect inferences when N {$\leq$} 250, while three relatively unknown, small sample versions of the HCCM, and especially a version known as HC3, work well even for N's as small as 25. We recommend that: (1) data analysts should correct for heteroscedasticity using a HCCM whenever there is reason to suspect heteroscedasticity; (2) the decision to use HCCM-based tests should not be determined by a screening test for heteroscedasticity; and (3) when N {$\leq$} 250, the HCCM known as HC3 should be used. Since HC3 is simple to compute, we encourage authors of statistical software to add this estimator to their programs.}, author = {Long, J. Scott and Ervin, Laurie H.}, doi = {10.1080/00031305.2000.10474549}, - file = {C:\Users\jamespustejovsky\Zotero\storage\URPFHBJT\Long and Ervin - 2000 - Using Heteroscedasticity Consistent Standard Error.pdf}, - issn = {0003-1305}, journal = {The American Statistician}, - keywords = {Heteroscedasticity,Heteroscedasticity consistent covariance matrix}, - month = aug, number = {3}, pages = {217--224}, publisher = {Taylor \& Francis}, title = {Using Heteroscedasticity Consistent Standard Errors in the Linear Regression Model}, - urldate = {2023-12-19}, volume = {54}, year = {2000}, - bdsk-url-1 = {https://doi.org/10.1080/00031305.2000.10474549}} +} @book{GerberGreen, author = {Gerber, Alan S and Green, Donald P}, - date-added = {2022-06-30 20:27:18 +0000}, - date-modified = {2022-07-03 14:19:03 +0000}, publisher = {W. W. Norton \& Company}, series = {W. W. 
Norton \& Company}, title = {{Field Experiments: Design, Analysis, and Interpretation}}, @@ -644,8 +569,6 @@ @book{GerberGreen @article{sundberg2003conditional, author = {Sundberg, Rolf}, - date-added = {2022-06-30 20:25:30 +0000}, - date-modified = {2022-06-30 20:25:30 +0000}, journal = {Journal of the Royal Statistical Society: Series B (Statistical Methodology)}, month = {00}, number = {1}, @@ -656,8 +579,6 @@ @article{sundberg2003conditional @article{staiger2010searching, author = {Staiger, Douglas O and Rockoff, Jonah E}, - date-added = {2022-06-28 21:30:20 +0000}, - date-modified = {2022-06-28 21:30:20 +0000}, journal = {Journal of Economic perspectives}, number = {3}, pages = {97--118}, @@ -667,12 +588,9 @@ @article{staiger2010searching @article{abdulkadirouglu2017research, author = {Abdulkadiro{\u{g}}lu, Atila and Angrist, Joshua D and Narita, Yusuke and Pathak, Parag A}, - date-added = {2022-06-28 21:29:08 +0000}, - date-modified = {2022-06-28 21:29:08 +0000}, journal = {Econometrica}, number = {5}, pages = {1373--1432}, - publisher = {Wiley Online Library}, title = {Research design meets market design: Using centralized assignment for impact evaluation}, volume = {85}, year = {2017}} @@ -681,7 +599,6 @@ @misc{fryda2014H2oInterfaceH2O abstract = {R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).}, author = {Fryda, Tomas and LeDell, Erin and Gill, Navdeep and Aiello, Spencer and Fu, Anqi and Candel, Arno and Click, Cliff and Kraljevic, Tom and Nykodym, Tomas and Aboyoun, Patrick 
and Kurka, Michal and Malohlava, Michal and Poirier, Sebastien and Wong, Wendy}, doi = {10.32614/CRAN.package.h2o}, - langid = {english}, month = jun, pages = {3.44.0.3}, publisher = {Comprehensive R Archive Network}, @@ -700,7 +617,7 @@ @book{xie2015 title = {Dynamic Documents with {R} and knitr}, url = {http://yihui.name/knitr/}, year = {2015}, - bdsk-url-1 = {http://yihui.name/knitr/}} +} @article{alfons2010ObjectOrientedFrameworkStatistical, author = {Alfons, Andreas and Templ, Matthias and Filzmoser, Peter}, @@ -727,8 +644,6 @@ @article{blair2019DeclaringDiagnosingResearch abstract = {Researchers need to select high-quality research designs and communicate those designs clearly to readers. Both tasks are difficult. We provide a framework for formally ``declaring'' the analytically relevant features of a research design in a demonstrably complete manner, with applications to qualitative, quantitative, and mixed methods research. The approach to design declaration we describe requires defining a model of the world (M), an inquiry (I), a data strategy (D), and an answer strategy (A). Declaration of these features in code provides sufficient information for researchers and readers to use Monte Carlo techniques to diagnose properties such as power, bias, accuracy of qualitative causal inferences, and other ``diagnosands.'' Ex ante declarations can be used to improve designs and facilitate preregistration, analysis, and reconciliation of intended and actual analyses. Ex post declarations are useful for describing, sharing, reanalyzing, and critiquing existing designs. We provide open-source software, DeclareDesign, to implement the proposed approach.}, author = {Blair, Graeme and Cooper, Jasper and Coppock, Alexander and Humphreys, Macartan}, doi = {10.1017/S0003055419000194}, - file = {C:\Users\jamespustejovsky\Zotero\storage\CIS3IBN8\Blair et al. 
- 2019 - Declaring and Diagnosing Research Designs.pdf}, - issn = {0003-0554, 1537-5943}, journal = {American Political Science Review}, keywords = {cited}, langid = {english}, @@ -740,7 +655,7 @@ @article{blair2019DeclaringDiagnosingResearch urldate = {2024-01-01}, volume = {113}, year = {2019}, - bdsk-url-1 = {https://doi.org/10.1017/S0003055419000194}} +} @book{blair2023ResearchDesignSocial, address = {Princeton}, @@ -753,17 +668,13 @@ @book{blair2023ResearchDesignSocial @article{boos2015AssessingVariabilityComplex, abstract = {SummaryGood statistical practice dictates that summaries in Monte Carlo studies should always be accompanied by standard errors. Those standard errors are easy to provide for summaries that are sample means over the replications of the Monte Carlo output: for example, bias estimates, power estimates for tests and mean squared error estimates. But often more complex summaries are of interest: medians (often displayed in boxplots), sample variances, ratios of sample variances and non-normality measures such as skewness and kurtosis. In principle, standard errors for most of these latter summaries may be derived from the Delta Method, but that extra step is often a barrier for standard errors to be provided. Here, we highlight the simplicity of using the jackknife and bootstrap to compute these standard errors, even when the summaries are somewhat complicated. {\copyright} 2014 The Authors. International Statistical Review {\copyright} 2014 International Statistical Institute}, author = {Boos, Dennis D. and Osborne, Jason A.}, - copyright = {{\copyright}2014\,The Authors. 
International Statistical Review {\copyright} 2014\,International Statistical Institute}, doi = {10.1111/insr.12087}, - file = {C\:\\Users\\jamespustejovsky\\Zotero\\storage\\CU9IPBXZ\\Boos and Osborne - 2015 - Assessing Variability of Complex Descriptive Stati.pdf;C\:\\Users\\jamespustejovsky\\Zotero\\storage\\T3ELDFPE\\insr.html}, - issn = {1751-5823}, journal = {International Statistical Review}, keywords = {Bootstrap,cited,coefficient of variation,delta method,influence curve,jackknife,standard errors,variability of ratios}, langid = {english}, number = {2}, pages = {228--238}, title = {Assessing {{Variability}} of {{Complex Descriptive Statistics}} in {{Monte Carlo Studies Using Resampling Methods}}}, - urldate = {2024-01-01}, volume = {83}, year = {2015}, bdsk-url-1 = {https://doi.org/10.1111/insr.12087}} @@ -772,10 +683,7 @@ @article{boulesteix2020IntroductionStatisticalSimulations abstract = {In health research, statistical methods are frequently used to address a wide variety of research questions. For almost every analytical challenge, different methods are available. But how do we choose between different methods and how do we judge whether the chosen method is appropriate for our specific study? Like in any science, in statistics, experiments can be run to find out which methods should be used under which circumstances. The main objective of this paper is to demonstrate that simulation studies, that is, experiments investigating synthetic data with known properties, are an invaluable tool for addressing these questions. 
We aim to provide a first introduction to simulation studies for data analysts or, more generally, for researchers involved at different levels in the analyses of health data, who (1) may rely on simulation studies published in statistical literature to choose their statistical methods and who, thus, need to understand the criteria of assessing the validity and relevance of simulation results and their interpretation; and/or (2) need to understand the basic principles of designing statistical simulations in order to efficiently collaborate with more experienced colleagues or start learning to conduct their own simulations. We illustrate the implementation of a simulation study and the interpretation of its results through a simple example inspired by recent literature, which is completely reproducible using the R-script available from online supplemental file 1.}, author = {Boulesteix, Anne-Laure and Groenwold, Rolf HH and Abrahamowicz, Michal and Binder, Harald and Briel, Matthias and Hornung, Roman and Morris, Tim P. and Rahnenf{\"u}hrer, J{\"o}rg and Sauerbrei, Willi}, chapter = {Epidemiology}, - copyright = {{\copyright} Author(s) (or their employer(s)) 2020. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.. http://creativecommons.org/licenses/by-nc/4.0/This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:~http://creativecommons.org/licenses/by-nc/4.0/.}, doi = {10.1136/bmjopen-2020-039921}, - file = {C:\Users\jamespustejovsky\Zotero\storage\YC92XY9J\Boulesteix et al. 
- 2020 - Introduction to statistical simulations in health .pdf}, - issn = {2044-6055, 2044-6055}, journal = {BMJ Open}, keywords = {cited,epidemiology,protocols & guidelines,statistics & research methods}, langid = {english}, @@ -785,26 +693,9 @@ @article{boulesteix2020IntroductionStatisticalSimulations pmid = {33318113}, publisher = {British Medical Journal Publishing Group}, title = {Introduction to Statistical Simulations in Health Research}, - urldate = {2024-01-01}, volume = {10}, year = {2020}, - bdsk-url-1 = {https://doi.org/10.1136/bmjopen-2020-039921}} - -@article{boulesteix2020ReplicationCrisisMethodological, - abstract = {Statisticians have been keen to critique statistical aspects of the ``replication crisis'' in other scientific disciplines. But new statistical tools are often published and promoted without any thought to replicability. This needs to change, argue Anne-Laure Boulesteix, Sabine Hoffmann, Alethea Charlton and Heidi Seibold}, - author = {Boulesteix, Anne-Laure and Hoffmann, Sabine and Charlton, Alethea and Seibold, Heidi}, - doi = {10.1111/1740-9713.01444}, - file = {C\:\\Users\\jamespustejovsky\\Zotero\\storage\\KBM9KZ63\\Boulesteix et al. - 2020 - A Replication Crisis in Methodological Research.pdf;C\:\\Users\\jamespustejovsky\\Zotero\\storage\\SS7MWV8I\\7038554.html}, - issn = {1740-9705}, - journal = {Significance}, - month = oct, - number = {5}, - pages = {18--21}, - title = {A {{Replication Crisis}} in {{Methodological Research}}?}, - urldate = {2024-01-01}, - volume = {17}, - year = {2020}, - bdsk-url-1 = {https://doi.org/10.1111/1740-9713.01444}} +} @misc{brown2023SimprFlexibleTidyverse, author = {Brown, Ethan}, @@ -822,24 +713,19 @@ @book{carsey2013MonteCarloSimulation @article{chalmers2020WritingEffectiveReliable, author = {Chalmers, R. 
Philip and Adkins, Mark C.}, doi = {10.20982/tqmp.16.4.p248}, - file = {C:\Users\jamespustejovsky\Zotero\storage\FQG3FM6J\Chalmers and Adkins - 2020 - Writing Effective and Reliable Monte Carlo Simulat.pdf}, - issn = {2292-1354}, journal = {The Quantitative Methods for Psychology}, keywords = {cited}, month = may, number = {4}, pages = {248--280}, title = {Writing {{Effective}} and {{Reliable Monte Carlo Simulations}} with the {{SimDesign Package}}}, - urldate = {2024-01-01}, volume = {16}, year = {2020}, - bdsk-url-1 = {https://doi.org/10.20982/tqmp.16.4.p248}} +} @book{chang2010MonteCarloSimulation, abstract = {Helping you become a creative, logical thinker and skillful "simulator," Monte Carlo Simulation for the Pharmaceutical Industry: Concepts, Algorithms, and Case Studies provides broad coverage of the entire drug development process, from drug discovery to preclinical and clinical trial aspects to commercialization. It presents the theories and metho}, author = {Chang, Mark}, - googlebooks = {MVuFoSMPZC8C}, - isbn = {978-1-4398-3593-7}, keywords = {cited,Mathematics / Probability & Statistics / General,Medical / Pharmacology}, langid = {english}, month = sep, @@ -861,7 +747,6 @@ @article{claesen2021ComparingDreamReality publisher = {Royal Society}, shorttitle = {Comparing Dream to Reality}, title = {Comparing Dream to Reality: An Assessment of Adherence of the First Generation of Preregistered Studies}, - urldate = {2024-01-08}, volume = {8}, year = {2021}, bdsk-url-1 = {https://doi.org/10.1098/rsos.211037}} @@ -878,9 +763,7 @@ @article{cruwell2023WhatBadgeComputational number = {4}, pages = {512--522}, publisher = {SAGE Publications Inc}, - shorttitle = {What's in a {{Badge}}?}, title = {What's in a {{Badge}}? 
{{A Computational Reproducibility Investigation}} of the {{Open Data Badge Policy}} in {{One Issue}} of {{Psychological Science}}}, - urldate = {2024-01-08}, volume = {34}, year = {2023}, bdsk-url-1 = {https://doi.org/10.1177/09567976221140828}} @@ -896,8 +779,6 @@ @article{feiveson2002PowerSimulation abstract = {This paper describes how to write Stata programs to estimate the power of virtually any statistical test that Stata can perform. Examples given include the t test, Poisson regression, Cox regression, and the nonparametric rank-sum test.}, author = {Feiveson, A. H.}, doi = {10.1177/1536867X0200200201}, - file = {C:\Users\jamespustejovsky\Zotero\storage\ZT8R83PS\Feiveson - 2002 - Power by Simulation.pdf}, - issn = {1536-867X}, journal = {The Stata Journal}, langid = {english}, month = jun, @@ -905,22 +786,18 @@ @article{feiveson2002PowerSimulation pages = {107--124}, publisher = {SAGE Publications}, title = {Power by {{Simulation}}}, - urldate = {2023-12-31}, volume = {2}, year = {2002}, - bdsk-url-1 = {https://doi.org/10.1177/1536867X0200200201}} +} @book{gamma1995DesignPatternsElements, abstract = {A book review of Design Patterns: Elements of Reusable Object-Oriented Software by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides is presented.}, address = {Reading, MA}, author = {Gamma, Erich and Helm, Richard and Johnson, Ralph and Vlissides, John}, chapter = {Books}, - copyright = {Copyright International Business Machines Corporation 1995}, - file = {C:\Users\jamespustejovsky\Zotero\storage\KQSAGQPV\Beck - 1995 - Design Patterns Elements of Reusable Object-Orien.pdf}, isbn = {0-201-63361-2}, keywords = {Design pattern,Design Patterns,Object-oriented programming}, publisher = {Addison-Wesley Publishing Co.}, - shorttitle = {Design {{Patterns}}}, title = {Design {{Patterns}}: {{Elements}} of {{Reusable Object-Oriented Software}}}, urldate = {2024-01-05}, year = {1995}} @@ -940,30 +817,24 @@ @article{gasparini2018RsimsumSummariseResults 
@book{gelman2013BayesianDataAnalysis, author = {Gelman, Andrew and Carlin, John B. and Stern, Hal S. and Dunson, David B. and Vehtari, Aki and Rubin, Donald B.}, doi = {10.1201/b16018}, - edition = {0}, isbn = {978-0-429-11307-9}, langid = {english}, month = nov, publisher = {{Chapman and Hall/CRC}}, title = {Bayesian {{Data Analysis}}}, - urldate = {2024-06-26}, year = {2013}, - bdsk-url-1 = {https://doi.org/10.1201/b16018}} +} @article{gelman2014StatisticalCrisisScience, - abstract = {{$<$}em{$>$}Gale{$<$}/em{$>$} Academic OneFile includes The statistical crisis in science: data-dependent analy by Andrew Gelman and Eric Loken. Click to explore.}, author = {Gelman, Andrew and Loken, Eric}, - file = {C:\Users\jamespustejovsky\Zotero\storage\89I3TV76\i.html}, - issn = {00030996}, journal = {American Scientist}, langid = {english}, month = nov, number = {6}, pages = {460--466}, publisher = {Sigma Xi, The Scientific Research Society}, - shorttitle = {The Statistical Crisis in Science}, - title = {The Statistical Crisis in Science: Data-Dependent Analysis--a \"Garden of Forking Paths\"--Explains Why Many Statistically Significant Comparisons Don't Hold Up}, + title = {The Statistical Crisis in Science: Data-Dependent Analysis--a ``Garden of Forking Paths''--Explains Why Many Statistically Significant Comparisons Don't Hold Up}, - urldate = {2024-01-08}, volume = {102}, year = {2014}} @@ -983,29 +854,21 @@ @article{goldfeld2020SimstudyIlluminatingResearch @article{green2016SIMRPackagePower, abstract = {The r package simr allows users to calculate power for generalized linear mixed models from the lme4 package. The power calculations are based on Monte Carlo simulations. It includes tools for (i) running a power analysis for a given model and design; and (ii) calculating power curves to assess trade-offs between power and sample size.
This paper presents a tutorial using a simple example of count data with mixed effects (with structure representative of environmental monitoring data) to guide the user along a gentle learning curve, adding only a few commands or options at a time.}, author = {Green, Peter and MacLeod, Catriona J.}, - copyright = {{\copyright} 2015 The Authors. Methods in Ecology and Evolution {\copyright} 2015 British Ecological Society}, doi = {10.1111/2041-210X.12504}, - file = {C\:\\Users\\jamespustejovsky\\Zotero\\storage\\QDLKYJ6L\\Green and MacLeod - 2016 - SIMR an R package for power analysis of generaliz.pdf;C\:\\Users\\jamespustejovsky\\Zotero\\storage\\HEIG34AA\\2041-210X.html}, - issn = {2041-210X}, journal = {Methods in Ecology and Evolution}, keywords = {cited,experimental design,glmm,Monte Carlo,random effects,sample size,type II error}, langid = {english}, number = {4}, pages = {493--498}, - shorttitle = {{{SIMR}}}, title = {{{SIMR}}: An {{R}} Package for Power Analysis of Generalized Linear Mixed Models by Simulation}, - urldate = {2023-12-31}, volume = {7}, year = {2016}, - bdsk-url-1 = {https://doi.org/10.1111/2041-210X.12504}} +} @article{hardwicke2023ReducingBiasIncreasing, abstract = {Flexibility in the design, analysis and interpretation of scientific studies creates a multiplicity of possible research outcomes. Scientists are granted considerable latitude to selectively use and report the hypotheses, variables and analyses that create the most positive, coherent and attractive story while suppressing those that are negative or inconvenient. This creates a risk of bias that can lead to scientists fooling themselves and fooling others. Preregistration involves declaring a research plan (for example, hypotheses, design and statistical analyses) in a public registry before the research outcomes are known. 
Preregistration (1) reduces the risk of bias by encouraging outcome-independent decision-making and (2) increases transparency, enabling others to assess the risk of bias and calibrate their confidence in research outcomes. In this Perspective, we briefly review the historical evolution of preregistration in medicine, psychology and other domains, clarify its pragmatic functions, discuss relevant meta-research, and provide recommendations for scientists and journal editors.}, author = {Hardwicke, Tom E. and Wagenmakers, Eric-Jan}, - copyright = {2022 Springer Nature Limited}, doi = {10.1038/s41562-022-01497-2}, - file = {C:\Users\jamespustejovsky\Zotero\storage\SKQY4R7Q\Hardwicke and Wagenmakers - 2023 - Reducing bias, increasing transparency and calibra.pdf}, - issn = {2397-3374}, journal = {Nature Human Behaviour}, keywords = {Science,Scientific community,technology and society}, langid = {english}, @@ -1014,17 +877,14 @@ @article{hardwicke2023ReducingBiasIncreasing pages = {15--26}, publisher = {Nature Publishing Group}, title = {Reducing Bias, Increasing Transparency and Calibrating Confidence with Preregistration}, - urldate = {2024-01-08}, volume = {7}, year = {2023}, - bdsk-url-1 = {https://doi.org/10.1038/s41562-022-01497-2}} +} @article{harwell2018SurveyReportingPractices, abstract = {Computer simulation studies represent an important tool for investigating processes difficult or impossible to study using mathematical theory or real data. Hoaglin and Andrews recommended these studies be treated as statistical sampling experiments subject to established principles of design and data analysis, but the survey of Hauck and Anderson suggested these recommendations had, at that point in time, generally been ignored. 
We update the survey results of Hauck and Anderson using a sample of studies applying simulation methods in statistical research to assess the extent to which the recommendations of Hoaglin and Andrews and others for conducting simulation studies have been adopted. The important role of statistical applications of computer simulation studies in enhancing the reproducibility of scientific findings is also discussed. The results speak to the state of the art and the extent to which these studies are realizing their potential to inform statistical practice and a program of statistical research.}, author = {Harwell, Michael and Kohli, Nidhi and {Peralta-Torres}, Yadira}, doi = {10.1080/00031305.2017.1342692}, - file = {C:\Users\jamespustejovsky\Zotero\storage\H9BDZ3WT\Harwell et al. - 2018 - A Survey of Reporting Practices of Computer Simula.pdf}, - issn = {0003-1305}, journal = {The American Statistician}, keywords = {Computer simulation,Design and data analysis,Survey}, month = oct, @@ -1032,35 +892,29 @@ @article{harwell2018SurveyReportingPractices pages = {321--327}, publisher = {Taylor \& Francis}, title = {A {{Survey}} of {{Reporting Practices}} of {{Computer Simulation Studies}} in {{Statistical Research}}}, - urldate = {2024-01-02}, volume = {72}, year = {2018}, - bdsk-url-1 = {https://doi.org/10.1080/00031305.2017.1342692}} +} @article{hoogland1998RobustnessStudiesCovariance, abstract = {In covariance structure modeling, several estimation methods are available. The robustness of an estimator against specific violations of assumptions can be determined empirically by means of a Monte Carlo study. Many such studies in covariance structure analysis have been published, but the conclusions frequently seem to contradict each other. An overview of robustness studies in covariance structure analysis is given, and an attempt is made to generalize findings. 
Robustness studies are described and distinguished from each other systematically by means of certain characteristics. These characteristics serve as explanatory variables in a meta-analysis concerning the behavior of parameter estimators, standard error estimators, and goodness-of-fit statistics when the model is correctly specified.}, - author = {HOOGLAND, JEFFREY J. and BOOMSMA, {\relax ANNE}}, + author = {Hoogland, Jeffrey J. and Boomsma, Anne}, doi = {10.1177/0049124198026003003}, - file = {C:\Users\jamespustejovsky\Zotero\storage\D23SYWER\HOOGLAND and BOOMSMA - 1998 - Robustness Studies in Covariance Structure Modelin.pdf}, - issn = {0049-1241}, journal = {Sociological Methods \& Research}, langid = {english}, month = feb, number = {3}, pages = {329--367}, publisher = {SAGE Publications Inc}, - shorttitle = {Robustness {{Studies}} in {{Covariance Structure Modeling}}}, title = {Robustness {{Studies}} in {{Covariance Structure Modeling}}: {{An Overview}} and a {{Meta-Analysis}}}, - urldate = {2024-01-02}, volume = {26}, year = {1998}, - bdsk-url-1 = {https://doi.org/10.1177/0049124198026003003}} +} @article{huang2016GeneralizedEstimatingEquations, abstract = {Background/aims: Generalized estimating equations are a common modeling approach used in cluster randomized trials to account for within-cluster correlation. It is well known that the sandwich variance estimator is biased when the number of clusters is small ({$\leq$}40), resulting in an inflated type I error rate. Various bias correction methods have been proposed in the statistical literature, but how adequately they are utilized in current practice for cluster randomized trials is not clear. The aim of this study is to evaluate the use of generalized estimating equation bias correction methods in recently published cluster randomized trials and demonstrate the necessity of such methods when the number of clusters is small.
Methods: Review of cluster randomized trials published between August 2013 and July 2014 and using generalized estimating equations for their primary analyses. Two independent reviewers collected data from each study using a standardized, pre-piloted data extraction template. A two-arm cluster randomized trial was simulated under various scenarios to show the potential effect of a small number of clusters on type I error rate when estimating the treatment effect. The nominal level was set at 0.05 for the simulation study. Results: Of the 51 included trials, 28 (54.9\%) analyzed 40 or fewer clusters with a minimum of four total clusters. Of these 28 trials, only one trial used a bias correction method for generalized estimating equations. The simulation study showed that with four clusters, the type I error rate ranged between 0.43 and 0.47. Even though type I error rate moved closer to the nominal level as the number of clusters increases, it still ranged between 0.06 and 0.07 with 40 clusters. Conclusions: Our results showed that statistical issues arising from small number of clusters in generalized estimating equations is currently inadequately handled in cluster randomized trials. 
Potential for type I error inflation could be very high when the sandwich estimator is used without bias correction.}, author = {Huang, Shuang and Fiero, Mallorie H and Bell, Melanie L}, doi = {10.1177/1740774516643498}, - issn = {1740-7745, 1740-7753}, journal = {Clinical Trials}, langid = {english}, month = aug, @@ -1068,42 +922,35 @@ @article{huang2016GeneralizedEstimatingEquations pages = {445--449}, shorttitle = {Generalized Estimating Equations in Cluster Randomized Trials with a Small Number of Clusters}, title = {Generalized Estimating Equations in Cluster Randomized Trials with a Small Number of Clusters: {{Review}} of Practice and Simulation Study}, - urldate = {2024-01-05}, volume = {13}, year = {2016}, - bdsk-url-1 = {https://doi.org/10.1177/1740774516643498}} +} @article{hussey2007DesignAnalysisStepped, abstract = {Cluster randomized trials (CRT) are often used to evaluate therapies or interventions in situations where individual randomization is not possible or not desirable for logistic, financial or ethical reasons. While a significant and rapidly growing body of literature exists on CRTs utilizing a ``parallel'' design (i.e. I clusters randomized to each treatment), only a few examples of CRTs using crossover designs have been described. In this article we discuss the design and analysis of a particular type of crossover CRT -- the stepped wedge -- and provide an example of its use.}, author = {Hussey, Michael A. 
and Hughes, James P.}, doi = {10.1016/j.cct.2006.05.007}, - file = {C:\Users\jamespustejovsky\Zotero\storage\ZX4WVRHJ\S1551714406000632.html}, - issn = {1551-7144}, journal = {Contemporary Clinical Trials}, keywords = {Cluster randomized trial,Prevention trials,Stepped wedge design}, month = feb, number = {2}, pages = {182--191}, title = {Design and Analysis of Stepped Wedge Cluster Randomized Trials}, - urldate = {2024-01-05}, volume = {28}, year = {2007}, - bdsk-url-1 = {https://doi.org/10.1016/j.cct.2006.05.007}} +} @book{jones2012IntroductionScientificProgramming, - abstract = {Known for its versatility, the free programming language R is widely used for statistical computing and graphics, but is also a fully functional programming language well suited to scientific programming.An Introduction to Scientific Programming and Simulation Using R teaches the skills needed to perform scientific programming while also introducin}, address = {New York}, author = {Jones, Owen and Maillardet, Robert and Robinson, Andrew}, doi = {10.1201/9781420068740}, - isbn = {978-0-429-14333-5}, - month = oct, publisher = {{Chapman and Hall/CRC}}, title = {Introduction to {{Scientific Programming}} and {{Simulation Using R}}}, year = {2012}, - bdsk-url-1 = {https://doi.org/10.1201/9781420068740}} +} @misc{joshi2022SimhelpersHelperFunctions, - author = {Joshi, Megha and Pustejovsky, James}, + author = {Joshi, Megha and Pustejovsky, James E.}, keywords = {cited}, title = {Simhelpers: {{Helper Functions}} for {{Simulation Studies}}}, year = {2022}} @@ -1130,39 +977,30 @@ @article{kern2016AssessingMethodsGeneralizing abstract = {Randomized experiments are considered the gold standard for causal inference because they can provide unbiased estimates of treatment effects for the experimental participants. However, researchers and policymakers are often interested in using a specific experiment to inform decisions about other target populations. 
In education research, increasing attention is being paid to the potential lack of generalizability of randomized experiments because the experimental participants may be unrepresentative of the target population of interest. This article examines whether generalization may be assisted by statistical methods that adjust for observed differences between the experimental participants and members of a target population. The methods examined include approaches that reweight the experimental data so that participants more closely resemble the target population and methods that utilize models of the outcome. Two simulation studies and one empirical analysis investigate and compare the methods' performance. One simulation uses purely simulated data while the other utilizes data from an evaluation of a school-based dropout prevention program. Our simulations suggest that machine learning methods outperform regression-based methods when the required structural (ignorability) assumptions are satisfied. When these assumptions are violated, all of the methods examined perform poorly. Our empirical analysis uses data from a multisite experiment to assess how well results from a given site predict impacts in other sites. Using a variety of extrapolation methods, predicted effects for each site are compared to actual benchmarks. Flexible modeling approaches perform best, although linear regression is not far behind. Taken together, these results suggest that flexible modeling techniques can aid generalization while underscoring the fact that even state-of-the-art statistical techniques still rely on strong assumptions.}, author = {Kern, Holger L. and Stuart, Elizabeth A. and Hill, Jennifer and Green, Donald P.}, doi = {10.1080/19345747.2015.1060282}, - file = {C:\Users\jamespustejovsky\Zotero\storage\2F7WKUXW\Kern et al. 
- 2016 - Assessing Methods for Generalizing Experimental Im.pdf}, - issn = {1934-5747}, journal = {Journal of Research on Educational Effectiveness}, keywords = {Bayesian Additive Regression Trees external validity generalizability propensity score weighting}, month = jan, number = {1}, pages = {103--127}, - pmid = {27668031}, - publisher = {Routledge}, title = {Assessing {{Methods}} for {{Generalizing Experimental Impact Estimates}} to {{Target Populations}}}, - urldate = {2024-01-01}, volume = {9}, year = {2016}, - bdsk-url-1 = {https://doi.org/10.1080/19345747.2015.1060282}} +} @article{koehler2009AssessmentMonteCarlo, abstract = {Statistical experiments, more commonly referred to as Monte Carlo or simulation studies, are used to study the behavior of statistical methods and measures under controlled situations. Whereas recent computing and methodological advances have permitted increased efficiency in the simulation process, known as variance reduction, such experiments remain limited by their finite nature and hence are subject to uncertainty; when a simulation is run more than once, different results are obtained. However, virtually no emphasis has been placed on reporting the uncertainty, referred to here as Monte Carlo error, associated with simulation results in the published literature, or on justifying the number of replications used. These deserve broader consideration. Here we present a series of simple and practical methods for estimating Monte Carlo error as well as determining the number of replications required to achieve a desired level of accuracy. The issues and methods are demonstrated with two simple examples, one evaluating operating characteristics of the maximum likelihood estimator for the parameters in logistic regression and the other in the context of using the bootstrap to obtain 95\% confidence intervals. 
The results suggest that in many settings, Monte Carlo error may be more substantial than traditionally thought.}, author = {Koehler, Elizabeth and Brown, Elizabeth and Haneuse, Sebastien J.-P. A.}, doi = {10.1198/tast.2009.0030}, - file = {C:\Users\jamespustejovsky\Zotero\storage\BZFE3YSE\Koehler et al. - 2009 - On the Assessment of Monte Carlo Error in Simulati.pdf}, - issn = {0003-1305}, journal = {The American Statistician}, keywords = {Bootstrap,cited,Jackknife,Replication}, month = may, number = {2}, pages = {155--162}, - pmid = {22544972}, publisher = {Taylor \& Francis}, title = {On the {{Assessment}} of {{Monte Carlo Error}} in {{Simulation-Based Statistical Analyses}}}, - urldate = {2024-01-02}, volume = {63}, year = {2009}, - bdsk-url-1 = {https://doi.org/10.1198/tast.2009.0030}} +} @misc{leschinski2019MonteCarloAutomaticParallelized, author = {Leschinski, Christian Hendrik}, @@ -1174,48 +1012,37 @@ @misc{leschinski2019MonteCarloAutomaticParallelized @article{leyrat2013PropensityScoresUsed, abstract = {Cluster randomized trials (CRTs) are often prone to selection bias despite randomization. Using a simulation study, we investigated the use of propensity score (PS) based methods in estimating treatment effects in CRTs with selection bias when the outcome is quantitative. Of four PS-based methods (adjustment on PS, inverse weighting, stratification, and optimal full matching method), three successfully corrected the bias, as did an approach using classical multivariable regression. However, they showed poorer statistical efficiency than classical methods, with higher standard error for the treatment effect, and type I error much smaller than the 5\% nominal level. Copyright {\copyright} 2013 John Wiley \& Sons, Ltd.}, author = {Leyrat, C. and Caille, A. and Donner, A. 
and Giraudeau, B.}, - copyright = {Copyright {\copyright} 2013 John Wiley \& Sons, Ltd.}, doi = {10.1002/sim.5795}, - file = {C\:\\Users\\jamespustejovsky\\Zotero\\storage\\VHHHRSDD\\Leyrat et al. - 2013 - Propensity scores used for analysis of cluster ran.pdf;C\:\\Users\\jamespustejovsky\\Zotero\\storage\\FU2REDBN\\sim.html}, - issn = {1097-0258}, journal = {Statistics in Medicine}, keywords = {cluster randomized trial,Monte-Carlo simulations,propensity score,selection bias}, langid = {english}, number = {19}, pages = {3357--3372}, - shorttitle = {Propensity Scores Used for Analysis of Cluster Randomized Trials with Selection Bias}, title = {Propensity Scores Used for Analysis of Cluster Randomized Trials with Selection Bias: A Simulation Study}, - urldate = {2024-01-05}, volume = {32}, year = {2013}, - bdsk-url-1 = {https://doi.org/10.1002/sim.5795}} +} @article{lohmann2022ItTimeTen, abstract = {The quantitative analysis of research data is a core element of empirical research. The performance of statistical methods that are used for analyzing empirical data can be evaluated and compared using computer simulations. A single simulation study can influence the analyses of thousands of empirical studies to follow. With great power comes great responsibility. Here, we argue that this responsibility includes replication of simulation studies to ensure a sound foundation for data analytical decisions. Furthermore, being designed, run, and reported by humans, simulation studies face challenges similar to other experimental empirical research and hence should not be exempt from replication attempts. We highlight that the potential replicability of simulation studies is an opportunity quantitative methodology as a field should pay more attention to.}, author = {Lohmann, Anna and Astivia, Oscar L. O. and Morris, Tim P. and Groenwold, Rolf H. H.}, - file = {C:\Users\jamespustejovsky\Zotero\storage\GBRU4F33\Lohmann et al. - 2022 - It's time! 
Ten reasons to start replicating simula.pdf}, - issn = {2674-1199}, journal = {Frontiers in Epidemiology}, title = {It's Time! {{Ten}} Reasons to Start Replicating Simulation Studies}, - urldate = {2024-01-01}, volume = {2}, year = {2022}} @article{miratrix2021applied, author = {Miratrix, Luke W. and Weiss, Michael J. and Henderson, Brit}, doi = {10.1080/19345747.2020.1831115}, - issn = {1934-5747, 1934-5739}, journal = {Journal of Research on Educational Effectiveness}, langid = {english}, month = jan, number = {1}, pages = {270--308}, - shorttitle = {An {{Applied Researcher}}'s {{Guide}} to {{Estimating Effects}} from {{Multisite Individually Randomized Trials}}}, title = {An {{Applied Researcher}}'s {{Guide}} to {{Estimating Effects}} from {{Multisite Individually Randomized Trials}}: {{Estimands}}, {{Estimators}}, and {{Estimates}}}, - urldate = {2024-01-05}, volume = {14}, year = {2021}, - bdsk-url-1 = {https://doi.org/10.1080/19345747.2020.1831115}} +} @book{miratrix2023DesigningMonteCarlo, - author = {Miratrix, Luke W. and Pustejovsky, Jame E.}, + author = {Miratrix, Luke W. and Pustejovsky, James E.}, @@ -1227,21 +1054,17 @@ @article{moerbeek2019WhatAreStatistical, abstract = {Subjects in randomized controlled trials do not always comply to the treatment condition they have been assigned to. This may cause the estimated effect of the intervention to be biased and also affect efficiency, coverage of confidence intervals, and statistical power. In cluster randomized trials non-compliance may occur at the subject level but also at the cluster level. In the latter case, all subjects within the same cluster have the same compliance status. The purpose of this study is to investigate the statistical implications of non-compliance in cluster randomized trials. A simulation study was conducted with varying degrees of non-compliance at either the cluster level or subject level. The probability of non-compliance depends on a covariate at the cluster or subject level.
Various realistic values of the intraclass correlation coefficient and cluster size are used. The data are analyzed by intention to treat, as treated, per protocol and the instrumental variable approach. The results show non-compliance may result in downward biased estimates of the intervention effect and an under- or overestimate of its standard deviation. The coverage of the confidence intervals may be too small, and in most cases, empirical power is too small. The results are more severe when the probability of non-compliance increases and the covariate that affects compliance is unobserved. It is advocated to avoid non-compliance. If this is not possible, compliance status and covariates that affect compliance should be measured and included in the statistical model.}, author = {Moerbeek, Mirjam and van Schie, Sander}, - copyright = {{\copyright} 2019 The Authors. Statistics~in~Medicine Published by John Wiley \& Sons Ltd.}, doi = {10.1002/sim.8351}, - file = {C\:\\Users\\jamespustejovsky\\Zotero\\storage\\3D57RHNZ\\Moerbeek and Schie - 2019 - What are the statistical implications of treatment.pdf;C\:\\Users\\jamespustejovsky\\Zotero\\storage\\5VFMRPKP\\sim.html}, - issn = {1097-0258}, journal = {Statistics in Medicine}, keywords = {cluster randomized trial,simulation study,treatment non-compliance}, langid = {english}, number = {26}, pages = {5071--5084}, - shorttitle = {What Are the Statistical Implications of Treatment Non-Compliance in Cluster Randomized Trials}, title = {What Are the Statistical Implications of Treatment Non-Compliance in Cluster Randomized Trials: {{A}} Simulation Study}, urldate = {2024-01-05}, volume = {38}, year = {2019}, - bdsk-url-1 = {https://doi.org/10.1002/sim.8351}} +} @book{mooney1997MonteCarloSimulation, author = {Mooney, Christopher Z}, @@ -1254,46 +1077,36 @@ @book{mooney1997MonteCarloSimulation @article{morris2019UsingSimulationStudies, author = {Morris, Tim P. and White, Ian R. 
and Crowther, Michael J.}, doi = {10.1002/sim.8086}, - file = {C:\Users\jamespustejovsky\Zotero\storage\VNK7VV22\Morris et al. - 2019 - Using simulation studies to evaluate statistical m.pdf}, - issn = {02776715}, journal = {Statistics in Medicine}, keywords = {cited}, langid = {english}, month = jan, - shorttitle = {Using Simulation Studies to Evaluate Statistical Methods}, title = {Using Simulation Studies to Evaluate Statistical Methods}, urldate = {2019-01-26}, year = {2019}, - bdsk-url-1 = {https://doi.org/10.1002/sim.8086}} +} @misc{nguyen2022MpowerPackagePower, abstract = {Estimating sample size and statistical power is an essential part of a good study design. This R package allows users to conduct power analysis based on Monte Carlo simulations in settings in which consideration of the correlations between predictors is important. It runs power analyses given a data generative model and an inference model. It can set up a data generative model that preserves dependence structures among variables given existing data (continuous, binary, or ordinal) or high-level descriptions of the associations. Users can generate power curves to assess the trade-offs between sample size, effect size, and power of a design. This paper presents tutorials and examples focusing on applications for environmental mixture studies when predictors tend to be moderately to highly correlated. It easily interfaces with several existing and newly developed analysis strategies for assessing associations between exposures and health outcomes. However, the package is sufficiently general to facilitate power simulations in a wide variety of settings.}, author = {Nguyen, Phuc H. and Engel, Stephanie M. and Herring, Amy H.}, - file = {C:\Users\jamespustejovsky\Zotero\storage\ZV6P78KM\Nguyen et al. 
- 2022 - mpower An R Package for Power Analysis via Simula.pdf}, howpublished = {https://arxiv.org/abs/2209.08036v1}, journal = {arXiv.org}, langid = {english}, month = sep, shorttitle = {Mpower}, title = {Mpower: {{An R Package}} for {{Power Analysis}} via {{Simulation}} for {{Correlated Data}}}, - urldate = {2024-01-01}, year = {2022}} @article{orcan2021MonteCarloSEMPackageSimulate, abstract = {Monte Carlo simulation is a useful tool for researchers to estimated accuracy of a statistical model. It is usually used for investigating parameter estimation procedure or violation of assumption for some given conditions. To run a simulation either the paid software or open source but free program such as R is need to be used. For that, researchers must have a good knowledge about the theoretical procedures. This paper introduces the R package called MonteCarloSEM. The package helps to simulate and analyze data sets for some simulation condition such as sample size and normality for a given model. Also, an example is given to show how the functions within the package works.}, author = {Or{\c c}an, Fatih}, - file = {C:\Users\jamespustejovsky\Zotero\storage\DRYK6I84\Or{\c c}an - 2021 - MonteCarloSEM An R Package to Simulate Data for S.pdf}, - issn = {2148-7456}, journal = {International Journal of Assessment Tools in Education}, keywords = {cited}, langid = {english}, month = sep, number = {3}, pages = {704--713}, - publisher = {{\.I}zzet KARA}, - shorttitle = {{{MonteCarloSEM}}}, title = {{{MonteCarloSEM}}: {{An R Package}} to {{Simulate Data}} for {{SEM}}}, - urldate = {2024-01-02}, volume = {8}, year = {2021}} @@ -1301,33 +1114,28 @@ @article{paxton2001MonteCarloExperiments abstract = {The use of Monte Carlo simulations for the empirical assessment of statistical estimators is becoming more common in structural equation modeling research. Yet, there is little guidance for the researcher interested in using the technique. 
In this article we illustrate both the design and implementation of Monte Carlo simulations. We present 9 steps in planning and performing a Monte Carlo analysis: (1) developing a theoretically derived research question of interest, (2) creating a valid model, (3) designing specific experimental conditions, (4) choosing values of population parameters, (5) choosing an appropriate software package, (6) executing the simulations, (7) file storage, (8) troubleshooting and verification, and (9) summarizing results. Throughout the article, we use as a running example a Monte Carlo simulation that we performed to illustrate many of the relevant points with concrete information and detail.}, author = {Paxton, Pamela and Curran, Patrick J. and Bollen, Kenneth A. and Kirby, Jim and Chen, Feinian}, doi = {10.1207/S15328007SEM0802_7}, - issn = {1070-5511}, journal = {Structural Equation Modeling: A Multidisciplinary Journal}, keywords = {cited}, month = apr, number = {2}, pages = {287--312}, publisher = {Routledge}, - shorttitle = {Monte {{Carlo Experiments}}}, title = {Monte {{Carlo Experiments}}: {{Design}} and {{Implementation}}}, - urldate = {2024-01-02}, volume = {8}, year = {2001}, - bdsk-url-1 = {https://doi.org/10.1207/S15328007SEM0802_7}} +} @book{robert2010IntroducingMonteCarlo, address = {New York, NY}, author = {Robert, Christian and Casella, George}, doi = {10.1007/978-1-4419-1576-4}, - file = {C:\Users\jamespustejovsky\Zotero\storage\RX3A54TU\Robert and Casella - 2010 - Introducing Monte Carlo Methods with R.pdf}, isbn = {978-1-4419-1582-5 978-1-4419-1576-4}, keywords = {bayesian statistics,Markov chain,Mathematica,Monte Carlo,Monte Carlo method,Random variable,simulation,STATISTICA}, langid = {english}, publisher = {Springer}, title = {Introducing {{Monte Carlo Methods}} with {{R}}}, - urldate = {2024-01-02}, year = {2010}, - bdsk-url-1 = {https://doi.org/10.1007/978-1-4419-1576-4}} +} @misc{scheer2020SimToolConductSimulation, author = {Scheer, Marcel}, @@ 
-1340,63 +1148,48 @@ @article{siepe2024SimulationStudiesMethodological abstract = {Simulation studies are widely used for evaluating the performance of statistical methods in psychology. However, the quality of simulation studies can vary widely in terms of their design, execution, and reporting. In order to assess the quality of typical simulation studies in psychology, we reviewed 321 articles published in Psychological Methods, Behavioral Research Methods, and Multivariate Behavioral Research in 2021 and 2022, among which 100/321 = 31.2\% report a simulation study. We find that many articles do not provide complete and transparent information about key aspects of the study, such as justifications for the number of simulation repetitions, Monte Carlo uncertainty estimates, or code and data to reproduce the simulation studies. To address this problem, we provide a summary of the ADEMP (Aims, Data-generating mechanism, Estimands and other targets, Methods, Performance measures) design and reporting framework from Morris, White, and Crowther (2019) adapted to simulation studies in psychology. Based on this framework, we provide ADEMP-PreReg, a step-by-step template for researchers to use when designing, potentially preregistering, and reporting their simulation studies. We give formulae for estimating common performance measures, their Monte Carlo standard errors, and for calculating the number of simulation repetitions to achieve a desired Monte Carlo standard error. Finally, we give a detailed tutorial on how to apply the ADEMP framework in practice using an example simulation study on the evaluation of methods for the analysis of pre--post measurement experiments.}, author = {Siepe, Bj{\"o}rn S. and Barto{\v s}, Franti{\v s}ek and Morris, Tim and Boulesteix, Anne-Laure and Heck, Daniel W. and Pawel, Samuel}, doi = {10.31234/osf.io/ufgy6}, - file = {C\:\\Users\\jamespustejovsky\\Zotero\\storage\\3WUS7RGS\\Siepe et al. 
- 2024 - Simulation Studies for Methodological Research in .pdf;C\:\\Users\\jamespustejovsky\\Zotero\\storage\\ZA4S3LP9\\ufgy6.html}, keywords = {cited}, langid = {american}, month = jan, - publisher = {OSF}, - shorttitle = {Simulation {{Studies}} for {{Methodological Research}} in {{Psychology}}}, title = {Simulation Studies for Methodological Research in Psychology: A Standardized Template for Planning, Preregistration, and Reporting}, - urldate = {2024-01-01}, year = {2024}, - bdsk-url-1 = {https://doi.org/10.31234/osf.io/ufgy6}} +} @article{sigal2016PlayItAgain, abstract = {Monte Carlo simulations (MCSs) provide important information about statistical phenomena that would be impossible to assess otherwise. This article introduces MCS methods and their applications to research and statistical pedagogy using a novel software package for the R Project for Statistical Computing constructed to lessen the often steep learning curve when organizing simulation code. A primary goal of this article is to demonstrate how well-suited MCS designs are to classroom demonstrations, and how they provide a hands-on method for students to become acquainted with complex statistical concepts. In this article, essential programming aspects for writing MCS code in R are overviewed, multiple applied examples with relevant code are provided, and the benefits of using a generate--analyze--summarize coding structure over the typical ``for-loop'' strategy are discussed.}, author = {Sigal, Matthew J. and Chalmers, R. 
Philip}, doi = {10.1080/10691898.2016.1246953}, - file = {C:\Users\jamespustejovsky\Zotero\storage\VJFLBCD7\Sigal and Chalmers - 2016 - Play It Again Teaching Statistics With Monte Carl.pdf}, - issn = {null}, journal = {Journal of Statistics Education}, keywords = {Active learning,R,Simulation,Statistical computing}, month = sep, number = {3}, pages = {136--156}, publisher = {Taylor \& Francis}, - shorttitle = {Play {{It Again}}}, title = {Play {{It Again}}: {{Teaching Statistics With Monte Carlo Simulation}}}, - urldate = {2024-01-01}, volume = {24}, year = {2016}, - bdsk-url-1 = {https://doi.org/10.1080/10691898.2016.1246953}} +} @article{skrondal2000DesignAnalysisMonte, abstract = {The design and analysis of Monte Carlo experiments, with special reference to structural equation modelling, is discussed in this article. These topics merit consideration, since the validity of the conclusions drawn from a Monte Carlo study clearly hinges on these features. It is argued that comprehensive Monte Carlo experiments can be implemented on a PC if the experiments are adequately designed. This is especially important when investigating modern computer intensive methodologies like resampling and Markov Chain Monte Carlo methods. We are faced with three fundamental challenges in Monte Carlo experimentation. The first problem is statistical precision, which concerns the reliability of the obtained results. External validity, on the other hand, depends on the number of experimental conditions, and is crucial for the prospects of generalising the results beyond the specific experiment. Finally, we face the constraint on available computer resources. 
The conventional wisdom in designing and analysing Monte Carlo experiments embodies no explicit specification of meta-model for analysing the output of the experiment, the use of case studies or full factorial designs as experimental plans, no use of variance reduction techniques, a large number of replications, and "eyeballing" of the results. A critical examination of the conventional wisdom is presented in this article. We suggest that the following alternative procedures should be considered. First of all, we argue that it is profitable to specify explicit meta-models, relating the chosen performance statistics and experimental conditions. Regarding the experimental plan, we recommend the use of incomplete designs, which will often result in considerable savings. We also consider the use of common random numbers in the simulation phase, since this may enhance the precision in estimating meta-models. The use of fewer replications per trial, enabling us to investigate an increased number of experimental conditions, should also be considered in order to improve the external validity at the cost of the conventionally excessive precision.}, author = {Skrondal, Anders}, doi = {10.1207/S15327906MBR3502_1}, - issn = {0027-3171}, journal = {Multivariate Behavioral Research}, keywords = {cited}, month = apr, number = {2}, pages = {137--167}, - pmid = {26754081}, - publisher = {Routledge}, - shorttitle = {Design and {{Analysis}} of {{Monte Carlo Experiments}}}, title = {Design and {{Analysis}} of {{Monte Carlo Experiments}}: {{Attacking}} the {{Conventional Wisdom}}}, - urldate = {2024-01-02}, volume = {35}, year = {2000}, - bdsk-url-1 = {https://doi.org/10.1207/S15327906MBR3502_1}} +} @article{smith1973MonteCarloMethods, author = {Smith, Vincent Kerry}, - file = {C:\Users\jamespustejovsky\Zotero\storage\WC9SPKTS\1130000796834682624.html}, journal = {(No Title)}, langid = {english}, shorttitle = {Monte {{Carlo}} Methods}, title = {Monte {{Carlo}} Methods : {{Their Role}} 
for {{Econometrics}}}, - urldate = {2024-01-02}, year = {1973}} @article{sofrygin2017SimcausalPackageConducting, @@ -1409,12 +1202,10 @@ @article{sofrygin2017SimcausalPackageConducting title = {Simcausal {{R Package}}: {{Conducting Transparent}} and {{Reproducible Simulation Studies}} of {{Causal Effect Estimation}} with {{Complex Longitudinal Data}}}, volume = {81}, year = {2017}, - bdsk-url-1 = {https://doi.org/10.18637/jss.v081.i02}} +} @article{vevea1995general, author = {Vevea, Jack L and Hedges, Larry V}, - date = {1995-09-01}, - date-modified = {2025-06-19 10:38:20 -0700}, doi = {10.1007/BF02294384}, journaltitle = {Psychometrika}, number = {3}, @@ -1423,19 +1214,16 @@ @article{vevea1995general title = {A general linear model for estimating effect size in the presence of publication bias}, volume = {60}, year = {1995}, - bdsk-url-1 = {https://doi.org/10.1007/BF02294384}} +} @article{white2023HowCheckSimulation, abstract = {Simulation studies are powerful tools in epidemiology and biostatistics, but they can be hard to conduct successfully. Sometimes unexpected results are obtained. We offer advice on how to check a simulation study when this occurs, and how to design and conduct the study to give results that are easier to check. Simulation studies should be designed to include some settings in which answers are already known. They should be coded in stages, with data-generating mechanisms checked before simulated data are analysed. Results should be explored carefully, with scatterplots of standard error estimates against point estimates surprisingly powerful tools. Failed estimation and outlying estimates should be identified and dealt with by changing data-generating mechanisms or coding realistic hybrid analysis procedures. Finally, we give a series of ideas that have been useful to us in the past for checking unexpected results. 
Following our advice may help to prevent errors and to improve the quality of published simulation studies.}, author = {White, Ian R and Pham, Tra My and Quartagno, Matteo and Morris, Tim P}, doi = {10.1093/ije/dyad134}, - file = {C\:\\Users\\jamespustejovsky\\Zotero\\storage\\MI2FJWW8\\White et al. - 2023 - How to check a simulation study.pdf;C\:\\Users\\jamespustejovsky\\Zotero\\storage\\PP2MIHSI\\7313663.html}, - issn = {0300-5771}, journal = {International Journal of Epidemiology}, keywords = {cited}, month = oct, pages = {dyad134}, title = {How to Check a Simulation Study}, - urldate = {2024-01-01}, year = {2023}, - bdsk-url-1 = {https://doi.org/10.1093/ije/dyad134}} +} diff --git a/code/meta_analysis_playing.R b/code/meta_analysis_playing.R new file mode 100644 index 0000000..579c987 --- /dev/null +++ b/code/meta_analysis_playing.R @@ -0,0 +1,194 @@ + + +# Looking at how to viz simulation results and deal with +# heteroskedasticity in the MCSEs. + +# E.g., by the "MCSE Funnel Plot" and by using meta analysis to get +# shrunk estimates of performance + +library( tidyverse ) + +#### Load the ClusterRCT sim results and calc performance metrics #### + +source( here::here( "case_study_code/clustered_data_simulation.R" ) ) +source( here::here( "case_study_code/cronbach_alpha_simulation.R" ) ) + +res <- readRDS( file = here::here( "results/simulation_CRT.rds" ) ) +res + + +# Cut down to 100 reps to make MCSE even larger (optional) +res <- res %>% + filter( as.numeric(runID) <= 100 ) +res + + +library( simhelpers ) +sres <- + res %>% + group_by( + n_bar, J, ATE, size_coef, ICC, alpha, method + ) %>% + summarise( + calc_absolute( estimates = ATE_hat, true_param = ATE, + criteria = c("bias","stddev", "rmse")), + calc_relative_var( estimates = ATE_hat, var_estimates = SE_hat^2, + criteria = "relative bias" ), + power = mean( p_value <= 0.05 ), + ESE_hat = sqrt( mean( SE_hat^2 ) ), + SD_SE_hat = sqrt( sd( SE_hat^2 ) ), + ) %>% + rename( + R = K_absolute, + RMSE = rmse, + 
RMSE_mcse = rmse_mcse, + SE = stddev, + SE_mcse = stddev_mcse + ) %>% + dplyr::select( -K_relvar ) %>% + ungroup() + +sres + + +#### The meta analysis funnel plots #### + +# For Bias + +ggplot( sres, aes( bias_mcse, bias, col=as.factor(size_coef) )) + + facet_grid( alpha ~ method ) + + geom_point() + + geom_abline( slope=2, intercept=0, col="darkgrey", lty=2 ) + + geom_abline( slope=-2, intercept=0, col="darkgrey", lty=2 ) + + theme_minimal() + +summary( sres$bias_mcse ) + + +# For SE + +# NOTE: Not useful since we don't expect these SE values to be +# centered around any common value---we would need to somehow subtract +# out their expected values to see the residuals scatter, I think? +ggplot( sres, aes( SE_mcse, SE, col=as.factor(size_coef) )) + + facet_grid( alpha ~ method ) + + geom_point() + + theme_minimal() + +summary( sres$SE_mcse / sres$SE ) + + +# Is smoothing and then looking at residuals better? + +# This would be asking: Given reasonably precise SE_mcse estimates, +# we have a sense of what the true SE should be, roughly. We then see +# if the deviation from that is larger than expected? +M_se = loess( SE ~ SE_mcse, data=sres ) +sres$SE_fitted = predict( M_se ) +sres$SE_resid = sres$SE - sres$SE_fitted +summary( sres$SE_resid ) + +ggplot( sres, aes( SE_mcse, SE_resid, col=as.factor(size_coef) )) + + facet_grid( alpha ~ method ) + + geom_point() + + geom_abline( slope=2, intercept=0, col="darkgrey", lty=2 ) + + geom_abline( slope=-2, intercept=0, col="darkgrey", lty=2 ) + + theme_minimal() + +# Is anything to be learned here?
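The funnel check used in the script above can be sanity-tested on its own. The following sketch uses a fabricated scenario grid (not the ClusterRCT results): for an unbiased estimator, roughly 95% of scenario-level bias estimates should land inside the plus-or-minus 2 MCSE funnel, so a point well outside the funnel flags either real bias or an MCSE problem.

```r
# Standalone sketch of the "MCSE funnel" check for bias.
# Scenario parameters here are illustrative assumptions, not taken
# from the simulation results loaded above.
set.seed(20240101)
R <- 100   # replications per scenario (small, so MCSEs are visible)
S <- 40    # number of scenarios

perf <- t(sapply(seq_len(S), function(s) {
  # An unbiased estimator whose sampling SD varies across scenarios
  est <- rnorm(R, mean = 0, sd = runif(1, 0.5, 2))
  c(bias = mean(est), bias_mcse = sd(est) / sqrt(R))
}))
perf <- as.data.frame(perf)

# Fraction of scenarios falling inside the 2 * MCSE funnel;
# with no true bias this should sit near 0.95.
inside <- mean(abs(perf$bias) <= 2 * perf$bias_mcse)
inside
```

Plotting `bias` against `bias_mcse` with the same `geom_abline(slope = 2)` and `slope = -2` reference lines as in the script reproduces the funnel visually.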
+ + +# For RMSE +# Also broken due to the SE reason, above +ggplot( sres, aes( RMSE_mcse, RMSE, col=as.factor(size_coef) )) + + facet_grid( alpha ~ method ) + + geom_point() + + theme_minimal() + +summary( sres$RMSE_mcse / sres$RMSE ) + + + + +#### Initial attempt to fit multilevel model on the raw data #### + + +if ( FALSE ) { + library( lme4 ) + res + res$err = res$ATE_hat - res$ATE + table( table( res$seed ) ) + nrow(sres) / 3 + + M = lmer( err ~ 1 + method + (0+method|seed) + (1|seed:runID), + data = res ) + + + M = lmer( err ~ 1 + (as.factor(size_coef)*as.factor(alpha) + ICC + as.factor(n_bar) + as.factor(J) ) * method + (1+method|seed) + (1|seed:runID), + data = res ) + + arm::display(M) + VarCorr(M) + a = coef(M)$seed %>% + as.data.frame() + a$seed = rownames(a) + head(a) + + aL <- a %>% + pivot_longer( cols = -c( seed, `(Intercept)` ), + names_to = "method", + values_to = "bias_method" ) + sres + a = left_join( a, sres, by="seed" ) + +} + +#------------------------------------------------------------------------------- +# random effects meta-analysis of bias per method +library(metafor) + +sres %>% + filter(method == "MLM") %>% + rma.uni( + yi = bias, sei = bias_mcse, + mods = ~ as.factor(size_coef) * as.factor(alpha) * ICC + as.factor(n_bar) + as.factor(J), + data = . 
+ ) + +RE_shrink <- function(dat) { + RE_fit <- rma.uni( + yi = bias, sei = bias_mcse, + mods = ~ as.factor(size_coef) * as.factor(alpha) * ICC + as.factor(n_bar) + as.factor(J), + data = dat + ) + + shrunk_bias <- blup(RE_fit) + data.frame(shrunk_bias = shrunk_bias$pred, shrunk_bias_mcse = shrunk_bias$se) +} + +sres_shrunken <- + sres %>% + group_nest(method) %>% + mutate( + shrunk_bias = map(data, RE_shrink) + ) %>% + unnest(cols = c(data, shrunk_bias)) + +ggplot( sres_shrunken, aes( bias_mcse, bias, col=as.factor(size_coef) )) + + facet_grid( alpha ~ method ) + + geom_point() + + geom_abline( slope=2, intercept=0, col="darkgrey", lty=2 ) + + geom_abline( slope=-2, intercept=0, col="darkgrey", lty=2 ) + + scale_x_continuous(limits = c(0, 0.1)) + + scale_y_continuous(limits = c(-0.2, 0.2)) + + theme_minimal() + +ggplot( sres_shrunken, aes( shrunk_bias_mcse, shrunk_bias, col=as.factor(size_coef) )) + + facet_grid( alpha ~ method ) + + geom_point() + + geom_abline( slope=2, intercept=0, col="darkgrey", lty=2 ) + + geom_abline( slope=-2, intercept=0, col="darkgrey", lty=2 ) + + scale_x_continuous(limits = c(0, 0.1)) + + scale_y_continuous(limits = c(-0.2, 0.2)) + + theme_minimal() + diff --git a/index.Rmd b/index.Rmd index bd7c7af..d322e6b 100644 --- a/index.Rmd +++ b/index.Rmd @@ -27,7 +27,19 @@ theme_set( theme_classic() ) # automatically create a bib database for R packages # These pulled from a grep of the Rmd files in the book, and made into a list of unique packages. 
# devtools::install_github("lmiratrix/blkvar") -packs = c("tidyverse", "dplyr", "ggplot2", "simhelpers", "psych", "mvtnorm", "lme4", "arm", "lmerTest", "estimatr", "blkvar", "microbenchmark", "purrr", "future", "furrr", "tidyr", "lsr", 'bookdown', 'simhelpers', 'knitr', 'rmarkdown', 'purrr') + +#devtools::install_github("https://github.com/meghapsimatrix/simhelpers") + +packs = c("tidyverse", "dplyr", "ggplot2", "simhelpers", "psych", "mvtnorm", "lme4", "arm", + "lmerTest", "estimatr", "blkvar", "microbenchmark", "purrr", "future", "furrr", + "tidyr", "lsr", "bookdown", "knitr", "rmarkdown", + "kableExtra", "metafor", "mlmpower", "purrrlyr", "rpart.plot", "sn", "ggridges", + "carData", "car", "AER", + "tibble", "clubSandwich", "broom", "modelr", "glmnet", + "testthat", "tictoc", "bench", "ggbeeswarm" +) pacman::p_load( char = packs ) knitr::write_bib( diff --git a/packages.bib index f8a0b85..851f9ae 100644 --- a/packages.bib +++ b/packages.bib @@ -1,3 +1,10 @@ +@Manual{R-AER, + title = {AER: Applied Econometrics with R}, + author = {Christian Kleiber and Achim Zeileis}, + year = {2025}, + note = {R package version 1.2-15}, +} + @Manual{R-arm, title = {arm: Data Analysis Using Regression and Multilevel/Hierarchical Models},
0.0.1.6, commit d6cec2070a119f8490494f7ecbbe5e007a927bd3}, - url = {https://github.com/lmiratrix/blkvar}, year = {2025}, + note = {R package version 0.0.1.6, commit 60cf10e16a9960a3b0fe0c91adbe3671f604e040}, + url = {https://github.com/lmiratrix/blkvar}, } @Manual{R-bookdown, title = {bookdown: Authoring Books and Technical Documents with R Markdown}, author = {Yihui Xie}, year = {2025}, - note = {R package version 0.44}, + note = {R package version 0.45}, url = {https://github.com/rstudio/bookdown}, } +@Manual{R-broom, + title = {broom: Convert Statistical Objects into Tidy Tibbles}, + author = {David Robinson and Alex Hayes and Simon Couch}, + year = {2025}, + note = {R package version 1.0.10}, + url = {https://broom.tidymodels.org/}, +} + +@Manual{R-car, + title = {car: Companion to Applied Regression}, + author = {John Fox and Sanford Weisberg and Brad Price}, + year = {2024}, + note = {R package version 3.1-3}, + url = {https://r-forge.r-project.org/projects/car/}, +} + +@Manual{R-carData, + title = {carData: Companion to Applied Regression Data Sets}, + author = {John Fox and Sanford Weisberg and Brad Price}, + year = {2022}, + note = {R package version 3.0-5}, + url = {https://r-forge.r-project.org/projects/car/}, +} + +@Manual{R-clubSandwich, + title = {clubSandwich: Cluster-Robust (Sandwich) Variance Estimators with Small-Sample +Corrections}, + author = {James E. 
Pustejovsky}, + year = {2025}, + note = {R package version 0.6.1}, + url = {http://jepusto.github.io/clubSandwich/}, +} + @Manual{R-dplyr, title = {dplyr: A Grammar of Data Manipulation}, author = {Hadley Wickham and Romain François and Lionel Henry and Kirill Müller and Davis Vaughan}, @@ -56,14 +104,46 @@ @Manual{R-future url = {https://future.futureverse.org}, } +@Manual{R-ggbeeswarm, + title = {ggbeeswarm: Categorical Scatter (Violin Point) Plots}, + author = {Erik Clarke and Scott Sherrill-Mix and Charlotte Dawson}, + year = {2023}, + note = {R package version 0.7.2}, + url = {https://github.com/eclarke/ggbeeswarm}, +} + @Manual{R-ggplot2, title = {ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics}, author = {Hadley Wickham and Winston Chang and Lionel Henry and Thomas Lin Pedersen and Kohske Takahashi and Claus Wilke and Kara Woo and Hiroaki Yutani and Dewey Dunnington and Teun {van den Brand}}, year = {2025}, - note = {R package version 3.5.2}, + note = {R package version 4.0.0}, url = {https://ggplot2.tidyverse.org}, } +@Manual{R-ggridges, + title = {ggridges: Ridgeline Plots in ggplot2}, + author = {Claus O. 
Wilke}, + year = {2025}, + note = {R package version 0.5.7}, + url = {https://wilkelab.org/ggridges/}, +} + +@Manual{R-glmnet, + title = {glmnet: Lasso and Elastic-Net Regularized Generalized Linear Models}, + author = {Jerome Friedman and Trevor Hastie and Rob Tibshirani and Balasubramanian Narasimhan and Kenneth Tay and Noah Simon and James Yang}, + year = {2025}, + note = {R package version 4.1-10}, + url = {https://glmnet.stanford.edu}, +} + +@Manual{R-kableExtra, + title = {kableExtra: Construct Complex Table with kable and Pipe Syntax}, + author = {Hao Zhu}, + year = {2024}, + note = {R package version 1.4.0}, + url = {http://haozhu233.github.io/kableExtra/}, +} + @Manual{R-knitr, title = {knitr: A General-Purpose Package for Dynamic Report Generation in R}, author = {Yihui Xie}, @@ -96,6 +176,14 @@ @Manual{R-lsr url = {https://github.com/djnavarro/lsr}, } +@Manual{R-metafor, + title = {metafor: Meta-Analysis Package for R}, + author = {Wolfgang Viechtbauer}, + year = {2025}, + note = {R package version 4.8-0}, + url = {https://www.metafor-project.org}, +} + @Manual{R-microbenchmark, title = {microbenchmark: Accurate Timing Functions}, author = {Olaf Mersmann}, @@ -104,6 +192,22 @@ @Manual{R-microbenchmark url = {https://github.com/joshuaulrich/microbenchmark/}, } +@Manual{R-mlmpower, + title = {mlmpower: Power Analysis and Data Simulation for Multilevel Models}, + author = {Brian T. 
Keller}, + year = {2025}, + note = {R package version 1.0.10}, + url = {https://github.com/bkeller2/mlmpower}, +} + +@Manual{R-modelr, + title = {modelr: Modelling Functions that Work with the Pipe}, + author = {Hadley Wickham}, + year = {2023}, + note = {R package version 0.1.11}, + url = {https://modelr.tidyverse.org}, +} + @Manual{R-mvtnorm, title = {mvtnorm: Multivariate Normal and t Distributions}, author = {Alan Genz and Frank Bretz and Tetsuhisa Miwa and Xuefei Mi and Torsten Hothorn}, @@ -125,24 +229,74 @@ @Manual{R-purrr title = {purrr: Functional Programming Tools}, author = {Hadley Wickham and Lionel Henry}, year = {2025}, - note = {R package version 1.1.0}, + note = {R package version 1.2.0}, url = {https://purrr.tidyverse.org/}, } +@Manual{R-purrrlyr, + title = {purrrlyr: Tools at the Intersection of purrr and dplyr}, + author = {Lionel Henry}, + year = {2025}, + note = {R package version 0.0.10}, + url = {https://github.com/hadley/purrrlyr}, +} + @Manual{R-rmarkdown, title = {rmarkdown: Dynamic Documents for R}, author = {JJ Allaire and Yihui Xie and Christophe Dervieux and Jonathan McPherson and Javier Luraschi and Kevin Ushey and Aron Atkins and Hadley Wickham and Joe Cheng and Winston Chang and Richard Iannone}, - year = {2024}, - note = {R package version 2.29}, + year = {2025}, + note = {R package version 2.30}, url = {https://github.com/rstudio/rmarkdown}, } +@Manual{R-rpart.plot, + title = {rpart.plot: Plot rpart Models: An Enhanced Version of plot.rpart}, + author = {Stephen Milborrow}, + year = {2025}, + note = {R package version 3.1.3}, + url = {http://www.milbo.org/rpart-plot/index.html}, +} + @Manual{R-simhelpers, title = {simhelpers: Helper Functions for Simulation Studies}, author = {Megha Joshi and James Pustejovsky}, + year = {2025}, note = {R package version 0.3.1.9999}, url = {https://meghapsimatrix.github.io/simhelpers/}, +} + +@Manual{R-sn, + title = {sn: The Skew-Normal and Related Distributions Such as the Skew-t and +the SUN}, + 
author = {Adelchi Azzalini}, + year = {2023}, + note = {R package version 2.1.1}, + url = {http://azzalini.stat.unipd.it/SN/}, +} + +@Manual{R-testthat, + title = {testthat: Unit Testing for R}, + author = {Hadley Wickham}, year = {2025}, + note = {R package version 3.2.3}, + url = {https://testthat.r-lib.org}, +} + +@Manual{R-tibble, + title = {tibble: Simple Data Frames}, + author = {Kirill Müller and Hadley Wickham}, + year = {2025}, + note = {R package version 3.3.0}, + url = {https://tibble.tidyverse.org/}, +} + +@Manual{R-tictoc, + title = {tictoc: Functions for Timing R Scripts, as Well as Implementations of +"Stack" and "StackList" Structures}, + author = {Sergei Izrailev}, + year = {2024}, + note = {R package version 1.2.1}, + url = {https://github.com/jabiru/tictoc}, } @Manual{R-tidyr, @@ -161,6 +315,16 @@ @Manual{R-tidyverse url = {https://tidyverse.tidyverse.org}, } +@Book{AER2008, + title = {Applied Econometrics with {R}}, + author = {Christian Kleiber and Achim Zeileis}, + year = {2008}, + publisher = {Springer-Verlag}, + address = {New York}, + doi = {10.1007/978-0-387-77318-6}, + url = {https://CRAN.R-project.org/package=AER}, +} + @Book{bookdown2016, title = {bookdown: Authoring Books and Technical Documents with {R} Markdown}, author = {Yihui Xie}, @@ -171,6 +335,16 @@ @Book{bookdown2016 url = {https://bookdown.org/yihui/bookdown}, } +@Book{car2019, + title = {An {R} Companion to Applied Regression}, + edition = {Third}, + author = {John Fox and Sanford Weisberg}, + year = {2019}, + publisher = {Sage}, + address = {Thousand Oaks {CA}}, + url = {https://www.john-fox.ca/Companion/}, +} + @Article{RJ-2021-048, author = {Henrik Bengtsson}, title = {A Unifying Framework for Parallel and Distributed Processing in R using Futures}, @@ -192,6 +366,39 @@ @Book{ggplot22016 url = {https://ggplot2.tidyverse.org}, } +@Article{glmnet2010, + title = {Regularization Paths for Generalized Linear Models via Coordinate Descent}, + author = {Jerome Friedman and 
Trevor Hastie and Robert Tibshirani}, + journal = {Journal of Statistical Software}, + year = {2010}, + volume = {33}, + number = {1}, + pages = {1--22}, + doi = {10.18637/jss.v033.i01}, +} + +@Article{glmnet2011, + title = {Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent}, + author = {Noah Simon and Jerome Friedman and Trevor Hastie and Robert Tibshirani}, + journal = {Journal of Statistical Software}, + year = {2011}, + volume = {39}, + number = {5}, + pages = {1--13}, + doi = {10.18637/jss.v039.i05}, +} + +@Article{glmnet2023, + title = {Elastic Net Regularization Paths for All Generalized Linear Models}, + author = {J. Kenneth Tay and Balasubramanian Narasimhan and Trevor Hastie}, + journal = {Journal of Statistical Software}, + year = {2023}, + volume = {106}, + number = {1}, + pages = {1--31}, + doi = {10.18637/jss.v106.i01}, +} + @Book{knitr2015, title = {Dynamic Documents with {R} and knitr}, author = {Yihui Xie}, @@ -235,6 +442,17 @@ @Article{lmerTest2017 doi = {10.18637/jss.v082.i13}, } +@Article{metafor2010, + title = {Conducting meta-analyses in {R} with the {metafor} package}, + author = {Wolfgang Viechtbauer}, + journal = {Journal of Statistical Software}, + year = {2010}, + volume = {36}, + number = {3}, + pages = {1--48}, + doi = {10.18637/jss.v036.i03}, +} + @Book{mvtnorm2009, title = {Computation of Multivariate Normal and t Probabilities}, author = {Alan Genz and Frank Bretz}, @@ -265,6 +483,25 @@ @Book{rmarkdown2020 url = {https://bookdown.org/yihui/rmarkdown-cookbook}, } +@Manual{sn2023, + title = {The {R} package \texttt{sn}: The skew-normal and related distributions such as the skew-$t$ and the {SUN} (version 2.1.1).}, + author = {Azzalini A. 
}, + address = {Universit\`a degli Studi di Padova, Italia}, + year = {2023}, + note = {Home page: \url{http://azzalini.stat.unipd.it/SN/}}, + url = {https://cran.r-project.org/package=sn}, +} + +@Article{testthat2011, + author = {Hadley Wickham}, + title = {testthat: Get Started with Testing}, + journal = {The R Journal}, + year = {2011}, + volume = {3}, + pages = {5--10}, + url = {https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf}, +} + @Article{tidyverse2019, title = {Welcome to the {tidyverse}}, author = {Hadley Wickham and Mara Averick and Jennifer Bryan and Winston Chang and Lucy D'Agostino McGowan and Romain François and Garrett Grolemund and Alex Hayes and Lionel Henry and Jim Hester and Max Kuhn and Thomas Lin Pedersen and Evan Miller and Stephan Milton Bache and Kirill Müller and Jeroen Ooms and David Robinson and Dana Paige Seidel and Vitalie Spinu and Kohske Takahashi and Davis Vaughan and Claus Wilke and Kara Woo and Hiroaki Yutani}, diff --git a/renv.lock b/renv.lock index 5bc7381..f1f212e 100644 --- a/renv.lock +++ b/renv.lock @@ -1,6 +1,6 @@ { "R": { - "Version": "4.5.1", + "Version": "4.5.2", "Repositories": [ { "Name": "CRAN", @@ -9,65 +9,6 @@ ] }, "Packages": { - "AER": { - "Package": "AER", - "Version": "1.2-15", - "Source": "Repository", - "Date": "2025-06-18", - "Title": "Applied Econometrics with R", - "Authors@R": "c(person(given = \"Christian\", family = \"Kleiber\", role = \"aut\", email = \"Christian.Kleiber@unibas.ch\", comment = c(ORCID = \"0000-0002-6781-4733\")), person(given = \"Achim\", family = \"Zeileis\", role = c(\"aut\", \"cre\"), email = \"Achim.Zeileis@R-project.org\", comment = c(ORCID = \"0000-0003-0918-3766\")))", - "Description": "Functions, data sets, examples, demos, and vignettes for the book Christian Kleiber and Achim Zeileis (2008), Applied Econometrics with R, Springer-Verlag, New York. ISBN 978-0-387-77316-2. 
(See the vignette \"AER\" for a package overview.)", - "LazyLoad": "yes", - "Depends": [ - "R (>= 3.0.0)", - "car (>= 2.0-19)", - "lmtest", - "sandwich (>= 2.4-0)", - "survival (>= 2.37-5)", - "zoo" - ], - "Suggests": [ - "boot", - "dynlm", - "effects", - "fGarch", - "forecast", - "foreign", - "ineq", - "KernSmooth", - "lattice", - "longmemo", - "MASS", - "mlogit", - "nlme", - "nnet", - "np", - "plm", - "pscl", - "quantreg", - "rgl", - "ROCR", - "rugarch", - "sampleSelection", - "scatterplot3d", - "strucchange", - "systemfit (>= 1.1-20)", - "truncreg", - "tseries", - "urca", - "vars" - ], - "Imports": [ - "stats", - "Formula (>= 0.2-0)" - ], - "License": "GPL-2 | GPL-3", - "NeedsCompilation": "no", - "Author": "Christian Kleiber [aut] (ORCID: ), Achim Zeileis [aut, cre] (ORCID: )", - "Maintainer": "Achim Zeileis ", - "Repository": "CRAN", - "Encoding": "UTF-8" - }, "CompQuadForm": { "Package": "CompQuadForm", "Version": "1.4.4", @@ -81,7 +22,7 @@ "License": "GPL (>= 2)", "LazyLoad": "yes", "NeedsCompilation": "yes", - "Repository": "CRAN", + "Repository": "RSPM", "Encoding": "UTF-8" }, "DBI": { @@ -133,30 +74,6 @@ "Maintainer": "Kirill Müller ", "Repository": "CRAN" }, - "Deriv": { - "Package": "Deriv", - "Version": "4.2.0", - "Source": "Repository", - "Type": "Package", - "Title": "Symbolic Differentiation", - "Date": "2025-06-20", - "Authors@R": "c(person(given=\"Andrew\", family=\"Clausen\", role=\"aut\"), person(given=\"Serguei\", family=\"Sokol\", role=c(\"aut\", \"cre\"), email=\"sokol@insa-toulouse.fr\", comment = c(ORCID = \"0000-0002-5674-3327\")), person(given=\"Andreas\", family=\"Rappold\", role=\"ctb\", email=\"arappold@gmx.at\"))", - "Description": "R-based solution for symbolic differentiation. It admits user-defined function as well as function substitution in arguments of functions to be differentiated. 
Some symbolic simplification is part of the work.", - "License": "GPL (>= 3)", - "Suggests": [ - "testthat (>= 0.11.0)" - ], - "BugReports": "https://github.com/sgsokol/Deriv/issues", - "RoxygenNote": "7.3.1", - "Imports": [ - "methods" - ], - "Encoding": "UTF-8", - "NeedsCompilation": "no", - "Author": "Andrew Clausen [aut], Serguei Sokol [aut, cre] (ORCID: ), Andreas Rappold [ctb]", - "Maintainer": "Serguei Sokol ", - "Repository": "CRAN" - }, "Formula": { "Package": "Formula", "Version": "1.2-5", @@ -195,7 +112,7 @@ "NeedsCompilation": "no", "Author": "Coen Bernaards [aut, cre], Paul Gilbert [aut], Robert Jennrich [aut]", "Maintainer": "Coen Bernaards ", - "Repository": "CRAN", + "Repository": "RSPM", "Encoding": "UTF-8" }, "HLMdiag": { @@ -258,7 +175,7 @@ "VignetteBuilder": "knitr", "NeedsCompilation": "yes", "Author": "Adam Loy [cre, aut], Jaylin Lowe [aut], Jack Moran [aut]", - "Repository": "CRAN" + "Repository": "RSPM" }, "MASS": { "Package": "MASS", @@ -298,10 +215,10 @@ }, "Matrix": { "Package": "Matrix", - "Version": "1.7-3", + "Version": "1.7-4", "Source": "Repository", "VersionNote": "do also bump src/version.h, inst/include/Matrix/version.h", - "Date": "2025-03-05", + "Date": "2025-08-27", "Priority": "recommended", "Title": "Sparse and Dense Matrix Classes and Methods", "Description": "A rich hierarchy of sparse and dense matrix classes, including general, symmetric, triangular, and diagonal matrices with numeric, logical, or pattern entries. Efficient methods for operating on such matrices, often wrapping the 'BLAS', 'LAPACK', and 'SuiteSparse' libraries.", @@ -337,7 +254,7 @@ "BuildResaveData": "no", "Encoding": "UTF-8", "NeedsCompilation": "yes", - "Author": "Douglas Bates [aut] (), Martin Maechler [aut, cre] (), Mikael Jagan [aut] (), Timothy A. 
Davis [ctb] (, SuiteSparse libraries, collaborators listed in dir(system.file(\"doc\", \"SuiteSparse\", package=\"Matrix\"), pattern=\"License\", full.names=TRUE, recursive=TRUE)), George Karypis [ctb] (, METIS library, Copyright: Regents of the University of Minnesota), Jason Riedy [ctb] (, GNU Octave's condest() and onenormest(), Copyright: Regents of the University of California), Jens Oehlschlägel [ctb] (initial nearPD()), R Core Team [ctb] (02zz1nj61, base R's matrix implementation)", + "Author": "Douglas Bates [aut] (ORCID: ), Martin Maechler [aut, cre] (ORCID: ), Mikael Jagan [aut] (ORCID: ), Timothy A. Davis [ctb] (ORCID: , SuiteSparse libraries, collaborators listed in dir(system.file(\"doc\", \"SuiteSparse\", package=\"Matrix\"), pattern=\"License\", full.names=TRUE, recursive=TRUE)), George Karypis [ctb] (ORCID: , METIS library, Copyright: Regents of the University of Minnesota), Jason Riedy [ctb] (ORCID: , GNU Octave's condest() and onenormest(), Copyright: Regents of the University of California), Jens Oehlschlägel [ctb] (initial nearPD()), R Core Team [ctb] (ROR: , base R's matrix implementation)", "Maintainer": "Martin Maechler ", "Repository": "CRAN" }, @@ -369,7 +286,7 @@ "NeedsCompilation": "no", "Author": "Douglas Bates [aut] (), Martin Maechler [aut, cre] ()", "Maintainer": "Martin Maechler ", - "Repository": "CRAN" + "Repository": "RSPM" }, "R6": { "Package": "R6", @@ -446,13 +363,13 @@ }, "RcppArmadillo": { "Package": "RcppArmadillo", - "Version": "14.6.0-1", + "Version": "15.0.2-2", "Source": "Repository", "Type": "Package", "Title": "'Rcpp' Integration for the 'Armadillo' Templated Linear Algebra Library", - "Date": "2025-07-02", + "Date": "2025-09-18", "Authors@R": "c(person(\"Dirk\", \"Eddelbuettel\", role = c(\"aut\", \"cre\"), email = \"edd@debian.org\", comment = c(ORCID = \"0000-0001-6419-907X\")), person(\"Romain\", \"Francois\", role = \"aut\", comment = c(ORCID = \"0000-0002-2444-4226\")), person(\"Doug\", \"Bates\", role = \"aut\", 
comment = c(ORCID = \"0000-0001-8316-9503\")), person(\"Binxiang\", \"Ni\", role = \"aut\"), person(\"Conrad\", \"Sanderson\", role = \"aut\", comment = c(ORCID = \"0000-0002-0049-4501\")))", - "Description": "'Armadillo' is a templated C++ linear algebra library (by Conrad Sanderson) that aims towards a good balance between speed and ease of use. Integer, floating point and complex numbers are supported, as well as a subset of trigonometric and statistics functions. Various matrix decompositions are provided through optional integration with LAPACK and ATLAS libraries. The 'RcppArmadillo' package includes the header files from the templated 'Armadillo' library. Thus users do not need to install 'Armadillo' itself in order to use 'RcppArmadillo'. From release 7.800.0 on, 'Armadillo' is licensed under Apache License 2; previous releases were under licensed as MPL 2.0 from version 3.800.0 onwards and LGPL-3 prior to that; 'RcppArmadillo' (the 'Rcpp' bindings/bridge to Armadillo) is licensed under the GNU GPL version 2 or later, as is the rest of 'Rcpp'.", + "Description": "'Armadillo' is a templated C++ linear algebra library aiming towards a good balance between speed and ease of use. It provides high-level syntax and functionality deliberately similar to Matlab. It is useful for algorithm development directly in C++, or quick conversion of research code into production environments. It provides efficient classes for vectors, matrices and cubes where dense and sparse matrices are supported. Integer, floating point and complex numbers are supported. A sophisticated expression evaluator (based on template meta-programming) automatically combines several operations to increase speed and efficiency. Dynamic evaluation automatically chooses optimal code paths based on detected matrix structures. Matrix decompositions are provided through integration with LAPACK, or one of its high performance drop-in replacements (such as 'MKL' or 'OpenBLAS'). 
It can automatically use 'OpenMP' multi-threading (parallelisation) to speed up computationally expensive operations. . The 'RcppArmadillo' package includes the header files from the 'Armadillo' library; users do not need to install 'Armadillo' itself in order to use 'RcppArmadillo'. Starting from release 15.0.0, the minimum compilation standard is C++14 so 'Armadillo' version 14.6.3 is included as a fallback when an R package forces the C++11 standard. Package authors should set a '#define' to select the 'current' version, or select the 'legacy' version (also chosen as default) if they must. See 'GitHub issue #475' for details. . Since release 7.800.0, 'Armadillo' is licensed under Apache License 2; previous releases were under licensed as MPL 2.0 from version 3.800.0 onwards and LGPL-3 prior to that; 'RcppArmadillo' (the 'Rcpp' bindings/bridge to Armadillo) is licensed under the GNU GPL version 2 or later, as is the rest of 'Rcpp'.", "License": "GPL (>= 2)", "LazyLoad": "yes", "Depends": [ @@ -480,7 +397,7 @@ "NeedsCompilation": "yes", "Author": "Dirk Eddelbuettel [aut, cre] (ORCID: ), Romain Francois [aut] (ORCID: ), Doug Bates [aut] (ORCID: ), Binxiang Ni [aut], Conrad Sanderson [aut] (ORCID: )", "Maintainer": "Dirk Eddelbuettel ", - "Repository": "CRAN", + "Repository": "RSPM", "Encoding": "UTF-8" }, "RcppEigen": { @@ -556,6 +473,45 @@ "Repository": "CRAN", "Encoding": "UTF-8" }, + "S7": { + "Package": "S7", + "Version": "0.2.0", + "Source": "Repository", + "Title": "An Object Oriented System Meant to Become a Successor to S3 and S4", + "Authors@R": "c( person(\"Object-Oriented Programming Working Group\", role = \"cph\"), person(\"Davis\", \"Vaughan\", role = \"aut\"), person(\"Jim\", \"Hester\", role = \"aut\", comment = c(ORCID = \"0000-0002-2739-7082\")), person(\"Tomasz\", \"Kalinowski\", role = \"aut\"), person(\"Will\", \"Landau\", role = \"aut\"), person(\"Michael\", \"Lawrence\", role = \"aut\"), person(\"Martin\", \"Maechler\", role = \"aut\", 
comment = c(ORCID = \"0000-0002-8685-9910\")), person(\"Luke\", \"Tierney\", role = \"aut\"), person(\"Hadley\", \"Wickham\", , \"hadley@posit.co\", role = c(\"aut\", \"cre\"), comment = c(ORCID = \"0000-0003-4757-117X\")) )", + "Description": "A new object oriented programming system designed to be a successor to S3 and S4. It includes formal class, generic, and method specification, and a limited form of multiple dispatch. It has been designed and implemented collaboratively by the R Consortium Object-Oriented Programming Working Group, which includes representatives from R-Core, 'Bioconductor', 'Posit'/'tidyverse', and the wider R community.", + "License": "MIT + file LICENSE", + "URL": "https://rconsortium.github.io/S7/, https://github.com/RConsortium/S7", + "BugReports": "https://github.com/RConsortium/S7/issues", + "Depends": [ + "R (>= 3.5.0)" + ], + "Imports": [ + "utils" + ], + "Suggests": [ + "bench", + "callr", + "covr", + "knitr", + "methods", + "rmarkdown", + "testthat (>= 3.2.0)", + "tibble" + ], + "VignetteBuilder": "knitr", + "Config/build/compilation-database": "true", + "Config/Needs/website": "sloop", + "Config/testthat/edition": "3", + "Config/testthat/parallel": "TRUE", + "Config/testthat/start-first": "external-generic", + "Encoding": "UTF-8", + "RoxygenNote": "7.3.2", + "NeedsCompilation": "yes", + "Author": "Object-Oriented Programming Working Group [cph], Davis Vaughan [aut], Jim Hester [aut] (), Tomasz Kalinowski [aut], Will Landau [aut], Michael Lawrence [aut], Martin Maechler [aut] (), Luke Tierney [aut], Hadley Wickham [aut, cre] ()", + "Maintainer": "Hadley Wickham ", + "Repository": "CRAN" + }, "SparseM": { "Package": "SparseM", "Version": "1.84-2", @@ -581,7 +537,7 @@ "URL": "http://www.econ.uiuc.edu/~roger/research/sparse/sparse.html", "NeedsCompilation": "yes", "Author": "Roger Koenker [cre, aut], Pin Tian Ng [ctb] (Contributions to Sparse QR code), Yousef Saad [ctb] (author of sparskit2), Ben Shaby [ctb] (author of chol2csr), 
Martin Maechler [ctb] (chol() tweaks; S4, )", - "Repository": "CRAN", + "Repository": "RSPM", "Encoding": "UTF-8" }, "abind": { @@ -622,7 +578,7 @@ "Maintainer": "Ravi Varadhan ", "License": "GPL (>= 2)", "LazyLoad": "yes", - "Repository": "CRAN", + "Repository": "RSPM", "NeedsCompilation": "no", "Encoding": "UTF-8" }, @@ -649,7 +605,7 @@ "MASS", "igraph" ], - "Repository": "CRAN", + "Repository": "RSPM", "RoxygenNote": "6.0.1" }, "arm": { @@ -906,16 +862,16 @@ ], "VignetteBuilder": "knitr", "Encoding": "UTF-8", - "LazyData": "true", "RoxygenNote": "7.3.2", - "Author": "Luke Miratrix [aut, cre], Nicole Pashley [aut]", - "Maintainer": "Luke Miratrix ", "RemoteType": "github", - "RemoteUsername": "lmiratrix", + "RemoteHost": "api.github.com", "RemoteRepo": "blkvar", + "RemoteUsername": "lmiratrix", "RemoteRef": "HEAD", - "RemoteSha": "d6cec2070a119f8490494f7ecbbe5e007a927bd3", - "RemoteHost": "api.github.com" + "RemoteSha": "60cf10e16a9960a3b0fe0c91adbe3671f604e040", + "NeedsCompilation": "no", + "Author": "Luke Miratrix [aut, cre], Nicole Pashley [aut]", + "Maintainer": "Luke Miratrix " }, "blob": { "Package": "blob", @@ -950,7 +906,7 @@ }, "bookdown": { "Package": "bookdown", - "Version": "0.44", + "Version": "0.45", "Source": "Repository", "Type": "Package", "Title": "Authoring Books and Technical Documents with R Markdown", @@ -1004,15 +960,15 @@ }, "boot": { "Package": "boot", - "Version": "1.3-31", + "Version": "1.3-32", "Source": "Repository", "Priority": "recommended", - "Date": "2024-08-28", - "Authors@R": "c(person(\"Angelo\", \"Canty\", role = \"aut\", email = \"cantya@mcmaster.ca\", comment = \"author of original code for S\"), person(\"Brian\", \"Ripley\", role = c(\"aut\", \"trl\"), email = \"ripley@stats.ox.ac.uk\", comment = \"conversion to R, maintainer 1999--2022, author of parallel support\"), person(\"Alessandra R.\", \"Brazzale\", role = c(\"ctb\", \"cre\"), email = \"brazzale@stat.unipd.it\", comment = \"minor bug fixes\"))", + "Date": 
"2025-08-29", + "Authors@R": "c(person(\"Angelo\", \"Canty\", role = \"aut\", email = \"cantya@mcmaster.ca\", comment = \"author of original code for S\"), person(\"Brian\", \"Ripley\", role = c(\"aut\", \"trl\"), email = \"Brian.Ripley@R-project.org\", comment = \"conversion to R, maintainer 1999--2022, author of parallel support\"), person(\"Alessandra R.\", \"Brazzale\", role = c(\"ctb\", \"cre\"), email = \"brazzale@stat.unipd.it\", comment = \"minor bug fixes\"))", "Maintainer": "Alessandra R. Brazzale ", "Note": "Maintainers are not available to give advice on using a package they did not author.", "Description": "Functions and datasets for bootstrapping from the book \"Bootstrap Methods and Their Application\" by A. C. Davison and D. V. Hinkley (1997, CUP), originally written by Angelo Canty for S.", - "Title": "Bootstrap Functions (Originally by Angelo Canty for S)", + "Title": "Bootstrap Functions", "Depends": [ "R (>= 3.0.0)", "graphics", @@ -1031,7 +987,7 @@ }, "brio": { "Package": "brio", - "Version": "1.1.4", + "Version": "1.1.5", "Source": "Repository", "Title": "Basic R Input Output", "Authors@R": "c( person(\"Jim\", \"Hester\", role = \"aut\", comment = c(ORCID = \"0000-0002-2739-7082\")), person(\"Gábor\", \"Csárdi\", , \"csardi.gabor@gmail.com\", role = c(\"aut\", \"cre\")), person(given = \"Posit Software, PBC\", role = c(\"cph\", \"fnd\")) )", @@ -1053,11 +1009,11 @@ "NeedsCompilation": "yes", "Author": "Jim Hester [aut] (), Gábor Csárdi [aut, cre], Posit Software, PBC [cph, fnd]", "Maintainer": "Gábor Csárdi ", - "Repository": "RSPM" + "Repository": "CRAN" }, "broom": { "Package": "broom", - "Version": "1.0.9", + "Version": "1.0.10", "Source": "Repository", "Type": "Package", "Title": "Convert Statistical Objects into Tidy Tibbles", @@ -1156,6 +1112,7 @@ "spdep (>= 1.1)", "speedglm", "spelling", + "stats4", "survey", "survival (>= 3.6-4)", "systemfit", @@ -1170,7 +1127,7 @@ "Config/usethis/last-upkeep": "2025-04-25", "Encoding": "UTF-8", 
"Language": "en-US", - "RoxygenNote": "7.3.2", + "RoxygenNote": "7.3.3", "Collate": "'aaa-documentation-helper.R' 'null-and-default.R' 'aer.R' 'auc.R' 'base.R' 'bbmle.R' 'betareg.R' 'biglm.R' 'bingroup.R' 'boot.R' 'broom-package.R' 'broom.R' 'btergm.R' 'car.R' 'caret.R' 'cluster.R' 'cmprsk.R' 'data-frame.R' 'deprecated-0-7-0.R' 'drc.R' 'emmeans.R' 'epiR.R' 'ergm.R' 'fixest.R' 'gam.R' 'geepack.R' 'glmnet-cv-glmnet.R' 'glmnet-glmnet.R' 'gmm.R' 'hmisc.R' 'import-standalone-obj-type.R' 'import-standalone-types-check.R' 'joinerml.R' 'kendall.R' 'ks.R' 'lavaan.R' 'leaps.R' 'lfe.R' 'list-irlba.R' 'list-optim.R' 'list-svd.R' 'list-xyz.R' 'list.R' 'lm-beta.R' 'lmodel2.R' 'lmtest.R' 'maps.R' 'margins.R' 'mass-fitdistr.R' 'mass-negbin.R' 'mass-polr.R' 'mass-ridgelm.R' 'stats-lm.R' 'mass-rlm.R' 'mclust.R' 'mediation.R' 'metafor.R' 'mfx.R' 'mgcv.R' 'mlogit.R' 'muhaz.R' 'multcomp.R' 'nnet.R' 'nobs.R' 'ordinal-clm.R' 'ordinal-clmm.R' 'plm.R' 'polca.R' 'psych.R' 'stats-nls.R' 'quantreg-nlrq.R' 'quantreg-rq.R' 'quantreg-rqs.R' 'robust-glmrob.R' 'robust-lmrob.R' 'robustbase-glmrob.R' 'robustbase-lmrob.R' 'sp.R' 'spdep.R' 'speedglm-speedglm.R' 'speedglm-speedlm.R' 'stats-anova.R' 'stats-arima.R' 'stats-decompose.R' 'stats-factanal.R' 'stats-glm.R' 'stats-htest.R' 'stats-kmeans.R' 'stats-loess.R' 'stats-mlm.R' 'stats-prcomp.R' 'stats-smooth.spline.R' 'stats-summary-lm.R' 'stats-time-series.R' 'survey.R' 'survival-aareg.R' 'survival-cch.R' 'survival-coxph.R' 'survival-pyears.R' 'survival-survdiff.R' 'survival-survexp.R' 'survival-survfit.R' 'survival-survreg.R' 'systemfit.R' 'tseries.R' 'utilities.R' 'vars.R' 'zoo.R' 'zzz.R'", "NeedsCompilation": "no", "Author": "David Robinson [aut], Alex Hayes [aut] (ORCID: ), Simon Couch [aut, cre] (ORCID: ), Posit Software, PBC [cph, fnd] (ROR: ), Indrajeet Patil [ctb] (ORCID: ), Derek Chiu [ctb], Matthieu Gomez [ctb], Boris Demeshev [ctb], Dieter Menne [ctb], Benjamin Nutter [ctb], Luke Johnston [ctb], Ben Bolker [ctb], Francois Briatte [ctb], 
Jeffrey Arnold [ctb], Jonah Gabry [ctb], Luciano Selzer [ctb], Gavin Simpson [ctb], Jens Preussner [ctb], Jay Hesselberth [ctb], Hadley Wickham [ctb], Matthew Lincoln [ctb], Alessandro Gasparini [ctb], Lukasz Komsta [ctb], Frederick Novometsky [ctb], Wilson Freitas [ctb], Michelle Evans [ctb], Jason Cory Brunson [ctb], Simon Jackson [ctb], Ben Whalley [ctb], Karissa Whiting [ctb], Yves Rosseel [ctb], Michael Kuehn [ctb], Jorge Cimentada [ctb], Erle Holgersen [ctb], Karl Dunkle Werner [ctb] (ORCID: ), Ethan Christensen [ctb], Steven Pav [ctb], Paul PJ [ctb], Ben Schneider [ctb], Patrick Kennedy [ctb], Lily Medina [ctb], Brian Fannin [ctb], Jason Muhlenkamp [ctb], Matt Lehman [ctb], Bill Denney [ctb] (ORCID: ), Nic Crane [ctb], Andrew Bates [ctb], Vincent Arel-Bundock [ctb] (ORCID: ), Hideaki Hayashi [ctb], Luis Tobalina [ctb], Annie Wang [ctb], Wei Yang Tham [ctb], Clara Wang [ctb], Abby Smith [ctb] (ORCID: ), Jasper Cooper [ctb] (ORCID: ), E Auden Krauska [ctb] (ORCID: ), Alex Wang [ctb], Malcolm Barrett [ctb] (ORCID: ), Charles Gray [ctb] (ORCID: ), Jared Wilber [ctb], Vilmantas Gegzna [ctb] (ORCID: ), Eduard Szoecs [ctb], Frederik Aust [ctb] (ORCID: ), Angus Moore [ctb], Nick Williams [ctb], Marius Barth [ctb] (ORCID: ), Bruna Wundervald [ctb] (ORCID: ), Joyce Cahoon [ctb] (ORCID: ), Grant McDermott [ctb] (ORCID: ), Kevin Zarca [ctb], Shiro Kuriwaki [ctb] (ORCID: ), Lukas Wallrich [ctb] (ORCID: ), James Martherus [ctb] (ORCID: ), Chuliang Xiao [ctb] (ORCID: ), Joseph Larmarange [ctb], Max Kuhn [ctb], Michal Bojanowski [ctb], Hakon Malmedal [ctb], Clara Wang [ctb], Sergio Oller [ctb], Luke Sonnet [ctb], Jim Hester [ctb], Ben Schneider [ctb], Bernie Gray [ctb] (ORCID: ), Mara Averick [ctb], Aaron Jacobs [ctb], Andreas Bender [ctb], Sven Templer [ctb], Paul-Christian Buerkner [ctb], Matthew Kay [ctb], Erwan Le Pennec [ctb], Johan Junkka [ctb], Hao Zhu [ctb], Benjamin Soltoff [ctb], Zoe Wilkinson Saldana [ctb], Tyler Littlefield [ctb], Charles T. 
Gray [ctb], Shabbh E. Banks [ctb], Serina Robinson [ctb], Roger Bivand [ctb], Riinu Ots [ctb], Nicholas Williams [ctb], Nina Jakobsen [ctb], Michael Weylandt [ctb], Lisa Lendway [ctb], Karl Hailperin [ctb], Josue Rodriguez [ctb], Jenny Bryan [ctb], Chris Jarvis [ctb], Greg Macfarlane [ctb], Brian Mannakee [ctb], Drew Tyre [ctb], Shreyas Singh [ctb], Laurens Geffert [ctb], Hong Ooi [ctb], Henrik Bengtsson [ctb], Eduard Szocs [ctb], David Hugh-Jones [ctb], Matthieu Stigler [ctb], Hugo Tavares [ctb] (ORCID: ), R. Willem Vervoort [ctb], Brenton M. Wiernik [ctb], Josh Yamamoto [ctb], Jasme Lee [ctb], Taren Sanders [ctb] (ORCID: ), Ilaria Prosdocimi [ctb] (ORCID: ), Daniel D. Sjoberg [ctb] (ORCID: ), Alex Reinhart [ctb] (ORCID: )", @@ -1300,92 +1257,6 @@ "Maintainer": "Gábor Csárdi ", "Repository": "CRAN" }, - "car": { - "Package": "car", - "Version": "3.1-3", - "Source": "Repository", - "Date": "2024-09-23", - "Title": "Companion to Applied Regression", - "Authors@R": "c(person(\"John\", \"Fox\", role = c(\"aut\", \"cre\"), email = \"jfox@mcmaster.ca\"), person(\"Sanford\", \"Weisberg\", role = \"aut\", email = \"sandy@umn.edu\"), person(\"Brad\", \"Price\", role = \"aut\", email = \"brad.price@mail.wvu.edu\"), person(\"Daniel\", \"Adler\", role=\"ctb\"), person(\"Douglas\", \"Bates\", role = \"ctb\"), person(\"Gabriel\", \"Baud-Bovy\", role = \"ctb\"), person(\"Ben\", \"Bolker\", role=\"ctb\"), person(\"Steve\", \"Ellison\", role=\"ctb\"), person(\"David\", \"Firth\", role = \"ctb\"), person(\"Michael\", \"Friendly\", role = \"ctb\"), person(\"Gregor\", \"Gorjanc\", role = \"ctb\"), person(\"Spencer\", \"Graves\", role = \"ctb\"), person(\"Richard\", \"Heiberger\", role = \"ctb\"), person(\"Pavel\", \"Krivitsky\", role = \"ctb\"), person(\"Rafael\", \"Laboissiere\", role = \"ctb\"), person(\"Martin\", \"Maechler\", role=\"ctb\"), person(\"Georges\", \"Monette\", role = \"ctb\"), person(\"Duncan\", \"Murdoch\", role=\"ctb\"), person(\"Henric\", \"Nilsson\", role = 
\"ctb\"), person(\"Derek\", \"Ogle\", role = \"ctb\"), person(\"Brian\", \"Ripley\", role = \"ctb\"), person(\"Tom\", \"Short\", role=\"ctb\"), person(\"William\", \"Venables\", role = \"ctb\"), person(\"Steve\", \"Walker\", role=\"ctb\"), person(\"David\", \"Winsemius\", role=\"ctb\"), person(\"Achim\", \"Zeileis\", role = \"ctb\"), person(\"R-Core\", role=\"ctb\"))", - "Depends": [ - "R (>= 3.5.0)", - "carData (>= 3.0-0)" - ], - "Imports": [ - "abind", - "Formula", - "MASS", - "mgcv", - "nnet", - "pbkrtest (>= 0.4-4)", - "quantreg", - "grDevices", - "utils", - "stats", - "graphics", - "lme4 (>= 1.1-27.1)", - "nlme", - "scales" - ], - "Suggests": [ - "alr4", - "boot", - "coxme", - "effects", - "knitr", - "leaps", - "lmtest", - "Matrix", - "MatrixModels", - "ordinal", - "plotrix", - "mvtnorm", - "rgl (>= 0.111.3)", - "rio", - "sandwich", - "SparseM", - "survival", - "survey" - ], - "ByteCompile": "yes", - "LazyLoad": "yes", - "Description": "Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, 2019.", - "License": "GPL (>= 2)", - "URL": "https://r-forge.r-project.org/projects/car/, https://CRAN.R-project.org/package=car, https://www.john-fox.ca/Companion/index.html", - "VignetteBuilder": "knitr", - "NeedsCompilation": "no", - "Author": "John Fox [aut, cre], Sanford Weisberg [aut], Brad Price [aut], Daniel Adler [ctb], Douglas Bates [ctb], Gabriel Baud-Bovy [ctb], Ben Bolker [ctb], Steve Ellison [ctb], David Firth [ctb], Michael Friendly [ctb], Gregor Gorjanc [ctb], Spencer Graves [ctb], Richard Heiberger [ctb], Pavel Krivitsky [ctb], Rafael Laboissiere [ctb], Martin Maechler [ctb], Georges Monette [ctb], Duncan Murdoch [ctb], Henric Nilsson [ctb], Derek Ogle [ctb], Brian Ripley [ctb], Tom Short [ctb], William Venables [ctb], Steve Walker [ctb], David Winsemius [ctb], Achim Zeileis [ctb], R-Core [ctb]", - "Maintainer": "John Fox ", - "Repository": "CRAN", - "Encoding": "UTF-8" - }, - "carData": { - "Package": 
"carData", - "Version": "3.0-5", - "Source": "Repository", - "Date": "2022-01-05", - "Title": "Companion to Applied Regression Data Sets", - "Authors@R": "c(person(\"John\", \"Fox\", role = c(\"aut\", \"cre\"), email = \"jfox@mcmaster.ca\"), person(\"Sanford\", \"Weisberg\", role = \"aut\", email = \"sandy@umn.edu\"), person(\"Brad\", \"Price\", role = \"aut\", email = \"brad.price@mail.wvu.edu\"))", - "Depends": [ - "R (>= 3.5.0)" - ], - "Suggests": [ - "car (>= 3.0-0)" - ], - "LazyLoad": "yes", - "LazyData": "yes", - "Description": "Datasets to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage (2019).", - "License": "GPL (>= 2)", - "URL": "https://r-forge.r-project.org/projects/car/, https://CRAN.R-project.org/package=carData, https://socialsciences.mcmaster.ca/jfox/Books/Companion/index.html", - "Author": "John Fox [aut, cre], Sanford Weisberg [aut], Brad Price [aut]", - "Maintainer": "John Fox ", - "Repository": "CRAN", - "Repository/R-Forge/Project": "car", - "Repository/R-Forge/Revision": "694", - "Repository/R-Forge/DateTimeStamp": "2022-01-05 19:40:37", - "NeedsCompilation": "no", - "Encoding": "UTF-8" - }, "cellranger": { "Package": "cellranger", "Version": "1.1.0", @@ -1498,9 +1369,10 @@ }, "cluster": { "Package": "cluster", - "Version": "2.1.6", + "Version": "2.1.8.1", "Source": "Repository", - "Date": "2023-11-30", + "VersionNote": "Last CRAN: 2.1.8 on 2024-12-10; 2.1.7 on 2024-12-06; 2.1.6 on 2023-11-30; 2.1.5 on 2023-11-27", + "Date": "2025-03-11", "Priority": "recommended", "Title": "\"Finding Groups in Data\": Cluster Analysis Extended Rousseeuw et al.", "Description": "Methods for Cluster analysis. 
Much extended the original from Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) \"Finding Groups in Data\".", @@ -1535,8 +1407,7 @@ "URL": "https://svn.r-project.org/R-packages/trunk/cluster/", "NeedsCompilation": "yes", "Author": "Martin Maechler [aut, cre] (), Peter Rousseeuw [aut] (Fortran original, ), Anja Struyf [aut] (S original), Mia Hubert [aut] (S original, ), Kurt Hornik [trl, ctb] (port to R; maintenance(1999-2000), ), Matthias Studer [ctb], Pierre Roudier [ctb], Juan Gonzalez [ctb], Kamil Kozlowski [ctb], Erich Schubert [ctb] (fastpam options for pam(), ), Keefe Murphy [ctb] (volume.ellipsoid({d >= 3}))", - "Repository": "RSPM", - "Encoding": "UTF-8" + "Repository": "CRAN" }, "coda": { "Package": "coda", @@ -1632,59 +1503,9 @@ "License": "GPL (>= 3)", "URL": "https://strimmerlab.github.io/software/corpcor/", "NeedsCompilation": "no", - "Repository": "CRAN", + "Repository": "RSPM", "Encoding": "UTF-8" }, - "cowplot": { - "Package": "cowplot", - "Version": "1.2.0", - "Source": "Repository", - "Title": "Streamlined Plot Theme and Plot Annotations for 'ggplot2'", - "Authors@R": "person( given = \"Claus O.\", family = \"Wilke\", role = c(\"aut\", \"cre\"), email = \"wilke@austin.utexas.edu\", comment = c(ORCID = \"0000-0002-7470-9261\") )", - "Description": "Provides various features that help with creating publication-quality figures with 'ggplot2', such as a set of themes, functions to align plots and arrange them into complex compound figures, and functions that make it easy to annotate plots and or mix plots with images. The package was originally written for internal use in the Wilke lab, hence the name (Claus O. Wilke's plot package). 
It has also been used extensively in the book Fundamentals of Data Visualization.", - "URL": "https://wilkelab.org/cowplot/", - "BugReports": "https://github.com/wilkelab/cowplot/issues", - "Depends": [ - "R (>= 3.5.0)" - ], - "Imports": [ - "ggplot2 (>= 3.5.2)", - "grid", - "gtable", - "grDevices", - "methods", - "rlang", - "scales" - ], - "License": "GPL-2", - "Suggests": [ - "Cairo", - "covr", - "dplyr", - "forcats", - "gridGraphics (>= 0.4-0)", - "knitr", - "lattice", - "magick", - "maps", - "PASWR", - "patchwork", - "rmarkdown", - "ragg", - "testthat (>= 1.0.0)", - "tidyr", - "vdiffr (>= 0.3.0)", - "VennDiagram" - ], - "VignetteBuilder": "knitr", - "Collate": "'add_sub.R' 'align_plots.R' 'as_grob.R' 'as_gtable.R' 'axis_canvas.R' 'cowplot.R' 'draw.R' 'get_plot_component.R' 'get_axes.R' 'get_titles.R' 'get_legend.R' 'get_panel.R' 'gtable.R' 'key_glyph.R' 'plot_grid.R' 'save.R' 'set_null_device.R' 'setup.R' 'stamp.R' 'themes.R' 'utils_ggplot2.R'", - "RoxygenNote": "7.3.2", - "Encoding": "UTF-8", - "NeedsCompilation": "no", - "Author": "Claus O. Wilke [aut, cre] (ORCID: )", - "Maintainer": "Claus O. 
Wilke ", - "Repository": "CRAN" - }, "cpp11": { "Package": "cpp11", "Version": "0.5.2", @@ -1764,7 +1585,7 @@ }, "curl": { "Package": "curl", - "Version": "6.4.0", + "Version": "7.0.0", "Source": "Repository", "Type": "Package", "Title": "A Modern and Flexible Web Client for R", @@ -1788,7 +1609,7 @@ "Depends": [ "R (>= 3.0.0)" ], - "RoxygenNote": "7.3.2.9000", + "RoxygenNote": "7.3.2", "Encoding": "UTF-8", "Language": "en-US", "NeedsCompilation": "yes", @@ -1832,7 +1653,7 @@ }, "dbplyr": { "Package": "dbplyr", - "Version": "2.5.0", + "Version": "2.5.1", "Source": "Repository", "Type": "Package", "Title": "A 'dplyr' Back End for Databases", @@ -1875,7 +1696,7 @@ "rmarkdown", "RPostgres (>= 1.4.5)", "RPostgreSQL", - "RSQLite (>= 2.3.1)", + "RSQLite (>= 2.3.8)", "testthat (>= 3.1.10)" ], "VignetteBuilder": "knitr", @@ -1884,7 +1705,7 @@ "Config/testthat/parallel": "TRUE", "Encoding": "UTF-8", "Language": "en-gb", - "RoxygenNote": "7.3.1", + "RoxygenNote": "7.3.3", "Collate": "'db-sql.R' 'utils-check.R' 'import-standalone-types-check.R' 'import-standalone-obj-type.R' 'utils.R' 'sql.R' 'escape.R' 'translate-sql-cut.R' 'translate-sql-quantile.R' 'translate-sql-string.R' 'translate-sql-paste.R' 'translate-sql-helpers.R' 'translate-sql-window.R' 'translate-sql-conditional.R' 'backend-.R' 'backend-access.R' 'backend-hana.R' 'backend-hive.R' 'backend-impala.R' 'verb-copy-to.R' 'backend-mssql.R' 'backend-mysql.R' 'backend-odbc.R' 'backend-oracle.R' 'backend-postgres.R' 'backend-postgres-old.R' 'backend-redshift.R' 'backend-snowflake.R' 'backend-spark-sql.R' 'backend-sqlite.R' 'backend-teradata.R' 'build-sql.R' 'data-cache.R' 'data-lahman.R' 'data-nycflights13.R' 'db-escape.R' 'db-io.R' 'db.R' 'dbplyr.R' 'explain.R' 'ident.R' 'import-standalone-s3-register.R' 'join-by-compat.R' 'join-cols-compat.R' 'lazy-join-query.R' 'lazy-ops.R' 'lazy-query.R' 'lazy-select-query.R' 'lazy-set-op-query.R' 'memdb.R' 'optimise-utils.R' 'pillar.R' 'progress.R' 'sql-build.R' 'query-join.R' 
'query-select.R' 'query-semi-join.R' 'query-set-op.R' 'query.R' 'reexport.R' 'remote.R' 'rows.R' 'schema.R' 'simulate.R' 'sql-clause.R' 'sql-expr.R' 'src-sql.R' 'src_dbi.R' 'table-name.R' 'tbl-lazy.R' 'tbl-sql.R' 'test-frame.R' 'testthat.R' 'tidyeval-across.R' 'tidyeval.R' 'translate-sql.R' 'utils-format.R' 'verb-arrange.R' 'verb-compute.R' 'verb-count.R' 'verb-distinct.R' 'verb-do-query.R' 'verb-do.R' 'verb-expand.R' 'verb-fill.R' 'verb-filter.R' 'verb-group_by.R' 'verb-head.R' 'verb-joins.R' 'verb-mutate.R' 'verb-pivot-longer.R' 'verb-pivot-wider.R' 'verb-pull.R' 'verb-select.R' 'verb-set-ops.R' 'verb-slice.R' 'verb-summarise.R' 'verb-uncount.R' 'verb-window.R' 'zzz.R'", "NeedsCompilation": "no", "Author": "Hadley Wickham [aut, cre], Maximilian Girlich [aut], Edgar Ruiz [aut], Posit Software, PBC [cph, fnd]", @@ -1927,7 +1748,7 @@ "Collate": "'assertions.R' 'authors-at-r.R' 'built.R' 'classes.R' 'collate.R' 'constants.R' 'deps.R' 'desc-package.R' 'description.R' 'encoding.R' 'find-package-root.R' 'latex.R' 'non-oo-api.R' 'package-archives.R' 'read.R' 'remotes.R' 'str.R' 'syntax_checks.R' 'urls.R' 'utils.R' 'validate.R' 'version.R'", "NeedsCompilation": "no", "Author": "Gábor Csárdi [aut, cre], Kirill Müller [aut], Jim Hester [aut], Maëlle Salmon [ctb] (), Posit Software, PBC [cph, fnd]", - "Repository": "RSPM" + "Repository": "CRAN" }, "diagonals": { "Package": "diagonals", @@ -1953,7 +1774,7 @@ "NeedsCompilation": "no", "Author": "Bastiaan Quast [aut, cre] ()", "Maintainer": "Bastiaan Quast ", - "Repository": "CRAN" + "Repository": "RSPM" }, "diagram": { "Package": "diagram", @@ -1974,12 +1795,12 @@ "License": "GPL (>= 2)", "LazyData": "yes", "NeedsCompilation": "no", - "Repository": "RSPM", + "Repository": "CRAN", "Encoding": "UTF-8" }, "diffobj": { "Package": "diffobj", - "Version": "0.3.5", + "Version": "0.3.6", "Source": "Repository", "Type": "Package", "Title": "Diffs for R Objects", @@ -1991,7 +1812,7 @@ "License": "GPL-2 | GPL-3", "URL": 
"https://github.com/brodieG/diffobj", "BugReports": "https://github.com/brodieG/diffobj/issues", - "RoxygenNote": "7.1.1", + "RoxygenNote": "7.2.3", "VignetteBuilder": "knitr", "Encoding": "UTF-8", "Suggests": [ @@ -2009,7 +1830,7 @@ "NeedsCompilation": "yes", "Author": "Brodie Gaslam [aut, cre], Michael B. Allen [ctb, cph] (Original C implementation of Myers Diff Algorithm)", "Maintainer": "Brodie Gaslam ", - "Repository": "RSPM" + "Repository": "CRAN" }, "digest": { "Package": "digest", @@ -2076,55 +1897,7 @@ "NeedsCompilation": "no", "Author": "Mitchell O'Hara-Wild [aut, cre] (), Matthew Kay [aut] (), Alex Hayes [aut] (), Rob Hyndman [aut] (), Earo Wang [ctb] (), Vencislav Popov [ctb] ()", "Maintainer": "Mitchell O'Hara-Wild ", - "Repository": "CRAN" - }, - "doBy": { - "Package": "doBy", - "Version": "4.7.0", - "Source": "Repository", - "Title": "Groupwise Statistics, LSmeans, Linear Estimates, Utilities", - "Authors@R": "c( person(given = \"Ulrich\", family = \"Halekoh\", email = \"uhalekoh@health.sdu.dk\", role = c(\"aut\", \"cph\")), person(given = \"Søren\", family = \"Højsgaard\", email = \"sorenh@math.aau.dk\", role = c(\"aut\", \"cre\", \"cph\")) )", - "Description": "Utility package containing: Main categories: Working with grouped data: 'do' something to data when stratified 'by' some variables. General linear estimates. Data handling utilities. Functional programming, in particular restrict functions to a smaller domain. Miscellaneous functions for data handling. Model stability in connection with model selection. 
Miscellaneous other tools.", - "Encoding": "UTF-8", - "VignetteBuilder": "knitr", - "LazyData": "true", - "LazyDataCompression": "xz", - "URL": "https://github.com/hojsgaard/doBy", - "License": "GPL (>= 2)", - "Depends": [ - "R (>= 4.2.0)", - "methods" - ], - "Imports": [ - "boot", - "broom", - "cowplot", - "Deriv", - "dplyr", - "ggplot2", - "MASS", - "Matrix", - "modelr", - "microbenchmark", - "rlang", - "tibble", - "tidyr" - ], - "Suggests": [ - "geepack", - "knitr", - "lme4", - "markdown", - "multcomp", - "pbkrtest (>= 0.5.2)", - "survival", - "testthat (>= 2.1.0)" - ], - "RoxygenNote": "7.3.2", - "NeedsCompilation": "no", - "Author": "Ulrich Halekoh [aut, cph], Søren Højsgaard [aut, cre, cph]", - "Maintainer": "Søren Højsgaard ", - "Repository": "CRAN" + "Repository": "RSPM" }, "doParallel": { "Package": "doParallel", @@ -2156,7 +1929,7 @@ "NeedsCompilation": "no", "Author": "Folashade Daniel [cre], Microsoft Corporation [aut, cph], Steve Weston [aut], Dan Tenenbaum [ctb]", "Maintainer": "Folashade Daniel ", - "Repository": "CRAN", + "Repository": "RSPM", "Encoding": "UTF-8" }, "dplyr": { @@ -2224,7 +1997,7 @@ }, "dtplyr": { "Package": "dtplyr", - "Version": "1.3.1", + "Version": "1.3.2", "Source": "Repository", "Title": "Data Table Back-End for 'dplyr'", "Authors@R": "c( person(\"Hadley\", \"Wickham\", , \"hadley@posit.co\", role = c(\"cre\", \"aut\")), person(\"Maximilian\", \"Girlich\", role = \"aut\"), person(\"Mark\", \"Fairbanks\", role = \"aut\"), person(\"Ryan\", \"Dickerson\", role = \"aut\"), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\")) )", @@ -2233,7 +2006,7 @@ "URL": "https://dtplyr.tidyverse.org, https://github.com/tidyverse/dtplyr", "BugReports": "https://github.com/tidyverse/dtplyr/issues", "Depends": [ - "R (>= 3.3)" + "R (>= 4.0)" ], "Imports": [ "cli (>= 3.4.0)", @@ -2259,7 +2032,7 @@ "Config/Needs/website": "tidyverse/tidytemplate", "Config/testthat/edition": "3", "Encoding": "UTF-8", - "RoxygenNote": "7.2.3", + "RoxygenNote": 
"7.3.2.9000", "NeedsCompilation": "no", "Author": "Hadley Wickham [cre, aut], Maximilian Girlich [aut], Mark Fairbanks [aut], Ryan Dickerson [aut], Posit Software, PBC [cph, fnd]", "Maintainer": "Hadley Wickham ", @@ -2315,11 +2088,11 @@ "NeedsCompilation": "yes", "Author": "Graeme Blair [aut, cre], Jasper Cooper [aut], Alexander Coppock [aut], Macartan Humphreys [aut], Luke Sonnet [aut], Neal Fultz [ctb], Lily Medina [ctb], Russell Lenth [ctb], Molly Offer-Westort [ctb]", "Maintainer": "Graeme Blair ", - "Repository": "CRAN" + "Repository": "RSPM" }, "evaluate": { "Package": "evaluate", - "Version": "1.0.4", + "Version": "1.0.5", "Source": "Repository", "Type": "Package", "Title": "Parsing and Evaluation Tools that Provide More Details than the Default", @@ -2382,7 +2155,7 @@ "NeedsCompilation": "yes", "Author": "Martin Maechler [aut, cre] (), Christophe Dutang [aut] (), Vincent Goulet [aut] (), Douglas Bates [ctb] (cosmetic clean up, in svn r42), David Firth [ctb] (expm(method= \"PadeO\" and \"TaylorO\")), Marina Shapira [ctb] (expm(method= \"PadeO\" and \"TaylorO\")), Michael Stadelmann [ctb] (\"Higham08*\" methods, see ?expm.Higham08...)", "Maintainer": "Martin Maechler ", - "Repository": "CRAN" + "Repository": "RSPM" }, "farver": { "Package": "farver", @@ -2464,16 +2237,16 @@ }, "forcats": { "Package": "forcats", - "Version": "1.0.0", + "Version": "1.0.1", "Source": "Repository", "Title": "Tools for Working with Categorical Variables (Factors)", - "Authors@R": "c( person(\"Hadley\", \"Wickham\", , \"hadley@rstudio.com\", role = c(\"aut\", \"cre\")), person(\"RStudio\", role = c(\"cph\", \"fnd\")) )", + "Authors@R": "c( person(\"Hadley\", \"Wickham\", , \"hadley@posit.co\", role = c(\"aut\", \"cre\")), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\"), comment = c(ROR = \"03wc8by49\")) )", "Description": "Helpers for reordering factor levels (including moving specified levels to front, ordering by first appearance, reversing, and randomly shuffling), 
and tools for modifying factor levels (including collapsing rare levels into other, 'anonymising', and manually 'recoding').", "License": "MIT + file LICENSE", "URL": "https://forcats.tidyverse.org/, https://github.com/tidyverse/forcats", "BugReports": "https://github.com/tidyverse/forcats/issues", "Depends": [ - "R (>= 3.4)" + "R (>= 4.1)" ], "Imports": [ "cli (>= 3.4.0)", @@ -2498,10 +2271,10 @@ "Config/testthat/edition": "3", "Encoding": "UTF-8", "LazyData": "true", - "RoxygenNote": "7.2.3", + "RoxygenNote": "7.3.3", "NeedsCompilation": "no", - "Author": "Hadley Wickham [aut, cre], RStudio [cph, fnd]", - "Maintainer": "Hadley Wickham ", + "Author": "Hadley Wickham [aut, cre], Posit Software, PBC [cph, fnd] (ROR: )", + "Maintainer": "Hadley Wickham ", "Repository": "CRAN" }, "foreach": { @@ -2656,7 +2429,7 @@ "NeedsCompilation": "no", "Author": "Davis Vaughan [aut, cre], Matt Dancho [aut], RStudio [cph, fnd]", "Maintainer": "Davis Vaughan ", - "Repository": "RSPM" + "Repository": "CRAN" }, "future": { "Package": "future", @@ -2699,7 +2472,7 @@ }, "gargle": { "Package": "gargle", - "Version": "1.5.2", + "Version": "1.6.0", "Source": "Repository", "Title": "Utilities for Working with Google APIs", "Authors@R": "c( person(\"Jennifer\", \"Bryan\", , \"jenny@posit.co\", role = c(\"aut\", \"cre\"), comment = c(ORCID = \"0000-0002-6983-2759\")), person(\"Craig\", \"Citro\", , \"craigcitro@google.com\", role = \"aut\"), person(\"Hadley\", \"Wickham\", , \"hadley@posit.co\", role = \"aut\", comment = c(ORCID = \"0000-0003-4757-117X\")), person(\"Google Inc\", role = \"cph\"), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\")) )", @@ -2716,7 +2489,7 @@ "glue (>= 1.3.0)", "httr (>= 1.4.5)", "jsonlite", - "lifecycle", + "lifecycle (>= 0.2.0)", "openssl", "rappdirs", "rlang (>= 1.1.0)", @@ -2740,9 +2513,9 @@ "Config/testthat/edition": "3", "Encoding": "UTF-8", "Language": "en-US", - "RoxygenNote": "7.2.3", + "RoxygenNote": "7.3.2.9000", "NeedsCompilation": "no", - 
"Author": "Jennifer Bryan [aut, cre] (), Craig Citro [aut], Hadley Wickham [aut] (), Google Inc [cph], Posit Software, PBC [cph, fnd]", + "Author": "Jennifer Bryan [aut, cre] (ORCID: ), Craig Citro [aut], Hadley Wickham [aut] (ORCID: ), Google Inc [cph], Posit Software, PBC [cph, fnd]", "Maintainer": "Jennifer Bryan ", "Repository": "CRAN" }, @@ -2778,11 +2551,6 @@ "Maintainer": "Hadley Wickham ", "Repository": "CRAN" }, - "ggbeeswarm": { - "Package": "ggbeeswarm", - "Version": "0.7.2", - "Source": "Repository" - }, "ggdist": { "Package": "ggdist", "Version": "3.3.3", @@ -2848,39 +2616,37 @@ ], "NeedsCompilation": "yes", "Author": "Matthew Kay [aut, cre], Brenton M. Wiernik [ctb]", - "Repository": "CRAN" + "Repository": "RSPM" }, "ggplot2": { "Package": "ggplot2", - "Version": "3.5.2", + "Version": "4.0.0", "Source": "Repository", "Title": "Create Elegant Data Visualisations Using the Grammar of Graphics", - "Authors@R": "c( person(\"Hadley\", \"Wickham\", , \"hadley@posit.co\", role = \"aut\", comment = c(ORCID = \"0000-0003-4757-117X\")), person(\"Winston\", \"Chang\", role = \"aut\", comment = c(ORCID = \"0000-0002-1576-2126\")), person(\"Lionel\", \"Henry\", role = \"aut\"), person(\"Thomas Lin\", \"Pedersen\", , \"thomas.pedersen@posit.co\", role = c(\"aut\", \"cre\"), comment = c(ORCID = \"0000-0002-5147-4711\")), person(\"Kohske\", \"Takahashi\", role = \"aut\"), person(\"Claus\", \"Wilke\", role = \"aut\", comment = c(ORCID = \"0000-0002-7470-9261\")), person(\"Kara\", \"Woo\", role = \"aut\", comment = c(ORCID = \"0000-0002-5125-4188\")), person(\"Hiroaki\", \"Yutani\", role = \"aut\", comment = c(ORCID = \"0000-0002-3385-7233\")), person(\"Dewey\", \"Dunnington\", role = \"aut\", comment = c(ORCID = \"0000-0002-9415-4582\")), person(\"Teun\", \"van den Brand\", role = \"aut\", comment = c(ORCID = \"0000-0002-9335-7468\")), person(\"Posit, PBC\", role = c(\"cph\", \"fnd\")) )", + "Authors@R": "c( person(\"Hadley\", \"Wickham\", , \"hadley@posit.co\", role 
= \"aut\", comment = c(ORCID = \"0000-0003-4757-117X\")), person(\"Winston\", \"Chang\", role = \"aut\", comment = c(ORCID = \"0000-0002-1576-2126\")), person(\"Lionel\", \"Henry\", role = \"aut\"), person(\"Thomas Lin\", \"Pedersen\", , \"thomas.pedersen@posit.co\", role = c(\"aut\", \"cre\"), comment = c(ORCID = \"0000-0002-5147-4711\")), person(\"Kohske\", \"Takahashi\", role = \"aut\"), person(\"Claus\", \"Wilke\", role = \"aut\", comment = c(ORCID = \"0000-0002-7470-9261\")), person(\"Kara\", \"Woo\", role = \"aut\", comment = c(ORCID = \"0000-0002-5125-4188\")), person(\"Hiroaki\", \"Yutani\", role = \"aut\", comment = c(ORCID = \"0000-0002-3385-7233\")), person(\"Dewey\", \"Dunnington\", role = \"aut\", comment = c(ORCID = \"0000-0002-9415-4582\")), person(\"Teun\", \"van den Brand\", role = \"aut\", comment = c(ORCID = \"0000-0002-9335-7468\")), person(\"Posit, PBC\", role = c(\"cph\", \"fnd\"), comment = c(ROR = \"03wc8by49\")) )", "Description": "A system for 'declaratively' creating graphics, based on \"The Grammar of Graphics\". 
You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.", "License": "MIT + file LICENSE", "URL": "https://ggplot2.tidyverse.org, https://github.com/tidyverse/ggplot2", "BugReports": "https://github.com/tidyverse/ggplot2/issues", "Depends": [ - "R (>= 3.5)" + "R (>= 4.1)" ], "Imports": [ "cli", - "glue", "grDevices", "grid", - "gtable (>= 0.1.1)", + "gtable (>= 0.3.6)", "isoband", "lifecycle (> 1.0.1)", - "MASS", - "mgcv", "rlang (>= 1.1.0)", - "scales (>= 1.3.0)", + "S7", + "scales (>= 1.4.0)", "stats", - "tibble", "vctrs (>= 0.6.0)", "withr (>= 2.5.0)" ], "Suggests": [ + "broom", "covr", "dplyr", "ggplot2movies", @@ -2889,6 +2655,8 @@ "knitr", "mapproj", "maps", + "MASS", + "mgcv", "multcomp", "munsell", "nlme", @@ -2897,10 +2665,12 @@ "ragg (>= 1.2.6)", "RColorBrewer", "rmarkdown", + "roxygen2", "rpart", "sf (>= 0.7-3)", "svglite (>= 2.1.2)", - "testthat (>= 3.1.2)", + "testthat (>= 3.1.5)", + "tibble", "vdiffr (>= 1.0.6)", "xml2" ], @@ -2910,12 +2680,13 @@ "VignetteBuilder": "knitr", "Config/Needs/website": "ggtext, tidyr, forcats, tidyverse/tidytemplate", "Config/testthat/edition": "3", + "Config/usethis/last-upkeep": "2025-04-23", "Encoding": "UTF-8", "LazyData": "true", "RoxygenNote": "7.3.2", - "Collate": "'ggproto.R' 'ggplot-global.R' 'aaa-.R' 'aes-colour-fill-alpha.R' 'aes-evaluation.R' 'aes-group-order.R' 'aes-linetype-size-shape.R' 'aes-position.R' 'compat-plyr.R' 'utilities.R' 'aes.R' 'utilities-checks.R' 'legend-draw.R' 'geom-.R' 'annotation-custom.R' 'annotation-logticks.R' 'geom-polygon.R' 'geom-map.R' 'annotation-map.R' 'geom-raster.R' 'annotation-raster.R' 'annotation.R' 'autolayer.R' 'autoplot.R' 'axis-secondary.R' 'backports.R' 'bench.R' 'bin.R' 'coord-.R' 'coord-cartesian-.R' 'coord-fixed.R' 'coord-flip.R' 'coord-map.R' 'coord-munch.R' 'coord-polar.R' 'coord-quickmap.R' 'coord-radial.R' 'coord-sf.R' 'coord-transform.R' 'data.R' 'docs_layer.R' 'facet-.R' 
'facet-grid-.R' 'facet-null.R' 'facet-wrap.R' 'fortify-lm.R' 'fortify-map.R' 'fortify-multcomp.R' 'fortify-spatial.R' 'fortify.R' 'stat-.R' 'geom-abline.R' 'geom-rect.R' 'geom-bar.R' 'geom-bin2d.R' 'geom-blank.R' 'geom-boxplot.R' 'geom-col.R' 'geom-path.R' 'geom-contour.R' 'geom-count.R' 'geom-crossbar.R' 'geom-segment.R' 'geom-curve.R' 'geom-defaults.R' 'geom-ribbon.R' 'geom-density.R' 'geom-density2d.R' 'geom-dotplot.R' 'geom-errorbar.R' 'geom-errorbarh.R' 'geom-freqpoly.R' 'geom-function.R' 'geom-hex.R' 'geom-histogram.R' 'geom-hline.R' 'geom-jitter.R' 'geom-label.R' 'geom-linerange.R' 'geom-point.R' 'geom-pointrange.R' 'geom-quantile.R' 'geom-rug.R' 'geom-sf.R' 'geom-smooth.R' 'geom-spoke.R' 'geom-text.R' 'geom-tile.R' 'geom-violin.R' 'geom-vline.R' 'ggplot2-package.R' 'grob-absolute.R' 'grob-dotstack.R' 'grob-null.R' 'grouping.R' 'theme-elements.R' 'guide-.R' 'guide-axis.R' 'guide-axis-logticks.R' 'guide-axis-stack.R' 'guide-axis-theta.R' 'guide-legend.R' 'guide-bins.R' 'guide-colorbar.R' 'guide-colorsteps.R' 'guide-custom.R' 'layer.R' 'guide-none.R' 'guide-old.R' 'guides-.R' 'guides-grid.R' 'hexbin.R' 'import-standalone-obj-type.R' 'import-standalone-types-check.R' 'labeller.R' 'labels.R' 'layer-sf.R' 'layout.R' 'limits.R' 'margins.R' 'performance.R' 'plot-build.R' 'plot-construction.R' 'plot-last.R' 'plot.R' 'position-.R' 'position-collide.R' 'position-dodge.R' 'position-dodge2.R' 'position-identity.R' 'position-jitter.R' 'position-jitterdodge.R' 'position-nudge.R' 'position-stack.R' 'quick-plot.R' 'reshape-add-margins.R' 'save.R' 'scale-.R' 'scale-alpha.R' 'scale-binned.R' 'scale-brewer.R' 'scale-colour.R' 'scale-continuous.R' 'scale-date.R' 'scale-discrete-.R' 'scale-expansion.R' 'scale-gradient.R' 'scale-grey.R' 'scale-hue.R' 'scale-identity.R' 'scale-linetype.R' 'scale-linewidth.R' 'scale-manual.R' 'scale-shape.R' 'scale-size.R' 'scale-steps.R' 'scale-type.R' 'scale-view.R' 'scale-viridis.R' 'scales-.R' 'stat-align.R' 'stat-bin.R' 'stat-bin2d.R' 
'stat-bindot.R' 'stat-binhex.R' 'stat-boxplot.R' 'stat-contour.R' 'stat-count.R' 'stat-density-2d.R' 'stat-density.R' 'stat-ecdf.R' 'stat-ellipse.R' 'stat-function.R' 'stat-identity.R' 'stat-qq-line.R' 'stat-qq.R' 'stat-quantilemethods.R' 'stat-sf-coordinates.R' 'stat-sf.R' 'stat-smooth-methods.R' 'stat-smooth.R' 'stat-sum.R' 'stat-summary-2d.R' 'stat-summary-bin.R' 'stat-summary-hex.R' 'stat-summary.R' 'stat-unique.R' 'stat-ydensity.R' 'summarise-plot.R' 'summary.R' 'theme.R' 'theme-defaults.R' 'theme-current.R' 'utilities-break.R' 'utilities-grid.R' 'utilities-help.R' 'utilities-matrix.R' 'utilities-patterns.R' 'utilities-resolution.R' 'utilities-tidy-eval.R' 'zxx.R' 'zzz.R'", + "Collate": "'ggproto.R' 'ggplot-global.R' 'aaa-.R' 'aes-colour-fill-alpha.R' 'aes-evaluation.R' 'aes-group-order.R' 'aes-linetype-size-shape.R' 'aes-position.R' 'all-classes.R' 'compat-plyr.R' 'utilities.R' 'aes.R' 'annotation-borders.R' 'utilities-checks.R' 'legend-draw.R' 'geom-.R' 'annotation-custom.R' 'annotation-logticks.R' 'scale-type.R' 'layer.R' 'make-constructor.R' 'geom-polygon.R' 'geom-map.R' 'annotation-map.R' 'geom-raster.R' 'annotation-raster.R' 'annotation.R' 'autolayer.R' 'autoplot.R' 'axis-secondary.R' 'backports.R' 'bench.R' 'bin.R' 'coord-.R' 'coord-cartesian-.R' 'coord-fixed.R' 'coord-flip.R' 'coord-map.R' 'coord-munch.R' 'coord-polar.R' 'coord-quickmap.R' 'coord-radial.R' 'coord-sf.R' 'coord-transform.R' 'data.R' 'docs_layer.R' 'facet-.R' 'facet-grid-.R' 'facet-null.R' 'facet-wrap.R' 'fortify-map.R' 'fortify-models.R' 'fortify-spatial.R' 'fortify.R' 'stat-.R' 'geom-abline.R' 'geom-rect.R' 'geom-bar.R' 'geom-tile.R' 'geom-bin2d.R' 'geom-blank.R' 'geom-boxplot.R' 'geom-col.R' 'geom-path.R' 'geom-contour.R' 'geom-point.R' 'geom-count.R' 'geom-crossbar.R' 'geom-segment.R' 'geom-curve.R' 'geom-defaults.R' 'geom-ribbon.R' 'geom-density.R' 'geom-density2d.R' 'geom-dotplot.R' 'geom-errorbar.R' 'geom-freqpoly.R' 'geom-function.R' 'geom-hex.R' 'geom-histogram.R' 'geom-hline.R' 
'geom-jitter.R' 'geom-label.R' 'geom-linerange.R' 'geom-pointrange.R' 'geom-quantile.R' 'geom-rug.R' 'geom-sf.R' 'geom-smooth.R' 'geom-spoke.R' 'geom-text.R' 'geom-violin.R' 'geom-vline.R' 'ggplot2-package.R' 'grob-absolute.R' 'grob-dotstack.R' 'grob-null.R' 'grouping.R' 'properties.R' 'margins.R' 'theme-elements.R' 'guide-.R' 'guide-axis.R' 'guide-axis-logticks.R' 'guide-axis-stack.R' 'guide-axis-theta.R' 'guide-legend.R' 'guide-bins.R' 'guide-colorbar.R' 'guide-colorsteps.R' 'guide-custom.R' 'guide-none.R' 'guide-old.R' 'guides-.R' 'guides-grid.R' 'hexbin.R' 'import-standalone-obj-type.R' 'import-standalone-types-check.R' 'labeller.R' 'labels.R' 'layer-sf.R' 'layout.R' 'limits.R' 'performance.R' 'plot-build.R' 'plot-construction.R' 'plot-last.R' 'plot.R' 'position-.R' 'position-collide.R' 'position-dodge.R' 'position-dodge2.R' 'position-identity.R' 'position-jitter.R' 'position-jitterdodge.R' 'position-nudge.R' 'position-stack.R' 'quick-plot.R' 'reshape-add-margins.R' 'save.R' 'scale-.R' 'scale-alpha.R' 'scale-binned.R' 'scale-brewer.R' 'scale-colour.R' 'scale-continuous.R' 'scale-date.R' 'scale-discrete-.R' 'scale-expansion.R' 'scale-gradient.R' 'scale-grey.R' 'scale-hue.R' 'scale-identity.R' 'scale-linetype.R' 'scale-linewidth.R' 'scale-manual.R' 'scale-shape.R' 'scale-size.R' 'scale-steps.R' 'scale-view.R' 'scale-viridis.R' 'scales-.R' 'stat-align.R' 'stat-bin.R' 'stat-summary-2d.R' 'stat-bin2d.R' 'stat-bindot.R' 'stat-binhex.R' 'stat-boxplot.R' 'stat-connect.R' 'stat-contour.R' 'stat-count.R' 'stat-density-2d.R' 'stat-density.R' 'stat-ecdf.R' 'stat-ellipse.R' 'stat-function.R' 'stat-identity.R' 'stat-manual.R' 'stat-qq-line.R' 'stat-qq.R' 'stat-quantilemethods.R' 'stat-sf-coordinates.R' 'stat-sf.R' 'stat-smooth-methods.R' 'stat-smooth.R' 'stat-sum.R' 'stat-summary-bin.R' 'stat-summary-hex.R' 'stat-summary.R' 'stat-unique.R' 'stat-ydensity.R' 'summarise-plot.R' 'summary.R' 'theme.R' 'theme-defaults.R' 'theme-current.R' 'theme-sub.R' 'utilities-break.R' 
'utilities-grid.R' 'utilities-help.R' 'utilities-patterns.R' 'utilities-resolution.R' 'utilities-tidy-eval.R' 'zxx.R' 'zzz.R'", "NeedsCompilation": "no", - "Author": "Hadley Wickham [aut] (), Winston Chang [aut] (), Lionel Henry [aut], Thomas Lin Pedersen [aut, cre] (), Kohske Takahashi [aut], Claus Wilke [aut] (), Kara Woo [aut] (), Hiroaki Yutani [aut] (), Dewey Dunnington [aut] (), Teun van den Brand [aut] (), Posit, PBC [cph, fnd]", + "Author": "Hadley Wickham [aut] (ORCID: ), Winston Chang [aut] (ORCID: ), Lionel Henry [aut], Thomas Lin Pedersen [aut, cre] (ORCID: ), Kohske Takahashi [aut], Claus Wilke [aut] (ORCID: ), Kara Woo [aut] (ORCID: ), Hiroaki Yutani [aut] (ORCID: ), Dewey Dunnington [aut] (ORCID: ), Teun van den Brand [aut] (ORCID: ), Posit, PBC [cph, fnd] (ROR: )", "Maintainer": "Thomas Lin Pedersen ", "Repository": "CRAN" }, @@ -2966,11 +2737,11 @@ "NeedsCompilation": "yes", "Author": "Kamil Slowikowski [aut, cre] (), Alicia Schep [ctb] (), Sean Hughes [ctb] (), Trung Kien Dang [ctb] (), Saulius Lukauskas [ctb], Jean-Olivier Irisson [ctb] (), Zhian N Kamvar [ctb] (), Thompson Ryan [ctb] (), Dervieux Christophe [ctb] (), Yutani Hiroaki [ctb], Pierre Gramme [ctb], Amir Masoud Abdol [ctb], Malcolm Barrett [ctb] (), Robrecht Cannoodt [ctb] (), Michał Krassowski [ctb] (), Michael Chirico [ctb] (), Pedro Aphalo [ctb] (), Francis Barton [ctb]", "Maintainer": "Kamil Slowikowski ", - "Repository": "CRAN" + "Repository": "RSPM" }, "ggridges": { "Package": "ggridges", - "Version": "0.5.6", + "Version": "0.5.7", "Source": "Repository", "Type": "Package", "Title": "Ridgeline Plots in 'ggplot2'", @@ -2982,7 +2753,7 @@ "R (>= 3.2)" ], "Imports": [ - "ggplot2 (>= 3.4.0)", + "ggplot2 (>= 3.5.0)", "grid (>= 3.0.0)", "scales (>= 0.4.1)", "withr (>= 2.1.1)" @@ -3002,12 +2773,12 @@ ], "VignetteBuilder": "knitr", "Collate": "'data.R' 'ggridges.R' 'geoms.R' 'geomsv.R' 'geoms-gradient.R' 'geom-density-line.R' 'position.R' 'scale-cyclical.R' 'scale-point.R' 'scale-vline.R' 
'stats.R' 'theme.R' 'utils_ggplot2.R' 'utils.R'", - "RoxygenNote": "7.2.3", + "RoxygenNote": "7.3.2", "Encoding": "UTF-8", "NeedsCompilation": "no", - "Author": "Claus O. Wilke [aut, cre] ()", + "Author": "Claus O. Wilke [aut, cre] (ORCID: )", "Maintainer": "Claus O. Wilke ", - "Repository": "CRAN" + "Repository": "RSPM" }, "glmnet": { "Package": "glmnet", @@ -3121,7 +2892,7 @@ }, "googledrive": { "Package": "googledrive", - "Version": "2.1.1", + "Version": "2.1.2", "Source": "Repository", "Title": "An Interface to Google Drive", "Authors@R": "c( person(\"Lucy\", \"D'Agostino McGowan\", , role = \"aut\"), person(\"Jennifer\", \"Bryan\", , \"jenny@posit.co\", role = c(\"aut\", \"cre\"), comment = c(ORCID = \"0000-0002-6983-2759\")), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\")) )", @@ -3130,11 +2901,11 @@ "URL": "https://googledrive.tidyverse.org, https://github.com/tidyverse/googledrive", "BugReports": "https://github.com/tidyverse/googledrive/issues", "Depends": [ - "R (>= 3.6)" + "R (>= 4.1)" ], "Imports": [ "cli (>= 3.0.0)", - "gargle (>= 1.5.0)", + "gargle (>= 1.6.0)", "glue (>= 1.4.2)", "httr", "jsonlite", @@ -3153,25 +2924,24 @@ "curl", "dplyr (>= 1.0.0)", "knitr", - "mockr", "rmarkdown", "spelling", - "testthat (>= 3.1.3)" + "testthat (>= 3.1.5)" ], "VignetteBuilder": "knitr", "Config/Needs/website": "tidyverse, tidyverse/tidytemplate", "Config/testthat/edition": "3", "Encoding": "UTF-8", "Language": "en-US", - "RoxygenNote": "7.2.3", + "RoxygenNote": "7.3.3", "NeedsCompilation": "no", - "Author": "Lucy D'Agostino McGowan [aut], Jennifer Bryan [aut, cre] (), Posit Software, PBC [cph, fnd]", + "Author": "Lucy D'Agostino McGowan [aut], Jennifer Bryan [aut, cre] (ORCID: ), Posit Software, PBC [cph, fnd]", "Maintainer": "Jennifer Bryan ", "Repository": "CRAN" }, "googlesheets4": { "Package": "googlesheets4", - "Version": "1.1.1", + "Version": "1.1.2", "Source": "Repository", "Title": "Access Google Sheets using the Sheets API V4", "Authors@R": "c( 
person(\"Jennifer\", \"Bryan\", , \"jenny@posit.co\", role = c(\"cre\", \"aut\"), comment = c(ORCID = \"0000-0002-6983-2759\")), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\")) )", @@ -3186,7 +2956,7 @@ "cellranger", "cli (>= 3.0.0)", "curl", - "gargle (>= 1.5.0)", + "gargle (>= 1.6.0)", "glue (>= 1.3.0)", "googledrive (>= 2.1.0)", "httr", @@ -3213,9 +2983,9 @@ "Config/testthat/edition": "3", "Encoding": "UTF-8", "Language": "en-US", - "RoxygenNote": "7.2.3", + "RoxygenNote": "7.3.2.9000", "NeedsCompilation": "no", - "Author": "Jennifer Bryan [cre, aut] (), Posit Software, PBC [cph, fnd]", + "Author": "Jennifer Bryan [cre, aut] (ORCID: ), Posit Software, PBC [cph, fnd]", "Maintainer": "Jennifer Bryan ", "Repository": "CRAN" }, @@ -3247,7 +3017,7 @@ "NeedsCompilation": "no", "Author": "Baptiste Auguie [aut, cre], Anton Antonov [ctb]", "Maintainer": "Baptiste Auguie ", - "Repository": "CRAN", + "Repository": "RSPM", "Encoding": "UTF-8" }, "gtable": { @@ -3341,17 +3111,17 @@ }, "here": { "Package": "here", - "Version": "1.0.1", + "Version": "1.0.2", "Source": "Repository", "Title": "A Simpler Way to Find Your Files", - "Date": "2020-12-13", - "Authors@R": "c(person(given = \"Kirill\", family = \"M\\u00fcller\", role = c(\"aut\", \"cre\"), email = \"krlmlr+r@mailbox.org\", comment = c(ORCID = \"0000-0002-1416-3412\")), person(given = \"Jennifer\", family = \"Bryan\", role = \"ctb\", email = \"jenny@rstudio.com\", comment = c(ORCID = \"0000-0002-6983-2759\")))", + "Date": "2025-09-06", + "Authors@R": "c(person(given = \"Kirill\", family = \"M\\u00fcller\", role = c(\"aut\", \"cre\"), email = \"kirill@cynkra.com\", comment = c(ORCID = \"0000-0002-1416-3412\")), person(given = \"Jennifer\", family = \"Bryan\", role = \"ctb\", email = \"jenny@rstudio.com\", comment = c(ORCID = \"0000-0002-6983-2759\")))", "Description": "Constructs paths to your project's files. Declare the relative path of a file within your project with 'i_am()'. 
Use the 'here()' function as a drop-in replacement for 'file.path()', it will always locate the files relative to your project root.", "License": "MIT + file LICENSE", "URL": "https://here.r-lib.org/, https://github.com/r-lib/here", "BugReports": "https://github.com/r-lib/here/issues", "Imports": [ - "rprojroot (>= 2.0.2)" + "rprojroot (>= 2.1.0)" ], "Suggests": [ "conflicted", @@ -3369,13 +3139,13 @@ ], "VignetteBuilder": "knitr", "Encoding": "UTF-8", - "LazyData": "true", - "RoxygenNote": "7.1.1.9000", + "RoxygenNote": "7.3.3.9000", "Config/testthat/edition": "3", + "Config/Needs/website": "tidyverse/tidytemplate", "NeedsCompilation": "no", - "Author": "Kirill Müller [aut, cre] (), Jennifer Bryan [ctb] ()", - "Maintainer": "Kirill Müller ", - "Repository": "RSPM" + "Author": "Kirill Müller [aut, cre] (ORCID: ), Jennifer Bryan [ctb] (ORCID: )", + "Maintainer": "Kirill Müller ", + "Repository": "CRAN" }, "highr": { "Package": "highr", @@ -3409,13 +3179,17 @@ }, "hms": { "Package": "hms", - "Version": "1.1.3", + "Version": "1.1.4", "Source": "Repository", "Title": "Pretty Time of Day", - "Date": "2023-03-21", - "Authors@R": "c( person(\"Kirill\", \"Müller\", role = c(\"aut\", \"cre\"), email = \"kirill@cynkra.com\", comment = c(ORCID = \"0000-0002-1416-3412\")), person(\"R Consortium\", role = \"fnd\"), person(\"RStudio\", role = \"fnd\") )", + "Date": "2025-10-11", + "Authors@R": "c( person(\"Kirill\", \"Müller\", , \"kirill@cynkra.com\", role = c(\"aut\", \"cre\"), comment = c(ORCID = \"0000-0002-1416-3412\")), person(\"R Consortium\", role = \"fnd\"), person(\"Posit Software, PBC\", role = \"fnd\", comment = c(ROR = \"03wc8by49\")) )", "Description": "Implements an S3 class for storing and formatting time-of-day values, based on the 'difftime' class.", + "License": "MIT + file LICENSE", + "URL": "https://hms.tidyverse.org/, https://github.com/tidyverse/hms", + "BugReports": "https://github.com/tidyverse/hms/issues", "Imports": [ + "cli", "lifecycle", "methods", 
"pkgconfig", @@ -3428,17 +3202,12 @@ "pillar (>= 1.1.0)", "testthat (>= 3.0.0)" ], - "License": "MIT + file LICENSE", - "Encoding": "UTF-8", - "URL": "https://hms.tidyverse.org/, https://github.com/tidyverse/hms", - "BugReports": "https://github.com/tidyverse/hms/issues", - "RoxygenNote": "7.2.3", - "Config/testthat/edition": "3", - "Config/autostyle/scope": "line_breaks", - "Config/autostyle/strict": "false", "Config/Needs/website": "tidyverse/tidytemplate", + "Config/testthat/edition": "3", + "Encoding": "UTF-8", + "RoxygenNote": "7.3.3.9000", "NeedsCompilation": "no", - "Author": "Kirill Müller [aut, cre] (), R Consortium [fnd], RStudio [fnd]", + "Author": "Kirill Müller [aut, cre] (ORCID: ), R Consortium [fnd], Posit Software, PBC [fnd] (ROR: )", "Maintainer": "Kirill Müller ", "Repository": "CRAN" }, @@ -3657,7 +3426,7 @@ "NeedsCompilation": "no", "Author": "Sam Firke [aut, cre], Bill Denney [ctb], Chris Haid [ctb], Ryan Knight [ctb], Malte Grosser [ctb], Jonathan Zadra [ctb]", "Maintainer": "Sam Firke ", - "Repository": "CRAN" + "Repository": "RSPM" }, "jquerylib": { "Package": "jquerylib", @@ -3885,7 +3654,7 @@ }, "lavaan": { "Package": "lavaan", - "Version": "0.6-19", + "Version": "0.6-20", "Source": "Repository", "Title": "Latent Variable Analysis", "Authors@R": "c(person(given = \"Yves\", family = \"Rosseel\", role = c(\"aut\", \"cre\"), email = \"Yves.Rosseel@UGent.be\", comment = c(ORCID = \"0000-0002-4129-4477\")), person(given = c(\"Terrence\",\"D.\"), family = \"Jorgensen\", role = \"aut\", email = \"TJorgensen314@gmail.com\", comment = c(ORCID = \"0000-0001-5111-6773\")), person(given = c(\"Luc\"), family = \"De Wilde\", role = \"aut\", email = \"Luc.DeWilde@UGent.be\"), person(given = \"Daniel\", family = \"Oberski\", role = \"ctb\", email = \"daniel.oberski@gmail.com\"), person(given = \"Jarrett\", family = \"Byrnes\", role = \"ctb\", email = \"byrnes@nceas.ucsb.edu\"), person(given = \"Leonard\", family = \"Vanbrabant\", role = \"ctb\", email = 
\"info@restriktor.org\"), person(given = \"Victoria\", family = \"Savalei\", role = \"ctb\", email = \"vsavalei@ubc.ca\"), person(given = \"Ed\", family = \"Merkle\", role = \"ctb\", email = \"merklee@missouri.edu\"), person(given = \"Michael\", family = \"Hallquist\", role = \"ctb\", email = \"michael.hallquist@gmail.com\"), person(given = \"Mijke\", family = \"Rhemtulla\", role = \"ctb\", email = \"mrhemtulla@ucdavis.edu\"), person(given = \"Myrsini\", family = \"Katsikatsou\", role = \"ctb\", email = \"mirtok2@gmail.com\"), person(given = \"Mariska\", family = \"Barendse\", role = \"ctb\", email = \"m.t.barendse@gmail.com\"), person(given = c(\"Nicholas\"), family = \"Rockwood\", role = \"ctb\", email = \"nrockwood@rti.org\"), person(given = \"Florian\", family = \"Scharf\", role = \"ctb\", email = \"florian.scharf@uni-leipzig.de\"), person(given = \"Han\", family = \"Du\", role = \"ctb\", email = \"hdu@psych.ucla.edu\"), person(given = \"Haziq\", family = \"Jamil\", role = \"ctb\", email = \"haziq.jamil@ubd.edu.bn\", comment = c(ORCID = \"0000-0003-3298-1010\")), person(given = \"Franz\", family = \"Classe\", role = \"ctb\", email = \"classe@dji.de\") )", @@ -3911,9 +3680,9 @@ "ByteCompile": "true", "URL": "https://lavaan.ugent.be", "NeedsCompilation": "no", - "Author": "Yves Rosseel [aut, cre] (), Terrence D. Jorgensen [aut] (), Luc De Wilde [aut], Daniel Oberski [ctb], Jarrett Byrnes [ctb], Leonard Vanbrabant [ctb], Victoria Savalei [ctb], Ed Merkle [ctb], Michael Hallquist [ctb], Mijke Rhemtulla [ctb], Myrsini Katsikatsou [ctb], Mariska Barendse [ctb], Nicholas Rockwood [ctb], Florian Scharf [ctb], Han Du [ctb], Haziq Jamil [ctb] (), Franz Classe [ctb]", + "Author": "Yves Rosseel [aut, cre] (ORCID: ), Terrence D. 
Jorgensen [aut] (ORCID: ), Luc De Wilde [aut], Daniel Oberski [ctb], Jarrett Byrnes [ctb], Leonard Vanbrabant [ctb], Victoria Savalei [ctb], Ed Merkle [ctb], Michael Hallquist [ctb], Mijke Rhemtulla [ctb], Myrsini Katsikatsou [ctb], Mariska Barendse [ctb], Nicholas Rockwood [ctb], Florian Scharf [ctb], Han Du [ctb], Haziq Jamil [ctb] (ORCID: ), Franz Classe [ctb]", "Maintainer": "Yves Rosseel ", - "Repository": "CRAN", + "Repository": "RSPM", "Encoding": "UTF-8" }, "lifecycle": { @@ -3959,7 +3728,7 @@ }, "listenv": { "Package": "listenv", - "Version": "0.9.1", + "Version": "0.10.0", "Source": "Repository", "Depends": [ "R (>= 3.1.2)" @@ -3975,13 +3744,13 @@ "Description": "List environments are environments that have list-like properties. For instance, the elements of a list environment are ordered and can be accessed and iterated over using index subsetting, e.g. 'x <- listenv(a = 1, b = 2); for (i in seq_along(x)) x[[i]] <- x[[i]] ^ 2; y <- as.list(x)'.", "License": "LGPL (>= 2.1)", "LazyLoad": "TRUE", - "URL": "https://listenv.futureverse.org, https://github.com/HenrikBengtsson/listenv", - "BugReports": "https://github.com/HenrikBengtsson/listenv/issues", - "RoxygenNote": "7.3.1", + "URL": "https://listenv.futureverse.org, https://github.com/futureverse/listenv", + "BugReports": "https://github.com/futureverse/listenv/issues", + "RoxygenNote": "7.3.3", "NeedsCompilation": "no", "Author": "Henrik Bengtsson [aut, cre, cph]", "Maintainer": "Henrik Bengtsson ", - "Repository": "RSPM", + "Repository": "CRAN", "Encoding": "UTF-8" }, "lme4": { @@ -4128,7 +3897,7 @@ "NeedsCompilation": "no", "Author": "Adam Loy [aut, cre] (), Spenser Steele [aut], Jenna Korobova [aut]", "Maintainer": "Adam Loy ", - "Repository": "CRAN" + "Repository": "RSPM" }, "lmtest": { "Package": "lmtest", @@ -4239,11 +4008,11 @@ }, "magrittr": { "Package": "magrittr", - "Version": "2.0.3", + "Version": "2.0.4", "Source": "Repository", "Type": "Package", "Title": "A Forward-Pipe Operator for R", - 
"Authors@R": "c( person(\"Stefan Milton\", \"Bache\", , \"stefan@stefanbache.dk\", role = c(\"aut\", \"cph\"), comment = \"Original author and creator of magrittr\"), person(\"Hadley\", \"Wickham\", , \"hadley@rstudio.com\", role = \"aut\"), person(\"Lionel\", \"Henry\", , \"lionel@rstudio.com\", role = \"cre\"), person(\"RStudio\", role = c(\"cph\", \"fnd\")) )", + "Authors@R": "c( person(\"Stefan Milton\", \"Bache\", , \"stefan@stefanbache.dk\", role = c(\"aut\", \"cph\"), comment = \"Original author and creator of magrittr\"), person(\"Hadley\", \"Wickham\", , \"hadley@posit.co\", role = \"aut\"), person(\"Lionel\", \"Henry\", , \"lionel@posit.co\", role = \"cre\"), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\"), comment = c(ROR = \"03wc8by49\")) )", "Description": "Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions. For more information, see package vignette. 
To quote Rene Magritte, \"Ceci n'est pas un pipe.\"", "License": "MIT + file LICENSE", "URL": "https://magrittr.tidyverse.org, https://github.com/tidyverse/magrittr", @@ -4262,10 +4031,10 @@ "ByteCompile": "Yes", "Config/Needs/website": "tidyverse/tidytemplate", "Encoding": "UTF-8", - "RoxygenNote": "7.1.2", + "RoxygenNote": "7.3.3", "NeedsCompilation": "yes", - "Author": "Stefan Milton Bache [aut, cph] (Original author and creator of magrittr), Hadley Wickham [aut], Lionel Henry [cre], RStudio [cph, fnd]", - "Maintainer": "Lionel Henry ", + "Author": "Stefan Milton Bache [aut, cph] (Original author and creator of magrittr), Hadley Wickham [aut], Lionel Henry [cre], Posit Software, PBC [cph, fnd] (ROR: )", + "Maintainer": "Lionel Henry ", "Repository": "CRAN" }, "mathjaxr": { @@ -4287,16 +4056,16 @@ "NeedsCompilation": "yes", "Author": "Wolfgang Viechtbauer [aut, cre] ()", "Maintainer": "Wolfgang Viechtbauer ", - "Repository": "CRAN" + "Repository": "RSPM" }, "mclust": { "Package": "mclust", - "Version": "6.1.1", + "Version": "6.1.2", "Source": "Repository", - "Date": "2024-04-29", + "Date": "2025-10-30", "Title": "Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation", "Description": "Gaussian finite mixture models fitted via EM algorithm for model-based clustering, classification, and density estimation, including Bayesian regularization, dimension reduction for visualisation, and resampling-based inference.", - "Authors@R": "c(person(\"Chris\", \"Fraley\", role = \"aut\"), person(\"Adrian E.\", \"Raftery\", role = \"aut\", comment = c(ORCID = \"0000-0002-6589-301X\")), person(\"Luca\", \"Scrucca\", role = c(\"aut\", \"cre\"), email = \"luca.scrucca@unipg.it\", comment = c(ORCID = \"0000-0003-3826-0484\")), person(\"Thomas Brendan\", \"Murphy\", role = \"ctb\", comment = c(ORCID = \"0000-0002-5668-7046\")), person(\"Michael\", \"Fop\", role = \"ctb\", comment = c(ORCID = \"0000-0003-3936-2757\")))", + "Authors@R": 
"c(person(\"Chris\", \"Fraley\", role = \"aut\"), person(\"Adrian E.\", \"Raftery\", role = \"aut\", comment = c(ORCID = \"0000-0002-6589-301X\")), person(\"Luca\", \"Scrucca\", role = c(\"aut\", \"cre\"), email = \"luca.scrucca@unibo.it\", comment = c(ORCID = \"0000-0003-3826-0484\")), person(\"Thomas Brendan\", \"Murphy\", role = \"ctb\", comment = c(ORCID = \"0000-0002-5668-7046\")), person(\"Michael\", \"Fop\", role = \"ctb\", comment = c(ORCID = \"0000-0003-3936-2757\")))", "Depends": [ "R (>= 3.0)" ], @@ -4316,13 +4085,13 @@ "License": "GPL (>= 2)", "URL": "https://mclust-org.github.io/mclust/", "VignetteBuilder": "knitr", - "Repository": "CRAN", + "Repository": "RSPM", "ByteCompile": "true", "NeedsCompilation": "yes", "LazyData": "yes", "Encoding": "UTF-8", - "Author": "Chris Fraley [aut], Adrian E. Raftery [aut] (), Luca Scrucca [aut, cre] (), Thomas Brendan Murphy [ctb] (), Michael Fop [ctb] ()", - "Maintainer": "Luca Scrucca " + "Author": "Chris Fraley [aut], Adrian E. Raftery [aut] (ORCID: ), Luca Scrucca [aut, cre] (ORCID: ), Thomas Brendan Murphy [ctb] (ORCID: ), Michael Fop [ctb] (ORCID: )", + "Maintainer": "Luca Scrucca " }, "memoise": { "Package": "memoise", @@ -4385,7 +4154,7 @@ "lmeInfo" ], "Author": "Ting Wang [aut, cre], Edgar Merkle [aut] (ORCID: ), Yves Rosseel [ctb]", - "Repository": "CRAN", + "Repository": "RSPM", "Encoding": "UTF-8" }, "metadat": { @@ -4434,7 +4203,7 @@ "NeedsCompilation": "no", "Author": "Wolfgang Viechtbauer [aut, cre] (), Thomas White [aut] (), Daniel Noble [aut] (), Alistair Senior [aut] (), W. 
Kyle Hamilton [aut] (), Guido Schwarzer [dtc] ()", "Maintainer": "Wolfgang Viechtbauer ", - "Repository": "CRAN" + "Repository": "RSPM" }, "metafor": { "Package": "metafor", @@ -4515,7 +4284,7 @@ "NeedsCompilation": "no", "Author": "Wolfgang Viechtbauer [aut, cre] ()", "Maintainer": "Wolfgang Viechtbauer ", - "Repository": "CRAN" + "Repository": "RSPM" }, "mgcv": { "Package": "mgcv", @@ -4577,7 +4346,7 @@ "NeedsCompilation": "yes", "Author": "Olaf Mersmann [aut], Claudia Beleites [ctb], Rainer Hurling [ctb], Ari Friedman [ctb], Joshua M. Ulrich [cre]", "Maintainer": "Joshua M. Ulrich ", - "Repository": "CRAN", + "Repository": "RSPM", "Encoding": "UTF-8" }, "mime": { @@ -4676,7 +4445,7 @@ "NeedsCompilation": "no", "Author": "Brian T. Keller [aut, cre, cph]", "Maintainer": "Brian T. Keller ", - "Repository": "CRAN" + "Repository": "RSPM" }, "mnormt": { "Package": "mnormt", @@ -4769,7 +4538,7 @@ "NeedsCompilation": "yes", "Author": "Christopher Jackson [aut, cre]", "Maintainer": "Christopher Jackson ", - "Repository": "CRAN" + "Repository": "RSPM" }, "mvtnorm": { "Package": "mvtnorm", @@ -4794,7 +4563,7 @@ "NeedsCompilation": "yes", "Author": "Alan Genz [aut], Frank Bretz [aut], Tetsuhisa Miwa [aut], Xuefei Mi [aut], Friedrich Leisch [ctb], Fabian Scheipl [ctb], Bjoern Bornkamp [ctb] (), Martin Maechler [ctb] (), Torsten Hothorn [aut, cre] ()", "Maintainer": "Torsten Hothorn ", - "Repository": "CRAN", + "Repository": "RSPM", "Encoding": "UTF-8" }, "nlme": { @@ -4833,18 +4602,14 @@ }, "nlmeU": { "Package": "nlmeU", - "Version": "0.70-9", + "Version": "0.71.7", "Source": "Repository", - "Date": "2022-05-02", - "Author": "Andrzej Galecki agalecki@umich.edu, Tomasz Burzykowski tomasz.burzykowski@uhasselt.be", - "Maintainer": "Andrzej Galecki ", - "Title": "Datasets and Utility Functions Enhancing Functionality of 'nlme' Package", - "Description": "Datasets and utility functions enhancing functionality of nlme package. 
Datasets, functions and scripts are described in book titled 'Linear Mixed-Effects Models: A Step-by-Step Approach' by Galecki and Burzykowski (2013). Package is under development.", - "Depends": [ - "R (>= 2.14.2)" - ], + "Title": "Functions and Data Supporting 'Linear Mixed-Effects Models: A Step-by-Step Approach'", + "Description": "Provides functions and datasets to support the book by Galecki and Burzykowski (2013), 'Linear Mixed-Effects Models: A Step-by-Step Approach', Springer. Includes functions for power calculations, log-likelihood contributions, and data simulation for linear mixed-effects models.", + "Authors@R": "c( person(\"Andrzej T.\", \"Galecki\", email = \"agalecki@umich.edu\", role = c(\"aut\", \"cre\"), comment = c(ORCID = \"0000-0003-1542-4001\")), person(\"Tomasz\", \"Burzykowski\", email = \"tomasz.burzykowski@uhasselt.be\", role = \"aut\", comment = c(ORCID = \"0000-0003-3378-975X\")) )", "Imports": [ - "nlme" + "nlme", + "stats" ], "Suggests": [ "reshape", @@ -4854,13 +4619,17 @@ "roxygen2", "testthat" ], - "License": "GPL (>= 2)", - "URL": "http://www-personal.umich.edu/~agalecki/", - "LazyData": "yes", - "Collate": "'logLik1.R' 'nlmeU-package.R' 'Pwr.R' 'simulateY.R' 'varia.R'", + "Encoding": "UTF-8", + "RoxygenNote": "7.3.2", + "License": "GPL-2", + "URL": "https://github.com/agalecki/nlmeU", + "Depends": [ + "R (>= 3.5.0)" + ], "NeedsCompilation": "no", - "Repository": "CRAN", - "Encoding": "UTF-8" + "Author": "Andrzej T. Galecki [aut, cre] (ORCID: ), Tomasz Burzykowski [aut] (ORCID: )", + "Maintainer": "Andrzej T. 
Galecki ", + "Repository": "RSPM" }, "nloptr": { "Package": "nloptr", @@ -4889,32 +4658,6 @@ "Maintainer": "Aymeric Stamm ", "Repository": "CRAN" }, - "nnet": { - "Package": "nnet", - "Version": "7.3-19", - "Source": "Repository", - "Priority": "recommended", - "Date": "2023-05-02", - "Depends": [ - "R (>= 3.0.0)", - "stats", - "utils" - ], - "Suggests": [ - "MASS" - ], - "Authors@R": "c(person(\"Brian\", \"Ripley\", role = c(\"aut\", \"cre\", \"cph\"), email = \"ripley@stats.ox.ac.uk\"), person(\"William\", \"Venables\", role = \"cph\"))", - "Description": "Software for feed-forward neural networks with a single hidden layer, and for multinomial log-linear models.", - "Title": "Feed-Forward Neural Networks and Multinomial Log-Linear Models", - "ByteCompile": "yes", - "License": "GPL-2 | GPL-3", - "URL": "http://www.stats.ox.ac.uk/pub/MASS4/", - "NeedsCompilation": "yes", - "Author": "Brian Ripley [aut, cre, cph], William Venables [cph]", - "Maintainer": "Brian Ripley ", - "Repository": "RSPM", - "Encoding": "UTF-8" - }, "nonnest2": { "Package": "nonnest2", "Version": "0.5-8", @@ -4955,7 +4698,7 @@ "NeedsCompilation": "no", "Author": "Edgar Merkle [aut, cre], Dongjun You [aut], Lennart Schneider [ctb], Mauricio Garnier-Villarreal [ctb], Seongho Bae [ctb], Phil Chalmers [ctb]", "Maintainer": "Edgar Merkle ", - "Repository": "CRAN" + "Repository": "RSPM" }, "npde": { "Package": "npde", @@ -4983,7 +4726,7 @@ "Collate": "'NpdeSimData.R' 'NpdeData.R' 'aaa_generics.R' 'NpdeData-methods.R' 'NpdeRes.R' 'NpdeRes-methods.R' 'NpdeObject.R' 'NpdeObject-methods.R' 'compute_distribution.R' 'compute_npde.R' 'compute_pd.R' 'compute_ploq.R' 'mainNpde.R' 'npde.R' 'npdeControl.R' 'plotNpde-auxDistPlot.R' 'plotNpde-auxScatter.R' 'plotNpde-auxScatterPlot.R' 'plotNpde-binningPI.R' 'plotNpde-covplot.R' 'plotNpde-distributionPlot.R' 'plotNpde-methods.R' 'plotNpde-plotFunctions.R' 'plotNpde-scatterplot.R'", "NeedsCompilation": "no", "Author": "Emmanuelle Comets [aut, cre] (), Karl Brendel 
[ctb], Thi Huyen Tram Nguyen [ctb], Marc Cerou [ctb], Romain Leroux [ctb], France Mentre [ctb]", - "Repository": "CRAN" + "Repository": "RSPM" }, "numDeriv": { "Package": "numDeriv", @@ -5007,7 +4750,7 @@ }, "openssl": { "Package": "openssl", - "Version": "2.3.3", + "Version": "2.3.4", "Source": "Repository", "Type": "Package", "Title": "Toolkit for Encryption, Signatures and Certificates Based on OpenSSL", @@ -5096,7 +4839,7 @@ "NeedsCompilation": "no", "Author": "Tyler Rinker [aut, cre, ctb], Dason Kurkiewicz [aut, ctb], Keith Hughitt [ctb], Albert Wang [ctb], Garrick Aden-Buie [ctb], Albert Wang [ctb], Lukas Burk [ctb]", "Maintainer": "Tyler Rinker ", - "Repository": "RSPM", + "Repository": "CRAN", "Encoding": "UTF-8" }, "parallelly": { @@ -5155,7 +4898,7 @@ "BugReports": "https://github.com/psolymos/pbapply/issues", "NeedsCompilation": "no", "Author": "Peter Solymos [aut, cre] (ORCID: ), Zygmunt Zawadzki [aut], Henrik Bengtsson [ctb], R Core Team [cph, ctb]", - "Repository": "CRAN", + "Repository": "RSPM", "Encoding": "UTF-8" }, "pbivnorm": { @@ -5170,49 +4913,12 @@ "License": "GPL (>= 2)", "URL": "https://github.com/brentonk/pbivnorm", "NeedsCompilation": "yes", - "Repository": "CRAN", + "Repository": "RSPM", "Encoding": "UTF-8" }, - "pbkrtest": { - "Package": "pbkrtest", - "Version": "0.5.5", - "Source": "Repository", - "Title": "Parametric Bootstrap, Kenward-Roger and Satterthwaite Based Methods for Test in Mixed Models", - "Authors@R": "c( person(given = \"Ulrich\", family = \"Halekoh\", email = \"uhalekoh@health.sdu.dk\", role = c(\"aut\", \"cph\")), person(given = \"Søren\", family = \"Højsgaard\", email = \"sorenh@math.aau.dk\", role = c(\"aut\", \"cre\", \"cph\")) )", - "Maintainer": "Søren Højsgaard ", - "Description": "Computes p-values based on (a) Satterthwaite or Kenward-Rogers degree of freedom methods and (b) parametric bootstrap for mixed effects models as implemented in the 'lme4' package. 
Implements parametric bootstrap test for generalized linear mixed models as implemented in 'lme4' and generalized linear models. The package is documented in the paper by Halekoh and Højsgaard, (2012, ). Please see 'citation(\"pbkrtest\")' for citation details.", - "URL": "https://people.math.aau.dk/~sorenh/software/pbkrtest/", - "Depends": [ - "R (>= 4.2.0)", - "lme4 (>= 1.1.31)" - ], - "Imports": [ - "broom", - "dplyr", - "MASS", - "methods", - "numDeriv", - "Matrix (>= 1.2.3)", - "doBy (>= 4.6.22)" - ], - "Suggests": [ - "nlme", - "markdown", - "knitr" - ], - "Encoding": "UTF-8", - "VignetteBuilder": "knitr", - "License": "GPL (>= 2)", - "ByteCompile": "Yes", - "RoxygenNote": "7.3.2", - "LazyData": "true", - "NeedsCompilation": "no", - "Author": "Ulrich Halekoh [aut, cph], Søren Højsgaard [aut, cre, cph]", - "Repository": "CRAN" - }, "pillar": { "Package": "pillar", - "Version": "1.11.0", + "Version": "1.11.1", "Source": "Repository", "Title": "Coloured Formatting for Columns", "Authors@R": "c(person(given = \"Kirill\", family = \"M\\u00fcller\", role = c(\"aut\", \"cre\"), email = \"kirill@cynkra.com\", comment = c(ORCID = \"0000-0002-1416-3412\")), person(given = \"Hadley\", family = \"Wickham\", role = \"aut\"), person(given = \"RStudio\", role = \"cph\"))", @@ -5254,7 +4960,7 @@ ], "VignetteBuilder": "knitr", "Encoding": "UTF-8", - "RoxygenNote": "7.3.2.9000", + "RoxygenNote": "7.3.3.9000", "Config/testthat/edition": "3", "Config/testthat/parallel": "true", "Config/testthat/start-first": "format_multi_fuzz, format_multi_fuzz_2, format_multi, ctl_colonnade, ctl_colonnade_1, ctl_colonnade_2", @@ -5269,10 +4975,10 @@ }, "pkgbuild": { "Package": "pkgbuild", - "Version": "1.4.4", + "Version": "1.4.8", "Source": "Repository", "Title": "Find Tools Needed to Build R Packages", - "Authors@R": "c( person(\"Hadley\", \"Wickham\", role = \"aut\"), person(\"Jim\", \"Hester\", role = \"aut\"), person(\"Gábor\", \"Csárdi\", , \"csardi.gabor@gmail.com\", role = c(\"aut\", 
\"cre\")), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\")) )", + "Authors@R": "c( person(\"Hadley\", \"Wickham\", role = \"aut\"), person(\"Jim\", \"Hester\", role = \"aut\"), person(\"Gábor\", \"Csárdi\", , \"csardi.gabor@gmail.com\", role = c(\"aut\", \"cre\")), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\"), comment = c(ROR = \"03wc8by49\")) )", "Description": "Provides functions used to build R packages. Locates compilers needed to build R packages on various platforms and ensures the PATH is configured appropriately so R can use them.", "License": "MIT + file LICENSE", "URL": "https://github.com/r-lib/pkgbuild, https://pkgbuild.r-lib.org", @@ -5291,20 +4997,20 @@ "covr", "cpp11", "knitr", - "mockery", "Rcpp", "rmarkdown", - "testthat (>= 3.0.0)", + "testthat (>= 3.2.0)", "withr (>= 2.3.0)" ], "Config/Needs/website": "tidyverse/tidytemplate", "Config/testthat/edition": "3", + "Config/usethis/last-upkeep": "2025-04-30", "Encoding": "UTF-8", - "RoxygenNote": "7.2.3", + "RoxygenNote": "7.3.2", "NeedsCompilation": "no", - "Author": "Hadley Wickham [aut], Jim Hester [aut], Gábor Csárdi [aut, cre], Posit Software, PBC [cph, fnd]", + "Author": "Hadley Wickham [aut], Jim Hester [aut], Gábor Csárdi [aut, cre], Posit Software, PBC [cph, fnd] (ROR: )", "Maintainer": "Gábor Csárdi ", - "Repository": "RSPM" + "Repository": "CRAN" }, "pkgconfig": { "Package": "pkgconfig", @@ -5332,12 +5038,12 @@ }, "pkgload": { "Package": "pkgload", - "Version": "1.3.4", + "Version": "1.4.1", "Source": "Repository", "Title": "Simulate Package Installation and Attach", "Authors@R": "c( person(\"Hadley\", \"Wickham\", role = \"aut\"), person(\"Winston\", \"Chang\", role = \"aut\"), person(\"Jim\", \"Hester\", role = \"aut\"), person(\"Lionel\", \"Henry\", , \"lionel@posit.co\", role = c(\"aut\", \"cre\")), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\")), person(\"R Core team\", role = \"ctb\", comment = \"Some namespace and vignette code extracted from base 
R\") )", "Description": "Simulates the process of installing a package and then attaching it. This is a key part of the 'devtools' package as it allows you to rapidly iterate while developing a package.", - "License": "GPL-3", + "License": "MIT + file LICENSE", "URL": "https://github.com/r-lib/pkgload, https://pkgload.r-lib.org", "BugReports": "https://github.com/r-lib/pkgload/issues", "Depends": [ @@ -5345,38 +5051,39 @@ ], "Imports": [ "cli (>= 3.3.0)", - "crayon", "desc", "fs", "glue", + "lifecycle", "methods", "pkgbuild", + "processx", "rlang (>= 1.1.1)", "rprojroot", - "utils", - "withr (>= 2.4.3)" + "utils" ], "Suggests": [ "bitops", - "covr", + "jsonlite", "mathjaxr", - "mockr", "pak", "Rcpp", "remotes", "rstudioapi", - "testthat (>= 3.1.0)" + "testthat (>= 3.2.1.1)", + "usethis", + "withr" ], "Config/Needs/website": "tidyverse/tidytemplate, ggplot2", "Config/testthat/edition": "3", "Config/testthat/parallel": "TRUE", "Config/testthat/start-first": "dll", "Encoding": "UTF-8", - "RoxygenNote": "7.2.3", + "RoxygenNote": "7.3.2", "NeedsCompilation": "no", "Author": "Hadley Wickham [aut], Winston Chang [aut], Jim Hester [aut], Lionel Henry [aut, cre], Posit Software, PBC [cph, fnd], R Core team [ctb] (Some namespace and vignette code extracted from base R)", "Maintainer": "Lionel Henry ", - "Repository": "RSPM" + "Repository": "CRAN" }, "plyr": { "Package": "plyr", @@ -5432,7 +5139,7 @@ ], "Collate": "'adjective.R' 'adverb.R' 'exclamation.R' 'verb.R' 'rpackage.R' 'package.R'", "NeedsCompilation": "no", - "Repository": "RSPM", + "Repository": "CRAN", "Encoding": "UTF-8" }, "prettyunits": { @@ -5635,12 +5342,12 @@ "NeedsCompilation": "no", "Author": "William Revelle [aut, cre] (ORCID: )", "Maintainer": "William Revelle ", - "Repository": "CRAN", + "Repository": "RSPM", "Encoding": "UTF-8" }, "purrr": { "Package": "purrr", - "Version": "1.1.0", + "Version": "1.2.0", "Source": "Repository", "Title": "Functional Programming Tools", "Authors@R": "c( person(\"Hadley\", 
\"Wickham\", , \"hadley@posit.co\", role = c(\"aut\", \"cre\"), comment = c(ORCID = \"0000-0003-4757-117X\")), person(\"Lionel\", \"Henry\", , \"lionel@posit.co\", role = \"aut\"), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\"), comment = c(ROR = \"https://ror.org/03wc8by49\")) )", @@ -5659,13 +5366,13 @@ "vctrs (>= 0.6.3)" ], "Suggests": [ - "carrier (>= 0.2.0)", + "carrier (>= 0.3.0)", "covr", "dplyr (>= 0.7.8)", "httr", "knitr", "lubridate", - "mirai (>= 2.4.0)", + "mirai (>= 2.5.1)", "rmarkdown", "testthat (>= 3.0.0)", "tibble", @@ -5681,7 +5388,7 @@ "Config/testthat/edition": "3", "Config/testthat/parallel": "TRUE", "Encoding": "UTF-8", - "RoxygenNote": "7.3.2", + "RoxygenNote": "7.3.3", "NeedsCompilation": "yes", "Author": "Hadley Wickham [aut, cre] (ORCID: ), Lionel Henry [aut], Posit Software, PBC [cph, fnd] (ROR: )", "Maintainer": "Hadley Wickham ", @@ -5689,10 +5396,10 @@ }, "purrrlyr": { "Package": "purrrlyr", - "Version": "0.0.8", + "Version": "0.0.10", "Source": "Repository", "Title": "Tools at the Intersection of 'purrr' and 'dplyr'", - "Authors@R": "c(person(given = \"Lionel\", family = \"Henry\", role = c(\"aut\", \"cre\"), email = \"lionel@rstudio.com\"), person(given = \"Hadley\", family = \"Wickham\", role = \"ctb\", email = \"hadley@rstudio.com\"), person(given = \"RStudio\", role = \"cph\"))", + "Authors@R": "c(person(given = \"Lionel\", family = \"Henry\", role = c(\"aut\", \"cre\"), email = \"lionel@posit.co\"), person(given = \"Hadley\", family = \"Wickham\", role = \"ctb\", email = \"hadley@posit.co\"), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\"), comment = c(ROR = \"03wc8by49\")))", "Description": "Some functions at the intersection of 'dplyr' and 'purrr' that formerly lived in 'purrr'.", "License": "GPL-3 | file LICENSE", "URL": "https://github.com/hadley/purrrlyr", @@ -5710,13 +5417,13 @@ "LinkingTo": [ "Rcpp" ], - "SystemRequirements": "C++11", "Encoding": "UTF-8", - "RoxygenNote": "7.1.1", + "RoxygenNote": 
"7.3.3", "Config/testthat/edition": "3", + "Config/build/compilation-database": "true", "NeedsCompilation": "yes", - "Author": "Lionel Henry [aut, cre], Hadley Wickham [ctb], RStudio [cph]", - "Maintainer": "Lionel Henry ", + "Author": "Lionel Henry [aut, cre], Hadley Wickham [ctb], Posit Software, PBC [cph, fnd] (ROR: )", + "Maintainer": "Lionel Henry ", "Repository": "RSPM" }, "quadprog": { @@ -5745,7 +5452,7 @@ "Description": "Estimation and inference methods for models for conditional quantile functions: Linear and nonlinear parametric and non-parametric (total variation penalized) models for conditional quantiles of a univariate response and several methods for handling censored survival data. Portfolio selection methods based on expected shortfall risk are also now included. See Koenker, R. (2005) Quantile Regression, Cambridge U. Press, and Koenker, R. et al. (2017) Handbook of Quantile Regression, CRC Press, .", "Authors@R": "c( person(\"Roger\", \"Koenker\", role = c(\"cre\",\"aut\"), email = \"rkoenker@illinois.edu\"), person(\"Stephen\", \"Portnoy\", role = c(\"ctb\"), comment = \"Contributions to Censored QR code\", email = \"sportnoy@illinois.edu\"), person(c(\"Pin\", \"Tian\"), \"Ng\", role = c(\"ctb\"), comment = \"Contributions to Sparse QR code\", email = \"pin.ng@nau.edu\"), person(\"Blaise\", \"Melly\", role = c(\"ctb\"), comment = \"Contributions to preprocessing code\", email = \"mellyblaise@gmail.com\"), person(\"Achim\", \"Zeileis\", role = c(\"ctb\"), comment = \"Contributions to dynrq code essentially identical to his dynlm code\", email = \"Achim.Zeileis@uibk.ac.at\"), person(\"Philip\", \"Grosjean\", role = c(\"ctb\"), comment = \"Contributions to nlrq code\", email = \"phgrosjean@sciviews.org\"), person(\"Cleve\", \"Moler\", role = c(\"ctb\"), comment = \"author of several linpack routines\"), person(\"Yousef\", \"Saad\", role = c(\"ctb\"), comment = \"author of sparskit2\"), person(\"Victor\", \"Chernozhukov\", role = c(\"ctb\"), 
comment = \"contributions to extreme value inference code\"), person(\"Ivan\", \"Fernandez-Val\", role = c(\"ctb\"), comment = \"contributions to extreme value inference code\"), person(\"Martin\", \"Maechler\", role = \"ctb\", comment = c(\"tweaks (src/chlfct.f, 'tiny','Large')\", ORCID = \"0000-0002-8685-9910\")), person(c(\"Brian\", \"D\"), \"Ripley\", role = c(\"trl\",\"ctb\"), comment = \"Initial (2001) R port from S (to my everlasting shame -- how could I have been so slow to adopt R!) and for numerous other suggestions and useful advice\", email = \"ripley@stats.ox.ac.uk\"))", "Maintainer": "Roger Koenker ", - "Repository": "CRAN", + "Repository": "RSPM", "Depends": [ "R (>= 3.5)", "stats", @@ -5778,11 +5485,11 @@ }, "ragg": { "Package": "ragg", - "Version": "1.4.0", + "Version": "1.5.0", "Source": "Repository", "Type": "Package", "Title": "Graphic Devices Based on AGG", - "Authors@R": "c( person(\"Thomas Lin\", \"Pedersen\", , \"thomas.pedersen@posit.co\", role = c(\"cre\", \"aut\"), comment = c(ORCID = \"0000-0002-5147-4711\")), person(\"Maxim\", \"Shemanarev\", role = c(\"aut\", \"cph\"), comment = \"Author of AGG\"), person(\"Tony\", \"Juricic\", , \"tonygeek@yahoo.com\", role = c(\"ctb\", \"cph\"), comment = \"Contributor to AGG\"), person(\"Milan\", \"Marusinec\", , \"milan@marusinec.sk\", role = c(\"ctb\", \"cph\"), comment = \"Contributor to AGG\"), person(\"Spencer\", \"Garrett\", role = \"ctb\", comment = \"Contributor to AGG\"), person(\"Posit, PBC\", role = c(\"cph\", \"fnd\")) )", + "Authors@R": "c( person(\"Thomas Lin\", \"Pedersen\", , \"thomas.pedersen@posit.co\", role = c(\"cre\", \"aut\"), comment = c(ORCID = \"0000-0002-5147-4711\")), person(\"Maxim\", \"Shemanarev\", role = c(\"aut\", \"cph\"), comment = \"Author of AGG\"), person(\"Tony\", \"Juricic\", , \"tonygeek@yahoo.com\", role = c(\"ctb\", \"cph\"), comment = \"Contributor to AGG\"), person(\"Milan\", \"Marusinec\", , \"milan@marusinec.sk\", role = c(\"ctb\", \"cph\"), comment = 
\"Contributor to AGG\"), person(\"Spencer\", \"Garrett\", role = \"ctb\", comment = \"Contributor to AGG\"), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\"), comment = c(ROR = \"03wc8by49\")) )", "Maintainer": "Thomas Lin Pedersen ", "Description": "Anti-Grain Geometry (AGG) is a high-quality and high-performance 2D drawing library. The 'ragg' package provides a set of graphic devices based on AGG to use as alternative to the raster devices provided through the 'grDevices' package.", "License": "MIT + file LICENSE", @@ -5802,14 +5509,15 @@ "systemfonts", "textshaping" ], + "Config/build/compilation-database": "true", "Config/Needs/website": "ggplot2, devoid, magick, bench, tidyr, ggridges, hexbin, sessioninfo, pkgdown, tidyverse/tidytemplate", + "Config/testthat/edition": "3", + "Config/usethis/last-upkeep": "2025-04-25", "Encoding": "UTF-8", "RoxygenNote": "7.3.2", - "SystemRequirements": "freetype2, libpng, libtiff, libjpeg", - "Config/testthat/edition": "3", - "Config/build/compilation-database": "true", + "SystemRequirements": "freetype2, libpng, libtiff, libjpeg, libwebp, libwebpmux", "NeedsCompilation": "yes", - "Author": "Thomas Lin Pedersen [cre, aut] (), Maxim Shemanarev [aut, cph] (Author of AGG), Tony Juricic [ctb, cph] (Contributor to AGG), Milan Marusinec [ctb, cph] (Contributor to AGG), Spencer Garrett [ctb] (Contributor to AGG), Posit, PBC [cph, fnd]", + "Author": "Thomas Lin Pedersen [cre, aut] (ORCID: ), Maxim Shemanarev [aut, cph] (Author of AGG), Tony Juricic [ctb, cph] (Contributor to AGG), Milan Marusinec [ctb, cph] (Contributor to AGG), Spencer Garrett [ctb] (Contributor to AGG), Posit Software, PBC [cph, fnd] (ROR: )", "Repository": "CRAN" }, "rappdirs": { @@ -5967,7 +5675,7 @@ }, "reformulas": { "Package": "reformulas", - "Version": "0.4.1", + "Version": "0.4.2", "Source": "Repository", "Title": "Machinery for Processing Random Effect Formulas", "Authors@R": "person(given = \"Ben\", family = \"Bolker\", role = c(\"aut\", \"cre\"), 
email = \"bolker@mcmaster.ca\", comment=c(ORCID=\"0000-0002-2127-0443\"))", @@ -6222,7 +5930,7 @@ "LazyData": "true", "RoxygenNote": "7.1.0", "NeedsCompilation": "yes", - "Repository": "CRAN" + "Repository": "RSPM" }, "rlang": { "Package": "rlang", @@ -6277,7 +5985,7 @@ }, "rmarkdown": { "Package": "rmarkdown", - "Version": "2.29", + "Version": "2.30", "Source": "Repository", "Type": "Package", "Title": "Dynamic Documents for R", @@ -6327,9 +6035,9 @@ "RoxygenNote": "7.3.2", "SystemRequirements": "pandoc (>= 1.14) - http://pandoc.org", "NeedsCompilation": "no", - "Author": "JJ Allaire [aut], Yihui Xie [aut, cre] (), Christophe Dervieux [aut] (), Jonathan McPherson [aut], Javier Luraschi [aut], Kevin Ushey [aut], Aron Atkins [aut], Hadley Wickham [aut], Joe Cheng [aut], Winston Chang [aut], Richard Iannone [aut] (), Andrew Dunning [ctb] (), Atsushi Yasumoto [ctb, cph] (, Number sections Lua filter), Barret Schloerke [ctb], Carson Sievert [ctb] (), Devon Ryan [ctb] (), Frederik Aust [ctb] (), Jeff Allen [ctb], JooYoung Seo [ctb] (), Malcolm Barrett [ctb], Rob Hyndman [ctb], Romain Lesur [ctb], Roy Storey [ctb], Ruben Arslan [ctb], Sergio Oller [ctb], Posit Software, PBC [cph, fnd], jQuery UI contributors [ctb, cph] (jQuery UI library; authors listed in inst/rmd/h/jqueryui/AUTHORS.txt), Mark Otto [ctb] (Bootstrap library), Jacob Thornton [ctb] (Bootstrap library), Bootstrap contributors [ctb] (Bootstrap library), Twitter, Inc [cph] (Bootstrap library), Alexander Farkas [ctb, cph] (html5shiv library), Scott Jehl [ctb, cph] (Respond.js library), Ivan Sagalaev [ctb, cph] (highlight.js library), Greg Franko [ctb, cph] (tocify library), John MacFarlane [ctb, cph] (Pandoc templates), Google, Inc. 
[ctb, cph] (ioslides library), Dave Raggett [ctb] (slidy library), W3C [cph] (slidy library), Dave Gandy [ctb, cph] (Font-Awesome), Ben Sperry [ctb] (Ionicons), Drifty [cph] (Ionicons), Aidan Lister [ctb, cph] (jQuery StickyTabs), Benct Philip Jonsson [ctb, cph] (pagebreak Lua filter), Albert Krewinkel [ctb, cph] (pagebreak Lua filter)", + "Author": "JJ Allaire [aut], Yihui Xie [aut, cre] (ORCID: ), Christophe Dervieux [aut] (ORCID: ), Jonathan McPherson [aut], Javier Luraschi [aut], Kevin Ushey [aut], Aron Atkins [aut], Hadley Wickham [aut], Joe Cheng [aut], Winston Chang [aut], Richard Iannone [aut] (ORCID: ), Andrew Dunning [ctb] (ORCID: ), Atsushi Yasumoto [ctb, cph] (ORCID: , cph: Number sections Lua filter), Barret Schloerke [ctb], Carson Sievert [ctb] (ORCID: ), Devon Ryan [ctb] (ORCID: ), Frederik Aust [ctb] (ORCID: ), Jeff Allen [ctb], JooYoung Seo [ctb] (ORCID: ), Malcolm Barrett [ctb], Rob Hyndman [ctb], Romain Lesur [ctb], Roy Storey [ctb], Ruben Arslan [ctb], Sergio Oller [ctb], Posit Software, PBC [cph, fnd], jQuery UI contributors [ctb, cph] (jQuery UI library; authors listed in inst/rmd/h/jqueryui/AUTHORS.txt), Mark Otto [ctb] (Bootstrap library), Jacob Thornton [ctb] (Bootstrap library), Bootstrap contributors [ctb] (Bootstrap library), Twitter, Inc [cph] (Bootstrap library), Alexander Farkas [ctb, cph] (html5shiv library), Scott Jehl [ctb, cph] (Respond.js library), Ivan Sagalaev [ctb, cph] (highlight.js library), Greg Franko [ctb, cph] (tocify library), John MacFarlane [ctb, cph] (Pandoc templates), Google, Inc. 
[ctb, cph] (ioslides library), Dave Raggett [ctb] (slidy library), W3C [cph] (slidy library), Dave Gandy [ctb, cph] (Font-Awesome), Ben Sperry [ctb] (Ionicons), Drifty [cph] (Ionicons), Aidan Lister [ctb, cph] (jQuery StickyTabs), Benct Philip Jonsson [ctb, cph] (pagebreak Lua filter), Albert Krewinkel [ctb, cph] (pagebreak Lua filter)", "Maintainer": "Yihui Xie ", - "Repository": "RSPM" + "Repository": "CRAN" }, "rpart": { "Package": "rpart", @@ -6379,12 +6087,12 @@ "LazyData": "yes", "URL": "http://www.milbo.org/rpart-plot/index.html", "NeedsCompilation": "no", - "Repository": "CRAN", + "Repository": "RSPM", "Encoding": "UTF-8" }, "rprojroot": { "Package": "rprojroot", - "Version": "2.1.0", + "Version": "2.1.1", "Source": "Repository", "Title": "Finding Files in Project Subdirectories", "Authors@R": "person(given = \"Kirill\", family = \"M\\u00fcller\", role = c(\"aut\", \"cre\"), email = \"kirill@cynkra.com\", comment = c(ORCID = \"0000-0002-1416-3412\"))", @@ -6443,16 +6151,16 @@ }, "rvest": { "Package": "rvest", - "Version": "1.0.4", + "Version": "1.0.5", "Source": "Repository", "Title": "Easily Harvest (Scrape) Web Pages", - "Authors@R": "c( person(\"Hadley\", \"Wickham\", , \"hadley@posit.co\", role = c(\"aut\", \"cre\")), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\")) )", + "Authors@R": "c( person(\"Hadley\", \"Wickham\", , \"hadley@posit.co\", role = c(\"aut\", \"cre\")), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\"), comment = c(ROR = \"03wc8by49\")) )", "Description": "Wrappers around the 'xml2' and 'httr' packages to make it easy to download, then manipulate, HTML and XML.", "License": "MIT + file LICENSE", "URL": "https://rvest.tidyverse.org/, https://github.com/tidyverse/rvest", "BugReports": "https://github.com/tidyverse/rvest/issues", "Depends": [ - "R (>= 3.6)" + "R (>= 4.1)" ], "Imports": [ "cli", @@ -6463,12 +6171,13 @@ "rlang (>= 1.1.0)", "selectr", "tibble", - "xml2 (>= 1.3)" + "xml2 (>= 1.4.0)" ], "Suggests": [ 
"chromote", "covr", "knitr", + "purrr", "R6", "readr", "repurrrsive", @@ -6476,6 +6185,7 @@ "spelling", "stringi (>= 0.3.1)", "testthat (>= 3.0.2)", + "tidyr", "webfakes" ], "VignetteBuilder": "knitr", @@ -6484,19 +6194,19 @@ "Config/testthat/parallel": "true", "Encoding": "UTF-8", "Language": "en-US", - "RoxygenNote": "7.3.1", + "RoxygenNote": "7.3.2", "NeedsCompilation": "no", - "Author": "Hadley Wickham [aut, cre], Posit Software, PBC [cph, fnd]", + "Author": "Hadley Wickham [aut, cre], Posit Software, PBC [cph, fnd] (ROR: )", "Maintainer": "Hadley Wickham ", "Repository": "CRAN" }, "saemix": { "Package": "saemix", - "Version": "3.3", + "Version": "3.4", "Source": "Repository", "Title": "Stochastic Approximation Expectation Maximization (SAEM) Algorithm", - "Authors@R": "c( person(\"Emmanuelle\", \"Comets\", role = c(\"aut\", \"cre\"), email = \"emmanuelle.comets@inserm.fr\"), person(\"Audrey\", \"Lavenu\", role = \"aut\"), person(\"Marc\", \"Lavielle\", role = \"aut\"), person(\"Belhal\", \"Karimi\", role = \"aut\"), person(\"Maud\", \"Delattre\", role = \"ctb\"), person(\"Marilou\", \"Chanel\", role = \"ctb\"), person(\"Johannes\", \"Ranke\", role = \"ctb\", comment = c(ORCID = \"0000-0003-4371-6538\")), person(\"Sofia\", \"Kaisaridi\", role = \"ctb\"), person(\"Lucie\", \"Fayette\", role = \"ctb\"))", - "Description": "The 'saemix' package implements the Stochastic Approximation EM algorithm for parameter estimation in (non)linear mixed effects models. The SAEM algorithm (i) computes the maximum likelihood estimator of the population parameters, without any approximation of the model (linearisation, quadrature approximation,...), using the Stochastic Approximation Expectation Maximization (SAEM) algorithm, (ii) provides standard errors for the maximum likelihood estimator (iii) estimates the conditional modes, the conditional means and the conditional standard deviations of the individual parameters, using the Hastings-Metropolis algorithm (see Comets et al. 
(2017) ). Many applications of SAEM in agronomy, animal breeding and PKPD analysis have been published by members of the Monolix group. The full PDF documentation for the package including references about the algorithm and examples can be downloaded on the github of the IAME research institute for 'saemix': .", + "Authors@R": "c( person(\"Emmanuelle\", \"Comets\", role = c(\"aut\", \"cre\"), email = \"emmanuelle.comets@inserm.fr\"), person(\"Audrey\", \"Lavenu\", role = \"aut\"), person(\"Marc\", \"Lavielle\", role = \"aut\"), person(\"Belhal\", \"Karimi\", role = \"aut\"), person(\"Maud\", \"Delattre\", role = \"ctb\"), person(\"Alexandra\", \"Lavalley-Morelle\", role = \"ctb\"), person(\"Marilou\", \"Chanel\", role = \"ctb\"), person(\"Johannes\", \"Ranke\", role = \"ctb\", comment = c(ORCID = \"0000-0003-4371-6538\")), person(\"Sofia\", \"Kaisaridi\", role = \"ctb\"), person(\"Lucie\", \"Fayette\", role = \"ctb\"))", + "Description": "The 'saemix' package implements the Stochastic Approximation EM algorithm for parameter estimation in (non)linear mixed effects models. It (i) computes the maximum likelihood estimator of the population parameters, without any approximation of the model (linearisation, quadrature approximation,...), using the Stochastic Approximation Expectation Maximization (SAEM) algorithm, (ii) provides standard errors for the maximum likelihood estimator (iii) estimates the conditional modes, the conditional means and the conditional standard deviations of the individual parameters, using the Hastings-Metropolis algorithm (see Comets et al. (2017) ). Many applications of SAEM in agronomy, animal breeding and PKPD analysis have been published by members of the Monolix group. 
The full PDF documentation for the package including references about the algorithm and examples can be downloaded on the github of the IAME research institute for 'saemix': .", "License": "GPL (>= 2)", "LazyLoad": "yes", "LazyData": "yes", @@ -6520,12 +6230,12 @@ "npde (>= 3.2)" ], "Encoding": "UTF-8", - "RoxygenNote": "7.3.1", + "RoxygenNote": "7.3.2", "NeedsCompilation": "no", - "Collate": "'aaa_generics.R' 'SaemixData.R' 'SaemixModel.R' 'SaemixRes.R' 'SaemixObject.R' 'backward.R' 'compute_LL.R' 'forward.R' 'func_FIM.R' 'func_aux.R' 'func_bootstrap.R' 'func_compare.R' 'func_discreteVPC.R' 'func_distcond.R' 'func_estimParam.R' 'func_exploreData.R' 'func_npde.R' 'func_plots.R' 'func_simulations.R' 'func_stepwise.R' 'main.R' 'main_estep.R' 'main_initialiseMainAlgo.R' 'main_mstep.R' 'stepwise.R' 'zzz.R'", - "Author": "Emmanuelle Comets [aut, cre], Audrey Lavenu [aut], Marc Lavielle [aut], Belhal Karimi [aut], Maud Delattre [ctb], Marilou Chanel [ctb], Johannes Ranke [ctb] (), Sofia Kaisaridi [ctb], Lucie Fayette [ctb]", + "Collate": "'aaa_generics.R' 'SaemixData.R' 'SaemixData-methods.R' 'SaemixData-methods_covariates.R' 'SaemixModel.R' 'SaemixRes.R' 'SaemixObject.R' 'backward.R' 'compute_LL.R' 'forward.R' 'func_FIM.R' 'func_aux.R' 'func_bootstrap.R' 'func_compare.R' 'func_discreteVPC.R' 'func_distcond.R' 'func_estimParam.R' 'func_exploreData.R' 'func_npde.R' 'func_plots.R' 'func_simulations.R' 'func_stepwise.R' 'main.R' 'main_estep.R' 'main_initialiseMainAlgo.R' 'main_mstep.R' 'stepwise.R' 'zzz.R'", + "Author": "Emmanuelle Comets [aut, cre], Audrey Lavenu [aut], Marc Lavielle [aut], Belhal Karimi [aut], Maud Delattre [ctb], Alexandra Lavalley-Morelle [ctb], Marilou Chanel [ctb], Johannes Ranke [ctb] (ORCID: ), Sofia Kaisaridi [ctb], Lucie Fayette [ctb]", "Maintainer": "Emmanuelle Comets ", - "Repository": "CRAN" + "Repository": "RSPM" }, "sandwich": { "Package": "sandwich", @@ -6567,7 +6277,7 @@ "NeedsCompilation": "no", "Author": "Achim Zeileis [aut, cre] (), 
Thomas Lumley [aut] (), Nathaniel Graham [ctb] (), Susanne Koell [ctb]", "Maintainer": "Achim Zeileis ", - "Repository": "CRAN", + "Repository": "RSPM", "Encoding": "UTF-8" }, "sass": { @@ -6713,7 +6423,7 @@ "URL": "https://meghapsimatrix.github.io/simhelpers/", "BugReports": "https://github.com/meghapsimatrix/simhelpers/issues", "Depends": [ - "R (>= 2.10)" + "R (>= 4.1.0)" ], "License": "GPL-3", "Encoding": "UTF-8", @@ -6744,14 +6454,15 @@ ], "RdMacros": "Rdpack", "VignetteBuilder": "knitr", - "Author": "Megha Joshi [aut, cre] (ORCID: ), James Pustejovsky [aut] (ORCID: )", - "Maintainer": "Megha Joshi ", "RemoteType": "github", - "RemoteUsername": "meghapsimatrix", + "RemoteHost": "api.github.com", "RemoteRepo": "simhelpers", - "RemoteRef": "master", - "RemoteSha": "3b1e25cc595de3432e56ee4a777baedf18bc1b78", - "RemoteHost": "api.github.com" + "RemoteUsername": "meghapsimatrix", + "RemoteRef": "HEAD", + "RemoteSha": "a512aa6844ed95aba4cd39102e08222624b51d56", + "NeedsCompilation": "no", + "Author": "Megha Joshi [aut, cre] (ORCID: ), James Pustejovsky [aut] (ORCID: )", + "Maintainer": "Megha Joshi " }, "sn": { "Package": "sn", @@ -6782,7 +6493,7 @@ "Encoding": "UTF-8", "NeedsCompilation": "no", "Author": "Adelchi Azzalini [aut, cre] ()", - "Repository": "CRAN" + "Repository": "RSPM" }, "snakecase": { "Package": "snakecase", @@ -6817,70 +6528,15 @@ "VignetteBuilder": "knitr", "NeedsCompilation": "no", "Author": "Malte Grosser [aut, cre]", - "Repository": "CRAN" - }, - "stargazer": { - "Package": "stargazer", - "Version": "5.2.3", - "Source": "Repository", - "Type": "Package", - "Title": "Well-Formatted Regression and Summary Statistics Tables", - "Date": "2022-03-03", - "Author": "Marek Hlavac ", - "Maintainer": "Marek Hlavac ", - "Description": "Produces LaTeX code, HTML/CSS code and ASCII text for well-formatted tables that hold regression analysis results from several models side-by-side, as well as summary statistics.", - "License": "GPL (>= 2)", - "Imports": [ 
- "stats", - "utils" - ], - "Enhances": [ - "AER", - "betareg", - "brglm", - "censReg", - "dynlm", - "eha", - "erer", - "ergm", - "fGarch", - "gee", - "glmx", - "gmm", - "lfe", - "lme4", - "lmtest", - "MASS", - "mclogit", - "mgcv", - "mlogit", - "nlme", - "nnet", - "ordinal", - "plm", - "pscl", - "quantreg", - "rms", - "relevent", - "robustbase", - "sampleSelection", - "spdep", - "survey", - "survival" - ], - "LazyLoad": "yes", - "Collate": "'stargazer-internal.R' 'stargazer.R'", - "NeedsCompilation": "no", - "Repository": "RSPM", - "Encoding": "UTF-8" + "Repository": "RSPM" }, "statmod": { "Package": "statmod", - "Version": "1.5.0", + "Version": "1.5.1", "Source": "Repository", - "Date": "2022-12-28", + "Date": "2025-10-08", "Title": "Statistical Modeling", - "Author": "Gordon Smyth [cre, aut], Lizhong Chen [aut], Yifang Hu [ctb], Peter Dunn [ctb], Belinda Phipson [ctb], Yunshun Chen [ctb]", + "Authors@R": "c(person(given = \"Gordon\", family = \"Smyth\", role = c(\"cre\", \"aut\"), email = \"smyth@wehi.edu.au\"), person(given = \"Lizhong\", family = \"Chen\", role = \"aut\"), person(given = \"Yifang\", family = \"Hu\", role = \"ctb\"), person(given = \"Peter\", family = \"Dunn\", role = \"ctb\"), person(given = \"Belinda\", family = \"Phipson\", role = \"ctb\"), person(given = \"Yunshun\", family = \"Chen\", role = \"ctb\"))", "Maintainer": "Gordon Smyth ", "Depends": [ "R (>= 3.0.0)" @@ -6896,7 +6552,8 @@ "Description": "A collection of algorithms and functions to aid statistical modeling. Includes limiting dilution analysis (aka ELDA), growth curve comparisons, mixed linear models, heteroscedastic regression, inverse-Gaussian probability calculations, Gauss quadrature and a secure convergence algorithm for nonlinear models. 
Also includes advanced generalized linear model functions including Tweedie and Digamma distributional families, secure convergence and exact distributional calculations for unit deviances.", "License": "GPL-2 | GPL-3", "NeedsCompilation": "yes", - "Repository": "CRAN", + "Author": "Gordon Smyth [cre, aut], Lizhong Chen [aut], Yifang Hu [ctb], Peter Dunn [ctb], Belinda Phipson [ctb], Yunshun Chen [ctb]", + "Repository": "RSPM", "Encoding": "UTF-8" }, "stringi": { @@ -6931,7 +6588,7 @@ }, "stringr": { "Package": "stringr", - "Version": "1.5.1", + "Version": "1.6.0", "Source": "Repository", "Title": "Simple, Consistent Wrappers for Common String Operations", "Authors@R": "c( person(\"Hadley\", \"Wickham\", , \"hadley@posit.co\", role = c(\"aut\", \"cre\", \"cph\")), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\")) )", @@ -6940,7 +6597,7 @@ "URL": "https://stringr.tidyverse.org, https://github.com/tidyverse/stringr", "BugReports": "https://github.com/tidyverse/stringr/issues", "Depends": [ - "R (>= 3.6)" + "R (>= 4.1.0)" ], "Imports": [ "cli", @@ -6964,10 +6621,11 @@ ], "VignetteBuilder": "knitr", "Config/Needs/website": "tidyverse/tidytemplate", + "Config/potools/style": "explicit", "Config/testthat/edition": "3", "Encoding": "UTF-8", "LazyData": "true", - "RoxygenNote": "7.2.3", + "RoxygenNote": "7.3.3", "NeedsCompilation": "no", "Author": "Hadley Wickham [aut, cre, cph], Posit Software, PBC [cph, fnd]", "Maintainer": "Hadley Wickham ", @@ -6975,11 +6633,11 @@ }, "survey": { "Package": "survey", - "Version": "4.4-2", + "Version": "4.4-8", "Source": "Repository", "Title": "Analysis of Complex Survey Samples", "Description": "Summary statistics, two-sample tests, rank tests, generalised linear models, cumulative link models, Cox models, loglinear models, and general maximum pseudolikelihood estimation for multistage stratified, cluster-sampled, unequally weighted survey samples. Variances by Taylor series linearisation or replicate weights. 
Post-stratification, calibration, and raking. Two-phase subsampling designs. Graphics. PPS sampling without replacement. Small-area estimation.", - "Author": "Thomas Lumley, Peter Gao, Ben Schneider", + "Authors@R": "c(person(given = \"Thomas\", family = \"Lumley\", role = \"aut\"), person(given = \"Peter\", family = \"Gao\", role = \"aut\"), person(given = \"Ben\", family = \"Schneider\", role = \"aut\"), person(given = \"\\\"Thomas\", family = \"Lumley\\\"\", role = \"cre\", email = \"t.lumley@auckland.ac.nz\"))", "Maintainer": "\"Thomas Lumley\" ", "License": "GPL-2 | GPL-3", "Depends": [ @@ -7020,6 +6678,7 @@ ], "URL": "http://r-survey.r-forge.r-project.org/survey/", "NeedsCompilation": "yes", + "Author": "Thomas Lumley [aut], Peter Gao [aut], Ben Schneider [aut], \"Thomas Lumley\" [cre]", "Repository": "RSPM", "Encoding": "UTF-8" }, @@ -7055,7 +6714,7 @@ }, "svglite": { "Package": "svglite", - "Version": "2.2.1", + "Version": "2.2.2", "Source": "Repository", "Title": "An 'SVG' Graphics Device", "Authors@R": "c( person(\"Hadley\", \"Wickham\", , \"hadley@posit.co\", role = \"aut\"), person(\"Lionel\", \"Henry\", , \"lionel@posit.co\", role = \"aut\"), person(\"Thomas Lin\", \"Pedersen\", , \"thomas.pedersen@posit.co\", role = c(\"cre\", \"aut\"), comment = c(ORCID = \"0000-0002-5147-4711\")), person(\"T Jake\", \"Luciani\", , \"jake@apache.org\", role = \"aut\"), person(\"Matthieu\", \"Decorde\", , \"matthieu.decorde@ens-lyon.fr\", role = \"aut\"), person(\"Vaudor\", \"Lise\", , \"lise.vaudor@ens-lyon.fr\", role = \"aut\"), person(\"Tony\", \"Plate\", role = \"ctb\", comment = \"Early line dashing code\"), person(\"David\", \"Gohel\", role = \"ctb\", comment = \"Line dashing code and early raster code\"), person(\"Yixuan\", \"Qiu\", role = \"ctb\", comment = \"Improved styles; polypath implementation\"), person(\"Håkon\", \"Malmedal\", role = \"ctb\", comment = \"Opacity code\"), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\"), comment = c(ROR = 
\"03wc8by49\")) )", @@ -7071,7 +6730,7 @@ "cli", "lifecycle", "rlang (>= 1.1.0)", - "systemfonts (>= 1.2.3)", + "systemfonts (>= 1.3.0)", "textshaping (>= 0.3.0)" ], "Suggests": [ @@ -7099,7 +6758,7 @@ "NeedsCompilation": "yes", "Author": "Hadley Wickham [aut], Lionel Henry [aut], Thomas Lin Pedersen [cre, aut] (ORCID: ), T Jake Luciani [aut], Matthieu Decorde [aut], Vaudor Lise [aut], Tony Plate [ctb] (Early line dashing code), David Gohel [ctb] (Line dashing code and early raster code), Yixuan Qiu [ctb] (Improved styles; polypath implementation), Håkon Malmedal [ctb] (Opacity code), Posit Software, PBC [cph, fnd] (ROR: )", "Maintainer": "Thomas Lin Pedersen ", - "Repository": "CRAN" + "Repository": "RSPM" }, "sys": { "Package": "sys", @@ -7127,7 +6786,7 @@ }, "systemfonts": { "Package": "systemfonts", - "Version": "1.2.3", + "Version": "1.3.1", "Source": "Repository", "Type": "Package", "Title": "System Native Font Finding", @@ -7150,9 +6809,12 @@ "Suggests": [ "covr", "farver", + "ggplot2", "graphics", "knitr", + "ragg", "rmarkdown", + "svglite", "testthat (>= 2.1.0)" ], "LinkingTo": [ @@ -7229,13 +6891,108 @@ "Maintainer": "Hadley Wickham ", "Repository": "CRAN" }, + "texreg": { + "Package": "texreg", + "Version": "1.39.4", + "Source": "Repository", + "Date": "2024-07-23", + "Title": "Conversion of R Regression Output to LaTeX or HTML Tables", + "Authors@R": "c(person(given = \"Philip\", family = \"Leifeld\", email = \"philip.leifeld@manchester.ac.uk\", role = c(\"aut\", \"cre\")), person(given = \"Claudia\", family = \"Zucca\", email = \"c.zucca@jads.nl\", role = \"ctb\"))", + "Description": "Converts coefficients, standard errors, significance stars, and goodness-of-fit statistics of statistical models into LaTeX tables or HTML tables/MS Word documents or to nicely formatted screen output for the R console for easy model comparison. A list of several models can be combined in a single table. The output is highly customizable. 
New model types can be easily implemented. Details can be found in Leifeld (2013), JStatSoft .)", + "URL": "https://github.com/leifeld/texreg/", + "BugReports": "https://github.com/leifeld/texreg/issues/", + "Suggests": [ + "broom (>= 0.4.2)", + "coda (>= 0.19.2)", + "ggplot2 (>= 3.1.0)", + "huxtable (>= 4.2.0)", + "knitr (>= 1.22)", + "rmarkdown (>= 1.12.3)", + "sandwich (>= 2.3-1)", + "systemfit (>= 1.1-0)", + "testthat (>= 2.0.0)", + "lmtest (>= 0.9-34)" + ], + "Depends": [ + "R (>= 3.5)" + ], + "Imports": [ + "methods", + "stats", + "httr" + ], + "Enhances": [ + "AER", + "alpaca", + "betareg", + "Bergm", + "bife", + "biglm", + "brglm", + "brms (>= 2.8.8)", + "btergm (>= 1.10.10)", + "dynlm", + "eha (>= 2.9.0)", + "ergm (>= 4.1.2)", + "feisr (>= 1.0.1)", + "fGarch", + "fixest (>= 0.10.5)", + "forecast", + "gamlss", + "gamlss.inf", + "gee", + "glmmTMB", + "gmm", + "gnm", + "h2o", + "latentnet", + "lfe", + "lme4 (>= 1.1.34)", + "logitr (>= 0.8.0)", + "lqmm", + "maxLik (>= 1.4.8)", + "metaSEM (>= 1.2.5.1)", + "mfx", + "mhurdle", + "miceadds", + "mlogit", + "MuMIn", + "nlme", + "nnet", + "oglmx", + "ordinal", + "pglm", + "plm (>= 2.4.1)", + "relevent", + "remify (>= 3.2.6)", + "remstats (>= 3.2.2)", + "remstimate (>= 2.3.11)", + "rms", + "robust", + "simex", + "spatialreg (>= 1.2.1)", + "spdep (>= 1.2.2)", + "speedglm", + "survival", + "truncreg (>= 0.2.5)", + "VGAM" + ], + "SystemRequirements": "pandoc (>= 1.12.3) suggested for using wordreg function; LaTeX packages tikz, booktabs, dcolumn, rotating, thumbpdf, longtable, paralist for the vignette", + "License": "GPL-3", + "Encoding": "UTF-8", + "RoxygenNote": "7.3.1", + "NeedsCompilation": "no", + "Author": "Philip Leifeld [aut, cre], Claudia Zucca [ctb]", + "Maintainer": "Philip Leifeld ", + "Repository": "CRAN" + }, "textshaping": { "Package": "textshaping", - "Version": "1.0.1", + "Version": "1.0.4", "Source": "Repository", "Title": "Bindings to the 'HarfBuzz' and 'Fribidi' Libraries for Text Shaping", 
"Authors@R": "c( person(\"Thomas Lin\", \"Pedersen\", , \"thomas.pedersen@posit.co\", role = c(\"cre\", \"aut\"), comment = c(ORCID = \"0000-0002-5147-4711\")), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\"), comment = c(ROR = \"03wc8by49\")) )", - "Description": "Provides access to the text shaping functionality in the 'HarfBuzz' library and the bidirectional algorithm in the 'Fribidi' library. 'textshaping' is a low-level utility package mainly for graphic devices that expands upon the font tool-set provided by the 'systemfonts' package.", + "Description": "Provides access to the text shaping functionality in the 'HarfBuzz' library and the bidirectional algorithm in the 'Fribidi' library. 'textshaping' is a low-level utility package mainly for graphic devices that expands upon the font tool-set provided by the 'systemfonts' package.", "License": "MIT + file LICENSE", "URL": "https://github.com/r-lib/textshaping", "BugReports": "https://github.com/r-lib/textshaping/issues", @@ -7246,7 +7003,7 @@ "lifecycle", "stats", "stringi", - "systemfonts (>= 1.1.0)", + "systemfonts (>= 1.3.0)", "utils" ], "Suggests": [ @@ -7359,7 +7116,7 @@ ], "Config/testthat/edition": "3", "NeedsCompilation": "no", - "Repository": "RSPM" + "Repository": "CRAN" }, "tidyr": { "Package": "tidyr", @@ -7691,7 +7448,7 @@ "VignetteBuilder": "knitr", "NeedsCompilation": "no", "Author": "Charlotte Baey [aut, cre] (), Estelle Kuhn [aut]", - "Repository": "CRAN" + "Repository": "RSPM" }, "vctrs": { "Package": "vctrs", @@ -7770,7 +7527,7 @@ }, "vroom": { "Package": "vroom", - "Version": "1.6.5", + "Version": "1.6.6", "Source": "Repository", "Title": "Read and Write Rectangular Text Data Quickly", "Authors@R": "c( person(\"Jim\", \"Hester\", role = \"aut\", comment = c(ORCID = \"0000-0002-2739-7082\")), person(\"Hadley\", \"Wickham\", , \"hadley@posit.co\", role = \"aut\", comment = c(ORCID = \"0000-0003-4757-117X\")), person(\"Jennifer\", \"Bryan\", , \"jenny@posit.co\", role = c(\"aut\", 
\"cre\"), comment = c(ORCID = \"0000-0002-6983-2759\")), person(\"Shelby\", \"Bearrows\", role = \"ctb\"), person(\"https://github.com/mandreyel/\", role = \"cph\", comment = \"mio library\"), person(\"Jukka\", \"Jylänki\", role = \"cph\", comment = \"grisu3 implementation\"), person(\"Mikkel\", \"Jørgensen\", role = \"cph\", comment = \"grisu3 implementation\"), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\")) )", @@ -7822,7 +7579,7 @@ ], "LinkingTo": [ "cpp11 (>= 0.2.0)", - "progress (>= 1.2.1)", + "progress (>= 1.2.3)", "tzdb (>= 0.1.1)" ], "VignetteBuilder": "knitr", @@ -7832,9 +7589,9 @@ "Copyright": "file COPYRIGHTS", "Encoding": "UTF-8", "Language": "en-US", - "RoxygenNote": "7.2.3.9000", + "RoxygenNote": "7.3.3", "NeedsCompilation": "yes", - "Author": "Jim Hester [aut] (), Hadley Wickham [aut] (), Jennifer Bryan [aut, cre] (), Shelby Bearrows [ctb], https://github.com/mandreyel/ [cph] (mio library), Jukka Jylänki [cph] (grisu3 implementation), Mikkel Jørgensen [cph] (grisu3 implementation), Posit Software, PBC [cph, fnd]", + "Author": "Jim Hester [aut] (ORCID: ), Hadley Wickham [aut] (ORCID: ), Jennifer Bryan [aut, cre] (ORCID: ), Shelby Bearrows [ctb], https://github.com/mandreyel/ [cph] (mio library), Jukka Jylänki [cph] (grisu3 implementation), Mikkel Jørgensen [cph] (grisu3 implementation), Posit Software, PBC [cph, fnd]", "Maintainer": "Jennifer Bryan ", "Repository": "CRAN" }, @@ -7915,7 +7672,7 @@ }, "xfun": { "Package": "xfun", - "Version": "0.52", + "Version": "0.54", "Source": "Repository", "Type": "Package", "Title": "Supporting Functions for Packages Maintained by 'Yihui Xie'", @@ -7937,7 +7694,7 @@ "rstudioapi", "tinytex (>= 0.30)", "mime", - "litedown (>= 0.4)", + "litedown (>= 0.6)", "commonmark", "knitr (>= 1.50)", "remotes", @@ -7947,22 +7704,23 @@ "jsonlite", "magick", "yaml", + "data.table", "qs" ], "License": "MIT + file LICENSE", "URL": "https://github.com/yihui/xfun", "BugReports": "https://github.com/yihui/xfun/issues", 
"Encoding": "UTF-8", - "RoxygenNote": "7.3.2", + "RoxygenNote": "7.3.3", "VignetteBuilder": "litedown", "NeedsCompilation": "yes", - "Author": "Yihui Xie [aut, cre, cph] (, https://yihui.org), Wush Wu [ctb], Daijiang Li [ctb], Xianying Tan [ctb], Salim Brüggemann [ctb] (), Christophe Dervieux [ctb]", + "Author": "Yihui Xie [aut, cre, cph] (ORCID: , URL: https://yihui.org), Wush Wu [ctb], Daijiang Li [ctb], Xianying Tan [ctb], Salim Brüggemann [ctb] (ORCID: ), Christophe Dervieux [ctb]", "Maintainer": "Yihui Xie ", - "Repository": "RSPM" + "Repository": "CRAN" }, "xml2": { "Package": "xml2", - "Version": "1.3.8", + "Version": "1.4.1", "Source": "Repository", "Title": "Parse XML", "Authors@R": "c( person(\"Hadley\", \"Wickham\", role = \"aut\"), person(\"Jim\", \"Hester\", role = \"aut\"), person(\"Jeroen\", \"Ooms\", email = \"jeroenooms@gmail.com\", role = c(\"aut\", \"cre\")), person(\"Posit Software, PBC\", role = c(\"cph\", \"fnd\")), person(\"R Foundation\", role = \"ctb\", comment = \"Copy of R-project homepage cached as example\") )", @@ -7983,7 +7741,6 @@ "curl", "httr", "knitr", - "magrittr", "mockery", "rmarkdown", "testthat (>= 3.2.0)", @@ -7992,7 +7749,7 @@ "VignetteBuilder": "knitr", "Config/Needs/website": "tidyverse/tidytemplate", "Encoding": "UTF-8", - "RoxygenNote": "7.2.3", + "RoxygenNote": "7.3.3", "SystemRequirements": "libxml2: libxml2-dev (deb), libxml2-devel (rpm)", "Collate": "'S4.R' 'as_list.R' 'xml_parse.R' 'as_xml_document.R' 'classes.R' 'format.R' 'import-standalone-obj-type.R' 'import-standalone-purrr.R' 'import-standalone-types-check.R' 'init.R' 'nodeset_apply.R' 'paths.R' 'utils.R' 'xml2-package.R' 'xml_attr.R' 'xml_children.R' 'xml_document.R' 'xml_find.R' 'xml_missing.R' 'xml_modify.R' 'xml_name.R' 'xml_namespaces.R' 'xml_node.R' 'xml_nodeset.R' 'xml_path.R' 'xml_schema.R' 'xml_serialize.R' 'xml_structure.R' 'xml_text.R' 'xml_type.R' 'xml_url.R' 'xml_write.R' 'zzz.R'", "Config/testthat/edition": "3", @@ -8060,7 +7817,7 @@ 
"NeedsCompilation": "yes", "Author": "Achim Zeileis [aut, cre] (), Gabor Grothendieck [aut], Jeffrey A. Ryan [aut], Joshua M. Ulrich [ctb], Felix Andrews [ctb]", "Maintainer": "Achim Zeileis ", - "Repository": "CRAN", + "Repository": "RSPM", "Encoding": "UTF-8" } } diff --git a/sec b/sec new file mode 100644 index 0000000..e69de29