Merged
2 changes: 1 addition & 1 deletion 035-running-simulation.Rmd
Original file line number Diff line number Diff line change
@@ -405,7 +405,7 @@ With the tools of this chapter in place, we are now positioned to move from runn

### Welch simulations {#Welch-simulation}

-In the prior chapter's exercises (See \@ref(BFFs-forever)), you made a new `BF_F` function for the Welch simulation. Update the Welch simulation code to include your `BF_F` function, and then generate simulation results for this additional estimator. See Chapter @ref(case-ANOVA) for the original simulation and overall context.\index{example!heteroskedastic ANOVA (Welch)}
+In Exercise \@ref(BFFs-forever) of Chapter \@ref(estimation-procedures), you made a new `BF_F` function for the Welch simulation. Update the Welch simulation code to include your `BF_F` function, and then generate simulation results for this additional estimator. See Chapter \@ref(case-ANOVA) for the original simulation and overall context.\index{example!heteroskedastic ANOVA (Welch)}


### Compare sampling distributions of Pearson's correlation coefficients {#Pearson-sampling-distributions}
2 changes: 1 addition & 1 deletion 080-simulations-as-evidence.Rmd
@@ -618,7 +618,7 @@ All of this said, as a rule of thumb, $B = 1000$ is often good enough.

#### How many factors?

-Start with a few factors, run $R=10$ or so iterations, and time your code to see how long your simulation runs (see @\ref(profiling-code) for how).
+Start with a few factors, run $R=10$ or so iterations, and time your code to see how long your simulation runs (see \@ref(profiling-code) for how).
If your simulation runs quickly, you can add more factors in addition to scaling up the number of replications, and you probably should.


2 changes: 1 addition & 1 deletion 120-parallel-processing.Rmd
@@ -19,7 +19,7 @@ library(furrr)
Especially if you take our advice of "when in doubt, go more general," and if you are running enough replicates to get nice and small Monte Carlo\index{Monte Carlo simulation} errors, you will quickly come up against the computational limits of your computer.
Simulations can be incredibly computationally intensive, and there are a few means for dealing with that.
The first is to optimize code by removing extraneous calculation (e.g., by writing methods from scratch rather than using the safety-checking and thus sometimes slower methods in R, or by saving calculations that are shared across different estimation approaches) to make it run faster.
-This approach is usually quite hard, and the benefits often minimal; see Appendix @ref(optimize-code) for further discussion and examples.
+This approach is usually quite hard, and the benefits often minimal; see Appendix \@ref(optimize-code) for further discussion and examples.
The second is to use more computing power by making the simulation _parallel_.
This latter approach is the topic of this chapter.

19 changes: 14 additions & 5 deletions _output.yml
@@ -1,5 +1,5 @@
 bookdown::gitbook:
-  css: style.css
+  css: css/style.css
   includes:
     in_header: mathjax-preamble.html
   config:
@@ -20,8 +20,17 @@ bookdown::gitbook:
   download: ["pdf", "epub"]
 bookdown::pdf_book:
   includes:
-    in_header: preamble.tex
-  biblio-style: apalike
+    in_header: latex/preamble.tex
+    after_body: latex/after_body.tex
-  keep_tex: true
   citation_package: natbib
+  keep_tex: yes
-bookdown::epub_book: default
+  biblio-style: apalike
+  dev: "cairo_pdf"
+  latex_engine: xelatex
+  pandoc_args: --top-level-division=chapter
+  toc_depth: 3
+  toc_unnumbered: false
+  toc_appendix: true
+  highlight_bw: true
+bookdown::epub_book:
+  stylesheet: css/style.css
54 changes: 54 additions & 0 deletions attic/performance-criteria-scraps.Rmd
@@ -0,0 +1,54 @@
## Different kinds of performance

There are several classes of performance criteria we might be interested in, depending on what we are evaluating.
In particular, we can ask how well we are estimating effects, how well we are quantifying the uncertainty in those estimates, and how well we are conducting inference.
We talk through these in the following sections after giving a high-level overview.

**Point Estimation.**
Point estimation is when we try to measure the size of an estimand such as an actual average treatment effect $\gamma_1$.
Estimation has two major components, the point estimator and the uncertainty estimator.
We generally evaluate both the \emph{actual} properties of the point estimator and the performance of the \emph{estimated} properties of the point estimator.

For the point estimate, we generally need to know our target _estimand_, which is the "right answer" that we are trying to estimate in our scenario.
Continuing our running example of exploring best practices for analyzing a cluster randomized experiment, we might want to assess how well our estimate of the treatment effect $\hat\gamma_1$ does for the _estimand_ of the site-average treatment effect, $\gamma_1$, for example.

Given our estimate $\hat\gamma_1$ and our estimand $\gamma_1$, we can ask whether we will get the right answer on average; if not, we have _bias_, meaning the estimator gives estimates that are systematically higher (or lower) than the parameter being estimated in a given scenario.
We can also ask how variable our estimates will tend to be from trial to trial; this is the _variance_ of our estimator, which is the _true standard error_, squared.
Finally, we can ask how far off the truth we are likely to be on average; this is the _root mean squared error_, which combines bias and variance.
These are the estimator's actual properties.
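As a concrete illustration of these criteria, here is a small sketch in R (not code from the book; the data-generating process, sample size of 50, and estimand value of 0.5 are all invented for illustration), computing bias, true standard error, and RMSE of the sample mean across simulation replicates:

```r
# Illustrative sketch (hypothetical values): performance of the sample
# mean as an estimator of a known estimand gamma_1.
set.seed(20240601)

gamma_1 <- 0.5    # the estimand: the "right answer"
R <- 10000        # number of simulation replicates

# Simulate R estimates, each from a fresh sample of n = 50 observations
estimates <- replicate(R, mean(rnorm(50, mean = gamma_1, sd = 1)))

bias    <- mean(estimates) - gamma_1             # systematic error
true_SE <- sd(estimates)                         # true standard error
rmse    <- sqrt(mean((estimates - gamma_1)^2))   # overall error

round(c(bias = bias, SE = true_SE, RMSE = rmse), 3)
```

Because the sample mean is unbiased here, the bias should be near zero and the RMSE should be close to the true standard error (which in this hypothetical setup is $1/\sqrt{50} \approx 0.14$).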

**Uncertainty Estimation.**
Point estimates also usually come with estimates of uncertainty: when we estimate the average treatment effect, we will also get an estimated standard error, $\widehat{SE}$, that tells us how far we are likely to be from our target.
We need to understand the properties of $\widehat{SE}$ as well.
We can ask whether our standard error estimator tends to be too large or too small, when compared to the true standard error. We can also assess whether the estimated standard error is usually accurate, or unstable across our simulation scenarios.


**Inference.**
Inference is when we do hypothesis testing, asking whether there is evidence for some sort of effect, or whether there is evidence that some coefficient is greater than or less than some specified value.
In particular, for our example, to know whether there is evidence of an average treatment effect at all, we would test the null hypothesis $H_0: \gamma_1 = 0$.
We would then want to evaluate whether this test behaves as it should, both when the null holds and when it does not.

For inference, we might first ask whether our methods are valid, i.e., whether they behave correctly when we test for a treatment effect that is not there.
For example, we might wonder whether using multilevel models could open the door to inference problems under model misspecification, such as in a scenario where the residuals have some non-normal distribution.
These sorts of questions are questions of validity.

Also for inference, we might ask which method is better for detecting an effect when there is one.
Here, we want to know how our estimators perform in circumstances with a non-zero average treatment effect.
Do they reject the null often, or rarely?
How much does using aggregation decrease (or increase?) our chances of rejection?
These are questions about power.
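Both validity and power reduce to the same Monte Carlo quantity: the rejection rate across replicates. A minimal sketch in R (again hypothetical, not from the book; the two-sample design, sample size, effect size, and $\alpha = 0.05$ are assumptions for illustration):

```r
# Illustrative sketch (hypothetical design): rejection rate of a
# two-sample t-test across R simulated trials.
set.seed(20240601)

reject_rate <- function(effect, R = 2000, n = 30, alpha = 0.05) {
  rejections <- replicate(R, {
    y0 <- rnorm(n, mean = 0)       # control group
    y1 <- rnorm(n, mean = effect)  # treated group
    t.test(y1, y0)$p.value < alpha
  })
  mean(rejections)  # proportion of trials in which we reject the null
}

reject_rate(effect = 0)    # validity: should be near the nominal 0.05
reject_rate(effect = 0.8)  # power: how often we detect a real effect
```

Running it with `effect = 0` checks validity (the rejection rate should be near the nominal level), while a nonzero effect gives an estimate of power.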


**Final thoughts.**
Inference and estimation are clearly highly related: if we have a good estimate of the treatment effect and it is not zero (this is estimation), then we are willing to say that there is a treatment effect (this is inference). Depending on the framing, however, the way you would set up a simulation to investigate the behavior of your estimators could be different.
For example, if you were interested in inference, you might want to know how often you would reject the null hypothesis of no treatment effect; if focused solely on estimation, you might not have a simulation that looks at this question of validity as directly.

In the above, we discussed how to assess the performance of a single procedure, but in general we often have different methods for obtaining some estimate, and want to know which is best.
For example, we want to identify which estimation strategy (aggregation, linear regression, or multilevel modeling) we should generally use when analyzing cluster randomized trial data---this is comparison.
The goals of a simulation comparing different approaches should be to identify whether the strategies differ, when and how one is superior to another, and what drives the salient differences.
To fully understand the trade-offs and benefits, we will generally need to compare the different approaches with respect to a variety of metrics of success.

File renamed without changes.
File renamed without changes.
5 changes: 3 additions & 2 deletions index.Rmd
@@ -1,9 +1,10 @@
 ---
 title: "Designing Monte Carlo Simulations in R"
 author: "Luke W. Miratrix and James E. Pustejovsky\n(Equal authors)"
-date: "`r Sys.Date()`"
+date: "`r format(Sys.Date(), '%B %d, %Y')`"
 site: bookdown::bookdown_site
-documentclass: book
+documentclass: krantz
+classoption: krantz1
 bibliography:
 - book.bib
 - packages.bib