From 1d3c8ca0cdf1b4525f179b7647e636af5649806a Mon Sep 17 00:00:00 2001
From: RossLinModelling <takfreak11@gmail.com>
Date: Tue, 25 Nov 2025 07:48:31 -0800
Subject: [PATCH 1/2] Update posts.html

Unit 5 and maybe onward based questions with the theme of election modeling.
---
 posts.html | 116 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 116 insertions(+)
diff --git a/posts.html b/posts.html
index 30e39f3..96bfb54 100644
--- a/posts.html
+++ b/posts.html
@@ -35,3 +35,119 @@ <h1 class="post-title">
     <span class="pagination-item newer">Newer</span>
   {% endif %}
 </div>
+
+\documentclass{article}
+\usepackage{amsmath}
+\usepackage{amssymb}
+\usepackage{graphicx}
+\usepackage{booktabs}
+\usepackage[margin=1in]{geometry}
+
+\begin{document}
+
+\section*{Math 50: Linear Regression Modeling -- Practice Contribution Questions}
+
+\textit{The following questions are based on a student's final project analyzing the ``13 Keys to the White House'' and other 2024 election forecasting models. Use the provided regression outputs and context to answer the questions regarding OLS diagnostics, logistic regression, and model validation.}
+
+\begin{enumerate}
+
+    % QUESTION 1
+    \item \textbf{Simple Linear Regression and Residual Analysis} \\
+    \textit{Context:} In a study of the ``13 Keys to the White House,'' a student attempts to predict the incumbent party's two-party vote share based on the number of ``False Keys'' (indicators unfavorable to the incumbent). The student runs an Ordinary Least Squares (OLS) regression using data from 1976--2020.
+    
+    \begin{enumerate}
+        [cite_start]\item The OLS regression of incumbent vote share ($Y$) on the number of false keys ($X$) produced a correlation coefficient of $r = -0.838$[cite: 59]. Calculate the Coefficient of Determination ($R^2$) and interpret its meaning in the context of this political science model. What percentage of the variance in the popular vote is explained by the ``Keys'' model?
+        
+        [cite_start]\item The regression analysis resulted in a statistically significant negative slope ($p < .01$)[cite: 62]. [cite_start]Based on the scatterplot provided in the study, the regression line passes through an incumbent vote share of roughly $0.57$ when $X=0$ (0 false keys) and drops to roughly $0.45$ when $X=9$[cite: 36].
+        \begin{enumerate}
+            \item Estimate the linear regression equation $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$.
+            \item Interpret the slope coefficient $\hat{\beta}_1$ in plain English. For every additional ``False Key'' triggered, how much vote share does the incumbent party lose on average?
+        \end{enumerate}
+        
+        [cite_start]\item The student notes that when predicting the \textit{Electoral College} vote share (rather than popular vote), the scatterplot becomes ``W-shaped,'' resulting in a weak correlation of $r = -0.224$[cite: 92, 93].
+        \begin{enumerate}
+            \item Explain why a ``W-shaped'' pattern in a residual plot or scatterplot violates the linearity assumption of Simple Linear Regression.
+            [cite_start]\item Why might a ``national'' model like the 13 Keys fail to predict Electoral College outcomes compared to popular vote outcomes? [cite: 94]
+        \end{enumerate}
+    \end{enumerate}
+
+    \vspace{0.5cm}
+
+    % QUESTION 2
+    \item \textbf{Multiple Regression and Multicollinearity} \\
+    \textit{Context:} A researcher suspects that Lichtman's ``13 Keys'' model might be overfitted and that the economy is the primary driver of election results. [cite_start]To test this, they isolate \textbf{Key 5} (Short-term economy) and \textbf{Key 6} (Long-term economy) and run a multiple regression predicting incumbent vote share[cite: 101, 140].
+
+    [cite_start]\textit{Data:} Use the provided regression output table[cite: 146]:
+    \begin{table}[h!]
+        \centering
+        \begin{tabular}{lcccc}
+            \toprule
+            \textbf{Variable} & \textbf{Coef} & \textbf{Std Err} & \textbf{$t$} & \textbf{$P>|t|$} \\
+            \midrule
+            Intercept (const) & 0.5440 & 0.032 & 16.823 & 0.000 \\
+            Num\_False\_Keys & -0.0108 & 0.004 & -2.628 & 0.030 \\
+            Key 5 & 0.0318 & 0.019 & 1.662 & 0.135 \\
+            Key 6 & -0.0016 & 0.014 & -0.113 & 0.913 \\
+            \bottomrule
+        \end{tabular}
+    \end{table}
+
+    \begin{enumerate}
+        \item Write out the estimated multiple regression equation. Based on the $p$-values at the $\alpha = 0.05$ significance level, which predictors are statistically significant?
+        
+        [cite_start]\item The study notes that Key 5 was the ``strongest factor of all'' in a single-feature analysis, yet in this multiple regression model, it has a $p$-value of 0.135 (insignificant)[cite: 99, 146]. Explain this apparent paradox. How does the inclusion of \texttt{Num\_False\_Keys} (which likely contains the economic information already) affect the standard errors and $p$-values of \texttt{Key 5}?
+        
+        [cite_start]\item The researcher argues that because the keys are ``connected and multicorrelational,'' the model should be viewed as a cohesive structure rather than summing individual parts[cite: 147]. If Key 5 and Key 6 are highly correlated with the total count of false keys, what would you expect to happen to the variance of the coefficients ($\text{Var}(\hat{\beta})$) if you removed the \texttt{Num\_False\_Keys} variable from the model?
+    \end{enumerate}
+
+    \vspace{0.5cm}
+
+    % QUESTION 3
+    \item \textbf{Logistic Regression and Classification Thresholds} \\
+    \textit{Context:} The ``13 Keys'' model is deterministic: it predicts that if 5 or fewer keys are false, the incumbent wins; otherwise, they lose. This is a binary classification problem. A student models this using Logistic Regression.
+
+    \begin{enumerate}
+        [cite_start]\item The ``inflection point'' of the model is identified at precisely five false keys[cite: 70]. In a logistic regression model, the probability of an incumbent win $P(Y=1)$ is given by the sigmoid function:
+        \[ P(Y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} \]
+        If the inflection point (decision boundary where $P=0.5$) occurs at $x = 5.5$ (the midpoint between 5 and 6 keys), find the ratio $-\frac{\beta_0}{\beta_1}$.
+        
+        [cite_start]\item The study mentions that 2016 was a ``calamitous'' year where probabilistic models gave Trump a low chance of winning (e.g., 29\%), while the deterministic Keys model predicted his victory[cite: 3, 6]. [cite_start]Discuss the trade-off between \textbf{Validity} (having a statistical guarantee of error) and \textbf{Plausibility} (accounting for uncertainty)[cite: 8]. Why might a logistic regression model with wide confidence intervals be considered more ``plausible'' but less useful to the public than a deterministic step-function model like the 13 Keys?
+        
+        \item Look at the ``Keys vs. Electoral College Win Probability'' plot referenced in the study. [cite_start]The curve resembles a step function (Heaviside) rather than a smooth Sigmoid[cite: 72]. What does this imply about the ``separation'' of the data classes in this limited dataset ($n=12$)? [cite_start]Why might ``quasi-separation'' be a problem for fitting a standard logistic regression using Maximum Likelihood Estimation (MLE)? [cite: 73]
+    \end{enumerate}
+
+    \vspace{0.5cm}
+
+    % QUESTION 4
+    \item \textbf{The Linear Extrapolation Problem} \\
+    [cite_start]\textit{Context:} The project critiques a forecasting model (Morris) by attempting to reverse-engineer it using polling data and ``fundamentals.'' The student suggests the model is likely \textit{not} a linear regression because of the ``linear extrapolation problem''[cite: 157].
+
+    \begin{enumerate}
+        \item Suppose a forecasting model predicts the Democratic vote margin ($M$) using a linear combination of Economic Indicators ($E$) and Polling Average ($P$):
+        \[ M = \beta_0 + \beta_1 E + \beta_2 P \]
+        [cite_start]The study notes that for certain states, a linear model would require weighting polls \textit{negatively} to fit the data, which is theoretically unsound[cite: 157]. Explain mathematically how a linear model could predict a vote share $> 100\%$ or $< 0\%$ if the inputs $E$ or $P$ are extreme.
+        
+        [cite_start]\item The student suggests a ``logistic transform'' is used in the toy simulation to fix this[cite: 161]. If the linear model outputs a raw ``score'' $z = \beta_0 + \beta_1 E + \beta_2 P$, how does applying the transformation $f(z) = \frac{1}{1+e^{-z}}$ resolve the extrapolation problem described in Unit 4?
+        
+        [cite_start]\item In the Monte Carlo simulation provided, the output is a probability distribution of electoral votes[cite: 158]. Explain why the histogram of outcomes spreads out (variance) rather than being a single bar. Which term in the standard regression model $Y = \beta X + \epsilon$ accounts for this spread?
+    \end{enumerate}
+
+    \vspace{0.5cm}
+
+    % QUESTION 5
+    \item \textbf{Sample Size and Degrees of Freedom} \\
+    [cite_start]\textit{Context:} The student acknowledges that their dataset is ``extremely limited'' (1976--2020), containing only 12 observations[cite: 139]. [cite_start]The regression model uses the ``Number of False Keys'' as a predictor, which is a summation of 13 individual binary variables[cite: 21].
+
+    \begin{enumerate}
+        \item Calculate the Degrees of Freedom ($df$) for the error term in the OLS regression shown in Question 2, where $n=12$ and the model estimates an intercept plus three coefficients (\texttt{Num\_False\_Keys}, \texttt{Key 5}, \texttt{Key 6}).
+        \[ df_{residual} = n - (k + 1) \]
+        Is this sample size sufficient for a reliable multiple regression analysis according to standard ``Rules of Thumb'' (e.g., 10 observations per predictor)?
+        
+        [cite_start]\item The project performs ``feature selection'' on the keys to see if the model can be reduced[cite: 97]. If the researcher decided to regress the vote share against \textit{all 13 keys individually} instead of the aggregate score, what would happen to the linear algebra solution for the OLS estimator $\hat{\beta} = (X^TX)^{-1}X^T Y$? [cite_start](Hint: Consider the dimensions of matrix $X$ with $n=12$ and $p=13$ [cite: 137]).
+        
+        [cite_start]\item The text mentions that 2016 polling errors were attributed to ``faulty turnout modeling'' and ``undecided voter treatment''[cite: 11]. If a regression model is trained on 1976-2012 data, and the mechanism of sampling error changes in 2016 (a ``structural break''), prediction error will increase. [cite_start]In the context of the ``Model Wars'' mentioned[cite: 14], how does \textbf{regularization} (like Ridge or Lasso) help a model generalize better to unseen data compared to a standard OLS model that perfectly fits the limited history?
+    \end{enumerate}
+
+\end{enumerate}
+
+\end{document}

From e2117d63481e144b47bcff579345d5a9b6386c28 Mon Sep 17 00:00:00 2001
From: RossLinModelling <takfreak11@gmail.com>
Date: Tue, 25 Nov 2025 07:56:58 -0800
Subject: [PATCH 2/2] Update posts.html

fixed election questions
---
 posts.html | 180 ++++++++++++++++++++++++++---------------------------
 1 file changed, 88 insertions(+), 92 deletions(-)

diff --git a/posts.html b/posts.html
index 96bfb54..2130ce5 100644
--- a/posts.html
+++ b/posts.html
@@ -36,118 +36,114 @@ <h1 class="post-title">
   {% endif %}
 </div>
 
-\documentclass{article}
-\usepackage{amsmath}
-\usepackage{amssymb}
-\usepackage{graphicx}
-\usepackage{booktabs}
-\usepackage[margin=1in]{geometry}
+\subsection*{Contribution Problems: Presidential Forecasting Models}
 
-\begin{document}
+\textit{The following problems are based on a student's final project analyzing the ``13 Keys to the White House'' and other 2024 election forecasting models. Use the provided regression outputs and context to answer the questions regarding OLS diagnostics, logistic regression, and model validation.}
 
-\section*{Math 50: Linear Regression Modeling -- Practice Contribution Questions}
+\begin{enumerate}
 
-\textit{The following questions are based on a student's final project analyzing the ``13 Keys to the White House'' and other 2024 election forecasting models. Use the provided regression outputs and context to answer the questions regarding OLS diagnostics, logistic regression, and model validation.}
+% QUESTION 1
+\item \textbf{Simple Linear Regression and Residual Analysis.} \\
+\textit{Context:} In a study of the ``13 Keys to the White House,'' a student attempts to predict the incumbent party's two-party vote share based on the number of ``False Keys'' (indicators unfavorable to the incumbent). The student runs an OLS regression using data from 1976--2020.
 
 \begin{enumerate}
+    \item The OLS regression of incumbent vote share \((Y)\) on the number of false keys \((X)\) produced a correlation coefficient of \(r = -0.838\). Calculate the coefficient of determination \(R^2\) and interpret it in context. What percentage of the variance in incumbent vote share is explained by the Keys model?
 
-    % QUESTION 1
-    \item \textbf{Simple Linear Regression and Residual Analysis} \\
-    \textit{Context:} In a study of the ``13 Keys to the White House,'' a student attempts to predict the incumbent party's two-party vote share based on the number of ``False Keys'' (indicators unfavorable to the incumbent). The student runs an Ordinary Least Squares (OLS) regression using data from 1976--2020.
-    
+    \item Based on the scatterplot in the project, the regression line passes through an incumbent vote share of approximately \(0.57\) when \(X=0\) (no false keys) and approximately \(0.45\) when \(X=9\).
     \begin{enumerate}
-        [cite_start]\item The OLS regression of incumbent vote share ($Y$) on the number of false keys ($X$) produced a correlation coefficient of $r = -0.838$[cite: 59]. Calculate the Coefficient of Determination ($R^2$) and interpret its meaning in the context of this political science model. What percentage of the variance in the popular vote is explained by the ``Keys'' model?
-        
-        [cite_start]\item The regression analysis resulted in a statistically significant negative slope ($p < .01$)[cite: 62]. [cite_start]Based on the scatterplot provided in the study, the regression line passes through an incumbent vote share of roughly $0.57$ when $X=0$ (0 false keys) and drops to roughly $0.45$ when $X=9$[cite: 36].
-        \begin{enumerate}
-            \item Estimate the linear regression equation $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$.
-            \item Interpret the slope coefficient $\hat{\beta}_1$ in plain English. For every additional ``False Key'' triggered, how much vote share does the incumbent party lose on average?
-        \end{enumerate}
-        
-        [cite_start]\item The student notes that when predicting the \textit{Electoral College} vote share (rather than popular vote), the scatterplot becomes ``W-shaped,'' resulting in a weak correlation of $r = -0.224$[cite: 92, 93].
-        \begin{enumerate}
-            \item Explain why a ``W-shaped'' pattern in a residual plot or scatterplot violates the linearity assumption of Simple Linear Regression.
-            [cite_start]\item Why might a ``national'' model like the 13 Keys fail to predict Electoral College outcomes compared to popular vote outcomes? [cite: 94]
-        \end{enumerate}
+        \item Estimate the linear regression equation \(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\).
+        \item Interpret the slope coefficient \(\hat{\beta}_1\). For each additional False Key, how much vote share does the incumbent party lose on average?
     \end{enumerate}
 
-    \vspace{0.5cm}
-
-    % QUESTION 2
-    \item \textbf{Multiple Regression and Multicollinearity} \\
-    \textit{Context:} A researcher suspects that Lichtman's ``13 Keys'' model might be overfitted and that the economy is the primary driver of election results. [cite_start]To test this, they isolate \textbf{Key 5} (Short-term economy) and \textbf{Key 6} (Long-term economy) and run a multiple regression predicting incumbent vote share[cite: 101, 140].
-
-    [cite_start]\textit{Data:} Use the provided regression output table[cite: 146]:
-    \begin{table}[h!]
-        \centering
-        \begin{tabular}{lcccc}
-            \toprule
-            \textbf{Variable} & \textbf{Coef} & \textbf{Std Err} & \textbf{$t$} & \textbf{$P>|t|$} \\
-            \midrule
-            Intercept (const) & 0.5440 & 0.032 & 16.823 & 0.000 \\
-            Num\_False\_Keys & -0.0108 & 0.004 & -2.628 & 0.030 \\
-            Key 5 & 0.0318 & 0.019 & 1.662 & 0.135 \\
-            Key 6 & -0.0016 & 0.014 & -0.113 & 0.913 \\
-            \bottomrule
-        \end{tabular}
-    \end{table}
-
+    \item When the response is Electoral College vote share rather than popular vote share, the scatterplot becomes ``W-shaped,'' and the correlation weakens to \(r = -0.224\).
     \begin{enumerate}
-        \item Write out the estimated multiple regression equation. Based on the $p$-values at the $\alpha = 0.05$ significance level, which predictors are statistically significant?
-        
-        [cite_start]\item The study notes that Key 5 was the ``strongest factor of all'' in a single-feature analysis, yet in this multiple regression model, it has a $p$-value of 0.135 (insignificant)[cite: 99, 146]. Explain this apparent paradox. How does the inclusion of \texttt{Num\_False\_Keys} (which likely contains the economic information already) affect the standard errors and $p$-values of \texttt{Key 5}?
-        
-        [cite_start]\item The researcher argues that because the keys are ``connected and multicorrelational,'' the model should be viewed as a cohesive structure rather than summing individual parts[cite: 147]. If Key 5 and Key 6 are highly correlated with the total count of false keys, what would you expect to happen to the variance of the coefficients ($\text{Var}(\hat{\beta})$) if you removed the \texttt{Num\_False\_Keys} variable from the model?
+        \item Explain why a W-shaped pattern in a scatterplot or residual plot indicates a violation of the linearity assumption of simple linear regression.
+        \item Why might a national model like the 13 Keys fail to predict Electoral College outcomes as well as popular vote outcomes?
     \end{enumerate}
+\end{enumerate}
 
-    \vspace{0.5cm}
+\vspace{0.5cm}
 
-    % QUESTION 3
-    \item \textbf{Logistic Regression and Classification Thresholds} \\
-    \textit{Context:} The ``13 Keys'' model is deterministic: it predicts that if 5 or fewer keys are false, the incumbent wins; otherwise, they lose. This is a binary classification problem. A student models this using Logistic Regression.
+% QUESTION 2
+\item \textbf{Multiple Regression and Multicollinearity.} \\
+\textit{Context:} A researcher suspects that the Keys model may be overfitted and that the economy is the primary driver of election results. To test this, they isolate Key 5 (Short-term economy) and Key 6 (Long-term economy) and run a multiple regression predicting incumbent vote share.
 
-    \begin{enumerate}
-        [cite_start]\item The ``inflection point'' of the model is identified at precisely five false keys[cite: 70]. In a logistic regression model, the probability of an incumbent win $P(Y=1)$ is given by the sigmoid function:
-        \[ P(Y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} \]
-        If the inflection point (decision boundary where $P=0.5$) occurs at $x = 5.5$ (the midpoint between 5 and 6 keys), find the ratio $-\frac{\beta_0}{\beta_1}$.
-        
-        [cite_start]\item The study mentions that 2016 was a ``calamitous'' year where probabilistic models gave Trump a low chance of winning (e.g., 29\%), while the deterministic Keys model predicted his victory[cite: 3, 6]. [cite_start]Discuss the trade-off between \textbf{Validity} (having a statistical guarantee of error) and \textbf{Plausibility} (accounting for uncertainty)[cite: 8]. Why might a logistic regression model with wide confidence intervals be considered more ``plausible'' but less useful to the public than a deterministic step-function model like the 13 Keys?
-        
-        \item Look at the ``Keys vs. Electoral College Win Probability'' plot referenced in the study. [cite_start]The curve resembles a step function (Heaviside) rather than a smooth Sigmoid[cite: 72]. What does this imply about the ``separation'' of the data classes in this limited dataset ($n=12$)? [cite_start]Why might ``quasi-separation'' be a problem for fitting a standard logistic regression using Maximum Likelihood Estimation (MLE)? [cite: 73]
-    \end{enumerate}
+\textit{Regression output:}
 
-    \vspace{0.5cm}
+\begin{table}[h!]
+    \centering
+    \begin{tabular}{lcccc}
+        \toprule
+        \textbf{Variable} & \textbf{Coef} & \textbf{Std Err} & \textbf{$t$} & \textbf{$P>|t|$} \\
+        \midrule
+        Intercept (const) & 0.5440 & 0.032 & 16.823 & 0.000 \\
+        Num\_False\_Keys & -0.0108 & 0.004 & -2.628 & 0.030 \\
+        Key 5 & 0.0318 & 0.019 & 1.662 & 0.135 \\
+        Key 6 & -0.0016 & 0.014 & -0.113 & 0.913 \\
+        \bottomrule
+    \end{tabular}
+\end{table}
 
-    % QUESTION 4
-    \item \textbf{The Linear Extrapolation Problem} \\
-    [cite_start]\textit{Context:} The project critiques a forecasting model (Morris) by attempting to reverse-engineer it using polling data and ``fundamentals.'' The student suggests the model is likely \textit{not} a linear regression because of the ``linear extrapolation problem''[cite: 157].
+\begin{enumerate}
+    \item Write out the estimated multiple regression equation. Based on the \(p\)-values at the \(\alpha = 0.05\) level, which predictors are statistically significant?
 
-    \begin{enumerate}
-        \item Suppose a forecasting model predicts the Democratic vote margin ($M$) using a linear combination of Economic Indicators ($E$) and Polling Average ($P$):
-        \[ M = \beta_0 + \beta_1 E + \beta_2 P \]
-        [cite_start]The study notes that for certain states, a linear model would require weighting polls \textit{negatively} to fit the data, which is theoretically unsound[cite: 157]. Explain mathematically how a linear model could predict a vote share $> 100\%$ or $< 0\%$ if the inputs $E$ or $P$ are extreme.
-        
-        [cite_start]\item The student suggests a ``logistic transform'' is used in the toy simulation to fix this[cite: 161]. If the linear model outputs a raw ``score'' $z = \beta_0 + \beta_1 E + \beta_2 P$, how does applying the transformation $f(z) = \frac{1}{1+e^{-z}}$ resolve the extrapolation problem described in Unit 4?
-        
-        [cite_start]\item In the Monte Carlo simulation provided, the output is a probability distribution of electoral votes[cite: 158]. Explain why the histogram of outcomes spreads out (variance) rather than being a single bar. Which term in the standard regression model $Y = \beta X + \epsilon$ accounts for this spread?
-    \end{enumerate}
+    \item Key 5 was the strongest single-feature predictor in the project, yet in the multiple regression it is not statistically significant (\(p = 0.135\)). Explain this apparent paradox. How does including \texttt{Num\_False\_Keys} (a count that already includes Keys 5 and 6) affect the standard error and significance of Key 5?
 
-    \vspace{0.5cm}
+    \item The researcher argues that the Keys are ``connected and multicorrelational.'' If Key 5 and Key 6 are highly correlated with the total number of False Keys, what happens to the variance of the coefficients \(\mathrm{Var}(\hat{\beta})\) if \texttt{Num\_False\_Keys} is removed from the model?
+\end{enumerate}
 
-    % QUESTION 5
-    \item \textbf{Sample Size and Degrees of Freedom} \\
-    [cite_start]\textit{Context:} The student acknowledges that their dataset is ``extremely limited'' (1976--2020), containing only 12 observations[cite: 139]. [cite_start]The regression model uses the ``Number of False Keys'' as a predictor, which is a summation of 13 individual binary variables[cite: 21].
+\vspace{0.5cm}
 
-    \begin{enumerate}
-        \item Calculate the Degrees of Freedom ($df$) for the error term in the OLS regression shown in Question 2, where $n=12$ and the model estimates an intercept plus three coefficients (\texttt{Num\_False\_Keys}, \texttt{Key 5}, \texttt{Key 6}).
-        \[ df_{residual} = n - (k + 1) \]
-        Is this sample size sufficient for a reliable multiple regression analysis according to standard ``Rules of Thumb'' (e.g., 10 observations per predictor)?
-        
-        [cite_start]\item The project performs ``feature selection'' on the keys to see if the model can be reduced[cite: 97]. If the researcher decided to regress the vote share against \textit{all 13 keys individually} instead of the aggregate score, what would happen to the linear algebra solution for the OLS estimator $\hat{\beta} = (X^TX)^{-1}X^T Y$? [cite_start](Hint: Consider the dimensions of matrix $X$ with $n=12$ and $p=13$ [cite: 137]).
-        
-        [cite_start]\item The text mentions that 2016 polling errors were attributed to ``faulty turnout modeling'' and ``undecided voter treatment''[cite: 11]. If a regression model is trained on 1976-2012 data, and the mechanism of sampling error changes in 2016 (a ``structural break''), prediction error will increase. [cite_start]In the context of the ``Model Wars'' mentioned[cite: 14], how does \textbf{regularization} (like Ridge or Lasso) help a model generalize better to unseen data compared to a standard OLS model that perfectly fits the limited history?
-    \end{enumerate}
+% QUESTION 3
+\item \textbf{Logistic Regression and Classification Thresholds.} \\
+\textit{Context:} The 13 Keys model is deterministic: if five or fewer keys are false, the incumbent party wins; otherwise, it loses. A student models this binary outcome using logistic regression.
+
+\begin{enumerate}
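+    \item The decision boundary of a logistic model is the value of \(x\) at which the predicted probability equals \(0.5\); for a sigmoid this is also its inflection point. The Keys rule switches between five and six false keys, so suppose the boundary lies at the midpoint \(x=5.5\). For a logistic model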
+    \[
+        P(Y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}},
+    \]
+    solve for the ratio \(-\beta_0/\beta_1\).
+
+    \item The project contrasts probabilistic models (e.g., giving Trump a 29\% chance in 2016) with the deterministic Keys model, which predicted his victory. Discuss the trade-off between \textbf{Validity} (a statistical guarantee on error) and \textbf{Plausibility} (accounting for uncertainty). Why might a logistic model with wide confidence intervals be more plausible but less useful to the public than a deterministic step-function model like the 13 Keys?
 
+    \item The ``Keys vs. Electoral College Win Probability'' plot resembles a step function rather than a smooth sigmoid. What does this suggest about class separation in a sample of only \(n=12\)? Why does quasi-separation pose a problem for logistic regression estimated via Maximum Likelihood?
 \end{enumerate}
 
-\end{document}
+\vspace{0.5cm}
+
+% QUESTION 4
+\item \textbf{The Linear Extrapolation Problem.} \\
+\textit{Context:} The project critiques a forecasting model by reverse-engineering its fundamentals-plus-polls structure and argues it may not be linear due to extrapolation issues.
+
+\begin{enumerate}
+    \item Suppose a model predicts Democratic vote margin using
+    \[
+        M = \beta_0 + \beta_1 E + \beta_2 P.
+    \]
+    Explain how a linear model can produce infeasible predictions, such as an implied vote share above \(100\%\) or below \(0\%\), when the inputs \(E\) or \(P\) take extreme values.
+
+    \item A logistic transform
+    \[
+        f(z) = \frac{1}{1+e^{-z}}
+    \]
+    is used in the project's simulation to fix this issue. Explain how applying this transform to the raw score \(z = \beta_0 + \beta_1 E + \beta_2 P\) ensures predictions remain strictly between 0 and 1.
+
+    \item In the project's Monte Carlo simulation, the output is a distribution of electoral votes rather than a single value. Explain how the error term \(\epsilon\) in the regression model \(Y = \beta X + \epsilon\) leads to variance across simulation outcomes.
+\end{enumerate}
+
+\vspace{0.5cm}
+
+% QUESTION 5
+\item \textbf{Sample Size and Degrees of Freedom.} \\
+\textit{Context:} The dataset includes only 12 presidential elections (1976--2020). The model uses the total number of False Keys, a sum of 13 binary indicators.
+
+\begin{enumerate}
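+    \item Compute the residual degrees of freedom, \(df_{\mathrm{residual}} = n - (k+1)\), for the regression in Question 2 with \(n=12\) observations and \(k=3\) predictors (\texttt{Num\_False\_Keys}, Key 5, Key 6) plus an intercept. Is this sample size adequate by common rules of thumb (e.g., 10 observations per predictor)?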
+
+    \item If one attempted to regress vote share on all 13 keys individually (with \(n=12\) and \(p=13\)), what happens to the OLS formula \(\hat{\beta} = (X^T X)^{-1} X^T Y\)? Discuss in terms of matrix dimensions.
+
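+    \item The project notes that 2016 polling errors were attributed to faulty turnout modeling and the treatment of undecided voters. If the error mechanism changes between the training period and the forecast year (a structural break), prediction error increases. Explain how regularization methods (Ridge, Lasso) help prevent overfitting and improve generalization relative to ordinary least squares fit to such a short history.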
+\end{enumerate}
+
+\end{enumerate}