
Adding election example questions #34

Open
RossLinModelling wants to merge 2 commits into elevien:main from RossLinModelling:add-election-example

RossLinModelling commented Nov 25, 2025

Summary

This pull request adds a new set of contribution problems to Unit 5 of the Math 50 notes. The problems are based on my final project analyzing the “13 Keys to the White House” and comparative forecasting models for recent U.S. presidential elections (including fundamentals-plus-polls models). The goal is to give students a concrete, data-driven context for applying the concepts from Units 4–5: linear regression, logistic regression, and model validation.

The new material appears under the heading:

Contribution Problems: Presidential Forecasting Models

in the Unit 5 LaTeX notes.


Content Added

The contribution consists of five multi-part problems, each linking a course concept to the election forecasting project:

  1. Simple Linear Regression and Residual Analysis

    • Uses the “13 Keys” dataset (1976–2020) with correlation $r = -0.838$ between number of false keys and incumbent two-party vote share.
    • Asks students to compute $R^2$, estimate the regression line from two points, interpret the slope in plain language, and discuss why a W-shaped scatterplot violates the linearity assumption and breaks the model when predicting Electoral College vote share.
  2. Multiple Regression and Multicollinearity

    • Uses regression output where incumbent vote share is predicted by:
      Num_False_Keys, Key 5 (short-term economy), and Key 6 (long-term economy).
    • Asks students to:
      • Write the estimated multiple regression equation.
      • Decide which predictors are significant at $\alpha = 0.05$.
      • Explain why Key 5 looks strong in a single-predictor model but becomes insignificant once Num_False_Keys is included (multicollinearity and variance inflation).
      • Reason about what happens to $\operatorname{Var}(\hat{\beta})$ if Num_False_Keys is removed.
  3. Logistic Regression and Classification Thresholds

    • Treats the Keys model as a binary classifier (incumbent wins if ≤5 false keys).
    • Students:
      • Use the logistic formula $P(Y=1) = \frac{1}{1+e^{-(\beta_0 + \beta_1 x)}}$ to find the ratio $-\beta_0/\beta_1$ when the decision boundary (where $P(Y=1) = 0.5$) is at $x = 5.5$.
      • Discuss the trade-off between validity (error guarantees) and plausibility (modeling uncertainty) when comparing probabilistic forecasts to a deterministic “Keys” rule.
      • Explain how quasi-separation in a tiny dataset ($n = 12$) creates near-step-function behavior and potential MLE instability.
  4. The Linear Extrapolation Problem

    • Starts from a fundamentals-plus-polls linear model $M = \beta_0 + \beta_1 E + \beta_2 P$.
    • Problems ask students to:
      • Show algebraically how linear models can produce impossible margins (less than 0% or greater than 100%) when inputs are extreme or when a negative weight on polls is needed to fit a state.
      • Explain how applying a logistic transform $f(z) = 1/(1+e^{-z})$ constrains predictions to the open interval $(0, 1)$.
      • Interpret the Monte Carlo simulation’s distribution of Electoral College outcomes in terms of the error term $\varepsilon$ in $Y = \beta X + \varepsilon$.
  5. Sample Size and Degrees of Freedom

    • Emphasizes that the dataset contains only 12 elections (1976–2020).
    • Students:
      • Compute the residual degrees of freedom for the regression with three predictors and discuss whether the “10 observations per predictor” rule of thumb is satisfied.
      • Analyze what happens to the OLS solution $\hat{\beta} = (X^T X)^{-1} X^T Y$ if we try to regress on all 13 individual Keys with only $n = 12$ observations (so $X^T X$ is singular).
      • Connect 2016 polling failures and structural breaks to the need for regularization (Ridge/Lasso) to improve generalization versus plain OLS on such a small sample.
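
As a quick numerical check on Problem 5, the sketch below computes the residual degrees of freedom and illustrates why regressing on all 13 Keys with only 12 elections cannot give a unique OLS solution. The design matrix here is randomly generated 0/1 data standing in for the Keys (an assumption for illustration, not the project dataset), and all variable names are hypothetical.

```python
import numpy as np

n = 12  # number of elections (1976-2020), as stated in Problem 5

# Residual degrees of freedom for a model with an intercept and p = 3 predictors.
p = 3
resid_df = n - (p + 1)
obs_per_predictor = n / p
print(resid_df, obs_per_predictor)  # 8 4.0

# Hypothetical stand-in for the 13 Keys: random 0/1 indicators, NOT real data.
rng = np.random.default_rng(0)
keys = rng.integers(0, 2, size=(n, 13)).astype(float)

# Design matrix with an intercept column: 12 rows, 14 columns.
X = np.column_stack([np.ones(n), keys])

# rank(X^T X) equals rank(X), and rank(X) can never exceed the number of rows.
rank = np.linalg.matrix_rank(X)
n_coefficients = X.shape[1]
print(X.shape, rank, n_coefficients)

# Fewer independent rows than coefficients means X^T X has no inverse.
is_singular = rank < n_coefficients
print(is_singular)  # True
```

Because $X$ has only 12 rows, the 14 x 14 matrix $X^T X$ has rank at most 12 whatever values the Keys take, so $(X^T X)^{-1}$ does not exist. With three predictors there are 4 observations per predictor, well short of the "10 observations per predictor" rule of thumb.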

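The logistic pieces of Problems 3 and 4 can be sanity-checked the same way. The sketch below uses made-up coefficients (they are assumptions chosen only so the boundary lands at 5.5, not fitted values from the project) to show that the 0.5-probability boundary sits at $-\beta_0/\beta_1$, and that a linear prediction above 1 is mapped back inside $(0, 1)$ by the logistic transform.

```python
import math


def logistic(z):
    """Standard logistic function f(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


# Problem 3: hypothetical coefficients; beta1 < 0 because more false keys
# should lower the incumbent's win probability.
beta0 = 11.0
beta1 = -2.0
boundary = -beta0 / beta1
print(boundary)  # 5.5
print(logistic(beta0 + beta1 * boundary))  # 0.5

# With this sign convention, 5 false keys predicts an incumbent win and 6 a loss.
print(logistic(beta0 + beta1 * 5) > 0.5)  # True
print(logistic(beta0 + beta1 * 6) < 0.5)  # True

# Problem 4: hypothetical weights and inputs (as fractions) for M = b0 + b1*E + b2*P.
b0, b1, b2 = 0.1, 0.5, 0.6
E, P = 0.9, 0.9
linear_pred = b0 + b1 * E + b2 * P
print(linear_pred > 1.0)  # True: the linear model leaves the valid range

# Passing the same linear predictor through the logistic keeps it in (0, 1).
bounded_pred = logistic(linear_pred)
print(0.0 < bounded_pred < 1.0)  # True
```

Note that once the logistic transform is applied, the coefficients act on the log-odds scale rather than directly on the predicted share, so they would need to be re-estimated rather than reused from the linear fit.
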
Pedagogical Motivation

These problems are designed to:

  • Tie abstract regression concepts directly to a high-interest real-world application (presidential elections).
  • Reinforce key Unit 5 topics:
    • OLS diagnostics and residual patterns
    • Multiple regression and multicollinearity
    • Logistic regression and classification
    • Domain constraints and link functions
    • Sample size, degrees of freedom, and overfitting
    • Regularization as a way to handle structural breaks and limited data
  • Provide a bridge between the course material and current debates about election models (fundamentals vs. polls, deterministic vs. probabilistic forecasts).

They can be used as:

  • Optional challenge problems
  • A contribution exercise
  • Supplementary practice for students interested in political applications of regression

Files Touched

  • public/latex_notes/unit5/unit5.tex (added the new subsection and problems near the end of Unit 5).

No existing content was deleted or modified; the contribution is additive and self-contained.

Unit 5 and maybe onward based questions with the theme of election modeling.
fixed election questions