hoseasiu/StatsForMLAnalysis

StatsForMLAnalysis

This repository contains course notes for the Statistical Analysis of Machine Learning Systems short course, run during IAP 2023 at MIT Lincoln Laboratory. Lecture notes are in the "lectures" folder.

Syllabus

Statistical Analysis of Machine Learning Systems

Introduction to statistics with a focus on experiments and evaluations for machine learning systems. Topics include uncertainty quantification, hypothesis testing, statistical analysis workflows, and experiment design. This is not a course on how to build machine learning models, but rather on how to quantify their properties using statistical methods. The focus is on practice and methods rather than theory.

Lectures and Learning Objectives (students will be able to…)

Machine Learning, Statistics, and the Scientific Method

  • Describe the components of a scientific experiment
  • Describe the typical differences between scientific and machine learning experiments
  • Define descriptive and inferential statistics
  • Understand how to assess machine learning literature for statistical rigor
  • Identify approaches to uncertainty quantification in black-box analysis of machine learning systems
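
As one illustration of black-box uncertainty quantification, the sketch below computes a percentile bootstrap confidence interval for test accuracy using only per-example correctness. It is not taken from the lecture notes: the function name `bootstrap_ci`, the resample count, and the data are all hypothetical, and the percentile bootstrap is just one of several possible interval methods.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample the test set with replacement and record each resampled mean.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical data: 1 = correct prediction, 0 = incorrect, on 200 test examples.
correct = [1] * 170 + [0] * 30
point_estimate = statistics.fmean(correct)
low, high = bootstrap_ci(correct)
print(f"accuracy = {point_estimate:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

This only needs the model's predictions, not its internals, which is why it suits black-box analysis. Note that it captures uncertainty from the finite test set only, not from training randomness.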

Frequentist Statistics

  • Write the null and alternative hypotheses
  • Explain when the null hypothesis is rejected and when we fail to reject it
  • Define Type I and Type II errors
  • Perform one-sample, two-sample, and paired t-tests
  • Explain typical assumptions in parametric statistics
  • Perform calculations for effect size and explain the difference between statistical significance and effect size
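
To make the distinction between a test statistic and an effect size concrete, here is a minimal sketch of a paired t statistic alongside one common paired effect size (mean difference divided by the standard deviation of the differences, sometimes written d_z). The function names and the accuracy values are hypothetical, and the p-value lookup is left out because the standard library has no t distribution.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


def cohens_d_paired(a, b):
    """Effect size for paired data: mean difference in units of its SD."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.fmean(diffs) / statistics.stdev(diffs)


# Hypothetical accuracies of two models trained with the same five random seeds.
model_a = [0.81, 0.84, 0.79, 0.86, 0.83]
model_b = [0.78, 0.80, 0.79, 0.82, 0.80]

t_stat, dof = paired_t_statistic(model_a, model_b)
d = cohens_d_paired(model_a, model_b)
print(f"t = {t_stat:.3f} (df = {dof}), d = {d:.3f}")
```

For paired data the two quantities are linked by t = d × √n, so a tiny effect can still reach statistical significance with enough samples. That is the reason to report both.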

Machine Learning Specifics

  • Describe common methods in general design of controlled experiments, and compare these to machine learning workflows
  • Describe how typical machine learning workflows violate common statistical assumptions
  • Identify machine learning experiment designs that alleviate these violations, including repeated measures and statistical corrections
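
A repeated-measures design can be sketched as follows: both models are evaluated on the same folds, and the analysis runs on the paired differences. The example uses an exact sign-flip permutation test, chosen here because it avoids a normality assumption; it is an illustration, not necessarily the method used in the lectures. The function name and the scores are hypothetical.

```python
import itertools


def sign_flip_p_value(diffs):
    """Exact two-sided sign-flip permutation test on paired differences.

    Under the null hypothesis each difference is equally likely to be
    positive or negative, so every sign assignment is enumerated.
    """
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical scores for two models evaluated on the same 8 folds.
model_a = [0.82, 0.85, 0.80, 0.88, 0.84, 0.83, 0.86, 0.81]
model_b = [0.80, 0.84, 0.81, 0.85, 0.82, 0.80, 0.85, 0.79]
diffs = [a - b for a, b in zip(model_a, model_b)]

p_value = sign_flip_p_value(diffs)
print(f"sign-flip p-value = {p_value:.4f}")
```

Pairing removes fold-to-fold variability from the comparison. It does not remove the dependence between folds that comes from overlapping training sets in cross-validation, so that caveat still applies.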

Statistical Analysis Workflow

  • Design the components of a scientific experiment with a machine learning model as the subject
  • Frame experiments in the context of statistical models
  • Describe the procedure for omnibus and post-hoc tests, with multiple comparison corrections
  • Understand basic principles for visualizing comparisons of groups and descriptive statistics
  • Use appropriate terms and visuals to communicate inferential statistical results
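
As a sketch of the multiple comparison step that follows post-hoc testing, the function below applies the Holm-Bonferroni step-down adjustment to a list of p-values. Holm is one of several corrections and is used here only as an example; the function name and the p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    # Visit the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from three pairwise post-hoc comparisons.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
for p, q in zip(raw, adjusted):
    print(f"raw p = {p:.3f} -> Holm-adjusted p = {q:.3f}")
```

Each adjusted value is compared against the chosen significance level directly. Holm controls the family-wise error rate and is never less powerful than the plain Bonferroni correction.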

Assumptions about students:

  • You are currently involved in, adjacent to, or wish to be involved in machine learning projects, and are familiar with machine learning concepts such as training, testing, cross-validation, classification, and regression.
  • You have encountered basic probability and statistical principles before, including mean, median, standard deviation, variance, the normal distribution, and Bayes' rule.

Boundaries

Who is this course for?

  • People who build machine learning models
  • People who analyze the performance of machine learning models
  • People who care about accuracy in representing results

(Feel free to find-replace “who” with “who want to.”)

What this course is not

  • A primer on machine learning
  • A course about training uncertainty-aware machine learning models
  • Highly tailored to a specific part of machine learning (though we will talk about issues in the broad branches)
  • A theoretical treatment of much of anything
  • A substitute for a course on statistics or experiment design
  • Polished

What this course is

  • An attempt to connect the dots between statistical and ML practice
  • A set of gotchas and (general) remedies for stats in ML
  • Quick and dirty (for now, but possibly for quite a while)
  • A bit of a soapbox (particularly the first lecture)
