StatsForMLAnalysis

This repository contains course notes for the Statistical Analysis of Machine Learning Systems short course, run during IAP 2023 at MIT Lincoln Laboratory. Lecture notes are in the "lectures" folder.

Syllabus

Statistical Analysis of Machine Learning Systems

An introduction to statistics with a focus on experiments and evaluations for machine learning systems. Topics include uncertainty quantification, hypothesis testing, statistical analysis workflows, and experiment design. This is not a course on how to build machine learning models, but on how to quantify their properties using statistical methods. The focus is on practice and methods rather than theory.

Lectures and Learning Objectives (students will be able to…)

Machine Learning, Statistics, and the Scientific Method

Describe the components of a scientific experiment
Describe the typical differences between scientific and machine learning experiments
Define descriptive and inferential statistics
Understand how to assess machine learning literature for statistical rigor
Identify approaches to uncertainty quantification in black-box analysis of machine learning systems
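
As a rough illustration of the last objective, one common black-box approach is a percentile bootstrap over per-example test outcomes. The sketch below is not taken from the lecture notes; the function name `bootstrap_accuracy_ci`, its parameters, and the outcome data are all hypothetical, and it assumes test examples are independent.

```python
import random

def bootstrap_accuracy_ci(correct, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for accuracy.

    `correct` is a list of 0/1 flags, one per test example (1 = correct prediction).
    Returns (point estimate, lower bound, upper bound).
    """
    rng = random.Random(seed)
    n = len(correct)
    stats = []
    for _ in range(n_resamples):
        # Resample test examples with replacement and recompute accuracy.
        sample = rng.choices(correct, k=n)
        stats.append(sum(sample) / n)
    stats.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return sum(correct) / n, stats[lo_idx], stats[hi_idx]

# Hypothetical per-example outcomes: 170 correct out of 200.
outcomes = [1] * 170 + [0] * 30
acc, lower, upper = bootstrap_accuracy_ci(outcomes)
print(f"accuracy={acc:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

Only the model's predictions are needed here, not its internals, which is why this kind of resampling suits black-box analysis.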

Frequentist Statistics

Write the null and alternative hypotheses
Explain when the null hypothesis is rejected and when we fail to reject it
Define Type I and Type II errors
Perform one-sample, two-sample, and paired t-tests
Explain typical assumptions in parametric statistics
Perform calculations for effect size and explain the difference between statistical significance and effect size
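
To make the t-test and effect-size objectives concrete, here is a minimal standard-library sketch of a paired t statistic and the matching effect size. It is an illustration only, not the course's own code: the helper `paired_t_and_effect_size` and the scores are hypothetical. Obtaining a p-value additionally requires the t distribution, for example via `scipy.stats.ttest_rel`.

```python
import math
import statistics

def paired_t_and_effect_size(a, b):
    """Return (t statistic, degrees of freedom, Cohen's d) for paired samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    # Effect size of the paired differences (often written d_z).
    cohens_d = mean_diff / sd_diff
    return t_stat, n - 1, cohens_d

# Hypothetical paired scores: two models evaluated on the same five test sets.
model_a = [0.81, 0.79, 0.84, 0.80, 0.83]
model_b = [0.78, 0.77, 0.80, 0.79, 0.80]
t_stat, dof, d = paired_t_and_effect_size(model_a, model_b)
print(f"t={t_stat:.2f}, df={dof}, d={d:.2f}")
```

The t statistic grows with the square root of the sample size while the effect size does not, which is one way to see why statistical significance and effect size answer different questions.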

Machine Learning Specifics

Describe common methods in general design of controlled experiments, and compare these to machine learning workflows
Describe violations of typical statistical assumptions in typical machine learning workflows
Identify machine learning experiment designs that alleviate these violations, including repeated measures and statistical corrections

Statistical Analysis Workflow

Design the components of a scientific experiment with a machine learning model as the subject
Frame experiments in the context of statistical models
Describe the procedure for omnibus and post-hoc tests, with multiple comparison corrections
Understand basic principles for visualizing comparisons of groups and descriptive statistics
Use appropriate terms and visuals to communicate inferential statistical results
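
As one example of a multiple comparison correction applied to post-hoc tests, the sketch below implements the Holm-Bonferroni step-down procedure. This particular correction is chosen for illustration and may not be the one the lectures emphasize; the function name `holm_bonferroni` and the p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared to alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one comparison fails, all larger p-values fail too
    return reject

# Hypothetical p-values from three pairwise post-hoc comparisons.
p_vals = [0.010, 0.040, 0.030]
decisions = holm_bonferroni(p_vals)
print(decisions)
```

In a typical workflow this step would run only after an omnibus test has indicated that some difference among the groups exists.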

Assumptions about students:

You are currently involved in, adjacent to, or wish to be involved in machine learning projects, and are familiar with machine learning concepts such as training, testing, cross-validation, classification, and regression.
You have encountered basic probability and statistical principles before, including mean, median, standard deviation, variance, the normal distribution, and Bayes' rule.

Boundaries

Who is this course for?

People who build machine learning models
People who analyze the performance of machine learning models
People who care about accuracy in representing results

(Feel free to find-replace “who” with “who want to.”)

What this course is not

A primer on machine learning
A course about training uncertainty-aware machine learning models
Highly tailored to a specific part of machine learning (though we will talk about issues in the broad branches)
A theoretical treatment of much of anything
A substitute for a course on statistics or experiment design
Polished

What this course is

An attempt to connect the dots between statistical and ML practice
A set of gotchas and (general) remedies for stats in ML
Quick and dirty (for now, but possibly for quite a while)
A bit of a soapbox (particularly the first lecture)
