| created | 2023-05-01 |
|---|---|
| lastmod | 2025-01-15 |
Dan Dennett once said of Darwin's theory of evolution that it was the single best idea anyone has ever had. You could say the same about the MLE in the realm of [[statistical inference]]. It's simple, elegant, and sometimes optimal.
Given a parametric model $\{p_\theta : \theta \in \Theta\}$ and i.i.d. data $X_1, \dots, X_n$, the maximum likelihood estimator is the parameter value that maximizes the log-likelihood: $\hat\theta_n = \arg\max_{\theta \in \Theta} \sum_{i=1}^n \log p_\theta(X_i)$.
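As a minimal sketch of the definition (assuming a Bernoulli model, where the MLE has the closed form $\hat p = \bar X$), we can check that a brute-force grid search over the log-likelihood lands on the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.binomial(1, 0.3, size=1000)  # i.i.d. Bernoulli(0.3) draws

def log_likelihood(p, x):
    """Bernoulli log-likelihood of the sample x at parameter p."""
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

# Maximize over a fine grid; the closed-form MLE is the sample mean.
grid = np.linspace(0.01, 0.99, 9801)
p_hat_grid = grid[np.argmax([log_likelihood(p, data) for p in grid])]
p_hat_closed = data.mean()

print(p_hat_grid, p_hat_closed)  # agree up to grid resolution
```

The log-likelihood is concave in $p$ here, so the grid maximizer is just the grid point nearest the sample mean.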
Of course, the idea that we should just find the parameter values under which the observed data are most probable is not some bedrock philosophical principle that can't be debated. And, as you might imagine, people do debate it—[[Bayesian statistics|Bayesians]] in particular. MLE is [[frequentist statistics|frequentist]] by nature; parameters are fixed and there are no priors. It also doesn't provide natural [[uncertainty quantification]], since we only get a point estimate. That said, this is where [[central limit theorems]] kick in: asymptotic normality of the MLE lets us attach approximate confidence intervals to the point estimate.
The MLE can be seen as [[empirical risk minimization]] with the loss $\ell(\theta, x) = -\log p_\theta(x)$, the negative log-likelihood: maximizing the likelihood is the same as minimizing the average negative log-likelihood over the sample.
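A small sketch of this equivalence, assuming a Gaussian location model with unit variance (where the empirical risk is just the average squared error, up to constants): minimizing the risk by plain gradient descent recovers the closed-form MLE, the sample mean.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.0, size=500)  # data from N(2, 1)

def risk(mu):
    """Empirical risk: average negative log-likelihood under N(mu, 1),
    dropping the additive constant log(2*pi)/2."""
    return np.mean(0.5 * (x - mu) ** 2)

# Minimize the empirical risk by gradient descent.
mu = 0.0
for _ in range(200):
    grad = mu - x.mean()  # d/dmu of mean(0.5 * (x - mu)**2)
    mu -= 0.1 * grad

print(mu, x.mean())  # ERM minimizer coincides with the MLE
```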
When our model is misspecified (i.e., the data are generated by some distribution that's not in our model), the connection between the KL divergence and the MLE shows that the MLE finds the parameter minimizing the KL divergence between the true data-generating distribution and the model family. In other words, the MLE converges to the KL projection of the truth onto the model.
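A sketch of this under a deliberately wrong model (my assumptions: the true distribution $q$ lives on $\{0,1,2\}$ and is not a Binomial, while we fit Binomial$(2, p)$; for this exponential family the KL projection solves the moment equation $p = \mathbb{E}_q[X]/2$):

```python
import numpy as np

rng = np.random.default_rng(2)

# True distribution over {0, 1, 2} -- not Binomial(2, p) for any p.
q = np.array([0.5, 0.1, 0.4])
x = rng.choice(3, size=200_000, p=q)

# KL-minimizing parameter via the moment equation p = E_q[X] / 2.
p_star = (q @ np.array([0, 1, 2])) / 2

# Sanity check: grid-search the KL divergence to Binomial(2, p) directly.
def kl_to_binom(p):
    model = np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])
    return np.sum(q * np.log(q / model))

grid = np.linspace(0.01, 0.99, 9801)
p_kl = grid[np.argmin([kl_to_binom(p) for p in grid])]

# The MLE solves the same moment equation on the sample, so on a large
# sample it converges to the KL projection despite the misspecification.
p_hat = x.mean() / 2

print(p_star, p_kl, p_hat)
```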
Under suitable regularity conditions, the MLE obeys a [[central limit theorems|CLT]], $\sqrt{n}(\hat\theta_n - \theta) \to N(0, I(\theta)^{-1})$, with asymptotic variance equal to the inverse of the [[Fisher information]] $I(\theta)$, thus matching the [[Cramer-Rao lower bound]] and making the MLE asymptotically efficient.
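This asymptotic variance is easy to check by simulation. A sketch assuming a Bernoulli$(p)$ model, where $I(p) = 1/(p(1-p))$, so the CLT predicts $\operatorname{Var}(\hat p) \approx p(1-p)/n$:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, reps = 0.3, 400, 20_000

# MLE of a Bernoulli parameter is the sample mean; simulate its
# sampling distribution across many independent replications.
p_hats = rng.binomial(n, p, size=reps) / n

# CLT prediction: Var(p_hat) ~ 1 / (n * I(p)) = p * (1 - p) / n.
predicted_var = p * (1 - p) / n
empirical_var = p_hats.var()

print(empirical_var, predicted_var)  # close at this sample size
```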