Skip to content

Latest commit

 

History

History
43 lines (35 loc) · 2.8 KB

File metadata and controls

43 lines (35 loc) · 2.8 KB
created 2024-08-29
lastmod 2026-01-26

The Chernoff method, also called the Cramer-Chernoff method, applies [[basic inequalities#Markov's inequality]] to the nonnegative random variable $e^{\lambda X}$, and then chooses $\lambda$ to optimize the bound. This often results in tighter bounds and [[concentration inequalities]] than working with the random variable directly.

In particular, for a random variable $X$, Markov's inequality gives that $$ \Pr(X\geq t) = \Pr(e^{\lambda X}\geq e^{\lambda t}) \leq e^{-\lambda t}\E[e^{\lambda X}]. $$ This holds for all $\lambda>0$, so we obtain the bound $$ \Pr(X\geq t)\leq \inf_{\lambda >0}e^{-\lambda t}\E[e^{\lambda X}] = \inf_{\lambda >0} e^{-\lambda t}M_X(\lambda), $$ where $M_X$ is the [[MGF]] of $X$. If we place distributional assumptions on $X$ (eg bounded, [[sub-Gaussian distributions|sub-Gaussianity]]), then we can solve for the optimal value of $\lambda$. If $X = \sum_{i\leq n} X_i$ is the sum of iid observations, then $\E[e^{\lambda X}] = \prod_i \E[e^{\lambda X_i}] = n \E[e^{\lambda X_1}]$. This makes the Chernoff particularly attractive under independence assumptions, even though it is not optimal (see below).

The Chernoff method is often written in terms of the Cramer transform of $X$ (which is why it's often called the Cramer-Chernoff method), which is $$ \psi_X^(t) = \sup_{\lambda\geq 0} (\lambda t - \psi_X(\lambda)),\quad \text{where}\quad \psi_X(\lambda) = \log \E e^{\lambda Z}. $$ This is also often called the Fenchel-Legendre transform. This gives, $$ \Pr(X\geq t) \leq \exp(-\psi_X^(t)), $$ which leads us to studying and bounding the function $\psi_X^*(t)$.

Optimality

The Chernoff method is very convenient hence widely deployed, but it's not optimal. We can see this in the case of [[bounded scalar concentration#Hoeffding's bound]], which is missing a factor of $1/t$ in the exponent from what we expect from the CLT. (See The missing factor in Hoeffding's inequalities by Talagrand).

More generally, the [[method of moments for concentration]] beats the Chernoff bound (when all moments exist), and the optimization approach to concentration ([[concentration via convex optimization]]) will give the tightest results if the convex program can be solved. See also [[interpolating between Markov and Chernoff]], which is tighter than the Chernoff method but also less computationally convenient.

However, the Chernoff method is optimal is one sense. If $X_1,\dots,X_n$ are iid mean zero, $S_n = \sum_i X_i$, and for all $t\geq 0$, $$ \Pr(S_n \geq nt) \leq Q^n(t), $$ then $$ Q(t) \geq \inf_{\lambda \geq 0} \E[\exp(\lambda(X-t))]. $$ That is, if we have a bound which uses the product structure of the observations (hence the power of $n$ on $Q(t)$), then the Chernoff method is optimal.