Conformal Prediction

Demonstration

Subgroup coverage buys only a wider band

You can protect every subgroup distribution-free, but the only way to do it is a uniformly wider slab, not an adaptive one.

Demo 02 showed that a marginally valid band under-covers the hard regions and over-covers the easy ones. The obvious fix is to ask for more: coverage \(\ge 1-\alpha\) on every subgroup, not just on average. Foygel Barber, Candès, Ramdas & Tibshirani (2021, Thm 3.1) settle what that costs. If you require distribution-free coverage on every subgroup of probability mass at least \(\delta\), the requirement “is impossible to attain beyond the trivial solution”, and the trivial solution is simply to inflate the marginal level to \(1-\alpha\delta\): $$C(x) = [\,\hat\mu(x) - q_{1-\alpha\delta},\ \hat\mu(x) + q_{1-\alpha\delta}\,],$$ where \(q_{1-\alpha\delta}\) is the \((1-\alpha\delta)\) quantile of the absolute residuals \(|Y-\hat\mu(X)|\). That is a single flat band, wider than the ordinary one and identical for every \(x\); it adds no adaptivity. As \(\delta\to 0\), so that ever-smaller subgroups are protected, the level \(1-\alpha\delta\to 1\) and the band's width grows without bound whenever the noise has unbounded support.
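Here is a minimal sketch of the flat band's half-width. It assumes a split-conformal construction, in which \(q_{1-\alpha\delta}\) is the conformal quantile of \(n\) held-out absolute residuals. The function name `flat_band_halfwidth` is hypothetical and is not taken from the demo's code or from any library. The sketch makes the growth concrete. The conformal rank \(\lceil (n+1)(1-\alpha\delta)\rceil\) climbs toward \(n\) as \(\delta\) shrinks. Once the rank exceeds \(n\), which happens when \(\alpha\delta < 1/(n+1)\), no finite half-width is certified.

```python
import math

import numpy as np


def flat_band_halfwidth(residuals, alpha, delta):
    """Half-width q of the flat band mu_hat(x) +/- q at level 1 - alpha*delta.

    Assumes a split-conformal setup: `residuals` are |Y_i - mu_hat(X_i)| on
    n held-out calibration points. Returns math.inf when the requested level
    is too close to 1 for n points (alpha*delta < 1/(n+1)).
    """
    r = np.sort(np.abs(np.asarray(residuals, dtype=float)))
    n = r.size
    level = 1.0 - alpha * delta
    # Conformal rank; the tiny tolerance guards against float round-up in the product.
    k = math.ceil((n + 1) * level - 1e-9)
    if k > n:
        return math.inf
    return float(r[k - 1])


# Usage: the same residuals, protecting ever-smaller subgroups.
rng = np.random.default_rng(0)
res = np.abs(rng.standard_normal(2000))
for delta in (1.0, 0.5, 0.1, 0.02, 0.001):
    print(f"delta={delta:<6} half-width={flat_band_halfwidth(res, alpha=0.1, delta=delta):.3f}")
```

With \(\delta = 1\), this is the ordinary marginal band. At \(\delta = 0.001\), the level 0.9999 needs rank 2001 from only 2000 residuals, so the last line prints `inf`.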

Contrast that with the oracle adaptive band \(\hat\mu(x)\pm z_{1-\alpha/2}\,\sigma(x)\). It is narrow where the noise is small and wide where it is large, and it reaches the target in every subgroup with the smallest average width. But it needs the true \(\sigma(x)\), and Gaussian noise for the \(z\)-quantile to be exact, which is a distributional assumption. That is exactly the trade the theorem makes precise: genuine adaptivity is not free, and you cannot buy it distribution-free. Slide \(\delta\) down and watch the orange slab swallow the plot while the blue oracle keeps hugging the data.
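A small calculation shows why no flat band can be adaptive. It uses only the standard library. It assumes the fitted mean is exact (\(\hat\mu=\mu\)) and that the noise at \(x\) is Gaussian with scale \(\sigma(x)\), which are the same assumptions the oracle needs. Under these assumptions, a band of half-width \(h\) covers at \(x\) with probability \(2\Phi(h/\sigma(x))-1\). The helper names and the flat half-width of 1.0 are hypothetical choices for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sigma 1


def conditional_coverage(halfwidth, sigma_x):
    """P(|Y - mu(x)| <= halfwidth | X = x) when Y | X = x ~ N(mu(x), sigma_x**2)."""
    return 2.0 * STD_NORMAL.cdf(halfwidth / sigma_x) - 1.0


def oracle_halfwidth(sigma_x, alpha):
    """Half-width z_{1-alpha/2} * sigma(x) of the oracle adaptive band."""
    return STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) * sigma_x


alpha = 0.1
flat = 1.0  # hypothetical flat half-width, identical at every x
for sigma_x in (0.2, 0.5, 1.0, 1.5):
    oracle_cov = conditional_coverage(oracle_halfwidth(sigma_x, alpha), sigma_x)
    flat_cov = conditional_coverage(flat, sigma_x)
    print(f"sigma(x)={sigma_x}: oracle {oracle_cov:.3f}, flat {flat_cov:.3f}")
```

The oracle column stays at \(1-\alpha = 0.900\) for every \(\sigma(x)\). The flat column slides from nearly 1 at small \(\sigma(x)\) to well under the target at large \(\sigma(x)\). Widening the flat band lifts the hard end only by over-insuring the easy end further.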

Above: empirical coverage in each \(x\)-bin (each bin a subgroup) under three bands. The ordinary marginal band (grey, \(\delta=1\)) clears the target on the easy low-\(x\) bins but falls below it on the hard high-\(x\) bins. The flat inflated band (orange) reaches the target on the worst bin only once \(\delta\) is small enough that the slab is huge, over-insuring every easy bin to do so. The oracle (blue) sits near the target everywhere at a fraction of the width.
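The per-bin comparison can be reproduced with a small simulation. The sketch below rests on hypothetical choices that are not taken from the demo:

* \(X\) is uniform on \([0,1]\).
* The mean \(\sin(2\pi x)\) is assumed known exactly (\(\hat\mu=\mu\)).
* The noise is Gaussian with \(\sigma(x)=0.1+x\).
* Five equal-width bins serve as the subgroups.
* The grey and orange bands use split-conformal quantiles.

It returns each band's empirical coverage per bin, together with the ratio of the flat band's average width to the oracle's.

```python
import math
from statistics import NormalDist

import numpy as np


def conformal_halfwidth(abs_res, level):
    """Split-conformal quantile of absolute residuals at `level` (inf if n is too small)."""
    r = np.sort(abs_res)
    n = r.size
    k = math.ceil((n + 1) * level - 1e-9)  # tolerance guards float round-up
    return math.inf if k > n else float(r[k - 1])


def sigma(x):
    return 0.1 + x  # hypothetical noise scale: easy at low x, hard at high x


def mu(x):
    return np.sin(2 * np.pi * x)  # hypothetical mean, treated as the fitted mu_hat


def simulate(alpha=0.1, delta=0.2, n_cal=5000, n_test=50000, n_bins=5, seed=0):
    rng = np.random.default_rng(seed)

    def draw(n):
        x = rng.uniform(0.0, 1.0, n)
        return x, mu(x) + sigma(x) * rng.standard_normal(n)

    x_cal, y_cal = draw(n_cal)
    x_te, y_te = draw(n_test)
    abs_cal = np.abs(y_cal - mu(x_cal))
    abs_te = np.abs(y_te - mu(x_te))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)

    halfwidths = {
        "marginal": np.full(n_test, conformal_halfwidth(abs_cal, 1.0 - alpha)),  # grey, delta = 1
        "flat": np.full(n_test, conformal_halfwidth(abs_cal, 1.0 - alpha * delta)),  # orange
        "oracle": z * sigma(x_te),  # blue, uses the true sigma(x)
    }
    bins = np.minimum((x_te * n_bins).astype(int), n_bins - 1)  # equal-width x-bins
    coverage = {
        name: np.array([np.mean(abs_te[bins == b] <= h[bins == b]) for b in range(n_bins)])
        for name, h in halfwidths.items()
    }
    width_ratio = halfwidths["flat"].mean() / halfwidths["oracle"].mean()
    return coverage, width_ratio


coverage, ratio = simulate()
for name, cov in coverage.items():
    print(f"{name:>8}: " + " ".join(f"{c:.3f}" for c in cov))
print(f"avg width flat / oracle: {ratio:.2f}")
```

In this setup, the grey row should sit above 0.9 in the low-\(x\) bins and below it in the top bin. The blue row should hover near 0.9 throughout. Lowering `delta` pushes every orange entry up (never down), and `ratio` rises with it.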

Takeaway. Foygel Barber et al. (2021, Thm 3.1): distribution-free coverage on all subgroups of mass \(\ge\delta\) is attainable only by inflating the marginal level to \(1-\alpha\delta\), which gives a wider flat band. That band buys no adaptivity, and as you protect smaller subgroups its width grows without bound. The “avg width: flat ÷ oracle” readout is the multiplicative efficiency loss. Real adaptivity (the oracle) requires a distributional assumption. So subgroup-conditional coverage cannot be had for free; this is the §3 companion to Demo 07’s infinite-length result.