Demonstration

Subgroup coverage buys only a wider band

You can protect every subgroup distribution-free, but the only way to do it is a uniformly wider slab, not an adaptive one.

Demo 02 showed that a marginally valid band under-covers the hard regions and over-covers the easy ones. The obvious fix is to ask for more: coverage $\ge 1-\alpha$ on every subgroup, not just on average. Foygel Barber, Candès, Ramdas & Tibshirani (2021, Thm 3.1) settle what that costs. If you demand distribution-free coverage on every subgroup of probability mass at least $\delta$, the demand “is impossible to attain beyond the trivial solution”, and the trivial solution is simply to inflate the marginal level to $1-\alpha\delta$:

$$C(x) = [\,\hat\mu(x) - q_{1-\alpha\delta},\ \hat\mu(x) + q_{1-\alpha\delta}\,].$$

Here $q_{\beta}$ is the level-$\beta$ quantile of the absolute residuals $|y-\hat\mu(x)|$. The result is a single flat band, wider than the ordinary one, with the same half-width at every $x$. It gains no adaptivity. As $\delta\to 0$, protecting ever-smaller subgroups, the level $1-\alpha\delta\to 1$ and the band’s width diverges.
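As a rough illustration of the inflated level, the sketch below computes the half-width of a flat band from held-out absolute residuals at levels $1-\alpha$ and $1-\alpha\delta$. The function names `conformal_quantile` and `flat_half_widths` are hypothetical, and the finite-sample rank $\lceil (n+1)\cdot\text{level}\rceil$ is the usual split-conformal convention, assumed here rather than taken from this page.

```python
import math
import numpy as np

def conformal_quantile(abs_residuals, level):
    """Finite-sample split-conformal quantile of calibration scores.

    Returns the ceil((n + 1) * level)-th smallest score, or +inf when
    that rank exceeds n (too few calibration points for the level).
    """
    scores = np.sort(np.asarray(abs_residuals, dtype=float))
    n = scores.size
    rank = math.ceil((n + 1) * level)
    return float(scores[rank - 1]) if rank <= n else math.inf

def flat_half_widths(abs_residuals, alpha, delta):
    """Half-widths of the ordinary (1 - alpha) and inflated (1 - alpha*delta) flat bands."""
    return (conformal_quantile(abs_residuals, 1 - alpha),
            conformal_quantile(abs_residuals, 1 - alpha * delta))

# Toy calibration residuals (standard normal noise, purely illustrative).
resid = np.abs(np.random.default_rng(0).normal(size=2000))
q_marg, q_infl = flat_half_widths(resid, alpha=0.1, delta=0.05)
print(f"half-width at 1-alpha: {q_marg:.2f}, at 1-alpha*delta: {q_infl:.2f}")
```

With few calibration points the required rank exceeds $n$ and the half-width is infinite, which is the finite-sample face of the divergence as $\delta\to 0$.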

Contrast that with the oracle adaptive band $\hat\mu(x)\pm z_{1-\alpha/2}\,\sigma(x)$, which is narrow where the noise is small and wide where it is large, reaching the target in every subgroup with minimal average width. But it relies on the true $\sigma(x)$ and on the Gaussian quantile $z_{1-\alpha/2}$, and knowing either is a distributional assumption. That is exactly the trade the theorem makes precise: genuine adaptivity is not free, and you cannot buy it distribution-free. Slide $\delta$ down and watch the orange slab swallow the plot while the blue oracle keeps hugging the data.
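A short worked check of why the oracle hits the target in every subgroup, under two assumptions made only for this illustration: the noise is Gaussian, $Y \mid X=x \sim \mathcal N(\mu(x), \sigma^2(x))$, and the fitted mean is exact, $\hat\mu = \mu$. Then for every $x$,

$$\mathbb{P}\big(Y \in [\,\mu(x) \pm z_{1-\alpha/2}\,\sigma(x)\,] \,\big|\, X=x\big) = \mathbb{P}\!\left(\left|\frac{Y-\mu(x)}{\sigma(x)}\right| \le z_{1-\alpha/2}\right) = 2\,\Phi(z_{1-\alpha/2}) - 1 = 1-\alpha.$$

Because the identity holds pointwise in $x$, averaging over any subgroup of $x$ values also gives $1-\alpha$. If the noise is not Gaussian or $\sigma(x)$ is misspecified, the middle equality fails, which is where the distributional assumption enters.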

Above: empirical coverage in each $x$-bin (each bin a subgroup) under three bands. The ordinary marginal band (grey, $\delta=1$) clears the target on the easy low-$x$ bins but falls below it on the hard high-$x$ bins. The flat inflated band (orange) reaches the target on the worst bin only once $\delta$ is small enough that the slab is huge, over-insuring every easy bin to do so. The oracle (blue) sits near the target everywhere at a fraction of the width.
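The per-bin comparison can be approximated numerically. The sketch below is not the demo's own code: it assumes a hypothetical noise scale $\sigma(x) = 0.2 + 1.8x$ on $x\in[0,1]$, Gaussian noise, an exactly known mean, and plain empirical quantiles (`np.quantile`) in place of the finite-sample conformal rank. The function name `bin_coverage` is likewise illustrative.

```python
from statistics import NormalDist
import numpy as np

def bin_coverage(alpha=0.1, delta=0.1, n=20000, n_bins=10, seed=0):
    """Per-bin coverage of marginal, inflated-flat and oracle bands on toy data."""
    rng = np.random.default_rng(seed)
    sigma = lambda x: 0.2 + 1.8 * x          # hypothetical noise scale, grows with x

    # Calibration and test residuals around a perfectly known mean (mu_hat = mu).
    x_cal, x = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
    r_cal = np.abs(rng.normal(0.0, sigma(x_cal)))
    r = np.abs(rng.normal(0.0, sigma(x)))

    q_marg = np.quantile(r_cal, 1 - alpha)           # ordinary flat band
    q_flat = np.quantile(r_cal, 1 - alpha * delta)   # inflated flat band
    z = NormalDist().inv_cdf(1 - alpha / 2)          # Gaussian quantile for the oracle

    # Assign each test point to an equal-width x-bin (each bin is one subgroup).
    bins = np.minimum((x * n_bins).astype(int), n_bins - 1)
    counts = np.bincount(bins, minlength=n_bins)

    def per_bin(covered):
        hits = np.bincount(bins, weights=covered.astype(float), minlength=n_bins)
        return hits / counts

    return {
        "marginal": per_bin(r <= q_marg),
        "flat": per_bin(r <= q_flat),
        "oracle": per_bin(r <= z * sigma(x)),
        # Average width of the flat band divided by that of the oracle band.
        "width_ratio": q_flat / (z * sigma(x).mean()),
    }

result = bin_coverage()
for name in ("marginal", "flat", "oracle"):
    print(name, np.round(result[name], 3))
print("avg width: flat / oracle =", round(float(result["width_ratio"]), 2))
```

Lowering `delta` in this sketch raises the flat band's coverage on the last bin and the width ratio together, while the oracle row stays near $1-\alpha$.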

Takeaway. Barber et al. (2021), Thm 3.1: distribution-free coverage on all subgroups of mass $\ge\delta$ is attainable only by inflating the marginal level to $1-\alpha\delta$, which yields a wider flat band. It buys no adaptivity, and as you protect smaller subgroups the width diverges; the “avg width: flat ÷ oracle” readout is the multiplicative efficiency loss. Real adaptivity (the oracle) requires a distributional assumption. So subgroup-conditional coverage cannot be had for free. This is the §3 companion to Demo 07’s infinite-length result.

