Demonstration
Marginal vs. conditional coverage
The guarantee is an average over all inputs, and an average can hide a lot.
Demo 01 delivered on its promise: about \(1-\alpha\) of fresh points land inside the band. But that promise is marginal, averaged over the whole distribution of \(x\). It says nothing about coverage at a particular \(x\). Here the noise is heteroscedastic: \(\sigma(x)\) grows with \(x\), so the data are tight on the left and fan out on the right. Yet the absolute-residual score \(|y-\hat\mu(x)|\) yields a single threshold \(q\), the conformal \((1-\alpha)\) quantile of the calibration scores, and hence a band of the same width \(2q\) at every \(x\): $$C(x) = [\,\hat\mu(x) - q,\ \hat\mu(x) + q\,].$$ Watch what that constant width does as you slide the selection window across \(x\). The marginal coverage stays near target; the local coverage does not.
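A minimal sketch of how a single score threshold produces a constant-width band, in Python with NumPy. Every detail of the setup is an assumption for illustration, not the demo's actual code: the mean function \(\sin(2\pi x)\), the noise scale \(\sigma(x) = 0.05 + x\), a degree-5 polynomial standing in for \(\hat\mu\), the sample sizes, and the finite-sample rank \(\lceil (n+1)(1-\alpha) \rceil\) used to pick \(q\) (the usual split-conformal choice). `simulate` and `conformal_quantile` are hypothetical helper names.

```python
import numpy as np

# Assumed data-generating process (hypothetical): mean sin(2*pi*x),
# noise standard deviation sigma(x) = 0.05 + x, so the spread grows with x.
def simulate(n, rng):
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + (0.05 + x) * rng.standard_normal(n)
    return x, y

def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:  # too few calibration points for this alpha
        return np.inf
    return np.sort(scores)[k - 1]

rng = np.random.default_rng(0)
alpha = 0.1
x_tr, y_tr = simulate(2000, rng)    # used only to fit the point predictor
x_cal, y_cal = simulate(2000, rng)  # used only to calibrate q
x_te, y_te = simulate(20000, rng)   # fresh test points

coef = np.polyfit(x_tr, y_tr, deg=5)  # assumed model for mu_hat

def mu_hat(x):
    return np.polyval(coef, x)

# Absolute-residual score -> one number q -> band [mu_hat - q, mu_hat + q].
q = conformal_quantile(np.abs(y_cal - mu_hat(x_cal)), alpha)
lower, upper = mu_hat(x_te) - q, mu_hat(x_te) + q
covered = (lower <= y_te) & (y_te <= upper)

print(f"q = {q:.3f}, band width = {2 * q:.3f} at every x")
print(f"marginal coverage = {covered.mean():.3f} (target {1 - alpha:.2f})")
```

Because the score never looks at \(x\), \(q\) is a single number and the band width is identical everywhere; the printed marginal coverage should sit near 0.9, and nothing in this computation checks coverage locally.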
Above: empirical coverage of the test points in each \(x\)-bin, against the dashed target \(1-\alpha\). The curve starts above target where the data are easy (small \(\sigma\)) and falls below it where they are hard (large \(\sigma\)). This is the same failure Lei & Wasserman describe, there phrased in terms of the density of \(x\) rather than the noise scale: a marginally valid band “tends to overestimate the set when \(x\) is in the high density area and to underestimate for low density \(x\).”
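The per-bin curve can be reproduced, in spirit, with a short self-contained sketch. It assumes the same kind of hypothetical setup as a stand-in for the demo's data (mean \(\sin(2\pi x)\), noise scale \(0.05 + x\), a degree-5 polynomial as \(\hat\mu\), ten equal-width bins on \([0, 1]\)); none of these choices come from the demo itself.

```python
import numpy as np

# Hypothetical setup: mean sin(2*pi*x), noise sd 0.05 + x (grows with x).
rng = np.random.default_rng(1)
alpha, n_bins = 0.1, 10

def simulate(n):
    x = rng.uniform(0.0, 1.0, size=n)
    return x, np.sin(2 * np.pi * x) + (0.05 + x) * rng.standard_normal(n)

x_tr, y_tr = simulate(2000)
x_cal, y_cal = simulate(2000)
x_te, y_te = simulate(20000)

# Constant-width split-conformal band from absolute residuals.
coef = np.polyfit(x_tr, y_tr, deg=5)
scores = np.sort(np.abs(y_cal - np.polyval(coef, x_cal)))
k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
q = scores[k - 1]  # assumes k <= n, which holds here (k = 1801, n = 2000)

covered = np.abs(y_te - np.polyval(coef, x_te)) <= q

# Assign each test point to one of n_bins equal-width bins on [0, 1].
edges = np.linspace(0.0, 1.0, n_bins + 1)
bin_idx = np.clip(np.digitize(x_te, edges) - 1, 0, n_bins - 1)
bin_cov = np.array([covered[bin_idx == b].mean() for b in range(n_bins)])

print(f"marginal coverage: {covered.mean():.3f}")
for b in range(n_bins):
    print(f"x in [{edges[b]:.1f}, {edges[b + 1]:.1f}): coverage {bin_cov[b]:.3f}")
```

Under these assumptions the leftmost bins should print coverage near 1 and the rightmost bins well below 0.9, while the marginal figure stays close to target: the same shape as the plotted curve.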
Takeaway. A 90% guarantee on average is perfectly consistent with, say, ~99% coverage where the problem is easy and well under target where it is hard. The constant band over-insures the cases you would have gotten right anyway and under-insures the cases you actually built the model for, and the guarantee is silent about precisely that distinction. Marginal validity is a real property; it is just not the property most people think they are buying. Next, Demo 03 pushes this generosity to its breaking point.
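The arithmetic behind the takeaway is a one-line mixture calculation. Marginal coverage is the mass-weighted average of local coverage, \(\Pr(Y \in C(X)) = \mathbb{E}_X\big[\Pr(Y \in C(X) \mid X)\big]\). As a purely hypothetical illustration, suppose the easy and hard regions each carry half of the probability mass of \(x\) and the easy half is covered at 99%:

$$0.5 \cdot 0.99 + 0.5 \cdot c_{\text{hard}} = 0.90 \quad\Longrightarrow\quad c_{\text{hard}} = \frac{0.90 - 0.495}{0.5} = 0.81.$$

So a surplus above \(1-\alpha\) in one region can coexist with a deficit of the same weighted size elsewhere while the average stays exactly on target.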