Demonstration

Marginal vs. conditional coverage

The guarantee is an average over all inputs, and an average can hide a lot.

Demo 01 delivered on its promise: about $1-\alpha$ of fresh points land inside the band. But that promise is marginal, averaged over the whole distribution of $x$. It says nothing about coverage at a particular $x$. Here the noise is heteroscedastic: $\sigma(x)$ grows with $x$, so the data are tight on the left and fan out on the right. Yet the absolute-residual score $|y-\hat\mu(x)|$ yields a single number $q$, hence a band of constant width $$C(x) = [\,\hat\mu(x) - q,\ \hat\mu(x) + q\,].$$ Watch what that constant width does as you slide the selection window across $x$. The marginal coverage stays near target; the local coverage does not.
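Stated formally, the two notions differ only in what is conditioned on. The sketch below introduces notation that is not in the demo: $(X_{n+1}, Y_{n+1})$ for a fresh test point, $\Phi$ for the standard normal CDF, and, for the closed form, the assumptions that the noise is Gaussian and that $\hat\mu$ equals the true mean.

$$\underbrace{\Pr\{\,Y_{n+1}\in C(X_{n+1})\,\}\ \ge\ 1-\alpha}_{\text{marginal: what split conformal guarantees}} \qquad\text{vs.}\qquad \underbrace{\Pr\{\,Y_{n+1}\in C(X_{n+1}) \mid X_{n+1}=x\,\}\ \ge\ 1-\alpha \ \ \text{for every } x}_{\text{conditional: not guaranteed}}$$

Under those two assumptions, with $y = \hat\mu(x) + \sigma(x)\,\varepsilon$ and $\varepsilon \sim \mathcal{N}(0,1)$, the constant-width band covers at a given $x$ with probability

$$\Pr\{\,|\sigma(x)\,\varepsilon| \le q\,\} \;=\; 2\,\Phi\!\left(\frac{q}{\sigma(x)}\right) - 1,$$

which decreases as $\sigma(x)$ grows. Where $q \approx 2.58\,\sigma(x)$ the local coverage is about 99%; where $q = \sigma(x)$ it is only about 68%. The same $q$ produces both numbers.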

The plot shows the empirical coverage of the test points in each $x$-bin, against the dashed target $1-\alpha$. The curve starts above target where the data are easy (small $\sigma$) and falls below it where they are hard (large $\sigma$). This mirrors Lei & Wasserman’s observation that a marginally valid band “tends to overestimate the set when $x$ is in the high density area and to underestimate for low density $x$”; their statement is about the density of $x$, whereas here the same pattern of over- and under-coverage is driven by $\sigma(x)$.
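The per-bin coverage check can be reproduced offline with a small simulation. This is a minimal sketch, not the demo's own code: the function name `binned_coverage`, the sine mean function, the noise profile `sigma = 0.1 + x`, and the sample sizes are all assumptions chosen for illustration. The mean model is taken to be exact, so that only the constant-width band is being tested.

```python
import numpy as np


def binned_coverage(alpha=0.1, n_cal=2000, n_test=20000, n_bins=5, seed=0):
    """Split conformal with the absolute-residual score on heteroscedastic data.

    Returns (q, marginal_coverage, per_bin_coverage).
    All modelling choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    def mu(x):
        # Assumed true mean; also used as the fitted model mu_hat.
        return np.sin(2 * np.pi * x)

    def sample(n):
        x = rng.uniform(0.0, 1.0, size=n)
        sigma = 0.1 + x  # assumed noise profile: grows with x
        y = mu(x) + sigma * rng.normal(size=n)
        return x, y

    # Calibration: one score per point, then a single quantile q.
    x_cal, y_cal = sample(n_cal)
    scores = np.sort(np.abs(y_cal - mu(x_cal)))
    k = int(np.ceil((n_cal + 1) * (1 - alpha)))  # finite-sample rank
    q = scores[k - 1] if k <= n_cal else np.inf

    # Test: a point is covered when it falls inside mu_hat(x) +/- q.
    x_test, y_test = sample(n_test)
    covered = np.abs(y_test - mu(x_test)) <= q

    # Group test points into equal-width x-bins and average within each.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_idx = np.clip(np.digitize(x_test, edges) - 1, 0, n_bins - 1)
    per_bin = np.array([covered[bin_idx == b].mean() for b in range(n_bins)])

    return q, covered.mean(), per_bin


if __name__ == "__main__":
    q, marginal, per_bin = binned_coverage()
    print(f"q = {q:.3f}, marginal coverage = {marginal:.3f}")
    for b, c in enumerate(per_bin):
        print(f"bin {b}: coverage = {c:.3f}")
```

Under these assumptions the marginal figure should sit near $1-\alpha$ while the per-bin figures fall from left to right, which is the shape described in the caption. Changing the noise profile to a constant should flatten the per-bin curve, since a constant-width band is then the right shape.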

Takeaway. A 90% guarantee on average is perfectly consistent with, say, ~99% coverage where the problem is easy and well under target where it is hard. The constant band over-insures the cases you would have gotten right anyway and under-insures the cases you actually built the model for, and the guarantee is silent about precisely that distinction. Marginal validity is a real property; it is just not the property most people think they are buying. Next, Demo 03 pushes this slack to its breaking point.

