Demonstration

The price of conditional coverage

The marginal-vs-conditional gap is not a tuning problem. It is a theorem.

Demo 02 showed a constant-width band that over-covers where the data are easy and under-covers where they are hard. The obvious reflex is to localize: chop the input into bins and calibrate a separate conformal quantile in each one. This is distribution-free “Mondrian” conformal prediction. With $B$ equal-width $x$-bins, each bin $b$, holding $n_b$ calibration points, gets its own band $$C_b(x) = [\,\hat\mu(x) - q_b,\ \hat\mu(x) + q_b\,], \qquad q_b = \big(\text{the } \lceil (n_b+1)(1-\alpha)\rceil\text{-th smallest score in bin } b\big).$$ Slide $B$ up. The per-bin (conditional) coverage really does flatten out toward the target, and the gap closes. But watch the price: the $\lceil (n_b+1)(1-\alpha)\rceil$-th smallest score exists only if $\lceil (n_b+1)(1-\alpha)\rceil \le n_b$, which for integer $n_b$ is equivalent to $n_b \ge (1-\alpha)/\alpha$. The moment a bin is sparser than that, the only finite-sample-valid interval is the whole real line, $q_b = +\infty$. For $\alpha = 0.1$ the cutoff is exact: $(1-\alpha)/\alpha = 9$, so any bin with fewer than $9$ calibration points must report $\infty$.
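The per-bin calibration rule can be sketched in a few lines of Python. This is a minimal illustration, not the demo's own code: the function names `conformal_rank` and `mondrian_quantiles`, the default input range $[0, 1]$, and the toy scores are all assumptions made for the example. It assumes the scores are absolute residuals $|y - \hat\mu(x)|$ that have already been computed.

```python
import math
import numpy as np

def conformal_rank(n, alpha):
    """Rank k = ceil((n + 1) * (1 - alpha)) of the score used as the quantile."""
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round((n + 1) * (1 - alpha), 10))

def mondrian_quantiles(x_cal, scores, n_bins, alpha, lo=0.0, hi=1.0):
    """Per-bin conformal quantiles q_b over n_bins equal-width bins on [lo, hi].

    A bin with too few calibration points gets q_b = +inf, i.e. the
    trivial band (-inf, inf). The range [lo, hi] is an assumed default.
    """
    x_cal = np.asarray(x_cal, dtype=float)
    scores = np.asarray(scores, dtype=float)
    edges = np.linspace(lo, hi, n_bins + 1)
    # Bin index in 0..n_bins-1; clip so that x == hi falls in the last bin
    idx = np.clip(np.digitize(x_cal, edges) - 1, 0, n_bins - 1)
    q = np.full(n_bins, np.inf)
    for b in range(n_bins):
        s = np.sort(scores[idx == b])
        k = conformal_rank(len(s), alpha)
        if k <= len(s):          # enough points: the k-th smallest score exists
            q[b] = s[k - 1]
    return edges, q

# Toy calibration set: 8 points in the left bin, 9 in the right bin
x_cal = np.array([0.1] * 8 + [0.9] * 9)
scores = np.arange(17, dtype=float)
edges, q = mondrian_quantiles(x_cal, scores, n_bins=2, alpha=0.1)
print(q)  # left bin is one point short, so it reports inf; right bin is finite
```

With $\alpha = 0.1$, the left bin's 8 points give rank $\lceil 9 \times 0.9 \rceil = 9 > 8$, so no valid finite quantile exists, while the right bin's 9 points give rank $9$ and the band half-width is that bin's largest score.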

Above: sweeping the conditioning resolution $B$ from $1$ (pure global/marginal conformal) to $60$ (approaching per-$x$ conditioning). The worst-bin coverage (green) climbs to the target as $B$ grows: conditional validity is being achieved. But it climbs in lockstep with the fraction of test points whose band is $\infty$ (red) and the mean finite band length (orange, normalized). You buy uniform conditional coverage only by paying unbounded length. The vertical line marks your current $B$.
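A rough way to reproduce this kind of sweep offline is sketched below. Everything about the setup is an assumption made for illustration rather than the demo's actual configuration: the synthetic heteroscedastic data, the oracle predictor `mu_hat`, the calibration size of 300, the particular values of $B$, and the choice to measure worst-case coverage on a fixed grid of 10 slices so the numbers are comparable across $B$. The mean length is reported unnormalized.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
ALPHA = 0.1

def sample(n):
    # Hypothetical data: mean sin(2*pi*x), noise sd growing with x
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + (0.1 + 0.9 * x) * rng.standard_normal(n)
    return x, y

def mu_hat(x):
    # Assumption: use the true mean as predictor to isolate calibration effects
    return np.sin(2 * np.pi * x)

def bin_index(x, n_bins):
    # Equal-width bins on [0, 1]; x == 1 is folded into the last bin
    return np.minimum((x * n_bins).astype(int), n_bins - 1)

def binned_quantiles(x_cal, scores, n_bins, alpha):
    idx = bin_index(x_cal, n_bins)
    q = np.full(n_bins, np.inf)      # starved bins keep q_b = +inf
    for b in range(n_bins):
        s = np.sort(scores[idx == b])
        k = math.ceil(round((len(s) + 1) * (1 - alpha), 10))
        if k <= len(s):
            q[b] = s[k - 1]
    return q

x_cal, y_cal = sample(300)
x_te, y_te = sample(20000)
cal_scores = np.abs(y_cal - mu_hat(x_cal))
te_scores = np.abs(y_te - mu_hat(x_te))
slices = bin_index(x_te, 10)         # fixed evaluation grid (an assumption)

results = []
for B in (1, 5, 15, 30, 60):
    q_te = binned_quantiles(x_cal, cal_scores, B, ALPHA)[bin_index(x_te, B)]
    covered = te_scores <= q_te      # an infinite band always covers
    worst = min(covered[slices == s].mean() for s in range(10))
    frac_inf = float(np.isinf(q_te).mean())
    finite = np.isfinite(q_te)
    mean_len = float(2 * q_te[finite].mean()) if finite.any() else float("nan")
    results.append((B, worst, frac_inf, mean_len))
    print(f"B={B:2d}  worst-slice coverage={worst:.3f}  "
          f"frac infinite={frac_inf:.3f}  mean finite length={mean_len:.3f}")
```

With 300 calibration points and $B = 60$, the average bin holds only 5 points, so at least 27 of the 60 bins must fall below the 9-point cutoff and report $\infty$, whatever the random draw.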

Takeaway. This is Lei & Wasserman (2014, Lemma 1) made tangible: for a continuous $x$, non-trivial finite-sample conditional validity is impossible in the distribution-free setting. Any procedure with exact conditional coverage must have infinite expected interval length at almost every $x$. The localized “fix” is the theorem itself. As you refine $B$, bins starve, $q_b\to\infty$, and the band degenerates to the trivial $(-\infty,\infty)$ exactly where conditioning would have mattered most. This is why Demo 02’s gap cannot simply be closed by binning harder. You can approach conditional coverage, via smoothness assumptions or quantile-regression scores (CQR) that let band width vary with $x$, but only by leaving the distribution-free setting and trusting a model. The guarantee you keep for free is marginal; the conditional one is for sale, and its distribution-free price is $\infty$.

