Conformal Prediction

Demonstration

The price of conditional coverage

The marginal-vs-conditional gap is not a tuning problem. It is a theorem.

Demo 02 showed a constant-width band that over-covers where the data are easy and under-covers where they are hard. The obvious reflex is to localize: chop the input into bins and calibrate a separate conformal quantile in each one. This is distribution-free “Mondrian” conformal prediction. With \(B\) equal-width \(x\)-bins, each bin \(b\) gets its own band $$C_b(x) = [\,\hat\mu(x) - q_b,\ \hat\mu(x) + q_b\,], \qquad q_b = \big(\text{the } \lceil (n_b+1)(1-\alpha)\rceil\text{-th smallest score in bin } b\big),$$ where \(n_b\) is the number of calibration points in bin \(b\). Slide \(B\) up. The per-bin (conditional) coverage really does flatten out toward the target: the gap closes. But watch the price: the \(\lceil (n_b+1)(1-\alpha)\rceil\)-th smallest score exists only if \(\lceil (n_b+1)(1-\alpha)\rceil \le n_b\), equivalently \(n_b \ge (1-\alpha)/\alpha\), i.e. the bin must hold enough points. The moment a bin is too sparse, the only finite-sample-valid interval is the whole real line, \(q_b = +\infty\). For \(\alpha = 0.1\) the cutoff is exact: any bin with fewer than \(9\) calibration points must report \(\infty\).
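The per-bin rule above can be sketched in a few lines of Python. This is a minimal illustration, not the demo's own code: the function names (`conformal_quantile`, `min_calibration_size`, `mondrian_quantiles`), the assumption that \(x\) lies in \([0, 1]\), and the small floating-point tolerance are all choices made here for the example.

```python
import math
import numpy as np

_EPS = 1e-9  # assumed tolerance so float error cannot push the rank up by one


def conformal_quantile(scores, alpha):
    """ceil((n+1)(1-alpha))-th smallest score, or +inf if that rank exceeds n."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha) - _EPS)
    if k > n:  # bin too sparse: only the whole real line is valid
        return math.inf
    return float(scores[k - 1])


def min_calibration_size(alpha):
    """Smallest n for which the conformal quantile is finite."""
    n = 1
    while math.ceil((n + 1) * (1 - alpha) - _EPS) > n:
        n += 1
    return n


def mondrian_quantiles(x_cal, scores_cal, B, alpha):
    """One quantile q_b per equal-width bin of [0, 1] (x range is an assumption)."""
    x_cal = np.asarray(x_cal, dtype=float)
    scores_cal = np.asarray(scores_cal, dtype=float)
    # Map each x to a bin index in 0..B-1; x == 1.0 goes into the last bin.
    bins = np.minimum((x_cal * B).astype(int), B - 1)
    return np.array(
        [conformal_quantile(scores_cal[bins == b], alpha) for b in range(B)]
    )
```

With 20 calibration points spread evenly over \([0, 1]\) and \(\alpha = 0.1\), two bins hold 10 points each and get finite quantiles, while four bins hold 5 points each and every \(q_b\) is \(+\infty\).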

Above: sweeping the conditioning resolution \(B\) from \(1\) (pure global/marginal conformal) to \(60\) (approaching per-\(x\) conditioning). The worst-bin coverage (green) climbs to the target as \(B\) grows: conditional validity is being achieved. But it climbs in lockstep with the fraction of test points whose band is \(\infty\) (red) and the mean finite band length (orange, normalized). You buy uniform conditional coverage only by paying unbounded length. The vertical line marks your current \(B\).
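The red curve depends only on how many calibration points land in each bin, so it can be reproduced without any scores. The sketch below is a rough stand-in for the sweep, under assumptions that are not from the demo: \(x\) is drawn uniformly on \([0, 1]\), and the function name, sample sizes, and grid of \(B\) values are hypothetical defaults.

```python
import math
import numpy as np


def sweep_infinite_fraction(n_cal=500, n_test=2000, alpha=0.1,
                            Bs=(1, 5, 20, 60), seed=0):
    """Fraction of test points whose bin is too sparse for a finite band."""
    rng = np.random.default_rng(seed)
    x_cal = rng.uniform(size=n_cal)    # assumed: x uniform on [0, 1]
    x_test = rng.uniform(size=n_test)
    # A bin has a finite quantile only if n_b >= (1 - alpha) / alpha.
    n_min = math.ceil((1 - alpha) / alpha - 1e-9)
    out = {}
    for B in Bs:
        cal_bins = np.minimum((x_cal * B).astype(int), B - 1)
        test_bins = np.minimum((x_test * B).astype(int), B - 1)
        counts = np.bincount(cal_bins, minlength=B)
        starved = counts < n_min       # bins that must report +inf
        out[B] = float(starved[test_bins].mean())
    return out
```

Calling `sweep_infinite_fraction()` returns a dictionary from \(B\) to the infinite-band fraction. At \(B = 1\) with 500 calibration points the fraction is zero, and it becomes positive once the average bin count drops near the cutoff of \(9\).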

Takeaway. This is Lei & Wasserman (2014, Lemma 1) made tangible: for a continuous \(x\), non-trivial finite-sample conditional validity is impossible distribution-free. Any procedure with exact conditional coverage must have infinite expected interval length at almost every \(x\). The localized “fix” is not a fix; it is the theorem. As you refine \(B\), bins starve, \(q_b\to\infty\), and the band degenerates to the trivial \((-\infty,\infty)\) exactly where conditioning would have mattered most. This is why Demo 02’s gap cannot simply be closed by binning harder. You can approach conditional coverage via smoothness assumptions or via quantile-regression scores (CQR) that let band width vary with \(x\), but only by leaving the distribution-free setting and trusting a model for the conditional part. The guarantee you keep for free is marginal; the conditional one is for sale, and its distribution-free price is \(\infty\).