Conformal Prediction

Demonstration

The coverage-score plane

Two independent moves: conformalizing slides you sideways to honest coverage; only a better conditional model lifts you toward the oracle.

This is the synthesis. Put a forecaster on two axes: how far its coverage is from target (horizontal, zero is calibrated) and how good its predictive density is, measured by log-score against an oracle that knows the true conditional spread \(\sigma(x)\) (vertical, zero is the oracle). The data are heteroscedastic: \(y\sim N(0,\sigma(x))\) with \(\sigma(x)\) varying, so a model that ignores \(x\) and uses one constant spread is honest on average but blunt where the noise is small or large.

Now move the knobs. Conformalizing the set resizes the interval to hit \(1-\alpha\) coverage; it is a purely horizontal move to the calibrated line, and it leaves the reported density, hence the log-score, exactly where it was. Sliding conditioning from a constant spread toward the oracle’s \(\sigma(x)\) is the vertical move, the only thing that climbs. The red bar is what conformal cannot touch: the residual-information gap \(I(R;X)\).

Drag predictive spread and the point moves left or right (its coverage drifts off target); tick conformalize and it snaps back to the calibrated line at the same height. The only way up is conditioning. Conformal prediction is a horizontal operation in this plane; it never closes the vertical gap.

A numerical check

The vertical gap claims to be exactly \(I(R;X)\). Here both sides are computed independently, on a grid, in your browser (the live version of check_gap.py): the single-shape (signed-CPS) regret from log-scores, and \(I(R;X)\) from entropies. They coincide. The absolute-interval forecaster pays an extra \(\mathrm{KL}(\bar r\,\|\,h_{\mathrm{sym}})\) whenever the residual law is skewed.

The conformal ranks \(U=G(R)\) are that same number seen as a picture. Marginally (grey) they are uniform — the conformal achievement. Conditionally they are not: easy inputs (green) concentrate \(U\) near \(\tfrac12\), hard inputs (red) push it into the tails. The average of that conditional non-uniformity is exactly \(I(R;X)\); when the green and red curves part from flat, the gap above is positive.

Raise heteroscedasticity to make the residual spread depend more on \(x\) (that is \(I(R;X)\)); raise skew to make the residual law asymmetric (that is the \(\mathrm{KL}\) term). The two identities hold to four decimals throughout.

Coverage and sharpness are different coordinates. Conformal prediction certifies the first and moves you along it; the second is the model’s job. The leftover height after conformalizing is the information about the residual that lives in \(x\) and that a single constant shape cannot capture, which is the paper’s residual-information gap. Coverage ⊥ log-score is the one-dimensional slice of this picture.