Conformal Prediction

Demonstration

One number, five pictures

The residual-information gap \(I(R;X)\) is a single quantity. The paper reads it five ways; here all five move together as you turn one knob.

The coverage–score plane showed that a single-shape conformal predictor leaves a fixed vertical gap to the oracle, and that the gap equals the mutual information \(I(R;X)\) between the residual and the input. That one number has five faces. They are not five results — they are five ways of looking at the same number, and seeing all of them at once is the quickest route to what the gap actually is.

Turn heteroscedasticity up and the residual spread starts to depend on \(x\): the gap opens, and all five panels respond in lockstep. Turn skew up and the absolute-interval forecaster picks up its extra \(\mathrm{KL}(\bar r\,\|\,h_{\mathrm{sym}})\) penalty, while \(I(R;X)\) itself is unmoved.
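The heteroscedasticity knob can be mimicked numerically. The sketch below is a toy model of our own choosing, not the demo's: \(X\) is "easy" or "hard" with probability \(\tfrac12\) each, \(R\mid X\) is zero-mean Gaussian, and the knob is the ratio of the two standard deviations. The function name `residual_information` and every parameter are illustrative assumptions. It computes \(I(R;X)\) by trapezoidal integration of \(\sum_x p(x)\int r(\cdot\mid x)\log\frac{r(\cdot\mid x)}{\bar r}\).

```python
import math


def normal_pdf(r, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma."""
    return math.exp(-0.5 * (r / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def residual_information(sigma_easy, sigma_hard, grid=20001, span=12.0):
    """I(R;X) in nats for the assumed two-group toy model.

    Assumption (not from the source): X is 'easy' or 'hard' with
    probability 1/2 each, and R | X is zero-mean Gaussian.
    """
    sigmas = (sigma_easy, sigma_hard)
    half_width = span * max(sigmas)
    step = 2.0 * half_width / (grid - 1)
    total = 0.0
    for i in range(grid):
        r = -half_width + i * step
        dens = [normal_pdf(r, s) for s in sigmas]
        pooled = 0.5 * (dens[0] + dens[1])  # pooled law r-bar at this point
        weight = 0.5 if i in (0, grid - 1) else 1.0  # trapezoid end weights
        for d in dens:
            if d > 0.0:  # skip underflowed tails, whose contribution is zero
                total += weight * 0.5 * d * math.log(d / pooled) * step
    return total


# Turn the knob: the ratio sigma_hard / sigma_easy.
ratios = (1.0, 2.0, 4.0, 8.0)
gaps = [residual_information(1.0, ratio) for ratio in ratios]
for ratio, gap in zip(ratios, gaps):
    print(f"sigma_hard / sigma_easy = {ratio}: I(R;X) = {gap:.4f} nats")
```

With the knob at 1 the two conditional laws coincide with the pooled law and the gap vanishes; in this toy model the gap grows with the ratio but can never exceed \(\log 2\), the entropy of a fair binary \(X\).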

(i) A false-pooling cost

A single-shape system insists that, once the location \(\hat\mu\) is fixed, one residual law fits everybody — that \(R\perp X\). But the true conditional laws \(r(\cdot\mid x)\) (coloured, narrow for easy inputs, wide for hard ones) genuinely differ. The gap is the log-score you forfeit by pooling them into the one grey curve \(\bar r\).

(ii) An average log Bayes factor

Since \(I(R;X)=\mathbb{E}\,\log\frac{r(R\mid X)}{\bar r(R)}\), every observation contributes the log-likelihood ratio of its own \(x\)-specific residual law against the pooled one. The histogram is those per-sample log Bayes factors; the oracle’s expected per-sample edge is their mean, and that mean is \(I(R;X)\).
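A Monte Carlo version of this panel, as a minimal sketch under assumptions of our own: two equally likely input groups with zero-mean Gaussian residuals of standard deviation 0.5 ("easy") and 2.0 ("hard"). The names `SIGMA`, `pooled_pdf` and `sample_log_bayes_factors` are hypothetical. Each draw contributes \(\log\frac{r(R\mid X)}{\bar r(R)}\); the sample mean estimates \(I(R;X)\).

```python
import math
import random

# Assumed toy spreads (not from the source).
SIGMA = {"easy": 0.5, "hard": 2.0}


def normal_pdf(r, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma."""
    return math.exp(-0.5 * (r / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def pooled_pdf(r):
    """Pooled residual law: equal-weight mixture of the two conditional laws."""
    return 0.5 * sum(normal_pdf(r, s) for s in SIGMA.values())


def sample_log_bayes_factors(n, seed=0):
    """Draw (x, r) pairs and return log r(r | x) / r_bar(r) for each."""
    rng = random.Random(seed)
    factors = []
    for _ in range(n):
        x = rng.choice(("easy", "hard"))
        r = rng.gauss(0.0, SIGMA[x])
        factors.append(math.log(normal_pdf(r, SIGMA[x]) / pooled_pdf(r)))
    return factors


factors = sample_log_bayes_factors(50_000)
mean_factor = sum(factors) / len(factors)
share_negative = sum(f < 0.0 for f in factors) / len(factors)
print(f"mean log Bayes factor (estimate of I(R;X)): {mean_factor:.3f} nats")
print(f"share of samples where the pooled law scores better: {share_negative:.3f}")
```

Individual factors can be negative: on some draws the pooled law assigns the higher density. Only the mean is guaranteed non-negative, which is why the panel shows a histogram rather than a single bar.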

(iii) Conditional non-uniformity of conformal ranks

Let \(U=G(R)\) be the conformal rank under the pooled law, where \(G\) is the CDF of \(\bar r\): this is the probability integral transform (PIT) that the signed conformal predictive system (CPS) makes uniform. Marginally (grey) it is uniform: that is the conformal achievement. Conditionally it is not. Easy inputs (green) bunch \(U\) near \(\tfrac12\); hard inputs (red) push it to the tails. The average of that conditional non-uniformity is again \(I(R;X)\).
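The bunching can be checked in a few lines. This is an idealized sketch, not the demo's code: it uses the exact population CDF of the pooled law in place of an empirical conformal calibration step, and it assumes a two-group Gaussian toy model (standard deviations 0.5 and 2.0, equal weights). `SIGMA`, `pooled_cdf` and `central_share` are illustrative names. For a uniform rank, half the mass lies in \([0.25, 0.75]\).

```python
import math
import random

# Assumed toy spreads (not from the source).
SIGMA = {"easy": 0.5, "hard": 2.0}


def normal_cdf(r, sigma):
    """CDF of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(r / (sigma * math.sqrt(2.0))))


def pooled_cdf(r):
    """CDF of the pooled residual law (equal-weight mixture)."""
    return 0.5 * sum(normal_cdf(r, s) for s in SIGMA.values())


def central_share(ranks):
    """Fraction of ranks in [0.25, 0.75]; 0.5 for a uniform rank."""
    return sum(0.25 <= u <= 0.75 for u in ranks) / len(ranks)


rng = random.Random(0)
ranks = {"easy": [], "hard": []}
for _ in range(40_000):
    x = rng.choice(("easy", "hard"))
    ranks[x].append(pooled_cdf(rng.gauss(0.0, SIGMA[x])))

shares = {name: central_share(us) for name, us in ranks.items()}
shares["marginal"] = central_share(ranks["easy"] + ranks["hard"])
for name, share in shares.items():
    print(f"{name:>8}: share of U in [0.25, 0.75] = {share:.3f}")
```

In this toy model the marginal share sits near one half while the easy group lands well above it and the hard group well below, which is the conditional non-uniformity the panel colours green and red.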

(iv) A projection onto independence

\(I(R;X)=\mathrm{KL}\!\left(P_{X,R}\,\|\,P_X\!\otimes\!P_R\right)\): the joint experiment projected onto the nearest model in which residual and input are independent. The left field is the joint \(P_{X,R}\) — watch each column’s width breathe with \(x\). The right field is the independence model \(P_X\!\otimes\!\bar r\), every column identical. The structure the projection throws away is the gap.
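For readers who want the step between this projection form and the per-input form, here is a short derivation. It assumes densities exist and writes \(p\) for the density of \(P_X\), a symbol introduced here only for the calculation; the marginal \(P_R\) has density \(\bar r\).

\[
\mathrm{KL}\!\left(P_{X,R}\,\|\,P_X\!\otimes\!P_R\right)
=\mathbb{E}\,\log\frac{p(X)\,r(R\mid X)}{p(X)\,\bar r(R)}
=\mathbb{E}\,\log\frac{r(R\mid X)}{\bar r(R)}
=\mathbb{E}_x\,\mathrm{KL}\!\left(r(\cdot\mid x)\,\|\,\bar r\right).
\]

The input density cancels, so what remains is the average log Bayes factor of panel (ii), which is in turn the \(x\)-average of each column's divergence from the pooled column.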

(v) A Kelly betting rent

This is the one with teeth. By the log-optimal (Kelly) growth theorem, a gambler who knows a distribution \(P\) and bets against a fair book priced by \(Q\) compounds wealth at rate \(\mathrm{KL}(P\,\|\,Q)\) per round. Let the single-shape conformal forecaster be the book — it prices every input with the pooled residual law \(\bar r\) — and let the oracle hold the conditional law \(r(\cdot\mid x)\). The oracle’s relative log-wealth (blue) drifts upward at expected slope \(\mathbb{E}_x\,\mathrm{KL}(r(\cdot\mid x)\,\|\,\bar r)=I(R;X)\): the same number, now in money. The per-round increments are exactly the log Bayes factors of panel (ii). Conformalizing re-levels the book; it cannot change the rent.
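A minimal simulation of the rent, under assumptions of our own rather than the demo's: two equally likely input groups with zero-mean Gaussian residuals (standard deviations 0.5 and 2.0), and an idealized continuous-outcome book in which betting proportionally to \(r(\cdot\mid x)\) against prices \(\bar r\) multiplies wealth by \(r(R\mid x)/\bar r(R)\) each round. `SIGMA`, `pooled_pdf` and `log_wealth_path` are hypothetical names.

```python
import math
import random

# Assumed toy spreads (not from the source).
SIGMA = {"easy": 0.5, "hard": 2.0}


def normal_pdf(r, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma."""
    return math.exp(-0.5 * (r / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def pooled_pdf(r):
    """The book's price: the pooled residual law, used for every input."""
    return 0.5 * sum(normal_pdf(r, s) for s in SIGMA.values())


def log_wealth_path(rounds, seed=0):
    """Cumulative log-wealth of the oracle relative to the pooled book."""
    rng = random.Random(seed)
    path, log_wealth = [], 0.0
    for _ in range(rounds):
        x = rng.choice(("easy", "hard"))
        r = rng.gauss(0.0, SIGMA[x])
        # Per-round increment: the log Bayes factor of the realized residual.
        log_wealth += math.log(normal_pdf(r, SIGMA[x]) / pooled_pdf(r))
        path.append(log_wealth)
    return path


path = log_wealth_path(20_000)
early_slope = path[1999] / 2000
late_slope = path[-1] / len(path)
print(f"slope over first 2000 rounds: {early_slope:.3f} nats/round")
print(f"slope over all {len(path)} rounds: {late_slope:.3f} nats/round")
```

The path is noisy round by round, since single increments can be negative, but the slope settles toward a positive constant, which in this toy model is the Monte Carlo estimate of \(I(R;X)\).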

False pooling, an average log Bayes factor, non-uniform conformal ranks, a projection onto independence, a Kelly betting rent: five descriptions, one number. Each says the same thing — the information about the residual that lives in \(x\) after the location is fixed — and none of it is reduced by any step that ignores \(x\). That is the residual-information gap; the coverage–score plane is where it shows up as a height you cannot conformalize away.