Conformal Prediction

Demonstration

Conformal vs. recalibration

The fair case: where conformal genuinely wins, and where it quietly hands you the wrong object.

This is the fair turn. We have spent five demos prodding conformal’s soft spots; here we concede its one real advantage and draw the line cleanly. The question is not “is conformal valid?” (it is) but “which object do you actually need?”

Start with an overconfident base model. The truth is \(y_i \sim \mathcal{N}(0,\sigma_{\text{true}}^2)\) with \(\sigma_{\text{true}}=1\), but the model claims a density that is too narrow: \(\mathcal{N}(0,\sigma_{\text{model}}^2)\) with \(\sigma_{\text{model}} = \sigma_{\text{true}}/k\) for an overconfidence factor \(k>1\). We hold out half the data to calibrate and judge every treatment on the same test set. There are three treatments: the raw model as given, split conformal calibrated on the held-out half, and variance recalibration fitted on the same half.
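A minimal sketch of this setup in Python follows. It assumes n = 2000, k = 3, a 90% target (α = 0.1), and split conformal with the normalized score |y − 0|/σ_model. Variance recalibration is taken to mean fitting one scale multiplier by maximum likelihood on the calibration half. The function names `simulate`, `recalibrate_scale`, and `conformal_radius` are hypothetical, and the demo's own code may differ.

```python
import math
import random

def simulate(n=2000, k=3.0, sigma_true=1.0, seed=0):
    """Truth is N(0, sigma_true^2); the base model claims N(0, (sigma_true / k)^2)."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, sigma_true) for _ in range(n)]
    half = n // 2
    # First half calibrates; second half is the shared test set.
    return y[:half], y[half:], sigma_true / k

def recalibrate_scale(y_cal, sigma_model):
    """Variance recalibration (assumed form): maximum-likelihood multiplier s so that
    N(0, (s * sigma_model)^2) fits the calibration half, with the mean fixed at 0."""
    mean_sq = sum(v * v for v in y_cal) / len(y_cal)
    return math.sqrt(mean_sq) / sigma_model

def conformal_radius(y_cal, sigma_model, alpha=0.1):
    """Split conformal with normalized score |y - 0| / sigma_model.
    Returns the half-width q of the (1 - alpha) prediction set [-q, q]."""
    scores = sorted(abs(v) / sigma_model for v in y_cal)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if rank > n:
        return math.inf  # too few calibration points: the set is the whole line
    return scores[rank - 1] * sigma_model

y_cal, y_test, sigma_model = simulate()
s = recalibrate_scale(y_cal, sigma_model)
q = conformal_radius(y_cal, sigma_model)
print(f"model sigma {sigma_model:.3f}, recalibrated sigma {s * sigma_model:.3f}")
print(f"conformal 90% set: [{-q:.3f}, {q:.3f}]")
```

Because σ_model is constant here, normalizing the score does not change the interval. Normalization is simply the usual choice when the base model's scale varies by input. Both fits should land near the truth: σ ≈ 1 for recalibration, and q near the 0.95 normal quantile (about 1.645) for conformal.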

The PIT (probability integral transform, \(u_i = F(y_i)\)) histograms are the money shot. The flat line at density 1 is what a calibrated forecast's PIT should look like.

Left: the RAW forecast’s PIT, piled up at the edges, the classic U of an overconfident model. Right: after variance recalibration the PIT sits flat on the uniform reference. Recalibration fixed the distribution, not just a coverage number.
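The PIT histograms can be sketched as follows. The code assumes k = 3, 1000 calibration and 1000 test points, ten bins, and Gaussian variance recalibration by maximum likelihood. The helper `pit_histogram` is hypothetical; `statistics.NormalDist` is the Python standard-library normal distribution.

```python
import math
import random
from statistics import NormalDist

def pit_histogram(y, dist, bins=10):
    """Histogram of PIT values u = F(y), scaled to density so a calibrated forecast sits at 1."""
    counts = [0] * bins
    for v in y:
        u = dist.cdf(v)
        counts[min(int(u * bins), bins - 1)] += 1  # clamp u == 1.0 into the last bin
    return [c * bins / len(y) for c in counts]

rng = random.Random(1)
y_cal = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # truth: sigma_true = 1
y_test = [rng.gauss(0.0, 1.0) for _ in range(1000)]
k = 3.0
raw_dist = NormalDist(0.0, 1.0 / k)                  # overconfident base model
# Variance recalibration: MLE scale on the calibration half, mean fixed at 0.
recal_dist = NormalDist(0.0, math.sqrt(sum(v * v for v in y_cal) / len(y_cal)))

raw = pit_histogram(y_test, raw_dist)
recal = pit_histogram(y_test, recal_dist)
print("raw  :", [round(d, 2) for d in raw])
print("recal:", [round(d, 2) for d in recal])
```

With k = 3, the raw PIT falls below 0.1 exactly when y < Φ⁻¹(0.1)/3 ≈ −0.43. That happens for about a third of the outcomes, so each edge bin should sit near density 3.3 while the middle bins sink well below 1. The recalibrated bins should hover around 1.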

Now overlay the predictive objects on the outcomes. The raw density is too narrow and the recalibrated density matches. The conformal band is shaded as an interval (a set, not a curve) to make the set-vs-measure distinction visible.
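The set-vs-measure distinction can be made concrete in a short sketch. It assumes k = 3, α = 0.1, 1000 calibration points, split conformal on |y|/σ_model, and Gaussian recalibration by maximum likelihood; the variable names are illustrative. Split conformal returns one interval at one level. The recalibrated `statistics.NormalDist` answers any probability question you put to it.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(2)
y_cal = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # truth: sigma_true = 1
sigma_model = 1.0 / 3.0                             # overconfident base model, k = 3
alpha = 0.1

# Split conformal: one interval at one level, built from calibration scores.
scores = sorted(abs(v) / sigma_model for v in y_cal)
rank = math.ceil((len(scores) + 1) * (1 - alpha))   # 901 <= 1000 here
q = scores[rank - 1] * sigma_model
conformal_set = (-q, q)   # a set: the only question it answers is "is y inside?"

# Variance recalibration: a full predictive distribution (a measure).
sigma_recal = math.sqrt(sum(v * v for v in y_cal) / len(y_cal))
forecast = NormalDist(0.0, sigma_recal)

print("conformal 90% set  :", conformal_set)
print("recal 90% interval :", (forecast.inv_cdf(0.05), forecast.inv_cdf(0.95)))
print("recal 50% interval :", (forecast.inv_cdf(0.25), forecast.inv_cdf(0.75)))
print("P(y > 2)           :", 1 - forecast.cdf(2.0))
print("density at y = 0.5 :", forecast.pdf(0.5))
```

At the 90% level the two intervals nearly coincide. Only the recalibrated object, though, also yields a 50% interval, a tail probability, or a density value without re-running any calibration.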

Read the log-scores: raw is worst, conformal-implied sits in the middle (it inherits the base shape), and recalibrated is best. Recalibration is the only treatment that optimized the score you actually report. Coverage tells the mirror story: raw under-covers, while conformal and recalibration both hit the target. The difference lies in what backs the number. Conformal’s coverage is a finite-sample, distribution-free guarantee, whereas recalibration’s rests on the Gaussian model being right.
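The scoreboard can be reproduced in outline. This sketch assumes k = 3, α = 0.1, 2000 calibration and 2000 test points, and the mean log predictive density as the log-score. It leaves out the conformal-implied density, because its exact construction is not spelled out here, so conformal receives a coverage number only. The helpers `mean_log_score` and `coverage` are hypothetical.

```python
import math
import random
from statistics import NormalDist

def mean_log_score(y, dist):
    """Average log predictive density over outcomes (higher is better)."""
    return sum(math.log(dist.pdf(v)) for v in y) / len(y)

def coverage(y, lo, hi):
    """Fraction of outcomes inside [lo, hi]."""
    return sum(lo <= v <= hi for v in y) / len(y)

rng = random.Random(3)
y = [rng.gauss(0.0, 1.0) for _ in range(4000)]   # truth: sigma_true = 1
y_cal, y_test = y[:2000], y[2000:]
alpha, k = 0.1, 3.0

raw = NormalDist(0.0, 1.0 / k)
recal = NormalDist(0.0, math.sqrt(sum(v * v for v in y_cal) / len(y_cal)))

# Split conformal on |y - 0|; with a constant model scale, normalizing by it changes nothing.
scores = sorted(abs(v) for v in y_cal)
q = scores[math.ceil((len(scores) + 1) * (1 - alpha)) - 1]

def central(dist):
    """Central (1 - alpha) interval of a predictive distribution."""
    return dist.inv_cdf(alpha / 2), dist.inv_cdf(1 - alpha / 2)

results = {
    "raw": (mean_log_score(y_test, raw), coverage(y_test, *central(raw))),
    "recalibrated": (mean_log_score(y_test, recal), coverage(y_test, *central(recal))),
    "conformal": (None, coverage(y_test, -q, q)),   # a set: no density to score here
}
for name, (ls, cov) in results.items():
    shown = "n/a" if ls is None else f"{ls:.3f}"
    print(f"{name:>13}: log-score {shown}, coverage {cov:.3f}")
```

In expectation, the raw log-score is about −0.5·log(2π/9) − 4.5 ≈ −4.32 and the recalibrated one about −0.5·log(2π) − 0.5 ≈ −1.42. The raw 90% interval covers only about 42% of outcomes, while conformal and recalibrated coverage both sit near 0.90.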

Takeaway. If what you want is a calibrated forecast (a density to integrate, score, and decide with), you want recalibration. It improves the very number you report and hands back the whole object. Conformal calibrates coverage and emits a set; its implied “density” is the base model’s residual shape, re-leveled, and never optimized for the score. Conformal’s edge is genuine and worth naming: a distribution-free, finite-sample coverage guarantee. That makes it the right tool exactly when the guarantee is the product, and the wrong one when a forecast is. See the paper for the full argument.