Companion paper
Marginally Useful
Formalizing the information gap in conformal prediction. The companion paper to this guide, built around one decomposition.
Read the PDF | LaTeX source | references.bib
Abstract
Conformal prediction gives a distribution-free, finite-sample guarantee of marginal coverage for a prediction set. It is easy to over-read this guarantee as evidence that the underlying forecast is sharper or better calibrated as a distribution. The paper separates the two. The new result is one decomposition; the familiar cautions (marginal coverage is not conditional, validity is trivially satisfiable, exchangeability is required) are assembled with citations as context, not as discoveries.
- The residual-information gap (Propositions 2–3): the result. A single-shape conformal predictive system re-levels the base model’s residual shape; its log-score regret to the oracle is the mutual information \(I(R;X)\) between residual and input, which no recalibration that ignores \(X\) can reduce. Within the class of objects it produces, the system is log-score optimal: the cost is the class, not the calibration step.
- Orthogonality (Proposition 1): folklore, made precise. For a residual score the conformal set depends on the nonconformity scores alone, so marginal coverage places no constraint on the log-score of an accompanying density: coverage can be pinned at \(1-\alpha\) while the log-score goes to \(-\infty\).
- The coverage–score plane (Section 6): a diagnostic. Conformalizing a fixed model is a horizontal move: toward zero coverage error, never toward a better score.
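The identity behind the residual-information gap can be sketched in a few lines. This is an idealized population-level sketch, not the paper's proof, and it rests on assumptions introduced here: the residual is \(R = Y - \hat\mu(X)\) for a fixed point forecast \(\hat\mu\), so a predictive density for \(Y\) is a shifted residual density and its log-score equals the log-density of \(R\); \(r(\cdot\mid x)\) is the true conditional residual density (the oracle); \(\bar r(\cdot)=\mathbb{E}_X\,r(\cdot\mid X)\) is the pooled residual density; \(q\) is any residual density that ignores \(X\); and all expectations are finite.

```latex
\begin{align*}
\underbrace{\mathbb{E}\,\log r(R\mid X)}_{\text{oracle}}
  \;-\; \underbrace{\mathbb{E}\,\log q(R)}_{\text{any shape ignoring } X}
  &= \mathbb{E}\,\log\frac{r(R\mid X)}{\bar r(R)}
     \;+\; \mathbb{E}\,\log\frac{\bar r(R)}{q(R)} \\
  &= I(R;X) \;+\; \mathrm{KL}\bigl(\bar r \,\|\, q\bigr)
  \;\ge\; I(R;X).
\end{align*}
```

Equality holds exactly when \(q=\bar r\). Under these assumptions the pooled shape is the best available within the class that ignores \(X\), and the remaining regret \(I(R;X)\) can only be reduced by letting the shape depend on \(X\).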
The impossibility of distribution-free conditional coverage is due to Lei & Wasserman (2014) and Foygel Barber et al. (2021); its two coordinates appear, as finite-sample facts, in the price of conditional coverage and subgroup coverage demonstrations. The paper closes with a litmus test for when coverage is the objective.
Several ways to read the gap
The same quantity, the residual-information gap \(I(R;X)\), looks different from each angle:
- A false-pooling cost. A single-shape conformal system assumes one residual law fits every input (\(R\perp X\)); the gap is the price of that assumption.
- An average log Bayes factor. \(I(R;X)=\mathbb{E}\,\log\frac{r(R\mid X)}{\bar r(R)}\): the oracle’s expected per-observation log-likelihood advantage over the pooled shape.
- Conditional non-uniformity of conformal ranks. The PIT rank \(U=G(R)\) is marginally uniform (the conformal win) but not conditionally uniform: \(I(R;X)=\mathbb{E}_X\,\mathrm{KL}(P_{U\mid X}\,\|\,\mathrm{Unif})\), the average divergence of the conditional rank law from uniform. Easy inputs cluster \(U\) near \(\tfrac12\); hard ones push it to the tails. Watch it in the coverage–score plane.
- A projection. The result is the information projection of the oracle onto the single-shape manifold; the gap is the leftover distance to independence \(R\perp X\).
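The readings above can be checked numerically on a toy model. The following is a minimal sketch under assumptions that are not in the paper: a binary input \(X\in\{0,1\}\) with equal probability, Gaussian residuals with hypothetical scales 0.5 (easy) and 2.0 (hard), and a pooled CDF \(G\) known in closed form. It estimates \(I(R;X)\) as the average log Bayes factor and compares the marginal and conditional behavior of the rank \(U=G(R)\).

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
sigmas = np.array([0.5, 2.0])      # hypothetical scales: easy vs hard inputs
x = rng.integers(0, 2, size=n)     # X in {0, 1}, equal probability
r = rng.normal(0.0, sigmas[x])     # R | X ~ N(0, sigma_X^2)

def norm_pdf(r, s):
    return np.exp(-0.5 * (r / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

_erf = np.vectorize(math.erf)

def norm_cdf(r, s):
    return 0.5 * (1.0 + _erf(r / (s * math.sqrt(2.0))))

# Oracle density r(R | X) and pooled density r_bar(R), an equal-weight mixture.
cond = norm_pdf(r, sigmas[x])
pooled = 0.5 * norm_pdf(r, sigmas[0]) + 0.5 * norm_pdf(r, sigmas[1])

# Average log Bayes factor: a Monte Carlo estimate of I(R;X) in nats.
gap = float(np.mean(np.log(cond) - np.log(pooled)))

# PIT rank under the pooled CDF G: uniform marginally, not given X.
u = 0.5 * norm_cdf(r, sigmas[0]) + 0.5 * norm_cdf(r, sigmas[1])
central = (u > 0.25) & (u < 0.75)
frac_all = float(central.mean())            # share of ranks in the middle half
frac_easy = float(central[x == 0].mean())   # easy inputs: ranks cluster near 1/2
frac_hard = float(central[x == 1].mean())   # hard inputs: ranks pushed to tails

print(f"I(R;X) estimate: {gap:.3f} nats (upper bound log 2 = {math.log(2):.3f})")
print(f"central-half share: all={frac_all:.3f} easy={frac_easy:.3f} hard={frac_hard:.3f}")
```

Because \(X\) is binary here, the estimate must lie between 0 and \(\log 2\). The central-half share should sit near one half overall while splitting sharply between the two groups, which is the conditional non-uniformity described in the list.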
Prediction versus verification
Conformal prediction verifies a coverage property; it does not model. It is best read as a terminal certification step: it repairs coverage, but any gain in a proper score comes from modeling the conditional spread, quantiles, or residual shape. Practically: conformalize last, and to sharpen, condition on \(x\) rather than tighten the conformal step.
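As an illustration of conformalizing last, here is a minimal split-conformal sketch with an absolute-residual score. The data-generating process, the fixed `base_model`, and the sample sizes are all hypothetical choices made for this example; the quantile index uses the standard finite-sample correction \(\lceil (n+1)(1-\alpha)\rceil\).

```python
import math
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.1

def simulate(n):
    # Hypothetical heteroscedastic data: noise scale grows with x.
    x = rng.uniform(0.0, 1.0, size=n)
    y = 2.0 * x + (0.2 + x) * rng.normal(size=n)
    return x, y

def base_model(x):
    # Hypothetical fixed point forecast; it is never refit below.
    return 2.0 * x

x_cal, y_cal = simulate(2000)
x_test, y_test = simulate(5000)

# Terminal step: calibrate one radius from held-out absolute residuals.
scores = np.abs(y_cal - base_model(x_cal))
k = math.ceil((len(scores) + 1) * (1 - alpha))   # finite-sample rank
qhat = float(np.sort(scores)[k - 1])

# The set is base_model(x) +/- qhat: the same width for every input.
covered = np.abs(y_test - base_model(x_test)) <= qhat
coverage = float(covered.mean())
cov_easy = float(covered[x_test < 0.5].mean())
cov_hard = float(covered[x_test >= 0.5].mean())

print(f"radius={qhat:.3f} marginal={coverage:.3f} easy={cov_easy:.3f} hard={cov_hard:.3f}")
```

In this setup marginal coverage should land near \(1-\alpha\), while the low-noise half is over-covered and the high-noise half under-covered. The constant radius carries no information about how spread varies with the input. Narrowing that split would take a model of the conditional spread (for instance, scaling scores by a fitted noise estimate) rather than a change to the conformal step.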
Building from source
The paper builds with Tectonic (which fetches packages and runs BibTeX automatically):
cd paper && tectonic marginally-useful.tex
Any standard TeX distribution works too: pdflatex → bibtex → pdflatex → pdflatex.