Conformal Prediction

Technical note

Betting against a conformal predictor

A parimutuel account of the information gap. A companion note to Marginally Useful and the Feynman–Wigner diagnostic.


The idea

The companion paper shows that the wasted log-score of a single-shape conformal predictor is the mutual information \(I(R;X)\) between the residual and the input. This note rederives that number from a betting mechanism instead of from the score. Treat the predictor as the crowd in a parimutuel pool on the residual: bettors put money in, and the pot is split among the winners in proportion to their stake on the realised outcome. In the continuous limit the bin width cancels and the payoff per unit of bankroll is the ratio of the bettor's density to the crowd's, \(b(u)/q(u)\). The log-optimal stake is the true density \(p\), and its growth rate against a crowd pricing \(q\) is \(\mathrm{KL}(p\,\|\,q)\).
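The growth-rate claim can be checked on a small discrete pool. The sketch below is illustrative only: the three-outcome probabilities and the helper names `expected_log_growth` and `kl` are assumptions made for this example, and the bettor is assumed small enough not to move the pool, so that a bankroll spread as \(b\) pays \(b_i/q_i\) when outcome \(i\) is realised.

```python
import math

def expected_log_growth(p, b, q):
    """Expected log payoff per round: outcomes follow p, bettor stakes b, crowd stakes q."""
    return sum(pi * math.log(bi / qi) for pi, bi, qi in zip(p, b, q) if pi > 0)

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative numbers, not taken from the note.
p = [0.5, 0.3, 0.2]          # true outcome probabilities
q = [1 / 3, 1 / 3, 1 / 3]    # crowd's share of the pool on each outcome
guess = [0.6, 0.2, 0.2]      # some other way of splitting the bankroll

growth_truth = expected_log_growth(p, p, q)      # staking the truth
growth_crowd = expected_log_growth(p, q, q)      # copying the crowd
growth_guess = expected_log_growth(p, guess, q)  # staking something else

print("stake = truth :", growth_truth, " KL(p||q) =", kl(p, q))
print("stake = crowd :", growth_crowd)
print("stake = guess :", growth_guess)
```

Staking the truth reproduces \(\mathrm{KL}(p\,\|\,q)\), copying the crowd earns nothing, and any other stake \(b\) falls short of the truth by \(\mathrm{KL}(p\,\|\,b)\).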

The rent

Conformalization prices the rank pool flat: \(q\equiv 1\). An entrant who knows only the marginal breaks even, which is marginal coverage stated as wealth. An entrant who conditions on \(X\) and stakes the conditional density \(g(\cdot\mid X)\) grows his bankroll at rate exactly \(I(R;X)\). The gap is the rent, and re-leveling cannot close it, because the leak is invisible at the margin where the guarantee lives. The result is a provably fair lottery that a side-informed player nonetheless beats almost surely in the long run.
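A small simulation makes the rent concrete. This is a sketch under assumed conditions, not the companion paper's experiment: \(X\) is a fair coin, the residual is \(N(0,1)\) when \(X=0\) and \(N(0,3^2)\) when \(X=1\), and every function name is hypothetical. With \(U=F(R)\) the marginal rank, the conditional rank density is \(g(u\mid x)=f_x(r)/f(r)\) at \(r=F^{-1}(u)\), so the informed payoff against the flat pool can be computed directly from \(R\).

```python
import math
import random

SIGMA = {0: 1.0, 1: 3.0}  # assumed residual scale in each regime of X

def normal_pdf(r, sigma):
    return math.exp(-0.5 * (r / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def marginal_pdf(r):
    # X is a fair coin, so the marginal residual density is an equal mixture.
    return 0.5 * normal_pdf(r, SIGMA[0]) + 0.5 * normal_pdf(r, SIGMA[1])

def informed_stake(r, x):
    # Conditional rank density g(u | x) evaluated at u = F(r).
    return normal_pdf(r, SIGMA[x]) / marginal_pdf(r)

def mutual_information(lo=-30.0, hi=30.0, n=60_000):
    # Midpoint-rule integral of sum_x P(x) f_x(r) log(f_x(r) / f(r)), in nats.
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        r = lo + (i + 0.5) * h
        f = marginal_pdf(r)
        for x in (0, 1):
            fx = normal_pdf(r, SIGMA[x])
            if fx > 0.0:
                total += 0.5 * fx * math.log(fx / f) * h
    return total

def simulate_growth(rounds=100_000, seed=0):
    rng = random.Random(seed)
    log_informed = 0.0
    log_marginal = 0.0
    for _ in range(rounds):
        x = rng.randrange(2)
        r = rng.gauss(0.0, SIGMA[x])
        log_informed += math.log(informed_stake(r, x))  # payoff g / q with q = 1
        log_marginal += math.log(1.0)                   # flat stake, flat price
    return log_informed / rounds, log_marginal / rounds

rent = mutual_information()
informed_rate, marginal_rate = simulate_growth()
print("I(R;X) by quadrature     :", rent)
print("informed growth per round:", informed_rate)
print("marginal growth per round:", marginal_rate)
```

The marginal entrant's log-wealth never moves, while the informed entrant's average log payoff settles near the quadrature value of \(I(R;X)\), up to Monte Carlo error.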

The mechanism is real

This is not a thought experiment. The microprediction platform splits a pot toward the Monte Carlo samples nearest the realised value — nearest-the-pin — which is the discrete form of \(b(z)/q(z)\). The continuous version, pricing a density directly through the density package, was run as the MidOne contest at Crunch Labs (CrunchDAO). A conformal predictor dropped into either is the entrant who prices the residual pool flat in \(X\); anyone who conditions on \(X\) holds a guaranteed growth rate \(I(R;X)\) against it.

Measuring the rent

There are two ways to read the rent off a fitted predictor. A static lower bound, \(I(R;X)\ge \mathrm{HSIC}(U,X)/(2K)\), turns the distance-covariance detector into a certificate of a minimum wasted score. A sequential e-process, the informed bettor's wealth \(W_t=\prod_{s\le t} b_s(U_s\mid X_s)\), is an anytime-valid test for conditional miscoverage whose growth rate converges to \(I(R;X)\). Both are checked numerically by check_parimutuel.py.
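The sequential test can be sketched as follows. This is an independent illustration, not the contents of check_parimutuel.py: the bettor here is assumed to be a per-group histogram density learned online, the data model (binary \(X\), residual scales 1 and 3) is invented for the example, and all names are hypothetical. Each stake depends only on past rounds and integrates to one over \([0,1]\), so if \(U\) is uniform given \(X\) the wealth is a nonnegative martingale starting at 1, and by Ville's inequality it exceeds \(1/\alpha\) at any time with probability at most \(\alpha\).

```python
import math
import random

BINS = 10     # histogram resolution of the bettor's density (illustrative choice)
ALPHA = 0.05  # reject conditional coverage once wealth reaches 1 / ALPHA

def marginal_rank(r):
    """Rank U = F(R) under an equal mixture of N(0, 1) and N(0, 3^2)."""
    def phi(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 0.5 * phi(r / 1.0) + 0.5 * phi(r / 3.0)

def draw(rng, informative):
    x = rng.randrange(2)
    if informative:
        # Residual scale depends on X, so the rank is not uniform given X.
        u = marginal_rank(rng.gauss(0.0, 3.0 if x else 1.0))
    else:
        # Null case: rank is uniform and independent of X.
        u = rng.random()
    return u, x

def run_bettor(rounds, informative, seed=0):
    rng = random.Random(seed)
    counts = [[1] * BINS for _ in range(2)]  # one smoothed histogram per value of X
    log_wealth, peak = 0.0, 0.0
    for _ in range(rounds):
        u, x = draw(rng, informative)
        k = min(int(u * BINS), BINS - 1)
        # Predictable stake b_s(u | x): histogram density built from past rounds only.
        stake = BINS * counts[x][k] / sum(counts[x])
        log_wealth += math.log(stake)        # flat pool, so payoff = stake
        peak = max(peak, log_wealth)
        counts[x][k] += 1                    # learn only after the bet settles
    return log_wealth, peak

threshold = math.log(1.0 / ALPHA)
alt_final, alt_peak = run_bettor(5000, informative=True)
null_final, null_peak = run_bettor(5000, informative=False)
print("miscovered: final log-wealth", alt_final, "rejects:", alt_peak >= threshold)
print("null      : final log-wealth", null_final, "rejects:", null_peak >= threshold)
```

A histogram bettor only captures the binned part of the dependence, so its growth rate sits below \(I(R;X)\); a richer conditional density estimate would be needed to approach the full rate.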
