Demonstration

How split conformal works

We build the method and watch it deliver exactly what it promises.

Hold out a calibration set of $n$ points the model never trained on. For each calibration point compute a nonconformity score, here the absolute residual $s_i = |y_i - \hat\mu(x_i)|$. Take the $\lceil (n+1)(1-\alpha)\rceil$-th smallest of the $n$ scores, call it $q$ (if that rank exceeds $n$, set $q = \infty$), and emit the band $$C(x) = [\,\hat\mu(x) - q,\ \hat\mu(x) + q\,].$$ If a fresh point is exchangeable with the calibration points, this band covers its true $y$ with probability at least $1-\alpha$, averaged over the draw of calibration and test data. Move the sliders: the empirical coverage tracks the target.
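The recipe above can be sketched in a few lines of plain Python. This is a minimal illustration under assumptions, not the code behind the demo: the synthetic data ($y = 2x + \text{noise}$), the stand-in predictor `mu_hat`, and the helper names `conformal_quantile` and `draw` are all hypothetical.

```python
import math
import random

def conformal_quantile(scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest score (inf if that rank exceeds n)."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]  # rank is 1-indexed

def mu_hat(x):
    # Hypothetical stand-in for a model fitted on separate training data.
    return 2.0 * x

rng = random.Random(0)  # fixed seed so the run is repeatable

def draw(n):
    # Assumed data-generating process: y = 2x + Gaussian noise (sd 0.5).
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 0.5)) for x in xs]

alpha = 0.1
calib = draw(2000)  # calibration set, never used for fitting
scores = [abs(y - mu_hat(x)) for x, y in calib]
q = conformal_quantile(scores, alpha)

# Check coverage of C(x) = [mu_hat(x) - q, mu_hat(x) + q] on fresh draws.
test = draw(5000)
covered = sum(mu_hat(x) - q <= y <= mu_hat(x) + q for x, y in test)
coverage = covered / len(test)
print(f"q = {q:.3f}, empirical coverage = {coverage:.3f}")
```

The empirical coverage should land near $1-\alpha = 0.9$; it fluctuates from run to run because both the calibration and test sets are finite. Note that $(n+1)(1-\alpha)$ is computed in floating point here, so a production version might want exact arithmetic for the rank.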

Above: the calibration nonconformity scores $|y-\hat\mu(x)|$. The dashed line is the conformal quantile $q$; the band half-width is exactly that value. Coverage is achieved by counting, not by modeling.
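The counting argument can be written out in one line. As a sketch, assume the $n$ calibration scores and the fresh score $s_{n+1}$ are exchangeable and have no ties, and write $k = \lceil (n+1)(1-\alpha)\rceil \le n$. The fresh score is then equally likely to hold any of the $n+1$ ranks, and it falls at or below $q$ exactly when its rank is at most $k$:

$$\Pr\big(y_{n+1} \in C(x_{n+1})\big) = \Pr\big(s_{n+1} \le q\big) = \frac{k}{n+1} \ge 1-\alpha.$$

For example, with $n = 99$ and $\alpha = 0.1$ we get $k = \lceil 100 \times 0.9 \rceil = 90$ and coverage exactly $90/100$. With $n = 50$ we get $k = \lceil 45.9 \rceil = 46$ and coverage $46/51 \approx 0.902$. With ties the equality becomes an inequality, but the lower bound $1-\alpha$ still holds.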

Takeaway. It works, and it asks almost nothing of you, not even that $\hat\mu$ be any good. That generosity is the first clue. In the fence-is-the-horizon demo we exploit it: the same 90% guarantee ($\alpha = 0.1$) survives a deliberately terrible predictor. First, though, the marginal vs. conditional coverage page asks where that 90% actually lands.
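To see that the guarantee does not depend on the quality of $\hat\mu$, one can run the same procedure with a sensible predictor and with a deliberately bad one. The sketch below is illustrative only: the synthetic data, the constant predictor $\hat\mu(x) = -3$, and the helper names `evaluate` and `draw` are assumptions, and it is not the code of the fence-is-the-horizon demo.

```python
import math
import random

def conformal_quantile(scores, alpha):
    """ceil((n+1)(1-alpha))-th smallest score, or inf if that rank exceeds n."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return math.inf if rank > n else sorted(scores)[rank - 1]

def evaluate(mu_hat, calib, test, alpha):
    """Calibrate q on `calib`, then report (q, empirical coverage on `test`)."""
    scores = [abs(y - mu_hat(x)) for x, y in calib]
    q = conformal_quantile(scores, alpha)
    hits = sum(abs(y - mu_hat(x)) <= q for x, y in test)
    return q, hits / len(test)

rng = random.Random(1)  # fixed seed so the run is repeatable

def draw(n):
    # Assumed data-generating process: y = 2x + Gaussian noise (sd 0.5).
    pairs = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        pairs.append((x, 2.0 * x + rng.gauss(0.0, 0.5)))
    return pairs

calib, test = draw(2000), draw(5000)
alpha = 0.1

# A predictor that matches the true trend, and one that ignores x entirely.
q_good, cov_good = evaluate(lambda x: 2.0 * x, calib, test, alpha)
q_bad, cov_bad = evaluate(lambda x: -3.0, calib, test, alpha)

print(f"good predictor: q = {q_good:.2f}, coverage = {cov_good:.3f}")
print(f"bad predictor:  q = {q_bad:.2f}, coverage = {cov_bad:.3f}")
```

Both runs should cover close to 90% of the test points. What changes is the price: the bad predictor needs a much larger $q$, so its band is far wider and far less informative.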


Using conformal prediction in your own project? Tell Claude: “Read https://conformalprediction.net/SKILL.md and create a project skill from it.” It adds a check for whether your coverage is conditionally trustworthy.