Code
Conformal prediction recipes
Short, runnable Python for each use-case. Every recipe is the same idea: score a held-out calibration fold, take an order statistic, act on it.
Two assumptions throughout. The calibration data must be exchangeable with the test point (the one place to be careful: plain split conformal does not hold under drift, see the time-series recipe). And the calibration fold must be disjoint from whatever the model trained on. Here alpha is the miscoverage level, so alpha = 0.1 asks for 90% coverage. The dependency-free core of the first recipe is also on PyPI: pip install conformalguide.
Regression intervals (split conformal)
Wrap any point predictor in a band with finite-sample marginal coverage. This is how split conformal works.
import numpy as np
# model already fit on a training fold; calibrate on a disjoint fold
scores = np.sort(np.abs(y_cal - model.predict(X_cal))) # nonconformity scores
n = len(scores)
k = int(np.ceil((n + 1) * (1 - alpha))) # conformal rank (1-indexed)
q = np.inf if k > n else scores[k - 1] # the conformal quantile
lo = model.predict(X_test) - q
hi = model.predict(X_test) + q
# P(y in [lo, hi]) >= 1 - alpha, finite-sample, distribution-free
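The rank arithmetic is the part most worth checking by hand. A minimal sketch on made-up residuals, wrapping the same rule in a helper; the name conformal_quantile is hypothetical, chosen for illustration, and is not claimed to match the conformalguide package:

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest score, or +inf if that rank exceeds n."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # conformal rank (1-indexed)
    return np.inf if k > n else float(scores[k - 1])

# Invented calibration residuals 1, 2, ..., 19, so n = 19
q = conformal_quantile(np.arange(1, 20), alpha=0.1)      # rank ceil(20 * 0.9) = 18

# Too few calibration points: rank ceil(6 * 0.9) = 6 does not exist among 5 scores
q_small = conformal_quantile([1, 2, 3, 4, 5], alpha=0.1)

print(q, q_small)
```

Here q is the 18th smallest residual, 18.0, and q_small is infinite. At alpha = 0.1 a finite band needs at least 9 calibration points, since the rank ceil((n + 1) * 0.9) must not exceed n.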
Adaptive intervals (conformalized quantile regression)
Want the band to widen where the data are noisy? Fit conditional quantiles, then conformalize them. The adaptivity comes from the quantile model; conformal only re-levels coverage (the worked example shows this).
from sklearn.ensemble import GradientBoostingRegressor
import numpy as np
lo_m = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_tr, y_tr)
hi_m = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_tr, y_tr)
# conformity = how far the truth falls outside the predicted band, on calibration
s = np.maximum(lo_m.predict(X_cal) - y_cal, y_cal - hi_m.predict(X_cal))
n = len(s); k = int(np.ceil((n + 1) * (1 - alpha)))
q = np.inf if k > n else np.sort(s)[k - 1] # infinite band if the calibration fold is too small
lo = lo_m.predict(X_test) - q
hi = hi_m.predict(X_test) + q # adaptive width, marginal coverage >= 1 - alpha
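Because the CQR score is signed, the correction q can be negative: when the quantile band is too wide on calibration data, conformalizing shrinks it. A toy illustration with invented numbers, where a constant band [0, 10] stands in for the fitted quantile models (an assumption made only to keep the example dependency-light):

```python
import numpy as np

alpha = 0.25
y_cal = np.arange(1.0, 10.0)            # calibration targets 1, 2, ..., 9
lo_cal = np.zeros_like(y_cal)           # stand-in lower quantile predictions
hi_cal = np.full_like(y_cal, 10.0)      # stand-in upper quantile predictions

# Signed score: negative when the truth sits inside the band
s = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
n = len(s)
k = int(np.ceil((n + 1) * (1 - alpha)))      # ceil(10 * 0.75) = 8
q = np.inf if k > n else np.sort(s)[k - 1]   # 8th smallest score

lo, hi = 0.0 - q, 10.0 + q                   # re-levelled band for a new point
print(q, (lo, hi))
```

Every calibration point lies strictly inside [0, 10], so all scores are negative and q comes out as -1.0, tightening the band to [1, 9]. A strongly negative q can in principle push lo above hi, which should be read as an empty interval.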
Classification prediction sets (LAC)
Return a set of labels that contains the truth with probability at least 1 - alpha, small where the model is sure. The basis of adaptive prediction sets and selective triage.
import numpy as np
# probs_cal: (n, K) softmax on a calibration fold; y_cal: integer labels
s = 1 - probs_cal[np.arange(len(y_cal)), y_cal] # 1 - p(true class)
n = len(s); k = int(np.ceil((n + 1) * (1 - alpha)))
qhat = np.inf if k > n else np.sort(s)[k - 1] # keep every label if the calibration fold is too small
sets = [np.where(p >= 1 - qhat)[0] for p in probs_test] # labels confident enough to keep
# P(true label in set) >= 1 - alpha
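To see what the sets look like, here is the same recipe run on a tiny hand-made calibration fold. All probabilities and labels are invented for illustration, and alpha = 0.25 is chosen so the rank arithmetic is exact:

```python
import numpy as np

alpha = 0.25
probs_cal = np.array([
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.10, 0.10, 0.80],
    [0.60, 0.30, 0.10],
    [0.50, 0.40, 0.10],
    [0.05, 0.05, 0.90],
    [0.50, 0.25, 0.25],
])
y_cal = np.array([0, 1, 2, 0, 1, 2, 0])

s = 1 - probs_cal[np.arange(len(y_cal)), y_cal]   # 1 - p(true class)
n = len(s)
k = int(np.ceil((n + 1) * (1 - alpha)))           # ceil(8 * 0.75) = 6
qhat = np.inf if k > n else np.sort(s)[k - 1]     # 6th smallest score

probs_test = np.array([
    [0.5, 0.5, 0.0],   # two labels tie at the threshold
    [0.8, 0.1, 0.1],   # one confident label
    [0.4, 0.3, 0.3],   # no label clears the threshold
])
sets = [np.where(p >= 1 - qhat)[0] for p in probs_test]
print(qhat, [c.tolist() for c in sets])
```

With qhat = 0.5 the three test rows yield the sets [0, 1], [0], and an empty set. An empty set is a legitimate LAC output: it says no label was confident enough, which is itself a useful signal to route the input elsewhere.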
Anomaly detection (conformal p-values)
Turn any anomaly score into a test whose false-alarm rate is at most alpha. This is calibrated anomaly detection.
import numpy as np
cal = np.sort(inlier_scores) # calibration scores; higher = more anomalous
n = len(cal)
def conformal_pvalue(s):
    return (1 + np.count_nonzero(cal >= s)) / (n + 1)
flag = conformal_pvalue(new_score) <= alpha
# among true inliers, P(flag) <= alpha, distribution-free
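When many new scores arrive at once, the sorted calibration array allows a vectorized count. A sketch under the same convention (higher = more anomalous); the helper name conformal_pvalues and the scores are made up for illustration:

```python
import numpy as np

def conformal_pvalues(cal_sorted, new_scores):
    """Conformal p-values for an array of new scores, given ascending calibration scores."""
    cal_sorted = np.asarray(cal_sorted, dtype=float)
    n = len(cal_sorted)
    # side="left" gives the number of calibration scores strictly below each new score,
    # so n minus that is the number of calibration scores >= the new score
    n_ge = n - np.searchsorted(cal_sorted, new_scores, side="left")
    return (1 + n_ge) / (n + 1)

alpha = 0.1
cal = np.sort(np.arange(1.0, 20.0))          # invented inlier scores 1, 2, ..., 19
new_scores = np.array([19.5, 10.0, 0.5])
pvals = conformal_pvalues(cal, new_scores)   # 1/20, 11/20, 20/20
flags = pvals <= alpha
print(pvals, flags)
```

Only the first score, which exceeds every calibration score, is flagged. The smallest attainable p-value is 1 / (n + 1), so a level alpha below that can never raise a flag.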
Guaranteed-recall shortlist (screening, retrieval)
Keep a shortlist that retains each genuinely relevant item with probability at least 1 - alpha. This is guaranteed recall.
import numpy as np
pos = np.sort(relevant_scores) # scores of known relevant items (ascending)
n = len(pos)
k = int(np.floor(alpha * (n + 1))) # tolerate up to an alpha miss-rate
t = -np.inf if k < 1 else pos[k - 1] # keep-threshold
shortlist = np.where(test_scores >= t)[0]
# a new relevant item clears t with probability >= 1 - alpha
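A worked instance of the threshold rule on invented scores, including the small-sample case where no finite threshold is safe. The helper name keep_threshold is hypothetical:

```python
import numpy as np

def keep_threshold(relevant_scores, alpha):
    """k-th smallest relevant score with k = floor(alpha * (n + 1)), or -inf if k < 1."""
    pos = np.sort(np.asarray(relevant_scores, dtype=float))
    n = len(pos)
    k = int(np.floor(alpha * (n + 1)))
    return -np.inf if k < 1 else float(pos[k - 1])

# n = 19 known relevant items scored 1, 2, ..., 19; k = floor(0.25 * 20) = 5
t = keep_threshold(np.arange(1.0, 20.0), alpha=0.25)

test_scores = np.array([3.0, 5.0, 7.5, 4.9])
shortlist = np.where(test_scores >= t)[0]   # candidates at or above the threshold

# Only 3 relevant items at alpha = 0.1: k = floor(0.4) = 0, so keep everything
t_small = keep_threshold([0.2, 0.7, 0.9], alpha=0.1)
print(t, shortlist, t_small)
```

The threshold lands on the 5th smallest relevant score, 5.0, so candidates 1 and 2 survive. The guarantee concerns recall only; how many irrelevant items also clear t depends on how well the scores separate the two groups.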
Time series (adaptive conformal inference)
Exchangeability fails under drift, so the fixed quantile is replaced by an online one. ACI recovers long-run coverage, not per-step conditional coverage (exchangeability & time series).
import numpy as np
from collections import deque
alpha_t, gamma, W = alpha, 0.01, deque(maxlen=200) # trailing |residual| window
for x_t, y_t, pred_t in stream:
    # alpha_t can drift outside [0, 1]; at or below 0 the interval must be infinite
    q = np.inf if (not W or alpha_t <= 0) else np.quantile(W, max(1 - alpha_t, 0.0))
    covered = pred_t - q <= y_t <= pred_t + q
    alpha_t += gamma * (alpha - (0.0 if covered else 1.0)) # online level update
    W.append(abs(y_t - pred_t))
# empirical coverage -> 1 - alpha over time, even under drift
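A small simulation makes the long-run claim concrete. This is a sketch under stated assumptions: the drifting noise process, the constant forecaster, and the seed are all invented, and the two edge conventions are spelled out explicitly (infinite interval when alpha_t <= 0, empty interval when alpha_t >= 1).

```python
from collections import deque
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed
alpha, gamma, T = 0.1, 0.01, 5000
alpha_t, window = alpha, deque(maxlen=200)
hits = 0

for t in range(T):
    sigma = 1.0 + 4.0 * t / T             # drift: noise scale grows from 1 towards 5
    y_t = rng.normal(0.0, sigma)
    pred_t = 0.0                          # hypothetical constant forecaster
    if alpha_t <= 0 or not window:
        q = np.inf                        # cover everything
    elif alpha_t >= 1:
        q = -np.inf                       # empty interval, always a miss
    else:
        q = np.quantile(list(window), 1 - alpha_t)
    covered = bool(pred_t - q <= y_t <= pred_t + q)
    hits += int(covered)
    alpha_t += gamma * (alpha - (0.0 if covered else 1.0))  # online level update
    window.append(abs(y_t - pred_t))

coverage = hits / T
print(coverage)
```

With these conventions alpha_t stays within [-gamma, 1 + gamma], which bounds the gap between empirical coverage and 1 - alpha by (max(alpha, 1 - alpha) + gamma) / (gamma * T), here 0.91 / 50 = 0.0182, regardless of how the data drift. A larger gamma tightens that bound at a given T but makes the interval width jumpier.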
Every recipe is one order statistic of a calibration score. That is the whole method, and it is why it composes with any model. What conformal gives you is the coverage certificate; the sharpness of the band, set, or shortlist is still the model’s job (the paper makes that split precise).