Conformal Prediction

A cautionary example

Using conformal wrappers: a worked example

Do MAPIE, crepes, and the time-series conformal methods actually help? One controlled experiment on synthetic data.

The experiment runs the real libraries (MAPIE, crepes) and the strong time-series methods (ACI, AgACI, conformal PID, NexCP/weighted, EnbPI) in their intended modes, against plain probabilistic baselines and, since the data are synthetic with a known law, a true oracle. Every method is scored on conditional coverage, interval efficiency, and proper scores (interval/Winkler, CRPS), not on marginal coverage alone.
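For readers who want the two proper scores pinned down, here is a minimal sketch. It assumes the standard textbook forms (the interval score of a central 1 − α interval, and the closed-form CRPS of a Gaussian forecast); the exact normalisation used in benchmark/ is not stated here, and the function names are illustrative, not taken from that code.

```python
import math
from statistics import NormalDist

def interval_score(y, lower, upper, alpha=0.10):
    """Interval (Winkler) score of a central (1 - alpha) interval; lower is better."""
    width = upper - lower
    miss_below = max(lower - y, 0.0)   # how far y falls below the interval
    miss_above = max(y - upper, 0.0)   # how far y falls above the interval
    return width + (2.0 / alpha) * (miss_below + miss_above)

def crps_gaussian(y, mu, sigma):
    """Closed-form CRPS of a N(mu, sigma^2) forecast evaluated at y; lower is better."""
    std = NormalDist()
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * std.cdf(z) - 1.0)
                    + 2.0 * std.pdf(z)
                    - 1.0 / math.sqrt(math.pi))

# Illustrative values only (not benchmark data).
inside = interval_score(0.0, -1.0, 1.0)    # covered: the score is just the width
outside = interval_score(2.0, -1.0, 1.0)   # missed by 1.0: width plus (2 / alpha) * 1.0
sharp = crps_gaussian(0.0, 0.0, 1.0)       # well-centred, tight forecast
vague = crps_gaussian(0.0, 0.0, 3.0)       # well-centred, needlessly wide forecast
```

Both scores reward sharpness and punish misses, which is why an interval can hit 90% marginal coverage and still score badly.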

Code and full numbers: benchmark/. Means over 5 seeds, target coverage 0.90; compare within each table, not across.

Time series under drift

Figure: rolling coverage over time. Fixed split conformal (blue) collapses once the volatility regime shifts; the adaptive methods (ACI, conformal PID) and a simple EWMA-volatility Gaussian track the target. The plot shows coverage now, not just on average.
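| method | family | marg. cov | worst-window | cond. gap | interval score ↓ | CRPS ↓ |
|---|---|---|---|---|---|---|
| oracle (true μ,σ) | oracle | 0.90 | 0.82 | 0.02 | 5.52 | 0.76 |
| skaters (Gaussian) | prob | 0.89 | 0.70 | 0.04 | 6.57 | 0.86 |
| skaters + norm. conformal | cp | 0.89 | 0.69 | 0.04 | 6.58 | |
| EWMA-vol Gaussian | prob | 0.89 | 0.81 | 0.03 | 6.92 | 0.92 |
| conformal PID | cp | 0.90 | 0.82 | 0.01 | 7.00 | |
| ACI | cp | 0.90 | 0.83 | 0.01 | 7.00 | |
| true σ on biased μ̂ | oracle | 0.82 | 0.70 | 0.12 | 7.02 | 0.92 |
| NexCP (weighted) | cp | 0.88 | 0.70 | 0.05 | 7.08 | |
| AgACI | cp | 0.86 | 0.78 | 0.06 | 7.15 | |
| MAPIE EnbPI (online) | cp | 0.84 | 0.36 | 0.20 | 8.87 | |
| MAPIE ACI | cp | 0.84 | 0.36 | 0.20 | 8.87 | |
| GARCH(1,1) Gaussian | prob | 0.69 | 0.48 | 0.34 | 9.84 | 0.98 |
| skaters + split conformal | cp | 0.60 | 0.20 | 0.56 | 11.9 | |
| fixed split (CP) | cp | 0.56 | 0.17 | 0.60 | 13.8 | |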
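The gap between fixed split conformal and ACI can be reproduced in miniature. The sketch below is not the benchmark code. It assumes a toy series (zero-mean Gaussian noise whose standard deviation jumps from 1 to 3), a point forecast of zero, and the usual ACI update α_{t+1} = α_t + γ(α − err_t) applied to a rolling window of recent scores. All names and parameter values are illustrative.

```python
import math
import random

def simulate(n=3000, shift=1500, seed=0):
    """Toy drifting series: the noise scale jumps from 1 to 3 at index `shift`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0 if t < shift else 3.0) for t in range(n)]

def conformal_quantile(scores, alpha):
    """The ceil((n + 1)(1 - alpha))-th smallest score, or +inf if that rank exceeds n."""
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    return math.inf if k > n else sorted(scores)[k - 1]

def fixed_split_hits(y, n_cal=500, alpha=0.10):
    """Calibrate once on the first n_cal points and never update."""
    q = conformal_quantile([abs(v) for v in y[:n_cal]], alpha)
    return [abs(v) <= q for v in y[n_cal:]]

def aci_hits(y, n_cal=500, alpha=0.10, gamma=0.01):
    """ACI: nudge the working miscoverage level alpha_t after every hit or miss."""
    alpha_t, hits = alpha, []
    for t in range(n_cal, len(y)):
        scores = [abs(v) for v in y[t - n_cal:t]]   # rolling calibration window
        if alpha_t <= 0.0:
            q = math.inf                            # forced to cover
        elif alpha_t >= 1.0:
            q = -math.inf                           # empty interval, forced miss
        else:
            q = conformal_quantile(scores, alpha_t)
        hit = abs(y[t]) <= q
        hits.append(hit)
        alpha_t += gamma * (alpha - (0.0 if hit else 1.0))
    return hits

def rate(hits):
    return sum(hits) / len(hits)

y = simulate()
fixed, adaptive = fixed_split_hits(y), aci_hits(y)
post = 1500 - 500                                   # index of the shift within the hit lists
fixed_post, aci_post = rate(fixed[post:]), rate(adaptive[post:])
aci_all = rate(adaptive)
```

In this toy the fixed quantile is frozen at the pre-shift scale, so its post-shift coverage falls far below target, while the ACI update keeps long-run coverage pinned near 0.90. That is a statement about the average over time, not about any single step.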

Heteroscedastic regression

Figure: conditional coverage vs \(x\). Vanilla split conformal (blue) over-covers the easy region (100%) and under-covers the hard one (61%). The adaptive methods (crepes normalized) and a heteroscedastic Gaussian hug the target, because they condition on \(x\).
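| method | family | marg. | cov low-var | cov hi-var | cond. gap | interval ↓ | CRPS ↓ |
|---|---|---|---|---|---|---|---|
| oracle (true f,s) | oracle | 0.90 | 0.90 | 0.90 | 0.02 | 6.39 | 0.88 |
| MAPIE CQR | cp | 0.90 | 0.92 | 0.89 | 0.03 | 6.51 | |
| quantile GBR (no conformal) | prob | 0.89 | 0.90 | 0.88 | 0.03 | 6.52 | |
| crepes normalized | cp | 0.90 | 0.92 | 0.87 | 0.04 | 6.85 | |
| crepes CPS | cp | 0.91 | 0.92 | 0.88 | 0.04 | 6.86 | 0.90 |
| het-Gaussian (mean+var) | prob | 0.93 | 0.93 | 0.89 | 0.06 | 7.08 | 0.90 |
| MAPIE split (absolute) | cp | 0.90 | 1.00 | 0.73 | 0.17 | 8.32 | |
| crepes standard | cp | 0.90 | 1.00 | 0.73 | 0.17 | 8.32 | |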
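The split-versus-normalized contrast in this table has a simple mechanical cause, sketched below without MAPIE or crepes. The assumptions are all illustrative: a toy law y = 2x + s(x)·ε with s(x) = 0.5 + 2.5x, a least-squares line for the mean, and a second line fitted to absolute residuals as the difficulty estimate. Split conformal uses one half-width everywhere; the normalized variant scales the half-width by the difficulty estimate.

```python
import math
import random

def make_data(n, rng):
    """Toy heteroscedastic law: y = 2x + (0.5 + 2.5x) * eps, with x uniform on [0, 1]."""
    xs = [rng.random() for _ in range(n)]
    return [(x, 2.0 * x + (0.5 + 2.5 * x) * rng.gauss(0.0, 1.0)) for x in xs]

def fit_line(xs, ys):
    """Ordinary least squares for y ~ a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def conformal_quantile(scores, alpha):
    """The ceil((n + 1)(1 - alpha))-th smallest score, or +inf if that rank exceeds n."""
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    return math.inf if k > n else sorted(scores)[k - 1]

rng = random.Random(0)
train, cal, test = make_data(4000, rng), make_data(2000, rng), make_data(20000, rng)

# Conditional model: a mean line, then a line through the absolute residuals.
a, b = fit_line([x for x, _ in train], [y for _, y in train])
mean = lambda x: a + b * x
c, d = fit_line([x for x, _ in train], [abs(y - mean(x)) for x, y in train])
scale = lambda x: max(c + d * x, 1e-3)              # difficulty estimate, kept positive

# Conformal step: a single number in each case.
q_split = conformal_quantile([abs(y - mean(x)) for x, y in cal], 0.10)
q_norm = conformal_quantile([abs(y - mean(x)) / scale(x) for x, y in cal], 0.10)

def coverage(points, half_width):
    hits = [abs(y - mean(x)) <= half_width(x) for x, y in points]
    return sum(hits) / len(hits)

low = [p for p in test if p[0] < 0.5]               # low-variance half
high = [p for p in test if p[0] >= 0.5]             # high-variance half
split_low, split_high = coverage(low, lambda x: q_split), coverage(high, lambda x: q_split)
norm_low, norm_high = coverage(low, lambda x: q_norm * scale(x)), coverage(high, lambda x: q_norm * scale(x))
```

In both variants the conformal step contributes one scalar. Whatever dependence on x the intervals have comes from `scale`, that is, from the conditional model.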

What this one example shows

  1. Adaptive wrappers behave well here. CQR and crepes normalized/CPS reach near-oracle interval score and good conditional coverage; ACI and conformal PID recover coverage under drift where fixed split collapses. None of these is a straw man.
  2. But the adaptivity comes from the conditional model, not the conformal step. The cleanest tell: raw quantile-GBR with no conformal at all (6.52) ties conformalized quantile regression (6.51). Vanilla split conformal is identical to a static Gaussian. The conformal layer supplies the finite-sample marginal certificate; the sharpness comes from the model it wraps.
  3. A plain probabilistic model keeps pace on the proper score and returns a full distribution. An EWMA-vol Gaussian matches the conformal repairs on interval score (6.92 vs ~7.0); crepes CPS gets competitive CRPS because it is doing conditional distribution estimation.
  4. The author’s own forecaster behaves no differently. thinking_fast_and_slow, a timemachines skater that blends two EMAs into an adaptive predictive mean and standard deviation, lands at near-oracle CRPS (0.86) and the best non-oracle interval score (6.57). Conformalizing it changes nothing for the better: a fair adaptive (normalized) wrap re-levels coverage toward the 90% target (0.89 in the table) at an essentially identical score (6.58 vs 6.57), while a naive split-conformal wrap on the drifting series collapses to 60%. The value was already in the density it estimates, not in the conformal step.
  5. None achieves per-step conditional coverage: even the oracle’s worst window is ~0.82. The repairs deliver long-run/marginal coverage, never coverage now, consistent with the no-go results.
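Point 2 can be seen directly in how conformalized quantile regression works: the conformal step adds one constant offset Q to whatever quantile band the model supplies. The sketch below is illustrative only. Instead of a fitted quantile GBR it uses a hypothetical quantile model built from the true law of a toy problem, optionally shrunk by a factor `shrink` to imitate a model whose band is too narrow.

```python
import math
import random

Z90 = 1.6449  # standard normal 0.95 quantile, giving a central 90% band

def make_data(n, rng):
    """Toy law: y = 2x + (0.5 + 2.5x) * eps, with x uniform on [0, 1]."""
    xs = [rng.random() for _ in range(n)]
    return [(x, 2.0 * x + (0.5 + 2.5 * x) * rng.gauss(0.0, 1.0)) for x in xs]

def quantile_band(x, shrink):
    """Hypothetical quantile model: the true 5%/95% band, scaled by `shrink`."""
    half = shrink * Z90 * (0.5 + 2.5 * x)
    return 2.0 * x - half, 2.0 * x + half

def conformal_quantile(scores, alpha):
    """The ceil((n + 1)(1 - alpha))-th smallest score, or +inf if that rank exceeds n."""
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    return math.inf if k > n else sorted(scores)[k - 1]

def cqr_offset(cal, shrink, alpha=0.10):
    """CQR score: signed distance by which y falls outside the band (negative if inside)."""
    scores = []
    for x, y in cal:
        lo, hi = quantile_band(x, shrink)
        scores.append(max(lo - y, y - hi))
    return conformal_quantile(scores, alpha)

def coverage(points, shrink, offset):
    hits = 0
    for x, y in points:
        lo, hi = quantile_band(x, shrink)
        hits += (lo - offset <= y <= hi + offset)
    return hits / len(points)

rng = random.Random(0)
cal, test = make_data(4000, rng), make_data(20000, rng)

q_good = cqr_offset(cal, shrink=1.0)    # well-specified band: offset should be near zero
q_narrow = cqr_offset(cal, shrink=0.8)  # band too narrow: offset must be positive
raw_cov = coverage(test, 0.8, 0.0)      # the narrow band on its own
cqr_cov = coverage(test, 0.8, q_narrow) # the same band after the conformal offset
```

When the band is already right, Q is close to zero and the conformalized intervals coincide with the raw ones, which matches the tie between quantile GBR and CQR in the table. When the band is off, Q repairs the marginal level but cannot change the band's shape in x.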

The lesson, not a leaderboard. Conformal prediction’s marginal certificate is real and useful, but it is not distributional quality. Where a conformal wrapper is sharp, it is sharp because of the conditional model underneath; the conformal step adds the coverage guarantee on top. That is the mechanism this toy is built to expose, the empirical face of the paper’s residual-information gap.