Demonstration

laplace vs conformal on a time series

For univariate distributional forecasting, modelling the conditional spread beats wrapping a point forecaster in a conformal band — at the same coverage.

This is the Code page’s time-series note, made visual. A heteroscedastic series \(y_t=\text{level}_t+\sigma_t\,\varepsilon_t\) drifts gently in level while its conditional spread \(\sigma_t\) swings between calm and turbulent regimes. Two forecasters consume the same rolling-mean point forecast \(\hat y_t\) and are calibrated to the same marginal level on \([0,t_0]\). They differ only in the shape of the predictive.
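The setup above can be sketched in a few lines. This is a minimal illustration, not the demo's actual generator: the function names `simulate` and `rolling_mean_forecast`, the linear drift, the square-wave regime pattern, and every default value are assumptions chosen for readability.

```python
import random

def simulate(n=2000, contrast=3.0, regime_len=100, drift=0.002, seed=0):
    """Return (y, level, sigma) for y_t = level_t + sigma_t * eps_t.

    Assumed design (not taken from the demo): a linear drift in level and a
    square-wave sigma that is calm (1.0) for three regimes out of four and
    turbulent (1.0 + contrast) in the fourth. contrast=0 is homoscedastic.
    """
    rng = random.Random(seed)
    level = [drift * t for t in range(n)]
    sigma = [1.0 + contrast * ((t // regime_len) % 4 == 3) for t in range(n)]
    y = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(level, sigma)]
    return y, level, sigma

def rolling_mean_forecast(y, window=20):
    """One-step-ahead point forecast: mean of the previous `window` values."""
    yhat = []
    for t in range(len(y)):
        past = y[max(0, t - window):t]
        # No history exists at t = 0, so fall back to 0.0 as a placeholder.
        yhat.append(sum(past) / len(past) if past else 0.0)
    return yhat

y, level, sigma = simulate()
yhat = rolling_mean_forecast(y)
residuals = [obs - pred for obs, pred in zip(y, yhat)]
```

Both forecasters would then consume the same `yhat` and `residuals`; only the shape put around them differs.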

conformal (split): a constant half-width \(q\), the calibration quantile of the absolute residuals, giving \(\hat y_t\pm q\). The implied predictive is a Gaussian of constant scale. This is what wrapping a point forecaster (AutoARIMA, ETS, …) in crepes or MAPIE gives you.
laplace-style: an online conditional distribution. Here, illustratively, it is a RiskMetrics EWMA volatility rescaled by a single factor so that its calibration coverage hits the target. It stands in for any conditional-scale model, such as a t-GARCH fit or the skaters laplace filter; the principle is the same: model the conditional spread directly.
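A rough sketch of the two constructions, side by side. It does not call crepes, MAPIE, or skaters; the helper names, the hypothetical residual series, the EWMA decay of 0.94 (the usual RiskMetrics daily value), and the starting variance are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

def conformal_quantile(scores, alpha):
    """The ceil((n+1)(1-alpha))-th smallest score, clamped to the largest one."""
    ordered = sorted(scores)
    k = min(len(ordered), math.ceil((len(ordered) + 1) * (1 - alpha)))
    return ordered[k - 1]

def ewma_sigma(residuals, lam=0.94, init_var=1.0):
    """One-step-ahead EWMA volatility; sigma_t only uses residuals before t."""
    sigmas, var = [], init_var
    for r in residuals:
        sigmas.append(math.sqrt(var))      # forecast made before seeing r
        var = lam * var + (1 - lam) * r * r
    return sigmas

# Hypothetical residuals: calm (sd 1) with a turbulent block (sd 4) every fourth regime.
rng = random.Random(1)
resid = [rng.gauss(0.0, 4.0 if (t // 100) % 4 == 3 else 1.0) for t in range(2000)]
t0, alpha = 1000, 0.1                      # calibrate on [0, t0)
z = NormalDist().inv_cdf(1 - alpha / 2)

# Split conformal: one half-width q for every step (implied Gaussian scale q / z).
q = conformal_quantile([abs(r) for r in resid[:t0]], alpha)
conformal_half = [q] * len(resid)

# Laplace-style stand-in: EWMA sigma_t times a single factor, chosen so that the
# band z * scale * sigma_t reaches the target on the calibration segment.
sig = ewma_sigma(resid)
scale = conformal_quantile([abs(r) / s for r, s in zip(resid[:t0], sig[:t0])], alpha) / z
laplace_half = [z * scale * s for s in sig]

def coverage(half, lo, hi):
    """Fraction of steps in [lo, hi) whose residual falls inside the band."""
    return sum(abs(r) <= h for r, h in zip(resid[lo:hi], half[lo:hi])) / (hi - lo)
```

Both bands are tuned with the same quantile rule on the same calibration segment, so any difference afterwards comes from the shape: `conformal_half` is one number repeated, while `laplace_half` moves with the recent squared residuals.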

Both clear the marginal coverage target, so the conformal guarantee and the model’s calibration agree there. The readouts report, for each (conformal | laplace), marginal coverage, mean half-width (sharpness), mean log-likelihood, and CRPS. Turn up the volatility contrast and watch the laplace band breathe while the conformal band stays rigid; slide it back to zero and the two coincide, since homoscedastic data gives the conditional model nothing to exploit.
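The four readouts can be computed as below, assuming both predictives are Gaussian with mean \(\hat y_t\) and a per-step scale (constant for conformal, time-varying for laplace). The function names and the toy data are hypothetical; the CRPS line uses the standard closed form for a Gaussian predictive.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()

def gaussian_logpdf(y, mu, sigma):
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def gaussian_crps(y, mu, sigma):
    """Closed-form CRPS of N(mu, sigma^2) evaluated at y (lower is better)."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * STD.cdf(z) - 1) + 2 * STD.pdf(z) - 1 / math.sqrt(math.pi))

def readouts(y, mu, sigma, alpha=0.1):
    """Coverage, sharpness, mean log-likelihood, and mean CRPS for one forecaster."""
    z = STD.inv_cdf(1 - alpha / 2)
    rows = list(zip(y, mu, sigma))
    n = len(rows)
    return {
        "coverage": sum(abs(a - m) <= z * s for a, m, s in rows) / n,
        "mean_half_width": sum(z * s for _, _, s in rows) / n,
        "mean_loglik": sum(gaussian_logpdf(a, m, s) for a, m, s in rows) / n,
        "mean_crps": sum(gaussian_crps(a, m, s) for a, m, s in rows) / n,
    }

# Toy comparison on hypothetical data: the true sigma_t versus one pooled scale.
rng = random.Random(2)
sigma_true = [4.0 if (t // 100) % 4 == 3 else 1.0 for t in range(4000)]
y = [rng.gauss(0.0, s) for s in sigma_true]
mu = [0.0] * len(y)
pooled = math.sqrt(sum(v * v for v in y) / len(y))

conditional = readouts(y, mu, sigma_true)
constant = readouts(y, mu, [pooled] * len(y))
```

Log-likelihood penalises a mis-sized scale at every step, which is why it separates the two forecasters more sharply than coverage does.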

The second panel is the mechanism: the predictive half-width over time. The conformal band is a flat line; the laplace-style band tracks the true \(z\,\sigma_t\) (grey, dashed). That tracking is the whole difference.

Both bands clear the marginal level — the rigid one even over-covers, because a single width set from the calibration residuals is too wide for the calm majority. That extra width is not a free win: the constant-scale predictive still trails on log-likelihood. The laplace band instead spends its width where the uncertainty actually is, narrower in calm, wider in turbulent. The two stay close on CRPS, conformal’s home metric; the daylight opens up on log-likelihood.

The fair objection, and what survives it

A fair objection: frozen split conformal is a straw man for time series. Nobody holds a band fixed on a drifting process, though the failure is instructive for what the guarantee does and does not promise. The practical variants shown in the drift demo add forgetting: ACI adjusts the level online (Gibbs & Candès, 2021), rolling windows and weighted quantiles discount stale residuals (Barber et al., 2023). Adaptation is itself a form of conditional modelling, with the conformal step adding the marginal certificate on top, and working code for tacking it onto a forecaster is on the Code page. Two caveats survive the upgrade.
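For concreteness, here is a minimal sketch of the ACI update combined with a rolling window of scores. It follows the published update \(\alpha_{t+1}=\alpha_t+\gamma(\alpha-\mathrm{err}_t)\); the function name, the window and warm-up lengths, the step size, and the test series are assumptions, and this is not the Code page's implementation.

```python
import math
import random
from collections import deque

def aci_half_widths(residuals, alpha=0.1, gamma=0.01, window=200, warmup=50):
    """Adaptive conformal inference on absolute residuals.

    Returns (half_widths, misses) for every step after the warm-up.
    The working level alpha_t moves up after a cover and down after a miss.
    """
    scores = deque(maxlen=window)          # rolling window: stale scores drop out
    alpha_t = alpha
    widths, misses = [], []
    for r in residuals:
        s = abs(r)
        if len(scores) >= warmup:
            a = min(max(alpha_t, 0.0), 1.0)
            ordered = sorted(scores)
            k = math.ceil((len(ordered) + 1) * (1 - a))
            if k > len(ordered):
                q = math.inf               # level too demanding: cover everything
            elif k < 1:
                q = 0.0                    # level too lax: cover (almost) nothing
            else:
                q = ordered[k - 1]
            err = 1 if s > q else 0
            alpha_t += gamma * (alpha - err)   # the ACI step
            widths.append(q)
            misses.append(err)
        scores.append(s)                   # the score is only added after it is judged
    return widths, misses

# Hypothetical residuals with calm and turbulent regimes.
rng = random.Random(3)
resid = [rng.gauss(0.0, 4.0 if (t // 100) % 4 == 3 else 1.0) for t in range(4000)]
widths, misses = aci_half_widths(resid)
miss_rate = sum(misses) / len(misses)
```

The long-run miss rate is pulled toward \(\alpha\) by the update itself, whatever the data do. That is the marginal certificate; it says nothing about whether the width at any given step matches \(\sigma_t\).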

First, forgetting caps the sample. Discounting old residuals means the effective number of calibration points stops growing: a window of the last \(w\) residuals is a permanent \(n=w\), whatever the length of the series. So the coverage lottery never resolves, the Beta fan around the stamped level keeps the same width forever, and the knob has no good setting: forget quickly and coverage tracks the regime while the fan stays wide; forget slowly and the fan narrows onto a level the drift has already moved.
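The width of that fan can be put in numbers. Assuming the \(n\) scores in the window are exchangeable and continuous, the realised coverage of a split-conformal band follows a \(\mathrm{Beta}(k,\,n+1-k)\) distribution with \(k=\lceil (n+1)(1-\alpha)\rceil\). The helper names below are hypothetical, and under drift the exchangeability assumption is itself only approximate.

```python
import math

def coverage_beta_params(n, alpha=0.1):
    """Beta(a, b) law of realised coverage for n exchangeable calibration scores."""
    k = math.ceil((n + 1) * (1 - alpha))
    return k, n + 1 - k

def coverage_sd(n, alpha=0.1):
    """Standard deviation of realised coverage: the half-spread of the fan."""
    a, b = coverage_beta_params(n, alpha)
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# The spread depends on the window length only, not on how long the series runs.
fan = {w: coverage_sd(w) for w in (50, 200, 1000, 5000)}
```

With a window fixed at \(w\), the series can grow without limit and the spread stays at `coverage_sd(w)`; only a longer window shrinks it, at the cost of slower forgetting.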

Second, the adaptation rule is imposed a priori. A step size on \(\alpha\), a window length, a discount factor: these are simple update rules fixed in advance, competing at a task (tracking a moving conditional distribution) that the best adaptive time-series models are built and tuned for. On the open skaters benchmarks that competition runs continuously, and fixed a-priori width rules are not where it is won.

Takeaway. Both bands cover (the conformal one over-covers, by sitting too wide in calm stretches), CRPS is close, and log-likelihood is not: the conditional model wins because it crosses the information gap that conformalization, by construction, cannot. Slide the volatility contrast to zero and the gap closes; turn it up and the log-likelihood gap, the information gap, opens with it. The conformal band re-levels a single residual shape; it never learns where \(\sigma_t\) is large or small. So for univariate distributional time-series forecasting, prefer a conditional model. The one in this demo is a laplace-style filter (laplace from the skaters package), chosen for convenience only; any model that tracks the conditional scale, a t-GARCH fit, say, would likely do as well or better.

