Patterns & anti-patterns

How conformal prediction gets used, and misused

A taxonomy of sound patterns and anti-patterns, and a sampled estimate of how often the literature actually misapplies the method.

Conformal prediction is easy to apply and, it turns out, mostly applied well. To check that impression rather than assert it, we drew a systematic sample of 79 of the 549 papers in the field’s main bibliography (Awesome Conformal Prediction), read each abstract, and labelled how it uses the method against the rubric below. The per-paper labels are in data/cp_papers_labeled.tsv.

What we found

Sound use: 77%; loose framing: 18%; clear misuse: 3%; not CP / n/a: 3% (shares of all 79 sampled papers, rounded).

Of the 79 sampled papers, 61 were sound, 14 used loose framing, 2 clearly overstated or misapplied the guarantee, and 2 turned out not to use conformal prediction at all. Among the 77 papers that actually use it, that is roughly four in five sound, one in five loose, and about one in forty a clear misuse. The headline is reassuring: in the research papers themselves, misuse is rare. Loose framing tends to appear more in the popular layer (tutorials, courses, and vendor copy), which the curated lists leave out, than in the methods literature.

Caveats, so this is read for what it is: a single-rater, abstract-based judgement over a ~14% sample; the rubric defaults to “sound” unless a claim is clearly overstated; and tone is partly subjective. Treat the 3% clear-misuse figure (2 of 79 papers) as “low single digits,” not as a precise rate.
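The arithmetic behind these shares can be reproduced from the four counts reported above. The sketch below also attaches a Wilson score interval to the clear-misuse share to illustrate why "low single digits" is the right reading. The interval is our own illustration, not part of the original labelling exercise, and it assumes the sample behaves like a simple random draw, ignoring the finite population of 549 papers and the single-rater subjectivity.

```python
from math import sqrt

# Label counts as reported in the text above.
counts = {"sound": 61, "loose": 14, "misuse": 2, "not_cp": 2}

n_sampled = sum(counts.values())            # 79 papers read
n_using_cp = n_sampled - counts["not_cp"]   # 77 papers that actually use CP


def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion k / n."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Shares of the whole sample, rounded to whole percentages.
pct_of_sample = {k: round(100 * v / n_sampled) for k, v in counts.items()}

# Shares among the papers that actually use conformal prediction.
share_of_users = {k: v / n_using_cp for k, v in counts.items() if k != "not_cp"}

# Assumption: binomial sampling, so this interval is only indicative.
lo, hi = wilson_interval(counts["misuse"], n_using_cp)

print(pct_of_sample)
print({k: round(v, 3) for k, v in share_of_users.items()})
print(f"clear misuse among CP papers: {lo:.1%} to {hi:.1%}")
```

Under these assumptions the interval runs from below 1% to roughly 9%, which is wide relative to the point estimate. That is the reason to read the misuse rate as an order of magnitude, not a measured value.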

Sound patterns

These are uses where coverage or containment is genuinely the objective, or where the contribution is honest methodology. Counts are occurrences in the sample.

Methodology and theory (39): new methods, validity / efficiency / conditional-coverage results, and scores; correct by construction and honest about what the guarantee is.
The set is the deliverable (10): selective prediction, classification sets, retrieval and recall, screening shortlists; coverage is exactly what is wanted.
Safety and containment (7): robotics safe planning, reachable sets, runtime verification; the question is whether the truth lies inside a region.
A UQ layer with explicit caveats (4): conformal used as a calibration step, stating the marginal-coverage and exchangeability assumptions plainly.
Risk control and deferral (3): controlling a miss rate or false-negative rate, or deciding when to ask for help.
Anomaly detection as a test (3): conformal \(p\)-values as a distribution-free hypothesis test, where coverage is Type-I error control.
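To make concrete what "coverage is exactly what is wanted" means, here is a minimal sketch of split conformal prediction for regression. It is an illustration under stated assumptions, not a method taken from any of the sampled papers: the data are synthetic, `predict` is a hypothetical fixed predictor that is deliberately slightly wrong, and the score is the absolute residual. The guarantee it illustrates is marginal and relies on calibration and test points being exchangeable.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic to use
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def predict(x):
    # Hypothetical point predictor; the true slope below is 2.0, so it is biased.
    return 1.8 * x


def sample(n, rng):
    """Synthetic i.i.d. data: y = 2x + Gaussian noise."""
    xs = [rng.random() for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0, 0.3)) for x in xs]


rng = random.Random(0)
alpha = 0.1  # target miscoverage, i.e. 90% coverage
calib, test = sample(500, rng), sample(5000, rng)

# Calibrate: one number, the interval half-width.
q = conformal_quantile([abs(y - predict(x)) for x, y in calib], alpha)

# Prediction interval for a new x is [predict(x) - q, predict(x) + q].
coverage = sum(abs(y - predict(x)) <= q for x, y in test) / len(test)
print(f"half-width {q:.3f}, empirical coverage {coverage:.3f}")
```

The empirical coverage should land near the 90% target even though the predictor is biased. The bias shows up as a wider interval, not as lost coverage, and the point prediction itself is unchanged by the conformal step.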

Anti-patterns

These are the ways the guarantee gets overstated. Most showed up as loose framing rather than outright error; the two clear misuses were both of the first kind, the accuracy-improver.

The accuracy-improver (4): presenting conformalization as making the forecast itself better, or evaluating a conformal layer by accuracy / AUROC, when coverage is what it controls.
Coverage as quality (3): selling marginal coverage as “reliable” or “well-calibrated” uncertainty, with the marginal-versus-conditional distinction left out.
The conditional overclaim (2): implying per-instance or conditional validity that is not available distribution-free.
Exchangeability ignored (1): asserting the vanilla split-conformal guarantee on time series or shifted data without adaptation or caveat.
The silver bullet (1): framing conformal prediction as the answer to uncertainty quantification in general.
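The marginal-versus-conditional distinction behind the second and third anti-patterns can be seen in a small simulation. The sketch below is illustrative only and rests on assumptions of our own: synthetic data whose noise is ten times larger for `x >= 0.5`, a hypothetical predictor that knows the true mean, and plain split conformal with absolute-residual scores.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else sorted(scores)[k - 1]


def sample(n, rng):
    """Synthetic data: noise is ten times larger when x >= 0.5."""
    data = []
    for _ in range(n):
        x = rng.random()
        sd = 0.1 if x < 0.5 else 1.0
        data.append((x, 2.0 * x + rng.gauss(0, sd)))
    return data


def predict(x):
    # Hypothetical predictor that matches the true conditional mean.
    return 2.0 * x


rng = random.Random(0)
alpha = 0.1
calib, test = sample(2000, rng), sample(10000, rng)

# One global half-width, calibrated on the pooled residuals.
q = conformal_quantile([abs(y - predict(x)) for x, y in calib], alpha)


def coverage(points):
    return sum(abs(y - predict(x)) <= q for x, y in points) / len(points)


marginal = coverage(test)
low_noise = coverage([(x, y) for x, y in test if x < 0.5])
high_noise = coverage([(x, y) for x, y in test if x >= 0.5])

print(f"marginal {marginal:.3f}, low-noise {low_noise:.3f}, high-noise {high_noise:.3f}")
```

In this setup one would expect marginal coverage near 90%, with the low-noise half covered almost always and the high-noise half covered only around 80% of the time. The 90% statement holds on average over all inputs, so describing it as per-instance reliability overstates what was shown.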

None of this is a knock on the field; if anything it is a credit to it. The point of the guide is to keep that good track record by being clear about which of the two columns above, sound pattern or anti-pattern, a given use falls in, and the applications are where the method earns its keep.
