Data drift vs concept drift

A model is trained on a snapshot of the world, and the world keeps moving. Two different kinds of "moving" degrade it, and they're caught differently.

Data drift — the inputs change

Data drift is when the distribution of the inputs shifts away from what the model trained on: support tickets get longer, a new product floods the feed, prices double. The inputs look different even if the "right answer" rule is unchanged. You catch it by comparing input statistics (mean, category mix) between training and live data. Run the editor to see it: the live mean ticket length has drifted past the tolerance.
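
A minimal sketch of that comparison, not the lesson's own solution. The function names, the 20% relative tolerance, and the sample ticket data are all assumptions made up for illustration; a real check would pick statistics and thresholds that suit the feature.

```python
from collections import Counter
from statistics import mean


def mean_drifted(train_values, live_values, tolerance=0.2):
    """True when the live mean differs from the training mean by more than
    `tolerance`, measured as a fraction of the training mean.
    Assumes the training mean is non-zero."""
    train_mean = mean(train_values)
    live_mean = mean(live_values)
    relative_change = abs(live_mean - train_mean) / abs(train_mean)
    return relative_change > tolerance


def category_shares(labels):
    """Map each category to its share of the sample."""
    counts = Counter(labels)
    return {category: n / len(labels) for category, n in counts.items()}


def max_share_shift(train_labels, live_labels):
    """Largest absolute change in any category's share between the samples."""
    train = category_shares(train_labels)
    live = category_shares(live_labels)
    categories = set(train) | set(live)
    return max(abs(train.get(c, 0.0) - live.get(c, 0.0)) for c in categories)


# Hypothetical ticket lengths in words
train_lengths = [40, 55, 60, 45, 50]  # mean 50
live_lengths = [70, 85, 90, 75, 80]   # mean 80

# Hypothetical ticket categories
train_topics = ["billing", "billing", "login", "login"]
live_topics = ["billing", "login", "login", "login"]

print(mean_drifted(train_lengths, live_lengths))  # True: 60% above the training mean
print(max_share_shift(train_topics, live_topics))  # 0.25
```

Neither function looks at labels or predictions, which is the point: data drift is visible from the inputs alone.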

Concept drift — the relationship changes

Concept drift is sneakier: the inputs look normal, but the relationship between inputs and the label changes. The same email that wasn't spam last year is spam now; what counts as "fraud" shifts as fraudsters adapt. Your input stats look fine, yet accuracy quietly drops. You catch it by watching the model's accuracy on fresh labeled data, not by watching the inputs.
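
A rough sketch of an accuracy watch, assuming you saved a baseline accuracy at launch and can get a small batch of freshly labeled examples. The names, the 0.05 allowed drop, and the sample predictions are hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the fresh labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def accuracy_dropped(baseline_accuracy, predictions, labels, max_drop=0.05):
    """True when accuracy on fresh labeled data has fallen more than
    `max_drop` below the baseline measured at launch."""
    return baseline_accuracy - accuracy(predictions, labels) > max_drop


# Hypothetical batch: labels assigned this week vs. what the model predicted
fresh_labels = ["spam", "ham", "spam", "spam", "ham",
                "spam", "ham", "spam", "spam", "ham"]
predictions = ["spam", "ham", "ham", "ham", "ham",
               "spam", "ham", "ham", "spam", "ham"]

print(accuracy(predictions, fresh_labels))               # 0.7
print(accuracy_dropped(0.9, predictions, fresh_labels))  # True
```

The check needs fresh labels, not just fresh inputs. Without them, concept drift stays invisible.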

Telling them apart

  • Inputs shifted → data drift (compare input stats).
  • Inputs normal but accuracy falling → concept drift (compare accuracy over time).
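
The two rules above can be written as one small function. This is an illustrative sketch: `classify_drift` and its return strings are assumed names, and it checks the inputs first simply because the list does.

```python
def classify_drift(inputs_shifted: bool, accuracy_falling: bool) -> str:
    """Apply the two rules in the order they are listed."""
    if inputs_shifted:
        return "data drift"
    if accuracy_falling:
        return "concept drift"
    return "no drift detected"


print(classify_drift(inputs_shifted=True, accuracy_falling=False))   # data drift
print(classify_drift(inputs_shifted=False, accuracy_falling=True))   # concept drift
print(classify_drift(inputs_shifted=False, accuracy_falling=False))  # no drift detected
```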

Why a builder cares

"The model worked at launch and slowly got worse" is very often one of these. Knowing which one points you at the fix: data drift often calls for re-collecting or rescaling inputs; concept drift needs fresh labels and usually a retrain. You'll write the data-drift check and the classifier next.