Train/inference skew
Train/inference skew (a.k.a. training-serving skew) is the silent killer of ML systems: the model learns from one set of features at training time, but gets a different set at inference time. The accuracy you measured offline evaporates in production — and nothing errors.
Consider an example: the model was trained with an age feature, but
production can't supply it. The model now sees age as missing on every
live request.
How features drift between train and serve
- A feature is missing live: it was in your training table but the
  production request doesn't carry it (like age above).
- A feature is renamed: country at train, region at serve. Same count,
  different name, so a naive "did the number of features change?" check
  misses it.
- A value is computed differently: total is in dollars at training but
  cents at serving, or a default changed. Same name, skewed meaning.
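To make the renamed-feature case concrete, here is a minimal sketch of why a count-only check passes while a name-aware check fails. The feature lists are hypothetical and reuse the country/region names from the list above.

```python
# Hypothetical feature lists; "country" at train became "region" at serve.
train_features = ["country", "total"]
serve_features = ["region", "total"]

# Naive check: only asks whether the number of features changed.
count_check_passes = len(train_features) == len(serve_features)

# Name-aware check: asks whether the same features are present.
names_match = set(train_features) == set(serve_features)

print(count_check_passes)  # True: the rename goes unnoticed
print(names_match)         # False: the rename is caught
```

Neither check sees the third case (same name, different meaning), because both lists still contain total.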
The cheapest detector compares the feature sets (and ideally types)
between training and serving: train - serve shows what's missing live,
serve - train shows unexpected extras. Real systems also compare value
distributions, which is what it takes to notice a changed unit or
default; set comparison catches the most common breaks.
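The set comparison described above can be sketched in a few lines of Python. This is one possible shape, not a prescribed implementation: the function name compare_contracts, the idea of a "contract" as a dict from feature name to a type label, and the sample contracts are all assumptions made for illustration.

```python
def compare_contracts(train: dict, serve: dict) -> dict:
    """Compare two feature contracts (feature name -> type label).

    Hypothetical helper: the contract format is an assumption for this sketch.
    """
    # train - serve: features the model learned from that live requests lack.
    missing_live = sorted(train.keys() - serve.keys())
    # serve - train: features sent at serving that the model never saw.
    unexpected_extras = sorted(serve.keys() - train.keys())
    # Shared names whose declared type differs between the two sides.
    type_mismatches = {
        name: (train[name], serve[name])
        for name in sorted(train.keys() & serve.keys())
        if train[name] != serve[name]
    }
    return {
        "missing_live": missing_live,
        "unexpected_extras": unexpected_extras,
        "type_mismatches": type_mismatches,
    }


# Illustrative contracts only; not taken from any real system.
train_contract = {"age": "int", "country": "str", "total": "float"}
serve_contract = {"region": "str", "total": "int"}

report = compare_contracts(train_contract, serve_contract)
print(report["missing_live"])       # ['age', 'country']
print(report["unexpected_extras"])  # ['region']
print(report["type_mismatches"])    # {'total': ('float', 'int')}
```

A rename shows up here as one entry in missing_live plus one in unexpected_extras, which is a useful hint when reading the report. A unit change such as dollars versus cents would still pass unless the type label happened to differ, so this check complements a distribution comparison rather than replacing it.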
Why a builder cares
"It worked in the notebook but the live model is garbage" is very often
skew. Comparing the train and serve feature contracts before you ship,
and on a schedule after, turns a mysterious accuracy drop into a precise
"age is missing at serve." You'll write that comparison next.