Feature pipeline contracts
A model is trained on features — the named, typed inputs it learns
from (a refund's amount, the customer's country, an is_vip flag). A
feature pipeline is the code that turns raw data into those features.
The thing that keeps it from silently breaking is a contract: a
declared list of every feature with its name, type, and default.
For example, if a record is missing two features, the contract fills them from their declared defaults instead of producing a half-built input.
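Here is a minimal sketch of such a contract in Python, assuming each record is a plain dict. The names FeatureSpec, CONTRACT, and fill_defaults are hypothetical. The three features mirror the examples above, and the default values are illustrative choices, not recommendations.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FeatureSpec:
    """One declared feature: its name, expected type, and default value."""
    name: str
    dtype: type
    default: Any


# Hypothetical contract for a refund model; defaults are illustrative.
CONTRACT = [
    FeatureSpec("amount", float, 0.0),
    FeatureSpec("country", str, "unknown"),
    FeatureSpec("is_vip", bool, False),
]


def fill_defaults(record: dict, contract: list) -> dict:
    """Build a full input: every declared feature, defaulted when absent.

    Fields that the contract does not declare are dropped.
    """
    return {spec.name: record.get(spec.name, spec.default) for spec in contract}


# A record missing two of the three declared features.
raw = {"amount": 42.5}
print(fill_defaults(raw, CONTRACT))
# {'amount': 42.5, 'country': 'unknown', 'is_vip': False}
```

Because the output is built from the contract rather than from the record, the model always receives the same set of features in the same order, whatever the raw data looked like.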
Why a pipeline needs a contract
Without a contract, the failure is silent and downstream:
- A column gets renamed (country → region) and the model now reads a missing field as empty.
- A feature's type drifts (the string "50" instead of the float 50.0) and the math goes wrong without an error.
- A new data source omits a field and every row quietly defaults to garbage.
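The sketch below shows how a contract can surface all three failures at once. It assumes dict records and a hypothetical find_problems helper. The type check is deliberately strict (exact type match), so an int 50 would also be flagged for a float feature. A real pipeline might choose to coerce such values instead.

```python
from typing import Any

# Hypothetical contract: feature name -> expected Python type.
EXPECTED_TYPES = {
    "amount": float,
    "country": str,
    "is_vip": bool,
}


def find_problems(record: dict, expected: dict) -> list:
    """List every way a raw record deviates from the declared contract."""
    problems = []
    for name, dtype in expected.items():
        if name not in record:
            problems.append(f"missing: {name}")
        elif type(record[name]) is not dtype:
            # Exact type match, strict on purpose: "50" (str) and 50 (int)
            # are both flagged for a float feature.
            problems.append(
                f"wrong type: {name} is {type(record[name]).__name__}, "
                f"expected {dtype.__name__}"
            )
    for name in record:
        if name not in expected:
            # An undeclared field often means an upstream rename.
            problems.append(f"unexpected: {name}")
    return problems


# country was renamed to region, and amount arrived as a string.
drifted: dict[str, Any] = {"amount": "50", "region": "DE", "is_vip": False}
for problem in find_problems(drifted, EXPECTED_TYPES):
    print(problem)
# wrong type: amount is str, expected float
# missing: country
# unexpected: region
```

A rename shows up twice, as a missing declared feature and as an unexpected field. That pairing is often the quickest clue that a column was renamed rather than dropped.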
A contract is the schema you check raw data against before it reaches the model: the same "validate at the boundary" idea from earlier chapters, applied to ML features. In production, teams use tools such as Feast or a schema defined in their feature store; the underlying contract idea is the same, and it is what matters here.
Why a builder cares
Most "the model got worse in production and nobody changed the model" mysteries are feature-pipeline breaks. A contract turns those silent breaks into loud, catchable ones: missing feature → use the declared default (or reject the row); wrong type → flag it. You'll write a tiny contract-applier next. The habit to build is simple: declare your features, then enforce the declaration.
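Here is one possible shape for such a contract-applier. It is a hedged sketch, not a reference solution. It assumes dict records, a hypothetical apply_contract function, and a missing-feature policy of either "default" or "reject". Wrong types are flagged rather than silently coerced, and the caller decides what to do with a flagged row.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class FeatureSpec:
    name: str
    dtype: type
    default: Any


@dataclass
class Result:
    row: Optional[dict]                        # None when the row was rejected
    flags: list = field(default_factory=list)  # human-readable issues


# Hypothetical contract; defaults are illustrative.
CONTRACT = [
    FeatureSpec("amount", float, 0.0),
    FeatureSpec("country", str, "unknown"),
    FeatureSpec("is_vip", bool, False),
]


def apply_contract(record: dict, contract: list, on_missing: str = "default") -> Result:
    """Enforce the contract on one raw record."""
    if on_missing not in ("default", "reject"):
        raise ValueError(f"unknown missing-feature policy: {on_missing}")
    row, flags = {}, []
    for spec in contract:
        if spec.name not in record:
            if on_missing == "reject":
                # Stop at the first missing feature and drop the whole row.
                return Result(row=None, flags=[f"rejected: missing {spec.name}"])
            row[spec.name] = spec.default
            flags.append(f"defaulted: {spec.name}")
            continue
        value = record[spec.name]
        if type(value) is not spec.dtype:
            # Kept as-is but flagged, so the break is visible, not silent.
            flags.append(f"wrong type: {spec.name}")
        row[spec.name] = value
    return Result(row=row, flags=flags)


lenient = apply_contract({"amount": "50"}, CONTRACT)
strict = apply_contract({"amount": 50.0}, CONTRACT, on_missing="reject")
print(lenient.flags)
print(strict.row, strict.flags)
```

Returning flags instead of raising keeps the applier usable in batch jobs: you can count defaulted or mistyped rows per run and alert when those counts jump, which is exactly the signal a silent pipeline break would otherwise hide.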