
Feature pipeline contracts

A model is trained on features — the named, typed inputs it learns from (a refund's amount, the customer's country, an is_vip flag). A feature pipeline is the code that turns raw data into those features. The thing that keeps it from silently breaking is a contract: a declared list of every feature with its name, type, and default.
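As a minimal sketch in Python, a contract for the three example features might look like this. `Feature` and `REFUND_CONTRACT` are hypothetical names for illustration, not the API of Feast or any other feature store, and the defaults are assumed values.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Feature:
    """One entry in the contract: a feature's name, expected type, and default."""
    name: str
    dtype: type
    default: Any


# The contract: every feature the model expects, declared up front.
# Names and defaults are illustrative assumptions.
REFUND_CONTRACT = [
    Feature("amount", float, 0.0),
    Feature("country", str, "unknown"),
    Feature("is_vip", bool, False),
]

feature_names = [f.name for f in REFUND_CONTRACT]
print(feature_names)  # ['amount', 'country', 'is_vip']
```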

Run the example in the editor: the record is missing two features, but the contract fills them in from their declared defaults instead of passing a half-built input to the model.
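The fill-from-defaults behavior can be sketched as follows, using a hypothetical `Feature` contract (declared again so the block runs on its own). `fill_defaults` is an illustrative helper, not a library function: it emits exactly the contract's features, taking the record's value where present and the declared default otherwise.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Feature:
    name: str
    dtype: type
    default: Any


# Hypothetical contract for the refund example.
CONTRACT = [
    Feature("amount", float, 0.0),
    Feature("country", str, "unknown"),
    Feature("is_vip", bool, False),
]


def fill_defaults(record: dict, contract: list) -> dict:
    """Build the model input from the contract: record value if present, else the default."""
    return {f.name: record.get(f.name, f.default) for f in contract}


raw = {"amount": 42.5}  # missing 'country' and 'is_vip'
filled = fill_defaults(raw, CONTRACT)
print(filled)  # {'amount': 42.5, 'country': 'unknown', 'is_vip': False}
```

Because the output is built from the contract rather than from the record, a missing feature can never leave a hole, and any extra field in the raw data is simply dropped.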

Why a pipeline needs a contract

Without a contract, the failure is silent and downstream:

  • A column gets renamed (country → region) and the model now reads a missing field as empty.
  • A feature's type drifts (the string "50" instead of the float 50.0) and the math goes wrong without an error.
  • A new data source omits a field and every row quietly defaults to garbage.
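Under the same assumptions, a hypothetical `find_problems` helper shows how a contract makes each of these failures visible: a renamed column shows up as one declared feature missing plus one unexpected field, an omitted field shows up as missing, and type drift shows up as a type mismatch. The strict `isinstance` check is an assumption; a real pipeline might, for example, accept an `int` where a `float` is declared.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Feature:
    name: str
    dtype: type
    default: Any


CONTRACT = [
    Feature("amount", float, 0.0),
    Feature("country", str, "unknown"),
    Feature("is_vip", bool, False),
]


def find_problems(record: dict, contract: list) -> list:
    """List every way the record disagrees with the contract instead of failing silently."""
    problems = []
    declared = {f.name for f in contract}
    for f in contract:
        if f.name not in record:
            problems.append(f"missing: {f.name}")
        elif not isinstance(record[f.name], f.dtype):
            got = type(record[f.name]).__name__
            problems.append(f"wrong type: {f.name} is {got}, expected {f.dtype.__name__}")
    # Fields the contract never declared often mean an upstream rename.
    for key in record:
        if key not in declared:
            problems.append(f"unexpected: {key}")
    return problems


# A renamed column (country -> region) plus type drift ("50" instead of 50.0).
raw = {"amount": "50", "region": "DE", "is_vip": True}
problems = find_problems(raw, CONTRACT)
for p in problems:
    print(p)
# wrong type: amount is str, expected float
# missing: country
# unexpected: region
```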

A contract is the schema you check raw data against before it reaches the model — the same "validate at the boundary" idea from earlier chapters, applied to ML features. Real teams use tools like Feast or a schema in their feature store; the contract idea is identical and is what matters here.

Why a builder cares

Most "the model got worse in production and nobody changed the model" mysteries are feature-pipeline breaks. A contract turns those silent breaks into loud, catchable ones: missing feature → use the declared default (or reject the row), wrong type → flag it. You'll write a tiny contract-applier next; the core skill is to declare your features, then enforce the declaration.
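One possible shape for such a contract-applier is sketched below. The `required` field on `Feature` is an assumed per-feature policy (reject the row if the feature is missing, otherwise fall back to the default), and `apply_contract` is a hypothetical name. Wrong types are flagged rather than coerced, which leaves the decision to the caller.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class Feature:
    name: str
    dtype: type
    default: Any
    required: bool = False  # assumed policy flag: True means reject the row if missing


CONTRACT = [
    Feature("amount", float, 0.0, required=True),
    Feature("country", str, "unknown"),
    Feature("is_vip", bool, False),
]


def apply_contract(record: dict, contract: list) -> Tuple[Optional[dict], list]:
    """Enforce the contract on one raw record.

    Returns (features, flags); features is None when the row is rejected.
    """
    features, flags = {}, []
    for f in contract:
        if f.name not in record:
            if f.required:
                flags.append(f"rejected: required feature '{f.name}' is missing")
                return None, flags
            features[f.name] = f.default  # missing -> declared default
            flags.append(f"defaulted: {f.name}")
            continue
        value = record[f.name]
        if not isinstance(value, f.dtype):  # wrong type -> flag it, do not guess a fix
            flags.append(f"wrong type: {f.name}={value!r}")
        features[f.name] = value
    return features, flags


print(apply_contract({"amount": 12.0}, CONTRACT))
# ({'amount': 12.0, 'country': 'unknown', 'is_vip': False}, ['defaulted: country', 'defaulted: is_vip'])
print(apply_contract({"country": "FR"}, CONTRACT))
# (None, ["rejected: required feature 'amount' is missing"])
print(apply_contract({"amount": "50", "country": "FR", "is_vip": True}, CONTRACT))
# ({'amount': '50', 'country': 'FR', 'is_vip': True}, ["wrong type: amount='50'"])
```

Returning flags alongside the features keeps the breaks loud: a caller can log them, count them, or refuse to call the model when any `wrong type` flag appears.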