The shape of an inference endpoint
Serving a model means putting it behind an endpoint: a request comes
in, your code runs the model, a response goes out. Stripped of the
framework, it's a function — predict(request) -> response. In real code
that's a FastAPI route:
@app.post("/predict")
def predict(req: Request) -> Response:
    ...
but the shape is what matters, and you can reason about it in plain Python. Run the editor: one handler, a good request, and a bad one.
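Here is a minimal plain-Python sketch of that shape. It assumes a toy rule in place of a real model: the `score_transaction` helper, its 1000 cutoff, and the country set are hypothetical, chosen only so the example runs.

```python
def score_transaction(amount: float, country: str) -> float:
    """Hypothetical stand-in for a model: returns a fraud score in [0, 1]."""
    score = 0.8 if amount > 1000 else 0.1
    if country not in {"US", "GB"}:  # illustrative bump, not a real rule
        score += 0.1
    return round(min(score, 1.0), 2)


def predict(request: dict) -> dict:
    """The endpoint as a plain function: request in, response out."""
    score = score_transaction(request["amount"], request["country"])
    label = "fraud" if score >= 0.5 else "ok"
    return {"label": label, "score": score}


good = {"amount": 2500.0, "country": "US"}
bad = {"country": "US"}  # "amount" is missing

print(predict(good))  # {'label': 'fraud', 'score': 0.8}

try:
    predict(bad)
except KeyError as exc:
    # With no validation, the bad request fails inside the handler body.
    print(f"unvalidated request raised KeyError: {exc}")
```

The good request returns a structured response; the bad one surfaces as a raw `KeyError`, which a web framework would typically turn into a 500.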
Two halves of the contract
- Request: the features the model needs (amount, country, …).
- Response: the prediction plus metadata the caller can act on: the label, a score/confidence, maybe a model version. A bare "fraud" with no score is hard to threshold or debug.
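One way to write the two halves down is as typed records. This is a sketch, not a prescribed schema: the class names `PredictRequest` and `PredictResponse`, the `to_label` helper, and the `"v1-example"` version string are all assumptions made for illustration, and only the two fields named above are included.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PredictRequest:
    """Features the model needs (only the two named in the text)."""
    amount: float
    country: str


@dataclass(frozen=True)
class PredictResponse:
    """Prediction plus metadata the caller can act on."""
    label: str
    score: float
    model_version: str


def to_label(score: float, threshold: float = 0.5) -> str:
    """Map a score to a label; the 0.5 default is an arbitrary example."""
    return "fraud" if score >= threshold else "ok"


request = PredictRequest(amount=2500.0, country="US")
response = PredictResponse(label=to_label(0.62), score=0.62, model_version="v1-example")
print(asdict(response))

# Because the score travels with the label, a stricter caller can
# re-threshold the same response without calling the model again.
print(to_label(response.score, threshold=0.9))  # ok
```

The second call is the point of returning a score: the caller can apply its own threshold to the same response.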
Validate at the boundary
The endpoint is the trust boundary. Never assume the request is
well-formed: a field may be missing, a string may sit where a number belongs, or a value may be null.
Check inputs first and return a clear error (a 4xx in HTTP terms) instead
of letting request["amount"] throw deep inside the model. This is the
same "validate at system boundaries" rule from earlier chapters, now at
the serving edge.
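A sketch of validate-first in plain Python, assuming the request has already been parsed into a dict. The `validate` and `handle` names, the error-message wording, and the inline scoring rule are hypothetical. The status code is returned as a plain integer to mirror the "4xx in HTTP terms" idea without a framework.

```python
def validate(request: dict) -> list[str]:
    """Return a list of problems; an empty list means the request is usable."""
    errors = []
    if "amount" not in request:
        errors.append("amount is required")
    else:
        amount = request["amount"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(amount, bool) or not isinstance(amount, (int, float)):
            errors.append("amount must be a number")
    if "country" not in request:
        errors.append("country is required")
    elif not isinstance(request["country"], str):
        errors.append("country must be a string")
    return errors


def handle(request: dict) -> tuple[int, dict]:
    """Validate first; only well-formed requests reach the model."""
    errors = validate(request)
    if errors:
        return 400, {"error": "invalid request", "details": errors}
    # Hypothetical stand-in for the model call.
    score = 0.8 if request["amount"] > 1000 else 0.1
    return 200, {"label": "fraud" if score >= 0.5 else "ok", "score": score}


print(handle({"amount": 2500, "country": "US"}))     # well-formed
print(handle({"country": "US"}))                     # missing field
print(handle({"amount": "2500", "country": None}))   # wrong type and a null
```

Collecting every problem before returning, rather than stopping at the first, is a design choice: the caller can fix the whole request in one round trip.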
Why a builder cares
Most "the model API crashed in prod" pages trace back to unvalidated requests, not to the model itself. A handler that validates first and returns a structured response (prediction plus score, or a clean error) is the difference between a robust service and one that returns a 500 on the first malformed input. You'll build that handler next.