promptdojo_

The shape of an inference endpoint

Serving a model means putting it behind an endpoint: a request comes in, your code runs the model, a response goes out. Stripped of the framework, it's a function — predict(request) -> response. In real code that's a FastAPI route:

@app.post("/predict")
def predict(req: Request) -> Response:
    ...

but the shape is what matters, and you can reason about it in plain Python. Run the editor: one handler, a good request, and a bad one.

Two halves of the contract

  • Request — the features the model needs (amount, country, …).
  • Response — the prediction plus metadata the caller can act on: the label, a score/confidence, maybe a model version. A bare "fraud" with no score is hard to threshold or debug.

Validate at the boundary

The endpoint is the trust boundary. Never assume the request is well-formed — a missing field, a string where a number belongs, a null. Check inputs first and return a clear error (a 4xx in HTTP terms) instead of letting request["amount"] throw deep inside the model. This is the same "validate at system boundaries" rule from earlier chapters, now at the serving edge.

Why a builder cares

Most "the model API crashed in prod" pages are unvalidated requests, not the model. A handler that validates first and returns a structured response (prediction + score, or a clean error) is the difference between a robust service and one that 500s on the first weird input. You'll build that handler next.