Before the API: what the model is doing

Before you call a model API, you need the mental model from chapter zero in a more technical form.

A model call is not a database lookup. It is not a search engine. It is not a coworker with memory. It is a probability machine: given the context you provide, it assigns a probability to every possible next token, picks one, and repeats.
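To make "probability machine" concrete, here is a toy sketch of a single next-token step. The scores are made up for illustration and the function names are hypothetical; a real model computes scores over tens of thousands of tokens, and providers may sample differently.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def sample_next_token(logits, rng):
    """Pick one token at random, weighted by its probability."""
    probs = softmax(logits)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Made-up scores for the context "The capital of France is"
toy_logits = {" Paris": 5.0, " Lyon": 2.0, " a": 1.0}

probs = softmax(toy_logits)
rng = random.Random(0)  # seeded so repeated runs behave the same
samples = [sample_next_token(toy_logits, rng) for _ in range(1000)]
```

The likeliest token wins most of the time, but not every time. That is why two identical calls can return different text, and why nothing here resembles a lookup.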

That is why the same model can feel different in different products. The model is only one layer. Around it sits a harness: system instructions, tools, file access, retrieval, safety rules, output formatting, and history.

When you call an LLM API, you are building the smallest version of that harness yourself.

You choose:

the messages you send
the role and task
the context window
the output format
the tool or schema contract
the checks that decide whether the response is usable

Training is when the model learned broad patterns. Inference is this moment: you send context, the model returns output. If the output is wrong, the cause is usually not “the model is dumb.” It is often one layer closer, in the harness you built:

the task was vague
the context was missing
the format was loose
the examples pointed the wrong way
the check allowed a fluent answer to pass
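The last item is the easiest to fix in code. Below is a minimal sketch of a check that refuses a fluent but unusable answer. It assumes a hypothetical contract where the model must return a JSON object with a "label" and a "reason"; the label set and check_response are illustrative names.

```python
import json

REQUIRED_KEYS = {"label", "reason"}
ALLOWED_LABELS = {"billing", "bug", "other"}  # hypothetical label set

def check_response(text):
    """Return (ok, problem) for a raw model response."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(data, dict):
        return False, "not a JSON object"
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        return False, f"missing keys: {sorted(missing)}"
    label = data["label"]
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        return False, f"unknown label: {label!r}"
    return True, ""

fluent = "This looks like a billing issue, since the customer was charged twice."
usable = '{"label": "billing", "reason": "charged twice"}'

fluent_result = check_response(fluent)
usable_result = check_response(usable)
```

The fluent sentence reads well to a human and still fails, because the code downstream needs structure, not prose. A failed check is also a signal about which layer to fix: usually the format instruction or the examples.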

That is the bridge from “chat with AI” to “ship an AI feature.”