Before the API: what the model is doing — step 1 of 1
Before the API: what the model is doing
Before you call a model API, you need the mental model from chapter zero in a more technical form.
A model call is not a database lookup. It is not a search engine. It is not a coworker with memory. It is a probability machine generating the next useful tokens from the context you provide.
That is why the same model can feel different in different products. The model is only one layer. Around it sits a harness: system instructions, tools, file access, retrieval, safety rules, output formatting, and history.
When you call an LLM API, you are building the smallest version of that harness yourself.
You choose:
- the messages you send
- the role and task
- the context window
- the output format
- the tool or schema contract
- the checks that decide whether the response is usable
Training is when the model learned broad patterns. Inference is this moment: you send context, the model returns output. If the output is wrong, the fix is usually not “the model is dumb.” The fix is often one layer closer:
- the task was vague
- the context was missing
- the format was loose
- the examples pointed the wrong way
- the check allowed a fluent answer to pass
That is the bridge from “chat with AI” to “ship an AI feature.”