Reading the response — content blocks, stop_reason, usage — step 3 of 9
Five stop reasons, two that drive 99% of code
Your agent loop's main control flow IS branching on stop_reason.
Get this wrong and the loop either exits early or never exits.
Five values you'll see:
end_turn — the model is done
The model finished saying what it had to say. Read the text from
content, return it to the user, exit the loop. This is the
"normal" end state for non-agent calls.
tool_use — the model wants you to run something
The model emitted at least one tool_use block. Your code:
- Runs each tool by name with its input.
- Appends the assistant turn (including its tool_use blocks) to the history.
- Appends a user-role turn with tool_result blocks (one per tool_use_id).
- Calls the model again with the appended history.
The loop continues until end_turn (or you hit your iteration cap).
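The tool_use branch only fires if the request declared tools. A sketch
of the request side, with a hypothetical get_weather tool:

tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]
resp = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model ID
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
)
# resp.stop_reason is "tool_use" if the model decided to call get_weather.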
max_tokens — the response was cut off
You set max_tokens=1024 and the model wanted to write more. The
output is truncated mid-sentence. Two options:
- Bump max_tokens and retry the whole call (simple, sometimes wasteful).
- Continue the conversation by sending the truncated response back as the assistant's prior turn and asking "continue" (sketched below).
In production agents, you cap max_tokens aggressively and
detect this stop reason as a signal that the task was too big.
Catching it loudly beats silently shipping a half-answer.
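A sketch of the continue option (the second bullet above), reusing
client and messages from the earlier examples:

if resp.stop_reason == "max_tokens":
    print("WARNING: response truncated at max_tokens")  # make the cutoff loud
    messages.append({"role": "assistant", "content": resp.content})
    messages.append({"role": "user", "content": "Continue."})
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model ID
        max_tokens=1024,
        messages=messages,
    )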
stop_sequence — a custom stop hit
You passed stop_sequences=["\n\nUser:", "STOP"] and one of those
strings appeared in the output. Rare in production; common in
fine-tuned-model legacy code. If you see this, you know the
prompt set up custom delimiters; the response was intentionally
cut at one.
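The response also records which sequence fired, in its stop_sequence
field. A sketch:

resp = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model ID
    max_tokens=512,
    stop_sequences=["\n\nUser:", "STOP"],
    messages=[{"role": "user", "content": "Write a two-speaker dialogue."}],
)
if resp.stop_reason == "stop_sequence":
    print("cut at:", resp.stop_sequence)  # the matched string, not included in the output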
pause_turn — server-side tool ran out of time
Anthropic-only. When you use Anthropic's hosted tools (their
server runs the tool, not yours) and the tool's internal loop
caps out, you get pause_turn instead of an error. The fix: append
the paused assistant turn to your messages and call the API again;
the server resumes from where it paused.
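A sketch of the resume loop, using Anthropic's hosted web search tool
as the example; the tool type string is an assumption about the
current version:

hosted = [{"type": "web_search_20250305", "name": "web_search"}]  # version string may differ
resp = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model ID
    max_tokens=1024,
    tools=hosted,
    messages=messages,
)
while resp.stop_reason == "pause_turn":
    # Pass the paused turn back unchanged; the server picks up where it left off.
    messages.append({"role": "assistant", "content": resp.content})
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=hosted,
        messages=messages,
    )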
What this maps to in code
def extract_text(content):
    # Join the text blocks; a response can interleave text and tool_use blocks.
    return "".join(block.text for block in content if block.type == "text")

def drive_loop(response, messages, run_tool):
    if response.stop_reason == "end_turn":
        return extract_text(response.content), "done"
    if response.stop_reason == "tool_use":
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": run_tool(block.name, block.input),
                })
        # The assistant turn (with its tool_use blocks) must come before the results.
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})
        return None, "continue"
    if response.stop_reason == "max_tokens":
        return None, "truncated"
    # stop_sequence, pause_turn, refusal: handle as needed
    return None, response.stop_reason
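Wiring it into the loop, a hedged sketch: client, tools, messages, and
run_tool are assumed from the earlier examples, and the iteration cap
is arbitrary.

for _ in range(10):  # iteration cap: a stuck agent exits instead of looping forever
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model ID
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )
    text, state = drive_loop(resp, messages, run_tool)
    if state != "continue":
        break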
Two branches drive 99% of agent code. The others are exception
paths. The bug that step 7 fixes: hardcoding if response.stop_reason
== "stop", a value that doesn't exist in the API (it's a common
misremembering of end_turn).
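One cheap guard against that class of bug, as a sketch (the refusal
value is Anthropic's; adjust the set to your provider):

KNOWN_STOPS = {"end_turn", "tool_use", "max_tokens", "stop_sequence", "pause_turn", "refusal"}
if response.stop_reason not in KNOWN_STOPS:
    # Fail fast on a value the loop was never written to handle.
    raise ValueError(f"unexpected stop_reason: {response.stop_reason!r}")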
Anthropic vs OpenAI: the same field, slightly different vocabulary
OpenAI's Responses API exposes response.status (completed,
incomplete, failed) at the response level, while the older Chat
Completions API reports a per-choice finish_reason (stop, length,
tool_calls, content_filter).
Same idea, different vocabulary. Code that targets both providers
typically wraps stop_reason / finish_reason into a normalized
enum.
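A sketch of that normalization layer; the enum names and mappings
here are assumptions, not any standard:

from enum import Enum

class Stop(Enum):
    DONE = "done"
    TOOLS = "tools"
    TRUNCATED = "truncated"
    OTHER = "other"

# Provider-specific values folded into one vocabulary.
_ANTHROPIC = {"end_turn": Stop.DONE, "tool_use": Stop.TOOLS, "max_tokens": Stop.TRUNCATED}
_OPENAI = {"stop": Stop.DONE, "tool_calls": Stop.TOOLS, "length": Stop.TRUNCATED}

def normalize_stop(reason: str) -> Stop:
    return _ANTHROPIC.get(reason) or _OPENAI.get(reason) or Stop.OTHER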
The principle is universal: every modern LLM API tells you why it stopped, and your code branches on that field.