Schemas, Pydantic, and validation — making the model return real data — step 1 of 9
Free-form text is not data
The first time you ask Claude to "extract the user's email from this
support ticket," it returns exactly what you wanted: [email protected].
The second time, it returns "Sure! The email is [email protected]." The
third time, "The email address you're looking for is [email protected]."
Three different shapes. Your downstream code expected a string. Now it either has to regex-extract the email out of natural language every time, or fail.
This is why every production AI feature — without exception — uses structured output: you tell the model exactly what JSON shape to return, and you validate the response when it comes back.
The pattern AI ships every time:
import anthropic
from pydantic import BaseModel

class Ticket(BaseModel):
    email: str
    severity: int  # 1-5
    summary: str

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    messages=[{
        "role": "user",
        "content": "Extract email, severity (1-5), and summary as a JSON "
                   "object. Respond with JSON only, no prose. Ticket: <ticket text>",
    }],
)

raw = response.content[0].text
ticket = Ticket.model_validate_json(raw)  # parse + validate in one call
print(ticket.email)
Three pieces. The schema (the BaseModel class), the prompt (asking for
JSON), and the validation step (model_validate_json). All three matter.
Skip the schema and you're back to regex-on-natural-language. Skip the
validation and you'll find out about the model's hallucinated field at
3am from a NoneType error in production.
Browser note: Pydantic loads in Pyodide via
micropip.install("pydantic"), but for fast feedback we use plain dict validation in these drills. Same logic, spelled out, so you can read and write the pattern by hand. Switching to BaseModel later is a two-line change.
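Here is what that plain-dict validation looks like spelled out, mirroring the Ticket schema above. The helper name validate_ticket and its exact error messages are ours, not from the drills; the point is that each check Pydantic would do implicitly appears as an explicit line you can read.

```python
import json

def validate_ticket(raw: str) -> dict:
    """Parse raw JSON and check it against the Ticket shape by hand."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    errors = []
    if not isinstance(data.get("email"), str):
        errors.append("email must be a string")
    sev = data.get("severity")
    # bool is a subclass of int in Python, so reject it explicitly
    if not isinstance(sev, int) or isinstance(sev, bool) or not 1 <= sev <= 5:
        errors.append("severity must be an int from 1 to 5")
    if not isinstance(data.get("summary"), str):
        errors.append("summary must be a string")
    if errors:
        raise ValueError("; ".join(errors))
    return data

ticket = validate_ticket(
    '{"email": "[email protected]", "severity": 3, "summary": "login fails"}'
)
print(ticket["email"])  # [email protected]
```

Every check here is one line of BaseModel annotation, which is exactly why the switch later is so cheap.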
Where AI specifically gets this wrong
- Trusting the model on first try. Models lie. They drop required fields, return strings where you wanted ints, and invent enum values you never defined. Validate every response.
- Forgetting response_format / tool use. On OpenAI, the modern canonical way is Structured Outputs: response_format={"type": "json_schema", "json_schema": {...}} — the API guarantees the response conforms to your schema. The older {"type": "json_object"} mode only guarantees valid JSON, not your shape. On Anthropic you typically use a tool definition (or the newer output_format parameter). Without one, the model wraps its JSON in prose.
- Catching ValidationError too broadly. When Pydantic rejects a response, you usually want to retry with the error message back to the model — not silently fall through.
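That retry-with-feedback loop can be sketched as follows. The call_model callable is a stand-in for the client.messages.create call shown earlier (any function taking a message list and returning the model's text works), and extract_with_retry is our name for the helper; the key move is appending the ValidationError text back into the conversation rather than swallowing it.

```python
from pydantic import BaseModel, ValidationError

class Ticket(BaseModel):
    email: str
    severity: int  # 1-5
    summary: str

def extract_with_retry(call_model, prompt: str, max_attempts: int = 3) -> Ticket:
    """Ask the model for a Ticket; on validation failure, retry with the error."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_attempts):
        raw = call_model(messages)
        try:
            return Ticket.model_validate_json(raw)
        except ValidationError as exc:
            # Feed the specific failure back so the model can self-correct,
            # instead of silently falling through with bad data.
            messages.append({"role": "assistant", "content": raw})
            messages.append({
                "role": "user",
                "content": f"That JSON failed validation:\n{exc}\n"
                           "Return corrected JSON only.",
            })
    raise RuntimeError("model never produced valid JSON")
```

Capping attempts matters: a model that keeps failing the same check should surface as an error, not an infinite loop.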
Run the editor. We extract a name and validate the shape by hand.