What a harness is — the four layers under every coding agent — step 9 of 9
Checkpoint
One last thing before we move on. Same surface as a write step — but the lesson doesn't complete until this passes.
Final drill. Build a harness with all four layers measurable.
Write harness(user_input, tools, fake_model, config, max_iters)
that:
- Layer 1: hold
systemfrom config in a separatesystem_promptvariable (matches Anthropic's SDK shape:system=is a top-level kwarg, NOT a role inside messages). Buildmessageswith just the user turn. - Loop up to
max_iters. Layer 2: callfake_model(messages)viawith_retries(provided — retries onTransientError). Layer 3: filter content into text + tool_use blocks; tracktokens_usedfromresponse.get("usage", 0). Onend_turn: return a result dict including layer counters. Layer 4: dispatch each tool_use block; tracktool_calls_total. - Returns
{"text": str, "iters": int, "tool_calls": int, "tokens": int, "system_used": bool}. system_usedis True iff the config'ssystemkey was held insystem_prompt(i.e., would be sent via the SDK's top-levelsystem=kwarg).
Two cases run. Expected output:
text='Found ramen.' iters=2 tool_calls=1 tokens=130 system_used=True
text='capped' iters=2 tool_calls=2 tokens=180 system_used=False
⌘↵ runs the editor.read, then continue.
Checkpoint
One last thing before we move on. Same surface as a write step — but the lesson doesn't complete until this passes.
Final drill. Build a harness with all four layers measurable.
Write harness(user_input, tools, fake_model, config, max_iters)
that:
- Layer 1: hold
systemfrom config in a separatesystem_promptvariable (matches Anthropic's SDK shape:system=is a top-level kwarg, NOT a role inside messages). Buildmessageswith just the user turn. - Loop up to
max_iters. Layer 2: callfake_model(messages)viawith_retries(provided — retries onTransientError). Layer 3: filter content into text + tool_use blocks; tracktokens_usedfromresponse.get("usage", 0). Onend_turn: return a result dict including layer counters. Layer 4: dispatch each tool_use block; tracktool_calls_total. - Returns
{"text": str, "iters": int, "tool_calls": int, "tokens": int, "system_used": bool}. system_usedis True iff the config'ssystemkey was held insystem_prompt(i.e., would be sent via the SDK's top-levelsystem=kwarg).
Two cases run. Expected output:
text='Found ramen.' iters=2 tool_calls=1 tokens=130 system_used=True
text='capped' iters=2 tool_calls=2 tokens=180 system_used=False
this step needs the editor
on desktop today; in the app (coming soon). save your spot and we'll bring you back here when you're ready.