What a harness is — the four layers under every coding agent (step 9/9) · agent harnesses

Checkpoint

One last thing before we move on. Same surface as a write step — but the lesson doesn't complete until this passes.

Final drill. Build a harness with all four layers measurable. Write harness(user_input, tools, fake_model, config, max_iters) that:

Layer 1: hold system from config in a separate system_prompt variable (matches Anthropic's SDK shape: system= is a top-level kwarg, NOT a role inside messages). Build messages with just the user turn.
Loop up to max_iters. Layer 2: call fake_model(messages) via with_retries (provided — retries on TransientError). Layer 3: filter content into text + tool_use blocks; track tokens_used from response.get("usage", 0). On end_turn: return a result dict including layer counters. Layer 4: dispatch each tool_use block; track tool_calls_total.
Returns {"text": str, "iters": int, "tool_calls": int, "tokens": int, "system_used": bool}.
system_used is True iff the config's system key was held in system_prompt (i.e., would be sent via the SDK's top-level system= kwarg).

Two cases run. Expected output:

text='Found ramen.' iters=2 tool_calls=1 tokens=130 system_used=True
text='capped' iters=2 tool_calls=2 tokens=180 system_used=False

⌘↵ runs the editor.read, then continue.

Checkpoint

One last thing before we move on. Same surface as a write step — but the lesson doesn't complete until this passes.

Final drill. Build a harness with all four layers measurable. Write harness(user_input, tools, fake_model, config, max_iters) that:

Layer 1: hold system from config in a separate system_prompt variable (matches Anthropic's SDK shape: system= is a top-level kwarg, NOT a role inside messages). Build messages with just the user turn.
Loop up to max_iters. Layer 2: call fake_model(messages) via with_retries (provided — retries on TransientError). Layer 3: filter content into text + tool_use blocks; track tokens_used from response.get("usage", 0). On end_turn: return a result dict including layer counters. Layer 4: dispatch each tool_use block; track tool_calls_total.
Returns {"text": str, "iters": int, "tool_calls": int, "tokens": int, "system_used": bool}.
system_used is True iff the config's system key was held in system_prompt (i.e., would be sent via the SDK's top-level system= kwarg).

Two cases run. Expected output:

text='Found ramen.' iters=2 tool_calls=1 tokens=130 system_used=True
text='capped' iters=2 tool_calls=2 tokens=180 system_used=False

this step needs the editor

on desktop today; in the app (coming soon). save your spot and we'll bring you back here when you're ready.

open this same url on a laptop to keep going today.