The harness-engineering mindset — Agent equals Model plus Harness — step 1 of 8
The two-year debate that missed the other half
From late 2023 through most of 2025, the developer-tools discourse on Twitter was a single recurring fight: which model is smartest? GPT-4 vs Claude 3 Opus vs Gemini 1.5. Then Sonnet 3.5 vs GPT-4o. Then Sonnet 4 vs o1 vs Gemini 2. Eval screenshots flew daily. Vibe-coding races got posted hourly. People switched providers based on which one cleared a debug task two seconds faster.
Almost nobody talked about the rest of the system.
In May 2026, Viv Trivedy (@Vtrivedy10 on X) posted the line that named the missing half:
"Agent = Model + Harness. If you're not the model, you're the harness."
Addy Osmani's Agent Harness Engineering essay (May 2026) used the line as a load-bearing citation and built the whole framing around it. Birgitta Böckeler at Thoughtworks had been mapping the same territory under the same word since April. HumanLayer's Skill Issue: Harness Engineering for Coding Agents had landed earliest, on March 12, 2026. All three pieces converged on the same thesis within the space of a single spring: the model is one input into the system. The harness is everything else, and the harness is what you actually engineer.
What "harness" actually means
A harness is every piece of code, configuration, and execution logic that isn't the model itself:
- The system prompt (and CLAUDE.md / AGENTS.md / skill files that build it)
- The tool definitions, MCP servers, custom CLIs the model can call
- The bundled infrastructure: filesystem access, sandboxes, headless browsers, databases
- The orchestration logic: subagents, plan-then-execute splits, handoffs, routing
- The hooks and middleware: pre-tool guards, post-edit checks, type-check backpressure
- The observability: traces, logs, cost tracking, latency budgets, eval rigs
Chapter 26 walked through the four layers — input prep, model call, output parsing, tool dispatch. That's the loop. The harness is the loop plus everything wrapped around it that decides what the loop actually does.
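To make the loop/harness distinction concrete, here is a minimal sketch of the four layers with two harness pieces (a system-prompt builder and a pre-tool guard hook) wrapped around them. Every name here is hypothetical and illustrative; `call_model` and `dispatch_tool` are stubs standing in for a real model SDK and real tool implementations, not any actual framework's API.

```python
# Illustrative sketch only: all functions and names are hypothetical stand-ins.

def call_model(messages):
    # Layer 2: the model call. Stubbed with a canned tool request.
    return {"tool": "read_file", "args": {"path": "README.md"}}

def dispatch_tool(name, args):
    # Layer 4: tool dispatch. A real harness routes to filesystem, bash, etc.
    return f"<contents of {args['path']}>"

def pre_tool_guard(name, args):
    # Harness, not model: a hook that vetoes dangerous calls before dispatch.
    return not (name == "bash" and "rm -rf" in args.get("cmd", ""))

def run_agent(task, max_steps=3):
    # Layer 1: input prep. The harness builds the context, not the model.
    messages = [{"role": "system", "content": "You are a coding agent."},
                {"role": "user", "content": task}]
    transcript = []
    for _ in range(max_steps):
        reply = call_model(messages)               # layer 2: model call
        tool, args = reply["tool"], reply["args"]  # layer 3: output parsing
        if not pre_tool_guard(tool, args):         # harness: pre-tool hook
            transcript.append((tool, "BLOCKED"))
            continue
        result = dispatch_tool(tool, args)         # layer 4: tool dispatch
        transcript.append((tool, result))
        messages.append({"role": "tool", "content": result})
    return transcript
```

Note where the guard sits: outside the model, inside the loop. That placement is the whole point of the framing.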
Why the framing matters now
Two years of "which model is best" debates produced a generation of teams that spent zero engineering effort on the harness. Their reflex when an agent failed was always the same: wait for a smarter model. Sometimes they got one. Most of the time the next model release fixed the wrong thing — it could solve harder LeetCode problems but still committed half-written code, still ran destructive bash without confirmation, still got lost on step 32 of a 40-step task.
Those aren't model failures. Those are harness failures. You don't fix them with a new API key. You fix them with engineering.
The remaining four lessons of this chapter are about that engineering. This first lesson is the mindset shift: stop blaming the model first, learn to look at the harness first.
What this lesson teaches
- The six pieces every modern coding-agent harness has.
- The four-reflex triage for diagnosing where a failure actually lives.
- The harness gap — empirical evidence that the same model in two harnesses scores wildly differently.
- A scoring function that audits a harness inventory and tells you what's missing.
By the end you should be able to look at any agent setup — your own, your team's, a screenshot of someone else's CLAUDE.md — and instantly answer the audit question it raises: of the six pieces, how many are actually present in YOUR setup?
That question is harness engineering's version of "have you tried turning it off and on again." It is the first question to ask before any other.
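As a sketch of what that audit could look like in code, here is a hypothetical scoring function over the six pieces listed above. The piece names mirror this lesson's list, but the function itself, its inventory format, and the idea of an unweighted score out of six are assumptions for illustration, not the scoring function the later lessons present.

```python
# Hypothetical audit sketch: piece names mirror the six-item list above;
# the inventory format and scoring scheme are illustrative assumptions.

HARNESS_PIECES = [
    "system_prompt",   # CLAUDE.md / AGENTS.md / skill files
    "tools",           # tool definitions, MCP servers, custom CLIs
    "infrastructure",  # sandboxes, headless browsers, databases
    "orchestration",   # subagents, plan-then-execute splits, routing
    "hooks",           # pre-tool guards, post-edit checks, backpressure
    "observability",   # traces, logs, cost tracking, eval rigs
]

def audit_harness(inventory):
    """Score a harness inventory: which of the six pieces are present?

    `inventory` maps piece name -> bool. Returns (score out of 6,
    list of missing pieces in the order listed above).
    """
    present = {p for p in HARNESS_PIECES if inventory.get(p, False)}
    missing = [p for p in HARNESS_PIECES if p not in present]
    return len(present), missing

# Example: a typical "just a system prompt and some tools" setup
# scores 2/6, with the other four pieces flagged as missing.
score, missing = audit_harness({"system_prompt": True, "tools": True})
```

A setup that scores 2/6 is not a broken agent; it is an unengineered harness, which is exactly the condition the rest of this chapter addresses.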