The harness-engineering mindset — Agent equals Model plus Harness — step 4 of 8
"It's probably fine. It's just a skill issue."
HumanLayer's *Skill Issue: Harness Engineering for Coding Agents* (May 2026) opens with a sentence that the AI engineering community has been quoting at each other ever since:
"It's not a model problem. It's a configuration problem."
The post is a writeup from a team that runs dozens of agent projects and hundreds of sessions per week. Their pattern across all of them: the developer's first instinct when an agent fails is to blame the model. Their second instinct is to wait for a better model. Both instincts cost them weeks before they switched to the harness-first reflex.
The reframe is brutal: when your coding agent isn't performing the way you expect, before you blame the model, check the harness. The team coined the phrase "skill issue" half-jokingly — it's slang for "you played badly, the game's fine." Applied to agents, the parallel works: the model is probably fine; the harness is probably misconfigured.
Why blaming the model is the wrong reflex
Two reasons it persists, both wrong:
- The model is the visible variable. When the developer changes providers (Claude → GPT-5 → Gemini), they can see themselves doing something. When they change the harness, the work looks like editing config — slower, less satisfying, harder to take credit for. So they reach for what feels like action.
- The model is the marketing variable. Every model launch comes with eval screenshots that imply "this one solves your problems." It rarely does. The eval is in a different harness than yours, on tasks that look nothing like yours.
The harness-first reflex is empirically faster. HumanLayer reports that most of the agent failures they tracked over six months resolved through harness tuning, not model upgrades. The model wasn't the bottleneck; the harness around it was.
The four-reflex menu
When you observe an agent failure, the harness-engineering reflex is a four-way dispatch. Each failure type maps to one canonical fix:
Reflex 1 — Agent ignored a team convention
The model wrote inline styles, committed without running tests, used `var` instead of `const`, called the wrong logger. The team has a rule. The model didn't follow it.
Fix: Add the rule to `AGENTS.md` / `CLAUDE.md`. Make sure it's traceable to the failure. One sentence, one file, one PR.
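What a traceable rule looks like in practice — these entries are illustrative, not taken from the post, and the file paths are placeholders for your own project's:

```markdown
## Conventions

- Styling: use CSS modules; never write inline `style` attributes.
- Before committing: run the test suite and include its output in your summary.
- Logging: use the shared logger module; never call `console.log` directly.
```

Each line exists because an agent once violated it. That's the traceability test: if you can't name the failure a rule prevents, it probably doesn't belong in the file.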
Reflex 2 — Agent ran a destructive command
The model ran `git push --force` to main, `rm -rf` somewhere it shouldn't, `DROP TABLE` in a real DB. The first time, the model "couldn't have known." The second time, the harness should have stopped it.
Fix: Write a pre-tool hook. Block the command. Log the attempt. Surface the block to the model so it can choose a safer path on the next turn.
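A minimal pre-tool hook, sketched in Python. The payload shape (`tool_input.command`) and the exit-code-2 convention follow Claude Code's hook protocol — check your own harness's docs, since both details vary. The blocklist patterns are examples, not a complete policy:

```python
import json
import re
import sys

# Commands the harness refuses to run, no matter what the model asks for.
BLOCKLIST = [
    (re.compile(r"git\s+push\b.*--force"), "force-push is blocked; open a PR instead"),
    (re.compile(r"\brm\s+-rf\s+[~/]"), "recursive delete outside the workspace is blocked"),
    (re.compile(r"\bdrop\s+table\b", re.IGNORECASE), "destructive SQL is blocked outside migrations"),
]

def check(command: str):
    """Return the reason to block this command, or None to allow it."""
    for pattern, reason in BLOCKLIST:
        if pattern.search(command):
            return reason
    return None

def main() -> int:
    # Hook payload arrives as JSON on stdin: {"tool_name": ..., "tool_input": {...}}
    event = json.load(sys.stdin)
    command = event.get("tool_input", {}).get("command", "")
    reason = check(command)
    if reason is None:
        return 0
    # stderr is surfaced to the model on its next turn, so it can pick a safer path.
    print(f"BLOCKED: {reason}", file=sys.stderr)
    return 2  # exit code 2 = cancel the tool call

# When installed as a hook, run it:
# sys.exit(main())
```

Note the three parts map directly to the fix above: the pattern match blocks, the `print` to stderr logs and surfaces, the exit code tells the harness to cancel.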
Reflex 3 — Agent got lost in a long task
The model started strong on a 40-step refactor, drifted by step 20, declared victory by step 32 without doing the rest. Context filled, attention rotted, completion criteria slipped.
Fix: Split the workflow. Planner subagent writes a plan file; executor subagent runs the plan step by step with its own fresh context window per step. Anthropic's pattern, Claude Code's pattern, every long-horizon production harness's pattern.
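The split can be sketched as two functions that share only the plan text. `call_model` here is a stand-in for whatever provider API you use, the prompts are illustrative, and persisting the plan to a file is elided:

```python
from typing import Callable, List

ModelCall = Callable[[str], str]  # stand-in for your provider's API

def plan_task(call_model: ModelCall, task: str) -> List[str]:
    """Planner: sees the whole task once, emits one step per line."""
    raw = call_model(
        "Break this task into small, independently verifiable steps, "
        f"one per line:\n{task}"
    )
    return [line.strip() for line in raw.splitlines() if line.strip()]

def execute_plan(call_model: ModelCall, steps: List[str]) -> List[str]:
    """Executor: one call per step, fresh context each time. Only the step
    text crosses the boundary, so step 32 is as focused as step 1."""
    results = []
    for i, step in enumerate(steps, start=1):
        results.append(call_model(
            f"Step {i}/{len(steps)}: {step}\nDo only this step, then stop."
        ))
    return results
```

The design choice that matters is the boundary: the executor never sees the planner's full reasoning, only the plan, which is what keeps the context fresh per step.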
Reflex 4 — Agent shipped broken code
The model edited TypeScript, said "done." You ran `tsc`. Twelve errors. The agent had no idea.
Fix: Wire a post-edit hook. After every edit, run the type-check. On failure, inject the errors into the next model turn. This is type-check backpressure — the model self-corrects without you in the loop. Lesson 4 builds this concretely.
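A sketch of the post-edit side, again in Python. The function is checker-agnostic; for a TypeScript repo the natural command is `npx tsc --noEmit`, but any check that exits non-zero on failure works:

```python
import subprocess
from typing import List, Optional

def post_edit_feedback(check_cmd: List[str], cwd: str = ".") -> Optional[str]:
    """Run the project's type-check after an edit. Returns None on success,
    or a message to inject into the model's next turn on failure."""
    proc = subprocess.run(check_cmd, cwd=cwd, capture_output=True, text=True)
    if proc.returncode == 0:
        return None
    errors = (proc.stdout + proc.stderr).strip()
    return (
        "Your last edit does not type-check. Fix these errors before doing "
        f"anything else:\n{errors}"
    )

# For a TypeScript repo:
#   post_edit_feedback(["npx", "tsc", "--noEmit"])
```

The returned string is the backpressure: appended to the next turn, it turns "said done, shipped twelve errors" into a self-correcting loop.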
The mental map
| Failure shape | Canonical fix | Harness piece |
|---|---|---|
| Ignored convention | Add rule to AGENTS.md | config |
| Destructive command | Pre-tool hook | hooks |
| Lost on long task | Planner/executor split | orchestration |
| Broken code shipped | Post-edit backpressure | hooks |
Four reflexes. Four harness pieces. Most agent failures land in one of these four buckets. The remaining failures (a missing trace, an exhausted context window) land in the other two pieces — observability and infra — and you'll learn to spot those later.
The next drill makes you classify four specific failures into the right reflex. Get this drill right and the rest of the chapter clicks.