promptdojo_

Load the rule when the model needs it, not before

Compaction shrinks history. Offloading shrinks tool output. The third mitigation is the one most teams miss: progressive disclosure — only load instructions, tools, and rules when they become relevant.

The default behavior in a naive harness is to load everything at session start: every CLAUDE.md rule, every tool definition, every skill the model might possibly need. This is the "set the table for every possible meal" approach. It scales poorly. A harness with 30 skills and 50 tools spends thousands of tokens describing capabilities the agent will never use on this particular task.

Progressive disclosure flips the default: load nothing speculatively; load on demand.

How Claude Code's skills system works

Claude Code's skills system is the cleanest production example. Each skill is a markdown file in ~/.claude/skills/ (or a per-project equivalent). The skill file declares:

  • Its name and trigger conditions ("use when the user asks to deploy")
  • The tools it provides (a subset, often new)
  • The rules it adds ("never deploy without running tests first")
  • Any examples or templates

At session start, the harness loads only the index — a list of skill names with one-line descriptions. The agent sees something like:

Available skills (not yet loaded):
- deploy-stack — triggered when user asks to deploy
- review-pr — triggered when user asks for code review
- migrate-db — triggered when user asks about schema changes
- ...

When a user message matches a trigger, the harness loads that skill's full content into context. The other 29 stay out. The model only ever sees the skill that matters for the current task.
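A minimal sketch of that two-phase load, assuming each skill is a markdown file whose first line is its `name — description` pair. The directory layout and helper names here are illustrative, not Claude Code's actual implementation:

```python
from pathlib import Path

SKILLS_DIR = Path.home() / ".claude" / "skills"  # assumed layout

def build_index(skills_dir):
    """Phase 1: read only the first line of each skill file.

    This is all the model sees at session start — a name plus a
    one-line trigger description per skill.
    """
    index = {}
    for path in sorted(skills_dir.glob("*.md")):
        with open(path) as f:
            index[path.stem] = f.readline().strip()
    return index

def load_skill(name, skills_dir):
    """Phase 2: pull one skill's full content into context, on demand."""
    return (skills_dir / f"{name}.md").read_text()
```

The index costs a few tokens per skill; the full file is only paid for when its trigger fires.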

Result: even with 30+ skills available, the session context contains the 1-2 skills the task needs. Token spend drops by 5-20x compared to loading them all eagerly.

What progressive disclosure covers

Three flavors of "load on demand":

  • Skills — Claude Code's named system. Trigger-driven.
  • Tool descriptions — load expensive tool descriptions only when the tool gets registered for this session. A vector-search MCP server with 12 tools might only need 3 of them for this task.
  • Documentation snippets — when the model asks a question about a library, fetch the relevant docs just-in-time. Don't preload the entire docs site.

Each one is a place where eager loading is the seductive but wrong default.
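The tool-description flavor can be sketched the same way — a registry that exposes only tool names until a full description is actually requested. The class and its API are hypothetical, standing in for however your harness talks to an MCP server:

```python
class LazyToolRegistry:
    """Register tool loaders, not tools; fetch descriptions on first use."""

    def __init__(self):
        self._loaders = {}  # name -> callable returning the full description
        self._cache = {}    # name -> materialized description

    def register(self, name, load_fn):
        """load_fn stands in for whatever fetches the tool's full schema."""
        self._loaders[name] = load_fn

    def index(self):
        """What the session starts with: names only, no token-heavy schemas."""
        return sorted(self._loaders)

    def describe(self, name):
        """Full description, fetched once and cached on first request."""
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()
        return self._cache[name]
```

The 12-tool server above registers 12 loaders up front, but only the 3 tools the task touches ever pay their description cost.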

How to design a trigger

Two strategies, both common:

  • Keyword triggers — "this skill activates when the user message contains 'deploy', 'push', or 'release'." Cheap, brittle.
  • Embedding triggers — "compute the user message's embedding; load the skill if the cosine similarity to the skill description is over 0.7." More accurate, costs an extra round trip.

Production harnesses tend to combine: keyword as the cheap first filter, embedding as the disambiguator.
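The combined dispatcher might look like this — keywords as the cheap first pass, embeddings only to break ties. The `embed` callable and the skill-record shape are assumptions, not any specific provider's API:

```python
def keyword_match(message, keywords):
    """Cheap first filter: substring match on the lowercased message."""
    msg = message.lower()
    return any(k in msg for k in keywords)

def cosine(a, b):
    """Cosine similarity between two plain-list vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def pick_skills(message, skills, embed, threshold=0.7):
    """Keyword prefilter, then embedding similarity as the disambiguator.

    `skills` maps name -> {"keywords": [...], "embedding": [...]};
    `embed` is whatever embedding call your stack provides.
    """
    candidates = [n for n, s in skills.items()
                  if keyword_match(message, s["keywords"])]
    if len(candidates) <= 1:
        return candidates  # no ambiguity: skip the embedding round trip
    msg_vec = embed(message)
    return [n for n in candidates
            if cosine(msg_vec, skills[n]["embedding"]) >= threshold]
```

Note the early return: the embedding call (and its latency) is only paid when the keyword pass matches more than one skill.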

What you save, what you pay

The math:

  • A harness with 30 skills, each averaging 800 tokens, loaded eagerly = 24,000 tokens of system prompt before the user has typed a word.
  • The same harness with progressive disclosure = ~600 tokens of index, plus 800 tokens per loaded skill (typically 1-2 per session) = ~2,200 tokens.

That's 21,800 tokens of attention reclaimed per session. On a 200k-token window, that's ~11% of the budget freed for actual work.
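The arithmetic, as a runnable sanity check (the skill counts and token sizes are the worked example's assumptions, not measurements):

```python
# Eager: all 30 skills at 800 tokens each, loaded at session start.
eager = 30 * 800                 # 24,000 tokens before the user types a word

# Lazy: a ~600-token index plus two loaded skills per typical session.
lazy = 600 + 2 * 800             # 2,200 tokens

saved = eager - lazy
print(eager, lazy, saved)        # 24000 2200 21800
print(f"{saved / 200_000:.1%} of a 200k window")  # 10.9% of a 200k window
```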

The cost: one extra layer of complexity in the harness, plus the small latency of dispatching skill loads. Both pay for themselves within hours of running a real workload.

When NOT to progressively disclose

Progressive disclosure has a cost floor. For very small harnesses (< 5 skills, < 10 tools), the dispatch logic is more code than the savings justify. Eager loading is fine.

The threshold: if you have more than ~10 skills or more than ~30 tools, progressive disclosure starts being worth the engineering. Below that, it's optimization for its own sake.

The combined picture

Three mitigations, working together:

  • Progressive disclosure keeps the front of context short — only the relevant rules and tools.
  • Compaction keeps the middle of context short — old turns get summarized.
  • Offloading keeps the back of context short — long tool outputs go to disk.

A harness that does all three well has, at any moment in a long session, a context that's 10-20% the size of what a naive harness would have. The model's attention is correspondingly 5-10x more focused. That's the harness gap in action.

What's next

The drill at step 05 makes you pick the right mitigation for four real context-bloat scenarios. Then the memory-hierarchy read covers a related question — where to put each kind of data when it's not in context. Then the data-classification drill. Then the budget-planning write and the checkpoint.

read, then continue.