promptdojo_

Two techniques. One has aged well. The other has aged badly.

Lesson 01 set up the six-knob prompt scaffold (role, task, context, format, examples, constraints). This lesson is about two of the most-cited prompting techniques: few-shot examples and chain-of-thought (CoT). They were both gospel in 2023. By 2026, one is still gospel; the other has become a trap.

Few-shot — still works, when used right

Few-shot prompting is putting 2–5 example input/output pairs in the prompt before the real query:

Classify the sentiment as positive, negative, or neutral.

Input: great product, fast shipping
Output: positive

Input: broke after a week
Output: negative

Input: late but quality was fine
Output: ?

The model sees the pattern and follows it. Three places this genuinely earns its keep:

  1. Format lock-in. When you need the output in a specific shape (label only, JSON, fixed prefix), examples teach the shape better than instructions can describe it.
  2. Domain disambiguation. "Classify support tickets" means a different thing in different products. Examples ground the classifier in your labels, not the model's prior.
  3. Output style. Tone, formality, brand voice — show, don't tell. Even three examples set tone better than a paragraph.
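
To make the assembly concrete, here's a minimal sketch in Python. The function name matches the build_few_shot_prompt you'll write at the end of this lesson; the exact signature and the list-of-pairs format for examples are assumptions for illustration.

def build_few_shot_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Join a task instruction, example pairs, and the real query into one prompt."""
    blocks = [task, ""]
    for inp, out in examples:
        blocks.append(f"Input: {inp}")
        blocks.append(f"Output: {out}")
        blocks.append("")  # blank line between examples
    blocks.append(f"Input: {query}")
    blocks.append("Output:")  # trailing prefix invites the model to complete the pattern
    return "\n".join(blocks)

# Reproduces the sentiment prompt above:
prompt = build_few_shot_prompt(
    "Classify the sentiment as positive, negative, or neutral.",
    [("great product, fast shipping", "positive"),
     ("broke after a week", "negative")],
    "late but quality was fine",
)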

It also has failure modes, covered in steps 6 and 7. The most notable: format lock-in working too well (the model rigidly mimics the example format even when the user's actual query needs different formatting).
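A quick illustration of how that bites (hypothetical prompt; the completion shown is a typical outcome, not a guarantee):

# Hypothetical: the examples teach "Output: <bare label>", so the model
# tends to answer with a bare label even though the final query asks for JSON.
prompt = """Classify the sentiment as positive, negative, or neutral.

Input: great product, fast shipping
Output: positive

Input: broke after a week
Output: negative

Input: late but quality was fine. Respond as JSON with keys "label" and "confidence".
Output:"""
# Typical completion: positive
# The example template wins over the in-query instruction. Step 7's fix is
# to make the examples show the exact format you want back.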

CoT — the trap on reasoning models

Chain-of-thought prompting (adding "think step by step" or "explain your reasoning before answering") was the single highest-leverage technique on GPT-4 and Claude 3 in 2023–2024. On modern reasoning models (OpenAI's o-series, Claude with extended thinking, Gemini 2.5+), it's at best redundant and at worst harmful.

Why: reasoning models do step-by-step reasoning internally — that's the whole feature. Asking them to "think step by step" out loud:

  • Adds output tokens (and cost).
  • Doesn't improve accuracy on most benchmarks.
  • Sometimes hurts — explicit CoT in the prompt can interfere with the model's internal reasoning trace, lowering answer quality.

Cited evidence:

  • Wharton's Decreasing Value of Chain-of-Thought (2024), which found that CoT no longer improves, and sometimes hurts, results on newer models.
  • OpenAI's reasoning models guide warns against bolting external CoT onto o-series models.

The 2026 rule: CoT helps non-reasoning models (Sonnet, GPT-4o, Haiku). It doesn't help reasoning models (o-series, Claude with extended thinking, Gemini 2.5+). Always ask: which model class am I prompting?
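
In code, the rule reduces to a gate on model class. A sketch, assuming you maintain your own model list; the names below are illustrative, not an official registry:

# Which models reason internally? Keep this set current for your stack.
REASONING_MODELS = {"o1", "o3", "o4-mini", "deepseek-r1"}

def maybe_add_cot(prompt: str, model: str) -> str:
    """Append a step-by-step nudge only for non-reasoning models."""
    if model in REASONING_MODELS:
        return prompt  # already reasons internally; external CoT is redundant or harmful
    return prompt + "\n\nThink step by step before giving your final answer."

The point isn't this exact helper; it's that CoT becomes a per-model-class decision instead of a default.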

What everyone calls these techniques

Technique              Names you'll see
Few-shot               "in-context learning," "in-prompt examples," "k-shot" (k = 2, 3, 5, ...)
Chain-of-thought       "CoT," "step-by-step," "scratchpad," "reasoning prefix"
Reasoning models       OpenAI o-series (o1, o3, o4), Claude with extended thinking, Gemini 2.5 Thinking, DeepSeek R1
Non-reasoning models   "Standard models": Sonnet without extended thinking, GPT-4o, Haiku

What you'll build

A build_few_shot_prompt(task, examples, query) that assembles the canonical example-driven prompt format. Then the bug step 6 fixes: bolting "think step by step" onto a reasoning-model prompt (redundant at best, harmful at worst). Then the bug step 7 fixes: example format mismatching the actual query format (the model locks in on the wrong template).

read, then continue.