Few-shot and reasoning — examples that work, and the CoT trap on reasoning models — step 3 of 9
Don't add CoT to a model that's already reasoning
The 2023 prompt-engineering folklore: "always add 'think step by step' — it improves reasoning." That was true on GPT-4 and Claude 3, the models everyone benchmarked CoT against.
The 2026 reality: that advice is now wrong on a growing class of models. Reasoning models — OpenAI's o-series, Claude with extended thinking, Gemini 2.5+, DeepSeek R1 — do their reasoning internally. They emit a separate thinking block (which you can read but the user doesn't see) that contains the full chain-of-thought. The visible answer is the conclusion, not the reasoning.
Adding "think step by step" to a reasoning-model prompt:
- Adds output tokens — the model now writes its reasoning twice (once internally, once in the visible answer).
- Doesn't improve accuracy — Wharton's "Decreasing Value of Chain-of-Thought" tech report (gail.wharton.upenn.edu/research-and-insights/tech-report-chain-of-thought/) measured this directly.
- Sometimes hurts — explicit external CoT can interfere with the model's internal reasoning trace, lowering answer quality on benchmarks that compare to no-CoT baselines.
OpenAI's reasoning models guide explicitly warns:
"Encouraging the model to produce a chain of thought before answering — for example by adding 'think step by step' to the prompt — could affect the model's reasoning process in unintended ways."
The decision rule
```
if model is reasoning-class:
    skip CoT — the model handles it internally
else:
    add CoT for complex multi-step tasks
```
What "complex multi-step" means: tasks that benefit from intermediate steps before the final answer (math, logical deduction, multi-stage planning). For simple lookups or classifications, even non-reasoning models don't benefit.
How to tell which is which
| Family | Reasoning? | CoT in prompt? |
|---|---|---|
| OpenAI o-series (o1, o3, o4-mini) | Yes | Skip |
| OpenAI GPT-4o, GPT-4-turbo | No | Add for complex tasks |
| OpenAI GPT-5 (when reasoning enabled) | Yes | Skip |
| Claude Sonnet 4.6 (without thinking) | No | Add for complex tasks |
| Claude Sonnet 4.6 (with thinking) | Yes | Skip |
| Claude Haiku 4.5 | No | Add for complex tasks |
| Gemini 2.5 Pro/Flash with thinking | Yes | Skip |
| Gemini 1.5 / 2.0 Flash | No | Add for complex tasks |
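If you route prompts programmatically, the table can live in code. A hand-maintained sketch (the model-ID prefixes are illustrative and will need updating as families ship):

```python
# Model-ID prefixes that are reasoning-class by name, per the table above.
# Illustrative and hand-maintained; extend as new families ship.
REASONING_BY_NAME = ("o1", "o3", "o4-mini")

def is_reasoning_model(model_id: str, thinking_enabled: bool = False) -> bool:
    """Classify per the table: the o-series is reasoning-class by name;
    GPT-5, Claude, and Gemini flip class based on whether reasoning/thinking
    is enabled for the call, which the caller passes in."""
    return model_id.startswith(REASONING_BY_NAME) or thinking_enabled
```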
The control surface for Claude is the `thinking` parameter on `messages.create`. When it's set, the model is in reasoning-class mode; when it isn't, it's standard.
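With the Anthropic Python SDK that looks roughly like this (the model ID and token budgets are illustrative; the shape of the `thinking` parameter matches the current messages API):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# thinking is set, so this call is reasoning-class: no CoT in the prompt.
response = client.messages.create(
    model="claude-sonnet-4-6",  # illustrative model ID
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024},
    messages=[{"role": "user", "content": "Plan a three-stage database migration."}],
)

for block in response.content:
    if block.type == "thinking":
        reasoning = block.thinking  # readable by you, not shown to end users
    elif block.type == "text":
        print(block.text)  # the visible answer: the conclusion, not the reasoning
```

Drop the `thinking` argument and the same call is standard-class; that's where a CoT instruction earns its keep.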
Few-shot is different
Note: this trap is specific to CoT. Few-shot examples still help on reasoning models — they teach format, labels, and tone the same way, regardless of reasoning capability. So:
- Reasoning model + few-shot: GOOD
- Reasoning model + "think step by step": BAD
- Non-reasoning model + few-shot: GOOD
- Non-reasoning model + "think step by step" for complex tasks: GOOD
The two techniques aren't interchangeable, and the rule for each is different.
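For instance, a format-teaching few-shot block (the labels and example reviews below are made up) is safe to send to either class of model:

```python
# Few-shot examples pin down the output format and label set; they don't
# prescribe a reasoning process, so they don't collide with internal CoT.
FEW_SHOT_TEMPLATE = """Classify each review's sentiment as POSITIVE, NEGATIVE, or MIXED.

Review: "Battery life is great, but the screen scratches easily."
Sentiment: MIXED

Review: "Arrived broken and support never replied."
Sentiment: NEGATIVE

Review: "Does exactly what it says. Zero complaints."
Sentiment: POSITIVE

Review: "{review}"
Sentiment:"""

prompt = FEW_SHOT_TEMPLATE.format(review="Setup took an hour, but it runs flawlessly now.")
```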
Why this matters now (2026)
A year ago you could write a single prompt-engineering playbook and deploy it everywhere. Now you can't. The same "think step by step" suffix that improved Sonnet's math accuracy can hurt o3's. Every prompt template needs to know what model class it's targeting.
Step 6 fixes the bug of bolting CoT onto a reasoning-model prompt. Step 7 fixes the more common few-shot bug — examples in a different format than the actual query.