promptdojo_

Don't add CoT to a model that's already reasoning

The 2023 prompt-engineering folklore: "always add 'think step by step' — it improves reasoning." That was true on GPT-4 and Claude 3, the models everyone benchmarked CoT against.

The 2026 reality: that advice is now wrong on a growing class of models. Reasoning models — OpenAI's o-series, Claude with extended thinking, Gemini 2.5+, DeepSeek R1 — do their reasoning internally. They emit a separate thinking block (which you can read but the user doesn't see) that contains the full chain-of-thought. The visible answer is the conclusion, not the reasoning.

Adding "think step by step" to a reasoning-model prompt:

  1. Adds output tokens — the model now writes its reasoning twice (once internally, once in the visible answer).
  2. Doesn't improve accuracy — Wharton's Decreasing Value of Chain-of-Thought (gail.wharton.upenn.edu/research-and-insights/tech-report-chain-of-thought/) measured this directly.
  3. Sometimes hurts — explicit external CoT can interfere with the model's internal reasoning trace, lowering answer quality on benchmarks that compare to no-CoT baselines.

OpenAI's reasoning models guide explicitly warns:

"Encouraging the model to produce a chain of thought before answering — for example by adding 'think step by step' to the prompt — could affect the model's reasoning process in unintended ways."

The decision rule

if model is reasoning-class:
    skip CoT — the model handles it internally
else:
    add CoT for complex multi-step tasks

What "complex multi-step" means: tasks that benefit from intermediate steps before the final answer (math, logical deduction, multi-stage planning). For simple lookups or classifications, even non-reasoning models don't benefit.

How to tell which is which

Family                                     Reasoning?   CoT in prompt?
OpenAI o-series (o1, o3, o4-mini)          Yes          Skip
OpenAI GPT-4o, GPT-4-turbo                 No           Add for complex tasks
OpenAI GPT-5 (when reasoning enabled)      Yes          Skip
Claude Sonnet 4.6 (without thinking)       No           Add for complex tasks
Claude Sonnet 4.6 with thinking enabled    Yes          Skip
Claude Haiku 4.5                           No           Add for complex tasks
Gemini 2.5 Pro/Flash with thinking         Yes          Skip
Gemini 1.5 / 2.0 Flash                     No           Add for complex tasks

The control surface for Claude is the thinking parameter on messages.create. When it's set, the model is in reasoning-class mode; when it isn't, it's standard.
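A minimal sketch of the two request shapes, building the `messages.create` kwargs without making a live call. The model name and token budgets are illustrative; the `thinking` parameter shape follows Anthropic's extended-thinking API.

```python
# Sketch: the same Claude model switches class depending on `thinking`.
# Builds request kwargs only (no API call); budgets are illustrative.

def claude_request(prompt: str, think: bool) -> dict:
    kwargs = {
        "model": "claude-sonnet-4-6",
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": prompt}],
    }
    if think:
        # Reasoning-class mode: internal chain-of-thought with a token budget
        # (must be smaller than max_tokens).
        kwargs["thinking"] = {"type": "enabled", "budget_tokens": 2048}
        # No CoT instruction in the prompt -- the model handles it internally.
    else:
        # Standard mode: explicit CoT is fair game for complex tasks.
        kwargs["messages"][0]["content"] += "\n\nThink step by step."
    return kwargs
```

The point of the sketch: the prompt text and the `thinking` parameter should be decided together, by the same branch, so the two never contradict each other.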

Few-shot is different

Note: this trap is specific to CoT. Few-shot examples still help on reasoning models — they teach format, labels, and tone the same way, regardless of reasoning capability. So:

  • Reasoning model + few-shot: GOOD
  • Reasoning model + "think step by step": BAD
  • Non-reasoning model + few-shot: GOOD
  • Non-reasoning model + "think step by step" for complex tasks: GOOD

The two techniques aren't interchangeable, and the rule for each is different.
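Few-shot on either model class looks the same in practice: alternating user/assistant turns that demonstrate the format, with the real query last. The labels and examples below are illustrative, not from the original.

```python
# Few-shot works on both classes: the examples teach format and labels,
# not reasoning. Example texts and labels are illustrative.

def few_shot_messages(query: str) -> list:
    examples = [
        ("The checkout page crashes on submit.", "bug"),
        ("Please add dark mode.", "feature-request"),
    ]
    messages = []
    for text, label in examples:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    # The real query goes last, in the same shape as the examples.
    messages.append({"role": "user", "content": query})
    return messages
```

This message list can be sent unchanged to a reasoning model or a standard one — which is exactly why few-shot escapes the trap that CoT falls into.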

Why this matters now (2026)

A year ago you could write a single prompt-engineering playbook and deploy it everywhere. Now you can't. The same prompt that improved Sonnet's math accuracy will hurt o3's math accuracy. Every prompt template needs to know what model class it's targeting.

Step 6 fixes the bug of bolting CoT onto a reasoning-model prompt. Step 7 fixes the more common few-shot bug — examples in a different format than the actual query.

read, then continue.