Chain of Thought Reasoning
Why Planning Matters
The Problem: LLMs often fail on complex tasks when asked to jump directly to an answer. They make reasoning errors and miss important steps.
The Solution: Chain of Thought (CoT) prompting encourages the model to think step-by-step, dramatically improving accuracy on reasoning-intensive tasks.
Real Impact: In the original CoT experiments, step-by-step prompting roughly tripled accuracy on grade-school math word problems (from about 18% to 57% on GSM8K with PaLM 540B), and it enables agents to tackle multi-step planning tasks far more reliably.
Real-World Analogy
Think of CoT like showing your work on a math test:
- Direct Answer = Writing just "42" -- might be wrong, hard to debug
- Chain of Thought = Writing each step -- easier to verify and correct
- Task Decomposition = Breaking a big problem into smaller solvable parts
- Tree of Thoughts = Exploring multiple solution paths simultaneously
CoT Techniques
Zero-Shot CoT
Simply add "Think step by step" to your prompt. Surprisingly effective for many reasoning tasks.
Few-Shot CoT
Provide examples with detailed reasoning chains. The model learns to mirror the reasoning style.
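A minimal sketch of how a few-shot CoT prompt can be assembled; the worked example is the classic tennis-ball problem, and `build_few_shot_prompt` is an illustrative helper, not a library function:

```python
# Few-shot CoT: seed the prompt with worked examples so the model
# mirrors the reasoning style before answering the new question.
FEW_SHOT_EXAMPLES = [
    {
        "question": ("Roger has 5 tennis balls. He buys 2 cans of 3 balls "
                     "each. How many balls does he have now?"),
        "reasoning": ("Roger started with 5 balls. 2 cans of 3 balls each "
                      "is 6 balls. 5 + 6 = 11."),
        "answer": "11",
    },
]

def build_few_shot_prompt(question: str) -> str:
    """Place reasoning-chain demonstrations before the new question."""
    parts = []
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(f"Q: {ex['question']}\n"
                     f"A: {ex['reasoning']} The answer is {ex['answer']}.")
    parts.append(f"Q: {question}\nA:")  # model continues from here
    return "\n\n".join(parts)

prompt = build_few_shot_prompt("A jug holds 4 liters. How many jugs hold 12 liters?")
```

The trailing "A:" invites the model to continue in the same show-your-work style as the demonstrations.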
Self-Consistency
Generate multiple reasoning paths and take the majority vote. Reduces errors from any single chain.
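The sampling-plus-vote loop can be sketched as a thin wrapper; `sample_fn` is a placeholder for whatever produces one final answer per call (in practice, a model call with temperature > 0 followed by answer extraction):

```python
from collections import Counter

def self_consistency(sample_fn, question, n=5):
    """Sample n independent reasoning chains and majority-vote the answers.

    sample_fn(question) -> final answer string. In practice it would call
    the model with temperature > 0 and parse out the final answer, so each
    call can follow a different reasoning path.
    """
    answers = [sample_fn(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

A single flawed chain is outvoted as long as most chains land on the correct answer.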
Plan-and-Execute
First create a complete plan, then execute each step. Separates planning from execution.
Example: "Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have now?"
Without CoT: the model may jump to a wrong answer such as "8". (WRONG)
With CoT: "Roger started with 5 balls. He bought 2 cans of 3 balls each = 6 balls. 5 + 6 = 11 balls total." (CORRECT)
Zero-Shot CoT
```python
# Zero-Shot Chain of Thought
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": (
            "A store has 3 types of fruit. Apples cost $2, bananas $1, "
            "oranges $3. I buy 4 apples, 6 bananas, and 2 oranges. "
            "How much do I spend? Think step by step."
        ),
    }],
)
print(response.choices[0].message.content)
# Model breaks the total down: 4*$2=$8, 6*$1=$6, 2*$3=$6, total = $20
```
```python
# Plan-and-Execute pattern (sketch): planner, executor, and synthesizer
# are assumed components supplied at construction time.
class PlanAndExecute:
    def __init__(self, planner, executor, synthesizer):
        self.planner = planner
        self.executor = executor
        self.synthesizer = synthesizer

    def run(self, task):
        # Step 1: create a complete plan up front
        plan = self.planner.create_plan(task)
        # Step 2: execute each step, passing earlier results as context
        results = []
        for step in plan.steps:
            result = self.executor.execute(step, results)
            results.append(result)
        # Step 3: synthesize the final answer from all step results
        return self.synthesizer.combine(results)
```
Task Decomposition
Common Mistake
Wrong: Breaking tasks into too many fine-grained steps
Why it fails: Over-decomposition causes the model to lose sight of the overall goal. Each step adds context overhead and increases the chance of compounding errors.
Instead: Decompose into 3-5 meaningful sub-tasks that each produce a clear intermediate result. Let the LLM handle the granular reasoning within each sub-task.
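As a concrete illustration (the task and sub-tasks are hypothetical), a right-sized decomposition is a handful of sub-tasks that each yield a usable intermediate artifact:

```python
# Hypothetical task: "Write a competitive analysis of three products."
# Each sub-task produces a clear intermediate result; the LLM handles
# the fine-grained reasoning inside each one.
right_sized_plan = [
    "Gather key specs and pricing for each of the three products",
    "Identify strengths and weaknesses per product",
    "Compare the products along shared criteria",
    "Write the final analysis with a recommendation",
]
assert 3 <= len(right_sized_plan) <= 5  # the 3-5 sub-task guideline
```

An over-decomposed version ("open a browser", "search for product A", "read the first result", ...) would bury the goal under dozens of micro-steps.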
Planning Strategies
| Strategy | How It Works | Best For |
|---|---|---|
| Linear Plan | Sequential step list | Well-defined workflows |
| DAG Plan | Steps with dependencies | Parallelizable tasks |
| Adaptive Plan | Replan after each step | Uncertain environments |
| Hierarchical | High-level then detailed | Very complex tasks |
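The DAG strategy can be sketched with the standard library's topological sorter; the step names are illustrative, and in a real agent each step would map to an executor call once its dependencies have produced results:

```python
from graphlib import TopologicalSorter

# DAG plan: each step maps to the set of steps it depends on.
# Independent steps (here, the first two) could run in parallel.
plan = {
    "draft_outline": set(),
    "research_topic": set(),
    "write_sections": {"draft_outline", "research_topic"},
    "edit_final": {"write_sections"},
}

# A valid execution order that respects every dependency.
order = list(TopologicalSorter(plan).static_order())
```

`TopologicalSorter` also exposes `get_ready()`/`done()` for dispatching independent steps concurrently rather than in a flat sequence.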
Tree of Thoughts
Tree of Thoughts (ToT)
- Generate: Create multiple candidate reasoning steps
- Evaluate: Score each candidate for promise
- Expand: Continue the most promising branches
- Backtrack: Abandon dead-end branches and try alternatives
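The four operations above can be sketched as a greedy beam search; `expand` and `score` are placeholders for model-driven candidate generation and evaluation:

```python
def tree_of_thoughts(expand, score, root, beam_width=2, depth=3):
    """Beam search over reasoning states.

    expand(state) -> list of candidate next states (Generate)
    score(state)  -> float rating how promising a state is (Evaluate)
    Keeping only the top beam_width candidates each round expands the
    best branches and implicitly backtracks away from dead ends.
    """
    frontier = [root]
    for _ in range(depth):
        candidates = [c for s in frontier for c in expand(s)]
        if not candidates:
            break  # nothing left to expand
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(frontier, key=score)
```

In a real agent, `expand` samples several next reasoning steps from the model and `score` asks the model (or a heuristic) to rate each partial solution.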
Deep Dive: When CoT Hurts Performance
Chain of thought does not always help. For simple factual lookups ("What is the capital of France?"), CoT adds unnecessary tokens without improving accuracy. For creative tasks, step-by-step reasoning can make outputs feel robotic. CoT works best for multi-step reasoning, math, logical deduction, and tasks where showing work reveals errors. Always benchmark CoT vs direct prompting on your specific task before committing to one approach.
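A minimal A/B harness for that benchmark might look like the following; `solve` and `check` are placeholders for a model call and an answer checker against gold labels:

```python
def benchmark(prompt_variants, tasks, solve, check):
    """Score each prompt style on the same task set; higher is better.

    prompt_variants: name -> function turning a task into a prompt
    solve(prompt)   -> model output (a real run would call the LLM)
    check(out, task)-> True if the output is correct for that task
    """
    scores = {}
    for name, make_prompt in prompt_variants.items():
        correct = sum(check(solve(make_prompt(t)), t) for t in tasks)
        scores[name] = correct / len(tasks)
    return scores
```

Running the same tasks through a "direct" variant and a "CoT" variant gives a per-style accuracy, which is the comparison the paragraph above recommends before committing to either approach.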
Quick Reference
| Technique | Description | Accuracy Gain |
|---|---|---|
| Zero-Shot CoT | "Think step by step" | +20-40% |
| Few-Shot CoT | Examples with reasoning chains | +30-50% |
| Self-Consistency | Multiple paths, majority vote | +5-15% over CoT |
| Tree of Thoughts | Branch and evaluate paths | Best for creative tasks |
| Plan-and-Execute | Separate planning from doing | Best for multi-step tasks |