
Chain-of-Thought Prompting: Techniques and Best Practices

Ryan Dahlberg
October 22, 2025 13 min read

Chain-of-thought (CoT) prompting has emerged as one of the most powerful techniques for improving LLM reasoning capabilities. By encouraging models to articulate their reasoning process step-by-step, CoT prompting dramatically improves performance on complex tasks involving mathematics, logic, common sense reasoning, and multi-step problem solving.

This guide explores the full spectrum of CoT techniques, from foundational approaches to cutting-edge variations, with practical examples and implementation strategies.

What is Chain-of-Thought Prompting?

Chain-of-thought prompting elicits intermediate reasoning steps from language models rather than jumping directly to an answer. Instead of:

Standard Prompt: “What is 17 × 23?”
Response: “391”

You get:

CoT Prompt: “What is 17 × 23? Let’s think step by step.”
Response:

Let me break this down:
17 × 23 = 17 × (20 + 3)
       = (17 × 20) + (17 × 3)
       = 340 + 51
       = 391

The explicit reasoning chain serves multiple purposes:

  • Makes the model’s logic inspectable
  • Reduces errors by breaking complexity
  • Enables debugging when answers are wrong
  • Improves performance on complex reasoning tasks

The Foundation: Few-Shot Chain-of-Thought

The original CoT technique uses few-shot examples with explicit reasoning chains.

Structure

Provide 2-4 examples showing:

  1. The problem
  2. Step-by-step reasoning
  3. The final answer

Then present the actual problem for the model to solve.
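In code, assembling such a prompt is just string templating. A minimal Python sketch (the `build_few_shot_cot_prompt` helper and its triple format are illustrative, not from any library):

```python
def build_few_shot_cot_prompt(examples, question):
    """Assemble a few-shot CoT prompt from (question, reasoning, answer) triples."""
    parts = []
    for q, reasoning, answer in examples:
        parts.append(f"Q: {q}\n\nA: {reasoning}\nAnswer: {answer}\n")
    # Present the actual problem last, leaving "A:" open for the model to complete.
    parts.append(f"Q: {question}\n\nA:")
    return "\n".join(parts)

examples = [
    ("Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. "
     "How many balls does he have now?",
     "Roger started with 5 balls. 2 cans × 3 balls = 6 balls. 5 + 6 = 11 balls.",
     "11"),
]
prompt = build_few_shot_cot_prompt(examples, "A florist had 37 roses ...")
```

The examples always precede the open question, so the model sees the reasoning pattern before it must produce one.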

Example: Math Word Problems

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 balls. How many tennis balls does he have now?

A: Roger started with 5 balls.
2 cans × 3 balls per can = 6 balls.
5 + 6 = 11 balls.
Answer: 11

Q: The cafeteria had 23 apples. If they used 20 to make lunch and
bought 6 more, how many apples do they have?

A: Started with 23 apples.
Used 20 apples: 23 - 20 = 3 apples remaining.
Bought 6 more: 3 + 6 = 9 apples.
Answer: 9

Q: A florist had 37 roses. She sold 16 roses and then received a
delivery of 24 roses. How many roses does she have now?

A: [Model generates reasoning here]

Why This Works

Few-shot CoT examples teach the model:

  • The expected reasoning style
  • The level of detail needed
  • How to structure the solution process
  • What constitutes a valid intermediate step

The model learns the pattern and applies it to new problems.

Selecting Good Examples

Not all examples are equally effective:

  • Diversity: Cover different problem types and solution strategies
  • Clarity: Each reasoning step should be obvious and justified
  • Completeness: Don’t skip steps, even obvious ones
  • Accuracy: Ensure all reasoning and answers are correct

One bad example can degrade overall performance significantly.

Zero-Shot Chain-of-Thought

A surprising discovery: simply adding “Let’s think step by step” to prompts triggers CoT reasoning without any examples.

Basic Approach

Problem: [complex problem]
Let's think step by step.

That’s it. The model generates intermediate reasoning automatically.

Why Zero-Shot CoT Works

Large models have seen countless examples of step-by-step reasoning in their training data. The phrase “Let’s think step by step” activates this learned pattern.

Benefits:

  • No need to craft examples
  • Works across diverse problem types
  • Faster to implement
  • Reduces prompt length

Limitations:

  • Less control over reasoning style
  • May not work well on domain-specific problems
  • Quality varies by model size

Variations on the Trigger Phrase

Different trigger phrases work better for different tasks:

  • General Reasoning: “Let’s think step by step”
  • Math Problems: “Let’s solve this step by step”
  • Logical Analysis: “Let’s analyze this systematically”
  • Code Problems: “Let’s break this down into steps”
  • Planning Tasks: “Let’s create a plan step by step”

Experiment to find what works for your domain.
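A small lookup table makes the triggers easy to swap per task. A sketch using the phrases above (the category keys are illustrative):

```python
TRIGGERS = {
    "general":  "Let's think step by step.",
    "math":     "Let's solve this step by step.",
    "logic":    "Let's analyze this systematically.",
    "code":     "Let's break this down into steps.",
    "planning": "Let's create a plan step by step.",
}

def zero_shot_cot(problem, task_type="general"):
    """Append the task-appropriate CoT trigger phrase to the problem."""
    return f"{problem}\n{TRIGGERS[task_type]}"
```

For example, `zero_shot_cot("What is 17 × 23?", "math")` yields the problem followed by the math trigger on its own line.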

Self-Consistency Chain-of-Thought

Self-consistency improves CoT reliability by generating multiple reasoning paths and selecting the most consistent answer.

Process

  1. Generate 5-10 reasoning chains for the same problem
  2. Each chain may use different approaches or steps
  3. Extract the final answer from each chain
  4. Select the answer that appears most frequently (majority vote)
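The extract-and-vote steps above can be sketched in a few lines of Python (the regex assumes chains end with the labeled `Answer:` format used throughout this post):

```python
import re
from collections import Counter

def extract_answer(chain):
    """Pull the value that follows 'Answer:' from a reasoning chain."""
    match = re.search(r"Answer:\s*(\S+)", chain)
    return match.group(1) if match else None

def self_consistent_answer(chains):
    """Majority vote over the final answers of several reasoning chains."""
    answers = [a for a in map(extract_answer, chains) if a is not None]
    return Counter(answers).most_common(1)[0][0]

chains = [
    "Sold 12, then 8 + 15 = 23. Answer: 23",
    "Morning: 20 -> 8. Delivery: 8 + 15 = 23. Answer: 23",
    "8 + 15 = 22. Answer: 22",  # a faulty chain is simply outvoted
]
```

Here `self_consistent_answer(chains)` returns "23": the one faulty chain loses the vote.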

Example

Problem: A store had 20 apples. They sold some apples in the morning and had 8 left. They then received 15 more apples. How many apples do they have now?

Chain 1:

Started with 20, ended with 8, so sold 12 apples.
Then got 15 more: 8 + 15 = 23 apples.
Answer: 23

Chain 2:

Morning: 20 → 8 apples (sold 12).
Delivery: 8 + 15 = 23 apples.
Answer: 23

Chain 3:

Sold: 20 - 8 = 12 apples.
Current: 8 apples.
After delivery: 8 + 15 = 23 apples.
Answer: 23

All three chains agree: 23 apples is the answer.

When to Use Self-Consistency

  • High-Stakes Decisions: When errors are costly
  • Ambiguous Problems: When multiple interpretations exist
  • Quality Validation: When you need confidence in the answer
  • Error Detection: When you want to identify inconsistent reasoning

Cost Considerations

Self-consistency requires multiple API calls. Optimize by:

  • Starting with 3 chains, expanding only if inconsistent
  • Using smaller/cheaper models for initial generation
  • Caching and reusing chains for similar problems
  • Parallel API calls to reduce latency
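The first optimization, start small and expand only on disagreement, looks like this in sketch form; `sampler` stands in for whatever function draws one chain's final answer from your model (a hypothetical callable, not a real API):

```python
from collections import Counter

def adaptive_self_consistency(problem, sampler, initial=3, step=2, maximum=9):
    """Sample a few answers; keep sampling only while they disagree."""
    answers = [sampler(problem) for _ in range(initial)]
    while len(answers) < maximum:
        top, count = Counter(answers).most_common(1)[0]
        if count == len(answers):  # unanimous: stop early, saving calls
            return top
        answers += [sampler(problem) for _ in range(step)]
    return Counter(answers).most_common(1)[0][0]
```

When the initial three chains agree, this uses a third of the calls a fixed nine-chain vote would.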

Tree-of-Thought Prompting

Tree-of-thought (ToT) extends CoT by exploring multiple reasoning branches at each step, backtracking when needed.

The Process

  1. Problem Decomposition: Break the problem into stages
  2. Branch Generation: Generate multiple options at each stage
  3. Evaluation: Assess the promise of each branch
  4. Selection: Choose best branch(es) to explore further
  5. Backtracking: Abandon unpromising paths
  6. Synthesis: Combine successful branches into final solution
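One common way to implement this process is a beam search over partial solutions. The sketch below assumes you supply `expand` (branch generation, typically an LLM call) and `score` (evaluation) as callables:

```python
def tree_of_thought(root, expand, score, beam_width=2, depth=3):
    """Explore reasoning branches, keeping only the most promising ones.

    expand(state) -> list of candidate next states (branch generation)
    score(state)  -> numeric promise estimate (evaluation; higher is better)
    Dropping low-scoring candidates each round is the backtracking step.
    """
    frontier = [root]
    for _ in range(depth):
        candidates = [nxt for state in frontier for nxt in expand(state)]
        if not candidates:
            break
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(frontier, key=score)
```

With `beam_width=1` this degenerates to greedy CoT; widening the beam trades cost for coverage of the solution space.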

Example: Creative Writing Task

Task: Write an engaging opening paragraph for a mystery novel.

Stage 1: Opening scene setting

  • Branch A: Dark and stormy night (classic)
  • Branch B: Sunny morning disrupted by discovery (contrast)
  • Branch C: Character in middle of investigation (in medias res)

Evaluation: Branches B and C are more original; explore both.

Stage 2 (Branch B): What is discovered?

  • Branch B1: A mysterious letter
  • Branch B2: An empty house that should be occupied
  • Branch B3: An object out of place

Evaluation: B2 creates most tension, explore further.

Stage 3 (Branch B2): Character reaction

  • Generate several possible reactions
  • Evaluate for authenticity and engagement
  • Select best reaction

Final Output: Synthesize B → B2 → selected reaction into polished paragraph.

When to Use Tree-of-Thought

  • Creative Tasks: Multiple valid approaches exist
  • Planning Problems: Need to consider alternatives
  • Optimization: Seeking the best solution, not just a correct one
  • Exploratory Analysis: Goal is to map the solution space

Implementation Complexity

ToT is more complex to implement:

  • Requires explicit branching logic
  • Needs evaluation criteria for branches
  • Must track tree structure
  • Computationally expensive (many LLM calls)

Consider using orchestration frameworks or building custom tooling.

Least-to-Most Prompting

Least-to-most prompting decomposes problems into increasingly complex subproblems.

Structure

Step 1: Decompose problem into ordered subproblems

Problem: [complex problem]
Break this into simpler subproblems that build on each other.

Step 2: Solve easiest subproblem first

Subproblem 1: [simplest subproblem]
Solve this step.

Step 3: Use solution to tackle next subproblem

Given that [solution to subproblem 1], solve:
Subproblem 2: [next subproblem]

Continue until all subproblems solved.
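The loop above is simple to express in code; `solve` is a placeholder for a model call that receives the current subproblem plus everything solved so far:

```python
def least_to_most(subproblems, solve):
    """Solve ordered subproblems, feeding earlier solutions in as context.

    solve(subproblem, context) -> solution; context is the list of
    (subproblem, solution) pairs completed so far.
    """
    context = []
    for sub in subproblems:
        context.append((sub, solve(sub, context)))
    return context
```

Each call sees a strictly growing context, which is what lets later, harder subproblems build on earlier answers.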

Example: Software Architecture Design

Problem: Design a scalable user authentication system.

Decomposition:

  1. Define core authentication requirements
  2. Choose authentication method (JWT, sessions, etc.)
  3. Design token structure and validation
  4. Add authorization layer
  5. Implement rate limiting and security
  6. Design horizontal scaling approach

Solving: Each step builds on previous solutions. By step 6, all foundational decisions are made, making scalability design much more concrete.

Why Least-to-Most Works

Many complex problems are easier to solve when approached incrementally:

  • Early solutions constrain later problem space
  • Each step provides context for next
  • Errors detected early prevent cascading failures
  • Cognitive load managed by focusing on one layer at a time

Chain-of-Thought with Self-Refinement

Self-refinement adds an iterative improvement loop to CoT reasoning.

Process

Round 1: Generate initial CoT solution

Problem: [problem]
Solve step by step.

Round 2: Critique the solution

Review this solution:
[previous CoT reasoning]

Identify:
- Logical errors
- Missing steps
- Unclear reasoning
- Alternative approaches

Round 3: Refine based on critique

Given these issues:
[critique]

Generate an improved solution.

Repeat rounds 2-3 until quality plateaus.
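The three rounds generalize to a loop. In the sketch below, `solve`, `critique`, and `refine` are stand-ins for the three prompts above (each would be a model call in practice), and an empty critique signals convergence:

```python
def self_refine(problem, solve, critique, refine, max_rounds=3):
    """Generate, critique, and refine until no issues remain or budget runs out."""
    solution = solve(problem)
    for _ in range(max_rounds):
        issues = critique(problem, solution)
        if not issues:  # convergence: the critic found nothing to fix
            break
        solution = refine(problem, solution, issues)
    return solution
```

The `max_rounds` cap doubles as the resource-limit stopping criterion discussed below.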

When Self-Refinement Helps

  • Complex Reasoning: Multi-step logic where errors are likely
  • Novel Problems: Tasks outside typical training distribution
  • High Quality Needs: When “good enough” isn’t acceptable
  • Learning Tasks: Building better intuition through iteration

Stopping Criteria

How many refinement rounds?

  • Fixed Iterations: 2-3 rounds for most tasks
  • Convergence: Stop when changes become minimal
  • Quality Threshold: Stop when validation metrics met
  • Resource Limits: Stop when cost/time budget exhausted

Domain-Specific Chain-of-Thought

Different domains benefit from specialized CoT patterns.

Mathematical Reasoning

Pattern: Show algebraic steps explicitly

Solve: 3x + 7 = 22

Step 1: Subtract 7 from both sides
3x + 7 - 7 = 22 - 7
3x = 15

Step 2: Divide both sides by 3
3x / 3 = 15 / 3
x = 5

Verification: 3(5) + 7 = 15 + 7 = 22 ✓

Code Debugging

Pattern: Trace execution and state

Bug: Function returns wrong value

Step 1: Input validation
- Input: x = -5
- Expected: absolute value
- Issue: Need to check for negative

Step 2: Trace execution
- Line 3: if x < 0 → True
- Line 4: return -x → returns 5
- Correct so far...

Step 3: Found it!
- Line 7: Missing return statement for positive case
- Falls through to undefined return

Strategic Planning

Pattern: Evaluate options systematically

Decision: Choose deployment strategy

Option A: Blue-Green Deployment
- Pro: Zero downtime
- Pro: Easy rollback
- Con: 2x infrastructure cost
- Con: More complex setup

Option B: Rolling Update
- Pro: No extra infrastructure
- Pro: Gradual rollout
- Con: Mixed versions during deploy
- Con: Slower rollback

Given constraints [budget limited, uptime critical]:
Recommend: Blue-Green despite cost, uptime is priority

Scientific Analysis

Pattern: Hypothesis-driven reasoning

Observation: API latency increased 3x

Hypothesis 1: Database connection pool saturated
- Check: Connection pool metrics
- Result: 10% utilization, not the cause

Hypothesis 2: Network congestion
- Check: Network throughput graphs
- Result: Normal levels, not the cause

Hypothesis 3: New deployment changed query patterns
- Check: Deployment timeline vs. latency spike
- Result: Exact correlation!
- Check: Query logs show N+1 query pattern
- Conclusion: New code introduced inefficient queries

Advanced Techniques: Analogical Prompting

Use analogies to guide reasoning in unfamiliar domains.

Pattern

Problem: [novel problem in domain A]

This problem is analogous to [familiar problem in domain B].

Solve the analogous problem:
[reasoning for domain B problem]

Map the solution back to the original domain:
[transfer solution to domain A]

Example

Problem: Design a system for managing shared computing resources across teams.

Analogy: This is like managing shared meeting rooms in an office.

Reasoning: For meeting rooms, we use:

  • Reservation systems
  • Fair allocation policies
  • Priority tiers
  • Overbooking with intelligent scheduling
  • Usage analytics to optimize allocation

Transfer: Apply to computing resources:

  • Resource reservation API
  • Fair-share scheduling
  • Priority queues for different teams
  • Overcommit with intelligent preemption
  • Usage dashboards for optimization

The analogy provides structure for reasoning about the unfamiliar problem.

Combining Multiple CoT Techniques

The most powerful applications combine techniques:

Example Workflow

Complex Problem: Design a machine learning training pipeline

Step 1: Least-to-Most Decomposition Break into foundational → advanced components.

Step 2: Tree-of-Thought Exploration For key decision points, explore multiple branches.

Step 3: Zero-Shot CoT Reasoning For routine subproblems, use simple step-by-step reasoning.

Step 4: Self-Consistency Validation Generate multiple reasoning paths for critical decisions.

Step 5: Self-Refinement Critique and improve the integrated design.

Each technique addresses different aspects of the complex task.

Best Practices

Explicit Step Markers

Use clear markers for reasoning steps:

Step 1: [description]
Step 2: [description]

Or:

First, I'll [action]
Then, I'll [action]
Finally, I'll [action]

This structure helps the model stay organized.

Encourage Metacognition

Ask the model to reason about its reasoning:

Before solving, consider:
- What information is needed?
- What approach is most appropriate?
- What could go wrong?

Then solve the problem.

Verify Intermediate Steps

For critical reasoning, verify each step:

Step 1: [claim]
Verification: [check that claim is valid]

Step 2: [next claim based on step 1]
Verification: [check validity]

Document Assumptions

Make assumptions explicit:

Assuming [assumption]:
Reasoning: [logic based on assumption]

If [assumption] is false:
Alternative: [different reasoning path]

Common Pitfalls

Over-Decomposition

Breaking tasks into too many micro-steps:

  • Increases prompt length
  • Adds noise
  • Slows reasoning
  • Reduces coherence

Fix: Find the right granularity for your task complexity.

Inconsistent Reasoning

Steps that contradict each other or don’t connect:

  • Sign of model confusion
  • Often from unclear prompts
  • Can compound errors

Fix: Add verification steps between major reasoning stages.

Premature Conclusions

Jumping to answers before completing the reasoning chain:

  • Defeats the purpose of CoT
  • Often caused by weak trigger phrases

Fix: Explicitly request complete reasoning before conclusions.

Hallucinated Steps

Model invents plausible-sounding but incorrect reasoning:

  • Hard to detect without domain knowledge
  • More common with complex or ambiguous problems

Fix: Use self-consistency or verification prompts.

Measuring CoT Effectiveness

Track these metrics:

  • Accuracy: Are final answers correct?
  • Reasoning Quality: Are intermediate steps sound?
  • Consistency: Do multiple runs produce similar reasoning?
  • Explainability: Can you follow the logic?
  • Efficiency: Is the cost/time vs. quality trade-off acceptable?

Use these metrics to refine your CoT prompts over time.
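Accuracy and consistency in particular are easy to compute once you log answers. A minimal sketch, assuming you record one answer per run:

```python
from collections import Counter

def accuracy(predictions, gold):
    """Fraction of final answers matching the reference answers."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

def consistency(runs_per_problem):
    """Average agreement with the modal answer across repeated runs.

    runs_per_problem: list of answer lists, one inner list per problem.
    """
    scores = []
    for answers in runs_per_problem:
        _, count = Counter(answers).most_common(1)[0]
        scores.append(count / len(answers))
    return sum(scores) / len(scores)
```

A high accuracy with low consistency is a warning sign: the model may be reaching right answers through unstable reasoning.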

Tools and Frameworks

Several tools support CoT prompting:

  • LangChain: Built-in CoT chain types and templates
  • Guidance: Structured generation for explicit reasoning control
  • Semantic Kernel: Planning and reasoning abstractions
  • Custom Solutions: Full control for specialized CoT patterns

Research Directions

CoT prompting continues to evolve:

  • Learned CoT: Training models to automatically use CoT reasoning
  • Verifiable Reasoning: Formal methods to validate reasoning chains
  • Multi-Modal CoT: Extending CoT to images, code, and other modalities
  • Automated Prompt Engineering: Systems that generate optimal CoT prompts

Practical Implementation

Start simple:

  1. Try zero-shot CoT (“Let’s think step by step”)
  2. If quality insufficient, add 2-3 few-shot examples
  3. For critical tasks, add self-consistency
  4. For complex problems, consider tree-of-thought
  5. Always measure impact vs. baseline
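That escalation ladder can itself be encoded. In this sketch, each stage is a placeholder callable (your zero-shot, few-shot, and self-consistency pipelines), and `confident` is whatever acceptance check fits your task, such as an answer-format validator:

```python
def solve_with_escalation(problem, stages, confident):
    """Try techniques cheapest-first, escalating only when the check fails.

    stages: ordered list of callables, e.g. [zero_shot, few_shot, self_consistency]
    """
    answer = None
    for stage in stages:
        answer = stage(problem)
        if confident(answer):
            break
    return answer
```

Most requests stop at the first stage, so average cost stays close to plain zero-shot CoT while hard cases still get the heavier machinery.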

Conclusion

Chain-of-thought prompting transforms LLMs from pattern-matching systems into reasoning engines. By encouraging explicit reasoning steps, CoT dramatically improves performance on tasks requiring logic, planning, and multi-step problem solving.

The key insight: making reasoning explicit doesn’t just improve answers—it makes AI systems more transparent, debuggable, and trustworthy. As models grow more capable, CoT techniques will become even more powerful, enabling AI to tackle increasingly complex cognitive tasks.

Start with zero-shot CoT for quick wins, then progressively add sophistication as your use cases demand. The investment in learning CoT techniques pays dividends across every domain where AI-powered reasoning matters.


Part of the AI & ML series on advanced techniques for working with large language models.

#Prompt Engineering #Chain-of-Thought #AI #LLMs #Reasoning