LLMs are surprisingly capable reasoners, but coaxing good reasoning out of them is still mostly an art. You tweak a prompt, run it, squint at the output, tweak again. It works, sort of, but it doesn’t scale and it doesn’t compose.
I wanted to try something different: treat prompting as a search problem.
The idea
MCTS (Monte Carlo Tree Search) is the algorithm that made AlphaGo work. It’s good at navigating large decision spaces where you can’t enumerate everything but you can sample and learn. Prompt engineering has exactly this structure. The space of possible prompts is enormous, you can evaluate any particular prompt by running it, and small changes can have outsized effects.
The key design choice is decomposing prompts into a 5-dimensional action space:
- Context: what background information to include
- Examples: which few-shot examples to provide
- Constraints: what guidelines or restrictions to specify
- Format: how to structure the output
- Reasoning: what thinking strategies to encourage
Each dimension has a discrete set of primitives. MCTS explores combinations of these primitives, learning which compositions produce better outputs.
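To make the structure concrete, here is a minimal sketch of what such an action space could look like. The primitive strings below are illustrative placeholders, not the system's actual primitives:

```python
# Hypothetical primitives per dimension -- the real sets are larger
# and defined in the paper, not here.
ACTION_SPACE = {
    "context": ["", "Background: this is a math word problem."],
    "examples": ["", "Example: Q: What is 2+2? A: 4."],
    "constraints": ["", "Show every intermediate step."],
    "format": ["", "Answer with a single number on the last line."],
    "reasoning": ["", "Think step by step before answering."],
}

def assemble_prompt(choice):
    """Join one chosen primitive per dimension into a prompt string."""
    return "\n".join(s for s in (choice[d] for d in ACTION_SPACE) if s)

# Even these tiny sets multiply: 2^5 = 32 combinations. Realistic
# primitive sets blow this up fast, which is why the system searches
# rather than enumerates.
n_combos = 1
for options in ACTION_SPACE.values():
    n_combos *= len(options)
```

A full prompt is then just one choice per dimension, which is what makes the space searchable: every node in the tree fixes a few dimensions and leaves the rest open.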
Why bother?
Hand-crafted prompts have a composability problem. You find a prompt that works for task A. You find another for task B. Combining them for task A+B often breaks both. There’s no principled way to compose prompt strategies.
A search-based approach sidesteps this. The system explores compositions directly and keeps what works. It doesn’t need your intuition about what “should” work together.
The search loop
```python
# Simplified conceptual flow
while not converged:
    # MCTS exploration: pick a promising prompt from the tree
    prompt = select_promising_prompt()
    # Try it with the LLM
    result = llm.generate(prompt)
    # Evaluate output quality
    score = evaluate(result)
    # Update the search tree along the selected path
    backpropagate(score)
```
Standard MCTS, adapted for discrete prompt composition instead of game moves. The tree nodes represent partial prompt specifications, and rollouts are actual LLM calls with evaluation.
Results
On reasoning benchmarks, the system finds prompts that outperform hand-crafted baselines by about 30%. More interesting than the numbers: it discovers strategies I wouldn’t have thought to try. Some of the effective compositions look strange to a human prompt engineer but work well empirically.
The system also composes across task types, finding prompt structures that generalize rather than overfitting to specific problems.
Where this is useful
Complex reasoning tasks (math, logic, planning) benefit most. If you’re in a domain where you don’t have deep prompt engineering experience, or you need prompts that adapt to changing task distributions, search beats intuition.
The paper
Full technical details, including the formal action space specification, MCTS adaptations, experimental methodology, ablation studies, and analysis of discovered strategies:
Discussion