A recent paper, “Reasoning with Language Model is Planning with World Model,” by researchers from UC San Diego, the University of Florida, and Mohamed bin Zayed University of Artificial Intelligence, presents a method for adding strategic planning to large language models (LLMs). The approach improves the reasoning of LLMs and brings it closer to how humans think and plan.

The Challenge with LLMs

The power of LLMs to understand and generate human-like text has been evident in recent years. However, LLMs still struggle with tasks that require planning or mathematical, logical, or commonsense reasoning. Humans, by contrast, hold a mental representation of their environment in which they can simulate actions and their effects before acting. The research team sought to give LLMs a similar planning ability, resulting in their proposed Reasoning via Planning (RAP) framework.

RAP incorporates a “world model” into LLMs and leverages Monte Carlo Tree Search to guide the LLM through exploration and exploitation within the reasoning space. Through RAP, LLMs show significant improvements in performance on diverse and challenging problems like Blocksworld, math reasoning, and logical inference.

Bridging the Gap with RAP

The RAP framework repurposes the LLM as both a world model and a reasoning agent, so the LLM can anticipate the outcome of an action before committing to it. This planning process is similar to how humans engage in strategic thinking. RAP can also aggregate multiple promising reasoning traces to arrive at a final answer.

The world model formulation leaves the definition of state and action flexible, so it can be adapted to different reasoning tasks. In Blocksworld, for example, a state is a configuration of blocks and an action moves one block. During reasoning, the feasibility and desirability of each step are assessed and used to guide how the LLM proceeds.
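To make the state and action idea concrete, here is a minimal sketch of a toy Blocksworld world model. It is not the paper's implementation: in RAP the LLM itself predicts the next state, whereas this hand-coded stand-in only illustrates the interface. The names `State`, `Action`, `is_feasible`, and `predict_next_state` are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical representation: block name -> what it rests on
# ("table" or the name of another block).
State = Dict[str, str]
# Hypothetical action: (block to move, destination).
Action = Tuple[str, str]


def is_clear(state: State, block: str) -> bool:
    """A block is clear when no other block rests on it."""
    return block not in state.values()


def is_feasible(state: State, action: Action) -> bool:
    """Check whether the action can be applied in this state."""
    block, dest = action
    if block not in state or block == dest or not is_clear(state, block):
        return False
    # The table always has room; another block must exist and be clear.
    return dest == "table" or (dest in state and is_clear(state, dest))


def predict_next_state(state: State, action: Action) -> State:
    """Stand-in for the world model: return the state after the action."""
    if not is_feasible(state, action):
        raise ValueError(f"infeasible action: {action}")
    block, dest = action
    next_state = dict(state)  # copy, so the search can branch from `state`
    next_state[block] = dest
    return next_state


start = {"A": "table", "B": "A", "C": "table"}
after = predict_next_state(start, ("B", "C"))
```

Returning a new state instead of mutating the old one matters for tree search, because several candidate actions are expanded from the same parent state.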

Each reasoning step is assigned a reward, and Monte Carlo Tree Search (MCTS) uses these rewards to search for high-reward reasoning traces. The rewards assess the correctness and helpfulness of each step and can be tailored to the task at hand.
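The balance between exploration and exploitation in MCTS is commonly handled with a UCT-style selection rule. The sketch below shows that generic rule, not RAP's exact code: the `Node` class, the mean-reward estimate `q`, and the exploration weight `w` are assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical search-tree node keyed by action label."""
    visits: int = 0
    total_reward: float = 0.0
    children: Dict[str, "Node"] = field(default_factory=dict)

    @property
    def q(self) -> float:
        """Mean reward observed through this node (0.0 if unvisited)."""
        return self.total_reward / self.visits if self.visits else 0.0


def uct_select(node: Node, w: float = 1.0) -> str:
    """Return the action whose child maximizes Q + w * sqrt(ln N / n)."""
    def score(child: Node) -> float:
        if child.visits == 0:
            return math.inf  # always try an unvisited child first
        bonus = math.sqrt(math.log(node.visits) / child.visits)
        return child.q + w * bonus

    return max(node.children, key=lambda a: score(node.children[a]))


def backpropagate(path: List[Node], reward: float) -> None:
    """Credit a finished trace's reward to every node on its path."""
    for n in path:
        n.visits += 1
        n.total_reward += reward


root = Node(visits=10, children={
    "a": Node(visits=8, total_reward=4.0),  # well explored, higher mean
    "b": Node(visits=2, total_reward=0.8),  # rarely tried, lower mean
})
```

With `w=0` the rule is purely greedy and picks the child with the higher mean reward; a larger `w` favors children that have been visited less often.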

The MCTS algorithm in RAP builds a reasoning tree iteratively to determine the final reasoning trace. By aggregating multiple valid reasoning traces, RAP-Aggregation produces the final answer for tasks where only the answer is required, and not the trace itself.
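One simple way to picture aggregation is a reward-weighted vote over the final answers of several traces. The sketch below is an assumption about how such a vote could look, not the paper's exact procedure; `aggregate_answers` and the sample rewards are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_answers(traces: Iterable[Tuple[str, float]]) -> str:
    """Pick the answer with the largest summed reward.

    Each item is (final answer of a trace, reward of that trace).
    """
    totals: Dict[str, float] = defaultdict(float)
    for answer, reward in traces:
        totals[answer] += reward
    if not totals:
        raise ValueError("no reasoning traces to aggregate")
    return max(totals, key=lambda ans: totals[ans])


# Made-up rewards for four traces that end in two different answers.
sample_traces = [("18", 0.9), ("20", 0.6), ("18", 0.4), ("20", 0.5)]
best = aggregate_answers(sample_traces)
```

Because the vote only looks at final answers, it suits tasks such as math word problems, where the answer matters and the intermediate trace does not.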

RAP in Action: Outperforming Baselines and GPT-4

In their experiments, the researchers applied RAP to several tasks and benchmarks, comparing it with recent LLM reasoning methods and, where computational resources allowed, with strong models such as GPT-4.

In a plan-generation task using the Blocksworld benchmark, RAP shows significant improvement over baseline methods, with success rates up to 64%. RAP also performs well on numerical reasoning, such as the GSM8K dataset, where it notably outperforms Chain-of-Thought and Least-to-Most prompting, including their self-consistency variants.

For logical reasoning, RAP is tested on the PrOntoQA dataset, where it achieves high answer accuracy and proof accuracy, surpassing baselines by up to 14%. Its self-evaluation reward lets RAP recognize dead ends and explore alternative reasoning steps, which is key to its success.

A Step Forward for AI Reasoning

This research marks a significant advance in AI reasoning. The Reasoning via Planning (RAP) framework gives LLMs strategic planning capabilities, allowing them to simulate future outcomes and reason more like humans. Because RAP outperforms existing methods and adapts to a range of reasoning tasks, it offers a promising direction for future work on more human-like strategic thinking and planning in AI.

As AI technology advances, the incorporation of strategic planning through methods like RAP brings us closer to achieving human-level reasoning in AI systems. This could reshape various fields, from natural language understanding to autonomous decision-making, and lead to more efficient, accurate, and human-like intelligent agents.
