Recent research by Guhao Feng, Yuntian Gu, Bohang Zhang, and Haotian Ye from Peking University sheds light on why Chain-of-Thought (CoT) prompting improves the reasoning and mathematical capabilities of Large Language Models (LLMs). The study offers valuable insight into why CoT works and how it may shape future applications in AI.

Solving Complex Problems with CoT

The authors motivate the use of CoT prompting in transformer-based LLMs for mathematical and reasoning tasks by proving fundamental impossibility results. Their key finding is that, without CoT, bounded-depth transformers would need model sizes that grow super-polynomially with the input length to solve even basic mathematical and decision-making tasks, because these problems are inherently hard to parallelize. Autoregressive transformer models equipped with CoT, by contrast, can solve the same tasks effectively by generating intermediate derivations step by step.

The experiments in the study evaluated models on four tasks: Arithmetic, Equation, Longest Increasing Subsequence (LIS), and Edit Distance (ED). Transformers trained on CoT datasets achieved near-perfect performance, while those trained on direct datasets (input-to-answer pairs with no intermediate steps) performed poorly, supporting the authors' theoretical results.
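To make the distinction concrete, here is a hypothetical sketch of what a direct versus a CoT-style training example might look like for the Arithmetic task; the exact encoding used in the paper may differ.

```python
# Hypothetical illustration of "direct" vs "CoT" supervision for the
# Arithmetic task; the exact tokenization and format in the paper may differ.

# Direct supervision: the model must map the expression straight to its answer.
direct_example = {
    "input":  "(3 + 4) * (2 + 5) =",
    "target": "49",
}

# CoT supervision: the target spells out intermediate derivations step by step,
# so each generated step only requires a small, local computation.
cot_example = {
    "input":  "(3 + 4) * (2 + 5) =",
    "target": "(3 + 4) * (2 + 5) = 7 * (2 + 5) = 7 * 7 = 49",
}
```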

Autoregressive Transformers and CoT Prompts

Autoregressive transformer models are neural network architectures that process a sequence of input tokens and generate the tokens at subsequent positions. They embed the input tokens, add positional embeddings, and transform the resulting matrix through a stack of layers, each consisting of multi-head self-attention followed by a feed-forward network.
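As a rough sketch of this architecture, the minimal decoder-only model below (written in PyTorch; the layer sizes and class names are illustrative choices, not the configuration used in the paper) sums token and positional embeddings and passes them through masked multi-head self-attention and feed-forward layers:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One autoregressive transformer layer: masked self-attention + feed-forward."""

    def __init__(self, d_model=128, n_heads=4, d_ff=512):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Causal mask: position i may only attend to positions <= i.
        seq_len = x.size(1)
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.norm1(x + attn_out)          # residual + layer norm
        return self.norm2(x + self.ffn(x))    # feed-forward sublayer


class TinyAutoregressiveTransformer(nn.Module):
    """Token + positional embeddings feeding a stack of decoder blocks."""

    def __init__(self, vocab_size, max_len=256, d_model=128, n_layers=2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.blocks = nn.ModuleList([DecoderBlock(d_model) for _ in range(n_layers)])
        self.head = nn.Linear(d_model, vocab_size)  # next-token logits

    def forward(self, token_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.tok_emb(token_ids) + self.pos_emb(positions)
        for block in self.blocks:
            x = block(x)
        return self.head(x)
```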

What CoT adds is the generation of intermediate reasoning steps, produced sequentially before the final answer, which lets transformers solve complex problems with far greater accuracy than the same models answering directly.
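A greedy decoding loop (a simplified sketch; production systems use richer sampling and stopping rules) shows how those intermediate steps enter the model's own context: each generated token is appended to the input before the next one is predicted, so later steps can build directly on earlier ones.

```python
import torch

@torch.no_grad()
def generate_cot(model, prompt_ids, eos_id, max_new_tokens=256):
    """Greedy autoregressive decoding: generated reasoning tokens are fed back
    into the context, so every new step conditions on all previous steps."""
    ids = prompt_ids                              # shape (1, prompt_length)
    for _ in range(max_new_tokens):
        logits = model(ids)                       # (1, seq_len, vocab_size)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)    # append the new token
        if next_id.item() == eos_id:              # stop at end-of-sequence
            break
    return ids
```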

Chain-of-Thought in Practice

The researchers demonstrated that transformers equipped with CoT can efficiently solve both the Arithmetic and Equation tasks. For Arithmetic, autoregressive transformers generate CoTs by focusing on specific “handles” – adjacent numbers connected by an operation – and reducing them one at a time. For the linear Equation task, the CoT follows the Gaussian elimination algorithm, successively eliminating variables until the unique solution is reached.
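The sketch below illustrates the idea for the Arithmetic task (an illustrative reconstruction, not the paper's exact construction, and parentheses are omitted for brevity): at each step the leftmost handle is reduced, respecting operator precedence, and the resulting intermediate expression is emitted, yielding exactly the kind of step-by-step derivation a CoT dataset contains.

```python
def arithmetic_cot(tokens):
    """Generate a chain of thought for an arithmetic expression by repeatedly
    reducing one "handle": two adjacent numbers joined by an operator.
    Illustrative sketch only; parentheses are omitted for brevity."""
    ops = {"*": lambda a, b: a * b, "+": lambda a, b: a + b, "-": lambda a, b: a - b}
    steps = [" ".join(tokens)]
    while len(tokens) > 1:
        # Choose the leftmost handle, giving * priority over + and -.
        mul = [i for i, t in enumerate(tokens) if t == "*"]
        addsub = [i for i, t in enumerate(tokens) if t in ("+", "-")]
        idx = mul[0] if mul else addsub[0]
        a, op, b = int(tokens[idx - 1]), tokens[idx], int(tokens[idx + 1])
        tokens = tokens[:idx - 1] + [str(ops[op](a, b))] + tokens[idx + 2:]
        steps.append(" ".join(tokens))     # one intermediate derivation per step
    return steps

# ['3 + 4 * 2 - 5', '3 + 8 - 5', '11 - 5', '6']
print(arithmetic_cot(["3", "+", "4", "*", "2", "-", "5"]))
```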

One particularly striking result is that autoregressive transformers equipped with CoT can solve general decision-making problems that admit Dynamic Programming solutions. Dynamic programming decomposes a complex task into simpler subproblems whose solutions are reused, improving computational efficiency across a wide range of problem-solving applications.
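For intuition, the Longest Increasing Subsequence task from the experiments is a classic dynamic-programming problem. The sketch below uses the standard textbook DP (not the paper's exact CoT construction); the successive dp states form a natural step-by-step chain that a CoT-trained model could, in principle, write out before stating the final answer.

```python
def lis_with_cot(nums):
    """Longest Increasing Subsequence via dynamic programming.
    dp[i] = length of the longest increasing subsequence ending at index i.
    Each dp state is a small local computation over previously derived results,
    so the sequence of states reads like a chain of thought."""
    dp, steps = [], []
    for i, x in enumerate(nums):
        # Extend the best earlier subsequence whose last element is smaller than x.
        best_prev = max((dp[j] for j in range(i) if nums[j] < x), default=0)
        dp.append(best_prev + 1)
        steps.append(f"dp[{i}] = {dp[i]}  (ends with {x})")
    return steps, (max(dp) if dp else 0)

steps, answer = lis_with_cot([3, 1, 4, 1, 5, 9, 2, 6])
print("\n".join(steps))
print("LIS length =", answer)   # LIS length = 4
```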

The Future of CoT and AI

As research continues to unfold the potential of CoT prompting in transformers, several important questions remain about how well it generalizes to real-world situations. Investigating how specific prompts trigger CoT generation and how model size affects CoT quality can help shape future research and lead to further discoveries in AI.

Additionally, with the power of CoT prompting, large language models may be able to learn from limited training data, adapt to unseen situations, and emulate dynamic programming, thereby solving intricate tasks more efficiently. By addressing these questions and exploring how well LLMs generalize from CoT demonstrations, AI researchers can reshape how LLMs perform mathematical and reasoning tasks.

In conclusion, the ongoing quest to unlock the power of transformers has led researchers to discover the remarkable potential of CoT prompting. With the ability to efficiently solve complex problems and demonstrate impressive generalization capabilities, CoT has the potential to shape the future landscape of AI, taking us one step closer to high-performing, more intelligent artificial minds.

Original Paper