Researchers from the Faculty of Informatics at Masaryk University have been enriching chain-of-thoughts datasets that require arithmetical reasoning with calls to non-parametric components, such as calculators. In their recent work, they developed a machine-processable, HTML-like format that makes it easier to integrate large language models (LLMs) with symbolic systems, with the aim of improving arithmetical reasoning in AI.

Integration Challenges and Benefits

Large language models perform well on unstructured language data, but they often struggle with arithmetical reasoning tasks that require explicit computation, because they generate text probabilistically rather than computing results exactly. Symbolic systems, by contrast, handle arithmetic calculations flawlessly. By combining neural and symbolic systems, the researchers aim to use the strengths of both approaches. The challenge lies in the integration itself: the data must be structured carefully so that the model's free text and the inputs and outputs of the symbolic system are never confused.

The authors propose a unified, machine-processable format that integrates LLMs with calculators in chain-of-thoughts datasets. They convert GSM8K, Ape210K, AQuA-RAT, and MathQA into this format so that interactions between LLMs and symbolic systems are represented consistently across all four datasets.

The Calc-X Approach: A Common Format

The researchers introduce the Calc-X approach, a semi-structured format for chain-of-thoughts data that combines the precision of structured formats with the flexibility of unstructured text. Built on an HTML-like language, it uses three tags: gadget, output, and result. The gadget tag wraps an input to an external system, the output tag wraps the external system's response to that input, and the result tag marks the final result of the thought chain.
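To make the three tags concrete, here is a minimal Python sketch of how such a chain could be written and read back. The function names (render_chain, parse_chain) and the exact tag syntax without attributes are assumptions made for illustration; the article names only the tags and their roles, so this should not be read as the authors' implementation.

```python
import re
from html import escape, unescape

def render_chain(steps, final_answer):
    """Render (text, expression, value) steps into a tagged chain.

    Hypothetical helper: the tag names come from the article,
    the attribute-free layout is an assumption of this sketch.
    """
    parts = []
    for text, expression, value in steps:
        parts.append(text)
        # Input sent to the external system (here: a calculator).
        parts.append(f"<gadget>{escape(expression)}</gadget>")
        # Response of the external system to that input.
        parts.append(f"<output>{escape(value)}</output>")
    # Final result of the whole thought chain.
    parts.append(f"<result>{escape(final_answer)}</result>")
    return "\n".join(parts)

GADGET_RE = re.compile(r"<gadget>(.*?)</gadget>\s*<output>(.*?)</output>", re.DOTALL)
RESULT_RE = re.compile(r"<result>(.*?)</result>", re.DOTALL)

def parse_chain(chain):
    """Recover the (expression, value) calls and the final result."""
    calls = [(unescape(q), unescape(a)) for q, a in GADGET_RE.findall(chain)]
    match = RESULT_RE.search(chain)
    result = unescape(match.group(1)) if match else None
    return calls, result

if __name__ == "__main__":
    chain = render_chain(
        [("Half of 48 is", "48/2", "24"), ("Together that makes", "48+24", "72")],
        "72",
    )
    print(chain)
    print(parse_chain(chain))
```

Because the free text stays outside the tags, a program can pick out every calculator call with a simple pattern while the explanation remains readable to a person.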

With this format, the researchers aim to make the integration between LLMs and non-parametric systems, such as calculators, seamless, and ultimately to improve the arithmetical reasoning capabilities of AI models.

Exploring Datasets: GSM8K, Ape210K, AQuA-RAT, and MathQA

The researchers survey several chain-of-thoughts datasets for arithmetical reasoning where LLMs can significantly benefit from relying on non-parametric systems.

GSM8K consists of about 8K grade-school math word problems whose solutions contain annotated arithmetical expressions that can be evaluated using a calculator. The researchers parse the formulae using regular expressions, evaluate them with the sympy library, and export the data in the new HTML-like format.
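The parse-and-evaluate step can be sketched roughly as follows. GSM8K marks calculations inline as <<expression=value>> and the final answer after "####". Two things here are assumptions of the sketch rather than the authors' code: the function names are hypothetical, and a small evaluator built on Python's ast module stands in for sympy so that the example has no dependencies.

```python
import ast
import operator
import re
from fractions import Fraction

# GSM8K-style inline annotation, e.g. <<48/2=24>>
ANNOTATION_RE = re.compile(r"<<([^=<>]+)=([^<>]+)>>")
# GSM8K-style final answer line, e.g. "#### 72"
FINAL_RE = re.compile(r"####\s*(.+?)\s*$")

_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(expression):
    """Evaluate + - * / arithmetic exactly (stand-in for sympy in this sketch)."""
    def walk(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return Fraction(str(node.value))
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")
    return walk(ast.parse(expression.strip(), mode="eval").body)

def format_value(value):
    """Print whole numbers without a fraction part, everything else as a decimal."""
    return str(value.numerator) if value.denominator == 1 else str(float(value))

def to_tagged(solution):
    """Replace each annotation with a recomputed gadget/output pair."""
    def replace(match):
        expression = match.group(1).strip()
        # Recompute the value instead of trusting the annotated one.
        value = format_value(evaluate(expression))
        return f"<gadget>{expression}</gadget><output>{value}</output> "
    tagged = ANNOTATION_RE.sub(replace, solution)
    return FINAL_RE.sub(lambda m: f"<result>{m.group(1)}</result>", tagged)

if __name__ == "__main__":
    print(to_tagged("She sold 48/2 = <<48/2=24>>24 clips in May.\n#### 24"))
```

Recomputing each value, rather than copying the annotated one, means that an arithmetic slip in the source data does not carry over into the converted dataset.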

Ape210K has over 200K math problems involving simple arithmetic. The prompts are written in Chinese, which the team translates into English using machine translation. The nested expressions are also linearized into a sequence of simple steps before the data is exported in the new format.
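Linearization means turning one nested expression into a sequence of single-operation calculator calls, innermost first. The Python sketch below shows one way to do this, assuming the nested expression is ordinary infix arithmetic with +, -, * and /. The function names linearize and to_chain are hypothetical, and the approach (a post-order walk over the parsed expression) is an illustration, not necessarily the procedure the researchers used.

```python
import ast
import operator
from fractions import Fraction

_OPS = {
    ast.Add: ("+", operator.add),
    ast.Sub: ("-", operator.sub),
    ast.Mult: ("*", operator.mul),
    ast.Div: ("/", operator.truediv),
}

def _fmt(value, wrap=False):
    """Format an exact value; wrap fractions in parentheses when used as operands."""
    if value.denominator == 1:
        return str(value.numerator)
    text = f"{value.numerator}/{value.denominator}"
    return f"({text})" if wrap else text

def linearize(expression):
    """Return a list of (single-operation expression, value) steps."""
    steps = []

    def walk(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return Fraction(str(node.value))
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            # Post-order: evaluate both operands first, then emit this step.
            left = walk(node.left)
            right = walk(node.right)
            symbol, func = _OPS[type(node.op)]
            value = func(left, right)
            step = f"{_fmt(left, wrap=True)} {symbol} {_fmt(right, wrap=True)}"
            steps.append((step, _fmt(value)))
            return value
        raise ValueError(f"unsupported expression: {expression!r}")

    walk(ast.parse(expression.strip(), mode="eval").body)
    return steps

def to_chain(expression):
    """Render the linearized steps with gadget, output, and result tags."""
    steps = linearize(expression)
    lines = [f"<gadget>{q}</gadget><output>{a}</output>" for q, a in steps]
    final = steps[-1][1] if steps else expression.strip()
    lines.append(f"<result>{final}</result>")
    return "\n".join(lines)

if __name__ == "__main__":
    print(to_chain("(11 - 1) * 2"))
```

For the input (11 - 1) * 2, the sketch first emits the step 11 - 1 with output 10, then 10 * 2 with output 20, so each calculator call contains exactly one operation and later calls reuse earlier outputs.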

AQuA-RAT has around 100K math problems, each with multiple answer choices, the correct choice, and an informal, free-text rationale. The researchers identify expressions in the rationales using regular expressions and evaluate them with the sympy calculator, while keeping the structure of each rationale intact.

MathQA extends AQuA-RAT’s rationales with further annotations. It addresses issues in the earlier dataset, corrects errors in the rationales, and annotates solutions with nested expressions that lead to the correct answer. Parsing and linearization work much as they do for Ape210K, which makes the conversion into the new format more consistent.

The Future of Calc-X and AI Research

The Calc-X approach can lead to significant improvements in AI capabilities for arithmetical reasoning tasks. However, challenges remain, such as refining the AQuA-RAT rationales and improving the integration between LLMs and non-parametric systems.

For future research, using a sequence-to-sequence language model could lead to a higher recall of annotated data. Employing a language model to create more in-depth, natural-language explanations for arithmetical reasoning could also enrich large-scale chain-of-thoughts datasets. Such improvements could have a substantial impact on fields that rely on arithmetical reasoning, such as finance, engineering, and education, by offering users more efficient and precise problem-solving tools.
