A recent research article from Alibaba Group's DAMO Academy explores whether GPT-4, a large language model (LLM), can perform data analysis on par with professional human data analysts. The study ran a series of head-to-head experiments comparing GPT-4's output with that of human data analysts, using a framework designed to prompt GPT-4 through end-to-end data analysis tasks.

How GPT-4 Performs Data Analysis

As a large language model, GPT-4 has shown strong capabilities in various areas such as natural language processing (NLP) and finance. This study sought to determine if GPT-4 could effectively replace data analysts by evaluating its performance on tasks that normally fall within the job scope of a data analyst. With this aim in mind, the researchers developed an end-to-end framework that prompts GPT-4 to perform the three main steps data analysts follow: data collection, data visualization, and data analysis.

Given a business-related query and the relevant database tables and schemas, GPT-4 was asked to identify the required data, generate an appropriate chart to visualize it, and provide analysis and insights on the data. The experiments used 100 randomly selected questions from the NvBench dataset, covering a range of domains and chart types. The researchers also used the framework to write insights from the data and evaluated the quality of the output with task-specific metrics.
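The three steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the `LLM` callable, the prompt wording, the `TASK=` markers, the `sales` table, and the `fake_llm` stand-in are all hypothetical, and the chart is represented as a plain dictionary rather than a rendered figure. The chart type is passed in directly here, which simplifies how it would be derived from the question.

```python
import sqlite3
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: any function mapping a prompt string to a reply string.
LLM = Callable[[str], str]


def build_sql_prompt(question: str, schema: str) -> str:
    """Step 1 prompt: ask the model for a query that fetches the required data."""
    return (
        f"Database schema:\n{schema}\n\n"
        f"Question: {question}\n"
        "TASK=SQL: reply with a single SQLite query and nothing else."
    )


def build_analysis_prompt(question: str, columns: Sequence[str], rows: Sequence[Tuple]) -> str:
    """Step 3 prompt: ask the model for bullet-point insights on the extracted data."""
    table = "\n".join(", ".join(str(value) for value in row) for row in rows)
    return (
        f"Question: {question}\n"
        f"Columns: {', '.join(columns)}\n"
        f"Data:\n{table}\n"
        "TASK=ANALYSIS: reply with short bullet points, one per line."
    )


def collect_data(conn: sqlite3.Connection, sql: str) -> Tuple[List[str], List[Tuple]]:
    """Run the generated query and return the column names plus the rows."""
    cursor = conn.execute(sql)
    columns = [desc[0] for desc in cursor.description]
    return columns, cursor.fetchall()


def run_pipeline(question: str, schema: str, conn: sqlite3.Connection,
                 llm: LLM, chart_type: str) -> Dict:
    # Step 1: data collection through a model-written SQL query.
    sql = llm(build_sql_prompt(question, schema)).strip()
    columns, rows = collect_data(conn, sql)
    # Step 2: visualization, kept as a chart spec (assumes two result columns).
    chart = {
        "type": chart_type,
        "x_label": columns[0],
        "y_label": columns[1],
        "x": [row[0] for row in rows],
        "y": [row[1] for row in rows],
    }
    # Step 3: analysis, parsed into a list of bullet points.
    reply = llm(build_analysis_prompt(question, columns, rows))
    insights = [line.lstrip("-* ").strip() for line in reply.splitlines() if line.strip()]
    return {"sql": sql, "chart": chart, "insights": insights}


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real model call so the sketch runs offline."""
    if "TASK=SQL" in prompt:
        return "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region"
    return "- North has the highest total.\n- South trails North."


def demo() -> Dict:
    """Build a tiny in-memory database (illustrative data) and run the pipeline."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("North", 120.0), ("South", 80.0), ("North", 30.0)],
    )
    return run_pipeline(
        question="What are total sales per region?",
        schema="sales(region TEXT, amount REAL)",
        conn=conn,
        llm=fake_llm,
        chart_type="bar",
    )
```

Swapping `fake_llm` for a function that calls a real model is the only change needed to try the same flow against live output. Model-written SQL should be run against a read-only or disposable database.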

Evaluation and Comparison with Human Analysts

To assess GPT-4's performance as a data analyst, the study defined two sets of evaluation metrics. Generated figures were scored on information correctness, chart type correctness, and aesthetics, while analysis bullet points were scored on correctness, alignment, complexity, and fluency. Six professional data annotators independently graded each generated figure and analysis bullet point using these metrics.
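One simple way to combine independent annotator grades is to average them per metric, as in the sketch below. The averaging rule, the `Rating` tuple layout, and the function name are assumptions made for illustration; the summary above does not specify how the study aggregated scores or which scales it used.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Metric names as listed in the text, split by what they grade.
FIGURE_METRICS = ("information correctness", "chart type correctness", "aesthetics")
ANALYSIS_METRICS = ("correctness", "alignment", "complexity", "fluency")
KNOWN_METRICS = set(FIGURE_METRICS) | set(ANALYSIS_METRICS)

# Hypothetical record layout: (annotator id, metric name, score).
Rating = Tuple[str, str, float]


def average_by_metric(ratings: Iterable[Rating]) -> Dict[str, float]:
    """Average the scores given by all annotators for each metric."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for _annotator, metric, score in ratings:
        if metric not in KNOWN_METRICS:
            raise ValueError(f"unknown metric: {metric!r}")
        buckets[metric].append(score)
    return {metric: mean(scores) for metric, scores in buckets.items()}


# Placeholder scores for illustration only; these are not results from the study.
example_ratings: List[Rating] = [
    ("annotator_1", "fluency", 3.0),
    ("annotator_2", "fluency", 2.0),
    ("annotator_1", "aesthetics", 2.0),
    ("annotator_2", "aesthetics", 2.0),
]
example_averages = average_by_metric(example_ratings)
```

Keeping figure and analysis metrics in separate tuples makes it easy to report the two groups side by side.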

The results showed that GPT-4’s performance was generally comparable to that of human data analysts in many aspects. GPT-4 performed particularly well in understanding and plotting correct chart types, as well as in generating grammatically correct sentences. However, its scores in information correctness were not as high due to occasional errors in the plotted charts.

The study also compared GPT-4's performance with that of a senior data analyst from the finance industry, another senior data analyst from the internet industry, and a junior analyst from a consulting firm. These comparisons revealed that GPT-4 even outperformed the human analysts on some metrics. Moreover, the cost of using GPT-4 for these tasks was much lower than that of hiring human data analysts.

Cases, Challenges, and Future Research

The article discussed several cases illustrating the capabilities of GPT-4 and human analysts in generating data analysis insights. While GPT-4 demonstrated its ability to understand data, generate code, and provide insights, it also made incorrect calculations under certain conditions. Human data analysts, on the other hand, were able to express personal thoughts and emotions in their analysis and to apply background knowledge.

Despite the promising results, the study found several challenges in GPT-4's performance, such as hallucination problems that led to errors in the generated analysis. Additionally, the questions used in the experiments were more specific than those human data analysts usually encounter in practice. Furthermore, the amount of human evaluation and data analyst annotation was limited by budget constraints.

These challenges suggest that more studies are needed to validate the conclusion that GPT-4 can replace human data analysts. However, the findings of this research have demonstrated GPT-4’s potential as a valuable tool in the data analysis process and have provided a foundation for future research aimed at further understanding and improving AI capabilities in the data analytics field.

Key Takeaway: This study reveals the potential of GPT-4 as a data analyst, showing the capabilities of large language models like GPT-4 in automating tasks typically performed by data analysts. By leveraging these AI capabilities, businesses and organizations can potentially optimize their data analysis process, lower costs, and enhance decision-making based on insights generated from vast amounts of data.

Original Paper