Can AI Explain Company Performance?

A Horserace of AI Models

June 2023. Reading Time: 10 Minutes. Authors: Wachi Bandara, PhD, Anshuma Chandak, Brandon Flannery.


  • The rapid evolution of language models has the potential to revolutionize financial analysis
  • GPT outperformed when analyzing earnings calls, followed by Word2Vec and BERT
  • However, models should be selected carefully, as each has its pros and cons


This paper aims to evaluate the quality of word vectors produced by different word embedding models on two text-similarity-related financial analysis tasks: company identification and explanation of earnings surprises (assessed by standardized unexpected earnings, or SUE). In the company identification task, we explore language models’ ability to identify a company by comparing its earnings call and different sections of its 10-K report with the business sections of the 10-K reports of randomly chosen companies. In the SUE task, we explore different language models’ ability to explain a firm’s standardized unexpected earnings using text from its earnings call presentation section.
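The two tasks above can be sketched in a few lines. This is an illustrative sketch, not the paper’s implementation: the function names (`identify_company`, `standardized_unexpected_earnings`) are hypothetical, the embedding vectors are assumed to come from any of the surveyed models (Word2Vec, BERT, GPT), and SUE is computed with the common definition of an earnings surprise scaled by its historical standard deviation.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two document embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def identify_company(call_vec: np.ndarray, candidate_10k_vecs: list) -> int:
    # Company identification task: return the index of the candidate
    # 10-K business-section vector most similar to the earnings-call vector.
    sims = [cosine_similarity(call_vec, v) for v in candidate_10k_vecs]
    return int(np.argmax(sims))


def standardized_unexpected_earnings(actual_eps: float,
                                     expected_eps: float,
                                     surprise_std: float) -> float:
    # SUE task target: the earnings surprise (actual minus expected EPS)
    # standardized by the historical standard deviation of surprises.
    return (actual_eps - expected_eps) / surprise_std
```

Under this setup, a model’s quality on the identification task reduces to how often the most-similar 10-K vector belongs to the correct company.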

Specifically, this study surveys the quality of popular word embedding models on text from earnings calls and 10-K filings. The results suggest that transformer-based models like GPT and BERT outperform others on both text-similarity tasks: explaining earnings surprises and identifying companies from earnings calls.


There has been a great deal of excitement around Generative Pre-trained Transformer (GPT) models, with ChatGPT taking center stage. Ever since its introduction in November 2022 [1], this AI chatbot has gained remarkable popularity across various platforms and domains. OpenAI’s technical report [2] states that GPT-4 exhibits human-level performance on a majority of professional and academic exams. Notably, it passes a simulated version of the Uniform Bar Examination with a score in the top 10% of test takers. This suggests that sophisticated large language models (LLMs) can potentially be useful virtual research assistants for financial tasks. The question we address in this paper relates to the performance of these models when the task moves beyond simple recall.

Early studies using term frequency vectors established strong relationships between word occurrences and future firm returns. Research has