In this Key Scientific Paper Redux of “Can AI Read the Minds of Corporate Executives?,”[i] we summarize the findings of researchers who asked whether Natural Language Processing can be used to assess companies’ disclosures in their regulatory filings and thereby predict earnings surprises in subsequent quarters. The answer to that question is: yes. This finding comports with our own back-test analyses, which find that there is still signal contained within quarterly and annual reports over a year later.
Study Details
The researchers considered three different high-level approaches to extracting meaning from the words contained in company 10-Ks and 10-Qs from 1993 through 2021 and had them compete against one another to predict future earnings surprises. Specifically, those methods are:
1. Method 1: Sentiment.
a. A keyword sentiment lexicon developed by the researchers Loughran and McDonald in 2011, in which human researchers encoded the meanings of words found in financial reports.
b. A fine-tuned version of Google’s pre-trained Large Language Model BERT (Bidirectional Encoder Representations from Transformers) known as FinBERT, created by Huang et al. in 2022.
c. Simply looking at the length of the MD&A and Risk Factors sections.
2. Method 2: Bag-of-Words.
a. A manual word classification scheme along with a regression model similar to that employed by Jegadeesh and Wu in 2013. Here the meanings of words are classified and then given weights by a regression model.
b. The same type of approach as above, but using a different scheme proposed by Manela and Moreira in 2017.
3. Method 3: Hierarchical Transformer LLM. The method proposed by the authors of the paper, who use a Large Language Model in a novel way. Specifically:
a. They focus on the MD&A and Risk Factors sections of reports.
b. Because current quarterly earnings announcements are noisy, they train their machine learning algorithms on the next quarter’s earnings announcement surprises.
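To make the keyword-lexicon idea in Method 1 concrete, here is a minimal sketch of lexicon-based net-tone scoring. The word lists and the sample sentence are purely illustrative (the actual Loughran-McDonald lexicon classifies thousands of terms); this is not the paper's code.

```python
import re

# Tiny illustrative word lists; the real Loughran-McDonald lexicon is far larger.
POSITIVE = {"improve", "gain", "strong", "favorable"}
NEGATIVE = {"decline", "loss", "adverse", "impairment"}

def lexicon_sentiment(text: str) -> float:
    """Return (positive count - negative count) / total words: a simple net-tone score."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

score = lexicon_sentiment("Margins continued to improve with strong demand.")
# Two positive hits out of seven words gives a positive net tone.
```

A filing's MD&A or Risk Factors text would be scored this way, and the score used as the sentiment signal.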
Major Findings
For each Method’s performance summarized below, the authors ranked stocks into quintiles based on their earnings surprise forecasts and then evaluated the out-of-sample performance of the High-minus-Low strategy that buys the highest quintile and sells the lowest quintile. For these quintile portfolios they looked at both equal-weighted (EW) and value-weighted (VW) returns. Last, they controlled for additional factors that might skew their results, running cross-sectional and time-series regressions against the time horizon and various firm characteristics.
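The High-minus-Low evaluation can be sketched as follows: sort stocks by forecast, take the top and bottom quintiles, and difference their portfolio returns, either equal-weighted or value-weighted by market cap. All numbers below are invented for illustration, not from the paper.

```python
def high_minus_low(forecasts, returns, caps=None):
    """EW (or VW, if market caps are given) return of top quintile minus bottom quintile."""
    order = sorted(range(len(forecasts)), key=lambda i: forecasts[i])
    q = len(order) // 5                      # quintile size (assumes len divisible by 5)
    low, high = order[:q], order[-q:]

    def port_ret(idx):
        if caps is None:                     # equal-weighted average
            return sum(returns[i] for i in idx) / len(idx)
        total = sum(caps[i] for i in idx)    # value-weighted average
        return sum(returns[i] * caps[i] for i in idx) / total

    return port_ret(high) - port_ret(low)

# Ten hypothetical stocks: forecasted surprise vs. realized return.
f = [0.9, 0.1, 0.5, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4, 1.0]
r = [0.05, -0.02, 0.01, 0.04, -0.01, 0.03, 0.00, 0.02, 0.01, 0.06]
ew = high_minus_low(f, r)   # long the two best forecasts, short the two worst
```

With equal market caps, the VW and EW results coincide; the weighting only matters when large-cap stocks dominate a quintile.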
For Method 2, the researchers used various statistical and machine learning methods to develop weightings for the words identified as predictive. The statistical method used is shown in parentheses below.
We do not describe the researchers’ full, voluminous output below, but note that the authors also looked at CAPM returns, as well as Fama-French 5-factor and 6-factor returns. Those interested in these measures of performance can consult the paper, which is publicly available.
1. Method 1 performance is as follows:
a. Keyword sentiment lexicon: EW = -2.1%; VW = +4.5%.
b. FinBERT: EW = +15.0%; VW = +31.3%.
c. Length of MD&A disclosures: EW = -18.9%; VW = -26.4% (in other words, shorter disclosures perform better).
d. Length of Risk Factors disclosures: EW = +18.6%; VW = -8.9% (a mixed result: the value-weighted numbers again favor shorter disclosures, while the equal-weighted numbers do not).
2. Method 2 performance is as follows:
a. Bag-of-Words classification, version 1 (Ordinary Least Squares): EW = +22.4%; VW = +3.9%.
b. Bag-of-Words classification, version 2 (Loughran & McDonald OLS): EW = +18.7%; VW = +21.9%.
c. Bag-of-Words classification, version 3 (Elastic Nets): EW = +33.3%; VW = +18.1%.
d. Bag-of-Words classification, version 4 (Lasso): EW = +22.0%; VW = +10.3%.
e. Bag-of-Words classification, version 5 (Support Vector Regression): EW = +40.0%; VW = +41.8%.
f. Bag-of-Words classification, version 6 (XGBoost): EW = +25.6%; VW = +39.7%.
g. Bag-of-Words classification, version 7 (Random Forest): EW = -4.4%; VW = -19.1%.
h. Bag-of-Words classification, version 8 (Feed Forward Neural Networks): EW = +18.7%; VW = +21.7%.
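Method 2’s core step — letting a regression assign weights to counted words — can be sketched with a tiny ordinary-least-squares fit. Everything here (the two illustrative “words,” their counts, and the surprise values) is invented for illustration; the paper’s versions use large vocabularies and the eight estimators listed above.

```python
def ols_2feature(X, y):
    """Solve min ||Xw - y||^2 for two features via the normal equations (X^T X) w = X^T y."""
    a = b = c = d0 = d1 = 0.0
    for (x0, x1), t in zip(X, y):
        a += x0 * x0; b += x0 * x1; c += x1 * x1   # accumulate X^T X
        d0 += x0 * t; d1 += x1 * t                 # accumulate X^T y
    det = a * c - b * b
    return ((c * d0 - b * d1) / det, (a * d1 - b * d0) / det)

# Counts of two illustrative words per filing, and the next quarter's surprise.
X = [(3, 0), (1, 2), (0, 4), (2, 1)]   # (count of "growth", count of "litigation")
y = [0.03, -0.01, -0.04, 0.01]
w_growth, w_litigation = ols_2feature(X, y)
# The fit assigns a positive weight to "growth" and a negative weight to "litigation".
```

A new filing’s forecast is then simply its word counts times these learned weights; the regularized estimators (Lasso, Elastic Net) differ only in how they penalize the weights during fitting.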
3. Method 3 performance is as follows:
a. Frozen BERT: EW = +40.7%; VW = +43.0%.
b. FtBERT: EW = +43.9%; VW = +56.1%.
Conclusions
Among the researchers’ findings: traditional Natural Language Processing techniques are not able to identify future positive or negative changes in firms’ valuations. Next, off-the-shelf Large Language Models (LLMs), even those trained on financial targets, while good at predicting future earnings surprises, are no better than simpler estimators such as the lengths of companies’ MD&A and Risk Factors sections. Third, fine-tuning and training LLMs is a solution: the researchers’ offering, FtBERT, provides superior results in identifying future positive or negative changes in earnings. Last, the researchers find that there is robust unpriced information contained in both quarterly and annual reports.
Quotes of Note
- “The abundance of signals and complexity in disclosed information leads to investors inattention to subtle but important signals even in the most foundational to the corporate reporting process items such as quarterly and annual, 10-Q (10-K), reports.”
- “Cohen et al. (2020) also show that the subsequent [earnings] announcement does indeed reflect information which the market neglected to react to in the previous quarter’s announcement.”
- “Finally, we bring attention to the valuable information content of 10-Q and 10-K reports. We also find that market participants react very slowly to this information, largely due to high disagreement about its interpretation.”
[i] Chapados, Nicolas, Zhenzhen Fan, Russ Goyenko, Issam Hadj Laradji, Fred Liu, and Chengyu Zhang. “Can AI Read the Minds of Corporate Executives?” SSRN, 27 June 2023.