{"id":14334,"date":"2023-11-28T15:28:26","date_gmt":"2023-11-28T20:28:26","guid":{"rendered":"https:\/\/jasonapollovoss.com\/web\/?p=14334"},"modified":"2025-09-05T16:25:39","modified_gmt":"2025-09-05T22:25:39","slug":"key-scientific-paper-redux-can-ai-read-the-minds-of-corp-execs","status":"publish","type":"post","link":"https:\/\/jasonapollovoss.com\/web\/2023\/11\/28\/key-scientific-paper-redux-can-ai-read-the-minds-of-corp-execs\/","title":{"rendered":"Key Scientific Paper Redux: Can AI Read the Minds of Corp Execs?"},"content":{"rendered":"<p>[et_pb_section fb_built=&#8221;1&#8243; admin_label=&#8221;section&#8221; _builder_version=&#8221;4.16&#8243; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;post_content&#8221;][et_pb_row admin_label=&#8221;row&#8221; _builder_version=&#8221;4.16&#8243; background_size=&#8221;initial&#8221; background_position=&#8221;top_left&#8221; background_repeat=&#8221;repeat&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;post_content&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.16&#8243; custom_padding=&#8221;|||&#8221; global_colors_info=&#8221;{}&#8221; custom_padding__hover=&#8221;|||&#8221; theme_builder_area=&#8221;post_content&#8221;][et_pb_text admin_label=&#8221;Text&#8221; _builder_version=&#8221;4.16&#8243; background_size=&#8221;initial&#8221; background_position=&#8221;top_left&#8221; background_repeat=&#8221;repeat&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;post_content&#8221;]<\/p>\n<figure class=\"x-el x-el-figure c2-1 c2-2 c2-3x c2-i c2-h c2-21 c2-2c c2-29 c2-2a c2-43 c2-51 c2-3 c2-4 c2-5 c2-6 c2-7 c2-8\">\n<div><\/div>\n<\/figure>\n<p><span style=\"font-family: futural;\">In this Key Scientific Paper Redux of \u201cCan AI Read the Minds of Corporate Executives?,\u201d<a class=\"x-el x-el-a c2-2w c2-2x c2-69 c2-v c2-w c2-x c2-j c2-6a c2-3 c2-30 c2-31 c2-11 c2-32\" 
href=\"https:\/\/blogging.godaddy.com\/blog\/a6d795a4-a672-4120-a6ba-07384a52a2d8\/posts\/9e5f651f-618b-490e-9e9b-5045e0f25d60#_edn1\" rel=\"\">[i]<\/a>\u00a0we summarize the findings of researchers who investigated whether Natural Language Processing can be used to assess companies\u2019 disclosures in their regulatory filings and thereby predict future earnings surprises in subsequent quarters. The answer to that question is: yes. This finding comports with\u00a0<a class=\"x-el x-el-a c2-2w c2-2x c2-69 c2-v c2-w c2-x c2-j c2-6a c2-3 c2-30 c2-31 c2-11 c2-32\" href=\"https:\/\/deceptionandtruthanalysis.com\/insights\/f\/data-beats-the-dow\" rel=\"\">our own back-test analyses<\/a>, which find that there is still signal contained within quarterly and annual reports more than a year later.<\/span><\/p>\n<div>\n<h4 class=\"x-el x-el-h4 c2-6h c2-6i c2-v c2-w c2-42 c2-2c c2-2a c2-29 c2-2b c2-3 c2-z c2-44 c2-10 c2-45 c2-46 c2-47 c2-48\"><span style=\"font-family: futural;\"><strong class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-3v c2-66\"><\/strong><\/span><\/h4>\n<h3 class=\"x-el x-el-h4 c2-6h c2-6i c2-v c2-w c2-42 c2-2c c2-2a c2-29 c2-2b c2-3 c2-z c2-44 c2-10 c2-45 c2-46 c2-47 c2-48\"><span style=\"font-family: futural;\"><strong class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-3v c2-66\">Study Details<\/strong><\/span><\/h3>\n<\/div>\n<p><span style=\"font-family: futural;\">The researchers considered three different high-level approaches to extracting meaning from the words contained in company 10-Ks and 10-Qs from 1993 through 2021 and had them compete against one another to predict future earnings surprises. Specifically, those methods are:<\/span><\/p>\n<p><span style=\"font-family: futural;\">1.\u00a0<u class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-31 c2-66 c2-69\">Method 1: Sentiment<\/u>.<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0a. 
A keyword sentiment lexicon developed by the researchers Loughran and McDonald in 2011, in which human researchers coded the meanings of words in financial reports.<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0b. An adaptation of Google\u2019s pre-trained Large Language Model BERT (Bidirectional Encoder Representations from Transformers) known as FinBERT, created by Huang et al. in 2022.<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0c. Simply looking at the length of the MD&amp;A and Risk Factors sections.<\/span><\/p>\n<p><span style=\"font-family: futural;\">2.\u00a0<u class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-31 c2-66 c2-69\">Method 2: Bag-of-Words<\/u>.<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0a. A manual word classification scheme combined with a regression model, similar to that employed by Jegadeesh and Wu in 2013. Here the meanings of words are classified and then given weights by a regression model.<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0b. The same type of approach as above, but using a different scheme proposed by Manela and Moreira in 2017.<\/span><\/p>\n<p><span style=\"font-family: futural;\">3.\u00a0<u class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-31 c2-66 c2-69\">Method 3: Hierarchical Transformer Approach LLM<\/u>. The method proposed by the paper\u2019s authors, who use a Large Language Model in a novel way. Specifically, they focus on:<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0a. The MD&amp;A and Risk Factors sections of reports.<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0b. 
Given that current quarterly earnings announcements are noisy, they train their machine learning algorithms on next quarter\u2019s earnings announcement surprises.<\/span><\/p>\n<div>\n<h4 class=\"x-el x-el-h4 c2-6h c2-6i c2-v c2-w c2-42 c2-2c c2-2a c2-29 c2-2b c2-3 c2-z c2-44 c2-10 c2-45 c2-46 c2-47 c2-48\"><span style=\"font-family: futural;\"><strong class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-3v c2-66\"><\/strong><\/span><\/h4>\n<h3 class=\"x-el x-el-h4 c2-6h c2-6i c2-v c2-w c2-42 c2-2c c2-2a c2-29 c2-2b c2-3 c2-z c2-44 c2-10 c2-45 c2-46 c2-47 c2-48\"><span style=\"font-family: futural;\"><strong class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-3v c2-66\">Major Findings<\/strong><\/span><\/h3>\n<\/div>\n<p><span style=\"font-family: futural;\">For each of the Methods summarized below, the authors ranked stocks into quintiles based on their earnings surprise forecasts and then evaluated the out-of-sample performance of the High-minus-Low strategy, which buys the highest quintile and sells the lowest quintile. Furthermore, for these quintile portfolios they looked at both equal-weighted (EW) and value-weighted (VW) portfolio returns. Last, they controlled for additional factors that might skew their results, using cross-sectional and time-series regressions on the time horizon and various firm characteristics.\u00a0<\/span><\/p>\n<p><span style=\"font-family: futural;\">For Method 2 the researchers used various statistical and machine learning methods to develop weightings for the words identified as predictive. The statistical method used is shown in parentheses below.<\/span><\/p>\n<p><span style=\"font-family: futural;\">We do not describe the researchers\u2019 full, voluminous output below. The authors also looked at CAPM returns, as well as Fama-French 5-factor and 6-factor returns. 
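To make the quintile construction concrete, here is a minimal sketch of the High-minus-Low spread described above; all tickers, forecasts, and returns are hypothetical, and a real implementation would rebalance each quarter on actual filings data:

```python
# Minimal sketch of the High-minus-Low quintile strategy described above.
# All tickers, forecasts, and returns are invented for illustration.

def high_minus_low(forecasts, realized, n_quintiles=5):
    # Rank stocks by forecast earnings surprise, ascending.
    ranked = sorted(forecasts, key=forecasts.get)
    size = max(1, len(ranked) // n_quintiles)
    low, high = ranked[:size], ranked[-size:]
    # Equal-weighted (EW) return of each leg; value weighting (VW) would
    # instead weight each stock by its market capitalization.
    ew = lambda names: sum(realized[n] for n in names) / len(names)
    # Buy the top quintile, sell the bottom quintile.
    return ew(high) - ew(low)

forecasts = {'AAA': 0.9, 'BBB': 0.1, 'CCC': 0.5, 'DDD': -0.4, 'EEE': 0.7}
realized = {'AAA': 0.06, 'BBB': -0.02, 'CCC': 0.01, 'DDD': -0.05, 'EEE': 0.03}
print(round(high_minus_low(forecasts, realized), 3))
```

The VW variants reported below swap the equal-weighting step for market-cap weights, which is why EW and VW figures for the same signal can differ in both size and sign.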
For those interested in these additional performance measures, please consult\u00a0<a class=\"x-el x-el-a c2-2w c2-2x c2-69 c2-v c2-w c2-x c2-j c2-6a c2-3 c2-30 c2-31 c2-11 c2-32\" href=\"https:\/\/papers.ssrn.com\/sol3\/papers.cfm?abstract_id=4493166#:~:text=It%20can.%20Using%20textual%20information%20from%20a%20complete,language%20models%2C%20LLMs%2C%20to%20predict%20future%20earnings%20surprises.\" rel=\"\">the paper<\/a>, which is publicly available.<\/span><\/p>\n<p><span style=\"font-family: futural;\">1. Method 1 performance is as follows:<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0a. Keyword sentiment lexicon: EW = -2.1%; VW = +4.5%.<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0b. FinBERT: EW = +15.0%; VW = +31.3%.<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0c. Length of MD&amp;A disclosures: EW = -18.9%; VW = -26.4% (in other words, shorter disclosures perform better).<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0d. Length of Risk Factors disclosures: EW = +18.6%; VW = -8.9% (in other words, shorter disclosures perform better).<\/span><\/p>\n<p><span style=\"font-family: futural;\">2. Method 2 performance is as follows:<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0a. Bag-of-Words classification, version 1 (Ordinary Least Squares): EW = +22.4%; VW = +3.9%.<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0b. Bag-of-Words classification, version 2 (Loughran &amp; McDonald OLS): EW = +18.7%; VW = +21.9%.<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0c. Bag-of-Words classification, version 3 (Elastic Nets): EW = +33.3%; VW = +18.1%.<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0d. Bag-of-Words classification, version 4 (Lasso): EW = +22.0%; VW = +10.3%.<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0e. 
Bag-of-Words classification, version 5 (Support Vector Regression): EW = +40.0%; VW = +41.8%.<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0f. Bag-of-Words classification, version 6 (XGBoost): EW = +25.6%; VW = +39.7%.<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0g. Bag-of-Words classification, version 7 (Random Forest): EW = -4.4%; VW = -19.1%.<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0h. Bag-of-Words classification, version 8 (Feed Forward Neural Networks): EW = +18.7%; VW = +21.7%.<\/span><\/p>\n<p><span style=\"font-family: futural;\">3. Method 3 performance is as follows:<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0a. Frozen BERT: EW = +40.7%; VW = +43.0%.<\/span><\/p>\n<p><span style=\"font-family: futural;\">\u00a0\u00a0\u00a0b. FtBERT: EW = +43.9%; VW = +56.1%.<\/span><\/p>\n<div>\n<h4 class=\"x-el x-el-h4 c2-6h c2-6i c2-v c2-w c2-42 c2-2c c2-2a c2-29 c2-2b c2-3 c2-z c2-44 c2-10 c2-45 c2-46 c2-47 c2-48\"><span style=\"font-family: futural;\"><strong class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-3v c2-66\"><\/strong><\/span><\/h4>\n<h3 class=\"x-el x-el-h4 c2-6h c2-6i c2-v c2-w c2-42 c2-2c c2-2a c2-29 c2-2b c2-3 c2-z c2-44 c2-10 c2-45 c2-46 c2-47 c2-48\"><span style=\"font-family: futural;\"><strong class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-3v c2-66\">Conclusions<\/strong><\/span><\/h3>\n<\/div>\n<p><span style=\"font-family: futural;\">Among the researchers\u2019 findings is that traditional Natural Language Processing techniques are not able to identify future positive or negative changes in firms\u2019 valuations. Next, off-the-shelf Large Language Models (LLMs), even those trained on financial targets, are good at predicting future earnings surprises but no better than simpler estimators such as the lengths of companies\u2019 MD&amp;A and Risk Factors sections of their quarterly and annual reports. 
Third, fine-tuning and training LLMs is a solution: the researchers\u2019 own offering, FtBERT, provides superior results in identifying future positive or negative changes in earnings. Last, the researchers find that there is robust unpriced information contained in both quarterly and annual reports.<\/span><\/p>\n<p><span style=\"font-family: futural;\"><strong class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-3v c2-66\"><\/strong><\/span><\/p>\n<h3><span style=\"font-family: futural;\"><strong class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-3v c2-66\">Quotes of Note<\/strong><\/span><\/h3>\n<ul>\n<li><span style=\"font-family: futural;\">\u201cThe abundance of signals and complexity in disclosed information leads to investors inattention to subtle but important signals even in the most foundational to the corporate reporting process items such as quarterly and annual, 10-Q (10-K), reports.\u201d<\/span><\/li>\n<li><span style=\"font-family: futural;\">\u201cCohen et al. (2020) also show that the subsequent [earnings] announcement does indeed reflect information which the market neglected to react to in the previous quarter\u2019s announcement.\u201d<\/span><\/li>\n<li><span style=\"font-family: futural;\">\u201cFinally, we bring attention to the valuable information content of 10-Q and 10-K reports. 
We also find that market participants react very slowly to this information, largely due to high disagreement about its interpretation.\u201d<\/span><\/li>\n<\/ul>\n<hr class=\"x-el x-el-hr c2-1 c2-2 c2-6j c2-6k c2-4q c2-29 c2-2b c2-k c2-3 c2-4 c2-5 c2-6 c2-7 c2-8\" \/>\n<p><span style=\"font-family: futural;\"><a class=\"x-el x-el-a c2-2w c2-2x c2-69 c2-v c2-w c2-x c2-j c2-6a c2-3 c2-30 c2-31 c2-11 c2-32\" href=\"https:\/\/blogging.godaddy.com\/blog\/a6d795a4-a672-4120-a6ba-07384a52a2d8\/posts\/9e5f651f-618b-490e-9e9b-5045e0f25d60#_ednref1\" rel=\"\">[i]<\/a> Chapados, Nicolas, Zhenzhen Fan, Russ Goyenko, Issam Hadj Laradji, Fred Liu, and Chengyu Zhang. \u201c<a class=\"x-el x-el-a c2-2w c2-2x c2-69 c2-v c2-w c2-x c2-j c2-6a c2-3 c2-30 c2-31 c2-11 c2-32\" href=\"https:\/\/papers.ssrn.com\/sol3\/papers.cfm?abstract_id=4493166#:~:text=It%20can.%20Using%20textual%20information%20from%20a%20complete,language%20models%2C%20LLMs%2C%20to%20predict%20future%20earnings%20surprises.\" rel=\"\">Can AI Read the Minds of Corporate Executives?<\/a>\u201d SSRN. 27 June 2023.<\/span><\/p>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][\/et_pb_section]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this Key Scientific Paper Redux of \u201cCan AI Read the Minds of Corporate Executives?,\u201d[i]\u00a0we summarize the findings of researchers who investigated whether Natural Language Processing can be used to assess companies\u2019 disclosures in their regulatory filings and thereby predict future earnings surprises in subsequent quarters. 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":14335,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"on","_et_pb_old_content":"<figure class=\"x-el x-el-figure c2-1 c2-2 c2-3x c2-i c2-h c2-21 c2-2c c2-29 c2-2a c2-43 c2-51 c2-3 c2-4 c2-5 c2-6 c2-7 c2-8\">\r\n<div>\r\n<div><span style=\"font-family: futural;\"><img class=\"x-el x-el-img c2-1 c2-2 c2-k c2-21 c2-1x c2-1y c2-29 c2-2b c2-s c2-6b c2-4l c2-3 c2-4 c2-5 c2-6 c2-7 c2-8\" title=\"Key Scientific Paper Redux: Can AI Read the Minds of Corporate Executives?\" src=\"https:\/\/img1.wsimg.com\/isteam\/ip\/b4167b12-c211-4a45-9c4b-489be14138f8\/Can%20AI%20Read%20the%20Minds%20of%20Corporate%20Executives.PNG\/:\/cr=t:0%25,l:0%25,w:100%25,h:100%25\/rs=w:1280\" alt=\"Key Scientific Paper Redux: Can AI Read the Minds of Corporate Executives?\" \/><\/span><\/div>\r\n<\/div>\r\n<figcaption class=\"x-el x-el-figcaption c2-1 c2-2 c2-v c2-w c2-3d c2-29 c2-2b c2-4f c2-6c c2-6d c2-6e c2-6f c2-3 c2-6g c2-3e c2-10 c2-3f c2-3g c2-3h c2-3i\"><span style=\"font-family: futural;\">Key Scientific Paper Redux: Can AI Read the Minds of Corporate Executives?<\/span><\/figcaption><\/figure>\r\n<span style=\"font-family: futural;\"><em>By Jason A. Voss, CFA<\/em><\/span>\r\n\r\n<span style=\"font-family: futural;\">In this Key Scientific Paper Redux of \u201cCan AI Read the Minds of Corporate Executives?,\u201d<a class=\"x-el x-el-a c2-2w c2-2x c2-69 c2-v c2-w c2-x c2-j c2-6a c2-3 c2-30 c2-31 c2-11 c2-32\" href=\"https:\/\/blogging.godaddy.com\/blog\/a6d795a4-a672-4120-a6ba-07384a52a2d8\/posts\/9e5f651f-618b-490e-9e9b-5045e0f25d60#_edn1\" rel=\"\">[i]<\/a>\u00a0we summarize the findings of researchers who were interested in whether or not using Natural Language Processing may be used to assess disclosures from companies in their regulatory filings that would allow them to predict future earnings surprises in subsequent quarters. 
The answer to that question is: yes. This finding comports with\u00a0<a class=\"x-el x-el-a c2-2w c2-2x c2-69 c2-v c2-w c2-x c2-j c2-6a c2-3 c2-30 c2-31 c2-11 c2-32\" href=\"https:\/\/deceptionandtruthanalysis.com\/insights\/f\/data-beats-the-dow\" rel=\"\">our own back-test analyses<\/a>\u00a0that finds that there is still signal contained in within quarterly and annual reports over a year later.<\/span>\r\n<div>\r\n<h4 class=\"x-el x-el-h4 c2-6h c2-6i c2-v c2-w c2-42 c2-2c c2-2a c2-29 c2-2b c2-3 c2-z c2-44 c2-10 c2-45 c2-46 c2-47 c2-48\"><span style=\"font-family: futural;\"><strong class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-3v c2-66\">Study Details<\/strong><\/span><\/h4>\r\n<\/div>\r\n<span style=\"font-family: futural;\">The researchers considered three different high-level approaches to extract meaning from the words contained in company 10-Ks and 10-Qs from 1993 thru 2021 and had them compete against one another to predict future earnings surprises. Specifically, those methods are:<\/span>\r\n\r\n<span style=\"font-family: futural;\">1.\u00a0<u class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-31 c2-66 c2-69\">Method 1: Sentiment<\/u>.<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0a. A keyword sentiment lexicon developed by the researchers Loughran and McDonald in 2011 that had human-based researchers encoding words in financial reports as to their meaning.<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0b. A tweaking of Google\u2019s pre-trained Large Language Model BERT (Bidirectional Encoder Representations from Transformers) known as FinBERT, created by Huang, et al. in 2022.<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0c. 
Simply looking at the length of the MD&amp;A and Risk Factors sections.<\/span>\r\n\r\n<span style=\"font-family: futural;\">2.\u00a0<u class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-31 c2-66 c2-69\">Method 2: Bag-of-Words<\/u>.<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0a. A manual word classification scheme along with a regression model similar to that employed by Jegadeesh and Wu in 2013. Here the meanings of words are classified and then given weights by a regression model.<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0b. The same type of approach as above, but as suggested by a difference scheme proposed by Manela and Moreira in 2017.<\/span>\r\n\r\n<span style=\"font-family: futural;\">3.\u00a0<u class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-31 c2-66 c2-69\">Method 3: Hierarchical Transformer Approach LLM<\/u>. The method proposed by the authors of the paper who use a Large Language Model in a novel way. Specifically, they focus on:<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0a. The MD&amp;A and Risk Factors sections of reports.<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0b. 
Given that current quarterly earnings announcements are noisy, they train their machine learning algorithms on next quarter\u2019s earnings announcement surprises.<\/span>\r\n<div>\r\n<h4 class=\"x-el x-el-h4 c2-6h c2-6i c2-v c2-w c2-42 c2-2c c2-2a c2-29 c2-2b c2-3 c2-z c2-44 c2-10 c2-45 c2-46 c2-47 c2-48\"><span style=\"font-family: futural;\"><strong class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-3v c2-66\">Major Findings<\/strong><\/span><\/h4>\r\n<\/div>\r\n<span style=\"font-family: futural;\">For each of the Methods\u2019 performance summarized below the authors\u2019 criteria was to rank stocks based on their earnings surprise forecasts into quintiles and then evaluate the out of sample performance of the High-minus-Low strategy that buys the highest quintile category and sells the lowest quintile category. Furthermore, for these quintile portfolios they looked at both the equal weighted (EW) and value weighted (VW) portfolio returns. Last, they controlled for additional factors that might skew their results, such as cross-sectional and time-series regressions with the time horizon and various firm characteristics.\u00a0<\/span>\r\n\r\n<span style=\"font-family: futural;\">For Method 2 the researchers used various statistical and machine learning methods to develop weightings for the words identified as predictive. Which statistical method was used is shown in the parentheses, below.<\/span>\r\n\r\n<span style=\"font-family: futural;\">Below we do not describe the full voluminous output of the researchers. But the authors also looked at CAPM returns, as well as Fama-French 5-factor and 6-factor returns. 
For those interested in these measures of performance, please consult\u00a0<a class=\"x-el x-el-a c2-2w c2-2x c2-69 c2-v c2-w c2-x c2-j c2-6a c2-3 c2-30 c2-31 c2-11 c2-32\" href=\"https:\/\/papers.ssrn.com\/sol3\/papers.cfm?abstract_id=4493166#:~:text=It%20can.%20Using%20textual%20information%20from%20a%20complete,language%20models%2C%20LLMs%2C%20to%20predict%20future%20earnings%20surprises.\" rel=\"\">the paper<\/a>\u00a0which is publicly available.<\/span>\r\n\r\n<span style=\"font-family: futural;\">1. Method 1 performance is as follows:<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0a. Keyword sentiment lexicon: EW = -2.1%; VW = +4.5%.<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0b. FinBERT: EW = +15.0%; VW = +31.3%.<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0c. Length of MD&amp;A disclosures: EW = -18.9%; VW = -26.4% (in other words, shorter disclosures perform better).<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0d. Length of Risk Factors disclosures: EW = +18.6%; VW = -8.9% (in other words, shorter disclosures perform better).<\/span>\r\n\r\n<span style=\"font-family: futural;\">2. Method 2 performance is as follows:<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0a. Bag-of-Words classification, version 1 (Ordinary Least Squares): EW = +22.4%; VW = +3.9%.<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0b. Bag-of-Words classification, version 2 (Loughran &amp; McDonald OLS): EW = +18.7%; VW = +21.9%.<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0c. Bag-of-Words classification, version 3 (Elastic Nets): EW = +33.3%; VW = +18.1%.<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0d. Bag-of-Words classification, version 4 (Lasso): EW = +22.0%; VW = +10.3%.<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0e. 
Bag-of-Words classification, version 5 (Support Vector Regression): EW = +40.0%; VW = +41.8%.<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0f. Bag-of-Words classification, version 6 (XGBoost): EW = +25.6%; VW = +39.7%.<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0g. Bag-of-Words classification, version 7 (Random Forest): EW = -4.4%; VW = -19.1%.<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0h. Bag-of-Words classification, version 8 (Feed Forward Neural Networks): EW = +18.7%; VW = +21.7%.<\/span>\r\n\r\n<span style=\"font-family: futural;\">3. Method 3 performance is as follows:<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0a. Frozen BERT: EW = +40.7%; VW = +43.0%.<\/span>\r\n\r\n<span style=\"font-family: futural;\">\u00a0\u00a0\u00a0b. FtBERT: EW = +43.9%; VW = +56.1%.<\/span>\r\n<div>\r\n<h4 class=\"x-el x-el-h4 c2-6h c2-6i c2-v c2-w c2-42 c2-2c c2-2a c2-29 c2-2b c2-3 c2-z c2-44 c2-10 c2-45 c2-46 c2-47 c2-48\"><span style=\"font-family: futural;\"><strong class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-3v c2-66\">Conclusions<\/strong><\/span><\/h4>\r\n<\/div>\r\n<span style=\"font-family: futural;\">Among the findings of the researchers are that traditional Natural Language Processing techniques are not able to identify future positive or negative changes in firms\u2019 valuations. Next, off-the-shelf Large Language Models (LLMs), even those trained on financial targets, while good at predicting future earnings surprises, are not better than simpler estimators such as the lengths of companies\u2019 MD&amp;A and Risk Factors sections of their quarterly and annual reports. Third, fine-tuning and training LLMs is a solution. The researchers\u2019 offering, FtBERT provides superior results in identifying future positive or negative changes in earnings. 
Last, the researchers find that there is robust unpriced information contained in both quarterly and annual reports.<\/span>\r\n\r\n<span style=\"font-family: futural;\"><strong class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-3v c2-66\">Quotes of Note<\/strong><\/span>\r\n<ul>\r\n \t<li><span style=\"font-family: futural;\">\u201cThe abundance of signals and complexity in disclosed information leads to investors inattention to subtle but important signals even in the most foundational to the corporate reporting process items such as quarterly and annual, 10-Q (10-K), reports.\u201d<\/span><\/li>\r\n \t<li><span style=\"font-family: futural;\">\u201cCohen et al. (2020) also show that the subsequent [earnings] announcement does indeed reflect information which the market neglected to react to in the previous quarter\u2019s announcement.\u201d<\/span><\/li>\r\n \t<li><span style=\"font-family: futural;\">\u201cFinally, we bring attention to the valuable information content of 10-Q and 10-K reports. We also find that market participants react very slowly to this information, largely due to high disagreement about its interpretation.\u201d \u00a0\u00a0\u00a0<\/span><\/li>\r\n<\/ul>\r\n\r\n<hr class=\"x-el x-el-hr c2-1 c2-2 c2-6j c2-6k c2-4q c2-29 c2-2b c2-k c2-3 c2-4 c2-5 c2-6 c2-7 c2-8\" \/>\r\n\r\n<span style=\"font-family: futural;\"><a class=\"x-el x-el-a c2-2w c2-2x c2-69 c2-v c2-w c2-x c2-j c2-6a c2-3 c2-30 c2-31 c2-11 c2-32\" href=\"https:\/\/blogging.godaddy.com\/blog\/a6d795a4-a672-4120-a6ba-07384a52a2d8\/posts\/9e5f651f-618b-490e-9e9b-5045e0f25d60#_ednref1\" rel=\"\">[i]<\/a>Chapados, Nicolas, Zhenzhen Fan, Russ Goyenko, Issam Hadj Laradji, Fred Liu, and Chengyu Zhang. 
\u201c<a class=\"x-el x-el-a c2-2w c2-2x c2-69 c2-v c2-w c2-x c2-j c2-6a c2-3 c2-30 c2-31 c2-11 c2-32\" href=\"https:\/\/papers.ssrn.com\/sol3\/papers.cfm?abstract_id=4493166#:~:text=It%20can.%20Using%20textual%20information%20from%20a%20complete,language%20models%2C%20LLMs%2C%20to%20predict%20future%20earnings%20surprises.\" rel=\"\">Can AI Read the Minds of Corporate Executives?<\/a>\u201d SSRN. 27 June 2023<\/span>","_et_gb_content_width":"","footnotes":""},"categories":[3,465],"tags":[457,458,445,459,441],"class_list":["post-14334","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-the-blog","category-d-a-t-a","tag-ai-ml","tag-artificial-intelligence","tag-key-scientific-paper-redux","tag-machine-learning","tag-validation"],"_links":{"self":[{"href":"https:\/\/jasonapollovoss.com\/web\/wp-json\/wp\/v2\/posts\/14334","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jasonapollovoss.com\/web\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jasonapollovoss.com\/web\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jasonapollovoss.com\/web\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/jasonapollovoss.com\/web\/wp-json\/wp\/v2\/comments?post=14334"}],"version-history":[{"count":0,"href":"https:\/\/jasonapollovoss.com\/web\/wp-json\/wp\/v2\/posts\/14334\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/jasonapollovoss.com\/web\/wp-json\/wp\/v2\/media\/14335"}],"wp:attachment":[{"href":"https:\/\/jasonapollovoss.com\/web\/wp-json\/wp\/v2\/media?parent=14334"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jasonapollovoss.com\/web\/wp-json\/wp\/v2\/categories?post=14334"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jasonapollovoss.com\/web\/wp-json\/wp\/v2\/tags?post=14334"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}