One of the most frequent requests we receive at Deception And Truth Analysis is to independently validate our claims that DATA Scores are predictive of future stock price movements.
After a nearly six-month review we are pleased to report that we have received that independent validation from CloudQuant[i] with the publication of their whitepaper, "Outperforming the Market with Measures of Deceptive and Truthful Language in Regulatory Filings." Specifically, they found that DATA Scores provide a significant return advantage across different capitalization strategies, as well as in various sector strategies. In this article we discuss CloudQuant’s independent validation, which shows that DATA handily beats the S&P 500.
Now the only question, with respect, is: Are you interested in generating alpha, or not?
DATA Handily Beats the S&P 500: Major Results*
CloudQuant constructed a portfolio that featured a large portion of the most truthful companies as assessed by their DATA Scores. What they found is:
[* For those interested in CloudQuant’s methodology we have summarized it below.]
First, CloudQuant constructed a trading strategy that took advantage of some of the findings from their study. Namely, they used the period of 2008 through 2019 to define their parameters and the period of 2020 through March 2023 for testing. They used the default settings described below, with two modifications: a) the portfolio was long-only and b) the Percentile Rank Cutoff was set to the top or bottom 35%, depending on the sector.
This resulted in an Annualized Net Outperformance Return of 6.26% above the equal-weighted S&P 500. The idiosyncratic return registered a cumulative 86.29% outperformance for the preceding 15 years. For this trading strategy the turnover was relatively low at just 2.2%. These results assumed trading costs of 5 basis points per trade.
Here are additional details from this test as shown in the whitepaper’s Figure 3.5.1:

Second, for the highest performing sectors there is outperformance for at least 80 days after a position is initiated.
Third, for most of the sectors, the market does not begin to price the DATA Score assessment until 30 days after the publication of 10-Ks and 10-Qs (i.e. the day that DATA assesses these documents and publishes its DATA Scores). DATA speculates that this is because most investors respond to and price quantitative financial results such as EPS rather than deceptiveness or truthfulness. DATA Scores, by contrast, are driven by the linguistic choices of a company’s management, which in turn are driven by management’s behaviors. DATA is specifically designed to measure deceptive and truthful behaviors. Thus, a lagged response makes sense.
Finally, because CloudQuant found that the holding period is an important parameter, they varied both the Lookback Period and the Holding Period. They found a consistent increase in Sharpe Ratio as the Lookback Period increases from 30 to 70 days. This means that DATA Scores are a useful metric throughout an entire quarter, with peak performance at Holding Periods of 20 and 80 days combined with a 70-day Lookback Period. To quote CloudQuant:
“[This is] a very interesting result. Typically, datasets used for investing will exhibit a very rapid (minutes to several days) decay in their returns resulting in very limited ability for large investment managers to utilize them in a significant way.”
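For readers who want to see what comparing parameter combinations by Sharpe Ratio looks like in practice, here is a minimal sketch. It is not CloudQuant's code: the function names, the 252-trading-day annualization, the zero risk-free rate and the toy return series are all assumptions made purely for illustration.

```python
import math
from statistics import mean, stdev

TRADING_DAYS_PER_YEAR = 252  # assumption: a common annualization convention


def sharpe_ratio(daily_returns, risk_free_daily=0.0):
    """Annualized Sharpe Ratio of a list of daily returns."""
    excess = [r - risk_free_daily for r in daily_returns]
    volatility = stdev(excess)
    if volatility == 0:
        raise ValueError("return series has zero volatility")
    return mean(excess) / volatility * math.sqrt(TRADING_DAYS_PER_YEAR)


def best_parameters(results):
    """Return the (lookback_days, holding_days) key with the highest Sharpe Ratio.

    `results` maps (lookback_days, holding_days) -> list of daily returns.
    """
    return max(results, key=lambda key: sharpe_ratio(results[key]))


# Made-up daily returns for two hypothetical parameter combinations.
toy_results = {
    (30, 40): [0.001, -0.002, 0.003, 0.000],
    (70, 80): [0.002, -0.001, 0.003, 0.001],
}

print(best_parameters(toy_results))
```

With these toy numbers the second combination wins because it has both a higher average return and lower volatility. A real sweep would run a full backtest for every Lookback Period and Holding Period pair before comparing them.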
DATA Handily Beats the S&P 500: Methodology
- Data assessed. CloudQuant assessed DATA’s U.S. SEC 10-Q and 10-K filings dataset, DATAbase, as well as the Management’s Discussion & Analysis sections of these documents.
- Data quantity. In total there were 6.7 million data points from 213,522 earnings filings from 6,191 companies over 14 years.
- Biases assessed. CloudQuant tested for both LookAhead Bias and Survivorship Bias and found both to be low.
- Data examined. CloudQuant evaluated three primary outputs from DATA: our DATA Score, the average DATA Score for deceptive fragments, and the average DATA Score for truthful fragments. Very importantly, DATA’s algorithm has never seen a stock price. Instead, DATA’s assessments are an NLP assessment of known-to-science behavioral differences between deceivers and truth tellers.
- Transaction fees. For each entry or exit trade they assumed a 5-basis point transaction fee.
- Portfolio Composition. Their index composition was not static. It was reassessed and redefined at the commencement of each calendar year to ensure that the universe of stock symbols remained representative of their respective market capitalization categories. This means that the strategies adapt to mergers and acquisitions activity, ascendant companies, and company size dynamics.
- Rebalancing. Two methods of rebalancing were used: a) constant holding period with event-based entry; and b) daily rebalancing with a fixed lookback window and a maximum holding period.
- Position Entry. Multiple lookback periods were considered, as well as the percentile ranking of companies based on their DATA Scores. Once DATA Scores were converted into percentiles, CloudQuant also established threshold values for long and short positions.
- Position Exit. Two methods of exiting positions were examined: a) daily rebalancing with a fixed lookback window; and b) event-based entry with a constant holding period.
- Portfolio Weighting. Two methods of weighting portfolios were considered: a) equal weight; and b) an event-driven approach, where the sizing is adjusted based on the signal count for each entry date.
- Entry Trade Execution Lag. To account for the fact that it can take a long time to fully purchase a given position, CloudQuant looked at execution times ranging from 0 to 50 trading days.
- Holding Period. Holding periods were varied from 20 to 80 days. This allowed CloudQuant to evaluate the decay rate of the signals, potential investment capacity and performance characteristics for medium to long-term investments.
- Portfolio Structure. Two types of portfolio structure were considered: a) long-only; and b) dollar-neutral.
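The Position Entry, Portfolio Weighting and Portfolio Structure steps above can be sketched roughly as follows. This is a simplified illustration, not CloudQuant's implementation. It assumes that the top of the ranking is bought and the bottom is sold short (the direction of the score is an assumption here), and the function names, ticker symbols and scores are all hypothetical.

```python
def percentile_ranks(scores):
    """Map each symbol to its cross-sectional percentile rank in (0, 1].

    Ties are broken by insertion order, which is a simplification.
    """
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    return {symbol: (i + 1) / n for i, symbol in enumerate(ordered)}


def dollar_neutral_weights(scores, long_cutoff=0.5, short_cutoff=0.5):
    """Equal-weighted long/short book built from percentile cutoffs.

    Symbols ranked above `long_cutoff` are bought; symbols ranked at or
    below `short_cutoff` are sold short. Long weights sum to +1 and short
    weights sum to -1, so the net dollar exposure is zero.
    """
    if long_cutoff < short_cutoff:
        raise ValueError("long_cutoff must not be below short_cutoff")
    ranks = percentile_ranks(scores)
    longs = [s for s, r in ranks.items() if r > long_cutoff]
    shorts = [s for s, r in ranks.items() if r <= short_cutoff]
    if not longs or not shorts:
        raise ValueError("need at least one long and one short position")
    weights = {s: 1.0 / len(longs) for s in longs}
    weights.update({s: -1.0 / len(shorts) for s in shorts})
    return weights


# Hypothetical tickers with made-up scores.
toy_scores = {"AAA": 0.9, "BBB": 0.1, "CCC": 0.6, "DDD": 0.3}
weights = dollar_neutral_weights(toy_scores)
print(weights)
```

A long-only variant would simply keep the positive weights. The event-driven weighting described above would replace the equal weights with sizes that depend on the signal count for each entry date.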
For its back tests conducted on the S&P 500, CloudQuant’s default parameters were as follows:
- Portfolio Structure: dollar neutral.
- Lookback Period: 60 days.
- Cross-Sectional Percentile Ranking of raw DATA Score.
- Entry Trade Execution Lag: 0 days.
- Holding Period: 40 days.
- Percentile Rank Cutoff Threshold: 0.50/0.50
- Portfolio Weights: equal-weighted.
- Date Range: 2008 through 2022.
For a full list of parameters, please refer to the whitepaper, Section 2, pages 2-9.
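For readers who want to experiment with a similar setup, the defaults listed above could be captured in a small configuration object. The class name and field names below are hypothetical; only the values come from the list above. How the 35% cutoff of the long-only strategy maps onto the two threshold values is also an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BacktestConfig:
    """Hypothetical container for the default S&P 500 backtest settings."""
    portfolio_structure: str = "dollar_neutral"
    lookback_days: int = 60
    ranking: str = "cross_sectional_percentile"  # applied to the raw DATA Score
    entry_lag_days: int = 0
    holding_days: int = 40
    percentile_cutoff: tuple = (0.50, 0.50)
    weighting: str = "equal"
    start_year: int = 2008
    end_year: int = 2022


DEFAULTS = BacktestConfig()

# The long-only strategy described earlier changes two of these defaults.
# Assumption: the 35% cutoff is written here as the pair (0.35, 0.35).
LONG_ONLY = replace(
    DEFAULTS,
    portfolio_structure="long_only",
    percentile_cutoff=(0.35, 0.35),
)

print(LONG_ONLY)
```

Freezing the dataclass keeps a set of parameters from being changed by accident in the middle of a run, and `replace` makes each variation from the defaults explicit.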
What’s Up Next?
This is the first of six articles we are authoring to summarize CloudQuant’s and Solactive’s independent validation of DATA Scores’ ability to predict future stock price movements. Here is an overview of this and forthcoming articles:
- DATA Handily Beats the S&P 500 – A summary of CloudQuant’s large cap findings.
- DATA’s Stock Picking Batting Average – Solactive found that DATA Scores make the right judgment call on stocks going up a very large share of the time.
- DATA Handily Beats the Russell 2000 – A summary of CloudQuant’s small cap findings.
- DATA Measures Something Unique – A summary of CloudQuant’s latency findings that demonstrate that it takes markets many weeks to price the behaviors that DATA is measuring.
- DATA’s Industry Performance – A summary of CloudQuant’s industry findings. The report card indicates that DATA is excellent at picking some industries, less so at others.
[i] Solactive has also independently validated the DATA platform with interesting results that will feature in a forthcoming article.



