Backtesting strategies based on multiple signals Robert Novy-Marx University of Rochester and NBER 1
Jan 18, 2016
Multi-signal Strategies
Proliferation in industry
  E.g., MSCI Quality Index: high ROE, low ROE volatility, low leverage
  “Smart beta” products
    RAFI: weights on sales, cash flow, book equity, and dividends
Increasingly common in academia
  Piotroski’s F-score (9 signals)
  Asness et al. Quality Score (21 signals)
2
Why the increased interest?
Because finding “alpha” is hard
And they work great!
  Impressive backtest performance
Too good?
  Alpha should be hard to find
  Lots of smart people looking
  Huge incentives to try
  And even to believe!
3
Issues
Every choice has the potential to bias results
  Much bigger problem with multiple signals
  Not just which signals are used… but how they are used!
Basic issue
  Each signal is used so that it individually predicts positive in-sample returns
  Seems like a small thing, but it’s not!
4
Types of biases
Snooping: an in-sample aspect of the data guides strategy formation
Two types to worry about:
  Multiple testing bias
    Consider multiple strategies, show only the best one
  Overfitting
    E.g., ex post MVE (mean-variance efficient) Sharpe ratios (SRs) are always high
    The MVE strategy buys “winners” and sells “losers”
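The overfitting example can be illustrated with a minimal sketch. It assumes two hypothetical pure-noise assets (true mean zero) and forms the ex post MVE portfolio from their in-sample means and covariances. The names `sharpe` and `ex_post_mve_sharpe`, the sample length, and the volatility are illustrative choices, not taken from the slides.

```python
import random
import statistics

def sharpe(returns):
    """Per-period Sharpe ratio: sample mean over sample standard deviation."""
    return statistics.fmean(returns) / statistics.stdev(returns)

def ex_post_mve_sharpe(r1, r2):
    """In-sample Sharpe ratio of the ex post MVE portfolio of two return series."""
    m1, m2 = statistics.fmean(r1), statistics.fmean(r2)
    v1, v2 = statistics.variance(r1), statistics.variance(r2)
    # Sample covariance, using the same n - 1 normalization as statistics.variance.
    c12 = sum((a - m1) * (b - m2) for a, b in zip(r1, r2)) / (len(r1) - 1)
    det = v1 * v2 - c12 ** 2
    # MVE weights are proportional to inverse(Sigma) @ mu; the 2x2 inverse is written out.
    w1 = (v2 * m1 - c12 * m2) / det
    w2 = (v1 * m2 - c12 * m1) / det
    portfolio = [w1 * a + w2 * b for a, b in zip(r1, r2)]
    return sharpe(portfolio)

rng = random.Random(0)
T = 240  # hypothetical sample length (ASSUMPTION)
r1 = [rng.gauss(0.0, 0.05) for _ in range(T)]  # pure noise, true mean zero
r2 = [rng.gauss(0.0, 0.05) for _ in range(T)]
sr1, sr2, sr_mve = sharpe(r1), sharpe(r2), ex_post_mve_sharpe(r1, r2)
print(f"asset SRs: {sr1:+.3f}, {sr2:+.3f}; ex post MVE SR: {sr_mve:+.3f}")
```

Because the weights are chosen after seeing the sample means, the in-sample MVE Sharpe ratio is never below the absolute Sharpe ratio of either asset, even though neither asset has any true premium.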
5
Examples
Bet on a series of fair coin flips
What if you knew that there were:
1. More heads in the first (or second) half
   And could bet on just the early (or late) flips?
2. More heads than tails?
What sorts of biases?
Do we account for these in finance?
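The second case (knowing which side came up more often) can be sketched as a small simulation. It assumes a $1 bet per flip placed on whichever side won the sample, chosen after the fact. The function name and the numbers of flips and trials are hypothetical.

```python
import math
import random

def majority_bet_profit(n_flips, rng):
    """Profit from betting $1 per flip on whichever side won the sample (chosen ex post)."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    tails = n_flips - heads
    return abs(heads - tails)

rng = random.Random(1)
n_flips, n_trials = 100, 5000  # illustrative sizes (ASSUMPTION)
avg_profit = sum(majority_bet_profit(n_flips, rng) for _ in range(n_trials)) / n_trials
# Normal approximation to E|heads - tails| for a fair coin.
approx = math.sqrt(2 * n_flips / math.pi)
print(f"average profit: {avg_profit:.2f}, approximation: {approx:.2f}")
```

Always betting heads earns zero on average. Picking the side ex post earns roughly 0.8 times the square root of the number of flips, which is pure overfitting.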
6
First type: multiple testing (or selection)
  Don’t really account for it, formally
  Do suspect (know) that people look at more things
Second type: overfitting
  Bet heads, not tails!
  Account for it?
    One signal: Absolutely! The 5% critical t-statistic is 1.96 (not 1.65)
    Multiple signals: No!
7
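For the one-signal case, the two critical values can be checked directly from the standard normal distribution. This is a minimal sketch using the standard library. Reading 1.65 as the one-sided and 1.96 as the two-sided 5% value is an interpretation of the slide, not something it states.

```python
from statistics import NormalDist

std_normal = NormalDist()
# Sign of the bet fixed in advance: one-sided 5% test.
one_sided_5pct = std_normal.inv_cdf(0.95)
# Sign chosen after seeing the data ("bet heads, not tails"): two-sided 5% test.
two_sided_5pct = std_normal.inv_cdf(0.975)
print(round(one_sided_5pct, 3), round(two_sided_5pct, 3))
```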
Thought experiment?
8
Null hypothesis
  “Signals” don’t predict differences in average returns
  E.g., monkeys selecting stocks by throwing darts at the WSJ
Performance distribution
  t-statistics ~ N(0,1)
  More or less: excess kurtosis and heteroscedasticity
9
What if you diversify across the lucky monkeys?
  Those with positive alpha
Clearly “snooping”
  Using an in-sample aspect of the data to form the strategy
How does this bias the results?
  Expected t-stat?
10
Get the average return
Diversify across their risks
Yields a high t-statistic:
Can also frame this in SRs
11
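Average return of the lucky strategies: $E[r \mid r > 0]$
Diversified risk: $\sigma_p = \sigma/\sqrt{N}$
Expected t-statistic: $E[t] \approx 0.8\sqrt{N}$
Expected Sharpe ratio: $E[SR] \approx 0.8\sqrt{N}/\sqrt{T}$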
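The 0.8 factor is the mean of a standard normal conditional on being positive, sqrt(2/pi). A minimal simulation sketch, assuming the lucky strategies are uncorrelated with equal volatility and are equal-weighted (the function name and trial count are hypothetical):

```python
import math
import random

def lucky_monkey_tstat(n_lucky, rng):
    """t-stat from equal-weighting n_lucky uncorrelated strategies kept for having positive t-stats."""
    # |z| has the same distribution as a standard normal conditioned on z > 0.
    t_stats = [abs(rng.gauss(0.0, 1.0)) for _ in range(n_lucky)]
    # The mean return scales with the average t-stat; volatility shrinks by sqrt(N).
    return sum(t_stats) / math.sqrt(n_lucky)

rng = random.Random(2)
n_trials = 4000  # illustrative (ASSUMPTION)
results = {}
for n_lucky in (1, 4, 16, 64):
    mean_t = sum(lucky_monkey_tstat(n_lucky, rng) for _ in range(n_trials)) / n_trials
    results[n_lucky] = mean_t
    print(f"N = {n_lucky:2d}: simulated E[t] = {mean_t:.2f}, 0.8*sqrt(N) = {0.8 * math.sqrt(n_lucky):.2f}")
```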
Same thing (essentially) happens if you use all the signals
  But sign them so that they “predict” positive in-sample returns
Standard statistics account for this…
  If and only if N = 1!
Again, the strategy has a high backtested SR
  Question: expect a high SR going forward?
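The closing question can be explored with a small sketch that signs pure-noise strategies in-sample and then evaluates the same signed combination on fresh data. It assumes i.i.d. standard normal returns with no true premium. The function names and sample sizes are hypothetical.

```python
import math
import random
import statistics

def t_stat(returns):
    """t-statistic of the mean return."""
    return statistics.fmean(returns) / (statistics.stdev(returns) / math.sqrt(len(returns)))

def signed_combo_tstats(n_signals, n_periods, rng):
    """In-sample and out-of-sample t-stats of an equal-weight combination signed in-sample."""
    in_combo = [0.0] * n_periods
    out_combo = [0.0] * n_periods
    for _ in range(n_signals):
        in_ret = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        out_ret = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        # Sign each strategy so that it "predicts" positive in-sample returns.
        sign = 1.0 if sum(in_ret) > 0 else -1.0
        for t in range(n_periods):
            in_combo[t] += sign * in_ret[t]
            out_combo[t] += sign * out_ret[t]
    return t_stat(in_combo), t_stat(out_combo)

rng = random.Random(3)
n_signals, n_periods, n_trials = 10, 120, 300  # illustrative sizes (ASSUMPTION)
pairs = [signed_combo_tstats(n_signals, n_periods, rng) for _ in range(n_trials)]
mean_in = statistics.fmean(p[0] for p in pairs)
mean_out = statistics.fmean(p[1] for p in pairs)
print(f"mean in-sample t: {mean_in:.2f}, mean out-of-sample t: {mean_out:.2f}")
```

In this setup the in-sample t-statistic averages roughly 0.8 times sqrt(N), while the out-of-sample t-statistic averages close to zero.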
12
Issues
Combine things that backtest well
Get even better backtests
  Not surprising!
But what do the backtests mean?
  Biased? Why? What biases?
  If so, by how much? (Quantify!)
Other intuitions?
13
Can address these
Calculate empirical distributions
  When signals are not informative
  But multiple signals are used to select stocks
  Big bootstrapping exercise
Derive theoretical distributions
  In a simplified model
    Normal, homoscedastic returns
  Use these to develop intuition
14
Strategy Construction
Long/short strategies
  Rebalanced annually (end of June)
Weight on each stock:
  $S_{i,t}$ is the signal (cross-sectional median = $S_t$)
  $m_{i,t}$ is a cap multiplier
Nests many common constructions
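A minimal sketch of one construction consistent with these definitions follows. The weighting rule itself is an ASSUMPTION here: raw weights are taken proportional to the cap multiplier times the signal's deviation from the cross-sectional median, and each side is then scaled to sum to one. The function name and the simulated inputs are hypothetical.

```python
import math
import random
import statistics

def long_short_weights(signals, cap_multipliers):
    """Signal-tilted long/short weights; long side sums to +1, short side to -1."""
    median = statistics.median(signals)
    # ASSUMPTION: raw weight = cap multiplier * (signal - cross-sectional median).
    raw = [m * (s - median) for s, m in zip(signals, cap_multipliers)]
    long_total = sum(w for w in raw if w > 0)
    short_total = -sum(w for w in raw if w < 0)
    # ASSUMPTION: normalize each side separately to unit gross exposure.
    return [w / long_total if w > 0 else w / short_total for w in raw]

rng = random.Random(4)
n_stocks = 500
signals = [rng.gauss(0.0, 1.0) for _ in range(n_stocks)]                 # pure-noise signal
market_caps = [math.exp(rng.gauss(0.0, 1.5)) for _ in range(n_stocks)]   # hypothetical market caps
weights = long_short_weights(signals, market_caps)
long_side = sum(w for w in weights if w > 0)
short_side = sum(w for w in weights if w < 0)
print(f"long side: {long_side:.3f}, short side: {short_side:.3f}")
```

Setting every cap multiplier to 1 would give a purely signal-weighted strategy, while using market equity gives a value-weighted tilt.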
15
Nests “smart beta”
  Weight on each stock:
  So the long/short strategy is smart beta’s tilt from the market
16
“Smart beta” Market
Signals
Generate individually as pure noise!
  Random normal variables
Composite signals sum individual signals
  Technical reason: mapping to theory
  Not important for the empirical work
Cap multiplier is market equity
  Essentially value-weighted strategies
  Again, not important
17
Best k-of-n strategies
“Natural” construction
  Investigate n signals
  Pick the k “strongest”
    I.e., with the most significant in-sample performance
  Combine them how?
Bootstrap for k ≤ n ≤ 100
  Again, do it 10,000 times
  Collect strategy t-statistics
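The collection step can be sketched in a stylized form. The sketch below does not bootstrap real stock returns. It assumes each signal's t-statistic is an independent N(0,1) draw and that the k strongest signed signals are equal-weighted. The function names are hypothetical.

```python
import math
import random

def best_k_of_n_tstat(k, n, rng):
    """t-stat of an equal-weight combination of the k strongest of n pure-noise signals."""
    # Under the null each t-stat is N(0,1); signing a signal turns t into |t|.
    abs_t = sorted((abs(rng.gauss(0.0, 1.0)) for _ in range(n)), reverse=True)
    return sum(abs_t[:k]) / math.sqrt(k)

def critical_value(k, n, rng, level=0.05, n_draws=10_000):
    """Empirical critical value: the (1 - level) quantile of the simulated t-stats."""
    draws = sorted(best_k_of_n_tstat(k, n, rng) for _ in range(n_draws))
    return draws[int((1.0 - level) * n_draws)]

rng = random.Random(5)
crit_1_of_1 = critical_value(1, 1, rng)
crit_3_of_20 = critical_value(3, 20, rng)
print(f"5% critical value, best 1-of-1: {crit_1_of_1:.2f}")
print(f"5% critical value, best 3-of-20: {crit_3_of_20:.2f}")
```

The best 1-of-1 case recovers the familiar value near 1.96. The best 3-of-20 case is far higher, even though no signal has any predictive power.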
18
Two Issues
When k < n, selection bias
  When k = 1 < n, multiple testing bias
    Well understood
When k > 1, overfitting
  Data snooping
    In-sample aspect of the data used to form the strategy
  Pure overfitting only if k = n
Interaction!
19
Special Cases
20
Overfitting only
Multiple-testing only
Pure Selection
21
Pure Overfitting
22
Both Biases
23
General Case
What sort of strategies should we worry about?
  How do we think researchers design strategies in practice?
3-of-20?
  How many signals did MSCI consider for its quality index?
5-of-100?
24
General Case
25
Model (theory)
Strategies signal-weight stocks
Returns normally distributed (assumption)
  Equal volatilities
  Uncorrelated
Combine signals by averaging
  Or weighted averaging
  Combined strategy = portfolio of pure strategies
  So can apply facts from portfolio theory
26
Given weights on n signals (strategies), standard portfolio theory results
Weighting?
  Equal (1/k), corresponds to minimum variance (MV)
  Signal (weights proportional to the in-sample t-statistics), corresponds to mean-variance efficient (MVE)
27
Best k-of-n strategies
Yields t-statistic distributions:
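Under the model's assumptions (normal, equal-volatility, uncorrelated pure strategies), the portfolio-theory step can be sketched as follows. The symbols $\omega_i$ (weight on signed signal $i$), $t_i$ (its in-sample t-statistic), and $t_{(j)}$ (the t-statistic with the $j$-th smallest absolute value among the $n$ candidates) are notation assumed here for illustration.

```latex
\begin{align*}
% Combined strategy: mean scales with \sum \omega_i t_i, volatility with \sqrt{\sum \omega_i^2}
t_{\text{combined}} &= \frac{\sum_{i=1}^{k} \omega_i t_i}{\sqrt{\sum_{i=1}^{k} \omega_i^2}} \\
% Equal weights \omega_i = 1/k on the k strongest signed signals (MV)
t^{MV}_{n,k} &= \frac{1}{\sqrt{k}} \sum_{j=1}^{k} \left| t_{(n+1-j)} \right| \\
% Signal weights \omega_i \propto t_i on the k strongest signals (MVE)
t^{MVE}_{n,k} &= \sqrt{\sum_{j=1}^{k} t_{(n+1-j)}^{2}}
\end{align*}
```

Both expressions depend only on the top $k$ order statistics of $n$ absolute standard normal draws, which is why their distributions can be characterized under the null.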
28
Critical values
Analytic for special cases:
  k = 1
  k = n, with signal-weighting
Generally by numeric integration
  Simple computationally
  But don’t provide much intuition
Also derive good analytic approximations
  Useful for comparative statics
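For the k = 1 special case, a closed form follows if the n t-statistics are assumed independent N(0,1): the largest absolute t-statistic stays below τ with probability (2Φ(τ) − 1)^n. A minimal sketch under that assumption (the function name is hypothetical):

```python
from statistics import NormalDist

def best_1_of_n_critical_value(n, level=0.05):
    """Critical value tau with P(max_i |t_i| > tau) = level for n independent N(0,1) t-stats."""
    # Each signal must individually stay inside (-tau, tau) with this probability.
    per_signal_coverage = (1.0 - level) ** (1.0 / n)
    return NormalDist().inv_cdf((1.0 + per_signal_coverage) / 2.0)

critical_values = {n: best_1_of_n_critical_value(n) for n in (1, 10, 100)}
for n, tau in critical_values.items():
    print(f"best 1-of-{n}: 5% critical value = {tau:.2f}")
```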
29
Special Cases
30
Special Cases
31
General Cases
32
General Cases (Empirical)
33
General case, when k ~ n
n = 100
34
General case, when k ~ n
n = 40
35
General case, when k ~ n
n = 20
36
Tension when increasing k
Decreases volatility → improves performance
Decreases average signal quality → lowers returns → impairs performance
Initially the first effect dominates (esp. with large n)
“Optimal” use of the worst ~1/2 of signals: throw them away!
  Mean k/2-of-k t-statistics ~13% higher than k-of-k
  Mean k-of-2k t-statistics ~59% higher than k-of-k
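The two percentages can be roughly reproduced with a back-of-the-envelope calculation. It assumes equal weighting, independent N(0,1) t-statistics, and a large-k limit in which the selected half is exactly the set of signals with above-median |t|. These assumptions are illustrative and not stated on the slide.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()
# Mean of |z| across all signals.
mean_abs_all = math.sqrt(2.0 / math.pi)
# Median of |z|, and the mean of |z| within the top half: E[|z|; |z| > c] = 2 * pdf(c).
median_abs = std_normal.inv_cdf(0.75)
mean_abs_top_half = 2.0 * std_normal.pdf(median_abs) / 0.5

# Equal-weight t-stat = (sum of |t| over signals used) / sqrt(number of signals used).
# k/2-of-k vs k-of-k: better average signal, but only half as many signals.
ratio_half_of_k = mean_abs_top_half / (mean_abs_all * math.sqrt(2.0))
# k-of-2k vs k-of-k: better average signal, same number of signals.
ratio_k_of_2k = mean_abs_top_half / mean_abs_all
print(f"k/2-of-k vs k-of-k: {ratio_half_of_k:.3f}")
print(f"k-of-2k vs k-of-k: {ratio_k_of_2k:.3f}")
```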
37
Alternative Quantification
Pure multiple-testing bias equivalence
  How many single signals would you have to look at to get the same bias?
  That is, given any critical value τ (i.e., for some best k-of-n strategy), find n* s.t.
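One way to make the equivalence concrete is sketched below. It takes the matching condition to be that a best 1-of-n* strategy has the same critical value τ at the same significance level, and it assumes independent N(0,1) t-statistics. Both are assumptions made for illustration, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def equivalent_single_signals(tau, level=0.05):
    """n* such that a best 1-of-n* strategy has critical value tau at the given level."""
    # P(all n* independent |t| stay below tau) = (2*Phi(tau) - 1)**n* = 1 - level
    coverage = 2.0 * NormalDist().cdf(tau) - 1.0
    return math.log(1.0 - level) / math.log(coverage)

n_star_single = equivalent_single_signals(1.96)
n_star_high = equivalent_single_signals(4.0)
print(f"tau = 1.96 -> n* = {n_star_single:.2f}")
print(f"tau = 4.00 -> n* = {n_star_high:.0f}")
```

A critical value of 1.96 maps back to a single signal, while a critical value of 4 corresponds to searching over several hundred individual signals.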
38
39
Approximate Power Law
Best k-of-n strategy biases:
  Similar to those from a best 1-of-n^k strategy!
Using the analytic approximation, can show that
  log n* is roughly affine in log n
  With slope ≈ k
Can see this graphically
40
41
Conclusion
View multi-signal claims skeptically
  Multiple good signals imply better performance when combined
  Good backtested performance does NOT imply any good signals
“High tech” solution: use different tests
“Low tech”: evaluate signals individually
  Marginal power of each variable
42
General Approximation
Normal approximation for the sum of:
  Top k absolute normal order statistics (MV)
  Top k squared normal order statistics (MVE)
Use the Beta distribution of uniform order statistics
  Approximately normal
Joint uniform conditional distribution of the larger order statistics
Law of total variance
43
How They Work
Specify the mean and S.D. of the approximating normal
Combine with the p-value → how far out in the tail
  E.g., 5% critical value ≈ mean + two standard deviations
44
General Approximation
Where
45