Backtesting strategies based on multiple signals

Robert Novy-Marx, University of Rochester and NBER

Multi-signal Strategies

Proliferating in industry, e.g., the MSCI Quality Index (high ROE, low ROE volatility, low leverage) and “smart beta” products

such as RAFI, which weights on sales, cash flow, book equity, and dividends.

Increasingly common in academia: Piotroski’s F-score (9 signals); Asness et al.’s Quality Score (21 signals).

Why the increased interest?

Because finding “alpha” is hard, and these strategies work great!

Impressive backtest performance.

Too good? Alpha should be hard to find:

lots of smart people are looking, and there are huge incentives to try,

and even to believe!

Issues

Every choice has the potential to bias results, and this is a much bigger problem with multiple signals:

not just which signals are used, but how they are used!

Basic issue: each signal is used so that it individually predicts positive in-sample returns. Seems like a small thing, but it’s not!

Types of biases

Snooping: an in-sample aspect of the data guides strategy formation. Two types to worry about:

Multiple-testing bias: consider multiple strategies, show only the best one.

Overfitting: e.g., ex post mean-variance efficient (MVE) Sharpe ratios (SRs) are always high;

the MVE strategy buys “winners” and sells “losers.”
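A minimal sketch of the overfitting point, assuming pure-noise returns (i.i.d. standard normal, so every true SR is zero). The ex post MVE portfolio still has a strictly positive in-sample SR, at least as large as the best single asset’s. The function name and parameters are illustrative, not from the slides.

```python
import numpy as np

def ex_post_mve_sharpe(n_assets=10, n_periods=120, rng=None):
    """In-sample SR of the ex post MVE portfolio when all true SRs are zero."""
    rng = np.random.default_rng(rng)
    # Pure noise: i.i.d. N(0, 1) returns, so no asset has a true premium.
    r = rng.standard_normal((n_periods, n_assets))
    mu = r.mean(axis=0)
    cov = np.cov(r, rowvar=False)
    # Ex post MVE weights are proportional to cov^{-1} mu: they go long the
    # in-sample "winners" and short the in-sample "losers".
    w = np.linalg.solve(cov, mu)
    mve_sr = np.sqrt(mu @ w)  # sqrt(mu' cov^{-1} mu), always positive
    best_single_sr = np.max(np.abs(mu) / np.sqrt(np.diag(cov)))
    return mve_sr, best_single_sr

srs = [ex_post_mve_sharpe(rng=seed) for seed in range(500)]
avg_sq_sr = np.mean([mve ** 2 for mve, _ in srs])
```

Under these assumptions a Hotelling T² argument puts the expected squared per-period SR near n(T-1)/(T(T-n-2)), about 0.09 here, even though the true value is zero.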

Examples

Bet on a series of fair coin flips. What if you knew that there were:

1. More heads in the first (or second) half, and you could bet on just the early (or late) flips?

2. More heads than tails?

What sorts of biases do these create? Do we account for them in finance?
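The two coin-flip cases can be simulated directly. This is a sketch assuming $1 bets on 100 fair flips; “case 1” picks the better half and bets heads on it (selection), “case 2” bets on whichever side came up more often (overfitting the sign). Names are hypothetical.

```python
import numpy as np

def coin_flip_bets(n_flips=100, rng=None):
    """Payoffs from $1 bets on fair coin flips, using hindsight two ways."""
    rng = np.random.default_rng(rng)
    flips = rng.choice([1, -1], size=n_flips)  # +1 = heads, -1 = tails
    half = n_flips // 2
    # Case 1: bet heads on whichever half had more heads (selection).
    selection = max(flips[:half].sum(), flips[half:].sum())
    # Case 2: bet on whichever side came up more often (overfitting the sign).
    overfit = abs(flips.sum())
    return selection, overfit

results = np.array([coin_flip_bets(rng=seed) for seed in range(2000)])
```

Every bet is fair, yet both hindsight strategies average clearly positive payoffs (roughly $4 and $8 here, by a normal approximation), and the overfit bet never loses.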

First type: multiple testing (or selection). We don’t really account for it formally,

though we do suspect (know) that people look at more things.

Second type: overfitting (bet heads, not tails!). Do we account for it?

One signal: absolutely! The 5% critical value is t = 1.96 (not 1.65).

Multiple signals: no!
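Why 1.96 rather than 1.65 for one signal: signing a noise signal so it looks good in-sample turns its t-statistic Z into |Z|. A small sketch using only the standard library (function names are ours):

```python
from math import erfc, sqrt

def one_sided_size(c):
    """P(Z > c): size of a test whose sign was fixed before seeing the data."""
    return 0.5 * erfc(c / sqrt(2))

def signed_ex_post_size(c):
    """P(|Z| > c): size when a noise signal is signed to look good in-sample."""
    return erfc(c / sqrt(2))
```

With the sign chosen ex post, a 1.65 cutoff rejects a useless signal about 10% of the time; 1.96 restores the 5% size.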

Thought experiment?


Null hypothesis: “signals” don’t predict differences in average returns, e.g., monkeys selecting stocks by throwing darts at the WSJ.

Performance distribution: t-statistics ~ N(0,1),

more or less (up to excess kurtosis and heteroscedasticity).

What if you diversify across the lucky monkeys, i.e., those with positive alpha?

This is clearly “snooping”: using an in-sample aspect of the data to form the strategy.

How does this bias the results? What is the expected t-statistic?

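Get the average return: for each lucky monkey, $E[\bar r_i \mid \bar r_i > 0] = \sqrt{2/\pi}\,\sigma/\sqrt{T} \approx 0.8\,\sigma/\sqrt{T}$.

Diversify across their risks: with $N$ lucky monkeys and uncorrelated returns, $\sigma_p = \sigma/\sqrt{N}$.

Yields a high t-statistic:

$$E[t] \approx 0.8\sqrt{N}$$

Can also frame this in SRs:

$$\sqrt{T}\,E[\widehat{SR}] \approx 0.8\sqrt{N}$$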
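A quick Monte Carlo check of the 0.8√N rule, assuming each monkey’s strategy earns i.i.d. N(0,1) returns over T = 120 periods. We keep the monkeys with positive in-sample means and equal-weight them; N is the number kept. The function name and sample sizes are illustrative.

```python
import numpy as np

def lucky_monkey_tstat(n_monkeys=200, n_periods=120, rng=None):
    """t-stat of an equal-weighted portfolio of the 'lucky' dart-throwing monkeys."""
    rng = np.random.default_rng(rng)
    # Null: each monkey's strategy return is pure noise (zero mean, unit vol).
    r = rng.standard_normal((n_periods, n_monkeys))
    # Snooping: keep only monkeys with positive in-sample average returns.
    lucky = r[:, r.mean(axis=0) > 0]
    # Diversifying across their (uncorrelated) risks cuts vol by about sqrt(N).
    port = lucky.mean(axis=1)
    t = port.mean() / (port.std(ddof=1) / np.sqrt(n_periods))
    return t, lucky.shape[1]

# Compare with the 0.8 * sqrt(N) rule of thumb, N = number of lucky monkeys.
ratios = [t / (0.8 * np.sqrt(n)) for t, n in
          (lucky_monkey_tstat(rng=seed) for seed in range(50))]
```

With about 100 lucky monkeys the portfolio t-statistic lands near 8, although no monkey has any skill.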

Essentially the same thing happens if you use all the signals, but sign them so that they “predict” positive in-sample returns. Standard statistics account for this…

if and only if N = 1!

Again, the strategy has a high backtested SR. Question: should we expect a high SR going forward?
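A sketch of the “going forward” question under the null: N pure-noise signals are signed on in-sample data, equal-weighted, then applied unchanged to fresh out-of-sample noise. The setup (N = 50, T = 120, i.i.d. normal returns) and all names are assumptions for illustration.

```python
import numpy as np

def tstat(x):
    """Simple t-statistic of a return series' mean."""
    return x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))

def signed_composite(n_signals=50, n_periods=120, rng=None):
    """In- and out-of-sample t-stats of a composite of ex post signed noise signals."""
    rng = np.random.default_rng(rng)
    in_sample = rng.standard_normal((n_periods, n_signals))      # pure noise
    out_of_sample = rng.standard_normal((n_periods, n_signals))  # fresh noise
    # Sign each signal so that it "predicts" positive in-sample returns.
    signs = np.sign(in_sample.mean(axis=0))
    # Equal-weight the signed single-signal strategies.
    t_in = tstat((in_sample * signs).mean(axis=1))
    t_out = tstat((out_of_sample * signs).mean(axis=1))
    return t_in, t_out

results = np.array([signed_composite(rng=seed) for seed in range(100)])
```

In-sample t-statistics cluster around 0.8√50 ≈ 5.6; out of sample they are centered on zero.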

Issues

Combine things that backtest well,

and you get even better backtests. Not surprising!

But what do the backtests mean? Are they biased?

If so, why? Which biases? By how much? (Quantify!)

Other intuitions?

We can address these questions

Calculate empirical distributions when the signals are not informative, but multiple signals are used to select stocks:

a big bootstrapping exercise.

Derive theoretical distributions in a simplified model

(normal, homoscedastic returns), and use these to develop intuition.

Strategy Construction

Long/short strategies, rebalanced annually (end of June)

Weight on each stock:

where $S_{i,t}$ is the signal (with cross-sectional median $S_t$)

and $m_{i,t}$ is a cap multiplier. This nests many common constructions.

Nests “smart beta”: the weight on each stock in a “smart beta” portfolio is its market weight plus a tilt,

so the long/short strategy is smart beta’s tilt away from the market.

Signals

Generated individually as pure noise! (random normal variables)

Composite signals sum the individual signals, for a technical reason (mapping to the theory);

this is not important for the empirical work.

The cap multiplier is market equity, so these are essentially value-weighted strategies;

again, not important.
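The construction and the pure-noise signals above can be sketched as follows. The slide’s exact weighting formula is not reproduced here, so `signal_weights` assumes a hypothetical form: weight proportional to the cap multiplier times the signal’s distance from its cross-sectional median, with each side scaled to $1. The stock counts and lognormal market equities are also invented for illustration.

```python
import numpy as np

def signal_weights(signal, cap):
    """Hypothetical long/short weights: w_i proportional to cap_i * (S_i - median S).

    Assumed form, not the slide's exact formula: stocks above the cross-sectional
    median signal are held long, those below short, each scaled by its cap
    multiplier, and each side is normalized to $1.
    """
    raw = cap * (signal - np.median(signal))
    w = np.zeros_like(raw, dtype=float)
    longs, shorts = raw > 0, raw < 0
    w[longs] = raw[longs] / raw[longs].sum()      # long side sums to +1
    w[shorts] = -raw[shorts] / raw[shorts].sum()  # short side sums to -1
    return w

rng = np.random.default_rng(0)
n_stocks, n_signals = 500, 5
# Each individual signal is pure noise: independent standard normal draws.
signals = rng.standard_normal((n_stocks, n_signals))
# Composite signal: the sum of the individual signals.
composite = signals.sum(axis=1)
# Hypothetical market equities used as the cap multiplier.
market_equity = rng.lognormal(mean=0.0, sigma=1.0, size=n_stocks)
w = signal_weights(composite, market_equity)
```

Setting the cap multiplier to one, or replacing the signal with its rank, gives other common constructions under the same assumed form.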

Best k-of-n strategies

A “natural” construction: investigate n signals and pick the k “strongest,”

i.e., those with the most significant in-sample performance. How should they be combined?

Bootstrap for k ≤ n ≤ 100, again 10,000 times, and collect the strategy t-statistics.
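The paper bootstraps actual stock returns; as a lighter stand-in, this sketch simulates best k-of-n t-statistics directly from the simplified model (uncorrelated, equal-vol strategies whose null t-statistics are N(0,1)). Only the “keep the k largest |t|, then combine” step mirrors the slide; the function names and defaults are hypothetical.

```python
import numpy as np

def best_k_of_n_tstats(k, n, n_sims=10_000, weighting="equal", rng=None):
    """Null t-stats of best k-of-n strategies in the simplified normal model."""
    rng = np.random.default_rng(rng)
    # Each pure-noise strategy's t-stat is N(0, 1); signing it to "predict"
    # positive in-sample returns turns it into |t|.
    abs_t = np.abs(rng.standard_normal((n_sims, n)))
    # Keep the k most significant signals in each simulated sample.
    top_k = np.sort(abs_t, axis=1)[:, n - k:]
    if weighting == "equal":
        # Equal weights on uncorrelated, equal-vol strategies.
        return top_k.sum(axis=1) / np.sqrt(k)
    if weighting == "signal":
        # Weights proportional to each strategy's t-stat (ex post efficient).
        return np.sqrt((top_k ** 2).sum(axis=1))
    raise ValueError("weighting must be 'equal' or 'signal'")

def critical_value(k, n, p=0.05, **kwargs):
    """Simulated size-p critical value for the best k-of-n t-statistic."""
    return np.quantile(best_k_of_n_tstats(k, n, **kwargs), 1 - p)
```

For example, `critical_value(3, 20)` or `critical_value(5, 100, weighting="signal")` gives a null 5% cutoff for those designs; the k = n = 1 case recovers 1.96.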

Two Issues

When k < n: selection bias. When k = 1 < n, this is multiple-testing bias, which is well understood.

When k > 1: overfitting (data snooping), since an in-sample aspect of the data is used to form the strategy. Pure overfitting only if k = n.

Interaction!

Special Cases

Overfitting only

Multiple-testing only

Pure Selection

Pure Overfitting

Both Biases

General Case

What sort of strategies should we worry about? How do we think researchers design strategies in practice? 3-of-20?

How many signals did MSCI consider for its quality index?

5-of-100?

General Case

Model (theory)

Strategies signal-weight stocks. Returns are normally distributed (an assumption),

with equal volatilities, and uncorrelated.

Combine signals by averaging, or by weighted averaging.

The combined strategy is a portfolio of the pure (single-signal) strategies, so we can apply facts from portfolio theory.

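Given weights on the n signals (strategies), standard portfolio-theory results apply.

Weighting? Equal weights (1/k) correspond to the minimum-variance portfolio; signal weights (proportional to each strategy’s in-sample performance) correspond to the efficient portfolio.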
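A short derivation of why the weighting matters, assuming the model’s uncorrelated pure strategies share a common volatility σ (treated as known), T is the sample length, and t_i = √T r̄_i/σ is strategy i’s t-statistic (notation ours):

```latex
t_p = \frac{\sqrt{T}\,\sum_i w_i \bar r_i}{\sigma \sqrt{\sum_i w_i^2}}
    = \frac{\sum_i w_i t_i}{\sqrt{\sum_i w_i^2}},
\qquad
w_i = \tfrac{1}{k} \;\Rightarrow\; t_p = \frac{1}{\sqrt{k}} \sum_{i=1}^{k} t_i,
\qquad
w_i \propto t_i \;\Rightarrow\; t_p = \Big( \sum_{i=1}^{k} t_i^2 \Big)^{1/2}.
```

By the Cauchy–Schwarz inequality the second choice maximizes t_p over all weights, which is why signal weighting corresponds to the ex post efficient portfolio.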

Best k-of-n strategies

Yields t-statistic distributions:

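Under the model’s assumptions, equal weights give a combined t-statistic of Σt_i/√k and signal weights give √(Σt_i²). Plugging in the k largest of n null t-statistics, each signed to look good in-sample, gives the sketch below. Here |t|_(1) ≥ ... ≥ |t|_(n) is our notation for the descending order statistics of n i.i.d. |N(0,1)| draws, not necessarily the slide’s, and the second line gives the two analytic special cases.

```latex
t^{\mathrm{EW}}_{k,n} = \frac{1}{\sqrt{k}} \sum_{j=1}^{k} |t|_{(j)},
\qquad
t^{\mathrm{SW}}_{k,n} = \Big( \sum_{j=1}^{k} |t|_{(j)}^{2} \Big)^{1/2},
\\[4pt]
\Pr\big( t_{1,n} \le \tau \big) = \big( 2\Phi(\tau) - 1 \big)^{n},
\qquad
\big( t^{\mathrm{SW}}_{n,n} \big)^{2} \sim \chi^{2}_{n}.
```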

Critical values

Analytic for special cases: k = 1, and k = n with signal weighting.

Generally by numeric integration, which is computationally simple

but doesn’t provide much intuition. We also derive good analytic approximations,

useful for comparative statics.
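The two analytic special cases can be computed with the standard library alone. This is a sketch under the simplified model: for k = 1 the critical value solves (2Φ(c) − 1)^n = 1 − p, and for k = n with signal weighting t² is χ²_n. The chi-square quantile is found by bisection on a power-series CDF we wrote for the purpose, not a library routine.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

def crit_best_1_of_n(n, p=0.05):
    """Exact critical value for the best 1-of-n signed t-stat (k = 1)."""
    # P(max of n i.i.d. |Z| <= c) = (2 * Phi(c) - 1) ** n; set it equal to 1 - p.
    return NormalDist().inv_cdf((1 + (1 - p) ** (1 / n)) / 2)

def chi2_cdf(x, dof):
    """Chi-square CDF via the power series for the regularized lower incomplete gamma."""
    a, y = dof / 2, x / 2
    if y <= 0:
        return 0.0
    term = total = 1 / a
    j = 0
    while term > 1e-15 * total:
        j += 1
        term *= y / (a + j)
        total += term
    return total * exp(a * log(y) - y - lgamma(a))

def crit_best_n_of_n_signal(n, p=0.05):
    """Exact critical value for k = n with signal weighting: t = sqrt(chi2_n)."""
    lo, hi = 0.0, 10.0 * n + 100.0  # bracket for the chi-square quantile
    for _ in range(200):            # bisection on the CDF
        mid = (lo + hi) / 2
        if chi2_cdf(mid, n) < 1 - p:
            lo = mid
        else:
            hi = mid
    return sqrt(hi)
```

Both reduce to 1.96 when n = 1; `crit_best_1_of_n(20)` is about 3.02, the familiar multiple-testing penalty.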

Special Cases

General Cases

General Cases (Empirical)

General case, when k ~ n

n = 100

n = 40

n = 20

Tension when increasing k

Increasing k decreases volatility, which improves performance, but also decreases average signal quality, which lowers returns and impairs performance. Initially the first effect dominates (especially with large n).

“Optimal” use of the worst ~1/2 of signals: throw them away!

Mean k/2-of-k t-statistics are ~13% higher than k-of-k; mean k-of-2k t-statistics are ~59% higher than k-of-k.
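A back-of-envelope check of the ~13% and ~59% figures, assuming the simplified model with equal weights and large k, so that the top half of many |N(0,1)| draws sums to about k·E[|Z|·1{|Z| > c}] with c the median of |Z|. This is our large-sample argument, not necessarily the author’s derivation.

```python
from math import pi, sqrt
from statistics import NormalDist

def asymptotic_gains():
    """Large-k mean t-stat gains from discarding weak noise signals.

    Assumes the simplified model (i.i.d. |N(0,1)| signed t-stats, equal
    weights) and that the top half of many draws averages to its tail mean.
    """
    z = NormalDist()
    c = z.inv_cdf(0.75)         # median of |Z|
    tail = 2 * z.pdf(c)         # E[|Z| * 1{|Z| > c}], mass from the top half
    all_k = sqrt(2 / pi)        # k-of-k: mean t / sqrt(k) = E|Z|
    half_of_k = tail * sqrt(2)  # k/2-of-k: (k * tail) / sqrt(k / 2), per sqrt(k)
    k_of_2k = 2 * tail          # k-of-2k: (2k * tail) / sqrt(k), per sqrt(k)
    return half_of_k / all_k - 1, k_of_2k / all_k - 1

gain_half, gain_double = asymptotic_gains()
```

The two gains come out near 12.6% and 59.3%, consistent with the rounded figures above.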

Alternative Quantification

Pure multiple-testing bias equivalence: how many single signals would you have to look at to get the same bias? That is, given any critical value τ (i.e., for some best k-of-n strategy), find n* such that the best 1-of-n* strategy has the same critical value τ.
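Assuming “same bias” means the best 1-of-n* strategy has the same size-p critical value τ, the defining condition and its solution are as follows (the approximation holds for small p and large τ):

```latex
1 - \big( 2\Phi(\tau) - 1 \big)^{n^*} = p
\;\Longrightarrow\;
n^* = \frac{\ln(1 - p)}{\ln\big( 2\Phi(\tau) - 1 \big)}
\approx \frac{p}{2\,[1 - \Phi(\tau)]}.
```

As a check, τ = 1.96 with p = 5% gives n* = 1.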

Approximate Power Law

Best k-of-n strategy bias: similar to that from a best 1-of-n^k strategy!

Using the analytic approximation, one can show that log n* is roughly affine in log n, with slope ≈ k.

We can see this graphically.
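The power law can be checked numerically rather than graphically. This sketch simulates equal-weighted best k-of-n critical values under the simplified model, converts each to n* by matching a best 1-of-n* test, and regresses log n* on log n. The grid of n, the simulation size, and the function names are our choices; the slide’s slope ≈ k comes from the author’s analytic approximation.

```python
import numpy as np
from math import erfc, log, log1p, sqrt

def n_star(tau, p=0.05):
    """Number of single signals whose best 1-of-n* test has critical value tau."""
    # P(|Z| > tau) = erfc(tau / sqrt(2)); solve 1 - (1 - that) ** n_star = p.
    return log(1 - p) / log1p(-erfc(tau / sqrt(2)))

def simulated_crit(k, n, p=0.05, n_sims=20_000, rng=0):
    """Simulated critical value of the equal-weighted best k-of-n t-stat (null)."""
    rng = np.random.default_rng(rng)
    abs_t = np.abs(rng.standard_normal((n_sims, n)))
    top_k = np.sort(abs_t, axis=1)[:, n - k:]
    return np.quantile(top_k.sum(axis=1) / np.sqrt(k), 1 - p)

def power_law_slope(k, ns=(10, 20, 40, 100)):
    """Slope of log n* on log n across best k-of-n strategies."""
    log_n_star = [log(n_star(simulated_crit(k, n))) for n in ns]
    slope, _ = np.polyfit(np.log(ns), log_n_star, 1)
    return slope
```

For k = 1 the slope is close to one by construction; for k = 2 it should come out near two if the slide’s approximation holds in this setup.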

Conclusion

View multi-signal claims skeptically. Multiple good signals yield better performance when combined, but good backtested performance does NOT imply there are any good signals.

“High tech” solution: use different tests. “Low tech”: evaluate signals individually,

i.e., the marginal power of each variable.

General Approximation

Normal approximation for the sum of: the top k absolute normal order statistics (MV), or the top k squared normal order statistics (MVE)

Use the Beta distribution of uniform order statistics, which is approximately normal,

the joint uniform conditional distribution of the larger order statistics,

and the law of total variance.

How They Work

Specify the mean and S.D. of the approximating normal; combine with the p-value to set how far out in the tail to go.

E.g., 5% critical value ≈ mean + two standard deviations.
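A sketch of the “mean plus a multiple of the S.D.” step. It swaps the analytic mean and S.D. (built from Beta-distributed uniform order statistics) for simulated moments of the equal-weighted best k-of-n t-statistic, and takes the multiplier from the standard normal quantile for the chosen p-value. It only illustrates how the approximation is used, not how the moments are derived; names and sizes are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def normal_approx_vs_simulated(k, n, p=0.05, n_sims=100_000, rng=0):
    """Compare 'mean + z * S.D.' with the simulated critical value (best k-of-n, equal weights)."""
    rng = np.random.default_rng(rng)
    abs_t = np.abs(rng.standard_normal((n_sims, n)))
    top_k = np.sort(abs_t, axis=1)[:, n - k:]
    t = top_k.sum(axis=1) / np.sqrt(k)
    # How far out in the tail to go: the standard normal quantile for the p-value.
    z = NormalDist().inv_cdf(1 - p)
    approx = t.mean() + z * t.std()
    simulated = np.quantile(t, 1 - p)
    return approx, simulated

approx, simulated = normal_approx_vs_simulated(10, 20)
```

When k is not tiny, the sum of many order statistics is close to normal, so the two numbers should be close.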

General Approximation

Where

