
Time Series, Nonsense Correlations and the PCC Julian Reiss Complutense University, Madrid and Centre for Philosophy, LSE.

Jan 15, 2016

Page 1: Time Series, Nonsense Correlations and the PCC

Julian Reiss

Complutense University, Madrid, and Centre for Philosophy, LSE

Page 2: Overview

The PCC in the context of time series

A well-known problem: British bread prices and Venetian sea levels

Two attempts to fix it:
– Defusing the problem
– Arguing that the problem does not impugn the usefulness of the principle

I am going to argue that neither strategy will work

Finally, I am going to present a reformulated version of the principle, which avoids the problems discussed here but at the expense of virtual uselessness

Lesson: Don’t use the PCC in time-series analysis

Page 3: The Principle of the Common Cause

Here’s a version of it:

PCC. If two variables X and Y are correlated, then either
– X causes Y,
– Y causes X, or
– Z, a common cause, causes both X and Y

Versions of this principle are at the heart of all probabilistic methods of causal inference (most prominently, of course, Bayes’ nets)

The focus here is on applications to time series, i.e. time-ordered measurements: $X = \{X_t, X_{t+1}, X_{t+2}, \dots, X_{t+n}\}$

Time series are important in fields as diverse as neurophysiology, climatology, epidemiology, astro- and geophysics and many of the social sciences

Page 4: Bread Prices and Sea Levels

Years ago, Elliott Sober introduced the following counterexample to the literature:

Consider the fact that the sea level in Venice and the cost of bread in Britain have both been on the rise in the past two centuries. Both, let us suppose, have monotonically increased. Imagine that we put this data in the form of a chronological list; for each date, we list the Venetian sea level and the going price of British bread. Because both quantities have increased steadily with time, it is true that higher than average sea levels tend to be associated with higher than average bread prices. The two quantities are very strongly positively correlated.

And yet, ex hypothesi, the two are not causally connected

Page 5: Bread Prices and Sea Levels

[Figure: line chart of the two series over periods 1 to 8, vertical axis from 0 to 35. Venetian sea level (Y): 22, 23, 24, 25, 28, 29, 30, 31. British bread price (X): 4, 5, 6, 10, 14, 15, 19, 20.]
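As a rough check on the claim that the two series are very strongly positively correlated, the sketch below computes Pearson's coefficient for the eight data points. The values are read off the chart's data labels, which is an assumption about how the slide was extracted; the names `bread`, `sea` and `pearson` are illustrative only.

```python
import math

# Values as read from the chart's data labels (an assumption about the extraction).
bread = [4, 5, 6, 10, 14, 15, 19, 20]     # British bread price, X
sea = [22, 23, 24, 25, 28, 29, 30, 31]    # Venetian sea level, Y


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson(bread, sea)
print(f"{r:.3f}")  # 0.990
```

On these numbers the coefficient is about 0.99, although by hypothesis nothing causal connects the two quantities.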

Page 6: Defusing the Counterexample

In a recent article, Kevin Hoover has tried to show that the counterexample is only apparent

Distinguish two stages of inference from observations:
– Statistical inference: from frequencies to probabilities
– Causal inference: from probabilities to causal relations

According to Hoover, Sober committed a fallacy at the first stage of the inference: the two series are associated (at the level of frequencies) but not correlated (at the level of probabilities)

One can see this readily when one considers that statistical inference is always conducted against a probability model. Regarding Sober’s series as correlated means that the probability model one thinks most likely to be true of the series is one with stable moments (e.g. an IN process), but simple data analysis shows that this is probably not the case

Page 7: Defusing the Counterexample

For a series whose moments change (in this case, the mean increases monotonically over time), Pearson’s correlation coefficient is not the right measure of correlation; we need a different measure that is adequate to the situation

Page 8: Defusing the Counterexample

Hoover’s use of terms is highly non-standard:

– The correlation coefficient is defined for any two variables, not only variables whose moments are stable over time

– Alternative, non-parametric measures yield the same verdict (e.g., Spearman’s rank correlation coefficient is unity)

– If the problem were one of statistical inference, we would expect the association to disappear with a larger sample, but this isn’t the case here
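The remark about Spearman's coefficient can be verified by hand, since both of Sober's series rise monotonically and so have identical rank orders. A minimal sketch follows, assuming the eight values read off the chart's data labels and using the textbook no-ties formula; `ranks` and `spearman` are illustrative helper names.

```python
# Values as read from the chart's data labels (an assumption about the extraction).
bread = [4, 5, 6, 10, 14, 15, 19, 20]     # British bread price, X
sea = [22, 23, 24, 25, 28, 29, 30, 31]    # Venetian sea level, Y


def ranks(values):
    """Rank values from 1 to n (no tie handling; these series have no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman's rho via the no-ties formula 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


rho = spearman(bread, sea)
print(rho)  # 1.0
```

Any two strictly increasing series give the same result, whatever their length, which is why a larger sample would not make the association go away.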

Page 9: Defusing the Counterexample

But never mind standard usage; what if Hoover means to say:

Correlation is a theoretical concept that needs to be operationalised differently in different contexts

Pearson’s coefficient is an inappropriate measure of correlation when time series are non-stationary; two integrated series are correlated if and only if they are co-integrated

Page 10: Defusing the Counterexample

Stationary: A time series is weakly (or covariance) stationary if, and only if, its mean and variance are both finite and independent of time, and the covariance between the values of the series at different times depends only on the temporal distance between them

Integrated: Let $d$ be the minimum integer such that $\{\Delta^d X_t\}$ is weakly stationary. Then $\{X_t\}$ is said to be integrated of order $d$, which is notated $I(d)$. (By convention, a stationary time series is notated as $I(0)$.)

Co-integrated: Two time series $\{X_t\}$ and $\{Y_t\}$ are co-integrated if, and only if, each is $I(1)$ and a linear combination $\{X_t - \beta_0 - \beta_1 Y_t\}$, where $\beta_1 \neq 0$, is $I(0)$.
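To make the definition of integration concrete, here is a minimal sketch of the difference operator applied to a simulated random walk, the textbook I(1) case. The seed, sample length and the names `difference`, `shocks` and `walk` are arbitrary choices for illustration, not anything taken from the slides.

```python
import random


def difference(series, d=1):
    """Apply the first-difference operator d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


random.seed(0)
shocks = [random.gauss(0, 1) for _ in range(200)]  # white noise, I(0)

# Random walk: the running sum of the shocks, so it is I(1).
walk = []
level = 0.0
for e in shocks:
    level += e
    walk.append(level)

# Differencing once gives back the shocks (all but the first one).
recovered = difference(walk, d=1)
```

Differencing the walk once returns a stationary series, so d = 1 here; a series that needed two rounds of differencing would be I(2).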

Page 11: Defusing the Counterexample

[Figure, first panel: two simulated series of roughly 500 observations, labelled Alpha = .5 and Alpha = 1: $X_t = .5X_{t-1} + \varepsilon_{X,t}$ and $Y_t = Y_{t-1} + \varepsilon_{Y,t}$]

[Figure, second panel, titled Cointegration: $Y_t = Y_{t-1} + \varepsilon_{Y,t}$ (labelled Alpha = 1), $X_t = .5Y_{t-1} + \varepsilon_{X,t}$, and a third series labelled Z]
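The co-integrated pair in the second panel can be simulated in a few lines. This is only a sketch under stated assumptions: the error terms are taken to be independent standard normal shocks, the series labelled Z is assumed to be the linear combination X_t - .5Y_t, and the seed and sample size are arbitrary.

```python
import random

random.seed(1)
n = 500
eps_x = [random.gauss(0, 1) for _ in range(n)]
eps_y = [random.gauss(0, 1) for _ in range(n)]

# Y is a random walk (Alpha = 1); X is half of last period's Y plus noise.
y = [0.0] * n
x = [0.0] * n
for t in range(1, n):
    y[t] = y[t - 1] + eps_y[t]
    x[t] = 0.5 * y[t - 1] + eps_x[t]

# Candidate co-integrating combination (assumed to be the plotted Z).
z = [x[t] - 0.5 * y[t] for t in range(n)]

# Algebraically, Z_t = eps_x[t] - 0.5 * eps_y[t]: a stationary series,
# even though X and Y themselves wander without bound.
```

Under these assumptions both X and Y inherit the random walk's drift in level, while their combination Z is just a weighted sum of two white-noise terms.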

Page 12: Defusing the Counterexample

Distinguish:
– “Spurious” correlation: correlation is only apparent (e.g. because the wrong measure has been used)
– “Nonsense” correlation: correlation is real but it does not have a causal explanation

Hoover effectively denies that there are any nonsense correlations

The problem is that integratedness is only one source of non-stationarity, and non-stationarity is only one source of nonsense correlation

Page 13: Defusing the Counterexample

Integratedness (or unit roots) is sometimes called a stochastic source of non-stationarity; there are also deterministic sources:
– A deterministic trend: $X_t = \delta t + \alpha_X X_{t-1} + \varepsilon_{X,t}$
– Breaks in deterministic parameters (e.g., the mean of a series or its trend)
– (To be fair, Hoover considers so-called trend-stationary series, but we’ll see below that prior detrending isn’t always a good idea)

Correlations can be nonsense even in stationary series:
– $X_t = \alpha_X X_{t-1} + \varepsilon_{X,t}$, $Y_t = \alpha_Y Y_{t-1} + \varepsilon_{Y,t}$
– Moving averages
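The point about stationary series can be illustrated with a small Monte Carlo sketch: pairs of independent autoregressive series are generated, and the share of pairs with a sizeable sample correlation is counted. The sample length (50), number of replications (500), cutoff (0.3), coefficient (0.9) and seed are all arbitrary illustrative choices, and the shocks are assumed to be standard normal.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ar1(alpha, n, rng):
    """Simulate X_t = alpha * X_{t-1} + e_t with standard normal shocks."""
    series, prev = [], 0.0
    for _ in range(n):
        prev = alpha * prev + rng.gauss(0, 1)
        series.append(prev)
    return series


def share_large(alpha, n=50, reps=500, cutoff=0.3, seed=0):
    """Share of independent pairs whose |sample correlation| exceeds cutoff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        r = pearson(ar1(alpha, n, rng), ar1(alpha, n, rng))
        hits += abs(r) > cutoff
    return hits / reps


share_white = share_large(alpha=0.0)       # no persistence
share_persistent = share_large(alpha=0.9)  # stationary but persistent
print(share_white, share_persistent)
```

Both settings are stationary and the two series in each pair are independent by construction, yet sizeable sample correlations turn up far more often in the persistent case.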

Page 14: Defusing the Counterexample

Furthermore, there are various sources of nonsense correlation that have nothing to do with time series as such:
– Population heterogeneity
– Selection/sampling bias
– Mathematical/conceptual/logical links
– Etc.
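Population heterogeneity is perhaps the easiest of these sources to show with toy numbers. In the sketch below, which uses made-up data rather than anything from the slides, X and Y are exactly uncorrelated within each of two groups, yet pooling the groups produces a correlation close to one.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corr(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Hypothetical data: two sub-populations with different baseline levels.
group_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
group_b = [(10, 10), (10, 11), (11, 10), (11, 11)]

r_a = corr(group_a)                 # 0.0 within group A
r_b = corr(group_b)                 # 0.0 within group B
r_pooled = corr(group_a + group_b)  # 200/202, about 0.99
```

The pooled correlation reflects nothing but the difference in group levels; no unit's X value has any bearing on its Y value.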

Page 15: Defusing the Counterexample

Though I do not have a proof for this claim, I doubt that case-by-case measures of correlation can be found that do not beg the question

Furthermore, Hoover’s recipe just shifts the problem up one level: cointegration, too, can be spurious

Ironically, Sober’s series appear cointegrated ($\beta_0 = 20.25$; $\beta_1 = .54$)

[Figure: a single plotted series (default legend “Reihe1”, i.e. “Series 1”), horizontal axis from 0 to 9, vertical axis from -0.8 to 0.8]
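The two reported coefficients can be reproduced with an ordinary least-squares fit of the Venetian series on the British series, assuming the eight values read off the chart's data labels on the bread-prices slide. That the plotted series shows the residuals of this fit is a guess; the sketch only confirms that the residuals stay within the plotted range.

```python
# Values as read from the chart's data labels (an assumption about the extraction).
bread = [4, 5, 6, 10, 14, 15, 19, 20]     # British bread price, X
sea = [22, 23, 24, 25, 28, 29, 30, 31]    # Venetian sea level, Y


def ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    b0 = my - b1 * mx
    return b0, b1


b0, b1 = ols(bread, sea)
residuals = [yv - (b0 + b1 * xv) for xv, yv in zip(bread, sea)]
print(round(b0, 2), round(b1, 2))  # 20.25 0.54
```

With only eight observations this says nothing about whether a formal co-integration test would pass; it merely shows where the two numbers come from.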

Page 16: Insulating the PCC

Hence, I think we can justly conclude that Sober’s counterexample is genuine

But maybe, after all, it doesn’t matter so much because (a) Sober-like scenarios are rare; (b) data can be prepared prior to analysis and thus we can insulate the PCC from failures

Quickly on (a): it would be dumb to use an inference method we know sometimes fails, even if failures are rare; but, importantly, failures aren’t rare: they’re ubiquitous

Page 17: Insulating the PCC

But, perhaps, we can do something with the data before applying the PCC to it (Steel 2003):

“[T]he above discussion illustrates how researchers interested in drawing conclusions from statistical data can design their investigation so that counter-examples like Sober’s are not a concern. For instance, if the series is non-stationary but transformable into a stationary one via differentiating with respect to time, then differentiate. Then PCC can be invoked without concern for the difficulty illustrated by the Venice-Britain example.”

Unfortunately, this, too, doesn’t work

Page 18: Insulating the PCC

Differencing is only effective when series are integrated of order 1, and many series are not:
– Differencing won’t be effective in stationary time series (cf. discussion above)
– Nor in fractionally integrated series
– You’ll have to difference several times if series are integrated of an order > 1

Moreover, we can lose important information through differencing:
– Information on long-run behaviour
– Information about co-breaking
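One standard illustration that differencing is not a harmless default, offered here as a supplementary sketch rather than as the argument of the slides: differencing a series that is already stationary introduces negative autocorrelation that was not in the data. For white noise the lag-1 autocorrelation of the differenced series is -0.5 in theory. The seed and sample size are arbitrary.

```python
import random


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, n))
    den = sum((v - m) ** 2 for v in x)
    return num / den


random.seed(2)
noise = [random.gauss(0, 1) for _ in range(5000)]      # already stationary
diffed = [b - a for a, b in zip(noise, noise[1:])]     # differenced anyway

rho_noise = lag1_autocorr(noise)    # close to 0
rho_diffed = lag1_autocorr(diffed)  # close to -0.5
print(round(rho_noise, 2), round(rho_diffed, 2))
```

The transformation changes the dependence structure of the series it is applied to, so it cannot be treated as a neutral preprocessing step.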

Page 19: Insulating the PCC

Other off-the-shelf correction methods don’t fare better

Detrending can lead to spurious results when series are integrated (e.g., detrending can lead to spurious co-integration)

Detrending isn’t always effective: Sober’s series remain highly correlated after detrending

Compare:

“Applying the program [that incorporates the PCC] to real data requires a lot of adaptation to particular circumstances: […] data must be differenced to remove auto-correlation…” (Clark Glymour, philosopher)

“A Simple Message to Autocorrelation Correctors: Don’t.” (Grayham Mizon, econometrician)
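The detrending claim about Sober's series can be checked roughly as follows. The sketch assumes the eight values read off the chart's data labels and a simple linear time trend fitted by least squares, which may not be the detrending method used for the slides.

```python
import math

# Values as read from the chart's data labels (an assumption about the extraction).
bread = [4, 5, 6, 10, 14, 15, 19, 20]     # British bread price, X
sea = [22, 23, 24, 25, 28, 29, 30, 31]    # Venetian sea level, Y


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def detrend(y):
    """Residuals from a least-squares line fitted against t = 1..n."""
    n = len(y)
    t = range(1, n + 1)
    mt, my = (n + 1) / 2, sum(y) / n
    slope = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y)) / sum(
        (ti - mt) ** 2 for ti in t
    )
    return [yi - my - slope * (ti - mt) for ti, yi in zip(t, y)]


r_detrended = pearson(detrend(bread), detrend(sea))
print(round(r_detrended, 2))  # 0.63
```

Under these assumptions the correlation falls from about 0.99 to roughly 0.63, so removing a linear trend weakens the association without eliminating it.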

Page 20: Eliminative Induction

I think the best we can do is along the following lines:

PCC*: A correlation between two variables X and Y is explanation-seeking. If all kinds of non-causal (e.g., statistical, logical, mathematical, conceptual, nomological) explanation can be ruled out, then either X causes Y, Y causes X or X and Y are the joint effects of a common cause Z, which screens off X and Y

But of course, this is of little help since other explanations can almost never be eliminated where time series are concerned

Page 21: PCC: What’s it good for?

The PCC seems applicable AT BEST to systems that
– are shielded from outside influences (in order to avoid deterministic disturbances)
– have no internal dynamics (in order to avoid stochastic disturbances)
– are run over and over again (in order to get correlations to begin with)

Aren’t these conditions typically met in experimental set-ups?

But if we can experiment, why use the PCC?