
A Bayesian method for the analysis of deterministic and stochastic time series

Coryn Bailer-Jones, Max Planck Institute for Astronomy, Heidelberg

DPG, Berlin, March 2015


Time series modelling

• heteroscedastic, asymmetric noise on time and signal

• non-uniform time sampling

[Figure: example of measured data points. x-axis: measured time, s (0 to 100); y-axis: measured signal, y.]

Likelihood of single data point: integrate over unknown true time (t) and signal (z)

$$P(D_j \mid \sigma_j, \theta, M) = \int_{t_j, z_j} \underbrace{P(D_j \mid t_j, z_j, \sigma_j)}_{\text{Measurement model}} \; \underbrace{P(t_j, z_j \mid \theta, M)}_{\text{Time series model}} \; dt_j \, dz_j$$

Measured data $D_j = (s_j, y_j)$ and uncertainties $\sigma_j = (\sigma_{s_j}, \sigma_{y_j})$

Model $M$ with parameters $\theta$
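As a rough illustration of the integral above, the following sketch evaluates the single-point likelihood by brute-force integration on a 2D grid. It is not the talk's software. It assumes a Gaussian measurement model that is independent in time and signal, a time series model that factorises as P(z|t) P(t) with P(t) uniform over the observed time span, and a Gaussian P(z|t) with mean `eta(t)` and standard deviation `omega`. The names `point_likelihood`, `eta`, `omega` and `t_range` are chosen here for illustration.

```python
import numpy as np

def gauss(x, mean, sd):
    """Normal probability density."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (np.sqrt(2.0 * np.pi) * sd)

def trapezoid_weights(x):
    """Trapezoid-rule weights for a uniform grid x."""
    w = np.full(x.size, x[1] - x[0])
    w[0] *= 0.5
    w[-1] *= 0.5
    return w

def point_likelihood(s_j, y_j, sig_s, sig_y, eta, omega, t_range, n=601):
    """P(D_j | sigma_j, theta, M) integrated over true time t and true signal z."""
    t_lo, t_hi = t_range
    # Grids span +/- 8 measurement sd; the integrand is negligible outside.
    t = np.linspace(max(t_lo, s_j - 8 * sig_s), min(t_hi, s_j + 8 * sig_s), n)
    z = np.linspace(y_j - 8 * sig_y, y_j + 8 * sig_y, n)
    T, Z = np.meshgrid(t, z, indexing="ij")
    # Measurement model (assumed Gaussian in both time and signal).
    measurement = gauss(s_j, T, sig_s) * gauss(y_j, Z, sig_y)
    # Time series model: P(z | t) * P(t), with P(t) assumed uniform on t_range.
    time_series = gauss(Z, eta(T), omega) / (t_hi - t_lo)
    return trapezoid_weights(t) @ (measurement * time_series) @ trapezoid_weights(z)

# Example: constant mean of 0.01 with stochastic sd of 0.02.
L = point_likelihood(50.0, 0.03, 1.0, 0.02, lambda t: 0.01 + 0.0 * t, 0.02, (0.0, 100.0))
print(L)
```

The grid resolution must be fine enough to resolve `omega`; if `omega` is much smaller than the signal uncertainty, a finer or adaptive grid would be needed.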


Model comparison

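Likelihood of all data points is

$$P(D \mid \sigma, \theta, M) = \prod_j P(D_j \mid \sigma_j, \theta, M)$$

Evidence is the likelihood marginalized over the parameter prior

$$P(D \mid \sigma, M) = \int_\theta \underbrace{P(D \mid \sigma, \theta, M)}_{\text{likelihood}} \; \underbrace{P(\theta \mid M)}_{\text{prior}} \; d\theta$$

More robust alternative is the leave-one-out cross validation likelihood

$$P(D_j \mid D_{-j}, \sigma, M) = \int_\theta \underbrace{P(D_j \mid \sigma_j, \theta, M)}_{\text{likelihood}} \; \underbrace{P(\theta \mid D_{-j}, \sigma_{-j}, M)}_{\text{posterior}} \; d\theta$$

$$L_{CV} = \prod_{j=1}^{J} P(D_j \mid D_{-j}, \sigma, M)$$

Calculate integrals by MCMC sampling of posterior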
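The leave-one-out integral can be estimated as the average of the single-point likelihood over posterior samples obtained without that point. The sketch below shows this for a deliberately simple toy model (constant mean `mu` with known Gaussian noise and a zero-mean Gaussian prior), which is an assumption made here so that the posterior can be sampled directly and the result checked analytically. In the talk the posterior samples come from MCMC; direct sampling is swapped in only to keep the example short.

```python
import numpy as np

def gauss(x, mean, sd):
    """Normal probability density."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (np.sqrt(2.0 * np.pi) * sd)

def loo_posterior(y_rest, sd, prior_sd):
    """Mean and sd of the posterior on mu given the retained data
    (conjugate result for a N(0, prior_sd^2) prior and known noise sd)."""
    post_var = 1.0 / (1.0 / prior_sd**2 + y_rest.size / sd**2)
    return post_var * y_rest.sum() / sd**2, np.sqrt(post_var)

def log10_loo_cv(y, sd, prior_sd, n_samples=20000, seed=0):
    """log10 L_CV: sum over j of log10 of the posterior-averaged likelihood of D_j."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for j in range(y.size):
        m, s = loo_posterior(np.delete(y, j), sd, prior_sd)
        theta = rng.normal(m, s, n_samples)  # stand-in for MCMC posterior draws
        total += np.log10(gauss(y[j], theta, sd).mean())
    return total

def log10_loo_cv_exact(y, sd, prior_sd):
    """Analytic value for this toy model, used only as a check."""
    total = 0.0
    for j in range(y.size):
        m, s = loo_posterior(np.delete(y, j), sd, prior_sd)
        total += np.log10(gauss(y[j], m, np.hypot(s, sd)))
    return total

y = np.array([0.3, -0.1, 0.5, 0.2, 0.0, 0.4])
print(log10_loo_cv(y, sd=0.3, prior_sd=1.0), log10_loo_cv_exact(y, sd=0.3, prior_sd=1.0))
```

Each left-out point needs its own posterior, so the cost scales with the number of data points J.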


Time series model

[Figure: simulated time series. x-axis: true time t (0 to 100); y-axis: true signal z.]

• red solid: deterministic component

• red dashed: standard deviation of stochastic component

• black: true data

Deterministic mean plus stochastic variation of constant variance

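$$P(z_j \mid t_j, \theta, M) = \frac{1}{\sqrt{2\pi}\,\omega} \exp\left[-\frac{(z_j - \eta(t_j))^2}{2\omega^2}\right] \quad \text{(Gaussian)}$$

$$\eta(t_j) = \frac{a}{2} \cos[2\pi(\nu t_j + \phi)] + b \quad \text{(sinusoidal)}$$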
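A minimal sketch of drawing true signal values from this model at non-uniform times is given below. The function names and the parameter values are illustrative assumptions; note that with the factor a/2 in front of the cosine, `a` is the peak-to-peak amplitude.

```python
import numpy as np

def eta(t, a, nu, phi, b):
    """Deterministic sinusoidal mean: (a/2) cos[2 pi (nu t + phi)] + b."""
    return 0.5 * a * np.cos(2.0 * np.pi * (nu * t + phi)) + b

def simulate_sin_stoch(t, a, nu, phi, b, omega, seed=0):
    """Draw true signals z: sinusoidal mean plus Gaussian scatter of sd omega."""
    rng = np.random.default_rng(seed)
    return eta(t, a, nu, phi, b) + rng.normal(0.0, omega, size=np.shape(t))

# Non-uniform time sampling over 0..100 (arbitrary time units).
t = np.sort(np.random.default_rng(1).uniform(0.0, 100.0, 50))
z = simulate_sin_stoch(t, a=0.04, nu=0.02, phi=0.0, b=0.01, omega=0.01)
print(z[:5])
```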


Time series model

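Ornstein-Uhlenbeck process

A stationary, Markov, Gaussian process

$$dz(t) = -\frac{1}{\tau}\, z(t)\, dt + c^{1/2}\, \mathcal{N}(t; 0, dt)$$

$\tau$ relaxation time

$c$ diffusion constant

$$P(z_j \mid t_j, \theta, M) = \frac{1}{\sqrt{2\pi V_z}} \exp\left[-\frac{(z_j - \mu_z)^2}{2 V_z}\right]$$

with, for $t > t_0$,

$$\mu_z = z_0\, e^{-(t - t_0)/\tau}, \qquad V_z = \frac{c\tau}{2}\left(1 - e^{-2(t - t_0)/\tau}\right)$$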
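Because the conditional distribution above is Gaussian with known mean and variance, an OU realization can be drawn exactly at arbitrary (irregular) times, one step at a time. The sketch below does this; the function name and the choice to start from the stationary distribution (variance cτ/2) when no initial value is given are assumptions made for illustration.

```python
import numpy as np

def simulate_ou(t, tau, c, z0=None, seed=0):
    """Exact OU realization at increasing times t (irregular spacing allowed)."""
    rng = np.random.default_rng(seed)
    stat_var = 0.5 * c * tau  # long-time limit of V_z
    z = np.empty(t.size)
    z[0] = rng.normal(0.0, np.sqrt(stat_var)) if z0 is None else z0
    for k in range(1, t.size):
        decay = np.exp(-(t[k] - t[k - 1]) / tau)
        mu_z = z[k - 1] * decay              # conditional mean
        v_z = stat_var * (1.0 - decay**2)    # conditional variance
        z[k] = rng.normal(mu_z, np.sqrt(v_z))
    return z

# Irregularly sampled realization with relaxation time 10.
t = np.sort(np.random.default_rng(2).uniform(0.0, 100.0, 200))
z = simulate_ou(t, tau=10.0, c=0.2)
print(z[:5])
```

For time steps much longer than τ the decay factor tends to zero and successive points become independent draws from the stationary distribution.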


Examples of OU process realizations

[Figure: OU process realizations, signal vs. time (0 to 100). Columns: relaxation time τ = 1, 10, 100, 1000; rows: different randomisations.]


Luminosity variations in ultra cool dwarf stars

[Figure: light curves, signal / mag vs. time / hrs. Panels: 2m0345, 2m0913, 2m1145a, 2m1145b, 2m1146, 2m1334, calar3, sdss0539, sori31, sori33, sori45.]



Models compared:

• constant (variability just due to measurement noise)

• constant with Gaussian stochastic component

• sinusoid with Gaussian stochastic component

• OU process



[Figure: the same 11 light curve panels (signal / mag vs. time / hrs), annotated with the labels: OU process; Sinusoid + stochastic; Sinusoid (8.3h, 13.3h).]


Periodicity in biodiversity over past 550 Myr?

[Figure: three panels of no. genera (− cubic fit) vs. time BP / Myr, from 550 to 50 Myr; black = data, red = model fit. Rohde & Muller 2005.]

• periodic model with additional fitted Gaussian noise

• stochastic process (OU process): CV likelihood is much higher for this model


Summary

• a Bayesian method for modelling time series

‣ arbitrary time sampling and error models

‣ deterministic and stochastic time series

‣ use of cross-validation likelihood, a robust alternative to the evidence

• applications

‣ light curves of some very cool stars (and quasars) evolve stochastically

‣ no evidence for periodic variation of biodiversity over past 550 Myr

• more information and software: tinyurl.com/ctsmod

http://www.astroimpacts.org


Ultra cool dwarf model comparison results

C. A. L. Bailer-Jones: Bayesian time series analysis

Table 4. Log (base 10) LOO-CV likelihood of each model relative to that for the no-model for each light curve (log L_LOO-CV − log L_NM).

Light curve   OU process   Off+Stoch     Sin   Sin+Stoch   Off+Sin+Stoch   No-model   p-value
2m0345              3.26        2.07    0.15        2.06            2.66     −13.60      4e-4
2m0913              0.44        0.72    0.23        0.97            0.10     −53.39      7e-4
2m1145a            15.23        8.59    3.01       12.26           11.70     −63.83     <1e-9
2m1145b            −0.73        1.96    2.00        2.69            2.95     −39.71      1e-3
2m1146              0.67        0.56   −0.08        0.21            1.17     −26.83      3e-3
2m1334             14.95       12.82    4.06       16.86           16.12     −65.88      1e-9
sdss0539            5.50        1.99    4.93        4.48            4.67     −19.62      3e-5
calar3              3.60        1.43    5.65        5.11            4.28     −28.06      6e-4
sori31              2.04        2.12    1.02        2.59            1.90     −11.16      4e-5
sori33              1.49        0.66    2.14        1.85            2.12      −8.39      2e-3
sori45              6.70        4.32    5.08        6.23            6.32     −29.93      5e-9

Notes. The penultimate column gives the value of the log likelihood for the no-model, log L_NM. The last column is the p-value for the hypothesis test from BJM.
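The entries of Table 4 are log10 likelihood ratios, so a difference of 1 corresponds to a factor of ten in likelihood. As a reading aid, the sketch below lists, for a few light curves, the models that lie within a chosen margin of the best one. The margin of 1 dex is an assumption adopted here for illustration, and the no-model enters with a relative value of 0 by construction.

```python
# Relative log10 LOO-CV likelihoods copied from Table 4 (subset of rows).
MODELS = ["OU process", "Off+Stoch", "Sin", "Sin+Stoch", "Off+Sin+Stoch", "No-model"]
TABLE = {
    "2m0913":  [0.44, 0.72, 0.23, 0.97, 0.10, 0.0],
    "2m1145a": [15.23, 8.59, 3.01, 12.26, 11.70, 0.0],
    "2m1334":  [14.95, 12.82, 4.06, 16.86, 16.12, 0.0],
    "calar3":  [3.60, 1.43, 5.65, 5.11, 4.28, 0.0],
}

def best_model(row):
    """Model with the highest relative log10 likelihood."""
    return MODELS[row.index(max(row))]

def competitive_models(row, margin_dex=1.0):
    """Models within margin_dex (in log10 likelihood) of the best model."""
    top = max(row)
    return [m for m, v in zip(MODELS, row) if top - v < margin_dex]

for name, row in TABLE.items():
    print(name, "| best:", best_model(row), "| within 1 dex:", competitive_models(row))
```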

Although the measured data points have negligible timing uncertainties, they do have a finite duration (the integration time of the observations), either 5 or 8 min (fixed for a given light curve). This could be accommodated into the measurement model (Sect. 2.2) by using a top-hat distribution instead of a Gaussian. I nonetheless approximate this as a delta function, for two reasons: (1) it accelerates considerably the likelihood calculations, because it allows us to replace the 2D likelihood integral (Eq. (7)) with an analytic calculation (Sect. B.2); (2) the method of calculating the likelihood of the OU process (derived in Sect. A.2) is only defined for this limit.

6.2. Results: LOO-CV likelihood

I follow the procedure outlined in Sect. 4 to define the priors and to sample the posterior with MCMC. The results are summarized in Table 4. A first glance over the table shows that for ten of the light curves, most of the models are significantly better than the no-model at explaining the data, often by a large amount.

According to the χ² test of BJM, all of these objects have a variability which is inconsistent with Gaussian noise on the scale of the error bars, so there should be a better model than the no-model (although it may not be among those I have tested). We see from the Table that the no-model is not favoured for any light curve. However, for 2m0913, none of the models is significantly more likely than the no-model, so there is no reason to “reject” it. As the no-model is equivalent to the null hypothesis of BJM’s χ² test, and this gave a p-value of 7e−4, this shows that the p-value is not a reliable metric for “rejecting” the null hypothesis.

On the other hand, in the three cases where the p-value is very low – 2m1145a, 2m1334, sori45 – the relative log likelihood for at least one model is high. This suggests that a very low p-value sometimes correctly indicates that another model explains the data better, although this is of limited use as we do not know how low the p-value has to be. But at least it might motivate us to define and test other models. The converse is not true: a relatively high p-value does not indicate that the null hypothesis is the best fitting model.

We turn now to identifying the best models. For all light curves, there is no significant difference between Off+Sin+Stoch and Sin+Stoch, which just means that the offset is not needed. That is not surprising, because the light curves have zero mean by construction. For eight of the light curves, the LOO-CV likelihood for Sin+Stoch is significantly larger than for Sin, implying there is a source of (Gaussian) stochastic variability which is not accounted for by the error bars in the data, {σ_{y_j}}. This indicates either an additional source of variance (variability), or that the error bars have been underestimated. (In only two of these cases – 2m1145a and 2m1334 – are the differences between Sin and Sin+Stoch very large.)

Of course, there is no reason a priori to assume that a sinusoidal model is the appropriate one. In 9 of the 11 light curves, the sinusoidal models give a higher likelihood than the Off+Stoch model, and in the other two cases the value is not significantly lower. We can therefore state that for none of the 11 light curves is Off+Stoch significantly better than the sinusoidal models. But only with five or six light curves can we say that a sinusoidal model is significantly better than Off+Stoch. For the remaining light curves, the data (and priors!) do not discriminate sufficiently between the models, so neither can be “rejected”.

Turning now to the OU process, we see that this is significantly better than all other models only for 2m1145a, but by a confident margin. In seven other cases the OU process is still better than the other models, or at least not significantly worse than the best model, so cannot be discounted as an explanation. In the remaining three cases – 2m1145b, 2m1334, calar3 – at least one other model is significantly better than the OU process.

The results for 2m1145a and 2m1145b are interesting, as these are light curves of the same object observed a year apart. At one time the OU process is the best explanation, at the other either a sinusoidal model or Off+Stoch. Although it is plausible that the object shows different behaviour at different times, e.g. according to the degree of cloud coverage, we should not over-interpret this. We should also not forget that another, untested model could be better than any of these.

To summarize: based just on the LOO-CV likelihood, I conclude that 10 of 11 light curves are explained much better by some model other than the no-model, by a factor of 100 or more in likelihood. The exception is 2m0913, for which all models are equally plausible (likelihoods within a factor of ten). Three light curves can be associated with one particular model: 2m1145a is best described by the OU process; 2m1334 and calar3 are best described by a sinusoidal model, the former requiring an additional stochastic component (Sin+Stoch), the latter could be either with or without it (Sin). This would seem to be consistent with a rotational modulation of the light curve (but see the next section). For the remaining seven light curves, no single model emerges as the clear winner, although some models are significantly disfavoured. In three of these seven cases – 2m0345, sdss0539, sori45 – both the OU process and a sinusoid model explain the data equally well (for 2m0345 and sori45

A89, page 9 of 16


Parameter posterior PDFs:

[Figure: posterior density for frequency ν / hr⁻¹, amplitude a / mag, and phase φ; black = posterior, red = prior.]


Parameter posterior PDFs: 2m1145a

[Figure: posterior density for τ / hr, b / mag, c / mag² hr⁻¹, µ[z1] / mag, and V^{1/2}[z1] / mag; black = posterior, red = prior.]


Parameter posterior PDFs: 2m1334

[Figure: posterior density for offset b / mag, frequency ν / hr⁻¹, amplitude a / mag, phase φ, and standard deviation ω / mag; black = posterior, red = prior.]
