Likelihood based inference

1. Overview of classical asymptotics
2. Profile likelihood and nuisance parameters (Reid 2013; 2010)
3. p growing with n (Portnoy 1984, 1985, 1988)
4. p > n: regularization (Buhlmann 2013; Taylor et al. 2014)
5. Approximate likelihoods: composite, quasi, empirical, ...

Topics in Inference, Fields Institute, 2015


Models and likelihood

- Model for the probability distribution of y given x
- Density f(y | x) with respect to, e.g., Lebesgue measure
- Parameters for the density: f(y | x; θ), θ = (θ_1, ..., θ_p)
- Data y = (y_1, ..., y_n), often independent
- Likelihood function L(θ; y) ∝ f(y; θ), y = (y_1, ..., y_n)
- log-likelihood function ℓ(θ; y) = log L(θ; y) (numerical sketch below)
- often θ = (ψ, λ), with ψ of interest and λ a nuisance parameter
- θ could have very large dimension, p > n
- θ could in principle have infinite dimension, e.g. E(y | x) = θ(x) 'smooth'
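A minimal numerical sketch of these definitions, for an assumed i.i.d. exponential model (illustrative only, not one of the lecture's examples):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Log-likelihood of an i.i.d. Exponential(rate θ) sample:
# ℓ(θ; y) = n log θ − θ Σ y_i  (an assumed toy model).
def loglik(theta, y):
    return len(y) * np.log(theta) - theta * np.sum(y)

rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=100)      # true rate θ = 1/2

# Maximize ℓ numerically; the closed-form MLE is θ̂ = 1/ȳ.
res = minimize_scalar(lambda t: -loglik(t, y), bounds=(1e-6, 10), method="bounded")
print(res.x, 1 / y.mean())                    # the two should agree
```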



Examples: generalized linear mixed models

GLM: f(y_ij | u_i) = exp{y_ij η_ij − b(η_ij) + c(y_ij)}

linear predictor: η_ij = x_ij^T β + z_ij^T u_i,  j = 1, ..., n_i;  i = 1, ..., m

random effects: u_i ∼ N_k(0, Σ)

log-likelihood:

$$\ell(\beta, \Sigma) = \sum_{i=1}^m \Big( y_i^T X_i \beta - \tfrac12 \log|\Sigma| + \log \int_{\mathbb{R}^k} \exp\big\{ y_i^T Z_i u_i - 1_i^T b(X_i\beta + Z_i u_i) - \tfrac12 u_i^T \Sigma^{-1} u_i \big\}\, du_i \Big)$$

Ormerod & Wand, 2012



... complicated likelihoods

- example: clustered binary data
- latent variable: z_ir = x_ir'β + b_i + ε_ir,  b_i ∼ N(0, σ_b²),  ε_ir ∼ N(0, 1)
- r = 1, ..., n_i: observations in a cluster/family/school...; i = 1, ..., n clusters
- the random effect b_i introduces correlation between observations in a cluster
- observations: y_ir = 1 if z_ir > 0, else 0
- Pr(y_ir = 1 | b_i) = Φ(x_ir'β + b_i) = p_i, where Φ(z) = ∫_{−∞}^z (2π)^{−1/2} e^{−x²/2} dx

- likelihood, with θ = (β, σ_b) (a one-dimensional integral per cluster; numerical sketch below):

$$L(\theta; y) = \prod_{i=1}^n \int_{-\infty}^{\infty} \prod_{r=1}^{n_i} p_i^{y_{ir}} (1 - p_i)^{1 - y_{ir}}\, \phi(b_i; \sigma_b^2)\, db_i$$

- more general: z_ir = x_ir'β + w_ir' b_i + ε_ir

Renard et al. (2004)
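As a concrete illustration of the integral each cluster contributes, here is a sketch that evaluates one cluster's log-likelihood contribution by Gauss-Hermite quadrature (the function names and data layout are assumptions made for the illustration):

```python
import numpy as np
from scipy.stats import norm

# Rule for ∫ g(b) φ(b; σ_b²) db: with b = √2 σ_b t it becomes
# π^{-1/2} Σ_k w_k g(√2 σ_b t_k).
nodes, weights = np.polynomial.hermite.hermgauss(30)

def cluster_loglik(y, X, beta, sigma_b):
    # log ∫ Π_r p_ir^{y_ir} (1 − p_ir)^{1−y_ir} φ(b; σ_b²) db for one cluster
    b = np.sqrt(2.0) * sigma_b * nodes                  # quadrature points for b_i
    p = norm.cdf((X @ beta)[:, None] + b[None, :])      # (n_i, K) probit probabilities
    lik_b = np.prod(np.where(y[:, None] == 1, p, 1 - p), axis=0)
    return np.log(np.sum(weights * lik_b) / np.sqrt(np.pi))

# toy usage: one cluster with n_i = 4 binary responses
rng = np.random.default_rng(2)
X = rng.normal(size=(4, 2))
print(cluster_loglik(np.array([1, 0, 1, 1]), X, np.array([0.5, -0.3]), sigma_b=1.0))
```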



... complicated likelihoods

Poisson: f(y_t | α_t; θ) = exp(y_t log μ_t − μ_t)/y_t!,  with log μ_t = β + α_t

autoregression: α_t = φ α_{t−1} + ε_t,  ε_t ∼ N(0, σ²),  |φ| < 1,  θ = (β, φ, σ²)

likelihood:

$$L(\theta; y_1, \ldots, y_n) = \int \Big( \prod_{t=1}^n f(y_t \mid \alpha_t; \theta) \Big) f(\alpha; \theta)\, d\alpha$$

L_approx(θ; y) via Laplace approximation, with some refinements (Davis & Yau, 2011)



... complicated likelihoods

multivariate extremes: example, wind speed at d locations

vector observations: (X_1i, ..., X_di), i = 1, ..., n

component-wise maxima: Z_1, ..., Z_d;  Z_j = max(X_j1, ..., X_jn)

Z_j are transformed (centered and scaled)

joint distribution function:

$$\Pr(Z_1 \le z_1, \ldots, Z_d \le z_d) = \exp\{-V(z_1, \ldots, z_d)\}$$

V(·) can be parameterized via Gaussian process models

likelihood: needs the mixed partial derivative of exp{−V(·)} with respect to all d arguments, which expands into a sum over all partitions of {1, ..., d}: a combinatorial explosion (Davison et al., 2012)



... complicated likelihoods

Restricted Boltzmann machine:

$$f(v; \theta) = \frac{1}{Z(\theta)} \sum_h \exp\{h^T W v + \alpha^T h + \beta^T v\}, \qquad \theta = (W, \alpha, \beta)$$

observations: v_1, ..., v_n, independent ∼ f(v; θ); hidden units h

complete data likelihood:

$$f(v, h; \theta) = \frac{1}{Z(\theta)} \exp\{h^T W v + \alpha^T h + \beta^T v\}$$

partition function: Z(θ) = Σ_{v,h} exp{h^T W v + α^T h + β^T v}

Jan 30: MZ slides; GL slides



Why likelihood?

- makes probability modelling central
- emphasizes the inverse problem of reasoning from y to θ or f(·)
- suggested by Fisher as a measure of plausibility (Royall, 1997):
  L(θ̂)/L(θ) ∈ (1, 3): θ very plausible;  ∈ (3, 10): θ implausible;  ∈ (10, ∞): θ very implausible
- converts a 'prior' probability π(θ) to a posterior π(θ | y) via Bayes' Theorem
- provides a conventional set of summary quantities: maximum likelihood estimator, score function, ...
- leading to approximate pivotal functions, based on the normal distribution
- basis for comparison of models, using AIC or BIC



Derived quantities

[Figure: the log-likelihood function ℓ(θ) plotted against θ, marking θ̂ and θ, with the drop of 1.92 = w/2 from the maximum that defines an approximate 95% likelihood interval.]

- maximum likelihood estimator: θ̂ = arg sup_θ log L(θ; y) = arg sup_θ ℓ(θ; y)
- observed Fisher information: j(θ̂) = −∂²ℓ(θ)/∂θ² |_{θ̂}
- efficient score function: ℓ′(θ) = ∂ℓ(θ; y)/∂θ;  ℓ′(θ̂) = 0, assuming enough regularity
- ℓ′(θ; y) = Σ_{i=1}^n ∂ log f_{Y_i}(y_i; θ)/∂θ, for y_1, ..., y_n independent (numerical illustration below)
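A numerical check of these definitions, for an assumed N(θ, 1) sample where each quantity is also available in closed form:

```python
import numpy as np
from scipy.optimize import minimize

# MLE, score and observed information for i.i.d. N(θ, 1) data:
# θ̂ = ȳ, ℓ'(θ̂) = 0, and j(θ̂) = n, all checkable below.
rng = np.random.default_rng(2)
y = rng.normal(loc=1.5, scale=1.0, size=200)

def nll(theta):                       # −ℓ(θ; y) up to a constant
    return 0.5 * np.sum((y - theta) ** 2)

fit = minimize(nll, x0=np.array([0.0]))
theta_hat = fit.x[0]                  # ≈ ȳ, the closed-form MLE

eps = 1e-5                            # numerical derivatives of ℓ
score = -(nll(theta_hat + eps) - nll(theta_hat - eps)) / (2 * eps)   # ℓ'(θ̂) ≈ 0
j_hat = (nll(theta_hat + eps) - 2 * nll(theta_hat) + nll(theta_hat - eps)) / eps**2
print(theta_hat, y.mean(), score, j_hat, len(y))
```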



Limiting results (no nuisance parameters)

$$\ell'(\theta)^T j^{-1}(\hat\theta)\, \ell'(\theta) \xrightarrow{L} \chi^2_p, \qquad q(\theta) \equiv (\hat\theta - \theta)^T j(\hat\theta)(\hat\theta - \theta) \xrightarrow{L} \chi^2_p, \qquad w(\theta) \equiv 2\{\ell(\hat\theta) - \ell(\theta)\} \xrightarrow{L} \chi^2_p$$

Approximate pivots (p = 1):

$$s(\theta) \equiv \ell'(\theta)\, j^{-1/2}(\hat\theta) \,\dot\sim\, N(0,1), \qquad q(\theta) \equiv (\hat\theta - \theta)\, j^{1/2}(\hat\theta) \,\dot\sim\, N(0,1), \qquad r(\theta) \equiv \pm \big[ 2\{\ell(\hat\theta) - \ell(\theta)\} \big]^{1/2} \,\dot\sim\, N(0,1)$$

(the sketch below computes all three for a simple model)
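A sketch of the three pivots for an assumed i.i.d. Exp(θ) (rate) sample, where ℓ(θ) = n log θ − θ Σ y_i, ℓ′(θ) = n/θ − Σ y_i, j(θ) = n/θ² and θ̂ = 1/ȳ:

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.exponential(scale=1.0, size=50)       # true rate θ0 = 1
n, s_y = len(y), y.sum()
theta_hat = n / s_y                           # MLE

def ell(t):                                   # ℓ(θ) for this model
    return n * np.log(t) - t * s_y

theta0 = 1.0
j_hat = n / theta_hat**2                      # observed information at θ̂
s_piv = (n / theta0 - s_y) * j_hat**-0.5                  # score pivot s(θ)
q_piv = (theta_hat - theta0) * j_hat**0.5                 # Wald pivot q(θ)
r_piv = np.sign(theta_hat - theta0) * np.sqrt(2 * (ell(theta_hat) - ell(theta0)))
print(s_piv, q_piv, r_piv)                    # all approximately N(0,1) draws
```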



... approximate pivots (scalar parameter of interest) [figure slide]



Nuisance parameters: θ = (ψ, λ)

- λ̂_ψ: constrained maximum likelihood estimator of λ, with ψ held fixed
- profile log-likelihood: ℓ_p(ψ) = ℓ(ψ, λ̂_ψ)

$$r_e(\psi; y) = (\hat\psi - \psi)\, j_p^{1/2}(\hat\psi) \,\dot\sim\, N(0,1)$$

$$r(\psi; y) = \pm \big[ 2\{\ell_p(\hat\psi) - \ell_p(\psi)\} \big]^{1/2} \,\dot\sim\, N(0,1)$$

$$\pi_m(\psi \mid y) \,\dot\sim\, N\{\hat\psi,\; j_p^{-1}(\hat\psi)\}$$

j_p(ψ) = −ℓ_p''(ψ): the profile information

- treat the profile log-likelihood as a one-parameter log-likelihood (sketch below)
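A sketch of the profile construction in the simplest case (an assumed N(μ, σ²) model with ψ = μ and λ = σ²), treating ℓ_p as a one-parameter log-likelihood to obtain a likelihood-ratio interval:

```python
import numpy as np

# For fixed μ the constrained MLE is σ̂²_μ = mean((y − μ)²),
# so ℓ_p(μ) = −(n/2) log σ̂²_μ − n/2 (up to a constant).
rng = np.random.default_rng(4)
y = rng.normal(2.0, 1.5, size=100)
n = len(y)

def profile_loglik(mu):
    sigma2_mu = np.mean((y - mu) ** 2)       # λ̂_ψ for this ψ = μ
    return -0.5 * n * np.log(sigma2_mu) - 0.5 * n

mu_grid = np.linspace(1.0, 3.0, 201)
lp = np.array([profile_loglik(m) for m in mu_grid])
mu_hat = mu_grid[lp.argmax()]                # ≈ ȳ, the global MLE of μ
# likelihood-ratio 95% interval: {μ : 2(ℓ_p(μ̂) − ℓ_p(μ)) ≤ 3.84}
inside = mu_grid[2 * (lp.max() - lp) <= 3.84]
print(mu_hat, inside.min(), inside.max())
```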



The problem with profiling

- ℓ_p(ψ) = ℓ(ψ, λ̂_ψ) used as a 'regular' likelihood, with the usual asymptotics
- this neglects errors in the estimation of the nuisance parameter
- those errors can be very large when there are many nuisance parameters
- example: Y ∼ N(Xβ, σ²I), σ̂² = (y − Xβ̂)^T(y − Xβ̂)/n
- badly biased if dim(β) is large relative to n
- easy fix: σ̃² = (y − Xβ̂)^T(y − Xβ̂)/(n − p)
- example: Y_ij ∼ N(μ_i, σ²), j = 1, ..., n; i = 1, ..., p
- σ̂² →p {(n − 1)/n} σ² as p → ∞, n fixed (Neyman & Scott, 1948; simulation below)
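The Neyman-Scott inconsistency is easy to reproduce by simulation (a quick sketch; the group means are the nuisance parameters):

```python
import numpy as np

# p groups of size n, unknown group means: the MLE of σ² converges to
# (n−1)/n · σ² as p → ∞ with n fixed, here half the truth since n = 2.
rng = np.random.default_rng(5)
n, p, sigma2 = 2, 100_000, 4.0
y = rng.normal(loc=rng.normal(size=(p, 1)), scale=np.sqrt(sigma2), size=(p, n))
sigma2_mle = np.mean((y - y.mean(axis=1, keepdims=True)) ** 2)
print(sigma2_mle, (n - 1) / n * sigma2)   # ≈ 2.0, not 4.0
```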



Reminder: deriving limit results

- expand the score about θ̂:  0 = ℓ′(θ̂; y) ≐ ℓ′(θ; y) + (θ̂ − θ) ℓ′′(θ; y)
- so θ̂ − θ ≐ ℓ′(θ; y) {−ℓ′′(θ; y)}^{−1}
- ℓ′(θ; y) →L N(0, i(θ)) and {−ℓ′′(θ; y)}^{−1} →p i^{−1}(θ), hence (θ̂ − θ) →L N(0, i^{−1}(θ))
- i(θ) = E{j(θ)} = E{−ℓ′′(θ)} = cov{ℓ′(θ)}
- M-estimator: θ̂_ρ = arg min_θ Σ ρ(y_i; θ)
- solution: Σ ψ(y_i; θ̂_ρ) = 0, with ψ(y; θ) = ∂ρ(y; θ)/∂θ
- θ̂_ρ − θ →L N{0, G^{−1}(θ)}, assuming E{ψ(Y; θ)} = 0
- G(θ) = E{−∂ψ(Y; θ)/∂θ} [cov{ψ(Y; θ)}]^{−1} E{−∂ψ(Y; θ)/∂θ}  (sandwich sketch below)
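A small sketch of the sandwich form of G(θ) in the simplest M-estimation problem, ψ(y; θ) = y − θ, whose root is the sample mean; the model for Y is deliberately non-normal:

```python
import numpy as np

rng = np.random.default_rng(6)
y = rng.exponential(scale=2.0, size=500)       # a non-normal Y

theta_hat = y.mean()                           # solves Σ ψ(y_i; θ) = 0
sens = 1.0                                     # E{−∂ψ(Y; θ)/∂θ} = 1 exactly here
var_psi = np.mean((y - theta_hat) ** 2)        # estimate of cov{ψ(Y; θ)}
G = sens * (1.0 / var_psi) * sens              # Godambe information (sandwich form)
print(theta_hat, np.sqrt(1.0 / (len(y) * G)))  # θ̂ and its sandwich standard error
```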



big data asymptotics

- Neyman-Scott problems: n fixed, p → ∞
- Donoho: n, p → ∞, p/n → β < ∞
- likelihood results: n, p → ∞, p²/n → β < ∞ (Portnoy, 1984, 1985, 1988)
- Laplace approximation: n, p → ∞, p = o(n^{1/3}) (Shun & McCullagh, 1995)
- p > n: regularize
  - lasso: arg min_β (y − Xβ)^T(y − Xβ) + λ Σ_j |β_j|  (no intercept)
  - ridge regression: arg min_β (y − Xβ)^T(y − Xβ) + λ Σ_j β_j²  (sketch below)
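A sketch with scikit-learn; note its lasso objective is scaled as (1/2n)‖y − Xβ‖² + λ‖β‖₁, so the λ used here is not on exactly the same scale as in the display above:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# p > n with a sparse truth: lasso selects, ridge shrinks.
rng = np.random.default_rng(7)
n, p, s0 = 50, 200, 5
X = rng.normal(size=(n, p))
beta0 = np.zeros(p)
beta0[:s0] = 2.0                                 # active set = first 5 coordinates
y = X @ beta0 + rng.normal(size=n)

lam = np.sqrt(np.log(p) / n)                     # rate suggested by the theory
lasso = Lasso(alpha=lam, fit_intercept=False).fit(X, y)
ridge = Ridge(alpha=lam, fit_intercept=False).fit(X, y)
print(np.sum(lasso.coef_ != 0))                  # lasso keeps few coefficients
print(np.abs(lasso.coef_ - beta0).sum(), np.abs(ridge.coef_ - beta0).sum())
```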



n, p → ∞ (Portnoy, 1984, 1985, 1988)

- Model: y_i = x_i^T β + Z_i, i = 1, ..., n independent
- M-estimation:

$$\sum_{i=1}^n x_i\, \psi(y_i - x_i^T \beta) = 0 \qquad (1)$$

- result: if ψ is monotone, p log(p)/n → 0, and conditions on X hold, then there is a solution β̂ of (1) satisfying ‖β̂ − β‖² = O_p(p/n)
- "rows of X behave like a sample from a distribution in R^p"
- if p^{3/2} log n / n → 0, then max_i |x_i^T(β̂ − β)| →p 0
- and a_n^T(β̂ − β) →L N(0, σ²), with σ² = a_n^T (X^T X)^{−1} a_n · E ψ²(Z)/{E ψ′(Z)}²



n, p → ∞ (Portnoy, 1984, 1985, 1988)

- Model: f(y_i; θ) = exp{θ^T y_i − ψ(θ)}, i = 1, ..., n independent; p = p_n
- maximum likelihood estimate: ψ′(θ̂_n) = ȳ_n
- under conditions on the eigenvalues of ψ′′(θ) (the Fisher information matrix) and moment conditions on y: ‖θ̂_n − θ_n‖² ≤ c p_n/n, in probability
- ‖θ̂ − θ − ȳ‖ = O_p(p/n) if p/n → 0
- if p^{3/2}/n → 0: √n a_n^T(θ̂ − θ) →L N(0, 1), and the likelihood ratio test of a simple hypothesis is asymptotically χ²_p
- "asymptotic approximations are trustworthy if p^{3/2}/n is small, but may be very wrong if p²/n is not small"
- the MLE 'will tend to be' consistent if p/n → 0 (cf. also El Karoui et al., 2013, PNAS)



n < p (Buhlmann et al., Lockhart et al.)

$$\hat\beta_{\mathrm{Lasso}} = \arg\min_\beta \big( \|y - X\beta\|_2^2 + \lambda \|\beta\|_1 \big)$$

- prediction: ‖X(β̂_Lasso − β⁰)‖²₂/n 'small'
- estimation: ‖β̂_Lasso − β⁰‖_q, q ∈ {1, 2}, 'small'
- selection: P(Ŝ = S₀) 'large', where S₀ is the 'active set' {j : β⁰_j ≠ 0}
- under restricted eigenvalue conditions on X, one can get results like
  ‖β̂_Lasso − β⁰‖₁ = O_p(s₀ √(log(p)/n)),  with λ ≈ √(log(p)/n)
- what about estimated standard errors for β̂_Lasso?
- Buhlmann, 2013: the ridge regression estimate

$$\hat\beta_R = \arg\min_\beta \big( \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2 \big)$$

  can be bias-corrected
- the bias-corrected version is asymptotically normally distributed, and its asymptotic variance can be estimated



n < p (Buhlmann et al., Lockhart et al.)

$$\hat\beta_{\mathrm{Lasso}} = \arg\min_\beta \big( \|y - X\beta\|_2^2 + \lambda \|\beta\|_1 \big)$$

significance testing along the lasso path: the covariance test statistic satisfies

$$T_k \xrightarrow{L} \mathrm{Exp}(1)$$

Taylor et al. 2014



Likelihood in complex models

- simplify the likelihood
  - composite likelihood
  - variational approximation
  - Laplace approximation to integrals
- change the mode of inference
  - quasi-likelihood
  - indirect inference
- simulate
  - approximate Bayesian computation
  - MCMC



Composite likelihood

- also called pseudo-likelihood
- reduce high-dimensional dependencies by ignoring them
- for example, replace f(y_i1, ..., y_ik; θ) by the

$$\text{pairwise marginal} \quad \prod_{j<j'} f_2(y_{ij}, y_{ij'}; \theta), \qquad \text{or conditional} \quad \prod_j f_c(y_{ij} \mid y_{N(ij)}; \theta)$$

- composite likelihood function:

$$CL(\theta; y) \propto \prod_{i=1}^n \prod_{j<j'} f_2(y_{ij}, y_{ij'}; \theta)$$

- composite ML estimates are consistent, asymptotically normal, not fully efficient (Besag, 1975; Lindsay, 1988) (sketch below)
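A sketch of a pairwise composite likelihood in a case where the full likelihood would also be tractable: an assumed exchangeable multivariate normal, estimating the common correlation ρ from all bivariate margins.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import multivariate_normal

rng = np.random.default_rng(8)
k, n, rho0 = 5, 300, 0.4
Sigma0 = (1 - rho0) * np.eye(k) + rho0 * np.ones((k, k))
Y = rng.multivariate_normal(np.zeros(k), Sigma0, size=n)

pairs = [(j, jp) for j in range(k) for jp in range(j + 1, k)]

def neg_pairwise_cl(rho):
    cov = np.array([[1.0, rho], [rho, 1.0]])
    ll = 0.0
    for j, jp in pairs:                      # sum of bivariate log densities
        ll += multivariate_normal.logpdf(Y[:, [j, jp]], cov=cov).sum()
    return -ll

rho_hat = minimize_scalar(neg_pairwise_cl, bounds=(-0.2, 0.9), method="bounded").x
print(rho_hat)                               # ≈ 0.4: consistent, not fully efficient
```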



Example: AR Poisson (Davis & Yau, 2011)

- Likelihood:

$$L(\theta; y_1, \ldots, y_n) = \int \Big( \prod_{t=1}^n f(y_t \mid \alpha_t; \theta) \Big) f(\alpha; \theta)\, d\alpha$$

- Composite likelihood:

$$CL(\theta; y_1, \ldots, y_n) = \prod_{t=1}^{n-1} \int\!\!\int f(y_t \mid \alpha_t; \theta)\, f(y_{t+1} \mid \alpha_{t+1}; \theta)\, f(\alpha_t, \alpha_{t+1}; \theta)\, d\alpha_t\, d\alpha_{t+1}$$

- consecutive pairs
- time-series asymptotic regime: one vector y of increasing length
- the composite ML estimator is still consistent and asymptotically normal, with estimable asymptotic variance
- efficient, relative to a Laplace-type approximation
- surprises: AR(1), fully efficient; MA(1), poor; ARFIMA(0, d, 0), OK



Example: spatial extremes (Davison et al., 2012; Davison & Huser, 2015)

$$\Pr(Z_1 \le z_1, \ldots, Z_d \le z_d) = \exp\{-V(z_1, \ldots, z_d; \theta)\}$$

- pairwise composite likelihood used to compare the fits of several competing models
- model choice using "CLIC", an analogue of AIC: −2 log CL + tr(J⁻¹K)
- Davison et al. 2012 applied this to annual maximum rainfall at several stations near Zurich
- "fitting max-stable processes to spatial or spatio-temporal block maxima is awkward ... the use of composite likelihoods ... has become widely used" (Davison & Huser)



Example: Ising model

Ising model (y_j ∈ {−1, +1}):

$$f(y; \theta) = \frac{1}{Z(\theta)} \exp\Big( \sum_{(j,k)\in E} \theta_{jk}\, y_j y_k \Big)$$

neighbourhood contributions:

$$f(y_j \mid y_{(-j)}; \theta) = \frac{\exp(2 y_j \sum_{k\neq j} \theta_{jk} y_k)}{\exp(2 y_j \sum_{k\neq j} \theta_{jk} y_k) + 1}$$

penalized composite likelihood estimation based on a sample y^(1), ..., y^(n):

$$\max_\theta\; \sum_{i=1}^n \sum_j \ell_j(\theta; y^{(i)}) - \sum_j \sum_k P_\lambda(|\theta_{jk}|)$$

Xue et al., 2012; Ravikumar et al., 2010

(node-wise logistic-regression sketch below)
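A sketch of the pseudo-likelihood idea: each conditional above is logistic in 2Σ_k θ_jk y_k, so node-wise l1-penalized logistic regressions recover the edge weights (data generated by a crude Gibbs sampler; the graph, penalty and settings are all illustrative, in the spirit of Ravikumar et al., 2010):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(9)
d, n = 6, 2000
theta = np.zeros((d, d))
theta[0, 1] = theta[1, 0] = 0.5            # one true edge (toy graph)

# crude Gibbs sampler: P(y_j = +1 | rest) = 1 / (1 + exp(−2 θ_j·y))
Y = np.empty((n, d))
y = np.ones(d)
for i in range(n):
    for _ in range(20):                    # a few sweeps between recorded samples
        for j in range(d):
            p = 1.0 / (1.0 + np.exp(-2.0 * theta[j] @ y))
            y[j] = 1.0 if rng.random() < p else -1.0
    Y[i] = y

# node-wise l1-penalized logistic regressions: features 2*y_k, labels in {0,1}
theta_hat = np.zeros((d, d))
for j in range(d):
    others = np.delete(np.arange(d), j)
    fit = LogisticRegression(penalty="l1", C=0.5, solver="liblinear")
    fit.fit(2.0 * Y[:, others], (Y[:, j] + 1) / 2)
    theta_hat[j, others] = fit.coef_[0]
print(np.round(theta_hat, 2))              # largest entries should mark the true edge
```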



Quasi-likelihood

- simplify the model: specify only

$$E(y_i; \theta) = \mu_i(\theta), \qquad \mathrm{Var}(y_i; \theta) = \phi\, \nu_i(\theta)$$

- consistent with generalized linear models
- example: over-dispersed Poisson responses (sketch below)
- PQL uses this construction, but with random effects (Molenberghs & Verbeke, Ch. 14)
- why does it work?
  - the score equations are the same as for a 'real' likelihood, hence unbiased
  - the derivative of the score function equals the variance function (special to GLMs)
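A sketch of quasi-Poisson estimation with statsmodels; the gamma-mixing used to generate the counts is just one convenient way to make them over-dispersed:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 500
x = rng.normal(size=n)
mu = np.exp(0.5 + 0.8 * x)
y = rng.poisson(mu * rng.gamma(2.0, 0.5, size=n))   # counts with Var(y) > E(y)

X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit(scale="X2")
print(fit.params)      # same β̂ as Poisson maximum likelihood
print(fit.scale)       # φ̂ from Pearson residuals; > 1 here
print(fit.bse)         # quasi-likelihood standard errors, inflated by √φ̂
```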



Indirect inference

- composite likelihood estimators are consistent (under conditions ...)
- because log CL(θ; y) = Σ_{i=1}^n Σ_{j<j'} log f₂(y_ij, y_ij'; θ)
- its derivative w.r.t. θ has expected value 0
- what happens if an estimating equation g(y; θ) is biased?
- g(y_1, ..., y_n; θ̂_n) = 0;  θ̂_n → θ*, where E g(Y; θ*) = 0
- θ* = k(θ); if invertible, θ = k̃(θ*) with k̃ ≡ k⁻¹
- new estimator: θ̃_n = k̃(θ̂_n)
- k(·) is a bridge function, connecting the wrong value of θ to the right one (Yi & Reid, 2010; Jiang & Turnbull, 2004)



... indirect inference (Smith, 2008)

- model of interest:

$$y_t = G_t(y_{t-1}, x_t, \varepsilon_t; \theta), \qquad \theta \in \mathbb{R}^d$$

- the likelihood is not computable, but one can simulate from the model
- simple (wrong) model:

$$y_t \sim f(y_t \mid y_{t-1}, x_t; \theta^*), \qquad \theta^* \in \mathbb{R}^p$$

- find the MLE in the simple model, θ̂* = θ̂*(y_1, ..., y_n), say
- use simulated samples from the model of interest to find the 'best' θ
- the 'best' θ gives data that reproduces θ̂* (Shalizi, 2013)



... indirect inference (Smith, 2008)

- simulate samples y_t^m, m = 1, ..., M, at some value θ
- compute θ̂*(θ) from the simulated data:

$$\hat\theta^*(\theta) = \arg\max_{\theta^*} \sum_m \sum_t \log f(y_t^m \mid y_{t-1}^m, x_t; \theta^*)$$

- choose θ so that θ̂*(θ) is as close as possible to θ̂*
- if p = d, simply invert the 'bridge function'; usually p > d
- θ̂₁ = arg min_θ {θ̂*(θ) − θ̂*}^T W {θ̂*(θ) − θ̂*}
- θ̂₂ = arg min_θ ( Σ_t log f(y_t | y_{t−1}, x_t; θ̂*) − Σ_t log f(y_t | y_{t−1}, x_t; θ̂*(θ)) )
- estimates of θ are consistent, asymptotically normal, but not efficient (sketch below)
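A sketch with an assumed MA(1) model of interest and the lag-1 autocorrelation as the auxiliary statistic θ̂* (so p = d = 1, the invert-the-bridge case; the fixed ε draws act as common random numbers across candidate θ):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(11)

def ma1(theta, eps):                       # y_t = ε_t + θ ε_{t−1}
    return eps[1:] + theta * eps[:-1]

def auxiliary(y):                          # θ̂*: lag-1 sample autocorrelation
    yc = y - y.mean()
    return (yc[1:] @ yc[:-1]) / (yc @ yc)

y_obs = ma1(0.5, rng.normal(size=501))     # pretend these are the data
t_star = auxiliary(y_obs)

M = 20                                     # fixed ε draws, reused for every θ
draws = [np.random.default_rng(m).normal(size=501) for m in range(M)]

def objective(theta):                      # {θ̂*(θ) − θ̂*}² with W = 1
    t_sim = np.mean([auxiliary(ma1(theta, e)) for e in draws])
    return (t_sim - t_star) ** 2

theta_hat = minimize_scalar(objective, bounds=(-0.95, 0.95), method="bounded").x
print(theta_hat)                           # ≈ 0.5
```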



Approximate Bayesian Computation (Marin et al., 2010)

- simulate θ′ from π(θ)
- simulate data z from f(·; θ′)
- if z = y, then θ′ is an observation from the posterior π(· | y)
- in practice: accept when s(z) = s(y) for some set of summary statistics
- more practically still: accept when ρ{s(z), s(y)} < ε for some distance function ρ(·, ·) (Fearnhead & Prangle, 2011)
- many variations, using different MCMC methods to select candidate values θ′ (rejection sketch below)
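A sketch of plain rejection ABC for an assumed conjugate model (θ the mean of N(θ, 1) data, prior N(0, 4), summary s(y) = ȳ), chosen so the exact posterior is available as a check:

```python
import numpy as np

rng = np.random.default_rng(12)
n = 50
y = rng.normal(1.0, 1.0, size=n)            # observed data, true θ = 1
s_obs, eps = y.mean(), 0.05                 # summary s(y) = ȳ, tolerance ε

accepted = []
while len(accepted) < 500:
    theta = rng.normal(0.0, 2.0)            # θ' ~ π(θ) = N(0, 4)
    z = rng.normal(theta, 1.0, size=n)      # z ~ f(·; θ')
    if abs(z.mean() - s_obs) < eps:         # ρ{s(z), s(y)} < ε
        accepted.append(theta)

post_var = 1.0 / (n + 1.0 / 4.0)            # exact conjugate posterior, for checking
post_mean = post_var * n * s_obs
print(np.mean(accepted), post_mean)
print(np.var(accepted), post_var)
```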



... approximate Bayesian computation

M/G/1 queue: exponential arrival times, general service times, single server

observations y_i: times between departures from the queue

unobserved variables V_i: arrival time of customer i

model:
- V₁ ∼ Exp(θ₃)
- V_i | V_{i−1} ∼ V_{i−1} + Exp(θ₃)
- Y_i | X_{i−1}, V_i ∼ Uniform{θ₁ + max(0, V_i − X_{i−1}), θ₂ + max(0, V_i − X_{i−1})}, where X_i = Σ_{j=1}^i Y_j
- service time ∼ U(θ₁, θ₂)

ABC: use quantiles of departure times as summary statistics

Indirect inference: use ȳ, y₍₁₎, θ̂₂ from the steady-state model


[Figure slide: Fearnhead & Prangle, 2011]


ABC and Indirect Inference (Cox & Kartsonaki, 2012)

- both methods need a set of parameter values from which to simulate: θ′ or θ
- both methods need a set of auxiliary functions of the data: s(y) or θ̂*(y)
- in indirect inference, θ* is the 'bridge' to the parameters of real interest, θ
- C & K use orthogonal designs based on Hadamard matrices to choose θ′
- and calculate summary statistics focussed on individual components of θ
- MCMC estimation of the log-likelihood function (Geyer & Thompson, 1992)
- conditional composite likelihood is poor for the Ising model (Okabayashi et al., 2011)



Variational methods (Ormerod & Wand, 2010)

- in a Bayesian context, we want f(β | y); use an approximation q(β) (dependence of q on y suppressed)
- choose q(β) to be
  - simple to calculate
  - close to the posterior
- simple to calculate: q(β) = ∏ q_j(β_j), or a simple parametric family
- close to the posterior: minimize the Kullback-Leibler divergence

$$KL(q \,\|\, f_{post}) = \int q(\beta) \log\{q(\beta)/f(\beta \mid y)\}\, d\beta$$



... variational methods (Titterington, 2006)

- close to the posterior:

$$\min_q \int q(\beta) \log\{q(\beta)/f(\beta \mid y)\}\, d\beta = \min_q KL(q \,\|\, f_{post})$$

- equivalent to finding the best lower bound for the marginal f(y):

$$\max_q \int q(\beta) \log\{f(y, \beta)/q(\beta)\}\, d\beta$$

- in a likelihood context:

$$\log f(y; \theta) = \log \int f(y \mid \beta; \theta) f(\beta)\, d\beta = \int q(\beta) \log\{f(y, \beta; \theta)/q(\beta)\}\, d\beta + KL(q \,\|\, f_{post})$$

- hence the lower bound

$$\log f(y; \theta) \ge \int q(\beta) \log\{f(y, \beta; \theta)/q(\beta)\}\, d\beta$$

here β represents random effects u, or b, or ... (worked toy example below)
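A worked sketch of this lower bound in an assumed conjugate toy model (y ∼ N(β, 1), prior β ∼ N(0, 1), approximation q(β) = N(m, s²)), where the marginal f(y) is exact and the bound can be verified, with equality when q is the exact posterior:

```python
import numpy as np

# All three terms of the bound are Gaussian expectations with closed forms:
# E_q log f(y|β), E_q log f(β), and the entropy of q.
def lower_bound(m, s2, y):
    e_loglik = -0.5 * np.log(2 * np.pi) - 0.5 * ((y - m) ** 2 + s2)
    e_logprior = -0.5 * np.log(2 * np.pi) - 0.5 * (m ** 2 + s2)
    entropy = 0.5 * np.log(2 * np.pi * np.e * s2)
    return e_loglik + e_logprior + entropy

y = 1.3
log_evidence = -0.5 * np.log(2 * np.pi * 2.0) - y ** 2 / 4.0   # y ~ N(0, 2)
print(lower_bound(0.0, 1.0, y), log_evidence)   # strict inequality for a poor q
print(lower_bound(y / 2, 0.5, y), log_evidence) # equality at the exact posterior
```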



Example: GLMM (Ormerod & Wand, 2012)

log-likelihood:

$$\ell(\beta, \Sigma) = \sum_{i=1}^m \Big( y_i^T X_i \beta - \tfrac12 \log|\Sigma| + \log \int_{\mathbb{R}^k} \exp\big\{ y_i^T Z_i u_i - 1_i^T b(X_i\beta + Z_i u_i) - \tfrac12 u_i^T \Sigma^{-1} u_i \big\}\, du_i \Big)$$

multiplying and dividing the integrand by a normal density φ_{Λ_i}(u − μ_i):

$$= \sum_{i=1}^m \Big( y_i^T X_i \beta - \tfrac12 \log|\Sigma| + \log \int_{\mathbb{R}^k} \exp\big\{ y_i^T Z_i u_i - 1_i^T b(X_i\beta + Z_i u_i) - \tfrac12 u_i^T \Sigma^{-1} u_i \big\}\, \frac{\phi_{\Lambda_i}(u - \mu_i)}{\phi_{\Lambda_i}(u - \mu_i)}\, du_i \Big)$$

variational approximation (Jensen's inequality):

$$\ell(\beta, \Sigma) \ge \sum_{i=1}^m \Big( y_i^T X_i \beta - \tfrac12 \log|\Sigma| \Big) + \sum_{i=1}^m E_{u \sim N(\mu_i, \Lambda_i)} \Big( y_i^T Z_i u - 1_i^T b(X_i\beta + Z_i u) - \tfrac12 u^T \Sigma^{-1} u - \log \phi_{\Lambda_i}(u - \mu_i) \Big) \equiv \underline{\ell}(\beta, \Sigma, \mu, \Lambda)$$

which simplifies to k one-dimensional integrals



... variational approximations (Ormerod & Wand, 2012)

$$\ell(\beta, \Sigma) \ge \underline{\ell}(\beta, \Sigma, \mu, \Lambda)$$

- variational estimate:

$$(\hat\beta, \hat\Sigma, \hat\mu, \hat\Lambda) = \arg\max_{\beta, \Sigma, \mu, \Lambda}\, \underline{\ell}(\beta, \Sigma, \mu, \Lambda)$$

- inference for β̂, Σ̂? consistency? asymptotic normality? (Hall, Ormerod, Wand, 2011; Hall et al. 2011)
- the emphasis in the literature is on algorithms and model selection (e.g. Tan & Nott, 2013, 2014)
- VL: approximate L(θ; y) by a simpler function of θ, e.g. ∏ q_j(θ)
- CL: approximate f(y; θ) by a simpler function of y, e.g. ∏ f(y_j; θ)



Laplace approximation

$$\ell(\theta; y) = \log \int f(y \mid b; \theta)\, g(b)\, db = \log \int \exp\{Q(b, y, \theta)\}\, db, \text{ say}$$

$$\ell_{Lap}(\theta; y) = Q(\tilde b, y, \theta) - \tfrac12 \log |Q''(\tilde b, y, \theta)| + c$$

using a Taylor series expansion of Q(·, y, θ) about its maximizer b̃ (numerical check below)

simplification of the Laplace approximation leads to PQL:

$$\ell_{PQL}(\theta, b; y) = \log f(y \mid b; \theta) - \tfrac12 b^T \Sigma^{-1} b$$

(Breslow & Clayton, 1993)

to be jointly maximized over b and θ, including the parameters in Σ

PQL can be viewed as linearizing E(y) and then using results for linear mixed models (Molenberghs & Verbeke, 2006)
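A numerical check of the one-dimensional Laplace approximation, using an assumed Q from a Poisson-normal random-intercept model, y | b ∼ Poisson(e^{β+b}), b ∼ N(0, σ²); the constant c is ½ log(2π) here:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# log ∫ exp{Q(b)} db ≈ Q(b̃) + ½ log(2π) − ½ log{−Q''(b̃)}, b̃ maximizing Q.
beta, sigma2, y = 0.2, 1.0, 3

def Q(b):
    return y * (beta + b) - np.exp(beta + b) - 0.5 * b**2 / sigma2

b_tilde = minimize_scalar(lambda b: -Q(b), bounds=(-10, 10), method="bounded").x
h = 1e-5
Q2 = (Q(b_tilde + h) - 2 * Q(b_tilde) + Q(b_tilde - h)) / h**2   # Q''(b̃) < 0
laplace = Q(b_tilde) + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(-Q2)
exact = np.log(quad(lambda b: np.exp(Q(b)), -10, 10)[0])
print(laplace, exact)          # close agreement
```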



implemented in R: in lme4 as glmer, and in MASS as glmmPQL (Ormerod & Wand, 2012)


References

Besag, J. (1975). Statistical analysis of non-lattice data. The Statistician 24, 179–195.
Breslow, N.E. & Clayton, D.G. (1993). Approximate inference in generalized linear mixed models. J. Am. Statist. Assoc. 88, 9–25.
Buhlmann, P. (2013). Statistical significance in high-dimensional linear models. Bernoulli 19, 1212–1242.
Buhlmann, P., Kalisch, M. & Meier, L. (2014). High-dimensional statistics with a view toward applications in biology. Annual Review of Statistics and its Application 1, 255–278.
Cox, D.R. & Kartsonaki, C. (2012). The fitting of complex parametric models. Biometrika 99, 741–747.
Davis, R. & Yau, C.Y. (2011). Comments on pairwise likelihood in time series. Statistica Sinica 21, 255–277.
Davison, A.C., Padoan, S.A. & Ribatet, M. (2012). Statistical modeling of spatial extremes. Statistical Science 27, 161–186.
Davison, A.C. & Huser, R. (2015). Statistics of extremes. Annual Review of Statistics and its Application 2, to appear.
El Karoui, N., Bean, D., Bickel, P.J., Lim, C. & Yu, B. (2013). On robust regression with high-dimensional predictors. PNAS 110, 14557–14562.
Fearnhead, P. & Prangle, D. (2012). Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. J. R. Statist. Soc. B 74, 419–474.
Geyer, C. & Thompson, E.A. (1992). Constrained Monte Carlo maximum likelihood for dependent data. J. R. Statist. Soc. B 54, 657–699.
Jiang, W. & Turnbull, B. (2004). The indirect method: inference based on intermediate statistics. Statistical Science 19, 239–263.
Lindsay, B. (1988). Composite likelihood methods. Contemp. Math. 80, 220–239.
Lockhart, R., Taylor, J., Tibshirani, R.J. & Tibshirani, R. (2014). A significance test for the lasso. Ann. Statist. 42, 413–468.
Marin, J.-M. et al. (2010). Approximate Bayesian computational methods. Statistics & Computing 22, 1167–1180.
Molenberghs, G. & Verbeke, G. (2006). Discrete Longitudinal Data. Springer, New York.


... references

Okabayashi, S., Johnson, L. & Geyer, C.J. (2011). Extending pseudo-likelihood for Potts models. Statistica Sinica 21, 331–347.
Ormerod, J.T. & Wand, M.P. (2012). Gaussian variational approximate inference for generalized linear mixed models. J. Comp. Graph. Statist. 21, 2–17.
Ormerod, J.T. & Wand, M.P. (2010). Explaining variational approximations. Am. Stat. 64, 140–153.
Portnoy, S. (1984). Asymptotic behavior of M-estimators of p regression parameters when p²/n is large. I. Consistency. Ann. Statist. 12, 1298–1309.
Portnoy, S. (1985). Asymptotic behavior of M-estimators of p regression parameters when p²/n is large. II. Normal approximation. Ann. Statist. 13, 1403–1417.
Portnoy, S. (1988). Asymptotic behavior of likelihood methods for exponential families when the number of parameters tends to infinity. Ann. Statist. 16, 356–366.
Ravikumar, P., Wainwright, M.J. & Lafferty, J.D. (2010). High-dimensional Ising model selection using l1-regularized logistic regression. Ann. Statist. 38, 1287–1319.
Reid, N. (2013). Aspects of likelihood inference. Bernoulli 19, 1404–1418.
Reid, N. (2010). Likelihood inference. Wiley Interdisciplinary Reviews: Computational Statistics 5, 517–525.
Renard, D., Molenberghs, G. & Geys, H. (2004). A pairwise likelihood approach to estimation in multilevel probit models. Comp. Stat. Data Anal. 44, 649–667.
Royall, R. (1997). Statistical Evidence: A Likelihood Paradigm. Chapman & Hall, London.
Shalizi, C. (2013). Notebooks: indirect inference.
Shun, Z. & McCullagh, P. (1995). Laplace approximation of high dimensional integrals. J. R. Statist. Soc. B 57, 749–760.
Smith, A.A. (2008). Indirect inference. In New Palgrave Dictionary of Economics, 2nd ed.
Taylor, J., Lockhart, R., Tibshirani, R.J. & Tibshirani, R. (2014). Exact post-selection inference for forward stepwise and least angle regression. http://arxiv.org/pdf/1401.3889v4.pdf
Titterington, D.M. (2006). Bayesian methods for neural networks and related models. Statistical Science 19, 128–139.
Xue, L., Zou, H. & Cai, T. (2012). Nonconcave penalized composite conditional likelihood estimation of sparse Ising models. Ann. Statist. 40, 1403–1429.
Yi, G. & Reid, N. (2010). A note on misspecified estimating equations. Statistica Sinica 20, 1749–1769.