Likelihood based inference

1. Overview of classical asymptotics
2. Profile likelihood and nuisance parameters (NR 2013; 2010)
3. p growing with n (Portnoy 1984, 1985, 1988)
4. p > n: regularization (Bühlmann 2013; Taylor et al. 2014)
5. Approximate likelihoods: composite, quasi, empirical, ...

Topics in Inference, Fields Institute, 2015
Models and likelihood

- Model for the probability distribution of y given x
- Density f(y | x) with respect to, e.g., Lebesgue measure
- Parameters for the density: f(y | x; θ), θ = (θ1, . . . , θp)
- Data y = (y1, . . . , yn), often independent
- Likelihood function L(θ; y) ∝ f(y; θ)
- Log-likelihood function ℓ(θ; y) = log L(θ; y)
- Often θ = (ψ, λ), with ψ the parameter of interest and λ a nuisance parameter
- θ could have very large dimension, p > n
- θ could have infinite dimension in principle, e.g. E(y | x) = θ(x) 'smooth'
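As a concrete instance of these definitions (an illustration, not from the slides), a short Python sketch evaluates ℓ(θ; y) for i.i.d. y_i ~ N(θ, 1) on a grid; the grid maximizer should agree with the sample mean, the known MLE for this model:

```python
import numpy as np

# Minimal sketch (illustration only): the log-likelihood l(theta; y)
# for i.i.d. y_1, ..., y_n ~ N(theta, 1).  Its grid maximizer should
# agree with the sample mean, the known MLE.
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=500)

def loglik(theta, y):
    # l(theta; y) = sum_i log f(y_i; theta), additive constants dropped
    return -0.5 * np.sum((y - theta) ** 2)

grid = np.linspace(0.0, 4.0, 4001)
theta_hat = grid[np.argmax([loglik(t, y) for t in grid])]
assert abs(theta_hat - y.mean()) < 1e-3
```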
- Pairwise composite likelihood used to compare the fits of several competing models
- Model choice using "CLIC", an analogue of AIC: −2 log(CL) + tr(J⁻¹K)
- Davison et al. (2012) applied this to annual maximum rainfall at several stations near Zurich
- "fitting max-stable processes to spatial or spatio-temporal block maxima is awkward ... the use of composite likelihoods ... has become widely used" (Davison & Huser)
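The CLIC penalty tr(J⁻¹K) can be computed directly in simple cases. A hedged Python sketch (the bivariate-normal setup is my own toy, not the slides' example): an independence composite likelihood for a common mean θ ignores the correlation ρ between the two components, and the estimated penalty should then be close to 1 + ρ rather than the value 1 that a correctly specified full likelihood would give:

```python
import numpy as np

# Toy sketch (my own example): the CLIC penalty tr(J^{-1} K) for an
# independence composite likelihood.  Data are bivariate normal with common
# mean theta, unit variances and correlation rho; the CL ignores rho.
rng = np.random.default_rng(1)
n, rho, theta_true = 20000, 0.5, 1.0
cov = np.array([[1.0, rho], [rho, 1.0]])
y = rng.multivariate_normal([theta_true, theta_true], cov, size=n)

# CL(theta) = prod_i prod_j f(y_ij; theta), so theta_hat is the grand mean
theta_hat = y.mean()

# per-observation CL score and its (exact) derivative at theta_hat
scores = (y[:, 0] - theta_hat) + (y[:, 1] - theta_hat)
K_hat = np.mean(scores ** 2)   # variability matrix (scalar here)
J_hat = 2.0                    # -d(score)/d(theta), exact for this model

# CLIC = -2 log CL + tr(J^{-1} K); the penalty should be near 1 + rho = 1.5
penalty = K_hat / J_hat
neg2logCL = np.sum((y - theta_hat) ** 2) + 2 * n * np.log(2 * np.pi)
clic = neg2logCL + penalty
assert abs(penalty - (1 + rho)) < 0.1
```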
Example: Ising model

Ising model:

    f(y; θ) = (1/Z(θ)) exp{ ∑_{(j,k)∈E} θ_jk y_j y_k }

Neighbourhood contributions:

    f(y_j | y_(−j); θ) = exp(2 y_j ∑_{k≠j} θ_jk y_k) / {exp(2 y_j ∑_{k≠j} θ_jk y_k) + 1}

Penalized CL estimation based on a sample y^(1), . . . , y^(n):

    max_θ ∑_{i=1}^n ∑_j ℓ_j(θ; y^(i)) − ∑_j ∑_k P_λ(|θ_jk|)

Xue et al., 2012; Ravikumar et al., 2010
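The neighbourhood conditionals make the composite (pseudo-) likelihood computable even though Z(θ) is not. A hedged Python sketch, on a toy chain of my own choosing (single coupling θ on edges (j, j+1), small enough to enumerate all states, and without the penalty term):

```python
import itertools
import numpy as np

# Toy sketch (assumptions mine): composite conditional (pseudo-) likelihood
# for an Ising chain with one coupling theta on edges (j, j+1), spins in
# {-1, +1}.  With d = 4 spins we can enumerate all 2^d states, sample
# exactly, and check that maximizing the CL recovers theta.
rng = np.random.default_rng(2)
d, theta_true, n = 4, 0.4, 5000
edges = [(j, j + 1) for j in range(d - 1)]
states = np.array(list(itertools.product([-1, 1], repeat=d)))

def energy(y, theta):
    return theta * sum(y[..., j] * y[..., k] for j, k in edges)

# exact sampling via enumeration of the 2^d states
w = np.exp(energy(states, theta_true))
sample = states[rng.choice(len(states), size=n, p=w / w.sum())]

def neg_cl(theta, ys):
    # -sum_i sum_j log f(y_j | y_(-j); theta), using the conditional above
    total = 0.0
    for j in range(d):
        m = theta * sum(ys[:, k] for jj, k in edges if jj == j)
        m = m + theta * sum(ys[:, jj] for jj, k in edges if k == j)
        a = 2 * ys[:, j] * m
        total += np.sum(np.log1p(np.exp(-a)))  # -log of the conditional
    return total

grid = np.linspace(0.1, 0.8, 141)
theta_hat = grid[np.argmin([neg_cl(t, sample) for t in grid])]
assert abs(theta_hat - theta_true) < 0.1
```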
Quasi-likelihood

- Simplify the model:

      E(y_i; θ) = μ_i(θ);  Var(y_i; θ) = φ ν_i(θ)

- Consistent with generalized linear models
- Example: over-dispersed Poisson responses
- PQL uses this construction, but with random effects (Molenberghs & Verbeke, Ch. 14)
- Why does it work? The score equations are the same as for a 'real' likelihood, hence unbiased
- Derivative of the score function equal to the variance function (special to GLMs)
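Since the quasi-score equations coincide with the Poisson score equations, the regression estimate is just the Poisson MLE, and only the standard errors change through φ. A sketch under assumptions of my own choosing (over-dispersed counts generated as negative binomial with Var(y) = φμ; φ estimated by the usual Pearson statistic):

```python
import numpy as np

# Hedged sketch (not from the slides): quasi-likelihood for over-dispersed
# Poisson counts with E(y_i) = mu_i = exp(b0 + b1 x_i), Var(y_i) = phi*mu_i.
# The quasi-score equations are the Poisson score equations, so beta_hat is
# the ordinary Poisson MLE; phi is estimated by Pearson X^2 / (n - p).
rng = np.random.default_rng(3)
n, b0, b1, phi_true = 4000, 0.5, 0.8, 3.0
x = rng.uniform(-1, 1, size=n)
mu = np.exp(b0 + b1 * x)
# negative binomial with mean mu and variance phi*mu (over-dispersed Poisson)
r = mu / (phi_true - 1.0)              # NB size chosen so Var = phi * mu
y = rng.negative_binomial(r, r / (r + mu))

# Fisher scoring (IRLS) for the Poisson score equations X^T (y - mu) = 0
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    eta = X @ beta
    m = np.exp(eta)
    z = eta + (y - m) / m              # working response, weights W = m
    beta = np.linalg.solve(X.T @ (m[:, None] * X), X.T @ (m * z))

m = np.exp(X @ beta)
phi_hat = np.sum((y - m) ** 2 / m) / (n - 2)   # Pearson X^2 / (n - p)
assert abs(beta[1] - b1) < 0.1
assert abs(phi_hat - phi_true) < 0.5
```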
Indirect inference

- Composite likelihood estimators are consistent (under conditions ...)
- Because log CL(θ; y) = ∑_{i=1}^n ∑_{j<j′} log f(y_j, y_{j′}; θ)
- Its derivative w.r.t. θ has expected value 0
- What happens if an estimating equation g(y; θ) is biased?
- g(y1, . . . , yn; θ̂n) = 0; θ̂n → θ∗, where E g(Y; θ∗) = 0
- θ∗ = k(θ); if k is invertible, θ = k⁻¹(θ∗)
- New estimator θ̃n = k⁻¹(θ̂n)
- k(·) is a bridge function, connecting the wrong value of θ to the right one (Yi & Reid, 2010; Jiang & Turnbull, 2004)
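A classical instance of a bridge function (my own toy, not the slides' example) is attenuation under measurement error: naive least squares with a noisy covariate solves a biased estimating equation whose limit is θ∗ = k(θ) = θ σx²/(σx² + σu²), and inverting k repairs the estimator:

```python
import numpy as np

# Toy illustration (assumptions mine): a biased estimating equation and its
# bridge function.  Regress y = theta*x + eps, but observe w = x + u.
# The naive least-squares equation  sum_i w_i (y_i - theta*w_i) = 0  is
# biased: theta_hat_n -> theta* = k(theta) = theta * s2x / (s2x + s2u).
# Knowing k, the corrected estimator is k^{-1}(theta_hat_n).
rng = np.random.default_rng(4)
n, theta_true, s2x, s2u = 100_000, 2.0, 1.0, 0.5
x = rng.normal(0, np.sqrt(s2x), n)
w = x + rng.normal(0, np.sqrt(s2u), n)
y = theta_true * x + rng.normal(0, 1.0, n)

theta_naive = np.sum(w * y) / np.sum(w * w)   # solves the biased equation
attenuation = s2x / (s2x + s2u)               # k(theta) = attenuation * theta
theta_bridge = theta_naive / attenuation      # invert the bridge function

assert abs(theta_naive - theta_true * attenuation) < 0.05
assert abs(theta_bridge - theta_true) < 0.05
```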
... indirect inference (Smith, 2008)

- Model of interest:

      y_t = G_t(y_{t−1}, x_t, ε_t; θ),  θ ∈ R^d

- Likelihood is not computable, but we can simulate from the model
- Simple (wrong) model:

      y_t ∼ f(y_t | y_{t−1}, x_t; θ∗),  θ∗ ∈ R^p

- Find the MLE in the simple model, θ̂∗ = θ̂∗(y1, . . . , yn), say
- Use simulated samples from the model of interest to find the 'best' θ
- The 'best' θ gives data that reproduces θ̂∗ (Shalizi, 2013)
... indirect inference (Smith, 2008)

- Simulate samples y_t^m, m = 1, . . . , M, at some value θ
- Compute θ̂∗(θ) from the simulated data:

      θ̂∗(θ) = arg max_{θ∗} ∑_m ∑_t log f(y_t^m | y_{t−1}^m, x_t; θ∗)

- Choose θ̂ so that θ̂∗(θ̂) is as close as possible to θ̂∗
- If p = d, simply invert the 'bridge function'; usually p > d
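The matching step can be sketched in a few lines. A hedged toy of my own construction (d = p = 1; a scaled t model playing the role of the intractable model, a Gaussian as the simple wrong model):

```python
import numpy as np

# Hedged toy example (my own construction): indirect inference with
# d = p = 1.  Model of interest: y_t = theta * eps_t with eps_t ~ t_5;
# pretend its likelihood is unavailable but simulation is easy.  Simple
# 'wrong' model: y_t ~ N(0, theta*^2), whose MLE theta*_hat is the root
# mean square of the data.
rng = np.random.default_rng(5)
theta_true, n, M = 1.5, 2000, 20
y_obs = theta_true * rng.standard_t(5, size=n)
aux_obs = np.sqrt(np.mean(y_obs ** 2))        # auxiliary MLE on the data

def aux_of_theta(theta):
    # theta*_hat(theta): auxiliary MLE averaged over M simulated data sets
    sims = theta * rng.standard_t(5, size=(M, n))
    return np.mean(np.sqrt(np.mean(sims ** 2, axis=1)))

# choose theta so that theta*_hat(theta) is as close as possible to aux_obs
grid = np.linspace(0.5, 3.0, 251)
theta_hat = grid[np.argmin([abs(aux_of_theta(t) - aux_obs) for t in grid])]
assert abs(theta_hat - theta_true) < 0.2
```

With p = d as here, the match amounts to inverting the bridge function numerically; with p > d one would minimize a weighted distance between auxiliary estimates instead.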
- Inference for β, Σ? Consistency? Asymptotic normality? (Hall, Ormerod & Wand, 2011; Hall et al., 2011)
- Emphasis on algorithms and model selection (e.g. Tan & Nott, 2013, 2014)
- VL: approximate L(θ; y) by a simpler function of θ, e.g. ∏_j q_j(θ)
- CL: approximate f(y; θ) by a simpler function of y, e.g. ∏_j f(y_j; θ)
Laplace approximation

    ℓ(θ; y) = log ∫ f(y | b; θ) g(b) db = log ∫ exp{Q(b, y, θ)} db, say

    ℓ_Lap(θ; y) = Q(b̃, y, θ) − ½ log |Q′′(b̃, y, θ)| + c

using a Taylor series expansion of Q(·, y, θ) about its maximizer b̃.

Simplification of the Laplace approximation leads to PQL:

    ℓ_PQL(θ, b; y) = log f(y | b; θ) − ½ bᵀ Σ⁻¹ b    (Breslow & Clayton, 1993)

to be jointly maximized over b, θ, and the parameters in Σ.

PQL can be viewed as linearizing E(y) and then using results for linear mixed models (Molenberghs & Verbeke, 2006).
Implemented in lme4 as glmer, and in MASS as glmmPQL (Ormerod & Wand, 2012).
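The one-dimensional version of ℓ_Lap can be checked numerically. A sketch under assumptions of my own choosing (one Poisson count with a normal random intercept; the Laplace value is compared against a brute-force quadrature of the integral):

```python
import math
import numpy as np

# Sketch with my own model and parameter choices: Laplace approximation to
# l(theta; y) = log int f(y | b; theta) g(b) db  for one Poisson count with
# a normal random intercept,
#   y | b ~ Poisson(exp(beta + b)),  b ~ N(0, sigma2).
beta, sigma2, y = 0.5, 1.0, 3

def Q(b):
    # Q(b, y, theta) = log f(y | b; theta) + log g(b)
    return (y * (beta + b) - math.exp(beta + b) - math.lgamma(y + 1)
            - 0.5 * b * b / sigma2 - 0.5 * math.log(2 * math.pi * sigma2))

# b_tilde = argmax Q by Newton's method (Q is strictly concave in b)
b = 0.0
for _ in range(50):
    grad = y - math.exp(beta + b) - b / sigma2
    hess = -math.exp(beta + b) - 1.0 / sigma2
    b -= grad / hess

# l_Lap = Q(b_tilde) - (1/2) log |Q''(b_tilde)| + c, with c = (1/2) log(2 pi)
hess = -math.exp(beta + b) - 1.0 / sigma2
l_lap = Q(b) - 0.5 * math.log(abs(hess)) + 0.5 * math.log(2 * math.pi)

# brute-force Riemann sum of the integral for comparison
grid = np.linspace(-10.0, 10.0, 200001)
l_num = math.log(np.sum(np.exp([Q(t) for t in grid])) * (grid[1] - grid[0]))
assert abs(l_lap - l_num) < 0.02
```

In a mixed model, glmer applies this approximation once per random-effects group and sums the resulting log-likelihood contributions.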
References

Besag, J. (1975). Statistical analysis of non-lattice data. The Statistician 24, 179–195.
Breslow, N.E. & Clayton, D.G. (1993). Approximate inference in generalized linear mixed models. J. Am. Statist. Assoc. 88, 9–25.
Bühlmann, P. (2013). Statistical significance in high-dimensional linear models. Bernoulli 19, 1212–1242.
Bühlmann, P., Kalisch, M. & Meier, L. (2014). High-dimensional statistics with a view toward applications in biology. Annual Review of Statistics and its Applications 1, 255–278.
Cox, D.R. & Kartsonaki, C. (2012). The fitting of complex parametric models. Biometrika 99, 741–747.
Davis, R. & Yau, C.Y. (2011). Comments on pairwise likelihood in time series. Statistica Sinica 21, 255–277.
Davison, A.C. (2012). Statistical modeling of spatial extremes. Statistical Science 27, 161–186.
Davison, A.C. & Huser, R. (2015). Statistics of extremes. Annual Review of Statistics and its Applications 2, to appear.
El Karoui, N., Bean, D., Bickel, P.J., Lim, C. & Yu, B. (2013). On robust regression with high-dimensional predictors. PNAS 110, 14557–14562.
Fearnhead, P. & Prangle, D. (2012). Approximate likelihood methods for estimating local recombination rates. J. R. Statist. Soc. B 64, 657–680.
Geyer, C. & Thompson, E.A. (1992). Constrained Monte Carlo maximum likelihood... J. R. Statist. Soc. B 54, 657–699.
Jiang, W. & Turnbull, B. (2004). The indirect method... Statistical Science 19, 239–263.
Lindsay, B. (1988). Composite likelihood methods. Contemp. Math. 80, 220–239.
Lockhart, R., Taylor, J., Tibshirani, R.J. & Tibshirani, R. (2014). A significance test for the lasso. Ann. Statist. 42, 413–468.
Marin, J.-M. et al. (2012). Approximate Bayesian computational methods. Statistics & Computing 22, 1167–1180.
Molenberghs, G. & Verbeke, G. (2006). Models for Discrete Longitudinal Data. Springer, New York.
... references

Okabayashi, S., Johnson, L. & Geyer, C.J. (2011). Extending pseudo-likelihood... Statistica Sinica 21, 331–347.
Ormerod, J.T. & Wand, M.P. (2012). Gaussian variational approximate inference... J. Comp. Graph. Statist. 21, 2–17.
Ormerod, J.T. & Wand, M.P. (2010). Explaining variational approximations. Am. Stat. 64, 140–153.
Portnoy, S. (1984). Asymptotic behaviour of M-estimators of p regression parameters when p²/n is large. I. Consistency. Ann. Statist. 12, 1298–1309.
Portnoy, S. (1985). Asymptotic behaviour of M-estimators of p regression parameters when p²/n is large. II. Normal approximation. Ann. Statist. 13, 1403–1417.
Portnoy, S. (1988). Asymptotic behaviour of likelihood methods for exponential families when the number of parameters tends to infinity. Ann. Statist. 16, 356–366.
Ravikumar et al. (2010). High-dimensional Ising model selection... Ann. Statist. 38, 1287–1319.
Reid, N. (2013). Aspects of likelihood inference. Bernoulli 19, 1404–1418.
Reid, N. (2010). Likelihood inference. Wiley Interdisciplinary Reviews: Computational Statistics 2, 517–525.
Renard, D., Molenberghs, G. & Geys, H. (2004). A pairwise likelihood approach to estimation in multilevel probit models. Comp. Stat. Data Anal. 44, 649–667.
Royall, R.J. (1997). Statistical Evidence... Chapman & Hall, London.
Shalizi, C. (2013). Notebooks: Indirect Inference.
Shun, Z. & McCullagh, P. (1995). Laplace approximation... J. R. Statist. Soc. B 57, 749–760.
Smith, A.A. (2008). Indirect inference. In The New Palgrave Dictionary of Economics, 2nd ed.
Taylor, J., Lockhart, R., Tibshirani, R.J. & Tibshirani, R. (2014). Exact post-selection inference for forward stepwise and least angle regression. http://arxiv.org/pdf/1401.3889v4.pdf
Titterington, D.M. (2004). Bayesian methods for neural networks... Statistical Science 19, 128–139.
Xue, L., Zou, H. & Cai, T. (2012). Nonconcave penalized composite conditional likelihood... Ann. Statist. 40, 1403–1429.
Yi, G. & Reid, N. (2010). A note on misspecified estimating equations. Statistica Sinica 20, 1749–1769.