
INSTITUT DE STATISTIQUE, BIOSTATISTIQUE ET SCIENCES ACTUARIELLES (ISBA)

UNIVERSITÉ CATHOLIQUE DE LOUVAIN

DISCUSSION PAPER 1051

ADDITIVE LOCATION-SCALE MODELS

FOR INTERVAL CENSORED DATA

LAMBERT, P.

This file can be downloaded from http://www.stat.ucl.ac.be/ISpub

Additive location-scale models

for interval censored data

Philippe Lambert ∗†

December 21, 2010

Abstract

An additive model for the location and dispersion of a continuous response with an arbitrary smooth conditional distribution is proposed. B-splines are used to specify the three components of the model (location, dispersion and error distribution). The model can be extended to deal with interval censored data and multiple covariates. As an illustration, the relation between age, the number of years of full-time education and the net income (provided as intervals) available per person in Belgian households is studied from survey data.

Key words: interval censored data; additive model; smooth distribution.

1 Introduction

Generalized linear models (GLMs) (Nelder and Wedderburn, 1972; McCullagh and Nelder, 1989) had a major impact on statistical modelling by synthesizing, in a single framework, several extensions of normal linear regression models to discrete and positive continuous random variables. After choosing a distribution in the exponential family and a link function, a transform of the mean response could be described as a linear function of covariates, with maximum likelihood estimates of the regression coefficients quickly obtained using the iterative weighted least squares algorithm and its implementation in the software GLIM. Extensions of GLMs make it possible to handle multi-categorical, multivariate and longitudinal data and random effects (see e.g. Fahrmeir and Tutz, 2001, for details and references), and even to model location and dispersion jointly (Jørgensen, 1997). Another relevant extension of GLMs relaxes the linearity assumption and allows a nonparametric or flexible parametric specification of the relation between the mean response and continuous covariates: the publication of the book by Hastie and Tibshirani (1990) generated a great deal of research on generalized additive models.

∗ Institut des sciences humaines et sociales, Méthodes quantitatives en sciences sociales, Université de Liège, Liège, Belgium. Email: [email protected].
† Institut de statistique, biostatistique et sciences actuarielles (ISBA), Université catholique de Louvain, Louvain-la-Neuve, Belgium.

Here, we focus on the location-scale model and extend it using some of the preceding principles. If Y is a continuous response and X a set of covariates in $\mathbb{R}^p$, the location-scale model assumes that

$$Y = \mu(X) + \sigma(X)\,\varepsilon, \qquad (1.1)$$

where ε is independent of X, µ(X) denotes the unknown regression surface and σ(X) allows departures from homoskedasticity. Most often, one further imposes E(ε) = 0 and V(ε) = 1, so that µ(X) and σ²(X) are respectively the conditional mean and variance of Y given X.

Lambert and Lindsey (1999) jointly model the location, dispersion, skewness and kurtosis of a continuous response by relating the four parameters of a stable distribution to covariates. That idea was further investigated by Rigby and Stasinopoulos (2001), who replaced the stable distribution with other flexible parametric conditional distributions and the regression models with additive ones. Accelerated failure time (AFT) models (see e.g. Collett, 1994) in survival analysis also rely on Eq. (1.1) with log-transformed data. Explanatory variables are assumed to act multiplicatively on the time scale and, for example, to affect the progression of a disease. An AFT model can be extended to include random effects (Lambert et al., 2004), similarly to the inclusion of frailty (Vaupel et al., 1979) in the Cox proportional hazards model, or to have a flexible conditional distribution (Komárek et al., 2005). Joint models for location and scale were also studied in the nonparametric literature. Hsieh (1996) proposes an empirical process approach to compare two samples of right-censored survival times modelled using an accelerated failure time model with an arbitrary distributional form. Starting from Beran's estimate F̃(y|x) of the conditional distribution of Y given X, Van Keilegom and Akritas (1999) estimate µ(X) and σ(X) consistently from right-censored data by discarding the right tail of F̃(y|x).

The goal of our research is to estimate the location and dispersion of the conditional distribution of Y in Eq. (1.1) using additive models when the distribution of ε has an unknown smooth density and when Y is possibly interval-censored. The plan of the paper is as follows. Section 2 recalls how a smooth univariate density can be modelled using penalized B-splines (Eilers and Marx, 1996; Lambert and Eilers, 2006, 2009). Section 3 combines the flexible specification of the conditional density with joint additive models for location and dispersion. The roughness penalties and the identification constraint for the pivotal distribution are described in Section 4. Strategies for a joint estimation of the parameters involved in the three levels of the model are proposed in Section 5, together with an algorithm and practical recommendations based on our experience. An extension of the model to interval censored data is presented in Section 6. After fitting our additive location-scale model to simulated data and applying it to interval censored income data in Sections 7 and 8, we conclude the paper with a discussion in Section 9.


2 Smooth specification of the conditional distribution

Assume that one wants to specify how the conditional distribution of a single continuous random variable Y changes with a set of continuous covariates $X = (X_1, \ldots, X_p)' \in \mathbb{R}^p$. Denote by $f_{Y|X}(y|x)$ the corresponding conditional density. Further assume that, conditionally on X = x, the distribution of $\varepsilon = (Y - \mu_x)/\sigma_x$ does not depend on x for suitably specified conditional location and dispersion parameters $\mu_x$ and $\sigma_x$. One can find an interval $S = (s_1, s_2)$ containing, with high probability, the values taken by ε. For example, if $\mu_x$ and $\sigma_x^2$ denote the first two conditional moments, then, by Chebyshev's inequality, $\Pr(|\varepsilon| \le k) \ge 1 - k^{-2}$ for k > 0. In most practical situations, that probability is far above the lower bound. Therefore, taking $s_2 = -s_1 = 6$ (say) usually provides satisfactory results: with k = 6, Chebyshev's inequality already guarantees a coverage of at least 1 − 1/36 ≈ 97.2%. Of course, based on contextual information (such as positive skewness of the response) or historical data, more ad hoc choices could be made.

Consider now a cubic B-spline basis $\{b_k(\cdot) : k = 1, \ldots, K\}$ associated with a large number (20, say) of equidistant knots on S. Let $\{J_j : j = 1, \ldots, J\}$ be a partition of S into a large number (100, say) of consecutive bins of equal width Δ with midpoints $\{u_j\}_{j=1}^{J}$. Let $[B]_{jk} = b_k(u_j)$ be the J × K matrix of cubic B-splines evaluated at the bin midpoints $u_j$ (j = 1, …, J). Then, the probability of observing ε in $J_j$ can be specified as

$$\int_{J_j} f_\varepsilon(e)\,de = \pi_j = \frac{\exp([B\phi^*]_j)}{\sum_{\ell=1}^{J}\exp([B\phi^*]_\ell)} \approx f_\varepsilon(u_j)\,\Delta, \qquad (2.1)$$

where $\phi^* = (\phi^*_1, \ldots, \phi^*_K)' \in \mathbb{R}^K$ is a vector of spline parameters.

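As $\pi_j(\phi^* + c) = \pi_j(\phi^*)$ for any scalar c (the B-splines sum to one at every point of S, so that adding c to all the coefficients adds c to every $[B\phi^*]_j$), one should constrain φ* for identifiability. We suggest working with φ, where

$$\phi_k = \phi^*_k - \log\sum_{j=1}^{J}\exp([B\phi^*]_j),$$

such that

$$\pi_j = \exp([B\phi]_j).$$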

For more details on this strategy for estimating a smooth density, we refer to Eilers and Marx (1996) and Lambert and Eilers (2006, 2009), and to Sections 4 and 5 for aspects specific to the additive location-scale model.
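The construction in Eq. (2.1) and the normalization of φ* can be illustrated with a short sketch. It is a minimal Python version under stated assumptions, not the author's code: the helper names are hypothetical, the cubic B-splines are built with the Cox-de Boor recursion on 17 equidistant intervals of S = (−6, 6), giving K = 20 splines (one possible reading of the "(20, say)" above), and J = 100 bins are used. Taking φ*_k = −0.5 c_k², where c_k is the centre of the kth B-spline, makes [Bφ*]_j quadratic in u_j, so the bin probabilities should reproduce a discretized standard normal.

```python
import math

def cubic_bspline_basis(x, lo, hi, n_intervals):
    """Values at x of the n_intervals + 3 cubic B-splines with equidistant knots on [lo, hi]."""
    deg = 3
    h = (hi - lo) / n_intervals
    # Extend the knots by `deg` intervals on each side so that the B-splines
    # sum to one everywhere on [lo, hi).
    knots = [lo + (i - deg) * h for i in range(n_intervals + 2 * deg + 1)]
    # Degree-0 B-splines: indicators of the knot intervals.
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(len(knots) - 1)]
    # Cox-de Boor recursion up to degree 3.
    for d in range(1, deg + 1):
        b = [(x - knots[i]) / (knots[i + d] - knots[i]) * b[i]
             + (knots[i + d + 1] - x) / (knots[i + d + 1] - knots[i + 1]) * b[i + 1]
             for i in range(len(b) - 1)]
    return b

def pivotal_bin_probs(phi_star, s1=-6.0, s2=6.0, n_bins=100, n_intervals=17):
    """Bin midpoints u, bin width delta, normalized phi and bin probabilities pi (Eq. 2.1)."""
    delta = (s2 - s1) / n_bins
    u = [s1 + (j + 0.5) * delta for j in range(n_bins)]
    B = [cubic_bspline_basis(uj, s1, s2, n_intervals) for uj in u]   # J x K matrix
    eta = [sum(bk * pk for bk, pk in zip(row, phi_star)) for row in B]
    top = max(eta)                                                  # log-sum-exp for stability
    log_norm = top + math.log(sum(math.exp(e - top) for e in eta))
    # Rows of B sum to one, so shifting every coefficient by -log_norm shifts
    # every [B phi*]_j by -log_norm: afterwards pi_j = exp([B phi]_j).
    phi = [p - log_norm for p in phi_star]
    pi = [math.exp(sum(bk * pk for bk, pk in zip(row, phi))) for row in B]
    return u, delta, phi, pi

# Usage: coefficients quadratic in the B-spline centres give a discretized N(0, 1).
n_int = 17
step = 12.0 / n_int
centres = [-6.0 + (k - 1) * step for k in range(n_int + 3)]
u, delta, phi, pi = pivotal_bin_probs([-0.5 * c * c for c in centres], n_intervals=n_int)
mean = sum(p * x for p, x in zip(pi, u))
var = sum(p * x * x for p, x in zip(pi, u)) - mean ** 2
```

The quadratic choice of coefficients works because cubic B-splines reproduce quadratic functions exactly (up to a constant that the normalization absorbs); the same choice gives a natural starting value for φ.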

3 Additive models for location and dispersion

Denote by µ(x) and σ(x) the conditional location and dispersion of Y given x. Depending on the modelling purposes, µ(x) could be the conditional mean, median or even mode of the response. Likewise, σ(x) could be the conditional standard deviation, interquartile range, median absolute deviation, etc.


Assume the following additive model for the conditional location of $Y_i$ given $(x^\mu_i, z^\mu_i)$:

$$\mu(x^\mu_i, z^\mu_i) = [\mu(x^\mu, z^\mu)]_i = \sum_{j=1}^{J_1} f^\mu_j(x^\mu_{ij}) + \Big(\beta^\mu_0 + \sum_{j=1}^{p_1}\beta^\mu_j z^\mu_{ij}\Big), \qquad (3.1)$$

where i (i = 1, …, n) indexes the units of observation, $z^\mu_i = (z^\mu_{i1}, \ldots, z^\mu_{ip_1})'$ is a set of $p_1$ (mostly) categorical covariates, $\beta^\mu = (\beta^\mu_0, \ldots, \beta^\mu_{p_1})'$ the vector of regression parameters and $x^\mu_i = (x^\mu_{i1}, \ldots, x^\mu_{iJ_1})'$ a set of $J_1$ continuous covariates taking values in (0, 1). The last constraint is not a restriction, as any of the original covariates can be relocated and rescaled to meet that requirement.

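Similarly, consider the following additive model for the dispersion of $Y_i$ given $(x^\sigma_i, z^\sigma_i)$:

$$\log\sigma(x^\sigma_i, z^\sigma_i) = \sum_{j=1}^{J_2} f^\sigma_j(x^\sigma_{ij}) + \Big(\beta^\sigma_0 + \sum_{j=1}^{p_2}\beta^\sigma_j z^\sigma_{ij}\Big). \qquad (3.2)$$

Provided that they are smooth, the functional components in $\mu(x^\mu_i, z^\mu_i)$ and $\sigma(x^\sigma_i, z^\sigma_i)$ can be approximated by linear combinations of the elements of a (large) B-spline basis $\{s_l(\cdot) : l = 1, \ldots, L\}$ on (0, 1) (see Brezger and Lang, 2006, in a GLM setting):

$$f^\mu_j(x^\mu_{ij}) = \sum_{l=1}^{L} s_l(x^\mu_{ij})\,\theta^\mu_{lj}; \qquad f^\sigma_j(x^\sigma_{ij}) = \sum_{l=1}^{L} s_l(x^\sigma_{ij})\,\theta^\sigma_{lj}.$$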

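Denote by $\Theta^\mu = \{\theta^\mu_{lj}\} = [\theta^\mu_1, \ldots, \theta^\mu_{J_1}]$ and $\Theta^\sigma = \{\theta^\sigma_{lj}\} = [\theta^\sigma_1, \ldots, \theta^\sigma_{J_2}]$ the $L \times J_1$ and $L \times J_2$ matrices of spline coefficients for the additive terms in the location and dispersion models, by $[S^\mu_{j_1}]_{il} = s_l(x^\mu_{ij_1})$ and $[S^\sigma_{j_2}]_{il} = s_l(x^\sigma_{ij_2})$ the $n \times L$ B-spline matrices for the $j_1$th and $j_2$th additive terms in the location and dispersion models respectively, and by $Z^\mu$ and $Z^\sigma$ the $n \times (1 + p_1)$ and $n \times (1 + p_2)$ design matrices for the linear parts of (3.1) and (3.2), with regression parameters $\beta^\mu = (\beta^\mu_0, \ldots, \beta^\mu_{p_1})'$ and $\beta^\sigma = (\beta^\sigma_0, \ldots, \beta^\sigma_{p_2})'$. Equations (3.1) and (3.2) can be rewritten as

$$\mu(x^\mu, z^\mu) = [S^\mu_1 \ldots S^\mu_{J_1}]\,\mathrm{vec}(\Theta^\mu) + Z^\mu\beta^\mu \qquad (3.3)$$

$$\log\sigma(x^\sigma, z^\sigma) = [S^\sigma_1 \ldots S^\sigma_{J_2}]\,\mathrm{vec}(\Theta^\sigma) + Z^\sigma\beta^\sigma \qquad (3.4)$$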
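For a single unit, (3.3) and (3.4) amount to summing B-spline expansions of the continuous covariates and a linear predictor. The sketch below is a minimal Python illustration under stated assumptions: L = 20 cubic B-splines on 17 equidistant intervals of (0, 1), a design row z whose first entry is 1 for the intercept, and hypothetical function names. The usage lines check two properties used later in the paper: coefficients equal to the B-spline centres give the linear function f(x) = x (a function left unpenalized by the 2nd order penalty of Section 4.1), and adding a constant to all coefficients of one term only shifts the predictor by that constant, which is why each additive term must be recentred for identifiability.

```python
import math

def cubic_bspline_basis(x, lo=0.0, hi=1.0, n_intervals=17):
    """Values at x of the n_intervals + 3 cubic B-splines with equidistant knots on [lo, hi]."""
    deg = 3
    h = (hi - lo) / n_intervals
    knots = [lo + (i - deg) * h for i in range(n_intervals + 2 * deg + 1)]
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(len(knots) - 1)]
    for d in range(1, deg + 1):  # Cox-de Boor recursion
        b = [(x - knots[i]) / (knots[i + d] - knots[i]) * b[i]
             + (knots[i + d + 1] - x) / (knots[i + d + 1] - knots[i + 1]) * b[i + 1]
             for i in range(len(b) - 1)]
    return b

def additive_predictor(x_cont, z, thetas, beta):
    """sum_j sum_l s_l(x_j) theta_lj + z'beta for one unit, cf. Eqs. (3.1) to (3.4)."""
    value = sum(b * zz for b, zz in zip(beta, z))
    for xj, theta_j in zip(x_cont, thetas):
        s = cubic_bspline_basis(xj)
        value += sum(sl * tl for sl, tl in zip(s, theta_j))
    return value

def location_and_dispersion(x_mu, z_mu, theta_mu, beta_mu, x_sig, z_sig, theta_sig, beta_sig):
    """mu from the additive location model, sigma = exp(additive log-dispersion model)."""
    mu = additive_predictor(x_mu, z_mu, theta_mu, beta_mu)
    sigma = math.exp(additive_predictor(x_sig, z_sig, theta_sig, beta_sig))
    return mu, sigma

# Usage: B-spline centres on (0, 1) as coefficients reproduce f(x) = x.
centres = [(k - 1) / 17 for k in range(20)]
f_lin = additive_predictor([0.3], [1.0], [centres], [0.5])                       # 0.5 + 0.3
f_shift = additive_predictor([0.3], [1.0], [[t + 2.0 for t in centres]], [0.5])  # shifted by 2
mu_i, sigma_i = location_and_dispersion([0.3], [1.0], [centres], [0.5],
                                        [0.3], [1.0], [[0.0] * 20], [math.log(2.0)])
```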

4 Penalties and identification constraint

4.1 Roughness penalty

The flexibility provided by the large numbers of B-splines, L for the additive terms and K for the pivotal distribution $f_\varepsilon(\cdot)$, can be counterbalanced by a roughness penalty in a frequentist setting (Eilers and Marx, 1996) or by a suitable prior in a Bayesian framework (Lang and Brezger, 2004; Jullion and Lambert, 2007). The chosen order of the penalty depends on the desired limiting behaviour of the functionals for large values of the penalty parameter. We suggest working with


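• a 3rd order penalty for the pivotal distribution, yielding a normal distribution in the limit when the penalty parameter $\tau_\phi$ tends to +∞ (a normal log-density is quadratic, and quadratic sequences of coefficients have zero third differences):

$$\mathrm{pen}_\phi(\tau_\phi, \phi) = -0.5\,\tau_\phi\sum_k(\phi_k - 3\phi_{k-1} + 3\phi_{k-2} - \phi_{k-3})^2 = -0.5\,\tau_\phi\,(D^\phi\phi)'D^\phi\phi;$$

• a 2nd order penalty for each additive term in the location model, yielding in the limit a linear regression model for the jth additive term (j = 1, …, J_1) when the penalty parameter $\tau^\mu_j$ tends to +∞:

$$\mathrm{pen}^\mu_j(\tau^\mu_j, \theta^\mu_j) = -0.5\,\tau^\mu_j\sum_l(\theta^\mu_{lj} - 2\theta^\mu_{l-1,j} + \theta^\mu_{l-2,j})^2 = -0.5\,\tau^\mu_j\,(D^\mu\theta^\mu_j)'D^\mu\theta^\mu_j;$$

• a 1st order penalty for each additive term in the dispersion model, yielding in the limit a conditionally homoskedastic model with respect to the jth continuous covariate (j = 1, …, J_2) when the penalty parameter $\tau^\sigma_j$ tends to +∞:

$$\mathrm{pen}^\sigma_j(\tau^\sigma_j, \theta^\sigma_j) = -0.5\,\tau^\sigma_j\sum_l(\theta^\sigma_{lj} - \theta^\sigma_{l-1,j})^2 = -0.5\,\tau^\sigma_j\,(D^\sigma\theta^\sigma_j)'D^\sigma\theta^\sigma_j.$$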

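In Bayesian terms, these penalties translate into prior distributions on the spline coefficients,

$$p(\phi|\tau_\phi) \propto (\tau_\phi)^{K/2}\exp\big(-0.5\,\tau_\phi\,\phi'P^\phi\phi\big), \qquad (4.1)$$

$$p(\theta^\mu_j|\tau^\mu_j) \propto (\tau^\mu_j)^{L/2}\exp\big(-0.5\,\tau^\mu_j\,(\theta^\mu_j)'P^\mu\theta^\mu_j\big), \quad 1 \le j \le J_1, \qquad (4.2)$$

$$p(\theta^\sigma_j|\tau^\sigma_j) \propto (\tau^\sigma_j)^{L/2}\exp\big(-0.5\,\tau^\sigma_j\,(\theta^\sigma_j)'P^\sigma\theta^\sigma_j\big), \quad 1 \le j \le J_2, \qquad (4.3)$$

where

$$P^\phi = (D^\phi)'D^\phi + \epsilon I_K; \qquad P^\mu = (D^\mu)'D^\mu + \epsilon I_L; \qquad P^\sigma = (D^\sigma)'D^\sigma + \epsilon I_L$$

are full-rank matrices thanks to the addition of a small multiple ε ($10^{-6}$, say) of the identity matrix.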
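The limiting behaviours claimed for the three penalties can be checked numerically. This minimal Python sketch (hypothetical helper names) builds the difference matrices D of order 1, 2 and 3, the full-rank matrices P = D′D + εI and the log of the priors (4.1) to (4.3) up to an additive constant. Coefficients that are quadratic, linear or constant in their index have zero 3rd, 2nd or 1st order roughness, respectively; since B-splines reproduce polynomials of low degree, these correspond to a normal log-density, a linear effect and a constant log-dispersion.

```python
import math

def difference_matrix(n, order):
    """(n - order) x n matrix D such that D v holds the order-th differences of v."""
    D = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(order):
        D = [[D[i + 1][j] - D[i][j] for j in range(n)] for i in range(len(D) - 1)]
    return D

def penalty_matrix(n, order, eps=1e-6):
    """P = D'D + eps * I, made full rank by the small ridge term."""
    D = difference_matrix(n, order)
    return [[sum(row[i] * row[j] for row in D) + (eps if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def roughness(v, order):
    """(D v)'(D v): sum of squared order-th differences of the coefficients v."""
    D = difference_matrix(len(v), order)
    return sum(sum(d * x for d, x in zip(row, v)) ** 2 for row in D)

def log_prior(v, tau, P):
    """log p(v | tau) up to a constant: 0.5 * len(v) * log(tau) - 0.5 * tau * v'P v."""
    quad = sum(v[i] * P[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))
    return 0.5 * len(v) * math.log(tau) - 0.5 * tau * quad

K = 20
quadratic = [-0.5 * k * k for k in range(K)]    # log-density of normal shape
linear = [0.3 * k - 1.0 for k in range(K)]      # linear additive effect
constant = [2.5] * K                            # constant log-dispersion
P3 = penalty_matrix(K, 3)
```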

4.2 Likelihood and identification penalty

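Given $\psi = (\beta^\mu, \Theta^\mu, \beta^\sigma, \Theta^\sigma)$, one can associate to each observation $\{(x^\mu_i, z^\mu_i), (x^\sigma_i, z^\sigma_i), y_i\}$ a quantity $\varepsilon_i(\psi)$ such that

$$\varepsilon_i(\psi) = \frac{y_i - \mu(x^\mu_i, z^\mu_i)}{\sigma(x^\sigma_i, z^\sigma_i)}.$$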

If, for given ψ, one denotes by $n_j = n_j(\psi)$ (j = 1, …, J) the number of observed $\varepsilon_i(\psi)$'s (i = 1, …, n) belonging to bin $J_j$, then the conditional joint distribution of $(N_1(\psi), \ldots, N_J(\psi))$ is multinomial,

$$(N_1(\psi), \ldots, N_J(\psi)\,|\,\phi) \sim \mathrm{Mult}(n; \pi_1(\phi), \ldots, \pi_J(\phi)), \qquad (4.4)$$

for values of $\pi_j(\phi)$ given by Eq. (2.1). Therefore, up to an additive constant, the log-likelihood is

$$\log L(\psi, \phi|D) = \sum_{j=1}^{J} n_j(\psi)\log\pi_j(\phi),$$

where D stands for the available data. Extra constraints should be imposed on the spline coefficients φ of the pivotal distribution to force the desired interpretation of $\mu_i = \mu(x^\mu_i, z^\mu_i)$ and $\sigma_i = \sigma(x^\sigma_i, z^\sigma_i)$. For example, to interpret $\mu_i$ and $\sigma_i$ as the conditional mean and standard deviation of $Y_i$, one should make sure that the mean and variance of $\varepsilon_i$ are 0 and 1 respectively. Using Eq. (2.1), one has

$$E[\varepsilon_i|\phi] \approx \mu_{\varepsilon\phi} = \sum_{j=1}^{J}\pi_j(\phi)\,u_j; \qquad V[\varepsilon_i|\phi] \approx \sigma^2_{\varepsilon\phi} = \sum_{j=1}^{J}\pi_j(\phi)\,u_j^2 - \mu^2_{\varepsilon\phi},$$

where $u_j$ is the midpoint of bin $J_j$. Therefore, one could add a large identifiability penalty to the log-likelihood,

$$\mathrm{pen}_{id} = -\kappa\big\{\mu^2_{\varepsilon\phi} + (\sigma^2_{\varepsilon\phi} - 1)^2\big\}, \qquad (4.5)$$

to force mean 0 and variance 1 on the fitted pivotal distribution when κ tends to +∞. This suggests working with the penalized log-likelihood

$$\log L_{pen}(\psi, \phi|D) = \sum_{j=1}^{J} n_j(\psi)\log\pi_j(\phi) + \mathrm{pen}_{id}. \qquad (4.6)$$

Similar expressions can be obtained for other interpretations of $\mu_i$ and $\sigma_i$; see e.g. Section 8, where they stand for the conditional median and interquartile range, respectively.

5 Bayesian inference

5.1 Joint and conditional posteriors

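The full Bayesian model is given by (4.4), (4.1), (4.2) and (4.3), together with large-variance prior distributions for $\beta^\mu$ and $\beta^\sigma$ and the following priors for the penalty parameters:

$$\tau_\phi,\ \tau^\mu_{j_1},\ \tau^\sigma_{j_2} \sim \mathrm{Exp}(b = 10^{-6}), \quad 1 \le j_1 \le J_1,\ 1 \le j_2 \le J_2, \qquad (5.1)$$

where Exp(b) denotes an exponential distribution with mean $b^{-1}$. Hence, up to an additive constant, the log of the joint posterior is simply the sum of (4.6), of the logarithms of (4.1) to (4.3) and of

$$-b\Big(\tau_\phi + \sum_{j_1=1}^{J_1}\tau^\mu_{j_1} + \sum_{j_2=1}^{J_2}\tau^\sigma_{j_2}\Big).$$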

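Since the exponential prior (5.1) is a gamma distribution with shape 1 and rate b, the conditional posterior distributions of the penalty parameters can be shown to be

$$(\tau_\phi\,|\,\phi, D) \sim \mathcal{G}\big(1 + 0.5K,\ b + 0.5\,\phi'P^\phi\phi\big), \qquad (5.2)$$

$$(\tau^\mu_{j_1}\,|\,\theta^\mu_{j_1}, D) \sim \mathcal{G}\big(1 + 0.5L,\ b + 0.5\,(\theta^\mu_{j_1})'P^\mu\theta^\mu_{j_1}\big), \quad 1 \le j_1 \le J_1, \qquad (5.3)$$

$$(\tau^\sigma_{j_2}\,|\,\theta^\sigma_{j_2}, D) \sim \mathcal{G}\big(1 + 0.5L,\ b + 0.5\,(\theta^\sigma_{j_2})'P^\sigma\theta^\sigma_{j_2}\big), \quad 1 \le j_2 \le J_2, \qquad (5.4)$$

where $\mathcal{G}(a, r)$ denotes a gamma distribution with shape a and rate r. Unfortunately, the conditional posteriors of the spline and regression parameters, $\Theta^\mu$, $\Theta^\sigma$, $\beta^\mu$, $\beta^\sigma$ and φ, are not of a familiar type. Therefore, a Metropolis-within-Gibbs algorithm is used to sample from the joint posterior.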

5.2 MCMC algorithm

For the algorithm to work without extra tuning on specific examples, we found that the data needed to be standardized in a first step (see Lang and Brezger, 2004, for a similar recommendation). The response was first relocated and rescaled using the sample mean and standard deviation of the observed $y_i$'s. The continuous covariates were also relocated and rescaled to take values in (0, 1).
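The standardization step can be sketched as follows. This is a minimal Python sketch with hypothetical names, under stated assumptions: the response is rescaled with its sample mean and standard deviation, and each continuous covariate is mapped linearly onto [margin, 1 − margin] so that it lies strictly inside (0, 1); the small margin is our own choice, not the paper's. Location and dispersion estimated on the standardized scale map back as µ = m + s·µ_std and σ = s·σ_std.

```python
import statistics

def standardize_response(y):
    """Relocate and rescale the response with its sample mean and standard deviation."""
    m = statistics.fmean(y)
    s = statistics.stdev(y)
    return [(v - m) / s for v in y], m, s

def rescale_to_unit(x, margin=0.01):
    """Map a continuous covariate linearly onto [margin, 1 - margin], inside (0, 1)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        raise ValueError("a constant covariate cannot be rescaled")
    return [margin + (1.0 - 2.0 * margin) * (v - lo) / (hi - lo) for v in x], lo, hi

def back_transform(mu_std, sigma_std, m, s):
    """Location and dispersion on the original response scale."""
    return m + s * mu_std, s * sigma_std

# Usage
y_std, y_mean, y_sd = standardize_response([1.0, 2.0, 3.0])
x_unit, x_lo, x_hi = rescale_to_unit([10.0, 20.0, 30.0])
mu_orig, sigma_orig = back_transform(0.5, 2.0, y_mean, y_sd)
```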

5.2.1 Reference algorithm

The sampling algorithm is as follows. At iteration m, given the state of the chain at the end of the previous iteration:

1. Univariate Metropolis steps for $\theta^\mu_{j_1}$ in $f^\mu_{j_1}(\cdot)$:
Let $\varsigma^\mu_G$ denote the variance of the partial residuals $\{y_i - (\beta^\mu_0 + \sum_{j=1}^{p_1}\beta^\mu_j z^\mu_{ij}) : i = 1, \ldots, n\}$ at the start of iteration m. Let $Q^\mu_{j_1} = (\varsigma^\mu_G)^{-1}(S^\mu_{j_1})'S^\mu_{j_1} + P^\mu + \epsilon I_L$ and let $L^\mu_{j_1}(L^\mu_{j_1})'$ be the Cholesky decomposition of that matrix. Denote by $\vartheta^{(m)}_0$ the value of $\theta^\mu_{j_1}$ at the start of iteration m.
For ℓ from 1 to L:
– Generate $\vartheta = \vartheta^{(m)}_{\ell-1} + \sigma^\mu_{j_1,\ell}\,z\,L^\mu_{j_1}e_\ell$, where $e_\ell$ is the ℓth unit vector of length L, z ∼ N(0, 1) and $\sigma^\mu_{j_1,\ell}$ is tuned during the burn-in period (Haario et al., 2001) to reach the desired acceptance rate (40%, say). For identifiability of the additive model, recentre ϑ to ensure that $\int_0^1 f^\mu_{j_1}(x|\vartheta)\,dx = 0$.
– Conditionally on the other parameters, accept (and set $\vartheta^{(m)}_\ell = \vartheta$) or reject (and set $\vartheta^{(m)}_\ell = \vartheta^{(m)}_{\ell-1}$) that proposal in a Metropolis step.
The value of $\theta^\mu_{j_1}$ at the end of iteration m is $\vartheta^{(m)}_L$.

2. Univariate Metropolis steps for $\beta^\mu$:
Let $\varsigma^\mu_F$ denote the variance of the partial residuals $\{y_i - \sum_{j=1}^{J_1} f^\mu_j(x^\mu_{ij}) : i = 1, \ldots, n\}$ for values of the spline parameters given by Step 1. Let $Q^\mu = (\varsigma^\mu_F)^{-1}(Z^\mu)'Z^\mu$ and let $L^\mu(L^\mu)'$ be the Cholesky decomposition of that matrix. Denote by $b^{(m)}_0$ the value of $\beta^\mu$ at the start of iteration m.
For ℓ from 1 to $p_1 + 1$:
– Generate $b = b^{(m)}_{\ell-1} + \sigma^\mu_{F,\ell}\,z\,L^\mu e_\ell$, where $e_\ell$ is the ℓth unit vector of length $p_1 + 1$, z ∼ N(0, 1) and $\sigma^\mu_{F,\ell}$ is tuned during the burn-in period to reach the desired acceptance rate (40%, say).
– Conditionally on the other parameters, accept (and set $b^{(m)}_\ell = b$) or reject (and set $b^{(m)}_\ell = b^{(m)}_{\ell-1}$) that proposal in a Metropolis step.
The value of $\beta^\mu$ at the end of iteration m is $b^{(m)}_{p_1+1}$.

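3. Univariate Metropolis steps for $\theta^\sigma_{j_2}$ in $f^\sigma_{j_2}(\cdot)$:
Let $Q^\sigma_{j_2} = (S^\sigma_{j_2})'S^\sigma_{j_2} + P^\sigma + \epsilon I_L$ and let $L^\sigma_{j_2}(L^\sigma_{j_2})'$ be the Cholesky decomposition of that matrix. Proceed as in Step 1 with $Q^\sigma_{j_2}$ and $L^\sigma_{j_2}$ substituted for $Q^\mu_{j_1}$ and $L^\mu_{j_1}$ respectively, $\vartheta^{(m)}_0$ set to the value of $\theta^\sigma_{j_2}$ at the start of iteration m and specific tuning parameters $\sigma^\sigma_{j_2,\ell}$ to reach the desired acceptance rate (40%, say). The value of $\theta^\sigma_{j_2}$ at the end of iteration m is $\vartheta^{(m)}_L$.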

4. Univariate Metropolis steps for $\beta^\sigma$:
Let $\varsigma^\sigma_F$ denote the variance of $\{\log|y_i - \sum_{j=1}^{J_1} f^\mu_j(x^\mu_{ij})| : i = 1, \ldots, n\}$ for values of the spline parameters given by Step 1. Let $Q^\sigma = (\varsigma^\sigma_F)^{-1}(Z^\sigma)'Z^\sigma$ and let $L^\sigma(L^\sigma)'$ be the Cholesky decomposition of that matrix. Denote by $b^{(m)}_0$ the value of $\beta^\sigma$ at the start of iteration m. Proceed as in Step 2 with $p_2$, $L^\sigma$ and $\varsigma^\sigma_F$ substituted for $p_1$, $L^\mu$ and $\varsigma^\mu_F$ respectively. The value of $\beta^\sigma$ at the end of iteration m is $b^{(m)}_{p_2+1}$.

5. Univariate Metropolis steps for the spline parameters φ of the density:
Let $Q^\phi = B'B\sigma + P^\phi + \epsilon I_K$ and let $L^\phi(L^\phi)'$ be the Cholesky decomposition of that matrix. Proceed as in Step 1, with ℓ running from 1 to K, $Q^\phi$ and $L^\phi$ substituted for $Q^\mu_{j_1}$ and $L^\mu_{j_1}$ respectively, $\vartheta^{(m)}_0$ set to the value of φ at the start of iteration m and specific tuning parameters $\sigma^\phi_\ell$ to reach the desired acceptance rate (40%, say). The value of φ at the end of iteration m is $\vartheta^{(m)}_K$.

6. Gibbs step for the penalty parameters:
Generate $\tau_\phi$, $\tau^\mu_{j_1}$ (1 ≤ j_1 ≤ J_1) and $\tau^\sigma_{j_2}$ (1 ≤ j_2 ≤ J_2) for iteration m from Eqs. (5.2) to (5.4), with the spline parameters set at their values from the previous steps.

From the generated chain, one can build point estimates and credible regions for the spline parameters and any derived quantity.
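Steps 1 to 5 share one pattern: propose a move along the ℓth column of a Cholesky factor, scaled by a tuning constant, then accept or reject it. The following is a minimal, generic Python sketch of that pattern, assuming the log conditional posterior is supplied as a function and the Cholesky factor as a lower-triangular list of lists; all names are hypothetical. For tuning during burn-in it uses a simple Robbins-Monro update of each scale towards a 40% acceptance rate. This is a swapped-in stand-in, not an implementation of the adaptive scheme of Haario et al. (2001). The optional `recentre` hook is where the centring constraint of Step 1 would be applied.

```python
import math
import random

def metropolis_sweep(theta, log_post, chol, scales, rng, recentre=None):
    """One pass of univariate Metropolis steps along the columns L e_l of `chol`."""
    current = log_post(theta)
    accepted = []
    for ell in range(len(theta)):
        z = rng.gauss(0.0, 1.0)
        proposal = [t + scales[ell] * z * chol[r][ell] for r, t in enumerate(theta)]
        if recentre is not None:
            proposal = recentre(proposal)
        candidate = log_post(proposal)
        # Accept with probability min(1, exp(candidate - current)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
            accepted.append(True)
        else:
            accepted.append(False)
    return theta, accepted

def adapt_scales(scales, accepted, iteration, target=0.4):
    """Robbins-Monro update of the log-scales towards the target acceptance rate."""
    step = 1.0 / math.sqrt(iteration + 1.0)
    return [s * math.exp(step * ((1.0 if a else 0.0) - target))
            for s, a in zip(scales, accepted)]

def log_post(v):
    """Log-density (up to a constant) of a bivariate standard normal toy target."""
    return -0.5 * sum(x * x for x in v)

# Usage on the toy target with an identity Cholesky factor.
rng = random.Random(42)
chol = [[1.0, 0.0], [0.0, 1.0]]
theta, scales = [3.0, -3.0], [1.0, 1.0]
n_burn, n_keep = 1000, 4000
kept, n_acc = [], 0
for it in range(n_burn + n_keep):
    theta, acc = metropolis_sweep(theta, log_post, chol, scales, rng)
    if it < n_burn:
        scales = adapt_scales(scales, acc, it)   # tune only during burn-in
    else:
        kept.append(theta[0])
        n_acc += sum(acc)
mean_first = sum(kept) / len(kept)
acceptance = n_acc / (2 * n_keep)
```

Freezing the scales after burn-in keeps the retained part of the chain a standard (non-adaptive) Metropolis sampler.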


5.2.2 Starting values

Starting values for the chain can be obtained by taking values of φ corresponding to a standard normal distribution. The starting values for the other parameters were set as follows:

– $[\beta^\mu]_1$ and $[\beta^\sigma]_1$ were set to the associated descriptive statistics of the observed response. For example, if one models the median and the log interquartile range (IQR) of the response, then the initial values are simply taken to be the sample median and the log sample IQR of $\{y_i : i = 1, \ldots, n\}$.

– The other regression parameters and the spline parameters involved in the additive parts of the model were all set equal to 0.

– The penalty parameters were set equal to a small value ($10^{-1}$, say).

5.3 Truncation of the penalty priors

It was found in examples that the chains for the penalty parameters related to some additive components could, on rare occasions, produce long series of very large values of $\tau^\mu_{j_1}$ or $\tau^\sigma_{j_2}$, corresponding to a strong penalization of conditional nonlinear or non-constant behaviour of $f^\mu_{j_1}$ or $f^\sigma_{j_2}$, as encouraged by a large 2nd or 1st order penalty for location and dispersion, respectively. This is not surprising: a large value of $\tau^\mu_{j_1}$ at iteration m tends to produce at iteration m + 1 a vector $\theta^\mu_{j_1}$ corresponding to a nearly linear $f^\mu_{j_1}(\cdot)$. The quadratic form in Eq. (5.3) is then close to 0, so that the rate of the gamma distribution is close to b = 10⁻⁶, which in turn tends to produce a large $\tau^\mu_{j_1}$ at iteration m + 1.

While a very large ($10^6$, say) and a large ($10^3$, say) value of $\tau^\mu_{j_1}$ at iteration m both generate at iteration m + 1 spline parameters suggesting linearity of $f^\mu_{j_1}(\cdot)$, the chain for $\tau^\mu_{j_1}$ tends to remain in the right tail of the posterior distribution of the penalty for a large number of iterations in the first case but not in the second. Therefore, the priors for the penalty parameters in Eq. (5.1) were truncated to $(0, \tau_{max})$, where $\tau_{max}$ denotes a large value ($10^3$, say) of the penalty.

5.4 The identification penalty

The identification penalty in Eq. (4.5) ensures that the estimated pivotal distribution has location 0 and dispersion 1 when κ tends to +∞. While that strategy works for computing maximum (penalized) likelihood estimates, it has a strong negative impact on the mixing of the chains generated by the MCMC algorithm. Therefore, we advocate the following strategy:

1. Set κ equal to a moderate value (100, say).

2. Run the Metropolis-within-Gibbs algorithm described above.

3. Discard the burn-in iterations, yielding after convergence final chains of length M. For m from 1 to M, given the values of the parameters at iteration m:

– Define the density g(·|φ) at the bin midpoints $u_j$ by

$$g(u_j|\phi) = \frac{1}{\Delta}\,\frac{\exp([B\phi]_j)}{\sum_{\ell=1}^{J}\exp([B\phi]_\ell)}.$$

– Compute the location λ and dispersion δ of g(·|φ).

– Set $\beta^\mu_0 \leftarrow \beta^\mu_0 + \lambda$ and $\beta^\sigma_0 \leftarrow \beta^\sigma_0 + \log\delta$.

– The pivotal distribution with location 0 and dispersion 1 corresponds to the density $f_\varepsilon$ given by

$$g(\cdot|\phi) = \frac{1}{\delta}\,f_\varepsilon\Big(\frac{\cdot - \lambda}{\delta}\,\Big|\,\phi\Big).$$

– Recompute φ such that $f_\varepsilon(u_j|\phi)\,\Delta = \exp([B\phi]_j)$ for 1 ≤ j ≤ J.

After these corrections to the reference algorithm (see Section 5.2.1), our experience on different examples showed that trace plots suggest convergence after just a few thousand iterations.
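When µ and σ are meant to be the conditional median and interquartile range (as in Sections 7 and 8), λ and δ are the median and IQR of g(·|φ). The sketch below is a minimal Python version of the correction applied at each retained iteration, assuming g is constant within each bin, so that its quantiles follow by linear interpolation of the cumulative bin probabilities; the names are hypothetical. It returns the corrected intercepts and the standardized bin probabilities f_ε(u_j)Δ, obtained from f_ε(e) = δ g(λ + δe) and renormalized on S. The final step above, refitting φ to these values, is left out.

```python
import math

def quantile_from_bins(pi, s1, delta, p):
    """p-quantile of the density equal to pi_j / delta on bin j of (s1, s1 + J * delta)."""
    cum = 0.0
    for j, pj in enumerate(pi):
        if pj > 0.0 and cum + pj >= p:
            return s1 + (j + (p - cum) / pj) * delta
        cum += pj
    return s1 + len(pi) * delta

def standardize_pivotal(beta_mu0, beta_sigma0, pi, s1, delta):
    """Shift the intercepts by (lambda, log delta) and rescale g to median 0, IQR 1."""
    lam = quantile_from_bins(pi, s1, delta, 0.5)
    dlt = quantile_from_bins(pi, s1, delta, 0.75) - quantile_from_bins(pi, s1, delta, 0.25)
    n_bins = len(pi)
    new_pi = []
    for j in range(n_bins):
        u_j = s1 + (j + 0.5) * delta
        x = lam + dlt * u_j                       # point where g is evaluated
        k = math.floor((x - s1) / delta)
        g = pi[k] / delta if 0 <= k < n_bins else 0.0
        new_pi.append(dlt * g * delta)            # f_eps(u_j) * Delta
    total = sum(new_pi)
    new_pi = [p / total for p in new_pi]          # renormalize on S
    return beta_mu0 + lam, beta_sigma0 + math.log(dlt), new_pi

# Usage: g uniform on (-2, 2) has median 0 and IQR 2.
b_mu0, b_sig0, pi_std = standardize_pivotal(1.0, 0.0, [0.25] * 4, -2.0, 1.0)
```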

6 Extension to interval censored data

Assume now that the response data take the form of intervals $\{(y^L_i, y^U_i) : i = 1, \ldots, n\}$. These intervals can then be relocated and rescaled as $(e^L_i, e^U_i)$, where

$$e^L_i = e^L_i(\psi) = \frac{y^L_i - \mu(x^\mu_i, z^\mu_i)}{\sigma(x^\sigma_i, z^\sigma_i)}; \qquad e^U_i = e^U_i(\psi) = \frac{y^U_i - \mu(x^\mu_i, z^\mu_i)}{\sigma(x^\sigma_i, z^\sigma_i)}.$$

Assume that the exact unobserved response $Y_i$ is standardized in the same way to obtain $\varepsilon_i$, and that $\varepsilon_i$ has a conditional distribution independent of the covariates. Consider now the same strategy and notation as in Section 2 to model the corresponding pivotal density $f_\varepsilon(\cdot)$ on interval S. If ξ = (φ, ψ) and if $c_{ij}$ is the proportion of bin $J_j$ contained in $(e^L_i, e^U_i)$, then

$$\Pr_\xi\big(y^L_i < Y_i < y^U_i \,\big|\, x^\mu_i, z^\mu_i, x^\sigma_i, z^\sigma_i\big) = \Pr_\xi\big(e^L_i < \varepsilon_i < e^U_i\big) = \int_{e^L_i}^{e^U_i} f_\varepsilon(e|\phi)\,de \approx \sum_j c_{ij}(\psi)\,\pi_j(\phi),$$

with $\pi_j(\phi)$ given in Eq. (2.1). This is the composite link model (Thompson and Baker, 1981; Eilers, 2007), as used to estimate a density from interval censored data in Lambert and Eilers (2009) and Lambert (2011). It could also be extended to deal with heavily right censored data along the ideas in Çetinyürek and Lambert (2010).

The penalized log-likelihood in Eq. (4.6) for precisely observed responses becomes

$$\log L_{pen}(\psi, \phi|D) = \sum_{i=1}^{n}\log\Big(\sum_{j=1}^{J} c_{ij}(\psi)\,\pi_j(\phi)\Big) + \mathrm{pen}_{id}.$$

The derivation of the joint posterior and the strategy for exploring it are unchanged.
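The only new ingredient for interval censored responses is the matrix of overlap proportions c_ij(ψ). This is a minimal Python sketch, assuming a pivotal density that is constant within each of the J bins, for which the approximation ∑_j c_ij π_j is exact; the names are hypothetical and pen_id is omitted. Open-ended intervals, such as the lowest and highest income classes of Section 8, can be passed as ±math.inf.

```python
import math

def overlap_fractions(eL, eU, s1, delta, n_bins):
    """c_ij: proportion of each bin [s1 + j*delta, s1 + (j+1)*delta) lying in (eL, eU)."""
    c = []
    for j in range(n_bins):
        a = s1 + j * delta
        b = a + delta
        c.append(max(0.0, min(b, eU) - max(a, eL)) / delta)
    return c

def interval_loglik(yL, yU, mu, sigma, pi, s1, s2):
    """sum_i log(sum_j c_ij pi_j) for the intervals (yL_i, yU_i), without pen_id."""
    n_bins = len(pi)
    delta = (s2 - s1) / n_bins
    total = 0.0
    for lo, hi, m, s in zip(yL, yU, mu, sigma):
        c = overlap_fractions((lo - m) / s, (hi - m) / s, s1, delta, n_bins)
        total += math.log(sum(cj * pj for cj, pj in zip(c, pi)))
    return total

# Usage: toy pivotal density uniform on (-2, 2), three intervals (one unbounded).
pi_toy = [0.25] * 4
ll = interval_loglik([-1.0, -math.inf, -0.5], [1.0, math.inf, 0.5],
                     [0.0] * 3, [1.0] * 3, pi_toy, -2.0, 2.0)
```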

Figure 1: Simulation: conditional density fε(ε).

7 Simulation study

One hundred datasets $\{(x_i, y_i) : i = 1, \ldots, n\}$ of size n = 500, 1000 and 1500 were generated from the location-scale model in Eq. (1.1), where X is uniformly distributed on (0, 1), $\mu(x) = \sin(2\pi x)$, $\sigma(x) = 0.79\exp\{\cos(4\pi x)\}$ and $\varepsilon = (W + 0.657)/1.758$, with W following the mixture $0.7\,N(-1, 0.6^2) + 0.3\,N(1, 0.5^2)$. One can check that ε has median 0 and interquartile range 1; see Fig. 1 for a graphical representation. Therefore, µ(x) and σ(x) are the conditional median and interquartile range of Y given X = x; see Fig. 2. One also has E(σ(X)) = 1, since $E[\exp\{\cos(4\pi X)\}] = I_0(1) \approx 1.266$, where $I_0$ is the modified Bessel function of the first kind. A scatterplot of one generated dataset of size n = 500 is provided in Fig. 3. The points in Fig. 3 are the midpoints of the observed interval data $(y_i - 0.5h\,\sigma(x_i),\ y_i + 0.5h\,\sigma(x_i))$ of average width h (taken to be 0.1, 0.5 or 1.0 in the design of the simulation study). That makes 100 × 3 × 3 datasets for which µ(x), σ(x) and fε(ε) were estimated using the Bayesian strategy described in Section 5.
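The data-generating process of the simulation study can be reproduced along the following lines. A minimal Python sketch with hypothetical names, assuming the error is generated as ε = (W + 0.657)/1.758, the reading under which ε has median 0 and interquartile range 1 as stated in the text, and following the description of the interval data literally (intervals of width hσ(x_i) centred at the exact y_i). The usage lines check numerically that the median of ε is close to 0, its IQR close to 1, and E(σ(X)) close to 1.

```python
import math
import random
import statistics

def sample_eps(rng):
    """epsilon = (W + 0.657) / 1.758 with W ~ 0.7 N(-1, 0.6^2) + 0.3 N(1, 0.5^2)."""
    if rng.random() < 0.7:
        w = rng.gauss(-1.0, 0.6)
    else:
        w = rng.gauss(1.0, 0.5)
    return (w + 0.657) / 1.758

def mu(x):
    return math.sin(2.0 * math.pi * x)

def sigma(x):
    return 0.79 * math.exp(math.cos(4.0 * math.pi * x))

def simulate_interval_data(n, h, rng):
    """n triples (x_i, yL_i, yU_i): intervals of width h * sigma(x_i) around y_i."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = mu(x) + sigma(x) * sample_eps(rng)
        half = 0.5 * h * sigma(x)
        data.append((x, y - half, y + half))
    return data

# Usage
rng = random.Random(2010)
eps = [sample_eps(rng) for _ in range(100_000)]
q1, q2, q3 = statistics.quantiles(eps, n=4)
mean_sigma = statistics.fmean([sigma((i + 0.5) / 10_000) for i in range(10_000)])
dataset = simulate_interval_data(500, 0.5, rng)
```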

The estimated conditional median, (log) interquartile range and pivotal density for each of the 100 simulated datasets are plotted as grey curves in Figs. 4, 5 and 6, respectively; the dashed curves represent the true functions. One can see that the estimation of the conditional median and IQR performs well whatever the sample size n and the mean width h of the interval data. Not surprisingly, the uncertainty decreases as n increases. The pivotal density and its bimodality are also clearly detected. The quality of the reconstruction of the pivotal density deteriorates as h increases and is least satisfactory when h = 1.0.

Of course, in practice, if the available information is too sparse to provide fine estimates of some of the components, the penalties will guide us, in the limit, towards a classical linear regression setting with a normal pivotal distribution.

Figure 2: Simulation: conditional median µ(x) and interquartile range σ(x) (panels: conditional median; conditional log IQR).

Figure 3: Simulation: scatterplot of n = 500 simulated data (vertical axis: yobs).

8 Application

The data of interest were collected in 2006 through the European Social Survey, Round 3 (ESS, 2006). This is an academically driven social survey designed to study the attitudes, beliefs and behaviour patterns of diverse populations in more than 30 nations in Europe. We are specifically interested in the money available per person in Belgian households for respondents aged between 25 and 75 who studied between 8 and 20 years. The ESS provides the net monthly income (in €) of the household (n = 1103), reported in one of the following intervals: 1: < 150; 2: [150, 300[; 3: [300, 500[; 4: [500, 1 000[; 5: [1 000, 1 500[; 6: [1 500, 2 000[; 7: [2 000, 2 500[; 8: [2 500, 3 000[; 9: [3 000, 5 000[; 10: [5 000, 7 500[; 11: [7 500, 10 000[; 12: ≥ 10 000 €. The available income per person (also called the equivalised household income) is obtained by dividing the household net income by the household weight. That weight is calculated using the OECD-modified scale, first proposed by Hagenaars et al. (1994) and advocated since the late 1990s by the Statistical Office of the European Union (EUROSTAT). This scale assigns a value of 1 to the household head, 0.5 to each additional adult member and 0.3 to each child (less than 14 years old). Therefore, our data consist of the preceding intervals, each divided by the household weight. We studied the relation of this income with the age (48.5 ± 13.4 years) and the number of years of full-time education completed (12.7 ± 2.8 years) of the respondent.

Figure 4: Simulation: estimated conditional median (one grey curve per dataset) for sample sizes 500, 1000 or 1500 with interval data of average width h = 0.1, 0.5 or 1.0. The dashed line is the true conditional median.
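The construction of the equivalised income intervals can be sketched as follows. A minimal Python sketch with hypothetical names, under assumptions that are ours rather than the paper's: household members aged 14 or over other than the head count as additional adults, category 1 (< 150 €) gets a lower bound of 0 €, and category 12 (≥ 10 000 €) an upper bound of +∞.

```python
import math

# Bounds (in euro) of the 12 ESS net monthly household income categories.
INCOME_BOUNDS = [0, 150, 300, 500, 1000, 1500, 2000, 2500, 3000, 5000, 7500, 10000, math.inf]

def oecd_modified_weight(n_adults, n_children):
    """1 for the household head, 0.5 per additional adult, 0.3 per child under 14."""
    if n_adults < 1:
        raise ValueError("the household head counts as the first adult")
    return 1.0 + 0.5 * (n_adults - 1) + 0.3 * n_children

def equivalised_interval(category, n_adults, n_children):
    """Income interval of category 1..12 divided by the household weight."""
    lo, hi = INCOME_BOUNDS[category - 1], INCOME_BOUNDS[category]
    w = oecd_modified_weight(n_adults, n_children)
    return lo / w, hi / w

# Usage: two adults and two young children reporting category 5 ([1 000, 1 500[).
lo5, hi5 = equivalised_interval(5, 2, 2)
```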

We assume an additive location-scale model for the available income per person, Y (cf. Eq. (1.1)), and force ε to have median 0 and interquartile range 1. The continuous covariates are age and educ, so that $x_i = (\mathrm{age}_i, \mathrm{educ}_i)$. Thus, µ(x) and σ(x) must be interpreted as the conditional median and interquartile range of Y given X = x, respectively.

Using the algorithm described in Section 5.2, a final chain of length 50,000 was run after a burn-in of 5,000 iterations for the parameters ψ and φ. The age and educ components in the additive models for the conditional median and (log) interquartile range of the income available per person are reported in Fig. 7. These functional components are added to the reference values $\beta^\mu_0$ and $\beta^\sigma_0$ in Eqs. (3.1) and (3.2); see Table 1 for their estimated posterior means and 95% highest posterior density (HPD) intervals.

Figure 5: Simulation: estimated conditional log IQR (one grey curve per dataset) for sample sizes 500, 1000 or 1500 with interval data of average width h = 0.1, 0.5 or 1.0. The dashed line is the true conditional log IQR.

Parameter    Mean    95% HPD interval
$\beta^\mu_0$    1461    (1410, 1513)
$\beta^\sigma_0$    6.75    (6.69, 6.80)

Table 1: Income dataset: estimates of the posterior mean and of the 95% HPD interval for the regression intercepts in the additive location-scale model.

Fig. 7 suggests a significant increase (at a given age) of the median and of the log IQR of the net income available per person with the number of years of education. An increase of these quantities with age (for a given education level) is also visible until the late fifties, where both the conditional median and the log IQR start to decrease. This corresponds to the age at which Belgians typically leave the labour market (on average at about 61), before the legal retirement age of 65. Indeed, one can retire earlier, at 60, after 35 years of contributions to the pension system. The estimated pivotal distribution can be found in Fig. 8: it is right skewed with a long right tail, as expected with income data.

Figure 6: Simulation: estimated pivotal density (one grey curve per dataset) for sample sizes 500, 1000 or 1500 with interval data of average width h = 0.1, 0.5 or 1.0. The dashed line is the true pivotal density fε(ε).

From the fitted model, one can compute the conditional probability of living below the poverty threshold. That threshold is commonly defined as 60% of the median equivalised household income; in Belgium, it was 772 € in 2006. If Y denotes the equivalised household income, then the probability of interest is

$$\Pr(Y \le 772 \,|\, X = x) = \Pr\Big(\varepsilon \le \frac{772 - \mu(x)}{\sigma(x)}\Big) = \int_{-\infty}^{(772 - \mu(x))/\sigma(x)} f_\varepsilon(e)\,de.$$

It was computed for a grid of values of age and educ; see Fig. 9. The results suggest that education is the best insurance against poverty, with an estimated risk of roughly more than 10% or 20% for persons who studied for less than 14 or 12 years, respectively. Young and old persons with little education are particularly exposed, with an estimated risk above 30%.
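The computation behind Fig. 9 amounts to evaluating the pivotal CDF at (772 − µ(x))/σ(x) for each grid point. The sketch below is illustrative only: the additive components `mu` and `sigma` are hypothetical linear placeholders (not the estimates of Fig. 7), and a standard normal density stands in for the fitted B-spline pivotal density so that the numerical integration (trapezoidal rule, with −∞ truncated at −10) can be checked against a known CDF.

```python
import math
from typing import Callable, Dict, Tuple

POVERTY_LINE = 772.0  # Belgian poverty threshold in 2006 (EUR), from the text


def pivotal_cdf(z: float, density: Callable[[float], float],
                lower: float = -10.0, n_steps: int = 4000) -> float:
    """Approximate F_eps(z), the integral of `density` from -inf to z.

    The lower limit -inf is truncated at `lower` and the integral is
    evaluated with the composite trapezoidal rule.
    """
    if z <= lower:
        return 0.0
    h = (z - lower) / n_steps
    total = 0.5 * (density(lower) + density(z))
    for i in range(1, n_steps):
        total += density(lower + i * h)
    return total * h


def std_normal_pdf(e: float) -> float:
    # Stand-in pivotal density (assumption); the fitted one is a B-spline estimate.
    return math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)


def mu(age: float, educ: float) -> float:
    # Hypothetical additive location: intercept + f(age) + f(educ).
    return 1500.0 + 8.0 * (age - 45.0) + 60.0 * (educ - 13.0)


def sigma(age: float, educ: float) -> float:
    # Hypothetical additive log-dispersion, exponentiated to keep sigma > 0.
    return math.exp(6.0 + 0.01 * (age - 45.0) + 0.03 * (educ - 13.0))


def poverty_risk(age: float, educ: float) -> float:
    # Pr(Y <= 772 | X = x) = F_eps((772 - mu(x)) / sigma(x))
    z = (POVERTY_LINE - mu(age, educ)) / sigma(age, educ)
    return pivotal_cdf(z, std_normal_pdf)


# Evaluate the risk on a coarse (age, educ) grid, as for a map like Fig. 9.
risk_grid: Dict[Tuple[int, int], float] = {
    (age, educ): poverty_risk(age, educ)
    for age in range(30, 71, 10)
    for educ in range(8, 21, 4)
}
```

Repeating the grid evaluation for each MCMC draw of µ, σ and the pivotal density would give a posterior sample of each probability, from which credible intervals follow.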

9 Discussion

Additive models were proposed to model the location and the dispersion of a continuous response with an unknown smooth conditional distribution. B-splines were used to specify all the functional components of the model, including the pivotal density. These components were estimated jointly in a Bayesian framework in which all sources of uncertainty are taken into account. Credible regions for any quantity of interest can be built from the generated MCMC chains.
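Pointwise credible bands such as those in Figs 7 and 8 can be obtained by summarising, at each grid point, the values taken by a curve across MCMC iterations. A minimal sketch, assuming each MCMC draw of the curve has already been evaluated on a common grid and stored as a list of floats; it uses equal-tailed empirical quantiles with linear interpolation, which is one common choice and not necessarily the one used for the figures. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def empirical_quantile(sorted_x: Sequence[float], p: float) -> float:
    """p-quantile of already-sorted data, interpolating between order statistics."""
    if not sorted_x:
        raise ValueError("empty sample")
    pos = p * (len(sorted_x) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])


def pointwise_band(draws: Sequence[Sequence[float]],
                   level: float = 0.95) -> List[Tuple[float, float]]:
    """Equal-tailed pointwise credible band from MCMC draws of a curve.

    draws[s][j] is the curve's value at grid point j in MCMC iteration s;
    one (lower, upper) pair is returned per grid point.
    """
    tail = (1.0 - level) / 2.0
    band = []
    for j in range(len(draws[0])):
        column = sorted(d[j] for d in draws)
        band.append((empirical_quantile(column, tail),
                     empirical_quantile(column, 1.0 - tail)))
    return band
```

The same recipe covers any derived quantity: compute it once per MCMC draw (for example a conditional probability at one covariate value) and pass the resulting one-column draws to `pointwise_band`.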

[Figure 7 panels: Location: f(age), Location: f(educ), Log-dispersion: f(age), Log-dispersion: f(educ); horizontal axes age (30 to 70) and educ (8 to 20); vertical axes from −500 to 1000 (location) and from −0.6 to 0.6 (log-dispersion).]

Figure 7: Income data: age and educ components (solid lines) in the additive models for the median (in €) and for the log interquartile range; grey regions delimit pointwise 80% (dark grey) and 95% (light grey) credible intervals.

The proposed tool can deal with interval censored data. It relies upon the results in Lambert and Eilers (2009) and in Lambert (2011), where densities were estimated from univariate or bivariate grouped data. Simulations suggest that the additive components can be estimated reliably even with modest sample sizes, whereas a good identification of the pivotal distribution depends more on the width of the interval data. The suggested method makes it possible to regress not only moments but also quantiles on covariates, and to estimate (in a single step) the response distribution conditionally on these covariates.
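For a response known only to lie in an interval (l_i, u_i], the location-scale structure Y = µ(x) + σ(x)ε turns each likelihood contribution into a difference of pivotal CDF values, F_ε((u_i − µ(x_i))/σ(x_i)) − F_ε((l_i − µ(x_i))/σ(x_i)). The sketch below writes out this standard interval censored log-likelihood under stated assumptions: a logistic CDF stands in for the B-spline pivotal distribution, µ(x_i) and σ(x_i) are passed as precomputed values, and open-ended intervals are encoded with ±infinity. It is an illustration, not the author's implementation, which specifies all components with B-splines and is fitted in a Bayesian framework by MCMC.

```python
import math
from typing import Sequence


def logistic_cdf(z: float) -> float:
    """Stand-in pivotal CDF (assumption); accepts +/- infinity."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def interval_loglik(lower: Sequence[float], upper: Sequence[float],
                    mu: Sequence[float], sigma: Sequence[float]) -> float:
    """Log-likelihood of interval censored responses, Y_i in (lower_i, upper_i].

    Contribution i is log{F((u_i - mu_i)/sigma_i) - F((l_i - mu_i)/sigma_i)},
    where F is the pivotal CDF; use -math.inf or math.inf for open-ended intervals.
    """
    if not (len(lower) == len(upper) == len(mu) == len(sigma)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for l, u, m, s in zip(lower, upper, mu, sigma):
        if s <= 0.0 or not l < u:
            raise ValueError("need sigma > 0 and lower < upper")
        prob = logistic_cdf((u - m) / s) - logistic_cdf((l - m) / s)
        if prob <= 0.0:
            return -math.inf  # interval has numerically zero probability
        total += math.log(prob)
    return total
```

Because only CDF differences enter, exact observations are never needed; narrower intervals carry more information about the shape of the pivotal distribution, in line with the simulation findings above.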

Several extensions of this model deserve further research. These include regression on interval censored covariates, model selection, goodness-of-fit tools for interval censored responses, random effects, and the extension to spatial data. We are currently working on several of these aspects and hope to report on them soon in separate publications.

[Figure 8 panel: Pivotal density; horizontal axis from −5 to 10, density from 0 to 0.6.]

Figure 8: Income data: fitted pivotal density in the location-scale model; grey regions delimit pointwise 95% credible intervals.

Acknowledgments

Financial support from the FSR research grant no. FSRC-08/42 of the University of Liège and from the IAP research network no. P6/03 of the Belgian government (Belgian Science Policy) is gratefully acknowledged.

[Figure 9 panel: Probability to live below the poverty threshold in Belgium, against Age (30 to 70) and Educ (8 to 20); legend from 0.0 to 0.5.]

Figure 9: Income data: estimated probability of living below the poverty threshold in Belgium in 2006.

References

Brezger, A. and Lang, S. (2006). Generalized structured additive regression based on Bayesian P-splines. Computational Statistics and Data Analysis, 50, 967–991.

Çetinyürek, A. and Lambert, P. (2010). Smooth estimation of survival functions and hazard ratios from interval-censored data using Bayesian penalized B-splines. Statistics in Medicine. DOI: 10.1002/sim.4081.

Collett, D. (1994). Modelling Survival Data in Medical Research. Chapman & Hall, London.

Eilers, P. H. C. (2007). Ill-posed problems with counts, the composite link model and penalized likelihood. Statistical Modelling, 7, 239–254.

Eilers, P. H. C. and Marx, B. D. (1996). Flexible smoothing with B-splines and penalties (with discussion). Statistical Science, 11, 89–121.

ESS (2006). European Social Survey Round 3: data file edition 3.2. Norwegian Social Science Data Services, Norway. http://www.europeansocialsurvey.org/.

Fahrmeir, L. and Tutz, G. (2001). Multivariate Statistical Modelling Based on Generalized Linear Models (2nd edition). Springer-Verlag.

Haario, H., Saksman, E., and Tamminen, J. (2001). An adaptive Metropolis algorithm. Bernoulli, 7, 223–242.

Hagenaars, A., De Vos, K., and Zaidi, A. (1994). Poverty Statistics in the Late 1980's: research based on micro-data. Luxembourg: Office for Official Publications of the European Communities.

Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. Chapman & Hall, London.

Hsieh, F. (1996). Empirical process approach in a two-sample location–scale model with censored data. The Annals of Statistics, 24(6), 2705–2719.

Jørgensen, B. (1997). The Theory of Dispersion Models. Chapman & Hall, London.

Jullion, A. and Lambert, P. (2007). Robust specification of the roughness penalty prior distribution in spatially adaptive Bayesian P-splines models. Computational Statistics and Data Analysis, 51, 2542–2558.

Komárek, A., Lesaffre, E., and Hilton, J. (2005). Accelerated failure time model for arbitrarily censored data with smoothed error distribution. Journal of Computational and Graphical Statistics, 14, 726–745.

Lambert, P. (2011). Smooth semi- and nonparametric Bayesian estimation of bivariate densities from bivariate histogram data. Computational Statistics and Data Analysis, 55, 429–445.

Lambert, P. and Eilers, P. H. C. (2009). Bayesian density estimation from grouped continuous data. Computational Statistics and Data Analysis, 53, 1388–1399.

Lambert, P. and Eilers, P. H. C. (2006). Bayesian multidimensional density smoothing. In Proceedings of the 21st International Workshop on Statistical Modelling, pages 313–320, Galway.

Lambert, P. and Lindsey, J. (1999). Analysing financial returns using regression based on non-symmetric stable distributions. Applied Statistics, 48, 409–424.

Lambert, P., Collett, D., Kimber, A., and Johnson, R. (2004). Parametric accelerated failure time models with random effects and an application to kidney transplant survival. Statistics in Medicine, 23(20), 3177–3192.

Lang, S. and Brezger, A. (2004). Bayesian P-splines. Journal of Computational and Graphical Statistics, 13, 183–212.

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd Edition. Chapman & Hall / CRC.

Nelder, J. and Wedderburn, R. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A, 135, 370–384.

Rigby, R. and Stasinopoulos, D. (2001). Generalized additive models for location, scale and shape. Applied Statistics, 54(3), 1–38.

Thompson, R. and Baker, R. J. (1981). Composite link functions in generalized linear models. Applied Statistics, 30, 125–131.

Van Keilegom, I. and Akritas, M. (1999). Transfer of tail information in censored regression models. The Annals of Statistics, 27(5), 1745–1784.

Vaupel, J., Manton, K., and Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography, 16(3), 439–454.
