INSTITUT DE STATISTIQUE, BIOSTATISTIQUE ET SCIENCES ACTUARIELLES (ISBA)
UNIVERSITÉ CATHOLIQUE DE LOUVAIN
DISCUSSION PAPER 1051
ADDITIVE LOCATION-SCALE MODELS
FOR INTERVAL CENSORED DATA
LAMBERT, P.
This file can be downloaded from http://www.stat.ucl.ac.be/ISpub
Additive location-scale models
for interval censored data
Philippe Lambert ∗†
December 21, 2010
Abstract
An additive model for the location and dispersion of a continuous
response with an arbitrary smooth conditional distribution is
proposed. B-splines are used to specify the three components of the
model. It can be extended to deal with interval censored data and
multiple covariates. As an illustration, the relation between age,
the number of years of full-time education and the net income
(provided as intervals) available per person in Belgian households
is studied from survey data.
Key words: Interval censored data ; additive model ; smooth
distribution.
1 Introduction
Generalized linear models (GLMs) (Nelder and Wedderburn, 1972;
McCullagh and Nelder, 1989) had a major impact on statistical
modelling by synthesizing in a single framework several extensions
of normal linear regression models to discrete and positive
continuous random variables. After choosing a distribution in the
exponential family and a link function, a transform of the mean
response could be described as a linear function of covariates, with
maximum likelihood estimates of the regression coefficients quickly
obtained using the iterative weighted least squares algorithm and
its implementation in the software GLIM. Extensions of GLMs make it
possible to deal with multi-categorical, multivariate and
longitudinal data and random effects (see e.g. Fahrmeir and Tutz,
2001, for details and references), and even to jointly model
location and dispersion (Jørgensen, 1997). Another relevant
extension of GLMs relaxes the linearity assumption and allows a
nonparametric or a flexible parametric specification of the relation
between the mean response and continuous covariates: the publication
of the book by Hastie and Tibshirani (1990) generated a lot of
research about generalized additive models.
∗ Institut des sciences humaines et sociales, Méthodes quantitatives
en sciences sociales, Université de Liège, Liège, Belgium. Email:
[email protected].
† Institut de statistique, biostatistique et sciences actuarielles
(ISBA), Université catholique de Louvain, Louvain-la-Neuve, Belgium.
Here, we shall focus on the location-scale model and extend it
using some of the preceding principles. If Y is a continuous
response and X a set of covariates in R^p, the location-scale model
assumes that
Y = µ(X) + σ(X)ε (1.1)
where ε is independent of X, µ(X) denotes the unknown regression
surface and σ(X) allows departures from the homoskedastic case. Most
often, one further imposes that E(ε) = 0 and V(ε) = 1 such that µ(X)
and σ²(X) are respectively the conditional mean and variance of Y
given X.
Lambert and Lindsey (1999) jointly model the location, dispersion,
skewness and kurtosis of a continuous response by relating the four
parameters of a stable distribution to covariates. That idea was
further investigated by Rigby and Stasinopoulos (2001) by replacing
the stable distribution with other flexible parametric conditional
distributions and regression models by additive ones. Accelerated
failure time (AFT) models (see e.g. Collett, 1994) in survival
analysis also rely on Eq. (1.1) with log-transformed data.
Explanatory variables are assumed to act multiplicatively on the
time-scale and, for example, to affect the progression of a disease.
An AFT model can be extended to include random effects (Lambert et
al., 2004), similarly to the inclusion of frailty (Vaupel et al.,
1979) in the Cox proportional hazards model, or to have a flexible
conditional distribution (Komárek et al., 2005). Joint models for
location and scale were also studied in the nonparametric
literature. Hsieh (1996) proposes an empirical process approach to
compare two samples of right-censored survival times modelled using
an accelerated failure time model with an arbitrary distribution
form. Starting from Beran's estimate F̃(y|x) of the conditional
distribution of Y given X, Van Keilegom and Akritas (1999) estimate
µ(X) and σ(X) consistently from right-censored data by discarding
the right tail of F̃(y|x).
The goal of our research is to estimate the location and dispersion
of the conditional distribution of Y in Eq. (1.1) using additive
models when the distribution of ε has an unknown smooth density and
when Y is possibly interval-censored. The plan of the paper is as
follows. After this short introduction, Section 2 recalls how a
smooth univariate density can be modelled using penalized B-splines
(Eilers and Marx, 1996; Lambert and Eilers, 2006, 2009). Section 3
combines the flexible specification for the conditional density with
joint additive models for location and dispersion. The roughness
penalties and the identification constraint for the pivotal
distribution are described in Section 4. Strategies for a joint
estimation of the parameters involved in the three levels of the
model are proposed in Section 5 together with an algorithm and
practical recommendations based on our experience. An extension of
the model to interval censored data is presented in Section 6. After
a fit of our location-scale additive model to simulated data and an
application to the analysis of interval censored income in Sections
7 and 8, we conclude the paper with a discussion in Section 9.
2 Smooth specification of the conditional distribution
Assume that one wants to specify how the conditional distribution
of a single continuous random variable $Y$ changes with a set of
continuous covariates $X = (X_1, \ldots, X_p)' \in \mathbb{R}^p$.
Denote by $f_{Y|X}(y|x)$ the corresponding conditional density.
Further assume that, conditionally on $X = x$, the distribution of
$\varepsilon = (Y - \mu_x)/\sigma_x$ does not depend on $x$ for
suitably specified conditional location and dispersion parameters,
$\mu_x$ and $\sigma_x$. One can find an interval $S = (s_1, s_2)$
such that the support of $\varepsilon$ is most likely included in
it. For example, if $\mu_x$ and $\sigma_x^2$ denote the conditional
mean and variance, then, by Chebyshev's theorem, one knows that
$\Pr(|\varepsilon| \le k) \ge 1 - k^{-2}$ for $k > 0$ (about 0.97
for $k = 6$). In most practical situations, that probability is far
above that lower bound. Therefore, taking $s_2 = -s_1 = 6$ (say)
usually provides satisfactory results. Of course, based on
contextual information (such as positive skewness of the response)
or historical data, more ad-hoc choices could be made.
Consider now a cubic B-spline basis $\{b_k(\cdot) : k = 1, \ldots,
K\}$ associated with a large number (20, say) of equidistant knots
on $S$. Let $\{\mathcal{J}_j : j = 1, \ldots, J\}$ be a partition of
$S$ into a large number (100, say) of consecutive bins of equal
width $\Delta$ with midpoints $\{u_j\}_{j=1}^{J}$. Let $[B]_{jk} =
b_k(u_j)$ be the $J \times K$ matrix of cubic B-splines evaluated at
the bin midpoints $u_j$ ($j = 1, \ldots, J$). Then, the probability
to observe $\varepsilon$ in $\mathcal{J}_j$ could be specified as
\[ \int_{\mathcal{J}_j} f_\varepsilon(e)\,de = \pi_j =
\frac{\exp([B\phi^*]_j)}{\sum_{\ell=1}^{J} \exp([B\phi^*]_\ell)}
\approx f_\varepsilon(u_j)\,\Delta \tag{2.1} \]
where $\phi^* = (\phi_1^*, \ldots, \phi_K^*)' \in \mathbb{R}^K$ is a
vector of spline parameters.
As $\pi_j(\phi^* + c) = \pi_j(\phi^*)$ for any scalar $c$, one
should constrain $\phi^*$ for identifiability. We suggest working
with $\phi$,
\[ \phi_k = \phi_k^* - \log \sum_{j=1}^{J} \exp([B\phi^*]_j), \]
such that
\[ \pi_j = \exp([B\phi]_j). \]
For more details on that strategy for estimating a smooth density,
we refer to Eilers and Marx (1996), Lambert and Eilers (2006, 2009)
and to Sections 4 and 5 for aspects specific to the additive
location-scale model.
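The construction of Eq. (2.1) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (cubic_bspline_row, bin_probabilities, normalise), the choice of 17 knot intervals (which gives K = 20 basis functions) and the values of phi_star are all hypothetical and not taken from the paper. The sketch checks numerically that the bin probabilities are invariant to a shift of the spline parameters and that the normalised parameters reproduce them through exp([Bφ]_j), which relies on the rows of B summing to one.

```python
import math

def cubic_bspline_row(x, s1, s2, n_seg, degree=3):
    """Values at x of the n_seg + degree B-splines on equidistant knots over [s1, s2)."""
    h = (s2 - s1) / n_seg
    knots = [s1 + (i - degree) * h for i in range(n_seg + 2 * degree + 1)]
    # Degree 0: indicator of the knot interval that contains x.
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(len(knots) - 1)]
    # Cox-de Boor recursion; with equidistant knots every denominator is d * h.
    for d in range(1, degree + 1):
        b = [((x - knots[i]) * b[i] + (knots[i + d + 1] - x) * b[i + 1]) / (d * h)
             for i in range(len(b) - 1)]
    return b

def linear_predictor(B, phi):
    return [sum(b * p for b, p in zip(row, phi)) for row in B]

def bin_probabilities(B, phi_star):
    """pi_j of Eq. (2.1), computed with the usual max-shift for numerical stability."""
    eta = linear_predictor(B, phi_star)
    top = max(eta)
    w = [math.exp(e - top) for e in eta]
    total = sum(w)
    return [v / total for v in w]

def normalise(B, phi_star):
    """phi_k = phi*_k - log sum_j exp([B phi*]_j)."""
    eta = linear_predictor(B, phi_star)
    top = max(eta)
    log_sum = top + math.log(sum(math.exp(e - top) for e in eta))
    return [p - log_sum for p in phi_star]

# Hypothetical setting: S = (-6, 6), J = 100 bins, 17 knot intervals (K = 20).
s1, s2, J, n_seg = -6.0, 6.0, 100, 17
delta = (s2 - s1) / J
u = [s1 + (j + 0.5) * delta for j in range(J)]          # bin midpoints
B = [cubic_bspline_row(x, s1, s2, n_seg) for x in u]    # J x K matrix
K = len(B[0])
phi_star = [-0.5 * ((k - (K - 1) / 2) / 2.0) ** 2 for k in range(K)]  # arbitrary values

pi = bin_probabilities(B, phi_star)
pi_shifted = bin_probabilities(B, [p + 3.7 for p in phi_star])
phi = normalise(B, phi_star)
```

An estimate of the density at the midpoints then follows as pi[j] / delta.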
3 Additive models for location and dispersion
Denote by µ(x) and σ(x) the conditional location and dispersion of Y
given x. Depending on the modelling purposes, µ(x) could be the
conditional mean, median or even mode of the response. Likewise,
σ(x) could be the conditional standard deviation, interquartile
range or median absolute deviation, etc.
Assume the following additive model for the conditional location of
$Y_i$ given $(x_i^\mu, z_i^\mu)$:
\[ \mu(x_i^\mu, z_i^\mu) = [\mu(x^\mu, z^\mu)]_i =
\sum_{j=1}^{J_1} f_j^\mu(x_{ij}^\mu) + \Big(\beta_0^\mu +
\sum_{j=1}^{p_1} \beta_j^\mu z_{ij}^\mu\Big), \tag{3.1} \]
where $i$ ($i = 1, \ldots, n$) indexes the units of observation,
$z_i^\mu = (z_{i1}^\mu, \ldots, z_{ip_1}^\mu)'$ is a set of $p_1$
(mostly) categorical covariates, $\beta^\mu = (\beta_0^\mu, \ldots,
\beta_{p_1}^\mu)'$ the vector of regression parameters and $x_i^\mu
= (x_{i1}^\mu, \ldots, x_{iJ_1}^\mu)'$ a set of $J_1$ continuous
covariates taking values in (0, 1). The last constraint is not a
restriction as any of the original covariates can be relocated and
rescaled to meet that requirement.
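Similarly, consider the following additive model for the dispersion
of $Y_i$ given $(x_i^\sigma, z_i^\sigma)$:
\[ \log \sigma(x_i^\sigma, z_i^\sigma) = \sum_{j=1}^{J_2}
f_j^\sigma(x_{ij}^\sigma) + \Big(\beta_0^\sigma + \sum_{j=1}^{p_2}
\beta_j^\sigma z_{ij}^\sigma\Big). \tag{3.2} \]
Provided that these are smooth, the functional forms in
$\mu(x_i^\mu, z_i^\mu)$ and $\sigma(x_i^\sigma, z_i^\sigma)$ can be
approximated using a linear combination of the elements of a (large)
B-spline basis $\{s_l(\cdot) : l = 1, \ldots, L\}$ on (0, 1) (see
Brezger and Lang, 2006, in a GLM setting):
\[ f_j^\mu(x_{ij}^\mu) = \sum_{l=1}^{L} s_l(x_{ij}^\mu)\,
\theta_{lj}^\mu ; \qquad f_j^\sigma(x_{ij}^\sigma) = \sum_{l=1}^{L}
s_l(x_{ij}^\sigma)\, \theta_{lj}^\sigma . \]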
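Denote by $\Theta^\mu = \{\theta_{lj}^\mu\} = [\theta_1^\mu, \ldots,
\theta_{J_1}^\mu]$ and $\Theta^\sigma = \{\theta_{lj}^\sigma\} =
[\theta_1^\sigma, \ldots, \theta_{J_2}^\sigma]$ the $L \times J_1$
and $L \times J_2$ matrices of spline coefficients for the additive
terms in the location and dispersion models, by $\{S_{j_1}^\mu\}_{il}
= s_l(x_{ij_1}^\mu)$ and $\{S_{j_2}^\sigma\}_{il} =
s_l(x_{ij_2}^\sigma)$ the $n \times L$ B-spline matrices for the
$j_1$th and $j_2$th additive terms in the location and dispersion
models respectively, and by $Z^\mu$ and $Z^\sigma$ the $n \times (1
+ p_1)$ and $n \times (1 + p_2)$ design matrices for the linear part
in (3.1) and (3.2) with regression parameters $\beta^\mu =
(\beta_0^\mu, \ldots, \beta_{p_1}^\mu)'$ and $\beta^\sigma =
(\beta_0^\sigma, \ldots, \beta_{p_2}^\sigma)'$. Equations (3.1) and
(3.2) can be rewritten as
\[ \mu(x^\mu, z^\mu) = [S_1^\mu \ldots S_{J_1}^\mu]\,
\mathrm{vec}(\Theta^\mu) + Z^\mu \beta^\mu \tag{3.3} \]
\[ \log \sigma(x^\sigma, z^\sigma) = [S_1^\sigma \ldots
S_{J_2}^\sigma]\, \mathrm{vec}(\Theta^\sigma) + Z^\sigma
\beta^\sigma \tag{3.4} \]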
4 Penalties and identification constraint
4.1 Roughness penalty
The flexibility provided by the large numbers of B-splines, $L$ for
the additive terms and $K$ for the pivotal distribution
$f_\varepsilon(\cdot)$, can be counterbalanced by a roughness
penalty in a frequentist setting (Eilers and Marx, 1996) or a
suitable prior in a Bayesian framework (Lang and Brezger, 2004;
Jullion and Lambert, 2007). The chosen order for the penalty will
depend on the desired limiting behavior of the functionals for large
values of the penalty. We suggest working with
• a 3rd order penalty for the pivotal distribution, yielding a
normal distribution at the limit when the penalty parameter
$\tau_\phi$ tends to $+\infty$:
\[ \mathrm{pen}_\phi(\tau_\phi, \phi) = -0.5\, \tau_\phi \sum_k
(\phi_k - 3\phi_{k-1} + 3\phi_{k-2} - \phi_{k-3})^2 = -0.5\,
\tau_\phi (D^\phi \phi)' D^\phi \phi ; \]
• a 2nd order penalty for each additive term in the location model,
yielding a linear regression model at the limit for the $j$th
additive term ($j = 1, \ldots, J_1$) when the penalty parameter
$\tau_j^\mu$ tends to $+\infty$:
\[ \mathrm{pen}_j^\mu(\tau_j^\mu, \theta_j^\mu) = -0.5\, \tau_j^\mu
\sum_l (\theta_{lj}^\mu - 2\theta_{l-1,j}^\mu +
\theta_{l-2,j}^\mu)^2 = -0.5\, \tau_j^\mu (D^\mu \theta_j^\mu)'
D^\mu \theta_j^\mu ; \]
• a 1st order penalty for each additive term in the dispersion
model, yielding at the limit a conditionally homoskedastic model
with respect to the $j$th continuous covariate ($j = 1, \ldots,
J_2$) when the penalty parameter $\tau_j^\sigma$ tends to $+\infty$:
\[ \mathrm{pen}_j^\sigma(\tau_j^\sigma, \theta_j^\sigma) = -0.5\,
\tau_j^\sigma \sum_l (\theta_{lj}^\sigma - \theta_{l-1,j}^\sigma)^2
= -0.5\, \tau_j^\sigma (D^\sigma \theta_j^\sigma)' D^\sigma
\theta_j^\sigma . \]
In Bayesian terms, these penalties translate into prior
distributions on the spline coefficients,
\[ p(\phi|\tau_\phi) \propto (\tau_\phi)^{K/2} \exp(-0.5\,
\tau_\phi\, \phi' P^\phi \phi), \tag{4.1} \]
\[ p(\theta_j^\mu|\tau_j^\mu) \propto (\tau_j^\mu)^{L/2} \exp(-0.5\,
\tau_j^\mu\, (\theta_j^\mu)' P^\mu \theta_j^\mu), \quad 1 \le j \le
J_1, \tag{4.2} \]
\[ p(\theta_j^\sigma|\tau_j^\sigma) \propto (\tau_j^\sigma)^{L/2}
\exp(-0.5\, \tau_j^\sigma\, (\theta_j^\sigma)' P^\sigma
\theta_j^\sigma), \quad 1 \le j \le J_2, \tag{4.3} \]
where
\[ P^\phi = (D^\phi)' D^\phi + \epsilon I_K ; \quad P^\mu =
(D^\mu)' D^\mu + \epsilon I_L ; \quad P^\sigma = (D^\sigma)'
D^\sigma + \epsilon I_L \]
are full-rank matrices thanks to the addition of a small multiple
$\epsilon$ ($10^{-6}$, say) of the identity matrix.
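The limiting behaviour attached to each penalty order can be checked on toy coefficient vectors. The sketch below is illustrative only: the function names and the test vectors are hypothetical. It shows that constant coefficients are left unpenalised at order 1, linear ones at order 2, and quadratic ones at order 3 (a quadratic log-density being the normal case).

```python
def differences(coef, order):
    """Apply the difference operator `order` times to a coefficient vector."""
    d = list(coef)
    for _ in range(order):
        d = [b - a for a, b in zip(d, d[1:])]
    return d

def roughness_penalty(coef, tau, order):
    """-0.5 * tau * (D coef)'(D coef), with D the difference matrix of the given order."""
    return -0.5 * tau * sum(v * v for v in differences(coef, order))

# Toy coefficient vectors (hypothetical values).
constant = [2.0] * 8                                   # homoskedastic limit, order 1
linear = [1.0 + 0.5 * l for l in range(8)]             # linear limit, order 2
quadratic = [-0.5 * (k - 3.5) ** 2 for k in range(8)]  # normal limit, order 3

p_const = roughness_penalty(constant, 10.0, 1)
p_lin = roughness_penalty(linear, 10.0, 2)
p_quad = roughness_penalty(quadratic, 10.0, 3)
p_lin_order1 = roughness_penalty(linear, 10.0, 1)      # linear trend is penalised at order 1
```

Any departure from the limiting shape makes the penalty strictly negative, and increasingly so as tau grows.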
4.2 Likelihood and identification penalty
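Given $\psi = (\beta^\mu, \Theta^\mu, \beta^\sigma, \Theta^\sigma)$,
one can associate to each observation,
$\{(x_i^\mu, z_i^\mu), (x_i^\sigma, z_i^\sigma), y_i\}$,
a quantity $\varepsilon_i(\psi)$ such that
\[ \varepsilon_i(\psi) = \frac{Y_i - \mu(x_i^\mu,
z_i^\mu)}{\sigma(x_i^\sigma, z_i^\sigma)} . \]
If, for given $\psi$, one denotes by $n_j = n_j(\psi)$ ($j = 1,
\ldots, J$) the number of observed $\varepsilon_i(\psi)$'s ($i = 1,
\ldots, n$) belonging to bin $\mathcal{J}_j$, then the conditional
joint distribution of $(N_1(\psi), \ldots, N_J(\psi))$ is
multinomial,
\[ (N_1(\psi), \ldots, N_J(\psi)\,|\,\phi) \sim \mathrm{Mult}(n;
\pi_1(\phi), \ldots, \pi_J(\phi)), \tag{4.4} \]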
for values of $\pi_j(\phi)$ given by Eq. (2.1). Therefore, the
log-likelihood will be
\[ \log L(\psi, \phi|\mathcal{D}) = \sum_{j=1}^{J} n_j(\psi) \log
\pi_j(\phi), \]
where $\mathcal{D}$ stands for the available data. Extra constraints
should be expressed on the spline coefficients $\phi$ involved in
the pivotal distribution to force the desired interpretation for
$\mu_i = \mu(x_i^\mu, z_i^\mu)$ and $\sigma_i = \sigma(x_i^\sigma,
z_i^\sigma)$. For example, to interpret $\mu_i$ and $\sigma_i$ as
the conditional mean and standard deviation of $Y_i$, one should
make sure that the mean and variance of $\varepsilon_i$ are 0 and 1
respectively. Using Eq. (2.1), one has
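\[ E[\varepsilon_i|\phi] \approx \mu_{\varepsilon\phi} =
\sum_{j=1}^{J} \pi_j(\phi)\, u_j ; \qquad V[\varepsilon_i|\phi]
\approx \sigma_{\varepsilon\phi}^2 = \sum_{j=1}^{J} \pi_j(\phi)\,
u_j^2 - \mu_{\varepsilon\phi}^2 , \]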
where $u_j$ is the midpoint of bin $\mathcal{J}_j$. Therefore, one
could add a large identifiability penalty to the log-likelihood,
\[ \mathrm{pen}_{id} = -\kappa \left\{ \mu_{\varepsilon\phi}^2 +
(\sigma_{\varepsilon\phi}^2 - 1)^2 \right\}, \tag{4.5} \]
to force mean 0 and variance 1 on the fitted pivotal distribution
when $\kappa$ tends to $+\infty$. This suggests working with the
penalized log-likelihood
\[ \log L_{pen}(\psi, \phi|\mathrm{data}) = \sum_{j=1}^{J}
n_j(\psi) \log \pi_j(\phi) + \mathrm{pen}_{id}. \tag{4.6} \]
Similar expressions can be obtained for other interpretations of
$\mu_i$ and $\sigma_i$, see e.g. Section 8 where they stand for the
conditional median and inter-quartile range, respectively.
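A minimal sketch of the penalized log-likelihood (4.6) for the mean and variance interpretation follows. It is illustrative only: the function names, the clamping of standardized residuals that fall outside S to the end bins, and the toy residuals are assumptions of the sketch. The moments are computed as the π-weighted sums of u_j and u_j², which is what the approximation amounts to when the π_j sum to one.

```python
import math

def bin_counts(eps, s1, s2, J):
    """n_j: number of standardized residuals falling in each of J equal-width bins."""
    width = (s2 - s1) / J
    counts = [0] * J
    for e in eps:
        j = min(max(int((e - s1) / width), 0), J - 1)  # assumption: clamp to end bins
        counts[j] += 1
    return counts

def identification_penalty(pi, u, kappa):
    """pen_id = -kappa * {mean^2 + (variance - 1)^2} of the binned pivotal distribution."""
    mean = sum(p * x for p, x in zip(pi, u))
    var = sum(p * x * x for p, x in zip(pi, u)) - mean ** 2
    return -kappa * (mean ** 2 + (var - 1.0) ** 2)

def penalised_loglik(counts, pi, u, kappa):
    """sum_j n_j log pi_j + pen_id."""
    loglik = sum(n * math.log(p) for n, p in zip(counts, pi) if n > 0)
    return loglik + identification_penalty(pi, u, kappa)

# Toy setting: S = (-6, 6), J = 100, pi from a discretised standard normal.
s1, s2, J = -6.0, 6.0, 100
width = (s2 - s1) / J
u = [s1 + (j + 0.5) * width for j in range(J)]
raw = [math.exp(-0.5 * x * x) for x in u]
pi = [r / sum(raw) for r in raw]
raw_shifted = [math.exp(-0.5 * (x - 1.0) ** 2) for x in u]
pi_shifted = [r / sum(raw_shifted) for r in raw_shifted]

eps = [-5.99, 0.01, 0.02, 5.99]          # hypothetical standardized residuals
counts = bin_counts(eps, s1, s2, J)
value = penalised_loglik(counts, pi, u, kappa=100.0)
```

With the standard normal probabilities the penalty is numerically zero, whereas the shifted version (mean close to 1) is penalised by roughly κ.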
5 Bayesian inference
5.1 Joint and conditional posteriors
The full Bayesian model is given by (4.4), (4.1), (4.2), (4.3) and
large variance prior distributions for $\beta^\mu$, $\beta^\sigma$
and the penalty coefficients
\[ \tau_\phi,\ \tau_{j_1}^\mu,\ \tau_{j_2}^\sigma \sim
\mathrm{Exp}(b = 10^{-6}) \quad \text{with } 1 \le j_1 \le J_1,\ 1
\le j_2 \le J_2, \tag{5.1} \]
where Exp($b$) denotes an exponential distribution with mean
$b^{-1}$. Hence, the log of the joint posterior is simply the sum of
(4.6), of the logarithms of (4.1-4.3) and of
\[ -b \Big( \tau_\phi + \sum_{j_1=1}^{J_1} \tau_{j_1}^\mu +
\sum_{j_2=1}^{J_2} \tau_{j_2}^\sigma \Big). \]
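The conditional posterior distributions for the penalty parameters
can be shown to be
\[ (\tau_\phi|\phi, \mathcal{D}) \sim \mathcal{G}(1 + 0.5K,\ b +
0.5\, \phi' P^\phi \phi), \tag{5.2} \]
\[ (\tau_{j_1}^\mu|\theta_{j_1}^\mu, \mathcal{D}) \sim
\mathcal{G}(1 + 0.5L,\ b + 0.5\, (\theta_{j_1}^\mu)' P^\mu
\theta_{j_1}^\mu) \quad \text{with } 1 \le j_1 \le J_1, \tag{5.3} \]
\[ (\tau_{j_2}^\sigma|\theta_{j_2}^\sigma, \mathcal{D}) \sim
\mathcal{G}(1 + 0.5L,\ b + 0.5\, (\theta_{j_2}^\sigma)' P^\sigma
\theta_{j_2}^\sigma) \quad \text{with } 1 \le j_2 \le J_2. \tag{5.4}
\]
Unfortunately, the conditional posteriors for $\theta$ and $\phi$
are not of a familiar type. Therefore, the Metropolis-within-Gibbs
algorithm is used to sample from the joint posterior.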
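The Gibbs draw for a penalty parameter can be sketched as follows, reading the second argument of G(·, ·) as a rate, which is what the conjugate update of an exponential prior with mean 1/b implies. The function names and the coefficient values are hypothetical. Note that Python's random.gammavariate takes a shape and a scale, so the rate must be inverted.

```python
import random

def quadratic_form(coef, order, eps=1e-6):
    """coef' P coef with P = D'D + eps * I and D the difference matrix of the given order."""
    d = list(coef)
    for _ in range(order):
        d = [b - a for a, b in zip(d, d[1:])]
    return sum(v * v for v in d) + eps * sum(c * c for c in coef)

def draw_penalty(coef, order, b, rng):
    """One draw of tau from Gamma(shape = 1 + len(coef)/2, rate = b + coef' P coef / 2)."""
    shape = 1.0 + 0.5 * len(coef)
    rate = b + 0.5 * quadratic_form(coef, order)
    return rng.gammavariate(shape, 1.0 / rate)   # gammavariate expects a scale

rng = random.Random(1)
theta = [0.0, 0.3, 0.1, -0.2, 0.4, 0.0, -0.1, 0.2, 0.5, 0.1]  # hypothetical spline coefficients
b = 1e-6
draws = [draw_penalty(theta, order=2, b=b, rng=rng) for _ in range(20000)]
expected_mean = (1.0 + 0.5 * len(theta)) / (b + 0.5 * quadratic_form(theta, 2))
```

The Monte Carlo average of the draws should be close to expected_mean, the shape divided by the rate.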
5.2 MCMC algorithm
For the algorithm to work without extra tuning on specific
examples, we found that the data needed to be standardized in a
first step (see Lang and Brezger, 2004, for a similar
recommendation). The response was first relocated and rescaled using
the sample mean and standard deviation of the observed $y_i$'s. The
continuous covariates were also relocated and rescaled to take
values in (0, 1).
5.2.1 Reference algorithm
The sampling algorithm is the following. At iteration $m$, given
the state of the chain at the end of the previous iteration:
1. Univariate Metropolis steps for $\theta_{j_1}^\mu$ in
$f_{j_1}^\mu(\cdot)$:
Let $\varsigma_G^\mu$ denote the variance of the partial residuals
$\{y_i - (\beta_0^\mu + \sum_{j=1}^{p_1} \beta_j^\mu z_{ij}^\mu) : i
= 1, \ldots, n\}$ at the start of iteration $m$. Let $Q_{j_1}^\mu =
\frac{1}{\varsigma_G^\mu} (S_{j_1}^\mu)' S_{j_1}^\mu + P^\mu +
\epsilon I_L$ and let $L_{j_1}^\mu (L_{j_1}^\mu)'$ be the Cholesky
decomposition of that matrix. Denote by $\vartheta_0^{(m)}$ the
value of $\theta_{j_1}^\mu$ at the start of iteration $m$.
For $\ell$ from 1 to $L$:
– Generate $\vartheta = \vartheta_{\ell-1}^{(m)} +
\sigma_{j_1,\ell}^\mu\, z\, L_{j_1}^\mu e_\ell$ where $e_\ell$ is
the $\ell$th unit vector of length $L$, $z \sim \mathcal{N}(0, 1)$
and $\sigma_{j_1,\ell}^\mu$ is tuned during the burnin period
(Haario et al., 2001) to ensure the desired acceptance rate (40%
say). For identifiability of the additive model, recenter
$\vartheta$ to ensure that $\int_0^1 f_{j_1}^\mu(x|\vartheta)\,dx =
1$.
– Conditionally on the other parameters, accept (and set
$\vartheta_\ell^{(m)} = \vartheta$) or reject (and set
$\vartheta_\ell^{(m)} = \vartheta_{\ell-1}^{(m)}$) that proposal in
a Metropolis step.
The value of $\theta_{j_1}^\mu$ at the end of iteration $m$ is
$\vartheta_L^{(m)}$.
2. Univariate Metropolis steps for $\beta^\mu$:
Let $\varsigma_F^\mu$ denote the variance of the partial residuals
$\{y_i - \sum_{j=1}^{J_1} f_j^\mu(x_{ij}^\mu) : i = 1, \ldots, n\}$
for values of the spline parameters given by step 1. Let $Q^\mu =
\frac{1}{\varsigma_F^\mu} (Z^\mu)' Z^\mu$ and let $L^\mu (L^\mu)'$
be the Cholesky decomposition of that matrix. Denote by
$b_0^{(m)}$ the value of $\beta^\mu$ at the start of iteration $m$.
For $\ell$ from 1 to $p_1 + 1$:
– Generate $b = b_{\ell-1}^{(m)} + \sigma_{F,\ell}^\mu\, z\, L^\mu
e_\ell$ where $e_\ell$ is the $\ell$th unit vector of length $p_1 +
1$, $z \sim \mathcal{N}(0, 1)$ and $\sigma_{F,\ell}^\mu$ is tuned
during the burnin period to ensure the desired acceptance rate (40%
say).
– Conditionally on the other parameters, accept (and set
$b_\ell^{(m)} = b$) or reject (and set $b_\ell^{(m)} =
b_{\ell-1}^{(m)}$) that proposal in a Metropolis step.
The value of $\beta^\mu$ at the end of iteration $m$ is
$b_{p_1+1}^{(m)}$.
3. Univariate Metropolis steps for $\theta_{j_2}^\sigma$ in
$f_{j_2}^\sigma(\cdot)$:
Let $Q_{j_2}^\sigma = (S_{j_2}^\sigma)' S_{j_2}^\sigma + P^\sigma +
\epsilon I_L$ and let $L_{j_2}^\sigma (L_{j_2}^\sigma)'$ be the
Cholesky decomposition of that matrix. Proceed as in Step 1 with
$Q_{j_2}^\sigma$ and $L_{j_2}^\sigma$ substituted for $Q_{j_1}^\mu$
and $L_{j_1}^\mu$ respectively, $\vartheta_0^{(m)}$ set to the value
of $\theta_{j_2}^\sigma$ at the start of iteration $m$ and specific
tuning parameters $\sigma_{j_2,\ell}^\sigma$ to reach the desired
acceptance rate (40% say). The value of $\theta_{j_2}^\sigma$ at the
end of iteration $m$ is $\vartheta_L^{(m)}$.
4. Univariate Metropolis steps for $\beta^\sigma$:
Let $\varsigma_F^\sigma$ denote the variance of $\{\log |y_i -
\sum_{j=1}^{J_1} f_j^\mu(x_{ij}^\mu)| : i = 1, \ldots, n\}$ for
values of the spline parameters given by step 1. Let $Q^\sigma =
\frac{1}{\varsigma_F^\sigma} (Z^\sigma)' Z^\sigma$ and let $L^\sigma
(L^\sigma)'$ be the Cholesky decomposition of that matrix. Denote by
$b_0^{(m)}$ the value of $\beta^\sigma$ at the start of iteration
$m$. Proceed as in Step 2 with $p_2$, $L^\sigma$ and
$\varsigma_F^\sigma$ substituted for $p_1$, $L^\mu$ and
$\varsigma_F^\mu$ respectively. The value of $\beta^\sigma$ at the
end of iteration $m$ is $b_{p_2+1}^{(m)}$.
5. Univariate Metropolis steps for the spline parameters $\phi$ in
the density:
Let $Q^\phi = B'B\sigma + P^\phi + \epsilon I_K$ and let $L^\phi
(L^\phi)'$ be the Cholesky decomposition of that matrix. Proceed as
in Step 1 with $Q^\phi$ and $L^\phi$ substituted for $Q_{j_1}^\mu$
and $L_{j_1}^\mu$ respectively, $\vartheta_0^{(m)}$ set to the value
of $\phi$ at the start of iteration $m$ and specific tuning
parameters $\sigma_\ell^\phi$ to reach the desired acceptance rate
(40% say). The value of $\phi$ at the end of iteration $m$ is
$\vartheta_K^{(m)}$.
6. Gibbs step for the penalty parameters:
Generate $\tau_\phi$, $\tau_{j_1}^\mu$ ($1 \le j_1 \le J_1$) and
$\tau_{j_2}^\sigma$ ($1 \le j_2 \le J_2$) for iteration $m$ from Eq.
(5.2-5.4) with spline parameters set at their values from the
previous steps.
From the generated chain, one can build point estimates and
credible regions for the spline parameters and any derived quantity.
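The univariate Metropolis updates along fixed directions, with step sizes tuned during burnin, can be illustrated on a toy target. Everything in this sketch is an assumption made for illustration: the target is a two-dimensional standard normal rather than the model posterior, the directions are unit vectors (standing in for the columns of a Cholesky factor), and the multiplicative tuning rule in tune() is a simple hypothetical choice, not the rule of Haario et al. (2001).

```python
import math
import random

def metropolis_sweep(theta, directions, scales, log_post, rng):
    """One pass of univariate random-walk Metropolis updates, one per direction."""
    current = list(theta)
    current_lp = log_post(current)
    accepted = []
    for direction, scale in zip(directions, scales):
        z = rng.gauss(0.0, 1.0)
        proposal = [c + scale * z * d for c, d in zip(current, direction)]
        proposal_lp = log_post(proposal)
        log_ratio = proposal_lp - current_lp
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            current, current_lp = proposal, proposal_lp
            accepted.append(1)
        else:
            accepted.append(0)
    return current, accepted

def tune(scale, acceptance_rate, target=0.4):
    """Hypothetical tuning rule: enlarge the step when accepting too often, shrink otherwise."""
    return scale * math.exp(acceptance_rate - target)

def log_post(theta):
    """Toy target: independent standard normals (up to a constant)."""
    return -0.5 * sum(t * t for t in theta)

rng = random.Random(2010)
directions = [[1.0, 0.0], [0.0, 1.0]]
scales = [1.0, 1.0]
theta = [3.0, -3.0]

# Burnin: adapt the step sizes in batches of 50 sweeps.
for _ in range(100):
    hits = [0, 0]
    for _ in range(50):
        theta, acc = metropolis_sweep(theta, directions, scales, log_post, rng)
        hits = [h + a for h, a in zip(hits, acc)]
    scales = [tune(s, h / 50) for s, h in zip(scales, hits)]

# Main run with the step sizes kept fixed.
chain, hits = [], [0, 0]
for _ in range(5000):
    theta, acc = metropolis_sweep(theta, directions, scales, log_post, rng)
    chain.append(theta)
    hits = [h + a for h, a in zip(hits, acc)]
rates = [h / 5000 for h in hits]
means = [sum(t[k] for t in chain) / len(chain) for k in range(2)]
```

Freezing the step sizes after burnin keeps the main run a valid Metropolis chain; the acceptance rates in rates should then sit near the 40% target.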
5.2.2 Starting values
Starting values for the chain can be obtained by taking values of
$\phi$ corresponding to a standard normal distribution. The starting
values for the other parameters were set as follows:
– $[\beta^\mu]_1$ and $[\beta^\sigma]_1$ were set to the associated
descriptive statistics for the observed response. For example, if
one models the median and the log interquartile range (IQR) of the
response, then the initial values are simply taken to be the sample
median and log sample IQR of $\{y_i : i = 1, \ldots, n\}$.
– The other regression parameters and the spline parameters involved
in the additive parts of the model were all set equal to 0.
– The penalty parameters were set equal to a small value
($10^{-1}$, say).
5.3 Truncation of the penalty priors
It was found in examples that the chains for the penalty parameters
related to some additive components could, on rare occasions,
produce long series of very large values for $\tau_{j_1}^\mu$ or
$\tau_{j_2}^\sigma$, corresponding to a strong penalization of
conditional nonlinear or non-constant behaviors for $f_{j_1}^\mu$ or
$f_{j_2}^\sigma$, as encouraged by a large 2nd or 1st order penalty
for location and dispersion, respectively. This is not surprising
as, e.g., a large value for $\tau_{j_1}^\mu$ at iteration $m$ tends
to produce at iteration $m + 1$ a vector $\theta_{j_1}^\mu$
corresponding to a nearly linear $f_{j_1}^\mu(\cdot)$ functional,
which in turn tends to produce a large $\tau_{j_1}^\mu$ at iteration
$m + 1$ through Eq. (5.3).
While a very large ($10^6$, say) and a large ($10^3$, say) value for
$\tau_{j_1}^\mu$ at iteration $m$ both generate at iteration $m + 1$
spline parameters suggesting linearity of the functionals
$f_{j_1}^\mu(\cdot)$, the chain for $\tau_{j_1}^\mu$ tends to stay
located at the right tail of the posterior distribution of the
penalty during a large number of iterations in the first case but
not in the second. Therefore, the priors for the penalty parameters
in Eq. (5.1) were truncated to $(0, \tau_{\max})$ where
$\tau_{\max}$ denotes a large value (say $10^3$) for the penalty.
5.4 The identification penalty
The identification penalty in Eq. (4.5) ensures that the estimated
pivotal distribution has location 0 and dispersion 1 when $\kappa$
tends to $+\infty$. While that strategy works when computing maximum
(penalized) likelihood estimates, it has a strong negative impact on
the mixing of the chains generated by the MCMC algorithm. Therefore,
we advocate the following strategy:
1. Set $\kappa$ equal to a moderate value (100, say).
2. Run the Metropolis-within-Gibbs algorithm described above.
3. Discard the burnin iterations, yielding after convergence final
chains of length $M$. For $m$ from 1 to $M$, using the values of the
parameters at iteration $m$:
– Define the density $g(\cdot|\phi)$ at the bin midpoints $u_j$ by
\[ g(u_j|\phi) = \frac{1}{\Delta}\,
\frac{\exp([B\phi]_j)}{\sum_{\ell=1}^{J} \exp([B\phi]_\ell)} . \]
– Compute the location $\lambda$ and dispersion $\delta$ of
$g(\cdot|\phi)$.
– Set $\beta_0^\mu \leftarrow \beta_0^\mu + \lambda$ and
$\beta_0^\sigma \leftarrow \beta_0^\sigma + \log \delta$.
– The pivotal distribution with location 0 and dispersion 1
corresponds to the density $f_\varepsilon$ given by
\[ g(\cdot|\phi) = \frac{1}{\delta}\, f_\varepsilon\!\left(
\frac{\cdot - \lambda}{\delta} \,\Big|\, \phi \right). \]
– Recompute $\phi$ such that $f_\varepsilon(u_j|\phi)\,\Delta =
\exp([B\phi]_j)$ for $1 \le j \le J$.
After these corrections to the reference algorithm (see Section
5.2.1), our experience on different examples showed that trace plots
suggest convergence after just a few thousand iterations.
6 Extension to interval censored data
Assume now that the response data take the form of intervals
$\{(y_i^L, y_i^U) : i = 1, \ldots, n\}$. Then, these intervals can
be relocated and rescaled as $(e_i^L, e_i^U)$ where
\[ e_i^L = e_i^L(\psi) = \frac{y_i^L - \mu(x_i^\mu,
z_i^\mu)}{\sigma(x_i^\sigma, z_i^\sigma)} ; \qquad e_i^U =
e_i^U(\psi) = \frac{y_i^U - \mu(x_i^\mu,
z_i^\mu)}{\sigma(x_i^\sigma, z_i^\sigma)} . \]
Assume that the exact unobserved response $Y_i$ is standardized in
the same way to obtain $\varepsilon_i$ and that $\varepsilon_i$ has
a conditional distribution independent of the covariates. Consider
now the same strategy and notation as in Section 2 to model the
corresponding pivotal density $f_\varepsilon(\cdot)$ on interval
$S$. If $\xi = (\phi, \psi)$ and if $c_{ij}$ is the proportion of
bin $\mathcal{J}_j$ contained in $(e_i^L, e_i^U)$, then
\[ \Pr_\xi\left(y_i^L < Y_i < y_i^U \,|\, x_i^\mu, z_i^\mu,
x_i^\sigma, z_i^\sigma\right) = \Pr_\xi\left(e_i^L < \varepsilon_i <
e_i^U\right) = \int_{e_i^L}^{e_i^U} f_\varepsilon(e|\phi)\,de
\approx \sum_j c_{ij}(\psi)\, \pi_j(\phi), \]
with $\pi_j(\phi)$ given in Eq. (2.1). This is an instance of the
composite link model (Thompson and Baker, 1981; Eilers, 2007), as
applied to the estimation of a density from interval censored data
(Lambert and Eilers, 2009; Lambert, 2011). It could also be extended
to deal with heavily right censored data along the ideas in
Çetinyürek and Lambert (2010).
The penalized log-likelihood that one had in Eq. (4.6) for
precisely observed responses becomes
\[ \log L_{pen}(\xi|\mathrm{data}) = \sum_{i=1}^{n} \log \Big(
\sum_{j=1}^{J} c_{ij}\, \pi_j \Big) + \mathrm{pen}_{id}. \]
The derivation of the joint posterior and the strategy for
exploring it are unchanged.
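The overlap proportions c_ij and the resulting interval probabilities can be computed directly from the bin boundaries. The sketch below is illustrative: the function names are hypothetical, the identification penalty is left out, and a uniform pivotal distribution is used only because it makes the expected values obvious.

```python
import math

def bin_overlap(lo, hi, s1, s2, J):
    """c_j: proportion of each of the J equal-width bins of (s1, s2) covered by (lo, hi)."""
    width = (s2 - s1) / J
    c = []
    for j in range(J):
        left = s1 + j * width
        right = left + width
        c.append(max(0.0, min(hi, right) - max(lo, left)) / width)
    return c

def interval_probability(lo, hi, pi, s1, s2):
    """Approximation sum_j c_j * pi_j of Pr(lo < eps < hi)."""
    c = bin_overlap(lo, hi, s1, s2, len(pi))
    return sum(cj * pj for cj, pj in zip(c, pi))

def censored_loglik(intervals, mu, sigma, pi, s1, s2):
    """sum_i log sum_j c_ij pi_j for responses observed as intervals (y_lo, y_hi)."""
    total = 0.0
    for (y_lo, y_hi), m, s in zip(intervals, mu, sigma):
        total += math.log(interval_probability((y_lo - m) / s, (y_hi - m) / s, pi, s1, s2))
    return total

# Toy check with a uniform pivotal distribution on S = (-6, 6).
s1, s2, J = -6.0, 6.0, 100
width = (s2 - s1) / J
pi = [1.0 / J] * J
p_half = interval_probability(-3.0, 3.0, pi, s1, s2)
p_all = interval_probability(-6.0, 6.0, pi, s1, s2)
p_half_bin = interval_probability(s1, s1 + 0.5 * width, pi, s1, s2)
ll = censored_loglik([(-3.0, 3.0)], [0.0], [1.0], pi, s1, s2)
```

An interval covering half of the first bin receives half of that bin's probability, which is the composite link idea in its simplest form.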
Figure 1: Simulation: conditional density fε(ε).
7 Simulation study
One hundred datasets {(xi, yi) : i = 1, . . . , n} of size n = 500,
1000, 1500 were generated from the location-scale model in Eq. (1.1)
where X is uniformly distributed on (0, 1), µ(x) = sin(2πx),
σ(x) = 0.79 exp{cos(4πx)} and ε = (W + 0.657)/1.758 with W following
the normal mixture 0.7 N(−1, 0.6²) + 0.3 N(1, 0.5²). One
can check that ε has median 0 and interquartile range 1, see Fig. 1
for a graphical representation. Therefore, µ(x) and σ(x) are the
conditional median and interquartile range of Y given X = x, see
Fig. 2. One also has E(σ(X)) = 1. A scatterplot of one generated
dataset of size n = 500 is provided in Fig. 3. The points in Fig. 3
are the midpoints of the observed interval data (yi − 0.5h σxi ,
yi + 0.5h σxi) of average width h (taken to be 0.1, 0.5 or 1.0 in
the design of the simulation study). That makes 100 × 3 × 3 datasets
for which µ(x), σ(x) and fε(ε) were estimated using the Bayesian
strategy described in Section 5.
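One dataset of this design can be generated along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, the mixture W is standardized by its own median and interquartile range, which are recomputed numerically by bisection instead of being hard-coded, and each interval is centred on the generated response.

```python
import math
import random

def mixture_cdf(w):
    """CDF of W = 0.7 N(-1, 0.6^2) + 0.3 N(1, 0.5^2)."""
    def norm_cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 0.7 * norm_cdf((w + 1.0) / 0.6) + 0.3 * norm_cdf((w - 1.0) / 0.5)

def mixture_quantile(p, lo=-10.0, hi=10.0):
    """Quantile of W by bisection on its CDF."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

med_w = mixture_quantile(0.5)
iqr_w = mixture_quantile(0.75) - mixture_quantile(0.25)

def simulate(n, h, rng):
    """Return n triples (x, y_lower, y_upper) with intervals of width h * sigma(x)."""
    data = []
    for _ in range(n):
        x = rng.random()
        w = rng.gauss(-1.0, 0.6) if rng.random() < 0.7 else rng.gauss(1.0, 0.5)
        eps = (w - med_w) / iqr_w                      # median 0, IQR 1 by construction
        sigma = 0.79 * math.exp(math.cos(4.0 * math.pi * x))
        y = math.sin(2.0 * math.pi * x) + sigma * eps
        data.append((x, y - 0.5 * h * sigma, y + 0.5 * h * sigma))
    return data

h = 0.5
data = simulate(500, h, random.Random(3))
mean_width = sum(hi - lo for _, lo, hi in data) / len(data)
```

Since E(σ(X)) = 1, mean_width should be close to h for a sample of this size.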
The estimated conditional median, interquartile range and pivotal
density for each of the 100 simulated datasets are plotted as grey
curves in Fig. 4, 5 and 6, respectively ; the dashed curves
represent the true functional. One can see that the estimation of
the conditional median and IQR performs well whatever the sample
size n and the mean width h of the interval data. Not surprisingly,
the uncertainty decreases as n increases. The pivotal density and
its bimodality are also clearly detected. The quality of the
reconstruction deteriorates as h increases and is least satisfactory
when h = 1.0.
Of course, in practice, if the available information is too sparse
to provide fine estimates of some of the components, the penalties
will guide us, at the limit, towards a classical linear regression
setting with a normal pivotal distribution.
[Figure 2 panels: Conditional median ; Conditional log IQR.]
Figure 2: Simulation: conditional median µ(x) and interquartile
range σ(x).
Figure 3: Simulation: scatterplot of n = 500 simulated data.
8 Application
The data of interest were collected in 2006 through the European
Social Survey, Round 3 (ESS, 2006). This is an academically-driven
social survey designed to study the attitudes, beliefs and behaviour
patterns of diverse populations in more than 30 nations in Europe.
We are specifically interested in the money available per person in
Belgian households for respondents aged between 25 and 75 who
studied between 8 and 20 years. The ESS provides the net monthly
income (in €) of the household (n = 1103) reported in one of the
following intervals: 1: < 150, 2: [150, 300[, 3: [300, 500[,
4: [500, 1000[, 5: [1000, 1500[, 6: [1500, 2000[, 7: [2000, 2500[,
8: [2500, 3000[, 9: [3000, 5000[, 10: [5000, 7500[,
11: [7500, 10000[, 12: ≥ 10000 €. The available income per person
(also named the equivalised household income) is obtained by
dividing the household net income by the household weight. That
weight is calculated using the OECD-
Figure 4: Simulation: estimated conditional median (one grey curve
per dataset) for sample sizes 500, 1000 or 1500 with interval data
of average width h = 0.1, 0.5 or 1.0. The dashed line is the true
conditional median.
modified scale first proposed by Hagenaars et al. (1994) and
advocated since the late 1990s by the Statistical Office of the
European Union (EUROSTAT). This scale assigns a value of 1 to the
household head, of 0.5 to each additional adult member and of 0.3 to
each child (less than 14 years old). Therefore, our data consist of
the preceding intervals, each divided by the corresponding household
weight. We have studied its relation with age (48.5 ± 13.4 years)
and the number of years of full-time education completed (12.7 ± 2.8
years) by the respondent.
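The equivalisation step is simple arithmetic and can be sketched as follows. The function names and the example household are hypothetical; the open-ended first and last income categories are not handled by this sketch.

```python
def household_weight(n_adults, n_children):
    """OECD-modified scale: 1 for the head, 0.5 per additional adult, 0.3 per child under 14."""
    return 1.0 + 0.5 * (n_adults - 1) + 0.3 * n_children

def equivalised_interval(lower, upper, weight):
    """Divide both bounds of a reported household income interval by the household weight."""
    return lower / weight, upper / weight

# Hypothetical household: two adults and two children reporting category 6, [1500, 2000[.
w = household_weight(n_adults=2, n_children=2)
lo, hi = equivalised_interval(1500.0, 2000.0, w)
```

For this household the weight is 2.1, so the reported interval becomes roughly [714, 952[ euros per person.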
We assume an additive location-scale model for the available income
per person, Y , cf. Eq. (1.1), and force ε to have median 0 and
inter-quartile range 1. The continuous covariates are age and educ
such that xi = (agei, educi). Thus, µ(x) and σ(x) must be
interpreted as the conditional median and inter-quartile range of Y
given X = x, respectively.
Using the algorithm described in Section 5.2, a final chain of
length 50,000 was run after a burnin of 5,000 iterations for
parameters ψ and φ. The age and educ components in the additive
models for the conditional median and (log-) inter-quartile range of
the income available per person are reported in Fig. 7. These
functional components are added to the reference values
$\beta_0^\mu$ and $\beta_0^\sigma$ in Eqs. (3.1) and (3.2), see
Table 1 for their estimated posterior mean and 95% highest posterior
density (HPD) interval.
It suggests a significant increase (at a given age) of the
median and of
Figure 5: Simulation: estimated conditional log IQR (one grey curve
per dataset) for sample sizes 500, 1000 or 1500 with interval data
of average width h = 0.1, 0.5 or 1.0. The dashed line is the true
conditional log IQR.
Parameter         Mean    95% HPD interval
$\beta_0^\mu$     1461    (1410, 1513)
$\beta_0^\sigma$  6.75    (6.69, 6.80)
Table 1: Income dataset: estimates of the posterior mean and of the
95% HPD interval for the regression intercepts in the additive
location-scale model.
log IQR of the net income available per person with the number of
years of education. An increase of these quantities with age (for a
given education level) is also visible until the late fifties, when
both the conditional median and log IQR start to decrease. This
corresponds to the age at which Belgians typically leave the labour
market (on average at about 61), before the legal retirement age of
65. Indeed, one can retire earlier at 60 after 35 years of
contribution to the pension system. The estimated pivotal
distribution can be found in Fig. 8: it is right skewed with a long
right tail, as expected with income data.
From the fitted model, one can compute the conditional probability
of living below the poverty threshold. That threshold is commonly
defined as 60% of the median equivalised household income. In
Belgium, it was 772 € in 2006. If Y
Figure 6: Simulation: estimated pivotal density (one grey curve per
dataset) for sample sizes 500, 1000 or 1500 with interval data of
average width h = 0.1, 0.5 or 1.0. The dashed line is the true
pivotal density fε(ε).
denotes the equivalised household income, then the probability of
interest is
\[ \Pr(Y \le 772\,|\,X = x) = \Pr\left( \varepsilon \le \frac{772 -
\mu(x)}{\sigma(x)} \right) = \int_{-\infty}^{\frac{772 -
\mu(x)}{\sigma(x)}} f_\varepsilon(e)\,de. \]
It was computed for a grid of values for age and educ, see Fig. 9.
It suggests that education is the best insurance against poverty
with (roughly) an estimated risk of more than 10% or 20% for persons
who studied for less than 14 or 12 years, respectively.
Low-educated young and old persons are particularly exposed, with an
estimated risk over 30%.
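With the binned representation of the pivotal density, this probability reduces to a partial sum of the bin probabilities. The sketch below is illustrative only: the function name is hypothetical, the values of µ(x) and σ(x) are made up and are not fitted values, and a uniform pivotal distribution is used solely to make the result easy to verify.

```python
def prob_below(threshold, mu, sigma, pi, s1, s2):
    """Pr(Y <= threshold | x): sum of pi_j weighted by the share of each bin below the cut."""
    e = (threshold - mu) / sigma
    J = len(pi)
    width = (s2 - s1) / J
    total = 0.0
    for j, p in enumerate(pi):
        left = s1 + j * width
        share = min(max((e - left) / width, 0.0), 1.0)  # fraction of bin j below e
        total += share * p
    return total

# Toy check with a uniform pivotal distribution on S = (-6, 6) and hypothetical mu, sigma.
s1, s2, J = -6.0, 6.0, 100
pi = [1.0 / J] * J
p_mid = prob_below(772.0, mu=772.0, sigma=300.0, pi=pi, s1=s1, s2=s2)    # cut at eps = 0
p_low = prob_below(772.0, mu=3000.0, sigma=100.0, pi=pi, s1=s1, s2=s2)   # cut far below S
p_high = prob_below(772.0, mu=0.0, sigma=100.0, pi=pi, s1=s1, s2=s2)     # cut far above S
```

Applying the function to every MCMC draw of µ(x), σ(x) and π would give a posterior sample, hence a credible interval, for the poverty risk at x.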
9 Discussion
Additive models were proposed to model the location and the
dispersion of a continuous response with an unknown smooth
conditional distribution. B-splines were used to specify all the
functional components in the model, including the pivotal density.
These were estimated jointly in a Bayesian framework where all
sources of uncertainty are taken into account. Credible regions
for any quantity of interest can be built from the generated MCMC
chains.
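As a minimal sketch of the last point, an equal-tailed credible
interval for a derived quantity can be read off the empirical
quantiles of that quantity evaluated draw by draw. The chains below
are simulated stand-ins (not output of the paper's sampler), and the
function name credible_interval is hypothetical; equal-tailed
intervals are one simple choice among several.

```python
import random


def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval from posterior draws, using
    linear interpolation between order statistics."""
    ordered = sorted(draws)
    n = len(ordered)

    def quantile(p):
        pos = p * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    alpha = 1.0 - level
    return quantile(alpha / 2.0), quantile(1.0 - alpha / 2.0)


# Hypothetical chains: stand-in draws for mu(x) and sigma(x) at one x.
random.seed(1)
mu_draws = [random.gauss(1000.0, 20.0) for _ in range(5000)]
sigma_draws = [random.gauss(300.0, 10.0) for _ in range(5000)]

# Derived quantity evaluated draw by draw: the standardised threshold.
z_draws = [(772.0 - m) / s for m, s in zip(mu_draws, sigma_draws)]
low, high = credible_interval(z_draws, level=0.95)
print(low, high)
```

Because the quantity is recomputed for every draw, the uncertainty in
all model components propagates to the interval automatically.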
The proposed tool can deal with interval censored data. It
relies upon the results in Lambert and Eilers (2009) and in Lambert
(2011), where densities were estimated from univariate or bivariate
grouped data. Simulations suggest that a reliable estimation of the
additive components can be obtained even with modest sample sizes.
A good identification of the pivotal distribution depends more on
the width of the interval data. The suggested method makes it
possible not only to regress moments, but also quantiles, on
covariates and to estimate (in a single step) the response
distribution conditionally on these.

Figure 7: Income data: age and educ components (in €, solid
line) in the additive models for the median and for the log
interquartile range; grey regions delimit pointwise 80% (dark grey)
and 95% (light grey) credible intervals. [Panels: Location: f(age);
Location: f(educ); Log-dispersion: f(age); Log-dispersion: f(educ).]
Several extensions of the model and further research are
desirable. These include regression on interval censored covariates,
model selection, goodness-of-fit tools for interval censored
responses, random effects and extension to spatial data. We are
currently working on several of these aspects and hope to report on
them soon in separate publications.
Figure 8: Income data: fitted pivotal density in the
location-scale model; grey regions delimit pointwise 95% credible
intervals.
Acknowledgments
Financial support from the FSR research grant nr. FSRC-08/42
from the University of Liège and from the IAP research network nr.
P6/03 of the Belgian government (Belgian Science Policy) is
gratefully acknowledged.
References
Brezger, A. and Lang, S. (2006). Generalized structured additive
regression based on Bayesian P-splines. Computational Statistics and
Data Analysis, 50, 967–991.
Çetinyürek, A. and Lambert, P. (2010). Smooth estimation of
survival functions and hazard ratios from interval-censored data
using Bayesian penalized B-splines. Statistics in Medicine. DOI:
10.1002/sim.4081.
Collett, D. (1994). Modelling Survival Data in Medical Research.
Chapman & Hall, London.
Eilers, P. H. C. (2007). Ill-posed problems with counts, the
composite link model and penalized likelihood. Statistical
Modelling, 7, 239–254.
Eilers, P. H. C. and Marx, B. D. (1996). Flexible smoothing with
B-splines and penalties (with discussion). Statistical Science, 11,
89–121.
Figure 9: Income data: estimated probability to live below the
poverty threshold in 2006 in Belgium (horizontal axis: age; vertical
axis: educ).
ESS (2006). European Social Survey Round 3: data file edition
3.2. Norwegian Social Science Data Services, Norway.
http://www.europeansocialsurvey.org/.
Fahrmeir, L. and Tutz, G. (2001). Multivariate Statistical
Modelling Based on Generalized Linear Models (2nd edition).
Springer-Verlag.
Haario, H., Saksman, E., and Tamminen, J. (2001). An adaptive
Metropolis algorithm. Bernoulli, 7, 223–242.
Hagenaars, A., De Vos, K., and Zaidi, A. (1994). Poverty
Statistics in the Late 1980’s: research based on micro-data.
Luxembourg: Office for Official Publications of the European
Communities.
Hastie, T. J. and Tibshirani, R. J. (1990). Generalized additive
models. Chapman & Hall, London.
Hsieh, F. (1996). Empirical process approach in a two-sample
location–scale model with censored data. The Annals of Statistics,
24(6), 2705–2719.
Jørgensen, B. (1997). The Theory of Dispersion Models. Chapman
& Hall, London.
Jullion, A. and Lambert, P. (2007). Robust specification of the
roughness penalty prior distribution in spatially adaptive Bayesian
P-splines models. Computational Statistics and Data Analysis, 51,
2542–2558.
Komárek, A., Lesaffre, E., and Hilton, J. (2005). Accelerated
failure time model for arbitrarily censored data with smoothed error
distribution. Journal of Computational and Graphical Statistics, 14,
726–745.
Lambert, P. (2011). Smooth semi- and nonparametric Bayesian
estimation of bivariate densities from bivariate histogram data.
Computational Statistics and Data Analysis, 55, 429–445.
Lambert, P. and Eilers, P. H. (2009). Bayesian density
estimation from grouped continuous data. Computational Statistics
and Data Analysis, 53, 1388–1399.
Lambert, P. and Eilers, P. H. C. (2006). Bayesian
multidimensional density smoothing. In Proceedings of the 21st
International Workshop on Statistical Modelling, pages 313–320,
Galway.
Lambert, P. and Lindsey, J. (1999). Analysing financial returns
using regression based on non-symmetric stable distributions.
Applied Statistics, 48, 409–424.
Lambert, P., Collett, D., Kimber, A., and Johnson, R. (2004).
Parametric accelerated failure time models with random effects and
an application to kidney transplant survival. Statistics in
Medicine, 23(20), 3177–3192.
Lang, S. and Brezger, A. (2004). Bayesian P-splines. Journal of
Computational and Graphical Statistics, 13, 183–212.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear
Models, 2nd Edition. Chapman & Hall / CRC.
Nelder, J. and Wedderburn, R. (1972). Generalized linear models.
Journal of the Royal Statistical Society, Series A, 135,
370–384.
Rigby, R. and Stasinopoulos, D. (2001). Generalized additive
models for location, scale and shape. Applied Statistics, 54(3),
1–38.
Thompson, R. and Baker, R. J. (1981). Composite link functions
in generalized linear models. Applied Statistics, 30, 125–131.
Van Keilegom, I. and Akritas, M. (1999). Transfer of tail
information in censored regression models. The Annals of
Statistics, 27(5), 1745–1784.
Vaupel, J., Manton, K., and Stallard, E. (1979). The impact of
heterogeneity in individual frailty on the dynamics of mortality.
Demography, 16(3), 439–454.