Quantile Regression for Nonlinear Mixed Effects Models: A Likelihood Based Perspective

Christian E. Galarza^a, Luis M. Castro^b, Francisco Louzada^c, Victor H. Lachos^a*

^a Departamento de Estatística, Universidade Estadual de Campinas, Campinas, Brazil
^b Departamento de Estadística and CI2MA, Universidad de Concepción, Chile
^c Department of Applied Mathematics and Statistics, Universidade de São Paulo, São Carlos, Brazil

Abstract

Longitudinal data are frequently analyzed using normal mixed effects models. However, the traditional estimation methods are based on mean regression, which leads to non-robust parameter estimation under non-normal error distributions. Compared to the conventional mean regression approach, quantile regression (QR) can characterize the entire conditional distribution of the outcome variable and is more robust to the presence of outliers and to misspecification of the error distribution. This paper develops a likelihood-based approach to QR models for correlated continuous longitudinal data via the asymmetric Laplace (AL) distribution. Exploiting the convenient hierarchical representation of the AL distribution, our classical approach uses the Stochastic Approximation of the EM (SAEM) algorithm to derive exact maximum likelihood estimates of the fixed effects and variance components in nonlinear mixed effects models (NLMEMs). We evaluate the finite-sample performance of the algorithm and the asymptotic properties of the ML estimates through empirical experiments and applications to two real-life datasets. The proposed SAEM algorithm is implemented in the R package qrNLMM.

Keywords: Asymmetric Laplace distribution; Nonlinear mixed effects models; Quantile regression; SAEM algorithm; Stochastic approximations.

1 Introduction

Non-linear mixed-effects (NLME) models are frequently used for analyzing grouped data, clustered data, longitudinal data, multilevel data, among others. This is because this class of models allows us to deal with non-linear relationships between the observed response and the covariates and/or random effects while, at the same time, taking into account within- and between-subject correlations in the statistical modelling of the observed data. In general, NLME models arise as a

* Address for correspondence: Departamento de Estatística, Rua Sérgio Buarque de Holanda, 651, Cidade Universitária Zeferino Vaz, Campinas, São Paulo, Brazil. CEP 13083-859. e-mail: [email protected]
M-Step: Maximize $Q(\theta \mid \theta^{(k)})$ with respect to $\theta$ to obtain $\theta^{(k+1)}$.
However, in some situations the E-step cannot be obtained analytically and has to be calculated through a simulation step. Wei & Tanner (1990) proposed the Monte Carlo EM (MCEM) algorithm, in which the E-step is replaced by a Monte Carlo approximation based on a large number of independent simulations of the missing data/latent variables. This simple solution is in fact computationally expensive because of the large number of independent simulations required to achieve a good approximation. Consequently, in order to reduce the number of simulations required by the MCEM algorithm, the SAEM algorithm proposed by Delyon et al. (1999) replaces the E-step by a stochastic approximation procedure. Besides having good theoretical properties, the SAEM algorithm estimates the population parameters accurately, converging to the ML estimates under quite general conditions (Allassonnière et al., 2010; Delyon et al., 1999; Kuhn & Lavielle, 2004).

At each iteration, the SAEM algorithm successively simulates the missing data/latent variables from their conditional distributions and updates the model parameters. Thus, at iteration k, the SAEM algorithm proceeds as follows:
E-Step:

• Simulation: Draw $q^{(\ell,k)}$, $\ell = 1, \ldots, m$, from the conditional distribution of the missing data $f(q \mid \theta^{(k-1)}, y_{\mathrm{obs}})$.

• Stochastic Approximation: Update the $Q(\theta \mid \theta^{(k)})$ function as
$$Q(\theta \mid \theta^{(k)}) \approx Q(\theta \mid \theta^{(k-1)}) + \delta_k \Big[ \frac{1}{m} \sum_{\ell=1}^{m} \log f(y_{\mathrm{obs}}, q^{(\ell,k)}; \theta) - Q(\theta \mid \theta^{(k-1)}) \Big]. \qquad (6)$$
M-Step:

• Maximization: Update $\theta^{(k)}$ as $\theta^{(k+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(k)})$.
Note that, although the E-step is similar in the SAEM and MCEM algorithms, only a small number of simulations m (for practical situations, m ≤ 20 is suggested) is necessary in the former. This is possible because, unlike the traditional EM algorithm and its variants, the SAEM algorithm uses not only the current simulation of the missing data/latent variables at iteration k but also some or all of the previous simulations. This 'memory' property is controlled by the smoothing parameter $\delta_k$. In our case, we suggest the following choice of smoothing parameter:
$$\delta_k = \begin{cases} 1, & \text{for } 1 \le k \le cW, \\[4pt] \dfrac{1}{k - cW}, & \text{for } cW + 1 \le k \le W, \end{cases}$$
where W is the maximum number of iterations and c is a cut point ($0 \le c \le 1$) which determines the percentage of initial iterations with no memory.
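The memory schedule above translates directly into code. The following sketch (function names are ours, not from the qrNLMM package) computes $\delta_k$ and applies one stochastic-approximation update of the form used in (6):

```python
def delta(k, c, W):
    """Smoothing parameter delta_k: weight 1 (no memory) for the first
    c*W iterations, then decaying weights 1/(k - c*W)."""
    cW = int(c * W)
    return 1.0 if k <= cW else 1.0 / (k - cW)

def sa_update(S_prev, mc_average, k, c, W):
    """One stochastic-approximation step, as in equation (6):
    S_k = S_{k-1} + delta_k * (Monte Carlo average - S_{k-1})."""
    return S_prev + delta(k, c, W) * (mc_average - S_prev)
```

During the memoryless phase ($\delta_k = 1$) the update simply replaces the statistic by the current Monte Carlo average, so SAEM behaves like MCEM there; memory only accumulates after iteration $cW$.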
3 The QR non-linear mixed model

3.1 The model

In this section, we propose the following general mixed-effects model. Let $y_i = (y_{i1}, \ldots, y_{in_i})^\top$ be the continuous response for subject i and $\eta_i = (\eta(\phi_i, x_{i1}), \ldots, \eta(\phi_i, x_{in_i}))^\top$ a nonlinear differentiable function of the vector of random parameters $\phi_i$ of dimension r. Moreover, let $x_i$ be a matrix of covariates of dimension $n_i \times r$. The NLME model is defined as
$$y_i = \eta(\phi_i, x_i) + \varepsilon_i, \qquad \phi_i = A_i \beta_p + B_i b_i, \qquad (7)$$
where $A_i$ and $B_i$ are (fixed) design matrices of dimensions $r \times d$ and $r \times q$, respectively, possibly depending on elements of $x_i$ and incorporating time-varying covariates in the fixed or random effects, $\beta_p$ is the vector of regression coefficients corresponding to the pth quantile, $b_i$ is a q-dimensional vector of random effects associated with the i-th subject, and $\varepsilon_i$ is the vector of independent and identically distributed random errors. We define the pth quantile function of the response $y_{ij}$ as
$$Q_p(y_{ij} \mid x_{ij}, b_i) = \eta(\phi_i, x_{ij}) = \eta(A_i \beta_p + B_i b_i, x_{ij}), \qquad (8)$$
where $Q_p$ denotes the inverse of the unknown distribution function F. In this setting, the random effects $b_i$ are independent and identically distributed (i.i.d.) as $N_q(0, \Psi)$, where the dispersion matrix $\Psi = \Psi(\alpha)$ depends on a reduced vector of unknown parameters $\alpha$. The error terms are distributed as $\varepsilon_{ij} \overset{iid}{\sim} \mathrm{AL}(0, \sigma)$ and are uncorrelated with the random effects. Then, conditionally on $b_i$, the observed responses for subject i, i.e., $y_{ij}$ for $j = 1, \ldots, n_i$, are independent and follow an AL distribution with pdf
$$f(y_{ij} \mid \beta_p, b_i, \sigma) = \frac{p(1-p)}{\sigma} \exp\left\{ -\rho_p\!\left( \frac{y_{ij} - \eta(A_i \beta_p + B_i b_i, x_{ij})}{\sigma} \right) \right\}. \qquad (9)$$
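For concreteness, the AL density in (9) can be sketched as follows, with `rho` the usual check (quantile-loss) function $\rho_p(u) = u\,(p - I(u < 0))$; the function names are illustrative only:

```python
import math

def rho(u, p):
    """Check (quantile-loss) function rho_p(u) = u * (p - I(u < 0))."""
    return u * (p - (1.0 if u < 0 else 0.0))

def al_pdf(y, mu, sigma, p):
    """AL(mu, sigma, p) density as in equation (9), with mu playing the
    role of eta(A_i beta_p + B_i b_i, x_ij)."""
    return p * (1.0 - p) / sigma * math.exp(-rho((y - mu) / sigma, p))
```

A quick numerical check confirms the defining property of the AL distribution used throughout the paper: the density integrates to one and places exactly mass p to the left of its location, so the location parameter is the pth quantile.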
3.2 The MCEM algorithm
In this section we develop an MCEM algorithm for the ML estimation of the parameters of the QR-NLME model. This model has a flexible hierarchical representation, which is useful for deriving interesting theoretical properties. From (4), we have that the QR-NLME model defined in (8)-(9) can be represented as follows:
$$\begin{aligned}
y_i \mid b_i, u_i &\sim N_{n_i}\!\big( \eta(A_i \beta_p + B_i b_i, x_i) + \vartheta_p u_i,\; \sigma \tau_p^2 D_i \big), \\
b_i &\sim N_q(0, \Psi), \\
u_i &\sim \prod_{j=1}^{n_i} \exp(\sigma), \qquad (10)
\end{aligned}$$
for $i = 1, \ldots, n$, where $\vartheta_p$ and $\tau_p^2$ are given as in (3), $D_i$ is a diagonal matrix containing the vector of latent variables $u_i = (u_{i1}, \ldots, u_{in_i})^\top$, and $\exp(\sigma)$ denotes the exponential distribution with mean $\sigma$. Let $y_{ic} = (y_i^\top, b_i^\top, u_i^\top)^\top$, with $y_i = (y_{i1}, \ldots, y_{in_i})^\top$, $b_i = (b_{i1}, \ldots, b_{iq})^\top$ and $u_i = (u_{i1}, \ldots, u_{in_i})^\top$, and let $\theta^{(k)} = (\beta_p^{(k)\top}, \sigma^{(k)}, \alpha^{(k)\top})^\top$ be the estimate of $\theta$ at the k-th iteration. Since $b_i$ and $u_i$ are independent for all $i = 1, \ldots, n$, it follows from (4) that the complete-data log-likelihood function is given by
$$\ell_c(\theta; y_c) = \sum_{i=1}^{n} \ell_c(\theta; y_{ic}),$$
where
$$\ell_c(\theta; y_{ic}) = \mathrm{const} - \frac{3}{2} n_i \log \sigma - \frac{1}{2} \log |\Psi| - \frac{1}{2} b_i^\top \Psi^{-1} b_i - \frac{1}{\sigma} u_i^\top \mathbf{1}_{n_i} - \frac{1}{2\sigma\tau_p^2} \big( y_i - \eta(A_i \beta_p + B_i b_i, x_i) - \vartheta_p u_i \big)^\top D_i^{-1} \big( y_i - \eta(A_i \beta_p + B_i b_i, x_i) - \vartheta_p u_i \big), \qquad (11)$$
where $\mathbf{1}_p$ is a vector of ones of dimension p. Since $A_i$, $B_i$ and $x_i$ are known matrices, we simplify the notation by writing $\eta(\beta_p, b_i)$ to represent $\eta(\phi_i, x_i) = \eta(A_i \beta_p + B_i b_i, x_i)$. Given the current estimate $\theta = \theta^{(k)}$, the E-step calculates the function
$$Q(\theta \mid \theta^{(k)}) = \sum_{i=1}^{n} Q_i(\theta \mid \theta^{(k)}),$$
where
$$\begin{aligned}
Q_i(\theta \mid \theta^{(k)}) &= E\{ \ell_c(\theta; y_{ic}) \mid \theta^{(k)}, y \} \\
&\propto -\frac{3}{2} n_i \log \sigma - \frac{1}{2} \log |\Psi| - \frac{1}{2} \mathrm{tr}\big\{ \widehat{(bb^\top)}_i^{(k)} \Psi^{-1} \big\} - \frac{1}{2\sigma\tau_p^2} \Big[ y_i^\top \widehat{(D_i^{-1})}^{(k)} y_i - 2\vartheta_p y_i^\top \mathbf{1}_{n_i} + \frac{\tau_p^4}{4} \widehat{u}_i^{(k)\top} \mathbf{1}_{n_i} - 2 y_i^\top \widehat{(D^{-1}\eta)}_i^{(k)} + 2\vartheta_p \mathbf{1}_{n_i}^\top \widehat{\eta}_i^{(k)} + \widehat{(\eta^\top D^{-1} \eta)}_i^{(k)} \Big], \qquad (12)
\end{aligned}$$
where $\eta_i = \eta(\beta_p, b_i)$ and $\mathrm{tr}(A)$ denotes the trace of matrix A. The calculation of this function requires the following expressions:
$$\widehat{\eta}_i^{(k)} = E\{ \eta_i \mid \theta^{(k)}, y_i \}, \qquad \widehat{u}_i^{(k)} = E\{ u_i \mid \theta^{(k)}, y_i \},$$
$$\widehat{(bb^\top)}_i^{(k)} = E\{ b_i b_i^\top \mid \theta^{(k)}, y_i \}, \qquad \widehat{(D_i^{-1})}^{(k)} = E\{ D_i^{-1} \mid \theta^{(k)}, y_i \},$$
$$\widehat{(D^{-1}\eta)}_i^{(k)} = E\{ D_i^{-1} \eta_i \mid \theta^{(k)}, y_i \}, \qquad \widehat{(\eta^\top D^{-1} \eta)}_i^{(k)} = E\{ \eta_i^\top D_i^{-1} \eta_i \mid \theta^{(k)}, y_i \},$$
which do not have closed forms. Since the joint conditional distribution of the latent variables $(b_i, u_i)$ is unknown and the conditional expectations cannot be computed analytically, for any function $g(\cdot)$ the MCEM algorithm approximates these expectations using a Monte Carlo approximation given by
$$E[g(b_i, u_i) \mid \theta^{(k)}, y_i] \approx \frac{1}{m} \sum_{\ell=1}^{m} g\big( b_i^{(\ell,k)}, u_i^{(\ell,k)} \big), \qquad (13)$$
which depends on simulations of the two latent variables $b_i^{(\ell,k)}$ and $u_i^{(\ell,k)}$ from the conditional density $f(b_i, u_i \mid \theta^{(k)}, y_i)$. Using properties of conditional expectations, the expected value in (13) can be more accurately approximated as
$$E_{b_i, u_i}[g(b_i, u_i) \mid \theta^{(k)}, y_i] = E_{b_i}\big[ E_{u_i}[g(b_i, u_i) \mid \theta^{(k)}, b_i, y_i] \mid y_i \big] \approx \frac{1}{m} \sum_{\ell=1}^{m} E_{u_i}\big[ g(b_i^{(\ell,k)}, u_i) \mid \theta^{(k)}, b_i^{(\ell,k)}, y_i \big], \qquad (14)$$
where $b_i^{(\ell,k)}$ is generated from $f(b_i \mid \theta^{(k)}, y_i)$. Note that (14) is a more accurate approximation since it depends on only one MC approximation instead of two (as is needed in (13)).
To generate random samples from the full conditional distribution $f(u_i \mid y_i, b_i)$, first note that the vector $u_i \mid y_i, b_i$ can be written as $u_i \mid y_i, b_i = [\, u_{i1} \mid y_{i1}, b_i,\; u_{i2} \mid y_{i2}, b_i,\; \cdots,\; u_{in_i} \mid y_{in_i}, b_i \,]^\top$, since $u_{ij} \mid y_{ij}, b_i$ is independent of $u_{ik} \mid y_{ik}, b_i$ for all $j, k = 1, 2, \ldots, n_i$ with $j \neq k$. The distribution of $u_{ij} \mid y_{ij}, b_i$ is proportional to
$$f(u_{ij} \mid y_{ij}, b_i) \propto \phi\big( y_{ij} \mid \eta_{ij}(\beta_p, b_i) + \vartheta_p u_{ij},\; \sigma \tau_p^2 u_{ij} \big) \times \exp(\sigma),$$
which, from Subsection 2.1, leads to $u_{ij} \mid y_{ij}, b_i \sim \mathrm{GIG}(\tfrac{1}{2}, \chi_{ij}, \psi)$, where $\chi_{ij}$ and $\psi$ are given by
$$\chi_{ij} = \frac{|y_{ij} - \eta_{ij}(\beta_p, b_i)|}{\tau_p \sqrt{\sigma}} \qquad \text{and} \qquad \psi = \frac{\tau_p}{2\sqrt{\sigma}}. \qquad (15)$$
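Using the usual AL constants $\vartheta_p = (1-2p)/(p(1-p))$ and $\tau_p^2 = 2/(p(1-p))$ (equation (3) is not reproduced in this excerpt, so those values are stated here as an assumption), the GIG parameters in (15) can be computed as in this sketch:

```python
import math

def al_constants(p):
    """Usual AL mixture constants (assumed values for the paper's eq. (3)):
    vartheta_p = (1 - 2p)/(p(1 - p)), tau_p^2 = 2/(p(1 - p))."""
    return (1 - 2 * p) / (p * (1 - p)), 2.0 / (p * (1 - p))

def gig_params(y, eta, sigma, p):
    """Parameters (chi_ij, psi) of the GIG(1/2, chi_ij, psi) full
    conditional of u_ij given y_ij and b_i, following equation (15)."""
    _, tau2 = al_constants(p)
    tau = math.sqrt(tau2)
    chi = abs(y - eta) / (tau * math.sqrt(sigma))
    psi = tau / (2.0 * math.sqrt(sigma))
    return chi, psi
```

Note that $\psi$ does not depend on the residual, so only $\chi_{ij}$ varies across observations within a subject.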
From (5), and after generating samples from $f(b_i \mid \theta^{(k)}, y_i)$ (see Subsection 3.4), the conditional expectation $E_{u_i}[\,\cdot \mid \theta, b_i, y_i]$ in (14) can be computed analytically. Finally, the proposed MCEM algorithm for estimating the parameters of the QR-NLME model can be summarised as follows:

• MC E-step: Given $\theta = \theta^{(k)}$, for $i = 1, \ldots, n$;

Note that, for the MC E-step, we need to generate $b_i^{(\ell,k)}$, $\ell = 1, \ldots, m$, from $f(b_i \mid \theta^{(k)}, y_i)$, where m is the number of Monte Carlo simulations, which is suggested to be large. A simulation method for generating samples from $f(b_i \mid \theta^{(k)}, y_i)$ is described next, in Subsection 3.4.
3.3 A SAEM algorithm

As mentioned in Subsection 2.2, the SAEM algorithm circumvents the problem of simulating a large number of latent values at each iteration, leading to a faster and more efficient alternative to the MCEM algorithm. In summary, the SAEM algorithm proceeds as follows:

• E-step: Given $\theta = \theta^{(k)}$, for $i = 1, \ldots, n$;

– Stochastic approximation: Update the MC approximations using stochastic approximations, given by
$$S_{1,i}^{(k)} = S_{1,i}^{(k-1)} + \delta_k \Big[ \frac{1}{m} \sum_{\ell=1}^{m} J_i^{(k)\top} E(D_i^{-1})^{(\ell,k)} J_i^{(k)} - S_{1,i}^{(k-1)} \Big],$$
$$S_{2,i}^{(k)} = S_{2,i}^{(k-1)} + \delta_k \Big[ \frac{1}{m} \sum_{\ell=1}^{m} \Big[ 2\, J_i^{(k)\top} E(D_i^{-1})^{(\ell,k)} \big( y_i - \eta(\beta_p^{(k)}, b_i^{(\ell,k)}) - \vartheta_p E(u_i)^{(\ell,k)} \big) \Big] - S_{2,i}^{(k-1)} \Big],$$
$$S_{3,i}^{(k)} = S_{3,i}^{(k-1)} + \delta_k \Big[ \frac{1}{m} \sum_{\ell=1}^{m} \Big[ \big( y_i - \eta(\beta_p^{(k+1)}, b_i^{(\ell,k)}) \big)^\top E(D_i^{-1})^{(\ell,k)} \big( y_i - \eta(\beta_p^{(k+1)}, b_i^{(\ell,k)}) \big) - 2\vartheta_p \big( y_i - \eta(\beta_p^{(k+1)}, b_i^{(\ell,k)}) \big)^\top \mathbf{1}_{n_i} + \frac{\tau_p^4}{4} E(u_i)^{(\ell,k)\top} \mathbf{1}_{n_i} \Big] - S_{3,i}^{(k-1)} \Big]$$
and
$$S_{4,i}^{(k)} = S_{4,i}^{(k-1)} + \delta_k \Big[ \frac{1}{m} \sum_{\ell=1}^{m} b_i^{(\ell,k)} b_i^{(\ell,k)\top} - S_{4,i}^{(k-1)} \Big].$$
• M-step: Update $\theta^{(k)}$ by maximizing $Q(\theta \mid \theta^{(k)})$ over $\theta$, which leads to the following expressions:
$$\beta_p^{(k+1)} = \beta_p^{(k)} + \Big[ \sum_{i=1}^{n} S_{1,i}^{(k)} \Big]^{-1} \sum_{i=1}^{n} S_{2,i}^{(k)},$$
$$\sigma^{(k+1)} = \frac{1}{3N\tau_p^2} \sum_{i=1}^{n} S_{3,i}^{(k)},$$
$$\Psi^{(k+1)} = \frac{1}{n} \sum_{i=1}^{n} S_{4,i}^{(k)}. \qquad (16)$$
Given a set of suitable initial values $\theta^{(0)}$ (see Appendix B), the SAEM algorithm iterates until convergence at iteration k, declared when
$$\max_i \left\{ \frac{|\theta_i^{(k+1)} - \theta_i^{(k)}|}{|\theta_i^{(k)}| + \delta_1} \right\} < \delta_2,$$
where $\delta_1$ and $\delta_2$ are pre-established small values. As suggested by Searle et al. (1992, page 269), we use $\delta_1 = 0.001$ and $\delta_2 = 0.0001$. As proposed by Booth & Hobert (1999), we also use a second convergence criterion, defined by
$$\max_i \left\{ \frac{|\theta_i^{(k+1)} - \theta_i^{(k)}|}{\sqrt{\mathrm{var}(\theta_i^{(k)})} + \delta_1} \right\} < \delta_2.$$
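The first (relative-change) stopping rule can be coded directly; this is a sketch with illustrative names, using the defaults quoted above:

```python
def converged(theta_new, theta_old, delta1=0.001, delta2=0.0001):
    """Relative-change stopping rule: max_i |th_i^(k+1) - th_i^(k)| /
    (|th_i^(k)| + delta1) must fall below delta2 (defaults as suggested
    by Searle et al., 1992)."""
    return max(abs(n - o) / (abs(o) + delta1)
               for n, o in zip(theta_new, theta_old)) < delta2
```

The offset $\delta_1$ in the denominator keeps the criterion well defined when a parameter estimate passes near zero.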
3.4 Missing data simulation method

In order to generate samples from $f(b_i \mid y_i, \theta)$, we use the Metropolis-Hastings (MH) algorithm (Metropolis et al., 1953; Hastings, 1970), noting that the conditional distribution $f(b_i \mid y_i, \theta)$ (omitting $\theta$) can be represented as
$$f(b_i \mid y_i) \propto f(y_i \mid b_i) \times f(b_i),$$
where $b_i \sim N_q(0, \Psi)$ and $f(y_i \mid b_i) = \prod_{j=1}^{n_i} f(y_{ij} \mid b_i)$, with $y_{ij} \mid b_i \sim \mathrm{AL}\big(\eta(A_i \beta_p + B_i b_i, x_{ij}), \sigma, p\big)$. Since the objective function is a product of two distributions (both with support on $\mathbb{R}$), a suitable choice for the proposal density is a multivariate normal distribution with mean and variance-covariance matrix given by $E(b_i^{(k-1)} \mid y_i)$ and $\mathrm{Var}(b_i^{(k-1)} \mid y_i)$, respectively. These quantities are obtained from the last iteration of the SAEM algorithm. Note that this candidate leads to a better acceptance rate and, consequently, a faster algorithm.
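A toy, scalar version of this MH scheme can be sketched as follows. Everything here is an illustrative assumption: the predictor $\eta(b) = b$, the fixed proposal moments, and the function names; in the paper, the proposal mean and covariance come from the previous SAEM iteration instead.

```python
import math
import random

def rho(u, p):
    """Check function rho_p(u) = u * (p - I(u < 0))."""
    return u * (p - (1.0 if u < 0 else 0.0))

def log_target(b, y, sigma, p, psi):
    """log f(b | y) up to a constant: AL log-likelihood plus a N(0, psi)
    log-prior; eta(b) = b stands in for the nonlinear predictor."""
    loglik = sum(-rho((yj - b) / sigma, p) for yj in y)
    return loglik - 0.5 * b * b / psi

def mh_sample(y, sigma, p, psi, n_iter=3000, prop_mean=1.0, prop_sd=1.0, seed=7):
    """Independence Metropolis-Hastings with a normal proposal."""
    rng = random.Random(seed)
    b = prop_mean
    draws = []
    for _ in range(n_iter):
        cand = rng.gauss(prop_mean, prop_sd)
        # log proposal densities (normalizing constants cancel in the ratio)
        log_q_b = -0.5 * ((b - prop_mean) / prop_sd) ** 2
        log_q_cand = -0.5 * ((cand - prop_mean) / prop_sd) ** 2
        log_alpha = (log_target(cand, y, sigma, p, psi)
                     - log_target(b, y, sigma, p, psi)
                     + log_q_b - log_q_cand)
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            b = cand
        draws.append(b)
    return draws
```

With data clustered around 1 and a weak prior, the chain should hover near the conditional median, mirroring how a well-centred proposal keeps the acceptance rate high.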
4 Estimation of the likelihood and standard errors

4.1 Likelihood estimation

Given the observed data, the log-likelihood function $\ell_o(\theta \mid y)$ of the model defined in (8)-(9) is given by
$$\ell_o(\theta \mid y) = \sum_{i=1}^{n} \log f(y_i \mid \theta) = \sum_{i=1}^{n} \log \int_{\mathbb{R}^q} f(y_i \mid b_i; \theta)\, f(b_i; \theta)\, db_i, \qquad (17)$$
where the integral can be expressed as an expectation with respect to $b_i$, i.e., $E_{b_i}[f(y_i \mid b_i; \theta)]$. This integral is not available analytically and is often replaced by an MC approximation involving a large number of simulations. However, importance sampling (IS) procedures may require far fewer simulations than the typical MC procedure. Following Meza et al. (2012), we can compute this integral using an IS scheme with any continuous proposal distribution $\widehat{f}(b_i; \theta)$ having the same support as $f(b_i; \theta)$. Re-writing (17) as
$$\ell_o(\theta \mid y) = \sum_{i=1}^{n} \log \int_{\mathbb{R}^q} f(y_i \mid b_i; \theta)\, \frac{f(b_i; \theta)}{\widehat{f}(b_i; \theta)}\, \widehat{f}(b_i; \theta)\, db_i,$$
we can express it as an expectation with respect to $b_i^*$, where $b_i^* \sim \widehat{f}(b_i^*; \theta)$. Thus, the log-likelihood function can be approximated as
$$\ell_o(\theta \mid y) \approx \sum_{i=1}^{n} \log \left\{ \frac{1}{m} \sum_{\ell=1}^{m} \Big[ \prod_{j=1}^{n_i} f(y_{ij} \mid b_i^{*(\ell)}; \theta) \Big] \frac{f(b_i^{*(\ell)}; \theta)}{\widehat{f}(b_i^{*(\ell)}; \theta)} \right\}, \qquad (18)$$
where $\{b_i^{*(\ell)}\}$, $\ell = 1, \ldots, m$, is an MC sample from $\widehat{f}(b_i^*; \theta)$, and $f(y_i \mid b_i^{*(\ell)}; \theta)$ factors as $\prod_{j=1}^{n_i} f(y_{ij} \mid b_i^{*(\ell)}; \theta)$ due to the conditional independence assumption. An efficient choice for $\widehat{f}(b_i; \theta)$ is $f(b_i \mid y_i)$. Therefore, we use the same proposal distribution discussed in Subsection 3.4, generating $b_i^{*(\ell)} \sim N_q(\mu_{b_i}, \Sigma_{b_i})$, where $\mu_{b_i} = E(b_i \mid y_i)$ and $\Sigma_{b_i} = \mathrm{Var}(b_i \mid y_i)$ are estimated empirically during the last few iterations of the SAEM algorithm at convergence.
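The IS estimator (18) is easy to check on a toy model with a known marginal: take $y \mid b \sim N(b, 1)$ and $b \sim N(0, 1)$, so that $y \sim N(0, 2)$ exactly. The sketch below is our construction, using the prior-like proposal $N(0, 2)$ rather than the paper's recommended $f(b_i \mid y_i)$, precisely so the answer can be verified:

```python
import math
import random

def norm_logpdf(x, m, v):
    """Log-density of N(m, v) at x."""
    return -0.5 * math.log(2 * math.pi * v) - 0.5 * (x - m) ** 2 / v

def is_loglik(y, m=5000, seed=42):
    """Importance-sampling estimate of log f(y) = log of the integral of
    f(y|b) f(b) db for the toy model y|b ~ N(b, 1), b ~ N(0, 1), with
    proposal b* ~ N(0, 2), in the spirit of equation (18)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):
        b = rng.gauss(0.0, math.sqrt(2.0))
        logw = (norm_logpdf(y, b, 1.0) + norm_logpdf(b, 0.0, 1.0)
                - norm_logpdf(b, 0.0, 2.0))
        total += math.exp(logw)
    return math.log(total / m)
```

The estimate should land close to the exact value $\log N(y; 0, 2)$; with the better, posterior-shaped proposal recommended in the text, the weight variance (and hence the required m) would be smaller still.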
4.2 Standard error approximation

Louis' missing information principle (Louis, 1982) relates the score function of the incomplete-data log-likelihood to the complete-data log-likelihood through the conditional expectation $\nabla_o(\theta) = E_\theta[\nabla_c(\theta; Y_{\mathrm{com}}) \mid Y_{\mathrm{obs}}]$, where $\nabla_o(\theta) = \partial \ell_o(\theta; Y_{\mathrm{obs}}) / \partial \theta$ and $\nabla_c(\theta) = \partial \ell_c(\theta; Y_{\mathrm{com}}) / \partial \theta$ are the score functions for the incomplete and complete data, respectively. As defined in Meilijson (1989), the empirical information matrix can be computed as
$$I_e(\theta \mid y) = \sum_{i=1}^{n} s(y_i \mid \theta)\, s^\top(y_i \mid \theta) - \frac{1}{n} S(y \mid \theta)\, S^\top(y \mid \theta), \qquad (19)$$
where $S(y \mid \theta) = \sum_{i=1}^{n} s(y_i \mid \theta)$, with $s(y_i \mid \theta)$ the empirical score function for the i-th individual. Replacing $\theta$ by its ML estimator $\widehat{\theta}$ and noting that $\nabla_o(\widehat{\theta}) = 0$, equation (19) takes the simple form
$$I_e(\widehat{\theta} \mid y) = \sum_{i=1}^{n} s(y_i \mid \widehat{\theta})\, s^\top(y_i \mid \widehat{\theta}). \qquad (20)$$
At the kth iteration, the empirical score function for the i-th subject can be computed as
$$s(y_i \mid \theta)^{(k)} = s(y_i \mid \theta)^{(k-1)} + \delta_k \Big[ \frac{1}{m} \sum_{\ell=1}^{m} s(y_i, q^{(\ell,k)}; \theta^{(k)}) - s(y_i \mid \theta)^{(k-1)} \Big], \qquad (21)$$
where $q^{(\ell,k)}$, $\ell = 1, \ldots, m$, are the simulated missing values drawn from the conditional distribution $f(\cdot \mid \theta^{(k-1)}, y_i)$. Thus, at iteration k, the observed information matrix can be approximated as $I_e(\theta \mid y)^{(k)} = \sum_{i=1}^{n} s(y_i \mid \theta)^{(k)}\, s^\top(y_i \mid \theta)^{(k)}$, such that at convergence $I_e^{-1}(\widehat{\theta} \mid y)$ is an estimate of the covariance matrix of the parameter estimates. Expressions for the elements of the score vector with respect to $\theta$ are given in Appendix C.
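At convergence, (20) is just an outer-product accumulation of per-subject score vectors; a minimal sketch (names are ours):

```python
def empirical_info(scores):
    """Empirical information matrix at the ML estimate, equation (20):
    I_e = sum_i s(y_i) s(y_i)^T, given per-subject score vectors."""
    d = len(scores[0])
    info = [[0.0] * d for _ in range(d)]
    for s in scores:
        for a in range(d):
            for b in range(d):
                info[a][b] += s[a] * s[b]
    return info
```

Inverting the resulting matrix then yields the approximate covariance matrix of the parameter estimates, whose diagonal gives the squared standard errors.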
5 Simulated data
In order to examine the performance of the proposed method, we conduct two simulation studies. The first shows that the ML estimates based on the SAEM algorithm have good asymptotic properties. The second investigates the consequences for population inference when the normality assumption is inappropriate; to do so, we use a heavy-tailed distribution for the random error term, testing the robustness of the proposed method in terms of parameter estimation.
5.1 Asymptotic properties

As in Pinheiro & Bates (1995), we performed the first simulation study with the following three quantiles, where $t_{ij} = 100, 267, 433, 600, 767, 933, 1100, 1267, 1433, 1600$ for all i. The goal is to estimate the fixed-effects parameters $\beta$ for the grid of percentiles $p = \{0.50, 0.75, 0.95\}$. A random effect $b_{1i}$, $i = 1, \ldots, n$, is added to the first growth parameter $\beta_1$, and its effect on the growth curve is shown in Figure 2.

The interpretation of the parameters of this model is discussed in Section 6. The random effect $b_{1i}$ and the error term $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{i10})^\top$ are uncorrelated. In fact, $b_{1i} \overset{iid}{\sim} N(0, \sigma_b^2)$ and $\varepsilon_{ij} \overset{iid}{\sim} \mathrm{AL}(0, \sigma_e, p)$.
Figure 2: Effect of including a random effect $b_1$ (for $b_1 = -3, -2, \ldots, 3$) in the first parameter of the non-linear growth-curve logistic model; growth (cm) versus time (days).
We set $\beta_p = (\beta_1, \beta_2, \beta_3)^\top = (200, 700, 350)^\top$, $\sigma_e = 0.5$ and $\sigma_b^2 = 10$. Using the notation in (7), the matrices $A_i$ and $B_i$ are given by $I_3$ and $(1, 0, 0)^\top$, respectively. For each sample size n = 25, 50, 100 and 200, we generate 100 data samples per scenario. In addition, we choose m = 20, c = 0.25 and W = 500 as the SAEM algorithm tuning parameters. Note that the choice of c depends on the dataset and on the underlying model. We set c = 0.25, given that an initial run of 125 iterations (25% of W) for the 0.05th quantile led to convergence to the neighbourhood of the solution. For all scenarios, we compute the square root of the mean square error (RMSE), the bias (Bias) and the Monte Carlo standard deviation (MC-Sd) for each parameter over the 100 replicates. These quantities are defined as
$$\text{MC-Sd}(\theta_i) = \sqrt{ \frac{1}{99} \sum_{j=1}^{100} \big( \widehat{\theta}_i^{(j)} - \bar{\theta}_i \big)^2 } \qquad \text{and} \qquad \mathrm{Bias}(\theta_i) = \bar{\theta}_i - \theta_i, \qquad (23)$$
where $\mathrm{RMSE}(\theta_i) = \sqrt{ \text{MC-Sd}^2(\theta_i) + \mathrm{Bias}^2(\theta_i) }$, the Monte Carlo mean (MC Mean) is $\bar{\theta}_i = \frac{1}{100} \sum_{j=1}^{100} \widehat{\theta}_i^{(j)}$, and $\widehat{\theta}_i^{(j)}$ is the estimate of $\theta_i$ from the j-th sample, $j = 1, \ldots, 100$. Based on Figure 3, we conclude that the bias in the estimation of the fixed effects converges to zero as n increases.
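The replication summaries in (23) translate directly into code (a sketch with hypothetical inputs; the generic $1/(R-1)$ divisor reduces to the paper's $1/99$ when R = 100):

```python
import math

def mc_summaries(estimates, true_value):
    """Monte Carlo mean, MC-Sd, Bias and RMSE over R replicate estimates
    of a single parameter, following equation (23)."""
    R = len(estimates)
    mc_mean = sum(estimates) / R
    mc_sd = math.sqrt(sum((e - mc_mean) ** 2 for e in estimates) / (R - 1))
    bias = mc_mean - true_value
    rmse = math.sqrt(mc_sd ** 2 + bias ** 2)
    return mc_mean, mc_sd, bias, rmse
```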
Table 1: Simulation 1: Monte Carlo mean and standard deviation (MC Mean and MC-Sd) for the fixed effects $\beta$ and scale parameter $\sigma_e$ obtained after fitting the QR-NLME model under different settings of quantiles and sample sizes. Results based on 100 simulated samples.

Quantile (%)  n  |  $\beta_1$: MC Mean, MC-Sd  |  $\beta_2$: MC Mean, MC-Sd  |  $\beta_3$: MC Mean, MC-Sd  |  $\sigma_e$: MC Mean, MC-Sd
Figure 5: Soybean data: (a) Leaf weight profiles versus time. (b) Leaf weight profiles versus time by genotype. (c) Ten randomly selected leaf weight profiles versus time, five per genotype.
The matrices $A_i$ and $B_i$ are defined as
$$A_i = \begin{pmatrix} 1 & 0 & 0 & \mathrm{gen}_i \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \qquad \text{and} \qquad B_i = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (25)$$
The three parameters are interpreted as the asymptotic leaf weight, the time at which the leaf reaches half of its asymptotic weight, and the time elapsed between the leaf reaching half and $0.7311 = 1/(1 + e^{-1})$ of its asymptotic weight, respectively. Since the aim of the study is to compare the final (asymptotic) growth of the two genotypes of soybean, the covariate $\mathrm{gen}_i$ was incorporated in the first component of the growth function. Therefore, the coefficient $\beta_4$ represents the difference (in g) in asymptotic leaf weight between the Plant Introduction genotype and the Forrest one (control). Figure 5 shows the leaf weight profiles.

Figure 6 shows the fitted regression lines for quantiles 0.10, 0.25, 0.50, 0.75 and 0.90 by genotype. From this figure we can see how the extreme quantile functions capture the full variability of the data, detecting some atypical observations (particularly for the Plant Introduction genotype).

Figure 9 in Appendix D shows a summary of the results. We can see that the effect of the genotype is significant over the whole quantile profile. Moreover, the difference varies with the conditional quantile, being more pronounced for lower quantiles. Using the information provided by the 95th percentile, we conclude that the largest soybean plants have a mean leaf weight of around 19.35 grams for the Forrest genotype and 23.25 grams for the Plant Introduction genotype, so the asymptotic difference between the two genotypes is around 4 grams. Finally, it is important to stress that the convergence of the fixed-effect estimates and variance components was assessed using a graphical criterion, as shown in Figure 11 (Appendix D).
Figure 6: Soybean data: Fitted quantile regression lines for several quantiles, by genotype (Forrest and Plant Introduction panels; average leaf weight in g versus time in weeks).
6.2 HIV viral load study

The dataset comes from a clinical trial (ACTG 315) studied in previous works by Wu (2002) and Lachos et al. (2013). In this study, the HIV viral load of 46 HIV-1 infected patients under antiretroviral treatment (protease inhibitor and reverse transcriptase inhibitor drugs) is analysed. The viral load and some other covariates were measured several times after the start of treatment. Wu (2002) found that the only significant covariate for modelling the viral load was the CD4 cell count. Figure 7 shows the profiles of viral load in log10 scale and CD4 cell count (in cells/100 mm³) versus time (in days/100) for six randomly selected patients. We can see an inverse relationship between the viral load and the CD4 cell count, i.e., a high CD4 cell count is associated with lower levels of viral load. This is because the CD4 cells (also called T-cells) alert the immune system to an invasion of viruses and/or bacteria; consequently, a lower CD4 count means a weaker immune system.
In order to fit the ACTG 315 data, we propose the bi-phasic non-linear model considered by Wu (2002) and also used by Lachos et al. (2013). The proposed NLME model is given by
$$y_{ij} = \log_{10}\big( e^{\varphi_{1i} - \varphi_{2i} t_{ij}} + e^{\varphi_{3i} - \varphi_{4i} t_{ij}} \big) + \varepsilon_{ij}, \qquad i = 1, \ldots, 46, \quad j = 1, \ldots, n_i, \qquad (26)$$
with $\varphi_{1i} = \beta_1 + b_{1i}$, $\varphi_{2i} = \beta_2 + b_{2i}$, $\varphi_{3i} = \beta_3 + b_{3i}$ and $\varphi_{4ij} = \beta_4 + \beta_5 \mathrm{CD4}_{ij} + b_{4i}$, where the observed value $y_{ij}$ represents the log10 transformation of the viral load for the ith patient at time j, $\mathrm{CD4}_{ij}$ is the CD4 cell count (in cells/100 mm³) for the ith patient at time j, and $\varepsilon_{ij}$ is the measurement error term. As in the previous case, $\beta_p = (\beta_1, \beta_2, \beta_3, \beta_4, \beta_5)^\top$ and $b_i = (b_{1i}, b_{2i}, b_{3i}, b_{4i})^\top$ denote the fixed and random effects vectors, respectively, and $\mathrm{CD4}_i = (\mathrm{CD4}_{i1}, \ldots, \mathrm{CD4}_{in_i})^\top$. The matrices $A_i$ and $B_i$ are defined as
$$A_i = \begin{pmatrix} I_3 & 0 & 0 \\ 0 & \mathbf{1}_{n_i} & \mathrm{CD4}_i \end{pmatrix} \qquad \text{and} \qquad B_i = \begin{pmatrix} I_3 & 0 \\ 0 & \mathbf{1}_{n_i} \end{pmatrix}. \qquad (27)$$
Figure 7: ACTG 315 data: Profiles of viral load (response) in log10 scale and CD4 cell count (in cells/100 mm³) versus time since infection for six randomly selected patients.
The parameters $\varphi_{2i}$ and $\varphi_{4i}$ are the two-phase viral decay rates, which represent the minimum turnover rates of productively infected cells and of latently or long-lived infected cells, respectively, if therapy is successful. For more details about the model in (26), see Grossman et al. (1999) and Perelson et al. (1997).

Figure 8 shows the fitted regression lines for quantiles 0.10, 0.25, 0.50, 0.75 and 0.90 for the ACTG 315 data. To produce this plot, we first fixed the CD4 covariate at the predicted values from a linear regression (including a quadratic term) of the CD4 cell count on time. We can see that the estimated quantile functions follow the behaviour of the data and make it easy to estimate a specific viral load quantile at any time during the experiment. The extreme quantile functions bound most of the observed profiles and reveal possible influential observations.

The results of fitting the QR-NLME model over the grid of quantiles $p = \{0.05, 0.10, \ldots, 0.95\}$ are shown in Figure 10 in Appendix D. The first-phase viral decay rate is positive, and its effect tends to increase proportionally along the quantiles. Moreover, the second-phase viral decay rate is positively correlated with the CD4 count and therefore with the duration of therapy. Consequently, more days of treatment imply a higher CD4 cell count and therefore a higher second-phase viral decay. The CD4 cell process for this model behaves differently from the expansion phase (Huang & Dagne, 2011). The significance of the CD4 covariate increases with the quantile (until approximately $p = 0.60$), and its effect becomes constant for greater quantiles. As in the previous case, the convergence of the estimates of all parameters was also assessed using the graphical criterion (see Figure 12 in Appendix D).
We compute the conditional expectations $E(u_i)$ and $E(D_i^{-1})$ in the following way. Using matrix expectation properties, we write
$$E(u_i) = [\, E(u_{i1})\;\; E(u_{i2})\;\; \cdots\;\; E(u_{in_i}) \,]^\top \qquad \text{(B.1)}$$
and
$$E(D_i^{-1}) = \mathrm{diag}\big( E(u_i^{-1}) \big) = \begin{pmatrix} E(u_{i1}^{-1}) & 0 & \cdots & 0 \\ 0 & E(u_{i2}^{-1}) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & E(u_{in_i}^{-1}) \end{pmatrix}. \qquad \text{(B.2)}$$
We already have $u_{ij} \mid y_{ij}, b_i \sim \mathrm{GIG}(\tfrac{1}{2}, \chi_{ij}, \psi)$, where $\chi_{ij}$ and $\psi$ are defined in (15). Then, using (5), we compute the moments involved in the equations above as
$$E(u_{ij}) = \frac{\chi_{ij}}{\psi} \Big( 1 + \frac{1}{\chi_{ij}\psi} \Big) \qquad \text{and} \qquad E(u_{ij}^{-1}) = \frac{\psi}{\chi_{ij}}.$$
Thus, for iteration k of the algorithm and for the ℓth Monte Carlo realization, we can compute $E(u_i)^{(\ell,k)}$ and $E(D_i^{-1})^{(\ell,k)}$ using equations (B.1)-(B.2), where
$$E(u_{ij})^{(\ell,k)} = \frac{2\,\big| y_{ij} - \eta_{ij}(\beta_p^{(k)}, b_i^{(\ell,k)}) \big| + 4\sigma^{(k)}}{\tau_p^2} \qquad \text{and} \qquad E(u_{ij}^{-1})^{(\ell,k)} = \frac{\tau_p^2}{2\,\big| y_{ij} - \eta_{ij}(\beta_p^{(k)}, b_i^{(\ell,k)}) \big|}.$$
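A quick numerical check that the closed forms above agree with the generic GIG moments; the AL constants from the paper's equation (3), not reproduced in this excerpt, are stated here as the usual values:

```python
import math

def al_constants(p):
    """Usual AL mixture constants (assumed values for the paper's eq. (3)):
    vartheta_p = (1 - 2p)/(p(1 - p)), tau_p^2 = 2/(p(1 - p))."""
    return (1 - 2 * p) / (p * (1 - p)), 2.0 / (p * (1 - p))

def e_u_moments(resid, sigma, p):
    """E(u_ij) and E(1/u_ij) for u_ij | y_ij, b_i ~ GIG(1/2, chi, psi),
    computed from the generic GIG moments; resid = y_ij - eta_ij."""
    _, tau2 = al_constants(p)
    tau = math.sqrt(tau2)
    chi = abs(resid) / (tau * math.sqrt(sigma))   # equation (15)
    psi = tau / (2.0 * math.sqrt(sigma))          # equation (15)
    e_u = (chi / psi) * (1.0 + 1.0 / (chi * psi))
    e_uinv = psi / chi
    return e_u, e_uinv
```

Substituting (15) into the GIG moments gives exactly the closed forms displayed above, $E(u_{ij}) = (2|r| + 4\sigma)/\tau_p^2$ and $E(u_{ij}^{-1}) = \tau_p^2 / (2|r|)$, which is what the test verifies.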
Appendix C The empirical information matrix

In light of (11), the complete-data log-likelihood function can be rewritten as
$$\ell_{ci}(\theta) = -\frac{3}{2} n_i \log \sigma - \frac{1}{2\sigma\tau_p^2} \zeta_i^\top D_i^{-1} \zeta_i - \frac{1}{2} \log |\Psi| - \frac{1}{2} b_i^\top \Psi^{-1} b_i - \frac{1}{\sigma} u_i^\top \mathbf{1}_{n_i}, \qquad \text{(C.1)}$$
where $\zeta_i = y_i - \eta(\beta_p, b_i) - \vartheta_p u_i$ and $\theta = (\beta_p^\top, \sigma, \alpha^\top)^\top$. Differentiating with respect to $\theta$, we have the following score functions:
$$\frac{\partial \ell_{ci}(\theta)}{\partial \beta_p} = \frac{\partial \eta}{\partial \beta_p}\, \frac{\partial \zeta_i}{\partial \eta}\, \frac{\partial \ell_{ci}(\theta)}{\partial \zeta_i} = \frac{1}{\sigma\tau_p^2} J_i^\top D_i^{-1} \zeta_i,$$
with $J_i$ defined in Section 3.2, and
$$\frac{\partial \ell_{ci}(\theta)}{\partial \sigma} = -\frac{3 n_i}{2}\, \frac{1}{\sigma} + \frac{1}{2\sigma^2\tau_p^2}\, \zeta_i^\top D_i^{-1} \zeta_i + \frac{1}{\sigma^2}\, u_i^\top \mathbf{1}_{n_i}.$$
Let $\alpha$ be the vector of reduced parameters from $\Psi$, the dispersion matrix of $b_i$. Using trace properties and differentiating the complete-data log-likelihood function, we have that
$$\frac{\partial \ell_{ci}(\theta)}{\partial \Psi} = \frac{\partial}{\partial \Psi} \Big[ -\frac{1}{2} \log |\Psi| - \frac{1}{2} \mathrm{tr}\{\Psi^{-1} b_i b_i^\top\} \Big] = -\frac{1}{2} \Psi^{-1} + \frac{1}{2} \Psi^{-1} b_i b_i^\top \Psi^{-1} = \frac{1}{2} \Psi^{-1} \big( b_i b_i^\top - \Psi \big) \Psi^{-1}.$$
Next, taking derivatives with respect to a specific $\alpha_j$ from $\alpha$ via the chain rule, we have
$$\frac{\partial \ell_{ci}(\theta)}{\partial \alpha_j} = \mathrm{tr}\Big\{ \frac{\partial \Psi}{\partial \alpha_j}\; \frac{1}{2} \Psi^{-1} \big( b_i b_i^\top - \Psi \big) \Psi^{-1} \Big\}, \qquad \text{(C.2)}$$
where, using the fact that $\mathrm{tr}\{ABCD\} = (\mathrm{vec}(A^\top))^\top (D^\top \otimes B)\, \mathrm{vec}(C)$, (C.2) can be rewritten as
$$\frac{\partial \ell_{ci}(\theta)}{\partial \alpha_j} = \Big( \mathrm{vec}\Big( \frac{\partial \Psi}{\partial \alpha_j}^{\!\top} \Big) \Big)^{\!\top} \frac{1}{2} \big( \Psi^{-1} \otimes \Psi^{-1} \big)\, \mathrm{vec}\big( b_i b_i^\top - \Psi \big). \qquad \text{(C.3)}$$
Let $D_q$ be the elimination matrix (Lavielle, 2014) that transforms the vectorized $\Psi$ (written as $\mathrm{vec}(\Psi)$) into its half-vectorized form $\mathrm{vech}(\Psi)$, so that $D_q\, \mathrm{vec}(\Psi) = \mathrm{vech}(\Psi)$. Using the fact that, for all $j = 1, \ldots, \tfrac{1}{2}q(q+1)$, the vector $(\mathrm{vec}(\partial \Psi / \partial \alpha_j)^\top)^\top$ corresponds to the jth row of the elimination matrix $D_q$, we can generalize the derivative in (C.3) to the vector of parameters $\alpha$ as
$$\frac{\partial \ell_{ci}(\theta)}{\partial \alpha} = \frac{1}{2} D_q \big( \Psi^{-1} \otimes \Psi^{-1} \big)\, \mathrm{vec}\big( b_i b_i^\top - \Psi \big).$$
Finally, at each iteration we can compute the empirical information matrix by approximating the score of the observed log-likelihood by the stochastic approximation given in (21).
Appendix D Figures
Figure 9: Soybean data: Point estimates (center solid line) and 95% confidence intervals for the model parameters ($\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$ and $\sigma$) over the quantiles 0.05-0.95, after fitting the QR-NLME model. The interpolated curves are spline-smoothed.
Figure 10: ACTG 315 data: Point estimates (center solid line) and 95% confidence intervals for the model parameters ($\beta_1$-$\beta_5$ and $\sigma$) over the quantiles 0.05-0.95, after fitting the QR-NLME model. The interpolated curves are spline-smoothed.
Figure 11: Graphical summary of the convergence of the fixed-effect estimates, variance components of the random effects, and nuisance parameters for a median regression (p = 0.50) on the Soybean data. The vertical dashed line delimits the beginning of the almost sure convergence, as defined by the cut-point parameter c = 0.25.
Figure 12: Graphical summary of the convergence of the fixed-effect estimates, variance components of the random effects, and nuisance parameters for a median regression (p = 0.50) on the HIV data. The vertical dashed line delimits the beginning of the almost sure convergence.