Prologos · Quantile Mixture Regression · Data Application · Epilogos

Bayesian Quantile Mixture Regression

Athanasios Kottas
Department of Statistics, University of California, Santa Cruz
Joint work with Yifei Yan (Amazon, New York)

12th International Conference on Bayesian Nonparametrics, Oxford, U.K., June 24-28, 2019

1 / 28
Quantile regression

• Quantile regression quantifies the relationship between a set of quantiles of the response distribution and covariates, thus providing a more complete explanation of the response distribution.
• Practically important alternative to traditional mean regression models, with a growing literature in terms of methods and applications.
• Single-quantile regression formulation: y_i = h(x_i) + ε_i, i = 1, ..., n
  • Response observations y_i, with covariate vectors x_i.
  • ε_i i.i.d. (given parameters) from an error distribution with density f_p(ε) and p-th quantile equal to 0, i.e., ∫_{−∞}^{0} f_p(ε) dε = p.
  • h(x) is the quantile regression function, e.g., h(x) = x′β for a linear quantile regression model.

2 / 28
Estimation and modeling methods

• Classical nonparametric estimation: point estimates for quantile regression coefficients β through optimization of the check loss function,
  min_β Σ_{i=1}^{n} ρ_p(y_i − x_i′β), where ρ_p(u) = u{p − 1(u < 0)}.
• Restriction µ_{p_1} < ... < µ_{p_K} yields the ordering constraint for the quantiles → facilitates identifiability → connects mixture weights with quantiles.
• Additional scale/skewness/tail parameters θ (may be component-specific).

5 / 28
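A minimal numerical sketch of the check-loss optimization above (the simulated data and optimizer settings are my own illustration, not from the talk):

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(u, p):
    """rho_p(u) = u * (p - 1{u < 0})."""
    return u * (p - (u < 0))

# Illustrative data: linear model with known intercept and slopes.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=(n, 2))
y = 1.0 + x @ np.array([2.0, -1.0]) + rng.normal(size=n)

# Median regression (p = 0.5): minimize the summed check loss over
# the intercept b[0] and slopes b[1:].
p = 0.5
def objective(b):
    return check_loss(y - b[0] - x @ b[1:], p).sum()

fit = minimize(objective, np.zeros(3), method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-8})
```

The recovered coefficients should land near the generating values (1, 2, −1); the objective is piecewise linear, which is why a derivative-free method is used here.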
Connection with quantile regression methods

• Different from simultaneous quantile regression, which jointly models different covariate effects for different quantiles (β_{p_k} instead of β).
• Similar in spirit to composite quantile regression from the classical literature (Zou & Yuan, 2008): for a collection of K specified quantile levels, 0 < p_1 < ... < p_K < 1, the CQR estimator is

  (b_1, ..., b_K, β) = argmin_{b_1,...,b_K,β} Σ_{k=1}^{K} { Σ_{i=1}^{n} ρ_{p_k}(y_i − b_k − x_i′β) }

• b_k ≡ b_{p_k}, for k = 1, ..., K: intercepts for the specified quantile levels.
• Variable selection method with properties that include consistency in selection and asymptotic normality, as n → ∞.
• But it is only an optimization algorithm ... it does not involve/correspond to a probabilistic model.

6 / 28
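A sketch of the CQR objective (common slope vector, one intercept per quantile level); the data-generating choices below are illustrative assumptions, not from Zou & Yuan (2008):

```python
import numpy as np
from scipy.optimize import minimize

def rho(u, p):
    """Check loss rho_p(u) = u * (p - 1{u < 0})."""
    return u * (p - (u < 0))

rng = np.random.default_rng(1)
n, d = 300, 2
x = rng.normal(size=(n, d))
y = x @ np.array([1.5, 0.0]) + rng.standard_t(df=3, size=n)

p_levels = np.array([0.25, 0.5, 0.75])
K = len(p_levels)

def cqr_objective(theta):
    # theta = (b_1, ..., b_K, beta): K intercepts, then the shared slopes.
    b, beta = theta[:K], theta[K:]
    resid = y - x @ beta
    return sum(rho(resid - b[k], p).sum() for k, p in enumerate(p_levels))

fit = minimize(cqr_objective, np.zeros(K + d), method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-8, "fatol": 1e-8})
b_hat, beta_hat = fit.x[:K], fit.x[K:]
```

With symmetric t-distributed errors, the fitted intercepts should be increasing in the quantile level, while a single slope estimate is shared across levels.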
Two modeling scenarios

• We envision two modeling scenarios (with fixed K throughout):
  • fixed p_k: for settings where one expects specific parts of the response distribution (e.g., the right tail) to be informed by the covariates;
  • the fixed-p_k model can also be used when such information is not available, based on a set of equally spaced p_k spanning the unit interval;
  • random p_k: specify only the total number of quantiles, and let the data inform the full configuration of quantile components (both p_k and µ_{p_k}).
• Mixture components need to be flexible distributions parameterized in terms of quantiles.
  • For the random-p_k model, the AL distribution is sufficiently flexible.
  • But not so when the quantile level is fixed → generalized AL distribution.

7 / 28
Asymmetric Laplace distribution

• Asymmetric Laplace (AL) density:

  f^{AL}_p(y | µ, σ) = {p(1 − p)/σ} exp{−(1/σ) ρ_p(y − µ)}, y ∈ ℝ

  where ρ_p(u) = u{p − 1(u < 0)}, σ > 0 is a scale parameter, p ∈ (0, 1), and µ ∈ ℝ corresponds to the p-th quantile, ∫_{−∞}^{µ} f^{AL}_p(y | µ, σ) dy = p.
• For µ = x′β, maximizing the likelihood w.r.t. β under an AL response distribution corresponds to minimizing for β the check loss function.
• This property, along with an effective mixture representation, renders the AL a popular choice in quantile regression modeling.
• But the fixed-p AL distribution is very restrictive:
  • the skewness of the error distribution is fully determined by fixing p;
  • the mode of the error density is at 0 for any p.

8 / 28
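A quick numerical check of the defining property above (an illustrative verification, not code from the talk): the AL density should put mass exactly p below µ and integrate to 1 overall.

```python
import numpy as np
from scipy.integrate import quad

def al_density(y, mu, sigma, p):
    """AL density: p(1-p)/sigma * exp(-rho_p(y - mu)/sigma)."""
    rho = (y - mu) * (p - (y < mu))
    return p * (1 - p) / sigma * np.exp(-rho / sigma)

mu, sigma, p = 1.0, 2.0, 0.3
mass_below_mu, _ = quad(al_density, -np.inf, mu, args=(mu, sigma, p))
total_mass, _ = quad(al_density, -np.inf, np.inf, args=(mu, sigma, p))
# mass_below_mu ~ 0.3 (= p), total_mass ~ 1.0
```

This confirms that µ is the p-th quantile by construction, for any scale σ.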
Generalized AL distribution

• Construction motivated by the AL mixture representation:

  f^{AL}_p(y | µ, σ) = ∫_{ℝ⁺} N(y | µ + σA(p)z, σ²B(p)z) Exp(z | 1) dz

  where A(p) = (1 − 2p)/{p(1 − p)} and B(p) = 2/{p(1 − p)}.
• Replace the normal kernel with a skew-normal density:

  (2/ω) φ((y − ξ)/ω) Φ(α(y − ξ)/ω) = ∫_{ℝ⁺} N(y | ξ + ταs, τ²) N⁺(s | 0, 1) ds

  where α ∈ ℝ is the skewness parameter, τ = ω(1 + α²)^{−1/2}, and N⁺(0, 1) denotes the standard normal distribution truncated over ℝ⁺.
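The normal-exponential mixture representation above can be checked by Monte Carlo (an illustrative sketch; the particular values of y, p, µ, σ are my own): averaging the normal kernel over z ~ Exp(1) should recover the AL density.

```python
import numpy as np
from scipy.stats import norm

def al_density(y, mu, sigma, p):
    rho = (y - mu) * (p - (y < mu))
    return p * (1 - p) / sigma * np.exp(-rho / sigma)

mu, sigma, p = 0.0, 1.0, 0.25
A = (1 - 2 * p) / (p * (1 - p))   # A(p) from the representation
B = 2 / (p * (1 - p))             # B(p) from the representation

rng = np.random.default_rng(2)
z = rng.exponential(scale=1.0, size=200_000)

# Monte Carlo estimate of the mixture integral at a single point y.
y = 0.7
mc = norm.pdf(y, loc=mu + sigma * A * z, scale=sigma * np.sqrt(B * z)).mean()
exact = al_density(y, mu, sigma, p)
```

The Monte Carlo average and the closed-form AL density should agree to a couple of decimal places with this many draws.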
Quantile regression with regularization

• The extension from the AL to the GAL distribution, viewed from the check-loss-function perspective.
• Marginalize over the z_i → π(β, γ, σ, s_1, ..., s_n | data) → posterior full conditional for β:

  π(β | γ, σ, s_1, ..., s_n, data) ∝ π(β) exp{−(1/σ) Σ_{i=1}^{n} ρ_p(y_i − x_i^T β − σH(γ)s_i)}

• H(γ) = γ g(γ)/{g(γ) − |p_0 − I(γ < 0)|}
• p = I(γ < 0) + {[p_0 − I(γ < 0)]/g(γ)}
• p_0 is the probability associated with the quantile modeled through x_i^T β.
• For γ = 0 (AL errors) → check loss function with p = p_0.

13 / 28
Quantile regression with regularization

• Adjusted loss function: Σ_{i=1}^{n} ρ_p(y_i − x_i^T β − σH(γ)s_i)
  • The positive-valued latent variables s_i can be viewed as response-specific weights that are adjusted by the real-valued coefficient H(γ), which is fully specified through the shape parameter γ.
• The real-valued, response-specific terms σH(γ)s_i reflect on the estimation of β the effect of outlying observations, relative to the AL distribution.
• Different versions of regularized quantile regression arise under different priors for β, working with AL errors (Li et al., 2010).
  • Lasso-regularized quantile regression → hierarchical Laplace prior, π(β | σ, λ) = Π_j 0.5 λ σ^{−1} exp(−λ σ^{−1} |β_j|).
• A broader framework for exploring regularization by adjusting the loss function (through the response distribution) in addition to the penalty term (through the prior for the regression coefficients).

14 / 28
GAL mixture model (fixed p_k)

• Use GAL densities for the mixture components:

  f(y | x) = Σ_{k=1}^{K} ω_k f_{p_k}(y | µ_{p_k} + x′β, σ, γ_{p_k})

• Specify the values for 0 < p_1 < ... < p_K < 1.
  • Conditional truncated normal priors for µ_{p_1} < ... < µ_{p_K}.
  • Rescaled Beta priors for the γ_{p_k} (default uniform).
  • Hierarchical Laplace prior for β, π(β | σ, λ) = Π_{j=1}^{d} {λ/(2σ)} exp(−(λ/σ)|β_j|).
• Mixture weights defined through increments of a c.d.f. G (on (0, p_K)):

  ω_1 = G(p_1), ω_k = G(p_k) − G(p_{k−1}), k = 2, ..., K

  where G is assigned a Dirichlet process prior.
• MCMC: one set of latent variables for the mixture, two sets for the GAL densities → M-H steps for the γ_{p_k}; Gibbs sampling for all other parameters.

15 / 28
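The weight construction above can be sketched with a fixed c.d.f. standing in for one draw of the Dirichlet-process-distributed G (the Beta(2, 2) choice here is an illustrative assumption, not the model's prior):

```python
import numpy as np
from scipy.stats import beta

p = np.array([k / 10 for k in range(1, 10)])   # fixed p_k = k/10, K = 9
G = beta(2, 2).cdf                              # stand-in for one draw of G

# omega_1 = G(p_1), omega_k = G(p_k) - G(p_{k-1}); dividing by G(p_K)
# mimics restricting G to (0, p_K) so the K weights sum to 1.
raw = np.diff(np.concatenate(([0.0], G(p))))
weights = raw / G(p[-1])
```

Because the weights are c.d.f. increments at ordered points, they are automatically nonnegative and tied to the quantile levels p_k.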
Synthetic data examples

• Different data sets simulated from y_i = x_i′β + ε_i, where:
  • β = (3, 1.5, 0, 0, 2, 0, 0, 0)′;
  • x_i generated independently from a N_8(0, Σ) distribution, with (i, j)-th covariance element 0.5^{|i−j|}, for 1 ≤ i, j ≤ 8.
• Different scenarios for the error distribution:
  • mixture of three AL components (to highlight the benefits of the GAL mixture kernel), with n = 600;
  • normal and skew-normal distributions, with n = 500 in each case.

16 / 28
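The simulation design above can be reproduced directly (standard normal errors are used here as one of the stated scenarios; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

# Sparse coefficient vector from the slides: three nonzero entries.
beta = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])

# AR(1)-type covariance: Sigma_ij = 0.5^{|i - j|}, 8 x 8.
idx = np.arange(8)
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])

n = 600
x = rng.multivariate_normal(mean=np.zeros(8), cov=Sigma, size=n)
eps = rng.normal(size=n)        # normal-error scenario
y = x @ beta + eps
```

Adjacent covariates are correlated at roughly 0.5, which is what makes the variable-selection aspect of the regularized models nontrivial.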
Prologos Quantile Mixture Regression Data Application Epilogos
−10 −5 0 5 10
0.00
0.05
0.10
0.15
0.20
Posterior predictive density
x
Den
sity
Truth (population)Truth (data)Posterior mean
(a) GAL mixture
−10 −5 0 5 10
0.00
0.05
0.10
0.15
0.20
0.25
Posterior predictive density
xD
ensi
ty
Truth (population)Truth (data)Posterior mean
(b) AL mixture
Figure: Synthetic data from a mixture of three AL components. Posterior mean and 95% intervalestimates for the error density under the GAL and AL mixture models. In both cases, pk = k/10,for k = 1, ..., 9.
17 / 28
Prologos Quantile Mixture Regression Data Application Epilogos
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
0.5
Posterior predictive density
x
Den
sity
Truth (population)Truth (data)Posterior mean
(a) Normal errors
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
Posterior predictive density
xD
ensi
ty
Truth (population)Truth (data)Posterior mean
(b) Skew-normal errors
Figure: Synthetic data from normal and skew-normal distributions. Posterior mean and 95%interval estimates for the error density under the GAL mixture model (with fixed pk = k/10, fork = 1, ..., 9).
18 / 28
Prologos Quantile Mixture Regression Data Application Epilogos
0.0 0.2 0.4 0.6 0.8
0.0
0.1
0.2
0.3
0.4
0.5
x
G(k
K)−
G((
k−
1)K
)
PriorPosterior
(a) Mixture weights
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
p = 0.1
x
Den
sity
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
p = 0.2
x
Den
sity
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
p = 0.3
x
Den
sity
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
p = 0.4
x
Den
sity
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
p = 0.5
x
Den
sity
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
p = 0.6
x
Den
sity
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
p = 0.7
Den
sity
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
p = 0.8
Den
sity
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
p = 0.9
Den
sity
(b) Weighted mixture components
Figure: Synthetic data from normal distribution. Prior and posterior for the mixture weights,and posterior mean and 95% interval estimates for the error density weighted components, underthe GAL mixture model (with fixed pk = k/10, for k = 1, ..., 9).
19 / 28
Prologos Quantile Mixture Regression Data Application Epilogos
0.0 0.2 0.4 0.6 0.8
0.0
0.1
0.2
0.3
0.4
0.5
0.6
x
G(k
K)−
G((
k−
1)K
)
PriorPosterior
(a) Mixture weights
−4 −2 0 2 4
0.00
0.10
0.20
0.30
p = 0.1
x
Den
sity
−4 −2 0 2 4
0.00
0.10
0.20
0.30
p = 0.2
x
Den
sity
−4 −2 0 2 4
0.00
0.10
0.20
0.30
p = 0.3
x
Den
sity
−4 −2 0 2 4
0.00
0.10
0.20
0.30
p = 0.4
x
Den
sity
−4 −2 0 2 4
0.00
0.10
0.20
0.30
p = 0.5
x
Den
sity
−4 −2 0 2 4
0.00
0.10
0.20
0.30
p = 0.6
x
Den
sity
−4 −2 0 2 4
0.00
0.10
0.20
0.30
p = 0.7
Den
sity
−4 −2 0 2 4
0.00
0.10
0.20
0.30
p = 0.8
Den
sity
−4 −2 0 2 4
0.00
0.10
0.20
0.30
p = 0.9
Den
sity
(b) Weighted mixture components
Figure: Synthetic data from skew-normal distribution. Prior and posterior for the mixtureweights, and posterior mean and 95% interval estimates for the error density weighted com-ponents, under the GAL mixture model (with fixed pk = k/10, for k = 1, ..., 9).
20 / 28
AL mixture model (random p_k)

• Mixture with AL components:

  f(y | x) = Σ_{k=1}^{K} ω_k f^{AL}_{p_k}(y | µ_{p_k} + x′β, σ)

• Random 0 < p_1 < ... < p_K < 1 generated from a Poisson process on (0, 1), conditioning on K (uniform subject to the monotonicity restriction).
• Similar priors as before for the µ_{p_1} < ... < µ_{p_K}, for β, and for the mixture weights.
• Illustration (and comparison with the GAL mixture) using synthetic data from a skew-normal distribution (n = 200).

21 / 28
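The "uniform subject to the monotonicity restriction" prior on the quantile levels can be sketched as follows (an illustrative note: conditioning a unit-rate Poisson process on (0, 1) on having K points makes those points distributed as the order statistics of K i.i.d. Uniform(0, 1) draws):

```python
import numpy as np

rng = np.random.default_rng(4)
K = 5

# Sort K i.i.d. uniforms to obtain ordered quantile levels
# 0 < p_1 < ... < p_K < 1.
p = np.sort(rng.uniform(size=K))
```

This gives one prior draw of the quantile-level configuration that the data then update through the mixture likelihood.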
Prologos Quantile Mixture Regression Data Application Epilogos
−4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
0.5
Posterior predictive density
x
Den
sity
Truth (population)Truth (data)Posterior mean
(a) AL mixture, random pk
−4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
Posterior predictive density
xD
ensi
ty
Truth (population)Truth (data)Posterior mean
(b) GAL mixture, fixed pk
Figure: Synthetic data from skew-normal distribution. Posterior mean and 95% interval esti-mates for the error density under the AL mixture with K = 5 and random pk , and the GALmixture with fixed pk = {0.1, 0.25, 0.5, 0.75, 0.9}.
22 / 28
Prologos Quantile Mixture Regression Data Application Epilogos
−4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
0.5
Posterior predictive error density
x
Den
sity
−4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
0.5
p1 = 0.24, w1 = 0.23
xD
ensi
ty
−4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
0.5
p2 = 0.35, w2 = 0.27
x
Den
sity
−4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
0.5
p3 = 0.45, w3 = 0.24
Den
sity
−4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
0.5
p4 = 0.56, w4 = 0.17D
ensi
ty
−4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
0.5
p5 = 0.71, w5 = 0.09
Den
sity
Figure: Synthetic data from skew-normal distribution. Posterior mean and 95% interval esti-mates for the error density (top left panel) and for the error density weighted components, underthe AL mixture model with K = 5 and random pk .
23 / 28
Prologos Quantile Mixture Regression Data Application Epilogos
Boston housing data example
• Realty price data from the Boston area with n = 506 observations:
• response: log-transformed median value of owner-occupied housing in USD 1000;
• 15 predictors, including: per capita crime rate (CRIM), nitric oxides concentration (parts per 10 million) per town (NOX), average number of rooms per dwelling (RM), index of accessibility to radial highways per town (RAD), full-value property-tax rate per USD 10,000 per town (TAX), transformed African American population proportion (B), and percentage of lower status population (LSTAT).
• Similar inferences under the random-pk AL mixture and the fixed-pk GAL mixture models (both with K = 9).
• Based on the posterior predictive criterion LPML, both models outperform single-quantile regression models (with GAL errors) at essentially any fixed quantile level.
24 / 28
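The LPML comparison above rests on conditional predictive ordinates (CPO). A minimal sketch of how LPML could be computed from saved MCMC output follows; the function name `lpml` and the S x n log-likelihood layout are illustrative assumptions, not part of the talk.

```python
import math

def lpml(loglik):
    """LPML (log pseudo-marginal likelihood) from an S x n matrix of
    pointwise log-likelihoods, loglik[s][i] = log f(y_i | theta^(s)).
    CPO_i is the harmonic mean of f(y_i | theta^(s)) over posterior
    draws s = 1..S, and LPML = sum_i log CPO_i (larger is better)."""
    S, n = len(loglik), len(loglik[0])
    total = 0.0
    for i in range(n):
        neg = [-loglik[s][i] for s in range(S)]   # negated log-likelihoods
        m = max(neg)                              # log-sum-exp shift for stability
        lse = m + math.log(sum(math.exp(v - m) for v in neg))
        total += math.log(S) - lse                # log CPO_i
    return total
```

The harmonic-mean form makes CPO_i a leave-one-out predictive density approximation, which is why LPML serves as the cross-validatory criterion for the model comparison cited above.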
[Figure omitted in transcript.]
Figure: Boston housing data. Under the random-pk AL mixture model, posterior mean and 95% interval estimates for: (a) the error density; (b) the error density weighted components, with panels labeled p = 0.12, 0.22, 0.31, 0.38, 0.46, 0.54, 0.65, 0.76, and 0.87.
25 / 28
[Figure omitted in transcript.]
Figure: Boston housing data. Posterior mean and 95% interval estimates for βj, j = 1, ..., 15 (LON, LAT, CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, LSTAT), under the fixed-pk GAL mixture model and the random-pk AL mixture model.
26 / 28
Summary
• A mixture model to:
• integrate information from multiple parts of the response distribution to inform estimation of covariate effects;
• identify the most relevant parts of the response distribution through mixture weights associated with different quantile levels.
• Two modeling scenarios: mixtures of GAL/AL distributions with fixed/random quantile levels.
• Applications to ROC curve estimation with covariates, extending the mixture model to incorporate stochastic ordering for the response distributions associated with the infected and non-infected groups.
27 / 28
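The first modeling scenario, a mixture of AL kernels across quantile levels, can be sketched directly. This is a minimal illustration under assumed names (`al_density`, `al_mixture_density`) and the standard AL parameterization with the p-th quantile fixed at zero; it is not code from the talk.

```python
import math

def al_density(eps, p, sigma=1.0):
    """Asymmetric Laplace density with quantile level p and scale sigma,
    parameterized so that its p-th quantile is at zero:
    f_p(eps) = p(1-p)/sigma * exp(-rho_p(eps/sigma)),
    with check loss rho_p(u) = u * (p - 1{u < 0})."""
    u = eps / sigma
    rho = u * (p - (1.0 if u < 0 else 0.0))
    return p * (1.0 - p) / sigma * math.exp(-rho)

def al_mixture_density(eps, ps, ws, sigmas):
    """K-component quantile mixture: f(eps) = sum_k w_k f_{p_k}(eps | sigma_k),
    mixing AL kernels at distinct quantile levels p_k."""
    return sum(w * al_density(eps, p, s) for p, w, s in zip(ps, ws, sigmas))
```

Because each kernel places its p_k-th quantile at zero, the posterior weight w_k attached to level p_k indicates how strongly that part of the response distribution informs the fit, which is the mechanism the summary bullets describe.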
• Acknowledgment: funding from NSF under award SES 1631963.