Prologos · Quantile Mixture Regression · Data Application · Epilogos

Bayesian Quantile Mixture Regression

Athanasios Kottas
Department of Statistics, University of California, Santa Cruz
Joint work with Yifei Yan (Amazon, New York)

12th International Conference on Bayesian Nonparametrics, Oxford, U.K., June 24-28, 2019

1 / 28
Quantile regression

• Quantile regression quantifies the relationship between a set of quantiles of the response distribution and covariates, thus providing a more complete explanation of the response distribution.
• Practically important alternative to traditional mean regression models, with a growing literature in terms of methods and applications.
• Single-quantile regression formulation: y_i = h(x_i) + ε_i, i = 1, ..., n
  • Response observations y_i, with covariate vectors x_i.
  • ε_i i.i.d. (given parameters) from an error distribution with density f_p(ε) and p-th quantile equal to 0, i.e., ∫_{−∞}^{0} f_p(ε) dε = p.
  • h(x) is the quantile regression function, e.g., h(x) = x′β for a linear quantile regression model.

2 / 28
Estimation and modeling methods

• Classical nonparametric estimation: point estimates for quantile regression coefficients β through optimization of the check loss function,
  min_β Σ_{i=1}^{n} ρ_p(y_i − x_i′β), where ρ_p(u) = u{p − 1(u < 0)}.
• Restriction µ_{p_1} < ... < µ_{p_K} yields the ordering constraint for the quantiles → facilitates identifiability → connects mixture weights with quantiles.
• Additional scale/skewness/tail parameters θ (may be component-specific).

5 / 28
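A minimal numerical sketch of the check-loss optimization above (the simulated data and optimizer settings are my own illustration, not from the talk):

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(u, p):
    """rho_p(u) = u * (p - 1{u < 0})."""
    return u * (p - (u < 0))

# Illustrative data: linear model with known intercept and slopes.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=(n, 2))
y = 1.0 + x @ np.array([2.0, -1.0]) + rng.normal(size=n)

# Median regression (p = 0.5): minimize the summed check loss over
# the intercept b[0] and slopes b[1:].
p = 0.5
def objective(b):
    return check_loss(y - b[0] - x @ b[1:], p).sum()

fit = minimize(objective, np.zeros(3), method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-8})
```

The recovered coefficients should land near the generating values (1, 2, −1); the objective is piecewise linear, which is why a derivative-free method is used here.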
Connection with quantile regression methods

• Different from simultaneous quantile regression, which jointly models different covariate effects for different quantiles (β_{p_k} instead of β).
• Similar in spirit to composite quantile regression from the classical literature (Zou & Yuan, 2008): for a collection of K specified quantile levels, 0 < p_1 < ... < p_K < 1, the CQR estimator is

  (b_1, ..., b_K, β) = argmin_{b_1,...,b_K,β} Σ_{k=1}^{K} { Σ_{i=1}^{n} ρ_{p_k}(y_i − b_k − x_i′β) }

• b_k ≡ b_{p_k}, for k = 1, ..., K: intercepts for the specified quantile levels.
• Variable selection method with properties that include consistency in selection and asymptotic normality, as n → ∞.
• But it is only an optimization algorithm ... it does not involve/correspond to a probabilistic model.

6 / 28
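A sketch of the CQR objective (common slope vector, one intercept per quantile level); the data-generating choices below are illustrative assumptions, not from Zou & Yuan (2008):

```python
import numpy as np
from scipy.optimize import minimize

def rho(u, p):
    """Check loss rho_p(u) = u * (p - 1{u < 0})."""
    return u * (p - (u < 0))

rng = np.random.default_rng(1)
n, d = 300, 2
x = rng.normal(size=(n, d))
y = x @ np.array([1.5, 0.0]) + rng.standard_t(df=3, size=n)

p_levels = np.array([0.25, 0.5, 0.75])
K = len(p_levels)

def cqr_objective(theta):
    # theta = (b_1, ..., b_K, beta): K intercepts, then the shared slopes.
    b, beta = theta[:K], theta[K:]
    resid = y - x @ beta
    return sum(rho(resid - b[k], p).sum() for k, p in enumerate(p_levels))

fit = minimize(cqr_objective, np.zeros(K + d), method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-8, "fatol": 1e-8})
b_hat, beta_hat = fit.x[:K], fit.x[K:]
```

With symmetric t-distributed errors, the fitted intercepts should be increasing in the quantile level, while a single slope estimate is shared across levels.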
Two modeling scenarios

• We envision two modeling scenarios (with fixed K throughout):
  • fixed p_k: for settings where one expects specific parts of the response distribution (e.g., the right tail) to be informed by the covariates;
  • the fixed-p_k model can also be used when such information is not available, based on a set of equally spaced p_k spanning the unit interval;
  • random p_k: specify only the total number of quantiles, and let the data inform the full configuration of quantile components (both p_k and µ_{p_k}).
• Mixture components need to be flexible distributions parameterized in terms of quantiles.
  • For the random-p_k model, the AL distribution is sufficiently flexible.
  • But not so when the quantile level is fixed → generalized AL distribution.

7 / 28
Asymmetric Laplace distribution

• Asymmetric Laplace (AL) density:

  f^{AL}_p(y | µ, σ) = {p(1 − p)/σ} exp{−(1/σ) ρ_p(y − µ)}, y ∈ ℝ

  where ρ_p(u) = u{p − 1(u < 0)}, σ > 0 is a scale parameter, p ∈ (0, 1), and µ ∈ ℝ corresponds to the p-th quantile, ∫_{−∞}^{µ} f^{AL}_p(y | µ, σ) dy = p.
• For µ = x′β, maximizing the likelihood w.r.t. β under an AL response distribution corresponds to minimizing for β the check loss function.
• This property, along with an effective mixture representation, renders the AL a popular choice in quantile regression modeling.
• But the fixed-p AL distribution is very restrictive:
  • the skewness of the error distribution is fully determined by fixing p;
  • the mode of the error density is at 0 for any p.

8 / 28
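A quick numerical check of the defining property above (an illustrative verification, not code from the talk): the AL density should put mass exactly p below µ and integrate to 1 overall.

```python
import numpy as np
from scipy.integrate import quad

def al_density(y, mu, sigma, p):
    """AL density: p(1-p)/sigma * exp(-rho_p(y - mu)/sigma)."""
    rho = (y - mu) * (p - (y < mu))
    return p * (1 - p) / sigma * np.exp(-rho / sigma)

mu, sigma, p = 1.0, 2.0, 0.3
mass_below_mu, _ = quad(al_density, -np.inf, mu, args=(mu, sigma, p))
total_mass, _ = quad(al_density, -np.inf, np.inf, args=(mu, sigma, p))
# mass_below_mu ~ 0.3 (= p), total_mass ~ 1.0
```

This confirms that µ is the p-th quantile by construction, for any scale σ.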
Generalized AL distribution

• Construction motivated by the AL mixture representation:

  f^{AL}_p(y | µ, σ) = ∫_{ℝ⁺} N(y | µ + σA(p)z, σ²B(p)z) Exp(z | 1) dz

  where A(p) = (1 − 2p)/{p(1 − p)} and B(p) = 2/{p(1 − p)}.
• Replace the normal kernel with a skew-normal density:

  (2/ω) φ((y − ξ)/ω) Φ(α(y − ξ)/ω) = ∫_{ℝ⁺} N(y | ξ + ταs, τ²) N⁺(s | 0, 1) ds

  where α ∈ ℝ is the skewness parameter, τ = ω(1 + α²)^{−1/2}, and N⁺(0, 1) denotes the standard normal distribution truncated over ℝ⁺.
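The normal-exponential mixture representation above can be checked by Monte Carlo (an illustrative sketch; the particular values of y, p, µ, σ are my own): averaging the normal kernel over z ~ Exp(1) should recover the AL density.

```python
import numpy as np
from scipy.stats import norm

def al_density(y, mu, sigma, p):
    rho = (y - mu) * (p - (y < mu))
    return p * (1 - p) / sigma * np.exp(-rho / sigma)

mu, sigma, p = 0.0, 1.0, 0.25
A = (1 - 2 * p) / (p * (1 - p))   # A(p) from the representation
B = 2 / (p * (1 - p))             # B(p) from the representation

rng = np.random.default_rng(2)
z = rng.exponential(scale=1.0, size=200_000)

# Monte Carlo estimate of the mixture integral at a single point y.
y = 0.7
mc = norm.pdf(y, loc=mu + sigma * A * z, scale=sigma * np.sqrt(B * z)).mean()
exact = al_density(y, mu, sigma, p)
```

The Monte Carlo average and the closed-form AL density should agree to a couple of decimal places with this many draws.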
Quantile regression with regularization

• The extension from the AL to the GAL distribution, viewed from the check-loss-function perspective.
• Marginalize over the z_i → π(β, γ, σ, s_1, ..., s_n | data) → posterior full conditional for β:

  π(β | γ, σ, s_1, ..., s_n, data) ∝ π(β) exp{−(1/σ) Σ_{i=1}^{n} ρ_p(y_i − x_i^T β − σH(γ)s_i)}

• H(γ) = γ g(γ)/{g(γ) − |p_0 − I(γ < 0)|}
• p = I(γ < 0) + {[p_0 − I(γ < 0)]/g(γ)}
• p_0 is the probability associated with the quantile modeled through x_i^T β.
• For γ = 0 (AL errors) → check loss function with p = p_0.

13 / 28
Quantile regression with regularization

• Adjusted loss function: Σ_{i=1}^{n} ρ_p(y_i − x_i^T β − σH(γ)s_i)
  • The positive-valued latent variables s_i can be viewed as response-specific weights that are adjusted by the real-valued coefficient H(γ), which is fully specified through the shape parameter γ.
• The real-valued, response-specific terms σH(γ)s_i reflect on the estimation of β the effect of outlying observations, relative to the AL distribution.
• Different versions of regularized quantile regression arise under different priors for β, working with AL errors (Li et al., 2010).
  • Lasso-regularized quantile regression → hierarchical Laplace prior, π(β | σ, λ) = Π_j 0.5 λ σ^{−1} exp(−λ σ^{−1} |β_j|).
• A broader framework for exploring regularization by adjusting the loss function (through the response distribution) in addition to the penalty term (through the prior for the regression coefficients).

14 / 28
GAL mixture model (fixed p_k)

• Use GAL densities for the mixture components:

  f(y | x) = Σ_{k=1}^{K} ω_k f_{p_k}(y | µ_{p_k} + x′β, σ, γ_{p_k})

• Specify the values for 0 < p_1 < ... < p_K < 1.
  • Conditional truncated normal priors for µ_{p_1} < ... < µ_{p_K}.
  • Rescaled Beta priors for the γ_{p_k} (default uniform).
  • Hierarchical Laplace prior for β, π(β | σ, λ) = Π_{j=1}^{d} {λ/(2σ)} exp(−(λ/σ)|β_j|).
• Mixture weights defined through increments of a c.d.f. G (on (0, p_K)):

  ω_1 = G(p_1), ω_k = G(p_k) − G(p_{k−1}), k = 2, ..., K

  where G is assigned a Dirichlet process prior.
• MCMC: one set of latent variables for the mixture, two sets for the GAL densities → M-H steps for the γ_{p_k}; Gibbs sampling for all other parameters.

15 / 28
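The weight construction above can be sketched with a fixed c.d.f. standing in for one draw of the Dirichlet-process-distributed G (the Beta(2, 2) choice here is an illustrative assumption, not the model's prior):

```python
import numpy as np
from scipy.stats import beta

p = np.array([k / 10 for k in range(1, 10)])   # fixed p_k = k/10, K = 9
G = beta(2, 2).cdf                              # stand-in for one draw of G

# omega_1 = G(p_1), omega_k = G(p_k) - G(p_{k-1}); dividing by G(p_K)
# mimics restricting G to (0, p_K) so the K weights sum to 1.
raw = np.diff(np.concatenate(([0.0], G(p))))
weights = raw / G(p[-1])
```

Because the weights are c.d.f. increments at ordered points, they are automatically nonnegative and tied to the quantile levels p_k.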
Synthetic data examples

• Different data sets simulated from y_i = x_i′β + ε_i, where:
  • β = (3, 1.5, 0, 0, 2, 0, 0, 0)′;
  • x_i generated independently from a N_8(0, Σ) distribution, with (i, j)-th covariance element 0.5^{|i−j|}, for 1 ≤ i, j ≤ 8.
• Different scenarios for the error distribution:
  • mixture of three AL components (to highlight the benefits of the GAL mixture kernel), with n = 600;
  • normal and skew-normal distributions, with n = 500 in each case.

16 / 28
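The simulation design above can be reproduced directly (standard normal errors are used here as one of the stated scenarios; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

# Sparse coefficient vector from the slides: three nonzero entries.
beta = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])

# AR(1)-type covariance: Sigma_ij = 0.5^{|i - j|}, 8 x 8.
idx = np.arange(8)
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])

n = 600
x = rng.multivariate_normal(mean=np.zeros(8), cov=Sigma, size=n)
eps = rng.normal(size=n)        # normal-error scenario
y = x @ beta + eps
```

Adjacent covariates are correlated at roughly 0.5, which is what makes the variable-selection aspect of the regularized models nontrivial.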
Prologos Quantile Mixture Regression Data Application Epilogos
−10 −5 0 5 10
0.00
0.05
0.10
0.15
0.20
Posterior predictive density
x
Den
sity
Truth (population)Truth (data)Posterior mean
(a) GAL mixture
−10 −5 0 5 10
0.00
0.05
0.10
0.15
0.20
0.25
Posterior predictive density
xD
ensi
ty
Truth (population)Truth (data)Posterior mean
(b) AL mixture
Figure: Synthetic data from a mixture of three AL components. Posterior mean and 95% intervalestimates for the error density under the GAL and AL mixture models. In both cases, pk = k/10,for k = 1, ..., 9.
17 / 28
Prologos Quantile Mixture Regression Data Application Epilogos
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
0.5
Posterior predictive density
x
Den
sity
Truth (population)Truth (data)Posterior mean
(a) Normal errors
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
Posterior predictive density
xD
ensi
ty
Truth (population)Truth (data)Posterior mean
(b) Skew-normal errors
Figure: Synthetic data from normal and skew-normal distributions. Posterior mean and 95%interval estimates for the error density under the GAL mixture model (with fixed pk = k/10, fork = 1, ..., 9).
18 / 28
Prologos Quantile Mixture Regression Data Application Epilogos
0.0 0.2 0.4 0.6 0.8
0.0
0.1
0.2
0.3
0.4
0.5
x
G(k
K)−
G((
k−
1)K
)
PriorPosterior
(a) Mixture weights
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
p = 0.1
x
Den
sity
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
p = 0.2
x
Den
sity
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
p = 0.3
x
Den
sity
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
p = 0.4
x
Den
sity
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
p = 0.5
x
Den
sity
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
p = 0.6
x
Den
sity
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
p = 0.7
Den
sity
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
p = 0.8
Den
sity
−4 −2 0 2 4
0.0
0.1
0.2
0.3
0.4
p = 0.9
Den
sity
(b) Weighted mixture components
Figure: Synthetic data from normal distribution. Prior and posterior for the mixture weights,and posterior mean and 95% interval estimates for the error density weighted components, underthe GAL mixture model (with fixed pk = k/10, for k = 1, ..., 9).
19 / 28
Prologos Quantile Mixture Regression Data Application Epilogos
0.0 0.2 0.4 0.6 0.8
0.0
0.1
0.2
0.3
0.4
0.5
0.6
x
G(k
K)−
G((
k−
1)K
)
PriorPosterior
(a) Mixture weights
−4 −2 0 2 4
0.00
0.10
0.20
0.30
p = 0.1
x
Den
sity
−4 −2 0 2 4
0.00
0.10
0.20
0.30
p = 0.2
x
Den
sity
−4 −2 0 2 4
0.00
0.10
0.20
0.30
p = 0.3
x
Den
sity
−4 −2 0 2 4
0.00
0.10
0.20
0.30
p = 0.4
x
Den
sity
−4 −2 0 2 4
0.00
0.10
0.20
0.30
p = 0.5
x
Den
sity
−4 −2 0 2 4
0.00
0.10
0.20
0.30
p = 0.6
x
Den
sity
−4 −2 0 2 4
0.00
0.10
0.20
0.30
p = 0.7
Den
sity
−4 −2 0 2 4
0.00
0.10
0.20
0.30
p = 0.8
Den
sity
−4 −2 0 2 4
0.00
0.10
0.20
0.30
p = 0.9
Den
sity
(b) Weighted mixture components
Figure: Synthetic data from skew-normal distribution. Prior and posterior for the mixtureweights, and posterior mean and 95% interval estimates for the error density weighted com-ponents, under the GAL mixture model (with fixed pk = k/10, for k = 1, ..., 9).
20 / 28
AL mixture model (random p_k)

• Mixture with AL components:

  f(y | x) = Σ_{k=1}^{K} ω_k f^{AL}_{p_k}(y | µ_{p_k} + x′β, σ)

• Random 0 < p_1 < ... < p_K < 1 generated from a Poisson process on (0, 1), conditioning on K (uniform subject to the monotonicity restriction).
• Similar priors as before for the µ_{p_1} < ... < µ_{p_K}, for β, and for the mixture weights.
• Illustration (and comparison with the GAL mixture) using synthetic data from a skew-normal distribution (n = 200).

21 / 28
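The "uniform subject to the monotonicity restriction" prior on the quantile levels can be sketched as follows (an illustrative note: conditioning a unit-rate Poisson process on (0, 1) on having K points makes those points distributed as the order statistics of K i.i.d. Uniform(0, 1) draws):

```python
import numpy as np

rng = np.random.default_rng(4)
K = 5

# Sort K i.i.d. uniforms to obtain ordered quantile levels
# 0 < p_1 < ... < p_K < 1.
p = np.sort(rng.uniform(size=K))
```

This gives one prior draw of the quantile-level configuration that the data then update through the mixture likelihood.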
Prologos Quantile Mixture Regression Data Application Epilogos
−4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
0.5
Posterior predictive density
x
Den
sity
Truth (population)Truth (data)Posterior mean
(a) AL mixture, random pk
−4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
Posterior predictive density
xD
ensi
ty
Truth (population)Truth (data)Posterior mean
(b) GAL mixture, fixed pk
Figure: Synthetic data from skew-normal distribution. Posterior mean and 95% interval esti-mates for the error density under the AL mixture with K = 5 and random pk , and the GALmixture with fixed pk = {0.1, 0.25, 0.5, 0.75, 0.9}.
22 / 28
Prologos Quantile Mixture Regression Data Application Epilogos
−4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
0.5
Posterior predictive error density
x
Den
sity
−4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
0.5
p1 = 0.24, w1 = 0.23
xD
ensi
ty
−4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
0.5
p2 = 0.35, w2 = 0.27
x
Den
sity
−4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
0.5
p3 = 0.45, w3 = 0.24
Den
sity
−4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
0.5
p4 = 0.56, w4 = 0.17D
ensi
ty
−4 −2 0 2 4 6
0.0
0.1
0.2
0.3
0.4
0.5
p5 = 0.71, w5 = 0.09
Den
sity
Figure: Synthetic data from skew-normal distribution. Posterior mean and 95% interval esti-mates for the error density (top left panel) and for the error density weighted components, underthe AL mixture model with K = 5 and random pk .
23 / 28
Prologos Quantile Mixture Regression Data Application Epilogos
Boston housing data example
• Realty price data from the Boston area with n = 506 observations:
• response: log-transformed median value of owner-occupied housing in USD 1000;
• 15 predictors, including: per capita crime rate (CRIM), nitric oxides concentration (parts per 10 million) per town (NOX), average number of rooms per dwelling (RM), index of accessibility to radial highways per town (RAD), full-value property-tax rate per USD 10,000 per town (TAX), transformed African American population proportion (B), and percentage of lower status population (LSTAT).
• Similar inferences under the random-pk AL mixture and the fixed-pk GAL mixture models (both with K = 9).
• Based on the posterior predictive criterion LPML, both models outperform single-quantile regression models (with GAL errors) at essentially any fixed quantile level.
24 / 28
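The LPML comparison above rests on conditional predictive ordinates (CPO). A minimal sketch of how LPML could be computed from saved MCMC output follows; the function name `lpml` and the S x n log-likelihood layout are illustrative assumptions, not part of the talk.

```python
import math

def lpml(loglik):
    """LPML (log pseudo-marginal likelihood) from an S x n matrix of
    pointwise log-likelihoods, loglik[s][i] = log f(y_i | theta^(s)).
    CPO_i is the harmonic mean of f(y_i | theta^(s)) over posterior
    draws s = 1..S, and LPML = sum_i log CPO_i (larger is better)."""
    S, n = len(loglik), len(loglik[0])
    total = 0.0
    for i in range(n):
        neg = [-loglik[s][i] for s in range(S)]   # negated log-likelihoods
        m = max(neg)                              # log-sum-exp shift for stability
        lse = m + math.log(sum(math.exp(v - m) for v in neg))
        total += math.log(S) - lse                # log CPO_i
    return total
```

The harmonic-mean form makes CPO_i a leave-one-out predictive density approximation, which is why LPML serves as the cross-validatory criterion for the model comparison cited above.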
[Figure omitted in transcript.]
Figure: Boston housing data. Under the random-pk AL mixture model, posterior mean and 95% interval estimates for: (a) the error density; (b) the error density weighted components, with panels labeled p = 0.12, 0.22, 0.31, 0.38, 0.46, 0.54, 0.65, 0.76, and 0.87.
25 / 28
[Figure omitted in transcript.]
Figure: Boston housing data. Posterior mean and 95% interval estimates for βj, j = 1, ..., 15 (LON, LAT, CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, LSTAT), under the fixed-pk GAL mixture model and the random-pk AL mixture model.
26 / 28
Summary
• A mixture model to:
• integrate information from multiple parts of the response distribution to inform estimation of covariate effects;
• identify the most relevant parts of the response distribution through mixture weights associated with different quantile levels.
• Two modeling scenarios: mixtures of GAL/AL distributions with fixed/random quantile levels.
• Applications to ROC curve estimation with covariates, extending the mixture model to incorporate stochastic ordering for the response distributions associated with the infected and non-infected groups.
27 / 28
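The first modeling scenario, a mixture of AL kernels across quantile levels, can be sketched directly. This is a minimal illustration under assumed names (`al_density`, `al_mixture_density`) and the standard AL parameterization with the p-th quantile fixed at zero; it is not code from the talk.

```python
import math

def al_density(eps, p, sigma=1.0):
    """Asymmetric Laplace density with quantile level p and scale sigma,
    parameterized so that its p-th quantile is at zero:
    f_p(eps) = p(1-p)/sigma * exp(-rho_p(eps/sigma)),
    with check loss rho_p(u) = u * (p - 1{u < 0})."""
    u = eps / sigma
    rho = u * (p - (1.0 if u < 0 else 0.0))
    return p * (1.0 - p) / sigma * math.exp(-rho)

def al_mixture_density(eps, ps, ws, sigmas):
    """K-component quantile mixture: f(eps) = sum_k w_k f_{p_k}(eps | sigma_k),
    mixing AL kernels at distinct quantile levels p_k."""
    return sum(w * al_density(eps, p, s) for p, w, s in zip(ps, ws, sigmas))
```

Because each kernel places its p_k-th quantile at zero, the posterior weight w_k attached to level p_k indicates how strongly that part of the response distribution informs the fit, which is the mechanism the summary bullets describe.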
• Acknowledgment: funding from NSF under award SES 1631963.