1 Motivation.
• Bayesian discrete choice models
• Bayesian approach offers extremely powerful methods for numerical integration
• These methods facilitate the study of latent variable models
• Discrete choice is a case in point
• Some particularly rich models can only be studied using Bayes
2 Review of Bayesian Statistics
• Let p(D|θ) denote the probability of the data D given parameter θ
• In most applications, p(D|θ) corresponds to a likelihood l(θ)
• Unlike many classical estimators, the Bayesian approach typically requires specification of a parametric likelihood
• Many researchers view this as a drawback
• However, in discrete choice, parametric models are almost always used
• p(θ) is prior distribution of parameters
• The choice of a prior can be controversial in applied work
• In some cases, we might put reasonable restrictions on parameters
• Some researchers conduct sensitivity analysis with respect to priors
• Bayes theorem:

p(θ|D) = p(D, θ) / p(D) = p(D|θ) p(θ) / p(D) ∝ p(θ) l(θ)
• p(θ|D) is the posterior distribution of the parameters given the data
• In the Bayesian approach, the researcher updates his/her beliefs about the parameters using the laws of conditional probability
• The posterior is proportional to the prior times the likelihood
• Note that as the sample becomes large, the likelihood will become the dominant term
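A minimal numerical illustration of the prior-times-likelihood update on a grid; the Bernoulli data and flat prior are illustrative assumptions, not part of the lecture:

```python
import numpy as np

def grid_posterior(data, theta_grid, prior):
    """Posterior on a grid: p(theta|D) is proportional to p(theta) * l(theta)."""
    k, n = data.sum(), data.size
    # Bernoulli log-likelihood, computed in logs for numerical stability
    log_lik = k * np.log(theta_grid) + (n - k) * np.log(1 - theta_grid)
    log_post = np.log(prior) + log_lik
    log_post -= log_post.max()                 # guard against underflow
    post = np.exp(log_post)
    dtheta = theta_grid[1] - theta_grid[0]
    return post / (post.sum() * dtheta)        # normalizing plays the role of p(D)

rng = np.random.default_rng(0)
theta_grid = np.linspace(0.001, 0.999, 999)
flat_prior = np.ones_like(theta_grid)          # uniform prior on (0, 1)

data = rng.random(200) < 0.3                   # 200 Bernoulli(0.3) draws
post = grid_posterior(data, theta_grid, flat_prior)
dtheta = theta_grid[1] - theta_grid[0]
post_mean = (theta_grid * post).sum() * dtheta # should sit near 0.3
```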
• In classical statistics, the parameter is a fixed quantity; in the Bayesian approach, it is treated as a random variable
• Confidence intervals etc. are obtained using an asymptotic approximation
• The choice between Bayes/Classical approaches is a matter of dispute in statistics/econometrics
• Under mild regularity conditions, as the sample size n becomes large, the posterior becomes:

p(θ|D) ≈ N(θ̂MLE, [−H|θ=θ̂MLE]^(−1))

where H is the Hessian of the log-likelihood.
• Intuitively, this occurs because the posterior closely resembles the likelihood function as the sample size becomes large.
• p(θ) stays fixed with n
• l(θ) increases with n
• The likelihood swamps the prior in large samples
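To see the large-sample normal approximation at work, one can compare the exact flat-prior posterior for a Bernoulli parameter with N(θ̂MLE, [−H]^(−1)); the Bernoulli setup is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
data = rng.random(n) < 0.3        # Bernoulli(0.3) sample, for illustration
k = int(data.sum())

theta_hat = k / n                 # MLE of the Bernoulli parameter
# Hessian of the Bernoulli log-likelihood, evaluated at the MLE
H = -k / theta_hat**2 - (n - k) / (1 - theta_hat)**2
approx_sd = np.sqrt(-1.0 / H)     # sd implied by N(theta_hat, [-H]^(-1))

# under a flat prior the exact posterior is Beta(k+1, n-k+1); compare moments
a, b = k + 1, n - k + 1
exact_mean = a / (a + b)
exact_sd = np.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
```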
2.1 Predictive distributions
• Df: data that we have not observed
• D: observed data
p(Df|D) = ∫ p(Df|θ) p(θ|D) dθ

• p(Df|θ): probability of Df given θ
• Integrate out parameter uncertainty using the posterior, p(θ|D)
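A Monte Carlo sketch of this integral: draw θ(s) from the posterior, then draw Df from p(Df|θ(s)). A conjugate Beta-Bernoulli example is used so the answer can be checked; the data counts are made up:

```python
import numpy as np

rng = np.random.default_rng(1)

# made-up data: k successes in n Bernoulli trials; with a flat Beta(1,1)
# prior, the posterior is Beta(k+1, n-k+1)
k, n = 30, 100
theta_draws = rng.beta(k + 1, n - k + 1, size=50_000)   # theta^(s) ~ p(theta|D)

# draw Df^(s) ~ p(Df | theta^(s)); averaging integrates out parameter uncertainty
df_draws = rng.random(theta_draws.size) < theta_draws
pred_prob = df_draws.mean()     # estimates P(Df = 1 | D) = (k+1)/(n+2)
```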
2.2 Decision Theory
• L(a, θ): loss function associated with an action a when the parameter is θ
• A Bayesian should choose the action that minimizes expected loss given parameter uncertainty:

min_a ∫ L(a, θ) p(θ|D) dθ
• Examples: estimation, where a corresponds to the choice of a parameter
• Non-nested testing, where a corresponds to the choice of a model
• Profit maximization: target marketing
• Utility Maximization
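The expected-loss minimization can be sketched by simulation; under squared loss the optimal action should land on the posterior mean. The posterior draws below are stand-ins, not output from any model in these notes:

```python
import numpy as np

rng = np.random.default_rng(2)
# stand-in posterior draws theta^(s); in practice these come from MCMC
theta_draws = rng.beta(31, 71, size=20_000)

def expected_loss(a, draws, loss):
    """Monte Carlo estimate of the integral of L(a, theta) p(theta|D) dtheta."""
    return loss(a, draws).mean()

squared = lambda a, t: (a - t) ** 2
actions = np.linspace(0.0, 1.0, 201)            # candidate actions a
losses = [expected_loss(a, theta_draws, squared) for a in actions]
best = actions[int(np.argmin(losses))]
# under squared loss the optimal action is the posterior mean
```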
3 Markov Chains.
• Two common ways to conduct MCMC are Gibbs sampling and Metropolis.
• A normal random walk Metropolis works as follows.
• First, the econometrician comes up with a rough guess θ0 at the MLE.
• Second, come up with a rough guess I0 at the information matrix using the Hessian at the MLE.
• A sequence of pseudorandom values θ(1), ..., θ(S) is then generated.
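The steps above can be sketched on a toy target, the posterior of a normal mean under a flat prior; the target, the 2.4/√I0 proposal scaling, and the burn-in length are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

# toy target: posterior of a normal mean theta with known unit variance and
# a flat prior, so the log posterior equals the log likelihood up to a constant
y = rng.normal(2.0, 1.0, size=200)

def log_post(theta):
    return -0.5 * np.sum((y - theta) ** 2)

theta0 = y.mean()                # step 1: rough guess at the MLE
I0 = float(y.size)               # step 2: information matrix for a N(theta, 1) mean
step = 2.4 / np.sqrt(I0)         # proposal scale built from the rough I0

S = 10_000
chain = np.empty(S)
theta, lp = theta0, log_post(theta0)
for s in range(S):
    prop = theta + step * rng.normal()   # normal random-walk proposal
    lp_prop = log_post(prop)
    # accept with probability min(1, posterior ratio), computed in logs
    if np.log(rng.random()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    chain[s] = theta
kept = chain[1000:]                      # discard burn-in
```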
• In the above, yij is the utility of person i for
alternative j
• εij is the stochastic preference shock
• xij are covariates that enter into i’s utility
• cij = 1 if i chooses j
• If the yij were known, then we could use the Gibbs sampler above to estimate β and h
• However, the yij are latent variables, and therefore we use data augmentation.
• The idea behind data augmentation is simple: we integrate out the distribution of the variables that we do not see.
• Following the notation in Cameron and Trivedi, let f(θ|y, y∗) denote the posterior conditional on the observed variables y and the latent variables y∗.
• Let f(y∗|y, θ) denote the distribution of the latent variable conditional on y and the parameters.
• Then the posterior can be written as:
p(θ|y) = ∫ f(θ|y, y∗) f(y∗|y, θ) dy∗
• Taking account of the latent variable simply involves an additional Gibbs step.
• The distribution of the latent utility yij is a truncated normal distribution.
• If cij = 1, yij is a truncated normal with mean parameter xijβ, precision h, and lower truncation point max{yij′, j′ ≠ j}.
• If cij = 0, yij is a truncated normal with mean parameter xijβ, precision h, and upper truncation point max{yij′, j′ ≠ j}.
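The truncated-normal draws just described can be sketched with scipy's truncnorm; the helper name and all of the numerical values (mean, precision, truncation point) are made up for illustration:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(4)

def draw_latent_utility(mean, h, lower=None, upper=None):
    """One data-augmentation draw of y_ij from N(mean, 1/h) truncated
    below at `lower` or above at `upper` (illustrative helper)."""
    scale = 1.0 / np.sqrt(h)
    # scipy's truncnorm takes bounds standardized by loc and scale
    a = -np.inf if lower is None else (lower - mean) / scale
    b = np.inf if upper is None else (upper - mean) / scale
    return float(truncnorm.rvs(a, b, loc=mean, scale=scale, random_state=rng))

# chosen alternative (c_ij = 1): its utility must exceed the best rival utility
y_win = draw_latent_utility(mean=0.5, h=1.0, lower=1.2)
# unchosen alternative (c_ij = 0): its utility must fall below that point
y_lose = draw_latent_utility(mean=0.5, h=1.0, upper=1.2)
```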
• The Gibbs sampler for the multinomial probit simply adds the data augmentation step above:
• A Gibbs sampler generates a pseudo-random sequence (h(s), β(s), {y(s)ij}i∈I,j∈J), s = 1, ..., S, using the following Markov chain
1. Given (h(s), β(s)), draw β(s+1) ∼ p(β|h(s), X, y, C)
2. Given β(s+1), draw h(s+1) ∼ p(h|β(s+1), X, y, C)
3. For each i ∈ I, draw y(s+1)i1 ∼ p(yi1|h(s+1), β(s+1), X, y(s)i2, ..., y(s)iJ, C)
4. Draw y(s+1)i2 ∼ p(yi2|h(s+1), β(s+1), X, y(s+1)i1, y(s)i3, ..., y(s)iJ, C)
5. ...
6. Draw y(s+1)iJ ∼ p(yiJ|h(s+1), β(s+1), X, y(s+1)i1, ..., y(s+1)iJ−1, C)
7. Return to 1
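A runnable sketch of this alternation for the binary probit, a deliberate simplification of the multinomial sampler above (two alternatives, precision h fixed at 1, following the Albert-Chib construction; the data and prior are simulated stand-ins):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)

# simulated binary probit data (a simplification of the multinomial case)
n, beta_true = 300, np.array([0.5, -1.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
c = (X @ beta_true + rng.normal(size=n) > 0).astype(int)   # observed choices

# Gibbs sampler: alternate beta | y* and the data augmentation step y* | beta, c
B0_inv = 0.01 * np.eye(2)              # vague N(0, 100 I) prior on beta
S, burn = 1500, 500
beta = np.zeros(2)
draws = np.empty((S, 2))
V = np.linalg.inv(B0_inv + X.T @ X)    # posterior covariance of beta (h = 1)
for s in range(S):
    # data augmentation: y*_i | beta, c_i is truncated normal around X beta
    mu = X @ beta
    lo = np.where(c == 1, 0.0, -np.inf)    # c=1 means y* > 0
    hi = np.where(c == 1, np.inf, 0.0)     # c=0 means y* < 0
    # inverse-CDF draw from N(mu, 1) truncated to (lo, hi)
    p_lo, p_hi = norm.cdf(lo - mu), norm.cdf(hi - mu)
    y_lat = mu + norm.ppf(p_lo + rng.random(n) * (p_hi - p_lo))
    # regression step: beta | y* is multivariate normal
    m = V @ (X.T @ y_lat)
    beta = rng.multivariate_normal(m, V)
    draws[s] = beta
post_mean = draws[burn:].mean(axis=0)      # should sit near beta_true
```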
6 Target Marketing
• In "The Value of Purchase History Data in Target Marketing", Rossi et al. attempt to estimate household-level preference parameters.
• This is of interest as a marketing problem.
• The CMI checkout coupon uses purchase information to customize coupons to a particular household.
• In principle, the entire purchase history (from consumer loyalty cards) could be used to customize coupons (and hence prices)
• If a household-level preference parameter can be forecasted with high precision, this is essentially first-degree price discrimination!
• Even with short purchase histories, they find that profits are increased 2.5-fold through the use of purchase data compared to blanket couponing strategies.
• Even one observation can boost profits from couponing by 50%.
• This application is of interest to economists as well.
• The methods in this paper allow us to account for consumer heterogeneity in a very rich manner.
• This might be useful to examine the distribution of welfare consequences of a policy intervention (e.g. a merger or market regulation).
• Beyond that, these methods demonstrate the power of Bayesian methods in latent variable problems.
7 Random Coefficients Model
• Multinomial probit with panel data on household-level choices:

yh,t = Xh,t βh + εh,t,   εh,t ∼ N(0, Λ)
βh = ∆zh + vh,   vh ∼ N(0, Vβ)
• Households h = 1, ...,H and time t = 1, ..., T
• Xh,t covariates and zh demographics
• Note that the household-specific random coefficients βh remain fixed over time
• Ih,t observed choice
• The posterior distributions are derived in Appendix A.
• Formally, the derivations are very close to our multinomial probit model above.
• Gibbs sampling is used to simulate the posterior distribution of Λ, ∆, Vβ
8 Predictive Distributions
• The authors wish to give different coupons to different households.
• A rational (Bayesian) decision maker would form her beliefs about household h's preference parameters given her posterior about the model parameters.
• This will involve, as we show below, forming a predictive distribution for βh given the econometrician's information set.
• As a first case, suppose that the econometrician only knew zh, the demographics of household h
• From our model, p(βh|zh,∆, Vβ) is N(∆zh, Vβ)
• Given the posterior p(∆, Vβ|Data), the econometrician's predictive distribution for βh is:

p(βh|zh, Data) = ∫ p(βh|zh, ∆, Vβ) p(∆, Vβ|Data) d∆ dVβ
• We can simulate p(βh|zh, Data) using Gibbs sampling given our posterior simulations ∆(s), V(s)β, s = 1, ..., S:

(1/S) Σs p(βh|zh, ∆(s), V(s)β)
• We could draw random βh from p(βh|zh,Data).
• For each ∆(s), V(s)β, draw β(s)h from p(βh|zh, ∆(s), V(s)β)
• Given β(s)h, s = 1, ..., S, we could then simulate purchase probabilities.
• Draw ε(s)ht from εh,t ∼ N(0, Λ(s))
• The posterior purchase probability for j, given Xht and zh, is:

(1/S) Σs 1{Xjht β(s)h + ε(s)jht > Xj′ht β(s)h + ε(s)j′ht for all j′ ≠ j}
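Putting the last few bullets together, a sketch of the purchase-probability simulation with fabricated stand-ins for the posterior draws β(s)h and the ε(s)ht shocks (none of these numbers come from the paper, and Λ is set to the identity for simplicity):

```python
import numpy as np

rng = np.random.default_rng(6)

S, J, K = 5000, 3, 2              # posterior draws, alternatives, covariates
X_ht = rng.normal(size=(J, K))    # hypothetical covariates for household h at t

# stand-ins for the Gibbs output: beta_h^(s) draws and the preference shocks
beta_draws = rng.normal(loc=[0.5, -0.3], scale=0.1, size=(S, K))
eps = rng.normal(size=(S, J))     # eps^(s)_ht ~ N(0, Lambda^(s)) with Lambda = I

util = beta_draws @ X_ht.T + eps              # S x J simulated utilities
choice = util.argmax(axis=1)                  # j with the highest utility wins
probs = np.bincount(choice, minlength=J) / S  # posterior purchase probabilities
```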
• This would allow us to simulate the purchase response to different couponing strategies for a specific household h.
• The paper runs through different couponing strategies given different information sets (e.g. full or choice-only information sets).
• The key ideas are similar: form a predictive distribution for h's preferences and simulate purchase behavior in an analogous fashion.
• In the case of a full purchase information history, we could use the raw Gibbs output, since the Markov chain will simulate β(s)h, s = 1, ..., S.
• This could then be used to simulate choice behavior as in the example above (given draws of ε(s)ht)
9 Data
• AC Nielsen scanner panel data for tuna in Springfield, Missouri.
• 400 households, 1.5 years, 1-61 purchases.
• Brands and covariates in Table 2.
• Demographics Table 3.
• Table 4, delta coefficients.
• Poorer people prefer private label.
• Goodness of fit is moderate for the demographic coefficients
• Figures 1 and 2: household-level coefficient estimates with different information sets
• Table 5, return to different marketing strategies.
• Bottom line: you gain 0.5 to 1.0 cents per customer through better estimates.
• With a lot of customers, this could be quite profitable.