CONSUMER DECISIONS ON SHARE OF WALLET, AUTOMOBILE SEARCH, AND ONLINE PRODUCT REVIEWS by Sungha Jang APPROVED BY SUPERVISORY COMMITTEE: ___________________________________________ Brian T. Ratchford, Chair ___________________________________________ Ashutosh Prasad, Co-Chair ___________________________________________ B.P.S. Murthi ___________________________________________ Gonca Soysal
Copyright 2011
Sungha Jang
All Rights Reserved
To my parents, Judeok Jang and Boksun Kim
CONSUMER DECISIONS ON SHARE OF WALLET, AUTOMOBILE SEARCH,
AND ONLINE PRODUCT REVIEWS
by
SUNGHA JANG, B.A., M.B.A.
DISSERTATION
Presented to the Faculty of
The University of Texas at Dallas
in Partial Fulfillment
of the Requirements
for the Degree of
DOCTOR OF PHILOSOPHY IN
MANAGEMENT SCIENCE
THE UNIVERSITY OF TEXAS AT DALLAS
May, 2011
UMI Number: 3450462
All rights reserved
INFORMATION TO ALL USERS The quality of this reproduction is dependent upon the quality of the copy submitted.
In the unlikely event that the author did not send a complete manuscript
and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.
UMI 3450462
Copyright 2011 by ProQuest LLC. All rights reserved. This edition of the work is protected against
unauthorized copying under Title 17, United States Code.
ProQuest LLC 789 East Eisenhower Parkway
P.O. Box 1346 Ann Arbor, MI 48106-1346
ACKNOWLEDGEMENTS
I started the long journey of studying marketing keeping in mind my mother’s saying that ‘you can travel a long distance by asking other people’. As I write these acknowledgments to my dissertation, I am deeply indebted to the many people who have guided me on this journey.
I have benefited greatly from the aid of my advisor, Dr. Brian Ratchford. His belief and encouragement raised my passion and ability and were vital to the successful completion of this thesis. I am honored to have worked under his insightful advice. I also offer special thanks to my co-advisor, Dr. Ashutosh Prasad. His thoughtful and constructive advice has shown me the way of a researcher. I could not have completed my dissertation without his guidance.
I wish to thank two other committee members as well. I owe special thanks to Dr. B.P.S. Murthi
for his academic advice and considerate care throughout the program. I thank Dr. Gonca Soysal
for her interest and helpful comments on my research. I must also express my appreciation to the marketing faculty at the University of Texas at Dallas. I am especially grateful to Dr. Ram Rao and Dr. Nanda Kumar for their passionate and instructive guidance on research.
Finally, I express my deepest gratitude to my parents, who always guide me with wisdom. With their trust, encouragement, and love, I am able to finish this long journey to a marketing Ph.D.
March, 2011
CONSUMER DECISIONS ON SHARE OF WALLET, AUTOMOBILE SEARCH,
AND ONLINE PRODUCT REVIEWS
Publication No. ___________________
Sungha Jang, Ph.D. The University of Texas at Dallas, 2011
ABSTRACT
Supervising Professor: Brian T. Ratchford
The objective of these three essays is to understand consumers’ decisions on allocating their budget to credit card expenditures, using information sources for automobile purchases, and combining online product reviews with their prior knowledge.
In the first essay, we examine how consumers allocate their budget to multiple firms and categories. As expenditures are simultaneous and censored, we propose a Bayesian estimation of a simultaneous equations Tobit model with latent classes. By taking into account expenditure interrelationships and consumer heterogeneity, our approach predicts the size and share of wallet more accurately, which firms can use for better segmenting and targeting.
In the second essay, we examine the interdependency between various information sources, segment consumers based on their search patterns, and compare search outcomes across the segments. We find that online search and offline search affect each other and that low external search segments choose American brands and get lower discounts while high external search segments choose foreign brands and get better price deals. Our results can guide managers on which media they should use to provide information and which search segments they should target.
In the third essay, we examine the effects and value of online product reviews on the purchase
decision process. In our approach, consumers incorporate product reviews with their prior
perceived quality in order to construct posterior perceived quality which affects the consideration
set and choice decisions. Our findings show that consumers use product reviews mainly in the
consideration set stage and their updating method is consistent with Bayesian updating. We also
compute the monetary value of each component of the product reviews. Our results have managerial implications: product review providers should display all components of consumer reviews from the beginning of the search, and manufacturers should keep consumers’ perceived quality high by managing their prior quality at all times. They should also encourage
Abstract
List of TABLES
List of FIGURES
CHAPTER 1 CONSUMER SPENDING PATTERNS ACROSS FIRMS AND CATEGORIES: APPLICATION TO THE SIZE AND SHARE OF WALLET
CHAPTER 3 HOW CONSUMERS USE PRODUCT REVIEWS IN THE PURCHASE DECISION PROCESS
C) Derivatives implied by coefficients of endogenous variables

Derivative (Purchases)               Habitual  Adaptive   Derivative (Cash Advances)           Habitual  Adaptive
(1) Across category within bank
$\partial PUR_C/\partial CASH_C$     0.03      0.03       $\partial CASH_C/\partial PUR_C$     0.12      0.39
$\partial PUR_F/\partial CASH_F$     0.003     0.06       $\partial CASH_F/\partial PUR_F$     0.19      0.02
(2) Across bank within category
$\partial PUR_C/\partial PUR_F$      -0.08     -0.30      $\partial CASH_C/\partial CASH_F$    0.07      0.28
$\partial PUR_F/\partial PUR_C$      -0.03     -0.13      $\partial CASH_F/\partial CASH_C$    0.02      -0.02
(3) Across bank and category
$\partial PUR_C/\partial CASH_F$     -0.02     -0.31      $\partial CASH_C/\partial PUR_F$     -0.13     -0.36
$\partial PUR_F/\partial CASH_C$     -0.01     -0.09      $\partial CASH_F/\partial PUR_C$     -0.10     -0.17

- The variable CASH represents expenditure in the cash advances category and PUR represents expenditure in the purchases category.
- The subscript C represents competing banks and F represents the focal bank.
more reliance on factors such as income. In addition, expenditures in this segment are much more interrelated. Thus, we label the larger segment as the habitual segment and the smaller segment as the adaptive segment because its expenditures depend on income.
The characteristics of the segments in demographics and expenditure patterns are as
follows. Compared to the habitual segment, consumers in the adaptive segment are slightly older
(39.1 vs. 36.8), comprise more males (70.9% vs. 66.2%), have fewer salaried persons (83.9% vs.
89.5%), and have more income. The adaptive segment spends, on average, 2.3 times more in the
purchases category and 1.4 times more in the cash advances category. However, there are no
differences between the segments in credit score and share of wallet in both categories. In the
following subsection, we discuss the effects of all factors by segment.
Effects of Endogenous Variables
Next, we interpret the coefficients of endogenous variables, which are presented in terms of the derivatives implied by the coefficients in Table 1.3.C. These summarize how the preference for the purchases category is associated with the preference for the cash advances category and vice versa, across firms and categories.
Across categories within the same bank (Label 1), the interrelationships of the preference between categories are positive and asymmetric. Specifically, in both segments, increased preference for cash advances is not associated with the preference for purchases, except for the habitual segment at competing banks, where there is a weak positive impact ($\partial PUR_C/\partial CASH_C = 0.03$). In contrast, increased preference for purchases is associated with increased preference for cash advances. At competitors, the coefficients are significant for both segments ($\partial CASH_C/\partial PUR_C = 0.12$ and $0.39$ for the habitual and the adaptive segment, respectively). At the focal bank, the coefficient is significant only for the habitual segment ($\partial CASH_F/\partial PUR_F = 0.19$). From these results, we can see that the purchases category is more important for inducing cross-selling than the cash advances category and that the interrelationship across categories is stronger at competitors.
Across banks within the same category (Label 2), the interrelationships of the preference for different banks depend on the category. In the purchases category, increased preference at the focal bank is associated with decreased preference at competitors ($\partial PUR_C/\partial PUR_F = -0.08$ and $-0.30$ for the habitual and adaptive segment, respectively). Similarly, increased preference at competitors is associated with decreased preference at the focal bank ($\partial PUR_F/\partial PUR_C = -0.03$ and $-0.13$, respectively). The negative interrelationship in the purchases category may arise because consumers balance expenditures between the two banks under a limited budget.
Interestingly, cash advances respond somewhat differently. Increased preference for cash advances at the focal bank is associated with increased preference for cash advances at competing firms ($\partial CASH_C/\partial CASH_F = 0.07$ and $0.28$ for each segment, respectively), but the reverse is not so strong ($\partial CASH_F/\partial CASH_C = 0.02$ for the habitual segment). Evidently, those who use the focal firm for cash advances tend to seek cash advances elsewhere as well, possibly because they cannot satisfy their cash demand at the focal bank alone. Therefore, if the focal bank can increase the limit of cash advances without risk, there are business opportunities in the cash advances category. However, those who use competing firms for cash advances are less likely to use the focal firm, possibly because there are a number of competing banks that can be sources of cash advances.
Finally, we compare the interrelationships of preferences across bank and across category (Label 3). In general, these interrelationships are negative. Increased preference for cash advances at the focal bank by the adaptive segment is associated with reduced preference for purchases at competing banks ($\partial PUR_C/\partial CASH_F = -0.31$). Similarly, increased preference for cash advances at competing banks is associated with reduced preference for purchases at the focal bank ($\partial PUR_F/\partial CASH_C = -0.01$ and $-0.09$, respectively). With respect to the cash advances category, increased preference for purchases at the focal bank is associated with reduced preference for cash advances at competing banks ($\partial CASH_C/\partial PUR_F = -0.13$ and $-0.36$) while increased preference for purchases at competing banks is associated with reduced preference for cash advances at the focal bank ($\partial CASH_F/\partial PUR_C = -0.10$ and $-0.17$, respectively). The results show that the interrelationships of expenditures also occur across bank and across category, which cannot be discovered by a within-bank and within-category examination.
In summary, the results show that there are interrelationships between expenditures across banks and categories. Overall, the interrelationships are positive across categories at the same bank and in the cash advances category across banks. Therefore, banks can use these positive relationships for cross-selling within the bank, or offer more cash advances to keep consumers using their own cash advances category. Meanwhile, interrelationships are negative in the purchases category across banks and across both bank and category. It should be noted that these interrelationships vary by segment. The adaptive segment shows stronger interrelationships than the habitual segment. Therefore, firms need to distinguish the segments and implement a different marketing mix for each based on their usage patterns. For example, it could be more effective to target the adaptive segment for cross-selling.
Effects of Exogenous Variables
From Tables 1.3.A and 1.3.B, we find large differences in the effects of exogenous variables between the two segments. Above all, expenditures of the habitual segment are heavily affected by past expenditures while the expenditures of the adaptive segment are more affected by demographics. In particular, the differences in the coefficients of income, the salaried person indicator, and credit score are salient. We briefly present the impacts of exogenous variables by segment.
First, we start with the habitual consumers. Age, male, and salaried person do not have an
impact on the preferences for both purchase and cash advance categories at both firms. However,
income has a positive impact on preferences for purchase category ( , for
competing banks and the focal bank respectively, and hereafter in this subsection) and a negative
impact on preferences for cash advance category ( 8,11 ). Credit score has negative
impacts only on the preference for cash advance category ($-0.82$, $-0.26$). Finally, the impact of past expenditures is so strongly positive ($0.74$ to $1.03$) that this segment is named the habitual segment.
Second, with respect to the adaptive segment, age has a negative impact only on the
preference for purchase category )01( and male does not have an impact on either
category. Income has a positive impact on the preference for purchase ($0.59$, $0.31$) and for the cash advance category at competing banks ($0.28$). Salaried person has a negative impact on the preference for purchase ($-0.37$, $-0.15$) and a positive impact on the preference for cash advance at competing banks ($0.43$). Credit score has negative impacts on both the preference for purchase ($-0.31$, $-0.12$) and the preference for cash advance ($-1.60$, $-0.65$). The coefficients of past expenditures ($0.33$ to $0.58$) are about half those for the habitual segment.
Overall, credit card usage is less affected by age or gender and more affected by income.
Higher income consumers tend to spend more in the purchases category and less in the cash
advances category. Salaried persons show various usage patterns depending on firms and
categories. Finally, consumers with high credit score are less likely to use the cash advances
category.
Prediction of the Size and Share of Wallet
After finding consumer spending patterns across firms and categories, we utilize this
knowledge for predicting the size and share of wallet. We predict expenditures only at competing banks, conditional on expenditures at the focal bank, because the firm has access to its own transactions with consumers. The size of wallet for a category is the sum of expenditures at the focal firm and expected expenditures at competing firms. The share of wallet
is calculated as

$\text{Share of Wallet} = \dfrac{y_{focal}}{\text{Size of Wallet}} = \dfrac{y_{focal}}{y_{focal} + E(y_{competitors} \mid y_{focal})}$.
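As a minimal illustration of the two definitions above, the following Python sketch computes the size and share of wallet for one customer in one category. The function name and the choice to return `None` when the size of wallet is zero are assumptions made for this example, not part of the original study.

```python
def size_and_share_of_wallet(y_focal, expected_y_competitors):
    """Return (size, share) of wallet for one customer and one category.

    y_focal: observed expenditure at the focal bank.
    expected_y_competitors: expected expenditure at competing banks,
        conditional on the focal-bank expenditure.
    """
    size = y_focal + expected_y_competitors
    # Assumption: with no spending anywhere, the share is undefined.
    share = y_focal / size if size > 0 else None
    return size, share


size, share = size_and_share_of_wallet(2.0, 6.0)
print(size, share)
```

The expected expenditure at competitors is taken as given here; how it is obtained is a separate step.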
After the burn-in period, we calculate the expected expenditures at competing banks.
Note that we derive expected expenditures from the latent utility from expenditures at competing
banks conditional on observed expenditures at the focal bank. As the analytical calculation
method is complicated, we use a numerical approach using Monte Carlo integration, which is
explained in detail in Appendix C.
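Appendix C is not reproduced in this section, so the sketch below is only a simplified, univariate illustration of what Monte Carlo integration of a censored expenditure looks like. It assumes the latent utility at competing banks, conditional on the focal bank, is normal with a known mean and standard deviation, and that observed expenditure is the latent value censored at zero. The function names are hypothetical, and the closed-form expression is included only as a check on the simulation.

```python
import math

import numpy as np


def expected_censored_mc(mu, sigma, n_draws=200_000, seed=0):
    """Monte Carlo estimate of E[max(y*, 0)] for y* ~ N(mu, sigma^2)."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(mu, sigma, size=n_draws)
    # Censor the latent draws at zero, then average them.
    return float(np.maximum(latent, 0.0).mean())


def expected_censored_exact(mu, sigma):
    """Closed form: mu * Phi(mu/sigma) + sigma * phi(mu/sigma)."""
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return mu * cdf + sigma * pdf


mc_value = expected_censored_mc(0.5, 1.0)
exact_value = expected_censored_exact(0.5, 1.0)
print(mc_value, exact_value)
```

In the multivariate case no such closed form is convenient, which is the usual motivation for a simulation-based approach.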
The prediction results of our proposed model (S2) and the benchmark model (M1) are
presented in Table 1.4.A for the estimation sample and Table 1.4.B for the validation sample. We
present the Mean Absolute Error (MAE) of the size and share of wallet between the actual expenditures and expected expenditures. For the estimation sample, we present the results by segment to see how much segmentation improves the prediction. In each draw, we obtain the segment membership from a multinomial distribution whose probabilities are the prior membership probabilities updated by the likelihood conditional on observed expenditures. We allocate each consumer to the segment in which their membership frequency is higher.
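The allocation rule and the error measure described above can be sketched as follows. This is an illustrative example, assuming segment labels are stored as integers (0 for one segment, 1 for the other) in an array with one row per retained draw and one column per consumer. The function names are hypothetical, and ties are broken in favor of the lower label.

```python
import numpy as np


def assign_segments(membership_draws):
    """Assign each consumer to the segment drawn most often.

    membership_draws: array of shape (n_draws, n_consumers) holding
        integer segment labels 0, 1, ..., S-1.
    """
    draws = np.asarray(membership_draws)
    n_segments = int(draws.max()) + 1
    # counts[s, i] = number of draws in which consumer i was in segment s
    counts = np.stack([(draws == s).sum(axis=0) for s in range(n_segments)])
    return counts.argmax(axis=0)


def mean_absolute_error(actual, predicted):
    """MAE between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


draws = [[0, 1, 1],
         [0, 1, 0],
         [1, 1, 0]]
print(assign_segments(draws))
print(mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))
```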
The results show that, in general, the proposed model, which considers the interrelationships and heterogeneity, predicts both the size and share of wallet better, especially in the habitual segment, because the benchmark model M1 does not distinguish the interrelationships and the different preferences of consumers. Although the coefficients of past expenditures in M1 are on average 0.67, M1 falls short of S2 more in the habitual segment than in the adaptive segment, which does not depend heavily on past usage patterns. Specifically, in the purchases category, the reduction in MAE for the habitual segment in model S2 compared to model M1 is 20.4% for the size and 22.2% for the share of wallet. However, in the adaptive segment, the reduction is marginal for the size (2.7%) and does not take place for the share of wallet. In the cash advances category, the reduction in MAE for the habitual segment in model S2 is 25.3% for the size and 11.4% for the share. In addition, in the adaptive segment, the decrease in MAE is 10.3% for the size and 1.6% for the share of wallet. In summary, the proposed model predicts better for the habitual segment by correctly estimating the effects of factors, mainly the past expenditures, and predicts as well as the simple model M1 for the adaptive segment, though the latter segment does not rely heavily on past expenditures.

Table 1.4. Prediction of the Size and Share of Wallet

A) The estimation sample

Category       Type   Segment   N     M1 (MAE)  S2 (MAE)  Reduction (%)
Purchases      Size   Habitual  4412  0.341     0.271     -20.4%
Purchases      Size   Adaptive  1143  0.971     0.944     -2.7%
Purchases      Share  Habitual  4395  0.134     0.104     -22.2%
Purchases      Share  Adaptive  1141  0.121     0.127     5.0%
Cash Advances  Size   Habitual  4412  0.482     0.360     -25.3%
Cash Advances  Size   Adaptive  1143  0.899     0.806     -10.3%
Cash Advances  Share  Habitual  2675  0.139     0.123     -11.4%
Cash Advances  Share  Adaptive  695   0.143     0.140     -1.6%

B) The validation sample

Category       Type   Past Expenditure  N    M1 (MAE)  S2 (MAE)  Reduction (%)
Purchases      Size   Observed          995  0.519     0.506     -2.6%
Purchases      Size   Predicted         994  0.689     0.694     0.7%
Purchases      Share  Observed          991  0.132     0.123     -6.4%
Purchases      Share  Predicted         990  0.164     0.163     -0.3%
Cash Advances  Size   Observed          995  0.566     0.483     -14.7%
Cash Advances  Size   Predicted         994  0.874     0.816     -6.6%
Cash Advances  Share  Observed          612  0.152     0.145     -4.5%
Cash Advances  Share  Predicted         612  0.181     0.177     -2.4%
We also present the prediction results of the validation sample in Table 1.4.B. In practice,
it may be difficult to use past expenditures at competing banks as exogenous variables.
Therefore, for past expenditures at competing banks, we use both observed ones assumed to be
available and predicted ones estimated by a Tobit model in which we regress past expenditures at
competing banks on demographics and past expenditures at the focal bank. To calculate expected
expenditures at competitors, we use every 10th draw from the posterior distribution. As we cannot determine segment membership in the validation sample, each consumer is randomly assigned in each iteration to the habitual segment or the adaptive segment with probabilities of 76% and 24%, respectively, on average.
The results show that the performance of the proposed model S2 is generally better than
the performance of the benchmark model M1 in that there are decreases of MAE in prediction of
the size and share for both categories. It is notable that the better performance of S2 holds even
when we use the predicted past expenditures at competitors. Considering the fact that a reduced-
form model usually predicts better than a structural model, it is meaningful that our proposed
structural model outperforms the benchmark. Therefore, the results in the validation sample
firmly show that it is necessary to consider the interrelationships and heterogeneous preferences
when firms predict the size and share of wallet.
CONCLUSION
Summary
The purpose of this paper is to better understand consumers’ spending patterns across
firms and categories in order to better predict the size and share of wallet. It considers the
interrelationship between expenditures and heterogeneity in preference, which have not been
addressed in the previous literature. Consumers’ utility maximization problem with respect to
their expenditures subject to the budget constraint derives the simultaneous equations Tobit
model. To estimate the model, we propose a Bayesian estimation method. That is, using the
MCMC algorithms, we estimate the coefficients of endogenous variables, impute the latent
endogenous variables, and estimate the coefficients of exogenous variables and variance matrix
in sequence.
With this approach, we sought to answer several research questions raised in the
introduction. The first issue was what segments we can identify. We find two consumer
segments: the habitual segment and the adaptive segment. The former consists of consumers
whose current expenditures are closely related to past expenditures, possibly because their
budget allocation preference is stable. The latter segment consists of consumers whose current
expenditures across firms and categories are strongly interrelated and are affected by their
income.
The second issue was about differences in consumer spending patterns by segments,
especially the interrelationship of expenditures. We find that the interrelationships differ across the segments mainly in magnitude. In general, within-bank expenditures in the purchase and cash advance categories positively affect each other. Within a category, expenditures at the focal bank and competing banks affect each other negatively in the purchases category but positively in the cash advances category. Across both banks and categories, we find generally negative usage patterns. For example, purchases at one bank are negatively related to cash advances at the other bank and vice versa.
The third issue was whether we can better predict the size and share of wallet by
considering the interrelationship of expenditures and customer heterogeneity. We compared the
size and share of wallet from our proposed model with those from the benchmark model. The
proposed model generally has lower prediction errors in both the size and share of wallet than the
benchmark model. We especially find that the prediction error reduction in the habitual segment
is large, possibly because the effects of past expenditures on the current expenditures are more
accurately estimated. In conclusion, our empirical findings show that it is important to consider
the interrelationships between expenditures and consumer heterogeneity to better predict the size
and share of wallet.
Managerial Implications
Our findings provide managers with guidelines through utilizing the interrelationship of
expenditures and heterogeneity between segments. First, from the significance and magnitude of
the interrelationships, firms can accurately implement cross-selling. For example, the increased
preference for purchases is associated with the increased preference for cash advances while the
converse is not true. Therefore, a cross-selling strategy of promoting purchases category first and
then cash advance category can be applied.
Second, managers should understand that expenditures in some categories at the focal
bank positively affect expenditure in the same categories at competing banks. For example, in
the cash advance category, expenditures at the focal bank increase expenditures at competing
banks. Without considering the reason of this positive impact, managers may unnecessarily
overspend on competitive promotions and advertising. If they can find reasons (e.g., the low
limit of cash advances) and take actions (e.g., increase the limit or offer a loan), there may be an
opportunity to capture consumers’ whole budget in the category.
Third, managers should pay attention to the adaptive segment. This segment has a larger size of wallet than the habitual segment, but the focal bank’s share is not larger. As the interrelationships of this segment across firms are largely negative, if the focal bank can attract this segment more, it can gain a higher share of wallet from increased expenditures at the focal bank and decreased expenditures at competing banks. Therefore, managers need to attend to this segment and provide incentives so that they can achieve a higher share from consumers with a larger size of wallet.
We recommend the implementation of our approach as follows: Managers need to get
information on customers’ expenditures at competing firms for a sample of customers. It is
necessary to obtain this information at least once in order to estimate the parameters and check
whether the prediction is correct. After the managers obtain the coefficients from the sample,
they can predict expenditures for the out-of-sample customers by multiplying the exogenous
variables with the coefficients. Finally, they can predict the size and share of wallet with the
predicted expenditures at competing firms conditional on the expenditures at the firm. The
method to estimate a sample of customers and apply the coefficients to out-of-sample for
prediction is found in many studies (e.g., Iyengar et al. 2003).
If the managers use the past expenditures at competing firms for the exogenous variables
for the exclusion restrictions, it would be necessary to predict those expenditures for the out-of-
sample customers. Using a multivariate Tobit model, managers can regress the past expenditures
at competing firms on other variables available at the firm for the sample of customers. Then,
they can predict those expenditures for the out-of-sample customers using the coefficients
obtained from the model. For the periods after estimation, the managers can use the expenditures
previously predicted by our model as the past expenditures in the current period.
Limitation and Future Research
As the data is limited to demographic information and past expenditures, we saw the
impacts of only these variables on the current expenditures and not, for example, other marketing
mix effects. For example, we might speculate that if competitors increase advertising or provide
promotions, consumers may increase expenditures at competing firms and decrease expenditures
at the focal firm. Thus, future research using richer data sets could investigate the effects of
marketing mix. If panel data is available, it would also be worth investigating the change of the
interrelationship, size, and share of wallet over time, and its possible drivers.
APPENDIX
A. Derivation of a Simultaneous Equations Tobit Model
We derive the simultaneous equations Tobit model from the utility maximization problem with binding non-negativity constraints, following previous research (Amemiya et al. 1993; Ransom 1987).
We assume that consumers maximize a quadratic utility function by allocating their budget across M firm-category combinations and the outside option. This is expressed by

(A1) $\max_{y} U(y) = b'y + y'Ay/2$  s.t.  $\mathbf{1}_{M+1}'y \le W$,

where $y = (y_0, y_1, \ldots, y_M)'$ is a vector of non-negative expenditures, $b$ is a (M+1) dimensional vector, $A$ is a (M+1) $\times$ (M+1) negative definite matrix, and $\mathbf{1}_{M+1}$ is a (M+1) dimensional vector of ones.
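The Lagrangean function is $L = b'y + y'Ay/2 + \lambda(W - \mathbf{1}_{M+1}'y)$, and the necessary and sufficient Kuhn-Tucker conditions for a constrained maximum are

$\partial L/\partial y_m \le 0$, $y_m \ge 0$, $y_m\,\partial L/\partial y_m = 0$,
$\partial L/\partial \lambda \ge 0$, $\lambda \ge 0$, $\lambda\,\partial L/\partial \lambda = 0$,

where $m = 0, \ldots, M$. That is,

(A2) $\partial U/\partial y_m - \lambda \le 0 \le y_m$ and $\mathbf{1}_{M+1}'y - W \le 0 \le \lambda$.

We assume that $y_0 > 0$ and consequently $\partial U/\partial y_0 - \lambda = 0$ and $\mathbf{1}_{M+1}'y - W = 0$ because of complementary slackness. Therefore, Equation A2 can be rewritten as

(A3) $\partial U/\partial y_m - \partial U/\partial y_0 \le 0 \le y_m$ and $\mathbf{1}_{M+1}'y = W$, where $m = 1, \ldots, M$.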
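We partition the matrix and vectors as

$y = \begin{pmatrix} y_0 \\ \bar{y} \end{pmatrix}$, $b = \begin{pmatrix} b_0 \\ \bar{b} \end{pmatrix}$, $A = \begin{pmatrix} a_0 & a' \\ a & \bar{A} \end{pmatrix}$, and $\hat{a} = \begin{pmatrix} a_0 \\ a \end{pmatrix}$,

where $y_0$, $b_0$, and $a_0$ are scalars.

In matrix form, Equation A3 can be written as

(A4) $\bar{b} + [a \;\; \bar{A}]\,y - \mathbf{1}_M (b_0 + \hat{a}'y) \le 0_{M \times 1} \le \bar{y}$.

Using the identity $y_0 = W - \mathbf{1}_M'\bar{y}$, we express the Kuhn-Tucker conditions in Equation A4 as

(A5) $\theta + G\bar{y} \le 0_{M \times 1} \le \bar{y}$,

where $G = \bar{A} - a\mathbf{1}_M' - \mathbf{1}_M a' + a_0 \mathbf{1}_M \mathbf{1}_M'$ and $\theta = \bar{b} - b_0 \mathbf{1}_M + W a - W a_0 \mathbf{1}_M$.

Note that $\theta$ contains the stochastic elements in the form of $(u_m - u_0)$ if we set up $b$ with a typical element $b_m = b_m^0 + u_m$, where $b_m^0$ is a deterministic marginal utility and $u_m$ represents individual differences in marginal utility among consumers. In a general form, $\theta$ could be made to depend on individuals’ characteristics (exogenous variables) and error terms. A typical m-th equation in the Kuhn-Tucker conditions in Equation A5 could thus be written as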

$\sum_{j=1}^{J} \gamma_{mj} y_j + \sum_{k=1}^{K} \beta_{mk} x_{mk} + \varepsilon_m \; \begin{cases} = 0 & \text{if } y_m > 0 \\ \le 0 & \text{if } y_m = 0 \end{cases}$

where $x_{mk}$ are the individuals’ characteristics and $\varepsilon_m$ is an error term resulting from differences in marginal utility.

An alternative way of writing the conditions would be

$y_m = \begin{cases} RHS & \text{if } RHS > 0 \\ 0 & \text{if } RHS \le 0 \end{cases}$, with $RHS = -\dfrac{1}{\gamma_{mm}} \Big( \sum_{j=1,\, j \ne m}^{J} \gamma_{mj} y_j + \sum_{k=1}^{K} \beta_{mk} x_{mk} + \varepsilon_m \Big)$,

which is a typical expression of a simultaneous equations Tobit model. That is, the Kuhn-Tucker conditions for the optimal expenditure convert to the estimation problem of a simultaneous equations Tobit model. After standardization of the parameters, we derive a simultaneous equations Tobit model as

(A6) $\Gamma Y = X\beta + \varepsilon$,

where $Y$ represents a vector of endogenous variables which could be censored at zero, $X$ represents exogenous individuals’ characteristics, and $\varepsilon$ is a vector of error terms following a multivariate normal distribution.
B. MCMC Algorithms for Model Estimation
Equation 2 is the main equation to estimate. Our approach is to sequentially draw $\Gamma$, $\beta$, $\Sigma$, and $y_i^*$. We first explain the basic estimation method at the aggregate level in subsections 1 through 5. Then, in subsection 6, we explain how to extend the basic estimation method to the latent class model.
1. Likelihood function
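To calculate the likelihood conditional on other parameters, we require the distribution of $y_i^*$, which is derived from Equation 2. As $\varepsilon_i$ follows a multivariate normal distribution, we rearrange the equation and denote the function $g^{-1}(y_i^*)$ as follows.

(B1) $g^{-1}(y_i^*) = \varepsilon_i = \Gamma y_i^* - X_i\beta$

By the transformation technique, we get the distribution of $y_i^*$:

(B2) $f(y_i^*) = f_\varepsilon[g^{-1}(y_i^*)] \cdot \mathrm{abs}\left| \partial g^{-1}(y_i^*) / \partial y_i^{*\prime} \right|$
$= f_\varepsilon(\Gamma y_i^* - X_i\beta) \cdot \mathrm{abs}(|\Gamma|)$
$= \dfrac{\mathrm{abs}(|\Gamma|)}{(2\pi)^{M/2} |\Sigma|^{1/2}} \exp\left\{ -\tfrac{1}{2} (\Gamma y_i^* - X_i\beta)' \Sigma^{-1} (\Gamma y_i^* - X_i\beta) \right\}$
$= \dfrac{1}{(2\pi)^{M/2} |\Gamma^{-1}\Sigma\Gamma^{-1\prime}|^{1/2}} \exp\left\{ -\tfrac{1}{2} (y_i^* - \Gamma^{-1}X_i\beta)' (\Gamma^{-1}\Sigma\Gamma^{-1\prime})^{-1} (y_i^* - \Gamma^{-1}X_i\beta) \right\}$

That is, a multivariate normal distribution, $y_i^* \sim N(\Gamma^{-1}X_i\beta,\; \Gamma^{-1}\Sigma\Gamma^{-1\prime})$, is obtained. Assuming that each observation is independent, we can calculate the likelihood for all observations by $f(y^*) = \prod_{i=1}^{N} f(y_i^*)$.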
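2. Estimation of Coefficients of Endogenous Variables ($\Gamma$)
The matrix $\Gamma$ is estimated using a random walk chain Metropolis-Hastings method. As the diagonal elements of $\Gamma$ are 1, we need to estimate only the off-diagonal elements. Let $\tilde{\gamma}$ denote the vector consisting of the off-diagonal elements of $\Gamma$, where the dimension of $\tilde{\gamma}$ is $\tilde{K} = (M^2 - M) \times 1$. We use a normal prior, i.e., $\tilde{\gamma} \sim N(\underline{\tilde{\gamma}}, \underline{V}_{\tilde{\gamma}})$, and generate candidate draws according to $\tilde{\gamma}^* = \tilde{\gamma}^{(r-1)} + z$, where $z \sim N(0, \Sigma_z)$ and $r$ denotes the $r$-th iteration. We assume $\Sigma_z = \tilde{c} I_{\tilde{K}}$ and choose the value of $\tilde{c}$ so that the acceptance probability is around 40%, following the general rule (Koop 2003). Therefore, the candidate $\tilde{\gamma}^*$ is drawn from a multivariate normal distribution such as
$\tilde{\gamma}^* \sim N(\tilde{\gamma}^{(r-1)}, \tilde{c} I_{\tilde{K}})$
Using the prior of $\tilde{\gamma}$ and the likelihood, we calculate the posterior probability of $\tilde{\gamma}$ as follows.
(B3) $p(\tilde{\gamma} \mid y^*, \beta, \Sigma) \propto p(\tilde{\gamma})\, f(y^*)$
With Equation B3, we calculate the acceptance probability as
(B4) $\alpha(\tilde{\gamma}^{(r-1)}, \tilde{\gamma}^*) = \min\left[\dfrac{p(\tilde{\gamma} = \tilde{\gamma}^* \mid y^*, \beta, \Sigma)}{p(\tilde{\gamma} = \tilde{\gamma}^{(r-1)} \mid y^*, \beta, \Sigma)},\ 1\right]$.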
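The random walk chain step in Equations B3 and B4 can be sketched as follows. This is a generic illustration under stated assumptions: `random_walk_mh` and `acceptance_probability` are hypothetical names, the log posterior kernel is passed in as a function, and the toy target at the end is a stand-in (independent standard normals), not the model posterior.

```python
import math
import random

def acceptance_probability(log_post_candidate, log_post_current):
    """Equation B4 on the log scale: min(ratio of posteriors, 1)."""
    return math.exp(min(0.0, log_post_candidate - log_post_current))

def random_walk_mh(log_post, start, step_sd, n_iter, rng):
    """Random walk chain Metropolis-Hastings for a vector parameter.

    log_post : function returning the log posterior kernel
               (log prior plus log likelihood, as in Equation B3)
    step_sd  : square root of the increment variance c, so Sigma_z = c * I
    """
    current = list(start)
    current_lp = log_post(current)
    draws, accepted = [], 0
    for _ in range(n_iter):
        # Candidate = previous draw + z, with z ~ N(0, c I)
        candidate = [g + rng.gauss(0.0, step_sd) for g in current]
        candidate_lp = log_post(candidate)
        if rng.random() < acceptance_probability(candidate_lp, current_lp):
            current, current_lp = candidate, candidate_lp
            accepted += 1
        draws.append(list(current))
    return draws, accepted / n_iter

# Stand-in target for illustration only: independent standard normals.
def toy_log_post(g):
    return -0.5 * sum(v * v for v in g)

draws, rate = random_walk_mh(toy_log_post, [0.0, 0.0], 1.0, 5000, random.Random(1))
```

In practice the step size would be tuned by rerunning short chains and adjusting `step_sd` until the returned acceptance rate is near the target mentioned in the text.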
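3. Estimation of Coefficients of Exogenous Variables and Error Covariance ($\beta$ and $\Sigma$)
Once we get a new draw of $\tilde{\gamma}^{(r)}$, we construct $\Gamma^{(r)}$ by re-arranging $\tilde{\gamma}^{(r)}$. Finally, we calculate $\tilde{y}_i^* = \Gamma y_i^*$ (hereafter, we suppress the iteration number $r$ for simplicity). We now stack all the observations together as
$\tilde{y}^* = (\tilde{y}_1^{*\prime}, \ldots, \tilde{y}_N^{*\prime})'$, $X = (X_1', \ldots, X_N')'$, $\varepsilon = (\varepsilon_1', \ldots, \varepsilon_N')'$
and write
(B5) $\tilde{y}^* = X\beta + \varepsilon$.
We assume that $\varepsilon_i$ follows $N(0, \Sigma)$ and $\varepsilon$ follows $N(0, I_N \otimes \Sigma)$.
As Equation B5 is a SUR model, we can estimate $\beta$ and $\Sigma$ by using a Gibbs sampler with standard Normal-Wishart priors. Specifically, we use a normal prior $\beta \sim N(\underline{\beta}, \underline{V}_{\beta})$ and a Wishart prior $\Sigma^{-1} \sim W(\underline{v}, \underline{V})$.
The posterior of $\beta$ conditional on $\tilde{y}^*$ and $\Sigma^{-1}$ is $\beta \mid \tilde{y}^*, \Sigma^{-1} \sim N(\bar{\beta}, \bar{V}_{\beta})$, where
$\bar{V}_{\beta} = \left(\underline{V}_{\beta}^{-1} + \sum_{i=1}^{N} X_i'\Sigma^{-1}X_i\right)^{-1}$ and $\bar{\beta} = \bar{V}_{\beta}\left(\underline{V}_{\beta}^{-1}\underline{\beta} + \sum_{i=1}^{N} X_i'\Sigma^{-1}\tilde{y}_i^*\right)$.
The posterior for $\Sigma^{-1}$ conditional on $\tilde{y}^*$ and $\beta$ is $\Sigma^{-1} \mid \tilde{y}^*, \beta \sim W(\bar{v}, \bar{V})$, where $\bar{v} = N + \underline{v}$ and
$\bar{V} = \left[\underline{V}^{-1} + \sum_{i=1}^{N} (\tilde{y}_i^* - X_i\beta)(\tilde{y}_i^* - X_i\beta)'\right]^{-1}$.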
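The two conditional draws above can be illustrated in the scalar special case (M = 1 with a single regressor), where the Wishart distribution reduces to a Gamma distribution. This is a sketch under those simplifying assumptions, not the multivariate sampler itself; the function name, the simulated data, and the seed are hypothetical, and the prior values mirror the diffuse settings described in the appendix.

```python
import random

def gibbs_normal_gamma(y, x, n_iter, rng, b0=0.0, v0_beta=1000.0, nu0=4.0, s0=0.25):
    """Gibbs sampler for y_i = x_i * beta + e_i, e_i ~ N(0, 1/h), with M = 1.

    Priors (assumed values): beta ~ N(b0, v0_beta) and h ~ W(nu0, s0).
    A 1x1 Wishart W(nu, s) is a Gamma with shape nu/2 and scale 2*s.
    """
    n = len(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    h = 1.0
    betas, sigma2s = [], []
    for _ in range(n_iter):
        # beta | y, h ~ N(b_bar, v_bar)
        v_bar = 1.0 / (1.0 / v0_beta + h * sxx)
        b_bar = v_bar * (b0 / v0_beta + h * sxy)
        beta = rng.gauss(b_bar, v_bar ** 0.5)
        # h | y, beta ~ W(nu_bar, s_bar)
        sse = sum((yi - xi * beta) ** 2 for xi, yi in zip(x, y))
        nu_bar = n + nu0
        s_bar = 1.0 / (1.0 / s0 + sse)
        h = rng.gammavariate(nu_bar / 2.0, 2.0 * s_bar)
        betas.append(beta)
        sigma2s.append(1.0 / h)
    return betas, sigma2s

# Simulated data for illustration only (true beta = 2, true variance = 0.25).
rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]
betas, sigma2s = gibbs_normal_gamma(y, x, 600, rng)
post_beta = sum(betas[100:]) / len(betas[100:])        # discard 100 burn-in draws
post_sigma2 = sum(sigma2s[100:]) / len(sigma2s[100:])
```

In the multivariate case the same two steps apply, with matrix inverses in place of the scalar reciprocals and a Wishart draw in place of the Gamma draw.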
4. Data Augmentation
Now, we address the censoring issue. After getting all parameters ($\Gamma$, $\beta$, and $\Sigma$), we can impute $y_i^*$ in the following way. (1) If all elements in $y_i$ are positive, there is no need to impute. (2) If all elements in $y_i$ are zero, we draw $y_i^*$ from the multivariate truncated normal distribution, $MVTN_{(-\infty, 0)}(\Gamma^{-1}X_i\beta, \tilde{\Sigma})$, where $\tilde{\Sigma} = \Gamma^{-1}\Sigma\Gamma^{-1\prime}$. (3) If some elements of $y_i$ are zero, we draw the latent values from a conditional multivariate truncated normal distribution. Denote the zero $y_{im}$'s as a vector $y_u$ (unknown $y_{im}^*$'s) and the non-zero $y_{im}$'s as a vector $y_k$ (known $y_{im}^*$'s). Then, we impute $y_u^*$ from
$MVTN_{(-\infty, 0)}(\mu_{u|k}, \tilde{\Sigma}_{u|k})$,
where $\mu_{u|k} = (\Gamma^{-1}X\beta)_u + \tilde{\Sigma}_{uk}\tilde{\Sigma}_{kk}^{-1}\big(y_k - (\Gamma^{-1}X\beta)_k\big)$ and $\tilde{\Sigma}_{u|k} = \tilde{\Sigma}_{uu} - \tilde{\Sigma}_{uk}\tilde{\Sigma}_{kk}^{-1}\tilde{\Sigma}_{uk}'$. Note that $(\Gamma^{-1}X\beta)_u$ is the vector of elements of $\Gamma^{-1}X_i\beta$ that corresponds to the unknown $y_{im}^*$'s, while $(\Gamma^{-1}X\beta)_k$ is the vector of elements of $\Gamma^{-1}X_i\beta$ that corresponds to the known $y_{im}^*$'s. Similarly, $\tilde{\Sigma}_{uu}$ is the covariance matrix among the unknown $y_{im}^*$'s, while $\tilde{\Sigma}_{kk}$ is the covariance matrix among the known $y_{im}^*$'s. In addition, $\tilde{\Sigma}_{uk}$ is the covariance matrix between the unknown and the known $y_{im}^*$'s.
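Case (3) can be sketched for two equations, one censored and one observed, so that the conditional mean and variance are scalars. The helper names are hypothetical, and the truncated draw uses the inverse-CDF method, which is one common choice rather than necessarily the one used in the study.

```python
import random
from statistics import NormalDist

def conditional_moments(mu_u, mu_k, s_uu, s_uk, s_kk, y_k):
    """Mean and variance of the unknown element given the known one (bivariate case)."""
    mean = mu_u + s_uk / s_kk * (y_k - mu_k)
    var = s_uu - s_uk * s_uk / s_kk
    return mean, var

def draw_truncated_below_zero(mean, var, rng):
    """Inverse-CDF draw from N(mean, var) truncated to (-inf, 0]."""
    dist = NormalDist(mean, var ** 0.5)
    upper = dist.cdf(0.0)                       # probability mass below zero
    # Keep u strictly inside (0, 1) so that inv_cdf is always defined
    u = min(max(rng.random() * upper, 1e-300), 1.0 - 1e-16)
    return min(dist.inv_cdf(u), 0.0)

# Illustrative numbers only: reduced-form means (0.2, 0.4), unit variances, covariance 0.5,
# and an observed positive value of 1.4 in the second equation.
mean, var = conditional_moments(0.2, 0.4, 1.0, 0.5, 1.0, 1.4)
rng = random.Random(5)
imputed = [draw_truncated_below_zero(mean, var, rng) for _ in range(1000)]
```

Every imputed value is non-positive by construction, which is what the censoring at zero requires.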
5. Prior Distribution
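We use diffuse settings for the priors on parameters as follows. Coefficients of endogenous variables: $\tilde{\gamma} \sim MVN(0_{\tilde{K}}, 10^{3} I_{\tilde{K}})$. Coefficients of exogenous variables: $\beta \sim MVN(0_{K}, 10^{3} I_{K})$. Variance-covariance matrix: $\Sigma^{-1} \sim W(\underline{v}, \underline{V})$, where $\underline{v} = M + 3$ and $\underline{V} = (1/\underline{v}) I_M$. Variance of the random walk chain: $\tilde{c} = 10^{-5}$, which makes $\Sigma_z = 10^{-5} I_{\tilde{K}}$.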
6. Latent Class Membership
The steps for estimating the latent class model are to determine the class of each observation in each iteration and then to estimate Equation 2 within the given class. Given a latent segment $s$, Equation 2 for consumer $i$ belonging to segment $s$ can be expressed as $\Gamma_s y_{is}^* = X_{is}\beta_s + \varepsilon_{is}$, where $\varepsilon_{is} \sim N(0, \Sigma_s)$, and the likelihood function given the other parameters is
$f(y_i^* \mid p, e_i, G, B, V) = \sum_{s=1}^{S} e_{is}\, f(y_i^* \mid \Gamma_s, \beta_s, \Sigma_s)$,
where $e_i = (e_{i1}, \ldots, e_{iS})'$, with $e_{is} = 1$ if consumer $i$ belongs to segment $s$ and $e_{is} = 0$ if not. In addition, $G = (\Gamma_1, \ldots, \Gamma_S)'$, $B = (\beta_1, \ldots, \beta_S)'$, and $V = (\Sigma_1, \ldots, \Sigma_S)'$.
$p$ is the vector of probabilities that a consumer belongs to the $s$-th class in the mixture. That is, $p = (p_1, \ldots, p_S)$ and $p_s = P(e_{is} = 1)$. For the prior of $p$, we set up a Dirichlet distribution $p \sim D(\underline{\alpha})$, where $\underline{\alpha} = 1_S$ and $1_S$ is an $S$-vector of ones. As there is an identification problem in the mixture model, we impose a labeling restriction by drawing $p$ from an ordered Dirichlet, which fixes the ordering of $p_{s-1}$ and $p_s$ for $s = 2, \ldots, S$. The posterior distribution of $p$ is
$p \sim D(\bar{\alpha})$, where $\bar{\alpha} = \underline{\alpha} + \sum_{i=1}^{N} e_i$.
We draw $e_i$ from the multinomial distribution, $e_i \sim M(1, p)$. The posterior distribution of $e_i$ is expressed as
$$
e_i \sim M\left(1,\ \left[\frac{p_1 f(y_i^* \mid \Gamma_1, \beta_1, \Sigma_1)}{\sum_{s=1}^{S} p_s f(y_i^* \mid \Gamma_s, \beta_s, \Sigma_s)},\ \ldots,\ \frac{p_S f(y_i^* \mid \Gamma_S, \beta_S, \Sigma_S)}{\sum_{s=1}^{S} p_s f(y_i^* \mid \Gamma_s, \beta_s, \Sigma_s)}\right]'\right).
$$
After the membership $s$ is determined, we select the observations belonging to class $s$ (i.e., $y_{is}^*$ and $X_{is}$), sequentially draw the other parameters within the given class $s$, and repeat the steps through the last class.
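The membership and class-share draws described in this subsection can be sketched as below. The function names are hypothetical, the class-specific log densities are assumed to have been computed already, and the ordering (labeling) restriction on $p$ is deliberately left out of this sketch.

```python
import math
import random

def membership_probabilities(p, log_f):
    """Posterior class probabilities p_s f_s / sum_s p_s f_s, computed from log densities."""
    logs = [math.log(ps) + lf for ps, lf in zip(p, log_f)]
    top = max(logs)                              # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

def draw_membership(probs, rng):
    """Draw e_i ~ M(1, probs) and return it as an indicator vector."""
    u, cum = rng.random(), 0.0
    for s, pr in enumerate(probs):
        cum += pr
        if u < cum:
            return [1 if j == s else 0 for j in range(len(probs))]
    return [0] * (len(probs) - 1) + [1]          # guard against rounding in the cumulative sum

def draw_class_shares(prior_alpha, memberships, rng):
    """Draw p ~ D(prior_alpha + sum_i e_i) via normalized Gamma variates."""
    counts = [sum(e[s] for e in memberships) for s in range(len(prior_alpha))]
    gammas = [rng.gammavariate(a + c, 1.0) for a, c in zip(prior_alpha, counts)]
    total = sum(gammas)
    return [g / total for g in gammas]

# Illustrative use with two classes and made-up log densities.
rng = random.Random(11)
probs = membership_probabilities([0.25, 0.75], [0.0, 0.0])
e_draws = [draw_membership(probs, rng) for _ in range(200)]
shares = draw_class_shares([1.0, 1.0], e_draws, rng)
```

Once each consumer has an indicator vector, the observations with `e[s] == 1` would be passed to the within-class draws of subsections 2 through 4.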
C. Monte Carlo Integration
We calculate expected expenditures at competing banks given that the focal bank observes consumers’ expenditures at the focal bank itself. To do this, first, we need to consider consumers’ usage patterns at the focal bank (i.e., $y_{21}$ and $y_{22}$): (1) they use both categories, (2) they use one of the categories, or (3) they use neither category. Second, we need to convert the latent utility from expenditures (i.e., $E(y_{11}^* \mid y_{21}, y_{22})$ and $E(y_{12}^* \mid y_{21}, y_{22})$) to expected consumer spending (i.e., $E(y_{11} \mid y_{21}, y_{22})$ and $E(y_{12} \mid y_{21}, y_{22})$). We extend the expectation logic of a univariate Tobit model to multivariate and conditional expectations. The distribution of the latent utility is
(C1) $y_i^* \sim MVN(\Gamma^{-1}X_i\beta,\ \Gamma^{-1}\Sigma\Gamma^{-1\prime})$
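For reference, the univariate Tobit logic that this appendix extends can be written out as a standard result. The symbols $\mu$, $\sigma$, $\phi$, and $\Phi$ are generic notation introduced here for illustration: $y^* \sim N(\mu, \sigma^2)$ is the latent variable, $y = \max(y^*, 0)$ is the observed one, and $\phi$ and $\Phi$ are the standard normal density and distribution functions.

```latex
\begin{aligned}
E[y] &= P(y^* > 0)\, E[y^* \mid y^* > 0] + P(y^* \le 0)\cdot 0 \\
     &= \Phi\!\left(\frac{\mu}{\sigma}\right)
        \left[\mu + \sigma\,\frac{\phi(\mu/\sigma)}{\Phi(\mu/\sigma)}\right] \\
     &= \mu\,\Phi\!\left(\frac{\mu}{\sigma}\right) + \sigma\,\phi\!\left(\frac{\mu}{\sigma}\right)
\end{aligned}
```

The univariate case splits the expectation over two regions of the latent variable; with two categories the same idea leads to four regions.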
The expected expenditures in two categories at competitors are calculated as follows.
(C2)
$$
\begin{aligned}
E[(y_{11}, y_{12})] &= \sum_{k=1}^{4} P[(y_{11}, y_{12}) \in R_k]\; E[(y_{11}, y_{12}) \mid (y_{11}, y_{12}) \in R_k] \\
&= \sum_{k=1}^{4} P[(y_{11}^*, y_{12}^*) \in R_k^*]\; E[(y_{11}^*, y_{12}^*) \mid (y_{11}^*, y_{12}^*) \in R_k^*]
\end{aligned}
$$
where $R_k$ denotes a possible range in which actual expenditures at competing banks lie. Specifically, each range is defined as $R_1: (y_{11} = 0, y_{12} = 0)$, $R_2: (y_{11} = 0, y_{12} = r_2)$, $R_3: (y_{11} = r_1, y_{12} = 0)$, and $R_4: (y_{11} = r_1, y_{12} = r_2)$, where $r_1 > 0$ and $r_2 > 0$. $R_k^*$ denotes the corresponding range of the latent utility from expenditures at competing banks. Corresponding to $R_k$, each $R_k^*$ is defined as follows: $R_1^*: (y_{11}^* \le 0, y_{12}^* \le 0)$, $R_2^*: (y_{11}^* \le 0, y_{12}^* > 0)$, $R_3^*: (y_{11}^* > 0, y_{12}^* \le 0)$, and $R_4^*: (y_{11}^* > 0, y_{12}^* > 0)$.
Given that expenditures at the focal bank are known, we can calculate expected expenditures at competing banks conditional on expenditures at the focal bank.
(C3)
$$
\begin{aligned}
E[(y_{11}, y_{12}) \mid (y_{21}, y_{22}) \in S_l] &= E[(y_{11}, y_{12}) \mid (y_{21}^*, y_{22}^*) \in S_l^*] \\
&= \sum_{k=1}^{4} P[(y_{11}^*, y_{12}^*) \in R_k^* \mid (y_{21}^*, y_{22}^*) \in S_l^*]\; E[(y_{11}^*, y_{12}^*) \mid (y_{11}^*, y_{12}^*) \in R_k^* \text{ and } (y_{21}^*, y_{22}^*) \in S_l^*]
\end{aligned}
$$
where $S_l$ denotes observed expenditures at the focal bank and is one of $S_1: (y_{21} = 0, y_{22} = 0)$, $S_2: (y_{21} = 0, y_{22} = s_2)$, $S_3: (y_{21} = s_1, y_{22} = 0)$, or $S_4: (y_{21} = s_1, y_{22} = s_2)$, where $s_1 > 0$ and $s_2 > 0$. $S_l^*$ denotes the range in which the latent utility from expenditures at the focal bank lies. Corresponding to $S_l$, each $S_l^*$ is defined as follows: $S_1^*: (y_{21}^* \le 0, y_{22}^* \le 0)$, $S_2^*: (y_{21}^* \le 0, y_{22}^* = s_2)$, $S_3^*: (y_{21}^* = s_1, y_{22}^* \le 0)$, and $S_4^*: (y_{21}^* = s_1, y_{22}^* = s_2)$.
As it is difficult to calculate Equation C3 analytically, we use Monte Carlo integration and describe the steps as follows.
Step 1. Randomly draw $(y_{11}^*, y_{12}^*)$ $N$ times. In the case of $S_1: (y_{21} = 0, y_{22} = 0)$, we draw from the multivariate normal distribution in Equation C1. In the other cases, we draw from a multivariate normal distribution conditional on the positive values of $y_{21}$ and $y_{22}$.
Step 2. For the probability part in Equation C3, calculate the ratio of the number of draws with $(y_{11}^*, y_{12}^*) \in R_k^*$ and $(y_{21}^*, y_{22}^*) \in S_l^*$ to the number of draws with $(y_{21}^*, y_{22}^*) \in S_l^*$.
Step 3. For the expectation part, calculate the average of the draws $(y_{11}^*, y_{12}^*)$ that belong to $R_k^*$ given $S_l^*$.
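The three steps can be sketched for the two competing-bank categories as follows. This is a simplified illustration with hypothetical names: the mean and covariance passed in are assumed to be those of $(y_{11}^*, y_{12}^*)$ after any conditioning on the focal bank's observed values, and observed spending is assumed to be the latent value floored at zero, as in a Tobit model.

```python
import math
import random

def draw_bivariate_normal(mean, var1, var2, cov, rng):
    """One draw from a bivariate normal via the Cholesky factor of its covariance."""
    l11 = math.sqrt(var1)
    l21 = cov / l11
    l22 = math.sqrt(var2 - l21 * l21)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2

def region_index(y1_star, y2_star):
    """Map a latent draw to k = 1..4 for the regions R_k* in Equation C2."""
    if y1_star <= 0.0:
        return 1 if y2_star <= 0.0 else 2
    return 3 if y2_star <= 0.0 else 4

def expected_expenditures(mean, var1, var2, cov, n_draws, rng):
    """Monte Carlo estimates of the region probabilities and of E[(y_11, y_12)]."""
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    total1 = total2 = 0.0
    for _ in range(n_draws):
        y1_star, y2_star = draw_bivariate_normal(mean, var1, var2, cov, rng)
        counts[region_index(y1_star, y2_star)] += 1
        # Assumed censoring rule: observed spending is the latent value floored at zero
        total1 += max(y1_star, 0.0)
        total2 += max(y2_star, 0.0)
    probs = {k: c / n_draws for k, c in counts.items()}
    return probs, (total1 / n_draws, total2 / n_draws)

# Illustrative inputs only: zero means, unit variances, no correlation.
probs, expected = expected_expenditures((0.0, 0.0), 1.0, 1.0, 0.0, 20000, random.Random(3))
```

With these inputs each region should receive roughly a quarter of the draws, which gives a quick sanity check on the sampler before real parameter draws are plugged in.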
REFERENCES
Amemiya, Takeshi (1974), “Multivariate Regression and Simultaneous Equation Models when the Dependent Variables Are Truncated Normal,” Econometrica, 42 (6), 999-1012.
Amemiya, Takeshi, Makoto Saito, and Keiko Shimono (1993), “A Study of Household Investment Patterns in Japan: An Application of Generalized Tobit Model,” The Economic Studies Quarterly, 44 (1), 13-28.
Baumann, Chris, Suzan Burton, and Greg Elliott (2005), “Determinants of Customer Loyalty and Share of Wallet in Retail Banking,” Journal of Financial Services Marketing, 9 (3), 231-48.
Bowman, Douglas and Das Narayandas (2004), “Linking Customer Management Effort to Customer Profitability in Business Markets,” Journal of Marketing Research, 41 (November), 433-47.
Cameron, A. Colin and Pravin K. Trivedi (2005), Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.
Carlin, Bradley P. and Thomas A. Louis (2000), Bayes and Empirical Bayes Methods for Data Analysis. Boca Raton: Chapman & Hall.
Chen, Yuxin and Joel H. Steckel (2005), “Modeling Credit Card 'Share of Wallet': Solving the Incomplete Information Problem,” working paper, New York University, NY.
Cooil, Bruce, Timothy L. Keiningham, Lerzan Aksoy, and Michael Hsu (2007), “A Longitudinal Analysis of Customer Satisfaction and Share of Wallet: Investigating the Moderating Effect of Customer Characteristics,” Journal of Marketing, 71 (January), 67-83.
Du, Rex Yuxing, Wagner A. Kamakura, and Carl F. Mela (2007), “Size and Share of Customer Wallet,” Journal of Marketing, 71 (April), 94-113.
Iyengar, Raghuram, Asim Ansari, and Sunil Gupta (2003), “Leveraging Information Across Categories,” Quantitative Marketing and Economics, 1 (4), 425-65.
Kamakura, Wagner, Michel Wedel, Fernando de Rosa, and Jose Afonso Mazzon (2003), “Cross-selling through Database Marketing: a Mixed Data Factor Analyzer for Data Augmentation and Prediction,” International Journal of Research in Marketing, 20 (1), 45-65.
Koop, Gary (2003), Bayesian Econometrics. Hoboken, NJ: Wiley.
Li, Kai (1998), “Bayesian Inference in a Simultaneous Equation Model with Limited Dependent Variables,” Journal of Econometrics, 85 (2), 387-400.
Li, Shibo, Baohong Sun, and Ronald T. Wilcox (2005), “Cross-Selling Sequentially Ordered Products: An Application to Consumer Banking Services,” Journal of Marketing Research, 42 (May), 233-39.
Maddala, G. S. (1986), Limited-Dependent and Qualitative Variables in Econometrics. New York, NY: Cambridge University Press.
Malthouse, Edward C. and Paul Wang (1998), “Database Segmentation Using Share of Customer,” Journal of Database Marketing, 6 (3), 239-52.
Murphy, K. M. and R. H. Topel (1985), “Estimation and Inference in Two-Step Econometric Models,” Journal of Business and Economic Statistics, 20 (1), 88-97.
Ransom, Michael R. (1987), “A Comment on Consumer Demand Systems with Binding Non-negativity Constraints,” Journal of Econometrics, 34, 355-59.
Reinartz, Werner J. and V. Kumar (2003), “The Impact of Customer Relationship Characteristics on Profitable Lifetime Duration,” Journal of Marketing, 67 (January), 77-99.
Reinartz, Werner J., Jacquelyn S. Thomas, and V. Kumar (2005), “Balancing Acquisition and Retention Resources to Maximize Customer Profitability,” Journal of Marketing, 69 (January), 63-79.
Verhoef, Peter C. (2003), “Understanding the Effect of Customer Relationship Management Efforts on Customer Retention and Customer Share Development,” Journal of Marketing, 67 (October), 30-45.
Yang, Sha, Vishal Narayan, and Henry Assael (2006), “Estimating the Interdependence of Television Program Viewership Between Spouses: A Bayesian Simultaneous Equation Model,” Marketing Science, 25 (4), 336-49.
Zheng, Zhiqiang, Peter S. Fader, and Balaji Padmanabhan (2009), “Inferring Competitive Measures Using Augmented Site-Centric Data,” working paper, University of Texas at Dallas, TX.
CHAPTER 2
SEARCH PATTERNS, SEARCH-BASED SEGMENTATION AND SEARCH RESULTS
OF AUTOMOBILE PURCHASERS
Sungha Jang
School of Management, Department of Marketing, SM32
The University of Texas at Dallas
800 West Campbell Road
Richardson, Texas 75080-3021
ABSTRACT
Consumers often search several information sources when making purchase decisions. In this
paper, we study how time spent searching one source is interrelated with time spent searching
other sources using data on new automobile purchases. In this category, information sources
include different offline sources, Internet websites, spouse and internal search. We build a
structural model assuming that consumers allocate their search time across several information
sources to maximize utility and estimate relationships of each information source. Then, we
segment consumers based on their search preferences and examine brand choices and price-
related results. We find that consumers use information sources in a complementary manner and
that the dealer is still a prominent source. We also find that at the segment level, except for two
segments, an inverted U-shaped relationship exists between internal and external search. Brand
choice analysis reveals that low external search is associated with a choice of American brands.
Finally, segments achieve different price negotiation times and discounts but display similar
satisfaction with price paid. Based on these results, we provide recommendations to automakers.
significant. Therefore, we do not find any evidence of the inverted U-shaped relationship
between internal search and external search in parameter estimation.
The effect of spouse search is positive on search preference for all the external sources except for manufacturer websites (coefficients range between 0.065 and 0.253). Note that there is no interaction effect between spouse search and gender across all information sources. Therefore, regardless of the gender of buyers, it seems that buyers and their spouses search together.
Effects of other exogenous variables. Table 2.6 shows various effects of other exogenous variables. We find that the year of the survey had an effect on search in external sources. Compared to the base year 2001, consumers searched more in offline independent sources in 2003 (coefficient = 0.615) and searched less in offline manufacturer sources in 2005 (coefficient = -0.521). Among Internet sources, consumers searched more on independent websites (coefficient = 1.665 in 2003) and manufacturer websites (coefficient = 0.901 in 2003 and 0.878 in 2005). However, they reduced their search time on dealer websites in 2003 (coefficient = -0.491). The positive sum of coefficients reflects the increase in total external search time in 2003 and 2005 compared to the base year 2001.
The results in Table 2.6 also show the effects of demographics and product attributes. Male buyers searched less on offline personal sources (coefficient = -0.755) and experiential sources (coefficient = -0.82). However, there is no effect of age on search preference for any information source. With respect to search cost-related variables, consumers with more years of education had a lower search preference for the offline manufacturer sources and dealer sources (coefficients = -0.098 and -0.06), and consumers with a higher hourly wage searched less in offline dealer sources and experiential sources (coefficients = -0.009 and -0.02). We also find that a high sticker price increases search preference for offline independent sources and experiential sources (coefficients = 0.891 and 1.228, respectively). Finally, as might be expected, the effect of the helpfulness of each information source is positive.
Search Based Segments
The second part of our results is obtained from the segmentation analysis. By estimating Equation 2, we obtained the latent search preference ($y_{ij}^*$) for each external information source. We then segment the consumers in terms of search-related variables, including the latent search preferences, internal search, and spouse search, as well as a dummy variable for whether the customer used the Internet or not. As mentioned previously, we used a two-step cluster analysis to handle both the continuous search-related variables and the discrete dummy variable. We selected the number of clusters based on BIC and the size of the segments. We used a nine-segment solution because the decrease in BIC is marginal after nine segments and none of the segments is too small to be managerially relevant. Figure 2.2 shows the search times across the segments.
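The selection rule described above (stop when the BIC decrease becomes marginal, subject to a minimum segment size) can be sketched as follows. The function name, the thresholds, and the BIC values are hypothetical and serve only to illustrate the rule; they are not taken from the study.

```python
def choose_number_of_clusters(bic_by_k, min_gain, size_by_k=None, min_share=0.0):
    """Pick the smallest k after which the BIC improvement falls below min_gain.

    bic_by_k  : dict {k: BIC} for consecutive candidate solutions
    size_by_k : optional dict {k: list of segment shares}, used to reject
                solutions whose smallest segment is below min_share
    """
    ks = sorted(bic_by_k)
    for k, k_next in zip(ks, ks[1:]):
        gain = bic_by_k[k] - bic_by_k[k_next]    # BIC decrease from adding one cluster
        big_enough = size_by_k is None or min(size_by_k[k]) >= min_share
        if gain < min_gain and big_enough:
            return k
    return ks[-1]

# Hypothetical BIC values for illustration only; they are not the study's values.
example_bic = {6: 9400.0, 7: 9150.0, 8: 8980.0, 9: 8860.0, 10: 8850.0, 11: 8845.0}
chosen = choose_number_of_clusters(example_bic, min_gain=50.0)
```

The choice of `min_gain` is a judgment call, which matches the text's description of the BIC decrease as "marginal" rather than a formal test.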
In Figure 2.2, the x-axis represents the extent of internal search and the y-axis represents the offline, Internet, and spouse search times. The figure lists the segments from S1 to S9 for reference and gives the size of each segment. We sorted the nine segments by the degree of internal search. The first three segments are low internal search segments (i.e., the levels of internal search lie between -1.21 and -0.44), the next three segments are moderate internal search segments (the levels of internal search lie between -0.24 and 0.05), and the last three segments are high internal search segments (the levels of internal search lie between 0.35 and 1.20). Note that some segments use the Internet (S1, S4, S5, S7, and S8) while other segments do not use the Internet (S2, S3, S6, and S9).
[Chart: Search Time by Segments. x-axis: Internal Search (-1.21, -0.79, -0.44, -0.24, -0.18, 0.05, 0.35, 1.03, and 1.20 for S1 through S9); y-axis: External Search Time; series: Offline, Internet, Spouse.]
Segments            Segment Label
S1 (273, 16.1%)     Lowest internal searcher
S2 (192, 11.3%)     Lowest external searcher
S3 (247, 14.6%)     Low internal but moderate offline searcher
S4 (55, 3.2%)       Highest external searcher
S5 (114, 6.7%)      High searcher in online/offline sources
S6 (75, 4.4%)       High searcher in offline sources
S7 (286, 16.9%)     Moderate searcher
S8 (189, 11.1%)     High internal and low external searcher
S9 (266, 15.7%)     Most experienced and loyal searcher
Figure 2.2. Search Based Segments (S1 to S9) and Their Search Times in Hours
Figure 2.2 reveals two interesting results that are new to the literature. First, we find, at
the segment level, the inverted U-shaped relationship between the internal search and external
search if we put aside segments S1 and S4. That is, the level of external search increases across the low internal search segments up to segments S5 and S6 (the segments with moderate internal search) and then decreases in the high internal search segments. It is worth remarking that the theoretical inverted U-shaped relationship, as in the view of Moorthy et al. (1997), is obtained at the segment level, but not in aggregate. The other interesting result is the two off-pattern segments, S1 and S4. Members of S1, the lowest internal search segment, do moderate external search (12.2 hours). The reason could be that these consumers want to compensate for their lack of knowledge through external search. Segment S4 is a niche segment (3.2%) with moderate internal search (-0.24) but long external search (43.9 hours). The reasons could be that these consumers are efficient enough to process more external information or that they enjoy external search given their current internal knowledge. Uncovering these segments is important because, depending on their size, they can mask the inverted U-shaped relationship. If segments S1 and S4 are relatively large, they may make the aggregate relationship between internal and external search negative.
In Table 2.7, we provide a labeling and profiling of the segments by looking at their
search patterns and descriptive characteristics including demographics. After this, we also
discuss how our segmentation results compare against those of Furse et al. (1984).
Segment S1 (n=273, 16.1%) is the second largest segment and is characterized by the lowest internal search. Though its internal search level is the lowest, its external search is moderate (12.2 hours). Segment S2 (n=192, 11.3%) also does low internal search and is characterized by the lowest external search time (3.48 hours) and no Internet usage. Segment S3 (n=247, 14.6%) consists of the moderate offline external searchers. This segment shows a higher level of internal search and external search than S2.
Segment S4 (n=55, 3.2%) is characterized by the highest external search time (43.9 hours). It is notable that its external search is extremely high compared to the degree of its internal search. Segment S5 (n=114, 6.7%) and Segment S6 (n=75, 4.4%) consist of consumers who use offline sources for a long time (21.2 and 25.5 hours, respectively) and whose spouse search is also high (15.1 and 16.1 hours, respectively). The differences between the two segments are that S5 uses Internet sources while S6 does not and that S6 spends the longest time on test-driving (7.39 hours).
Segments S7 (n=286, 16.9%), S8 (n=189, 11.1%), and S9 (n=266, 15.7%) do relatively high internal search. As the degree of internal search increases across these segments, the external search time decreases. In particular, S9 does the highest internal search but has a very low external search time (3.9 hours). In addition, this segment does not use the Internet at all.
Results relating segments and demographics are as follows. Older consumers are more
likely to belong to high internal search and low external search segments compared to younger
consumers. Females with low internal search or males with high internal search are likely to
belong to low external search segments. Employed consumers do not necessarily belong to lower
search segments than those who are unemployed. Highly educated consumers do not necessarily
belong to the high internal search segments but are likely to belong to the high external search
segments. Consumers with high hourly wages do not seem to reduce their search time because
some high wage segments search more than low wage ones.
We overlay our segments on those of Furse et al. (1984) in Table 2.7. We find that S1 matches their cluster, Self-Reliant Shopper, in that these consumers spend a certain amount of time but do not involve other people much. S2 matches their cluster, Purchase Pal Assisted, who are the least experienced car shoppers and get help from others. S3 and S7 are similar to their Moderate Search cluster. S4 matches their High Search cluster of those consumers spending the greatest amount of time in search activity. S5 and S6 are similar to their cluster, Retail Shopper, who involve many decision makers, especially the wife, in the search process. S8 and S9 match their cluster, Low Search, of those who have prior purchase experience but spend less time. Roughly, therefore, search-based segments in the Internet era match those in the pre-Internet era.
Some differences, however, are also found in the Internet era. For example, while the proportion of Purchase Pal Assisted has decreased (19% vs. 11.3%), that of Self-Reliant Shopper has increased (12% vs. 16.1%). In addition, a new segment has emerged (S5, 6.7%), which is similar to their Retail Shopper cluster but uses the Internet as an additional information source. The changes possibly occurred because many consumers obtained information directly from the Internet without other people’s help; the Internet may even have led to a new searcher type.
Search Results
The third and final part of results pertains to the effects of search on brand choices and
price-related outcomes. First, we look at the brand choices of the search-based segments, which
is new to the literature. We examine the relationship by a correspondence analysis first followed
by a logit model analysis. Then, we look at price-related outcomes such as pricing negotiation
time, discount amount/rate, and the final price satisfaction by segment.
Search based segments and their brand choices. We categorize the individual automobile brands into country level brands. If there are many brands in the same country, we classify them by manufacturer depending on the number of observations. The final brands we use are Chevy (20.1%), GM low brands (Pontiac and Saturn, 9.6%), GM high brands (Cadillac, GMC, and Oldsmobile, 10.2%), Ford (17.4%), Chrysler (11.2%), Toyota/Honda (14.2%), other Japanese brands (e.g., Nissan, Mazda, and so on, 4.6%), EU brands (4.7%) and Korean brands (3.6%).
Table 2.7. Description of Segments
Segment (Size, %): S1 (273, 16.1%), Lowest internal searcher
  Search Pattern: Their internal search is lowest. They seem to make up lack of their knowledge by moderate external search.
  Demographics: The youngest segment (average age 39.1 years). They are highly educated (16.3 years), employed (88%), and paid hourly wage ($25.1). They buy cars for the first time or change models.
  Similar Group in Furse et al. (1984): Self-Reliant Shopper (12%)
Segment (Size, %): S2 (192, 11.3%), Lowest external searcher
  Search Pattern: As their internal search is low, their external search is also low. They are the lowest searchers and do not use the Internet.
  Demographics: Average age 49.1 years. The female proportion is higher (54%). Marriage rate is 64%. Their education level (14.4 years) and hourly wage ($19.3) are lower than others. They do not have many experiences in automobile purchases.
  Similar Group in Furse et al. (1984): Purchase Pal Assisted (19%)
Segment (Size, %): S3 (247, 14.6%), Low internal but moderate offline searcher
  Search Pattern: Their internal search is relatively low but external search is larger than other low internal searcher segment (S2). They do not use the Internet.
  Demographics: Average age is 50.2 years. Fewer years of education (14.6 years), less employed (66%) and paid ($20.1) than others. They do not have many experiences in purchases.
  Similar Group in Furse et al. (1984): Moderate Searcher (32%)
Segment (Size, %): S4 (55, 3.2%), Highest external searcher
  Search Pattern: They do moderate internal search but extremely high external search. This segment is niche.
  Demographics: Average age is 42.5 years. They are less married (60%) but more educated (16.5 years), employed (82%) and paid ($27) than others.
  Similar Group in Furse et al. (1984): High Searcher (5%)
Segment (Size, %): S5 (114, 6.7%), High searcher in online/offline sources
  Search Pattern: They do high external search in various sources including spouses.
  Demographics: Average age is 43.6 years. Half are female (51%). Most are married (80%) and employed (87%).
  Similar Group in Furse et al. (1984): Retail Shopper (5%)
Segment (Size, %): S6 (75, 4.4%), High searcher in offline sources
  Search Pattern: They do high external search but do not use the Internet. They spend long time in test-driving and get spouses’ help most.
  Demographics: Average age is 50.7 years. Half are female (51%). Most are married (85%). They are less employed (68%) and paid ($17.6).
  Similar Group in Furse et al. (1984): Retail Shopper (5%)
Segment (Size, %): S7 (286, 16.9%), Moderate searcher
  Search Pattern: They use various sources in the moderate level. They are the largest segment.
  Demographics: Average age is 44.5 years. Most are employed (87%) and highly paid ($26). The proportion of males, marriage rate, and education levels are average.
  Similar Group in Furse et al. (1984): Moderate Searcher (32%)
Segment (Size, %): S8 (189, 11.1%), High internal and low external searcher
  Search Pattern: They do high internal search but low external search.
  Demographics: Average age 48.8 years. Higher proportion of males (65%). More years of education (16.3 years), higher employment rate (84%), and paid ($31). They have purchased 3.16 cars in 10 years and 72% of them buy the same makers.
  Similar Group in Furse et al. (1984): Low Searcher (26%)
Segment (Size, %): S9 (266, 15.7%), Most experienced and loyal searcher
  Search Pattern: Their internal search is the highest and their external search is low. They do not use the Internet.
  Demographics: Average age 52.7 years. More are married (80%) but less employed (66%) and paid ($20.3). They are so loyal to auto makers that 80% of them buy the same brands. They are most experienced in purchasing cars (3.32 in 10 years).
  Similar Group in Furse et al. (1984): Low Searcher (26%)
We look at the graphical relationship between the search-based segments and automobile
brands using a correspondence analysis. The result is in Figure 2.3. In the correspondence
analysis, we chose two dimensions. The first dimension explains 69.2% of the original
information and the second explains 14.4%. From the perspective of the segments, the main
dimension (x-axis) appears related to Internet usage because the Internet using segments (S1, S4,
S5, S7, and S8) are located on the right side and the rest of the segments are located on the left
side. From the perspective of the brands, the main dimension appears related to the brand origin
because American brands are located together on the left side while foreign brands are on the
right side.
By combining the results of the search-based segments and their brand choices, we can
see the relationship between them. The most salient result is that the low and moderate search
segments (S2, S3, S7, S8, and S9) correspond to the American brands while the high search
segments (S1, S4, and S5) correspond to all the Japanese and EU brands. Segment S6, high
offline search segment, is close to Chrysler and Korean brands.
Figure 2.3. Correspondence between Search-based Segments and Brand Choices
[Chart: Correspondence between Segments and Brands. x-axis: American-Foreign; y-axis: Sales Volume; plotted points: segments S1 to S9 and the brand groups Chevy, GM Low, GM High, Ford, Chrysler, Toyota/Honda, Other Japanese, EU, and Korean.]
In addition to the correspondence analysis, we confirmed the relationship between segment membership and brand choices by using a multinomial logit model. We set up the multinomial logit model as follows.
$P(\mathrm{brand}_i = l) = \dfrac{\exp(x_i\beta_l)}{\sum_{l'=1}^{L} \exp(x_i\beta_{l'})}$,
where $i$ indexes consumers, $l$ indexes brands, and $x_i$ are the independent variables, including segment dummies, demographics, and product-related data. For brevity, we report only the main results of this analysis. The logit analysis results are, in general, similar to those of the correspondence analysis. Setting Chevy as the reference category, compared to S9, the segments S1, S4, and S5 are more likely to choose Toyota/Honda ($\beta$ = 2.33, 1.83, and 1.91), other Japanese brands ($\beta$ = 1.98 and 1.53 for S1 and S5), or EU brands ($\beta$ = 2.14, 1.68, and 1.72, respectively). S3 is more likely to choose Chrysler ($\beta$ = 0.85) or Korean brands ($\beta$ = 1.35) and S6 is more likely to choose Chrysler ($\beta$ = 1.17). However, there is no difference in the brand choices of S2, S7, and S8, compared to S9, as those segments belong to the low external search segments and are likely to choose American brands.
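The multinomial logit probabilities above can be computed as in the following sketch. The function name and the numbers are hypothetical; the reference brand is represented by a coefficient vector of zeros, which is the usual normalization and is assumed here.

```python
import math

def brand_choice_probabilities(x, betas):
    """Multinomial logit probabilities exp(x b_l) / sum over brands of exp(x b_l).

    x     : list of covariates for one consumer
    betas : one coefficient list per brand; the reference brand has all zeros
    """
    utilities = [sum(xj * bj for xj, bj in zip(x, b)) for b in betas]
    top = max(utilities)                         # subtract the max for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical example: one segment dummy, three brands, first brand as reference.
probs = brand_choice_probabilities([1.0], [[0.0], [1.0], [2.0]])
```

A positive coefficient on a segment dummy raises that brand's probability relative to the reference brand, which is how the reported estimates are read.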
The close relationship between search-based segments and their brand choices
demonstrates that it is important for automakers to choose proper communication media for their
customers. That is, American brands, whose customers are high internal and low external
searchers, might consider spending more on building consumer loyalty and satisfaction. In
contrast, foreign brands should provide more information to satisfy their consumers’ information
needs. As the foreign brands are strongly associated with Internet users, they should enhance
their Internet-based advertising and communications.
Search based segments and price-related outcomes. Finally, we look at price-related
outcomes for the different segments. To see the differences by segment, we run an ANCOVA, in
which the dependent variables are the price negotiation time with the dealer, discount amount,
discount rate (discount amount over the sticker price), and the final price satisfaction. The main
independent variable is the segment variable and the covariates are age, male indicator, marriage
indicator, employment indicator, education level in years, hourly wage, sticker price, and the
brands. Table 2.8 shows the F-test results which test for mean differences in the dependent
variables by segment.
We find that there is a difference (i.e., we can reject the null hypothesis of mean equality) in the price negotiation time with the dealer across segments (F=17.32, p-value<0.01). In general, the high external search segments (S4, S5, and S6) spend a longer time negotiating with the dealer (around 3 hours), while the low external search segments (S2, S8, and S9) and the moderate external search segments (S1, S3, and S7) spend a shorter time negotiating with the dealer (around 1 and 1.5 hours, respectively). Because the negotiation takes place at the dealer, high external searchers seem to spend more time with the dealer when they visit the dealer to shop. The discount amount also differs by segment (F=2.41, p-value=0.01). Overall, the high offline search segments (S4, S5, and S6) and the high internal search segments (S7, S8, and S9) get average discounts of about $3000, while the others receive average discounts of less than $2500. The results for the discount rate show similar differences (F=2.44, p-value=0.01). Segments S4 through S9 get about a 10.7~11.6% discount but the other segments receive about a 9.4~10.5% discount.
Interestingly, however, even though price negotiation times, discount amount, and
discount rate are different for different segments, the final price satisfaction, on average 5.4 out
of 7, is not different across the segments (F=0.84, p-value=0.56). Considering that every segment
ends up with a similar satisfaction level, the different search patterns are the outcomes of their
best search effort to maximize their utility given their current knowledge, productivity in
S8 High internal and low external searcher 1.07 (1.04) 3192 (2758) 0.114 (0.091) 5.55 (1.21)
S9 Most experienced and loyal searcher 1.25 (3.36) 2950 (2582) 0.108 (0.092) 5.59 (1.22)
F statistic 17.32 2.41 2.44 0.84
p-value 0.00 0.01 0.01 0.57
CONCLUSION
Summary
The objectives of this paper were to find out the relationships between search sources in a
comprehensive manner, segment the buyers based on their search patterns, and examine the
search results for each segment. We consider the entire range of information sources that buyers
consult in automobile purchases including internal search, offline search sources, Internet
sources, and spouse search. By analyzing the data on automobile purchases in 2001, 2003, and
2005, we find some interesting results that extend the results from the previous studies.
First, we find that, in general, search preference for each information source is positively
associated with the others. The generally positive interrelationship occurs within the offline and
the Internet sources and across the offline and the Internet sources, implying that consumers
complementarily use all information sources. However, search preference for the dealer sources
and internal search reduce search preference for all information sources. It is notable that search
preference for dealer sources significantly reduces search preference for the Internet sources.
This finding extends previous results that looked at the effects of the Internet on offline sources
but not the reverse effects.
Second, we identify nine segments based on consumers’ search patterns. The segments
are profiled based on the extent of their internal search, Internet and offline search time, and
spouse search time. Several of the segments correspond to those of Furse et al. (1984) obtained
prior to the Internet. At the segment level, we find the inverted U-shaped relationship between
internal search and external search. That is, low and high internal searchers are low external
searchers while moderate internal searchers are high on external search. We also find that two
segments do not conform to the inverted U-shaped relationship; one has low internal search but
moderate external search and the other has moderate internal search but extremely high external
search. Though the latter segment is small in size, the presence of two such segments suggests one
reason why the inverted U-shaped relationship may be hard to find at the aggregate level.
Finally, we examine the outcomes of search, focusing on brand choice. The results show
that segments with low external search are associated with purchase of American brands while
segments with high external search correspond to Japanese and EU brands. These results are
notable in that the relationship between search and brand choices is identified for the first time.
In addition, we find that though the price-related outcomes are different for different segments,
final price satisfaction levels are similar across segments.
In conclusion, our study extends the search literature by providing some new insights
including the effect of offline search on Internet search, the identification of search-based
segments, the relationship of internal and external search at the segment level, and the search
segments’ brand choices. We discuss how automakers might utilize these results next.
Managerial Implication
Our results have some practical implications for dealers and automakers. First, the dealer
is still a powerful and efficient information source for consumers in the Internet era. The more
time consumers prefer to spend with the dealer, the less time they prefer to spend with other
information sources. This result qualifies the results in previous studies about the role of the
Internet in reducing the search time with the dealer. Automakers should carefully select and train
dealers, maintaining a good relationship with them not only for the final sales but also for
providing information to consumers.
Second, automakers can identify their positioning and their competitors’ positioning in
terms of consumers’ search patterns. The results show that American brands and Japanese brands
are close to the other brands of their countries. EU brands are close to Japanese brands, maybe
being perceived as foreign country brands, while Korean brands are positioned in a distinct
location. That is, competition occurs between brands of the same country group. Thus,
automakers could focus on their differentiation from other brands from the same country.
Third, automakers can develop efficient communication strategies based on the
relationship of the search segments and their brand choices. For example, because customers of
American brands are low external searchers, American brands might implement advertising
campaigns that build brand image and loyalty. As Japanese and EU brands are associated with
higher external and Internet search, they should enhance information delivery through their own
websites from which consumers can acquire their information and substitute other offline
information sources. Korean brands should provide more information to convince the high
search consumers. However, they have to work to reduce the distance in their position from other
foreign brands to be perceived as one of them.
Limitation and Future Research
If researchers have more information, they can understand consumers’ search patterns
better. First, consideration sets can affect the search patterns. If consumers are considering those
brands with which they are familiar and have experience, they are less likely to conduct long
searches because of high internal search. Yet, if they are considering new automakers, they
would have to search more to obtain the necessary information. Therefore, future studies should
consider ways to include the effect of consideration sets. Second, the sequence of search can help
determine if some information sources initiate or stop further search. This might give some
insights into which information sources are more important in different stages of search. Third,
our dataset did not cover the 2008-2009 period, which saw turmoil and bankruptcies in the
automobile industry, changes in product lines, elimination of dealers, government intervention,
and the recession. It would be interesting to see whether these have altered consumers’ search
patterns in automobile purchases.
APPENDIX
MCMC Algorithms for Model Estimation
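(1) Estimating the parameters of endogenous variables (\(\Gamma\))
The matrix \(\Gamma\) is estimated using a random walk chain Metropolis-Hastings method. As the diagonal elements of \(\Gamma\) are one, we need to estimate the off-diagonal elements only. Let \(\tilde{\gamma}\) denote the vector consisting of the off-diagonal elements of \(\Gamma\), where the dimension of \(\tilde{\gamma}\) is \(\tilde{K} \times 1\) with \(\tilde{K} = J^2 - J\). In this study \(J\) is 8. We use a diffuse normal prior, i.e., \(\tilde{\gamma} \sim N(\bar{\gamma}, \bar{\Sigma}_{\tilde{\gamma}})\), where \(\bar{\gamma} = 0_{\tilde{K}}\) and \(\bar{\Sigma}_{\tilde{\gamma}} = 10^{4} I_{\tilde{K}}\).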
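We generate candidate draws according to \(\tilde{\gamma}^{*} = \tilde{\gamma}^{(s-1)} + z\), where \(z \sim N(0, \Sigma_z)\) and \(s\) denotes the s-th iteration. To find the proper \(\Sigma_z\), we ran the MCMC algorithms twice, following Koop (2003). In the first run, we assume \(\Sigma_z = \tilde{c}_1 I_{\tilde{K}}\) and randomly assign \(\tilde{c}_1 = 10^{-5}\) (80%) and \(\tilde{c}_1 = 5 \times 10^{-5}\) (20%). After we get the variance of \(\tilde{\gamma}\), denoted as \(\hat{\Sigma}_z\), in the second run, we reassume that \(\Sigma_z = \tilde{c}_2 \hat{\Sigma}_z\) and randomly assign \(\tilde{c}_2 = 10^{-2}\) (80%) and \(\tilde{c}_2 = 5 \times 10^{-2}\) (20%).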
As there are many parameters in \(\tilde{\gamma}\), we draw and accept the new candidates equation by equation. For example, let us denote \(\tilde{\gamma}_j\) as the coefficient vector of the endogenous variables in the j-th equation. To determine \(\tilde{\gamma}_j^{s}\), we draw \(\tilde{\gamma}_j^{*} = \tilde{\gamma}_j^{(s-1)} + z_j\) given \(\tilde{\gamma}_{-j}^{s}\), where the subscript \(j\) means the related components in the j-th equation. By comparing the posterior probabilities with \(\tilde{\gamma}_j^{(s-1)}\) and \(\tilde{\gamma}_j^{*}\), we decide which draw to use at the s-th iteration and repeat the process for j=1 to J. Estimating \(\tilde{\gamma}\) by the split equations is helpful for getting the proper acceptance rate.
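The equation-by-equation random walk step can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimation code used in the study: `log_posterior` is a hypothetical stand-in target, the block layout and step variance are invented for the example, and the 80%/20% mixing of two step variances mirrors the scheme described above only in spirit.

```python
import math
import random


def log_posterior(theta):
    # Hypothetical stand-in target: independent N(1, 0.5^2) on each coefficient.
    return sum(-0.5 * ((t - 1.0) / 0.5) ** 2 for t in theta)


def blockwise_rw_metropolis(log_post, blocks, n_iter, step_var, seed=0):
    """Random walk Metropolis-Hastings that updates one block (equation) at a time."""
    rng = random.Random(seed)
    dim = sum(len(block) for block in blocks)
    theta = [0.0] * dim
    current_lp = log_post(theta)
    draws, accepted = [], 0
    for _ in range(n_iter):
        for block in blocks:
            # Mix two step variances: the base value 80% of the time, 5x otherwise.
            var = step_var if rng.random() < 0.8 else 5.0 * step_var
            candidate = list(theta)
            for k in block:
                candidate[k] = theta[k] + rng.gauss(0.0, math.sqrt(var))
            cand_lp = log_post(candidate)
            # Symmetric proposal: accept with probability min(1, posterior ratio).
            if math.log(1.0 - rng.random()) < cand_lp - current_lp:
                theta, current_lp = candidate, cand_lp
                accepted += 1
        draws.append(list(theta))
    return draws, accepted / (n_iter * len(blocks))


# Two hypothetical "equations" with two coefficients each.
blocks = [[0, 1], [2, 3]]
draws, acc_rate = blockwise_rw_metropolis(log_posterior, blocks, 5000, 0.25)
kept = draws[1000:]  # discard burn-in draws
post_mean = [sum(d[k] for d in kept) / len(kept) for k in range(4)]
print(acc_rate, post_mean)
```

Updating a small block at a time keeps each proposal low-dimensional, which is one way to read the remark that splitting by equation helps achieve a reasonable acceptance rate.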
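(2) Estimating the parameters of exogenous variables (\(\beta\)) and covariance matrix (\(\Sigma\))
We estimate \(\beta\) and \(\Sigma\) by using a Gibbs sampler with standard Normal-Wishart priors. Specifically, we use a normal prior \(\beta \sim N(\bar{\beta}, \bar{\Sigma}_{\beta})\), where \(\bar{\beta} = 0_K\) and \(\bar{\Sigma}_{\beta} = 10^{4} I_K\), and a Wishart prior \(\Sigma^{-1} \sim W(v, V)\), where \(v = J + 3\) and \(V = (1/v) I_J\).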
(3) Imputing \(y^{*}\)
After getting all parameters (\(\Gamma\), \(\beta\) and \(\Sigma\)), we can impute \(y_i^{*}\). If all elements in \(y_i\) are positive, there is no need to impute. If all elements in \(y_i\) are zero, we draw \(y_i^{*}\) from the multivariate truncated normal distribution, \(MVTN_{(-\infty, 0)}(\Gamma^{-1} X_i \beta,\ \Gamma^{-1} \Sigma \Gamma^{-1\prime})\). If some elements of \(y_i\) are zero, we draw the latent values from a conditional multivariate truncated normal distribution.
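The imputation step can be illustrated with a simplified, element-wise sketch. It assumes that the conditional mean and standard deviation of each censored element have already been computed and are passed in by the caller (the names `cond_means` and `cond_sds` are hypothetical), so it shows only the truncation logic rather than the full conditional multivariate draw.

```python
import random
from statistics import NormalDist


def draw_truncated_below_zero(mean, sd, rng):
    """Draw from N(mean, sd^2) restricted to (-inf, 0] by inverse-CDF sampling."""
    std = NormalDist()
    upper_prob = std.cdf((0.0 - mean) / sd)      # probability that the latent value is <= 0
    u = rng.uniform(0.0, upper_prob)
    u = min(max(u, 1e-12), 1.0 - 1e-12)          # keep u strictly inside (0, 1) for inv_cdf
    return min(mean + sd * std.inv_cdf(u), 0.0)  # guard against clamping overshoot


def impute_latent(y_observed, cond_means, cond_sds, rng):
    """Keep positive observations; replace zeros with non-positive latent draws."""
    latent = []
    for y, m, s in zip(y_observed, cond_means, cond_sds):
        latent.append(y if y > 0 else draw_truncated_below_zero(m, s, rng))
    return latent


rng = random.Random(7)
y_obs = [2.5, 0.0, 0.0, 1.0]        # hypothetical search times; zeros are censored
cond_means = [2.0, 0.4, -0.3, 1.2]  # hypothetical conditional means
cond_sds = [1.0, 1.0, 0.8, 1.0]     # hypothetical conditional standard deviations
y_star = impute_latent(y_obs, cond_means, cond_sds, rng)
print(y_star)
```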
REFERENCES
Amemiya, Takeshi (1974), “Multivariate Regression and Simultaneous Equation Models when the Dependent Variables Are Truncated Normal,” Econometrica, 42 (6), 999-1012.
________, Makoto Saito, and Keiko Shimono (1993), “A Study of Household Investment Patterns in Japan: An Application of Generalized Tobit Model,” The Economic Studies Quarterly, 44 (1), 13-28.
Bettman, James R. (1979), Information Processing Theory of Consumer Choice. Reading, MA: Addison-Wesley.
Furse, David H., Girish N. Punj, and David W. Stewart (1984), “Typologies of Individual Search Strategies Among Purchasers of New Automobiles,” Journal of Consumer Research, 10 (March), 417-31.
Guo, Chiquan (2001), “A Review on Consumer External Search: Amount and Determinants,” Journal of Business and Psychology, 15 (3), 505-19.
Hauser, John, Glen Urban, and Bruce Weinberg (1993), “How Consumers Allocate Their Time When Searching for Information,” Journal of Marketing Research, 30 (November), 452-66.
Hoffman, Donna L. and George R. Franke (1986), “Correspondence Analysis: Graphical Representation of Categorical Data in Marketing Research,” Journal of Marketing Research, 23 (August), 213-27.
Jang, Sungha, Ashutosh Prasad, and Brian T. Ratchford (2010), “Consumer Spending Patterns across Firms and Categories: Application to the Size and Share of Wallet,” working paper, University of Texas at Dallas, TX.
John, Deborah Roedder, Carol A. Scott, and James R. Bettman (1986), “Sampling Data for Covariation Assessment: The Effect of Prior Beliefs on Search Patterns,” Journal of Consumer Research, 13 (June), 38-47.
Klein, Lisa R. and Gary T. Ford (2003), “Consumer Search for Information in the Digital Age: An Empirical Study of Prepurchase Search for Automobiles,” Journal of Interactive Marketing, 17 (3), 29-49.
Koop, Gary (2003), Bayesian Econometrics. Hoboken, NJ: Wiley.
Moorthy, K. Sridhar, Brian T. Ratchford, and Debabrata Talukdar (1997), “Consumer Information Search Revisited: Theory and Empirical Analysis,” Journal of Consumer Research, 23 (March), 263-77.
Punj, Girish N. and Richard Staelin (1983), “A Model of Consumer Information Search Behavior for New Automobiles,” Journal of Consumer Research, 9 (March), 366-80.
Ransom, Michael R. (1987), “A Comment on Consumer Demand Systems with Binding Non-negativity Constraints,” Journal of Econometrics, 34, 355-59.
Rao, Akshay and Wanda Sieben (1992), “The Effect of Prior Knowledge on Price Acceptability and the Type of Information Examined,” Journal of Consumer Research, 19 (September), 256-270.
Ratchford, Brian T., Myung-Soo Lee, and Debabrata Talukdar (2003), “The Impact of the Internet on Information Search for Automobiles,” Journal of Marketing Research, 40 (May), 193-209.
Ratchford, Brian T., Debabrata Talukdar, and Myung-Soo Lee (2007), “The Impact of the Internet on Consumers’ Use of Information Sources for Automobiles: A Re-Inquiry,” Journal of Consumer Research, 34 (June), 111-19.
Russo, J. Edward and France LeClerc (1994), “An Eye-Fixation Analysis of Choice Processes for Consumer Nondurables,” Journal of Consumer Research, 21 (September), 274-90.
Srinivasan, Narasimhan and Brian T. Ratchford (1991), “An Empirical Test of a Model of External Search for Automobiles,” Journal of Consumer Research, 18 (2), 233-42.
Viswanathan, Siva, Jason Kuruzovich, Sanjay Gosain, and Ritu Agarwal (2007), “Online Infomediaries and Price Discrimination: Evidence from the Automotive Retailing Sector,” Journal of Marketing, 71 (July), 89-107.
Yang, Sha, Vishal Narayan, and Henry Assael (2006), “Estimating the Interdependence of Television Program Viewership Between Spouses: A Bayesian Simultaneous Equation Model,” Marketing Science, 25 (4), 336-49.
Zettelmeyer, Florian, Fiona Scott Morton, and Jorge Silva-Risso (2006), “How the Internet Lowers Prices: Evidence from Matched Survey and Automobile Transaction Data,” Journal of Marketing Research, 43 (May), 168-81.
CHAPTER 3
HOW CONSUMERS USE PRODUCT REVIEWS
IN THE PURCHASE DECISION PROCESS
Sungha Jang
School of Management, Department of Marketing, SM32
The University of Texas at Dallas
800 West Campbell Road
Richardson, Texas 75080-3021
ABSTRACT
Several studies have found a positive effect of product reviews on sales at the aggregate level.
This paper, however, uses individual level data to examine the influence of product reviews in
different stages of the consumer’s purchase decision process. Specifically, a two-stage model
consisting of consideration set formation and choice is posited, where information from product
reviews can be incorporated at each stage. The model is estimated using an online panel study
about hotel choice. We find that: (1) Consumers use product reviews more in the consideration
set stage and less in the choice stage; (2) Bayesian updating of prior perceived quality explains
better how consumers use product reviews compared to two competing updating methods; (3)
The monetary value of a unit increase in the mean of product reviews can be computed – in the
case of the hotel study we find that it is equivalent to a price decrease of $57. Our results suggest
that managers should make product reviews available from the beginning of the search process,
show all components of product reviews (i.e., mean, number, and variance), and focus on
satisfying customers and encouraging them to write reviews.
Parameters in bold are significant at the 95% level.
The product attributes also affect consideration set membership. The coefficient of Bayesian updating perceived quality is positive (0.054). Therefore, hotels with high Bayesian updating perceived quality are more likely to be included in the consideration set. The negative coefficient of the ratio of price to willingness to pay (-0.104) shows that hotels with a higher ratio of price to willingness to pay are less likely to be included. In addition, as the coefficient of hotel brand awareness is positive (0.252), well-known hotels are more likely to be included in the consideration set. However, the coefficient of hotel experience at other places is not significant, possibly because most of the respondents had not stayed at the hotels in the survey. In the case of Holiday Inn, around 50% of respondents have stayed at one of its chain hotels, but it seems that Holiday Inn, a relatively low quality hotel, is less attractive as a resort hotel to respondents.
In the choice stage, the hotel-specific intercepts are significant only for some hotels. The
positive coefficient of Fiesta Americana (0.938) means that if this hotel is included in the
consideration set, it is likely to be finally chosen. In contrast, the negative coefficients of
Imperial Las Perlas and Holiday Inn (-0.826 and -1.184, respectively) mean that even if
they are included in the consideration set, those hotels are much less likely to be finally chosen.
Other than those three hotels, there are no hotel-specific effects in the choice stage. This could be
because consumers have already considered the hotel-specific effects at the consideration set
stage.
The effects of other product attributes are different in the two stages. In the choice stage,
unlike in the consideration set stage, the Bayesian updating perceived quality is not significant.
That is, after consumers consider alternatives with respect to quality in the consideration set
stage, they do not consider quality any more. Possibly, they consider hotels with similar quality
level. The coefficients of the ratio of price to willingness to pay and awareness have the same
signs as those in the consideration set stage (-0.223 and 0.423, respectively). Thus, among
hotels in the consideration set, hotels with a higher ratio of price to willingness to pay or less
known hotels are less likely to be chosen. Finally, the experience variable is not significant in the
choice stage either.
In summary, hotel specific characteristics affect both consideration set and choice. The
significance of Bayesian updating perceived quality means that product reviews play an
important role in the consideration set stage. But it is notable that consumers do not consider
quality again in the choice stage. Overall, a lower price and higher awareness increase the likelihood of
being included in the consideration set and of being chosen.
Values of Bayesian Updating Perceived Quality and Product Reviews
We compute the monetary value of a unit increase in Bayesian updating perceived quality
and product reviews. Our approach is to compute the unit changes of Bayesian updating
perceived quality and price, which induce the same change of the consideration set utility. We
use the coefficients of Bayesian updating perceived quality and price to willingness to pay. As
Bayesian updating perceived quality consists of prior perceived quality and product reviews, we
can finally derive the value of a unit increase in product reviews by the chain rule formula.
The coefficient of Bayesian updating perceived quality, \(\partial z_{ij}^{*} / \partial W_{ij1} = 0.054\), means that a unit increase in Bayesian updating perceived quality increases the consideration set utility by 0.054. The coefficient of the ratio of price to willingness to pay, \(\partial z_{ij}^{*} / \partial p_{ij} = -0.104\), means that a unit decrease in the ratio increases the utility by 0.104. Therefore, a one unit increase in Bayesian updating perceived quality brings as much utility change as a 0.52 (= 0.054/0.104) unit decrease in the ratio of price to willingness to pay. That is, for an individual \(i\) across all products,

\[ \frac{\partial z_{ij}^{*} / \partial W_{ij1}}{\partial z_{ij}^{*} / \partial p_{ij}} = -0.52, \]

where \(z_{ij}^{*}\) is the utility of including product \(j\) in the consideration set, \(W_{ij1}\) is the expectation of Bayesian updating perceived quality, and \(p_{ij}\) is the ratio of price to willingness to pay. As \(p_{ij}\) consists of price \(p_j\) and willingness to pay \(WTP_i\), it is the case that

\[ dp_{ij} = d\left(\frac{p_j}{WTP_i}\right) = \frac{1}{WTP_i}\, dp_j. \]

Thus, a 0.52 unit decrease in \(p_{ij}\) is equivalent to a \(0.52\, WTP_i\) unit decrease in price, as

\[ dp_{ij} = -0.52 \;\Rightarrow\; dp_j = -0.52\, WTP_i, \]

where \(WTP_i\) is the willingness to pay of individual \(i\) and is constant across hotels. In our dataset, the average of \(0.52\, WTP_i\) over all respondents is $70.6, which indicates that a one unit increase in Bayesian updating perceived quality is worth $70.6, in that both a one unit increase in Bayesian updating perceived quality and a price decrease of $70.6 result in the same utility change.
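The arithmetic behind this conversion can be traced in a few lines. The sketch below uses the two coefficients reported in the text; the willingness-to-pay values are hypothetical and serve only to show how the ratio scales into dollars (the reported $70.6 average implies an average willingness to pay of roughly $136, i.e., 70.6/0.52).

```python
# Coefficients reported in the text for the consideration set stage.
COEF_QUALITY = 0.054       # Bayesian updating perceived quality
COEF_PRICE_RATIO = -0.104  # ratio of price to willingness to pay


def equivalent_ratio_decrease():
    """Decrease in the price/WTP ratio that matches a one unit quality increase."""
    return COEF_QUALITY / abs(COEF_PRICE_RATIO)


def monetary_value_of_quality(wtp):
    """Dollar price decrease equivalent to a one unit quality increase for a given WTP."""
    return equivalent_ratio_decrease() * wtp


# Hypothetical willingness-to-pay values (not taken from the survey data).
hypothetical_wtp = [90.0, 120.0, 150.0, 180.0]
values = [monetary_value_of_quality(w) for w in hypothetical_wtp]
average_value = sum(values) / len(values)
print(round(equivalent_ratio_decrease(), 2), round(average_value, 1))
```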
Next, we calculate the monetary values of each component of product reviews using the monetary value of Bayesian updating perceived quality. Based on Equation 6, we set up the expectation of Bayesian updating perceived quality as

\[ W_{ij1} = \frac{(1/\sigma_{ij0}^{2})\, W_{ij0} + (n_j / s_j^{2})\, r_j}{1/\sigma_{ij0}^{2} + n_j / s_j^{2}} \]

after replacing \(q_{ij}^{2}\) by \(s_j^{2}\), which summarizes the variance of product reviews. By multiplying the monetary value of Bayesian updating perceived quality and the derivatives of \(W_{ij1}\) with respect to each component of product reviews, we can calculate the monetary values of product reviews as follows.
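Component of Product Reviews and Monetary Value (= Necessary Price Change × Derivative)

Mean \((r_j)\):
\[ \frac{\partial z_{ij}^{*}}{\partial r_j} = \frac{\partial z_{ij}^{*}}{\partial W_{ij1}} \frac{\partial W_{ij1}}{\partial r_j} \;\Rightarrow\; (0.52\, WTP_i)\, \frac{n_j / s_j^{2}}{1/\sigma_{ij0}^{2} + n_j / s_j^{2}} \]

Number \((n_j)\):
\[ \frac{\partial z_{ij}^{*}}{\partial n_j} = \frac{\partial z_{ij}^{*}}{\partial W_{ij1}} \frac{\partial W_{ij1}}{\partial n_j} \;\Rightarrow\; (0.52\, WTP_i)\, \frac{(r_j - W_{ij0})(1/\sigma_{ij0}^{2})(1/s_j^{2})}{(1/\sigma_{ij0}^{2} + n_j / s_j^{2})^{2}} \]

Variance \((s_j^{2})\):
\[ \frac{\partial z_{ij}^{*}}{\partial s_j^{2}} = \frac{\partial z_{ij}^{*}}{\partial W_{ij1}} \frac{\partial W_{ij1}}{\partial s_j^{2}} \;\Rightarrow\; -(0.52\, WTP_i)\, \frac{(r_j - W_{ij0})(1/\sigma_{ij0}^{2})(n_j / s_j^{4})}{(1/\sigma_{ij0}^{2} + n_j / s_j^{2})^{2}} \]

Note that even though the monetary value of a one unit change in the posterior \((0.52\, WTP_i)\) is product-invariant, the monetary values of product reviews are product-variant, as the prior perceived quality and product reviews are product-variant.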
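The three derivatives can be checked numerically. The sketch below assumes the precision-weighted form of the posterior mean (prior precision \(1/\sigma_{ij0}^{2}\), review precision \(n_j / s_j^{2}\)); the function names and the input values are hypothetical and chosen only for illustration.

```python
def posterior_mean(w0, var0, r, n, s2):
    """Precision-weighted average of prior quality w0 and review mean r."""
    prior_prec = 1.0 / var0
    data_prec = n / s2
    return (prior_prec * w0 + data_prec * r) / (prior_prec + data_prec)


def review_derivatives(w0, var0, r, n, s2):
    """Analytic derivatives of the posterior mean w.r.t. review mean, number, variance."""
    prior_prec = 1.0 / var0
    total = prior_prec + n / s2
    d_mean = (n / s2) / total
    d_number = (r - w0) * prior_prec / s2 / total ** 2
    d_variance = -(r - w0) * prior_prec * n / s2 ** 2 / total ** 2
    return d_mean, d_number, d_variance


def monetary_values(wtp, derivatives, ratio=0.52):
    """Scale each derivative by the necessary price change, ratio * WTP."""
    return tuple(ratio * wtp * d for d in derivatives)


# Hypothetical inputs: prior quality 3.5 (variance 1.0), 50 reviews with mean 4.2, variance 0.8.
args = dict(w0=3.5, var0=1.0, r=4.2, n=50.0, s2=0.8)
analytic = review_derivatives(**args)

# Central finite differences as an independent check on the formulas.
h = 1e-5
numeric = tuple(
    (posterior_mean(**{**args, key: args[key] + h})
     - posterior_mean(**{**args, key: args[key] - h})) / (2 * h)
    for key in ("r", "n", "s2")
)
print(analytic, numeric, monetary_values(136.0, analytic))
```

With the review mean above prior quality, the derivative for the number of reviews is positive and the derivative for the variance is negative, matching the opposite signs of those two columns in Table 3.5.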
Table 3.5 shows the monetary values of a unit increase in the mean, number, and variance
of product reviews by hotels.
Table 3.5. Monetary Value of a Unit Increase in Product Reviews Components

| Hotel | Mean \((r_j)\) ($) | Number \((n_j)\) ($) | Variance \((s_j^{2})\) ($) | \((r_j - W_{ij0})\) |
| --- | --- | --- | --- | --- |
| H1. Royal Solaris | 67.6 | -0.003 | 0.60 | -0.030 |
| H2. Dreams Resort | 69.7 | 0.004 | -1.10 | 0.931 |
| H3. GR Solaris | 68.1 | -0.002 | 0.45 | -0.058 |
| H4. InterContinental | 56.2 | 0.160 | -4.32 | 0.605 |
| H5. Riu Palace Las Americas | 69.1 | 0.012 | -2.16 | 0.983 |
| H6. Fiesta Americana | 68.2 | 0.058 | -3.99 | 0.941 |
| H7. JW Marriott | 45.8 | -0.093 | 2.71 | -0.261 |
| H8. Hotel Sotavento | 40.6 | -0.310 | 3.41 | -0.302 |
| H9. Imperial Las Perlas | 39.6 | 0.343 | -4.12 | 0.728 |
| H10. Holiday Inn | 43.8 | -0.667 | 8.68 | -0.589 |
Some observations are that (1) a unit increase in the mean of product reviews is the most valuable, a unit increase in the variance is the second most valuable, and a unit increase in the number of product reviews is not very valuable, and (2) the values of product reviews vary across hotels.
Regarding the value of a unit increase in the mean of product reviews, the average is $57, with a maximum of $69.7 for Dreams Resort (H2) and a minimum of $39.6 for Imperial Las Perlas (H9). The average value implies that a unit increase in the mean of product reviews brings as much of an increase in consideration set utility as a price decrease of $57. Therefore, a higher mean of product reviews is an alternative to an undesirable price decrease as a way to be included in the consideration set.
Interestingly, however, the value of a unit change in the number of product reviews is not high and its sign is inconsistent across hotels. The maximum value is $0.34 per review for Imperial Las Perlas (H9) and the minimum value is -$0.67 for Holiday Inn (H10). The different signs result from the difference between the mean of product reviews and prior perceived quality \((r_j - W_{ij0})\). If the mean of product reviews is higher than prior perceived quality (e.g., H9), consumers may interpret a larger number of product reviews positively. However, if the mean of product reviews is lower than prior perceived quality (e.g., H10), consumers may have doubts about the quality of those hotels, and a larger number of reviews may confirm those doubts.
A unit increase in the variance of product reviews has a moderate value that differs widely across hotels. For example, its maximum value is $8.68 for Holiday Inn (H10) and its minimum value is -$4.32 for InterContinental (H4). Again, the different signs result from the difference between the mean of product reviews and prior perceived quality \((r_j - W_{ij0})\), but the interpretation is not the same. If the mean of product reviews is higher than prior perceived quality (e.g., H4),
high variance possibly makes consumers think that even though the overall quality is high, there
are some consumers who experienced low quality just like their low prior perceived quality. So,
the value of high variance is negative. However, if the mean of product reviews is lower than
prior perceived quality (e.g., H10), consumers may regard large variance as consumer
heterogeneity and positively interpret that there are consumers who experience high quality just
like their high prior perceived quality.
In summary, using the estimation results for Bayesian updating perceived quality and price, we find that the value of a unit increase in Bayesian updating perceived quality is $70.6 and that the monetary values of product reviews vary with the hotel and with the difference between the mean of product reviews and prior perceived quality. In particular, the value of a unit increase in the mean of product reviews is around $57 on average.
CONCLUSION
Summary
The objective of this paper is to study in which stages of the purchase decision process
consumers use product reviews and how they incorporate product reviews with their prior
perceived quality. We also evaluate how valuable product reviews are in monetary terms. We
used four types of perceived quality (viz. prior perceived quality, product reviews, average of
prior perceived quality and product reviews, and Bayesian updating perceived quality) in a two-
stage choice model in order to understand consumers’ decision processes when product reviews
are available.
The best fitting model (Model 10) shows that consumers use Bayesian updating
perceived quality in the consideration set stage. This means that consumers use product reviews
from the consideration set stage and the update method is consistent with the Bayesian manner,
by which consumers update prior perceived quality using the information components of product
reviews. These components are the mean of product reviews, their number and variance.
The estimation results in the two-stage choice model are summarized as follows: In the
consideration set stage, intrinsic hotel effects are high for well-known international hotel brands
such as Marriott but low for local hotels such as Hotel Sotavento. Hotels with high Bayesian
updating perceived quality are more likely to be included in the consideration set while hotels
with high price are less likely to be included. It is also shown that awareness is important for
hotels to be included.
In the choice stage, the results show that intrinsic hotel effects and Bayesian updating
perceived quality become much less important. Rather, price and awareness play a significant
role. Consumers consider hotels with a similar quality level in the consideration set stage but
once they construct consideration sets consisting of the similar quality hotels, they put more
weight on prices and awareness.
Finally, we compute the monetary values of the components of product reviews. We find
that a unit increase in the mean of product reviews is worth $57 on average. That is, by
improving the mean of product reviews, hotels are more likely to be included at the same price or
do not need to reduce prices to be considered more. Our findings also show that the number of
product reviews is less important, while the variance of product reviews can have positive or
negative monetary value depending on the differences between the mean of product reviews and
prior perceived quality.
Managerial Implication
There are several managerial implications of our study for retail managers who present
product reviews of different manufacturers’ products or manufacturers themselves. First, the
result that consumers use product reviews in the consideration set stage but less so in the choice
stage provides a guide for how to display product reviews. Since product reviews are important
from the consideration set stage, managers may need to give consumers easy access to product
reviews from the beginning of the search. The methods would include showing product reviews
in the list of first search results, or allowing consumers to sort the search results by the
components of product reviews. Then, consumers could actively use product reviews from the
consideration set. The managerial implication to manufacturers is that they need to have their
product quality good enough to be included in the consideration set because once consumers
construct the consideration set, quality is not a choice criterion any more but price still is.
Therefore, as shown in our results of the choice model, they should be aware of more price
competition between manufacturers within the similar quality level.
Second, consumers’ Bayesian updating shows that retailers and manufacturers need to be
concerned about all components of product reviews (i.e., the mean, number, and variance) as all
of components are used to update prior perceived quality. Particularly, it is recommended for
retailers to provide variance information, perhaps by using a histogram, as well as the mean and
the number which are commonly presented. Manufacturers should note that a high mean of
product reviews is much more important for determining Bayesian updating perceived quality
and eventually consideration set formation than a larger number of product reviews. Thus, it is
beneficial for manufacturers to provide encouragement to consumers who have positive
experiences with their products in order to have them write good product reviews and to handle
grievances of unhappy consumers proactively. In other words, manufacturers may need to
concentrate on motivating satisfied consumers more than increasing the number of product
reviews.
Third, regardless of the strong effects of product reviews, it is important to manage
consumers’ prior perceived quality and awareness at all times. Prior perceived quality is directly
related to Bayesian updating perceived quality and indirectly mediates the effects of the number
and variance of product reviews on Bayesian updating perceived quality. Therefore, if
manufacturers constantly maintain high prior perceived quality by brand positioning or
advertising, they may be able to negate the effects of bad product reviews, which are sometimes
inevitable.
Limitation and Future Research
A limitation of the research is that besides the numerical summary of product reviews,
consumers also get product information from review passages and on some sites, the percentage
of consumers who recommend a review as being helpful or unhelpful. Furthermore, consumers
can deliberately search for some positive phrases for including alternatives quickly or negative
phrases for eliminating alternatives. Therefore, it would give new insights to quantify descriptive
passages and analyze them. In further research, researchers can utilize product reviews on
subcategories. From the hotel example, consumers may also refer to detailed evaluation on hotel
service, gym or pool, hotel condition, room cleanliness, or room comfort. As consumers consult
information on different subcategories depending on products or purchase situations (e.g.,
vacation, business, or family trip), models which consider detailed information would be useful.
135
REFERENCES
Allenby, Greg M. and James L. Ginter (1995), “The Effects of In-store Displays and Feature Advertising on Consideration Set,” International Journal of Research in Marketing, 12 (May), 67-80.
Andrews Rick L., T.C. Srinivasan (1995), “Studying Consideration Effects in Empirical Choice
Models Using Scanner Panel Data,” Journal of Marketing Research, XXXII February, 30-41.
Chevalier, Judith A., Dina Mayzlin (2006), “The Effect of Word of Mouth on Sales: Online
Book Reviews,” Journal of Marketing Research, 43 (August), 345-354. Chiang, Jeongwen, Siddhartha Chib, Chakravarthi Narasimhan (1999), “Markov chain Monte
Carlo and models of consideration set and parameter heterogeneity,” Journal of Econometrics 89 223-248.
Clemons, Eric K., Guodong Gordon Gao, Lorin M. Hitt (2006), “When Online Reviews Meet
Hyperdifferentiation: A Study of the Craft Beer Industry,” Journal of Management Information Systems, 23 (2), 149-171.
Edwards, Yancy D., Greg M. Allenby (2003), “Multivariate Analysis of Multiple Response
Data,” Journal of Marketing Research, 40 (August), 321-334. Erdem, Tülin and Michael P. Keane (1996), “Decision-Making under Uncertainty: Capturing
Gensch, Dennis H. (1987), “A Two Stage Disaggregate Attribute Choice Model,” Marketing
Science, 6 (Summer), 223-31. Gilbride, Timothy J., Greg M. Allenby (2004), “A Choice Model with Conjunctive, Disjunctive,
and Compensatory Screening Rules,” Marketing Science, 23, 391-406. Hajivassiliou, V., D. McFadden, P. Rudd. (1996). “Simulation of multivariate normal rectangle
probabilities and their derivatives,” Journal of Econometrics, 72, 85-134. Keane, M. (1994). “A computationally practical simulation estimator for panel data,”
Econometrica, 62, 95-116.
136
Koop, Gary (2003), Bayesian Econometrics. Hoboken, NJ: Wiley.
Liu, Yong (2006), “Word of Mouth for Movies: Its Dynamics and Impact on Box Office Revenue,” Journal of Marketing, 70 (3), 74-89.
Mehta, Nitin, Surendra Rajiv, Kannan Srinivasan (2003), “Price Uncertainty and Consumer Search: A Structural Model of Consideration Set Formation,” Marketing Science, 22 (1), 58-84.
Newton, M. and A. Raftery (1994), “Approximate Bayesian inference by the weighted likelihood bootstrap,” Journal of the Royal Statistical Society, Series B, 56, 3-48.
Nierop, Erjen Van, Bart Bronnenberg, Richard Paap, Michel Wedel, Philip Hans Franses (2010), “Retrieving Unobserved Consideration Sets from Household Panel Data,” Journal of Marketing Research, 47 (February), 63-74.
Roberts, John H., James M. Lattin (1997), “Consideration: Review of Research and Prospects for Future Insights,” Journal of Marketing Research, 34 (August), 406-410.
Sun, Monic (2009), “How Does Variance of Product Ratings Matter?” Working paper, Stanford University, CA.
Vermeulen, Ivar E. and Daphne Seegers (2009), “Tried and tested: The impact of online hotel reviews on consumer consideration,” Tourism Management, 30, 123-127.
APPENDIX
MCMC Algorithms
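Equation 8 is the main equation to estimate. Our approach is to sequentially draw the model parameters (the coefficients and the error covariance matrix $\Sigma$), $Z_i^*$, and $Y_i^*$. We first stack all equations into vectors and matrices as

$$U_i^* = \begin{pmatrix} Z_i^* \\ Y_i^* \end{pmatrix}, \qquad X_i = \begin{pmatrix} X_i^C & 0 \\ 0 & X_i^F \end{pmatrix},$$

with $B$ stacking the coefficient vectors of the consideration and choice stages and $e_i$ stacking the corresponding error vectors for consumer $i$.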
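We then stack all the observations together as

$$U^* = \begin{pmatrix} U_1^* \\ \vdots \\ U_N^* \end{pmatrix}, \qquad X = \begin{pmatrix} X_1 \\ \vdots \\ X_N \end{pmatrix}, \qquad \text{and} \qquad e = \begin{pmatrix} e_1 \\ \vdots \\ e_N \end{pmatrix},$$

and write

(A1) $U^* = XB + e$,

where $e \sim N(0, I_N \otimes \Sigma)$, so that the covariance matrix of $e$ is block-diagonal with the error covariance matrix $\Sigma$ of a single consumer's equations in each diagonal block.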
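As Equation A1 is a SUR model, we can estimate $B$ and $\Sigma$ by using a Gibbs sampler with standard Normal-Wishart priors. Specifically, we use a normal prior $B \sim N(\underline{B}, \underline{\Sigma}_B)$ and a Wishart prior $\Sigma^{-1} \sim W(\underline{v}, \underline{V})$.

The posterior of $B$ conditional on $U^*$ and $\Sigma^{-1}$ is $B \mid U^*, \Sigma^{-1} \sim N(\overline{B}, \overline{\Sigma}_B)$, where

$$\overline{B} = \overline{\Sigma}_B \Big( \underline{\Sigma}_B^{-1} \underline{B} + \sum_{i=1}^{N} X_i' \Sigma^{-1} U_i^* \Big) \qquad \text{and} \qquad \overline{\Sigma}_B = \Big( \underline{\Sigma}_B^{-1} + \sum_{i=1}^{N} X_i' \Sigma^{-1} X_i \Big)^{-1}.$$

The posterior for $\Sigma^{-1}$ conditional on $U^*$ and $B$ is $\Sigma^{-1} \mid U^*, B \sim W(\overline{v}, \overline{V})$, where $\overline{v} = N + \underline{v}$ and

$$\overline{V} = \Big( \underline{V}^{-1} + \sum_{i=1}^{N} (U_i^* - X_i B)(U_i^* - X_i B)' \Big)^{-1}.$$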
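The two conditional draws for the coefficients and the inverse error covariance matrix can be sketched in code. The following is a minimal illustration in Python with NumPy, not the author's implementation: the function names, the toy data, the prior values, and the dense toy regressor matrices (rather than the block-diagonal regressor matrix of the model) are all assumptions made for the example. The Wishart draw is built as a sum of outer products of normal vectors, which is valid only when the degrees of freedom are an integer at least as large as the dimension.

```python
import numpy as np

def draw_B(U, X, Sigma_inv, B0, VB0_inv, rng):
    """Draw B | U*, Sigma^{-1} from its normal conditional posterior."""
    # Posterior precision: prior precision plus sum_i X_i' Sigma^{-1} X_i
    prec = VB0_inv + sum(Xi.T @ Sigma_inv @ Xi for Xi in X)
    rhs = VB0_inv @ B0 + sum(Xi.T @ Sigma_inv @ Ui for Xi, Ui in zip(X, U))
    V_post = np.linalg.inv(prec)
    V_post = (V_post + V_post.T) / 2          # remove rounding asymmetry
    B_post = V_post @ rhs
    return rng.multivariate_normal(B_post, V_post)

def draw_Sigma_inv(U, X, B, v0, V0_inv, rng):
    """Draw Sigma^{-1} | U*, B from its Wishart conditional posterior."""
    resid = np.stack([Ui - Xi @ B for Xi, Ui in zip(X, U)])   # N x m
    v_post = v0 + len(U)
    V_post = np.linalg.inv(V0_inv + resid.T @ resid)
    V_post = (V_post + V_post.T) / 2
    # W(v, V) draw as a sum of v outer products of N(0, V) vectors
    draws = rng.multivariate_normal(np.zeros(len(V_post)), V_post, size=v_post)
    return draws.T @ draws

# Toy data (assumed for illustration): N consumers, m equations, k coefficients
rng = np.random.default_rng(0)
N, m, k = 300, 2, 3
B_true = np.array([1.0, -0.5, 2.0])
X = [rng.normal(size=(m, k)) for _ in range(N)]
U = [Xi @ B_true + rng.normal(size=m) for Xi in X]

# Assumed weakly informative priors
B0, VB0_inv = np.zeros(k), np.eye(k) / 100.0
v0, V0_inv = m + 2, np.eye(m)

Sigma_inv = np.eye(m)
B_draws = []
for _ in range(100):
    B = draw_B(U, X, Sigma_inv, B0, VB0_inv, rng)
    Sigma_inv = draw_Sigma_inv(U, X, B, v0, V0_inv, rng)
    B_draws.append(B)
B_mean = np.mean(B_draws[20:], axis=0)   # discard the first 20 draws as burn-in
```

In the full sampler these two steps would alternate with the data augmentation of the latent utilities, which replaces the fixed toy `U` used here.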
Now, we estimate $Z_i^*$ and $Y_i^*$ using data augmentation. First, we augment $Z_i^*$ given $Y_i^*$ and the other parameters from a multivariate normal distribution

(A2) $Z_i^* \sim MVN(\mathrm{E}[Z_i^* \mid Y_i^*], \tilde{\Sigma}_{zz})$,

where $\mathrm{E}[Z_i^* \mid Y_i^*]$ is the expectation of $Z_i^*$ conditional on $Y_i^*$, which depends on the consideration-stage regressors $X_i^C$, and $\tilde{\Sigma}_{zz}$ is the variance-covariance matrix of the error terms in the consideration set stage conditional on the other variance-covariance matrices $(\Sigma_{zy}, \Sigma_{yy})$. We draw a positive $z_{ij}^*$ if consumer $i$ includes product $j$ in the consideration set and a negative $z_{ij}^*$ if consumer $i$ does not include product $j$ in the consideration set.
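The sign restriction on the consideration utilities amounts to drawing from a normal distribution truncated at zero. The sketch below shows one common way to do this for a single element, by inverting the normal CDF. It is an assumption for illustration: the source does not say how the truncated draw is implemented, and the function name, the scalar conditional mean and standard deviation, and the clamping constant are hypothetical.

```python
import random
from statistics import NormalDist

def draw_signed_normal(mean, sd, positive, rng):
    """Draw from N(mean, sd^2) restricted to (0, inf) if positive, else (-inf, 0)."""
    std = NormalDist()
    p0 = std.cdf((0.0 - mean) / sd)            # probability mass below zero
    if positive:
        u = p0 + rng.random() * (1.0 - p0)     # uniform on (p0, 1)
    else:
        u = rng.random() * p0                  # uniform on (0, p0)
    # Keep u strictly inside (0, 1) so inv_cdf is defined; this clamp limits
    # accuracy when the allowed region lies far in the tail.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + sd * std.inv_cdf(u)

rng_z = random.Random(1)
# Product in the consideration set: positive latent utility
z_in = [draw_signed_normal(-0.5, 1.0, True, rng_z) for _ in range(1000)]
# Product not in the consideration set: negative latent utility
z_out = [draw_signed_normal(0.8, 1.0, False, rng_z) for _ in range(1000)]
```

For a vector of correlated utilities, the same draw would be applied one element at a time, with `mean` and `sd` set to that element's conditional moments given the others.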
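Second, we augment $Y_i^*$ given $Z_i^*$ and the other parameters from a multivariate normal distribution

(A3) $Y_i^* \sim MVN(\mathrm{E}[Y_i^* \mid Z_i^*], \tilde{\Sigma}_{yy})$,

where $\mathrm{E}[Y_i^* \mid Z_i^*]$ is the expectation of $Y_i^*$ conditional on $Z_i^*$, which depends on the choice-stage regressors $X_i^F$, and $\tilde{\Sigma}_{yy}$ is the variance-covariance matrix of the error terms in the choice stage conditional on the variance-covariance matrices $(\Sigma_{zz}, \Sigma_{zy})$.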
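The conditional mean and conditional variance-covariance matrix used in A2 and A3 follow from the standard formulas for a partitioned multivariate normal vector. A minimal sketch with NumPy is given below. The function name, argument names, and numerical values are assumptions for illustration, and the unconditional means `mu_y` and `mu_z` stand in for the regressor-times-coefficient terms of each stage.

```python
import numpy as np

def conditional_mvn(mu_a, mu_b, S_aa, S_ab, S_bb, b):
    """Mean and covariance of a | b when (a, b) is jointly normal."""
    W = S_ab @ np.linalg.inv(S_bb)        # regression of a on b
    cond_mean = mu_a + W @ (b - mu_b)     # shifted by the observed deviation in b
    cond_cov = S_aa - W @ S_ab.T          # Schur complement of S_bb
    return cond_mean, cond_cov

# One choice utility and one consideration utility (toy values)
mu_y, mu_z = np.array([1.0]), np.array([0.0])
S_yy = np.array([[3.0]])
S_yz = np.array([[2.0]])
S_zz = np.array([[4.0]])
z = np.array([2.0])

mean_y, cov_y = conditional_mvn(mu_y, mu_z, S_yy, S_yz, S_zz, z)
# mean_y is [2.0] and cov_y is [[2.0]]
```

Swapping the roles of the two blocks gives the moments needed for the consideration-stage draw.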
How we draw $y_{ij}^*$ depends on whether product $j$ is in the consideration set and whether it is the product finally chosen from that set. For products in the consideration set, we augment $y_{ij}^*$ from a sub-distribution of the distribution in A3, consisting of the mean vector and variance-covariance matrix of the products in the consideration set. We impose the restrictions that $y_{ij}^*$ is highest for the finally chosen product among the products in the consideration set and that each $y_{ij}^*$ is negative if the consumer chooses the outside option (no reservation). For products not included in the consideration set, we augment $y_{ij}^*$ from a sub-distribution of the distribution in A3, consisting of the mean vector and variance-covariance matrix of the products not included in the consideration set. Unlike for the products in the consideration set, however, we do not impose a restriction on the size or sign of $y_{ij}^*$.
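The restrictions on the choice utilities can be illustrated with a coordinate-wise pass over the products of one consumer. This is a sketch under simplifying assumptions, not the author's procedure: the function names are hypothetical, each product's conditional mean and standard deviation are taken as given, and errors are treated as independent across products so that those moments do not change within the pass.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()

def draw_truncated(mean, sd, lo, hi, rng):
    """Inverse-CDF draw from N(mean, sd^2) restricted to [lo, hi]."""
    p_lo = _STD.cdf((lo - mean) / sd) if lo > -math.inf else 0.0
    p_hi = _STD.cdf((hi - mean) / sd) if hi < math.inf else 1.0
    u = p_lo + rng.random() * (p_hi - p_lo)
    u = min(max(u, 1e-12), 1.0 - 1e-12)        # keep inv_cdf defined
    z = mean + sd * _STD.inv_cdf(u)
    return min(max(z, lo), hi)                 # guard against rounding

def sweep_choice_utilities(y, mean, sd, considered, chosen, rng):
    """One pass over y*_ij for a single consumer.

    considered: set of product indices in the consideration set
    chosen:     index of the chosen product, or None for the outside option
    """
    y = list(y)
    for j in range(len(y)):
        lo, hi = -math.inf, math.inf           # unrestricted if not considered
        if j in considered:
            if chosen is None:
                hi = 0.0                       # outside option: all negative
            elif j == chosen:
                others = [y[k] for k in considered if k != chosen]
                if others:
                    lo = max(others)           # chosen product is the highest
            else:
                hi = y[chosen]                 # below the chosen product
        y[j] = draw_truncated(mean[j], sd[j], lo, hi, rng)
    return y

rng_y = random.Random(0)
mean = [0.2, 0.0, -0.3, 0.5]
sd = [1.0, 1.0, 1.0, 1.0]
considered = {0, 1, 2}
y_bought = sweep_choice_utilities([0.0] * 4, mean, sd, considered, 1, rng_y)
y_none = sweep_choice_utilities([0.0] * 4, mean, sd, considered, None, rng_y)
```

With correlated errors, `mean[j]` and `sd[j]` would be recomputed for each product from the conditional distribution given the current values of the other products before each draw.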
VITA
Sungha Jang received a Bachelor of Economics with a major in statistics in 1998 and a Master of
Business Administration concentrating on Marketing in 2001 from Korea University, Seoul,
Korea. He will be awarded the Doctor of Philosophy in Management Science specializing in
Marketing in May, 2011 at the University of Texas at Dallas. Prior to joining the Ph.D. program,
he worked for Experian Korea as a senior consultant in the field of credit risk management.