CONSUMER DECISIONS ON SHARE OF WALLET, AUTOMOBILE SEARCH,

AND ONLINE PRODUCT REVIEWS

by

Sungha Jang

APPROVED BY SUPERVISORY COMMITTEE:

Brian T. Ratchford, Chair

Ashutosh Prasad, Co-Chair

B.P.S. Murthi

Gonca Soysal

Copyright 2011

Sungha Jang

All Rights Reserved

To my parents, Judeok Jang and Boksun Kim



by

SUNGHA JANG, B.A., M.B.A.

DISSERTATION

Presented to the Faculty of

The University of Texas at Dallas

in Partial Fulfillment

of the Requirements

for the Degree of

DOCTOR OF PHILOSOPHY IN

MANAGEMENT SCIENCE

THE UNIVERSITY OF TEXAS AT DALLAS

May, 2011

UMI Number: 3450462

ACKNOWLEDGEMENTS

I started the long journey of studying marketing keeping in mind my mother’s saying that ‘you can travel a long distance by asking other people’. As I write these acknowledgments for my dissertation, I am deeply indebted to the many people who have guided me on this journey.

I have benefited greatly from the aid of my advisor, Dr. Brian Ratchford. His belief and encouragement raised my passion and ability and were vital to the successful completion of this thesis. I am honored to have worked under his insightful advice. I also offer special thanks to my co-advisor, Dr. Ashutosh Prasad. His thoughtful and constructive advice has shown me the way of a researcher. I could not have completed my dissertation without his guidance.

I wish to thank two other committee members as well. I owe special thanks to Dr. B.P.S. Murthi for his academic advice and considerate care throughout the program. I thank Dr. Gonca Soysal for her interest and helpful comments on my research. I must also express my appreciation to the marketing faculty at the University of Texas at Dallas. I am especially grateful to Dr. Ram Rao and Dr. Nanda Kumar for their passionate and instructive guidance on research.

Finally, I express my greatest gratitude to my parents, who always guide me with wisdom. With their trust, encouragement, and love, I have been able to finish this long journey to a marketing Ph.D.

March, 2011



Publication No. ___________________

Sungha Jang, Ph.D. The University of Texas at Dallas, 2011

ABSTRACT

Supervising Professor: Brian T. Ratchford

The objective of the three essays is to understand consumers’ decisions on allocating their budget to credit card expenditures, using information sources for automobile purchases, and combining online product reviews with their prior knowledge.

In the first essay, we examine how consumers allocate their budget across multiple firms and categories. Because expenditures are simultaneous and censored, we propose Bayesian estimation of a simultaneous equations Tobit model with latent classes. By taking into account expenditure interrelationships and consumer heterogeneity, our approach predicts the size and share of wallet more accurately, which firms can use for better segmenting and targeting.

In the second essay, we examine the interdependency between various information sources, segment consumers based on their search patterns, and compare search results across the segments. We find that online search and offline search affect each other, and that low external search segments choose American brands and get lower discounts, while high external search segments choose foreign brands and get better price deals. Our results give managers guidelines on which media they should use to provide information and which search segments they should target.

In the third essay, we examine the effects and value of online product reviews in the purchase decision process. In our approach, consumers combine product reviews with their prior perceived quality to construct a posterior perceived quality, which affects the consideration set and choice decisions. Our findings show that consumers use product reviews mainly in the consideration set stage and that their updating method is consistent with Bayesian updating. We also compute the monetary value of each component of the product reviews. Our results have managerial implications: product review providers should display all components of consumer reviews from the beginning of the search, and manufacturers should keep consumers’ perceived quality high by managing their prior quality at all times. Manufacturers should also encourage satisfied customers to write good reviews.

TABLE OF CONTENTS

Acknowledgements

Abstract

List of Tables

List of Figures

CHAPTER 1 CONSUMER SPENDING PATTERNS ACROSS FIRMS AND CATEGORIES: APPLICATION TO THE SIZE AND SHARE OF WALLET

    ABSTRACT
    INTRODUCTION
    LITERATURE REVIEW
    MODEL AND METHODOLOGY
    DATA AND ESTIMATION
    RESULTS
    CONCLUSION
    APPENDIX
    REFERENCES

CHAPTER 2 SEARCH PATTERNS, SEARCH-BASED SEGMENTATION AND SEARCH RESULTS OF AUTOMOBILE PURCHASERS

    ABSTRACT
    INTRODUCTION
    LITERATURE REVIEW
    MODEL AND METHODOLOGY
    DATA AND ESTIMATION
    RESULTS
    CONCLUSION
    APPENDIX
    REFERENCES

CHAPTER 3 HOW CONSUMERS USE PRODUCT REVIEWS IN THE PURCHASE DECISION PROCESS

    ABSTRACT
    INTRODUCTION
    LITERATURE REVIEW
    MODEL AND ESTIMATION
    SURVEY AND DATA
    RESULTS
    CONCLUSION
    REFERENCES
    APPENDIX

VITA

LIST OF TABLES

Table 1.1. Descriptive Statistics

Table 1.2. Model Fits and Reduction Percent Compared to the Benchmark Model (M1)

Table 1.3. Parameter Estimates

Table 1.4. Prediction of the Size and Share of Wallet

Table 2.1. Information Sources in the Automobile Purchases

Table 2.2. Comparison of Studies Related to Automobile Purchases

Table 2.3. Descriptive Statistics of Major Variables

Table 2.4. Results of Principal Component Analysis

Table 2.5. Interrelationship of the External Search Sources

Table 2.6. Effects of the Exogenous Variables

Table 2.7. Description of Segments

Table 2.8. Results of ANCOVA

Table 3.1. Competing Specifications

Table 3.2. Hotel Information

Table 3.3. Log Marginal Likelihood of Models

Table 3.4. Parameter Estimates of Model 10

Table 3.5. Monetary Value of a Unit Increase in Product reviews Components

LIST OF FIGURES

Figure 1.1. Conceptual Model of Factors to Affect Expenditures

Figure 2.1. Conceptual Model

Figure 2.2. Search Based Segments (S1 to S9) and Their Search Times in Hours

Figure 2.3. Correspondence between Search-based Segments and Brand Choices

Figure 3.1. Survey Screen on Some Hotel Information

Figure 3.2. Survey Procedure

CHAPTER 1

CONSUMER SPENDING PATTERNS ACROSS FIRMS AND CATEGORIES:

APPLICATION TO THE SIZE AND SHARE OF WALLET

Sungha Jang

School of Management, Department of Marketing, SM32


800 West Campbell Road

Richardson, Texas 75080-3021

ABSTRACT

Firms need to know consumers’ expenditures across firms and categories to predict their size-

and share-of-wallet. In this study, we model consumers’ expenditures allowing for three features:

(1) interrelationship in consumers’ spending across multiple firms and categories, called

simultaneity; (2) data censoring that occurs when consumers do not spend in certain categories;

and (3) consumer heterogeneity in spending patterns. To handle these, we propose a

simultaneous equations Tobit model with latent classes. The model is estimated with Bayesian

estimation using credit card expenditure data. Two segments are identified. One segment, of

‘habituals,’ covers 76% of consumers who show habitual usage patterns. The remaining

segment, ‘adaptives,’ allocates their budget based on income and demographics. We discuss the

interrelationship of expenditures across firms and categories by segments. The findings suggest

that firms need to take heterogeneity and inter-related expenditures into account for accurate

prediction of size- and share-of-wallet.

Keywords: Share of wallet; Customer heterogeneity; Structural model; Simultaneous equations

Tobit model; Bayesian estimation.

INTRODUCTION

Share of wallet is defined as the percentage of a customer’s total category expenditure (i.e., size

of wallet) that is captured by the firm. It is an informative metric for customers’ untapped

potentials, for the effectiveness of marketing activities, and for competitive benchmarking. It has

been used as a loyalty measure (e.g., Bowman and Narayandas 2004), as a segmentation criterion

(e.g., Reinartz and Kumar 2003), and is known to have a positive impact on profits (e.g.,

Reinartz, Thomas and Kumar 2005). However, as Du, Kamakura and Mela (2007) emphasize,

calculating the share of wallet requires information about customers’ expenditures at competing

firms as well as one’s own firm, information that is often unavailable. In the absence of

information on expenditures at competitors, a model to predict them needs to be constructed. This

issue motivates the present research.
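To make the definition concrete, the following minimal sketch computes the size and share of wallet for one customer in one category. The function name and the spending figures are hypothetical and chosen only for illustration; they are not taken from the data used in this study.

```python
def share_of_wallet(focal_spend, competitor_spend):
    """Return (size_of_wallet, share_of_wallet) for one customer in one category.

    size  = total category expenditure across the focal firm and its competitors
    share = fraction of that total captured by the focal firm
    """
    size = focal_spend + competitor_spend
    if size == 0:
        # The share is undefined when the customer spends nothing in the category.
        return 0.0, None
    return size, focal_spend / size


# Hypothetical customer: 300 at the focal firm, 900 at competitors.
size, share = share_of_wallet(300.0, 900.0)
```

The second argument is exactly the quantity that is usually unobserved by the focal firm, which is why it has to be predicted by a model.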

It may be informative to consider the impact of cross-firm and cross-category

expenditures on share of wallet calculations because studies show that consumer purchases in a

category are related to their purchases in other categories (e.g., Iyengar, Ansari and Gupta 2003;

Li, Sun and Wilcox 2005). However, these studies do not have share of wallet calculation as a

goal, and do not consider consumers’ purchases from competing firms. In the existing share of

wallet literature, either a single category is considered (e.g., Chen and Steckel 2005; Zheng,

Fader, and Padmanabhan 2009) or the relationship between inter-category expenditures is

indirectly examined through the error term (e.g., Du, Kamakura and Mela 2007).

In this paper, we develop a structural model of budget allocation to study the

interrelationship of expenditures across firms and categories, given current and past expenditures

in multiple categories at a focal firm and competing firms. We consider the possibility that

spending patterns can vary for different consumers and control for unobserved heterogeneity

using a latent class approach. Estimating this model presents methodological issues such as

simultaneity and censoring, which lead to more complexity compared to the typical problem of

determining the size and share of wallet in a single category that is found in the literature.

Simultaneity occurs because expenditures across firms and categories are interrelated due to

consumers’ budget constraint. Censoring occurs when consumers do not spend in all categories

in all firms.

To resolve the simultaneity and censoring problems, we propose a Bayesian estimation

method for estimating a simultaneous equations Tobit model, which represents an important

contribution of the paper. To address simultaneity, we estimate the coefficients of the endogenous

variables (expenditures) with a Metropolis-Hastings algorithm. To address censoring, we impute the

latent values behind zero expenditures from a truncated normal distribution conditional on the

non-zero expenditures. Multiplying the endogenous variables by their coefficients yields a

seemingly unrelated regressions (SUR) model, from which we estimate the coefficients of the exogenous

variables and the error covariance by Gibbs sampling. As Carlin and Louis (2000) and Koop (2003)

point out, few Bayesian studies use a simultaneous equations Tobit model, so our approach

offers an alternative for future studies.

The empirical application, in common with several other papers in the literature, deals

with financial services offerings. We use credit card data for a focal bank and a composite of

competing banks, both of whom offer services in the two categories that we consider, namely,

making purchases and taking cash advances on the bank credit card. We refer to these two

categories as purchases and cash advances, respectively. Given these data, we examine the

following research questions: (1) What consumer segments can be identified? (2) How do

consumer spending patterns, especially the interrelationship of expenditures, differ by segment? (3)

Can we predict the size and share of wallet better than benchmark models by allowing for

interrelationship and heterogeneity? We answer these questions using the estimation results of

our model. We find that the proposed model fits and predicts the size and share of wallet better

than the benchmark models on both the estimation and validation samples.

The paper proceeds as follows: The next section is a review of the literature. In the model

and methodology section, we propose the simultaneous equations Tobit model and the estimation

method. Then, we describe the data set and present the results. Finally, we provide the

conclusions, managerial implications, and directions for future research.

LITERATURE REVIEW

This study is related to two research areas: the relationship of purchases across categories and the

size and share of wallet. We consider each in turn.

First, the relationship of purchases across categories is well studied in the context of cross-

selling. For example, Kamakura et al. (2003) showed relationships in purchases of 22 financial

services at a focal bank and its competitors. From this, they predicted the possibility of

customers purchasing services that they did not yet own. Li et al. (2005) found a sequential order

of buying financial products and predicted opportunities of cross-selling products based on this.

Similarly, Iyengar et al. (2003) present a model to better understand and predict consumers’

purchases in a category given purchases in other categories. In general, the cross-selling

literature shows that purchases across categories are interrelated and that this knowledge can

help to predict purchases in other categories. However, the existing literature focuses on categories

within a firm and does not examine interrelationships between firms, which we address in this

study.

Second, we examine the share of wallet literature by further dividing it into three broad

streams of research. The first stream deals with the role and consequences of share of wallet.

Regarding its role, share of wallet has been treated as a loyalty measure (Bowman and

Narayandas 2004), as a segmentation criterion (Reinartz and Kumar 2003), and as an attrition detector

(Malthouse and Wang 1998). Regarding its consequences, the positive effects of share of wallet

on profitability and relationship duration have been investigated (e.g., Reinartz and Kumar 2003;

Reinartz, Thomas and Kumar 2005).

The second stream examines the antecedents of share of wallet. For example, Bowman

and Narayandas (2004) and Cooil, Keiningham, Aksoy and Hsu (2007) analyze the impact of

customer satisfaction on share of wallet. Baumann, Burton and Elliott (2005) and Verhoef (2003)

identify customers’ characteristics and transaction patterns and firms’ marketing activities that

are associated with share of wallet. These findings help firms understand who their

low share-of-wallet customers are and improve their relationships with such customers.

The third stream is to predict expenditures at competing firms and calculate share of

wallet based on the predicted expenditures. Often, expenditure data are available only for the

focal firm and either unavailable or partially available for the competitors. Then, a prediction

model for the unavailable expenditures is required. This is the literature against which we are

most closely positioned. We proceed to discuss it in more detail.

Chen and Steckel (2005) calculate share of wallet of a credit card firm by inferring

consumers’ behavior at competing firms. Specifically, they use the number of grocery purchases

with the focal firm’s credit card and infer the number of purchases that customers might have

made from competitors using Markov switching matrices. Based on the length of inter-purchase

times, they calculate share of wallet without information on competitors. Zheng et al. (2009)

calculate share of wallet for five online retailers (selling apparel, wireless services, books, office

supplies, and travel) by using market level data on competitors that are publicly available. They

use a limited information NBD/Dirichlet model for calculating share of wallet. Du et al. (2007)

use imputed balances outside the focal bank to simultaneously predict category ownership, total

amount, and the focal firm’s share in 10 categories at a focal bank using a multivariate factor

analytic model.

The previous literature focused on share of wallet in a single category (e.g., Chen and

Steckel 2005), share of wallet at the firm level (e.g., Zheng et al. 2009), or independent share of

wallet in several categories (Du et al. 2007). However, the results from the cross-selling

literature indicate that firms should consider the relationships between purchases in multiple

categories, as we do in this paper. Therefore, our methodological contribution extends

the previous literature by including the interrelationship of expenditures across multiple

categories at multiple firms.

MODEL AND METHODOLOGY

Consider J firms that sell products in C categories to a market of consumers. The consumers also have an outside option that represents all consumption categories not provided by the J firms. The firms are indexed by $j = 1, \ldots, J$, the categories by $c = 1, \ldots, C$, and the outside option by 0. We assume that consumers maximize their concave utility functions by allocating budget $W$ across these $M$ ($= C \times J$) firm-category combinations and the outside option. This is expressed by

(1) $$\max_{y} \; U(y) = b'y + y'Ay/2 \quad \text{s.t.} \quad 1_{M+1}'\, y = W,$$

where $y = (y_0, y_{11}, y_{12}, \ldots, y_{1C}, y_{21}, \ldots, y_M)'$ is the vector of expenditures, $b$ is an $(M+1)$ dimensional vector, $A$ is an $(M+1) \times (M+1)$ negative definite matrix, and $1_{M+1}$ is an $(M+1)$ dimensional vector of ones. Note that expenditures are nonnegative, and that $b$ and $A$ can be interpreted as proportional to the mean and the variance-covariance matrix of the unit return to the expenditures (Amemiya, Saito and Shimono 1993).
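The budget allocation problem can be illustrated numerically. The sketch below assumes the quadratic form U(y) = b'y + y'Ay/2 with A negative definite (a sign convention adopted here so that the utility is concave) and solves only the equality-constrained problem, ignoring the nonnegativity of expenditures. The function name and the numbers are hypothetical. When the unconstrained allocation contains a negative entry, the nonnegativity constraint binds and the observed expenditure would be zero, which is the source of censoring.

```python
import numpy as np


def interior_allocation(b, A, W):
    """Maximize b'y + y'Ay/2 subject to sum(y) = W, ignoring y >= 0.

    First-order conditions: b + A y - lam * 1 = 0 and 1'y = W,
    stacked into one linear system in (y, lam).
    """
    b = np.asarray(b, dtype=float)
    A = np.asarray(A, dtype=float)
    n = b.size
    kkt = np.zeros((n + 1, n + 1))
    kkt[:n, :n] = A        # A y
    kkt[:n, n] = -1.0      # - lam * 1
    kkt[n, :n] = 1.0       # 1'y
    rhs = np.concatenate([-b, [W]])
    sol = np.linalg.solve(kkt, rhs)
    return sol[:n], sol[n]  # allocation and multiplier on the budget


# Hypothetical example with an outside option and two firm-category pairs.
y, lam = interior_allocation(b=[2.0, 1.0, 0.0], A=-np.eye(3), W=1.5)
has_corner = bool((y < 0).any())  # True here: one expenditure would be censored at zero
```

A full treatment would impose the Kuhn-Tucker conditions for the nonnegativity constraints rather than flag the violation after the fact.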

Ransom (1987) and Amemiya et al. (1993) point out that the Kuhn-Tucker conditions for the quadratic utility with a budget constraint yield the simultaneous equations Tobit model introduced by Amemiya (1974). We provide the derivation in Appendix A. Thus, for each individual i, we have a simultaneous equations Tobit model given in structural form by:

(2) $$\Gamma y_i^* = X_i \beta + \varepsilon_i, \quad \text{where } \varepsilon_i \sim N(0, \Sigma).$$

The vector $y_i^*$ of latent utility from expenditures contains M endogenous variables and thus M equations indexed by $m = 1, \ldots, M$. For consumer i and expenditures of the m-th firm-category combination, the relationship between the expenditure utility ($y_{mi}^*$) and the observed expenditure ($y_{mi}$) is given by

$$y_{mi} = \begin{cases} y_{mi}^* & \text{if } y_{mi}^* > 0 \\ 0 & \text{if } y_{mi}^* \le 0. \end{cases}$$

That is, consumers’ spending equals their expenditure utility provided it is higher than a threshold, which is scaled to zero, and they spend nothing in categories that yield negative utility.
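The censoring rule can be sketched in a few lines. The function names and the latent values below are hypothetical; the second helper computes the fraction of zero expenditures per column, which is the kind of censoring percentage reported for the expenditure variables.

```python
import numpy as np


def observe(latent):
    """Map latent expenditure utilities to observed expenditures (censor at zero)."""
    latent = np.asarray(latent, dtype=float)
    return np.where(latent > 0, latent, 0.0)


def censoring_share(observed):
    """Fraction of zero expenditures in each column."""
    observed = np.asarray(observed, dtype=float)
    return np.mean(observed == 0, axis=0)


# Hypothetical latent utilities: 4 consumers (rows) by 2 firm-category pairs (columns).
latent = np.array([[120.0, -35.0],
                   [80.0, 10.0],
                   [-5.0, -60.0],
                   [40.0, -1.0]])
spend = observe(latent)
shares = censoring_share(spend)
```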

Matrix $X_i$ contains the vectors of exogenous variables (e.g., consumer characteristics) in the following form:

$$X_i = \begin{pmatrix} x_{1i}' & 0 & \cdots & 0 \\ 0 & x_{2i}' & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_{Mi}' \end{pmatrix},$$

where $x_{mi}$ is a $k_m$-vector containing the i-th observation of the vector of explanatory variables in the m-th equation. In each $x_{mi}$ vector, there are variables common to all M equations and a unique variable in each m-th equation for exclusion restrictions to identify the system.
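As an illustration of this layout, the sketch below stacks per-equation regressor vectors into a block-diagonal matrix with one row per equation. The helper name `build_X` and the example values are hypothetical; each row holds the common variables followed by one equation-specific variable that plays the role of the exclusion restriction.

```python
import numpy as np


def build_X(x_rows):
    """Stack M regressor vectors into a block-diagonal M x K matrix.

    x_rows[m] holds the k_m explanatory variables of equation m, and
    K is the sum of the k_m.
    """
    M = len(x_rows)
    K = sum(len(x) for x in x_rows)
    X = np.zeros((M, K))
    col = 0
    for m, x in enumerate(x_rows):
        X[m, col:col + len(x)] = x  # place x_m' in row m, in its own column block
        col += len(x)
    return X


# Hypothetical consumer: two common variables (e.g. age, income) plus one
# equation-specific variable (e.g. own past expenditure) in each of 2 equations.
common = [37.0, 3.2]
X_i = build_X([common + [150.0], common + [0.0]])
```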

$\Gamma$ is an $M \times M$ matrix whose diagonal elements are one and whose off-diagonal elements are coefficients of other endogenous variables. This matrix expresses the interrelationship of the endogenous variables. $\beta$ is a $K \times 1$ vector of coefficients, where $K = \sum_{m=1}^{M} k_m$. The vector $\beta$ shows the impact of the exogenous variables on the endogenous variables. Finally, $\varepsilon_i$ follows a normal distribution $N(0, \Sigma)$.

Estimating the structural model given by Equation 2 presents issues of simultaneity and

censoring. Simultaneity occurs because each endogenous variable affects the other endogenous

variables, expressed through the matrix $\Gamma$. The standard solution described in, e.g., Cameron

and Trivedi (2005, p. 561) for two simultaneous equations, is to first obtain the reduced form and

estimate it as a Tobit model. Second, replace the regressors $y_i^*$ in the structural model by their

reduced form predictions, then proceed with regression.

In contrast, we adopt an alternative Bayesian approach. To solve the simultaneity

problem, we estimate $\Gamma$ from the original structural form. To solve the censoring issue, if $y_{mi}$ is

zero, we impute $y_{mi}^*$ using a truncated multivariate normal distribution conditional on other non-

zero endogenous variables. If we multiply the endogenous variables with $\Gamma$ (i.e., $\Gamma y_i^*$) and derive

new variables, namely $\tilde{y}_i^*$ ($= \Gamma y_i^*$), the original model converts into a SUR model (i.e.,

$\tilde{y}_i^* = X_i \beta + \varepsilon_i$). Thereafter, the estimation method of $\beta$ and $\Sigma$ is the same as that of a SUR

model.
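The imputation step can be sketched as follows, under simplifying assumptions that are not taken from the text: one censored component is drawn at a time, conditioning on the current values of all other components, using an inverse-CDF draw from a univariate normal truncated to the non-positive half-line. The symbols Gamma, Xb (standing for the product of the regressor matrix and its coefficients) and Sigma, as well as all function names and numbers, are illustrative.

```python
import numpy as np
from statistics import NormalDist


def reduced_form_moments(Gamma, Xb, Sigma):
    """Mean and covariance of y* implied by Gamma y* = Xb + eps, eps ~ N(0, Sigma)."""
    G_inv = np.linalg.inv(Gamma)
    return G_inv @ Xb, G_inv @ Sigma @ G_inv.T


def conditional_moments(mu, cov, m, y):
    """Mean and variance of component m given all other components of y."""
    idx = [k for k in range(len(mu)) if k != m]
    c_mo = cov[m, idx]
    c_oo_inv = np.linalg.inv(cov[np.ix_(idx, idx)])
    mean = mu[m] + c_mo @ c_oo_inv @ (y[idx] - mu[idx])
    var = cov[m, m] - c_mo @ c_oo_inv @ c_mo
    return float(mean), float(var)


def draw_censored(mean, var, u):
    """Inverse-CDF draw from N(mean, var) truncated to (-inf, 0], for u in (0, 1).

    Assumes the probability mass below zero is not numerically zero.
    """
    dist = NormalDist(mean, var ** 0.5)
    return dist.inv_cdf(u * dist.cdf(0.0))


# Hypothetical two-equation example in which the second expenditure is observed as zero.
Gamma = np.eye(2)
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
mu, cov = reduced_form_moments(Gamma, np.array([1.0, -1.0]), Sigma)
mean, var = conditional_moments(mu, cov, m=1, y=np.array([2.0, 0.0]))
imputed = draw_censored(mean, var, u=0.5)  # in a sampler, u would be a uniform random draw
```

After imputation, premultiplying the completed vector by Gamma gives the transformed dependent variable of the SUR step.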

There are two advantages of our Bayesian approach. First, compared to maximum

likelihood estimation, it is more feasible in estimating models with many endogenous variables.

This is because if we estimate a simultaneous equations Tobit model by the maximum likelihood

method (e.g., Maddala 1986), the numerical approximation (e.g., the GHK simulator) to the

multivariate normal integral in the likelihood is less accurate as the number of equations

increases. Even if we estimate the model by two-step maximum likelihood estimation (e.g.,

Murphy and Topel 1985), in the case of more than two equations, it is not easy to correct the

covariance matrix of estimators in the second step. In contrast, our approach involves neither

multiple integrals nor variance correction, as we sequentially draw parameters given the other

parameters.

The second advantage of the proposed approach, compared to other Bayesian methods, is

that it is possible to estimate all types of simultaneous equations models: recursive models, just-

identified models and over-identified models. Li (1998) infers a recursive model with a limited

dependent variable. Our approach can easily analyze a recursive model by setting a proper

restriction on the matrix $\Gamma$. Regarding just-identified models, Koop (2003) mentions an

estimation method to first take its reduced form, estimate it by a SUR model and then recover the

parameters in the structural form by transformation. Yang, Narayan and Assael (2006) also use a

similar approach. However, their method is not applicable to over-identified models because

there is an identification issue of recovering parameters in the structural form from the reduced

form. In contrast, our approach is applicable to even over-identified models as we do not recover

parameters from a reduced form but separately draw parameters of endogenous and exogenous

variables.

Note that we adopt a latent class model in order to capture unobserved heterogeneity. On

every iteration, we determine the membership of the latent classes and estimate parameters by

the latent classes to have different $\Gamma$, $\beta$ and $\Sigma$. Details of the Markov Chain Monte Carlo

(MCMC) algorithms are provided in Appendix B. A summary is presented below.

Step A. Draw the membership of the latent class s.

Step B. For the latent class $s = 1, \ldots, S$:

- Step B1. Take a candidate draw $\tilde{\gamma}_s^*$ using the posterior simulator. Calculate the posterior probability at $\tilde{\gamma}_s^*$ and $\tilde{\gamma}_s^{(r-1)}$, where r denotes the r-th iteration. Accept the candidate draw with the Metropolis-Hastings acceptance probability.

- Step B2. With the new draw $\tilde{\gamma}_s^{(r)}$, derive $\Gamma_s^{(r)}$ and calculate $\tilde{y}_s^{*(r)} = \Gamma_s^{(r)} y_s^{*(r-1)}$.

- Step B3. Draw $\beta_s^{(r)}$ and $\Sigma_s^{(r)}$ using a Gibbs sampler for a SUR model.

- Step B4. Draw $y_s^{*(r)}$ given all parameters $\Gamma_s^{(r)}$, $\beta_s^{(r)}$ and $\Sigma_s^{(r)}$.

Step C. Repeat Steps A through B R times.
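The accept-or-reject logic of Step B1 can be sketched generically. The version below assumes a symmetric random-walk proposal, which is one common choice and not necessarily the proposal used in Appendix B; the function name, the step size, and the toy target are hypothetical. With a symmetric proposal, the acceptance probability reduces to the ratio of posterior densities at the candidate and at the current value, capped at one.

```python
import math
import random


def metropolis_step(current, log_post, step_sd, rng):
    """One random-walk Metropolis update of a parameter vector.

    current  : list of current parameter values
    log_post : function returning the log posterior density up to a constant
    Returns (new_state, accepted).
    """
    candidate = [c + rng.gauss(0.0, step_sd) for c in current]
    log_ratio = log_post(candidate) - log_post(current)
    accept_prob = math.exp(min(0.0, log_ratio))  # min(1, posterior ratio)
    if rng.random() < accept_prob:
        return candidate, True
    return current, False


# Toy target: a standard normal log density for a single coefficient.
def toy_log_post(g):
    return -0.5 * sum(v * v for v in g)


rng = random.Random(0)
state, n_accept = [0.0], 0
for _ in range(2000):
    state, accepted = metropolis_step(state, toy_log_post, 1.0, rng)
    n_accept += accepted
acceptance_rate = n_accept / 2000
```

In the full sampler, this update would be embedded between the latent class draw and the Gibbs draws for the SUR parameters, once per class and iteration.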

DATA AND ESTIMATION

Data

The empirical application uses data on credit card expenditures in two categories,

purchases and cash advances, which are the main services of the credit card industry. The data

consist of expenditures in these two categories at the focal bank and all its competitors

aggregated together. Therefore, we assume that there are two banks: the focal bank and a second,

composite bank.

The dataset has the current three-month average expenditures and past three-month

average expenditures both at the focal bank and competitors. Besides expenditure information,

we have demographic information such as age, gender, income, salaried person indicator (1 if a

consumer is salaried, 0 if self-employed), and credit scores.

In Table 1.1.A, we provide descriptive statistics for the demographic variables; statistics for

expenditures are omitted due to confidentiality requirements. Recall that if latent utility is negative,

expenditures are observed as zero. Table 1.1.B presents the censoring percentages of the current

and past expenditures.

We discarded some consumers who provided information that seemed suspect (i.e.,

reported too low or too high income) or showed abnormal credit card use (i.e., were delinquent

or had average monthly expenditure that exceeded a quarter of their annual income). After

cleaning the data, the final dataset had 6617 observations. We randomly selected 5610 (or 85%)

observations for estimation and kept the remaining 1007 for validation.
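A random split of this kind can be sketched as follows. The function name and the seed are hypothetical, and the observation identifiers are simply positions 0 to 6616; only the sample sizes come from the text.

```python
import random


def split_sample(n_obs, n_estimation, seed=0):
    """Randomly partition observation indices into estimation and validation sets."""
    ids = list(range(n_obs))
    random.Random(seed).shuffle(ids)  # seeded so that the split is reproducible
    return ids[:n_estimation], ids[n_estimation:]


estimation_ids, validation_ids = split_sample(6617, 5610)
estimation_share = len(estimation_ids) / 6617  # about 0.85
```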

Table 1.1. Descriptive Statistics

A) Demographics

Variable                   Mean (or %)   SD      Min    Max
Age                        37.3          8.1     22     77
Male (Dummy)               0.7           0.5     0      1
Log of Income              29.1          18.4    3.0    197.6
Salaried Person (Dummy)    0.9           0.3     0      1
Credit Score               6.4           0.7     3.7    7.8

B) Censoring percent of expenditures variables

Expenditures       Current   Past
Purchase_C         6.4%      6.5%
Cash Advance_C     46.8%     47.8%
Purchase_F         12.3%     16.9%
Cash Advance_F     59.3%     58.8%

Note: The subscript C represents competing banks and F represents the focal bank.

Model Estimation

A conceptual view of the model is depicted in Figure 1.1. With the variables in the

dataset, we construct a simultaneous equations Tobit model with four equations (2 banks × 2

categories) as shown in Equation 3.

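(3)
$$
\begin{bmatrix} y^*_{11} \\ y^*_{12} \\ y^*_{21} \\ y^*_{22} \end{bmatrix}
=
\begin{bmatrix} \beta'_{11} X \\ \beta'_{12} X \\ \beta'_{21} X \\ \beta'_{22} X \end{bmatrix}
+
\begin{bmatrix} \beta_{11,11}\, y^{Past}_{11} \\ \beta_{12,12}\, y^{Past}_{12} \\ \beta_{21,21}\, y^{Past}_{21} \\ \beta_{22,22}\, y^{Past}_{22} \end{bmatrix}
+
\begin{bmatrix}
0 & \alpha_{1,12} & \alpha_{1,21} & \alpha_{1,22} \\
\alpha_{2,11} & 0 & \alpha_{2,21} & \alpha_{2,22} \\
\alpha_{3,11} & \alpha_{3,12} & 0 & \alpha_{3,22} \\
\alpha_{4,11} & \alpha_{4,12} & \alpha_{4,21} & 0
\end{bmatrix}
\begin{bmatrix} y^*_{11} \\ y^*_{12} \\ y^*_{21} \\ y^*_{22} \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \end{bmatrix}
$$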

Figure 1.1. Conceptual Model of Factors to Affect Expenditures

Note that we re-arranged the terms in Equation 2 for ease of interpretation. Thus, the coefficient matrix in Equation 2 that represents the interrelationship between endogenous variables is converted into the coefficient matrix on the right-hand side of Equation 3. The off-diagonal coefficients in the jc-th equation are interpreted as the effects of the latent utility from other expenditures ($y^*_{-jc}$) on the latent utility from the jc-th expenditure ($y^*_{jc}$).

In Equation 3, $y^*_{jc}$ is the latent utility from expenditure in category c at firm j, where j = 1 (competitors) or 2 (the focal firm) and c = 1 (purchases) or 2 (cash advances). Equation 3 shows that $y^*_{jc}$ is affected by all other endogenous variables $y^*_{-jc}$ as well as common demographic information X. The past three-month expenditures, $y^{Past}_{jc}$, are included only in the equation where the corresponding expenditure is the endogenous variable and are used for the exclusion restrictions. Values of the past expenditures are assumed to be known in the current period.

We estimate the parameters using a Metropolis-Hastings step within a Gibbs sampler and impute the latent endogenous variables $y^*_{jc}$ using data augmentation. We made 20,000 draws, of which the first 10,000 were discarded as a 'burn-in' period. We checked that the algorithm converged by inspecting the draws for stable trends and stable distributions. The acceptance rates were around 0.4. We kept every 10th draw to report the parameter estimates and to calculate the conditional expenditures at competitors.
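The burn-in and thinning rule above can be sketched as a small index calculation. The helper name `retained_draw_indices` and the choice to keep the last draw of each block of ten (rather than the first) are assumptions made for illustration.

```python
def retained_draw_indices(n_draws=20_000, burn_in=10_000, thin=10):
    """Return 0-based indices of the draws kept after burn-in and thinning.

    Assumption: within the post-burn-in draws, the last draw of every
    block of `thin` draws is retained.
    """
    return list(range(burn_in + thin - 1, n_draws, thin))


kept = retained_draw_indices()
print(len(kept), kept[0], kept[-1])
```

With the settings reported in the text, this leaves 1,000 retained draws for reporting estimates and computing conditional expenditures.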

Benchmark Model

We use the following multivariate Tobit model as a benchmark model to compare with the proposed model.

$y^*_i = X_i \delta + \varepsilon_i$, where $\varepsilon_i \sim N(0, \Sigma)$

The benchmark model is thus a reduced form of our structural model that does not consider the interrelationship of endogenous variables and the heterogeneity of consumer preferences. In this model, the utility from expenditures is affected by demographics and past expenditures. The vector $\delta$ is the combined effect of the endogenous-variable and exogenous-variable coefficients in Equation 2. For the exogenous matrix $X_i$, we use the same variables as in the proposed model.
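To make the censoring mechanism of a Tobit model concrete, the following is a minimal univariate sketch: a latent utility is simulated, and the observed expenditure is zero whenever the latent value is negative. The coefficients, the single regressor, and the function name `simulate_censored` are illustrative assumptions and not estimates from this study; the benchmark model itself is multivariate.

```python
import random


def simulate_censored(n, intercept, slope, sigma, seed=0):
    """Simulate latent utilities y* and observed expenditures max(0, y*)."""
    rng = random.Random(seed)
    latent, observed = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # hypothetical standardized regressor
        y_star = intercept + slope * x + rng.gauss(0.0, sigma)
        latent.append(y_star)
        observed.append(max(0.0, y_star))  # censoring at zero
    return latent, observed


latent, observed = simulate_censored(10_000, intercept=0.5, slope=1.0, sigma=1.0)
censoring_pct = 100.0 * sum(1 for y in observed if y == 0.0) / len(observed)
print(f"censored observations: {censoring_pct:.1f}%")
```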


RESULTS

In this section we discuss the model fit and the parameter estimates. Using model fit statistics,

the number of the latent classes is determined. Then, for each latent class, we interpret the

interrelationship between expenditures across firms and categories, which is our main focus.

After this, we interpret effects of exogenous variables including demographics and past

expenditures. Finally, we present the prediction results for the estimation and validation sample.

Model Comparison

We estimate the proposed model (denoted S) for different numbers of latent segments.

We also estimate the benchmark model (denoted M). Variables whose 95% posterior intervals do

not contain zero are in bold.

In Table 1.2, we present several model fit statistics including marginal likelihood, AIC, BIC, and DIC. For the proposed model, we find that a 2-segment solution, denoted S2, works best. The BIC of S2 (30,551) is a reduction of 29% from the BIC of S1 (43,120), while the decrease beyond two segments is much smaller (less than a 5% difference). Other statistics also indicate that S2 is preferred. In S2, the sizes of the segments are 76% and 24%.

From Table 1.2, we can see the benefit of including expenditure interrelationships and consumer heterogeneity by comparing S2 and S1 against the benchmark models M1 and M2. First, both S2 and S1 show better model fit than M1. Furthermore, comparing the main model S2 against a benchmark model with the same number of latent classes (M2) shows that modeling interrelationships is important, because the BIC of S2 (30,551) is smaller than the BIC of M2 (30,956). Second, with respect to consumer heterogeneity, S2 and M2 show a large increase in likelihood and a large decrease in the information criteria compared to M1. That is, there is heterogeneity in consumer spending patterns, and including it substantially improves model fit.

Table 1.2. Model Fits and Reduction Percent Compared to the Benchmark Model (M1)

                      Model Fit                                Reduction %
Model  Segment Size   Marginal LL  AIC     BIC     DIC       Marginal LL  AIC     BIC     DIC
M1     100%           -21,700      42,950  43,200  107,280   -            -       -       -
M2     63/37%         -13,733      30,452  30,956  68,386    -36.7%       -29.1%  -28.3%  -36.3%
S1     100%           -21,610      42,790  43,120  106,810   -0.4%        -0.4%   -0.2%   -0.4%
S2     76/24%         -13,274      29,888  30,551  67,101    -38.8%       -30.4%  -29.3%  -37.5%
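The reduction percentages in Table 1.2 can be checked with a short calculation. This sketch assumes the reduction is defined as (model value minus M1 value) divided by the M1 value, which reproduces the tabulated figures; the dictionary layout and the name `reduction_pct` are illustrative.

```python
def reduction_pct(value, baseline):
    """Percent change of a fit statistic relative to the benchmark value."""
    return 100.0 * (value - baseline) / baseline


# Fit statistics copied from Table 1.2.
fit = {
    "M1": {"LL": -21700, "AIC": 42950, "BIC": 43200, "DIC": 107280},
    "M2": {"LL": -13733, "AIC": 30452, "BIC": 30956, "DIC": 68386},
    "S1": {"LL": -21610, "AIC": 42790, "BIC": 43120, "DIC": 106810},
    "S2": {"LL": -13274, "AIC": 29888, "BIC": 30551, "DIC": 67101},
}

reductions = {
    model: {stat: round(reduction_pct(v, fit["M1"][stat]), 1) for stat, v in stats.items()}
    for model, stats in fit.items()
    if model != "M1"
}
for model, row in reductions.items():
    print(model, row)
```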

Segment Characteristics

Hereafter, we focus on the results for S2. We profile the two segments in our proposed model S2 by examining the parameter estimates of each segment in Table 1.3. The larger segment (76%) shows inertia in usage patterns over time: the coefficients of past expenditures across firms and categories are high (over 0.7), possibly because allocation preferences across firms and categories are already in equilibrium. Meanwhile, the effects of other factors, including the interrelationships, are weaker than in the other segment. In contrast, the smaller segment (24%) shows less dependence on past expenditures (the coefficients are around 0.4) and

Table 1.3. Parameter Estimates

A) Expenditures at competing banks

                                                Habitual Segment    Adaptive Segment
LHS                 RHS                         Estimate  SD        Estimate  SD
Purchases (C)       Exogenous
                      Intercept                 0.02      0.10      2.01      0.80
                      Age                       -0.001    0.001     -0.01     0.01
                      Male (D)                  -0.01     0.01      -0.09     0.08
                      Log of Income             0.05      0.01      0.59      0.08
                      Salaried Person (D)       -0.04     0.02      -0.37     0.11
                      Credit Score              0.01      0.01      -0.31     0.12
                      Past Expenditure          0.79      0.01      0.33      0.02
                    Other Endogenous
                      Purchase C                -         -         -         -
                      Cash Advance C            0.03      0.005     0.03      0.03
                      Purchase F                -0.08     0.02      -0.30     0.04
                      Cash Advance F            -0.02     0.01      -0.31     0.05
Cash Advances (C)   Exogenous
                      Intercept                 5.05      0.26      10.19     0.91
                      Age                       -0.004    0.003     0.001     0.01
                      Male (D)                  0.06      0.04      -0.01     0.13
                      Log of Income             -0.11     0.04      -0.28     0.13
                      Salaried Person (D)       0.14      0.06      0.43      0.16
                      Credit Score              -0.82     0.04      -1.60     0.13
                    Other Endogenous
                      Purchase C                0.12      0.02      0.39      0.08
                      Cash Advance C            -         -         -         -
                      Purchase F                -0.13     0.05      -0.36     0.11
                      Cash Advance F            0.07      0.02      0.28      0.09

B) Expenditures at the focal bank

                                                Habitual Segment    Adaptive Segment
LHS                 RHS                         Estimate  SD        Estimate  SD
Purchases (F)       Exogenous
                      Intercept                 0.03      0.06      0.95      0.42
                      Age                       -0.001    0.001     -0.01     0.003
                      Male (D)                  -0.003    0.01      0.00      0.05
                      Log of Income             0.03      0.01      0.31      0.06
                      Salaried Person (D)       -0.002    0.01      -0.15     0.06
                      Credit Score              0.000     0.01      -0.12     0.06
                    Other Endogenous
                      Purchase C                -0.03     0.01      -0.13     0.03
                      Cash Advance C            -0.01     0.004     -0.09     0.02
                      Purchase F                -         -         -         -
                      Cash Advance F            0.003     0.01      0.06      0.04
Cash Advances (F)   Exogenous
                      Intercept                 1.57      0.14      4.05      0.58
                      Age                       -0.001    0.001     0.00      0.004
                      Male (D)                  0.04      0.02      0.02      0.06
                      Log of Income             -0.08     0.02      -0.02     0.06
                      Salaried Person (D)       -0.01     0.03      0.04      0.08
                      Credit Score              -0.26     0.02      -0.65     0.09
                    Other Endogenous
                      Purchase C                -0.10     0.02      -0.17     0.04
                      Cash Advance C            0.02      0.01      -0.02     0.03
                      Purchase F                0.19      0.03      0.02      0.05
                      Cash Advance F            -         -         -         -

C) Derivatives implied by coefficients of endogenous variables

                                     Purchases                              Cash Advances
Label                                Derivative        Habitual  Adaptive   Derivative        Habitual  Adaptive
(1) Across category within bank      ∂PUR_C/∂CASH_C    0.03      0.03       ∂CASH_C/∂PUR_C    0.12      0.39
                                     ∂PUR_F/∂CASH_F    0.003     0.06       ∂CASH_F/∂PUR_F    0.19      0.02
(2) Across bank within category      ∂PUR_C/∂PUR_F     -0.08     -0.30      ∂CASH_C/∂CASH_F   0.07      0.28
                                     ∂PUR_F/∂PUR_C     -0.03     -0.13      ∂CASH_F/∂CASH_C   0.02      -0.02
(3) Across bank and category         ∂PUR_C/∂CASH_F    -0.02     -0.31      ∂CASH_C/∂PUR_F    -0.13     -0.36
                                     ∂PUR_F/∂CASH_C    -0.01     -0.09      ∂CASH_F/∂PUR_C    -0.10     -0.17

- The variable CASH represents expenditure in the cash advances category and PUR represents expenditure in the purchases category.
- The subscript C represents competing banks and F represents the focal bank.

more reliance on factors such as income. In addition, expenditures in this segment are much more interrelated. Thus, we label the larger segment the habitual segment and the smaller segment the adaptive segment, because expenditures in the latter depend on income.

The characteristics of the segments in demographics and expenditure patterns are as follows. Compared to the habitual segment, consumers in the adaptive segment are slightly older (39.1 vs. 36.8), comprise more males (70.9% vs. 66.2%), include fewer salaried persons (83.9% vs. 89.5%), and have more income. The adaptive segment spends, on average, 2.3 times more in the purchases category and 1.4 times more in the cash advances category. However, there are no differences between the segments in credit score or in share of wallet in either category. In the following subsection, we discuss the effects of all factors by segment.

Effects of Endogenous Variables

Next, we interpret the coefficients of endogenous variables, which are presented in terms

of the derivatives implied by the coefficients in Table 1.3.C. These summarize how the

preference for purchases category is associated with the preference for cash advances category

and vice versa, across firms and categories.

Across categories within the same bank (Label 1), the interrelationships of the preferences between categories are positive and asymmetric. Specifically, in both segments, an increased preference for cash advances is not associated with the preference for purchases, except for the habitual segment at competing banks, where there is a weak positive impact ($\partial PUR_C / \partial CASH_C = 0.03$). In contrast, an increased preference for purchases is associated with an increased preference for cash advances. At competitors, the coefficients are significant for both segments ($\partial CASH_C / \partial PUR_C = 0.12$ and $0.39$ for the habitual and the adaptive segment, respectively). At the focal bank, the coefficient is significant only for the habitual segment ($\partial CASH_F / \partial PUR_F = 0.19$). From these results, we can see that the purchases category is more important for inducing cross-selling than the cash advances category and that the interrelationship across categories is stronger at competitors.

Across banks within the same category (Label 2), the interrelationships of the preferences for different banks depend on the category. In the purchases category, increased preference at the focal bank is associated with decreased preference at competitors ($\partial PUR_C / \partial PUR_F = -0.08$ and $-0.30$ for the habitual and adaptive segment, respectively). Similarly, increased preference at competitors is associated with decreased preference at the focal bank ($\partial PUR_F / \partial PUR_C = -0.03$ and $-0.13$, respectively). The negative interrelationship in the purchases category may arise because consumers balance expenditures between the two banks under a limited budget.

Interestingly, cash advances respond somewhat differently. Increased preference for cash advances at the focal bank is associated with increased preference for cash advances at competing firms ($\partial CASH_C / \partial CASH_F = 0.07$ and $0.28$ for the habitual and adaptive segment, respectively), but the reverse is not as strong ($\partial CASH_F / \partial CASH_C = 0.02$ for the habitual segment). Evidently, those who use the focal firm for cash advances tend to seek cash advances elsewhere as well, possibly because they cannot satisfy their cash demand at the focal bank alone. Therefore, if the focal bank can increase the limit of cash advances without risk, there are business opportunities in the cash advances category. However, those who use competing firms for cash advances are less likely to use the focal firm, possibly because there are a number of competing banks that can be sources of cash advances.

Finally, we compare the interrelationships of preferences across banks and across categories (Label 3). In general, these interrelationships are negative. Increased preference for cash advances at the focal bank by the adaptive segment is associated with reduced preference for purchases at competing banks ($\partial PUR_C / \partial CASH_F = -0.31$). Similarly, increased preference for cash advances at competing banks is associated with reduced preference for purchases at the focal bank ($\partial PUR_F / \partial CASH_C = -0.01$ and $-0.09$, respectively). With respect to the cash advances category, increased preference for purchases at the focal bank is associated with reduced preference for cash advances at competing banks ($\partial CASH_C / \partial PUR_F = -0.13$ and $-0.36$), while increased preference for purchases at competing banks is associated with reduced preference for cash advances at the focal bank ($\partial CASH_F / \partial PUR_C = -0.10$ and $-0.17$, respectively). The results show that interrelationships of expenditures also occur across banks and across categories, which cannot be discovered by a within-bank and within-category examination.

In summary, the results show that there are interrelationships between expenditures across banks and categories. Overall, the interrelationships are positive across categories at the same bank and in the cash advances category across banks. Therefore, banks can utilize the positive results for cross-selling at the same bank or for offering more cash advances in order to keep consumers using the bank's own cash advances. Meanwhile, interrelationships are negative in the purchases category across banks and in the cross-bank, cross-category case. It should be noted that those interrelationships vary by segment. The adaptive segment shows stronger interrelationships than the habitual segment. Therefore, it is necessary for firms to distinguish the segments and implement a different marketing mix based on their usage patterns. For example, it could be more effective to target the adaptive segment for cross-selling.

Effects of Exogenous Variables

From Tables 1.3.A and 1.3.B, we find large differences in the effects of exogenous variables between the two segments. Above all, expenditures of the habitual segment are heavily affected by past expenditures, while the expenditures of the adaptive segment are more affected by demographics. In particular, the differences in the coefficients of income, the salaried person indicator, and credit score are salient. We briefly present the impacts of exogenous variables by segment.

First, we start with the habitual consumers. Age, male, and salaried person do not have an impact on the preferences for either the purchases or the cash advances category at either firm. However, income has a positive impact on preferences for the purchases category (0.05 and 0.03 for competing banks and the focal bank, respectively; the same order applies hereafter in this subsection) and a negative impact on preferences for the cash advances category (-0.11 and -0.08). Credit score has negative impacts only on the preference for the cash advances category (-0.82 and -0.26). Finally, the impact of past expenditures is so strongly positive (0.74 to 1.03) that this segment is named the habitual segment.

Second, with respect to the adaptive segment, age has a negative impact only on the preference for the purchases category (-0.01), and male does not have an impact on either category. Income has a positive impact on the preference for purchases (0.59 and 0.31) and a negative impact on the preference for cash advances at competing banks (-0.28). Salaried person has a negative impact on the preference for purchases (-0.37 and -0.15) and a positive impact on the preference for cash advances at competing banks (0.43). Credit score has negative impacts on both the preference for purchases (-0.31 and -0.12) and the preference for cash advances (-1.60 and -0.65). The coefficients of past expenditures (0.33 to 0.58) are about half those for the habitual segment.

Overall, credit card usage is less affected by age or gender and more affected by income.

Higher income consumers tend to spend more in the purchases category and less in the cash

advances category. Salaried persons show various usage patterns depending on firms and

categories. Finally, consumers with high credit score are less likely to use the cash advances

category.

Prediction of the Size and Share of Wallet

After finding consumer spending patterns across firms and categories, we utilize this knowledge for predicting the size and share of wallet. We predict expenditures only at competing banks, conditional on expenditures at the focal bank, given that the firm has access to its own transactions with consumers. The size of wallet for a category is the sum of expenditures at the focal firm and expected expenditures at competing firms. The share of wallet is calculated as

$$
\text{Share of Wallet} = \frac{y_{focal}}{\text{Size of Wallet}} = \frac{y_{focal}}{y_{focal} + E(y_{competitors} \mid y_{focal})}.
$$
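The size and share of wallet definitions above translate directly into code. In this sketch the function names are illustrative, and returning `None` when the size of wallet is zero is an assumption about how an undefined share might be handled, not a rule stated in the text.

```python
def size_of_wallet(y_focal, expected_competitor):
    """Size of wallet: focal expenditure plus expected competitor expenditure."""
    return y_focal + expected_competitor


def share_of_wallet(y_focal, expected_competitor):
    """Share of wallet: focal expenditure divided by the size of wallet."""
    size = size_of_wallet(y_focal, expected_competitor)
    if size == 0:
        return None  # assumption: the share is treated as undefined
    return y_focal / size


# Hypothetical consumer: 30 spent at the focal bank, 70 expected at competitors.
print(size_of_wallet(30.0, 70.0), share_of_wallet(30.0, 70.0))
```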

After the burn-in period, we calculate the expected expenditures at competing banks. Note that we derive expected expenditures from the latent utility from expenditures at competing banks conditional on observed expenditures at the focal bank. As the analytical calculation is complicated, we use a numerical approach based on Monte Carlo integration, which is explained in detail in Appendix C.
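As a simplified illustration of Monte Carlo integration for a censored outcome, the sketch below estimates the expected expenditure E[max(0, y*)] for a single normally distributed latent utility and compares it with the known closed form for that univariate case. It is not the procedure of Appendix C, which conditions on observed focal-bank expenditures; the mean, standard deviation, and function names are assumptions chosen for illustration.

```python
import math
import random


def expected_censored_mc(mu, sigma, n_draws=200_000, seed=0):
    """Monte Carlo estimate of E[max(0, y*)] with y* ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        total += max(0.0, rng.gauss(mu, sigma))  # censor each draw at zero
    return total / n_draws


def expected_censored_exact(mu, sigma):
    """Closed form for the same expectation: mu * Phi(mu/sigma) + sigma * phi(mu/sigma)."""
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return mu * cdf + sigma * pdf


mc_value = expected_censored_mc(0.5, 1.0)
exact_value = expected_censored_exact(0.5, 1.0)
print(mc_value, exact_value)
```

In the multivariate conditional case a closed form is much less convenient, which is the motivation the text gives for the numerical approach.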

The prediction results of our proposed model (S2) and the benchmark model (M1) are presented in Table 1.4.A for the estimation sample and Table 1.4.B for the validation sample. We present the Mean Absolute Error (MAE) of the size and share of wallet between the actual expenditures and the expected expenditures. For the estimation sample, we present the results by segment to see how much segmentation improves the prediction. In each draw, we sample the segment membership from a multinomial distribution whose probabilities are the prior membership probabilities updated by the likelihood conditional on observed expenditures. We allocate each consumer to the segment in which that consumer's membership frequency across draws is higher.
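The membership step described above can be sketched as follows: prior segment probabilities are multiplied by segment-specific likelihoods and normalized, a membership is sampled in each draw, and the consumer is allocated to the most frequently sampled segment. The function names, the seed, and the likelihood values are hypothetical; only the 76%/24% prior sizes come from the text.

```python
import random
from collections import Counter


def posterior_membership_probs(prior_probs, likelihoods):
    """Update prior segment probabilities with segment-specific likelihoods."""
    weights = [p * l for p, l in zip(prior_probs, likelihoods)]
    total = sum(weights)
    return [w / total for w in weights]


def allocate_segment(prior_probs, likelihoods_per_draw, seed=0):
    """Sample a membership in each draw and return the most frequent segment."""
    rng = random.Random(seed)  # the seed is an illustrative assumption
    counts = Counter()
    for likelihoods in likelihoods_per_draw:
        probs = posterior_membership_probs(prior_probs, likelihoods)
        segment = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        counts[segment] += 1
    return counts.most_common(1)[0][0]


prior = [0.76, 0.24]  # segment sizes reported for S2
example_probs = posterior_membership_probs(prior, [0.2, 0.8])
# Hypothetical consumer whose data strongly favor segment index 1 in every draw.
allocated = allocate_segment(prior, [[1e-6, 1.0]] * 200)
print(example_probs, allocated)
```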

The results show that, in general, the proposed model, which considers the interrelationships and heterogeneity, performs better in predicting both the size and share of wallet, especially in the habitual segment. This is because the benchmark model M1 does not distinguish the interrelationships and the different preferences of consumers. Although the coefficients of past expenditures in M1 are on average 0.67, its prediction is worse than that for the adaptive segment, which does not depend heavily on past usage patterns. Specifically, in the purchases category, the reduction in MAE for the habitual segment in model S2 compared to model M1 is 20.4% for the size and 22.2% for the share of wallet. However, in the adaptive segment, the reduction is marginal for the size (2.7%) and does not occur for the share of wallet. In the cash advances category, the reduction in MAE for the habitual segment in model S2 is 25.3% for the

Table 1.4. Prediction of the Size and Share of Wallet

A) The estimation sample

Category        Type    Segment    N      M1 (MAE)   S2 (MAE)   Reduction (%)
Purchases       Size    Habitual   4412   0.341      0.271      -20.4%
                        Adaptive   1143   0.971      0.944      -2.7%
                Share   Habitual   4395   0.134      0.104      -22.2%
                        Adaptive   1141   0.121      0.127      5.0%
Cash Advances   Size    Habitual   4412   0.482      0.360      -25.3%
                        Adaptive   1143   0.899      0.806      -10.3%
                Share   Habitual   2675   0.139      0.123      -11.4%
                        Adaptive   695    0.143      0.140      -1.6%

B) The validation sample

Category        Type    Past Expenditure   N     M1 (MAE)   S2 (MAE)   Reduction (%)
Purchases       Size    Observed           995   0.519      0.506      -2.6%
                        Predicted          994   0.689      0.694      0.7%
                Share   Observed           991   0.132      0.123      -6.4%
                        Predicted          990   0.164      0.163      -0.3%
Cash Advances   Size    Observed           995   0.566      0.483      -14.7%
                        Predicted          994   0.874      0.816      -6.6%
                Share   Observed           612   0.152      0.145      -4.5%
                        Predicted          612   0.181      0.177      -2.4%

size and 11.4% for the share. In addition, in the adaptive segment, the decrease in MAE is 10.3% for the size and 1.6% for the share of wallet. In summary, the proposed model predicts better for the habitual segment by correctly estimating the effects of the factors, mainly the past expenditures, and predicts about as well as the simple model M1 for the adaptive segment, even though the latter segment does not rely heavily on past expenditures.

We also present the prediction results for the validation sample in Table 1.4.B. In practice, it may be difficult to use past expenditures at competing banks as exogenous variables. Therefore, for past expenditures at competing banks, we use both observed values, assumed to be available, and predicted values estimated by a Tobit model in which we regress past expenditures at competing banks on demographics and past expenditures at the focal bank. To calculate expected expenditures at competitors, we use every 10th draw from the posterior distribution. As we cannot determine the segment membership in the validation sample, in each iteration each consumer is randomly assigned to the habitual segment or the adaptive segment with probabilities of 76% and 24%, respectively.

The results show that the performance of the proposed model S2 is generally better than

the performance of the benchmark model M1 in that there are decreases of MAE in prediction of

the size and share for both categories. It is notable that the better performance of S2 holds even

when we use the predicted past expenditures at competitors. Considering the fact that a reduced-

form model usually predicts better than a structural model, it is meaningful that our proposed

structural model outperforms the benchmark. Therefore, the results in the validation sample

firmly show that it is necessary to consider the interrelationships and heterogeneous preferences

when firms predict the size and share of wallet.


CONCLUSION

Summary

The purpose of this paper is to better understand consumers’ spending patterns across

firms and categories in order to better predict the size and share of wallet. It considers the

interrelationship between expenditures and heterogeneity in preference, which have not been

addressed in the previous literature. Consumers’ utility maximization problem with respect to

their expenditures subject to the budget constraint derives the simultaneous equations Tobit

model. To estimate the model, we propose a Bayesian estimation method. That is, using the

MCMC algorithms, we estimate the coefficients of endogenous variables, impute the latent

endogenous variables, and estimate the coefficients of exogenous variables and variance matrix

in sequence.

With this approach, we sought to answer several research questions raised in the introduction. The first issue was what segments we can identify. We find two consumer segments: the habitual segment and the adaptive segment. The former consists of consumers whose current expenditures are closely related to past expenditures, possibly because their budget allocation preference is stable. The latter consists of consumers whose current expenditures across firms and categories are strongly interrelated and are affected by their income.


The second issue was about differences in consumer spending patterns by segment, especially the interrelationship of expenditures. We find that the interrelationships differ between the segments mainly in magnitude. In general, within-bank expenditures in the purchases and cash advances categories positively affect each other. Within a category, expenditures at the focal bank and competing banks affect each other negatively in the purchases category but positively in the cash advances category. Across banks and categories, we find generally negative usage patterns. For example, purchases at one bank are negatively related to cash advances at the other bank and vice versa.

The third issue was whether we can better predict the size and share of wallet by

considering the interrelationship of expenditures and customer heterogeneity. We compared the

size and share of wallet from our proposed model with those from the benchmark model. The

proposed model generally has lower prediction errors in both the size and share of wallet than the

benchmark model. We especially find that the prediction error reduction in the habitual segment

is large, possibly because the effects of past expenditures on the current expenditures are more

accurately estimated. In conclusion, our empirical findings show that it is important to consider

the interrelationships between expenditures and consumer heterogeneity to better predict the size

and share of wallet.

Managerial Implications

Our findings provide managers with guidelines for utilizing the interrelationship of expenditures and the heterogeneity between segments. First, from the significance and magnitude of the interrelationships, firms can implement cross-selling more accurately. For example, an increased preference for purchases is associated with an increased preference for cash advances, while the converse is not true. Therefore, a cross-selling strategy of promoting the purchases category first and then the cash advances category can be applied.

Second, managers should understand that expenditures in some categories at the focal

bank positively affect expenditure in the same categories at competing banks. For example, in

the cash advance category, expenditures at the focal bank increase expenditures at competing

banks. Without considering the reason of this positive impact, managers may unnecessarily

overspend on competitive promotions and advertising. If they can find reasons (e.g., the low

limit of cash advances) and take actions (e.g., increase the limit or offer a loan), there may be an

opportunity to capture consumers’ whole budget in the category.

Third, managers should pay attention to the adaptive segment. This segment has a larger size of wallet than the habitual segment, but the focal bank's share is not larger. As the interrelationships of this segment across firms are largely negative, if the focal bank can attract this segment more, it can obtain a higher share of wallet from both the increased expenditures at the focal bank and the decreased expenditures at competing banks. Therefore, managers need to attend to this segment and provide incentives so that they can achieve a higher share from consumers with a larger wallet.

We recommend implementing our approach as follows. Managers need to obtain information on customers' expenditures at competing firms for a sample of customers. It is necessary to obtain this information at least once in order to estimate the parameters and check whether the prediction is correct. After the managers obtain the coefficients from the sample, they can predict expenditures for the out-of-sample customers by multiplying the exogenous variables by the coefficients. Finally, they can predict the size and share of wallet with the predicted expenditures at competing firms conditional on the expenditures at the firm. The method of estimating on a sample of customers and applying the coefficients out-of-sample for prediction is found in many studies (e.g., Iyengar et al. 2003).

If the managers use the past expenditures at competing firms for the exogenous variables

for the exclusion restrictions, it would be necessary to predict those expenditures for the out-of-

sample customers. Using a multivariate Tobit model, managers can regress the past expenditures

at competing firms on other variables available at the firm for the sample of customers. Then,

they can predict those expenditures for the out-of-sample customers using the coefficients

obtained from the model. For the periods after estimation, the managers can use the expenditures

previously predicted by our model as the past expenditures in the current period.

Limitation and Future Research

As the data are limited to demographic information and past expenditures, we examined the impacts of only these variables on the current expenditures and not, for example, other marketing mix effects. We might speculate that if competitors increase advertising or provide promotions, consumers may increase expenditures at competing firms and decrease expenditures at the focal firm. Thus, future research using richer data sets could investigate the effects of the marketing mix. If panel data are available, it would also be worth investigating the change in the interrelationship, size, and share of wallet over time, and its possible drivers.


APPENDIX

A. Derivation of a Simultaneous Equations Tobit Model

We present the derivation of a Simultaneous Equations Tobit model from the utility

maximization problem with binding non-negativity constraints referring to the previous research

(i.e., Amemiya et al. 1993; Ransom 1987).

We assume that consumers maximize a quadratic utility function by allocating their budget across M firm-category combinations and the outside option. This is expressed by

(A1) $\max_y U(y) = b'y + y'Ay/2 \quad \text{s.t.} \quad 1'_{M+1}\, y \le W,$

where y is a vector of non-negative expenditures, $y = (y_0, y_{11}, y_{12}, \ldots, y_{1C}, y_{21}, \ldots, y_{MC})'$, b is a (M+1)-dimensional vector, A is a (M+1) × (M+1) negative definite matrix, and $1_{M+1}$ is a (M+1)-dimensional vector of ones.

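The Lagrangean function is $L = b'y + y'Ay/2 + \lambda (W - 1'_{M+1}\, y)$ and the necessary and sufficient Kuhn-Tucker conditions for a constrained maximum are

$\frac{\partial L}{\partial y_m} \le 0, \quad y_m \ge 0, \quad y_m \frac{\partial L}{\partial y_m} = 0,$

$\frac{\partial L}{\partial \lambda} \ge 0, \quad \lambda \ge 0, \quad \lambda \frac{\partial L}{\partial \lambda} = 0,$

where $m = 0, \ldots, M$. That is,

(A2) $\frac{\partial U}{\partial y_m} - \lambda \le 0$ and $1'_{M+1}\, y - W \le 0$.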

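We assume that $y_0 > 0$ and consequently, $\frac{\partial U}{\partial y_0} - \lambda = 0$ and $1'_{M+1}\, y - W = 0$ because of complementary slackness. Therefore, Equation A2 can be rewritten as

(A3) $\frac{\partial U}{\partial y_m} - \frac{\partial U}{\partial y_0} \le 0$ and $1'_{M+1}\, y = W$, where $m = 1, \ldots, M$.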

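We partition the matrix and vectors as

$y = \begin{pmatrix} y_0 \\ \tilde{y} \end{pmatrix}, \quad b = \begin{pmatrix} b_0 \\ \tilde{b} \end{pmatrix}, \quad A = \begin{pmatrix} a_0 & a' \\ a & \tilde{A} \end{pmatrix},$ and $\bar{a} = \begin{pmatrix} a_0 \\ a \end{pmatrix},$

where $y_0$, $b_0$, and $a_0$ are scalars.

In matrix form, Equation A3 can be written as

(A4) $[\tilde{b} + (a \;\; \tilde{A})\, y] - 1_M (b_0 + \bar{a}' y) \le 0_{M \times 1}$

Using the identity $y_0 = W - 1'_M \tilde{y}$, we express the Kuhn-Tucker conditions in Equation A4 as

(A5) $G \tilde{y} + \theta \le 0_{M \times 1}$,

where $G = \tilde{A} - a 1'_M - 1_M a' + a_0 1_M 1'_M$ and $\theta = \tilde{b} + W a - b_0 1_M - W a_0 1_M$.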

Note that $\theta$ contains the stochastic elements in the form of $(u_m - u_0)$ if we set up b with a typical element $b_m = (b^0_m + u_m)$, where $b^0_m$ is a deterministic marginal utility and $u_m$ represents individual differences in marginal utility among consumers. In a general form, $\theta$ could be made to depend on individuals' characteristics (exogenous variables) and error terms. A typical m-th equation of the Kuhn-Tucker conditions in Equation A5 could thus be written as

$$
\sum_{j=1}^{J} \gamma_{mj}\, y_j + \sum_{k=1}^{K} \beta_{mk}\, x_{mk} + \varepsilon_m
\begin{cases}
= 0 & \text{if } y_m > 0 \\
\le 0 & \text{if } y_m = 0
\end{cases}
$$

where $x_{mk}$ are the individuals' characteristics and $\varepsilon_m$ is an error term resulting from differences in marginal utility.

An alternative way of writing the conditions would be

$$
y_m =
\begin{cases}
\text{RHS} & \text{if RHS} > 0 \\
0 & \text{if RHS} \le 0
\end{cases},
\qquad
\text{RHS} = -\frac{1}{\gamma_{mm}} \left( \sum_{j \ne m}^{J} \gamma_{mj}\, y_j + \sum_{k=1}^{K} \beta_{mk}\, x_{mk} + \varepsilon_m \right),
$$

which is a typical expression of a simultaneous equations Tobit model. That is, the Kuhn-Tucker conditions for finding the optimal expenditures convert to the estimation problem of a simultaneous equations Tobit model. After standardization of the parameters, we derive a simultaneous equations Tobit model as

(A6) $\Gamma Y = X\beta + \varepsilon,$

where Y represents a vector of endogenous variables which could be censored at zero, X represents exogenous individuals' characteristics, and $\varepsilon$ is a vector of error terms following a multivariate normal distribution.

B. MCMC Algorithms for Model Estimation

Equation 2 is the main equation to estimate. Our approach is to sequentially draw $\Gamma$, $\beta$, $\Sigma$, and $y^*_i$. We first explain the basic estimation method at the aggregate level in subsections 1 through 5. Then, in subsection 6, we explain how to extend the basic estimation method to the latent class model.


1. Likelihood function

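To calculate the likelihood conditional on the other parameters, we require the distribution of $y^*_i$, which is derived from Equation 2. As $\varepsilon_i$ follows a multivariate normal distribution, we re-arrange the equation and denote the function, $g^{-1}(y^*_i)$, as follows.

(B1) $g^{-1}(y^*_i) = \varepsilon_i = \Gamma y^*_i - X_i \beta$

By the transformation technique, we get the distribution of $y^*_i$:

(B2)
$$
\begin{aligned}
f(y^*_i) &= f[g^{-1}(y^*_i)] \cdot \mathrm{abs}\left( \left| \frac{\partial g^{-1}(y^*_i)}{\partial y^{*\prime}_i} \right| \right) \\
&= f[\Gamma y^*_i - X_i \beta] \cdot \mathrm{abs}(|\Gamma|) \\
&= \frac{1}{(2\pi)^{M/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (\Gamma y^*_i - X_i \beta)' \Sigma^{-1} (\Gamma y^*_i - X_i \beta) \right] \cdot \mathrm{abs}(|\Gamma|) \\
&= \frac{1}{(2\pi)^{M/2} |\Gamma^{-1} \Sigma \Gamma^{-1\prime}|^{1/2}} \exp\left[ -\frac{1}{2} (y^*_i - \Gamma^{-1} X_i \beta)' (\Gamma^{-1} \Sigma \Gamma^{-1\prime})^{-1} (y^*_i - \Gamma^{-1} X_i \beta) \right]
\end{aligned}
$$

That is, a multivariate normal distribution, $y^*_i \sim N(\Gamma^{-1} X_i \beta,\ \Gamma^{-1} \Sigma \Gamma^{-1\prime})$, is obtained.

Assuming that each observation is independent, we can calculate the likelihood for all observations by $f(y^*) = \prod_{i=1}^{N} f(y^*_i)$.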

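2. Estimation of Coefficients of Endogenous Variables ($\Gamma$)

The matrix $\Gamma$ is estimated using a random walk chain Metropolis-Hastings method. As the diagonal elements of $\Gamma$ are 1, we need to estimate only the off-diagonal elements. Let $\tilde{\gamma}$ denote the vector consisting of the off-diagonal elements of $\Gamma$, where the dimension of $\tilde{\gamma}$ is $K_{\tilde{\gamma}} \times 1$ with $K_{\tilde{\gamma}} = M^2 - M$. We use a normal prior, i.e., $\tilde{\gamma} \sim N(\mu_{\tilde{\gamma}}, \Omega_{\tilde{\gamma}})$, and generate candidate draws according to $\tilde{\gamma}^{*} = \tilde{\gamma}^{(r-1)} + z$, where $z \sim N(0, \Sigma_z)$ and r denotes the r-th iteration. We assume $\Sigma_z = c_{\tilde{\gamma}} I_{K_{\tilde{\gamma}}}$ and choose the value of $c_{\tilde{\gamma}}$ so that the acceptance probability is around 40%, following the general rule (Koop 2003). Therefore, the candidate $\tilde{\gamma}^{*}$ is drawn from a multivariate normal distribution such as

$$\tilde{\gamma}^{*} \sim N(\tilde{\gamma}^{(r-1)}, c_{\tilde{\gamma}} I_{K_{\tilde{\gamma}}}).$$

Using the prior of $\tilde{\gamma}$ and the likelihood, we calculate the posterior probability of $\tilde{\gamma}$ as follows.

(B3) $\pi(\tilde{\gamma} \mid \beta, \Sigma, y^*) \propto \pi(\tilde{\gamma}) f(y^*)$

With Equation B3, we calculate the acceptance probability as

(B4) $\alpha(\tilde{\gamma}^{(r-1)}, \tilde{\gamma}^{*}) = \min\left\{ \dfrac{\pi(\tilde{\gamma} = \tilde{\gamma}^{*} \mid \beta, \Sigma, y^*)}{\pi(\tilde{\gamma} = \tilde{\gamma}^{(r-1)} \mid \beta, \Sigma, y^*)}, 1 \right\}$.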
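The random walk chain step above can be illustrated with a minimal Python sketch. It is not the estimation code used in this study: the target is a hypothetical one-parameter posterior (normal prior, unit-variance normal likelihood) standing in for Equation B3, and the names `log_posterior`, `random_walk_mh`, and the tuning value `c` are assumptions introduced only for illustration.

```python
import math
import random


def log_posterior(gamma, data, prior_mean=0.0, prior_var=1e3):
    """Log prior plus log likelihood up to a constant (toy stand-in for Equation B3)."""
    log_prior = -0.5 * (gamma - prior_mean) ** 2 / prior_var
    log_like = -0.5 * sum((y - gamma) ** 2 for y in data)
    return log_prior + log_like


def random_walk_mh(data, c, n_iter=5000, start=0.0, seed=0):
    """Random walk chain Metropolis-Hastings with candidate variance c."""
    rng = random.Random(seed)
    gamma = start
    current = log_posterior(gamma, data)
    draws, accepted = [], 0
    for _ in range(n_iter):
        # Candidate = previous draw + z, with z ~ N(0, c)
        candidate = gamma + rng.gauss(0.0, math.sqrt(c))
        proposed = log_posterior(candidate, data)
        # Accept with probability min(posterior ratio, 1), evaluated on the log scale
        if math.log(1.0 - rng.random()) < proposed - current:
            gamma, current = candidate, proposed
            accepted += 1
        draws.append(gamma)
    return draws, accepted / n_iter
```

In practice one would rerun the chain with different values of `c` and keep the value whose returned acceptance rate is closest to the 40% target: a smaller `c` raises the acceptance rate and a larger `c` lowers it.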

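3. Estimation of Coefficients of Exogenous Variables and Error Covariance ($\beta$ and $\Sigma$)

Once we get a new draw of $\tilde{\gamma}^{(r)}$, we construct $\Gamma^{(r)}$ by rearranging $\tilde{\gamma}^{(r)}$. We then calculate $\tilde{y}_i^* = \Gamma y_i^*$ (hereafter, we suppress the iteration number r for simplicity). We now stack all the observations together as

$$\tilde{y}^* = \begin{pmatrix} \tilde{y}_1^* \\ \vdots \\ \tilde{y}_N^* \end{pmatrix}, \quad X = \begin{pmatrix} X_1 \\ \vdots \\ X_N \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_N \end{pmatrix}$$

and write

(B5) $\tilde{y}^* = X\beta + \varepsilon$.

We assume that $\varepsilon_i$ follows $N(0, \Sigma)$ and $\varepsilon$ follows $N(0, I_N \otimes \Sigma)$.

As Equation B5 is a SUR model, we can estimate $\beta$ and $\Sigma$ by using a Gibbs sampler with standard Normal-Wishart priors. Specifically, we use a normal prior $\beta \sim N(\mu_{\beta}, \Omega_{\beta})$ and a Wishart prior $\Sigma^{-1} \sim W(v, V)$.

The posterior of $\beta$ conditional on $\tilde{y}^*$ and $\Sigma^{-1}$ is $\beta \mid \tilde{y}^*, \Sigma^{-1} \sim N(\bar{\mu}_{\beta}, \bar{\Omega}_{\beta})$, where

$$\bar{\Omega}_{\beta} = \left( \Omega_{\beta}^{-1} + \sum_{i=1}^{N} X_i' \Sigma^{-1} X_i \right)^{-1} \quad \text{and} \quad \bar{\mu}_{\beta} = \bar{\Omega}_{\beta} \left( \Omega_{\beta}^{-1} \mu_{\beta} + \sum_{i=1}^{N} X_i' \Sigma^{-1} \tilde{y}_i^* \right).$$

The posterior for $\Sigma^{-1}$ conditional on $\tilde{y}^*$ and $\beta$ is $\Sigma^{-1} \mid \tilde{y}^*, \beta \sim W(\bar{v}, \bar{V})$, where $\bar{v} = N + v$ and

$$\bar{V} = \left[ V^{-1} + \sum_{i=1}^{N} (\tilde{y}_i^* - X_i\beta)(\tilde{y}_i^* - X_i\beta)' \right]^{-1}.$$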

4. Data Augmentation

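Now, we address the censoring issue. After getting all parameters ($\Gamma$, $\beta$, and $\Sigma$), we can impute $y_i^*$ in the following way. (1) If all elements in $y_i$ are positive, there is no need to impute. (2) If all elements in $y_i$ are zero, we draw $y_i^*$ from the multivariate truncated normal distribution, $MVTN_{(-\infty,0)}(\Gamma^{-1}X_i\beta, \tilde{\Sigma})$, where $\tilde{\Sigma} = \Gamma^{-1}\Sigma\Gamma^{-1\prime}$. (3) If some elements of $y_i$ are zero, we draw the latent values from a conditional multivariate truncated normal distribution. Denote the zero $y_{im}$'s as a vector $y_u$ (unknown $y_{im}^*$'s) and the non-zero $y_{im}$'s as a vector $y_k$ (known $y_{im}^*$'s). Then, we impute $y_u^*$ from

$$MVTN_{(-\infty,0)}(\mu_{u|k}, \tilde{\Sigma}_{u|k}),$$

where $\mu_{u|k} = (\Gamma^{-1}X_i\beta)_u + \tilde{\Sigma}_{uk}\tilde{\Sigma}_{kk}^{-1}\left(y_k - (\Gamma^{-1}X_i\beta)_k\right)$ and $\tilde{\Sigma}_{u|k} = \tilde{\Sigma}_{uu} - \tilde{\Sigma}_{uk}\tilde{\Sigma}_{kk}^{-1}\tilde{\Sigma}_{uk}'$. Note that $(\Gamma^{-1}X_i\beta)_u$ is the vector of elements of $\Gamma^{-1}X_i\beta$ that corresponds to the unknown $y_{im}^*$'s, while $(\Gamma^{-1}X_i\beta)_k$ is the vector of elements of $\Gamma^{-1}X_i\beta$ that corresponds to the known $y_{im}^*$'s. Similarly, $\tilde{\Sigma}_{uu}$ is the covariance matrix among the unknown $y_{im}^*$'s, while $\tilde{\Sigma}_{kk}$ is the covariance matrix among the known $y_{im}^*$'s. In addition, $\tilde{\Sigma}_{uk}$ is the covariance matrix between the unknown $y_{im}^*$'s and the known $y_{im}^*$'s.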
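Case (3) can be sketched for the simplest setting of one unknown (censored) and one known (positive) element. This is an illustrative sketch only, not the sampler used in the study: the function names and the scalar arguments are hypothetical, and the inverse-CDF draw is one of several ways to sample a truncated normal.

```python
import random
from statistics import NormalDist


def conditional_moments(mu_u, mu_k, s_uu, s_uk, s_kk, y_k):
    """Mean and variance of the unknown latent value given the known one (scalar case)."""
    mean = mu_u + s_uk / s_kk * (y_k - mu_k)
    var = s_uu - s_uk ** 2 / s_kk
    return mean, var


def draw_truncated_below_zero(mean, var, rng):
    """Inverse-CDF draw from N(mean, var) restricted to values at or below zero.

    Assumes the probability mass below zero does not underflow to 0.0.
    """
    dist = NormalDist(mean, var ** 0.5)
    upper = dist.cdf(0.0)                  # probability mass below zero
    u = max(rng.random(), 1e-12) * upper   # uniform on (0, upper)
    return min(dist.inv_cdf(u), 0.0)       # guard against rounding above zero
```

For example, with `mu_u=0.5`, `mu_k=1.0`, unit variances, covariance 0.5, and an observed `y_k=2.0`, the conditional mean is 1.0 and the conditional variance is 0.75; every imputed value returned by `draw_truncated_below_zero` is non-positive, as the censoring requires.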

5. Prior Distribution

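We use diffuse settings for the priors on parameters as follows. Coefficients of endogenous variables: $\tilde{\gamma} \sim MVN(0_{K_{\tilde{\gamma}}}, 10^{3} I_{K_{\tilde{\gamma}}})$. Coefficients of exogenous variables: $\beta \sim MVN(0_{K_{\beta}}, 10^{3} I_{K_{\beta}})$. Variance-covariance matrix: $\Sigma^{-1} \sim W(v, V)$, where $v = M + 3$ and $V = (1/v) I_M$. Variance of the random walk chain: $c_{\tilde{\gamma}} = 10^{-5}$, which makes $\Sigma_z = 10^{-5} I_{K_{\tilde{\gamma}}}$.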

6. The latent class membership

The steps of estimating the latent class model are to determine the class of each observation in each iteration and then run the estimation of Equation 2 within the given class. Given a latent segment s, Equation 2 for consumer i belonging to the segment s can be expressed as $\Gamma_s y_{is}^* = X_{is}\beta_s + \varepsilon_{is}$, where $\varepsilon_{is} \sim N(0, \Sigma_s)$, and the likelihood function given other parameters is

$$f(y_i^* \mid p, e_i, G, B, V) = \sum_{s=1}^{S} e_{is} f(y_i^* \mid \Gamma_s, \beta_s, \Sigma_s),$$

where $e_i = (e_{i1}, \ldots, e_{iS})'$, with $e_{is} = 1$ if consumer i belongs to the segment s and $e_{is} = 0$ if not. In addition, $G = (\Gamma_1, \ldots, \Gamma_S)'$, $B = (\beta_1, \ldots, \beta_S)'$ and $V = (\Sigma_1, \ldots, \Sigma_S)'$.

p is a vector of the probabilities of the consumer belonging to the s-th class in the mixture. That is, $p = (p_1, \ldots, p_S)$ and $p_s = P(e_{is} = 1)$. For the prior of p, we set up a Dirichlet distribution $p \sim D(\alpha)$, where $\alpha = 1_S$ and $1_S$ is an S-vector of ones. As there is an identification problem in the mixture model, we impose a labeling restriction by drawing p from an ordered Dirichlet, which makes $p_{s-1} < p_s$ for $s = 2, \ldots, S$. The posterior distribution of p is $p \sim D(\bar{\alpha})$, where $\bar{\alpha} = \alpha + \sum_{i=1}^{N} e_i$.

We draw $e_i$ from the multinomial distribution, $e_i \sim M(1, p)$. The posterior distribution of $e_i$ is expressed as

$$e_i \sim M\left(1, \left[ \frac{p_1 f(y_i^* \mid \Gamma_1, \beta_1, \Sigma_1)}{\sum_{s=1}^{S} p_s f(y_i^* \mid \Gamma_s, \beta_s, \Sigma_s)}, \ldots, \frac{p_S f(y_i^* \mid \Gamma_S, \beta_S, \Sigma_S)}{\sum_{s=1}^{S} p_s f(y_i^* \mid \Gamma_s, \beta_s, \Sigma_s)} \right] \right).$$

After the membership s is determined, we select the observations belonging to class s (i.e., $y_{is}^*$ and $X_{is}$), sequentially draw the other parameters within the given class s, and repeat the steps through the last class.
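The membership draw for a single consumer can be sketched as follows. This is a minimal illustration under stated assumptions rather than the study's code: the class log densities $\log f(y_i^* \mid \Gamma_s, \beta_s, \Sigma_s)$ are taken as given inputs, and the function names are hypothetical. Working on the log scale is a design choice to avoid numerical underflow when densities are very small.

```python
import math
import random


def membership_probabilities(p, log_densities):
    """Posterior class probabilities proportional to p_s * f_s, computed on the log scale."""
    log_w = [math.log(ps) + ld for ps, ld in zip(p, log_densities)]
    largest = max(log_w)
    weights = [math.exp(lw - largest) for lw in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def draw_membership(probs, rng):
    """Draw the indicator vector e_i from a multinomial with one trial."""
    s = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return [1 if j == s else 0 for j in range(len(probs))]
```

With two equally likely classes and class densities 0.2 and 0.6 for a given consumer, the posterior membership probabilities are 0.25 and 0.75, and `draw_membership` returns an indicator vector with a single one.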

C. Monte Carlo Integration

We calculate expected expenditures at competing banks given that the focal bank has access to consumers’ expenditures at it. To do this, first, we need to consider consumers’ usage patterns at the focal bank (i.e., $y_{21}$ and $y_{22}$) such that (1) they use both categories, (2) they use one of the categories, and (3) they use neither category. Second, we need to convert the latent utility from expenditures (i.e., $E(y_{11}^* \mid y_{21}, y_{22})$ and $E(y_{12}^* \mid y_{21}, y_{22})$) to expected consumer spending (i.e., $E(y_{11} \mid y_{21}, y_{22})$ and $E(y_{12} \mid y_{21}, y_{22})$). We extend the expectation logic of a univariate Tobit model to multivariate and conditional expectation. The distribution of the latent utility is

(C1) $y_i^* \sim MVN(\Gamma^{-1}X_i\beta, \Gamma^{-1}\Sigma\Gamma^{-1\prime})$

The expected expenditures in two categories at competitors are calculated as follows.

(C2)
$$
\begin{aligned}
E[(y_{11}, y_{12})] &= \sum_{k=1}^{4} P[(y_{11}, y_{12}) \in R_k] \, E[(y_{11}, y_{12}) \mid (y_{11}, y_{12}) \in R_k] \\
&= \sum_{k=1}^{4} P[(y_{11}^*, y_{12}^*) \in R_k^*] \, E[(y_{11}^*, y_{12}^*) \mid (y_{11}^*, y_{12}^*) \in R_k^*]
\end{aligned}
$$

where $R_k$ denotes a possible range in which actual expenditures at competing banks lie. Specifically, each range is defined as $R_1: (y_{11} = 0, y_{12} = 0)$, $R_2: (y_{11} = 0, y_{12} = r_2)$, $R_3: (y_{11} = r_1, y_{12} = 0)$, and $R_4: (y_{11} = r_1, y_{12} = r_2)$, where $r_1 > 0$ and $r_2 > 0$. $R_k^*$ denotes the possible range in which the latent utility from expenditures at competing banks lies. Corresponding to $R_k$, each $R_k^*$ is defined as follows: $R_1^*: (y_{11}^* \le 0, y_{12}^* \le 0)$, $R_2^*: (y_{11}^* \le 0, y_{12}^* > 0)$, $R_3^*: (y_{11}^* > 0, y_{12}^* \le 0)$, and $R_4^*: (y_{11}^* > 0, y_{12}^* > 0)$.

Given that expenditures at the focal bank are known, we can calculate expected expenditures at competing banks conditional on expenditures at the focal bank.

(C3)
$$
\begin{aligned}
E[(y_{11}, y_{12}) \mid (y_{21}, y_{22}) \in S_l] &= E[(y_{11}, y_{12}) \mid (y_{21}^*, y_{22}^*) \in S_l^*] \\
&= \sum_{k=1}^{4} P[(y_{11}^*, y_{12}^*) \in R_k^* \mid (y_{21}^*, y_{22}^*) \in S_l^*] \, E[(y_{11}^*, y_{12}^*) \mid (y_{11}^*, y_{12}^*) \in R_k^* \text{ and } (y_{21}^*, y_{22}^*) \in S_l^*]
\end{aligned}
$$

where $S_l$ denotes observed expenditures at the focal bank and is one of $S_1: (y_{21} = 0, y_{22} = 0)$, $S_2: (y_{21} = 0, y_{22} = s_2)$, $S_3: (y_{21} = s_1, y_{22} = 0)$, or $S_4: (y_{21} = s_1, y_{22} = s_2)$, where $s_1 > 0$ and $s_2 > 0$. $S_l^*$ denotes the possible range in which the latent utility from expenditures at the focal bank lies. Corresponding to $S_l$, each $S_l^*$ is defined as follows: $S_1^*: (y_{21}^* \le 0, y_{22}^* \le 0)$, $S_2^*: (y_{21}^* \le 0, y_{22}^* = s_2)$, $S_3^*: (y_{21}^* = s_1, y_{22}^* \le 0)$, and $S_4^*: (y_{21}^* = s_1, y_{22}^* = s_2)$.

As it is difficult to calculate Equation C3 analytically, we use Monte Carlo integration and describe the steps as follows.

Step 1. Randomly draw $(y_{11}^*, y_{12}^*)$ N times. In the case of $S_1: (y_{21} = 0, y_{22} = 0)$, we draw from the multivariate normal distribution in Equation C1. In other cases, we draw from a multivariate normal distribution conditional on the positive values of $y_{21}$ and $y_{22}$.

Step 2. For the probability part in Equation C3, calculate the ratio of the number of draws with $(y_{11}^*, y_{12}^*) \in R_k^*$ and $(y_{21}^*, y_{22}^*) \in S_l^*$ to the number of draws with $(y_{21}^*, y_{22}^*) \in S_l^*$.

Step 3. For the expectation part, calculate the average of the draws $(y_{11}^*, y_{12}^*)$ that belong to $R_k^*$ given $S_l^*$.
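The three steps can be sketched in Python for the simpler unconditional case of Equation C2. This is a minimal sketch under stated assumptions, not the study's implementation: the latent mean `mu` and covariance `sigma` are hypothetical inputs, all function names are introduced here for illustration, and non-positive latent draws are censored to zero when averaging, which is one reading of the conversion from latent utility to spending. The helper `tobit_mean` gives the closed-form univariate Tobit expectation as a check.

```python
import math
import random


def draw_bivariate_normal(mu, sigma, rng):
    """Draw one latent pair using a hand-written 2x2 Cholesky factor."""
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 ** 2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2


def region(y1, y2):
    """Index k of the latent range: 1 = (<=0, <=0), 2 = (<=0, >0), 3 = (>0, <=0), 4 = (>0, >0)."""
    return 1 + (1 if y2 > 0 else 0) + (2 if y1 > 0 else 0)


def expected_expenditure(mu, sigma, n_draws=100_000, seed=0):
    """Region probabilities and expected censored spending by Monte Carlo integration."""
    rng = random.Random(seed)
    counts = {k: 0 for k in (1, 2, 3, 4)}
    sums = {k: [0.0, 0.0] for k in (1, 2, 3, 4)}
    for _ in range(n_draws):                      # Step 1: draw latent values
        y1, y2 = draw_bivariate_normal(mu, sigma, rng)
        k = region(y1, y2)
        counts[k] += 1
        sums[k][0] += max(y1, 0.0)                # censor at zero
        sums[k][1] += max(y2, 0.0)
    probs = {k: counts[k] / n_draws for k in counts}   # Step 2: probability part
    expect = [0.0, 0.0]
    for k in counts:                              # Step 3: region averages, weighted
        if counts[k]:
            for m in (0, 1):
                expect[m] += probs[k] * sums[k][m] / counts[k]
    return probs, expect


def tobit_mean(mu, sd):
    """Closed-form E[max(y*, 0)] for y* ~ N(mu, sd^2)."""
    z = mu / sd
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return mu * cdf + sd * pdf
```

The conditional version in Equation C3 would follow the same pattern, with the draws in Step 1 taken from the distribution conditional on the focal bank's observed values.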


REFERENCES

Amemiya, Takeshi (1974), “Multivariate Regression and Simultaneous Equation Models when the Dependent Variables Are Truncated Normal,” Econometrica, 42 (6), 999-1012.

Amemiya, Takeshi, Makoto Saito, and Keiko Shimono (1993), “A Study of Household Investment Patterns in Japan: An Application of Generalized Tobit Model,” The Economic Studies Quarterly, 44 (1), 13-28.

Baumann, Chris, Suzan Burton, and Greg Elliott (2005), “Determinants of Customer Loyalty and Share of Wallet in Retail Banking,” Journal of Financial Services Marketing, 9 (3), 231-48.

Bowman, Douglas and Das Narayandas (2004), “Linking Customer Management Effort to Customer Profitability in Business Markets,” Journal of Marketing Research, 41 (November), 433-47.

Cameron, A. Colin and Pravin K. Trivedi (2005), Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.

Carlin, Bradley P. and Thomas A. Louis (2000), Bayes and Empirical Bayes Methods for Data Analysis. Boca Raton: Chapman & Hall.

Chen, Yuxin and Joel H. Steckel (2005), “Modeling Credit Card 'Share of Wallet': Solving the Incomplete Information Problem,” working paper, New York University, NY.

Cooil, Bruce, Timothy L. Keiningham, Lerzan Aksoy, and Michael Hsu (2007), “A Longitudinal Analysis of Customer Satisfaction and Share of Wallet: Investigating the Moderating Effect of Customer Characteristics,” Journal of Marketing, 71 (January), 67-83.

Du, Rex Yuxing, Wagner A. Kamakura, and Carl F. Mela (2007), “Size and Share of Customer Wallet,” Journal of Marketing, 71 (April), 94-113.

Iyengar, Raghuram, Asim Ansari, and Sunil Gupta (2003), “Leveraging Information Across Categories,” Quantitative Marketing and Economics, 1 (4), 425-65.

Kamakura, Wagner, Michel Wedel, Fernando de Rosa, and Jose Afonso Mazzon (2003), “Cross-selling through Database Marketing: A Mixed Data Factor Analyzer for Data Augmentation and Prediction,” International Journal of Research in Marketing, 20 (1), 45-65.

Koop, Gary (2003), Bayesian Econometrics. Hoboken, NJ: Wiley.

Li, Kai (1998), “Bayesian Inference in a Simultaneous Equation Model with Limited Dependent Variables,” Journal of Econometrics, 85 (2), 387-400.

Li, Shibo, Baohong Sun, and Ronald T. Wilcox (2005), “Cross-Selling Sequentially Ordered Products: An Application to Consumer Banking Services,” Journal of Marketing Research, 42 (May), 233-39.

Maddala, G. S. (1986), Limited-Dependent and Qualitative Variables in Econometrics. New York, NY: Cambridge University Press.

Malthouse, Edward C. and Paul Wang (1998), “Database Segmentation Using Share of Customer,” Journal of Database Marketing, 6 (3), 239-52.

Murphy, Kevin M. and Robert H. Topel (1985), “Estimation and Inference in Two-Step Econometric Models,” Journal of Business and Economic Statistics, 20 (1), 88-97.

Ransom, Michael R. (1987), “A Comment on Consumer Demand Systems with Binding Non-negativity Constraints,” Journal of Econometrics, 34, 355-59.

Reinartz, Werner J. and V. Kumar (2003), “The Impact of Customer Relationship Characteristics on Profitable Lifetime Duration,” Journal of Marketing, 67 (January), 77-99.

Reinartz, Werner J., Jacquelyn S. Thomas, and V. Kumar (2005), “Balancing Acquisition and Retention Resources to Maximize Customer Profitability,” Journal of Marketing, 69 (January), 63-79.

Verhoef, Peter C. (2003), “Understanding the Effect of Customer Relationship Management Efforts on Customer Retention and Customer Share Development,” Journal of Marketing, 67 (October), 30-45.

Yang, Sha, Vishal Narayan, and Henry Assael (2006), “Estimating the Interdependence of Television Program Viewership Between Spouses: A Bayesian Simultaneous Equation Model,” Marketing Science, 25 (4), 336-49.

Zheng, Zhiqiang, Peter S. Fader, and Balaji Padmanabhan (2009), “Inferring Competitive Measures Using Augmented Site-Centric Data,” working paper, University of Texas at Dallas, TX.


CHAPTER 2

SEARCH PATTERNS, SEARCH-BASED SEGMENTATION AND SEARCH RESULTS

OF AUTOMOBILE PURCHASERS

Sungha Jang






ABSTRACT

Consumers often search several information sources when making purchase decisions. In this

paper, we study how time spent searching one source is interrelated with time spent searching

other sources using data on new automobile purchases. In this category, information sources

include different offline sources, Internet websites, spouse and internal search. We build a

structural model assuming that consumers allocate their search time across several information

sources to maximize utility and estimate the relationships among information sources. Then, we

segment consumers based on their search preferences and examine brand choices and price-

related results. We find that consumers use information sources in a complementary manner and

that the dealer is still a prominent source. We also find that at the segment level, except for two

segments, an inverted U-shaped relationship exists between internal and external search. Brand

choice analysis reveals that low external search is associated with a choice of American brands.

Finally, segments achieve different price negotiation times and discounts but display similar

satisfaction with price paid. Based on these results, we provide recommendations to automakers.

Keywords: Search; Segmentation; Structural model; Simultaneous equations Tobit model;

Brand choice.


INTRODUCTION

When consumers search for information to make purchase decisions, they often use several

information sources (Ratchford et al. 2003, 2007). For example, when purchasing an automobile,

consumers may do internal search based on their past experiences and satisfaction and do

external search by talking with acquaintances, reading independent reviews or going to

manufacturer and dealer sources either on the Internet or offline. If married, they will probably

also ask their spouses. Search can affect consumers’ brand choices and their ability to negotiate

better prices (Ratchford et al. 2003). Therefore, it is necessary to understand consumers’ search

patterns, identify different segments of searchers, and examine their search results. This

understanding can help to budget the billions of advertising and promotional dollars spent by

automobile manufacturers and dealers more effectively.

While a number of research papers deal with search across multiple information sources,

this paper addresses some issues that have received less attention. Consider, for example, the

following two observations from automobile search:

First, despite the proliferation of information sources and an average sticker price of

about $27,000, half of the consumers in our dataset reported an external search time of less than

eight hours. A possible explanation for this may be that these consumers rely less on external

sources because they possess information from internal search. Therefore, it would be useful to

see how consumers allocate their time on the different external and internal information sources.

Our proposed structural model allows us to consider the relationship between the entire set of


information sources including internal, offline, the Internet and spouses, unlike existing studies.

The model builds on work that studied the relationship between internal search, or product

knowledge, and external search but with fewer sources and limited interrelationships (e.g., Rao

and Sieben 1992; John, Scott and Bettman 1986). Similarly, our proposed model extends the

studies by Ratchford et al. (2003) and Klein and Ford (2003) that examined the impact of the

Internet on offline sources only, by looking at the reverse impact as well. Thus, this paper

contributes to extending the literature on search by examining interrelationships of all

information sources that consumers use.

Second, we observe that 40% of the consumers in the dataset did not use the Internet for

search even in 2005, although the Internet was widely available by then (i.e., 93% of respondents

had some access to the Internet) and studies by Ratchford et al. (2003, 2007) have found that in

automobile purchases, Internet search has substituted significantly for traditional sources. A

possible explanation is that different individuals search differently, with some segments having

lower search preference for the Internet than others. This possibility could be studied by a

segmentation analysis based on search preference. This paper examines search-based

segmentation of automobile buyers. The prevailing benchmark is Furse, Punj, and Stewart

(1984), who classified buyers into six segments, but prior to the emergence of the Internet as an

information source. It will be relevant to compare our segmentation results against Furse et al.

(1984) and see what changes have occurred. On the topic of segments, while the effects of search

on price-related outcomes such as the price discount have been examined (e.g., Viswanathan,

Kuruzovich, Gosain, and Agarwal 2007), brand choice for search-based segments has been

largely unexplored and it is undertaken in this paper. Therefore, the search-based segmentation


contributes to the literature by comparing similarities and differences between segments before and after the emergence of the Internet. In addition, the relationship between search-based segments and brands shows why firms need different approaches to provide their consumers with the necessary information.

We next provide a brief overview of the model and results of this paper. The information

sources in our dataset are listed in Table 2.1.

Table 2.1. Information Sources in the Automobile Purchases Category

                                  Offline sources              Internet sources
Internal Search
External Search   Personal        Friends / Relatives          Online chat sites
                  Independent     Consumer Reports             Third-party websites, independent websites
                  Manufacturer    Brochures, advertisements    Manufacturer websites
                  Dealer          Dealer, showroom             Dealer websites
                  Experiential    Test-driving                 N/A
Spousal Search

We assume that consumers have limited discretionary time in which to search for information

and do outside activities. They further allocate their search time among different external sources

based on their preferences to maximize their utility. Due to the limited search time, an increase in

the search time on a source can lead to an increase or decrease in the search time on other

sources. These changes may depend on consumer characteristics and product attributes. We build

a structural model and estimate the impact of each information source and consumers’ characteristics on other information sources. Figure 2.1 depicts a conceptual view of these relationships. In the model, positive search time is the revealed preference, whereas in the case of zero search, we estimate the latent preferences for the external information sources by data augmentation.

Next, we segment consumers based on their estimated search preferences for information sources. We compare and contrast the segments by their brand choices as well as their price-related outcomes (e.g., negotiation time, discount and final price satisfaction). The relationship between segments and brand choice is insightful for developing the proper communication strategies for automobile makers.

Figure 2.1. Conceptual Model
Exogenous variables: internal search, internal search squared, spouse search, spouse search × male, year 2003, year 2005, male, age, education years, hourly wage, log of sticker price.
Endogenous variables: personal, independent (I*), manufacturer (I*), dealer (I*), independent, manufacturer, dealer, test-driving.
*: I represents the Internet source.


Our main results are as follows: First, we find that the interrelationships between

preferences for information sources in automobile purchases are generally positive. However,

consumers who prefer to spend more time with the dealer spend less time on other information

sources including Internet sources. Another finding is that Internet search is generally associated

with increased offline search except for search in the manufacturer websites, which substitutes

search in offline independent, manufacturer, and dealer sources. In addition, relationships

between the Internet sources are complementary to each other. We also find that internal search

reduces external search, perhaps explaining why some customers have short search times, but we

do not find an overall inverted U-shaped relationship between internal search and each external

information source in this part of the analysis. Finally, the buyer’s external search increases with

spouse search.

Second, we identify nine segments of automobile buyers using the individual search

preference for each source. Segment-wise, the relationship between internal and external search

is, in general, inverted U-shaped, indicating that consumers with low or high internal search do

less external search than those with moderate internal search. But two segments do not exhibit

this pattern. In one there is low internal search but high external search, while the other has

moderate internal search but very high external search. In addition, when we overlay our

segments on those of Furse et al. (1984), several differences emerge, which may be expected

given that our segmentation is influenced by Internet usage. For example, the proportion of

searchers relying on acquaintances has decreased while the proportion of independent searchers

has increased due to easy access to information via the Internet.


Third, we examine the brand choices of the segments. Overall, the low external search

segments tend to choose American brands, while high external search segments, which use the

Internet, tend to choose Japanese or European brands. This result indicates that manufacturers

need to consider different advertising strategies (e.g., enhancing the brand loyalty vs. providing

more information) depending on their customers’ search patterns. For the price-related results,

we find that the different segments have different negotiation time, discount amount, and

discount rate but the same satisfaction levels with the price paid.

The rest of this paper is organized as follows: In the next section, we discuss the relevant

literature and our contribution. After this, we introduce the search time allocation model and the

estimation methodology. We then describe the dataset and present empirical results. Finally, we

provide conclusions, managerial implication and directions for future research.


LITERATURE REVIEW

This study is related to four research streams: search time allocation, relationships of information

sources, segmentation of automobile buyers, and the outcomes of search. Table 2.2 describes the

positioning of this study in comparison to other closely related studies on automobile purchases.

Regarding, first, search time allocation, Hauser, Urban, and Weinberg (1993) propose an

allocation model of search time given a budget constraint. They calculate the value of positive

and negative information on four different sources, viz., showroom, interview, articles, and

advertisements. They find that the values of sources are different in the search order, i.e.,

showroom and interviews had higher values than other sources. Differences from the current

research are that their study was conducted in a laboratory setting, with fewer information

sources and with limited search time, and that the interrelationship between different sources was

not tested. Ratchford et al. (2003) propose a utility maximization model in which consumers

decide on the total search time and allocate it between different sources by taking into account

the gains and losses from search. Their focus is on the impact of the Internet on the share of

offline sources in searching. Our paper extends their examination of the effect of the Internet by

differentiating between search on different types of Internet websites (i.e., independent,

manufacturer, and dealer) and for different segments of consumers. These different websites

have different characteristics. That is, independent websites (e.g., Consumer Reports) cover

many brands and have prices and users’ opinions. Manufacturer websites provide detailed

information on their own models, and dealer websites provide selling prices and transaction services. Therefore, different websites potentially have different effects on consumer search behavior. Ratchford et al. (2007) extend the model in Ratchford et al. (2003), allowing more general assumptions, but do not distinguish different types of websites. We also examine the interrelationships between information sources, which are not fully dealt with in these papers and fall into the next research stream.

Table 2.2. Comparison of Studies Related to Automobile Purchases

1. Purpose
   Furse et al. (1984): Classify the buyers into segments from the perspective of the buyers and dealers
   Hauser et al. (1993): Find a benefit/cost model of how consumers allocate time and of the value of sources
   Ratchford et al. (2003): Study the effects of Internet search on the offline sources and total time
   Zettelmeyer et al. (2006): Study how the Internet reduces the price
   This Study (2010): Study search patterns, segments, and the search results

2. Data
   Furse et al. (1984): Survey of recent buyers
   Hauser et al. (1993): Experiment
   Ratchford et al. (2003): Survey of recent buyers
   Zettelmeyer et al. (2006): Survey of recent buyers and real transaction data
   This Study (2010): Survey of recent buyers

3. Model
   Furse et al. (1984): Cluster analysis
   Hauser et al. (1993): Allocation of time on source j: $\max \; v_0(t_0) + \sum_{j=1}^{J} v_j(t_j)$ s.t. $t_0 + \sum_{j=1}^{J} t_j = T$, with $v_j(t_j) = \alpha_j (1 - \exp(-t_j/\tau_j))$ (v: value, t: search time)
   Ratchford et al. (2003): Allocation of time on source j: $\max B = g - w \sum_j t_j$, with $g = 1 - \exp[-(S + \sum_j a_j \ln t_j)]$ (g: search gain, S: prior information, t: search time)
   Zettelmeyer et al. (2006): Regression: $\ln(price_i) = \alpha X_i + \beta D_i + \gamma S_i + \varepsilon_i$ (X: transaction data, D: demographics, S: survey response)
   This Study (2010): Allocation of time on source j (y vector): $U(y) = b'y + y'Ay/2$ (y: a vector of time spent on source j's)

4. Variables a. Personal b. Independent c. Manufacturer d. Dealer e. Experiential f. Indep. (I*) g. Manuf. (I) h. Dealer (I) i. Internal j. Spouse

√ √ √ √ √

√ (indirect) √

√ √ √

√ √ √ √ √

√ (aggregated) √ (aggregated) √ (aggregated) √ (indirect)

√ √ √ √ √ √ √

√ √ √ √ √ √ √ √ √ √

5. Relationship among search sources

√ (share of j) √

6. Heterogeneity in searchers √ (clusters) √

7. Search results a. Brand b. Price

√ (Partial)

√

√

√ √

*: I represents the Internet source.

The second research stream deals with interrelationships between offline search, Internet

search, internal search, and spouse search. The impact of the Internet on the offline sources has

been investigated. For example, Ratchford et al. (2003, 2007) find that Internet search

moderately reduces the share of third-party print and severely reduces the share of the dealer.

Similarly, Klein and Ford (2003) find that active shoppers who were planning to purchase a car

within the next six months in 2000, increased the share of Internet sources but decreased the

share of dealer visits compared to buyers who already bought in the previous year, 1999. In

contrast, there is little study on the effects of the offline sources on Internet search. In our data,

even in 2005, almost half the consumers did not search the Internet when purchasing an

automobile. Possibly those consumers rely on offline sources. Therefore, we examine how

offline sources affect Internet search, which is absent in existing automobile studies.

There are several studies on the relationship between internal search, measured as

knowledge and experience, and external search (e.g., Guo 2001), but the theory is still unsettled.

Moorthy, Ratchford, and Talukdar (1997) find that the amount of search has an inverted U-shaped relationship with experience, which means that external search is low for consumers with low or high knowledge, and high for those with moderate knowledge. They explain that consumers with low experience are not able to make fine distinctions between alternatives and therefore have little


incentive to search and consumers with high experience have relatively little uncertainty about

alternatives and thus do not externally search a lot, while consumers with an intermediate

experience have partially differentiated brand perceptions and hence a greater incentive to

search. In contrast, Rao and Sieben (1992) find a U-shaped relation between prior product

knowledge and search amount. Some researchers report a positive relationship between

subjective knowledge and search (e.g., Srinivasan and Ratchford 1991) while others find a

negative relationship between product familiarity and the search (e.g., Russo and Leclerc 1994).

We test for the inverted U-shaped relationship in two ways, first by looking at the significance of

the quadratic form of internal search in parameter estimation and second by looking at external

search time by the segments sorted by the degree of internal search. Segment level analysis helps

to resolve some of the above inconsistencies. Separately, we also examine search time with the

spouse. Yang, Narayan, and Assael (2006) find that spouses have a positive impact on their

partners’ viewership of TV programs. Because the husband and wife are likely to visit dealers

and perform other search activities together, one would expect to find a positive relationship

between spouses in automobile purchases.

The third stream of interest is the segmentation of automobile buyers. Furse et al. (1984)

identified six segments from the perspective of buyers and dealers. However, segmentation of

automobile buyers needs to be revisited because many consumers currently search on the

Internet, which was unavailable at the time of Furse et al.’s study. In addition, their paper did not

focus on the brand choice or price-related outcomes for different search-based segments.

The fourth literature stream deals with the results of search. Much of it focuses on the

effects of search on price discounts. For example, Zettelmeyer et al. (2006) examine how search


on the different types of websites reduces the price paid. Viswanathan et al. (2007) find that price

information from Internet search reduces the paid price but that product information from

Internet search increases it. However, it is also informative to understand the relationship

between search and brand choices, as we shall show.

To conclude, our study extends the existing literature by investigating the

interrelationship of a large number of information sources used in automobile search, segmenting

consumers on the basis of their search preference across multiple sources, which includes

Internet sources, and examining the results of search on brand choices as well as price-related

outcomes, by segment.


MODEL AND METHODOLOGY

As shown in Table 2.1, information sources used by automobile buyers can be grouped into:

internal search, various external search sources, and spouse search. We assume that internal search and spouse search are less time-constrained for buyers than external search. Therefore,

consumers maximize their utility by allocating a limited time across external sources, with these

allocations possibly influenced by internal search, spouse search, consumer demographics, and

product attributes.

We set up the utility maximization problem as follows: Consumers allocate

Jjy j ,1, , of their time to search in each of J external information sources, and time 0y on

other activities including work and leisure. They maximize utility by allocating a total time of T.

This is expressed by

(1) 0 1 2

0 1 2, , ,

01

max ( , , , , )

. . .

JJy y y y

J

jj

U y y y y

s t y y T

Jy( , ),

As in previous studies, the functional form for utility is selected such that the marginal gains

from search are diminishing (e.g., Hauser et al. 1993 and Ratchford et al. 2003). We use a

quadratic utility function, which has this property, specifically,

$U(y) = b'y + y'Ay/2$,

where y is the vector of search times and other activities, $y = (y_0, y_1, y_2, \ldots, y_J)'$, b is a (J+1)-dimensional vector, and A is a (J+1) × (J+1) negative definite matrix. Here, b and A can be

interpreted as proportional to the mean and the variance-covariance matrix of the unit return to

search time and other activities (Amemiya, Saito and Shimono 1993).
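
The corner solutions that this constrained problem can produce may be illustrated numerically. The following is a minimal sketch, not the estimation procedure used in this study: it maximizes the quadratic utility under the time budget by projected gradient ascent. The values of b, A, and T, the function names, the step size, and the iteration count are all hypothetical choices made for illustration.

```python
def project_to_budget(v, total):
    """Euclidean projection of v onto {y : y >= 0, sum(y) = total}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - total) / k
        if u_k - candidate > 0:  # holds for the leading (largest) elements only
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def maximize_quadratic_utility(b, A, total, step=1.0, iterations=500):
    """Projected gradient ascent on U(y) = b'y + y'Ay/2 under the time budget."""
    n = len(b)
    y = [total / n] * n  # start from an equal split of the time budget
    for _ in range(iterations):
        # Gradient of U for symmetric A is b + Ay.
        gradient = [b[j] + sum(A[j][k] * y[k] for k in range(n)) for j in range(n)]
        y = project_to_budget([y[j] + step * gradient[j] for j in range(n)], total)
    return y


# Hypothetical values: index 0 is other activities, 1 and 2 are two sources.
b = [5.0, 3.0, 0.5]
A = [[-0.5, 0.0, 0.0],
     [0.0, -0.5, 0.0],
     [0.0, 0.0, -0.5]]  # negative definite, so marginal gains diminish
allocation = maximize_quadratic_utility(b, A, total=10.0)
# allocation is approximately [7.0, 3.0, 0.0]: source 2 is not searched at all.
```

In this example the marginal return to the second source at zero search time (0.5) is below the shadow value of time (1.5), so the optimal search time is exactly zero. This is the type of corner solution that gives rise to the censoring discussed next.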

For this type of maximization problem of a quadratic utility and linear constraint,

Ransom (1987) and Amemiya et al. (1993) have proved that the Kuhn-Tucker conditions yield a

simultaneous equations Tobit model, first introduced by Amemiya (1974). That is, for each

consumer i, the structural form of a simultaneous equations Tobit model is given by

(2) $\Gamma y_i^* = X_i \beta + \varepsilon_i$,

where the vector of endogenous variables $y_i^*$ affect each other and are affected by the exogenous variables $X_i$ and a vector of error terms $\varepsilon_i$. Note that other activities $y_0$ and the total time T are absorbed into error terms and coefficients of variables during transformation.

The endogenous vector $y_i^*$ can be defined as the vector of latent search preference for J information sources. For consumer i and information source j, the relationship between the search preference and the search time is given by,

$y_{ij} = \begin{cases} y_{ij}^* & \text{if } y_{ij}^* > 0 \\ 0 & \text{if } y_{ij}^* \le 0 \end{cases}$

where $y_{ij}^*$ is consumer i’s search preference for the information source j and $y_{ij}$ is the observed

search time on it. That is, we assume that search time on the information source j is related to the

search preference for the source j. If the search preference for an information source exceeds a

threshold utility, which we scale to zero, the consumer searches that information source and the

observed positive search time is treated as the preference itself. However, if the search

preference for the source is less than the threshold, the consumer does not search that source and

the observed time is zero.
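
The censoring rule above can be written as a short function. This is an illustrative sketch only; the function name and the latent preference values are hypothetical.

```python
def observed_search_time(latent_preference, threshold=0.0):
    """Map a latent search preference to the observed search time."""
    # Above the threshold (scaled to zero) the preference is observed as time;
    # at or below it the source is not searched and the observed time is zero.
    return latent_preference if latent_preference > threshold else 0.0


latent = [2.4, -1.1, 0.0, 0.7]  # hypothetical latent preferences for four sources
observed = [observed_search_time(value) for value in latent]
share_of_sources_used = sum(time > 0 for time in observed) / len(observed)
```

Here the second and third sources are censored at zero, so only half of the sources show a positive search time even though all four have a defined latent preference.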

The matrix $X_i$ contains the vectors of exogenous variables (e.g., demographics and product attributes) for all the information sources in the following form.

$X_i = \begin{pmatrix} x_{i1}' & 0 & \cdots & 0 \\ 0 & x_{i2}' & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_{iJ}' \end{pmatrix}$,

where $x_{ij}$ is a $k_j$-vector containing the i-th observation of the vector of explanatory variables in the j-th equation. In each $x_{ij}$ vector, there are variables common in all J equations and a unique

variable in each j equation. We use the unique variables as exclusion restrictions to identify the

system.
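
The block-diagonal layout of the exogenous variables can be sketched as follows. The function name and the numerical values are hypothetical; the sketch only shows how the common variables and one unique variable per equation (the exclusion restriction) are arranged, assuming the unique variable is placed last in each block.

```python
def build_design_matrix(common, unique):
    """Stack x_ij' = (common variables, unique_j) into a block-diagonal matrix."""
    blocks = [list(common) + [value] for value in unique]
    total_columns = sum(len(block) for block in blocks)  # K = sum of k_j
    rows, offset = [], 0
    for block in blocks:
        row = [0.0] * total_columns
        row[offset:offset + len(block)] = block  # place x_ij' in its own columns
        rows.append(row)
        offset += len(block)
    return rows


# Hypothetical consumer: two common variables and three information sources,
# each with its own helpfulness rating as the unique variable.
X_i = build_design_matrix(common=[1.0, 0.3], unique=[4.0, 5.5, 2.0])
```

Each row of the result has nonzero entries only in the columns belonging to its own equation, so multiplying by the stacked coefficient vector applies a separate coefficient vector to each information source.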

$\Gamma$ is a J × J matrix whose diagonal elements are one and off-diagonal elements are coefficients of other endogenous variables. This matrix expresses the interrelationship of the endogenous variables. $\beta$ is a K × 1 vector of coefficients of the form $(\beta_1', \ldots, \beta_J')'$, where $K = \sum_{j=1}^{J} k_j$. The vector $\beta$ shows the impact of the exogenous variables on the endogenous variables. Finally, $\varepsilon_i$ follows a normal distribution $N(0, \Sigma)$.

Estimating the model involves simultaneity and censoring issues. Simultaneity occurs

since, due to limited time, change in the allocation of search time to one source leads to changes

in all the others. The censoring issue occurs because consumers do not use all information

sources. For example, in our data, almost half the respondents did not use the Internet. To handle

these issues in estimation, we adopt the method of Jang, Prasad, and Ratchford (2010). This method, briefly described in the Appendix, is to sequentially draw $\Gamma$, $\beta$, $y_i^*$, and $\Sigma$ given other parameters using MCMC algorithms.

Next, we classify automobile buyers into segments. To segment, we use the latent search

preference $y_i^*$ for each information source, the internal and spouse search times, and a dummy

variable for whether the consumer used the Internet or not. To handle both continuous variables

and a discrete variable (the Internet usage dummy), a two-step clustering method is used. We

then profile the segments using the variables used in the clustering analysis and adding

demographic and transactional variables.

Finally, we investigate the search results by segment. To find the relationship between

the segments and their brand choices, we use a correspondence analysis and multinomial logit

analysis. Correspondence analysis is an exploratory data analysis technique for the graphical

display of contingency tables (e.g., see Hoffman and Franke 1986). Besides brand choices, we

also look at the price-related outcomes (negotiation time with the dealer, discount amount,

discount rate, and the final price satisfaction) using ANCOVA.

DATA AND ESTIMATION

Data Description

The database contains consumers’ search behavior on automobiles purchased in 2001,

2003, and 2005; the data on purchases in 2001 was used by Ratchford et al. (2003, 2007). The

more recent datasets, which follow the same general format and survey procedure, have not been

used in studies to date. Details of the data collection procedures are given in Ratchford et al.

(2003, 2007). Table 2.3 shows the descriptive statistics of some important variables.

Automobile buyers reported their search time on various information sources and the helpfulness of each source. They also reported their previous purchase experience, from which we construct the internal search variable, and their spouse's search time. In addition, they provided information on their new car, such as brand and price. For time-variant

monetary variables such as hourly wage, sticker price, or discount, we took 2005 as the base year

and adjusted the monetary values in 2001 and 2003 by the proper inflation factors of 10% and

6%, respectively.
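
The adjustment to 2005 dollars can be sketched as below. The sketch assumes that the stated inflation factors are applied multiplicatively (1.10 for 2001 and 1.06 for 2003), which is an interpretation rather than a detail given in the text; the function name and the example amount are hypothetical.

```python
# Assumed multiplicative factors that convert survey-year dollars to 2005 dollars.
INFLATION_TO_2005 = {2001: 1.10, 2003: 1.06, 2005: 1.00}


def to_2005_dollars(amount, survey_year):
    """Express a monetary value from a given survey year in 2005 dollars."""
    return amount * INFLATION_TO_2005[survey_year]


# Hypothetical sticker price of $20,000 reported in the 2001 survey.
adjusted_price = to_2005_dollars(20000.0, 2001)
```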

To see the interrelationships between search preference for different information sources

and the effects of other exogenous factors on them, we ran the simultaneous equations Tobit

model given by Equation 2. For simplified interpretation, we rearrange the terms as,

Table 2.3. Descriptive Statistics of Major Variables

A) Search Time

                               Search Time (hour)              Helpfulness*
Information Sources            Mean     SD        User (%)     Mean    SD
Offline   Personal             1.51     (2.52)     69.7%       3.67    (1.98)
          Independent          1.81     (3.20)     63.8%       3.43    (2.05)
          Manufacturer         1.51     (2.40)     75.1%       3.32    (1.71)
          Dealer               2.66     (2.68)    100.0%       4.65    (1.44)
          Test Driving         1.53     (2.85)     88.3%       5.15    (1.90)
Internet  Independent          1.07     (2.96)     36.6%       2.12    (1.74)
          Manufacturer         1.08     (3.06)     45.4%       2.81    (2.18)
          Dealer               0.45     (1.30)     28.2%       2.19    (1.86)
Buyer Total Search            11.61    (11.40)    100.0%
Spouse                         3.94     (6.99)     63.5%

*: Helpfulness is on a 1-7 scale where 7 is very helpful.

B) Demographics

Variables                      Mean (or %)    SD
Age                               46.88       (13.02)
Male (Dummy variable)              0.54        (0.50)
Married (Dummy variable)           0.70        (0.46)
Education Years                   15.40        (2.82)
Hourly Wage ($)                   23.36       (19.36)
Sticker Price ($)              27164.76     (7752.15)
Discount ($)                    2864.34     (2565.89)

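(3) $\begin{pmatrix} y_{i1}^* \\ y_{i2}^* \\ \vdots \\ y_{i8}^* \end{pmatrix} = \begin{pmatrix} Xc_i'\beta_1 \\ Xc_i'\beta_2 \\ \vdots \\ Xc_i'\beta_8 \end{pmatrix} + \begin{pmatrix} \delta_1\, helpful_{i1} \\ \delta_2\, helpful_{i2} \\ \vdots \\ \delta_8\, helpful_{i8} \end{pmatrix} + \begin{pmatrix} 0 & \gamma_{12} & \cdots & \gamma_{18} \\ \gamma_{21} & 0 & \cdots & \gamma_{28} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{81} & \gamma_{82} & \cdots & 0 \end{pmatrix} \begin{pmatrix} y_{i1}^* \\ y_{i2}^* \\ \vdots \\ y_{i8}^* \end{pmatrix} + \begin{pmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \vdots \\ \varepsilon_{i8} \end{pmatrix}$.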

Note that the matrix $\Gamma$ in Equation 2, which represents the interrelationship between endogenous variables, is converted into the $\gamma$-matrix in the right hand side of Equation 3. Therefore, the $\gamma$'s in the j-th equation are interpreted as effects of search preference for other information sources ($y_{i,-j}^*$) on search preference for the j-th information source ($y_{ij}^*$).

Search preference for external information sources makes up the endogenous variables ($y_{ij}^*$). The sources can be divided into offline and Internet sources. Search preference for offline sources includes personal (talking to a friend or relative, $y_{i1}^*$), independent (reading magazines such as Consumer Reports, $y_{i2}^*$), manufacturer (reading brochures, $y_{i3}^*$), dealer (visiting the dealer, $y_{i4}^*$), and experiential (test-driving, $y_{i5}^*$) sources. Search preference for Internet sources includes visiting independent websites ($y_{i6}^*$), manufacturer websites ($y_{i7}^*$), and dealer websites ($y_{i8}^*$). Therefore, there are eight endogenous variables corresponding to the search times on eight external information sources.

For the common exogenous variables, $Xc_i$, we include internal search and spouse search. We include the square of internal search in order to see whether there is an inverted U-shaped relationship between internal search and external search. We also consider an interaction effect between spouse search and gender to see whether men consult less with their spouse than women. To control for observable consumer heterogeneity, we include age, male, education years, hourly wage, and log of sticker price. In particular, education years and hourly wage are expected to be related to search patterns, as more educated people may search less because of their higher search productivity and higher income people may search less due to higher opportunity costs. To capture unobservable heterogeneity over years, we use year dummies for year 2003 and year 2005. For exclusion restrictions, we use the helpfulness of each information source.

As internal search cannot be directly observed, it is operationalized indirectly. Of the

available variables in the data, according to Bettman (1979), the degree of internal search is

related to whether the consumers buy from the same maker, satisfaction with the previous

product, and the number of purchases in 10 years. Furthermore, in the existing automobile

studies, experience and knowledge, which induce internal search, are measured by total

purchases, satisfaction with the previous car, or time since last purchase (Moorthy et al. 1997;

Punj and Staelin 1983; Srinivasan and Ratchford 1991). We conduct a principal component

analysis in order to construct a measure from these variables, which represents internal search.

Table 2.4 shows the results of the principal component analysis. The first principal

component has weights of 0.53 to 0.62 on each variable, capturing the common characteristic of

each variable, and it explains 44% of the original information. Thus, we use the first principal

component as the measure of internal search.
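
As an illustration of how the first principal component turns the three variables into a single internal search measure, the sketch below applies the PC1 weights reported in Table 2.4. It assumes that the three variables are standardized before the weights are applied, which the text does not state explicitly; the function name, the dictionary keys, and the example respondent are hypothetical.

```python
import math

# PC1 weights as reported in Table 2.4 (rounded to two decimals there).
PC1_WEIGHTS = {
    "same_maker": 0.62,
    "previous_satisfaction": 0.57,
    "purchases_in_10_years": 0.53,
}


def internal_search_score(standardized):
    """Weighted sum of standardized variables using the PC1 weights."""
    return sum(PC1_WEIGHTS[name] * standardized[name] for name in PC1_WEIGHTS)


# The weight vector should have (close to) unit length, up to rounding.
weight_norm = math.sqrt(sum(w * w for w in PC1_WEIGHTS.values()))

# Hypothetical respondent, expressed in standard deviations from the mean.
respondent = {"same_maker": 1.0, "previous_satisfaction": 0.5, "purchases_in_10_years": -0.2}
score = internal_search_score(respondent)
```

Because all three weights are positive and of similar size, a respondent who is above average on any of the three variables receives a higher internal search score.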

We exclude a few respondents who were outliers, namely, those reporting search times

on each offline external source exceeding 30 hours, or spouse search times of over 50 hours, or

zero time with the dealer. After excluding outliers, the number of respondents was 647 in 2001,

510 in 2003, and 550 in 2005, making up 37.5%, 30.1%, and 32.4% of the sample, respectively.

Table 2.4. Results of Principal Component Analysis

Variables                                   PC1*     PC2      PC3
Same maker indicator (Dummy variable)       0.62    -0.13    -0.77
Previous car satisfaction                   0.57    -0.60     0.56
Number of new car purchases in 10 years     0.53     0.79     0.30
Proportion Explained                        44%      29%      27%

*: PCk means the k-th principal component.

For the missing values in certain variables for some respondents, we replaced them with the

mean of the rest of the respondents in the same year.

Estimation

We estimate the parameters of Equation 2 by a Metropolis-Hastings-within-Gibbs sampler and impute the latent endogenous variables $y_j^*$ using the data augmentation method. The MCMC algorithms follow those described by Jang et al. (2010). We review the estimation method and the choice of priors in the Appendix.
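
The data augmentation step can be illustrated with a simplified, single-equation sketch. For a source that the consumer did not use, the latent preference is drawn from a normal distribution truncated to be non-positive; for a source that was used, the observed time is taken as the preference. This is not the algorithm of Jang et al. (2010), which conditions on the other equations of the system; the function names, the conditional mean, and the standard deviation below are hypothetical.

```python
import random
from statistics import NormalDist


def draw_censored_latent(mean, sd, rng):
    """Draw from N(mean, sd^2) truncated to (-inf, 0] by inverting the CDF."""
    dist = NormalDist(mean, sd)
    upper_probability = dist.cdf(0.0)        # probability mass at or below zero
    u = rng.uniform(0.0, upper_probability)  # uniform over the admissible range
    u = min(max(u, 1e-12), 1.0 - 1e-12)      # keep inv_cdf within its open domain
    return min(dist.inv_cdf(u), 0.0)


def impute_latent(observed_time, mean, sd, rng):
    """Return the latent preference implied by one observed search time."""
    if observed_time > 0:
        return observed_time                 # uncensored: preference is observed
    return draw_censored_latent(mean, sd, rng)


rng = random.Random(0)
draws = [impute_latent(0.0, mean=1.0, sd=2.0, rng=rng) for _ in range(1000)]
```

Every imputed value for a non-user is at or below the threshold of zero, which is what keeps the augmented data consistent with the censoring rule.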

The MCMC algorithms are run twice. In the first run, we took 25,000 draws and discarded the first 10,000 as a ‘burn-in’ period. The remaining 15,000 draws were used to calculate the variance of the coefficients of the endogenous variables, used for the candidate drawing in the second run. The second run made 25,000 draws, of which the first 10,000 were discarded and the remaining 15,000 were used to calculate the posterior distributions of all the coefficients. We verified that the algorithm converged by inspecting the draws for stable trends and examining their distributions. The acceptance rate for the second run is about 0.47. For $y_i^*$, we saved the last 1,000 imputed values and averaged them.
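
The handling of the draws described above can be sketched as follows. The chain used here is artificial and serves only to show the bookkeeping (discarding the burn-in draws, summarizing the retained draws, and averaging the last saved imputations); the function names are hypothetical.

```python
from statistics import mean, stdev


def posterior_summary(draws, burn_in):
    """Discard the burn-in draws and summarize the remaining ones."""
    kept = draws[burn_in:]
    return len(kept), mean(kept), stdev(kept)


def average_last(values, count=1000):
    """Average the last `count` saved values, e.g. imputed latent preferences."""
    return mean(values[-count:])


# Artificial chain: 10,000 unsettled draws followed by 15,000 stable ones.
chain = [5.0] * 10000 + [0.2, 0.4] * 7500
kept_count, posterior_mean, posterior_sd = posterior_summary(chain, burn_in=10000)
latent_estimate = average_last(chain, count=1000)
```

In this artificial example the early draws would pull the mean far from the stable region if they were kept, which is the reason for discarding them.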

RESULTS

The estimation results of Equation 3 are reported in three parts. The first part deals with the

estimation of the simultaneous equations Tobit model, which reveals the interrelationships

between search preferences in different sources and the effects of exogenous variables. The

second part deals with the results of the segmentation analysis, which classifies consumers based

on their search patterns. The third part deals with examining the segments’ brand choices and

price-related outcomes. In the results, we emphasize interpretation of the statistically significant

variables whose 95% posterior intervals do not contain zero.
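
The significance criterion can be expressed as a small check on the posterior draws of a coefficient. The sketch below uses an equal-tailed interval computed from the 2.5th and 97.5th percentiles; the text does not specify how the intervals were computed, so this choice is an assumption, and the draws shown are artificial.

```python
from statistics import quantiles


def posterior_interval(draws, cuts=40):
    """Equal-tailed 95% interval: the first and last of 39 cut points (2.5%, 97.5%)."""
    cut_points = quantiles(draws, n=cuts)
    return cut_points[0], cut_points[-1]


def is_significant(draws):
    """True when the 95% posterior interval does not contain zero."""
    lower, upper = posterior_interval(draws)
    return lower > 0.0 or upper < 0.0


# Artificial draws: one coefficient clearly positive, one centered near zero.
positive_draws = [0.10 + 0.0001 * k for k in range(2000)]
centered_draws = [-0.5 + 0.001 * k for k in range(1000)]
```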

Relationships between Information Sources

Recall that in the simultaneous equations Tobit model of Equation 3, the $\gamma$-matrix on the right-hand side represents the effect on search preference for a given source of search preference for other sources. The vector $\beta$ represents the effects of the internal search, spouse search, and other observable demographics on search preferences. We begin by discussing results from the estimation of the $\gamma$-matrix, given in Table 2.5, dividing them into effects of search preference for Internet sources and effects of search preference for offline sources on each information source. After that, we will discuss the results from the estimation of $\beta$, given in Table 2.6.

Effects of the Internet sources. The upper right side of Table 2.5 shows the effects of search preference for Internet sources on search preference for offline sources. Overall, search preference for independent websites and dealer websites is positively correlated with search preference for offline information sources, while search preference for manufacturer websites is negatively correlated with it. Specifically, search preference for independent websites is positively correlated with search preference for offline independent sources ($\gamma = 0.222$), dealer sources ($\gamma = 0.07$), and the experiential source ($\gamma = 0.174$). Increased search preference for dealer websites is associated with increased search preference for the offline personal, manufacturer, and dealer sources ($\gamma = 0.253$, $0.162$, and $0.072$, respectively). The complementary effects exhibited above may have been driven by consumers’ desire to verify the online search information, or to use it more effectively, with offline search information. In contrast, search preference for manufacturer websites is negatively correlated with search preference for offline independent sources ($\gamma = -0.198$), manufacturer sources ($\gamma = -0.06$), and dealer sources ($\gamma = -0.055$). The substitution pattern is meaningful not only because it shows which Internet source substitutes for traditional offline information sources but also because it implies that manufacturers who provide proper information on their websites can reduce consumers’ extended offline search.

Table 2.5. Interrelationship of the External Search Sources

                        RHS: Offline Sources                                                            RHS: Internet Sources
LHS                     Personal        Independent     Manufacturer    Dealer          Experiential    Independent     Manufacturer    Dealer
Offline  Personal                       0.127 (0.06)    -0.05 (0.13)    -1.384 (0.26)   0.288 (0.06)    0.069 (0.06)    -0.042 (0.09)   0.253 (0.11)
Offline  Independent    0.176 (0.06)                    0.315 (0.11)    -0.521 (0.12)   -0.016 (0.09)   0.222 (0.05)    -0.198 (0.08)   0.163 (0.11)
Offline  Manufacturer   0.107 (0.04)    0.16 (0.03)                     -0.634 (0.23)   0.197 (0.08)    0.043 (0.03)    -0.06 (0.03)    0.162 (0.05)
Offline  Dealer         0.025 (0.03)    0.03 (0.03)     -0.006 (0.05)                   0.189 (0.03)    0.07 (0.03)     -0.055 (0.03)   0.072 (0.03)
Offline  Experiential   0.071 (0.07)    0.066 (0.07)    -0.073 (0.11)   -1.605 (0.22)                   0.174 (0.06)    -0.031 (0.08)   0.04 (0.1)
Internet Independent    -0.101 (0.12)   0.346 (0.1)     -0.055 (0.16)   -2.209 (0.46)   0.407 (0.12)                    0.155 (0.11)    0.228 (0.15)
Internet Manufacturer   0.108 (0.07)    -0.148 (0.08)   0.14 (0.21)     0.092 (0.35)    0.028 (0.15)    0.204 (0.05)                    0.209 (0.09)
Internet Dealer         0.07 (0.04)     0.087 (0.04)    -0.063 (0.08)   -0.537 (0.24)   0.034 (0.1)     0.041 (0.03)    0.193 (0.03)

Note: Variables in bold are significant at the 95% level and numbers in parentheses are standard deviations.


The interrelationships between Internet sources also show complementary effects (in the lower right side of Table 2.5). Search preference for independent websites is positively correlated with search preference for manufacturer websites ($\gamma = 0.204$). Search preferences for the manufacturer websites and the dealer websites are positively correlated with each other ($\gamma = 0.193$ and $0.209$, respectively). Though there is no direct association between search preference for independent websites and dealer websites, there is an indirect association because search preference for independent websites is associated with search preference for manufacturer websites, which is in turn associated with search preference for dealer websites. Thus, we can conclude that Internet users tend to search all types of websites together.
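
The size of this indirect association can be gauged with a rough calculation from the Table 2.5 estimates. The sketch below multiplies the two coefficients along the path from independent websites to manufacturer websites to dealer websites. Treating the product as the indirect effect is a first-order approximation that ignores feedback loops in the simultaneous system; it is offered as an illustration and is not a calculation reported in this study. The variable names are hypothetical.

```python
# Coefficients from Table 2.5 (row = left-hand-side source, column = right-hand-side source).
effect_independent_on_manufacturer = 0.204  # manufacturer websites row, independent websites column
effect_manufacturer_on_dealer = 0.193       # dealer websites row, manufacturer websites column

# First-order path product: independent -> manufacturer -> dealer websites.
indirect_effect = effect_independent_on_manufacturer * effect_manufacturer_on_dealer
```

The product is about 0.039, similar in size to the direct coefficient of independent websites on dealer websites in Table 2.5 (0.041), which the text above treats as no direct association.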

Effects of the offline sources. The interrelationships of the offline sources in the upper left side of Table 2.5 are positive in general, i.e., we find that consumers use the offline information sources as complements. Search preference for personal sources is positively correlated with search preference for independent sources and manufacturer sources ($\gamma = 0.176$ and $0.107$). Also, search preference for independent sources is positively correlated with search preference for personal sources and manufacturer sources ($\gamma = 0.127$ and $0.16$), while search preference for manufacturer sources is positively correlated with search preference for independent sources only ($\gamma = 0.315$). Finally, search preference for experiential sources (test-driving) is positively correlated with search preference for personal, manufacturer, and dealer sources ($\gamma = 0.288$, $0.197$, and $0.189$). Therefore, it can be concluded that consumers prefer to search multiple offline information sources.

An exception exists with search preference for offline dealer sources, which has a strongly negative effect on search preference for other offline sources ($\gamma = -1.384$, $-0.521$, $-0.634$, and $-1.605$ for the personal, independent, manufacturer, and experiential sources, respectively). That is, if consumers have a higher preference for searching at offline dealer sources, their preference for searching other sources decreases. It is interesting that search preference for offline dealer sources is also negatively correlated with search preference for experiential sources. One reason may be that offline dealer sources include taking a look at the showroom, which enables consumers to obtain product information (e.g., style, seat comfort, etc.) without driving. The implication of these findings is that the dealer can supply effective and comprehensive information to consumers.

Effects of search preference for offline sources on search preference for Internet sources vary depending on the offline source (in the lower left side of Table 2.5). Search preference for offline independent sources is positively correlated with search preference for independent websites and dealer websites ($\gamma = 0.346$ and $0.087$). Similarly, search preference for experiential sources is positively correlated with search preference for independent websites ($\gamma = 0.407$). An explanation for these positive effects may be that consumers tend to seek information across offline and Internet sources. It is notable that the effect of offline independent sources is positive on independent websites but negative on the manufacturer websites ($\gamma = -0.148$). Recalling that the effect of independent websites is positive on offline independent sources, while the effect of the manufacturer websites is negative, offline independent sources appear to compete not against their Internet version but against manufacturer websites.

We also find that the effects of search preference for offline dealer sources on search preference for independent websites ($\gamma = -2.209$) and dealer websites ($\gamma = -0.537$) are negative.

Thus, though the Internet is widely available and popular for searching, it has not replaced the

traditionally important information source, the dealer, for a large proportion of buyers. However,

it is notable that search preference for offline dealer sources is not negatively correlated with

search preference for manufacturer websites. Along with the results that search preference for

manufacturer websites replaces search preference for offline sources, no effect of search

preference for offline dealer sources on search preference for manufacturer websites shows that

manufacturers’ website information plays an important role in consumers’ search behavior.

The findings from the interrelationships between information sources can be summarized

as follows. First, many positive coefficients show that consumers prefer to use various

information sources. Second, negative effects of search preference for offline dealer sources on

search preference for other offline and online sources show that the dealer is still an important

source in the Internet era. Third, manufacturer websites are the most important Internet sources

in consumer search since search preference for manufacturer websites is negatively correlated

with search preference for offline sources, while search preference for manufacturer websites is

not negatively correlated with search preference for offline dealer sources. The latter generally

reduces search preference for other sources. Fourth, the Internet versions of information do not

necessarily replace the traditional counterparts, as there are positive interrelationships between

some information sources (e.g., independent sources). Rather, different types of information sources in different formats can compete with each other. For example, search preference for the offline

independent source is negatively correlated with search preference for manufacturer websites

and vice versa.

Effects of Internal search and spouse search. The effects of exogenous variables are given in Table 2.6. Here, we discuss the effects of internal search, spouse search and other exogenous variables. We find in Table 2.6 that the extent of internal search significantly reduces search preference for external sources except offline manufacturer sources, independent websites, and dealer websites. The reduction is especially large in the offline personal and experiential sources ($\beta$ is about $-0.40$ or lower). Thus, consumers who have a high level of internal search reduce external search. However, none of the coefficients of the squared internal search is significant. Therefore, we do not find any evidence of the inverted U-shaped relationship between internal search and external search in parameter estimation.

Table 2.6. Effects of the Exogenous Variables

A) Effects of exogenous variables on search preference for offline sources

Exogenous               Endogenous: Offline Sources
                        Personal        Independent     Manufacturer    Dealer          Test Driving
Intercept               0.211 (5.34)    -9.878 (4.18)   1.236 (3.28)    -1.891 (2.52)   -12.173 (5.71)
Internal                -0.398 (0.13)   -0.281 (0.1)    -0.122 (0.09)   -0.172 (0.06)   -0.497 (0.13)
Internal²               -0.044 (0.09)   -0.07 (0.07)    -0.018 (0.06)   -0.066 (0.04)   -0.059 (0.09)
Spouse                  0.158 (0.03)    0.091 (0.02)    0.152 (0.03)    0.092 (0.01)    0.195 (0.04)
Spouse × Male           0.048 (0.04)    0.017 (0.03)    -0.042 (0.02)   0.025 (0.02)    0.072 (0.04)
Year 2003               -0.161 (0.31)   0.615 (0.26)    -0.107 (0.21)   -0.149 (0.15)   0.377 (0.35)
Year 2005               -0.389 (0.31)   0.238 (0.25)    -0.521 (0.2)    -0.016 (0.15)   0.586 (0.34)
Male                    -0.755 (0.31)   0.004 (0.26)    0.086 (0.21)    -0.22 (0.15)    -0.82 (0.34)
Age                     -0.023 (0.07)   -0.043 (0.06)   -0.016 (0.05)   0.018 (0.04)    0.048 (0.08)
Education years         -0.084 (0.05)   -0.042 (0.04)   -0.098 (0.04)   -0.06 (0.03)    -0.05 (0.06)
Hourly wage             -0.013 (0.01)   -0.01 (0.01)    -0.002 (0.01)   -0.009 (0.004)  -0.02 (0.01)
Log of sticker price    0.301 (0.51)    0.891 (0.39)    0.096 (0.32)    0.435 (0.23)    1.228 (0.54)
Helpfulness             1.02 (0.08)     1.12 (0.06)     0.571 (0.05)    0.175 (0.02)    0.94 (0.09)

B) Effects of exogenous variables on search preference for online sources

Exogenous               Endogenous: Online Sources
                        Independent     Manufacturer    Dealer
Intercept               -16.19 (8.36)   -7.74 (5.4)     2.808 (3.52)
Internal                -0.345 (0.21)   -0.275 (0.14)   -0.131 (0.09)
Internal²               -0.123 (0.14)   -0.041 (0.09)   -0.105 (0.06)
Spouse                  0.253 (0.06)    -0.006 (0.05)   0.065 (0.03)
Spouse × Male           0.036 (0.06)    0.056 (0.04)    0.029 (0.02)
Year 2003               1.665 (0.53)    0.901 (0.35)    -0.491 (0.23)
Year 2005               0.978 (0.51)    0.878 (0.35)    0.141 (0.22)
Male                    -0.333 (0.51)   0.086 (0.34)    -0.031 (0.22)
Age                     0.061 (0.12)    0.042 (0.08)    0.037 (0.05)
Education years         0.099 (0.08)    0.065 (0.06)    -0.061 (0.04)
Hourly wage             -0.006 (0.01)   0.009 (0.01)    -0.01 (0.01)
Log of sticker price    1.195 (0.79)    0.052 (0.51)    -0.399 (0.34)
Helpfulness             1.782 (0.15)    1.194 (0.1)     0.752 (0.05)

Note: Variables in bold are significant at the 95% level and numbers in parentheses are standard deviations.

The effect of spouse search is positive on search preference for all the external sources

except for manufacturer websites (the $\beta$'s range between 0.065 and 0.253). Note that there is no

interaction effect between spouse search and gender across all information sources. Therefore,

regardless of the gender of buyers, it seems that buyers and their spouses search together.

Effects of other exogenous variables. There are various effects of other exogenous variables in Table 2.6. We find that the year of the survey had an effect on search in external sources. Compared to the base year 2001, consumers searched more in offline independent sources in 2003 ($\beta = 0.615$) and searched less on offline manufacturer sources in 2005 ($\beta = -0.521$). Among Internet sources, consumers searched more on independent websites ($\beta = 1.665$ in 2003) and manufacturer websites ($\beta = 0.901$ in 2003 and $0.878$ in 2005). However, they reduced their search time on dealer websites in 2003 ($\beta = -0.491$). The positive sum of coefficients reflects the increase in total external search time in 2003 and 2005 compared to the base year 2001.

The results in Table 2.6 also show the effects of demographics and product attributes. Male buyers searched less on offline personal sources ($\beta = -0.755$) and experiential sources ($\beta = -0.82$). However, there is no effect of age on search preference for any information source. With respect to search cost-related variables, consumers with more years of education had a lower search preference for the offline manufacturer sources and dealer sources ($\beta = -0.098$ and $-0.06$), and consumers with a higher hourly wage searched less in offline dealer sources and experiential sources ($\beta = -0.009$ and $-0.02$). It is also found that a high sticker price increases search preference for offline independent sources and experiential sources ($\beta = 0.891$ and $1.228$, respectively). Finally, as might be expected, the effect of the helpfulness of each information source is positive.

Search Based Segments

Our second part of results is obtained from the segmentation analysis. By estimating

Equation 2, we obtained the latent search preference ($y_{ij}^*$) for each external information source.

We then segment the consumers in terms of search related variables including the latent search

preferences, internal search and spouse search, as well as a dummy variable of whether the

customer used the Internet or not. As mentioned previously, we used a two-step cluster analysis

for handling both continuous variables related to search and the discrete dummy variable. We

selected the number of clusters based on BIC and the size of the segments. We used a nine-

segment solution because the decrease in BIC is marginal after nine segments and none of the

segments is too small to be managerially relevant. Figure 2.2 shows the search times across the

segments.

In Figure 2.2, the x-axis represents the extent of the internal search and the y-axis represents the offline, the Internet, and spouse search times. The figure lists the segments from S1 to S9 for reference and gives the size of each segment. We sorted nine segments by the degree of internal search. The first three segments are low internal search segments (i.e. the levels of internal search lie between -1.21 and -0.44), the next three segments are moderate internal search segments (the levels of internal search lie between -0.24 and 0.05), and the last three segments are high internal search segments (the levels of internal search lie between 0.35 and 1.20). Note that some segments use the Internet (S1, S4, S5, S7, and S8) while other segments do not use the Internet (S2, S3, S6, and S9).

[Figure: "Search Time by Segments". x-axis: Internal Search, with segment levels -1.21 (S1), -0.79 (S2), -0.44 (S3), -0.24 (S4), -0.18 (S5), 0.05 (S6), 0.35 (S7), 1.03 (S8), and 1.20 (S9); y-axis: External Search Time, with ticks from 0 to 30; series: Offline, Internet, Spouse.]

Segments (size, share)     Segment Label
S1 (273, 16.1%)            Lowest internal searcher
S2 (192, 11.3%)            Lowest external searcher
S3 (247, 14.6%)            Low internal but moderate offline searcher
S4 (55, 3.2%)              Highest external searcher
S5 (114, 6.7%)             High searcher in online/offline sources
S6 (75, 4.4%)              High searcher in offline sources
S7 (286, 16.9%)            Moderate searcher
S8 (189, 11.1%)            High internal and low external searcher
S9 (266, 15.7%)            Most experienced and loyal searcher

Figure 2.2. Search Based Segments (S1 to S9) and Their Search Times in Hours

Figure 2.2 reveals two interesting results that are new to the literature. First, we find, at the segment level, the inverted U-shaped relationship between the internal search and external search if we put aside segments S1 and S4. That is, the level of external search increases in the

low internal search segments up to the segment S5 and S6 (the segments with moderate internal

search) and then decreases in the high internal search segments. It is worth remarking that the

theoretical inverted U-shaped relationship, consistent with the view of Moorthy et al. (1997), is obtained at the segment level, but not in aggregate. The other interesting result is the presence of two off-pattern segments: S1 and S4. Members of S1, the lowest internal search segment, do moderate external search

(12.2 hours). The reason could be that these consumers want to compensate for their lack of

knowledge by external search. Segment S4 is a niche segment (3.2%) with moderate internal

search (-0.24) but long external search (43.9 hours). The reasons could be that they are efficient

enough to process the external information more or that they enjoy external search based on their

current internal knowledge. Uncovering these segments is important because, depending on their

size, they can mask the inverted U-shaped relationship. If segments S1 and S4 are relatively

large, it may make the relationship between internal and external search negative.

In Table 2.7, we provide a labeling and profiling of the segments by looking at their

search patterns and descriptive characteristics including demographics. After this, we also

discuss how our segmentation results compare against those of Furse et al. (1984).

Segment S1 (size n=273, 16.1%) is the second largest and characterized by the lowest

internal search. Though their internal search level is lowest, their external search is moderate

(12.2 hours). Segment S2 (n=192, 11.3%) also does low internal search and is characterized by

its lowest external search time (3.48 hours) and no Internet usage. Segment S3 (n=247, 14.6%)

consists of the moderate offline external searchers. This segment shows a higher level of internal

search and external search than S2.

Segment S4 (n=55, 3.2%) is characterized by the highest external search time (43.9

hours). It is notable that their external search is extremely high compared to the degree of their

internal search. Segment S5 (n=114, 6.7%) and Segment S6 (n=75, 4.4%) consist of consumers who use offline sources for a long time (21.2 and 25.5 hours, respectively) and whose spouse search is also high (15.1 and 16.1 hours, respectively). The two segments differ in that S5 uses Internet sources while S6 does not, and in that S6 spends the longest time on test-driving (7.39 hours).

Segments S7 (n=286, 16.9%), S8 (n=189, 11.1%), and S9 (n=266, 15.7%) do relatively high internal search. Across these segments, external search time decreases as the degree of internal search increases. In particular, S9 does the highest internal search but very little external search (3.9 hours). In addition, this segment does not use the Internet at all.

Results relating segments and demographics are as follows. Older consumers are more

likely to belong to high internal search and low external search segments compared to younger

consumers. Females with low internal search or males with high internal search are likely to

belong to low external search segments. Employed consumers do not necessarily belong to lower

search segments than those who are unemployed. Highly educated consumers do not necessarily

belong to the high internal search segments but are likely to belong to the high external search

segments. Consumers with high hourly wages do not seem to reduce their search time because

some high wage segments search more than low wage ones.

We overlay our segments on those of Furse et al. (1984) in Table 2.7. We find that S1 matches their Self-Reliant Shopper cluster, in that its members spend a certain amount of time on search but do not involve other people much. S2 matches their Purchase Pal Assisted cluster, whose members are the least experienced car shoppers and get help from others. S3 and S7 are similar to their Moderate Search cluster. S4 matches their High Search cluster of consumers who spend the greatest amount of time in search activity. S5 and S6 are similar to their Retail Shopper cluster, which involves many decision makers, especially the wife, in the search process. S8 and S9 match their Low Search cluster of those who have prior purchase experience but spend less time. Roughly, therefore, search-based segments in the Internet era match those in the pre-Internet era.

Some differences, however, are also found in the Internet era. For example, while the proportion of Purchase Pal Assisted has decreased (19% vs. 11.3%), that of Self-Reliant Shopper has increased (12% vs. 16.1%). In addition, a new segment has emerged (S5, 6.7%), which is similar to their Retail Shopper cluster but uses the Internet as an additional information source. These changes possibly occurred because many consumers now get information directly from the Internet without other people’s help, which may even have led to a new searcher type.

Search Results

The third and final part of results pertains to the effects of search on brand choices and

price-related outcomes. First, we look at the brand choices of the search-based segments, which

is new to the literature. We examine the relationship first by a correspondence analysis and then by a logit model analysis. Then, we look at price-related outcomes such as price negotiation time, discount amount and rate, and final price satisfaction by segment.

Table 2.7. Description of Segments

| Segments (Size, %) | Search Pattern | Demographics | Similar Group in Furse et al. (1984) |
|---|---|---|---|
| S1 (273, 16.1%) | Lowest internal searcher. Their internal search is the lowest. They seem to make up for their lack of knowledge with moderate external search. | The youngest segment (average age 39.1 years). They are highly educated (16.3 years), employed (88%), and paid an hourly wage of $25.1. They buy cars for the first time or change models. | Self-Reliant Shopper (12%) |
| S2 (192, 11.3%) | Lowest external searcher. Their internal search is low, and their external search is also low. They are the lowest searchers and do not use the Internet. | Average age is 49.1 years. The female proportion is higher (54%). The marriage rate is 64%. Their education level (14.4 years) and hourly wage ($19.3) are lower than those of others. They do not have much experience in automobile purchases. | Purchase Pal Assisted (19%) |
| S3 (247, 14.6%) | Low internal but moderate offline searcher. Their internal search is relatively low, but their external search is larger than that of the other low internal search segment (S2). They do not use the Internet. | Average age is 50.2 years. They have fewer years of education (14.6 years) and are less employed (66%) and lower paid ($20.1) than others. They do not have much purchase experience. | Moderate Searcher (32%) |
| S4 (55, 3.2%) | Highest external searcher. They do moderate internal search but extremely high external search. This is a niche segment. | Average age is 42.5 years. They are less often married (60%) but more educated (16.5 years), more employed (82%), and higher paid ($27) than others. | High Searcher (5%) |
| S5 (114, 6.7%) | High searcher in online/offline sources. They do high external search across various sources, including spouses. | Average age is 43.6 years. Half are female (51%). Most are married (80%) and employed (87%). | Retail Shopper (5%) |
| S6 (75, 4.4%) | High searcher in offline sources. They do high external search but do not use the Internet. They spend a long time test-driving and get the most help from spouses. | Average age is 50.7 years. Half are female (51%). Most are married (85%). They are less employed (68%) and lower paid ($17.6). | Retail Shopper (5%) |
| S7 (286, 16.9%) | Moderate searcher. They use various sources at a moderate level. They are the largest segment. | Average age is 44.5 years. Most are employed (87%) and highly paid ($26). The proportion of males, marriage rate, and education level are average. | Moderate Searcher (32%) |
| S8 (189, 11.1%) | High internal and low external searcher. They do high internal search but low external search. | Average age is 48.8 years. Higher proportion of males (65%), more years of education (16.3 years), higher employment rate (84%), and higher pay ($31). They have purchased 3.16 cars in 10 years, and 72% of them buy from the same maker. | Low Searcher (26%) |
| S9 (266, 15.7%) | Most experienced and loyal searcher. Their internal search is the highest and their external search is low. They do not use the Internet. | Average age is 52.7 years. More are married (80%), but they are less employed (66%) and lower paid ($20.3). They are so loyal to automakers that 80% of them buy the same brand. They are the most experienced in purchasing cars (3.32 in 10 years). | Low Searcher (26%) |

Search based segments and their brand choices. We categorize the individual automobile brands into country level brands. If there are many brands in the same country, we classify them by manufacturer depending on the number of observations. The final brands we use are Chevy (20.1%), GM low brands (Pontiac and Saturn, 9.6%), GM high brands (Cadillac, GMC, and Oldsmobile, 10.2%), Ford (17.4%), Chrysler (11.2%), Toyota/Honda (14.2%), other Japanese brands (e.g., Nissan and Mazda, 4.6%), EU brands (4.7%), and Korean brands (3.6%).

We look at the graphical relationship between the search-based segments and automobile

brands using a correspondence analysis. The result is in Figure 2.3. In the correspondence

analysis, we chose two dimensions. The first dimension explains 69.2% of the original

information and the second explains 14.4%. From the perspective of the segments, the main

dimension (x-axis) appears related to Internet usage because the Internet using segments (S1, S4,

S5, S7, and S8) are located on the right side and the rest of the segments are located on the left

side. From the perspective of the brands, the main dimension appears related to the brand origin

because American brands are located together on the left side while foreign brands are on the

right side.

By combining the results of the search-based segments and their brand choices, we can

see the relationship between them. The most salient result is that the low and moderate search

segments (S2, S3, S7, S8, and S9) correspond to the American brands while the high search

segments (S1, S4, and S5) correspond to all the Japanese and EU brands. Segment S6, the high offline search segment, is close to Chrysler and Korean brands.

In addition to the correspondence analysis, we confirm the relationship between segment membership and brand choices by using a multinomial logit model. We set up the multinomial logit model as follows.

$$P(\text{brand}_i = l) = \frac{\exp(x_i \beta_l)}{\sum_{l'=1}^{L} \exp(x_i \beta_{l'})},$$

where $i$ indexes consumers, $l$ indexes brands, and $x_i$ are the independent variables, including segment dummies, demographics, and product-related data. For brevity, we report only the main results of this analysis. The logit analysis results are, in general, similar to those of the correspondence analysis. Setting Chevy as the reference category, compared to S9, the segments S1, S4, and S5 are more likely to choose Toyota/Honda ($\beta$ = 2.33, 1.83, and 1.91), other Japanese brands ($\beta$ = 1.98 and 1.53 for S1 and S5), or EU brands ($\beta$ = 2.14, 1.68, and 1.72, respectively). S3 is more likely to choose Chrysler ($\beta$ = 0.85) or Korean brands ($\beta$ = 1.35) and S6 is more likely to choose Chrysler ($\beta$ = 1.17). However, there is no difference in the brand choices of S2, S7, and S8 compared to S9, as those segments belong to the low external search segments and are likely to choose American brands.

Figure 2.3. Correspondence between Search-based Segments and Brand Choices
[Figure: correspondence map of segments S1 to S9 and the nine brand groups; horizontal axis: American-Foreign; vertical axis: Sales Volume.]
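To make the interpretation of the reported logit coefficients concrete, the following is a minimal sketch of multinomial logit choice probabilities. It is an illustration only, not the estimation code used in this study: the function name `mnl_probabilities` and the two-brand setup with all other covariates set to zero are assumptions made for the example. Only the coefficient 2.33 (S1 relative to S9, Toyota/Honda relative to Chevy) is taken from the results above.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit probabilities: exp(u_l) / sum over brands of exp(u_l')."""
    shift = max(utilities)  # subtracting the maximum avoids overflow in exp
    weights = [math.exp(u - shift) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical two-brand illustration: index 0 = Chevy (reference, utility 0),
# index 1 = Toyota/Honda. All other covariates are held at zero (an assumption).
BETA_S1_TOYOTA_HONDA = 2.33  # reported coefficient for S1 relative to S9

probs_s9 = mnl_probabilities([0.0, 0.0])                   # baseline segment S9
probs_s1 = mnl_probabilities([0.0, BETA_S1_TOYOTA_HONDA])  # segment S1

odds_s9 = probs_s9[1] / probs_s9[0]
odds_s1 = probs_s1[1] / probs_s1[0]
odds_ratio = odds_s1 / odds_s9  # equals exp(2.33) by construction
print(f"Odds ratio (S1 vs. S9, Toyota/Honda vs. Chevy): {odds_ratio:.2f}")
```

Under these assumptions, a coefficient is read as a log odds ratio: exponentiating it gives how much the odds of choosing the brand over Chevy are multiplied for the segment relative to S9, holding the other covariates fixed.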

The close relationship between search-based segments and their brand choices

demonstrates that it is important for automakers to choose proper communication media for their

customers. That is, American brands, whose customers are high internal and low external

searchers, might consider spending more on building consumer loyalty and satisfaction. In

contrast, foreign brands should provide more information to satisfy their consumers’ information

needs. As the foreign brands are strongly associated with Internet users, they should enhance

their Internet-based advertising and communications.

Search based segments and price-related outcomes. Finally, we look at price-related

outcomes for the different segments. To see the differences by segment, we run an ANCOVA, in

which the dependent variables are the price negotiation time with the dealer, discount amount,

discount rate (discount amount over the sticker price), and the final price satisfaction. The main

independent variable is the segment variable and the covariates are age, male indicator, marriage

indicator, employment indicator, education level in years, hourly wage, sticker price, and the

brands. Table 2.8 shows the F-test results which test for mean differences in the dependent

variables by segment.

We find that there is a difference (i.e., we can reject the null hypothesis of mean equality) in the price negotiation time with the dealer across segments (F=17.32, p-value<0.01). In general, the high external search segments (S4, S5, and S6) spend a longer time negotiating with the dealer (around 3 hours), while the low external search segments (S2, S8, and S9) and the moderate external search segments (S1, S3, and S7) spend a shorter time (around 1 and 1.5 hours, respectively). Because the negotiation takes place at the dealer, high external searchers seem to spend more time with the dealer when they visit the dealer to shop. The discount amount also differs by segment (F=2.41, p-value=0.01). Overall, the high offline search segments (S4, S5, and S6) and the high internal search segments (S7, S8, and S9) get average discounts of about $3,000, while the others receive average discounts of about $2,500 to $2,700. The results for the discount rate show similar differences (F=2.44, p-value=0.01). Segments S4 through S9 get a discount of about 10.7% to 11.6%, but the other segments receive about 9.4% to 10.5%.

Interestingly, however, even though price negotiation times, discount amount, and

discount rate are different for different segments, the final price satisfaction, on average 5.4 out

of 7, is not different across the segments (F=0.84, p-value=0.56). Considering that every segment

ends up with a similar satisfaction level, the different search patterns appear to be the outcomes of consumers’ best search effort to maximize their utility given their current knowledge, productivity in different search sources, or spousal help.
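As a quick arithmetic check, the overall satisfaction level of about 5.4 out of 7 can be approximated from the segment sizes in Table 2.7 and the segment means in Table 2.8. The sketch below computes a size-weighted average of the unadjusted segment means; it assumes that the reported overall average is comparable to this raw weighted mean, which may differ slightly from a covariate-adjusted figure.

```python
# (segment size n, mean final price satisfaction) as reported in Tables 2.7 and 2.8
segments = {
    "S1": (273, 5.28),
    "S2": (192, 5.32),
    "S3": (247, 5.33),
    "S4": (55, 5.38),
    "S5": (114, 5.20),
    "S6": (75, 5.37),
    "S7": (286, 5.43),
    "S8": (189, 5.55),
    "S9": (266, 5.59),
}

total_n = sum(n for n, _ in segments.values())
weighted_mean = sum(n * mean for n, mean in segments.values()) / total_n

print(f"Total sample size: {total_n}")
print(f"Size-weighted mean satisfaction: {weighted_mean:.2f}")
```

The segment means range only from 5.20 to 5.59, so the weighted average sits close to every segment's own level, which is in line with the non-significant F-test for final price satisfaction.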

Table 2.8. Results of ANCOVA

| Segment | Segment Name | Negotiation Time (hours), Mean (SD) | Discount ($), Mean (SD) | Discount Rate, Mean (SD) | Final Price Satisfaction, Mean (SD) |
|---|---|---|---|---|---|
| S1 | Lowest internal searcher | 1.35 (1.34) | 2467 (2536) | 0.094 (0.086) | 5.28 (1.18) |
| S2 | Lowest external searcher | 0.92 (0.86) | 2462 (2272) | 0.100 (0.091) | 5.32 (1.44) |
| S3 | Low internal / moderate offline searcher | 1.60 (1.76) | 2691 (2237) | 0.105 (0.085) | 5.33 (1.27) |
| S4 | Highest external searcher | 2.32 (2.51) | 2959 (2788) | 0.107 (0.092) | 5.38 (0.97) |
| S5 | High searcher in online/offline sources | 3.38 (3.59) | 3144 (2777) | 0.113 (0.093) | 5.20 (1.45) |
| S6 | High searcher in offline sources | 3.16 (4.30) | 3034 (2780) | 0.110 (0.092) | 5.37 (1.2) |
| S7 | Moderate searcher | 1.53 (2.12) | 3168 (2642) | 0.116 (0.089) | 5.43 (1.22) |
| S8 | High internal and low external searcher | 1.07 (1.04) | 3192 (2758) | 0.114 (0.091) | 5.55 (1.21) |
| S9 | Most experienced and loyal searcher | 1.25 (3.36) | 2950 (2582) | 0.108 (0.092) | 5.59 (1.22) |
| F statistic | | 17.32 | 2.41 | 2.44 | 0.84 |
| p-value | | 0.00 | 0.01 | 0.01 | 0.57 |

CONCLUSION

Summary

The objectives of this paper were to find out the relationships between search sources in a

comprehensive manner, segment the buyers based on their search patterns, and examine the

search results for each segment. We consider the entire range of information sources that buyers

consult in automobile purchases including internal search, offline search sources, Internet

sources, and spouse search. By analyzing the data on automobile purchases in 2001, 2003, and

2005, we find some interesting results that extend the results from the previous studies.

First, we find that, in general, search preference for each information source is positively

associated with the others. The generally positive interrelationship occurs within the offline and

the Internet sources and across the offline and the Internet sources, implying that consumers use the information sources as complements. However, search preference for the dealer sources and internal search reduce search preference for all information sources. It is notable that search

preference for dealer sources significantly reduces search preference for the Internet sources.

This finding extends previous results that looked at the effects of the Internet on offline sources

but not the reverse effects.

Second, we identify nine segments based on consumers’ search patterns. The segments

are profiled based on the extent of their internal search, Internet and offline search time, and

spouse search time. Several of the segments correspond to those of Furse et al. (1984) obtained

prior to the Internet. At the segment level, we find the inverted U-shaped relationship between

internal search and external search. That is, low and high internal searchers are low external

searchers while moderate internal searchers are high on external search. We also find that two

segments do not conform to the inverted U-shaped relationship; one has low internal search but

moderate external search and the other has moderate internal search but extremely high external

search. Though the latter segment is small in size, the presence of two such segments suggests one reason why the inverted U-shaped relationship may be hard to find at the aggregate level.

Finally, we examine the outcomes of search, focusing on brand choice. The results show

that segments with low external search are associated with purchase of American brands while

segments with high external search correspond to Japanese and EU brands. These results are notable in that the relationship between search and brand choices is identified for the first time.

In addition, we find that though the price-related outcomes are different for different segments,

final price satisfaction levels are similar across segments.

In conclusion, our study extends the search literature by providing some new insights

including the effect of offline search on Internet search, the identification of search-based

segments, the relationship of internal and external search at the segment level, and the search

segments’ brand choices. We discuss how automakers might utilize these results next.

Managerial Implications

Our results have some practical implications for dealers and automakers. First, the dealer

is still a powerful and efficient information source for consumers in the Internet era. The more

time consumers prefer to spend with the dealer, the less time they prefer to spend with other

information sources. This result qualifies the results in previous studies about the role of the

Internet in reducing the search time with the dealer. Automakers should carefully select and train

dealers, maintaining a good relationship with them not only for the final sales but also for

providing information to consumers.

Second, automakers can identify their positioning and their competitors’ positioning in

terms of consumers’ search patterns. The results show that American brands and Japanese brands

are close to the other brands of their countries. EU brands are close to Japanese brands, perhaps because both are perceived as foreign brands, while Korean brands are positioned in a distinct location. That is, competition occurs between brands of the same country group. Thus, automakers could focus on differentiating themselves from other brands from the same country.

Third, automakers can develop efficient communication strategies based on the

relationship of the search segments and their brand choices. For example, because customers of

American brands are low external searchers, American brands might implement advertising

campaigns that build brand image and loyalty. As Japanese and EU brands are associated with

higher external and Internet search, they should enhance information delivery through their own websites, from which consumers can acquire information as a substitute for other offline information sources. Korean brands should provide more information to convince the high

search consumers. However, they have to work to reduce the distance in their position from other

foreign brands to be perceived as one of them.


If researchers have more information, they can understand consumers’ search patterns

better. First, consideration sets can affect the search patterns. If consumers are considering those

brands with which they are familiar and have experience, they are less likely to conduct long

searches because of high internal search. Yet, if they are considering new automakers, they

would have to search more to obtain the necessary information. Therefore, future studies should

consider ways to include the effect of consideration sets. Second, the sequence of search can help

determine if some information sources initiate or stop further search. This might give some

insights into which information sources are more important in different stages of search. Third,

our dataset did not cover the 2008-2009 period, which saw turmoil and bankruptcies in the automobile industry, changes in product lines, elimination of dealers, government intervention, and the recession. It would be interesting to see whether these events have altered consumers’ search patterns in automobile purchases.

APPENDIX

MCMC Algorithms for Model Estimation

(1) Estimating the parameters of endogenous variables ($\Gamma$)

The matrix $\Gamma$ is estimated using a random walk chain Metropolis-Hastings method. As the diagonal elements of $\Gamma$ are one, we need to estimate the off-diagonal elements only. Let $\tilde{\gamma}$ denote the vector consisting of the off-diagonal elements of $\Gamma$, where the dimension of $\tilde{\gamma}$ is $\tilde{K} = (J^2 - J) \times 1$. In this study $J$ is 8. We use a diffuse normal prior, i.e., $\tilde{\gamma} \sim N(\mu_{\tilde{\gamma}}, \Sigma_{\tilde{\gamma}})$, where $\mu_{\tilde{\gamma}} = 0_{\tilde{K}}$ and $\Sigma_{\tilde{\gamma}} = 10^{4} I_{\tilde{K}}$.

We generate candidate draws according to $\tilde{\gamma}^{*} = \tilde{\gamma}^{(s-1)} + z$, where $z \sim N(0, \Sigma_z)$ and $s$ denotes the $s$-th iteration. To find the proper $\Sigma_z$, we ran the MCMC algorithms twice, following Koop (2003). In the first run, we assume $\Sigma_z = \tilde{c}_1 I_{\tilde{K}}$ and randomly assign $\tilde{c}_1 = 10^{-5}$ (80%) and $\tilde{c}_1 = 5 \times 10^{-5}$ (20%). After we get the variance of $\tilde{\gamma}$, denoted as $\hat{\Sigma}_z$, in the second run, we reassume that $\Sigma_z = \tilde{c}_2 \hat{\Sigma}_z$ and randomly assign $\tilde{c}_2 = 10^{-2}$ (80%) and $\tilde{c}_2 = 5 \times 10^{-2}$ (20%).

As there are many parameters in $\tilde{\gamma}$, we draw and accept the new candidates equation by equation. For example, let us denote $\tilde{\gamma}_j$ as the coefficient vector of the endogenous variables in the $j$-th equation. To determine $\tilde{\gamma}_j^{s}$, we draw $\tilde{\gamma}_j^{*} = \tilde{\gamma}_j^{(s-1)} + z_j$ given $\tilde{\gamma}_{-j}^{s}$, where the subscript $j$ means the related components in the $j$-th equation. By comparing the posterior probabilities with $\tilde{\gamma}_j^{(s-1)}$ and $\tilde{\gamma}_j^{*}$, we decide which draw to use at the $s$-th iteration and repeat the process for $j = 1$ to $J$. Estimating $\tilde{\gamma}$ by the split equations is helpful for getting the proper acceptance rate.
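The random walk chain Metropolis-Hastings procedure in step (1), including the two-run tuning of the proposal variance, can be sketched as follows. This is a simplified illustration and not the code used for estimation: it works with a single scalar parameter instead of the full vector of off-diagonal elements, the target `log_target` is a stand-in normal density chosen only for the example, and the scale constants (0.05 and 2.0) are hypothetical values suited to this toy target rather than the constants reported above.

```python
import math
import random

def log_target(theta):
    """Log density (up to a constant) of a stand-in posterior: N(1, 2^2)."""
    return -0.5 * ((theta - 1.0) / 2.0) ** 2

def random_walk_mh(log_post, start, base_var, n_iter, rng):
    """Random-walk Metropolis-Hastings for a scalar parameter.

    The proposal variance is base_var with probability 0.8 and
    5 * base_var with probability 0.2, mirroring the 80%/20% mix of scales.
    """
    draws, accepted = [], 0
    current, current_lp = start, log_post(start)
    for _ in range(n_iter):
        var = base_var if rng.random() < 0.8 else 5.0 * base_var
        candidate = current + rng.gauss(0.0, math.sqrt(var))
        cand_lp = log_post(candidate)
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, cand_lp - current_lp)):
            current, current_lp = candidate, cand_lp
            accepted += 1
        draws.append(current)
    return draws, accepted / n_iter

def sample_variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

rng = random.Random(0)
burn_in = 5000

# First run: small fixed proposal scale, used only to learn the posterior spread.
pilot, pilot_rate = random_walk_mh(log_target, 0.0, 0.05, 20000, rng)
pilot_var = sample_variance(pilot[burn_in:])

# Second run: proposal variance proportional to the variance from the first run.
final, final_rate = random_walk_mh(log_target, pilot[-1], 2.0 * pilot_var, 20000, rng)
kept = final[burn_in:]
posterior_mean = sum(kept) / len(kept)

print(f"Pilot acceptance rate: {pilot_rate:.2f}, final acceptance rate: {final_rate:.2f}")
print(f"Posterior mean estimate: {posterior_mean:.2f}")
```

Because a mixture of symmetric proposals is itself symmetric, the acceptance probability remains the simple posterior ratio. In the multi-equation setting described above, the same accept/reject step would be applied block by block to the coefficients of each equation.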

(2) Estimating the parameters of exogenous variables ($\beta$) and covariance matrix ($\Sigma$)

We estimate $\beta$ and $\Sigma$ by using a Gibbs sampler with standard Normal-Wishart priors. Specifically, we use a normal prior $\beta \sim N(\mu_{\beta}, \Sigma_{\beta})$, where $\mu_{\beta} = 0_K$ and $\Sigma_{\beta} = 10^{4} I_K$, and a Wishart prior $\Sigma^{-1} \sim W(v, V)$, where $v = J + 3$ and $V = (1/v) I_J$.

(3) Imputing $y^{*}$

After getting all parameters ($\Gamma$, $\beta$, and $\Sigma$), we can impute $y_i^{*}$. If all elements in $y_i$ are positive, there is no need to impute. If all elements in $y_i$ are zero, we draw $y_i^{*}$ from the multivariate truncated normal distribution, $MVTN_{(-\infty, 0)}(\Gamma^{-1} X_i \beta, \Gamma^{-1} \Sigma \Gamma^{-1\prime})$. If some elements of $y_i$ are zero, we draw the latent values from a conditional multivariate truncated normal distribution.
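The imputation logic in step (3) can be illustrated with a deliberately simplified sketch. The version below treats each equation separately with a univariate normal truncated to the non-positive half-line, so it ignores the cross-equation correlation that the conditional multivariate truncated normal accounts for. The helper names `draw_nonpositive_normal` and `impute_latent`, as well as the means and standard deviation in the example, are hypothetical.

```python
import random
from statistics import NormalDist

def draw_nonpositive_normal(mean, sd, rng):
    """Draw from N(mean, sd^2) truncated to (-inf, 0] by inverting the CDF."""
    dist = NormalDist(mean, sd)
    mass_below_zero = dist.cdf(0.0)       # probability of a non-positive value
    u = rng.random() * mass_below_zero    # uniform over the admissible CDF range
    u = min(max(u, 1e-12), 1.0 - 1e-12)   # keep inv_cdf away from 0 and 1
    return min(dist.inv_cdf(u), 0.0)

def impute_latent(observed, latent_means, sd, rng):
    """Keep positive observations; replace zeros with truncated-normal draws."""
    return [
        y if y > 0 else draw_nonpositive_normal(m, sd, rng)
        for y, m in zip(observed, latent_means)
    ]

rng = random.Random(1)
observed = [3.5, 0.0, 0.0]        # hypothetical search times; zeros are censored
latent_means = [2.0, 1.0, -0.5]   # hypothetical latent means for each equation
latent = impute_latent(observed, latent_means, 1.5, rng)
print(latent)
```

Positive observations pass through unchanged, while each zero is replaced by a latent value at or below zero. A full implementation would condition each censored element on the other elements of the latent vector through the implied covariance matrix.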

REFERENCES

Amemiya, Takeshi (1974), “Multivariate Regression and Simultaneous Equation Models when the Dependent Variables Are Truncated Normal,” Econometrica, 42 (6), 999-1012.

________, Makoto Saito, and Keiko Shimono (1993), “A Study of Household Investment Patterns in Japan: An Application of Generalized Tobit Model,” The Economic Studies Quarterly, 44 (1), 13-28.

Bettman, James R. (1979), Information Processing Theory of Consumer Choice. Reading, MA: Addison-Wesley.

Furse, David H., Girish N. Punj, and David W. Stewart (1984), “Typologies of Individual Search Strategies Among Purchasers of New Automobiles,” Journal of Consumer Research, 10 (March), 417-31.

Guo, Chiquan (2001), “A Review on Consumer External Search: Amount and Determinants,” Journal of Business and Psychology, 15 (3), 505-19.

Hauser, John, Glen Urban, and Bruce Weinberg (1993), “How Consumers Allocate Their Time When Searching for Information,” Journal of Marketing Research, 30 (November), 452-66.

Hoffman, Donna L. and George R. Franke (1986), “Correspondence Analysis: Graphical Representation of Categorical Data in Marketing Research,” Journal of Marketing Research, 23 (August), 213-27.

Jang, Sungha, Ashutosh Prasad, and Brian T. Ratchford (2010), “Consumer Spending Patterns across Firms and Categories: Application to the Size and Share of Wallet,” working paper, University of Texas at Dallas, TX.

John, Deborah Roedder, Carol A. Scott, and James R. Bettman (1986), “Sampling Data for Covariation Assessment: The Effect of Prior Beliefs on Search Patterns,” Journal of Consumer Research, 13 (June), 38-47.

Klein, Lisa R. and Gary T. Ford (2003), “Consumer Search for Information in the Digital Age: An Empirical Study of Prepurchase Search for Automobiles,” Journal of Interactive Marketing, 17 (3), 29-49.

Koop, Gary (2003), Bayesian Econometrics. Hoboken, NJ: Wiley.

Moorthy, K. Sridhar, Brian T. Ratchford, and Debabrata Talukdar (1997), “Consumer Information Search Revisited: Theory and Empirical Analysis,” Journal of Consumer Research, 23 (March), 263-77.

Punj, Girish N. and Richard Staelin (1983), “A Model of Consumer Information Search Behavior for New Automobiles,” Journal of Consumer Research, 9 (March), 366-80.

Ransom, Michael R. (1987), “A Comment on Consumer Demand Systems with Binding Non-negativity Constraints,” Journal of Econometrics, 34, 355-59.

Rao, Akshay and Wanda Sieben (1992), “The Effect of Prior Knowledge on Price Acceptability and the Type of Information Examined,” Journal of Consumer Research, 19 (September), 256-70.

Ratchford, Brian T., Myung-Soo Lee, and Debabrata Talukdar (2003), “The Impact of the Internet on Information Search for Automobiles,” Journal of Marketing Research, 40 (May), 193-209.

Ratchford, Brian T., Debabrata Talukdar, and Myung-Soo Lee (2007), “The Impact of the Internet on Consumers’ Use of Information Sources for Automobiles: A Re-Inquiry,” Journal of Consumer Research, 34 (June), 111-19.

Russo, J. Edward and France LeClerc (1994), “An Eye-Fixation Analysis of Choice Processes for Consumer Nondurables,” Journal of Consumer Research, 21 (September), 274-90.

Srinivasan, Narasimhan and Brian T. Ratchford (1991), “An Empirical Test of a Model of External Search for Automobiles,” Journal of Consumer Research, 18 (2), 233-42.

Viswanathan, Siva, Jason Kuruzovich, Sanjay Gosain, and Ritu Agarwal (2007), “Online Infomediaries and Price Discrimination: Evidence from the Automotive Retailing Sector,” Journal of Marketing, 71 (July), 89-107.

Yang, Sha, Vishal Narayan, and Henry Assael (2006), “Estimating the Interdependence of Television Program Viewership Between Spouses: A Bayesian Simultaneous Equation Model,” Marketing Science, 25 (4), 336-49.

Zettelmeyer, Florian, Fiona Scott Morton, and Jorge Silva-Risso (2006), “How the Internet Lowers Prices: Evidence from Matched Survey and Automobile Transaction Data,” Journal of Marketing Research, 43 (May), 168-81.

CHAPTER 3

HOW CONSUMERS USE PRODUCT REVIEWS

IN THE PURCHASE DECISION PROCESS

Sungha Jang





ABSTRACT

Several studies have found a positive effect of product reviews on sales at the aggregate level.

This paper, however, uses individual level data to examine the influence of product reviews in

different stages of the consumer’s purchase decision process. Specifically, a two-stage model

consisting of consideration set formation and choice is posited, where information from product

reviews can be incorporated at each stage. The model is estimated using an online panel study

about hotel choice. We find that: (1) Consumers use product reviews more in the consideration

set stage and less in the choice stage; (2) Bayesian updating of prior perceived quality explains

better how consumers use product reviews compared to two competing updating methods; (3)

The monetary value of a unit increase in the mean of product reviews can be computed – in the

case of the hotel study we find that it is equivalent to a price decrease of $57. Our results suggest

that managers should make product reviews available from the beginning of the search process,

show all components of product reviews (i.e., mean, number, and variance), and focus on

satisfying customers and encouraging them to write reviews.

Keywords: Product reviews; Bayesian updating; Consideration sets; Multivariate Probit; Choice

models; Bayesian estimation.

INTRODUCTION

Consumers frequently rely on the opinion of other consumers, such as product experts,

acquaintances, or online users, before they make their purchase decisions. The easy availability

of online product reviews has facilitated this behavior. Reflecting the surge in product reviews

usage, a number of recent papers have investigated the effects of product reviews on aggregate

sales. In general, more favorable product reviews are found to lead to higher sales (e.g.,

Chevalier and Mayzlin 2006). However, at the individual level, the use of product reviews in

different stages of the purchase decision process is relatively unexplored. The two stages that we

consider are the consideration set stage, where products are selected for further evaluation, and

the choice stage, where a final product is chosen from the consideration set. The motivation of

this paper is thus to examine in what stage, and how, consumers use product reviews in the

purchase decision process.

For a clearer explanation of the type of information that product reviews provide and how

consumers use product reviews, consider the following scenario:

M is planning a first trip to Cancun. Online information leads to short-listing the Marriott

hotel, because M has had good experiences with the Marriott brand, and the Fiesta

Americana hotel, because it is highly rated. Furthermore, after considering that the Marriott has 30 online product reviews, averaging 3.5/5, and the Fiesta Americana has 270 reviews, averaging 4.7/5, M still chooses the Marriott because prior experience carries more weight.

From this, we see that an individual consumer’s use of product reviews can be a balancing act. Even granting the finding in the literature that favorable product reviews positively affect aggregate sales, individual consumers do not necessarily select the highest rated product.

To explore how product reviews influence the purchase decision process, we organize the

study into three research questions:

(1) Are product reviews used in the consideration set stage, the choice stage, or both stages,

and to what extent?

(2) How is information from product reviews incorporated with prior experience or prior

perceived quality in the different stages?

(3) What is the value of each component of product reviews (i.e., mean, number, and

variance) expressed ideally in monetary terms?

A possible hypothesis is that consumers use prior perceived quality in the consideration

set stage and use product reviews in the choice stage by updating their prior perceived quality.

The rationale for this would be that in the consideration set stage, consumers are thought to apply simple criteria to narrow all alternatives down to a subset and thereby minimize their search efforts (e.g., Gilbride and Allenby 2004). Therefore, consumers are likely to use their current knowledge

about quality, called prior perceived quality. In contrast, in the choice stage, consumers carefully

consider detailed information and incorporate other people’s opinions on quality, i.e., product

reviews, with their prior perceived quality. As a result, they get an updated knowledge of quality,

called posterior perceived quality. The updating method could be Bayesian or something else.

Finally, posterior perceived quality affects the choice.

We empirically test the above hypothesis and several competing specifications in the

context of making a hotel choice online. Several previous studies have also looked at hotel

choice and this is likely to be a high involvement task with extended problem solving (e.g.,

Vermeulen and Seegers 2009).

Our findings contribute to the current theory on the effect of product reviews in three

ways. First, we find that product reviews have separate effects in different stages of the purchase

decision process. The results show that product reviews update prior perceived quality in the

consideration set stage, which affects inclusion into the consideration set. Therefore, product

reviews, assuming that they are positive, should be made available from the beginning of the

purchase process.

Second, we find that consumers integrate product reviews information in a manner

consistent with Bayesian updating. To see this, we compared the performance of Bayesian

updating with two other updating heuristics. In Bayesian updating, consumers combine their

prior perceived quality with product reviews data, including mean, number, and variance,

resulting in posterior perceived quality. In the heuristic updating methods, consumers either

replace their prior perceived quality by product reviews, or use the average of prior perceived

quality and product reviews.

Third, we obtain the monetary value of each component of the product reviews. In the

consideration set stage, we examine how much increase in the mean, number or variance of

product reviews is necessary to keep the utility level and consideration set composition

unchanged for a given price decrease. Specifically, we found that a unit increase in the mean

consumer review is worth $57, a unit increase in the number of reviews is not worth much, and a

unit increase in the variance of product reviews has a value that depends on the difference

between the mean of product reviews and prior perceived quality.

The paper proceeds as follows: In the next section, we provide a review of the literature.

Then, we describe the two-stage choice model and the estimation method. In addition, we

explain how consumers update prior perceived quality. After this, we describe the survey and

data followed by the results. Finally, we give the conclusions and directions for future research.

LITERATURE REVIEW

Our paper is based on several research streams that deal with the effect of product reviews on

sales, consideration set formation and choice models, and Bayesian updating.

Several papers have examined the effects of product reviews on sales for various product

categories. For example, Chevalier and Mayzlin (2006) find that the number and mean of

product reviews are positively related to online book sales. Clemons, Gao, and Hitt (2006) find

that mean and variance of beer brand review ratings are positively related to the beer brand’s

sales growth rate. Liu (2006) studies the dynamics of word-of-mouth and the box office revenue

for movies and finds that current period word-of-mouth affects next period box office revenue

but also that current period box office revenue affects next period word-of-mouth. Also studying

the movie industry, Sun (2009) finds that the variance of product reviews has an influence on

consumer decisions. Specifically, low review scores with large variance are less negatively

interpreted by consumers because consumers might assume a mismatch occurred between the

unhappy consumers and the product.

Though they show that product reviews affect sales, current studies focus on the

relationship at the aggregate level and do not explain how product reviews affect the purchase

decision process of individual consumers. This study focuses on the latter point. For example,

consumers may already hold a prior perceived quality about the product and use product reviews

to update it, or even completely replace it with the product reviews. Furthermore, product

reviews might affect different stages of the purchase decision process differently. We discuss this

next.

A second relevant research stream deals with consideration set formation and choice.

There is clear evidence that consumers form consideration sets as part of the decision making

process. For a review of this literature, see Roberts and Lattin (1997). The rationale for forming a

consideration set is that consumers do not find it cost-effective to process information on all the

brands available. That is, the consideration set stage is less effortful while the choice stage is

more comprehensive (Gilbride and Allenby 2004). Empirical work shows that choice can be

predicted more accurately by a two-stage process involving consideration set formation rather

than a one-stage process (Gensch 1987).

In the different stages of the two-stage process, consumers may have different

information about product attributes or apply different weights on the same product attribute. For

example, Andrews and Srinivasan (1995) find that the effect of price is negative in the

consideration set stage but can be positive in the choice stage. This might occur because

consumers select only affordable products in the consideration set stage, while in the choice

stage, higher price is associated with higher quality. Similarly, Allenby and Ginter (1995) find

that in-store displays and features influence consideration set formation whereas merchandising

support information affects choice. Consumers might also have or use different information

about product attributes in the two stages. It may be that consumers use only prior perceived

quality to construct the consideration set from a large number of products. In the choice stage,

however, consumers may search product reviews and incorporate information from different

sources. Our model allows for all these possibilities.

A third related research stream is about how consumers incorporate current knowledge

(i.e., prior perceived quality) with new information (i.e., product reviews). A number of papers, both before and after Erdem and Keane (1996), who model how consumers learn about brand attributes and update their uncertainty over time, model the consumer learning process using Bayesian updating. For example, Mehta, Rajiv, and Srinivasan (2003) examine how consumers update their prior perceived quality from initial transactions while buying and experiencing products. They draw consumers’ posterior perceived quality and use it to explain consumers’ consideration sets and choices.

Our approach is similar to Mehta et al. (2003). We apply Bayesian updating of prior

perceived quality by product reviews. This contributes in two ways to the existing literature on

product reviews. First, we take into account prior knowledge as an information source, which

extant approaches do not. Second, the Bayesian updating can use the mean, number, and

variance of the product reviews and examine the values of those components of product reviews,

whereas the extant approach does not use all of these components at the same time. It is not,

however, necessary that consumers use Bayesian updating at all, and they may use a simpler

method. We examine if they exclude prior perceived quality and use only the product reviews, or

if they use the average of prior perceived quality and product reviews. We evaluate which of

these methods is most consistent with the outcomes.

In summary, this study contributes to the current research on product reviews in three

ways. First, we extend the understanding of the effects of product reviews in the decision process by examining at which stage, consideration set formation or choice, product reviews affect consumers’ decisions. Second, we extend the understanding of how consumers use product

reviews with prior perceived quality by using different updating methods. Third, we evaluate the monetary values of the components of product reviews and rank their importance.

MODEL AND ESTIMATION

In this section, we develop the two-stage model of consideration set formation and choice. At

each stage, consumers have utilities from perceived quality, price, and other product

characteristics. We use four types of perceived quality (viz. prior perceived quality, product

reviews, average of prior perceived quality and product reviews, and Bayesian updating

perceived quality). Listed in order of complexity, these are defined as follows:

- Prior perceived quality is consumers’ perceived quality before they look at (or if they look

at but ignore) product reviews. We can measure the mean and variance of prior perceived

quality by directly asking the respondents.

- Product reviews are equal to the perceived quality if consumers completely adopt other

consumers’ evaluations about product quality.

- Average of prior perceived quality and product reviews can be calculated. This is the

perceived quality if consumers use this simple method to update prior perceived quality

with product reviews.

- Bayesian updating perceived quality is the updated prior perceived quality with product

reviews in a Bayesian manner. We calculate Bayesian updating perceived quality using

Bayes’ rule.

By comparing the fit of models that use different types of perceived quality in the two stages, we examine which type of perceived quality consumers use at each stage to integrate product reviews. Note that a priori there is no certainty that any of these types will provide a better fit than another, given that they are not nested. Table 3.1 lists the models we test and briefly describes their characteristics.

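Table 3.1. Competing Specifications

| Model | Perceived Quality in Consideration Set Stage ($q^C$) | Perceived Quality in Choice Stage ($q^F$) | Characteristics |
|---|---|---|---|
| Model 1 | Prior | Prior | Consumers use only prior perceived quality in the consideration set stage. They may or may not update it after looking at product reviews. |
| Model 2 | Prior | Reviews | Consumers use only prior perceived quality in the consideration set stage. They may or may not update it after looking at product reviews. |
| Model 3 | Prior | Average of Prior and Reviews | Consumers use only prior perceived quality in the consideration set stage. They may or may not update it after looking at product reviews. |
| Model 4 | Prior | Bayesian updating | Consumers use only prior perceived quality in the consideration set stage. They may or may not update it after looking at product reviews. |
| Model 5 | Reviews | Reviews | Consumers use product reviews in the consideration set stage and may or may not update it. |
| Model 6 | Reviews | Average of Prior and Reviews | Consumers use product reviews in the consideration set stage and may or may not update it. |
| Model 7 | Reviews | Bayesian updating | Consumers use product reviews in the consideration set stage and may or may not update it. |
| Model 8 | Average of Prior and Reviews | Average of Prior and Reviews | Consumers incorporate prior perceived quality and reviews. |
| Model 9 | Average of Prior and Reviews | Bayesian updating | Consumers incorporate prior perceived quality and reviews. |
| Model 10 | Bayesian updating | Bayesian updating | Consumers update in the Bayesian manner. |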

For example, in Model 1, consumers use prior perceived quality in both stages. In Model

2, consumers use prior perceived quality in the consideration set stage but use product reviews in

the choice stage. Model 4, which indicates that consumers use prior perceived quality in the

consideration set stage and use Bayesian updating perceived quality in the choice stage, has

some theoretical support (e.g., Gilbride and Allenby 2004). The rationale is that consumers do

not spend much effort and time on looking at detailed product reviews of all products to form a

consideration set, but that they would go through detailed product reviews in the choice stage by

looking at not only the mean but also the number and variance of product reviews.

Consideration set stage

In the consideration set stage, consumers evaluate the utility of each product for inclusion

in the consideration set. The utility of each product is given by a multi-attribute model (e.g.,

Andrews and Srinivasan 1995). Thus, for individual i, the utility in the consideration set stage for product j, where j = 1, …, J, is denoted $z^*_{ij}$ and expressed as follows:

(1)  $z^*_{ij} = \beta_{0j} + \beta_1 q^C_{ij} + \beta_2 p_{ij} + \beta_3 x_{ij} + \varepsilon_{ij}$,

where $q^C_{ij}$ is the perceived quality of product j in the consideration set stage, $p_{ij}$ is the ratio of the price of product j to consumer i’s willingness to pay, and $x_{ij}$ is a vector of other product attributes. The error term $\varepsilon_{ij}$ is assumed to be normally distributed, $\varepsilon_{ij} \sim N(0, \Sigma_{zz})$. Finally, $\beta_{0j}$ is a product-specific intercept and $(\beta_1, \beta_2, \beta_3)$ is the vector of coefficients of the covariates. Note that we treat perceived quality, whichever of its four types is used, as capturing elements of the expected hotel utility other than star ratings and price. This seems reasonable because when both a 5-star and a 3-star hotel receive a 3/5 rating, one would still suppose that the 5-star hotel provides higher overall utility.

We assume that consumer i includes product j in the consideration set if $z^*_{ij} > 0$ and excludes it if $z^*_{ij} \le 0$. Therefore, the relationship between the consideration set utility $z^*_{ij}$ and the observed decision $z_{ij}$ of whether consumer i includes product j is given by

$z_{ij} = \begin{cases} 1 & \text{if } z^*_{ij} > 0, \\ 0 & \text{if } z^*_{ij} \le 0, \end{cases}$

where $z_{ij} = 1$ if consumer i includes product j and $z_{ij} = 0$ otherwise.

Consumer i’s consideration set $C_i$ is thus the vector $(z_{i1}, z_{i2}, \ldots, z_{iJ})$, and it is related to the vector of utilities $Z^*_i = (z^*_{i1}, z^*_{i2}, \ldots, z^*_{iJ})$. The distribution of $Z^*_i$, following Edwards and Allenby (2003), is

(2)  $Z^*_i \sim MVN(X^C_i \beta, \Sigma_{zz})$,

where $X^C_i$ is the matrix of product attributes across J products in the consideration set $C_i$, $\beta = (\beta_{01}, \ldots, \beta_{0J}, \beta_1, \beta_2, \beta_3)$ is the vector of coefficients, and $\Sigma_{zz}$ is the variance-covariance matrix of error terms. We estimate the parameters using a multivariate Probit model because consumers decide for each of the J alternatives whether it should be included or not.
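
The inclusion rule of the consideration set stage can be illustrated with a minimal sketch. It assumes independent standard normal errors (the model above allows correlated errors), and every name and number in it (`consideration_utility`, `form_consideration_set`, the product attributes, the coefficients) is hypothetical and chosen only for illustration, not taken from the estimation.

```python
import random


def consideration_utility(intercept, quality, price_ratio, other, coefs, error):
    """Latent utility z*_ij with the linear form of Equation (1)."""
    b_quality, b_price, b_other = coefs
    other_part = sum(b * x for b, x in zip(b_other, other))
    return intercept + b_quality * quality + b_price * price_ratio + other_part + error


def form_consideration_set(products, coefs, rng):
    """Return the 0/1 inclusion decisions z_ij (1 when z*_ij > 0)."""
    decisions = []
    for p in products:
        # Assumption: independent N(0, 1) errors, a simplification of the
        # multivariate Probit structure described in the text.
        error = rng.gauss(0.0, 1.0)
        z_star = consideration_utility(
            p["intercept"], p["quality"], p["price_ratio"], p["other"], coefs, error
        )
        decisions.append(1 if z_star > 0 else 0)
    return decisions


# Hypothetical inputs: two products, with awareness and experience as "other" attributes.
products = [
    {"intercept": -0.2, "quality": 4.1, "price_ratio": 0.7, "other": [1.0, 0.0]},
    {"intercept": -0.9, "quality": 2.4, "price_ratio": 0.4, "other": [0.0, 0.0]},
]
coefs = (0.05, -0.10, [0.25, 0.08])
print(form_consideration_set(products, coefs, random.Random(0)))
```

Each product gets its own include-or-exclude decision, which is why a multivariate rather than multinomial Probit is the natural fit at this stage.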

Choice stage

After consumer i forms consideration set iC , each product in the set is further evaluated

in the choice stage and the one with highest utility is chosen. As before, we assume that utilities

in the choice stage are multi-attribute functions of the perceived quality, the ratio of price to

willingness to pay, and other product attributes. Though the consumer has information on the

same product attributes, it is possible that the consumer weights them differently in the choice

stage because that is a different task than consideration set formation (e.g., Andrews and

Srinivasan 1995). Therefore, we allow different coefficients on the product attributes. The utility

of individual i from product j is expressed as

(3)  $y^*_{ij} = \gamma_{0j} + \gamma_1 q^F_{ij} + \gamma_2 p_{ij} + \gamma_3 x_{ij} + \eta_{ij}$,

where $y^*_{ij}$ is the utility relative to the outside option, whose utility is scaled to zero, and the other variables are as before except that a different type of perceived quality $q^F_{ij}$ can be used. Furthermore, the parameters for product attributes as well as the error structure, $\eta_{ij} \sim N(0, \Sigma_{yy})$, are different from the consideration set stage.

The choice rule that relates the utility $y^*_{ij}$ to the observed choice $y_{ij}$ depends on whether product j belongs to consideration set $C_i$ or not:

If $j \in C_i$, then $y_{ij} = \begin{cases} 1 & \text{if } y^*_{ij} \ge \max(y^*_{ik}, 0), \\ 0 & \text{if } \max(y^*_{ik}) < 0. \end{cases}$

If $j \notin C_i$, then $y_{ij} = 0$.

Here k represents products in the consideration set. Thus, $y_{ij} = 1$ indicates that consumer i included product j in the consideration set $C_i$ in the first stage and then chose it in the choice stage.

We can find the distribution of the vector of utilities $Y^*_i = (y^*_{i1}, y^*_{i2}, \ldots, y^*_{iJ})$ as follows.

(4)  $Y^*_i \sim MVN(X^F_i \gamma, \Sigma_{yy})$,

where $X^F_i$ is the matrix of product attributes across J products, $\gamma = (\gamma_{01}, \ldots, \gamma_{0J}, \gamma_1, \gamma_2, \gamma_3)$ is a vector of coefficients, and $\Sigma_{yy}$ is the variance-covariance matrix of error terms. We estimate parameters using a multinomial Probit model because consumers choose a specific product from the consideration set.
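
The choice rule can be sketched in a few lines: among the products in the consideration set, the one with the highest utility is chosen, provided that utility exceeds the outside option, whose utility is scaled to zero. The function name `choose` and the example utilities are hypothetical.

```python
def choose(choice_utilities, consideration):
    """Return the index of the chosen product, or None for the outside option.

    choice_utilities: latent utilities y*_ij for all J products.
    consideration:    0/1 inclusion decisions z_ij from the first stage.
    """
    best_index, best_utility = None, 0.0  # outside option has utility zero
    for j, (utility, included) in enumerate(zip(choice_utilities, consideration)):
        # Products outside the consideration set can never be chosen.
        if included == 1 and utility > best_utility:
            best_index, best_utility = j, utility
    return best_index


# Product 2 has the highest utility but was not considered, so product 1 wins.
print(choose([1.0, 2.0, 3.0], [1, 1, 0]))
```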

Bayesian updating and alternative heuristics

Consumers can change their prior perceived quality after looking at product reviews.

With Bayesian updating, consumers construct a posterior perceived quality by combining prior

perceived quality and product reviews. We describe the Bayesian updating method below.

Let $q_{ij}$ denote consumer i’s prior perceived quality on product j and assume that it follows a normal distribution,

(5)  $q_{ij} \sim N(W_{0ij}, \sigma^2_{0ij})$.

Suppose there are $n_j$ other consumers who have experienced product j and provided product reviews $r_{jl}$ $(l = 1, \ldots, n_j)$, and those consumers are believed to be representative buyers and unbiased. They may have experienced different quality because of different consumption situations. For example, consumers of the same hotel may have had different employee interactions, room service, or seasons. The consumer experiences with quality are assumed to be normally distributed around the intrinsic quality $Q_j$ with variance $\sigma^2_{Q_j}$. That is,

$r_{j1}, \ldots, r_{jn_j} \sim N(Q_j, \sigma^2_{Q_j})$.

However, the intrinsic quality $Q_j$ is not known to any consumer. From consumer i’s perspective, his or her prior perceived quality $q_{ij}$ is an indicator of $Q_j$, meaning that the consumer thinks that the quality of product j is $q_{ij}$ and that other people received quality experiences centered at $q_{ij}$. Therefore, before looking at product reviews, the consumer believes the distribution of product reviews is as follows:

$r_{j1}, \ldots, r_{jn_j} \sim N(q_{ij}, \sigma^2_{q_{ij}})$.

Finally, the consumer updates his or her perceived quality after looking at product

reviews and has a new distribution for the posterior perceived quality. The posterior distribution

of the quality mean given the variance and product reviews is

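(6)  $q_{ij} \mid \sigma^2_{q_{ij}}, r_{j1}, \ldots, r_{jn_j} \sim N(W_{1ij}, \sigma^2_{1ij})$,

where

$W_{1ij} = \dfrac{(1/\sigma^2_{0ij})\, W_{0ij} + n_j (1/\sigma^2_{q_{ij}})\, \bar{r}_j}{1/\sigma^2_{0ij} + n_j (1/\sigma^2_{q_{ij}})}$  and  $\sigma^2_{1ij} = \dfrac{1}{1/\sigma^2_{0ij} + n_j (1/\sigma^2_{q_{ij}})}$.

Note that $\bar{r}_j$ and $n_j$ are the mean and number of product reviews on product j, respectively.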

For the variance of product reviews, $\sigma^2_{q_{ij}}$, we use a vague Gamma prior on $1/\sigma^2_{q_{ij}}$ and obtain a posterior distribution of

(7)  $1/\sigma^2_{q_{ij}} \mid q_{ij}, r_{j1}, \ldots, r_{jn_j} \sim \text{Gamma}\left(\dfrac{n_j}{2}, \dfrac{n_j s^2_{ij}}{2}\right)$,

where $s^2_{ij} = \dfrac{1}{n_j} \sum_{l=1}^{n_j} (r_{jl} - q_{ij})^2$ is consumer i’s posterior variance of product reviews on product j, given his or her posterior perceived quality is $q_{ij}$.

In the Bayesian updating method, it should be noted that the parameters of the posterior perceived quality distribution consist of the parameters of the prior perceived quality distribution and consumer review components. That is, the distribution of the posterior perceived quality mean is Normal with mean $W_{1ij}$ and variance $\sigma^2_{1ij}$, where $W_{1ij}$ consists of parameters of prior perceived quality ($W_{0ij}$ and $\sigma^2_{0ij}$) and product review components ($\bar{r}_j$ and $n_j$), and $\sigma^2_{1ij}$ consists of $\sigma^2_{0ij}$ and $n_j$. Similarly, the inverse of the posterior perceived quality variance, $1/\sigma^2_{q_{ij}}$, follows a gamma distribution with parameters consisting of perceived quality $q_{ij}$ and product review components ($r_{jl}$ and $n_j$). That is, the posterior perceived quality is affected not only by the characteristics of product reviews ($\bar{r}_j$, $n_j$, and $s^2_j$), but also by the characteristics of a consumer’s prior perceived quality ($W_{0ij}$ and $\sigma^2_{0ij}$).
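
The mean update in Equation (6) is a precision-weighted average, which a short sketch makes concrete. It treats the review variance as a known plug-in value rather than drawing it from its posterior, which is a simplification. The function name `bayesian_update` and the prior values are hypothetical; the review statistics are those reported for Fiesta Americana in Table 3.2.

```python
def bayesian_update(prior_mean, prior_var, review_mean, n_reviews, review_var):
    """Posterior mean and variance of perceived quality, following Equation (6).

    Assumption: review_var is plugged in as if known, instead of being
    sampled from a posterior as in the full model.
    """
    prior_precision = 1.0 / prior_var
    data_precision = n_reviews / review_var
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * review_mean)
    return post_mean, post_var


# Hypothetical prior (mean 3.0, variance 1.0) combined with 69 reviews
# averaging 4.90 with variance 0.37.
post_mean, post_var = bayesian_update(3.0, 1.0, 4.90, 69, 0.37)
print(round(post_mean, 3), round(post_var, 4))
```

With many reviews and a small review variance, the data precision dominates, so the posterior mean moves close to the review mean. With few reviews or a large variance, the prior retains more weight. This is how the mean, number, and variance of reviews all enter the update.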

As alternatives to Bayesian updating, the heuristic update methods we consider are (1)

product reviews and (2) the average of prior perceived quality and product reviews. As the

format of product reviews in our dataset is a 1-5 scale, which is common, we use multinomial

random draws for product reviews.
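
The two heuristics, and the idea of drawing review ratings from a 1-5 histogram, can be sketched as follows. The function names and the histogram counts are hypothetical, and the sampler is only one simple way to realize multinomial draws, not necessarily the procedure used in the estimation.

```python
import random


def replace_heuristic(prior_mean, review_mean):
    """Heuristic (1): adopt the product reviews and discard the prior."""
    return review_mean


def average_heuristic(prior_mean, review_mean):
    """Heuristic (2): simple average of prior perceived quality and reviews."""
    return 0.5 * (prior_mean + review_mean)


def draw_review_ratings(histogram_counts, n_draws, rng):
    """Draw ratings on the 1-5 scale with probabilities proportional to the counts."""
    ratings = [1, 2, 3, 4, 5]
    return rng.choices(ratings, weights=histogram_counts, k=n_draws)


# Hypothetical histogram: counts of 1-star through 5-star reviews.
counts = [2, 3, 5, 10, 7]
draws = draw_review_ratings(counts, 1000, random.Random(0))
review_mean = sum(draws) / len(draws)
print(replace_heuristic(3.0, review_mean), average_heuristic(3.0, review_mean))
```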

Note that regardless of the updating methods, we assume that consumers update their

prior perceived quality only for those products that they include in the consideration set. For

products excluded from the consideration set, we use the same perceived quality as in the consideration set stage.

Estimation of the two-stage model

We next discuss the estimation. We assume that the utilities in the consideration set stage

and the choice stage are interrelated and cast the two utility functions into a system of equations

as follows.

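(8)  $\begin{pmatrix} Z^*_i \\ Y^*_i \end{pmatrix} = \begin{pmatrix} X^C_i & 0 \\ 0 & X^F_i \end{pmatrix} \begin{pmatrix} \beta \\ \gamma \end{pmatrix} + \begin{pmatrix} \varepsilon_i \\ \eta_i \end{pmatrix}$,

where $Z^*_i$ is the vector of utilities from the consideration set stage and $Y^*_i$ is the vector of utilities from the choice stage. We assume that

$\begin{pmatrix} \varepsilon_i \\ \eta_i \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \Sigma_{zz} & \Sigma_{zy} \\ \Sigma'_{zy} & \Sigma_{yy} \end{pmatrix} \right)$,

where $\Sigma_{zz}$ and $\Sigma_{yy}$ are the variance-covariance matrices of $\varepsilon_i$ and $\eta_i$, respectively, and $\Sigma_{zy}$ is the covariance matrix between $\varepsilon_i$ and $\eta_i$.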

Thus our two-stage model is given by Equation (8). It allows different parameters for consideration utility $Z^*_i$ and choice utility $Y^*_i$, which is a flexible representation (Gilbride and Allenby 2004). Moreover, by looking at $\Sigma_{zy}$, we can see the relationship between consideration set utility and choice utility, which is relatively unexamined in the literature because it is empirically difficult to model correlation between the two stages (Nierop, Bronnenberg, Paap, Wedel, and Franses 2010).
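
To see what a nonzero cross-stage covariance means, the following sketch draws a consideration-stage error and a choice-stage error for a single product from a bivariate normal. This is a scalar simplification of the block covariance structure in Equation (8). The function names and the covariance value 0.6 are hypothetical.

```python
import math
import random


def draw_correlated_errors(var_z, var_y, cov_zy, rng):
    """Draw one (consideration error, choice error) pair with zero means."""
    # Cholesky factor of the 2x2 matrix [[var_z, cov_zy], [cov_zy, var_y]].
    l11 = math.sqrt(var_z)
    l21 = cov_zy / l11
    l22 = math.sqrt(var_y - l21 ** 2)
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return l11 * e1, l21 * e1 + l22 * e2


def sample_correlation(pairs):
    """Plain sample correlation of a list of (a, b) pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(42)
pairs = [draw_correlated_errors(1.0, 1.0, 0.6, rng) for _ in range(20000)]
print(round(sample_correlation(pairs), 2))
```

A positive covariance means that unobserved factors raising a product's chance of being considered also tend to raise its chance of being chosen once considered.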

Equation (8) is in the form of a SUR model, and the estimation method for the parameters $\beta$, $\gamma$, and $\Sigma$ is discussed in several places (e.g., Koop 2003). The difference from a standard SUR model is that we need to draw $Z^*_i$ and $Y^*_i$ by data augmentation using the consideration set and the choice. The full Bayesian MCMC algorithms, including the data augmentation procedure, are in the Appendix.
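
As a rough illustration of the data augmentation idea, the sketch below draws a single latent consideration utility from a normal distribution truncated to the region implied by the observed decision: positive if the product was included, non-positive otherwise. It is a univariate, unit-variance simplification and not the algorithm in the Appendix. The function name `draw_latent_utility` is hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def draw_latent_utility(mean, included, rng):
    """Draw z* ~ N(mean, 1) truncated to z* > 0 if included, z* <= 0 otherwise."""
    p_nonpositive = STD_NORMAL.cdf(-mean)  # P(z* <= 0)
    u = rng.random()
    if included:
        prob = p_nonpositive + u * (1.0 - p_nonpositive)
    else:
        prob = u * p_nonpositive
    # Keep the argument strictly inside (0, 1) so the inverse CDF is defined.
    prob = min(max(prob, 1e-12), 1.0 - 1e-12)
    return mean + STD_NORMAL.inv_cdf(prob)


rng = random.Random(3)
print(draw_latent_utility(0.3, True, rng), draw_latent_utility(0.3, False, rng))
```

In a full sampler, draws of this kind would alternate with draws of the coefficients and covariance matrix, and each latent utility would be drawn conditional on the others.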

The probability that consumer i chooses product j from consideration set $C_i$ in the choice stage can be written as

$P(y_i = j) = P(y_i = j \mid C_i)\, P(C_i)$.

Here, $P(C_i)$ is the probability of the observed consideration set $C_i$, and it is calculated as follows:

$P(C_i) = P(c_{i1}, c_{i2}, \ldots, c_{iJ}) = P(d_{i1} z^*_{i1} > 0, d_{i2} z^*_{i2} > 0, \ldots, d_{iJ} z^*_{iJ} > 0)$,

where $c_{ij} = 1$ if consumer i includes product j in the consideration set and $c_{ij} = 0$ if not, and $d_{ij} = 1$ if consumer i includes product j in the consideration set and $d_{ij} = -1$ if not. For example, if consumer i includes products 1 and 2 out of three products {1, 2, 3}, the probability of consideration set $C_i$ is calculated as $P(C_i) = P(c_{i1} = 1, c_{i2} = 1, c_{i3} = 0) = P(z^*_{i1} > 0, z^*_{i2} > 0, z^*_{i3} < 0)$.
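
The sign trick with $d_{ij}$ is easy to check numerically. The sketch below assumes independent standard normal errors, in which case the consideration set probability factors into a product of normal CDFs, and compares that closed form with a simple frequency simulation. With correlated errors, as in the model, only a simulation approach would apply. All function names are hypothetical.

```python
import random
from statistics import NormalDist


def inclusion_signs(inclusion):
    """Map c_ij in {1, 0} to d_ij in {+1, -1}."""
    return [1 if c == 1 else -1 for c in inclusion]


def prob_consideration_set(means, inclusion):
    """P(C_i) under the assumption of independent N(0, 1) errors."""
    std_normal = NormalDist()
    prob = 1.0
    for mean, d in zip(means, inclusion_signs(inclusion)):
        prob *= std_normal.cdf(d * mean)  # P(d * z* > 0) with z* ~ N(mean, 1)
    return prob


def simulate_consideration_set_prob(means, inclusion, n_draws, rng):
    """Frequency estimate of P(d_i1 z*_i1 > 0, ..., d_iJ z*_iJ > 0)."""
    signs = inclusion_signs(inclusion)
    hits = 0
    for _ in range(n_draws):
        if all(d * (mean + rng.gauss(0.0, 1.0)) > 0 for mean, d in zip(means, signs)):
            hits += 1
    return hits / n_draws


# Products 1 and 2 included, product 3 excluded, all deterministic utilities zero.
exact = prob_consideration_set([0.0, 0.0, 0.0], [1, 1, 0])
approx = simulate_consideration_set_prob([0.0, 0.0, 0.0], [1, 1, 0], 20000, random.Random(7))
print(exact, approx)
```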

Furthermore, $P(y_i = j \mid C_i)$ is the probability that consumer i finally chooses product j given the consideration set $C_i$, and it is calculated as follows:

$P(y_i = j \mid C_i) = P(y^*_{ij} > y^*_{ik},\ \forall k \ne j \mid j, k \in C_i)$,

where $y_i$ is consumer i’s choice among J products. We calculate this probability using the GHK estimator (Keane 1994; Hajivassiliou et al. 1996).
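
For readers unfamiliar with GHK, the following is a generic textbook-style sketch of the simulator, not the implementation used here. It estimates the probability that every component of a multivariate normal vector is negative. A choice probability fits this form when the vector collects the utility differences $y^*_{ik} - y^*_{ij}$ for the competing products in the consideration set. The function names, the helper `cholesky`, and the test covariance are assumptions.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def cholesky(cov):
    """Lower-triangular Cholesky factor of a small positive definite matrix."""
    n = len(cov)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(cov[i][i] - partial)
            else:
                lower[i][j] = (cov[i][j] - partial) / lower[j][j]
    return lower


def ghk_negative_orthant(mean, cov, n_draws, rng):
    """GHK estimate of P(v_1 < 0, ..., v_m < 0) for v ~ N(mean, cov)."""
    chol = cholesky(cov)
    total = 0.0
    for _ in range(n_draws):
        weight, draws = 1.0, []
        for i in range(len(mean)):
            partial = sum(chol[i][k] * draws[k] for k in range(i))
            upper = (-mean[i] - partial) / chol[i][i]
            p_upper = STD_NORMAL.cdf(upper)
            weight *= p_upper  # conditional probability of satisfying constraint i
            # Draw e_i from a standard normal truncated to (-inf, upper).
            u = min(max(rng.random() * p_upper, 1e-12), 1.0 - 1e-12)
            draws.append(STD_NORMAL.inv_cdf(u))
        total += weight
    return total / n_draws


estimate = ghk_negative_orthant([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], 5000, random.Random(11))
print(estimate)
```

For two standard normal components with correlation 0.5, the orthant probability has the closed form 1/4 + arcsin(0.5)/(2π) = 1/3, which gives a convenient check on the simulator.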

Equation (8) involves a multivariate Probit model for consideration set and a multinomial

Probit model for the choice. As these are discrete choice models, we cannot identify all the

parameters in Equation (8). Following Edwards and Allenby (2003), we navigate in the

unidentified parameter space in estimation but report parameters which are divided by the

corresponding variances.

SURVEY AND DATA

To collect data to estimate the model, we conducted an online survey about hotel choice. To

make the task realistic, we used a specific tourist destination, Cancun, Mexico. We

constructed the survey website based on real hotel names, star ratings, product reviews and

prices taken from an online travel site, Travelocity.com. The survey page allows respondents to

access product reviews in both the consideration set stage and the choice stage if they want. We

presented respondents with search results similar to what they would see on online sites.

Respondents could read the hotel descriptions and get more information about hotel amenities by

clicking on the hotel name. However, compared to online sites such as Travelocity.com and

Hotels.com, we simplified the presentation in three ways. First, we picked a subset of 10 hotels

whose star ratings and product reviews were available. These are shown in Table 3.2 and

augmented by the last two columns which show, from the survey results, the percentage of

respondents who included them in their consideration sets and final choice decisions.

Second, we used the numerical ratings but not the textual descriptions of product reviews

because it is not easy to quantify these descriptions. Third, we presented respondents with the

number of reviews, the mean, and a histogram, as shown in Figure 3.1. In comparison,

Travelocity.com does not provide the histogram, but other websites such as Amazon.com do, and

we include it so that respondents have information about variance in reviews.

Table 3.2. Hotel Information

| Hotel | Stars | Price ($) | Product reviews: Mean | Product reviews: Number | Product reviews: Variance | Consideration | Choice |
|---|---|---|---|---|---|---|---|
| H1. Royal Solaris Cancun | 3.5 | 152 | 3.52 | 175 | 1.24 | 29.3% | 7.0% |
| H2. Dreams Cancun Resort | 4.5 | 229 | 4.70 | 308 | 0.57 | 32.0% | 7.8% |
| H3. GR Solaris Cancun | 4.0 | 173 | 3.54 | 209 | 1.20 | 23.0% | 2.8% |
| H4. InterContinental | 4.0 | 93 | 4.10 | 27 | 1.15 | 46.6% | 17.4% |
| H5. Riu Palace Las Americas | 4.0 | 198 | 4.62 | 179 | 0.60 | 31.1% | 6.3% |
| H6. Fiesta Americana | 5.0 | 168 | 4.90 | 69 | 0.37 | 37.6% | 15.2% |
| H7. JW Marriott Cancun | 5.0 | 139 | 3.80 | 29 | 2.33 | 50.6% | 23.1% |
| H8. Hotel Sotavento | 2.5 | 51 | 2.35 | 11 | 1.67 | 18.4% | 1.9% |
| H9. Imperial Las Perlas | 2.0 | 32 | 3.02 | 11 | 2.08 | 19.7% | 5.9% |
| H10. Holiday Inn Express | 3.0 | 60 | 2.45 | 12 | 1.42 | 36.7% | 12.7% |

Figure 3.1. Survey Screen on Some Hotel Information

Survey Procedure

First, we described the purpose of the survey and asked respondents about their travel experience. Second, we measured respondents’ prior knowledge about each hotel brand by

asking questions on awareness, stay experience, and perceived quality before they looked at

product reviews. Third, we presented hotel descriptions and product reviews and asked

respondents to shortlist hotels for further consideration if they want (i.e., the consideration set).

Fourth, we presented only hotels that respondents chose in the previous stage and asked them to

choose one hotel for their stay. Finally, we asked some demographic questions and how

respondents used product reviews in the consideration set and the choice stage. The survey

procedure is summarized in Figure 3.2.

Figure 3.2. Survey Procedure

We measured prior perceived quality with respect to mean and variance. We asked

respondents about the perceived quality of hotels on a scale of 1.0 to 5.0 with increments of 0.5.

This rating is the mean of prior perceived quality. For the variance, we asked respondents to rate their confidence in the quality evaluation on a 1-to-10 scale, as the inverse of confidence is related to the variance.

Descriptive Statistics

We recruited 771 respondents from an online survey company, TRCHOME.com. We

dropped 75 respondents who completed the survey too rapidly indicating that they were

uninvolved in the task. Demographics are as follows: The average age was 46.5 years (s.d.=12.4)

and 73% were female. Based on the zip codes, the respondents are spread across the US.

Respondents were familiar with online shopping (4.48 out of 5, s.d.=0.9) and online hotel

booking (3.71 out of 5, s.d.=1.4). Seventeen percent of them had stayed in Cancun for several days. Their usual hotel budget was $125 per night on average.

Respondents formed consideration sets with an average of 3.2 hotels. In general, hotels in the consideration set were globally recognized hotel brands. That is, the percentages of respondents who included Marriott, InterContinental, and Holiday Inn were 50.6%, 46.6%, and 36.7%, respectively. In the choice stage, the top three hotels were Marriott (23.1%), InterContinental (17.4%), and Fiesta Americana (15.2%).

RESULTS

Model Comparison

We tested the ten models given in Table 3.1 that use the four types of perceived quality in

the different stages. For example, Model 4 hypothesizes that consumers use prior perceived

quality in the consideration set stage and posterior perceived quality updated in the Bayesian

manner in the choice stage. Since there is some support for this, we refer to it as the proposed

model. We compared the different models with log marginal likelihoods using the importance

sampling method of Newton and Raftery (1994). Note that all models have the same number of

parameters. Comparing the models, we can address: (1) In what stage consumers use product

reviews, and (2) how consumers use product reviews – i.e., ignore them, completely adopt them

or incorporate them with prior perceived quality in the Bayesian manner or heuristic manner.
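
As a point of reference for how a log marginal likelihood can be estimated from MCMC output, one estimator discussed by Newton and Raftery (1994) is the harmonic mean of the likelihood over posterior draws. Whether this exact variant was used here is an assumption. The sketch computes it on the log scale for numerical stability. The function name and the example values are hypothetical.

```python
import math


def log_marginal_likelihood_harmonic(log_likelihoods):
    """Harmonic-mean estimate: log m = log N - logsumexp(-loglik).

    log_likelihoods: log-likelihood evaluated at each posterior draw.
    """
    negated = [-ll for ll in log_likelihoods]
    peak = max(negated)
    # Subtract the peak before exponentiating to avoid overflow.
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in negated))
    return math.log(len(negated)) - log_sum


# Hypothetical log-likelihood values at four posterior draws.
print(log_marginal_likelihood_harmonic([-3325.1, -3322.4, -3324.0, -3323.3]))
```

Because all candidate models have the same number of parameters, differences in these log marginal likelihoods can be compared directly across models.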

Table 3.3 shows the log marginal likelihoods of unconditional choice $P(y_i = j)$, consideration set $P(C_i)$, and conditional choice $P(y_i = j \mid C_i)$ for each model. The numbers in parentheses are the ranks of the likelihoods, where a smaller rank means a better model.

The comparison of log marginal likelihoods of unconditional choice reveals that Model

10, in which consumers are specified as using Bayesian updating of perceived quality in both

stages, is the best fitting model (log marginal likelihood is -3323). This means that consumers

use product reviews in the consideration set stage and update prior perceived quality with

product reviews in the Bayesian manner. Note that the Bayes factors between the best model (Model 10) and the other models are large enough to support Model 10 as the true model. When we separately look at the log marginal likelihoods of consideration set and conditional choice in Model 10, it is shown that Model 10 is the best fitting model for the consideration set stage, but it does not explain the choice stage much. This reveals that the best fit for the consideration set leads to the best model in the two-stage decision process.

Table 3.3. Log Marginal Likelihood of Models

| Model | Perceived Quality in Consideration Set Stage ($q^C$) | Perceived Quality in Choice Stage ($q^F$) | Log Marginal Likelihood: $P(y_i = j)$ | $P(C_i)$ | $P(y_i = j \mid C_i)$ |
|---|---|---|---|---|---|
| Model 1 | Prior | Prior | -3391.0 (6) | -2700.9 (7) | -761.5 (10) |
| Model 2 | Prior | Reviews | -3390.2 (5) | -2647.5 (3) | -752.1 (6) |
| Model 3 | Prior | Average* | -3389.0 (4) | -2689.3 (5) | -753.6 (8) |
| Model 4 | Prior | Bayesian | -3385.4 (3) | -2697.8 (6) | -752.6 (7) |
| Model 5 | Reviews | Reviews | -3392.2 (7) | -2689.2 (4) | -741.3 (2) |
| Model 6 | Reviews | Average | -3381.2 (2) | -2636.0 (2) | -745.0 (3) |
| Model 7 | Reviews | Bayesian | -3456.4 (10) | -2750.6 (10) | -748.7 (5) |
| Model 8 | Average | Average | -3405.1 (8) | -2721.8 (9) | -747.2 (4) |
| Model 9 | Average | Bayesian | -3416.8 (9) | -2715.0 (8) | -734.7 (1) |
| Model 10 | Bayesian | Bayesian | -3323.0 (1) | -2592.2 (1) | -759.2 (9) |

*: Average represents average of prior perceived quality and product reviews.
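
The size of the Bayes factors can be read off the unconditional log marginal likelihoods in Table 3.3: the log Bayes factor of the best model against any other model is simply the difference of the two values. The short sketch below does this arithmetic. The variable names are hypothetical, and the numbers are copied from the table.

```python
# Unconditional log marginal likelihoods, copied from Table 3.3.
log_ml = {
    "Model 1": -3391.0, "Model 2": -3390.2, "Model 3": -3389.0,
    "Model 4": -3385.4, "Model 5": -3392.2, "Model 6": -3381.2,
    "Model 7": -3456.4, "Model 8": -3405.1, "Model 9": -3416.8,
    "Model 10": -3323.0,
}

best = max(log_ml, key=log_ml.get)
# Log Bayes factor of the best model against each competitor.
log_bf = {m: log_ml[best] - v for m, v in log_ml.items() if m != best}

for model, value in sorted(log_bf.items(), key=lambda kv: kv[1]):
    print(f"{best} vs {model}: log Bayes factor = {value:.1f}")
```

Even against the runner-up, Model 6, the difference is 58.2 on the log scale, which corresponds to an extremely large Bayes factor.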

The second best model is Model 6, in which consumers use product reviews in the

consideration set stage and the average of prior perceived quality and product reviews in the

choice stage. This result shows that even product reviews themselves and the simple heuristic

have some utility towards describing the consumer decision process.

Among models, Model 4 was in accordance with the theory that the consideration set

stage is less effortful while the choice stage is more comprehensive (Gilbride and Allenby 2004).

However, our findings show that Model 4 is the third best model. While not the best, this

relatively high rank indicates that this model performs reasonably. Yet, the evidence outlined in

Table 3.3 does indicate that Bayesian updating is employed at the consideration stage, indicating

that respondents exerted the effort needed to integrate the information available at the

consideration stage with their prior beliefs.

In conclusion, product reviews turn out to be a critical factor in the decision process in

consideration set formation and all components of product reviews are used in the Bayesian

manner. Therefore, it is useful for firms to proactively manage product reviews such as by

displaying them saliently, or encouraging users to provide reviews, or having complaint redressal

mechanisms for unhappy consumers rather than having them air grievances online.

Parameter Estimation

Table 3.4 shows the parameter estimates from the consideration set stage and choice

stage of Model 10 whose marginal likelihood is the highest. In this model, Bayesian updating

perceived quality is employed as the measure of product quality. In the consideration set stage,

the hotel-specific intercept is closely associated with the possibility of the hotel being included in

the consideration set. For example, the intercept of JW Marriott has the highest value and it ranks

at the top in the consideration percentage. In contrast, the intercepts of Hotel Sotavento and

Imperial Las Perlas, which rank at the bottom, have quite negative values of about -0.9 each. Thus, the hotel-specific intercepts represent consumers’ tendency to consider hotels based on intrinsic hotel characteristics other than product attributes such as quality and price.

Table 3.4. Parameter Estimates of Model 10

| Variables | Consideration Set Model: Mean | 2.5% | 97.5% | Choice Model: Mean | 2.5% | 97.5% |
|---|---|---|---|---|---|---|
| H1. Royal Solaris | **-0.576** | -0.856 | -0.376 | -0.490 | -1.394 | 0.134 |
| H2. Dreams Resort | **-0.510** | -1.026 | -0.175 | -0.767 | -2.360 | 0.218 |
| H3. GR Solaris | **-0.740** | -0.971 | -0.570 | -1.054 | -2.165 | 0.212 |
| H4. InterContinental | **-0.220** | -0.559 | -0.008 | 0.080 | -0.443 | 0.661 |
| H5. Riu Palace Las Americas | **-0.549** | -1.007 | -0.274 | -0.154 | -1.138 | 0.755 |
| H6. Fiesta Americana | **-0.401** | -0.728 | -0.143 | **0.938** | 0.280 | 1.644 |
| H7. JW Marriott | -0.161 | -0.427 | 0.090 | 0.278 | -0.536 | 0.926 |
| H8. Hotel Sotavento | **-0.944** | -1.121 | -0.764 | -1.157 | -3.314 | 0.153 |
| H9. Imperial Las Perlas | **-0.915** | -1.058 | -0.773 | **-0.826** | -1.814 | -0.076 |
| H10. Holiday Inn | **-0.478** | -0.636 | -0.338 | **-1.184** | -2.357 | -0.481 |
| Bayesian Updating Perceived Quality | **0.054** | 0.005 | 0.115 | 0.019 | -0.045 | 0.089 |
| Price to WTP | **-0.104** | -0.189 | -0.048 | **-0.223** | -0.338 | -0.147 |
| Awareness | **0.252** | 0.138 | 0.421 | **0.423** | 0.180 | 0.676 |
| Experience | 0.075 | -0.058 | 0.210 | 0.038 | -0.227 | 0.289 |

Parameters in bold are significant at the 95% level.

The product attributes also affect consideration set membership. The coefficient of Bayesian updating perceived quality is positive (0.054). Therefore, hotels with high Bayesian updating perceived quality are more likely to be included in the consideration set. The negative coefficient of the ratio of price to willingness to pay (-0.104) shows that hotels with a higher ratio of price to willingness to pay are less likely to be included. In addition, as the coefficient of hotel brand awareness is positive (0.252), well-known hotels are more likely to be included in the consideration set. However, the coefficient of hotel experience at other places is not significant, possibly because most of the respondents had not stayed at hotels in the survey. In the case of Holiday Inn, around 50% of respondents had stayed at one of its chain hotels, but it seems that Holiday Inn, a relatively low quality hotel, is less attractive as a resort hotel to respondents.

In the choice stage, the hotel-specific intercepts are significant only for some hotels. The
positive coefficient of Fiesta Americana (0.938) means that if this hotel is included in the
consideration set, it is likely to be finally chosen. In contrast, the negative coefficients of
Imperial Las Perlas and Holiday Inn (-0.826 and -1.184, respectively) mean that even if
they are included in the consideration set, those hotels are much less likely to be finally chosen.
Other than those three hotels, there are no hotel-specific effects in the choice stage. This could be
because consumers have already considered the hotel-specific effects at the consideration set
stage.
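The significance pattern for the choice-stage intercepts can be read off the posterior intervals reported in Table 3.4. The following is a minimal sketch, assuming that an estimate is treated as significant when its 2.5% to 97.5% interval excludes zero; the dictionary layout and the helper name `excludes_zero` are illustrative and are not part of the original estimation code.

```python
# Choice-model intercepts from Table 3.4: (mean, 2.5% bound, 97.5% bound).
choice_intercepts = {
    "Royal Solaris": (-0.490, -1.394, 0.134),
    "Dreams Resort": (-0.767, -2.360, 0.218),
    "GR Solaris": (-1.054, -2.165, 0.212),
    "InterContinental": (0.080, -0.443, 0.661),
    "Riu Palace Las Americas": (-0.154, -1.138, 0.755),
    "Fiesta Americana": (0.938, 0.280, 1.644),
    "JW Marriott": (0.278, -0.536, 0.926),
    "Hotel Sotavento": (-1.157, -3.314, 0.153),
    "Imperial Las Perlas": (-0.826, -1.814, -0.076),
    "Holiday Inn": (-1.184, -2.357, -0.481),
}


def excludes_zero(lower, upper):
    """Assumed significance rule: the interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0


significant = [
    hotel
    for hotel, (_, lower, upper) in choice_intercepts.items()
    if excludes_zero(lower, upper)
]
print(significant)
```

Under this rule the flagged hotels are the same three that the text singles out for the choice stage.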

The effects of other product attributes are different in the two stages. In the choice stage,

unlike in the consideration set stage, the Bayesian updating perceived quality is not significant.

That is, after consumers consider alternatives with respect to quality in the consideration set

stage, they do not consider quality any more. Possibly, they consider hotels with similar quality

level. The coefficients of the ratio of price to willingness to pay and awareness have the same
signs as those in the consideration set stage (-0.223 and 0.423, respectively). Thus, among
hotels in the consideration set, hotels with a higher ratio of price to willingness to pay or
lesser-known hotels are less likely to be chosen. Finally, the experience variable is not significant in the

choice stage either.

In summary, hotel-specific characteristics affect both consideration set and choice. The
significance of Bayesian updating perceived quality means that product reviews play an
important role in the consideration set stage. But it is notable that consumers do not consider
quality again in the choice stage. Overall, lower price and higher awareness increase the
possibility of being included in the consideration set and of being chosen.

Values of Bayesian Updating Perceived Quality and Product Reviews

We compute the monetary value of a unit increase in Bayesian updating perceived quality

and product reviews. Our approach is to compute the unit changes of Bayesian updating

perceived quality and price, which induce the same change of the consideration set utility. We

use the coefficients of Bayesian updating perceived quality and price to willingness to pay. As

Bayesian updating perceived quality consists of prior perceived quality and product reviews, we

can finally derive the value of a unit increase in product reviews by the chain rule formula.

The coefficient of Bayesian updating perceived quality, \( \partial z^*_{ij} / \partial W_{1ij} = 0.054 \), means
that a unit increase in Bayesian updating perceived quality increases the consideration set utility
by 0.054. The coefficient of the ratio of price to willingness to pay, \( \partial z^*_{ij} / \partial p_{ij} = -0.104 \),
means that a unit decrease in the ratio increases the utility by 0.104. Therefore, a one-unit increase in
Bayesian updating perceived quality brings as much utility change as a 0.52 (= 0.054/0.104) unit
decrease in the ratio of price to willingness to pay. That is, for an individual i across all products,

\[ \frac{\partial z^*_{ij}}{\partial W_{1ij}} = (-0.52)\,\frac{\partial z^*_{ij}}{\partial p_{ij}}, \]

where \(z^*_{ij}\) is the utility of including product j in the consideration set, \(W_{1ij}\) is the expectation of
Bayesian updating perceived quality, and \(p_{ij}\) is the ratio of price to willingness to pay. As \(p_{ij}\) consists of
price \(p_j\) and willingness to pay \(WTP_i\), it is the case that

\[ dp_{ij} = d\!\left(\frac{p_j}{WTP_i}\right) = \frac{1}{WTP_i}\,dp_j. \]

Thus, a 0.52 unit decrease in \(p_{ij}\) is equivalent to a \(0.52\,WTP_i\) unit decrease in price, as

\[ dp_{ij} = -0.52 \;\Longleftrightarrow\; dp_j = -0.52\,WTP_i, \]

where \(WTP_i\) is the willingness to pay of individual i and is constant across hotels. In our dataset,
the average of \(0.52\,WTP_i\) over all respondents is $70.6, which indicates that a one-unit
increase in Bayesian updating perceived quality is worth $70.6, in that both a one-unit increase in
Bayesian updating perceived quality and a price decrease of $70.6 result in the same utility
change.
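The conversion from coefficients to dollars can be sketched in a few lines. The two coefficients are the consideration set estimates from Table 3.4; the function names and the willingness-to-pay figures in `example_wtp` are hypothetical and serve only to illustrate the arithmetic, since the survey responses themselves are not reproduced here.

```python
# Posterior-mean coefficients of the consideration set model (Table 3.4).
COEF_QUALITY = 0.054       # Bayesian updating perceived quality
COEF_PRICE_RATIO = -0.104  # ratio of price to willingness to pay


def quality_unit_in_ratio_units():
    """Decrease in price/WTP giving the same utility gain as one unit of quality."""
    return COEF_QUALITY / abs(COEF_PRICE_RATIO)


def quality_unit_in_dollars(wtp):
    """Equivalent price decrease in dollars for a respondent with willingness to pay `wtp`."""
    return quality_unit_in_ratio_units() * wtp


# Hypothetical willingness-to-pay values, for illustration only (not survey data).
example_wtp = [100.0, 150.0, 200.0]
example_values = [quality_unit_in_dollars(w) for w in example_wtp]

print(round(quality_unit_in_ratio_units(), 2))
print([round(v, 1) for v in example_values])
```

Averaging `quality_unit_in_dollars` over the actual respondents would correspond to the $70.6 figure reported above.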

Next, we calculate the monetary values of each component of product reviews using the
monetary value of Bayesian updating perceived quality. Based on Equation 6, we set up the
expectation of Bayesian updating perceived quality as

\[ W_{1ij} = \frac{(1/\sigma_{0ij}^2)\,W_{0ij} + (n_j/s_j^2)\,\bar{r}_j}{1/\sigma_{0ij}^2 + n_j/s_j^2} \]

after replacing \(q_{ij}^2\) by \(s_j^2\), which summarizes the variance of product reviews. Here \(W_{0ij}\) and
\(\sigma_{0ij}^2\) are the mean and variance of prior perceived quality, and \(\bar{r}_j\), \(n_j\), and \(s_j^2\) are the
mean, number, and variance of product reviews. By multiplying the
monetary value of Bayesian updating perceived quality and the derivatives of \(W_{1ij}\) with respect
to each component of product reviews, we can calculate the monetary values of product reviews
as follows.
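As a minimal sketch of this update, the posterior expectation is a precision-weighted average of the prior mean and the review mean. The function and argument names below are illustrative assumptions, and the numbers in the example call are made up rather than taken from the data.

```python
def posterior_mean(w0, sigma0_sq, r_bar, n, s_sq):
    """Precision-weighted average of the prior mean and the mean of product reviews.

    w0, sigma0_sq -- mean and variance of prior perceived quality
    r_bar, n, s_sq -- mean, number, and variance of product reviews
    """
    prior_precision = 1.0 / sigma0_sq
    review_precision = n / s_sq
    weighted_sum = prior_precision * w0 + review_precision * r_bar
    return weighted_sum / (prior_precision + review_precision)


# Made-up inputs: prior mean 3 with variance 1, ten reviews averaging 4 with variance 2.
example_posterior = posterior_mean(w0=3.0, sigma0_sq=1.0, r_bar=4.0, n=10, s_sq=2.0)
print(example_posterior)
```

With no reviews (`n = 0`) the function returns the prior mean, and as the number of reviews grows the result moves toward the review mean.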

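Component of Product Reviews: Monetary Value (= Necessary Price Change × Derivative), expressed as the equivalent price decrease

Mean \((\bar{r}_j)\):

\[ \frac{\partial z^*_{ij}}{\partial \bar{r}_j} = \frac{\partial z^*_{ij}}{\partial W_{1ij}}\,\frac{\partial W_{1ij}}{\partial \bar{r}_j} \;\Rightarrow\; (0.52\,WTP_i)\,\frac{n_j/s_j^2}{1/\sigma_{0ij}^2 + n_j/s_j^2} \]

Number \((n_j)\):

\[ \frac{\partial z^*_{ij}}{\partial n_j} = \frac{\partial z^*_{ij}}{\partial W_{1ij}}\,\frac{\partial W_{1ij}}{\partial n_j} \;\Rightarrow\; (0.52\,WTP_i)\,\frac{(\bar{r}_j - W_{0ij})\,(1/\sigma_{0ij}^2)\,(1/s_j^2)}{(1/\sigma_{0ij}^2 + n_j/s_j^2)^2} \]

Variance \((s_j^2)\):

\[ \frac{\partial z^*_{ij}}{\partial s_j^2} = \frac{\partial z^*_{ij}}{\partial W_{1ij}}\,\frac{\partial W_{1ij}}{\partial s_j^2} \;\Rightarrow\; (0.52\,WTP_i)\,\frac{-(\bar{r}_j - W_{0ij})\,(1/\sigma_{0ij}^2)\,(n_j/s_j^4)}{(1/\sigma_{0ij}^2 + n_j/s_j^2)^2} \]

Note that even though the monetary value of a one-unit change in the posterior, \(0.52\,WTP_i\), is
product-invariant, the monetary values of product reviews are product-variant, as the prior
perceived quality and product reviews are product-variant.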
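The three derivatives can be checked numerically. The sketch below assumes the precision-weighted form of the posterior expectation, evaluates the analytic derivatives, and compares them with central finite differences; all function names and input values are illustrative, and the factor 0.52 is the ratio of coefficients derived above.

```python
def posterior_mean(w0, sigma0_sq, r_bar, n, s_sq):
    """Posterior expectation of perceived quality (precision-weighted average)."""
    a = 1.0 / sigma0_sq  # prior precision
    b = n / s_sq         # precision of the review mean
    return (a * w0 + b * r_bar) / (a + b)


def derivatives(w0, sigma0_sq, r_bar, n, s_sq):
    """Analytic derivatives of the posterior mean w.r.t. review mean, number, variance."""
    a = 1.0 / sigma0_sq
    b = n / s_sq
    denom_sq = (a + b) ** 2
    d_mean = b / (a + b)
    d_number = (r_bar - w0) * a * (1.0 / s_sq) / denom_sq
    d_variance = -(r_bar - w0) * a * (n / s_sq ** 2) / denom_sq
    return d_mean, d_number, d_variance


def finite_differences(w0, sigma0_sq, r_bar, n, s_sq, h=1e-6):
    """Central finite-difference approximations of the same three derivatives."""
    d_mean = (posterior_mean(w0, sigma0_sq, r_bar + h, n, s_sq)
              - posterior_mean(w0, sigma0_sq, r_bar - h, n, s_sq)) / (2 * h)
    d_number = (posterior_mean(w0, sigma0_sq, r_bar, n + h, s_sq)
                - posterior_mean(w0, sigma0_sq, r_bar, n - h, s_sq)) / (2 * h)
    d_variance = (posterior_mean(w0, sigma0_sq, r_bar, n, s_sq + h)
                  - posterior_mean(w0, sigma0_sq, r_bar, n, s_sq - h)) / (2 * h)
    return d_mean, d_number, d_variance


def monetary_values(wtp, w0, sigma0_sq, r_bar, n, s_sq, ratio=0.52):
    """Equivalent price decrease (in dollars) per unit increase of each component."""
    return tuple(ratio * wtp * d for d in derivatives(w0, sigma0_sq, r_bar, n, s_sq))


# Made-up inputs in which the review mean exceeds the prior mean.
inputs = dict(w0=3.0, sigma0_sq=1.0, r_bar=4.0, n=10.0, s_sq=2.0)
analytic = derivatives(**inputs)
numeric = finite_differences(**inputs)
max_error = max(abs(x - y) for x, y in zip(analytic, numeric))
print(analytic, max_error)
```

In this example the review mean is above the prior mean, so the derivative with respect to the number of reviews is positive and the derivative with respect to the variance is negative; reversing the inequality flips both signs.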

Table 3.5 shows the monetary values of a unit increase in the mean, number, and variance

of product reviews by hotels.

Table 3.5. Monetary Value of a Unit Increase in Product Review Components

The first three numeric columns give the monetary value ($) of a unit increase in the mean \((\bar{r}_j)\), number \((n_j)\), and variance \((s_j^2)\) of product reviews; the last column gives the difference \((\bar{r}_j - W_{0ij})\).

Hotel                              Mean      Number    Variance    Mean minus prior
H1. Royal Solaris                  67.6      -0.003       0.60         -0.030
H2. Dreams Resort                  69.7       0.004      -1.10          0.931
H3. GR Solaris                     68.1      -0.002       0.45         -0.058
H4. InterContinental               56.2       0.160      -4.32          0.605
H5. Riu Palace Las Americas        69.1       0.012      -2.16          0.983
H6. Fiesta Americana               68.2       0.058      -3.99          0.941
H7. JW Marriott                    45.8      -0.093       2.71         -0.261
H8. Hotel Sotavento                40.6      -0.310       3.41         -0.302
H9. Imperial Las Perlas            39.6       0.343      -4.12          0.728
H10. Holiday Inn                   43.8      -0.667       8.68         -0.589

Two observations stand out: (1) a unit increase in the mean of product reviews is the most
valuable, a unit increase in the variance is second, and a unit increase in the number of
product reviews is not very valuable; and (2) the values of product reviews vary across hotels.

Regarding the value of a unit increase in the mean of product reviews, the average is $57,
with a maximum of $69.7 for Dreams Resort (H2) and a minimum of $39.6 for Imperial
Las Perlas (H9). The average value implies that a unit increase in the mean of product reviews
brings as much utility increase in the consideration set stage as a price decrease of $57. Therefore,
raising the mean of product reviews is an alternative to an undesirable price cut as a way to be
included in the consideration set.

Interestingly, however, the value of a unit change in the number of product reviews is not
high and its sign is inconsistent across hotels. The maximum value is $0.34 per review for
Imperial Las Perlas (H9) and the minimum value is -$0.67 for Holiday Inn (H10). The different signs
result from the difference between the mean of product reviews and prior perceived quality,
\((\bar{r}_j - W_{0ij})\). If the mean of product reviews is higher than prior perceived quality (e.g., H9),
consumers may interpret a larger number of product reviews positively. However, if the mean
of product reviews is lower than prior perceived quality (e.g., H10), consumers may have doubts
about the quality of those hotels, and a larger number of reviews confirms those doubts.

A unit increase in the variance of product reviews has moderate value and large differences
across hotels. For example, its maximum value is $8.68 for Holiday Inn (H10) and its minimum
value is -$4.32 for InterContinental (H4). Again, the different signs result from the difference
between the mean of product reviews and prior perceived quality, \((\bar{r}_j - W_{0ij})\), but the interpretation
is not the same. If the mean of product reviews is higher than prior perceived quality (e.g., H4),
high variance possibly makes consumers think that even though the overall quality is high, there
are some consumers who experienced low quality, in line with their own low prior perceived quality. So,
the value of high variance is negative. However, if the mean of product reviews is lower than
prior perceived quality (e.g., H10), consumers may regard large variance as consumer
heterogeneity and positively interpret it as a sign that some consumers experienced high quality, in line
with their own high prior perceived quality.
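The sign pattern described in the last two paragraphs can be verified directly against Table 3.5. The sketch below assumes that the four numeric entries for each hotel are, in order, the monetary values for the mean, number, and variance of product reviews, followed by the difference between the review mean and prior perceived quality; the variable names are illustrative.

```python
# Table 3.5 rows: (mean value, number value, variance value, review mean minus prior).
table_3_5 = {
    "H1": (67.6, -0.003, 0.60, -0.030),
    "H2": (69.7, 0.004, -1.10, 0.931),
    "H3": (68.1, -0.002, 0.45, -0.058),
    "H4": (56.2, 0.160, -4.32, 0.605),
    "H5": (69.1, 0.012, -2.16, 0.983),
    "H6": (68.2, 0.058, -3.99, 0.941),
    "H7": (45.8, -0.093, 2.71, -0.261),
    "H8": (40.6, -0.310, 3.41, -0.302),
    "H9": (39.6, 0.343, -4.12, 0.728),
    "H10": (43.8, -0.667, 8.68, -0.589),
}

# Average monetary value of a unit increase in the mean of product reviews.
average_mean_value = sum(row[0] for row in table_3_5.values()) / len(table_3_5)

# The value of the number of reviews shares the sign of (review mean - prior),
# while the value of the variance has the opposite sign.
number_sign_matches = all(row[1] * row[3] > 0 for row in table_3_5.values())
variance_sign_opposes = all(row[2] * row[3] < 0 for row in table_3_5.values())

print(round(average_mean_value, 2), number_sign_matches, variance_sign_opposes)
```

Under this reading of the columns, the average of the mean column is consistent with the reported value of about $57, and both sign relationships hold for all ten hotels.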

In summary, using the estimation results of Bayesian updating perceived quality and
price, we find that a unit increase in Bayesian updating perceived quality is worth $70.6 and that
the monetary values of product reviews vary by hotel and with the difference between the mean of
product reviews and prior perceived quality. In particular, the value of a unit increase in the mean of
product reviews is around $57 on average.

CONCLUSION

Summary

The objective of this paper is to study in which stages of the purchase decision process

consumers use product reviews and how they incorporate product reviews with their prior

perceived quality. We also evaluate how valuable product reviews are in monetary terms. We

used four types of perceived quality (viz. prior perceived quality, product reviews, average of

prior perceived quality and product reviews, and Bayesian updating perceived quality) in a two-

stage choice model in order to understand consumers’ decision processes when product reviews

are available.

The best fitting model (Model 10) shows that consumers use Bayesian updating
perceived quality in the consideration set stage. This means that consumers use product reviews
as early as the consideration set stage and that they update prior perceived quality in a Bayesian
manner, using the information components of product reviews. These components are the mean
of product reviews, their number, and their variance.

The estimation results in the two-stage choice model are summarized as follows: In the

consideration set stage, intrinsic hotel effects are high for well-known international hotel brands

such as Marriott but low for local hotels such as Hotel Sotavento. Hotels with high Bayesian

updating perceived quality are more likely to be included in the consideration set while hotels

with high price are less likely to be included. It is also shown that awareness is important for

hotels to be included.

In the choice stage, the results show that intrinsic hotel effects and Bayesian updating

perceived quality become much less important. Rather, price and awareness play a significant

role. Consumers consider hotels with a similar quality level in the consideration set stage but

once they construct consideration sets consisting of the similar quality hotels, they put more

weight on prices and awareness.

Finally, we compute the monetary values of the components of product reviews. We find
that a unit increase in the mean of product reviews is worth $57 on average. That is, by
improving the mean of product reviews, hotels become more likely to be included in the
consideration set at the same price, or can remain equally likely to be considered without
reducing prices. Our findings also show that the number of
product reviews is less important, while the variance of product reviews can have positive or
negative monetary value depending on the difference between the mean of product reviews and
prior perceived quality.

Managerial Implications

There are several managerial implications of our study, both for retail managers who present
product reviews of different manufacturers’ products and for manufacturers themselves. First, the

result that consumers use product reviews in the consideration set stage but less so in the choice

stage provides a guide for how to display product reviews. Since product reviews are important

from the consideration set stage, managers may need to give consumers easy access to product

reviews from the beginning of the search. The methods would include showing product reviews

in the list of first search results, or allowing consumers to sort the search results by the

components of product reviews. Then, consumers could actively use product reviews from the
consideration set stage. The managerial implication for manufacturers is that their
product quality needs to be good enough for inclusion in the consideration set, because once consumers
construct the consideration set, quality is no longer a choice criterion but price still is.
Therefore, as shown in our results of the choice model, they should be aware of stronger price
competition among manufacturers at a similar quality level.

Second, consumers’ Bayesian updating shows that retailers and manufacturers need to be
concerned about all components of product reviews (i.e., the mean, number, and variance), as all
components are used to update prior perceived quality. Particularly, it is recommended for

retailers to provide variance information, perhaps by using a histogram, as well as the mean and

the number which are commonly presented. Manufacturers should note that a high mean of

product reviews is much more important for determining Bayesian updating perceived quality

and eventually consideration set formation than a larger number of product reviews. Thus, it is

beneficial for manufacturers to provide encouragement to consumers who have positive

experiences with their products in order to have them write good product reviews and to handle

grievances of unhappy consumers proactively. In other words, manufacturers may need to

concentrate on motivating satisfied consumers more than increasing the number of product

reviews.

Third, regardless of the strong effects of product reviews, it is important to manage

consumers’ prior perceived quality and awareness at all times. Prior perceived quality is directly

related to Bayesian updating perceived quality and indirectly mediates the effects of the number

and variance of product reviews on Bayesian updating perceived quality. Therefore, if

manufacturers constantly maintain high prior perceived quality by brand positioning or

advertising, they may be able to negate the effects of bad product reviews, which are sometimes

inevitable.


A limitation of this research is that it uses only the numerical summary of product reviews,
whereas consumers also get product information from review passages and, on some sites, from the
percentage of consumers who rate a review as helpful or unhelpful. Furthermore, consumers
can deliberately search for positive phrases to include alternatives quickly or for negative
phrases to eliminate alternatives. Therefore, quantifying and analyzing descriptive
passages would give new insights. In further research, researchers can also utilize product reviews on
subcategories. In the hotel example, consumers may also refer to detailed evaluations of hotel
service, gym or pool, hotel condition, room cleanliness, or room comfort. As consumers consult
information on different subcategories depending on products or purchase situations (e.g.,
vacation, business, or family trip), models which consider detailed information would be useful.

APPENDIX

MCMC Algorithms

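Equation 8 is the main equation to estimate. Our approach is to sequentially draw the coefficient
vectors of the two stages, the error covariance matrix, \(Z_i^*\), and \(Y_i^*\). We first stack all
equations into vectors and matrices as

\[ U_i^* = \begin{pmatrix} Z_i^* \\ Y_i^* \end{pmatrix}, \qquad X_i = \begin{pmatrix} X_i^C & 0 \\ 0 & X_i^F \end{pmatrix}, \]

with \(B\) stacking the coefficient vectors of the consideration set stage and the choice stage, and
\(e_i\) stacking the corresponding error vectors.

We then stack all the observations together as

\[ U^* = \begin{pmatrix} U_1^* \\ \vdots \\ U_N^* \end{pmatrix}, \qquad X = \begin{pmatrix} X_1 \\ \vdots \\ X_N \end{pmatrix}, \qquad \text{and} \qquad e = \begin{pmatrix} e_1 \\ \vdots \\ e_N \end{pmatrix}, \]

and write

(A1) \( U^* = XB + e, \)

where \(e\) is \(N(0, \Omega)\), and where \(\Omega\) is a block-diagonal matrix given by \(I_N \otimes \Sigma\).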

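As Equation A1 is a SUR model, we can estimate \(B\) and \(\Sigma\) by using a Gibbs sampler
with standard Normal-Wishart priors. Specifically, we use a normal prior \(B \sim N(\underline{B}, \underline{\Sigma}_B)\) and a
Wishart prior \(\Sigma^{-1} \sim W(\underline{v}, \underline{V})\).

The posterior of \(B\) conditional on \(U^*\) and \(\Sigma^{-1}\) is \(B \mid U^*, \Sigma^{-1} \sim N(\overline{B}, \overline{\Sigma}_B)\), where

\[ \overline{B} = \overline{\Sigma}_B \Big( \underline{\Sigma}_B^{-1}\,\underline{B} + \sum_{i=1}^{N} X_i' \Sigma^{-1} U_i^* \Big) \qquad \text{and} \qquad \overline{\Sigma}_B = \Big( \underline{\Sigma}_B^{-1} + \sum_{i=1}^{N} X_i' \Sigma^{-1} X_i \Big)^{-1}. \]

The posterior for \(\Sigma^{-1}\) conditional on \(U^*\) and \(B\) is \(\Sigma^{-1} \mid U^*, B \sim W(\overline{v}, \overline{V})\), where \(\overline{v} = N + \underline{v}\) and

\[ \overline{V} = \Big( \underline{V}^{-1} + \sum_{i=1}^{N} (U_i^* - X_i B)(U_i^* - X_i B)' \Big)^{-1}. \]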
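To make the two conditional draws concrete, the following sketch runs the same Gibbs steps in the simplest possible case: a single equation with one scalar coefficient and a scalar error precision. This is a deliberate simplification and not the estimation code of the dissertation. In one dimension the Wishart distribution \(W(v, V)\) reduces to a Gamma distribution with shape \(v/2\) and scale \(2V\), which is what the code uses. The function name, the prior settings, and the simulated data are all assumptions made for illustration.

```python
import random


def gibbs_scalar(x, u, n_draws=2000, burn_in=500, seed=0,
                 prior_mean=0.0, prior_var=100.0, prior_df=1.0, prior_scale=1.0):
    """Gibbs sampler for u_i = x_i * b + e_i with e_i ~ N(0, 1/precision).

    Scalar analogue of the normal-Wishart conditionals: a normal draw for b
    given the precision, then a Gamma (one-dimensional Wishart) draw for the
    precision given b.
    """
    rng = random.Random(seed)
    n_obs = len(x)
    sxx = sum(xi * xi for xi in x)
    sxu = sum(xi * ui for xi, ui in zip(x, u))
    precision = 1.0  # starting value
    draws_b, draws_precision = [], []
    for iteration in range(n_draws):
        # b | u, precision ~ N(post_mean, post_var)
        post_var = 1.0 / (1.0 / prior_var + precision * sxx)
        post_mean = post_var * (prior_mean / prior_var + precision * sxu)
        b = rng.gauss(post_mean, post_var ** 0.5)
        # precision | u, b ~ W(post_df, post_scale) = Gamma(post_df / 2, scale 2 * post_scale)
        sse = sum((ui - xi * b) ** 2 for xi, ui in zip(x, u))
        post_df = n_obs + prior_df
        post_scale = 1.0 / (1.0 / prior_scale + sse)
        precision = rng.gammavariate(post_df / 2.0, 2.0 * post_scale)
        if iteration >= burn_in:
            draws_b.append(b)
            draws_precision.append(precision)
    return draws_b, draws_precision


# Simulated data with true coefficient 2 and error variance 1 (illustration only).
data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
u = [2.0 * xi + data_rng.gauss(0.0, 1.0) for xi in x]

draws_b, draws_precision = gibbs_scalar(x, u)
b_hat = sum(draws_b) / len(draws_b)
precision_hat = sum(draws_precision) / len(draws_precision)
print(round(b_hat, 2), round(precision_hat, 2))
```

In the full model the scalar quantities become the stacked coefficient vector and the inverse covariance matrix, and the latent utilities are redrawn by data augmentation within each iteration.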

Now, we estimate \(Z_i^*\) and \(Y_i^*\) using data augmentation. First, we augment \(Z_i^*\) given
\(Y_i^*\) and other parameters from a multivariate normal distribution

(A2) \( Z_i^* \sim MVN\big( E[Z_i^* \mid Y_i^*],\ \tilde{\Sigma}_{zz} \big), \)

where \(E[Z_i^* \mid Y_i^*]\) is the expectation of \(Z_i^*\) conditional on \(Y_i^*\), which depends on the
consideration set covariates \(X_i^C\), and \(\tilde{\Sigma}_{zz}\) is the variance-covariance
matrix of error terms in the consideration set stage conditional on the other variance-covariance
matrices \((\Sigma_{zy}, \Sigma_{yy})\). We draw a positive \(z_{ij}^*\) if consumer i includes product j in the consideration
set and a negative \(z_{ij}^*\) if consumer i does not include product j in the consideration set.
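The sign restriction on the latent consideration utility amounts to drawing from a truncated normal distribution. The sketch below shows a univariate version using the inverse-CDF method; it takes the conditional mean and standard deviation as given and ignores the correlation across products that the multivariate draw in A2 accounts for. The function name and the numbers in the example are illustrative assumptions.

```python
import random
from statistics import NormalDist


def draw_truncated_normal(mean, sd, positive, rng):
    """Draw from N(mean, sd^2) truncated to (0, inf) if `positive`, else to (-inf, 0).

    Uses the inverse-CDF method; very extreme tails would need a more careful sampler.
    """
    dist = NormalDist(mean, sd)
    p_zero = dist.cdf(0.0)  # probability mass below zero
    if positive:
        prob = rng.uniform(p_zero, 1.0)
    else:
        prob = rng.uniform(0.0, p_zero)
    # inv_cdf is undefined at exactly 0 and 1, so keep the probability strictly inside.
    prob = min(max(prob, 1e-12), 1.0 - 1e-12)
    return dist.inv_cdf(prob)


rng = random.Random(0)
# Product in the consideration set: latent utility must be positive.
considered = [draw_truncated_normal(-0.5, 1.0, True, rng) for _ in range(1000)]
# Product not in the consideration set: latent utility must be negative.
not_considered = [draw_truncated_normal(0.3, 1.0, False, rng) for _ in range(1000)]
print(min(considered), max(not_considered))
```

Even when the conditional mean is negative, every draw for a considered product is positive, which is what keeps the augmented utilities consistent with the observed consideration sets.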

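Second, we augment \(Y_i^*\) given \(Z_i^*\) and other parameters from a multivariate normal
distribution

(A3) \( Y_i^* \sim MVN\big( E[Y_i^* \mid Z_i^*],\ \tilde{\Sigma}_{yy} \big), \)

where \(E[Y_i^* \mid Z_i^*]\) is the expectation of \(Y_i^*\) conditional on \(Z_i^*\), which depends on the
choice stage covariates \(X_i^F\), and \(\tilde{\Sigma}_{yy}\) is the variance-covariance
matrix of error terms in the choice stage conditional on the variance-covariance matrices
\((\Sigma_{zz}, \Sigma_{zy})\).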

We draw \(y_{ij}^*\) according to whether product j is in the consideration set and whether it is
finally chosen from the consideration set. For products in the consideration set, we augment \(y_{ij}^*\)
from a sub-distribution of the distribution in A3, consisting of the mean vector and variance-covariance
matrix of products in the consideration set. We impose the restrictions that \(y_{ij}^*\) is the
highest for the finally chosen product among products in the consideration set and that each \(y_{ij}^*\)
is negative if the consumer chooses the outside option (No reservation). For products not
included in the consideration set, we augment \(y_{ij}^*\) from a sub-distribution of the distribution in
A3, consisting of the mean vector and covariance matrix of products not included in the
consideration set. Unlike for the products in the consideration set, however, we do not impose a
restriction on the size and sign of \(y_{ij}^*\).
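One simple way to impose these restrictions, shown here only as an illustrative sketch, is rejection sampling: draw candidate utilities for the products in one consideration set and keep the first draw that satisfies the restriction. The sketch assumes independent normal errors with a common standard deviation, whereas the model above draws from a correlated multivariate normal sub-distribution, and rejection sampling is a stand-in technique rather than the method used in the dissertation. It implements only the two restrictions stated in the text; the function name and example means are hypothetical.

```python
import random


def draw_choice_utilities(means, chosen, rng, sd=1.0, max_tries=100000):
    """Rejection sampler for latent choice utilities within one consideration set.

    means  -- conditional means of the latent utilities of the considered products
    chosen -- index of the finally chosen product, or None for the outside option
    """
    for _ in range(max_tries):
        candidate = [rng.gauss(m, sd) for m in means]
        if chosen is None:
            # Outside option chosen: every considered product has negative utility.
            accepted = all(value < 0 for value in candidate)
        else:
            # A product is chosen: its utility is the highest in the consideration set.
            accepted = all(candidate[chosen] >= value for value in candidate)
        if accepted:
            return candidate
    raise RuntimeError("no draw satisfied the restrictions within max_tries")


rng = random.Random(0)
example_means = [0.2, -0.1, 0.0]
chosen_second = draw_choice_utilities(example_means, chosen=1, rng=rng)
outside_option = draw_choice_utilities(example_means, chosen=None, rng=rng)
print(chosen_second, outside_option)
```

Rejection sampling becomes slow when the restriction is unlikely under the unrestricted distribution, so a sampler that draws each utility from its truncated conditional distribution would be a natural alternative in larger consideration sets.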

VITA

Sungha Jang received a Bachelor of Economics with a major in statistics in 1998 and a Master of

Business Administration concentrating on Marketing in 2001 from Korea University, Seoul,

Korea. He will be awarded the Doctor of Philosophy in Management Science specializing in

Marketing in May, 2011 at the University of Texas at Dallas. Prior to joining the Ph.D. program,

he worked for Experian Korea as a senior consultant in the field of credit risk management.
