A. Parasuraman, Valarie A. Zeithaml, & Leonard L. Berry

Reassessment of Expectations as a Comparison Standard in Measuring Service Quality: Implications for Further Research
The authors respond to concerns raised by Cronin and Taylor (1992) and Teas (1993) about the SERVQUAL instrument and the perceptions-minus-expectations specification invoked by it to operationalize service quality. After demonstrating that the validity and alleged severity of many of those concerns are questionable, they offer a set of research directions for addressing unresolved issues and adding to the understanding of service quality assessment.
Two recent articles in this journal, Cronin and Taylor (1992) and Teas (1993), have raised concerns about our specification of service quality as the gap between customers' expectations and perceptions (Parasuraman, Zeithaml, and Berry 1985) and about SERVQUAL, a two-part instrument we developed for measuring service quality (Parasuraman, Zeithaml, and Berry 1988) and later refined (Parasuraman, Berry, and Zeithaml 1991). In this article we respond to those concerns by reexamining the underlying arguments and evidence, introducing additional perspectives that we argue are necessary for a balanced assessment of the specification and measurement of service quality, and isolating the concerns that seem appropriate from those that are questionable. Building on the insights emerging from this exchange, we propose an agenda for further research. Because many of the issues raised in the two articles and in our response are still unsettled, we hope that the research agenda will encourage all interested researchers to address those issues and add to the service quality literature in a constructive manner.¹
Though the concerns raised by both C&T and Teas relate to service quality assessment, by and large they focus on different issues. For this reason, we discuss C&T's concerns first, then address the issues raised by Teas. We conclude with a discussion of further research for addressing unresolved issues, including the relationship between service quality and customer satisfaction.

¹Because of the frequent references to Cronin and Taylor in this article, hereafter we refer to them as C&T. For the same reason, we use initials in citing our own work, that is, PZB, PBZ, ZBP, or ZPB. Also, when appropriate, we use the abbreviations CS, SQ, and PI to refer to customer satisfaction, service quality, and purchase intentions, respectively.

A. Parasuraman is Federated Professor of Marketing, Texas A&M University. Valarie A. Zeithaml is a Partner, Schmalensee and Zeithaml. Leonard L. Berry is JCPenney Chair of Retailing Studies and Professor of Marketing, Texas A&M University.

Journal of Marketing, Vol. 58 (January 1994), 111-124
Response to Cronin and Taylor (1992)
C&T surveyed customers in four sectors (banking, pest control, dry cleaning, and fast food) using a questionnaire that contained (1) the battery of expectations and perceptions questions in the SERVQUAL instrument (PBZ 1991; PZB 1988), (2) a separate battery of questions to measure the importance of the SERVQUAL items, and (3) single-item scales to measure overall service quality, customer satisfaction, and purchase intentions. On the basis of their study, C&T conclude that it is unnecessary to measure customer expectations in service quality research. Measuring perceptions is sufficient, they contend. They also conclude that service quality fails to affect purchase intentions. We do not believe that these conclusions are warranted at this stage of research on service quality, as we attempt to demonstrate in this section. Our comments on C&T's study focus on (1) conceptual issues, (2) methodological and analytical issues, and (3) practical issues.
Conceptual Issues

The perceptions-expectations gap conceptualization of SQ. C&T contend (p. 56) that "little if any theoretical or empirical evidence supports the relevance of the expectations-performance gap as the basis for measuring service quality." They imply here and elsewhere that PZB's extensive focus group research (PZB 1985; ZPB 1990) suggests only the attributes of SQ and not its expectations-performance gap formulation. We wish to emphasize that our research provides strong support for defining SQ as the discrepancy between customers' expectations and perceptions, a point we make in PZB (1985) and ZPB (1990). C&T's assertion
Reassessment of Expectations as a Comparison Standard / 111
also seems to discount prior conceptual work in the SQ literature (Gronroos 1982; Lehtinen and Lehtinen 1982; Sasser, Olsen, and Wyckoff 1978) as well as more recent research (Bolton and Drew 1991a, b; ZBP 1991) that supports the disconfirmation-of-expectations conceptualization of SQ. Bolton and Drew (1991b, p. 383), in an empirical study cited by C&T, conclude:
Consistent with prior exploratory research concerning service quality, a key determinant of overall service quality is the gap between performance and expectations (i.e., disconfirmation). For residential customers, perceived telephone service quality depended on the disconfirmation triggered by perceived changes in existing service or changes in service providers. . . . It is interesting to note that disconfirmation explains a larger proportion of the variance in service quality than performance.
Therefore, C&T's use of the same Bolton and Drew article as support for their claim (p. 56) that "the marketing literature appears to offer considerable support for the superiority of simple performance-based measures of service quality" is surprising and questionable. Moreover, a second citation that C&T offer to support this contention, Mazis, Ahtola, and Klippel (1975), is an article that neither dealt with service quality nor tested performance-based measures against measures incorporating expectations.
There is strong theoretical support for the general notion that customer assessments of stimuli invariably occur relative to some norm. As far back as Helson (1964), adaptation-level theory held that individuals perceive stimuli only in relation to an adapted standard. More recently, Kahneman and Miller (1986) provide additional theoretical and empirical support for norms as standards against which performance is compared and can be measured.
Attitude formation versus attitude measurement. The empirical, longitudinal studies (e.g., Bolton and Drew 1991a, b; Churchill and Surprenant 1982; Oliver 1980) that C&T cite to support their arguments focus on the roles of expectations, performance, and disconfirmation in the formation of attitudes. However, the relevance of those arguments for raising concerns about SERVQUAL is moot because the instrument is designed merely to measure perceived SQ (an attitude level) at a given point in time, regardless of the process by which it was formed. SERVQUAL is a tool to obtain a reading of the attitude level, not a statement about how the level was developed.
Relationship between CS and SQ. An important issue raised by C&T (as well as Teas) is the nature of the link between CS and SQ. This is a complex issue characterized by confusion about the distinction between the two constructs as well as the causal direction of their relationship. SQ researchers (e.g., Carman 1990; PZB 1988) in the past have distinguished between the two according to the level at which they are measured: CS is a transaction-specific assessment (consistent with the CS/D literature) whereas SQ is a global assessment (consistent with the SQ literature). On the basis of this distinction, SQ researchers have posited that an accumulation of transaction-specific assessments leads to a global assessment (i.e., the direction of causality is from CS to SQ). However, on careful reflection, we now believe that this distinction may need to be revised, particularly because some recent studies (Reidenbach and Sandifer-Smallwood 1990; Woodside, Frey, and Daly 1989) have modeled SQ as an antecedent of CS.
In the research implications section, we propose and discuss an integrative framework that attempts to reconcile the differing viewpoints about the distinction, and the direction of causality, between CS and SQ. Here we wish to point out an apparent misinterpretation by C&T of our published work (PZB 1985, 1988). Specifically, in PZB (1985) we do not discuss the relationship between SQ and CS, and in PZB (1988) we argue in favor of the "CS leads to SQ" hypothesis. Therefore, we are surprised by C&T's attributing to us the opposite view by stating (p. 56) "PZB (1985, 1988) proposed that higher levels of perceived SQ result in increased CS" and by referring to (p. 62) "the effects hypothesized by PZB (1985, 1988) [that] SQ is an antecedent of CS." Moreover, as we discuss in the next section, C&T's empirical findings do not seem to warrant their conclusion (p. 64) that "perceived service quality in fact leads to satisfaction as proposed by PZB (1985, 1988)."
Types of comparison norms in assessing CS and SQ. In critiquing the perceptions-minus-expectations conceptualization of SERVQUAL, C&T raise the issue of appropriate comparison standards against which perceptions are to be compared (p. 56):

[PZB] state that in measuring perceived service quality the level of comparison is what a consumer should expect, whereas in measures of satisfaction the appropriate comparison is what a consumer would expect. However, such a differentiation appears to be inconsistent with Woodruff, Cadotte, and Jenkins' (1983) suggestion that expectations should be based on experience norms: what consumers should expect from a given service provider given their experience with that specific type of service organization.
Though the first sentence in this quote is consistent with the distinction we have made between comparison norms in assessing SQ and CS, the next sentence seems to imply that the debate over norms has been resolved, the consensus being that the "experience-based norms" of Woodruff, Cadotte, and Jenkins (1983) are the appropriate frame of reference in CS assessment. We believe that such an inference is unwarranted because conceptualizations more recent than Woodruff, Cadotte, and Jenkins (1983) suggest that CS assessment could involve more than one comparison norm (Forbes, Tse, and Taylor 1986; Oliver 1985; Tse and Wilton 1988; Wilton and Nicosia 1986). Cadotte, Woodruff, and Jenkins (1987) themselves have stated (p. 313) that "additional work is needed to refine and expand the conceptualization of norms as standards." Our own recent research on customers' service expectations (ZBP 1991, a publication that C&T cite) identifies two different comparison norms for SQ assessment: desired service (the level of service a customer believes can and should be delivered) and adequate service (the level of service the customer considers acceptable).
Our point is that the issue of comparison norms and their interpretation has not yet been resolved fully. Though
112 / Journal of Marketing, January 1994
recent conceptual work (e.g., Woodruff et al. 1991) and empirical work (e.g., Boulding et al. 1993) continue to add to our understanding of comparison norms and how they influence customers' assessments, more research is needed.
Methodological and Analytical Issues

C&T's empirical study does not justify their claim (p. 64) that "marketing's current conceptualization and measurement of service quality are based on a flawed paradigm" and that a performance-based measure is superior to the SERVQUAL measure. A number of serious questions about their methodology and their interpretation of the findings challenge the strong inferences they make.
Dimensionality of service quality. One piece of evidence C&T use to argue against the five-component structure of SERVQUAL is that their LISREL-based confirmatory analysis does not support the model shown in their Figure 1. Unfortunately, C&T's Figure 1 is not a totally accurate depiction of our prior findings pertaining to SERVQUAL (PZB 1988) because it does not allow for possible intercorrelations among the five latent constructs. Though we have said in our previous work that SERVQUAL consists of five distinct dimensions, we also have pointed out that the factors representing those dimensions are intercorrelated and hence overlap to some degree. We report average interfactor correlations of .23 to .35 in PZB (1988). In a more recent study in which we refined SERVQUAL, reassessed its dimensionality, and compared our findings with four other replication studies, we acknowledge and discuss potential overlap among the dimensions and arrive at the following conclusion (PZB 1991, p. 442):

Though the SERVQUAL dimensions represent five conceptually distinct facets of service quality, they are also interrelated, as evidenced by the need for oblique rotations of factor solutions in the various studies to obtain the most interpretable factor patterns. One fruitful area for future research is to explore the nature and causes of these interrelationships.

Therefore, we would argue that the fit of C&T's SERVQUAL data to their model in Figure 1 might have been better than that implied by their Table 1 results if the model had allowed the five latent constructs to intercorrelate.
C&T use results from their oblique factor analyses to reiterate their inference that the 22 SERVQUAL items are unidimensional. They do so by merely referring to their Table 2 and indicating (p. 61) that "all of the items loaded predictably on a single factor with the exception of item 19." But a careful examination of C&T's Table 2 raises questions about the soundness of their inference.
First, except for the factor representing SERVQUAL's perceptions component (i.e., SERVPERF) in the pest control context, the percentage of variance in the 22 items captured by the one factor for which results are reported is less than 50%, the cut-off value recommended by Bagozzi and Yi (1988) for indicators of latent constructs to be considered adequate. Moreover, across the four contexts, the variance captured for the SERVQUAL items is much lower than for the SERVPERF items (mean of 32.4% for SERVQUAL versus 42.6% for SERVPERF). These results strongly suggest that a unidimensional factor is not sufficient to represent fully the information generated by the 22 items, especially in the case of SERVQUAL, for which over two-thirds of the variance in the items is unexplained if just one factor is used. The difference in the variance explained for SERVQUAL and SERVPERF also suggests that the former could be a richer construct that more accurately represents the multifaceted nature of SQ posited in the conceptual literature on the subject (Gronroos 1982; Lehtinen and Lehtinen 1982; PZB 1985; Sasser, Olsen, and Wyckoff 1978).
Second, though C&T say that they conducted oblique factor analysis, they report loadings for just one factor in each setting (Table 2). It is not clear from their article whether the reported loadings are from the rotated or unrotated factor-loading matrix. If they are prerotation loadings, interpreting them to infer unidimensionality is questionable. If they are indeed postrotation loadings, the variance explained uniquely by each factor might be even lower than the percentages shown in Table 2. Because oblique rotation allows factors to correlate with one another, part of the variance reported as being explained by one factor might be shared with the other factors included in the rotation. The possibility that the unique variance represented by the single factors might be lower than the already low percentages in Table 2 further weakens C&T's claim that the SQ construct is unidimensional.
Finally, C&T's inference that SERVQUAL and SERVPERF can be treated as unidimensional on the basis of the high alpha values reported in Table 2 is erroneous because it is at odds with the extensive discussion in the literature about what unidimensionality means and what coefficient alpha does and does not represent (Anderson and Gerbing 1982; Gerbing and Anderson 1988; Green, Lissitz, and Mulaik 1977; Hattie 1985; Howell 1987). As Howell (1987, p. 121) points out, "Coefficient alpha, as an estimate of reliability, is neither necessary nor sufficient for unidimensionality." And, according to Gerbing and Anderson (1988, p. 190), "Coefficient alpha ... sometimes has been misinterpreted as an index of unidimensionality rather than reliability.... Unidimensionality and reliability are distinct concepts."
In short, every argument that C&T make on the basis of their empirical findings to maintain that the SERVQUAL items form a unidimensional scale is questionable. Therefore, summing or averaging the scores across all items to create a single measure of service quality, as C&T have done in evaluating their structural models (Figure 2), is questionable as well.
Validity. Citing evidence from the correlations in their Table 3, C&T conclude that SERVPERF has better validity than SERVQUAL (p. 61): "We suggest that the proposed performance-based measures provide a more construct-valid explication of service quality because of their content validity ... and the evidence of their discriminant validity." This suggestion is unwarranted because, as we demonstrate subsequently, SERVQUAL performs just as well as SERVPERF on each validity criterion that C&T use.
C&T infer convergent validity for SERVPERF by stating (p. 61), "A high correlation between the items SERVPERF, importance-weighted SERVPERF, and service quality indicates some degree of convergent validity." However, they fail to examine SERVQUAL in similar fashion. According to Table 3, the average pairwise correlation among SERVPERF, importance-weighted SERVPERF, and overall service quality is .6892 (average of .9093, .6012, and .5572). The corresponding average correlation for SERVQUAL is .6870 (average of .9787, .5430, and .5394). Clearly, the virtually identical average correlations for SERVPERF and SERVQUAL do not warrant the conclusion that the former has higher convergent validity than the latter.
C&T claim discriminant validity for SERVPERF by stating (p. 61), "An examination of the correlation matrix in Table 3 indicates discriminant validity ... as the three service quality scales all correlate more highly with each other than they do with other research variables (i.e., satisfaction and purchase intentions)." Though this statement is true, it is equally true for SERVQUAL, a finding that C&T fail to acknowledge. In fact, though the average pairwise correlation of SERVPERF with satisfaction and purchase intentions is .4812 (average of .5978 and .3647), the corresponding value for SERVQUAL is only .4569 (average of .5605 and .3534). Because the average within-construct intercorrelations are almost identical for the two scales, as demonstrated previously, these results actually imply somewhat stronger discriminant validity for SERVQUAL.
Regression analyses. C&T assess the predictive ability of the various multiple-item service quality scales by conducting regression analyses wherein their single-item overall SQ measure is the dependent variable. On the basis of the R² values reported in Table 4, they correctly conclude that SERVPERF outperforms the other three scales. Nevertheless, one should note that the average improvement in variance explained across the four contexts by using SERVPERF instead of SERVQUAL is 6%. (The mean R² values, rounded to two decimal places, are .39 for SERVQUAL and .45 for SERVPERF.) Whether this difference is large enough to claim superiority for SERVPERF is arguable. The dependent variable itself is a performance-based (rather than disconfirmation-based) measure and, as such, is more similar to the SERVPERF than the SERVQUAL formulation. Therefore, at least some of the improvement in explanatory power achieved by using SERVPERF instead of SERVQUAL could be merely an artifact of the "shared method variance" between the dependent and independent variables. As Bagozzi and Yi (1991, p. 426) point out, "When the same method is used to measure different constructs, shared method variance always inflates the observed between-measure correlation."
The overall pattern of significant regression coefficients in C&T's Table 4 offers some insight about the dimensionality of the 22 items as well. Specifically, it can be partitioned into the following five horizontal segments corresponding to the a priori classification of the items into the SERVQUAL dimensions (PZB 1988):

Segment 1: Tangibles (variables V1-V4)
Segment 2: Reliability (variables V5-V9)
Segment 3: Responsiveness (variables V10-V13)
Segment 4: Assurance (variables V14-V17)
Segment 5: Empathy (variables V18-V22)
The total numbers of significant regression coefficients in the various segments of Table 4 are as follows:

[Table: counts of significant regression coefficients for Segments 1-5 (Tangibles, Reliability, Responsiveness, Assurance, Empathy); the individual counts are illegible in this reproduction.]
As this summary implies, the items under the five dimensions do not all contribute in like fashion to explaining the variance in overall service quality. The reliability items are the most critical drivers, and the tangibles items are the least critical drivers. Interestingly, the relative importance of the five dimensions implied by this summary (that is, reliability being most important, tangibles being least important, and the other three being of intermediate importance) almost exactly mirrors our findings obtained through both direct (i.e., customer-expressed) and indirect (i.e., regression-based) measures of importance (PBZ 1991; PZB 1988; ZPB 1990). The consistency of these findings across studies, researchers, and contexts offers strong, albeit indirect, support for the multidimensional nature of service quality. These findings also suggest that techniques other than, or in addition to, factor analyses can and should be used to understand accurately the dimensionality of a complex construct such as SQ. As we have argued in another article comparing several SERVQUAL-replication studies that yielded differing factor patterns (PBZ 1991, p. 442-3):
Respondents' rating a specific company similarly on SERVQUAL items pertaining to different dimensions (rather than the respondents' inability to distinguish conceptually between the dimensions in general) is a plausible explanation for the diffused factor patterns. To illustrate, consider the following SERVQUAL items:

Employees at XYZ give you prompt service (a responsiveness item)
Employees of XYZ have the knowledge to answer your questions (an assurance item)
XYZ has operating hours convenient to all its customers (an empathy item)

If customers happen to rate XYZ the same or even similarly on these items, it does not necessarily mean that customers consider the items to be part of the same dimension. Yet, because of high intercorrelations among the three sets of ratings, the items are likely to load on the same factor when the ratings are factor analyzed. Therefore, whether an unclear factor pattern obtained through analyzing company-specific ratings necessarily implies poor discriminant validity for the general SERVQUAL dimensions is debatable.
In discussing the rationale for their first proposition, which they later test by using regression analysis, C&T state (p. 59), "The evaluation [of] P1 calls for an assessment of whether the addition of the importance weights suggested by ZPB (1990) improves the ability of the
SERVQUAL and SERVPERF scales to measure service quality." This statement is misleading because in ZPB (1990), we recommend the use of importance weights merely to compute a weighted average SERVQUAL score (across the five dimensions) as an indicator of a company's overall SQ gap. Moreover, the importance weights that we use in ZPB (1990) are weights for the dimensions (derived from customer responses to a 100-point allocation question), not for the individual SERVQUAL items. Neither in ZPB (1990) nor in our other work (PBZ 1991; PZB 1988) have we suggested using survey questions to measure the importance of individual items, let alone using weighted individual-item scores in regression analyses as C&T have done. In fact, we would argue that using weighted item scores as independent variables in regression analysis is not meaningful because a primary purpose of regression analysis is to derive the importance weights indirectly (in the form of beta coefficients) by using unweighted or "raw" scores as independent variables. Therefore, using weighted scores as independent variables is a form of "double counting."
Relationships among SQ, CS, and PI. Figure 2 in C&T's article shows the structural model they use to examine the interrelationships among SQ, CS, and PI. Their operationalization and testing of this model suffer from several serious problems. C&T's use of single-item scales to measure SQ, CS, and PI fails to do justice to the richness of these constructs. As already discussed, SQ is a multifaceted construct, even though there is no clear consensus yet on the number of dimensions and their interrelationships. Though a single-item overall SQ measure may be appropriate for examining the convergent validity and predictive power of alternative SQ measures such as SERVQUAL, it is not as appropriate for testing models positing structural relationships between SQ and other constructs such as PI, especially when a multiple-item scale is available. Therefore, in Figure 2, using SERVQUAL or SERVPERF directly to operationalize overall SQ (i.e., η2) would have been far better than using a single-item SQ measure as C&T have done. In other words, because C&T's P2, P3, and P4 relate to just the bottom part of their Figure 2, the procedure they used to operationalize ξ1 should have been used to operationalize η2 instead.
C&T's failure to use multiple-item scales to measure CS and PI, particularly in view of the centrality of these constructs to their discussion and claims, is also questionable. Multiple-item CS scales (e.g., Westbrook and Oliver 1981) are available in our literature and have been used in several customer-satisfaction studies (e.g., Oliver and Swan 1989; Swan and Oliver 1989). Multiple-item scales also have been used to operationalize purchase or behavioral intentions (e.g., Bagozzi 1982; Dodds, Monroe, and Grewal 1991).
Another serious consequence of C&T's use of single-item measures as indicators for their model's four latent constructs is the lack of degrees of freedom for a robust test of their model. The four measures yield only six intermeasure correlations (i.e., 4 × 3 / 2 = 6) to estimate the model's parameters. Because five parameters are being estimated as per Figure 2, only one degree of freedom is available to test the model, a fact that C&T do not report in their LISREL results summarized in Table 5 or their discussion of the results. Insufficient degrees of freedom to estimate the parameters of a structural model will tend to inflate the fit of the data to the model. For example, a model with zero degrees of freedom (i.e., the number of intermeasure correlations equal to the number of model parameters) will fit the observed data perfectly.
Therefore, the meaningfulness of C&T's observation (p. 63) that "Model 1 (SERVQUAL) had a good fit in two of the four industries ... whereas Model 2 (SERVPERF) had an excellent fit in all four industries" should be assessed carefully, keeping in mind two important points: (1) the fit values for both Model 1 and Model 2 could be inflated due to insufficient degrees of freedom, and (2) the somewhat better fit values for Model 2 could be an artifact of the "shared method variance" between the SERVPERF and the "Overall Service Quality" measures in the model. Because of the questionable meaningfulness of the structural model tests, C&T's interpreting the test results (p. 63) "as additional support for the superiority of the SERVPERF approach" is debatable as well.
A comparison of the results in C&T's Tables 3 and 5 reveals several serious inconsistencies and interpretational problems that reiterate the inadequacies of their measures and structural model test. For example, the correlation between the single-item measures of SQ and CS is .8175 (Table 3), implying that the measures are highly collinear and exhibit little or no discriminant validity. Yet C&T seem to ignore this problem and rely solely on the Table 5 results (which themselves are questionable because of the problems described previously) to make strong claims pertaining to the direction of causality between the two constructs (i.e., SQ leads to CS and not vice versa) and the relative influence of the two constructs on PI (i.e., CS has a significant effect in all four contexts and SQ has no significant effect in any context).
Regarding the latter claim, it is instructive to note from Table 3 that CS and SQ have virtually identical correlations with PI (.5272 for SQ and .5334 for CS). The most logical inference on the basis of this fact is that SQ and CS have a similar effect on PI, particularly because SQ and CS themselves are highly correlated. Such an inference is much more plausible and justified than C&T's conclusion that only CS has a significant effect on PI. That the findings from C&T's structural-model test led them to this questionable inference is perhaps a manifestation of the problems with their measures (single-item measures, multicollinearity) and their model (questionable causal path, lack of degrees of freedom).
Practical Issues

Arguing in favor of performance-based measures of SQ, C&T state (p. 58), "Practitioners often measure the determinants of overall satisfaction/perceived quality by having customers simply assess the performance of the company's business processes." Though the practice of measuring only perceptions is widespread, such a practice does not necessarily
Table 1
Mean Perceptions-Only and SERVQUAL Scores^a

Service Quality     Tel. Co.     Ins. Co. 1   Ins. Co. 2   Bank 1       Bank 2
Dimensions          P     SQ     P     SQ     P     SQ     P     SQ     P     SQ
Tangibles          5.6    .3    5.3    0     5.6    .6    5.4    .3    5.8    .7
Reliability        5.1  -1.3    4.8  -1.6    5.4  -1.0    5.1  -1.4    5.4  -1.1
Responsiveness     5.1  -1.2    5.1  -1.3    5.6   -.8    4.8  -1.6    5.4   -.9
Assurance          5.4  -1.1    5.4  -1.0    5.8   -.7    5.2  -1.4    5.7   -.8
Empathy            5.2  -1.1    5.1  -1.1    5.6   -.8    4.7  -1.6    5.2  -1.0

^a Numbers under "P" and "SQ" represent mean perceptions-only and SERVQUAL scores, respectively.
mean performance-based measures are superior to disconfirmation-based measures. As demonstrated in PBZ (1990), SQ measurements that incorporate customer expectations provide richer information than those that focus on perceptions only. Moreover, executives in companies that have switched to a disconfirmation-based measurement approach tell us that the information generated by this approach has greater diagnostic value.
In a recent study we administered the SERVQUAL scale to independent samples of customers of five nationally known companies (details of the study are in PBZ 1991). Table 1 summarizes perceptions-only and SERVQUAL scores by dimension for each of the five companies. As Table 1 shows, the SERVQUAL scores for each company consistently exhibit greater variation across dimensions than the perceptions-only scores. This pattern of findings suggests that the SERVQUAL scores could be superior in terms of pinpointing areas of deficiency within a company. Moreover, examining only performance ratings can lead to different actions than examining those ratings relative to customer expectations. For example, Ins. Co. 1 might focus more attention on tangibles than on assurance if it relied solely on the perceptions-only scores. This would be a mistake, because the company's SERVQUAL scores show a serious shortfall for assurance and no shortfall for tangibles.
An important question to ask in assessing the practical value of SERVQUAL vis-à-vis SERVPERF is: Are managers who use SQ measurements more interested in accurately identifying service shortfalls or explaining variance in an overall measure of perceived service? (Explained variance is the only criterion on which SERVPERF performs better than SERVQUAL, and, as discussed previously, this could be due to shared method variance.) We believe that managers would be more interested in an accurate diagnosis of SQ problems. From a practical standpoint, SERVQUAL is preferable to SERVPERF in our judgment. The superior diagnostic value of SERVQUAL more than offsets the loss in predictive power.
Response to Teas (1993)

The issues raised by Teas (1993) fall under three main topics: (1) interpretation of the expectations standard, (2) operationalization of this standard, and (3) evaluation of alternative models specifying the SQ construct.
Interpretation of Expectations

A key issue addressed by Teas is the impact of the interpretation of the expectations measure (E) on the meaningfulness of the P-E specification invoked by the SERVQUAL framework. Specifically, on the basis of a series of conceptual and mathematical arguments, Teas concludes that increasing P-E scores may not necessarily reflect continuously increasing levels of perceived quality, as the SERVQUAL framework implies. Though his conclusion has merit, several assumptions underlying his arguments must be reexamined to assess accurately the severity of the problem.
The P-E specification is problematic only for certain types of attributes under certain conditions. As Teas's discussion suggests, this specification is meaningful if the service feature being assessed is a vector attribute, that is, one on which a customer's ideal point is at an infinite level. With vector attributes, higher performance is always better (e.g., checkout speed at a grocery store). Thus, as portrayed under Case A in Figure 1, for vector attributes there is a positive monotonic relationship between P and SQ for a given level of the expectation norm E, regardless of how E is interpreted. This relationship is consistent with the SERVQUAL formulation of SQ. Moreover, customers are likely to consider most of the 22 items in the SERVQUAL instrument to be vector attributes, a point we discuss subsequently and one that Teas acknowledges in footnote 16 of his article.
The P-E specification could be problematic when a service attribute is a classic ideal point attribute, that is, one on which a customer's ideal point is at a finite level, so that performance beyond that level will displease the customer (e.g., friendliness of a salesperson in a retail store). However, the severity of the potential problem depends on how the expectations norm E is interpreted. Teas offers two interpretations of E that are helpful in assessing the meaningfulness of the P-E specification: a "classic attitudinal model ideal point" interpretation and a "feasible ideal point" interpretation. If E is interpreted as the classic ideal point (i.e., E = I) and the performance level P exceeds E, the relationship between P and SQ for a given level of E becomes negative monotonic, in contrast to the positive monotonic relationship implied by the SERVQUAL formulation. As Case B in Figure 1 shows, under the classic ideal point interpretation of E the P-E specification is meaningful as long as P is less than or equal to E, but becomes a
116 / Journal of Marketing, January 1994
FIGURE 1
Functional Relationship Between Perceived Performance (Pj) and Service Quality (SQ)

Case A: Attribute j is a vector attribute and, therefore, the general form of the function holds regardless of how the expectations standard (Ej) is defined. [Plot: SQ rises monotonically with Pj.]

Case B: Attribute j is a classic ideal-point attribute and Ej is interpreted as the classic ideal point (i.e., Ej = Ij). [Plot: SQ rises with Pj up to Pj = Ej, then declines.]

Case C: Attribute j is a classic ideal-point attribute and Ej is interpreted as the feasible ideal point (i.e., Ej < Ij). [Plot: SQ rises with Pj up to Pj = Ij, then declines.]
problem if P exceeds E. When this condition occurs, the correct specification for SQ is -(P - E).
To examine the P-E specification's meaningfulness under the feasible ideal point interpretation, Teas proposes a "modified one-attribute SERVQUAL model" (MQi), represented by equation 3 in his article. Dropping the subscript i from this equation for simplicity (doing so does not materially affect the expression) yields the following equation:

(1) MQ = -1[|P - I| - |E - I|]

where

MQ = "modified" quality
I  = the ideal amount of the attribute (i.e., the classic attitudinal model ideal point)
E  = the expectation norm interpreted as the feasible ideal point
P  = the perceived performance level
Teas defines the feasible ideal point (p. 20) as "a feasible level of performance under ideal circumstances, that is, the best level of performance by the highest quality provider under perfect circumstances." If one examines this definition from a customer's perspective (an appropriate one, given that E and I are specified by customers), it is inconceivable that the feasible ideal point would exceed the classic ideal point. Because the feasible ideal point represents the level of service the customer considers possible for the best service company, exceeding that level still should be viewed favorably by the customer. In contrast, the classic ideal point is the service level beyond which the customer would experience disutility. Therefore, the feasible ideal point logically cannot exceed the classic ideal point. In other words, when customers specify the levels of E and I, the range of feasible ideal points will be restricted as follows: E ≤ I. Yet, implied in Teas's discussion of MQ is the assumption that E can exceed I (Teas makes this assumption explicit in footnote 7). Under the more realistic assumption that E ≤ I, the absolute value of the E - I difference (the second component on equation 1's right side) is equivalent to -(E - I). Making this substitution in equation 1 yields

MQ = -1[|P - I| + (E - I)]

Furthermore, when P ≤ I, |P - I| is equivalent to -(P - I), and the equation for MQ simplifies to

MQ = -1[-(P - I) + (E - I)]
   = (P - I) - (E - I)
   = P - E
Therefore, as long as the perceived performance (P) does not exceed the classic ideal point (I), MQ is equivalent to SERVQUAL's P-E specification, regardless of whether P is below or above the feasible ideal point (E). As such, Teas's conclusion (p. 21) that "the SERVQUAL P-E measurement specification is not compatible with the feasible ideal point interpretation of E when finite classic ideal point attributes are involved" is debatable. As Case C in Figure 1 shows, when P ≤ I, the relationship between P and SQ is positive monotonic and identical to the relationship under the vector-attribute assumption (Case A).
The P-E specification does become a problem when P > I under the classic ideal point attribute assumption, with E being interpreted as the feasible ideal point. As shown by Case C in Figure 1, perceived service quality (SQ) peaks when P = I and starts declining thereafter. An important feature of Case C not obvious from Teas's discussion is that the classic ideal point (I) in effect supersedes the feasible ideal point (E) as the comparison standard when P exceeds I. In other words, when P > I, the shape of the SQ function is the same in Cases B and C except that in Case C the decline in SQ starts from a positive base level (of magnitude I - E), whereas in Case B it starts from a base level of zero. Therefore, when P > I under the feasible ideal point interpretation of E, the expression for service quality is

SQ = (I - E) - (P - I)
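Collecting the two regimes of Case C, SQ under the feasible ideal point interpretation of E can be written as a single piecewise function (our restatement of the expressions in the text):

```latex
SQ(P) =
\begin{cases}
P - E, & P \le I \\
(I - E) - (P - I), & P > I
\end{cases}
```

The two branches agree at P = I, where SQ attains its maximum value of I - E, consistent with the peak-and-decline shape of Case C in Figure 1.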
Figure 2 summarizes in flowchart form how the type of attribute and interpretation of E influence the specification of SQ. The P-E specification is appropriate under three of the five final branches in the flowchart. Therefore, the inappropriateness of the P-E specification depends on the likelihood of the occurrence of the remaining two branches. This is perhaps an empirical question that also has implications for further research on the nature of the service attributes, as we discuss subsequently.
Reassessment of Expectations as a Comparison Standard / 117
FIGURE 2
Impact of Attribute Type and Expectation Standard on Expression for Perceived Service Quality

[Flowchart:
Is attribute j a vector attribute or a classic ideal point attribute?
  Vector attribute: SQ = P - E.
  Classic ideal point attribute: Is E interpreted as the classic ideal point (E = I) or the feasible ideal point (E < I)?
    Classic ideal point (E = I): Is P ≤ E?
      If P ≤ E: SQ = P - E.  If P > E: SQ = -(P - E).
    Feasible ideal point (E < I): Is P ≤ I?
      If P ≤ I: SQ = P - E.  If P > I: SQ = (I - E) - (P - I).]
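The flowchart's logic can be sketched in code. This is a minimal illustration under the assumption of single scalar ratings; the function and parameter names are ours:

```python
def service_quality(p, e, ideal=None, vector_attribute=True,
                    e_is_classic_ideal=False):
    """Perceived service quality (SQ) for one attribute, per Figure 2.

    p     -- perceived performance rating
    e     -- expectation norm E
    ideal -- classic ideal point I (required for the feasible-ideal branch)
    """
    if vector_attribute:
        # Vector attribute: SQ = P - E regardless of how E is interpreted.
        return p - e
    if e_is_classic_ideal:
        # Classic ideal point attribute with E = I:
        # SQ = P - E up to P = E; beyond E, the correct form is -(P - E).
        return p - e if p <= e else -(p - e)
    # Classic ideal point attribute, E as the feasible ideal point (E < I):
    # SQ = P - E up to P = I; beyond I, SQ declines from its peak of I - E.
    return p - e if p <= ideal else (ideal - e) - (p - ideal)
```

For example, with `e=5` and `ideal=6` (a classic ideal point attribute under the feasible ideal point interpretation), performance ratings of 5, 6, and 7 yield SQ values of 0, 1, and 0: quality peaks at the classic ideal point and falls off once performance overshoots it.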
Operationalization of Expectations

Teas raises several concerns about the operationalization of E, including a concern we ourselves have raised (PBZ 1990) about the use of the word "should" in our original expectations measure (PZB 1988). Because of this concern, we developed a revised expectations measure (E*) representing the extent to which customers believe a particular attribute is "essential" for an excellent service company. Teas wonders if E* is really an improvement over E, our original measure (pp. 21-22):
Defining the revised SERVQUAL E* in this way, in conjunction with the P-E measurement specification, ... suggests high performance scores on essential attributes (high E* scores) reflect lower quality than high performance on attributes that are less essential (low E* scores). It is difficult to envision a theoretical argument that supports this measurement specification.
Teas's argument is somewhat misleading because high performance on an "essential" attribute may not be high enough (from the customer's standpoint) and, therefore, logically can reflect lower quality on that attribute (a key phrase missing from Teas's argument) than equally high performance on a less essential attribute. In fact, as we have argued in our response to C&T, this possibility is an important reason why measuring only perceived service performance can lead to inaccurate assessment of perceived service quality.
The operationalization issue addressed and discussed most extensively by Teas is the congruence (or lack thereof) between the conceptual and operational definitions of the expectation measures. The systematic, comprehensive, and unique approach Teas has used to explore this issue is commendable. His approach involves obtaining ratings on three different comparison norms (E, E*, and I), conducting follow-up interviews of respondents providing nonextreme ratings (i.e., less than 7 on the 7-point scales used), using multiple coders to classify the open-ended responses, and computing indexes of congruence on the basis of relative frequencies of responses in the coded categories.
Though the results from Teas's content analysis of the open-ended responses (summarized in his Appendix B and Tables 4 and 5) are sound and insightful, his interpretation of them is open to question. Specifically, "under the assumption that the feasibility concept is congruent with the excellence norm concept" (p. 25), he concludes that the index of congruence is only 39.4% for E and 35.0% for E*, though he later raises these indexes to 52.4% and 49.1%, respectively, by including several more response categories. On the basis of these values, Teas further concludes (p. 25) that "a considerable portion of the variance in the SERVQUAL E and E* measures is the result of measurement error induced by respondents misinterpreting the scales." As discussed subsequently, both conclusions are problematic.
The purpose of the follow-up interviews in Teas's study was to explore why respondents had lower than the maximum expectation levels; as Teas acknowledges (p. 24), "respondents answered qualitative follow-up questions concerning their reasons for non-extreme (non-7) responses." The issue of how respondents interpreted the expectations questions was not the focus of the follow-up interviews. Therefore, inferring incongruence between intended and interpreted meanings of expectations (i.e., measurement error) on the basis of the open-ended responses is questionable.
Pertinent to the present discussion is a study that we conducted to understand the nature and determinants of customers' service expectations (ZBP 1993, a revised version of the monograph ZBP 1991). In this study, we developed a conceptual model explicating the general antecedents of expectations and how they are likely to influence expectation levels. In the first column of Table 2, we list five general antecedents from our study that are especially likely to lower expectations. Shown in the second column are illustrative open-ended responses from Teas's study that relate to the general antecedents. The correspondence between the two columns of Table 2 is additional evidence that the items in Appendix B of Teas's article reflect reasons for expectation levels rather than interpretations of the expectation measures.
A key assumption underlying Teas's indexes of congruence is that the only open-ended responses congruent with the excellence norm standard are the ones included under the "not feasible," and perhaps "sufficient," categories. The rationale for this critical assumption is unclear. Therefore, what the index of congruence really represents is unclear as well. Given that the open-ended responses are reasons for customers' nonextreme expectations, no response listed in Appendix B of Teas's article is necessarily incompatible with the "excellence norm," that is, customers' perceptions of attribute levels essential for service excellence.

TABLE 2
Correspondence Between Antecedent Constructs in ZBP's Expectations Model and Follow-up Responses Obtained by Teas

Antecedent constructs from ZBP (1993), each followed by an illustrative follow-up response from Teas (1993) reflecting that construct:

Personal Needs: States or conditions essential to the physical or psychological well-being of the customer. ("Not necessary to know exact need.")

Perceived Service Alternatives: Customers' perceptions of the degree to which they can obtain better service through providers other than the focal company. ("All of them have pretty much good hours.")

Self-Perceived Service Role: Customers' perceptions of the degree to which they themselves influence the level of service they receive. ("I've gone into the store not knowing exactly myself what I want.")

Situational Factors: Service-performance contingencies that customers perceive are beyond the control of the service provider. ("Can't wait on everybody at once.")

Past Experience: Customers' previous exposure to the service that is relevant to the focal service. ("You don't get perfection in these kind of stores.")
Interestingly, the coded responses from Teas's study offer rich insight into another important facet of measuring SQ, namely, the meaningfulness of the P-E specification. As already discussed, whether the P-E specification is appropriate critically hinges on whether a service feature is a vector attribute or a classic ideal point attribute. The P-E specification is appropriate if the feature is a vector attribute or it is a classic ideal point attribute and perceived performance is less than or equal to the ideal level (see Figure 2). Customers' reasons for less-than-maximum expectations ratings in Teas's study shed light on whether a service feature is a vector or classic ideal point attribute. Specifically, reasons suggesting that customers dislike or derive negative utility from performance levels exceeding their expressed expectation levels imply classic ideal point attributes.
Of the open-ended responses in Appendix B of Teas's article, only those classified as "Ideal" (including double classifications involving "Ideal") clearly imply that the service features in question are classic ideal point attributes. None of the other open-ended responses implies that performance levels exceeding the expressed expectation levels will displease customers. Therefore, service features associated with all open-ended responses not classified as "Ideal" are likely to possess vector-attribute properties.
The response-category frequencies in Teas's Tables 4 and 5 can be used to compute a different type of "index of congruence" that reflects the extent to which reasons for nonextreme expectations ratings are compatible with a vector-attribute assumption. This index is given by

100 - Σ (percentages of responses in the "Ideal" categories)

From Table 4, this index is 97.6% for E and 96.6% for E*, implying strong support for the vector-attribute assumption. From Table 5, which shows the distribution of respondents rather than responses, this index is 91.6% for E and 90.8% for E*. These are conservative estimates because Table 5 allows for multiple responses and, therefore, some respondents could be included in more than one category percentage in the summation term of the previous expression, thereby inflating the term. Therefore, for a vast majority of the respondents, the vector-attribute assumption is tenable for all service attributes included in Teas's study.
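The index arithmetic can be sketched as follows. The "Ideal"-category percentages are back-computed from the index values quoted above (we do not reproduce Teas's Tables 4 and 5 here), and the function name is ours:

```python
# Vector-attribute "index of congruence": 100 minus the summed percentage
# of open-ended responses falling in the "Ideal" categories.
def congruence_index(ideal_pcts):
    """ideal_pcts: percentages of responses classified under 'Ideal'
    (including double classifications involving 'Ideal')."""
    return round(100 - sum(ideal_pcts), 1)

# "Ideal" shares implied by the indexes quoted in the text:
assert congruence_index([2.4]) == 97.6   # E,  responses   (Table 4)
assert congruence_index([3.4]) == 96.6   # E*, responses   (Table 4)
assert congruence_index([8.4]) == 91.6   # E,  respondents (Table 5)
assert congruence_index([9.2]) == 90.8   # E*, respondents (Table 5)
```

Because Table 5 permits multiple responses per respondent, its summation term can be inflated, which is why the 91.6% and 90.8% figures are conservative.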
Evaluation of Alternative Service Quality Models

Using the empirical results from his study, Teas examines the criterion and construct validity of eight different SQ models: unweighted and weighted versions of the SERVQUAL (P-E), revised SERVQUAL (P-E*), normed quality (NQ), and evaluated performance (EP) formulations. On the basis of correlation coefficients reflecting criterion and construct validity, he correctly concludes that his EP formulation outperforms the other formulations. However, apart from the statistical significance of the correlation differences reported in his Tables 7 and 8, several additional issues are relevant for assessing the superiority of the EP formulation.
In introducing these issues we limit our discussion to the unweighted versions of just the P-E, P-E*, and EP formulations for simplicity and also because (1) for all formulations, the unweighted version outperformed its weighted counterpart on both criterion and construct validity and (2) the NQ formulation, in addition to having the lowest criterion and construct validity among the four formulations, is not one that we have proposed. From Teas's equation 6 and his discussion of it, the unweighted EP formulation for any service is as follows (the service subscript "i" is suppressed for notational simplicity):

EP = -1[(1/m) Σj |Pj - Ij|]

where "m" is the number of service attributes and the summation is over the m attributes j. This expression assumes that the attributes are classic ideal point attributes, as evidenced by the fact that the theoretical maximum value for EP is zero, which occurs when Pj = Ij for all attributes j. As discussed in the preceding section, the evidence from Teas's follow-up responses strongly suggests that the service attributes included in the study are much more likely to be vector attributes than classic ideal point attributes. Therefore, the conceptual soundness of the EP formulation is open to question.
Teas's findings pertaining to nonextreme ratings on his Ij scale raise additional questions about the soundness of the classic ideal point attribute assumption invoked by the EP formulation. As his Table 4 reveals, only 151 non-7 ratings (13% of the total 1,200 ratings) were obtained for the ideal point (I) scale, prompting him to observe (p. 26), "It is interesting to note, however, that the tendency for nonextreme responses was much smaller for the ideal point measure than it was for the expectations measures." Ironically, this observation, though accurate, appears to go against the classic ideal point attribute assumption because the fact that only 13% of the ratings were less than 7 seems incompatible with that assumption. If the service features in question were indeed classic ideal point attributes, one would expect the ideal-point levels to be dominated by nonextreme ratings.
Even in instances in which nonextreme ratings were obtained for the ideal point (I) measure, a majority of the reasons for those ratings (as reflected by the open-ended responses to the follow-up questions) do not imply the presence of finite ideal-point levels for the attributes investigated. From Teas's Table 4, only 40.5% of the reasons are classified under "Ideal" (including double classifications with "Ideal" as one of the two categories). Teas does acknowledge the presence of this problem, though he labels it as a lack of discriminant validity for the ideal-point measure.
Finally, it is worth noting that a semantic-differential scale was used for the "P" and "I" measures in the EP formulation, whereas a 7-point "strongly disagree"-"strongly agree" scale was used for the three measures (P, E, and E*) in the SERVQUAL and revised SERVQUAL formulations. Therefore, the observed criterion and construct validity advantages of the EP formulation could be an artifact of the scale-format differences, especially because the correlation differences, though statistically significant, are modest in magnitude: For the unweighted formulations, the mean criterion-validity correlation difference between EP and the two SERVQUAL formulations is .077 ([.081 + .073]/2); the corresponding mean construct-validity correlation difference is .132 ([.149 + .114]/2). However, because these correlation differences are consistently in favor of EP, the semantic-differential scale format could be worth exploring in the future for measuring the SERVQUAL perceptions and expectations components.
Implications and Directions for Further Research

C&T and Teas have raised important, but essentially different, concerns about the SERVQUAL approach for measuring SQ. C&T's primary concerns are that SERVQUAL's expectations component is unnecessary and the instrument's dimensionality is problematic. In contrast, Teas questions the meaningfulness of SERVQUAL's P-E specification and wonders whether an alternate specification, the "evaluated performance" (EP) specification, is superior. We have attempted to demonstrate that the validity and alleged severity of many of the issues raised are debatable. Nevertheless, several key unresolved issues emerging from this exchange offer a challenging agenda for further research on measuring SQ.

Measurement of Expectations

Of the psychometric concerns raised by C&T about SERVQUAL, the only one that has consistent support from their empirical findings is the somewhat lower predictive power of the P-E measure relative to the P-only measure. Our own findings (PBZ 1991) as well as those of other researchers (e.g., Babakus and Boller 1992) also support the superiority of the perceptions-only measure from a purely predictive-validity standpoint. However, as argued previously, the superior predictive power of the P-only measure must be balanced against its inferior diagnostic value. Therefore, formally assessing the practical usefulness of measuring expectations and the trade-offs involved in not doing so is a fruitful avenue for additional research. More generally, a need and an opportunity exist for explicitly incorporating practical criteria (e.g., potential diagnostic value) into the traditional scale-assessment paradigm that is dominated by psychometric criteria. Perreault (1992, p. 371), in a recent commentary on the status of research in our discipline, issues a similar call for a broadened perspective in assessing measures:
Current academic thinking tends to define measures as acceptable or not primarily based on properties of the measure. However, research that focuses on defining what is "acceptable" in terms of a measure's likely impact on interpretation of substantive relationships, or the conclusions that might be drawn from those relationships, might provide more practical guidelines, not only for practitioners but also for academics.... There is probably a reasonable 'middle ground' between the 'all or nothing' contrast that seems to have evolved.

The most appropriate way to incorporate expectations into SQ measurement is another area for additional research. Though the SERVQUAL formulation as well as the various SQ models evaluated by Teas use a difference-score formulation, psychometric concerns have been raised about the use of difference scores in multivariate analyses (for a recent review of these concerns, see Peter, Churchill, and Brown 1993). Therefore, there is a need for studies comparing the paired-item, difference-score formulation of SERVQUAL with direct formulations that use single items to measure customers' perceptions relative to their expectations.
Brown, Churchill, and Peter (1993) conducted such a comparative study and concluded that a nondifference score measure had better psychometric properties than SERVQUAL. However, in a comment on their study, we have raised questions about their interpretations and demonstrated that the claimed psychometric superiority of the nondifference score measure is questionable (PBZ 1993). Likewise, as our response to C&T's empirical study shows, the psychometric properties of SERVQUAL's difference score formulation are by and large just as strong as those of its P-only component. Because the cumulative empirical evidence about difference score and nondifference score measures has not established conclusively the superiority of one measure over the other, additional research in this area is warranted. Such research also should explore alternatives to the Likert-scale format ("strongly agree"-"strongly disagree") used most frequently in past SERVQUAL studies. The semantic-differential scale format used by Teas to operationalize his EP formulation is an especially attractive alternative; as we pointed out previously, this scale format is a plausible explanation for the consistently superior performance of the EP formulation.
Dimensionality of Service Quality

As already discussed, C&T's conclusion that the 22 SERVQUAL items form a unidimensional scale is unwarranted. However, replication studies have shown significant correlations among the five dimensions originally derived for SERVQUAL. Therefore, exploring why and how the five service quality dimensions are interrelated (e.g., could some of the dimensions be antecedents of the others?) is a fertile area for additional research. Pursuing such a research avenue would be more appropriate for advancing our understanding of SQ than hastily discarding the multidimensional nature of the construct.
The potential drawback pointed out previously of using factor analysis as the sole approach for assessing the dimensionality of service quality implies a need for developing other approaches. As we have suggested elsewhere (PBZ 1991), an intriguing approach is to give customers definitions of the five dimensions and ask them to sort the SERVQUAL items into the dimensions only on the basis of each item's content. The proportions of customers "correctly" sorting the items into the five dimensions would reflect the degree to which the dimensions are distinct. The pattern of correct and incorrect classifications also could reveal potentially confusing items and the consequent need to reword the items and/or dimensional definitions.
Relationship Between Service Quality and Customer Satisfaction

The direction of causality between SQ and CS is an important unresolved issue that C&T's article addresses empirically and Teas's article addresses conceptually. Though C&T conclude that SQ leads to CS, and not vice versa, the empirical basis for their conclusion is questionable because of the previously discussed problems with their measures and analysis. Nevertheless, there is a lack of consensus in the literature and among researchers about the causal link between the two constructs. Specifically, the view held by many service quality researchers that CS leads to SQ is at odds with the causal direction implied in models specified by CS researchers. As Teas's discussion suggests, these conflicting perspectives could be due to the global or overall attitude focus in most SQ research in contrast to the transaction-specific focus in most CS research. Adding to the complexity of this issue, practitioners and the popular press often use the terms service quality and customer satisfaction interchangeably. An integrative framework that reflects and reconciles these differing perspectives is sorely needed to divert our attention from merely debating the causal direction between SQ and CS to enhancing our knowledge of how they interrelate.
FIGURE 3
Components of Transaction-Specific Evaluations

[Diagram: Evaluation of Service Quality (SQ), Evaluation of Product Quality (PQ), and Evaluation of Price (P) each feed into Transaction Satisfaction (TSAT).]
Teas offers a thoughtful suggestion that could be the foundation for such a framework (p. 30):

One way to integrate these two causal perspectives is to specify two perceived quality concepts, transaction-specific quality and relationship quality, and to specify perceived transaction-specific quality as the transaction-specific performance component of contemporary consumer satisfaction models. This implies that transaction-specific satisfaction is a function of perceived transaction-specific performance quality. Furthermore, ... transaction-specific satisfaction could be argued to be a predictor of perceived long-term relationship quality.
A key notion embedded in Teas's suggestion is that both SQ and CS can be examined meaningfully from both transaction-specific as well as global perspectives, a viewpoint embraced by other researchers as well (e.g., Dabholkar 1993). Building on this notion and incorporating two other potential antecedents of CS, product quality and price, we propose (1) a transaction-specific conceptualization of the constructs' interrelationships and (2) a global framework reflecting an aggregation of customers' evaluations of multiple transactions.
Figure 3 portrays the proposed transaction-specific conceptual model. This model posits a customer's overall satisfaction with a transaction to be a function of his or her assessment of service quality, product quality, and price.² This conceptualization is consistent with the "quality leads to satisfaction" school of thought that satisfaction researchers often espouse (e.g., Reidenbach and Sandifer-Smallwood 1990; Woodside, Frey, and Daly 1989). The two separate quality-evaluation antecedents capture the fact that virtually all market offerings possess a mix of service and product features and fall along a continuum anchored by "tangible dominant" at one end and "intangible dominant" at the other (Shostack 1977). Therefore, in assessing satisfaction with a personal computer purchase or an airline flight, for example, customers are likely to consider service features (e.g., empathy and knowledge of the computer salesperson, courtesy and responsiveness of the flight crew) and product features (e.g., computer memory and speed, seat comfort and meals during the flight) as well as price. Understanding the roles of service quality, product quality, and price evaluations in determining transaction-specific satisfaction is an important area for further research. For example, do customers use a compensatory process in combining the three types of evaluations? Do individual factors (e.g., past experience) and contextual factors (e.g., purchase/consumption occasion) influence the relative importance of the different evaluations and how they are combined?

²Though the term "product" theoretically can refer to a good (i.e., tangible product) or a service (i.e., intangible product), here we use it only in the former sense. Thus, "product quality" in Figure 3 and elsewhere refers to tangible-product or good quality.

FIGURE 4
Components of Global Evaluations

[Diagram: Global impressions about the firm, comprising customers' overall Satisfaction and their overall perceptions of Service Quality, Product Quality, and Price.]
In Figure 4, we present our proposed global framework. It depicts customers' global impressions about a firm stemming from an aggregation of transaction experiences. We posit global impressions to be multifaceted, consisting of customers' overall satisfaction with the firm as well as their overall perceptions of the firm's service quality, product quality, and price. This framework, in addition to capturing the notion that the SQ and CS constructs can be examined at both transaction-specific and global levels, is consistent with the "satisfaction (with specific transactions) leads to overall quality perceptions" school of thought embraced by service quality researchers (e.g., Bitner 1990; Bolton and Drew 1991a, b; Carman 1990). The term "transaction" in this framework can be used to represent an entire service episode (e.g., a visit to a fitness center or barber shop) or discrete components of a lengthy interaction between a customer and firm (e.g., the multiple encounters that a hotel guest could have with hotel staff, facilities, and services).
The SERVQUAL instrument, in its present form, is intended to ascertain customers' global perceptions of a firm's service quality. In light of the proposed framework, modifying SERVQUAL to assess transaction-specific service quality is a useful direction for further research. The proposed framework also raises other issues worth exploring. For example, how do customers integrate transaction-specific evaluations in forming overall impressions? Are some transactions weighed more heavily than others because of "primacy" and "recency" type effects or satisfaction/dissatisfaction levels exceeding critical thresholds? Do transaction-specific service quality evaluations have any direct influence on global service quality perceptions, in addition to the posited indirect influence (mediated through transaction-specific satisfaction)? How do the four facets of global impressions relate to one another? For example, are there any halo effects? Research addressing these and related issues is likely to add significantly to our knowledge in this area.
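To make the "recency" question concrete, here is a hypothetical sketch in which transaction-specific evaluations are aggregated with geometrically decaying weights so that recent transactions count more; the decay factor and scores are assumptions for illustration only:

```python
# Hypothetical illustration (not from the article): a global impression
# formed as a recency-weighted average of transaction-specific scores.
# The decay parameter is arbitrary.

def global_impression(transaction_scores, decay=0.5):
    """Weighted average of transaction scores (oldest first); each
    earlier transaction's weight is `decay` times the next one's,
    so the most recent transaction weighs most."""
    n = len(transaction_scores)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * s for w, s in zip(weights, transaction_scores)) / sum(weights)

# Three visits rated 2, 6, 7 (oldest first): the recent good visits
# dominate the early poor one.
impression = global_impression([2, 6, 7])  # (0.5 + 3 + 7) / 1.75 = 6.0
```

A "primacy" effect could be sketched the same way by reversing the weight ordering, and a threshold effect by giving extreme scores disproportionate weight.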
Specification of Service Quality

A primary contribution of Teas's article is that it highlights the potential problem with the P-E specification of SQ when the features on which a service is evaluated are not vector attributes. Though we believe that the collective evidence from Teas's study strongly supports the vector-attribute assumption, the concern he raises has implications for correctly specifying SQ and for further research on service attributes. Teas's EP specification, in spite of its apparent empirical superiority over the other models he tested, suffers from several deficiencies, including its being based on the questionable assumption that all service features are classic ideal point attributes. A mixed-model specification that assumes some features to be vector attributes and others to be classic ideal point attributes would be conceptually more appropriate (cf. Green and Srinivasan 1978). The flowchart in Figure 2 implies the following mixed-model specification that takes into account service-attribute type as well as interpretation of the comparison standard E:
SQ = Σj [D1j(Pj − Ej) + D2j(Ej − Pj) + D3j(2Ij − Pj − Ej)]

where Ij is the classic ideal point on attribute j, and

D1j = 1 if j is a vector attribute, or if it is a classic ideal point attribute and Pj ≤ Ij; = 0 otherwise.
D2j = 1 if j is a classic ideal point attribute and Ej is interpreted as the classic ideal point (i.e., Ej = Ij) and Pj > Ij; = 0 otherwise.
D3j = 1 if j is a classic ideal point attribute and Ej is interpreted as the feasible ideal point (i.e., Ej < Ij) and Pj > Ij; = 0 otherwise.
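Assuming each attribute has already been classified and, for classic ideal point attributes, the interpretation of Ej is known, the mixed-model specification can be sketched in code. The data layout and names below are ours, for illustration only:

```python
# A sketch of the mixed-model SQ specification (our own data layout).
# Each attribute dict holds a perception P, a comparison standard E, a
# type ("vector" or "ideal"), and -- for classic ideal point attributes --
# the ideal point I and how E is interpreted ("classic": E equals the
# ideal point; "feasible": E lies below it).

def sq_mixed_model(attributes):
    """Sum the per-attribute quality terms across one customer's ratings."""
    total = 0.0
    for a in attributes:
        P, E = a["P"], a["E"]
        if a["type"] == "vector":
            total += P - E               # D1 term: more is always better
        else:
            I = a["I"]                   # classic ideal point attribute
            if P <= I:
                total += P - E           # D1 term: performance below the ideal
            elif a["E_interpretation"] == "classic":
                total += E - P           # D2 term: E = I, performance beyond it
            else:
                total += 2 * I - P - E   # D3 term: E is the feasible ideal (E < I)
    return total

# One customer's (made-up) ratings on a 1-7 scale:
customer = [
    {"type": "vector", "P": 6, "E": 5},                                        # +1
    {"type": "ideal", "P": 7, "E": 5, "I": 5, "E_interpretation": "classic"},  # -2
    {"type": "ideal", "P": 6, "E": 4, "I": 5, "E_interpretation": "feasible"}, #  0
]
score = sq_mixed_model(customer)  # 1 - 2 + 0 = -1
```

Note that each attribute's contribution is continuous at Pj = Ij: below the ideal point quality rises with performance, and beyond it quality falls, which is exactly the ideal point logic the specification is meant to capture.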
Operationalizing this specification has implications for data collection and analysis in SQ research. Specifically, in addition to obtaining data on customers' perceptions (Pj) and comparison standards (Ij and/or Ej), one should ascertain whether customers view each service feature as a vector or classic ideal point attribute. Including a separate battery of questions is one option for obtaining the additional information, though this option will lengthen the SQ survey. An alternate approach is to conduct a separate study (e.g., focus group interviews in which customers discuss whether "more is always better" on each service feature) to preclassify the features into the two attribute categories.

122 / Journal of Marketing, January 1994
The expression for SQ is implicitly an individual-level specification because a given attribute j can be viewed differently by different customers. In other words, for the same attribute j, the values of D1j, D2j, and D3j can vary across customers. This possibility could pose a challenge in conducting aggregate-level analyses across customers similar to that faced by researchers using conjoint analysis (Green and Srinivasan 1990). An interesting direction for further research is to investigate whether distinct customer segments with homogeneous views about the nature of service attributes exist. Identification of such segments, in addition to offering managerial insights for market segmentation, will reduce the aforementioned analytical challenge in that the expression for SQ can be treated as a segment-level, rather than an individual-level, specification.
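One way to operationalize the segment idea, sketched under the assumption that each customer's attribute-type classifications have already been collected, is simply to group customers whose classification profiles match exactly. The function name and data are illustrative:

```python
# Hypothetical sketch: group customers into segments that hold the same
# view of every attribute's nature, so the SQ expression can be applied
# per segment rather than per individual. All data are made up.
from collections import defaultdict

def attribute_view_segments(customer_views):
    """Map each distinct classification profile (one 'vector'/'ideal'
    label per attribute) to the list of customers holding it."""
    segments = defaultdict(list)
    for customer_id, views in customer_views.items():
        segments[tuple(views)].append(customer_id)
    return dict(segments)

views = {
    "c1": ["vector", "ideal", "vector"],
    "c2": ["vector", "ideal", "vector"],
    "c3": ["vector", "vector", "vector"],
}
segments = attribute_view_segments(views)  # two segments: {c1, c2} and {c3}
```

In practice one would likely relax the exact-match requirement (e.g., cluster customers on their classification profiles), since requiring agreement on every attribute may leave many tiny segments.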
Conclusion

The articles by C&T and Teas raise important issues about the specification and measurement of SQ. In this article, we attempt to reexamine and clarify the key issues raised. Though the current approach for assessing SQ can and should be refined, abandoning it altogether in favor of the alternate approaches proffered by C&T and Teas does not seem warranted. The collective conceptual and empirical evidence casts doubt on the alleged severity of the concerns about the current approach and on the claimed superiority of the alternate approaches. Critical issues remain unresolved, and we hope that the research agenda we have articulated will be helpful in advancing our knowledge.
REFERENCES

Anderson, James C. and David W. Gerbing (1982), "Some Methods for Respecifying Measurement Models to Obtain Unidimensional Construct Measurement," Journal of Marketing Research, 19 (November), 453-60.
Babakus, Emin and Gregory W. Boller (1992), "An Empirical Assessment of the SERVQUAL Scale," Journal of Business Research, 24, 253-68.
Bagozzi, Richard P. (1982), "A Field Investigation of Causal Relations Among Cognitions, Affect, Intentions, and Behavior," Journal of Marketing Research, 19 (November), 562-84.
Bagozzi, Richard P. and Youjae Yi (1988), "On the Evaluation of Structural Equation Models," Journal of the Academy of Marketing Science, 16 (Spring), 74-79.
Bagozzi, Richard P. and Youjae Yi (1991), "Multitrait-Multimethod Matrices in Consumer Research," Journal of Consumer Research, 17 (March), 426-39.
Bitner, Mary Jo (1990), "Evaluating Service Encounters: The Effects of Physical Surroundings and Employee Responses," Journal of Marketing, 54 (April), 69-82.
Bolton, Ruth N. and James H. Drew (1991a), "A Longitudinal Analysis of the Impact of Service Changes on Customer Attitudes," Journal of Marketing, 55 (January), 1-9.
Bolton, Ruth N. and James H. Drew (1991b), "A Multistage Model of Customers' Assessments of Service Quality and Value," Journal of Consumer Research, 17 (March), 375-84.
Boulding, William, Ajay Kalra, Richard Staelin, and Valarie A. Zeithaml (1993), "A Dynamic Process Model of Service Quality: From Expectations to Behavioral Intentions," Journal of Marketing Research, 30 (February), 7-27.
Brown, Tom J., Gilbert A. Churchill, Jr., and J. Paul Peter (1993), "Improving the Measurement of Service Quality," Journal of Retailing, 69 (Spring), 127-39.
Cadotte, Ernest R., Robert B. Woodruff, and Roger L. Jenkins (1987), "Expectations and Norms in Models of Consumer Satisfaction," Journal of Marketing Research, 24 (August), 305-14.
Carman, James M. (1990), "Consumer Perceptions of Service Quality: An Assessment of the SERVQUAL Dimensions," Journal of Retailing, 66 (1), 33-55.
Churchill, Gilbert A., Jr. and Carol Surprenant (1982), "An Investigation Into the Determinants of Customer Satisfaction," Journal of Marketing Research, 19 (November), 491-504.
Cronin, J. Joseph, Jr. and Steven A. Taylor (1992), "Measuring Service Quality: A Reexamination and Extension," Journal of Marketing, 56 (July), 55-68.
Dabholkar, Pratibha A. (1993), "Customer Satisfaction and Service Quality: Two Constructs or One?" in Enhancing Knowledge Development in Marketing, Vol. 4, David W. Cravens and Peter R. Dickson, eds. Chicago: American Marketing Association, 10-18.
Dodds, William B., Kent B. Monroe, and Dhruv Grewal (1991), "The Effects of Price, Brand and Store Information on Buyers' Product Evaluations," Journal of Marketing Research, 28 (August), 307-19.
Forbes, J.D., David K. Tse, and Shirley Taylor (1986), "Toward a Model of Consumer Post-Choice Response Behavior," in Advances in Consumer Research, Vol. 13, Richard J. Lutz, ed. Ann Arbor, MI: Association for Consumer Research.
Gerbing, David W. and James C. Anderson (1988), "An Updated Paradigm for Scale Development Incorporating Unidimensionality and Its Assessment," Journal of Marketing Research, 25 (May), 186-92.
Green, Paul E. and V. Srinivasan (1978), "Conjoint Analysis in Consumer Research: Issues and Outlook," Journal of Consumer Research, 5 (September), 103-23.
Green, Paul E. and V. Srinivasan (1990), "Conjoint Analysis in Marketing: New Developments With Implications for Research and Practice," Journal of Marketing, 54 (October), 3-19.
Green, S.B., R. Lissitz, and S. Mulaik (1977), "Limitations of Coefficient Alpha as an Index of Test Unidimensionality," Educational and Psychological Measurement, 37 (Winter), 827-38.
Gronroos, Christian (1982), Strategic Management and Marketing in the Service Sector. Helsinki, Finland: Swedish School of Economics and Business Administration.
Hattie, John (1985), "Methodology Review: Assessing Unidimensionality," Applied Psychological Measurement, 9 (June), 139-64.
Helson, Harry (1964), Adaptation-Level Theory. New York: Harper and Row.
Howell, Roy D. (1987), "Covariance Structure Modeling and Measurement Issues: A Note on 'Interrelations Among a Channel Entity's Power Sources,'" Journal of Marketing Research, 24 (February), 119-26.
Kahneman, Daniel and Dale T. Miller (1986), "Norm Theory: Comparing Reality to Its Alternatives," Psychological Review, 93 (2), 136-53.
Lehtinen, Uolevi and Jarmo R. Lehtinen (1982), "Service Quality: A Study of Quality Dimensions," unpublished working paper. Helsinki, Finland: Service Management Institute.
Mazis, Michael B., Olli T. Ahtola, and R. Eugene Klippel (1975), "A Comparison of Four Multi-Attribute Models in the Prediction of Consumer Attitudes," Journal of Consumer Research, 2 (June), 38-52.
Oliver, Richard L. (1980), "A Cognitive Model of the Antecedents and Consequences of Satisfaction Decisions," Journal of Marketing Research, 17 (November), 460-69.
Oliver, Richard L. (1985), "An Extended Perspective on Post-Purchase Phenomena: Is Satisfaction a Red Herring?" unpublished paper presented at the 1985 Annual Conference of the Association for Consumer Research, Las Vegas (October).
Oliver, Richard L. and John E. Swan (1989), "Consumer Perceptions of Interpersonal Equity and Satisfaction in Transactions: A Field Survey Approach," Journal of Marketing, 53 (April), 21-35.
Parasuraman, A., Leonard L. Berry, and Valarie A. Zeithaml (1990), "Guidelines for Conducting Service Quality Research," Marketing Research (December), 34-44.
Parasuraman, A., Leonard L. Berry, and Valarie A. Zeithaml (1991), "Refinement and Reassessment of the SERVQUAL Scale," Journal of Retailing, 67 (Winter), 420-50.
Parasuraman, A., Leonard L. Berry, and Valarie A. Zeithaml (1993), "More on Improving Service Quality Measurement," Journal of Retailing, 69 (Spring), 140-47.
Parasuraman, A., Valarie A. Zeithaml, and Leonard L. Berry (1985), "A Conceptual Model of Service Quality and Its Implications for Future Research," Journal of Marketing, 49 (Fall), 41-50.
Parasuraman, A., Valarie A. Zeithaml, and Leonard L. Berry (1988), "SERVQUAL: A Multiple-Item Scale for Measuring Consumer Perceptions of Service Quality," Journal of Retailing, 64 (1), 12-40.
Perreault, William D., Jr. (1992), "The Shifting Paradigm in Marketing Research," Journal of the Academy of Marketing Science, 20 (Fall), 367-75.
Peter, J. Paul, Gilbert A. Churchill, Jr., and Tom J. Brown (1993), "Caution in the Use of Difference Scores in Consumer Research," Journal of Consumer Research, 19 (March), 655-62.
Reidenbach, R. Eric and Beverly Sandifer-Smallwood (1990), "Exploring Perceptions of Hospital Operations by a Modified SERVQUAL Approach," Journal of Health Care Marketing, 10 (December), 47-55.
Sasser, W. Earl, Jr., R. Paul Olsen, and D. Daryl Wyckoff (1978), "Understanding Service Operations," in Management of Service Operations. Boston: Allyn and Bacon.
Shostack, G. Lynn (1977), "Breaking Free from Product Marketing," Journal of Marketing, 41 (April), 73-80.
Swan, John E. and Richard L. Oliver (1989), "Postpurchase Communications by Consumers," Journal of Retailing, 65 (Winter), 516-33.
Teas, R. Kenneth (1993), "Expectations, Performance Evaluation and Consumers' Perceptions of Quality," Journal of Marketing, 57 (October), 18-34.
Tse, David K. and Peter C. Wilton (1988), "Models of Consumer Satisfaction Formation: An Extension," Journal of Marketing Research, 25 (May), 204-12.
Westbrook, Robert A. and Richard L. Oliver (1981), "Developing Better Measures of Consumer Satisfaction: Some Preliminary Results," in Advances in Consumer Research, Vol. 8, Kent B. Monroe, ed. Ann Arbor, MI: Association for Consumer Research, 94-99.
Wilton, Peter and Franco M. Nicosia (1986), "Emerging Paradigms for the Study of Consumer Satisfaction," European Research, 14 (January), 4-11.
Woodruff, Robert B., Ernest R. Cadotte, and Roger L. Jenkins (1983), "Modeling Consumer Satisfaction Processes Using Experience-Based Norms," Journal of Marketing Research, 20 (August), 296-304.
Woodruff, Robert B., D. Scott Clemons, David W. Schumann, Sarah F. Gardial, and Mary Jane Burns (1991), "The Standards Issue in CS/D Research: A Historical Perspective," Journal of Consumer Satisfaction, Dissatisfaction and Complaining Behavior, 4, 103-9.
Woodside, Arch G., Lisa L. Frey, and Robert Timothy Daly (1989), "Linking Service Quality, Customer Satisfaction, and Behavioral Intention," Journal of Health Care Marketing, 9 (December), 5-17.
Zeithaml, Valarie A., Leonard L. Berry, and A. Parasuraman (1991), "The Nature and Determinants of Customer Expectations of Service," Marketing Science Institute Research Program Series (May), Report No. 91-113.
Zeithaml, Valarie A., Leonard L. Berry, and A. Parasuraman (1993), "The Nature and Determinants of Customer Expectations of Service," Journal of the Academy of Marketing Science, 21 (1), 1-12.
Zeithaml, Valarie A., A. Parasuraman, and Leonard L. Berry (1990), Delivering Service Quality: Balancing Customer Perceptions and Expectations. New York: The Free Press.