Marketing Letters: A Journal of Research in Marketing
ISSN 0923-0645, Mark Lett, DOI 10.1007/s11002-014-9345-7

A method for evaluating and selecting field experiment locations

David Trafimow, James M. Leonhardt, Mihai Niculescu & Collin Payne

Apr 28, 2023


Your article is protected by copyright and all rights are held exclusively by Springer Science+Business Media New York. This e-offprint is for personal use only and shall not be self-archived in electronic repositories. If you wish to self-archive your article, please use the accepted manuscript version for posting on your own website. You may further deposit the accepted manuscript version in any repository, provided it is only made publicly available 12 months after official publication or later and provided acknowledgement is given to the original source of publication and a link is inserted to the published article on Springer's website. The link must be accompanied by the following text: "The final publication is available at link.springer.com".


© Springer Science+Business Media New York 2015

Abstract When marketing researchers perform field experiments, it is crucial that the experimental location and the control location are comparable. At present, it is difficult to assess the comparability of field locations because there is no way to distinguish differences between locations that are due to random versus systematic factors. To accomplish this, we propose a methodology that enables field researchers to evaluate and select optimal field locations by parsing these random versus systematic effects. To determine the accuracy of our proposed methodology, we performed computer simulations with 10,000 cases per simulation. The simulations demonstrate that accuracy increases as the number of data points increases and as consistency increases.

Keywords Marketing research · Field experiments · Experimental methodology and design · Potential performance theory


D. Trafimow (*)
Department of Psychology, New Mexico State University, MSC 3,452, PO Box 30001, Las Cruces, NM 88003-8001, USA
e-mail: [email protected]

J. M. Leonhardt (*) · M. Niculescu · C. Payne
Department of Marketing, New Mexico State University, MSC 5,280, PO Box 30001, Las Cruces, NM 88003-8001, USA
e-mail: [email protected]

M. Niculescu
e-mail: [email protected]

C. Payne
e-mail: [email protected]

Author's personal copy


1 Introduction

Much marketing research involves testing the effectiveness of particular interventions in the field. For an intervention to be practical, it is insufficient for it to work only in the laboratory. To make the case for practicality, the intervention must be demonstrated to be effective in the context of the complex of naturally occurring forces that are at play in the marketplace in which the intervention is to be applied (e.g., Heckman and Smith 1995; Levitt and List 2007; Lichtenstein and Slovic 1973). In addition, as field research continues to be emphasized in marketing, and the frequency with which articles that depend on field research are published continues to accelerate, we expect that increased attention will be devoted to field research method innovation. Here, we develop a methodology for assessing and increasing the validity of field experiments in marketing. This new methodology employs the mathematics of potential performance theory (Trafimow and Rice 2008, 2009) and allows researchers to select optimal treatment and control locations for their field experiments.

A preliminary description of the issue is as follows. A close examination of proposed experimental and control locations will reveal differences. It is unlikely, for example, that every time sales increase in one store they also will increase in another. There are two general classes of reasons for these differences. First, the complex of naturally occurring forces might differ between the two stores. The second reason is that differences can occur because of randomness, even if the complex of naturally occurring forces is precisely the same in the two stores. If there were no randomness in the world, it would be easy to test experimental-control location pairings. In the absence of randomness, a perfect control location would be one where the findings on the dependent variable of interest were the same as in the experimental location, on every measurement occasion, prior to the introduction of the intervention. The methodology presented in this paper allows researchers to handle randomness to approach this ideal state.

2 Theoretical background

In a field experiment, the experimenter chooses an experimental location and a control location such that the intervention occurs in the experimental location but not in the control location. For instance, suppose a large grocery retailer is interested in testing the effect of a particular promotion on produce sales. The retailer could implement the promotion in one of their stores and then compare that store's produce sales with another store where the promotion was absent. Should produce sales increase at the store with the promotion relative to the store without the promotion, the retailer may conclude that the promotion was successful in increasing produce sales. In addition, better field experiments may use paired baseline periods for treatment and control store sales (for example) so that differences are measured not only between, but within stores; the potential interaction (or difference of differences) of which would be used to indicate the effect of the intervention (e.g., the promotion).

However, there are factors that can compromise this conclusion. For example, in a laboratory experiment, participants are randomly assigned to conditions whereas in a field experiment this is not so. Consequently, alternative explanations pertaining to the existence of preexisting differences tend to be more problematic for field experiments than for laboratory experiments. Put another way, differences between the two locations, subsequent to the intervention, could be due to factors other than the intervention. To the extent that one can plausibly argue for one or more differences between the two locations, other than the intervention or lack thereof, confidence in the efficacy of the intervention decreases even when an effect is obtained.

The usual way of handling this problem is to attempt to show that the two locations match on variables that might be argued to be different between them. For example, it could be that there was more store traffic in the experimental store than the control store. A statistical demonstration that, in fact, traffic was approximately equal in both stores can help to alleviate the deleterious impact of the alternative explanation on the experimenter's preferred explanation that the difference in produce sales was due to the intervention. But there are problems with using matching to discredit alternative explanations.

The most obvious problem is that for ideal matching, the researcher needs to know all of the variables on which to match. It is unlikely that all of these will be known, thereby greatly reducing the ability of experimenters to use a matching methodology effectively. Even if all of the relevant variables were known, it is unlikely that the two locations will match on all of them. The experimenter is then reduced to trying to control for the nonmatching variables on statistical bases (e.g., via partial correlations, ANCOVAs, hierarchical regression, and so on), which are not very satisfactory given the many demonstrations of the problems that these methods carry with them (e.g., Heckman 1998).

Even if the foregoing problems were not so, there remains the issue that matching on observed scores does not necessarily imply a match on true scores. Consider the example of socioeconomic status influencing entrepreneurial success. Suppose that a researcher suspects that entrepreneurs of high socioeconomic status average out to a higher intelligence level than entrepreneurs of low socioeconomic status, and wishes to match on intelligence so as to rule it out as an explanation. If the difference in intelligence is really there, matching can only work by choosing high socioeconomic status entrepreneurs that are lower than their mean, low socioeconomic status entrepreneurs that are greater than their mean, or both. The statistical phenomenon of regression to the mean implies that even though the matching can be carried out successfully on observed scores, the participants will remain mismatched on true scores. That is, the true score mean of the high socioeconomic status entrepreneurs will still exceed that of the low socioeconomic status entrepreneurs, even after matching on observed scores.

The totality of these problems renders matching a poor way to know or demonstrate that experimental and control locations pair validly. Nevertheless, the problem remains an important one. We suggest a solution that does not depend on knowing what the relevant variables are, or how to measure them to show that there are no differences with respect to them, or how to handle the problem of regression to the mean when the true score means might differ. However, to understand the solution we propose, it is necessary to understand the basics of potential performance theory (Trafimow and Rice 2008, 2009).

The mathematics of potential performance theory (PPT) involves two types of inputs. First, there are observed frequencies. Second, there are within-entity (e.g., location 1 or 2) correlation coefficients or consistency coefficients. Based on these two types of inputs and PPT mathematics, the output is potential agreement, or the agreement that would be obtained in the absence of randomness. When performing field research, we are concerned with the agreement of two locations, the control location and the experimental location. These two locations might "agree" at a particular time or they might "disagree", and there are generally at least two ways of agreeing or disagreeing. For example, let us again consider produce sales. Two locations (e.g., stores) would agree if, on any particular day, their respective produce sales each exceeded or each fell below their respective produce sales medians. Alternatively, two locations would disagree if, on any particular day, one location's produce sales exceeded its daily produce sales median, while the other location's produce sales fell below its daily produce sales median, or the reverse.

In general, it is possible to compare these two locations using a 2 (location 1 is above or below its respective median) by 2 (location 2 is above or below its respective median) frequency table. The table would show observed frequency, that is, the frequency of observed agreement and observed disagreement for both locations across all trials (e.g., days) considered. For clarity regarding subsequent equations, we define the following variables. Lower case letters denote the frequency that both locations were above (a) or below (d) their respective medians; and when location 1 is above and location 2 is below their respective medians (b) and the reverse (c). There are also the following row and column frequencies: r1 = a + b, r2 = c + d, c1 = a + c, and c2 = b + d.
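The construction of the 2 by 2 table can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `agreement_table`, the two sales series, and the choice to count values exactly equal to the median as "below" are assumptions made here, not details given in the article.

```python
from statistics import median

def agreement_table(x, y):
    """Return (a, b, c, d) for two equal-length series split at their own medians.

    a: both locations above; b: location 1 above, location 2 below;
    c: location 1 below, location 2 above; d: both locations below.
    Values exactly equal to the median count as "below" (an assumption).
    """
    if len(x) != len(y):
        raise ValueError("both series must cover the same trials")
    mx, my = median(x), median(y)
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        hi_x, hi_y = xi > mx, yi > my
        if hi_x and hi_y:
            a += 1
        elif hi_x:
            b += 1
        elif hi_y:
            c += 1
        else:
            d += 1
    return a, b, c, d

# Hypothetical daily produce sales for two stores (illustrative numbers only).
store_1 = [120, 135, 110, 150, 142, 98, 160, 125]
store_2 = [80, 95, 70, 105, 90, 75, 110, 99]

a, b, c, d = agreement_table(store_1, store_2)
r1, r2, c1, c2 = a + b, c + d, a + c, b + d  # row and column frequencies
```

With these made-up series the stores agree on six of the eight days (a + d) and disagree on two (b + c).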

This frequency of agreement/disagreement between the two locations, as described in the foregoing 2 by 2 table, is the result of both random and systematic factors. However, PPT mathematics can provide us with an estimate of what the "true" frequencies would be in the absence of randomness. Stated another way, PPT mathematics can be used to construct a table that indicates potential agreement between the two locations (the agreement that would be obtained in the absence of randomness). It is a PPT convention to use upper case letters to represent the frequencies that would be obtained in the absence of randomness; for example, A is the true frequency (in absence of randomness) that corresponds to the observed frequency a, R1 is the true row 1 frequency that corresponds to the observed row frequency r1, and so on, as described in the previous paragraph.

Before presenting mathematical detail, consider again what it is that makes a control location a good one to contrast against an experimental location. As we pointed out earlier, the point of doing research in the field is to show not just that the intervention works, but also that it works in the complex of naturally occurring forces where it needs to work to have the desired effect in the real world. But how can we know that the complex of naturally occurring forces is the same in the experimental and control locations? Even if the two locations are similar in every way, so that the control location is ideal for the experimental location with which it is paired, randomness provides an effective disguise.

However, with PPT, it is possible to gain an excellent estimate of the agreement that could be expected between the two locations in the absence of randomness. If potential agreement (PA) is near unity, it indicates that the complex of naturally occurring forces works similarly in the two locations with respect to the dependent variable under investigation (e.g., produce sales), so that the control location is ideal for the experimental location with which it is paired. To the extent that PA is less than unity, the control location is less ideal for the experimental location with which it is paired. So the connection between PPT and the problem of determining the validity of control groups in field research becomes clear. Removing the veil of randomness renders it possible to directly evaluate the validity of a control location or the relative validity of competing control locations.

The PPT equations can be applied in the following steps (see Trafimow and Rice 2008 for proofs of all equations). First, it is necessary to convert the observed frequency table into a correlation coefficient. This can be accomplished via Eq. 1, where a, b, c, and d are the cell frequencies and r1, r2, c1, and c2 are row and column frequencies.

$$ r_{XY} = \frac{ad - bc}{\sqrt{(a+b)(a+c)(b+d)(c+d)}} = \frac{ad - bc}{\sqrt{r_1 r_2 c_1 c_2}} \tag{1} $$

The correlation coefficient in Eq. 1 is an observed correlation coefficient. But to correct for randomness, a "corrected" or "true" correlation coefficient is needed. Equation 2 provides that, based on the observed correlation from Eq. 1 and the consistency coefficients of each of the two locations ($r_{XX'}$ and $r_{YY'}$).

$$ R = \frac{r_{XY}}{\sqrt{r_{XX'}\, r_{YY'}}} \tag{2} $$
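As a minimal sketch of Eqs. 1 and 2, the two steps can be written as small Python functions. The function names, the example table (a = 30, b = 10, c = 10, d = 30), and the consistency values of 0.8 are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math

def observed_correlation(a, b, c, d):
    """Eq. 1: correlation implied by the observed 2 x 2 frequency table."""
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    return (a * d - b * c) / math.sqrt(r1 * r2 * c1 * c2)

def true_correlation(r_xy, r_xx, r_yy):
    """Eq. 2: observed correlation corrected by the two consistency coefficients."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical table and consistency coefficients (not from the article).
r_xy = observed_correlation(30, 10, 10, 30)  # (900 - 100) / 1600
R = true_correlation(r_xy, 0.8, 0.8)         # r_xy / 0.8
```

The sketch does not guard against a zero margin (which would make the denominator of Eq. 1 zero), and Eq. 2 can arithmetically yield a value above 1 when the observed correlation exceeds the geometric mean of the two consistency coefficients.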

The true correlation coefficient obtained in Eq. 2 can then be used to construct a table with true frequencies. These true frequencies are estimates of what would have happened in the absence of randomness.

To obtain these estimates, the PPT researcher sets the margin frequencies at the observed levels (R1 = r1, R2 = r2, C1 = c1, and C2 = c2), and uses these in conjunction with the true correlation coefficient obtained via Eq. 2. Equations 3–6 for obtaining all four true cell frequencies are below.

$$ A = \frac{R\sqrt{R_1 R_2 C_1 C_2} + C_1 R_1}{R_1 + R_2} \tag{3} $$

$$ B = \frac{R_1 (R_1 + R_2) - \left( R\sqrt{R_1 R_2 C_1 C_2} + C_1 R_1 \right)}{R_1 + R_2} \tag{4} $$

$$ C = \frac{C_1 R_2 - R\sqrt{R_1 R_2 C_1 C_2}}{R_1 + R_2} \tag{5} $$

$$ D = \frac{C_2 (R_1 + R_2) - \left[ R_1 (R_1 + R_2) - \left( R\sqrt{R_1 R_2 C_1 C_2} + C_1 R_1 \right) \right]}{R_1 + R_2} \tag{6} $$

Once these true cell frequencies have been estimated, Eq. 7 can be used to obtain PA.

$$ PA = \frac{A + D}{A + B + C + D} \tag{7} $$
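The full chain from an observed table to PA can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function name and the example inputs are assumptions, and the cells B, C, and D are computed from the fixed margins (B = R1 − A, C = C1 − A, D = C2 − B), which is algebraically the same as Eqs. 4 to 6 once the margins are held at their observed values.

```python
import math

def potential_agreement(a, b, c, d, r_xx, r_yy):
    """Return the estimated true cells (A, B, C, D) and PA for an observed table."""
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d    # margins, held fixed
    n = r1 + r2
    root = math.sqrt(r1 * r2 * c1 * c2)
    r_xy = (a * d - b * c) / root                   # Eq. 1
    R = r_xy / math.sqrt(r_xx * r_yy)               # Eq. 2
    A = (R * root + c1 * r1) / n                    # Eq. 3
    B = r1 - A                                      # equivalent to Eq. 4
    C = c1 - A                                      # equivalent to Eq. 5
    D = c2 - B                                      # equivalent to Eq. 6
    return (A, B, C, D), (A + D) / (A + B + C + D)  # Eq. 7

# Hypothetical inputs: 80 days, observed agreement 60/80 = 0.75,
# consistency 0.8 at both locations.
cells, pa = potential_agreement(30, 10, 10, 30, 0.8, 0.8)
```

For these made-up inputs the estimated PA (0.8125) is higher than the observed agreement (0.75), which is the direction of correction the text describes when consistency is below 1.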


To gain an idea of how inconsistency reduces observed agreement relative to PA, we provide Table 1. This table illustrates the difference between PA and observed agreement when locations vary in consistency. The table highlights that observed agreement is less than PA and that the discrepancy increases as consistency decreases.

3 Simulations

Considering that it is generally not possible to obtain an infinite amount of data, it is important to know how well PPT equations do when there is limited data, in which case we might expect PPT to perform at a level far below perfection. For each simulation, we tested the ability of PPT equations to discriminate between alternative PA levels {0.6 versus 0.7, 0.7 versus 0.8, 0.8 versus 0.9, and 0.9 versus 1.0} when sample size {N = 30, 60, 120, and 240} and consistency coefficients varied {0.6, 0.7, 0.8, and 0.9}. For example, when PA = 1.0 for one location and 0.90 for the other, we would hope that the PPT simulations would compute a higher PA estimate for the former than for the latter location. With infinite data, this would happen 100 % of the time. But with finite data, we expected a substantial percentage of failures, though the percentage of successes should increase as N increases. In addition, because lower consistency coefficients indicate more randomness, we expected success rates to increase as consistency coefficients increased.

To commence, our PPT simulations relied on user-defined consistency coefficients for each location {0.6, 0.7, 0.8, or 0.9}. There was no point in using a user-defined consistency coefficient of 1.0 as that would render PA equal to observed agreement. These user-defined values were then used as the basis for randomly generating 10,000 sample consistency coefficients for each location for each simulation. Note that as the user-defined consistency coefficients decreased, randomness increased, thereby enabling us to manipulate the amount of randomness in the simulations.

In the empirical PPT research that has been conducted thus far, researchers have employed two blocks of paired trials in order to obtain consistency coefficients. The idea is that if every trial in the first block of trials can be paired with a corresponding trial in the second block of trials, the consistency coefficient for a particular person who completes the two blocks of trials is indexed by using a within-person correlation across the two blocks of paired trials. Although the two block approach is particularly congruent with PPT, there might be cases where it is not feasible to obtain two blocks of paired trials and alternative ways to compute consistency coefficients have been proposed. For example, MacDonald and Trafimow (2013) proposed an equation that allows consistency coefficients to be estimated based on a single block of trials. In the simulations to be presented, we generated user-defined consistency coefficients and left open the issue of different ways to obtain them empirically based on different empirical contexts.

Table 1 Observed agreement as a function of potential agreement (PA) and consistency coefficients for two locations (rxx′ and ryy′)

| rxx′ | ryy′ | PA = 0.6 | PA = 0.7 | PA = 0.8 | PA = 0.9 | PA = 1.0 |
|------|------|----------|----------|----------|----------|----------|
| 0.6  | 0.6  | 0.56     | 0.62     | 0.68     | 0.74     | 0.80     |
| 0.7  | 0.7  | 0.57     | 0.64     | 0.71     | 0.78     | 0.85     |
| 0.8  | 0.8  | 0.58     | 0.66     | 0.74     | 0.82     | 0.90     |
| 0.9  | 0.9  | 0.59     | 0.68     | 0.77     | 0.86     | 0.95     |

To simplify the simulations, we set the margin frequencies (R1 = R2 = C1 = C2, not depicted) and consistencies (rxx′ = ryy′) as equal to each other

The difficulty with randomly generating user-defined consistency coefficients is that they are correlations, and distributions of correlations are skewed, so the usual normal distribution is not valid for generating large numbers of consistency coefficients to use in the simulations. To circumvent this difficulty, we used Fisher's r to z transformations, which involved using Eq. 8 to convert the consistency coefficients {0.6, 0.7, 0.8, and 0.9} into distribution means, and Eq. 9 to obtain the standard deviations. For each simulation, 10,000 values were then randomly generated from the means and standard deviations given by Eqs. 8 and 9, respectively.

$$ \mu_Z = \frac{1}{2} \log_e \frac{1 + r_{XX'}}{1 - r_{XX'}} \tag{8} $$

$$ \sigma_Z = \frac{1}{\sqrt{N - 3}} \tag{9} $$

Once the 10,000 values were generated for each simulation, we reversed Eq. 8 to convert each generated value back into a consistency coefficient. This was done via Eq. 10 below and resulted in 10,000 normally distributed consistency coefficients for each location.

$$ \text{consistency coefficient in any of the 10,000 cases} = \frac{e^{2Z} - 1}{e^{2Z} + 1} \tag{10} $$
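The sampling step in Eqs. 8 to 10 can be sketched with Python's standard library. The function name, the fixed seed, and the example values (population consistency 0.8, N = 60) are assumptions for illustration; the article does not state which software or random number generator was used. Note that Eq. 10 is the hyperbolic tangent of Z, so `math.tanh` is used for the back-transformation.

```python
import math
import random
from statistics import median

def sample_consistencies(r, n, draws=10_000, seed=0):
    """Draw sample consistency coefficients around a population value r."""
    rng = random.Random(seed)                  # seeded for repeatability
    mu = 0.5 * math.log((1 + r) / (1 - r))     # Eq. 8: Fisher r-to-z mean
    sigma = 1.0 / math.sqrt(n - 3)             # Eq. 9: standard deviation of z
    # Eq. 10: (e^{2Z} - 1) / (e^{2Z} + 1) equals tanh(Z)
    return [math.tanh(rng.gauss(mu, sigma)) for _ in range(draws)]

samples = sample_consistencies(0.8, 60)
center = median(samples)  # should sit close to the population value 0.8
```

Because the draws are generated on the z scale and then transformed, every sampled coefficient stays strictly between −1 and 1.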

We then addressed observed agreement for each pair of locations {0.6 versus 0.7, 0.7 versus 0.8, 0.8 versus 0.9, and 0.9 versus 1.0}. To simplify the simulations, we set the margin frequencies as equal to each other (R1 = R2 = C1 = C2). In turn, this allowed us to replace Eqs. 3–6 with a single simple equation rendered below as Eq. 11 (see Trafimow and Rice 2008 for proof). In Eq. 11, s is the observed agreement and S is the PA. This can be reversed using Eq. 12 below.

$$ s = S\sqrt{r_{XX'}\, r_{YY'}} + 0.5 - 0.5\sqrt{r_{XX'}\, r_{YY'}} \tag{11} $$

$$ S = \frac{2s - 1 + \sqrt{r_{XX'}\, r_{YY'}}}{2\sqrt{r_{XX'}\, r_{YY'}}} \tag{12} $$
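As a quick check on Eqs. 11 and 12 under the equal-margins simplification, the sketch below recomputes the first row of Table 1 and confirms that Eq. 12 inverts Eq. 11. The function names are hypothetical; the input values come from Table 1.

```python
import math

def observed_from_potential(S, r_xx, r_yy):
    """Eq. 11: observed agreement s implied by potential agreement S."""
    q = math.sqrt(r_xx * r_yy)
    return S * q + 0.5 - 0.5 * q

def potential_from_observed(s, r_xx, r_yy):
    """Eq. 12: potential agreement S recovered from observed agreement s."""
    q = math.sqrt(r_xx * r_yy)
    return (2 * s - 1 + q) / (2 * q)

pa_levels = (0.6, 0.7, 0.8, 0.9, 1.0)

# First row of Table 1: consistency 0.6 at both locations.
row = [round(observed_from_potential(S, 0.6, 0.6), 2) for S in pa_levels]

# Round trip: applying Eq. 12 to the output of Eq. 11 returns the original PA.
round_trip = [potential_from_observed(observed_from_potential(S, 0.6, 0.6), 0.6, 0.6)
              for S in pa_levels]
```

The rounded values in `row` match the first row of Table 1 (0.56, 0.62, 0.68, 0.74, 0.80).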

Given that Eq. 11 supplied a value for observed agreement, we converted this to a correlation coefficient using Eq. 13 below. Because distributions of correlation coefficients are skewed, as previously described, we used the Fisher's r to z transformation process (Eqs. 8 and 9) to obtain 10,000 correlation values, which we then converted into observed agreement values using Eq. 14 (Rosenthal and Rosnow 1991; Trafimow and Rice 2008).

$$ r = 2(s - 0.5) \tag{13} $$

$$ s = 0.5 + \frac{r}{2} \tag{14} $$

Finally, it only remained to estimate PA using Eq. 12 in conjunction with the observed agreement values and consistency coefficients obtained in the previous steps of the simulation procedure.
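The simulation procedure described above can be sketched end to end in Python. This is one reading of the procedure, not the authors' implementation: the function names, the seed, the decision to redraw any non-positive sampled consistency (so that the square root in Eq. 12 is defined), and the use of independent draws for each competing location are all assumptions made for illustration.

```python
import math
import random

def fisher_draw(rng, r, n):
    """Draw one sample correlation around population value r (Eqs. 8-10)."""
    mu = math.atanh(r)                       # Eq. 8 (atanh is Fisher's r-to-z)
    sigma = 1.0 / math.sqrt(n - 3)           # Eq. 9
    return math.tanh(rng.gauss(mu, sigma))   # Eq. 10

def positive_draw(rng, r, n):
    """Redraw until positive; an assumption made here, not stated in the article."""
    while True:
        value = fisher_draw(rng, r, n)
        if value > 0:
            return value

def estimate_pa(rng, pa, consistency, n):
    """One simulated PA estimate for a location with population PA `pa`."""
    q = consistency                           # sqrt(r_xx * r_yy) when both are equal
    s = pa * q + 0.5 - 0.5 * q                # Eq. 11: population observed agreement
    r = 2 * (s - 0.5)                         # Eq. 13: agreement as a correlation
    s_hat = 0.5 + fisher_draw(rng, r, n) / 2  # Eq. 14 applied to a sampled r
    q_hat = math.sqrt(positive_draw(rng, consistency, n)
                      * positive_draw(rng, consistency, n))
    return (2 * s_hat - 1 + q_hat) / (2 * q_hat)  # Eq. 12

def proportion_correct(pa_high, pa_low, consistency, n, draws=10_000, seed=1):
    """Share of cases in which the higher-PA location gets the higher estimate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        if estimate_pa(rng, pa_high, consistency, n) > estimate_pa(rng, pa_low, consistency, n):
            hits += 1
    return hits / draws

best = proportion_correct(1.0, 0.9, 0.9, 240)   # most favorable cell
worst = proportion_correct(0.7, 0.6, 0.7, 30)   # least favorable cell
```

Because the details above are assumed, the sketch is not guaranteed to reproduce the published percentages; it is meant only to show how the pieces (Eqs. 8 to 14) fit together and why accuracy should rise with N and with consistency.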

3.1 Simulation results

The percentage of times that PPT chose the correct control location varied with three factors. Most important, as N increased, PPT was correct more often. Second, as population consistency coefficients increased, so that randomness decreased, the accuracy of PPT increased. Finally, and of least interest, PPT performed best when choosing between population PAs of 1.0 and 0.90, and increasingly less well when choosing between 0.90 and 0.80, 0.80 and 0.70, and 0.70 and 0.60. Figure 1 illustrates these effects, as does Table 2.

4 General discussion

When comparing possible field locations, how is a researcher to distinguish random differences from systematic ones? Our goal was to answer this question by invoking the mathematics of PPT. In the ideal case of an infinite number of data points, PPT works perfectly, but because PPT equations provide estimates, having too few data points is a potential problem. To address this issue, we performed simulations to understand how well PPT performs at distinguishing competing control locations at different user-defined population PA levels.

Fig. 1 Average frequencies for the correct control location being selected in the pairwise comparisons between neighboring potential agreement (PA) levels across sample size (N) and consistency levels [figure not reproduced; y-axis: % correct, 70% to 100%; x-axis: N = 30, 60, 120, 240; series: consistency = 0.7, 0.8, 0.9]

In the worst-case scenario, where the population PA levels are low (0.70 versus 0.60), population consistencies are low (0.70 at both locations), and there are 30 data points for every sample estimate, PPT does not perform well (proportion of correct choices = 0.69). In the best-case scenario, where population PA levels are high (1.0 versus 0.90), population consistency coefficients are impressive (0.90), and there are 240 data points for every estimate, PPT performs very well (proportion of correct choices = 1.0).

In general, when N is large, PPT tends to do well even in less than ideal conditions with respect to the other two factors. Therefore, our proposed PPT methodology works well provided that a sufficient number of data points are available. This conclusion is consistent with other research that pertains to the law of large numbers (Nisbett et al. 1983). Traditionally, marketing researchers have been able to obtain large amounts of field data from sources such as direct mail catalogs (e.g., Anderson and Simester 2001), school administrators and nonprofit organizations (e.g., Raju et al. 2010), ecommerce sites (e.g., Algesheimer et al. 2010), and retail scanner data (e.g., Levav and Zhu 2009).

In addition to having access to a sufficient quantity of data on the dependent variable of interest, users of PPT must also consider how to go about estimating consistency coefficients for each location under consideration. Ideally, the researcher would have sufficient repeated measures data from each location. However, the ideal situation may not be met given the often limited form and quantity of available field data. For instance, we suspect that marketing researchers may encounter this problem when using grocery store scanner data at the daily level. Such data does not allow for a sufficient number of comparisons if comparing by day of the week or month of the year.

Table 2 Pairwise comparisons between neighboring levels of potential agreement (PA) across consistency and sample size (N) levels used in the simulations

| N   | Consistencies (rxx′ = ryy′) | 1 vs. 0.9 (%) | 0.9 vs. 0.8 (%) | 0.8 vs. 0.7 (%) | 0.7 vs. 0.6 (%) | Average (%) |
|-----|-----------------------------|---------------|-----------------|-----------------|-----------------|-------------|
| 30  | 0.9                         | 96.66         | 86.72           | 79.94           | 75.97           | 84.82       |
| 30  | 0.8                         | 86.38         | 79.83           | 76.00           | 72.70           | 78.73       |
| 30  | 0.7                         | 76.70         | 73.18           | 71.87           | 69.43           | 72.80       |
| 60  | 0.9                         | 99.72         | 94.56           | 88.90           | 85.29           | 92.12       |
| 60  | 0.8                         | 94.27         | 88.88           | 84.21           | 81.79           | 87.29       |
| 60  | 0.7                         | 86.10         | 81.81           | 79.53           | 78.09           | 81.38       |
| 120 | 0.9                         | 100.00        | 98.77           | 96.08           | 93.33           | 97.05       |
| 120 | 0.8                         | 98.93         | 95.78           | 92.39           | 89.77           | 94.22       |
| 120 | 0.7                         | 94.11         | 91.07           | 88.55           | 87.34           | 90.27       |
| 240 | 0.9                         | 100.00        | 99.91           | 99.33           | 98.09           | 99.33       |
| 240 | 0.8                         | 99.98         | 99.45           | 97.91           | 96.60           | 98.49       |
| 240 | 0.7                         | 98.52         | 97.17           | 95.68           | 94.31           | 96.42       |

To obtain a sufficient number of comparisons the researcher could construct a frequency table by first comparing whether the dependent variable of interest (e.g., sales) during a given time period (e.g., daily sales) is above or below the historical median for the dependent variable of interest. The researcher could then compare these given time periods based on some criterion (e.g., neighboring days) to assess the frequency of their agreement and disagreement. The benefit of this method of obtaining consistency coefficients is that it does not limit the researcher to repeated measures data. This allows the researcher to use PPT with many more forms of data as long as there is a sufficient quantity of data points. The drawback of this method is that consistency coefficients will likely be lower than would be the case with repeated measures data.
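One possible reading of the neighboring-days idea is sketched below. The article does not spell out the pairing rule, so everything specific here is an assumption: days are paired as (day 1, day 2), (day 3, day 4), and so on; each day is coded as above or not above the series median; and the consistency coefficient is taken as the Eq. 1 correlation of the resulting 2 by 2 table. The function name and the sales figures are hypothetical.

```python
import math
from statistics import median

def neighbor_consistency(series):
    """Consistency of one location from non-overlapping pairs of neighboring days."""
    m = median(series)
    above = [v > m for v in series]
    pairs = list(zip(above[0::2], above[1::2]))  # (day 1, day 2), (day 3, day 4), ...
    a = sum(1 for p, q in pairs if p and q)          # both days above the median
    b = sum(1 for p, q in pairs if p and not q)      # first above, second not
    c = sum(1 for p, q in pairs if not p and q)      # first not, second above
    d = sum(1 for p, q in pairs if not p and not q)  # neither above
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("a margin is zero; the correlation is undefined")
    return (a * d - b * c) / denom  # same form as Eq. 1

# Hypothetical daily sales for a single store (illustrative numbers only).
daily_sales = [9, 8, 2, 1, 7, 10, 4, 3, 11, 5, 12, 6]
consistency = neighbor_consistency(daily_sales)
```

Other pairing criteria (for example, overlapping pairs or same-weekday pairs) would give different tables, which is in line with the text's remark that the choice of method for obtaining consistency coefficients remains open.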

5 Conclusion

As marketing researchers continue to search for interventions that work in the field, it is not surprising that field experiments are increasingly emphasized. However, because field experiments use random assignment to conditions infrequently, there is the ubiquitous concern that an obtained difference between the treatment and control locations is due to differences in the locations rather than due to the manipulated intervention. As we explained earlier, the usual methods involving matching, statistical control, and so on insufficiently address this concern. A major part of the problem is the difficulty in knowing all of the relevant variables on which to match or on which to statistically control. In addition, even in the unlikely event that all of the relevant variables are known, matching and statistical control are still problematic. Our solution does not depend on the researcher's ability to know what all of the relevant variables are. Unlike traditional methods trying to increase internal validity at the expense of external validity, our approach increases both internal and external validity without involving a trade-off. As a result, this method is consistent with the results of a meta-analysis by Anderson et al. (1999).

Our proposed methodology hearkens back to the point of doing field experiments in the first place, which is to test the intervention in the complex of naturally occurring forces where it actually would be applied. But for the field experiments to be fair, the complex of naturally occurring forces should be the same in the treatment and control locations. Normally, this would be impossible to determine because there would be no way to distinguish systematic versus random differences between potential treatment and control locations. But our methodology provides researchers with the capability to parse these effects.

The simulations demonstrated that although PPT does not work well with few data points (N = 30 or less), it works extremely well with a large number of data points (N = 120 or more). In many marketing contexts, it is not difficult to obtain a large number of data points, thereby maximizing the validity of our proposed methodology. In addition, although the simulations were successful, we believe that there is room for future research that would be helpful in improving the proposed methodology.

In our view, the most important difficulty for researchers who wish to use our methodology is in obtaining the consistency coefficients that it requires. There are multiple ways to obtain these coefficients. As of now, it is not clear which way is most valid, or even if a combination of ways is more valid than any single way. Fortunately, this is an issue that seems amenable to empirical testing. Therefore, we hope and expect that the present demonstration will lead to at least two tracks of research. Most obviously, we expect that future marketing researchers will routinely use our proposed methodology to test the validity of their pairings of experimental and control locations. Less obviously, we expect that researchers will perform competitive tests of alternative methods for obtaining consistency coefficients for locations. Finally, assuming that researchers pursue both of these tracks, it seems likely that the findings on both tracks will inform each other, and the integration of the two tracks will prove to be more informative than either one by itself.

References

Algesheimer, R., Borle, S., Dholakia, U. M., & Singh, S. S. (2010). The impact of customer community participation on customer behaviors: an empirical investigation. Marketing Science, 29(4), 756–769.

Anderson, C. A., Lindsay, J. M., & Bushman, B. J. (1999). Research in the psychological laboratory: truth or triviality? Current Directions in Psychological Science, 8(1), 3–9.

Anderson, E. T., & Simester, D. I. (2001). Are sale signs less effective when more products have them? Marketing Science, 20(2), 121–142.

Heckman, J. J. (1998). Detecting discrimination. The Journal of Economic Perspectives, 12(2), 101–116.

Heckman, J. J., & Smith, J. A. (1995). Assessing the case for social experiments. The Journal of Economic Perspectives, 9(2), 85–110.

Levav, J., & Zhu, R. J. (2009). Seeking freedom through variety. Journal of Consumer Research, 36(4), 600–610.

Levitt, S. D., & List, J. A. (2007). What do laboratory experiments measuring social preferences reveal about the real world? The Journal of Economic Perspectives, 21(2), 153–174.

Lichtenstein, S., & Slovic, P. (1973). Response-induced reversals of preference in gambling: an extended replication in Las Vegas. Journal of Experimental Psychology, 101(1), 16–20.

MacDonald, J. A., & Trafimow, D. (2013). A measure of within-participant consistency. Behavior Research Methods, 45(4), 950–954.

Nisbett, R. E., Krantz, D. H., Jepson, C., & Kunda, Z. (1983). The use of statistical heuristics in everyday inductive reasoning. Psychological Review, 90(4), 339–363.

Raju, S., Rajagopal, P., & Gilbride, T. J. (2010). Marketing healthful eating to children: the effectiveness of incentives, pledges, and competitions. Journal of Marketing, 74(3), 93–106.

Rosenthal, R., & Rosnow, R. L. (1991). Essentials of behavioral research: methods and data analysis. New York: McGraw-Hill.

Trafimow, D., & Rice, S. (2008). Potential performance theory (PPT): a general theory of task performance applied to morality. Psychological Review, 115(2), 447–462.

Trafimow, D., & Rice, S. (2009). Potential performance theory (PPT): describing a methodology for analyzing task performance. Behavior Research Methods, 41(2), 359–371.
