
QTEST: Quantitative Testing of Theories of Binary Choice

Michel Regenwetter, University of Illinois at Urbana-Champaign

Clintin P. Davis-Stober, University of Missouri at Columbia

Shiau Hong Lim, National University of Singapore

Ying Guo, Anna Popova, and Chris Zwilling, University of Illinois at Urbana-Champaign

Yun-Shil Cha, Korea Institute of Public Finance

William Messner, State Farm Insurance, Champaign, Illinois

The goal of this paper is to make modeling and quantitative testing accessible to behavioral decision researchers interested in substantive questions. We provide a novel, rigorous, yet very general, quantitative diagnostic framework for testing theories of binary choice. This permits the nontechnical scholar to proceed far beyond traditionally rather superficial methods of analysis, and it permits the quantitatively savvy scholar to triage theoretical proposals before investing effort into complex and specialized quantitative analyses. Our theoretical framework links static algebraic decision theory with observed variability in behavioral binary choice data. The article is supplemented with a custom-designed public-domain statistical analysis package, the QTEST software. We illustrate our approach with a quantitative analysis using published laboratory data, including tests of novel versions of "Random Cumulative Prospect Theory." A major asset of the approach is the potential to distinguish decision makers who have a fixed preference and commit errors in observed choices from decision makers who waver in their preferences.

Keywords: behavioral decision research, Luce's challenge, order-constrained likelihood-based inference, probabilistic specification, theory testing

Supplemental materials: http://dx.doi.org/10.1037/dec0000007.supp

Michel Regenwetter, Department of Psychology, University of Illinois at Urbana-Champaign; Clintin P. Davis-Stober, Department of Psychology, University of Missouri at Columbia; Shiau Hong Lim, Department of Mechanical Engineering, National University of Singapore, Singapore; Ying Guo, Anna Popova, and Chris Zwilling, Department of Psychology, University of Illinois at Urbana-Champaign; Yun-Shil Cha, Korea Institute of Public Finance, Seoul, Korea; William Messner, State Farm Insurance, Champaign, Illinois.

Dedicated to R. Duncan Luce (May 1925–August 2012), whose amazing work provided much inspiration and motivation for this program of research.

Shiau Hong Lim programmed most of QTEST while at the Department of Computer Science, University of Illinois, and while at the Department of Mathematics and Information Technology, University of Leoben, Austria. Yun-Shil Cha, Ying Guo, William Messner, Anna Popova, and Chris Zwilling contributed to the program debugging, interface design, and miscellaneous computation and carried out the data analyses. Cha and Messner have graduated from the University of Illinois since working on this project and now work in industry. Regenwetter developed initial drafts of this article while a 2008–2009 sabbatical Fellow of the Max Planck Institute for Human Development, Berlin. He thanks the Adaptive Behavior and Cognition group for many stimulating interactions. A number of colleagues have provided helpful comments at various presentations and discussions of this work. These include M. Birnbaum, P. Blavatskyy, M. Brown, E. Bokhari, D. Cavagnaro, J. Busemeyer, A. Glöckner, A. Bröder, G. Harrison, K. Katsikopoulos, G. Loomes, R. D. Luce, A. A. J. Marley, G. Pogrebna, J. Stevens, N. Wilcox, and attendees at the 2010 and 2011 meetings of the Society for Mathematical Psychology, the 2010, 2011, and 2012 meetings of the Society for Judgment and Decision Making, the 2011 European Mathematical Psychology Group meeting, the 2011 Georgia State CEAR workshop on structural modeling of heterogeneity in discrete choice under risk and uncertainty, the 2012 Warwick workshop on noise and imprecision in individual and interactive decision-making, and the 2012 FUR XV meeting. Regenwetter acknowledges funding under AFOSR Grant No. FA9550-05-1-0356, NIMH Training Grant PHS 2 T32 MH014257, NSF Grant SES No. 08-20009, NSF Grant SES No. 10-62045, and an Arnold O. Beckman Research Award from the University of Illinois at Urbana-Champaign. Davis-Stober was supported by a Dissertation Completion Fellowship of the University of Illinois when working on the theoretical and statistical models. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of colleagues, funding agencies, or employers.

Correspondence concerning this article should be addressed to Michel Regenwetter, Department of Psychology, University of Illinois at Urbana-Champaign, Champaign, IL 61820-5711. E-mail: [email protected]

Decision, © 2014 American Psychological Association, 2014, Vol. 1, No. 1, 2–34. 2325-9965/14/$12.00 DOI: 10.1037/dec0000007

Behavioral decision researchers in the social and behavioral sciences, who are interested in choice under risk or uncertainty, in intertemporal choice, in probabilistic inference, or many other research areas, invest much effort into proposing, testing, and discussing descriptive theories of pairwise preference. This article provides the theoretical and conceptual framework underlying a new, general-purpose, public-domain tool set, the QTEST software.1 QTEST leverages high-level quantitative methodology through mathematical modeling and state-of-the-art, maximum likelihood-based statistics. Yet, it automates enough of the process that many of its features require no more than relatively basic skills in math and statistics. The program features a simple Graphical User Interface and is general enough that it can be applied to a large number of substantive domains.

Consider a motivating analogy between theory testing and diagnostics in daily life. Imagine that you experience intense abdominal pain. You consider three methods of diagnostics:

1. You may seek diagnostic information from another lay person and/or a fever thermometer.

2. You may seek diagnostic information from a nurse practitioner.

3. You may seek diagnostic information from a radiologist.

Over recent decades, the behavioral sciences have experienced an explosion in theoretical proposals to explain one or the other phenomenon in choice behavior across a variety of substantive areas. In our view, the typical approach to diagnosing the empirical validity of such proposals tends to fall into either of two extreme categories, similar to the patient either consulting with a lay person (and maybe a thermometer) or with a radiologist. The overwhelming majority of "tests" of decision theories either use very simple descriptive measures (akin to asking a lay person), such as counting the number of choices consistent with a theoretical prediction, possibly augmented by a basic general-purpose statistical test (akin to checking for a fever), such as a t test; or proceed straight to a highly specialized, sometimes restrictive, and oftentimes rather sophisticated, quantitative test (akin to consulting with a radiologist), such as a "Logit" specification of a particular functional form of a theory. The present study offers the counterpart to the triage nurse: We provide a novel, rigorous, yet very general, quantitative diagnostic framework for testing theories of binary choice. This permits the nontechnical scholar to proceed far beyond very superficial methods of analysis, and it permits the quantitatively savvy scholar to triage theoretical proposals before investing effort into complicated, restrictive, and specialized quantitative analyses. A basic underlying assumption, throughout the paper, is that a decision maker, who faces a pairwise choice between two options, behaves probabilistically (like the realization of a single Bernoulli trial), including the possibility of degenerate probabilities where the person picks one option with certainty. Although the paper is written in a 'tutorial' style to make the material as broadly accessible as possible, it also offers several novel theoretical contributions and it asks important new theoretical questions.

1 QTEST is funded by NSF-DRMS SES 08-20009 (Regenwetter, PI). While a Bayesian extension is under development, we concentrate on a frequentist likelihood-based approach here. QTEST, together with installation instructions, a detailed step-by-step tutorial, and some example data, is available from http://internal.psychology.illinois.edu/labs/DecisionMakingLab/. An Online Tutorial explains step-by-step how a novice user can replicate each QTEST analysis using the original data, and generate three-dimensional QTEST figures similar to those in the paper. The original Regenwetter et al. data are provided with the software in a file format that QTEST can read directly.


Motivating Example and Illustration

We explain some basic concepts using a motivating example that also serves as an illustration throughout the paper. In the interest of brevity and accessibility, we cast the example in terms of the most famous contemporary theory of risky choice, Cumulative Prospect Theory (Tversky & Kahneman, 1992). However, because our empirical illustration only considers gambles in which one can win but not lose money, one can think of the predictions as derived from certain, more general, forms of "rank-dependent utility" theories.

Imagine an experiment on "choice under risk," in which each participant makes choices among pairs of lotteries. We concentrate on a case in which we aim to analyze data separately for each participant, and in which each individual repeats each pairwise choice multiple times. Table 1 shows 25 trials of such an experiment for one participant. These data are from a published experiment on risky choice (Regenwetter et al., 2010, 2011a, 2011b) that we use for illustration throughout the paper.

In this experiment, which built on a very similar, seminal experiment by Tversky (1969), each of 18 participants made 20 repeated pairwise choices among each of 10 pairs of lotteries for each of three sets of stimuli (plus distractors). Participants carried out 18 warm-up trials, followed by 800 two-alternative forced choices that, unbeknownst to the participant, rotated through what we label "Cash I," "Distractor," "Noncash," and "Cash II" (see Table 1 for 25 of the trials). The 200 choices for each stimulus set consisted of 20 repetitions of every pair of gambles among five gambles in that stimulus set, as was the case in the original study by Tversky (1969). The distractors varied widely. We will only consider "Cash I" and "Cash II," which both involved cash lotteries. Table 2 shows abbreviated versions of the "Cash II" gambles: For example, in Gamble A the decision maker has a 28% chance of winning $31.43, nothing otherwise (see Appendix A for the other cash stimulus set). The participant in Table 1 made a choice between two Cash II gambles for the first time on Trial 4, namely, she chose a 28% chance of winning $31.43 over a 36% chance of winning $24.44. The Cash II gambles are set apart by horizontal lines in Table 1. All gambles were displayed as "wheels of chance" on a computer screen. Participants earned a $10.00 base fee, and one of their choices was randomly selected at the end of the experiment for real play using an urn with marbles instead of the probability wheel.

For this first illustration, we also consider a specific theoretical prediction derivable from Cumulative Prospect Theory. We will use the label CPT-KT to refer to Cumulative Prospect Theory with a "power" utility function with "risk attitude" α and a "Kahneman-Tversky weighting function" with weighting parameter γ (Stott, 2006), according to which a binary gamble with a P chance of winning X (and nothing otherwise) has a subjective (numerical) value of

\[ \frac{P^{\gamma}}{\left(P^{\gamma} + (1 - P)^{\gamma}\right)^{1/\gamma}} \, X^{\alpha}. \tag{1} \]

For this paper, the exact details of this function are not important, other than to note that it depends on two parameters, α and γ. For some of the points we will make, it is useful to pay close attention to a specific prediction under CPT-KT. We consider the weighting function

\[ \frac{P^{.83}}{\left(P^{.83} + (1 - P)^{.83}\right)^{1/.83}} \]

and the utility function \(X^{.79}\), in which we substituted γ = 0.83 and α = 0.79. These are displayed in Figure 1. We chose these values because that case allows us to highlight some important insights about quantitative testing. According to this model, the subjective value attached to Gamble 1 in Pair 1 of Table 2 is

\[ \frac{.28^{.83}}{\left(.28^{.83} + .72^{.83}\right)^{1/.83}} \times 31.43^{.79} \approx 4.68, \tag{2} \]

whereas the subjective value attached to Gamble 0 in Pair 1 of Table 2 is

\[ \frac{.32^{.83}}{\left(.32^{.83} + .68^{.83}\right)^{1/.83}} \times 27.50^{.79} \approx 4.67. \tag{3} \]

Therefore, Gamble 1 is preferred to Gamble 0 in Pair 1, according to CPT-KT with α = 0.79, γ = 0.83. A decision maker who satisfies CPT-KT with α = 0.79, γ = 0.83 ranks the gambles EDABC from best to worst, that is,


prefers Gamble 1 to Gamble 0 in Pair 1, in Pair 2, and in Pair 5, whereas he prefers Gamble 0 to Gamble 1 in each of the other 7 lottery pairs, as shown in Table 2 under the header "KT-V4 Preferred Gamble." We refer to such a pattern of zeros and ones as a preference pattern. The corresponding binary preferences are shown in the last column of Table 1.
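The computation in Equations 1–3 is easy to check in a few lines of code. The following is a minimal sketch (in Python, not part of the QTEST package; the function name is ours):

```python
def cpt_kt_value(p, x, alpha=0.79, gamma=0.83):
    """Subjective value, under CPT-KT (Equation 1), of a binary gamble
    'p chance of winning x, nothing otherwise'."""
    w = p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma)  # probability weight
    return w * x**alpha                                        # weight times power utility

# Equations 2 and 3: Gamble 1 vs. Gamble 0 in Pair 1 of Table 2.
v1 = cpt_kt_value(0.28, 31.43)  # approx. 4.68
v0 = cpt_kt_value(0.32, 27.50)  # approx. 4.67
print(v1 > v0)  # Gamble 1 is preferred
```

Note how close the two values are; the stimuli were designed so that such predictions hinge on fine numerical differences.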

The values α = 0.79, γ = 0.83 are not the only values that predict the preference pattern EDABC in CPT-KT. We computed all preference patterns for values of α, γ that are multiples of 0.01 in the range α, γ ∈ [0.01, 1]. We consider α ≤ 1, that is, only "risk averse" cases, for the sake of simplicity. Table 3 lists the patterns, the corresponding rankings, and the

Table 1
First 25 of 800 Pairwise Choices of DM1

Trial | Stimulus set | Gamble 1 | Gamble 0 | Observed choice | KT-V4 prediction
1  | Cash I     | 33.3% chance of $26.6 (R)            | 41.7% chance of $23.8 (L)            | 1 |
2  | Distractor | 12% chance of $31.43 (R)             | 18% chance of $27.5 (L)              | 0 |
3  | Noncash    | 20% chance of ≈7 paperbacks (R)      | 24% chance of ≈4 music CDs (L)       | 1 |
4  | Cash II    | 28% chance of $31.43 (R)             | 36% chance of $24.44 (L)             | 1 | 1
5  | Cash I     | 37.5% chance of $25.2 (L)            | 45.8% chance of $22.4 (R)            | 0 |
6  | Distractor | 16% chance of $22 (R)                | 24% chance of $22 (L)                | 0 |
7  | Noncash    | 22% chance of ≈40 movie rentals (L)  | 26% chance of ≈40 coffees (R)        | 1 |
8  | Cash II    | 32% chance of $27.5 (R)              | 40% chance of $22 (L)                | 1 | 0
9  | Cash I     | 29.2% chance of $28 (R)              | 41.7% chance of $23.8 (L)            | 0 |
10 | Distractor | 4% chance of ≈40 coffees (L)         | 20% chance of ≈4 music CDs (R)       | 0 |
11 | Noncash    | 18% chance of ≈15 sandwiches (L)     | 24% chance of ≈4 music CDs (R)       | 1 |
12 | Cash II    | 36% chance of $24.44 (L)             | 44% chance of $20 (R)                | 0 | 0
13 | Cash I     | 33.3% chance of $26.6 (R)            | 37.5% chance of $25.2 (L)            | 0 |
14 | Distractor | 6% chance of ≈40 coffees (L)         | 16% chance of ≈7 paperbacks (R)      | 0 |
15 | Noncash    | 20% chance of ≈7 paperbacks (L)      | 22% chance of ≈40 movie rentals (R)  | 1 |
16 | Cash II    | 28% chance of $31.43 (L)             | 40% chance of $22 (R)                | 1 | 0
17 | Cash I     | 29.2% chance of $28 (R)              | 45.8% chance of $22.4 (L)            | 0 |
18 | Distractor | 8% chance of ≈7 paperbacks (L)       | 16% chance of ≈40 coffees (R)        | 1 |
19 | Noncash    | 18% chance of ≈15 sandwiches (R)     | 26% chance of ≈40 coffees (L)        | 1 |
20 | Cash II    | 32% chance of $27.5 (R)              | 36% chance of $24.44 (L)             | 1 | 1
21 | Cash I     | 37.5% chance of $25.2 (L)            | 41.7% chance of $23.8 (R)            | 0 |
22 | Distractor | 14% chance of $22 (L)                | 26% chance of $22 (R)                | 0 |
23 | Noncash    | 22% chance of ≈40 movie rentals (L)  | 24% chance of ≈4 music CDs (R)       | 1 |
24 | Cash II    | 28% chance of $31.43 (R)             | 44% chance of $20 (L)                | 0 | 0
25 | Cash I     | 33.3% chance of $26.6 (R)            | 45.8% chance of $22.4 (L)            | 0 |

Note. The symbol ≈ stands for "approximately." (L) means that the gamble was presented on the left side of the screen, (R) means it was presented on the right. An entry of 1 under "Observed choice" means that the respondent chose Gamble 1, whereas 0 means that he chose Gamble 0. The last column gives the Cash II predictions of KT-V4, i.e., Cumulative Prospect Theory with power utility (e.g., α = 0.79) and "Kahneman-Tversky" weighting (e.g., γ = 0.83).


portion of the algebraic parameter space (the proportion of values of α, γ in our grid search) associated with each pattern.2 We labeled the pattern that gives the ranking EDABC as KT-V4 here and elsewhere. The complete list of values of α, γ yielding KT-V4 (i.e., ranking EDABC) is as follows:

α = 0.58, γ = 0.66;
or α = 0.63, γ = 0.70;
or α = 0.79, γ = 0.83;
or α = 0.84, γ = 0.87;
or α = 0.95, γ = 0.96.

Because 5 values of α, γ yield this predicted preference, Table 3 reports that the proportion of the algebraic space for CPT-KT that predicts preference pattern KT-V4 is 0.0005. Clearly, only decision makers with very specific weighting and utility functions are predicted

2 There were 101 parameter combinations, among the 10,000, where the values associated with two gambles differed by less than 10^(-20). For reasons of numerical accuracy, we did not make a pairwise preference prediction in those cases. We also omit the technical details of how to expand the QTEST analyses to incorporate "indifference" among pairs of objects, because we focus on two-alternative forced choice, in which a decision maker cannot express "indifference" among pairs of lotteries.

Table 2
Illustrative Motivating Example

Pair | Gamble 1 (Chance, Gain) | Gamble 0 (Chance, Gain) | KT-V4 preferred gamble | HDM # choices Gamble 1 | DM1 # choices Gamble 1 | DM13 # choices Gamble 1
1  | A: 28%, $31.43 | B: 32%, $27.50 | 1 | 18 (90%)  | 17 (85%) | 16 (80%)
2  | A: 28%, $31.43 | C: 36%, $24.44 | 1 | 19 (95%)  | 13 (65%) | 9 (45%)
3  | A: 28%, $31.43 | D: 40%, $22    | 0 | 1 (5%)    | 5 (25%)  | 12 (60%)
4  | A: 28%, $31.43 | E: 44%, $20    | 0 | 0 (0%)    | 4 (20%)  | 7 (35%)
5  | B: 32%, $27.50 | C: 36%, $24.44 | 1 | 20 (100%) | 17 (85%) | 10 (50%)
6  | B: 32%, $27.50 | D: 40%, $22    | 0 | 3 (15%)   | 8 (40%)  | 8 (40%)
7  | B: 32%, $27.50 | E: 44%, $20    | 0 | 0 (0%)    | 3 (15%)  | 9 (45%)
8  | C: 36%, $24.44 | D: 40%, $22    | 0 | 2 (10%)   | 15 (75%) | 12 (60%)
9  | C: 36%, $24.44 | E: 44%, $20    | 0 | 1 (5%)    | 9 (45%)  | 11 (55%)
10 | D: 40%, $22    | E: 44%, $20    | 0 | 0 (0%)    | 10 (50%) | 10 (50%)

Descriptive analysis:
  Total number of choices matching KT-V4: 190 (95%) | 133 (67%) | 106 (53%)
  Number of modal choices matching KT-V4: 10 | 8 (or 9) | 4 (or 6)

Semi-quantitative analysis (α = .05):
  Number of significant 2-sided Binomial tests for/against KT-V4: 10 / 0 | 5 / 1 | 1 / 0

QTEST (p-values) for KT-V4:
  Modal choice (permit up to 50% error rate in each pair): 1 | 0.03 | .55
  0.75-supermajority (permit ≤ 25% error rate in each pair): 1 | <0.00001 | <0.00001
  0.50-city-block (sum of 10 error rates ≤ .50): 1 | <0.00001 | <0.00001

QTEST (p-values) for Random CPT:
  "Kahneman-Tversky" (12 possible preference states): 0.045 | 0.0002 | 0.36
  "Goldstein-Einhorn" (43 possible preference states): 0.25 | 0.01 | 0.20

Note. The 10 gamble pairs are the Cash II stimulus set of Regenwetter et al. (2010, 2011a). KT-V4 denotes a specific theoretical prediction made by Kahneman and Tversky's Cumulative Prospect Theory. HDM is an illustrative hypothetical decision maker, and DM1 and DM13 are Participants 1 and 13 in Regenwetter et al. (2010, 2011a). The right three columns show the frequencies, of 20 repetitions, and corresponding percentages, with which each decision maker chose the cash lottery coded as Gamble 1. In the print original, frequencies whose modal choice is consistent with KT-V4 are marked in typewriter style, cases in which the modal choice is inconsistent with KT-V4 are underlined, and unmarked choice frequencies are exactly at the 50% boundary. Significant violations (α = 0.05) are marked in bold font.


to have preference EDABC according to CPT-KT, for example.
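The grid search behind Table 3 can be sketched compactly. The following Python sketch is ours, not part of QTEST, and it does not replicate the paper's exclusion of the 101 numerically ambiguous grid points, so the resulting proportions can differ slightly in the last decimal:

```python
from collections import Counter

# Cash II gambles from Table 2: (chance of winning, gain).
GAMBLES = {"A": (0.28, 31.43), "B": (0.32, 27.50), "C": (0.36, 24.44),
           "D": (0.40, 22.00), "E": (0.44, 20.00)}

def value(p, x, alpha, gamma):
    # Equation 1: Kahneman-Tversky weighting times power utility.
    return p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma) * x**alpha

def ranking(alpha, gamma):
    # Gamble labels ordered from best to worst subjective value.
    return "".join(sorted(GAMBLES, key=lambda g: -value(*GAMBLES[g], alpha, gamma)))

# Grid search over all multiples of 0.01 with alpha, gamma in (0, 1].
counts = Counter(ranking(a / 100, g / 100)
                 for a in range(1, 101) for g in range(1, 101))
print(ranking(0.79, 0.83))  # EDABC, the pattern labeled KT-V4
```

Dividing each count in `counts` by the number of grid points then approximates the proportions reported in Table 3.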

How can one test a theory like Cumulative Prospect Theory, or one of its specific predictions, such as the one instantiated in KT-V4, empirically? If empirical data had no variability, it would be natural to treat them as algebraic. But if there is variability in empirical data, a probabilistic framework is more appropriate. In particular, it is common to interpret algebraic models of behavior as assuming that behavior is deterministic, which may be too strong an assumption. Table 2 shows the binary choice frequencies of a hypothetical decision maker (HDM), as well as those of Participant 1 (DM1) and of Participant 13 (DM13) of Regenwetter et al. (2010, 2011a). We created the data of the hypothetical decision maker to look as though she acted in a 'nearly deterministic' way, with virtually every binary choice matching the prediction of KT-V4: In Pair 1 she chooses the 'correct' option 18 of 20 times; in Pairs 2 and 3, she chooses the 'correct' option 19 of 20 times.

Figure 1. Example of a "power" utility function for money, with α = .79 (top panel), and a "Kahneman-Tversky" probability weighting function, with γ = 0.83 (solid curve in the lower panel), that generate KT-V4. (The dashed diagonal line in the lower panel is given for visual reference.)


Although some decision makers display relatively small amounts of variability in their binary choices, the typical picture for actual participants in the Tversky study and the Regenwetter et al. study was more like the data in the two right-most columns of Table 2. But we will see that even data like those of HDM warrant quantitative testing.

What are some common descriptive approaches in the literature to diagnose the behavior of the three decision makers? Table 2 shows various summary measures.

First, consider the total number of choices of a given decision maker that match KT-V4. HDM almost perfectly matches the prediction and only picks the 'wrong' gamble in 5% of all choices. The two real decision makers, DM1 and DM13, are not as clear cut. They chose the 'correct' option about two thirds of the time. Many authors would consider this a decent performance of KT-V4.

Second, consider the number of pairs on which the decision maker chose the 'correct' option more often than the 'wrong' option, that is, the number of pairs on which the observed modal choice matched the prediction of KT-V4. HDM's modal choice matches KT-V4 in every pair, hence HDM has 10 correct modal choices. The modal choices of DM1 match KT-V4 in 8 or 9 pairs, depending on whether "modal choice" does or does not include the knife-edge case where a person chooses either option equally often, as DM1 does in Pair 10. Table 2 shows in typewriter style those choice frequencies where the strict modal choice matches KT-V4, and underlined those choice frequencies where the strict modal choice disagrees with KT-V4, whereas frequencies at the 50% level are neither in typewriter style nor underlined. DM13's strict modal choice matches KT-V4 in only four of 10 gamble pairs. In the literature, many authors would interpret this finding as indicating a poorer performance of KT-V4 for DM13 than for DM1, and an inadequate performance for DM13 overall.
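Both descriptive counts are simple tallies. A short sketch (Python; ours, with variable names of our choosing) reproduces the Table 2 summary rows from the choice frequencies:

```python
# KT-V4's predicted choice (Gamble 1 vs. Gamble 0) for Pairs 1-10 of Table 2,
# and each decision maker's frequency (of 20) of choosing Gamble 1.
KT_V4 = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
FREQ = {"HDM":  [18, 19, 1, 0, 20, 3, 0, 2, 1, 0],
        "DM1":  [17, 13, 5, 4, 17, 8, 3, 15, 9, 10],
        "DM13": [16, 9, 12, 7, 10, 8, 9, 12, 11, 10]}

def summary(freq, pattern, n=20):
    # Total choices matching the prediction, strict modal matches,
    # and 50/50 knife-edge ties.
    total = sum(f if p == 1 else n - f for f, p in zip(freq, pattern))
    modal = sum((f > n // 2) == (p == 1) for f, p in zip(freq, pattern) if f != n // 2)
    ties = sum(f == n // 2 for f in freq)
    return total, modal, ties

for dm in FREQ:
    print(dm, summary(FREQ[dm], KT_V4))
# HDM: (190, 10, 0); DM1: (133, 8, 1); DM13: (106, 4, 2)
```

The tie counts explain the "8 (or 9)" and "4 (or 6)" entries in Table 2: each tied pair can optionally be counted as a modal match.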

A major complication with the analysis so far is that it ignores the magnitude of the disagreement between KT-V4 and the observed choice frequencies. For instance, even though DM1 only had one 'incorrect' modal choice (in Pair 8), we should also ask whether 15 of 20 choices inconsistent with KT-V4 in Pair 8 might be too much of a disagreement to be attributable to error and/or sampling variability. Likewise, while DM13 shows many 'incorrect' modal choices, it may be important to take into account that none of these involve frequencies that seem very different from 10 (i.e., 50%). Could they have occurred accidentally by sampling variability, if the decision maker, in fact, tends to choose consistently with KT-V4 more often than not, in every gamble pair?

Table 3
Predicted Preference Patterns Under CPT-KT for Cash II

Vertices for Cash II (with "Kahneman-Tversky" weighting and "power" utility):

Pair | KT-V1 KT-V2 KT-V3 KT-V4 KT-V5 KT-V6 KT-V7 KT-V8 KT-V9 KT-V10 KT-V11 KT-V12
1    |   0     1     1     1     1     1     1     1     1     1      1      1
2    |   0     0     1     1     1     1     1     1     1     1      1      1
3    |   0     0     0     0     1     1     1     1     1     1      1      1
4    |   0     0     0     0     0     0     1     1     1     1      1      1
5    |   0     0     0     1     0     1     1     1     1     1      1      1
6    |   0     0     0     0     0     0     0     1     1     1      1      1
7    |   0     0     0     0     0     0     0     0     1     1      1      1
8    |   0     0     0     0     0     0     0     0     0     1      1      1
9    |   0     0     0     0     0     0     0     0     0     0      1      1
10   |   0     0     0     0     0     0     0     0     0     0      0      1

Ranking of A, B, C, D, E (and associated portion of the algebraic space):
EDCBA 0.3974 | EDCAB 0.0051 | EDACB 0.0061 | EDABC 0.0005 | EADCB 0.0002 | EADBC 0.0069 | AEDBC 0.0005 | AEBDC 0.0080 | ABEDC 0.0007 | ABECD 0.0089 | ABCED 0.0106 | ABCDE 0.5552

Ranking of A, C, and D (and associated portion of the algebraic space):
DCA .40 | DAC .01 | ADC .02 | ACD .57

Note. The pattern for KT-V4 (fourth column) was also given in Table 2. The proportions of occurrence for rankings of all five gambles are based on 9899 value combinations of α, γ in the grid search, rounded to the closest 1/10,000. The proportions of occurrence in the grid search for rankings of A, C, and D are rounded to the closest 1/100.

Some scholars take a semiquantitative approach by carrying out a Binomial test for each gamble pair. A common approach is to consider the Null Hypothesis that the person acts "randomly" and flips a fair coin for each gamble pair. We report such an analysis in Table 2. This Null is rejected for all 10 pairs for HDM, for six pairs for DM1, and for one pair for DM13. Scholars who take this approach often proceed next to see whether the pattern of 'significant' binary choices is consistent with the theory in question, here KT-V4. For the hypothetical decision maker, all 10 Binomial tests come out significant and in favor of KT-V4. For DM1, five significant Binomial tests are supportive of KT-V4, but one test, the one for Pair 8, suggests that KT-V4 must be wrong, because the decision maker chooses the 'wrong' option in Pair 8 'more often than expected by chance.' For DM13, this analysis draws a completely new picture: The Null Hypothesis that this decision maker flips coins is retained in 9 of 10 gamble pairs, with the remaining test result (Pair 1) supporting KT-V4.

This type of analysis, while taking some quantitative information into account, is problematic nonetheless: Because this analysis involved 10 distinct Binomial tests, Type-I errors may proliferate, that is, we may accumulate false significant results. For example, if these 10 tests commit Type-I errors independently, and if we use α = .05 for each test (as in Table 2), then the overall combined Type-I error rate becomes 1 − (.95)^10 ≈ .40 after running 10 separate tests. A “Bonferroni correction” would, instead, reduce the power dramatically. The second problem arises when we move from testing a single prediction to multiple predictions (we will later consider 12 distinct predictions, KT-V1 through KT-V12).
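The error-proliferation arithmetic can be checked directly. A minimal sketch; the per-test level α = .05 and the count of 10 tests are from the text, and the Bonferroni level shown is the standard α/m correction:

```python
alpha, m = 0.05, 10  # per-test significance level, number of tests

# Familywise Type-I error rate if the 10 tests err independently:
fwer = 1 - (1 - alpha) ** m  # 1 - .95**10, roughly .40

# A Bonferroni correction keeps the familywise rate at alpha, but each
# individual test must use a much stricter level, reducing power:
bonferroni_level = alpha / m  # .005 per test

print(round(fwer, 2), bonferroni_level)
```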

Scholars with advanced expertise in quantitative testing rarely use the descriptive or semiquantitative approaches we summarized in Table 2. Instead, they tend to consider primarily either of two approaches:

1) Tremble, or constant error, models (e.g., Birnbaum & Chavez, 1997; Birnbaum & Gutierrez, 2007; Birnbaum & Bahra, 2012; Harless & Camerer, 1994) assume that a person facing a pairwise choice will make an incorrect choice with some fixed probability ε and choose the preferred option with a fixed probability 1 − ε. According to these models, a decision maker satisfying CPT-KT with γ = 0.83 and α = 0.79 will choose Gamble 1 in Pair 1 of Table 2 with probability 1 − ε because the value of Gamble 1 is higher than that of Gamble 0 (see Equations 2 and 3). Generally, scholars in this branch of the literature consider error rates around 20–25%, that is, values of ε around 0.20–0.25, to be reasonable. So, a tremble model of CPT-KT with γ = 0.83 and α = 0.79 would typically predict that, in Pair 1 of Table 2, Gamble 1 should be chosen with probability exceeding 0.75. In particular, constant error models predict that the preferred option in any lottery pair is the modal choice (up to sampling variability).

2) Econometric models (which we use as a generic term to include, e.g., “Fechnerian,” “Thurstonian,” “Luce choice,” “Logit,” and “Probit” models) assume that the probability of selecting one gamble over the other is a function of the “strength of preference.” There are many sophisticated models in this domain (see, e.g., Blavatskyy & Pogrebna, 2010; Hausman & McFadden, 1984; Hey & Orme, 1994; Loomes et al., 2002; Luce, 1959; Stott, 2006; Wilcox, 2008, 2011; Yellott, 1977, for discussions and additional references). According to these models, the strength of preference, according to CPT-KT with γ = 0.83 and α = 0.79, favoring Gamble 1 over Gamble 0 in Pair 1 of Table 2 is

[.28^.83 / (.28^.83 + .72^.83)^(1/.83)] × 31.43^.79 − [.32^.83 / (.32^.83 + .68^.83)^(1/.83)] × 27.50^.79 ≈ 4.68 − 4.67 = 0.01.   (4)

In these models, this strength of preference is perturbed by random noise of one kind or another. If the median noise is zero and the noise overwhelms the strength of preference, then these models predict choice probabilities near 0.50: A person with a very weak strength of preference will act similarly to someone flipping a fair coin. If the noise is almost negligible, then choice behavior becomes nearly deterministic. The vast majority of such models share the feature that whenever the strength of preference for one option over another is positive, then the ‘preferred’ option is chosen with probability greater than 1/2. In other words, they predict that the preferred option in any lottery pair is the modal choice (up to sampling variability).
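The strength-of-preference computation in Equation 4 can be verified numerically. A minimal sketch, assuming the parametrization that Equation 4 implies: the “Kahneman-Tversky” weighting function w(p) = p^γ / (p^γ + (1 − p)^γ)^(1/γ) with γ = 0.83, power utility u(x) = x^α with α = 0.79, and the outcome and probability values read off Equation 4:

```python
def w(p, gamma=0.83):
    """'Kahneman-Tversky' probability weighting function."""
    return p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma)

def u(x, alpha=0.79):
    """Power utility of a monetary gain."""
    return x**alpha

# CPT values of the two gambles in Pair 1 of Table 2
# (a P chance of winning x, nothing otherwise):
v1 = w(0.28) * u(31.43)  # Gamble 1: rounds to 4.68
v0 = w(0.32) * u(27.50)  # Gamble 0: rounds to 4.67
print(round(v1, 2), round(v0, 2))
# The difference of the rounded values is 0.01, as in Equation 4,
# so the predicted strength of preference for Gamble 1 is tiny.
```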

Whereas the “Descriptive Analysis” and “Semiquantitative Analysis” in Table 2 resemble the patient who asks a lay person for diagnostic help, possibly supplemented with a simple quantitative measurement of body temperature, the alternative route of tremble and, especially, econometric models resembles the patient seeking diagnostics from the radiologist, with different models corresponding to different specialized, and often highly technical, medical diagnostics. Just like different medical diagnostic methods vary dramatically in the skill set they require and in the assumptions they make about the likely state of health, so do different ways to test theories of decision making vary in the mathematical and statistical skill set they demand of the scientist, and in the technical ‘convenience’ assumptions they make for mathematical and computational tractability.

The questions and puzzles we just discussed illustrate a notorious challenge to meaningful testing of decision theories (e.g., Luce, 1959, 1995, 1997): There is a conceptual gap between the algebraic nature of the theory and the probabilistic nature of the data, especially because algebraic models are most naturally interpreted as static and deterministic, whereas behavior is most naturally viewed as dynamic and not fully deterministic. Luce’s challenge is twofold: 1. Recast an algebraic theory as a probabilistic model. 2. Use the appropriate statistical methodology for testing that probabilistic model. The first challenge has been recognized, sometimes independently, by other leading scholars (see, e.g., Blavatskyy, 2007; Blavatskyy & Pogrebna, 2010; Harless & Camerer, 1994; Hey, 1995, 2005; Hey & Orme, 1994; Loomes & Sugden, 1995; Starmer, 2000; Stott, 2006; Tversky, 1969; Wilcox, 2008, 2011). Some of these researchers have further cautioned that different probabilistic specifications of the same core algebraic theory may lead to dramatically different quantitative predictions, a notion that we will very much reinforce further. Others have warned that many probabilistic specifications require difficult “order-constrained” statistical inference (Iverson & Falmagne, 1985). Both components of Luce’s challenge are nontrivial. From the outside, one can easily get the impression that virtually any level of rigorous probabilistic modeling and testing of decision theories requires advanced quantitative skills.

QTEST solves many of the problems we reviewed. For example, it lets us formally test the Null Hypothesis that a decision maker’s modal choices match KT-V4, via a single test on all of a person’s binary choice data at once, provided that we have multiple observations for each choice pair. Table 2 shows the p-values of that test. A standard criterion is to reject a model or Null Hypothesis when the p-value is smaller than 0.05, the usual significance level. Hence, small p-values are indications of poor model performance. A p-value of 1 means that a model cannot be rejected on a given set of data, no matter what the significance level of the statistical test. Here, HDM fits this Null Hypothesis perfectly because in each row in which KT-V4 predicts preference for Gamble 1, the hypothetical decision maker chose Gamble 1 more often than not, and in each row in which KT-V4 predicts preference for Gamble 0, the hypothetical decision maker chose Gamble 0 more often than not. Hence, the modal choice test of KT-V4 for HDM has a p-value of 1. It is quite notable that DM1 rejects the Null Hypothesis with a p-value of 0.03. QTEST trades off between the excellent fit of the modal choices in eight lottery pairs and the one big discrepancy between modal choice and KT-V4 in Pair 8, and rejects the Null Hypothesis. On the other hand, QTEST does not reject KT-V4 by modal choice on DM13. The p-value of .55 takes into account that, despite the large number of observed ‘incorrect’ modal choices, none of these were substantial deviations.

The quantitative modal choice analysis contrasts sharply with the descriptive modal choice analysis and depicts a different picture. If we count observed ‘correct’ modal choices, then KT-V4 appears to be better supported by DM1 than DM13. Only a quantitative analysis, such as that offered by QTEST, reveals that DM1’s single violation of modal choice is more serious than DM13’s four violations combined. This teaches us that superficial descriptive indices are not even monotonically related to quantitative goodness-of-fit, and hence can be very misleading. Note that QTEST is also designed to avoid the proliferation of Type-I errors that a series of separate Binomial tests creates, because it tests all constraints of a given probability model jointly in one test.

The modal choice test is also useful for the quantitative decision scientist: Because the modal choice prediction is rejected for DM1, we can conclude that constant error models and a very general class of econometric models of CPT-KT with γ = 0.83 and α = 0.79 are likewise rejected, because the modal choice prediction is a relaxation of their vastly more restrictive (i.e., specific) predictions.

Table 2 illustrates two more QTEST analyses for KT-V4. We tested the Null Hypothesis that the decision maker satisfies KT-V4 and, in each gamble pair, chooses ‘incorrectly’ at most 25% of the time, as is required by the common rule of thumb for constant error models. Again, HDM fits perfectly. However, both DM1 and DM13 reject that Null Hypothesis with small p-values. We also included another example, more closely related to the first descriptive approach of counting ‘incorrect’ choices across all gamble pairs. For example, QTEST can estimate binary choice probabilities subject to the constraint that the error probabilities, summed over all gamble pairs, are limited to some maximum amount, say 0.5 (this is a restrictive model allowing at most an average error rate of 5% per gamble pair), and provide a goodness-of-fit for KT-V4. Again, HDM fits perfectly, but DM1 and DM13 reject that model with small p-values.

The bottom panel of the table illustrates a very different class of models and their test. In the first model, the parameters γ and α of CPT-KT have become random variables, that is, the utility and weighting functions of a decision maker are, themselves, no longer deterministic concepts. This captures the idea that a decision maker satisfying CPT-KT could waver in his risk attitude α and in his weighting of probabilities. This model is rejected for DM1 and yields an adequate fit on the Cash II stimuli for DM13. At a significance level of 5%, the HDM is also rejected by Random CPT-KT, even though that model permits, as one of its allowable preference states, the pattern labeled KT-V4 in the table, that HDM appears, descriptively, to satisfy nearly perfectly. However, as we move to Cumulative Prospect Theory with a two-parameter “Goldstein-Einhorn” weighting function, the data of HDM do not reject the Random CPT model. In Random CPT, variability of choices is modeled as variability in preferences. The rejection of the Random CPT model with “Kahneman-Tversky” weighting function means that the slight variability in the choice behavior of HDM cannot be explained by assuming that this decision maker wavers between different “Kahneman-Tversky” weighting functions!3

The table shows that DM1 is consistent neither with the deterministic preference KT-V4 perturbed by random error, nor with two Random CPT models. DM13 is consistent with both kinds of models, but the deterministic preference KT-V4 is significantly rejected if we limit error rates to at most 25% on each gamble pair. The hypothetical decision maker is perfectly consistent with error-perturbed deterministic preference KT-V4, even with very small error rates, leading to a perfect fit (p-value = 1) for those models. The fit of Random CPT-KT is marginal for HDM, and the “Kahneman-Tversky” version is, in fact, significantly rejected for HDM and DM1. This illustration documents the formidable power of quantitative testing. It also illustrates how QTEST provides very general tests that lie in the open space between descriptive or semiquantitative analyses on the one hand and highly specialized classic quantitative ‘error’ models on the other hand. QTEST can serve as the ‘triage nurse’ of theory testing.

This completes our motivating example.

3 More precisely, the HDM data are not statistically consistent with having been generated by a random sample from an unknown probability distribution over preference states consistent with CPT with “Kahneman-Tversky” weighting functions and “power” utility functions, where γ, α are multiples of 0.01 and in a certain range.


The Geometry of Binary Choice

We now introduce a geometric framework within which we can simultaneously represent algebraic binary preference, binary choice probabilities, as well as empirically observed binary choice proportions, all within one and the same geometric space.4 For the time being, we are interested in three-dimensional visualizations, but we will later move to high-dimensional abstract models.

We start with lotteries A, B, and C of Table 2. There are eight possible preference patterns among these gambles: the six rankings (each from best to worst), ABC, ACB, BAC, BCA, CAB, CBA, and two intransitive cycles that we label ABCA and ACBA. Using the binary 0/1 coding of the gambles given in Table 2, we can represent each of these 8 preference patterns as a corner (called a vertex) of a three-dimensional cube of length 1 (called the unit cube) in Figure 2. For example, ranking ABC is the point (1,1,1) in the space with coordinate system (A,B), (A,C), and (B,C).5 The cycle ABCA is the point (1,0,1). The axes of the geometric space are indexed by gamble pairs and, for representing algebraic preferences, simply represent the 0/1 coding of gambles in Tables 1 and 2. Note that “preference patterns” are (deterministic) models of preference, not empirical data.

If we move beyond just the 0/1 coordinates and consider also the interior of the cube, we can represent probabilities and proportions (observed data). Each axis continues to represent a gamble pair. For example, Figure 2 also shows a probability model, namely the modal choice consistent with the ranking ABC: If a person chooses A over B at least 50%, B over C at least 50%, and A over C at least 50% of the time, then their binary choice probabilities must lie somewhere in the smaller shaded cube attached to the vertex ABC. In particular, if a person acts deterministically and chooses A over B 100%, B over C 100%, and A over C 100% of the time, then this person’s (degenerate) choice probabilities coincide with the vertex ABC that has coordinates (1,1,1) and that also represents the deterministic preference ABC.
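The 0/1 vertex coding just described can be generated mechanically. A minimal sketch: the coordinate for pair (X,Y) is 1 exactly when X is preferred to Y:

```python
from itertools import combinations, permutations

gambles = "ABC"
pairs = list(combinations(gambles, 2))  # [(A,B), (A,C), (B,C)]: the axes

def vertex(preference):
    """0/1 coordinates of a preference pattern, given as a set of
    ordered pairs (X, Y) meaning 'X is preferred to Y'."""
    return tuple(1 if (x, y) in preference else 0 for x, y in pairs)

def ranking_to_pairs(ranking):
    """A ranking from best to worst prefers earlier items to later ones."""
    return {(ranking[i], ranking[j])
            for i in range(len(ranking)) for j in range(i + 1, len(ranking))}

print(vertex(ranking_to_pairs("ABC")))               # (1, 1, 1)
print(vertex({("A", "B"), ("B", "C"), ("C", "A")}))  # cycle ABCA: (1, 0, 1)

# The six rankings occupy 6 of the cube's 8 corners; the remaining two
# corners are the intransitive cycles ABCA and ACBA.
ranking_vertices = {vertex(ranking_to_pairs("".join(r)))
                    for r in permutations(gambles)}
print(len(ranking_vertices))                         # 6
```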

Next, we proceed to a joint visualization of an algebraic model (KT-V4), a probability model (theoretical modal choice consistent with KT-V4), and empirical data (the observed choice proportions of HDM, DM1, and DM13), again in 3D. Now, and for our later visualizations, we concentrate on Gambles A, C, and D from Table 2 because they continue to be particularly informative.

Figure 2. Two different views of the same geometric representation of eight algebraic and one probabilistic model(s) for Gambles A, B, and C. Each of the eight possible preference patterns forms a vertex of the unit cube. Modal choice consistent with preference ranking ABC (choose A over B at least 50%, A over C at least 50%, and B over C at least 50% of the time) forms the smaller shaded cube.

4 In this paper, we concentrate on asymmetric and complete preferences only. Likewise, empirical data are assumed to be from a two-alternative forced choice paradigm, in which on each trial one and only one option must be chosen. QTEST is flexible enough to handle other models but does not currently automate as much of the modeling and analysis processes for such cases. In particular, an extension of the Graphical User Interface for more general cases is not yet available. Regenwetter and Davis-Stober (2012) used the MATLAB© core underlying QTEST to test models on ternary paired comparison data where respondents could state indifference among pairs of gambles.

5 The gamble pair (A,B) gives the x axis, the pair (A,C) gives the y axis, and (B,C) gives the z axis in 3D space.

Figure 3 shows KT-V4 as the point (1,0,0) in 3D space, consistent with Table 2, which shows Gamble 1 as the preferred gamble in Pair 2 (A vs. C), Gamble 0 as the preferred gamble in Pair 3 (A vs. D), and Gamble 0 as the preferred gamble in Pair 8 (C vs. D). Hence, the three coordinates are the gamble pairs (A,C), (A,D), and (C,D). If a decision maker acted deterministically and in accordance with KT-V4, this person would choose A over C 100%, A over D 0%, and C over D 0% of the time, represented by the point (1,0,0). This point represents both a deterministic preference and a degenerate case in which a person always chooses in a way consistent with that preference. Our hypothetical decision maker comes very close to such behavior: HDM’s choice proportions were 95% A over C, 5% A over D, and 10% C over D, which corresponds to the point with coordinates (.95, .05, .10), marked with a star next to the vertex KT-V4 in Figure 3. DM1 has choice proportions giving the star with coordinates (.65, .25, .75). If we use modal choice as a criterion, a decision maker who satisfies KT-V4 should choose A over C at least 50%, A over D at most 50%, and C over D at most 50% of the time, as indicated by the shaded smaller cube attached to the vertex KT-V4. DM1 has two ‘correct’ of three observed modal choices. Geometrically, this means that the data are represented by a star located above the shaded cube in Figure 3. Intuitively speaking, the 15 of 20 choices in Pair 8 mean that the data point is somewhat ‘far away’ from the shaded cube but has two coordinates that are consistent with the shaded cube. On the other hand, DM13 translates into a star ‘very close’ to the shaded cube, at coordinates (.45, .60, .60), even though each observed modal choice is the opposite of what KT-V4 predicts (i.e., each coordinate has a value on ‘the wrong side’ of 1/2).

Figure 3 shows that counting ‘correct’ or ‘incorrect’ modal choices is tantamount to counting the number of coordinates that match the modal choice prediction. This makes it also obvious why the descriptive tally of ‘correct modal choices,’ although common in the literature, is not a useful measure of model performance. It is analogous to the patient counting the number of symptoms present while discarding all information about the intensity or importance of any symptoms. We can encounter data like those of DM1 that have 2 of 3 coordinates in the correct range, yet the data are ‘far away’ from the modal choice predictions, while we can also collect data like those of DM13 that have 3 of 3 coordinates slightly out of range without statistically violating the modal choice predictions. Figure 3 only depicts three dimensions. Going back to Table 2, DM1 has 9 of 10 coordinates in the correct range, yet the data are ‘far away’ from the modal choice predictions, whereas DM13 has 4 of 10 coordinates slightly out of range without statistically violating the predictions. Note that DM1 and DM13 are real study participants, not hypothetical persons custom-created to make a theoretical point.
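The contrast between counting matching coordinates and measuring how far a data point sits from the modal-choice region can be made concrete. A minimal sketch using the 3D projections quoted above; the Euclidean distance to the shaded cube serves purely as geometric intuition, since QTEST's actual test is likelihood-based, not distance-based:

```python
kt_v4 = (1, 0, 0)  # KT-V4 vertex in coordinates (A,C), (A,D), (C,D)
data = {"HDM": (0.95, 0.05, 0.10),
        "DM1": (0.65, 0.25, 0.75),
        "DM13": (0.45, 0.60, 0.60)}

def correct_coords(point, vertex):
    """How many coordinates lie on the predicted side of 1/2."""
    return sum((p >= 0.5) == (v == 1) for p, v in zip(point, vertex))

def dist_to_modal_cube(point, vertex):
    """Euclidean distance to the modal-choice cube attached to a vertex."""
    gaps = [max(0.0, 0.5 - p) if v == 1 else max(0.0, p - 0.5)
            for p, v in zip(point, vertex)]
    return sum(g * g for g in gaps) ** 0.5

for name, point in data.items():
    print(name, correct_coords(point, kt_v4),
          round(dist_to_modal_cube(point, kt_v4), 3))
# DM1 matches 2 of 3 coordinates yet lies 0.25 from the cube, farther
# than DM13, which matches 0 of 3 coordinates but lies only 0.15 away.
```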

Figure 3. Three different angles of view of the same three-dimensional geometric visualization for Gamble Pairs 2 (A,C), 3 (A,D), and 8 (C,D). KT-V4 predicts the preference ranking DAC, that is, the point with coordinates (1,0,0) in the space spanned by (A,C), (A,D), and (C,D). The shaded cube shows the binary choice probabilities consistent with the modal choice predictions for KT-V4 (choose A over C at least 50%, A over D at most 50%, and C over D at most 50% of the time). The three stars are the data sets for HDM, DM1, and DM13.

Table 2 provides two other quantitative test results for KT-V4 from QTEST. We now illustrate these geometrically as well. Again, we project from a 10D space down to 3D space by concentrating on the same three gamble pairs. Figure 4 shows the three data sets with a probabilistic model that limits the error rates for each gamble pair to at most 25%. Hence, the permissible binary choice probability for A over C must be at least .75, the probability of choosing A over D must be .25 or lower, and the binary choice probability of C over D is likewise limited to at most .25. Again, HDM has data inside the shaded small cube, indicating a perfect fit. DM1 and DM13 are located ‘far away’ from the shaded cube, which is reflected in the rejection of the model on both data sets in Table 2, with very small p-values. This is the so-called 0.75-supermajority specification. An upper bound of 25% errors per gamble pair per person is consistent with a general rule of thumb that has been explored in the literature (see, e.g., Camerer, 1989; Harless & Camerer, 1994; Starmer & Sugden, 1989). In fact, one can set the supermajority specification level anywhere from .5 to .999 in the QTEST program.
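Membership in a supermajority region is easy to check. A minimal sketch for the 0.75-supermajority specification of KT-V4 in the same 3D projection; note that this point-in-region check ignores sampling variability, which the actual statistical test must account for:

```python
kt_v4 = (1, 0, 0)  # vertex for pairs (A,C), (A,D), (C,D)

def in_supermajority_region(point, vertex, level=0.75):
    """True if every choice probability respects the supermajority bound:
    at least `level` where the vertex coordinate is 1, and at most
    1 - level where it is 0."""
    return all(p >= level if v == 1 else p <= 1 - level
               for p, v in zip(point, vertex))

print(in_supermajority_region((0.95, 0.05, 0.10), kt_v4))  # HDM: True
print(in_supermajority_region((0.65, 0.25, 0.75), kt_v4))  # DM1: False
print(in_supermajority_region((0.45, 0.60, 0.60), kt_v4))  # DM13: False
```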

Figure 4. Two different angles of view of the same geometric visualization of HDM, DM1, DM13, and a supermajority model of KT-V4 (the shaded cube), in which the choice probability of A over C is at least .75, the choice probability of A over D is at most .25, and the choice probability of C over D is at most .25.


The last QTEST analysis for KT-V4 we reported in Table 2 considers a different probability model. Instead of limiting the error rate for each gamble pair individually, we limit the sum of all error rates, added over all gamble pairs. This allows a decision maker to have a higher error rate on one gamble pair as long as they have a lower error rate on another gamble pair. Figure 5 illustrates how, once again, HDM fits such a model perfectly because HDM’s data lie inside the shaded pyramid, whereas DM1 and DM13 are again ‘far away’ from the model. Note that the tests in Table 2 are carried out in a 10D space whose coordinates are the 10 gamble pairs, whereas we only consider the 3D projection for Gamble Pairs 2, 3, and 8 in these figures. In other words, the tests in Table 2 are somewhat more complicated than the illustrative figures convey.
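The city-block (sum-of-errors) region can be checked the same way. A minimal sketch for the 3D projection shown in Figure 5; the full model in Table 2 constrains the sum over all 10 gamble pairs, not just these three:

```python
kt_v4 = (1, 0, 0)  # vertex for pairs (A,C), (A,D), (C,D)

def total_error(point, vertex):
    """City-block distance from the vertex: each pair's error rate is the
    probability of choosing against the predicted preference."""
    return sum(1 - p if v == 1 else p for p, v in zip(point, vertex))

def in_city_block_region(point, vertex, bound=0.5):
    """True if the error rates, summed over pairs, stay within the bound."""
    return total_error(point, vertex) <= bound

print(in_city_block_region((0.95, 0.05, 0.10), kt_v4))  # HDM: True (sum 0.20)
print(in_city_block_region((0.65, 0.25, 0.75), kt_v4))  # DM1: False (sum 1.35)
print(in_city_block_region((0.45, 0.60, 0.60), kt_v4))  # DM13: False (sum 1.75)
```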

Once we have moved to the geometric representation in which algebraic models are vertices of a unit cube whose axes are formed by gamble pairs, in which probability models are permissible ‘regions’ inside a unit cube (viewed as a space of binary choice probabilities), and in which data sets are points in the same unit cube (viewed as a space of choice proportions), it appears that quantitative theory testing should be almost trivial. We have also seen how some data sets are ‘far away’ from some models, others are ‘nearby’ some models, and some are even inside (and hence “perfect fits” for) some models. However, as we discuss next, the intuitive interpretation of ‘distance’ between theory and data is an oversimplification. Each of the models in Figures 3–5 can be characterized mathematically as a probability model with so-called “order-constraints” on the parameters (say, each choice probability is ≥ 1/2). We discuss these models informally here. Appendix B gives formal details, and Davis-Stober (2009) provides the likelihood-based statistical inference framework for binary choice data that we build on.6

Maximum-likelihood estimation and goodness-of-fit analysis, say, of the best fitting binary choice probabilities for DM13, subject to the modal choice specification of KT-V4, as QTEST provides in Figure 3, is nontrivial, for several reasons. For one thing, there are equally many parameters (binary choice probabilities) as there are empirical cells (binary choice proportions), yet we can tell from the figures that the models can be extremely restrictive, especially in high-dimensional spaces, and hence, must be testable. As explained in Davis-Stober and Brown (2011), one cannot simply count parameters to evaluate the complexity of these types of models. The second reason, returning to data like those of DM1 and DM13 in Figure 3, is that the best fitting parameters, that is, the maximum-likelihood estimate, satisfying an order-constrained model may lie on a face, an edge, or even a vertex of the shaded modal choice cube. This becomes even more complicated in higher dimensional spaces, in which the modal choice model has surfaces of many different dimensions. Standard likelihood methods will break down when the best-fitting parameter values are on the boundary of the model, because the log-likelihood goodness-of-fit statistic will not have the usual and familiar asymptotic χ² distribution. Rather, the distribution depends on the geometry of the model in question. The best fitting model parameters also need not be the orthogonal projection of the data onto the model in the geometric space. In sum, statistical testing of these models is difficult. QTEST is specifically designed to carry out the appropriate “order-constrained” maximum-likelihood estimation and goodness-of-fit tests for virtually all of the models we discuss.7
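For the box-shaped specifications above (modal choice and supermajority), the order-constrained maximum-likelihood estimate happens to take a simple form, because the Binomial likelihood factors across gamble pairs: each observed proportion is clipped into its permitted interval. A minimal sketch that also computes the log-likelihood-ratio statistic G²; the counts of 9, 12, and 12 out of 20 are assumed to match DM13's 3D proportions, and the p-value would then require the boundary-aware (chi-bar-square type) distribution that QTEST implements, not the ordinary χ² distribution:

```python
from math import log

def clipped_mle(k, n, lo, hi):
    """Constrained MLE of a Binomial probability under lo <= p <= hi."""
    return min(max(k / n, lo), hi)

def g_squared(counts, n, bounds):
    """Log-likelihood-ratio statistic comparing the constrained MLE
    to the unconstrained MLE (the raw choice proportions)."""
    def term(k, p_hat):  # k * log(k / (n * p_hat)), with 0 * log(0) = 0
        return k * log(k / (n * p_hat)) if k > 0 else 0.0
    g2 = 0.0
    for k, (lo, hi) in zip(counts, bounds):
        p_hat = clipped_mle(k, n, lo, hi)
        g2 += 2 * (term(k, p_hat) + term(n - k, 1 - p_hat))
    return g2

# Modal-choice specification of KT-V4 for pairs (A,C), (A,D), (C,D):
bounds = [(0.5, 1.0), (0.0, 0.5), (0.0, 0.5)]
print(round(g_squared([9, 12, 12], 20, bounds), 3))  # DM13-like data: 1.811
```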

Another unusual, and possibly confusing, feature of these models is that they can allow for a “perfect fit” where, on certain sets of data, a model cannot be rejected no matter how large the significance level. This is because many of these models do not make “point predictions.” Rather, they make predictions that occupy a volume in the unit cube of binary choice probabilities. When a point representing a set of choice proportions (data) is inside such a model, then the best-fitting choice probabilities are literally equal to the observed choice proportions, hence giving a perfect fit.

We now move to the full-fledged abstractmodels and their tests.

6 Myung et al. (2005) provide a corresponding Bayesian framework.

7 A prerequisite is that the model in question must be full-dimensional, which holds automatically for “distance-based” specifications. QTEST also assumes an iid sample. Because QTEST tests hypotheses about Binomial distributions, we recommend 20 observations per gamble pair. We discuss these topics in more detail in the Online Supplement.

8 We will keep the discussion here nontechnical in the interest of making QTEST as approachable as possible. Appendix B provides some formally precise details. We leave a much more general theory for a different paper.


Aggregation- and Distance-Based (Error) Models

Aggregation-based specifications8 of a theory T require that aggregated data should be consistent with the theoretical predictions of T, while also accounting for sample size. The prototypical case is majority/modal choice, which requires that the modal choice for each gamble pair be consistent with the theoretical prediction (up to sampling variability). To consider KT-V4 in Table 2 again, the theoretical majority/modal choice specification requires that the choice probability for Gamble 1 must be higher than that of Gamble 0 in Pairs 1, 2, and 5, whereas Gamble 0 must be chosen with higher probability than Gamble 1 in all remaining pairs.

Figure 5. Two different angles of view of the same geometric visualization of HDM, DM1, DM13, and the city-block model of KT-V4 (the shaded pyramid), in which the sum of error probabilities can be at most 0.5.

So far we have focused on the majority specification of a numerical theory like Cumulative Prospect Theory. To illustrate how the same approach can apply to theories that do not rely on numerical utility values, let us consider a simple “lexicographic heuristic” (see, e.g., Tversky, 1969):

LH: Prefer the gamble with the higher chance of winning unless the probabilities of winning are within 5 percentage points of each other. If the chance of winning is similar in both gambles (within 5 percentage points), prefer the gamble with the higher gain.
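The LH rule translates directly into code. A minimal sketch over gambles of the form “a P chance of winning x”; the example gambles below are hypothetical, not the Table 2 stimuli:

```python
def lh_prefers_first(g1, g2, threshold=0.05):
    """Lexicographic heuristic LH for gambles (p_win, gain): decide by
    chance of winning unless the chances are within 5 percentage
    points; in that case, decide by the gain."""
    (p1, x1), (p2, x2) = g1, g2
    if abs(p1 - p2) > threshold:
        return p1 > p2
    return x1 > x2

# Hypothetical gambles (win probability, gain):
print(lh_prefers_first((0.40, 50), (0.44, 30)))  # similar chances -> higher gain: True
print(lh_prefers_first((0.40, 50), (0.50, 30)))  # chances differ -> higher chance: False

# LH can produce an intransitive cycle, as the text notes:
a, b, c = (0.40, 50), (0.44, 40), (0.48, 30)
print(lh_prefers_first(a, b), lh_prefers_first(b, c), lh_prefers_first(c, a))
# a over b, b over c, yet c over a: all three print True
```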

A decision maker who satisfies LH prefers Gamble 1 to Gamble 0 for Gamble Pairs 1, 5, 8, and 10 of Table 2, whereas he prefers Gamble 0 to Gamble 1 in Gamble Pairs 2, 3, 4, 6, 7, and 9 of Table 2. In particular, this decision maker violates transitivity, because, considering again the three gambles A, C, D, we see that he prefers A to C, C to D, but D to A. The majority/modal choice specification of LH is illustrated in Figure 6. If we only considered Gambles A, C, and D, DM1 would fit perfectly, because the data point is inside the shaded cube attached to LH in Figure 6. However, we will see in the Testing Cumulative Prospect Theory and LH section that LH is rejected on the full data in 10D space.

If we think of majority/modal choice specifications as permitting up to 50% errors or noise in each binary choice, then we are allowing up to 50% of all data to be discarded as noise (even more, when we take into account sampling variability in finite samples). From that vantage point, we may want to place stronger constraints on the binary choice probabilities, so that we do not end up overfitting data by accommodating models that really are poor approximations of the cognitive process of interest. In a supermajority specification, we specify a lower bound on the rate, that is, the minimum probability with which a decision maker must choose consistently with their preference, for each gamble pair. For example, in the data analysis of the Testing Cumulative Prospect Theory and LH section, we will consider a supermajority level of 0.9, according to which a person must choose the preferred gamble in a pair with probability at least 0.9, that is, we permit up to 10% errors (up to sampling variability) for each gamble pair.

As we have seen in Figures 2–6, majority and supermajority specifications require the binary choice probabilities to be within some range of the vertex that represents the algebraic theory in question. Distance-based specifications of a theory T generalize that idea. They constrain the choice probabilities to lie within some specified distance of the vertex that represents T. Appendix B provides a formal summary of such models for three different distance measures.

Figure 6. Two different angles of view of the same geometric visualization of HDM, DM1, DM13, the modal choice models of KT-V4 (orange), and LH (blue). In this 3D figure, HDM is inside the orange cube for KT-V4, and DM1 is inside the blue cube attached to LH (the latter does not hold in 10D space).


Distance-Based Models for Theories With Multiple Predictions

There is no reason why we should limit ourselves to theories that only predict a single binary preference pattern like KT-V4 in Table 2. If a theory permits a variety of preference patterns, we can build a probabilistic model by combining the various probabilistic models for all of the permitted patterns. For example, for Gambles A, C, D in Table 2, we can consider all six possible rankings (each from best to worst): ACD, ADC, CAD, CDA, DAC, and DCA. Figure 7 considers the majority/modal choice specification of that model on the top, and the supermajority specification of that model with a 0.90-supermajority level in the lower panel. In these models, the decision maker is allowed to rank order the gambles from best to worst according to any fixed ranking that is unknown to the researcher, then choose the preferred gamble in each gamble pair at least 50% (top of Figure 7) or at least 90% (bottom of Figure 7) of the time.

The 0.90-supermajority model in the lower panel of Figure 7 can be interpreted to state that the decision maker is allowed to have any one of the preference states ACD, ADC, CAD, CDA, DAC, and DCA, and, for that preference state, chooses the 'correct' object in any pair with probability 0.90 or higher. QTEST finds the best fitting vertex and simultaneously tests whether the data are compatible with the constraints on binary choice probabilities. The top of Figure 7 is a property that has received much attention in the literature under the label of weak stochastic transitivity (WST). WST is the majority/modal choice specification of the collection of all transitive complete rankings of a set of choice alternatives. Regenwetter et al. (2010, 2011a) dedicated much attention to the discussion of this property.9 WST was one of the earliest probabilistic choice models that became known to require order-constrained inference: Tversky (1969) attempted to test WST but acknowledged that appropriate order-constrained inference methods were unavailable. Iverson and Falmagne (1985) derived an order-constrained test for WST and showed that Tversky's data yielded little evidence for systematic violations. Regenwetter et al. (2010) provided a complete order-constrained test

(using a similar algorithm as that in QTEST) of WST and found no systematic violations. Returning to the data of our Table 2 in 10D space, HDM yields a perfect fit of WST, DM1 significantly violates WST with a p-value of 0.02, and DM13 yields a perfect fit (see also Table 2 of Regenwetter et al., 2010, for details). Note that the 3D figure of WST in Figure 7 gives the misleading impression that this might not be a restrictive property. In 10D space, the set of six shaded cubes in the top of Figure 7 becomes a collection of 120 such "hypercubes." The two clear regions become 904 different such regions associated with 904 intransitive 0/1 patterns.
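WST itself can be checked directly from choice proportions: it holds if at least one complete ranking agrees with every modal (majority) choice. The following is an illustrative brute-force sketch with hypothetical data; it is not QTEST's order-constrained likelihood test, which additionally accounts for sampling variability.

```python
from itertools import permutations

def satisfies_wst(p):
    """Weak stochastic transitivity check by brute force.

    p[(x, y)] is the probability (or proportion) with which x is chosen
    over y. WST holds if some complete ranking agrees with every modal
    choice: x precedes y whenever p[(x, y)] > 0.5, and y precedes x
    whenever p[(x, y)] < 0.5. Ties (p == 0.5) permit either order.
    """
    items = sorted({x for pair in p for x in pair})
    for ranking in permutations(items):
        pos = {x: i for i, x in enumerate(ranking)}
        ok = all((pos[x] < pos[y]) if q > 0.5 else
                 (pos[y] < pos[x]) if q < 0.5 else True
                 for (x, y), q in p.items())
        if ok:
            return True
    return False

# Transitive modal choices (A over C, C over D, A over D):
print(satisfies_wst({('A', 'C'): .7, ('C', 'D'): .6, ('A', 'D'): .8}))  # True
# A modal cycle (A over C, C over D, but D over A) violates WST:
print(satisfies_wst({('A', 'C'): .7, ('C', 'D'): .6, ('A', 'D'): .2}))  # False
```

Brute force over all rankings is fine for five gambles (120 rankings) but grows factorially; this is one reason the 10D geometry with its 120 hypercubes and 904 intransitive regions is harder than the 3D picture suggests.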

While weak stochastic transitivity provides a very general level of triage, in which all possible transitive complete rankings are permissible preference states and in which we use a modal choice specification, we could alternatively consider only those transitive complete rankings as permissible preference patterns that are compatible with CPT-KT, but we could augment that list of preference states by other preference patterns, such as LH, to form a new, and also very general, Null Hypothesis. As an example, if we focus again on the three lotteries A, C, D, then there are only four possible preference patterns permitted by CPT-KT, namely ACD, ADC, DAC, and DCA. The top panel of Figure 8 shows the modal/majority choice specification of LH in blue in the upper left back corner of the probability cube, and the specification of Cumulative Prospect Theory, with "Kahneman-Tversky" probability weighting functions and risk averse "power" utility functions, that is, the majority/modal choice specifications of the rankings ACD, ADC, DAC, and DCA, in orange. The entire collection forms an extremely general Null Hypothesis that QTEST

can test, similarly to weak stochastic transitivity, namely that the person is satisfying CPT-KT or LH, with an upper bound of 50% on theoretical error rates. The lower panel of Figure 8 shows the 0.90-supermajority specification, in which we limit error rates to 10% per lottery pair.

9 In particular, they explained why it is misleading to think of this as a probabilistic model of transitivity per se, because there are many more transitive preferences than there are rankings for any given set of objects. As Regenwetter and Davis-Stober (2012) discuss, if we moved beyond two-alternative forced choice, that is, beyond 0/1 patterns, then there would be very many more pairwise preference relations to consider. For instance, although there are 5! = 120 rankings for five choice alternatives, there are about 150 thousand transitive binary preferences and about 33 million intransitive binary preferences.

In this context, it is important to see that algebraic parameter counts and probability parameter counts do not match up at all. The algebraic version of CPT-KT has two free parameters, α and γ, that determine the shapes of the weighting and utility function, whereas LH has no free parameters. But, as we see in Figure 8, the probabilistic specifications of the two theories have the same number of parameters: If we consider the blue cube as

Figure 7. Majority model (top) and 0.90-supermajority model (bottom) of the collection of all six rankings of lotteries A, C, and D. The top is also known as "weak stochastic transitivity."


representing one theory (LH), and the orange shaded region as representing another theory (CPT-KT), even though the two theories occupy vastly different volumes in the cube, and even though one is more flexible by virtue of allowing 4 different rankings of the gambles (12 rankings in 10D), they have the same numbers of parameters because they predict behavior by using the same number of binary choice probabilities (in the figure, we show three choice probabilities). In other words, the usual

rule of thumb, that counting parameters determines the 'complexity' of a probability model, simply does not apply here. Furthermore, the Null Hypothesis that a decision maker "satisfies CPT-KT or LH" has the same number of parameters as the two nested Null Hypotheses that 1) a decision maker "satisfies CPT-KT" and 2) a decision maker "satisfies LH." The QTEST user can build compound Null Hypotheses like the one in Figure 8, but should be aware that model competitions, for example, selecting between CPT-KT and LH, would ideally use suitable methods for penalizing more complex (flexible) models. Unfortunately, for direct model selection/competition, classical ("frequentist") statistical approaches, including the current version of QTEST, are not well-suited (although, see Vuong, 1989, for a method to carry out certain nonnested likelihood ratio tests).

For direct comparisons of the models we consider within QTEST, one could calculate Bayes factors (e.g., Klugkist & Hoijtink, 2007) or Deviance Information Criterion (DIC) values (Myung et al., 2005). Alternatively, one could carry out model selection via normalized maximum likelihood (see Davis-Stober and Brown, 2011, for an application to order-restricted binomial models similar to those we consider here). All three of these are under development for a future version of QTEST.

To this point, we have considered a variety of models that can formally capture the idea that a decision maker has a (possibly unknown) fixed preference and makes errors in her individual choices. In each model, the 'true' preference of a person is a vertex of the probability cube, and the shape attached to each vertex provides constraints on binary choice probabilities to represent the variable choice behavior that is deemed consistent with that deterministic preference.

Random Preference and Random Utility Models

We now consider models that radically differ from the ones we considered so far. Here, preferences are not treated as static like they are in aggregation- and distance-based specifications. In this approach, preferences themselves are modeled as probabilistic in nature. Here, variability in observed choice behavior is not due to noise/errors in the responses. Rather, such variability reflects substantive variation and/or uncertainty in the decision maker's evaluation process. We will see that this type of model is not just different conceptually, it is also quite different geometrically, from models that assume constant deterministic preferences (or utilities) perturbed by random errors.

Figure 8. Null Hypothesis that a person "satisfies CPT-KT or LH." Majority/modal choice specification (top) and 0.90-supermajority specification (bottom) of LH (in blue) and the four rankings ACD, ADC, DAC, and DCA of CPT-KT for A, C, and D of Cash II (in orange).

In the introduction, we reviewed CPT-KT, according to which a binary gamble with a chance P of winning X (and nothing otherwise) has a subjective numerical value of

[P^γ / (P^γ + (1 − P)^γ)^(1/γ)] · X^α.

How can we model a decision maker who acts in accordance with this model, but who is uncertain about his risk attitude α and his γ in the weighting function? How can we model decision makers who, when asked to make a choice, sample values of α, γ according to some unknown probability distribution over the possible values of these algebraic parameters and then make a choice consistent with the CPT-KT representation? We will discuss a new Random CPT-KT model, in which α, γ are allowed to be random variables with an unknown joint distribution. In order to keep this paper as nontechnical as possible, we consider a discretized model, in which α and γ only take values that are multiples of 0.01 in the range [0.01, 1]. In other words, for simplicity, we consider an unknown distribution over finitely many possible value combinations of α, γ.
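For concreteness, the CPT-KT value of such a binary gamble can be computed directly. The following Python sketch uses the symbols α (utility exponent) and γ (weighting exponent) from the formula above:

```python
def cpt_kt_value(P, X, alpha, gamma):
    """CPT-KT subjective value of a gamble paying X with probability P
    (and nothing otherwise): w(P) * X**alpha, with the
    'Kahneman-Tversky' weighting w(P) = P**g / (P**g + (1-P)**g)**(1/g).
    """
    w = P**gamma / (P**gamma + (1.0 - P)**gamma) ** (1.0 / gamma)
    return w * X**alpha

# With alpha = gamma = 1, the value reduces to the expected value:
print(cpt_kt_value(0.28, 31.43, 1.0, 1.0))  # ≈ 8.8004
```

With alpha = gamma = 1, the weighting is the identity and the utility is linear, so the sanity check above simply recovers 0.28 · 31.43.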

According to Random CPT-KT, the probability that a respondent chooses Gamble 1 over Gamble 0 in Pair 1 of Table 2 is the probability that he uses values of α, γ for which

[.28^γ / (.28^γ + .72^γ)^(1/γ)] · 31.43^α > [.32^γ / (.32^γ + .68^γ)^(1/γ)] · 27.50^α.

Can we test such a model without assuming a particular distribution over the values for α and γ? If we can communicate to QTEST what constraints this model imposes on binary choice probabilities, then the program can carry out a quantitative test. We can derive, for example, that

D preferred to E
⇔ [.4^γ / (.4^γ + .6^γ)^(1/γ)] · 22^α > [.44^γ / (.44^γ + .56^γ)^(1/γ)] · 20^α
⇒ [.28^γ / (.28^γ + .72^γ)^(1/γ)] · 31.43^α > [.32^γ / (.32^γ + .68^γ)^(1/γ)] · 27.50^α
⇔ A preferred to B,

no matter which values of α, γ we consider (in the specified range). Therefore, no matter what joint distribution we consider for α, γ (in that range), writing PXY for the binary choice probability that X is chosen over Y, it must be the case that 0 ≤ PDE ≤ PAB ≤ 1. We discuss in the Online Supplement how one can find a complete and nonredundant list of such constraints. At present this task is technically challenging. For Random CPT-KT and Cash II, such a complete list is

0 ≤ PDE ≤ PCE ≤ PCD ≤ PBE ≤ PBD ≤ PAE ≤ PAD ≤ PAC ≤ PAB ≤ 1,
PAE ≤ PBC ≤ PAC. (5)

In other words, Random CPT-KT for Cash II is the collection of all binary choice probabilities PAB, PAC, PAD, PAE, PBC, PBD, PBE, PCD, PCE, PDE for which the constraints (5) hold. (There is no constraint regarding whether PAD is greater than, equal to, or smaller than PBC, i.e., all three cases are permissible solutions, as long as the two quantities are greater than PAE and smaller than PAC.)
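A probability vector can be screened against order constraints of this kind mechanically. The sketch below (with a hypothetical probability vector) encodes the constraint chain just described, together with the two branch constraints on PAD and PBC; it is a membership check only, not QTEST's statistical test:

```python
def satisfies_random_cpt_kt_cash2(p):
    """p: dict of binary choice probabilities keyed 'XY' = P(X chosen over Y).

    Checks the order constraints of Eq. (5): the chain
    0 <= DE <= CE <= CD <= BE <= BD <= AE <= AD <= AC <= AB <= 1,
    plus AE <= BC <= AC (AD and BC are otherwise unordered).
    """
    chain = ['DE', 'CE', 'CD', 'BE', 'BD', 'AE', 'AD', 'AC', 'AB']
    vals = [0.0] + [p[k] for k in chain] + [1.0]
    in_chain = all(a <= b for a, b in zip(vals, vals[1:]))
    return in_chain and p['AE'] <= p['BC'] <= p['AC']

# Hypothetical vector satisfying all constraints:
example = {'DE': .1, 'CE': .2, 'CD': .3, 'BE': .4, 'BD': .5,
           'AE': .6, 'AD': .8, 'BC': .7, 'AC': .9, 'AB': .95}
print(satisfies_random_cpt_kt_cash2(example))  # True
```

Note that the example has PBC < PAD; by the remark above, PBC > PAD or PBC = PAD would be equally admissible.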

Consider Gambles A, C, and D in Table 2 once again. Consider the possibility that the decision maker, at any point in time, rank orders the gambles from best to worst in a fashion consistent with CPT-KT, that is, the ranking at any moment is one of ACD, ADC, DAC, and DCA, and, when asked to choose among two gambles, picks the better one in the current preference ranking. However, that ranking is uncertain and/or allowed to vary. Mixture, aka random preference, models quantify this variability with a probability distribution over preference patterns such as, in this case, the four rankings ACD, ADC, DAC, and DCA. Figure 9 shows the binary choice probabilities if a person's preferences fluctuate or if the person is uncertain about their preference ranking, but


permissible preference rankings are limited to the rankings ACD, ADC, DAC, and DCA, consistent with CPT-KT.

The shaded region in Figure 9, which forms an irregular pyramid in 3D space, is called a convex polytope (see the Online Supplement for more details). QTEST is able to evaluate the maximum-likelihood based goodness-of-fit of any such convex polytope, within numerical accuracy, provided that a) the polytope is full-dimensional in that it has the same dimension as the full probability space (in Figure 9, the 3D pyramid is full-dimensional in the 3D cube; see the Online Supplement for non-full-dimensional examples), and provided that b) the user gives the program a complete mathematical characterization of the polytope's mathematical structure. In practice, this means that the researcher who wants to test a random preference model will first have to determine the geometric description of the model. If the polytope is full-dimensional, then they can test the model using QTEST up to computational accuracy. The characterization of Random CPT-KT on Cash II via the System of Constraints (5) happens to be fairly simple (it involves 12 nonredundant "≤" constraints). In the Online Supplement, we provide the corresponding complete system of 784 nonredundant constraints for Random CPT-KT on Cash I. We also consider Random CPT with "Goldstein-Einhorn" weighting functions and provide a complete system of 11 nonredundant constraints on Cash I, as well as 487 nonredundant constraints on Cash II, in the Online Supplement.

Figure 9. Two different views of Random CPT-KT on gambles A, C, and D. The four vertices ACD, ADC, DAC, and DCA are the preference patterns allowable under CPT-KT. Every point in the shaded region has coordinates representing binary choice probabilities consistent with Random CPT-KT in which α and γ have some unknown joint distribution (within the stated range).

In Figure 9, the shaded region is an irregular pyramid characterized by the constraints

0 ≤ PCD ≤ PAD ≤ PAC ≤ 1.

For example, the second to last inequality gives the shaded triangle in the (A,D) × (A,C) plane forming the base of the pyramid in the lower display, whereas the second inequality gives the triangle in the (C,D) × (A,D) plane forming the 'back wall' of the pyramid in the lower display. The top display is rotated and oriented so as to show that the data sets of all three decision makers in Table 2 lie outside the Random CPT-KT model.10

Notice how strongly random preference models differ geometrically from aggregation- and distance-based specifications. The aggregation- and distance-based models are a collection of disjoint geometric objects that are attached to the vertices representing permissible preference states: for example, four disconnected cubes in Figure 8 for distance-based specifications of CPT-KT on Cash II gamble pairs (A,C), (A,D), and (C,D). A random preference model is always a single polytope whose vertices are the permissible preference states: for example,

the irregular pyramid in Figure 9, for Random CPT-KT on Cash II gamble pairs (A,C), (A,D), and (C,D). This makes it clear that fixed preference perturbed by error and variable/uncertain preferences can be distinguished mathematically and experimentally, and at a very general level! In its current classical ("frequentist") form, QTEST can test each of these models, stated as a Null Hypothesis, provided that, in the random preference and random utility case, the user provides the mathematical description of the relevant polytope and that the latter is full-dimensional.

In the Online Supplement, we discuss a variety of technical issues, including sample size requirements, assumptions about iid sampling, and conditions under which data can be pooled across multiple participants.

Testing Cumulative Prospect Theory and LH

To illustrate some applications of QTEST using the Cash I and Cash II data of Regenwetter et al. (2010, 2011a, 2011b), we consider three different theories: LH, CPT-KT, and CPT-GE. Theory LH is the lexicographic heuristic we introduced earlier and illustrated in several figures. The main purpose of including LH is to show that QTEST is not limited to numerical utility theories, and to illustrate how it can represent and test even intransitive predictions. We have also seen CPT-KT earlier. We now add a competing functional form that we label CPT-GE because it uses a "Goldstein-Einhorn" weighting function (Stott, 2006) with weighting parameters γ ∈ (0, 1) and s ∈ (0, 10). According to CPT-GE, a gamble with a P chance of winning X (and nothing otherwise) has a subjective numerical value of

[sP^γ / (sP^γ + (1 − P)^γ)] · X^α. (6)

We use α ∈ (0, 1) as in CPT-KT.
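Equation (6) can be computed in the same style as the CPT-KT value. A sketch, where α is the utility exponent and γ, s are the "Goldstein-Einhorn" weighting parameters:

```python
def cpt_ge_value(P, X, alpha, gamma, s):
    """CPT-GE subjective value per Eq. (6): the 'Goldstein-Einhorn'
    weighting s*P**g / (s*P**g + (1-P)**g), times the utility X**alpha."""
    w = s * P**gamma / (s * P**gamma + (1.0 - P)**gamma)
    return w * X**alpha

# With s = 1 and gamma = 1, the weighting is the identity, so the
# value reduces to the expected value:
print(cpt_ge_value(0.40, 22.0, 1.0, 1.0, 1.0))  # ≈ 8.8

# Raising s inflates the decision weight of the winning probability:
print(cpt_ge_value(0.40, 22.0, 1.0, 1.0, 2.0) >
      cpt_ge_value(0.40, 22.0, 1.0, 1.0, 1.0))  # True
```

Comparing two such values for a gamble pair yields the predicted 0/1 choice for a given parameter combination, which is all that the vertex enumerations in Table 4 require.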

10 Finding a nonredundant minimal complete list of constraints characterizing a random preference model can be very difficult. There are several public domain programs for this task, such as, for example, PORTA (http://typo.zib.de/opt-long_projects/Software/Porta) and lrs (http://cgm.cs.mcgill.ca/~avis/C/lrs.html).


Table 4 shows the predicted preference patterns according to CPT-KT and CPT-GE for Cash I and Cash II. For Cash I, there are 22 different rankings possible according to CPT-KT, whereas there are only 11 different rankings possible according to CPT-GE. In contrast, in Cash II, there are altogether 43 different preference patterns for CPT-GE and only 12 predicted rankings for CPT-KT. This means that either functional form of Cumulative Prospect Theory can be more or less restrictive for a given experiment, depending on the stimuli used. In particular, CPT-GE, which has one more algebraic parameter than CPT-KT (but does not contain CPT-KT as a nested subtheory), is actually more parsimonious than CPT-KT in Cash I. Notice also that there are altogether 2^10 = 1024 different conceivable 0/1-coded preference patterns, of which 120 are rankings. Of those 120, the rankings predicted by either version of Cumulative Prospect Theory are only a fraction. This table also shows that some of the rankings predicted by CPT-KT and/or CPT-GE only occur with a very specific set of parameter values in the weighting and utility functions. For example, it is very difficult to find a weighting function and a utility function within "Goldstein-Einhorn" that will give us a preference ranking ABDCE (GE-V40): only one in the two million combinations of parameter values that we checked in our grid search of the parameter space for CPT-GE actually gave this ranking, namely α = 0.911, γ = 0.941, and s = 1.06. That particular ranking never occurred in our grid search for CPT-KT.
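The ranking grid search behind Table 4 can be sketched as follows. The (P, X) parameters for gambles A, B, D, and E appear earlier in this section; gamble C's parameters are not stated in this excerpt, so the value used below is an assumption for illustration only, and the grid is far coarser (step 0.05) than the paper's 0.01 grid to keep the sketch fast:

```python
from collections import Counter

# Gambles as (P, X). A, B, D, E are from the text; C is ASSUMED here.
GAMBLES = {'A': (0.28, 31.43), 'B': (0.32, 27.50), 'C': (0.36, 24.44),
           'D': (0.40, 22.00), 'E': (0.44, 20.00)}

def cpt_kt_value(P, X, alpha, gamma):
    w = P**gamma / (P**gamma + (1.0 - P)**gamma) ** (1.0 / gamma)
    return w * X**alpha

def ranking(alpha, gamma):
    """Rank the five gambles from best to worst under CPT-KT."""
    return ''.join(sorted(GAMBLES,
                          key=lambda g: -cpt_kt_value(*GAMBLES[g], alpha, gamma)))

# Tally how often each ranking occurs on the coarse grid:
grid = [i / 100 for i in range(5, 101, 5)]
counts = Counter(ranking(a, g) for a in grid for g in grid)
print(counts.most_common(3))
```

Consistent with the discussion below, strongly concave utility (small α) favors the ranking by winning probability (EDCBA), while near-linear utility with a concave weighting function favors the ranking by winning amount (ABCDE); the portions in Table 4 come from tallies of exactly this kind, on the paper's finer grid.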

On the other hand, each of the two theories can very easily explain two prominent rankings of both stimulus sets, namely the ranking of the gambles from largest to smallest winning amount (abcde in Cash I and ABCDE in Cash II) and the ranking of the gambles from highest to lowest probability of winning (edcba in Cash I and EDCBA in Cash II). These two rankings combined correspond to almost all parameter values that we have considered in the grid search. The fact that two rankings are compatible with virtually all parameter values means that we may not be able to identify the parameter values at all precisely whenever the data are in line with one of these two prominent rankings. This is an accident of reusing published

data. On the other hand, this example also shows that it is possible, in principle, to make extremely specific predictions that could narrow down the possible weighting and utility functions underlying binary choices. If a participant were to reliably provide data consistent with the preference ranking ABDCE (GE-V40) in Cash II, we would have a very precise idea of this decision maker's weighting and utility function as suggested by CPT-GE, and we would have evidence in favor of CPT-GE and against CPT-KT.

Table 5 shows our analysis results for modal choice and supermajority. The top panel provides the results for the majority/modal choice specification of CPT-KT, CPT-GE, and LH. At first sight, CPT-KT fits perfectly for 11 participants in Cash I and for 12 participants in Cash II, as indicated by check marks in the corresponding columns. In each case that we find a theory to fit the data, we provide the label of the best fitting vertex. Because the Cash I and Cash II stimuli were intermixed in the experiment, any model that assumes a decision maker to use a fixed weighting and a fixed utility function and to make choices based on just those two functions, plus commit random errors, should be able to account for the Cash I and Cash II data jointly, without requiring different weighting and utility functions for the two stimulus sets. In other words, a person's Cash I and Cash II responses can serve as each other's replications. When a theory consistently fits the data of a person for both stimulus sets with the same weighting and the same utility function, then we mark the vertex labels in italics to indicate that they are mutually consistent. Whenever a theory is rejected on a given stimulus set for a given participant, we mark this with "—" and provide the p-value in boldfaced font. The table can be read as follows: For Respondent 1, we find CPT-KT and CPT-GE to fit the Cash I responses, but both theories are rejected on the Cash II data, whereas LH fits on both data sets. The column marked "Performance Summary" shows, for each theory, whether it is rejected (marked "—"), whether it fits consistently across the two stimulus sets (marked KT, GE, or LH), or whether, even though it fits, it fails to do so consistently across the two stimulus sets (marked "±").


Table 4. Predicted Preference Patterns According to CPT-KT and CPT-GE for Cash I and Cash II

Cash I (CPT-KT vertex, portion; ranking; CPT-GE vertex, portion):

KT-v1   0.6     edcba   GE-v1   0.3
KT-v2   0.01    decba   GE-v2   0.02
KT-v3   0.009   dceba   GE-v3   0.02
KT-v4   0.001   cdeba   GE-v4   0.002
KT-v5   0.008   cdbea   GE-v5   0.02
KT-v6   0.002   cbdea   GE-v6   0.004
KT-v7   0.006   cbdae   GE-v7   0.02
KT-v8   0.0008  bcdae   GE-v8   0.004
KT-v9   0.007   bcade   GE-v9   0.03
KT-v10  0.006   bacde   GE-v10  0.03
KT-v11  0.003   edcab
KT-v12  0.003   edacb
KT-v13  0.0001  edabc
KT-v14  0.0001  eadcb
KT-v15  0.003   eadbc
KT-v16  0.0003  aedbc
KT-v17  0.004   aebdc
KT-v18  0.0001  aebcd
KT-v19  0.0002  abedc
KT-v20  0.004   abecd
KT-v21  0.005   abced
KT-v22  0.3     abcde   GE-v11  0.5

Cash II (CPT-KT vertex, portion; ranking; CPT-GE vertex, portion):

KT-V1   0.4     EDCBA   GE-V1   0.2
                DECBA   GE-V2   0.002
                DCEBA   GE-V3   0.002
                CDEBA   GE-V4   0.0003
                DCBEA   GE-V5   0.000007
                CDBEA   GE-V6   0.002
                CBDEA   GE-V7   0.0009
                EDBCA   GE-V8   0.00002
                DBCEA   GE-V9   0.0000005
                EBDCA   GE-V10  0.000005
                EBCDA   GE-V11  0.0000005
                BDCEA   GE-V12  0.0000005
                BECDA   GE-V13  0.000001
                BCEDA   GE-V14  0.0000005
                BCDEA   GE-V15  0.00002
                CBDAE   GE-V16  0.001
                BCDAE   GE-V17  0.0008
                BCAED   GE-V18  0.0000005
                BCADE   GE-V19  0.002
                EDBAC   GE-V20  0.000008
                EBDAC   GE-V21  0.000002
                BEDAC   GE-V22  0.0000005
                BDAEC   GE-V23  0.0000005
                EBADC   GE-V24  0.000004
                BEADC   GE-V25  0.0000005
                BEACD   GE-V26  0.0000005
                BAEDC   GE-V27  0.0000005
                BAECD   GE-V28  0.000005
                BACED   GE-V29  0.000003
                BACDE   GE-V30  0.003
KT-V2   0.005   EDCAB   GE-V31  0.0009
KT-V3   0.006   EDACB   GE-V32  0.001
KT-V4   0.0005  EDABC   GE-V33  0.00007
KT-V5   0.0002  EADCB   GE-V34  0.000007
KT-V6   0.007   EADBC   GE-V35  0.001
                EABDC   GE-V36  0.00002
KT-V7   0.0005  AEDBC   GE-V37  0.00009
KT-V8   0.008   AEBDC   GE-V38  0.001
KT-V9   0.0007  ABEDC   GE-V39  0.0002
                ABDCE   GE-V40  0.0000005
KT-V10  0.009   ABECD   GE-V41  0.001
KT-V11  0.01    ABCED   GE-V42  0.001
KT-V12  0.6     ABCDE   GE-V43  0.8

Note. For each pattern, we provide the corresponding ranking of five gambles (from best to worst), and the portion of the algebraic parameter space corresponding to each preference (based on a grid search), up to and rounded to the first significant nonzero digit. The precision and range of our grid search were as follows: CPT-KT: α, γ ∈ [0.01, 1], step size 0.01. CPT-GE: α, γ ∈ [0.001, 0.991], step size 0.01; s ∈ [0.01, 9.96], step size 0.05. When two gambles were assigned numerical values differing by less than 10^-20, no prediction was made (100 cases of Cash I and 101 cases of Cash II; this occurred in CPT-KT only).


In the lower part of Table 5, we report the 0.90-supermajority specification of CPT-KT and CPT-GE. The heuristic LH is now rejected on every data set. With a supermajority level of 0.90, both CPT-KT and CPT-GE fit on the exact same data sets, namely, Participants 3, 5, 8, 10, 11, and 14. The analysis in the lower half of the table strongly suggests that the inconsistent fits in the modal choice analysis that we found for Participants 6, 7, 9, 12, 13, 15, and 18 were examples of 'overfitting,' that is, 'accidental' fits of the models. When we are interested in modeling true preference perturbed by random error, we may want to limit error rates far below 50% to avoid 'overfitting.' (Note, however, that econometric models will oftentimes predict choice probabilities close to 1/2, notably whenever the strength of preference is negligible relative to the noise. In such models, choice proportions far from 50% would be reason for rejection.)

Table 5. Results for Modal (0.50-Majority) and 0.90-Supermajority Specifications of Cumulative Prospect Theory With "Kahneman-Tversky" or "Goldstein-Einhorn" Weighting Functions and the Lexicographic Heuristic LH for 18 Participants

Columns: Participant; Performance summary (KT, GE, LH); "Kahneman-Tversky" Cash I, Cash II; "Goldstein-Einhorn" Cash I, Cash II; LH Cash I, Cash II.

0.50-Majority/Modal Choice Specification
1    — — LH    v1 0.13    — 0.03     v1 0.13    — 0.03     0.18    0.18
2    KT GE —   v22        V12        v11        V43        0.00    0.00
3    KT GE —   v1         V1         v1         V1         0.00    0.00
4    — — —     — 0.01     V11,12     — 0.00     V42,43     0.00    0.00
5    KT GE —   v1         V1         v1         V1         0.00    0.00
6    ± ± —     v1 0.26    V8 0.40    v1 0.26    V38 0.40   0.21    0.01
7    ± ± —     v1         V2         v1         V31        0.00    0.01
8    KT GE —   v1         V1         v1         V1         0.00    0.00
9    ± ± —     v1 0.09    V10,11     v1 0.09    V41,42     0.00    0.00
10   KT GE —   v1         V1         v1         V1         0.00    0.00
11   KT GE —   v1         V1         v1         V1         0.00    0.00
12   ± — —     v13 0.26   V10 0.12   — 0.01     V26 0.17   0.00
13   ± ± LH    v11 0.67   V5 0.62    v1 0.67    V34 0.70   0.08    0.56
14   KT GE —   v22        V12        v11        V43        0.00    0.00
15   ± ± —     v1         V9,10      v1         V39,41     0.02    0.00
16   — — —     v1         — 0.01     v1         — 0.02     0.00    0.00
17   KT GE —   v22 0.31   V12        v11 0.31   V43        0.05    0.00
18   — ± LH    v1,2,3     — 0.01     v1,2,3     V4 0.23    0.4     0.47

0.90-Supermajority Specification
1    — — —     — 0.00     — 0.00     — 0.00     — 0.00     0.00    0.00
2    — — —     — 0.00     V12 0.51   — 0.00     V43 0.51   0.00    0.00
3    KT GE —   v1         V1 0.73    v1         V1 0.73    0.00    0.00
4    — — —     — 0.00     — 0.00     — 0.00     — 0.00     0.00    0.00
5    KT GE —   v1         V1 0.57    v1         V1 0.58    0.00    0.00
6    — — —     — 0.00     — 0.00     — 0.00     — 0.00     0.00    0.00
7    — — —     v1 0.81    — 0.00     v1 0.81    — 0.00     0.00    0.00
8    KT GE —   v1 0.95    V1 0.90    v1 0.95    V1 0.90    0.00    0.00
9    — — —     — 0.00     — 0.00     — 0.00     — 0.00     0.00    0.00
10   KT GE —   v1 0.72    V1 0.30    v1 0.72    V1 0.30    0.00    0.00
11   KT GE —   v1 0.95    V1         v1 0.95    V1         0.00    0.00
12   — — —     — 0.00     — 0.00     — 0.00     — 0.00     0.00    0.00
13   — — —     — 0.00     — 0.00     — 0.00     — 0.00     0.00    0.00
14   KT GE —   v22        V12        v11        V43        0.00    0.00
15   — — —     — 0.00     — 0.00     — 0.00     — 0.00     0.00    0.00
16   — — —     — 0.02     — 0.00     — 0.02     — 0.00     0.00    0.00
17   — — —     — 0.00     — 0.00     — 0.00     — 0.00     0.00    0.00
18   — — —     — 0.00     — 0.00     — 0.00     — 0.00     0.00    0.00

Note. Rejections are marked "—" with boldface p-values (rounded to the nearest percent). Perfect fits are check marks. Nonsignificant violations have their p-values listed (rounded to the nearest percent). When a theory fits consistently in Cash I and Cash II, it is in italics. If it fits both stimulus sets, but there is no weighting function that yields the best fitting vertex for both stimulus sets, then the fitting vertices are in a smaller font. In the Performance summary, "—" means rejected, KT means CPT-KT fits consistently, GE means CPT-GE fits consistently, and "±" means that the two fits are mutually inconsistent.

The lower half of Table 5 offers three important insights: First, both versions of Cumulative Prospect Theory are rejected for two thirds of the participants. Second, in those cases where either of these versions of Cumulative Prospect Theory fits a participant for one stimulus set, it does so also in the other stimulus set; hence, the successful fits are highly replicable. Third, the only predicted preference patterns that are not rejected are the ranking of the gambles by decreasing winning amount (abcde in Cash I and ABCDE in Cash II), and the ranking of the gambles by increasing probability of winning (edcba in Cash I and EDCBA in Cash II). These two rankings are also consistent with simple heuristics according to which decision makers either ignore probabilities or outcomes for lotteries like these. The Cash I and Cash II stimulus sets were not originally designed to be diagnostic for a full-fledged test of Cumulative Prospect Theory; hence, we leave it for future work to determine the performance of CPT-KT and CPT-GE more systematically.

While we leave a full formal and theoretical discussion of the relationship among different types of probabilistic choice models for a different paper, we have mentioned that many econometric models make predictions that are nested in the majority/modal choice specification. For example, "Logit," "Probit," and "Contextual Utility" models, as well as a broad range of related econometric models (Blavatskyy, 2007; Blavatskyy & Pogrebna, 2010; Stott, 2006; Wilcox, 2008, 2011), are all nested in the modal choice specification. So are the choice probabilities under "decision field theory" (Busemeyer & Townsend, 1992, 1993) in the case of deliberation with no initial bias. All of

these models imply that an option with higher utility has a probability greater than 1/2 of being chosen. If one were to apply any one of these probabilistic models to CPT-KT and CPT-GE, for example, then the resulting binary choice probabilities would lie inside the majority/modal choice model of CPT-KT and CPT-GE.

This is an example of how QTEST can serve as a screening device for the quantitatively savvy decision scientist: It follows from our analyses that these parametric probabilistic models, when applied to CPT-KT and CPT-GE, would probably also fit poorly for those five participants (1, 4, 12, 16, and 18) where we rejected the majority specification of CPT-KT and CPT-GE. Likewise, in those 7 cases where the modal choice specification fit inconsistently, many econometric models of CPT-KT and CPT-GE would probably yield different, hence mutually inconsistent, parameter estimates on the Cash I and Cash II stimuli as well. Majority/modal choice specifications in QTEST can serve as a triage for deciding whether or not it is worth applying one of these econometric models to a given set of data and for a given theory. Note that, unlike these econometric models, modal choice models do not require numerical strength of preference as input. For example, we have illustrated a modal choice specification of an intransitive model, LH, for which there is no "Logit" or "Probit" formulation.
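The nesting claim is easy to verify for, say, a "Logit" specification. The sketch below (with illustrative utility values) shows that Logit choice probabilities always fall on the side of 1/2 dictated by the utility order, and hence inside the majority/modal choice region of the corresponding vertex:

```python
import math

def logit_choice_prob(u_x, u_y, scale=1.0):
    """Logit probability of choosing x over y, given utilities u_x, u_y."""
    return 1.0 / (1.0 + math.exp(-scale * (u_x - u_y)))

# Whenever u_x > u_y, the Logit probability exceeds 1/2, so the predicted
# choice probability respects the modal choice specification:
for u_x, u_y in [(2.0, 1.0), (1.1, 1.0), (0.3, 0.9)]:
    p = logit_choice_prob(u_x, u_y)
    assert (p > 0.5) == (u_x > u_y)
print("Logit predictions respect the modal choice specification.")
```

Note what this sketch cannot do: LH assigns no numerical utilities, so there is no `u_x - u_y` to feed into such a formula, which is exactly why intransitive predictions like LH's have no "Logit" or "Probit" formulation.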

Table 6 shows the results of fitting Random CPT-KT and Random CPT-GE on the same data. There are two noteworthy findings: First, although the full linear order model was very successfully fit to these data by Regenwetter et al. (2011a), the more restrictive, nested, Random CPT-KT and Random CPT-GE models, in which only linear orders compatible with Cumulative Prospect Theory are allowed, are both rejected for two thirds of the participants. Second, for Participant 7, we find evidence in favor of Random CPT-GE and against Random CPT-KT, whereas for Participant 9, we find evidence in favor of Random CPT-KT and against Random CPT-GE. This documents that the quantitative analysis has the ability to let these theories compete. A more targeted experiment in the future could allow a stronger model competition. Full-fledged model competition and model selection, beyond mere rejections/retentions of

28 REGENWETTER ET AL.

Page 28: QTEST: Quantitative Testing of Theories of Binary Choice › pubs › journals › features › dec-0000007.pdf · (Stott, 2006), according to which a binary gamble with a P chance

Null Hypotheses, also requires a future exten-sion of QTEST, for example, to Bayesian analy-sis methods that naturally trade-off betweencompeting models based on their complexity(flexibility).

QTEST as a Triage Method

Figure 10 gives an overview of how QTEST operates as a triage method. The scholar first needs to determine all permissible preference patterns according to the theory or theories at hand.

• To model a decision maker who has a fixed preference or utility function perturbed by error (Aggregation- and Distance-based (Error) Models and Distance-based Models for Theories with Multiple Predictions sections), QTEST provides highly automated tools to generate and test distance-based specifications (left column of Figure 10). For example, weak stochastic transitivity (shown in the top panel of Figure 7) is the majority/modal choice specification of the collection of all linear orders. Regenwetter et al. (2010) previously ran a test of weak stochastic transitivity using the computer code underlying QTEST. Similarly, Figure 8 gives majority/modal choice and supermajority specifications of CPT-KT (orange) and LH (blue). For any theory of pairwise preference (that does not predict indifference among any of the stimuli under consideration), whether it involves highly specified numerical functional forms like CPT-KT and CPT-GE, or whether it is characterized by some general property like LH, QTEST only needs to know the permissible preference patterns to proceed. Scholars interested in a very general and abstract, say, 'nonparametric rank-dependent' theory (of which CPT-KT and CPT-GE are highly specialized refinements) can likewise use QTEST as long as they specify all permissible preference patterns according to such a theory. For example, the permissible preference states may be specified through a list of general "axioms" (rules defining the mathematical representation of preferences). When a theory predicts a strength of preference (similar to our Eq. 4), there automatically exist a large number of

Table 6
Results for Random Preference Models, Random CPT-KT and Random CPT-GE

                  Performance summary                   Random CPT-KT       Random CPT-GE
Participant    Random CPT-KT      Random CPT-GE        Cash I   Cash II    Cash I   Cash II
 1             —                  —                    0.00     0.00       0.00     0.01
 2             —                  —                    0.01     0.07       0.01     0.04
 3             Random CPT-KT ✓    Random CPT-GE ✓     0.39     0.09       0.49     0.95
 4             —                  —                    0.00     0.05       0.00     0.13
 5             Random CPT-KT ✓    Random CPT-GE ✓     0.44     0.38       0.11     ✓
 6             —                  —                    0.00     0.06       0.00     0.35
 7             —                  Random CPT-GE ✓     0.41     0.00       0.82     0.16
 8             Random CPT-KT ✓    Random CPT-GE ✓     0.39     0.07       0.05     ✓
 9             Random CPT-KT ✓    —                    0.10     0.12       0.04     0.15
10             —                  Random CPT-GE ✓     0.045    0.34       0.12     0.48
11             —                  —                    0.00     0.24       0.02     0.02
12             —                  —                    0.00     0.00       0.00     0.07
13             Random CPT-KT ✓    Random CPT-GE ✓     0.36     0.06       0.20     ✓
14             Random CPT-KT ✓    Random CPT-GE ✓     0.63     ✓          ✓        ✓
15             —                  —                    0.19     0.00       0.02     0.00
16             —                  —                    0.00     0.00       0.00     0.00
17             —                  —                    0.047    0.17       0.00     0.19
18             —                  Random CPT-GE ✓     0.15     0.01       0.08     0.05

Note. Rejections have boldface p-values (rounded to the nearest percent, except 0.045 and 0.047, which are rounded to the nearest permille). Perfect fits are checkmarks (✓). Nonsignificant violations have their p-values listed (rounded to the nearest percent). In the Performance summary, — means rejected, "Random CPT-KT ✓" means "Random CPT-KT fits consistently," and "Random CPT-GE ✓" means "Random CPT-GE fits consistently."

29 QUANTITATIVE TESTING OF THEORIES OF BINARY CHOICE


econometric specifications, but some theories, such as LH, are not compatible with an econometric specification because they provide no strength of preference input to such models. Because tremble and most econometric models (when they exist) are nested in the majority/modal choice specification, a rejection of the modal choice specification is a strong argument against applying any such nested error models to those data (lower left of Figure 10).11

• To model a decision maker who wavers in his preference or utility function (Random Preference and Random Utility Models section), QTEST provides a suitable test, provided that the user enters a complete mathematical description of the relevant polytope (right column in Figure 10). The Random CPT-KT polytope is illustrated in Figure 9. The polytope for all linear orders was previously tested by Regenwetter et al. (2011a), and some lexicographic semiorder polytopes were tested by Regenwetter et al. (2011b), on these same data. The linear ordering polytope contains many econometric models as special cases, but it generally does not contain tremble models; hence a rejection of the linear ordering polytope would imply rejection of many econometric models (lower right of Figure 10).
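For small stimulus sets, testing membership in the linear ordering polytope can be made concrete. The following sketch (Python; illustrative code, not part of QTEST) checks a vector of binary choice probabilities on three alternatives against the unit-interval bounds and the triangle inequalities, which characterize the linear ordering polytope for three alternatives:

```python
# Sketch (not QTEST code): membership test for the linear ordering
# polytope on three alternatives {a, b, c}. For three alternatives,
# the polytope is characterized by 0 <= theta <= 1 together with the
# triangle inequalities 0 <= theta_ab + theta_bc - theta_ac <= 1.

def in_linear_ordering_polytope(theta_ab, theta_bc, theta_ac, tol=1e-9):
    """Return True if the probabilities are representable as a mixture
    of the six linear orders on {a, b, c}."""
    if any(p < -tol or p > 1 + tol for p in (theta_ab, theta_bc, theta_ac)):
        return False
    t = theta_ab + theta_bc - theta_ac
    return -tol <= t <= 1 + tol

print(in_linear_ordering_polytope(0.9, 0.9, 0.9))  # True
print(in_linear_ordering_polytope(0.9, 0.9, 0.0))  # False: 0.9 + 0.9 - 0.0 > 1
```

The second point satisfies weak stochastic transitivity yet lies outside the polytope, illustrating that the two specifications of "all linear orders" are genuinely different.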

We leave a much more extensive classification of probabilistic models, as well as many new theoretical developments connecting naturally to QTEST, for future work.

11 A referee pointed out that an econometric model could be a lower-dimensional nested model of a modal choice specification. A rejection of the majority/modal choice specification still implies rejection of that nested econometric model, because the modal choice specification in such a case remains a generalization of the latter.

Figure 10. Summary graph. Note: "RDDS 11" is Regenwetter et al. (2011a), who tested the linear ordering polytope on these data. "RDDSG 11" is Regenwetter et al. (2011b), who tested some lexicographic semiorder polytopes on these data. Regenwetter et al. (2010) tested weak stochastic transitivity on these data.

Conclusions

QTEST provides a highly versatile, yet accessible, quantitative testing environment for preferential binary choice. We have discussed aggregation- and distance-based specifications of algebraic theories that encapsulate the notion that the decision maker has a fixed binary preference and makes occasional erroneous choices, with error rates being constrained in a variety of ways. This type of model makes it possible to develop probabilistic specifications of theories that are numerical or nonnumerical, and that allow a single preference pattern or multiple preference patterns. We also reviewed random preference models, including two new probabilistic formulations of Cumulative Prospect Theory: Random CPT-KT and Random CPT-GE. Last, but not least, we have shown an application of some QTEST analyses to previously published laboratory choice data. We illustrated how a simple lexicographic heuristic was rejected on (almost) every data set (even at the modal choice level). We provided tests of CPT-KT and CPT-GE and concluded for the supermajority specification that both versions of Cumulative Prospect Theory account for the exact same six participants, who acted in a fashion consistent with two very simple heuristics, namely to prefer gambles with higher amounts or to prefer gambles with higher probabilities of winning. We also documented how QTEST was 'diagnostic' between Random CPT-KT and Random CPT-GE in retaining one while rejecting the other as a Null Hypothesis. A full model selection framework, however, will require further refinements, such as Bayesian extensions. As we saw in Table 2, Random CPT-KT is even rejected on the data of the hypothetical decision maker HDM, whose data appeared to be nearly in perfect agreement with KT-V4.

Because the Regenwetter et al. stimuli were designed as a replication of Tversky (1969), to test transitivity, not to be diagnostic among competing theories, we leave it for follow-up work to carry out more direct tests and comparisons of decision making theories using QTEST. Likewise, work is under way to test theories in other domains, such as intertemporal choice and probabilistic inference, using the same modeling and analysis framework. Furthermore, Bayesian extensions and parallelized versions of QTEST for multicore computer systems are under development.

References

Birnbaum, M. H., & Bahra, J. (2012). Separating response variability from structural inconsistency to test models of risky decision making. Judgment and Decision Making, 7, 402–426.

Birnbaum, M. H., & Chavez, A. (1997). Tests of theories of decision making: Violations of branch independence and distribution independence. Organizational Behavior and Human Decision Processes, 71, 161–194.

Birnbaum, M. H., & Gutierrez, R. (2007). Testing for intransitivity of preferences predicted by a lexicographic semiorder. Organizational Behavior and Human Decision Processes, 104, 96–112.

Blavatskyy, P. R. (2007). Stochastic expected utility theory. Journal of Risk and Uncertainty, 34, 259–286.

Blavatskyy, P. R., & Pogrebna, G. (2010). Models of stochastic choice and decision theories: Why both are important for analyzing decisions. Journal of Applied Econometrics, 25, 963–986.

Busemeyer, J. R., & Townsend, J. T. (1992). Fundamental derivations from decision field theory. Mathematical Social Sciences, 23, 255–282.

Busemeyer, J. R., & Townsend, J. T. (1993). Decision field theory: A dynamic-cognitive approach to decision making in an uncertain environment. Psychological Review, 100, 432–459.

Camerer, C. F. (1989). An experimental test of several generalized utility theories. Journal of Risk and Uncertainty, 2, 61–104.

Davis-Stober, C. P. (2009). Analysis of multinomial models under inequality constraints: Applications to measurement theory. Journal of Mathematical Psychology, 53, 1–13.

Davis-Stober, C. P., & Brown, N. (2011). A shift in strategy or "error"? Strategy classification over multiple stochastic specifications. Judgment and Decision Making, 6, 800–813.

Harless, D. W., & Camerer, C. F. (1994). The predictive utility of generalized expected utility theories. Econometrica, 62, 1251–1289.

Hausman, J., & McFadden, D. (1984). Specification tests for the multinomial logit model. Econometrica, 52, 1219–1240.

Hey, J. D. (1995). Experimental investigations of errors in decision making under risk. European Economic Review, 39, 633–640.

Hey, J. D. (2005). Why we should not be silent about noise. Experimental Economics, 8, 325–345.

Hey, J. D., & Orme, C. (1994). Investigating generalizations of expected utility theory using experimental data. Econometrica, 62, 1291–1326.

Iverson, G. J., & Falmagne, J.-C. (1985). Statistical issues in measurement. Mathematical Social Sciences, 10, 131–153.

Klugkist, I., & Hoijtink, H. (2007). The Bayes factor for inequality and about equality constrained models. Computational Statistics & Data Analysis, 51, 6367–6379.

Loomes, G., Moffatt, P. G., & Sugden, R. (2002). A microeconometric test of alternative stochastic theories of risky choice. Journal of Risk and Uncertainty, 24, 103–130.

Loomes, G., & Sugden, R. (1995). Incorporating a stochastic element into decision theories. European Economic Review, 39, 641–648.

Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. New York, NY: Wiley.

Luce, R. D. (1995). Four tensions concerning mathematical modeling in psychology. Annual Review of Psychology, 46, 1–26.

Luce, R. D. (1997). Several unresolved conceptual problems of mathematical psychology. Journal of Mathematical Psychology, 41, 79–87.

Myung, J., Karabatsos, G., & Iverson, G. (2005). A Bayesian approach to testing decision making axioms. Journal of Mathematical Psychology, 49, 205–225.

Regenwetter, M., Dana, J., & Davis-Stober, C. P. (2010). Testing transitivity of preferences on two-alternative forced choice data. Frontiers in Quantitative Psychology and Measurement, 1, 148.

Regenwetter, M., Dana, J., & Davis-Stober, C. P. (2011a). Transitivity of preferences. Psychological Review, 118, 42–56.

Regenwetter, M., Dana, J., Davis-Stober, C. P., & Guo, Y. (2011b). Parsimonious testing of transitive or intransitive preferences: Reply to Birnbaum (2011). Psychological Review, 118, 684–688.

Regenwetter, M., & Davis-Stober, C. P. (2012). Choice variability versus structural inconsistency of preferences. Psychological Review, 119, 408–416.

Starmer, C. (2000). Developments in non-expected utility theory: The hunt for a descriptive theory of choice under risk. Journal of Economic Literature, 38, 332–382.

Starmer, C., & Sugden, R. (1989). Probability and juxtaposition effects: An experimental investigation of the common ratio effect. Journal of Risk and Uncertainty, 2, 159–178.

Stott, H. (2006). Cumulative prospect theory's functional menagerie. Journal of Risk and Uncertainty, 32, 101–130.

Tversky, A. (1969). Intransitivity of preferences. Psychological Review, 76, 31–48.

Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5, 297–323.

Vuong, Q. H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57, 307–333.

Wilcox, N. (2008). Stochastic models for binary discrete choice under risk: A critical primer and econometric comparison. In J. Cox & G. Harrison (Eds.), Risk aversion in experiments (Research in Experimental Economics, Vol. 12, pp. 197–292). Bingley, UK: Emerald.

Wilcox, N. (2011). Stochastically more risk averse: A contextual theory of stochastic discrete choice under risk. Journal of Econometrics, 162, 89–104.

Yellott, J. I., Jr. (1977). The relationship between Luce's choice axiom, Thurstone's theory of comparative judgement, and the double exponential distribution. Journal of Mathematical Psychology, 15, 109–144.


Appendix A

Cash I Lotteries in Regenwetter et al. (2010, 2011a, 2011b)

Gamble a: 7/24 chance of gaining $28, gain or lose nothing otherwise.

Gamble b: 8/24 chance of gaining $26.60, gain or lose nothing otherwise.

Gamble c: 9/24 chance of gaining $25.20, gain or lose nothing otherwise.

Gamble d: 10/24 chance of gaining $23.80, gain or lose nothing otherwise.

Gamble e: 11/24 chance of gaining $22.40, gain or lose nothing otherwise.

For ease of notation we use small letters for Cash I and capital letters for Cash II (see Table 2).

Appendix B

Probabilistic Specification

We introduce minimal mathematical notation to be concise. For a (deterministic) decision theory T, and for each pair of choice alternatives f, g, write

$$\theta^*_{fg}(T) = \begin{cases} 1 & \text{if } f \text{ is strictly preferred to } g \text{ according to } T,\\ 0 & \text{if } f \text{ is not strictly preferred to } g \text{ according to } T. \end{cases} \qquad (7)$$

For KT-V4 of Table 2, we have

$$\theta^*_{AB}(T) = \theta^*_{AC}(T) = \theta^*_{BC}(T) = 1,$$
$$\theta^*_{AD}(T) = \theta^*_{AE}(T) = \theta^*_{BD}(T) = \theta^*_{BE}(T) = \theta^*_{CD}(T) = \theta^*_{CE}(T) = \theta^*_{DE}(T) = 0,$$

and thus (we also have the redundant information)

$$\theta^*_{BA}(T) = \theta^*_{CA}(T) = \theta^*_{CB}(T) = 0,$$
$$\theta^*_{DA}(T) = \theta^*_{EA}(T) = \theta^*_{DB}(T) = \theta^*_{EB}(T) = \theta^*_{DC}(T) = \theta^*_{EC}(T) = \theta^*_{ED}(T) = 1.$$

We call the vector $V_T = \big(\theta^*_{fg}(T)\big)_{f \neq g}$ the vertex representation of T. Leaving out the redundant coordinates above, the vertex representation of KT-V4 yields the following "vertex" of the unit "hypercube" in 10-space:

$$V_{\mathrm{KT\text{-}V4}} = (1, 1, 0, 0, 1, 0, 0, 0, 0, 0). \qquad (8)$$

Our first step in probabilizing theories about binary choice is to replace each coordinate $\theta^*_{fg}$ by a parameter $\theta_{fg} \in [0, 1]$ of a Bernoulli process. The Bernoulli process could model the random selection of a respondent, in which case the parameter $\theta_{fg}$ denotes the probability that such a respondent chooses f over g. Alternatively, the Bernoulli process could model a fixed respondent's selection of a choice alternative in a randomly sampled observation. In that case, $\theta_{fg}$ denotes the probability that the respondent chooses f over g in such an observation. With multiple paired comparisons, under certain iid sampling assumptions, the $\theta_{fg}$ form the parameters of a product of binomial distributions. Throughout, we assume a two-alternative forced choice paradigm in which each $\theta_{fg} = 1 - \theta_{gf}$ (and, for consistency, $\theta^*_{fg} = 1 - \theta^*_{gf}$).
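Under these iid assumptions, the likelihood of a full paired-comparison data set is a product of binomials and can be written down directly. A minimal sketch (Python; `log_likelihood` and its data layout are illustrative, not QTEST's API):

```python
from math import comb, log

def log_likelihood(counts, theta):
    """Log-likelihood of paired-comparison data under independent
    binomials: counts maps an ordered pair (f, g) to (n_fg, N_fg),
    the number of times f was chosen over g out of N_fg trials;
    theta maps (f, g) to the Bernoulli probability of choosing f
    over g (assumed strictly between 0 and 1)."""
    ll = 0.0
    for pair, (n, N) in counts.items():
        p = theta[pair]
        ll += log(comb(N, n)) + n * log(p) + (N - n) * log(1 - p)
    return ll

data = {("a", "b"): (15, 20), ("b", "c"): (9, 20)}
print(log_likelihood(data, {("a", "b"): 0.75, ("b", "c"): 0.45}))
```

The unconstrained likelihood is maximized at $\theta_{fg} = n_{fg}/N_{fg}$; a test in the spirit of QTEST instead maximizes it subject to the region (a distance ball or a polytope) that the theory specifies.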



Taking $\theta_{fg}(T) = \theta^*_{fg}(T)$, the vertex representation embeds the deterministic theories as extreme points in a probability space. To paraphrase: "f is strictly preferred to g in theory T if and only if f is chosen over g with probability one in T." The purpose of probabilistic specifications is to extend the range of choice probabilities to values between zero and one. Our various probabilistic specifications achieve this goal by expanding the vertex representations into different types of geometric regions within the probability space.
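The vertex representation of any linear order can be generated mechanically. A sketch (Python; illustrative, not QTEST code), assuming the ten nonredundant coordinates are listed lexicographically as (AB, AC, AD, AE, BC, BD, BE, CD, CE, DE):

```python
# Sketch (not QTEST code): vertex representation of a linear order
# over {A, B, C, D, E}, with coordinates in the lexicographic order
# (AB, AC, AD, AE, BC, BD, BE, CD, CE, DE).

from itertools import combinations

def vertex(order):
    """order lists the alternatives from most to least preferred."""
    rank = {x: i for i, x in enumerate(order)}
    return tuple(1 if rank[f] < rank[g] else 0
                 for f, g in combinations("ABCDE", 2))

# Under this coordinate convention, the preference order E > D > A > B > C
# reproduces the KT-V4 vertex of Eq. 8:
print(vertex("EDABC"))  # (1, 1, 0, 0, 1, 0, 0, 0, 0, 0)
```

Reading Eq. 8 backwards in the same way recovers the preference pattern from the vertex.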

Probabilistic Specification by Majority and Supermajority Rules

Let $\lambda \in [\frac{1}{2}, 1)$ be a supermajority level. The supermajority specification of a deterministic theory T states a system of inequality constraints on the binary choice probabilities $\theta_{fg}(T)$, according to which, for all $f \neq g$,

$$f \text{ is strictly preferred to } g \text{ according to } T \;\Leftrightarrow\; \theta_{fg}(T) \geq \lambda. \qquad (9)$$

When $\lambda = \frac{1}{2}$, this is just a formal representation of the requirement that the modal pairwise choices in the population must match the binary preferences of theory T. The modal choice specification was illustrated in 3D in Figure 3, and the supermajority specification was illustrated in 3D in Figure 4 with $\lambda = .75$.
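The supermajority check of Eq. 9 reduces to comparing each choice probability for a theory-preferred pair against the level $\lambda$. A sketch (Python; names are illustrative, not QTEST's API):

```python
# Sketch (not QTEST code): the (super)majority specification of Eq. 9.
# theta maps an ordered pair (f, g) to the probability of choosing f
# over g; preferred lists the pairs (f, g) with f strictly preferred
# to g under the theory. lam = 0.5 gives the modal choice specification.

def satisfies_supermajority(theta, preferred, lam=0.5):
    return all(theta[(f, g)] >= lam for (f, g) in preferred)

theta = {("a", "b"): 0.8, ("a", "c"): 0.6}
pattern = [("a", "b"), ("a", "c")]
print(satisfies_supermajority(theta, pattern, lam=0.5))   # True
print(satisfies_supermajority(theta, pattern, lam=0.75))  # False: 0.6 < 0.75
```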

Distance-Based Specification

Let $\delta$ be a distance measure (in the appropriate space). Let $U \geq 0$ be an upper bound on the permissible distance between choice probabilities and vertex representation. A distance-based probabilistic specification of a deterministic theory T, with distance $\delta$ and upper bound U, states that the vector $\theta(T)$ of binary choice probabilities that are allowable under T must satisfy

$$\delta\big(\theta(T), V_T\big) \leq U. \qquad (10)$$

Three examples of $\delta$ are as follows (using nonredundant choice probabilities):

Supremum distance:
$$\delta_\infty\big(\theta(T), V_T\big) = \max_{f \neq g} \big|\theta_{fg}(T) - \theta^*_{fg}(T)\big|, \qquad (11)$$

City-block distance:
$$\delta_1\big(\theta(T), V_T\big) = \sum_{f \neq g} \big|\theta_{fg}(T) - \theta^*_{fg}(T)\big|, \qquad (12)$$

Euclidean distance:
$$\delta_2\big(\theta(T), V_T\big) = \sqrt{\sum_{f \neq g} \big|\theta_{fg}(T) - \theta^*_{fg}(T)\big|^2}. \qquad (13)$$

The supremum-distance specification can be reformulated as a supermajority specification with $\lambda = 1 - U$. Figures 2 and 3 hence gave an illustration of a distance-based specification with an upper bound $U = 0.5$ on the supremum distance. Figure 4 gives an illustration of a distance-based specification with an upper bound $U = 0.25$ on the supremum distance. The city-block specification is illustrated in Figure 5 with $U = 0.50$.
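The three distances of Eqs. 11–13, and the correspondence between a supremum-distance bound $U$ and the supermajority level $\lambda = 1 - U$, can be sketched as follows (Python; illustrative, not QTEST code):

```python
# Sketch (not QTEST code): supremum, city-block, and Euclidean distances
# between a vector of choice probabilities and a vertex representation.

def distances(theta, vertex):
    diffs = [abs(t - v) for t, v in zip(theta, vertex)]
    return max(diffs), sum(diffs), sum(d * d for d in diffs) ** 0.5

theta = (0.8, 0.7, 0.4)
vertex = (1, 1, 0)
sup, city, euclid = distances(theta, vertex)
print(round(sup, 3), round(city, 3), round(euclid, 3))  # 0.4 0.9 0.539

# A supremum-distance bound U requires theta >= 1 - U on preferred pairs
# (vertex coordinate 1) and theta <= U otherwise, i.e., a supermajority
# specification with lam = 1 - U. Here with U = 0.5 (lam = 0.5, modal):
U = 0.5
print(sup <= U)                                          # True
print(all(t >= 1 - U if v == 1 else t <= U
          for t, v in zip(theta, vertex)))               # True
```

Tightening to $U = 0.25$ (i.e., $\lambda = 0.75$) makes both checks fail for this point, mirroring how Figure 4's region is stricter than that of Figures 2 and 3.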

Received March 1, 2013
Revision received October 1, 2013
Accepted October 22, 2013

