Artificial Intelligence Marketing and Predicting Consumer Choice

Praise for Artificial Intelligence Marketing and Predicting Consumer Choice

‘Full of hard-won practical wisdom, this is a comprehensive guide to navigating the complexity of market forecasting. Foregoing the hyperbole that so often characterizes discussions of artificial intelligence, Dr Struhl thoroughly explains a wide range of methods, where their difficulties lie and how to get the best insights from each.’ Peter Goldstein, Software Engineer, Google

‘Dr Struhl’s new book is a rare jewel among marketing science tomes – informative, easy to understand and, dare I say, even entertaining. Dr Struhl surveys several major analytic techniques in plain English, introducing the novice to foundational concepts while at the same time reminding the seasoned analyst of best practices often forgotten, all while sprinkling his wry humour like a spoonful of sugar to help the medicine go down. For techniques already familiar, it’s an enjoyable refresher; for techniques unfamiliar, an excellent introduction. A valuable resource for beginner and expert alike.’ Dr Richard McCullough, President, Macro Consulting Inc

‘This book covers lucidly a number of research methodologies that commonly support very important new product development and marketing strategy decisions. Dr Struhl should be commended for making the materials accessible to a wide range of audiences by emphasizing the practicality, appropriateness, and pros and cons of the various methodologies.’ Jehoshua Eliashberg, Sebastian S Kresge Professor of Marketing, and Professor of Operations, Information and Decisions, The Wharton School

‘Dr Struhl has written another highly informative book. It offers an easy-to-understand way of thinking about how to best use data to answer bigger marketing questions. His explanations are clear and relatable, making this book an invaluable tool for anyone involved in commercial decision making, especially marketers and researchers.’ Katie Szelc, Manager, Customer Insights, Global Business Insights, Johnson & Johnson Medical Devices

‘An excellent all-in-one primer for today’s marketer and researcher. This is clear, to the point and a comprehensive guide to this complex field.’ Louis A Tucci PhD, Associate Professor of Marketing, The College of New Jersey

‘Dr Struhl does an excellent job of explaining the strengths and weaknesses of methods of predicting consumer behaviour. This book is thoughtful, well-written, and also a practical book for marketers, marketing researchers and business consultants. If you help organizations make decisions, the best decision you can make right now is to read this book.’ David F Harris, author of The Complete Guide to Writing Questionnaires: How to get better information for better decisions

‘Artificial Intelligence Marketing and Predicting Consumer Choice clearly explains the tools that drive sophisticated market research. I heartily recommend this book for anyone looking for greater insight and success with cutting-edge techniques.’ Robert Kaminsky, President, MedSpan Research

‘For researchers who want to tackle the complex tasks of predicting consumer choices and creating market simulations, this book is a great one-stop reference. In easy-to-read style with plenty of useful examples, the author covers conventional multivariate data analysis techniques (conjoint, discrete choice, CHAID, regression) as well as the latest ones (Hierarchical Bayesian analysis). The book also includes many key concepts and definitions useful for any quantitative researcher, such as statistical significance, sampling, and more.’ Kathryn Korostoff, Lead Instructor and Founder, Research Rockstar LLC

‘This book covers an extensive set of methods for predicting consumer choices, including conjoint analysis, discrete choice modelling, neural networks, classification trees, Bayesian methods, and so much more. Dr Struhl has written a genuinely practical guide to predictive analytics that is so easy, it reads like a bedtime reader. Having been a practitioner in this area for over 20 years, I found this book to not only be informative for a pertinent 21st century topic, but also a fun read.’ Don Meyer, Client Director, Analytics, ACNielsen

‘I’ve been working with Dr Struhl for the past two and a half years and am truly impressed by his expertise. His book covers a truly expansive range of methods for predicting consumer choices. These include neural networks, ensembles, Bayesian Networks and classification trees. He also talks about how more established methods such as conjoint analysis and discrete choice modelling have benefited from machine learning methods. Here you will find the best applications of each approach, with plenty of examples from real life, showing what works and what does not. The online, downloadable simulator programs are incredibly impressive and show you the amazing things that can be done in the realm of predictions.’ Sung Lee, President, The Research Associates

‘This book covers a truly expansive range of methods for predicting consumer choices. Some of these include conjoint analysis, discrete choice modelling, neural networks, classification trees, and Bayesian networks. With practical tips and examples, and a welcome use of humour, this is a clear, easy-to-read and definitive guide for experts and novices alike.’ Paul Nisbet PhD, President, One Research

‘Steven Struhl’s decades of experience as an analytics guru in real-world marketing applications shine through in this highly readable guide to modern analyses and models of consumer choices. In engaging and entertaining style, brimming with practical examples, without abstruse theory or sales hype, his book is a down-to-earth and much-needed guide that clears up the mysteries of these methods.’ Dave Lyon, Principal, Aurora Market Modeling

‘Dr Struhl has done it again! He’s taken a cutting-edge topic and grounded it in very accessible prose, using real-life situations so that marketing and market research practitioners can immediately act upon Artificial Intelligence and its ability to predict consumer choice. I strongly recommend this book for anyone wanting to better understand how to use the growing presence of Artificial Intelligence and machine learning in our day-to-day responsibilities.’ Darrin Helsel, Past Research Chair, American Marketing Association, Portland

‘Dr Struhl’s latest book, Artificial Intelligence Marketing and Predicting Consumer Choice, provides concrete and easy-to-understand information about a set of analyses that can be intimidating for researchers and clients alike. His examples are clear and applicable to the concepts being discussed and provide excellent insights into how these valuable analytical techniques can be used to answer real-life business questions. In addition, Dr Struhl also writes with a sense of humour which helps to make readers comfortable and the material even more readily understandable. This book is an excellent resource for both market research suppliers and clients!’ Julie Worwa MBA, Research and Marketing Consultant

‘Steven Struhl has a gift for taking complex methodological issues and explaining them in accessible, meaningful language. His writing is thought-provoking and entertaining. Artificial Intelligence Marketing and Predicting Consumer Choice tackles many of the contemporary challenges of turning an abundance of data into crucial insights. It is a must-read for anyone who is exploring the use of artificial intelligence methods to inform marketing tactics and strategies.’ Larry Durkin, Senior Consultant, MSP Analytics

Artificial Intelligence Marketing and Predicting Consumer Choice

An overview of tools and techniques

Steven Struhl

CONTENTS

Cover
Title Page
Copyright
Contents
List of Figures
List of Tables
Preface

01 Who should read this book and why?
What we cover in this book
What can you expect in this book?
Data versus information
What is important?
The methods we will be discussing
Implicit views of people and biases
One way of comparing these methods
Sense and sensibility with predictions
Where we will not be going
Summary of key points

02 Getting the project going
At the beginning
Know who you are talking about or talking to
What is the most you can expect from each method?
How do you judge the result?
What is significant?
On to correlations
How do I plan to evaluate the results?
Know what sensible goals might look like
Summary of key points

03 Conjoint, discrete choice and other trade-offs: let’s do an experiment
The reasons we need these methods
The basic thinking behind the experimentally designed methods
What the methods ask – and get
What is a designed experiment?
The great measurement power of experiments
Getting more from experiments: HB to the rescue
A brief talk about origins
Applications in brief
Summary of key points

04 Creating the best, newest thing: discrete choice modelling
Key features
Thinking through and setting up the problem
How many people you need
Utility and share
Market simulations
Making more than one choice: allocating purchases
Using the simulator program in the online resources
Rounding out the picture
Summary of key points

05 Conjoint analysis and its uses
Thinking in conjoint versus thinking in choices
Conjoint analysis for single-product optimization
Using the single product simulator in the online resources
Conjoint remains an excellent method for messages
Conjoint analysis for the best service delivery
Using the message optimization simulator in the online resources
Conjoint analysis and interactions
Variants of conjoint analysis
Summary of key points

06 Predictive models: via classifications that grow on trees
Classification trees: understanding an amazing analytical method
Seeing how trees work, step by step
Strong, yet weak
A case study: let’s take a cruise
CHAID and CART (and CRT, C&RT, QUEST, J48 and others)
Summary: applications and cautions

07 Remarkable predictive models with Bayes Nets
What are Bayes Nets and how do they compare with other methods?
Let’s make a deal
Our first example: Bayes Nets linking survey questions and behaviour
Bayes Nets confirm a theoretical model, mostly
What is important to buyers of children’s apparel
Summary and conclusions

08 Putting it together: what to use when
The tasks the methods do
Thinking about thinking

Bibliography
Index
Back cover

List of Figures

FIGURE 1.1 One way to categorize the methods we will discuss
FIGURE 2.1 Getting the frame right is critical
FIGURE 2.2 Sample percentages’ errors at different sample sizes
FIGURE 2.3 Not everything falls into a straight line
FIGURE 2.4 Correct classification table – 63 per cent correct overall
FIGURE 3.1 Rating the features of a floor-standing wine cooler
FIGURE 3.2 A sample marketplace scenario for discrete choice modelling
FIGURE 3.3 A profile of a service for conjoint analysis
FIGURE 3.4 A simple simulator for one product
FIGURE 3.5 A survey task for maximum difference scaling (MaxDiff)
FIGURE 3.6 A survey task for a Q-Sort/Case 5
FIGURE 3.7 A small and wrong way to measure and a larger correct way
FIGURE 3.8 Elements of a designed experiment
FIGURE 3.9 Showing that there are no correlations among attributes
FIGURE 3.10 Our guest fictional study participant ranks cars on two features
FIGURE 3.11 A win-loss matrix
FIGURE 4.1 A portion of a market simulator
FIGURE 4.2 Discrete choice treats brand as like a container for features
FIGURE 4.3 A self-effects curve for share versus price
FIGURE 4.4 Different responses of share to price changes
FIGURE 4.5 How values change as HB analysis runs
FIGURE 4.6 Errors in discrete choice versus many other methods
FIGURE 4.7 Comparing the S-shaped response curve to prospect theory
FIGURE 4.8 The current case side of the simulator
FIGURE 4.9 The reference case side of the simulator
FIGURE 4.10 Base case and two simulations showing the danger of a price war
FIGURE 4.11 Changes in revenues from the simulations underline the dangers
FIGURE 4.12 Results from three more simulations
FIGURE 4.13 Page display controls in the downloadable simulator
FIGURE 4.14 The left side of the downloadable simulator
FIGURE 4.15 The right side of the downloadable simulator
FIGURE 5.1 Attributes in discrete choice versus attributes in conjoint
FIGURE 5.2 Differences in how discrete choice and conjoint treat utility
FIGURE 5.3 Sample basic one-product conjoint simulator
FIGURE 5.4 Details of how utility becomes share of preference
FIGURE 5.5 Elements varied in the direct mail offering
FIGURE 5.6 Responses to the elements of the direct mail offer
FIGURE 5.7 Alternatives to be tested in the print advertisement
FIGURE 5.8 One of the ads tested
FIGURE 5.9 The simulator used for optimizing the ad
FIGURE 5.10 Elements on a web page varied for testing
FIGURE 5.11 Effects of changing elements of the web page
FIGURE 5.12 A service profile for conjoint-based optimization
FIGURE 5.13 Priorities’ place on service areas from the analysis
FIGURE 5.14 Effects from changing the frequency of the newsletter
FIGURE 5.15 How frequency of sampling is overdelivered and underdelivered
FIGURE 5.16 The downloadable advertising optimization simulator
FIGURE 5.17 Average response to changes in price versus each brand’s changes
FIGURE 5.18 An example of CBC-style conjoint analysis
FIGURE 6.1 The database ‘population’
FIGURE 6.2 The tree’s first split
FIGURE 6.3 Resplitting the first subgroup
FIGURE 6.4 Splitting of groups shown as a classification tree
FIGURE 6.5 Importances from random forests
FIGURE 6.6 The elements in a tree diagram
FIGURE 6.7 The first split, based on upgrades
FIGURE 6.8 The first section of the tree
FIGURE 6.9 Splitting variables for those upgrading once
FIGURE 6.10 Mostly the best prospects, those upgrading more than once
FIGURE 6.11 The whole classification tree
FIGURE 6.12 The gains chart simplified
FIGURE 6.13 Decision stump linking scores and share
FIGURE 6.14 Output from boosting
FIGURE 7.1 A regression line showing the best prediction
FIGURE 7.2 A split from a classification tree
FIGURE 7.3 Relationships in networks
FIGURE 7.4 The network for the three-door problem
FIGURE 7.5 Inside the first two nodes of the network
FIGURE 7.6 Part of the third node – when you choose Door 1
FIGURE 7.7 All of the third node
FIGURE 7.8 How likely the prize is to hide behind each door
FIGURE 7.9 S-shaped relationship found by a Bayes Net, but not a regression
FIGURE 7.10 A simple network linking share and ratings
FIGURE 7.11 Effects relative to the target and indexed versus the average
FIGURE 7.12 A classic hierarchical model
FIGURE 7.13 Bayes Nets discover a hierarchy
FIGURE 7.14 Importances for 46 (disguised) attributes
FIGURE 7.15 Importance of trust and fashion by age of child

List of Tables

TABLE 4.1 Setting up contingent pricing
TABLE 6.1 The first four classification rules
TABLE 6.2 Gains chart showing the five groups with the highest index values

Bonus online-only materials are available at the following URL:

www.koganpage.com/AI-Marketing

Bonus online-only Chapter 1: Finishing experiments and on to the non-experimental world

Bonus online-only Chapter 2: Artificial intelligence, ensembles and neural nets

Online-only Simulators: three different simulators in Excel format, one in PowerPoint format and one in PDF (Adobe Acrobat) format, allowing you to interactively optimize products and messages. (You will need Flash Player installed on your computer for the PowerPoint format and PDF (Adobe Acrobat) format simulators to work.)


PREFACE

News and articles on artificial intelligence seem to be everywhere. At least they do if you are writing a book with the words ‘artificial intelligence’ in the title. But what is artificial intelligence? Critical as this field is, it appears that there is no clear definition.

A reporter went to Alphabet (formerly Google), the epicentre of artificial intelligence, and asked people working there for an explanation. Here are some of the answers:

‘I would definitely interview someone else.’

‘No thanks. Sorry. Good luck.’

‘I don’t know. I’ll pass.’

‘It’s machine learning.’

‘I work at Yahoo…’

Still, this topic is vitally important for you and for answering the increasingly difficult questions you are likely to encounter. This book will give you the practical information and pointers on applications that you need to know to succeed in your future campaigns.

In Chapter 1, we propose a working definition. There is no question that artificial intelligence and machine learning, if not in fact the same, overlap substantially. This of course raises the question of what machine learning means.

This definition also varies depending on who you ask, just like asking about the height of PT Barnum’s elephant, Jumbo. (Jumbo was twelve foot six inches high if you asked Mr Barnum, and ten foot nine if you asked someone with a tape measure.)

An online article purporting to teach about machine learning included regression and clustering among advanced machine learning methods. These are two of the most august and long-standing of analytical approaches. Regression was widely used well before computers existed.

Even taking a less expansive definition, machine learning has been with us for decades. It has been working in the background as often as in the foreground, solving problems that would have been impossible to approach without it. This is worth noting separately.

Hiding in plain sight

We have been using artificial intelligence and machine learning methods for years. They have been an essential part of many methods used to predict consumer choices. We will be laying out some key applications and how they relate to artificial intelligence throughout this book.

Now that we have established that, or at least said it, we should give you some basic information so you know if you are about to sit down in the movie you wanted to see. We would like to give you some sense of who this author is anyhow, who is asking you to soldier through territories that are fraught with complexity, and often described with obscurity and obfuscation.

If you are peeking inside an online preview, or (rarity of rarities) looking at this in a bookstore, this could be your spot to decide if you want to continue to the next glance. If you looked here by mistake, then now could be the time to realize this was a lucky stroke after all.

This book aims to cut through much of the murky language, the jargon, recondite formulations, and even utter nonsense surrounding this field. We will have to go over vocabulary. But we will do this only so you will be prepared when you encounter such terms as orthogonal design and know that they are nothing to fear.

We will steer around equations, subscripted notation and Greek letters whenever possible. If you were hoping to see all of those, then this could be your cue to investigate elsewhere.

This book is not a guide to the types of services that vendors supply. Rather, we aim to describe enough about each method so that you get an idea of how it works – and, more importantly, how you best apply it. Reading this book will help you to deal with sellers as they approach you with their newest, latest things. The information you get here will enable you to evaluate their claims in an informed and suitably inquiring way.

Looking at other prefaces, this seems to be the place where I take a turn to the first person, and tell you a little about myself. You may be questioning how and why I came to write a book like this in the first place. Here’s what I can come up with. I have been working in applying data analysis to practical problems for over 30 years. My clients have included many Fortune 100 companies, but also a host of mid-size and smaller entities, along with charitable, educational and non-profit organizations. I run a consulting company, Converge Analytic, specializing in advanced analytics.

I have written over 25 articles and two other books, one of which has been in print for over 20 years. You can see both on Amazon (and even buy them, not that I am hinting at anything). You can even order the newer one, Practical Text Analytics, directly from the Kogan Page website.

I have taught advanced statistics to bored graduate students who had to take it to get their degrees, given numerous other courses and seminars, and continue to teach certification courses online.

About my own education, I started as an undergraduate in the sciences. Beyond this, I have an MBA (University of Chicago), a doctorate in psychology (Chicago School of Professional Psychology) and an MA in language and linguistics (Boston University). This combination does at least seem congruous with the topic of the book. It also gives rise to the question of why, when I was younger, I didn’t just get a job.

Concerning the methods that we will review, I have been working with most for at least 20 years and some for over 30. Everything I write about has worked in real applications for at least several years. Therefore, everything you see here will have a solid track record in real applications.

Many thanks are in order here. We should start with Jenny Volich and Anna Moss at Kogan Page for making this book possible. My particular thanks to Charlotte Owen and Rajveer Ro’isin Singh for their helpful editing and encouraging comments.

I also would like to give a heartfelt ‘thank you’ to my wife Debra, for again tolerating late nights and working weekends. Especially, my thanks to her for stepping far outside her field and acting as a kind of royal food taster for you, the readers. She tried out various sections of the book to find out whether they were particularly indigestible.

Two quotes about expertise help define the aims of this book. One often attributed to Edward de Bono says, ‘An expert is someone who has succeeded in making decisions and judgements simpler through knowing what to pay attention to and what to ignore.’ Another, reportedly said by Niels Bohr, defines an expert as a person ‘who has found out by his own painful experience all the mistakes that one can make in a very narrow field’. To the extent that this book can make decisions simpler and help you to avoid some harrowing slip-ups, it will have served its purpose.

To my wife Debra and my mother Lydia

01

Who should read this book and why?

This chapter explains who should read this book: practitioners who need to deal with predicting consumer choices, and students who want to understand what is involved. It discusses the issues involved in trying to predict these choices. Here we also explain what makes up data itself and what makes up information, introduce the types of analyses and their natures, and outline what you can expect to get once all the work has been done.

In this chapter, we also introduce a critical guideline we seek to follow throughout our discussion, as delineated by Lewis Carroll: ‘Begin at the beginning … and go on till you come to the end: then stop.’ We also need first to consider exactly what we want to analyse, what we hope to get from it and why. This chapter starts us on this process.

What we cover in this book

We give you practical pointers and successful strategies to deal with many situations involving the prediction of consumer choices. Thinking about the following should clarify where we are going. If faced with any of the following, what would you do?

✔ Making the best possible new device or service, including the best features and setting the best price.

✔ Understanding how much to charge for the fine product or service you already offer.

✔ Countering a competitor coming into the marketplace with a new offering by revamping your product or your claims.

✔ Fine tuning your product offerings in the best way.

✔ Getting the best message conveying the benefits of your fine product or service.

✔ Or perhaps just selling much, much more of the goods or services you have.

Which of these strategies would you follow? Would you…

Ask people directly what they really want?
Look at what people have been choosing and try to figure out what they would buy next?
Try to build a model that explains self-reported attitudes and behaviours in relation to buying or buying intent?
Do an experiment, like a scientific experiment, to get to an answer?
Put your trust in a theoretical model that sounds attractive?
Talk to a few likely customers?
Look at social media for clues?
Throw darts or trust the faint rumblings in your gut?

All of these and more have been tried. Our goal is to give you the best approaches for predicting consumer choices in different situations – and to tell you which ones to avoid at all costs.

So is this a book about predictive analytics?

The short, but infuriating, answer is yes – but no. Using data to figure out what people will choose is the human side of advanced analytics. There are many other analytical approaches that keep tabs on what is happening in sales or transactions, or that aim to predict events not involving decisions, such as disease progression or mechanical failures, or that predict the likely outcomes of sporting events – and indeed that try to predict the outcomes of many other events and processes.

We will be discussing the parts of the field that concentrate on understanding the choices that people make, whether these are in buying a product or service, enrolling in a programme or school, subscribing to a service, choosing something on a web page, donating to a charity or volunteering their time. We also provide guidance on which methods work extremely well in the right situations and those that do not offer good guidance.

We address these questions:

Which method is right in which situation?
What are the salient strengths of each?
What are the implicit views of the world and biases in each?
What are the cautions and limitations?

While many books address the subjects we cover, only a few have a scattered chapter or two touching even briefly on multiple approaches. This book will put them all into perspective, and so provide a useful guide for:

the person who must plan the project;
those who must understand the results and apply them;
the student trying to learn what works.

What can you expect in this book?

We will focus on the planning, execution and application of the widely diverse methods used to predict what people will choose. We will talk about several primary areas, including:

the basics, such as reasons for doing the analyses and core tasks such as selecting data or selecting a sample, framing the right question and focusing on what matters most;
the different types of models and what they really predict about choices;
different kinds of output and how they best can be used.

We will be explaining some advanced topics, but you will need little more than a passing acquaintance with statistics or analytical methods to grapple with these ideas. Any time we introduce a new term (and given the nature of practitioners in the field, these are not in short supply) there will be an explanation. Those of you hoping for subscripts, Greek letters and multi-tiered equations have definitely sat down in the wrong theatre.

Look for ordinary English, not equations

Everyone can breathe a sigh of relief. We will not be resorting to equations filled with recondite symbols and do not expect you to have mastered advanced statistics or maths. We will be addressing complex topics, but everything will be explained in plain English.

Some definitions we need

Prediction versus forecasting

Prediction is a concept grounding all our discussions. This term is just convenient shorthand, following the common usage that we see in such terms as predictive analytics. Actually, no one can predict anything. If anyone could, your author would now be sitting on the deck of his J-Class yacht with a cool drink. Writing this book, of course, but still on the yacht.

The more correct term is forecasting, something like forecasting the weather. Some of our forecasts can be incredibly accurate – and indeed can look like predicting. Some other approaches are not as fortunate – and we will talk about those. Sadly, though, we do not have the advantage of weather forecasters, who can stick their hands out of the window and solemnly ‘forecast’ that it is now raining. Everything we discuss will go beyond just taking the current temperature and reporting it.

Artificial intelligence

Moving to another key topic, let’s talk about artificial intelligence. One often-seen definition: anything done by a machine that responds to its environment and takes actions that maximize its success. The machines we will discuss are computers. Their environment is data. Their success criterion lies in finding patterns in the data that we cannot perceive – and that help us to take more effective actions.

We are not talking about robots that can do our work for us (sorry if you were expecting this). As penetrating as a machine’s analysis may be, you must make the final decisions, and you must decide how to put the information garnered into action.

Practical artificial intelligence

Artificial intelligence broadly means anything a machine does to respond to its environment to maximize its chances of success. The machines we use are computers, and we set their goal as detecting complex patterns that we cannot see, to help us make better decisions. For those expecting robots to do our work for us – sorry, not yet. Some systems with low-level intelligence are automated. Otherwise you are the one who needs to decide how to use the information that the machines provide.

Even if we are not treading in the realm of science fiction, we will see how remarkably machines can parse and understand data in ways that we cannot, and see patterns that we never could. There is much that is amazing in this arena – as we will discuss throughout this book. Also, some systems do make autonomous, low-level decisions based on rules that we devise. And we will talk about those.

Who is a consumer?

This is our last major definition. Of course, the term consumer includes ordinary shoppers dealing with the often bewildering array of choices we encounter in modern life. However, a consumer could also be any of these:

a doctor choosing medications for patients;
a purchasing agent choosing which type of rubber bushings will fill the stockroom;
a donor choosing where to pledge;
a student choosing where to matriculate;
an executive trying to choose which manufacturer of mobile phone towers will get the bid.

By the way, mobile phone towers are those tall and often unsightly objects that allow us to enjoy the wonderful world of smartphones everywhere we go. They are very expensive, and very few people on the entire earth decide about buying them. These towers (and their buyers) appear in one of our examples about pricing.

And, finally, what is a mathematical model?

We will be discussing these throughout. By this, we simply mean any type of regular manipulation of a set of variables to forecast or predict the values of some target variable. This could be as simple as addition or multiplication, or as complex as some of the mind-bendingly difficult approaches that we discuss later in the book. These variables can be any quantity or quality that varies from one person to the next, and can include personal characteristics, ratings, consumption patterns, choices made, stated beliefs and so on.
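For readers who like to see an idea in concrete form, here is a minimal sketch in Python of the two simplest kinds of model mentioned above, one built on addition and one on multiplication. Everything in it is an assumption made up for illustration: the function names, the variables (a satisfaction rating and a count of past purchases), the weights and the numbers do not come from any study discussed in this book.

```python
# Illustrative sketch only: the variables, weights and numbers below are
# made-up assumptions, not results from any study in this book.


def additive_model(values, weights, baseline=0.0):
    """Forecast a target as a baseline plus a weighted sum of the variables."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return baseline + sum(w * v for w, v in zip(weights, values))


def multiplicative_model(base_volume, *factors):
    """Forecast a target by scaling a base volume by a series of factors."""
    result = base_volume
    for factor in factors:
        result *= factor
    return result


if __name__ == "__main__":
    # One hypothetical person: a satisfaction rating and purchases last year.
    person = [4, 2]
    # Hypothetical weights saying how much each variable moves the forecast.
    weights = [0.5, 1.0]
    print(additive_model(person, weights, baseline=1.0))

    # Hypothetical market: a base of 100 units, scaled by a price factor
    # and an awareness factor.
    print(multiplicative_model(100, 0.5, 0.5))
```

Both functions do the same basic job: they take variables that differ from one person or market to the next and apply the same fixed rule to each, producing a forecast of the target. The more elaborate methods differ mainly in how complicated the rule is and in how its weights are found.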

Moving forward

Stay tuned as we discuss, and try to make sense of, work from communities that rarely make much mention of each other. This will be an interesting journey and throughout we will try to follow the advice put forth by Lewis Carroll: ‘Begin at the beginning … and go on till you come to the end: then stop.’

Data versus information

We hear a lot about data and even have entire careers named after it, such as data scientist. One of the first things we need to settle is the difference between data and information. Data simply means anything that can be measured in any way:

Measurements of the hum given off by neon lights are data.
Exaggerations, conflations, misrepresentations and downright lies are data.
Collections of videos recording inactivity on empty streets at night are data.

Information is data that has been gathered and processed to give you insights so that you can deal with a situation – in particular an unexpected one:

Reports that a truck is stalled on the highway and your expected route has been delayed are information.
Analyses of factors influencing trends in enrolment at your school are information.
Analyses of what people have bought so that you can sell them more of the right items are information.
Linking consumer interest in specific product features to levels of purchasing provides information.

A critical difference: data is not information

Data is anything that can be measured in any way. Information is data that has been gathered and processed so that you can use it to deal with a situation – in particular one you do not expect. Data does not do anything by itself. You must find how to turn it into information, then knowledge, and finally actions.

Authors Clifford Stoll and Gary Schubert take this a few steps further when they say that data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom. The methods we describe here can get you all the way to knowledge. The rest is up to you.

Fallacies about having lots of data

A belief that still lingers is that having more data might just solve all problems. If we recall that data is just basically bits and pieces, we can see that this is wrong. Yet we encounter this erroneous idea in many places on the web, and even in some (bad) books. We should know better. The author of one such book made such a claim while speaking at a conference (and right before your author spoke, which did not start things off very well). This person even said that more data is always better, even if you are adding bad data.

The audience, mostly people with long experience in direct marketing, sat with either amused or annoyed expressions. On the way out, one person said, ‘Well, he’s an academic. He must not have practical experience.’ This comment captures an important point. Many ideas about data seem attractive but do not survive exposure to reality.

More data of the wrong type actually is bad for you. If you are looking for a needle in a haystack it does not help to have a larger haystack.

While it is a worthwhile goal to collect as much data as possible, data quality, and knowing which data will address your needs, remain paramount. Some frustration with having huge repositories, and yet finding no useful purpose in them, can be seen in the relatively new term data swamp. This is a data lake that has gone bad (or was never good).

A data lake is also a new idea, basically a gathering of various data sources that are kept in their native formats with little to no upfront attempt to integrate them (the earlier data warehouses attempted to do at least some integration). The hope with a data lake is that some newer software, such as the often-mentioned Hadoop, will do the magic that allows the data to get called up, aligned, cleaned and integrated – all of which then just might lead to an analysis that provides something useful. Cleaning up data and getting it to match with data in other formats in themselves are formidable tasks. Too often, even getting this far can consume great amounts of time and effort.

Try to avoid this misconception

Data, and especially great masses of data, get played up as the answer in the press, on the web and even in some books. Simply having a lot of data does not help solve problems. Data that has not been collected for the purposes of a specific set of analyses may never yield an answer, regardless of how much data you have. Adding more data unconnected to your questions will make your task harder, not easier. The larger the haystack, the more difficult it becomes to find the needle.

What is important?

If we could count everything that people chose, then any event that caused a change in our count would be important. Suppose you are running a non-profit radio station, and you start a so-called pledge drive asking for donations, and after days of begging, you raise money. You can, therefore, say that this endless wheedling had an important effect. Similarly, if you want to see if a product will sell, you may put it out on shelves and watch what happens.

However, in neither case can you say you have done the best you could. With the radio station, you cannot answer the question of how much money you could have raised by other means. For instance, you might simply remind people that you need donations to continue (this has been called a ‘silent donation drive’).

With the product, you cannot tell how much you might have sold if you configured it differently or charged some other price. By simply putting the product on the shelf you also have incurred numerous expenses. These include distribution and presumably promotion (so that people know it is on the shelf waiting for them). Failing this way costs a great deal.

Measurement is not insight

While direct measurements of behaviour have the advantage of being real data, they do not give us any insights into alternatives we might have tried, or into the reasons for people making a decision.

Direct measurements also can prove to be surprisingly difficult. Suppose you run a bank. Clearly, you would like to attract more deposits, more loan business and perhaps more financial-advising business. However, while you could see some immediate returns to your promotional efforts, some consumer decisions in these areas take more consideration and might happen over weeks or months.

Waiting this long to see if your promotion has had the desired effect has numerous disadvantages. First and most obvious is the span of time itself. We also encounter the problem of events that could intervene during that time. What if something in the external conditions changes, as a result making your product more or less attractive? What if a competitor draws business away by copying – or worse, outdoing – your promotion?

If we can sum up, if you rely just on what you can observe in behaviour, you may sell more of your fine product, enrol more students, get more charitable pledges – and so on. But you will not know how much better you could have done by altering your approach.

The limitations of direct measurement

Measuring behaviour directly gives you real data. However, it does not allow you to determine if you could have done something better. It also does not give you any insight into consumers’ impressions or motivations. Even if you measure responses to several alternatives simultaneously, you still are severely limited in what you can test compared with the experimental methods that we will discuss, and do not have information about why people are responding as they do.

Using precursors or surrogates for behaviour

All these factors lead to the use of other methods, including interviewing and measuring variables that are supposed to be precursors to or surrogates for behaviour. Some of the surrogates widely regarded as suggesting what might happen when people choose are:

awareness of the product or service;

ratings, for instance satisfaction ratings on a 1 to 10 scale;

associations with desired characteristics for the product;

buying intentions;

preference.

There are numerous others, but these will give us a start. In some cases, these measures are treated as outcomes – for instance, measuring awareness, liking and/or ratings of a product (or service) marks the end of the exercise. Again, the presumption here is that these measures are precursors to behaviour. In some instances, these measures have real consequences in themselves, for instance, where employee compensation gets tied to scores in satisfaction.

Very often these measures, and others such as demographics or past purchases, get rolled into mathematical models of various sorts. And indeed, we will be discussing how models like these work and showing examples throughout this book.

The methods we will be discussing

The methods we will be discussing fall into four broad classes:

experimental or trade-off methods;

questions and answers;

models based on stored or historic data, machine learning, artificial intelligence;

various theoretical models.

Some of these may sound mysterious now. We aim to clear up any uncertainty as we discuss each. Let’s talk about each briefly.

Experimental methods

These were designed to develop new products, service offerings or communications. They involve interviewing people. These include extremely powerful approaches, in that they can estimate responses to many thousands of product/service configurations or messages in market simulations – including the market simulator programs that we will show you. If done well, their predictions can be highly precise. These methods have track records of over 30 years and extensive academic support. Work developing one of these methods – discrete choice modelling – won a Nobel Prize in economics in 2000. These methods have been extended and strengthened by the use of machine learning approaches.

Questions and answers

The name itself explains the next set of approaches. People get interviewed and asked a variety of questions. Methods available for analysing these answers have advanced considerably. In fact, here we will also encounter artificial intelligence, in at least one unexpected place.

Many types of output come from questions and answers, ranging from simple descriptions to powerful models that show the effects of different variables on some outcome (or even multiple outcomes). Some outcomes that you can see predicted include levels of satisfaction, degree of preference, and even (with some newer methods) market share. We will bring you up to date on what experts are talking about and doing.

Models based on stored or historic data

These can go deeply into machine learning and artificial intelligence, and appear to make up much of marketing science. Many times, data that has been collected for purposes other than a specific analysis gets assembled, probed and examined. Outputs include models for scoring customers and prospects, and even algorithms for quick decisions such as whether to show a particular person an advertisement on a given web page.

This is the arena that gains the most attention and causes the most concern about breaches of privacy. Perhaps surprisingly, all the methods that get used here (excepting computer-based ones that assemble sources of data) also get used in analysing questions and answers.

Theoretical models

These are important because they influence so much of the work that gets done in investigating data. These models attempt to explain which factors lead to behaviour, and which often unseen underlying causes can change what a person chooses.

Implicit views of people and biases

Models based on stored or historic data

These have at least these two implicit views of people. First is the logical-seeming idea that we can forecast what a person is likely to do by observing what they have done. So, for instance, if you have bought a box of the wonderful breakfast-like substance Kardboard Krunchies every week, it seems reasonable to assume you will buy one next week.

Another implicit view is that you as a consumer are likely to buy what people somehow similar to you have bought. So if women in the last trimester of pregnancy buy a lot of cotton balls, then we might assume that Thisbe, who also is in the last trimester of her pregnancy, is likely to buy cotton balls.

These assumptions can work effectively, but also can lead to problems, as we will discuss. A great deal of effort in marketing sciences goes into finding cohorts and observing what they are most likely to do. This type of alignment of individual with a group is one basis for certain systems that show low-level artificial intelligence, including the recommendation engines we will discuss later.
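For readers who like to see an idea in working form, the second implicit view (you will buy what people like you have bought) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cohort labels, the tiny purchase history and the function name `cohort_rates` are all made up for the example, and real recommendation engines are considerably more involved.

```python
from collections import defaultdict

def cohort_rates(records):
    """Return the share of buyers in each cohort.

    records: iterable of (cohort_label, bought) pairs (hypothetical data).
    """
    counts = defaultdict(lambda: [0, 0])  # cohort -> [buyers, people seen]
    for cohort, bought in records:
        counts[cohort][0] += int(bought)
        counts[cohort][1] += 1
    return {cohort: buyers / seen for cohort, (buyers, seen) in counts.items()}

# Invented history: did each shopper buy cotton balls?
history = [
    ("third_trimester", True),
    ("third_trimester", True),
    ("third_trimester", False),
    ("other", False),
    ("other", True),
    ("other", False),
    ("other", False),
]

rates = cohort_rates(history)

# The implicit assumption: Thisbe inherits the rate of the group we put her in.
thisbe_estimate = rates["third_trimester"]
```

Note that the estimate for Thisbe depends entirely on the group she is assigned to. Put her in the wrong cohort and the estimate is wrong, however carefully the counting was done.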

Analysis of questions and answers

The obvious but critical assumption here is that if you ask fairly direct questions, whatever people tell you will provide valuable information. Direct questions asking for ratings of importance in particular have numerous problems, including people giving overly positive ratings (acquiescence bias), people giving the same answer repeatedly (straight lining), cultural and personal differences in how scales are used, and inaccurate responses due to the common desire to give socially desirable answers.

Trying to minimize these problems, methods of more indirect questioning have been devised. It is the experimental methods, though, that go furthest to overcome these pitfalls of direct questioning.

The experimental methods

This set of approaches arose due to the realization that people cannot or will not tell you directly by ratings what they most value in a product or service. These methods ask people to make choices in various ways and so reveal the true hierarchies in what they value.

Powerful as these experimental methods can be, they have their own implicit assumptions. They are based on the belief that a product, service or message can be broken down into discrete features that can be tested and compared. It is true that nearly everything we encounter has at least one measurable feature. However, with some products, services or messages, the features we can measure may not be the most important ones.

There also have been some objections to these methods based on various notions about how people think and process information, and whether these methods truly capture that. As we will discuss, these cavils are largely beside the point.

And finally, the theoretical models

These obviously are ideas about how people think, feel and behave – and often came into existence without any firm empirical evidence. Still, these inform a great deal of what marketers and even marketing science types do, so it is important to know about them.

One way of comparing these methods

We can arrange these methods comparatively in a number of ways. Figure 1.1 shows one of these, based on how much effort of different types each requires. The two axes represent:

how much is required in planning to interact with consumers to get the needed information;

how much analytical complexity could be involved.

FIGURE 1.1 One way to categorize the methods we will discuss

Let’s review Figure 1.1. The first axis is inviolable – there is no way around the effort required in interacting with consumers if the methods are to work. The second axis is more discretionary, as it may be possible to get by with less analytical complexity using some methods some of the time:

The theoretical approaches reside in the bottom-left corner, since as ideas, they need involve neither interacting with actual people nor any analysis. We need at least to mention them, as they often underlie other approaches.

We find discrete choice modelling methods at the opposite extreme, as they have both high requirements for planning to interact with consumers, and typically high levels of analytical complexity. The requirements of these methods, both in theory and execution, can seem daunting. We hope to clear up the difficulties here.

Approaches based on historical or transaction data typically involve great analytical complexity but little or no interaction with consumers. These methods rely on data that has been stored by an organization, often overlaid with other information gathered from various secondary sources. In the United States, vast amounts of data can be appended to nearly any person, some of it at the individual level, some household, and some at the block or neighbourhood level.

The amount of data that has been ferreted out about nearly everyone is staggering – one service, for instance, offers to get a name, address, age, ethnicity and gender from a simple e-mail address – and for over 90 per cent of US households. Then, once the address is in hand, other services can provide literally hundreds of items.

It follows that privacy concerns have been most strongly voiced about data mining investigations. After all, much of this information was gleaned without the person’s knowledge or consent. This also is far more information than most people realize any organization could have about them.

Questions and answers preceded the more analytically driven approaches – and in some form probably go all the way back in human history. Their pre-tech roots remain highly visible in many places. You may encounter rudimentary or even no analysis in reports about questions and answers. Simple counting is a staple on the nightly news, where for instance you can hear solemn announcements to the effect that, ‘14 per cent less Brits are drinking tea this year than six years ago.’ This should lead to the inevitable question ‘So what?’ – as we have a factoid disconnected from any idea that it is causative of anything else.

Nonetheless, scaled measures, selections of appropriate items and rankings all might lead to sophisticated models that help guide decisions and actions – if set up and analysed reasonably well. We will show you how this can work later in this book. Even verbatim comments collected in interviews can show considerable predictive power. This is discussed in another volume (Practical Text Analytics).

Sense and sensibility with predictions

First, it is worth restating that we do not predict – popular as this word may be – we more accurately forecast. The methods we discuss have greater and lesser degrees of accuracy in making forecasts. With each, endless possibilities exist to make serious mistakes. Experience with these methods in the marketplace can help us to overcome many of these. Sadly, these are often learned by actually making the mistakes.

Forecasts are not projections to the marketplace. This is a critical difference. In fact, going from a forecast to a projection often proves to be far more difficult than imagined, and so becomes a very humbling experience.

Some of the difficulties in projections

Suppose you generate a model based on your customers’ transactions that says that you will attain a 15 per cent likelihood of them buying your fine product if you run a coupon. This does not mean that 15 per cent of the marketplace will buy it, or that your sales will go up by 15 per cent – or in fact that 15 per cent of customers will finally make the purchase.

Your customers cannot make up the entire market, and determining what percentage of the marketplace your customers make up can be terrifically difficult. You must answer these questions, just to get started:

What percentage of the total marketplace do your customers represent?

What percentage of your customers are actually aware of your fine product?

Of those who are aware, what percentage understand what it actually is and does?

How well is the product distributed? What percentage of your customers can actually find your product? If you are on the web, can people find your product and succeed in buying it?

Can your production keep up with new demand? If you are on a website, can it keep up with traffic?

And how many non-customers are going to join in and purchase whatever you have to offer?

Finally, to get to figures for the total marketplace, you have to know its true size. This may seem obvious but can prove highly difficult. With any conceivable product or service, this is not just a count of the general population. You need to start with people who might possibly have a use for your product (even if a faint possibility). This will be more than your customer base. And while you may have a count of your customers, this also could be less accurate than you imagine. Unfortunately, it is a rare organization of any type that has an exact customer list.
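The gap between a modelled likelihood and a marketplace projection can be made concrete with some simple arithmetic. The Python sketch below chains together a few of the questions listed above. Every number in it is invented purely for illustration, the function name `project_buyers` is hypothetical, and a real projection would also have to deal with non-customers, production limits and the uncertainty in each rate.

```python
def project_buyers(market_size, customer_share, aware, understand,
                   can_find, model_likelihood):
    """Multiply through a simple funnel (all inputs are assumed rates).

    market_size      -- people with any possible use for the product
    customer_share   -- fraction of that market who are your customers
    aware            -- fraction of customers aware of the product
    understand       -- fraction of the aware who understand it
    can_find         -- fraction who can actually find and buy it
    model_likelihood -- the model's likelihood of buying (e.g. with a coupon)
    """
    reachable = market_size * customer_share * aware * understand * can_find
    return reachable * model_likelihood

# Invented figures: a market of one million, with a 15 per cent model likelihood.
market_size = 1_000_000
buyers = project_buyers(market_size, 0.20, 0.50, 0.80, 0.70, 0.15)
share_of_market = buyers / market_size

print(f"Projected buyers: {buyers:,.0f}")
print(f"Share of marketplace: {share_of_market:.2%}")
```

With these made-up rates, a 15 per cent likelihood among customers works out to well under 1 per cent of the marketplace. Change any one rate and the projection moves with it, which is why each question needs an answer before the 15 per cent means anything.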

A cautionary tale about customer lists

A large bank set up a massive computerized database to keep track of its commercial customers. Because they had many thousands of customers and prospects, and were constantly updating this information, they had a large computer and a large staff busily at work. This was a number of years ago, so the computer system was still a hulking and intimidating presence on one floor of the bank.

Their salespeople (who they called account executives) were supposed to supply information on their customers to the staff. That is, they were supposed to hand over sheets with customer names, locations and facts.

In time, the bank decided to do a survey testing interest in a new product for commercial customers. It seemed entirely logical to use this customer database as one source of companies to contact. Other contacts would come from outside lists of companies in the area.

Much to everyone’s consternation, the survey found that 60 per cent of the customer names contacted were not good – that is, they had no working relationship with the bank, or were out of business entirely. Some 25 per cent of the names listed as contacts actually were dead. These facts upset everyone involved, and particularly the person heading the computer database operation. How could such a thing happen? Two factors contributed strongly:

The input given by the salespeople was simply entered into the database and not completely verified. The task of just keeping up with what the salespeople provided already took eight staff. The bank baulked at the prospect of adding yet more staff to validate more than the company’s name and location being correct. No one from the bank called up the purported contact just to check on whatever was entered.

The salespeople had a strong motivation not to enter all the names of their best clients, and to make up any quotas for calls that they missed with spurious information. If they kept some names to themselves, and the time came for them to go and work elsewhere, they could take a base of valuable customers. Bank rules would prevent them from taking active customers known to the bank. However, if these customers’ names never made it into the central database, then the salespeople could do as they pleased. And in many cases, that is precisely what they did.

There are several important lessons in this story. The one we need to take away here is that customer databases often are not as good as we would like. They typically do not fail in as spectacular a way as this, but their lack of reliability can be quite surprising. Customer surveys based on these databases frequently turn up rafts of incorrect names, addresses, company affiliations and postal or ZIP codes. An error in any one of these, of course, could make the listing useless.

Difficulties in projecting from surveys

Survey-based forecasts encounter all the problems of starting from inaccurate databases, and add more. We encounter the problem of whether the sample used had usable names but names that still misrepresented the entire population sought. We also run into problems when study participants are screened (allowed into the survey or not) based on many criteria.

Projecting after study participants have gone through this type of screening is one bane of all survey-based research. We do this screening because we at least want to find people with some possible use for or interest in the product in question. But then, once we have them, just how many people do they actually represent? Even if we apply the best survey-related method for forecasting, discrete choice modelling, we can run into significant problems making marketplace projections.

Surveys also may have problems with not finding enough of the right people, with people who do not answer thoughtfully, with survey participants not following directions, with people typing answers such as ‘asdhfakjdhgajghad’ when asked for their opinions, and so on. These may happen even with surveys that are put together well.

Case: problems projecting due to overly specific screening

A major software maker wanted to test responses to various configurations of its new product, and do so considering various likely competitive responses. To determine the likely outcomes, they correctly chose discrete choice modelling, which has an outstanding track record for addressing this type of problem.

However (and against all objections of those working with them), they decided that their sample had to consist of software engineers with more than a certain amount of experience, who had at least five computers running their operating system, at least two computers running another operating system, and at least one on a back bench somewhere running yet another operating system.

They set so many conditions that there was no way to estimate how large this population might be in the marketplace. No sources existed where someone could look up the size of this tightly defined group. As they added each condition for including a person in the survey, errors in their estimates were compounded – and projections became less accurate. They also had no way of knowing how the broader population – people not exactly like those they surveyed – might act. Since their product was overwhelmingly prevalent, it was safe to assume that people outside the small cohort they interviewed were likely to buy this new product as well.

In the end, the outcome was a mixture of success and failure. They managed to configure a product that sold terrifically well, since the discrete choice model was well executed and did indeed show them a highly desirable combination of features and pricing. However, they grossly underestimated demand, by a factor of about three. They could not keep production up with sales, resulting in product delays, angry customers, mockery in the press, and order cancellations.
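The way errors compound with each added screening condition can be shown with a small Python sketch. It is not a reconstruction of the case above: the incidence figures, the 20 per cent margin of error and the function name `incidence_range` are all assumptions chosen only to illustrate the arithmetic.

```python
def incidence_range(estimates, rel_error):
    """Return (low, high) bounds on the combined incidence.

    estimates -- assumed share of people passing each screening condition
    rel_error -- assumed relative error on every estimate (0.20 = +/-20%)
    Conditions are treated as independent, itself a strong assumption.
    """
    low = high = 1.0
    for p in estimates:
        low *= p * (1 - rel_error)
        high *= min(1.0, p * (1 + rel_error))  # a share cannot exceed 100%
    return low, high

# One screening condition versus four, each known only to within 20 per cent.
one = incidence_range([0.30], 0.20)
four = incidence_range([0.30, 0.40, 0.50, 0.25], 0.20)

spread_one = one[1] / one[0]     # ratio of the high bound to the low bound
spread_four = four[1] / four[0]

print(f"One condition:   high/low = {spread_one:.2f}")
print(f"Four conditions: high/low = {spread_four:.2f}")
```

With a single condition, the high estimate of the population is 1.5 times the low one. With four conditions it is about five times the low one, even though no individual estimate got any worse. Each extra screen multiplies the uncertainty in the size of the group you are projecting to.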

Unfortunately, far too many surveys end up not asking the right questions or not analysing data correctly to get useful answers. Overall, it is not as easy to put together a survey providing the right information as it may seem. As a result of many subpar efforts, research in general has suffered a black eye, or maybe two.

Models often aim just to increase likelihoods of behaviour

In many cases, these methods do not forecast as much as suggest ways to increase the odds of something happening. An estimate like this could come from looking at similarities with people who have already bought, or could be an estimate of odds of buying based on a group’s other purchasing behaviour.

Increasing your chances of getting a desired result, or reducing your chances of failure, are of course highly worthwhile endeavours. However, if the output shows increased odds of success, you have neither forecast nor projection. You may have a useful guidepost, but no real idea of how much better you will do once you reach your destination.
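A short Python sketch may help show why improved odds are not the same as a forecast. The response rates and audience sizes below are invented, and the `lift` function is simply one common way of expressing improved odds (the targeted rate divided by the baseline rate), not a method taken from any particular package.

```python
def lift(target_rate, baseline_rate):
    """How many times more likely the targeted group is to respond."""
    return target_rate / baseline_rate

# Invented rates: 3 per cent response among targeted people, 1 per cent overall.
targeted_rate = 0.03
baseline_rate = 0.01
improvement = lift(targeted_rate, baseline_rate)

# The same lift applied to two assumed audience sizes gives very
# different numbers of buyers, and the model does not say which is right.
buyers_small_audience = targeted_rate * 10_000
buyers_large_audience = targeted_rate * 500_000

print(f"Lift: about {improvement:.1f} times the baseline")
print(f"Buyers if audience is 10,000:  {buyers_small_audience:,.0f}")
print(f"Buyers if audience is 500,000: {buyers_large_audience:,.0f}")
```

The lift is the guidepost. Turning it into a count of buyers requires knowing how many people the rate applies to, which is exactly the projection problem discussed earlier.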

Forecasts are necessarily short term

No matter what the forecast, it becomes invalid with sufficient change in the marketplace. An unexpected new product or service entering the market can disrupt everything. When something new shakes up the market, methods based on stored data, such as customer transactions, can fail entirely.

One of the experimental methods, discrete choice modelling, can address the likely effects of new products or services, even disruptive ones, entering the market. One example involves an established cancer drug that was facing two new entrants, both of which were far more effective. Forecasts for what might happen to this drug as it faced this major change were borne out in the marketplace.

In fact, when using this method, it is always best to test possible competitive actions in response to changes in your product or service. This may be the point at which the marketing team tells you that you are giving them a headache. However, even after developing severe soreness around the temples from all that hard thinking, anticipating what competitors might do poses quite a challenge.

Also, any forecasts you have made likely will become invalid if your organization runs into problems with communications, distribution or production. Similarly, even the best forecasts can become worthless quickly if something unexpected happens that causes problems in public perceptions of your product or service.

Where we will not be going

Too many methods!

We will explain how a number of advanced methods work in practice and show their applications with actual marketplace data (which does have to be disguised, though). However, we have an overabundance of methods that could be discussed. For instance, the free analytical program Weka lists over 100 methods, many of them using advanced machine learning – and more than a few with names likely to be highly unfamiliar, such as J48, LibSVM, HyperPipes, CLOPE, Tertius, etc. Anyhow, we might possibly not mention one of your favourites.

Not substantiated = not here

A plethora of unsubstantiated methods and systems now surround us. Any time on the web will turn up large numbers. We will not review any method supported just by a vendor’s website, even if that vendor has put plenty of their own papers there. Everything we discuss will be backed by both a strong track record in practical applications and strong theoretical foundations, meaning methods that have passed academic scrutiny in peer-reviewed papers, and more than just one of those. Either one alone – either anecdotes based on practical experience or publication in a journal – is not sufficient. Proprietary systems that are largely ‘black box’ also are off our menu, as fascinating as some of these may seem.

No recommendations for specific software or solutions

Many fine software packages have been developed, so many that it has become impossible to review and test them all. Statistical analysis software almost invariably is complex, and even the relatively easiest requires some learning and adaptation by the user. The larger statistical packages, such as SPSS, SAS, Stat, Statistica, Systat and NCSS all make some implicit assumptions about users knowing what they are doing, more or less – and why they are doing it. The more complex analyses often involve programs that can be quite abstruse and require considerable learning.

All this means that if you have a favourite software application, it may not appear here. And while we discuss specific software programs, this should not be taken as giving recommendations or endorsements. Any program that appears here simply is something that the author has used and finds useful. This does not mean that whatever you see is the only way or the best way to get an analysis done. These are just products that have worked well, have strong academic credentials and seem highly reliable.

Prices range from astronomical (often called enterprise class software) to completely free. Free does not mean puny. No-cost options include the redoubtable R and the amazing Weka.

The program R actually is a vast collection of problem-solving and statistical routines that you download to your computer from online repositories. That is, if you get and start R, you can then load a dizzying assortment of analytical choices.

This program is made by and largely for academics. It runs based on computer syntax, meaning instructions that you write, and it is wholly unforgiving about mistakes. In its most usual incarnations, it poses fiendish difficulties for most users.

Weka comes with four different interfaces, three of them involving graphical user interfaces (GUIs) – the system of menus and visual displays we find in many familiar programs. It includes many routines that fall under the heading of machine learning. Its style favours visualizing data whenever possible. And it has quirks of its own – for example, a colleague who is a former rocket scientist finds it puzzling, but others have taken to it right away.

We will be talking more about software throughout this book. While we do not spell out the details of writing specific syntax or take you step-by-step through routines, we will keep you filled in about which applications seem particularly well-suited to a given task.

Summary of key points

This is a book for practitioners who need to deal with predicting consumer choices, and for students who want to understand what is involved. One key goal is to winnow through the many approaches to predicting these choices, giving you the best ones to apply in different situations – and telling you which ones to avoid.

We will talk about the human side of predictive analytics, the part that aims to forecast what people will choose – and in some cases, why they make those decisions. It is not about predictive analytics as a whole. For instance, we could otherwise try to predict when a pipe will burst, where a disease will progress, or which team will win at a sporting event – attempts to model such processes or outcomes fall outside our scope.

The term consumers can mean people dealing with the confusing array of everyday products and services that we find all around us, including small and large purchases, financial services, technology and telecommunications. But we also take consumers to mean doctors making decisions about which fine pharmaceutical to buy, purchasing agents looking at the wide array of industrial bushings to fill their warehouses, and so on.

This book takes the approach that we can discuss concepts, methods and results in plain English. While we will be going over some advanced topics, and talking about how methods work, we will be avoiding subscripted notation, matrix algebra, statistical proofs and Greek letters. There will be no multi-tiered equations. Anyone expecting any of those will be sorely disappointed. Our focus is on understanding, planning, execution and application.

We explain a wide array of methods that rarely get considered together, compare their applications and put them into context. Each approach has its best uses and limitations, and we began considering those in this chapter.

Some key definitions get covered before we get to the methods, as these are important for later discussions. First, and perhaps most important, is the understanding that data is (are) not information. Data rather means anything that can be measured in any way, whether or not it has any meaning or use. Information is data that has been processed and analysed so that it can be used to deal with a situation, particularly a novel or unexpected one. Data does not do anything by itself. As several authorities have pointed out, data is not information, and information is not knowledge. Some take it further and remind us that knowledge in turn is not wisdom. We will show you how to get to information, but the rest will be up to you.

There are some prevalent fallacies about data that we hope to dispel. Foremost among these is the mistaken belief that more data is always better. There is a strong undercurrent in the literature and on the web saying that if you have enough data, perhaps at the magical point where it turns big, then you will solve your problems. More of the wrong kinds of data actually causes problems. You do not find a needle more easily by having a bigger haystack.

Data quality, and knowing what your data can be used to do, remain paramount. Anyone saying otherwise is probably trying to sell you something – and you do not want it.

If we could count everything that people chose, anything that caused a change in our count would be important. However, counting definitely is not predictive, as it of course only happens after you have done something – and even then the act of measuring can prove surprisingly difficult. Therefore, various precursors to behaviour have been proposed, with the theory being that seeing changes in these can be predictive of changes in behaviour. Some of these include measuring awareness of a product or service, ratings, associations, buying intentions and preferences. Sometimes, these are treated as outcomes in themselves, as in when measurements of customer satisfaction are tied directly to compensation.

Classifying the methods

The methods we will be discussing fall into four broad classes: 1) experimental or trade-off methods; 2) questions and answers; 3) models based on stored or historic data, machine learning, artificial intelligence; and 4) various theoretical models:

Experimental or trade-off methods were designed to develop new products, service offerings or communications. They involve interviewing people. These include extremely powerful approaches in that they can estimate responses to many thousands of product/service configurations or messages in market simulations – including the market simulator programs. Machine learning methods have expanded the capabilities of these methods – as we will show.

Questions and answers, as the name implies, involve people getting interviewed and asked a variety of questions. Methods available for analysing these answers have advanced considerably. In fact, here we will encounter artificial intelligence, in at least one unexpected place.

Models based on stored or historic data can go deeply into machine learning and early uses of artificial intelligence, and appear to make up much of marketing science. Many times, data that has been collected for purposes other than a specific analysis gets assembled, probed and examined. Outputs include models for scoring customers and prospects, and even algorithms for quick decisions such as whether to show a particular person an advertisement on a given web page.

Theoretical models are important because they influence so much of the work that gets done in investigating data. These models attempt to explain which factors lead to behaviour and which often unseen underlying causes can change what a person chooses.

Each of these methods has its own implicit views of people. Models based on stored or historic data have at least two implicit views of people. First is the logical-seeming idea that we can forecast what a person is likely to do by observing what they have done. Another implicit view is that you as a consumer are likely to buy what people somehow similar to you have bought. These assumptions can work effectively, but also can lead to problems – in particular, the second, if a person gets assigned to the wrong group. And these methods generally do not provide useful guidance when situations change or when you want to develop new products or services.

Analysis of questions and answers rests on the obvious but critical assumption that if you ask fairly direct questions, whatever people tell you will provide valuable information. This can work if the questions are asked in the right way. However, asking people directly about what they think is important has been shown to provide misleading answers.

The experimental methods in fact arose due to the realization that people cannot or will not tell you directly by ratings what they most value in a product or service. These methods ask people to make choices in various ways and so reveal the true hierarchies in what they value.

Powerful as these experimental methods can be, they have their own implicit assumptions. They are based on the belief that a product, service or message can be broken down into discrete features that can be tested and compared. It is true that nearly everything we encounter has at least one measurable feature.

And finally, the theoretical models are simply ideas about how people think, feel and behave – and often came into existence without any firm empirical evidence. Still, these inform a great deal of what marketers and even marketing science types do, so it is important to know about them.

One possible arrangement

We can arrange these methods comparatively in a number of ways. One useful way to do this is based on how much effort of two different types each requires. These are: 1) how much work is required in planning to interact with consumers to get the needed information; and 2) how much analytical complexity could be involved. Not all useful analyses need reach their maximum level of analytical complexity, but the amount of effort involved in planning to interact is inviolable.

The experimental methods require the most effort in planning to interact with consumers, and among these discrete choice modelling requires the most analytical effort.

Questions and answers preceded the more analytically driven approaches – and in some form probably go all the way back in human history. Their pre-tech roots remain highly visible in many places. You may encounter rudimentary or even no analysis in reports about questions and answers. However, they can be used with more rigorous and involved analytics, as we will show.

Approaches based on historical or transaction data typically involve great analytical complexity but little or no interaction with consumers. These methods rely on data that has been stored by an organization, often overlaid with other information gathered from various secondary sources.

The theoretical approaches, as ideas, need involve neither interacting with actual people nor any analysis. We need to review them, as they often underlie other approaches.

Keeping sensible with predictions

We rounded out what we will cover by discussing sensible approaches to prediction, starting with the reminder that we do not actually predict anything. We make forecasts, like weather forecasts, rather than consulting our crystal ball and coming up with the winning number.

Forecasts are not projections, and in fact any projections are fraught with difficulties as we try to estimate how the data (or information) we have at hand relates to the entire marketplace. Even the best data gathering can leave us uncertain about how well we have captured everything in the outside world. And too often, data that we have on hand turns out to be lower in quality than we had suspected.

Finally, even the best of forecasts should not be expected to hold up indefinitely. Changes in the marketplace or the external environment will lead to a need for new analyses and new estimates.

02 Getting the project going

Here we discuss the key first steps in planning for a project that you want to conclude with a prediction of what people will choose. As with any in-depth analysis, you need both to plan carefully and get the right data. Here are some of the steps that need to be taken. This chapter also introduces some of the key concepts and language that we must learn (alas) to plan the analysis and understand what is happening in it.

At the beginning

You can find a great many articles and books that advise you about what to do when starting a project. While we hesitate to pile into this particular fray, there are just a few key issues that could stand a review, and a few questions to consider. These are several questions to keep in mind before any project:

What can change, based on doing this analysis?
What do I really need to know to help that change happen?
What data do I need to analyse and/or which people do I need to interview so I get the answer?
What is the most I can expect to get based on doing this?
How will I know if I reached a good answer?
And do the people who need to use this understand what they will get from this?

Know who you are talking about or talking to

Let’s give everyone the benefit of the doubt and assume that the first key question has been answered. All work should start with some understanding of what can change as a result. Still, in planning for some kinds of analyses, deciding what you want to know can take up as much time as everything else you do. For instance, if you want to optimize a new product or service with one of the experimental methods (conjoint or discrete choice, in particular), getting to the final list of features to vary and measure typically takes considerable thought and effort.

You also must figure out where you will get your basic data and what limitations you have in your sources. In Chapter 1, we described a few possible pitfalls in dealing with internal data. What about data that you gather from the web or from interviews? In either realm, it is remarkably easy to gather data from the wrong places or about the wrong people – and so data that will not provide reliable guidance.

Unfortunately, errors persist

Sometimes it can be quite tempting to take a shortcut and run with data that does not quite match the question at hand. And, of course, this almost inevitably will sabotage your results no matter how much of the incorrect data you have. Thanks to the web, we now have access to more of the wrong data than was ever possible. So this caution has become critical.

Sadly, when you get the basic data incorrect, your conclusions might be nearly right, more or less right, or terribly wrong – and you will have no way of knowing unless you can check with another source. In many cases, there is no other source.

Poor results from overconfidence in data

Predictions of what a group will do are fraught with difficulties. No place are mistakes in predictions more clearly revealed than in elections. These prognostications are highly publicized and easily compared to the outcomes.

One of the most colossal mistakes in prediction happened with an enormous sample, perhaps the largest ever used to predict who would win an election. (This story also appears in Practical Text Analytics, but it is important – and so bears another telling.)

A magazine called Literary Digest went to the huge expense of mailing out 10 million letters asking likely US voters who they would pick for president. The year was 1936, and the candidates were one Alf Landon and a slightly better known person, Franklin Delano Roosevelt. This mailing covered one-quarter of all US voters registered that year.

The magazine got back some 2.4 million responses. They crowed that they would be able to get the results right to within a fraction of a point. Their prediction: Landon would win 57 per cent to 43 per cent. However, we never did have a president Landon. Instead, Roosevelt won by a crushing 62 per cent to 38 per cent.

The magazine missed by over 15 percentage points. How could they have got things so wrong, especially with so many people? The simple answer, which you may have suspected: they got back a great many responses, but from the wrong people for making a projection.

They had used names from telephone directories, magazine subscription lists and club memberships. However, telephones were still a relative luxury in 1936, with only about 40 per cent of US households owning one, and in the midst of the cash-strapped Great Depression, relatively few could afford a magazine subscription or a club membership. As a result, they missed the vast cohort of less privileged voters, among whom Roosevelt had an overwhelming majority.

Meanwhile, George Gallup (of the eponymous Gallup poll) used a much smaller, more scientifically selected sample, and got the election results right. Where the millions led to a wrong answer, the carefully chosen thousands led to a much more accurate prediction.

We have had 80 years to absorb this lesson. Yet too often, people still make the same mistake.
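
The lesson of this story can be seen in a small simulation. What follows is only a sketch under made-up assumptions: a population in which 62 per cent favour one candidate, a 40 per cent segment reachable through the mailing lists that favours that candidate at only 45 per cent, and sample sizes picked for illustration. The segment-level figures are invented and are not taken from the 1936 data.

```python
import random

rng = random.Random(1936)  # fixed seed so the sketch is repeatable

# Assumed, illustrative figures (not historical data):
SHARE_REACHABLE = 0.40    # part of the population on the mailing lists
SUPPORT_REACHABLE = 0.45  # support for candidate A among that part
SUPPORT_OVERALL = 0.62    # support for candidate A in the whole population
# Support among everyone else, implied by the three figures above.
SUPPORT_OTHERS = (
    SUPPORT_OVERALL - SHARE_REACHABLE * SUPPORT_REACHABLE
) / (1 - SHARE_REACHABLE)


def poll_wrong_frame(n):
    """Huge sample, but drawn only from the reachable group."""
    return sum(rng.random() < SUPPORT_REACHABLE for _ in range(n)) / n


def poll_random(n):
    """Small sample drawn at random from the whole population."""
    votes = 0
    for _ in range(n):
        reachable = rng.random() < SHARE_REACHABLE
        support = SUPPORT_REACHABLE if reachable else SUPPORT_OTHERS
        votes += rng.random() < support
    return votes / n


big_biased = poll_wrong_frame(200_000)
small_random = poll_random(2_000)
print(f"True support:             {SUPPORT_OVERALL:.1%}")
print(f"200,000 from wrong frame: {big_biased:.1%}")
print(f"2,000 chosen at random:   {small_random:.1%}")
```

Under these assumptions the huge sample settles very precisely on the wrong answer, while the small random sample lands near the true figure. Adding more people from the wrong frame only makes the wrong answer more precise.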

Some highly erroneous results based on starting in the wrong place have been reported, much to the humiliation of those making this mistake. But many other such errors get buried quietly – sometimes only after the organization suffers from the poor decisions that result.

Sample frames

Any place where you gather data is more technically called a sample frame. Whether you gather data by interviewing, by collecting data online or from data warehouses (or lakes) that your organization maintains, you likely will be dealing with a sampling of all possible data. Of course, when doing interviews, the odds that you have a sample approach 100 per cent.

Like the frame for a picture, if the sample frame does not fit, you will not see the picture correctly. Figure 2.1 first shows the right frame, and then what happens when frames are off in two different ways.

FIGURE 2.1 Getting the frame right is critical

Let’s suppose, for instance, you can gather all possible data about avowed beer drinkers on a large social media site, such as Facebook. This would be a huge number of beer drinkers, doubtless many thousands and perhaps millions. However, even this many users would represent a subset of all beer drinkers.

We cannot know whether people who indicate on a social media site that they like beer or drink beer are representative of all beer drinkers. ‘Representative’ means that they match the entire population in terms of which beers they like and drink, and how much of those beers they drink.

Therefore, based on patterns you see in the data you gathered from a social media site, you cannot say with any confidence that you have captured patterns in beer drinking in the general beer-drinking population. So you could not, for instance, crunch this massive amount of data and say that you have found who drinks the most of various beers in different cities. Again, it does not matter how many people you have if they are not the right people.

Care in getting data and more care in projections

Overall, we need to be very careful as to where we get our data. And if we want to make a projection from our data to the general population, we have to be even more careful. Using what you have at hand and trying to estimate what will happen in the marketplace is fraught with problems and pitfalls.

What is the most you can expect from each method?

We are optimistic that no one out there will be making up the goals of a project as they go along. In Chapter 1, we talked briefly about the key outputs you get from each broad class of methods. We will be talking more about the specifics in each section. Let’s take a minute to talk about the most you can expect or, put another way, some of the limitations of each type of method.

The most from historical or transaction data

These methods obviously can give you the best possible fix on how well a given promotion or marketing effort influenced short-term sales. But without added questioning, these methods cannot tell you how efforts are influencing impressions, perceptions and reactions to a given brand, service or institution.

An example can be seen in what happens to a brand after running a great many coupons and promotions. These can boost sales in the short term. But they can also position a brand very firmly as not being worth full price.

Ignoring underlying perceptions and associations can have highly negative consequences. For instance, for many years the department-store chain JCPenney engaged in an intricate scheme of heavy discounting. Then the management decided to ‘reposition’ the store as an exclusive brand, brought in an executive from Apple computers (presumably for his expertise with higher-priced, heavily branded merchandise) and cut the discounts.

This effort failed miserably. It did not accord with what people saw as Penney’s identity. The store even took to advertising an apology for turning its back on what it truly was. Many of the discounts returned. The store clearly had at last asked its customers some of the right questions, but too late. The damage has lingered.

Limited ability to experiment with transactional data

These methods cannot tell you anything about what else you might have tried. Ability to experiment and to anticipate new turns in the marketplace remains extremely limited no matter how penetrating the analysis. You cannot answer ‘what if’ questions with tactics you have not tried – as you can with the experimental methods.

Do not expect the unexpected

A stream of commentary running through the popular press keeps circulating the notion that insights can arrive almost spontaneously from ‘patterns’ in the data. This ties in with the belief that because you have a great deal of something, you must be able to do amazing things with it.

This storyline actually started gaining currency in the early 1990s, when data warehousing and mining were in their formative stages. (Data warehousing was the precursor to big data, differing in that it typically involves data that has been organized as well as stored.)

Belief in the mystic power of data

One widespread story supposedly supporting the near magical powers of data concerned a ‘major store’ that was said to have discovered you could increase beer sales by promoting this product along with diapers, and preferably leading up to the weekend. The supposed logic was that young fathers would be dispatched to the store to buy diapers for Saturday and Sunday, and of course, this made them think of beer.

This was in fact an urban legend about data mining. Its legend-like status was underlined by the way that the name of the store changed in different accounts, and by the way in which the purported sales increases from this joint promotion increased by differing but always very precise amounts, such as ‘224 per cent’.

After a number of years, the fellow who made up this story came forward and confessed. He had done it, he said, to show how ridiculous anecdotes about finding serendipitous ‘patterns’ in data could become.

By the time the truth emerged, this tall tale had been reprinted innumerable times in support of the mystical potential of data. It had even appeared in a textbook about data mining. This underscores the allure of getting a solution easily, but there really is no substitute for advance thinking about the problem you need to solve.

The most from questions and answers

We can do much more with survey-based questions involving scaled ratings and selections of answers than was possible even a few years ago. For instance, thanks to newer and more powerful methods of analysis, such as the Bayesian networks discussed in Chapter 7, we now can successfully link questions in surveys with external data, such as market share or share of wallet. (This kind of linkage was almost never possible with well-established methods based on various forms of regression.)

However, even with these powerful new methods, we rarely arrive at detailed prescriptions for action. For instance, an analysis of a survey may tell you that you could gain up to 10 share points from doing more to train people working in your inbound customer call centre – but it likely will not tell you just what you need to improve in that training. Therefore, you likely will reach only the first steps in understanding what needs to get more attention. However, you could turn to one of the experimental methods if you need much clearer direction. We give an example of finding specific actions to improve customer satisfaction in Chapter 5.

The most from the experimental methods

Among these approaches, discrete choice and conjoint analysis can give you very precise fixes on how much a given change in your product or service will increase share of preference. The correct term is share of preference because you must add more calculations to the basic output to determine the true share of sales in the marketplace.

Two important factors are missing from share of preference: how many are aware of the product or service and how well distributed the product is. Everyone involved in an experimental study becomes 100 per cent aware of all the choices they are evaluating – so responses can be overstated versus the actual marketplace. And, as should be apparent, the product cannot sell if people cannot find it.

Other factors that can cause share in the marketplace to differ from share in a survey are how effectively messages promoting the product perform, and how well the product is presented where it is available. Expect strong adverse effects on performance in the marketplace from either garbled messages, fostering the wrong associations with the product or its users, or poor presentation in stores.

An example of adjusting for awareness and distribution

In a study of pricing for men’s casual slacks, a sponsoring company invested considerable time and effort in inviting hundreds of people to survey locations where their brand and competitors’ brands were displayed and could be examined (but not tried on). These brands were shown at different prices, and then reshown with prices varied according to an experimental method – and this was repeated a number of times. People participating in the study were asked which brand they would choose, given that the brands were at the specific prices shown. (We will discuss the details of this method, discrete choice modelling, in Chapter 4.)

One brand that did not sell well in the marketplace performed strongly in this test. This apparently puzzling result was explained by the detailed figures the sponsoring company had about how aware shoppers were of the various brands and how extensively each brand was distributed. The one brand that strongly overperformed in the test was neither well known nor widely available.

When the figures for awareness and distribution of this brand were factored in along with its average in-market pricing, the study’s estimates of its share fell into line with the best available marketplace sales figures. Current conditions could be matched in the market simulation because, of the several alternative prices that were tested for each brand, one price was the marketplace average. This gave strong confidence in projections of what would happen if prices were to change. These expected effects were shown by a special market simulator program created to run under Microsoft Excel, allowing prices to be changed and sales effects to be seen in real time. (This may seem somewhat abstract now, but will be shown in detailed examples in Chapters 4 and 5.) The sponsoring company was able to embark on a new pricing scheme that helped increase sales and profitability.

About share of preference

Some methods can deliver remarkably accurate results showing which features drive choices of products or services, and even provide estimates of shares and how changing features influence these. However, these are predictions of share of preference, not market share. Share of preference in a research study must be adjusted for how many are aware of the product, how many comprehend the product, and how widely the product is distributed, to match actual shares of sales. If figures for these critical factors can be found (not too often the case), then share of preference can be translated into actual marketplace values.
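
One simple way to picture this adjustment is sketched below. It is an assumption on our part, not the specific calculation used in the slacks study: each brand’s share of preference is multiplied by its awareness and distribution, and the results are rescaled to add up to 100 per cent. Comprehension is left out for brevity. The function name and all of the figures are hypothetical.

```python
def adjust_share_of_preference(preference, awareness, distribution):
    """Scale each brand's preference share by awareness and distribution,
    then rescale so the adjusted shares again sum to 1."""
    raw = {
        brand: preference[brand] * awareness[brand] * distribution[brand]
        for brand in preference
    }
    total = sum(raw.values())
    return {brand: value / total for brand, value in raw.items()}


# Hypothetical figures for three brands (all invented for illustration).
preference = {"A": 0.40, "B": 0.35, "C": 0.25}    # shares from the study
awareness = {"A": 0.90, "B": 0.80, "C": 0.30}     # proportion of shoppers aware
distribution = {"A": 0.85, "B": 0.75, "C": 0.40}  # proportion of outlets stocking

adjusted = adjust_share_of_preference(preference, awareness, distribution)
for brand, share in adjusted.items():
    print(f"Brand {brand}: preference {preference[brand]:.0%} -> adjusted {share:.0%}")
```

In this invented case, brand C does well when everyone can see and choose it, but its adjusted share shrinks sharply once its low awareness and thin distribution are taken into account.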

These methods show what will happen but not why

These methods can show what will happen in the short term under varying market conditions. But they do not explain what people are weighing aside from features, prices and/or messages when making a decision. For instance, there was a time when Sony could charge more than other brands for most consumer electronics products. This was observed in many experimental-based studies and confirmed in the marketplace. However, these studies did not provide insights into what people perceived about this brand that allowed it to charge higher prices. Research based on more traditional questions and answers was needed to start filling in this part of the picture.

How do you judge the result?

At the outset of any project, you need a plan for how the results will be evaluated. A question you may hear when judging the goodness of an analysis goes something like this: ‘Is that significant?’ Another common question you may encounter is: ‘Was there a correlation?’ Sometimes these even get combined into: ‘Was there a significant correlation?’ These questions are asked for good reason, because of a basic need to evaluate whether the effect is real.

Still, these questions are shorthand at best, and often inaccurate or misleading. Let’s see if some of the issues surrounding these can be clarified.

What is significant?

We need to talk briefly about statistical significance. Wait, come back! We promise not to drive around this in endless circles.

Those of us with more academic experience have heard a great deal about this topic. This gets frequent use in reporting about research, of the scientific and market varieties. Yet this term is not well understood. So let’s get to the essentials.

Getting past some terminology

Perhaps you recall the term null hypothesis, with more or less dread. Less formally, this means the belief that nothing is happening. That is all.

We go from there to rejecting the null hypothesis. This means you are saying it is not true that nothing is happening. That convoluted-seeming formulation is at the heart of significance testing. Statisticians want to be sure that they are not falsely claiming something is happening when it is not.

Making that kind of false claim is sometimes called a Type I error. As the name Type I implies, statisticians start from a position of extreme caution. The first rule is not saying something mistaken.

A convention has arisen that an effect is significant only if we are at least 95 per cent certain we are not making a false claim. (This is also called 95 per cent confidence.) Why is the threshold 95 per cent? The best answer that seems available is: just because. This is just a convention, if a long established one.

Based on the number of scientific papers that go through incredible contortions to reach this 95 per cent level, you might think there is some magic behind it. But there is not.

Power versus significance

Statistical power measures how likely you are to detect something that really is happening. We should see frequent mentions of this, but that is not happening. The higher the level of statistical significance we demand – say going from 90 per cent confident to 95 per cent confident to 99 per cent confident – the more statistical power goes down.

That is, the more we want to be certain we are not making a false claim, the more likely we are to miss something real. Let’s illustrate this. Suppose we are birdwatching and we can see the birds fairly well, but we are not 100 per cent certain which birds we are seeing. For instance, if we wanted to identify a duck flying overhead, and we needed to be 95 per cent certain that a random bird we saw was a duck, some actual ducks would fly by and we would not be sure enough to call them correctly. If we needed to be 99 per cent certain, even more actual ducks would get by without us exclaiming, ‘Duck!’ Missing something that actually is happening is that other type of error mentioned in statistics class, Type II error.
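
The trade-off between confidence and power can be put into numbers. The sketch below assumes a one-sided test based on the normal curve and a real effect that is two standard errors in size. Both are illustrative choices of ours, and the function name is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()


def power_one_sided(effect_in_se, confidence):
    """Chance of passing a one-sided z-test when the true effect is
    `effect_in_se` standard errors above zero."""
    z_needed = std_normal.inv_cdf(confidence)
    return 1 - std_normal.cdf(z_needed - effect_in_se)


# Assumed true effect: two standard errors (an illustrative choice).
for confidence in (0.90, 0.95, 0.99):
    power = power_one_sided(2.0, confidence)
    print(f"confidence {confidence:.0%}: power {power:.0%}, "
          f"chance of missing a real effect {1 - power:.0%}")
```

The effect is real in every row. Only the level of certainty we demand changes, and with it the number of actual ducks that fly by uncalled.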

Statistical significance is often mentioned, but its meaning is not well understood. When a result is significant, it means you are highly confident that you are not making a false claim. Significance does not measure how likely you are to be missing something real, which is determined by the much less used statistical power.

Traditional significance testing also breaks down with very large samples. If you have enough data, every difference or effect starts seeming significant. This problem occurs well below today’s threshold for big data. For instance, investigating opinions of about 12,000 people who answered the NORC General Social Survey, we found that people with the astrological sign Leo watched significantly more television than anyone else. However, don’t go and chide your Leo friends. This is simply an artefact of testing for significance with a large sample – you can find random-seeming items passing the test.
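
To see why sheer sample size produces ‘significant’ results, consider a fixed and trivially small difference tested at ever larger sample sizes. This sketch assumes a standard two-proportion z-test with a pooled standard error. The one-point difference and the sample sizes are invented for illustration and have nothing to do with the survey mentioned above.

```python
from math import sqrt
from statistics import NormalDist


def z_for_difference(p1, p2, n_per_group):
    """z-score for the difference between two sample proportions,
    each based on n_per_group people (pooled standard error)."""
    pooled = (p1 + p2) / 2
    se = sqrt(2 * pooled * (1 - pooled) / n_per_group)
    return (p1 - p2) / se


threshold = NormalDist().inv_cdf(0.975)  # about 1.96 for 95 per cent confidence

# Assumed example: a difference of one percentage point, 51 versus 50 per cent.
for n in (500, 5_000, 50_000, 500_000):
    z = z_for_difference(0.51, 0.50, n)
    verdict = "significant" if abs(z) > threshold else "not significant"
    print(f"n per group = {n:>7,}: z = {z:.2f} ({verdict})")
```

The difference never changes, yet the test statistic grows with the square root of the sample size, so a one-point gap eventually sails past the threshold.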

Significance testing can break down with huge samples or with hundreds or thousands of comparisons. Alternative methods of testing models are used by some procedures. These other methods involve finding how well the model works on new data, or on data put to one side before the model has been made. You need to use the test of what is sensible along with statistical significance testing.

Traditional significance testing also can stretch past its limits while doing dozens or hundreds of comparisons, trying to find which effects or values are larger. Some methods, such as the Bayesian Networks we discuss in Chapter 7, may use a value of information approach rather than traditional statistical testing to overcome this problem. (We will get to this definition in Chapter 7.) Other methods, such as the classification trees we discuss in Chapter 6, can deal with this issue by using highly sophisticated testing, advanced enough to be called artificial intelligence.
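
The arithmetic behind the many-comparisons problem is short. The sketch below assumes that every comparison is independent and that nothing real is going on in any of them, so each test has a 5 per cent chance of passing by luck alone. The function name is ours.

```python
ALPHA = 0.05  # chance of a false claim per test at 95 per cent confidence


def chance_of_any_false_hit(comparisons, alpha=ALPHA):
    """Probability that at least one of several independent tests
    passes purely by chance when no real effect exists."""
    return 1 - (1 - alpha) ** comparisons


for comparisons in (1, 10, 100, 1_000):
    expected = ALPHA * comparisons
    print(f"{comparisons:>5} comparisons: expect {expected:.1f} false hits, "
          f"chance of at least one = {chance_of_any_false_hit(comparisons):.1%}")
```

With a hundred comparisons, a handful of ‘significant’ findings is what pure noise would be expected to produce.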

Some say forget significance testing entirely

A few authors have even advocated abandoning statistical significance in favour of what they call searching for ‘repeatable patterns’, which boils down to seeing how well the model holds up on other data, or perhaps some part of the data that you have set to one side before you made the model. Testing on other data, even data you have put to one side before making the model, is a sound idea – but it also is worth keeping the extra guidance that significance testing can give you on what definitely should not go into a model.
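
Putting data to one side can be sketched in a few lines. Everything here is assumed for illustration: the made-up data, the choice of a simple straight-line model, the 75/25 split, and the use of explained variance (R-squared) as the yardstick.

```python
import random

rng = random.Random(7)

# Made-up data: y depends on x along a straight line, plus random noise.
xs = [rng.uniform(0, 10) for _ in range(200)]
data = [(x, 3.0 + 2.0 * x + rng.gauss(0, 1.0)) for x in xs]

# Put a quarter of the data to one side BEFORE making the model.
rng.shuffle(data)
holdout, training = data[:50], data[50:]


def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(points, intercept, slope):
    """Share of variance in y explained by the fitted line on these points."""
    mean_y = sum(y for _, y in points) / len(points)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1 - ss_res / ss_tot


intercept, slope = fit_line(training)
print(f"R-squared on training data: {r_squared(training, intercept, slope):.3f}")
print(f"R-squared on holdout data:  {r_squared(holdout, intercept, slope):.3f}")
```

A model that has found a repeatable pattern should perform about as well on the holdout data as on the data used to build it. A large drop between the two figures is a warning sign.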

Significance testing needs to make sense

Statistical tests of any kind cannot determine if what you seem to be seeing makes sense. For instance, in Chapter 6, we use significance testing but do not automatically accept all the results the computer identifies as passing the test – because it did not make sense to use them. (This problem arose partly due to using a large sample.) Especially with huge swaths of data, you would be most prudent to use statistical significance testing as suggesting a threshold below which you do not want to go, rather than as forming the final decision.

Mistakes on the small side of sampling

Oddly, in this time when ‘big’ data gets so much attention, many decisions still get made using little or no data. While there is no way to count how many, often major organizational decisions involve no more than checking the rumblings of a gut. You even can find entire books giving advice about decision making that barely make any mention of how to analyse and interpret information.

Numerous other decisions appear to rest on interviewing small numbers of consumers. You will encounter this in two popular forms of research, focus groups and in-depth interviews. These methods have good uses, but using either to make a final determination rarely is one of them.

All samples have error

All samples, that is, anything less than 100 per cent of all possible data, introduce some error into our measurement. That is, we cannot be certain that our sample accurately reflects the entire population. This uncertainty somewhat more formally is called sampling error.

Using small groups of people (or small samples) we typically run into real problems due to this type of error. The smaller the sample you use, the less likely it is to match or represent the larger population accurately.

But how do you know how much error? Most statistics textbooks talk about this problem by discussing a giant barrel filled with different-coloured balls – which is fine, but a little boring. Let’s instead suppose we had a large dumpster filled with boxes of two fine breakfast-like substances, KardboardKrunchies and SoggyOs. Suppose it is a huge dumpster, say, about a city block long. How many boxes would you have to pull out to estimate the proportion of each of these in the dumpster accurately? And how accurately?

We can do advance estimates for some types of sample error

Fortunately, some exceedingly smart people determined exactly how much error you are likely to have in a sample where there are only two choices (that is, as in our example, KardboardKrunchies versus SoggyOs, but also yes versus no, and blue versus white – where those are the only colours present, and so on). The chart in Figure 2.2 shows you the sizes of the errors for different samples if you are facing this type of problem.

FIGURE 2.2 Sample percentages’ errors at different sample sizes

First, let’s clear up the 95 per cent confidence level. Dealing with a sample, you never are 100 per cent confident that the sample you take accurately reflects, or represents, the entire population. As you may recall, being 95 per cent confident is the standard threshold set in statistics. (As another reminder, the reason for this requirement basically boils down to ‘just because’.)

Therefore, the lines on Figure 2.2 reflect how many percentage points around our sample value we must allow so we are 95 per cent confident that the actual percentage in the whole population falls into that range. This is points plus or minus, meaning above and below the measurement we took. This range is sometimes referred to as the margin of error – and may be familiar to those following elections where polling results are reported.

These ranges become increasingly large once the sample drops down below about 150. The size of the sample is the largest factor determining the margin of error. However, in determining error, we also need to consider the percentage itself that we have measured.

At a percentage of 50 per cent, or going down the middle, the error is largest. As we approach the extremes of zero and 100 per cent, the error gets smaller. The error around a percentage of 90 per cent is the same as the error around a percentage of 10 per cent. The error around a percentage of 80 per cent is the same as the error around a percentage of 20 per cent, and so on.

Back to our breakfast-like substances, if you pulled out 400 boxes and exactly half of them were KardboardKrunchies, you would be 95 per cent certain that the entire bin contained somewhere between 45.1 per cent and 54.9 per cent of this fine product.
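
The figures quoted here are consistent with the standard normal-approximation formula for the error around a percentage, 1.96 times the square root of p(1 – p)/n. We assume, rather than know, that this is the formula behind Figure 2.2. The function name below is ours.

```python
from math import sqrt

Z_95 = 1.96  # multiplier for 95 per cent confidence


def margin_of_error(proportion, sample_size, z=Z_95):
    """Plus-or-minus margin for a sample proportion (as a fraction)."""
    return z * sqrt(proportion * (1 - proportion) / sample_size)


# The 400-box example: half of the boxes are KardboardKrunchies.
moe = margin_of_error(0.50, 400)
print(f"n=400, 50%: +/- {moe * 100:.1f} points "
      f"({(0.50 - moe) * 100:.1f}% to {(0.50 + moe) * 100:.1f}%)")

# The margin is widest at 50 per cent and symmetric around it.
for pct in (0.10, 0.20, 0.50, 0.80, 0.90):
    print(f"n=400, {pct:.0%}: +/- {margin_of_error(pct, 400) * 100:.1f} points")

# The margin balloons as the sample shrinks.
for n in (1000, 400, 150, 50):
    print(f"n={n:>4}, 50%: +/- {margin_of_error(0.50, n) * 100:.1f} points")
```

The first line reproduces the 45.1 to 54.9 per cent range given above for 400 boxes.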

Why tiny samples do not work for estimation

If you think about what an error of +/- (plus or minus) 14 points means, you realize that the actual value for the whole population could be anywhere within a 28-point range of what was measured. For instance, when the sample’s value is 50 per cent, the actual value could be anywhere between 36 per cent and 64 per cent, if the whole population could be measured. Few would argue that this wide a range forms an acceptable basis for making a decision. Yet, that is all you get with a sample of 50.

If the sample is below 30, this formula breaks down, and we actually need a somewhat different set of calculations. However, as Figure 2.2 shows, errors balloon with small samples.
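
The same reasoning can be turned around to answer the earlier question of how many boxes you would need to pull out. The sketch below assumes the usual normal-approximation formula for a percentage and takes 50 per cent as the worst case. It is not meant for the very small samples where, as just noted, a different set of calculations is needed. The function name is ours.

```python
from math import ceil
from statistics import NormalDist


def sample_size_needed(margin, proportion=0.50, confidence=0.95):
    """Smallest sample giving at most `margin` (a fraction) of error for a
    proportion, using the normal approximation. 50 per cent is the worst case."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z / margin) ** 2 * proportion * (1 - proportion))


for points in (10, 5, 3, 1):
    n = sample_size_needed(points / 100)
    print(f"+/- {points} points at 95% confidence: about {n:,} boxes")
```

Halving the margin of error roughly quadruples the sample needed, which is why very tight margins become expensive quickly.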

The risks in estimating based on small groups

As mentioned, gatherings of small groups for interviewing, or focus groups, typically range between 5 and 15 people. In one or two of these, anywhere between 5 and 30 or so people in total have offered their opinions. Unless the item being discussed is a complete and unmitigated disaster (as can happen) or an astronomical, unprecedented success (as we have yet to see), you need far more people to make a sound final decision.

So why do focus groups?

Aside from detecting disasters, these groups have several valuable applications. You learn about the language that people use in discussing the product, and in particular the terminology that they can understand. You can find out if what you are saying is hopelessly confusing from the perspective of people who actually use the product. You can open up unexpected avenues for further questioning. You can learn some humbling lessons about how much the people who use your fine product think about it (generally far less than one would imagine).

A tale of misleading small samples

This is a story about a decision that was made in entirely the wrong way. It appeared in a training video, and even briefly on television. This was quite a few years ago, and the tape seems to have vanished into the mists of time. Nonetheless, it has been documented and stands as a clear example of what not to do.

Those involved were a marketing team from a major manufacturer of men’s jeans. They wanted to evaluate a new product, men’s suits bearing their brand name. They started their investigation with a massive quantitative research study, almost sociological in nature. The questions delved deeply into basic needs for clothing, for social approval and so on. This part of the programme cost in excess of $800,000 in current dollars. Following this, they concluded with a few focus groups to evaluate their new product idea.

They had prototype suits made and asked a few small groups of men to give their reactions, in particular to the question, ‘Would you buy these?’

Using these few responses as the basis of their decision, they put the suits into production, distributed them to stores and promoted them through national advertising. If you cannot think of a brand of men’s suits made by a manufacturer of men’s jeans, this is not because you missed something. This venture was an abject and expensive failure. They could not have made such a mistake had they used adequate numbers and methods to inform their decision.

A suggestion for small versus large numbers

Using small numbers, you can get an overview, something like looking at a general view of a landscape that you will traverse. Larger numbers are like the detailed directions about where exactly you will be going. Having both gives you the most assurance that you will get where you need to go as efficiently as possible.

Concluding our talk about significance

Figuring out how much error you find in a percentage is the simplest form of significance testing. The formulas get more complex for comparing more than two samples, for figuring out the error in a sample average (such as the average rating on a 10-point scale), for comparing scores in a group to other scores in the same group, and in a host of other statistical procedures.

By the way, all these formulas work correctly only for what is called a random sample – meaning that you do not have any linkages among the people or items you are sampling. If you are dealing with, say, doctors together in a given hospital, and even hospitals together in a given geography, more complex formulations for testing are needed to get accurate results.

That said, most statistical testing procedures involve determining how large the error in the measurement (or the effect) is and comparing that to the measurement itself. Getting the effect just 1.96 times as large as its calculated error is enough to pass in many procedures. (We will not be going into all the maths behind this number, fortunately.) This may seem like not much to demand, but this has been the standard for many years.
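
For readers who would like to see where the 1.96 comes from and how the comparison works, here is a minimal sketch. It assumes a simple test of one percentage against a fixed baseline. The 56 per cent result, the 50 per cent baseline and the sample of 400 are invented, and the function name is ours.

```python
from math import sqrt
from statistics import NormalDist

# Where 1.96 comes from: the two-sided 95 per cent point of the normal curve.
critical = NormalDist().inv_cdf(0.975)
print(f"critical ratio: {critical:.2f}")


def effect_to_error_ratio(sample_pct, baseline_pct, sample_size):
    """How many standard errors a sample proportion sits from a baseline."""
    std_error = sqrt(baseline_pct * (1 - baseline_pct) / sample_size)
    return (sample_pct - baseline_pct) / std_error


# Assumed example: 56 per cent of 400 people prefer a product. Is that
# different from an even 50/50 split?
ratio = effect_to_error_ratio(0.56, 0.50, 400)
print(f"ratio = {ratio:.2f}, passes: {abs(ratio) >= critical}")
```

Here the six-point effect is 2.4 times its calculated error, so it passes. A four-point effect with the same sample would come in at 1.6 times its error and would not.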

On to correlations

In casual conversation, correlation can stand for almost any relationship. In statistics, though, correlation only means a simple summary measure of how well two variables fit a straight line. Correlation can range from a high of +1, where the two variables rise and fall in a perfect straight-line relationship, down to –1, where the two variables have a perfect inverse relationship (one falls precisely as the other one rises).

The correlation statistic is called R. This should not be confused with R, the free statistical software package. (Apparently, those who invented R were thinking primarily of an earlier software program called S. This is really true.)

However, the R statistic is the underpinning of the R-squared value that may be familiar from regression. Regression in its ordinary version (also called ordinary least squares regression) does indeed look for straight-line relationships.

Correlation also is strictly for a pair of variables. If we hear talk about four variables being correlated, either we are listening to some slightly sloppy language, or someone has checked all the ways in which these four variables can be compared two at a time.

NotalltheworldfallsintoalineManyregular,predictablerelationshipsdonotfallinastraightline,or are not linear. Figure 2.3 shows three examples. In each, wefollow the standard practice of indicating the value of the firstvariablealongthehorizontalaxisandthesecondalongthevertical.

FIGURE2.3Noteverythingfallsintoastraightline

The first diagram (Typical Growth Curve) shows a standard growth curve. You might see this pattern in nature when a colony of mould grows in a Petri dish. It takes a while for growth to become noticeable. Then growth takes off and stays at a fairly steady rate. In time, growth reaches a saturation level. It slows and then stops.

We also often see this curve with the adoption of products. A small brand has a great deal of trouble making itself visible, but once it reaches a certain threshold, growth comes much more rapidly. Finally, the brand reaches a point where it has saturated its market and further growth becomes very difficult.

Even though a straight line does not fit this growth curve pattern well, the correlation could still be deceptively strong, even as high as 0.9, misleadingly suggesting that the relationship is a straight line. Without looking at Figure 2.3, and looking just at the 0.9 correlation, it would be easy to make this mistake.

In the last two examples in Figure 2.3, the correlation actually is zero. If we paid attention only to this one statistic, we might be fooled into thinking that there was no relationship at all. We can in fact find that there is a perfect correspondence in the latter two relationships, but it is not linear.
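Both effects are easy to reproduce. The sketch below uses two invented curves, an S-shaped growth curve and a symmetric U-shape; these are assumptions chosen for illustration and are not the actual curves drawn in Figure 2.3.

```python
import math

def pearson_r(x, y):
    """Correlation (R) between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Assumed S-shaped growth curve: slow start, rapid growth, then saturation.
x_growth = list(range(21))
y_growth = [1 / (1 + math.exp(-(x - 10) / 2)) for x in x_growth]

# Assumed U-shaped curve: y is determined exactly by x, but not along a line.
x_u = list(range(-5, 6))
y_u = [x ** 2 for x in x_u]

r_growth = pearson_r(x_growth, y_growth)
r_u = pearson_r(x_u, y_u)

print(f"growth curve: R = {r_growth:.2f}")  # strong, although the curve is not a line
print(f"U-shaped curve: R = {r_u:.2f}")     # zero, although y follows x exactly
```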

We hope that there is no longer any question about needing to set sights on something other than a ‘significant correlation’. We need something stronger and more focused. And, as a reminder, we typically cannot spend the time and money involved in simply putting a product or service out in the marketplace and seeing how it does. In some places, we can do a limited marketplace test (such as testing out a promotion on the web), but again, these do not give us much chance to experiment with alternatives. So when planning a project, what should we have in mind as our criteria for deciding if the approach we have chosen should succeed?

Correlation is a simple summary measure of how closely two variables fall compared to a perfect straight-line relationship. Correlation is known as R in statistics. R-squared is indeed the correlation value squared, and is the most widely used measure showing how well a regression model performs in fitting values to a straight line.

How do I plan to evaluate the results?

Some measures have been used for many years in judging the goodness of statistical models. Correct classification and explained variance are likely to be the most familiar. Newer measures include information criteria such as the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). You may come across many other measures if you spend time around statistical types, but these are a start.

Correct classification seems intuitively most approachable. This measure is used for categorical data and data that can take just a few values, such as ratings on a 1–5 scale. We need to use different types of testing for different types of data. Before we go much further, we should probably review the basic types of data.

The basic types of data

Categorical data uses numbers as placeholders for non-numerical values. For instance, you could use the numbers 1, 2, 3 and 4 to stand for four regions of the country, such as north, south, east and west. Clearly ‘west’ is not worth four times as much as ‘north’ (even though this might be disputed by people living in the west). Here we are simply putting the categories into a numeric form that is easier for the computer to manipulate and store than its text equivalent.

Ordinal data shows which values are larger than which others, and so provides an order to the values. But it does not show how much larger one value is than another. For instance, if we have three contestants in first, second and third place, we know only the order in which they are placed, and not how much better first did than second or second did than third.

Interval-level data gives us a set distance between the values, but does not allow us to say that one value is a certain multiple of another. Temperature is an example of this kind of measure. For instance, if it is 40 degrees outside, we cannot say that it is twice as hot as 20 degrees. We can say that 40 degrees is 20 degrees more, and 60 is 20 more than 40, and so on.

Ratio-level data gives us the most comparative information. Weight is an example of this type of measure. We can indeed say that if Hans has 20 kilos of sausages, he has twice as many sausages as Fritz, who has only 10. (We also can say Hans has 10 kilos more, the lucky fellow.)

Ratio-level data and interval-level data can be lumped together as continuous data, that is, data that can take any value, including fractional or decimal values. By contrast, if you have a five-point scale, there is no value between (for instance) four and five. Similarly, there are no values between the ‘1’ assigned to ‘north’ and the ‘2’ assigned to ‘south’ in our categorical data example.

There have been, and probably continue to be, many arguments about whether the scales used in many surveys are just ordinal-level data or if they are interval-level. That is, you can argue that ‘5’ on a five-point rating scale is the same distance from ‘4’ as ‘4’ is from ‘3’. That would make the scale interval-level if true. (However, this scale is definitely not ratio-level: you cannot say that a rating of ‘4’ is twice as good as a rating of ‘2’, and so on.)

At times you will see scaled ratings treated as if they were interval data or ratio data. For instance, you will see the average rating on a five-point rating scale shown as (say) 4.3, even though no one could possibly have given that exact response. In spite of the fact that treating scaled ratings in this way violates basic rules and guidelines, and there are better things you can do, we find that some methods hold up fairly well when scaled ratings are misused in this way.

Back to correct classification

If a person is correctly classified based on a mathematical model, this means that the model can apply some rules or manipulations to a set of predictor variables and correctly predict the score that person would have, or the group to which the person would belong. That is, we aim at a target value for each person by constructing a model, then check to see if prediction matches reality.

Some models will take multiple passes at the data, refining the model at each step. When the analysis can do no better, it then compares the number of correct and incorrect values, and gets a percentage.

Figure 2.4 shows an example of a correct classification table. The ‘yes’ and ‘no’ responses are predicted by the model much better than ‘maybe’. The dark squares in the figure show the percentages correctly predicted for each type of response. The ‘yes’ responses are predicted best – some 82.8 per cent of the 9,651 people who said ‘yes’ were correctly identified by the model. Conversely, ‘maybe’ responses are incorrectly predicted more often than not – some 66.4 per cent of these are misidentified as having said ‘yes’.

FIGURE 2.4 Correct classification table – 63 per cent correct overall

NOTE Cells highlighted show percentages correctly predicted

When we multiply the percentages correctly predicted by the numbers in each group, add these up and divide by the total number of people, we get to the overall correct prediction level. This is 63 per cent, brought down by the low level of correct predictions for the group saying ‘maybe’.
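The arithmetic behind an overall figure of this kind can be sketched as follows. The counts here are hypothetical and were chosen only to show the calculation; they are not the values behind Figure 2.4.

```python
# Rows are what people actually said; columns are what the model predicted.
# All counts are hypothetical.
confusion = {
    "yes":   {"yes": 660, "maybe": 60,  "no": 80},
    "maybe": {"yes": 330, "maybe": 100, "no": 70},
    "no":    {"yes": 90,  "maybe": 50,  "no": 560},
}

# Number of people in each actual group.
group_sizes = {actual: sum(row.values()) for actual, row in confusion.items()}

# Share of each group that the model predicted correctly.
pct_correct = {
    actual: row[actual] / group_sizes[actual] for actual, row in confusion.items()
}

# Weight each group's share by its size, then divide by the total.
total = sum(group_sizes.values())
overall = sum(pct_correct[g] * group_sizes[g] for g in confusion) / total

for group, share in pct_correct.items():
    print(f"{group}: {share:.1%} correct of {group_sizes[group]}")
print(f"overall: {overall:.1%} correct")
```

As in the text, a weak result for one group (here ‘maybe’) pulls the overall figure well below the levels reached for the other two.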

Explained variance

You will see this measure used with many procedures, such as different types of regression models, but also with some kinds of classification tree analyses and in factor analysis – and you will see it cropping up in many other places. Variance is the pattern of scattering in the data, and explained variance then is how much of the pattern of scattering in the values of some target variable the model is able to capture.

With regression-based models, explained variance is the (perhaps familiar) R-squared statistic. As a reminder, R-squared is the correlation value squared, so with regression this is a measure of how well the prediction from the model and the actual values of the target variable fall into a straight line. Explained variance is never negative and can range from zero to one (or alternatively from 0 per cent to 100 per cent).
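The link between explained variance and the squared correlation can be checked directly. This is a minimal sketch, assuming an ordinary least squares line fitted to one predictor and evaluated on the same data; the data values and function names are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least squares slope and intercept for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return slope, mean_y - slope * mean_x

def explained_variance(actual, predicted):
    """Share of the scatter in the actual values captured by the predictions."""
    mean_actual = sum(actual) / len(actual)
    unexplained = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - unexplained / total

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictor and target values.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.3, 6.9, 8.4, 8.8]

slope, intercept = fit_line(x, y)
predicted = [slope * a + intercept for a in x]

r_squared = explained_variance(y, predicted)
r = pearson_r(predicted, y)

print(f"explained variance: {r_squared:.4f}")
print(f"correlation squared: {r ** 2:.4f}")
```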

Information-based criteria

You may encounter these with some of the newer methods. For instance, the statistics program SPSS has a newer clustering method that they call TwoStep, and it shows the AIC and BIC statistics. These measures are good for comparing statistical models. A smaller score is better. How much smaller is open to interpretation, but an often-cited standard for the BIC is that a difference of two gives weak evidence that one model is better than another, and a difference of six gives strong evidence. There are no absolute standards for either of these tests, though. Also, many other information-based tests exist, but do not seem to get much use in the methods we usually encounter.
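For readers who want to see what sits behind these scores, the sketch below uses the standard definitions of the two criteria: AIC is twice the number of parameters minus twice the log-likelihood, and BIC replaces the factor of two with the log of the number of observations. The log-likelihoods, parameter counts and the `bic_evidence` helper are hypothetical; the helper is only one possible reading of the rule of thumb quoted above.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion; smaller is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; smaller is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

def bic_evidence(bic_a, bic_b):
    """One reading of the rule of thumb: 2 = weak evidence, 6 = strong evidence."""
    difference = abs(bic_a - bic_b)
    if difference >= 6:
        return "strong"
    if difference >= 2:
        return "weak"
    return "negligible"

# Two hypothetical models fitted to the same 500 observations.
n_obs = 500
model_a = {"log_likelihood": -520.0, "n_params": 4}
model_b = {"log_likelihood": -505.0, "n_params": 7}

bic_a = bic(model_a["log_likelihood"], model_a["n_params"], n_obs)
bic_b = bic(model_b["log_likelihood"], model_b["n_params"], n_obs)

print(f"AIC: A = {aic(**model_a):.1f}, B = {aic(**model_b):.1f}")
print(f"BIC: A = {bic_a:.1f}, B = {bic_b:.1f}")
print(f"evidence for the model with the smaller BIC: {bic_evidence(bic_a, bic_b)}")
```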

One more criterion: the RMSE

You also may see mentions of the RMSE, which stands for the root-mean-squared error. Like the information-based criteria, it is useful for comparing models. However, it is difficult to interpret directly, as it does not fall into a fixed range the way explained variance does. Rather, it is in the units of the basic measurement you are making. For instance, an RMSE value of 0.7 is good if the basic measurement goes up to 1,000, but not good if it goes up to 1.
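A short sketch of the calculation may help. The `rmse` function follows the name of the measure (square the errors, average them, take the root); the `share_of_scale` helper is a hypothetical convenience for putting the same error in context, not a standard statistic.

```python
import math

def rmse(actual, predicted):
    """Root-mean-squared error, in the units of the measurement itself."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def share_of_scale(error, scale_max, scale_min=0.0):
    """Hypothetical helper: the error as a fraction of the measurement's range."""
    return error / (scale_max - scale_min)

# Hypothetical actual and predicted values.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.5, 2.0, 8.0]
print(f"RMSE: {rmse(actual, predicted):.3f}")

# The same RMSE of 0.7 on two different measurement scales.
print(f"scale up to 1,000: {share_of_scale(0.7, 1000):.2%} of the range")
print(f"scale up to 1: {share_of_scale(0.7, 1):.2%} of the range")
```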

What are good results?

‘All models are wrong but some are useful’ – GEORGE E P BOX

George Box was talking about the fact that any mathematical representation of the real world, even the best, leaves out some aspects. This is of course correct. Mr Box was an eminent statistician, and an excellent source of guidance on matters of interpreting data.

Knowing that we are shooting for a model that is ‘good enough’ and useful, we steer directly into some disagreement. No hard and fast rule exists for what makes for a ‘good enough’ score with the evaluative criteria we are discussing.

Few argue that anything less than 50 per cent correct could count as good. But beyond this, it becomes a matter of personal preference. Still, more than a few models have been accepted as useful, even with lower levels of being ‘correct’. Sometimes one goal of an analysis is to determine just how well a specific question can be answered. In that case, a low level of performance is itself informative, showing that a specific approach or data set cannot provide strong guidance.

An example of a poor result being helpful

At a major pharmaceutical manufacturer, senior management decided that poor sales of a given drug had to be due to the sales force not adequately explaining the benefits of this fine product to the doctors who they regularly visited (pharmaceutical company salespeople, called representatives, are sent to doctors’ offices, hospitals and clinics to describe the wonderful benefits of their particular brand of drugs). An elaborate study was set up, in which hundreds of doctors were contacted within 48 hours of a visit. No matter how the data was analysed, no more than 28 per cent of the variance in sales could be attributed to doctors correctly recalling the message about the drug. Management found this conclusion difficult to accept, but after enough attempts with different analyses, they reluctantly decided that the problems lay outside the sales force.

Be ready for strong preferences in testing

You may encounter guidelines or benchmarks – certain levels of test performance that must be met or a solution will get rejected. Some of these can be very strongly held, even if they seem arbitrary. For instance, one organization would not accept any solution that did not reach 70 per cent explained variance, and another insisted just as strongly on two-thirds (this latter threshold works out to roughly 66.67 per cent, which not only is arbitrary but transparently looks arbitrary). Understanding expectations about what passes as ‘good’ is a critical part of planning the project.

Results and validation

Results are considered stronger and more indicative of what the whole market would do when they are validated. We have not yet discussed validation. So perhaps this is the time. Validation is not a new idea, but the notion that you should use it regularly has gained support as data sets have become larger and models have grown more complex.

Validation involves first building the model on part of the data while holding aside the rest of it. You then try out the model on the part of the data that you held to one side. When you try the model on this so-called hold-out sample, predictive accuracy usually comes in at a lower level than when you simply look at how the model performed where it was made.

Even the best predictive modelling technique will fit some random bumps and fluctuations that are found only in the data set on which a model was built. Trying out the model elsewhere, even on another part of the same data that you set aside, gives you some safeguards against overfitting to seeming patterns that you will not find in the outside world.

Validation typically comes built in by default only in newer statistical routines, such as those that do the Bayes Nets analyses we discuss in Chapter 7. More traditional methods, such as linear regression, often do not include validation methods, as they grew up in a time when computing power was not up to doing this type of testing as a matter of course.

Validation means putting part of your data aside in a hold-out sample, building the model with the remainder, and then testing the model with the hold-out portion. This is supposed to safeguard against building a model that will work well just with your particular data set. Getting a model that fits the random fluctuations that appear in your data, but not elsewhere, is called overfitting. Validated results usually have lower predictive accuracy than a model built with all the data, and so are considered more realistic in judging the goodness of a model. Validating results is advisable where possible, and particularly so with larger data sets.
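The hold-out procedure described above can be sketched with nothing more than a random split and a simple straight-line model. Everything here is an assumption made for illustration: the 30 per cent hold-out fraction, the simulated data and the function names are hypothetical, and real projects would normally rely on the validation tools of their statistical package.

```python
import random

def train_holdout_split(rows, holdout_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and set part of them aside as a hold-out sample."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

def fit_line(rows):
    """Ordinary least squares slope and intercept from (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x, _ in rows))
    return slope, mean_y - slope * mean_x

def explained_variance(rows, slope, intercept):
    """Explained variance of an already fitted line on any set of (x, y) pairs."""
    mean_y = sum(y for _, y in rows) / len(rows)
    unexplained = sum((y - (slope * x + intercept)) ** 2 for x, y in rows)
    total = sum((y - mean_y) ** 2 for _, y in rows)
    return 1 - unexplained / total

# Simulated data: a straight-line relationship plus random noise.
noise = random.Random(42)
rows = [(x, 2.0 * x + noise.gauss(0, 40)) for x in range(100)]

train, holdout = train_holdout_split(rows)
slope, intercept = fit_line(train)  # the model is built on the training part only

r2_train = explained_variance(train, slope, intercept)
r2_holdout = explained_variance(holdout, slope, intercept)

print(f"explained variance where the model was built: {r2_train:.3f}")
print(f"explained variance in the hold-out sample: {r2_holdout:.3f}")
```

With a model this simple the two figures tend to be close, and in any single split the hold-out figure can occasionally come out higher; the gap usually widens as models grow more complex.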

It is always sound practice to validate with larger data sets. In these, many effects may seem to be meaningful simply because you have so much data. You may recall that statistical tests start to break down with masses of data, because every effect or difference seems significant. When you have a huge data set, it is entirely feasible to put some of it to one side and have ample amounts left for building a complex model. So particularly if you have a lot of data, testing with a hold-out sample is worth including in the analysis.

Know what sensible goals might look like

Any analysis or project should be built around reaching the desired result, which may seem obvious but often is not easy to do. One common difficulty lies in aligning what people want with what they can reasonably expect.

Sometimes unreasonable expectations follow analyses that basically make sense. Too often, this happens because no one working on the project has discussed what the results are likely to show with the people who need to use them. (By the way, you may see these people called stakeholders.) For the people who will apply the results, you also need to remind them about what you have been doing along the way, if elapsed time to completion is more than a couple of weeks. It is surprising what very busy people can forget in just a short time.

There probably is no way to stop some people from going into projects with unrealistic expectations. When considering what might be reasonable, the first question is whether your source of data can address the question at hand. For instance, it is rational to look at customer transactions to estimate how many will respond to a new offer similar to a recent one. However, it is not sensible to turn to this data source and expect to glean ideas for developing a splendid new product. (In the following chapters we will go over subtler problems that can arise from starting with the wrong sources.)

Summary of key points

Starting considerations

This chapter talks about first steps in planning a project. Several considerations are key at the outset, and while some may seem apparent, it seems many a project has lurched into life without considering all of them. Key among the questions to ask is what can change based on the analysis. Following from that, you must also determine what you really need to know to help that change happen. You also must understand who you are talking about or talking to – and avoid the trap of just running with whatever comes easily to hand. Thanks to the web, there is more data that will prove to be wrong for answering your question than ever – so this is a particularly key caution.

The place from which you draw your data more technically can be called a sample frame. As with the frame for a picture, if you pick one that is the wrong size or that obscures the picture in part, you will not be seeing what you should.

The most to expect

You need to consider the most you can expect from your data, or less positively put, the limitations of different kinds of analyses. With historical or transaction data, you can reasonably expect to forecast what will happen if you do something similar to what you have already been doing. You should not expect to make a fantastic new product with this, though.

With standard questions and answers from surveys, you can predict more than you might have suspected, particularly with some newer analytical methods (which we will discuss). However, you are more likely to get overall guidance about what you need to address, rather than specific direction. For instance, you may find that you can increase share by improving customer service, but not precisely in which ways or by how much. The experimental methods, particularly conjoint and discrete choice, can provide that specific level of information.

These latter methods, also called trade-off methods, are the best suited for determining the extent to which specific changes will lead to changes in marketplace behaviour. Even discrete choice modelling, the most realistic and powerful of these methods, still will only forecast share of acceptance (rather than actual market share) unless you can adjust at a minimum for how many are aware of the product in the marketplace and how widely the product is distributed.

The other trade-off methods, MaxDiff and Q-Sort/Case 5, provide clearly differentiated importances for lists of items such as claims, messages, or specific sets of product or service features. They cannot test the effects of changes in features on whole products or services. Both of them provide information at the ratio level, meaning you can say (for instance) that one feature is twice as important as the other. MaxDiff provides importances for every person, while Q-Sort does so only for groups. Q-Sort can prioritize many more items than MaxDiff, though, up to 100 (versus up to 35).

Setting up to judge the results

You will do the best, and gain widest acceptance of results, if you decide how the results will be judged at the outset of the project. People commonly ask about whether results are ‘statistically significant’ or ‘correlated’ or even ‘significantly correlated’, but these are usually not what they want for evaluation.

Look at significance and more

Statistical significance in fact is widely misunderstood. It is not the chances that something actually is happening. Rather, it is the chances that you are avoiding making a false claim. That is, the null hypothesis, other than being a term filling many of us with dread, means the belief that nothing is happening. When you are 95 per cent confident you can reject that belief, voila, that is significant. This is called avoiding a Type I error.

Seeing something that actually is happening is different. That is measured by statistical power, which we should be seeing discussed a great deal more than we do. Power goes down as you demand higher levels of statistical significance – the more certain you have to be that you are not making a false claim, the more likely you are to miss something that is really happening.

Significance testing starts to break down with very large numbers in a sample or with many tests being done. Some newer methods try to deal with this. Still, with large numbers, significance should be seen as a bottom floor – that is, if an effect or difference does not pass, you can be sure nothing is happening.

Significance testing always needs first to make sense. The main question is whether a statistically significant difference or effect you are finding is meaningful.

Sample size, sample error and the dangers of small samples

You often deal with a portion, or sample, of all the possible data you might use. All samples have some error. We can determine in advance what sample error will be if we know the sample size and the percentage we are measuring. Sample error goes up dramatically as samples fall below about 150.

A surprising number of decisions get made with little or no data, even in the face of this supposedly being an era of ‘big data’. Too often, decisions get supported by talking to no more than a handful of consumers, often in the (likely familiar) setting of a focus group. These groups have many valuable uses, but making a final decision based on them is not one of those.

Using small numbers, you can get an overview, something like looking at a general view of a landscape that you will traverse. Larger numbers are like the detailed directions about where you will be going. Having both gives you the most assurance that you will get where you need to as efficiently as possible.
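The way sample error grows as samples shrink can be seen with the usual margin-of-error formula for a percentage. This is a sketch under stated assumptions: a simple random sample, a 95 per cent confidence level (the 1.96 multiplier), and a measured percentage of 50; the function name is hypothetical.

```python
import math

def sample_error(percentage, sample_size, z=1.96):
    """Margin of error, in percentage points, for a percentage from a simple random sample."""
    p = percentage / 100
    return 100 * z * math.sqrt(p * (1 - p) / sample_size)

# Error around a measured 50 per cent at several sample sizes.
for n in (10, 30, 50, 150, 400, 1000):
    print(f"sample of {n:>4}: plus or minus {sample_error(50, n):.1f} points")
```

Under these assumptions a sample of 150 gives an error of about eight points either way, and the error widens quickly for anything smaller.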

Better testing methods

Correct classification and explained variance are two measures commonly used to assess the goodness of results. Correct classification is applied in models that have categorical target or dependent variables. Explained variance is used in models that have a continuous target variable and in some other methods that do not have a target variable, such as factor analysis.

That said, we wander into considerable disagreement about how good is good enough. No model is a perfect representation of the outside world, so we have to decide how good an approximation we will accept. Strong biases may exist about what is good enough. You should find out about these before you start the project, so you have a threshold for what is acceptable.

You may also run into various information-based criteria for testing the goodness of models. Most common among these are the Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC). These are always relative, used to compare two or more models to determine which is mathematically best.

Another relative measure you may encounter is the root-mean-squared error, or RMSE. Like the information-based criteria, it is useful for comparing models, but difficult to interpret directly.

The term correlation is often inaccurate shorthand

Correlations are taken to mean almost any relationship in casual conversations, but in statistics mean only how closely two variables fall into a straight line. Many regular relationships among factors in the real world do not fall into a straight line. Correlations also measure only straight-line relationships between a pair of variables. With more than two things at a time, we need to talk about other measures.

Results and validation

Results are considered stronger and more representative of what you will find in the real world if they are validated. This involves first putting aside part of the data and building the model on the rest. You then test out the model on the part of the data you did not use to build the model, the so-called hold-out sample. Results usually are not quite as strong in the hold-out sample as in the part of the data you used to build the model. Even the best possible models usually fit some irregularities, lumps and bumps that are peculiar just to the data set at hand – and that you will not find in the outside world.

Know what sensible results will look like

One common difficulty lies in aligning what people want with what they can reasonably expect.

Sometimes unreasonable expectations follow analyses that basically make sense. Too often, this happens because no one working on the project has discussed what the results are likely to show with the people who need to use them.

You may at times encounter unrealistic expectations. A good way to deal with these is considering carefully whether your source of data can address the question at hand. Then explain how reasonable outcomes might look.

03 Conjoint, discrete choice and other trade-offs

Let’s do an experiment

This chapter addresses best methods for developing new products or services or combinations of messages – allowing you to predict responses to many hundreds or thousands of alternatives by testing a small, scientifically selected fraction of them. These are the experimentally designed approaches, also called the trade-off methods. At their simplest, they can provide clear differentiation in preferences. At their most complex, they can accurately simulate what will happen under new circumstances in the real world. These fall into three broad classes: discrete choice, conjoint analysis and other forced trade-off exercises. These all fall towards the end of the continuum of methods that require high planning for engagement with consumers and have high analytical requirements. We will discuss briefly how each method evolved and its relative strengths and weaknesses. These methods have been greatly expanded and strengthened by machine learning approaches. The next chapters will show how.

The reasons we need these methods

These methods were developed to address a salient problem with questions asking about what is important. We cannot get at the right answer by simply asking people to give us ratings such as ‘How important is this to you on a scale of one to five?’ When this type of direct question is posed, people cannot or will not give answers that reflect what they truly value the most.

For instance, suppose you wanted to develop a new floor-standing wine cooler, and your product team came up with a variety of possible features, including some that were just slightly far-fetched. If we asked an average consumer questions about how important these features were using a standard set of importance rating scales, we would get a pattern like the one shown in Figure 3.1.

FIGURE 3.1 Rating the features of a floor-standing wine cooler

When you ask for ratings in a survey, everything becomes highly important. This is a real problem. Morwitz (in Scott Armstrong’s Principles of Forecasting, 2002) did a very thorough review of 60+ years of research about trying to predict behaviour with scaled ratings, and found no consistently good way to use them.

There is one possible exception: if you have masses of historical sales data, a great deal of historical ratings data, the product category is not changing, and its buyers are not changing. As you might guess – this combination is not likely. However, if all these conditions could be met, then you would have norms and the question of interpreting the scaled ratings would come down to referring to historical patterns.

CASE STUDY The essential 3 am banker

A major bank entered the Chicago market once several Byzantine laws restricting banks from having many branches were nullified. It bought a whole portfolio of smaller banks, proudly emblazoned them with its logo, and put large signs in the windows saying, ‘Talk to a live banker 24 hours a day!!!’

After a month or two, these signs were changed to say ‘Talk to a live banker until 1 am!!’ Not long afterwards they changed again – to read ‘Talk to a live banker until midnight!’ Before the seasons changed, the signs changed again – to read ‘Talk to a live banker until 10 pm’.

After that, the signs came down, or perhaps fell down due to the weight of all the patches with revised times that had been placed on them. Clearly, not many people actually woke up at 3 am, slapped themselves on the forehead, and said, ‘Oh my, I really need to talk to a banker!’

Yet the bank had behaved as if they expected people to do this. Why did this happen? Conversations with those involved revealed that they had asked people, using scaled importance ratings, what would be important in a shiny new bank bearing their corporate name. When people were given the chance to provide these direct ratings, they often ticked ‘talking to a live banker’ as critical. Yet it clearly was not crucial in their actual banking. It was just that they could not say ‘no’ to what looked like a free offer.

The people who framed the questions were rueful about this, referring to organizational pressures, not having enough time – and that, besides, someone else made the final decision about the survey. Excuses aside, they got useless answers and wasted a chance to enter a new market with a more valuable offer.

They would have done much better using one of the trade-off methods we are discussing. In future efforts they did, avoiding the mistake of acting as if bankers were essential at 3 am. Their later offerings were substantially better received.

Shortcomings in scaled importance ratings

In the typical situation, where you do not have a long and applicable history, scaled importance ratings suffer from numerous well-documented problems. Salient among them are these:

acquiescence bias;
straight-line and extreme responses;
socially desirable responses;
cultural skew.

Acquiescence bias means the tendency of people to respond to questions with positive responses. That is, most people are predisposed to avoid negatives and will choose a more flattering response if possible. For instance, the vast majority of responses on a five-point importance rating scale do indeed fall into the top two points of the scale (the ratings corresponding to the most importance).

Straight-line or extreme responses arise from limited cognitive effort being put into the interview task. Some individuals will agree to do the interview but then repeatedly check either end of the rating scale (highest or lowest), rather than thinking of more subtly differentiated responses. One alternative pattern of giving straight-line response that is fairly common is a person’s repeatedly checking the second highest box on a rating scale, such as ‘4’ on a five-point scale.

Socially desirable responses reflect what people believe should be said, rather than what they actually believe. For instance, if we were to ask for direct ratings of the importance of safety features in a new car, nearly all people would rate them as ‘essential’. Yet many cars in the marketplace continue to sell well in spite of indifferent or even below-average safety ratings.

Cultural skew reflects the well-documented differences found in ratings among people with different cultural backgrounds. For instance, doctors in Japan are notoriously hard in ratings. They might rate their favourite product at ‘7’ on a 10-point scale. Doctors in Latin America, conversely, might rate all products at ‘9’ or ‘10’, even products that they would never use. Sometimes, even finer geographic differences matter. For instance, people in large northern cities in the United States tend to give lower ratings than people in southern, less urban areas.

These problems led to adoption of new methods

The inherent unreliability of rating scales led to a search for, and common adoption of, new methods. These have been put under the broad heading of trade-off methods – most also are designed like scientific experiments. All of these methods ask respondents to weigh specific elements or features of a product, service, claim or message against each other.

Problems with traditional scaled importance ratings are numerous, well known and thoroughly documented. All of these together lead to ratings of this type providing unreliable results for predicting behaviour. Salient among the factors that make these ratings unhelpful are: acquiescence bias (tendency to rate positively); straight-line or extreme responses (from people who do not put in effort answering questions); socially desirable responses (or the common tendency to say what it seems should be said, rather than really answering); and cultural skew (different use of scaled ratings depending on social background). Practically speaking, nearly everything rated tends to become highly important or critical, because it costs nothing in a survey to say that everything is important. These problems led to the widespread adoption of various trade-off methods, in particular, the experimentally designed ones we discuss in this section of the book.

It is entirely possible that the severe problems with scaled importance ratings for use in prediction led to a devaluation of the entire field of survey-based research. Many practitioners still have not accepted the idea that these trade-off methods, if done reasonably well, can overcome these limitations.

The basic thinking behind the experimentally designed methods

These methods, as a reminder, also are called trade-off studies. They measure responses to distinct features or distinct variations of definite features. That is, products, services and messages are assumed to be collections of features that are measurable and comparable. These methods also assume that the value of each feature can be traded versus the value of other features.

This is as far into the psychology of decision making as these methods go. They do not address the underlying intricacies of decision making – it seems more accurate to consider that they are aiming to capture the outcome of the decisions rather than their inner workings.

Features can be broken into discrete variations or levels

Another implicit assumption is that features that vary continuously in the real world can be measured at specific fixed values. Each distinct variation of an attribute is called a level.

For instance, suppose a course of medical treatment could be priced anywhere between £2,000 and £9,000. Several distinct prices would be chosen to measure in this range, such as: £2,000, £4,500, £6,800 and £9,000. These and only these get measured directly. Other points in between are estimated by interpolation.

As we will see, in any of these methods, choosing the right points to measure is critical. You also need to keep to as few as possible that measure what you need to know. This all arises from the requirements of designed experiments.
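Estimating points between the measured prices can be sketched as below. The four price levels come from the example above; the value attached to each level is hypothetical, and straight-line interpolation between neighbouring levels is an assumption, since the text does not say which form of interpolation is used.

```python
from bisect import bisect_right

# The four price levels from the example (in pounds).
measured_prices = [2000, 4500, 6800, 9000]
# Hypothetical value measured for each level; higher means more appealing.
measured_values = [1.2, 0.5, -0.4, -1.3]

def interpolate(price):
    """Estimate the value at a price lying between two measured levels."""
    if not measured_prices[0] <= price <= measured_prices[-1]:
        raise ValueError("price lies outside the measured range")
    # Index of the measured level at or just below the requested price.
    i = min(bisect_right(measured_prices, price) - 1, len(measured_prices) - 2)
    x0, x1 = measured_prices[i], measured_prices[i + 1]
    y0, y1 = measured_values[i], measured_values[i + 1]
    return y0 + (y1 - y0) * (price - x0) / (x1 - x0)

for price in (2000, 3250, 6800, 8000):
    print(f"£{price:,}: estimated value {interpolate(price):+.2f}")
```

Prices outside the measured range are rejected rather than extrapolated, which reflects the point that only the chosen range is covered by the study.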

The feature level with the most utility wins

The values of the various attributes and their variations or levels are measured in abstract units called utilities. These are used as a kind of bookkeeping, as a way of measuring everything on the same footing. This leads to another assumption of these methods: that the level of each attribute with the highest utility will win in a choice. This does not mean that people look at all the attributes, or choose carefully. It also does not mean that they cognitively follow a utility-based decision process.

These methods are more usefully seen as aiming to match the outcome of the decision process, rather than trying to decipher inner workings. They do assume, though, that decisions at least are generally consistent. Some methods provide measures of how consistent people are when answering. Answers will pass thresholds for being consistent in trade-off studies – if those involved understand what they need to do. Making these studies clear is critical.
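The bookkeeping role of utilities can be illustrated with a small sketch. It assumes that the utilities of a product's levels simply add up and that the product with the highest total is the one chosen; both are simplifying assumptions made here for illustration, and the attributes, levels and utility values are all hypothetical.

```python
# Hypothetical utilities for each level of three attributes.
utilities = {
    "brand": {"A": 0.3, "B": 0.0},
    "price": {"low": 0.8, "high": -0.8},
    "warranty": {"1 year": -0.2, "3 years": 0.2},
}

# Two hypothetical products, each described as one level per attribute.
products = {
    "X": {"brand": "A", "price": "high", "warranty": "3 years"},
    "Y": {"brand": "B", "price": "low", "warranty": "1 year"},
}

def total_utility(profile):
    """Add up the utilities of the levels that make up one product."""
    return sum(utilities[attribute][level] for attribute, level in profile.items())

def predicted_choice(candidates):
    """Pick the product whose levels add up to the highest total utility."""
    return max(candidates, key=lambda name: total_utility(candidates[name]))

for name, profile in products.items():
    print(f"product {name}: total utility {total_utility(profile):+.2f}")
print(f"predicted choice: {predicted_choice(products)}")
```

Here the low price outweighs the stronger brand and longer warranty, which is the kind of trade-off these methods are designed to measure.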

What the methods ask – and get

Each of these methods gets a different type of information by using a specific form of questioning. The most complex, discrete choice modelling, shows a representation of the product or service choices in the context of competitive offerings – and asks for a decision. The least complex, the Q-Sort/Case 5 method, shows a list of features or claims and asks people to rank part of them. Let’s briefly go over what each method presents during an interview, in the hope of making this more tangible.

Discrete choice modelling

Study participants in a typical online interview see a series of screens showing them marketplaces or scenarios (or market scenarios) representing the main choices that they have. The features of the alternative products or services shown in these scenarios vary from one screen to the next. For each scenario they evaluate, study participants are told to think only about that particular set and, given that the choices have the features and prices shown, to choose the best of them. Many of these exercises are set up so that people also have the option of saying ‘none of these’.

Research has shown that people do better with these tasks if they are told to imagine that they really would like to buy something and to choose ‘none’ only if everything is truly unacceptable. Figure 3.2 shows an example of one screen from a discrete choice study done among computer hardware engineers.

FIGURE 3.2 A sample marketplace scenario for discrete choice modelling

Conjoint analysis

The term conjoint analysis is used to cover several related forms of analysis, as mentioned in Chapter 2. Some consider conjoint to be a broader heading and will include discrete choice modelling as a part of it. However, because these two methods arose from different disciplines, and because some of their basic ideas differ, we will discuss them separately.

Full profile conjoint is the first form of conjoint analysis to have gained widespread use. It shows an entire product or service described as a set of attributes. In a typical study, people look at a series of these product profiles and give each of them a rating. (Before these studies were routinely done online, people also could get a set of cards and either rate them or sort them into order from most to least preferred. Sometimes one product profile is still referred to as one card.)

Figure 3.3 shows one screen used in a full profile conjoint study about service from a telecom company. This study diagnosed which levels of service in various areas would be associated with higher levels of overall customer satisfaction – and by how much.

FIGURE 3.3 A profile of a service for conjoint analysis

By showing the entire service experience, this analysis overcame the problems inherent in asking for direct ratings of the importance of service features. Chapter 5 discusses this type of analysis and its outcomes in more detail.

FIGURE 3.4 A simple simulator for one product

Market simulators

Conjoint analysis and discrete choice modelling both can lead to market simulator programs. We will talk more about market simulators in the next two chapters. To help this become more specific, we will show a simple one now and explain its general features.

In Figure 3.4, next to the letter A, you see a set of controls that a user can change. Each time a new variation is chosen for a feature, the expected share of acceptance value (above letter B) changes. (In Chapter 2, we laid out the difference between share of acceptance and market share. If this is feeling distant, please check back there.) The shaded selection in each control corresponds to the way the product is now configured (very high fibre, textured formulation, no flavouring, two weeks’ supply and low price for its formulation). This is one of 3 x 4 x 4 x 4 x 4 possible variations, or 768 possible ways to make this product.

The share of acceptance expected with this product configuration appears above the letter B. The display shows how this value compares with the best possible product. Sometimes these displays include charts of numbers as well as graphs, and sometimes you would see just the numbers.

Near letter C, you see the specific effects of selecting each variation of a feature, when compared to the best possible variation. (This type of display does not appear in all simulator programs.) The figures near letter D show the price for the product (both daily and total).

Examples of market simulators with more than one product appear in Chapter 4. They contain more elements and so benefit from more explanation. More details about other types of output also appear in Chapters 4 and 5.
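The count of 768 possible products can be checked by listing every combination of levels. This is only a sketch: the text gives the level counts (one feature with three variations and four features with four), but not which feature has three, so the features are represented here simply by position.

```python
from itertools import product
from math import prod

# Level counts for the five features in the simulator: 3 x 4 x 4 x 4 x 4.
LEVEL_COUNTS = [3, 4, 4, 4, 4]

# Each configuration is a tuple of level indices, one per feature.
all_configurations = list(product(*(range(n) for n in LEVEL_COUNTS)))
n_configurations = len(all_configurations)

# The same count, obtained by multiplying the level counts directly.
n_by_multiplication = prod(LEVEL_COUNTS)
```

Both routes give 768, which is why a simulator is handy: nobody would want to page through that many products by hand.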

Maximum difference scaling (MaxDiff)

MaxDiff uses a list of features, attributes, messages or claims and provides a clear reading of relative importances. It breaks a list of about 8–35 attributes into sets, which can range from two to six at a time. For each set, it asks for the most important of the ones shown. With three or more in a set, it is also possible to ask for the least important. This is typically done in a web interview. A typical screen appears in Figure 3.5.

FIGURE 3.5 A survey task for maximum difference scaling (MaxDiff)

Unlike conjoint and discrete choice modelling, MaxDiff does not measure the relative importances of different levels or variations of attributes. Using its direct comparisons, it does not make sense to compare, for instance, ‘shelf stable for three months’ versus ‘shelf stable for six months’ – the longer time would always win if these two were to be compared. You could, though, compare ‘excellent shelf stability’ to other attributes.

The attributes being compared should not make up an entire product or service. Therefore, this method loses the distinct advantage of conjoint or discrete choice, in which specific features appear in the context of all other features.

MaxDiff provides relative importances at the ratio level. We can, for instance, say ‘Attribute N is four times as important as Attribute R’ after doing a MaxDiff analysis.

Q-Sort/Case 5

This was the first method developed (back in the 1920s). It uses a partial guided ranking of items. We use only part of the Q-Sort method, namely ranking the few best and few worst items in a long list of items. This leaves the rest ‘tied’ as unranked items between the best set and the worst set. In a list of, for instance, 50 items, we might ask for the top 10, then ask for the first, second and third best to be ordered – and then do the same with the bottom 10.

The rest of Q-Sort, which we do not use, sounds somewhat mystical. It talks about typing people based on how they sorted the items.

The analysis is carried out by a method called Thurstone’s Case 5, or ‘The Law of Comparative Judgements’. Thurstone had the advantage of working in the 1920s, when it was still possible to call a statistical analysis a law.

Using Case 5, rankings get converted into ratio-level data, as with MaxDiff. However, unlike MaxDiff, this method provides information only at the group level. Therefore, while Q-Sort can process more items than MaxDiff (lists of 100 items have been reported), it does not provide importances for the items for each individual.

Study participants would respond to something like the list in Figure 3.6. This method actually is not based on a designed experiment, although it is supposed to follow rules about how items are sorted and ranked.

FIGURE 3.6 A survey task for a Q-Sort/Case 5

Both MaxDiff and Q-Sort provide a listing and/or chart of relative importances for the attributes that were tested. These charts can apportion 100 per cent of importance among the attributes, or use an index. The index could show, for instance, how all attributes compare in importance to the average, with the average set to 100. (In this scheme, an attribute with an index of 400 would be four times as important as the average for the set.) Examples of these charts appear in Online Bonus Chapter 1, available at www.koganpage.com/AI-Marketing.
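The two ways of charting importances are simple rescalings of the same numbers. The sketch below assumes we already hold ratio-level importance scores for five attributes; the attribute names and raw scores are invented for illustration, chosen so that one attribute lands on an index of 400.

```python
# Hypothetical ratio-level importance scores for five attributes.
raw_scores = {"N": 16.0, "P": 1.0, "Q": 1.0, "R": 1.0, "S": 1.0}


def share_of_100(scores):
    """Apportion 100 per cent of importance among the attributes."""
    total = sum(scores.values())
    return {name: 100.0 * value / total for name, value in scores.items()}


def index_to_average(scores):
    """Express each attribute relative to the average, with the average set to 100."""
    average = sum(scores.values()) / len(scores)
    return {name: 100.0 * value / average for name, value in scores.items()}


shares = share_of_100(raw_scores)
indices = index_to_average(raw_scores)
```

Here attribute N takes 80 per cent of the total importance and has an index of 400, that is, four times the average for the set.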

What is a designed experiment?

The term designed experiment covers a broad range of approaches. However, all experimental designs for trade-off studies meet one goal: estimating the values of many items cleanly and clearly. Two of the methods, discrete choice and conjoint analysis, also use these designs so that relatively few carefully selected situations or comparisons can estimate what will happen in dozens to thousands of situations.

That is, if we use an experimental design and show just a few stimulus items (products, marketplaces or comparisons) then we can estimate accurately what would happen in hundreds, or even thousands, of different situations.

With conjoint analysis and discrete choice, we also get a clear reading of how strong each item is in influencing some overall measure, such as a choice, or (in some cases) interest in a choice. (This latter measure can be useful in some instances, as we will discuss.)

That may sound abstract, so let’s go to a specific example. Suppose you want to measure relative preferences for cars that vary in horsepower, mileage and time to get from zero to 60 miles per hour (roughly 96 kilometres per hour). This example will use a rating scale, in spite of its known difficulties, to make the explanation as simple as possible. (Using a rating scale, it is still possible to get relative preferences based on the fact that we are comparing each person’s ratings to her/his other ratings.) Figure 3.7 shows one wrong way and the right way to get at the relative influence on overall ratings of different levels of horsepower, mileage and time from zero to 60.

FIGURE 3.7 A small and wrong way to measure and a larger correct way

We would have no idea what is influencing overall ratings using the bad set-up – everything improves in the same way from car 1 to car 3: horsepower gets better, mileage gets better and time from zero to 60 miles per hour gets better. We will get a clear picture from the many variations, where all three factors vary differently.

In the second set-up, with nine different cars, all the features vary in different ways from one car to the next. This in fact is a designed experiment. In Figure 3.8 we review how this works, where the various cars appear in the form of a table. Each car is a row in the table, and each attribute is a column.

FIGURE 3.8 Elements of a designed experiment

Following the usual convention, each column is a variable. That is, the attributes get varied from one car to the next. Each row would be shown on a separate screen in a typical online interview.

Looking down the three columns in Figure 3.8 we can determine how variations in each feature or attribute relate to variations in each of the other attributes. We use standard correlations to measure these relationships. As a reminder, correlations measure how well the two variables fall into a straight line. We showed this in Chapter 2, where variables were plotted versus each other.

In our example, all correlations are zero. That is, there is no relationship in the way the features vary from one car to the next. Recall that the features are variables. Figure 3.9 shows the table of correlations. (Some blanks – the black boxes – appear in the table because variables cannot have correlations with themselves.)

FIGURE 3.9 Showing that there are no correlations among attributes

NOTE Cells are darkened because a variable cannot have a correlation with itself

This characteristic of designs is key because we do not want the measurement of one variable to get mixed or confounded with the measurement of another. When the attributes have no correlation with one another, this is called an orthogonal design.

Making it formal

The full name for this type of design is a fractional factorial orthogonal design. This is a mouthful and sounds highly impressive. Many years ago, we ran across a person who charged his clients extra for a ‘certified’ version of one of these. However, at the time, these designs all came from a catalogue that was labelled ‘orthogonal designs’. This name simply means that this design uses a fraction of all possible combinations and that the factors are varied so that they have no correlations. Designs later became more varied (and flexible), as we will discuss soon.

Another complication

One other important rule also holds for this type of experimental design: every pair of attribute levels will appear at least once. This does not extend to every three-way set, four-way set (or more, with bigger designs). In our example, for instance, a quick review shows:

120 HP appears with 30, 40 and 50 MPG each at least once;
the same holds for 150 HP and 180 HP – each appears with each MPG at least once;
also, 120 HP appears at least once with each of 7, 8 and 9 seconds to 60;
and the same holds for 150 HP and 180 HP versus each time to 60 – and so on.
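Both properties, zero correlations and every pair of levels appearing, can be checked in a few lines. The nine cars in Figure 3.8 are not reproduced here, so the design below is an assumed stand-in: it uses the levels named in the text and fills in the third column with a Latin-square rule. The names `design`, `cars`, `pearson` and `every_pair_appears` are ours, for illustration only.

```python
from itertools import combinations, product

HORSEPOWER = [120, 150, 180]
MPG = [30, 40, 50]
SECONDS_TO_60 = [7, 8, 9]

# Assumed nine-row design: the third level index is (a + b) mod 3.
design = [(a, b, (a + b) % 3) for a, b in product(range(3), repeat=2)]
cars = [(HORSEPOWER[a], MPG[b], SECONDS_TO_60[c]) for a, b, c in design]


def pearson(xs, ys):
    """Standard (Pearson) correlation between two columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


columns = list(zip(*cars))  # one tuple per attribute
correlations = {
    (i, j): pearson(columns[i], columns[j]) for i, j in combinations(range(3), 2)
}


def every_pair_appears(rows, n_levels=3):
    """True if each pair of levels from any two attributes shows up at least once."""
    for i, j in combinations(range(len(rows[0])), 2):
        seen = {(row[i], row[j]) for row in rows}
        if len(seen) != n_levels * n_levels:
            return False
    return True


pairs_ok = every_pair_appears(cars)
```

With this stand-in design, all three correlations come out as zero and every pair of levels appears, using nine cars rather than the 27 that a full 3 x 3 x 3 layout would need.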

Getting everything right takes a lot of work. At one time, this kept a lot of graduate students fully occupied as they worked towards their degrees. These designs went into catalogues, which had to be searched for the best design for the task at hand. (We will get to how you decide on a design shortly.) Fortunately, any relatively new computer can easily crank out these designs, making custom ones to fit new situations.

The great measurement power of experiments

Two examples will show why we bother with experiments. For instance, suppose you had a product with six attributes, each having three levels, and one attribute with six levels. This would mean that you could have 3 x 3 x 3 x 3 x 3 x 3 x 6, or 4,374, possible variations on this product.

Using an experimental design, we can accurately estimate the value of all 4,374 possible variations using only 18 product descriptions.

Next, suppose you have a product with 18 two-level attributes (these could be ‘feature is there’ versus ‘feature is absent’ or ‘colour is red’ versus ‘colour is blue’, for instance). This would give you 2^18 (2 to the 18th power) or 262,144 combinations. You can measure all these possible combinations using only 20 carefully selected product descriptions.
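The arithmetic behind these two examples is quick to confirm. The sketch below multiplies out the level counts and shows what fraction of all possible products the designed set represents; the variable names are ours.

```python
from math import prod

# First example: six three-level attributes plus one six-level attribute.
first_example_levels = [3] * 6 + [6]
first_total = prod(first_example_levels)      # all possible variations
first_fraction_shown = 18 / first_total       # 18 designed descriptions

# Second example: 18 two-level attributes.
second_example_levels = [2] * 18
second_total = prod(second_example_levels)    # same as 2 ** 18
second_fraction_shown = 20 / second_total     # 20 designed descriptions
```

In the first case the 18 descriptions are well under half of one per cent of the 4,374 variations; in the second, the 20 descriptions are less than one hundredth of one per cent of the 262,144 combinations.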

Even more combinations can be tested and compared in real time using a market simulator program. These run under Microsoft Excel (or more rarely, PowerPoint or in Adobe Acrobat PDF format) and feature easy-to-use controls. (Your computer also needs to have Flash Player installed for the PowerPoint and PDF versions to work.) We discuss simulators in Chapters 4 and 5.

Designed experiments allow you to measure the effects of varying many attributes accurately and cleanly. Used with discrete choice modelling and conjoint, they also allow you to accurately forecast responses to dozens or even thousands of alternative product/service configurations by showing just a small scientifically selected subset. Designed experiments have tremendous power. For instance, with 18 two-level attributes, you can determine the worth of 262,144 possible combinations using only 20 experimentally designed product configurations.

What you measure and the size of experiments

The more attributes and variations of attributes (or levels) that you need to measure, the larger your experiment needs to become. That is, as you vary more attributes and levels, you will need to show people more screens or pages with varying products. But how many?

What follows is a rough rule of thumb – the exact formula is a little more complicated. As we did in Figure 3.8, below we express this either as rows in a design or as screens that you need to show with different products:

two-level attribute: one row in a design or screen shown;
three-level attribute: two rows or screens shown;
four-level attribute: three rows or screens shown;
five-level attribute: four rows or screens shown;
six-level attribute: five rows or screens shown.

And then we need to consider a bit more. We need two more rows or screens:

One for measuring the error in the model. This allows us to know how well we are measuring – basically how consistent each person was in answering. Otherwise, when we run the model, this will be reported as 100 per cent regardless of how well people did when answering.
One for a very useful term called the constant. This has mathematical meaning, but we can use it to measure the value of the brand or the choice outside the attributes being tested.

One last wrinkle

You must count the number of variations or levels in each attribute and check the design against these. The design must be at least as big as the product of the level counts of the two attributes with the most levels. In our example, three three-level attributes would suggest that we need 3 x 2 or six screens, plus our two extra for measuring error and the constant, or eight. However, we must have at least 3 x 3 (the product of the two attributes with the most levels) or nine screens.
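The rule of thumb and the last wrinkle can be written as one small function. This is a sketch of the rough rule only, not the exact formula, and the function name `rule_of_thumb_screens` is ours.

```python
def rule_of_thumb_screens(level_counts):
    """Rough minimum number of rows (screens) for a list of attribute level counts."""
    if not level_counts:
        raise ValueError("at least one attribute is needed")
    # Each attribute needs one row fewer than its number of levels.
    rows_for_attributes = sum(levels - 1 for levels in level_counts)
    # Two extra rows: one for the error term, one for the constant.
    by_counting = rows_for_attributes + 2
    # The wrinkle: at least the product of the two largest level counts.
    largest = sorted(level_counts, reverse=True)[:2]
    pair_floor = largest[0] * largest[1] if len(largest) == 2 else largest[0]
    return max(by_counting, pair_floor)


three_car_attributes = rule_of_thumb_screens([3, 3, 3])
eighteen_two_level = rule_of_thumb_screens([2] * 18)
```

For the three three-level car attributes, counting gives eight but the pair rule lifts the answer to nine. For 18 two-level attributes, counting gives 18 + 2 = 20, in line with the 20 product descriptions mentioned earlier.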

Getting more from experiments: HB to the rescue

It is a sad but definite fact that people tire quickly of evaluating alternative products or marketplaces. Depending on the audience, this could start after as few as six or eight have been shown. (When study participants are very involved with the product or service, and have good ability to concentrate, they can complete up to 21 or so before they get too tired. Engineers and farmers have done very well with evaluating a great many alternatives.)

When experiments got too large, measuring many attributes and levels, one strategy was to split up the design, giving a fraction to each person, and then adding more people to compensate. For instance, a client goes nearly crazy and wants this:

six four-level attributes;
six three-level attributes;
twelve two-level attributes.

We would need 48 screens to measure this. This of course is far too many to show in its entirety to one person without some illegal stimulative substances. The old solution was to show each person a fraction of these, for instance 12 of the 48 required, and then multiply the number of study participants to make up for it. Showing one-quarter of all the screens to each, we would multiply the number of study participants by four.

That is, each person counted as one-fourth of a total experiment or replication. To get to the right number of experiments to analyse, we just increased the count of people. In Chapter 4 we talk about how many people you need to measure.

With choice models, adding more people like this worked! But it made for much bigger and costlier studies. Splitting up designs for traditional full-profile conjoint was highly messy and likely to explode. Clearly, as demand increased for more complex and realistic representations of products and services, we needed something better.

Enter HB analysis

Hierarchical Bayesian (HB) analysis was developed in the 1990s to address the above problem. It is a machine learning method that stretches how much we can measure in trade-off studies, relying upon immense numbers of calculations and some fairly mind-boggling concepts. Fortunately, it has been proven under fire – for over 25 years.

With HB, we can measure up to four times as many attributes in choice models/conjoint as we could before this method was developed. We could, for instance, reasonably split a 48-screen task into 12-task sets, show each person one 12-task set and not increase the sample. (Please note that some authorities say you should not try for more than a twofold or threefold increase in what you measure before you compensate by increasing the number of people you interview.)

As a big bonus, we also could get data for individuals from discrete choice models (and MaxDiff studies). That was never possible before HB. All answers could be only for groups. Those were the bad old days.

But what is HB analysis?

Briefly, HB analysis fills in data that is scant or missing for a respondent by repeatedly borrowing estimates from other respondents. That is, it keeps sampling other respondents and storing values from those who have the missing information, then running calculations. It usually does this 20,000 or more times for each attribute level for each respondent, keeping a running average of its estimates.

It may or may not compare the respondent to the sample it is drawing and make adjustments based on their similarities. Estimates will settle down to steady values (or converge) if you have set up the problem correctly. If you have not, then maybe they will not – and you need to root out problems. A solution that does not converge usually means errors in set-up, data collection or coding of values.

This method gives your PC (or Mac) more of a workout than almost anything else you might ask it to do. Unlike nearly any other task we throw at a computer, these analyses will leave you waiting. It could take many minutes for a complicated discrete choice modelling run to finish, even hours.
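The idea of borrowing from other respondents can be illustrated with a far simpler calculation than the one HB software runs. The sketch below is not a hierarchical Bayesian sampler. It is a closed-form shrinkage (partial pooling) estimate for a single utility, assuming normal distributions with known variances; the function name, the parameters and the numbers are all hypothetical.

```python
def shrunken_estimate(own_values, group_mean, noise_var, between_var):
    """Blend a respondent's own average with the group average.

    own_values  -- this respondent's observed values for one attribute level
    group_mean  -- average for the same level across all respondents
    noise_var   -- assumed variance of a single observation
    between_var -- assumed variance of true values across respondents
    """
    n = len(own_values)
    if n == 0:
        # Nothing observed for this person: borrow the group value outright.
        return group_mean
    own_mean = sum(own_values) / n
    # More of the person's own data means more weight on their own average.
    weight_own = (n / noise_var) / (n / noise_var + 1 / between_var)
    return weight_own * own_mean + (1 - weight_own) * group_mean


no_data = shrunken_estimate([], group_mean=0.0, noise_var=1.0, between_var=1.0)
one_answer = shrunken_estimate([2.0], group_mean=0.0, noise_var=1.0, between_var=1.0)
many_answers = shrunken_estimate([2.0] * 100, group_mean=0.0, noise_var=1.0, between_var=1.0)
```

A respondent with no data gets the group value, one with a single answer lands halfway between that answer and the group, and one with many answers keeps nearly their own value. HB analysis reaches estimates of this general flavour through repeated sampling rather than a one-line formula.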

HB analysis: it works and it’s not magic

Hierarchical Bayesian (HB) analysis is a machine learning method that allows us to generate far more information from a designed experiment than formerly was possible. This allows us to measure more attributes and attribute levels than used to be possible for a specific number of marketplaces (or products) shown in a study. Some experts say this is about twice as much, but about 25 years’ experience shows that we can safely measure three to four times as many as used to be possible. In the bad old days, if we needed to run an experiment requiring us to show 48 screens of marketplaces in total, we would have needed to boost our sample by three times if we showed each person 16 of these. With HB, we get almost exactly the same result with no increase in sample. And we get data for each individual. This sounds like sorcery, but it works, relying on the power of the computer doing many millions of calculations to reach an answer.

Amazingly, all this borrowing of information works – and we get highly accurate estimates. It seems almost supernatural and even somewhat suspect, but with HB analysis we encounter one situation where we can get much more thanks to the great power of modern computers.

We could argue about counting this as artificial intelligence, but you will note that we have called it a machine learning method. That is, the computer stores calculations and incorporates them into later ones, thus learning from what it has done. We will get into what makes analyses Bayesian in more detail in later chapters. For now, this has to do with the way that later estimates incorporate and modify earlier ones.

A brief talk about origins

The origins and evolution of discrete choice modelling

Work on discrete choice modelling started in the 1960s. Daniel McFadden eventually won a Nobel Prize in economics for this work. (This appears to be the only method we discuss anywhere in this book that has this distinction.)

The first widely cited application of discrete choice modelling, published around 1980, answered this question: How can we predict choices when the alternatives do not have any attributes in common? This was in transportation, where the choices were taking a train, bus or car to work. The authors aimed to determine the factors that would incline people more towards choosing each type of transportation.

The three choices have no common attributes except time door-to-door. For instance, you do not care about the cost of parking downtown if you take the train to work, or the fare on a train if you are taking a bus – and so on.

Analysing choices without common attributes posed many thorny problems, involving difficulties that can scarcely be imagined until you see the proofs involved. Fortunately for all of us, the method is on a solid footing. And the study worked!

And indeed, this method has a remarkably strong track record as well as excellent theoretical underpinnings. It remains the most realistic and most predictive method for determining the specific attributes and variations in attributes that influence consumer decisions – and how much influence those attributes have.

Conjoint analysis from start to its many varieties

Conjoint analysis was developed in the 1970s by market researchers, largely due to frustration with the poor predictive ability of scaled importance ratings. As we mentioned, these ratings work poorly as predictors of behaviour in nearly all instances. They are generally unreliable, in that their connection to what people will actually choose is tenuous at best.

One possible exception, just to remain fair, would occur if you have a lot of historical sales data, a lot of historical ratings data, a product or service category that is not changing, and consumers who are not changing. As you might guess – this is not too likely.

Conjoint proved it was better than scaled ratings in real-world applications, even in its earliest incarnations. It was rapidly and widely adopted. Early conjoint (before about 1970) looked a little like magic squares – people put numbers in boxes ranking pairs of attribute levels as in the sample in Figure 3.10.

FIGURE 3.10 Our guest fictional study participant ranks cars on two features

NOTE ‘1’ is the best and ‘9’ is the worst

The guest imaginary study participant in Figure 3.10 best likes the car with the most horsepower and best mileage (MPG), shown by the number 1 in that box. Next, she sticks with the highest mileage car but with the next most horsepower (150). In her third choice, though, she does not opt for the lowest horsepower car with the best mileage, rather taking the highest horsepower car with the next best mileage. The rest of the squares are given values down to the obvious worst choice, with the lowest horsepower and worst mileage.

Our presumably patient participant would then need to do another grid like this for horsepower versus time 0 to 60, then another for MPG versus time 0 to 60, etc. Not only is an exercise like this tedious with four or more attributes, the approach remains distant from what people do when selecting a product or service. And so improvement of this method continued.

The big development: full-profile conjoint analysis

This form of conjoint arrived in the mid-1970s. It shows a series of whole products or services and asks the study participant to evaluate each in turn. Because it shows full descriptions of a product, it has the name full profile. Respondents typically rate these product profiles, or (very rarely now) sort and rank them.

This was immediately hailed as a great advance and gained widespread adoption. It often worked well, particularly with widely known brands that were similar to each other. But it also broke down mysteriously in other situations – and as these were understood better, the use of discrete choice modelling rose.

Also, with standard analytical tools, the ability of conjoint analysis to measure was quite limited. Most studies stuck to six or seven attributes at most, with just a few variations of each.

As mentioned above, HB analysis broke the barriers on how many attributes and levels might be tested. Before this method was developed fully (around 1990), a number of alternative forms of conjoint analysis attempted to deal with these limitations.

Perhaps best known among these was a software product called Adaptive Conjoint Analysis (ACA). This took each person down a slightly different route based on what they said about what they found acceptable before seeing any product profiles. This method received some strong criticism, with some justified complaints about the assumptions behind it.

That is, concerns were voiced about the lack of realism involved in pulling attributes out of a product and asking which levels were acceptable, without seeing how they fit with the other features of the product. For instance, you might say off the top of your head that you would never buy a condominium with a balcony (perhaps for fear of being mistakenly serenaded at midnight by some wandering Romeo). However, if you found a condo that was perfect in all other regards, you might then decide that you could put this fear aside and buy it after all.

So there were credible objections about both calculations and the realism of the task. In spite of these, this product enjoyed considerable popularity until discrete choice modelling supplanted conjoint as the strongest method for determining a product’s optimal features and prices.

Another method that attempted to deal with limitations of traditional conjoint is called partial profile conjoint. The idea here was to simplify what study participants saw and evaluated, by showing just a few features of the product or service in question. This also suffered from an obvious problem in its lack of realism. We do not evaluate actual product or service choices with some of their features hidden from view, making the assumption that what we cannot see is acceptable.

The start of trade-offs: Q-Sort and Thurstone’s Case 5

Thurstone’s work on these methods was done before any of the other methods were even considered, all the way back in the 1920s. He developed the analyses needed to turn rankings of different items into ratio-level scaled data. As already mentioned, because this was still early days, he got to call his scaling procedure ‘The Law of Comparative Judgements’. Nothing, no matter how clever, gets to become a law any more.

The procedure for solving this problem is called Case 5. (Originally the ‘5’ was written as a Roman numeral, or ‘V’.) We will not be discussing the other four cases, but these are discussed in an imposing paper by Thurstone.

We use Case 5 analysis following guided sorting. It works only at the group level because you need to find how many times each item ranks better than each other item. This produces a so-called win-loss matrix, in which the item with the better ranking ‘wins’. An example of a win-loss matrix is shown in Figure 3.11.

FIGURE 3.11 A win-loss matrix

NOTE Part of a ‘win-loss’ matrix: for instance, A wins against B 60 times and B wins against A 69 times.

Thurstone’s work underpins much of the later work done in analysing how people trade off features. His method still remains highly useful for determining the relative importances of long lists of attributes or messages.
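To show how a win-loss matrix can become scale values, here is a minimal sketch of the textbook Case 5 calculation as we understand it: turn each win count into a proportion, convert the proportion to a standard normal score, and average those scores for each item. Only the A versus B counts (60 and 69) come from the note to Figure 3.11; item C and its counts are invented, and any further rescaling that a full analysis applies to produce the final importances is not shown.

```python
from statistics import NormalDist

items = ["A", "B", "C"]

# wins[i][j] = number of times item i ranked better than item j.
# A vs B is from the figure note; the counts involving C are hypothetical.
wins = {
    "A": {"B": 60, "C": 80},
    "B": {"A": 69, "C": 75},
    "C": {"A": 49, "B": 54},
}


def case5_scale_values(items, wins):
    """Average the normal scores of each item's win proportions."""
    to_z = NormalDist().inv_cdf  # needs proportions strictly between 0 and 1
    scale = {}
    for i in items:
        z_total = 0.0  # an item against itself contributes a score of zero
        for j in items:
            if i == j:
                continue
            proportion = wins[i][j] / (wins[i][j] + wins[j][i])
            z_total += to_z(proportion)
        scale[i] = z_total / len(items)
    return scale


scale_values = case5_scale_values(items, wins)
ranking = sorted(items, key=scale_values.get, reverse=True)
```

With these counts, B comes out ahead of A, and both sit above C. An item that wins or loses every single comparison would need special handling, because a proportion of exactly 0 or 1 has no finite normal score.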

Maximum difference scaling (MaxDiff)

Maximum difference scaling (MaxDiff) is both the name of a piece of software that follows an analytical routine and an established statistical procedure that is different. The software appeared around 2000, but other similar methods existed for many years before then. MaxDiff software does this:

It starts with a list of items.
It generates a special experimental design. This design makes sure that items are compared with each other in a balanced way.
It merges data gathered from study participants and the design, and prepares a file that can be analysed by special HB software.

The HB software then generates data on the importances of the items for each person.

Otherwise, comparisons of items several at a time could be analysed just as we analyse rankings with Case 5, at the group level. The software implementation of MaxDiff is a direct outgrowth of Thurstone’s method.

Applications in brief

We will have much more to say about these methods in Chapters 4, 5 and 6, and in Online Bonus Chapter 1. However, a brief review of their best applications might be useful as a way to conclude our introduction to them.

Both conjoint and discrete choice have enjoyed long histories as methods for creating the best possible new products and services. MaxDiff and Q-Sort/Case 5 have worked well for many years in prioritizing specific claims, messages or product features. Given below are a few more specifics.

Discrete choice modelling

This method does best at determining the marketplace effects that will follow if products or services in a competitive environment get reconfigured, introduced or dropped. Discrete choice modelling excels in realism, both in the way it represents the choice in the marketplace and in the way it asks people to make choices just as they would in an actual purchase decision.

This also is the best method to determine what could happen following changes where there are several products or services of the same brand. These effects include the egregiously named cannibalization, which refers to the way that changes in one product might take (or ‘eat’) some of the sales from another product offered by the same brand. It can also reveal so-called product line synergies, in which the presence of two or more products leads to greater sales than you would expect from each product separately.

Conjoint analysis

Conjoint analysis, in its traditional full-profile format, considers one product at a time. Therefore, it makes sense to consider conjoint when there is no true competitive context for your product. Sometimes products do not get evaluated alongside competitors – for instance, direct-mail insurance offerings. (People indeed do not keep folders filled with old offers to compare.)

Alternatively, if you have some really special offer on your website, people may act on it without doing any further comparisons. Conjoint does well at creating the best alternative to display for this application.

Conjoint analysis also can work highly effectively in optimizing any type of message, determining the best combination of elements to include. If you have, for instance, a message where you want to test seven elements (such as headline, headline placement, contents of text, graphics and so on) and each can vary in three ways, that would make for 2,187 possible ways of varying the message. Conjoint analysis can determine the worth of all of these by testing only 18 experimentally designed variations. We show an example of this in Chapter 5.

Consider conjoint as well when your product will get chosen only infrequently. Since discrete choice modelling uses choices, it might not pick up what is driving decisions if the brand you are interested in rarely gets chosen.

You also could consider narrowing the scope of competition to get a better answer, if your brand is rarely chosen from a wide field. For instance, suppose you are doing a study for everyone’s breakfast favourite, Soggy Os. A brand such as this could have a 0.5 per cent share of the entire breakfast-food market and still make a lot of money. Looking at this broader market, you probably could not determine with any accuracy which changes might most increase sales – any effect would be vanishingly small in the context of all other possible product choices.

You could instead look only at those specific brands that are your close competitors, that is, other fine shredded cellulose-enriched food-like substances. In that arena, you might have a 10 per cent or 20 per cent share, and so be able to get a more accurate reading of how much changes will affect your sales.

MaxDiff and Q-Sort/Case 5

These methods provide clear readings of the relative importances of individual features, messages or claims. They do not provide any reading of how varying features will affect acceptance of a whole product or service. They, in fact, do not work well with levels or variations of features.

You cannot use either of them to configure an entire product or service. Rather, you can weigh elements that you might include in, for instance, a special-features add-on package (as for a car), or assess the importance of broader claims you might make about a product or service.

You will get ratio-level information from either of these methods. That is, after the analysis, you can for instance say that, ‘Message N was twice as important as message R’. This, and the clear differentiation of items into most and least important, provide strong advantages over the use of scaled ratings.

MaxDiff is good for evaluating about 8–35 items in a list. Q-Sort can determine relative importances for many more items – up to 100 have been reported in the literature. However, MaxDiff has the advantage of providing importances for every person. Q-Sort provides answers only for a group of people.

Every time you want to analyse another group within your sample with Q-Sort, you must run another analysis. With MaxDiff, to analyse a new group, you merely need to average the individual data that you already have for the people in that group.
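The averaging step for MaxDiff can be sketched in a few lines of Python. This is only an illustration: the respondent records, segment labels and scores below are invented, and the sketch assumes that individual-level importances have already been estimated.

```python
from statistics import mean

# Hypothetical individual-level MaxDiff importances, one record per person.
# Each person's scores sum to 100; all names and numbers are made up.
respondents = [
    {"id": 1, "segment": "owners", "scores": {"message_N": 40.0, "message_R": 20.0, "message_S": 40.0}},
    {"id": 2, "segment": "owners", "scores": {"message_N": 60.0, "message_R": 30.0, "message_S": 10.0}},
    {"id": 3, "segment": "prospects", "scores": {"message_N": 20.0, "message_R": 50.0, "message_S": 30.0}},
    {"id": 4, "segment": "prospects", "scores": {"message_N": 30.0, "message_R": 40.0, "message_S": 30.0}},
]


def group_importances(people, segment):
    """Average the individual importances of everyone in one segment."""
    members = [p["scores"] for p in people if p["segment"] == segment]
    if not members:
        raise ValueError(f"no respondents in segment {segment!r}")
    return {item: mean(m[item] for m in members) for item in members[0]}


owners = group_importances(respondents, "owners")
# Ratio-level reading: how many times as important is message N as message R?
ratio_n_to_r = owners["message_N"] / owners["message_R"]
print(owners, ratio_n_to_r)
```

No new model run is needed for the second segment; calling `group_importances(respondents, "prospects")` reuses the same individual data.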

Summary of key points

The experimental or trade-off methods were developed to deal with a salient problem with scaled importance ratings, that is, the unfortunate fact that these ratings are highly unreliable in predicting behaviour. When you ask people to rate directly what they find important (in a survey), they tend to identify nearly everything as critical. Reliance on direct importance ratings has led many an unsuspecting organization down a wrong path, wasting time and resources.

The great advantage of all the trade-off methods is that they force people to prioritize, just as in real life. They cannot say that everything is important.

The more complex methods, conjoint analysis and discrete choice modelling, forecast marketplace responses to products or services (or communications or web pages) when features and/or prices are varied. There is some confusion about the differences between these two methods, and some treat the term conjoint analysis as a broader heading that includes discrete choice. However, because these two approaches developed in different disciplines and work in somewhat different ways, we will be treating them separately.

Discrete choice modelling was developed by econometricians to address the question of how we can predict choices when the alternatives do not have features or attributes in common. The theoretical and mathematical work going into this method is truly impressive. This appears to be the only approach we discuss throughout this book that was awarded a Nobel Prize.

In a discrete choice study, people are shown products in a marketplace or market scenario, where the main competitive offerings are described alongside each other. In a typical web survey, one hypothetical marketplace is shown on one computer screen, and people are asked which product they would choose if all were configured as shown. Then they are shown another screen in which the features of the products have been varied and are asked to make another choice. It is also possible to include ‘none of these’ as a choice – greatly increasing realism – since people almost always can opt out if all alternatives are truly unacceptable. People might make a choice like this in the region of 6–20 times.

Conjoint was developed by market researchers and originally focused on single products. In its traditional form, it shows a whole product described as a series of features or attributes, and asks for a rating – hence the name full-profile conjoint. While conjoint analysis does ask people to weigh features in the context of all other features, it typically asks for a rating. This is less closely tied to what people do in the real world than the choices made in a discrete choice modelling study.

Both conjoint analysis and discrete choice modelling can lead to powerful market simulator programs, which show in real time how marketplace responses vary as features and/or prices of products, services or messages are changed. The typical market simulator runs under Microsoft Excel and has easy-to-use controls.

Maximum difference scaling (MaxDiff) and Q-Sort/Case 5 are the simplest of this group of methods. They give us clearly differentiated importances for a list of items, such as claims about a product, or features that do not make up an entire product or service. They cannot measure the relative worths of different variations or levels of features. They can, however, provide a clear reading of the relative importances of features, and at the ratio level. That is, we can say, for instance, ‘feature N is twice as important as feature R’. We cannot do this with scaled importance ratings.

MaxDiff can provide importances for every person, while Q-Sort does so only for groups of people. However, Q-Sort can get a fix on the importances of many more items than MaxDiff, with up to 100 reported in the literature for Q-Sort as compared to about 35 for MaxDiff.

Designed experiments

The term designed experiment covers a broad range of approaches. However, all experimental designs for trade-off studies meet one goal: estimating the values of many items cleanly and clearly. Two of the methods, discrete choice and conjoint analysis, also use these designs so that relatively few carefully selected situations or comparisons can estimate what will happen in dozens to thousands of situations.

These designs have great measurement power. For instance, if you have a product with 18 features that each vary in two ways, this gives rise to 2^18 (2 to the 18th power) or 262,144 combinations. Using a designed experiment, you can determine the value of all of these with 20 carefully selected product configurations. A market simulator program would allow you to determine the values of all these combinations (assuming you had the time and patience) in real time.

Traditionally, all experimental designs used in conjoint analysis and discrete choice modelling were the orthogonal, fractional factorial type. This is a mouthful but it simply means that a fraction of all possible combinations is carefully selected and that there are no correlations in the ways that the attributes vary from one product or marketplace to the next.
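The arithmetic behind the 18-feature example is easy to check. The snippet below is a small illustration only; the variable names are ours, and the figure of 20 configurations is taken directly from the text rather than derived.

```python
# Size of the full set of combinations for the example in the text:
# 18 features, each of which varies in two ways.
n_features = 18
levels_per_feature = 2
full_factorial = levels_per_feature ** n_features
print(full_factorial)  # 262144

# Number of carefully selected configurations quoted in the text.
designed_configurations = 20

# Share of all possible combinations that people actually need to see.
fraction_shown = designed_configurations / full_factorial
print(f"{fraction_shown:.6%} of all combinations are shown")
```

The point of the "fractional" in fractional factorial is visible in the last line: the design shows only a tiny fraction of what could exist.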

Basic rules for designed experiments

The more attributes and variations of attributes (or levels) that you need to measure, the larger your experiment needs to become. That is, as you vary more attributes and levels, you will need to show people more screens or pages with varying products. In this chapter we gave some rules about how much you need to measure (that is, how many screens you need to show with different products or marketplaces) for different numbers of attributes and levels.

HB analysis gets more from experiments

It is a sad but definite fact that people tire quickly of evaluating alternative products or marketplaces. Depending on the audience, this could start as soon as six or eight have been shown. (As mentioned earlier, when study participants are very involved with the product or service, and have good ability to concentrate, they can complete up to 21 or so before they get exhausted.)

We get much more than we used to from experimental methods by using a machine learning approach, Hierarchical Bayesian (HB) analysis. Most experts agree we can measure twice as much as was traditionally possible, with no accompanying increase in the number of study participants.

In the old days, if you constructed a very large experiment, you simply split it among people taking the study. Each person might do one-third of an entire experiment requiring (for instance) 36 marketplaces or products. Then you multiplied the number of people in the study by three to make up for this. It worked – but led to considerable increases in time and expenses in doing the study.

HB has been pushed to get three to four times as much information without increasing the number of people in the study. It has still returned excellent results.

HB analysis sounds almost magical, and relies on some fairly mind-bending concepts, but it has definitely proven itself under fire for over 25 years. It might not quite count as artificial intelligence, but it definitely can be considered machine learning, as the computer makes many thousands of repeated estimates and uses the earlier ones to inform the final result.

Best applications

Discrete choice modelling remains the best method for predicting the effects of changing products in a competitive marketplace. It also is the only method that can truly address what happens when there is more than one product from a given brand in the set of competitive entries. That is, it can determine when there are so-called product line synergies (effects from two or more products together that are greater than we would expect from either product alone). It also can diagnose the awfully named product line cannibalization, where one product cuts into (or ‘eats’) the sales of another product from the same brand.

Conjoint analysis, developing as it did with a focus on a single product, remains useful where a product does not have true competitors. For instance, this might happen with a direct-mail offering for an insurance product. (People of course do not keep portfolios of these, waiting for the best one to happen into their lives.) Also, conjoint can be useful in creating the best product where the product has a very small share of the entire market. Since discrete choice modelling is based on choices, if a product gets chosen infrequently, you may not get enough data to determine which variations in its features are driving levels of marketplace acceptance.

MaxDiff and Q-Sort do not evaluate entire products or services, but rather are useful for evaluating the relative importances of a series of claims, messages or features. You might use either method, for instance, to find the most important items to include in a special features package for a car, or the best absolutely free gift that a person gets for signing on with their friendly telecommunications provider. MaxDiff can provide importances for every person if analysed using HB analysis, while Q-Sort provides information only for groups of people. Q-Sort can prioritize many more items than MaxDiff, though – up to 100 as compared to up to 35.

Take extra care with projections

As a reminder from Chapter 2, conjoint analysis and discrete choice modelling can get you to share of acceptance. This differs from market share in that you must factor in how aware people are of the product and how widely distributed it is (at a minimum) to get to actual marketplace levels. If you hope to make projections, you must also know the size of the entire market (generally not easy to determine), and account for any problems in areas such as communications, how the product or service is displayed, how salespeople treat it, and so on. You must be extremely well prepared to make a good projection.
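As a rough illustration of why share of acceptance is not market share, the sketch below scales a share of acceptance down by awareness and distribution. The simple multiplication is our simplifying assumption, not a formula given in the text, and every number and function name is hypothetical. It deliberately ignores the other factors mentioned above (communications, display, sales treatment).

```python
def projected_share(share_of_acceptance, awareness, distribution):
    """Scale share of acceptance by awareness and distribution.

    All three inputs are proportions between 0 and 1. Multiplying them
    together is an assumed simplification used only for illustration.
    """
    inputs = {
        "share_of_acceptance": share_of_acceptance,
        "awareness": awareness,
        "distribution": distribution,
    }
    for name, value in inputs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return share_of_acceptance * awareness * distribution


def projected_units(market_size_units, share_of_acceptance, awareness, distribution):
    """Turn the adjusted share into units, given an estimate of total market size."""
    return market_size_units * projected_share(share_of_acceptance, awareness, distribution)


# Hypothetical inputs: 30% share of acceptance in the study, 50% of buyers
# aware of the product, and the product available to 80% of them.
share = projected_share(0.30, awareness=0.50, distribution=0.80)
units = projected_units(1_000_000, 0.30, awareness=0.50, distribution=0.80)
print(f"adjusted share: {share:.1%}, projected units: {units:,.0f}")
```

Even in this stripped-down form, the adjusted figure is well under half of the share seen in the study, which is why unadjusted simulator output should not be read as a sales forecast.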

04 Creating the best, newest thing
Discrete choice modelling

In this chapter we discuss arguably the most powerful of all methods for determining the best mix of prices and features to include in a product or service. Discrete choice modelling centres on determining exactly what people will choose. We review the basics, and then its tremendously effective outputs, ie market simulations and market simulators. We show how they solve problems that are otherwise impossible to address.

Key features

Let’s start with what may seem an extreme statement – and then explain the reasons for making it. Discrete choice modelling arguably is the most powerful of all methods for determining the best mix of products and features to include in a new product or service in a competitive market.

This method centres on determining how changes in the features of products or services influence choices. If it is set up and analysed well, it has tremendous power to predict how new or modified products or services will fare in the actual marketplace.

Before we start, it is worth a mention that we will be metaphorically treading on the edges of some deep theoretical and mathematical waters. We will be manoeuvring around much that is abstruse, instead focusing on explaining clearly what you need in order to make effective use of these remarkable methods. We hope that those with more academic experience do not find any of their favourite theories slighted.

Why we do this: payoff in market simulations

The greatest power of this method resides in its ability to simulate what will happen in a changing marketplace over the near term. Later in this chapter, you will encounter an example where the outcome of a competitive response showed so much potential risk that the sponsoring organization decided to lock up the reports. They actually feared that a leak could undermine their market position.

Figure 4.1 shows some aspects of a Microsoft-Excel-based market simulator program. Not all simulators have these features – some are important and yet sometimes omitted, as we will discuss.

FIGURE 4.1 A portion of a market simulator

You see a portion of the simulator in this figure. The controls that adjust the products’ features and prices appear only in part. The rest would be reached by scrolling down in the workbook.

This program opens like any Excel workbook. The example shown in Figure 4.1 is from an actual study done among doctors, determining what they would prescribe if a new drug had certain features and prices. It has been disguised by modifying names and some numbers. The circled letters in the figure are not part of the simulator itself, but rather appear to help identify different features.

Near letter A you see a graphical display of the outcome of the simulation. These are the shares of preference expected after the controls for the simulator have been changed to reflect a specific combination of pricing and features. This information appears in table form near letter C.

Two adjustments for external market conditions appear near letter B. As we mentioned in Chapter 2, without adjustments like these, you cannot expect the share of preference found by the calculations behind the simulator to approach actual market share. This is because all those who responded to the study were made 100 per cent aware of the products – and that needs to be adjusted down to reflect marketplace conditions. In this specific simulation, the percentage of patients who are newly diagnosed will affect product choices, so another control was included to adjust for that.

Letters D and F mark, respectively, the controls for configuring the state of the marketplace that you want to estimate and the controls for a reference or base marketplace to which you will compare. This reference case is included so that you can get an accurate fix on how much the changes you are trying will affect marketplace acceptance of your new product or service.

This is an important feature because of the way that utility – the abstract quantity this method assumes is traded off – gets translated into marketplace behaviour. We will be talking about the way utility works in more detail. For now, though, we just need to understand that this is key for seeing the strengths of effects from changing features and prices.

Finally, near letter E, we have some controls that allow you to modify the basic Excel display. The program opens with the Excel ribbon – the control panel across the top of the screen – hidden from view so that you can see more of the simulator. Clicking the upper button restores the Excel ribbon to view, and clicking the lower button hides it again.

These are the basic ingredients that make a simulator a formidable forecasting tool. Now that we have seen this incredibly powerful form of output, let’s go through some of the basics and steps that you need to get there.
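To give a feel for what sits behind a simulator’s controls, here is a minimal sketch of a base case compared with a test case. It assumes the multinomial logit rule, a common way of turning utilities into shares in choice models; the text does not say which rule the simulator in Figure 4.1 uses. The product names and utility values are invented.

```python
import math


def shares_of_preference(utilities):
    """Turn total utilities into shares with the multinomial logit rule.

    share_i = exp(u_i) / sum of exp(u_j) over every alternative j
    """
    highest = max(utilities.values())  # subtracting the maximum avoids overflow
    weights = {name: math.exp(u - highest) for name, u in utilities.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical total utilities in the reference (base) marketplace.
base_case = {"our_product": 0.2, "competitor_1": 0.5, "competitor_2": 0.1, "none_of_these": -0.5}

# Test marketplace: a feature change that raises our product's utility by 0.4.
test_case = dict(base_case, our_product=0.6)

base_shares = shares_of_preference(base_case)
test_shares = shares_of_preference(test_case)
change = test_shares["our_product"] - base_shares["our_product"]
print(f"base {base_shares['our_product']:.1%}, test {test_shares['our_product']:.1%}, change {change:+.1%}")
```

Under this rule the same utility gain produces a different change in share depending on where the product starts, which is one reason a reference marketplace is useful for judging the size of an effect.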

As real as it gets

Discrete choice modelling is based on a survey that exposes products to study participants in the context of realistic marketplace situations. The products that people evaluate can have their own features and their own prices. This realism – and the fact that we are observing choices – provides a very strong analytical advantage. Consumers evaluate alternative products while considering competitive offerings, just as they would in the marketplace.

Study participants get a set of tasks or scenarios or market scenarios, each one describing a set of realistic branded service/product alternatives. They then simply say which product in each market scenario – if any – they would choose. They do not need to rate alternatives, or rank items they would never select. They simply need to pick whatever they would choose. They also can have the option of saying that none of the alternatives would appeal to them.

Alternative products from one brand can appear competing against each other. Market scenarios can be set up precisely reflecting the important alternatives likely to appear in the actual marketplace. Therefore, you can determine consumer responses in the specific competitive situations that you would need to test.

No need for unrealistic introspection

Discrete choice modelling has another advantage over ratings-based methods in that it does not force buyers to dissect and to explain their reasoning in making decisions. Extensive research has shown that most consumers do not give accurate descriptions of what goes into their decisions.

As we discussed in Chapter 3, most buyers will rate all or nearly all proposed features as ‘highly important’ or ‘critical’ when asked to answer direct importance-rating types of questions. They will faithfully do this whether they truly need those features or not.

Similarly, most people say little about why they did not choose a product beyond observing that they ‘don’t like it’, ‘it wasn’t for me’, or ‘it wasn’t right’. Asking directly about what falls short in a product can prove to be a frustrating task.

Approaches to understanding what consumers will choose need to go beyond these usually inaccurate ratings patterns to discern what influences buyers’ behaviour – starting with essential feature-related, provider-related and pricing-related needs.

Although discrete choice modelling market scenarios present more information than many other research study methods, people generally have little trouble completing the task – even when evaluating 10–15 marketplaces. (They may grumble after seeing six or so, but typically keep working with good levels of consistency.)

After all, in each market scenario shown, people need only do something that they do all the time – look at alternatives and make a choice. Study participants typically report that the task engages their attention, and that it sometimes even is fun. That is a rare word in connection with most research studies. The generally high rates of study completion with this method support this description.

An experimentally designed approach

Discrete choice modelling studies are constructed according to strict experimental designs, as we described in Chapter 3. The exact number of marketplaces or scenarios required in a study will depend upon the precise number of attributes and attribute levels (or variations) to be tested. As the numbers of attributes and attribute levels to be varied increase, so does the number of scenarios required by the design.

Accurate analysis of findings from discrete choice modelling requires experimental designs that are orthogonal or very nearly orthogonal (that is, in which there is zero correlation or nearly zero correlation in how attributes vary from one marketplace to the next). Zero correlation means that the way one feature varies, going from one marketplace to another, has no relationship to the way that any other feature varies.

Large experimental designs are difficult to construct properly, and in those cases we need the computer’s help to generate them. Larger designs are typically very nearly orthogonal (usually the specific type called D-optimal designs). These designs work just as well as standard orthogonal ones, but cannot be found in catalogues in most cases. They require computer time to develop.

Keeping the experiment a reasonable size is important. One key part of any discrete choice modelling study, then, would be a thorough discussion of the number of attributes and attribute levels to be varied – and of the implications that these variations will have on the final study design. Measuring just what is needed – and no more – is a critical consideration.
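The idea of zero correlation between attributes can be checked directly on a small design. The sketch below is illustrative only: it uses a textbook half-fraction of three two-level attributes coded as -1 and +1 (four scenarios instead of eight), and the function names are ours. Real studies would rely on design software for anything larger.

```python
from itertools import combinations
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation between two attribute columns of a design."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A feature that never changes cannot be measured at all.
        raise ValueError("an attribute that never varies cannot be measured")
    return covariance / sqrt(var_x * var_y)


def is_orthogonal(design, tolerance=1e-9):
    """True if every pair of attribute columns has (near) zero correlation."""
    columns = list(zip(*design))
    return all(abs(correlation(a, b)) <= tolerance for a, b in combinations(columns, 2))


# Each row is one scenario; each column is one two-level attribute.
# Half-fraction of the 2 x 2 x 2 design: the third column is the product of the first two.
half_fraction = [(-1, -1, +1), (-1, +1, -1), (+1, -1, -1), (+1, +1, +1)]

# A poor design: the third attribute always moves together with the first.
confounded = [(-1, -1, -1), (-1, +1, -1), (+1, -1, +1), (+1, +1, +1)]

print(is_orthogonal(half_fraction))  # True
print(is_orthogonal(confounded))     # False
```

In the confounded design there is no way to tell whether a change in choices came from the first attribute or the third, which is exactly the problem orthogonality is meant to prevent.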

Worth saying again: the payoff in market simulations

The real power of this method is its ability to simulate what will happen in a changing marketplace over the near term. If you can manage to think ahead to what competitors most likely will do, then you can cover not only what would happen if you change your own fine product or service, but also what would happen when competitors respond.

Later in this chapter, you will encounter an example where the outcome was so revealing – and showed so much potential vulnerability – that the sponsoring organization decided to lock up the reports. They actually feared that a leak could undermine their market position.

Thinking through and setting up the problem

Thinking in terms of choices

Discrete choice modelling, like all trade-off study methods, envisions products or services as a set of distinct features. In discrete choice modelling, these definite features have distinct variations that are assumed to be measurable and comparable. This means that the value of each can be traded versus other features.

This is about as far into the psychology of decision making as these methods go. Discrete choice modelling can be seen as aiming to forecast the outcomes of decisions, rather than as trying to follow the inner workings of the human mind. This method is very much about what needs to happen with features and prices to attain greater acceptance of a product or service, rather than digging into why people behave as they do.

Each choice in a discrete choice modelling study can have its own features and prices. Comparing choices that do not share attributes is one key problem that this method was developed to solve. Some of the earliest widely cited applications for this method involved transportation, where (for instance) the choices were car, bus or train. These clearly do not share common attributes except for time door-to-door. For instance, the cost of parking downtown does not apply to the train or the bus, the distance between bus stops does not apply to car or train, and how often the trains run does not apply to car or bus.

These early studies worked – forecasts were borne out in the real world. A large body of important work in this subject is still related to transportation. (You can learn much more about this method in some excellent courses from the MIT school of Civil and Environmental Engineering.)

The first step is a large one


You must first consider how your fine product or service can vary, and then how competitors might vary either at the same time or in response. This first step can prove to be surprisingly difficult and often takes as much time as everything else done in the process.

For instance, when we worked with the marketing team at a major US insurer, it quickly emerged that they had no idea about the details of their competitors’ products. Worse, what they had imagined was available among their products was not at all what they could offer. After a few weeks of digging, one of their team announced that the whole project was ‘giving us a headache’. Fortunately, all involved took the appropriate analgesics, survived this phase and went on to do a successful update on their offerings.

Setting up a study: an example

Let’s go back a number of years to when debit cards were still not ubiquitous. The client, here called Quiet Financial Services (QFS), was an extremely wealthy company that only recently had abandoned the practice of solely serving those who had been recommended by other clients. (Their motto could have been, ‘If you have an in, we might be inclined to offer you financial services’.) They wanted to develop a high-limit debit card for the right kind of people. Their two key competitors in this market were Citibank and American Express (Amex).

They did not want to offer all the same features in the key areas of interest rate, credit limit and annual fee as the other two companies. They set out to test credit limits of $20,000 and $80,000, while Amex offered $50,000 or $120,000 and Citibank a whopping $90,000 or $120,000. They also wanted to test three interest rates that differed from those their competitors were offering, and to anticipate what might happen if their competitors changed their interest rates in response.

In addition, they had absolutely no intention of offering this card with no fee. This was Citibank’s key incentive for their customers to accrue a crushing debt. QFS lastly decided to test Amex dropping their relatively lofty $50 fee in response to QFS coming in with a lower one. Figure 4.2 shows each competitor and the features to be tested. The features appear in the boxes over the grey horizontal bars.

FIGURE 4.2 Discrete choice treats brand like a container for features

In total, we have QFS with two 3-level attributes and one 2-level attribute, Amex with two 2-level attributes and one 3-level, and Citibank with one 3-level attribute and one 2-level. The constant (Citibank offering no fee) does not count as an attribute because it does not change.

No change means no measurement

You cannot measure the impact of any feature that does not change. This is considered a fixed part of the product. You must vary a feature in some way to measure its effect on choices.

Something may have seemed strange to those of you familiar with traditional conjoint analysis: we did not count the brand as an attribute. As we will discuss in Chapter 5, in conjoint analysis (and its conjoint-like descendant, CBC) brand typically gets included as an attribute that gets varied along with other attributes.

With discrete choice modelling, however, brand works like a container that holds the attributes. If we include a term in our model called the constant (as discussed in Chapter 3), that can be thought of as the residual value of the brand after we account for all the specific attributes that we are measuring.

The idea that attributes can be specific to each choice is a critical concept that makes discrete choice modelling different from, and superior to, traditional conjoint analysis. We will talk more about this difference in Chapter 5.

Consider brand as a container for features

With discrete choice modelling, each choice (or brand) can have its own features, varying in ways specific to that brand. Brand does not need to be counted as a product attribute. Rather, we can think of brand as being like a container that holds the attributes.

Limits on how much you can measure

The QFS study we have been discussing would require a total of 12 marketplace scenarios. As a reminder, as you measure more attributes and more attribute levels, you need more marketplace scenarios to measure. Here is how many marketplace scenarios, or computer screens, you need to measure for different numbers of attribute levels:

two-level attribute: one marketplace scenario or screen shown;
three-level attribute: two marketplace scenarios or screens shown;
four-level attribute: three marketplace scenarios or screens shown;
five-level attribute: four marketplace scenarios or screens shown;
six-level attribute: five marketplace scenarios or screens shown.

And then we need to consider a bit more. We need two more marketplace scenarios or screens:

One for measuring the error in the model. This allows us to know how well we are measuring – basically, how consistent each person was in answering. Otherwise, when we run the model, this will be reported as 100 per cent regardless of how well people did when answering.

One for a very useful term called the constant. This has mathematical meaning, but we can use it to measure the value of the brand or the choice outside the attributes being tested.

You also must count the number of variations or levels in each attribute and check the design against these. The design must be at least as big as the product of the numbers of levels in the two attributes with the most levels. So if you have, for instance, a six-level attribute and a three-level attribute, you need at least 6 x 3 or 18 marketplaces. This underlines the importance of going for the simplest model that measures everything you need. Going for the simplest working model covering everything is an extremely well-established practice in the sciences. It is sometimes called Ockham’s razor.

Ockham was William of Ockham, who lived around 1300, so this idea has a great deal of history. The razor, going along with the idea that you should trim away everything extraneous, apparently came into this around 1840. It is not clear whether William actually even saw a razor.

Even with the near-magic of HB analysis, which we review below, economy in design remains an important principle. You need to know what to measure and how to measure. Understanding the marketplace and your buyers before you start the study is critical.
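The counting rules in this section can be sketched as a short function. This is one reading of the rules, offered as an illustration rather than a design tool: each attribute contributes one scenario fewer than its number of levels, two more scenarios are added for the error term and the constant, and the result is checked against the product of the two largest level counts. The function and key names are ours.

```python
def scenario_requirements(levels_per_attribute, extra_scenarios=2):
    """Apply the counting rules to a list holding each attribute's number of levels."""
    if not levels_per_attribute or any(n < 2 for n in levels_per_attribute):
        # An attribute with fewer than two levels never changes, so it cannot be measured.
        raise ValueError("every attribute needs at least two levels")
    from_levels = sum(n - 1 for n in levels_per_attribute)
    with_extras = from_levels + extra_scenarios  # error term and constant
    largest = sorted(levels_per_attribute, reverse=True)[:2]
    product_rule = largest[0] * largest[1] if len(largest) == 2 else largest[0]
    return {
        "from_levels": from_levels,
        "with_extras": with_extras,
        "product_rule": product_rule,
        "minimum": max(with_extras, product_rule),
    }


# The QFS study: QFS has two 3-level attributes and one 2-level attribute,
# Amex two 2-level and one 3-level, Citibank one 3-level and one 2-level.
qfs_study = [3, 3, 2] + [2, 2, 3] + [3, 2]
print(scenario_requirements(qfs_study))

# The six-level by three-level example from the text.
print(scenario_requirements([6, 3]))
```

For the QFS attributes the level-based count comes to the 12 scenarios quoted above, and for the six-level and three-level pair the product rule of 18 is the binding one.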

Reviewing some best practices

These are important to any study, and while some may seem apparent, we have encountered more than a few instances where one or more was not followed. These merit a special call-out.

A few important guidelines

Understand your market. You need to know your market before doing a discrete choice modelling study. For instance, if you were embarking on a study of industrial macerators, would you know how many sparging paddles to include in your test? Nearly 99.99 per cent of us would be completely lost on this question without doing some advanced work on the exciting macerator market. Never plunge forward without getting some preliminary insights – from secondary sources and/or qualitative research.

Know how users and prospects talk about the product or service. Those who work with a product all day tend to believe that all users have their level of interest and knowledge. Particularly in medical, financial and technical fields, users can show an unsettling level of unfamiliarity with terminology and jargon. For instance, doctors who are not specialists or researchers may have little idea of what the client is saying about their fine medical product. Again, qualitative research can delineate how actual users of a product or service talk about it – and what they understand.

Focus on benefits to the user, rather than how the product is made. Engineering types, in particular, can get enmeshed in the intricacies of how the product is made. For instance, a food manufacturer wanted to test written descriptions of how many foot-pounds of torque were required to open jars of their Ambrosian substance. They were quite proud of reducing the effort required to open a jar in highly specific terms, and thought everyone else would be also.

Keep the attribute levels in order. It seems this should be obvious, but there have been a few studies where this was not done. So if you are, for instance, measuring prices, make the lowest price the first level and then increase steadily to the highest price. It is not hard to untangle the results if you fail to do this, but not keeping everything in order introduces another chance for confusion and error. Just a few minutes at the beginning of the project can obviate problems later.

Again, measure just what you need to measure and no more. This is worth another mention because you are constructing an experiment, and its size depends on the number of attributes and levels that you include. More attributes and levels mean needing to show more marketplace scenarios, and there is a definite limit to how many of these any person can respond to without illegal stimulants. Keeping things under control and focusing on key features is critical. Also, while you should include the key competitors in your marketplaces, you do not need to include every small product or service.

Measure as much variation for each attribute as you think may reasonably happen. You can estimate what will happen between continuous attribute levels that you measure at discrete points. For instance, if you measure prices of $2.50 and $4.00, you can estimate what will happen at any price in between. This is called interpolation. However, you cannot estimate accurately anything outside the range that you tested. If the highest price you tested was $5.00, for instance, you cannot accurately guess what might happen at a price of $7.00. This is called extrapolation. It is highly risky because you do not know where responses may shift strongly without testing. Better to go a little outside the expected range than to fall short later and not be able to simulate a new situation.

Consider narrowing the field of comparisons if you have too many choices. You may need to focus on the more direct competitors rather than the broader marketplace. For instance, a major maker of printers several years ago wanted to ‘rationalize’ their inkjet printer offerings – that is, offer fewer products that competed closely with each other. However, looking at their own sprawling product line and competitors’ offerings, they came up with 57 products that a hapless buyer might choose. They did some preliminary investigations with buyers, and after some discussion separated the choices into price ranges. It turned out that someone who wanted to spend about $100 on a printer would not even look at one costing $800, but might consider something for $250 if they found it absolutely wonderful. The sponsoring company constructed five price ranges, with some overlap at the ends of each range, bringing the total number of choices that any given person would see down to about 12. This was a lot, but more manageable for all involved. And as a result, they were able to determine which closely competing products they could safely eliminate.

Still, make the marketplaces represent the main competitors realistically. The marketplaces provide the reference point for decisions that people will make. (We will get to reference points later in this chapter.) Making this reference realistic helps make decisions in the study more closely reflect the decisions people make in the actual marketplace.

If you expect products to move into or out of the marketplace, make sure that they do so in some of the marketplaces you show. There are well-tested ways to do this. This level of realism is the best for determining exactly what will happen in different situations. Some newer research on how people make decisions (which we will discuss soon) strongly supports the idea that anything you simulate should include only the alternative product sets that people actually saw in the study. Some software allows you to do otherwise, but this is not a sound idea.

Tell people to imagine they really need to buy/get one of the choices if any is at all acceptable. Research shows that this actually helps people to answer more accurately.

Include a ‘none of these’ option. In the real world, people can defer nearly any decision. Paired with the recommendation directly above, this leads to more realistic responses, with people choosing ‘none’ only in the rare instance when an experimental design generates a marketplace scenario containing all bad choices.
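The guideline on interpolation and extrapolation lends itself to a small sketch. The version below assumes straight-line interpolation between neighbouring tested prices, which is our simplification, and it refuses outright to estimate outside the tested range. The prices echo those in the guideline; the shares are invented.

```python
def interpolate_share(tested_shares, price):
    """Estimate share at a price between tested price points.

    tested_shares maps each tested price to its simulated share.
    Straight-line interpolation between neighbours is an assumption.
    """
    points = sorted(tested_shares.items())  # put the levels in order, low to high
    if len(points) < 2:
        raise ValueError("at least two tested prices are needed")
    lowest, highest = points[0][0], points[-1][0]
    if not lowest <= price <= highest:
        raise ValueError(
            f"price {price} is outside the tested range {lowest} to {highest}; "
            "extrapolation is not supported"
        )
    for (p0, s0), (p1, s1) in zip(points, points[1:]):
        if p0 <= price <= p1:
            weight = (price - p0) / (p1 - p0)
            return s0 + weight * (s1 - s0)


# Hypothetical simulated shares at three tested prices (entered out of order on purpose).
tested = {4.00: 0.18, 2.50: 0.30, 5.00: 0.10}

print(interpolate_share(tested, 3.25))  # halfway between the $2.50 and $4.00 results

try:
    interpolate_share(tested, 7.00)
except ValueError as error:
    print(error)
```

Sorting the price levels inside the function is a small safeguard against the ordering mistake described in the guidelines, and raising an error at $7.00 makes the risk of extrapolation explicit rather than silent.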

What happens if these guidelines do not get followed?

Occasionally an organization will launch into one of these studies without doing their homework, or without thinking about the rules. This can result in confusion, delay and ample quantities of blame. The case study below shows what happened in one such instance.

CASE STUDY One that did not start well

A company that made wine wanted to do a pricing study. They selected nine prices per bottle for use in the study, specified in this order: $9.59, $8.39, $11.99, $10.59, $8.99, $12.49, $9.99, $12.00 and $11.00. They also tested three sizes for their beverage, 750 millilitres, 1 litre, and 850 millilitres. They created 12 screens, each of which showed one price and size combination. (They did not show competitors but said that prices could vary from $8 to $14 for a comparable comestible.) Then they collected their data, but could not make any sense of what they got back when they tried to analyse it.

Whatwentwrong?

There are two principal problems here, the first of which has two parts. To start, they tested too many prices. If you know the most and the least you anticipate charging, you need to test no more than four prices to capture the most complex patterns of price versus expected share – although you might possibly go to five (or, if really not sure, six). You would do this to try to capture some price at which responses change drastically. As a reminder, shares almost always go down with sufficient increases in price. There may be some price at which this decline becomes much steeper.

The other part of this first problem was that the prices were not in order from low to high. This is one of the pitfalls pointed out above, and indeed this company did not know exactly how to deal with the data obtained from doing this. They needed to be shown how to reorder the results just to determine whether they looked sensible.

The second issue was that they did not have enough data to analyse correctly with just 12 products shown (these could not accurately be called marketplace scenarios, with just one product). First counting the attribute levels, we find that measurement would require showing eight marketplaces to measure the first attribute and two for the second, one fewer than the number of levels in each case. That total of 10 is less than the 12 they had, but they forgot one other rule. The actual minimum would be 9 × 3, or 27, due to the need to cover as many marketplaces as the numbers of attribute levels multiplied by each other.

This last problem proved to be insoluble without more work. The best that could be done was to go back and interview more people with the remaining price/size combinations. We then relied on our faithful HB analysis to help put together the results from the two groups interviewed. This meant delays and extra costs (and considerable grumbling), but going back for more responses did ultimately save the project.
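The two counting rules in this case study can be checked with a few lines of Python. This is only a sketch of the arithmetic described above; the function name is our own invention, and the count of missing combinations assumes the 12 screens each showed a different price/size pair.

```python
from math import prod

def scenario_counts(levels_per_attribute):
    """Return the two scenario counts discussed in the case study.

    levels_per_attribute: number of levels for each attribute, e.g. [9, 3].
    """
    # First rule: each attribute needs (number of levels - 1) scenarios.
    by_levels = sum(levels - 1 for levels in levels_per_attribute)
    # Second rule: multiply the numbers of levels by each other.
    by_product = prod(levels_per_attribute)
    return by_levels, by_product

wine = [9, 3]  # nine prices, three bottle sizes
by_levels, by_product = scenario_counts(wine)
print(by_levels, by_product)   # 10 27
print(by_product - 12)         # 15 (assumes 12 distinct combinations shown)
```

Had the company tested four prices rather than nine, the same function gives 4 × 3 = 12, which is exactly the number of screens they built.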

The price is right: price versus change in share

We mentioned above that four different prices would be enough to capture even the most complicated pattern of price versus share. But what are some of the patterns that you might encounter? A self-effects curve, as illustrated in Figure 4.3, shows one way a product’s share could change as its price, and only its price, changes. We find this by running several simulations and seeing how share shifts.

FIGURE 4.3 A self-effects curve for share versus price

The numbers in the boxes near the share values represent price elasticity measurements. An elasticity measurement is the percentage change in the product’s share divided by the percentage change in price.

A product’s response is considered elastic if the value is greater than 1.0, meaning a 1 per cent change in price causes more than a 1 per cent change in sales. An elasticity value of approximately 1.0 is sometimes called revenue neutral – that is, gross revenues remain approximately the same with increases or decreases in price. Finally, if the value is less than 1.0, the response is considered inelastic.

Anything related to elasticity seems perpetually confusing. It may be easier to recall that elasticity values higher than 1.0 correspond to a loss in gross revenues when price increases, and those below 1.0 correspond to gains in gross revenues with increases in price.

By the way, for all the discussion of price elasticity, you likely will avoid confusion if you leave it out of your discussion. For instance, many years ago, we brought a study to a junior client in which the displays showed how many share points they would lose for a dollar increase in price. This young person got incensed, demanding to see the elasticities. The modified presentation then went to the Really Big Boss upstairs. He listened for a while, then burst out, ‘I don’t care about these [expletive] elasticities! Show me how [many] sales we lose if we raise the price a dollar!’

Figure 4.4 shows several price versus share curves that you will commonly encounter. If we know where responses are likely to change, four price points will cover all these contingencies. The figure to the left shows an essentially straight-line response to raising prices for TinyCo, and a curved (convex) response for Ace T&T. The Ace T&T curve shows increasing rates of share loss as prices rise.
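For readers who like to see the elasticity arithmetic spelled out, here is a minimal Python sketch. The shares and prices are hypothetical, the function names are ours, and the 0.05 band used for ‘approximately 1.0’ is an arbitrary assumption.

```python
def price_elasticity(share_before, share_after, price_before, price_after):
    """Percentage change in share divided by percentage change in price.

    Returned as a magnitude, since share normally falls when price rises.
    """
    pct_share = (share_after - share_before) / share_before
    pct_price = (price_after - price_before) / price_before
    return abs(pct_share / pct_price)

def classify(elasticity, tolerance=0.05):
    """Label a value as elastic, inelastic or (roughly) revenue neutral."""
    if abs(elasticity - 1.0) <= tolerance:
        return "revenue neutral"
    return "elastic" if elasticity > 1.0 else "inelastic"

# Hypothetical case: a 10 per cent price rise takes share from 20% to 19%.
e = price_elasticity(0.20, 0.19, 10.00, 11.00)
print(round(e, 2), classify(e))   # 0.5 inelastic

# Inelastic response: revenue per buyer rises despite the lost share.
print(0.19 * 11.00 > 0.20 * 10.00)   # True
```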

FIGURE 4.4 Different responses of share to price changes

Figure 4.4 also shows two response curves where there is a point of inflection, or elbow, as prices increase. That is, after a certain price is reached, shares drop off much more sharply. You want to set the prices tested so that any such change is captured. The existence of inflection points like these underlines the riskiness of trying to extrapolate beyond the prices you actually test. If you do not test an adequately wide range, an inflection may lie just outside your highest or lowest price.

Gargantuan brand’s response curve shows a clear reservation price, a point beyond which sales drop rapidly to nearly zero. In some categories there are clear reservation prices. For instance, a number of years ago, almost no bottled juice in the US sold at over $5.00 a container. Those days are no longer with us, but the principle remains.

Welcome back to the world of HB analysis

Our cautions about keeping studies compact now have as much to do with not wearing out the people answering as with any limits on the size of an analysis. This is thanks to the world of HB analysis, which we will revisit and discuss in slightly more detail.

Briefly, this method of analysis fills in data that is scant or missing for a respondent by repeatedly borrowing estimates from other respondents. That is, it keeps sampling other respondents and storing values from those who have the needed information.

It usually runs 20,000 or more times for each attribute level for each respondent, keeping a running average of its estimates. It may or may not compare the respondent to the sample it is drawing and make adjustments based on their similarities. Whether these comparisons get done or not, the estimates seem to come out the same in all practical terms.

Estimates will settle down to steady values (or converge) if you have set up the problem correctly. If you have not, they may never settle. A solution that does not converge usually means errors in set-up, data collection or coding of values.

HB gives your PC (or Mac) more of a workout than almost anything else. You will wait for a complicated analysis to finish running, maybe for hours. Amazingly, all this borrowing works – and we get very accurate estimates.

Figure 4.5 shows the progress of an analysis graphically. Each wavy line represents the history of the average utility estimates for one level of one attribute. At first, these estimates are unstable, but by iteration 30,000, they are oscillating around a steady estimate – that is, they have converged around a final value.
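The idea of a running average that settles down can be illustrated with a toy example in Python. To be clear, this is not HB analysis: it simply averages random draws around a made-up ‘true’ utility of 1.5, so that the way early estimates wander and later ones stabilize can be seen in numbers rather than in a chart. Every value in it is an assumption for illustration.

```python
import random

def running_average(draws):
    """Yield the running average after each new draw."""
    total = 0.0
    for count, value in enumerate(draws, start=1):
        total += value
        yield total / count

rng = random.Random(0)   # fixed seed so the run is repeatable
true_value = 1.5         # hypothetical utility for one attribute level
draws = [rng.gauss(true_value, 1.0) for _ in range(30_000)]
history = list(running_average(draws))

# Compare how much the estimate moves early on with how much it moves late.
early_swing = max(history[:100]) - min(history[:100])
late_swing = max(history[-5_000:]) - min(history[-5_000:])
print(f"swing over first 100 draws:  {early_swing:.3f}")
print(f"swing over last 5,000 draws: {late_swing:.3f}")
```

A history that kept swinging as widely at the end as at the start would be the toy equivalent of a solution that has not converged.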

FIGURE 4.5 How values change as HB analysis runs

Before HB analysis, if we wanted to measure more than could be handled by setting up about 15 marketplace scenarios, we had to increase the sample and give each person a fraction of the whole set of marketplaces we needed to get the measurements we wanted. For instance, if we needed 28 marketplace scenarios, we could have given each person half of those, and doubled the sample.

Now we can squeeze much more out of a study without needing to compensate by increasing samples. You can get three or four times more information reliably (although some experts are more conservative and say twice as much and no more). Even doubling the amount of useful information we can get from one person is amazing.

How many people you need

First, no hard and fast rules exist. Discrete choice is based on approaches in the logistic regression family of methods. Some experts say you need 10 people per item you measure. For most related methods, five is often said to be sufficient. Experience shows that, for a reasonably sized experiment, 125 per group you want to measure separately is safe and reliable. Some are more cautious and say 200 per group.

As a reminder, all samples have error. Discrete choice modelling may be helped somewhat because the errors in this form of analysis have a tighter distribution than with most procedures. This is called a Gumbel distribution. We can see this in Figure 4.6. Smaller errors mean that samples in discrete choice modelling studies may act like bigger samples with standard measurements.

FIGURE 4.6 Errors in discrete choice versus many other methods

About those curves

Some of you may look at the normal curve and say something like, ‘Oh, yes, that.’ Others might have a vague, if not terror-filled, recollection of something you probably should not have dozed through in Statistics 101. So that everyone is up to speed, the normal distribution appears many times when we measure physical phenomena (hence its name, normal).

For instance, the belt sizes of men’s pants in the army follow a normal distribution. The height of the curve represents how many items or observations fall at that point. We can see that the most fall right at the average. In Figure 4.6, the average is set to zero, and differences from the average are either positive or negative. As we get further from the average, there are fewer observations or items. This curve is symmetrical, with as many below by a certain margin as above by the same margin.

Errors in measurement in mathematical models also often fall into the same kind of normal curve. Not so the Gumbel distribution. By comparison, it is taller and pinched in towards the middle, with relatively more small errors. It also is not symmetrical. It skews out further on the positive side.

That pinching around the average is the important part. Smaller errors mean that measurement is somewhat more accurate for a given sample size than with many methods we typically use.

Back to our topic

Recall that with HB analysis you can create a large experiment, varying many attributes and attribute levels, and not need to make a compensating increase in sample. This helps keep the size and cost of studies involving discrete choice modelling in line. With exceptionally large experiments, you still may need to increase the sample to get enough for making good measurements.

This definitely happens when you need more than 48 marketplace scenarios, equalling 48 screens that you would need to show. Even dividing this by three, this would result in 16 screens per person participating in the study – a hefty burden on the poor study participant.

Some experts would say that dividing the design by three and not increasing the sample is stretching too much. Staying more conservative, any experimental design requiring over 36 market scenarios must have an increased sample. With 36 scenarios, that would involve dividing the total design into two parts, showing each person 18.

Studies can grow unexpectedly. Not long ago, one became monstrous, with 77 candy choices appearing on a simulated shelf. This many is not recommended. The story is tangled, featuring a client that somehow got out of control, and an inexperienced and highly confused person getting into the middle of the planning process. This study set out to measure the effects of certain candies being removed from the market. Some products were considered essential, items that never would be out of stock and so should have been always present. However, 55 candy choices could be there or not. This required 60 marketplace scenarios. Since the average candy buyer was not expected to remain patient with repeatedly seeing simulated store shelves, each person made choices in 10 out of the 60 marketplaces. The study used a sample of 1,000, and then allowed no more than two-way divisions of the total into subsamples; 500 was the minimum group size analysed.

All involved held their collective breaths, due to the difficulties in attempting to measure so much in one study. Fortunately, share predictions proved to be accurate, even for candies with shares that were a fraction of a percentage point. We could make comparisons because the sponsoring company had extensive sales data. When simulations were set to current market conditions, results fitted the marketplace.
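The arithmetic of splitting a design across people is simple enough to sketch in Python. The helper below is our own, and the sample size of 300 in the first two calls is a made-up figure; only the candy study’s 1,000 comes from the text. It also assumes the parts of the design are spread evenly over the sample.

```python
def split_design(total_scenarios, parts, sample_size):
    """Screens each person sees, and roughly how many people see each scenario."""
    screens_per_person = total_scenarios / parts
    people_per_scenario = round(sample_size / parts)
    return screens_per_person, people_per_scenario

print(split_design(48, 3, 300))    # (16.0, 100)
print(split_design(36, 2, 300))    # (18.0, 150)
print(split_design(60, 6, 1000))   # (10.0, 167)  the candy study
```

In the candy study, then, each of the 60 marketplaces was seen by something like 167 of the 1,000 participants, which helps explain the nervousness before the results came in.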

Utility and share

Going back to earlier chapters, you may recall that discrete choice modelling uses utility as a way of keeping score. This allows various features and prices to be valued on a common footing. However, this method does not assume that utility has a straight-line relationship with share. Rather, discrete choice modelling uses an S-shaped relationship or response curve.

This matches how people respond perceptually to their environment. For instance, suppose a light source is slowly increased in brightness from zero. When the light is sufficiently dim, small increases in brightness hardly register as differences. Then when the perceptual threshold is reached, increases in intensity register more strongly. This is the mid-portion of the S-curve. Finally, continuing to dial up the illumination, another threshold is reached where the light becomes too bright. Then further increases cannot be perceived.

This curve also matches marketplace behaviour. Utility must pass a certain threshold to get a noticeable response – that is, smaller products tend to get lost in the shuffle. When a product becomes sufficiently salient, small increments in utility boost responses strongly. Finally, saturation is reached. At that point, big changes in utility are needed to move those strongly committed to other choices. It is hard to move people with entrenched preferences and so approach a nearly unanimous response level.

Recent research into decision making, in particular prospect theory (for which Daniel Kahneman won a Nobel Prize), is largely consistent with this view of utility. Prospect theory also uses an S-curve, but one that is less symmetrical. Figure 4.7 shows how the prospect theory S-curve compares with the one hypothesized by discrete choice modelling.

FIGURE 4.7 Comparing the S-shaped response curve to prospect theory

The prospect theory curve reflects the finding that people focus more strongly on losses than on gains. The diamond on the curve represents a ‘reference value’ – a point above which people see a gain and below which they see a loss.

The various choices in each discrete choice modelling marketplace set a reference point for comparisons – which ones are better and which worse. This underlines the importance of keeping marketplaces realistic. The context that the whole marketplace provides is a key factor in the choice a person finally makes.

Mathematically, the symmetrical S-curve is a reasonable approximation for the curve in prospect theory. We could argue about whether it is good enough. Perhaps venturing onto another region of metaphorically thin ice, we can surmise that functionally these models are adequately close. The fact that discrete choice modelling forecasts have held up so well, in instances where they could be checked, argues that this method can capture what people choose. That is of course dependent on the project being set up and analysed with care.

Estimates involve calculations

The realism of the S-shaped response model makes for extra work in calculations. That is, while the analysis leads first to utilities, you cannot know the value of an alternative just by summing its utilities. There is an intermediary step (technically exponentiation, which may be familiar to some as the EXP function in Excel). This gets you from utility to that S-shaped curve.

This curve looks nearly like a straight line over its middle range. Outside that range, shares have a less clear relationship to utility. A product with a share over (about) 60 per cent or under (about) 10 per cent can respond to changes in utility in unintuitive ways.

You need a market simulator program to see how shares change as features and prices change. After discussing one last consideration, we will show some examples of simulations in action.

The last wrinkle with HB analysis

Using HB analysis, the utilities produced for each person are scaled differently from those for other people – that is, larger or smaller than for others. This happens for some technical reasons, which we will spare you. The practical upshot of this is that reporting utility values after using HB analysis makes only limited sense.

We get away from this problem by solving for ‘share’ one person at a time and averaging these values into an overall share. Precisely how this happens is just a bit technical, but let’s say it and go forward. That is, this gets done (for each person) by first exponentiating the total utility of each choice, then dividing that by the sum of the exponentiated utilities of all the choices. We then average all these calculated results.

In any event, remain cautious if you see a report of utilities. Instead, you should be using a market simulator and observing the effects on share of changing features and prices. And this brings us directly to the topic of market simulations.
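The person-by-person share calculation can be sketched in a few lines of Python. The sketch reads the denominator as the sum of the exponentiated utilities across choices, which is the standard logit form. The utilities are invented for illustration, and the function names are ours rather than those of any particular simulator.

```python
from math import exp

def shares_for_person(utilities):
    """Logit shares for one respondent: exp(u) over the sum of exp(u)."""
    weights = [exp(u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def average_shares(utilities_by_person):
    """Solve for share one person at a time, then average across people."""
    per_person = [shares_for_person(u) for u in utilities_by_person]
    n = len(per_person)
    return [sum(column) / n for column in zip(*per_person)]

# Hypothetical total utilities for three choices. The second person's
# utilities sit on a larger scale, as can happen with HB output.
people = [
    [0.2, 0.0, -0.2],
    [2.0, 0.0, -2.0],
]
overall = average_shares(people)
print([round(s, 3) for s in overall])
```

Because each person’s shares sum to 1 before averaging, the difference in scale between the two people does no harm, which is the point of working in shares rather than raw utilities.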

Market simulations

In a market simulation program – perhaps the best thing since sliced bread – all the hard work finally pays dividends. It provides a powerful culmination to everything done in selecting attributes and levels, in getting a good experimental design, in making the marketplaces, in doing the analysis, in giving your poor computer its workout for the month, and in tweaking Excel until it does what you need.

Our first simulator

The first such program we discuss is a relatively simple one, from a study that has been disguised slightly for use here. It involved commercial purchasers of printers. The basic situation was this: if customers committed to buying a certain number of printers, they could get a cash and/or non-cash incentive. The main question addressed was which of these it would be best to offer. Another factor measured was the price of the printer model that might be used in connection with this bulk purchase.

That is, the client wanted to measure how share would be affected by different prices and different incentives. They also wanted to know what would happen if competitors responded by offering their own incentives at various price points.

Study participants chose among these offerings and could also choose ‘none of these’ if none of the competing offers met their needs. They were told to imagine that they really needed to make such a purchase, but that they could opt out if all the choices were unacceptable. They saw real brand names, which we have altered in this example.

Figure 4.8 shows a part of this simulator. In this the user could set the current case. This would be the configuration that they wanted to test in a given simulation. The other part of the simulator, in which the user would set the reference case, appears in Figure 4.9. Together both sections are too large to read comfortably in print. (Excel, where this simulator was constructed, allows users to scroll around the screen to see different parts of a given display.)

FIGURE 4.8 The current case side of the simulator

FIGURE 4.9 The reference case side of the simulator

The changes in the simulator show differences between the current case and the base or reference case. You need to have this comparison if you want to understand the sizes of share changes when features and/or prices change. Again, this is so because utility does not have a straight-line relationship with share. The size of the share effect from changing a feature will depend in part on the share at which you start.

This may seem like a mind-bending concept – but it reflects the realities of the marketplace, and so is important to recall. Where you start in the marketplace can influence how strongly the marketplace responds to whatever you are changing. Starting from a sufficiently low share can make the share effects of changes to your product or service strongly non-linear. Adding a given amount of utility may produce little effect, then adding the same amount again may increase share much more, for instance.

You can see this phenomenon reflected in Figure 4.7. Depending on where you start on that S-curve, effects in share arising from adding a given amount of utility can vary widely. This definitely merits being called out.

Figure 4.9 shows the reference or base case portion of the simulator. This section can be reconfigured – that is, the features and prices can be changed – so that you can still understand the share effects of any change that you make, should the current or reference situation change.

Because utility and share do not have a straight-line relationship, it is important to have a reference or base case for comparison so that you understand the share effects of changing features and/or prices. Depending on where your product starts on the S-shaped curve reflecting the relationship of utility and share, a given change in features or prices can have different share effects.
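A small Python sketch may make the starting-point effect concrete. It assumes the simplest possible market, one product against a single rival whose utility is fixed at zero, and adds the same one unit of utility from three different starting positions. The utilities are hypothetical and the function names are ours.

```python
from math import exp

def share(utility):
    """Two-choice logit share against a rival whose utility is 0."""
    return 1.0 / (1.0 + exp(-utility))

def share_gain(start_utility, added_utility=1.0):
    """Change in share from adding the same amount of utility."""
    return share(start_utility + added_utility) - share(start_utility)

for start in (-5.0, -0.5, 4.0):
    print(f"start share {share(start):.1%}, gain {share_gain(start):+.1%}")
```

The identical improvement moves share by about a point at the bottom or top of the curve, and by more than twenty points in the middle.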

CASE STUDY The power of simulations – mobile phone towers

This study concerned the exciting world of mobile phone towers, in particular, the electronic innards that make them work. The sponsoring company, which we will call Ace T&T, had little idea of what their true competitive situation was. Their main source of information was anecdotal reports from the sales force, peppered with some gossip they picked up at various conferences and conventions.

They were fairly sure that they were the leading provider, but had no idea what their lesser-known competitors, which we will call Minor Players, InsignificantCo and Tiny Industries, were doing with pricing.

Ace wanted to know if they could sell more of these units if they dropped their prices. Also, they wanted to know what would happen if their competitors followed suit. An experiment was set up and interviews were conducted. In the survey, buyers of mobile phone towers saw a series of marketplace scenarios. In each of these, the four brands offered towers at different prices between $48,000 and $88,000.

The client chose four evenly spaced prices for each brand. There also were two non-price features that could vary in a couple of ways, but these proved not to be important to the study, so we will omit them here.

This led to a total of 24 marketplace scenarios. Each study participant evaluated 12 of them and, thanks to HB analysis, this was sufficient.

Lock up those reports!

The simulator was made so that it showed both share and gross revenues per 100 sales. This was critical information, as Ace found out in the first simulations it tried. They set the reference point to the average current price revealed in the research: $68,000.

Note that ‘none of these’ was a study option but was not reported. This did not exceed 1 per cent until all prices were over $82,000. These were boom times, if not the golden age, for mobile phone towers.

In any event, in simulations, Ace first set all brands’ prices to that $68,000 average. They then dropped their price to $52,000. This initially seemed good for them, as their estimated share went from 38 per cent to 46 per cent. This meant the new share was 46/38, or about 121 per cent, of the old one – a relative increase of about 21 per cent.

However, if the other brands somehow stumbled onto what Ace was doing and matched their price decrease, Ace would end up nearly where they started, at a 39 per cent share. We can see this in Figure 4.10, where the results of these two simulations appear in chart format.

FIGURE 4.10 Base case and two simulations showing the danger of a price war

This was bad enough, but looking at gross revenues, the picture turned dire. Although Ace’s share increased if Ace alone dropped price, their gross revenues actually declined slightly in spite of larger share. If the others followed suit in dropping prices, Ace would end up selling nearly the same number of units, but gross revenues would go down by about 22 per cent. We can see this in Figure 4.11.

FIGURE 4.11 Changes in revenues from the simulations underline the dangers

Seeing these simulations, Ace issued the call to lock up the reports. They realized that they would lose terribly if they started a price war – or indeed if anyone else did. Only InsignificantCo would emerge nearly unscathed if everyone dropped prices. They might find the drop in revenue acceptable for their increased presence in the market. Their share would rise by over 27 per cent, increasing by five points over their initial 18 per cent.
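The revenue figures in this case study can be approximated from the shares and prices quoted. The Python sketch below assumes that gross revenue per 100 sales is simply share points multiplied by price, ignoring the small ‘none of these’ share; the simulator’s actual formula is not given, so treat this as our reading rather than the original calculation.

```python
def revenue_per_100_sales(share_pct, price):
    """Gross revenue from the units one brand wins out of 100 sales."""
    return share_pct * price

def pct_change(new, old):
    """Relative change from old to new, in per cent."""
    return 100.0 * (new - old) / old

base = revenue_per_100_sales(38, 68_000)        # all brands at $68,000
ace_alone = revenue_per_100_sales(46, 52_000)   # only Ace drops to $52,000
price_war = revenue_per_100_sales(39, 52_000)   # everyone matches the cut

print(round(pct_change(46, 38)))            # 21   relative change in share
print(round(pct_change(ace_alone, base)))   # -7   revenue, Ace alone
print(round(pct_change(price_war, base)))   # -22  revenue, price war
print(round(pct_change(23, 18), 1))         # 27.8 InsignificantCo's share
```

Under that assumption the price-war result lands on the ‘about 22 per cent’ decline reported in the text.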

On the upside

Sobered but made more curious by these simulations, Ace T&T tried a few more. Three of these are summarized in the chart in Figure 4.12. The first, under ‘A’, shows what happens if they alone raised their price to $74,000. They manage to eke out a small gain in gross revenues, even if selling fewer units.

FIGURE 4.12 Results from three more simulations

Then they tested what would happen if they and their competitors all raised prices. This appears under ‘B’. As is the case with other inelastic goods and services, if prices were to go up, all competitors win – and consumers lose. This is one reason collusion is illegal.

The last simulation, under ‘C’, shows what would happen if they raised their price and all others took a modest decrease at the same time. In this instance, they lose, but so does everyone else.

This example underlines the importance of not focusing solely on share as the criterion for your success. It also shows how simulations can model both actions and counter-actions, helping to set a more effective strategy.

Making more than one choice: allocating purchases

Discrete choice modelling can forecast what will happen in situations where people might choose more than one item. With everyday consumer products, this might happen in such situations as buying beverages or some food items. Allowing people to choose more than one item is called allocating purchases.

Allocation often makes sense with business-to-business purchase decisions. As an example, many studies among doctors ask what they would prescribe for 10 or 20 typical patients and/or patients of specific types. This makes sense because doctors often prescribe several different drugs for the same condition, depending on such factors as other drugs the patients are taking, diseases and disorders that they have, the patient’s age and so on.

The simulator we showed at the beginning of this chapter is based on doctors allocating 20 patients to the different therapies available. They did two allocations in each marketplace that they evaluated, one for all patients and one for newly diagnosed patients.

While it is possible to allow people to choose as many items as they like, this can get messy to analyse. Asking people to allocate across a set number of purchases, such as the next 10, always works well.
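To show what allocation data looks like before any modelling is done, here is a short Python sketch that tallies raw allocations into observed shares for one marketplace. It is a simple count, not the choice model itself, and the doctors, therapies and numbers are all invented.

```python
def allocation_shares(allocations):
    """Turn each person's allocation of purchases into overall shares."""
    totals = {}
    for person in allocations:
        for item, count in person.items():
            totals[item] = totals.get(item, 0) + count
    grand_total = sum(totals.values())
    return {item: count / grand_total for item, count in totals.items()}

# Hypothetical: two doctors each allocate 20 patients across three therapies.
doctors = [
    {"Therapy A": 12, "Therapy B": 6, "Therapy C": 2},
    {"Therapy A": 5, "Therapy B": 10, "Therapy C": 5},
]
observed = allocation_shares(doctors)
print(observed)
# {'Therapy A': 0.425, 'Therapy B': 0.4, 'Therapy C': 0.175}
```

Fixing the number of purchases at 20 per doctor means every respondent carries the same weight in the tally, which is one reason a set number is easier to analyse.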

Using the simulator program in the online resources

Download the program (available at www.koganpage.com/AI-Marketing) and open it with Microsoft Excel. If Excel is feeling reasonably well, the program will open and the Excel ribbon (the bar across the top with menus and commands) will temporarily vanish. Two buttons on each page of the simulator allow you to control whether the Excel ribbon appears or not. These buttons are shown in Figure 4.13. If you cannot see them, and need to use them, please scroll down until they are visible.

FIGURE 4.13 Page display controls in the downloadable simulator

If you want to continue working in Excel after using the simulator, restore the ribbon to view, and then close the file using the menu. If the simulator is the only file open, using the small ‘X’ that appears in the uppermost right corner of the screen will make Excel close.

If Excel is baulking at opening the simulator, please reassure it, clicking the option it presents to say it is fine to run macros. If this still does not work, go to the third tab in the simulator, ‘Security in Excel 2007 and beyond’. Follow the instructions there and Excel should at last cooperate.

The first page of the simulator has some general instructions. These are important to include with any simulator that will find its way around an organization, or in case someone opens it after a few months have passed.

The next sheet is the actual simulator. Figure 4.14 shows the left side of the screen that you will find on this sheet. Here you have all the controls for setting the current case and for establishing a reference or base case.

FIGURE 4.14 The left side of the downloadable simulator

Prices and amounts of horsepower run using sliding controls. You can either pull these to the desired values or click on the ends or inside the control. Clicking on an end (arrow) of the horsepower control changes it by one unit; clicking inside it changes it by 10 units. Clicking on an end of the price control changes it by 10 units, while clicking inside changes it by 100 units.

All the other features are either present or absent. You choose whether each is a part of the feature package by clicking either on ‘Yes’ or ‘No’.

At the bottom of this side of the page you will find a set of control buttons that allow you to control whether the Excel ribbon is visible or not. A last set of identical controls also appears at the bottom right of the page about security in Excel.

Figure 4.15 shows the various displays in the simulator that result from the chosen values of the current and reference cases. The top section has numerical information.

FIGURE 4.15 The right side of the downloadable simulator

Next to this chart you will find two additional control buttons. One returns the current case to preset values built into the simulator. The other returns the reference case to a set of preset values. Controls of this type can be useful after a few simulations have been run and you would like to get a fresh start.

The middle section of this page shows the information in the number chart in a graphical format. Seeing the differences in changes can help make their sizes more easily understandable.

At the bottom, you will find calculations for gross revenues per 100 sales. These are gross revenues because no costs (such as distribution, advertising or other kinds of overhead) are offset against them.

Again, in many cases, going for the largest share may not be the best policy for getting the most revenue. This chart allows you to keep track of both share and gross sales.

Using Excel to optimize

A powerful optimizer is built into Excel itself. This allows you to find the simulator settings that lead to the best possible outcome in specific situations. It is called the ‘solver’. It is not part of the simulator itself, but runs in conjunction with the simulator’s controls.

You need to load the solver before you use it. The instructions from Microsoft for loading and running this add-on are clear and easy to follow. You could use the optimizer to answer questions such as, ‘What is the mix of price and features that will optimize my revenues if competitors stay as they are now?’ You could then answer another question, such as, ‘What will optimize revenues if my competitors then drop their prices by 10 per cent?’

Addressing questions like these can add a new dimension to predicting changes in share and revenue. Depending on the objectives of the project, optimization can provide valuable strategic insights.
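The kind of question an optimizer answers can be sketched in Python without Excel. What follows is not the solver and does not use the downloadable simulator: it is a brute-force search over candidate prices in a made-up two-brand logit market, with every weight and price assumed purely for illustration.

```python
from math import exp

def share(price, rival_price, price_weight=-0.05, appeal=0.5):
    """Logit share against one rival; all weights here are hypothetical."""
    own = exp(appeal + price_weight * price)
    rival = exp(price_weight * rival_price)
    return own / (own + rival)

def revenue_per_100_sales(price, rival_price):
    """Gross revenue per 100 sales at a given price."""
    return 100 * share(price, rival_price) * price

def best_price(candidate_prices, rival_price):
    """Try every candidate price and keep the one with the most revenue."""
    return max(candidate_prices,
               key=lambda p: revenue_per_100_sales(p, rival_price))

prices = range(40, 101)   # hypothetical prices in whole currency units
now = best_price(prices, rival_price=70)
after_cut = best_price(prices, rival_price=63)   # rival drops 10 per cent
print(now, after_cut)
```

Running the search twice, once with the rival as is and once after a 10 per cent cut, mirrors the pair of questions posed above. A real optimizer would also vary features and handle more than one competitor.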

Rounding out the picture

This chapter has become the longest and most complex in this book. First, congratulations for persevering. We have actually omitted much, in particular vast swaths of a more technical nature. Still, we have just a few last topics to consider.

Compensatory versus non-compensatory: what is this all about?

Some criticisms of discrete choice modelling have arisen based on theories that people do not weigh attributes and sum utilities to make decisions. These so-called non-compensatory theories seem perfectly non-objectionable, as we indeed have no clear idea of how decisions get weighed. However, and let us speak plainly here, the argument that this somehow invalidates choice modelling is nonsense.

Some of these criticisms say that people pick certain features first, eliminate choices based on those, and then narrow down in some way. To these critics, this throws into question what people do in choice models.

We understand what these critics are saying and can sympathize with it, but they are wrong. Just because choice modelling presents a product as a set of attributes, this does not force people to weigh and balance everything they are seeing. People indeed can look at the marketplaces presented to them and use any decision strategy that they would use in the actual marketplace.

We can see by watching how people decide that they have different strategies. One part of many studies is sitting with a few people before the formal interviews start and going through the marketplaces. During this phase, you ask people to talk about what they are doing, ask questions and do a great deal of observation. Some people do indeed focus in on a single attribute, such as price or brand, some balance two or three of them, and some make an earnest effort to balance multiple factors.

Discrete choice modelling can accommodate and reflect any of these strategies. The problem with arguments about ‘non-compensatory’ decision making seems to lie in confusing a capability of discrete choice modelling with a requirement of discrete choice modelling. That is, it can model people making complicated decisions where they try to trade all attributes. But that does not mean that anyone has to do that. People can make decisions according to simpler rules and the method will still work accurately.
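One way to see why a simple rule poses no problem is to note that an additive utility model can mimic it. The Python sketch below compares a ‘always pick the cheapest’ rule with a utility calculation in which price carries an overwhelming weight. The markets, weights and function names are hypothetical, and the match is only guaranteed here because price gaps are at least 1 and feature scores never exceed 10.

```python
def pick_by_rule(options):
    """Non-compensatory rule: always take the cheapest option."""
    return min(options, key=lambda option: option["price"])["name"]

def pick_by_utility(options, price_weight=-100.0, feature_weight=1.0):
    """Additive utility in which a very large price weight swamps the rest."""
    def utility(option):
        return (price_weight * option["price"]
                + feature_weight * option["features"])
    return max(options, key=utility)["name"]

# Hypothetical marketplaces: the pricier option always has better features.
markets = [
    [{"name": "A", "price": 9, "features": 2},
     {"name": "B", "price": 11, "features": 9}],
    [{"name": "A", "price": 12, "features": 8},
     {"name": "B", "price": 10, "features": 1}],
]
for market in markets:
    print(pick_by_rule(market), pick_by_utility(market))
# A A
# B B
```

A person who shops on price alone simply shows up in the analysis as someone with a very large price utility and small utilities for everything else.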

Some less fortuitous applications

We have already spoken about the dangers of poorly defined projects, unclear terminology and unrealistic representations of the marketplace. Earlier we reviewed the pitfalls of talking to (or about) the wrong people, and of putting so many restrictions on selecting people for a survey that you lose track of who they represent. But what if you get all the basics right? When could discrete choice modelling not work well?

We have seen problems with certain types of products or services. First, let’s consider products as ranging along a continuum, from more cognitive (or having more to think about) to more affective or sensory (or more feeling-based). Trade-off methods work best where products have more cognitive elements.

Sometimes it is very difficult to get people to trade affective or sensory elements. For instance, in a trade-off exercise, people do not accurately trade off ‘tastes good’ against other product attributes. We know that in real life, people will consume some fairly awful items in the interest of saving money. For instance, certain store brands of non-fat cream cheese are indistinguishable from window caulking except that they do not hold up as well as the caulk to the elements. Yet these remain on sale.

However, you will almost never encounter a person who will admit in a survey to eating something that tastes worse than the container it comes in, just to save a few pennies. Such a purely sensory trade-off is a rare instance where the in-market experience does not translate well into the setting of an interview. In any event, never try to trade off worse taste, aroma or feel versus other features in a discrete choice model.

However, people can show that they respond differently to various brands. For instance, Sony once commanded a higher price than other brands for the same set of features. Therefore, features once were worth more with the Sony name. This shows that a rather amorphous quantity such as ‘brand identity’ can in fact be measured in a choice study – by the hard-edged metric of what people choose.

Variants of choice modelling

A number of variants of discrete choice modelling have been proposed. Some of the better-known ones come from one company, Sawtooth Software. These include menu-based choice modelling, which aims to address the specific problem of modelling choices from companies that offer a menu of choices, such as phone companies on their websites. You can set up a standard discrete choice modelling study to get close to this, but not exactly what you would see on such a website. Another offering is adaptive choice-based modelling, which seemingly chases the chimera of non-compensatory decision making.

These are interesting ideas, but so far have been supported only by papers published on the company website, or in some cases by papers from loosely vetted industry conferences. This does not mean that there is anything wrong with the work involved, just that these have not yet been held up to rigorous academic standards.

As mentioned in Chapter 1, every method we discuss in detail has to pass both in the sphere of peer-reviewed papers and in the test of practical application. So far, then, these variants have intriguing ideas, but no more.

Perhaps the best-known alternative, a software product called Choice-Based-Conjoint, or CBC, has gained many followers. To some, this (wrongly) is seen as synonymous with discrete choice modelling. Actually, though, the name of this product is honest. It is partly conjoint analysis and partly discrete choice modelling.

Because of this hybrid nature, CBC will appear in Chapter 5, where we explain conjoint analysis. There, you will be able to see what CBC takes from this approach and what it takes from discrete choice modelling.

Summary of key points

Discrete choice modelling is arguably the most powerful and sophisticated method for determining the exact mix of features and prices to include in a product or service. It focuses on what people will choose. It provides some of the most powerful outputs of any analytical method, in particular interactive market simulator programs. These show you precisely what will happen to share of preference when products’ features and/or prices are changed.

If you can manage to think ahead to what competitors most likely will do, then you can cover not only what would happen if you change your own fine product or service, but also what will happen when competitors respond.

The method uses a survey that shows people realistic representations of the choices they have to make. Each person will see a series of these marketplace scenarios, and in each they simply say what they would choose, just as they do in real life. The problem can also be set up so that they can say they would choose ‘none of them’ in each marketplace. This greatly increases the realism of the exercise. There are very few situations indeed where people cannot sit out making a purchase, even when the need is strong.

Discrete choice modelling considers products or services as collections of attributes. In the study these attributes are varied in specific ways. For continuous attributes, such as car horsepower, only certain specific values are tested. The different values that are tested are called levels. For instance, if the engine could have anywhere from 150 to 240 horsepower, we might test 150, 180, 210 and 240 as the levels. To determine the values in between we would interpolate.

Discrete choice modelling is based on strict experimental designs that govern which level of each attribute appears in a given marketplace scenario. As you measure more attributes and levels, the experiment must grow larger to measure them. It is important to measure exactly what you need – and no more.
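The interpolation between tested levels described above can be sketched in a few lines of Python. This is only an illustration: the horsepower levels come from the text, but the utility values and the function name `interpolate_utility` are hypothetical, and a straight line between neighbouring levels is assumed as the simplest choice.

```python
def interpolate_utility(value, levels, utilities):
    """Linearly interpolate a utility for a value lying between tested levels."""
    if not levels[0] <= value <= levels[-1]:
        # Outside the tested range we have no measurements, so refuse to guess.
        raise ValueError("value lies outside the tested range; do not extrapolate")
    points = list(zip(levels, utilities))
    # Walk through neighbouring pairs of tested levels and find the bracket.
    for (x0, u0), (x1, u1) in zip(points, points[1:]):
        if x0 <= value <= x1:
            return u0 + (u1 - u0) * (value - x0) / (x1 - x0)


# Levels from the text; the utilities are made-up numbers for illustration only.
tested_levels = [150, 180, 210, 240]
hypothetical_utilities = [-0.60, -0.10, 0.25, 0.45]

utility_195 = interpolate_utility(195, tested_levels, hypothetical_utilities)
print(round(utility_195, 3))
```

The sketch deliberately raises an error outside 150 to 240 horsepower, in line with the advice to measure the full range you expect to see rather than guess beyond it.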

People doing a discrete choice study typically start to tire after evaluating 6–12 marketplaces. Most people will stick with it for a few more, but about 21 is the limit even for highly analytical study participants.

Therefore, careful discussion of precisely what goes into a discrete choice modelling study is critical. So is a good understanding of the marketplace itself before you start the study. You truly need to zero in on what matters.

Recall that you need to have one marketplace scenario to measure a two-level attribute, for instance, and three to measure a four-level attribute. This underlines the need for economy.

Utilities and choices

Discrete choice modelling uses abstract quantities called utilities as a kind of bookkeeping. These units allow us to compare the effects of different attributes and prices. This use of utilities does not mean that the method assumes people will look at and weigh all the attributes that they see. People can focus on a single attribute, such as price, they can look at a few of them, or they can make careful, exacting decisions. Whatever their strategy, this method will accommodate it.

The aim of this method is to model what people will choose. It does not delve deeply into psychology or hidden motivations. Its focus is always on the value of features and how changes affect that value.

Features are specific to the choices

Each choice can have its own features and prices in discrete choice modelling. The brand or choice acts as a kind of container for those features. We can get an idea of the residual value of the brand, aside from the attributes that are varied, by looking at a term in the model called the constant. We get this as part of the model, if we set it up properly. We therefore do not need to specify brand as an attribute.

This is an important consideration, and one that may be puzzling to people used to the thinking behind conjoint analysis. In the basic form of that type of analysis, brand is an attribute rather than a place where other attributes live.

Best practices

Here is a basic outline. Please refer back to this chapter for more about each:

Understand your market. Too many of these studies fail before they start because of inadequate preparation. Measuring the wrong attributes or focusing on inessential concerns inevitably leads to poor results.

Know how users and prospects talk about the product or service. Makers of technical, medical and professional products are particularly prone to believing that all users of their products really understand them and know all the corporate lingo. It is better to assume that users know no such thing. It is best to find out how users talk about the product or service and use precisely that language.

Focus on benefits to the user, rather than how the product is made. Again, makers of a product may believe the technical details are fascinating, and providers of a service often think that everyone is engrossed by the organizational processes that underlie their work. Users care about what a product does for them, not how it is put together. Focus on the functional benefits, not the technical specifications.

Measure as much variation for each attribute as you think may reasonably happen. That is, you want to measure as much change in each attribute as you expect to see. Suppose you want to measure responses to price and your highest price is $5.00. Yet at $5.01, people start considering the price too expensive and sales fall off drastically. You will have no way to know. You need to measure up to the point where you think no sudden changes in response are likely.

Consider narrowing the field of comparisons if you have too many choices. If you are studying Soggy Os breakfast substance, for instance, you likely would not include all things a person could have for breakfast as a competitive set. Indeed, any specific cereal-like substance would likely have a minuscule share of the total market. Measuring what drives changes is difficult for items that rarely get chosen. Instead, you should compare to the close competitors. In the case of Soggy Os this might be other fine cellulose-enhanced, overly sweetened breakfast-like treats.

Still, make the marketplaces represent the main competitors realistically. We know from recent research on decision making that the context for decisions is very important in determining what people will choose. The marketplace scenarios provide that context. Make sure it allows people to choose much as they would in the real world.

Tell people to imagine they really need to buy/get one of the choices if any is at all acceptable. Research and experience show that this leads people to make better and more consistent decisions.

Include a ‘none of these’ option. Together with the instruction directly above, this makes the exercise realistic. People will think as if they are in a frame of mind conducive to buying, but they also will know that they can sit out the decision if the experimental design generates a marketplace where none of the choices is acceptable.

The price is right

Discrete choice modelling is the method par excellence for testing responses to changes in price. This comes from the realism of showing products side by side in a marketplace, from asking people to choose, and from each choice having its own pricing variable. This means that the prices can be fit exactly to each choice.

If you have a good sense of pricing in the marketplace, four prices should be all you ever need. As we showed in several charts, four points will capture the most complex price versus share relationships you will encounter. If you are not certain, you might go to five prices or even six, but we have never seen a need for more.

The Bayesian advantage

Hierarchical Bayesian (HB) analysis has opened up the world of discrete choice modelling. It is an incredibly complex method of calculations, and it will give your computer its workout for the week – or year. We have seen massive problems take hours to run on the largest and speediest computers. It seems almost magical in how it works – and yet it does work. It has been validated time and again over the last 20 years.

It allows you to double or triple the amount of information you can get from a given study participant. Suppose you have a study that requires 24 marketplaces based on the experiment you have set up. You can show each person half of those and get a nearly perfect reading on all the attributes. In the old days, before HB analysis, you would have to double your sample to compensate for splitting the task in half.

Also, you get data for every person. Before this form of analysis, you could only get data for a group of people. Data for each individual is a remarkable advance.

Utility and share

Utility is an abstract quantity and it must get translated into share in some way. Discrete choice modelling uses a sophisticated model that is in line with how we respond perceptually to the environment. This model also is at least a reasonable approximation for the biases in decision making that recent research has revealed.

This response model is an S-shaped curve. This is realistic but it also makes translating utility into share difficult. The relationship is not a straight line. This makes sense looking at a marketplace, though. If your brand has a low share, you have to do more to stand out from the noise. If your brand is reasonably visible, people will respond more readily to changes. Then if you have a really large share, it becomes increasingly difficult to convert those who are intensely loyal to other brands. That is the top of the S-curve, where you can really pour on extra value and still see only small shifts in the marketplace.

You cannot just look and guess

You must run a simulator program to see how share changes when a product or service changes. Looking at utilities and guessing is inaccurate. Guessing from utilities is doubly inaccurate if you use HB analysis, as each person’s utility values will be larger or smaller than each other person’s. You need to get all the utilities on the same footing by first solving for ‘share’ within each person and then averaging.
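A minimal sketch of ‘solving for share within each person and then averaging’ follows. It assumes the standard multinomial logit share rule (share proportional to the exponential of utility), which is the usual choice in discrete choice simulators but is an assumption here; the three respondents and their utilities are invented for illustration.

```python
import math


def person_shares(product_utilities):
    """Logit shares for one person: exp(u) divided by the sum of exp(u)."""
    top = max(product_utilities)  # subtracting the maximum avoids overflow
    weights = [math.exp(u - top) for u in product_utilities]
    total = sum(weights)
    return [w / total for w in weights]


def average_shares(all_people):
    """Compute shares within each person first, then average across people."""
    shares = [person_shares(p) for p in all_people]
    n_people = len(shares)
    n_products = len(shares[0])
    return [sum(s[i] for s in shares) / n_people for i in range(n_products)]


# Hypothetical total utilities of three products for three respondents.
# Note how the size of the numbers differs from person to person.
people = [
    [2.0, 0.0, 0.0],
    [0.2, 0.1, 0.0],
    [-3.0, 1.0, 0.5],
]

averaged = average_shares(people)

# The shortcut the text warns against: average the utilities, then convert.
mean_utilities = [sum(p[i] for p in people) / len(people) for i in range(3)]
naive = person_shares(mean_utilities)

print([round(s, 3) for s in averaged])
print([round(s, 3) for s in naive])
```

With these made-up numbers the two routes give visibly different shares, which is the reason for converting within each person before averaging.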

Allocating choices

Discrete choice modelling allows you to model people making more than one choice at a time. For instance, we can ask doctors to allocate different drugs to 10 or 20 typical patients. We also do such things as asking ordinary consumers to choose their next 10 or 12 soft drink purchases.

Contingent attributes

We also can make the values of one attribute contingent on another. For instance, suppose we have two size classes of wine bottle, and three prices that we conceptualize as low, medium and high. Each size can have its own set of low, medium and high prices (Table 4.1).

TABLE 4.1 Setting up contingent pricing

                    Very low   Low      Medium   High
750 ml or 850 ml    $8.39      $9.59    $10.59   $11.99
One litre           $8.99      $9.99    $11.00   $12.49

We also can make one attribute (or more) disappear entirely based on the value of another attribute. The way to do this was first proposed over 20 years ago. Blanking out attributes in an experimental design does not introduce correlations or hurt the statistical performance of the design.

Cautions and where this may not work

It is worth saying again that discrete choice modelling, for all its incredible power, produces share of preference, not market share. To get to market share, you must adjust for marketplace factors such as comprehension and awareness of the product and how widely the product is distributed. To go to projections, you must know the size of the entire market, which is often difficult to determine.

Lastly, you must know what part of the total market your particular sample of study participants represents. This makes drawing projections extremely difficult. We often include extra controls in the simulator that the sponsoring company can manipulate, seeing what projected sales would be under a wide range of different assumptions.

We also need to recall that forecasts are necessarily short term. Once market conditions change, they lose much of their precision and power. With sufficient change, they can become highly inaccurate.

These methods seem to break down most severely with purely sensory attributes. People do not seem able to trade a product ‘tasting worse’, ‘smelling worse’ or ‘feeling worse’ for other attributes in a discrete choice study. This is so even though they obviously do so routinely in real life.
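The step from share of preference towards market share discussed at the start of this section can be illustrated with a deliberately simple sketch. The multiplicative adjustment below is an assumption made for illustration, not a formula given in the text, and every number and function name is hypothetical. It mirrors the idea of simulator controls that let a sponsor try different assumptions.

```python
def adjusted_share(share_of_preference, awareness, distribution):
    """Scale share of preference by awareness and distribution (both 0 to 1).

    Assumes, for illustration only, that the factors act multiplicatively.
    """
    return share_of_preference * awareness * distribution


def projected_units(share, market_size_units):
    """Turn an adjusted share into a unit projection for a known market size."""
    return share * market_size_units


# Hypothetical inputs: 30% share of preference from the simulator,
# 50% of buyers aware of the product, available in 80% of outlets.
share = adjusted_share(0.30, awareness=0.50, distribution=0.80)

# Try the projection under several assumed market sizes (units per year).
for market_size in (1_000_000, 2_000_000, 5_000_000):
    print(market_size, round(projected_units(share, market_size)))
```

Running the projection across a range of assumed awareness, distribution and market-size values shows how sensitive the forecast is to inputs that the choice study itself does not measure.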

Overall

Market simulations (and market simulator programs) may indeed be that elusive best thing since sliced bread. When discrete choice modelling studies are done with care and understanding, these models can have incredible predictive power in the short term. This need for care, for advance thinking, and for measured consideration of the results may work against truly widespread adoption. Those seeking a quick fix, or those who hope a machine will solve the problem for them, will quickly look elsewhere.

A sample simulator is included in the online resources for this chapter, available at www.koganpage.com/AI-Marketing – it should give you an idea of the tremendous power that discrete choice modelling can have. We encourage you to follow the instructions, use it and see how it works.


05 Conjoint analysis and its uses

This chapter addresses the other main method for getting to the optimal product, service or message. Conjoint analysis has been described as outmoded by some, but in the right applications it remains a powerful method. In particular, it can work remarkably well in optimizing communications, such as advertisements and web pages. It totally outdistances the traditional A/B testing used with web pages. It also can precisely optimize delivery of complex services. We will compare and contrast conjoint with discrete choice modelling, showing where this method has unique applications.

Thinking in conjoint versus thinking in choices

Conjoint analysis developed largely in the market research community. As you may recall from Chapter 3, its earliest incarnations looked something like magic squares, where levels of two features were crossed versus each other and the resulting combinations were ranked. You can see how this looks in Figure 3.10.

Conjoint became a widely used and useful tool with the development of traditional full-profile conjoint in the mid-1970s. This shows a series of whole products or services – hence the name full profile. Study participants typically rate these product profiles.

Before online interviewing became so prevalent, people could also sort and rank the profiles, which appeared on cards. You will still encounter the term conjoint cards, meaning the product profiles, as a remnant of this now rare practice.

Conjoint was quickly hailed as a great advance over traditional rating scales, and gained widespread adoption. It overcame many of the problems with ratings, such as the tendency of people to rate everything highly important if given the chance, and the tendency to give socially acceptable answers in ratings. (We discussed these concerns in greater detail in Chapter 2.)

Conjoint often worked well with widely known brands that were largely similar to each other. But it also broke down mysteriously in other situations. There also was a logical disconnect in using it with multiple products. That is, it presented products one at a time, but then attempted to simulate how they behaved side by side in a marketplace. The task posed in the interview was clearly less like the real world than the one in discrete choice modelling.

And indeed, the underlying conjoint analysis model as a whole was less realistic than the model used in discrete choice. As a reminder, with discrete choice, each choice can have its own features. With conjoint analysis, features were considered as applying across all the choices, and compromises often had to be made.

Figure 5.1 goes back to our earlier example in Chapter 4 with Quiet Financial Services (QFS); it shows the framework for discrete choice modelling, and how it contrasts with the one for conjoint analysis.

FIGURE 5.1 Attributes in discrete choice versus attributes in conjoint

With discrete choice modelling, the brand (or the choice) is a container and attributes are specific to each choice. With conjoint analysis, brand is an attribute that can combine freely with the others. Note that with conjoint, some attribute levels that were specific to each brand had to be eliminated so that the experimental design would remain a reasonable size. If we were to have two five-level attributes, for instance, that would require 5 x 5 or 25 product profiles. That would be too many to show using the traditional conjoint analysis methods.

Another salient problem is that attribute levels that do not belong with each brand would need to appear with that brand. For instance, ‘no fee’ would have to appear with QFS, which has no intention of offering ‘no fee’.

That problem was addressed imperfectly by using ‘prohibitions’. Basically, this meant swapping in another attribute level for the one specified by the experimental design, whenever two attribute levels could not appear together. This could damage the design and make the attributes correlated.

If you were lucky, the correlations would still be low, and the design would not be weakened too much. But with the wrong swap or too many swaps, the experiment would fall apart, undermined by strong correlations among the attributes. (If this seems confusing, please go back and look at the section in Chapter 3 on designs and why they need zero or near-zero correlations.)

Lack of realism in how conjoint treats utilities

Conjoint analysis started with an overly simple view of how utility becomes share in the marketplace. More utility equals more preference, in a straight-line relationship. More formally, the relationship is seen as linear and additive. As we discussed in Chapter 4, the S-shaped relationship postulated in discrete choice modelling closely fits what we know about decision making and how products fare in the marketplace. Figure 5.2 shows how the traditional treatment of utility by conjoint compares with the relationship captured by discrete choice modelling.

FIGURE 5.2 Differences in how discrete choice and conjoint treat utility

We use the word ‘traditional’ because conjoint can be retooled to perform better in this area. With a few tweaks, we can solve a conjoint problem just as we would an allocation problem in discrete choice modelling. Then we have at least overcome one salient issue.

All of the remaining problems, though, make conjoint less than ideal for estimating how products will behave in a competitive environment when they are varied. The problem of attributes not being specific to the choices also makes measuring responses to changes in the attributes less accurate. It is no wonder conjoint analysis has been eclipsed by discrete choice as the leading method for predicting acceptance of products in a competitive marketplace.
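The contrast between the straight-line treatment of utility and the S-shaped one can be made concrete with a short sketch. The logistic function is used here as one common S-shaped curve; that choice, the utility range of -3 to 3 and the function names are all assumptions for illustration rather than the exact models behind Figure 5.2.

```python
import math


def linear_share(utility, low=-3.0, high=3.0):
    """Straight-line view: share rises evenly across an assumed utility range."""
    clipped = min(max(utility, low), high)
    return (clipped - low) / (high - low)


def s_curve_share(utility):
    """S-shaped view, using the logistic function as one possible S-curve."""
    return 1.0 / (1.0 + math.exp(-utility))


def gain(share_function, start, step=1.0):
    """Share gained by adding `step` units of utility at a starting point."""
    return share_function(start + step) - share_function(start)


# Add one unit of utility at a low, a middle and a high starting position.
for start in (-3.0, -0.5, 2.0):
    print(start, round(gain(linear_share, start), 3), round(gain(s_curve_share, start), 3))
```

Under the straight-line view the same added utility buys the same share everywhere. Under the S-curve the gain is largest in the middle and small at both ends, which matches the description of low-share and very high-share brands in Chapter 4.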

Then how (and why) do we use conjoint analysis?

Conjoint analysis can provide valuable insights in situations where the product would have a minuscule share, and so get completely swamped by other choices when people are asked to choose. That is, if a product is rarely chosen, there would not be enough information from the few times it did get selected to determine what was driving that choice. Conjoint analysis allows you effectively to put that infrequently chosen product ‘under a microscope’ and determine its best possible configuration.

Traditional conjoint analysis shows one product at a time and asks for a rating of that product. In conjoint analysis, if you include brand, it becomes an attribute rather than a container for attributes (which is the case with discrete choice modelling). Since brand is an attribute, each brand must appear with each level of each other attribute. This can lead to impossible combinations. Even if it does not, the conjoint approach is less realistic and less precise than the one used by discrete choice modelling. Discrete choice modelling is now strongly preferred for predicting acceptance of products in a competitive marketplace.

Conjoint also can do remarkably well in optimizing communications, in particular print advertising and websites. You can test the equivalent of thousands of alternative configurations in one study. It completely overwhelms the more usual A/B testing done with websites – as we will see.

In addition, conjoint analysis can do remarkably well in determining the specific levels of service that need to be offered in complex customer relationships, such as those between a utility and its commercial customers, or a pharmaceutical company and the doctors and clinics it serves. We will see all three of these applications.

While conjoint analysis is not the best choice for determining how products will behave in a competitive marketplace, it still has some important uses. These are its three principal applications:

Where competitive context would be overwhelming: this would be the case when you wanted to make the best product and the product had a very low share. Because the product would get chosen only rarely, patterns of choices would not accurately reveal which changes worked best. Conjoint analysis, with its single-product focus, can put that product ‘under a microscope’ and show how to select the best variations of its features.

For optimizing communications, such as print advertisements and websites. You can get the relative appeal of thousands of alternative configurations, completely surpassing such current methods as A/B testing.

For determining the exact levels of service in complex customer relationships, such as between utilities and their commercial customers. The many elements in customer interactions can be carefully tailored, leading to the best mix.

Conjoint analysis for single-product optimization

An example from a good number of years ago involves a maker of disposable pens. These are the inexpensive pens that eventually run out of ink and are discarded, or mysteriously disappear forever and are not worth retrieving, as when you lend them to a friend.

There are dozens of alternative brands and models to choose from, so a new pen entering this crowded field would garner at best a very small share. This would have posed a problem in testing reactions to the new pen if it had been tested alongside competitors. It might have been chosen rarely – or never. Zeroing in on the product itself with conjoint analysis therefore made sense.

This approach also worked because of the way that consumers make purchases. The manufacturer noticed that buyers would purchase more pens if they had a positive experience writing with a certain pen. The company had identified a number of factors that could make the pen better to use, including the barrel width, roller-ball composition, viscosity of the ink, drying time of the ink and so on.

They decided to test five features that could each vary three ways and one that could vary six ways. This would be 3 x 3 x 3 x 3 x 3 x 6 or 1,458 possible configurations.

An experimental design led to 18 different prototype pens. The makers produced a batch of these to be used in testing. People were invited to interviewing facilities, where they found the pens and plenty of paper. They tried the 18 pens, then put them in order from favourite to worst. These rankings can be used in conjoint analysis just as numerical ratings can.

The main output of this study was a simulator accurately showing the relative appeal of all 1,458 possible pens. In fact, many more configurations were possible, because the simulator allowed interpolation between the values tested for continuous variables, such as drying time of the ink. After carefully considering the results, the company was able to make a new pen that people liked using and that succeeded in the market.
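The count of possible pens in this example is easy to check, and doing so shows how small the tested set was relative to the whole. The sketch below uses only the level counts given in the text; the variable names are ours.

```python
from itertools import product
from math import prod

# Five features with three variations each and one feature with six.
levels_per_feature = [3, 3, 3, 3, 3, 6]

# Full factorial: multiply the number of levels across all features.
total_configurations = prod(levels_per_feature)

# The same count by brute force, enumerating every combination of level indexes.
all_pens = list(product(*(range(n) for n in levels_per_feature)))

prototypes_tested = 18
fraction_tested = prototypes_tested / total_configurations

print(total_configurations, len(all_pens))
print(f"{fraction_tested:.2%} of all configurations were built as prototypes")
```

The enumeration is only a cross-check. Choosing which 18 pens to build is the job of the experimental design discussed in Chapter 3, which this sketch does not attempt.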

Using the single product simulator in the online resources

Our next example concerns a new type of dietary fibre that came in the form of a large tablet. There are many products that contain dietary fibre, so the sponsoring company was concerned that their new invention might get lost in the mix if it were tested in marketplace scenarios alongside the more established brands.

The study therefore used full-profile conjoint analysis, showing this new product alone. Study participants evaluated a series of 16 hypothetical new products. Each product they saw differed in level of dietary fibre, formulation, flavour, number of days of supply and price. Price was shown both as price per tablet and price per package.

Price per tablet was contingent on the formulation, with four different prices, depending on how the product was made. Total price was contingent on both the price per tablet and days of supply.

The main results of the study can be seen in the simulator (available at www.koganpage.com/AI-Marketing). There are three versions, one that runs under Excel, one that runs under PowerPoint, and one that runs in the Adobe Acrobat PDF format. All versions are fully interactive.

PowerPoint version

Download the file and open PowerPoint. The program works in presentation mode. To get to presentation mode on a PC, press the F5 function key at the top of the keyboard. On a Mac, you need to press Option-Return. (You need to have Flash Player on your computer for the PowerPoint and PDF versions to run.)

Figure 5.3 shows what you will see when you open this simulator. We have added letters that do not appear in the simulator itself. These will aid in quick location of the features being discussed.


FIGURE 5.3 Sample basic one-product conjoint simulator

The controls for the simulator appear under ‘A’. These are all select-one type controls, with the darker shading appearing over the level selected. The total appeal of the product, and how that compares to the best possible product, falls under ‘B’. More detail related to this chart follows under ‘C’, where a chart displays differences between the attribute levels chosen and the best possible level.

This simulator has a special control that the sponsoring company requested. You see this near ‘D’. This deflates the so-called ‘optimism’ of the estimate. That is, study participants estimated their likelihood of buying the various products they saw on a 0–100 scale. Because they rated the product in isolation, there was some concern that this rating was inflated. Therefore a control was added to deflate this estimate if needed.

Near ‘E’, there is a two-item selector, a kind of mini-menu. Clicking on the bottom bar switches to another page that shows how estimated share of preference changes as the so-called optimism changes. The least optimistic curve aims to look like the bottom part of the curve in prospect theory (which we discussed in Chapter 4), which was postulated to be a bottom limit for how quickly share versus utility could decline.

Figure 5.4 shows how the relationship between utility and share can be changed by the dial-shaped control (this control appears on both pages of the simulator). Moving the control counter-clockwise with the pointer will move the line down to the least optimistic relationship.

FIGURE 5.4 Details of how utility becomes share of preference

Similarly, dialling the control clockwise makes the estimate more optimistic, until it finally reaches the straight-line approximation traditionally used in conjoint analysis. That is as optimistic about share effects as you can become.

In Figure 5.3, pressing on the top bar of the two-item menu (above letter E) brings you back to the simulator itself. We have one more feature to describe. Prices, both per tablet and in total, appear above letter F in Figure 5.3. These reflect the actual prices that people participating in the study saw. The controls under letter A reflect the way that prices were set up so that the correct price could appear with each formulation. (As a reminder, Table 4.1 shows how contingent pricing can be done.)

Excel version

This works very much like the PowerPoint version, but due to differences in how graphics are handled in each program, has a slightly different look and feel. Again, you need to download the file and open it with Excel. If Excel asks whether you really want to run the content, answer in the affirmative. Should Excel still refuse to function fully, please refer to the instructions on the third sheet, ‘Security in Excel 2007 and on’.

Problems with estimating actual effects in the marketplace

This simulator underlines the problems inherent in moving from ratings to market share. We do not know which model will forecast changes most accurately. We must make assumptions, or at the least test various assumptions as we did with the demonstration simulator.

Conjoint analysis does particularly well at showing the relative values of different features, and relative changes that come from varying those features. But again, it is not the best choice for showing effects in the marketplace.

Conjoint remains an excellent method for messages

Fortunately, when we are dealing with messages, we are trying to pick the best alternative – so the issue of how utilities become marketplace behaviour becomes irrelevant. And with websites, even though we still use a conjoint-style approach, we can go directly to behaviour, such as clicks or stickiness (how long a person stays on the page or site). Using conjoint, we get the equivalent of testing hundreds or even thousands of alternative message configurations, all in one test.

Our first example: an unwanted message

This is a direct mail piece, one of those insurance offers we all look forward so much to receiving. This happened many years ago, and may be one of the first applications of conjoint to developing the best mix of elements in a message.

Because these insurance offers are designed to work with very low levels of response (often under 1 per cent), even a fractional improvement can make a vast difference. If you move from 0.8 per cent to 1.0 per cent response, you have increased your sales by 25 per cent (that is, the 0.2 per cent increase divided by the 0.8 per cent base rate).

There are two components to this offering: the envelope and the letter. Figure 5.5 shows a disguised version of the items tested. Six features or attributes were varied. Counting the variations or levels, we get to how many message profiles we will need to show.

FIGURE 5.5 Elements varied in the direct mail offering

Specifically, here we have four attributes each with four levels, one attribute with three levels, and one with five levels. Let’s go through the formula for how many profiles this will require. First, for each group of attributes we multiply the number of attributes by the number of levels, and add the results. Specifically, we have (4 x 4) + (1 x 3) + (1 x 5); that comes to 16 + 3 + 5, or 24. Then we subtract out the total count of attributes, which is six. That gives us 18. We need to add back two more so that we can measure error and we can estimate a term called the constant. That brings the minimum to 20.

We finally used 24 message profiles, and showed each person eight of them. That is, everyone saw just one-third of the total. So, in effect, everyone counted for just one-third of a complete set of responses, or perhaps, one-third of a whole respondent.

Each profile was rated on a 0–100 per cent scale, with the dependent variable being how likely the person felt they would be to read and consider the message. (Recalling that expected purchases hit about 0.8 per cent on a very good day, the sponsoring company did not make ‘actually buying’ the criterion.) When we did this many years ago, it required us to triple our sample. As a reminder, now that we have HB analysis, we actually can get much more per respondent. We might possibly even get away with no increase in sample.

Back to our main story, Figure 5.6 shows the way the effects on acceptance levels looked. This clearly delineates what is most and least attractive to prospective customers.

FIGURE 5.6 Responses to the elements of the direct mail offer

Here we have tested the equivalent of 4 x 4 x 3 x 4 x 5 x 4 or some 3,840 alternative combinations of message elements, and have found the best one:

Time is flying. Are you ready?
Calibri font on envelope.
Green ink on envelope.
Letter: Warning first, promise next, message last.
Letter: Why planning for the future is important.

We had a direct way to test whether this combination worked, as the design the client was using already was one of the combinations with lower total utility. By switching they actually improved their response rate by some 25 per cent, just passing the magic 1 per cent acceptance mark. So this was an early success story.

Using this method with a print advertisement

The next example comes from a test of a print advertisement. The same principles hold as in the direct mail test. Figure 5.7 shows our slightly fictionalized ad and the elements varied in it.

FIGURE 5.7 Alternatives to be tested in the print advertisement

SOURCES
Billy Rose Theatre Division, The New York Public Library, L-R. A McQueen, N Carter, D Allen and K Page doing ‘Lounging At The Waldorf’ from the Broadway production of the musical Ain’t Misbehavin’ (New York). The New York Public Library Digital Collections. http://digitalcollections.nypl.org/items/acfeeb2d-5cbe-4ce7-e040-e00a180644aa
Billy Rose Theatre Division, The New York Public Library, actress Kate Nelligan in a scene from the New York Shakespeare Festival’s production of the play Plenty (New York), The New York Public Library Digital Collections 1982. http://digitalcollections.nypl.org/items/c3b01b10-dab7-0131-de35-58d385a7b928
Sarah Vaughan, possibly at Cafe Society, NYC, ca August 1946. Photography by William P Gottlieb, from the William P Gottlieb collection at the Library of Congress. https://en.wikipedia.org/wiki/Sarah_Vaughan#/media/File:Sarah_Vaughan_-_William_P._Gottlieb_-_No._1.jpg

There are three alternative photos, and a variety of areas in the headline and text that would be varied. Here we would have an immense number of possible variations. That is, we would have: 3 x 3 x 3 x 3 x 2 x 3 x 3 x 3 or 4,374 possible ways of combining these elements. We can determine the value of all of them with 24 experimentally designed combinations.

How would this look to a study participant? In Figure 5.8, we see how one profile in the test would look, using a combination of elements based on the experimental design. This was an online test, with each person exposed to eight alternative combinations or profiles out of a total of 24 used for the test.
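The count of possible ads is simply the product of the number of levels for each varied element. A minimal Python sketch of that arithmetic follows. The variable names are illustrative, and the split of the 24 profiles into equal blocks of eight is an assumption about how the profiles might have been distributed, not a detail stated in the text.

```python
from math import prod

# Levels for each of the eight varied elements, as listed in the text.
levels_per_element = [3, 3, 3, 3, 2, 3, 3, 3]

# Size of the full factorial: every level of every element combined.
total_combinations = prod(levels_per_element)

# The designed experiment used 24 profiles; each person saw 8 of them.
design_profiles = 24
profiles_per_person = 8

# Assumption: profiles are split into equal blocks, one block per person.
blocks = design_profiles // profiles_per_person

print(total_combinations)  # 4374
print(blocks)              # 3
print(f"{design_profiles / total_combinations:.2%} of all combinations shown")
```

The last line shows how small a fraction of the possible ads ever needs to be built when an experimental design is used.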

FIGURE 5.8 One of the ads tested

SOURCE Photo from Billy Rose Theatre Division, The New York Public Library, actress Kate Nelligan in a scene from the New York Shakespeare Festival’s production of the play Plenty (New York), The New York Public Library Digital Collections 1982. http://digitalcollections.nypl.org/items/c3b01b10-dab7-0131-de35-58d385a7b928

How this worked

Based on this test, the client was able to determine easily which of the over 4,300 possible combinations generated the most interest. This was done with a chart showing the basic effects of changing each measured attribute, just as we did in the last example. This study, however, involved another interesting issue, which we could call the presence of a HIPPO. A HIPPO is simply the Highest Paid Person’s Opinion (so, no indeed, we have not gone crazy). The Big Boss really wanted to know how a few of his favourite ideas played out against the best combination.

This seemed to call for a simulator, similar to a market simulator, but in this case with the overall rating as the outcome. Figure 5.9 gives an idea of how the simulator looked.

FIGURE 5.9 The simulator used for optimizing the ad

As we can see in Figure 5.9, the Big Boss’s favourite is roughly one-third as well received as the best possible combination (this appears in the comparison of the current share bar to the left and the maximum possible share to the right in the chart labelled ‘Current score and difference from the best possible’). This led to the truly difficult part of the study: the researchers in the organization would spend the next several days trying to figure out exactly how to convey this information.
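A simulator of this kind can be thought of as adding up the estimated effect of each chosen element and comparing the total with the best achievable total. The sketch below assumes a simple additive (main-effects) model. The element names, the base score and all effect values are hypothetical and are not taken from the study.

```python
# Hypothetical effects (part-worths) for each level of each ad element.
# These numbers are made up for illustration only.
effects = {
    "photo":    {"A": 0.0, "B": 4.0, "C": 9.0},
    "headline": {"plain": 0.0, "question": 3.0, "bold claim": 6.0},
    "offer":    {"none": 0.0, "discount": 5.0},
}
BASE_SCORE = 10.0  # assumed rating when every element is at its weakest level


def score(config):
    """Additive score: base plus the effect of each chosen level."""
    return BASE_SCORE + sum(effects[el][lvl] for el, lvl in config.items())


def best_config():
    """Under an additive model, the best ad takes the top level of each element."""
    return {el: max(lvls, key=lvls.get) for el, lvls in effects.items()}


# A hypothetical 'favourite' configuration compared with the best one.
favourite = {"photo": "A", "headline": "plain", "offer": "none"}
best = best_config()

print(score(favourite), score(best))
print(f"favourite scores {score(favourite) / score(best):.0%} of the best")
```

Changing any entry in `favourite` and re-running the last lines mimics moving a control in the simulator.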

Testing websites: completely outdoing A/B testing

This approach works extremely well with websites, as testing can take place using the actual behaviour of visitors to the site, rather than asking people to provide ratings. This is how it works:

1. Just as for print, the list of features to be varied is created.
2. The variations are put into an experimental design.
3. Alternative configurations are made up.
4. When people visit the site, they are randomly assigned to one of the alternatives.
5. Clicks and/or ‘stickiness’ (amount of time a person spends on the page) get measured.
6. With extreme values removed or rolled back to more reasonable levels, we solve for the target variable (clicks or stickiness).
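Steps 4 to 6 can be sketched in a few lines of Python. This is a simplified illustration, assuming time on page in seconds as the target, fixed cut-offs of 2 and 300 seconds for handling extreme values, and a plain average per alternative standing in for the full model-fitting step. All names and thresholds are hypothetical.

```python
import random
from statistics import mean

N_ALTERNATIVES = 20                    # alternatives in the experimental design
MIN_SECONDS, MAX_SECONDS = 2.0, 300.0  # assumed cut-offs for extreme values


def assign_alternative(rng):
    """Step 4: randomly assign a visitor to one designed alternative."""
    return rng.randrange(N_ALTERNATIVES)


def clean_dwell_times(times):
    """Step 6: drop near-zero times (likely mistaken clicks) and roll
    very long times back to the upper cut-off."""
    kept = [t for t in times if t >= MIN_SECONDS]
    return [min(t, MAX_SECONDS) for t in kept]


def mean_by_alternative(visits):
    """Average cleaned time on page for each alternative that was shown."""
    by_alt = {}
    for alt, seconds in visits:
        by_alt.setdefault(alt, []).append(seconds)
    result = {}
    for alt, times in by_alt.items():
        cleaned = clean_dwell_times(times)
        if cleaned:
            result[alt] = mean(cleaned)
    return result


rng = random.Random(0)
visitor_alt = assign_alternative(rng)
print(clean_dwell_times([0.4, 35.0, 1800.0]))  # [35.0, 300.0]
```

In a real study the cleaned values would feed the model that solves for each element's effect, rather than a simple average per alternative.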

Getting rid of extreme values is important – a person may appear to stay on a web page for a long time for many reasons (answering the phone, doing several things at once, putting out a kitchen fire, and so on). Also, vanishingly small times suggest a mistaken click on the site – and so likely are not a reflection of interest levels.

In Figure 5.10, we see how a disguised web page looks with five elements being varied. (The superimposed numbers (1–5) are there just for our reference – they would not appear in the test.)

FIGURE 5.10 Elements on a web page varied for testing

SOURCE Image on the right: James Tissot, Holyday

These five elements each varied in four ways. This comes to some 1,024 alternative combinations of the varied elements (or 4 x 4 x 4 x 4 x 4). To test the value of all possible combinations, we needed to develop and test 20 alternatives in an experimental design to get accurate measurements.

Needing 20 alternatives leads to a caution: you need a fairly well visited website to do behaviour-based testing of this type. If each person sees just one of those alternatives, then that person is just 1/20 of a complete set – or that person counts as just 1/20 of a complete respondent.

Sticking with a fairly slender requirement of 125 complete respondents, we would need 2,500 visitors to complete one test. So you do need a relatively busy site to do this kind of testing. However, considering that a recently reported A/B test involved over 1 million visitors who were shown either of two alternatives, perhaps this is not asking that much.

Testing by showing an alternative to hundreds of thousands of viewers seems to hearken back to the old days of direct-mail testing, before conjoint analysis-based methods of message optimization were developed. A test then consisted of sending out 700,000 or 1 million pieces and seeing what came back.

There is of course less direct cost and effort involved in passively allowing visitors to see the website in alternative configurations than in developing mailings and sending them. However, there is still considerable waste in showing that many people a message that is less than the best.

Certainly, by using just a few thousand people and being able to determine responses to 1,000 alternative configurations, conjoint message testing has completely surpassed A/B testing.

In Figure 5.11, we have the outcome of the web test, with fictional slogans. (The actual ones really were a little better.) Finding the best alternative from the 1,000 possible configurations has become a simple task.
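The visitor requirement discussed in this section follows from simple arithmetic: each visitor sees one alternative, so a ‘complete respondent’ takes as many visitors as there are alternatives in the design. A short sketch of both calculations, with illustrative function names:

```python
def full_factorial_size(levels_per_element):
    """Number of possible configurations given the levels of each element."""
    size = 1
    for levels in levels_per_element:
        size *= levels
    return size


def visitors_needed(complete_respondents, alternatives_in_design):
    """Each visitor sees one alternative, so counts as
    1/alternatives_in_design of a complete respondent."""
    return complete_respondents * alternatives_in_design


print(full_factorial_size([4, 4, 4, 4, 4]))  # 1024
print(visitors_needed(125, 20))              # 2500
```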

FIGURE 5.11 Effects of changing elements of the web page

The survey alternative

If waiting for thousands to visit a site still seems like too many, then nothing prevents you from evaluating the elements for a website in the same way as you would test a print advertisement. That is, you would recruit people to a survey, and then show each person a number of alternative site designs.

The live test has the advantage of measuring actual marketplace behaviour, and presumably among people who are interested in the product or service. In the survey’s favour, if we questioned people, we would be able to select people of most interest to interview, zeroing in on the responses of important audiences. We also could ask other questions about who the respondents are, their usage of other products, and so on. We would get a more comprehensive picture of who responded.

Conjoint analysis for the best service delivery

Many companies have complex service relationships with their customers. For instance, consider telecommunications and utility companies with their commercial customers, cable companies and insurance companies with their users, and medical device and pharmaceutical companies with doctors and clinics.

All of these companies do more than provide a product or basic service. They may also instal equipment and maintain it, bring samples of supplies, provide ongoing education for professional certification – and have many other interactions. These different areas of interaction are sometimes called touch points.

Given the often vanishingly small difference between companies’ basic offerings, these other services can loom large in the decision to use one provider, to renew a contract, or simply to continue buying.

A better answer than ratings

We already have discussed at length the pitfalls of asking customers directly what they find important in the services they receive. When asked to provide these ratings, people almost invariably decide it is best to rate nearly everything as highly important.

Conjoint analysis allows you to show an entire service profile, with all the elements in place. You ask study participants to give a rating reflecting how satisfied they would feel if the entire service package could be reliably delivered at that level. Since all the features are evaluated in the context of the whole service experience, this avoids the problem of non-essential features being rated as important. The result is avoiding the dual pitfalls of underdelivering and overdelivering in key areas of customer interactions.

Our example: the pharmaceutical company with nothing special

Pharmaceutical companies often follow a successful competitive product introduction with their own offering that does something similar. This is one reason that we have, for instance, so many drugs that lower cholesterol. Their motivations are understandable, because a so-called blockbuster drug can garner well over $1 billion in sales.

One such company, who we will call NewInnova, had largely used a strategy sometimes labelled ‘me too’. Most of their products came about from following a market leader into a new and profitable category. Then one day, they took a long look and realized that they offered nothing distinctive.

Their answer (until such a time as they actually developed something new), would be to focus on their professional relationships with their customers. They had to take careful stock. Many once-favoured interactions (that were in fact bribes) have been outlawed. Still, these companies can do a great deal for their customers.

They can send cheery young representatives to talk with doctors about what they are doing, answer questions about the drugs, and outline new research and developments. This is called detailing. They also can drop off numerous samples of their fine products. This is called sampling. Many companies will produce extensive materials to educate patients and staff members. Some will even send in-person trainers for complex devices and products.

Companies also offer education, including continuing medical education (CME), which in the United States is required in order to retain a medical licence. They offer seminars, conferences and, sometimes, something called medical thought leadership programmes. This last item is new, and seems something like a novel way to offer the now-outlawed bribes. They also send out colourful newsletters, either on glossy paper or on the web. And there are still other ways that they can insinuate themselves into their customers’ professional lives.

The key question conjoint service optimization addressed

The sponsoring company, NewInnova, had this question: given that they could perform all these services, how should they offer them? For instance, how much should they sample, and how often should they do detailing? To determine the right balance, they turned to conjoint optimization of their service offerings.

They looked into 11 service areas, of which we show nine in the disguised example below, in Figure 5.12. NewInnova came up with enough variations to require a 42-profile experimental design. Each person saw and evaluated 14 of these. They rated how satisfied they would be on a 0–100 scale, if service could be delivered consistently at the level shown.

FIGURE 5.12 A service profile for conjoint-based optimization

This was analysed with HB analysis, using the tweak of conjoint we mentioned earlier, where it is analysed like an allocation exercise in discrete choice modelling. The first output, in Figure 5.13, shows a clear prioritization of the various service areas. The importance of each area is proportional to its effect on overall ratings. Areas with a stronger effect are accorded more importance.
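One common way to turn estimated effects into importances, sketched below, is to take the spread between the best and worst level of each area and scale the largest spread to 100. This is an assumption about the calculation, not a description of the exact procedure used in this study, and the effect values shown are hypothetical.

```python
# Hypothetical effects on the 0-100 satisfaction rating for each level
# of three service areas (illustrative numbers only, not study results).
level_effects = {
    "sampling":   [-12.0, -3.0, 6.0, 8.0],
    "detailing":  [-5.0, 0.0, 5.0],
    "newsletter": [-0.5, 0.0, 0.7],
}


def importance_index(effects_by_area):
    """Range of effects per area, rescaled so the largest range is 100."""
    ranges = {a: max(v) - min(v) for a, v in effects_by_area.items()}
    top = max(ranges.values())
    return {a: 100.0 * r / top for a, r in ranges.items()}


index = importance_index(level_effects)
for area, value in sorted(index.items(), key=lambda kv: -kv[1]):
    print(f"{area:<11}{value:6.1f}")
```

Because an index built this way is ratio-level, dividing one area's value by another's gives their relative importance.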

FIGURE 5.13 Priorities placed on service areas from the analysis

Values have been scaled so that the most important area is set to 100. These values are ratio-level, so that sampling, with an index of 100, would be nearly three times as important as continuing medical education, with an index of 37.

There are three clearly defined groups of features. Note also that the least important feature, the newsletter, has an index of only 6. This means that sampling is over 16 times as important as the newsletter in determining levels of satisfaction (100/6 is 16.67).

Speaking of the newsletter, while showing details about how its different levels compared, we saw the senior marketing person’s eyes light up for the first time. He was worried that they would have to produce a monthly edition. As you may remember, this was a company that did nothing special – so on a monthly schedule they definitely would not have had enough to say. In Figure 5.14 we can see how the levels of this variable affected satisfaction.

FIGURE 5.14 Effects from changing the frequency of the newsletter

We ran into a more serious concern when analysing the sampling this company did. Part of the survey asked about how often customers now received each of the services, including getting samples. In the bar chart at the top of Figure 5.15 we can see that 60 percent received samples every two weeks or three weeks. Some 22 percent got the samples every two weeks.

FIGURE 5.15 How frequency of sampling is overdelivered and underdelivered

However, we see that satisfaction with sampling is nearly identical whether timing is two or three weeks. This means that the client company was overdelivering this service to nearly one-quarter of their customers. This is both expensive and wasteful.

On the low side, they were underdelivering to some 40 percent of their customers. Those getting samples every six weeks or less often, which emerged as highly dissatisfying, account for a substantial 14 percent of customers.

Clearly, work needed to be done on how samples were given out. This is a critical area for boosting customer satisfaction, and some customers were getting needless extra sampling, while four out of 10 were getting samples too infrequently.

Because they did this conjoint service optimization, NewInnova could realign sample delivery and so become a more desired provider of all their fabulous ‘me too’ products. They could similarly refine and revise all their other service offerings, truly meeting customers’ service needs. This kind of precise guidance is what makes this method so powerful.

Using the message optimization simulator in the online resources

You can download from the online resources a simple simulator for optimizing messages (available at www.koganpage.com/AI-Marketing). Once you have done this, if Excel asks if you really want to run it, say ‘yes’. If Excel still does not want to go forward, please refer to the instructions on the third tab, ‘Security in Excel 2007 and on’.

This simulator will hide the Excel ribbon (the bar with commands and menus at the top of the screen) when you open it. Two buttons on each page of the simulator control whether the ribbon is visible or not.

You will find these buttons on the bottom of the opening ‘Welcome’ page and at the right of the simulator page. Make sure you restore the ribbon to view and exit using the ‘File’ menu if you want to continue working in Excel after closing the simulator.

The story behind the simulator

The sponsor, a magazine we will call Today’s Troubles, wanted to show how much better a larger ad did in stimulating interest among readers. They also were confident that they were a stronger publication than their similar competitors and very much wanted to show that as well.

They took an ad that they felt was relatively neutral, advertising a vacation. They made 16 prototype ads, varying in size from spot to full page, with different levels of colour, different imagery and different levels of detail in the text. They placed the prototype ads alongside bland ‘filler’ content if they were not full page. Some ads had an added border and some did not.

Most importantly, some of the ads were identified as being in their fine publication, and some as being in each main competitor’s magazine. This gave them a total of six features to be varied and some 864 possible configurations.

FIGURE 5.16 The downloadable advertising optimization simulator

Each person interviewed saw 8 of the 16 prototypes. They reviewed each and said how likely they would be to follow up looking into the vacation listed, using a 0–100 likelihood scale.

The displays in the simulator

The simulator appears in Figure 5.16. As you move the controls on the left, the displays will change in real time. You will see both the estimated response rate and how this compares with the sponsor’s least desired choice (the spot ad). This information appears as both a number chart and a bar graph. There also is a numerical comparison to the weakest display ad. This was to provide further impetus to prospective clients to get a better (more expensive) ad. The differences can be quite dramatic. This simulator was judged to have done its job extremely well.

Conjoint analysis and interactions

Now that we have covered several important applications, it is time to circle around to discuss more about the world of variables and how they behave. The subject of interactions can become important with conjoint analysis. It certainly is important whenever brand is treated as an attribute and allowed to mix and match freely with other attributes.

An interaction occurs when two or more variables together behave in ways we would not expect from seeing each of them separately. For instance, for men of a certain age and mind-set, red and sports car interact. That is, a red car is nice, and a sports car is nice, but put them together into a red sports car and you have something truly extraordinary. It is worth far more than you would suspect from the attractiveness of red cars or the attractiveness of sports cars.

Interactions mattered a great deal in the early days of conjoint, because attributes applied evenly to all brands. This was particularly problematic when brands responded differently to changes in price.

An example comes from the golden days of Sony electronics, when this brand could charge more for their products because their name carried a special cachet. As we can see in Figure 5.17, where we plot sales versus price, Sony would always sell more at a given price than all their competitors. Their competitor RCA could sell nearly as much but had to lower prices to attain the same level of sales. Two other brands, the one offered by the store under their own name, and the mysterious Nonameo (the brand you never heard of) would fare considerably worse. This is what we see in the Figure 5.17 price versus share curves for small televisions offered by these four brands.

FIGURE 5.17 Average response to changes in price versus each brand’s changes

If you were to use a single price attribute that applied across all the brands, your estimates would be wrong for all of them. This would be the average line in the middle of Figure 5.17, pointed out by the white arrows.

Errors in estimating price responses probably contributed as much as anything to conjoint falling out of favour. Because conjoint treated brand as an attribute and price as an attribute, you needed to build in special interaction terms to get good measurements.

Yet the experimental designs usually used were not large enough to pick up those interactions. For instance, if you had four brands and four prices, the interaction term would contain 4 x 4 items or 16 parameters. These would be in addition to the four brands and four prices, which together amounted to eight parameters.

You could create a larger experiment to make sure that you were measuring this large 16-item interaction term accurately, but this kind of planning did not seem to happen much in practice. Missing this interaction could have disastrous results. For instance, if you had followed the average price-response curve and were selling the Nonameo brand, you would probably go out of business. At the very best, you would be sitting on mountains of unsold inventory, as you consistently overestimated how much you could sell.

Interactions usually are not a concern with discrete choice modelling, because each choice can have its own attributes. For instance, if you want to measure responses to changing Sony’s pricing, you would have a variable, Sony price, specific to the Sony brand. You would get a direct and clean measurement of how share changed by measuring this variable.

However, interactions still matter a great deal with a popular variant of conjoint called choice-based (CBC), and so it is best to remain aware of them. We discuss this further in the next section.
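The parameter counts given for the brand-by-price example can be reproduced directly. The sketch follows the text's convention of counting one parameter per level; many estimation routines drop one reference level per attribute, which would give smaller counts. The function names are illustrative.

```python
def main_effect_parameters(levels_by_attribute):
    """One parameter per level of each attribute (the text's convention)."""
    return sum(levels_by_attribute.values())


def interaction_parameters(levels_a, levels_b):
    """A two-way interaction has one item per pair of levels."""
    return levels_a * levels_b


attributes = {"brand": 4, "price": 4}

main = main_effect_parameters(attributes)
inter = interaction_parameters(attributes["brand"], attributes["price"])

print(main)          # 8
print(inter)         # 16
print(main + inter)  # 24
```

Adding the interaction triples the number of items to be measured, which is why a design sized only for the main effects falls short.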

Variants of conjoint analysis

Many modifications of conjoint have been proposed. Two of the best known are the so-called partial profile method and a software product called Adaptive Conjoint Analysis (ACA). Partial-profile conjoint was designed to shrink the task that study participants needed to do. The idea was that only a part of the product would be shown and the rest would be ‘assumed’ to be acceptable. Behind doing this, there were some arguments about the interview otherwise imposing too complex a task. Yet these concerns do not seem well-founded. Research has been published that suggests people can evaluate quite a few complex marketplace scenarios or product profiles. One careful study showed that people were still going strong after 21 marketplaces. How many you can use in practice depends on how engaged the person is with the category and their native skills in figuring out problems.

Engineers and farmers, for instance, do particularly well evaluating many complex scenarios. (Farmers must be very good at calculating and problem solving to stay in business. They really miss very little. The idea that they are not on top of things is doubly false.) Children and people who rarely read tend to do poorly after just a few scenarios.

ACA had a period of wide adoption. It also was roundly criticized and deserved some of it. It allows people to eliminate some attribute levels before they see them in the context of the whole product. As we have mentioned several times, looking at the entire product often leads people to revise their ideas about what is acceptable. In addition, the maths used to score the items rated unacceptable was subjective and unsubstantiated.

This method also was promoted in part as a means of making the interview less onerous. However, much of the impetus behind it may have been in cutting down the required size of experiments. In the early days of conjoint, measuring only a handful of attributes was possible.

Because we now can use HB analysis, and measure far more attributes than used to be possible, this reduction is no longer needed. Indeed, the popularity of the ACA method appears to be waning, although you will still encounter some individuals who want to use it.

Choice-based conjoint (CBC)

This is a do-it-yourself software approach. It makes the process relatively easy, but simplifies the approach enough to sap some of its power and even to allow misapplications. However, it appears to be quite popular. And indeed, to some, it is seen as synonymous with discrete choice modelling.

However, its name is accurate. It takes something from conjoint and something from discrete choice modelling. (And as you may have guessed from the initials CBC, it comes from the same people who offer us the ACA software product.)

As you can see in Figure 5.18, it shows alternatives and asks for a choice. However, note that brand is missing. It actually is offering a choice between three variants of one product. This is not a realistic representation of what consumers will find in the marketplace.

FIGURE 5.18 An example of CBC-style conjoint analysis

Not as readily apparent is that all the choices have the same attributes. Having choices share the same attributes is the conjoint part of the CBC approach. If you use this program as it is set up, you will not have any chance to see if an attribute level works better within a specific branded option. You can try to get by this with ‘prohibitions’, which swap attribute levels. But this can make for design problems. Otherwise, you cannot test any feature unique to a given brand.

For instance, if you want to say your brand is the ‘most reliable cellular network’ in a standard conjoint model, that claim would have to be tested with all the brands. Yet there can only be one best network, in spite of what some television commercials seem to imply. Attaching this claim to all the choices would lead to some highly unrealistic combinations.

The CBC program does offer an extra-cost ‘advanced’ option that allows you to get closer to the experimental designs you would use with standard discrete choice modelling. But there is no requirement that you use it. And indeed, we have encountered a good number of studies where none of the people involved had any idea that you could set up attributes as specific to choices.

You may come across studies set up like the one in Figure 5.18. However, we can do better. The lack of realism in having three variants of a product compete against each other, when that would never happen in the actual marketplace, is a serious shortcoming. And the absence of branding is another problem. Everything that we buy anywhere, except certain ‘generic’ medications, will come with clear brand identifications.

Also, if you include brand as an attribute in this set-up, that then opens up the need to add interaction terms. (You also can avoid this by breaking with the program’s default method of setting up problems.) As a reminder, interaction terms are large, having many parameters that need to be estimated. You definitely cannot estimate all possible interaction terms in even a small problem – there are too many items to get an accurate fix on all of them.

Therefore, unless someone is doing considerable extra work with CBC, we should not expect it to provide wholly accurate readings of how products perform in a competitive marketplace. It should give you an excellent reading of the relative appeal of features within a product, just as full-profile conjoint will. However, finding how appealing features are – versus each other – falls far short of estimating share, even share of preference. So use particular care if the CBC approach is being used to estimate marketplace behaviour.

To compare CBC to full-profile conjoint, on the plus side for CBC, it asks for a choice. This is more like what people do every day than the ratings that traditional conjoint requires. On the downside, CBC asks people to process much more information than conjoint does. That is, each screen a person sees with CBC could contain three or four product profiles side by side. Full-profile conjoint shows just one profile.

Choice-based conjoint (CBC) is a popular software application that many see as synonymous with discrete choice modelling. However, its name is accurate. It has some features of conjoint, and some of discrete choice. It is easy to use, but unfortunately also makes it easy to commit mistakes. It is not the best choice for seeing how products will behave in a competitive marketplace, because of the way it sets up comparisons and its default handling of attributes. It should do an excellent job of determining the relative appeal of attributes – and of the levels of those attributes – just as traditional full-profile conjoint will. But be very cautious if someone has used it and is attempting to forecast share.

Which one is better for a conjoint-like task? There is no clear answer. Some would say that the choice involved in CBC makes it more realistic and so superior. Others (your author included) have noticed that most people do not like to read while doing a survey. The fact that full-profile conjoint shows just one product at a time helps study participants. This allows people to see what they need more easily, and so to respond more thoughtfully. Which one to use is an area that those disposed to arguing could argue about all night. But let’s not, and let’s leave it up to your preference.

Summary of key points

Conjoint analysis is the other method intended to develop the best possible products and services. While discrete choice modelling was developed by economists and econometricians, conjoint analysis developed in the market research community.

The first truly useful, and widely adopted form of conjoint analysis is called full-profile conjoint. This shows an entire product represented as a set of specifically described attributes, or a product profile. Study participants see a series of these profiles, and in each the levels of the attributes that appear vary. Study participants rate these profiles, or sort and rank them. (The ranking approach is rare now that interviews typically are done online.)

Like discrete choice, conjoint relies on strict experimental designs to make sure that the effects of varying one attribute do not get mixed up with the effects of varying another. Without the designed experiment, attributes can get more or less attractive from one profile to another in highly similar ways. If this happens, we then have no clear way of telling what is driving product preferences.

A simpler model

Conjoint analysis is based on a simpler view of attributes and utility than is discrete choice modelling. In conjoint, brand is an attribute and all other attributes apply equally to all brands. This can lead to unrealistic combinations of attribute levels with a given brand, and so is a salient drawback.

This underlying conjoint model also makes it much more difficult to determine what happens to a specific product when its features change. This becomes particularly problematic with price, as in every category we have ever studied at least one product behaves differently from all others when its price changes.

Assuming all products behave in the same way when their prices change actually makes it nearly certain that all estimates of share changes will be wrong. You can try to avoid this in conjoint analysis by adding a large interaction term to the model. For example, with a price variable having four levels and a brand attribute having four levels, the interaction term would have 4 x 4 items or 16 parameters that need to get estimated. (This is in addition to the eight parameters represented by the four price levels and the four brand levels.)

Estimating all these extra parameters would require making a bigger experiment. The basic design typically used for conjoint analysis would in fact be too small to measure all these extra items accurately.

About utility

Traditionally, conjoint analysis took the very simple view that utility and share have a straight-line relationship. This does not reflect the realities of the marketplace. Fortunately, you now can retool conjoint, solving it like a discrete choice allocation problem. Once you have slightly recast the analysis, then you can handle it like discrete choice by using HB analysis. This gives a more realistic view of how utility becomes preferences.
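The difference between a straight-line view and a choice-model view of utility can be seen in a small sketch. It assumes the logit rule commonly used in discrete choice modelling, where each option's share is proportional to the exponential of its utility. The utilities are hypothetical, and this is an illustration of the general idea rather than the specific calculation used in the studies described here.

```python
import math


def linear_shares(utilities):
    """Straight-line view: share is proportional to utility itself."""
    total = sum(utilities)
    return [u / total for u in utilities]


def logit_shares(utilities):
    """Logit view: share is proportional to exp(utility)."""
    top = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


utilities = [1.0, 2.0, 3.0]  # hypothetical utilities for three options

print([round(s, 3) for s in linear_shares(utilities)])
print([round(s, 3) for s in logit_shares(utilities)])
```

Under the logit rule the strongest option takes a larger share than the straight-line view would suggest, and weak options fall away faster.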

Not best for estimating effects in competitive marketplaces

The remaining problems, though, make conjoint less than ideal for estimating how products will behave in a competitive environment when they are varied. The problem of attributes not being specific to the choices also makes measuring responses to changes in the attributes less accurate. It is no wonder conjoint analysis has been eclipsed by discrete choice as the leading method for predicting acceptance of products in a competitive marketplace.

Still highly useful in three important applications

Although we do not recommend conjoint as a replacement for discrete choice modelling, it has other highly useful applications. Here are three compelling uses:

Where competitive context would be overwhelming: this would be the case when you wanted to make the best product and the product in question had a very low share. Because the product would get chosen only rarely, patterns of choices would not accurately reveal which changes worked best. Conjoint analysis, with its single-product focus, can put that product ‘under a microscope’ and show how to select the best variations of its features.

For optimizing communications: such as print advertisements and websites. You can get the relative appeal of thousands of alternative configurations, completely surpassing such current methods as A/B testing.

For determining the exact levels of service in complex customer relationships: such as between utilities and their commercial customers, or telecommunications or insurance companies and their users. The many elements in customer interactions can be carefully tailored, leading to the best mix.

Variants of conjoint analysis

Conjoint has had a number of variants, including most notably choice-based conjoint (CBC). As its name says, it has something from conjoint and something from discrete choice, and yet is neither. This makes it relatively easy to set up a problem involving choice. However, if left in its default settings, it treats the choices as they would be in a traditional conjoint analysis. If brand is included, it is treated as an attribute and allowed to combine freely with all other attributes. (You can buy an extra-cost ‘advanced’ module that allows you to get closer to designs like those used by discrete choice modelling.)

This method also allows you to make highly unrealistic comparisons, such as comparing three unbranded variants of a product versus each other. This is nothing you would ever see in the actual marketplace, and its lack of realism makes any estimates of real-world behaviour suspect.

This method, like traditional full-profile conjoint, should give you an excellent picture of relative preferences among different levels of the various features. However, preference is not choice. We hope you all recall the story about the little girl who loved spinach, but not enough to eat it.

It is an open question whether CBC is better than full-profile conjoint for those questions that conjoint is suited to answer. Some would say that the fact CBC asks for a choice, rather than a rating, makes it superior. Some (your author included) have noticed that nearly all study participants do not like reading while doing a survey. Showing them one product at a time, as in full-profile conjoint, encourages more attention and thought to how features are being varied than seeing three or four product variants side by side.

Overall

Although supplanted by discrete choice modelling for forecasting how products will behave in a competitive marketplace, conjoint analysis still has important uses. You can use it to zero in on a low-share product that would hardly ever get chosen in a competitive marketplace. Its ability to put a product ‘under a microscope’ can be useful in any exercise where you want to optimize the user’s experience with that product.

We reviewed a successful example of this, in which study participants evaluated 18 disposable pens configured according to an experimental design. This close attention to one product allowed the manufacturer to create a pen that had the best possible writing characteristics.

Also, conjoint analysis can do remarkably well in finding the best mix of elements to include in a message or on a web page. You can test the equivalent of hundreds or thousands of alternative configurations in one simple test. This method completely surpasses traditional A/B web testing where, at most, two to four alternative pages are tested.

Lastly, conjoint can optimize how services are delivered in complex customer relationships. Where there are many ways that companies interact with their customers, sometimes called customer touch points, conjoint can define the best possible configuration. Companies that have complex service relationships can be found in areas such as telecommunications, utilities, insurance, social networking, and the medical and pharmaceutical industries.

Four simulators are available in the online resources for this chapter (available at www.koganpage.com/AI-Marketing). They will give you an idea of the true power in conjoint analysis. Three of them are different versions of a single product optimization simulator, one in PowerPoint, one in (Adobe Acrobat) PDF format, and one in Excel. The other shows responses to different print advertisements, revealing the optimal configuration. We encourage you to follow the instructions in this chapter, and use and explore all of them.


Bonus Chapter 1: Finishing experiments and on to the non-experimental world

This bonus chapter shows the best uses of the other key trade-off methods, MaxDiff and Q-Sort/Case 5, which provide clearly differentiated importance for lists of features, claims or messages. It will give you guidance on when it is most appropriate to use each of these methods, and when you would be better served by either discrete choice modelling or conjoint analysis. It also provides critical contrasts between these methods as a group and those that deal with less structured data – including how to apply and interpret each.

Access this bonus chapter online here:



06 Predictive models
Via classifications that grow on trees

This chapter describes the classification tree methods, a remarkable set of approaches that uncover complex relationships in data. Here our focus broadens from predicting shares and understanding variables’ relative importances to developing models that boost the odds of reaching a desired outcome. Several illustrations will take the mystery out of these methods and show how they apply. We also discuss several useful extensions of basic classification trees.

Classification trees: understanding an amazing analytical method

Now that we have absorbed a great many facts about predicting what happens when you change the features of a product (or service or message), it is time for something completely different. As a reminder, through Chapters 4 and 5 (and bonus online Chapter 1) we have seen highly powerful means of understanding variables’ effects, and other methods that diagnose variables’ importances. Discrete choice and conjoint provide remarkable power in answering what/if-type questions, and the MaxDiff and Q-Sort/Case 5 methods clearly delineate relative importances.

Here we shift from focusing on these areas, expanding to methods that also can work to improve the odds of an outcome. Market share and odds are closely related concerns. Improving the odds of an outcome is something different. The goals of doing that include reducing uncertainty and waste. Also, as we will see, these investigations can provide insights that lead to further analyses, and finally propel changes.

Classification trees, the subject of this chapter, are the first set of methods that can be turned towards increasing the likelihood of an outcome. Other methods that are pressed into service in this way include the Bayesian networks explained in Chapter 7, and the ensembles and neural networks in Bonus online Chapter 2. Many other methods exist.

Classification trees started as more of a promise than a solution. Many years ago (in the 1980s, to be exact, which practically counts as ancient history) precursors of today’s methods were dismissed as weak and inadequate. Since then, the procedures and computers have developed. Now classification trees have become a powerful and well-studied set of predictive methods.

In the last few chapters, we kept the analytical details discreetly tucked away, doubtless to the relief of many. We will mention now that discrete choice modelling and conjoint use variants of either regression, multinomial logit or mother logit. These last two are regression-like methods that solve problems where the target variable is a set of categories. (The distinct choices we are trying to predict are the categories.) There are several related methods that could be used, but understanding the distinctions does not help understand the uses and output of these methods. The fine details are best left to the experts.

A highly visual approach

With classification trees, the analytical method itself is key. Seeing how the analysis progresses is important in understanding the results. The Bayes Nets we discuss in Chapter 7 also are highly visually oriented. Each of these methods is quite different from all that have gone before.

In this chapter we will mostly discuss the classification tree method called CHAID. There are a host of other, closely related methods with different names, most notably CART (also called C&RT and CRT). We will talk about how CHAID and CART (and the others) compare and contrast after we review some applications.

Seeing how trees work, step by step

Splitting and resplitting

CHAID and related methods split the data into groups, seeking to find some group (or groups) with more of some desired characteristic. Then they split each of these groups again. They continue splitting until they reach some stopping point you specify. We can think of this as something like a sifter that works in stages, with each stage refining the selected group more finely.

For instance, seeking buyers of a product, we might use a variable such as number of children at home to find groups buying more and buying less. Looking at all possible numbers of children at home, from zero to 14, we might then find the biggest difference between those with five or more children at home and those with fewer than five children. Classification trees zero in on the best ways to find differences of this kind.

Once these methods split the data, they then return to the smaller subgroups formed and split those again – and if possible, again. Each split produces still smaller groups. Some of those groups will have a great deal of the desired characteristic. Suppose we go back to families with five or more children. We might then find that those living in the suburbs within this group are still more likely to be buyers. This may sound abstract, so let’s take a look at how this actually works.
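The search for a breaking point described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the CHAID procedure itself: it only tries two-way cuts on an ordered variable and scores each cut with a Pearson chi-square statistic. The function names and the household counts are hypothetical, invented purely for the illustration.

```python
from typing import Dict, Tuple


def chi_square_two_groups(buy_a: int, n_a: int, buy_b: int, n_b: int) -> float:
    """Pearson chi-square statistic comparing buyer rates in two groups."""
    total = n_a + n_b
    buyer_rate = (buy_a + buy_b) / total
    stat = 0.0
    for buyers, n in ((buy_a, n_a), (buy_b, n_b)):
        # Compare observed buyers and non-buyers with what the overall rate predicts.
        for observed, rate in ((buyers, buyer_rate), (n - buyers, 1 - buyer_rate)):
            expected = n * rate
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def best_cut(counts: Dict[int, Tuple[int, int]]) -> Tuple[int, float]:
    """counts maps number of children -> (buyers, households).

    Returns the cut c (groups: fewer than c versus c or more) with the
    largest chi-square statistic, together with that statistic.
    """
    values = sorted(counts)
    best = (values[1], -1.0)
    for c in values[1:]:
        low = [counts[v] for v in values if v < c]
        high = [counts[v] for v in values if v >= c]
        stat = chi_square_two_groups(
            sum(b for b, _ in low), sum(n for _, n in low),
            sum(b for b, _ in high), sum(n for _, n in high),
        )
        if stat > best[1]:
            best = (c, stat)
    return best


# HYPOTHETICAL counts: (buyers, households) for each number of children at home.
toy_counts = {
    0: (100, 1000), 1: (110, 1000), 2: (100, 1000), 3: (120, 1000),
    4: (110, 1000), 5: (280, 1000), 6: (270, 1000), 7: (290, 1000),
}
cut, statistic = best_cut(toy_counts)
print(f"best cut: fewer than {cut} children versus {cut} or more")
```

With these invented counts the strongest contrast falls between households with fewer than five children and those with five or more. A real tree program also considers multi-way splits and applies significance tests before accepting any split.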

Combatting Kardboard Krunchies

The example deals solely with behaviour and demographics. It starts with a database, which contains household characteristics and data about purchases of breakfast foods. The client, SoggyOs, has grown concerned about inroads made by their dread competitor, Kardboard Krunchies. These two brands, together with Sorghum Sweeties, dominate the cellulose-enriched, overly sweetened breakfast-like substance product category. (For those of you who have not been out of doors much, these are fictionalized brand names. Also, the outcome has been slightly disguised, as you will see.)

SoggyOs, Inc. had amassed information about purchasers of their category from scanner data at a large grocery chain. They collected information on some 14,552 households that bought products in their specific category. Some 20 per cent purchased Kardboard Krunchies.

Because this was scanner data and the store had a so-called loyalty programme, they knew the address of each purchaser. (The purchaser scanned a tag or held up their smartphone next to a scanning device with each purchase.)

SoggyOs, Inc. then merged the purchase with information about the purchaser’s household demographics. An amazing amount of such information is available to be added, or appended, to the data of any individual whose address is known (at least in the United States, and nearly as much so in the UK). Hundreds of demographic and product-usage items are available. The days of privacy are long behind us.

Among the demographic characteristics appended to households in this database, we find items such as the type of town in which they live, reported household incomes, education levels and the presence and ages of children – and so on. In total, there are 46 demographic characteristics for each household in the database. Figure 6.1 depicts the incidence of Krunchies buyers in this population.

FIGURE 6.1 The database ‘population’

The classification tree procedure (CHAID) will examine all these demographic characteristics, first seeking the one that can best split the sample into smaller groups that differ as much as possible in likelihood of buying Kardboard Krunchies. The CHAID software in question can split the sample into 2–15 groups.

Examining all factors, CHAID found the strongest difference lay in contrasting those who live in suburban areas in one group versus those living in either cities or rural areas in a second group. Some 22 per cent of those in the suburbs buy the competitive product, versus 17 per cent in the other two areas combined. That is, the suburban group is 1.3 times as likely as the other to be buyers of this breakfast substance. We see this division in Figure 6.2.

FIGURE 6.2 The tree’s first split

Trees and artificial intelligence

The procedure did something quite advanced here, which may not be immediately apparent. It found that we needed to combine the study participants in two geographies into one group to get the strongest contrast in incidences.

For the CHAID procedure to select this variable as the strongest differentiator, it needed to examine splitting the population based on this variable in four different ways. (Another way of dividing would have been combining people in cities and suburbs into one group versus those in rural areas, another combining people in the suburbs and rural areas versus those in cities, and the last simply splitting the three geographies into three groups.)

Sifting through four ways of grouping people to determine which is best may not seem that impressive, but recall that the procedure simultaneously looked at all 45 other demographic characteristics – and at all the ways in which those could be used to divide the population. Here is our historically earliest brush with genuine artificial intelligence (AI). The analytical methods CHAID uses for figuring out how to split a sample are so advanced that they were first presented at an AI conference.

What the program does next adds more value (and complexity) to the findings. It returns to the first group split from the total, the one with 7,761 households that we see in Figure 6.2. It then searches all the remaining demographic characteristics specifically within that group to find the one that will lead to a subgroup with a still higher incidence of Krunchies buyers.

Figure 6.3 shows what the procedure found. This is a three-way split of the group living in the suburbs, based on number of children at home.

FIGURE 6.3 Resplitting the first subgroup
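The count of four possible groupings for the three town types can be checked by simple enumeration. The sketch below lists every way of arranging a set of unordered categories into two or more groups. It is only an illustration of the counting and assumes any category may be combined with any other. The function name is hypothetical.

```python
from typing import List


def groupings(items: List[str]) -> List[List[List[str]]]:
    """Return every way to partition items into non-empty groups."""
    if not items:
        return [[]]
    first, rest = items[0], items[1:]
    result = []
    for partition in groupings(rest):
        # Put `first` into each existing group in turn...
        for i in range(len(partition)):
            result.append(partition[:i] + [[first] + partition[i]] + partition[i + 1:])
        # ...or give it a group of its own.
        result.append([[first]] + partition)
    return result


town_types = ["city", "suburb", "rural"]
# A usable split needs at least two groups, so drop the "everyone together" case.
splits = [p for p in groupings(town_types) if len(p) >= 2]
for split in splits:
    print(" | ".join("+".join(group) for group in split))
print(len(splits), "ways to split three town types")
```

The number of candidate groupings grows quickly as categories are added (a variable with four unordered codes already has 14), which gives some sense of the amount of sifting involved across 46 characteristics.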

We see a very strong contrast among these three smaller groups: some 28 per cent of suburbanites with 5+ children bought Krunchies, which is nearly three times as high as the incidence among suburbanites with 1–3 children (only 10 per cent bought this substance in this group). Here we see an interaction between two demographic characteristics.

An interaction between variables means that the effects of two (or more) together differ from the sum of the individual effects. More specifically, in this example, the percentage buying Krunchies among those who live in the suburbs and who have 5+ children is higher than what we would expect, either among people who just live in the suburbs, or among people who just have 5+ children.

We must have specific values of both variables combining to see this high a percentage of Krunchies buyers.

In this instance, the variables work together to lead to this unexpectedly strong outcome. This is one time when we can use the once-popular term synergistic and mean it.

Interactions

The effect of two or more variables working together to produce an outcome that is different from what we would expect based on the effects of the variables separately. This effect can be larger, or synergistic, or smaller or anti-synergistic (not nearly as good a buzzword).
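A small worked example may make the definition concrete. The sketch below compares the incidence we would expect if two effects simply added up with the incidence actually observed. The 20, 22 and 27.5 per cent figures come from the Krunchies example. The 23 per cent rate for all households with 5+ children is not given in the text and is a hypothetical value chosen only for illustration.

```python
def additive_expectation(baseline: float, rate_a: float, rate_b: float) -> float:
    """Rate expected for people with both traits if the two effects simply added."""
    return baseline + (rate_a - baseline) + (rate_b - baseline)


baseline = 20.0        # % buying Krunchies in the whole database (from the text)
suburban = 22.0        # % among suburban households (from the text)
five_plus = 23.0       # % among all households with 5+ children (HYPOTHETICAL)
observed_both = 27.5   # % among suburban households with 5+ children (from the text)

expected_both = additive_expectation(baseline, suburban, five_plus)
gap = observed_both - expected_both
print(f"expected {expected_both:.1f}%, observed {observed_both:.1f}%, gap {gap:+.1f} points")
```

Under these assumptions the observed incidence sits 2.5 points above the additive expectation, which is the synergistic pattern the definition describes. A negative gap would indicate the anti-synergistic case.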

CHAID is amazingly good at finding interactions. The word CHAID is an acronym of chi-squared automatic interaction detector. Classification trees may be unique in that interactions are unmistakable in their output. Interactions do not automatically appear in nearly all other statistical procedures. Also, most other procedures do not tell you if you are missing an important interaction.

Clearly, if you want to understand effects on some outcome, you should understand if two or more variables combine to act in an unexpected way. CHAID has unique problem-solving abilities in revealing and displaying these patterns.

What precisely does this have to do with a tree? The common display that CHAID produces looks tree-like. Let’s examine again the splits we just laid out, this time in the standard tree format (see Figure 6.4).

FIGURE 6.4 Splitting of groups shown as a classification tree

Perhaps not apparent in Figure 6.4 is the remarkable analysis the program performed in dividing the suburban group based on the number of children at home. This is the three-way division we just discussed.

We can see that the group with five or more children at home is the largest of the three (4,332 households), and that it still was not divided into smaller groups having different numbers of children. (Households were recorded as having up to an awe-inspiring 14 children. The mind boggles.)

This choice of grouping was based on complex statistical testing. All families with five or more children have the same statistical likelihood of eating Krunchies. That is, those with five children have the same likelihood as those with six, or those with seven or those with eight – and so on.

The classification tree program was instructed to separate people into groups only where it found statistically significant differences, while it searched through all possible ways of dividing the sample to find the one way that was the strongest. This group of 4,332 was large enough to divide further, but no differences existed that could lead to a further division.

This contrast found by the program (based on number of children at home) is very strong statistically. There is only a 0.00001 per cent chance that the three groups have the same incidence of buying Krunchies. This is far beyond the lowest acceptable threshold (a 5 per cent chance of the groups being the same). If you feel in need of a refresher on statistical significance, please check back to Chapter 2.

The full analysis would continue past this point, continuing to use other characteristics to grow the tree. This would go until we ran out of demographic characteristics that led to further significant differences, or until we decided that the groups were too small to split further.

We will say a fond farewell to Krunchies by noting again that a very high 27.5 per cent of all suburban families with five or more children buy this fine product. This led to an insight that warranted further exploration.

Namely, SoggyOs, Inc. asked, ‘Could a larger-size package help attract this group?’ SoggyOs pursued this finding, investigated the appeal of this idea using a discrete choice model and came up with an optimal new product. This is their highly popular economy gunny-sack size. (US cereal buyers may notice that many brands now offer, if not exactly this, a massive multi-large-box special. There definitely is appeal in having heaps of one’s favourite cereal-like substance at home.)

Optimal recoding

Classification trees’ ability to split a variable in the best possible way, picking the breaking points and the number of groups, is a remarkable analytical strength. This is called optimal recoding. It has particular value when dealing with categorical variables.

As a reminder, a categorical variable is one where the numeric codes hold places for non-numerical values. In our example, town type/size was a categorical variable. The program held the values of 1, 2 and 3 for this variable, corresponding respectively to city, suburban and rural.

Optimal recoding

Optimal recoding is classification trees’ unique ability to split the codes in a variable in the best possible way to predict an outcome variable. Codes are automatically arranged into groups and the best splitting points found. This involves very complex testing, sophisticated enough to be classified as artificial intelligence.

Strong, yet weak

Clearly, everything in a classification tree depends heavily on which variable gets chosen first. The variable with the strongest statistical significance will get the nod, unless you tell the program to do otherwise. Yet, while this variable could be the best at that specific point, it might not work best from the point of view of making a good overall model. Trees do not look forward to see what might happen if another variable were chosen.

Technically, trees are an example of a greedy algorithm. This is a method that makes a choice at each given point and then deals with whatever outcomes arise later.

We often find that many variables would pass the test of significance at any given spot in a tree diagram. Therefore, any one of them could be chosen as a predictor there. The differences in significance between the ‘best’ variable and the next few best – or even the next 20 or more – can be vanishingly small.

The best predictor could be significant at something like the 10⁻¹⁶ level (that is, 15 zeros after the decimal point followed by a 1, or to write this out just once, 0.0000000000000001). The next few on the list might be significant at better than the 10⁻¹⁴ level, so the differences in statistical significance are microscopic. Yet putting in different variables at any point can lead to very different trees below.

What can we do to guard against this problem? Random forests can provide some guidance.

Random forests

This method runs many hundreds of classification trees, in each tree swapping out people and variables at random. Then all the trees ‘vote’ on a final outcome. An excellent reading of true variable importances comes from observing how classification levels change as variables and people are moved into and out of many models. The analysis also provides diagnostics showing how variables’ importances shift as more trees are added. Eventually, after running a few hundred alternative models, we reach stable estimates.

‘Random forests’ adds to the list of horrible names beloved of maths and science types (think of SCSI – pronounced ‘scuzzy’ – drives, box and whisker plots, and p/p plots, for instance). However, random forests illustrate a key finding from machine learning. That is, the average of many weak estimates typically is better than any of the individual estimates. This is important enough to call out separately.

Any approach that uses many models, getting an average of estimates, is called an ensemble method. With ensembles, we have ventured deep into machine learning.

Ensembles: many models boost accuracy

The average of many indifferent or weak models typically performs better than any of the individual models. This is a key insight from machine learning. Random forests are an example of ensemble learning.
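The arithmetic behind this insight can be illustrated with an idealized calculation. Suppose each model is right only 60 per cent of the time and, as a strong simplifying assumption, the models err independently of one another. The sketch below computes exactly how often a majority vote would then be right. Real trees in a forest share data and are correlated, so actual gains are smaller than this idealized figure suggests. The function name is hypothetical.

```python
from math import comb


def majority_accuracy(n_models: int, p_correct: float) -> float:
    """Probability that a majority of n independent models is right,
    when each is right with probability p_correct (n_models should be odd)."""
    needed = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p_correct ** k * (1 - p_correct) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


# One weak model versus ever larger committees of equally weak models.
for n in (1, 11, 101, 501):
    print(n, "models:", round(majority_accuracy(n, 0.6), 3))
```

The printed accuracies rise steadily with the number of voters, which is the pattern the call-out describes. The same calculation also shows the limits of the idea: if each model is right less than half the time, adding more of them makes the vote worse, not better.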

In our example below, we will run a forest with 500 tree models. Having 500 trees is powerful, but poses a problem. We cannot see what this method is doing. Unlike a model based on a single tree, where we can see how variables split, this method gives us importances but otherwise remains opaque. We cannot glance across all the many analyses and intuit what has happened.

In the example, we therefore used variables that were among those with the highest importance (across the many different trees) as candidate variables, and built one final classification tree model. This one model, based on ‘assured winners’, was then used to guide decision making.

A case study: let’s take a cruise

Imperial Admiral Cruise Lines found itself sitting atop a large database of people who might take one of their fanciest cruises. (This is a fictionalized name – please do not look for them in the hope of booking a trip.) The family running this august enterprise had recently married into the one that ran the mighty German Königlich Luxus Boot-Unternehmen (KLBU) fleet. And together, they had just bought the bankrupt Platinum Nigerian Prince Line.

Having records of who had travelled on all these lines, they wanted to put this data to good use. They were interested in selling their most luxurious cruise, the Admiral Deluxe Imperial Royal Diamond Special Plus Cruise (or as they affectionately called it, the Big-Tips Cruise), and wanted to find those in their databases most likely to indulge in this extravagance.

Here, they were hoping to improve their odds by finding people like those who had already taken such a cruise. They had a plethora of household demographics and buying characteristics to work with, literally a list with hundreds of items that they had bought from various external vendors and merged into their data.

They also had information on travel on their own line and the KLBU. This included the number travelling, last destination, number of cruises, number of times a trip was upgraded, and more. (Unfortunately, data from the Platinum Nigerian Prince Line proved to be sketchy and so most of it was discarded.)

They reasoned that those who were most likely to have taken such a cruise, based on these characteristics – but had not done so – would be the most promising upcoming customers. Therefore, Imperial Admiral Cruise Lines was hoping to improve the odds of finding a customer by appealing to select groups.

However, the odds that they were hoping to improve started with a vanishingly small baseline. Only 0.6 per cent of those in their database had ever taken the plunge and bought their most expensive offering. Even increasing this tenfold would give them only a 6 per cent likelihood of finding a buyer.

Slender as this may seem, this would still be 10 times better than doing nothing. Much like the US advertisers of medications for Crohn’s disease mentioned in online bonus Chapter 1, they were willing to pursue low odds. The best they likely could do would be finding a population where nearly all would not be good prospects, even though the odds of finding a prospect there would be far better than average.

The most important variables from random forests

Several hundred variables were put into a random forests analysis. It built 500 trees, randomly swapping people and predictor variables into and out of the analysis. Figure 6.5 shows the 30 most important variables and their relative importances. Variables from this list were used in constructing a classification tree. Note that this study took place a number of years ago, and at that time, buying by mail order was a more robust business. Variables relating to buying by mail order appear on the list.

FIGURE 6.5 Importances from random forests

Constructing the tree

Elements appearing in each spot in the tree

We should first explain what you will see in the tree diagram. These displays vary in the information they show. There does not appear to be a default layout. Your author favours a display highlighting just the information needed, eschewing fancy graphs and flourishes. As you will see when we reveal the entire tree (in Figure 6.11), even an unadorned tree can be visually imposing.

Figure 6.6 explains what you will see at each point, or node, in the tree. The large, bold-faced number is the percentage of people in that group who have ever taken an Admiral Deluxe Imperial Royal Diamond Special Plus (fondly recalled as the Big-Tips) cruise. This percentage is the target or dependent variable in the analysis. In the figure, some 2.3 per cent of those in the group shown have taken this type of trip.

FIGURE 6.6 The elements in a tree diagram

The two smaller numbers inside the node show how many people are in this group (12,003) and the percentage of the total sample that this group comprises (3.5 per cent). Please do not confuse the incidence of Big-Tips cruise-goers in the group (the very large number, or 2.3) with how large the group is (the small number in the corner, or 3.5 per cent).

Above the node, you will see information about the variable on which the sample is split at that point, and the statistical significance of the difference between the split-off groups. Please recall that we are looking for a number of 0.05 or less to pass the test. In this instance, we have a much smaller value of P = 0.0000001.

This group comes from another that has already been split out of the total sample, so it has two variables or characteristics that define it. Only the most recent splitting variable appears directly above. (That is, they have not bought premium luggage by mail order.) You would need to trace further up the tree to find the other characteristic.

Below the node, we see the variable leading to the next split, upgrades on the companion KLBU Line. The procedure found the best difference was between those who had no upgrades and all others who had ever upgraded.

To the whole tree: our first split of the sample

The first split is based on how many times customers upgraded a trip with Imperial Admiral Cruise Lines. Some debate arose about using this as a predictor, because the Admiral Deluxe Imperial Royal Diamond Special Plus (aka Big-Tips) trip itself could have been chosen due to upgrading. The data was probed somewhat more, and it was discovered that nearly all of the upgrades went to expensive, but still lesser, cruises – such as the popular Regal Super Extra Gold Deluxe package.

Figure 6.7 shows us how this first split looked. Note that this is a four-way division of the total sample. The largest group has done no upgrades. This is nearly three-quarters of the database. Only some 0.3 per cent of this group have taken IAG’s most extravagant cruise.

FIGURE 6.7 The first split, based on upgrades

Those who have upgraded once make up the next group. They are nearly at the average, with some 0.7 per cent having taken the plunge on a Big-Tips cruise.

Two smaller groups are far more likely to have taken such a voyage. This level reaches 3.4 per cent among the 14,158 who upgraded twice. Among the smallest of all groups, those who upgraded three or more times, some 7.7 per cent have taken this super-luxury journey.

It may seem that 7.7 per cent is not very good odds. This is true in the absolute, but it is 12.8 times as high as the average, and over 25 times as much as those who never upgraded. Therefore, this is a substantial improvement over not having even this much of a tree analysis.

This is a very powerful difference, and so the split is highly significant (shown by the notation P = 0.0000000, where P stands for ‘probability’). This means that we have a near-zero chance that the averages in the groups are in fact the same, even the two groups where the levels are 0.3 per cent and 0.7 per cent. The artificial intelligence behind the splitting methods in the tree analysis allows us to be extremely confident that such seemingly similar groups are in fact not the same.

Onward to our first section of the tree

We will return to those who had no upgrades. They are not promising prospects, but perhaps some other characteristic of a subset of this group will reveal a more likely subgroup among them.

The tree will reach a set of final, or terminal, nodes. These nodes will get numbered so that we can refer to them more easily.

As we see in Figure 6.8, the first split is based on the last destination. This seems to make sense because IAG cruises to certain regions cost more than to others. This split divides the group with no upgrades into two nearly equal parts. However, one of those groups is four times as likely as the other to have taken the expensive trip. This group consists of those who last disembarked in Scotland and Portugal, and the 141 hearty souls who took the one and only adventure tour around Siberia.

FIGURE 6.8 The first section of the tree

The other group took the less expensive voyages around Norway, Spain and France. A mere 0.1 per cent of them took the super-deluxe trip.

Going back to the group on the left, a variable was found that could isolate a small subgroup with above-average likelihood of taking the big trip. This was those who also had taken eight or more luxury trips of any kind in the last five years. The correct way to define this terminal group, marked with a large ‘2’, would be as follows: Upgrades on IAG = 0 AND last destination = Scotland, Portugal or Siberia AND took eight or more luxury cruises in the last five years. You must have all three of these together to define this group. No one or two of them alone or in combination leads to a subgroup with this high an incidence of people who took the big trip.

Terminal group 1, which makes up most of the group above it, took seven or fewer luxury trips. They remain at 0.4 per cent incidence.

Rounding out this branch of the tree, the next split within the group who went to Norway, France or Spain does not lead to a subgroup with above-average incidence of people who took the Big-Tips cruise. Terminal group 3 has a 0.1 per cent likelihood and terminal group 4 remains at 0.4 per cent. The most sharply differentiating characteristic found here was whether a person was a mail-order buyer of travel software.

Moving to the upgraders

Those who upgraded once make up about 21 per cent of the database. But with nearly 71,000 in this group, there was room for the tree to branch several times. This is the most complex portion of the diagram, with seven terminal groups or nodes, as you can see in Figure 6.9.

FIGURE 6.9 Splitting variables for those upgrading once

Four variables work together to define these groups. One of these, the so-called wealth index, was calculated by an external database provider. Since it is appended to all the people in the database, the meaning of this is not important, just the score at which a difference was found.

The way this variable splits points out that CHAID can handle both categorical and continuous predictor variables. It will find the best place at which to split off groups with either kind of data.

Two of the groups are well below average in incidence of super-luxury trip buyers. The first is terminal group 5, defined as: upgraded once AND had 1–4 persons on the last trip AND not a premier mail-order buyer for outdoor gardening. Only 0.1 per cent in this group took the super-deluxe cruise.

The other is terminal group 7, with an incidence of only 0.3 per cent. A small number of people with no data (47 to be exact) are mixed in with those having lower wealth index scores.

Missing values are not a problem

This admixture of data and missing responses points out one additional strength of CHAID. Missing values can be handled just as any other response. You must ask the program to do this. If you do, these responses are put into the group of codes where they best boost contrasts in responses to the dependent variable.

CHAID can even be used to impute or estimate what missing values might be. Since these missing responses are grouped with the responses that have statistically identical levels of the target variable, this can be a very sensible way of estimating what those missing values could be.
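The idea of placing missing responses with the codes they most resemble can be sketched as follows. This is a rough simplification rather than what CHAID actually computes: it treats ‘missing’ as a code of its own and attaches it to whichever known code has the closest rate on the target variable, whereas a real tree program decides on merges with statistical tests. The function name and all counts are hypothetical.

```python
from typing import Dict, Optional, Tuple


def nearest_group_for_missing(counts: Dict[Optional[str], Tuple[int, int]]) -> str:
    """counts maps a code (None means missing) to (buyers, people).

    Returns the non-missing code whose buyer rate is closest to the
    rate observed among people with missing data.
    """
    missing_buyers, missing_people = counts[None]
    missing_rate = missing_buyers / missing_people
    known_rates = {
        code: buyers / people
        for code, (buyers, people) in counts.items()
        if code is not None
    }
    return min(known_rates, key=lambda code: abs(known_rates[code] - missing_rate))


# HYPOTHETICAL counts of (super-luxury buyers, people) by wealth index band.
toy_counts = {
    "lower wealth index": (30, 10_000),
    "higher wealth index": (150, 10_000),
    None: (0, 47),  # people with no wealth index on file
}
print("missing responses grouped with:", nearest_group_for_missing(toy_counts))
```

In this invented example the people with no data behave most like the lower-scoring group, so they are placed there, much as the 47 people with missing wealth index scores were in terminal group 7.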

Our first groups with high incidences of super-luxury buyers

Two of the terminal groups (group 6 and group 9) have about average incidence of people who took the Big-Tips cruise. But the remaining three groups (8, 10 and 11) have incidences of super-luxury buyers that are well above average. Group 11 is the richest in this select group, reaching some 3.9 per cent. This is some 6.5 times the average. This group is described as follows:

Number of upgrades = 1 AND persons on the last cruise 7 and over AND total super-deluxe trips in the last 3 years = 2 and up

The best of slender odds: those most likely

We find the groups with the highest likelihood of taking the super-deluxe voyage in the last section of the tree, those upgrading two or more times. Figure 6.10 shows this comprises two branches. Those who upgraded three times or more stand alone, with an incidence of 7.4 per cent (in terminal node 15). No further splits were possible once this group was isolated. It is the only group described fully by a single characteristic.

FIGURE 6.10 Mostly the best prospects, those upgrading more than once

The incidence of super-luxury travellers in this group is some 12.3 times the average of 0.6 per cent. The odds are still low, but clearly far better than just picking at random.

The group with the highest incidence of all falls under those who upgraded twice. This group (number 14) is defined by one further characteristic, namely that they were mail-order buyers of premier luggage. In this group, some 8.8 per cent took the really big cruise. This comes to some 14.8 times the overall average, a truly sizeable increase.

The entire tree diagram

Figure 6.11 puts all the pieces together. This is not a particularly large tree, but even at this size the display is visually imposing. The tree has a great deal of valuable information, but looking at it in this form, it is easy to miss much of what is shown.

FIGURE 6.11 The whole classification tree

Two forms of additional displays will help make the information in the tree more accessible. These are the classification rules and a very handy chart called variously a gains analysis, a lift analysis or a leverage analysis. Let’s discuss what each does.

Classification rules and trees

Classification trees do not generate equations like (for instance) a regression, which have variables and their weights (or coefficients). You may be familiar with a formulation from regression something like this:

Variable y = 0.4*A + 0.6*B + 0.7*C + 81

Rather, classification trees lead to a set of simple ‘if-then’ statements, or classification rules. In Table 6.1 you will see the rules describing the first four groups in the tree. The rule numbers correspond to the numbers in the tree diagram. Typically, we use rules describing only the very ends of the tree, or the terminal nodes. Groups or nodes inside the tree lead to only a partial description of any group formed.

TABLE 6.1 The first four classification rules

RULE 1
IF
Total Deluxe Trips Last 5 Yrs = 0 to 7
AND Last Cruise Destination = Siberia, Scotland or Portugal
AND No. Upgrades IAC = 0
THEN
Took the Big-Tips Cruise = 0.4%

RULE 2
IF
Total Deluxe Trips Last 5 Yrs = 8 and up
AND Last Cruise Destination = Siberia, Scotland or Portugal
AND No. Upgrades IAC = 0

RULE 3
IF
Premier Mail-Order Buyer Travel Software = 0 (No)
AND Last Cruise Destination = Norway, Spain or France
AND No. Upgrades IAC = 0

RULE 4
IF
Premier Mail-Order Buyer Travel Software = 1 (Yes)
AND Last Cruise Destination = Norway, Spain or France
AND No. Upgrades IAC = 0

NOTE The numbering of the rules follows the numbering of the ending nodes on the tree diagram

Rules like these are all we need to describe the output, or create a model. This model is what we would use to score another data set. In this example, everyone in the data set got a probability of taking the big cruise based on the model. Then those most likely to buy into this could be approached with appeals to consider this enriching (and possibly bankrupting) experience.

Classification rules

A classification tree leads to a set of simple if-then statements describing the combinations of predictor variables that lead to different values of the target variable. They typically describe just the ending boxes or nodes in the tree. These rules can be used to assign values to another data set.
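To show how little is needed to apply such rules to another data set, here is a minimal Python sketch of the four rules in Table 6.1. The field names and the function name are hypothetical, chosen only for this illustration. The incidences for nodes 1, 3 and 4 are those quoted in the chapter. No figure is stated for node 2 in the material shown, so it is left unspecified.

```python
from typing import Any, Dict, Optional

LEFT_DESTINATIONS = {"Siberia", "Scotland", "Portugal"}
RIGHT_DESTINATIONS = {"Norway", "Spain", "France"}

# Incidence (%) of Big-Tips cruise-goers per terminal node, as quoted in the text.
# Node 2 is described only as above average, so no number is filled in here.
NODE_RATE = {1: 0.4, 2: None, 3: 0.1, 4: 0.4}


def terminal_node(record: Dict[str, Any]) -> Optional[int]:
    """Return the terminal node (1 to 4) for a customer with no upgrades.

    Returns None for customers covered by rules 5 to 15, which are not
    shown in Table 6.1 and so are not coded here.
    """
    if record["upgrades_iac"] != 0:
        return None
    if record["last_destination"] in LEFT_DESTINATIONS:
        return 1 if record["deluxe_trips_5yrs"] <= 7 else 2
    if record["last_destination"] in RIGHT_DESTINATIONS:
        return 4 if record["mail_order_travel_software"] else 3
    return None


# HYPOTHETICAL customer record used to try out the rules.
customer = {
    "upgrades_iac": 0,
    "last_destination": "Norway",
    "deluxe_trips_5yrs": 2,
    "mail_order_travel_software": 0,
}
node = terminal_node(customer)
print("terminal node:", node, "incidence (%):", NODE_RATE[node])
```

Each customer falls into at most one of the four branches, so scoring a whole database amounts to running every record through the same handful of comparisons.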

These rules are extremely simple and so very easy to program into a database. This great simplicity of the model, with its absence of equations or other calculations, is a highly appealing feature of classification trees.

This tree requires 15 rules, one for each terminal node. We also used only 10 predictor variables. This tree has done a tremendous amount to clarify the data with a highly compact model. Recall that this database ran into the hundreds of thousands. Everyone was classified into a group, and only one group.

More formally, the model is mutually exclusive and completely exhaustive. This kind of jargon can become catching in certain environments. At one client company, people were sitting in a meeting and talking about a model being something that sounded like ‘me-see’. At first their sanity seemed in doubt. It then emerged they were indeed talking about the initials for mutually exclusive and completely exhaustive, or MECE. However, we are still not sure about their sanity.

In any event, this ability (to capture and characterize everyone in a group with so few variables) is a tremendous strength of classification tree methods. With some audiences it may pose some difficulties. They may not believe that you can reveal so much about a data set with so few predictors, and may even question why their favourite item was not included.

The ultimate in detail: the gains analysis

We can learn still more about the finer points of the classification tree from a gains analysis. This may also be called a lift or leverage analysis. It lists the terminal nodes of the tree in order, from highest incidence to lowest. It also provides other details about them. Table 6.2 shows a portion of a gains analysis. The full analysis for this tree would have 15 sections (one for each terminal node). We will show and discuss the first five.

TABLE 6.2 Gains chart showing the five groups with the highest index values

The portion of the chart to the right, under ‘A’, is more intuitive. This first shows where the node falls in the tree, how large the group is, and the incidence of super-luxury buyers in the group.

One number that is less apparent is added, namely the lift or leverage. This is an indexed value, showing how the incidence of super-luxury buyers in this group compares to the overall average. The overall average is set to 100. Therefore, the first group’s index of 1,480 means that the incidence of these buyers in this group is 14.80 times as high as the average.

The portion of Table 6.2 under ‘B’ is somewhat more complex. This shows cumulative figures. Cumulative figures are the weighted average of all groups down to and including the one on that line of the table. So, for instance, cumulative figures for the third group are a weighted average of the values for groups 1, 2 and 3. (The third group is defined as: No. upgrades KLBU = 1 and up AND premier mail-order buyer of luggage = 0 (No) AND no. upgrades IAC = 2.) Cumulative figures are valuable because they help decide on a cut-off point for targeting. When you use a gains chart to target, you would tag a number of groups in the database. The simplicity of the if-then rules coming from the classification tree makes it easy to specify that any of a dozen or more conditions be met. Some classification tree programs even generate the rules in database (SQL) language automatically.

Still more statistics could appear in the diagram, such as the percentage of all luxury buyers in each group and actual counts of people. However, as many details as we have shown in this chart often seem perplexing, especially to harassed management people. Sometimes, then, a simpler display can help get the message across more easily. One such chart, which we could subtitle ‘Gains for management’, appears in Figure 6.12.
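The index and cumulative figures described for the gains analysis can be sketched in a few lines of Python. This is an illustration only: the node names, sizes and buyer counts below are hypothetical and are not the values in Table 6.2, and the function and field names are our own. The first hypothetical node is chosen so that its index works out to 1,480, matching the interpretation given in the text.

```python
from dataclasses import dataclass


@dataclass
class TerminalNode:
    name: str
    size: int    # people in the terminal node
    buyers: int  # people in the node who took the cruise


def gains_rows(nodes, total_size, total_buyers):
    """Rank nodes from highest to lowest incidence and add cumulative figures."""
    overall = total_buyers / total_size  # overall incidence, indexed to 100
    ranked = sorted(nodes, key=lambda n: n.buyers / n.size, reverse=True)
    rows, cum_size, cum_buyers = [], 0, 0
    for node in ranked:
        cum_size += node.size
        cum_buyers += node.buyers
        incidence = node.buyers / node.size
        rows.append({
            "node": node.name,
            "incidence": incidence,
            "index": 100 * incidence / overall,
            "cum_share_of_sample": cum_size / total_size,
            "cum_share_of_buyers": cum_buyers / total_buyers,
            # Size-weighted average incidence of all nodes so far, indexed.
            "cum_index": 100 * (cum_buyers / cum_size) / overall,
        })
    return rows


# Hypothetical figures: 100,000 people and 1,000 buyers (1 per cent overall).
nodes = [
    TerminalNode("B", size=2000, buyers=100),
    TerminalNode("A", size=1000, buyers=148),
    TerminalNode("C", size=5000, buyers=100),
]
rows = gains_rows(nodes, total_size=100_000, total_buyers=1_000)
for row in rows:
    print(row["node"], round(row["index"]), round(row["cum_index"]))
```

Note that the cumulative index on any line is simply 100 times the cumulative share of buyers divided by the cumulative share of the sample, which is why it is a convenient guide to choosing a cut-off point.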

FIGURE 6.12 The gains chart simplified

We included another six groups in the actual targeting exercise. That amounted to some 20 per cent of the total sample, and some 70 per cent of the luxury buyers. (The cumulative index for these 11 groups was 350.) This represents a terrific gain in efficiency versus not having the model.

One way to calculate efficiency would be taking the increased incidence of buyers, as shown by the cumulative gain (350), and taking into account that you are skipping 80 per cent of the total database. That is, efficiency would be 3.5/0.2, or some 17.5 times as efficient in finding luxury buyers as not having the model.

So while the odds are still not high, they are far better than they could be without the model. Using classification trees greatly increased the odds of finding likely prospects, and so increased certainty and decreased waste.

Still, while the trees have guided IAG to those who might take such a trip, they do not reveal which features or messages might be most appealing. Those investigations would need to be done using the trade-off methods we discussed in Chapters 4, 5 and online bonus Chapter 1.

Growing trees: automated or guided?

In the example, we could have made the tree larger, or have grown it further. We elected to stop at 15 terminal groups because adding more splits provided no additional information useful in directing tactics. That is, we guided the growth of the tree.

You either can let classification trees run automatically or (with some programs) guide what the procedure does. Some programs run only automatically, which in your author’s opinion is a strong disadvantage if you want to do anything more than optimal recoding of variables.

The ‘greediness’ of tree methods that we discussed earlier is one main reason that automation can be a serious problem. That is, tree programs will automatically choose the one variable that looks ‘best’ at each spot in the tree, even if that variable would not lead to the best overall result when adding other variables later.

In fact, we encounter this issue with nearly all methods that try to build models by adding variables. It is worse with trees because choosing the wrong variable can lead to no further growth in the tree. This can become a serious problem because trees typically do not contain many variables.

Sometimes the ‘best’ variable or even the next ‘best’ will cause the tree to stop growing because no further statistically significant variables can be found below it. However, some other statistically significant variable could lead to further growth, and to further valuable information about what influences an overall outcome.

Recommendations on tree growing

Typically results are excellent if you let the program pick automatically first and then go back and check any spots that seem problematic – for instance, a place where a large group was left undivided. The most complete classification tree programs let you look at a list of possible variables that could split the sample at each point in the tree (all of these variables passing your chosen test of statistical significance). You then can explore how the tree grows when you swap the program’s first choice for another variable.

No matter which program you use, you will need to set the acceptable significance level, and the smallest group you will allow when the program does splitting. If you do not set a minimum, the program may even split off a single person into his or her own group. And of course, the minimum size you select will influence how the tree grows.

All these factors lead to the conclusion that we may never reach a demonstrably optimal tree. Small fluctuations in the data – for instance, dropping one person with questionable responses – might lead to very different-looking trees being selected as the best possible by a computer program.

Still, while this is a caution, it should not be a deterrent to using this remarkable method. The goal should always be to create a model that has the most useful information and that still has strong predictive power. Squeezing the last possible drop out of a score showing the goodness of prediction should not be your aim. Scores are valuable things, but real-world applicability is more important. Sacrificing a point or two of a score for a more useful model makes a great deal of sense.

CHAID and CART (and CRT, C&RT, QUEST, J48 and others)

Before we get to our next new method, we need to clear up some terminology about trees. As you have noticed, we have been talking here about a specific type of classification tree analysis, called CHAID.

AID, a notoriously inaccurate method, came first. Over the last few decades, a healthy host of related methods have been developed. CHAID was the first method to solve the problem of comparing significance when using variables that have different numbers of categories – as it turns out, a fiendishly difficult problem.

Over the years many alternatives to CHAID have been proposed. Again venturing out onto a metaphorical limb, we will say that they all do basically the same things, but with different restrictions and rules. At one time, there were fierce partisans of various methods. Now that classification trees are no longer the latest word in predictive models, the rancorous language seems to have subsided.

Perhaps the disputes also have quieted because there now are so many variations of classification tree methods, well over 40. The most widely used along with CHAID is called CART (also called CRT and C&RT because strangely the word CART was allowed a trademark).

CART and its relatives, such as QUEST, do only two-way splits while CHAID can split more finely. Some users still prefer CART, but for the purpose of performing optimal recoding, CHAID appears to be more efficient. All the groups you need can be formed at once, rather than in a series of two-way splits.

You may also encounter such programs as AC2, J48, C4.5 (free) and C5 (not free). We have discussed one outgrowth of classification trees, random forests. Next we discuss another, boosted decision stumps.

Decision stumps rescue a theoretical model

A decision stump, aside from being another unlovely term, is simply a one-level tree. These can be used for optimal recoding, as in the example shown in Figure 6.13, where a decision stump links a large regression-based model to market share.

FIGURE 6.13 Decision stump linking scores and share

Regression-based models can become highly complex, with some extensions of regression actually creating new variables by combining related ones. These combined variables then act on the target or dependent variable. They also can influence each other. (The best-known of these methods are structural equation models and partial least squares path models.) The new, combined variables are considered to be underlying ideas or themes that the actual variables indirectly measure.

Here a large number of variables were envisioned as combining into a loyalty score. However, while the loyalty score itself was predicted accurately by the individual variables, the model did not link well with market share. This happened because regression expects relationships among variables to fall into a straight line. Share and the score did have a strong relationship, but it was not at all linear. A decision stump revealed how they related.

As you can see in Figure 6.13, share jumps abruptly once the loyalty score goes over 8.7. (This score was a weighted average of many other variables, and so could be any value between 0 and 10.) The classification tree program found the best places to split the score variable, using optimal recoding.

The best-fit regression line misses this relationship. It even falls below a predicted value of zero for share when the score is about 5.2.
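To make the idea of a one-level split concrete, here is a minimal Python sketch of a two-way stump on a continuous score. It is an assumption-laden illustration, not the procedure used for Figure 6.13: it picks the threshold that minimizes squared error around the two group means (a CART-style criterion, whereas CHAID relies on significance tests and can form more than two groups), and the scores and shares are hypothetical numbers arranged so that share jumps near 8.7.

```python
def best_two_way_split(scores, shares):
    """Find the single threshold on `scores` that best separates `shares`."""
    pairs = sorted(zip(scores, shares))

    def mean(values):
        return sum(values) / len(values)

    def sse(values):
        # Sum of squared differences from the group mean.
        m = mean(values)
        return sum((v - m) ** 2 for v in values)

    best = None
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold can separate identical scores
        left = [share for _, share in pairs[:i]]
        right = [share for _, share in pairs[i:]]
        error = sse(left) + sse(right)
        if best is None or error < best["error"]:
            best = {
                "threshold": (low + high) / 2,
                "low_mean": mean(left),
                "high_mean": mean(right),
                "error": error,
            }
    return best


# Hypothetical loyalty scores (0 to 10) and market shares (per cent).
scores = [2.0, 4.0, 5.2, 7.0, 8.5, 8.9, 9.3, 9.8]
shares = [3, 4, 3, 4, 5, 20, 21, 22]
split = best_two_way_split(scores, shares)
print(split)
```

A straight line fitted to data like these would average across the jump, which is the kind of mismatch the stump avoids by reporting one level of share below the threshold and another above it.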

Boosted decision stumps

This application gives us another, distinctive way to determine variables’ importances. It uses a process of building single-level trees repeatedly. Its approach differs from that of random forests, which rerun larger trees while randomly swapping predictor variables and cases (people) in and out of each model.

Rather, boosting decision stumps runs a first model, a single-level tree, and then learns from that model. The procedure marks which cases are predicted correctly and which are not. The correct cases are marked as the easy ones and the incorrect cases as hard. The procedure then puts more weight or emphasis on the hard cases, and tries to fit a model that captures them better.

It will redo this as many times as you request. Each time, it focuses most on the hard cases, trying to predict those people who were incorrectly classified the last time. Boosting is another type of ensemble model.

Figure 6.14 shows an output from a run of boosted decision stumps. These importances followed a classification tree model showing the linkages between the nature of psoriasis and depression. The model used measurements of the extent and location of the affected skin areas for about 6,900 patients. These patients also took an internationally normed test designed to measure serious depression.

FIGURE 6.14 Output from boosting

NOTE These scores are scaled to sum to 100

The tree model led to a simple set of if-then rules, with each rule corresponding to a different probability of severe depression. It was easy to use, since it was based on measurements that doctors would take in any event as a part of treatment. It could even be scored using a pencil and paper.

A question arose about how important the variables in the model were, and for this boosting was used. The specific method is called AdaBoost.M1, which is well suited to use with classification trees.

We asked the method to run boosting 40 times, and the importances that appear in Figure 6.14 emerged from the analysis. As you can see in the figure, two values of the same predictor were the most important. This one predictor is percentage of body surface area with psoriasis (PSOBSA). One critical threshold is over 10 per cent of body surface area and the other is over 20 per cent. Few other methods match this ability to isolate two values of the same variable as important.

This model proved to be valuable because psoriasis patients often conceal how depressed they feel, even from their doctors. The classification tree model gave doctors a simple way to determine which patients might be most at risk. The boosting gave them a few features to watch with particular attention.
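The reweighting loop behind boosted stumps can be sketched briefly in Python. This follows the AdaBoost.M1 update as it is commonly described (correctly predicted cases have their weights shrunk, so the hard cases carry relatively more weight in the next round). Several things here are assumptions made for illustration: the data are hypothetical yes/no flags rather than the psoriasis measurements, the function names are our own, and importances are taken as each predictor's share of the total vote weight scaled to sum to 100, which may differ from how any particular program computes the scores in Figure 6.14.

```python
import math


def stump_predict(row, feature, flip):
    """A one-level tree on a yes/no predictor: predict its value, or the opposite."""
    return 1 - row[feature] if flip else row[feature]


def fit_stump(X, y, weights):
    """Return (weighted_error, feature, flip) for the best single split."""
    best = None
    for feature in range(len(X[0])):
        for flip in (False, True):
            error = sum(w for row, label, w in zip(X, y, weights)
                        if stump_predict(row, feature, flip) != label)
            if best is None or error < best[0]:
                best = (error, feature, flip)
    return best


def boost_stumps(X, y, rounds):
    weights = [1 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        error, feature, flip = fit_stump(X, y, weights)
        if error >= 0.5:
            break  # no stump beats chance on the reweighted cases
        error = max(error, 1e-10)  # guard against a perfect stump
        beta = error / (1 - error)
        model.append((math.log(1 / beta), feature, flip))
        # Easy (correct) cases are shrunk by beta; hard cases keep their weight.
        weights = [w * beta if stump_predict(row, feature, flip) == label else w
                   for row, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, row):
    """Weighted vote of all stumps in the ensemble."""
    yes = sum(a for a, f, flip in model if stump_predict(row, f, flip) == 1)
    no = sum(a for a, f, flip in model if stump_predict(row, f, flip) == 0)
    return 1 if yes > no else 0


def importances(model, n_features):
    """Assumed measure: each predictor's share of vote weight, summing to 100."""
    totals = [0.0] * n_features
    for alpha, feature, _ in model:
        totals[feature] += alpha
    grand = sum(totals)
    return [100 * t / grand for t in totals]


# Hypothetical data: three yes/no flags; the outcome is 1 when at least two are set.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [1 if sum(row) >= 2 else 0 for row in X]
model = boost_stumps(X, y, rounds=3)
print(importances(model, n_features=3))
```

In this toy case no single stump classifies everyone correctly, yet the weighted vote of three stumps does, which illustrates why an ensemble of weak models can outperform any one of them.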

Summary: applications and cautions

Classification trees remain pre-eminent for teasing out and seeing interactions – the ways in which variables’ influence on a target variable is stronger or weaker than expected when they work in combination. When two variables work together to produce an effect that is stronger than we would expect based on the ways that each behaves separately, we can actually say for once that these effects are synergistic – and mean it.

Other analytical methods allow variables to interact as part of their normal operation, in particular the Bayes Nets we discuss in Chapter 7. But nowhere are these patterns as apparent as they are in a classification tree analysis.

These methods are so effective at finding interactions that they make a good first step before other methods of analysis, such as regression-based models. The important interactions that classification trees find can be entered into the regression, improving results.

We mainly discussed the CHAID method, which can produce many-way splits. This is particularly efficient for finding the best way to split a large categorical variable that holds many codes. Another type of classification tree method, exemplified in CART (or C&RT or CRT), produces only two-way splits and so makes optimal recoding somewhat more cumbersome.

Trees do not produce equations like regressions. Classification tree models typically are small, with few variables and few classification rules. Rules are simple if-then statements that describe how the variables work together to lead to an outcome. This simplicity, even with very large data sets, is a salient strength of these methods. We typically can explain everything important quickly and efficiently. Some audiences may find this hard to believe though, so this could require some advance explanation.

Classification tree methods do not produce a truly definitive model. Small fluctuations in the data or small changes in the way you choose to set minimum acceptable group sizes, for instance, can lead to very different-looking trees. If you allow the software to do all the choosing in shaping the tree, you might get a result that is not best suited to your strategic or tactical needs.

Some programs claim to find the best possible model automatically, but you alone know what you really need. Therefore, you are best served by a program that allows you to modify the model to fit your objectives.

We explained two ensemble methods based on trees. Ensembles run many models and either average or take votes. One of the key findings of machine learning is that the average of many indifferent or weak models typically works better than any of the individual models.

Random forests help understand the importance of variables. They build hundreds of classification trees with random swapping of people and predictors into the model each time. They can help you focus in on the truly important variables, as we did in the IAG Cruise Lines example. Adaptive boosting is another approach that can provide an excellent reading of variables’ importances. Which of these approaches you use would depend on which most clearly shows the information you need to meet the goals of your project.

However, another method, Bayes Nets, arguably provides the ultimate in seeing how variables relate and understanding variables’ importances. We discuss Bayes Nets in Chapter 7.

07 Remarkable predictive models with Bayes Nets

In this chapter we will learn about the remarkable set of methods called Bayes Nets or Bayesian networks. We will show how they readily see patterns in the real world that completely elude us. We also illustrate how they automatically learn data structures, fitting variables together into natural groupings. We review some of their many other abilities, including the ways they can trim groups of variables, determine the true importances of predictors, and generally make sense of data. Using numerous illustrations, this chapter demonstrates the remarkable properties of these methods.

What are Bayes Nets and how do they compare with other methods?

Now that we have assimilated a great deal about trade-off methods, classification trees and ensembles, we head in another direction entirely: exploring Bayes Nets or Bayesian networks.

Bayes Nets have remarkable properties allowing them to surpass many other methods in showing useful patterns in data. They can even solve some problems that elude other methods entirely. They will repay learning a new way to think about data.

Comparing to the regression standard

Bayes Nets can tackle many of the same problems that have traditionally been addressed with various regression-based models. Nearly everyone who has sat (or suffered) through a statistics class has heard about regression. Many people believe that they know these methods and their uses, and some in fact do understand them.

So that everyone is on an even footing, let’s review a few basic properties of regressions. One basic idea behind them is that we can add the effects of variables to predict the values of some target variable. This target could be, for instance, a scaled rating or the level of use of a product or service. The types of regression we typically use all work based on seeking straight-line relationships. Hence, the designation you will sometimes see: linear additive models.

Regressions are based on the patterns of correlations among variables. As a reminder, while correlation has been taken to mean many things, with regression it means a simple summary measure of how closely two variables fall into a straight-line relationship.

Some extensions to regression deal with curved lines for the target variable, but you are not likely to encounter these outside scientific settings. Even if the line is curved, regression seeks the best fit to that line, as it does with the straight line in Figure 7.1.

FIGURE 7.1 A regression line showing the best prediction

A regression generates an equation that we can think of as something like a recipe for making a cake. A regression equation would look like this:

Y (the value of the target variable) = 0.5A + 0.7B + 81

In an equation like this, you add ingredients in certain proportions, and perhaps add another number at the end (the constant). These values together then forecast the target variable. In Figure 7.1, the straight line represents the regression’s best estimate of the values of the target variable, based on values of one predictor. The dots show the actual values of the target variable (shown as distances across) versus the predictor variable (shown as distances up and down).

Regression-based methods have had a long, distinguished history of working well in many settings. They are among the most venerable and most tested of all analytical methods. Still, at times regressions perform indifferently, poorly or even not at all.

Regular regressions do not work with target variables that are categorical (such as predicting the group to which a person belongs, or in which region a person lives). You also must break down categorical predictors into sets of yes/no variables. Most commonly this is done via a process called dummy coding.

We did not talk much about the methods underlying discrete choice modelling – these are complex and not critical to understanding the output. However, they are based on relatives of regression that can use a categorical target variable. Specifically, these are variants of multinomial logit or the more inclusive mother logit.
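For readers who find code clearer than recipes, the example equation and the idea of dummy coding can be sketched as follows. The coefficients come from the equation in the text; the function names, the input values and the town categories are hypothetical, and the choice of the first category as the reference level is one common convention rather than the only one.

```python
def predict_target(a, b):
    """Apply the example equation: Y = 0.5A + 0.7B + 81."""
    return 0.5 * a + 0.7 * b + 81


def dummy_code(value, categories):
    """Turn one categorical value into yes/no (1/0) variables.

    The first category acts as the reference level and gets no column,
    so a predictor with k categories becomes k - 1 dummy variables.
    """
    return [1 if value == category else 0 for category in categories[1:]]


# Hypothetical inputs.
y_hat = predict_target(a=10, b=20)
town_types = ["city", "suburbs", "rural"]
coded = dummy_code("suburbs", town_types)
print(y_hat, coded)
```

Each dummy variable then enters the regression as its own ingredient with its own coefficient, which is how a straight-line method copes with a predictor that has no natural numeric order.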

Regressions are great for experiments

Regression-related methods are a perfect match for an experiment. That is, you have all the variables that you want to measure and none extra. Regression expects that. Also, in an experiment, all the predictor variables have zero or near-zero correlations. This allows the regression to measure each variable’s effects precisely. With this careful set-up, you really can measure the effect of a variable if all other variables remain constant.

Some trouble in the real world

However, regressions may run into trouble with messier, real-world data. When predictor variables are related to each other, the coefficients of the variables can shift or even reverse signs (going from positive to negative or vice versa). Variables that are too closely related can get squeezed out of the final model altogether.

Regression will not alert you to situations in which variables interact, or combine to produce a result different from what we would expect from each variable separately. You may recall that in Chapter 6 we saw that large families living in the suburbs bought more Kardboard Krunchies than we would expect either from large families overall or from those living in the suburbs overall. That was an interaction. While it appeared in a classification tree, it would not emerge spontaneously from a regression.

Reviewing trees

Classification trees take a radically different approach from regressions, allowing them to tease out different patterns in the data. All tree methods handle categorical, ordinal and continuous variables equally well. They can break apart continuous or large categorical variables in the best way to lead to a strong prediction. This is called optimal recoding.

Tree methods even can handle missing values in data as another type of response, as we saw in Chapter 6. They do all this in addition to their unparalleled ability to show how variables interact.

As a reminder, classification trees work by splitting the sample into contrasting groups and resplitting those groups again and again, seeking to find small subgroups that differ as much as possible in levels of some target variable. (This target could be, for instance, preference ratings or levels of use.) We can see a single split in a tree in Figure 7.2. This comes from the first example in Chapter 6.

FIGURE 7.2 A split from a classification tree

Classification trees do not generate equations. Rather, they produce a set of simple if-then rules. You might say, for instance:

IF type of town = suburbs AND number of kids = five or more THEN likelihood of buying Kardboard Krunchies = 27.5 per cent
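A rule like the one above translates almost directly into code, which is part of what makes tree models easy to deploy. The sketch below is illustrative only: the field names `town_type` and `num_kids` are hypothetical, and only the single rule quoted in the text is included, so records that match no rule simply return `None`.

```python
def classify(record, rules):
    """Return the outcome of the first rule whose conditions all hold."""
    for conditions, outcome in rules:
        if all(test(record[field]) for field, test in conditions.items()):
            return outcome
    return None  # no listed rule applies to this record


# One rule, taken from the text; field names are assumptions.
rules = [
    (
        {
            "town_type": lambda value: value == "suburbs",
            "num_kids": lambda value: value >= 5,
        },
        0.275,  # likelihood of buying Kardboard Krunchies
    ),
]

suburban_family = {"town_type": "suburbs", "num_kids": 6}
city_family = {"town_type": "city", "num_kids": 6}
print(classify(suburban_family, rules), classify(city_family, rules))
```

A complete model would list one rule per terminal node, so that every record matches exactly one rule.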

Now, on to networks

You may well be wondering what kind of predictive model could differ from both this type of splitting routine and from regressions. And so we come to Bayes Nets.

The first clue to the nature of this method comes in the term network. All the variables connect and all influence the target and each other. A network can be as simple as two variables that relate to each other, or it can contain thousands of variables, as in research into how genes interact, for instance.

Soon we will get to a small (but still mind-bending) example showing how remarkably these networks perform. First, though, we will pause to reassure you that they are very solid methods, with extensive use in airplane guidance systems, public safety, running nuclear power plants, cancer research, the genetic research we mentioned just above – and even national defence.

We have seen an example online showing how a Bayes Net is used to determine whether to launch a surface-to-air missile. We actually can say that these networks have been battle-tested. The social sciences and marketing sciences have been a little late to the party. Catching up now seems like the right thing to do.

What do these networks actually do?

These networks can solve a vast and even bewildering array of problems. Applications range from brainstorming to highly sophisticated modelling and forecasting systems. Here are some uses:

automatically finding meaningful patterns among variables;
getting accurate measures of variables’ strengths;
screening large numbers of variables quickly, for data mining;
developing models of cause and effect (in the right circumstances);
incorporating expert judgement into data-driven models;
solving problems in conditional probability.

We will show several examples of networks in action following our small introductory model. One example will link survey responses to actual marketplace behaviour with high predictive accuracy. Another will support and extend a theoretical model. A third will show how accurate readings of variables’ importances helped to guide decisions and strategy.

What makes a Bayesian network Bayesian?

Everything Bayesian refers back to the work of Reverend Thomas Bayes, who lived an apparently quiet life in Tunbridge Wells, England, in the 18th century. He published two books in the 1730s, but never anything he called ‘Bayes Theorem’.

Bayes’s formulation itself is simple. Any reasonably literate person can easily understand it in its entirety, once we step aside from the nearly blinding formula often used to represent it. Starting from Bayes’s straightforward assertion and arriving at many of the types of analyses that bear his name likely would have caused the good reverend to take on a strange hue. This perhaps is the inevitable price of progress.

We can formulate Bayes’s idea in a variety of ways. Let’s start with this more practical formulation:

We start with prior (existing) information or beliefs that we can then update or modify by using information that we get from data we observe. This updating gives us a new and more accurate posterior estimate. From this posterior estimate, we draw conclusions.

That’s really all there is to it. However, it is usual to encounter this headache-inducing representation:

P(B_i | A) = P(A | B_i) P(B_i) / Σ_j P(A | B_j) P(B_j)

Most of us likely would prefer the simple descriptive paragraph.
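For those who prefer neither the paragraph nor the formula, the same updating can be written as a few lines of Python. This is a sketch with hypothetical numbers: the prior says 10 per cent of people are buyers, and the observed evidence (say, responding to a mailing) is assumed to occur for 60 per cent of buyers and 10 per cent of non-buyers. The function and variable names are our own.

```python
def posterior(prior, likelihood):
    """Update prior beliefs with observed evidence.

    prior:      {hypothesis: P(B_i)}
    likelihood: {hypothesis: P(A | B_i)} for the evidence A that was observed
    returns:    {hypothesis: P(B_i | A)}
    """
    joint = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(joint.values())  # the sum in the denominator of the formula
    return {h: joint[h] / evidence for h in joint}


# Hypothetical figures, for illustration only.
prior = {"buyer": 0.10, "non_buyer": 0.90}
likelihood = {"buyer": 0.60, "non_buyer": 0.10}
updated = posterior(prior, likelihood)
print(updated)
```

With these assumed figures, seeing the evidence raises the estimated chance that someone is a buyer from the 10 per cent prior to a noticeably higher posterior, which is exactly the prior-to-posterior step the descriptive paragraph lays out.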

The ground rules for networks

Diagrams of variables are key. Bayesian networks are graphically based methods. As they are based on graph theory and on probability theory, they fall under the heading of graphical analytical methods. Grasping their workings fully requires both a diagram and the calculations that underlie it.

Bayes Nets in fact are called directed acyclic graphs (DAGs) because all the variables connect, and all must point somewhere. None can point back to itself, or form a cyclic structure.

A Bayes Net may look familiar if you are one of the lucky few having experience with structural equation models (SEMs) or with partial least squares (PLS) path models. Variables are connected with arrows, as in those types of models. You can see pathways among the variables, and finally these lead to a target variable.

Arrows matter, but not as you might think

The arrows have a specific meaning in Bayes Net diagrams; however, this is not entirely intuitive. We can say that a variable at the start of an arrow leads to another variable, or explains another variable. In highly specific conditions we even can say that the starting variable causes the variable at the end. However, if we change the variable at the end of an arrow, the variable at the starting point will change as well. So effects run in both directions in a network.

Dealing with the data we typically encounter, connected variables in a network almost always have an equal chance of being the cause and being the effect. Strong influences go both ways. It is very rare indeed that we can prove one variable in fact causes another, dealing with the messiness inherent in behaviour, opinions and beliefs.

Terms and phrases: it’s all in the family

There is, of course, some terminology to learn. Fortunately, this part largely goes down easily, being (for statistics) warm and fuzzy. Some of the relationships are (shown in Figure 7.3):

The variable at the start of an arrow is called a parent.
The variable at the end is called a child of the parent.
Children can have several parents and parents can have several children.
If there are two or more parents, they are called spouses.
A parent of a parent is a grandparent, and so on.
Variables are dependent only if they are directly connected:
– Children and parents are dependent on each other.
– Children are independent of grandparents and other variables further away, in the sense that once the parents’ values are known, a grandparent adds no further information about the child.

FIGURE 7.3 Relationships in networks

Whether variables are dependent on each other becomes important when screening variables for inclusion in a model. One powerful screening technique involves including only those variables that are dependent on the target variable (its parents and children) and any other co-parents of the children. This quickly eliminates less important variables where there are many – as in data-mining applications. This set of variables has a name also: the Markov blanket.
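The screening rule just described is easy to express once a network is stored as a list of parents for each variable. The sketch below is a minimal illustration: the small network and its variable names are hypothetical, and the function name is our own rather than part of any Bayes Net package.

```python
def markov_blanket(parents_of, target):
    """Parents, children and the children's other parents of `target`.

    parents_of maps each variable to the set of variables with an
    arrow pointing into it.
    """
    parents = set(parents_of.get(target, ()))
    children = {v for v, ps in parents_of.items() if target in ps}
    co_parents = {p for child in children for p in parents_of[child]} - {target}
    return parents | children | co_parents


# Hypothetical network: income -> loyalty -> purchase -> referral,
# price -> purchase, and age connected only to income.
parents_of = {
    "age": set(),
    "income": {"age"},
    "price": set(),
    "loyalty": {"income"},
    "purchase": {"loyalty", "price"},
    "referral": {"purchase"},
}
blanket = markov_blanket(parents_of, "loyalty")
print(sorted(blanket))
```

Here the grandparent (`age`) and the grandchild (`referral`) fall outside the blanket for `loyalty`, while `price` is kept because it shares a child with the target.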

Everything is connected

Changes move through the whole network. Understanding this is critical. In whichever way the arrows between variables point, all variables in a network change when one changes. Networks convey information across all the connected variables.

Network construction ranges from simple to complex

When we are attempting to understand relationships among variables, the way we fit the network together is of prime importance. Networks can learn structures from the data, building themselves automatically. This is something akin to a classification tree constructing itself on autopilot. However, networks are far less prone to taking on different shapes based on small changes in the data.

There are many ways that you can choose for a network to assemble itself. The simplest methods fit all the variables directly to the target variable. This is very much like a standard regression – all the predictor variables are put into the mix and each one connects only to the target variable, and not to another predictor.

At their most complex, networks have many branches and result from countless attempts to develop a best model. They test and retest how variables best fit together to predict the values in the target. These methods ensure that the network does not seize upon a connection that is good ‘locally’ (where a variable is being added) but not good for the overall network.

You also can put a network together yourself. If you do not violate any basic construction rules, it will return answers about variables’ effects based on the way you have assembled everything. There are various intermediary strategies as well, such as letting the network form an initial shape and then modifying it based on your understanding of the questions you need to answer.

About conditional probability

We need to explain conditional probability. This idea appears in many discussions of Bayes Nets. A probability that is conditional is no more than what we just described: it is an estimate of probability that takes into account some information from an earlier estimate or item of information.

That is, we understand how one or more variables will change based on what is happening to another variable or variables. In the simplest terms, this means all changes in variables consider all values in all other variables.

The workings of conditional probability may be difficult to grasp, but let’s try in the small example we have been promising. This is called the three-door let’s-make-a-deal or Monty Hall problem. The answer is surprising!

Let’s make a deal

Some may have seen the long-running television show Let’s Make a Deal, and some may recall its inimitable former host, Monty Hall. Although he went off the air many years ago, shows with him can still be found by the curious, lurking in dark crannies of the web.

The show poses a challenge to contestants. They need to pick which of three doors has a prize behind it – with an added wrinkle. The contestant first picks whichever of the doors feels right to her/him. The host never opens that door to reveal whether it has the prize, though. Rather, he opens another door that does not have the prize behind it. Then he asks, ‘Will you stay with your door or will you switch to the other unopened door?’

Now we ask you to consider this: if you were advising the contestant, would you tell her/him to switch or to stay? Alternatively, do you think it makes any difference? Nearly everyone says there should be no difference. And nearly everyone is wrong.

In fact, this very problem and its solution appeared in a magazine column by Marilyn vos Savant, billed as the world’s smartest person. Whether Ms vos Savant is in fact smartest, she got this answer right. You are better off switching, by a factor of 2 to 1.

This correct answer seems somehow impossible to nearly everyone. Some 10,000 people wrote to Ms vos Savant, saying she was mistaken. About 1,000 of them had PhD degrees. Let’s now vindicate her.

Solving the problem is simple using a Bayes Net

You can struggle with standard statistics to get the right answer. You can, for instance, find code on the web for using standard statistics to run 10,000 simulations to address this question. Your answer will be nearly correct. A basic Bayes Net taking about three minutes to construct can solve this problem exactly.

Recall that we can make a network ourselves by linking variables. Here, we will form a tiny network by joining three events: the door you choose, the door that is opened and the winning door.

The simplest representation of a Bayes Net looks like connected shapes. This is what appears in Figure 7.4. We know that the winning door can be any of the three, and the door you choose can be any of the three. However, the door that the sneaky game-show host opens depends on both the door you have chosen and the winning door.

FIGURE 7.4 The network for the three-door problem

You can see this diagrammed in Figure 7.4. There, an arrow goes from the door you choose to the door opened. Another arrow goes from the winning door to the door opened. The direction of those arrows matters this time, because the door opened depends on, or is conditional upon, the other two choices.

Each of these three spots is called a node. We gain explanatory power by seeing what is inside them. Each node actually holds a table corresponding to what is happening with the three doors.

First we set up the node showing the odds of your choosing each door. This is one-third for each door. Then we set up the node showing odds of each door winning. That is also one-third apiece. Here at last we encounter something that is simplicity itself, as we can see in Figure 7.5.

FIGURE7.5Insidethefirsttwonodesofthenetwork

Now we set up the last and most complicated of the nodes. Once we have made the first two nodes and connected them to the third, the table describing this node automatically appears in the software, ready to fill. As we mentioned, in Bayes Nets, seeing what you are doing is key.

Because this table is more complicated, we will take it in parts, starting in Figure 7.6. This shows what happens if you choose Door 1. Above letter A, you see that Door 1 NEVER gets opened if you pick it – it doesn’t matter which door is the winner. This is a basic rule of the game.

FIGURE 7.6 Part of the third node – when you choose Door 1

If you pick Door 1 and it is the winner, you can see (on the first row of the table in Figure 7.6) that the chances of opening Door 2 or Door 3 are 50 per cent each. That row is next to letter B.

So far so good. Now, if you choose Door 1 and the winning door is 2, the host must open Door 3 or reveal the prize. So here the odds for the other doors are not the same at all – in fact, they are zero versus 100 per cent. You can see this on the row near letter C.

Finally, very much the same condition holds if you chose Door 1 and the winner is 3. The host must open Door 2. The probability that he or she will do so is 100 per cent. We see this above letter D.

Once we have filled in the correct values for Door 1, it becomes relatively simple to fill in the correct values for Door 2 and Door 3. The entire table for the third node appears in Figure 7.7. We have added some heavily outlined boxes to separate the sections of the table corresponding to each door you might have chosen. Otherwise, all we did was fill in the correct numbers. Again, the network created the blank table itself once we connected the three nodes together.

FIGURE 7.7 All of the third node

NOTE Outlines are added here for clarity of reading
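For readers who prefer to see the table built from its rule rather than typed in cell by cell, the following Python sketch generates the third node's table. It assumes only the rules stated above: the host never opens the door you chose or the winning door, and picks evenly when two doors are available. The names `DOORS`, `host_row` and `cpt` are our own, not anything from Bayes Net software.

```python
from fractions import Fraction

DOORS = (1, 2, 3)

def host_row(chosen, winning):
    """Return {door: probability the host opens it} for one table row."""
    # The host may open any door that is neither chosen nor the winner.
    allowed = [d for d in DOORS if d != chosen and d != winning]
    return {d: (Fraction(1, len(allowed)) if d in allowed else Fraction(0))
            for d in DOORS}

# Full table for the third node: one row per (chosen, winning) pair.
cpt = {(c, w): host_row(c, w) for c in DOORS for w in DOORS}

for (c, w), row in cpt.items():
    cells = ", ".join(f"open {d}: {float(p):.0%}" for d, p in row.items())
    print(f"chose {c}, winner {w} -> {cells}")
```

Each printed row corresponds to one row of the table, and every row adds up to 100 per cent.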

Now for the surprising answer

Maybe this is a good time to take a deep breath. As you can see in Figure 7.8, you should switch doors.

FIGURE 7.8 How likely the prize is to hide behind each door

We have included four possible outcomes, starting with you choosing either Door 1 or Door 2. These results should lead all (or nearly all) readers to believe that precisely the same thing will happen if you start by choosing Door 3.

We made the network run each time by simply clicking twice. First we clicked on the door you chose, which went to 100 per cent. Then we clicked on the door the host opened, which also went to 100 per cent.

Looking under letter A, we see what happens if you start by choosing Door 1 and the host opens Door 2. The odds appear automatically in the winning door node. The chances of the prize being behind your door are actually 33.3 per cent. For the other door, Door 3, the odds are 66.7 per cent.

Under letter B, we see what happens if you start with Door 1 and the host opens Door 3. The odds for Door 2 then are 66.7 per cent and for your door 33.3 per cent.

Just to confirm that this is not a fluke related solely to Door 1, we show what would happen if you started with Door 2. Over letter C, you see the result if the host follows by opening Door 1. Odds for the other door are again 66.7 per cent. And finally, if the host opens Door 3, the odds for the door you did not pick are again 66.7 per cent.

Again, this also underlines a key property of networks. If you change a variable, any variable connected to it changes. We see that ‘door opened’ has an arrow pointing to it from ‘winning door’. Yet when we change ‘door opened’, the variable leading to it also changes.
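The four results described for Figure 7.8 can be checked with a few lines of Python that apply Bayes' rule directly. This is a sketch of the calculation, not a reproduction of what any Bayes Net program does internally. The function names `p_open` and `posterior` are our own, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

DOORS = (1, 2, 3)
THIRD = Fraction(1, 3)

def p_open(opened, chosen, winning):
    """P(host opens `opened` | you chose `chosen`, prize behind `winning`)."""
    allowed = [d for d in DOORS if d != chosen and d != winning]
    return Fraction(1, len(allowed)) if opened in allowed else Fraction(0)

def posterior(chosen, opened):
    """P(winning door | chosen, opened), by Bayes' rule over the three doors."""
    # The prior on the winning door is 1/3 each, whatever door you chose.
    joint = {w: THIRD * p_open(opened, chosen, w) for w in DOORS}
    total = sum(joint.values())
    return {w: joint[w] / total for w in DOORS}

for chosen, opened in [(1, 2), (1, 3), (2, 1), (2, 3)]:
    post = posterior(chosen, opened)
    print(f"chose {chosen}, host opened {opened}:",
          {w: f"{float(p):.1%}" for w, p in post.items()})
```

In every case the door you chose keeps a one-in-three chance, the opened door drops to zero, and the remaining door carries two-thirds.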

Why is the correct answer so different from what we expect?

This happens because what the host does actually depends on, or is conditional upon, your choice and the winning door. We do not think in terms of this kind of conditional probability.

Here we come upon an issue with Bayesian networks. We have just neatly and simply unravelled a problem that likely eluded most of us. And yet, the answer seems strange until it is explained – and perhaps even afterwards.

As an expert on this subject (Eliezer Yudkowsky) points out, solving problems of this type poses difficulties for novice students and trained professionals alike. We have an approach that is truly powerful and that easily reaches answers we cannot guess intuitively. This method learns patterns in the data that we cannot see or guess. With that in mind, let’s discuss in more detail how Bayes Nets work.

More about networks compared with regressions

Bayes Nets look at the whole pattern of scores or responses in the variables they analyse. For instance, comparing two variables that we will call A and N, this method creates a chart, or matrix, that shows how the scores relate. We can see this in Figure 7.9. In this chart, each box represents how often scores coincide. At the top right, for instance, we see that 33 people gave a score of 10 on variable A and 10 on variable N, that 44 gave a score of 9 on variable A and 10 on variable N, and so on. The darkened boxes are where the scores on the two variables align most often.

FIGURE 7.9 S-shaped relationship found by a Bayes Net, but not a regression

Bayes Nets will pick up the basically S-shaped relationship that you can see in those highlighted boxes in Figure 7.9. Regression, on the other hand, seeks how well the relationship between the two variables falls into a straight line, which we have superimposed. You can see that this is not accurate. For instance, it does not fit a higher score on variable A than 8 out of 10.

Networks take a more comprehensive view

The straight line is key, as regressions rely on correlations. A correlation is a single number describing how well pairs of variables conform to a straight line. As a reminder, correlation can range from 1, where the two variables change together in a perfect straight-line pattern, down to –1, where the two variables have a perfect inverse relationship (one rises exactly as the other falls in value).

Correlations provide a less detailed understanding of how variables relate than the whole pattern captured by Bayes Nets. And many times the Bayes Net will also provide a more accurate model.
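To make the contrast concrete, here is a small Python sketch using invented data, not the data behind Figure 7.9. The S-shaped link in `s_curve`, the 20 respondents per score, and all function names are assumptions for illustration. It builds the kind of score-by-score table described above, then computes the single correlation number and the straight line that a regression would rely on.

```python
import math
from collections import Counter

def s_curve(n):
    """Hypothetical S-shaped link from a score on N (1-10) to a score on A."""
    return round(1 + 9 / (1 + math.exp(-1.2 * (n - 5.5))))

# Hypothetical respondents: 20 people at each score on N.
pairs = [(n, s_curve(n)) for n in range(1, 11) for _ in range(20)]

# The whole pattern: how often each pair of scores coincides.
table = Counter(pairs)

def pearson(points):
    """Pearson correlation between the two scores."""
    xs, ys = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in points)
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def line_fit(points):
    """Ordinary least-squares slope and intercept for A on N."""
    xs, ys = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

r = pearson(pairs)
slope, intercept = line_fit(pairs)
line_error = sum(abs(a - (intercept + slope * n)) for n, a in pairs) / len(pairs)
print(f"correlation {r:.2f}; the straight line misses by "
      f"{line_error:.2f} points on average")
```

In this invented data the table pins down the score on A exactly for every score on N, while the straight line is systematically off at the flat ends and the steep middle of the S. The correlation is high but below 1, which is all that one number can tell us.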

Bayes Nets versus classification trees

Classification trees take a step towards looking at the whole distributions of variables. But they do this in a different and more restricted way than Bayes Nets. Classification trees seek to split apart the scores in the target variable based on scores in the predictors. One way to express this is that classification trees look for situational relationships, as in saying, ‘when variable N is like this, THEN variable A is like that’.

The splitting done by classification trees leads to relatively few variables getting chosen as predictors. As we discussed in Chapter 6, the tree forms subgroups that become smaller at each step. Finally, the groups become too small to split further. This compactness has advantages and disadvantages. Variables that you might want to consider could get left out of the model.

Differences in dynamic predictions: networks take the more realistic view

We have said this before, but it is worth underlining: in networks, when one variable changes, all the other variables change as well. Any given variable has the most influence on variables with which it is closely connected. Still, effects from changes travel through the network like ripples going across a pond. Another, more formal, way to say this is that information propagates across the network.

The way networks account for changes differs strongly from the assumptions of regression-based models. Measuring any variable’s effect in a regression assumes that all other variables remain the same. This is a particularly unrealistic view when dealing with data such as opinions, beliefs or purchasing patterns, as these all tend to be highly interrelated.

Indeed, some authorities on regression (for instance Leland Wilkinson) bemoan the tendency of regression-based models to fail in real-world applications, in particular setting policy. In the real world, when we change one factor, many other factors change along with it – leading to the downfall of some precise-looking forecasts that regression delivers.

Realism in Bayes Nets

Bayes Nets gain in realism by taking into account how every variable must change if any one changes. This sometimes is referred to as information propagating through a network. Networks also take into account the entire patterns or distributions of all variables. This is much more realistic than regression. There, effects from changing one variable assume all other variables remain the same. And in regression all relationships are based on correlations, a one-measure summary of how well variables fit a straight line. Classification trees see different patterns, but all relationships are restricted to certain values of each variable, as in ‘when variable N is like this, THEN variable A is like that’. Bayes Nets look at the entire pattern in each variable.

Drivers, causes and what works

Bayes Nets ultimately lead us to rethink what happens when we model the effects of variables. Many of us have become accustomed to talking about the independent variables as ‘predicting’ or ‘driving’ the dependent. It seems doubtful that this preferred terminology will ever change. Still, in a regression, the independent or predictor variables actually explain some part of the target variable’s behaviour.

This may seem puzzling (this is becoming a refrain), but each independent variable accounts for (or explains) some of the variance or pattern in the dependent variable. You can see this clearly when you think again about a regression equation.

In the equation, you multiply the score in each variable by some amount, add up everything (and maybe add some unchanging number) – and you have the predicted score for the target variable. Each predictor variable actually is a part of the target’s score.

Nonetheless, the phrase ‘driver analysis’ likely is with us for the long term. This is common but not learned usage. ‘Driver analysis’ apparently is not covered in texts on statistics. With Bayes Nets, you may encounter more accurate terminology for what happens to a target variable when you change another variable. This is sensitivity of the target to changes in another variable. If you see that, and it seems confusing, just think of drivers.
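Written out, the regression equation described in words above takes the familiar form below. The symbols are our own labels: the predicted target score is written as y-hat, the scores on the k predictor variables as x, the amount each score is multiplied by as b with the matching subscript, and the ‘unchanging number’ as b with subscript 0.

```latex
\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k
```

Seen this way, each term is literally one additive piece of the predicted score, which is the sense in which a predictor ‘explains’ part of the target.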

Our first example: Bayes Nets linking survey questions and behaviour

We will start with a relatively simple example of Bayes Nets showing a predictive relationship between responses to survey questions and behaviour. In this example, Bayes Nets address an important set of questions:

what the relative effects are of several variables on the dependent variable;
how strongly changing each variable affects the dependent;
the ways these variables relate to each other and to the dependent.

The predictor variables were heavily weeded in this example. As mentioned, Bayes Nets can cut out variables having little direct relationship to the target variable. They can, for instance, restrict the predictors to those in the Markov blanket that we mentioned earlier. (Mr Markov and his blanket could be the subject of another whole discussion, but we will have to skip this here.)
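For the curious, the Markov blanket of a variable is, in the standard definition, its parents, its children and the other parents of its children. The Python sketch below picks it out of a small made-up network. The variable names and the arrows are hypothetical and are not those of the study discussed next; the function name `markov_blanket` is our own.

```python
def markov_blanket(parents, node):
    """Return the Markov blanket of `node` in a directed network.

    `parents` maps each variable to the set of variables with arrows into it.
    """
    children = {v for v, ps in parents.items() if node in ps}
    co_parents = {p for c in children for p in parents[c]}
    return (set(parents.get(node, ())) | children | co_parents) - {node}

# Hypothetical network: arrows run from each listed parent to the key.
parents = {
    "share": {"loyalty", "price"},
    "loyalty": {"service"},
    "price": set(),
    "service": {"staffing"},
    "staffing": set(),
    "renewal": {"share", "contract length"},
    "contract length": set(),
}

print(sorted(markov_blanket(parents, "share")))
```

In this toy network, service and staffing fall outside the blanket of share. They still matter, but only through loyalty, so a model restricted to the blanket would leave them out.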

Now on to the actual network

This example comes from a survey done among professionals using an information technology (or IT) product. It involved a very long questionnaire. We will leap past the variable screening phase, to the point where we have just seven key variables. Figure 7.10 shows how these variables relate to each other and to the target variable, which is the percentage of services each corporate customer signed up to use again.

FIGURE 7.10 A simple network linking share and ratings

Before we describe how the variables relate to each other, please recall that this network assembled itself. That is, the Bayes Net routine sought out the strongest patterns of relationships in the data, then created a diagram of how the variables relate. This arrangement definitely passes the test of common sense.

Starting at the top in Figure 7.10, you can see that all predictor variables other than ‘gives me a competitive advantage’ ultimately converge on that one. This means that all the other variables explain something about the way in which this company is seen as giving such an advantage.

This variable is directly explained by ‘help in generating leads’. Three variables link directly to this latter variable. These are ratings for ‘sales support’, ‘pricing’ and ‘effective communications’.

Additionally, ‘value of web services’ supports ‘effective communications’. In turn, ‘value of web services’ is supported by ‘ease of finding web services’.

The network configuration, as a reminder, also will determine the importances of the variables and sensitivity of the dependent variable to changes in the independents. Correct prediction levels, which we discuss soon, are high. So we can have confidence that we have an accurate picture. The sensible-seeming arrangement of the network gives us more assurance that effects are measured accurately.

Figure 7.11 shows the absolute and indexed sizes of effects from changing each predictor variable. This absolute effects chart shows, for instance, that changing ‘gives me a competitive advantage’ has 45 per cent of the effect possible if we could somehow directly change the target variable.

FIGURE 7.11 Effects relative to the target and indexed versus the average

These absolute effects can be hard to compare, so the other chart in Figure 7.11 shows them indexed. The average index is set to 100. This chart shows, for instance, that ‘gives me a competitive advantage’ has 1.94 times the effect of each variable on average (seen in its index of 194).

The unindexed values are similar to the Beta values or coefficients from a regression. However, networks develop a fuller understanding of effects. Changes in a predictor variable actually can have varying levels of impact for different values of the target variable. We could make another chart allowing us to see this.

While we will skip this second chart in this short example, it would indeed show that effects on the target variable vary, depending on the values of the predictor variable and the target variable. That is, over some ranges of the values of each, the target variable changes more. This is another level of information beyond what we can see in a simple regression coefficient.

There is still more that we can learn. Inside the more comprehensive programs creating Bayes Nets, the network diagram itself is dynamic. We can change a variable and observe how every other variable changes as a result. Some programs also allow you to make stand-alone simulator programs. These are similar to the simulators for discrete choice modelling and conjoint analysis discussed in Chapters 4 and 5.
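The indexing used for the second chart in Figure 7.11 is simple arithmetic: divide each absolute effect by the average effect and multiply by 100. The sketch below shows the calculation in Python on made-up numbers. The four variables and their effects are hypothetical and are not the values from the study.

```python
# Hypothetical absolute effects (share of the maximum possible effect).
absolute_effects = {
    "variable A": 0.40,
    "variable B": 0.25,
    "variable C": 0.10,
    "variable D": 0.05,
}

mean_effect = sum(absolute_effects.values()) / len(absolute_effects)

# Index each effect so that the average variable scores 100.
indexed = {name: 100 * effect / mean_effect
           for name, effect in absolute_effects.items()}

for name, value in indexed.items():
    print(f"{name}: index {value:.0f}")
```

An index of 200 here reads the same way as the 194 in the text: that variable has about twice the effect of the average variable.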

Information moving across the network

Variables that are closely connected (such as parent nodes and children nodes) have strong influences on each other. The further the variables are from each other in the diagram, the less impact they tend to have on each other. However, all changes propagate through the whole network.

As a reminder, this whole-network understanding of the data makes any prediction of real-world effects much more realistic than the estimates from a regression-based model. In regressions, we must assume that all other variables remain the same if we change any one variable.
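A three-variable chain is enough to see this propagation at work. The Python sketch below uses an invented network, A → B → C, with made-up probabilities. It is a brute-force calculation over all combinations, which is practical only for tiny networks and is not how Bayes Net software does it. The names `joint` and `prob` are our own.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical chain A -> B -> C of yes/no (1/0) variables.
p_a = {1: F(1, 2), 0: F(1, 2)}
p_b_given_a = {1: F(9, 10), 0: F(1, 5)}   # P(B=1 | A)
p_c_given_b = {1: F(4, 5), 0: F(3, 10)}   # P(C=1 | B)

def joint(a, b, c):
    """Probability of one complete combination of A, B and C."""
    pb = p_b_given_a[a] if b == 1 else 1 - p_b_given_a[a]
    pc = p_c_given_b[b] if c == 1 else 1 - p_c_given_b[b]
    return p_a[a] * pb * pc

def prob(query, evidence):
    """P(query | evidence); both are dicts such as {"C": 1}."""
    def total(fixed):
        return sum(joint(a, b, c)
                   for a, b, c in product((0, 1), repeat=3)
                   if all({"A": a, "B": b, "C": c}[k] == v
                          for k, v in fixed.items()))
    return total({**query, **evidence}) / total(evidence)

# Setting A shifts its neighbour B strongly and the more distant C less so.
shift_b = prob({"B": 1}, {"A": 1}) - prob({"B": 1}, {"A": 0})
shift_c = prob({"C": 1}, {"A": 1}) - prob({"C": 1}, {"A": 0})
# Evidence also travels against the arrows: observing C changes belief in A.
a_given_c = prob({"A": 1}, {"C": 1})
print(shift_b, shift_c, a_given_c)
```

With these made-up numbers, switching A moves B by 70 percentage points and C by 35, and learning that C is 1 lifts the belief in A from one-half to 15/23, roughly 65 per cent.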

How the model performed

Correct prediction levels were very strong indeed for fitting questionnaire questions to behaviour. The level of correct prediction for the target variable (percentage of business signed for) was 84 per cent. This was using a stringent form of testing or validation of results, called cross-folded validation (more on validation follows). Without any validation, predictions were a stellar 93 per cent correct.

Those of us who have tried to fit questionnaire questions to actual behaviour know that this almost invariably has poor results when using regression-based models. Networks do not always do as well as this one. Still, they have usually outperformed regression-based models on overall measures of model fit with a behavioural target variable, such as actual use levels or purchases.

In this case, the best regression-based model emerged from a very complex variant of regression called a partial least squares (PLS) path model. This created intermediary variables that grouped items, and even so turned in a paltry 11 per cent correct prediction of share.

What precisely is validation?

We discussed validation in Chapter 2, but a refresher could be useful. Validation is not a new idea, but the notion that you should use it regularly has gained support as data sets have become larger and models have grown more complex.

With validation, you build the model on part of the data, holding aside the rest of it. You then try out the model on the portion of the data that you held aside. When you try the model on this hold-out sample, predictive accuracy usually comes in at a lower level than when you simply look at how the model performed where it was made.

Even the best predictive modelling technique will fit some random bumps and fluctuations that are found only in the data set on which a model was built. Trying out the model elsewhere, even on another part of the same data that you set aside, gives you some safeguards against overfitting. Overfitting builds a model using seeming patterns in your data that you will not find in the outside world.

Model validation

This is holding aside some of the data when you build a predictive model, then trying out the model on that portion of the data. This aims to give you a better reading of how well the model will perform in the outside world, when it is actually applied. It helps avoid overfitting your predictive model to features that are peculiar to just your data set.
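The hold-out idea can be sketched in a few lines of Python. The example below uses one common scheme, k-fold cross-validation, in which each of k slices of the data takes a turn as the hold-out sample. We are assuming this is the general family that ‘cross-folded validation’ belongs to, not claiming it is the exact procedure used in the study. The data, the deliberately silly ‘memorizing’ model and all function names are invented for illustration.

```python
import random

def k_fold_indices(n_cases, k, seed=0):
    """Shuffle case numbers 0..n_cases-1 and deal them into k folds."""
    order = list(range(n_cases))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]

def cross_validate(cases, fit, score, k=5):
    """Average hold-out score over k folds.

    `fit(train)` must return a model; `score(model, test)` a number.
    """
    results = []
    for fold in k_fold_indices(len(cases), k):
        held = set(fold)
        train = [c for i, c in enumerate(cases) if i not in held]
        test = [cases[i] for i in fold]
        results.append(score(fit(train), test))
    return sum(results) / k

# Toy illustration: a "model" that memorizes its training cases looks
# perfect where it was built and ordinary on data it has not seen.
rng = random.Random(1)
cases = [(i, rng.choice("yn")) for i in range(500)]   # (case id, random label)

def fit(train):
    return dict(train)                                # memorize id -> label

def score(model, test):
    # Unseen cases fall back to a fixed guess of "y".
    hits = sum(model.get(case_id, "y") == label for case_id, label in test)
    return hits / len(test)

in_sample = score(fit(cases), cases)
cv_score = cross_validate(cases, fit, score)
print(f"in-sample {in_sample:.0%}, cross-validated {cv_score:.0%}")
```

Because the labels here are pure noise, the perfect in-sample score is overfitting in its purest form, and the cross-validated figure of roughly one-half is the honest reading of how the ‘model’ would fare on new cases.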

Why is it good to validate? As mentioned in Chapter 2, larger data sets often lead to many effects seeming to be meaningful, simply because you have so much data. Statistical tests start to break down, because with enough data everything seems significant.

When you have a massive data set, it is entirely feasible to put some of it to one side and have ample amounts left for building a complex model. So validation often is a prudent step in assessing how well predictive models actually will perform.

Bayes Nets confirm a theoretical model, mostly

Many theoretical models postulate that a person must process information in certain ways, or that certain psychological changes must take place, before that person takes action. We know now that action can arise without much thought – and many more cynical observers always suspected as much.

However, models postulating complex pathways to behaviour still have enormous influence. They could be called stepwise, or more formally hierarchical. We can find such models going back at least to the end of the 19th century. The so-called AIDA model, by the gratifyingly named E St Elmo Lewis, dates back to 1898. AIDA stands for Attention, Interest, Desire, Action.

Perhaps the most influential of these hierarchies appeared in 1961, posited in a brief article by Lavidge and Steiner, called ‘A model for predictive measurements of advertising effectiveness’. They postulated six stages, building from awareness to purchase. The idea was that you needed to pass through and complete one stage to move to the next – something like climbing the rungs of a ladder. Their model appears in Figure 7.12.

FIGURE 7.12 A classic hierarchical model

Once you are alerted to the existence of hierarchical models, you will notice them in many places. The Lavidge and Steiner model itself appears in many slightly altered forms, some without attribution. Many of the measurement strategies that organizations routinely follow, such as keeping track of how many are aware of their products, reflect belief in a hierarchical model.

These models also have had great importance due to analytical limitations. For years, it was almost impossible to link behaviours with questions about awareness, liking or preference. This problem arose because regression models expect linear relationships, and many times the relationship between share or sales and ratings is not linear. We saw this in the loyalty score example in Chapter 6.

Going the next step with Bayes Nets

The model in Figure 7.13 is based on a study of commercial brokers done by a firm that provided services to them. The survey asked many questions about how well the firm performed, and how their customers perceived them and felt about them. They also asked for self-reported estimates of the study sponsor’s share of business. (This diagram is part of a much larger model with many other variables.)

FIGURE 7.13 Bayes Nets discover a hierarchy

Figure 7.13 shows how Bayes Nets extended and refined a model that was tried first with a complex form of regression. This specific type of regression is called a partial least squares (PLS) path model.

PLS path models group similar variables together, and those grouped variables then influence the dependent variable, and sometimes each other. These grouped variables (the so-called latent variables) actually are created in the analysis, then added to the ones that were directly measured.

You need to figure out what these latent variables represent and name them. Those of you who are familiar with factor analysis will recognize these latent variables as being something like the factors created by that method.

PLS path models overcome one salient problem of regular regressions, namely dealing with variables that are highly correlated. Regular regression cannot adequately handle highly correlated variables. Some of them will get squeezed out of the model, and readings of the strengths of the others typically get distorted.

The underlying theory being explored

The PLS model was based on a theoretical hierarchical model. It was built by hand, slowly and painstakingly. This model postulated that the client’s actions were part of satisfaction, while the customer’s feelings were part of relationship. Together, these were supposed to predict loyalty, which in turn would predict share.

Loyalty was conceived as a weighted average of the variables in the two groups labelled in Figure 7.13 as ‘what I feel’ and ‘what I do’. The original model finally tried to link loyalty to share. While that model did well predicting what went into its view of loyalty, it predicted share with only 18 per cent accuracy.

This was a clearly unacceptable level, but it was the best that could be managed. Recall that the PLS path model was constructed in many steps and with many stops and starts. Elements were added to and subtracted from theorized groups. Combinations that proved to be impossible had to be scrapped and tried again, and again.

Letting the data speak with a Bayes Net

Because of the poor PLS path model results, a Bayes Net was tried as an alternative. It was allowed to learn patterns in the data, and the result appears in Figure 7.13. The network shows a clear hierarchy. However, this differs markedly from two chains of variables, one related to satisfaction and one related to relationship.

The Bayes Net determined patterns in the data where effects ran in a different sequence. This is the hierarchy that appeared, going from actions to the target variable (share):

What they do (objectively) → How they treat me → How I feel → What I do → Share

The Bayes Net reached 62 per cent correct prediction, with cross-validation. The model seems entirely sensible. It supports parts of the postulated model. The various feelings proposed to make up loyalty do fall together, and they lead directly to share. The other variables support or explain this loyalty group.

However, the reality otherwise differs from the initial theory. Actions do not form a group working separately from relationship to influence loyalty, and then share. Rather, how actions are viewed by customers informs or explains their feelings, which in turn explain how they act, which in turn explains share. This is what the diagram shows. The strong prediction of share by this model argues that this is plausible.

What next?

Variables’ importances also come from this analysis. While we will skip the entire chart here, several types of specific actions emerged as important in affecting feelings and shares. Among these were several that appeared in Figure 7.13 and a few we did not have room to show. These included:

designated contact throughout;
communications when updating;
designated contact for new business;
voice menu system;
dedicated person for problem resolution.

Knowing the importance of these items could possibly provide enough information to guide decisions. Ultimately, the client decided to test the specific ways that these areas could be improved using the conjoint-based service optimization that we discussed in Chapter 5.

Here we see both the strength and the weakness of this analytical method when used with survey-based rating questions. The method is powerful enough to give a precise relationship between changes in ratings and changes in share. However, the limitations of standard survey ratings make it difficult to move from knowing which changes must be made to knowing precisely how changes must be made.

What is important to buyers of children’s apparel

A major manufacturer of children’s apparel ran large semi-annual surveys in which they measured reactions to their products and competitors’ offerings. They asked numerous questions related to awareness, shopping, use patterns, and ratings of the products in many areas, such as value for money and appropriateness for the child. They also asked shoppers to estimate what percentage of their spending went to each of many brands.

They had many thousands of answers and wanted to learn more about what specifically drove levels of purchasing. After exploring a few alternative methods without much success, including regression-based models and classification trees, they tried Bayes Nets. The results were highly informative.

The specifics of the analysis

Children’s apparel is actually not a single market. Based on the age of the child, up to 13, expectations and needs vary sharply. This manufacturer saw children as falling into these age groups:

infants (age up to two);
toddlers (age 2–4);
early school age (age 4–8);
pre-teens (age 8–13).

Parents with children in each age cohort were analysed separately. There were some 9,000 cases in total, or about 2,250 in each age group. In all, some 46 variables were measured and analysed. You can see the importances of these as determined by Bayes Nets, disguised in Figure 7.14. (This disguising is a necessary evil using real data.)

FIGURE 7.14 Importances for 46 (disguised) attributes

Strong differences emerged, most markedly in the importance of the variable ‘a brand I trust’. This has exceptionally low importance for parents of infants. This perhaps should be expected, as for many parents at that time all brands are new.

Importance rose to about average among the next age cohort, the toddlers, then skyrocketed for children aged 4–8. For that group, it actually is one of the three most important attributes out of the entire long list.

However, for the pre-teens, the importance of ‘a brand I trust’ plummeted to a very low level, as low as for infants. This posed a seeming mystery, but soon patterns in other ratings explained what had happened.

Figure 7.15 shows the pattern in ratings for ‘a brand I trust’. The other chart in this figure offers an explanation of what happened. Three related concerns become highly important among parents of pre-teens: ‘a style appealing to me’, ‘a brand or style the child likes’, and ‘clothing appropriate for a child this age’.

FIGURE 7.15 Importance of trust and fashion by age of child

We see here a sad story of parent–child relationships. Among families with pre-teens, everything else gets subsumed by an argument over the question of, ‘You are thinking of going out wearing WHAT?’

Here is a critical learning for creating messages about the benefits of this manufacturer’s fine products. By all means, for parents of early-school children, stress how this is a brand that families can trust. For pre-teens, stress how parents will find the clothing acceptable (all right, not heart-stopping) and their children will not simply say, ‘I can’t even?’ (this last locution was current for ‘no way’ when the book went to press, but by the time you read this, who knows?).

The manufacturer did this analysis after a segmentation study, and so was particularly interested in what was important in affecting share of wallet for their target segment, called Savvy Sylvia. This company’s management really believed in the validity of the segments found, and the target segment was a lodestone for their efforts. For the hard-working analyst involved, it was strange but gratifying to hear the whole company asking ‘What would Sylvia do?’

Running your own Bayes Nets

Software unfortunately is a confusing terrain, even more so than for classification trees. Some programs are free, although they follow the general rule that free software is harder to use and more limited in what it can do than programs you buy. You can spend a great, great amount on some Bayesian Network software. But you may not have to, because there are some very powerful choices among the more reasonably priced options.

As with any software, it is best to start with a curated list online (that is, a list with a person’s name attached). Then try as many programs as seem reasonably likely contenders. Because Bayes Nets operate differently from other methods, even if you are used to regular statistical software, there will be at least some learning. Ultimately, only you know what will work best for your needs.

Summary and conclusions

Bayes Nets or Bayesian Networks are a remarkable set of methods with strong predictive powers. As we saw in the first example, they can sail with aplomb through problems that are extremely difficult with traditional statistics – and that completely baffle nearly all of us.

They typically do better than more traditional methods, such as regressions, in several areas. In particular, they excel in getting an accurate fix on variables’ importances, while including all variables of interest in a predictive model. Networks take into account all the interrelations of all variables when estimating importances, unlike regressions, which assume that when one variable is changed all others remain constant.

Like classification trees, Bayes Nets give us insight into how variables interact, but the understanding in Bayes Nets extends to a holistic view of the ways in which all variables in a model fit together. Classification trees’ view of interactions is situational, that is, any variable beyond the first one in the model will have an effect only if the other variables have specific values. Bayes Nets models see patterns across all values in all variables.

Bayes Nets also are more comprehensive than classification trees, in that trees tend to produce simple models with relatively few predictors. You can enter a great many variables into a Bayes Nets model, and it will include and evaluate all of them. Even with this comprehensive view, Bayes Nets run quickly, just as classification trees do.

New, different and worth getting to know

Bayes Nets are newer methods that use analytical approaches quite different from more traditional methods. They can solve problems easily that may entirely elude other analytical procedures.

However, they require some new terminology and their unfamiliarity may make some audiences less willing to accept them.

Bayesian Networks definitely are machine learning methods. They fall under the heading of graphical analytical methods. Diagrams are key to understanding their workings. You can get a great deal out of networks – such as variables’ relative importances and effects – without ever looking at the network itself. However, those with the patience to examine the structure of the network will develop a fuller understanding of whether the outcome makes sense.

Bayes Nets do not involve equations (as regressions do) or sets of if-then rules (as classification trees do). At times, you might encounter someone who needs to see equations, and for that person, networks might not provide a satisfying answer.

To see the effects of changing more than one variable at a time, you need to go back to the network diagram itself, in those programs that dynamically allow you to interact with the network diagram. Some software programs allow you to construct a stand-alone simulator. These simulators are similar to the ones that we encountered for discrete choice modelling and conjoint analysis in Chapters 4 and 5.

In a network, even with a single variable being changed, effects can vary depending on the values of the target variable and the predictor variable. Effects from changing several variables at once require taking many factors into account and certainly are nothing we could see intuitively – even though the network will handle as many changes as you want with aplomb.

We have covered new material and new concepts in this chapter. Using Bayes Nets requires still more in terms of hands-on experience and learning about the details of the software you choose. They almost always merit the effort. Still, like any method, Bayes Nets are not guaranteed to develop useful output every time.

At times, the questions being analysed themselves limit how much you can extract from the analysis. For instance, in the example with the brokerage, the most important variables affecting share emerged clearly. However, the ratings analysed were broad. They showed what needed to change, but not how to make the changes. For that, another approach was needed, namely, the conjoint-based service optimization discussed in Chapter 5.

Overall, Bayes Nets are among the most powerful of analytical methods, typically producing very strong predictive models that often can address your informational and strategic needs. Any learning that they require definitely will be more than repaid.

Bonus Chapter 2: Artificial intelligence, ensembles and neural nets

This bonus online chapter explains the promise and pitfalls of the one method most closely associated with artificial intelligence: neural networks. We discuss the practical applications and limitations of this approach, including how it performed versus other methods with actual data. We help you navigate through the morass of terminology, hype and conflicting claims about this evolving field. The chapter also reviews ensembles, including one method with strong predictive performance and an interesting name: decorate. Finally, it reviews some of the more basic questions raised by the quest for increasingly autonomous machines.

Access this bonus chapter online here:



08PuttingittogetherWhattousewhen

If we take a broad definition of artificial intelligence, then we see that it actually has been working alongside us for many years. Those who did not skip directly to this chapter may have noticed this. This also assumes that this topic has not become totally blurred for readers who have got this far (and thank you for persevering).

All the way back in the Preface, we mentioned what some people working at the epicentre of artificial intelligence, Alphabet (formerly Google), had to say when asked to define this term. As a refresher, here are some answers:

‘I would definitely interview someone else.’

‘I’m not sure. I haven’t done anything with AI.’

‘I don’t know anything about it.’

‘It’s machine learning.’

‘I work at Yahoo…’

In Chapter 1, we hazarded a definition of artificial intelligence. So you do not have to page back, here it is again. Broadly, artificial intelligence means anything a machine does to respond to its environment to maximize its chances of success. Since we are not discussing cars that drive themselves or machines that otherwise amble around in the outside world, the environment of our machines, computers, is data. We set their goal as detecting complex relationships we cannot, to aid in our making better decisions.

We can find this type of artificial intelligence in many places. For instance, we encounter this in the way the computer takes many alternative views of a problem, as in ensembles such as boosting, random forests or decorate. We see it in the way that the computer learns from thousands of passes through the data, as in the hierarchical Bayesian analysis that we use to extend the power of discrete choice modelling, conjoint analysis and maximum difference scaling.

In a different way, we find computer learning in solutions that completely escape us, as in Bayes Nets and how they use conditional probabilities. This method finds underlying relationships and solves problems that entirely baffle us.

One of the earliest applications of artificial intelligence is found in the highly sophisticated testing done by classification trees. One of the most current, and most rapidly expanding, applications is in the complex learning done by neural networks.

The tasks the methods do

Our discussion has been organized by method. We explained a particular approach, then explained what it could accomplish. This chapter takes an alternative view, starting with the type of problem, then reviewing the methods that can address it.

First, what older methods can do

We explained one more traditional method, Q-Sort/Case 5, which can give you a clear hierarchy of importances for long lists of items. We included this approach because it nicely complements Maximum Difference Scaling (MaxDiff), which gets much of its power from the machine learning method, Hierarchical Bayesian analysis.

Many other long-established methods solve key problems. For instance, clustering neatly handles the problem of finding groups in data. Discriminant analysis can determine differences among groups. It also can assign new people into groups. Clustering has been applied since at least the 1950s, discriminant analysis since the 1930s. Newer, high-tech alternatives to these methods exist, but we did not have the space to discuss them.

Several other methods use advanced concepts and plenty of calculations but do not involve machine learning. Among these, we find mapping methods such as correspondence analysis, MD-PREF and bi-plots. They serve the important function of visually revealing relationships among groups or among brands. Comparisons of many groups would be addressed by many methods in the analysis of variance (ANOVA) family. Some of these reach hair-raising levels of complexity.

A complete tool-kit for solving problems and addressing tactical and strategic issues therefore needs to include a mix of the cutting-edge and the old. Now let’s go on to what can be accomplished with the newer methods we have been discussing.

Finding and characterizing groups

Chapter 6 showed how classification trees greatly boosted the odds of finding a group. This method not only locates groups, but characterizes them. It also can determine what characterizes different levels of consumption or spending.

Classification trees provide simple if-then rules that are easy to program into a database. Using these, new people (outside the data set being examined) can be assigned to groups or assigned different probabilities of acting. The gains analysis (or lift or leverage analysis) provides a straightforward roadmap to locating best prospects.

Classification trees can be supplemented with random forests. This ensemble method, also discussed in Chapter 6, runs hundreds of trees, randomly swapping predictors and cases (people) into and out of each run. Based on these hundreds of trees, it develops importances for each variable. Unfortunately, like all ensembles, its workings are opaque.

In the example in Chapter 6, we used random forests to zero in on a list of definitely important predictors. We then used those predictors to construct a single classification tree that could be studied, evaluated and applied.

Boosting also can extend insights from classification trees. In Chapter 6 we showed how a boosting method, AdaBoost.M1, added to a model developed using classification trees. This model showed which psoriasis patients were most at risk of severe depression. Boosting highlighted two values of one predictor that warranted particular attention.
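To make the idea of if-then rules and a gains (lift) analysis more concrete, here is a minimal sketch in Python. It is not the analysis from Chapter 6: the rules, predictor names and counts are all invented for illustration, and the function name gains_table is our own. It simply ranks the groups a tree might produce by their response rate and shows how quickly the best groups capture the buyers.

```python
# Hypothetical end groups from a classification tree. Each if-then rule is
# listed with the number of people it covers and how many of them bought.
# Every rule and count here is invented for illustration.
segments = [
    {"rule": "age >= 55 and owns_home", "people": 200, "buyers": 90},
    {"rule": "age >= 55 and not owns_home", "people": 300, "buyers": 60},
    {"rule": "age < 55 and owns_home", "people": 500, "buyers": 50},
    {"rule": "age < 55 and not owns_home", "people": 1000, "buyers": 50},
]


def gains_table(segments):
    """Rank segments by response rate; add lift and cumulative capture."""
    total_people = sum(s["people"] for s in segments)
    total_buyers = sum(s["buyers"] for s in segments)
    overall_rate = total_buyers / total_people

    ranked = sorted(segments, key=lambda s: s["buyers"] / s["people"], reverse=True)
    rows, cum_people, cum_buyers = [], 0, 0
    for s in ranked:
        cum_people += s["people"]
        cum_buyers += s["buyers"]
        rate = s["buyers"] / s["people"]
        rows.append({
            "rule": s["rule"],
            "rate": rate,
            "lift": rate / overall_rate,  # 1.0 means no better than average
            "pct_people": cum_people / total_people,
            "pct_buyers": cum_buyers / total_buyers,
        })
    return rows


table = gains_table(segments)
for row in table:
    print(f"{row['rule']:<30} rate={row['rate']:.2f} lift={row['lift']:.2f} "
          f"reach={row['pct_people']:.0%} captured={row['pct_buyers']:.0%}")
```

With these made-up figures, the top two groups hold a quarter of the people but well over half of the buyers, which is the kind of roadmap to best prospects that a gains analysis is meant to provide.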

Finding what is most valuable in a list of items

Determining what is most valuable in a list of items has many applications. For instance, it can help you decide which of many possible incentives (or, as they are called, ‘free gifts’) have the most appeal. This also can help prioritize corporate message elements or claims.

The first online bonus chapter (available at www.koganpage.com/AI-Marketing) shows how Maximum Difference Scaling (MaxDiff) and Q-Sort/Case 5 provide a substantially clearer picture of importances or levels of appeal than direct scaled ratings. Both methods provide ratio-level readings. You can say, for instance, ‘Free Gift N is twice as appealing as Free Gift T’.

The power of MaxDiff is greatly enhanced by Hierarchical Bayesian analysis. Thanks to this machine learning method, MaxDiff can provide individual-level data. The Q-Sort method relies on an older form of analysis, and provides importances only for groups.

While the Q-Sort method is lower tech, it can prioritize many more items. In practice, MaxDiff starts to become overly burdensome for people participating in a survey with about 35 items. Q-Sort/Case 5 results have been reported with up to 100 items.

Determining the effects of variables (aka ‘drivers’)

Bayes Nets remain one of the best methods for determining the effects of variables on an outcome, such as share or intent to buy. They work remarkably well with non-experimental data, such as questionnaire questions, demographics and other characteristics.

They have many features that make them a superior method. Among these, Bayes Nets measure effects taking all variables into account. This differs entirely from methods in the regression family. There, measuring the effects of changing any variable must assume that all other variables remain constant. (That rarely happens in the real world. We typically find a web of interconnected relationships and effects.)

Bayes Nets also look at the whole pattern of scores in all variables. A regression-related model is based on correlations. A correlation is a simple one-number summary of how well two variables fall into a straight line. Bayes Nets also take a more comprehensive view than classification trees. Trees describe specific situations, as in ‘variable T has this effect IF (and only if) variable R has this value’.

Bayes Nets also provide ratio-level importances, as do MaxDiff and Q-Sort/Case 5. However, Bayes Nets go beyond these methods, also providing the absolute strengths of effects. Networks will show the precise amount of influence over each range of the predictor variable and the target variable. They can reveal effects that are either straight-line or that change in non-linear ways.

As we showed in Chapter 7, Bayes Nets allow you to see the patterns of connections among variables. You then can check these to see if they make sense. You should accept an analysis only if it provides a reasonable-looking network diagram. This almost always happens.

Random forests and boosting also return variables’ importances. However, Bayes Nets usually have higher predictive accuracy, and they take a more comprehensive view of the data. Therefore, they typically are a better choice for determining variables’ importances. In those few instances where the performance of a Bayes Net is indifferent, you could check these other methods as an alternative way to find importances.
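The point made earlier in this section, that a correlation summarizes only how well two variables fall into a straight line, is easy to check with a small sketch. The data below are invented, and the helper pearson is our own plain-Python version of the standard Pearson formula. One outcome rises in a straight line with the predictor; the other depends on it completely, but in a U shape.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: how well the points fall along a straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


ratings = [-3, -2, -1, 0, 1, 2, 3]      # invented predictor, centred at zero
linear = [2 * r + 1 for r in ratings]   # outcome rises in a straight line
curved = [r * r for r in ratings]       # outcome follows a U shape

r_linear = pearson(ratings, linear)
r_curved = pearson(ratings, curved)
print(f"straight-line effect: r = {r_linear:.2f}")
print(f"U-shaped effect:      r = {r_curved:.2f}")
```

The U-shaped outcome is entirely determined by the predictor, yet its correlation comes out at zero. A method that rests on correlations alone would report no relationship here, which is why approaches that can show effects changing in non-linear ways are valuable.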

Optimizing complex messages, advertisements or single products

Conjoint analysis has worked well in applications like these. Conjoint analysis, in particular the traditional full-profile method, has the advantage of asking for an evaluation of a whole product or message. This ensures that elements or features are evaluated in a realistic context. However, if you want to determine how products would fare in a competitive environment, discrete choice modelling is a superior method. Chapters 4 and 5 explain the reasons.

Experimental designs underlie conjoint analyses. These ensure that there is no relationship in the way features (or attributes) vary from one evaluated product description (or profile) to another. This complete lack of relationship in the way profiles vary ensures that effects arising from changing any attribute are measured purely and precisely.

Conjoint analysis is an established method, going back at least to the 1970s in an older form. It can be extended considerably by the use of the machine learning method, Hierarchical Bayesian (HB) analysis. Complex products and services with many features and many variations of those features now can be investigated. Before HB analysis, this was not possible.

Conjoint analysis leads to market simulator programs. They show the effects of changing features in any of hundreds or thousands of ways and return results in real time. These are easy to use and can run under popular programs such as Microsoft Excel and Microsoft PowerPoint.
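For readers who like to see the mechanics, the following is a bare-bones sketch of what a market simulator does. It rests on several assumptions of ours: the part-worth utilities are invented, a single set of utilities stands in for the individual-level values that HB analysis would supply, and shares are computed with a simple logit rule, which is one common choice rule rather than the only one. The names part_worths, total_utility and shares_of_preference are hypothetical.

```python
from math import exp

# Invented part-worth utilities, for illustration only.
part_worths = {
    "brand": {"A": 0.6, "B": 0.2, "C": 0.0},
    "price": {"$10": 0.8, "$12": 0.3, "$15": -0.4},
    "warranty": {"1 year": 0.0, "3 years": 0.5},
}


def total_utility(product):
    """Add up the part-worths for the levels that make up one product."""
    return sum(part_worths[attr][level] for attr, level in product.items())


def shares_of_preference(products):
    """Logit rule: each share is proportional to exp(total utility)."""
    weights = {name: exp(total_utility(p)) for name, p in products.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


market = {
    "ours": {"brand": "A", "price": "$15", "warranty": "1 year"},
    "rival 1": {"brand": "B", "price": "$12", "warranty": "1 year"},
    "rival 2": {"brand": "C", "price": "$10", "warranty": "3 years"},
}
base = shares_of_preference(market)

# A 'what if' scenario: we extend our warranty and change nothing else.
scenario = {**market, "ours": {**market["ours"], "warranty": "3 years"}}
changed = shares_of_preference(scenario)

for name in market:
    print(f"{name:<8} base={base[name]:.1%} with longer warranty={changed[name]:.1%}")
```

Changing one feature shifts every product's share at once, which is why simulators can answer so many 'what if' questions so quickly.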

Determining responses to alternative new products in a competitive marketplace

Discrete choice modelling remains the best method for determining the effects of varying a product, introducing a new product, and/or eliminating products in a competitive marketplace. Like conjoint analysis, discrete choice modelling uses designed experiments. Any effects from varying the features of a product or service are measured precisely.

Also like conjoint analysis, discrete choice modelling gains considerable power from the use of HB analysis. With HB analysis, you can measure effects in complex situations where many products or services change in many ways.

Discrete choice modelling goes beyond conjoint in taking a whole-marketplace perspective. Consumers participating in a well-designed discrete choice study see the main choices they would have in the actual market. They do what they would do in real life – make a decision.

This method has the most realism of any method for forecasting what will happen in a changing marketplace – assuming it is set up well. Unfortunately, though, there are far too many poorly thought out and poorly designed studies. It is important to set the context for the decision with reasonable accuracy. Recent research shows how much consumers rely on this context when making decisions. It also is critical to think of all reasonable contingencies. You need to consider both how your fine product might change and what competitors might be doing at the same time, or in response to your actions.

You also need to build in adjustments to go from the share of preference found in simulations to an estimate of market share. Most commonly, you need to adjust for how aware consumers are of each product and how widely each is distributed.

If you succeed with all the hard thinking involved, these studies can have remarkable predictive power. Market simulators and market simulations can do exceptionally well at forecasting what consumers will choose in a real and changing marketplace.
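The adjustment from share of preference to estimated market share can take several forms. As one simple illustration, and purely as an assumption on our part, the sketch below multiplies each product's share of preference by its awareness and its distribution, then rescales so the results again sum to 100 per cent. All figures and the function name adjust_shares are invented.

```python
def adjust_shares(preference, awareness, distribution):
    """Scale each share of preference by awareness and distribution, then renormalize."""
    raw = {name: preference[name] * awareness[name] * distribution[name]
           for name in preference}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


# All figures below are invented for illustration.
preference = {"new product": 0.40, "incumbent": 0.60}     # from the simulator
awareness = {"new product": 0.50, "incumbent": 0.90}      # share of buyers who know it
distribution = {"new product": 0.60, "incumbent": 0.95}   # share of outlets carrying it

estimated = adjust_shares(preference, awareness, distribution)
for name, share in estimated.items():
    print(f"{name:<12} preference={preference[name]:.0%} estimated share={share:.1%}")
```

In this made-up case, a new product that wins 40 per cent of preference ends up with under half that in estimated share, because fewer people know about it and fewer outlets carry it.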

Finding complex relationships and structures in data

Bayes Nets and classification trees come closest to the elusive goal of machine learning spontaneously showing ‘patterns’ in the data. Bayes Nets take a holistic view, looking at how all variables fit together into a network of relationships and effects. Classification trees reveal interactions more clearly than any other method.

An interaction occurs when two or more variables together produce an effect that we would not expect from the individual variables. In our example in Chapter 6, for instance, we saw that families with five or more children who lived in the suburbs bought more cardboard-like cereal than either families of that size or all suburbanites. Both characteristics worked together to lead to high levels of purchasing.

Bayes Nets include all the variables that you are examining. They all connect into a network. You will get strengths of relationships to the target variable and connections for all of them. Classification trees are sparser. In our cruise line example in Chapter 6, for instance, only 10 variables clearly differentiated levels of purchasing among over 340,000 people in a database. Both methods reveal structures that we could not otherwise understand.
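An interaction like the one in the cereal example can be shown with a few lines of arithmetic. The counts below are invented and are not the data from Chapter 6. They are arranged so that neither characteristic alone looks remarkable, while the two together do.

```python
# Invented counts: (large_family, suburban) -> (households, buyers)
cells = {
    (True, True): (100, 60),
    (True, False): (300, 30),
    (False, True): (900, 90),
    (False, False): (2700, 270),
}


def rate(match):
    """Purchase rate among households whose (large_family, suburban) key matches."""
    households = sum(n for key, (n, _) in cells.items() if match(key))
    buyers = sum(b for key, (_, b) in cells.items() if match(key))
    return buyers / households


large_families = rate(lambda k: k[0])        # all families with 5+ children
suburbanites = rate(lambda k: k[1])          # all suburban households
both = rate(lambda k: k[0] and k[1])         # large families in the suburbs

print(f"large families: {large_families:.1%}")
print(f"suburbanites:   {suburbanites:.1%}")
print(f"both together:  {both:.1%}")
```

Looking at either variable on its own would badly understate what happens in the one group where both conditions hold. A classification tree would split on one variable and then the other, exposing that group directly.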

Now about neural networks

Neural networks are being touted as the future of artificial intelligence. And with swaths of data and ample training, they have done remarkable things. For all their promised strength in finding patterns, though, they remain largely mute about what they are doing.

In Bonus Online Chapter 2 (available at www.koganpage.com/AI-Marketing) we use neural networks to try to predict share. As you will see, the diagram that the network provided was basically unintelligible, as were the reported variable strengths in the hidden layers.

It also did poorly at prediction, substantially worse than the other methods we discuss. Even with 1,800 cases and 70 variables, we might not have had enough data for the neural network to function well.

We also saw problems with overfitting of the model to the data, traditionally a weak spot with this method. New implementations are being developed all the time, and so some of these limitations may well be overcome. For now, these networks would not be a first choice in many cases. If you have a lot of data, and can do a lot of training, you might want to try them.
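Overfitting is easiest to spot by holding back some data that the model never sees. The sketch below is a deliberately tiny, contrived illustration and not a neural network: a 'memorizer' that copies the nearest training case stands in for a very flexible model, and a single cut-off stands in for a rigid one. The data, the two flipped labels and all function names are invented.

```python
def true_label(x):
    """The underlying pattern in our invented world."""
    return 1 if x >= 10 else 0


# Training data: even values of x, with two labels deliberately flipped
# to mimic quirks that are peculiar to one data set.
train = [(x, true_label(x)) for x in range(0, 20, 2)]
train = [(x, 1 - y) if x in (4, 14) else (x, y) for x, y in train]

# Holdout data: odd values of x, labelled by the underlying pattern.
holdout = [(x, true_label(x)) for x in range(1, 20, 2)]


def memorizer(x):
    """Flexible model: copy the nearest training case (ties go to the lower x)."""
    nearest = min(train, key=lambda case: (abs(case[0] - x), case[0]))
    return nearest[1]


def simple_rule(x):
    """Rigid model: a single cut-off."""
    return 1 if x >= 10 else 0


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


results = {
    "memorizer": (accuracy(memorizer, train), accuracy(memorizer, holdout)),
    "simple rule": (accuracy(simple_rule, train), accuracy(simple_rule, holdout)),
}
for name, (train_acc, holdout_acc) in results.items():
    print(f"{name:<12} training={train_acc:.0%} holdout={holdout_acc:.0%}")
```

The memorizer looks perfect on the data it was trained on and then slips on the holdout cases, because it has learned the two quirks along with the pattern. The simple rule looks worse in training and better outside it. A gap of this kind between training and holdout performance is the usual warning sign.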

Thinking about thinking

Wanting computers to do more of the heavy lifting has been a long-held wish. It goes far back, at least to the earliest days of machine learning. Around 1959, an article about teaching computers to play chequers hoped for the happy day when we would not need to do so much programming of every step. More work for the computer, rather than us, remains a goal. But what do you want the machine to do?

In an old joke, a computer salesperson tells an executive that a new computer will cut his work by 50 per cent. ‘In that case’, the executive answers, ‘I will take two.’

This may not be the funniest thing you have read, even in the last half-hour, but it underscores an important question. Where do you want to be in control, and where do you cede this control to the computers, and/or the friendly vendors running them? We raise this issue at the end of Bonus Online Chapter 2, and it is worth talking about again here.

Many of the newer methods, like ensembles, are basically incomprehensible. Neural networks epitomize the class of methods that we cannot fully understand. They are doing something, but precisely what remains entirely, or nearly entirely, out of sight.

You may recall the coefficients from the neural network that we show in Table A2.2. In no way could those be squared with what seemed sensible. The coefficients’ sizes looked wrong and their signs – positive or negative – seemed backward about half the time. How would you know that this does not reflect a basic mistake?

You could argue that the proof is in the doing. But then, this network did not do that well. It predicted poorly. It also showed a salient problem with overfitting of data – modelling features that were peculiar to that one data set, but that would not be found in the outside world.

Also, what do you do when you cannot precisely judge how well a system is doing? It is easy to tell if a self-driving car is not doing well, for instance. It crashes. But what happens if you are running a system that supposedly learns from the data, and it does a middling job – just like many analyses when dealing with messy, real-world situations? How would you check, for instance, on the performance of an automated recommendation system?

We may well be coming to the time when we set computers loose on data, and they learn by themselves. They might even find out once and for all if there is a connection between the sales of disposable diapers and beer. (If this reference is hazy, we mentioned in Chapter 2 what amounted to an urban legend about somebody serendipitously finding such a relationship, just by having a lot of data.) But no amount of digging through transactions will think up a new strategy for you to follow. Even the most exquisite analysis of historical data cannot give you a single idea for a new product or service. You still need to put in the effort and do the hard thinking.

Computers, and machine learning methods in particular, have been working with us to solve problems for many years. They have provided admirable assistance, zeroing in on predictive relationships that we could never have found without them. You may have noted an obvious authorial bias in favour of methods that keep the details out in the open, where you can test the model against your experience and acumen. These seem to be the unquestionably good uses of machine learning.

How much you want to trust the computer to go off on its own, and even learn in ways that you can barely understand, is becoming an increasingly important decision. Recall that we still (theoretically) have an edge in common sense. Then you need to choose.

BIBLIOGRAPHY

Preface

Cuppy, W (1931) How to Tell Your Friends from the Apes, Horace Liveright, New York

Ghosh, P (2016) [accessed 15 August 2016] Machine Learning of the Next Decade: The Promises and the Pitfalls [Online] http://www.dataversity.net/machine-learning-next-decade-promises-pitfalls (an article identifying regression and clustering as advanced machine-learning methods)

Kantrowitz, A (2016) [accessed 15 August 2016] Can Anyone in This Group of Advanced Programmers Explain the Tech World’s Hottest Trend? [Online] https://www.buzzfeed.com/alexkantrowitz/finding-the-meaning-of-artificial-intelligence-at-google-io?utm_term=hr8N7KRmY#obKALmWxk (23/5/2016)

Quote attributed to Edward de Bono [accessed 15 August 2016] Edward de Bono Quotes, 101Sharequotes 2016 [Online] http://101sharequotes.com/quote/edward_de_bono-an-expert-is-someone-who-has-suc-80976

Quote attributed to Niels Bohr by Edward Teller (1954), in Dr Edward Teller’s Magnificent Obsession, by R Coughlan, LIFE Magazine, 6 September 1954, p 62


Chapter 1

Armstrong, J S, ed (2001) Principles of Forecasting, Kluwer, Norwell, MA

Bagozzi, R (1994) Advanced Methods of Marketing Research, Blackwell Publishers, Cambridge, MA

Chakrapani, C, ed (2002) Marketing Research: State-of-the-art perspectives, Southwest Educational Publishing, Mason, OH

Dull, T (2015) [accessed 15 August 2016] Data Lake vs Data Warehouse: A Big Data Cheat Sheet: What Marketers Want to Know (Part 4) [Online] http://www.kdnuggets.com/2015/09/data-lake-vs-data-warehouse-key-differences.html

Foreman, J W (2014) Data Smart: Using data science to transform information into insight, Wiley, Indianapolis, IN

Gelman, A, et al (2013) Bayesian Data Analysis, 3rd edn, Chapman & Hall/CRC Texts in Statistical Science, Boca Raton, FL

Green, P E and Carroll, J (1978) Analyzing Multivariate Data, Dryden Press, Hinsdale, IL

Grigsby, M (2015) Marketing Analytics, Kogan Page, London

Ihaka, R and Gentleman, R (1996) R: a language for data analysis and graphics, Journal of Computational and Graphical Statistics, 5 (3), pp 299–314

Inmon, B (1992) Building the Data Warehouse, Wiley, Somerset, NJ

Kimball, R and Ross, M (2013) The Data Warehouse Toolkit, 3rd edn, Wiley, Somerset, NJ

Kugler, T, et al (2008) Decision Modeling and Behavior in Complex and Uncertain Environments, Springer, New York

Mayer-Schönberger, V and Cukier, K (2013) Big Data: A revolution that will transform how we live, work, and think, John Murray, London (a book that says more data is always better)

McCullagh, P (2002) What is a statistical model? Annals of Statistics, 30 (5), pp 1225–310

Nisbet, R, Elder, J and Miner, G (2009) Handbook of Statistical Analysis and Data Mining Applications, Academic Press, Boston, MA

Oram, A (1998) [accessed 15 August 2016] The Land Mines of Data Mining [Online] http://www.praxagora.com/andyo/ar/privacy_mines.html

Provost, F and Fawcett, T (2013) Data Science for Business: What you need to know about data mining and data-analytic thinking, O’Reilly Media, Sebastopol, CA

Rainer, R K (2012) Introduction to Information Systems: Enabling and transforming business, 4th edn, Wiley, Somerset, NJ

Russo, E and Schoemaker, P (2002) Winning Decisions, Doubleday, New York (a book on making decisions that barely mentions data)

Sapsford, R and Jupp, V (2006) Data Collection and Analysis, Sage, New York

Silver, N (2012) The Signal and the Noise: Why so many predictions fail – but some don’t, Penguin, NY

Stoll, Clifford (2006) quote, in Keeler, M, Nothing to Hide: Privacy in the 21st century, iUniverse, Lincoln, NE

Struhl, S (2008) [accessed 15 August 2016] Data Mining Comes of Age: Overcoming the Myths and Misconceptions [Online] http://www.hospitalitynet.org/news/4036261.xhtml

Sztandera, L (2014) Computational Intelligence in Business Analytics: Concepts, methods, and tools for big data applications, Pearson, Upper Saddle River, NJ


Vriens, M (2012) The Insights Advantage: Knowing how to win, iUniverse, Bloomington, IN (a book on decision making that barely mentions data)

Winer, R and Neslin, S A (2014) The History of Marketing Science, World Scientific Publishing Co, Singapore

Witten, I, Frank, E and Hall, A (2011) Data Mining, 3rd edn, Morgan Kaufmann, Burlington, MA

Woods, Dan (2011) [accessed 15 August 2016] Big Data Requires a Big Architecture, Forbes [Online] http://www.forbes.com/sites/ciocentral/2011/07/21/big-data-requires-a-big-new-architecture/#1c5ed8bf1d75


Chapter 2

Armstrong, J S and Collopy, F (1992) Error measures for generalizing about forecasting methods: empirical comparisons, International Journal of Forecasting, 8 (1), pp 69–80

Barton, S J (2010) Strategic Management Simplified: What every manager needs to know about strategy and how to manage it, iUniverse Press, Bloomington, Indiana (story about apparel manufacturer Levi’s)

Box, G E P and Draper, N R (1987) Empirical Model Building and Response Surfaces, John Wiley & Sons, New York

Cohen, J (1988) Statistical Power Analysis for the Behavioral Sciences, 2nd edn, Erlbaum Publishers, Mahwah, NJ

DeTurck, D [accessed 15 August 2016] Case Study 1: The 1936 Literary Digest Poll [Online] https://www.math.upenn.edu/~deturck/m170/wk4/lecture/case1.xhtml

Dziak, J J et al (2012) Sensitivity and Specificity of Information Criteria [Online] https://methodology.psu.edu/media/techreports/12-119.pdf

Floridi, L (2010) Information – A Very Short Introduction, Oxford University Press, Oxford

Jessen, R J (1978) Statistical Survey Techniques, Wiley, Edison, NJ

Kish, L (1995) Survey Sampling, Wiley, Edison, NJ

Marcus, A H and Elias, W (1999) Some useful statistical methods for model validation, Environmental Health Perspectives, 106 (Supplement 6), pp 1541–50

McCullagh, P (2002) What is a statistical model? Annals of Statistics, 30 (5), pp 1225–310

Sapsford, R and Jupp, V (2006) Data Collection and Analysis, Sage, New York

Snedecor, G and Cochran, W (1989) Statistical Methods, 8th edn, Iowa State University Press, Ames, Iowa

Spanos, S (2011) Statistical model specification and validation: statistical vs substantive information [Online] http://faculty.chicagobooth.edu/midwest.econometrics/papers/megspanos.pdf

Stevens, J P, Pituch, K and Whittaker, T (1999) Intermediate Statistics: A modern approach, 3rd edn, Lawrence Erlbaum, Mahwah, NJ

Thurstone, L L (1935) Vectors of the Mind, University of Chicago Press, Chicago, IL

Wilkinson, L, Blank, G and Gruber, C (1995) Desktop Data Analysis with Systat, SPSS, Inc., Chicago, IL


Chapter 3

Note: references for discrete choice modelling, conjoint, MaxDiff and Q-Sort/Case 5 are found in the chapters focusing on each method.

Hierarchical Bayesian analysis

Allenby, G, Rossi, P and McCulloch, R E (2005) Hierarchical Bayes model: a practitioner’s guide, Journal of Bayesian Applications in Marketing, pp 1–4 [Online] http://ssrn.com/abstract=655541

Box, G and Tiao, G (1965) Multiparameter problems from a Bayesian point of view, Ann. Mathematical Statistics, 36 (5), pp 1468–82

Gelman, A, Carlin, J, Stern, H and Rubin, D (2004) Bayesian Data Analysis, 2nd edn, CRC Press, Boca Raton, Florida

Smith, B (1994) Bayesian Theory, John Wiley & Sons, Chichester

Experimental designs

Bingham, D, Sitter, R and Tang, B (2009) Orthogonal and nearly orthogonal designs for computer experiments, Biometrika, 96, pp 51–65

Box, G E, Hunter, J S and Hunter, W G (2005) Statistics for Experimenters: Design, innovation, and discovery, 2nd edn, Wiley, New York

Cochran, W and Cox, G (1992) Experimental Designs, 2nd edn, Wiley, New York

Fang, K T, Li, R and Sudjianto, A (2006) Design and Modeling for Computer Experiments, CRC Press, New York

Lin, C D, Bingham, D, Sitter, R and Tang, B (2010) A new and flexible method for constructing designs for computer experiments, Ann. Statistics, 38, pp 1460–77

Yang, J Y and Liu, M Q (2012) Construction of orthogonal and nearly orthogonal Latin hypercube designs from orthogonal designs, Statistica Sinica, 22, pp 433–42


Chapter 4

Agresti, A (2013) Categorical Data Analysis, 3rd edn, John Wiley and Sons, Hoboken, NJ

Baltas, George and Doyle, Peter (2001) Random utility models in marketing research: a survey, Journal of Business Research, 51 (2), pp 115–25

Bargh, J A, ed (2006) Social Psychology and the Unconscious: The automaticity of higher mental processes, Psychology Press, Philadelphia

Ben-Akiva, M and Bierlaire, M (1999) Discrete choice methods and their applications to short-term travel decisions, Handbook of Transportation Sciences, pp 7–38, Kluwer, Norwell, MA

Ben-Akiva, M and Lerman, S (1985) Discrete Choice Analysis: Theory and application to travel demand, MIT Press, Cambridge, MA

Chu, C (1989) A paired combinatorial logit model for travel demand analysis, Proceedings of the 5th World Conference on Transportation Research, Ventura, CA, pp 295–309

Greenwald, A G (1992) New look III: unconscious cognition reclaimed, American Psychologist, 47, pp 766–79

Gustafsson, A, Herrmann, A and Huber, F, eds (2013) Conjoint Measurement: Methods and applications, Springer-Verlag, Berlin

Hausman, J and Wise, D (1978) A conditional probit model for qualitative choice: discrete decisions recognizing interdependence and heterogeneous preference, Econometrica, 48 (2), pp 403–26

Kahneman, D (2013) Thinking, Fast and Slow, Farrar, Straus & Giroux, New York

Kahneman, D and Tversky, A (1979) Prospect theory: an analysis of decision under risk, Econometrica, 47 (2), pp 263–91

Luce, M F, Payne, J W and Bettman, J R (1999) Emotional trade-off difficulty and choice, Journal of Marketing Research, 36 (2), pp 143–60

Luce, R D (1959) Individual Choice Behavior, Wiley, New York

McFadden, D and Train, K (2000) Mixed MNL models for discrete response, Journal of Applied Econometrics, 15 (5), pp 447–70

Revelt, D and Train, K (1998) Mixed logit with repeated choices: households’ choices of appliance efficiency level, Review of Economics and Statistics, 80 (4), pp 647–57

Train, K (1978) A validation test of a disaggregate mode choice model, Transportation Research, 12, pp 167–74

Train, K (2003) Discrete Choice Methods with Simulation, Cambridge University Press, MA

Chapter 5

Carroll, J and Green, E (1995) Psychometric methods in marketing research: part I, conjoint analysis, Journal of Marketing Research, 32, pp 385–91

Cattin, P and Wittink, D (1982) Commercial use of conjoint analysis: a survey, Journal of Marketing, 46, pp 44–53

Green, P (1984) Hybrid models for conjoint analysis: an expository review, Journal of Marketing Research, 21, pp 155–59

Green, P, Carroll, J and Goldberg, S (1981) A general approach to product design optimization via conjoint analysis, Journal of Marketing, 43, pp 17–35

Green, P, Krieger, M and Agarwal, M (1991) Adaptive conjoint analysis: some caveats and suggestions, Journal of Marketing Research, 28, pp 215–21

Green, P and Srinivasan, V (1978) Conjoint analysis in consumer research: issues and outlook, Journal of Consumer Research, 5, pp 103–23

Green, P and Wind, Y (1973) Multiattribute Decisions in Marketing: A measurement approach, Dryden Press, Hinsdale, IL

Luce, R D (1959) Individual Choice Behavior, Wiley, New York

Marder, E (1999) The assumptions of choice modelling, Canadian Journal of Market Research, 18, pp 1–10

McCullough, P R (2002) [accessed 15 August 2016] Shortcomings of Adaptive Conjoint [Online] http://www.macroinc.com/english/dont-use-adaptive-conjoint-methods/

McCullough, P R (2002) A Users Guide to Conjoint [Online] http://www.macroinc.com/english/papers/A%20Users%20Guide%20to%20Conjoint%20Analysis.pdf

Morowitz, V (2001) Methods for forecasting from intentions data, in Principles of Forecasting, ed J Armstrong, Springer, New York

Orme, B (2009) Getting Started with Conjoint Analysis, Research Publishers, Madison, WI

Rao, V (2014) Applied Conjoint Analysis, Springer-Verlag, Berlin


Chapter 6

Breiman, L, Friedman, J, Olshen, R and Stone, C (1984) Classification and Regression Trees, Chapman and Hall, New York

Brodley, C E and Utgoff, P E (1995) Multivariate decision trees, Machine Learning, 19, pp 45–77

Buntine, W (1992) Learning classification trees, Statistics and Computing, 2, pp 63–73

Clark, L A and Pregibon, D (1993) Tree-based models, in Statistical Models, ed J M Chambers and T J Hastie, pp 377–419, Chapman and Hall, New York

Hazewinkel, M, ed (1987) Greedy algorithm, Encyclopedia of Mathematics, Supplement III, Springer, Norwell, MA

Hochberg, Y and Tamhane, A C (1987) Multiple Comparison Procedures, Wiley, New York

Holte, R C (1993) Very simple classification rules perform well on most commonly used datasets, Machine Learning, 11, pp 63–90

Kass, G V (1980) An exploratory technique for investigating large quantities of categorical data, Applied Statistics, 29, pp 119–27

Lim, T-S, Loh, W-Y and Shih, Y-S (2000) A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms, Machine Learning, 40, pp 203–28

Muller, W and Wysotzki, F (1994) Automatic construction of decision trees for classification, Annals of Operations Research, 52, pp 231–47

Quinlan, J R (1989) Unknown attribute values in induction, Proceedings of the Sixth International Machine Learning Workshop, pp 164–68

Shi, L and Horvath, S (2006) Unsupervised learning with random forest predictors, Journal of Computational and Graphical Statistics, 15 (1), pp 118–38

Steinberg, D and Colla, P (1992) CART: A supplementary module for Systat, Systat Inc., Evanston, IL

White, A P and Liu, W Z (1994) Bias in information-based measures in decision tree induction, Machine Learning, 15, pp 321–29

Witten, I and Frank, E (2005) Data Mining: Practical machine learning tools and techniques, 2nd edn, Morgan Kaufmann, San Francisco

Chapter 7

Cooper, G F and Herskovits, E (1992) A Bayesian method for the induction of probabilistic networks from data, Machine Learning, 9, pp 309–47

Gill, R (2010) Monty Hall problem, in International Encyclopedia of Statistical Science, pp 858–63, Springer-Verlag, Berlin

Heckerman, D (1995) Tutorial on learning with Bayesian networks, in Jordan, M, Learning in Graphical Models, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA

Jensen, F V and Nielsen, T D (2007) Bayesian Networks and Decision Graphs, 2nd edn, Springer-Verlag, New York

Kahneman, D, Slovic, P and Tversky, A, eds (1982) Judgment under Uncertainty: Heuristics and biases, Cambridge University Press, Cambridge, UK

Korb, K B and Nicholson, A (2010) Bayesian Artificial Intelligence, Chapman & Hall (CRC Press), New York

Mackay, D (2003) Information Theory, Inference and Learning Algorithms, Cambridge University Press, Cambridge, UK

Pearl, J (1986) Fusion, propagation, and structuring in belief networks, Artificial Intelligence, 29 (3), pp 241–88

Pearl, J (1988) Probabilistic Reasoning in Intelligent Systems: Networks of plausible inference, Representation and Reasoning Series, 2nd edn, Morgan Kaufmann, San Francisco, CA

Pearl, J (200) Causality: Models, reasoning, and inference, Cambridge University Press, New York

Russell, S J and Norvig, P (2003) Artificial Intelligence: A modern approach, 2nd edn, Prentice Hall, Upper Saddle River, NJ

Solving the Monty Hall Three-Door Problem with Bayes Nets [accessed 15 August 2016] [Online] http://download.hugin.com/webdocs/manuals/Htmlhelp/monty_hall_pane.xhtml

Wilkinson, L, Blank, I and Gruber, P (1999) Desktop Data Analysis with Systat, SPSS Inc., Chicago, IL

Witten, I and Frank, E (2005) Data Mining: Practical machine learning tools and techniques, 2nd edn, Morgan Kaufmann, San Francisco

Yudkowsky, E (2016) [accessed 15 August 2016] An Intuitive Explanation of Bayes’ Theorem [Online] http://yudkowsky.net/rational/bayes

Zhang, N and Poole, D (1994) A simple approach to Bayesian network computations, Proceedings of the Tenth Biennial Canadian Artificial Intelligence Conference, AI-94, pp 171–78, Banff, Alberta


Chapter 8

Fisher, R A (1936) The use of multiple measurements in taxonomic problems, Annals of Eugenics, 7 (2), pp 179–88

Kantrowitz, A (2016) [accessed 15 August 2016] Can Anyone in This Group of Advanced Programmers Explain the Tech World’s Hottest Trend? [Online] https://www.buzzfeed.com/alexkantrowitz/finding-the-meaning-of-artificial-intelligence-at-google-io?utm_term=hr8N7KRmY#obKALmWxk (23/5/2016)

Lloyd, S P (1957) Least square quantization in PCM, Bell Telephone Laboratories paper; published in journal much later: Lloyd, S P (1982) Least squares quantization in PCM (PDF), IEEE Transactions on Information Theory, 28 (2), pp 129–37

Metz, C (2016) [accessed 15 August 2016] AI is Transforming Google Search: The Rest of the Web is Next [Online] http://www.wired.com/2016/02/ai-is-changing-the-technology-behind-google-searches (4/2/2016)

Samuel, A L (1959) Some studies in machine learning using the game of checkers, IBM Journal of Research and Development, 3 (3), pp 210–29

https://www.buzzfeed.com/alexkantrowitz/finding-the-meaning-of-artificial-intelligence-at-google-io?utm_term=hr8N7KRmY#obKALmWxk

http://www.wired.com/2016/02/ai-is-changing-the-technology-behind-google-searches

Bonus online-only Chapter 1

Maximum Difference Scaling (MaxDiff)

Flynn, T N et al (2007) Best–worst scaling: what it can do for health care research and how to do it, Journal of Health Economics, 26 (1), pp 171–89

Louviere, J J et al (2015) Best-Worst Scaling: Theory, methods and applications, Cambridge University Press, Cambridge

Marley, A A and Louviere, J J (2005) Some probabilistic models of best, worst, and best–worst choices, Journal of Mathematical Psychology, 49 (6), pp 464–80

Q-Sort and Case 5

Block, J (1978) The Q-Sort Method in Personality Assessment and Psychiatric Research, Consulting Psychologists Press, Mountain View, CA

Borbinha, J, et al (c. 2005) [accessed 15 August 2016] Adaptive Q-Sort Matrix Generation: A Simplified Approach [Online] http://www.inesc-id.pt/pt/indicadores/Ficheiros/11389.pdf

Bracken, S S and Fischel, J (2006) Assessment of preschool classroom practices: application of Q-sort methodology, Early Childhood Research Quarterly, 21 (4), pp 417–30

Bradley, R A and Terry, M E (1952) Rank analysis of incomplete block designs, I: the method of paired comparisons, Biometrika, 39, pp 324–45

Luce, R D (1959) Individual Choice Behaviors: A theoretical analysis, J Wiley, New York

McKeown, B F and Thomas, B D (1988) Q-Methodology, Sage Publications, Newbury Park, CA

Michell, J (1997) Quantitative science and the definition of measurement in psychology, British Journal of Psychology, 88, pp 355–83

Rasch, G (1980) Probabilistic Models for Some Intelligence and Attainment Tests, University of Chicago Press, Chicago

Thurstone, L L (1927) A law of comparative judgement, Psychological Review, 34, pp 273–86

Thurstone, L L (1929) The measurement of psychological value, in Essays in Philosophy by Seventeen Doctors of Philosophy of the University of Chicago, ed T Smith and W Wright, Open Court, Chicago

Thurstone, L L (1959) The Measurement of Values, University of Chicago Press, Chicago

Regression

Achen, C H (1973) Interpreting and Using Regression, Sage Publications, Beverly Hills

Darlington, R B (1968) Multiple regression in psychological research and practice, Psychological Bulletin, 69, pp 161–82

Gorsuch, R L (1973) Data analysis of correlated independent variables, Multivariate Behavioral Research, 8, pp 89–107

Lorenz, F O (1987) Teaching about influence in simple regression, Teaching Sociology, 15, pp 173–77

Mansfield, E R and Conerly, M D (1987) Diagnostic value of residual and partial residual plots, The American Statistician, 41, pp 107–16

Mauro, R (1990) Understanding l.o.v.e. (left out variables error): a method for estimating the effects of omitted variables, Psychological Bulletin, 108, pp 314–29

Mosteller, F and Tukey, J (1977) Data Analysis and Regression: A second course in statistics, Pearson, Upper Saddle River, NJ

Stevens, J P (1984) Outliers and influential data points in regression analysis, Psychological Bulletin, 95, pp 334–44

Wilkinson, L, Blank, G and Gruber, C (1995) Desktop Data Analysis with Systat, SPSS Press, Chicago, IL

Wolf, G and Cartwright, B (1974) Rules for coding dummy variables in multiple regression, Psychological Bulletin, 81, pp 173–79

Bonus online-only Chapter 2

Asimov, I (1964) Introduction, The Rest of the Robots, Doubleday, New York

Baer, D (2016) [accessed 15 August 2016] The ‘Outfielder Problem’ Shows How Your Brain Is Not a Computer [Online] http://nymag.com/scienceofus/2016/06/outfielder-problem.xhtml

Bengio, Yoshua et al (2006) Neural probabilistic language models, in Innovations in Machine Learning, ed D E Holmes and L C Jain, pp 137–86, Springer, New York

Bourzac, K (2016) [accessed 15 August 2016] Bringing Big Neural Networks to Self Driving Cars, Smartphones, and Drones [Online] http://spectrum.ieee.org/computing/embedded-systems/bringing-big-neural-networks-to-selfdriving-cars-smartphones-and-drones

Cellan-Jones, R (2016) [accessed 15 August 2016] Stephen Hawking Warns Artificial Intelligence Could End Mankind [Online] http://www.bbc.com/news/technology-30290540

Cho, S-J and Kim, J (2003) Bayesian network modeling of hangul characters for on-line handwriting recognition, ICDAR Proceedings of the Seventh International Conference on Document Analysis and Recognition (1), IEEE Computer Society, Washington, DC

Claburn, T (2016) [accessed 15 August 2016] Many Businesses Using AI Without Realizing It [Online] http://www.informationweek.com/strategic-cio/many-businesses-using-ai-without-realizing-it/d/d-id/1326333 (21/7/2016)

Coffman, V (2013) [accessed 15 August 2016] Why You Should Not Build a Recommendation Engine [Online] http://www.datacommunitydc.org/blog/2013/05/recommendation-engines-why-you-shouldnt-build-one

Coldewey, D (2016) [accessed 15 August 2016] Deep Learning Software Knows That a Rose is a Rose is a Rosa Rubiginosa [Online] http://www.techcrunch.com/2016/07/26/deep-learning-software-knows-that-a-rose-is-a-rose-is-a-rosa-rubiginosa

Daugman, J G (2001) Brain metaphor and brain theory, in Philosophy and the Neurosciences: A reader, ed W Bechtel, P Mandik, J Mundale and R Stufflebeam, Blackwell, Oxford, UK

Davis, E (2016) [accessed 15 August 2016] Collection of Winograd Schemas [Online] http://www.cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WS.xhtml

Etzioni, O (2016) [accessed 15 August 2016] Deep Learning Isn’t a Dangerous Magic Genie. It’s Just Math [Online] http://www.wired.com/2016/06/deep-learning-isnt-dangerous-magic-genie-just-math (15/6/2016)

Goodman, J (2016) [accessed 15 August 2016] Say One Sentence and it’s Done in the AI-First World [Online] https://www.theguardian.com/media-network/2016/may/20/say-one-sentence-and-its-done-in-the-ai-first-world?CMP=oth_b-aplnews_d-1

Hern, A (2016) [accessed 15 August 2016] Google Says Machine Learning is the Future So I Tried It Myself [Online] https://www.theguardian.com/technology/2016/jun/28/google-says-machine-learning-is-the-future-so-i-tried-it-myself (28/7/2016)

Hinton, G (2010) A practical guide to training restricted Boltzmann machines, Momentum, 91, pp 926–43

Hinton, G, Osindero, S and Teh, Y (2006) A fast learning algorithm for deep belief nets, Neural Computation, 18 (7), pp 1527–54

Kahneman, D (2013) Thinking, Fast and Slow, Farrar, Straus & Giroux, New York

Koch, C (2016) [accessed 22 February 2017] How the Computer Beat the Go Master [Online] http://www.scientificamerican.com/article/how-the-computer-beat-the-go-master (29/3/16)

Kosko, B (1993) Fuzzy Thinking: The new science of fuzzy logic, Hyperion, New York

Livingston, B (2002) [accessed 22 February 2017] Paul Graham Provides Stunning Answer to Spam E-Mails [Online] http://www.infoworld.com/article/2674702/technology-business/techology-business-paul-graham-provides-stunning-answer-to-spam-e-mails.xhtml

Masnick, M (2012) [accessed 15 August 2016] Why Netflix Never Implemented the Algorithm that Won the Netflix $1 Million Challenge [Online] https://www.techdirt.com/blog/innovation/articles/20120409/03412518422/why-netflix-never-implemented-algorithm-that-won-netflix-1-million-challenges.xhtml

Montaner, M, Lopez, B and de la Rosa, J L (2003) A taxonomy of recommender agents on the internet, Artificial Intelligence Review, 19 (4), pp 285–330

Nielsen, M (2016) [accessed 22 February 2017] How the Backpropagation Algorithm Works [Online] http://neuralnetworksanddeeplearning.com/chap2.xhtml

Reese, B (2016) [accessed 15 August 2016] Gigaom Chats: Interfacing With Machines in 2026 [Online] http://www.gigaom.com/2016/08/02/gigaom-chats-interfacing-with-machines-in-2026 (2/8/2016)

Ricci, F, Rokach, L and Shapira, B (2012) Introduction to recommender systems handbook, Recommender Systems Handbook, Springer, New York, pp 1–35

Sacks, O (1998) The Man Who Mistook His Wife For His Hat, Touchstone, New York

Sahami, M, Dumais, S, Heckerman, D and Horvitz, E (1998) A Bayesian approach to filtering junk e-mail, AAAI 1998 Workshop on Learning for Text Categorization

Shi, L and Horvath, S (2006) Unsupervised learning with random forest predictors, Journal of Computational and Graphical Statistics, 15 (1), pp 118–38

Sonnad, N (2016) [accessed 15 August 2016] Easy questions that computers are terrible at answering [Online] http://qz.com/745104/easy-questions-that-computers-are-terrible-at-answering (2/8/2016)

SpamBayes source code [Online] http://spambayes.sourceforge.net/

Standage, T (2016) [accessed 15 August 2016] Why Artificial Intelligence is Enjoying a Renaissance [Online] http://www.economist.com/blogs/economist-explains/2016/07/economist-explains (15/7/2016)

Stanford University Deep Learning Tutorial [Online] http://deeplearning.stanford.edu/tutorial/ (by A Ng et al)

Tracy, A (2016) [accessed 15 August 2016] Suddenly Everybody is Obsessed with AI – Even if Investors Don’t Get It [Online] http://www.vanityfair.com/news/2016/06/silicon-valley-artificial-intelligence-obsession (29/6/2016)

Vanian, J (2016) [accessed 15 August 2016] Artificial Intelligence Still Has a Ways [Sic] To Go Before Machines Can Behave Like Humans [Online] http://fortune.com/2016/05/23/google-baidu-research-artificial-intelligence/?xid=smartnews (23/5/16)

Wakefield, J (2016) [accessed 15 August 2016] Would You Want to Talk to a Machine? [Online] http://www.bbc.co.uk/news/technology-36225980 (4/8/2016)

Zaknich, A (2003) Neural Networks for Intelligent Signal Processing, World Scientific Publishing, Singapore

Zhu, X (2008) Semi-Supervised Learning Literature Survey, Computer Sciences, TR 1530, University of Wisconsin, Madison, WI [Online] http://pages.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf



The ability to predict consumer choice is a fundamental aspect of success for any business. In the context of artificial intelligence marketing, there is a wide array of predictive analytic techniques that can be used to anticipate what consumers want, each with its own advantages and disadvantages. Artificial Intelligence Marketing and Predicting Consumer Choice serves to integrate these disparate approaches, showing the strengths, weaknesses and best applications of each.

By exploring the human side of advanced analytics, this book provides a bridge between the person who must learn or apply these problem-solving methods and the community of experts who do the actual analysis. Covering areas such as discrete choice modelling, conjoint analysis and machine learning methods, Artificial Intelligence Marketing and Predicting Consumer Choice delivers incredibly useful business techniques – allowing for practical real-world application while simultaneously providing engaging insight into the remarkable advances in this fascinating field.

Bonus online resources are available at www.koganpage.com/AI-Marketing

Dr Steven Struhl PhD, MBA, MA, has more than 25 years’ experience in consulting and research, specializing in practical solutions based on statistical models of decision making and behaviour. Alongside text analytics and data mining, his work addresses how buying decisions are made, optimizing service delivery and product configurations, and finding the meaningful differences among products and services. Dr Struhl has also taught graduate courses on statistical methods and data analysis. He is a regular conference speaker and has given numerous seminars on pricing, choice modelling, market segmentation and presenting data.

Kogan Page

London

New York

New Delhi

www.koganpage.com
