Fake It Till You Make It: Reputation, Competition, and Yelp Review Fraud

Michael Luca, Harvard Business School, mluca@
[email protected]

2013

Abstract

Consumer reviews are now a part of everyday decision-making. Yet the credibility of reviews is fundamentally undermined when business owners commit review fraud, either by leaving positive reviews for themselves or negative reviews for their competitors. In this paper, we investigate the extent and patterns of review fraud on the popular consumer review platform Yelp.com. Because one cannot directly observe which reviews are fake, we focus on reviews that Yelp's algorithmic indicator has identified as fraudulent. Using this proxy, we present four main findings. First, roughly 16 percent of restaurant reviews on Yelp are identified as fraudulent, and these tend to be more extreme (favorable or unfavorable) than other reviews. Second, a restaurant is more likely to commit review fraud when its reputation is weak, i.e., when it has few reviews, or when it has recently received bad reviews. Third, chain restaurants - which benefit less from Yelp - are also less likely to commit review fraud. Fourth, when restaurants face increased competition, they become more likely to leave unfavorable reviews for competitors. Taken in aggregate, these findings highlight the extent of review fraud and suggest that a business's decision to commit review fraud responds to competition and reputation incentives rather than simply the restaurant's ethics.

Part of this work was completed while the author was supported by a Simons Foundation Postdoctoral Fellowship.
1 Introduction

Consumer review websites such as Yelp, TripAdvisor, and Angie's List have become increasingly popular over the past decade, and now exist for nearly any product or service imaginable. Yelp alone contains more than 30 million reviews of restaurants, barbers, mechanics, and other services, and has a market capitalization in excess of four billion dollars. Moreover, there is mounting evidence that these reviews have a direct influence on product sales (see Chevalier and Mayzlin (2006), Luca (2011), Zhu and Zhang (2010)). As the popularity of these platforms has grown, so have concerns that the credibility of reviews can be undermined by businesses leaving fake reviews for themselves or for their competitors.
There is considerable anecdotal evidence that this type of cheating is endemic in the industry. For example, the New York Times recently reported on the case of businesses hiring workers on Mechanical Turk, an Amazon-owned crowdsourcing marketplace, to post fake 5-star Yelp reviews on their behalf for as little as 25 cents per review.[1] In 2004, Amazon.ca unintentionally revealed the identities of anonymous reviewers, briefly unmasking considerable self-reviewing by book authors.[2] In recent years, review fraud has emerged as the preeminent threat to the sustainability of this type of user-generated content.
Despite the major challenge that review fraud poses for firms and consumers alike, little is known about the economic incentives behind it. In this paper, we assemble a novel dataset from Yelp - one of the industry leaders - to estimate the incidence of review fraud among restaurants in the Boston metropolitan area, and to understand the conditions under which it is most prevalent. Specifically, we provide evidence on the impact of a restaurant's evolving reputation, and of the competition it faces, on its decision to engage in positive and negative review fraud.

A growing
computer science literature has developed data-mining algorithms which leverage observable review characteristics, such as textual features and reviewers' social networks, to identify abnormal reviewing patterns (for example, see Akoglu et al. (2013)). A related strand of the literature has focused on constructing a gold standard for fake reviews that can be used as training input for fake-review classifiers. For example, Ott et al. (2012) construct such a fake review corpus by hiring users on Mechanical Turk, an online labor market, to explicitly write fake reviews.

In this paper,
we analyze fake reviews from a different perspective, and investigate a business's incentives to engage in review fraud. We analyze these issues using data from Yelp.com, focusing on reviews that have been written for restaurants in the Boston metropolitan area.

[1] See "A Rave, a Pan, or Just a Fake?" by David Segal, May 11, available at http://www.nytimes.com/2011/05/22/your-money/22haggler.html.
[2] See "Amazon reviewers brought to book" by David Smith, Feb. 04, available at http://www.guardian.co.uk/technology/2004/feb/15/books.booksnews.

Empirically, identifying fake reviews is difficult because the econometrician does not directly observe whether a review is fake. As a proxy for fake reviews, we use the results of Yelp's filtering algorithm, which predicts whether a review is genuine or fake. Yelp uses this algorithm to flag fake reviews, and to filter them off of the main Yelp page (we have access to all reviews that do not directly violate the terms of service, regardless of whether they were filtered). The exact algorithm is not public information, but the results of the algorithm are. With this in hand, we can analyze the patterns of review fraud on Yelp.

Overall,
roughly 16% of reviews are identified by Yelp as fake and are subsequently filtered. What does a filtered review look like? We first consider the distribution of star ratings. The data show that filtered reviews tend to be more extreme than published reviews. This observation relates to a broader literature on the distribution of opinion in user-generated content. Li and Hitt (2008) show that the distribution of reviews for many products tends to be bimodal, with reviews tending toward 1 and 5 stars and relatively few in the middle. Their argument is that this can be explained through selection, if people are more likely to leave a review after an extreme experience. Our results suggest that another factor increasing the proportion of extreme reviews is the prevalence of fake reviews.

Does
review fraud respond to economic incentives, or is it driven mainly by a small number of unethical restaurants that are intent on gaming the system regardless of the situation? If review fraud is driven by incentives, then we should see a higher concentration of fraudulent reviews when the incentives are stronger. Theoretically, restaurants with worse, or less established, reputations have a stronger incentive to game the system. Consistent with this, we find that a restaurant's reputation is a strong predictor of its decision to leave a fake review. We also find that restaurants are more likely to engage in fraud when they have fewer reviews.
This result motivates an additional explanation for the often-observed concentration of high ratings earlier in a business's life-cycle. Previous work attributed this concentration to interactions between a consumer's decision to contribute a review and prior reviews by other consumers (see Moe and Schweidel (2012), Godes and Silva (2012)). Our work suggests businesses' increased incentives to manipulate their reviews early on as an additional explanation for this observation. We also find that restaurants that have recently received bad ratings engage in more positive review fraud. By contrast, we find no link between negative review fraud and reputation. To some extent this is to be expected, since these are fake reviews that are presumably left by a business's competitors.

We also find that a restaurant's offline reputation is a determinant of
its decision to seek positive fake reviews. In particular, Luca (2011) finds that consumer reviews are less influential for chain restaurants, which already have firmly established reputations built by extensive marketing and branding. Jin and Leslie (2009) find that organizational form also affects a restaurant's performance in hygiene inspections. Consistent with this, we find that chain restaurants are less likely to leave fake reviews relative to independent restaurants. This contributes to our understanding of the ways in which a business's reputation affects its incentives with respect to review fraud.

In addition
to leaving reviews for itself, a restaurant may commit review fraud by leaving a negative review for a competitor. Empirically, we find that increased competition from nearby independent restaurants serving similar types of food is positively associated with negative review fraud. Interestingly, increased competition from nearby restaurants serving different types of food is not a significant predictor of negative review fraud. A potential explanation for this finding is that restaurants tend to compete based on a combination of location and cuisine. This explanation echoes the survey of Auty (1992), in which diners ranked food type and quality highest among a list of restaurant selection criteria, suggesting that restaurants do not compete on the basis of location alone. Our results are also consistent with the analysis of Mayzlin et al. (2012), who find that hotels with independently-owned neighbors are more likely to receive negative fake reviews.

Overall,
our findings suggest that positive review fraud is primarily driven by changes in a restaurant's own reputation, while negative review fraud is primarily driven by changing patterns of competition. For platforms looking to curtail gaming, our results provide insights both into the extent of gaming and into the circumstances in which it is more prevalent. Our results also provide insight into our understanding of ethical decision making by firms, which we show to be a function of economic incentives rather than a function of unethical firms. Finally, our work is closely related to the literature on organizational form, showing that the incentives of independent restaurants are quite different from the incentives of chains.

2 Related work

There is now extensive evidence that consumer reviews have a causal impact on demand in industries ranging from books to restaurants to hotels, among others (Chevalier and Mayzlin (2006), Luca (2011), Ghose et al. (2012)). However, there is considerably less agreement about whether reviews contain trustworthy information that customers should use.
For example, Li and Hitt (2008) argue that earlier reviewers tend to be more favorable toward a product relative to later reviewers, making reviews less representative of the typical buyer and hence less reliable. Looking at movie sales and reviews, Dellarocas et al. (2010) provide evidence that consumers are more likely to review niche products, but at the same time are more likely to leave a review if many other reviewers have contributed, suggesting other types of bias that may appear in reviews. In principle, if one knows the structure of any given bias that reviews exhibit, Dai et al. (2012) argue that the review platform can improve the reliability of information by simply adjusting and reweighing reviews to make them more representative.

Perhaps
the most direct challenge to the reliability of online information is the possibility of leaving fake reviews. Theoretically, Dellarocas (2006) provides conditions under which reviews can still be informative even if there is gaming. In concurrent but independent work, Anderson and Simester (2013) show that many reviews on an online apparel platform are written by customers who have no purchase record. These reviews tend to be more negative than other reviews. In contrast with our setting - where we look for economic incentives to commit review fraud - their work highlights reviews that are written by people without any clear financial incentive to leave a fake review. Within computer science, a growing literature has focused on the development of machine learning algorithms to identify review fraud. Commonly these algorithms rely either on data mining techniques to identify abnormal reviewing patterns, or employ supervised learning methods trained on hand-labeled examples. Some examples are Ott et al. (2012), Feng et al. (2012), Akoglu et al. (2013), Mukherjee et al. (2011, 2012), and Jindal et al. (2010).

To the best of
our knowledge, there exists only one other paper that analyzes the economic incentives of review fraud, and it uses a different empirical approach and setting. Mayzlin et al. (2012) exploit an organizational difference between Expedia and TripAdvisor (which are spin-offs of the same parent company with different features[3]) to study review fraud by hotels: while anyone can post a review on TripAdvisor, Expedia requires that a guest has paid and stayed before submitting a review. The authors observe that Expedia's verification mechanism increases the cost of posting a fake review. The study finds that independent hotels tend to have a higher proportion of five-star reviews on TripAdvisor relative to Expedia, and that competitors of independent hotels tend to have a higher proportion of one-star reviews on TripAdvisor relative to Expedia. Their argument is that there are many reasons that TripAdvisor reviews may be different from Expedia reviews, and many reasons that independent hotels may receive different reviews from chain hotels. However, they argue that if independent hotels receive favorable treatment relative to chains on TripAdvisor but not on Expedia, then this suggests that these reviews on TripAdvisor are fraudulent. The validity of this measure rests on the assumption that differences in the distributions of ratings across the two sites, and between different types of hotels, are due to review fraud. In our work, we do not rely on this assumption as we are identifying our effects entirely within a single review platform.
Our work also differs in that we are able to exploit the panel nature of our data, and analyze the within-restaurant role of reputation in the decision to commit review fraud. This allows us to show not only that certain types of restaurants are more likely to commit review fraud, but also that even within a restaurant, economic conditions influence the ultimate decision to commit review fraud.

[3] See http://www.iac.com/about/history.

Figure 1: Reviewing activity for Boston restaurants from Yelp's founding through 2012. Panel (a): published and filtered review counts by quarter. Panel (b): percentage of filtered reviews by quarter.

Finally,
we briefly mention the connection between our work, the literature on statistical fraud detection (e.g., see Bolton and Hand (2002)), and the related line of research on statistical models of misclassification in binary data (e.g., see Hausman et al. (1998)). These methods have been applied to uncover various types of fraud, such as fraudulent insurance claims and fraudulent credit card transactions. The key difference of our work is that our end goal isn't to identify individual fraudulent reviews; instead, we wish to exploit a noisy signal of review fraud to investigate the incentives behind it.

3 Description of Yelp and Filtered Reviews

3.1 About Yelp

Our analysis investigates reviews from the website Yelp.com,
which is a consumer review platform where users can review local businesses such as restaurants, bars, hair salons, and many other services. At the time of this study, Yelp receives approximately 100 million unique visitors per month, and counts over 30 million reviews in its collection. It is the dominant review site for restaurants. For these reasons, Yelp is a compelling setting in which to investigate review fraud. For a more detailed description of Yelp in general, see Luca (2011).

In this analysis, we focus on restaurant reviews in the metropolitan area of Boston, MA. We include in our analysis every Yelp review that was written from the founding of Yelp in 2004 through 2012, other than the roughly 1% of reviews that violate Yelp's terms of service (for example, reviews that contain offensive or discriminatory language). In total, our dataset contains 316,415 reviews for 3,625 restaurants.
Of these reviews, 50,486 (approximately 16%) have been filtered by Yelp. Figure 1a displays quarterly totals of published and filtered reviews on Yelp. Both Yelp's growth in terms of the number of reviews that are posted on it, and the increasing number of reviews that are filtered, are evident in this figure.

3.2 Fake and Filtered Reviews

The main challenge in empirically identifying review fraud is that we do not directly observe whether a review is fake. The situation is further complicated by the lack of a single standard for what makes a review fake.
The Federal Trade Commission's truth-in-advertising rules[4] provide some useful guidelines: reviews must be truthful and substantiated, non-deceptive, and any material connection between the reviewer and the business being reviewed must be disclosed. For example, reviews by the business owner, his or her family members, competitors, reviewers that have been compensated, or a disgruntled ex-employee are considered fake (and, by extension, illegal) unless these connections are disclosed. Not every review can be as unambiguously classified. The case of a business owner nudging consumers by providing them with instructions on how to review his business is in a legal grey area. Most review sites, whose objective is to collect reviews that are as objective as possible, frown upon such interventions and encourage business owners to avoid them.

To work around the limitation of not observing fake reviews, we exploit a unique Yelp feature: Yelp is the only major review site we know of that allows access to filtered reviews - reviews that Yelp has classified as illegitimate using a combination of algorithmic techniques, simple heuristics, and human expertise. Filtered reviews are not published on Yelp's main listings, and they do not count towards calculating a business's average star rating. Nevertheless, a determined Yelp visitor can see a business's filtered reviews after solving a puzzle known as a CAPTCHA.[5] Filtered reviews are,
of course, only imperfect indicators of fake reviews. Our work contributes to the literature on review fraud by developing a method that uses an imperfect indicator of fake reviews to empirically identify the circumstances under which fraud is prevalent. This technique translates to other settings where such an imperfect indicator is available, and relies on the following assumption: that the proportion of fake reviews is strictly smaller among the reviews Yelp publishes than among the reviews Yelp filters. We consider this to be a modest assumption whose validity can be qualitatively evaluated. In Section 4, we formalize the assumption, suggest a method of evaluating its validity, and use it to develop our empirical methodology for identifying the incentives of review fraud.

[4] See "Guides Concerning the Use of Endorsements and Testimonials in Advertising", available at http://ftc.gov/os/2009/10/091005revisedendorsementguides.pdf.
[5] A CAPTCHA is a puzzle originally designed to distinguish humans from machines. It is commonly implemented by asking users to accurately transcribe a piece of text that has been intentionally blurred - a task that is easier for humans than for machines. Yelp uses CAPTCHAs to make access to filtered reviews harder for both humans and machines. For more on CAPTCHAs, see Von Ahn et al. (2003).

3.3 Characteristics of filtered reviews

To the extent that Yelp is a content aggregator rather than a content creator, there is a direct interest in understanding reviews that Yelp has filtered. While Yelp purposely makes the filtering algorithm hard to reverse engineer, we are able to test for differences in the observed attributes of published and filtered reviews. Figure 1b displays the proportion of reviews that have been filtered by Yelp over time. The
spike in the beginning results from a small sample of reviews posted in the corresponding quarters. After this, there is a clear upward trend in the prevalence of what Yelp considers to be fake reviews. Yelp retroactively filters reviews using the latest version of its detection algorithm. Therefore, a Yelp review can be initially filtered, but subsequently published (and vice versa). Hence, the increasing trend seems to reflect the growing incentives for businesses to leave fake reviews as Yelp grows in influence, rather than improvements in Yelp's fake-review detection technology.

Should we expect the
distribution of ratings for a given restaurant to reflect the unbiased distribution of consumer opinions? The answer to this question is likely no. Empirically, Hu et al. (2006) show that reviews on Amazon are highly dispersed, and in fact often bimodal (roughly 50% of products on Amazon have a bimodal distribution of ratings). Theoretically, Li and Hitt (2008) point to the fact that people choose which products to review, and may be more likely to rate products after having an extremely good or bad experience. This would lead reviews to be more dispersed than actual consumer opinion. This selection of consumers can undermine the quality of information that consumers receive from reviews.

We argue that fake reviews may also contribute to the large dispersion that is often observed in consumer ratings. To see why, consider what a fake review might look like: fake reviews may consist of a business leaving favorable reviews for itself, or unfavorable reviews for its competitors. There is little incentive for a business to leave a mediocre review. Hence, the distribution of fake reviews should tend to be more extreme than that of legitimate reviews. Figure 2a shows the distributions of published and filtered reviews on Yelp. The contrast between the two distributions is consistent with these predictions.
Figure 2: Characteristics of filtered reviews. Panel (a): distribution of star ratings by published status. Panel (b): percentage of filtered reviews by user review count.

Legitimate reviews
are unimodal with a sharp peak at 4 stars. By contrast, the distribution of fake reviews is bimodal, with spikes at 1 star and 5 stars. Hence, in this context, fake reviews appear to exacerbate the dispersion that is often observed in online consumer ratings.

In Figure 2b we break down individual reviews by the total number of reviews their authors have written, and display the percentage of filtered reviews for each group. The trend we observe suggests that Yelp users who have contributed more reviews are less likely to have their reviews filtered.

We estimate the characteristics of filtered reviews in more detail by using the following linear probability model:

    Filtered_ij = b_i + γ′x_ij + ε_ij,    (1)

where the dependent variable Filtered_ij indicates whether the jth review of business i was filtered, b_i is a business fixed effect, and x_ij is a vector of review and reviewer characteristics including: star rating, (log of) length in characters, (log of) total number of reviewer reviews, and a dummy for the reviewer having a Yelp profile picture. The estimation results are shown in the first column of Table 1. In line with our observations so far, we find that reviews with extreme ratings are more likely to be filtered: all else equal, 1- and 5-star reviews are roughly 3 percentage points more likely to be filtered than 3-star reviews.
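To make the mechanics concrete, a within-business estimation of a model like Equation 1 can be sketched on synthetic data. Everything below - the sample sizes, the coefficient values, and the data-generating process - is hypothetical and only illustrates the fixed-effects transformation, not the paper's actual estimates:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic review-level data: businesses indexed by i, reviews by j.
n_biz, per_biz = 200, 30
biz = np.repeat(np.arange(n_biz), per_biz)
extreme = rng.integers(0, 2, biz.size).astype(float)   # 1- or 5-star dummy
log_len = rng.normal(6.0, 1.0, biz.size)               # log review length

# Hypothetical filtering probability: a business fixed effect, plus
# extreme ratings filtered more and longer reviews filtered less.
b = rng.normal(0.16, 0.05, n_biz)[biz]
p = np.clip(b + 0.05 * extreme - 0.03 * (log_len - 6.0), 0.0, 1.0)
filtered = (rng.random(biz.size) < p).astype(float)

def demean(v):
    """Subtract business means: the within (fixed-effects) transformation."""
    means = np.bincount(biz, weights=v) / np.bincount(biz)
    return v - means[biz]

# OLS on demeaned data; b_i is wiped out by the transformation.
X = np.column_stack([demean(extreme), demean(log_len)])
y = demean(filtered)
gamma, *_ = np.linalg.lstsq(X, y, rcond=None)
# gamma recovers the sign pattern used above: positive for extreme
# ratings, negative for review length.
```

The paper's specification includes further reviewer controls; this sketch only shows why demeaning within each business removes b_i before estimating γ.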
We also find that Yelp's review filter is sensitive to the review and reviewer attributes included in our model. For example, longer reviews, or reviews by users with a larger review count, are less likely to be filtered. Beyond establishing some characteristics of Yelp's filter, this analysis also points to the need for controlling for potential algorithmic biases when using filtered reviews as a proxy for fake reviews. We explain our approach to dealing with this issue in Section 4.

3.4 Filtered Reviews and Advertising on Yelp

Local business advertising
constitutes Yelp's major revenue stream. Advertisers are featured on Yelp search results pages in response to relevant consumer queries, and on the Yelp pages of similar, nearby businesses. Furthermore, when a business purchases advertising, Yelp removes competitors' ads from that business's Yelp page. Over the years, Yelp has been the target of repeated complaints alleging that its filter discriminates in favor of advertisers, going in some cases as far as claiming that the filter is nothing other than an extortion mechanism for advertising revenue.[6] Yelp has denied these allegations, and has successfully defended itself in court when lawsuits have been brought against it (for example, see Levitt v. Yelp Inc., and Demetriades v. Yelp Inc.). If such allegations were true, they would raise serious concerns as to the validity of using filtered reviews as a proxy for fake reviews in our analysis. Using our dataset, we are able to cast further light on this issue.

To do so, we exploit the fact that Yelp publicly discloses which businesses are current advertisers (it does not disclose which businesses were advertisers historically). Specifically, we augment Equation 1 by interacting the x_ij variables with a dummy variable indicating whether a business was a Yelp advertiser at the time we obtained our dataset. The results of estimating this model are shown in the second column of Table 1.
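The interaction-augmented design can be sketched as follows. The covariates and the outcome here are simulated stand-ins for the review attributes in x_ij - deliberately generated with no advertiser effect - so the example only shows how the interaction columns are built and why their coefficients should be near zero under no discrimination:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated review-level covariates and a current-advertiser dummy.
n = 5000
extreme = rng.integers(0, 2, n).astype(float)     # 1- or 5-star dummy
log_len = rng.normal(6.0, 1.0, n)
advertiser = rng.integers(0, 2, n).astype(float)

# Filtering outcome generated WITHOUT any advertiser effect.
p = np.clip(0.16 + 0.03 * extreme - 0.02 * (log_len - 6.0), 0, 1)
filtered = (rng.random(n) < p).astype(float)

# Design matrix: main effects plus advertiser interactions. If the
# filter treats advertisers the same, the interaction coefficients
# should be statistically indistinguishable from zero.
X = np.column_stack([
    np.ones(n), extreme, log_len, advertiser,
    advertiser * extreme, advertiser * log_len,
])
coef, *_ = np.linalg.lstsq(X, filtered, rcond=None)
interactions = coef[4:]   # advertiser-by-attribute effects, near zero here
```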
We find that none of the advertiser interaction effects are statistically significant, while the remaining coefficients are essentially unchanged in comparison to those in Equation 1. This suggests, for example, that neither 1- nor 5-star reviews were significantly more or less likely to be filtered for businesses that were advertising on Yelp at the time we collected our dataset.

This analysis has some clear limitations. First, since we do not observe the complete historic record of which businesses have advertised on Yelp, we can only test for discrimination in favor of (or against) current Yelp advertisers. Second, we can only test for discrimination in the present breakdown between filtered and published reviews. Third, our test obviously possesses no power whatsoever in detecting discrimination unrelated to filtering decisions. Therefore, while our analysis provides some suggestive evidence against the theory that Yelp favors advertisers, we stress that it is neither exhaustive nor conclusive. It is beyond the scope of this paper, and outside the capacity of our dataset, to evaluate all the ways in which Yelp could favor advertisers.

[6] See "No, Yelp Doesn't Extort Small Businesses. See For Yourself.", available at: http://officialblog.yelp.com/2013/05/no-yelp-doesnt-extort-small-businesses-see-for-yourself.html.

4 Empirical Strategy

In this section, we introduce our empirical strategy for identifying review fraud on Yelp. Ideally, if we could
recognize fake reviews, we would estimate the following regression model:

    f_it = β′x_it + b_i + τ_t + ε_it    (i = 1 … N; t = 1 … T),    (2)

where f_it is the number of fake reviews business i received during period t, x_it is a vector of time-varying covariates measuring a business's economic incentives to engage in review fraud, β are the structural parameters of interest, b_i and τ_t are business and time fixed effects, and ε_it is an error term. The inclusion of business fixed effects allows us to control for unobservable, time-invariant, business-specific incentives for Yelp review fraud. For example, Mayzlin et al. (2012) find that the management structure of hotels in their study is associated with review fraud. To the extent that management structure is time-invariant, business fixed effects allow us to control for this unobservable characteristic. Hence, when looking at incentives to leave fake reviews over time, we include restaurant fixed effects. However, we also run specifications without a restaurant fixed effect so that we can analyze time-invariant characteristics as well. Similarly, the inclusion of time fixed effects allows us to control for unobservable time-varying shocks that are common across businesses.

As is often the case in studies of gaming and corruption (e.g., see Mayzlin et al. (2012), Duggan
and Levitt (2002), and references therein), we do not directly observe f_it, and hence we cannot estimate the parameters of this model. To proceed, we assume that Yelp's filter possesses some positive predictive power in distinguishing fake reviews from genuine ones. Is this a credible assumption to make? Yelp appears to espouse the view that it is. While Yelp is secretive about how its review filter works, it states that the filter "sometimes affects perfectly legitimate reviews and misses some fake ones, too, but does a good job given the sheer volume of reviews and the difficulty of its task."[7] In addition, we suggest a subjective test to assess the assumption's validity: for any business, one can qualitatively check whether the fraction of suspicious-looking reviews is larger among the reviews Yelp
publishes, rather than among the ones it filters.

Formally, we assume that Pr[Filtered | Genuine] = a_0 and Pr[Filtered | Fake] = a_0 + a_1, for constants a_0 ∈ [0, 1] and a_1 ∈ (0, 1], i.e., that the probability a fake review is filtered is strictly greater than the probability a genuine review is filtered. Letting f_itk be a latent indicator of the kth review of business i at time t being fake, we model the filtering process for a single review as:

    Filtered_itk = a_0(1 − f_itk) + (a_0 + a_1)f_itk + u_itk,

where u_itk is a zero-mean independent error term. We relax this independence assumption later. Summing over all n_it reviews for business i in period t, we obtain:

    y_it = Σ_{k=1..n_it} Filtered_itk = Σ_{k=1..n_it} [a_0(1 − f_itk) + (a_0 + a_1)f_itk + u_itk] = a_0·n_it + a_1·f_it + u_it,    (3)

where y_it is the number of filtered reviews business i received in period t, and u_it is a composite error term. Substituting Equation 2 into the above yields the following model:

    y_it = a_0·n_it + a_1(β′x_it + b_i + τ_t + ε_it) + u_it.    (4)

It consists of observed quantities, unobserved fixed effects, and an error term. We can estimate this model using a within estimator, which wipes out the fixed effects. However, while we can identify the reduced-form parameters a_1β, we cannot separately identify the vector of structural parameters of interest, β. Therefore, we can only test for the presence of fraud through the estimates of the reduced-form parameters, a_1β. Furthermore, since a_1 ≤ 1, these estimates will be lower bounds to the structural parameters, β.

[7] See "What is the filter?", available at http://www.yelp.com/faq#what_is_the_filter.

4.1 Controlling for biases in Yelp's filter

So far,
we have not accounted for possible biases in Yelp's filter related to specific review attributes. But what if u_it is endogenous? For example, the filter may be more likely to filter shorter reviews, regardless of whether they are fake. To some extent, we can control for these biases. Let z_itk be a vector of review attributes. We incorporate filter biases by modeling the error term u_itk as follows:

    u_itk = δ′z_itk + ũ_itk,    (5)

where ũ_itk is now an independent error term. This in turn suggests the following regression model:

    y_it = a_0·n_it + a_1(β′x_it + b_i + τ_t + ε_it) + Σ_{k=1..n_it} δ′z_itk + ũ_it.    (6)

In z_itk, we include controls for: review length, the number of prior reviews a review's author has written, and whether the reviewer has a profile picture associated with his or her account. As we saw in Section 3, these attributes help explain a large fraction of the variance in filtering. A limitation of our work is that we cannot control for filtering biases in attributes that we do not observe, such as the IP address of a reviewer, or the exact time a review was submitted. If these unobserved attributes are endogenous, our estimation will be biased. Equation 6 constitutes our preferred specification.
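As a stylized illustration of the identification logic in Equations 2-4 (omitting the filter-bias controls z of Equations 5-6 for brevity), one can simulate a panel in which fake reviews respond to an incentive x_it and the filter catches them only noisily. All parameter values below are hypothetical; the point is that the within estimator applied to Equation 4 recovers a positive reduced-form coefficient close to a_1·β, which sits below the structural β:

```python
import numpy as np

rng = np.random.default_rng(1)

N, T = 300, 24
beta, a0, a1 = 0.5, 0.05, 0.6        # hypothetical structural parameters

i = np.repeat(np.arange(N), T)        # business index for each period
x = rng.normal(size=N * T)            # incentive covariate x_it
b = rng.uniform(1.0, 3.0, N)[i]       # business fixed effects
genuine = rng.poisson(5, N * T)       # genuine reviews per period
# Fake reviews respond to incentives, as in Equation 2.
fake = np.clip(np.round(beta * x + b + rng.normal(0, 0.3, N * T)),
               0, None).astype(int)
n = genuine + fake                    # total reviews n_it

# Filtering: genuine reviews filtered w.p. a0, fake ones w.p. a0 + a1,
# so y_it follows Equation 3 up to sampling noise.
y = rng.binomial(genuine, a0) + rng.binomial(fake, a0 + a1)

def within(v):
    """Demean by business: wipes out the fixed effects b_i."""
    m = np.bincount(i, weights=v) / np.bincount(i)
    return v - m[i]

X = np.column_stack([within(n.astype(float)), within(x)])
coef, *_ = np.linalg.lstsq(X, within(y.astype(float)), rcond=None)
# coef[1] estimates roughly the reduced form a1*beta: positive, but a
# lower bound on the structural beta because the filter misses fakes.
```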
5 Review Fraud and Own Reputation

This section discusses the main results, which are at the restaurant-month level and presented
which are at the restaurant-month level and presentedin Tables 3
and 4. The particular choice of aggregation granularity is driven
by the frequencyofreviewingactivity.
RestaurantsintheBostonareareceiveonaverageapproximately0.11-star
publishedreviews per month, and0.375-star reviews. Table 2contains
detailedsummarystatisticsofthethesevariables.Wefocusonunderstandingtherelationshipbetweenarestaurantsreputationandtheitsincentivestoleaveafakereview,
withtheoverarchinghypothesisthatrestaurantswithamore established,
or more favorable reputationhave less of anincentive toleave
fakereviews. Whilethereisnt asinglevariablethat
fullycapturesarestaurantsreputation,thereareseveralthatwefeelcomfortableanalyzing,includingitsnumberofrecentpositiveand
negative reviews, and an indicator for whether the restaurant is a
chain, or independentbusiness.
Theseareempiricallywell-foundedmetricsofarestaurantsreputation.Specically,
we estimate Equation 6, where we include in the vector x_it the following covariates: the number of 1-, 2-, 3-, 4-, and 5-star reviews received in period t − 1; the log of the total number of reviews the business had received up to and including period t − 1; and the age of the business in period t, measured in (fractional) years. To investigate the incentives of positive review fraud, we estimate specifications with the number of 5-star reviews per restaurant-month as the dependent variable. These results are presented in Table 3. Similarly, to investigate negative review fraud, we repeat the analysis with the number of 1-star reviews per restaurant-month as the dependent variable. We present these results in Table 4. Next, we discuss our results in detail.
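For concreteness, the period t − 1 star-count covariates can be built from review-level data along these lines. The tiny table, its column names, and the layout are hypothetical stand-ins for the actual dataset:

```python
import pandas as pd

# Hypothetical review-level data: one row per review.
reviews = pd.DataFrame({
    "restaurant": ["a", "a", "a", "b", "b", "a"],
    "month":      [1,   1,   2,   1,   2,   3],
    "stars":      [5,   1,   4,   3,   5,   2],
})

# Review counts per restaurant-month, one column per star rating.
counts = (reviews
          .pivot_table(index=["restaurant", "month"], columns="stars",
                       aggfunc="size", fill_value=0)
          .reindex(columns=range(1, 6), fill_value=0))

# Lag by one row within each restaurant to obtain the period t-1 counts
# (this assumes consecutive months; gaps would need reindexing first).
lagged = counts.groupby(level="restaurant").shift(1, fill_value=0)
```

The lagged frame can then be merged back onto the restaurant-month panel as the reputation covariates in x_it.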
5.1 Results: worsening reputation drives positive review fraud

Low ratings increase incentives for positive review fraud, and high ratings decrease them. One measure of a restaurant's reputation is its rating. As a restaurant's rating increases, it receives more business (Luca (2011)) and hence may have less incentive to game the system. Consistent with this hypothesis, in the first column of Table 3 we observe a positive and significant association between the number of published 1- and 2-star reviews a business received in period t − 1, and review fraud in the current period. Conversely, we observe a negative, statistically significant association between review fraud in the current period and the occurrence of 4- and 5-star published reviews in the previous period. In other words, a positive change to a restaurant's reputation - whether the result of legitimate or fake reviews - reduces the incentives of engaging in review fraud, while a negative change increases them.

Beyond the statistical significance of these results, we are also interested in their substantive economic impact. One way to gauge this is to compare the magnitudes of the estimated coefficients to the average value of the dependent variable. For example, on average, restaurants in our dataset received approximately 0.1 filtered 5-star reviews per month. Meanwhile, the coefficient estimates in the first column of Table 3 suggest that an additional 1-star review published in the previous period is associated with an extra 0.01 filtered 5-star reviews in the current period, i.e., an increase constituting approximately 10% of the observed monthly average. Furthermore, recalling that most likely a_1 < 1 (that is to say, Yelp does not identify every single fake review), this number is a lower bound for the increase in positive review fraud.

To assess the robustness of the relationship between recent reputational shocks and review fraud, we re-estimated the above model including the 6-month leads of published 1-, 2-, 3-, 4-, and 5-star review counts. We hypothesize that while to some extent restaurants may anticipate good or bad reviews and engage in review fraud in advance, the effect should be much smaller compared to the effect of recently received reviews. Our results, shown in column 2 of Table 3, suggest that this is indeed the case. The coefficients of the 6-month lead variables are near zero, and not statistically significant at conventional significance levels. The only exception is the coefficient for the 6-month lead of 5-star reviews (p