Restaurant Survival Analysis with Heterogeneous Information

Jianxun Lian∗
University of Science and Technology of China
Hefei, China
[email protected]

Fuzheng Zhang
Microsoft Research
Beijing, China
[email protected]

Xing Xie
Microsoft Research
Beijing, China
[email protected]

Guangzhong Sun
University of Science and Technology of China
Hefei, China
[email protected]
ABSTRACT
For shopkeepers, one of the biggest common concerns is whether their business will thrive or fail in the future. With the development of new ways to collect business data, it is possible to leverage knowledge from multiple domains to build an intelligent model for business assessment. In this paper, we discuss the potential indicators of the long-term survival of a physical store. To this end, we study factors from four pillars: geography, user mobility, user ratings, and review text. We start by exploring the impact of geographic features, which describe the location environment of the retail store. The location and nearby places play an important role in the popularity of the shop; usually, less competition and more heterogeneity are better. Then we study user mobility. It can be viewed as supplementary to the geographical placement, showing how the location attracts users from everywhere. Another important factor is how well the shop serves and satisfies users. We find that restaurant survival prediction is a hard task that cannot be solved simply using consumers' ratings or sentiment metrics. Compared with conclusive and well-formatted ratings, the varied review words provide more insight into the shop and deserve in-depth mining. We adopt several language models to fully explore the textual message. Comprehensive experiments demonstrate that review text indeed has the strongest predictive power. We further compare different cities' models and find the conclusions are highly consistent. Although we focus on the restaurant category in this paper, the method can be easily extended to other shop categories.
∗This work was done when Jianxun Lian was an intern at Microsoft Research, Beijing, P.R. China.

© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License.
WWW 2017 Companion, April 3–7, 2017, Perth, Australia.
ACM 978-1-4503-4914-7/17/04.
http://dx.doi.org/10.1145/3041021.3055130
Keywords
restaurant survival analysis; location-based services; data mining
1. INTRODUCTION
How their business will fare in the future is an important concern for all shopkeepers. Knowing about long-term trends, shopkeepers can take corresponding actions in advance. For instance, if they know the store will face a crisis in a matter of months, they could take steps to avoid the misfortune, such as changing the style of the store, or even consider choosing a new location to minimize economic losses. Usually, store owners make long-term decisions based on empirical judgement. Due to limited data sources and a lack of analytic tools, it has traditionally been a challenge to make data-driven decisions.
Bankruptcy prediction is a common topic in the management and finance literature. However, existing studies [31, 22, 25, 11, 20, 21, 13] are usually limited to the analysis of financial factors, such as liquidity, solvency, and profitability. In addition, the datasets used are usually small due to the challenge of obtaining financial data. With the development of information techniques, especially the growth of online location-based services, a large amount of business-related data can be collected through the Internet. For example, people may post check-ins at a point of interest (POI) they are visiting; after consuming in a shop, they can write reviews on Yelp to show how much they like the shop. Thus there is potential to exploit heterogeneous information to build automatic business intelligence tools for enhancing the decision process. In this paper, we take advantage of both geographic analysis and user behavior analysis to study whether a physical store will close down in the future. Specifically, we explore various factors under the guidance of the following considerations: (H1) the geographical placement of the store plays an important role in the store's operation; (H2) people's offline mobility patterns to the store as well as its nearby places influence the business; (H3) users' rating scores (e.g., on Yelp) are explicit evaluations of the store from the customers' point of view; (H4) besides well-formatted rating scores, review words contain richer information which a simple numeric score does not cover.
Recent works have studied the economic impact of geographic and user mobility factors on retail stores [19, 12]. As formulated in these works, geographic signals include the types and density of nearby places, and user mobility includes transitions between venues or the incoming flow of mobile users from distant areas. Inspired by them, we first analyze these two types of features. In fact, user mobility features are quite correlated with geographical features, because the former reflects spatial character in terms of human popularity. We further bring in users' reviews as an important data source, including rating scores and review text. After consuming at a store, users can rate the store on multiple aspects, such as environment, cost, and taste, on platforms like Yelp and Dianping. These numeric values summarize customers' overall opinion, but they are not as detailed as the words in the review. Through experiments we find that words provide more predictive power than the simple numeric values.
To make the work more specific and accurate, we focus on the shop category of restaurants. The main data source we use is the Chinese website www.dianping.com, on which the overwhelming majority of venues are food service ones. Additionally, China has a variety of restaurant categories, for example, Cantonese restaurants, Szechuan restaurants, and Shaanxi noodle restaurants. Thus, as a specific type of store, restaurants are interesting in their diversity and worth in-depth mining. What is more, the analysis method is general and can be easily applied to other types of stores.
The contributions of this paper are summarized as follows:
• Different from traditional bankruptcy studies which usually focus on financial variables, we propose an approach to conduct restaurant survival analysis from exogenous factors which can be obtained from big data over the Internet. Note that our purpose is not to provide better accuracy than existing models which make full use of financial factors. The primary advantages are that our approach covers various angles from heterogeneous data and also scales to a large number of restaurant samples.
• We provide an in-depth analysis of geography, mobility, and user opinions, and demonstrate which are the relatively stronger predictors. For example, neighbor entropy turns out to be the best predictor from the geographic perspective; users' textual messages are far more important than their numeric ratings; restaurants which offer attractive group purchases but serve poor food have a higher probability of closing; and restaurants holding core competitiveness (time-honored brand, well-deserved reputation, crowded consumers, state-run, etc.) tend to survive, which is in sync with common sense.
• We conduct comprehensive experiments on three different cities, and find that the conclusions are quite consistent. Meanwhile, integrating all the predictors leads to the most accurate model, which demonstrates the necessity of including a variety of features.
The rest of this paper is organized as follows. In Section 2 we describe the essential information about our dataset. In Section 3 we define and analyze the geographical features. In Section 4, we give an analysis of user mobility. In Section 5, we study online rating scores and exploit various methods to mine review text. After this we provide experiment details on combining different models and on different cities. In Section 7 we summarize the related work. Finally, we give the conclusion in Section 8.
2. DATA AND PROBLEM STATEMENT
In this section we first provide some essential information about the dataset used, including the data collection process and the basic statistics of the collected data. Then we raise the restaurant survival prediction problem and show the performance of a naive solution.
2.1 Data Collection
The main data source we use in this paper is Dianping.com. Dianping, known as the "Yelp of China", is the largest consumer review site in China. It offers multi-level knowledge through its diverse functions such as reviews, check-ins, and POI metadata (including geographical information and shop attributes). We use the LifeSpec data crawling platform [32] to retrieve all data related to each shop (from the shop's opening time to our crawling time). Specifically, for each shop we crawl:

(1) the meta information, including name, location (city, latitude, longitude, and detailed address), category, and price;

(2) all the reviews written by consumers; a review is comprised of review words and 5 scores, covering overall rating, taste, environment, service, and price;

(3) all the check-ins posted by users.

All the data we have crawled is publicly available on the website. The data crawling process finished in April 2014.
In the literature on churn analysis, a user is usually defined as a churner if he/she does not have any data during the last several periods of the dataset. However, some shops may not be popular online, often resulting in receiving no reviews or check-ins for a long period, say several months. Therefore, it is not proper to define shop failure based on the review or check-in numbers across a period. Fortunately, we find that Dianping has an API to query the status of a shop. In general, all statuses can be grouped into four categories: (1) normal shop, which means the shop is still operating; (2) closed shop, meaning the shop has already closed down; (3) suspended shop, meaning the business is suspended for a certain time; the reason for suspension varies and is unknown, and the shop may or may not reopen; (4) others, including a few special cases such as unqualified shops and applicative shops. We crawled shops' statuses in March 2016 and use the shop status as the label.
2.2 Basic Statistics
Our entire Dianping dataset captures the period ranging from April 2003 (when dianping.com was established) to June 2014 (when we finished crawling content data), as well as the shops' snapshot status in March 2016. Considering that spatial context may change over such a long time, for restaurant analysis we focus on a single year and assume that geographical placement does not change much within one year. We decided to use the 2012 data because we have the most abundant data for this year. The basic statistics for
Figure 1: Shop categories and their percentage. The largest category accounts for 50.33% of the shops.
Table 1: Basic statistics of the Dianping dataset for year 2012

#check-ins  #reviews   #cities  #shops
9,270,299   4,576,587  349      409,602
Table 2: Basic statistics of the dataset for Shanghai, Beijing, and Guangzhou

City       #check-ins  #reviews   #shops
Shanghai   4,027,503   1,980,914  76,190
Beijing    1,710,396   826,772    50,917
Guangzhou  261,273     156,844    17,747
Figure 2: (a) The distribution of the 23 restaurant subcategories in Beijing; the highest point is snack bar, which accounts for 32.8%. (b) The cumulative distribution function of the review count per shop.
2012 are listed in Table 1.¹ Among the 349 cities in China, Shanghai, Beijing, and Guangzhou are the most popular cities in our dataset, and their statistics are shown in Table 2. As we can see, the three cities account for 64.8% of the total review count, 64.7% of the total check-ins, and 35.4% of the total number of shops.
Rather than building a unified model for all shops, in this paper we limit the study to restaurants. Restaurants are not only the largest shop category in quantity in our dataset, but also have the biggest number of subcategories. Figure 1 shows that half of the shops in our dataset belong to the restaurant category. There are 23 subcategories of restaurants and their distribution is shown in Figure 2a, from which we can observe that the biggest subcategory is snack bar.
¹We only count shops which have at least one review from 2012.
Table 3: Statistics of restaurants which received no fewer than 10 reviews in 2012. Restaurants that closed early (before the end of 2012) are removed. We use this set for training and testing.

              Shanghai  Beijing  Guangzhou
#restaurants  12,990    8,615    2,325
closed ratio  37.4%     28.4%    30.8%
In Figure 2b we plot the cumulative distribution function (CDF) of the number of reviews a shop has. As can be observed, 37.7% of the shops have no fewer than 10 reviews. Since we are mining users' online opinions, in the prediction task we focus on the shops which have no fewer than 10 reviews in order to ensure enough data to build the model. Next we plot the shop status distribution. As shown in Figure 3, the ratio of closed status in the restaurant group is as high as 28.6%, which is significantly higher than that of non-restaurants. It indicates that restaurants differ from other types of shops not only in quantity or diversity but also in stability, warranting their own in-depth study. Finally, we remove the restaurants which closed before the end of 2012 from our learning set, since long-term prediction is not relevant to them. The final statistics of the learning set are listed in Table 3. In the experiments, we randomly split the dataset into a training set (70%), a validation set (15%), and a test set (15%).
2.3 Problem Statement
The restaurant survival prediction problem can be stated as follows: given the heterogeneous data (geographical information, user mobility data, online scores, and review text) in 2012, we want to predict whether the restaurant will close down before March 2016. We use the restaurants which belong to the normal shop or closed shop categories for this study. The first thing that comes to mind is that consumers' satisfaction may influence the future of a restaurant. This leads us to ask: can the task of restaurant survival prediction be solved simply using review scores and consumers' sentiment data? To verify, we use SnowNLP² to conduct a sentiment analysis on the review text. For each review of the shop, we can get a sentiment score s(r) ∈ [0, 1], with 0 indicating negative and 1 indicating positive. We calculate the average/minimum/maximum sentiment score for each restaurant, and use these three scores as features to build a logistic regression model. The AUC is 0.52, which is just slightly better than a random guess. Similarly, we design several features based on consumers' rating scores. The AUC is 0.6136, which is not satisfactory. Now we ask: (1) Can we build a more accurate model for restaurant survival prediction? (2) What factors highly correlate with the future of restaurants?
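The three sentiment features of this baseline can be sketched as follows. In practice the per-review scores would come from SnowNLP; here the values are hypothetical:

```python
from statistics import mean

def sentiment_features(review_sentiments):
    """Aggregate per-review sentiment scores s(r) in [0, 1] into the
    three restaurant-level features used by the baseline model:
    (average, minimum, maximum)."""
    return (mean(review_sentiments),
            min(review_sentiments),
            max(review_sentiments))

# Hypothetical SnowNLP scores for one restaurant's four reviews.
features = sentiment_features([0.9, 0.4, 0.75, 0.1])
```

These three values per restaurant form the input matrix for the logistic regression baseline described above.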
3. GEOGRAPHICAL MODEL
We expect that a shop's business is to some extent dependent on its location. Motivated by [19, 12], we design spatial metrics and study their predictive power. When studying the performance of these metrics in our scenario, we use Beijing's data for illustration. Later, in Section 6, we will compare the performance on different cities. Formally, we denote the set of all the shops in a city as S.
²https://github.com/isnowfy/snownlp
Figure 3: The status in March 2016 of restaurants and non-restaurants which were alive in 2012.
For each restaurant r, its neighbor set, denoted by N(r) = {s ∈ S : distance(r, s) ≤ d}, is defined as all shops that lie within a radius of d meters around it; in the experiments we empirically set d to 500 meters. The category of a shop is denoted by γ(r), and the entire category set by Γ.
3.1 Predictors
Density: Although the prediction objects are restaurants, in predictor design we consider all types of shops when studying context. Density is calculated as the number of shops in the restaurant's neighborhood. It is an indicator of the popularity around the restaurant:

f_r^{D} = |N(r)|    (1)
Neighbor Entropy: We refer to N_γ(r) as the number of shops of category γ near the restaurant r. The neighbor entropy metric is defined as:

f_r^{NE} = − Σ_{γ∈Γ} (N_γ(r) / |N(r)|) × log(N_γ(r) / |N(r)|)    (2)

A high entropy value means more diversity in terms of facilities within the area around the shop. A small entropy value indicates that the functionality of the area is biased towards a few specific categories, e.g., a working area or a residential area.

Competitiveness: Restaurants may have different cuisine styles and people may have different dining preferences [33]. We assume that most competition comes from the nearby restaurants of the same category. To measure competitiveness we count the proportion of neighbors of the same category:

f_r^{Com} = N_{γ(r)}(r) / |N(r)|    (3)
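As a minimal sketch (assuming the neighbor set N(r) has already been materialized as a list of shop categories), the three predictors above can be computed as:

```python
import math
from collections import Counter

def geo_predictors(neighbor_categories, own_category):
    """Density (Eq. 1), neighbor entropy (Eq. 2), and competitiveness
    (Eq. 3) from the categories of all shops within radius d of a
    restaurant, e.g. ["cafe", "bank", ...]."""
    n = len(neighbor_categories)
    counts = Counter(neighbor_categories)
    density = n
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    competitiveness = counts[own_category] / n
    return density, entropy, competitiveness

# Toy neighborhood: two same-cuisine restaurants, one cafe, one bank.
d, e, c = geo_predictors(["sichuan", "sichuan", "cafe", "bank"], "sichuan")
```

Here the diverse toy neighborhood yields a high entropy and a competitiveness of one half, since half the neighbors share the restaurant's cuisine.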
Quality by Jensen: This metric encodes the spatial interactions between different place categories. It was first defined by Jensen et al. [18], and the authors of [19] use the inter-category coefficients to weight the desirability of a location for store placement. Formally, we have:

f_r^{QJ} = Σ_{β∈Γ} log(κ_{β→γ(r)}) × (N_β(r) − \bar{N}_β(r))    (4)

where \bar{N}_β(r) means how many shops of category β are observed on average around shops of type γ(r). The inter-type attractiveness coefficient κ_{β→γ(r)} is defined as:

κ_{β→γ(r)} = ((N − N_β) / (N_β × N_{γ(r)})) × Σ_{p : γ(p)=β} N_{γ(r)}(p) / (N(p) − N_{γ(p)}(p))    (5)
Category Demand: Inspired by the Quality by Jensen metric, we propose a simplified category attractiveness measure, which we name category demand:

f_r^{CD} = (1 / N_{γ(r)}(r)) × Σ_{β∈Γ} (N_β(r) / |N(r)|) × \bar{N}_{γ(r)}(β)    (6)

where \bar{N}_{γ(r)}(β) denotes how many shops of category γ(r) are observed on average around shops of type β. Basically, category demand is the ratio between the expected number of shops of category γ(r) at its location and the real number of shops of category γ(r). When f_r^{CD} is larger than 1.0, the shop meets the demands of the position and is expected to have a good business.
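To make Eq. (6) concrete, here is a toy computation under our reading of the definition (the sum runs over the neighbor categories β); all counts are invented for illustration:

```python
def category_demand(own_count, neighbor_counts, avg_own_around):
    """Category demand (Eq. 6): the expected number of shops of the
    restaurant's own category at this location divided by the actual
    number own_count = N_{gamma(r)}(r).
    neighbor_counts: {beta: N_beta(r)}, shops of each category nearby.
    avg_own_around:  {beta: average number of own-category shops
                      observed around shops of category beta}."""
    n = sum(neighbor_counts.values())
    expected = sum((cnt / n) * avg_own_around[b]
                   for b, cnt in neighbor_counts.items())
    return expected / own_count

# Two own-category restaurants nearby; offices attract 4 such
# restaurants on average, malls 6 (invented numbers).
fcd = category_demand(own_count=2,
                      neighbor_counts={"office": 6, "mall": 2},
                      avg_own_around={"office": 4.0, "mall": 6.0})
# expected = (6/8)*4 + (2/8)*6 = 4.5, so fcd = 2.25 > 1.0
```

A value above 1.0, as here, would indicate that the location could support more shops of this category than are actually present.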
3.2 Results
We use logistic regression to perform binary classification based on the above predictors. We evaluate the performance in terms of area under the ROC curve (AUC) [6] because it is not influenced by the class imbalance problem. Figure 4a presents the AUC results. Among the individual features, neighbor entropy performs better than the other three, which indicates that the heterogeneity of the nearby area plays the most important role among the geographic attributes. Density, competitiveness, and Quality by Jensen show similar levels of predictive power. Combining all the features leads to a significantly (t-test p-value: 0.026) better AUC than the best individual feature.
4. MOBILITY ANALYSIS
The authors of [19] find that deeper insight into human mobility patterns helps improve local business analytics. People's mobility can directly reflect a place's popularity. While various kinds of data sources can be employed to mine mobility patterns, e.g., taxi trajectories and bus records as used in [10], we use check-ins to represent human mobility as [19] does. A check-in can be represented as a triple <u, s, t>, containing the user's id, the shop's id, and the event timestamp. To clean up the data, we first remove outliers by filtering out users who post check-ins too frequently (e.g., more than 100 times per day), and deleting successive check-ins made within a very short period (e.g., 1 minute) or at the same place. The density distribution is shown in Figure 5b. As can be observed, the check-in distribution greatly coincides with the shop distribution (Figure 5a). There are four main concentrated regions which are considered the most flourishing areas in Beijing: Zhongguancun, China National Trade Center, Xidan Commercial Street, and Wangfujing Street.
4.1 Predictors
Area Popularity: We use two values for mobility popularity: the number of total check-ins at the shop itself, and the number of check-ins near the shop:

f_r^{AP2} = |{<u, s, t> ∈ CI : distance(s, r) ≤ d}|    (7)

where CI denotes the entire check-in set.

Transition Density: We define a user transition as happening when a user posts two consecutive check-ins (c_i, c_j) within 24 hours, and denote the entire set of transitions
Figure 4: AUC performance comparison for individual predictors of different groups: (a) geographical predictors; (b) mobility predictors; (c) review-score predictors. In all three charts we observe the best performance when combining all individual predictors. The classifier used is logistic regression.
Figure 5: Heat maps of the density distributions of (a) shops and (b) check-ins in Beijing.
as Ts. Then transition density is defined as the number of transitions whose start and end locations are both near shop r:

f_r^{TD} = |{(c_i, c_j) ∈ Ts : distance(s_{c_i}, r) ≤ d ∧ distance(s_{c_j}, r) ≤ d}|    (8)

Incoming Flow: The number of transitions whose start place is outside shop r's neighborhood but whose end place is inside it:

f_r^{IF} = |{(c_i, c_j) ∈ Ts : distance(s_{c_i}, r) > d ∧ distance(s_{c_j}, r) ≤ d}|    (9)

This metric indicates how well the area attracts customers from remote regions.

Transition Quality: This measures the potential number of customers that might be attracted from shop r's neighbors:

f_r^{TQ} = Σ_{s∈S : distance(s,r)≤d} σ_{γ(s)→γ(r)} × CI_s    (10)

σ_{γ(s)→γ(r)} = E[ |{(c_i, c_j) ∈ Ts : s_{c_i} = s ∧ γ(s_{c_j}) = γ(r)}| / CI_s ]    (11)

where CI_s is the number of check-ins at shop s, and σ_{γ(s)→γ(r)} is the expected probability of a transition from category γ(s) to category γ(r).

Peer Popularity: This metric assesses shop r's relative popularity in comparison with shops of the same category:

f_r^{PP} = CI_r / \bar{CI}_r    (12)

where \bar{CI}_r means how many check-ins the shops of category γ(r) have on average. We use f_r^{PP} instead of the restaurant's absolute check-in number to eliminate the popularity bias caused by the nature of the store. People are more likely to post check-ins at some types of shops, like Starbucks, while they do not like to check in at Shaxian Refection, a famous low-cost restaurant in China. So f_r^{PP} reflects mobility popularity better through normalization.
4.2 Performance
Figure 4b presents the AUC performance of the mobility predictors with logistic regression. Among the individual predictors, peer popularity is the strongest and transition quality the weakest. This is reasonable because (1) the shop's own popularity reflects its business status better than the popularity of the area around it, and (2) people tend to choose nearby restaurants for dinner. Again, combining all mobility features gives a significantly (p-value < 0.01) better AUC than the best individual one (peer popularity).
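For concreteness, the transition-based counts of Eqs. (8) and (9) reduce to simple filters over the transition set; a sketch with a hypothetical near() predicate standing in for the distance test:

```python
def mobility_counts(transitions, near):
    """Transition density (Eq. 8) and incoming flow (Eq. 9) for one
    restaurant. `transitions` holds (start_shop, end_shop) pairs built
    from consecutive check-ins within 24 hours; `near(shop)` tells
    whether a shop lies within d metres of the restaurant."""
    density = sum(1 for s, e in transitions if near(s) and near(e))
    incoming = sum(1 for s, e in transitions if not near(s) and near(e))
    return density, incoming

# Toy data: shops "a" and "b" are near the restaurant, "z" is far away.
near = lambda shop: shop in {"a", "b"}
trans = [("a", "b"), ("z", "a"), ("z", "z"), ("b", "a")]
td, inflow = mobility_counts(trans, near)
```

In this toy set, two transitions stay inside the neighborhood and one arrives from outside, illustrating how the two predictors separate local circulation from incoming flow.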
5. PREDICTING WITH ONLINE REVIEWS
Online reviews directly reflect customers' satisfaction with the restaurants, so the data is a valuable resource worth mining. In Section 2.3, we provided the initial results from rating scores only. In this section we go deeper with the review data.
5.1 Rating Values
When writing a review for a restaurant, the consumer is asked to provide five scores on different aspects: (1) overall rating, (2) consumption level, (3) taste, (4) environment, and (5) service quality. The rating scores are scaled from 0 to 4, except for consumption level. The distribution of scores is shown in Figure 6. The majority of users prefer to give a medium score, such as 2 or 3.

For each type of score we compute the average, maximum, and minimum values as features. The prediction results are shown in Figure 4c. The best individual score is the overall rating, which can be regarded as a score summarizing the other four ratings. Using all rating scores yields a significantly better AUC than the best individual score.
Figure 6: The distribution of the different review ratings.
1. … us a small piece of beef steak and two drinks. I will never come here again!!!

2. The food here is terrible. We almost ate nothing and left in a moment. This restaurant is far worse than the one next to the museum.

The first reviewer points out that the price is too high while the portion is too small. The second reviewer is complaining about the taste of the food. Both provide helpful knowledge about the potential problems of the restaurant. Inspired by this, as well as by the observations in Section 5.1, we examine whether we can mine more knowledge beyond conclusive scores by exploiting textual information.
Bag-of-Words. First we employ the bag-of-words (BOW) model and use words as predictors. We collect all the reviews and then segment the sentences into words. To remove infrequent and unhelpful words, we use the χ² statistic (CHI) [26] to select the top k_voc most useful words as the text representation:

f_r^{bow} = (wc_1, wc_2, ..., wc_{k_voc})    (13)

where wc_i is the frequency of the i-th word. Figure 7a illustrates the AUC of the BOW model on the Beijing dataset with different k_voc settings. We observe that k_voc = 1000 performs better than the other settings. Ideally we should observe a non-decreasing trend when increasing k_voc. In our case, performance drops from k_voc = 8000 due to the limited number of training instances. In the Beijing dataset we have 8,615 instances, so when k_voc increases, the curse of dimensionality occurs and the amount of data is not sufficient to train an optimal model. Thus in the next step we focus on language models which can reduce the dimension of the textual features.
Word Embedding. Word embeddings are dense, low-dimensional representations of words [29, 28]. Each word is represented as a D_we-dimensional continuous-valued vector, where D_we is a relatively small number (100 in our experiments). Similar words have similar vectors. We train a model on our corpus using the Word2Vec toolkit³ and get a vector for each word w:

π^w = (π_1^w, π_2^w, ..., π_{D_we}^w)    (14)

³https://code.google.com/archive/p/word2vec/
Figure 7: Performance analysis for review text: (a) BOW with various vocabulary sizes; (b) comparison of different textual models (LDA, RNNLM, Paragraph Vector, Word Embedding, BOW). Logistic regression is used as the classification method.
Similarly, we refer to a restaurant's representation as a D_we-dimensional vector, which is a TF-IDF-weighted average over all words that have ever appeared in the restaurant's reviews:

π_i(r) = Σ_{p∈RV(r)} wc_p × log(1/dc_p) × π_i^p,  for 1 ≤ i ≤ D_we    (15)

f_r^{WE} = (π_1(r), π_2(r), ..., π_{D_we}(r))    (16)

where RV(r) indicates the review set of restaurant r, and dc_p means the number of restaurants containing word p in their review sets. Finally, we refer to f_r^{WE} as the representation of the restaurant.
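A sketch of the weighted averaging behind Eqs. (15)–(16), assuming the word vectors are already trained. Note that we use the conventional IDF weight log(N/dc_p) here, so the weighting is an assumption rather than a verbatim transcription of Eq. (15):

```python
import math

def restaurant_embedding(review_words, word_vecs, doc_freq, n_restaurants):
    """TF-IDF-weighted average of the word vectors of all words in a
    restaurant's reviews. `review_words` keeps repeats (term frequency),
    `word_vecs` maps word -> vector, and `doc_freq` maps word -> number
    of restaurants whose reviews contain the word."""
    dim = len(next(iter(word_vecs.values())))
    vec = [0.0] * dim
    for w in set(review_words):
        tf = review_words.count(w)                   # wc_p
        idf = math.log(n_restaurants / doc_freq[w])  # assumed IDF form
        for i in range(dim):
            vec[i] += tf * idf * word_vecs[w][i]
    return vec

# Two toy 2-dimensional word vectors (D_we = 2 for readability).
vecs = {"delicious": [1.0, 0.0], "closed": [0.0, 1.0]}
df = {"delicious": 10, "closed": 100}
emb = restaurant_embedding(["delicious", "delicious", "closed"],
                           vecs, df, n_restaurants=1000)
```

The rarer word contributes a larger per-occurrence weight, which is the intent of the IDF factor in Eq. (15).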
Paragraph Vector. Building on the word embedding model, Le et al. [23] propose Paragraph Vector, which learns continuous distributed vector representations for pieces of text. We use its implementation in Gensim⁴ to generate an embedding for each review. The representation of a restaurant is then the average of its reviews' embeddings, weighted by each review's text length.
Neural Language Model. The authors of [3] propose a method to learn micro-post representations based on the Elman network [15, 27]. Basically, it is a recurrent neural network-based language model aimed at predicting the next word's probability distribution given the previous words. The architecture is shown in Figure 8. The input w(t) ∈ R^{k_voc} is the one-hot representation of the word at time t. The hidden layer h(t) ∈ R^{D_rnn}, also known as the context layer, is computed from w(t) and h(t − 1), the context layer at time t − 1. D_rnn is the size of the hidden layer, which is set to 100 in the experiments. The output y(t) ∈ R^{k_voc} is the probability distribution of the word at time t + 1.

We use the neural language model to generate the restaurant's representation. We build a recurrent neural network using CNTK [1]; training took 3.5 days on an NVIDIA GK107 GPU. Then, for each review p_i, we feed its textual content into the neural network word by word. We take h(T_{p_i}) as the representation of review p_i, where T_{p_i} is the word count of p_i and h(T_{p_i}) is the state of the hidden layer at the last word of p_i. A restaurant's representation is the average of all its reviews' representations:
f_r^{NLM} = (1/|RV(r)|) × ( Σ_{p∈RV(r)} h(T_p)_1, Σ_{p∈RV(r)} h(T_p)_2, ..., Σ_{p∈RV(r)} h(T_p)_{D_rnn} )    (17)
⁴https://radimrehurek.com/gensim/
Figure 8: The recurrent neural network for language modelling.
Topic Model. Another way to represent a restaurant is to generate its topic distribution. We concatenate all the reviews belonging to the same restaurant to form a document. We exploit Latent Dirichlet Allocation (LDA) [5] to model the topics. In LDA, each document is represented as a probability distribution over topics, and each topic is represented as a probability distribution over words. We use the topic distribution vector as the restaurant's representation.
5.3 Results
We set k_voc = 1000 for the BOW predictors owing to its superior performance. Figure 7b shows the performance of each model. The RNN language model does not work as well as word embedding and bag-of-words. One possible reason is that textual context is not as important as the words themselves in our case. Another possible reason is that a simple recurrent neural network might not be able to keep long-term dependencies. Unlike tweets, which are usually short, reviews can be much longer. Due to vanishing gradients, the RNN model cannot model the connection between the final outputs and earlier input words. In the future we will enhance the RNN with Long Short-Term Memory units [16].

Unexpectedly, LDA does not work in our task. The possible explanations are: (1) the topics in our case can be regarded as restaurants' characteristics, such as food style; however, for restaurant survival prediction, topic is not as good a signal as opinion; (2) it might not be proper to concatenate all the reviews of the same restaurant into one document, since different reviews may concentrate on different aspects.

Paragraph Vector performs slightly worse than word embedding, which to some extent verifies our guess that textual context is not as important as the words themselves in our task. Since the two models share very similar algorithms, we only use one of them for further experiments. In the next section we will use word embedding and BOW as textual predictors.

Since review text plays such an important role in prediction, we use the χ² score to select the top 10 words related to alive and closed restaurants respectively, and list them in Table 4 (words are translated from Chinese).

Table 4: Top informative words for the two types of restaurants (translated from Chinese). Words are selected based on χ² score.

type    top words
alive   time-honored brand; from childhood; well-deserved reputation; crowded; a dozen years; well-known; early morning; not tire; state-run; must-try
closed  group purchase; original price; four people set meal; sluggish; Meituan; double meal; lack of customers; booth; catfish; leaflet; LaShou Group

The alive row lists the key words which indicate a higher probability for a restaurant to survive. These words describe the strengths of the restaurant, including already having a long history (time-honored brand, from childhood, a dozen years), having a strong reputation (well-deserved reputation, well-known), being popular (crowded), serving delicacies (not tire, must-try), and foundation (state-run). The key words for closed restaurants are more interesting. Meituan⁵ and LaShou Group⁶ are two famous Chinese group-buying websites. It seems that restaurants which offer attractive group purchases but actually serve disappointing food have a higher probability of closing in the next few years. On the other hand, the story behind words like original price, double meal, and leaflet is that consumers are complaining about the food or service: consumers feel that the reality of the food is a long way from the image on the leaflet. Lastly, words like sluggish and lack of customers directly describe the gloomy status of the business, which obviously makes it hard for the restaurant to survive.
6. COMBINING MODELS
In previous sections we have studied the individual predictive power of various features. Now we want to figure out how performance can be improved by combining features from different groups. In order to test the generality of the models, we conduct experiments separately on three cities, i.e., Beijing, Shanghai, and Guangzhou, which are the most popular cities in our dataset. In each experiment, we train a model based on parameters tuned on a validation set, and then report the performance on the test set. We examine the performance of logistic regression (LR), gradient boosted decision trees (GBDT) [9][7], and support vector machines (SVM).
Results are shown in Table 5. Rows from G to E present the detailed performance of the individual models. For all three cities, the textual models (BOW and WE) significantly outperform the geographical, mobility, and rating models. Geographical metrics and people's mobility patterns are implicit factors reflecting the spatial demand for the restaurant within an area. However, most of the time, before a merchant opens a new retail store, he/she will carefully choose an optimal location for it; e.g., McDonald's restaurants are often placed near train stations, and a new Muslim restaurant may open to meet people's dietary requirements if there are no existing Muslim food shops around. On the other side, people's online reviews are explicit feedback about the restaurant. The rating scores may not be directly connected to the future survival of the restaurant. Take the environment score for instance. Shaxian Refection is a low-cost restaurant
5 http://www.meituan.com
6 http://www.lashou.com
Table 5: AUC performance of model combination for Beijing, Shanghai, and Guangzhou. The best result for each city is highlighted in bold. A significance test (denoted by *) indicates that the best model significantly outperforms the others with p-value
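The caption does not spell out which significance test was applied; a common nonparametric choice for comparing two models' AUCs on the same test set is a paired bootstrap, sketched here on synthetic labels and scores (all data below is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def auc(y, s):
    """Plain AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos, neg = s[y == 1], s[y == 0]
    return (pos[:, None] > neg[None, :]).mean()

# Synthetic test-set labels and two models' scores.
n = 300
y = rng.integers(0, 2, size=n)
s_a = y + rng.normal(scale=1.0, size=n)   # stronger model
s_b = y + rng.normal(scale=2.0, size=n)   # weaker model

# Paired bootstrap: resample test instances, track the AUC difference.
diffs = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)
    yi = y[idx]
    if yi.min() == yi.max():
        continue  # need both classes present to compute AUC
    diffs.append(auc(yi, s_a[idx]) - auc(yi, s_b[idx]))
p = np.mean(np.array(diffs) <= 0)  # one-sided p-value estimate
print(f"estimated p-value that A is not better than B: {p:.3f}")
```

Resampling instances (rather than resampling models' predictions independently) keeps the comparison paired, which is what makes the test valid when both models are evaluated on the same test set.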
el to extract topics from user check-ins. However, handling reviews through a topic model has proven ineffective in our scenario.
Restaurant survival prediction is also related to customer churn prediction [2]. Churn means that a customer leaves a product or service. Existing research works have explored various user features through historical behavior [8][2], and with the fast growth of online social networks, several works have studied social influence in churn analysis [35][30]. However, shop survival analysis is obviously different from traditional churn analysis. To some extent, a shop's failure could be regarded as the churn of all, or the vast majority, of its customers.
There are some research works that deserve a mention because they are related to restaurant analysis. [17] showed that atmospherics and service functioned as stimuli that enhanced positive emotions, which mediated the relationship between atmospherics/services and future behavioral outcomes. [24] conducted experiments to show that negative reviews could influence customers' dining decisions. [33] provided a comprehensive study on restaurants and embodied dining preference, implicit feedback, and explicit feedback for restaurant recommendation. [4] studied how restaurant attributes, local demographics, and local weather conditions could influence the reviews of restaurants.
8. CONCLUSION
This paper discusses the problem of restaurant survival prediction by modeling four perspectives: geographical metrics, user mobility, rating scores, and review text. We provide a detailed analysis of each perspective separately and demonstrate its predictive power. We find that, if used properly, review text reflects a restaurant's operating status best. Comprehensive experiments show that integrating different predictors leads to the best model, and this finding is consistent across different cities.
In future work, we are going to: (1) investigate more appropriate language models to extract better knowledge from review text; (2) design a unified model to incorporate heterogeneous learning algorithms so that the performance will not be limited by a single learning algorithm such as GBDT.
9. REFERENCES
[1] A. Agarwal, E. Akchurin, C. Basoglu, G. Chen, S. Cyphers, J. Droppo, A. Eversole, B. Guenter, M. Hillebrand, R. Hoens, et al. An introduction to computational networks and the computational network toolkit. Technical report.
[2] J.-H. Ahn, S.-P. Han, and Y.-S. Lee. Customer churn analysis: Churn determinants and mediation effects of partial defection in the Korean mobile telecommunications service industry. Telecommunications Policy, 30(10):552–568, 2006.
[3] H. Amiri and H. Daumé III. Short text representation for detecting churn in microblogs. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[4] S. Bakhshi, P. Kanuparthy, and E. Gilbert. Demographics, weather and online reviews: A study of restaurant recommendations. In Proceedings of the 23rd International Conference on World Wide Web, WWW '14, pages 443–454, New York, NY, USA, 2014. ACM.
[5] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
[6] C. Buckley and E. M. Voorhees. Retrieval evaluation with incomplete information. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 25–32. ACM, 2004.
[7] T. Chen and C. Guestrin. XGBoost: A scalable tree boosting system. CoRR, abs/1603.02754, 2016.
[8] G. Dror, D. Pelleg, O. Rokhlenko, and I. Szpektor. Churn prediction in new users of Yahoo! Answers. In Proceedings of the 21st International Conference Companion on World Wide Web, pages 829–834. ACM, 2012.
[9] J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.
[10] Y. Fu, Y. Ge, Y. Zheng, Z. Yao, Y. Liu, H. Xiong, and J. Yuan. Sparse real estate ranking with online user reviews and offline moving behaviors. In 2014 IEEE International Conference on Data Mining (ICDM), pages 120–129. IEEE, 2014.
[11] M. A. F. Gámez, A. C. Gil, and A. J. C. Ruiz. Applying a probabilistic neural network to hotel bankruptcy prediction. Encontros Científicos - Tourism & Management Studies, 12(1):40–52, 2016.
[12] P. Georgiev, A. Noulas, and C. Mascolo. Where businesses thrive: Predicting the impact of the Olympic Games on local retailers through location-based services data. arXiv preprint arXiv:1403.7654, 2014.
[13] Z. Gu. Analyzing bankruptcy in the restaurant industry: A multiple discriminant model. International Journal of Hospitality Management, 21(1):25–42, 2002.
[14] Z. Gu and L. Gao. A multivariate model for predicting business failures of hospitality firms. Tourism and Hospitality Research, 2(1):37–49, 2000.
[15] J. Hertz, A. Krogh, and R. G. Palmer. Introduction to the Theory of Neural Computation, volume 1. Basic Books, 1991.
[16] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[17] S. S. Jang and Y. Namkung. Perceived quality, emotions, and behavioral intentions: Application of an extended Mehrabian–Russell model to restaurants. Journal of Business Research, 62(4):451–460, 2009.
[18] P. Jensen. Network-based predictions of retail store commercial categories and optimal locations. Physical Review E, 74(3):035101, 2006.
[19] D. Karamshuk, A. Noulas, S. Scellato, V. Nicosia, and C. Mascolo. Geo-spotting: Mining online location-based services for optimal retail store placement. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '13, pages 793–801, New York, NY, USA, 2013. ACM.
[20] H. Kim and Z. Gu. A logistic regression analysis for predicting bankruptcy in the hospitality industry. The Journal of Hospitality Financial Management, 14(1):17–34, 2006.
[21] H. Kim and Z. Gu. Predicting restaurant bankruptcy: A logit model in comparison with a discriminant model. Journal of Hospitality & Tourism Research, 30(4):474–493, 2006.
[22] S. Y. Kim and A. Upneja. Predicting restaurant financial distress using decision tree and AdaBoosted decision tree models. Economic Modelling, 36:354–362, 2014.
[23] Q. V. Le and T. Mikolov. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, pages 1188–1196, 2014.
[24] C. C. Lee. Understanding negative reviews' influence on user reaction in restaurant recommending applications: An experimental study.
[25] H. Li and J. Sun. Forecasting business failure: The use of nearest-neighbour support vectors and correcting imbalanced samples - evidence from the Chinese hotel industry. Tourism Management, 33(3):622–634, 2012.
[26] T. Liu, S. Liu, Z. Chen, and W. Ma. An evaluation on feature selection for text clustering. In Machine Learning, Proceedings of the Twentieth International Conference (ICML 2003), August 21-24, 2003, Washington, DC, USA, pages 488–495, 2003.
[27] T. Mikolov. Recurrent neural network based language model.
[28] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[29] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013.
[30] R. J. Oentaryo, E.-P. Lim, D. Lo, F. Zhu, and P. K. Prasetyo. Collective churn prediction in social network. In Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012), pages 210–214. IEEE Computer Society, 2012.
[31] M. Olsen, C. Bellas, and L. V. Kish. Improving the prediction of restaurant failure through ratio analysis. International Journal of Hospitality Management, 2(4):187–193, 1983.
[32] N. J. Yuan, F. Zhang, D. Lian, K. Zheng, S. Yu, and X. Xie. We know how you live: Exploring the spectrum of urban lifestyles. In Proceedings of the First ACM Conference on Online Social Networks, COSN '13, pages 3–14, New York, NY, USA, 2013. ACM.
[33] F. Zhang, N. J. Yuan, K. Zheng, D. Lian, X. Xie, and Y. Rui. Exploiting dining preference for restaurant recommendation. In Proceedings of the 25th International Conference on World Wide Web, WWW '16, pages 725–735, Republic and Canton of Geneva, Switzerland, 2016. International World Wide Web Conferences Steering Committee.
[34] Y. Zhong, N. J. Yuan, W. Zhong, F. Zhang, and X. Xie. You are where you go: Inferring demographic attributes from location check-ins. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, WSDM '15, pages 295–304, New York, NY, USA, 2015. ACM.
[35] Y. Zhu, E. Zhong, S. J. Pan, X. Wang, M. Zhou, and Q. Yang. Predicting user activity level in social networks. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, CIKM '13, pages 159–168, New York, NY, USA, 2013. ACM.