Restaurant Survival Analysis with Heterogeneous Information

Jianxun Lian∗
University of Science and Technology of China
Hefei, China
[email protected]

Fuzheng Zhang
Microsoft Research
Beijing, China
[email protected]

Xing Xie
Microsoft Research
Beijing, China
[email protected]

Guangzhong Sun
University of Science and Technology of China
Hefei, China
[email protected]
ABSTRACT
For shopkeepers, one of the biggest common concerns is whether their business will thrive or fail in the future. With the development of new ways to collect business data, it is possible to leverage knowledge from multiple domains to build an intelligent model for business assessment. In this paper, we discuss the potential indicators of the long-term survival of a physical store. To this end, we study factors from four pillars: geography, user mobility, user ratings, and review text. We start by exploring the impact of geographic features, which describe the location environment of the retail store. The location and nearby places play an important role in the popularity of the shop; usually, less competition and more heterogeneity are better. Then we study user mobility. It can be viewed as supplementary to the geographical placement, showing how the location attracts users from everywhere. Another important factor is how well the shop serves and satisfies users. We find that restaurant survival prediction is a hard task that cannot be solved simply using consumers' ratings or sentiment metrics. Compared with conclusive and well-formatted ratings, the varied review words provide more insight into the shop and deserve in-depth mining. We adopt several language models to fully explore the textual message. Comprehensive experiments demonstrate that review text indeed has the strongest predictive power. We further compare different cities' models and find the conclusions are highly consistent. Although we focus on the restaurant category in this paper, the method can be easily extended to other shop categories.
∗This work was done when Jianxun Lian was an intern at Microsoft Research, Beijing, P.R. China.

© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License.
WWW 2017 Companion, April 3–7, 2017, Perth, Australia.
ACM 978-1-4503-4914-7/17/04.
http://dx.doi.org/10.1145/3041021.3055130
Keywords
restaurant survival analysis; location-based services; data mining
1. INTRODUCTION
How their business will fare in the future is an important concern for all shopkeepers. Knowing about long-term trends, shopkeepers can take corresponding actions in advance. For instance, if they know the store will face a crisis in a matter of months, they could take steps to avoid the misfortune, such as changing the style of the store, or even consider choosing a new location to minimize economic losses. Usually, store owners make long-term decisions based on empirical judgement. Due to limited data sources and a lack of analytic tools, it has traditionally been a challenge to make data-driven decisions.
Bankruptcy prediction is a common topic in the management and finance literature. However, existing studies [31, 22, 25, 11, 20, 21, 13] are usually limited to the analysis of financial factors, such as liquidity, solvency, and profitability. In addition, the datasets used are usually small due to the challenge of obtaining financial data. With the development of information techniques, especially the growth of online location-based services, a large amount of business-related data can be collected through the Internet. For example, people may post check-ins at a point of interest (POI) they are visiting; after consuming in a shop, they can write reviews on Yelp to show how much they like the shop. Thus there is potential to exploit heterogeneous information to build automatic business intelligence tools for enhancing the decision process. In this paper, we take advantage of both geographic analysis and user behavior analysis to study whether a physical store will close down in the future. Specifically, we explore various factors under the guidance of the following considerations: (H1) the geographical placement of the store plays an important role in the store's operation; (H2) people's offline mobility patterns to the store as well as its nearby places influence the business; (H3) users' rating scores (e.g., on Yelp) are explicit evaluations of the store from the customers' point of view; (H4) besides well-formatted rating scores, review words contain richer information which a simple numeric score does not cover.
Recent works have studied the economic impact of geographic and user mobility factors on retail stores [19, 12]. As formulated in these works, geographic signals include the types and density of nearby places, and user mobility includes transitions between venues or the incoming flow of mobile users from distant areas. Inspired by them, we first analyze these two types of features. In fact, user mobility features are quite correlated with geographical features, because the former reflects spatial character in terms of human popularity. We further bring in users' reviews as an important data source, including rating scores and review text. After consuming at a store, users can rate the store on multiple aspects, such as environment, cost, and taste, on platforms like Yelp and Dianping. These numeric values summarize customers' overall opinion, but they are not as detailed as the words in the review. Through experiments we find that words provide more predictive power than the simple numeric values.
To make the work more specific and accurate, we focus on the shop category of restaurants. The main data source we use is the Chinese website www.dianping.com, on which the overwhelming majority of venues are food service ones. Additionally, China has a variety of restaurant categories, for example, Cantonese restaurants, Szechuan restaurants, and Shaanxi noodle restaurants. Thus, as a specific type of store, restaurants are interesting in their diversity and worth in-depth mining. What is more, the analysis method is general and can be easily applied to other types of stores.
The contributions of this paper are summarized as follows:
• Different from traditional bankruptcy studies which usually focus on financial variables, we propose an approach to conduct restaurant survival analysis from exogenous factors which can be obtained from big data over the Internet. Note that our purpose is not to provide better accuracy than existing models which make full use of financial factors. The primary advantages are that our approach covers various angles from heterogeneous data and also scales to a large number of restaurant samples.
• We provide an in-depth analysis of geography, mobility, and user opinions, and demonstrate which are the relatively stronger predictors. For example, neighbor entropy turns out to be the best predictor from the geographic perspective; users' textual messages are far more important than their numeric ratings; restaurants which offer attractive group purchases but serve poor food have a higher probability of closing; and restaurants holding core competitiveness (time-honored brand, well-deserved reputation, crowded consumers, state-run, etc.) tend to survive, which is in sync with common sense.
• We conduct comprehensive experiments on three different cities, and find that the conclusions are quite consistent. Meanwhile, integrating all the predictors leads to the most accurate model, which demonstrates the necessity of including a variety of features.
The rest of this paper is organized as follows. In Section 2 we describe the essential information about our dataset. In Section 3 we define and analyze the geographical features. In Section 4, we give an analysis of user mobility. In Section 5, we study online rating scores and exploit various methods to mine review text. After this we provide experiment details on combining different models and on different cities. In Section 7 we summarize the related work. Finally, we give the conclusion in Section 8.
2. DATA AND PROBLEM STATEMENT
In this section we first provide some essential information about the dataset used, including the data collection process and the basic statistics of the collected data. Then we raise the restaurant survival prediction problem and show the performance of a naive solution.
2.1 Data Collection
The main data source we use in this paper is Dianping.com. Dianping, known as the "Yelp of China", is the largest consumer review site in China. It offers multi-level knowledge through its diverse functions such as reviews, check-ins, and POI metadata (including geographical information and shop attributes). We use the LifeSpec data crawling platform [32] to retrieve all data related to each shop (from the shop's opening time to our crawling time). Specifically, for each shop we crawl:

(1) the meta information, including name, location (city, latitude, longitude, and detailed address), category, and price;

(2) all the reviews written by consumers; a review is comprised of review words and 5 scores, covering overall rating, taste, environment, service, and price;

(3) all the check-ins posted by users.

All the data we have crawled is publicly available on the website. The data crawling process finished in April 2014.
In the literature on churn analysis, a user is usually defined as a churner if he/she does not have any data during the last several periods of the dataset. However, some shops may not be popular online, often resulting in receiving no reviews or check-ins for a long period, say several months. Therefore, it is not proper to define shop failure based on the review or check-in numbers across a period. Fortunately, we find that Dianping has an API to query the status of a shop. In general, all statuses can be grouped into four categories: (1) normal shop, which means the shop is still operating; (2) closed shop, meaning the shop has already closed down; (3) suspended shop, meaning the business is suspended for a certain time; the reason for suspension varies and is unknown, and the shop may or may not reopen; (4) others, including a few special cases such as unqualified shops and applicative shops. We crawled shops' statuses in March 2016 and use the shop status as the label.
2.2 Basic Statistics
Our entire Dianping dataset captures the period ranging from April 2003 (when dianping.com was established) to June 2014 (when we finished crawling content data), as well as the shops' snapshot status in March 2016. Considering that spatial context may change over such a long time, for restaurant analysis we focus on a single year and assume that geographical placement does not change much within one year. We decided to use the 2012 data because we have the most abundant data for this year. The basic statistics for
Figure 1: Shop categories and their percentage. The largest category accounts for 50.33% of the shops.
Table 1: Basic statistics of the Dianping dataset for year 2012

#check-ins  #reviews   #cities  #shops
9,270,299   4,576,587  349      409,602
Table 2: Basic statistics of the dataset for Shanghai, Beijing, and Guangzhou

City       #check-ins  #reviews   #shops
Shanghai   4,027,503   1,980,914  76,190
Beijing    1,710,396   826,772    50,917
Guangzhou  261,273     156,844    17,747
Figure 2: (a) The distribution of the 23 restaurant subcategories in Beijing; the highest point is snack bar, which accounts for 32.8%. (b) The cumulative distribution function of the review count per shop.
2012 are listed in Table 1.¹ Among the 349 cities in China, Shanghai, Beijing, and Guangzhou are the most popular cities in our dataset, and their statistics are shown in Table 2. As we can see, the three cities account for 64.8% of the total review count, 64.7% of the total check-ins, and 35.4% of the total number of shops.
Rather than building a unified model for all shops, in this paper we limit the study to restaurants. Restaurants are not only the largest shop category in quantity in our dataset, but also have the biggest number of subcategories. Figure 1 shows that half of the shops in our dataset belong to the restaurant category. There are 23 subcategories of restaurants and their distribution is shown in Figure 2a, from which we can observe that the biggest subcategory is snack bar.
¹We only count shops which have at least one review from 2012.
Table 3: Statistics of restaurants which received no fewer than 10 reviews in 2012. Restaurants that closed early (before the end of 2012) are removed. We use this set for training and testing.

              Shanghai  Beijing  Guangzhou
#restaurants  12,990    8,615    2,325
closed ratio  37.4%     28.4%    30.8%
In Figure 2b we plot the cumulative distribution function (CDF) of the number of reviews a shop has. As can be observed, 37.7% of the shops have no fewer than 10 reviews. Since we are mining users' online opinions, in the prediction task we focus on the shops which have no fewer than 10 reviews in order to ensure enough data to build the model. Next we plot the shop status distribution. As shown in Figure 3, the ratio of closed status in the restaurant group is as high as 28.6%, which is significantly higher than that of non-restaurants. It indicates that restaurants differ from other types of shops not only in quantity or diversity but also in stability, warranting their own in-depth study. Finally, we remove the restaurants which closed before the end of 2012 from our learning set, since long-term prediction is not relevant to them. The final statistics of the learning set are listed in Table 3. In the experiments, we randomly split the dataset into a training set (70%), a validation set (15%), and a test set (15%).
2.3 Problem Statement
The restaurant survival prediction problem can be stated as follows: given the heterogeneous data (geographical information, user mobility data, online scores, and review text) in 2012, we want to predict whether the restaurant will close down before March 2016. We use the restaurants which belong to the normal shop or closed shop categories for this study. The first thing that comes to mind is that consumers' satisfaction may influence the future of a restaurant. This leads us to ask: can the task of restaurant survival prediction be solved simply using review scores and consumers' sentiment data? To verify, we use SnowNLP² to conduct a sentiment analysis on the review text. For each review of the shop, we can get a sentiment score s(r) ∈ [0, 1], with 0 indicating negative and 1 indicating positive. We calculate the average/minimum/maximum sentiment score for each restaurant, and use these three scores as features to build a logistic regression model. The AUC is 0.52, which is just slightly better than a random guess. Similarly, we design several features based on consumers' rating scores. The AUC is 0.6136, which is not satisfactory. Now we ask: (1) Can we build a more accurate model for restaurant survival prediction? (2) What factors highly correlate with the future of restaurants?
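The three sentiment features of this baseline can be sketched as follows. In practice the per-review scores would come from SnowNLP; here the values are hypothetical:

```python
from statistics import mean

def sentiment_features(review_sentiments):
    """Aggregate per-review sentiment scores s(r) in [0, 1] into the
    three restaurant-level features used by the baseline model:
    (average, minimum, maximum)."""
    return (mean(review_sentiments),
            min(review_sentiments),
            max(review_sentiments))

# Hypothetical SnowNLP scores for one restaurant's four reviews.
features = sentiment_features([0.9, 0.4, 0.75, 0.1])
```

These three values per restaurant form the input matrix for the logistic regression baseline described above.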
3. GEOGRAPHICAL MODEL
We expect that a shop's business is to some extent dependent on its location. Motivated by [19, 12], we design spatial metrics and study their predictive power. When studying the performance of these metrics in our scenario, we use Beijing's data for illustration. Later, in Section 6, we will compare the performance on different cities. Formally, we denote the set of all the shops in a city as S.
²https://github.com/isnowfy/snownlp
Figure 3: The status in March 2016 of restaurants and non-restaurants which were alive in 2012.
For each restaurant r, its neighbor set, denoted by N(r) = {s ∈ S : distance(r, s) ≤ d}, is defined as all shops that lie within a radius of d meters around it; in the experiments we empirically set d to 500 meters. The category of a shop is denoted by γ(r), and the entire category set by Γ.
3.1 Predictors
Density: Although the prediction objects are restaurants, in predictor design we consider all types of shops when studying context. Density is calculated as the number of shops in the restaurant's neighborhood. It is an indicator of the popularity around the restaurant:

f_r^{D} = |N(r)|    (1)
Neighbor Entropy: We refer to N_γ(r) as the number of shops of category γ near the restaurant r. The neighbor entropy metric is defined as:

f_r^{NE} = − Σ_{γ∈Γ} (N_γ(r) / |N(r)|) × log(N_γ(r) / |N(r)|)    (2)

A high entropy value means more diversity in terms of facilities within the area around the shop. A small entropy value indicates that the functionality of the area is biased towards a few specific categories, e.g., a working area or a residential area.

Competitiveness: Restaurants may have different cuisine styles and people may have different dining preferences [33]. We assume that most competition comes from the nearby restaurants of the same category. To measure competitiveness we count the proportion of neighbors of the same category:

f_r^{Com} = N_{γ(r)}(r) / |N(r)|    (3)
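As a minimal sketch (assuming the neighbor set N(r) has already been materialized as a list of shop categories), the three predictors above can be computed as:

```python
import math
from collections import Counter

def geo_predictors(neighbor_categories, own_category):
    """Density (Eq. 1), neighbor entropy (Eq. 2), and competitiveness
    (Eq. 3) from the categories of all shops within radius d of a
    restaurant, e.g. ["cafe", "bank", ...]."""
    n = len(neighbor_categories)
    counts = Counter(neighbor_categories)
    density = n
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    competitiveness = counts[own_category] / n
    return density, entropy, competitiveness

# Toy neighborhood: two same-cuisine restaurants, one cafe, one bank.
d, e, c = geo_predictors(["sichuan", "sichuan", "cafe", "bank"], "sichuan")
```

Here the diverse toy neighborhood yields a high entropy and a competitiveness of one half, since half the neighbors share the restaurant's cuisine.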
Quality by Jensen: This metric encodes the spatial interactions between different place categories. It was first defined by Jensen et al. [18], and the authors of [19] use the inter-category coefficients to weight the desirability of a location for store placement. Formally, we have:

f_r^{QJ} = Σ_{β∈Γ} log(κ_{β→γ(r)}) × (N_β(r) − \bar{N}_β(r))    (4)

where \bar{N}_β(r) means how many shops of category β are observed on average around shops of type γ(r). The inter-type attractiveness coefficient κ_{β→γ(r)} is defined as:

κ_{β→γ(r)} = ((N − N_β) / (N_β × N_{γ(r)})) × Σ_{p : γ(p)=β} N_{γ(r)}(p) / (N(p) − N_{γ(p)}(p))    (5)
Category Demand: Inspired by the Quality by Jensen metric, we propose a simplified category attractiveness measure, which we name category demand:

f_r^{CD} = (1 / N_{γ(r)}(r)) × Σ_{β∈Γ} (N_β(r) / |N(r)|) × \bar{N}_{γ(r)}(β)    (6)

where \bar{N}_{γ(r)}(β) denotes how many shops of category γ(r) are observed on average around shops of type β. Basically, category demand is the ratio between the expected number of shops of category γ(r) at its location and the real number of shops of category γ(r). When f_r^{CD} is larger than 1.0, the shop meets the demands of the position and is expected to have a good business.
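To make Eq. (6) concrete, here is a toy computation under our reading of the definition (the sum runs over the neighbor categories β); all counts are invented for illustration:

```python
def category_demand(own_count, neighbor_counts, avg_own_around):
    """Category demand (Eq. 6): the expected number of shops of the
    restaurant's own category at this location divided by the actual
    number own_count = N_{gamma(r)}(r).
    neighbor_counts: {beta: N_beta(r)}, shops of each category nearby.
    avg_own_around:  {beta: average number of own-category shops
                      observed around shops of category beta}."""
    n = sum(neighbor_counts.values())
    expected = sum((cnt / n) * avg_own_around[b]
                   for b, cnt in neighbor_counts.items())
    return expected / own_count

# Two own-category restaurants nearby; offices attract 4 such
# restaurants on average, malls 6 (invented numbers).
fcd = category_demand(own_count=2,
                      neighbor_counts={"office": 6, "mall": 2},
                      avg_own_around={"office": 4.0, "mall": 6.0})
# expected = (6/8)*4 + (2/8)*6 = 4.5, so fcd = 2.25 > 1.0
```

A value above 1.0, as here, would indicate that the location could support more shops of this category than are actually present.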
3.2 Results
We use logistic regression to perform binary classification based on the above predictors. We evaluate the performance in terms of area under the ROC curve (AUC) [6] because it is not influenced by the class imbalance problem. Figure 4a presents the AUC results. Among the individual features, neighbor entropy performs better than the other three, which indicates that the heterogeneity of the nearby area plays the most important role among the geographic attributes. Density, competitiveness, and Quality by Jensen show similar levels of predictive power. Combining all the features leads to a significantly (t-test p-value: 0.026) better AUC than the best individual feature.
4. MOBILITY ANALYSIS
The authors of [19] find that deeper insight into human mobility patterns helps improve local business analytics. People's mobility can directly reflect a place's popularity. While various kinds of data sources can be employed to mine mobility patterns, e.g., taxi trajectories and bus records as used in [10], we use check-ins to represent human mobility as [19] does. A check-in can be represented as a triple <u, s, t>, containing the user's id, the shop's id, and the event timestamp. To clean up the data, we first remove outliers by filtering out users who post check-ins too frequently (e.g., more than 100 times per day), and deleting successive check-ins made within a very short period (e.g., 1 minute) or at the same place. The density distribution is shown in Figure 5b. As can be observed, the check-in distribution greatly coincides with the shop distribution (Figure 5a). There are four main concentrated regions which are considered the most flourishing areas in Beijing: Zhongguancun, China National Trade Center, Xidan Commercial Street, and Wangfujing Street.
4.1 Predictors
Area Popularity: We use two values for mobility popularity: the number of total check-ins at the shop itself, and the number of check-ins near the shop:

f_r^{AP2} = |{<u, s, t> ∈ CI : distance(s, r) ≤ d}|    (7)

where CI denotes the entire check-in set.

Transition Density: We define a user transition as happening when a user posts two consecutive check-ins (c_i, c_j) within 24 hours, and denote the entire set of transitions
Figure 4: AUC performance comparison for individual predictors of different groups: (a) geographical predictors; (b) mobility predictors; (c) review-score predictors. In all three charts we observe the best performance when combining all individual predictors. The classifier used is logistic regression.
Figure 5: Heat maps of the density distributions of (a) shops and (b) check-ins in Beijing.
as Ts. Then transition density is defined as the number of transitions whose start and end locations are both near shop r:

f_r^{TD} = |{(c_i, c_j) ∈ Ts : distance(s_{c_i}, r) ≤ d ∧ distance(s_{c_j}, r) ≤ d}|    (8)

Incoming Flow: The number of transitions whose start place is outside shop r's neighborhood but whose end place is inside it:

f_r^{IF} = |{(c_i, c_j) ∈ Ts : distance(s_{c_i}, r) > d ∧ distance(s_{c_j}, r) ≤ d}|    (9)

This metric indicates how well the area attracts customers from remote regions.

Transition Quality: This measures the potential number of customers that might be attracted from shop r's neighbors:

f_r^{TQ} = Σ_{s∈S : distance(s,r)≤d} σ_{γ(s)→γ(r)} × CI_s    (10)

σ_{γ(s)→γ(r)} = E[ |{(c_i, c_j) ∈ Ts : s_{c_i} = s ∧ γ(s_{c_j}) = γ(r)}| / CI_s ]    (11)

where CI_s is the number of check-ins at shop s, and σ_{γ(s)→γ(r)} is the expected probability of a transition from category γ(s) to category γ(r).

Peer Popularity: This metric assesses shop r's relative popularity in comparison with shops of the same category:

f_r^{PP} = CI_r / \bar{CI}_r    (12)

where \bar{CI}_r means how many check-ins the shops of category γ(r) have on average. We use f_r^{PP} instead of the restaurant's absolute check-in number to eliminate the popularity bias caused by the nature of the store. People are more likely to post check-ins at some types of shops, like Starbucks, while they do not like to check in at Shaxian Refection, a famous low-cost restaurant in China. So f_r^{PP} reflects mobility popularity better through normalization.
4.2 Performance
Figure 4b presents the AUC performance of the mobility predictors with logistic regression. Among the individual predictors, peer popularity is the strongest and transition quality the weakest. This is reasonable because (1) the shop's own popularity reflects its business status better than the popularity of the area around it, and (2) people tend to choose nearby restaurants for dinner. Again, combining all mobility features gives a significantly (p-value < 0.01) better AUC than the best individual one (peer popularity).
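For concreteness, the transition-based counts of Eqs. (8) and (9) reduce to simple filters over the transition set; a sketch with a hypothetical near() predicate standing in for the distance test:

```python
def mobility_counts(transitions, near):
    """Transition density (Eq. 8) and incoming flow (Eq. 9) for one
    restaurant. `transitions` holds (start_shop, end_shop) pairs built
    from consecutive check-ins within 24 hours; `near(shop)` tells
    whether a shop lies within d metres of the restaurant."""
    density = sum(1 for s, e in transitions if near(s) and near(e))
    incoming = sum(1 for s, e in transitions if not near(s) and near(e))
    return density, incoming

# Toy data: shops "a" and "b" are near the restaurant, "z" is far away.
near = lambda shop: shop in {"a", "b"}
trans = [("a", "b"), ("z", "a"), ("z", "z"), ("b", "a")]
td, inflow = mobility_counts(trans, near)
```

In this toy set, two transitions stay inside the neighborhood and one arrives from outside, illustrating how the two predictors separate local circulation from incoming flow.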
5. PREDICTING WITH ONLINE REVIEWS
Online reviews directly reflect customers' satisfaction with the restaurants, so the data is a valuable resource worth mining. In Section 2.3, we provided the initial results from rating scores only. In this section we go deeper with the review data.
5.1 Rating Values
When writing a review for a restaurant, the consumer is asked to provide five scores on different aspects: (1) overall rating, (2) consumption level, (3) taste, (4) environment, and (5) service quality. The rating scores are scaled from 0 to 4, except for consumption level. The distribution of scores is shown in Figure 6. The majority of users prefer to give a medium score, such as 2 or 3.

For each type of score we compute the average, maximum, and minimum values as features. The prediction results are shown in Figure 4c. The best individual score is the overall rating, which can be regarded as a score summarizing the other four ratings. Using all rating scores yields a significantly better AUC than the best individual score.
Figure 6: The distribution of the different review ratings.
1. … us a small piece of beef steak and two drinks. I will never come here again!!!

2. The food here is terrible. We almost ate nothing and left in a moment. This restaurant is far worse than the one next to the museum.

The first reviewer points out that the price is too high while the portion is too small. The second reviewer is complaining about the taste of the food. Both provide helpful knowledge about the potential problems of the restaurant. Inspired by this, as well as by the observations in Section 5.1, we examine whether we can mine more knowledge beyond conclusive scores by exploiting textual information.
Bag-of-Words. First we employ the bag-of-words (BOW) model and use words as predictors. We collect all the reviews and then segment the sentences into words. To remove infrequent and unhelpful words, we use the χ² statistic (CHI) [26] to select the top k_voc most useful words as the text representation:

f_r^{bow} = (wc_1, wc_2, ..., wc_{k_voc})    (13)

where wc_i is the frequency of the i-th word. Figure 7a illustrates the AUC of the BOW model on the Beijing dataset with different k_voc settings. We observe that k_voc = 1000 performs better than the other settings. Ideally we should observe a non-decreasing trend when increasing k_voc. In our case, performance drops from k_voc = 8000 due to the limited number of training instances. In the Beijing dataset we have 8,615 instances, so when k_voc increases, the curse of dimensionality occurs and the amount of data is not sufficient to train an optimal model. Thus in the next step we focus on language models which can reduce the dimension of the textual features.
Word Embedding. Word embeddings are dense, low-dimensional representations of words [29, 28]. Each word is represented as a D_we-dimensional continuous-valued vector, where D_we is a relatively small number (100 in our experiments). Similar words have similar vectors. We train a model on our corpus using the Word2Vec toolkit³ and get a vector for each word w:

π^w = (π_1^w, π_2^w, ..., π_{D_we}^w)    (14)

³https://code.google.com/archive/p/word2vec/
Figure 7: Performance analysis for review text: (a) BOW with various vocabulary sizes; (b) comparison of different textual models (LDA, RNNLM, Paragraph Vector, Word Embedding, BOW). Logistic regression is used as the classification method.
Similarly, we refer to a restaurant's representation as a D_we-dimensional vector, which is a TF-IDF-weighted average over all words that have ever appeared in the restaurant's reviews:

π_i(r) = Σ_{p∈RV(r)} wc_p × log(1/dc_p) × π_i^p,  for 1 ≤ i ≤ D_we    (15)

f_r^{WE} = (π_1(r), π_2(r), ..., π_{D_we}(r))    (16)

where RV(r) indicates the review set of restaurant r, and dc_p means the number of restaurants containing word p in their review sets. Finally, we refer to f_r^{WE} as the representation of the restaurant.
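A sketch of the weighted averaging behind Eqs. (15)–(16), assuming the word vectors are already trained. Note that we use the conventional IDF weight log(N/dc_p) here, so the weighting is an assumption rather than a verbatim transcription of Eq. (15):

```python
import math

def restaurant_embedding(review_words, word_vecs, doc_freq, n_restaurants):
    """TF-IDF-weighted average of the word vectors of all words in a
    restaurant's reviews. `review_words` keeps repeats (term frequency),
    `word_vecs` maps word -> vector, and `doc_freq` maps word -> number
    of restaurants whose reviews contain the word."""
    dim = len(next(iter(word_vecs.values())))
    vec = [0.0] * dim
    for w in set(review_words):
        tf = review_words.count(w)                   # wc_p
        idf = math.log(n_restaurants / doc_freq[w])  # assumed IDF form
        for i in range(dim):
            vec[i] += tf * idf * word_vecs[w][i]
    return vec

# Two toy 2-dimensional word vectors (D_we = 2 for readability).
vecs = {"delicious": [1.0, 0.0], "closed": [0.0, 1.0]}
df = {"delicious": 10, "closed": 100}
emb = restaurant_embedding(["delicious", "delicious", "closed"],
                           vecs, df, n_restaurants=1000)
```

The rarer word contributes a larger per-occurrence weight, which is the intent of the IDF factor in Eq. (15).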
Paragraph Vector. Building on the word embedding model, Le et al. [23] propose Paragraph Vector, which learns continuous distributed vector representations for pieces of text. We use its implementation in Gensim⁴ to generate an embedding for each review. The representation of a restaurant is then the average of its reviews' embeddings, weighted by each review's text length.
Neural Language Model. The authors of [3] propose a method to learn micro-post representations based on the Elman network [15, 27]. Basically, it is a recurrent neural network-based language model aimed at predicting the next word's probability distribution given the previous words. The architecture is shown in Figure 8. The input w(t) ∈ R^{k_voc} is the one-hot representation of the word at time t. The hidden layer h(t) ∈ R^{D_rnn}, also known as the context layer, is computed from w(t) and h(t − 1), the context layer at time t − 1. D_rnn is the size of the hidden layer, which is set to 100 in the experiments. The output y(t) ∈ R^{k_voc} is the probability distribution of the word at time t + 1.

We use the neural language model to generate the restaurant's representation. We build a recurrent neural network using CNTK [1]; training took 3.5 days on an NVIDIA GK107 GPU. Then, for each review p_i, we feed its textual content into the neural network word by word. We take h(T_{p_i}) as the representation of review p_i, where T_{p_i} is the word count of p_i and h(T_{p_i}) is the state of the hidden layer at the last word of p_i. A restaurant's representation is the average of all its reviews' representations:
f_r^{NLM} = (1/|RV(r)|) × ( Σ_{p∈RV(r)} h(T_p)_1, Σ_{p∈RV(r)} h(T_p)_2, ..., Σ_{p∈RV(r)} h(T_p)_{D_rnn} )    (17)
⁴https://radimrehurek.com/gensim/
Figure 8: The recurrent neural network for language modelling.
Topic Model. Another way to represent a restaurant is to generate its topic distribution. We concatenate all the reviews belonging to the same restaurant to form a document. We exploit Latent Dirichlet Allocation (LDA) [5] to model the topics. In LDA, each document is represented as a probability distribution over topics, and each topic is represented as a probability distribution over words. We use the topic distribution vector as the restaurant's representation.
5.3 Results
We set k_voc = 1000 for the BOW predictors owing to its superior performance. Figure 7b shows the performance of each model. The RNN language model does not work as well as word embedding and bag-of-words. One possible reason is that textual context is not as important as the words themselves in our case. Another possible reason is that a simple recurrent neural network might not be able to keep long-term dependencies. Unlike tweets, which are usually short, reviews can be much longer. Due to vanishing gradients, the RNN model cannot model the connection between the final outputs and earlier input words. In the future we will enhance the RNN with Long Short-Term Memory units [16].

Unexpectedly, LDA does not work in our task. The possible explanations are: (1) the topics in our case can be regarded as restaurants' characteristics, such as food style; however, for restaurant survival prediction, topic is not as good a signal as opinion; (2) it might not be proper to concatenate all the reviews of the same restaurant into one document, since different reviews may concentrate on different aspects.

Paragraph Vector performs slightly worse than word embedding, which to some extent verifies our guess that textual context is not as important as the words themselves in our task. Since the two models share very similar algorithms, we only use one of them for further experiments. In the next section we will use word embedding and BOW as textual predictors.

Since review text plays such an important role in prediction, we use the χ² score to select the top 10 words related to alive and closed restaurants respectively, and list them in Table 4 (words are translated from Chinese).

Table 4: Top informative words for the two types of restaurants (translated from Chinese). Words are selected based on χ² score.

type    top words
alive   time-honored brand; from childhood; well-deserved reputation; crowded; a dozen years; well-known; early morning; not tire; state-run; must-try
closed  group purchase; original price; four people set meal; sluggish; Meituan; double meal; lack of customers; booth; catfish; leaflet; LaShou Group

The alive row lists the key words which indicate a higher probability for a restaurant to survive. These words describe the strengths of the restaurant, including already having a long history (time-honored brand, from childhood, a dozen years), having a strong reputation (well-deserved reputation, well-known), being popular (crowded), serving delicacies (not tire, must-try), and foundation (state-run). The key words for closed restaurants are more interesting. Meituan⁵ and LaShou Group⁶ are two famous Chinese group-buying websites. It seems that restaurants which offer attractive group purchases but actually serve disappointing food have a higher probability of closing in the next few years. On the other hand, the story behind words like original price, double meal, and leaflet is that consumers are complaining about the food or service: consumers feel that the reality of the food is a long way from the image on the leaflet. Lastly, words like sluggish and lack of customers directly describe the gloomy status of the business, which obviously makes it hard for the restaurant to survive.
6. COMBINING MODELS
In previous sections we have studied the individual predictive power of various features. Now we want to figure out how performance can be improved by combining features from different groups. In order to test the generality of the models, we conduct experiments separately on three cities, i.e., Beijing, Shanghai, and Guangzhou, which are the most popular cities in our dataset. In each experiment, we train a model based on parameters tuned on a validation set, and then report the performance on the test set. We examine the performance of logistic regression (LR), gradient boosted decision trees (GBDT) [9][7], and support vector machines (SVM).
Results are shown in Table 5. Rows from G to E present the detailed performance of the individual models. For all three cities, the textual models (BOW and WE) significantly outperform the geographical, mobility, and rating models. Geographical metrics and people's mobility patterns are implicit factors reflecting the spatial demand for the restaurant within an area. However, most of the time, before a merchant opens a new retail store, he/she will carefully choose an optimal location for it; e.g., McDonald's restaurants are often placed near train stations, and a new Muslim restaurant may open to meet people's dietary requirements if there are no existing Muslim food shops around. On the other side, people's online reviews are explicit feedback about the restaurant. The rating scores may not be directly connected to the future survival of the restaurant. Take the environment score for instance. Shaxian Refection is a low-cost restaurant
5 http://www.meituan.com
6 http://www.lashou.com
Table 5: AUC performance of model combination for Beijing, Shanghai, and Guangzhou. The best result for each city is highlighted in bold. A significance test (denoted by *) indicates that the best model significantly outperforms the others with p-value
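The caption does not spell out which significance test was applied; a common nonparametric choice for comparing two models' AUCs on the same test set is a paired bootstrap, sketched here on synthetic labels and scores (all data below is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def auc(y, s):
    """Plain AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos, neg = s[y == 1], s[y == 0]
    return (pos[:, None] > neg[None, :]).mean()

# Synthetic test-set labels and two models' scores.
n = 300
y = rng.integers(0, 2, size=n)
s_a = y + rng.normal(scale=1.0, size=n)   # stronger model
s_b = y + rng.normal(scale=2.0, size=n)   # weaker model

# Paired bootstrap: resample test instances, track the AUC difference.
diffs = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)
    yi = y[idx]
    if yi.min() == yi.max():
        continue  # need both classes present to compute AUC
    diffs.append(auc(yi, s_a[idx]) - auc(yi, s_b[idx]))
p = np.mean(np.array(diffs) <= 0)  # one-sided p-value estimate
print(f"estimated p-value that A is not better than B: {p:.3f}")
```

Resampling instances (rather than resampling models' predictions independently) keeps the comparison paired, which is what makes the test valid when both models are evaluated on the same test set.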
el to extract topics from user check-ins. However, handling reviews through a topic model has proven ineffective in our scenario.
Restaurant survival prediction is also related to customer churn prediction [2]. Churn means that a customer leaves a product or service. Existing research works have explored various user features through historical behavior [8][2], and with the fast growth of online social networks, several works have studied social influence in churn analysis [35][30]. However, shop survival analysis is obviously different from traditional churn analysis. To some extent, a shop's failure could be regarded as the churn of all, or the vast majority, of its customers.
There are some research works that deserve a mention because they are related to restaurant analysis. [17] showed that atmospherics and service functioned as stimuli that enhanced positive emotions, which mediated the relationship between atmospherics/services and future behavioral outcomes. [24] conducted experiments to show that negative reviews could influence customers' dining decisions. [33] provided a comprehensive study on restaurants and embodied dining preference, implicit feedback, and explicit feedback for restaurant recommendation. [4] studied how restaurant attributes, local demographics, and local weather conditions could influence the reviews of restaurants.
8. CONCLUSION
This paper discusses the problem of restaurant survival prediction by modeling four perspectives: geographical metrics, user mobility, rating scores, and review text. We provide a detailed analysis of each perspective separately and demonstrate its predictive power. We find that, if used properly, review text reflects a restaurant's operating status best. Comprehensive experiments show that integrating different predictors leads to the best model, and this finding is consistent across different cities.
In future work, we are going to: (1) investigate more appropriate language models to extract better knowledge from review text; (2) design a unified model to incorporate heterogeneous learning algorithms so that the performance will not be limited by a single learning algorithm such as GBDT.
9. REFERENCES
[1] A. Agarwal, E. Akchurin, C. Basoglu, G. Chen, S. Cyphers, J. Droppo, A. Eversole, B. Guenter, M. Hillebrand, R. Hoens, et al. An introduction to computational networks and the computational network toolkit. Technical report.
[2] J.-H. Ahn, S.-P. Han, and Y.-S. Lee. Customer churn analysis: Churn determinants and mediation effects of partial defection in the Korean mobile telecommunications service industry. Telecommunications Policy, 30(10):552–568, 2006.
[3] H. Amiri and H. Daumé III. Short text representation for detecting churn in microblogs. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[4] S. Bakhshi, P. Kanuparthy, and E. Gilbert. Demographics, weather and online reviews: A study of restaurant recommendations. In Proceedings of the 23rd International Conference on World Wide Web, WWW '14, pages 443–454, New York, NY, USA, 2014. ACM.
[5] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
[6] C. Buckley and E. M. Voorhees. Retrieval evaluation with incomplete information. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 25–32. ACM, 2004.
[7] T. Chen and C. Guestrin. XGBoost: A scalable tree boosting system. CoRR, abs/1603.02754, 2016.
[8] G. Dror, D. Pelleg, O. Rokhlenko, and I. Szpektor. Churn prediction in new users of Yahoo! Answers. In Proceedings of the 21st International Conference Companion on World Wide Web, pages 829–834. ACM, 2012.
[9] J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.
[10] Y. Fu, Y. Ge, Y. Zheng, Z. Yao, Y. Liu, H. Xiong, and J. Yuan. Sparse real estate ranking with online user reviews and offline moving behaviors. In 2014 IEEE International Conference on Data Mining (ICDM), pages 120–129. IEEE, 2014.
[11] M. A. F. Gámez, A. C. Gil, and A. J. C. Ruiz. Applying a probabilistic neural network to hotel bankruptcy prediction. Encontros Científicos - Tourism & Management Studies, 12(1):40–52, 2016.
[12] P. Georgiev, A. Noulas, and C. Mascolo. Where businesses thrive: Predicting the impact of the Olympic Games on local retailers through location-based services data. arXiv preprint arXiv:1403.7654, 2014.
[13] Z. Gu. Analyzing bankruptcy in the restaurant industry: A multiple discriminant model. International Journal of Hospitality Management, 21(1):25–42, 2002.
[14] Z. Gu and L. Gao. A multivariate model for predicting business failures of hospitality firms. Tourism and Hospitality Research, 2(1):37–49, 2000.
[15] J. Hertz, A. Krogh, and R. G. Palmer. Introduction to the Theory of Neural Computation, volume 1. Basic Books, 1991.
[16] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[17] S. S. Jang and Y. Namkung. Perceived quality, emotions, and behavioral intentions: Application of an extended Mehrabian–Russell model to restaurants. Journal of Business Research, 62(4):451–460, 2009.
[18] P. Jensen. Network-based predictions of retail store commercial categories and optimal locations. Physical Review E, 74(3):035101, 2006.
[19] D. Karamshuk, A. Noulas, S. Scellato, V. Nicosia, and C. Mascolo. Geo-spotting: Mining online location-based services for optimal retail store placement. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '13, pages 793–801, New York, NY, USA, 2013. ACM.
[20] H. Kim and Z. Gu. A logistic regression analysis for predicting bankruptcy in the hospitality industry. The Journal of Hospitality Financial Management, 14(1):17–34, 2006.
[21] H. Kim and Z. Gu. Predicting restaurant bankruptcy: A logit model in comparison with a discriminant model. Journal of Hospitality & Tourism Research, 30(4):474–493, 2006.
[22] S. Y. Kim and A. Upneja. Predicting restaurant financial distress using decision tree and AdaBoosted decision tree models. Economic Modelling, 36:354–362, 2014.
[23] Q. V. Le and T. Mikolov. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, pages 1188–1196, 2014.
[24] C. C. Lee. Understanding negative reviews' influence on user reaction in restaurant recommending applications: An experimental study.
[25] H. Li and J. Sun. Forecasting business failure: The use of nearest-neighbour support vectors and correcting imbalanced samples - evidence from the Chinese hotel industry. Tourism Management, 33(3):622–634, 2012.
[26] T. Liu, S. Liu, Z. Chen, and W. Ma. An evaluation on feature selection for text clustering. In Machine Learning, Proceedings of the Twentieth International Conference (ICML 2003), August 21-24, 2003, Washington, DC, USA, pages 488–495, 2003.
[27] T. Mikolov. Recurrent neural network based language model.
[28] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[29] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119, 2013.
[30] R. J. Oentaryo, E.-P. Lim, D. Lo, F. Zhu, and P. K. Prasetyo. Collective churn prediction in social network. In Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012), pages 210–214. IEEE Computer Society, 2012.
[31] M. Olsen, C. Bellas, and L. V. Kish. Improving the prediction of restaurant failure through ratio analysis. International Journal of Hospitality Management, 2(4):187–193, 1983.
[32] N. J. Yuan, F. Zhang, D. Lian, K. Zheng, S. Yu, and X. Xie. We know how you live: Exploring the spectrum of urban lifestyles. In Proceedings of the First ACM Conference on Online Social Networks, COSN '13, pages 3–14, New York, NY, USA, 2013. ACM.
[33] F. Zhang, N. J. Yuan, K. Zheng, D. Lian, X. Xie, and Y. Rui. Exploiting dining preference for restaurant recommendation. In Proceedings of the 25th International Conference on World Wide Web, WWW '16, pages 725–735, Republic and Canton of Geneva, Switzerland, 2016. International World Wide Web Conferences Steering Committee.
[34] Y. Zhong, N. J. Yuan, W. Zhong, F. Zhang, and X. Xie. You are where you go: Inferring demographic attributes from location check-ins. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, WSDM '15, pages 295–304, New York, NY, USA, 2015. ACM.
[35] Y. Zhu, E. Zhong, S. J. Pan, X. Wang, M. Zhou, and Q. Yang. Predicting user activity level in social networks. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, CIKM '13, pages 159–168, New York, NY, USA, 2013. ACM.