Yelp Data Analysis Sugandha Goel Nisha Nair Liz Stapleton Yiqun Xiang
Our Data
• Data received from Yelp
• All Data – includes four countries (US, UK, DE, CA)
• Business – list of businesses, key variables included:
  • Business Category (multiple)
  • Review Count
  • # Stars
  • Location
• Tip – comments given by users about businesses
• User – list of users, key variables included:
  • Review Count
  • Average Stars Given
  • Yelping Since
  • # of Fans
BUSINESS DATA
Business Data – Cleaning the Data
• Initial dataset:
  • 61,184 initial records
  • 436 categories
• Removed:
  • Non-food related categories, using category1 and category2
    • 19,981 rows remaining
    • 113 categories remaining
  • Columns that had fewer than 1,000 completed rows
    • More complete dataset
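The two cleaning steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the team's actual procedure: the record layout (a list of dicts with `category1` and `category2` keys), the helper names, and the sample rows are all hypothetical, and the completeness threshold is a parameter rather than the 1,000 used on the slide.

```python
# Hypothetical record layout: each business is a dict; missing values are None.
def keep_food_businesses(rows, food_categories):
    """Keep rows whose category1 or category2 is a food-related category."""
    return [
        row for row in rows
        if row.get("category1") in food_categories
        or row.get("category2") in food_categories
    ]

def drop_sparse_columns(rows, min_completed):
    """Drop columns that have fewer than `min_completed` non-missing values."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            if value not in (None, ""):
                counts[key] = counts.get(key, 0) + 1
    kept = {key for key, n in counts.items() if n >= min_completed}
    return [{k: v for k, v in row.items() if k in kept} for row in rows]

# Tiny illustrative sample (made up, not Yelp data).
sample = [
    {"name": "A", "category1": "Restaurants", "category2": "Pizza", "drive_thru": None},
    {"name": "B", "category1": "Auto Repair", "category2": None, "drive_thru": None},
    {"name": "C", "category1": "Bars", "category2": "Restaurants", "drive_thru": True},
]
food = keep_food_businesses(sample, {"Restaurants", "Pizza", "Bars"})
cleaned = drop_sparse_columns(food, min_completed=2)
```

In this toy sample the `drive_thru` column is filled for only one of the two remaining rows, so it is dropped when `min_completed=2`.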
Business Data – Decision Tree Analysis (CHAID)
• With drive-thru: 60% rated between 2.5 and 3.5 stars
• No drive-thru: 58% rated between 3.5 and 4.0 stars
• No street parking: 69% rated > 3.5 stars
• With street parking: 83% rated > 3.5 stars
• Important factors:
  • Drive Thru
  • Review Count
  • Parking (Lot/Street)
  • Noise Level
  • Takes Reservations
  • Outdoor Seating
• Businesses without a drive-thru rate higher than those with one
• The greater the review count, the better the star rating
Business Data – Tableau Discovery
State      Population   # of Businesses   Avg Reviews per Business
NV         2.8 M        4,626             83
AZ         6.7 M        7,255             47
NV/AZ (%)  42%          64%               179%
• The average number of reviews per business in NV (83) is nearly twice that of AZ (47) and about five times that of SC (16).
• Potential reasons:
  • (1) NV has more Yelp users
  • (2) Yelp users in NV write reviews more frequently
• Conclusion: using Yelp appears to be more of a cultural norm in NV
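The NV/AZ comparison can be reproduced from the figures on this slide. The sketch below is only an arithmetic check; the dictionary layout and the businesses-per-million figure are additions for illustration. Note that the rounded averages (83 and 47) give about 177% for reviews per business, slightly below the 179% shown, which presumably was computed from unrounded averages.

```python
# Figures copied from the table on this slide.
states = {
    "NV": {"population_m": 2.8, "businesses": 4626, "avg_reviews": 83},
    "AZ": {"population_m": 6.7, "businesses": 7255, "avg_reviews": 47},
}

def ratio_pct(metric):
    """NV value as a percentage of the AZ value, rounded to a whole percent."""
    return round(100 * states["NV"][metric] / states["AZ"][metric])

ratios = {m: ratio_pct(m) for m in ("population_m", "businesses", "avg_reviews")}

# Businesses per million residents, a derived figure not shown on the slide.
per_million = {s: v["businesses"] / v["population_m"] for s, v in states.items()}
```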
TIP DATA
Tip Data – Sentiment Analysis
• Completed sentiment analysis using RStudio
  • Randomly chose 50,000 comments from the 500,000 available
• Conclusions:
  • People may be worried about writing negative reviews
  • People who are satisfied are more likely to spend the time giving the business a positive review
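The sampling and scoring steps could look roughly like the sketch below. The original analysis was done in RStudio, so this Python version is a stand-in rather than the team's code; the word lists, function names, and example tips are hypothetical, and a simple positive-minus-negative word count is assumed as the scoring rule.

```python
import random

# Hypothetical mini-lexicon; a real analysis would use a full sentiment lexicon.
POSITIVE = {"great", "good", "love", "best", "friendly"}
NEGATIVE = {"bad", "slow", "rude", "worst", "cold"}

def sample_tips(tips, k, seed=0):
    """Draw k tips uniformly at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(tips, k)

def sentiment_score(text):
    """Positive-word count minus negative-word count for one tip."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Made-up example tips, not Yelp data.
tips = ["Great tacos, friendly staff!", "Slow service and cold fries.", "Open late."]
subset = sample_tips(tips, k=2)
scores = [sentiment_score(t) for t in tips]
```

For the full dataset, the same `sample_tips` call with `k=50000` would draw the subset described on the slide.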
Tip Data – Word Clouds
Most frequent words (1-star reviews)
Tip Data – Word Clouds
Most frequent words (5-star reviews)
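The word counts behind a word cloud can be approximated with a simple frequency tally. This is a sketch under assumptions: the stop-word list, the tokenizing pattern, and the sample comments are hypothetical, and the tool actually used to draw the clouds is not stated on the slides.

```python
from collections import Counter
import re

# Hypothetical stop-word list; kept short for illustration.
STOP_WORDS = {"the", "a", "and", "is", "was", "to", "of", "it"}

def top_words(tips, n):
    """Return the n most frequent non-stop-words across all tips."""
    counts = Counter()
    for tip in tips:
        words = re.findall(r"[a-z']+", tip.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Made-up 1-star comments, not Yelp data.
one_star = ["The service was slow", "Slow and rude service", "Rude staff"]
top = top_words(one_star, 3)
```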
USER DATA
User Data – Cleaning the Data
• Removed:
  • All users without a user ID
• Added:
  • # of years since users started yelping
* $\text{Frequency of Review} = \dfrac{\text{Review Count}}{\text{\# of years yelping}}$
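The derived variables above can be computed as in this sketch. The function names, the analysis date, and the example user are assumptions for illustration; the slides do not say how the number of years was measured or how brand-new accounts were handled.

```python
from datetime import date

def years_yelping(yelping_since, as_of):
    """Years between the user's join date and the analysis date."""
    return (as_of - yelping_since).days / 365.25

def review_frequency(review_count, yelping_since, as_of):
    """Review Count divided by # of years yelping."""
    years = years_yelping(yelping_since, as_of)
    if years <= 0:
        # Assumed handling: no frequency is defined for accounts with no elapsed time.
        return None
    return review_count / years

# Hypothetical user: 120 reviews, joined four years before the analysis date.
freq = review_frequency(120, date(2011, 1, 1), date(2015, 1, 1))
```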
User Data – Regression Analysis
Conclusion
• All three independent variables are significant in this model
• The more frequently a user writes reviews, the fewer fans they have
• People care about quality rather than quantity of reviews
SUMMARY
Advice to Improve your Yelp Rating
Do:
• Take reservations
• Offer a quieter atmosphere
• Offer sufficient parking
• Encourage customers to write reviews
Don’t:
• Have a drive-thru
• Have a noisy environment
• Be cash only
Software Used in Our Analysis
QUESTIONS?
APPENDIX
User Data – Cluster Analysis
Conclusion
• Cluster analysis does not provide any useful conclusions because 96% of the data falls into one cluster
• Most users are similar to one another