ENTER 2015 Research Track
Analyzing User Reviews in Tourism with Topic Models
Marco Rossetti, Fabio Stella, Longbin Cao and Markus Zanker*
Alpen-Adria-Universität Klagenfurt, Austria
http://www.aau.at
* The presenter acknowledges the financial support of the European Union (EU), the European Regional Development Fund (ERDF), the Austrian Federal Government and the State of Carinthia in the Interreg IV Italien-Österreich programme (project acronym O-STAR).
– ~200 million reviews on Tripadvisor
– Valuable opinion source
• Need for automated processing of data harvested from the Web.
• Two principal (research) directions
– Machine Learning (ML): fitting general-purpose statistical models to data
– Semantic Web: aims to move from the traditional "unstructured" Web to a web of data (annotating data with semantic descriptors and providing efficient reasoning mechanisms)
• Topic modeling falls within the ML direction, but it promises to detect semantic ties between words
Topic Model 1/3
• Method to organize, search and summarize electronic documents
• "... algorithms for discovering the themes that pervade a large and otherwise unstructured collection of documents." [Blei, CACM, 2012]
• Unsupervised learning strategy that builds on the basic idea:
– Big corpus of documents such as reviews
– Uncover hidden topical patterns
– Annotate documents according to those topics
Topic Model 3/3
• Intuition: Topics are probability distributions over words and this discrete distribution generates observations (words in documents).
• Computation task: Compute the topic structure given the observations (posterior).
– Approximation of ..
– .. distribution over words for each topic
– .. topic proportion for each document
– .. topic assignment to each occurrence of a word in a
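To make the computation task concrete, the following is a minimal sketch of one way to approximate the posterior. It assumes latent Dirichlet allocation (LDA) as the topic model and collapsed Gibbs sampling as the approximation method; the slides do not state which inference procedure was actually used. The function name `lda_gibbs`, its parameters, the hyperparameter values and the toy corpus are all illustrative assumptions.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, not the authors' code).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (phi, theta, z):
      phi[k][w]   estimated distribution over words for topic k
      theta[d][k] estimated topic proportion for document d
      z[d][i]     topic assigned to the i-th word occurrence of document d
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # counts per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # counts per (topic, word)
    topic_total = [0] * num_topics                              # words assigned to each topic

    # Start from a random topic assignment for every word occurrence.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic given all other assignments
                # (unnormalised; random.choices normalises the weights).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Point estimates from the final sample, smoothed by the Dirichlet priors.
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in range(vocab_size)]
        for k in range(num_topics)
    ]
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return phi, theta, z


# Toy corpus (made up): word ids 0-1 and 2-3 tend to co-occur.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]]
phi, theta, z = lda_gibbs(docs, num_topics=2, vocab_size=4)
```

The three return values correspond to the three quantities listed on the slide: the per-topic word distribution, the per-document topic proportion, and the per-occurrence topic assignment. In practice one would more likely rely on an established library implementation than on a hand-written sampler.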
• Evaluation on datasets from YELP (restaurants) and Tripadvisor (hotels) with different levels of sparsity
• Accuracy (RMSE) of the Topic-Criteria model is comparable to Nearest-Neighbor and Matrix Factorization approaches, BUT it yields richer user profiles, and we could explain which topics were considered in real user interaction!
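For reference, RMSE (root mean squared error) is the square root of the average squared difference between predicted and actual ratings, so lower values mean more accurate predictions. A minimal sketch follows; the function name `rmse` and the sample ratings are illustrative and are not taken from the evaluation.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equally long rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up example: three predicted ratings versus the ratings users actually gave.
error = rmse([4.2, 3.1, 5.0], [4, 3, 4])
```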
• Automatically derive different properties from a review such as:
– Rating value: extract topics from the written text and match them with the item profile; if a user writes about the strengths of the hotel, a high score is expected
– Identify reviews where the associated rating value is / is not coherent with the predicted rating to identify fake reviews or rank more plausible reviews higher
– Identify reviews with more breadth / broader scope (see Daniel Leung's thesis)
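The last two ideas can be sketched as simple checks on top of a topic model's output. This is only one possible reading of the bullets, not a method described in the slides: the function names `is_coherent` and `topic_breadth`, the tolerance of one rating point and the 10% share cutoff are all hypothetical choices.

```python
def is_coherent(stated_rating, predicted_rating, tolerance=1.0):
    """True if the user's stated rating is within `tolerance` of the predicted rating."""
    return abs(stated_rating - predicted_rating) <= tolerance


def topic_breadth(topic_proportions, min_share=0.1):
    """Number of topics that make up at least `min_share` of a review."""
    return sum(1 for share in topic_proportions if share >= min_share)


# Made-up review: stated 5 stars, the model predicts 2.0 from the text,
# and the text is dominated by two of four topics.
suspicious = not is_coherent(5, 2.0)
breadth = topic_breadth([0.7, 0.2, 0.05, 0.05])
```

Reviews flagged as incoherent could then be inspected or ranked lower, and reviews with a higher breadth count could be treated as having broader scope. Suitable thresholds would have to be determined empirically.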