27 Iyar 5773
1
Use of LDA Topics in Aspect
and Sentiment Analysis
by: Masha Igra
Adviser: Prof. Michael Elhadad
2
Agenda
• Introduction
• Previous work
– Knowledge Sources for Sentiment Analysis
– Two-phase Approach
• Aspect Detection
• Sentiment Analysis
– Joint Models
• Proposed method
• Results
• Summary
3
Introduction
“What other people think” has always been an important piece of
information during decision making.
“The restaurant is really pretty inside and everyone who works there
looks like they like it.
The food is really great.
The reason they aren't getting five stars is because of their parking
situation.”
4
Introduction
“What other people think” has always been an important piece of
information during decision making.
“The restaurant is really pretty inside and everyone who works there
looks like they like it. [Positive]
The food is really great. [Positive]
The reason they aren't getting five stars is because of their parking
situation.” [Negative]
9
Challenges
Can't we just look for words like “great” or “terrible”?
Yes, but learning a sufficient set of such words or phrases is an active challenge.
"This film should be brilliant. It sounds like a great plot, the actors are
first grade, and the supporting cast is good as well, and Stallone is
attempting to deliver a good performance. However, it can't hold up."
Overall sentiment is negative
“She runs the gamut of emotions from A to B.”
No ostensibly negative words occur.
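The keyword approach questioned on this slide can be sketched as follows. This is only an illustration: the two word sets are a tiny hypothetical lexicon, not one used in the work. On the two quotes above it shows both failure modes: positive words in a negative review, and a negative remark with no lexicon words at all.

```python
# Hypothetical, tiny opinion lexicon; a real one would be far larger.
POSITIVE = {"great", "good", "brilliant"}
NEGATIVE = {"terrible", "bad", "poor"}

def lexicon_polarity(text):
    """Classify text by counting positive minus negative lexicon hits."""
    tokens = [t.strip('.,!?"').lower() for t in text.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

film_review = (
    "This film should be brilliant. It sounds like a great plot, the actors are "
    "first grade, and the supporting cast is good as well, and Stallone is "
    "attempting to deliver a good performance. However, it can't hold up."
)
gamut_review = "She runs the gamut of emotions from A to B."

print(lexicon_polarity(film_review))   # the overall sentiment is actually negative
print(lexicon_polarity(gamut_review))  # no lexicon word occurs at all
```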
10
Challenges (2)
“Read the book.” - Positive or Negative?
Sentiment-related indicators are domain-dependent:
“Read the book.” - positive for book,
“Read the book.” - negative for movie.
“Unpredictable” - positive for movie plots,
“Unpredictable” - negative for a car's steering
Opinion words also depend on the aspect within a domain (here, an electronic device):
“Large” - positive for the screen aspect,
“Large” - negative for the battery aspect.
11
Terminology
Opinion:
“An opinion is simply a positive or negative sentiment, view,
attitude, emotion, or appraisal about an entity or an aspect of
the entity from an opinion holder.” [Kim and Hovy, 2004]
Domain:
“A domain is a product, service, person, event or
organization.” [Liu and Zhang, 2012]
Aspect:
“An aspect is a set of terms characterizing a subtopic or a
theme in a given domain, which can be features of products or
attributes of services.” [Liu and Zhang, 2012]
12
Why is it important?
With the dramatic growth of user generated content comes a
corresponding need for automatic tools capable of extracting relevant
information for the user from plain text:
• Comparing two similar products:
– Presenting to the user the aspects in which the products differ.
• Automatic recommendations generation:
– Based on similarity between products, user reviews, and history of
previous purchases.
• A summary of the important factors mentioned in the reviews of a
product.
13
Agenda
• Introduction
• Previous work
– Knowledge Sources for Sentiment Analysis
– Two-phase Approach
• Aspect Detection
• Sentiment Analysis
– Joint Models
• Proposed method
• Results
• Summary
14
Knowledge Sources for Sentiment Analysis
In most sentiment analysis approaches, the following features have been used:
– Terms and their frequency:
• individual words or word n-grams: “great”, “bad”, “so cheap”
• TF-IDF weights (words that are more frequent in a document than expected across all documents are more relevant than words that are frequent across all documents):
$tf\text{-}idf_i = tf_i \cdot \log \frac{N}{df_i}$
tf_i - the number of times term i occurs in the document.
N - the total number of documents.
df_i - the number of documents that contain term i.
– Part of speech (POS): adjectives, verbs, nouns.
– Opinion words and phrases: words that are commonly used to express positive or negative sentiments:
• beautiful, good, and amazing (positive)
• bad, poor, and terrible (negative)
– Negations: “I don’t like this camera”
– Syntactic dependency: word dependency-based features, dependency trees.
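The TF-IDF weighting defined on this slide can be sketched in a few lines. This is a minimal illustration assuming whitespace tokenization and the natural logarithm; the function name `tf_idf` and the three example documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document, weight = tf_i * log(N / df_i)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)                 # N: total number of documents
    df = Counter()                          # df_i: documents containing term i
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)                # tf_i: occurrences of term i in this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

docs = ["the food is great", "the service is bad", "the food was so cheap"]
weights = tf_idf(docs)
print(weights[0]["the"])   # "the" occurs in every document, so its weight is 0
print(weights[1]["bad"])   # "bad" occurs in one document only: log(3)
```

A term that appears in every document gets weight zero, which is the behaviour the slide describes: words frequent across all documents carry little information.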
15
Aspect Sentiment Analysis Approaches
• Two-phase approach:
– The first phase attempts to extract the aspects of an object that users frequently rate.
– The second phase classifies and aggregates sentiment over each of these aspects.
• Joint model:
The joint model discovers aspects and sentiment simultaneously.
16
Datasets
Dataset        Number of aspects   Number of sentences
Restaurants    6                   80,000
Hotels         7                   49,471
Multi-Domain   4                   3,684
DVD            4                   2,660
A restaurant review:
<Ambience><Negative> “It became impossible to stand and have a drink or any type of
conversation .”
<Staff><Negative> “After waiting an hour and a half , we were finally seated at 11:00 .”
<Food><Negative> “I had a blue cheese burger that was dry and tasteless .”
17
Two-Phase Approach: Aspect Detection
• LocalLDA [Brody and Elhadad, 2010] : a method which operates LDA on sentences, rather than documents, and employs a small number of topics that correspond to ratable aspects.
• Latent Dirichlet Allocation (LDA) [Blei et al., 2003] :
A probabilistic generative model that can be used to estimate the properties of multinomial observations by unsupervised learning.
Intuition: to find the latent structure of “topics” or “concepts” in a text corpus, which captures the meaning of the text.
18
Latent Dirichlet Allocation (LDA) - Blei et al. [2003]
19
LDA (2)
20
The LDA model
[Figure: graphical model of LDA for three documents, each with topic proportions θ, topic assignments z1..z4 and words w1..w4, with the topic-word parameter β]
• For each document:
• Choose θ ~ Dirichlet(α)
• For each of the N words wn:
– Choose a topic zn ~ Multinomial(θ)
– Choose a word wn from p(wn | zn, β), a multinomial probability conditioned on the topic zn.
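The generative process on this slide can be sketched with the standard library alone. This is an illustration, not the implementation used in the work: the two-topic `beta` matrix, the vocabulary and the `alpha` values are hypothetical, and the Dirichlet draw is obtained by normalizing Gamma variates.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(n_words, alpha, beta, vocabulary, rng):
    """Generate one document: draw theta, then a topic z_n and a word w_n per position."""
    theta = sample_dirichlet(alpha, rng)            # per-document topic proportions
    topic_ids = list(range(len(alpha)))
    document = []
    for _ in range(n_words):
        z = rng.choices(topic_ids, weights=theta)[0]     # z_n ~ Multinomial(theta)
        w = rng.choices(vocabulary, weights=beta[z])[0]  # w_n ~ p(w | z_n, beta)
        document.append((z, w))
    return theta, document

# Hypothetical toy setting: topic 0 is about food, topic 1 about service.
vocabulary = ["food", "burger", "waiter", "service"]
beta = [[0.5, 0.5, 0.0, 0.0],
        [0.0, 0.0, 0.5, 0.5]]
theta, document = generate_document(4, [0.5, 0.5], beta, vocabulary, random.Random(0))
print(theta, document)
```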
21
The LDA model (cont.)
[Figure: LDA plate diagram showing the topic plate, the document plate and the word plate]
The LDA solution is based on Gibbs sampling.
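The slide does not say which Gibbs sampler was used. One common choice is collapsed Gibbs sampling, sketched below under that assumption. The function name `gibbs_lda`, the hyperparameters `alpha` and `eta` (the smoothing prior on topic-word counts) and the toy corpus of integer word ids are all hypothetical.

```python
import random

def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, eta=0.01, n_iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids."""
    rng = random.Random(seed)
    topic_ids = list(range(n_topics))
    doc_topic = [[0] * n_topics for _ in docs]                 # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(n_topics)]   # n(k, w)
    topic_total = [0] * n_topics                               # n(k)
    assignments = []
    # Random initialization of the topic of every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(n_topics) for _ in doc]
        for w, k in zip(doc, zs):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + eta) / (topic_total[t] + vocab_size * eta)
                    for t in topic_ids
                ]
                k = rng.choices(topic_ids, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return assignments, doc_topic, topic_word

# Toy corpus over a 6-word vocabulary (ids 0-2 food-like, 3-5 service-like).
corpus = [[0, 1, 2, 0], [3, 4, 5, 4], [1, 2, 0], [5, 3, 4]]
assignments, doc_topic, topic_word = gibbs_lda(corpus, n_topics=2, vocab_size=6)
print(doc_topic)
```

The returned count tables can be normalized (with the same smoothing) to estimate per-document topic proportions and per-topic word distributions.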
22
LocalLDA
• LocalLDA [Brody and Elhadad, 2010]: According to previous research, LDA is not suited to the task of aspect detection in reviews, because it tends to capture global topics in the data rather than ratable aspects relevant to the review. To prevent the inference of global topics and direct the model towards ratable aspects, they treated each sentence as a separate document.
“… public transport in London is straightforward. The tube station is about an 8 minute walk … or you can get a bus for £1.50”.
A global topic: London.
A local topic: the ratable aspect location.
Results:
             Precision   Recall
Food         82%         85%
Service      71%         75%
Atmosphere   63%         61%
• There are many variations and extensions of LDA.
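The preprocessing idea behind LocalLDA, treating each sentence as its own document before running LDA, can be sketched as follows. The regular-expression sentence splitter and tokenizer are simplifying assumptions, and `sentences_as_documents` is a hypothetical helper name.

```python
import re

def sentences_as_documents(reviews):
    """Turn each sentence of each review into a separate (review_id, tokens) document."""
    documents = []
    for review_id, review in enumerate(reviews):
        # Naive split on sentence-final punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", review.strip()):
            tokens = re.findall(r"[a-z0-9']+", sentence.lower())
            if tokens:
                documents.append((review_id, tokens))
    return documents

reviews = [
    "The food is really great. The reason they aren't getting five stars "
    "is because of their parking situation."
]
local_docs = sentences_as_documents(reviews)
print(local_docs)
```

The token lists in `local_docs` would then be passed to an LDA implementation in place of whole reviews, with a small number of topics.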
23
Two-Phase Approach: Sentiment Analysis
• Linguistic heuristics approach [Hatzivassiloglou and McKeown, 1997]: extracting a list of adjectives that have positive and negative meanings.
– Conjunctions between adjectives provide indirect information about orientation:
• “fair and legitimate”, “corrupt and brutal”.
• “but” usually connects two adjectives of different orientations.
– Clustering algorithm separates the adjectives into two subsets of different orientation.
– The group whose members have the highest average frequency is labeled as positive.
Input: Wall Street Journal corpus.
Output: Positive and negative adjectives.
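The conjunction heuristic can be illustrated with a much simpler stand-in for the original clustering algorithm: a graph two-coloring in which “and” links force the same group and “but” links force opposite groups. The “but” link and the corpus frequencies below are hypothetical, and the sketch assumes the links are consistent and form a single connected component.

```python
from collections import defaultdict, deque

def split_by_orientation(links, frequency):
    """Two-color adjectives from (adj, conjunction, adj) links; return (positive, negative)."""
    graph = defaultdict(list)
    for a, conj, b in links:
        same = conj == "and"            # "and": same orientation, "but": opposite
        graph[a].append((b, same))
        graph[b].append((a, same))
    side = {}
    for start in graph:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:                    # breadth-first propagation of group labels
            current = queue.popleft()
            for neighbor, same in graph[current]:
                if neighbor not in side:
                    side[neighbor] = side[current] if same else 1 - side[current]
                    queue.append(neighbor)
    groups = [[w for w in side if side[w] == g] for g in (0, 1)]

    def average_frequency(group):
        return sum(frequency[w] for w in group) / len(group) if group else 0.0

    # The group with the higher average frequency is labeled positive.
    positive, negative = sorted(groups, key=average_frequency, reverse=True)
    return set(positive), set(negative)

links = [
    ("fair", "and", "legitimate"),
    ("corrupt", "and", "brutal"),
    ("fair", "but", "brutal"),          # hypothetical link joining the two pairs
]
frequency = {"fair": 50, "legitimate": 20, "corrupt": 10, "brutal": 8}  # hypothetical counts
positive, negative = split_by_orientation(links, frequency)
print(positive, negative)
```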
24
Sentiment Analysis (2)
Classifiers based on machine learning showed higher
performance than rule-based classifiers.
• Word unigram-based model trained with SVMs [Pang et al., 2002]
• Focusing only on the subjective sentences in the reviews gives slightly lower
accuracy than the classifier using full reviews [Pang and Lee, 2004]
                       Accuracy
Full reviews           87.2%
Subjective sentences   87.15%
25
Joint Models
• Sentence-LDA (SLDA) and Aspect and Sentiment Unification Model (ASUM) [Jo and Oh, 2011] : one sentence tends to represent one aspect and one sentiment.
26
Research questions
• Do topic models help in supervised aspect identification
and sentiment detection?
• We want to compare results across multiple datasets that
have been used in previous work but not previously
compared.
27
Agenda
• Introduction
• Previous work
– Knowledge Sources for Sentiment Analysis
– Two-phase Approach
• Aspect Detection
• Sentiment Analysis
– Joint Models
• Proposed method
• Results
• Summary
28
Methodology – aspect-sentiment example
A restaurant review:
“The bar was crowded with other people waiting to be seated for their reservations .
It became impossible to stand and have a drink or any type of conversation .
After waiting an hour and a half , we were finally seated at 11:00 .
I had a blue cheese burger that was dry and tasteless .”
29
Methodology – aspect-sentiment example (2)
A restaurant review:
“The bar was crowded with other people waiting to be seated for their reservations .
It became impossible to stand and have a drink or any type of conversation .
After waiting an hour and a half , we were finally seated at 11:00 .
I had a blue cheese burger that was dry and tasteless .”
Aspects: Staff, Ambience, Food
30
Methodology – aspect-sentiment example (3)
A restaurant review:
“The bar was crowded with other people waiting to be seated for their reservations .
It became impossible to stand and have a drink or any type of conversation .
After waiting an hour and a half , we were finally seated at 11:00 .
I had a blue cheese burger that was dry and tasteless .”
Aspects: Staff, Ambience, Food
Sentiment: Negative for each of the four sentences
31
Methodology – training step
• Aspects extraction (input: sentences):
– Remove stop words
– Extract LDA topics
– Extract unigrams, bigrams, POS
– Prepare TF-IDF features
– Train SVM model for aspects extraction
– Extract aspect reviews and group them to aspect datasets
– Output: predicted aspect datasets
• Sentiment analysis (per aspect):
– Extract LDA topics
– Extract unigrams, bigrams, POS
– Prepare TF-IDF features
– Train SVM model for aspect sentiment classification
– Sentiment classification of aspect reviews
– Output: sentiment of reviews
32
Methodology – test step
• Sentence [features: unigrams, POS, topics] → SVM model for aspects extraction → (sentence, aspect)
• Per aspect: Sentence [features: unigrams, POS, topicsA] → sentiment SVM aspect model → (sentence, sentiment)
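The control flow of the test step can be sketched as a two-stage function: an aspect model picks the aspect, then the sentiment model for that aspect classifies the sentence. The keyword-based models below are hypothetical stand-ins for the trained SVM models, used only to keep the sketch self-contained.

```python
def analyze_sentence(sentence, aspect_model, sentiment_models):
    """Phase 1 picks the aspect; phase 2 applies that aspect's sentiment model."""
    aspect = aspect_model(sentence)
    sentiment = sentiment_models[aspect](sentence)
    return aspect, sentiment

# Hypothetical keyword stand-in for the trained aspect SVM.
def toy_aspect_model(sentence):
    words = sentence.lower().split()
    if "burger" in words:
        return "Food"
    if "seated" in words:
        return "Staff"
    return "Ambience"

# Hypothetical keyword stand-in for a per-aspect sentiment SVM.
def make_toy_sentiment_model(negative_cues):
    def model(sentence):
        words = set(sentence.lower().split())
        return "Negative" if words & negative_cues else "Positive"
    return model

sentiment_models = {
    "Food": make_toy_sentiment_model({"dry", "tasteless"}),
    "Staff": make_toy_sentiment_model({"waiting"}),
    "Ambience": make_toy_sentiment_model({"impossible", "crowded"}),
}

food_result = analyze_sentence(
    "I had a blue cheese burger that was dry and tasteless .",
    toy_aspect_model, sentiment_models)
staff_result = analyze_sentence(
    "After waiting an hour and a half , we were finally seated at 11:00 .",
    toy_aspect_model, sentiment_models)
print(food_result, staff_result)
```

Keeping one sentiment model per aspect mirrors the slide: the second stage sees aspect-specific features (topicsA), so the same word can carry different polarity under different aspects.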
33
Aspect Classification
• Construct a supervised classifier (SVM) in order to