Sentiment Detection
Naveen Sharma (02005010), Prateek Choudhary (02005016), Yashpal Meena (02005030)
Under the guidance of Prof. Pushpak Bhattacharyya


Jan 02, 2016



Mae Riley


Outline

Problem Statement
Challenges
Earlier Work and Traditional Approaches
Recent Advances
Conclusion/Future Directions


Sentiment Analysis

What is sentiment analysis?
- Determining the overall polarity of a given document

Polarity:
- Positive
- Negative
- Mixed
- Neutral


Motivation

Individual
- Movie reviews on the web (thumbs up or thumbs down)

Commercial
- Feedback/evaluation forms
- Opinions about a product
- Recognizing and discarding "flames" on newsgroups

Political
- Opinions on government policies, e.g., the Iraq War, taxation


Sentiment Analysis

A type of text classification
Other types of text classification:
- Author-based classification
- Topic categorization

Sentiment analysis and topic categorization:
- Topics: subject matter
- Sentiments: opinion towards the subject matter


Challenges

Reference to multiple objects in the same document
- The NR70 is trendy. T-Series is fast becoming obsolete.

Dependence on the context of the document
- "Unpredictable" plot vs. "unpredictable" performance

Negations have to be captured
- Monochrome display is not what the user wants.
- It is not like the movie is a total waste of time.
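One common heuristic for capturing negation in bag-of-words models is to prefix every word between a negation word and the next punctuation mark with NOT_, so that "wants" and "NOT_wants" become different features. The sketch below assumes a small hand-picked negation list and a simple regex tokenizer; both are illustrative choices, not part of the original slides.

```python
import re

# Illustrative negation list and clause-ending punctuation (assumptions).
NEGATIONS = {"not", "no", "never", "n't", "cannot"}
PUNCT = {".", ",", ";", ":", "!", "?"}

def tokenize(text):
    """Lowercase, split off the "n't" clitic, and keep punctuation as tokens."""
    text = re.sub(r"n't\b", " n't", text.lower())
    return re.findall(r"n't|[a-z]+|[.,;:!?]", text)

def mark_negation(tokens):
    """Prefix NOT_ to every token between a negation word and the next punctuation mark."""
    out, negating = [], False
    for tok in tokens:
        if tok in PUNCT:
            negating = False          # punctuation ends the negation scope
            out.append(tok)
        elif tok in NEGATIONS:
            negating = True           # start a new negation scope
            out.append(tok)
        else:
            out.append("NOT_" + tok if negating else tok)
    return out

print(mark_negation(tokenize("Monochrome display is not what the user wants.")))
```

On the first negation example, the words after "not" become NOT_what, NOT_the, NOT_user and NOT_wants. The heuristic is crude: on the second example it also marks "like" and "movie", which is part of why negation remains a challenge.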


Challenges (contd.)

Metaphors/similes
- The metallic body is solid as a rock.

Part-of and attribute-of relationships
- The small keypad is inconvenient.

Subtle expression
- How can someone sit through this movie?


Earlier Work (First Approaches)

Naïve Bayes
Maximum Entropy
Support Vector Machines


Naïve Bayes

What is a Naïve Bayes classifier?

Difficulty
- With more than a few variables, the joint distribution of the features given the class has too many parameters to estimate reliably

How to overcome this difficulty
- Assume the variables are independent (given the class)


Naïve Bayes (contd.)

Let {f_1, f_2, ..., f_m} be a set of predefined features.
- Features can be representative words or word patterns.

Each document d is represented by the document vector

$$\vec{d} = (n_1(d), \ldots, n_m(d))$$

where n_i(d) is the number of times feature f_i occurs in d.

Assign a document d to the class

$$c^* = \arg\max_c P(c \mid d)$$

where

$$P(c \mid d) = \frac{P(c)\, P(d \mid c)}{P(d)}$$

P(d) plays no role in selecting c*, since it is the same for every class.
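To make the document vector concrete: given a fixed feature list, the vector for d is just the count of each feature in the tokenized document. A minimal sketch, assuming features are single lowercase words; the feature list in the example is hypothetical.

```python
from collections import Counter

def document_vector(tokens, features):
    """Return (n_1(d), ..., n_m(d)): how often each feature f_i occurs in document d."""
    counts = Counter(tokens)
    return [counts[f] for f in features]

features = ["great", "boring", "plot"]  # hypothetical feature set {f_1, f_2, f_3}
print(document_vector("a great plot and great acting".split(), features))  # [2, 0, 1]
```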


Naïve Bayes (contd.)

Assuming the f_i are conditionally independent given the class, Naïve Bayes can be decomposed as

$$P_{NB}(c \mid d) := \frac{P(c) \left( \prod_{i=1}^{m} P(f_i \mid c)^{n_i(d)} \right)}{P(d)}$$

Advantages:
- Simple
- Performs well
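The decomposition above can be sketched as a multinomial Naïve Bayes classifier. This is a minimal sketch, not the implementation from the cited work: it works in log space to avoid underflow from the product, drops P(d) because it does not affect the argmax, and estimates P(f_i | c) with add-alpha (Laplace) smoothing, which is an assumption the slides do not specify. The training data is a hypothetical toy set.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Estimate log P(c) and log P(f|c) with add-alpha smoothing."""
    vocab = {w for doc in docs for w in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
    return log_prior, log_lik

def classify_nb(doc, log_prior, log_lik):
    """c* = argmax_c [log P(c) + sum_i n_i(d) log P(f_i|c)]; unseen words are ignored."""
    def score(c):
        # Summing over tokens counts each feature n_i(d) times.
        return log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])
    return max(log_prior, key=score)

docs = [["great", "fun", "movie"], ["great", "acting"],
        ["boring", "waste", "movie"], ["bad", "boring"]]  # hypothetical training data
labels = ["pos", "pos", "neg", "neg"]
prior, lik = train_nb(docs, labels)
print(classify_nb(["great", "movie"], prior, lik))  # pos
```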


Recent Advances

An unsupervised learning algorithm (Turney, 2002)

Extract phrases from the review based on patterns of part-of-speech tags.

JJ = adjective, NN = noun, NNS = plural noun

E.g., extracting two-word patterns:

First word | Second word | Third word (not extracted)
JJ | NN or NNS | anything
JJ | JJ | not NN nor NNS
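A sketch of the extraction step, restricted to the two patterns in the table above. It assumes the review has already been tagged with Penn Treebank tags (the tagger itself is out of scope), and the function name extract_phrases is hypothetical.

```python
def extract_phrases(tagged):
    """Extract two-word phrases matching the two tag patterns in the table.

    tagged: list of (word, Penn Treebank tag) pairs.
    """
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else ""  # tag of the third word, if any
        # Pattern 1: JJ followed by NN/NNS; the third word can be anything.
        if t1 == "JJ" and t2 in ("NN", "NNS"):
            phrases.append(f"{w1} {w2}")
        # Pattern 2: JJ JJ, provided the third word is not NN/NNS.
        elif t1 == "JJ" and t2 == "JJ" and t3 not in ("NN", "NNS"):
            phrases.append(f"{w1} {w2}")
    return phrases

tagged = [("the", "DT"), ("small", "JJ"), ("keypad", "NN"), ("is", "VBZ"),
          ("very", "RB"), ("bright", "JJ"), ("red", "JJ"), (".", ".")]
print(extract_phrases(tagged))  # ['small keypad', 'bright red']
```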


Unsupervised Learning (contd.)

Estimate the semantic orientation of the extracted phrases.

Use PMI (pointwise mutual information) as the strength of semantic association:

$$\mathrm{PMI}(word_1, word_2) = \log_2 \frac{p(word_1 \,\&\, word_2)}{p(word_1)\, p(word_2)}$$

$$\mathrm{SO}(phrase) = \mathrm{PMI}(phrase, \text{"excellent"}) - \mathrm{PMI}(phrase, \text{"poor"})$$
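The two formulas can be checked with a few lines of arithmetic. This sketch estimates each probability as a count divided by a total number of observations (for example, documents or search hits); that estimation scheme and the parameter names are assumptions for illustration. A zero co-occurrence count makes the logarithm undefined, so real use needs smoothing.

```python
import math

def pmi(n_both, n_1, n_2, n_total):
    """PMI(w1, w2) = log2[p(w1 & w2) / (p(w1) p(w2))], probabilities estimated from counts."""
    p_both = n_both / n_total
    return math.log2(p_both / ((n_1 / n_total) * (n_2 / n_total)))

def semantic_orientation(n_phrase, n_exc, n_poor, n_phrase_exc, n_phrase_poor, n_total):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor")."""
    return (pmi(n_phrase_exc, n_phrase, n_exc, n_total)
            - pmi(n_phrase_poor, n_phrase, n_poor, n_total))

print(pmi(2, 2, 2, 4))  # 1.0: the pair co-occurs twice as often as chance predicts
```

A phrase that co-occurs more often with "excellent" than with "poor" (relative to how common each word is) gets a positive SO.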


Unsupervised Learning (contd.)

Determine the semantic orientation (SO) of the phrases by searching on AltaVista:

$$\mathrm{SO}(phrase) = \log_2 \frac{\mathrm{hits}(phrase\ \mathrm{NEAR}\ \text{"excellent"})\; \mathrm{hits}(\text{"poor"})}{\mathrm{hits}(phrase\ \mathrm{NEAR}\ \text{"poor"})\; \mathrm{hits}(\text{"excellent"})}$$
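The hit-count form of SO follows from the PMI definition when probabilities are replaced by search-engine hit counts and the total count cancels. The sketch below takes the four hit counts as plain numbers, since the AltaVista service used in the original work no longer exists; the small smoothing constant added to the NEAR counts, to avoid division by zero, is an assumption of this sketch.

```python
import math

def so_from_hits(hits_near_exc, hits_near_poor, hits_exc, hits_poor, smoothing=0.01):
    """SO(phrase) = log2[(hits(phrase NEAR "excellent") * hits("poor")) /
                         (hits(phrase NEAR "poor") * hits("excellent"))].

    smoothing is added to both NEAR counts so a zero count does not divide by zero
    (an assumption of this sketch).
    """
    num = (hits_near_exc + smoothing) * hits_poor
    den = (hits_near_poor + smoothing) * hits_exc
    return math.log2(num / den)

print(so_from_hits(100, 25, 1000, 1000, smoothing=0.0))  # 2.0
```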


Unsupervised Learning (contd.)

Calculate the average semantic orientation of the phrases in the given review, and classify the review as "recommended" if the average is positive and "not recommended" otherwise.
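The final step is a simple average and threshold. A minimal sketch, assuming the per-phrase SO values have already been computed; raising an error for a review with no extracted phrases is a choice of this sketch, not something the slides specify.

```python
def classify_review(phrase_sos):
    """Average the SO of the extracted phrases; a positive average means "recommended"."""
    if not phrase_sos:
        raise ValueError("no phrases extracted from the review")
    avg = sum(phrase_sos) / len(phrase_sos)
    return "recommended" if avg > 0 else "not recommended"

print(classify_review([1.5, -0.5, 0.2]))  # recommended (average 0.4)
```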


Recent Advances (contd.)

Subjectivity and min-cuts approach by Pang and Lee
- Step 1: label sentences as subjective or objective.
- Step 2: apply a standard machine learning classifier to the subjective extract.


Min cut approach (contd.)

Formalization: suppose we have n items x_1, ..., x_n to divide into two classes C_1 and C_2.

We need two types of scores:
- Individual scores ind_j(x_i): an estimate of each x_i's preference for class C_j
- Association scores assoc(x_i, x_k): an estimate of how important it is for x_i and x_k to be in the same class


Min cut approach (contd.)

Maximize individual preferences.
Penalize tightly associated items that are placed in different classes.

Optimization problem: minimize the cost

$$\sum_{x \in C_1} \mathrm{ind}_2(x) + \sum_{x \in C_2} \mathrm{ind}_1(x) + \sum_{x_i \in C_1,\, x_k \in C_2} \mathrm{assoc}(x_i, x_k)$$

Build an undirected graph G with vertices {v_1, ..., v_n, s, t}:
- edge (s, v_i) with weight ind_1(x_i)


Min cut approach (contd.)

- edge (v_i, t) with weight ind_2(x_i)
- edge (v_i, v_k) with weight assoc(x_i, x_k)

The classification problem now reduces to finding a minimum s-t cut in the graph.
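The graph construction above can be sketched end to end with a textbook Edmonds-Karp maximum-flow routine: after the maximum flow is found, the items still reachable from s in the residual graph form C_1, the rest form C_2, and the flow value equals the cost of the minimum cut. This is an illustrative sketch with hypothetical function and argument names, not Pang and Lee's implementation; it uses a dense capacity matrix, which is fine for a handful of sentences but not for large graphs.

```python
from collections import deque

def min_cut_partition(ind1, ind2, assoc):
    """Split items into C1/C2 via an s-t minimum cut.

    ind1[i], ind2[i]: individual scores of item i for C1 and C2.
    assoc: dict {(i, k): weight} of association scores (undirected).
    Returns (C1, C2, cut_value).
    """
    n = len(ind1)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(n):
        cap[s][i] += ind1[i]          # edge (s, v_i) with weight ind_1(x_i)
        cap[i][t] += ind2[i]          # edge (v_i, t) with weight ind_2(x_i)
    for (i, k), w in assoc.items():
        cap[i][k] += w                # undirected edge = two arcs of equal capacity
        cap[k][i] += w

    def bfs_parents():
        """BFS over arcs with positive residual capacity; -1 marks unreached vertices."""
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0.0
    while True:  # Edmonds-Karp: augment along shortest residual paths
        parent = bfs_parents()
        if parent[t] == -1:
            break
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

    reachable = bfs_parents()  # source side of the minimum cut
    c1 = [i for i in range(n) if reachable[i] != -1]
    c2 = [i for i in range(n) if reachable[i] == -1]
    return c1, c2, flow

c1, c2, cost = min_cut_partition([0.8, 0.5, 0.1], [0.2, 0.5, 0.9], {(0, 1): 1.0})
print(c1, c2)  # [0, 1] [2]
```

In this example, item 1 has no individual preference (0.5 for each class), but its association with item 0 pulls it into C_1; the resulting cut costs about 0.8, which is the minimum of the cost formula over all partitions.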


Min cut approach (contd.)


Min cut approach (contd.)

Advantages/Analysis:
- Different algorithms
- Maximum flow algorithms
- N most subjective sentences
- Last N sentences
- Most subjective N sentences


Recent Advances

Using linguistic knowledge and WordNet synonymy graphs (Agarwal and Bhattacharyya)
- On movie reviews
- Bag-of-words features

Strength of an adjective:

$$\mathrm{EVA}(w) = \frac{d(w, \mathit{bad}) - d(w, \mathit{good})}{d(\mathit{good}, \mathit{bad})}$$
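A sketch of the EVA computation, assuming that d(w_1, w_2) is the shortest-path length between two words in a graph whose edges link synonyms. The tiny graph below is hypothetical and stands in for WordNet; loading the real WordNet graph is out of scope. By the triangle inequality, EVA lies in [-1, 1], with positive values leaning towards "good".

```python
from collections import deque

def build_graph(edges):
    """Undirected adjacency sets from (word, synonym) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def shortest_path_len(graph, src, dst):
    """Number of edges on a shortest path (BFS); None if the words are not connected."""
    if src == dst:
        return 0
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def eva(graph, w):
    """EVA(w) = (d(w, bad) - d(w, good)) / d(good, bad)."""
    return ((shortest_path_len(graph, w, "bad") - shortest_path_len(graph, w, "good"))
            / shortest_path_len(graph, "good", "bad"))

# Hypothetical synonymy edges, not taken from WordNet.
g = build_graph([("good", "fine"), ("fine", "decent"), ("decent", "mediocre"),
                 ("mediocre", "bad"), ("good", "great")])
print(eva(g, "great"), eva(g, "mediocre"))  # 1.0 -0.5
```

A word with no path to "good" or "bad" makes the distance None and the arithmetic fail, so real use needs to skip or default such words.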


WordNet approach (contd.)

"About" and "of" sentences
- About the movie (review)
- What's in the movie

Two kinds of weights:
- Individual weights: probability estimates by an SVM classifier
- Mutual weights: tendency to fall in the same category

Physical separation
- Paragraph boundaries

Contextual similarity
- Total adjective strength
- Scaling and distance measure


WordNet approach (contd.)

Minimum cut algorithm similar to Pang and Lee's

Mutual Similarity Coefficient (MSC):

$$\mathrm{MSC}(d_i, d_j) = \frac{\sum_k F_i(f_k)\, F_j(f_k) - s_{\min}}{s_{\max} - s_{\min}}$$

where f_k is the kth feature, F_i(f_k) = 1 if the kth feature is present in document d_i and 0 otherwise, and s_min and s_max are normalizing constants.
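A sketch of the MSC computation over all document pairs. The slides do not define s_min and s_max; this sketch assumes they are the minimum and maximum shared-feature counts over all pairs in the collection (min-max normalization) and falls back to a denominator of 1 when all pairs tie. Documents are token lists, at least two are required, and the feature list is hypothetical.

```python
from itertools import combinations

def msc_matrix(docs, features):
    """MSC(d_i, d_j) = (sum_k F_i(f_k) F_j(f_k) - s_min) / (s_max - s_min) for every pair i < j."""
    present = [set(doc) & set(features) for doc in docs]  # F_i(f_k) = 1 iff f_k occurs in d_i
    shared = {(i, j): len(present[i] & present[j])
              for i, j in combinations(range(len(docs)), 2)}
    s_min, s_max = min(shared.values()), max(shared.values())
    span = (s_max - s_min) or 1  # assumption: avoid division by zero when all pairs tie
    return {pair: (s - s_min) / span for pair, s in shared.items()}

docs = [["great", "acting", "plot"], ["great", "plot"], ["boring", "plot"]]
print(msc_matrix(docs, ["great", "acting", "plot", "boring"]))
# {(0, 1): 1.0, (0, 2): 0.0, (1, 2): 0.0}
```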


WordNet approach (contd.)

SVM trained to give Pr_good and Pr_bad

SVM probabilities and MSC values form the weights matrix

Min cut approach


WordNet approach (contd.)

Analysis
- Mutual relationships between documents
- Graph cut technique is simple and powerful
- Decline in accuracy with subjectivity
- WordNet: a useful lexical resource


Conclusion/Future Directions

Practical utility

Harder than other text classification tasks

Traditional machine learning techniques do not perform as well as they do on topic-based classification.

Linguistic knowledge needs to be used
- E.g., WordNet

Subjectivity extracts and mutual dependencies


Conclusion/Future Directions

Better measures to incorporate linguistic knowledge

Better measures for degree of similarity

Formulation as a multiclass problem
- E.g., emotional icons in messengers
- May be helpful in building psychological profiles through newsgroup mails


References

Alekh Agarwal and Pushpak Bhattacharyya. 2005. Sentiment Analysis: A New Approach for Effective Use of Linguistic Knowledge and Exploiting Similarities in a Set of Documents to be Classified. International Conference on Natural Language Processing (ICON 05), IIT Kanpur, India, December 2005.

Bo Pang and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. Proceedings of ACL 2004.

Bo Pang, Lillian Lee and Shivakumar Vaithyanathan. 2002. Thumbs Up? Sentiment Classification Using Machine Learning Techniques. Proceedings of EMNLP 2002, pp. 79-86.

Peter Turney. 2002. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. Proceedings of ACL 2002.


Thank You