Sentiment Analysis Balamurali A R IITB-Monash Research Academy {balamurali@cse.iitb.ac.in} Acknowledgment: Aditya Joshi

Jan 15, 2016

Page 1: Sentiment Analysis Balamurali A R IITB-Monash Research Academy {balamurali@cse.iitb.ac.in} Acknowledgment: Aditya Joshi.

Sentiment Analysis

Balamurali A R
IITB-Monash Research Academy

{balamurali@cse.iitb.ac.in}

Acknowledgment: Aditya Joshi

Mona Lisa, 16th century

Artist: Leonardo da Vinci

Image from Wikimedia Commons. Source: Wikipedia

Smile of Mona Lisa

Is she smiling at all?

Is she happy?

What is she smiling about?

What is she happy about?


Sentiment analysis (SA)

Task of tagging text with orientation of opinion

This is a good movie. (Subjective)

This is a bad movie. (Subjective)

The movie is set in Australia. (Objective)

[Figure: two concentric circles. Labels: Planning, Computer Vision, NLP, Expert Systems, Robotics; Search, Reasoning, Learning]

Disciplines which form the core of AI: inner circle. Fields which draw from these disciplines: outer circle.

Sentiment Analysis

Outline

• Motivation & Introduction
• Approaches to SA
• Applications
• SA @ CFILT

Outline

• Motivation & Introduction
  – Need of SA: Why is SA needed?
  – Variants of SA: What forms does it exist in?
  – Challenges of SA: Why is SA not trivial?
• Approaches to SA
• Applications
• SA @ CFILT


User-generated content

• Web 2.0 empowers the user of the internet

• They are most likely to express their opinion there

• Temporal nature of UGC: ‘Live Web’

• Can SA tap it?

SA: Where?

• Blogs: A website, usually maintained by an individual, with regular entries of commentary, descriptions of events. Some SPs: Blogger, LiveJournal, Wordpress
• Review websites: Multiple review websites offering specific to general-topic reviews. Some SPs: mouthshut, burrrp, bollywoodhungama
• Social networks: Websites that allow people to connect with one another and exchange thoughts
• User conversations: Conversations between users on one of the above


SA: How much?

• Size of blogosphere
  – Through the ‘eyes’ of the blog trackers
• Technorati: 112.8 million blogs (excluding 72.82 million blogs in Chinese as counted by a corresponding Chinese Center)
• A blog crawler could extract 88 million blog URLs from blogger.com alone
• 12,000 new weblogs daily

Reference : www.technorati.com/state-of-the-blogosphere/


SA: How much opinion?

Chart created using : www.technorati.com/chart/

Flavours of SA

• Subjective/Objective
• Emotion analysis
• SA with magnitude
• Entity-specific SA
• Feature-based SA
• Perspectivization

Examples:

“The movie is good.”
“People say that the movie is good.”
“This movie is awesome.”
“dude.. just get lost.”
“Whoa! Super!!”
“Taj Mahal was constructed by Shah Jahan in the memory of his wife Mumtaz.”
“Taj Mahal is a masterpiece of an architecture and symbolizes unparalleled beauty.”
“India defeated England in the cricket match badly.”
“The camera is the best in its price range. However, a pathetically slow interface ruins it for this cell phone.”
“The Leftists were arrested yesterday by the police.”

Challenges of SA

• Domain dependent: Sentiment of a word is w.r.t. the domain.
  Example: ‘unpredictable’ (for steering of a car; for movie review)
• Sarcasm: Sarcasm uses words of a polarity to represent another polarity.
  Example: “The perfume is so amazing that I suggest you wear it with your windows shut”
• Thwarted expressions: The sentences/words that contradict the overall sentiment of the set are in majority.
  Example: “The actors are good, the music is brilliant and appealing. Yet, the movie fails to strike a chord.”
• Negation:
  “I did not like the movie.”
  “Not only is the movie boring, it is also the biggest waste of producer’s money.”
  “Notwithstanding the pressure of the public, let me admit that I have loved the movie.”
• Implicit polarity:
  “The camera of the mobile phone is less than one mega-pixel – quite uncommon for a phone of today.”
• Time-bounded:
  “This phone allows me to send SMS.”
  “This phone has a touch-screen.”

SA Challenges: Sample Review 1 (This, that and this)

FLY E300 is a good mobile which i purchased recently with lots of hesitation. Since this Brand is not familiar in Market as well known as Sony Ericsson. But i found that E300 was cheap with almost all the features for a good mobile. Any other brand with the same set of features would come around 19k Indian Ruppees.. But this one is only 9k.

Touch Screen, good resolution, good talk time, 3.2Mega Pixel camera, A2DP, IRDA and so on...

BUT BEWARE THAT THE CAMERA IS NOT THAT GOOD, THOUGH IT FEATURES 3.2 MEGA PIXEL, ITS NOT AS GOOD AS MY PREVIOUS MOBILE SONY ERICSSION K750i which is just 2Mega Pixel.

Sony ericsson was excellent with the feature of camera. So if anyone is thinking for Camera, please excuse. This model of FLY is not apt for you.. Am fooled in this regard..

Audio is not bad, infact better than Sony Ericsson K750i.

FLY is not user friendly probably since we have just started to use this Brand.

‘Touch screen’ today signifies a positive feature.

Will it be the same in the future?

Comparing old products

The confused conclusion

From: www.mouthshut.com


SA Challenges: Sample Review 2

Hi,

I have Haier phone.. It was good when i was buing this phone.. But I invented A lot of bad features by this phone those are It’s cost is low but Software is not good and Battery is very bad..,,Ther are no signals at out side of the city..,, People can’t understand this type of software..,, There aren’t features in this phone, Design is better not good..,, Sound also bad..So I’m not intrest this side.They are giving heare phones it is good. They are giving more talktime and validity these are also good.They are giving colour screen at display time it is also good because other phones aren’t this type of feature.It is also low wait.

Lack of punctuation marks, grammatical errors

Wait.. err.. Come again

From: www.mouthshut.com

Outline

• Motivation & Introduction
  – Need of SA: Why is SA needed?
  – Variants of SA: What forms does it exist in?
  – Challenges of SA: Why is SA not trivial?
• Approaches to SA
  – Basics: What is classification
  – Approaches: What are ways to do SA
• Applications
• SA @ CFILT


Task Definition

• Marking reviews as positive or negative at the document level
  – Lexicon-based classifiers
  – ML-based classifiers

Combination, MaxEnt Classifier, Naïve Bayes Classifier, SVM Classifier

What is classification?

A machine learning task that deals with identifying the class to which an instance belongs

A classifier performs classification

Test instance: Attributes (a1, a2,… an) → Classifier → Class label (Discrete-valued)

Examples:
( Age, Marital status, Health status, Salary ) → Issue Loan? {Yes, No}
( Perceptive inputs ) → Steer? { Left, Straight, Right }
( Textual features : Ngrams ) → Category of document? {Politics, Science, Biology}

Classification learning

Training phase: Learning the classifier from the available data, the ‘Training set’ (Labeled)

Testing phase: Testing how well the classifier performs on the ‘Testing set’


Testing phase

Methods:
– Holdout (2/3rd training, 1/3rd testing)
– Cross validation (n-fold)
  • Divide into n parts
  • Train on (n-1), test on last
  • Repeat for different permutations
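The two testing methods above can be sketched as index splits. This is a minimal sketch in plain Python: the function names `holdout_split` and `k_fold_splits` are illustrative and not from the slides, and shuffling is left out to keep the splits easy to read.

```python
def holdout_split(n_items, train_fraction=2 / 3):
    """Return (train_indices, test_indices) for a simple holdout split."""
    cut = int(round(n_items * train_fraction))
    indices = list(range(n_items))
    return indices[:cut], indices[cut:]


def k_fold_splits(n_items, n_folds):
    """Yield (train_indices, test_indices) once per fold.

    Each part is held out for testing exactly once; the remaining
    n_folds - 1 parts form the training set.
    """
    indices = list(range(n_items))
    # Part sizes differ by at most one when n_items is not divisible by n_folds.
    sizes = [n_items // n_folds + (1 if i < n_items % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

In practice the indices would usually be shuffled first so that each part has a similar class balance.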


Approaches to SA and Text granularity

Based on text granularity:
• Document level
• Sentence level
• Phrase level
• Word level

… Approaches to SA will differ


Generic Approaches

SA

Machine Learning based

Supervised Unsupervised

Rule based

Lexicon based

Document level sentiment classifier

Outline

• Motivation & Introduction
  – Need of SA: Why is SA needed?
  – Variants of SA: What forms does it exist in?
  – Challenges of SA: Why is SA not trivial?
• Approaches to SA
  – Basics: What is classification
  – Approaches: What are ways to do SA
    1) Rule based
    2) Machine Learning based
• Applications
• SA @ CFILT


Rule based System: Resources for SA

SentiWordNet
– WordNet synsets marked with three types of scores: positive, negative, objective

I am feeling happy.

Seed-set expansion in SWN

Seed words: Lp (positive), Ln (negative)

Relations used for expansion: also-see, antonymy

The sets at the end of the kth step are called Tr(k,p) and Tr(k,n)

Tr(k,o) is the set that is not present in Tr(k,p) and Tr(k,n)
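The expansion step can be illustrated with a toy relation graph. This is a rough sketch, not the SentiWordNet procedure itself: plain words stand in for synsets, the two relation dictionaries are hypothetical, and a term that ends up in both sets is not resolved here.

```python
def expand_seed_sets(pos_seeds, neg_seeds, same_polarity, antonyms, steps):
    """Grow positive/negative sets for `steps` iterations.

    same_polarity: dict term -> terms assumed to keep the polarity
                   (an also-see style relation).
    antonyms:      dict term -> terms assumed to flip the polarity.
    Returns the pair (Tr(k,p), Tr(k,n)) for k = steps.
    """
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(steps):
        new_pos, new_neg = set(pos), set(neg)
        for term in pos:
            new_pos.update(same_polarity.get(term, ()))
            new_neg.update(antonyms.get(term, ()))
        for term in neg:
            new_neg.update(same_polarity.get(term, ()))
            new_pos.update(antonyms.get(term, ()))
        pos, neg = new_pos, new_neg
    return pos, neg


# Hypothetical toy data for illustration only.
SAME_POLARITY = {"good": ["fine"], "fine": ["superb"], "bad": ["poor"]}
ANTONYMS = {"fine": ["poor"], "superb": ["awful"]}
VOCABULARY = {"good", "fine", "superb", "bad", "poor", "awful", "table"}

tr_p, tr_n = expand_seed_sets({"good"}, {"bad"}, SAME_POLARITY, ANTONYMS, steps=2)
tr_o = VOCABULARY - tr_p - tr_n  # everything not reached by the expansion
```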

Building SentiWordNet

• Classifier alternatives used: Rocchio (BowPackage) & SVM (LibSVM)
• Different training data based on expansion
• POS-NOPOS and NEG-NONEG classification
• Total eight classifiers
  – For different combinations of k and classifiers
• Synsets not in the expanded seed set are used as test synsets
  – Score is average of scores returned by the classifiers

Rule based System: An Example

C-FEEL-IT: An entity-based opinion search engine on Twitter

How does it work?
1. The user enters a search string to get its “public vibe”
2. Tweets are fetched based on the search string
3. Based on a sentiment lexicon, each tweet is marked with a sentiment score using majority rule
4. Each tweet is categorized into sentiment categories using a threshold value
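Steps 3 and 4 can be sketched as follows. This is one possible reading of "majority rule" (positive lexicon hits against negative ones within a tweet), not the C-Feel-It implementation; the lexicon, the function names and the threshold value are all hypothetical.

```python
# Hypothetical toy lexicon; a real system would load a sentiment resource.
TOY_LEXICON = {"good": 1, "great": 1, "love": 1,
               "bad": -1, "awful": -1, "hate": -1}


def tweet_score(tweet, lexicon=TOY_LEXICON):
    """Score in [-1, 1]: mean polarity of the lexicon words found in the tweet."""
    tokens = [t.strip(".,!?\"'").lower() for t in tweet.split()]
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return 0.0  # no sentiment-bearing word found
    return sum(hits) / len(hits)


def categorize(score, threshold=0.2):
    """Map a score to a sentiment category using a symmetric threshold."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

A score above zero means positive hits outnumber negative ones; the threshold keeps weakly polar tweets in the neutral category.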

C-FeeL-IT: Preprocessing and heuristics

• Feeds from Twitter used to obtain:
  – 50 tweets, in English, about the keyword
• Normalization done using:
  – Mapping between chat lingo to dictionary words [1]
  – Mapping between emoticons and direct sentiment prediction [1]
  – Extensions of words replaced by contracted forms
  – Negation handling

[1] http://chat.reichards.net/
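The normalization heuristics above can be sketched as a single pass over the tokens. Everything here is an assumption for illustration: the two mapping tables are tiny hypothetical stand-ins for the real resources, elongated words are squeezed to at most two repeated characters, and negation is handled by prefixing the next token, which is only one common heuristic.

```python
import re

CHAT_LINGO = {"u": "you", "gr8": "great", "luv": "love"}   # hypothetical
EMOTICONS = {":)": "positive", ":(": "negative"}           # hypothetical
NEGATORS = {"not", "no", "never"}


def squeeze_elongation(word):
    """Collapse runs of three or more identical characters to two."""
    return re.sub(r"(.)\1{2,}", r"\1\1", word)


def normalize(tweet):
    """Return (normalized_tokens, emoticon_labels) for a whitespace-split tweet."""
    tokens, emoticon_labels = [], []
    negate_next = False
    for raw in tweet.split():
        if raw in EMOTICONS:
            emoticon_labels.append(EMOTICONS[raw])
            continue
        word = squeeze_elongation(raw.lower())
        word = CHAT_LINGO.get(word, word)
        if negate_next:
            word = "NOT_" + word
            negate_next = False
        elif word in NEGATORS:
            negate_next = True
        tokens.append(word)
    return tokens, emoticon_labels
```

Squeezing alone does not always recover a dictionary word ('happppyyyyy' becomes 'happyy'), so a dictionary lookup would still be needed afterwards.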

C-FeeL-IT: Resources used

• SentiWordNet (Esuli & Sebastiani, 2006)
• Subjectivity clues (Wiebe et al., 2004)
• Taboada (Taboada & Grieve, 2004)
• Inquirer (Stone et al., 1966)

Reference given in Notes section


C-FeeL-IT: Demo

• Available at: http://www.clia.iitb.ac.in:8080/cfeelit-2/


Supervised System

Training Data → Feature Engineering → Learner → Apply on Test Data → Evaluate

Things to consider:
1. Select suitable domain: product, travel, politics, movies, etc.
2. Select the text granularity

Popular features: Term presence/term frequency, unigram/bigram, adjectives
Learners: SVM, Naïve Bayes, MaxEnt, Ensemble, etc.
Evaluation metrics: Accuracy, Recall, Precision, F-Score
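The feature types and evaluation metrics named above can be sketched in plain Python. The function names are illustrative and tokenization is assumed to have been done already; this is a sketch of the definitions, not of any particular toolkit.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (joined by spaces) in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_frequency(tokens, n=1):
    """Feature value = number of times each n-gram occurs."""
    return Counter(ngrams(tokens, n))


def term_presence(tokens, n=1):
    """Feature value = 1 if the n-gram occurs at all."""
    return {gram: 1 for gram in ngrams(tokens, n)}


def accuracy(gold, predicted):
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def precision_recall_f1(gold, predicted, target):
    """Per-class precision, recall and F-score for the class `target`."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```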

Supervised System: Our System

Existing approaches do not consider the ‘sense/meaning’ of the word:
• Bag-of-words features: Pang et al. (2002), Martineau & Finin (2009), Paltoglou & Thelwall (2010)
• Syntactic features: Matsumoto et al. (2005), Kennedy & Inkpen (2006), Whitelaw et al. (2005)

However, a word may have:
1. sentiment bearing and non sentiment bearing senses
   “Her face fell when she heard that she had been fired.”
   “The fruit fell from the tree.”
2. senses with opposing polarity
   “The snake bite proved to be deadly for the young boy.”
   “Shane Warne is a deadly spinner.”
3. been abstracted
   “He speaks a vulgar language.”
   “Now that's real crude behavior!”

Supervised System: Our System

Lexical space v/s sense space

Lexical space: There are also fire-pits available if you want to have a bonfire with your friends .

Sense space: There are also_347757 fire_pits_19147259 available_4203394 if you want_21808093 to have a bonfire_17203241 with your friends_19962226 .

fire_pits : 19147259 (1: POS identifier : Noun, 9147259: Wordnet Synset offset)

Manually annotated Senses (M)

Automatically annotated Senses (I)

Image source: Wikimedia commons

Word Sense Disambiguation (WSD)
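The sense identifier format shown above (first digit: POS identifier, remaining digits: synset offset) can be unpacked with a few lines of Python. The function names are illustrative, and only the mapping stated on the slide (1 = Noun) is assumed; the meaning of the other POS digits is not given in the source.

```python
POS_NAMES = {1: "noun"}  # only the mapping stated on the slide


def split_token(token):
    """Split 'fire_pits_19147259' into ('fire_pits', '19147259').

    Tokens without a numeric sense suffix are returned with None.
    """
    word, sep, sense_id = token.rpartition("_")
    if not sep or len(sense_id) < 2 or not sense_id.isdigit():
        return token, None
    return word, sense_id


def decode_sense_id(sense_id):
    """Return (pos_identifier, synset_offset) as integers."""
    return int(sense_id[0]), int(sense_id[1:])


def split_spaces(annotated_sentence):
    """Return (lexical_tokens, sense_ids) for a sense-annotated sentence."""
    words, senses = [], []
    for token in annotated_sentence.split():
        word, sense_id = split_token(token)
        words.append(word)
        if sense_id is not None:
            senses.append(sense_id)
    return words, senses
```

The two returned lists correspond to word features and sense features; concatenating them gives a combined representation of both.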

Supervised System: Our Approach

[Figure: Corpus → Manual Annotation → Manual Sense-annotated Corpus; Corpus → A WSD Engine → Automatic Sense-annotated Corpus. Classifier Training, in some cases after an Only-sense Filter, yields Classifier W, Classifier M, Classifier W+S(M), Classifier I and Classifier W+S(I).]

Image source: Wikimedia commons

Results: Overall Classification

Feature Representation   Accuracy   Pos Precision   Neg Precision   Pos Recall   Neg Recall
W                        84.90      84.95           84.92           85.19        84.60
M                        89.10      91.50           87.07           85.18        91.24
W+S(M)                   90.20      92.02           88.55           87.71        92.39
I                        85.48      87.17           83.93           83.53        87.46
W+S(I)                   86.08      85.87           86.38           86.69        85.46

• Senses give better overall accuracy
• Negative Recall increases

Image source: Wikimedia commons

Outline

• Motivation & Introduction
  – Need of SA: Why is SA needed?
  – Variants of SA: What forms does it exist in?
  – Challenges of SA: Why is SA not trivial?
• Approaches to SA
  – Basics: What is classification
  – Approaches: What are ways to do SA
    1) Machine Learning based
    2) Rule based
• Applications
  – Cross-lingual SA
  – Cross-domain SA
  – Opinion Spam
  – SA for tweets
• SA @ CFILT

Cross-lingual SA

[Figure labels: Hindi document, English document, Sentiment Analysis System (shown twice), Sentiment Label]

• Multilingual content on the internet growing

• How can the sentiment it carries be identified?

• Can we take help of the ‘rich cousin’ English?

Alternatives to Cross-lingual SA

Strategies for SA for target language:
• Use corpus in target language
• Translate to a ‘rich’ source language
• Develop resources for target language



Domain-dependence of words

• ‘deadly’
  – It was one deadly match!
  – There are some deadly poisonous snakes in the jungles of Amazon.


General Approach

• Retain the ‘common-to-all-domain’ words

• Learn only the ‘special domain’ words

• Domain differences can be substantial



Opinion spam: A side-effect of UGC

• Reviews contain rich user opinions on products and services

• Anyone can write anything on the Web
  – No quality control

• Result: Low quality reviews, review spam / opinion spam

• Incentives: Positive opinion -> Financial gain for organization

Different types of spam reviews

• Type 1 (untruthful opinions): Giving undeserving reviews to some target objects in order to promote/demote the object
  – hyper spam: undeserving positive reviews
  – defaming spam: malicious negative reviews
  – DUPLICATES
• Type 2 (reviews on brands only): No comment on the product. Comments on brands, manufacturer or sellers of the product
• Type 3 (non-reviews): Advertisements. Other irrelevant reviews containing no opinions, e.g. questions, answers and random text

Examples:

“Although you should not expect prompt shippin. (It took 3 weeks and several e-mails before I received my order.) I would order again from this merchant, just because the price was right” - http://www.pricegrabber.com

“It’s from nikon, what more you want..”

Reference : [Jindal et al, 2008]



Challenges with tweets

• Ill-formed
  – Spelling mistakes
  – Informal words/emoticons
  – Extensions of words (‘happppyyyyy’)

• Vague topics

Outline

• Motivation & Introduction
  – Need of SA: Why is SA needed?
  – Variants of SA: What forms does it exist in?
  – Challenges of SA: Why is SA not trivial?
• Approaches to SA
  – Basics: What is classification
  – Approaches: What are ways to do SA
    1) Machine Learning based
    2) Rule based
• Applications
  – Cross-lingual SA
  – Cross-domain SA
  – Opinion Spam
  – SA for tweets
• SA @ CFILT
  – Twitter based SA, Sense based SA, Cross-Lingual SA and many more…

SA @ CFILT

• English
  – Twitter
  – Trend Analysis
  – Cross-Domain
  – Sense-based
  – Similarity Metric
  – Discourse
  – Detecting Thwarting
• Other Languages
  – Indian Languages: Cross Lingual (Hindi, Marathi)
  – European Languages: Cross Lingual (French, Spanish, German)

Thank you! & Questions?


Extra Reading- Classifiers

• Naïve Bayes
• SVM
• Committee-based classifiers


Naïve Bayes classifiers

• Based on Bayes rule
• Naïve Bayes: Conditional independence assumption
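A small multinomial Naïve Bayes classifier makes the two points above concrete: Bayes rule gives the class score as prior times likelihood, and the conditional independence assumption lets the likelihood factor into one term per word. This is a minimal sketch with add-one smoothing; the function names and the toy training data are illustrative and not from the slides.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents):
    """documents: list of (tokens, label). Returns the counts needed to predict."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in documents:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary


def predict(model, tokens):
    """Return the label maximizing log P(label) + sum of log P(word | label)."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)  # prior
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocabulary:
                continue  # unseen words are ignored
            # Add-one smoothing; independence lets the terms simply add up.
            score += math.log((word_counts[label][token] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set.
TRAINING = [
    (["good", "movie"], "pos"),
    (["great", "acting"], "pos"),
    (["bad", "movie"], "neg"),
    (["boring", "plot"], "neg"),
]
MODEL = train_naive_bayes(TRAINING)
```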


Support vector machines

• Basic idea

Separating hyperplane : wx+b = 0

Margin

Support vectors

“Maximum separating-margin classifier”
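Once w and b are known, classifying a point and measuring the margin take only a few lines. This sketch does not train an SVM: w and b are assumed to be given, and the margin formula assumes the usual canonical scaling in which the support vectors satisfy |wx+b| = 1. The function names are illustrative.

```python
import math


def decision_value(w, x, b):
    """Signed value of wx + b; zero means the point lies on the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, x, b):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, x, b) >= 0 else -1


def distance_to_hyperplane(w, x, b):
    """Geometric distance of x from the separating hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, x, b)) / norm


def margin_width(w):
    """Width of the margin between wx+b = +1 and wx+b = -1."""
    return 2 / math.sqrt(sum(wi * wi for wi in w))
```

Training amounts to choosing w and b so that this margin is as large as possible while the training points stay on the correct side.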


Multi-class SVM

• Multiple SVMs are trained:
  – True/false classifiers for each of the class labels
  – Pair-wise classifiers for the class labels
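The two schemes differ in how the binary decisions are combined. In this sketch the binary classifiers are stand-in functions rather than trained SVMs, and all names are hypothetical: the true/false scheme picks the label whose classifier scores highest, while the pair-wise scheme lets each pair vote.

```python
from collections import Counter
from itertools import combinations

LABELS = ["Politics", "Science", "Biology"]


def one_vs_rest_predict(scorers, x):
    """scorers: dict label -> function giving a real-valued score for x."""
    return max(sorted(scorers), key=lambda label: scorers[label](x))


def pairwise_predict(pair_classifiers, x):
    """pair_classifiers: dict (label_a, label_b) -> function returning the winner."""
    votes = Counter(clf(x) for clf in pair_classifiers.values())
    return votes.most_common(1)[0][0]


# Stand-in classifiers: x is a dict of per-topic keyword counts.
SCORERS = {label: (lambda x, l=label: x.get(l, 0)) for label in LABELS}
PAIRS = {
    (a, b): (lambda x, a=a, b=b: a if x.get(a, 0) >= x.get(b, 0) else b)
    for a, b in combinations(LABELS, 2)
}
```

With k class labels the first scheme trains k classifiers and the second trains k(k-1)/2, which is 3 in both cases for the three labels used here.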

Combining Classifiers

• ‘Ensemble’ learning
• Use a combination of models for prediction
  – Bagging: Majority votes
  – Boosting: Attention to the ‘weak’ instances

• Goal : An improved combined model

Reference : Scribe by Rahul Gupta, IIT Bombay
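Bagging can be sketched in two small functions: draw bootstrap samples to train each model, then predict by majority vote. The models here are assumed to be plain callables, and the function names are illustrative.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Sample len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def bagging_predict(models, x):
    """Return the label predicted by the largest number of models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```

Each model would be trained on its own `bootstrap_sample(training_data, rng)` before voting.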

Boosting (AdaBoost)

[Figure: the Total set is split into a Training dataset D and a Test set. Sample D1 is drawn from D and a Classifier learning scheme builds Classifier model M1, and so on up to Classifier model Mn; an Error is computed for each model. The models cast a Weighted vote on the Test set to give the Class Label.]

• Initialize weights of instances to 1/d
• Selection based on weight. May use bootstrap sampling with replacement
• Weights of correctly classified instances multiplied by error / (1 – error)
• If error > 0.5?

Reference : Scribe by Rahul Gupta, IIT Bombay
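One round of the weight update described above can be sketched as follows. Two details are assumptions taken from the common textbook formulation rather than from the slide: a round with error above 0.5 is discarded, and the model's vote weight is log((1 - error) / error). The function names are illustrative.

```python
import math


def initial_weights(d):
    """Every one of the d instances starts with weight 1/d."""
    return [1.0 / d] * d


def weighted_error(weights, correct):
    """Sum of the weights of the misclassified instances."""
    return sum(w for w, ok in zip(weights, correct) if not ok)


def update_weights(weights, correct):
    """Return (new_weights, vote_weight), or None if the round is discarded.

    correct[i] is True when the current model classified instance i correctly.
    """
    error = weighted_error(weights, correct)
    if error > 0.5:
        return None  # assumed handling: discard the model and resample
    if error == 0:
        return list(weights), math.inf  # perfect model: nothing to re-weight
    factor = error / (1 - error)
    scaled = [w * factor if ok else w for w, ok in zip(weights, correct)]
    total = sum(scaled)
    new_weights = [w / total for w in scaled]  # renormalize to sum to 1
    return new_weights, math.log((1 - error) / error)
```

After the update, misclassified instances carry relatively more weight, so the next sample is more likely to include them.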