MTech Seminar Presentation [IIT-Bombay]

Post on 26-Jan-2015

Category:

Technology

DESCRIPTION

Seminar presentation I gave on the topic 'Resources for Sentiment Analysis' at IIT Bombay. Includes a set of bonus slides with additional information that was not actually presented.

Transcript

Resources for Sentiment Analysis: Seminar Presentation

Sagar Ahire (133050073)

IIT Bombay

02 May, 2014

Sagar Ahire (IIT Bombay) Sentiment Resources 02 May, 2014 1 / 48

Roadmap

1 Introduction

2 Sentiwordnet

3 SO-CAL

4 Wordnet-Affect

5 Indian-Language Sentiwordnets

6 Conclusions

Introduction

Overview

An overview of today’s presentation:

This presentation covers lexical resources for sentiment analysis.

Four resources are covered, each using a different approach for representation and creation:

Sentiwordnet, created automatically, with 3 graded scores per synset

SO-CAL, created manually, with a graded score per word

Wordnet-Affect, created semi-automatically, with affect information for each synset

Indian-Language Sentiwordnet, created by projecting the English Sentiwordnet

Sentiment Analysis

Sentiment Analysis: Determining the opinion expressed in a text

Approaches:

Classifier-based

Lexicon-based

Why Lexicon-based Approach?

The classifier-based approach has the following drawbacks:

Domain Specificity (Example: Movie reviews mentioning ‘writer’, ‘plot’, etc.) [Bro01]

Lack of Context (Example: ‘good’ vs ‘not good’ vs ‘not very good’)

The lexicon-based approach aims at solving these problems.

Sentiment Lexicons

A sentiment lexicon is a database of entries of the form (lexical unit, sentiment).

Choices for lexical unit:

Word

Word sense

Phrase, etc.

Choices for sentiment:

Fixed categorization into ‘positive’ and ‘negative’

Graded sets like ‘strongly positive’, ‘mildly positive’, ‘neutral’, ‘mildly negative’, ‘strongly negative’

Score in an interval like [0, 1] or [−1,+1]
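
The (lexical unit, sentiment) form can be made concrete with a small sketch. The class name `LexicalUnit`, the `sense_id` tags and every entry below are hypothetical and chosen only for illustration; they are not taken from any of the resources covered in this presentation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class LexicalUnit:
    """A word, optionally narrowed to one sense (sense_id is a made-up tag)."""
    text: str
    sense_id: Optional[str] = None


# Three ways of recording sentiment for the same kind of key.
# All entries below are illustrative, not taken from a real lexicon.
fixed_labels: Dict[LexicalUnit, str] = {
    LexicalUnit("good"): "positive",
    LexicalUnit("bad"): "negative",
}
graded_labels: Dict[LexicalUnit, str] = {
    LexicalUnit("excellent"): "strongly positive",
    LexicalUnit("okay"): "mildly positive",
}
interval_scores: Dict[LexicalUnit, float] = {
    LexicalUnit("good"): 0.6,
    # Word-sense entries let one word carry different sentiment per sense.
    LexicalUnit("cold", "temperature"): 0.0,
    LexicalUnit("cold", "unfriendly"): -0.5,
}


def lookup(lexicon, text, sense_id=None):
    """Return the stored sentiment, or None when the unit is not covered."""
    return lexicon.get(LexicalUnit(text, sense_id))
```

Keying on a word sense rather than a bare word means a lookup needs the sense to be known first, which is the trade-off revisited in the concluding remarks on lexical units.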

Approaches for Creation

Manual

Automatic

Sentiwordnet

Introduction to Sentiwordnet

Sentiwordnet [ES06] is an automatically generated sentiment lexicon made using Wordnet. Its salient features are:

High coverage

Support for graded sentiment labels

Support for both sentiment classification and subjectivity detection

Structure of Sentiwordnet

Sentiwordnet = Wordnet + Sentiment Information.

Each synset s is given three sentiment scores:

Positive score Pos(s)

Negative score Neg(s)

Objective score Obj(s)

Pos(s) + Neg(s) + Obj(s) = 1

Example Synset

beautiful: Pos = 0.75, Neg = 0.00, Obj = 0.25

URL: http://sentiwordnet.isti.cnr.it/search.php?q=beautiful
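
Because the three scores sum to 1, storing two of them is enough. The sketch below is one possible in-memory representation, not the format Sentiwordnet is distributed in. The class name `SynsetScores`, the 0.5 subjectivity threshold and the tie-breaking rule are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynsetScores:
    """Pos and Neg for one synset; Obj is derived from the sum-to-1 constraint."""
    pos: float
    neg: float

    def __post_init__(self):
        if not (0.0 <= self.pos <= 1.0 and 0.0 <= self.neg <= 1.0):
            raise ValueError("scores must lie in [0, 1]")
        if self.pos + self.neg > 1.0:
            raise ValueError("Pos + Neg must not exceed 1")

    @property
    def obj(self) -> float:
        # Pos(s) + Neg(s) + Obj(s) = 1
        return 1.0 - self.pos - self.neg

    def is_subjective(self, threshold: float = 0.5) -> bool:
        # Assumed rule: a synset counts as subjective when Obj is below the threshold.
        return self.obj < threshold

    def polarity(self) -> str:
        # Assumed rule: compare Pos and Neg directly; ties are reported as neutral.
        if self.pos > self.neg:
            return "positive"
        if self.neg > self.pos:
            return "negative"
        return "neutral"


# The example synset from the slide.
beautiful = SynsetScores(pos=0.75, neg=0.00)
```

The two helper methods mirror the two uses listed earlier: `polarity` for sentiment classification and `is_subjective` for subjectivity detection.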

Creation Steps

The top-level steps in the algorithm to create Sentiwordnet are as follows:

1 Selection of seed set

2 Expansion using Wordnet’s semantic relations

3 Training of a team of ternary classifiers

4 Classification of each Wordnet synset using the classifiers
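
Steps 1 and 2 above can be sketched on a toy relation graph. This is a minimal illustration and not the procedure from [ES06]: the `RELATIONS` table is hypothetical, only two relation types are modelled, and the classifier training and synset classification of steps 3 and 4 are not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical toy graph: term -> list of (relation, neighbour).
RELATIONS: Dict[str, List[Tuple[str, str]]] = {
    "good": [("similar", "fine"), ("antonym", "bad")],
    "bad": [("similar", "awful"), ("antonym", "good")],
    "fine": [("similar", "good")],
    "awful": [("similar", "bad")],
}

FLIP = {"positive": "negative", "negative": "positive"}


def expand(seeds: Dict[str, str], iterations: int) -> Dict[str, str]:
    """Grow a labelled set from the seeds by following relations.

    Assumed rule: 'similar' keeps the label, 'antonym' flips it.
    Terms that already have a label are never relabelled.
    """
    labelled = dict(seeds)
    frontier = list(seeds)
    for _ in range(iterations):
        next_frontier = []
        for term in frontier:
            for relation, neighbour in RELATIONS.get(term, []):
                if neighbour in labelled:
                    continue
                label = labelled[term]
                labelled[neighbour] = FLIP[label] if relation == "antonym" else label
                next_frontier.append(neighbour)
        frontier = next_frontier
    return labelled


training_set = expand({"good": "positive"}, iterations=2)
```

Each extra iteration reaches terms further from the seeds, so the labelled set grows but the labels become less certain. The expanded set would then serve as training data for the classifiers in step 3.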

SO-CAL

Introduction to SO-CAL

SO-CAL is a system that uses a manually-constructed lexicon. Its salient features are:

Highly detailed lexicon

Graded sentiment label

Low coverage, but high accuracy

Features Used

SO-CAL groups words into several feature types and treats each type differently in the lexicon. They are:

Adjectives

Nouns, Verbs, Adverbs and Multiwords

Intensifiers and Downtoners

Negation

Irrealis Blocking

Structure of SO-CAL

Sentiment scoring:

Words are scored in [−5,+5]

Intensifiers and negation further act upon these scores

Examples

good: +3

monstrosity: −5

masterpiece: +5
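
How intensifiers and negation might act on a word score can be sketched as follows. The word scores are the ones on the slide. Everything else is an assumption for illustration: intensifiers are modelled as percentage modifiers, negation as a fixed shift towards the opposite polarity, and the specific numbers in `INTENSIFIERS` and `NEGATION_SHIFT` are placeholders rather than values from the SO-CAL lexicon.

```python
from typing import List, Optional

WORD_SCORES = {"good": 3, "monstrosity": -5, "masterpiece": 5}  # from the slide

# Assumed modifier values, for illustration only.
INTENSIFIERS = {"very": 0.25, "extraordinarily": 0.5, "somewhat": -0.3}
NEGATORS = {"not"}
NEGATION_SHIFT = 4


def score_phrase(tokens: List[str]) -> Optional[float]:
    """Score a short phrase whose last token is the sentiment-bearing word.

    Returns None when the head word is not in the lexicon.
    """
    head = tokens[-1]
    if head not in WORD_SCORES:
        return None
    score = float(WORD_SCORES[head])
    negated = False
    # Apply modifiers starting with the one closest to the head word.
    for token in reversed(tokens[:-1]):
        if token in INTENSIFIERS:
            score *= 1.0 + INTENSIFIERS[token]
        elif token in NEGATORS:
            negated = True
    if negated:
        # Shift towards the opposite polarity instead of flipping the sign.
        score += -NEGATION_SHIFT if score > 0 else NEGATION_SHIFT
    return score
```

Under these assumptions, ‘good’, ‘not good’ and ‘not very good’ each get a different score, which is the context sensitivity that the lexicon-based approach is meant to provide.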

Wordnet-Affect

Introduction to Wordnet-Affect

Wordnet-Affect [SV04] is a semi-automatically generated sentiment lexicon made using Wordnet. It associates affective information with each synset. Its salient features are:

Highly detailed

Ability to handle sentiment differently depending on emotion

Structure of Wordnet-Affect

Wordnet-Affect = Wordnet + Affect Information.

Affect is represented using the following:

An a-label which represents the emotion,

The valency which indicates the sentiment.

The a-label is drawn from a tree of emotions that starts at a root node, with each leaf node corresponding to a synset.

The valency can be any of positive, negative, neutral or ambiguous.

Fragment of the a-label tree:

root
    mental-state
        cognitive-state
        affective-state
            mood
            emotion
                positive-emotion
                    joy
                        elation
                    love
                        worship
                negative-emotion
                    sadness
                        melancholy
                    shame
                        embarrassment
                . . .
            . . .
    physical-state
    . . .
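
The tree fragment above can be stored as a child-to-parent map. The sketch below is an illustration, not the Wordnet-Affect file format. It contains only the nodes shown on the slide, and reading the valency off the nearest `positive-emotion` or `negative-emotion` ancestor is an assumption; the neutral and ambiguous valencies are not modelled.

```python
from typing import Dict, List, Optional

# Child -> parent, restricted to the nodes shown in the fragment.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "mental-state": "root",
    "physical-state": "root",
    "cognitive-state": "mental-state",
    "affective-state": "mental-state",
    "mood": "affective-state",
    "emotion": "affective-state",
    "positive-emotion": "emotion",
    "negative-emotion": "emotion",
    "joy": "positive-emotion",
    "elation": "joy",
    "love": "positive-emotion",
    "worship": "love",
    "sadness": "negative-emotion",
    "melancholy": "sadness",
    "shame": "negative-emotion",
    "embarrassment": "shame",
}


def path_to_root(label: str) -> List[str]:
    """Return the a-label followed by its ancestors, ending at 'root'."""
    path = []
    node: Optional[str] = label
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def valency(label: str) -> str:
    """Assumed rule: take the valency from the nearest polarity ancestor."""
    for node in path_to_root(label):
        if node == "positive-emotion":
            return "positive"
        if node == "negative-emotion":
            return "negative"
    return "unknown"
```

Walking up the tree also allows emotions to be grouped at a coarser level, for example treating ‘elation’ as a kind of ‘joy’.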

Creation Steps

Wordnet-Affect was created using the following steps:

Manual creation of initial resource

Automatic expansion using Wordnet relations

Indian-Language Sentiwordnets

Introduction to Indian-Language Sentiwordnets

Indian-language Sentiwordnets can be created using Wordnet projection [JRB10]. This approach has the following salient features:

Easy to create once backing resources are available

No duplication of effort

Use of tried-and-tested representations

Creation Steps

The process of projecting a Sentiwordnet has the following steps:

Fetch a synset from the English Sentiwordnet.

Find the corresponding Hindi synset using Indowordnet.

Assign sentiment scores from English synset to Hindi synset.
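
The three projection steps can be sketched with two dictionaries. The synset identifiers and the `en_to_hi` mapping below are hypothetical stand-ins for the English Sentiwordnet and the Indowordnet linkage; neither resource is actually accessed by this code.

```python
from typing import Dict, Tuple

Scores = Tuple[float, float, float]  # (Pos, Neg, Obj)


def project(english_swn: Dict[str, Scores], en_to_hi: Dict[str, str]) -> Dict[str, Scores]:
    """Copy scores from English synsets to their linked Hindi synsets.

    English synsets with no linked Hindi synset are skipped.
    """
    hindi_swn: Dict[str, Scores] = {}
    for en_id, scores in english_swn.items():   # step 1: fetch an English synset
        hi_id = en_to_hi.get(en_id)             # step 2: find the Hindi synset
        if hi_id is None:
            continue
        hindi_swn[hi_id] = scores               # step 3: assign the scores
    return hindi_swn


# Toy data with made-up identifiers.
english_swn = {
    "en:beautiful.a": (0.75, 0.00, 0.25),
    "en:unlinked.a": (0.10, 0.20, 0.70),
}
en_to_hi = {"en:beautiful.a": "hi:sundar.a"}

hindi_swn = project(english_swn, en_to_hi)
```

Skipped synsets are one reason a projected resource can end up with fewer entries than its source.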

Conclusions

A Comparison of the Resources

Criterion         SWN         SO-CAL    WN-Affect       IL-SWN
Sentiment         3 x [0, 1]  [−5, +5]  Affect          3 x [0, 1]
Lexical Unit      Synset      Word      Synset          Synset
Backing Resource  Wordnet     None      Wordnet         SWN + Indowordnet
Creation          Automatic   Manual    Semi-automatic  Projection
No. of Entries    117,000     5,000     900             16,000

Concluding Remarks

To conclude, there are three choices in making a sentiment lexicon:

Creation Approach: Manual, Automatic, Semi-Automatic or Projection

Lexical Unit: Word, Synset or Higher Representations

Sentiment: Labels, Graded Scores or Affect Information

Concluding Remarks: Creation Approach

Manual Approach           Automatic Approach
High annotation accuracy  Low annotation accuracy
High time investment      Low time investment
More detail supported     Less detail supported

Concluding Remarks: Lexical Unit

Word                                   Synset
Unreliable for polysemous words        Reliable for polysemous words
No pre-processing required             Requires WSD
Projection is comparatively difficult  Projection is comparatively easier

Concluding Remarks: Sentiment

Graded scores have been shown to be better than mere labels in general.

Moreover, a graded score resource can always be converted to a label-based resource.

Affect information can help in specialized circumstances.
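
Converting a graded-score resource to a label-based one can be as simple as thresholding. The sketch below assumes a signed score such as SO-CAL's [−5, +5] scale. The function name `to_label` and the optional `neutral_band` parameter are hypothetical.

```python
def to_label(score: float, neutral_band: float = 0.0) -> str:
    """Map a signed graded score to a coarse label.

    Scores within +/- neutral_band of zero are labelled 'neutral'.
    """
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


# Word scores taken from the SO-CAL examples in this presentation.
labels = {word: to_label(score) for word, score in
          {"good": 3, "monstrosity": -5, "masterpiece": 5}.items()}
```

The conversion only goes one way: once the scores are collapsed to labels, the grading cannot be recovered.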

Future Work

Possible directions in the future:

Automatic resources for higher-level lexical units like phrases, trees, etc.

Manual resources for synsets

Manual lexicons for Indian languages

Techniques for building dynamic resources to incorporate ‘netspeak’ and other slang

References I

[Bro01] Julian Brooke, A semantic approach to automatic text sentiment analysis, M.A. thesis, Stanford University, 2001.

[ES06] Andrea Esuli and Fabrizio Sebastiani, SentiWordNet: A publicly available lexical resource for opinion mining, Proceedings of the 5th Conference on Language Resources and Evaluation (LREC-06), 2006, pp. 417–422.

[Esu08] Andrea Esuli, Automatic generation of lexical resources for opinion mining: Models, algorithms and applications, Ph.D. thesis, Università di Pisa, 2008.

[Fel98] Christiane Fellbaum, Wordnet: An electronic lexical database, A Bradford Book, 1998.

References II

[HM97] Vasileios Hatzivassiloglou and Kathleen R. McKeown, Predicting the semantic orientation of adjectives, Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, 1997, pp. 174–181.

[JRB10] Aditya Joshi, Balamurali A R, and Pushpak Bhattacharyya, A fall-back strategy for sentiment analysis in Hindi: a case study, Proceedings of ICON 2010: 8th International Conference on Natural Language Processing, Macmillan Publishers, India, 2010.

[KMMdR04] Jaap Kamps, Maarten Marx, Robert J. Mokken, and Maarten de Rijke, Using wordnet to measure semantic orientations of adjectives, Proceedings of LREC-04, 4th International Conference on Language Resources and Evaluation, 2004, pp. 1115–1118.

References III

[RW03] Ellen Riloff and Janyce Wiebe, Learning extraction patterns for subjective expressions, Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2003, pp. 105–112.

[SV04] Carlo Strapparava and Alessandro Valitutti, WordNet-Affect: an affective extension of WordNet, Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC-04), 2004, pp. 1083–1086.

[TL03] Peter D. Turney and Michael L. Littman, Measuring praise and criticism: Inference of semantic orientation from association, ACM Transactions on Information Systems 21 (2003), no. 4, 315–346.

Additional Slides

Wordnet

Wordnet [Fel98] is a lexical database organized by word sense. The fundamental unit of storage is called a synset.

An Example Synset

brilliant, superb: of surpassing excellence; “a brilliant performance”; “a superb actor”

URL: http://wordnetweb.princeton.edu/perl/webwn?s=brilliant

Semantic Relations in Wordnet

Wordnet synsets are linked to each other by relations called semantic relations. Some of them are:

Antonymy

Meronymy

Hypernymy

Hyponymy

Similar to, etc.

These relations are helpful in creating the training set for classifying synsets to create Sentiwordnet.

Sentiment Classification

Early work on automatically detecting the sentiment of a word led to today’s lexicons. This included:

Use of conjunction-separated adjectives [HM97]

PMI-based Extraction using Web Queries [TL03]

Graph Expansion using Wordnet [KMMdR04]

Classification using Wordnet Glosses [Esu08]

Subjectivity Detection

Work that identifies whether a term is subjective at all is needed to filter out objective words before sentiment classification. This includes:

Adapting Wordnet Glosses to Subjectivity Detection [Esu08]

Bootstrapping Subjective Expressions from a Corpus [RW03]

Additional Slides Structure of SO-CAL

Adjectives

Adjectives were collected from a 500-document corpus and annotated with a sentiment score from −5 to +5.

Examples

good: +3

sleazy: −3

Additional Slides Structure of SO-CAL

Nouns, Verbs, Adverbs, Multiwords

This was extended to other parts of speech and multiword expressions, for a total of about 5,000 words.

Examples

monstrosity: −5

masterpiece: +5

inspire: +2

funny: +2 vs. act funny: −1
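A minimal sketch of how single-word and multiword entries could coexist in one lookup. It assumes that the longest matching entry wins, which the slide does not state; the `LEXICON` table and the `score_tokens` helper are hypothetical names, and only the scores come from the examples above.

```python
# Hypothetical lexicon keyed by token tuples; scores are the slide's examples.
LEXICON = {
    ("monstrosity",): -5,
    ("masterpiece",): 5,
    ("inspire",): 2,
    ("funny",): 2,
    ("act", "funny"): -1,
}
MAX_ENTRY_LENGTH = max(len(entry) for entry in LEXICON)


def score_tokens(tokens):
    """Return (entry, score) pairs, preferring the longest entry at each position."""
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first (assumption: multiwords take priority).
        for n in range(MAX_ENTRY_LENGTH, 0, -1):
            key = tuple(tokens[i:i + n])
            if key in LEXICON:
                matches.append((" ".join(key), LEXICON[key]))
                i += len(key)
                break
        else:
            i += 1  # no entry starts here; move on
    return matches


print(score_tokens("they act funny".split()))       # [('act funny', -1)]
print(score_tokens("a funny masterpiece".split()))  # [('funny', 2), ('masterpiece', 5)]
```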

Additional Slides Structure of SO-CAL

Intensifiers and Downtoners

Intensifiers are words that increase sentiment intensity, while downtoners are words that reduce it. For example, ‘extraordinarily’ is an intensifier and ‘somewhat’ is a downtoner.

Intensifiers and downtoners are modeled as percentage modifiers.

Examples

slightly: −50%

extraordinarily: +50%
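As a rough illustration of the percentage-modifier idea, the sketch below scales a word's base score by (1 + modifier). The table names and the `apply_modifier` helper are hypothetical; the modifier percentages and base scores are the examples given on the slides.

```python
# Hypothetical lookup tables; the values are the examples from the slides.
MODIFIERS = {"slightly": -0.50, "extraordinarily": 0.50}
BASE_SCORES = {"good": 3, "sleazy": -3}


def apply_modifier(modifier: str, word: str) -> float:
    """Scale the base score of `word` by the percentage attached to `modifier`."""
    return BASE_SCORES[word] * (1 + MODIFIERS[modifier])


print(apply_modifier("slightly", "good"))           # 1.5
print(apply_modifier("extraordinarily", "sleazy"))  # -4.5
```

Because the modifier multiplies the base score, it strengthens or weakens a word in whichever direction the word already points.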

Additional Slides Structure of SO-CAL

Negation

Negation is modeled as a numeric shift of value 4 towards the opposite sentiment.

Examples

good: +3 ⇒ not good: −1

atrocious: −5 ⇒ not atrocious: −1
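The shift can be written out as a small function. This is a sketch only: the `negate` name is hypothetical, and leaving a score of 0 unchanged is an assumption, since the slide does not cover that case.

```python
NEGATION_SHIFT = 4  # shift magnitude stated on the slide


def negate(score: int) -> int:
    """Shift a score by 4 towards the opposite polarity."""
    if score > 0:
        return score - NEGATION_SHIFT
    if score < 0:
        return score + NEGATION_SHIFT
    return score  # assumption: a neutral score is left unchanged


print(negate(3))   # -1  (not good)
print(negate(-5))  # -1  (not atrocious)
```

A plain sign flip would instead turn atrocious (−5) into +5, so the shift keeps ‘not atrocious’ from reading as strong praise.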

Additional Slides Structure of SO-CAL

Irrealis Blocking

An irrealis marker is a word or other cue indicating that the sentiment may not be reliable because the event hasn’t actually happened. For example, ‘would’, ‘expect’, ‘if’, quotation marks, etc.

Sentences with irrealis markers are ignored for sentiment analysis.
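A minimal sketch of this filtering step, assuming whole-token matching against the handful of markers named above. The marker set, the tokenisation, and both function names are illustrative and not the actual SO-CAL implementation.

```python
import re

# Markers named on the slide; a real marker list would be longer.
IRREALIS_WORDS = {"would", "expect", "if"}
QUOTE_CHARS = ('"', "\u201c", "\u201d")


def has_irrealis_marker(sentence: str) -> bool:
    """Return True if the sentence contains a listed marker word or a quotation mark."""
    # Whole-token match only: inflected forms such as "expected" are not caught.
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if any(token in IRREALIS_WORDS for token in tokens):
        return True
    return any(quote in sentence for quote in QUOTE_CHARS)


def keep_for_scoring(sentences):
    """Drop every sentence that carries an irrealis marker."""
    return [s for s in sentences if not has_irrealis_marker(s)]


sentences = [
    "The plot was good.",
    "I would have liked a better ending.",
    "If the acting were good, I might recommend it.",
]
print(keep_for_scoring(sentences))  # ['The plot was good.']
```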

Additional Slides Sentiwordnet Creation

Seed Set

Two seed sets are created:

Lp for positive synsets

Ln for negative synsets

Each synset representation consists of:

The terms

The definition

The sample phrases

Explicit indication of negation

Additional Slides Sentiwordnet Creation

Wordnet Expansion

Relations of Wordnet used for expansion:

Direct antonymy

Similarity

Derived from

Pertains to

Attribute

Also see
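The expansion over these relations can be sketched on a toy graph. The sketch assumes that direct antonymy moves a synset to the opposite seed set while the other five relations keep it in the same set; the `EDGES` graph, the plain-word synset labels, and the function names are all hypothetical stand-ins for real Wordnet data.

```python
# Hypothetical toy graph: (synset, relation) -> related synsets.
EDGES = {
    ("good", "similarity"): ["fine"],
    ("good", "direct_antonymy"): ["bad"],
    ("bad", "similarity"): ["awful"],
    ("fine", "also_see"): ["pleasant"],
}
# Assumption: these relations preserve polarity, antonymy inverts it.
PRESERVING = {"similarity", "derived_from", "pertains_to", "attribute", "also_see"}
INVERTING = {"direct_antonymy"}


def related(synsets, relations):
    """Collect synsets reachable from `synsets` through any of `relations`."""
    found = set()
    for (source, relation), targets in EDGES.items():
        if source in synsets and relation in relations:
            found.update(targets)
    return found


def expand(positive_seeds, negative_seeds, iterations):
    """Grow both seed sets for a fixed number of expansion iterations."""
    pos, neg = set(positive_seeds), set(negative_seeds)
    for _ in range(iterations):
        new_pos = related(pos, PRESERVING) | related(neg, INVERTING)
        new_neg = related(neg, PRESERVING) | related(pos, INVERTING)
        pos |= new_pos
        neg |= new_neg
    return pos, neg


pos, neg = expand({"good"}, {"bad"}, iterations=2)
print(sorted(pos))  # ['fine', 'good', 'pleasant']
print(sorted(neg))  # ['awful', 'bad']
```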

Additional Slides Sentiwordnet Creation

Classifiers

8 classifiers were created differing in:

Number of iterations of expansion (0, 2, 4, 6)

Learning algorithm (SVM, Rocchio)
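The count of 8 is simply every pairing of the 4 iteration settings with the 2 learning algorithms, as the short enumeration below shows. The variable names are illustrative only.

```python
from itertools import product

EXPANSION_ITERATIONS = (0, 2, 4, 6)
LEARNERS = ("SVM", "Rocchio")

# Every (iterations, learner) pairing is one classifier configuration.
configurations = list(product(EXPANSION_ITERATIONS, LEARNERS))
print(len(configurations))  # 8
for iterations, learner in configurations:
    print(f"{learner} trained after {iterations} expansion iterations")
```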

Additional Slides Sentiwordnet Creation

Classifiers

Each ternary classifier is a combination of 2 binary classifiers:

Positive vs. Not Positive

Negative vs. Not Negative

The results are combined as:

                 Positive     Not Positive
Negative         Objective    Negative
Not Negative     Positive     Objective
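The combination rule can be expressed directly in code. This is a sketch in which the two boolean arguments stand in for the outputs of the two binary classifiers; the `combine` name is hypothetical.

```python
def combine(is_positive: bool, is_negative: bool) -> str:
    """Merge the two binary decisions into one ternary label."""
    if is_positive and not is_negative:
        return "Positive"
    if is_negative and not is_positive:
        return "Negative"
    # Both classifiers fired, or neither did: treat the synset as objective.
    return "Objective"


for is_positive in (True, False):
    for is_negative in (True, False):
        print(is_positive, is_negative, combine(is_positive, is_negative))
```

Agreement between the two classifiers on a single polarity yields that polarity, while conflicting or absent evidence falls back to Objective.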
