Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau
Page 1: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Automatic Domain Adaptive Sentiment Analysis Phase 1

Justin Martineau

Page 2: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Outline

Introduction: Problem Definition, Thesis Statement, Motivation

Background and Related Work: Challenges, Approaches

Research Plan: Approach, Evaluation, Timeline

Conclusion

Page 3: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Problem Definition

Sentiment Analysis is the automatic detection and measurement of sentiment in text segments by machines.

3 Sub Tasks
  Objective vs. Subjective
  Topic Detection
  Positive vs. Negative

Commonly applied to web data

Very Domain Dependent

1. Intro - 2. Related Work - 3. Research Plan - 4. Conclusion

Page 4: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Sentiment Analysis Example


Page 5: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Thesis Statement

This dissertation will develop and evaluate techniques to discover and encode domain-specific, domain-independent, and semantic knowledge to improve both single and multiple domain sentiment analysis problems on textual data given low labeled data conditions.


Page 6: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Motivation: Private Sector

Market Research
  Surveys
  Focus Groups
  Feature Analysis
  Customer targeting (free samples, etc.)

Consumer Sentiment Search
  Compare pros and cons
  Overall opinion of products/services


Page 7: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Motivation: Public Sector

Political
  Alternative Polling
  Determine popular support for legislation
  Choose campaign issues

National Security
  Detect individuals at risk for radicalization
  Determine local sentiment about US policy
  Determine local values and sentimental icons
  Portray actions positively using local flavor

Public Health
  Detect potential suicide victims
  Detect mentally unstable people


Page 8: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Challenges

  Text Representation
  Unedited Text
  Sentiment Drift
  Negation
  Sarcasm
  Sentiment Target Identification
  Granularity
  Domain Dependence


Page 9: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Domain Dependence 1: Domain Dependent Sentiment

The same sentence can mean two very different things in different domains
  Ex: “Read the book.” <= Good for books, bad for movies
  Ex: “Jolting, heart pounding, You’re in for one hell of a bumpy ride!” Good for movies and books, bad for cars.

Sentimental word associations change with domain
  Fuzzy cameras are bad, but fuzzy teddy bears are good.
  Big trucks are good, but big iPods are bad.
  Bad is bad, but bad villains are good.


Page 10: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Domain Dependence 2: Endless Possibilities


Page 11: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Domain Dependence 3: Organization and Granularity


Page 12: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Theory of the Three Signals

Authors communicate messages using three types of signals
  Domain-Specific Signals
  Domain-Independent Signals
  Semantic Signals

More specific signals are generally more powerful than more generic signals


Page 13: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Domain-Specific Signals
  Dependent on problem and domain
  Considered more useful by readers
  Tells what is good or bad about topic
  Domain knowledge determines sentiment orientation
  Very strong in context, but weak or misleading out of context
  Can cause over generalization error when overvalued
  New domain-specific signal words are ignored in CDT

Examples: Fuzzy teddy bears, Sharp pictures, Sharp knives, Smooth rides, New ideas, Fast servers, Fast cars, Slow roasted burgers, Slow motion, Small cameras, Big cars


Page 14: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Proposed Approach

Sentiment Search is more than just a classification problem

Detecting and Using the three signals
  Dynamic Domain Adapting Classifiers
  Generic Feature Detection using unlabeled data
  Semantic Feature Spaces


Page 15: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Dynamic Domain Adapting Classifiers

A (preferably domain-independent) model is built using computationally intense algorithms before query time on a set of labeled data.

Users interact at a query box level

Query results define the domain of interest

Domain specific adaptations are calculated
  compares how the domain of interest is different from known cases
  uses semantic knowledge about word senses and relations
  must be fast algorithm: users are waiting

Domain specific adaptations are woven into the domain independent model
  resulting model is temporary
  used to classify documents as positive, negative, or objective

Sentimental search results are processed for significant components and presented for human consumption
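
One way to picture the "weaving" step above is as a temporary overlay on a precomputed set of term weights. The sketch below is only an illustration under strong assumptions: the slides do not say what form the model takes, so the linear term-weight model, the function names `weave` and `classify`, the `margin` threshold used for the objective class, and all numeric weights are hypothetical. It reuses the fuzzy camera / fuzzy teddy bear example from the Domain Dependence slide.

```python
def weave(general_weights, adaptations):
    """Return a temporary model: general weights plus domain-specific deltas.

    Both arguments are dicts of term -> weight (assumed representation).
    The general model is copied, never modified, so the result can be
    thrown away after the query is answered.
    """
    temp = dict(general_weights)
    for term, delta in adaptations.items():
        temp[term] = temp.get(term, 0.0) + delta
    return temp


def classify(tokens, weights, margin=0.5):
    """Label a document by summed term weight; near-zero scores count as objective.

    The margin rule is an assumption made for this sketch.
    """
    score = sum(weights.get(t, 0.0) for t in tokens)
    if score > margin:
        return "positive"
    if score < -margin:
        return "negative"
    return "objective"


# Hypothetical general model, e.g. learned offline from camera reviews.
general = {"great": 1.0, "bad": -1.0, "fuzzy": -1.0}
# Hypothetical adaptation computed at query time for a teddy bear domain.
teddy_adaptation = {"fuzzy": 2.0}
teddy_model = weave(general, teddy_adaptation)

label_general = classify(["fuzzy", "bear"], general)
label_adapted = classify(["fuzzy", "bear"], teddy_model)
```

With these made-up weights, "fuzzy bear" is labeled negative by the general model and positive by the adapted one, while the general model itself is left unchanged for the next query.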


Page 16: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Overview

[Figure: system overview diagram. Components shown: Query, Lucene Index, Query Results (define a new domain), Dynamic Domain Adapter, General Model, Labeled data (+/-) from known domain, Semantic Knowledge, Context Specific Model, Sentiment Classifier, Sentimental Search Results, Component Analysis, Business Intelligence. Key: User Level, Source Data, Knowledge, Labeled Data, Algorithms, Search Results.]


Page 17: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Subjective Context Scoring

Multiply:
  PMI(Word, Context)
  IDF
  Co-occurrence with known generic sentiment seed words times their bias (from movie reviews)

Seeds:
  bad, worst, stupid, ridiculous, terrible, poorly
  great, best, perfect, wonderful, excellent, effective
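
The product described on this slide can be sketched as follows. This is a minimal illustration, not the author's implementation: documents are assumed to be sets of lowercase tokens, probabilities are estimated from document counts, the natural log is used, and each seed is given a bias of +1 or -1 because the slides say the biases come from movie reviews but do not list the values. The function name `subjective_context_score` and the toy corpus are hypothetical.

```python
import math

# Seed words from the slide; the +1/-1 bias values are assumed for illustration.
SEED_BIAS = {
    "bad": -1.0, "worst": -1.0, "stupid": -1.0, "ridiculous": -1.0,
    "terrible": -1.0, "poorly": -1.0,
    "great": 1.0, "best": 1.0, "perfect": 1.0, "wonderful": 1.0,
    "excellent": 1.0, "effective": 1.0,
}


def subjective_context_score(word, context_docs, corpus_docs):
    """Score one word as PMI(word, context) * IDF(word) * biased seed co-occurrence.

    context_docs is assumed to be a subset of corpus_docs (e.g. the query results).
    """
    n_corpus = len(corpus_docs)
    n_context = len(context_docs)
    df_corpus = sum(1 for d in corpus_docs if word in d)
    df_context = sum(1 for d in context_docs if word in d)
    if df_context == 0:
        return 0.0  # word never appears in the context
    # PMI(word, context) = log( P(word, context) / (P(word) * P(context)) )
    p_joint = df_context / n_corpus
    pmi = math.log(p_joint / ((df_corpus / n_corpus) * (n_context / n_corpus)))
    idf = math.log(n_corpus / df_corpus)
    # Sum the bias of every seed that shares a context document with the word.
    seed_term = sum(
        bias
        for d in context_docs if word in d
        for seed, bias in SEED_BIAS.items() if seed in d
    )
    return pmi * idf * seed_term


# Toy corpus; the first three documents play the role of the query results.
corpus = [
    {"ipod", "battery", "terrible"},
    {"ipod", "battery", "bad"},
    {"ipod", "great", "sound"},
    {"pizza", "great"},
    {"pizza", "cheese"},
    {"movie", "bad"},
    {"movie", "plot"},
    {"car", "fast"},
]
context = corpus[:3]

battery_score = subjective_context_score("battery", context, corpus)
sound_score = subjective_context_score("sound", context, corpus)
pizza_score = subjective_context_score("pizza", context, corpus)
```

In this toy data "battery" scores negative and "sound" scores positive. A plain product has a known weakness: a word with negative PMI and negative seed bias would come out positive, so a real system would likely need to handle signs separately.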

Page 18: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Rocchio Baseline

Rocchio - Query Expansion algorithm for search
  Similar goals to ours: find more relevant words
  Does not account for sentiment

The new query is a weighted sum of
  Matching document vectors
  Query vector
  Non-matching document vectors (negative value)
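
The weighted sum above is usually written with three weights, here called alpha (query), beta (matching documents), and gamma (non-matching documents). The sketch below is a minimal version using sparse term-weight dicts. The function name `rocchio_expand`, the default weight values, and the choice to drop terms with non-positive weight are assumptions made for illustration and are not taken from the slides.

```python
from collections import defaultdict


def rocchio_expand(query, matching, non_matching, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an expanded query as a dict of term -> weight.

    query: dict of term -> weight
    matching, non_matching: lists of document vectors (dicts of term -> weight)
    alpha, beta, gamma: illustrative defaults, not values from the slides
    """
    new_query = defaultdict(float)
    for term, w in query.items():
        new_query[term] += alpha * w
    # Add the centroid of the matching documents.
    for doc in matching:
        for term, w in doc.items():
            new_query[term] += beta * w / len(matching)
    # Subtract the centroid of the non-matching documents.
    for doc in non_matching:
        for term, w in doc.items():
            new_query[term] -= gamma * w / len(non_matching)
    # Assumed convention: terms pushed to zero or below are dropped.
    return {t: w for t, w in new_query.items() if w > 0}


query = {"ipod": 1.0}
matching = [{"ipod": 1.0, "battery": 2.0}, {"ipod": 1.0, "shuffle": 2.0}]
non_matching = [{"pizza": 4.0}]
expanded = rocchio_expand(query, matching, non_matching)
```

Here the expanded query gains "battery" and "shuffle" from the matching documents, while "pizza" is pushed negative and dropped. Nothing in the computation looks at sentiment, which is the limitation the slide points out.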

Page 19: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Papa John’s According to TFIDF

Page 20: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Papa John’s According to Subjective Context

Page 21: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

George Bush According to TFIDF

Page 22: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

George Bush According to Subjective Context

Page 23: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

iPod according to Rocchio

Page 24: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

iPod according to TFIDF

Positive Sentiment in Movie Reviews

Negative Sentiment in Movie Reviews

Page 25: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Sentimental Context

Components:
  PMI(Word, Context)
  TF
  IDF
  Log(Actual co-occurrence of Word, Seed, Context / Probability by chance)

Values:
  Abnormality to other docs
  Popular words in context
  Rare words in the corpus
  Words that occur with sentiment words in the query documents
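
The last component, the log of actual co-occurrence over chance co-occurrence, can be sketched as below. This is a hedged illustration: it assumes "chance" means the co-occurrence count expected if the word and the seed were independent within the query documents, it counts at the document level, and it uses the natural log. The function name `cooccurrence_log_ratio` and the toy documents are hypothetical.

```python
import math


def cooccurrence_log_ratio(word, seed, context_docs):
    """log(observed co-occurrences / co-occurrences expected by chance).

    context_docs: list of sets of tokens (the query result documents).
    Expected count assumes word and seed occur independently (an assumption).
    """
    n = len(context_docs)
    n_word = sum(1 for d in context_docs if word in d)
    n_seed = sum(1 for d in context_docs if seed in d)
    n_both = sum(1 for d in context_docs if word in d and seed in d)
    if n_both == 0:
        return float("-inf")  # never seen together: log(0)
    expected = n_word * n_seed / n
    return math.log(n_both / expected)


docs = [
    {"ipod", "battery", "bad"},
    {"ipod", "battery", "terrible", "bad"},
    {"ipod", "sound", "great"},
    {"ipod", "screen"},
]

battery_bad = cooccurrence_log_ratio("battery", "bad", docs)
ipod_bad = cooccurrence_log_ratio("ipod", "bad", docs)
sound_bad = cooccurrence_log_ratio("sound", "bad", docs)
```

In the toy data "battery" appears with "bad" twice as often as chance predicts, so its ratio is positive, while "ipod" appears in every document and therefore carries no signal (ratio of exactly zero).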

Page 26: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

iPod according to Sentimental Context

Page 27: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

iPod Nike according to Sentimental Context

Page 28: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

iPod+Nike According to Apple

Page 29: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

iPod Audio according to Sentimental Context

Page 30: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

iPod Shuffle According to Sentimental Context

Page 31: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

iPod Warranty According to Sentimental Context

Page 32: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

iPod Battery according to Sentimental Context

Page 33: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

iPod nano battery According to Sentimental Context

Page 34: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Google Hits (Battery Related):

Query                        + "good"                  + "bad"
iPod battery                 ~13.5 Mill                ~900 K
iPod nano battery            ~3 Mill                   ~785 K
iPod shuffle battery         ~1.6 Mill                 ~230 K
iPod shuffle battery price   ~2.6 Mill (not a typo)    ~230 K
iPod battery price           ~13.5 Mill                ~850 K
iPod nano battery price      ~3 Mill                   ~785 K

Page 35: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.
Page 36: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Summary

Interesting problem with many potential applications

Domain dependence is the core challenge

The keys to success are:
  Vast quantities of unlabeled data
  Semantic knowledge from freely available sources
  Semantics must guide and influence but not overrule the statistics


Page 37: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

Questions?

Page 38: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

BACKUP SLIDES

Page 39: Automatic Domain Adaptive Sentiment Analysis Phase 1 Justin Martineau.

PMI - Pointwise Mutual Information
  a.k.a. Specific Mutual Information
  Do 2 variables occur more often with each other than chance?

\[
\mathrm{PMI}(X, Y) = \log\left( \frac{P(X \,\&\, Y)}{P(X)\,P(Y)} \right)
\]
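
As a small worked example of the PMI definition on this slide, the sketch below estimates the probabilities from raw counts. The function name `pmi`, the use of the natural log (the slide does not state a base), and the counts are assumptions made for illustration.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information estimated from counts.

    count_xy: observations containing both X and Y
    count_x, count_y: observations containing X, containing Y
    total: total number of observations
    """
    if min(count_xy, count_x, count_y) <= 0 or total <= 0:
        raise ValueError("all counts must be positive for PMI to be defined")
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log(p_xy / (p_x * p_y))


# Made-up counts: 100 documents, X in 20 of them, Y in 10.
together_often = pmi(8, 20, 10, 100)   # X and Y share 8 documents
as_by_chance = pmi(2, 20, 10, 100)     # 2 shared documents is what chance predicts
```

A positive value means the two events occur together more often than chance would predict (here four times as often), a value near zero means they look independent, and a negative value means they tend to avoid each other.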
