Other Topics You May Also Agree or Disagree:
Modeling Inter-Topic Preferences using Tweets and Matrix Factorization
Akira Sasaki, Kazuaki Hanawa, Naoaki Okazaki, Kentaro Inui
Tohoku University
SENTIMENT 1
Stance classification
• Goal
  • Classify the stance of a text toward a specific topic
• Applications
  • Public opinion surveys from SNS data
  • Predicting voting actions
Input: Text: "I fully agree with TPP", Topic: TPP
Output: Stance: Agree
Difficulty of stance classification
People often talk about topics without explicitly mentioning the topic.
How can we classify stance from such a text?
Input: Text: "It is better to promote domestic consumption", Topic: TPP
Output: Stance: Disagree
Input: Text: "It is better to promote free trade", Topic: TPP
Output: Stance: Agree
Use of inter-topic preferences for stance classification
Inter-topic preference: knowledge of the form "people who agree with TPP also agree with (or disagree with) certain other topics".
[Figure: topics linked to TPP by inter-topic preference: free trade, revision of copyright law, domestic consumption, distribution of pharmaceuticals]
A relatively simple example
Topic words and their surrounding words provide strong clues (Somasundaran & Wiebe, 2010; Mohammad+, 2013).
Input: Text: "I fully agree with TPP", Topic: TPP
Output: Stance: Agree
※ Although the datasets used in this work are in Japanese, we provide examples in English for readability.
Proposal: modeling inter-topic preferences via matrix factorization
Users' stances for each topic (user-topic matrix; "?" marks a missing value):

         Topic 1  Topic 2  Topic 3  Topic 4
User 1     1.0       ?      -1.0       ?
User 2    -1.0       ?       0.7       ?
User 3    -0.4       ?       1.0     -1.0
User 4      ?       0.5       ?        ?

Compute dense feature vectors for users and for topics via matrix factorization:

user-topic matrix ≈ (user feature vectors) × (topic feature vectors)

Complete the missing values using the feature vectors:

         Topic 1  Topic 2  Topic 3  Topic 4
User 1     1.0      0.3     -1.0      0.2
User 2    -1.0      0.2      0.7     -0.2
User 3    -0.4     -0.3      1.0     -1.0
User 4     0.2      0.5      0.1     -0.5

The aim of matrix factorization:
1. Capture inter-topic preferences in dense feature vectors
2. Reveal users' hidden stances by completion
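
The factorization and completion steps can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain SGD factorization with L2 regularization, and the function names, the rank `k`, the learning rate, and the other hyperparameters are all hypothetical. The cell positions of the toy matrix are inferred by aligning the sparse matrix with the completed one on the slide.

```python
import random

def factorize(observed, n_users, n_topics, k=2, epochs=2000, lr=0.05, reg=0.01, seed=0):
    """Fit user vectors P and topic vectors Q to the observed cells by SGD.

    observed maps (user_index, topic_index) -> stance in [-1, 1].
    All hyperparameters are illustrative assumptions, not values from the slides.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_topics)]
    entries = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(entries)
        for (u, t), r in entries:
            err = r - sum(P[u][f] * Q[t][f] for f in range(k))
            for f in range(k):
                pu, qt = P[u][f], Q[t][f]
                # Gradient step on the squared error with L2 regularization.
                P[u][f] += lr * (err * qt - reg * pu)
                Q[t][f] += lr * (err * pu - reg * qt)
    return P, Q

def predict(P, Q, u, t):
    """Predicted stance of user u toward topic t (dot product of the two vectors)."""
    return sum(p * q for p, q in zip(P[u], Q[t]))

# Toy user-topic matrix (0-based indices; missing cells are simply absent).
observed = {
    (0, 0): 1.0, (0, 2): -1.0,
    (1, 0): -1.0, (1, 2): 0.7,
    (2, 0): -0.4, (2, 2): 1.0, (2, 3): -1.0,
    (3, 1): 0.5,
}
P, Q = factorize(observed, n_users=4, n_topics=4)

# Completion: every cell, observed or missing, is filled from the feature vectors.
completed = [[round(predict(P, Q, u, t), 2) for t in range(4)] for u in range(4)]
```

The completed values will not match the slide's numbers, which are illustrative; the point is only that every missing cell receives a prediction from the learned vectors.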
The whole architecture
① Mining Linguistic Patterns of Agreement and Disagreement
  • Corpus (tweets): tweets posted by users who have used pro/con hashtags, e.g. "A good news. [URL] #TPP反対", "TPP ruins the future of our country", …
  • Pattern extraction: pattern candidates in which the users describe topics, e.g. "A is completely wrong", "We should introduce A", "to A"
  • Sort candidates and select useful patterns, yielding linguistic pro/con patterns:
    • Pro: I support A / A is necessary / Welcome A / We should introduce A / …
    • Con: I disagree A / A is completely wrong / A ruins the future of our country / …
② Extracting Instances of Stances
  • Mine topic preferences to build the user-topic matrix
③ Matrix Factorization
  • user-topic matrix ≈ (user feature vectors) × (topic feature vectors), which gives the completed matrix
Mining linguistic patterns of agreement/disagreement
• Focus on pro/con hashtags such as "#X賛成" ("agree with X") or "#X反対" ("disagree with X"), used by users who have strong stances on topics
• For a user who has used "#X反対", extract con linguistic patterns from the other tweets by this user
[Figure: Corpus (tweets posted by users who have used pro/con hashtags; user X: "A good news. [URL] #TPP反対", …, "TPP is completely wrong"; user Y: …) → Pattern extraction → Candidates of linguistic patterns ("A is completely wrong", "We should introduce A", "to A")]
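
The candidate-extraction step can be sketched as follows. This is a simplified illustration rather than the procedure used in the work: it assumes English whole-tweet strings as patterns, treats `[URL]` as a literal placeholder, and the function name `extract_candidates` and its parameters are hypothetical. The actual system operates on Japanese tweets.

```python
import re
from collections import Counter

def extract_candidates(tweets_by_user, topic, con_tag_suffix="反対"):
    """Count con pattern candidates from users who posted the con hashtag for `topic`."""
    con_tag = f"#{topic}{con_tag_suffix}"
    counts = Counter()
    for user, tweets in tweets_by_user.items():
        if not any(con_tag in tweet for tweet in tweets):
            continue  # this user never used the con hashtag
        for tweet in tweets:
            # Drop hashtags and URL placeholders before looking for the topic word.
            text = re.sub(r"#\S+|\[URL\]", "", tweet).strip()
            if topic in text:
                # Replace the topic with the slot "A" to obtain a pattern candidate.
                counts[text.replace(topic, "A")] += 1
    return counts

tweets_by_user = {
    "user X": ["A good news. [URL] #TPP反対", "TPP is completely wrong"],
    "user Y": ["TPP is completely wrong", "I like tea"],
}
candidates = extract_candidates(tweets_by_user, "TPP")
```

Only user X contributes here, because user Y never used the con hashtag. The resulting counts are what a later step would sort by frequency before manual filtering.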
Extracting instances of stances
• Sort the aforementioned pattern candidates by frequency, and filter them manually
[Figure: Pattern candidates ("A is completely wrong", "We should introduce A", "to A") → Manual examination → Linguistic patterns. PRO: I support A / A is necessary / Welcome A / … CON: I disagree A / A is completely wrong / A is silly / …]
• Using the linguistic patterns, we create the user-topic matrix
[Figure: Corpus (tweets) → user-topic matrix. For example, user 1's tweets "… I support domestic consumption …" and "TPP is silly" yield a pro instance for the topic "domestic consumption" and a con instance for the topic "TPP".]
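
A minimal sketch of how linguistic patterns could be turned into user-topic entries is shown below. Several parts are assumptions that the slides do not state: the patterns are matched against whole English tweets with regular expressions, each pattern is assumed to contain exactly one slot "A", and each cell is taken to be the mean of the matched polarities (+1 for pro, -1 for con). The helper names are hypothetical.

```python
import re
from collections import defaultdict

PRO_PATTERNS = ["I support A", "A is necessary", "Welcome A"]
CON_PATTERNS = ["I disagree A", "A is completely wrong", "A is silly"]

def compile_pattern(pattern):
    """Turn a pattern such as 'A is silly' into a regex capturing the topic in slot A."""
    before, after = re.split(r"\bA\b", pattern, maxsplit=1)
    return re.compile(re.escape(before) + r"(.+)" + re.escape(after))

PRO = [compile_pattern(p) for p in PRO_PATTERNS]
CON = [compile_pattern(p) for p in CON_PATTERNS]

def extract_stances(tweets_by_user):
    """Return {(user, topic): value} built from pattern matches."""
    votes = defaultdict(list)
    for user, tweets in tweets_by_user.items():
        for tweet in tweets:
            for polarity, regexes in ((1, PRO), (-1, CON)):
                for regex in regexes:
                    match = regex.fullmatch(tweet.strip())
                    if match:
                        votes[(user, match.group(1))].append(polarity)
    # Assumption: a cell is the mean polarity of all matched declarations.
    return {key: sum(values) / len(values) for key, values in votes.items()}

tweets_by_user = {"user 1": ["I support domestic consumption", "TPP is silly"]}
matrix = extract_stances(tweets_by_user)
```

With this input, user 1 receives a pro entry for "domestic consumption" and a con entry for "TPP", mirroring the example in the figure.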
• Ex3: Correlation between human judgements → Moderate correlation
Dataset
• Tweet corpus
  • about 35 billion tweets crawled from Feb. 2013 to Sep. 2016
  • about 7 million users
  • retweets are removed
• Collected data
  • 100 pro patterns and 100 con patterns (manually filtered)
  • about 25 million tuples (agreement/disagreement declarations) corresponding to about 3 million users and about 5,000 topics
• User-topic matrix
  • removed users and topics that appeared fewer than five times
  • about 10 million tuples corresponding to about 270,000 users and about 2,300 topics
  • sparsity = 98.43%
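
As a rough sanity check on the sparsity figure, the rounded counts above can be plugged into the usual definition, sparsity = 1 - (filled cells) / (users × topics). The snippet assumes that each tuple fills a distinct cell, which the slide does not state, so it is only an approximation.

```python
n_users = 270_000      # "about 270,000 users"
n_topics = 2_300       # "about 2,300 topics"
n_tuples = 10_000_000  # "about 10 million tuples"

# Assumption: one tuple per (user, topic) cell, so tuples count filled cells.
density = n_tuples / (n_users * n_topics)
sparsity = 1.0 - density
print(f"sparsity ≈ {sparsity:.2%}")
```

The rounded inputs give a value of roughly 98.4%, in line with the reported 98.43%.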
Ex2: Predicting missing stances
• How accurately can user and topic vectors predict missing stances?

1. Hide 5% of the elements of the user-topic matrix ("?" marks a missing value, "(hidden)" a held-out one):

         Topic 1   Topic 2   Topic 3   Topic 4
User 1     1.0        ?       -1.0        ?
User 2    -1.0        ?      (hidden)     ?
User 3   (hidden)     ?        1.0      -1.0
User 4      ?      (hidden)     ?         ?

2. Apply matrix factorization to the remaining elements and complete the matrix:

         Topic 1   Topic 2   Topic 3   Topic 4
User 1     1.0       0.3      -1.0       0.2
User 2    -1.0       0.2       0.9      -0.2
User 3    -0.2      -0.3       1.0      -1.0
User 4     0.2       0.8       0.1      -0.5

3. Calculate accuracy with regard to the hidden elements.
• Majority baseline: predict each missing value as the majority stance (agree or disagree) for that topic
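
The evaluation protocol can be sketched as below. This is an illustration under assumptions: accuracy is computed on the sign of each value (agree vs. disagree), ties and topics with no visible entries fall back to an arbitrary choice, and all function names are hypothetical. The toy numbers are the held-out and predicted values from the slide's example.

```python
import random
from collections import Counter

def split_hidden(observed, hide_ratio=0.05, seed=0):
    """Hold out a random fraction of the observed cells for evaluation."""
    rng = random.Random(seed)
    keys = sorted(observed)
    n_hidden = max(1, round(len(keys) * hide_ratio))
    hidden_keys = set(rng.sample(keys, n_hidden))
    train = {k: v for k, v in observed.items() if k not in hidden_keys}
    hidden = {k: v for k, v in observed.items() if k in hidden_keys}
    return train, hidden

def sign(x):
    """Map a stance value to agree (+1) or disagree (-1)."""
    return 1 if x >= 0 else -1

def majority_baseline(train):
    """Per topic, the majority of agree/disagree among the visible cells."""
    votes = {}
    for (_, topic), value in train.items():
        votes.setdefault(topic, Counter())[sign(value)] += 1
    return {topic: c.most_common(1)[0][0] for topic, c in votes.items()}

def accuracy(predict_fn, hidden):
    """Fraction of hidden cells whose predicted sign matches the true sign."""
    correct = sum(sign(predict_fn(u, t)) == sign(v) for (u, t), v in hidden.items())
    return correct / len(hidden)

# Slide example (0-based indices): held-out cells and the completed-matrix predictions.
hidden = {(1, 2): 0.7, (2, 0): -0.4, (3, 1): 0.5}
predictions = {(1, 2): 0.9, (2, 0): -0.2, (3, 1): 0.8}
mf_accuracy = accuracy(lambda u, t: predictions[(u, t)], hidden)

# Baseline on the visible cells; unseen topics default to "agree" (an arbitrary choice).
train = {(0, 0): 1.0, (0, 2): -1.0, (1, 0): -1.0, (2, 2): 1.0, (2, 3): -1.0}
baseline = majority_baseline(train)
baseline_accuracy = accuracy(lambda u, t: baseline.get(t, 1), hidden)
```

On real data, `split_hidden` would produce the train/hidden split, a factorization would be fitted on `train` only, and both methods would be scored on the same hidden cells.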