Weakly Supervised User Profile Extraction from Twitter
Jiwei Li, Alan Ritter and Eduard Hovy
School of Computer Science, Carnegie Mellon University
June 22nd, 2014
User Profile
Main Contribution
Automatic extraction of attributes from Twitter:
Spouse
Education
Job
We use Google Plus as distant supervision for user attribute extraction.
We present a large-scale dataset for this task.
We demonstrate the benefit of jointly reasoning about users' social network structure.
Outline
Motivation/Introduction
Related Work
Dataset Creation
Algorithm
Experiments
Conclusion
Motivation/Introduction
Why Profile Extraction?
Friend Recommendation
Targeted Advertising (Movie, Book, ...)
Twitter serves as a wonderful source:
Text Level Evidence
Network Information
Homophily: People sharing more attributes have a higher chance of becoming friends in social media
Question
Unstructured Twitter data → Structured User Profile?
Related Work
User Attribute Extraction/Identification
Gender (Ciot et al., 2013; Liu and Ruths, 2013)
Age (Rao et al., 2010)
Relying on Amazon Mechanical Turk
Political Polarity (Pennacchiotti et al., 2011)
Relying on external political websites
Dataset Creation
Challenge:
Lack of Training Data
Distant Supervision
Relation Extraction (Mintz et al., 2009)
Paris is the capital and most populous city of France
The capital of France is Paris
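The distant supervision idea above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the toy knowledge base `KB`, the relation name `capital_of`, and the function `distant_labels` are all hypothetical, and plain substring matching stands in for real entity matching.

```python
# Hypothetical toy knowledge base of (entity1, relation, entity2) facts.
KB = [("Paris", "capital_of", "France")]


def distant_labels(sentences, kb):
    """Label a sentence with a KB relation when it mentions both entities.

    Assumption: naive substring matching is used here for illustration only.
    """
    labeled = []
    for sent in sentences:
        for e1, rel, e2 in kb:
            if e1 in sent and e2 in sent:
                labeled.append((sent, e1, rel, e2))
    return labeled


sentences = [
    "Paris is the capital and most populous city of France",
    "The capital of France is Paris",
    "Berlin is the capital of Germany",
]
examples = distant_labels(sentences, KB)
```

Both Paris/France sentences become positive training examples for the relation without any manual annotation, while the Berlin sentence is left unlabeled because no KB fact covers it.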
What is the knowledge base for our task?
Attributes we focus on:
Education
Job
Spouse
Education: Positive Examples
Education: Negative Examples
Education: Data Expansion
Spouse
Algorithm
Potential Function
Given an entity e recognized by Twitter NER (Ritter et al., 2011).
$\Psi(z_i, e)$: potential function that entity e constitutes the corresponding attribute of user i
$$\Psi(z_i, e) = \frac{1}{Z}\, \Psi_{\text{Text}}(z_i, e)\, \Psi_{\text{Network}}(z_i, e)$$
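As a rough sketch of how the two factors combine, the snippet below multiplies a text score by a network score for each candidate entity and normalizes. The slide does not say what Z sums over; normalizing over one user's candidate entities is an assumption made here, and the function name `combined_potential` and the example scores are hypothetical.

```python
def combined_potential(text_scores, network_scores):
    """Return Psi(z_i, e) for every candidate entity e of one user.

    text_scores / network_scores map entity -> unnormalized potential.
    Assumption: Z normalizes over this user's candidate entities.
    """
    unnormalized = {
        e: text_scores[e] * network_scores.get(e, 1.0) for e in text_scores
    }
    z = sum(unnormalized.values())
    return {e: value / z for e, value in unnormalized.items()}


scores = combined_potential(
    {"CMU": 0.6, "MIT": 0.4},   # hypothetical text-level potentials
    {"CMU": 2.0, "MIT": 1.0},   # hypothetical network-level potentials
)
```

With these made-up numbers the network factor raises the share of "CMU" from 0.6 to 0.75, which is the kind of effect the product form is meant to allow.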
Learning
Text-Level Evidence $\Psi_{\text{Text}}(z_i^k, e)$
Entity-level: number of tokens, capital letter, length
Token-level: identity, shape, POS, NER
Window-level: tokens, POS
Tweet-level: tokens
External Sources: list of universities/companies
Neighboring Effect
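Education, Job (Homophily)
$$\Psi_{\text{Network}}(z_i, e) = \prod_{j \in \text{Neigh}(i)} \exp\big(\lambda\, I(Z_{j,e} = 1) / N\big)$$
Spouse
$$\Psi_{\text{Network}}(z_i, e) = \exp\big(\lambda\, I(Z_{e,\text{user}_i} = 1)\big)$$
Max-Ent for training.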
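The two network potentials above could be computed along these lines. This is a sketch under assumptions: N is read as the number of neighbors of user i, and the spouse indicator is read as "candidate e in turn has user i as spouse"; neither reading is spelled out on the slide. The function names are hypothetical.

```python
import math


def homophily_potential(neighbor_has_entity, lam):
    """prod over neighbors j of exp(lam * I(Z_{j,e} = 1) / N).

    neighbor_has_entity: one bool per neighbor, True if that neighbor
    holds entity e as the attribute. Assumption: N = number of neighbors.
    """
    n = len(neighbor_has_entity)
    if n == 0:
        return 1.0  # no neighbors, so the network factor is neutral
    return math.prod(
        math.exp(lam * (1.0 if has else 0.0) / n) for has in neighbor_has_entity
    )


def spouse_potential(entity_points_back, lam):
    """exp(lam * I(Z_{e, user_i} = 1)), under the reciprocal reading."""
    return math.exp(lam * (1.0 if entity_points_back else 0.0))


half_agree = homophily_potential([True, True, False, False], lam=2.0)
```

Because the product of exponentials equals the exponential of the sum, the homophily term reduces to exp(λ times the fraction of neighbors sharing e), so it grows smoothly as more neighbors agree.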
Inference
Observed: Neighboring Information is already given (Education, Job).
Latent: Neighboring Information is unknown (Joint Inference)
Initialize based only on text-level information $\Psi_{\text{Text}}(z_i, e)$
Infer each individual given its neighbors
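The two steps of the latent setting can be sketched as an iterative loop. This is only an illustration: the greedy in-place update, the stopping rule, the default `lam`, and the name `joint_inference` are assumptions, and the homophily factor is used in its simplified form exp(λ × fraction of neighbors currently labeled e).

```python
import math


def joint_inference(text_scores, neighbors, lam=1.0, max_iters=10):
    """text_scores: user -> {entity: text potential}; neighbors: user -> [users]."""
    # Step 1: initialize every user from text-level evidence only.
    labels = {u: max(s, key=s.get) for u, s in text_scores.items()}

    # Step 2: re-infer each user given the current labels of its neighbors.
    for _ in range(max_iters):
        changed = False
        for u, scores in text_scores.items():
            neigh = neighbors.get(u, [])

            def score(e):
                if not neigh:
                    return scores[e]
                frac = sum(labels.get(v) == e for v in neigh) / len(neigh)
                return scores[e] * math.exp(lam * frac)

            best = max(scores, key=score)
            if best != labels[u]:
                labels[u] = best
                changed = True
        if not changed:
            break  # labels are stable
    return labels


text_scores = {
    "a": {"CMU": 0.9, "MIT": 0.1},
    "b": {"CMU": 0.8, "MIT": 0.2},
    "c": {"CMU": 0.45, "MIT": 0.55},
}
neighbors = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
result = joint_inference(text_scores, neighbors)
```

In this toy graph, user "c" starts as "MIT" from text alone but is pulled to "CMU" once both neighbors are labeled "CMU"; with `lam=0.0` the network has no influence and "c" keeps its text-only label.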
Experiments
Baselines
Only-Text: Text-Level Evidence $\Psi_{\text{Text}}(z_i, e)$
NELL: Bag of words matching in the list of universities or companies borrowed from NELL
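A bag-of-words matching baseline of the kind described for NELL might look like the sketch below. The gazetteer entries are made-up stand-ins rather than actual NELL data, and the exact matching rule (every word of a list entry must occur in the tweet) is an assumption.

```python
def nell_baseline(tweet_tokens, gazetteer):
    """Return gazetteer entries whose words all occur in the tweet.

    Matching ignores word order and case (bag of words).
    """
    bag = {token.lower() for token in tweet_tokens}
    return [name for name in gazetteer if set(name.lower().split()) <= bag]


# Hypothetical list entries, for illustration only.
universities = ["Carnegie Mellon University", "Stanford University"]
tweet = "I just graduated from carnegie mellon university".split()
matches = nell_baseline(tweet, universities)
```

Such a baseline uses no context at all, so it cannot tell a user's own school from one they merely mention, which is what the text-level and network-level models are meant to improve on.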
Results
Conclusion
We present a framework to extract user attributes from Twitter.
We present a large-scale dataset for this task.
We demonstrate the benefit of jointly reasoning about users' social network structure.
Future Work
Facebook: