Extracting Interest Tags from Twitter User Biographies Ying Ding, Jing Jiang School of Information Systems Singapore Management University AIRS 2014, Kuching, Malaysia

Dec 21, 2015

Page 1:

Extracting Interest Tags from Twitter User Biographies

Ying Ding, Jing Jiang

School of Information Systems

Singapore Management University

AIRS 2014, Kuching, Malaysia

Page 2:

Social Media and Personal Data

Dec 5, 2014 AIRS 2014 2

• Much personal information is revealed in social media
– Content, links, ratings
– Personal preferences
• All this information is useful to
– Researchers: social science
– Businesses: targeted advertising

Page 3:

User Biographies in Twitter

• Self-introductions written in free form
• Reflect users’ background and interests

Page 4:

User Biographies in Twitter

[Figure labels: profession, interests, age]

Around 28% of Singapore Twitter users and 50% of US Twitter users revealed their personal interests in their biographies.

Dong Wei et al. Who am I on Twitter?: A cross-country comparison. WWW’2014

Page 5:

Outline

• Background

• Our task

• Syntactic patterns of interest tags

• Build training data + gold standard

• Method

• Experiments

• Summary

Page 6:

Our task

• Automatically extract phrases that describe a user’s personal interests.
– We call them “interest tags”.
– A typical information extraction problem.
– Automatically build training data based on common syntactic patterns.

Page 7:

Method

• Linear Chain CRF

• BIO labels
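
A minimal sketch of BIO labelling, assuming whitespace tokenization and a made-up biography; the helper name `bio_labels` and the example text are illustrative and do not come from the slides. Each token is labelled B (begins an interest tag), I (inside one) or O (outside), and this label sequence is what a linear-chain CRF is trained to predict.

```python
def bio_labels(tokens, tag_spans):
    """Label tokens with B/I/O given (start, end) token spans, end exclusive."""
    labels = ["O"] * len(tokens)
    for start, end in tag_spans:
        labels[start] = "B"          # first token of an interest tag
        for i in range(start + 1, end):
            labels[i] = "I"          # remaining tokens of the same tag
    return labels


# Hypothetical biography: "video games" (tokens 2-3) and "coffee" (token 5)
# are treated as the interest tags.
tokens = "I love video games and coffee".split()
labels = bio_labels(tokens, [(2, 4), (5, 6)])
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

With this encoding a multi-word tag such as "video games" stays one unit (B followed by I), while two adjacent single-word tags would each start with their own B.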

Page 8:

Syntactic Patterns of Interest Tags

• Based on manual annotation of 500 user biographies.
• 28.8% of user biographies contain meaningful interest tags.

Page 9:

Building Training Data

• Seed patterns:

– Play + [NP]

– [NP] + fan

– Interested in + [NP]

• Steps:

– Use seed patterns to extract noun phrases and rank them according to their frequency

– Pick the top-100 ranked noun phrases and use them as positive instances to train CRF
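
The two steps above can be sketched roughly as follows. This is only an illustration under loud assumptions: the slides use noun phrases ([NP]), whereas this sketch approximates an NP as up to two consecutive non-stopword tokens next to the trigger word; the stopword list, the helper names and the toy biographies are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list used to cut off the crude "noun phrase".
STOP = {"and", "or", "the", "a", "an", "in", "of", "my", "i", "to", "with"}


def tokenize(text):
    """Lowercase words, plus every other non-space character as its own token."""
    return re.findall(r"[a-z]+|[^a-z\s]", text.lower())


def take(tokens, start, step, max_len=2):
    """Collect up to max_len content words from `start`, moving by `step`."""
    picked = []
    i = start
    while 0 <= i < len(tokens) and len(picked) < max_len:
        tok = tokens[i]
        if not tok.isalpha() or tok in STOP:
            break
        picked.append(tok)
        i += step
    if step < 0:
        picked.reverse()             # restore reading order
    return " ".join(picked)


def extract_candidates(bio):
    """Apply the three seed patterns to one biography."""
    toks = tokenize(bio)
    found = []
    for i, tok in enumerate(toks):
        if tok == "play":                                          # Play + [NP]
            found.append(take(toks, i + 1, +1))
        elif tok == "fan":                                         # [NP] + fan
            found.append(take(toks, i - 1, -1))
        elif tok == "in" and i > 0 and toks[i - 1] == "interested":
            found.append(take(toks, i + 1, +1))                    # Interested in + [NP]
    return [phrase for phrase in found if phrase]


# Toy biographies, made up for this example.
bios = [
    "I play guitar and sing.",
    "Football fan. Interested in photography.",
    "Coffee addict, football fan, I play guitar",
]
counts = Counter(p for bio in bios for p in extract_candidates(bio))
top_phrases = counts.most_common(100)   # the top-100 become positive instances
print(top_phrases)
```

Because the phrase boundary is guessed this crudely, some extracted phrases will be wrong, which matches the idea that the automatically built training data is noisy.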

Page 10:

Features

• Syntactic or dependency features are not used because Twitter text is too noisy for parsing.
• Both lexical and POS tag features are used.
• To avoid over-fitting, only features extracted from the surrounding tokens of each position are used.
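
A sketch of what such context-only features might look like, for illustration. It assumes one reading of the last bullet: the word and POS tag of the current token itself are left out, so the model cannot simply memorize the seed phrases. The window size, the feature names and the POS tags in the example are assumptions, not details given on the slide.

```python
def context_features(tokens, pos_tags, i, window=1):
    """Lexical and POS features taken only from the neighbours of position i."""
    feats = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue                 # assumption: skip the token itself
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"word[{offset:+d}]"] = tokens[j].lower()   # lexical feature
            feats[f"pos[{offset:+d}]"] = pos_tags[j]          # POS tag feature
        else:
            feats[f"pad[{offset:+d}]"] = True                 # sentence boundary
    return feats


# Hypothetical sentence with hand-assigned Penn Treebank style tags.
tokens = ["I", "love", "video", "games"]
pos_tags = ["PRP", "VBP", "NN", "NNS"]
for i in range(len(tokens)):
    print(tokens[i], context_features(tokens, pos_tags, i))
```

For the token "video" this yields the previous word "love" and the next word "games" with their tags, but nothing about "video" itself.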

Page 11:

Gold Standard

• Two annotators: graduate students

• 500 randomly sampled user biographies

• 1190 sentences
– Two annotators disagree on 10 sentences
– High agreement
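
As a quick check of the "high agreement" claim, the raw sentence-level agreement implied by these two numbers can be computed directly. This is plain observed agreement, not a chance-corrected measure such as Cohen's kappa, which would need the annotators' label distributions (not given on the slide).

```python
total_sentences = 1190
disagreements = 10

# Fraction of sentences on which both annotators agree.
raw_agreement = (total_sentences - disagreements) / total_sentences
print(f"{raw_agreement:.2%}")
```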

Page 12:

Experiment

BL-700: uses the 700 most frequent phrases. We chose 700 because it gives the highest F-score among the cutoffs we tried.
Seed: uses the seed patterns to recognize interest tags.
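
How a cutoff like 700 could be picked can be sketched as follows. The slide does not say how the F-score was computed, so this sketch makes a simplifying assumption: it scores the set of top-k phrases against a set of gold tags. The phrase counts and gold tags are toy values, and the helper names are hypothetical.

```python
from collections import Counter


def f_score(predicted, gold):
    """Precision, recall and F1 over two sets of tags."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0, 0.0, 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return precision, recall, 2 * precision * recall / (precision + recall)


def top_k_baseline(phrase_counts, k):
    """Treat the k most frequent phrases as interest tags."""
    return {phrase for phrase, _ in phrase_counts.most_common(k)}


# Toy frequencies and gold tags, made up for this example.
phrase_counts = Counter(
    {"music": 50, "love": 40, "football": 30, "life": 20, "coffee": 10}
)
gold = {"music", "football"}

# Try every cutoff and keep the one with the highest F1.
best_k = max(
    range(1, len(phrase_counts) + 1),
    key=lambda k: f_score(top_k_baseline(phrase_counts, k), gold)[2],
)
best_f1 = f_score(top_k_baseline(phrase_counts, best_k), gold)[2]
print(best_k, round(best_f1, 3))
```

A small k misses gold tags (low recall), while a large k lets in frequent non-interest phrases (low precision), so the best F-score sits somewhere in between.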

Page 13:

Extracted Patterns

Some popular patterns are:
• [interest tag] + fan/lover/enthusiast
• I love + [interest tag]
• [interest tag] is/are my life

Page 14:

Is it difficult to predict interest tags by users’ tweets?

Page 15:

Is it difficult to predict interest tags by users’ tweets?

We also applied TF-IDF ranking, which has previously been used to extract personalized user tags, to extract user interest tags.

• Interest tags extracted from a user’s biography are not necessarily reflected in the user’s posted tweets.
• They can serve as supplementary information when profiling a user.
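
For reference, the TF-IDF ranking mentioned on this slide can be sketched in a few lines. The slide gives no formula, so this assumes the common variant: all tweets of a user form one document, and a term scores tf × log(N / df). The users and tweets are invented for the example.

```python
import math
from collections import Counter


def tfidf_rank(user_docs, user, top_n=3):
    """Rank one user's terms by tf * log(N / df), with one document per user."""
    n_docs = len(user_docs)
    df = Counter()
    for tokens in user_docs.values():
        df.update(set(tokens))       # document frequency: count each user once
    tf = Counter(user_docs[user])
    scores = {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda term: (-scores[term], term))[:top_n]


# Toy data: each user's tweets concatenated and tokenized.
user_docs = {
    "alice": "great match today football football lol".split(),
    "bob": "great coffee today lol".split(),
    "carol": "lol new guitar today".split(),
}
ranked = tfidf_rank(user_docs, "alice")
print(ranked)
```

A ranking like this can only surface terms the user actually tweets, so an interest stated in the biography but never mentioned in tweets cannot be recovered this way.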

Page 16:

Summary

• We studied the problem of extracting interest tags from Twitter user biographies

• We automatically built noisy training data based on syntactic patterns

• We trained a CRF classifier on the noisy training data and achieved decent performance

• Interest tags extracted from Twitter user biographies may not be reflected in the user’s tweets

Page 17:

Thank you!

Questions?
