WI: IWI August 31st, 2010 Swit Phuvipadawat, Tsuyoshi Murata Dept. of Computer Science Tokyo Institute of Technology, Japan Breaking News Detection and Tracking in Twitter

Breaking News Detection and Tracking in Twitter (WI:IW'10)

Apr 10, 2015

jazripper

Twitter has been used as one of the communication channels for spreading breaking news. We propose a method to collect, group, rank and track breaking news in Twitter. Since short message lengths make similarity comparison difficult, we boost scores on proper nouns to improve the grouping results. Each group is ranked based on popularity and reliability factors. The current detection method is limited to the factual part of messages. We developed an application called “Hotstream” based on the proposed method. Users can discover breaking news from the Twitter timeline. Each story is provided with information on the message originator, story development and an activity chart. This provides a convenient way for people to follow breaking news and stay informed with real-time updates.
Transcript
Page 1: Breaking News Detection and Tracking in Twitter (WI:IW'10)

WI: IWI
August 31st, 2010

Swit Phuvipadawat, Tsuyoshi Murata
Dept. of Computer Science

Tokyo Institute of Technology, Japan

Breaking News Detection and Tracking in Twitter

Page 2: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Outline

• Introduction

• Analysis

• Methodology

• Results and Application

• Challenges and Future Works

• Conclusion

Page 3: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Introduction

Page 4: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Twitter as a news channel

http://blog.marsdencartoons.com/2009/06/18/cartoon-iranian-election-demonstrations-and-twitter/marsden-iran-twitter72/

In June 2009, during the Iranian election, Twitter transformed the way people convey news.

Page 5: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Twitter as a news channel

Example messages and the stories they belong to:

• Earthquakes around the world (Earthquake in Taiwan, Earthquake in Chile, Earthquake in Haiti)

❖ Earthquake with 6.4 magnitude hits Taiwan!

❖ Tsunami alert after Chilean earthquake.

• Apple announced iPad

❖ The Apple iPad starting $499

❖ Apple to launch iPad on March 26

❖ Steve Jobs demoed iPad

• Iraq Election

❖ Early voting begin March, 7 Iraq Election

❖ US and UN hope Sunni participation help heal the wound.

• Obama Health Reform

❖ Health care explained.

Page 6: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Research Topic

“Breaking News Detection and Tracking in Twitter”

➡ Topic Detection and Tracking (TDT)

➡ Information Retrieval

➡ Social Network Analysis

Page 7: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Topic Detection and Tracking (TDT)

• To monitor broadcast news and alert an analyst to new and interesting events happening in the world. [Allan 2001]

• To search, organize and structure multilingual, news-oriented textual materials from a variety of broadcast news media. [Fiscus & Doddington 2002]

• Focuses on 5 tasks:

❖ Story segmentation

❖ First story detection

❖ Cluster detection

❖ Tracking

❖ Story link detection

Page 8: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Recent Studies

• Topological characteristics of Twitter

What is Twitter, a Social Network or a News Media? H. Kwak, C. Lee, S. Moon [WWW2010]

➡ 85% of trending topics in Twitter appear in headline news

• Using Twitter data to improve web ranking

Time is of the Essence: Improving Recency Ranking Using Twitter Data A. Dong, R. Zhang et al. [WWW2010]

➡ Micro-blogging data reveals fresh URLs not yet indexed by search engines

• Event detection

Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors T. Sakaki, M. Okazaki, Y. Matsuo [WWW2010]

Page 9: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Recent Studies

• Influential Topics, Users detection

❖ Characterizing Microblogs with Topic Models D. Ramage, S. Dumais, D. Liebling [ICWSM2010]

➡ Uses Labeled LDA, a supervised learning model, to characterize the content of messages into substance, style, status and social characteristics.

❖ TwitterRank: Finding Topic-sensitive Influential Twitterers J. Weng, E.-P. Lim, J. Jiang, Q. He [WSDM2010]

➡ Uses PageRank with a topic model (LDA) to measure the influence of users.

Page 10: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Analysis

Page 11: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Message Analysis

Single Message Aspect

Msg Attributes   Symbol    Count    %
Tag a user       @         79,469   51.6%
Embed a link     http://   50,404   32.7%
Retweet          RT        29,935   19.4%
Use a hashtag    #         20,348   13.2%

Findings from a dataset of 154,000 messages, with 33,000 messages from news-engaging users

Text Characteristics     Type   Examples
Sensational adjectives   E      terrible, horrible, terrifying, shocking, terrific, amazing, ...
Sensational phrases      E      wow! oh my god! ...
Significant nouns        F      US. President, Obama, Michael Jackson, Japan, Toyota, ...
Impactful verbs          F      kill, die, crash, reveal, discover, rescue, ...

Data of March 2009
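The attribute counts above could be reproduced with simple pattern matching. The following is a minimal sketch, not the authors' code: the regular expressions, the function names and the four sample messages are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical patterns for the four message attributes on the slide.
PATTERNS = {
    "mention": re.compile(r"(?<!\w)@\w+"),        # tag a user
    "link": re.compile(r"https?://\S+"),          # embed a link
    "retweet": re.compile(r"(?:^|\s)RT\s+@\w+"),  # retweet
    "hashtag": re.compile(r"(?<!\w)#\w+"),        # use a hashtag
}

def message_attributes(text):
    """Return the set of attribute names whose pattern occurs in the message."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}

def attribute_shares(messages):
    """Map each attribute to (message count, share of all messages)."""
    counts = Counter()
    for text in messages:
        counts.update(message_attributes(text))
    total = len(messages) or 1  # avoid dividing by zero on an empty list
    return {name: (counts[name], counts[name] / total) for name in PATTERNS}

# Made-up sample messages, not taken from the dataset.
sample = [
    "RT @John Earthquake in Tokyo!",
    "Tsunami alert after Chilean earthquake http://example.com/alert #tsunami",
    "Health care explained.",
    "@Lisa did you see the iPad demo?",
]
shares = attribute_shares(sample)
for name, (count, share) in shares.items():
    print(f"{name:8s} {count:2d} {share:.1%}")
```

A message can carry several attributes at once (a retweet also tags a user), which is why the percentages in the table do not need to sum to 100%.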

Page 12: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Network Analysis

Timeline Aspect

RT (retweet) means taking someone's Twitter message and rebroadcasting that same message, much like retelling a story to your friends.

John (12:15): Earthquake in Tokyo!

Lisa (12:30): RT @John Earthquake in Tokyo!
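The retweet convention shown here makes it possible to trace a message back to its originator. A minimal sketch, assuming the plain "RT @user text" form from the example (the pattern and the function name `unwrap_retweet` are hypothetical, and other retweet styles are not handled):

```python
import re

# Assumed retweet form: "RT @user text", optionally "RT @user: text".
RT_PATTERN = re.compile(r"^RT\s+@(\w+):?\s*(.*)$", re.DOTALL)

def unwrap_retweet(text):
    """Peel off leading 'RT @user' prefixes.

    Returns (users, original_text), where users lists the quoted users
    from the outermost retweet to the innermost one.
    """
    users = []
    text = text.strip()
    while True:
        match = RT_PATTERN.match(text)
        if not match:
            return users, text
        users.append(match.group(1))
        text = match.group(2).strip()

chain, original = unwrap_retweet("RT @Lisa RT @John Earthquake in Tokyo!")
originator = chain[-1] if chain else None  # innermost quoted user
print(originator, "->", original)
```

The last user in the chain is treated as the originator, since each retweet wraps the message it repeats.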

Page 13: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Methodology

Page 14: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Method for Collecting, Indexing and Grouping

‣ Collecting

• Fetch messages using pre-defined search queries for breaking-news-related keywords and hashtags

‣ Indexing

• An index based on term vectors is constructed.

• Apache Lucene is used as an information retrieval library

‣ Grouping

• Similar messages are grouped together to form a news story

• Similarity comparison is based on the vector space model using TF-IDF with term boosting for proper nouns


Page 15: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Grouping Method Explained

Conditions

• Messages in a group must be related to the first story

• Further messages can develop upon previous messages

A message is compared with the first message in a group and the top k terms in that group.

$$\mathrm{sim}(m_1, m_2) = \sum_{t \in m_1} \left[ \mathrm{tf}(t, m_2) \times \mathrm{idf}(t) \times \mathrm{boost}(t) \right]$$

$$\mathrm{tf}(t, m) = \frac{\mathrm{count}(t \text{ in } m)}{\mathrm{size}(m)}$$

$$\mathrm{idf}(t) = 1 + \log\left( \frac{N}{\mathrm{count}(m \text{ has } t)} \right)$$

Boost is raised for proper nouns (e.g. China, Obama, Toyota) and hashtags. NER is used to detect proper nouns.
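The similarity formula and the grouping conditions can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the Hotstream implementation (which uses Apache Lucene and Stanford NER). The hand-written `PROPER_NOUNS` set stands in for NER output. N is taken to be the number of messages in the batch, the sum runs over the distinct terms of m1, and the group reference is assumed to be the first message's tokens plus the group's top k terms. The `threshold`, `k` and default `boost` values are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for NER output; the slides use Stanford NER instead.
PROPER_NOUNS = {"toyota", "obama", "china", "tokyo"}

def tokenize(text):
    """Lowercase the text and split it into word and hashtag tokens."""
    return re.findall(r"#?\w+", text.lower())

def boost_of(term, boost):
    """Return the boost for hashtags and known proper nouns, else 1.0."""
    return boost if term.startswith("#") or term in PROPER_NOUNS else 1.0

def tf(term, tokens):
    """tf(t, m) = count(t in m) / size(m)."""
    return tokens.count(term) / len(tokens) if tokens else 0.0

def idf(term, corpus):
    """idf(t) = 1 + log(N / count(m has t)), with N = number of messages."""
    containing = sum(1 for tokens in corpus if term in tokens)
    return 1.0 + math.log(len(corpus) / containing) if containing else 0.0

def sim(m1, m2, corpus, boost=1.5):
    """Sum tf(t, m2) * idf(t) * boost(t) over the distinct terms t of m1."""
    return sum(tf(t, m2) * idf(t, corpus) * boost_of(t, boost) for t in set(m1))

def group_messages(texts, threshold=0.5, k=3, boost=1.5):
    """Greedily assign each message to its best-matching group, in arrival order."""
    corpus = [tokenize(t) for t in texts]
    groups = []  # each group is a list of message indices, first message first
    for i, tokens in enumerate(corpus):
        best, best_score = None, threshold
        for group in groups:
            counts = Counter(t for j in group for t in corpus[j])
            top_terms = [t for t, _ in counts.most_common(k)]
            reference = corpus[group[0]] + top_terms  # first message + top k terms
            score = sim(tokens, reference, corpus, boost)
            if score >= best_score:
                best, best_score = group, score
        if best is None:
            groups.append([i])  # no group is similar enough: start a new story
        else:
            best.append(i)
    return groups

# Made-up messages for illustration.
texts = [
    "Toyota recalls cars over brake problem",
    "Earthquake hits Tokyo",
    "Toyota recall widens",
    "Strong earthquake shakes Tokyo this morning",
]
print(group_messages(texts))             # with proper-noun boosting
print(group_messages(texts, boost=1.0))  # without boosting
```

With these assumed values, the two Toyota messages share only the term "toyota", so they end up in the same group only when that term is boosted.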

Page 16: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Named Entity Recognizer

• Stanford Named Entity Recognizer (NER) has been adopted for the following uses:

➡ To detect proper nouns used in the grouping algorithm

➡ To classify messages based on named entities (Person, Organization, Location, Misc.)

• NER is based on linear chain Conditional Random Field (CRF)

Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp. 363-37

Page 17: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Method for Group Ranking

The score for each group is computed as follows:

• A group score is based on reliability, popularity and freshness factors.

‣ Reliability comes from the number of followers of the user who posted a message.

‣ Popularity comes from the number of retweets.

‣ Freshness is computed from the difference between the current time and the time when a message was posted.
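The slide names the three factors but does not give the scoring formula, so the sketch below is only one plausible way to combine them. The log damping of follower and retweet counts, the exponential decay, the `half_life_hours` parameter and the `Message` record are all assumptions, not the authors' method.

```python
import math
import time
from dataclasses import dataclass

@dataclass
class Message:
    followers: int    # follower count of the posting user (reliability signal)
    retweets: int     # number of retweets of the message (popularity signal)
    posted_at: float  # POSIX timestamp of posting (used for freshness)

def group_score(messages, now=None, half_life_hours=6.0):
    """Sum per-message scores; each decays by half every half_life_hours (assumed)."""
    now = time.time() if now is None else now
    score = 0.0
    for m in messages:
        age_hours = max(0.0, (now - m.posted_at) / 3600.0)
        freshness = 0.5 ** (age_hours / half_life_hours)
        reliability = math.log1p(m.followers)  # damped so huge accounts do not dominate
        popularity = math.log1p(m.retweets)
        score += (reliability + popularity) * freshness
    return score

# Made-up example: the same message scored when new and six hours later.
now = 1_000_000.0
fresh = [Message(followers=1000, retweets=10, posted_at=now)]
stale = [Message(followers=1000, retweets=10, posted_at=now - 6 * 3600)]
print(group_score(fresh, now), group_score(stale, now))
```

Groups would then be listed in descending score order, so that stories that are fresh, widely retweeted and posted by well-followed users appear first.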

Page 18: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Results and Application

Page 19: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Detection Effectiveness

Rates

Method                              Search query
Precision                           90.0% (45/50)
Recall                              -
Spam                                8% (4/50)
User generated                      11.1% (5/45)
Avg. time to collect 100 new msg.   72 sec

Based on an experiment conducted in June 2010

Page 20: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Example Result of Grouping

(a) No boost
G0: M3 M4 M5
G1: M7 M8
G2: M0 M1
G3: M2
G4: M6
G5: M9

(b) b=1.5
G0: M2 M3 M4 M5
G1: M7 M8
G2: M0 M1
G4: M6
G5: M9

(c) b=1.7
G0: M0 M1 M7 M8
G1: M2 M3 M4 M5
G2: M6
G3: M9

(d) b=2
G0: M2 M3 M4 M5 M9
G1: M0 M1 M7 M8
G2: M6

Story labels in the figure: Toyota, MJ., Airline, US. Japan, Prisoner

Boosting improves the grouping result

Page 21: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Application

• A prototype application called Hotstream was developed.

• The goal is to create an automatic news portal based on Twitter data.

Page 22: Breaking News Detection and Tracking in Twitter (WI:IW'10)
Page 23: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Challenges and Future Works

Page 24: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Challenges

• The length of messages is short

• Two similar stories may be expressed using different vocabulary terms

• The style of writing is unconventional, with slang and many spelling variants

Page 25: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Future Works

• Explore the community structures of named entities to find relationships among groups of messages

Grouped by TF-IDF with proper noun term boosting

Page 26: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Example Dataset

Top 18 stories and their keywords from Hotstream as of July 21st, 2010

Red nodes = keywords, Yellow nodes = message groups

Messages-Named Entities

Page 27: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Community Detection Experiment

Network Characteristics

Network Type                 Edge betweenness
No. Vertices                 453 (254, 200)
No. Edges                    1280
Mean Degree                  5.639
No. Clusters                 40
Largest Component Fraction   0.781

Community Detection Results

Method            Edge betweenness
No. Communities   68
Modularity        0.71
Purity            0.67

Communities labeled in the figure: BP Oil leak, Australian Prime Minister, US. Military in Middle East
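The two quality measures reported here, modularity and purity, can be computed for any given partition of the network. The sketch below uses the standard Newman modularity and the usual majority-label purity. It does not perform edge-betweenness community detection itself. The tiny group-keyword graph, its hand-made partition and the story labels are invented for illustration.

```python
from collections import Counter

def modularity(edges, community_of):
    """Newman modularity Q = sum_c [ e_c / m - (d_c / (2m))^2 ] for an undirected graph."""
    m = len(edges)
    internal = Counter()  # e_c: edges with both ends in community c
    degree = Counter()    # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

def purity(community_of, label_of):
    """Fraction of labeled nodes that carry the majority label of their community."""
    by_community = {}
    for node, label in label_of.items():
        by_community.setdefault(community_of[node], Counter())[label] += 1
    total = sum(sum(c.values()) for c in by_community.values())
    return sum(max(c.values()) for c in by_community.values()) / total

# Invented graph: message groups G0..G3 linked to the keywords they contain.
edges = [
    ("G0", "toyota"), ("G0", "recall"), ("G1", "toyota"), ("G1", "recall"),
    ("G2", "earthquake"), ("G2", "tokyo"), ("G3", "earthquake"), ("G3", "tokyo"),
    ("G1", "tokyo"),  # one edge bridging the two communities
]
community_of = {
    "G0": "A", "G1": "A", "toyota": "A", "recall": "A",
    "G2": "B", "G3": "B", "earthquake": "B", "tokyo": "B",
}
# Invented ground-truth story labels for the message groups only.
label_of = {"G0": "Toyota", "G1": "Toyota", "G2": "Earthquake", "G3": "Earthquake"}

print(round(modularity(edges, community_of), 3), purity(community_of, label_of))
```

Higher modularity means more edges fall inside communities than a random graph with the same degrees would give. Purity requires ground-truth labels for the nodes being evaluated.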

Page 28: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Conclusion

• Introduced Twitter as a means of conveying news

• Described the message and network characteristics of Twitter

• Described the method to collect, index, group and rank messages

• Introduced Hotstream, an automatic news portal

• Proposed an extension study on the group-keyword network to improve the grouping results

Page 29: Breaking News Detection and Tracking in Twitter (WI:IW'10)

Thank You