Tutorial on Semantic Analysis and Search for Twitter Ming ZHOU, Xiaohua LIU, Long JIANG, Yajuan DUAN MSRA-NLC Group Aug. 19th 2010

Tutorial on Semantic Analysis and Search for Twitter

Ming ZHOU, Xiaohua LIU, Long JIANG, Yajuan DUAN

MSRA-NLC Group

Aug. 19th 2010


Tutorial outline

Introduction (Ming ZHOU)

Understand Twitter

The task of semantic analysis

Semantic analysis of tweets (Xiaohua LIU, Long JIANG)

Semantic role labeling

Sentiment analysis

Twitter search (Yajuan DUAN)

Feature extraction

Ranking search results with account's influence, content relevance and other features

Wrap-up (Ming)

What we have learnt and where we should go


Introduction

Dr. Ming ZHOU


Agenda

Overview of Twitter

Semantic analysis and search


About Twitter


What is Twitter

Twitter is a free social networking and micro-blogging service that enables its users to send and read other users' updates, known as 'tweets'.

Twitter allows users to reply with text-based posts of up to 140 characters in length.

Senders can restrict delivery to those in their circle of friends or, by default, allow anybody to access them.

Users can send and receive tweets via the Twitter website, Short Message Service (SMS) or external applications.


Twitter’s history

In 2005, Jack Dorsey had grown interested in the simple idea of being able to know what his friends were doing.

Twitter was funded initially by Obvious, a creative environment in San Francisco, CA.

The first prototype was built in two weeks in March 2006 and launched publicly in August of 2006.

The service grew popular very quickly and it soon made sense for Twitter to move outside of Obvious.

In May 2007, Twitter Incorporated was founded.

100M registered users now and 70M tweets per day


Twitter influences people’s daily life

Massive information broadcast

At the Soccer World Cup final (Spain vs. Netherlands), 2,000 tweets were sent per second in the last 15 minutes, in 27 languages from 172 countries

Quicker information finding

When the National Post covered the G20 protests in Toronto live, it was Twitter that informed reporters of riots forming elsewhere, while other media were not aware at all

Link with the world

Find friends and community


Users' behaviors in using Twitter

Personal use

Broadcast news, feelings, observations, gossip

Keep in touch with your friends

Obtain quicker information

Business use

Broadcast your company's latest news and blog posts

Interact with your customers

Business intelligence (sentiment analysis, etc.)


Find your friends to follow

Search for people on Twitter by their name or user name

Import friends from other networks or invite friends via email.

Twitter even suggests new friends for you

Twitter account yellow page


Interesting facts

As Twitter users attract more followers, they tend to tweet more often.

This is particularly evident once someone has 1,000 followers: the average number of tweets per day climbs from three to six.

When someone has more than 1,750 followers, the number of tweets per day rises to 10.

85.3% of Twitter users update less than once a day, while 1.13% of Twitter users update more than 10 times a day on average.


Rapidly growing tweets per day


China’s Micro-Blogging

1. According to SIG, Sina micro-blogging grew rapidly in the last 9 months; the number of users has reached 15 to 20 million

2. All portals now provide Twitter-like micro-blogging


Catalogue of popular accounts (twellow.com)


Twitter analytics

http://archivist.visitmix.com/d6385b22/3?isNew=False


Twitter sentiment analysis

http://twittersentiment.appspot.com/


Positive vs. negative comments

http://twittersentiment.appspot.com/


Twitterland.com (1) Follow and track certain keywords

Tweet Beep - Find out who is talking about you or your website through certain keywords

Site Volume - Enter five keywords and see their activity on Twitter

Tweet Volume - A more personalized version of Site Volume

Monitter - Monitor Twitter conversations on three keywords. Good for catching the latest news

Hashtags - Track a certain keyword on Twitter

Twemes - See twitter memes or tags for Twitter

Tweetchannel - Find out what people are talking about through certain keywords called channels.

Twitter Meter - Find the trends of certain keywords

Flaptor Trends - Compare the trends of three keywords on Twitter

Twitter Spectrum - Find out the dominant keywords via a generated tag cloud.

Serendipitwiterrous - Search for tweets of a certain person using certain keywords

Twittertroll - real-time Twitter search engine


Twitterland.com (2) Integrate your Twitter with Files, Images and Videos

Twitpic - Lets you share photos on Twitter

Autopostr - Update your Twitter when you post a Flickr picture

Snaptweet - Share your Flickr photos on Twitter

Twixr - Allows you to share pictures on Twitter via your mobile phone

Visual Twitter - Answers “what are you doing?” with pictures.

Twitter Poster - A huge conglomeration of Twitter user images.

Twiddeo - Think Twitter updates + video

Twitplus - Make tweets with pictures, videos and files

Twittershare - Share pictures, music, video and other files on Twitter.

Twixxer - Share photos and videos on Twitter

Tweet Cube - Upload files to Twitter

Pikter - Post pictures on Twitter

TinyTwit - Lets you share files on Twitter, also URL shortening service and tweet app.


Twitter search


Twitter is now the fastest-growing search engine

The popular micro-blogging social network known as Twitter is not actually a social network; it is more like an information network or a source of news, Twitter co-founder Biz Stone told the World Innovation Forum.

More than 800 million Twitter search queries are processed every day, making for a monthly total of 24 billion searches.

Sources: http://business.financialpost.com/2010/07/07/fp-tech-desk-twitter-now-fastest-growing-search-engine/


Twitter search drives new innovations

Hot competition today: real-time search

Twitter acquired Summize

Facebook acquired FriendFeed

Yahoo collaborated with OneRiot

Bing and Google added Twitter data to search results

Many start-ups such as Twazzup

Promote next wave of innovations

Crawler, text mining, ranking of search results, visualization, mobile search, SNS search

Help general search, local search, news search, travel search and various verticals


Twitter Search

Rank by tweet popularity (e.g. # of retweets)

Rank by chronological order


Problems with current Twitter Search

Lacks support for information browsing

Clustering, categorization, navigation (TOC)

Weak search

Hard to read the returned tweets or embedded links only

Search results are ranked chronologically or by extent of content match; user influence and tweet popularity are not sufficiently considered

No business intelligence

Sentiment analysis, brand reputation, market analysis and prediction

Poor information display

Call for semantic analysis and search
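
The ranking problem named on this slide can be illustrated with a minimal sketch. It assumes each search result carries a content-match score, a retweet count, the author's follower count as a crude stand-in for influence, and an age in hours. The field names, weights and formula are hypothetical, chosen only for illustration; they are not the ranking method presented in this tutorial.

```python
import math
from dataclasses import dataclass


@dataclass
class Result:
    text: str
    relevance: float   # content-match score in [0, 1], assumed to be given
    retweets: int      # popularity signal
    followers: int     # crude proxy for the author's influence (assumption)
    age_hours: float   # time since the tweet was posted


def score(r, w_rel=0.5, w_pop=0.2, w_inf=0.2, w_new=0.1):
    """Weighted sum of relevance, popularity, influence and freshness.

    The weights are illustrative defaults, not tuned values.
    """
    popularity = math.log1p(r.retweets)     # dampen very large counts
    influence = math.log1p(r.followers)
    freshness = 1.0 / (1.0 + r.age_hours)   # newer tweets score higher
    return (w_rel * r.relevance + w_pop * popularity
            + w_inf * influence + w_new * freshness)


def rank(results):
    """Return results sorted by descending combined score."""
    return sorted(results, key=score, reverse=True)
```

With equal content match, a widely retweeted tweet from a well-followed account outranks a fresher tweet from a small account, which purely chronological ordering would not do.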


The tasks of semantic analysis

Individual tweet level


Twitter Firehose

Rep. Rangel, facing ethics charges, says he may have been "overzealous" in serving public http://on.cnn.com/9dhCil

Semantic Social Network Analysis

Individual tweet analysis pipeline (diagram built up step by step across slides)

Input: a tweet

Classifier: Pointless babble, Pass along, Self-promotion, News, Spam, Conversation

Classifier according to content: Entertainment, World, Politics, Science & Technology, Lifestyle, Business & Products, Sports

Normalization: Abbreviation recovery, Spelling error correction

Example tweet shown at this step: ZoeeLovato: @xuniqueuniverse oh, i really want to see inception :) it looks real good! haha

Co-reference Resolution (callout on the example tweet: "This is identified as Rangel")

Meta Data Extractor: First Publisher, Retweeter, Shared link, Tag, Retweet Count, Conversational

First Publisher: cnnbrk

Retweeter: jonwillifordlaw

Shared link: http://on.cnn.com/9dhCil

Tag: none

Conversational: no

Retweet Count: 1

NER (callout on the example tweet: "This is recognized as person name")

SRL

Parser: Tokenize, POS Tag, Chunking, Semantic Tree

A0: Rangel; predicate: facing, says

A1: charges, he; predicate: facing

A1: public; predicate: serving

A2: overzealous; predicate: been

AM-MOD: may; predicate: been

A0: he; predicate: serving

A1: he; predicate: been

SA (sentiment analysis)
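
The Meta Data Extractor step can be sketched as below. This is a minimal illustration, not the tutorial's implementation: the regular expressions, the function name `extract_metadata`, and the "RT @user:" retweet convention are assumptions. The slide shows only the tweet body, so the `RT @cnnbrk:` prefix in the sample text is reconstructed hypothetically from the extractor output on the slide. Retweet count is left out because it cannot be derived from a single tweet.

```python
import re

# Assumed conventions: a retweet starts with "RT @user", tags start with "#".
RT_PATTERN = re.compile(r"^RT @(\w+):?\s*")
URL_PATTERN = re.compile(r"https?://\S+")
TAG_PATTERN = re.compile(r"#(\w+)")


def extract_metadata(author, text):
    """Extract conversational metadata from one tweet posted by `author`."""
    rt = RT_PATTERN.match(text)
    body = text[rt.end():] if rt else text
    return {
        "first_publisher": rt.group(1) if rt else author,
        "retweeter": author if rt else None,
        "shared_links": URL_PATTERN.findall(body),
        "tags": TAG_PATTERN.findall(body),
        # A tweet that opens with @user is treated as conversational.
        "conversational": body.startswith("@"),
    }


# Hypothetical raw form of the example tweet from the slides.
sample = ('RT @cnnbrk: Rep. Rangel, facing ethics charges, says he may have been '
          '"overzealous" in serving public http://on.cnn.com/9dhCil')
meta = extract_metadata("jonwillifordlaw", sample)
```

Under these assumptions the extracted fields agree with the values listed on the slide: first publisher cnnbrk, retweeter jonwillifordlaw, one shared link, no tag, not conversational.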


The task of semantic analysis

Tweet collection level

Tweet collection analysis (diagram built up step by step across slides)

Tweet Collection

Retweet Relation (example pairs: @aidanovia @AlexaLxa; @AlexaLxa @nntshafira)

Retweet Graph

Mention Relation (example pair: @melodi_x @stephloveless)

Mentioning Graph

User Collection

Follow Relation

Following Graph

Twitter List (an example of a Twitter List is shown)

User Influence

Tweet Classifier

Topic Extraction

Topic Distribution on User

Community

Influential users in community

Hot Topic, Hot Link, Hot Tag, Popular Tweet, Top Video, Top Music, Top Image, Top Artists
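
The retweet and mention relations can be held as directed graphs, as in the minimal sketch below. The edge direction (the first user of each pair points to the second) is an assumption, since the slides list the pairs without stating a direction. Counting incoming edges is shown only as one crude, hypothetical influence signal; the tutorial does not define user influence this way.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an adjacency map {source: {targets}} from (source, target) pairs."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)
    return graph


def in_degree(graph):
    """Count incoming edges per user."""
    counts = defaultdict(int)
    for targets in graph.values():
        for dst in targets:
            counts[dst] += 1
    return dict(counts)


# Example pairs from the slides; the direction is assumed.
retweet_edges = [("aidanovia", "AlexaLxa"), ("AlexaLxa", "nntshafira")]
mention_edges = [("melodi_x", "stephloveless")]

retweet_graph = build_graph(retweet_edges)
mention_graph = build_graph(mention_edges)
```

The same `build_graph` helper would serve for the follow relation, given (follower, followee) pairs.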


Summary of semantic analysis

Hot topics, news, image, video

Influential people, community, applications

Time series of topic, people, community and event

Who does (says) what when and where

Sentiment analysis and opinion summary

Rank in multiple ways and combination

Visualization of mining and search results

Breaking news detection

Multi-language text mining


References

Twitter introduction

http://sysomos.com/insidetwitter/

http://blog.sysomos.com/2010/03/29/twitter-enjoys-major-growth-and-excellent-stickiness/

Twitter analysis (Sysomos products)

http://sysomos.com/docs/Sysomos_Products_Overview_Brochure_web.pdf


Tutorial outline

Introduction

Understand Twitter

The task of semantic analysis

Semantic analysis of tweets (Xiaohua LIU, Long JIANG)

Semantic role labeling

Sentiment analysis

Twitter search (Yajuan DUAN)

Feature extraction

Ranking search results with account's influence, content relevance and other features

Wrap-up (Ming)

What we have learnt and where we should go


Semantic Role Labeling for Tweets

Xiaohua LIU


Outline

Introduction

SRL task definition

Application to twitter search

General approaches to SRL

Resources

Typical systems

SRL on tweets

Challenges

Method


Semantic role labeling

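Detect basic event structures such as who did what to whom, when and where

Example (parse tree with role labels): The luxury auto maker last year sold 1,214 cars in the U.S.

[The luxury auto maker] NP: A0 (Agent)

[last year] NP: AM-TMP (Temporal Marker)

[sold] P (Predicate)

[1,214 cars] NP: A1 (Object)

[in the U.S.] PP: AM-LOC (Locative Marker)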


Predicate

Verbal predicate (PropBank)

Chile [earthquake] A0 shorten the [day] A1

Other types of predicate (NomBank)

[Her]A0 gift of [a book]A1 [to John]A2


Predicate arguments

Core arguments

A0, A1: agent and patient

13 adjunctive arguments

Temporal, manner, location, etc.

Phrase level vs. word level argument

Word level: Chile [earthquake] A0 shorten the [day] A1

Phrase level: [Chile earthquake] A0 shorten [the day] A1
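
The difference between word-level and phrase-level arguments can be made concrete with token spans. The sketch below is illustrative only: the `Argument` class and `render` helper are hypothetical names, and the spans simply encode the two annotations of the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    role: str    # e.g. "A0", "A1", "AM-TMP"
    start: int   # index of the first token (inclusive)
    end: int     # index past the last token (exclusive)


tokens = ["Chile", "earthquake", "shorten", "the", "day"]
predicate_index = 2  # "shorten"

# Word level: each argument is a single head word.
word_level = [Argument("A0", 1, 2), Argument("A1", 4, 5)]

# Phrase level: each argument covers the whole phrase.
phrase_level = [Argument("A0", 0, 2), Argument("A1", 3, 5)]


def render(tokens, args):
    """Map each role to the text of its span."""
    return {a.role: " ".join(tokens[a.start:a.end]) for a in args}
```

Rendering the two annotations gives "earthquake" and "day" at word level, and "Chile earthquake" and "the day" at phrase level.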


Evaluation of SRL

Test datasets (from PropBank)

WSJ (Wall Street Journal) : mainly news

Brown: more balanced corpus, including news, reports and others

The state-of-the-art results

CoNLL-2005: 81.52% F1 on WSJ

CoNLL-2008: 87.69% F1 on WSJ, 69.06% F1 on Brown

CoNLL-2009: 80.47% F1 on WSJ

Best systems are pipelined or based on Markov Logic Networks (MLN)

Page 72: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Outline

Introduction

SRL Task definition

Application to twitter search

General approaches to SRL

Resources

Typical systems

SRL on tweets

Challenges

Method

Page 73: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

SRL can help twitter search

Twitter search is currently keyword search and cannot answer questions such as "How many people were killed in the Algeria earthquake?"

Page 74: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

SRL can help twitter search

SRL extracts who did what

oh yea and Chile [earthquake] A0 the earth off it's axis according to NASA and shorten the [day] A1 by a wee second :-(

[earthquake] A0 shorten the [day] A1

This goes beyond keyword search, e.g., answering "What shortened the day?"

Page 75: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

SRL can help twitter search

SRL abstracts away syntactic variation

Chile Earthquake Shortened Earth Day

The Chile earthquake shortened the length of an Earth day

[earthquake] A0 shorten [day] A1
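Because both phrasings above reduce to the same predicate-argument structure, a question such as "what shorten the day?" can be answered by matching roles instead of keywords. The following is a minimal sketch under assumptions: the tweet ids, the in-memory index `TWEETS`, and the function `find_agents` are hypothetical names for illustration, not part of any system described in this tutorial.

```python
# Hypothetical mini-index: tweet id -> word-level (predicate, argument, role) triples.
TWEETS = {
    "t1": [("shorten", "earthquake", "A0"), ("shorten", "day", "A1")],
    "t2": [("shorten", "earthquake", "A0"), ("shorten", "day", "A1")],
    "t3": [("shorten", "tailor", "A0"), ("shorten", "sleeve", "A1")],
}


def find_agents(predicate, patient, tweets):
    """Return {A0 filler: [tweet ids]} for tweets where `patient` is the A1 of `predicate`."""
    answers = {}
    for tweet_id, triples in tweets.items():
        # Keep only tweets whose structure contains (predicate, patient, A1).
        if (predicate, patient, "A1") not in triples:
            continue
        for pred, arg, role in triples:
            if pred == predicate and role == "A0":
                answers.setdefault(arg, []).append(tweet_id)
    return answers


print(find_agents("shorten", "day", TWEETS))
```

A keyword query for "shorten" would also return the third tweet; the role constraint on A1 filters it out.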

Page 76: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Outline

Introduction

SRL Task definition

Application to twitter search

General approaches to SRL

Resources

Typical systems

SRL on tweets

Challenges

Method

Page 77: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

FrameNet (Fillmore et al., 2004)

Computational frame lexicon + corpus of examples annotated with semantic roles (mostly BNC)

∼800 semantic frames

>9,000 lexical units

∼150,000 annotated sentences

Page 78: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

An example of frame

Page 79: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

PropBank (Palmer et al., 2005)

The primary resource for research in SRL

Annotation of all verbal predicates in Penn Treebank

[The luxury auto maker]A0 (Agent) [last year]AM-TMP (Temporal marker) [sold]P (Predicate) [1,214 cars]A1 (Object) [in the U.S.]AM-LOC (Locative marker)

[Figure: parse tree of the sentence: S → NP NP VP; VP → sold NP PP]

Page 80: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

An example: argument structure depends on verb and its meaning

sell.01: commerce: seller

A0=“seller” (agent); A1=“thing sold” (theme); A2=“buyer” (recipient); A3=“price paid”; A4=“benefactive”

[Al Brownstein]A0 sold [it]A1 [for $60 a bottle]A3

sell.02: give up

A0=“entity selling out”

[John]A0 sold out

sell.03: sell until none is/are left

A0=“seller”; A1=“thing sold”; ...

[The new Harry Potter]A1 sold out [within 20 minutes]AM-TMP

Page 81: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

NomBank (Meyers et al., 2004)

Annotation of the nominal predicates in Penn TreeBank

[IBM] A0's appointment of [John] A1

The appointment of [John] A1 by [IBM] A0

[John] A1 is the current [IBM] A0 appointee

Page 82: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Outline

Introduction

SRL Task definition

Application to twitter search

General approaches to SRL

Resources

Typical systems

SRL on tweets

Challenges

Method

Page 83: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Typical systems

Pipelined system

System based on sequential labeling

System using Markov Logic Networks

Collective SRL (jointly conduct SRL on multiple sentences)

Page 87: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Pipelined SRL

Argument candidates generation

Argument classification

Global inference

Find the best solution from all possible solutions

E.g., re-ranking of the N best solutions (Haghighi et al., 2005; Toutanova et al., 2008)

Page 88: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Typical systems

Pipelined system

System based on sequential labeling

System using Markov Logic Networks

Collective SRL (jointly conduct SRL on multiple sentences)

Page 89: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

System based on sequential labeling

(Màrquez et al., 2005)

Break into base chunks

Chunker: Yamcha (Kudo & Matsumoto, 2001)

Page 90: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

System based on sequential labeling

Break into base chunks

Labeling each chunk

B/I mark the beginning/continuation of an argument span; O marks non-arguments

Tool: CRF++ http://crfpp.sourceforge.net/
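The B/I/O encoding above can be sketched as follows. This is an illustration under assumptions: the chunk segmentation of the example sentence and the helper `bio_labels` are hypothetical, and the predicate chunk is simply left as O here. The resulting label sequence is what a sequence labeler such as a CRF would be trained to predict.

```python
def bio_labels(num_chunks, spans):
    """Encode argument spans as one B/I/O label per base chunk.

    spans: list of (start, end_exclusive, role) over chunk indices.
    """
    labels = ["O"] * num_chunks
    for start, end, role in spans:
        labels[start] = f"B-{role}"          # first chunk of the argument
        for i in range(start + 1, end):
            labels[i] = f"I-{role}"          # continuation chunks
    return labels


# Assumed base chunks for the running example sentence.
chunks = ["The luxury auto maker", "last year", "sold", "1,214 cars", "in", "the U.S."]
spans = [(0, 1, "A0"), (1, 2, "AM-TMP"), (3, 4, "A1"), (4, 6, "AM-LOC")]

for chunk, label in zip(chunks, bio_labels(len(chunks), spans)):
    print(f"{label:10s} {chunk}")
```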

Page 91: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Typical systems

Pipelined system

System based on sequential labeling

System using Markov Logic Networks

Collective SRL (jointly conduct SRL on multiple sentences)

Page 92: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

System using Markov Logic Networks

(Riedel and Meza-Ruiz, 2008)

Define formulae

Page 93: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

System using Markov Logic Networks

Define formulae

Learning formula weights

To allocate high probability to correctly identified predicate argument structures

E.g., for "I swim": {lemma(1, I), lemma(2, swim), isPredicate(2)} > {lemma(1, I), lemma(2, swim), isPredicate(1)}
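To make the comparison between the two "I swim" worlds concrete, the sketch below scores each world as the weighted count of true formula groundings, which is the unnormalized log-probability a Markov Logic Network assigns. Everything specific here is an assumption for illustration: the two formulas (of the form lemma(i, w) => isPredicate(i)) and their weights are invented, not taken from the cited system.

```python
import math

# Hypothetical weights for formulas of the form: lemma(i, w) => isPredicate(i).
WEIGHTS = {"swim": 1.5, "I": -2.0}


def n_true(world, word, positions):
    """Count positions i where the grounding lemma(i, word) => isPredicate(i) holds."""
    return sum(
        (("lemma", i, word) not in world) or (("isPredicate", i) in world)
        for i in positions
    )


def log_score(world, positions=(1, 2)):
    """Unnormalized log-probability: sum over formulas of weight * true groundings."""
    return sum(w * n_true(world, word, positions) for word, w in WEIGHTS.items())


world_a = {("lemma", 1, "I"), ("lemma", 2, "swim"), ("isPredicate", 2)}
world_b = {("lemma", 1, "I"), ("lemma", 2, "swim"), ("isPredicate", 1)}

# Normalizing over just these two candidate worlds gives a probability for each.
z = math.exp(log_score(world_a)) + math.exp(log_score(world_b))
p_a = math.exp(log_score(world_a)) / z
print(log_score(world_a), log_score(world_b), p_a > 0.5)
```

With these assumed weights the world that marks "swim" as the predicate scores higher, which is the ordering that weight learning is meant to produce.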

Page 94: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

System using Markov Logic Networks

Define formulae

Learning formula weights

Inference

Jointly determine predicate argument structures that best fit the formulae

Toolkit: thebeast http://code.google.com/p/thebeast/

Page 95: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Typical systems

Pipelined system

System based on sequential labeling

System using Markov Logic Networks

Collective SRL (jointly conduct SRL on multiple sentences)

Page 96: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Task definition of collective SRL

Input: a set of sentences from news articles

1. Hurricane Ida, the first Atlantic hurricane to target the U.S. this year, plodded yesterday toward the Gulf Coast…

2. Hurricane Ida trudged toward the Gulf Coast…

Output: predicate-argument-role structures

1. (plodded, Ida, A0), (plodded, toward, AM-DIR), (target, Ida, A0), (target, U.S., A1), (target, year, AM-TMP)

2. (trudged, Ida, A0), (trudged, toward, AM-DIR)

Role sets (following PropBank)

Page 97: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Collective SRL (Liu et al., 2010)

Motivated by the fact that SRL on one sentence can help SRL on other, differently phrased sentences with similar meaning

A suicide bomber blew himself up Sunday in market in Pakistan's northwest crowded with shoppers ahead of a Muslim holiday, killing 12 people, including a mayor who ….

Police in northwestern Pakistan say that a suicide bomber has killed at least 13 people and wounded dozens of others.

Page 98: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Implementation of collective SRL

[Figure: collective SRL pipeline. A news SRL model is trained on news training data for SRL. Clustered news is grouped on the sentence level into related sentences, the model labels each sentence, and collective inference by MLN then revises the labels.]

Page 99: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Implementation of collective SRL

Clustered news, e.g., news about Chavez ordering his army to prepare for war with Colombia

Page 100: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Implementation of collective SRL

A sentence group:

1) ...Hugo Chavez, the fiery leftist president of neighboring Venezuela, ordered his army to prepare for war in order to assure peace.

2) President Hugo Chavez ordered Venezuela's military to prepare for a possible armed conflict with Colombia...

3) Venezuelan President Hugo Chavez told his military and civil militias...

Page 101: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Implementation of collective SRL

Preliminary labeling:

1) (ordered, army, A1) …

2) (ordered, Chavez, A0), (ordered, Venezuela, A1) …

3) (told, Chavez, A0), (told, military, A1) …

Page 102: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Implementation of collective SRL

Collective inference by MLN:

Introduce two formulas (the second is for collective inference)

(1) role(s, p, a, +r) => final_role(s, p, a, +r)

(2) s1 ≠ s2 ∧ lemma(s1, p1, p_lemma) ∧ lemma(s2, p2, p_lemma) ∧ lemma(s1, a1, a_lemma) ∧ lemma(s2, a2, a_lemma) ∧ role(s2, p2, a2, +r) => final_role(s1, p1, a1, +r)
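The effect of formula (2) can be illustrated with a deterministic simplification. This is not the MLN itself: in the MLN both formulas are soft and weighted, whereas the sketch below applies them as hard rules and skips a propagated role if the predicate already has that role in the target sentence. The data layout, the lowercase lemmas, and the name `collective_revise` are assumptions made for the example.

```python
def collective_revise(group):
    """group: {sent_id: {"lemmas": set of lemmas, "roles": {(pred, arg): role}}}.

    Formula (1): every preliminary role is kept.
    Formula (2), hardened: a role found in another sentence is copied when the
    same predicate and argument lemmas occur here and the role slot is free.
    """
    revised = {sid: dict(sent["roles"]) for sid, sent in group.items()}
    for s1, sent1 in group.items():
        for s2, sent2 in group.items():
            if s1 == s2:
                continue
            for (pred, arg), role in sent2["roles"].items():
                present = pred in sent1["lemmas"] and arg in sent1["lemmas"]
                taken = any(p == pred and r == role for (p, _), r in revised[s1].items())
                if present and (pred, arg) not in revised[s1] and not taken:
                    revised[s1][(pred, arg)] = role
    return revised


group = {
    1: {"lemmas": {"chavez", "venezuela", "order", "army", "prepare", "war"},
        "roles": {("order", "army"): "A1"}},
    2: {"lemmas": {"chavez", "order", "venezuela", "military", "prepare", "conflict"},
        "roles": {("order", "chavez"): "A0", ("order", "venezuela"): "A1"}},
    3: {"lemmas": {"chavez", "tell", "military", "militia"},
        "roles": {("tell", "chavez"): "A0", ("tell", "military"): "A1"}},
}
print(collective_revise(group)[1])
```

On this input only the first sentence changes: it gains (order, chavez, A0) from the second sentence.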

Page 103: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Implementation of collective SRL

Revised labeling:

1) (ordered, army, A1) … (ordered, Chavez, A0)

2) (ordered, Chavez, A0), (ordered, Venezuela, A1) …

3) (told, Chavez, A0), (told, military, A1) …

Page 104: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Experimental results of collective SRL

Data: 1000 sentences from news clusters, grouped into 200 clusters

Results (10-fold cross validation):

Systems      Precision   Recall    F-Score
Baseline     69.87%      59.26%    64.13%
Our method   67.01%      68.33%    67.66%
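The F-Score column is the harmonic mean of precision and recall. A short check, with `f1` as a helper name chosen here, reproduces both rows of the table from their precision and recall values:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both in the same unit, here percent)."""
    return 2 * precision * recall / (precision + recall)


for name, p, r in [("Baseline", 69.87, 59.26), ("Our method", 67.01, 68.33)]:
    print(f"{name}: F1 = {f1(p, r):.2f}%")
```

The collective method trades a little precision for a much larger gain in recall, which is why its F-score is higher.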

Page 105: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Feature engineering in SRL

1. Predicate lemma: sell

2. Voice: active

3. Subcategorization: VBD_NP_PP

Example sentence with POS tags: The/DT luxury/JJ auto/NN maker/NN last/JJ year/NN sold/VBD 1,214/CD cars/NNS in/IN the/DT U.S./NNP

[Figure: parse tree of the example sentence: S → NP NP VP; VP → VBD NP PP]

Page 106: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Feature engineering in SRL

4. Direction: Right

5. Path: VBD ↑VP ↓PP

6. Distance: 2

[Figure: parse tree of the example sentence "The luxury auto maker last year sold 1,214 cars in the U.S."]
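The path feature is the chain of constituent labels from the predicate up to the lowest common ancestor and down to the argument. The sketch below is an illustration under assumptions: the `Node` class is a hypothetical minimal tree type, and the tree shape (S → NP NP VP; VP → VBD NP PP) is read off the slide's figure rather than produced by a parser.

```python
class Node:
    """Minimal constituent-tree node with a parent pointer."""

    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Return the node itself followed by its ancestors up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def path_feature(predicate, argument):
    up, down = ancestors(predicate), ancestors(argument)
    lca = next(n for n in up if n in down)           # lowest common ancestor
    up_part = up[: up.index(lca) + 1]                # predicate ... lca
    down_part = reversed(down[: down.index(lca)])    # below lca ... argument
    steps = [up_part[0].label]
    steps += ["↑" + n.label for n in up_part[1:]]
    steps += ["↓" + n.label for n in down_part]
    return " ".join(steps)


vbd, obj, pp = Node("VBD"), Node("NP"), Node("PP")
vp = Node("VP", [vbd, obj, pp])
root = Node("S", [Node("NP"), Node("NP"), vp])

print(path_feature(vbd, pp))
```

For the predicate "sold" and the argument "in the U.S." this yields the path shown on the slide.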

Page 107: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Feature engineering in SRL

7. Phrase type: PP

8. Governor: VP

9. Head word: in

10. Content word: U.S.

[Figure: parse tree of the example sentence "The luxury auto maker last year sold 1,214 cars in the U.S."]

Page 108: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Feature engineering in SRL

11. Combined features

• Predicate lemma & Phrase type: sell_PP

• Predicate lemma & Head word: sell_in

• Voice & Direction: active_right

12. Others

• Co-occurrence of the predicate and argument head: # of sold~in

• …

[Figure: parse tree of the example sentence "The luxury auto maker last year sold 1,214 cars in the U.S."]
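Combined features are conjunctions of the basic features listed on the previous slides. A minimal sketch follows, in which the dictionary keys and the function name `combined_features` are hypothetical and the values are those given for the argument "in the U.S." of the predicate "sold":

```python
def combined_features(basic):
    """Build conjunction features by joining pairs of basic feature values."""
    return {
        "lemma&phrase": f'{basic["predicate_lemma"]}_{basic["phrase_type"]}',
        "lemma&head": f'{basic["predicate_lemma"]}_{basic["head_word"]}',
        "voice&direction": f'{basic["voice"]}_{basic["direction"]}',
    }


basic = {
    "predicate_lemma": "sell",
    "voice": "active",
    "direction": "right",
    "phrase_type": "PP",
    "head_word": "in",
}
print(combined_features(basic))
```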

Page 109: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Outline

Introduction

SRL Task definition

Application to twitter search

General approaches to SRL

Resources

Typical systems

SRL on tweets

Challenges

Method

Page 110: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Task definition of tweet level SRL

Input: a tweet

oh yea and Chile earthquake the earth off it's axis according to NASA and shorten the day by a wee second :-(

Output: predicate-argument structures

(shorten, earthquake, A0), (shorten, day, A1)

Page 111: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Research challenges

An SRL system trained on news does not work well on tweets: 75.5% → 43.3%

Reason: tweets differ greatly from news in writing style

Formal vs. informal; human-edited vs. freely written

Question: how to leverage existing SRL resources?

Page 112: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Research challenges

Building an SRL system for tweets requires a large amount of training data

Manual labeling is prohibitively expensive

Question: can we train a system without much human labeling?

Page 113: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Research challenges

The volume of tweets is huge

Infeasible/inefficient to conduct SRL for every tweet

Tweets are noisy

Unnecessary/unwise to conduct SRL for every tweet

Current solution: focus on news tweets

News tweets: tweets that report news

Page 114: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Outline

Introduction

SRL Task definition

Application to twitter search

General approaches to SRL

Resources

Typical systems

SRL on tweets

Challenges

Method

Page 115: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Key observations (1)

There is a strong content connection between news and tweets

Tweets are directly excerpted from news articles, or links in tweets point to news articles

Official news follows hot tweets

E.g., for the Chile earthquake, 261 news articles and 722 news tweets describing the event were published on the same day (March 2nd, 2010)

Page 116: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Key observations (2)

Often news and tweets that describe similar content have similar predicate argument structures

Chile Earthquake Shortened Earth Day

Chile Earthquake Shortened Day

oh yea and Chile earthquake the earth off it's axis according to NASA and shorten the day by a wee second :-(

Page 117: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Self-learning of SRL for tweets (Liu et al., 2010)

[Figure: training pipeline. A news SRL model is trained on news training data for SRL (e.g., PropBank). Raw tweets and raw news are clustered into groups of related items. The news in each group is labeled by the news model, and the labels are transferred to the related tweets by word alignment. The resulting labeled tweets are used to train the tweet SRL model with CRF.]

Page 118: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Self-learning of SRL for tweets

Example cluster of related tweets and news

Tweet:

1. oh yea and Chile earthquake the earth off it's axis according to NASA and shorten the day by a wee second :-(

News:

1. Chile Earthquake Shortened Earth Day

2. Chile Earthquake Shortened Day

Page 119: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Self-learning of SRL for tweets

Tweet:

oh yea and Chile earthquake the earth off it's axis according to NASA and shorten the day by a wee second :-(

News:

Chile Earthquake Shortened Earth Day

Chile Earthquake Shortened Day

Page 120: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Self-learning of SRL for tweets

Labeling by word alignment: the roles of the labeled news sentence are transferred to the aligned words of the tweet

News: Chile [Earthquake] A0 [Shortened] predicate Earth [Day] A1

Tweet: oh yea and Chile [earthquake] A0 the earth off it's axis according to NASA and [shorten] predicate the [day] A1 by a wee second :-(
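Transferring labels from a labeled news sentence to a related tweet can be sketched as below. This is a simplification under stated assumptions: lemma matching stands in for the word alignment step, the lemma table `LEMMAS` is a hypothetical one-entry lookup, and `project_labels` is a name chosen for this example.

```python
LEMMAS = {"shortened": "shorten"}  # hypothetical lemma table for the example


def lemma(token):
    token = token.lower()
    return LEMMAS.get(token, token)


def project_labels(news_tokens, news_labels, tweet_tokens):
    """Copy word-level roles from news to tweet words with the same lemma.

    news_labels: {token index: role}, with "P" marking the predicate.
    Returns (predicate, argument, role) triples for the tweet.
    """
    role_of = {lemma(news_tokens[i]): role for i, role in news_labels.items()}
    predicate, arguments = None, []
    for token in tweet_tokens:
        role = role_of.get(lemma(token))
        if role == "P":
            predicate = lemma(token)
        elif role is not None:
            arguments.append((lemma(token), role))
    if predicate is None:
        return []
    return [(predicate, arg, role) for arg, role in arguments]


news = "Chile Earthquake Shortened Earth Day".split()
tweet = ("oh yea and Chile earthquake the earth off it's axis according to "
         "NASA and shorten the day by a wee second").split()
print(project_labels(news, {1: "A0", 2: "P", 4: "A1"}, tweet))
```

The projected triples match the tweet-level output given in the task definition: (shorten, earthquake, A0) and (shorten, day, A1).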

Page 121: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Self-learning of SRL for tweets

Conflict resolution

Conflicts are cases that violate either of two structural constraints (Meza-Ruiz and Riedel, 2009)

1. One (predicate, argument) pair has only one role label in one sentence

E.g., (shorten, earthquake, A0) vs. (shorten, earthquake, A1)

2. One predicate can have each of the proper arguments (A0~A5) at most once in one sentence

E.g., (shorten, earthquake, A0) vs. (shorten, axis, A0)

Page 122: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Self-learning of SRL for tweets

Conflict resolution

Strategy: for any conflicting pair, keep the more frequent one

E.g., (shorten, earthquake, A0) with count 6 wins over (shorten, earthquake, A1) with count 4

Resolving order: first resolve the conflict whose resolution resolves the most conflicts

3 conflicting pairs:

1. (shorten, earthquake, A0) 6 vs. (shorten, earthquake, A1) 4

2. (shorten, earthquake, A1) 4 vs. (shorten, day, A1) 3

3. (shorten, earthquake, A0) 6 vs. (shorten, axis, A0) 1

Resolving order:

Conflict 1. Remaining: (shorten, earthquake, A0), (shorten, axis, A0), (shorten, day, A1)

Conflict 3. Remaining: (shorten, earthquake, A0), (shorten, day, A1)
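The strategy and resolving order can be sketched as a greedy loop. This is one possible reading of the slide, not the authors' implementation: at each step the code picks the conflict whose less frequent label takes part in the most conflicts and removes that label. The names `in_conflict` and `resolve`, and the tie-breaking by iteration order, are assumptions.

```python
from itertools import combinations

CORE_ROLES = {"A0", "A1", "A2", "A3", "A4", "A5"}


def in_conflict(x, y):
    """Check the two structural constraints for labels (predicate, argument, role)."""
    (p1, a1, r1), (p2, a2, r2) = x, y
    if p1 != p2:
        return False
    if a1 == a2 and r1 != r2:                          # constraint 1
        return True
    return r1 == r2 and r1 in CORE_ROLES and a1 != a2  # constraint 2


def resolve(counts):
    """counts: {(predicate, argument, role): frequency}. Returns the surviving labels."""
    labels = dict(counts)
    while True:
        conflicts = [pair for pair in combinations(labels, 2) if in_conflict(*pair)]
        if not conflicts:
            return set(labels)

        def loser(pair):
            return min(pair, key=lambda label: labels[label])

        def resolved_by(label):
            return sum(label in pair for pair in conflicts)

        # Resolve first the conflict that removes the most conflicts at once.
        best = max(conflicts, key=lambda pair: resolved_by(loser(pair)))
        del labels[loser(best)]


counts = {
    ("shorten", "earthquake", "A0"): 6,
    ("shorten", "earthquake", "A1"): 4,
    ("shorten", "day", "A1"): 3,
    ("shorten", "axis", "A0"): 1,
}
print(sorted(resolve(counts)))
```

On the slide's example the loop first drops (shorten, earthquake, A1), which settles conflicts 1 and 2 together, then drops (shorten, axis, A0). Resolving conflict 2 first would instead have discarded (shorten, day, A1).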

Page 123: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Experiment setting

Evaluation metric: precision, recall and F1

Baseline: SRL system trained on news (Meza-Ruiz and Riedel, 2009)

Data preparation

Training dataset: 10,000 mechanically labeled tweets

Testing dataset: 1,110 human-labeled tweets

Page 124: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Experimental results (1)

Basic results

SRL-TS: our system; SRL-BS: baseline

System   Precision   Recall   F1
SRL-BS   36.0%       54.5%    43.3%
SRL-TS   78.0%       57.1%    66.0%

Page 125: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Experimental results (2)

Influence of training data size

Curve1: no test data is used for training

Curve2: half of the test data is used as training data

Page 126: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Some directions to explore

• Enlarge training and test data size

• Explore tweet-specific features

• Combine with an SRL system trained on news

• …

Page 127: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

References

Màrquez, Lluís. 2009. Semantic Role Labeling: Past, Present and Future. Tutorial of ACL-IJCNLP 2009.

Meza-Ruiz, Ivan and Sebastian Riedel. 2009. Jointly Identifying Predicates, Arguments and Senses using Markov Logic. Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the ACL, pages 155-163.

Xiaohua Liu, Kuan Li, Bo Han, Ming Zhou, Long Jiang, Daniel Tse and Zhongyang Xiong. 2010. Collective Semantic Role Labeling on Open News Corpus by Leveraging Redundancy. COLING 2010.

Xiaohua Liu, Kuan Li, Bo Han, Ming Zhou, Long Jiang, Zhongyang Xiong and Changning Huang. 2010. Semantic Role Labeling for News Tweets. COLING 2010.

Page 128: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Twitter Sentiment Analysis

Long JIANG

Page 129: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Outline

Sentiment analysis

Introduction

Definition, application, components

Approaches for SA subtasks

Holder detection

Target detection

Polarity classification

Twitter SA

Goals and challenges

Existing systems

Page 131: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Sentiment Analysis (SA)

Also known as opinion mining: to understand the attitude of a speaker or a writer with respect to some topic

The attitude may be their judgment or evaluation, their affective state or the intended emotional communication

Most popular classification of sentiment: positive or negative

For example

The pictures are very clear.

In his recent State of the Union address, US President Bush quite unexpectedly labeled Iran, Iraq, and the DPRK as an “axis of evil”.

Page 132: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Applications of SA

Business intelligence system

Purchase planning

Public opinion management

Web advertising

Page 133: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Sentiment Components

Holder

who expresses the sentiment

Target

what the sentiment is directed at

Polarity

the nature of the sentiment (e.g., positive/negative)

In his recent State of the Union address, US President Bush quite unexpectedly labeled Iran, Iraq, and the DPRK as an “axis of evil”.

Page 134: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Outline

Sentiment analysis

Introduction

Definition, application, components

Approaches for SA subtasks

Holder detection

Target detection

Polarity classification

Twitter SA

Goals and challenges

Existing systems

Page 136: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Holder Detection

Identifying Sources of Opinions with Conditional Random Fields and Extraction Patterns (Choi et al., HLT/EMNLP-05)

International officers believe that the EU will prevail.

International officers said US officials want the EU to prevail.

View source identification as an information extraction task and tackle the problem using sequence tagging and pattern matching techniques simultaneously

Linear-chain CRF model to identify opinion sources

Patterns incorporated as features

Page 137: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

CRF for Holder Detection

Given a sentence X, seek the label sequence Y that maximizes the conditional probability P(Y|X)

Yi belongs to {'S', 'T', '-'}

λk and λ'k are parameters; fk and f'k are feature functions

Zx is the normalization factor

Example labeling:

International  officers  believe  that  the  EU  will  prevail
S              T         -        -     -    -   -     -
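The formula that this slide refers to did not survive in the transcript. As a hedged reconstruction, the standard linear-chain CRF objective, written with the symbols the slide names (parameters λk and λ'k, feature functions fk and f'k, normalization factor Zx), would read as follows; the exact indexing is an assumption.

```latex
P(Y \mid X) = \frac{1}{Z_X}
  \exp\Big( \sum_{i,k} \lambda_k \, f_k(y_{i-1}, y_i, X)
          + \sum_{i,k} \lambda'_k \, f'_k(y_i, X) \Big)
```

Here the first sum scores transitions between adjacent labels and the second scores each label against the observed sentence.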

Page 138: Tutorial on Semantic Analysis and Search analysis an… · Tutorial outline Introduction (Ming ZHOU) Understand Twitter The task of semantic analysis Semantic analysis of tweets (Xiaohua

Basic Features

Capitalization features: all-capital, initial-capital

Part-of-speech features ([-2,+2]): noun, verb, adverb, wh-word, determiner, punctuation, etc.

Opinion lexicon features ([-1,+1]): whether or not the word is in the opinion lexicon

Dependency tree features:

the grammatical role of xi's chunk

the grammatical role of xi-1's chunk

whether the parent chunk includes an opinion word

whether xi's chunk is in an argument position with respect to the parent chunk

whether xi represents a constituent boundary

Semantic class features: the semantic class of each word: authority, government, human, media, organization or company, proper name, and other


Extraction Pattern Learning

Look at the context surrounding each answer and propose a lexico-syntactic pattern

[They]_h complained about the deficiencies of the benefits given to them.

<subj> complained

Compute the probability that the pattern will extract an opinion source


Extraction Pattern Features

Four IE pattern-based features for each token x_i:

SourcePatt-Freq, SourcePatt-Prob, SourceExtr-Freq, SourceExtr-Prob

where

SourcePatt indicates whether a word activates any source extraction pattern. E.g., "complained" activates the pattern "<subj> complained"

SourceExtr indicates whether a word is extracted by any source pattern. E.g., "They" would be extracted by the pattern "<subj> complained"


Experimental Results

MPQA data

In total, 535 documents in which opinion sources are annotated by humans

135 documents are used as a development set for feature engineering; the remaining 400 are used for evaluation with 10-fold cross validation

3 measures: overlap match (OL), head match (HM), and exact match (EM)


Outline

Sentiment analysis

Introduction

Definition, application, components

Approaches for SA subtasks

Holder detection

Target detection

Polarity classification

Twitter SA

Goals and challenges

Existing systems


Target Detection

Mining Opinion Features in Customer Reviews

(Minqing Hu and Bing Liu, AAAI 2004)

Explicit feature

The pictures are very clear.

Implicit feature

While light, it will not easily fit in pockets. (size)

Task definition

Given a product name and all the reviews of the product, find the features of the product that appear explicitly as nouns or noun phrases in the reviews


Approach Overview


Frequent Features Detection

Association rule mining

Find frequent features with three words or fewer

A feature is frequent if it appears in more than 1% of the review sentences (minimum support)

Feature Pruning

Compactness pruning: keep a feature phrase only if it is compact in at least 2 sentences

Redundancy pruning: remove a feature whose p-support (pure support) is lower than the minimum p-support (3)
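The minimum-support test can be sketched as follows. This is a simplified illustration that counts individual candidate phrases per sentence instead of running full association rule mining; candidate extraction is assumed to be done already, and the name `frequent_features` and the toy data are hypothetical.

```python
from collections import Counter


def frequent_features(sentence_candidates, min_support=0.01):
    """Return candidates that occur in more than min_support of the sentences.

    sentence_candidates holds one collection of candidate noun phrases
    per review sentence.
    """
    counts = Counter()
    for candidates in sentence_candidates:
        counts.update(set(candidates))  # count each candidate once per sentence
    total = len(sentence_candidates)
    return {c for c, n in counts.items() if n / total > min_support}


sentences = [{"picture"}, {"picture", "battery"}, {"picture"}, {"zoom"}]
print(frequent_features(sentences, min_support=0.5))
```

With a 50% threshold on this toy data only "picture" survives; the slide's setting is 1% of all review sentences.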


Infrequent Feature Detection

People use the same adjective words to describe different subjects

“Red eye is very easy to correct.”

“The camera comes with an excellent easy to install software”

“The pictures are absolutely amazing”

“The software that comes with it is amazing”


Infrequent Feature Detection

Opinion word identification

For each sentence in the review database, if it contains any frequent feature, extract the nearby adjective as an opinion word

Infrequent feature detection

For each sentence in the review database, if it contains no frequent feature but one or more opinion words, find the nearest noun/noun phrase of the opinion word as an infrequent feature
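The infrequent-feature step can be sketched on pre-tagged sentences. The sketch assumes Penn Treebank style tags (noun tags start with "NN"), measures "nearest" as token distance, and works on single nouns only; the function names and the toy inputs are illustrative.

```python
def nearest_noun(tagged, opinion_index):
    """Return the noun closest (in token distance) to the opinion word, if any."""
    nouns = [(abs(i - opinion_index), i, word)
             for i, (word, tag) in enumerate(tagged) if tag.startswith("NN")]
    return min(nouns)[2] if nouns else None


def infrequent_features(tagged_sentences, frequent, opinion_words):
    """Collect nouns near opinion words in sentences without a frequent feature."""
    found = set()
    for tagged in tagged_sentences:
        words = [word.lower() for word, _ in tagged]
        if any(feature in words for feature in frequent):
            continue  # sentence already covered by a frequent feature
        for i, word in enumerate(words):
            if word in opinion_words:
                noun = nearest_noun(tagged, i)
                if noun:
                    found.add(noun.lower())
    return found


sentence = [("The", "DT"), ("software", "NN"), ("that", "WDT"), ("comes", "VBZ"),
            ("with", "IN"), ("it", "PRP"), ("is", "VBZ"), ("amazing", "JJ")]
print(infrequent_features([sentence], frequent={"pictures"}, opinion_words={"amazing"}))
```

Here "amazing" was learned as an opinion word from sentences about pictures, and it leads to "software" as an infrequent feature.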


Experimental Results

Data: customer reviews of five electronics products from Amazon.com and C|net.com


Outline

Sentiment analysis

Introduction

Definition, application, components

Approaches for SA subtasks

Holder detection

Target detection

Polarity classification

Twitter SA

Goals and challenges

Existing systems


Lexicon Based Polarity Classification

Mining and Summarizing Customer Reviews

(Hu and Liu, KDD-2004)

Basic idea

Use the dominant orientation of the opinion words in the sentence to determine the orientation of the sentence.

That is, if positive/negative opinion prevails, the opinion sentence is regarded as a positive/negative one.
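The dominant-orientation idea can be sketched with a toy lexicon. The word lists below are made up for illustration, and the sketch ignores negation and tie-breaking, which a real system would need to handle.

```python
# Toy opinion lexicon (hypothetical entries, for illustration only).
POSITIVE = {"clear", "amazing", "excellent", "easy"}
NEGATIVE = {"blurry", "poor", "heavy"}


def sentence_orientation(sentence):
    """Classify a sentence by whichever polarity has more opinion words."""
    words = [w.strip(".,!?\"").lower() for w in sentence.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentence_orientation("The pictures are absolutely amazing"))
print(sentence_orientation("The lens is heavy and the flash is poor."))
```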


Lexicon Building (Hu and Liu, KDD-2004)

Utilize the adjective synonym set and antonym set in WordNet to predict the semantic orientations of adjectives

Adjectives share the same orientation as their synonyms and the opposite orientation to their antonyms.

Start with several seeds, iteratively expand to cover most opinion words
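The seed expansion can be sketched as a breadth-first traversal. The small synonym and antonym dictionaries below are hypothetical stand-ins for WordNet lookups, and `expand_lexicon` is an illustrative name rather than the authors' implementation.

```python
from collections import deque


def expand_lexicon(seeds, synonyms, antonyms):
    """Propagate orientations (+1 / -1) from seed adjectives.

    Synonyms inherit the orientation of a known word, antonyms get the
    opposite one. Words that are already labeled are not relabeled.
    """
    orientation = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        neighbors = [(s, 1) for s in synonyms.get(word, ())]
        neighbors += [(a, -1) for a in antonyms.get(word, ())]
        for neighbor, sign in neighbors:
            if neighbor not in orientation:
                orientation[neighbor] = sign * orientation[word]
                queue.append(neighbor)
    return orientation


synonyms = {"good": ["great"], "great": ["superb"], "bad": ["awful"]}
antonyms = {"good": ["bad"]}
print(expand_lexicon({"good": 1}, synonyms, antonyms))
```

Starting from the single seed "good", the toy graph labels "great" and "superb" as positive and "bad" and "awful" as negative.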


Hatzivassiloglou and McKeown (1997)

Predicting the Semantic Orientation of Adjectives

(Hatzivassiloglou and McKeown, ACL-97)

Assumption: adjectives connected by "and"/"but" tend to have the same/opposite polarities

The tax proposal was

1. simple and well-received

2. simplistic but well-received

3. *simplistic and well-received

by the public.


ML-based Approaches for Polarity Classification

Thumbs up? Sentiment Classification using Machine Learning Techniques

(Pang et al., 2002)

Basic idea

Treat sentiment classification simply as a special case of topic-based categorization

With the two "topics" being positive sentiment and negative sentiment

Use three standard algorithms: Naive Bayes classification, maximum entropy classification, and support vector machines


Approach Details

Document representation

Each document d is represented by a feature vector $\vec{d} := (n_1(d), n_2(d), \dots, n_m(d))$

$n_i(d)$ can indicate either the presence or the term frequency of the i-th feature in d

Classification algorithms

Naive Bayes, Maximum Entropy, SVM
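The two document representations can be sketched as follows, assuming unigram features and whitespace tokenization; `featurize` and the tiny vocabulary are illustrative only.

```python
from collections import Counter


def featurize(document, vocabulary, mode="presence"):
    """Map a document to (n_1(d), ..., n_m(d)) over a fixed vocabulary."""
    counts = Counter(document.lower().split())
    if mode == "presence":
        return [1 if counts[term] > 0 else 0 for term in vocabulary]
    return [counts[term] for term in vocabulary]  # term frequency


vocabulary = ["good", "bad", "movie"]
print(featurize("good good movie", vocabulary, mode="presence"))
print(featurize("good good movie", vocabulary, mode="frequency"))
```

Either vector can then be passed to any of the three classifiers listed above.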


Data

Movie reviews

From Internet Movie Database (IMDb)

http://www.cs.cornell.edu/people/pabo/movie-review-data/

http://reviews.imdb.com/Reviews/

700 positive / 700 negative

Experiment setting for ML classifiers

3-fold cross validation

Treating punctuation as separate lexical items

No stemming or stoplists were used


Experimental Results

• Baseline: classify using a short list of indicator words chosen by humans

• ML-based methods


Other Related Approaches

Topic sentiment mixture

Mei et al., 2007

Semi-supervised approach

Li et al., 2010

Domain Adaptation

Blitzer et al., 2007


Summary

1. Sentiment analysis refers to a set of subtasks

Holder, target, polarity

2. Sentiment analysis is a challenging task and more difficult than traditional topic-based classification

Understanding of the semantics is often needed

How could anyone sit through this movie?

The same word/phrase may have different polarities in different domains

An unpredictable movie (positive)

An unpredictable politician (negative)


Outline

Sentiment analysis

Introduction

Definition, application, components

Approaches for SA subtasks

Holder detection

Target detection

Polarity classification

Twitter SA

Goals and challenges

Existing systems


Twitter SA

Aiming to find positive and negative tweets about a given topic

Focusing on polarity classification

Target-dependent sentiment classification

Given a target, classify a tweet as positive, negative or neutral (no sentiment) towards the target

Input: a tweet "Windows 7 is much better than Vista!" and a target "Windows 7"

Output: positive


Advantages of Twitter SA

Large amount

Wide coverage of domain

Fresh

From grass roots


Special Challenges

Short and ambiguous

Informal and unedited texts

"another part of me by Micheal Jackson is soo nicee! Loooveeeeee itttttttttt!"


Outline

Sentiment analysis

Introduction

Definition, application, components

Approaches for SA subtasks

Holder detection

Target detection

Polarity classification

Twitter SA

Goals and challenges

Existing systems


Existing Twitter SA Systems

Lexicon-based method

Twitrratr

Rule-based

Tweetfeel

Machine learning based

Twitter sentiment

Unknown

Twendz

Tweetsentiments


Twitrratr

Example: Microsoft


Twitrratr

http://twitrratr.com

Feature

3 classes (positive, negative, neutral)

Highlight the sentiment expressions

Method

Lexicon-based

Words, phrases, emoticons (:), :D, :-(, …)

Manually-made lexicon

Still contains errors (e.g., "fail" is in the positive list)

Simple string (not word) matching (e.g., "unhelpful" is matched by the entry "helpful")
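The difference between string matching and word matching can be shown in a few lines. This is only a sketch of the failure mode, not Twitrratr's actual code; the lexicon entry and the tweet are made up.

```python
import re


def string_match(lexicon, tweet):
    """Substring lookup: an entry matches anywhere inside the text."""
    text = tweet.lower()
    return [entry for entry in lexicon if entry in text]


def word_match(lexicon, tweet):
    """Token lookup: an entry matches only as a whole word."""
    tokens = set(re.findall(r"[a-z']+", tweet.lower()))
    return [entry for entry in lexicon if entry in tokens]


positive_lexicon = ["helpful"]
tweet = "The support page was unhelpful"
print(string_match(positive_lexicon, tweet))  # substring hit inside "unhelpful"
print(word_match(positive_lexicon, tweet))    # no whole-word hit
```

With substring lookup the negative word "unhelpful" triggers the positive entry "helpful"; tokenizing first avoids this.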


Tweetfeel

Example: Microsoft


Tweetfeel

http://www.tweetfeel.com

Feature

2 classes: positive and negative

Method

Probably rule based

Positive patterns

pos_verb [Query], [Query] pos_verb, [Query] is pos_adj

Negative patterns

neg_verb [Query], [Query] neg_verb, [Query] is neg_adj

High precision, low recall
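The three pattern shapes can be sketched with regular expressions. The verb and adjective lists are hypothetical, and since the slide itself only guesses that Tweetfeel is rule based, this illustrates the pattern idea rather than the service's implementation.

```python
import re

# Hypothetical word lists, for illustration only.
POS_VERBS, NEG_VERBS = ["love", "like"], ["hate", "dislike"]
POS_ADJS, NEG_ADJS = ["great", "awesome"], ["terrible", "awful"]


def matches(tweet, query, verbs, adjs):
    """Check the patterns: verb [Query], [Query] verb, [Query] is adj."""
    q = re.escape(query.lower())
    v, a = "|".join(verbs), "|".join(adjs)
    patterns = [
        rf"\b({v})\s+{q}\b",
        rf"\b{q}\s+({v})\b",
        rf"\b{q}\s+is\s+({a})\b",
    ]
    return any(re.search(p, tweet.lower()) for p in patterns)


def classify(tweet, query):
    if matches(tweet, query, POS_VERBS, POS_ADJS):
        return "positive"
    if matches(tweet, query, NEG_VERBS, NEG_ADJS):
        return "negative"
    return None  # no pattern fired: the tweet is left unclassified


print(classify("I love Microsoft", "Microsoft"))
print(classify("Microsoft is terrible", "Microsoft"))
print(classify("Just installed Microsoft Office", "Microsoft"))
```

The third tweet matches no pattern, which illustrates why such rules tend to give high precision but low recall.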


Twitter Sentiment

Example: Microsoft

http://twittersentiment.appspot.com/search?query=Microsoft

[Screenshot callouts: sentiment by percent, sentiment timeline, detailed tweets]


Twitter Sentiment

http://twittersentiment.appspot.com

Created by some graduate students at Stanford University

Features

2 classes: positive and negative

Timeline: how the number of pos/neg sentiments changes over time

Allows users to correct wrongly classified tweets

Method

Machine learning-based (maximum entropy classifier)

Unsupervised training data construction by making use of emoticons (":)" for positive, ":(" for negative)
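Emoticon-based training data construction can be sketched as below. The emoticon lists are assumptions, and removing the emoticon from the text (so that a classifier has to learn from the remaining words) is a design choice of this sketch.

```python
# Assumed emoticon inventories; a real system would use longer lists.
POSITIVE_EMOTICONS = (":)", ":-)", ":D")
NEGATIVE_EMOTICONS = (":(", ":-(")


def label_by_emoticon(tweet):
    """Return (text_without_emoticons, label), or None if no clear signal."""
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:
        return None  # no emoticon at all, or mixed emoticons: skip the tweet
    text = tweet
    for emoticon in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
        text = text.replace(emoticon, "")
    return " ".join(text.split()), ("positive" if has_pos else "negative")


print(label_by_emoticon("Loving my new phone :)"))
print(label_by_emoticon("Stuck in traffic :("))
```

The resulting (text, label) pairs can serve as noisy training examples without any manual annotation.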


Summary

Twitter SA has its own characteristics

Short, informal text

Pictograms (<3) and emoticons (:), :(, :D, …)

However, not intensively studied yet

Traditional SA methods are employed

No paper published in top conferences yet

Lack of a large amount of publicly available annotated data for system evaluation and comparison

Potential directions

Tweet normalization

Context aware sentiment analysis


References

J. Blitzer, R. McDonald, and F. Pereira. 2006. Domain adaptation with structural correspondence learning. In EMNLP.

Y. Choi, C. Cardie, E. Riloff, and S. Patwardhan. 2005. Identifying Sources of Opinions with Conditional Random Fields and Extraction Patterns. In Proceedings of HLT/EMNLP-05.

Alec Go, Richa Bhayani, and Lei Huang. Twitter Sentiment Classification using Distant Supervision. http://www.stanford.edu/~alecmgo/papers/TwitterDistantSupervision09.pdf

Vasileios Hatzivassiloglou and Kathleen McKeown. 1997. Predicting the semantic orientation of adjectives. In Proc. of the 35th ACL/8th EACL, pages 174–181.

M. Hu and B. Liu. 2004. Mining Opinion Features in Customer Reviews. In AAAI'04.

M. Hu and B. Liu. 2004a. Mining and summarizing customer reviews. In Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining, pages 168–177. ACM Press, New York, NY, USA.

S. Li, C.R. Huang, G. Zhou, and S.Y.M. Lee. 2010. Employing Personal/Impersonal Views in Supervised and Semi-supervised Sentiment Classification. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 414–423, Uppsala, Sweden, 11-16 July 2010.

Q. Mei, X. Ling, M. Wondra, H. Su, and C.X. Zhai. 2007. Topic sentiment mixture: modeling facets and opinions in weblogs. In Proceedings of the 16th International Conference on World Wide Web, pages 171–180.

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. In Proc. Conf. on EMNLP 2002.


Twitter Search

Yajuan DUAN


Outline

Twitter search services

Learning to rank

An empirical study on learning to rank of tweets


Outline

Twitter search services

Motivation

Problem definition

Approaches

Learning to rank

An empirical study on learning to rank of tweets


Motivation

Real time

Restricted within 140 characters

Additional information


Twitter search architecture

[Figure: Twitter search architecture. Components shown: Spider, Tweets Parser, Indexer, Index files, Query Parser, Search, Ranking, User Interface, End user. Edge labels: Request, Tweets, Query, Results. Regions: background process, foreground process.]


Twitter rank

Given:

A query

A corpus of tweets

Output:

A ranked set of tweets that are relevant to the query.

Objective:

Seek the best rank function

[Figure: a query (e.g. iphone) and a corpus of tweets are the inputs of the rank function]


Challenge

Short

Restricted within 140 characters.

Not easily distinguishable from content

Informal

Spoken language style

Abbreviation

Spam tweets

Pointless babble


Approaches

Account authority

Tweefind, Twitority


Approaches

Tweet popularity

Chirrps, Twitter Search


Approaches

Content relevance combined with account authority

CrowdEye, Bing


Outline

Twitter search services

Motivation

Problem definition

Approaches

Learning to rank

Definition

Approaches

An empirical study on learning to rank of tweets


Learning to rank

Goal

Automatically learn a ranking model from training data

$H : X \rightarrow Y$

Input

X, all possible queries

C, instance corpus

Output

Y, ranking over C

Application

Document retrieval, collaborative filtering, sentiment analysis, computational advertising

Machine translation

Computational biology


Learning to rank

label  (feature:value)_1, (feature:value)_2, …, (feature:value)_n

Figure 1. General paradigm for learning for tweets ranking
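One concrete instance of this label plus feature:value layout is the input format of SVMrank, in which each line reads `<target> qid:<qid> <feature>:<value> ...` with increasing feature indices. The helper below is a sketch that writes one tweet in that format; the function name and the sample values are illustrative.

```python
def to_svm_rank_line(label, qid, features):
    """Format one training instance as '<target> qid:<qid> <index>:<value> ...'.

    features maps 1-based feature indices to numeric values; indices are
    written in increasing order.
    """
    pairs = " ".join(f"{index}:{value:g}" for index, value in sorted(features.items()))
    return f"{label} qid:{qid} {pairs}"


print(to_svm_rank_line(3, 1, {2: 0.25, 1: 0.5}))
```

All tweets retrieved for the same query share one `qid`, so that preferences are only formed between tweets of the same query.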


Approaches of learning to rank

Pointwise approach

Approximated by a regression problem

Ordinal regression, classification

Pairwise approach

Approximated by a classification problem

RankNet, FRank, RankBoost, RankSVM, IR-SVM

Listwise approach

Directly optimizes the value of one of the evaluation measures

SoftRank, SVM-MAP, AdaRank, RankGP, ListNet, ListMLE


A learning to rank approach

RankBoost (Yoav Freund et al., 2003)

Combining preferences based on the boosting approach to machine learning

Weak learner: $h_t : X \rightarrow \mathbb{R}$

Update: $D_{t+1}(x_0, x_1) = \dfrac{D_t(x_0, x_1)\,\exp\big(\alpha_t (h_t(x_0) - h_t(x_1))\big)}{Z_t}$

Final ranking: $H(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$

Does well on data sets of varying sizes

Able to combine different approaches for ranking

Needs no feature selection approach
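One boosting round of the update and the final combination can be sketched as follows. The sketch assumes that a pair (x0, x1) in the distribution D means that x1 should be ranked above x0, and it takes the weight alpha as given instead of optimizing it; all names and the toy data are illustrative.

```python
import math


def update_distribution(D, h, alpha):
    """Reweight preference pairs after one weak learner h with weight alpha.

    Pairs that h orders correctly (h(x1) > h(x0)) lose weight, misordered
    pairs gain weight, and Z renormalizes the result to sum to 1.
    """
    unnormalized = {
        (x0, x1): weight * math.exp(alpha * (h(x0) - h(x1)))
        for (x0, x1), weight in D.items()
    }
    Z = sum(unnormalized.values())
    return {pair: weight / Z for pair, weight in unnormalized.items()}


def final_ranking(alphas, learners):
    """Return H(x) = sum_t alpha_t * h_t(x)."""
    return lambda x: sum(a * h(x) for a, h in zip(alphas, learners))


scores = {"a": 0.0, "b": 1.0, "c": 1.0, "d": 0.0}
h = scores.get
D = {("a", "b"): 0.5, ("c", "d"): 0.5}   # h orders (a, b) correctly and (c, d) wrongly
print(update_distribution(D, h, alpha=1.0))
```

After the update the misordered pair ("c", "d") carries most of the weight, so the next weak learner concentrates on it.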


Measurement

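Precision@n

$$P@n = \frac{\#\text{ of relevant tweets in top } n \text{ results}}{n}$$

MAP (Mean Average Precision)

$$MAP = \frac{1}{|\mathcal{Q}|} \sum_{Q \in \mathcal{Q}} \frac{\sum_{n=1}^{N} P@n \cdot rel(n)}{\#\text{ of relevant tweets for } Q}$$

where $\mathcal{Q}$ is the set of queries, N is the number of retrieved tweets, and

$$rel(n) = \begin{cases} 1, & \text{the } n^{th} \text{ tweet is relevant} \\ 0, & \text{the } n^{th} \text{ tweet is irrelevant} \end{cases}$$

NDCG (Normalized Discounted Cumulative Gain)

$$NDCG@m = z_m \sum_{j=1}^{m} \frac{2^{rel(j)} - 1}{\log(1 + j)}$$

where $z_m$ is a normalization factor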
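The three measures can be sketched in a few functions. Two assumptions are made here: the normalization factor of NDCG is taken as the inverse of the DCG of the ideal ordering of the same graded list, and the natural logarithm is used (the base cancels after normalization). Function names are illustrative.

```python
import math


def precision_at(n, rels):
    """P@n for a ranked list of 0/1 relevance flags."""
    return sum(rels[:n]) / n


def average_precision(rels, num_relevant):
    """Sum of P@n over relevant positions, divided by # of relevant tweets."""
    if num_relevant == 0:
        return 0.0
    total = sum(precision_at(n, rels) * rel for n, rel in enumerate(rels, start=1))
    return total / num_relevant


def dcg_at(m, grades):
    """Discounted cumulative gain over the top m graded results."""
    return sum((2 ** g - 1) / math.log(1 + j)
               for j, g in enumerate(grades[:m], start=1))


def ndcg_at(m, grades):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at(m, sorted(grades, reverse=True))
    return dcg_at(m, grades) / ideal if ideal > 0 else 0.0


ranked = [1, 0, 1, 0]
print(precision_at(2, ranked), average_precision(ranked, num_relevant=2))
print(ndcg_at(3, [3, 2, 1]), ndcg_at(3, [1, 2, 3]))
```

MAP is then the mean of `average_precision` over all queries.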


Learning to rank

Tools

Svm_light

Thorsten Joachims, http://svmlight.joachims.org/

Svm_rank

Thorsten Joachims, http://svmlight.joachims.org/

Svm_map

Yisong Yue, Thomas Finley, http://projects.yisongyue.com/svmmap/

LAGEP (rankboost)

Jung-Yi Lin et al.,

http://www.cs.nctu.edu.tw/~jylin/download/LAGEP083.zip


Outline

Twitter search services

Motivation

Problem definition

Approaches

Learning to rank

Definition

Approaches

An empirical study on learning to rank of tweets

Framework

Features

Experiment & Conclusion


An empirical study on learning to rank of tweets

Learning to rank framework

Popular approach for ranking, proved to be effective

Data-driven approach

Could integrate a bag of features into the model effectively.

Features

Content relevance features

Twitter specific features

Account authority features


Features

Type | Feature | Description (values normalized into [0,1])
Content Relevance | Okapi BM25 score | Content relevance between query and tweet
Content Relevance | Similarity | Popularity of each tweet in the corpus
Content Relevance | Length | Length of the tweet
Twitter Specific | URL | Whether the tweet contains a URL or not (boolean)
Twitter Specific | URL Count | Frequency with which the URL appears in the corpus
Twitter Specific | Retweet Count | Retweet count in the corpus
Twitter Specific | Hash-tag Score | Popularity in the corpus of the hash-tags contained in the tweet
Twitter Specific | Reply | Conversation tweet: whether the current tweet is a reply tweet (boolean)
Twitter Specific | OOV | Ratio of out-of-vocabulary words
Account Authority | Popularity Score | Calculated by the PageRank algorithm based on retweet relations
Account Authority | Follower Number | Number of followers of the publisher and retweeters
Account Authority | Mention Number | Number of times the publisher and retweeters were referred to in the tweet corpus, e.g. "if you have not seen the ustream @Longineu quit drinking, please support him"
Account Authority | List Number | Number of lists including the publisher or retweeters


Features

Similarity

Sum of cosine similarities between the tweet and all other tweets related to the query in the corpus.

$$Similarity(T_i) = \frac{1}{|T_{Q_k}| - 1} \sum_{T_j \in T_{Q_k},\; j \neq i} \frac{TV_i \cdot TV_j}{|TV_i|\,|TV_j|}$$

$T_{Q_k}$: collection of tweets related to query $Q_k$

$TV_i$, $TV_j$: TFIDF vectors of $T_i$, $T_j$

Hash-tag score

Sum of frequencies of the top-n tags of the query that appear in the tweet.

$$TagScore(T_i) = \frac{1}{Z_k} \sum_{tag_j \in T_i,\; tag_j \in Tag_{Q_k},\; j = 1, \dots, n} freq(tag_j)$$

$Z_k$: normalization factor

$Tag_{Q_k}$: tags extracted from $T_{Q_k}$
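The similarity feature can be sketched with sparse term-weight dictionaries. The TF-IDF weights are assumed to be precomputed, and dividing by the number of other tweets is an assumption of this sketch about how the sum is normalized; function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_scores(vectors):
    """Average cosine similarity of each tweet to all other tweets of the query."""
    n = len(vectors)
    return [
        sum(cosine(vectors[i], vectors[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


# Three toy tweets retrieved for one query (weights stand in for TF-IDF values).
vectors = [
    {"iphone": 1.0, "case": 1.0},
    {"iphone": 1.0, "case": 1.0},
    {"android": 1.0},
]
print(similarity_scores(vectors))
```

Tweets whose content is repeated by many other tweets for the same query receive a high score, which is why the feature table describes it as a popularity signal.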


Features

Popularity Score

Graph for accounts based on retweet relations.

PageRank algorithm

Figure 2. PageRank algorithm for calculating popularity score for accounts

Input: directed graph G of retweeting relationships; damping factor e.

Output: popularity score for each user

Procedure:

Step 1: the popularity scores of all users are initialized as $1 - e$.

Step 2: update the popularity score of each user:

$$PScore_{t+1}(v_i) = (1 - e) + e \sum_{v_j \in R(v_i)} \frac{PScore_t(v_j) \cdot RN_{ij}}{N_j}$$

$R(v_i)$ denotes the collection of users who retweeted $v_i$'s tweets.

$RN_{ij}$ is the number of times $v_i$ has been retweeted by $v_j$.

$N_j$ is the number of users whose tweets $v_j$ has retweeted.

Step 3: repeat the second step until diff < ε.

Example with six users (0 to 5) and e = 0.8:

$PScore_{t+1}(0) = 0.2 + 0.8 \cdot (PScore_t(1) + PScore_t(4) + PScore_t(5)/2)$

$PScore_{t+1}(1) = 0.2 + 0.8 \cdot PScore_t(2)/2$

$PScore_{t+1}(2) = 0.2 + 0.8 \cdot PScore_t(3)/2$

$PScore_{t+1}(3) = 0.2$

$PScore_{t+1}(4) = 0.2 + 0.8 \cdot (PScore_t(3)/2 + PScore_t(5)/2)$

$PScore_{t+1}(5) = 0.2 + 0.8 \cdot PScore_t(2)/2$

Initial scores: 0.2 for every user

After iteration 1: PScore(0) = 0.6, PScore(1) = 0.28, PScore(2) = 0.28, PScore(3) = 0.2, PScore(4) = 0.36, PScore(5) = 0.28

diff=0.2048
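The worked example can be reproduced with a short script. The sketch assumes that every retweet count RN is 1 (as in the example equations), that all users are updated simultaneously from the previous iteration's scores, and that diff is the sum of squared score changes, which matches the printed value 0.2048. The stopping threshold `eps` is an arbitrary choice, since the slides do not state ε.

```python
def popularity_scores(retweeted_by, e=0.8, eps=1e-3, max_iter=100):
    """Iterate PScore(v_i) = (1 - e) + e * sum_{j in R(v_i)} PScore(v_j) / N_j.

    retweeted_by[i] lists the users who retweeted user i, i.e. R(v_i).
    Returns the final scores and the diff value of every iteration.
    """
    users = list(retweeted_by)
    # N_j: number of users whose tweets user j has retweeted.
    n_out = {u: 0 for u in users}
    for retweeters in retweeted_by.values():
        for j in retweeters:
            n_out[j] += 1
    scores = {u: 1 - e for u in users}          # Step 1: initialize with 1 - e
    history = []
    for _ in range(max_iter):
        new = {                                  # Step 2: simultaneous update
            i: (1 - e) + e * sum(scores[j] / n_out[j] for j in retweeted_by[i])
            for i in users
        }
        diff = sum((new[u] - scores[u]) ** 2 for u in users)
        scores = new
        history.append(diff)
        if diff < eps:                           # Step 3: stop when converged
            break
    return scores, history


# Retweet graph of the example: user 0 was retweeted by users 1, 4 and 5, etc.
graph = {0: [1, 4, 5], 1: [2], 2: [3], 3: [], 4: [3, 5], 5: [2]}
scores, history = popularity_scores(graph)
print({u: round(s, 4) for u, s in scores.items()})
print([round(d, 4) for d in history])
```

With this threshold the loop stops after four iterations, and the diff values follow the sequence 0.2048, 0.0532, 0.0043, 0.0001 shown on the slides.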


Features

Popularity Score

PageRank example, iteration 2 (same graph and update rule, e = 0.8)

PScore(0) = 0.824, PScore(1) = 0.312, PScore(2) = 0.28, PScore(3) = 0.2, PScore(4) = 0.392, PScore(5) = 0.312

diff=0.0532


Features

Popularity Score

PageRank example, iteration 3 (same graph and update rule, e = 0.8)

PScore(0) = 0.888, PScore(1) = 0.312, PScore(2) = 0.28, PScore(3) = 0.2, PScore(4) = 0.4048, PScore(5) = 0.312

diff=0.0043


Features

Popularity Score

Graph for accounts based on retweet relations.

PageRank algorithm

PageRank algorithm for calculating popularity score for accounts.

Input: Directed Graph G of retweeting relationship

Damping factor e.

Output: popularity score for each user

Procedure:

Step 1: The popularity scores of all users are initialized as $1 - e$.

Step 2: Update the popularity score for users.

$$PScore_{t+1}(v_i) = (1 - e) + e \sum_{v_j \in R(v_i)} \frac{PScore_t(v_j) \cdot RN_{ij}}{N_j}$$

$R(v_i)$ denotes the collection of users who retweeted $v_i$'s tweet.

$RN_{ij}$ is the number of times $v_i$ has been retweeted by $v_j$.

$N_j$ is the number of users whose tweets $v_j$ has retweeted.

Step 3: Repeat the second step until diff < ε.

Figure 2. PageRank algorithm for calculating popularity score for accounts

Example: retweet graph over six accounts (nodes 0 to 5), with $1 - e = 0.2$ and $e = 0.8$

$PScore_{t+1}(0) = 0.2 + 0.8 \cdot (PScore_t(1) + PScore_t(4) + PScore_t(5)/2)$

$PScore_{t+1}(1) = 0.2 + 0.8 \cdot PScore_t(2)/2$

$PScore_{t+1}(2) = 0.2 + 0.8 \cdot PScore_t(3)/2$

$PScore_{t+1}(3) = 0.2$

$PScore_{t+1}(4) = 0.2 + 0.8 \cdot (PScore_t(3)/2 + PScore_t(5)/2)$

$PScore_{t+1}(5) = 0.2 + 0.8 \cdot PScore_t(2)/2$

Node            0       1      2     3    4       5
PScore_t        0.888   0.312  0.28  0.2  0.4048  0.312
PScore_{t+1}    0.8982  0.312  0.28  0.2  0.4048  0.312

diff = 0.0001
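The iteration on this slide can be reproduced with a short Python sketch. It is only an illustration under several assumptions: the function name `popularity_scores` and the edge encoding are hypothetical, all retweet counts in the example graph are taken to be 1, the edges (1→0, 4→0, 5→0, 2→1, 2→5, 3→2, 3→4, 5→4, read as retweeter→author) are inferred from the example update equations, diff is taken to be the sum of squared score changes (which matches the diff values shown, 0.0043 and 0.0001), and the threshold `eps` is a placeholder because the slides do not give ε.

```python
from collections import defaultdict

def popularity_scores(retweets, e=0.8, eps=1e-3, max_iter=100):
    """Iterate the popularity-score update until the scores stop changing.

    retweets maps (retweeter j, author i) -> number of times j retweeted i.
    """
    users = sorted({u for pair in retweets for u in pair})
    retweeters = defaultdict(list)   # author i -> [(retweeter j, count)]
    n_retweeted = defaultdict(int)   # j -> number of distinct users j has retweeted
    for (j, i), count in retweets.items():
        retweeters[i].append((j, count))
        n_retweeted[j] += 1

    # Assumed initial value: 1 - e for every user.
    scores = {u: 1.0 - e for u in users}
    for _ in range(max_iter):
        new = {
            i: (1.0 - e) + e * sum(scores[j] * rn / n_retweeted[j]
                                   for j, rn in retweeters[i])
            for i in users
        }
        # Assumed convergence measure: sum of squared changes.
        diff = sum((new[u] - scores[u]) ** 2 for u in users)
        scores = new
        if diff < eps:
            break
    return scores

# Example graph inferred from the update equations (every count assumed to be 1).
EXAMPLE = {(1, 0): 1, (4, 0): 1, (5, 0): 1, (2, 1): 1,
           (2, 5): 1, (3, 2): 1, (3, 4): 1, (5, 4): 1}

if __name__ == "__main__":
    for user, score in popularity_scores(EXAMPLE).items():
        print(user, round(score, 4))
```

Under these assumptions the scores settle at the values shown for the last iteration (node 0 at about 0.8982, node 4 at 0.4048).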


List number


Data preparation

Query selection

Type

Analyzed hot searches on CrowdEye

Four query types account for 81% of all hot searches: person, location, product, movie

Selection

Person from hot search of CrowdEye

Location from American map

Products from eBay

Movies from a collection of recommended movies from 2005 to 2010

Data

20 queries

Collected 162,626 tweets from Twitter Search for the above queries.

Sampled 500 tweets for each query as experimental data.

Annotation

Multiple search intentions for each query

Four grades

Grade        Excellent  Good    Fair    Bad
Percentage   20.88%     10.92%  16.85%  51.35%


Experiment

Baseline

Rank all tweets by content relevance, account authority, and time, respectively.

Our model

RankSVM

Tool: Svm_rank

Training and evaluation

5-fold cross-validation; for each fold:

Training data: 8,000 tweets.

Validation data: 1,000 tweets.

Test data: 1,000 tweets.
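As an illustration of how annotated tweets could be handed to a RankSVM tool, the sketch below writes one example in the SVM-light style ranking format (`<target> qid:<id> <index>:<value> ...`) that svm_rank reads. The helper name `to_svmrank_line`, the numeric grade, and the three feature values are hypothetical; the slides do not describe the actual feature encoding.

```python
def to_svmrank_line(grade, qid, features, comment=""):
    """Format one tweet as a ranking example.

    grade    -- numeric relevance label (higher means more relevant)
    qid      -- query id; examples of the same query are ranked together
    features -- feature values, written with 1-based increasing indices
    """
    parts = [str(grade), f"qid:{qid}"]
    parts += [f"{idx}:{value:g}" for idx, value in enumerate(features, start=1)]
    line = " ".join(parts)
    return f"{line} # {comment}" if comment else line

if __name__ == "__main__":
    # Hypothetical features: has_url, relevance score, tweet length.
    print(to_svmrank_line(3, 1, [1, 0.5, 120], comment="tweet-a"))
```

Lines for the same query are normally kept together in the file so that preference pairs are formed only within a query.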

Evaluation Metric

Normalized Discounted Cumulative Gain (NDCG)
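For reference, NDCG@k can be computed as in the sketch below. The gain mapping from the four annotation grades to the integers 3, 2, 1, 0 and the common `2^gain - 1` with `log2` discount variant are assumptions; the slides do not state which NDCG formulation was used.

```python
import math

# Assumed gain for each annotation grade (not given in the slides).
GAIN = {"Excellent": 3, "Good": 2, "Fair": 1, "Bad": 0}

def dcg_at_k(grades, k):
    """Discounted cumulative gain of the first k grades in ranked order."""
    return sum((2 ** GAIN[g] - 1) / math.log2(rank + 2)
               for rank, g in enumerate(grades[:k]))

def ndcg_at_k(grades, k):
    """DCG@k divided by the DCG@k of the ideal (best-first) ordering."""
    ideal = sorted(grades, key=GAIN.get, reverse=True)
    best = dcg_at_k(ideal, k)
    return dcg_at_k(grades, k) / best if best > 0 else 0.0

if __name__ == "__main__":
    ranked = ["Good", "Excellent", "Bad", "Fair"]
    print(round(ndcg_at_k(ranked, 10), 4))
```

A ranking that already lists tweets from best to worst grade scores 1.0; placing a relevant tweet lower reduces the score through the logarithmic discount.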


Experimental results

Performance of four ranking methods

Three baselines

RankSVM using all features described before (RankSVM_Full)

Figure 3. Performance of Four Ranking Methods


Feature selection

Motivation

RankSVM_Full underperforms some models trained on a subset of the features.

An SVM does not directly expose feature importance.

Selection approach

Advanced greedy feature selection algorithm inspired by the greedy algorithm (Cormen, et al. 1990).

An advanced greedy feature selection algorithm.

Input: All features we extracted.

Output: the best feature conjunction BFC

Procedure:

Step 1: Randomly generate 80 feature sets, denoted F.

Step 2: Evaluate every feature set in F and select the best one, denoted RBF.

Features not in RBF are denoted EX_RBF.

Step 3: t = 0, BFC(t) = RBF;

Repeat

    Foreach feature in EX_RBF

        If Evaluation(BFC) < Evaluation(BFC, feature)

            BFC(t+1) = {BFC(t), feature}

            EX_RBF(t+1) = EX_RBF(t) - {feature}

While BFC(t+1) ≠ BFC(t)

Note: Evaluation(BFC) refers to the performance of the ranking function trained on the features in BFC, measured on the validation data.

Figure 4. Feature Selection Algorithm
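The procedure in Figure 4 can be sketched in Python as follows. This is a rough illustration, not the authors' implementation: `evaluate` stands in for "train a ranker on these features and measure it on validation data", the way random subsets are drawn is an assumption, and the toy scoring function and feature names in the usage part are invented for the example.

```python
import random

def greedy_feature_selection(features, evaluate, n_random=80, seed=0):
    """Pick the best of n_random random subsets, then greedily add features."""
    rng = random.Random(seed)
    # Step 1: random non-empty feature subsets (sampling scheme is assumed).
    candidates = [
        frozenset(rng.sample(features, rng.randint(1, len(features))))
        for _ in range(n_random)
    ]
    # Step 2: the best random subset (RBF) and the features outside it (EX_RBF).
    best = set(max(candidates, key=evaluate))
    remaining = [f for f in features if f not in best]
    # Step 3: keep adding any feature that improves the evaluation,
    # and stop once a full pass adds nothing.
    changed = True
    while changed:
        changed = False
        for f in list(remaining):
            if evaluate(best | {f}) > evaluate(best):
                best.add(f)
                remaining.remove(f)
                changed = True
    return best

if __name__ == "__main__":
    useful = {"url", "list_number"}  # hypothetical "truly helpful" features

    def toy_evaluate(subset):
        # Reward useful features, slightly penalize the rest.
        return len(subset & useful) - 0.1 * len(subset - useful)

    names = ["url", "list_number", "length", "follower", "mention"]
    print(sorted(greedy_feature_selection(names, toy_evaluate)))
```

Because the greedy pass only adds features, the result depends on which random subset wins Step 2; that is consistent with the figure, which never removes a feature from BFC.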


Results of feature selection

Best feature combination (RankSVM_Best)

URL, Mention number (sum_mention), List number (Publisher_list), Length, Follower number (Important_follower)

Performance

Figure 5. Comparison between Five Ranking Models

Does RankSVM_Best outperform the other four ranking models significantly?

Paired t-test

NDCG@10

0.01 level: time, authority, content relevance

0.05 level: RankSVM_Full

Conclusion

RankSVM_Best significantly outperforms ranking by time, authority, and content relevance, respectively.

RankSVM_Best outperforms RankSVM_Full less markedly (significant only at the 0.05 level).
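A paired t-test compares two systems on matched samples. The sketch below computes the t statistic and degrees of freedom from two lists of paired scores; treating per-query NDCG@10 values as the paired samples is an assumption (the slides do not say what the pairing unit is), and the numbers in the usage part are made up. The resulting statistic would then be compared against a t table for the given degrees of freedom at the 0.05 or 0.01 level.

```python
import math

def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for paired samples of equal length."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n), n - 1

if __name__ == "__main__":
    # Hypothetical per-query NDCG@10 values for two rankers.
    model_a = [0.62, 0.71, 0.55, 0.80]
    model_b = [0.58, 0.65, 0.56, 0.70]
    t, df = paired_t_statistic(model_a, model_b)
    print(round(t, 3), df)
```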


Feature contribution

The importance of each feature in the best feature combination

Measurement

Decrease in performance when the feature under evaluation is removed from RankSVM_Best

Figure 6. Importance of Each Feature

Is each feature really effective?

Paired t-test

NDCG@10

0.01 level: URL

0.05 level: List Number

Conclusion

URL is a very important feature

List Number is a useful feature

The other features don't show a significant contribution
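The measurement described on this slide is a leave-one-out ablation. A minimal sketch, assuming an `evaluate` callable that returns the validation score (for example NDCG@10) of a ranker trained on a given feature set; the function name, feature names, and toy scores are hypothetical:

```python
def feature_contributions(best_features, evaluate):
    """Score drop observed when each feature is removed from the best set."""
    full_set = frozenset(best_features)
    full_score = evaluate(full_set)
    return {f: full_score - evaluate(full_set - {f}) for f in best_features}

if __name__ == "__main__":
    # Toy additive scores standing in for real training and evaluation runs.
    weights = {"url": 0.05, "list_number": 0.02, "length": 0.0}

    def toy_evaluate(subset):
        return 0.5 + sum(weights[f] for f in subset)

    drops = feature_contributions(list(weights), toy_evaluate)
    for name, drop in sorted(drops.items(), key=lambda kv: -kv[1]):
        print(name, round(drop, 3))
```

A larger drop indicates a more important feature; whether the drop is significant is a separate question, answered on the slide with the paired t-test.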


Summary

Our learning-to-rank approach significantly outperforms ranking by chronological order, account authority, and content relevance, respectively.

URL, Length, Mention Number, List Number, and Follower Number are the top five effective features.

URL and List Number are the most effective features.

List Number is a better representation of account authority than follower number.


References

Anish Das Sarma, Atish Das Sarma, Sreenivas Gollapudi, and Rina Panigrahy, 2010, Ranking Mechanisms in Twitter-like Forums. In the proceedings of the third ACM International Conference on Web Search and Data Mining. Pages: 21-30.

Danny Sullivan, What is real time search? Definitions & Players

Yoav Freund, Raj Iyer, Robert E. Schapire, and Yoram Singer, 2003, An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research. Vol. 4, Pages: 933-969.

Microsoft learning to rank

Yajuan Duan, Long Jiang, Tao Qin, Ming Zhou, and Heung-Yeung Shum, 2010, An empirical study on learning to rank of tweets. Accepted by COLING 2010.


Wrap-up: what we have learnt

Introduction (Ming ZHOU)

Understand Twitter

The task of semantic analysis

Semantic analysis of tweets (Xiaohua LIU, Long JIANG)

Semantic role labeling

Sentiment analysis

Twitter search (Yajuan DUAN)

Feature extraction

Ranking search results with account's influence, content relevance and other features


Where we should go in the future

Classification

Clustering

Spam filtering

Recommend people to new users

Multi-language analysis


Thanks

Contact person: [email protected]