Stretching the Life of Twitter Classifiers with Time-Stamped Semantic Graphs

Post on 27-Jun-2015


A. Elizabeth Cano (@pixarelli), amparo.cano@open.ac.uk
Yulan He, y.he9@aston.ac.uk
Harith Alani, h.alani@open.ac.uk

1  

Introduction Social Media Streams

2  

Introduction Representing Topics in Dynamic Environments

Example tweets and their topic keywords, by year:

•  2011: 3 dead in protest in Egypt. Security official vows to ‘deal firmly..#Jan24 (keywords: #Jan24, Egypt, dead, protest, security)
•  2012: Egypt Pres Morsi uses his visit to Tehran to praise the Syrian uprising (keywords: Egypt, Pres Morsi, Tehran, Syrian, uprising)
•  2013: #Boston bombing suspect “pinned down” on boat in Watertown (keywords: Boston, bombing, suspect, Watertown)
•  2014: Why Obama needs to rethink his entire ISIS strategy… (keywords: Obama, ISIS, strategy)

Techniques for topic classification of Social Media are sensitive to the evolution of topics
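To make the sensitivity to topic evolution concrete, the following minimal sketch measures how little the keyword sets from the example tweets on this slide overlap from year to year. The grouping of keywords by year follows the slide layout, and the Jaccard measure is an illustrative choice, not something the slides prescribe.

```python
from itertools import combinations

# Keywords from the example tweets on this slide, grouped by year.
keywords_by_year = {
    2011: {"#Jan24", "Egypt", "dead", "protest", "security"},
    2012: {"Egypt", "Pres Morsi", "Tehran", "Syrian", "uprising"},
    2013: {"Boston", "bombing", "suspect", "Watertown"},
    2014: {"Obama", "ISIS", "strategy"},
}


def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b| between two keyword sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Pairwise lexical overlap between every two years.
overlaps = {
    (y1, y2): jaccard(keywords_by_year[y1], keywords_by_year[y2])
    for y1, y2 in combinations(sorted(keywords_by_year), 2)
}

for pair, score in overlaps.items():
    print(pair, round(score, 3))
```

Only 2011 and 2012 share a keyword (Egypt); every other pair of years has no lexical overlap at all, which is the kind of drift a bag-of-words classifier has to cope with.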

3  

Introduction

Challenges
•  Keeping a model up to date requires regular retuning.
•  Manual annotation is expensive.

Questions
•  Which feature types provide a more stable representation of a topic?

4  

Introduction Previous work

 

Using local features
•  Bag of Words (BoW) [Genc et al., 2011]
•  BoW + Bag of Entities (BoE) [Vitale et al., 2012]
•  BoW + BoE + Part of Speech (PoS) tagging [Munoz et al., 2011] [Varga et al., 2012]

Exploiting the link structure of a Knowledge Source
•  Exploiting categories containing entities [Michelson et al., 2010]
•  Relating tweets with Wikipedia resources [Milne et al., 2008] [Xu et al., 2011]
•  Use of semantic features for topic classification [Cano et al., 2013] [Varga et al., 2014]

5  

Introduction Topic Evolution

[Figure: a topic in the Twitter corpus evolving from epoch t to epoch t+1, shown with its lexical and semantic representations.]

6  

Introduction Characterising Topic Changes with DBpedia

 

[Figure: the DBpedia graph around dbp:Barack_Obama as it changes across DBpedia 3.6, 3.7 and 3.8.]

Properties shown: rdf:type; rdfs:subClassOf; dbo:author; dbo:spouse; dbo:birthPlace; skos:subject; dbo:leader; dbo:wikiPageWikiLink.

Linked nodes shown: yago:PresidentOfTheUnitedStates; dbo:Person; dbp:Michelle_Obama; dbp:Hawaii; dbp:The_Audacity_of_Hope; dbp:Dreams_from_My_Father; category:Community_organisers; category:Columbia_University_Alumni; dbp:United_States_National_Council; dbp:National_Science_and_Techology; category:United_States_presidential_candidates,_2012; dbp:Al-Qaeda; dbp:Budget_Control_Act_of_2011.

Some features remain unchanged; others provide information about past, current or future contexts (e.g. dbp:UnitedStatesPresidentialCandidates).

7  

Approach DBpedia Graph Snapshots

 

Definition: Time-dependent Resource Meta Graph. A sequence of tuples $G := (R, P, C, Y, f_t)$ where
•  $R$, $P$, $C$ are finite sets whose elements are resources, properties and classes;
•  $Y$ is a ternary relation $Y \subseteq R \times P \times C$ representing a hypergraph with ternary edges;
•  the hypergraph of $Y$ is a tripartite graph $H(Y) = \langle V, D \rangle$ where the vertices are $V = R \cup P \cup C$ and the edges are $D = \{\{r, p, c\} \mid (r, p, c) \in Y\}$;
•  $f_t$ assigns a temporal marker to each ternary edge.
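The definition above can be sketched as a small data structure. This is a minimal illustration and not the authors' implementation: the class name `TimeStampedMetaGraph`, the use of DBpedia version tuples such as `(3, 6)` as temporal markers, the cumulative reading of a snapshot (all edges stamped at or before t), and the version assigned to each example edge are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class TimeStampedMetaGraph:
    """Sketch of G = (R, P, C, Y, f_t); names are hypothetical."""

    # f_t stored as a mapping: ternary edge (r, p, c) -> temporal marker.
    edge_time: dict = field(default_factory=dict)

    def add_edge(self, r, p, c, t):
        """Record edge (r, p, c), keeping the earliest marker seen for it."""
        key = (r, p, c)
        if key not in self.edge_time or t < self.edge_time[key]:
            self.edge_time[key] = t

    def snapshot(self, t):
        """Edges of Y whose marker is at or before t (assumed cumulative)."""
        return {edge for edge, marker in self.edge_time.items() if marker <= t}

    def vertices(self, t):
        """V = R | P | C restricted to the snapshot at t."""
        return {v for edge in self.snapshot(t) for v in edge}


g = TimeStampedMetaGraph()
# Illustrative edges; the version stamps are assumed for the example.
g.add_edge("dbp:Barack_Obama", "rdf:type",
           "yago:PresidentOfTheUnitedStates", (3, 6))
g.add_edge("dbp:Barack_Obama", "skos:subject",
           "category:United_States_presidential_candidates,_2012", (3, 7))

print(len(g.snapshot((3, 6))), len(g.snapshot((3, 7))))
```

Storing the marker per edge keeps a single graph from which any snapshot can be derived, rather than one full copy of the graph per DBpedia release.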

8  

Approach Semantic Representation of a Tweet

Entities in the tweet: <dbp:Hosni_Mubarak>, <dbp:Egypt>, <dbp:Barack_Obama>, <dbp:CNN>

dbp: http://dbpedia.org/resource/

9  

Approach Semantic Representation of a Tweet

Class Features (rdf:type)

•  <dbp:Hosni_Mubarak>: rdf:type <dbo:OfficeHolder>
•  <dbp:Egypt>: rdf:type <dbo:Country>
•  <dbp:Barack_Obama>: rdf:type <yago:NobelPeacePrizeLaureates>
•  <dbp:CNN>: rdf:type <dbo:Broadcaster>

dbo: http://dbpedia.org/ontology/

10  

Approach Semantic Representation of a Tweet

Property Features

•  <dbp:Hosni_Mubarak>: dbprop:title <dbp:Prime_Minister_of_Egypt>
•  <dbp:Egypt>: dbprop:languages <dbp:Egyptian_Arabic>
•  <dbp:Barack_Obama>: dbprop:nationality American
•  <dbp:CNN>: dbprop:headquarters <dbp:Atlanta>

skos: http://dbpedia.org/resource/Category:

11  

Approach Semantic Representation of a Tweet

Category Features (skos)

•  <dbp:Hosni_Mubarak>: dcterms:subject <skos:PresidentsOfEgypt>
•  <dbp:Egypt>: dcterms:subject <skos:Arab_republics>
•  <dbp:Barack_Obama>: dcterms:subject <skos:Presidents_of_the_United_States>
•  <dbp:CNN>: dcterms:subject <skos:English-language_television_stations>

skos: http://dbpedia.org/resource/Category:

12  

Approach Semantic Representation of a Tweet

Resource Features

•  <dbp:Hosni_Mubarak>: dbprop:title, <dbp:Prime_Minister_of_Egypt>
•  <dbp:Egypt>: dbprop:languages, <dbp:Egyptian_Arabic>
•  <dbp:Barack_Obama>: dbprop:commander, <dbp:Death_Of_Osama_Bin_Laden>
•  <dbp:CNN>: dbprop:headquarters, <dbp:Atlanta>

skos: http://dbpedia.org/resource/Category:
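The four feature types introduced on these slides (class, property, category and resource) can be derived from the triples attached to a tweet's entities. The sketch below is one possible reading, under stated assumptions: the function name `semantic_features` is hypothetical, the pairing of each entity with its class or category is inferred from the slide figures, and treating the property name as the property feature and the linked resource as the resource feature is an interpretation rather than the authors' stated procedure.

```python
from collections import defaultdict

# (subject, property, object) triples; pairings are read off the slide
# figures and should be treated as illustrative.
triples = [
    ("dbp:Barack_Obama", "rdf:type", "yago:NobelPeacePrizeLaureates"),
    ("dbp:Egypt", "rdf:type", "dbo:Country"),
    ("dbp:Hosni_Mubarak", "dcterms:subject", "skos:PresidentsOfEgypt"),
    ("dbp:Egypt", "dbprop:languages", "dbp:Egyptian_Arabic"),
    ("dbp:Hosni_Mubarak", "dbprop:title", "dbp:Prime_Minister_of_Egypt"),
]


def semantic_features(entities, triples):
    """Group the features of the given entities by feature type."""
    feats = defaultdict(set)
    for subject, prop, obj in triples:
        if subject not in entities:
            continue
        if prop == "rdf:type":
            feats["class"].add(obj)
        elif prop == "dcterms:subject":
            feats["category"].add(obj)
        else:
            # Any other property contributes its name and its target.
            feats["property"].add(prop)
            feats["resource"].add(obj)
    return dict(feats)


tweet_entities = {"dbp:Egypt", "dbp:Hosni_Mubarak"}
features = semantic_features(tweet_entities, triples)
for feature_type, values in sorted(features.items()):
    print(feature_type, sorted(values))
```

Because `dbp:Barack_Obama` is not among the tweet's entities in this usage example, its class does not appear in the output.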

13  

Approach DBpedia Graph Snapshots

 

That is, the meta-graph of an entity e is the aggregation of all resources, properties and classes related to this entity at time t.

Properties and resources of <dbp:Barack_Obama> across DBpedia snapshots (3.6, 3.7, 3.8, ...):

•  DBpedia 3.6: prop:spouse <MichelleObama>; prop:birthPlace <Hawaii>
•  DBpedia 3.7: prop:spouse <MichelleObama>; prop:birthPlace <Hawaii>; prop:commander <dbp:Death_Of_Osama_Bin_Laden>
•  DBpedia 3.8: prop:spouse <MichelleObama>; prop:birthPlace <Hawaii>; prop:wikiPageWikiLink <UnitedStatesPresidentialCandidates>; prop:wikiPageWikiLink <Budget_Control_Act_of_2011>
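Comparing two snapshots of the same entity shows which features are stable and which arrive later. The following is a minimal sketch: the helper name `compare_snapshots` is hypothetical, and the contents of the two snapshots are read off the slide figure for <dbp:Barack_Obama>, so the exact property-to-resource pairings are an assumption.

```python
# (property, resource) pairs per DBpedia version, as read from the figure.
snapshots = {
    "3.6": {
        ("prop:spouse", "MichelleObama"),
        ("prop:birthPlace", "Hawaii"),
    },
    "3.8": {
        ("prop:spouse", "MichelleObama"),
        ("prop:birthPlace", "Hawaii"),
        ("prop:wikiPageWikiLink", "UnitedStatesPresidentialCandidates"),
        ("prop:wikiPageWikiLink", "Budget_Control_Act_of_2011"),
    },
}


def compare_snapshots(old, new):
    """Split features into those kept, added and removed between versions."""
    return {
        "stable": old & new,
        "added": new - old,
        "removed": old - new,
    }


diff = compare_snapshots(snapshots["3.6"], snapshots["3.8"])
for kind in ("stable", "added", "removed"):
    print(kind, sorted(diff[kind]))
```

In this example the spouse and birthplace features persist, while the two wikiPageWikiLink features only appear in the later snapshot.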

14  

Approach Semantic Feature Weighting Strategies

 

Characterise the global relevance of a semantic feature to a given topic in DBpedia at a given point in time.

Topic Relevance-based Weighting Strategy:

[Figure: the topic graph as a subgraph of the DBpedia graph.]

15  

Approach Semantic Feature Weighting Strategies

 

Topic Relevance-based Weighting Strategy:

•  Class-based Topic Relevance (ClsW)
•  Property-based Topic Relevance (PropW)
•  Category-based Topic Relevance (CatW)
•  Resource Relevance (ResW)

16  

Approach Semantic Feature Weighting Strategies

 

Integrating weights into a Tweet representation

Semantic feature f in a document x:

$$W_x(f)_{DB_t} = \left[ \frac{N_x(f)_{DB_t} + 1}{|F| + \sum_{f' \in F} N_x(f')_{DB_t}} \right] \ast \left[ W_{DB_t}(f) \right]^{1/2}$$

•  First factor: frequency with Laplace smoothing.
•  Second factor: weight derived from the DB_t graph.
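The weighting on this slide multiplies a Laplace-smoothed frequency of feature f in document x by the square root of the weight derived from the DB_t graph. The sketch below computes it under stated assumptions: F is read as the feature vocabulary (so the smoothing constant is its size), the function name `weighted_features` is hypothetical, and the topic weights in the example are made-up values, not results from the slides.

```python
from collections import Counter
from math import sqrt


def weighted_features(doc_features, vocabulary, topic_weight):
    """Laplace-smoothed frequency times sqrt of the graph-derived weight."""
    counts = Counter(doc_features)
    # Total occurrences in the document of features from the vocabulary F.
    total = sum(counts[f] for f in vocabulary)
    denom = len(vocabulary) + total
    return {
        f: ((counts[f] + 1) / denom) * sqrt(topic_weight.get(f, 0.0))
        for f in vocabulary
    }


vocabulary = ["dbo:Country", "dbo:OfficeHolder", "dbo:Broadcaster"]
doc_features = ["dbo:Country", "dbo:Country", "dbo:OfficeHolder"]
# Hypothetical topic-relevance weights for one DBpedia snapshot.
topic_weight = {
    "dbo:Country": 0.25,
    "dbo:OfficeHolder": 1.0,
    "dbo:Broadcaster": 0.0,
}

weights = weighted_features(doc_features, vocabulary, topic_weight)
for feature, value in weights.items():
    print(feature, round(value, 4))
```

The smoothing gives an unseen feature a non-zero frequency term, but a feature with zero topic relevance in the snapshot still ends up with zero weight.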

17  

Experiments Framework for Twitter Topic Classification with DBpedia

 18  

•  Do semantic features built from DBpedia graphs aid cross-epoch topic classification of tweets?

•  Which feature type provides a more stable topic representation over time?

Experiments Framework for Twitter Topic Classification with DBpedia

 

[Figure: framework inputs. Microposts from 2010, 2011 and 2013, and DBpedia dumps 3.6, 3.7, 3.8 and 3.9 as resources.]

19  

Experiments Datasets

 

Tweets

Year      2010      2011     2013
Period    Nov-Dec   Aug      Sep
Tweets    1x10^6    1x10^6   1x10^6

Violence Related Topics: Disaster and Accident (D&A), Law and Crime (L&C), War and Conflict (W&C)

•  A topic label is assigned from a pool of over 10 categories.
•  Manual annotation is performed until there are 1K tweets per year per topic.
•  Negative set: 1K tweets per year, from topics other than the three above.
•  Total: 12K annotated tweets (3 topics x 3 years x 1K, plus 3 years x 1K negatives).

20  

Experiments Framework for Twitter Topic Classification with DBpedia

 

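[Figure: framework, first step. Inputs: microposts from 2010, 2011 and 2013, and DBpedia dumps 3.6, 3.7, 3.8 and 3.9 as resources. Step: Concept Enrichment (example entities: <dbp:Hosni_Mubarak>, <dbp:Egypt>, <dbp:Barack_Obama>, <dbp:CNN>).]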

21  

Experiments Framework for Twitter Topic Classification with DBpedia

 

[Figure: framework, first three steps. Inputs: microposts from 2010, 2011 and 2013, and DBpedia dumps 3.6, 3.7, 3.8 and 3.9 as resources. Steps: Concept Enrichment; Resource Backtrack Mapping; Deriving Semantic Graph Snapshots (2010, 2011, 2013).]

22  

Experiments Framework for Twitter Topic Classification with DBpedia

 

[Figure: framework, first four steps. Inputs: microposts from 2010, 2011 and 2013, and DBpedia dumps 3.6, 3.7, 3.8 and 3.9 as resources. Steps: Concept Enrichment; Resource Backtrack Mapping; Deriving Semantic Graph Snapshots (2010, 2011, 2013); DBpedia Topic Relevance based Feature Weighting.]

23  

Experiments Datasets

 

[Figure: feature representations per dataset. Rows: W&C, D&A, L&C and NEG, each for 2010, 2011 and 2013. Columns: lexical features (BoW) and semantic features (Category, Property, Resource, Class).]

24

Experiments Framework for Twitter Topic Classification with DBpedia

 

[Figure: full framework. Inputs: topic-labelled microposts from 2010, 2011 and 2013, and DBpedia dumps 3.6, 3.7, 3.8 and 3.9 as resources. Steps: Concept Enrichment; Resource Backtrack Mapping; Deriving Semantic Graph Snapshots (2010, 2011, 2013); DBpedia Topic Relevance based Feature Weighting; Build Topic Classifier.]

25  

Experiments Understanding the Stability of a Topic Representation

 

[Figure: same-epoch scenario. Lexical, semantic and combined representations over epochs t and t+1, with the train and test sets drawn from the same epoch.]

26  

Experiments Epoch Scenarios

Same epoch Scenario (Trained on 2010- Tested on 2010)

Feature     Disaster_Acc (F1)   Law_Crime (F1)   War_Conflict (F1)
BoW         0.831               0.765            0.844
Category    0.697               0.650            0.744
Property    0.680               0.639            0.720
Resource    0.692               0.637            0.762
Class       0.633               0.583            0.637

27  

All the experiments reported in our paper were conducted using a 10-fold cross-validation setting.

Experiments Understanding the Stability of a Topic Representation

[Figure: lexical, semantic and combined representations over epochs t and t+1. Same-epoch scenario: train and test sets come from the same epoch. Cross-epoch scenario: train on epoch t, test on epoch t+1.]

28

Experiments Epoch Scenarios

Cross-epoch Scenario (Trained on 2010- Tested on X)

Cross-Epoch, Disaster_Acc (F1)

Feature     2010-2011   2010-2013   2011-2013   Average
BoW         0.634       0.481       0.261       0.458
Category    0.683       0.539       0.524       0.582
Property    0.665       0.557       0.502       0.603
Resource    0.774       0.544       0.445       0.587
Class       0.691       0.665       0.669       0.675
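The two evaluation scenarios differ only in which epoch supplies the test set. The sketch below shows that protocol on toy data. Everything in it is an assumption made for illustration: the keyword-memorising trainer `train_keyword_model` stands in for a real classifier, the token sets and labels are invented, and the same-epoch score here reuses the training data instead of the 10-fold cross-validation used in the reported experiments.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class from parallel label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def train_keyword_model(docs, labels):
    """Toy trainer: remember tokens seen only in positive documents."""
    pos, neg = set(), set()
    for tokens, label in zip(docs, labels):
        (pos if label == 1 else neg).update(tokens)
    vocab = pos - neg
    return lambda tokens: 1 if vocab & tokens else 0


def epoch_f1(data_by_epoch, train_epoch, test_epoch):
    """Train on one epoch, report F1 on another (or the same) epoch."""
    train_docs, train_labels = data_by_epoch[train_epoch]
    test_docs, test_labels = data_by_epoch[test_epoch]
    predict = train_keyword_model(train_docs, train_labels)
    return f1_score(test_labels, [predict(d) for d in test_docs])


# Invented toy data: token sets with 1 = on-topic, 0 = off-topic.
toy = {
    2010: ([{"flood", "rescue"}, {"football", "goal"}], [1, 0]),
    2011: ([{"earthquake", "rescue"}, {"tsunami", "warning"},
            {"goal", "match"}], [1, 1, 0]),
}

same_epoch = epoch_f1(toy, 2010, 2010)
cross_epoch = epoch_f1(toy, 2010, 2011)
print(round(same_epoch, 3), round(cross_epoch, 3))
```

The drop from the same-epoch to the cross-epoch score comes from the 2011 tweet whose vocabulary never appeared in 2010, which mirrors the lexical decay the slides measure.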

29  

Experiments Epoch Scenarios

Averaged Cross-epoch Scenarios

Feature     Disaster_Acc (F1)   Law_Crime (F1)   War_Conflict (F1)   Average
BoW         0.458               0.620            0.531               0.536
Category    0.582               0.537            0.453               0.55
Property    0.574               0.504            0.506               0.528
Resource    0.587               0.578            0.466               0.544
Class       0.675               0.647            0.664               0.665
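The per-topic scores in the table above can be averaged and ranked with a few lines of code. This is a minimal sketch using the per-topic F1 values exactly as they appear on the slide; the recomputed means are plain arithmetic averages and need not match the slide's Average column digit for digit.

```python
# Averaged cross-epoch F1 per topic, copied from the slide:
# (Disaster_Acc, Law_Crime, War_Conflict).
cross_epoch_f1 = {
    "BoW": (0.458, 0.620, 0.531),
    "Category": (0.582, 0.537, 0.453),
    "Property": (0.574, 0.504, 0.506),
    "Resource": (0.587, 0.578, 0.466),
    "Class": (0.675, 0.647, 0.664),
}

# Arithmetic mean over the three topics for each feature type.
means = {name: sum(scores) / len(scores)
         for name, scores in cross_epoch_f1.items()}

# Feature types ordered from highest to lowest mean F1.
ranking = sorted(means, key=means.get, reverse=True)
best = ranking[0]

for name in ranking:
    print(f"{name:<9} {means[name]:.3f}")
```

Class features come out on top of the ranking, in line with the conclusion that they give the most stable representation across epochs.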

30  

Conclusions

 

•  Semantic features decay much more slowly than lexical features.
•  Semantic representations improve performance in cross-epoch scenarios.
•  Class-based features alone achieve, on average, a gain of 7% over lexical features in cross-epoch scenarios.

31  

Future Work

 

•  Concept-drift tracking for transfer learning using Linked Data sources.
•  Study cross-epoch transfer learning approaches using semantic features.

32  

Questions

 

ampaeli@gmail.com @pixarelli

33  
