Is this NE tagger getting old? Language Resources and Evaluation Conference, Marrakech, Morocco - May 28th - 30th 2008. Cristina Mota and Ralph Grishman, IST & L2F INESC-ID (Portugal) & NYU (USA) and New York University (USA) (Advisors: Ralph Grishman & Nuno Mamede). This research was funded by Fundação para a Ciência e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000).
Presentation at LREC 2008 of Cristina Mota & Ralph Grishman (2008). "Is this NE tagger getting old?". In Proc. of the 6th International Conference on Language Resources and Evaluation (LREC 2008) (Marrakech, 28-30 May 2008).
Is this NE tagger getting old?
Language Resources and Evaluation Conference
Marrakech, Morocco - May 28th - 30th 2008
Cristina Mota and Ralph Grishman
IST & L2F INESC-ID (Portugal) & NYU (USA)
and
New York University (USA)
(Advisors: Ralph Grishman & Nuno Mamede)
This research was funded by Fundação para a Ciência e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000)
Outline
1 Introduction
2 Corpus Analysis
3 NER Performance Analysis
4 Experiments
5 Final Remarks
1 Introduction
What is NER?
Mary is studying in Rabat at Mohammed V University
→ NE Tagger →
Mary[PER] is studying in Rabat[LOC] at Mohammed V University[ORG]
The Problem
[Figure: occurrences per 100K words of the names UE, CEE, União Europeia and Comunidade Europeia, by time frame (semester), 91a to 98a]
Do texts vary over time in a way that affects NE recognition?
Should NE taggers also be designed to be time-aware?
Approach
Corpus Analysis
Measure corpus similarity based on
Words
Compute name list overlaps
By type
By token
NER Performance Analysis
Assess performance by training and testing with different configurations (train, test)
Increase time gap between training and test data
2 Corpus Analysis
Corpus Similarity Algorithm (Kilgarriff, 2001)
Similarity(A, B):
  Split corpora A and B into k slices each
  Repeat m times:
    Randomly allocate k/2 slices to Ai and k/2 slices to Bi
    Construct word frequency lists for Ai and Bi
    Compute the CBDF between Ai and Bi for the n most frequent words of the joint corpus (Ai + Bi) [CBDF = χ² by degrees of freedom]
  Output the mean and standard deviation of the CBDF over all m experiments
Repeat using corpus A only: Similarity(A, A) → Homogeneity(A)
Repeat using corpus B only: Similarity(B, B) → Homogeneity(B)
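The procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the function names are hypothetical, each slice is assumed to be a non-empty list of tokens, expected counts are assumed to be proportional to the sizes of the two sub-corpora, the degrees of freedom are assumed to be n - 1, and the population standard deviation is used.

```python
import random
import statistics
from collections import Counter


def cbdf(tokens_a, tokens_b, n=500):
    """Chi-square by degrees of freedom over the n most frequent joint words."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    size_a, size_b = len(tokens_a), len(tokens_b)  # both assumed non-zero
    joint = freq_a + freq_b
    top_words = [word for word, _ in joint.most_common(n)]
    chi2 = 0.0
    for word in top_words:
        for observed, size in ((freq_a[word], size_a), (freq_b[word], size_b)):
            # Expected count if the word were spread proportionally to size.
            expected = size * joint[word] / (size_a + size_b)
            chi2 += (observed - expected) ** 2 / expected
    # Assumption of this sketch: degrees of freedom = (number of words) - 1.
    return chi2 / max(len(top_words) - 1, 1)


def _flatten(slices):
    return [token for piece in slices for token in piece]


def _summarize(scores):
    return statistics.mean(scores), statistics.pstdev(scores)


def similarity(slices_a, slices_b, m=10, n=500, seed=0):
    """Mean and standard deviation of CBDF between random halves of A and B."""
    rng = random.Random(seed)
    scores = []
    for _ in range(m):
        half_a = rng.sample(slices_a, len(slices_a) // 2)
        half_b = rng.sample(slices_b, len(slices_b) // 2)
        scores.append(cbdf(_flatten(half_a), _flatten(half_b), n))
    return _summarize(scores)


def homogeneity(slices, m=10, n=500, seed=0):
    """Same measure within one corpus: compare two disjoint random halves."""
    rng = random.Random(seed)
    scores = []
    for _ in range(m):
        shuffled = rng.sample(slices, len(slices))
        half = len(shuffled) // 2
        scores.append(cbdf(_flatten(shuffled[:half]), _flatten(shuffled[half:]), n))
    return _summarize(scores)
```

Splitting homogeneity into its own function keeps the two halves disjoint, which a literal call of `similarity(slices, slices)` in this sketch would not guarantee.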
Corpus Similarity Algorithm (Kilgarriff, 2001)
Corpus A: D_AA'1, D_AA'2, ..., D_AA'n → D_AA' = Homogeneity(A)
1/2 Corpus A + 1/2 Corpus B: D_AB'1, D_AB'2, ..., D_AB'n → D_AB = Similarity(A, B)
Corpus B: D_BB'1, D_BB'2, ..., D_BB'n → D_BB' = Homogeneity(B)
Lower values of D ⇒ higher homogeneity/similarity
Name List Overlaps
$$\text{type overlap} = \frac{|T_A \cap T_B|}{|T_A| + |T_B| - |T_A \cap T_B|} \tag{1}$$
$$\text{token overlap} = \frac{\sum_{i=1}^{N} \min(f_A(i), f_B(i))}{\sum_{i=1}^{N} \max(f_A(i), f_B(i))} \tag{2}$$
$T_A$ = list of different names (name types) of text A
$f_A(i)$ = frequency of name i in text A
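The two overlap measures can be sketched in Python as below. The function names and the input format (one list entry per name occurrence) are assumptions of this sketch, as is the reading that N ranges over every name found in either text; the sample names are made up for illustration.

```python
from collections import Counter


def type_overlap(names_a, names_b):
    """Equation (1): shared name types over all distinct name types."""
    types_a, types_b = set(names_a), set(names_b)
    shared = len(types_a & types_b)
    total = len(types_a) + len(types_b) - shared
    return shared / total if total else 0.0


def token_overlap(names_a, names_b):
    """Equation (2): sum of minimum frequencies over sum of maximum frequencies."""
    freq_a, freq_b = Counter(names_a), Counter(names_b)
    all_names = set(freq_a) | set(freq_b)  # assumed range of the index i
    numerator = sum(min(freq_a[name], freq_b[name]) for name in all_names)
    denominator = sum(max(freq_a[name], freq_b[name]) for name in all_names)
    return numerator / denominator if denominator else 0.0


# Hypothetical name occurrences, for illustration only.
text_a = ["Lisboa", "Lisboa", "Porto"]
text_b = ["Lisboa", "Braga"]
print(type_overlap(text_a, text_b), token_overlap(text_a, text_b))
```

A missing name counts as frequency 0 because `Counter` returns 0 for absent keys, so names found in only one text add to the denominator but not to the numerator.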
Name List Overlaps
A name list: Mary (3), Rabat (5), Mohammed V University (4)
B name list: John (1), Rabat (2), Mohammed V University (6)
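Applying the two overlap definitions to these lists gives the worked values below. The computation assumes that the sum in the token overlap ranges over every name found in either list (Mary, Rabat, Mohammed V University, John, in that order) and that Mohammed V University is the same name in both lists.

```latex
\[
\text{type overlap}
  = \frac{|\{\text{Rabat},\ \text{Mohammed V University}\}|}{3 + 3 - 2}
  = \frac{2}{4} = 50\%
\]
\[
\text{token overlap}
  = \frac{\min(3,0) + \min(5,2) + \min(4,6) + \min(0,1)}
         {\max(3,0) + \max(5,2) + \max(4,6) + \max(0,1)}
  = \frac{0 + 2 + 4 + 0}{3 + 5 + 6 + 1}
  = \frac{6}{15} = 40\%
\]
```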
3 NER Performance Analysis
NE Tagger Description (Collins & Singer, 1999)
Raw TEXT
→ POS Tagging + Parsing → Shallow Parsed TEXT
→ NE Identification → TEXT with unclassified NE + List of Examples (NE, context)
→ NE Classification (using Name seeds) → List of Labeled Examples (NE, context, label)
→ Text Update + NE Propagation → TEXT with classified NE
Classification in detail:
Name Rules :- Name seeds
Label with Name Rules
Infer Contextual Rules
Label with Contextual Rules
Infer Name Rules
Label with Name + Contextual Rules
List of Labeled Examples (NE, Context, Label)
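The classification loop listed above can be sketched as follows. This is a heavily simplified illustration and not the tagger of Collins & Singer (1999): rules here are exact-match lookups kept when their precision on the currently labeled examples reaches a threshold. The function names, the 0.95 threshold, and the fixed number of rounds are assumptions of this sketch.

```python
from collections import Counter, defaultdict


def infer_rules(examples, labels, feature_index, threshold=0.95):
    """Induce feature -> label rules from the currently labeled examples."""
    counts = defaultdict(Counter)
    for example, label in zip(examples, labels):
        if label is not None:
            counts[example[feature_index]][label] += 1
    rules = {}
    for feature, label_counts in counts.items():
        label, hits = label_counts.most_common(1)[0]
        if hits / sum(label_counts.values()) >= threshold:
            rules[feature] = label
    return rules


def apply_rules(examples, rules, feature_index):
    """Label each example whose feature matches a rule; None otherwise."""
    return [rules.get(example[feature_index]) for example in examples]


def cotrain(examples, seeds, rounds=5):
    """examples: list of (name, context) pairs; seeds: name -> label."""
    name_rules = dict(seeds)  # Name Rules := Name seeds
    context_rules = {}
    for _ in range(rounds):
        labels = apply_rules(examples, name_rules, 0)       # label with name rules
        for feature, label in infer_rules(examples, labels, 1).items():
            context_rules.setdefault(feature, label)        # infer contextual rules
        labels = apply_rules(examples, context_rules, 1)    # label with contextual rules
        for feature, label in infer_rules(examples, labels, 0).items():
            name_rules.setdefault(feature, label)           # infer name rules
    # Final pass: label with name rules first, then contextual rules.
    return [
        (name, context, name_rules.get(name) or context_rules.get(context))
        for name, context in examples
    ]


# Hypothetical examples and seeds, for illustration only.
examples = [("Rabat", "in"), ("Lisbon", "in"), ("Mary", "said"),
            ("John", "said"), ("Lisbon", "from")]
labeled = cotrain(examples, seeds={"Rabat": "LOC", "Mary": "PER"})
```

Using `setdefault` means a rule, once learned, is never overwritten in later rounds, so the seeds always keep their labels.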
4 Experiments
Experimental Setting
[Figure: number of words (0 to 1e+07) per time frame (semester), 91a to 98a, by topic: Culture, Sports, Economy, Politics, Society]
CETEMPúblico (Santos & Rocha, 2001) is a public Portuguese journalistic corpus
Politics Name List Overlap over Time
[Figure, two panels: name type overlap (%) and name token overlap (%), each plotted against time gap (year), 0 to 7]
Within the same time frame, the name type overlap varies between 5% and 6%
At a distance of 5 years, it varies between 3.5% and 4.5%
Within the same year, the name token overlap varies between 4.2% and 4.4%
At a distance of 5 years, it varies between 3.2% and 3.7%
Overlap between name lists also decreases over time
Corpus comparisons: (U_i, T_j), i = 91..98, j = 91..98 [64 comparisons]
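The 64 comparisons can be enumerated and grouped by time gap as in the short sketch below. The function name is hypothetical, and taking the gap as the absolute difference |i - j| in years is an assumption of this sketch, suggested by the 0 to 7 range of the time-gap axis.

```python
from collections import defaultdict

YEARS = range(91, 99)  # 91..98, as on the slide


def comparisons_by_gap(years=YEARS):
    """Group every (i, j) pair of years by the absolute gap |i - j|."""
    by_gap = defaultdict(list)
    for i in years:
        for j in years:
            by_gap[abs(i - j)].append((i, j))
    return dict(by_gap)


groups = comparisons_by_gap()
total = sum(len(pairs) for pairs in groups.values())
for gap in sorted(groups):
    print(gap, len(groups[gap]))
```

Larger gaps are backed by fewer pairs (8 pairs at gap 0, only 2 at gap 7), which is worth keeping in mind when reading averages at the largest gaps.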
F-Measure compared to Dissimilarity
[Figure: F-measure (%), 0.79 to 0.85, plotted against dissimilarity (= mean CBDF), 1 to 6]
There is an inverse association between dissimilarity and F-measure: for higher levels of dissimilarity (i.e., higher distance values) we obtain lower performance values
Note: higher values = lower similarity
5 Final Remarks
Main Results
Within a period of 8 years we observed that:
Corpus similarity and name overlaps tend to decrease as the two corpora become more temporally distant
The performance of a co-training based NE tagger trained and tested on those texts shows a decay as we increase the time gap between the training and the test data
There is an association between the results of the corpus analysis and the tagger performance
Work in Progress
Other related issues we are currently investigating, aiming at better named entity recognition:
Analyze the contexts surrounding NEs to verify whether they also tend to overlap less over time
Investigate how we can avoid the performance decay
Do we need more data?
Do we need more labeled data within the same time frame?
Do we need more unlabeled data within the same time frame?