Page 1: TIMEN: An Open Temporal Expression Normalisation Resource

TIMEN: An Open Temporal Expression Normalisation Resource

H.Llorens, L.Derczynski, R.Gaizauskas, E. Saquete

Page 2

Outline
● Introduction: timex normalisation
● Related work
● Problem: reinventing the wheel over and over again
● Proposal: TIMEN
● Evaluation
● Conclusions
● Further work

Page 3

Timex Normalisation

A temporal information extraction subtask.

Timex: linguistic expression of a time point or interval.

Normalisation: semantic interpretation of timexes, i.e. mapping each one to a standard value with an invariant interpretation.

Temporal expression (TIMEX)          Timex normalisation
(linguistic; variable; relative)     (ISO 8601; invariant interpretation)
June 2012, next month, 06/2012       2012-06
this morning 7 a.m.                  2012-05-24T07:00
3 days and 3 hours                   P3DT3H
weekly                               XXXX-XX-WXX
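As a rough illustration of the target representation, the helpers below format values at three of the granularities in the table: a month, a date-time to the minute, and a duration. They are a hypothetical sketch, not part of TIMEN. Note that ISO 8601 durations place hour and minute components after a "T" designator, which is why "3 days and 3 hours" becomes P3DT3H.

```python
from datetime import datetime


def iso_month(year: int, month: int) -> str:
    """Month granularity, e.g. 'June 2012' -> '2012-06'."""
    return f"{year:04d}-{month:02d}"


def iso_minute(dt: datetime) -> str:
    """Date-time to minute granularity, e.g. '2012-05-24T07:00'."""
    return dt.strftime("%Y-%m-%dT%H:%M")


def iso_duration(days: int = 0, hours: int = 0, minutes: int = 0) -> str:
    """ISO 8601 duration: day part first, time components after 'T'."""
    date_part = f"{days}D" if days else ""
    time_part = "".join(
        f"{value}{unit}" for value, unit in ((hours, "H"), (minutes, "M")) if value
    )
    if not date_part and not time_part:
        return "PT0M"  # zero-length duration
    return "P" + date_part + ("T" + time_part if time_part else "")
```

For example, iso_duration(days=3, hours=3) yields "P3DT3H", while iso_duration(hours=2) yields "PT2H".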

Page 4

Timex Normalisation (II)

Useful for a variety of NLP applications: IR, QA, summarisation, etc.

Example: "I went to the cinema yesterday." ("went" is the event; "yesterday" is the timex, value 2012-05-23)
Q: "When did he go to the cinema?" A: 2012-05-23

The main advantage of normalisation is having timexes in standard time representations (e.g., the Gregorian calendar).
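Once values are normalised they can be compared, sorted and subtracted directly, which is what makes them useful for QA-style questions. A minimal sketch, assuming an upstream step has already paired each event with the normalised value of its timex; the second event and all helper names are hypothetical.

```python
from datetime import date

# Hypothetical (event, normalised timex value) pairs from an upstream annotator.
annotations = [
    ("went to the cinema", "2012-05-23"),
    ("bought the tickets", "2012-05-20"),
]


def when(event, pairs):
    """Answer 'When did <event> happen?' with its normalised value, if known."""
    for name, value in pairs:
        if name == event:
            return value
    return None


def chronological(pairs):
    """Order events by their normalised dates (ISO dates parse directly)."""
    ordered = sorted(pairs, key=lambda pair: date.fromisoformat(pair[1]))
    return [name for name, _ in ordered]


def days_between(first, second, pairs):
    """Date arithmetic works on normalised values, not on raw 'yesterday'."""
    start = date.fromisoformat(when(first, pairs))
    end = date.fromisoformat(when(second, pairs))
    return (end - start).days
```

None of these operations would be possible on the raw strings "yesterday" or "last Sunday"; they only work because both timexes share one representation.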

Page 5

Related Work

There are many approaches to timex normalisation:
● Pre-TempEval-2
  ○ TempEx (2000), GUTime (2005), Chronos (2004), TERSEO (2005), TimexTag (2005), TEA (2006), DANTE (2007)...
● TempEval-2 (2010)
  ○ HeidelTime, TRIPS/TRIOS, TIPSem/TIPSemB...

Page 6

Similarities and differences
● Approaches have slightly different architectures and show slightly different performance on tests.
● But all the approaches are rule-based and, in general, use the same normalisation strategies.
● They also require the same parameters to perform the task:
  ○ DCT: document creation time (deictic), e.g. "2 days ago": 2012-05-22
  ○ Reference time: the time being talked about (anaphoric), e.g. "2 days before": 2012-05-20
  ○ Tense: resolution direction, e.g. "October": past 2011-10; present/future 2012-10
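The three parameters can be illustrated with a few date calculations. This is a sketch under stated assumptions, not TIMEN's code: the function names are hypothetical, the DCT is taken as 2012-05-24, the reference time of 2012-05-22 for "2 days before" is inferred from the example values, and the tense rule simply picks the nearest matching month before the DCT (past) or from the DCT's month onward (present/future).

```python
from datetime import date, timedelta


def days_ago(n: int, dct: date) -> str:
    """Deictic: counted back from the document creation time (DCT)."""
    return (dct - timedelta(days=n)).isoformat()


def days_before(n: int, reference: date) -> str:
    """Anaphoric: counted back from the time currently being talked about."""
    return (reference - timedelta(days=n)).isoformat()


def bare_month(month: int, dct: date, tense: str) -> str:
    """Resolve a month with no year, using tense as the search direction."""
    if tense == "past":
        # Most recent occurrence strictly before the DCT's month.
        year = dct.year if month < dct.month else dct.year - 1
    else:
        # Present/future: next occurrence from the DCT's month onward.
        year = dct.year if month >= dct.month else dct.year + 1
    return f"{year:04d}-{month:02d}"


dct = date(2012, 5, 24)
```

With this DCT, bare_month(10, dct, "past") gives "2011-10" and bare_month(10, dct, "future") gives "2012-10", matching the "October" example.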

Page 7

The problem

Reinventing the wheel over and over again
● Implementing a high-performance approach is costly, and it is done from scratch every time.
● All the approaches are similar: rule-based, with similar normalisation rules and strategies.
● None is meant to be reused and refined by others.

Page 8

Proposal: TIMEN

Characteristics:

● Open philosophy: meant to be reused and refined (even across languages)

● Not only meant for computer scientists:

  ○ The algorithms (source code) and the normalisation rules (a database of user-friendly rules with a documented syntax) are kept separate.

● Independent from other timex processing tasks

● Multi-platform and easy integration

Page 9

TIMEN Library Architecture

Example 1:
  timex: three days ago
  DCT: 2012-05-24
  normtext: 3_day_ago
  pattern: Num_TUnit_ago
  Only 1 rule matches.
  Normalised value: 2012-05-21

Example 2:
  timex: October 20
  2 rules match, so disambiguation is needed: 20 is probably a day rather than a year, because it is < 32.
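The two examples suggest a pipeline: build a normtext, abstract it into a pattern, look up the rules for that pattern, and disambiguate when several match. The sketch below mimics that flow only. The lexicons, the (pattern, condition, action) rule format and the function names are hypothetical stand-ins for TIMEN's actual rule base and syntax, and the month-day rule naively takes the year from the DCT instead of applying tense.

```python
from datetime import date, timedelta

# Hypothetical lexicons standing in for language-specific resources.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
TIME_UNITS = {"day": "day", "days": "day", "week": "week", "weeks": "week"}
MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}


def normtext(timex):
    """Lower-case, map number words to digits, lemmatise units, join with '_'."""
    tokens = []
    for tok in timex.lower().split():
        if tok in NUMBER_WORDS:
            tokens.append(str(NUMBER_WORDS[tok]))
        else:
            tokens.append(TIME_UNITS.get(tok, tok))
    return "_".join(tokens)


def pattern(norm):
    """Abstract a normtext into a sequence of token classes."""
    classes = []
    for tok in norm.split("_"):
        if tok.isdigit():
            classes.append("Num")
        elif tok in TIME_UNITS.values():
            classes.append("TUnit")
        elif tok in MONTHS:
            classes.append("Month")
        else:
            classes.append(tok)
    return "_".join(classes)


def _ago(tokens, dct):
    n, unit = int(tokens[0]), tokens[1]
    delta = timedelta(weeks=n) if unit == "week" else timedelta(days=n)
    return (dct - delta).isoformat()


def _month_day(tokens, dct):
    # Naive: year taken from the DCT, tense direction ignored.
    return date(dct.year, MONTHS[tokens[0]], int(tokens[1])).isoformat()


def _month_year(tokens, dct):
    return f"{int(tokens[1]):04d}-{MONTHS[tokens[0]]:02d}"


# Hypothetical rule base kept as data, separate from the matching code.
RULES = [
    ("Num_TUnit_ago", lambda t: True, _ago),
    ("Month_Num", lambda t: int(t[1]) < 32, _month_day),   # 20: a day
    ("Month_Num", lambda t: int(t[1]) >= 32, _month_year),  # 1999: a year
]


def normalise(timex, dct):
    norm = normtext(timex)
    tokens = norm.split("_")
    candidates = [rule for rule in RULES if rule[0] == pattern(norm)]
    # Disambiguation: take the first candidate whose condition holds.
    for _, condition, action in candidates:
        if condition(tokens):
            return action(tokens, dct)
    return None  # no rule for this pattern
```

Keeping RULES as plain data mirrors the slide's point that rules and algorithms are separate: adding a pattern means adding a row, not changing normalise().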

Page 10

Rule base sample (English)

Page 11

TIMEN integration

Page 12

TIMEN community
● Open-source software: http://code.google.com/p/timen/
● Crowd extension of the rule set (interactive web interface to upload and check new rules): http://timen.org

* New rules are only accepted if they improve performance on the current dataset or on new examples (human reviewed), e.g. New Year's Eve.
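The acceptance criterion can be sketched as a before/after comparison on an evaluation set. Everything here is hypothetical: rules are modelled as a dictionary from lower-cased phrases to functions of the DCT, the gold value for "New Year's Eve" is an invented illustration, and the human review step is not modelled.

```python
from datetime import date, timedelta

# Hypothetical rule base: lower-cased phrase -> function of the DCT.
rules = {"yesterday": lambda dct: (dct - timedelta(days=1)).isoformat()}

# Hypothetical evaluation set: (timex, DCT, gold value).
dataset = [
    ("yesterday", date(2012, 5, 24), "2012-05-23"),
    ("New Year's Eve", date(2012, 5, 24), "2012-12-31"),
]


def accuracy(rule_base, data):
    """Share of timexes whose rule output equals the gold value."""
    correct = 0
    for timex, dct, gold in data:
        rule = rule_base.get(timex.lower())
        if rule is not None and rule(dct) == gold:
            correct += 1
    return correct / len(data)


def accept(rule_base, phrase, rule, data):
    """Accept a candidate rule only if it strictly improves accuracy."""
    extended = {**rule_base, phrase: rule}
    return accuracy(extended, data) > accuracy(rule_base, data)
```

A submitted rule mapping "new year's eve" to f"{dct.year}-12-31" raises accuracy from 0.5 to 1.0 and is accepted; a wrong one leaves accuracy unchanged and is rejected.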

Page 13

Evaluation

Experiments:
● Normalisation accuracy of TIMEN
● Performance gain in state-of-the-art approaches by integrating TIMEN

Datasets:
● TempEval-2 test set (already known to the approaches; mainly common dates and durations)
● TimenEval dataset (new and unknown to the approaches; balanced among different timex types)

Page 14

Normalisation accuracy

Gold timexes   TIMEN normalisation   Judgement
yesterday      2012-05-23            correct
2012           2012                  correct
October        2012-10               incorrect
daily          xxxx-xx-xx            correct
morning        2011                  incorrect
...            ...                   ...

e.g. TOTAL: 100 timexes to normalise; 90 correct normalisations
RESULT: 90/100 --> 90% ACCURACY

Page 15

Normalisation accuracy
● TIMEN shows high performance even in this first version (only 76 rules).
● Accuracy on TimenEval is lower: this corpus is more heterogeneous (it includes times and sets), so normalisation is more difficult.

Test set      Normalisation accuracy
TempEval-2    0.90
TimenEval     0.68

Page 16

Performance gain

The timexes recognised by Approach X are normalised in two ways:
● Original normalisation: the built-in normalisation of Approach X
● New normalisation: TIMEN

Performance gain = New accuracy - Original accuracy
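The setup amounts to a small evaluation harness in which both normalisers receive exactly the same recognised timexes, so any difference in accuracy is attributable to normalisation alone. A sketch with hypothetical names; a normaliser is assumed to be any function from timex text to a value string.

```python
from typing import Callable, Sequence

Normaliser = Callable[[str], str]


def accuracy(normalise: Normaliser, timexes: Sequence[str],
             gold: Sequence[str]) -> float:
    """Share of recognised timexes whose value exactly matches the gold value."""
    if len(timexes) != len(gold) or not gold:
        raise ValueError("need one gold value per recognised timex")
    correct = sum(normalise(t) == g for t, g in zip(timexes, gold))
    return correct / len(gold)


def performance_gain(original: Normaliser, new: Normaliser,
                     timexes: Sequence[str], gold: Sequence[str]) -> float:
    """New accuracy - original accuracy, on the same recognised timexes."""
    return accuracy(new, timexes, gold) - accuracy(original, timexes, gold)
```

Fixing the recognised timexes as input is the design choice that matters here: recognition errors of Approach X affect both normalisers equally and cancel out of the gain.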

Page 17

Performance gain (TempEval-2): "known data"
● Replacing the systems' built-in normalisation with TIMEN improves accuracy on the TE2 test set for TIPSemB and TERNIP and leaves HeidelTime unchanged.
● The tested (current) versions of the systems may have been developed or updated with knowledge of this data. What happens with data that is new to them?

System       Built-in norm.   TIMEN norm.   Error reduction
TIPSemB      0.83             0.89          35%
HeidelTime   0.94             0.94          0%
TERNIP       0.76             0.92          66%
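The error-reduction column is consistent with the usual relative error reduction, i.e. the share of the built-in normaliser's errors that TIMEN removes: (new - old) / (1 - old). For TIPSemB, (0.89 - 0.83) / (1 - 0.83) = 0.06 / 0.17, about 35%. A small helper, offered as a sketch; because the accuracies are rounded to two decimals, recomputed percentages can differ from the table by about a point.

```python
def error_reduction(original_acc: float, new_acc: float) -> float:
    """Relative error reduction: fraction of the original errors removed."""
    if not 0.0 <= original_acc < 1.0:
        raise ValueError("original accuracy must be in [0, 1)")
    original_error = 1.0 - original_acc
    new_error = 1.0 - new_acc
    return (original_error - new_error) / original_error
```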

Page 18

Performance gain (TimenEval): "new data"
● On new data, the accuracy of the built-in normalisers decreases for all three systems compared with TempEval-2.
● TIMEN improves normalisation accuracy for all the systems.

System       Built-in norm.   TIMEN norm.   Error reduction
TIPSemB      0.57             0.67          23%
HeidelTime   0.72             0.74          7%
TERNIP       0.70             0.72          7%

Page 19

Conclusions
● We presented an open tool for timex normalisation: TIMEN.
● ADVANTAGES:
  ○ High performance (matching or improving on the built-in normalisation of recent systems).
  ○ Easily integrated in any timex recognition approach.
  ○ Can be improved by the community (open philosophy), which avoids re-development from scratch.
  ○ Available at http://timen.org and on Google Code.

Page 20

Further Work
● Community-based extension and refinement of TIMEN (rule base).
● Extensive evaluation of TIMEN in various languages (Spanish, Chinese, Italian and Danish).

Page 21

TIMEN: An Open TIMEX Normalisation Resource

THANK YOU! QUESTIONS?

http://timen.org

H.Llorens, L.Derczynski, R.Gaizauskas, E. Saquete

