TIMEN An Open Temporal Expression Normalisation Resource H.Llorens, L.Derczynski, R.Gaizauskas, E. Saquete

TIMEN: An Open Temporal Expression Normalisation Resource

Jan 26, 2015

Leon Derczynski

We present TIMEN, a resource for building and sharing knowledge and rules for the TimeML temporal expression normalization subtask, that is, the generation of a TIMEX3 annotation from a linguistic temporal expression. It provides a strong basis, built from the current best approaches, that is independent of the other temporal expression processing subtasks. It can therefore be easily integrated as a module in temporal information processing systems.

Because it is open, it can be used, improved and extended by the community, in contrast to closed tools, which must be replicated from scratch as the field advances. Furthermore, TIMEN eases the development of normalization knowledge and rules for low-resource languages, since the normalization process is partially shared between languages.
Transcript
Page 1: TIMEN: An Open Temporal Expression Normalisation Resource

TIMEN: An Open Temporal Expression Normalisation Resource

H.Llorens, L.Derczynski, R.Gaizauskas, E. Saquete

Page 2: TIMEN: An Open Temporal Expression Normalisation Resource

Outline
● Introduction: Timex normalisation
● Related work
● Problem: reinventing the wheel over and over again
● Proposal: TIMEN
● Evaluation
● Conclusions
● Further Work

Page 3: TIMEN: An Open Temporal Expression Normalisation Resource

Timex Normalisation

Temporal information extraction subtask.

Timex: linguistic expression of a time point or interval.

Normalisation: semantic interpretation of timexes.

Temporal Expression (TIMEX): linguistics, variability, relativity
Timex normalization: ISO 8601, invariable interpretation

● June 2012, next month, 06/2012 --> 2012-06
● this morning 7 a.m. --> 2012-05-24T07:00
● 3 days and 3 hours --> PT3D3H
● weekly --> XXXX-XX-WXX

Page 4: TIMEN: An Open Temporal Expression Normalisation Resource

Timex Normalisation (II)

Useful for a variety of NLP applications: IR, QA, summarization, etc.

I went to the cinema yesterday.
When did he go to the cinema? 2012-05-23

event: went; timex: yesterday
Value: 2012-05-23

The main advantage of normalisation is having timexes in standard time representations (e.g., the Gregorian calendar).

Page 5: TIMEN: An Open Temporal Expression Normalisation Resource

Related Work

There are many approaches to timex normalisation:
● Pre TempEval-2
○ TempEx (2000), GUTime (2005), Chronos (2004), TERSEO (2005), TimexTag (2005), TEA (2006), DANTE (2007)...
● TempEval-2 (2010)
○ HeidelTime, TRIPS/TRIOS, TIPSem/TIPSemB...

Page 6: TIMEN: An Open Temporal Expression Normalisation Resource

Similarities and differences
● Approaches have slightly different architectures and show slightly different performance on tests.
● But all the approaches are rule-based, and in general they use the same normalization strategies.
● They also require the same parameters to perform the task:
○ DCT: document creation time (deictic), e.g. 2 days ago: 2012-05-22
○ Reference time: the time being talked about (anaphoric), e.g. 2 days before: 2012-05-20
○ Tense: resolution direction, e.g. October: past (2011-10), present/future (2012-10)
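The three parameters can be illustrated with a minimal Python sketch. It is not TIMEN code: the function names are hypothetical, the DCT of 2012-05-24 is taken from the later architecture example, the reference time of 2012-05-22 is an assumption inferred from the "2 days before" value, and the tense rule is a deliberately simplified guess at how the resolution direction might work.

```python
from datetime import date, timedelta

def resolve_days_offset(anchor: date, days: int) -> str:
    """Shift an anchor date by a signed number of days; return an ISO 8601 date."""
    return (anchor + timedelta(days=days)).isoformat()

def resolve_month(month: int, dct: date, tense: str) -> str:
    """Choose the year of a bare month name from the sentence tense.

    Simplifying assumption: past tense points to the latest such month
    before the DCT month; any other tense points to the same or next one.
    """
    if tense == "past":
        year = dct.year if month < dct.month else dct.year - 1
    else:
        year = dct.year if month >= dct.month else dct.year + 1
    return f"{year:04d}-{month:02d}"

dct = date(2012, 5, 24)             # document creation time
reference_time = date(2012, 5, 22)  # assumed time being talked about

deictic = resolve_days_offset(dct, -2)               # "2 days ago"
anaphoric = resolve_days_offset(reference_time, -2)  # "2 days before"
october_past = resolve_month(10, dct, "past")        # "October", past tense
october_future = resolve_month(10, dct, "future")    # "October", future tense

print(deictic, anaphoric, october_past, october_future)
```

The same surface offset ("2 days") yields different values depending on whether it is anchored to the DCT or to the reference time, which is why a normaliser needs both.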

Page 7: TIMEN: An Open Temporal Expression Normalisation Resource

The problem

Reinventing the wheel over and over again
● Implementing high-performance approaches is costly, and it is done from scratch every time.
● All the approaches are similar: rule-based, with similar normalization rules and strategies.
● None is meant to be reused and refined by others.

Page 8: TIMEN: An Open Temporal Expression Normalisation Resource

Proposal: TIMEN

Characteristics:

● Open philosophy: meant to be reused and refined (even across languages)

● Not only meant for computer scientists:

○ the algorithms (source code) and normalisation rules (db of user-friendly rules with a documented syntax) are separated.

● Independent from other timex processing tasks

● Multi-platform and easy integration

Page 9: TIMEN: An Open Temporal Expression Normalisation Resource

TIMEN Library Architecture

Example:
timex: three days ago
DCT: 2012-05-24
normtext: 3_day_ago
pattern: Num_TUnit_ago
Only 1 rule matches.
normalized value: 2012-05-21

Example 2:
timex: October 20
2 rules match, so disambiguation is needed.
20 is probably a day rather than a year because it is < 32.
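The two examples can be traced with a small Python sketch of the pipeline (timex, normtext, pattern, rule, value). This is an illustration under assumptions, not the TIMEN implementation: the lexicons, the rule table, the `Month_Num` pattern name and all function names are hypothetical, and using the DCT year for "October 20" is a simplification that ignores tense.

```python
from datetime import date, timedelta

# Tiny hypothetical lexicons; a real rule base would be far larger.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3}
UNIT_DAYS = {"day": 1, "week": 7}
MONTHS = {"october": 10}

def normalise_text(timex):
    """'three days ago' -> '3_day_ago' (lowercase, digits, singular units)."""
    out = []
    for tok in timex.lower().split():
        if tok in NUMBER_WORDS:
            out.append(str(NUMBER_WORDS[tok]))
        elif tok.endswith("s") and tok[:-1] in UNIT_DAYS:
            out.append(tok[:-1])
        else:
            out.append(tok)
    return "_".join(out)

def to_pattern(normtext):
    """'3_day_ago' -> 'Num_TUnit_ago' (tokens replaced by their classes)."""
    classes = []
    for tok in normtext.split("_"):
        if tok.isdigit():
            classes.append("Num")
        elif tok in UNIT_DAYS:
            classes.append("TUnit")
        elif tok in MONTHS:
            classes.append("Month")
        else:
            classes.append(tok)
    return "_".join(classes)

def days_ago(tokens, dct):
    num, unit, _ = tokens
    return (dct - timedelta(days=int(num) * UNIT_DAYS[unit])).isoformat()

def month_day(tokens, dct):
    month, day = tokens
    return f"{dct.year:04d}-{MONTHS[month]:02d}-{int(day):02d}"

def month_year(tokens, dct):
    month, year = tokens
    return f"{int(year):04d}-{MONTHS[month]:02d}"

# Each pattern maps to candidate rules as (guard, action) pairs.
# When several rules match, the first rule whose guard holds is applied.
RULES = {
    "Num_TUnit_ago": [(lambda t: True, days_ago)],
    "Month_Num": [
        (lambda t: int(t[1]) < 32, month_day),  # small number: day of month
        (lambda t: True, month_year),           # otherwise: a year
    ],
}

def normalise(timex, dct):
    normtext = normalise_text(timex)
    tokens = normtext.split("_")
    for guard, action in RULES.get(to_pattern(normtext), []):
        if guard(tokens):
            return action(tokens, dct)
    raise ValueError(f"no rule matches {timex!r}")

dct = date(2012, 5, 24)
print(normalise("three days ago", dct))  # 2012-05-21
print(normalise("October 20", dct))      # 2012-10-20 under the DCT-year assumption
```

Keeping the rule table as data, separate from the matching code, mirrors the separation of algorithms and rules that the proposal describes.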

Page 10: TIMEN: An Open Temporal Expression Normalisation Resource

Rule base sample (English)

Page 11: TIMEN: An Open Temporal Expression Normalisation Resource

TIMEN integration

Page 12: TIMEN: An Open Temporal Expression Normalisation Resource

TIMEN community
● Open-source software: http://code.google.com/p/timen/
● Crowd extension of the rule set (interactive web interface to upload and check new rules): http://timen.org

* New rules are only accepted if they improve the performance on the current dataset or on new examples (human reviewed). E.g.: New Year's Eve

Page 13: TIMEN: An Open Temporal Expression Normalisation Resource

Evaluation

Experiments:
● Normalization accuracy of TIMEN
● Performance gain in state-of-the-art approaches by integrating TIMEN

Datasets:
● TempEval-2 test set (already known to the approaches; mainly common dates and durations)
● TimenEval dataset (new, unknown to the approaches, balanced among different timex types)

Page 14: TIMEN: An Open Temporal Expression Normalisation Resource

Normalisation accuracy

gold timexes   TIMEN normalisation   result
yesterday      2012-05-23            correct
2012           2012                  correct
October        2012-10               incorrect
daily          xxxx-xx-xx            correct
morning        2011                  incorrect
...            ...                   ...

e.g. TOTAL: 100 timexes to normalise
e.g. TOTAL: 90 correct normalizations

RESULT: 90/100 --> 90% ACCURACY
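The accuracy measure amounts to exact-match counting over gold timexes. A minimal Python sketch, assuming the gold and predicted values are compared as plain strings (the function name is hypothetical and the data below is synthetic placeholder data that only reproduces the 90/100 example):

```python
def normalisation_accuracy(gold, predicted):
    """Fraction of timexes whose predicted value equals the gold value."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one timex is required")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)

# Synthetic placeholder data: 100 gold values, 90 predicted correctly.
gold_values = ["2012-05-23"] * 100
predicted_values = ["2012-05-23"] * 90 + ["2012"] * 10

accuracy = normalisation_accuracy(gold_values, predicted_values)
print(f"{accuracy:.0%}")
```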

Page 15: TIMEN: An Open Temporal Expression Normalisation Resource

Normalisation accuracy
● TIMEN shows high performance even in this first version (only 76 rules).
● TimenEval accuracy is lower. This corpus is more heterogeneous (times/sets) and normalization is more difficult.

TEST SET     NORMALISATION ACC
TempEval-2   0.90
TimenEval    0.68

Page 16: TIMEN: An Open Temporal Expression Normalisation Resource

Performance gain

[Figure: the timexes recognized by Approach X are normalised twice, once by the built-in normalisation of Approach X (original normalisation) and once by TIMEN (new normalisation).]

Performance gain = New accuracy - Original accuracy

Page 17: TIMEN: An Open Temporal Expression Normalisation Resource

Performance gain (TempEval-2) "known data"
● Replacing the systems' built-in normalization with TIMEN generally improves their performance on the TE2 test set.
● The tested (current) versions of the systems may have been developed or updated with knowledge of this data. What happens with data that is new to them?

System       built-in norm.   TIMEN norm.   Err. Reduction
TIPSemB      0.83             0.89          35%
HeidelTime   0.94             0.94          0%
TERNIP       0.76             0.92          66%
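The slides define performance gain as the difference in accuracy but do not define the error reduction column. A reasonable reading, assumed here, is the relative decrease in error rate (1 - accuracy); with that assumption the Python sketch below reproduces the three TempEval-2 figures above when the percentage is truncated. Function names are hypothetical.

```python
def performance_gain(original_acc, new_acc):
    """Performance gain = new accuracy - original accuracy."""
    return new_acc - original_acc

def error_reduction(original_acc, new_acc):
    """Relative decrease in error rate (assumed definition of 'Err. Reduction')."""
    original_error = 1.0 - original_acc
    if original_error <= 0.0:
        raise ValueError("original accuracy is already perfect")
    new_error = 1.0 - new_acc
    return (original_error - new_error) / original_error

# (built-in accuracy, TIMEN accuracy) on the TempEval-2 test set.
tempeval2 = {
    "TIPSemB": (0.83, 0.89),
    "HeidelTime": (0.94, 0.94),
    "TERNIP": (0.76, 0.92),
}

reductions = {name: error_reduction(old, new) for name, (old, new) in tempeval2.items()}
for name, value in reductions.items():
    print(f"{name}: {int(value * 100)}%")
```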

Page 18: TIMEN: An Open Temporal Expression Normalisation Resource

Performance gain (TimenEval) "new data"
● On new data, the performance of the built-in approaches generally decreases.
● TIMEN improves normalization performance for all the systems.

System       built-in norm.   TIMEN norm.   Err. Reduction
TIPSemB      0.57             0.67          23%
HeidelTime   0.72             0.74          7%
TERNIP       0.70             0.72          66%

Page 19: TIMEN: An Open Temporal Expression Normalisation Resource

Conclusions
● We presented an open tool for timex normalisation: TIMEN.
● ADVANTAGES:
○ High performance (above recent approaches).
○ Easily integrated in any timex recognition approach.
○ Can be improved by the community (open philosophy), and avoids re-development from scratch.
○ Available: http://timen.org and Google Code

Page 20: TIMEN: An Open Temporal Expression Normalisation Resource

Further Work
● Community-based extension and refinement of TIMEN (rule base).
● Extensive evaluation of TIMEN in various languages (Spanish, Chinese, Italian and Danish).

Page 21: TIMEN: An Open Temporal Expression Normalisation Resource

TIMEN: An Open TIMEX Normalisation Resource

THANK YOU! QUESTIONS?

http://timen.org

H.Llorens, L.Derczynski, R.Gaizauskas, E. Saquete