Temporal Relations with Signals: the case of Italian Temporal Prepositions
Post on 15-Dec-2014
Temporal Relations with Signals: the Case of Italian
Temporal Prepositions
Tommaso Caselli, Felice dell’Orletta and Irina Prodanof
{firstname.lastname@ilc.cnr.it}
ILC-CNR, Pisa
16th International Symposium on Temporal Representation and Reasoning TIME 2009
Bressanone/Brixen, July 24 2009
Different approach
Application-oriented NLP techniques
Focus on: intuitions, knowledge and strategies people use in order to
place events in time
order events (encoding and decoding)
Query texts (corpora) and NOT structured knowledge
Introduction
Outline:
Motivations
Temporal Signals in Italian: Theoretical background
Methodology
Corpus Study
A Maximum Entropy Model
Feature Identification
Evaluation and Results
Conclusion and Future Work
Motivations
Recovering temporal relations in text/discourse is essential to improve the performance of many NLP systems (Open-Domain Question Answering, Text Mining, Summarization, Reasoning)
Most temporal information in text/discourse is only IMPLICITLY stated
Need to develop procedures to maximize the role of the various sources of information
Temporal prepositions are a partially explicit source of information.
Determining their meaning is part of a strategy to improve the extraction of temporal information
Motivations (2)
SIGNAL = cover term for a homogeneous class of words which express relations between textual entities
EXPLICIT = self-evident and stable meaning; Rel (X, Y)
IMPLICIT = abstract meaning which gets specialized in the co-text; Rel (λ(X), λ(Y))
Temporal signals express temporal relations.
Temporal signals can occur in 3 types of constructions involving temporal entities:
temporal expression – temporal expression
eventuality – temporal expression
eventuality – eventuality
Theoretical Background
Corpus Study: Data
To identify a large set of temporal signals realized by prepositions, we conducted a corpus study:
a 5-million-word shallow-parsed corpus (from the PAROLE corpus)
all PP chunks, with their left and right contexts, were automatically extracted and imported into a database structure
the automatically generated DB was augmented with ontological information from the SIMPLE/CLIPS Ontology, by associating the head noun of each PP chunk with its ontological type
extraction of the head nouns of type TIME + post-processing to exclude false positives (e.g. incubation, school…)
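The extraction steps above can be pictured with a minimal sketch. Everything named here is an assumption made for illustration: the `PPChunk` record, the tiny `ONTOLOGY` lookup standing in for the SIMPLE/CLIPS Ontology, the `FALSE_POSITIVES` stop list and the example chunks are invented and are not the authors' database schema.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the ontology lookup: head noun lemma -> ontological type.
ONTOLOGY = {"anno": "TIME", "mattina": "TIME", "incubazione": "TIME", "casa": "BUILDING"}

# Hypothetical stop list used in post-processing (nouns typed TIME but not temporal here).
FALSE_POSITIVES = {"incubazione"}


@dataclass
class PPChunk:
    preposition: str
    head_noun: str      # lemma of the head noun of the PP chunk
    left_context: str
    right_context: str


def temporal_chunks(chunks):
    """Keep PP chunks whose head noun is typed TIME and is not a listed false positive."""
    return [
        c for c in chunks
        if ONTOLOGY.get(c.head_noun) == "TIME" and c.head_noun not in FALSE_POSITIVES
    ]


# Invented example chunks.
chunks = [
    PPChunk("per", "anno", "sono stato sposato", ""),
    PPChunk("in", "casa", "sono rimasto", ""),
    PPChunk("in", "incubazione", "il virus resta", ""),
]
selected = temporal_chunks(chunks)
print([c.head_noun for c in selected])
```

Only the chunk headed by a TIME noun that survives the stop list is kept; the real post-processing criteria are not given in the slides.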
Temporal relations coded by implicit signals:
annotation of temporal relations by means of paraphrase tests
e.g. [sono stato sposato] per [4 anni] (I’ve been married for four years)
The state of “being married” EQUALS four years
499 occurrences of constructions of the type “eventuality + signal + temporal expression”
9 temporal relations (compliant with TimeML and ISO-TimeML): overlap, simultaneous, before, after, no tlink, begin, end, before_ending, equals
the most frequent temporal relation/implicit signal is assumed to be the prototypical meaning of the signal
Corpus Study (2)
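The assumption that the most frequent relation of a signal is its prototypical meaning amounts to a per-signal frequency count. A minimal sketch follows; the annotated pairs are invented toy data, not figures from the 499 annotated occurrences, and `prototypical_relations` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Invented toy annotations: (signal, temporal relation).
annotations = [
    ("per", "equals"), ("per", "equals"), ("per", "before"),
    ("da", "begin"), ("da", "begin"), ("da", "equals"),
]


def prototypical_relations(pairs):
    """Map each signal to the relation it is most often annotated with."""
    counts = defaultdict(Counter)
    for signal, relation in pairs:
        counts[signal][relation] += 1
    return {signal: c.most_common(1)[0][0] for signal, c in counts.items()}


prototypes = prototypical_relations(annotations)
print(prototypes)
```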
The corpus study, together with theoretical statements, has led to the identification of 16 features:
PREP: the signal lemma
3 sets of co-textual features:
information about temporal expression
information about the eventuality
local contextual information
Feature Identification
Temporal expression features:
Ontological status: INSTANT, INTERVAL
Type of temporal expressions (TIMEX):
DATE: August 3; 1968; 01/12/1980…
DURATION: 3 hours; the last quarter…
SET: once every year…
TIME: 3 o’ clock; (in) the morning…
Presence of a quantifier: QUANTIFIER
Feature Identification – Temporal Expressions
Eventuality features:
Lemma (POTGOV_head);
POS of the eventuality: VERB, NOUN
Presence of negations (NEGATION)
Verb diathesis (DIATESIS)
Tense: PRESENT, IMPERFECT, FUTURE, PAST, INFINITIVE
(Viewpoint) Aspect: IMPERFECTIVE, PERFECTIVE, PROGRESSIVE, NONE
Lexical Aspect (AKTIONSAART): TRANSITION, PROCESS, STATE
Feature Identification - Eventuality
Local context features: features which account for the presence of further signals in the local context which influence the identification of the Rel value of the signal under analysis
FOLLOWED_SIGNAL+TIMEX
PRECEED_SIGNAL+TIMEX
FOLLOWED_SIGNAL+EVENT
Feature Identification – Local context
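One way to picture an annotated instance is as a mapping from feature names to values. The sketch below is only an illustration: the feature names are the 16 labels used in these slides, but the `encode` helper, the "NAME=value" string format and the values assigned to the example sentence ("sono stato sposato per 4 anni") are assumptions, not the authors' annotation.

```python
# The 16 feature labels as they appear in the slides.
FEATURES = [
    "PREP", "INTERVAL", "INSTANT", "POTGOV_head", "VERB", "NOUN", "DIATESIS",
    "NEGATION", "AKTIONSAART", "FOLLOWED_SIGNAL+TIMEX", "PRECEED_SIGNAL+TIMEX",
    "FOLLOWED_SIGNAL+EVENT", "TENSE", "ASPECT", "TIMEX", "QUANTIFIER",
]


def encode(values, selected=FEATURES):
    """Turn a feature dict into 'NAME=value' strings, restricted to `selected`."""
    return [f"{name}={values[name]}" for name in selected if name in values]


# Illustrative guesses for a subset of the features of one instance.
example = {
    "PREP": "per", "INTERVAL": "yes", "INSTANT": "no", "VERB": "yes", "NOUN": "no",
    "NEGATION": "no", "AKTIONSAART": "STATE", "TENSE": "PAST",
    "TIMEX": "DURATION", "QUANTIFIER": "yes",
}

core = ["PREP", "INSTANT", "INTERVAL", "TIMEX", "QUANTIFIER"]
print(encode(example, core))
```

Restricting `selected` to a subset of labels is a simple way to build the reduced feature sets compared in the evaluation.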
Feature annotation: manually conducted by one annotator + one of the authors.
1000 instances of constructions of the type “eventuality + signal + timex”
two interlinked criteria: semantic transparency of the signal + relative frequency of the signal in the 5-million-word shallow-parsed corpus
Assigning the right temporal relation is (in essence) a tagging task.
Maximum Entropy algorithm: it provides a suitable solution to identify the set of possible values for each signal on the basis of the conditional probability distribution. No a priori constraints must be met other than those related to a set of features f_i(a, c) of a context c, whose distribution is derived from the training data.
Building a M.E. Model
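In a standard Maximum Entropy formulation, the conditional probability of a relation a in a context c is proportional to the exponential of the weighted sum of the features f_i(a, c) that fire. The sketch below computes that distribution for hand-picked weights; the weights, the relation set and the function name are invented for illustration, and no training procedure is shown, so this is not the model built by the authors.

```python
import math


def maxent_distribution(active_features, relations, weights):
    """Return p(a | c) for each relation a, given the features active in context c.

    A binary feature f_i(a, c) fires when a context feature co-occurs with relation a;
    its weight is looked up under the key (feature, relation), defaulting to 0.
    """
    scores = {
        a: sum(weights.get((f, a), 0.0) for f in active_features)
        for a in relations
    }
    z = sum(math.exp(s) for s in scores.values())  # normalisation over relations
    return {a: math.exp(s) / z for a, s in scores.items()}


relations = ["equals", "before", "begin"]
# Invented weights; in practice they would be estimated from the training data.
weights = {
    ("PREP=per", "equals"): 2.0,
    ("TIMEX=DURATION", "equals"): 1.0,
    ("PREP=per", "before"): 0.5,
}
dist = maxent_distribution(["PREP=per", "TIMEX=DURATION"], relations, weights)
best = max(dist, key=dist.get)
print(best)
```

Tagging an instance then reduces to picking the relation with the highest conditional probability.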
Evaluation
The data set has been split into test (100) and training (900) data
8 different models have been created to discover the most salient features. 10-fold cross-validation per model.
All models outperform the baseline: relevance of the features
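With 1000 instances, 10-fold cross-validation yields ten 900/100 training/test splits. A minimal sketch of such a split, assuming contiguous folds and a data set size divisible by the number of folds (the slides do not say how the folds were drawn; `k_fold_splits` is a hypothetical helper):

```python
def k_fold_splits(n_items, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Assumes n_items is divisible by k.
    """
    indices = list(range(n_items))
    fold_size = n_items // k
    for i in range(k):
        start, stop = i * fold_size, (i + 1) * fold_size
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test


splits = list(k_fold_splits(1000))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each instance appears in exactly one test fold, so averaging the ten scores uses every instance once for testing.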
Evaluation (2)
1. PREP
2. INTERVAL
3. INSTANT
4. POTGOV_head
5. VERB
6. NOUN
7. DIATESIS
8. NEGATION
9. AKTIONSAART
10. FOLLOWED_SIGNAL+TIMEX
11. PRECEED_SIGNAL+TIMEX
12. FOLLOWED_SIGNAL+EVENT
13. TENSE
14. ASPECT
15. TIMEX
16. QUANTIFIER
10 Feature Model
Performance = 90%
surface-based features
good performance without the AKTIONSAART feature
Evaluation (3)
Model | Performance | Features
9 features | 89.8% | PREP, INTERVAL, INSTANT, AKTIONSAART, FOLLOWED_SIGNAL+TIMEX, PRECEED_SIGNAL+TIMEX, FOLLOWED_SIGNAL+EVENT, TIMEX, QUANTIFIER
8 features | 89.8% | PREP, INTERVAL, INSTANT, FOLLOWED_SIGNAL+TIMEX, PRECEED_SIGNAL+TIMEX, FOLLOWED_SIGNAL+EVENT, TIMEX, QUANTIFIER
Evaluation (3)
Model | Performance | Features
8 features (No QUANTIFIER) | 85% | PREP, INTERVAL, INSTANT, AKTIONSAART, FOLLOWED_SIGNAL+TIMEX, PRECEED_SIGNAL+TIMEX, FOLLOWED_SIGNAL+EVENT, TIMEX
7 features | 86.8% | PREP, INTERVAL, INSTANT, FOLLOWED_SIGNAL+TIMEX, PRECEED_SIGNAL+TIMEX, FOLLOWED_SIGNAL+EVENT, TIMEX, QUANTIFIER
5 features | 87.6% | PREP, INTERVAL, INSTANT, TIMEX, QUANTIFIER
Mismatch between linguistic theory and feature salience
Observations on the features:
5 core features: PREP, INSTANT, INTERVAL, TIMEX, QUANTIFIER (5 feature model)
The influence of AKTIONSAART in this task is almost null. It could be reduced to a set of more surface-based features, e.g. presence of a direct object, definiteness, cardinality, type of subject…
the remaining features could be activated in particular linguistic contexts and with particular signals; e.g. TENSE, ASPECT and AKTIONSAART (or its substitutes) with the signal IN; the local context features with the signals DA, A and TRA.
Conclusion & Future Work
Integration of the M.E. Model into a complete automatic system for temporal processing of text/discourse
Thanks