Information on the temporal interval of validity for facts described by RDF triples plays an important role in a large number of applications. Yet, most of the knowledge bases available on the Web of Data do not provide such information in an explicit manner. In this paper, we present a generic approach which addresses this drawback by inserting temporal information into knowledge bases. Our approach combines two types of information to associate RDF triples with time intervals. First, it relies on temporal information gathered from the document Web by an extension of the fact validation framework DeFacto. Second, it harnesses the time information contained in knowledge bases. This knowledge is combined within a three-step approach which comprises the steps matching, selection and merging. We evaluate our approach against a corpus of facts gathered from Yago2 by using DBpedia and Freebase as input and different parameter settings for the underlying algorithms. Our results suggest that we can detect temporal information for facts from DBpedia with an F-measure of up to 70%.
Hybrid Acquisition of Temporal Scopes for RDF Data
Anisa Rula (1), Matteo Palmonari (1), Axel-Cyrille Ngonga Ngomo (2), Daniel Gerber (2), Jens Lehmann (2), and Lorenz Bühmann (2)
(1) University of Milano-Bicocca, SITI Lab
(2) Universität Leipzig, Institut für Informatik, AKSW
Outline
Anisa Rula
1. Introduction & Motivation
2. Approach Overview
3. Details of the Approach
4. Experimental Evaluation
5. Conclusions
Temporal Scoping of RDF triples
  Some facts are always valid, while other facts are valid only for a certain time interval (volatile facts)
  Volatile facts are represented by triples whose validity is defined by a time interval, i.e. the temporal scope
[Figure: temporally annotated RDF triples, with temporal scopes represented by time intervals: <Alexandre Pato, team, A.C. Milan> 2007-2013; <Alexandre Pato, team, S.C. Corinthians> 2013-2014]
Motivation & Challenges
Motivation
  The world changes: relations represented in RDF triples may be valid only for a specific time interval [Gutierrez et al., 2005]
    o E.g. <Alexandre_Pato, team, A.C._Milan> [2007,2013]
  Many applications have to use temporally annotated RDF triples
    o E.g. Temporal Query Answering, Question Answering over KBs, Temporal Reasoning, Timelines
Challenges
  Low availability and quality of temporal information in RDF data
  NLP challenges for web-scale temporal information extraction (scalability, availability of corpus, conflicting information) [Derczynski et al., 2013]
Temporally annotated RDF triples are largely unavailable or incomplete in the LOD (Rula et al., 2012)
Approach Overview: Use the Web as Source of Evidence
[Figure: the World Wide Web (1.8 Billion) as a source of evidence for the Web of Data - RDF (61.9 Billion); the result is temporally annotated RDF triples, e.g. <Alexandre Pato, team, A.C. Milan> 2007-2013 and <Alexandre Pato, team, S.C. Corinthians> 2013-2014]
Use evidence from the Web for temporal scoping of RDF triples
Approach Overview: Hybrid Acquisition of Time Scopes
[Figure: pipeline overview.
  Input: a fact <s,p,o> from the Web of Data
  Temporal Information Extraction over the Web of Documents: for the fact, a list of time points with their occurrences (t1 occ1, t2 occ2, t3 occ3, t4 occ4)
  Mapping facts to time intervals: Matching, Selection, Reasoning
  Output: temporally annotated RDF triples, i.e. a set of disconnected time intervals <s,p,o>[x1,y1],…,[xn,yn]]
Temporal Information Extraction - Web Documents
DeFacto [Lehmann et al., 2012]
  Retrieves a set of webpages that confirm the given RDF triple
  The RDF triple issued to the search engine is verbalized by using natural language patterns
Temporal Extension for DeFacto (TempDeFacto)
  Apply a Named Entity Tagger to extract the entities of the Date class
  Observe the occurrences of the labels of the subject and object that lie fewer than 20 tokens apart
  Analyze the context window of n characters before and after subject-object occurrences in order to retrieve the time points
  Return a distribution vector of dates and their numbers of occurrences
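The TempDeFacto steps above can be sketched roughly as follows. This is only an illustration, not the actual DeFacto code: a regular expression for four-digit years stands in for the named entity tagger, and the function names and the default for `n_chars` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a crude year pattern replaces the Date named entity tagger.
YEAR = re.compile(r"\b(?:1\d{3}|20\d{2})\b")


def label_positions(text, label):
    """Character offsets of every occurrence of a label in the text."""
    return [m.start() for m in re.finditer(re.escape(label), text)]


def date_distribution(text, subject, obj, max_tokens=20, n_chars=50):
    """Count years found around subject-object co-occurrences."""
    counts = Counter()
    for s in label_positions(text, subject):
        for o in label_positions(text, obj):
            # Text between the end of the first label and the start of the second.
            first_end = s + len(subject) if s < o else o + len(obj)
            second_start = max(s, o)
            gap = len(text[first_end:second_start].split())
            if gap >= max_tokens:
                continue  # labels are too far apart
            start = min(s, o)
            end = max(s + len(subject), o + len(obj))
            # Context window of n_chars before and after the co-occurrence.
            context = text[max(0, start - n_chars):end + n_chars]
            counts.update(int(y) for y in YEAR.findall(context))
    return counts


sentence = "Alexandre Pato joined A.C. Milan in 2007 and left the club in 2013."
dfv = date_distribution(sentence, "Alexandre Pato", "A.C. Milan")
```

The returned counter plays the role of the distribution vector of dates and their numbers of occurrences.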
Relevant time intervals: the set of time intervals for a given triple, whose starting and ending time points are taken from the set of relevant time points
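One possible way to enumerate such a space of intervals is shown below. It is a minimal sketch, assuming that every pair of relevant time points with start <= end forms a candidate interval, single-point intervals included; the function name is hypothetical.

```python
from itertools import combinations_with_replacement


def relevant_intervals(time_points):
    """All (start, end) pairs with start <= end over the relevant time points."""
    points = sorted(set(time_points))
    return list(combinations_with_replacement(points, 2))


intervals = relevant_intervals([1989, 2000, 2006, 2007, 2008, 2013])
```

For six relevant time points this yields 21 candidate intervals, which corresponds to the cells on and above the diagonal of a 6 x 6 matrix.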
Mapping Facts to Time Intervals - Matching
1. Matching the temporal distribution (dfv) against the relevant time interval matrix (RIM)

dfv (time point, number of occurrences):
  2013  17
  2007  11
  2006   1
  2011   6
  2008   2
  2016   3
  2012  15
  2010   4
  2009   4
  1989   2

RIM (RDF data), rows = starting time point, columns = ending time point:
        1989   2000   2006   2007   2008   2013
  1989  null   null   null   null   null   null
  2000         null   null   null   null   null
  2006                null   null   null   null
  2007                       null   null   null
  2008                              null   null
  2013                                     null

Significance Matrix (SM):
        1989   2000   2006   2007   2008   2013
  1989  0.004  0.166  0.166  0.736  0.8    2.48
  2000  0      0      0.142  1.5    1.555  4.2
  2006  0      0      0.002  6      4.666  7.5
  2007  0      0      0      0.026  6.5    8.428
  2008  0      0      0      0      0.004  8
  2013  0      0      0      0      0      0.040

Example: $sm_{2007:2008} = \frac{11 + 2}{2} = 6.5$
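The matching step can be illustrated with a short sketch. The scoring rule used here is an assumption inferred from the worked example on the slide (the score for 2007:2008 is (11 + 2) / 2 = 6.5): the occurrences of all dates inside an interval are summed and divided by the interval length in years. It reproduces the off-diagonal SM entries; the single-point (diagonal) entries appear to follow a different normalization that is not recoverable from the slide, so they are left out. Function names are hypothetical.

```python
# Distribution vector from the slide: year -> number of occurrences.
dfv = {2013: 17, 2007: 11, 2006: 1, 2011: 6, 2008: 2,
       2016: 3, 2012: 15, 2010: 4, 2009: 4, 1989: 2}

# Relevant time points spanning the interval matrix.
points = [1989, 2000, 2006, 2007, 2008, 2013]


def significance(dfv, start, end):
    """Assumed score: occurrences inside [start, end] per year of interval length."""
    total = sum(count for year, count in dfv.items() if start <= year <= end)
    return total / (end - start + 1)


def significance_matrix(dfv, points):
    """Scores for every interval with start < end (diagonal cells omitted)."""
    return {(a, b): significance(dfv, a, b)
            for i, a in enumerate(points)
            for b in points[i + 1:]}


sm = significance_matrix(dfv, points)
best = max(sm, key=sm.get)  # interval with the highest score
```

Under this assumed rule the highest-scoring interval is (2007, 2013), with 59 / 7 ≈ 8.43, in line with the largest value in the matrix.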
Mapping Facts to Time Intervals - Selection
2. Mapping Selection:
  top-k: selects the k intervals that have the highest scores in the SM
  neighbor-x: selects the set of intervals whose significance score is close to the maximum significance score in the SM, up to a certain threshold x
  neighbor-k-x: selects the top-k intervals in the neighborhood of the interval with the highest significance score

Example, using the Significance Matrix (SM) of the matching step:
  top-k, k = 3: [2006, 2013], [2007, 2013], [2008, 2013]
  neighbor-x, x = 2: [2007, 2008], [2006, 2013], [2007, 2013], [2008, 2013]
  neighbor-k-x: [2007, 2013], [2008, 2013]
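The three selection functions could look roughly like this. It is a sketch under assumptions: the threshold x is read as an absolute difference from the maximum score, neighbor-k-x is read as "the k best intervals within that neighborhood", and the small `sm` dictionary only lists the highest scores from the significance matrix.

```python
# Highest-scoring intervals taken from the significance matrix on the slide.
sm = {(2007, 2013): 8.428, (2008, 2013): 8.0, (2006, 2013): 7.5,
      (2007, 2008): 6.5, (2006, 2007): 6.0, (2000, 2013): 4.2}


def ranked(sm):
    """Intervals ordered by decreasing significance score."""
    return sorted(sm, key=sm.get, reverse=True)


def top_k(sm, k):
    """The k intervals with the highest scores."""
    return ranked(sm)[:k]


def neighbor_x(sm, x):
    """Intervals whose score lies within x of the maximum score (assumed absolute)."""
    best = max(sm.values())
    return [iv for iv in ranked(sm) if best - sm[iv] <= x]


def neighbor_k_x(sm, k, x):
    """The k best intervals inside the x-neighborhood of the maximum."""
    return neighbor_x(sm, x)[:k]


selected = neighbor_k_x(sm, k=2, x=2)
```

With k = 3 for top-k, x = 2 for neighbor-x, and k = 2, x = 2 for neighbor-k-x, these functions return the same interval sets as the example on the slide (in score order).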
Mapping Facts to Time Intervals - Reasoning
3. Interval merging via reasoning based on the relations of Allen's interval algebra
Example: [2007, 2013], [2008, 2013] are merged into [2007, 2013]
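A simple way to picture the merging step is given below. The slides do not state which of Allen's relations trigger a merge, so this sketch assumes that any two selected intervals sharing at least one time point (for example overlapping, starting, finishing, containing, meeting or equal intervals) are merged, while disjoint intervals stay separate. The function name is hypothetical.

```python
def merge_intervals(intervals):
    """Merge (start, end) intervals that share at least one time point."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # The interval touches or overlaps the previous one: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # Disjoint from everything merged so far: keep it separate.
            merged.append((start, end))
    return merged


scope = merge_intervals([(2007, 2013), (2008, 2013)])
```

Because disjoint intervals are kept apart, the result can still be a set of disconnected time intervals for one triple.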
Experimental Results - Accuracy with vs. without Reasoning for all Properties
  The best results are obtained when reasoning is enabled
  The best configurations for the three properties
Conclusions & Future Work
Summary
  Temporal extension of the DeFacto framework
  Modeling a space of relevant time intervals given an RDF triple
  Mapping volatile facts to time intervals based on a three-phase algorithm
  Unsupervised method
Future work
  Determine when to add or not to add the temporal scope based on the confidence of the acquisition process
  Collect additional relevant time points to improve the overall results
  Show the effectiveness of acquired temporal scopes in temporal query answering
Thank you for your attention
Questions?
#eswc2014Rula
References
[Rula&2012] Anisa Rula, Matteo Palmonari, Andreas Harth, Steffen Stadtmüller, Andrea Maurino: On the Diversity and Availability of Temporal Information in Linked Open Data. International Semantic Web Conference (1) 2012: 492-507
[Gutiérrez&2005] C. Gutierrez, C. A. Hurtado, and A. A. Vaisman. Temporal RDF. In the 2nd ESWC, pages 93-107, 2005.
[Lehmann&2012] Jens Lehmann, Daniel Gerber, Mohamed Morsey, Axel-Cyrille Ngonga Ngomo: DeFacto - Deep Fact Validation. International Semantic Web Conference (1) 2012: 312-327
[Ling&2010] X. Ling and D. S. Weld. Temporal information extraction. In 25th AAAI, 2010.
[Derczynski&2013] L. Derczynski and R. Gaizauskas. Information retrieval for temporal bounding. In 4th ICTIR, pages 29:129–29:130. ACM, 2013.