Finding Concept Coverings in Aligning Ontologies of Linked Data
Rahul Parundekar, Craig A. Knoblock, and José Luis Ambite
Information Sciences Institute and Department of Computer Science, University of Southern California
4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292
{parundek,knoblock,ambite}@usc.edu
Abstract. Despite the recent growth in the size of the Linked Data Cloud, the absence of links between the vocabularies of the sources has resulted in heterogeneous schemas. Our previous work tried to find conceptual mappings between two sources and was successful in finding alignments, such as equivalence and subset relations, using the instances that are linked as equal. By using existential concepts and their intersections to define specialized classes (restriction classes), we were able to find alignments where previously existing concepts in one source did not have corresponding equivalent concepts in the other source. Upon inspection, we found that though we were able to find a good number of alignments, we were unable to completely cover one source with the other. In many cases we observed that even though a larger class could be defined completely by the multiple smaller classes that it subsumed, we were unable to find these alignments because our definition of restriction classes did not contain the disjunction operator to define a union of concepts. In this paper we propose a method that discovers alignments such as these, where a (larger) concept of the first source is aligned to the union of the subsumed (smaller) concepts from the other source. We apply this new algorithm to the Geospatial, Biological Classification, and Genetics domains and show that this approach is able to discover numerous concept coverings, where (in most cases) the subsumed classes are disjoint. The resulting alignments are useful for determining the mappings between ontologies, refining existing ontologies, and finding inconsistencies that may indicate that some instances have been erroneously aligned.
1 Introduction
The Web of Linked Data has seen huge growth in the past few years. As of September 2011, the Linked Open Data Cloud has grown to a size of 31.6 billion triples. This includes a wide range of data sources belonging to the government (42%), geographic (19.4%), life sciences (9.6%) and other domains.¹ A common way that the instances in these sources are linked to others is the use of the owl:sameAs property. Though the size of the Linked Data Cloud seems to be increasing drastically (10% over the 28.5 billion triples in 2010), inspection of the sources at the ontology level reveals that only a few of them (15 out of the 190 sources) have some mapping of the vocabularies. For the success of the Semantic Web, it is important that these heterogeneous schemas be linked. As described in our previous papers on Linking and Building Ontologies of Linked Data [8] and Aligning Ontologies of Geospatial Linked Data [7], an extensional technique can be used to generate alignments between the ontologies behind these sources. In these previous papers, we introduced the concept of restriction classes, each of which is the set of instances that satisfy a conjunction of value restrictions on properties (property-value pairs).
¹ http://www4.wiwiss.fu-berlin.de/lodcloud/state/
Though our algorithm was able to identify a good number of alignments, it was unable to completely cover one source with the classes in the other source. Upon a closer look, we found that most of the alignments that we missed did not have a corresponding restriction class in the other source, and instead subsumed multiple restriction classes. While reviewing these subset relations, we discovered that in many cases the union of the smaller classes completely covered the larger class. In this paper, we describe how we extend our previous work to discover such concept coverings by introducing a more expressive set of class descriptions (unions of value restrictions).² In most of these coverings, the smaller classes are also found to be disjoint. In addition, further analysis of the alignments of these coverings provides a powerful tool to discover incorrect links in the Web of Linked Data, which can potentially be used to point out and rectify the inconsistencies in the instance alignments.
This paper is organized as follows. First, we describe the Linked Open Data sources that we try to align in the paper. Second, we briefly review our alignment algorithm from [8] along with the limitations of the results that were generated. Third, we describe our approach to finding alignments between unions of restriction classes. Fourth, we describe how outliers in these alignments help to identify inconsistencies and erroneous links. Fifth, we describe the experimental results on union alignments over additional domains. Finally, we compare against related work, and discuss our contributions and future work.
2 Sources Used for Alignments
Linked Data, by definition, links the instances of multiple sources. Often, sources conform to different, but related, ontologies that can also be meaningfully linked [8]. In this section we describe some of the sources from different domains that we try to align, whose instances are linked using an equivalence property like owl:sameAs.
Linking GeoNames with places in DBpedia: DBpedia (dbpedia.org) is a knowledge base that covers multiple domains, including around 526,000 places and other geographical features from the Geospatial domain. We try to align the concepts in DBpedia with GeoNames (geonames.org), which is a geographic source with about 7.8 million things. It uses a flat-file-like ontology, where all instances belong to a single concept of Feature. This makes the ontology rudimentary, with the type data (e.g. mountains, lakes, etc.) about these geographical features recorded instead in the Feature Class & Feature Code properties.
² This work is an extended version of our workshop paper [6]. We have extended the method to find coverings in the Biological Classification and Genetics domains.
Linking LinkedGeoData with places in DBpedia: We also try to find alignments between the ontologies behind LinkedGeoData (linkedgeodata.org) and DBpedia. LinkedGeoData is derived from the Open Street Map initiative, with around 101,000 instances linked to DBpedia using the owl:sameAs property.
Linking species from Geospecies with DBpedia: The Geospecies (geospecies.org) knowledge base contains species belonging to plant, animal, and other kingdoms linked to species in DBpedia using the skos:closeMatch property. Since the instances in the taxonomies in both these sources are the same, the sources are ideal for finding the alignment between the vocabularies.
Linking genes from GeneID with MGI: The Bio2RDF (bio2rdf.org) project contains inter-linked life sciences data extracted from multiple data-sets that cover genes, chemicals, enzymes, etc. We consider two sources from the Genetics domain from Bio2RDF, GeneID (extracted from the National Center for Biotechnology Information database) and MGI (extracted from the Mouse Genome Informatics project), where the genes are marked equivalent.
Although we provide results for all four of the above alignments in Section 4, in the rest of this paper we explain our methodology by using the alignment of GeoNames with DBpedia as an example.
3 Aligning Ontologies on the Web of Linked Data
First, we briefly describe our previous work on finding subset and equivalent alignments between restriction classes from two ontologies. Then, we describe how to use the subset alignments to find more expressive union alignments. Finally, we discuss how outliers in these union alignments often identify incorrect links in the Web of Linked Data.
3.1 Our previous work on aligning ontologies of Linked Data
In [8] we introduced the concept of restriction classes to align extensional concepts in two sources. A restriction class is a concept that is derived extensionally and defined by a conjunction of value restrictions for properties (called property-value pairs) in a source. Such a definition helps overcome the problem of aligning rudimentary ontologies with more sophisticated ones. For example, GeoNames only has a single concept (Feature) to which all of its instances belong, while DBpedia has a rich ontology. However, Feature has several properties that can be used to define more meaningful classes. For example, the set of instances in GeoNames with the value PPL in the property featureCode aligns nicely with the instances of City in DBpedia.
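
To make the definition concrete, the following is a minimal sketch of how a restriction class could be materialized as a set of instances. The dictionary-based `Source` representation, the function name `restriction_class`, and the toy instance identifiers are assumptions made for illustration and are not taken from [8].

```python
from typing import Dict, Set

# Hypothetical toy representation: each instance maps its properties to one value.
Source = Dict[str, Dict[str, str]]


def restriction_class(source: Source, prop: str, value: str) -> Set[str]:
    """Return the instances whose property `prop` has exactly `value`."""
    return {inst for inst, props in source.items() if props.get(prop) == value}


# Illustrative GeoNames-like instances (identifiers are made up).
geonames: Source = {
    "gn:1": {"featureCode": "PPL", "countryCode": "ES"},
    "gn:2": {"featureCode": "S.SCH", "countryCode": "ES"},
    "gn:3": {"featureCode": "PPL", "countryCode": "NL"},
}

populated_places = restriction_class(geonames, "featureCode", "PPL")
# A conjunction of value restrictions is the intersection of single-pair classes.
spanish_places = populated_places & restriction_class(geonames, "countryCode", "ES")
```

Here `populated_places` plays the role of the class defined by featureCode = PPL, and the intersection shows how a conjunction of two property-value pairs narrows it.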
Our algorithm explored the space of restriction classes from two ontologies and was able to find equivalent and subset alignments between these restriction classes. Fig. 1 illustrates the instance sets considered to score an alignment hypothesis. We first find the instances belonging to the restriction class r1 from the first source and r2 from the second source. We then compute the image of r1 (denoted by I(r1)), which is the set of instances from the second source linked to instances in r1 (dashed lines in the figure). By comparing r2 with the intersection of I(r1) and r2 (shaded region), we can determine the relationship between r1 and r2. We defined two metrics P and R, as the ratio of |I(r1) ∩ r2| to |I(r1)| and |r2| respectively, to quantify set-containment relations. For example, two classes are equivalent if P = R = 1. In order to allow a certain margin of error induced by the data-set, we used the relaxed versions P' and R' as part of our scoring mechanism. In this case, two classes were considered equivalent if P' > 0.9 and R' > 0.9. For example, consider the alignment between restriction classes (lgd:gnis%3AST_alpha=NJ) from LinkedGeoData and (dbpedia:Place#type=http://dbpedia.org/resource/City_(New_Jersey)) from DBpedia. Based on the extension sets, our algorithm finds |I(r1)| = 39, |r2| = 40, |I(r1) ∩ r2| = 39, R' = 0.97 and P' = 1.0. Based on our error margins, we assert the alignment as equivalent in an extensional sense. The exploration of the space of alignments and the scoring procedure is described in detail in [8].
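
The scoring described above can be sketched as follows. This is an illustration only: it uses the plain ratios P and R rather than the relaxed P' and R' (whose exact definition is given in [8] and not reproduced here), and the function names, the `links` dictionary, and the synthetic identifiers are assumptions.

```python
from typing import Dict, Set, Tuple


def image(r1: Set[str], links: Dict[str, str]) -> Set[str]:
    """I(r1): instances of the second source linked to instances in r1."""
    return {links[inst] for inst in r1 if inst in links}


def score(r1: Set[str], r2: Set[str], links: Dict[str, str]) -> Tuple[float, float]:
    """Return (P, R) = (|I(r1) ∩ r2| / |I(r1)|, |I(r1) ∩ r2| / |r2|)."""
    img = image(r1, links)
    overlap = img & r2
    p = len(overlap) / len(img) if img else 0.0
    r = len(overlap) / len(r2) if r2 else 0.0
    return p, r


def relation(p: float, r: float, threshold: float = 0.9) -> str:
    """Classify an alignment hypothesis with the 0.9 margin used in the text."""
    if p > threshold and r > threshold:
        return "equivalent"
    if p > threshold:
        return "r1 subset of r2"
    if r > threshold:
        return "r2 subset of r1"
    return "no alignment"


# Synthetic sets sized like the New Jersey example: |I(r1)| = 39, |r2| = 40.
r1 = {f"lgd:{i}" for i in range(39)}
links = {f"lgd:{i}": f"dbp:{i}" for i in range(39)}
r2 = {f"dbp:{i}" for i in range(40)}

p, r = score(r1, r2, links)
```

With these sizes the sketch gives P = 1.0 and R = 0.975, so `relation(p, r)` classifies the pair as equivalent, in line with the example in the text.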
[Figure: Venn diagram of the set of instances from O1 where r1 holds, the set of instances from O2 where r2 holds, the image I(r1) (the set of instances from O2 paired to instances from O1 where r1 holds), and the shaded set of instance pairs where both r1 and r2 hold.]
Fig. 1. Comparing the linked instances from two ontologies
Though the approach produced a large number of equivalent alignments, we were not able to find a complete coverage because some restriction classes did not have a corresponding equivalent restriction class and instead subsumed multiple smaller restriction classes. For example, in the GeoNames and DBpedia alignment, we found that {rdf:type=dbpedia:EducationalInstitution} from DBpedia subsumed {geonames:featureCode=S.SCH}, {geonames:featureCode=S.SCHC} and {geonames:featureCode=S.UNIV} (i.e. Schools, Colleges and Universities from GeoNames). We discovered that, taken together, the union of these three restriction classes completely defines rdf:type=dbpedia:EducationalInstitution. To find such previously undetected alignments, we decided to extend the expressivity of our restriction classes by introducing a disjunction operator to detect concept coverings completely.
3.2 Identifying union alignments
In our current work, we use the subset and equivalent alignments generated by the previous work to try to align a larger class from one ontology with a union of smaller subsumed restriction classes in the other ontology. Since the problem of finding alignments with conjunctions and disjunctions of property-value pairs of restriction classes is combinatorial in nature, we focus only on subset and equivalence relations that map to restriction classes with a single property-value pair. This helps us find the simplest definitions of concepts and also makes the problem tractable. Alignments generated by our previous work that satisfy the single property-value pair constraint are first grouped according to the subsuming restriction classes and then according to the property of the smaller classes. Since restriction classes are constructed by forming a set of instances that have one of the properties restricted to a single value, aggregating restriction classes from the group according to their properties builds a more intuitive definition of the union. We can now define the disjunction operator that constructs the union concept from the smaller restriction classes in these sub-groups. The disjunction operator is defined for restriction classes such that (i) the concept formed by the disjunction of the classes represents the union of their sets of instances, (ii) each of the classes that are aggregated contains only a single property-value pair, and (iii) the property for all those property-value pairs is the same. We then try to detect the alignment between the larger common restriction class and the union by using an extensional approach similar to our previous paper. We call such an alignment a hypothesis union alignment.
We define U_S as the set of instances that is the union of the individual smaller restriction classes, Union(r2); U_L as the image of the larger class by itself, Img(r1); and U_A as the overlap between these sets, Union(Img(r1) ∩ r2). We check whether the larger restriction class is equivalent to the union concept by using scoring functions analogous to P' and R' from our previous paper. The new scoring mechanism defines P'_U as |U_A| / |U_S| and R'_U as |U_A| / |U_L|, with relaxed scoring assumptions as in P' and R'. To accommodate errors in the data-set, we consider it a complete coverage when the score is greater than a relaxed score of 0.9. That is, the hypothesis union alignment is considered equivalent if P'_U > 0.9 and R'_U > 0.9. Since, by construction, each of the subsets already satisfies P' > 0.9, we are assured that P'_U is always going to be greater than 0.9. Thus, a union alignment is equivalent if R'_U > 0.9.
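
As a rough sketch of the union scoring, the code below computes the two ratios from explicit sets. It uses plain ratios in place of the relaxed scores, and the function names and the synthetic split of the 396 covered instances across the three smaller classes are assumptions; only the totals 396 and 404 come from the paper.

```python
from typing import Iterable, Set, Tuple


def union_scores(
    image_of_larger: Set[str], smaller_classes: Iterable[Set[str]]
) -> Tuple[float, float]:
    """Return (P_U, R_U) = (|U_A| / |U_S|, |U_A| / |U_L|)."""
    u_s = set().union(*smaller_classes)  # union of the smaller classes
    u_l = set(image_of_larger)           # image of the larger class
    u_a = u_l & u_s                      # overlap between the two
    p_u = len(u_a) / len(u_s) if u_s else 0.0
    r_u = len(u_a) / len(u_l) if u_l else 0.0
    return p_u, r_u


def is_covering(p_u: float, r_u: float, threshold: float = 0.9) -> bool:
    """A hypothesis union alignment is accepted when both scores exceed 0.9."""
    return p_u > threshold and r_u > threshold


# Synthetic instances sized like the Educational Institution example.
u_l = {f"e{i}" for i in range(404)}
schools = {f"e{i}" for i in range(0, 300)}         # hypothetical split
colleges = {f"e{i}" for i in range(300, 350)}      # hypothetical split
universities = {f"e{i}" for i in range(350, 396)}  # hypothetical split

p_u, r_u = union_scores(u_l, [schools, colleges, universities])
```

With these sets, R_U is 396/404, roughly 0.98, so `is_covering(p_u, r_u)` accepts the hypothesis.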
Figure 2 provides an example of the approach. Our previous algorithm finds that {geonames:featureCode = S.SCH}, {geonames:featureCode = S.SCHC} and {geonames:featureCode = S.UNIV} are subsets of {rdf:type = dbpedia:EducationalInstitution}. As can be seen in the Venn diagram in Figure 2, U_L is Img({rdf:type = dbpedia:EducationalInstitution}), U_S is {geonames:featureCode = S.SCH} ∪ {geonames:featureCode = S.SCHC} ∪ {geonames:featureCode = S.UNIV}, and U_A is the intersection of the two. With the educational institutions example, R'_U for the alignment of dbpedia:EducationalInstitution to the union of S.SCH, S.SCHC and S.UNIV is 0.98. We can thus confirm the hypothesis and consider this union alignment equivalent. Section 4 shows additional examples of union alignments.
[Figure: Venn diagram of Img(r1), the Educational Institutions from DBpedia, overlapping Union(r2), the union of Schools (S.SCH), Colleges (S.SCHC) and Universities (S.UNIV) from GeoNames; the instances of Img(r1) outside the union are marked as outliers.]
Fig. 2. Spatial covering of Educational Institutions from DBpedia
3.3 Using outliers in union mappings to identify linked data errors
The computation of union alignments allows for a margin of error in the subset computation. It turns out that the outliers, the instances that do not satisfy the restriction classes in the alignments, are often due to incorrect links. Thus, our algorithm also provides a novel method to curate the Web of Linked Data.
Consider the outlier found in the {dbpedia:country = Spain} ≡ {geonames:countryCode = ES} alignment. Of the 3918 instances of dbpedia:country=Spain, 3917 have a link to a geonames:countryCode=ES. The one instance not having country code ES has an assertion of country code IT (Italy) in GeoNames. The algorithm would flag this situation as a possible linking error, since there is overwhelming support for ES being the country code of Spain. A more interesting case occurs in the alignment of {rdf:type = dbpedia:EducationalInstitution} to {geonames:featureCode ∈ {S.SCH, S.SCHC, S.UNIV}}. For {rdf:type = dbpedia:EducationalInstitution}, 396 instances out of the 404 Educational Institutions were accounted for as having their geonames:featureCode as one of S.SCH, S.SCHC or S.UNIV. Of the 8 outliers, 1 does not have a geonames:featureCode property asserted. The other 7 have their feature codes as either S.BLDG (3 buildings), S.EST (1 establishment), S.HSP (1 hospital), S.LIBR (1 library) or S.MUS (1 museum). This case requires more sophisticated curation, and the outliers may indicate a case for multiple inheritance. For example, the hospital instance in GeoNames may be a medical college that could be classified as a university.
Our union alignment algorithm is able to detect other similar outliers. It provides a powerful tool to quickly focus on links that require human curation, or that could be automatically flagged as problematic, and it provides evidence for the error.
4 Experimental Results
The results of the union alignment algorithm over the four pairs of sources we consider appear in Table 1. In total, the 7069 union alignments explained (covered) 77966 subset alignments, for a compression ratio of 90%.
Table 1. Union Alignments Found in the 4 Source Pairs
Source1        Source2   Union alignments 1-2       Union alignments 2-1       Total union
                         (subset alignments 1-2)    (subset alignments 2-1)    alignments
GeoNames       DBpedia   434 (2197)                 318 (7942)                 752
LinkedGeoData  DBpedia   2746 (12572)               3097 (48345)               5843
Geospecies     DBpedia   191 (1226)                 255 (2569)                 446
GeneID         MGI       6 (29)                     22 (3086)                  28
The resulting alignments were intuitive. Some interesting examples appear in Tables 2, 3 and 4. In the tables, for each union alignment, column 2 describes the large restriction class from ontology1 and column 3 describes the union of the (smaller) classes on ontology2 with the corresponding property and value set. The score of the union is noted in column 4 (R'_U = |U_A| / |U_L|), followed by |U_A| and |U_L| in columns 5 and 6. Column 7 describes the outliers, i.e. values of v2 that form restriction classes that are not direct subsets of the larger restriction class. Each of these outliers also has a fraction giving the number of instances that belong to the intersection over the number of instances of the smaller restriction class (or |Img(r1) ∩ r2| / |r2|). It can be seen that the fraction is less than our relaxed subset score. If the value of this fraction were greater than the relaxed subset score (i.e. 0.9), the set would have been included in column 3 instead. The last column mentions how many of the total U_L instances we were able to explain using U_A and the outliers. For example, union alignment #1 of Table 2 is the Educational Institution example described before. It shows how educational institutions from DBpedia can be explained by schools, colleges and universities in GeoNames. Columns 4, 5 and 6 give the alignment score R'_U (0.98), the size of U_A (396) and the size of U_L (404). Outliers (S.BLDG, S.EST, S.LIBR, S.MUS, S.HSP) along with their P' fractions appear in column 7. We were able to explain 403 of the total 404 instances (see column 8).
We find other interesting alignments, a representative few of which are shown in the tables. In some cases, the union alignments found were intuitive because of an underlying hierarchical nature of the concepts involved, especially in the case of alignments of administrative divisions in geospatial sources and alignments in the biological classification taxonomy. For example, #3 highlights alignments that reflect the containment properties of administrative divisions. Other interesting types of alignment were also found. For example, #7 tries to map two non-similar concepts. It explains the license plate codes found in the state (bundesland) of Saarland.³ Due to lack of space, we explain the other union alignments alongside them in the tables. The complete set of alignments discovered by our algorithm is available on our group page.⁴
³ http://www.europlates.com/publish/euro-plate-info/german-city-codes
⁴ http://www.isi.edu/integration/data/UnionAlignments
Outliers. In alignments that had inconsistencies, we identified three main reasons: (i) Incorrect instance alignments: outliers arising out of a possible erroneous equivalence link between instances (e.g. #4, #8, etc.); (ii) Missing instance alignments: insufficient support for coverage due to missing links between instances or missing instances (e.g. #9, etc.); (iii) Incorrect values for properties: outliers arising out of a possible erroneous assertion for a property (e.g. #5, #6, etc.). In the tables, we also mention the classes that these inconsistencies belong to along with their support.
5 Related Work
Ontology alignment and schema matching have been a well explored area of research since the early days of ontologies [3, 1] and have received renewed interest in recent years with the rise of the Semantic Web and Linked Data. Though most work done in the Web of Linked Data is on linking instances across different sources, an increasing number of authors have looked into aligning the source ontologies in the past couple of years. Jain et al. [4] describe the BLOOMS approach, which uses a central forest of concepts derived from topics in Wikipedia. An update to this is the BLOOMS+ approach [5], which aligns Linked Open Data ontologies with an upper-level ontology called Proton. BLOOMS and BLOOMS+ are unable to find alignments because of the small number of classes in GeoNames that have vague declarations. The advantage of our approach over these is that our use of restriction classes is able to find a large set of alignments in cases like aligning GeoNames with DBpedia, where Proton fails due to a rudimentary ontology. Cruz et al. [2] describe a dynamic ontology mapping approach called AgreementMaker that uses similarity measures along with a mediator ontology to find mappings using the labels of the classes. From the subset and equivalent alignments between GeoNames (10 concepts) and DBpedia (257 concepts), AgreementMaker was able to achieve a precision of 26% and a recall of 68%. We believe that since their approach did not consider unions of concepts, it would not have been able to find alignments like the Educational Institutions example (#1) by using only the labels and the structure of the ontology, though a thorough comparison is not possible. In our work, we find equivalent relations between a concept on one side and a union of concepts on the other side. CSR [9] is a work similar to ours that tries to align a concept from one ontology to a union of concepts from the other ontology. In their approach, the authors describe how the similarities of properties are used as features in predicting the subsumption relationships. It differs from our approach in that it uses a statistical machine learning approach for the detection of subsets rather than the extensional approach. An approach that uses statistical methods for finding alignments, similar to our work, has also been described in Völker et al. [10]. This work induces schemas for RDF data sources by generating OWL2 axioms using an intermediate associativity table of instances and concepts (called transaction data-sets) and mining associativity rules from it.
Table 2. Example alignments from GeoNames - DBpedia and LinkedGeoData - DBpedia.
Each entry lists the columns #, {r1}, p2 ∈ {v2}, R'_U = |U_A| / |U_L|, |U_A|, |U_L|, Outliers and # Explained Instances, followed by the accompanying comment.

DBpedia (larger) - GeoNames (smaller)
#1  r1: {rdf:type = dbpedia:EducationalInstitution}
    p2 ∈ {v2}: geonames:featureCode ∈ {S.SCH, S.SCHC, S.UNIV}
    R'_U = 0.9801, |U_A| = 396, |U_L| = 404, explained instances = 403
    Outliers: S.BLDG (3/122), S.EST (1/13), S.LIBR (1/7), S.HSP (1/31), S.MUS (1/43)
    Comment: As described in Section 4, Schools, Colleges and Universities in GeoNames make Educational Institutions in DBpedia.
#2  r1: {dbpedia:country = dbpedia:Spain}
    p2 ∈ {v2}: geonames:countryCode = ES
    R'_U = 0.9997, |U_A| = 3917, |U_L| = 3918, explained instances = 3918
    Outliers: IT (1/7635)
    Comment: The concepts for the country Spain are equal in both sources. The only outlier has its country as Italy, an erroneous assertion.
#3  r1: {dbpedia:region = dbpedia:Basse-Normandie}
    p2 ∈ {v2}: geonames:parentADM2 ∈ {geonames:2989247, geonames:2996268, geonames:3029094}
    R'_U = 1.0, |U_A| = 754, |U_L| = 754, explained instances = 754
    Comment: We confirm the hierarchical nature of administrative divisions with alignments between administrative units at two different levels.
#4  r1: {rdf:type = dbpedia:Airport}
    p2 ∈ {v2}: geonames:featureCode ∈ {S.AIRB, S.AIRP}
    R'_U = 0.9924, |U_A| = 1981, |U_L| = 1996, explained instances = 1996
    Outliers: S.AIRF (9/22), S.FRMT (1/5), S.SCH (1/404), S.STNB (2/5), S.STNM (1/36), T.HLL (1/61)
    Comment: In aligning airports, an airfield should have been an airport. However, there was not enough instance support.

GeoNames (larger) - DBpedia (smaller)
#5  r1: {geonames:countryCode = NL}
    p2 ∈ {v2}: dbpedia:country ∈ {dbpedia:The_Netherlands, dbpedia:Flag_of_the_Netherlands.svg, dbpedia:Netherlands}
    R'_U = 0.9802, |U_A| = 1939, |U_L| = 1978, explained instances = 1940
    Outliers: dbpedia:Kingdom_of_the_Netherlands
    Comment: The alignment for Netherlands should have been as straightforward as #2. However, we have possible alias names, such as The Netherlands and Kingdom of Netherlands, as well as a possible linkage error to Flag_of_the_Netherlands.svg.
#6  r1: {geonames:countryCode = JO}
    p2 ∈ {v2}: dbpedia:country ∈ {dbpedia:Jordan, dbpedia:Flag_of_Jordan.svg}
    R'_U = 0.95, |U_A| = 19, |U_L| = 20, explained instances = 20
    Comment: The error pattern in #5 seems to repeat systematically, as can be seen from this alignment for the country of Jordan.

DBpedia (larger) - LinkedGeoData (smaller)
#7  r1: {dbpedia:bundesland = Saarland}
    p2 ∈ {v2}: lgd:OpenGeoDBLicensePlateNumber ∈ {HOM, IGB, MZG, NK, SB, SLS, VK, WND}
    R'_U = 0.93, |U_A| = 46, |U_L| = 49, explained instances = 46
    Comment: Our algorithm also produces interesting alignments between different properties. In this case, we find 8 of the 10 license plates in the state of Saarland.

Table 3. Example alignments from LinkedGeoData - DBpedia and Geospecies - DBpedia.
The entries use the same columns as Table 2.

#8  r1: {rdf:type = dbpedia:EducationalInstitution}
    p2 ∈ {v2}: rdf:type ∈ {lgd:Amenity, lgd:K2543, lgd:School, lgd:University, lgd:WaterTower}
    R'_U = 0.9901, |U_A| = 2609, |U_L| = 2610, explained instances = 2609
    Comment: Educational Institutions in DBpedia can be explained with classes in LinkedGeoData. As an example of an incorrect alignment, a water tower has been linked to an educational institution.

LinkedGeoData (larger) - DBpedia (smaller)
#9  r1: {lgd:gnis%3AST_alpha = NJ}
    p2 ∈ {v2}: dbpedia:subdivisionName ∈ {Atlantic, Burlington, Cape May, Hudson, Hunterdon, Monmoth, New Jersey, Ocean, Passaic}
    R'_U = 1.0, |U_A| = 214, |U_L| = 214, explained instances = 214
    Comment: Due to missing instance alignments, this union alignment incorrectly claims that the state of New Jersey is composed of 9 counties while it actually has 21.
#10 r1: {rdf:type = lgd:Waterway}
    p2 ∈ {v2}: rdf:type ∈ {dbpedia:River, dbpedia:Stream}
    R'_U = 0.97, |U_A| = 33, |U_L| = 34, explained instances = 34
    Outliers: dbpedia:Place (1/94989)
    Comment: Waterways in LinkedGeoData are equal to the union of streams and rivers from DBpedia.

DBpedia (larger) - Geospecies (smaller)
#11 r1: {rdf:type = dbpedia:Amphibian}
    p2 ∈ {v2}: geospecies:hasOrderName ∈ {Anura, Caudata, Gymnophionia}
    R'_U = 0.99, |U_A| = 90, |U_L| = 91, explained instances = 91
    Outliers: Testudines (1/7)
    Comment: Species from Geospecies with the order names Anura, Caudata and Gymnophionia are all Amphibians. We also find inconsistencies due to misaligned instances, e.g. one Turtle (Testudine) was classified as an amphibian.
#12 r1: {rdf:type = dbpedia:Salamander}
    p2 ∈ {v2}: geospecies:hasOrderName = Caudata
    R'_U = 0.94, |U_A| = 16, |U_L| = 17, explained instances = 17
    Outliers: Testudines (1/7)
    Comment: Upon further inspection of #11, we find that the culprit is a Salamander.

Geospecies (larger) - DBpedia (smaller)
#13 r1: {rdf:type = dbpedia:Plant}
    p2 ∈ {v2}: geospecies:inKingdom = geospecies:kingdoms/Ab
    R'_U = 0.99, |U_A| = 1874, |U_L| = 1876, explained instances = 1875
    Outliers: geospecies:kingdoms/Ac (1/8)
    Comment: The Kingdom Plantae, from both sources, almost matches perfectly. The only inconsistent instance happens to be a fungus.

Table 4. Example alignments from GeneID - MGI.
The entries use the same columns as Table 2.

#14 r1: {geospecies:inOrder = geospecies:orders/jtSaY}
    p2 ∈ {v2}: dbpedia:ordo ∈ {dbpedia:Carnivora, dbpedia:Carnivore}
    R'_U = 0.99, |U_A| = 247, |U_L| = 247, explained instances = 247
    Comment: Inconsistencies in the object values can also be seen: Carnivores from Geospecies are aligned with both Carnivora and Carnivore.
#15 r1: {geospecies:hasOrderName = Chiroptera}
    p2 ∈ {v2}: dbpedia:ordo ∈ {Chiroptera@en, dbpedia:Bat}
    R'_U = 1, |U_A| = 111, |U_L| = 111, explained instances = 111
    Comment: We can detect that species with order Chiroptera correctly belong to the order of Bats. Unfortunately, due to values of the property being the literal "Chiroptera@en", the alignment is not clean.

GeneID (larger) - MGI (smaller)
#16 r1: {bio2rdf:subType = pseudo}
    p2 ∈ {v2}: bio2rdf:subType = Pseudogene
    R'_U = 0.93, |U_A| = 5919, |U_L| = 6317, explained instances = 6237
    Outliers: Gene (318/24692)
    Comment: Due to the absence of a clear hierarchy, we found only a few hierarchical relations. For example, alignments of the classes Pseudogenes.
#17 r1: {bio2rdf:xTaxon = taxon:10090}
    p2 ∈ {v2}: bio2rdf:subType ∈ {Complex Cluster/Region, DNA Segment, Gene, Pseudogene}
    R'_U = 1, |U_A| = 30993, |U_L| = 30993, explained instances = 30993
    Comment: The Mus Musculus (house mouse) taxonomy is completely composed of complex clusters, DNA segments, Genes and Pseudogenes.

MGI (larger) - GeneID (smaller)
#18 r1: {bio2rdf:subType = Pseudogene}
    p2 ∈ {v2}: bio2rdf:subType = pseudo
    R'_U = 0.94, |U_A| = 5919, |U_L| = 6297, explained instances = 6297
    Outliers: other (4/230), protein-coding (351/39999), unknown (23/570)
    Comment: Inconsistencies are also evident, as the values pseudo and Pseudogene are used to denote the same thing.
#19 r1: {mgi:genomeStart = 1}
    p2 ∈ {v2}: geneid:location ∈ {1, 10.0 cM, 11.0 cM, 110.4 cM, ...}
    R'_U = 0.98, |U_A| = 1697, |U_L| = 1735, explained instances = 1735
    Outliers: "" (37/1048), 5 (1/52)
#20 r1: {mgi:genomeStart = X}
    p2 ∈ {v2}: geneid:location ∈ {X, X0.5 cM, X0.8 cM, X1.0 cM, ...}
    R'_U = 0.99, |U_A| = 1748, |U_L| = 1758, explained instances = 1758
    Outliers: "" (10/1048)
    Comment (#19 and #20): We find interesting alignments like #19 and #20, which align the genome start position in MGI with the location in GeneID. As can be seen, the values of the locations (distances in centimorgans) in GeneID contain the genome start value as a prefix. Inconsistencies are also seen, e.g. in #19 a gene that starts with 5 is misaligned, and in #20 the value is an empty string.

6 Conclusions and Future Work
We described an approach to identifying union alignments in data sources on the Web of Linked Data from the Geospatial, Biological Classification and Genetics domains. By extending our definition of restriction classes with the disjunction operator, we are able to find alignments of union concepts from one source to larger concepts from the other source. Our approach produces coverings where concepts at different levels in the ontologies of two sources can be mapped even when there is no direct equivalence. We are also able to find outliers that enable us to identify inconsistencies in the instances that are linked by looking at the alignment pattern. The results provide deeper insight into the nature of the alignments of Linked Data.
As part of our future work, we want to try to find more complete descriptions for the sources. Our preliminary findings show that the results of this paper can be used to find patterns in the properties. For example, the countryCode property in GeoNames is closely associated with the country property in DBpedia, though their ranges are not exactly equal. We believe that an in-depth analysis of the alignment of the ontologies of sources is warranted, given the recent rise in the number of links in the Linked Data cloud. This is an extremely important step for the grand Semantic Web vision.
References
1. Bernstein, P., Madhavan, J., Rahm, E.: Generic schema matching, ten years later. Proceedings of the VLDB Endowment 4(11) (2011)
2. Cruz, I., Palmonari, M., Caimi, F., Stroe, C.: Towards on the go matching of linked open data ontologies. In: Workshop on Discovering Meaning On The Go in Large Heterogeneous Data. p. 37 (2011)
3. Euzenat, J., Shvaiko, P.: Ontology matching. Springer-Verlag (2007)
4. Jain, P., Hitzler, P., Sheth, A., Verma, K., Yeh, P.: Ontology alignment for linked open data. The Semantic Web–ISWC 2010 pp. 402–417 (2010)
5. Jain, P., Yeh, P., Verma, K., Vasquez, R., Damova, M., Hitzler, P., Sheth, A.: Contextual ontology alignment of LOD with an upper ontology: A case study with Proton. The Semantic Web: Research and Applications pp. 80–92 (2011)
6. Parundekar, R., Ambite, J.L., Knoblock, C.A.: Aligning unions of concepts in ontologies of geospatial linked data. In: Proceedings of the Terra Cognita 2011 Workshop in Conjunction with the 10th International Semantic Web Conference. Bonn, Germany (2011)
7. Parundekar, R., Knoblock, C.A., Ambite, J.L.: Aligning geospatial ontologies on the linked data web. In: Proceedings of the GIScience Workshop on Linked Spatiotemporal Data. Zurich, Switzerland (2010)
8. Parundekar, R., Knoblock, C.A., Ambite, J.L.: Linking and building ontologies of linked data. In: Proceedings of the 9th International Semantic Web Conference (ISWC 2010). Shanghai, China (2010)
9. Spiliopoulos, V., Valarakos, A., Vouros, G.: CSR: discovering subsumption relations for the alignment of ontologies. The Semantic Web: Research and Applications pp. 418–431 (2008)
10. Völker, J., Niepert, M.: Statistical schema induction. The Semantic Web: Research and Applications pp. 124–138 (2011)