Fixing the Domain and Range of Properties in Linked Data by Context Disambiguation
Alberto Tonon, Michele Catasta, Gianluca Demartini, Philippe Cudré-Mauroux
LDOW - May the 19th, 2015
Linked Data…
[Figure: example data graph. Literals: "How I Met Your Mother" (showName), "Cobie Smulders" and "Neil Patrick Harris" (name). Properties: showName, starring (twice), name (twice), network, type. Types: TV Show, Work, Actor, Person, TV Network, Broadcast Network.]
… and its Schema
Type Hierarchy
[Figure: Thing at the root; Person, Work, Organisation, ... below it; Actor, TV show, Broadcaster, ... as more specific types.]
Property Definitions
• network: domain Broadcaster, range Broadcaster
• starring: domain Work, range Actor
Data-Schema Coherence
[Figure: the example data graph ("How I Met Your Mother", "Cobie Smulders", "Neil Patrick Harris"; properties showName, starring, name, network, type) shown next to the property definitions: network (domain Broadcaster, range Broadcaster) and starring (domain Work, range Actor).]
Coherence marks on the figure: ✔ ✔ ✘ (of the three marked statements, two match the property definitions and one does not).
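A minimal sketch of the coherence check illustrated above, assuming a toy schema: a statement counts as coherent when the subject has a type under the property's domain and the object has a type under its range. The `PARENT` map, the placement of Broadcast Network under Broadcaster, and the function names are assumptions made for illustration, not the authors' implementation.

```python
# Toy type hierarchy (child -> parent). Assumed for illustration only.
PARENT = {
    "Actor": "Person",
    "TV Show": "Work",
    "Broadcast Network": "Broadcaster",  # assumption, not stated on the slide
    "Broadcaster": "Organisation",
    "Person": "Thing",
    "Work": "Thing",
    "Organisation": "Thing",
}

# Property definitions from the schema slide: property -> (domain, range).
SCHEMA = {
    "starring": ("Work", "Actor"),
    "network": ("Broadcaster", "Broadcaster"),
}


def ancestors(t):
    """Return t together with all of its supertypes."""
    out = {t}
    while t in PARENT:
        t = PARENT[t]
        out.add(t)
    return out


def has_type(types, target):
    """True if any declared type equals target or is one of its subtypes."""
    return any(target in ancestors(t) for t in types)


def coherent(subject_types, prop, object_types):
    """Check one statement against the domain and range of its property."""
    domain, range_ = SCHEMA[prop]
    return has_type(subject_types, domain) and has_type(object_types, range_)
```

With this toy schema, a TV Show starring an Actor passes, while a TV Show used as the subject of network fails on the domain side.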
Incoherences in Real KBs
Property            Dom Incoherences   % Dom Incoherences
dpo:years           ~641k              100%
dpo:currentMember   ~260k              100%
…                   …                  …

Property            Dom Incoherences   % Dom Incoherences
fb:[…]object.type   ~99M               61%
fb:[…]object.name   ~41M               100%
…                   …                  …
Data-Driven Domains/Ranges
• Just intersect the types of all resources appearing as subject/object…
• …being consistent with the type hierarchy.
[Figure: Type Hierarchy, with Thing at the root; Person, Work, Organisation, ... below it; Actor, TV show, Broadcaster, ... as more specific types.]
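The intersection idea above can be sketched as follows, assuming a toy hierarchy: expand each subject's types with their supertypes, intersect the expanded sets over all subjects, and keep the deepest surviving type. The `PARENT` map and all function names are hypothetical; ties between equally deep types are broken arbitrarily in this sketch.

```python
# Toy type hierarchy (child -> parent). Assumed for illustration only.
PARENT = {
    "Actor": "Person",
    "TV Show": "Work",
    "Broadcaster": "Organisation",
    "Person": "Thing",
    "Work": "Thing",
    "Organisation": "Thing",
}


def ancestors(t):
    """Return t together with all of its supertypes."""
    out = {t}
    while t in PARENT:
        t = PARENT[t]
        out.add(t)
    return out


def depth(t):
    """Number of edges between t and the root of the hierarchy."""
    return len(ancestors(t)) - 1


def data_driven_domain(subject_type_sets):
    """Most specific type shared by every subject of a property.

    subject_type_sets holds one set of declared types per subject resource.
    Falls back to the root type when nothing more specific is shared.
    """
    if not subject_type_sets:
        return "Thing"
    closures = [
        set().union(*(ancestors(t) for t in types))
        for types in subject_type_sets
    ]
    common = set.intersection(*closures) or {"Thing"}
    return max(common, key=depth)
```

For example, subjects typed Actor and Person share Person, whereas subjects typed Actor and TV Show share only Thing.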
Data-Driven Domains/Ranges
• Dom(foaf:name) = Thing → everything has a name!
• Dom(dpo:manager) = Thing → everything has a manager?
LEXT: an Example
Computing the domain of dpo:manager
[Figure: type hierarchy annotated with probabilities. Thing 1.00; SportSeason 0.55, SportsTeamSeason 0.55, SoccerClubSeason 0.55; Agent 0.44, Organisation 0.44, SportsTeam 0.44; leaf types Soccer, Cricket, Rugby, Baseball, ..., one of them with 0.42 and the others with small values ε_1, ..., ε_{k-1}, ε_k. Entropy annotations on the figure: H = 1.96 and H = 0.9.]
• dpo:manager is used in two different contexts
[Figure: manager (Thing) with two new sub-properties: soccer club season manager (SoccerClubSeason) and sports team manager (SportsTeam).]
LEXT
Visit the hierarchy until:
1) Pr(type | property) ≥ λ, and
2) H(Pr(property | children)) < η
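The two conditions of the LEXT rule above can be evaluated with a few lines of Python. This is only a sketch of the test applied at a single node, not of the full traversal, which the slide does not spell out. Assumptions made here: H is Shannon entropy in bits, the children's values are normalised before computing it, and the defaults λ = 0.1 and η = 1 are the values reported in the evaluation. The names `entropy` and `lext_stop` are hypothetical.

```python
import math


def entropy(values):
    """Shannon entropy in bits of non-negative values, normalised to sum to 1."""
    total = sum(values)
    if total == 0:
        return 0.0
    return -sum((v / total) * math.log2(v / total) for v in values if v > 0)


def lext_stop(pr_type_given_property, child_values, lam=0.1, eta=1.0):
    """Evaluate the two conditions of the rule at one node of the hierarchy.

    pr_type_given_property: Pr(type | property) for the current type.
    child_values: the distribution over the children of the current type.
    """
    return pr_type_given_property >= lam and entropy(child_values) < eta
```

A node whose children are used uniformly (high entropy) fails the second condition, while a node dominated by a single child passes it.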
REXT and LERIXT
• REXT = LEXT but with types of object resources
• LERIXT = LEXT + REXT
• two type trees (one for Domain and one for Range), current state is a pair (subject type, object type)
[Figure: two copies of the type hierarchy (Thing; SportSeason, Agent, ...; SportsTeamSeason, Organisation, ...; SoccerClubSeason, SportsTeam, ...; Soccer, Cricket, Rugby, Baseball, ...), one per tree, with the current state marked.]
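To make the pair state concrete, the sketch below estimates, for one property, how often each (subject type, object type) pair is covered by its statements, counting supertypes on both sides. The hierarchy in `PARENT` (including Person under Agent), the one-type-per-resource simplification, and the name `pair_probabilities` are assumptions for illustration; the slides do not describe how LERIXT computes its statistics.

```python
from collections import Counter
from itertools import product

# Toy type hierarchy (child -> parent). Assumed for illustration only.
PARENT = {
    "SoccerClubSeason": "SportsTeamSeason",
    "SportsTeamSeason": "SportSeason",
    "SportSeason": "Thing",
    "SportsTeam": "Organisation",
    "Organisation": "Agent",
    "Person": "Agent",
    "Agent": "Thing",
}


def ancestors(t):
    """Return t together with all of its supertypes."""
    out = {t}
    while t in PARENT:
        t = PARENT[t]
        out.add(t)
    return out


def pair_probabilities(statements):
    """Fraction of statements covered by each (subject type, object type) pair.

    statements: list of (subject_type, object_type) tuples for one property,
    one most-specific type per resource (a simplification).
    """
    counts = Counter()
    for s_type, o_type in statements:
        for pair in product(ancestors(s_type), ancestors(o_type)):
            counts[pair] += 1
    n = len(statements)
    return {pair: c / n for pair, c in counts.items()}
```

The pair (Thing, Thing) always covers every statement, and more specific pairs cover progressively fewer, which is what a search over the two trees would trade off.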
About λ
Figure 1: Coverage and number of new sub-properties varying λ.
Evaluation
• Fixed λ = 0.1, η = 1
• 3 authors + 2 experts (majority vote) evaluated the output of LEXT, REXT, and LERIXT.
• LERIXT generates too many new sub-properties

            LEXT     REXT     LERIXT
Precision   96.50%   91.40%   87.00%
Table 2: Precision of LEXT, REXT, and LERIXT
Conclusion
• Three different methods for identifying contexts
• LEXT: exploits the type of the subject resources
• REXT: exploits the type of the object resources
• LERIXT: exploits both
• Up to 96.50% precision.