SPARQLeR: Extended Sparql for Semantic Association Discovery Krzysztof Kochut and Maciej Janik Work supported by the National Science Foundation Grant No. IIS-0325464, entitled “SemDIS: Discovering Complex Relationships in the Semantic Web”. ESWC 2007, Innsbruck, Austria June 4, 2007
SPARQLeR: Extended Sparql for Semantic Association Discovery
Krzysztof Kochut and Maciej Janik
Work supported by the National Science Foundation Grant No. IIS-0325464, entitled “SemDIS: Discovering Complex Relationships in the Semantic Web”.
ESWC 2007, Innsbruck, Austria
June 4, 2007
Computer Science Department, University of Georgia
Paths in RDF
Directed path
Undirected path
Undirected path, but with specific properties and directionality
Why are paths interesting?
• A path describes how entities are related.
– Relationships on the path define the meaning of this connection.
– Entities on the path specify the content.
• Do you have migraine? Try taking magnesium!
– Path discovered by Dr. D. R. Swanson from partial information available in PubMed publications
  • stress can lead to loss of magnesium in the human body
  • migraine patients seem to be experiencing stress
  … that’s why …
  • migraine could lead to a loss of magnesium, so … take magnesium to fight migraine!
Swanson, D.R. Migraine and Magnesium: Eleven Neglected Connections. Perspectives in Biology and Medicine, 31 (4). 526-557.
Formally, what is a simple path?
• Simple directed path between resources r_0 and r_n in a description base R:
– sequence r_0 p_1 r_1 p_2 r_2, …, p_{n-1} r_{n-1} p_n r_n (n > 0)
– r_0 p_1 r_1, r_1 p_2 r_2, …, r_{n-2} p_{n-1} r_{n-1}, r_{n-1} p_n r_n are triples in R
– all of the resources r_i (0 ≤ i ≤ n) in the path are distinct
• Simple undirected path between resources r_0 and r_n in R:
– sequence r_0 p_1 r_1 p_2 r_2, …, p_{n-1} r_{n-1} p_n r_n (n > 0)
– for each r_{i-1} p_i r_i (0 < i ≤ n) in the path, either r_{i-1} p_i r_i or r_i p_i r_{i-1} is a triple in R
– all of the resources r_i (0 ≤ i ≤ n) in the path are distinct
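The two definitions above can be checked mechanically. The following is a minimal Python sketch, not code from the SPARQLeR system: it assumes the description base R is a plain set of (subject, property, object) string triples, and the function name `is_simple_path` and the property names in the sample data are hypothetical.

```python
from typing import Sequence, Set, Tuple

Triple = Tuple[str, str, str]


def is_simple_path(seq: Sequence[str], triples: Set[Triple],
                   directed: bool = True) -> bool:
    """Check whether seq = [r_0, p_1, r_1, ..., p_n, r_n] is a simple path.

    Assumption: the description base is a set of (subject, property, object)
    tuples of strings.
    """
    # n > 0 means at least one property, and the sequence must alternate
    # resource, property, resource, ..., so its length is odd and >= 3.
    if len(seq) < 3 or len(seq) % 2 == 0:
        return False

    # All resources r_0 .. r_n must be distinct.
    resources = seq[0::2]
    if len(set(resources)) != len(resources):
        return False

    # Every step r_{i-1} p_i r_i must be backed by a triple in R;
    # in the undirected case the reversed triple is also accepted.
    for i in range(1, len(seq), 2):
        s, p, o = seq[i - 1], seq[i], seq[i + 1]
        if (s, p, o) in triples:
            continue
        if not directed and (o, p, s) in triples:
            continue
        return False
    return True


# Hypothetical triples loosely modelled on the migraine example.
R = {
    ("migraine", "associated_with", "stress"),
    ("stress", "causes_loss_of", "magnesium"),
}
forward = ["migraine", "associated_with", "stress", "causes_loss_of", "magnesium"]
backward = ["stress", "associated_with", "migraine"]
```

Here `forward` follows the triples in their stated direction, while `backward` traverses one triple against its direction, so it only qualifies as an undirected path.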
Paths and SPARQL
• A SPARQL query can express only static graph patterns.
– Some flexibility is introduced by the OPTIONAL part, but it does not solve path problems.
• No support for flexible-length path expressions.
– The glycan biosynthesis pathway in biology has a specific pattern (properties), but its length may be unknown.
– A path to be discovered may be of unknown length and pattern, as in Dr. Swanson’s example.
What do we need to discover paths?
• Knowledge discovery needs more flexible patterns.
– Patterns may be partially known or even unknown (unrestricted path).
– Properties on the path, their order and directionality create a specific meaning.
– Entities on the path provide content.
– Relationships to entities outside of the path give an additional context.
Proposed extensions
• A path may have a flexible length
– For computational reasons, length is limited.
• Constraints on properties
– Specific properties must appear in the path.
– Their order and directionality are meaningful.
– They can form a repeating pattern.
• Constraints on resources
– Specific resources must be on the path.
– They can be anywhere on the path or at specific positions.
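To make the length limit and the resource constraint concrete, here is a small Python sketch of a bounded depth-first search over simple directed paths. It is only an illustration under stated assumptions, not the search strategy used by SPARQLeR: the graph is a set of (subject, property, object) string triples, and `simple_paths`, `max_length` and `required` are hypothetical names.

```python
from typing import FrozenSet, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]


def simple_paths(triples: Set[Triple], start: str, end: str, max_length: int,
                 required: FrozenSet[str] = frozenset()) -> Iterator[List[str]]:
    """Yield simple directed paths [r_0, p_1, r_1, ..., p_n, r_n] from start
    to end with 1 <= n <= max_length that visit every resource in `required`.
    """
    def walk(path: List[str], visited: Set[str]) -> Iterator[List[str]]:
        node = path[-1]
        if node == end and len(path) > 1:
            # Resource constraint: required resources may sit anywhere.
            if required <= visited:
                yield list(path)
            return
        # Length limit: the number of properties used so far.
        if (len(path) - 1) // 2 >= max_length:
            return
        for s, p, o in sorted(triples):
            # Follow outgoing triples only, never revisiting a resource.
            if s == node and o not in visited:
                yield from walk(path + [p, o], visited | {o})

    yield from walk([start], {start})


# Hypothetical toy graph.
G = {("a", "p", "b"), ("b", "q", "c"), ("a", "r", "c"), ("c", "s", "d")}
all_up_to_2 = list(simple_paths(G, "a", "c", max_length=2))
only_via_b = list(simple_paths(G, "a", "c", max_length=2, required=frozenset({"b"})))
```

Scanning every triple at each step keeps the sketch short; an adjacency index keyed by subject would be the obvious refinement for anything beyond toy data.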
SPARQLeR
• Extension of SPARQL for semantic association discovery.
• Seamlessly integrated into the SPARQL syntax.
• Graph patterns incorporating simple paths with constraints.
• Constraints are based on regular expressions over properties.
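As a rough illustration of a regular-expression constraint over properties, the Python sketch below encodes the property sequence of a path as a string and matches it with the standard `re` module. This is not SPARQLeR syntax, which this part of the deck does not show; the `<name>` encoding, the function `properties_match` and the property names are all assumptions made for the example.

```python
import re
from typing import Sequence


def properties_match(path: Sequence[str], pattern: str) -> bool:
    """Match the properties of path = [r_0, p_1, r_1, ..., p_n, r_n]
    against a regular expression.

    Assumption: each property is encoded as "<name>" so that property
    boundaries stay unambiguous inside the regular expression.
    """
    properties = path[1::2]  # every second element is a property
    encoded = "".join(f"<{p}>" for p in properties)
    return re.fullmatch(pattern, encoded) is not None


# Hypothetical repeating pattern: one or more (has_substrate, has_product) pairs.
REPEATING = r"(?:<has_substrate><has_product>)+"

two_steps = ["g1", "has_substrate", "rx1", "has_product", "g2",
             "has_substrate", "rx2", "has_product", "g3"]
half_step = ["g1", "has_substrate", "rx1"]
```

The pattern fixes which properties appear and in what order while leaving the number of repetitions open, which is the kind of flexibility the bullet above describes.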
What is a path in SPARQLeR?
• A path is a meta-property that connects two resources.
– Defined as a sequence of interleaving properties and resources.
– Starts and ends with properties (endpoint resources are not included).
– A path of length 1 is a sequence with just one property.
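The description above can be illustrated with a short Python sketch that turns a full sequence r_0 p_1 r_1 … p_n r_n into the path value that connects r_0 and r_n. The list representation and the helper names `as_meta_property` and `path_length` are assumptions for illustration, not part of SPARQLeR.

```python
from typing import List, Sequence


def as_meta_property(seq: Sequence[str]) -> List[str]:
    """Drop the endpoint resources from [r_0, p_1, r_1, ..., p_n, r_n],
    leaving [p_1, r_1, ..., p_n], which starts and ends with a property."""
    if len(seq) < 3 or len(seq) % 2 == 0:
        raise ValueError("expected r_0, p_1, r_1, ..., p_n, r_n with n > 0")
    return list(seq[1:-1])


def path_length(path: Sequence[str]) -> int:
    """Length of a path = number of properties it contains."""
    return (len(path) + 1) // 2


direct = as_meta_property(["a", "p", "b"])
two_hop = as_meta_property(["a", "p", "b", "q", "c"])
```

With this representation, `direct` holds a single property and has length 1, matching the last bullet above.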