May 11, 2015
Presentation Outline
• Contemporary workflow design pattern
  - Using workflow to capture experimentation process
  - Discovery of services using semantics
• Problem description
  - Syntactic incompatibility
  - Using ontologies for mediation
• Architecture to support syntactic mediation
• Mapping Language
  - Overview of mapping mechanics
  - Implementation description
• Future work
  - Dynamic discovery of Mappings
In Silico Experimentation
• Computational experimentation
• Access to resources provided by Web Services
• Users map the experimental process onto a workflow
• Tasks are realised by service instances
Service Discovery
• Users need to find services to fulfil given tasks, e.g.
  - Retrieve sequence data
  - Sequence alignment (BLAST)
• There are lots of services!
• Interface definitions can be terse, often undocumented and sometimes cryptic
• Limited semantic value
• Manual discovery is not ideal
Semantic Discovery
• Support users in the discovery of services according to domain-specific terminology
• Annotate service descriptions with concepts from an ontology (PEDRO annotation tool)
  - Input and output types are assigned a semantic type by reference to an ontology concept
• Discover services by:
  - Task performed
  - Resources used
  - Input and output semantic types
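As a rough illustration of discovery over annotated descriptions, the sketch below filters a small registry by task, resource, and input/output semantic types. The class, the concept names, and the service names (`getEntry`, `runBlast`) are hypothetical, and plain string equality stands in for the ontology reasoning a real matcher would use (for example, also accepting subconcepts).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceDescription:
    """A service annotated with ontology concepts (all names hypothetical)."""
    name: str
    task: str           # concept for the task performed
    resource: str       # resource the service uses
    input_type: str     # semantic type of the input
    output_type: str    # semantic type of the output

def discover(services, **criteria):
    """Return the services whose annotations equal every given criterion.

    Exact string equality stands in for ontology reasoning here.
    """
    return [s for s in services
            if all(getattr(s, key) == value for key, value in criteria.items())]

registry = [
    ServiceDescription("getEntry", "Retrieve_Sequence_Data", "DDBJ",
                       "Accession_Number", "Sequence_Data_Record"),
    ServiceDescription("runBlast", "Sequence_Alignment", "BLAST",
                       "Sequence_Data_Record", "Alignment_Report"),
]

print([s.name for s in discover(registry, task="Sequence_Alignment")])
print([s.name for s in discover(registry, input_type="Sequence_Data_Record")])
```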
Use Case
• Common bioinformatics task:
  I. Find sequence data for a given ID (accession number)
  II. Perform sequence alignment to discover similar sequence data
  III. Obtain results
• Itself a complete workflow, but likely to feature in larger workflows too
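The use case can be pictured as a linear workflow in which each task is realised by a service and each output feeds the next input. This is a minimal sketch: `fetch_sequence` and `align` are hypothetical stubs, not real DDBJ or BLAST clients, and real workflow engines support far more than a linear chain.

```python
from typing import Callable, List, Tuple

# A workflow step pairs a task name with the service that realises it.
Step = Tuple[str, Callable[[object], object]]

def run_workflow(steps: List[Step], initial: object):
    """Enact a linear workflow: each service's output is the next one's input."""
    data = initial
    trace = []
    for task, service in steps:
        data = service(data)
        trace.append(task)
    return data, trace

def fetch_sequence(accession: str) -> dict:
    # Hypothetical stub for a sequence retrieval service.
    return {"accession": accession, "sequence": "ACGT"}

def align(record: dict) -> dict:
    # Hypothetical stub for an alignment service such as BLAST; finds no hits.
    return {"query": record["accession"], "hits": []}

result, trace = run_workflow(
    [("Retrieve sequence data", fetch_sequence), ("Sequence alignment", align)],
    "AB000059",
)
print(trace)
print(result)
```

Step III (obtain results) corresponds to the value returned by `run_workflow`.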
Semantically Driven Workflow Design
• When building workflows, users connect services because they are deemed semantically compatible:
  - Output semantic type is equivalent to the input semantic type
Syntactic Compatibility
• However, semantically compatible service interfaces may not be syntactically compatible (i.e. they use different data formats)
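The distinction between semantic and syntactic compatibility can be sketched as a check on a link between two ports. Each port here carries a semantic type (ontology concept) and a data format; the `Port` class and the format names `DDBJ-XML` and `FASTA` are illustrative assumptions, not part of the described system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Port:
    """Hypothetical interface annotation: meaning plus wire format."""
    semantic_type: str
    data_format: str

def check_link(source: Port, target: Port) -> str:
    """Classify a connection from a service output to a service input."""
    if source.semantic_type != target.semantic_type:
        return "semantically incompatible"
    if source.data_format != target.data_format:
        return "needs syntactic mediation"
    return "compatible"

# Use case: a retrieved sequence record feeds an alignment service; both sides
# mean the same thing but (by assumption) expect different formats.
fetch_out = Port("Sequence_Data_Record", "DDBJ-XML")
blast_in = Port("Sequence_Data_Record", "FASTA")
print(check_link(fetch_out, blast_in))
```

A semantically driven workflow editor would accept this link, yet the data still needs translating before the alignment service can consume it.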
Syntactic Mediation
• When a mismatch in data formats occurs within a workflow, a translation component is required
• Current solutions are manual:
  - Identify when a mismatch occurs
  - Derive conversion requirements
  - Find a suitable conversion tool
  - Create new translation components if necessary
• These conversion components come in a variety of guises:
  - Translation scripts (e.g. XSLT)
  - Bespoke code (Java and Perl)
  - Web Services
• Simple solution: an adaptor for each pair of compatible data formats
  - O(n²) adaptors, poor scalability
• Alternative: introduce an intermediate representation
  - O(n) mappings, less effort to introduce new formats
• This is a data integration problem
Conversion Approaches
[Figure: the two conversion approaches, drawn over six data formats labelled a to f]
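The O(n²) versus O(n) claim can be made concrete by counting converters. This sketch assumes one-way adaptors between every ordered pair of formats (a bidirectional adaptor would halve the first count) and one mapping to and one from the shared representation per format; n = 6 matches the six formats labelled a to f.

```python
def direct_adaptors(n: int) -> int:
    """One-way converters needed to link every ordered pair of n formats."""
    return n * (n - 1)

def via_intermediate(n: int) -> int:
    """Converters needed when each format maps to and from one shared model."""
    return 2 * n

for n in (6, 7, 20):
    print(f"{n} formats: {direct_adaptors(n)} direct vs {via_intermediate(n)} via intermediate")
```

Going from six to seven formats costs 12 new direct adaptors (42 - 30) but only 2 new mappings (14 - 12) with an intermediate representation.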
Three Layer View
• Physical layer: data can be stored in different formats
  - e.g. binary, text, XML, relational database
• Logical layer: organisation of data elements described by a schema
  - e.g. XML Schema, relational database model
• Conceptual layer: what the data means (semantics)
  - e.g. ontology, description logic, entity-relationship diagram
Intermediate Representation
• The data integration field has used this solution in similar application domains:
  - TAMBIS Project [Stevens et al 2003]: complex query formulation over diverse bioinformatics information sources
  - SEEK Project [Bowers and Ludäscher 2004]: an ontology-driven framework for geographic data transformation in scientific workflows
• Intermediate representation in the form of a conceptual model, e.g. ontology, description logic
Architecture Requirements
1) OWL ontologies capture data format structure and semantics
  - Existing service ontologies [e.g. C. Wroe et al 2003] can be extended with concepts and properties to describe data contents
2) Modular and composable mapping language
  - Mapping overhead is reduced when service providers expose multiple operations over a single schema
  - When schemas are combined to form new datasets, existing mappings can be reused
3) Invocation of arbitrary Web Services
  - Grid and WS applications pull resources from multiple providers into a dynamic and volatile environment
  - Must be able to invoke previously unseen services
4) Minimise annotation overhead
  - Reuse existing Semantic Web Service description methods
  - Input and output types are assigned a concept (semantic type)
Mapping XML to OWL
• The problem can be simplified by assuming a canonical XML representation for OWL concept instances [OWL-XI]
  - XML serialisations of OWL concepts are commonly used
• However, no XML Schemas exist to validate such individuals
• To support validation, OWL instance Schemas [OWL-XIS] are generated from ontologies
  - Concept hierarchies are computed
  - Implemented in Java using Jena
• This lets us view the translation as an XML-to-XML transformation
Architecture Diagram
• Service providers describe their Web Service interfaces using WSDL. Data consumed and produced is defined using XML Schema.
• OWL ontologies are created to describe the information contained within bioinformatics data structures.
• Serialisation and Realisation Mappings describe how to transform XML documents to and from [OWL-XI].
• Semantic annotations associate each WSDL message part with a concept from the ontology.
• [OWL-XIS] schemas are generated to validate ontology instances.
Configurable Mediator
• Input:
  - Source data instance
  - Source schema
  - Realisation Mapping (source format -> ontology)
  - Ontology definition
  - Serialisation Mapping (ontology -> destination format)
  - Destination schema
• Output: destination data instance
• Conversion is performed via an intermediate OWL concept instance
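The mediator's core can be sketched as the composition of a realisation mapping and a serialisation mapping around an intermediate ontology instance. This is a minimal sketch, not the described implementation: mappings are modelled as plain Python functions over element trees, the toy formats (`seq`, `entry`) are hypothetical, and schema validation is omitted because the standard library has no XML Schema validator.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# A mapping is modelled as a function over element trees (an assumption).
MappingFn = Callable[[ET.Element], ET.Element]

def mediate(source_xml: str, realise: MappingFn, serialise: MappingFn) -> str:
    """Translate a document via an intermediate ontology instance."""
    source = ET.fromstring(source_xml)
    concept_instance = realise(source)          # source format -> ontology
    destination = serialise(concept_instance)   # ontology -> destination format
    return ET.tostring(destination, encoding="unicode")

# Hypothetical toy formats on either side of the ontology instance.
def realise_toy(src: ET.Element) -> ET.Element:
    record = ET.Element("Sequence_Data_Record")
    ET.SubElement(record, "accession_id").text = src.findtext("id")
    return record

def serialise_toy(record: ET.Element) -> ET.Element:
    entry = ET.Element("entry")
    ET.SubElement(entry, "acc").text = record.findtext("accession_id")
    return entry

print(mediate("<seq><id>AB000059</id></seq>", realise_toy, serialise_toy))
```

Because the two halves meet only at the ontology instance, a new destination format needs a new serialisation mapping but leaves every realisation mapping untouched.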
Mapping Mechanics
Source document:
<S>
  <X>foo</X>
  <X>bar</X>
</S>
Destination document:
<D>
  <Y>foo</Y>
  <Y>bar</Y>
</D>
Mappings:
m1: S/X -> D/Y
m2: X/$ -> Y/$
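To show what the two mappings achieve, here is a hand-coded Python equivalent. It is not an interpreter for the mapping language; it hard-wires m1 and m2, reading `$` as the literal text content of an element, as the mappings suggest.

```python
import xml.etree.ElementTree as ET

def apply_mappings(source: ET.Element) -> ET.Element:
    """Hand-coded translation of <S> documents into <D> documents."""
    if source.tag != "S":
        raise ValueError("expected an <S> root element")
    destination = ET.Element("D")  # a single D node shared by every match
    for x in source.findall("X"):
        # m1: S/X -> D/Y, so each X element under S yields a new Y under D.
        y = ET.SubElement(destination, "Y")
        # m2: X/$ -> Y/$, so the literal content of that X becomes its Y's content.
        y.text = x.text
    return destination

source = ET.fromstring("<S><X>foo</X><X>bar</X></S>")
print(ET.tostring(apply_mappings(source), encoding="unicode"))
# prints <D><Y>foo</Y><Y>bar</Y></D>
```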
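[Figure: graph view of the mapping mechanics. Source: S with two X children (edges S/*), holding the xsd:string literals "foo" and "bar". Destination: D with two Y children (edges D/*), holding "foo" and "bar". Annotated with m1: S/X -> D/Y and m2: X/$ -> Y/$]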
Example M-Binding
<binding xmlns="http://www.ecs.soton.ac.uk/~mns03r/mapping/example"
         xmlns:sns="http://jaco.ecs.soton.ac.uk/schema/source"
         xmlns:dns="http://jaco.ecs.soton.ac.uk/schema/destination">
  <mapping id="1">
    <source match="sns:S/sns:X"/>
    <destination create="dns:D[join]/dns:Y[branch]"/>
  </mapping>
  <mapping id="2">
    <source match="sns:X/$"/>
    <destination create="dns:Y[join]/$"/>
  </mapping>
</binding>
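An M-Binding is ordinary XML, so a mediator can load it with any XML parser. The sketch below reads the example binding with ElementTree and splits each destination path into steps with their `[join]` or `[branch]` annotation. Interpreting `join` as "reuse the existing node" and `branch` as "create a new node per match" is our reading, consistent with the single D and two Y elements in the mapping mechanics example; the helper names are hypothetical, and nested `<mapping>` elements are not handled.

```python
import re
import xml.etree.ElementTree as ET

BINDING = """<binding xmlns="http://www.ecs.soton.ac.uk/~mns03r/mapping/example"
    xmlns:sns="http://jaco.ecs.soton.ac.uk/schema/source"
    xmlns:dns="http://jaco.ecs.soton.ac.uk/schema/destination">
  <mapping id="1"><source match="sns:S/sns:X"/>
    <destination create="dns:D[join]/dns:Y[branch]"/></mapping>
  <mapping id="2"><source match="sns:X/$"/>
    <destination create="dns:Y[join]/$"/></mapping>
</binding>"""

# Elements in the binding live in its default namespace.
NS = {"m": "http://www.ecs.soton.ac.uk/~mns03r/mapping/example"}
# One destination step: a name (or $ for literal content) plus an optional mode.
STEP = re.compile(r"(?P<name>[^\[\]]+)(?:\[(?P<mode>join|branch)\])?")

def parse_steps(path: str):
    """Split a destination path into (step, mode) pairs; mode is None if absent."""
    steps = []
    for part in path.split("/"):
        m = STEP.fullmatch(part)
        if m is None:
            raise ValueError(f"unrecognised step {part!r} in {path!r}")
        steps.append((m.group("name"), m.group("mode")))
    return steps

def read_binding(text: str):
    """Map each top-level mapping id to its source match and destination steps."""
    root = ET.fromstring(text)
    result = {}
    for mapping in root.findall("m:mapping", NS):
        match = mapping.find("m:source", NS).get("match")
        create = mapping.find("m:destination", NS).get("create")
        result[mapping.get("id")] = (match, parse_steps(create))
    return result

for mapping_id, (match, steps) in read_binding(BINDING).items():
    print(mapping_id, match, steps)
```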
Bio Example
Source (DDBJ XML):
<ddbj:DDBJXML>
  <ddbj:ACCESSION>AB000059</ddbj:ACCESSION>
  <ddbj:FEATURES>
    <ddbj:source>
      <ddbj:location>1..1755</ddbj:location>
      <ddbj:qualifiers name="isolate">Som1</ddbj:qualifiers>
      <ddbj:qualifiers name="lab_host">Felis domesticus</ddbj:qualifiers>
    </ddbj:source>
  </ddbj:FEATURES>
</ddbj:DDBJXML>
Destination (ontology instance):
<ont:Sequence_Data_Record>
  <ont:accession_id>AB000059</ont:accession_id>
  <ont:has_feature>
    <ont:Feature_Source>
      <ont:isolate>Som1</ont:isolate>
      <ont:lab_host>Felis domesticus</ont:lab_host>
      <ont:location>
        <ont:Feature_Location>
          <ont:start>1</ont:start>
          <ont:end>1755</ont:end>
        </ont:Feature_Location>
      </ont:location>
    </ont:Feature_Source>
  </ont:has_feature>
</ont:Sequence_Data_Record>
Mapping types illustrated: simple one-to-one, element and literal, many-to-many, split literal value, predicate evaluation.
Example M-Binding
<binding xmlns="http://www.ecs.soton.ac.uk/~mns03r/mapping/ddbj-to-ont-mapping"
         xmlns:sns="http://jaco.ecs.soton.ac.uk/schema/DDBJ"
         xmlns:dns="http://jaco.ecs.soton.ac.uk/ont/sequencedata">
  <mapping id="1">
    <source match="sns:DDBJXML/sns:ACCESSION"/>
    <destination create="dns:Sequence_Data_Record[join]/dns:accession_id[branch]"/>
  </mapping>
  <mapping id="2">
    <source match="sns:ACCESSION/$"/>
    <destination create="dns:accession_id[join]/$"/>
  </mapping>
  <mapping id="3">
    <source match="sns:DDBJXML/sns:FEATURES/sns:source"/>
    <destination create="dns:Sequence_Data_Record[join]/dns:has_feature[branch]/dns:Feature_Source[branch]"/>
  </mapping>
  <mapping id="4">
    <source match='sns:source/sns:qualifiers[sns:qualifiers/sns:name/$ = "lab_host"]'/>
    <destination create="dns:Feature_Source[join]/dns:lab_host[branch]"/>
    <mapping>
      <source match="sns:qualifiers/$"/>
      <destination create="dns:lab_host[join]/$"/>
    </mapping>
  </mapping>
  <mapping id="5">
    <source match="sns:location/$^[^.]+"/>
    <destination create="dns:Location[join]/dns:start[branch]/$"/>
  </mapping>
</binding>
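For comparison, the sketch below hand-codes the same DDBJ-to-ontology translation in Python, showing predicate evaluation (selecting qualifiers by their `name` attribute) and splitting a literal (`1..1755` into start and end). It is not the mediator: namespace prefixes are dropped for brevity, the end position and the `isolate` qualifier are mapped as in the ontology instance although the binding shown maps only `start` and `lab_host`, and the function name `realise` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# DDBJ fragment from the example with the ddbj: prefix dropped (a simplification).
DDBJ = """<DDBJXML>
  <ACCESSION>AB000059</ACCESSION>
  <FEATURES>
    <source>
      <location>1..1755</location>
      <qualifiers name="isolate">Som1</qualifiers>
      <qualifiers name="lab_host">Felis domesticus</qualifiers>
    </source>
  </FEATURES>
</DDBJXML>"""

def realise(ddbj: ET.Element) -> ET.Element:
    """Hand-coded DDBJ -> Sequence_Data_Record translation."""
    record = ET.Element("Sequence_Data_Record")
    # Mappings 1 and 2: the ACCESSION element and its literal become accession_id.
    ET.SubElement(record, "accession_id").text = ddbj.findtext("ACCESSION")
    for src in ddbj.findall("FEATURES/source"):
        # Mapping 3: each source feature branches a new has_feature/Feature_Source.
        feature = ET.SubElement(ET.SubElement(record, "has_feature"), "Feature_Source")
        # Predicate evaluation (as in mapping 4): pick qualifiers by name attribute.
        for name in ("isolate", "lab_host"):
            for q in src.findall(f"qualifiers[@name='{name}']"):
                ET.SubElement(feature, name).text = q.text
        # Split literal value (as in mapping 5): "1..1755" -> start 1, end 1755.
        loc = re.fullmatch(r"(\d+)\.\.(\d+)", src.findtext("location", ""))
        if loc:
            where = ET.SubElement(ET.SubElement(feature, "location"), "Feature_Location")
            ET.SubElement(where, "start").text = loc.group(1)
            ET.SubElement(where, "end").text = loc.group(2)
    return record

print(ET.tostring(realise(ET.fromstring(DDBJ)), encoding="unicode"))
```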
Conclusions
• Provide infrastructure to support syntactic mediation:
  - OWL ontologies to capture data format structure and semantics (reusing existing annotations)
  - Mapping Language to describe relationships between XML Schemas and OWL ontologies (modular and composable)
  - Configurable Mediator to consume mappings and perform document translation
  - Dynamic Web Service Invoker
Future Work
• Dynamic discovery of Mappings
  - Already implemented using the GRIMOIRES registry and WSDL to describe mapping capabilities
  - [Szomszor, Payne, Moreau 2006] UK All Hands, Nottingham
• Annotation tool
  - Mappings are complex and difficult to write by hand
  - Web-based annotation tool
Questions and Comments?