Date: 10/11/2012 Common Motifs in Scientific Workflows: An Empirical Analysis Daniel Garijo *, Pinar Alper ⱡ , Khalid Belhajjame ⱡ , Oscar Corcho *, Yolanda Gil Ŧ , Carole Goble ⱡ * Universidad Politécnica de Madrid, ⱡ University of Manchester, Ŧ USC Information Sciences Institute IEEE eScience 2012. Chicago, USA
Common Motifs in Scientific Workflows: An Empirical Analysis
Slides for the e-Science 2012 presentation of the paper "Common Motifs in Scientific Workflows: An Empirical Analysis". The paper analyses 177 workflows from the Taverna and Wings workflow systems, across diverse domains. The analysis highlights the common motifs, or patterns, found in the templates, based on the functionality of each workflow step.
Transcript
Date: 10/11/2012
Common Motifs in Scientific Workflows: An Empirical Analysis
Daniel Garijo *, Pinar Alper ⱡ, Khalid Belhajjame ⱡ, Oscar Corcho *, Yolanda Gil Ŧ, Carole Goble ⱡ
* Universidad Politécnica de Madrid, ⱡ University of Manchester, Ŧ USC Information Sciences Institute
IEEE eScience 2012. Chicago, USA
Overview
• Empirical analysis on 177 workflow templates from Taverna and Wings
• Catalog of recurring patterns: scientific workflow motifs.
• Workflow motif: domain-independent conceptual abstraction on the workflow steps.
1. Data-oriented motifs (WHAT?): What kind of manipulations does the workflow have?
  • E.g.: data retrieval, data preparation, etc.
2. Workflow-oriented motifs (HOW?): How does the workflow perform its operations?
  • E.g.: stateful steps, stateless steps, human interactions, etc.
Data-Oriented Motifs
Data Retrieval
Data Preparation
Format Transformation
Input Augmentation and Output Splitting
Data Organisation
Data Analysis
Data Curation/Cleaning
Data Moving
Data Visualisation
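One way to make the catalog above concrete is to treat each motif as a label that can be attached to a workflow step. The sketch below is only an illustration: the enum mirrors the slide's list as written (flat, without implying any hierarchy), and the workflow and step names (`fetch_records`, `convert_to_csv`, and so on) are hypothetical, not taken from the analysed templates.

```python
from enum import Enum


class DataMotif(Enum):
    """Data-oriented motifs, copied from the slide in the order listed."""
    DATA_RETRIEVAL = "Data Retrieval"
    DATA_PREPARATION = "Data Preparation"
    FORMAT_TRANSFORMATION = "Format Transformation"
    INPUT_AUGMENTATION_AND_OUTPUT_SPLITTING = "Input Augmentation and Output Splitting"
    DATA_ORGANISATION = "Data Organisation"
    DATA_ANALYSIS = "Data Analysis"
    DATA_CURATION_CLEANING = "Data Curation/Cleaning"
    DATA_MOVING = "Data Moving"
    DATA_VISUALISATION = "Data Visualisation"


# Hypothetical workflow: step names are invented for illustration only.
example_workflow = [
    ("fetch_records", DataMotif.DATA_RETRIEVAL),
    ("convert_to_csv", DataMotif.FORMAT_TRANSFORMATION),
    ("cluster_records", DataMotif.DATA_ANALYSIS),
    ("plot_clusters", DataMotif.DATA_VISUALISATION),
]


def motifs_in(workflow):
    """Return the distinct motifs of a workflow, in order of first appearance."""
    seen = []
    for _step_name, motif in workflow:
        if motif not in seen:
            seen.append(motif)
    return seen


if __name__ == "__main__":
    for motif in motifs_in(example_workflow):
        print(motif.value)
```

Because the labels describe what a step does rather than how it is implemented, the same annotation scheme could in principle be applied to templates from either workflow system.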
Workflow-Oriented Motifs
Intra-Workflow Motifs
Stateful (Asynchronous) Invocations
Stateless (Synchronous) Invocations
Internal Macros
Human Interactions
Inter-Workflow Motifs
Atomic Workflows
Composite Workflows
Workflow Overloading
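The two groups on this slide can be sketched as a small lookup table. The grouping below is read off the order of the slide (four motifs under the intra-workflow heading, three under the inter-workflow heading), which is an assumption about the intended layout; the names `WORKFLOW_MOTIFS` and `group_of` are hypothetical helpers.

```python
# Grouping inferred from the order of the slide; treat it as an assumption.
WORKFLOW_MOTIFS = {
    "Intra-Workflow Motifs": [
        "Stateful (Asynchronous) Invocations",
        "Stateless (Synchronous) Invocations",
        "Internal Macros",
        "Human Interactions",
    ],
    "Inter-Workflow Motifs": [
        "Atomic Workflows",
        "Composite Workflows",
        "Workflow Overloading",
    ],
}


def group_of(motif):
    """Return the group heading under which a workflow-oriented motif is listed."""
    for group, motifs in WORKFLOW_MOTIFS.items():
        if motif in motifs:
            return group
    raise KeyError(f"unknown workflow-oriented motif: {motif}")


if __name__ == "__main__":
    print(group_of("Internal Macros"))
    print(group_of("Composite Workflows"))
```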
Experiment setup
• 177 workflow templates
  • 111 from Taverna, sample from myExperiment
  • 66 from Wings, available in public server (now as Linked Data)
  • Diverse domains
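As a quick sanity check on the corpus size, the two per-system counts stated above can be added up and turned into shares. The percentages are derived here from those two counts only; they are not figures quoted on the slide.

```python
# Counts as stated on the slide.
corpus = {"Taverna": 111, "Wings": 66}

total = sum(corpus.values())

# Share of each system in the corpus, as a percentage rounded to one decimal.
shares = {system: round(100 * count / total, 1) for system, count in corpus.items()}

if __name__ == "__main__":
    print(total)
    print(shares)
```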
[Figure: number of workflows per domain (Drug Discovery, Astronomy, Biodiversity, ChemInformatics, Genomics, GeoInformatics, IST600, Text Analytics), shown separately for Taverna and Wings; scale from 0 to 40.]
Result Summary: Data-Oriented Motifs
• Over 60% of the motifs are data preparation motifs
  • Of the 4 subcategories, the most common across domains are output splitting, input augmentation, and reformatting steps.
• Data retrieval is common in domains where curated databases exist
• Data analysis is often the main functionality of the workflow
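Percentages like those above come from counting how often each motif is annotated across the templates. The sketch below shows one way such a tally could be computed. The annotations are hypothetical placeholder data, not the study's results, and the set of motifs counted as data preparation is an assumption made for the example.

```python
from collections import Counter

# Hypothetical annotations: workflow name -> motifs found in its steps.
# These are invented for illustration and are NOT the study's data.
annotations = {
    "workflow_a": ["Data Retrieval", "Format Transformation", "Data Analysis"],
    "workflow_b": [
        "Input Augmentation and Output Splitting",
        "Data Analysis",
        "Data Visualisation",
    ],
}

# Assumption: which motifs are counted as data preparation in this example.
PREPARATION = {
    "Format Transformation",
    "Input Augmentation and Output Splitting",
    "Data Organisation",
}


def motif_shares(annotations):
    """Return each motif's share of all motif occurrences (values sum to 1)."""
    counts = Counter(m for motifs in annotations.values() for m in motifs)
    total = sum(counts.values())
    return {motif: count / total for motif, count in counts.items()}


def preparation_share(annotations):
    """Return the combined share of the motifs listed in PREPARATION."""
    shares = motif_shares(annotations)
    return sum(share for motif, share in shares.items() if motif in PREPARATION)


if __name__ == "__main__":
    for motif, share in motif_shares(annotations).items():
        print(f"{motif}: {share:.0%}")
    print(f"Data preparation (combined): {preparation_share(annotations):.0%}")
```

Counting occurrences rather than workflows is a design choice for this sketch; counting the number of workflows in which a motif appears at least once would give different percentages.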
Result Summary: Workflow-Oriented Motifs
• Around 40% composite workflows and internal macros
• Workflow reuse is present even in some atomic workflows
• Human interaction steps are increasingly used in some domains
Differences and commonalities of the workflow systems
• Data moving/retrieval, stateful interactions and human interaction steps are not present in Wings
  • Web services (Taverna) versus software components (Wings)
  • Wings has layered execution through Pegasus
• Data preparation steps are common in both systems