A Chemical Workflow Engine for Scientific Workflows with Dynamicity Support
Manuel Caeiro (CoreGRID Post Doc, IRISA, Rennes, France and MTA SZTAKI, Budapest, Hungary; Associated Teacher, University of Vigo, Spain)
Zsolt Nemeth (MTA SZTAKI, Budapest, Hungary)
Thierry Priol (IRISA, Rennes, France)
Manuel Caeiro / A Chemical Workflow Engine for Scientific Workflows with Dynamicity Support 2
Outline of the Presentation
1. Introduction
• Scientific Workflows
• The Chemical Computation Model
2. Proposal
• The Scientific Workflow Language
• The Chemical Workflow Engine
• Dynamicity Support
3. Validation
4. Conclusions and Future Works
1. Introduction
This work has been performed in the context of the CoreGRID Network of Excellence:
• IRISA (Rennes): December 2007 – March 2008
• SZTAKI (Budapest): April 2008 – August 2008
(Map: Vigo, Rennes, Budapest)
1. Introduction: Scientific Workflows
Scientific applications and experiments involve:
• A large number of operations
• Large data sets
• Complex algorithms
Earth Sciences
Biology
Medical Image Analysis
Astronomy
Weather Prediction
Sub-atomic Physics
1. Introduction: Scientific Workflows
Dynamicity is intrinsic to Scientific Workflows
• Scientists usually introduce modifications and variations in their experiments
• Scientific workflows are not always completely specified
• Data is known dynamically during execution
• Data is distributed and mobile
• The resources are not fixed, but change during workflow execution
1. Introduction: Scientific Workflows
Dynamicity Requirements (1/2)
– Monitoring
   • To observe the progress of the workflow
   • To obtain the partial and final results
– Automatic Control
   • To support the detection of errors and problems
   • To support the control of data values and events
– Reproducibility
   • To enable the reproduction of the execution
   • This is important to validate the results
– Smart “re-runs”
   • To be able to restart from an already performed stage
– Version Management
   • To support and distinguish different “attempts”
1. Introduction: The Chemical Computation Model
Main properties of the chemical computation model:
• Inherently concurrent
• Natural parallelism: no serialization is imposed
• Non-determinism
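The chemical model originates in the Gamma formalism (extended to higher order by HOCL): a program is a multiset of molecules plus reaction rules that consume molecules satisfying a condition and produce new ones, until the solution becomes inert. The following minimal Python sketch illustrates that idea on plain numbers; the function name `gamma` and its parameters are hypothetical, and this is not the engine presented in these slides. The reacting pair is picked at random, which mirrors the non-determinism listed above, while for these rules the final result does not depend on that choice.

```python
import random

def gamma(solution, condition, action, seed=None):
    """Rewrite a multiset with one binary reaction until it becomes inert.

    condition(x, y) -- may molecules x and y react?
    action(x, y)    -- list of molecules that replace x and y
    """
    rng = random.Random(seed)
    pool = list(solution)
    while True:
        # Every ordered pair of distinct molecules that satisfies the condition.
        candidates = [(i, j) for i in range(len(pool)) for j in range(len(pool))
                      if i != j and condition(pool[i], pool[j])]
        if not candidates:
            return pool  # inert: no reaction can fire any more
        i, j = rng.choice(candidates)  # non-deterministic choice of reactants
        x, y = pool[i], pool[j]
        # Consume both reactants, then add the products.
        pool = [m for k, m in enumerate(pool) if k not in (i, j)]
        pool.extend(action(x, y))

# Maximum: any two numbers react and only the larger one survives.
print(gamma([3, 9, 1, 7, 4], lambda x, y: x >= y, lambda x, y: [x]))  # [9]
```

With an always-true condition and the action `lambda x, y: [x + y]`, the same loop computes the sum of the multiset; no order of evaluation is ever written down.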
2. Proposal
Goal:
• To develop a workflow engine for scientific applications based on the chemical computation model and supporting dynamicity
Steps:
• The Scientific Workflow Language
• The Chemical Workflow Engine
• The Support of Dynamicity
2. Proposal: The Scientific Workflow Language
No Generally Accepted Scientific Workflow Language:
• There exist several languages
• Two main approaches: control-flow and data-flow
• Specific data operators:
   o SCUFL: one-to-one, all-to-all
   o ASKALON: large data set loops
• Solution adopted:
   • To propose a new workflow language that includes the most common constructs
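The one-to-one and all-to-all data operators differ in how the items of several input lists are combined (in Taverna they are usually called dot product and cross product iteration). A minimal Python sketch, assuming each input is a plain list; the names `one_to_one` and `all_to_all` are illustrative and are not SCUFL syntax.

```python
from itertools import product

def one_to_one(*inputs):
    """Pair the i-th items of all inputs; the inputs must have equal length."""
    if len({len(xs) for xs in inputs}) > 1:
        raise ValueError("one-to-one needs inputs of equal length")
    return list(zip(*inputs))

def all_to_all(*inputs):
    """Combine every item of each input with every item of the others."""
    return list(product(*inputs))

a = ["a1", "a2", "a3"]
b = ["b1", "b2", "b3"]
print(one_to_one(a, b))       # [('a1', 'b1'), ('a2', 'b2'), ('a3', 'b3')]
print(len(all_to_all(a, b)))  # 9
```

One-to-one yields N combinations for N items per input, whereas all-to-all yields the product of the input lengths, which is why the choice of operator strongly affects the number of Function executions.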
2. Proposal: The Scientific Workflow Language
Main Features:
• It is an extension of Event-driven Process Chains (EPCs)
• Events represent the state
• Data Elements are related to Events (inputs and outputs of Functions)
• Resources are used to process Functions
• Connector Types: AND/OR/XOR-split/join, Sub-process, Loops, Data-Loops, O2O (one-to-one), A2A (all-to-all)
(Figure: graphical notation for Function, Connector, Event, Data Element, and Resource)
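As an illustration of the EPC-based structure, the sketch below encodes a Function (triggered by one Event, reading and writing Data Elements, processed by a Resource) and a Connector as Python dataclasses. All class, field, and element names are hypothetical; the slides do not define a concrete syntax for the language.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Function:
    """A Function triggered by one input Event and producing one output Event."""
    name: str
    input_event: str
    output_event: str
    inputs: tuple = ()      # Data Elements read by the Function
    outputs: tuple = ()     # Data Elements written by the Function
    resource: str = "any"   # Resource able to process the Function

@dataclass(frozen=True)
class Connector:
    """A routing Connector such as AND-split, XOR-join, O2O or A2A."""
    kind: str
    input_events: tuple
    output_events: tuple

def next_events(process, event):
    """Events reached in one step from `event` (join conditions are ignored)."""
    reached = []
    for element in process:
        if isinstance(element, Function) and element.input_event == event:
            reached.append(element.output_event)
        elif isinstance(element, Connector) and event in element.input_events:
            reached.extend(element.output_events)
    return reached

# Hypothetical fragment: E0 -> F1 -> E1 -> AND-split -> (E2, E3)
process = [
    Function("F1", "E0", "E1", inputs=("D0",), outputs=("D1",), resource="R1"),
    Connector("AND-split", ("E1",), ("E2", "E3")),
]
print(next_events(process, "E0"))  # ['E1']
print(next_events(process, "E1"))  # ['E2', 'E3']
```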
2. Proposal: The Scientific Workflow Language
An Example: The WIEN2k workflow from ASKALON
(Figure: elements Init, LAPW0, R1, Event1, Data1, Data-LOOP-split, LAPW1-K1, LAPW1-K2, …, LAPW1-Kn with Event21 … Event2n, Event31 … Event3n, Data21, Data31, Data-LOOP-join, R2)
2. Proposal: The Chemical Workflow Engine
Two main kinds of molecules:
• Active Molecules: Functions and Connectors
• Passive Molecules: Events, Data Elements, and Resources
(Figure: molecule notation for Function, Connector, Event, Data Element, and Resource, with the labels "Connector + Event(s) + Data Element(s)" and "Event(s) + Data Element(s)")
2. Proposal: The Chemical Workflow Engine
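Functions evolve through 4 states:
• Disabled: the function has not been activated; its input Event has not been matched yet
• Enabled: the input Event has been matched, but the input Data Elements have not
• Ready: the input Data Elements have been matched, but the function has not been assigned to an appropriate Resource
• Initiated: the function is being performed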
Each state is represented by a different molecule
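The four Function states can be read as three successive reactions. The following Python sketch keeps the passive molecules in a Counter keyed by (kind, name); the names `Fn` and `react` are hypothetical. It also assumes that Events and Resources are consumed, that Data Elements are only read (the slides state that Data Element molecules are not eliminated), and that the Disabled molecule is kept, so enabling it creates a new Enabled copy.

```python
from collections import Counter
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Fn:
    name: str
    state: str            # "Disabled", "Enabled", "Ready" or "Initiated"
    input_event: str
    input_data: tuple     # names of the Data Elements the function reads
    resource_type: str

def react(fn, passive):
    """Fire at most one reaction for `fn`; return the new molecule or None."""
    if fn.state == "Disabled" and passive[("Event", fn.input_event)] > 0:
        passive[("Event", fn.input_event)] -= 1        # Event consumed (assumption)
        return replace(fn, state="Enabled")            # the Disabled molecule itself stays
    if fn.state == "Enabled" and all(passive[("Data", d)] > 0 for d in fn.input_data):
        return replace(fn, state="Ready")              # Data Elements are only read
    if fn.state == "Ready" and passive[("Resource", fn.resource_type)] > 0:
        passive[("Resource", fn.resource_type)] -= 1   # Resource taken (assumption)
        return replace(fn, state="Initiated")
    return None

passive = Counter({("Event", "E0"): 1, ("Data", "D0"): 1, ("Resource", "R1"): 1})
disabled = Fn("F1", "Disabled", "E0", ("D0",), "R1")

trace, current = [], disabled
while (nxt := react(current, passive)) is not None:
    trace.append(nxt.state)
    current = nxt
print(trace)  # ['Enabled', 'Ready', 'Initiated']
```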
2. Proposal: The Chemical Workflow Engine
(Figure: the Chemical Solution, with Disabled Functions, Disabled Connectors, Events, Data Elements, and Resources; a Function molecule passes through the Disabled, Enabled, Ready, and Initiated states as it matches an Event, Data Elements, and a Resource)
2. Proposal: The Chemical Workflow Engine
Connectors evolve through 2 states:
• Disabled: the connector has not been activated; its input Event(s) have not been matched yet
• Enabled: the input Event(s) have been matched, but the input Data Elements have not
Each state is represented by a different molecule
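A sketch of the two Connector states for join connectors, in Python. The join semantics used here (AND-join needs every input Event, XOR-join exactly one, OR-join at least one) are a simplification chosen for illustration, and the label "Fires" for a connector whose Data Elements are also matched is hypothetical; the slides only name the Disabled and Enabled states.

```python
from collections import Counter

def join_matches(kind, input_events, events):
    """Check whether a join Connector matches its input Event(s) (simplified)."""
    present = [e for e in input_events if events[e] > 0]
    if kind == "AND-join":
        return len(present) == len(input_events)
    if kind == "XOR-join":
        return len(present) == 1
    if kind == "OR-join":
        return len(present) >= 1
    raise ValueError(f"unsupported connector kind: {kind}")

def connector_state(kind, input_events, input_data, events, data):
    """'Disabled' until the Event(s) match, 'Enabled' until the Data Elements
    match, then 'Fires' (hypothetical label for producing the output Events)."""
    if not join_matches(kind, input_events, events):
        return "Disabled"
    if not all(data[d] > 0 for d in input_data):
        return "Enabled"
    return "Fires"

events = Counter({"E2": 1, "E3": 1})
data = Counter({"D2": 1})
print(connector_state("AND-join", ("E2", "E3"), ("D2", "D3"), events, data))  # Enabled
data["D3"] += 1
print(connector_state("AND-join", ("E2", "E3"), ("D2", "D3"), events, data))  # Fires
print(connector_state("XOR-join", ("E2", "E3"), (), events, data))            # Disabled
```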
3. An HOCL Workflow Engine
(Figure: Data One-to-One Connector in the Chemical Solution. Labels: Disabled Functions, Disabled Connectors, Events, Data Elements, Resources; F.A (Ev.1, D.A.1..n) and F.B (Ev.2, D.B.1..n); Data A.1, 2, …, N, Data B.1, 2, …, N, and Data C.1, 2, …, N; the connector replicated N times, pairing Data A.1 with Data B.1 and so on; Events Ev.3.1 … Ev.3.N; F.C)
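The Data One-to-One Connector can be approximated by matching Data Elements through their thread number: Data A.i reacts with Data B.i and produces Ev.3.i. In this Python sketch the solution is a set of (kind, name, index) tuples, and the function name `o2o_connector` is hypothetical; Data Elements are left in the solution, as stated for the engine.

```python
def o2o_connector(solution, left, right, out_event):
    """Emit Event out_event.i for every index i holding both Data left.i and right.i."""
    left_idx = {i for (kind, name, i) in solution if (kind, name) == ("Data", left)}
    right_idx = {i for (kind, name, i) in solution if (kind, name) == ("Data", right)}
    produced = {("Event", out_event, i) for i in left_idx & right_idx}
    return solution | produced   # Data Elements stay in the solution

N = 3
solution = ({("Data", "A", i) for i in range(1, N + 1)}
            | {("Data", "B", i) for i in range(1, N + 1)})
solution = o2o_connector(solution, "A", "B", "Ev.3")
print(sorted(i for (kind, name, i) in solution if kind == "Event"))  # [1, 2, 3]
```

In this reading, each Ev.3.i would then enable its own copy of F.C, giving N parallel executions that each produce one Data C.i.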
2. Proposal: The Chemical Workflow Engine
Structure of the Chemical Workflow Engine:
• Separated into 4 sub-solutions, one for each state
• Molecules are transferred among sub-solutions
Operations in the Workflow Engine:
• Compilation: the molecules representing the Disabled Functions and Connectors corresponding to the process definition are introduced
• Data Population: the molecules representing the input Data Elements related to a case are introduced
• Resource Population: the molecules representing the available Resources are introduced
• Instance Creation: the molecules representing the initial Events are introduced
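A skeleton of the four sub-solutions and the four operations, written in Python. The class and method names are hypothetical, each sub-solution is modeled as a Counter (a multiset), and the sub-solution in which each passive molecule is placed is an assumption based on the state that consumes it.

```python
from collections import Counter

STATES = ("Disabled", "Enabled", "Ready", "Initiated")

class ChemicalEngine:
    """Hypothetical skeleton: one multiset per state sub-solution.

    Assumption: Events go where Disabled molecules wait, Data Elements where
    Enabled ones wait, and Resources where Ready ones wait.
    """

    def __init__(self):
        self.sub = {state: Counter() for state in STATES}

    def compile_process(self, process_definition):
        # Compilation: the Disabled Functions and Connectors of the definition.
        for element in process_definition:
            self.sub["Disabled"][element] += 1

    def populate_data(self, case, names):
        # Data Population: the input Data Elements related to one case.
        for name in names:
            self.sub["Enabled"][("Data", case, name)] += 1

    def populate_resources(self, resources):
        # Resource Population: the Resources currently available.
        for resource in resources:
            self.sub["Ready"][("Resource", resource)] += 1

    def create_instance(self, case, events):
        # Instance Creation: the initial Events that start the case.
        for event in events:
            self.sub["Disabled"][("Event", case, event)] += 1

    def transfer(self, molecule, source, target):
        # Move one molecule between sub-solutions, e.g. from Enabled to Ready.
        if self.sub[source][molecule] == 0:
            raise KeyError(f"{molecule!r} is not in {source}")
        self.sub[source][molecule] -= 1
        self.sub[target][molecule] += 1

engine = ChemicalEngine()
engine.compile_process([("Function", "F1"), ("Connector", "AND-split")])
engine.populate_data(1, ["D0"])
engine.populate_resources(["R1", "R2"])
engine.create_instance(1, ["E0"])
print({state: sum(c.values()) for state, c in engine.sub.items()})
# {'Disabled': 3, 'Enabled': 1, 'Ready': 2, 'Initiated': 0}
```

Because every operation only adds molecules, the four operations can be issued in any order; reactions simply wait until their reactants are present.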
2. Proposal: The Chemical Workflow Engine
(Figure: Input Data entering the engine through the Compilation, Data Population, Resource Population, and Instance Creation operations)
2. Proposal: The Chemical Workflow Engine
Identifiers:
• Element Identifier: distinguishes among the several elements included in a process specification.
• Process Schema Identifier: distinguishes among process specifications.
   • It has two parts: a process number and a version number.
   • It is included in Functions, Connectors, and Events.
• Instance Identifier: distinguishes among the several instances.
   • It includes a thread identifier (numbered Data Elements).
   • It is included in Events and Data Elements, and also in Functions and Connectors in the Enabled, Ready, and Initiated states.
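To show how the identifiers restrict reactions, the following Python sketch lets a Disabled Function react only with an Event of the same Process Schema Identifier, and gives the resulting Enabled Function the Instance Identifier carried by that Event. The dataclass names and fields are hypothetical encodings of the identifiers described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SchemaId:
    process: int   # process number
    version: int   # version number

@dataclass(frozen=True)
class InstanceId:
    case: int        # which workflow instance
    thread: int = 0  # numbers the Data Elements of parallel branches

@dataclass(frozen=True)
class Event:
    element: str          # Element Identifier
    schema: SchemaId
    instance: InstanceId

@dataclass(frozen=True)
class DisabledFunction:
    element: str
    input_event: str      # Element Identifier of the triggering Event
    schema: SchemaId      # no Instance Identifier: shared by every instance

@dataclass(frozen=True)
class EnabledFunction:
    element: str
    schema: SchemaId
    instance: InstanceId  # copied from the Event that enabled it

def enable(fn, ev):
    """React only if the Event is the right element of the same process schema."""
    if ev.element == fn.input_event and ev.schema == fn.schema:
        return EnabledFunction(fn.element, fn.schema, ev.instance)
    return None

v1, v2 = SchemaId(1, 1), SchemaId(1, 2)
f1 = DisabledFunction("F1", "E0", v1)
print(enable(f1, Event("E0", v1, InstanceId(7))).instance)  # InstanceId(case=7, thread=0)
print(enable(f1, Event("E0", v2, InstanceId(8))))           # None
```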
2. Proposal: Dynamicity Support
Dynamicity is supported in several ways:
• A workflow specification can be modified by changing the Functions and Connectors contained in the disabled sub-solution.
• The distinction between Event and Data Element molecules makes it possible to separate the workflow specification from the data to be processed.
• Several workflow instances can be initiated and executed in parallel, since Disabled molecules are not eliminated.
• The availability of Event molecules makes it possible to develop a steering facility.
• Data Element molecules are not eliminated. This enables the development of monitoring, “smart re-runs” and provenance solutions.
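Two of these points can be sketched in Python under stated assumptions: modifying the specification by swapping a molecule in the Disabled sub-solution, and a "smart re-run" that resumes a linear sequence of Functions at the first stage whose output Data Element is missing. The function names, the restriction to a linear pipeline, and the (instance, name) keys for Data Elements are all hypothetical.

```python
from collections import Counter

def modify_specification(disabled, old, new):
    """Replace one Disabled molecule of the specification by a modified one."""
    if disabled[old] == 0:
        raise KeyError(old)
    disabled[old] -= 1
    if disabled[old] == 0:
        del disabled[old]
    disabled[new] += 1

def resume_point(pipeline, data, instance):
    """First stage whose output Data Element is absent for `instance`.

    Data Element molecules are never eliminated, so the outputs of the
    earlier stages are still in the solution and need not be recomputed.
    """
    for function, output in pipeline:
        if data[(instance, output)] == 0:
            return function
    return None  # every stage has already produced its output

disabled = Counter({("F1", "v1"): 1, ("F2", "v1"): 1})
modify_specification(disabled, ("F2", "v1"), ("F2", "v2"))
print(sorted(disabled))  # [('F1', 'v1'), ('F2', 'v2')]

pipeline = [("F1", "D1"), ("F2", "D2"), ("F3", "D3")]
data = Counter({(7, "D1"): 1, (7, "D2"): 1})
print(resume_point(pipeline, data, 7))  # F3
```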
2. Proposal: Dynamicity Support
Addendums to the Identifiers:
• Addendum to the Process Schema Identifier
   • Makes it possible to use modified versions of an existing process specification just by introducing the new molecules.
• Addendum to the Instance Identifier
   • Makes it possible to use the data of another instance's execution.
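One possible reading of the addendums, sketched in Python: an identifier with an addendum points to the identifier it amends, and a lookup falls back along that chain. A new version then only needs the molecules that changed, and an instance can reuse the Data Elements of another one. The `lookup` function and the dictionary encoding are hypothetical; the slides do not specify the mechanism.

```python
def lookup(store, ident, element, addenda):
    """Find `element` under identifier `ident`, following addenda.

    store   -- dict mapping (identifier, element) to a molecule
    addenda -- dict mapping an identifier to the identifier it builds on
    """
    seen = set()
    while ident is not None and ident not in seen:  # guard against cycles
        seen.add(ident)
        if (ident, element) in store:
            return store[(ident, element)]
        ident = addenda.get(ident)
    raise KeyError(element)

spec = {
    ((1, 1), "F1"): "F1/v1", ((1, 1), "F2"): "F2/v1",
    ((1, 2), "F2"): "F2/v2",           # version 2 only adds the changed Function
}
schema_addenda = {(1, 2): (1, 1)}     # version 2 amends version 1
print(lookup(spec, (1, 2), "F1", schema_addenda))  # F1/v1
print(lookup(spec, (1, 2), "F2", schema_addenda))  # F2/v2

data = {(7, "D1"): "D1 of instance 7"}
instance_addenda = {8: 7}             # instance 8 reuses the data of instance 7
print(lookup(data, 8, "D1", instance_addenda))     # D1 of instance 7
```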
We support the 13 change patterns proposed in [18]:
3. Validation
Developed in CLIPS:
• CLIPS provides an environment for the construction of rule-based expert systems
• CLIPS programming is performed by assertions and rules
   • Assertions are used to maintain information
   • Rules specify an action to be performed when a condition is satisfied
• To validate the Chemical Workflow Engine (CWE) we used two kinds of assertions and specific rules:
   • Active molecule assertions of two types (Function and Connector) and four possible states (Disabled, Enabled, Ready, Initiated)
   • Passive molecule assertions of three types (Event, Data Element, and Resource)
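CLIPS itself is not reproduced here, but the rule-based validation can be approximated in Python by forward chaining: rules look at the current facts (assertions) and assert new ones until nothing changes. In this sketch an active Function fact carries its state, input Event, input Data Element, and Resource; the fact layout and the `transition` and `run` helpers are hypothetical, and unlike CLIPS the sketch never retracts facts and has no conflict-resolution strategy.

```python
def transition(from_state, to_state, kind, slot):
    """Rule: Function fact in `from_state` + passive fact (kind, f[slot]) => `to_state`."""
    def rule(facts):
        return {f[:2] + (to_state,) + f[3:]
                for f in facts
                if f[0] == "Function" and f[2] == from_state and (kind, f[slot]) in facts}
    return rule

RULES = [
    transition("Disabled", "Enabled", "Event", 3),
    transition("Enabled", "Ready", "Data", 4),
    transition("Ready", "Initiated", "Resource", 5),
]

def run(facts, rules):
    """Fire all rules repeatedly until no rule asserts a new fact (a fixpoint)."""
    facts = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(facts)
        new -= facts
        if not new:
            return facts
        facts |= new

facts = {
    ("Function", "F1", "Disabled", "E0", "D0", "R1"),  # active molecule assertion
    ("Event", "E0"), ("Data", "D0"), ("Resource", "R1"),  # passive assertions
}
result = run(facts, RULES)
print(sorted(f[2] for f in result if f[0] == "Function"))
# ['Disabled', 'Enabled', 'Initiated', 'Ready']
```

Since facts are never retracted, the earlier states remain visible in the result, which loosely matches the idea that Disabled molecules are not eliminated; a CLIPS version would normally modify or retract the Enabled and Ready facts.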
4. Conclusions
Summary:
• Scientific workflows are gaining great momentum
• Dynamicity is an intrinsic need in scientific workflows
• A workflow engine based on the Chemical Computation Model that supports dynamicity needs has been conceived
(Figure labels: Scientific Workflow, Chemical Workflow Engine, CLIPS)
Future Work:
• To provide an actual validation
4. Conclusions
Opportunities from the Chemical Computation Model:
• It is parallel by nature: it facilitates the distribution of computations, and parallelization is obtained transparently
   • Workflows can be specified in the same way
   • Execution of workflows is automatically parallelized
• Change of the role of resources:
   – Central “chemical solution” vs. central workflow engine
   – Pull-oriented vs. push-oriented