Jan 15, 2016
A Workflow Engine with Multi-Level Parallelism Supports
Qifeng Huang and Yan Huang
School of Computer Science, Cardiff University
2005.9
Agenda
• Background
• SWFL Workflow Architecture
• SWFL Description Language
• SWFL Workflow Engine
• Multi-level Parallelisms in SWFL
Background: Service and Service Composition
• A service encapsulates various resources and makes them available over the network via standard interfaces and protocols
• Web/grid services are emerging as important paradigms for distributed computing
• Service composition/workflow: complex applications can be created by composing simple services
Background: GSiB
• Current efforts such as BPEL mainly focus on business process
• Demand for scientific workflows is increasing as parallel computing applications, especially grid computing applications, expand
• GSiB aims to provide a general workflow framework for both business and scientific areas, especially the latter
• The convergence of grid services and web services makes this feasible
GSiB Workflow Architecture
VSCE
Service Workflow Language
SWFL Workflow Engine
• SWFL: an XML-based, graph-oriented service workflow description language
• Engine: Distributed enactment environment with multi-level parallelism support
• VSCE: Visual Service Composition Environment
SWFL: Basic Elements
Types*
FlowModel(name, isParallel, …)
Message* (name, part* …)
Variables* (name, type)
Activity* Definition of all involved activities (normal/native services, assign, if, switch, for, while, do while and catchEnd activities)
FlowModel* (name, isParallel …)
ControlLink* (Source/Port, Target/Port)
DataLink* (Source/Part, Target/Part)
SWFL: Graph-Oriented
• In GSiB, a workflow application can be described either as a valid XML (SWFL) document or as a directed graph
• A node (activity in SWFL) can be a standard service operation, a compound structure, or an on-machine program
• An edge (data/control link in SWFL) describes the data and control dependencies among involved activities
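The node-and-edge view above can be pictured with a small in-memory model. This is only a sketch: the class names `FlowGraph`, `ControlLink` and `DataLink`, the `predecessors` helper and the `"ELSE"` port are illustrative assumptions, not the actual GSiB graph objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class ControlLink:
    source: str
    target: str
    port: Optional[str] = None  # e.g. the branch taken out of an "if" activity


@dataclass(frozen=True)
class DataLink:
    source: str
    target: str


@dataclass
class FlowGraph:
    """Hypothetical container: activities are nodes, links are directed edges."""
    activities: Set[str] = field(default_factory=set)
    control_links: List[ControlLink] = field(default_factory=list)
    data_links: List[DataLink] = field(default_factory=list)

    def predecessors(self, name: str) -> Set[str]:
        """Activities that `name` depends on through a data or control link."""
        links = self.control_links + self.data_links
        return {link.source for link in links if link.target == name}


def build_example() -> FlowGraph:
    # Assumed shape: ActivityA feeds an "if" node that selects ActivityB or ActivityC.
    graph = FlowGraph(activities={"ActivityA", "ifControl", "ActivityB", "ActivityC"})
    graph.data_links.append(DataLink("ActivityA", "ifControl"))
    graph.control_links.append(ControlLink("ifControl", "ActivityB", port="IF"))
    graph.control_links.append(ControlLink("ifControl", "ActivityC", port="ELSE"))
    return graph
```

Keeping data links and control links in separate lists mirrors the distinction the slides draw between the two kinds of dependency, while `predecessors` treats them uniformly for scheduling purposes.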
SWFL: An Example
Data Source
Activity A
IF(a/b)
Activity B
Activity C
Data Sink
a>b
……
<swfl:flow name="sample" requireParallel="false">
  <wsdl:input message="flowInput"/>
  <wsdl:output message="flowOutput"/>
  <swfl:activity>
    <swfl:if name="ifControl">…</swfl:if>
  </swfl:activity>
  <swfl:activity>
    <swfl:normal name="ActivityA">
      <swfl:performedBy>… </swfl:performedBy>
    </swfl:normal>
  </swfl:activity>
  ……
  <swfl:controlLink>
    <swfl:source name="ifControl" port="IF"/>
    <swfl:target name="task2"/>
  </swfl:controlLink>
  ……
  <swfl:dataLink target="ifControl">
    <swfl:source name="ActivityA">
      <swfl:map>…</swfl:map>
    </swfl:source>
  </swfl:dataLink>
  ……
</swfl:flow>
……
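As a rough illustration of what an XML2Graph-style step might do with such a document, the sketch below pulls the control and data links out of a small, complete stand-in document. The namespace URI, the document itself (including the `ActivityB` target) and the function `extract_links` are assumptions made for the example; the real SWFL schema and tools are not shown in these slides.

```python
import xml.etree.ElementTree as ET

SWFL_NS = "urn:example:swfl"  # placeholder URI, not the real SWFL namespace

# Hypothetical, self-contained stand-in for the elided fragment above.
DOC = f"""
<swfl:flow xmlns:swfl="{SWFL_NS}" name="sample" requireParallel="false">
  <swfl:activity><swfl:if name="ifControl"/></swfl:activity>
  <swfl:activity><swfl:normal name="ActivityA"/></swfl:activity>
  <swfl:controlLink>
    <swfl:source name="ifControl" port="IF"/>
    <swfl:target name="ActivityB"/>
  </swfl:controlLink>
  <swfl:dataLink target="ifControl">
    <swfl:source name="ActivityA"/>
  </swfl:dataLink>
</swfl:flow>
"""


def extract_links(xml_text):
    """Return (control_links, data_links) as lists of plain tuples."""
    ns = {"swfl": SWFL_NS}
    root = ET.fromstring(xml_text)

    control = []
    for link in root.findall("swfl:controlLink", ns):
        source = link.find("swfl:source", ns)
        target = link.find("swfl:target", ns)
        control.append((source.get("name"), source.get("port"), target.get("name")))

    data = []
    for link in root.findall("swfl:dataLink", ns):
        # A data link names its target on the element and its sources as children.
        for source in link.findall("swfl:source", ns):
            data.append((source.get("name"), link.get("target")))

    return control, data
```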
SWFL vs. BPEL
• Both can be used to build workflows which involve peer-to-peer interactions between web services
• BPEL is mainly for business processes while SWFL is mainly for scientific areas
• BPEL uses a script-oriented approach, while SWFL follows a graph-oriented approach
SWFL: Why Graph-Oriented?
• Easy to use, especially with the friendly VSCE: like a flow chart or a UML model
• Flexible and dynamic in service scheduling and execution
– Completely decided by the engine
– Making full use of dynamic runtime features, different strategies can be used for a flow
– Straightforward support for multi-level parallelism
VSCE: Make Complicated Things Easy
Workflow Drawing Pane
VSCE: What is more…
• Friendly integrated visual tool for users to build, execute and control workflows
– End users do not have to know much about workflow
• Design (draw) a flow with fun: Drag-and-drop
• Configure and initiate the execution
• Retrieve results and track runtime status
A Grid Architecture Based on workflow Engines (1)
<invoke name="registerAuctionResults"
        partnerLink="auctionRegistrationService"
        portType="as:auctionRegistrationPT"
        operation="process"
        inputVariable="auctionData">
  <correlations>
    <correlation set="auctionIdentification"/>
  </correlations>
</invoke>
<receive name="receiveAuctionRegistrationInformation"
         partnerLink="auctionRegistrationService"
         portType="as:auctionRegistrationAnswerPT"
         operation="answer"
         variable="auctionAnswerData">
  <correlations>
    <correlation set="auctionIdentification"/>
  </correlations>
</receive>
[Figure labels: SWFL, BPEL, Workflow Engine, Job Processors, Services]
A Grid Architecture Based on workflow Engines (2)
[Figure labels: SWFL, BPEL, three Workflow Engines, Job Processors, Services]
GSiB Workflow Processing
[Figure labels: SWFL/MPFL Document, XML2Graph, Graph2XML, Graph2Java, Java Programs, Enactment Environment, Execution Result; steps numbered 1, 2, 3]
GSiB Instance: Graph Objects
• XML2Graph and Graph2Java tools
• Graph objects
– Two kinds: data graphs and control graphs
– Straightforward format for VSCE
– Scheduling strategy is decided at runtime
Engine: Architecture
Gateway
Job Processor
Storage
Scheduler
UDDI
VSCE
Engine
Engine: Components
• Gateway: a web service that provides the entry point for submitting jobs and retrieving results and runtime status; three job formats are accepted
• Job Processor: computing resources composed of a pool of worker threads
• Scheduler: provides a dynamic service execution strategy at runtime
• Storage: provides space as well as an API for objects, results and status information
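How these components might fit together can be sketched in a few lines. The class `Engine` and its methods are hypothetical names chosen for the example: a thread pool stands in for the job processor, a dictionary for storage, and `submit`/`status`/`result` for the gateway operations. The real engine exposes a web service, and its scheduler is omitted here.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


class Engine:
    """Toy stand-in for the engine; not the actual GSiB implementation."""

    def __init__(self, workers=4):
        # Job processor: a pool of worker threads.
        self._pool = ThreadPoolExecutor(max_workers=workers)
        # Storage: job id -> future holding the result and status.
        self._storage = {}
        self._ids = itertools.count(1)

    def submit(self, job, *args):
        """Gateway entry point: accept a job and return its identifier."""
        job_id = next(self._ids)
        self._storage[job_id] = self._pool.submit(job, *args)
        return job_id

    def status(self, job_id):
        """Gateway entry point: report the runtime status of a job."""
        return "done" if self._storage[job_id].done() else "pending"

    def result(self, job_id):
        """Gateway entry point: block until the job finishes, then return its result."""
        return self._storage[job_id].result()

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

A caller would submit a job, keep the returned identifier, and use it later to poll the status or fetch the result.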
Engine: Multi-Level Parallelisms
• Service-level
• Flow-Level
• Message-Passing
• Parallelism in BPEL: explicitly described in the script
Service-Level Parallelism
• An activity is ready when all its input data are ready and all activities on which it has control dependencies are complete
• Several activities may be ready at the same time; these can be executed in parallel
• Greedy algorithm: execute an activity as soon as it is ready; may waste storage and computing resources; not always optimal
• Question: how to schedule services?
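The greedy strategy described above can be sketched as follows. The function `run_greedy` and the dependency-map format (activity name mapped to the set of activities it waits for, data and control dependencies merged) are assumptions made for illustration; the slides do not show the engine's actual scheduler.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_greedy(deps, action, workers=4):
    """Run every activity as soon as all of its dependencies have completed.

    deps maps an activity name to the set of activities it depends on.
    Returns the activities in the order in which they completed.
    """
    remaining = {name: set(pre) for name, pre in deps.items()}
    running = {}   # future -> activity name
    finished = []

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while remaining or running:
            # Greedy step: launch everything that is ready right now.
            ready = [name for name, pre in remaining.items() if not pre]
            for name in ready:
                del remaining[name]
                running[pool.submit(action, name)] = name

            if not running:
                raise ValueError("cyclic or unsatisfiable dependencies")

            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raise any error from the activity
                finished.append(name)
                # Completion may make other activities ready.
                for pre in remaining.values():
                    pre.discard(name)

    return finished
```

Because every ready activity is launched immediately, independent branches run in parallel, but nothing limits how many intermediate results are held at once, which is the storage and resource cost the slide mentions.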
Flow-Level Parallelism: An Example
A
B C
D E
F
A
BC
DE
F
Partition
Process 1
Process 2
Flow-Level Parallelism (2)
• Decentralized orchestration of services: divide a workflow into several sub-flows, to be run by several job processors in parallel
• Two kinds: independent connected graphs; partitions of a connected graph
• Parallelism achievements: quick response; high throughput; scalability
• Additional complexities: flow partition; coordination of distributed execution
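The simpler of the two cases, splitting a workflow into independent connected graphs, can be sketched with a breadth-first search over the links. The function name `independent_subflows` and the link format are assumptions for the example. Partitioning a single connected graph, which needs the coordination mentioned above, is not covered.

```python
from collections import defaultdict, deque


def independent_subflows(activities, links):
    """Group activities into sub-flows that share no data or control link.

    links is an iterable of (source, target) pairs; direction is ignored
    because any link ties two activities to the same sub-flow.
    """
    neighbours = defaultdict(set)
    for source, target in links:
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen = set()
    subflows = []
    for start in sorted(activities):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        subflows.append(component)
    return subflows
```

Each returned set could then be handed to a different job processor without any cross-processor coordination, since no link crosses a sub-flow boundary.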
Message-Passing Parallelism: Background and MPFL
• Parallelism in SWFL is suitable for applications with forms of parallelism that can be displayed in a workflow graph
• Most scientific applications exhibit more sophisticated parallelism, such as message passing, which is commonplace in such applications
• MPFL: extends the SWFL flow model to support applications with message-passing
Message-Passing Parallelism: An Example
A
D
B
C
Flow Model 1 for process 0
Flow Model 2 for processes with rank larger than 0
A
D
B
C
B
C
B
C
B
C
Process 0 Process 1 Process 2 Process 3 Process N-1
Workflow Application
MPFL Model
• Multi-layer heterogeneous communication domains are supported
• An instance is usually run on a cluster, where the same kind of parallelism as in a standard MPI program can be achieved
• Engine: incremental extension of the SWFL engine; still a work in progress
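The idea of choosing a flow model by process rank can be imitated with threads and queues. This is a toy sketch only: `run_mpfl`, the two flow-model functions and the squaring workload are invented for the example, and the queues merely stand in for real message passing on a cluster.

```python
import queue
import threading


def run_mpfl(n_processes, data):
    """Process 0 runs one flow model, every other rank runs a second one.

    data must hold one value per worker, i.e. n_processes - 1 values.
    Returns the sum of the squares computed by the workers.
    """
    inboxes = [queue.Queue() for _ in range(n_processes)]
    results = {}

    def flow_model_1(rank):
        # Rank 0: send one value to each worker, then gather the replies.
        for dest in range(1, n_processes):
            inboxes[dest].put(data[dest - 1])
        total = 0
        for _ in range(1, n_processes):
            total += inboxes[rank].get()
        results["total"] = total

    def flow_model_2(rank):
        # Rank > 0: receive a value, process it, send the result to rank 0.
        value = inboxes[rank].get()
        inboxes[0].put(value * value)

    threads = [
        threading.Thread(target=flow_model_1 if rank == 0 else flow_model_2, args=(rank,))
        for rank in range(n_processes)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results["total"]
```

The same workflow description is instantiated once per process, and only the rank decides which flow model a given process follows, much as in an MPI program where all ranks start from the same executable.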
[Figure: Message-Passing Parallelism. Labels: MPFL, Workflow Engine, Job Processors, Cluster, Services]
Conclusions
• The workflow framework in GSiB is grid-oriented and suitable for both business and scientific applications composed of web/grid services
• Graph-based SWFL provides considerable flexibility for both end users and the engine implementation
• VSCE provides a visual tool to build and execute workflow applications
• The SWFL engine provides an automatic and self-organizing enactment environment for the processing of workflow applications
• Better performance is achieved with the support of multi-level parallelism in the SWFL engine