Design and Evaluation of an Autonomic Workflow Engine. Thomas Heinis, Cesare Pautasso, Gustavo Alonso. Dept. of Computer Science, Swiss Federal Institute of Technology (ETHZ).

Post on 26-Mar-2015


Design and Evaluation of an Autonomic Workflow Engine

Thomas Heinis, Cesare Pautasso, Gustavo Alonso
Dept. of Computer Science
Swiss Federal Institute of Technology (ETHZ)

The 2nd IEEE International Conference on Autonomic Computing (ICAC-05)

March 15th, 2008
Seo, Dongmahn

Contents
- Introduction
- System Background
- System Architecture
- Autonomic Capabilities
- System Evaluation
- Conclusion


Introduction
- Motivation
- Related Work
- Contribution

Motivation
- Workflow management systems
  - e-commerce
  - virtual laboratories
  - DNA sequencing
  - scientific computing
  - Grid computing
- The idea of process-based Web service composition

Motivation (cont.)
- Workflow engines
  - open environment
  - unknown workload
  - difficult to choose between
    - a centralized solution
    - a distributed implementation of the engine
- Problem of configuring the system in an optimal way
  - having a system administrator in charge of manually monitoring and reconfiguring the system is NOT a feasible solution, considering
    - the number of parameters involved
    - the variability of the workload

Related Work
- Decentralization of workflow process execution
  - an important area of research
  - supports business processes
  - leads to higher scalability
  - introduces several problems
    - lack of a global view over the process
    - scalability and reliability problems per se
- To address the problem
  - GOLIAT, autonomic computing techniques, self-optimizing computer systems
  - autonomic computing principles in the context of distributed workflow engines

Contribution
- Goal
  - self-tuning
  - self-configuration capabilities
  - self-healing capabilities

Contribution (cont.)
- System: an extension to the JOpera engine
  - a Java-based service composition tool
  - combines a workflow engine with an open architecture to provide support for Web service composition, Grid computing and specialized workflow engines
- Flexible architecture, components
  - Key system modules can be replicated to handle large workloads.
  - Other modules can be paired with a backup to achieve fault tolerance.
  - The autonomic controller can be configured by selecting different reconfiguration strategies.

Contribution (cont.)
- The key contributions of the paper
  - the novel system architecture
    - generic
    - can be adopted by many engines operating under different models and languages
  - the resulting scalability and fault tolerance
    - flexible enough to support the very large loads present in computational applications and large-scale Web service composition
  - the independence of the underlying workflow model
    - easily extensible to support many different kinds of services


System Background
- Requirements
- Workload Assumptions
- Deployment Environment

Requirements
- To support autonomic behavior, the workflow execution engine must feature self-configuration, self-tuning and self-healing capabilities
- Self-configuration
  - switching the system’s configuration on the fly, without manual intervention and without disrupting the system
  - requires the workflow execution engine to support changing the configuration dynamically and efficiently

Requirements (cont.)
- Self-tuning
  - reconfiguring the system so that it is optimal given the current workload
  - the workflow engine must give access to its internal state
    - control algorithms can analyze current and past performance information to plan configuration changes in response to the current workload
  - assumptions
    - the characteristics of the workload affect the system’s performance
    - the self-tuning algorithm can optimally adapt the system to the workload by monitoring key performance indicators

Requirements (cont.)
- Self-healing
  - able to detect configuration changes due to external events
    - failures of nodes
  - recovery action
  - requires
    - mechanisms for detecting failures and configuration changes of the cluster
    - a way to query the workflow execution state

Workload Assumptions
- The workload is assumed to be
  - a collection of concurrent workflow processes
  - a worst-case scenario
- Workload prediction issues are not dealt with
  - future work

Deployment Environment
- [Assumption] JOpera
  - runs on a dedicated cluster of computers
  - can use these resources exclusively
- Main goal of the autonomic features: to ensure the optimal configuration of the cluster
  - efficient resource utilization
  - good allocation of the available nodes to the different system components
- The cluster configuration is NOT static
- The system could be extended to use shared nodes that are also used for other purposes.


System Architecture
- Workflow Execution
- Distributed Workflow Execution
- Scalable Workflow Execution

Workflow Execution
- Workflow processes model interactions between different tasks by defining the data flow and control flow between them
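
As a rough illustration of a process that defines both control flow and data flow between tasks, the sketch below uses a hypothetical three-task process. The task names, the `after` and `run` fields, and the `execute` helper are all assumptions made for this example; they are not JOpera's process language or API.

```python
from graphlib import TopologicalSorter

# Hypothetical process: "after" lists the tasks a task must wait for (control
# flow); "run" receives the outputs of those tasks as its inputs (data flow).
process = {
    "fetch": {"after": [], "run": lambda inputs: 21},
    "double": {"after": ["fetch"], "run": lambda inputs: inputs["fetch"] * 2},
    "report": {"after": ["double"], "run": lambda inputs: f"result={inputs['double']}"},
}


def execute(process):
    """Run each task once all of its predecessors have produced output."""
    order = TopologicalSorter({name: spec["after"] for name, spec in process.items()})
    outputs = {}
    for name in order.static_order():
        spec = process[name]
        inputs = {dep: outputs[dep] for dep in spec["after"]}
        outputs[name] = spec["run"](inputs)
    return outputs


results = execute(process)
```

Here the control flow decides the order of execution and the data flow decides which outputs each task sees.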


Distributed Workflow Execution

Scalable Workflow Execution
- scalability bottleneck
  - use several layers of caching between the tuple space and the threads producing and consuming tuples
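
The effect of a caching layer between a worker thread and the tuple space can be sketched as follows. This is a minimal single-layer, read-only illustration; `GlobalTupleSpace`, `CachingLayer` and the key layout are hypothetical names for this example and do not reflect the T-Spaces or JOpera interfaces.

```python
class GlobalTupleSpace:
    """Stand-in for the shared tuple space; counts how often it is contacted."""

    def __init__(self):
        self.tuples = {}
        self.remote_reads = 0

    def read(self, key):
        self.remote_reads += 1
        return self.tuples.get(key)

    def write(self, key, value):
        self.tuples[key] = value


class CachingLayer:
    """Local cache placed between a worker thread and the global space."""

    def __init__(self, space):
        self.space = space
        self.cache = {}

    def read(self, key):
        if key not in self.cache:  # only a cache miss reaches the global space
            self.cache[key] = self.space.read(key)
        return self.cache[key]


space = GlobalTupleSpace()
space.write(("process-1", "state"), "running")
layer = CachingLayer(space)
values = [layer.read(("process-1", "state")) for _ in range(5)]
```

Five local reads result in a single access to the shared space, which is the load reduction the caching layers aim for.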


Autonomic Capabilities
- Self-Tuning
  - Information Strategy
  - Optimization Strategy
  - Selection Strategy
- Self-Configuration
  - Reconfiguration Actions
- Self-Healing

Self-Tuning
- Information Strategy
  - detect imbalances in the system’s configuration by sampling the current space size
- Optimization Strategy
  - establish a configuration such that the number of navigator and dispatcher threads is balanced
- Selection Strategy
  - prioritize nodes according to how well suited they are for a configuration change
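
The three strategies can be pictured as three small functions that feed each other. The sketch below is only one possible reading: the two queues, the `tolerance` threshold, the `active_items` load measure and all function names are assumptions for illustration, not the controller described in the paper.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str  # "navigator" or "dispatcher"
    active_items: int  # hypothetical measure of how busy the node is


def sample_space_sizes(process_queue, task_queue):
    """Information strategy: sample how many tuples wait in each space."""
    return len(process_queue), len(task_queue)


def plan_change(pending_processes, pending_tasks, tolerance=10):
    """Optimization strategy: name the role that should gain a node, if any."""
    if pending_tasks - pending_processes > tolerance:
        return "dispatcher"
    if pending_processes - pending_tasks > tolerance:
        return "navigator"
    return None


def select_node(nodes, wanted_role):
    """Selection strategy: prefer the least busy node of the other role."""
    candidates = [n for n in nodes if n.role != wanted_role]
    return min(candidates, key=lambda n: n.active_items) if candidates else None


nodes = [
    Node("n1", "navigator", 7),
    Node("n2", "navigator", 2),
    Node("n3", "dispatcher", 9),
]
sizes = sample_space_sizes(process_queue=range(5), task_queue=range(40))
wanted = plan_change(*sizes)
chosen = select_node(nodes, wanted) if wanted else None
```

In this example the task backlog is much larger than the process backlog, so the plan asks for another dispatcher and the least busy navigator node is picked for conversion.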

Self-Configuration
- a closed feedback-loop controller
- Reconfiguration Actions
  - Starting Threads
    - the JOpera API
  - Stopping Navigator Threads
    - migrating the state of the processes the navigator thread is working on and redirecting associated events
    - by flushing the locally cached state into the global tuple space
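
A minimal sketch of stopping a navigator, assuming the global tuple space and the routing table can be modelled as plain dictionaries. The `Navigator` class, the `stop_navigator` function and the idea of naming an explicit successor are hypothetical; the slide only states that state is flushed and events are redirected.

```python
class Navigator:
    """Hypothetical navigator thread holding process state in a local cache."""

    def __init__(self, name):
        self.name = name
        self.local_state = {}


def stop_navigator(navigator, global_space, routing_table, successor):
    """Flush cached state to the global space, then redirect events."""
    for process_id, state in navigator.local_state.items():
        global_space[process_id] = state  # the state survives the shutdown
        routing_table[process_id] = successor.name  # events now go elsewhere
    navigator.local_state.clear()


old, new = Navigator("nav-1"), Navigator("nav-2")
old.local_state = {"p1": "task 3 of 10", "p2": "task 7 of 10"}
global_space = {}
routing_table = {"p1": "nav-1", "p2": "nav-1"}
stop_navigator(old, global_space, routing_table, new)
```

After the call no process state lives only on the stopped navigator, so its thread can be shut down without losing work.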

Self-Configuration (cont.)
- Stopping Dispatcher Threads
  - more difficult
    - a task may involve the invocation of a local application or the interaction with a remote service provider on the Web
  - metadata
  - kill method
    - immediately stops all active task executions
    - ensures all task invocations will be repeated on a different dispatcher thread
  - stop method
    - immediately ceases to take tuples from the task space
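
The difference between the two methods can be sketched with a toy dispatcher. The `Dispatcher` class below is hypothetical, and two points are assumptions not stated on the slide: that `stop` lets already running tasks continue, and that `kill` makes tasks repeatable by putting them back into the task space.

```python
from collections import deque


class Dispatcher:
    """Hypothetical dispatcher that takes task tuples from a shared task space."""

    def __init__(self, task_space):
        self.task_space = task_space
        self.active = []  # tasks currently being executed
        self.accepting = True

    def take(self):
        if self.accepting and self.task_space:
            self.active.append(self.task_space.popleft())

    def stop(self):
        """Cease taking new tuples; tasks already running are left alone."""
        self.accepting = False

    def kill(self):
        """Abort running tasks and return them so another dispatcher repeats them."""
        self.accepting = False
        aborted, self.active = self.active, []
        self.task_space.extend(aborted)
        return aborted


task_space = deque(["t1", "t2", "t3"])
d = Dispatcher(task_space)
d.take()
d.take()
d.stop()
d.take()  # ignored: the dispatcher no longer takes tuples
after_stop = (list(d.active), list(task_space))
aborted = d.kill()
```

`stop` is the gentler action because no work is repeated, while `kill` frees the node immediately at the cost of re-running the aborted tasks.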

Self-Healing
- periodically monitors the nodes of the cluster
- Handling Dispatcher Thread Failures
  - the tasks that were managed by it are lost and have to be restarted
  - very similar to the case where the self-configuration component kills a dispatcher
- Handling Navigator Thread Failures
  - the state of the execution of the process is still available in the global process execution state space
  - simply removing the entries in the tuple routing table which point to the failed navigator
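
The two recovery paths can be sketched as below. Heartbeat timestamps are an assumed detection mechanism (the slide only says the nodes are monitored periodically), and all function names and data layouts are hypothetical.

```python
def check_nodes(last_heartbeat, now, timeout):
    """Return the nodes whose last heartbeat is older than the timeout."""
    return sorted(n for n, t in last_heartbeat.items() if now - t > timeout)


def recover_navigator(routing_table, failed):
    """Drop routing entries that point to the failed navigator."""
    return {pid: nav for pid, nav in routing_table.items() if nav != failed}


def recover_dispatcher(running_tasks, failed, task_space):
    """Put the tasks the failed dispatcher was running back into the task space."""
    lost = [task for task, owner in running_tasks.items() if owner == failed]
    task_space.extend(lost)
    return lost


failed_nodes = check_nodes(
    {"nav-1": 100.0, "nav-2": 91.0, "disp-1": 99.0}, now=101.0, timeout=5.0
)
routing = recover_navigator({"p1": "nav-1", "p2": "nav-2"}, "nav-2")
task_space = []
lost = recover_dispatcher({"t1": "disp-1", "t2": "disp-2"}, "disp-2", task_space)
```

Navigator recovery touches only the routing table because the process state is still in the global space, whereas dispatcher recovery has to reschedule the lost tasks.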


System Evaluation
- Experimental Setup
- Base Line
- Autonomic Behavior
  - Self-Configuration
  - Reconfiguration Overhead
- Self-Healing
- Discussion

Experimental Setup
- a cluster of up to 20 nodes
  - 1.0 GHz dual P-III, 1 GB of RAM, Linux (kernel version 2.4.22) and Sun’s Java Development Kit version 1.4.2
- one additional node
  - the global tuple space server
  - IBM’s T-Spaces v2.1.3

Base Line
- two different workloads
  - 1000 concurrent processes containing 10 parallel tasks of duration of 0 seconds (workload 0)
  - 1000 processes containing 10 parallel tasks of duration of 20 seconds (workload 20)
- total of 15 nodes
  - from 14 navigators and 1 dispatcher up to 14 dispatchers and 1 navigator
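
The static configurations and workload sizes above follow from simple arithmetic, which can be written out as a short sketch. The variable names are chosen for this example only.

```python
TOTAL_NODES = 15

# Every static split of the 15 nodes, from 14 navigators and 1 dispatcher
# down to 1 navigator and 14 dispatchers, as (navigators, dispatchers).
configurations = [(nav, TOTAL_NODES - nav) for nav in range(14, 0, -1)]

# Both workloads contain 1000 processes with 10 parallel tasks each.
tasks_per_workload = 1000 * 10

# Task duration in seconds is the only difference between the two workloads.
workloads = {"workload 0": 0, "workload 20": 20}
```

This gives 14 static configurations to compare for each of the two workloads.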


Autonomic Behavior
- Self-Configuration
- Reconfiguration Overhead

Self-Healing
- initially to use 15 nodes
- to replace 5 of the nodes assigned
- workload
  - consists of four peaks of 500 processes occurring every 100 seconds
  - each of the processes consists of 10 parallel tasks of 10 seconds duration
- changes to the nodes
  - grow to 20 nodes at t=90
  - reduced by 5 nodes at t=140
  - again by 5 nodes at t=230
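
The experiment's schedule can be written down as a small timeline, under one reading of the slide: 15 nodes at the start, then the three changes listed. The first peak starting at t=0 is an assumption, since the slide gives only the spacing of the peaks, and the names below are made up for this sketch.

```python
INITIAL_NODES = 15

# (time in seconds, change in node count) as listed on the slide.
node_events = [(90, +5), (140, -5), (230, -5)]


def nodes_at(t):
    """Cluster size at time t after applying every event up to t."""
    return INITIAL_NODES + sum(delta for time, delta in node_events if time <= t)


# Assumption: the first of the four peaks arrives at t=0.
peak_times = [i * 100 for i in range(4)]
tasks_per_peak = 500 * 10  # 500 processes with 10 parallel tasks each
total_tasks = tasks_per_peak * len(peak_times)
```

Under this reading the cluster holds 15, 20, 15 and finally 10 nodes over the course of the run.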


Discussion
- finding an optimal static configuration for a given workload
  - very difficult
  - different characteristics lead to different optimal configurations
- the autonomic controller was able to
  - adapt the configuration of the workflow engine
  - according to the variable characteristics of the workload
- self-healing experiment
  - a common situation in the lifetime of a cluster-based system


Conclusion
- the design of an autonomic workflow engine
- demonstrated its self-managing behavior and evaluated its performance
- showed how to apply the autonomic computing paradigm to greatly simplify the deployment and the maintenance of such systems
- homogeneous workload
  - more complex characteristics as part of future work
