Design and Evaluation of an Autonomic Workflow Engine Thomas Heinis, Cesare Pautasso, Gustavo Alonso Dept. of Computer Science Swiss Federal Institute.

Design and Evaluation of an Autonomic Workflow Engine

Thomas Heinis, Cesare Pautasso, Gustavo Alonso

Dept. of Computer Science

Swiss Federal Institute of Technology (ETHZ)

The 2nd IEEE International Conference on Autonomic Computing (ICAC-05)

March 15th, 2008

Seo, Dongmahn

Contents
  Introduction
  System Background
  System Architecture
  Autonomic Capabilities
  System Evaluation
  Conclusion

Introduction
  Motivation
  Related Work
  Contribution

Motivation
  Workflow management systems
    e-commerce
    virtual laboratories
    DNA sequencing
    scientific computing
    Grid computing
  idea of process-based Web service composition

Motivation (cont.)
  Workflow engines
    open environment
    unknown workload
    difficult to choose between
      a centralized solution
      a distributed implementation of the engine
  Problem of configuring the system in an optimal way
    NOT a feasible solution
      having a system administrator in charge of manually monitoring and reconfiguring the system
    considering
      the number of parameters involved
      the variability of the workload

Related Work
  Decentralization of workflow process execution
    important area of research
    supports business processes
    leads to higher scalability
    introduces several problems
      lack of a global view over the process
      scalability and reliability problems per se
  To address the problem
    GOLIAT, autonomic computing techniques, self-optimizing computer systems
    autonomic computing principles in the context of distributed workflow engines

Contribution
  Goal
    self-tuning
    self-configuration capabilities
    self-healing capabilities

Contribution (cont.)
  System
    extension to the JOpera engine
      Java-based service composition tool
      combines a workflow engine with an open architecture to provide support for Web service composition, Grid computing and specialized workflow engines
    flexible architecture, components
      Key system modules can be replicated to handle large workloads.
      Other modules can be paired with a backup to achieve fault tolerance.
      The autonomic controller can be configured by selecting different reconfiguration strategies.

Contribution (cont.)
  The key contributions of the paper
    the novel system architecture
      generic
      can be adopted by many engines operating under different models and languages
    the resulting scalability and fault tolerance
      flexible enough to support the very large loads present in computational applications and large-scale Web service composition
    the independence of the underlying workflow model
      easily extensible to support many different kinds of services

System Background
  Requirements
  Workload Assumptions
  Deployment Environment

Requirements
  The workflow execution engine
    to support autonomic behavior, it must feature self-configuration, self-tuning and self-healing capabilities
  Self-configuration
    switching the system’s configuration on the fly
    without manual intervention and without disrupting the system
    requires the workflow execution engine to support changing the configuration dynamically and efficiently

Requirements (cont.)
  Self-tuning
    reconfiguring the system to the optimal configuration given the current workload
    the workflow engine must give access to its internal state
      control algorithms can analyze current and past performance information to plan configuration changes in response to the current workload
    assumption
      the characteristics of the workload affect the system’s performance
      the self-tuning algorithm can optimally adapt the system to the workload by monitoring key performance indicators

Requirements (cont.)
  Self-healing
    able to detect configuration changes due to external events
      failures of nodes
    recovery action requires
      mechanisms for detecting failures and configuration changes of the cluster
      a way to query the workflow execution state

Workload Assumptions
  The workload is assumed
    to be a collection of concurrent workflow processes
    a worst-case scenario
  Does not deal with workload prediction issues
    future work

Deployment Environment
  [Assumption] JOpera
    runs on a dedicated cluster of computers
    can use these resources exclusively
  Main goal of the autonomic features
    to ensure the optimal configuration of the cluster
      efficient resource utilization
      good allocation of the available nodes to the different system components
  Cluster configuration is NOT static
  The system could be extended to use shared nodes that are also used for other purposes.

System Architecture
  Workflow Execution
  Distributed Workflow Execution
  Scalable Workflow Execution

Workflow Execution
  Workflow processes
    model interactions between different tasks
    by defining the data flow and control flow between them

Distributed Workflow Execution

Scalable Workflow Execution
  scalability bottleneck
    use several layers of caching
    between the tuple space and the threads producing and consuming tuples
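The caching idea on this slide can be illustrated with a small sketch. It is not JOpera code: `GlobalTupleSpace` and `CachedSpace` are hypothetical names, the tuple space is reduced to a dictionary, and a single write-back cache layer stands in for the "several layers" mentioned above. It only shows why a local cache reduces round trips to the shared space.

```python
class GlobalTupleSpace:
    """Hypothetical stand-in for the shared tuple space (a plain dict here)."""

    def __init__(self):
        self.tuples = {}
        self.accesses = 0  # counts round trips to the shared space

    def read(self, key):
        self.accesses += 1
        return self.tuples.get(key)

    def write(self, key, value):
        self.accesses += 1
        self.tuples[key] = value


class CachedSpace:
    """Local write-back cache placed between a thread and the global space."""

    def __init__(self, backing):
        self.backing = backing
        self.cache = {}
        self.dirty = set()

    def read(self, key):
        if key not in self.cache:  # cache miss: one round trip
            self.cache[key] = self.backing.read(key)
        return self.cache[key]

    def write(self, key, value):
        self.cache[key] = value  # stays local until flushed
        self.dirty.add(key)

    def flush(self):
        """Push every locally modified entry to the global space."""
        for key in sorted(self.dirty):
            self.backing.write(key, self.cache[key])
        self.dirty.clear()


space = GlobalTupleSpace()
local = CachedSpace(space)
for step in range(100):
    local.write("process-1/state", step)
    local.read("process-1/state")
local.flush()
print(space.accesses, space.tuples)
```

In this toy run, 100 local updates and 100 reads cost a single access to the shared space, at the price of the global state being stale until `flush()` is called.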

Autonomic Capabilities
  Self-Tuning
    Information Strategy
    Optimization Strategy
    Selection Strategy
  Self-Configuration
    Reconfiguration Actions
  Self-Healing

Self-Tuning
  Information Strategy
    detect imbalances in the system’s configuration
    to sample the current space size
  Optimization Strategy
    to establish a configuration such that the number of navigator and dispatcher threads is balanced
  Selection Strategy
    prioritizing nodes according to how well suited they are for a configuration change
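The three strategies can be read as a small pipeline: sample, decide, pick a node. The sketch below is one possible reading, not the policy used in JOpera. The metrics (`process_space_size`, `task_space_size`, `active_work`), the fixed `threshold`, and the "least busy node" ranking are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Information strategy output: sampled space sizes (assumed metrics)."""
    process_space_size: int  # tuples waiting for navigators
    task_space_size: int     # tuples waiting for dispatchers


@dataclass
class Node:
    name: str
    role: str         # "navigator" or "dispatcher"
    active_work: int  # processes or tasks currently handled (assumed metric)


def optimize(sample, threshold=50):
    """Optimization strategy: return the role that needs more threads, or None."""
    diff = sample.task_space_size - sample.process_space_size
    if diff > threshold:
        return "dispatcher"
    if diff < -threshold:
        return "navigator"
    return None


def select(nodes, needed_role):
    """Selection strategy: pick the least busy node of the other role."""
    candidates = [n for n in nodes if n.role != needed_role]
    if len(candidates) <= 1:
        return None  # keep at least one thread of each kind
    return min(candidates, key=lambda n: n.active_work)


nodes = [
    Node("n1", "navigator", 12),
    Node("n2", "navigator", 3),
    Node("n3", "dispatcher", 40),
]
needed = optimize(Sample(process_space_size=10, task_space_size=200))
chosen = select(nodes, needed)
print(needed, chosen.name)
```

Here the task space is much larger than the process space, so the sketch asks for another dispatcher and proposes converting `n2`, the navigator with the least work to migrate.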

Self-Configuration
  a closed feedback-loop controller
  Reconfiguration Actions
    Starting Threads
      the JOpera API
    Stopping Navigator Threads
      migrating the state of the processes the navigator thread is working on and redirecting associated events
      by flushing the locally cached state into the global tuple space
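Stopping a navigator, as outlined above, has two parts: flush its cached process state, then redirect the events of its processes. A minimal sketch under stated assumptions follows. `stop_navigator`, the dictionary-based routing table, and the round-robin reassignment are hypothetical and are not taken from the JOpera API.

```python
def stop_navigator(name, navigators, routing, local_state, global_space):
    """Stop navigator `name` and return the list of remaining navigators.

    routing      maps process id -> navigator name (assumed routing table)
    local_state  maps navigator name -> its locally cached process state
    global_space is the shared state store (a plain dict in this sketch)
    """
    remaining = [n for n in navigators if n != name]
    if not remaining:
        raise ValueError("cannot stop the last navigator")
    # 1. Flush the locally cached process state into the global space.
    global_space.update(local_state.pop(name, {}))
    # 2. Redirect events: reassign each affected process round-robin.
    moved = sorted(p for p, nav in routing.items() if nav == name)
    for i, process_id in enumerate(moved):
        routing[process_id] = remaining[i % len(remaining)]
    return remaining


navigators = ["nav-a", "nav-b", "nav-c"]
routing = {"p1": "nav-a", "p2": "nav-b", "p3": "nav-b", "p4": "nav-c"}
local_state = {"nav-b": {"p2": "task 3 of 10 done", "p3": "task 7 of 10 done"}}
global_space = {}

navigators = stop_navigator("nav-b", navigators, routing, local_state, global_space)
print(navigators, routing, global_space)
```

The order matters in this sketch: the state is flushed before the routing table changes, so whichever navigator receives the next event for `p2` or `p3` can load the state from the global space.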

Self-Configuration (cont.)
  Stopping Dispatcher Threads
    more difficult
    task may involve the invocation of a local application or the interaction with a remote service provider on the Web
    metadata
    kill method
      immediately stops all active task executions
      ensures all task invocations will be repeated on a different dispatcher thread
    stop method
      immediately ceases to take tuples from the task space
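The difference between the two methods can be sketched as follows. The `Dispatcher` class is hypothetical: tasks are plain strings, the task space is a `deque`, and "executing" a task is only simulated. It is assumed here that a stopped dispatcher lets its active tasks finish, which the slide does not state explicitly.

```python
from collections import deque


class Dispatcher:
    """Toy dispatcher that takes task tuples from a shared task space."""

    def __init__(self, name, task_space):
        self.name = name
        self.task_space = task_space  # shared deque of pending task tuples
        self.active = []
        self.accepting = True

    def take(self):
        """Take one task tuple from the task space, if still accepting."""
        if self.accepting and self.task_space:
            self.active.append(self.task_space.popleft())

    def finish_all(self):
        """Pretend that every active task ran to completion."""
        done, self.active = self.active, []
        return done

    def stop(self):
        """Graceful: take no new tuples; active tasks are left to finish."""
        self.accepting = False

    def kill(self):
        """Abrupt: abandon active tasks and return them to the task space."""
        self.accepting = False
        self.task_space.extendleft(reversed(self.active))
        self.active = []


task_space = deque(["t1", "t2", "t3", "t4"])
d1 = Dispatcher("d1", task_space)
d2 = Dispatcher("d2", task_space)
d1.take()
d1.take()          # d1 now holds t1 and t2
d2.take()          # d2 now holds t3
d1.kill()          # t1 and t2 go back to the front of the task space
d2.stop()
d2.take()          # ignored: d2 no longer accepts tuples
completed = d2.finish_all()

d3 = Dispatcher("d3", task_space)
d3.take()          # a different dispatcher repeats t1
print(list(task_space), completed, d3.active)
```

Because a killed task may already have invoked a local application or a remote service, repeating it is only safe when the invocation can be repeated, which may be what the "metadata" bullet refers to.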

Self-Healing
  periodically monitors the nodes of the cluster
  Handling Dispatcher Thread Failures
    the tasks that were managed by it are lost and have to be restarted
    very similar to how the self-configuration component kills a dispatcher
  Handling Navigator Thread Failures
    the state of the execution of the process is still available in the global process execution state space
    simply removing their entries in the tuple routing table which point to the failed navigator
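The recovery paths above can be sketched in a few functions. How JOpera detects a failed node is not described on the slide, so the heartbeat timestamps and the `timeout` below are assumptions, as are all function and variable names.

```python
def detect_failed(last_heartbeat, now, timeout):
    """Return nodes whose last heartbeat is older than `timeout` (assumed mechanism)."""
    return sorted(node for node, seen in last_heartbeat.items() if now - seen > timeout)


def heal_dispatcher(failed, running_tasks, task_space):
    """Tasks on a failed dispatcher are lost: queue them again for a restart."""
    lost = running_tasks.pop(failed, [])
    task_space.extend(lost)
    return lost


def heal_navigator(failed, routing):
    """Drop routing entries that point to the failed navigator.

    The process state itself is untouched: it is still in the global space.
    """
    orphaned = sorted(p for p, nav in routing.items() if nav == failed)
    for process_id in orphaned:
        del routing[process_id]
    return orphaned


last_heartbeat = {"nav-a": 95, "nav-b": 60, "disp-a": 98, "disp-b": 50}
failed = detect_failed(last_heartbeat, now=100, timeout=30)

routing = {"p1": "nav-a", "p2": "nav-b"}
running_tasks = {"disp-a": ["t1"], "disp-b": ["t2", "t3"]}
task_space = ["t4"]

orphaned = heal_navigator("nav-b", routing)
lost = heal_dispatcher("disp-b", running_tasks, task_space)
print(failed, orphaned, lost, routing, task_space)
```

The navigator case is the cheaper one in this sketch: nothing is re-executed, and the orphaned processes only need a new routing entry once another navigator picks them up.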

System Evaluation
  Experimental Setup
  Base Line
  Autonomic Behavior
    Self-Configuration
    Reconfiguration Overhead
    Self-Healing
  Discussion

Experimental Setup
  a cluster of up to 20 nodes
    1.0 GHz dual P-III, 1 GB of RAM, Linux (kernel version 2.4.22) and Sun’s Java Development Kit version 1.4.2
  one additional node
    the global tuple space server
    IBM’s T-Spaces v2.1.3

Base Line
  two different workloads
    1000 concurrent processes containing 10 parallel tasks of duration 0 seconds (workload 0)
    1000 processes containing 10 parallel tasks of duration 20 seconds (workload 20)
  total 15 nodes
    from 14 navigators and 1 dispatcher up to 14 dispatchers and 1 navigator
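As a quick check of the numbers on this slide, the static configurations and the task count per workload can be enumerated. The sketch assumes that every split between the two extremes was measured, moving one node at a time, which the slide does not state explicitly.

```python
TOTAL_NODES = 15
PROCESSES = 1000
TASKS_PER_PROCESS = 10

# (navigators, dispatchers) from 14/1 to 1/14, one node moved at a time (assumed).
configs = [(nav, TOTAL_NODES - nav) for nav in range(14, 0, -1)]
tasks_per_workload = PROCESSES * TASKS_PER_PROCESS

print(len(configs), configs[0], configs[-1], tasks_per_workload)
```

Under that assumption there are 14 static configurations, and each workload executes 10,000 tasks in total.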

Base Line (cont.)

Autonomic Behavior
  Self-Configuration

Autonomic Behavior (cont.)

Reconfiguration Overhead

Self-Healing
  initially to use 15 nodes
  to replace 5 of the nodes assigned
  workload
    consists of four peaks of 500 processes occurring every 100 seconds
    each of the processes consists of 10 parallel tasks of 10 seconds duration
  change nodes
    grow to 20 nodes at t = 90
    reduced by 5 nodes at t = 140
    again by 5 nodes at t = 230
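The cluster size over the course of this experiment follows directly from the figures above. The sketch below encodes it as a step function and adds up the workload. It assumes that t is measured in seconds from the start of the run; the slide does not give the unit for t or the time of the first peak.

```python
import bisect

# (time, node count) after each change: 15 initially, +5 at t=90, -5 at t=140, -5 at t=230.
CHANGES = [(0, 15), (90, 20), (140, 15), (230, 10)]
CHANGE_TIMES = [t for t, _ in CHANGES]


def nodes_at(t):
    """Number of cluster nodes available at time t (step function)."""
    if t < 0:
        raise ValueError("t must be non-negative")
    index = bisect.bisect_right(CHANGE_TIMES, t) - 1
    return CHANGES[index][1]


total_processes = 4 * 500              # four peaks of 500 processes
total_tasks = total_processes * 10     # 10 parallel tasks per process

print([nodes_at(t) for t in (0, 89, 90, 140, 230)], total_processes, total_tasks)
```

By the end of the run the engine works with 10 nodes, two thirds of the initial cluster, while a total of 2,000 processes (20,000 tasks) is submitted across the four peaks.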

Self-Healing (cont.)

Discussion
  to find an optimal static configuration for a given workload
    very difficult
    different characteristics lead to different optimal configurations
  autonomic controller was able to
    adapt the configuration of the workflow engine
    according to the variable characteristics of the workload
  self-healing experiment
    common situation in the lifetime of a cluster-based system

Conclusion
  the design of an autonomic workflow engine
  demonstrated its self-managing behavior and evaluated its performance
  show how to apply the autonomic computing paradigm to greatly simplify the deployment and the maintenance of such systems
  homogeneous workload
    more complex characteristics as part of future work
