Page 1: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

www.s-cube-network.eu

S-Cube Learning Package

Self-* infrastructures:

Self-healing in Mixed Service-oriented Systems

TU Wien (TUW)

Harald Psaier, TUW

Page 2: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

© Harald Psaier

Learning Package Categorization

S-Cube

Self-* Service Infrastructure

and Discovery Support

Self-* Service Infrastructure

Self-healing in SOA

Page 3: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Learning Package Overview

Problem Description

Self-healing research

Example: Self-healing policies for Mixed Service-oriented

Systems

Conclusions


Page 4: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Mixed Service-oriented Systems

Open, dynamic service environment for humans and services

– distributed coordination and communication

– no predefined top-down compositions, but flexible ones

Interactions are ad hoc and dynamic, and usually stay within the boundaries of an activity

Mixed Systems (MS) involve collaboration between two main, distinct types of services:

Human-Provided Services (HPS)

– Humans provide knowledge/skills/expertise as services

– Close the gap between required human expertise and the difficulty of implementing it as software

Software-Based Services (SBS)


Page 5: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Examples of mixed systems

Review services: include shared reviewing activities around documents, code, and evaluations

Innovation services: foster various ideas for a new product design

Support services: provide solutions for questions and problems on multiple or selected subjects

Current platforms with massive use of MSs: crowdsourcing platforms. These include, e.g., Amazon’s Mechanical Turk, Yahoo Answers, uTest.


Page 6: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Let’s Consider a Scenario (1)

Humans and services interact to perform work described by

the activities in the process model.

[Figure labels: Service Registry; Process Model; invoke; human service; activity scopes]

Page 7: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Let’s Consider a Scenario (2)

One of the services fails to complete an assigned activity.

In a loop, self-healing monitors, recognizes, and adapts to the incident.

[Figure labels: Process Model; Deployment with Dependency Management; Run-Time Environment; Monitoring; invoke (failed, marked X); Adaptation; Self-healing Policies]

Page 8: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Let’s Consider a Scenario (3)

The reaction is controlled by policies connected to the

process activities

The particular challenge for the autonomous system is the complexity of MSs (cf. the dynamicity of MSs).

The goal of Self-* properties is to support administrators in system management.

In particular, the tasks of self-healing in MS include:

– Avoid errors in design

– Avoid errors in configuration

– Replace failing services at runtime

– Handle adaptation complexity transparently to keep system healthy

– Support need of service maintenance


Page 9: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Learning Package Overview

Problem Description

Self-healing research

Example: Self-healing policies for Mixed Service-oriented

Systems

Conclusions


Page 10: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

What is self-healing?

A self-healing system should recover from the abnormal (or

“unhealthy”) state and return to the normative (“healthy”)

state, and function as it was prior to disruption.

A system with self-healing properties can be identified as a

system that comprises fault-tolerant, self-stabilizing, and

survivable system capabilities and, if needed, must be human

supported.


The 3 common states are

Normal, Broken, and

Degraded. The challenge is

to identify Degraded in time

and to recover soundly.

Page 11: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Self-healing origins

A fault-tolerant system continues working to a reasonable degree in the presence of faults.

A self-stabilizing system continuously stabilizes itself after any perturbation.

A survivable system sustains the unexpected.


Page 12: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Self-healing research: autonomic computing (1/2)

IBM's autonomic computing research envisions a layered structure that manages itself according to high-level objectives given by administrators.

Motivated by the cost of, and overwhelming effort in, system maintenance

The research tries to cover all adaptable layers down to network and operating system

Defines 4 properties for a self-managing system (self-CHOP):

– self-configuring: The ability to readjust itself “on the fly”

– self-healing: Discover, diagnose, and react to disruptions

– self-optimizing: Maximize resource utilization to meet end-user needs

– self-protecting: Anticipate, detect, identify, and protect itself from attacks.


Page 13: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Self-healing research: self-adaptive systems (2/2)

Self-adaptive systems evaluate their behavior and adapt when system irregularities occur or when better functionality or performance is possible

The research primarily covers the application and the middleware layers and focuses on the system as a whole.

Also includes self-healing as a combination of self-diagnosing and self-repairing, with the capabilities to diagnose and recover from malfunctions.


Page 14: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Self-healing characteristics


What:

Continuous availability by

compensating the dynamics of a

running system.

Why:

Momentary maintenance of health and ...

Enduring continuity through resilience against unintended behavior

How:

Detect disruptions

Diagnose root cause

Derive recovery strategy

Page 15: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Self-healing requirements

A closed loop design which integrates sufficient sensor and

effector interfaces.

A status knowledge database and logic for an accurate state

recognition

State recognition must include failure classification for an adequate handling of the problem

A collection of recovery policies in the format of <trigger, rule,

action>. Usually this collection is preconfigured but must also

be configurable to obtain…

Fitness and evolutionary aspects. Self-* properties generally

are applied to maintain a long-term use of the system
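The <trigger, rule, action> format can be illustrated with a minimal sketch. The class name RecoveryPolicy, the function apply_policies, the event name "queue_update" and the queue threshold are all assumptions made for illustration; they are not taken from the learning package.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]  # hypothetical status snapshot, e.g. {"queue_len": 12}


@dataclass
class RecoveryPolicy:
    trigger: str                    # event name that activates the policy
    rule: Callable[[State], bool]   # condition checked against the status
    action: Callable[[State], str]  # recovery step; returns a description


def apply_policies(event: str, state: State,
                   policies: List[RecoveryPolicy]) -> List[str]:
    """Run every policy whose trigger matches the event and whose rule holds."""
    return [p.action(state) for p in policies
            if p.trigger == event and p.rule(state)]


# A preconfigured collection; further policies can be appended at runtime.
policies = [
    RecoveryPolicy(
        trigger="queue_update",
        rule=lambda s: s["queue_len"] > 10,       # assumed threshold
        action=lambda s: "redirect new tasks",
    ),
]
```

Keeping the collection as a plain list reflects the requirement that it is preconfigured yet still configurable.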


Page 16: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Self-healing loop


detecting: filters any

suspicious status information

diagnosing: does root cause

analysis and calculates an

appropriate recovery

recovery: carefully applies

the planned adaptations

A self-healing loop comprises 3 common states: detecting,

diagnosing, recovering

These are connected to the sensors and effectors of the

system

In the background, a knowledge-base supports the states
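One pass through such a loop might look like the sketch below. The Reading type, the latency threshold, the dictionary used as knowledge base and the default plan "isolate" are illustrative assumptions; sensor and effector are plain callables standing in for the system's real interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Reading:
    node: str
    latency: float  # seconds; stand-in for any sensor value


def detect(readings: List[Reading], max_latency: float) -> List[Reading]:
    """Detecting: keep only suspicious status information."""
    return [r for r in readings if r.latency > max_latency]


def diagnose(suspicious: List[Reading],
             knowledge_base: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Diagnosing: pick the worst offender and look up a recovery plan."""
    if not suspicious:
        return None
    worst = max(suspicious, key=lambda r: r.latency)
    return worst.node, knowledge_base.get(worst.node, "isolate")


def recover(plan: Tuple[str, str], effector: Callable[[str, str], None]) -> None:
    """Recovering: apply the planned adaptation through the effector."""
    node, action = plan
    effector(node, action)


def healing_iteration(sensor: Callable[[], List[Reading]],
                      effector: Callable[[str, str], None],
                      knowledge_base: Dict[str, str],
                      max_latency: float = 1.0) -> bool:
    """One loop pass; returns True if a recovery was applied."""
    plan = diagnose(detect(sensor(), max_latency), knowledge_base)
    if plan is None:
        return False
    recover(plan, effector)
    return True
```

In a running system this iteration would be repeated, with the next sensor reading showing whether the recovery helped.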

Page 17: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Self-healing states

The most general states in self-healing research are:

Normal: The system is in a “healthy” state. In particular, it signals intended functioning and all requirements are met as expected.

Broken: This is an “unhealthy” system. It can generally be identified by an unacceptable response, which is most probably the result of a failure or error.

Degraded: The system is in a fuzzy transition zone between the former two. Behavior is expected to be unpredictable and parts of the system will drift from an acceptable state to some failure state. In large-scale systems this is in many cases recognizable by considerable performance loss. If the system is redundant, its size in most cases provides additional recovery time.
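A rough sketch of how the three states could be told apart. The inputs (a response check and a throughput measurement) and the 80% tolerance are assumptions chosen for illustration; a real system would use its own metrics.

```python
from enum import Enum


class Health(Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"
    BROKEN = "broken"


def classify(response_ok: bool, throughput: float, expected: float,
             tolerance: float = 0.8) -> Health:
    """Map two observations to one of the three common states.

    response_ok: False if the system gave an unacceptable response.
    throughput / expected: measured vs. required performance.
    tolerance: assumed fraction of expected performance still seen as healthy.
    """
    if not response_ok:
        return Health.BROKEN
    if throughput < tolerance * expected:
        return Health.DEGRADED  # considerable performance loss
    return Health.NORMAL
```

The hard threshold is a simplification: the Degraded zone is described as fuzzy, so a practical detector would likely look at trends rather than a single cut-off.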


Page 18: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Failure classification: Failure types (1/2)

The main goal of this classification is to assist root cause

analysis and find the adequate resolution for the failure.

Common failure types are:

– Crash failure: undetectable malign service interruption

– Fail-stop: detected failure caused a service interruption

– Transient: instantaneous transparent interruption with measurable

side-effects

– Omission: message loss, transmission errors in communication

infrastructure

– Performance: violation of agreements on execution time

– Arbitrary: any type of failure with no specific pattern


Page 19: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Failure classification: Policies (2/2)

Policies provide configuration and settings for detection and

recovery.

There are three different types of policies:

– Action policies: These are reactive policies with a specialized trigger; an immediate response is expected.

– Goal policies: These define a set of desired states. They also calculate the set of actions for the transition from the current (failure-affected) state to a desired state.

– Utility function policies: The set of states is connected to a utility function. Problem solving includes extensive analysis drawing on history information, adaptation knowledge, and a comprehensive system awareness.

Common recovery actions include:

– Replacement, balancing, isolation, persistence, redirection, etc.
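The difference between an action policy and a utility function policy can be sketched as follows. The state fields load and cost, the two predicted-effect functions and the utility weights are invented for illustration only.

```python
from typing import Callable, Dict, Optional

State = Dict[str, float]


# Hypothetical recovery actions: each predicts the state after it is applied.
def replace_service(s: State) -> State:
    return {**s, "load": 0.3, "cost": s["cost"] + 5.0}


def balance_load(s: State) -> State:
    return {**s, "load": s["load"] * 0.6, "cost": s["cost"] + 1.0}


ACTIONS: Dict[str, Callable[[State], State]] = {
    "replacement": replace_service,
    "balancing": balance_load,
}


def action_policy(state: State) -> Optional[str]:
    """Action policy: specialized trigger, fixed immediate response."""
    return "balancing" if state["load"] > 0.9 else None


def utility(state: State) -> float:
    """Assumed utility: prefer low load, penalize cost."""
    return -state["load"] - 0.05 * state["cost"]


def utility_policy(state: State) -> str:
    """Utility function policy: choose the action whose predicted state scores best."""
    return max(ACTIONS, key=lambda name: utility(ACTIONS[name](state)))
```

Under these assumed numbers the utility policy prefers replacement for a fully loaded service and balancing for a lightly loaded one, whereas the action policy always answers with the same response.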


Page 20: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Fitness and evolution

Current large-scale systems, especially self-* enhanced, must

be designed for long-term service.

This means they must be resilient to changes and allow any

required future variations.

The issues to keep in mind are:

– Most arising requirements are not known a priori but emerge over time

– Interventions in and changes to the current system must respect the system’s essential functionality and avoid malicious failures at any cost

– Adaptation might reach its limits in resources

The current solution is to create self-* systems with exposed configuration management and thus human-assisted adaptations


Page 21: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

S-Cube contributions to Self-healing/-* research


Psaier H., Dustdar S. (2010). A survey on self-healing systems: approaches and systems. Computing. Springer Wien.

Di Nitto, E., Ghezzi, C., Metzger, A., Papazoglou, M., Pohl, K. (2008). A journey to highly dynamic, self-adaptive service-based applications. Automated Software Engineering, 15(3), pp. 313-341. Springer.

Hielscher, J., Kazhamiakin, R., Metzger, A., Pistore, M. (2008). A framework for proactive self-adaptation of service-based applications based on online testing. Towards a Service-Based Internet, pp. 122-133. Springer.

Pernici, B. (2009). Self-healing Systems and Web Services: The WS-Diamond Approach. Business Process Management Workshops, pp. 440-442. Springer.

Psaier H., Skopik F., Schall D., Dustdar S. (2010). Behavior Monitoring in Self-healing Service-oriented Systems. 34th Annual IEEE Computer Software and Applications Conference (COMPSAC), July 19-23, 2010, Seoul, South Korea. IEEE.

Papazoglou, M.; Pohl, K.; Parkin, M.; Metzger, A. (2010). S-Cube - Towards Engineering, Managing and Adapting Service-Based Systems. Springer. 1st Edition., 2010, XVIII, 374 p.

Page 22: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Learning Package Overview

Problem Description

Self-healing research

Example: Self-healing policies for Mixed Service-oriented

Systems

Conclusions


Page 23: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Mixed Service-oriented Systems: Challenges

Mixed Service-oriented Systems, a.k.a. Mixed Systems (MS), are open to humans and services.

They inherit all properties of SOA, including distributed, ad hoc interactions along with a communication infrastructure and coordination.

… and aforementioned properties

… and examples

What are the challenges in MS?

– The “openness” of the system allows many, possibly unreliable, services to join

– Humans in particular are unreliable owing to, e.g., different working hours, particular preferences, current mood, and context.


Page 24: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Scenario: Expert Network

The key is to share the subtasks of the activity among the appropriate experts. This is usually solved by delegation and re-delegation. However, this can fail when individuals misbehave.

Main challenge: How to guarantee that the activity is completed, and completed on time?


Includes two parties: the

service consumer with a

request as an activity – and

experts and resources in

the service network.

The network combines all knowledge required to jointly process the activity

Page 25: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Delegation and processing behavior

A model of the network helps to analyze a possible problem

– HPS and SBS are represented as nodes

– Interactions are allowed over established channels

– The current work load of nodes is indicated by the queues

At runtime the model additionally indicates

– The delegation directions and frequency by the arrow direction and the

thickness of the connection

– The current work load is indicated

by the queue fill state

With the model we can present

two main patterns of misbehavior
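The model described above can be sketched as a small data structure. The class names Node and Network and their methods are hypothetical; the per-channel delegation counter stands in for the arrow thickness and the queue length for the queue fill state.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Node:
    name: str
    kind: str                                   # "HPS" or "SBS"
    queue: deque = field(default_factory=deque)  # current work load


class Network:
    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.channels: Set[Tuple[str, str]] = set()  # directed (sender, receiver)
        self.delegations: Dict[Tuple[str, str], int] = defaultdict(int)

    def add_node(self, name: str, kind: str) -> None:
        self.nodes[name] = Node(name, kind)

    def open_channel(self, src: str, dst: str) -> None:
        self.channels.add((src, dst))

    def delegate(self, src: str, dst: str, task: str) -> None:
        """Interactions are only allowed over established channels."""
        if (src, dst) not in self.channels:
            raise ValueError(f"no channel from {src} to {dst}")
        self.nodes[dst].queue.append(task)
        self.delegations[(src, dst)] += 1  # frequency, i.e. arrow thickness
```

With this in place, the delegation counts and queue lengths are the raw material for spotting misbehavior.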


Page 26: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

1st misbehavior pattern: Delegation Factory

The delegation factory misbehavior pattern:

– Node a accepts and delegates particular tasks frequently

– However, a processes few tasks itself and has a short task queue

The factory behavior impact:

– produces unusual amounts of task delegations

– tasks miss their deadline

– leads to performance degradations of the entire network
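A simple heuristic for recognizing the factory pattern might compare how much a node delegates with how much it processes. The function name, its parameters and both thresholds are assumptions for illustration, not metrics defined in this learning package.

```python
def is_delegation_factory(delegated: int, processed: int, queue_len: int,
                          max_ratio: float = 3.0, low_queue: int = 2) -> bool:
    """Flag a node that forwards far more tasks than it processes
    while its own task queue stays short.

    max_ratio and low_queue are assumed thresholds.
    """
    ratio = delegated / max(processed, 1)  # avoid division by zero
    return ratio > max_ratio and queue_len <= low_queue
```

For example, a node that delegated 40 tasks, processed 5 and holds 1 queued task would be flagged, while the same node with 10 queued tasks would not.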


Page 27: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

2nd misbehavior pattern: Delegation Sink

The delegation sink misbehavior pattern:

– Node d accepts too many offered tasks

– However, d processes tasks slowly (e.g., it overestimates its capability relative to the load it receives)

Sink behavior impact:

– produces unusual amounts of task delegations

– tasks miss their deadline

– leads to performance degradations of the entire network
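The sink pattern could be recognized, in a similarly simplified way, by a nearly full queue combined with a low completion rate. Names, parameters and thresholds are again illustrative assumptions.

```python
def is_delegation_sink(accepted: int, processed: int, queue_len: int,
                       queue_capacity: int, min_fill: float = 0.9,
                       min_rate: float = 0.5) -> bool:
    """Flag a node that keeps accepting tasks although its queue is
    almost full and it completes only a small share of them.

    min_fill and min_rate are assumed thresholds.
    """
    fill = queue_len / queue_capacity
    completion = processed / max(accepted, 1)  # avoid division by zero
    return fill >= min_fill and completion < min_rate
```

A node with 19 of 20 queue slots filled that completed 5 of 30 accepted tasks would be flagged; a node that completed 28 of 30 with 2 tasks queued would not.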


Page 28: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Observing and avoiding misbehavior

A successful self-healing architecture that can handle the

misbehavior situations must

– avoid unpredictable system behavior leading to faults

– identify and handle degraded states. Degraded states here relate to poor progress in activity processing because of increasing factory/sink behavior

Feasible adaptation actions must not include direct

punishment of the misbehaving participating experts. Instead

a transparent temporary decoupling from the system is

considered.

Also, the architecture must be aware of the side-effects of the

healing actions.

– a feedback loop informs about the success of the adaptation


Page 29: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

The VieCure Framework


A monitoring and adaptation layer connects the MS on top to the framework.

From the interaction logs

events are derived and

diagnosed.

The Behavior Registry

provides the metrics to

identify the misbehavior

patterns

During recovery the

interaction channels are

adjusted

Page 30: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Self-healing steps on misbehavior

System is in perfect health

An overload in node b is detected

Assuming a causes the most

overload traffic, the recovery action

regulates channel (i) between a and b

However, b remains overloaded. An

additional unknown cause is

assumed

An alternative for b is found and

channels to d are opened

Channels (ii) and (iii) are now

available
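The steps above can be mimicked with a toy model in which each channel carries a task rate. The function heal, the throttle factor, and the choice to divert half of the remaining traffic to the alternative node are assumptions for illustration; they do not reproduce VieCure's actual recovery actions.

```python
from typing import Dict, List, Tuple

Channel = Tuple[str, str]  # (sender, receiver)


def inflow(rates: Dict[Channel, float], node: str) -> float:
    """Total task rate arriving at a node."""
    return sum(r for (_, dst), r in rates.items() if dst == node)


def heal(rates: Dict[Channel, float], node: str, capacity: float,
         alternative: str, throttle: float = 0.5) -> List[str]:
    """Regulate the heaviest channel into an overloaded node; if that is
    not enough, open channels to an alternative node. Mutates rates and
    returns a log of the steps taken."""
    log: List[str] = []
    if inflow(rates, node) <= capacity:
        return log  # healthy, nothing to do
    log.append(f"overload detected at {node}")

    # Assume the heaviest sender causes most of the overload traffic.
    heaviest = max((c for c in rates if c[1] == node), key=lambda c: rates[c])
    rates[heaviest] *= throttle
    log.append(f"regulated channel {heaviest[0]}->{node}")

    # Still overloaded: an additional, unknown cause is assumed.
    if inflow(rates, node) > capacity:
        for src, dst in list(rates):
            if dst == node:
                moved = rates[(src, dst)] / 2
                rates[(src, dst)] -= moved
                rates[(src, alternative)] = rates.get((src, alternative), 0.0) + moved
        log.append(f"opened channels to {alternative}")
    return log
```

After such a recovery, the next monitoring cycle would check whether the node is still overloaded, which corresponds to the feedback loop informing about the success of the adaptation.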


Page 31: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Learning Package Overview

Problem Description

Self-healing research

Example: Self-healing policies for Mixed Service-oriented

Systems

Conclusions


Page 32: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Summary

Self-healing research principles

– A self-healing system should recover from the abnormal (or

“unhealthy”) state and return to the normative (“healthy”) state, and

function as it was prior to disruption.

– The 3 common states are Normal, Broken, and Degraded. The challenge is to identify Degraded in time and to recover soundly.

– In order to recover, a self-healing loop is required that detects, diagnoses, and recovers the system.

Self-healing in MS

– The “openness” of the system and the generally unpredictable human behavior are sources of system degradation.

– The two presented misbehavior models are delegation factory and

sink. Either a node delegates without respecting the capacity of the

neighbors or a node overestimates its capacity.

– The VieCure Framework considers and resolves both cases.


Page 33: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Further S-Cube Reading


Psaier H., Juszczyk L., Skopik F., Schall D., Dustdar S. (2010). Runtime Behavior Monitoring and Self-Adaptation in Service-Oriented Systems. 4th IEEE International Conference on Self-Adaptive and Self-Organizing Systems (SASO), September 27 - October 01, 2010, Budapest, Hungary. IEEE.

Psaier H., Skopik F., Schall D., Juszczyk L., Treiber M., Dustdar S. (2010). A Programming Model for Self-Adaptive Open Enterprise Systems. 5th Workshop of the 11th International Middleware Conference (MW4SOC), November 29 - December 3, 2010, Bangalore, India. ACM.

Psaier H., Skopik F., Schall D., Dustdar S. (2011). Resource and Agreement Management in Dynamic Crowdcomputing Environments. 15th IEEE International EDOC Conference (EDOC), August 29 - September 2, 2011, Helsinki, Finland. IEEE.

Dustdar, S.; Schall, D.; Skopik, F.; Juszczyk, L.; Psaier, H. (Eds.) (2011). Socially Enhanced Services Computing -- Modern Models and Algorithms for Distributed Systems. (1) p. 37. Springer

Page 34: S-CUBE LP: Self-healing in Mixed Service-oriented Systems

Acknowledgements

The research leading to these results has

received funding from the European

Community’s Seventh Framework

Programme [FP7/2007-2013] under grant

agreement 215483 (S-Cube).
