1 Probability for Computer Science Kishor S. Trivedi Visiting Prof. Of Computer Science and Engineering, IITK Prof. Department of Electrical and Computer Engineering Duke University Durham, NC 27708-0291 Phone: 7576 e-mail: [email protected] URL: www.ee.duke.edu/~kst IIT Kanpur

Dec 29, 2015


Maud Griffin

Probability for Computer Science

Kishor S. Trivedi
Visiting Prof. of Computer Science and Engineering, IITK
Prof., Department of Electrical and Computer Engineering
Duke University
Durham, NC 27708-0291
Phone: 7576
e-mail: [email protected]
URL: www.ee.duke.edu/~kst

IIT Kanpur


Outline

Introduction
  Reliability, Availability, Security, Performance, Performability
  Methods of Evaluation
  Evaluation vs. Optimization
  Model construction, parameterization, solution, validation, interpretation
Preliminaries: Sample Space, Probability Axioms, Independence, Conditioning, Binomial Trials
Random Variables: Binomial, Poisson, Exponential, Weibull, Erlang, Hyperexponential, Hypoexponential, Pareto, Defective
Reliability, Hazard Rate
Average Case Analysis of Program Performance
Reliability Analysis Using Block Diagrams and Fault Trees
Reliability of Standby Systems
Statistical Inference Including Confidence Intervals
Hypothesis Testing
Regression


Schedule & Textbooks

Schedule: Jan 21, 23, 28 and Feb 6, 18, 25, 27

Probability & Statistics with Reliability, Queuing, and Computer Science Applications, K. S. Trivedi, second edition, John Wiley & Sons, 2001.

Performance and Reliability Analysis of Computer Systems: An Example-Based Approach Using the SHARPE Software Package, Sahner, Trivedi, Puliafito, Kluwer Academic Publishers, 1996.


Program Performance Evaluation

Worst-case vs. average-case analysis
Data-structure-oriented vs. control-structure-oriented
Sequential vs. concurrent
Centralized vs. distributed
Structured vs. with unrestricted transfer of control
Unlimited (hardware) resources vs. limited resources
Software architecture: modules, their characteristics (execution time) and interactions (branching, looping)
Measures: completion time (mean, variance & dist.)
Measurements or models (simulation vs. analytic)
  analytic models: combinatorial, DTMC, SMP, CTMC, SPN


System Performance Evaluation

Workload: traffic arrivals, service time distributions, pattern of resource requests
Hardware architecture and software architecture
Resource Contention, Scheduling & Allocation
Concurrency, Synchronization, distributed processing
Timeliness (Have to Meet Deadlines)
Measures: Throughput, Goodput, loss probability, response time or delay (mean, variance & dist.)
Low-level (Cache, memory interference: ch. 7)
System-level (CPU-I/O, multiprocessing: ch. 8,9)
Network-level (protocols, handoff in wireless: ch. 7,8)
Measurements or models (simulation or analytic)
  analytic models: DTMC, CTMC, PFQN, SPN


System Performance Evaluation

Workload:
  Single vs. multiple types of requests (classes, chains); in the latter case, the following three items are needed for each type of request
  traffic arrivals: one time vs. a stream
    stream: Poisson (Bernoulli), general renewal, IPP (IBP), MMPP (MMBP), MAP, BMAP, NHPP, self-similar
  service time distributions: exponential (geometric), deterministic, uniform, Erlang, hyperexponential, hypoexponential, phase-type, general (with finite mean and variance), Pareto
  pattern of resource requests: service time distribution (or the mean) at each resource per visit, branching probabilities; often described as a DTMC (discrete-time Markov chain) and can also be seen as the behavior of an individual program

All this information should be collected from actual measurements (if possible) followed by statistical inference


Software Reliability

Black-box (measurements + statistical inference) vs. architecture-based approach (models)

Black-box approaches treat software as a monolithic whole, considering only its interactions with external environment, without an attempt to model its internal structure

With growing emphasis on reuse, software development process moves toward component-based software design

White-box approach may be better to analyze a system with many software components and how they fit together


Software Architecture

Software behavior with respect to the manner in which different components interact

May include the information about the execution time of each component

Use control flow graph to represent architecture

Sequential program architecture modeled by
  Discrete Time Markov Chain (DTMC)
  Continuous Time Markov Chain (CTMC)
  Semi-Markov process (SMP)
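
As a minimal sketch of the DTMC view of a sequential program, the Python below computes the expected number of visits to each module and the resulting mean completion time. The two module names, the branching probabilities and the per-visit execution times are illustrative assumptions, not values taken from these slides.

```python
# Hypothetical two-module program (all numbers are illustrative assumptions).
# After "parse", control always moves to "compute"; after "compute", the
# program loops back to "parse" with probability 0.5 and otherwise exits.
branch = {
    "parse": {"compute": 1.0},
    "compute": {"parse": 0.5},  # the remaining 0.5 is the exit probability
}
exec_time = {"parse": 2.0, "compute": 3.0}  # assumed mean time per visit
start = "parse"


def expected_visits(branch, start, iterations=200):
    """Solve V_j = [j == start] + sum_i V_i * p_ij by fixed-point iteration."""
    visits = {m: 0.0 for m in branch}
    for _ in range(iterations):
        new = {m: (1.0 if m == start else 0.0) for m in branch}
        for src, targets in branch.items():
            for dst, p in targets.items():
                new[dst] += visits[src] * p
        visits = new
    return visits


visits = expected_visits(branch, start)
# Mean completion time = sum over modules of (expected visits * mean time per visit).
mean_completion_time = sum(visits[m] * exec_time[m] for m in branch)
print(visits, mean_completion_time)
```

Fixed-point iteration is used here only to keep the sketch dependency-free; solving the same linear system directly would work equally well.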


Failure Behavior of Components and Interfaces

Failure can happen during the execution of any component or during the transfer of control between components

Failure behavior can be specified in terms of

reliability

constant failure rate

time-dependent failure intensity


System Reliability/Availability

Faultload: fault types, fault arrivals, repair/recovery procedures and delay time distributions

Hardware architecture and software architecture
Minimum Resource Requirements
Dynamic failures
Performance/Reliability interdependence
Measures: Reliability, Availability, MTTF, Downtime
Low-level (Physics of failures, chip level)
System-level (CPU-I/O, multiprocessing: ch. 8,9)
Software and Hardware combined together
Network-level
Measurements or models (simulation or analytic)
  analytic models: RBD, FTREE, CTMC, SPN


Definition of Reliability

Recommendation E.800 of the International Telecommunication Union (ITU-T) defines reliability as follows:

“The ability of an item to perform a required function under given conditions for a given time interval.”

In this definition, an item may be a circuit board, a component on a circuit board, a module consisting of several circuit boards, a base transceiver station with several modules, a fiber-optic transport-system, or a mobile switching center (MSC) and all its subtending network elements. The definition includes systems with software.


Definition of Availability

Availability is closely related to reliability, and is also defined in ITU-T Recommendation E.800 as follows:[1]

"The ability of an item to be in a state to perform a required function at a given instant of time or at any instant of time within a given time interval, assuming that the external resources, if required, are provided."

An important difference between reliability and availability is that reliability refers to failure-free operation during an interval, while availability refers to failure-free operation at a given instant of time, usually the time when a device or system is first accessed to provide a required function or service


High Reliability/Availability/Safety

Traditional applications (long-life/life-critical/safety-critical)

Space missions, aircraft control, defense, nuclear systems

New applications (non-life-critical/non-safety-critical, business critical)

Banking, airline reservation, e-commerce applications, web-hosting, telecommunication

Scientific applications (non-critical)


Motivation: High Availability

Scott McNealy, Sun Microsystems Inc.

"We're paying people for uptime. The only thing that really matters is uptime, uptime, uptime, uptime and uptime. I want to get it down to a handful of times you might want to bring a Sun computer down in a year. I'm spending all my time with employees to get this design goal"

SUN Microsystems – SunUP & RASCAL program for high-availability
Motorola - 5NINES Initiative
HP, Cisco, Oracle, SAP - 5nines:5minutes Alliance
IBM – Cornhusker clustering technology for high-availability, eLiza, autonomic computing
Microsoft – Trustable computing initiative
John Hennessey – in IEEE Computer
Microsoft – Regular full page ad on 99.999% availability in USA Today


Motivation – High Availability


Need for a new term

Reliability is used in a generic sense

Reliability used as a precisely defined mathematical function

To remove confusion, IFIP WG 10.4 has proposed Dependability as an umbrella term


Dependability – Umbrella term

Trustworthiness of a computer system such that reliance can justifiably be placed on the service it delivers

DEPENDABILITY
  ATTRIBUTES: AVAILABILITY, RELIABILITY, SAFETY, CONFIDENTIALITY, INTEGRITY, MAINTAINABILITY
  MEANS: FAULT PREVENTION, FAULT REMOVAL, FAULT TOLERANCE, FAULT FORECASTING
  THREATS: FAULTS, ERRORS, FAILURES

SECURITY


IFIP WG10.4

Failure occurs when the delivered service no longer complies with the specification

Error is that part of the system state which is liable to lead to subsequent failure

Fault is adjudged or hypothesized cause of an error

Faults are the cause of errors that may lead to failures

Fault → Error → Failure


Dependability: Reliability, Availability, Safety, Security

Redundancy: Hardware (Static, Dynamic), Information, Time, Software

Fault Types: Permanent (needs repair or replacement), Intermittent (reboot/restart or replacement), Transient (retry), Design: Bohrbugs, Heisenbugs, Aging-related bugs

Fault Detection, Automated Reconfiguration

Imperfect Coverage

Maintenance: scheduled, unscheduled


Software Fault Classification

Bohrbugs: Many software bugs are reproducible, easily found and fixed during the testing and debugging phase

Heisenbugs: Other bugs that are hard to find and fix remain in the software during the operational phase
  These bugs may never be fixed, but if the operation is retried or the system is rebooted, the bugs may not manifest themselves as failures
  Manifestation is non-deterministic and dependent on the software reaching very rare states


Software Fault Classification

[Figure: software faults classified as Bohrbugs, Heisenbugs and "aging"-related bugs across the design/development and operational phases; remedies shown: test/debug, design/data diversity, retry operation, restart application, reboot node]


Failure Classification (Cristian)

Failures
  Omission failures (send/receive failures)
    Crash failures
    Infinite loop
  Timing failures
    Early
    Late (performance or dynamic failures)
  Response failures
    Value failures
    State-transition failures


Security

• Security intrusions cause a system to fail
• Security Failure
  • Integrity: Destruction/unauthorized modification of information
  • Confidentiality: Theft of information
  • Availability: e.g., Denial of Service (DoS)
• Similarity (as well as differences) between:
  • Malicious vs. accidental faults
  • Security vs. reliability/availability
  • Intrusion tolerance vs. fault tolerance


The Need of Performability Modeling

New technologies, services & standards need new modeling methodologies

Pure performance modeling: too optimistic!
  Outage-and-recovery behavior not considered

Pure dependability modeling: too conservative!
  Different levels of performance not considered


"ilities" besides performance

Performability: measures of the system's ability to perform designated functions
  Reliability: for a specified operational time
  Availability: at any given instant
  Survivability: performance under failures

R.A.S.-ability concerns grow. High R.A.S. is not only a selling point for equipment vendors and service providers: the regulatory outage report required by the FCC for public switched telephone networks (PSTN) may soon apply to wireless.


Evaluation vs. Optimization

Evaluation of system for desired measures given a set of parameters
  Sensitivity analysis
  Bottleneck analysis
  Reliability importance

Optimization
  Static: linear, integer, geometric, nonlinear, multi-objective; constrained or unconstrained
  Dynamic: dynamic programming, Markov decision process, semi-Markov decision process


PURPOSE OF EVALUATION

Understanding a system
  Observation
    Operational environment
    Controlled environment
  Reasoning
    A model is a convenient abstraction

Predicting behavior of a system
  Need a model
  Accuracy based on degree of extrapolation


PURPOSE OF EVALUATION (Continued)

These famous quotes bring out the difficulty of prediction based on models:

“All Models are Wrong; Some Models are Useful”

George Box

“Prediction is fine as long as it is not about the future”

Mark Twain


Basic Definitions

Reliability R(t):

X: time to failure of a system

F(t): distribution function of system lifetime

f(t): density function of system lifetime

$$R(t) = P(X > t) = 1 - F(t)$$

Mean Time To system Failure:

$$MTTF = E[X] = \int_0^\infty t\, f(t)\, dt = \int_0^\infty R(t)\, dt$$
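
The identity MTTF = E[X] = ∫₀^∞ R(t) dt can be checked numerically. The sketch below assumes an exponentially distributed lifetime with a hypothetical failure rate `lam` (so R(t) = exp(-lam·t) and the exact MTTF is 1/lam) and integrates R(t) with the trapezoidal rule up to a finite cutoff; the rate, cutoff and step count are illustrative choices.

```python
import math


def reliability_exponential(t, lam):
    """R(t) = P(X > t) for an exponentially distributed lifetime with rate lam."""
    return math.exp(-lam * t)


def mttf_numeric(reliability, upper, steps=100_000):
    """Approximate the integral of R(t) over [0, upper] with the trapezoidal rule."""
    h = upper / steps
    total = 0.5 * (reliability(0.0) + reliability(upper))
    for i in range(1, steps):
        total += reliability(i * h)
    return total * h


lam = 0.001  # assumed failure rate (per hour)
mttf = mttf_numeric(lambda t: reliability_exponential(t, lam), upper=20_000.0)
print(mttf)  # should be very close to 1 / lam = 1000 hours
```

The cutoff of 20,000 hours is chosen so that the neglected tail of R(t) is negligible for this rate; a heavier-tailed lifetime distribution would need a larger cutoff.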


Availability (Continued)

Instantaneous (point) Availability A(t):

A(t) = P(system working at t)

g(t): density function of system repair time

Let H(t) be the convolution of F and G:

$$H(t) = \int_0^t F(t - x)\, g(x)\, dx$$

Then:

$$A(t) = R(t) + \int_0^t A(t - x)\, dH(x)$$

Inst. Availability ≥ Reliability:

$$A(t) \ge R(t)$$
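
For the special case of exponentially distributed times to failure (rate λ) and to repair (rate μ), point availability has the well-known closed form A(t) = μ/(λ+μ) + λ/(λ+μ)·exp(-(λ+μ)t). The sketch below uses that special case, with assumed rates, to illustrate that A(t) is never below R(t) and that A(t) settles at μ/(λ+μ) for large t. It does not solve the general renewal equation.

```python
import math


def reliability(t, lam):
    """R(t) for an exponential time to failure with rate lam."""
    return math.exp(-lam * t)


def availability(t, lam, mu):
    """Point availability of a two-state (up/down) system with exponential
    failure rate lam and exponential repair rate mu."""
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)


lam, mu = 0.001, 0.1  # assumed rates per hour
for t in (0.0, 10.0, 100.0, 1000.0):
    print(t, reliability(t, lam), availability(t, lam, mu))
```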


Availability

System working at time t:

Never failed in (0,t), prob: R(t)

First failed and got repaired at time x<t & UP at end of interval (x,t), prob:

$$\int_0^t A(t - x)\, dH(x)$$

[Figure: time line from 0 to t; first repair completed in (x, x + dx)]


Availability (Continued)

MTTR: Mean Time to Repair

Y: repair period of the system

$$MTTR = E[Y] = \int_0^\infty t\, g(t)\, dt$$

Availability and Reliability are related but different!


Availability (Continued)

We can show from equation (1) that:

$$A_{SS} = \frac{MTTF}{MTTF + MTTR}$$

Also:

$$\text{downtime (in minutes per year)} = (1 - A_{SS}) \times 8760 \times 60$$
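
As a small worked example of the two formulas on this slide, steady-state availability MTTF/(MTTF + MTTR) and downtime (1 − A_SS) × 8760 × 60 minutes per year, the sketch below evaluates them for assumed MTTF and MTTR values and for a "five nines" availability target. The function names and input numbers are illustrative.

```python
MINUTES_PER_YEAR = 8760 * 60  # 8760 hours in a (non-leap) year


def steady_state_availability(mttf, mttr):
    """A_SS = MTTF / (MTTF + MTTR); both arguments in the same time unit."""
    return mttf / (mttf + mttr)


def downtime_minutes_per_year(a_ss):
    """Expected downtime in minutes per year for steady-state availability a_ss."""
    return (1.0 - a_ss) * MINUTES_PER_YEAR


a_ss = steady_state_availability(mttf=1000.0, mttr=1.0)  # assumed hours
print(a_ss, downtime_minutes_per_year(a_ss))
print(downtime_minutes_per_year(0.99999))  # "five nines": roughly 5.3 minutes per year
```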


Availability (Continued)

Steady-State Availability:

$$A_{SS} = \lim_{t \to \infty} A(t)$$

There are two kinds of Availabilities! Instantaneous & Steady-state

For a system with high degree of redundancy

$$A_{SS} = \frac{MTTF_{eq}}{MTTF_{eq} + MTTR_{eq}}$$

where MTTF_eq & MTTR_eq must be carefully defined; they can be computed using SHARPE


MEASURES TO BE EVALUATED

Dependability: "Does it work, and for how long?"
  Reliability: R(t), System MTTF
  Availability: Steady-state, Transient
  Downtime

Performance: "Given that it works, how well does it work?"
  Throughput, Blocking Probability, Response Time


MEASURES TO BE EVALUATED (Continued)

Composite Performance and Dependability
  "How much work will be done (lost) in a given interval including the effects of failure/repair/contention?"

Need Techniques and Tools That Can Evaluate Performance, Dependability and Their Combinations


Methods of EVALUATION

Measurement-Based
  Most believable, most expensive
  Not always possible or cost effective during system design
  Statistical techniques are very important here

Model-Based


Methods of EVALUATION(Continued)

Model-Based

Less believable, Less expensive

1. Discrete-Event Simulation vs. Analytic

2. State-Space Methods vs. Non-State-Space Methods

3. Hybrid: Simulation + Analytic (SPNP)

4. State Space + Non-State Space (SHARPE)


Methods of EVALUATION(Continued)

Measurements + Models

Vaidyanathan et al ISSRE 99


QUANTITATIVE EVALUATION TAXONOMY

Closed-form solution

Numerical solution using a tool


Note that

Both measurements & simulations imply statistical analysis of outputs (ch. 10,11)
  Statistical inference
  Hypothesis testing
  Design of experiments
  Analysis of variance
  Regression (linear, nonlinear)

Distribution driven simulation requires generation of random deviates (variates) (ch. 3, 4, 5)

Probability and Statistics are different yet highly related
  Probability models need inputs that generally come from measurement data (followed by statistical inference)
  Statistics in turn uses probability theory
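
One common way to generate the random deviates mentioned above is the inverse-transform method: draw U uniformly on [0, 1) and return F⁻¹(U). The sketch below applies it to the exponential distribution, where F(x) = 1 − exp(−rate·x). The rate, seed and sample size are illustrative assumptions.

```python
import math
import random


def exponential_variate(rate, rng):
    """Inverse-transform sampling: solve U = 1 - exp(-rate * x) for x."""
    u = rng.random()  # uniform on [0, 1), so 1 - u lies in (0, 1]
    return -math.log(1.0 - u) / rate


rng = random.Random(12345)  # fixed seed so the run is repeatable
rate = 2.0
samples = [exponential_variate(rate, rng) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean)  # should be near the true mean 1 / rate = 0.5
```

Python's standard library already offers `random.expovariate`; the hand-written version is shown only to make the inverse-transform step explicit.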


MODELING TAXONOMY


ANALYTIC MODELING TAXONOMY

NON-STATE SPACE MODELING TECHNIQUES

SP reliability block diagrams

Non-SP reliability block diagrams


State Space Modeling Taxonomy

State space methods

Markovian modeling

non-Markovian modeling

discrete-time Markov chains

continuous-time Markov chains

Markov reward models

Semi-Markov models

Markov regenerative models

Non-Homogeneous Markov


Modeling Steps

• Model construction
• Model parameterization
• Model solution
• Result interpretation
• Model validation


MODELING AND MEASUREMENTS: INTERFACES

Measurements supply Input Parameters to Models (Model Calibration or Parameterization)
  Confidence Intervals should be obtained
  Boeing, Draper, Union Switch projects

Model Sensitivity Analysis can suggest which Parameters to Measure More Accurately: Blake, Reibman and Trivedi: SIGMETRICS 1988.


MODELING AND MEASUREMENTS: INTERFACES

Model Validation

1. Face Validation

2. Input-Output Validation

3. Validation of Model Assumptions (Hypothesis Testing)

Rejection of a hypothesis regarding model assumption based on measurement data leads to an improved model


MODELING AND MEASUREMENTS: INTERFACES

Model Structure Based on Measurement Data

Hsueh, Iyer and Trivedi; IEEE TC, April 1988

Gokhale et al, IPDS 98;

Vaidyanathan et al, ISSRE99






State Space Modeling Taxonomy

(discrete) State space models

Markovian models

non-Markovian models

discrete-time Markov chains

continuous-time Markov chains

Markov reward models

Semi-Markov process

Markov regenerative process

Non-Homogeneous Markov


MODELING THROUGHOUT SYSTEM LIFECYCLE

System Specification/Design Phase

Answer "What-if Questions"

Compare design alternatives (Bedrock, Wireless handoff)

Performance-Dependability Trade-offs (Wireless Handoff)

Design Optimization (optimizing the number of guard channels)


MODELING THROUGHOUT SYSTEM LIFECYCLE (Continued)

Design Verification Phase

Use Measurements + Models

E.g., Fault Injection + Availability Model

Union Switch and Signals, Boeing, Draper

Configuration Selection Phase: DEC, HP

System Operational Phase: IDEN Project

Workload based adaptive rejuvenation

• It is fun!


MODELER'S DILEMMA

Should I Use Discrete-Event Simulation?

Point Estimates and Confidence Intervals

How many simulation runs are sufficient?

What Specification Language to use?

C, SIMULA, SIMSCRIPT, MODSIM, GPSS, RESQ, SPNP v6, Bones, SES workbench, ns, opnet


MODELER'S DILEMMA (Continued)

Simulation:

+ Detailed System Behavior including non-exponential behavior

+ Performance, Dependability and Performability Modeling Possible

- Long Execution Time (Variance Reduction Possible)
    Importance sampling, importance splitting, regenerative simulation
    Parallel and Distributed Simulation

- Many users in practice do not realize the need to calculate confidence intervals
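
To make the point about confidence intervals concrete, here is a minimal Monte Carlo sketch that reports a point estimate together with an approximate 95% confidence interval. The system simulated (two independent components in parallel, each with an exponential lifetime), the rate, the seed and the number of runs are all illustrative assumptions; the interval uses the normal approximation, which is reasonable for this many runs.

```python
import math
import random
import statistics


def simulate_parallel_lifetime(lam, rng):
    """Lifetime of a two-component parallel system: it fails when both have failed."""
    return max(rng.expovariate(lam), rng.expovariate(lam))


def mean_with_ci(samples, z=1.96):
    """Point estimate and approximate 95% confidence interval (normal approximation)."""
    n = len(samples)
    mean = statistics.mean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


rng = random.Random(2024)  # fixed seed so the run is repeatable
lam = 1.0
samples = [simulate_parallel_lifetime(lam, rng) for _ in range(10_000)]
mean, low, high = mean_with_ci(samples)
print(mean, low, high)  # the analytic MTTF of this system is 1.5 / lam
```

Reporting only `mean` would hide how much the estimate could move from run to run; the interval width shrinks in proportion to one over the square root of the number of runs.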


MODELER'S DILEMMA (Continued)

Should I Use Non-State-Space Methods?

Also Known as Combinatorial Models
Model Solved Without Generating State Space
Use: Order Statistics, Mixing, Convolution (chapters 1-5)
Common Dependability Model Types:
  Series-Parallel Reliability Block Diagrams
  Non-Series-Parallel Block Diagrams (or Reliability Graphs)
  Fault-Trees Without Repeated Events
  Fault-Trees With Repeated Events


Combinatorial analytic models

Reliability block diagrams, Fault trees and Reliability graphs

Commonly used for reliability and availability

These model types are similar in that they capture conditions that make a system fail in terms of the structural relationships between the system components.


RBD example


Combinatorial Models

Combinatorial modeling techniques like RBDs and FTs are easy to use and, assuming statistical independence, solve for system availability and system MTTF

Each component can have attached to it
  A probability of failure
  A failure rate
  A distribution of time to failure
  Steady-state and instantaneous unavailability
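
Under the independence assumption, a series-parallel RBD reduces to two composition rules: multiply reliabilities for blocks in series, and multiply unreliabilities for blocks in parallel. The sketch below applies those rules to a hypothetical system; the component names and reliability values are assumptions for illustration.

```python
import math


def series(*reliabilities):
    """All blocks must work: multiply the reliabilities."""
    return math.prod(reliabilities)


def parallel(*reliabilities):
    """At least one block must work: one minus the product of unreliabilities."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


# Hypothetical system: two redundant CPUs in parallel, in series with one disk.
r_cpu, r_disk = 0.95, 0.99
r_system = series(parallel(r_cpu, r_cpu), r_disk)
print(r_system)
```

The same two functions work with steady-state availabilities in place of reliabilities, as long as components fail and are repaired independently.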


Non-State Space Modeling Techniques

Possible to compute (given component failure/repair rates):
  System Reliability
  System Availability (steady-state, instantaneous)
  Downtime
  System MTTF


Non-State Space Modeling Techniques (Continued)

Assuming:

Failures are statistically independent

As many repair units as needed

Relatively good algorithms are available for solving such models so that 100-component systems can be handled.


Non-State Space Modeling Techniques (Continued)

Common Model Types: Performance
  Series-Parallel Task Precedence Graphs
  Product-Form Queuing Networks

+ Easy specification, fast computation, no distributional assumption

+ Can easily solve models with 100's of components


Combinatorial Modeling (Continued)

These models can be solved using fast algorithms assuming stochastic independence between system components. Systems with several hundred components can be handled.
  Sum of disjoint products (SDP) algorithms
  Binary decision diagrams (BDD) algorithms
  Factoring (conditioning) algorithms
  Series-parallel composition algorithm

- Failure/Repair Dependencies are often present; RBDs, FTREEs cannot easily handle these (e.g., shared repair, warm/cold spares, imperfect coverage, non-zero switching time, travel time of repair person, reliability with repair)
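
As an illustration of the factoring (conditioning) idea, the sketch below computes the reliability of the classic five-component bridge network, which is not series-parallel. Conditioning on the bridging component turns each case into a series-parallel structure. The component numbering, the network layout and the reliability value 0.9 are assumptions made for this example, and component failures are taken to be independent.

```python
def bridge_reliability(p1, p2, p3, p4, p5):
    """Factoring on the bridging component 5 of a five-component bridge.

    Assumed layout: 1 (s-a) and 2 (s-b) leave the source s, 3 (a-t) and
    4 (b-t) enter the sink t, and 5 joins the two middle nodes a and b.
    """
    # Component 5 works: a and b merge, leaving (1 || 2) in series with (3 || 4).
    works = (1 - (1 - p1) * (1 - p2)) * (1 - (1 - p3) * (1 - p4))
    # Component 5 failed: two independent paths 1-3 and 2-4 in parallel.
    failed = 1 - (1 - p1 * p3) * (1 - p2 * p4)
    # Theorem of total probability over the state of component 5.
    return p5 * works + (1 - p5) * failed


r_bridge = bridge_reliability(0.9, 0.9, 0.9, 0.9, 0.9)
print(r_bridge)
```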


Markov chain

To model more complicated interactions between components, use other kinds of models like Markov chains or more generally state space models.

Many examples of dependencies among system components have been observed in practice and captured by Markov models.


State-Space-Based Models

States and labeled state transitions

State can keep track of:

    Number of functioning resources of each type
    States of recovery for each failed resource
    Number of tasks of each type waiting at each resource
    Allocation of resources to tasks

A transition:

    Can occur from any state to any other state
    Can represent a simple or a compound event


State-Space-Based Models (Continued)

Transitions between states represent the change of the system state due to the occurrence of an event

Drawn as a directed graph

Transition label:

    Probability: homogeneous discrete-time Markov chain (DTMC)
    Rate: homogeneous continuous-time Markov chain (CTMC)
    Time-dependent rate: non-homogeneous CTMC
    Distribution function: semi-Markov process (SMP)
    Two distribution functions: Markov regenerative process (MRGP)
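For the rate-labeled case (homogeneous CTMC), a labeled directed graph can be turned into a generator matrix, and from there into the jump probabilities of an embedded DTMC. The sketch below is one possible representation. The dictionary format, the function names, and the example rates are assumptions made for illustration.

```python
def build_generator(n_states, rates):
    """Build the CTMC generator matrix Q from rate-labeled transitions.

    rates maps (source, target) -> rate. Each diagonal entry is set so that
    its row sums to zero.
    """
    q = [[0.0] * n_states for _ in range(n_states)]
    for (src, dst), rate in rates.items():
        if src == dst or rate < 0:
            raise ValueError("need src != dst and a non-negative rate")
        q[src][dst] += rate
    for i in range(n_states):
        q[i][i] = -sum(q[i])          # diagonal is still 0.0 at this point
    return q


def embedded_dtmc(q):
    """Jump probabilities p_ij = q_ij / (-q_ii); absorbing states loop to themselves."""
    n = len(q)
    p = []
    for i in range(n):
        out = -q[i][i]
        if out == 0.0:
            p.append([1.0 if j == i else 0.0 for j in range(n)])
        else:
            p.append([0.0 if j == i else q[i][j] / out for j in range(n)])
    return p


# Hypothetical 3-state chain: state index = number of failed components.
lam, mu = 1.0, 2.0
rates = {(0, 1): 2 * lam, (1, 0): mu, (1, 2): lam, (2, 1): mu}
Q = build_generator(3, rates)
P = embedded_dtmc(Q)
```

Here Q carries rate labels and P carries probability labels for the same directed graph.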


MODELER'S DILEMMA (Continued)

Should I Use Markov Models?

State-Space-Based Methods

+ Model Fault-Tolerance and Recovery/Repair

+ Model Dependencies

+ Model Contention for Resources

+ Model Concurrency and Timeliness

+ Generalize to Markov Reward Models for Modeling Degradable Performance


MODELER'S DILEMMA (Continued)

Should I Use Markov Models?

+ Generalize to Markov Regenerative Models for Allowing Generally Distributed Event Times

+ Generalize to Non-Homogeneous Markov Chains for Allowing Weibull Failure Distributions

+ Performance, Availability and Performability Modeling Possible

- Large (Exponential) State Space


IN ORDER TO FULFILL OUR GOALS

Modeling Performance, Availability and Performability

Modeling Complex Systems

We Need

Automatic Generation and Solution of Large Markov Reward Models


IN ORDER TO FULFILL OUR GOALS (Continued)

Facility for State Truncation, Hierarchical Composition of Non-State-Space and State-Space Models, Fixed-Point Iteration

There are Two Tools that Potentially Meet these Goals:

    Stochastic Petri Net Package (SPNP)
    Symbolic Hierarchical Automated Reliability and Performance Evaluator (SHARPE)


Model-based Performance/Dependability Evaluation

Choice of the model type is dictated by:

    Measures of interest
    Level of detailed system behavior to be represented
    Ease of model specification and solution
    Representation power of the model type
    Access to suitable tools or toolkits


Difficulty in Modeling using Markov chains

Markov chains tend to be large and complex, leading to:

Model generation problem

    Use automated means of generating the Markov chains: Stochastic Petri Nets, Stochastic Reward Nets


Difficulty in Modeling using Markov chains (Continued)

Model solution problem

    Use sparse storage for the matrices
    Use sparsity-preserving solution methods: Successive Overrelaxation, Gauss-Seidel, Uniformization, ODE-solution methods
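As a rough sketch of how sparse storage and a sparsity-preserving method fit together, the code below stores only the nonzero transition rates and runs Gauss-Seidel sweeps on the steady-state balance equations of a CTMC. It assumes an irreducible chain (every state has at least one outgoing transition). The function name, the data layout, and the example rates are illustrative and are not the implementation used by any particular tool.

```python
def steady_state_gauss_seidel(n_states, rates, tol=1e-12, max_iter=10_000):
    """Steady-state probabilities of an irreducible CTMC by Gauss-Seidel.

    rates maps (source, target) -> rate; only nonzero entries are stored.
    """
    incoming = [[] for _ in range(n_states)]   # sparse column lists
    out_rate = [0.0] * n_states                # total exit rate per state
    for (src, dst), rate in rates.items():
        incoming[dst].append((src, rate))
        out_rate[src] += rate

    pi = [1.0 / n_states] * n_states
    for _ in range(max_iter):
        old = pi[:]
        for i in range(n_states):
            # Balance for state i: flow in = flow out. Entries updated
            # earlier in this sweep are reused immediately.
            pi[i] = sum(pi[j] * r for j, r in incoming[i]) / out_rate[i]
        total = sum(pi)
        pi = [p / total for p in pi]           # renormalize after each sweep
        if max(abs(a - b) for a, b in zip(pi, old)) < tol:
            return pi
    raise RuntimeError("Gauss-Seidel did not converge")


# Hypothetical 3-state chain: state index = number of failed components.
lam, mu = 1.0, 2.0
rates = {(0, 1): 2 * lam, (1, 0): mu, (1, 2): lam, (2, 1): mu}
pi = steady_state_gauss_seidel(3, rates)
```

Memory use grows with the number of transitions rather than with the square of the number of states, and no fill-in occurs because the matrix is never factored.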


Markov Reward Models (MRMs)

Modeling any system with a pure reliability/availability model can lead to incomplete or, at least, less precise results.

Gracefully degrading systems may be able to survive the failure of one or more of their active components and continue to provide service at a reduced level.

The Markov reward model is a commonly used technique for modeling gracefully degradable systems.
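A small sketch of the idea: attach a reward rate to each state of a Markov chain and take the probability-weighted sum. The state probabilities below are assumed values for a hypothetical two-processor system, and the reward vectors are illustrative choices.

```python
def expected_steady_state_reward(pi, reward):
    """Expected reward rate in steady state: sum over states of pi_i * r_i."""
    if len(pi) != len(reward):
        raise ValueError("pi and reward must have the same length")
    return sum(p * r for p, r in zip(pi, reward))


# Assumed steady-state probabilities for states with 2, 1 and 0 working processors.
pi = [0.4, 0.4, 0.2]

# A 0/1 reward gives availability (system is up if at least one processor works).
availability = expected_steady_state_reward(pi, [1, 1, 0])

# Reward = number of working processors gives expected delivered capacity.
capacity = expected_steady_state_reward(pi, [2, 1, 0])

print(f"availability = {availability:.2f}, expected capacity = {capacity:.2f}")
```

The same chain yields a pure availability measure or a performability measure depending only on the reward vector. This is what lets one model describe service at a reduced level rather than just up or down.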


State-Space-Based Models

Also use the following model types:

    Markov chains and Markov reward models
    Semi-Markov and Markov regenerative processes
    Stochastic reward nets (SRNs) or generalized stochastic Petri nets (GSPNs)

SRN and GSPN models are transformed into Markov chains for analysis. They are the only model types in SHARPE that require conversion to a different model type (a Markov chain) to be solved.


Summary: Modeling Techniques

Combinatorial techniques like RBDs and FTREEs are easy to use and solve

Combinatorial models cannot easily represent intricate dependencies

State-space-based models like Markov chains can handle dependencies

    State space explosion problem
    Use automated generation methods: stochastic Petri nets
    Concurrency, contention and conditional branching are easily modeled with Petri nets


Hierarchy used

State space explosion can be handled in two ways:

    Large model tolerance must apply to specification, storage and solution of the model. If the storage and solution problems can be solved, the specification problem can be solved by using more concise (and smaller) model specifications that can be automatically transformed into Markov models.

    Large models can be avoided by using hierarchical (multilevel) model composition.


An Introduction to SHARPE software tool


Overview of SHARPE

SHARPE: Symbolic Hierarchical Automated Reliability and Performance Evaluator

Well-known modeling tool (installed at over 300 sites; companies and universities)

Combines flexibility of Markov models and efficiency of combinatorial models

Ported to most architectures and operating systems

Used for Education, Research, Engineering Practice


Overview of SHARPE (cont.)

Graphical User Interface is available

Used for analysis of performance (traffic), dependability and performability

Hierarchy facilitates largeness and stiffness avoidance

Steady-state as well as transient analysis

Written in C language

Used as an engine by several other tools


SHARPE - new features

Many more built in distributions

Ability to easily specify structured Markov chains (Loop feature)

Ability to print models and outputs


New Features

Equivalent mean time to system failure and equivalent mean time to system repair implemented for Markov chains and RBDs

BDD algorithms implemented for FTs and RGs

Steady-state computation of MRGP models

Stochastic reward net is available as a model type

Fast MTTF algorithm implemented for Markov chains

Mathematica used for some fully symbolic computations

GUI implemented


Architecture of SHARPE interface

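[Figure: model types reachable through the SHARPE interface (fault tree, reliability graph, reliability block diagram, task graph, Pfqn, Mpfqn, MRGP, Markov chain, Petri net (GSPN & SRN)), combined through hierarchical & hybrid compositions to evaluate reliability/availability, performance and performability]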


SHARPE MENU OF MODEL TYPES

Availability/Reliability:

Series-Parallel Reliability Block Diagram (block)

Fault Trees (ftree)

Reliability Graphs (relgraph)


SHARPE MENU OF MODEL TYPES

Performance (traffic modeling):

Product-Form Queuing Networks (pfqn, mpfqn)

Series-Parallel Task Graphs (graph)


SHARPE MENU OF MODEL TYPES

Both Availability and Performance:

Markov Chains (markov)

Semi-Markov Chains (semimark)

Reward Models

Generalized Stochastic Petri Nets (gspn)

Hierarchical & Hybrid Compositions of Above

Many solution algorithms for each model type; these algorithms are continually being improved


Architecture of SHARPE

[Figure: SHARPE model types (fault tree, multistate fault tree, reliability block diagram, reliability graph, phased-mission systems, Markov chain, semi-Markov chain, GSPN, stochastic reward net, MRGP, PFQN, MPFQN, task graph) used for reliability/availability, performance and performability evaluation]


State Space Explosion

State space explosion can be handled in two ways:

    Large model tolerance must apply to specification, storage and solution of the model. If the storage and solution problems can be solved, the specification problem can be solved by using more concise (and smaller) model specifications that can be automatically transformed into Markov models (GSPN and SRN models).

    Large models can be avoided by using hierarchical model composition.

Ability of SHARPE to combine results from different kinds of models

Possibility to use state-space methods for those parts of a system that require them, and use non-state-space methods for the more “well-behaved” parts of the system.
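The following sketch shows hierarchical composition in miniature. A small Markov submodel is solved for the one subsystem that has a repair dependency, and its result is then used as an ordinary block probability in a top-level series reliability block diagram. This is plain Python written for illustration, not SHARPE input syntax, and all rates and names are hypothetical.

```python
def two_state_availability(lam, mu):
    """Steady-state availability of one component with failure rate lam, repair rate mu."""
    return mu / (lam + mu)


def shared_repair_pair_availability(lam, mu):
    """Lower-level Markov submodel: two parallel components, one repair facility.

    States count working components (2, 1, 0); the pair is up unless both are down.
    """
    rho = lam / mu
    weights = [1.0, 2.0 * rho, 2.0 * rho * rho]   # unnormalized pi_2, pi_1, pi_0
    return (weights[0] + weights[1]) / sum(weights)


# Lower level: state-space submodel for the part with a repair dependency.
disk_pair = shared_repair_pair_availability(lam=0.001, mu=0.1)

# Lower level: an independent component needs only the two-state result.
power = two_state_availability(lam=0.0005, mu=0.5)

# Top level: non-state-space series composition of the two subsystem results.
system_availability = power * disk_pair
print(f"system availability = {system_availability:.6f}")
```

The top-level model never sees the submodel's state space and only receives a single number. That keeps the overall model small even when individual subsystems need Markov chains.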


Reliability models in practice

Fully symbolic CDF

Fully symbolic MTTF

Fully symbolic PQCDF


Availability models in practice

Expected interval availability


RBD example


Fault tree example


Performance models in practice


Markov chain model of a multiprocessor system


Markov reward model


GSPN model


GSPN model


Performability models in practice


Possible outputs

Availability, Unavailability and Downtime

Cost of downtime

Mean Time to System Failure, Mean Time to System Repair

Downtime breakdown into Hardware, Software & Upgrade

Breakdown of downtime by states for Markov chain models, by blocks for Reliability block diagram models.

Sensitivity Analysis, Strategy to improve the availability of the systems.
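Several of these outputs follow directly from steady-state probabilities. The sketch below converts an availability figure into expected downtime per year, a downtime cost, and a breakdown by cause. The cost rate and the per-cause probabilities are made-up illustrative numbers, not results from the slides.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525600, ignoring leap years


def downtime_minutes_per_year(availability):
    """Expected downtime per year implied by a steady-state availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def downtime_cost_per_year(availability, cost_per_minute):
    """Expected yearly cost of downtime at a fixed cost per minute."""
    return downtime_minutes_per_year(availability) * cost_per_minute


def downtime_breakdown(down_state_probs):
    """Split yearly downtime by cause, given the probability of each down state."""
    return {cause: p * MINUTES_PER_YEAR for cause, p in down_state_probs.items()}


# Hypothetical steady-state probabilities of the down states of a model.
down_states = {"hardware": 0.0006, "software": 0.0003, "upgrade": 0.0001}
availability = 1.0 - sum(down_states.values())

total_minutes = downtime_minutes_per_year(availability)
yearly_cost = downtime_cost_per_year(availability, cost_per_minute=100.0)
by_cause = downtime_breakdown(down_states)
```

For a Markov chain model the breakdown keys would be the down states of the chain. For a reliability block diagram they would be the blocks.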


SHARPE - references

Performance and Reliability Analysis of Computer Systems, Robin Sahner, Kishor Trivedi, A. Puliafito, Kluwer Academic Press, 1996 (Red book)

Reliability and Performability Modeling using SHARPE 2000, C. Hirel, R. Sahner, X. Zang, K. Trivedi, Computer Performance Evaluation: Modelling Tools and Techniques; 11th International Conference; TOOLS 2000, Schaumburg, Il., USA, March 2000.


ADVANTAGES OF THE APPROACH

Pick a Natural Model Type for a Given Application

(No Retrofitting Required)

Use a Natural Model Type for a Portion of a Model

(Encourages Hybrid and Hierarchical Composition)


ADVANTAGES OF THE APPROACH

Except for gspn and srn Models, No Internal Conversion Done

Appropriate Solution Algorithm for Each Model Type, i.e., Hierarchy for Solution as well as Specification

Pedagogic Advantages

Multi-Version Modeling

Step-Wise Refinement in Modeling