DS - II - DCMM - 1
HUMBOLDT-UNIVERSITÄT ZU BERLIN
INSTITUT FÜR INFORMATIK
Zuverlässige Systeme für Web und E-Business (Dependable Systems for Web and E-Business)
Lecture 2: DEPENDABILITY CONCEPTS, MEASURES AND MODELS
Winter Semester 2000/2001
Lecturer: Prof. Dr. Miroslaw Malek
http://www.informatik.hu-berlin.de/~rok/zs
• OBJECTIVES
– TO INTRODUCE BASIC CONCEPTS AND TERMINOLOGY IN FAULT-TOLERANT COMPUTING
– TO DEFINE MEASURES OF DEPENDABILITY
– TO DESCRIBE MODELS FOR DEPENDABILITY EVALUATION
– TO CHARACTERIZE BASIC DEPENDABILITY EVALUATION TOOLS
• CONTENTS
– BASIC DEFINITIONS
– DEPENDABILITY MEASURES
– DEPENDABILITY MODELS
– EXAMPLES OF DEPENDABILITY EVALUATION TOOLS
DS - II - DCMM - 3
ADDING A THIRD DIMENSION
[Figure: three-dimensional design space with axes COST, PERFORMANCE and DEPENDABILITY; the dependability axis covers reliability, availability, MTTF, MTTR, mission time, fault tolerance, etc.]
DS - II - DCMM - 4
FAULT INTOLERANCE
• PRIOR ELIMINATION OF CAUSES OF UNRELIABILITY
– fault avoidance
– fault removal
• NO REDUNDANCY
• MANUAL / AUTOMATIC SYSTEM MAINTENANCE
• FAULT INTOLERANCE ATTAINS RELIABLE SYSTEMS BY:
– very reliable components
– refined design techniques
– refined manufacturing techniques
– shielding
– comprehensive testing
DS - II - DCMM - 5
DEGREES OF DEFECTS
• FAILURE
– occurs when the delivered service deviates from the specified service; failures are caused by errors
• FAULT
– incorrect state of hardware or software
• ERROR
– manifestation of a fault within a program or data structure, forcing a deviation from the expected result of a computation (incorrect result)

SOURCES OF ERRORS [from Siewiorek and Swarz]
[Figure: physical defects cause permanent faults; incorrect design and unstable or marginal hardware cause intermittent faults (temporary internal); an unstable environment and operator mistakes cause transient faults (temporary external); faults manifest themselves as errors, and errors lead to service failures]
DS - II - DCMM - 6
TYPICAL FAULT MODELS FOR PARALLEL / DISTRIBUTED SYSTEMS
• CRASH
• OMISSION
• TIMING
• INCORRECT COMPUTATION / COMMUNICATION
• ARBITRARY (BYZANTINE)
DS - II - DCMM - 7
FAULT TOLERANCE: BENEFITS & DISADVANTAGE
• FAULT TOLERANCE
– ACCEPT THAT AN IMPLEMENTED SYSTEM WILL NOT BE FAULT-FREE
– FAULT TOLERANCE IS ATTAINED BY REDUNDANCY IN TIME AND/OR REDUNDANCY IN SPACE
– AUTOMATIC RECOVERY FROM ERRORS
– COMBINING REDUNDANCY AND FAULT INTOLERANCE
• BENEFITS OF FAULT TOLERANCE
– HIGHER RELIABILITY
– LOWER TOTAL COST
– PSYCHOLOGICAL SUPPORT OF USERS
• DISADVANTAGE OF FAULT TOLERANCE
– COST OF REDUNDANCY
DS - II - DCMM - 8
FAULT-TOLERANT COMPUTING: OVERVIEW
[Figure: component failures (physical defects) and incorrect specification, design or implementation lead to hardware and software faults; external disturbances (unstable environment) and operator mistakes contribute as well; hardware and software faults produce errors, which result in service failure (system malfunction)]
DS - II - DCMM - 9
FAULT CHARACTERIZATIONS
• CAUSE
– specification
– design
– implementation
– component
– external
• NATURE
– hardware
– software
– analog
– digital
• DURATION
– permanent
– temporary
– transient
– intermittent
– latent
• EXTENT
– local
– distributed
• VALUE
– determinate
– indeterminate
DS - II - DCMM - 10
BASIC TECHNIQUES/REDUNDANCY TECHNIQUES (1)
• HARDWARE REDUNDANCY
– Static (Masking) Redundancy
– Dynamic Redundancy
• SOFTWARE REDUNDANCY
– Multiple Storage of Programs and Data
– Test and Diagnostic Programs
– Reconfiguration Programs
– Program Restarts
DS - II - DCMM - 11
BASIC TECHNIQUES/REDUNDANCY TECHNIQUES (2)
• TIME (EXECUTION REDUNDANCY)
– Repeat or acknowledge operations at various levels
– Major goal: Fault Detection and Recovery
DS - II - DCMM - 12
DEPENDABILITY MEASURES
• DEPENDABILITY IS A VALUE OF QUANTITATIVE MEASURES SUCH AS RELIABILITY AND AVAILABILITY AS PERCEIVED OR DEFINED BY A USER
• DEPENDABILITY IS THE QUALITY OF THE DELIVERED SERVICE SUCH THAT RELIANCE CAN JUSTIFIABLY BE PLACED ON THIS SERVICE
• DEPENDABILITY IS THE ABILITY OF A SYSTEM TO PERFORM A REQUIRED SERVICE UNDER STATED CONDITIONS FOR A SPECIFIED PERIOD OF TIME
DS - II - DCMM - 13
RELIABILITY
• RELIABILITY R(t) OF A SYSTEM IS THE PROBABILITY THAT THE SYSTEM WILL PERFORM SATISFACTORILY FROM TIME ZERO TO TIME t, GIVEN THAT OPERATION COMMENCES SUCCESSFULLY AT TIME ZERO (THE PROBABILITY THAT THE SYSTEM WILL CONFORM TO ITS SPECIFICATION THROUGHOUT A PERIOD OF DURATION t)
• HARDWARE: Exponential distribution R(t) = e^-λt
Weibull distribution R(t) = e^-(λt)^α, α - shape parameter, λ - failure rate
• SOFTWARE: Exponential, Weibull, normal, gamma or Bayesian
R(t) = e^-λt
• A constant failure rate λ is assumed during the life of a system
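A minimal Python sketch of the exponential and Weibull reliability functions (all parameter values below are illustrative, not from the slides):

```python
import math

def r_exponential(t, lam):
    """Exponential reliability: R(t) = exp(-lam * t), constant failure rate."""
    return math.exp(-lam * t)

def r_weibull(t, lam, alpha):
    """Weibull reliability: R(t) = exp(-(lam * t)**alpha), alpha = shape parameter."""
    return math.exp(-((lam * t) ** alpha))

# Illustrative values: lam = 1e-4 failures/hour, t = 1000 hours
print(r_exponential(1000.0, 1e-4))     # ~0.9048
print(r_weibull(1000.0, 1e-4, 0.8))    # alpha < 1: early-life-dominated behavior
```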
DS - II - DCMM - 14
MORTALITY CURVE: FAILURE RATE VS. AGE
[Figure: "bathtub" curve of failure rate λ(t) vs. age: a decreasing early-life rate λe(t) up to TE, a constant useful-life rate λu between TE and TW, and an increasing wear-out rate λw(t) beyond TW]
DS - II - DCMM - 15
EXPANDED RELIABILITY FUNCTION

FOR EARLY LIFE: λ(t) = λe(t) + λu + λw(t), BUT λw(t) IS NEGLIGIBLE DURING EARLY LIFE.
FOR USEFUL LIFE: λ(t) = λe(t) + λu + λw(t), BUT BOTH λe(t) AND λw(t) ARE NEGLIGIBLE FOR USEFUL LIFE.
FOR WEAR OUT: λ(t) = λe(t) + λu + λw(t), WITH λe(t) NEGLIGIBLE.

THUS, THE GENERAL RELIABILITY FUNCTION BECOMES
R(t) = Re(t) Ru(t) Rw(t)
OR
R(t) = exp [ -∫₀ᵗ λ(τ) dτ ] = exp { -∫₀ᵗ [ λe(τ) + λu + λw(τ) ] dτ }
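A minimal Python sketch of this general form, numerically integrating an assumed bathtub-shaped failure rate (all rate values below are illustrative, not from the slides):

```python
import math

def reliability(t, lam, n=10_000):
    """R(t) = exp(-integral_0^t lam(tau) dtau), trapezoidal integration."""
    h = t / n
    area = sum((lam(i * h) + lam((i + 1) * h)) / 2.0 * h for i in range(n))
    return math.exp(-area)

# Assumed bathtub components (rates per hour, purely illustrative):
lam_e = lambda tau: 1e-3 * math.exp(-tau / 100.0)   # decaying early-life rate
lam_u = 1e-4                                        # constant useful-life rate
lam_w = lambda tau: 1e-8 * tau                      # growing wear-out rate
lam   = lambda tau: lam_e(tau) + lam_u + lam_w(tau)

print(reliability(1000.0, lam))
```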
DS - II - DCMM - 16
SERIES SYSTEMS - the failure of any one module causes the failure of the entire system

Rs(t) = R1(t) R2(t) R3(t) = e^-(λ1 + λ2 + λ3)t

In general for n serial modules:
R(t) = ∏(i=1..n) Ri(t) = e^-λt, WHERE λ = Σ(i=1..n) λi
λ - SYSTEM FAILURE RATE
λi - INDIVIDUAL MODULE FAILURE RATE

For n identical modules:
R(t) = [ Rm(t) ]^n
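A minimal Python sketch of both series formulas (failure rates and module count are illustrative):

```python
import math

def series_reliability(t, lams):
    """n modules in series, constant rates: R(t) = exp(-sum(lams) * t)."""
    return math.exp(-sum(lams) * t)

def series_identical(rm, n):
    """n identical modules in series: R = rm**n."""
    return rm ** n

print(series_reliability(100.0, [1e-4, 2e-4, 3e-4]))   # exp(-0.06) ~ 0.9418
print(series_identical(0.99, 5))                       # ~0.9510
```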
DS - II - DCMM - 17
PARALLEL SYSTEMS - assume each module operates independently and the failure of one module does not affect the operation of the other

RP(t) = RA(t) + RB(t) - RA(t)RB(t)

[Venn diagram: overlapping modules A and B; R(t) - probability of a module working; unreliability Q(t) = 1 - R(t)]

EXAMPLE: TWO IDENTICAL SYSTEMS IN PARALLEL, EACH CAN ASSUME THE ENTIRE LOAD.

MISSION TIME
– MT(r) gives the time at which system reliability falls below the prespecified level r; for an exponential system:
MT(r) = -ln r / λ
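A small Python sketch of the two-module parallel formula and the mission-time measure MT(r) (the values of r and λ are illustrative):

```python
import math

def parallel_two(ra, rb):
    """Two independent modules, either suffices: Rp = Ra + Rb - Ra*Rb."""
    return ra + rb - ra * rb

def mission_time(r, lam):
    """MT(r) = -ln(r) / lam for an exponential system."""
    return -math.log(r) / lam

print(parallel_two(0.9, 0.9))       # 0.99
print(mission_time(0.95, 1e-4))     # ~513 hours until R(t) drops to 0.95
```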
COVERAGE
• a) Qualitative: List of classes of faults that are recoverable (testable, diagnosable)
• b) Quantitative: The probability that the system successfully recovers given that a fault has occurred.
• c) Quantitative: Percentage of testable/diagnosable/ recoverable faults.
• d) Quantitative: Sum of the coverages of all fault classes, weighted by the probability of occurrence of each fault class.
C = p1C1 + p2C2 + ... + pnCn
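A quick sketch of this weighted coverage sum (measure d); the fault-class probabilities and per-class coverages below are assumed values:

```python
# Measure d): coverage as a probability-weighted sum over fault classes.
p = [0.70, 0.25, 0.05]    # assumed probability of each fault class occurring
c = [0.99, 0.95, 0.50]    # assumed coverage achieved for each class

coverage = sum(pi * ci for pi, ci in zip(p, c))
print(coverage)           # 0.70*0.99 + 0.25*0.95 + 0.05*0.50 = 0.9555
```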
DS - II - DCMM - 24
DIAGNOSABILITY
• A system of n units is one-step t-fault diagnosable (t-diagnosable) if all faulty units within the system can be located without replacement, provided the number of faulty units does not exceed t. (Preparata-Metze-Chien, 12/67)
• Necessary conditions:
1) n ≥ 2t + 1
2) At least t other units must test each unit
FAULT TOLERANCE
• Fault tolerance is the ability of a system to operate correctly in the presence of faults,
or
• a system S is called k-fault-tolerant with respect to a set of algorithms {A1, A2, ..., Ap} and a set of faults {F1, F2, ..., Fq} if for every k-fault F in S, Ai is executable by SF for 1 ≤ i ≤ p, where SF is a subsystem of the system S with k faults. (Hayes, 9/76)
or
• Fault tolerance is the use of redundancy (time or space) to achieve the desired level of system dependability.
DS - II - DCMM - 25
COMPARATIVE MEASURES
• RELIABILITY DIFFERENCE: R2(t) - R1(t)
• RELIABILITY GAIN: Rnew(t) / Rold(t)
• MISSION TIME IMPROVEMENT: MTI = MTnew(r) / MTold(r)
• SENSITIVITY: dR2(t)/dt vs. dR1(t)/dt
OTHER MEASURES
MAINTAINABILITY (SERVICEABILITY)
is the probability that a system will recover to an operable state within a specified time.
SURVIVABILITY
is the probability that a system will deliver the required service in the presence of an a priori defined set of faults or any of its subsets.
DS - II - DCMM - 26
RESPONSIVENESS: AN OPTIMIZATION METRIC PROPOSAL

responsiveness = ri(t) = ai pi

where
ri(t) reflects the responsiveness of a task at time t
ai denotes the i-th task's availability
pi represents the probability of timely completion of the i-th task
DS - II - DCMM - 27
QUESTION
• IN MANY PRACTICAL SITUATIONS, ESPECIALLY IN REAL-TIME SYSTEMS, WE FREQUENTLY NEED TO ANSWER A QUESTION: WILL WE ACCEPT A LESS PRECISE RESULT IN A SHORTER TIME?
• PROPOSED METRICS:
– 1) WEIGHTED SUM: (α · PRECISION / β · TIME) · AVAILABILITY
– 2) QUOTIENT-PRODUCT: [PRECISION / log(TIME)] · AVAILABILITY
DS - II - DCMM - 28
INTEGRITY (PERFORMANCE + DEPENDABILITY)
• TOTAL BENEFIT DERIVED FROM A SYSTEM OVER A TIME t.
– HOW TO MEASURE?
– 1) TOTAL NUMBER OF USEFUL MACHINE CYCLES OVER THE TIME t.
– 2) I = P · ∫₀ᵗ R(τ) dτ, where P is a performance index and R is the integrity level (the probability that an expected service is delivered)
• Example:
– In a multistage network the performance index could be P = N² (the number of paths in the network), with L = the number of levels in the network:
I = N² ∫₀ᵗ (1 - qp)(1 - qm)(1 - qs)^L dτ
where qp, qm and qs denote the failure probabilities of the processors, memories and switches
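A hedged Python sketch of measure 2: for an exponential R(τ) the integral has the closed form (1 - e^-λt)/λ; the network size N and rate λ below are assumed:

```python
import math

def integrity(P, lam, t):
    """I = P * integral_0^t R(tau) dtau with R(tau) = exp(-lam*tau),
    whose integral has the closed form (1 - exp(-lam*t)) / lam."""
    return P * (1.0 - math.exp(-lam * t)) / lam

N = 8                                   # assumed network size: P = N**2 paths
print(integrity(N ** 2, 1e-4, 1000.0))  # ~64 * 951.6
```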
DS - II - DCMM - 29
EXAMPLE
• COMPUTE THE OVERALL SYSTEM'S
– MTTF
– MTTR
– MTBF
– AVAILABILITY

[System diagram, with MTTF / MTTR given in hours for each element:]
• MAIN COMPUTER (SERIES): CPU 1950 / 1.20, 4M memory 1500 / 2.00, console 3800 / 0.80
• POWER (PARALLEL): UPS 15,800 / 3.75, UPS 15,800 / 3.75, utility 460 / 0.50
• DISKS (3-OF-4, K OF N): four disks, each 1800 / 4.50
DS - II - DCMM - 30
SERIES ELEMENT SUBSYSTEM (1)
• Given MTTF and MTTR of each element
– Total failure rate: λs = Σ(i=1..3) λi, where λi = 1/MTTFi
– Series MTTF: MTTFs = 1/λs = ( Σ(i=1..3) 1/MTTFi )^-1
DS - II - DCMM - 31
SERIES ELEMENT SUBSYSTEM (2)
• Availability: Ai = MTTFi / (MTTFi + MTTRi)
A1 = 1950 / (1950 + 1.20) = 0.99938499
A2 = 1500 / (1500 + 2.00) = 0.99866844
A3 = 3800 / (3800 + 0.80) = 0.99978952
As = ∏(i=1..3) Ai = 0.99784418
• MTTR
MTTRs = [(1 - As) / As] · MTTFs = 1.4976 hours
• MTBF
MTBFs = MTTFs + MTTRs = 693.2 + 1.5 = 694.7 hours
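These series-subsystem numbers can be reproduced with a few lines of Python (values taken from the slides):

```python
# Main-computer series subsystem, values from the slides (hours).
mttf = [1950.0, 1500.0, 3800.0]    # CPU, 4M memory, console
mttr = [1.20, 2.00, 0.80]

A = [f / (f + r) for f, r in zip(mttf, mttr)]     # 0.99938, 0.99867, 0.99979
As = A[0] * A[1] * A[2]                           # 0.99784418
mttf_s = 1.0 / sum(1.0 / f for f in mttf)         # 693.2 hours
mttr_s = (1.0 - As) / As * mttf_s                 # 1.4976 hours
mtbf_s = mttf_s + mttr_s                          # 694.7 hours
print(As, mttf_s, mttr_s, mtbf_s)
```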
DS - II - DCMM - 32
PARALLEL ELEMENT SUBSYSTEM (1)
• All elements must fail to cause subsystem failure
• MTTF and MTTR known for each element
• Unavailability of the entire subsystem:
Us = ∏(i=1..n) Ui
• Availability:
As = 1 - Us = 1 - ∏(i=1..n) (1 - Ai)
• MTTR:
MTTRs = ( Σ(i=1..n) 1/MTTRi )^-1
• MTTF:
MTTFs = [As / (1 - As)] · MTTRs
DS - II - DCMM - 33
PARALLEL ELEMENT SUBSYSTEM (2)
• Availability: Ai = MTTFi / (MTTFi + MTTRi)
A1 = A2 = 15,800 / (15,800 + 3.75) = 0.9997627 (each UPS)
A3 = 460 / (460 + 0.50) = 0.9989142 (utility)
As = 1 - ∏(i=1..n) (1 - Ai)
As = 1 - (0.0002373)(0.0002373)(0.0010858) = 0.99999999994
DS - II - DCMM - 34
PARALLEL ELEMENT SUBSYSTEM (3)
• MTTR
MTTRs = ( Σ(i=1..3) 1/MTTRi )^-1 = ( 1/3.75 + 1/3.75 + 1/0.50 )^-1 = 0.3947368 hour
• MTTF
MTTFs = [As / (1 - As)] · MTTRs = [0.99999999994 / (1.0 - 0.99999999994)] · 0.3947368
MTTFs = 6,456,914,809 hours
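The parallel power-subsystem numbers, reproduced in Python from the slide values:

```python
# Power subsystem: two UPS units and utility power in parallel (slide values).
mttf = [15800.0, 15800.0, 460.0]
mttr = [3.75, 3.75, 0.50]

U = [r / (f + r) for f, r in zip(mttf, mttr)]   # per-element unavailability
Us = U[0] * U[1] * U[2]
As = 1.0 - Us                                   # 0.99999999994
mttr_s = 1.0 / sum(1.0 / r for r in mttr)       # 0.3947368 hour
mttf_s = As / (1.0 - As) * mttr_s               # ~6.46e9 hours
print(As, mttr_s, mttf_s)
```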
DS - II - DCMM - 35
PARALLEL ELEMENT SUBSYSTEM (4)
• Paralleling modules is a technique commonly used to significantly upgrade system reliability
• Compare one uninterruptible power supply (UPS) with an availability of 0.9997627
• to the parallel power combination with an availability of 0.99999999994
• In practical systems, availability ranges from 2 to 12 nines: 0.99 to 0.999999999999
DS - II - DCMM - 36
K of N PARALLEL ELEMENT SUBSYSTEM (1)
• Have N identical modules in parallel (assume all have the same MTTF and MTTR)
• Only K elements are required for full operation
• K = 1 is the same as parallel
• K = N is the same as series
• Reliability:
Rs = Prob (system works for time T) = Prob (N modules work or N - 1 modules work or ... or K modules work)
• Note: the above conditions are mutually exclusive, so
Rs = Prob (N modules work) + Prob (N - 1 modules work) + ... + Prob (K modules work)
DS - II - DCMM - 37
K of N PARALLEL ELEMENT SUBSYSTEM (2)
• RELIABILITY
Rs = Σ(L=K..N) [ N! / ((N - L)! L!) ] (Rm)^L (1 - Rm)^(N-L)
where Rm is the individual module's reliability.
For 3 of 4 and Rm = 0.9:
Rs = Rm^4 + (4 choose 3) Rm^3 (1 - Rm) = 0.9^4 + 4 × 0.9^3 × 0.1 = 0.9477
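A minimal Python sketch of the K-of-N reliability sum, reproducing the 3-of-4 example:

```python
from math import comb

def k_of_n_reliability(k, n, rm):
    """Rs = sum over L = k..n of C(n, L) * rm**L * (1 - rm)**(n - L)."""
    return sum(comb(n, L) * rm ** L * (1 - rm) ** (n - L)
               for L in range(k, n + 1))

print(k_of_n_reliability(3, 4, 0.9))   # 0.9477, the slide's 3-of-4 example
```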
DS - II - DCMM - 38
K of N PARALLEL ELEMENT SUBSYSTEM (3)
• MTTR
MTTRs = MTTR / (N - K + 1)
• MTTF
MTTFs = MTTF · (MTTF/MTTR)^(N-K) · [ K · (N choose K) ]^-1
MTTFs = MTTF · (MTTF/MTTR)^(N-K) · (N - K)! (K - 1)! / N!
• AVAILABILITY
As = MTTFs / (MTTFs + MTTRs)
• Example (3 out of 4 subsystem)
– N = 4, K = 3
– MTTF = 1800 hours
– MTTR = 4.50 hours
DS - II - DCMM - 39
K of N PARALLEL ELEMENT SUBSYSTEM (4)
• MTTR
MTTRs = MTTR / (N - K + 1) = 4.50 / (4 - 3 + 1) = 2.25 hours
• MTTF
MTTFs = MTTF · (MTTF/MTTR)^(N-K) · (N - K)! (K - 1)! / N!
MTTFs = 1800 · (1800/4.5)^(4-3) · (4 - 3)! (3 - 1)! / 4!
MTTFs = 60,000 hours
DS - II - DCMM - 40
K of N PARALLEL ELEMENT SUBSYSTEM (5)
• AVAILABILITY
As = MTTFs / (MTTFs + MTTRs) = 60,000 / (60,000 + 2.25) = 0.9999625
• MODULE AVAILABILITY
A = MTTF / (MTTF + MTTR) = 1,800 / (1,800 + 4.50) = 0.9975062
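The K-of-N MTTR/MTTF/availability formulas in Python, reproducing the 3-of-4 disk example:

```python
from math import factorial

def k_of_n_mttr(mttr, n, k):
    return mttr / (n - k + 1)

def k_of_n_mttf(mttf, mttr, n, k):
    return (mttf * (mttf / mttr) ** (n - k)
            * factorial(n - k) * factorial(k - 1) / factorial(n))

mttf_s = k_of_n_mttf(1800.0, 4.50, 4, 3)   # 60,000 hours
mttr_s = k_of_n_mttr(4.50, 4, 3)           # 2.25 hours
As = mttf_s / (mttf_s + mttr_s)            # 0.9999625
print(mttf_s, mttr_s, As)
```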
DS - II - DCMM - 41
OVERALL SYSTEM (1)
• Three series elements
– 1. Main computer
– 2. Power
– 3. Disks
• MTTF
MTTF = ( Σ(i=1..3) 1/MTTFi )^-1 = ( 1/693.2 + 1/(646 × 10^7) + 1/60,000 )^-1
MTTF = 685.25 hours
DS - II - DCMM - 42
OVERALL SYSTEM (2)
• AVAILABILITY
A = ∏(i=1..3) Ai = (0.997844)(0.99999999994)(0.9999625)
A = 0.9978033
• MTTR
MTTR = [(1 - A) / A] · MTTF = [(1 - 0.9978033) / 0.9978033] · 685.25
MTTR = 1.51 hours
DS - II - DCMM - 43
OVERALL SYSTEM (3)
• Reliability function
– R(t) = e^(-t/MTTF)
– R(2080) = e^(-2080/685.25) = 0.0480567
• MTBF
– MTBF = MTTF + MTTR = 685.25 + 1.51 = 686.76 hours
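The overall-system computation in Python (subsystem values from the preceding slides):

```python
import math

# Overall system: main computer, power, and 3-of-4 disks in series.
mttf = [693.2, 6.46e9, 60000.0]
avail = [0.997844, 0.99999999994, 0.9999625]

mttf_sys = 1.0 / sum(1.0 / f for f in mttf)    # ~685.3 hours
A_sys = avail[0] * avail[1] * avail[2]         # ~0.99781
mttr_sys = (1.0 - A_sys) / A_sys * mttf_sys    # ~1.51 hours
mtbf_sys = mttf_sys + mttr_sys
print(mttf_sys, A_sys, mttr_sys, mtbf_sys)
print(math.exp(-2080.0 / mttf_sys))            # R(2080) ~ 0.048
```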
DS - II - DCMM - 44
IMPORTANT ATTRIBUTES OF COMPUTER SYSTEMS
• DEPENDABILITY (RELIABILITY / AVAILABILITY)
– Whether or not it works, and for how long?
• PERFORMANCE (throughput, response time, etc.)
– Given that it works, how well does it work?
• PERFORMABILITY
– For degradable systems: performance evaluation of systems subject to failure/repair
• RESPONSIVENESS
– Does it meet deadlines in the presence of faults?
• TOOLS: HARP, CARE III, SURE, ADVISER, SPADE, ARIES, SURF, SAVE
– Mainly non-repairable or assuming successful fault recovery
– Block diagrams
• SIMULATION
– Markov chain models
• PETRI NET MODELS
– places, tokens and transitions
[Figure: Markov models: a two-state model (Good → Bad with transition probability λ(t)Δt) and a four-state model of a two-processor system with states "both good", "P1 faulty / P2 fault-free", "P1 fault-free / P2 faulty" and "both faulty", connected by transitions with rates λ1 and λ2]
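A minimal sketch of the two-state model's steady-state availability, assuming a constant failure rate λ and repair rate μ (both values illustrative):

```python
# Two-state (Good/Bad) Markov model in steady state.
lam = 1.0 / 1000.0   # assumed: one failure per 1000 hours on average
mu = 1.0 / 2.0       # assumed: repairs take 2 hours on average

availability = mu / (lam + mu)   # equals MTTF / (MTTF + MTTR)
print(availability)              # ~0.998004
```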
DS - II - DCMM - 47
EXTENDED STOCHASTIC PETRI NET MODEL - AN EXAMPLE OF A TMR SYSTEM
[Figure (from Johnson and Malek): extended stochastic Petri net of a TMR system with places P1-P4 (operational processors in the initial state, fault handling, processors failed, system failed), transitions T1-T11, and multiplicities K1 = 2, K2 = 2]
DS - II - DCMM - 48
FAULT TREES
• Fault tree analysis is an application of deductive logic to produce a fault-oriented pictorial diagram that allows one to analyze system safety and reliability.
• Fault trees may serve as a design aid for identifying general fault classes.
• Fault trees were traditionally used in the evaluation of hardware reliability, but they may also help in designing fault-tolerant software and in developing a top-down view of the system.
• A complex event such as a system failure is successively broken down into simpler events, such as subsystem failures and individual component and block failures, down to single-element failures. These simple events are linked together by "and" or "or" Boolean functions.
• The probability of higher-level events can be calculated by combining the probabilities of the lower-level events.
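A small Python sketch of this bottom-up evaluation, applied to the TMR fault tree shown on a later slide; all event probabilities are assumed, and the OR gate over overlapping cut sets uses the usual independence approximation:

```python
def p_and(ps):
    """AND gate: all independent inputs must occur."""
    out = 1.0
    for p in ps:
        out *= p
    return out

def p_or(ps):
    """OR gate: at least one input occurs (inputs treated as independent,
    the usual approximation when cut sets share basic events)."""
    out = 1.0
    for p in ps:
        out *= (1.0 - p)
    return 1.0 - out

p1 = p2 = p3 = 0.01   # assumed processor failure probabilities
pv = 0.001            # assumed voter failure probability
two_fail = p_or([p_and([p1, p2]), p_and([p1, p3]), p_and([p2, p3])])
print(p_or([two_fail, pv]))   # top event: system failure
```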
DS - II - DCMM - 49
MARKOV MODELS
ARIES, CARE III, HARP, SAVE, SURE, SURF
• Different fault types
– Transient, intermittent, permanent
– Common-mode
– Near-coincident
• Details of fault-handling (coverage) behavior
• Dynamic and static redundancy
• Hierarchy is difficult to handle
• State explosion
DS - II - DCMM - 50
FAULT TREE EXAMPLE
[Figure: fault tree for a TMR system: the top event "System Failure" is an OR of voter failure and the failure of at least two processors (P1·P2, P1·P3, P2·P3, P1·P2·P3)]
DS - II - DCMM - 51
HANDLING OF STATE SPACE PROBLEM
• BASIC APPROACH:
– DIVIDE-AND-CONQUER (Decompose and Aggregate)
• STRUCTURAL DECOMPOSITION
– Consider a System as a Set of Independent Subsystems
• BEHAVIORAL DECOMPOSITION
– Consider Different Time Constants in Fault-Occurrence and Fault-Handling Processes
• Consider the Use of Different Techniques for Different Submodels
DS - II - DCMM - 52
RELIABILITY TOOLS OBJECTIVES AND TABLE OF COMPARISONS
• FAULT COVERAGE EVALUATION
• RELIABILITY EVALUATION
• AVAILABILITY EVALUATION
• LIFE CYCLE EVALUATION
• MTTF/MTTR
• SERVICE COST
DS - II - DCMM - 53
EXPERIMENTAL
• FAULT INJECTION
• FAULT LATENCY
• DEPENDABILITY MEASUREMENT
DS - II - DCMM - 54
RELIABILITY TOOLS (1)
Tool: ARIES-82
– Application: repairable/non-repairable systems; temporary and permanent faults; exponential failure and repair distributions
– Input: physical & logical structure needed to build the Markov process
– Models: continuous-state homogeneous Markov process
– Solution: analytical

Tool: CARE III
– Application: non-repairable systems only; temporary and permanent faults; Weibull or exponential failure distribution
– Input: fault tree
– Models: continuous non-homogeneous Markov chain; semi-Markov model
– Solution: analytical

Tool: SAVE
– Application: repairable/non-repairable systems; permanent faults; maintenance strategies; exponential failure & repair distribution
– Input: fault tree; input language to describe Markov chain
– Models: fault trees; continuous-state homogeneous Markov chain
– Solution: analytical, simulation

Tool: MARK1
– Application: non-repairable systems; Poisson failure distribution
• NIPPON TELEGRAPH and TELEPHONE 542 Standard Reliability Table
• Handbook of Reliability Data HRD3 (BRITISH TELECOM) 10
DS - II - DCMM - 61
MIL-HDBK-217E RELIABILITY MODEL
• The MIL-217 Module is a powerful reliability prediction program based on the internationally recognised method of calculating electronic equipment reliability given in MIL-HDBK-217. This standard uses a series of models for various categories of electronic, electrical and electro-mechanical components to predict failure rates which are affected by environmental conditions, quality levels, stress conditions and various other parameters. These models are fully detailed in MIL-HDBK-217.
• Multi Systems within the same Project
• Transfer to and from any other module
• Linked Blocks represent blocks with identical characteristics
• Redundancy modelling including hot standby
• Mission Phase
• Additional sources
BELLCORE'S RELIABILITY PREDICTION PROGRAM
• The Bellcore Module calculates the reliability prediction of electronic equipment based on the Bellcore (Telcordia) standard TR-332 Issue 6. This standard uses a series of models for various categories of electronic, electrical and electro-mechanical components to predict steady-state failure rates which are affected by environmental conditions, quality levels, electrical stress conditions and various other parameters. These models allow reliability prediction to be performed using three methods.
• Method I: Parts Count procedure
• Method II: Combines Method I predictions with laboratory data
• Method III: Combines Method I predictions with Field Tracking Data
• Multi Systems within the same Project
• Transfer to and from any other module
• Linked Blocks represent blocks with identical characteristics
• Redundancy modelling including hot standby
• Global Editing
• Additional source
– http://www.relexsoftware.com/
DS - II - DCMM - 65
BELLCORE'S RELIABILITY PREDICTION PROGRAM

λ = λb · πg · πs · πt
πg - environmental factor
πs - quality factor
πt - temperature factor
λb - base failure rate, based on the number of transistors; for a 64K DRAM with πg = πs = πt = 1.0, λ = 550 FITs

λb = k · (1 + G)
G - gate count
k - technology factor
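A trivial sketch of this prediction formula; the π-factor names follow the reconstruction above, and the 550 FIT base rate is the slide's 64K DRAM example:

```python
# lam = lam_b * pi_g * pi_s * pi_t, per the formula above; the 64K DRAM
# example on the slide has all pi factors equal to 1.0 and lam = 550 FITs.
lam_b = 550.0               # base failure rate, FITs (failures per 1e9 hours)
pi_g = pi_s = pi_t = 1.0    # environmental, quality, temperature factors

lam = lam_b * pi_g * pi_s * pi_t
print(lam, "FITs =", lam * 1e-9, "failures/hour")
```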
DS - II - DCMM - 66
REFERENCES:
• A. M. Johnson and M. Malek, "Survey of Software Tools for Evaluating Reliability, Availability and Serviceability," ACM Computing Surveys, 20(4), 227-269, December 1988; translated and reprinted in Japanese, Kyoritsu Shuppan Co., Ltd., 1990.
• R. Geist, M. Smotherman, and M. Brown, "Ultrahigh Reliability Estimates for Systems Exhibiting Globally Time-Dependent Failure Processes," Proceedings of the 19th IEEE International Symposium on Fault-Tolerant Computing (FTCS-19), 152-158, Chicago, IL, June 1989.
• D. Ince (Editor), Software Quality and Reliability: Tools and Methods, Unicom Applied Information Technology Reports, 1991.