State Probability of a Series-parallel Repairable System with
Two Types of Failure States
Gregory Levitin
Reliability Department, Planning, Development and Technology
Division,
Israel Electric Corporation Ltd., P.O. Box 10, Haifa, 31000
Israel
Tieling Zhang, Min Xie
Department of Industrial and Systems Engineering
National University of Singapore, Singapore 117576
ABSTRACT
This paper presents a method for analyzing series-parallel
safety-critical systems whose failed states are divided into
failure-safe and failure-dangerous states. The method combines a
Markov chain model with the universal generating function technique.
The model accounts for both periodic inspection and repair (perfect
and imperfect) of system elements. The system state distributions and
the overall system safety function are derived from the model. The
method applies to complex systems, and it supports decisions such as
choosing the optimal proof-test interval or allocating repair
resources. An illustrative example is given.
Keywords: Availability, Safety-critical system, Markov model,
Universal generating function, Periodic inspection, Failure-safe,
Failure-dangerous
1. Introduction
Safety is of paramount concern for large and complex systems
such as nuclear power and chemical processing plants, aircraft
navigation control systems, power transmission networks and
high-speed railway networks. The complexity of such systems makes it
very difficult, or even impossible, to ensure that they will always
behave as expected under all foreseeable conditions. Dangerous
faults may be caused not only by random hardware failures but also by
systematic faults inadvertently designed into the system. Safety
analysis or risk assessment for such a system is therefore a complex
problem that involves the study of human factors (human error),
production processes, manufacturing control, on-line measurement or
test and repair, diagnosis with periodic inspections, and so on. See
Dominguez-Garcia et al. (2006), DeLong et al. (2005), Cowing et al.
(2004), Marseguerra et al. (2004) and Burgazzi (2003) for recent
reliability research on safety-critical systems.
Safety-critical systems are proactive measures that prevent
dangerous events from occurring in a process plant. For example,
emergency shutdown controllers are widely used in the chemical
processing industry. Their function is to monitor a plant process and
to identify whether it is operating within acceptable limits. If the
process moves outside the acceptable operating range, the controller
automatically shuts it down in a safe manner (Bukowski, 2001). A
proper analysis of safety-critical systems must distinguish
non-dangerous failures from dangerous ones; these correspond to the
failure-safe and failure-dangerous states of the system,
respectively.
The international standard IEC 61508 (1998) includes two
frameworks: one is risk reduction with a Safety-Related System (SRS)
and the other is the Overall Safety Life-cycle. Since its
publication, it has been widely adopted in safety-related studies and
applications (see, e.g., Faller, 2004; Hokstad and Corneliussen,
2004; Zhang et al., 2003; Nunns, 2000; Knegtering and Brombacher,
1999). A typical SRS architecture consists of components with
diagnosis and periodic inspection, where the failures of each
component are classified as detectable or undetectable. A number of
studies address safety-critical systems with specific structures;
see, e.g., Kang and Jang (2006), Kim et al. (2005), Weber et al.
(2005), Lee et al. (2004), Latif-Shabgahi et al. (2004) and Son and
Seong (2003).
Periodic inspection is important for safety-critical systems and
has been studied in reliability analysis in general (see, e.g.,
Cui et al., 2004; Biswas et al., 2003; Bris et al., 2003; Bukowski,
2001). In many studies of safety-critical system performance, the
effects of periodic inspection have been either ignored or modeled
by assigning much longer average repair times to unrecognized
degraded states (Zhang et al., 2003, 2006). In practice, an
unrecognized fault cannot be repaired until the next periodic
inspection (proof-test), so the repair of such faults is carried out
at predetermined times. Only a few studies have addressed this
problem. Bukowski (2001) gives a method for incorporating periodic
inspection and repair into a Markov model in which both perfect and
imperfect inspection and repair can be modeled. However, that model
does not include the situation in which unrecognized and recognized
degraded states exist simultaneously. Because an unrecognized failure
can be found only at a periodic inspection, the two kinds of faults
can coexist for some period of time.
The purpose of this paper is to present a method for evaluating
the probabilities of failure-safe and failure-dangerous states for
arbitrarily complex series-parallel systems with imperfect
diagnostics and imperfect periodic inspections and repairs of
elements. Each element failure, whether failure-safe or
failure-dangerous, can be either detected or undetected. The emphasis
is on the exact state probability or availability of such a system.
See Bowles and Dobbins (2004), Chandrasekhar et al. (2004) and
Carrasco (2004) for related studies of other systems.
The remainder of this paper presents a Markov model for
determining the state distribution of a single system element
(Section 2), the universal generating function technique for
determining the state distribution of the entire system (Section 3)
and an illustrative example (Section 4).
Acronyms & Notations
FD            failure-dangerous state
FS            failure-safe state
W             operational state
G             set of states of element (system): G = {W, FS, FD}
$\varphi$     structure function
$\varphi_{par}$   structure function for elements connected in parallel
$\varphi_{ser}$   structure function for elements connected in series
$S_j$         random discrete state variable of element j
$s_{jk}$      k-th realization of $S_j$: $s_{jk} \in G$
Fd            detected failure
Fu            undetected failure
FDd           detected failure-dangerous
FDu           undetected failure-dangerous
FSd           detected failure-safe
FSu           undetected failure-safe
pfd           probability of failure on demand
pfdD          probability of failure-dangerous on demand
pfdS          probability of failure-safe on demand
$\Lambda$     system transition rate matrix
$0_k$         zero column vector of size $k \times 1$
$1_k$         unit column vector of size $k \times 1$
$P_W(t)$, $P_{FS}(t)$, $P_{FD}(t)$
              probability that a subsystem or the entire system is in
              state W, FS, FD at time t
$\lambda_{sd}$, $\lambda_{dd}$, $\lambda_{du}$, $\lambda_{su}$
              failure rates of FSd, FDd, FDu, FSu
$\mu_{sd}$, $\mu_{dd}$, $\mu_{du}$, $\mu_{su}$
              repair rates of FSd, FDd, FDu, FSu
d             fraction of detected failures that are detected correctly
$T_I$         proof-test interval
Assumptions
1. The system is composed of elements, and each element can
experience two categories of failures: dangerous and non-dangerous,
corresponding respectively to failure-dangerous and failure-safe
events. Failure-dangerous and failure-safe events are
independent.
2. Failures of both categories can be either detected or
undetected.
3. Detected and undetected failures constitute independent
events.
4. Failure rates for both kinds of failures are constant.
5. The element is in the operational state if no failure event
(detected or undetected) has occurred.
6. The element is in the failure-safe state if at least one
non-dangerous failure (detected or undetected) has occurred and no
dangerous failure has occurred.
7. The element is in the failure-dangerous state if at least one
dangerous failure (detected or undetected) has occurred.
8. The elements are independent and can undergo periodic
inspections at different times.
9. The state of any composition of elements is unambiguously
defined by the states of these elements and the nature of their
interaction in the system.
10. The interaction of the elements is represented by a
series-parallel block diagram.
2. State distribution of single system element
According to IEC 61508, the typical system structure is composed
of elements to which diagnosis, periodic inspection and repair are
applied. Failure-safe and failure-dangerous events can occur
independently. The failure category depends on the effect of the
fault. For example, a failure that shuts down a properly operating
process is of the failure-safe (FS) type; it is variously referred to
as a false trip or a false alarm. In contrast, a safety-critical
system that fails to perform an operation required to shut down a
process can cause hazardous results; an example is the failure of a
monitor that controls an important process. This type of failure is
generally called failure-dangerous (FD).
Both FS and FD events can be detected or undetected. A detected
failure is revealed instantly by diagnostic devices. An imperfect
diagnosis model presumes that only a fraction d of detected failures
is revealed instantaneously by the diagnostic devices. Whenever a
failure of this kind is detected, on-line repair is initiated.
Failures that cannot be detected by the diagnostic devices, or that
remain undetected because of imperfect diagnosis, are considered
undetected failures. They can be found only by the proof-test
(periodic inspection) at the end of a proof-test interval. We assume
that the failure rates of detected failure-safe and
failure-dangerous events ($\lambda_{sd}$ and $\lambda_{dd}$,
respectively) and of undetected failure-safe and failure-dangerous
events ($\lambda_{su}$ and $\lambda_{du}$, respectively) can be
calculated or elicited from tests.
The state of any single element can be represented as a
combination of two independent states corresponding to detected and
undetected failures. Each of the two can be in one of three states:
no failure (state W), failure of category FS and failure of category
FD. According to assumptions 5-7, the state of each element is
determined from each combination of these states using Table 1.
Table 1. States of single element.
The state of each element j can be represented by a discrete
random variable $S_j$ that takes values from the set G = {W, FS, FD}.
To obtain the element state distribution $p_{jW} = \Pr(S_j = W)$,
$p_{jFS} = \Pr(S_j = FS)$ and $p_{jFD} = \Pr(S_j = FD)$, one should
sum the probabilities of all combinations of states of detected and
undetected failures that result in the element states W, FS and FD,
respectively. Based on element state transition analysis, one
obtains the Markov state transition diagram presented in Fig. 1.
In this diagram, each possible combination of the states of
detected and undetected failures (marked inside the circles) belongs
to one of three sets corresponding to the three states of the
element defined according to Table 1.
In practice, no repair action is applied to an undetected
failure until the next proof-test. In general, periodic inspection
and repair take very little time compared with the proof-test
interval $T_I$, and the whole system stops operating (is in the down
state) during periodic inspection and repair. It is therefore
reasonable to set the repair rates for undetected failures to
$\mu_{du} = \mu_{su} = 0$ when analyzing the behavior of a
safety-critical system within the proof-test interval (unlike the
equivalent repair rates $\mu_{du}$ and $\mu_{su}$ used in Zhang et
al. (2003)).
Fig. 1. Markov state transition diagram used for calculating
state distribution of a single element.
According to Fig. 1, the following system of equations describes
the behavior of element j:
$$P_j'(t) = P_j(t)\,\Lambda_j \qquad (1)$$
where $P_j(t) = (p_{j1}(t), p_{j2}(t), \ldots, p_{j9}(t))$ is the
vector of state probabilities, $P_j'(t)$ is the derivative of
$P_j(t)$ with respect to t, and $\Lambda_j$ is the transition rate
matrix (see Appendix). According to Table 1, state 1 in the Markov
diagram corresponds to state W of the element, states 2 - 4
correspond to state FS of the element and states 5 - 9 correspond to
state FD of the element. Having the solution $P_j(t)$ of Eq. (1) for
any element j, one obtains $p_{jW} = p_{j1}$,
$p_{jFS} = p_{j2} + p_{j3} + p_{j4}$ and
$p_{jFD} = p_{j5} + p_{j6} + p_{j7} + p_{j8} + p_{j9}$.
The solution of Eq. (1) can be expressed as
$$P_j(t) = P_j(0)\,\exp(\Lambda_j t), \quad \text{for } t \ge 0; \qquad (2)$$
$$P_j(t) = P_j(nT_I^{+})\,\exp\big(\Lambda_j (t - nT_I)\big), \quad \text{for } nT_I^{+} \le t < (n+1)T_I^{+},\ n = 0, 1, 2, \ldots$$
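As a rough illustration of Eqs. (1) and (2), the sketch below builds a transition rate matrix for one element and evaluates the row vector P(0) exp(Λt) within a single proof-test interval. It is only a sketch under stated assumptions: the transition structure is derived from the independence of the detected and undetected failure processes with zero repair rates for undetected failures; the state ordering, the function names and the pure-Python matrix exponential (scaling and squaring with a truncated Taylor series) are illustrative choices; rates are taken to be per hour and 1.5 years is taken as 13140 hours. States are classified by the rule of Table 1 rather than by state number.

```python
from itertools import product

DET = ("W", "FSd", "FDd")          # status of the detected-failure process
UND = ("W", "FSu", "FDu")          # status of the undetected-failure process
STATES = list(product(DET, UND))   # nine combined states; index 0 is (W, W)

def element_state(det, und):
    """Table 1 rule: any dangerous failure gives FD, else any safe failure gives FS."""
    if det == "FDd" or und == "FDu":
        return "FD"
    if det == "FSd" or und == "FSu":
        return "FS"
    return "W"

def rate_matrix(lam_sd, lam_dd, lam_su, lam_du, mu_sd, mu_dd):
    """Transition rate matrix; repair rates of undetected failures are zero."""
    n = len(STATES)
    Q = [[0.0] * n for _ in range(n)]
    for i, (det, und) in enumerate(STATES):
        moves = []
        if det == "W":      # a detected failure may occur
            moves += [(("FSd", und), lam_sd), (("FDd", und), lam_dd)]
        else:               # on-line repair of the detected failure
            moves.append((("W", und), mu_sd if det == "FSd" else mu_dd))
        if und == "W":      # an undetected failure may occur
            moves += [((det, "FSu"), lam_su), ((det, "FDu"), lam_du)]
        for target, rate in moves:
            Q[i][STATES.index(target)] = rate
        Q[i][i] = -sum(rate for _, rate in moves)
    return Q

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def expm(Q, t, terms=18):
    """exp(Q t) by scaling and squaring with a truncated Taylor series."""
    n = len(Q)
    norm = max(sum(abs(x) for x in row) for row in Q) * t
    s = 0
    while norm / 2 ** s > 0.5:
        s += 1
    h = t / 2 ** s
    E = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in E]
    for k in range(1, terms + 1):
        term = [[x * h / k for x in row] for row in matmul(term, Q)]
        E = [[e + x for e, x in zip(re, rt)] for re, rt in zip(E, term)]
    for _ in range(s):
        E = matmul(E, E)
    return E

def element_distribution(Q, p0, t):
    """Return the probabilities of W, FS and FD at time t for initial row vector p0."""
    E = expm(Q, t)
    p = [sum(p0[i] * E[i][j] for i in range(len(p0))) for j in range(len(p0))]
    out = {"W": 0.0, "FS": 0.0, "FD": 0.0}
    for (det, und), prob in zip(STATES, p):
        out[element_state(det, und)] += prob
    return out

# Fuel supply system rates from Section 4 (time unit assumed to be hours).
Q = rate_matrix(2.56e-5, 8.9e-6, 1e-5, 1e-6, 0.25, 0.0833)
p0 = [1.0] + [0.0] * 8                         # element starts in (W, W)
dist = element_distribution(Q, p0, 13140.0)    # end of the first interval
print(dist)
```

Classifying states by their (detected, undetected) labels keeps the aggregation into W, FS and FD independent of how the nine states happen to be numbered.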
To model imperfect inspection and repair, we allow for the
possibility that an undetected fault is not repaired as good as new,
so that some faults may still exist after inspection and repair. A
matrix $M_{ji}$ describes this behavior: each entry of $M_{ji}$ is
the probability of transition from one state to another at the i-th
proof-test. Thus, we have
$$P_j(T_I^{+}) = P_j(T_I)\,M_{j1} = P_j(0)\,\exp(\Lambda_j T_I)\,M_{j1} \qquad (3)$$
$$P_j(2T_I^{+}) = P_j(2T_I)\,M_{j2} = P_j(0)\,\exp(\Lambda_j T_I)\,M_{j1}\,\exp(\Lambda_j T_I)\,M_{j2}$$
$$\begin{aligned}
P_j(nT_I^{+}) &= P_j(nT_I)\,M_{jn} \\
&= P_j((n-1)T_I^{+})\,\exp(\Lambda_j T_I)\,M_{jn} \\
&= P_j((n-2)T_I^{+})\,\exp(\Lambda_j T_I)\,M_{j(n-1)}\,\exp(\Lambda_j T_I)\,M_{jn} \\
&= P_j(0)\,\exp(\Lambda_j T_I)\,M_{j1}\,\exp(\Lambda_j T_I)\,M_{j2} \cdots \exp(\Lambda_j T_I)\,M_{j(n-1)}\,\exp(\Lambda_j T_I)\,M_{jn}, \quad n = 1, 2, 3, \ldots
\end{aligned} \qquad (4)$$
In Eq. (4), n denotes the n-th proof-test interval and $M_{ji}$
(i = 1, 2, 3, ..., n) is the matrix associated with the i-th
proof-test.
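The recursion of Eqs. (3) and (4) alternates propagation over one proof-test interval with multiplication by an inspection matrix. The sketch below shows that alternation on a deliberately reduced, hypothetical two-state element (W and one undetected failure state Fu, with no repair between tests), for which the matrix exponential has a closed form. The rate, the interval length in hours and the matrix entries are illustrative assumptions, not values from the paper.

```python
import math

def propagate(p, lam, t):
    """Row vector p = (p_W, p_Fu) times exp(Lambda t) for Lambda = [[-lam, lam], [0, 0]]."""
    survive = math.exp(-lam * t)
    return [p[0] * survive, p[1] + p[0] * (1.0 - survive)]

def proof_test(p, M):
    """Apply the inspection matrix: p(T+) = p(T) M."""
    return [sum(p[i] * M[i][j] for i in range(len(p))) for j in range(len(M[0]))]

def after_tests(p0, lam, T_I, matrices):
    """State vector just after each proof test, following Eqs. (3)-(4)."""
    history, p = [], p0
    for M in matrices:
        p = proof_test(propagate(p, lam, T_I), M)
        history.append(p)
    return history

lam, T_I = 1.0e-5, 13140.0             # hypothetical rate per hour and interval in hours
perfect = [[1.0, 0.0], [1.0, 0.0]]     # every undetected failure is repaired
imperfect = [[1.0, 0.0], [0.9, 0.1]]   # 10% of undetected failures survive the test
hist = after_tests([1.0, 0.0], lam, T_I, [imperfect, imperfect, perfect])
for n, p in enumerate(hist, start=1):
    print(n, p)
```

Each row of an inspection matrix sums to one, so the probability mass is only redistributed at a proof test; a perfect test returns the element to state W.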
3. State distribution of the entire series-parallel system
To obtain the state distribution of the entire system, this
paper uses a procedure based on the universal generating function
(u-function) technique. This method was introduced in Ushakov (1987)
and has proved very effective for the reliability evaluation of
different types of multi-state systems; see Levitin et al. (1998) and
Lisnianski and Levitin (2003). A comprehensive description of the
method and its numerous applications in reliability engineering can
be found in Levitin (2005). For some recent and related
applications, see, e.g., Levitin (2004 and 2005) and Korczak et al.
(2006).
The u-function of a discrete random variable Y is defined as a
polynomial
$$u(z) = \sum_{k=1}^{K} q_k z^{y_k}, \qquad (5)$$
where the variable Y has K possible values and qk is the
probability that Y takes the value of yk. In our case, the
polynomial u(z) can define state distributions, i.e. it represents
all of the possible mutually exclusive states of the element (or
any subsystem) by relating the probabilities of each state to the
value that takes the random state variable corresponding to this
element (subsystem) in that state. Note that the performance
distribution of the basic element j (probability mass function of
discrete random variable Sj) can now be represented as
$$u_j(z) = \sum_{k=1}^{3} p_{jk} z^{s_{jk}}, \qquad (6)$$
where $s_{j1}$ = FD, $s_{j2}$ = FS, $s_{j3}$ = W for any j.
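As a small worked instance of Eq. (6), consider a hypothetical element with $p_{j1} = 0.04$, $p_{j2} = 0.06$ and $p_{j3} = 0.90$ (illustrative values only, not taken from the example in Section 4). Its u-function has one term per state, and the coefficients sum to one:

```latex
u_j(z) = 0.04\, z^{FD} + 0.06\, z^{FS} + 0.90\, z^{W},
\qquad 0.04 + 0.06 + 0.90 = 1 .
```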
To obtain the u-function of a subsystem consisting of two
elements, composition operators are introduced. These operators
determine the u-function for two elements connected in parallel and
in series, respectively, using simple algebraic operations on the
individual u-functions of basic elements. All the composition
operators take the form
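$$u_j(z) \otimes_{\varphi} u_i(z) = \left(\sum_{k=1}^{3} p_{jk} z^{s_{jk}}\right) \otimes_{\varphi} \left(\sum_{h=1}^{3} p_{ih} z^{s_{ih}}\right) = \sum_{k=1}^{3} \sum_{h=1}^{3} p_{jk}\, p_{ih}\, z^{\varphi(s_{jk},\, s_{ih})}. \qquad (7)$$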
The resulting u-function relates the probability of each
combination of states of the independent elements (which equals the
product of the probabilities of these states) to the value that the
random state variable of the entire subsystem takes when this
combination is realized. The function $\varphi(\cdot)$ in the
composition operators expresses the dependence of the subsystem state
on the states of both of its elements. Its definition depends
strictly on the physical nature of the system and on the nature of
the interaction between the system elements.
The structure functions for pairs of elements connected in
parallel and in series should be defined for each specific
application from an analysis of system functioning. For example, the
widely applied conservative approach makes the following
assumptions. A subsystem consisting of two parallel elements is in
the failure-dangerous state if at least one of the elements is in the
failure-dangerous state; otherwise, it is in the operational state if
at least one of the elements is in the operational state. In the
remaining cases, the subsystem is in the failure-safe state. This is
expressed by the structure function $\varphi_{par}(\cdot)$ presented
in Table 2. A subsystem consisting of two elements connected in
series is in the operational state if both elements are in the
operational state, whereas it is in the failure-dangerous state if at
least one of the elements is in the failure-dangerous state. In the
remaining cases, the subsystem is in the failure-safe state. This is
expressed by the structure function $\varphi_{ser}(\cdot)$ presented
in Table 3.
Table 2. Structure function for pair of elements connected in
parallel.
Table 3. Structure function for pair of elements connected in
series.
In the numerical realization of the composition operator in Eq.
(7), the states W, FS and FD can be encoded by the integers 3, 2 and
1, respectively, so that $s_{jk} = k$ for any j, with k = 1, 2, 3.
With this encoding, the functions $\varphi_{par}(\cdot)$ and
$\varphi_{ser}(\cdot)$ defined above take the form
$$\varphi_{par}(s_{jk}, s_{ih}) = \begin{cases} \max(s_{jk}, s_{ih}), & \text{if } \min(s_{jk}, s_{ih}) > 1, \\ 1, & \text{if } \min(s_{jk}, s_{ih}) = 1, \end{cases}$$
and $\varphi_{ser}(s_{jk}, s_{ih}) = \min(s_{jk}, s_{ih})$.
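The two structure functions can be checked directly with the integer encoding W = 3, FS = 2, FD = 1. The short sketch below (the function and variable names are illustrative) evaluates both functions for all nine state pairs and prints the resulting state tables.

```python
W, FS, FD = 3, 2, 1   # integer encoding of the element states used in the text

def phi_par(a, b):
    """Parallel pair: FD if either element is FD, otherwise the better state."""
    return 1 if min(a, b) == 1 else max(a, b)

def phi_ser(a, b):
    """Series pair: the worse of the two states."""
    return min(a, b)

names = {W: "W", FS: "FS", FD: "FD"}
for label, phi in (("parallel", phi_par), ("series", phi_ser)):
    print(label)
    for a in (W, FS, FD):
        print("  ", names[a], [names[phi(a, b)] for b in (W, FS, FD)])
```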
Note that the nine possible different combinations of element
states produce only three possible states of the subsystem. The
probabilities of combinations that produce the same subsystem state
should be summed in order to obtain this state probability. This
can be done by collecting terms with equal exponents in the
u-function obtained by Eq. (7). Finally, any subsystem state
distribution can be represented by the u-function taking the form
of Eq. (6).
Any subsystem consisting of two elements can be further treated
as a single equivalent element with a performance distribution that
is equal to the performance distribution of this subsystem.
Consecutively applying the composition operators and replacing
pairs of elements by equivalent elements, one can obtain the
u-function representing the performance distribution of the entire
system.
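A minimal sketch of the composition operator of Eq. (7), including the collection of like terms, is given below. A u-function is represented as a dictionary that maps an integer state code to its probability; this representation, the function names and the element probabilities are illustrative assumptions.

```python
from collections import defaultdict

W, FS, FD = 3, 2, 1   # integer encoding of the element states

def phi_par(a, b):
    return 1 if min(a, b) == 1 else max(a, b)

def phi_ser(a, b):
    return min(a, b)

def compose(u_j, u_i, phi):
    """Eq. (7): multiply probabilities, map each state pair through phi,
    and collect terms that lead to the same subsystem state."""
    out = defaultdict(float)
    for s_j, p_j in u_j.items():
        for s_i, p_i in u_i.items():
            out[phi(s_j, s_i)] += p_j * p_i
    return dict(out)

# Two statistically identical hypothetical elements.
u1 = {W: 0.90, FS: 0.06, FD: 0.04}
u2 = dict(u1)

u_par = compose(u1, u2, phi_par)
u_ser = compose(u1, u2, phi_ser)
print("parallel:", u_par)
print("series:  ", u_ser)
```

The nine products collapse into three terms in each case, and the result has the same form as the u-function of a single element, so it can be fed back into `compose` as an equivalent element.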
The recursive algorithm
The following recursive algorithm obtains the u-function that
represents the entire system state distribution:
Step 1. Obtain the state probabilities for each element j using
the Markov transition diagram method presented in Section 2.
Step 2. Define the u-functions uj(z) for each element j using
Eq. (6).
Step 3. If the system contains a pair of elements connected in
parallel or in series, replace this pair with an equivalent element
whose u-function is obtained by the operator of Eq. (7) with the
structure function $\varphi_{par}(\cdot)$ or $\varphi_{ser}(\cdot)$,
respectively.
Step 4. If the system contains more than one element, return to
Step 3. Otherwise, the algorithm stops.
The coefficients of the obtained u-function are equal to
probabilities of operational, failure-safe and failure-dangerous
states of the entire system.
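The steps above can be sketched as a recursion over a nested description of the block diagram. In the sketch below, a system is either an element name or a tuple whose first item is "par" or "ser" followed by its parts; this encoding, the element names and the probabilities are hypothetical and serve only to show the pairwise replacement of Step 3.

```python
from collections import defaultdict

W, FS, FD = 3, 2, 1   # integer encoding of the states

def phi_par(a, b):
    return 1 if min(a, b) == 1 else max(a, b)

def phi_ser(a, b):
    return min(a, b)

def compose(u_j, u_i, phi):
    """Composition operator of Eq. (7) with like terms collected."""
    out = defaultdict(float)
    for s_j, p_j in u_j.items():
        for s_i, p_i in u_i.items():
            out[phi(s_j, s_i)] += p_j * p_i
    return dict(out)

def u_system(node, u_elements):
    """Reduce a series-parallel structure to one equivalent u-function."""
    if isinstance(node, str):              # Step 2: u-function of a basic element
        return u_elements[node]
    kind, *parts = node
    phi = phi_par if kind == "par" else phi_ser
    result = u_system(parts[0], u_elements)
    for part in parts[1:]:                 # Step 3: replace pairs one at a time
        result = compose(result, u_system(part, u_elements), phi)
    return result

# Hypothetical element state probabilities at one fixed time instant.
u_elements = {
    "A": {W: 0.90, FS: 0.06, FD: 0.04},
    "B": {W: 0.90, FS: 0.06, FD: 0.04},
    "C": {W: 0.95, FS: 0.03, FD: 0.02},
}
structure = ("ser", ("par", "A", "B"), "C")   # A and B in parallel, then C in series

u = u_system(structure, u_elements)
safety = u[W] + u[FS]    # system safety: operational plus failure-safe probability
print(u, safety)
```

Repeating the evaluation with element probabilities computed at successive time instants gives the system state probabilities as functions of time.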
With the state probabilities of each element in the form of
functions of time, one can use the algorithm presented above to get
the probability values corresponding to any given time. Finally,
the entire system state probabilities and the overall system safety
(defined as the sum of operational probability and failure-safe
state probability) as functions of time can be obtained. In the
following section, we use an example to illustrate the procedure
described here.
4. Illustrative example
Consider a combined-cycle power plant with two generating units.
Each unit consists of a gas turbine block and fuel supply systems.
The fuel to each turbine block can be supplied by two parallel
systems. The simplified reliability block diagram of the plant is
presented in Fig. 2. Each fuel supply system, as well as each
turbine, can experience both safe and dangerous failures (detected
and undetected).
Fig. 2. Reliability block diagram of combined-cycle power
plant
The parameters of the fuel supply systems are:
$\lambda_{sd} = 2.56\times10^{-5}$, $\lambda_{su} = 10^{-5}$,
$\lambda_{dd} = 8.9\times10^{-6}$, $\lambda_{du} = 1\times10^{-6}$;
$\mu_{sd} = 0.25$, $\mu_{dd} = 0.0833$, $\mu_{su} = \mu_{du} = 0$;
d = 0.99; $T_I$ = 1.5 years. The fuel supply systems are
statistically identical, but the inspection times of systems 2 and
4 are shifted 0.5 year earlier relative to the inspection times of
systems 1 and 3. The matrices $M_{ji}$ associated with each fuel
supply system are $M_{1i}$ (i = 1, 2, 3, 4), as shown in Eq. (A2) in
the Appendix.
The turbine blocks are also statistically identical. Their
parameters are: $\lambda_{sd} = 2.56\times10^{-5}$,
$\lambda_{su} = 6.540\times10^{-6}$, $\lambda_{dd} = 7.9\times10^{-6}$,
$\lambda_{du} = 7.8\times10^{-7}$; $\mu_{sd} = 0.25$,
$\mu_{dd} = 0.0625$, $\mu_{su} = \mu_{du} = 0$; d = 0.99; $T_I$ = 2
years. The matrices $M_{ji}$ associated with each turbine block are
$M_{2i}$ (i = 1, 2, 3), as shown in Eq. (A3) in the Appendix.
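As a quick consistency check on the number of inspection matrices, the proof-test intervals can be converted to hours. The conversion factor of 8760 hours per year is an assumption made here, and the check ignores the 0.5-year shift of systems 2 and 4; the 65000-hour horizon is the one used for Figs. 3 to 5.

```latex
\begin{aligned}
\text{fuel supply system:}\quad & T_I = 1.5 \times 8760 = 13140\ \text{h},
  & \lfloor 65000 / 13140 \rfloor &= 4 \ \text{proof-tests},\\
\text{turbine block:}\quad & T_I = 2 \times 8760 = 17520\ \text{h},
  & \lfloor 65000 / 17520 \rfloor &= 3 \ \text{proof-tests}.
\end{aligned}
```

Under this assumption the counts agree with the four matrices $M_{1i}$ and the three matrices $M_{2i}$ listed in the Appendix.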
The probabilities $p_{jW}(t)$, $p_{jFS}(t)$ and $p_{jFD}(t)$ for
each system element, obtained by solving Eqs. (2) and (3) over a
period of 65000 hours, are presented in Figs. 3 - 5. The
probabilities $P_W(t)$, $P_{FS}(t)$ and $P_{FD}(t)$ for a single
generating unit and for the entire system, obtained using the
algorithm given in Section 3 with the structure functions defined in
Tables 2 and 3, are also presented in Figs. 3 - 5. These figures show
that the probabilities for a single generating unit and for the
entire system also vary periodically.
The system safety $S(t) = P_W(t) + P_{FS}(t)$ as a function of time
is presented in Fig. 6.
[Plot: $P_W$ versus t (thousands of hours, 0 to 60); curves for elements 1,3; elements 2,4; elements 5,6; single unit; system]
Fig. 3. Probabilities of working states
[Plot: $P_S$ versus t (thousands of hours, 0 to 60); curves for elements 1,3; elements 2,4; elements 5,6; single unit; system]
Fig. 4. Probabilities of failure-safe states
[Plot: $P_D$ versus t (thousands of hours, 0 to 60); curves for elements 1,3; elements 2,4; elements 5,6; single unit; system]
Fig. 5. Probabilities of failure-dangerous states
[Plot: S versus t (thousands of hours, 0 to 60)]
Fig. 6. Overall system safety
5. Conclusions
In this paper, a method is proposed for the study of
series-parallel systems with imperfect diagnostics and imperfect
periodic inspections and repairs of elements. Element failures can
be failure-safe or failure-dangerous and can be either detected or
undetected. The proposed model incorporates periodic inspection and
repair (both perfect and imperfect) of system elements. A Markov
model is used to determine the state distribution of a single system
element, while the universal generating function technique is used
to determine the state distribution of the entire system. The
presented example shows that the procedure can be readily
implemented to estimate the state probabilities and the overall
safety of a safety-critical system.
The method presented in this paper can be applied in different
fields, such as power generation units, electronic devices and
chips, and data storage based on redundant arrays of inexpensive
disks (Katz et al., 1989; Gibson and Patterson, 1993). It can also be
used to evaluate the safety of a fault-tolerant single-chip
multiprocessor architecture (Yao et al., 2004), which is a promising
way to partly mitigate system faults and to increase system
dependability in mission-critical applications.
Acknowledgement:
This research was carried out while the first author was
visiting National University of Singapore supported by the research
grant R-266-000-020-112 at National University of Singapore. The
authors would like to thank three referees for their constructive
comments.
References
Biswas, A.; Sarkar, J. and Sarkar, S. (2003). Availability of a
periodically inspected system, maintained under an imperfect-repair
policy. IEEE Transactions on Reliability, 52 (3), 311-318.
Bowles, J.B. and Dobbins, J.G. (2004). Approximate reliability
and availability models for high availability and fault-tolerant
systems with repair. Quality and Reliability Engineering
International, 20 (7), 679-697.
Bris, R., Chatelet, E. and Yalaoui, F. (2003). New method to
minimize the preventive maintenance cost of series-parallel
systems. Reliability Engineering & System Safety, 82 (3),
247-255.
Bukowski, J.W. (2001). Modeling and analyzing the effects of
periodic inspection on the performance of safety-critical systems.
IEEE Transactions on Reliability, 50 (2), 321-329.
Burgazzi, L. (2003). Reliability evaluation of passive systems
through functional reliability assessment. Nuclear Technology, 144
(2), 145-151.
Carrasco, J.A. (2004). Solving large interval availability
models using a model transformation approach. Computers &
Operations Research, 31 (6), 807-861.
Chandrasekhar, P.; Natarajan, R. and Yadavalli, V.S.S. (2004). A
study on a two unit standby system with Erlangian repair time.
Asia-Pacific Journal of Operational Research, 21 (3), 271-277.
Cowing, M.M.; Pate-Cornell, M.E. and Glynn, P.W. (2004). Dynamic
modeling of the tradeoff between productivity and safety in
critical engineering systems. Reliability Engineering & System
Safety, 86 (3), 269-284.
Cui, L.R.; Loh, H.T. and Xie, M. (2004). Sequential inspection
strategy for multiple systems under availability requirement.
European Journal of Operational Research, 155 (1), 170-177.
DeLong, T.A.; Smith, D.T. and Johnson, B.W. (2005).
Dependability metrics to assess safety-critical systems. IEEE
Transactions on Reliability, 54, 498-505.
Dominguez-Garcia, A.D.; Kassakian, J.G. and Schindall, J.E.
(2006). Reliability evaluation of the power supply of an electrical
power net for safety-relevant applications. Reliability Engineering
& System Safety, 91, 505-514.
Faller, R. (2004). Project experience with IEC 61508 and its
consequences. Safety Science, 42 (5), 405-422.
Gibson, G.A. and Patterson, D.A. (1993). Designing disk arrays
for high data reliability. Journal of Parallel and Distributed
Computing, 17, 4-27.
Goble, W.M. (1998). Control Systems Safety Evaluation and
Reliability, 2nd ed: ISA.
Hokstad, P. and Corneliussen, J. (2004). Loss of safety
assessment and the IEC 61508 standard. Reliability Engineering
& System Safety, 83 (1), 111-120.
IEC 61508 (1998). Functional safety of
electric/electronic/programmable electronic safety-related systems,
Parts 1-7, October 1998 - May 2000.
Inagaki, T. and Ikebe, Y. (1989). Performance analysis of a
safety monitoring system under human-machine interface of
safety-presentation type. Microelectronics and Reliability, 29 (2),
165-175.
Kang, H.G. and Jang, S.C. (2006). Application of condition-based
HRA method for a manual actuation of the safety features in a
nuclear power plant. Reliability Engineering & System Safety,
91, 627-633.
Katz, R.H.; Gibson, G.A. and Patterson, D. (1989). Disk system
architectures for high performance computing. Proceedings of the
IEEE, 77 (12), 1842-1858.
Kim, H.; Lee, H. and Lee, K. (2005). The design and analysis of
AVTMR (all voting triple modular redundancy) and dual-duplex
system. Reliability Engineering & System Safety, 88,
291-300.
Korczak, E.; Levitin, G. and Ben Haim, H. (2005). Survivability
of series-parallel systems with multilevel protection. Reliability
Engineering & System Safety, 66, 45-54.
Knegtering, B. and Brombacher, A.C. (1999). Application of micro
Markov models for quantitative safety assessment to determine
safety integrity levels as defined by the IEC 61508 standard for
functional safety. Reliability Engineering & System Safety, 66
(2), 171-175.
Latif-Shabgahi, G.; Bass, J.M. and Bennett, S. (2004). Taxonomy
for software voting algorithms used in safety-critical systems.
IEEE Transactions on Reliability, 53 (3), 319-328.
Lee, D.Y.; Han, J.B. and Lyou, J. (2004). Reliability analysis
of the reactor protection system with fault diagnosis. Key
Engineering Materials, 270, 1749-1754.
Levitin, G. (2004). A universal generating function approach for
the analysis of multi-state systems with dependent elements.
Reliability Engineering & System Safety, 66, 285-292.
Levitin, G. (2005). Uneven allocation of elements in linear
multi-state sliding window system. European Journal of Operational
Research, 163, 418-433.
Levitin, G.; Lisnianski, A.; Ben-Haim, H. and Elmakis, D. (1998).
Redundancy optimization for series-parallel multi-state systems.
IEEE Transactions on Reliability, 47 (2), 165-172.
Lisnianski, A. and Levitin, G. (2003). Multi-state System
Reliability, World Scientific, Singapore.
Levitin, G. (2005). The Universal Generating Function in
Reliability Analysis and Optimisation. Springer-Verlag: Berlin,
Springer Series in Reliability Engineering.
Marseguerra, M.; Zio, E. and Podofillini, L. (2004). A
multiobjective genetic algorithm approach to the optimization of
the technical specifications of a nuclear safety system.
Reliability Engineering & System Safety, 84 (1), 87-99.
Nunns, S.R. (2000). Conformity assessment of safety related
systems to IEC 61508 - the CASS initiative. Computing & Control
Engineering Journal, 11 (1), 33-39.
Olbrich, T.; Richardson, A.M.D. and Bradley, D.A. (1996).
Built-in self-test and diagnostic support for safety critical
microsystems. Microelectronics and Reliability, 36, 1125-1136.
Son, H.S. and Seong, P.H. (2003). Development of a safety
critical software requirements verification method with combined
CPN and PVS: a nuclear power plant protection system application.
Reliability Engineering & System Safety, 80 (1), 19-32.
Ushakov I., (1987). Optimal standby problems and a universal
generating function, Soviet Journal of Computer System Science, 25,
79-82.
Wang, D. and Inagaki, T. (1994). Time-dependent optimality of an
alarm subsystem. Microelectronics and Reliability, 34, 1623-1633.
Weber, W.; Tondok, H. and Bachmayer, M.B. (2005). Enhancing
software safety by fault trees: experiences from an application to
flight critical software. Reliability Engineering & System
Safety, 89, 57-70.
Yao, W.B.; Wang, D.S. and Zheng, W.M. (2004). A fault-tolerant
single-chip multiprocessor. ACSAC 2004, Proceedings of Advances in
Computer Systems Architecture: 9th Asia-Pacific Conference,
Pen-Cheng Yew and Jingling Xue (eds.), Berlin: Springer, pp.
137-145.
Zhang, T.L.; Long, W. and Sato, Y. (2003). Availability of
systems with self-diagnostic components: applying Markov model to
IEC 61508-6. Reliability Engineering & System Safety, 80, 133-141.
Zhang, T.L.; Xie, M. and Horigome, M. (2006). Availability and
reliability of k-out-of-(M plus N): G warm standby systems.
Reliability Engineering & System Safety, 91, 381-387.
Zhou, Z. (1987). Analysis of a two unit standby redundant
fail-safe system. Microelectronics and Reliability, 27, 469-474.
Appendix
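The transition rate matrix for one element is
$$\Lambda_j = \begin{pmatrix}
-c & \lambda_{su} & \lambda_{sd} & \lambda_{du} & \lambda_{dd} & 0 & 0 & 0 & 0 \\
0 & -(\lambda_{sd}+\lambda_{dd}) & 0 & 0 & 0 & \lambda_{sd} & 0 & \lambda_{dd} & 0 \\
\mu_{sd} & 0 & -(\lambda_{su}+\lambda_{du}+\mu_{sd}) & 0 & 0 & \lambda_{su} & \lambda_{du} & 0 & 0 \\
0 & 0 & 0 & -(\lambda_{sd}+\lambda_{dd}) & 0 & 0 & \lambda_{sd} & 0 & \lambda_{dd} \\
\mu_{dd} & 0 & 0 & 0 & -(\lambda_{su}+\lambda_{du}+\mu_{dd}) & 0 & 0 & \lambda_{su} & \lambda_{du} \\
0 & \mu_{sd} & 0 & 0 & 0 & -\mu_{sd} & 0 & 0 & 0 \\
0 & 0 & 0 & \mu_{sd} & 0 & 0 & -\mu_{sd} & 0 & 0 \\
0 & \mu_{dd} & 0 & 0 & 0 & 0 & 0 & -\mu_{dd} & 0 \\
0 & 0 & 0 & \mu_{dd} & 0 & 0 & 0 & 0 & -\mu_{dd}
\end{pmatrix} \qquad (A1)$$
where $c = \lambda_{sd} + \lambda_{dd} + \lambda_{du} + \lambda_{su}$.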
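The matrices $M_{1i}$ (i = 1, 2, 3, 4) for the fuel supply system are
given below. Columns correspond to $p_1, \ldots, p_9$; each $0_9$ is a
zero column of size $9 \times 1$, and $1_5$ and $0_5$ are unit and zero
columns of size $5 \times 1$ that occupy rows 5 to 9.
$$M_{11} = \begin{pmatrix}
1 & 0 & 0 & 0 & 0_9 & 0_9 & 0_9 & 0_9 & 0_9 \\
0.90 & 0.10 & 0 & 0 & & & & & \\
1 & 0 & 0 & 0 & & & & & \\
0.80 & 0 & 0 & 0.20 & & & & & \\
1_5 & 0_5 & 0_5 & 0_5 & & & & &
\end{pmatrix}, \quad
M_{12} = \begin{pmatrix}
1 & 0 & 0 & 0 & 0_9 & 0_9 & 0_9 & 0_9 & 0_9 \\
0.88 & 0.12 & 0 & 0 & & & & & \\
1 & 0 & 0 & 0 & & & & & \\
0.776 & 0 & 0 & 0.224 & & & & & \\
1_5 & 0_5 & 0_5 & 0_5 & & & & &
\end{pmatrix},$$
$$M_{13} = \begin{pmatrix}
1 & 0 & 0 & 0 & 0_9 & 0_9 & 0_9 & 0_9 & 0_9 \\
0.85 & 0.15 & 0 & 0 & & & & & \\
1 & 0 & 0 & 0 & & & & & \\
0.747 & 0 & 0 & 0.253 & & & & & \\
1_5 & 0_5 & 0_5 & 0_5 & & & & &
\end{pmatrix}, \quad
M_{14} = \begin{pmatrix}
1 & 0 & 0 & 0 & 0_9 & 0_9 & 0_9 & 0_9 & 0_9 \\
0.808 & 0.192 & 0 & 0 & & & & & \\
1 & 0 & 0 & 0 & & & & & \\
0.711 & 0 & 0 & 0.289 & & & & & \\
1_5 & 0_5 & 0_5 & 0_5 & & & & &
\end{pmatrix}. \qquad (A2)$$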
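The matrices $M_{2i}$ (i = 1, 2, 3) for the turbine block are given
below, with columns corresponding to $p_1, \ldots, p_9$ and with the
column vectors $0_9$, $1_5$ and $0_5$ as defined in the notation.
$$M_{21} = \begin{pmatrix}
1 & 0 & 0 & 0 & 0_9 & 0_9 & 0_9 & 0_9 & 0_9 \\
0.92 & 0.08 & 0 & 0 & & & & & \\
1 & 0 & 0 & 0 & & & & & \\
0.85 & 0 & 0 & 0.15 & & & & & \\
1_5 & 0_5 & 0_5 & 0_5 & & & & &
\end{pmatrix}, \quad
M_{22} = \begin{pmatrix}
1 & 0 & 0 & 0 & 0_9 & 0_9 & 0_9 & 0_9 & 0_9 \\
0.804 & 0.096 & 0 & 0 & & & & & \\
1 & 0 & 0 & 0 & & & & & \\
0.832 & 0 & 0 & 0.168 & & & & & \\
1_5 & 0_5 & 0_5 & 0_5 & & & & &
\end{pmatrix},$$
$$M_{23} = \begin{pmatrix}
1 & 0 & 0 & 0 & 0_9 & 0_9 & 0_9 & 0_9 & 0_9 \\
0.882 & 0.118 & 0 & 0 & & & & & \\
1 & 0 & 0 & 0 & & & & & \\
0.810 & 0 & 0 & 0.190 & & & & & \\
1_5 & 0_5 & 0_5 & 0_5 & & & & &
\end{pmatrix}. \qquad (A3)$$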
Gregory Levitin received a PhD degree in Industrial Automation
from Moscow Research Institute of Metalworking Machines in 1989.
From 1982 to 1990 he worked as software engineer and research
associate in the field of industrial automation. From 1991 to 1993
he worked at the Technion (Israel Institute of Technology) as a
postdoctoral fellow at the faculty of Industrial Engineering and
Management. Dr. Levitin is presently an engineer-expert at the
Reliability Department of the Israel Electric Corporation and
adjunct senior lecturer at the Technion. His current interests are
in operations research and artificial intelligence applications in
reliability and power engineering. In this field Dr. Levitin has
published over 100 papers and two books. He is senior member of
IEEE. He serves in editorial boards of IEEE Transactions on
Reliability and Reliability Engineering and System Safety.
Tieling Zhang received a Ph.D. in engineering from Tokyo
University of Mercantile Marine in 2001. He has six years
experience of teaching, three years working in industry and a few
years holding research positions. Currently he is with Hitachi GST,
Singapore. He has 30 articles included in peer-review journals and
international conference proceedings. He holds a new practical
patent of China. His research interests include system reliability,
maintainability and safety, system optimization and vibration
control.
Min Xie received his Ph.D. in Quality Technology from Linkoping
University, Sweden, in 1987. Dr Xie has been active in reliability
and quality related research since then. He has authored or
co-authored over 100 articles in refereed journals and 6 books,
including Software Reliability Modelling by World Scientific,
Statistical Models and Control Charts for High Quality Processes by
Kluwer Academic Publisher, and Weibull Models by John Wiley &
Sons. He is a department editor of IIE Transactions, an associate
editor of IEEE Trans on Reliability, and on the editorial board of
several other journals. He is a fellow of IEEE.
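Table 1. States of single element (rows: undetected failure;
columns: detected failure).

Undetected    Detected failure
failure       W      FSd    FDd
W             W      FS     FD
FSu           FS     FS     FD
FDu           FD     FD     FD

Compound diagonal entries of the transition rate matrix $\Lambda_j$
in Eq. (A1): $-(\lambda_{sd}+\lambda_{dd})$ (two rows),
$-(\lambda_{su}+\lambda_{du}+\mu_{sd})$ and
$-(\lambda_{su}+\lambda_{du}+\mu_{dd})$.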
[Fig. 1 node labels (detected, undetected): (W, W); (W, FSu);
(FSd, W); (FSd, FSu); (W, FDu); (FDd, W); (FSd, FDu); (FDd, FSu);
(FDd, FDu). The nodes are grouped into the element states W, FS and
FD, and the arcs are labeled with the failure and repair rates.]
[Fig. 2 block labels: fuel supply systems (elements 1 to 4) and
turbine blocks (elements 5 and 6).]
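Table 2. Structure function for pair of elements connected in
parallel (entries give the subsystem state).

              Element 1
Element 2     W      FS     FD
W             W      W      FD
FS            W      FS     FD
FD            FD     FD     FD

Table 3. Structure function for pair of elements connected in
series (entries give the subsystem state).

              Element 1
Element 2     W      FS     FD
W             W      FS     FD
FS            FS     FS     FD
FD            FD     FD     FD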