8/12/2019 QRMEC-Elsayed May 28 2013
1/47
Reliability and Maintainability
Engineering: An Overview
E. A. Elsayed
Department of Industrial and Systems Engineering
Rutgers University
New Jersey, USA
Brief Topics
Some initial thoughts and personal experience
Reliability definition
Reliability testing
Methods for improving reliability
Maintenance strategies: PM, FR, inspection policies
Condition-based maintenance based on current sensor technologies
Reliability Economics
Oil Pipeline Shutdown (Hardware Failure)
BP shuts oilfield August 8, 2006
Damaged pipeline in Alaska affects 8% of U.S. oil
production; crude surges; record gas prices seen.
The threat of a stoppage also endangers Alaska's budget: oil taxes account for more than 90 percent of its revenues.
BP officials have acknowledged they did not test the pipes
adequately using a so-called pig device, which is run
through a pipe to gauge corrosion (it uses ultrasound to detect corrosion).
Lesson: the need for inspection and maintenance strategies.
Reliability Engineering
Air Traffic Delays (Software Failure)
Nov. 19, 2009: A computer glitch caused flight
cancellations and delays across the U.S.
The problem involved the FAA computer systems
in Salt Lake City and Atlanta that handle
automated flight plans, forcing air traffic
controllers to revert to the much more time-consuming
approach of entering flight plans by hand.
Software failure (7,000 flights)
Global Impact: Oceanic Airspace
Data Link Communication Reliability
[Figure: oceanic data link communication between aircraft and Oakland ARTCC through ARINC and SITA ground earth stations (uplink). Labeled traffic: engine monitoring, AOC (Aircraft Operational Communication), position monitoring.]
Recall of Cars due to Degradation (Environmental Conditions)
October 1, 2012
General Motors Co. recalled more than 40,000 cars sold in warm-weather states because a plastic part might crack and
cause a fuel leak.
The recall affects vehicles sold or currently registered in Arizona, California, Florida, Nevada or Texas. Owners in
Arkansas and Oklahoma also are included in the recall of the
2009 Cobalt and G5.
The vehicles have plastic parts connected to the fuel pump
which could crack. If the crack gets large enough, fuel could
leak out of the vehicle and cause a fire.
Reliability Definitions Measurements
When you buy a product or service,
you request high quality and high reliability.
How do you measure it? What is high?
How long? Reliability: 0.99 at year 5, 0.999 at year 4
Time-dependent quality: reliability
How do companies predict reliability and estimate warranty?
Reliability of cold standby units
New tires and old tires
Some Initial Thoughts
Repairable and Non-Repairable
[Figure: reliability vs. time for a system with repairs and a system with no repairs, with the maximum reliability level marked.]
Another measure of reliability is availability (probability
that the system provides its functions when needed).
Some Initial Thoughts
Failure Rate During Life Cycle
Would you buy an additional warranty?
Burn-in and removal of early failures.
[Figure: failure rate vs. time with three regions: early failures, constant failure rate, increasing failure rate.]
Reliability Definitions
Reliability is a time dependent characteristic.
It can only be determined after an elapsed time but
can be predicted at any time.
It is the probability that a product or service will
operate properly for a specified period of time (design
life) under the design operating conditions without
failure.
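As a minimal sketch of this definition, the reliability at the design life can be evaluated once a life distribution is assumed. The example below assumes a constant failure rate (exponential life), which the slides do not state; the function name and the numbers are hypothetical.

```python
import math


def reliability_exponential(t, failure_rate):
    """R(t) = P(T > t) under an ASSUMED constant failure rate (exponential life)."""
    if t < 0 or failure_rate < 0:
        raise ValueError("t and failure_rate must be non-negative")
    return math.exp(-failure_rate * t)


# Hypothetical numbers: 0.002 failures per year and a 5-year design life.
design_life_years = 5.0
rate_per_year = 0.002
r_design = reliability_exponential(design_life_years, rate_per_year)
print(f"R({design_life_years:g} years) = {r_design:.4f}")
```

Any other life distribution (for example Weibull) could be substituted; only the body of the function would change.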
Other Measures of Reliability
Availability is used for repairable systems.
It is the probability that the system is operational at any random time t.
It can also be specified as the proportion of time that the system is available for use in a given interval (0, T).
Mission availability: play time, military products.
One-shot devices: missiles, standby generators.
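The "proportion of time" reading of availability can be sketched directly from an operating log. The up and down durations below are made-up values for illustration, and `interval_availability` is a hypothetical helper name.

```python
def interval_availability(uptimes, downtimes):
    """Fraction of the observed interval during which the system was available."""
    total_up = sum(uptimes)
    total = total_up + sum(downtimes)
    if total <= 0:
        raise ValueError("the observed interval must have positive length")
    return total_up / total


# Hypothetical log (hours): three operating periods, each followed by a repair.
uptimes_h = [400.0, 350.0, 450.0]
downtimes_h = [10.0, 20.0, 15.0]
availability = interval_availability(uptimes_h, downtimes_h)
print(f"Availability over the interval: {availability:.4f}")
```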
Other Measures of Reliability
Mean Time To Failure (MTTF): It is the average
time that elapses until a failure occurs.
It does not provide information about the distribution
of the TTF, hence we need to estimate the variance
of the TTF.
Mean Time Between Failure (MTBF): It is the
average time between successive failures.
It is used for repairable systems.
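A small sketch of why the MTTF alone is not enough: the two hypothetical samples of failure times below share the same mean but have very different spread, so the variance has to be estimated as well. The data are invented for illustration.

```python
import statistics

# Hypothetical times to failure (hours) for two designs with the same MTTF.
design_a = [90.0, 100.0, 110.0]
design_b = [10.0, 100.0, 190.0]

mttf_a = statistics.mean(design_a)
mttf_b = statistics.mean(design_b)
var_a = statistics.variance(design_a)  # sample variance (n - 1 in the denominator)
var_b = statistics.variance(design_b)

print(f"Design A: MTTF = {mttf_a:.1f} h, variance = {var_a:.1f}")
print(f"Design B: MTTF = {mttf_b:.1f} h, variance = {var_b:.1f}")
```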
Types of Reliability Testing
Highly accelerated life testing (HALT)
Highly accelerated stress screening (HASS)
Accelerated life testing (ALT)
Degradation testing (DT)
Accelerated degradation testing (ADT)
Acceptance test (AT)
Purposes and Needs
1. Investigate failure modes and correct designs
2. Improve reliability during development
3. Demonstrate reliability for accepting a design
4. Predict reliability
5. Eliminate units with manufacturing defects
(infant mortality failures)
6. Accept or reject a product
Idea of Accelerated Stress Testing
Compression and extrapolation
[Figure: life (MTTF) vs. stress. Life observed at a more severe condition is extrapolated to the unknown life at the use condition.]
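The extrapolation idea can be sketched as a regression of life on stress followed by a prediction at the use condition. The sketch assumes a log-linear life-stress relationship, ln(MTTF) = a + b * stress, which is an illustrative choice and not a model given in the slides; the stress levels and lives are synthetic.

```python
import math


def fit_log_linear(stresses, mttfs):
    """Least-squares fit of ln(MTTF) = intercept + slope * stress (ASSUMED model)."""
    n = len(stresses)
    logs = [math.log(m) for m in mttfs]
    mean_s = sum(stresses) / n
    mean_l = sum(logs) / n
    sxx = sum((s - mean_s) ** 2 for s in stresses)
    sxy = sum((s - mean_s) * (l - mean_l) for s, l in zip(stresses, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_s
    return intercept, slope


def predict_mttf(stress, intercept, slope):
    """Extrapolate the fitted relationship to another stress level."""
    return math.exp(intercept + slope * stress)


# Synthetic test results at three severe stress levels (arbitrary units),
# generated from ln(MTTF) = 10 - 0.03 * stress so the fit can be checked.
test_stresses = [100.0, 120.0, 140.0]
test_mttfs = [math.exp(10.0 - 0.03 * s) for s in test_stresses]

a, b = fit_log_linear(test_stresses, test_mttfs)
use_stress = 60.0  # hypothetical use condition, outside the tested range
print(f"Extrapolated MTTF at use condition: {predict_mttf(use_stress, a, b):.1f}")
```

The further the use condition lies from the tested stress levels, the more the prediction depends on the assumed model being right.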
Testing and Acceptance
Testing and Life Prediction:
Conduct extensive reliability testing for both
demonstration (acceptance of products) and life prediction.
Collect data continuously to maintain a constant assessment of the system reliability. Always assess the mean residual life to
determine the optimum time to replace the system, especially in the wear-out region.
Suppliers need to provide evidence of system reliability.
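The mean residual life mentioned above can be estimated empirically as the average remaining life of the units that survive beyond a given age. This is a minimal sketch with made-up failure times; `mean_residual_life` is a hypothetical helper.

```python
def mean_residual_life(failure_times, age):
    """Average remaining life of the units that are still working at `age`."""
    remaining = [t - age for t in failure_times if t > age]
    if not remaining:
        return 0.0  # no unit in the sample survives beyond this age
    return sum(remaining) / len(remaining)


# Hypothetical observed failure times (years).
failure_times_years = [2.0, 4.0, 6.0, 8.0, 10.0]
mrl_new = mean_residual_life(failure_times_years, 0.0)
mrl_at_5 = mean_residual_life(failure_times_years, 5.0)
print(f"MRL at age 0: {mrl_new:.1f} years, MRL at age 5: {mrl_at_5:.1f} years")
```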
Current cycle profile: -40 °C to 125 °C
New cycle profile: -65 °C to 170 °C
Open questions: stress type, stress loading, sample size, and duration of the test?
Summary of Reliability Testing
1 2 3 4 5 6
RDT X
RGT X
HALT X X
HASS X X
ALT X X
DT X X
ADT X X
Burn-in X
Acceptance X X
System Design (Cont'd)
System design: implicit redundancy
Pumps connected in series.
Two consecutive failures are required for the system to fail.
Failure Rate and Maintenance
Availability
Region 1, repair upon failure (FR)
Region 2, no PM
Region 3, PM, Condition-based, if possible.
[Figure: failure rate vs. time with three regions: early failures, constant failure rate, increasing failure rate.]
System State under Imperfect Maintenance
[Figure: system state X(t) vs. time t under imperfect maintenance. Marked on the plot: the labels R1, R1+ through R6, R6+, the intervals T1 > T2 > T3 > T4, the thresholds D_PM and D_F, and the distribution of the state after maintenance.]
$D_F$ is the failure threshold, while $D_{PM}$ is the preventive maintenance threshold.
Maintenance
Maintenance:
The time to perform maintenance is normally
recommended by the manufacturer for equipment operating under normal conditions. What is normal?
If the estimate of failure time is conservative, one risks replacing items before the end of their useful life.
If the estimate of failure time is optimistic, one risks
that the component or system may fail in service.
Use condition-based maintenance.
Technological Advances and Maintenance
Condition-based Maintenance
Recent advances in sensor technology,
chemical and physical non-destructive testing
(NDT), sophisticated measurement techniques,
information processing, wireless communications,
and internet capabilities have significantly impacted the condition-based maintenance approach by
providing dynamic maintenance schedules that
minimize cost and downtime and increase system
availability.
Condition-Based Maintenance Example
Otis Elevators / United Technologies
Remote Elevator Monitoring
It is an interconnected system of sensors,
monitors, circuits, hardware and software used
to collect, record, analyze and communicate
elevator data. It continuously monitors hundreds of different functions on elevators worldwide
(braking time, acceleration, deceleration, floor
level, door opening, forces on strands, ...).
Condition-Based Maintenance Example
If the system detects a problem, it analyzes and
diagnoses the problem's cause and location. It makes the service call and aids the Otis mechanic in
identifying the exact component(s) causing the
problem, which helps to facilitate a timely and
accurate resolution.
Issues:
Which component or subsystem?
What is the indicator?
What is the alarm level of the indicator?
What is the action?
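The four issues above map naturally onto a small rule table: component, indicator, alarm level, action. The sketch below is purely illustrative; the component names, indicator names, alarm levels, and actions are hypothetical and are not taken from the Otis system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndicatorRule:
    component: str      # which component or subsystem
    indicator: str      # which measured quantity is watched
    alarm_level: float  # reading at or above which the rule triggers
    action: str         # what to do when it triggers


def evaluate(rules, readings):
    """Return (component, action) for every rule whose reading reaches its alarm level."""
    triggered = []
    for rule in rules:
        value = readings.get((rule.component, rule.indicator))
        if value is not None and value >= rule.alarm_level:
            triggered.append((rule.component, rule.action))
    return triggered


# Hypothetical rules and sensor readings.
rules = [
    IndicatorRule("door", "opening_time_s", 4.0, "inspect door operator"),
    IndicatorRule("brake", "braking_time_s", 2.5, "schedule brake service"),
]
readings = {("door", "opening_time_s"): 4.6, ("brake", "braking_time_s"): 1.9}

alerts = evaluate(rules, readings)
for component, action in alerts:
    print(f"{component}: {action}")
```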
Degradation Indicators
Most mechanical components, such as gears,
brakes, and bearings, exhibit degradation before
failure.
Ductile materials have degradation indicators:
changes in their physical properties.
Electronic components such as resistors,
capacitors, and diodes exhibit changes in
performance (resistance drift, for example).
Polymers and elastomers change properties with
time and stress.
Degradation Indicators
One example of these indicators is hardness,
which is a measure of the degradation of elastomers.
Elastomeric materials are critical to many
applications, including hoses, seals, and dampers
of various types, and their hardness increases
over time to a critical level at which their ability
to absorb energy is severely degraded. This
may lead to cracks, excessive wear, and
related failure modes in components.
Continuous-time Continuous-state
Degradation (Indicators)
Examples
Light intensity degradation of light emitting diode (LED)
Metal crack propagation
Wear of tires, brake system wear out, nozzle blocking.
Increase of stiffness of energy absorbing material.
Strength loss of steel beams due to corrosion
Degradation Path
$$dX(t) = \mu\,dt + \sigma\,dW(t)$$
$$X(t) = x_0 + \mu t + \sigma W(t)$$
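Reading the degradation path as Brownian motion with drift, dX(t) = mu dt + sigma dW(t), a path can be simulated by summing independent normal increments. The sketch below uses a simple fixed-step discretization; the parameter values and the function name are hypothetical.

```python
import math
import random


def simulate_degradation_path(x0, mu, sigma, dt, n_steps, rng):
    """Simulate X(t) on a grid of step dt: each increment is N(mu*dt, sigma^2*dt)."""
    path = [x0]
    for _ in range(n_steps):
        increment = mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(path[-1] + increment)
    return path


# Hypothetical parameters: drift 0.5 per hour, diffusion 0.2, hourly readings.
rng = random.Random(42)  # fixed seed so the run is repeatable
path = simulate_degradation_path(x0=0.0, mu=0.5, sigma=0.2, dt=1.0, n_steps=20, rng=rng)
print(f"Simulated degradation after 20 hours: {path[-1]:.2f}")
```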
Brownian Motion: Parameters Estimation
Using the Maximum Likelihood Estimator we obtain the
parameters of the Brownian Motion degradation path as
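$$m = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad v = \frac{1}{n}\sum_{i=1}^{n} (y_i - m)^2$$
$$m = \mu\,\Delta t, \qquad v = \sigma^2\,\Delta t$$
$$\hat{\mu} = \frac{\sum_{i=1}^{n} y_i}{n\,\Delta t}, \qquad \hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (y_i - m)^2}{n\,\Delta t}$$
where $y_i$ is the $i$-th observed degradation increment over an interval of length $\Delta t$, and $n$ is the number of increments.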
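For readings taken at equally spaced times, the maximum likelihood estimates of the drift and the diffusion variance follow from the mean and the (1/n) variance of the increments, each divided by the sampling interval. The sketch below assumes equal spacing and uses invented readings; the function name is hypothetical.

```python
def estimate_drift_and_variance(readings, dt):
    """MLE of (mu, sigma^2) for Brownian motion with drift, equally spaced readings."""
    increments = [b - a for a, b in zip(readings, readings[1:])]
    n = len(increments)
    if n == 0 or dt <= 0:
        raise ValueError("need at least two readings and a positive dt")
    m = sum(increments) / n                         # mean increment
    v = sum((y - m) ** 2 for y in increments) / n   # MLE variance (divides by n)
    return m / dt, v / dt


# Hypothetical degradation readings taken every 2 hours.
readings = [0.0, 1.0, 4.0]
mu_hat, sigma2_hat = estimate_drift_and_variance(readings, dt=2.0)
print(f"mu_hat = {mu_hat:.3f}, sigma^2_hat = {sigma2_hat:.3f}")
```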
First Passage Time Distribution
[Figure: two degradation paths (degradation 1 and degradation 2) shown against two thresholds (threshold 1 and threshold 2).]
Relate Degradation to Failure Time
Distribution (First Passage Time)
T: failure time
$R(t) = \text{Prob}(\text{the first time for the degradation measure to cross the failure threshold} > t)$
[Figure: Degradation vs. Time. Four data sets (Data 1 to Data 4), each with an exponential fit, over time 0 to 200; a horizontal line marks the failure threshold (critical degradation).]
First Passage Time Distribution
In degradation modeling we set a degradation
threshold level at which the performance of the
system is considered unacceptable. The threshold level is $D_f$. The probability of failure can be
defined as
$$F(t) = P(T \le t)$$
The time $T$ to cross the threshold level is referred to
as the first passage time. It has a distribution, and its density function is given as shown next.
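For a Brownian motion degradation path with positive drift, the first passage time to a fixed threshold follows an inverse Gaussian distribution. That is a standard result, used here as an illustration rather than taken from the slides. The sketch evaluates its CDF and density; the parameter values and function names are hypothetical.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def first_passage_cdf(t, mu, sigma, threshold, x0=0.0):
    """F(t) = P(T <= t) for X(t) = x0 + mu*t + sigma*W(t) crossing `threshold`."""
    d = threshold - x0
    if d <= 0 or mu <= 0 or sigma <= 0:
        raise ValueError("need threshold > x0, mu > 0 and sigma > 0")
    if t <= 0:
        return 0.0
    s = sigma * math.sqrt(t)
    # Note: exp(2*mu*d/sigma^2) can overflow when sigma is very small.
    return (std_normal_cdf((mu * t - d) / s)
            + math.exp(2.0 * mu * d / sigma ** 2) * std_normal_cdf(-(mu * t + d) / s))


def first_passage_pdf(t, mu, sigma, threshold, x0=0.0):
    """Inverse Gaussian density of the first passage time."""
    d = threshold - x0
    if d <= 0 or mu <= 0 or sigma <= 0:
        raise ValueError("need threshold > x0, mu > 0 and sigma > 0")
    if t <= 0:
        return 0.0
    return (d / (sigma * math.sqrt(2.0 * math.pi * t ** 3))
            * math.exp(-((d - mu * t) ** 2) / (2.0 * sigma ** 2 * t)))


# Hypothetical parameters: drift 1.0, diffusion 1.0, failure threshold 10.
for t in (5.0, 10.0, 20.0):
    f_t = first_passage_cdf(t, mu=1.0, sigma=1.0, threshold=10.0)
    print(f"t = {t:5.1f}: F(t) = {f_t:.4f}, R(t) = {1.0 - f_t:.4f}")
```

Under this model the mean first passage time is (threshold - x0) / mu, which links the degradation parameters back to the MTTF.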
Failure Data in Oil and Gas Industry
OREDA database:
MTBF of a motor-driven dry-wet compressor unit
= 1.8 years (it is really MTTF)
Major failures: 80% of all forced outages are
caused by unforeseen liquid ingress into the
compressor
Failure Data in Oil and Gas Industry
OREDA database:
Seal failures: 80% of all seal failures are caused by contamination
and aging (elastomers).
Solution: Analysis of 11,000 mechanical seal failures from
148 different reliability contract and alliance plant
sites over two years shows that 13% of the seal failures are attributable to a lack of effective
corrective and preventive maintenance.