Reliability growth planning (RGP) is emerging as a promising technique for addressing the reliability challenges that arise in distributed manufacturing environments. Unlike RGT (reliability growth testing), RGP drives the reliability growth of new products across the product lifecycle, from design and prototyping through manufacturing to field use. It is a lifetime commitment to product reliability through systematic failure analysis, rigorous corrective actions, and cost-effective financial investment. RGP has been shown to be very effective, particularly for new product introductions under fast time-to-market requirements.
The RGP process will be introduced based on the three-phase product lifecycle: 1) design for reliability during early product development; 2) accelerated life testing and corrective actions in the pilot line stage; and 3) continuous reliability improvement following volume shipment. Trade-offs among reliability investment, warranty cost reduction, and customer satisfaction will be investigated from the perspectives of both the manufacturer and the customer. Reliability growth tools such as Crow/AMSAA, Pareto graphs, failure mode run charts, FIT (failure-in-time), and FMECA will be reviewed, and their roles in the RGP process will be discussed and demonstrated. Case studies drawn from the electronic equipment industry will be used to demonstrate RGP applications and justify their benefits.
In parallel with RGP, efforts have been devoted to developing optimal preventive maintenance programs based on either time-based or usage-based strategies. Recently, CBM (condition-based maintenance) has shown great potential for achieving just-in-time maintenance or zero-downtime equipment. RGP and maintenance strategies share a common objective, i.e., achieving high system reliability and availability. In this presentation, optimal maintenance policies will be devised in the context of system reliability growth.
Reliability Growth Planning: Its Concept, Applications, and Challenges
Tongdan Jin
Assistant Prof. of Industrial Engineering, Ingram School of Engineering
ASQ Reliability Division English Webinar Series
One of the monthly webinars on topics of interest to reliability engineers.
To view recorded webinar (available to ASQ Reliability Division members only) visit asq.org/reliability
To sign up for the free live webinars, which are open to anyone, visit reliabilitycalendar.org and select English Webinars to find registration links for upcoming events.
For a given design, the temperature factor $\pi_T$ and the environment factor $\pi_E$ play essential roles in the actual component reliability.
Aggregate Failure Rate for Hardware
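$$\lambda_{hw}=\sum_{i=1}^{k} n_i\lambda_i=\sum_{i=1}^{k} n_i\lambda_{0i}\,\pi_{Ti}\,\pi_{Ei}$$
$$E[\lambda_{hw}]=\sum_{i=1}^{k} n_i\lambda_{0i}\,E[\pi_{Ti}]\,E[\pi_{Ei}]$$
$$\operatorname{var}(\lambda_{hw})=\sum_{i=1}^{k} n_i^{2}\lambda_{0i}^{2}\operatorname{var}(\pi_{Ti}\pi_{Ei})$$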
Where
k = number of types of devices used in the product.
$n_i$ = quantity of ith type of device used in the product.
$\lambda_{0i}$ = base failure rate for ith type of device.
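A minimal Python sketch of the two moment formulas for $\lambda_{hw}$, assuming you already have, for each device type, $n_i$, $\lambda_{0i}$, the means of $\pi_{Ti}$ and $\pi_{Ei}$, and the variance of their product. The `DeviceType` class, the `hw_rate_moments` function, and all numbers are hypothetical illustrations, not data from the webinar.

```python
from dataclasses import dataclass


@dataclass
class DeviceType:
    """One device type i in the product (all values below are hypothetical)."""
    n: int            # n_i: quantity of this device type in the product
    lam0: float       # λ_0i: base failure rate, faults/hour
    mean_pi_t: float  # E[π_Ti]: mean temperature factor
    mean_pi_e: float  # E[π_Ei]: mean environment factor
    var_pi_te: float  # var(π_Ti π_Ei): variance of the combined stress factor


def hw_rate_moments(devices):
    """Return (E[λ_hw], var(λ_hw)) per the aggregate hardware formulas.

    Assumes π_T and π_E are independent, so E[π_T π_E] = E[π_T] E[π_E],
    and that different device types vary independently of one another.
    """
    mean = sum(d.n * d.lam0 * d.mean_pi_t * d.mean_pi_e for d in devices)
    # n_i**2: the formula treats all n_i parts of type i as sharing one stress factor
    var = sum(d.n ** 2 * d.lam0 ** 2 * d.var_pi_te for d in devices)
    return mean, var


# Hypothetical two-type bill of materials, for illustration only
bom = [
    DeviceType(n=10, lam0=1e-7, mean_pi_t=2.0, mean_pi_e=1.5, var_pi_te=0.25),
    DeviceType(n=4, lam0=5e-8, mean_pi_t=1.2, mean_pi_e=1.0, var_pi_te=0.04),
]
mean_hw, var_hw = hw_rate_moments(bom)
print(f"E[lambda_hw] = {mean_hw:.3e} faults/h, var = {var_hw:.3e}")
```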
[Figure: ASIC temperature distribution. Histogram of ASIC quantity by temperature bin (<65, [65, 70), [70, 75), [75, 80), [80, 85), [85, 90), >90 °C) with a fitted pdf.]
Challenges in Modeling Non-Hardware Failures
1. Quite often data is not well recorded
2. Varies from one product line to another
3. Process related
4. Design experience
5. Other random factors
Triangle Models for Non-Hardware Failures
$$g(\lambda)=\begin{cases}\dfrac{2(\lambda-a)}{(b-a)(c-a)}, & a\le\lambda\le c\\[4pt]\dfrac{2(b-\lambda)}{(b-a)(b-c)}, & c<\lambda\le b\\[4pt]0, & \text{otherwise}\end{cases}$$
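Where:
a = the smallest possible value of the failure rate
b = the largest possible value of the failure rate
c = the most likely value; since the mean of the triangle distribution is $(a+b+c)/3$, it is estimated as $c = 3\bar{\lambda} - b - a$, where $\bar{\lambda}$ is the sample mean of the dataset
[Figure: triangle density $g(\lambda)$ over $[a, b]$, rising to its peak height $h = 2/(b-a)$ at $\lambda = c$.]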
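The piecewise density can be checked numerically. This hypothetical `triangular_pdf` helper is a sketch that evaluates $g(\lambda)$ and confirms that it integrates to about 1 and peaks at $2/(b-a)$; the parameter values are made up for illustration.

```python
def triangular_pdf(lam, a, c, b):
    """Triangle-model density g(λ): minimum a, most likely value c, maximum b."""
    if not (a <= c <= b and a < b):
        raise ValueError("need a <= c <= b and a < b")
    if lam < a or lam > b:
        return 0.0                      # 'otherwise' branch
    if lam == c:
        return 2.0 / (b - a)            # peak height h
    if lam < c:
        return 2.0 * (lam - a) / ((b - a) * (c - a))   # rising edge, a <= λ < c
    return 2.0 * (b - lam) / ((b - a) * (b - c))       # falling edge, c < λ <= b


def midpoint_area(f, lo, hi, steps=10_000):
    """Crude midpoint-rule integral, used only to check the density sums to 1."""
    width = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * width) for k in range(steps)) * width


# Hypothetical parameters in units of 1e-6 faults/hour
a, c, b = 1.0, 1.5, 3.0
peak = triangular_pdf(c, a, c, b)
area = midpoint_area(lambda x: triangular_pdf(x, a, c, b), a, b)
print(peak, round(area, 4))
```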
Example for Non-Hardware Failure Estimate
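Example: Historical data from predecessor products show that the failure rates attributable to manufacturing issues were $1.2\times10^{-6}$, $1.4\times10^{-6}$, and $2.4\times10^{-6}$ faults/hour. Then:
$\bar{\lambda} = (1.2\times10^{-6} + 1.4\times10^{-6} + 2.4\times10^{-6})/3 \approx 1.67\times10^{-6}$
a = $1.2\times10^{-6}$
b = $2.4\times10^{-6}$
c = $3\bar{\lambda} - a - b = 1.4\times10^{-6}$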
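A sketch of the estimation step, assuming the moment-matching rule $c = 3\bar{\lambda} - a - b$ with $a$ and $b$ taken as the sample minimum and maximum. With exactly three data points this rule returns the middle value, so the three rates in the example give $c = 1.4\times10^{-6}$. The function name `fit_triangle` is hypothetical.

```python
from statistics import mean


def fit_triangle(rates):
    """Estimate (a, c, b) for the triangle model from historical failure rates.

    a and b are the sample minimum and maximum; c is chosen so the triangle's
    mean (a + b + c) / 3 equals the sample mean.
    """
    a, b = min(rates), max(rates)
    c = 3 * mean(rates) - a - b
    if not a <= c <= b:
        # Can happen with more than three points; the moment match then fails
        raise ValueError(f"estimated mode {c:.3g} lies outside [{a:.3g}, {b:.3g}]")
    return a, c, b


# Manufacturing-related failure rates (faults/hour) from the example
a, c, b = fit_triangle([1.2e-6, 1.4e-6, 2.4e-6])
print(f"a={a:.2e}, c={c:.2e}, b={b:.2e}")
```

With more than three points the moment-matched $c$ can fall outside $[a, b]$; raising an error there, rather than silently clamping, is a design choice of this sketch.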
Combining HW and Non-HW Failure Rate
$$\lambda_{sys}=\lambda_d+\lambda_s+\lambda_m+\lambda_p+\lambda_o+\sum_{i=1}^{k} n_i\lambda_i$$
Where:
$\lambda_d$ = failure rate of design weakness
$\lambda_s$ = failure rate of software
$\lambda_m$ = failure rate of manufacturing
$\lambda_p$ = failure rate of process
$\lambda_o$ = failure rate of other issues (e.g., NFF, no fault found)
k = total number of HW component types
$\lambda_i$ = failure rate for component type i
$n_i$ = quantity of component type i
Confidence Intervals for Failure Rate
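$$\lambda_{sys}=\lambda_d+\lambda_s+\lambda_m+\lambda_p+\lambda_o+\sum_{i=1}^{k} n_i\lambda_i$$
$$\sigma^2_{\lambda_{sys}}=\sigma^2_{\lambda_d}+\sigma^2_{\lambda_s}+\sigma^2_{\lambda_m}+\sigma^2_{\lambda_p}+\sigma^2_{\lambda_o}+\sum_{i=1}^{k} n_i^{2}\sigma^2_{\lambda_i}$$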
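[Figure: distribution of $\lambda_{sys}$ with its mean and confidence bounds expressed in multiples of $\sigma_{\lambda_{sys}}$.]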
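A sketch of how the mean and variance of $\lambda_{sys}$ combine, assuming, as the summed variances imply, that every category and component type is independent. The function `system_rate_moments` and the input values are hypothetical.

```python
def system_rate_moments(non_hw, hw):
    """Return (mean, variance) of λ_sys.

    non_hw: {category: (mean rate, variance)} for design, software,
            manufacturing, process, and other (e.g. NFF) failures
    hw:     [(n_i, mean λ_i, variance of λ_i)] per component type
    All terms are treated as independent, which is what lets variances add.
    """
    mean = sum(m for m, _ in non_hw.values()) + sum(n * m for n, m, _ in hw)
    var = sum(v for _, v in non_hw.values()) + sum(n ** 2 * v for n, _, v in hw)
    return mean, var


# Hypothetical inputs (faults/hour and (faults/hour)^2)
non_hw = {"design": (3e-6, 1e-12), "software": (1e-6, 4e-13)}
hw = [(10, 2e-7, 1e-14), (2, 5e-7, 4e-14)]
mu_sys, var_sys = system_rate_moments(non_hw, hw)
print(f"mu_sys = {mu_sys:.3e}, sigma_sys = {var_sys ** 0.5:.3e}")
```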
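Application to Reliability Design (cont'd)
$$\mu_{sys}=E[\lambda_{nonHW}]+E[\lambda_{HW}]=1.13\times10^{-5}$$
$$\sigma^2_{sys}=\operatorname{var}(\lambda_{nonHW})+\operatorname{var}(\lambda_{HW})=2.23\times10^{-11}$$
[Figure: distribution of $\lambda_{sys}$ centered at $\mu_{sys}$; the upper bound $2.43\times10^{-5}$ leaves a 0.3% tail.]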
MTBF with 99.7% Confidence
MTBF estimate with confidence:
$$\Pr\{MTBF \ge t\} \ge 99.7\% \iff \Pr\{\lambda_{sys} \le 1/t\} \ge 99.7\%, \qquad MTBF = \frac{1}{\lambda_{sys}}$$
MTBF(99.7%) = 41,115 hours
Neutral MTBF estimate:
The mean PCB failure rate is $1.13\times10^{-5}$ faults/hour, so MTBF = $1/(1.13\times10^{-5})$ ≈ 88,100 hours.
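The 99.7% bound can be reproduced approximately under the assumption that $\lambda_{sys}$ is roughly normal. The one-sided 99.7% normal quantile is about 2.75, and $1.13\times10^{-5} + 2.75\sqrt{2.23\times10^{-11}} \approx 2.43\times10^{-5}$, which matches the slide's upper bound. A minimal sketch using Python's standard `statistics.NormalDist`; the helper `mtbf_lower_bound` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mtbf_lower_bound(mu, var, confidence=0.997):
    """Return (upper bound on λ_sys, lower bound on MTBF) at the given confidence.

    Treats λ_sys as normal with mean mu and variance var (an assumption).
    """
    z = NormalDist().inv_cdf(confidence)   # one-sided quantile, about 2.75 at 99.7%
    lam_upper = mu + z * sqrt(var)         # Pr{λ_sys <= lam_upper} = confidence
    return lam_upper, 1.0 / lam_upper      # Pr{MTBF >= 1/lam_upper} = confidence


mu_sys, var_sys = 1.13e-5, 2.23e-11        # rounded values from the slide
lam_upper, mtbf_99_7 = mtbf_lower_bound(mu_sys, var_sys)
neutral_mtbf = 1.0 / mu_sys
print(f"lambda upper = {lam_upper:.3e}/h, MTBF(99.7%) = {mtbf_99_7:,.0f} h")
print(f"neutral MTBF = {neutral_mtbf:,.0f} h")
```

With the rounded inputs shown on the slide this gives an MTBF of roughly 41,200 hours; the slide's 41,115 and 88,100 hours presumably come from an unrounded mean slightly above $1.13\times10^{-5}$.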
Topic Two: Failure Mode Rate & Failure-In-Time
Pareto Chart for Failure Modes
Difficulties:
• Static view
• No trend for each failure mode
• Fails to reflect product MTBF
[Figure: Pareto by failure mode, January to March. Quantity per failure mode, in chart order: Relays, Resistors, No Fault Found, Cold Solder, Software Bug, Op-Amp; bars broken down by No C/A, C/A In Process, and C/A Complete, with a percentage line.]
[Figure: Pareto chart by failure mode, April to June. Quantity per failure mode, in chart order: Op-Amp, Resistors, Cold Solder, Relays, Software Bug, No Fault Found; same C/A breakdown and percentage line.]
Note: C/A = corrective action.
Failure Mode Rate (FMR)
$$FMR=\frac{\text{failures for a type of FM}}{\text{field product installations}}$$
FMR Estimation: Example
For example, assume 120 PCBs were shipped and installed in the field in the first quarter, and 5 failures were returned due to poor solder joints. Then the FMR for poor solder joints in the first quarter is 5/120 ≈ 0.042 (4.2%).
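A sketch of FMR tracked per quarter, which gives the per-mode trend that a static Pareto chart lacks. Only the Q1 figures (5 failures, 120 installations) come from the example; the Q2 numbers and the `failure_mode_rate` helper are hypothetical.

```python
def failure_mode_rate(failures, installations):
    """FMR = failures for one failure mode / field product installations."""
    if installations <= 0:
        raise ValueError("installations must be positive")
    return failures / installations


# Poor solder joints: Q1 is from the example, Q2 is hypothetical
solder_history = {"Q1": (5, 120), "Q2": (3, 150)}
trend = {q: failure_mode_rate(f, n) for q, (f, n) in solder_history.items()}
for quarter, fmr in trend.items():
    print(f"{quarter}: FMR = {fmr:.3f}")
```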
Where:
$\lambda_d$ = failure rate of design errors
$\lambda_s$ = failure rate of software bugs
$\lambda_m$ = failure rate of manufacturing
$\lambda_p$ = failure rate of process
$\lambda_o$ = failure rate of other issues
$\lambda_i$ = failure rate for component type i
k = total number of new component types
$n_i$ = quantity of component type i used in the product
FIT-Based Reliability Driven: Example (1)
FM Category Target MTBF (hrs) Target FIT
Overall Product 50,000 20,000
Components (hardware) 117,647 8,500
Others (NFF) 250,000 4,000
Design 333,333 3,000
Manufacturing 500,000 2,000
Process 666,667 1,500
Software 1,000,000 1,000
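Notice: $FIT = 10^9/MTBF$, where FIT counts failures per $10^9$ hours; the categorical target FITs add up to the overall product target of 20,000 FIT.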
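A sketch of the FIT bookkeeping behind the table, assuming FIT means failures per $10^9$ hours. Because failure rates add across categories, the categorical FIT targets sum to the product target, and converting back gives the 50,000-hour MTBF. The helper name `fit_to_mtbf` is hypothetical.

```python
# Categorical target FITs from the table (failures per 1e9 hours)
target_fit = {
    "Components (hardware)": 8_500,
    "Others (NFF)": 4_000,
    "Design": 3_000,
    "Manufacturing": 2_000,
    "Process": 1_500,
    "Software": 1_000,
}


def fit_to_mtbf(fit):
    """MTBF in hours for a constant failure rate given in FIT."""
    return 1e9 / fit


product_fit = sum(target_fit.values())   # rates add, so FITs add
for name, fit in target_fit.items():
    print(f"{name:<24}{fit_to_mtbf(fit):>12,.0f} h")
print(f"{'Overall product':<24}{fit_to_mtbf(product_fit):>12,.0f} h ({product_fit} FIT)")
```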
FIT-Based Reliability Driven: Example (2)
[Table columns: Product Target FIT | Categorical FM FIT | Failure Mode | Target FIT | Current FIT | Ownership]
Cumulative operating time is 4,800 hours and the total number of failures is 14, so the current MTBF = 4800/14 ≈ 343 hours.
Which FM should be fixed, given a limited budget?
Given a $10 budget for corrective actions:
Option one: fix relays, MTBF = 4800/(14 − 2.5) ≈ 417 hours
Option two: fix all others, MTBF = 4800/(14 − 9) = 960 hours
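The comparison of the two options can be sketched as a point MTBF (cumulative hours divided by remaining failures), assuming, as the slide does, that a corrective action removes a known expected number of failures: 2.5 for the relay fix and 9 for fixing all other modes. The function `mtbf_after_ca` is hypothetical.

```python
def mtbf_after_ca(cum_hours, failures, failures_removed=0.0):
    """Point MTBF: cumulative operating hours / failures expected to remain."""
    remaining = failures - failures_removed
    if remaining <= 0:
        raise ValueError("no failures would remain; the ratio estimate breaks down")
    return cum_hours / remaining


current = mtbf_after_ca(4800, 14)
options = {"fix relays": 2.5, "fix all others": 9}   # expected failures removed
results = {name: mtbf_after_ca(4800, 14, removed) for name, removed in options.items()}
best = max(results, key=results.get)
print(f"current MTBF = {current:.0f} h")
for name, mtbf in results.items():
    print(f"{name}: {mtbf:.0f} h")
print("preferred option:", best)
```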
New Reliability Growth Model
1. Failure mode based growth prediction
2. Reliability growth subject to CA budget constraints
3. No assumption of parametric models
4. CA effectiveness function
Why Need the CA Effectiveness Function?
Limited resources ($) are spent on CA through:
1. Retrofit
2. ECO
The goal is to maximize reliability growth within these limits, which requires a CA effectiveness function.
An Example: ECO or Retrofit
A type of relay used on a PCB module fails repeatedly due to a known failure mechanism. Two corrective-action options are available:
1. Replace all on-board relays when a failed module is returned.
2. Proactively recall all modules and replace their relays with a new type of relay having much higher reliability.
CA Option Cost ($) CA Effectiveness
ECO Low Low
Retrofit High High
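Modeling CA Effectiveness
Effectiveness model: $h(x) = \left(\dfrac{x}{c}\right)^{b}$, $0 \le x \le c$, where x is the CA budget ($), and b and c are to be determined.
[Figure: effectiveness h(x) versus CA budget ($), rising from 0 at x = 0 to 1 at x = c; curves shown for b > 1, b = 1, and b < 1.]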
$$\text{Effectiveness}=\frac{\text{Failure rate before CA}-\text{Failure rate after CA}}{\text{Failure rate before CA}}$$
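A minimal sketch of the power-law effectiveness model and its inverse, which gives the budget needed to reach a target effectiveness. The budget cap `c = 10` and the exponents are hypothetical; in practice b and c still have to be determined, as the slide notes. With b < 1 early spending buys most of the improvement, with b > 1 the improvement comes late, and b = 1 is linear.

```python
def ca_effectiveness(x, b, c):
    """Power-law CA effectiveness h(x) = (x / c) ** b for a budget 0 <= x <= c."""
    if not 0 <= x <= c:
        raise ValueError("budget x must lie in [0, c]")
    return (x / c) ** b


def budget_for_effectiveness(h, b, c):
    """Invert h(x): budget x = c * h ** (1 / b) that reaches effectiveness h."""
    if not 0 <= h <= 1:
        raise ValueError("effectiveness h must lie in [0, 1]")
    return c * h ** (1.0 / b)


c = 10.0  # hypothetical budget ($) at which effectiveness reaches 1
for b in (0.5, 1.0, 2.0):  # concave, linear, convex curves
    print(b, round(ca_effectiveness(5.0, b, c), 3))
```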
An Example
The current failure rate of a type of relay is $2\times10^{-8}$ faults per hour. Upon implementation of the CA, the rate is reduced to $5\times10^{-9}$. The CA effectiveness can be expressed as 0.75, that is, $(2\times10^{-8} - 5\times10^{-9})/(2\times10^{-8}) = 0.75$.
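The same calculation as a sketch, with a second hypothetical helper that applies a known effectiveness to predict the post-CA rate; both function names are illustrative, not from the webinar.

```python
def observed_effectiveness(rate_before, rate_after):
    """Effectiveness = (rate before CA - rate after CA) / rate before CA."""
    if rate_before <= 0:
        raise ValueError("rate_before must be positive")
    return (rate_before - rate_after) / rate_before


def rate_after_ca(rate_before, effectiveness):
    """Predicted failure rate after a CA of the given effectiveness."""
    return rate_before * (1.0 - effectiveness)


eff = observed_effectiveness(2e-8, 5e-9)   # relay example, faults/hour
print(eff)
print(rate_after_ca(2e-8, eff))
```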