Failure Analysis of Engineering Systems
Instructor: Professor Steve Maher
Module 5:
Scripture of the Module
Some review of Module 3
8 – Failure Mode Assessment and Assignment (FMA&A)
9 – Pedigree Analysis
10 – Change Analysis
Scripture of the Module
“The plans of the diligent lead to profit as surely as haste leads to poverty.”
- Proverbs 21:5
Failure Analysis of Engineering Systems ENGR 5323
Assignment
Read Chapters 8, 9, and 10 of Systems Failure Analysis
Do quizzes on Bb as they appear
Quiz next week at the beginning of class
Some Module 3 “Leftovers”
From Module 3 Quiz
We have 1000 parts that have run at an average of 800 hours each. 20 of them have failed. What is the failure rate?
$\text{MTBF} = \text{total service hours} / \#\text{ failed} = (1000)(800)/20 = 40{,}000$ hours, so $\lambda = 1/\text{MTBF} = 2.5\times10^{-5}$ failures per hour.
For the parts in Question 17, what is the probability that a part will run to 1000 hours without failing?
$P_s = e^{-\lambda t} = e^{-(2.5\times10^{-5})(1000)} = 0.9753$ (or a 97.53% chance of running that long without failing).
Some more info: $P_F = 1 - P_s = 0.0247$, i.e., a 2.47% chance of failing; ~25 parts will fail by 1000 hours of operation, or ~5 more between 800 and 1000 hours.
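The quiz arithmetic above can be checked with a short Python sketch. It assumes the same constant-failure-rate (exponential) model used in the answers, with λ = 1/MTBF; the function name `reliability_summary` is illustrative, not from the textbook.

```python
import math

def reliability_summary(n_parts, avg_hours, n_failed, mission_hours):
    """Constant-failure-rate (exponential) model, as in the quiz answers."""
    total_hours = n_parts * avg_hours                     # total accumulated service hours
    mtbf = total_hours / n_failed                         # MTBF = total service hours / # failed
    failure_rate = 1.0 / mtbf                             # lambda = 1 / MTBF, failures per hour
    p_success = math.exp(-failure_rate * mission_hours)   # P_s = e^(-lambda t)
    p_failure = 1.0 - p_success                           # P_F = 1 - P_s
    expected_failures = n_parts * p_failure               # expected parts failed by mission_hours
    return mtbf, failure_rate, p_success, p_failure, expected_failures

mtbf, lam, ps, pf, n_fail = reliability_summary(1000, 800, 20, 1000)
print(f"MTBF = {mtbf:.0f} h, lambda = {lam:.1e} per hour")
print(f"P_s = {ps:.4f}, P_F = {pf:.4f}, expected failures = {n_fail:.1f}")
```

The expected-failures figure (about 24.7) is where the "~25 parts" estimate comes from.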
From Module 3 Quiz
Fig 7.1: Event B has a failure rate of $10^{-4}$ per hour. The part is operated for 100 hours. The probability of event C happening is 0.005. What is the probability that command event A will occur?
$P_B = 1 - e^{-\lambda t} = 1 - e^{-(10^{-4})(100)} \approx 0.00995$
OR gate, so $P_A = P_B + P_C - P_B P_C \approx 0.00995 + 0.005 - 0.00005 \approx 0.015$, or a 1.5% chance of Event A happening.
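A minimal sketch of the Fig 7.1 calculation, assuming B and C are independent and that B's probability over the operating period is $1 - e^{-\lambda t}$; `prob_from_rate` and `or_gate` are hypothetical helper names.

```python
import math

def prob_from_rate(failure_rate, hours):
    """Probability that an event with a constant failure rate occurs within `hours`."""
    return 1.0 - math.exp(-failure_rate * hours)

def or_gate(p_b, p_c):
    """Two-input OR gate with independent inputs: P_A = P_B + P_C - P_B * P_C."""
    return p_b + p_c - p_b * p_c

p_b = prob_from_rate(1e-4, 100)   # Event B: lambda = 1e-4 per hour, operated 100 hours
p_c = 0.005                       # Event C: probability given directly
p_a = or_gate(p_b, p_c)
print(f"P_B = {p_b:.5f}, P_A = {p_a:.3f}")
```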
From Module 3 Quiz
Fig 7.3: The system is operated for 100 hours. The failure rate for B is $2\times10^{-5}$. The failure rate for C is $5\times10^{-6}$. What is the probability that command event A will occur?
Berk’s Overall FA Process
Designate a team
Gather all related information
Review and define problem
Identify all potential failure causes
List causes in FMA & A
Converge on root cause
Determine Corrective Actions
Implement Corrective Actions
Assess Corrective Actions
Evaluate for Preventive Actions
Incorporate FA Findings
What is FMA&A?
FMA&A = Failure Mode Assessment and Assignment
It is a tool to help manage the evaluation of each of the hypothesized failure causes.
It is generally a table; the textbook uses 4 columns:
– Event number
– Description of each hypothesized failure cause
– Likelihood assessment of each cause (updated as data becomes available)
– Actions necessary to evaluate the cause, and status of the evaluation (sometimes separate columns)
See Table 8.1: FMA&A for light bulb example
A spreadsheet (e.g., MS Excel) is an excellent tool for this; a word-processing tool (e.g., MS Word) can also be used
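Since a spreadsheet is the suggested tool, one way to picture the FMA&A is as a list of rows with the four textbook columns, exported to CSV for Excel. This is a sketch only; the class `FmaaRow` and the two light-bulb causes are hypothetical placeholders, not the contents of Table 8.1.

```python
import csv
import io
from dataclasses import dataclass

@dataclass
class FmaaRow:
    event_number: str             # tracking number (team chooses the scheme)
    description: str              # one hypothesized failure cause
    assessment: str = "Unknown"   # default until data says otherwise
    assignment: str = ""          # actions to evaluate the cause, and their status

def to_csv(rows: list[FmaaRow]) -> str:
    """Render the FMA&A as CSV text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Event", "Failure Cause", "Assessment", "Assignment"])
    for r in rows:
        writer.writerow([r.event_number, r.description, r.assessment, r.assignment])
    return buf.getvalue()

# Hypothetical causes, for illustration only
rows = [FmaaRow("1", "Filament fatigue"), FmaaRow("2", "Power surge")]
print(to_csv(rows))
```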
Hypothesized Failure Causes
Each row of the table is a hypothesized failure cause
Each of the causes is briefly described
Can develop hypothesized causes with any method, then “map” them to a row/column
List causes or inducing events only
– In FTA terms, do not list command events
– Focus on basic failures, human errors, normal events, inhibiting conditions, and undeveloped events (using FTA terms)
Typically repeats the earlier activity of identifying potential causes (described in Modules 2-3)
– Easier to work with a table than with diagram(s)
– Saves time, less confusion
Event Number Column
Each of the hypothesized failure causes is numbered
This is used for tracking and organizational purposes
Can develop hypothesized causes with any method
– Textbook uses an FTA example
– Each team or individual can choose a numbering system
List and assign numbers to causes or inducing events only
– In FTA terms, do not list/assign command events
– Focus the list/numbering on basic failures, human errors, normal events, inhibiting conditions, and undeveloped events (using FTA terms)
Assessment Column
Describes the assessment for each of the hypothesized failure causes
Default for each cause is “Unknown”
As analysis proceeds, each cause will be updated using terms such as:
– “Unlikely” = evaluation showed no problem or no lead
– “Likely” = cause likely found but not conclusive
– “Confirmed” = we found it! (At least the objective evidence indicates it)
For FA (i.e., the failure has already occurred), probabilities do not really matter that much…
– Can be used as a guide to prioritize evaluation
– Should NOT be used by itself to update status
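The rule that probabilities guide priority but never update status can be sketched in code. In this hypothetical model (the names `Assessment`, `Cause`, `evaluation_order`, and `update_assessment` are illustrative), probability only sorts the open causes, and an assessment changes only when a piece of evidence is recorded with it.

```python
from dataclasses import dataclass, field
from enum import Enum

class Assessment(Enum):
    UNKNOWN = "Unknown"       # default for every new cause
    UNLIKELY = "Unlikely"     # evaluation showed no problem or no lead
    LIKELY = "Likely"         # cause likely found but not conclusive
    CONFIRMED = "Confirmed"   # objective evidence indicates this is it

@dataclass
class Cause:
    event_number: str
    description: str
    probability: float = 0.0                       # optional estimate, used only as a guide
    assessment: Assessment = Assessment.UNKNOWN
    evidence: list[str] = field(default_factory=list)

def evaluation_order(causes):
    """Open causes, highest estimated probability first.
    Probability sets the ORDER of work; it never changes an assessment."""
    still_open = [c for c in causes if c.assessment in (Assessment.UNKNOWN, Assessment.LIKELY)]
    return sorted(still_open, key=lambda c: c.probability, reverse=True)

def update_assessment(cause, new_status, evidence):
    """Change status only when objective evidence is recorded with it."""
    if not evidence:
        raise ValueError("assessment changes need objective evidence")
    cause.assessment = new_status
    cause.evidence.append(evidence)

# Hypothetical causes and estimates
causes = [Cause("1", "Filament fatigue", 0.2),
          Cause("2", "Power surge", 0.6),
          Cause("3", "Loose socket", 0.1)]
update_assessment(causes[2], Assessment.UNLIKELY, "socket contact inspected: no anomaly")
print([c.event_number for c in evaluation_order(causes)])
```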
Assignment Column
Defines the actions necessary to evaluate each of the hypothesized failure causes
Review each hypothesized cause (i.e., row by row) and determine the actions necessary to evaluate it
– Evaluation needs to be objective (i.e., fact-based), not subjective (i.e., opinion-based)
– Focus on ruling the cause in or out
– Be careful not to jump to conclusions
– Be careful not to rule causes out too quickly and without evaluation
Best to have ONE owner and a DUE DATE for each action
Status is updated as the actions are completed (in this column or an additional one)
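One owner and one due date per action also make it easy to build a meeting agenda from overdue items. A sketch, with hypothetical owners, dates, and tasks; the `Action` class and `overdue` function are illustrative names.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Action:
    event_number: str     # FMA&A row this action evaluates
    task: str             # what must be done to rule the cause in or out
    owner: str            # ONE named owner
    due: date             # a DUE DATE
    status: str = "Open"  # updated as the action is completed

def overdue(actions, today):
    """Open actions whose due date has passed: agenda items for the next review."""
    return [a for a in actions if a.status == "Open" and a.due < today]

# Hypothetical actions
actions = [
    Action("1", "Section filament and examine under microscope", "Lee", date(2024, 3, 1)),
    Action("2", "Pull line-voltage records for the failure date", "Diaz", date(2024, 3, 8)),
]
late = overdue(actions, today=date(2024, 3, 5))
print([(a.event_number, a.owner) for a in late])   # [('1', 'Lee')]
```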
Point of Emphasis
Do not touch any hardware or software from the failed system until you have defined an organized, systematic, and objective manner in which to proceed
Follow-On Activities (Team)
Meet regularly, as often as priority, severity, and urgency require
– High profile: at least daily
– Low profile: at least weekly is recommended
– Use FMA&A to guide the meeting
Execute assigned actions and update status
– Include findings in the Assignment column
– Assessment and Assignment change based on the data
– Clearly indicate items completed and ruled out (e.g., by shading the row)
Distribute updates to the team and stakeholders on a regular basis (e.g., after each team meeting)
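For distributing updates, a plain-text report can stand in for row shading. This sketch assumes each FMA&A row is a simple tuple `(event_number, description, assessment, closed)`; the function name `status_report` and the `[X]` marker are arbitrary choices.

```python
def status_report(rows):
    """Build a text update from FMA&A rows.
    rows: list of (event_number, description, assessment, closed) tuples."""
    lines = []
    for event, desc, assessment, closed in rows:
        mark = "[X]" if closed else "[ ]"   # [X] plays the role of a shaded row
        lines.append(f"{mark} {event}: {desc} ({assessment})")
    open_count = sum(1 for row in rows if not row[3])
    lines.append(f"{open_count} of {len(rows)} causes still open")
    return "\n".join(lines)

# Hypothetical rows
rows = [
    ("1", "Filament fatigue", "Unlikely", True),
    ("2", "Power surge", "Likely", False),
]
print(status_report(rows))
```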
Individual Approaches
Suggest you use FMA&A or a similar format
– Some organizations use an FA Log or similar tool
– You want something to capture your thoughts, planned actions, and status updates
Update someone regularly, as often as priority, severity, and urgency require
– Go no more than 2 weeks; weekly is recommended
– Use FMA&A or FA Log to guide the meeting
Execute planned actions and update status
– Document findings as you go
– Adjust plans based on the data
– Clearly indicate items completed and ruled out
Have updates ready to distribute when needed
Performing the Evaluations
Pedigree analysis (Ch 9)
Change analysis (Ch 10)
Analytical equipment (Ch 11)
Mechanical and electronic component failures (Ch 12)
Leaks (Ch 13)
Contamination (Ch 14)
Design Analysis (Ch 15)
Statistical Considerations (Ch 16)
Design of Experiments (Ch 17)
Group Activity
* Discuss Scenario (pg. 72-73)
* Document answers to questions in a file
* Email file to me (one per team)
* After emailing, take a 5-10 min break
* Re-convene about ____
Which approach does your organization (or do you) follow?
Do you think your failure analysis approach needs to change?
If so, what can you do to initiate a change?
Pedigree Analysis
Berk’s Overall FA Process
Designate a team
Gather all related information
Review and define problem
Identify all potential failure causes
List causes in FMA & A
Converge on root cause
Determine Corrective Actions
Implement Corrective Actions
Assess Corrective Actions
Evaluate for Preventive Actions
Incorporate FA Findings
Overall Process Flow for Diagnosing Root Cause of a Failure
Confirm the Failure
Characterize the Failure
Isolate the Failure
Isolate the Defect
Identify the Defect
Determine Root Cause
What is a Pedigree? And How Do You Analyze It?
Product or System Pedigree = essentially the history of the product
– Describes the design of the product
– How it was built
– That it was built in accordance with specs, codes, etc.
Documents in a pedigree:
– Records of how it was built
– Records of material used
– Conformance to drawing and material requirements
Will a suspect condition be revealed by an analysis of the pedigree?
Value of Reviewing the Pedigree
If it addresses the suspect area, examine the pedigree to see if there is something suspicious
– Anomalies in test results
– Non-conformities found in inspections
– Missing items or documents
If it does not address the suspect area, perhaps the pedigree should
– Recommend adding it for future builds
– Can be part of corrective action to prevent future failures
Examining the Pedigree
Purchase orders
Nonconformance documentation
Inspection records
Test data
Calibration data
Drawings and specifications
Drawing changes
Work instructions
Certificates of conformance
Surprisingly…
Shipped systems do not always meet all of their requirements
– Many pedigree reviews reveal that the product had (or has) a problem
– In some cases, the pedigree is directly related to the failure
Not necessary to check the ENTIRE pedigree
– That may be a massive undertaking
– Only review areas that relate to the hypothesized causes
Pedigree can be suspect
– Errors, omissions, or even fraud can happen
– Certificates of conformance are not a guarantee
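Limiting the review to areas tied to the hypothesized causes can be sketched as a lookup from suspect area to pedigree documents. The mapping `PEDIGREE_INDEX` and the function `pedigree_review_scope` are entirely hypothetical; areas with no documents come back as gaps, which are candidates for adding to the pedigree of future builds.

```python
# Hypothetical mapping from suspect area to the pedigree documents that cover it
PEDIGREE_INDEX = {
    "gasket material": ["Purchase orders", "Certificates of conformance",
                        "Drawings and specifications"],
    "torque": ["Work instructions", "Inspection records", "Calibration data"],
    "leak test": ["Test data", "Calibration data", "Nonconformance documentation"],
}

def pedigree_review_scope(suspect_areas):
    """Collect only the documents tied to the hypothesized causes (not the ENTIRE
    pedigree). Areas the pedigree does not cover are returned as gaps."""
    docs, gaps = [], []
    for area in suspect_areas:
        found = PEDIGREE_INDEX.get(area)
        if not found:
            gaps.append(area)          # pedigree does not address this suspect area
            continue
        for d in found:
            if d not in docs:          # review each document once
                docs.append(d)
    return docs, gaps

docs, gaps = pedigree_review_scope(["gasket material", "torque", "leak test", "solder joints"])
print(docs)
print(gaps)
```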
Example: Tragedy in Hawaii
Tour plane caught fire and crashed
Oil leaking into engine caused the fire
Oil filter gasket had melted
Gasket was made of wrong material
Maintenance, filter spec, and gasket spec were fine…
Gasket manufacturer noted different material
Certificate of conformance was missing
Gasket packing slip and certificate did not match
The mismatch slipped through and the nonconforming oil filter gasket was used, leading to the accident
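The Hawaii case turned on documents that did not agree. A receiving-inspection cross-check of the kind that could catch such a mismatch might look like this sketch; the function `check_receipt` and the material labels are hypothetical and do not reproduce the actual records from the accident.

```python
def check_receipt(spec_material, slip_material, cert_material):
    """Compare spec, packing slip, and certificate of conformance for one lot.
    cert_material is None when the certificate is missing. Returns a list of issues."""
    issues = []
    if cert_material is None:
        issues.append("certificate of conformance missing")
    else:
        if cert_material != spec_material:
            issues.append(f"certificate shows {cert_material!r}, spec requires {spec_material!r}")
        if cert_material != slip_material:
            issues.append("packing slip and certificate do not match")
    if slip_material != spec_material:
        issues.append(f"packing slip shows {slip_material!r}, spec requires {spec_material!r}")
    return issues

print(check_receipt("Material X", "Material X", "Material X"))   # []
print(check_receipt("Material X", "Material Y", None))
```

Any non-empty result would hold the lot for review instead of releasing it to maintenance.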
Non-conformance Does Happen
Anomalous Certificates of Conformance are not uncommon
– Typically not outright fraud
– Most are human error
Sometimes nonconforming material or system ships anyway
Sometimes everything looks fine but something is suspicious
– Follow-up independent verification may be needed
– Additional inspections, testing, etc. can be sought
Change Analysis
What is Change Analysis?
If a system WAS working, what changed?
Need to determine if a change occurred and if the change induced the failure.
Options:
– Nothing changed!
– Failure was happening, but not observed
– Failure occurs within normal statistical variation
– Change occurred, but unrelated to failure
– A change induced the failure
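To judge whether a failure count is within normal statistical variation, one common approach (assumed here, not taken from the slides) is to treat failures as a Poisson process at the historical rate and ask how likely the observed count is. The rate, exposure hours, the 0.05 cutoff, and the function name `prob_at_least` are illustrative assumptions.

```python
import math

def prob_at_least(k, expected):
    """Poisson tail P(X >= k) for a failure count with mean `expected`."""
    below = sum(math.exp(-expected) * expected**i / math.factorial(i) for i in range(k))
    return 1.0 - below

hist_rate = 2.5e-5          # hypothetical historical failure rate, failures per hour
exposure = 200_000          # hypothetical service hours in the period under review
mu = hist_rate * exposure   # expected failures if nothing changed (about 5)

for observed in (7, 12):
    p = prob_at_least(observed, mu)
    # 0.05 is an arbitrary illustrative cutoff, not a rule from the text
    verdict = "within normal variation" if p > 0.05 else "unusual: look for a change"
    print(f"{observed} failures: P(X >= {observed}) = {p:.3f} -> {verdict}")
```

A small tail probability does not prove a change caused the failures; it only says the "normal statistical variation" option is a poor explanation.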
Things That Can Change
Design
Manufacturing Process
Test and Inspection
Environment
Lot Changes (Manufacturing variation)
Aging
Supplier Changes
Design Changes
Controlled design change
“Redlined” design change
Rejected material: “use as is” or “repair”
Outsourced components and subassemblies
Process Changes
Work or Build Instructions
– Many companies do not have instructions
– Imprecise work instructions
– Little rigor on changes
– Changes to equipment, tooling, settings not documented
– People not following the instructions
Investigate the documentation for changes
Investigate for non-documented changes
SPC (statistical process control) data can help and provide a starting point
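As a starting point, SPC data can flag when a process parameter shifted. This sketch uses an individuals (I-MR) control chart, with limits at the baseline mean ± 3 × (average moving range / 1.128); the baseline and recent values, and the helper names, are hypothetical.

```python
import statistics

def individuals_limits(baseline):
    """I-MR chart limits: center +/- 3 * sigma_hat, sigma_hat = mean moving range / 1.128."""
    center = statistics.mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma_hat = statistics.mean(moving_ranges) / 1.128   # d2 constant for subgroups of 2
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat

def out_of_control(values, lcl, ucl):
    """Indices of points outside the control limits: candidate process changes."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

# Hypothetical process parameter readings (units arbitrary)
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 10.0, 9.9]
lcl, center, ucl = individuals_limits(baseline)
recent = [10.0, 10.1, 9.9, 10.9, 11.0]
print(out_of_control(recent, lcl, ucl))   # [3, 4]
```

The flagged points only mark *when* something moved; the documentation and non-documented change reviews above still have to find *what* changed.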
Test and Inspection Changes
Investigate for issues in testing
– Failures returned to manufacturing
– Reworked systems or components
– Results out of the ordinary
– Changes in the test process
Investigate inspection processes and results
– Change of inspectors
– Change of instructions
– Items noted, but the system continued anyway
Changes in test or inspection equipment
Environmental Changes
Temperature and Humidity issues
– Curing or drying of materials
– Non-environmentally controlled processes
– Investigate if failure correlates to temperature/humidity
Storage
– Investigate changes in environment or procedure
– Epoxies and raw material may be sensitive
– Moving locations can induce changes
Shipping
Lot and Supplier Changes
Manufacturing has normal variation
– Sometimes failures correlate to supplier lots
– May be related to material distributions
– Investigate if failure correlates to supplier lots
– Need to understand the supplier’s processes
Aging
Suppliers can change materials or designs
– Purchased supplies may still meet specs
– Investigate for changes that affect the system
– Information can be difficult and sensitive to obtain
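Checking whether failures correlate with supplier lots can start with a simple per-lot failure tally. The lot IDs, counts, and the function name `failure_rate_by_lot` in this sketch are hypothetical; a real analysis would also ask whether differences between lots are statistically significant.

```python
from collections import Counter

def failure_rate_by_lot(units):
    """units: iterable of (lot_id, failed) pairs.
    Returns {lot_id: (failed_count, total_count, failure_fraction)}."""
    totals, fails = Counter(), Counter()
    for lot, failed in units:
        totals[lot] += 1
        if failed:
            fails[lot] += 1
    return {lot: (fails[lot], totals[lot], fails[lot] / totals[lot]) for lot in sorted(totals)}

# Hypothetical field data: 100 units from each of three supplier lots
units = ([("L1", False)] * 98 + [("L1", True)] * 2 +
         [("L2", False)] * 99 + [("L2", True)] * 1 +
         [("L3", False)] * 90 + [("L3", True)] * 10)

rates = failure_rate_by_lot(units)
for lot, (f, n, r) in rates.items():
    print(f"{lot}: {f}/{n} failed ({r:.0%})")
```

Here lot L3 stands out, which would point the investigation at that lot's material and the supplier's processes for it.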
Example from Textbook: CBU-87/B Cluster Bomb