Page 1
2/26/2019
1
The Next Bhopal
Paul Gruhn, P.E., CFSE
Global Functional Safety Consultant
Abstract
The precursors that led to the Bhopal disaster occur daily throughout industry today
This presentation will summarize:
• How we have historically looked at accidents
• A portion of the Bhopal process design
• Changes that were made contrary to specifications
• Problems encountered
• Numerous further design and operational changes
• The events that led to the worst industrial disaster in history… it took five years
• Similarities that are still occurring world‐wide
• How we might better prevent future events
Paul Gruhn, P.E., CFSE
Global Functional Safety Consultant at aeSolutions
Safety Systems Specialist for 30 years
ISA Life Fellow, active volunteer since 1989
Co‐chair and 28 year member of ISA 84 (SIS) committee
Developer & primary instructor for ISA’s courses on Safety Instrumented Systems (8.5 days of material)
Co‐author of ISA book on SIS
BSME, IIT, Chicago, IL
2019 ISA President
Kenneth Bloch
Process Safety Supervisor
30+ years experience in maintenance, process safety, technical and operational roles
Author of “Rethinking Bhopal”
Environmental Science degree from Lamar University, Beaumont, TX
Much of this material comes from…
Other great references:
• Safeware ‐ System Safety and Computers, Nancy G. Leveson
• What Went Wrong? Case Histories of Process Plant Disasters, Trevor A. Kletz
• An Engineer's View Of Human Error, Trevor A. Kletz
• Learning from Accidents, Trevor A. Kletz
• Drift into failure, Sidney Dekker
Introduction
Dealing with chronic pump seal problems led to misguided process‐related shortcuts
How could this have happened?
• The plant was patterned after a successful US facility
• However, changes were made that did not match the design intention or specifications
• The pattern of manipulating a process to compensate for issues is not unusual
“The road to hell is paved with good intentions”
• “If the original designers…”
• If a US operator were transplanted…
Summary of the disaster (Dec 3 1984)
1. Plant undergoing maintenance
2. Worker connects water hose to flush lines and valves
3. Water leaks into MIC tank and a reaction begins
4. Temperature and pressure build and a leak starts
5. Workers detect and report the leak
6. Vapor recovery system off and not designed for such a load
7. Piping to flare badly corroded
8. Material released above makeshift water curtain
9. 28 tons of toxic vapor were released, which settled over the capital city
10. 3,800 dead, 200,000 injured, 2,000 animals died, environment severely impacted
11. Facility never reopened
12. US operations suspended
13. Parent company never recovered, and all divisions were eventually sold
AIChE CCPS, being proactive, the Titanic…
Looking for causes
Often an image of corporate misconduct, greed, and/or irresponsible cost cutting
• Responsible, proud people who believe in safety
There is no such thing as a “precise cause”
• Cultural causes are very similar, though
Equipment reliability, process reliability, productivity, and process safety are linked
• Asset reliability is a fundamental aspect of preventing industrial disasters
• Much depends upon how small issues are managed (as repeat failures have led to accidents)
• Repeat failures also lead to normalization of deviance (e.g., Texas City)
It’s not so much what they did, but why
It’s not so much “How can situations like this be prevented?” but rather
“What makes the regrettable choices people make even possible?”
1. Why did the design not follow the US plant?
2. Why did a worker connect a water hose to flush out lines?
3. Why was material building up inside the piping in the first place?
History and process summary
The product to be made was a success
Yet the process to make it was neither cooperative nor originally efficient
MIC was an intermediate product
MIC needed to be:
1. cooled,
2. covered with a nitrogen blanket,
3. contained in a process constructed of stainless steel
US facility operated safely for over a decade and served as the template for Bhopal
Other safety layers
• high temperature alarms,
• diluting heat sink (if adequate cooling could not be restored),
• reserve tanks (for additional cooling, reprocessing, or disposal),
• pressure relief,
• vent scrubber,
• flare
Unfortunately, there were dependencies between all these layers
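Why dependencies between layers matter can be shown with simple arithmetic. The sketch below is not taken from the Bhopal record: the per-layer failure probabilities, the shared-cause probability, and the function names are all hypothetical, chosen only to show the shape of the problem.

```python
from math import prod

# Hypothetical probabilities of failure on demand (PFD) for each layer.
# These values are illustrative assumptions, not Bhopal data.
LAYER_PFD = {
    "high temperature alarm": 0.1,
    "diluting heat sink": 0.1,
    "reserve tank": 0.1,
    "pressure relief": 0.01,
    "vent scrubber": 0.1,
    "flare": 0.1,
}


def pfd_if_independent(pfds):
    """All layers must fail; with true independence the PFDs multiply."""
    return prod(pfds)


def pfd_with_common_cause(pfds, p_common):
    """Simple model: a shared cause (probability p_common) defeats every
    layer at once; otherwise the layers fail independently."""
    return p_common + (1.0 - p_common) * prod(pfds)


independent = pfd_if_independent(LAYER_PFD.values())
dependent = pfd_with_common_cause(LAYER_PFD.values(), p_common=0.01)

print(f"independent layers:     {independent:.1e}")
print(f"with a 1% shared cause: {dependent:.1e}")
```

Under these assumed numbers, the combined result is dominated by the shared cause. Adding more layers barely helps if they can all be defeated together.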
Facility layout (1979)
Facility siting issue
Design change
Vent header piping, valves, and vent gas scrubber made of iron
• Such an exception could be justified with the nitrogen blanket
• Might even be considered today (Value Eng.)
“Can be managed” can lead to compromised thinking and normalization of deviance
• When you finally appreciate what you’ve lost…
When suggesting changes, state not only the what, but the why
• Excluding the why can lead to confusion behind the recommendation, especially when the original designers are no longer around
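One way to make the "why" hard to omit is to build it into the change record itself. This is a minimal sketch, not a description of any actual management of change system: the `DesignChange` class and its field names are hypothetical, and the example text paraphrases the iron piping exception described on this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignChange:
    """A proposed change; the field names are illustrative assumptions."""
    what: str
    why: str
    original_intent: str

    def __post_init__(self):
        # Reject records that describe the change but not its reasoning.
        for name in ("what", "why", "original_intent"):
            if not getattr(self, name).strip():
                raise ValueError(f"'{name}' must not be empty")


change = DesignChange(
    what="Use iron instead of stainless steel for the vent header",
    why="Judged acceptable while the nitrogen blanket is maintained",
    original_intent="Process constructed with stainless steel",
)
print(change.why)
```

Recording the reasoning makes the dependency visible to whoever inherits the design: the exception holds only while the nitrogen blanket is in place.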
Note: Not the piping at Bhopal
MIC tank, pumps, vent and nitrogen lines
Pump problems from the start
Pump seals lasting 45 months would be ‘average’
Seals at Bhopal only lasted 24 days
Considering the number of pumps, there were repairs about every 5 days
Specific details of the failure mechanism are not in the public record
• Repeat failures are urgent warning signs
• Simply diagnosing ‘vibration’ is not helpful (ex.)
• Misdiagnosed vibration problems have led to many repeat failures in industry (Ken’s book has examples)
• Not diagnosing chronic reliability problems will lead to loss of control of a process
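The seal figures on this slide can be put side by side with a few lines of arithmetic. The number of pumps is not stated here, so the sketch derives only what the quoted figures imply. It assumes that every pump fails about once per seal life and uses an approximate month length.

```python
DAYS_PER_MONTH = 30.4  # average month length, an approximation

average_seal_life_days = 45 * DAYS_PER_MONTH  # 'average' seal life
bhopal_seal_life_days = 24                    # observed at Bhopal
days_between_repairs = 5                      # site-wide repair interval

# How many times shorter than 'average' the Bhopal seal life was.
shortfall = average_seal_life_days / bhopal_seal_life_days

# If every pump fails about once per seal life, repairs arrive every
# (seal life / number of pumps) days, so the stated figures imply:
implied_pumps = bhopal_seal_life_days / days_between_repairs

print(f"seal life shortfall: about {shortfall:.0f}x")
print(f"implied number of pumps: about {implied_pumps:.1f}")
```

A seal life roughly 57 times shorter than average is not a maintenance nuisance. It is a sign that something fundamental about the service conditions differs from the design basis.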
Normalization of deviance
Detecting leaks was audio/visual/olfactory
Workers found they could respond to leaks without serious problems
Chest and eye irritation became normal for those living near the plant
No immediate solution could be found
Repairs became routine, but the financial impact was severe
Pump problems led to secondary issues…
Secondary issues
Pump failures cause irregular instrument readings (temperature)
• The meaningless information became normal
• High temperature alarm in the control room disabled
Repeat failures and exposure became normal
• Yet a false sense of security
Leaks are a sign of a problem worth analyzing
• All failures deserve to be addressed
Acceptance of such problems is normalization of deviance
• Bad things are usually the result
Task force (1981)
Operating at 1/3 of capacity not sustainable
Task force created with members from the parent company and the subsidiary
Nitrogen could also be used to move MIC (rather than use transfer pumps)
• From 2 to 25 psig
• Pressure information now useless
Yet the circulation pumps still failed
Other problems soon surfaced as a result of the changes
More improvisations
Using nitrogen to pressurize the tank interrupted its flow to the vent header lines and the vent gas scrubber
Use of iron piping and valves led to
• Rust, which led to
• Trimer formation, which led to
• Choking of pipes and
• Failure (leakage) of valves
Dealing with trimer now became the priority
Flushing the lines with water was the answer
• Yet this led to further problems
Further corrosion and repairs
Process not designed for invasive maintenance
• Attaching water hoses to pressure gauge taps
Yet water increased the corrosion of iron pipes
• Deep corrosion pits formed
Leaks not tolerable and required repair
• Replacing corroded sections of the pipes
Water now a more likely source of MIC contamination
Note: These are not pipes at Bhopal!
Safety vs. maintainability
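[Figure: pipe joint options placed between two opposing scales, inherently safer design and maintainability; a joint that scores higher on one scores lower on the other. Labels: weld joint, flange joint, flange joint with spacer, slip blind, spacer]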
A fatality (Dec 1981)
Repairs often required the use of blinds
One worker was sprayed while unbolting a flange
He inhaled vapors while at the safety shower
Workers felt the problem was an unforgiving, maintenance intensive process
Supervisors disagreed
Workers’ union insisted upon design modifications, yet supervisors refused
The divide between the workers and supervisors grows
25 workers injured (Jan 1982)
One pump seal replaced with a different material (ceramic, not metallic)
• Corporate did not authorize the change
The seal failed catastrophically after 2 days
Workers were not wearing breathing masks
Supervisors considered the change sabotage
3rd party investigation triggered
Seal failure cannot be tolerated, so…
Circulation pumps shut off
• Pressurize the tank, rather than cool it
• Such a practice is still supported in industry
The divide widens
Supervisors would not agree to changes
• Neither side communicating or negotiating well
Workers acted independently to protect themselves (e.g., use of blinds avoided)
Workers took their complaints to the public
• Publicity campaign with pamphlets
• Public demonstrations at the factory gate
• The public showed little interest
• Supervisors terminated certain individuals
• Workers came back for employment with their morale broken
Note: These are not Bhopal workers!
Operating far beyond the original design
1. The process monitoring temperature and pressure gauges were of no value
2. The temperature alarms had been disabled
3. The refrigeration unit was not in use most of the time
4. Chronic valve leaks replaced chronic pump seal failures
5. Leaking pump seals, although less frequent than before, were still a periodic exposure hazard
6. Water was being introduced regularly into process piping sections to flush out copious amounts of trimer deposits
7. Sections of iron pipe in the vent headers were corroding from the inside out and were in constant need of repair
Must either change to stainless steel piping and valves, or find the root cause of the pump seal problems
• Not possible with the plant in financial crisis
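The gap between design intent and actual operation can be laid out as a simple comparison. The sketch below is illustrative only. The keys and wording paraphrase this presentation, the "excluded" entry for water is an assumed reading of the design intent, and `find_deviations` is a hypothetical helper name.

```python
# Design intent as summarized in this presentation. The as-operated
# values paraphrase the numbered list above. Keys are illustrative labels.
DESIGN_INTENT = {
    "MIC cooling (refrigeration)": "in service",
    "high temperature alarm": "enabled",
    "temperature and pressure gauges": "meaningful",
    "vent header material": "stainless steel",
    "water in process piping": "excluded",  # assumed intent
}

AS_OPERATED = {
    "MIC cooling (refrigeration)": "not in use most of the time",
    "high temperature alarm": "disabled",
    "temperature and pressure gauges": "of no value",
    "vent header material": "iron",
    "water in process piping": "introduced regularly for flushing",
}


def find_deviations(intent, actual):
    """Return {item: (intended, found)} for every mismatch."""
    return {
        item: (wanted, actual.get(item, "unknown"))
        for item, wanted in intent.items()
        if actual.get(item) != wanted
    }


deviations = find_deviations(DESIGN_INTENT, AS_OPERATED)
for item, (wanted, found) in deviations.items():
    print(f"{item}: intended '{wanted}', found '{found}'")
print(f"{len(deviations)} of {len(DESIGN_INTENT)} items deviate")
```

Such a comparison is only possible if the design basis is still available, which is part of the argument for documenting it.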
An independent audit (May 1982)
The team’s composition caused a dependency that limited their effectiveness
• Corporate staff members were already familiar with decisions made (and previously approved)
• It put them in an uncomfortable position (e.g., interrupting refrigeration was not marked as a deviation)
• Design and operation not according to original specifications
Audit findings (July 1982)
The audit team praised the facility for their workaround solutions!
• Yet the team did not transfer any of that knowledge back to the US facility
No immediate concerns requiring urgent attention
Several long term improvements recommended
• These managed the effects but did not address their cause
Failed to point out:
• Compliance issues to operate safely, the connection between trimer and failed valves, MIC pump reliability issues
Further improvisations (May 1983)
Considered a spare PVH
Used a jumper instead
• Corporate approved the change
The facility stopped replacing damaged sections of pipe…
Instead, clamps and weld overlays were applied as a quick fix
What would you do if you had worked here for five years?
Where can you pinpoint “blame” for all this?
RVVH starts to experience the same problems
Optimization and improvement program
Between ‘78 and ‘83 a loss of $7.5 MM
• Could not compete with local suppliers
Investors demanded the “bleeding” stop (Jan 1984)
An “optimization and improvement” program
• > 300 layoffs
• Reductions in compensation and benefits
• Savings of $1.25 MM
Process now less profitable, less stable, and more labor intensive than before
Investors couldn’t justify sinking more money
Factory abandoned in place
Facility shuts down (June 1984)
Refrigerant had leaked out, vent headers corroded and choked, valves leaking…
Two storage tanks idle with 25 tons of MIC
• Yet no way to convert to saleable product
Workers insecure about their future
• Wandered around the facility for weeks
What to do? (Oct 1984)
Facility up for sale
MIC and other hazardous materials on site
• Raw materials must be consumed by Jan ’85 (collaboration agreement expiration date)
August meeting to coordinate liquidation
Decided to go into production one last time
Temporary workers found
Short term repairs done
Production run planned to take 15 days
Circulation pump seal failed on 12th day
Relief systems choked and valves leaking
• Can’t transfer material to derivatives section
The last straw (Dec 1984)
Water flush required
Water flush worked, but…
Exothermic reaction started
Leak patrol party reported a leak
Little they could do at this point
• No pumps, spare tanks, cooling, heat sink, scrubber, flare, or water curtain
28 tons of material released
• 3,800 dead, 200,000 injured, 2,000 animals died, environment severely impacted
AIChE CCPS, being proactive, the Titanic…
Conclusion
It’s not enough to find out what people were doing at the time
It’s more important to find out why they were doing it (so document your design ‘whys’)
How widespread might these issues be?
• Actual design differs from intent and specification (e.g., if stainless steel had been used…)
• Fundamental issues not resolved (e.g., if the pump problems had been resolved…)
When confronting a problem, look for ways to eliminate it, not control it
Have you seen cases of instruments not installed correctly, operating procedures that differ from original intent, design basis no longer available…
Lest you think I’m joking…
When the Bhopal plant manager was informed of the accident, he actually said in disbelief,
“The gas leak just can’t be from my plant. The plant is shut down. Our technology just can’t go wrong. We just can’t have leaks.”
He eventually went to prison
Bhopal safety officer…
Indiana student example…
DuPont mercaptan release video
What is this a graph of?
Happy Thanksgiving!
Surprise!!
Turkey Well Being
Historical information and recommendations
Knowing what happened the day of an event is interesting…
Newtonian thinking is too simplified
Kletz’s suggestions have been time:
• Process hazards analysis
• Training
• Procedures
• Inspections and testing
• Control of management of change
• User friendly designs
• Better management
Process safety management (per 29 CFR 1910.119)
• Trade secrets
• Compliance audits
• Emergency planning and response
• Incident investigation
• Management of change
• Hot work permit
• Mechanical integrity
• Pre‐startup safety review
• Contractors
• Training
• Operating procedures
• Process hazards analysis
• Process safety information
• Employee participation
Yet the instability may not be obvious
The tower is still standing…
Murphy’s law is wrong…
Everything that can go wrong usually goes right,
and then we draw the wrong conclusions. (Langewiesche)
What do safe plants and safe drivers have in common?
Not all Jenga towers (plants) are the same
What if we could tell how close the tower was to toppling!?
Health isn’t ‘binary’
There are varying levels of health
How long are you willing to tolerate high cholesterol?
Are you willing to tolerate being obese and diabetic?
What would it take to make you (or your company) change?
Complex systems and drifting into failure
Organizations are complex, messy, and have conflicting goals
• NASA: “Faster, Better, Cheaper”
Complex systems are bounded by three constraints (Jens Rasmussen)
The goal is to remain in the space bound by all three
Yet complex organizations, over time, may “Drift into Failure” (a book by Sidney Dekker)
Control theory can be used to keep the system bounded, and provide a visual indication!
Must adapt to unruly technology, along with pressures of scarcity and competition
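One way to picture "how close to the edge" is a simple margin indicator. The sketch below only illustrates the idea of tracking distance to each boundary and is not a control-theoretic model. The boundary names follow common summaries of Rasmussen's model, and the 0 to 1 scale and sample readings are hypothetical.

```python
def margin_bar(margin, width=20):
    """Render a margin in [0, 1] as a text bar; 0 means at the boundary."""
    clamped = max(0.0, min(1.0, margin))
    filled = round(clamped * width)
    return "#" * filled + "." * (width - filled)


def tightest_margin(margins):
    """Return (boundary, margin) for the boundary closest to being crossed."""
    name = min(margins, key=margins.get)
    return name, margins[name]


# Hypothetical readings: 1.0 = far from the boundary, 0.0 = at it.
margins = {
    "economic failure": 0.15,
    "unacceptable workload": 0.40,
    "acceptable safety performance": 0.10,
}

for name, value in margins.items():
    print(f"{name:32s} {margin_bar(value)} {value:.2f}")

boundary, value = tightest_margin(margins)
print(f"closest boundary: {boundary} ({value:.2f})")
```

The hard part in practice is choosing measurements that honestly reflect each margin. The display itself is trivial.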
So what else might we do?
1. Diversity of opinion and power
• A person or group that has the authority, credibility and courage to say “no” (HROs: aircraft carriers)
• Accidents and the call for PEs
2. Alteration of MOC practices
• Involve outsiders (for diversity)
3. Audits / assessments
• Insiders may not have the necessary diversity
Safety and reliability are connected!
Definition of Maturity Class and Mean Class Performance
Best in Class: top 20% of aggregate performance scores
• 90% OEE (Overall Equipment Effectiveness)
• 0.2% Repeat Accident Rate
• 0.05 Injury Frequency Rate
• 2% Unscheduled Asset Downtime
Industry Average: middle 50% of aggregate performance scores
• 85% OEE
• 2.4% Repeat Accident Rate
• 0.9 Injury Frequency Rate
• 6% Unscheduled Asset Downtime
Laggard: bottom 30% of aggregate performance scores
• 76% OEE
• 10% Repeat Accident Rate
• 3 Injury Frequency Rate
• 14% Unscheduled Asset Downtime
Source: Aberdeen Group
Having an Executive Sponsor is key!
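The class figures above are easier to compare as ratios against Best in Class. The sketch below simply restates the listed numbers. The dictionary keys and the `ratio_to_best` helper are illustrative names, not part of the Aberdeen Group source.

```python
# Mean class performance as listed above (Aberdeen Group figures).
CLASSES = {
    "Best in Class": {"oee_pct": 90, "repeat_accident_pct": 0.2,
                      "injury_frequency": 0.05, "downtime_pct": 2},
    "Industry Average": {"oee_pct": 85, "repeat_accident_pct": 2.4,
                         "injury_frequency": 0.9, "downtime_pct": 6},
    "Laggard": {"oee_pct": 76, "repeat_accident_pct": 10,
                "injury_frequency": 3, "downtime_pct": 14},
}


def ratio_to_best(metric, cls):
    """How many times larger a class's metric is than Best in Class."""
    return CLASSES[cls][metric] / CLASSES["Best in Class"][metric]


for metric in ("downtime_pct", "repeat_accident_pct", "injury_frequency"):
    for cls in ("Industry Average", "Laggard"):
        print(f"{cls:17s} {metric:20s} {ratio_to_best(metric, cls):5.1f}x")
```

On these figures, laggards show about 7 times the unscheduled downtime, 50 times the repeat accident rate, and 60 times the injury frequency of best-in-class sites. The reliability and safety measures move together.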
160+ specialists dedicated to:
• Process safety: PHAs, compliance audits / gap assessments, facility siting, risk management planning, and much more
• Functional safety: Cradle to grave SIS/BMS/F&G design and implementation services
• Cybersecurity: All aspects of the cyber lifecycle: assess and define; design, implement, and construct; operate and maintain
• SIS lifecycle software: A relational database to document, track, and analyze all SIFs
• Control system design and integration services: UL listed panel shop; partners with Rockwell, Siemens & Schneider
aeSolutions: Experts in safety & cybersecurity
Partnering for a safe and secure world