Abnormal Situation Management
Defining the way things will be.
ASM
The birth of ASM...
• ASM grew from an initial focus on alarm management. Most sites are aware that operator overload and alarm floods are common during abnormal operations. As we analyzed the issues around alarm management, we discovered that operator problems with the alarm system were only a symptom of a more general issue: the design, implementation, and maintenance of many facilities, systems, and practices.
ASM Consortium
• Charter:
– Research the causes of abnormal situations and create technologies to address this problem
• Deliverables:
– Technology, best practices, application knowledge, prototypes, metrics
• History:
– Started in 1994
– Co-funded by the US Government (NIST)
– Budget: more than US$16M
• Current Status:
– Committed through 2002
– Honeywell leadership
– Expanding membership
Current Membership:
University Affiliates
Brad Adams Walker Architecture, P.C.
Requirements for Safe Operation
• Hazards must be recognized and understood
• Equipment must be “fit for purpose”
• Systems and procedures to maintain plant integrity
• Competent staff
• Emergency preparedness
• Monitor performance
In the area of alarm management, most companies fail to meet these basic requirements for safe operation.
Various Cost Elements
[Figure: plant performance (efficiency) against the theoretical limit (theoretically possible; currently unsustainable), the current limit, the operating target and break-even. Annotations: the comfort margin, i.e. lost opportunity (the cost of comfort); future upgrades (e.g., advanced control); savings from reducing the comfort margin; lost profit, lost revenue and additional unplanned costs when an incident, accident (equipment damage, etc.) or shutdown occurs; loss below break-even from the fixed costs of an idle plant; profit. Losses due to incidents and accidents are about 10% of operating costs.]
A Look At Plant Operations
[Figure: A typical production profile for an asset-intensive facility for a calendar year. Days per year versus daily production (from < 60% through 95% to 100%), with the production target set by the enterprise marked. Bar heights, in days: 95, 62, 23, 79, 47, 30, 16, 8, 5 (365 days in total).]
Factors Affecting Plant Operations
[Figure: a production profile (days per year versus daily production, < 60% to 100%) marking the plant operating target, the plant capacity limit, operational constraints and planning constraints. Factors shown: plant availability, plant incidents, production effectiveness, asset utilization, agility/flexibility.]
Real Life Examples
[Figure: total feed, number of days versus feed rate (280 to 620); annotated $33.5M.]
[Figure: total feed, number of days versus feed rate (280 to 620); annotated $38.5M.]
[Figure: frequency versus production rate (112 to 183); annotated 3.2% and 5.8%.]
[Figure: frequency versus feed rate (457 to 595); annotated 1503 and $24.2M.]
This plant had 5.8% in lost capacity!
This plant had $24.2M in lost capacity due to asset unavailability and incidents!
This plant lost $38.5M!
And this plant lost $33.5M!
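The slides do not say how these lost-capacity figures were computed. A minimal sketch, assuming the shortfall definition below (each day contributes the gap between the plant capacity limit and the actual rate) and a constant margin per unit for the dollar value; the data and the function names are hypothetical:

```python
from typing import Sequence


def _shortfall(daily_rates: Sequence[float], capacity_limit: float) -> float:
    """Total production below the capacity limit, summed over all days."""
    return sum(max(0.0, capacity_limit - rate) for rate in daily_rates)


def lost_capacity(daily_rates: Sequence[float], capacity_limit: float) -> float:
    """Shortfall as a fraction of what the plant could have made at its limit."""
    if capacity_limit <= 0 or not daily_rates:
        raise ValueError("need a positive capacity limit and at least one day")
    return _shortfall(daily_rates, capacity_limit) / (capacity_limit * len(daily_rates))


def lost_value(daily_rates: Sequence[float], capacity_limit: float,
               margin_per_unit: float) -> float:
    """Lost opportunity in currency, assuming a constant margin per unit."""
    return _shortfall(daily_rates, capacity_limit) * margin_per_unit


# Hypothetical four days of feed against a capacity limit of 600 units/day.
rates = [600, 570, 570, 540]
print(f"{lost_capacity(rates, 600):.1%}")  # 5.0%
print(lost_value(rates, 600, 10.0))        # 1200.0
```

Days above the limit are clipped to zero rather than offsetting poor days, which matches the idea of capacity that was lost rather than averaged away.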
Site Studies have identified Plant Lost Opportunity
[Figure: production profile (days per year versus daily production, < 60% to 100%) marking the plant operating target, the plant capacity limit, operational constraints and planning constraints.]
Between 3% and 15% of lost capacity is attributed to asset unavailability and incidents.
[Diagram labels: plant availability; plant incidents; production management; DCS/APC/optimization efforts; manufacturing execution; scheduling & ERP.]
NEW EMPHASIS!!
Asset Management: Reliability & CMMS
[Figure: the production profile redrawn with a higher plant operating target, the plant capacity limit, and fewer operational and planning constraints.]
Emphasis on plant & equipment reliability improvements and reduced incidents can result in a recovery of 3-15% of lost capacity!
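Valuing that 3-15% range for a particular plant is a single multiplication. A minimal sketch, assuming the annual value of output at full capacity is known (a hypothetical $200M here) and that value scales linearly with recovered capacity:

```python
from typing import Tuple


def recoverable_range(annual_value_at_capacity: float,
                      low: float = 0.03, high: float = 0.15) -> Tuple[float, float]:
    """Annual value of recovering between `low` and `high` of plant capacity."""
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("expected 0 <= low <= high <= 1")
    return annual_value_at_capacity * low, annual_value_at_capacity * high


# Hypothetical plant whose full-capacity output is worth $200M per year.
low_value, high_value = recoverable_range(200e6)
print(f"${low_value / 1e6:.0f}M to ${high_value / 1e6:.0f}M per year")  # $6M to $30M per year
```

The narrower 3-8% range quoted for unexpected upsets is the same call with `low=0.03, high=0.08`.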
Major Profit Potential
The Importance of an Alarm Management Improvement Project
Alarm management is the proper design, implementation, operation, and maintenance of industrial manufacturing plant alarm systems.
Current alarm practices are leading to incidents.
The major problems are:
• Alarm floods
• Standing alarms
• Poor configuration of alarms
• Nuisance alarms
Technology exists that can contribute significantly to effective alarm systems and provide good situation awareness.
Alarms have been identified as a contributing factor in incidents.
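Alarm floods can be found in an alarm journal by counting annunciations in a sliding time window. A minimal sketch, assuming a flood means more than 10 alarms within any 10-minute window; that threshold is an illustrative assumption, not taken from these slides:

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple


def find_floods(timestamps: Iterable[float], window_s: float = 600.0,
                threshold: int = 10) -> List[Tuple[float, float]]:
    """Return (start, end) times of periods with more than `threshold`
    alarms inside a sliding window of `window_s` seconds.

    `timestamps` are alarm annunciation times in seconds, sorted ascending.
    """
    floods: List[Tuple[float, float]] = []
    window: Deque[float] = deque()
    for t in timestamps:
        window.append(t)
        # Drop alarms that are older than the window ending at t.
        while window[0] <= t - window_s:
            window.popleft()
        if len(window) > threshold:
            start = window[0]
            if floods and start <= floods[-1][1]:
                # Still inside the previous flood: extend its end time.
                floods[-1] = (floods[-1][0], t)
            else:
                floods.append((start, t))
    return floods


# One alarm every 2.5 s for a minute, similar to the rate in the case below.
print(find_floods([i * 2.5 for i in range(25)]))  # [(0.0, 60.0)]
```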
A Case
The lightning struck just before 9:00 AM on a Sunday. It immediately started a fire in the crude distillation unit of the refinery. The control operators on duty responded by calling out the fire brigade, and then had to divert their attention to a growing number of alarms while desperately trying to bring the crude unit to a safe emergency shutdown.
Hydrocarbon flow was lost to the deethanizer in the FCCU recovery section, which fed the debutanizer further along. The system was arranged to prevent total loss of liquid level in the two vessels, so the falling level in the deethanizer caused the deethanizer discharge valve to close. This, in turn, caused the level in the debutanizer to drop rapidly, and its discharge valve also closed. Heat remained on the debutanizer, and the trapped liquid vaporized as the pressure rose, causing the pressure relief valve to “pop” (for the first of three times) into the flare KO drum and then immediately onto the flare itself.
In a matter of minutes, the board operator was able to restore flow to the deethanizer. This permitted the deethanizer discharge valve to be opened, allowing renewed flow forward to the debutanizer. The rising level in the debutanizer should have caused the debutanizer discharge valve to open (by the level controller action) and allow flow on to the naphtha splitter. Although the operators in the control room received a signal indicating the valve had opened, the debutanizer was nonetheless filling rapidly with liquid while the naphtha splitter was emptying. The operators were concentrating on displays that focused on the problems with the deethanizer and debutanizer, and had no overview of the process to show that, even though the debutanizer discharge valve registered as open, there was no flow from the debutanizer to the naphtha splitter.
Despite attempts to divert the excess, the debutanizer became liquid-logged about an hour later and the pressure relief valve lifted for the second time, venting to the flare via the flare KO drum. Because there were enormous volumes of gas venting, the level of liquid in the flare KO drum was rising to a very high value.
About 2-1/2 hours later, the debutanizer vented to the flare a third time AND CONTINUED VENTING FOR 36 MINUTES. The high level alarm for the flare drum was activated at this time. But with alarms going off every 2 to 3 seconds, there appears to be no evidence that that alarm was ever seen. By this time, the flare KO drum had filled with liquid well beyond its design capacity. The fast-flowing gas through the overfilled drum forced liquid out of the drum’s discharge pipe. The discharge line was not designed for liquid, so the force of the liquid caused a rupture at an elbow. This released over 20 tons of highly flammable hydrocarbon.
The ensuing release quickly formed an ominous drifting cloud of vapor and droplets. In a matter of minutes, this cloud found its ignition source 350 feet downwind. The resulting explosion was heard 80 miles away. In the town nearest the plant, few windows still held intact panes, so overpowering was the pressure shock wave from the blast. The last fires in the refinery were eventually extinguished 2 days later.
Stylistic or Cultural Indicators
[Figure labels: top down: commitment, competence, cognizance; data collected & analyzed; diagnostic and remedial measures; safety information system; source failure types; functional failure types; general failure types; unsafe acts (errors & violations); condition tokens; precursors; accidents, incidents, near-misses; 1-10 hit list; proactive design; SI projects; best practices; near-miss auditing; DuPont training; interface between the organization & the individual (management, workplace); poor workplace design; high workload; unsociable hours; inadequate training; poor perception of hazards; alarms; human factors; control room design; workspace; motivation; attitude; group factors; working practice; organization; individual.]
Managing Abnormal Situations: Anatomy of a Disaster from an Operations Perspective
Operational Modes:
Normal
Abnormal
Emergency
Plant States:
Normal
Abnormal
Out of Control
Accident
Disaster
Critical Systems:
Decision Support System
Process Equipment, DCS, Automatic Controls
Plant Management Systems
Safety Shutdown, Protective Systems, Hardwired Emergency Alarms
DCS Alarm System
Physical and Mechanical Containment System
Site Emergency Response System
Area Emergency Response System
Operational Goals:
Keep Normal
Return to Normal
Bring to Safe State
Minimize Impact
Plant Activities:
Preventative Monitoring & Testing
Manual Control & Troubleshooting
Firefighting
First Aid
Rescue
Evacuation
Summarized Production Data
Unexpected Upsets Cost 3-8% of Capacity
[Figure: production profile (days per year versus daily production, < 60% to 100%) marking the plant operating target, the plant capacity limit, optimization efforts, operational constraints and planning constraints.]
~$10 Billion annually in lost production!
[Figure: the production profile redrawn with a higher plant operating target, the plant capacity limit, and fewer operational and planning constraints.]
Focused efforts can result in recovery of 3-8% of capacity
~$10 Billion potential to the bottom line!
Major Profit Potential
Timing diagram of DIN V 19251 as applicable for a single-channel SRS with ultimate self-tests executed within the PST
[Figure: time axis t, starting at the failure occurrence in the process or in the safeguarding system. The failure is detected after the system internal diagnostic time; this is followed by the time for corrective action and the time for reaction of the process to the corrective action, after which the safe status of the process is assured. The whole interval is the fault tolerance time of the process, or Process Safety Time (PST).]
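Read as a budget, the diagram says the safe state is assured only if the diagnostic time, the corrective action time and the process reaction time together fit inside the PST. A minimal sketch of that check, assuming the three intervals simply add (the class and field names are hypothetical, taken from the diagram labels):

```python
from dataclasses import dataclass


@dataclass
class SafetyTiming:
    """Durations in seconds, named after the DIN V 19251 timing diagram labels."""
    diagnostic_time: float         # system internal diagnostic time
    corrective_action_time: float  # time for corrective action
    process_reaction_time: float   # time for reaction of the process
    process_safety_time: float     # fault tolerance time of the process (PST)

    def total_response(self) -> float:
        return (self.diagnostic_time + self.corrective_action_time
                + self.process_reaction_time)

    def margin(self) -> float:
        """Time left before the PST is exceeded; negative means unsafe."""
        return self.process_safety_time - self.total_response()

    def is_adequate(self) -> bool:
        return self.margin() >= 0.0


timing = SafetyTiming(diagnostic_time=1.0, corrective_action_time=0.5,
                      process_reaction_time=3.0, process_safety_time=10.0)
print(timing.total_response(), timing.margin(), timing.is_adequate())  # 4.5 5.5 True
```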
Reliability Requirements for Alarms
Claimed PFDavg: 1 – 0.1
Alarm system integrity/reliability requirements: Alarms may be integrated into the process control system.
Human reliability requirements: No special requirements; however, the alarm system should be engineered, operated and maintained to the good engineering standards identified in the EEMUA Guide.
EEMUA Alarm Systems Guide, page 17
CONCEPT 1: RISK REDUCTION
[Figure: risk on an increasing scale. The necessary minimum risk reduction [R] takes the EUC risk down to the risk that meets the required level of safety. The actual risk reduction is achieved by all SRSs and external risk reduction facilities together: partial risk covered by external risk reduction facilities, partial risk covered by other-technology SRSs, and partial risk covered by E/E/PES SRSs. What is left is the actual remaining risk.]
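The necessary risk reduction [R] can be expressed as a factor: EUC risk divided by the risk that meets the required level of safety. A minimal sketch, assuming each protection layer fails on demand independently with its own PFDavg so that the remaining risk is the EUC risk times the product of those PFDs; all numbers are hypothetical:

```python
from math import prod
from typing import Sequence


def required_rrf(euc_risk: float, tolerable_risk: float) -> float:
    """Necessary minimum risk reduction [R], expressed as a factor."""
    return euc_risk / tolerable_risk


def residual_risk(euc_risk: float, layer_pfds: Sequence[float]) -> float:
    """Remaining risk, assuming independent layers with the given PFDavg values."""
    return euc_risk * prod(layer_pfds)


euc = 0.1          # hypothetical hazardous events per year without protection
tolerable = 1e-4   # hypothetical tolerable frequency per year
layers = [0.1, 0.005]  # hypothetical: one external facility, one E/E/PES SRS

print(required_rrf(euc, tolerable))                   # approximately 1000
print(residual_risk(euc, layers) <= tolerable)        # True: target met
```

Independence is the key assumption: layers that share sensors or causes cannot be multiplied this way.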
SAFETY INTEGRITY LEVELS
TABLE 2: SAFETY INTEGRITY LEVELS: TARGET FAILURE MEASURES
Columns: safety integrity level (SIL); demand mode of operation (average probability of failure to perform its design function on demand); continuous/high demand mode of operation (average probability of a dangerous failure per year)
SIL 4: 10⁻⁵ to < 10⁻⁴ (demand mode); 10⁻⁵ to < 10⁻⁴ (continuous/high demand mode)
SIL 3: 10⁻⁴ to < 10⁻³ (demand mode); 10⁻⁴ to < 10⁻³ (continuous/high demand mode)
SIL 2: 10⁻³ to < 10⁻² (demand mode); 10⁻³ to < 10⁻² (continuous/high demand mode)
SIL 1: 10⁻² to < 10⁻¹ (demand mode); 10⁻² to < 10⁻¹ (continuous/high demand mode)
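Looking up the SIL for a given average probability of failure on demand is a band search over the table. A minimal sketch using the demand-mode bands above, with each band read as lower bound inclusive, upper bound exclusive:

```python
from typing import Optional

# Demand-mode bands from Table 2: (SIL, lower bound inclusive, upper bound exclusive).
_SIL_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]


def sil_for_pfd(pfd_avg: float) -> Optional[int]:
    """Return the SIL whose band contains pfd_avg, or None outside SIL 1 to 4."""
    for sil, lower, upper in _SIL_BANDS:
        if lower <= pfd_avg < upper:
            return sil
    return None


print(sil_for_pfd(0.05), sil_for_pfd(5e-4), sil_for_pfd(0.2))  # 1 3 None
```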
Reliability Requirements for Alarms
Claimed PFDavg: 0.1 – 0.01
Alarm system integrity/reliability requirements: The alarm system should be designated as safety related and categorized as SIL 1. The alarm system should be independent of the process control system.
Human reliability requirements: The operator should be trained in the management of the specific plant failure that the alarm indicates. The alarm presentation arrangements should make the claimed alarm very obvious to the operator and distinguishable from other alarms. The alarm should remain on view to the operator for the whole of the time it is active.
EEMUA Alarm Systems Guide, page 17
Reliability Requirements for Alarms
Claimed PFDavg: below 0.01
Alarm system integrity/reliability requirements: The alarm system would have to be designated as safety related and categorized as at least SIL 2.
Human reliability requirements: It is not recommended that claims for a PFDavg below 0.01 are made for any operator action, even if it is multiple-alarmed and very simple. For all credible accident scenarios, the designer should demonstrate that the total number of safety-related alarms and their maximum rate of presentation do not overload the operator.
EEMUA Alarm Systems Guide, page 17
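The three claimed-PFDavg bands above can be encoded as a small classifier. A minimal sketch; the band edges (0.1 and 0.01 both assigned to the lower-risk band) and the `AlarmClaim` fields are assumptions made for illustration:

```python
from typing import NamedTuple


class AlarmClaim(NamedTuple):
    safety_related: bool     # must the alarm system be designated safety related?
    min_sil: int             # minimum SIL category (0 = none required)
    claim_recommended: bool  # is claiming this PFDavg for operator action advised?


def classify_alarm_claim(claimed_pfd: float) -> AlarmClaim:
    """Map a claimed PFDavg for an operator response onto the three table bands."""
    if not 0.0 < claimed_pfd <= 1.0:
        raise ValueError("claimed PFDavg must lie in (0, 1]")
    if claimed_pfd >= 0.1:    # 1 – 0.1: may sit in the process control system
        return AlarmClaim(safety_related=False, min_sil=0, claim_recommended=True)
    if claimed_pfd >= 0.01:   # 0.1 – 0.01: SIL 1, independent alarm system
        return AlarmClaim(safety_related=True, min_sil=1, claim_recommended=True)
    # Below 0.01: at least SIL 2, and such claims are not recommended.
    return AlarmClaim(safety_related=True, min_sil=2, claim_recommended=False)


print(classify_alarm_claim(0.05))
```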
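The Setting of a High Pre-Trip Alarm
[Figure: the alarm setting lies between the limit of the largest normal operational fluctuation and the limit at which protection operates; margins A and B are marked. The gap up to the protection limit must cover the time for the operator to respond to the alarm and correct the fault, at the maximum rate of change of the alarmed variable during the fault.]
EEMUA Alarm Systems Guide, page 17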
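The figure implies a simple feasibility test: the alarm must sit above the largest normal fluctuation, yet at least (maximum rate of change × operator response time) below the protection limit. A minimal sketch, assuming a linear worst-case rise; the function name and numbers are hypothetical:

```python
from typing import Optional, Tuple


def alarm_window(normal_limit: float, trip_limit: float,
                 max_rate: float, response_time: float) -> Optional[Tuple[float, float]]:
    """Return the (exclusive lower, inclusive upper) range for a high alarm
    setting, or None when no setting satisfies both constraints.

    normal_limit  -- limit of largest normal operational fluctuation
    trip_limit    -- limit at which protection operates
    max_rate      -- maximum rate of change during the fault (units per second)
    response_time -- time for the operator to respond and correct the fault (s)
    """
    highest = trip_limit - max_rate * response_time
    if highest <= normal_limit:
        return None  # no room: one of the redesign choices has to be taken
    return normal_limit, highest


print(alarm_window(80, 100, 0.5, 30))  # (80, 85.0)
print(alarm_window(80, 100, 1.0, 30))  # None
```

When the window is empty, the redesign choices on the following slide describe the trade-offs of widening the margin, alarming inside normal fluctuations, or alarming nearer the trip.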
Abnormal Operating Region
[Figure: gas concentration (percentage of LEL, 0 to 120) versus time after onset of fault (seconds, 0 to 80). From the normal operating level (the gas concentration prior to the fault), the actual gas concentration rises after the fault occurs toward the lower explosive limit (LEL) and explosion. The measured gas concentration differs from the actual concentration (error, error delay), so the actual trip point differs from the set trip point; sensor delay, sampling delay and shutdown system delay add further time before shutdown.]
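The delays in the figure add up, so a trip point that looks safe on paper can still let the gas approach the LEL. A minimal, hypothetical model of that chain, assuming the gas rises linearly after the fault, the detector reads low by a constant error, and the sensor, sampling and shutdown system delays simply add:

```python
from dataclasses import dataclass

LEL_PERCENT = 100.0  # the lower explosive limit, expressed as 100% of LEL


@dataclass
class GasTripModel:
    initial: float                  # gas concentration prior to fault (% LEL)
    rate: float                     # rise rate after the fault (% LEL per second)
    set_trip: float                 # set trip point (% LEL)
    measurement_error: float = 0.0  # amount by which the detector reads low (% LEL)
    sensor_delay: float = 0.0       # seconds
    sampling_delay: float = 0.0     # seconds
    shutdown_delay: float = 0.0     # shutdown system delay, seconds

    def __post_init__(self) -> None:
        if self.rate <= 0:
            raise ValueError("rate must be positive")

    def shutdown_time(self) -> float:
        """Seconds after the fault at which the shutdown takes effect."""
        # A low-reading detector trips only when the actual gas is higher.
        actual_trip = self.set_trip + self.measurement_error
        t_cross = max(0.0, (actual_trip - self.initial) / self.rate)
        return t_cross + self.sensor_delay + self.sampling_delay + self.shutdown_delay

    def concentration_at_shutdown(self) -> float:
        return self.initial + self.rate * self.shutdown_time()

    def reaches_lel(self) -> bool:
        return self.concentration_at_shutdown() >= LEL_PERCENT


slow = GasTripModel(initial=10, rate=2, set_trip=40, measurement_error=5,
                    sensor_delay=5, sampling_delay=10, shutdown_delay=5)
print(slow.shutdown_time(), slow.concentration_at_shutdown(), slow.reaches_lel())
# 37.5 85.0 False
```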
Redesign Choices
• Redesign: redesign the plant or its controls to provide a greater margin between the normal operating limits and the trip limits. This is the most desirable solution but is often impractical or too expensive;
• Setting within normal operating limits: set the alarm within the limits of normal operating fluctuations and accept that spurious alarms will occur during large normal disturbances. This is ergonomically very undesirable and will tend to increase alarm rates and reduce operator confidence in the alarm system. In effect, it increases the average probability of failure on demand (PFDavg) for the alarm system as a whole;
• Setting nearer trip limits: set the alarm closer to the trip limits and accept that some fast transients will not be corrected by the operator before they reach the trip level. This will increase production losses due to plant trips and, because there are more demands on the protection system, tend to make the plant less safe. It also implies an increased PFDavg for the alarm system.
EEMUA Alarm Systems Guide, page 17
Different Kinds of Events
[Figure: potential impact of the initiating event versus time for three kinds of events: abrupt/catastrophic, insidious, and manageable.]
Impact of DCS Alarm System: Awareness of Disturbances
[Figure: potential impact of the initiating event versus time, marking the failure occurrence in the process or in the safeguarding system, the point at which the failure is detected, and the point of operator awareness. Outcomes: correct intervention causes a return to normal and the safe status of the process is assured, or an incident occurs.]
With typical alarm systems, orienting begins only after an event creates an abnormal plant state.
The extent of the problem can impair the operator’s ability to be fully aware of where process disturbances are located.
As disturbances propagate, the number of conditions to be aware of increases, as do the response requirements and the likelihood of missing important information.
Impact of DCS Alarm System: Management of Problems
[Figure: potential impact of the initiating event versus time, from the point of operator awareness to either a return to normal through correct intervention or an incident. Annotations: alarm floods delay evaluation; standing alarms interfere with orientation; inadequate filtering interferes with action.]
Impact of Good Alarm Management on Situation Awareness
[Figure: potential impact of the initiating event versus time, showing the average shift in awareness with decision support and the emergency alarm.]
• Increases the likelihood of awareness of disturbances
• Reduces the time to awareness
• Hence, reduces the average impact of initiating events
Impact of Protection System
[Figure: impact of the initiating event versus time. Labels: SAFE, UN-SAFE; profit, quality, loss; high alarm; high emergency; trip; trip from SIS; operator diagnostic time; FTT; process safety time; incident.]
FTT = Fault Tolerance Time
Impact of Decision Support System: Support for Optimal Response
[Figure: potential impact of the initiating event versus time for possible operator responses: best, suboptimal, incorrect, and no response.]
• Reduces errors
• Decreases time to implement response
• Manages side effects
• Increases awareness
ASM Alarm Management Solutions
Education for management, engineers, technicians and operators.
• Alarm performance assessment
• Requirements for alarm optimization tools
• Alignment with company and EEMUA guidelines
• Alarm rationalization
• User interface design
• Decision support activities
Objectives: Alarm Management Optimization
• Enhance operator effectiveness
– Avoid alarm floods
– Identify root causes
– Eliminate nuisance alarms
• Enhance profitability
– Reduce variability
– Maximize plant uptime
– Prevent damage to equipment
• Reduce risk of:
– Injury to personnel
– Environmental incidents
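Eliminating nuisance alarms starts with finding them. One common kind is the chattering alarm, which activates repeatedly in a short time. A minimal sketch, assuming "chattering" means at least 3 activations of one tag within 60 seconds (an illustrative threshold, not from these slides) and hypothetical tag names:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def chattering_tags(events: Iterable[Tuple[float, str]], window_s: float = 60.0,
                    min_count: int = 3) -> List[str]:
    """Tags that alarm at least `min_count` times within `window_s` seconds.

    `events` is an iterable of (timestamp_s, tag) alarm activations.
    """
    by_tag: Dict[str, List[float]] = defaultdict(list)
    for t, tag in events:
        by_tag[tag].append(t)
    flagged = []
    for tag, times in by_tag.items():
        times.sort()
        # Check every run of min_count consecutive activations.
        for i in range(len(times) - min_count + 1):
            if times[i + min_count - 1] - times[i] < window_s:
                flagged.append(tag)
                break
    return sorted(flagged)


events = [(0, "FI101"), (20, "FI101"), (40, "FI101"),
          (0, "TI200"), (300, "TI200"), (700, "TI200")]
print(chattering_tags(events))  # ['FI101']
```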
Alarm Management Optimization: The Process
• Develop plant alarm management standards & philosophy
• Identify enhancements
• Change management
• Implement
• Verify against standards
• Collect data
• Analyze
Alarm Management
Before: 30 points account for ~85% of all alarms.
After: 30 points account for ~52% of all alarms.
[Figure: before and after charts, with axis values of 100K and 2K.]
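The "30 points account for ~85% of all alarms" figure is a Pareto share. A minimal sketch of computing it from a list of alarm occurrences, one tag per occurrence; the sample data is hypothetical:

```python
from collections import Counter
from typing import Iterable


def top_n_share(alarm_tags: Iterable[str], n: int = 30) -> float:
    """Fraction of all alarm occurrences produced by the n most frequent points."""
    counts = Counter(alarm_tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in counts.most_common(n)) / total


tags = ["A"] * 8 + ["B"] + ["C"]
print(top_n_share(tags, 1))  # 0.8
```

A falling share after rationalization, as in the before/after figures, means alarm load is no longer dominated by a few bad actors.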
Alarm Management Optimization
• Increase the effectiveness of the existing alarm system through proven methodology
– Analyze existing system performance
– Assist in developing an alarm strategy and educating operations staff
– Rationalize the existing alarm system
• Recommend and apply new alarm management software
– UserAlert
– Optimization Suite
• Alarm rationalization and documentation
• Alarm metrics and analysis
• Advanced alarm handlers
Alarm Rationalization: Optimization Suite…
• Alarm priority (class) is based on severity, level of impact, and time
• Available priority options in TPS:
– No Action
– Journal
– Print
– Print & Journal
– Low
– High
– Emergency
Alarm Rationalization: Optimization Suite…
• Recommends alarm priorities based on plant philosophy
– Severity of impact
– Time to respond
– Trip point
• Electronically captures plant alarm management philosophy
– Time-to-respond rules definition
– Impact and severity rules definition
• Apply manual priority override
• Use alarm impact templates
• Generate EC files (Honeywell)
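The idea of recommending a priority from severity of impact and time to respond, with a manual override, can be sketched as a small rules function. This is not the Optimization Suite's logic: the severity classes, the 5-minute threshold and the function name are hypothetical plant-philosophy choices, and the trip-point criterion is not modeled:

```python
from typing import Optional

TPS_OPTIONS = ["No Action", "Journal", "Print", "Print & Journal",
               "Low", "High", "Emergency"]
SEVERITY_RANK = {"minor": 0, "major": 1, "severe": 2}  # hypothetical classes
PRIORITIES = ["Low", "High", "Emergency"]
URGENT_MINUTES = 5.0  # hypothetical time-to-respond threshold


def recommend_priority(severity: str, minutes_to_respond: float,
                       override: Optional[str] = None) -> str:
    """Recommend an alarm priority from severity of impact and time to respond.

    Severity sets a base priority, raised one step when the operator has less
    than URGENT_MINUTES to act. A manual override, if given, always wins.
    """
    if override is not None:
        if override not in TPS_OPTIONS:
            raise ValueError(f"unknown priority: {override}")
        return override
    rank = SEVERITY_RANK[severity]
    if minutes_to_respond < URGENT_MINUTES:
        rank += 1
    return PRIORITIES[min(rank, len(PRIORITIES) - 1)]


print(recommend_priority("minor", 30), recommend_priority("severe", 2))  # Low Emergency
```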