Central Ephrata Substation Incident Root Cause Analysis Results May 2, 2017 Ty Ehrman Jeff Shupe Angel Barahona-Sanchez Darrell Hahn
Roadmap
• Discuss the event
• Introduce the investigative team
• Describe the distribution system: how should it have worked vs. how did it work
• Damage and safety potential
• What else happened and contributed to the problem
• Root cause
• How to keep it from happening again
• Major conclusions and take-aways
Event and Response
• February 15, 2017: a three-minute electrical fault occurred in the distribution lines fed by, and near, the Central Ephrata Substation
• Root cause analysis was performed
• Corrective actions have been identified to prevent recurrence
Root Cause Analysis Team
• Root Cause Analysis (“RCA”) requested by Executive Management
• A diverse team of District employees was formed:
– Ty Ehrman
– Jeff Shupe
– Angel Barahona-Sanchez
– Scott Smith
– Darrell Hahn
– Randy Hovland
– LeRoy Patterson
– Karrie Buescher
• Contracted with Pilot Advisors, Andrew Bielat, to provide RCA expertise
RCA Process
• The RCA team conducted interviews with employees and witnesses from the public
• A third-party contractor was retained by the District to provide failure mode testing
• A formal RCA methodology was applied and followed
Central Ephrata Substation Layout
This design is similar to many other existing District distribution substations.
How the Substation Protection Works
• Fault is detected by relays
• Circuit breaker operates
• If the fault is still present, the circuit breaker operates to lock-out, clearing the fault
How the Substation Protection Works
• If the fault is still present during a breaker failure scenario, the second layer of protection operates
• The circuit switcher operates, extinguishing the fault
How the Central Ephrata Substation Protection System Failed to Operate Correctly
• Fault is detected by relays
• Circuit breaker operates: opens and closes once
• Circuit breaker fails to open again
• Fault continues for approximately three minutes, until the transmission protection system clears the fault
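The layered sequence described on the protection slides can be sketched in a few lines of Python. This is an illustrative model only, not the District's relay logic: the `Device` class, its `successful_trips_left` counter, the single reclose attempt, and the returned strings are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """A fault-interrupting device (hypothetical model for this sketch)."""
    name: str
    successful_trips_left: int = 99  # assumed: how many times it can still open

    def trip(self) -> bool:
        # Returns True only if the device actually opens.
        if self.successful_trips_left > 0:
            self.successful_trips_left -= 1
            return True
        return False


def clear_fault(breaker: Device, circuit_switcher: Device,
                fault_is_permanent: bool = True,
                reclose_attempts: int = 1) -> str:
    """Walk the protection layers in the order given on the slides."""
    for attempt in range(reclose_attempts + 1):
        # Layer 1: relays detect the fault and the circuit breaker operates.
        if not breaker.trip():
            # Breaker failure scenario: the second layer operates.
            if circuit_switcher.trip():
                return "cleared by circuit switcher"
            return "not cleared by substation protection"
        if attempt == reclose_attempts:
            # Fault still present after reclosing: breaker locks out.
            return "cleared by breaker lock-out"
        # Breaker recloses; a temporary fault is gone at this point.
        if not fault_is_permanent:
            return "cleared by breaker, service restored on reclose"
    return "not cleared by substation protection"
```

For example, `clear_fault(Device("CB", 1), Device("CS", 0))` models a breaker that opens and closes once and then fails to open, with no working second layer. In this sketch that returns "not cleared by substation protection", the case in which only the upstream transmission protection is left to clear the fault.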
Consequences
• Safety
– On the distribution line: arcing lines burned and came down, presenting a hazard to the public
– Substation: If occupied, employees could have been severely injured
• Disruption to service and damage to customer equipment and property
– Outage affecting up to 8,000 customers in the Ephrata, Quincy, and Soap Lake areas
• Estimated cost to repair District damages: $4.5 – $7.5M
Original Fault
Original Fault Location
Damage to Switchgear
Damage to Truck Near Switchgear
Damage to Main & Auxiliary Bus
Damage to CE6 Breaker
Damage to Transformer
Damage to Facilities Building
Central Ephrata Substation DC Battery System
• Purpose: operate the protection systems within the substation
• The battery charger helps maintain the battery bank at full charge and ensures operation during loss of AC power
Battery Transfer Switch
Interior of Battery Transfer Switch
Timeline of Events and Causal Factors (diagram)
• 1986: Substation Commissioned
– Original Design: Inadequate Positive Control
• 2005: Battery Transfer Switch Installed
– Safety of Maintenance
– Inadequate Design Review for Added Risk
– Inadequate Mod/Design Program
• November 16, 2013: “Battery Transfer Switch” Event
– Suspected Distribution Breaker Failed to Operate
– Transmission Breaker Prevented More Damage
– Suspected Loss of Control of DC Supply
• 2013: Replace Battery Transfer Switch
– Single-point Failure Known
– Inadequate Monitoring Known
• 2013: RCA Initiated
– Unfinished RCA
– Inadequate Corrective Action Program/Preventive Action
• February 15, 2017: CE6 Protection Failure
– Battery Transfer Switch “OFF”
– Loss of Control of DC Power
– Human Error
– Inadequate Work/Configuration Control
• Inadequate Establishment of Control of Critical Systems
Root Cause
The failure of executive management to establish control
of critical systems
Corrective Action 1
• Establish and manage a “critical systems” program that includes:
1. Identifying the criticality of systems with respect to safety and asset protection;
2. Establishing appropriate levels of criticality;
3. Establishing and verifying appropriate programmatic control based upon the level of criticality.
Corrective Action 2
• Establish a Programmatic Approach to Operations
A. Programmatically manage the mission-critical business functions, including the critical systems program recommended in Corrective Action 1.
B. Evaluate the processes/programs that should have prevented this event and other unwanted events. Executive level responsibility should include necessary elements of:
I. Safety
II. Risk Management
III. Design/Design Modifications
IV. Corrective Actions
V. Work/Configuration Control
VI. Craft Certification and System Training
Corrective Action 3
• Establish ownership to enhance the awareness and control of critical areas and events
Commonly accepted industrial practice to improve safety, reliability, and performance of equipment and systems is to establish singular authority and ownership of area, systems, and activities.
Corrective Action 4
• Implementation Team
Establish a cross-disciplinary Implementation Team with the responsibility and authority to ensure the successful implementation of the actions included in the previous three slides.
Compensatory Actions (Short-Term)
• Ensure safe operations of substations
A. Establish entry protocol;
B. Improve control of safety zones outside the substations related to shock risk;
C. Create a plan to evaluate integrity and adequacy of existing substation ground grids.
• DC Power Control
A. Establish positive control of the DC power systems;
B. Multidisciplinary review of any design changes or modification.
Conclusion
• Past practice shows that safety is not the primary consideration in design and operations.
– Have made progress on improving safety culture
– Safety culture is not yet where it needs to be
• A programmatic approach to define and control the most important elements of safe, effective operations is needed.
• Root cause responsibility has been placed at a high level, but for culture change to take place and a new programmatic approach to be effective, support is needed at all levels of the organization.
Thank you. Questions?