Cause Coding: Learning from every event.
James Merlo, PhD
August 1st 2012
NERC Pillars
• Reliability – to address events and identifiable risks, thereby improving the reliability of the bulk power system.
• Assurance – to provide assurance to the public, industry, and government for the reliable performance of the bulk power system.
• Learning – to promote learning and continuous improvement of operations and adapt to lessons learned for improvement of bulk power system reliability.
• Risk‐based model – to focus attention, resources and actions on issues most important to bulk power system reliability.
2 RELIABILITY | ACCOUNTABILITY
Come on in, the water is fine!
Our Mantra
Not every event results in a succinct lesson learned, but we learn from every event.
Malcolm K. Sparrow John F. Kennedy School of Government, Harvard University
Visualize the Data
Pie Chart
Challenge
www.airlines.org/PublicPolicy/Testimony/Pages/testimony_5-13-09Senate.aspx
Reliability
Equipment Reliability
Human Performance
Human Interaction with Equipment (coupled w/Automation)
Misoperation Categories (2011 Q2-Q3)
[Figure: bar chart "Misoperation Count by Category (total reported)". x-axis: Misoperation Category (Unnecessary Trips during fault, Unnecessary Trips other than fault, Failure to trip, Slow trip); y-axis: Misoperation Count. Bar labels legible in the extraction: 835, 432, 48, 20.]
Misoperation Causes (2011 Q2-Q3)
[Figure: bar chart "Total Reported". x-axis: Misoperation Cause; y-axis: Misoperation Count. Bar labels legible in the extraction: 424, 305, 219, 155, 88, 91, 43.]
Misoperation Relay Technology (2011 Q2-Q3)
[Figure: bar chart "Misoperations by Relay Technology (only Relay Failure/Malfunction Cause)". x-axis: Electromechanical, Microprocessor, Solid State, Unknown; y-axis: Misoperation count. Bar labels legible in the extraction: 138, 96, 63, 8.]
Misoperation Relay Technology (2011 Q2-Q3)
[Figure: bar chart "Misoperations by Relay Technology (only incorrect Settings/Logic/Design Error Cause)". x-axis: Electromechanical, Microprocessor, Solid State, Unknown; y-axis: Misoperation count. Bar labels legible in the extraction: 316, 68, 19, 21.]
Human Performance
It is not a matter of if the automation fails, it is a matter of when.
Events by Category
Event Analysis Report (EAR) Submittals
93 EARs submitted during the Field Trial (October 25, 2010 ‐ February 25, 2012);
28 EARs submitted since end of Field Trial = 121 total
[Figure: bar chart of EAR submittals by region: FRCC, MRO, NPCC, RFC, SERC, SPP, TRE, WECC.]
Candidate Lessons Learned
Not every event on the bulk power system (BPS) has a quality “Lesson” to share
• NERC looked at 230 qualifying events (Category 1 and above) and received 119 “candidates” for Lessons Learned; 55 of these came from the Cold Snap event of 2011
• Excluding the Cold Snap event, there were 64 other events which resulted in a Lesson Learned being submitted for consideration
• Twenty‐two Lessons Learned published in 2011, and thirteen to date in 2012
Lessons Learned – Published (2012)
Region   Lessons Learned Brief Description   Date
TRE TRE-LL-05 – Plant Onsite Material and Personnel Needed for a Winter Weather Event 1/06/2012
TRE TRE-LL-06 – Plant Operator Training to Prepare for a Winter Weather Event 1/06/2012
TRE TRE-LL-07 – Transmission Facilities and Winter Weather Operations 1/06/2012
NPCC LL-54 – DC Supply and AC Transients 3/06/2012
WECC LL-58 – Saturated Bus Auxiliary Current Transformer causes Bus Differential Operations during Line Fault 3/06/2012
TRE TRE-LL-34 – Rotational Load Shed 3/06/2012
WECC LL-59 – Auxiliary Relay Contact Contamination 6/19/2012
WECC LL-60 – Remote Terminal Units not on DC Sources 6/19/2012
WECC LL-61 – EMS Database Corruption Problem 6/19/2012
WECC LL-62 – Unmanned Forklift contact with Energized Bus 6/19/2012
RFC LL-65 – Excessive Resource Utilization 6/19/2012
TRE LL-66 – Alarm Interpretation Leads to Generator Stator Coil Failure 6/19/2012
NPCC LL-67 – Protective Relaying Digital Input Board Loading 6/19/2012
Human Factors Analysis and Classification
Initiative
NERC CCAP
North American Electric Reliability Corporation Causal Code Assignment Process
An event and data analysis tool
The Reliability Risk Management Group (RRM) has designed, developed, and implemented the North American Electric Reliability Corporation (NERC) Causal Code Assignment Process to allow accurate, efficient trending and subsequent analysis of events for sharing and providing a cooperative forum focused on improving the reliability of the Bulk Power System (BPS).
Purpose
• Establish NERC causal coding program to guide, assist, and inform industry
• Venue to share data across NERC
• Opportunity to collect and disseminate reliability data
• Expanded communication channel to reach parts of industry and Electric Reliability Organization
• Foster trust and collaboration between Applicable Governmental Agency, ERO and stakeholders
Cause Code Assignment Process (CCAP)
• A1 Design/Engineering Problem
• A2 Equipment/Material Problem
• A3 Individual Human Performance LTA
• A4 Management Problem
• A5 Communication LTA
• A6 Training Deficiency
• A7 Other Problem
Cause Code Definitions
Short Title: Definition
Design/Engineering Problem: An event or condition that can be traced to a defect in design or other factors related to configuration, engineering, layout, tolerances, calculations, etc.
Equipment/Material Problem: An event or condition resulting from the failure, malfunction, or deterioration of equipment or parts, including instruments or material.
Individual Human Performance LTA: An event or condition resulting from the failure, malfunction, or deterioration of the individual human performance associated with the process.
Management Problem: An event or condition that could be directly traced to managerial actions, or methodology (or lack thereof).
Communications LTA: Inadequate presentation or exchange of information.
Other Problem: The problem was caused by factors beyond the control of the organization.
LTA = Less Than Adequate
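A minimal sketch of how the seven A-level cause codes listed on the CCAP slide could be held in code for consistent tagging. The names CauseCategory and describe are hypothetical and are not part of any NERC tool; only the code letters and short titles come from the slides.

```python
from enum import Enum


class CauseCategory(Enum):
    """Top-level (A-level) cause codes, with short titles as listed on the CCAP slide."""

    A1 = "Design/Engineering Problem"
    A2 = "Equipment/Material Problem"
    A3 = "Individual Human Performance LTA"
    A4 = "Management Problem"
    A5 = "Communication LTA"
    A6 = "Training Deficiency"
    A7 = "Other Problem"


def describe(code: str) -> str:
    """Return 'CODE: Short Title' for an A-level code such as 'A3'.

    Hypothetical helper: raises KeyError for a code that is not A1..A7.
    """
    category = CauseCategory[code.strip().upper()]
    return f"{category.name}: {category.value}"


if __name__ == "__main__":
    print(describe("a3"))
```

Using a fixed enumeration rather than free text means a typo in a code fails immediately instead of silently creating a new category in later trending.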
A3 - Human Performance
Cause Code Assignment Process (CCAP)
• A1 Design/Engineering Problem
• A2 Equipment/Material Problem
• A3 Individual Human Performance
  B1 SKILL BASED ERROR
  B2 RULE BASED ERROR
  B3 KNOWLEDGE BASED ERROR
  B4 WORK PRACTICES
• A4 Management Problem
• A5 Communication LTA
• A6 Training Deficiency
• A7 Other Problem
A3 - Human Performance
Cause Code Assignment Process (CCAP)
• A3 Individual Human Performance
  B1 SKILL BASED ERROR
    o C01 Check of work LTA
    o C02 Step was omitted due to distraction
    o C03 Incorrect performance due to mental lapse
    o C04 Infrequently performed steps were performed incorrectly
    o C05 Delay in time caused LTA actions
    o C06 Wrong action selected based on similarity with other actions
    o C07 Omission / repeating of steps due to assumptions for completion
  B2 RULE BASED ERROR
  B3 KNOWLEDGE BASED ERROR
  B4 WORK PRACTICES
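The three-level structure on this slide (A, then B, then C) can be sketched as a nested lookup. This is an illustration only: the concatenated code format ("A3B1C04"), the CCAP_TREE layout, and the resolve function are assumptions made for the example, and only the A3/B1 branch shown on the slide is filled in.

```python
import re

# Only the branch shown on the slide is populated; the C-level entries for
# B2, B3 and B4 are not on the slide and are left empty here.
CCAP_TREE = {
    "A3": {
        "title": "Individual Human Performance",
        "children": {
            "B1": {
                "title": "Skill Based Error",
                "children": {
                    "C01": "Check of work LTA",
                    "C02": "Step was omitted due to distraction",
                    "C03": "Incorrect performance due to mental lapse",
                    "C04": "Infrequently performed steps were performed incorrectly",
                    "C05": "Delay in time caused LTA actions",
                    "C06": "Wrong action selected based on similarity with other actions",
                    "C07": "Omission / repeating of steps due to assumptions for completion",
                },
            },
            "B2": {"title": "Rule Based Error", "children": {}},
            "B3": {"title": "Knowledge Based Error", "children": {}},
            "B4": {"title": "Work Practices", "children": {}},
        },
    },
}

# Assumed format: A-level, optional B-level, optional two-digit C-level.
CODE_PATTERN = re.compile(r"^(A\d)(?:(B\d)(?:(C\d{2}))?)?$")


def resolve(code: str) -> list[str]:
    """Expand a code such as 'A3B1C04' into its titled path, one entry per level."""
    match = CODE_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a CCAP-style code: {code!r}")
    a, b, c = match.groups()
    a_node = CCAP_TREE[a]  # KeyError if the branch is not in this sketch
    path = [f"{a} {a_node['title']}"]
    if b is not None:
        b_node = a_node["children"][b]
        path.append(f"{b} {b_node['title']}")
        if c is not None:
            path.append(f"{c} {b_node['children'][c]}")
    return path


if __name__ == "__main__":
    for level in resolve("A3B1C04"):
        print(level)
```

Coding "2 deep" or "3 deep" then amounts to how many levels of the path are filled in: resolve("A3B1") returns two entries, resolve("A3B1C04") returns three.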
NERC CCAP (2 deep)
NERC CCAP (3 Deep)
Who else needs to know?
Rasmussen’s Classifications
• Human Error Classifications
  Skill Based
  Rule Based
  Knowledge Based
• Driving example: Oftentimes a human will operate at all three levels, going back and forth in a single event.
NERC Alert-Advisory
Configuration Control Practices – Advised industry of events resulting from human performance errors during protection system maintenance
Event examples of inadequate control procedures:
1. Relay technician did not follow proper procedure to return protection system to normal state, resulting in remote trip
2. Construction team failed to use latest construction document, resulting in incorrect calibration of equipment
3. Relay technician leaves work site. Returns to resume work but did so at wrong cabinet and trips substation
4. Technician trips a transformer due to opening a wrong current shorting switch
NERC Alert-Advisory
• EMS Alert Advisory Analysis ‐ During the Event Analysis (EA) field trial, 28 Category 2b events have occurred where a complete loss of SCADA/EMS lasted for more than 30 minutes. Analysis is currently being conducted to provide emerging trends for the industry
• Current analysis of these events has shown:
  Software failure is a major contributing factor in 50 percent of the events
  Testing of the equipment has been shown to be a factor in over 40 percent of the failures:
    o Test environment did not match the production environment
    o Product design (less than adequate)
  Change Management has had an impact in over 50 percent of the failures:
    o Risk and consequences associated with change not properly managed
    o Identified changes not implemented in a timely manner
  Individual operator skill‐based error was involved in 15 percent of the events...
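The percentages on this slide add up to more than 100 because a single event can carry several contributing factors. A minimal sketch of that kind of tally follows; the factor_shares function and the four sample events are hypothetical illustrations, not NERC data or NERC tooling.

```python
from collections import Counter


def factor_shares(events):
    """Return {factor: percent of events in which that factor appears}.

    `events` is a list with one collection of factor labels per event.
    Each factor is counted at most once per event.
    """
    if not events:
        return {}
    counts = Counter()
    for factors in events:
        counts.update(set(factors))
    total = len(events)
    return {factor: 100.0 * n / total for factor, n in counts.items()}


# Hypothetical sample: four events, some with more than one factor.
sample_events = [
    {"software failure", "change management"},
    {"testing"},
    {"software failure"},
    {"change management", "operator skill-based error"},
]

if __name__ == "__main__":
    for factor, share in sorted(factor_shares(sample_events).items()):
        print(f"{factor}: {share:.0f} percent of events")
```

Because the shares are computed per factor against the total number of events, they describe how often each factor appears, not a split of the events into exclusive groups.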
Solving Problems: Untying the Knot
Malcolm K. Sparrow John F. Kennedy School of Government, Harvard University
Calling Balls and Strikes
Official Baseball Rules ©
Section 2.00, Definition of Terms
The STRIKE ZONE is that area over home plate the upper limit of which is a horizontal line at the midpoint between the top of the shoulders and the top of the uniform pants, and the lower level is a line at the hollow beneath the kneecap. The Strike Zone shall be determined from the batter's stance as the batter is prepared to swing at a pitched ball.
Section 9.02, The Umpire
Any umpire's decision which involves judgment, such as, but not limited to, whether a batted ball is fair or foul, whether a pitch is a strike or a ball, or whether a runner is safe or out, is final. No player, manager, coach or substitute shall object to any such judgment decisions.
a. Players leaving their position in the field or on base, or managers or coaches leaving the bench or coaches box, to argue on BALLS AND STRIKES will not be permitted. They should be warned if they start for the plate to protest the call. If they continue, they will be ejected from the game.
Have it Your Way
1887 ‐ "The batter can no longer call for a 'high' or 'low' pitch."
"A (strike) is defined as a pitch that 'passes over home plate not lower than the batsman's knee, nor higher than his shoulders.'"
1876 ‐ "The batsman, on taking his position, must call for a 'high,' 'low,' or 'fair' pitch, and the umpire shall notify the pitcher to deliver the ball as required; such a call cannot be changed after the first pitch is delivered."
High ‐ pitches over the plate between the batter's waist and shoulders
Low ‐ pitches over the plate between the batter's waist and at least one foot from the ground.
Fair ‐ pitches over the plate between the batter's shoulders and at least one foot from the ground.
Foul Balls
1901 ‐ "A foul hit ball not caught on the fly is a strike unless two strikes have already been called." (NOTE: Adopted by National League in 1901; American League in 1903).
1899 ‐ "A foul tip by the batter, caught by the catcher while standing within the lines of his position is a strike."
1894 ‐ "A strike is called when the batter makes a foul hit, other than a foul tip, while attempting a bunt hit that falls or rolls upon foul ground between home base and first or third bases."
Fairly and Unfairly
1907 ‐ "A fairly delivered ball is a ball pitched or thrown to the bat by the pitcher while standing in his position and facing the batsman that passes over any portion of the home base, before touching the ground, not lower than the batsman's knee, nor higher than his shoulder. For every such fairly delivered ball, the umpire shall call one strike."
"An unfairly delivered ball is a ball delivered to the bat by the pitcher while standing in his position and facing the batsman that does not pass over any portion of the home base between the batsman's shoulder and knees, or that touches the ground before passing home base, unless struck at by the batsman. For every unfairly delivered ball the umpire shall call one ball."
Armpits and Shoulders
1957 ‐ "A strike is a legal pitch when so called by the umpire which (a) is struck at by the batter and is missed; (b) enters the Strike Zone in flight and is not struck at; (c) is fouled by the batter when he has less than two strikes at it; (d) is bunted foul; (e) touches the batter as he strikes at it; (f) touches the batter in flight in the Strike Zone; or (g) becomes a foul tip. Note: (f) was added to the former rule and definition."
1950 ‐ "The Strike Zone is that space over home plate which is between the batter's armpits and the top of his knees when he assumes his natural stance."
1910 ‐ "With the bases unoccupied, any ball delivered by the pitcher while either foot is not in contact with the pitcher's plate shall be called a ball by the umpire."
Width of the Knee
1996 ‐ The Strike Zone is expanded on the lower end, moving from the top of the knees to the bottom of the knees.
1988 ‐ "The Strike Zone is that area over home plate the upper limit of which is a horizontal line at the midpoint between the top of the shoulders and the top of the uniform pants, and the lower level is a line at the top of the knees. The Strike Zone shall be determined from the batter's stance as the batter is prepared to swing at a pitched ball."
1969 ‐ "The Strike Zone is that space over home plate which is between the batter's armpits and the top of his knees when he assumes a natural stance. The umpire shall determine the Strike Zone according to the batter's usual stance when he swings at a pitch."
1963 ‐ "The Strike Zone is that space over home plate which is between the top of the batter's shoulders and his knees when he assumes his natural stance. The umpire shall determine the Strike Zone according to the batter's usual stance when he swings at a pitch."
No Vampires at NERC
Got Collaboration?
Questions and Answers