
Achieving System Safety by Resilience Engineering IET_System_Safety_Hollnagel

Oct 08, 2014

Page 1: Achieving System Safety by Resilience Engineering IET_System_Safety_Hollnagel


© Erik Hollnagel 2006

Achieving System Safety by Resilience Engineering

Erik Hollnagel

Industrial Safety Chair, École des Mines de Paris, France
E-mail: [email protected]

Professor, University of Linköping, Sweden
E-mail: [email protected]


Safety as a non-event

[Figure: Daily operation (status quo) is followed by an unexpected event and then an unwanted outcome (accidents, incidents). Labels: Prevention of unwanted events (reduce likelihood); Protection against unwanted outcomes (reduce consequences).]

SAFE SYSTEM = NOTHING UNWANTED HAPPENS

Safety management must prevent/protect against both KNOWN and UNKNOWN risks.

Safety management requires THINKING about how accidents can HAPPEN.

Page 2: Achieving System Safety by Resilience Engineering IET_System_Safety_Hollnagel


Looking at the past: What has happened? (Accident model)
Looking into the future: What may happen? (Risk model)

Accident model: Simple linear. Risk model: Component failures.
Accident model: Complex linear. Risk model: Combination of failures and degraded defences.
Accident model: Non-linear*. Risk model: Performance variability coincidences.

* Outcomes are not proportional to inputs, and cannot be derived from a simple combination of inputs.


Simple, linear cause-effect model

Assumption: Accidents are the (natural) culmination of a series of events or circumstances, which occur in a specific and recognisable order.

Consequence: Accidents are prevented by finding and eliminating possible causes. Safety is ensured by improving the organisation’s ability to respond.

Domino model (Heinrich, 1930)

Hazards-risks: Due to component failures (technical, human, organisational), hence looking for failure probabilities (event tree, PRA/HRA).
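
As a rough illustration of what "looking for failure probabilities" means in an event tree, the sketch below enumerates every path through a small tree and multiplies the branch probabilities. The initiating-event probability, the two barriers and their failure probabilities are hypothetical values chosen for illustration, and the branches are assumed to be independent, which is a simplification.

```python
from itertools import product


def event_tree(initiating_prob, barrier_failure_probs):
    """Return {path: probability} for every path through the event tree.

    A path is a tuple of booleans, one per barrier: True means the barrier
    worked, False means it failed. Branches are assumed independent.
    """
    outcomes = {}
    for path in product([True, False], repeat=len(barrier_failure_probs)):
        p = initiating_prob
        for works, p_fail in zip(path, barrier_failure_probs):
            p *= (1.0 - p_fail) if works else p_fail
        outcomes[path] = p
    return outcomes


# Hypothetical numbers: initiating event 0.1, two barriers failing with 0.05 and 0.2.
tree = event_tree(0.1, [0.05, 0.2])
p_accident = tree[(False, False)]  # the path on which every barrier fails
print(f"P(all barriers fail) = {p_accident:.4f}")
```

In this linear view the accident is simply the path on which everything fails, so its probability is a product of component probabilities. The non-linear model later in the talk questions exactly that assumption.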

Page 3: Achieving System Safety by Resilience Engineering IET_System_Safety_Hollnagel


Complex, linear cause-effect model

Swiss cheese model (Reason, 1990)

Assumption: Accidents result from a combination of active failures (unsafe acts) and latent conditions (hazards).

Consequence: Accidents are prevented by strengthening barriers and defences. Safety is ensured by measuring/sampling performance indicators.

Hazards-risks: Due to degradation of components (organisational, human, technical), hence looking for drift, degradation and weaknesses.


Non-linear accident model

Functional Resonance Accident Model

Assumption: Accidents result from unexpected combinations (resonance) of normal performance variability.

Consequence: Accidents are prevented by monitoring and damping variability. Safety requires constant ability to anticipate future events.

Hazards-risks: Emerge from combinations of normal variability (socio-technical system), hence looking for ETTO* and sacrificing decisions.

[Figure: FRAM network of functions, each drawn with the aspects I, P, C, O, R, T. Functions: Certification; Lubrication; Maintenance oversight; Horizontal stabilizer movement; Jackscrew up-down movement; Aircraft design; Aircraft pitch control; Limiting stabilizer movement; End-play checking; Jackscrew replacement. Coupling labels: FAA; Mechanics; High workload; Grease; Interval approvals; Expertise; Controlled stabilizer movement; Aircraft design knowledge; Limited stabilizer movement; Aircraft; Lubrication; Allowable end-play; Excessive end-play; Equipment; Redundant design; Procedures.]

* ETTO = Efficiency-Thoroughness Trade-Off

Page 4: Achieving System Safety by Resilience Engineering IET_System_Safety_Hollnagel


Safety management and control

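[Figure: feedback control loop. The setpoint (+) is compared with the sensor reading (-); the controller and actuating device act on the process, which is subject to disturbance; the sensor measures the output.]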

The purpose of safety management is to ensure that nothing unwanted happens.

An SMS must therefore be able to control a dynamic process or organisation to ensure that performance remains within predetermined safety limits.

Key concepts:
Process model (nature of activity)
Measurements (performance indicators, output)
Possibilities for control (means of intervention)
Nature of threats (disturbances, noise)
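
The loop described on this slide can be sketched as a discrete-time simulation. This is only a minimal illustration, not a model of any real SMS: the proportional controller, the gain, the perfect and undelayed sensor, and all numeric values are assumptions made for the example.

```python
def simulate_feedback(setpoint, gain, disturbances, initial_output=0.0):
    """Simulate a proportional feedback loop, one step per disturbance value.

    Each step: the sensor reads the output, the controller compares it with
    the setpoint and applies a correction proportional to the error, and the
    disturbance then acts on the process.
    """
    output = initial_output
    history = []
    for d in disturbances:
        measured = output                 # sensor (assumed perfect, no delay)
        error = setpoint - measured       # comparator: setpoint (+), measurement (-)
        correction = gain * error         # controller and actuating device
        output = output + correction + d  # process, subject to the disturbance
        history.append(output)
    return history


# No disturbance: the output closes half of the remaining gap at every step.
quiet = simulate_feedback(setpoint=1.0, gain=0.5, disturbances=[0.0] * 10)

# Constant disturbance: the loop settles, but away from the setpoint.
stressed = simulate_feedback(setpoint=1.0, gain=0.5, disturbances=[0.1] * 200)
print(quiet[-1], stressed[-1])
```

The four key concepts map onto the sketch directly: the update rule is the process model, `measured` is the measurement, `correction` is the means of intervention, and `disturbances` stands for the threats. Under a constant disturbance this simple controller only reacts after a deviation has occurred and leaves a residual offset.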


Safety management as feedback control

[Figure: feedback loop in which the Safety Management System compares the required safety level with the measured performance and acts on the process (internal variability), which is also affected by the environment (external variability).]

Annotations on the loop:
Accident model: simple linear, complex linear, non-linear
Reporting threshold
Performance indicators
How can changes be brought about? What are the control options/tools?
Delays in effects? Delays in feedback?
Nature of threats: regular, irregular, unexampled

Page 5: Achieving System Safety by Resilience Engineering IET_System_Safety_Hollnagel


Knowing what may happen

There is an infinite number of ways in which something can go wrong. The problem is to find those that are unlikely yet potentially serious.

[Figure: probability (p) plotted against consequence, divided into a known (safe) region and an unknown (unsafe) region.]

Requisite imagination: Where is the cut-off point?

Murphy’s law: “everything that can go wrong sooner or later will go wrong”

“If there’s more than one way to do a job and one of those ways will end in disaster, then somebody will do it that way.”


Regular threats (Westrum, 2006)

Events that occur so often that the organisation can learn how to respond.

Medication errors that only affect a single patient.
Transportation accidents (collision between vehicles).
Process or component failure (loss of mass, loss of energy).

Regular threats are covered by standard methods (HAZOP, Fault Trees, FMECA, etc.).

Solutions can be based on standard responses, typically elimination or barriers.

Their likelihood and severity (cost) are so high that they must be dealt with.

[Figure: probability (p) versus cost, with p = 0.01 marked.]

Page 6: Achieving System Safety by Resilience Engineering IET_System_Safety_Hollnagel


Irregular threats (Westrum, 2006)

[Figure: probability (p) versus cost, with p = 0.01 marked.]

One-off (singular) events, but so many, so rare, and so different that a standard response is impossible.

Apollo 13 moon mission accident.
Epidemics (BSE, H5N1).
Simultaneous loss of main and back-up systems.

Irregular threats are imaginable but usually completely unexpected. They are discounted by standard methods.

Solutions require interaction and improvisation. Standard responses are insufficient.

Their likelihood is so low that defences are not cost effective, even if consequences are serious.
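
One way to read the cost-effectiveness argument is as an expected-loss comparison. The slides do not give a formula, so the criterion below (a defence pays off when it costs less than probability times consequence) is an assumption, and every number is hypothetical.

```python
def expected_loss(probability, consequence_cost):
    """Expected loss of a threat over the period the probability refers to."""
    return probability * consequence_cost


def defence_is_cost_effective(probability, consequence_cost, defence_cost):
    """Assumed criterion: the defence costs less than the loss it would avert."""
    return defence_cost < expected_loss(probability, consequence_cost)


# Hypothetical regular threat: fairly likely, moderate consequence.
regular = defence_is_cost_effective(0.05, 1_000_000, 20_000)

# Hypothetical irregular threat: very unlikely, far more serious consequence.
irregular = defence_is_cost_effective(1e-5, 50_000_000, 20_000)

print(regular, irregular)
```

Under this criterion the rare event is discounted however serious its consequence is. That is consistent with the slide's remark that irregular threats are discounted by standard methods, and it is why the talk argues for interaction and improvisation instead of ready-made defences.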


Unexampled events (Westrum, 2006)

[Figure: probability (p) versus cost, with p = 0.01 marked.]

Events that are virtually impossible to imagine and which exceed the organisation’s collective experience.

Chernobyl.
New Orleans flooding (2005).
Attack on the WTC (9/11).

Even when unexampled events are imaginable, they are normally discounted as impossible.

Solutions require the ability to cope, i.e., dynamically to self-organize, formulate and monitor responses.

Their likelihood is so low that defences are not viable, even if consequences are catastrophic.

Page 7: Achieving System Safety by Resilience Engineering IET_System_Safety_Hollnagel


Reactive organisation

[Figure: timeline of a reactive organisation around an accident. Labels: Safety planning, preparing for regular threats; Accident; Surprise! Scrambling for action; Activate ready-made plans.]


Interactive (attentive) organisation

[Figure: timeline of an interactive organisation around an accident. Labels: Safety planning, preparing for irregular threats; Occasional health checks using pre-defined indicators; Prepared and alert, looking for expected situations; Accident; Situation assessment, quick replanning; Evaluation, learning.]

Page 8: Achieving System Safety by Resilience Engineering IET_System_Safety_Hollnagel


Proactive (resilient) organisation

[Figure: timeline of a proactive organisation around an accident. Labels: Safety planning, preparing for unexampled events; Constantly self-critical and inquisitive; Alert and observant; Accident; Situation assessment, reorganisation; Alternative ways of functioning; Evaluation, learning.]


Some examples

Type of organisation and examples:

Reactive (brittle, no resilience)
Mont Blanc Tunnel fire (March 24 1999)
Swedish government after Tsunami (December 26 2004)
Homeland Security and FEMA after Hurricane Katrina (August 29 2005)

Interactive (robust, partial resilience)
The aviation industry
Nuclear power plants
Hospitals

Proactive (full resilience)
Toyota (as innovative manufacturer)
People of London after bombing, July 7 2005
Israeli hospitals (bus bombings)

Page 9: Achieving System Safety by Resilience Engineering IET_System_Safety_Hollnagel


Success and failure

Failure is normally explained as a breakdown or malfunctioning of a system and/or its components. This view assumes that success and failure are of a fundamentally different nature.

Individuals and organisations must adjust to the current conditions in everything they do. Because information, resources and time are finite, such adjustments will always be approximate.

Success is due to the ability of organisations, groups and individuals to make these adjustments correctly, in particular to anticipate failures before they occur.

Failure is due to the absence of that ability, either temporarily or permanently.

Safety must encompass strengthening this ability, rather than just avoiding or eliminating failures.


“Surprises” and responses

Type of organisation: Reactive; Interactive (attentive); Proactive (resilient)

Organisation’s view on “surprises”

Disturbances, or disrupting events, which challenge the proper functioning of a process.
Exceptions that must be regimented.
Uncertainty about the future.
A need constantly to update definitions of the difference between success and failure.
A recognition that models and plans are likely to be incomplete or wrong, despite best efforts.

Focus of organisation’s response

Try to keep process under control and ensure people do not exceed given ‘limits.’
Improve ability to detect and to respond when challenged. Prepare routines and plans.
Identify the variability that the organisation should be aware of; ensure ability to cope with these variations.
Search for the boundaries of own assessments in order to learn and revise.

Page 10: Achieving System Safety by Resilience Engineering IET_System_Safety_Hollnagel


From reactive to proactive control

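[Figure: control loop with target state (setpoint), process, sensor, disturbance and output. Compensatory control (feedback) acts on the sensed output (+/- comparison with the setpoint); anticipatory control (feedforward) acts on the expected disturbance.]

The main tool for looking ahead should NOT be to look back.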

You cannot drive a car by looking in the rear-view mirror!
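
The difference between compensatory and anticipatory control can be sketched by adding an optional forecast of the disturbance to a simple loop. This is a toy illustration under assumed values: the proportional gain, the single unit disturbance and the idea that a forecast is available at all are assumptions, not part of the slides.

```python
def worst_deviation(setpoint, disturbances, feedback_gain, forecast=None):
    """Return the largest deviation from the setpoint over the run.

    forecast[i] is the disturbance anticipated at step i (hypothetical input).
    Feedforward counteracts the forecast before it affects the output;
    feedback only reacts to a deviation that has already occurred.
    """
    output = setpoint
    worst = 0.0
    for i, d in enumerate(disturbances):
        correction = feedback_gain * (setpoint - output)              # compensatory
        anticipation = -forecast[i] if forecast is not None else 0.0  # anticipatory
        output = output + correction + anticipation + d
        worst = max(worst, abs(output - setpoint))
    return worst


disturbances = [0.0, 0.0, 1.0, 0.0, 0.0]

feedback_only = worst_deviation(1.0, disturbances, feedback_gain=0.5)
with_forecast = worst_deviation(1.0, disturbances, feedback_gain=0.5,
                                forecast=disturbances)
print(feedback_only, with_forecast)
```

With feedback alone the full disturbance reaches the output before any correction starts. With an accurate forecast the deviation never appears, but the benefit depends entirely on the quality of the anticipation.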


SMS as feedforward control

[Figure: control loop in which the Safety Management System compares safety values and targets (set by customers, regulators, …) with the measured performance, acts on the process (internal variability), and also anticipates irregularities, disturbances and threats from the environment (external variability).]

Annotations on the loop:
Accident model: simple linear, complex linear, non-linear
Reporting threshold
Performance indicators
How can changes be brought about? What are the control options/tools?
Delays in effects? Delays in feedback?
Nature of threats: regular, irregular, unexampled

Page 11: Achieving System Safety by Resilience Engineering IET_System_Safety_Hollnagel


Components of resilience

Anticipation: knowing what to expect
Attention: knowing what to look for
Response: knowing what to do (rational response)

[Figure: Anticipation, Attention and Response shown together. Other labels: Dynamic developments; Updating; Learning; Knowledge; Competence; Resources.]


Resilience and safety management

Resilience is the intrinsic ability of an organisation to keep or recover a stable state, thereby allowing it to continue operations after a major mishap or in the presence of continuous stress.

A practice of Resilience Engineering must comprise the following critical components:

Techniques to model and predict the short- and long-term effects of change and decisions on risk.

Tools and methods to improve an organisation’s resilience vis-à-vis the environment.

Ways to analyse, measure and monitor the resilience of organisations in their operating environment.

Page 12: Achieving System Safety by Resilience Engineering IET_System_Safety_Hollnagel


If you want to know more about RE ...