CRANFIELD UNIVERSITY

Alexander Tomczynski

APPLICATION OF RESILIENCE ENGINEERING CONCEPTS TO THE MANAGEMENT OF AIRWORTHINESS

DEFENCE ACADEMY - COLLEGE OF MANAGEMENT AND TECHNOLOGY

Military Aerospace and Airworthiness

MSc

Academic Year: 2013 - 2014

Supervisor: Dr Simon Place

March 2014

This thesis is submitted in partial fulfilment of the requirements for the
degree of Master of Science

© Crown Copyright 2014. All rights reserved. No part of this publication may
be reproduced without the written permission of the copyright owner.

ABSTRACT

Complex safety-critical systems in high-hazard industries continue to have
accidents despite improvements in reliability and in the understanding of
human factors and organisational behaviour. Resilience engineering offers a
new paradigm in safety science and proposes that safety is defined as success
under varying performance conditions. The theory is examined and its
applicability to airworthiness is discussed. A related technique, the
Functional Resonance Analysis Method (FRAM), treats system performance as a
control problem. This methodology is employed to create an airworthiness
management tool for the Royal Air Force Tornado aircraft fleet. Data were
gathered from occurrence reports, practical experience and semi-structured
interviews with a variety of personnel within the airworthiness system. The
tool comprises a spreadsheet model with an accompanying interactive
visualisation tool. It is used to analyse two air safety occurrences and to
attempt a resilience-based risk assessment of an airworthiness issue. It was
concluded that resilience engineering presents a promising basis for better
management of airworthiness. The initial version of the tool was found to
work well, but extensive development work is required to produce a desktop IT
airworthiness resilience dashboard tool.

Keywords:

SYSTEM SAFETY, SAFETY CRITICAL SYSTEMS, ACCIDENT INVESTIGATION

ACKNOWLEDGEMENTS

I would like to thank my wife Natalie for her encouragement and support.

Also worthy of thanks are Professor Erik Hollnagel and the rest of the
“FRAMily”, who have gathered both online and at the 2013 meeting in Munich.
The shared knowledge and experience have been most instructive.

This project would not have been possible without the enthusiastic
participation of a large number of people at Royal Air Force Station Marham:
service personnel and employees of BAE Systems and Rolls-Royce.

The guidance provided by my supervisor, Dr Simon Place, has been invaluable
in the completion of this project and I thank him for it.

In remembrance of No. CXX Squadron, Crew 3

“Endurance”

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF EQUATIONS
LIST OF ABBREVIATIONS
1 INTRODUCTION
1.1 Introduction
1.2 Background – Theories of Safety
1.3 Background – The Practical Requirement
1.4 What is ‘Airworthiness Management’?
1.5 The Research Aim
1.6 Objectives
1.7 Methodology Overview
1.8 Descriptions and Definitions
1.9 Thesis Structure
2 LITERATURE REVIEW
2.1 Airworthiness in the Context of Safety
2.1.1 Accident Investigations
2.1.2 Initial and Type Airworthiness
2.1.3 Safety Management
2.1.4 Continuing Airworthiness
2.2 A History of Safety Theory
2.2.1 Technological Age – Governing Philosophy
2.2.2 Technological Age – Tools
2.2.3 Limits of Probabilistic Risk Assessment
2.2.4 Human Factors
2.2.5 Organisational
2.3 Complexity
2.3.1 Complexity Theory
2.3.2 Systems Thinking and Systems Engineering
2.3.3 Control Theory
2.3.4 Non-Linear Dynamics
2.4 Resilience Engineering
2.4.1 Resilience Engineering as a Successor to Safety Management
2.4.2 Under Specification of Performance Conditions
2.4.3 Performance Variability
2.4.4 Examples of Resilience Engineering in Practice
2.4.5 Criticism of Resilience Engineering
2.4.6 Resilience Engineering and Airworthiness

2.4.7 Lean Resilience
2.5 Functional Resonance Analysis Method
2.6 Quantifying Resilience
2.7 Concluding Remarks
3 METHODOLOGY
3.1 Introduction
3.2 Working Arrangements
3.3 Research Interviews
3.4 Model Development
3.5 Air Safety Information Management System Data
3.5.1 Data Extraction
3.5.2 Assignment of Related Functions to Incidents
4 BUILDING THE TORNADO AIRWORTHINESS SYSTEM MODEL USING THE FUNCTIONAL RESONANCE ANALYSIS METHOD
4.1 Basic Principles
4.2 Taxonomy
4.3 FRAM Step 0 – Recognise the Purpose of the FRAM Analysis
4.4 FRAM Step 1a – Identify and Describe the Initial Function List
4.5 FRAM Step 1b – Verify Functions with Experts
4.6 Step 2 – Identification of Output Variability
4.7 Step 2a – Identify the Type of Function
4.8 Step 2b – Identify Internal Sources of Output Variability
4.9 Step 2c – Identify External Sources of Output Variability
4.10 Step 2d – Most Likely Dimension of Output Variability
4.11 Step 3 – Aggregation of Variability
4.12 Step 4 – Consequences of the Analysis
4.12.1 Step 4a – Damping Factors
4.12.2 Step 4b Performance Indicators
4.13 Summary of TASM Layout
5 TORNADO AIRWORTHINESS SYSTEM MODEL VISUALISATION TOOL
5.1 Need for the Tool
5.2 Microsoft Visio
5.3 Building the Tool
5.3.1 General Functional Areas
5.3.2 Functions
5.3.3 External Dependencies
5.3.4 Functional Activities
5.4 Exploiting the Tool
5.5 Summary
6 USING THE TORNADO AIRWORTHINESS SYSTEM MODEL FOR INCIDENT ANALYSIS
6.1 Case for Using FRAM for Incident Modelling

6.2 Incident One – Thrust Reverser Incidents
6.2.1 Description of Incidents
6.2.2 Summary of the Investigations
6.2.3 Instantiation of the FRAM Model
6.2.4 The Sources of Variability
6.2.5 Insights from TASM
6.3 Incident 2 – Missing Rigging Pin
6.3.1 Description of Incident
6.3.2 Summary of Investigation
6.3.3 Instantiation of the TASM
6.3.4 Insights from TASM
7 USING THE TORNADO AIRWORTHINESS SYSTEM MODEL FOR RISK ANALYSIS
7.1 Case for Using TASM for Risk Analysis
7.2 Current Theoretical Basis for Airworthiness Risk Management
7.3 Proposal of FRAM Based Airworthiness Risk Theory
7.4 Proposal for a FRAM Based Risk Assessment Process
7.5 Risk Example – Operation of Components in Excess of Cleared Life
7.5.1 Generating a FRAM Model Risk Assessment
7.5.2 Insights into Risk
7.6 Proposal for a FRAM Based Risk Management
7.7 Chapter Summary
8 DISCUSSION
8.1 Applicability of the Resilience Engineering Paradigm to Airworthiness
8.2 The Tornado Airworthiness System Model – Initial Version
8.3 Incident Investigation
8.3.1 Data Collection
8.3.2 Aids to Investigation
8.4 Risk Assessment
8.4.1 Hazard Management vs Functional Resonance Management
8.5 Utility of the TASM for Type Airworthiness Activities
8.6 Utility of the TASM for Continuing Airworthiness Activities
8.7 Utility of TASM for Duty Holder Activity
8.8 Potential Use for System Improvement
8.9 Potential for Further Development of the TASM
8.9.1 Increased Model Fidelity
8.9.2 Application of Bayesian and/or Fuzzy Logic
8.9.3 Expansion into Operational Safety Management
8.10 Chapter Summary
9 CONCLUSIONS
9.1 Summary
9.2 Recommendations

9.2.1 Manage Airworthiness as a Control Problem
9.2.2 Use the TASM to Control the Airworthiness System
9.2.3 Review Airworthiness Risk from a Resilience Perspective
9.2.4 Use FRAM as a Means to Improve System Resilience and Efficiency
9.3 Potential for Further Research and Development
9.4 Concluding Remarks
REFERENCES
Appendix A – TORNADO AIRWORTHINESS FRAM MODEL
Appendix B – TORNADO AIRWORTHINESS MODEL VISUALISATION
Appendix C – PARTICIPANTS BRIEFING SHEET

LIST OF FIGURES

Figure 1-1 Nimrod MR2 XV230
Figure 1-2 RAF Tornado GR4 Aircraft
Figure 2-1 Accident Analysis and Risk Assessment Methods
Figure 2-2 Three Tracks on the Evolution of Safety Theory
Figure 2-3 The ‘Cynefin’ Framework – Complexity and Risk Management
Figure 2-4 General Form of a Model of Socio-technical Control
Figure 2-5 The Four Cornerstones of Resilience
Figure 2-6 Conceptual Framework for Resilience Engineering
Figure 2-7 Framework for managing the impact organisation, technology and human factors have on safety management systems
Figure 2-8 FRAM Function
Figure 4-1 FRAM Model Visualisation Demonstrating Taxonomy
Figure 4-2 TASM Step 12 – Screen Capture Showing Applicable Spreadsheet Areas
Figure 4-3 Visualising Functional Output Variability
Figure 4-4 Instances of Functional Output Variability Recorded in Occurrence Reports 2012/13
Figure 4-5 Instances of Reported Functional Output Variability by Function Type
Figure 4-6 Total Instances of Functional Output Variability Recorded in Occurrence Reports 2012/13
Figure 4-8 Tracing Output Downstream Dependencies (Screen Capture)
Figure 4-9 Rough Score Matrix
Figure 4-10 Rough Downstream Function Variability Score
Figure 4-13 Example FRAM for 2 Functions, A and B

Figure 5-1 Visualisation Functional Groupings
Figure 5-2 A Function and Its Aspects
Figure 5-3 Screen Capture of Visualisation Tool with Functions Added
Figure 5-4 Screen Capture of Visualisation Tool with External Dependencies Added
Figure 5-5 5-6 Screen Capture of Visualisation Tool with all Functional Activities Shown
Figure 5-7 Activities and Dependencies Linked to Aspects of the ‘Train Maintenance Personnel’ Function
Figure 5-8 Selecting Layers within Visio – Screen Capture
Figure 5-9 DII Visio Viewer – Screen Capture
Figure 5-10 Visualisation Tool Key
Figure 6-1 Tornado GR4 with Thrust Reversers Deployed
Figure 6-2 Thrust Reverser Incidents Visualisation
Figure 6-3 Propulsion & Electrical System
Figure 6-4 Electrical System Potential Functionally Resonant Activities
Figure 6-5 Instantiation of Thrust Reverse Occurrence Reports
Figure 6-6 Location of Where Lost Pin was Installed
Figure 6-7 General Installation Location of Lost Pin
Figure 6-8 Pin Location in Tool Kit
Figure 6-9 Visualisation Tool Output for Rigging Tool Occurrence
Figure 6-10 Instantiation of Rigging Pin Occurrence
Figure 7-1 Tornado Process for Emergent Airworthiness Issues
Figure 7-2 Current Theoretical Basis for Tornado Airworthiness Risk Management
Figure 7-3 Proposed Functional Resonance Risk Management Theory - Visualisation of a Generic Hazardous Process
Figure 7-4 FRAM Model Risk Assessment Process
Figure 7-5 Operation of Components Beyond Cleared Life - First Stage Risk Visualisation, Excluding Background Functions
Figure 7-6 Visualisation of Hazard Generation Process
Figure 7-7 Visualisation of Potential Accident Processes

Figure 7-8 Proposed Risk Management Process
Figure 8-1 Fractal Property of the FRAM - Function Decomposed into Lower Level Functions
Figure 8-2 TASM Development Pathway

LIST OF TABLES

Table 2-1 Herrera's Ages of Safety Theory
Table 2-2 Benefits and Criticisms of Probabilistic Risk Assessment
Table 2-3 Examples of Resilience Engineering in Practice
Table 3-1 D-ASOR Classifications included in Data
Table 4-1 Example FRAM frame for Fault Diagnosis
Table 4-2 Listing of TASM Functions
Table 4-3 Summary of Internal Variability
Table 4-4 Summary of External Variability
Table 4-5 Example TASM Recording of Step 2a-c for Function 67 - Engine Fleet Monitoring
Table 4-6 Elaborate Description of Output Variability
Table 4-7 Characterising Output Variability – Flight Servicing
Table 4-8 Classifications for Frequency of Output Variability
Table 4-9 Classification of Amplitude of Performance Variability
Table 4-10 Aggregation of Variability for Flight Servicing
Table 4-11 Example of Step 4 - Flight Servicing
Table 6-1 Thrust Reverser Air Safety Occurrence Reports 2012/13
Table 6-2 Thrust Reverse Occurrences with Detailed Investigation
Table 6-3 Thrust Reverser FRAM Instantiation
Table 6-4 FRAM Model of Electrical System
Table 6-5 Electrical System Precondition Variability
Table 6-6 Functional Variability Noted From Investigation
Table 7-1 Configuration Management Aspects
Table 7-2 Summary of Second Stage of Risk Assessment
Table 7-3 Stage 2 - Scheduled Maintenance Function
Table 7-4 Stage 2 - Force and A4 Operations Function (Part 1)
Table 7-5 Stage 2 - Force and A4 Operations Function (Part 2)
Table 7-6 Stage 3 Replacement of Life Limited Parts Function

Table 7-7 Example Accident Generating Function FRAM Frame Layout
Table 7-8 Avionic Flight Systems Output – Baseline FRAM Model
Table 8-1 Utility of TASM for TAA Activities
Table 8-2 Potential CAMO Use of TASM
Table 8-3 Aviation Duty Holder Use of TASM

LIST OF EQUATIONS

Equation 1 – Linear System
Equation 2 – Additive Property
Equation 3 – Homogeneous Property
Equation 4 – Non-Linear System; Lack of Additive Property
Equation 5 – Non-Linear System; Lack of Homogeneous Property
Equation 6 – Rough Downstream Function Variability Score

LIST OF ABBREVIATIONS

A4          NATO designator for Logistics/Engineering
AC          Aircraft
AcciMap     Accident Map
ADF         Acceptable Deferred Fault
AEB         Accident Evolution and Barrier Function
AESO        Air Engineering Standing Orders
ALARP       As Low As Reasonably Practicable
ARC         Airworthiness Review Certificate
ATTAC       Aircraft Tornado Transformation Availability Contract
ASIMS       Air Safety Information Management System
ATHEANA     A Technique for Human Error ANAlysis
AWFL        AirWorthiness Flight Limitations
CAM         Continuing Airworthiness Manager
CAMO        Continuing Airworthiness Management Organisation
CAMSS       Continued Airworthiness Management Support Services
CMU         Combined Maintenance and Upgrade Unit
CREAM       Cognitive Reliability Error Analysis Method
CSNI        Committee on the Safety of Nuclear Installations
DAOS        Design Approved Organisation Scheme
DASOR       Defence Air Safety Occurrence Report
DE&S        Defence Equipment and Support
DII         Defence Information Infrastructure
DMS         Dedicated Maintenance System
DO          Design Organisation
DQAFF       Defence Quality Assurance Field Force
EA          Engineering Authority
EngO        Engineer Officer
ETTO        Efficiency Thoroughness Trade Off
FAST        Fast Air Support Team
FMEA        Failure Mode Effects Analysis
FMECA       Failure Mode and Criticality Analysis
FOC         Force Operations Centre
FRAM        Functional Resonance Analysis Method
GSE         Ground Support Equipment
HAS         Hardened Aircraft Shelter
HAZOPS      Hazard and Operability Study
HAZID       Hazard Identification

HCR         Human Cognitive Reliability
HEAT        Human Error Assessment Technique
HERA        Human Error in Air Traffic Management Technique
HFACS       Human Factors Analysis and Classification System
HPES        Human Performance Enhancement System
HRO         High Reliability Organisation
ITEA        Independent Technical Evaluation and Advice
JEngO       Junior Engineering Officer
JSP         Joint Service Publication
LFT         Latest Finish Time
LITS        Logistics Information Technology System
LOAA        Letter Of Airworthiness Authority
MAA         Military Airworthiness Authority
MAOS        Maintenance Approved Organisation Scheme
MAP-01      Manual of Airworthiness Processes - 01
MERMOS      Méthode d’Evaluation de la Réalisation des Missions Opérateur pour la Sûreté
MOD         Ministry of Defence
MMD         Man Made Disaster
MORT        Maintenance Oversight and Risk Tree
MSG         Maintenance Steering Group
MRP         MAA Regulatory Publications
MTO         Man-Technology-Organisation
MWO         Maintenance Work Order
NAT         Normal Accident Theory
NATO        North Atlantic Treaty Organisation
NETMA       NATO Eurofighter and Tornado Management Agency
OOPS        Out Of Phase Servicing
OSI         Occurrence Safety Investigation
ORG         Occurrence Review Group
PST         Propulsion Support Team
PT          Project Team
QA          Quality Assurance
QMS         Quality Management System
R2          2nd Line Repair
RA          Regulatory Article
RAF         Royal Air Force
RCA         Root Cause Analysis
ROCET       RB199 Operational Contract for Engine Transformation

RTS         Release To Service
RTSA        Release To Service Authority
SEngO       Senior Engineering Officer
SI(T)       Special Instruction (Technical)
SQEP        Suitably Qualified and Experienced Person
STAMP       Systems Theoretic Accident Model
STANEVAL    STANdards EVALuation
STEP        Sequential Timed Event Plotting
TAA         Type Airworthiness Authority
TAP         Technical Assistance Process
TASM        Tornado Airworthiness System Model
TGRF        Tornado Ground Attack & Reconnaissance Force
THERP       Technique for Human Reliability Analysis
TME         Testing and Measuring Equipment
TRACEr      Technique for The Retrospective Analysis of Cognitive Error
TSEMP       Tornado Safety and Environmental Management Plan

1 INTRODUCTION

“I can see him now, fighting with the controls, trying his best…

…We want some justice and the MOD to sit up and take notice, what they have
done could have been avoided; we live in hope that they will not let this
happen in the future.”

Mrs Adele Squires, Wife of Flight Lieutenant Al Squires, Captain of the
Nimrod aircraft XV230

1.1 Introduction

Air accidents have shown that aircraft are sometimes not as safe or as
airworthy as was previously imagined. With huge resources applied to ensuring
airworthiness, why do accidents still occur? It is often said that such
accidents could have been prevented had the lessons of the past been heeded.
Yet despite many investigations and recommendations, accidents still occur.
Why is this so? Are existing tools for safety analysis inadequately applied,
or inadequate in and of themselves? How can those charged with responsibility
for complex hazardous systems in industry, transportation or the military
work better to prevent accidents and yet still achieve their operational
objectives?

Design engineers are duty bound to demonstrate that their system may
initially be operated without an unacceptable level of harm. Thereafter, the
system must be maintained and continually monitored for any increase of risk
beyond acceptable levels. For organisations that have very few, if any,
accidents, this is a major challenge: how can the risk of something that has
not happened be measured and managed? In the aviation domain, airworthiness
is a property that requires continual management and assessment. Whilst the
property is attributable to the materiel itself, aircraft systems experience
almost constant contact with humans, and thus airworthiness is inherently
bound up with the humans who manage, operate and maintain aircraft. Aircraft
and their supporting organisations are complex ‘socio-technical systems’.
Resilience engineering is a new concept that provides insight into this
relationship and offers useful models and tools for better management of the
safety of such systems. If the complex socio-technical systems that manage
airworthiness are better understood, then perhaps future accidents will be
prevented.

1.2 Background – Theories of Safety

There have been accidents involving complex systems ever since the industrial

revolution. The management of safety has consequently been a concern since

these times but a theoretical basis for safety did not emerge until the 1930s.

Herrea (2012) divides the development of safety theory into 4 overlapping ages

of safety theory - the ages of technology, human factors, organisational safety

and complexity. The age of technology dealt with the design of machines and

why they fail whereas human factors has traditionally been concerned with why

humans fail to do what is expected of them. Organisational safety has been

concerned with the safe management of potentially hazardous enterprises and

how these fail – Reason’s (1997) famous ‘Swiss Cheese’ being the pre-eminent

model in the field. However detailed the taxonomies of failure in the first 3 ages

were, the associated models of accident causation have been linear. An

emerging 4th age of safety theory is that of complexity. In complexity theories,

accident causation models are non-linear and are sometimes said to be

intractable. The term Resilience Engineering has come to encompass the use

of these models; the practice seeks to develop socio-technological systems that

are resilient against those variations in system performance which may cause

accidents. Airworthiness has been mostly associated with technological safety

theory, with reliability and safety assessment methods such as fault tree

analysis dominating the thinking of designers and regulators. Although design

for human factors has been an issue since the 1940s the field has generally

been concerned with operator performance; human factors in maintenance has

only more recently come into the spotlight (Reason and Hobbs, 2003). The

ability of aircraft operating authorities and regulators to maintain continuing

airworthiness has been the subject of analysis from an organisational safety

standpoint due to accidents such as Alaska Air 261 (Woltjer, 2007). More

recently accidents such as Air France 447 (Stoop, 2013) have shown that

unexpected results can emerge from increasingly complex systems and that a

lack of resilience can be fatal.

1.3 Background – The Practical Requirement

This research will specifically address the management of airworthiness within

the United Kingdom’s military. On the 2nd September 2006 the UK military

suffered its single largest loss of life since the 1982 Falklands War, when a

Royal Air Force (RAF) Nimrod MR2 aircraft was destroyed near Kandahar,

Afghanistan. This was not the consequence of a hostile act or the outcome of

operator error. It was an accident caused by a failure to establish the correct

level of initial airworthiness through the design of modifications and thereafter a

failure to maintain continuing airworthiness in the condition of fuel and hot air

systems. The independent inquiry into the incident identified that the deeper

causes were organizational and managerial (Haddon-Cave, 2009).

Figure 1-1 - Nimrod MR2 XV230 (McKenzie, 2012)

As a consequence of the recommendations made by the Nimrod Review,

military airworthiness management has been comprehensively overhauled as

part of a reorganisation of ‘Air Safety’ within the Ministry of Defence (MOD). The

previously “byzantine” (Haddon-Cave, 2009) regulation of air safety has been

simplified through the establishment of the Military Aviation Authority (MAA).

Key to the new system has been the establishment of a chain of ‘duty holders’

who are named senior military officers with legal responsibility for the safety of

aircraft operated by their organisation. Duty Holders rely on Type Airworthiness

Authorities (TAA) and Continuing Airworthiness Managers (CAM) to ensure

that the airworthiness of their aircraft is adequately established and maintained.

In practice this is achieved through a variety of processes aimed at managing

the risk of a technical failure. There is an engineering programme to maintain

the integrity of the systems’ initial airworthiness whilst developing the system’s

capability and also a maintenance programme specified by the Engineering

Authority (EA) (reporting to the TAA) and implemented by the Continuing

Airworthiness Management Organisation (CAMO). In common with most socio-

technological systems, these processes do not operate exactly as designed or

documented. Particular concerns centre around the human factors within the

maintenance programme and whether or not appropriate engineering

‘standards and practices’ can be ensured in the face of pressures to produce

operational output within an increasingly lean front line organisation. These are

typical examples of the Efficiency-Thoroughness Trade-Off (ETTO) principle

highlighted within the Resilience Engineering literature (Hollnagel et al, 2007).

Practical experience of the messy realities of military aircraft operations and

back-office airworthiness assessment was the genesis of the research aim. This

research uses the RAF’s Tornado Ground Attack Reconnaissance Force

(TGRF) as a case study.

Figure 1-2 - RAF Tornado GR4 Aircraft (Crown Copyright, 2009)

1.4 What is ‘Airworthiness Management’?

Large organisations exist to maintain, modify, provide resources, operate and

monitor aircraft fleets in order to keep them airworthy. The way in which this

multitude of functions is carried out has a variety of effects on the aircraft

system and the property of airworthiness. Those responsible for airworthiness

can only manage it indirectly by managing the functioning of the organisation.

This is achieved by means of tasking maintenance or setting policy, defining an

organisational structure (including contracting out elements), providing

resources and conducting quality assurance. So whilst making engineering

assessments and specifying what physical actions are to be carried out on an

aircraft system is critical, the management of airworthiness is a wider

endeavour.

1.5 The Research Aim

The aim of this thesis is:

To apply resilience engineering concepts by producing a system

model of an airworthiness management organisation in order to

provide a tool to improve management of airworthiness.

1.6 Objectives

In order to achieve the aim the following research objectives were established:

Review the theoretical background to safety management and the

implications for airworthiness management.

Review the concepts of Resilience Engineering with an emphasis on

applying it to airworthiness management.

Establish a theoretical framework for a model of an airworthiness

management system.

Gather and use primary research data to establish and validate a model

of the airworthiness management system for the RAF Tornado Force.

Using the model, develop a tool to enhance the airworthiness

management system of the RAF Tornado Force.

1.7 Methodology Overview

A literature review of resilience engineering was carried out, which branched out

into source disciplines of systems thinking and engineering; control theory; non-

linear dynamics and complexity theory. A search for work in this area

addressing airworthiness or technical safety in other domains was conducted.

For the Tornado case study, the safety, airworthiness and assurance plans of

the various elements of the organisation were examined. Resilience

engineering provides a number of modelling techniques that could be applied to

the case study; these were assessed and down selected to the Functional

Resonance Analysis Method (FRAM). The system was assessed by semi-

structured interviews with key personnel as well as using a large amount of

information and experience gained from working within the system. The FRAM

Model was built within a spreadsheet and a separate model visualisation tool

was created using Microsoft Visio. This allowed for the identification of various

potential leading indicators for system safety. In order to validate the FRAM

model, specific case studies were required. Two incident reports and an

emergent airworthiness risk were selected for analysis.

1.8 Descriptions and Definitions

For simplicity the standard terminology as described within MAA02 – Military

Aviation Authority Master Glossary (MAA, 2012) is adopted for this thesis.

There are a number of minor differences in emphasis between terms used here

and in civil aviation or other domains; these are discussed where relevant.

1.9 Thesis Structure

This thesis is structured around the research objectives:

Chapter 2 describes the theoretical foundations for resilience engineering

in the context of the other theories of safety and safety engineering

practice in other domains. Potentially useful models are analysed.

Chapter 3 details the methodology for carrying out the primary research.

Chapter 4 describes the process for building the case study FRAM Model

– the Tornado Airworthiness System Model.

Chapter 5 describes the development of the FRAM visualisation tool.

Chapter 6 discusses how the FRAM Model may be used for incident

analysis with reference to two examples.

Chapter 7 gives a process for, and an example of, using the FRAM Model as a risk

assessment tool.

Chapter 8 provides a general discussion of the case study exercise,

focussing on the applicability of Resilience Engineering to aspects of

airworthiness practice.

Chapter 9 provides some conclusions.

2 LITERATURE REVIEW

The literature review will examine arguments for broadening the scope of

airworthiness to address the complexities of managing modern aircraft,

maintenance and support organisations. Existing notions of cause, failure and

hazards are challenged as the theoretical background to resilience engineering

is described. Models and methods for understanding and managing the safety

and airworthiness of complex systems are examined using the paradigm of

resilience engineering.

2.1 Airworthiness in the Context of Safety

There are a number of definitions for the term airworthiness; all these have at

their core the need for the aircraft to be able to be operated in safety or as the

MAA has it; ‘without significant hazard’. Hazard is further defined as ‘an

intermediate state where the potential for harm exists’ (MAA, 2012b). The

hazard is said to lie between a cause (such as a technical or human failure) and

an accident. So whilst airworthiness is clearly a target for aerospace design

organisations to meet through satisfaction of certification standards, it is also an

element of system safety that requires management throughout the lifecycle of

the system. It is analogous to ‘technical safety’ in other domains, which is

often separated from ‘operational’ or ‘occupational’ safety.

2.1.1 Accident Investigations

The need to investigate loss of life or near misses is both a pragmatic and moral

choice. The conclusions drawn from such investigations are extremely

important at a human level but also critical to restoring system safety. It is

therefore vital for accident investigators to use mental and procedural models

that reflect the complexity of modern technologies. One of the largest accident

investigation agencies, the National Transportation Safety Board (NTSB)

determines a ‘probable cause’ in all its reports (Johnson and Holloway, 2004)

but ICAO recommends that ‘causes’, in the plural, are determined (ICAO, 2001).

This indicates a governing accident chain theory in the former organisation but

perhaps a slightly more sophisticated model in the latter. Various writers (De

Landre et al., 2006; Coury et al., 2008) have proposed models or frameworks

in which multiple causes can be described in accident investigation. Much has

been written about the intersection between legal frameworks and accident

investigation methodologies. Dekker (2003) for example has described the

detrimental effect of the adversarial nature of justice. The rest of this chapter will

describe how assigning ‘root’ or probable cause to accidents is potentially

unhelpful in the context of complex systems. It follows therefore that notions of

blame or individual responsibility are often problematic to apply.

2.1.2 Initial and Type Airworthiness

Much of the airworthiness of a system is ‘designed-in’ before manufacture. This

involves specifications, systems configuration and assumptions on support and

maintenance philosophy. A structured systems engineering approach to safety

as described in ARP 4761 (SAE, 1996) is used to convince regulators that a

type certificate can be issued. The evolution of safety requirements and

regulation over a system’s lifecycle causes difficulty (Kelly and McDermid,

1999). Military aircraft in particular are often retained in service for many

decades. Whilst the technology may remain relatively constant, experience

shows that it is usual for operational usage to evolve over the course of the

lifecycle. For this reason it is important to regularly adjust, validate and reassess

airworthiness assessments if the type airworthiness of a design is to be

maintained.

2.1.3 Safety Management

For many complex systems, the development of safety cases is a mandatory

requirement (MoD, 2007) and in particular for military airworthiness this is

governed by MAA Regulatory Article 1205 (MAA, 2013). The concept of a

safety case is the presentation or collation of a body of evidence to assure

interested parties that the system is safe. This body of evidence is collected and

organised according to mental or procedural models. The theoretical basis for

these models is provided by the same theories of safety as described below. Safety

management systems are similarly structured according to the prevailing

theoretical approach to safety. An evolution in modelling requires an evolved

approach to safety management.

2.1.4 Continuing Airworthiness

Continuing airworthiness relates to the maintenance of a particular, safe system

state for each of the individual aircraft being managed (MAA, 2012b). Given that

it is never possible to comprehensively inspect/audit each aircraft before every

flight, there must be assumptions made as to the effect of organisational and

human interactions with the aircraft so as to maintain the system in a safe state.

Understanding maintenance system performance is critical to assuring

continued airworthiness. This is achieved through a Continuing Airworthiness

Management Organisation (CAMO) which provides assurance that its specified

tasks are being undertaken successfully. This is primarily achieved through a

quality assurance system, which ensures that rigorous processes are

established (Casey, 2013).

2.2 A History of Safety Theory

Chapter One sketched out a chronological view of ‘Ages’ of safety theory. New

theories tend to gain traction as a result of the investigation of major accidents.

Herrera (2012) describes how safety theory has evolved across technological,

human factors, organisational and complexity ‘ages’, identifying key accidents

and ideas on a time line, which is summarised in Table 2-1:

Table 2-1 Herrera’s Ages of Safety Theory

Time | Accidents | "Age" of Safety Theory (Technology, Human Factors, Organisational, Complexity): theories, methods and models
1930s | | Domino Model
1940s - 50s | | Failure Mode Effects Analysis (FMEA); Human Factors Design; Task Analysis
1960s | Aberfan Colliery Disaster | Fault Tree Analysis (FTA) - Minute-Man Missiles & Boeing aircraft; Energy Barrier Model; Technique for Human Error Rate Prediction
1970s | Flixborough & Seveso Chemical Plants; Tenerife Aircraft Collision; Three Mile Island Nuclear Plant | Probabilistic Risk Assessment (WASH-1400 Reactor Safety Study); Hazard & Operability Analysis; Energy Damage and Countermeasure Strategies; Man Made Disaster; Information Perspective
1980s | Bhopal Chemical Plant; Challenger Space Shuttle; Chernobyl Nuclear Plant; Kings Cross Railway; Piper Alpha Oil & Gas; Dryden Aviation | Crew Resource Management; Safety Culture; Swiss Cheese Model; Normal Accident Theory
1990s | Warsaw Air Crash; Iraq Friendly Fire; Cali Air Crash; Ariane 5 - Space; Norne Air Crash; Longford Oil & Gas | Mandatory Safety Cases (UK); Normal Deviations; Man, Technology and Organisation Concept; Drift into Failure; Risk Influence Model; High Reliability Organisations
2000s | Überlingen Air Crash; Columbia Space Shuttle; Helios Airways; Texas City Refinery; Nimrod Air Crash; Air France 447; Deepwater Horizon | Human Factors Analysis & Classification System; Failure of Leadership, Culture & Priorities; Aviation Safety Management Systems; Resilience Engineering; Theory of Practical Drift

Leonhardt et al (2009) present a breakdown of safety methodologies within a

Resilience Engineering White Paper. This document describes Technical,

Human Factors, Organisational and Systemic accident analysis and risk

assessment methods. Systemic models/methods are those that have recently

emerged to provide a means of analysing safety from a ‘complexity’ standpoint.

These are shown chronologically in Figure 2-1 with an expansion of each

abbreviation available within the glossary.

Figure 2-1 Accident Analysis and Risk Assessment Methods (Leonhardt et al,

2009)

Saleh et al (2010) present a slightly different narrative in the development of

safety theory. Whilst they note most of the same key ideas and developments,

they identify three tracks in safety theory leading towards the modern ‘system

and control theoretic’. These are illustrated below:

Figure 2-2 Three Tracks on the Evolution of Safety Theory (Saleh et al., 2010)

The tracks are not exhaustive and there is some cross coupling between ideas.

Herrera’s (2012) technological age can be likened to the middle track, the

defence in depth track is comparable to the organisational age whilst the top

track has many human factors elements but takes much from the current ‘age of

complexity’. The current state of the art is given as a systems engineering-

control theory approach. Saleh (2010) acknowledges that the literature in the

field is particularly fractured. This is perhaps because the various theories

emanate from disparate fields such as psychology, reliability, operations studies

and management.

2.2.1 Technological Age – Governing Philosophy

The predominant theme in the technological age of safety theory is that of a

‘chain of causation’; first visualised as a set of toppling dominos by Heinrich

(1950). Each domino represented a factor in the accident: Management

controls; failure of a man; unsafe acts or mechanical conditions; the accident;

injury. Once the first domino was toppled, removal of any of the others would

prevent the final injury domino toppling. Related to this is the concept of an

accident or event chain, where causative elements or events link together to

form a chain, which if it had been broken would have prevented the accident. It

is unclear where this idea originated; it is perhaps a reflection that a linear view

of the world still represents the defining popular narrative for any major

accident. Leveson (2011) links this to an erroneous assumption that there is

always a cause for any given accident.

2.2.2 Technological Age – Tools

The notion of a linear event chain gave rise to methods of analysing system

safety or the related property of reliability. The Fault Tree Analysis (FTA)

methodologies were developed from reliability studies of the American

Minuteman missile system and quickly developed into a methodology for

analysing safety by defining the probability of an unsafe condition developing

(Herrera, 2012). Closely associated are event trees which define hierarchies of

events post a single initiating event (such as an unsafe condition). These

analyses use stochastic methods to forecast top level probabilities for accidents

caused by single or multiple failures lower down in the system. There is always

a mathematical audit trail from the top level system safety target, for example

hull loss probability in commercial aviation, down to individual system or

component reliability data or predictions. Importantly, modern system safety

assessments contain more qualitative information based on expert

understanding of systems; carried out through Functional Hazard Assessments

(FHAs) (Dalton, 1996). When analysing accidents using event chain type

models such as FTA, there is a question of how far back it is appropriate to go

in order to find an initiating event. Leveson (2011) argues that selection of

initiating events is often arbitrary in accident analysis. It has been accepted in a

large number of major accident reports that management commitment to safety

or ‘safety culture’ is a key factor in risk of accident (Dekker, 2005), yet there is

no clear way in which these vital considerations can be fitted into an event chain

model. Reason (1997) espouses a version of the event chain in the famous

‘Swiss cheese’ model of organisational accidents. Reason’s cheese has

become the de-facto mental model for understanding safety and accidents

within the military aviation community, as articles in the RAF’s in-house safety

magazine Air Clues demonstrate (Anon, 2011; Gale et al., 2013).

Whilst Haddon-Cave’s (2009) investigation into Nimrod addresses issues of

culture and complexity, his view of causation is essentially linear. Leveson

(2011) outlines why linear accident models of the technological age such as the

Swiss Cheese are no longer considered acceptable:

Direct Causality – there is a reliance on the notion that there is always a

linear relationship between event A causing event B.

Subjectivity in Selecting Events – The backward chain of events is

often shown to stop for a number of arbitrary reasons, which could

include familiarity with a particular event in the sequence (“We’ve seen

this before”), it deviates from a standard (component operates outside its

specification) or a lack of information (such as inability to understand a

human performance issue).

Subjectivity in Selecting Chaining Conditions – It is often not clear

which factors caused each other.

Discounting System Factors – Event chain models generally deal with

proximate causes and do not deal with issues such as culture or

organisational pressures which can pervade through a socio-technical

system.

A useful example of how this approach to accident analysis can prove

disastrous is given by Leveson (2011). She notes that an incident in which a

DC-10 lost a cargo door (without loss of life) was attributed to the failure of a

baggage handler to close the door properly rather than to a design flaw. As a

result, a similar incident two years later led to the complete loss of a DC-10

near Paris in 1974.
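
The way in which a fault tree rolls component failure probabilities up to a top level event, as described earlier in this section, can be illustrated with a minimal sketch. It assumes that all basic events are statistically independent, which is exactly the assumption questioned by the criticisms above. The system, the event names and the probability values are hypothetical and are not taken from any Tornado or other real analysis.

```python
from math import prod


def and_gate(probabilities):
    """Probability that ALL independent input events occur."""
    return prod(probabilities)


def or_gate(probabilities):
    """Probability that AT LEAST ONE independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical basic event probabilities per flight hour (illustrative only).
pump_a_fails = 1e-3
pump_b_fails = 1e-3
valve_jams = 1e-5

# Hypothetical top event: loss of fuel feed occurs if both redundant pumps
# fail (AND gate) or if the single feed valve jams (OR gate).
both_pumps_fail = and_gate([pump_a_fails, pump_b_fails])
loss_of_fuel_feed = or_gate([both_pumps_fail, valve_jams])

print(f"P(both pumps fail)   = {both_pumps_fail:.3e}")
print(f"P(loss of fuel feed) = {loss_of_fuel_feed:.3e}")
```

The arithmetic gives a clear audit trail from the top event to each basic event, but nothing in it can represent an organisational pressure or a cultural factor that influences several basic events at once.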

2.2.3 Limits of Probabilistic Risk Assessment

Both civil and military airworthiness certification standards require certain safety

targets to be met. These targets are expressed in terms of probabilities,

principally probability of hull loss and death of passengers or crew; for military

aircraft this is specified in Regulatory Article 1230 – Design Safety Targets

(MAA, 2012a). There are various other targets regarding risk of harm to third

parties or other unsafe conditions – these are operating risks. Operating risks

are also commonly assigned qualitative risk levels; in the case of military

aviation this process is specified in Regulatory Article 1210 – Management of

Operating Risk to Life (MAA, 2012a). This regulation advises Platform

Operators and Project Teams to make use of Fault Tree Analysis to enable

calculation of these risks. For some UK military platforms this has resulted in

the introduction of ‘Loss Models’ to guide the assessment of new or emergent

risks. In the case of Tornado, the Loss Model (Sugden, 2011) is not a tool that

can be used in isolation for predictive risk assessment; rather it uses incident

statistics to provide a current picture of loss rates across the fleet (Woodbridge,

2012). The regulation and recommended practice (SAE, 2010; Lloyd and Tye,

1982) for both civil and military airworthiness and safety targets is for the use of

fault tree and dependency diagram models. These methods of probabilistic risk

assessment (PRA) are linear, which usefully provides for aggregation of total

risk. There are however a variety of issues to consider in their use. Apostolakis

(2004) provides a summary of some of the benefits and criticisms of PRA.

However in the case of airworthiness certification risk assessments the process

is generally based on a qualitative assessment of Functional Hazard Analysis

(FHA). FHA allows expert subjective analysis to provide an element of linkage

between various hazards. Equally Common Cause Analysis (CCA)

methodologies go some way to accounting for system-wide failure mechanisms.

The literature on resilience engineering disputes Apostolakis’ (2004) claim that

PRA deals effectively with true complexity.

Table 2-2 Benefits and Criticisms of Probabilistic Risk Assessment (Apostolakis, 2004)

Benefits | Criticisms
Multiple failures considered. | Human actions during accident scenarios cannot be modelled.
Increases likelihood of spotting complex failure interactions. | Difficulty of quantifying software failures.
Facilitates communication. | Cannot model safety culture.
Integrated approach. | Difficulty estimating design and manufacturing errors.
Identifies unknown areas for research. |
Focuses risk management activity on key areas. |

PRA models are essentially a product of the ‘technical era’ of safety science;

they assume linear behaviour and that the systems being analysed are

tractable, and thus decomposable into independent subsystems. This remains the

de-facto approach to managing most complex socio-technical systems and

forms the basis of the safety case approach prevalent within many regulatory

environments. The fundamental assumptions that justify their use are

questionable when applied to complex socio-technical systems. The principal

concern is that the human element cannot be satisfactorily modelled using

Boolean logic. In systems where there are frequent interactions with humans,

whether operators, maintainers or design or support engineers, this presents the

possibility that common cause failures will be built into the system and that the

relationships will be non-linear.
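
The sensitivity of a PRA result to the independence assumption can be shown with a short sketch. It uses the beta-factor model, a common simplification in reliability engineering that is introduced here purely for illustration and is not part of the source analysis. A fraction `beta` of each channel's failures is assumed to come from a shared cause, such as the same maintainer or the same procedure acting on both channels. The failure probability and the value of `beta` are hypothetical.

```python
def independent_pair_failure(p):
    """Probability that both redundant channels fail, assuming independence."""
    return p * p


def beta_factor_pair_failure(p, beta):
    """Probability that both channels fail under a beta-factor assumption.

    A fraction `beta` of each channel's failure probability is attributed to
    a shared cause that defeats both channels together; the remainder is
    treated as independent.
    """
    common_cause_part = beta * p
    independent_part = ((1.0 - beta) * p) ** 2
    return common_cause_part + independent_part


# Hypothetical values, for illustration only.
channel_failure_probability = 1e-3
shared_cause_fraction = 0.1

naive = independent_pair_failure(channel_failure_probability)
with_common_cause = beta_factor_pair_failure(
    channel_failure_probability, shared_cause_fraction
)

print(f"Independent assumption : {naive:.3e}")
print(f"With common cause      : {with_common_cause:.3e}")
print(f"Ratio                  : {with_common_cause / naive:.1f}")
```

Even a modest shared-cause fraction dominates the result for a redundant pair, which is one way of seeing why frequent human interaction with nominally independent subsystems undermines a purely Boolean model.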

2.2.4 Human Factors

Herrera (2012) outlines how 20th century disasters such as Three Mile Island

and Flixborough showed that the event chain models were becoming

inadequate – the focus began to shift to human failing, with the human identified

as the number one unreliable component in the event chain. Herrera (2012)

highlights two trends in the age of human factors; studies concerned with

eliminating human error by design for human performance and studies into how

humans cope with disturbances.

2.2.5 Organisational

‘Man Made Disaster’ theory was the initiating scholarly theory behind

organisational accident theory (Saleh et al., 2010). This theory noted that within

a certain class of events known as ‘man made disasters’ there were multiple

event chains that reached a long way back into the past and that management and

organisation were key factors in causing accidents. Saleh (2010) also notes

‘Normal Accident Theory’ and ‘High Reliability Organisations’ as key precepts of

the organizational accident. Normal accident theory notes that there are tight

couplings between interacting causal factors in complex system accidents and

that they cannot be predicted. This has been condemned as a somewhat

fatalistic view. Herrera (2012) sees High Reliability Organisation Theory as a

counter to Normal Accident Theory. This characterises successful organisations

as those operating complex systems with a very small number of accidents.

Saleh (2010) notes that the research highlights a number of common

characteristics of such organisations such as:

Preoccupation with failure and organizational learning.

Commitment to and consensus on production and safety as concomitant

organizational goals.

Organizational slack and redundancy.

These facets of successfully safe or high reliability organisations correspond to

aspects of ‘safety culture’ as described by Reason (1997) and others.

2.3 Complexity

Aircraft are complicated machines; they have many components interacting in a

multitude of combinations. Dekker (2011) holds that analytic reduction, as

practised within traditional linear safety analysis, is unable to describe how

system elements and processes behave when exposed to multiple

simultaneous influences. He also describes the key distinction between a

complicated system such as an aircraft, which could conceivably be

disassembled then reassembled by a single person and complex systems. A

complex system is one where the boundaries are ‘fussy’ (require highly detailed

definition) and the structure is intractable; an aircraft operated subject to human

factors, culture, regulatory and organisational factors is therefore complex.

Cilliers (2005) defines complex systems as those having the following

properties:

Large numbers of simple elements.

Dynamic, propagating and non-linear interactions; these define

behaviour which is emergent and cannot be understood by inspection of

components nor predicted by deterministic methods.

Open, exchanging energy and information with the environment.

Memory is distributed within the system, influencing behaviour.

Adaptive behaviour; without the intervention of external agents.

This study assumes that the complete aircraft system, incorporating its

operation and support is complex rather than simply complicated. It could also

be argued that the addition of extensive software within aircraft renders the

system complex. The safety management system and airworthiness

management in particular must deal with complexity.

For those charged with managing the safety of complex systems, understanding

models for accidents and studying post mortem analyses of accidents does not

present a comprehensive approach to prevention. It is generally accepted that

events, hazards and risks often combine in unexpected ways. Is it therefore

adequate to manage safety risk as a game of ‘whack-a-mole’; eliminating or

mitigating risks as and when they become apparent (Zarboutis and Wright,

2006)?

It may be argued that a proactive reporting culture does much to allow

elimination or mitigation of risks before they materialise. Heinrich’s (1950)

‘iceberg’ model drives much of this effort to uncover previously unknown risk and

there is an indisputable logic which says that knowing about a risk is a first step

to eliminating or managing it. The continued history of complex accidents tells

us that this approach may never be completely effective in preventing

unexpected failure (Hollnagel, 2007). Leveson (2011) explains that the concept

of a High Reliability Organisation confuses notions of safety and reliability. Just

because individual components of a socio-technical system can be proven to be

individually reliable it does not follow that safety will necessarily emerge as a

system property. Systems may be reliable yet unsafe, such as the NASA Mars

lander which crashed because the designer failed to anticipate the interaction

between the software and mechanical systems. Equally it is possible for a

system to be unreliable yet safe where systems fail-safe.

2.3.1 Complexity Theory

Accident investigation or analysis of complex system failure requires a mental

model to be applied to the accident scenario (Hollnagel, 2011). Similarly

accident prevention through risk management uses modelling to understand

potential accidents. Hitchens (2003) describes how complexity is relative to the

observer’s frame of reference. Modelling complex systems requires judgement

as to the extent of elaboration or its converse; encapsulation. He proposes that

systems derive their degree of complexity from their variety, connectedness and

disorder. Socio-technical systems are increasing in complexity as a result of the

increased use of networks. Manson (2001) provides a useful review of

complexity theory, most of the branches of which have an antecedent in general

systems theory. Three main branches of complexity theory are identified.

‘Algorithmic complexity’ holds that complexity is defined by the difficulty in

describing system characteristics. ‘Deterministic complexity’ deals with chaos or

catastrophe theories, which posit that stable complex systems may become

suddenly unstable. ‘Aggregate complexity’ deals with how elements interact to

produce complexity. A key property of complex systems is that of emergence,

which describes how system-wide characteristics cannot be computed by the

aggregation of system component behaviour. Zarboutis and Wright (2006) highlight

that patterns emerge from complex socio-technical systems which erode the

resilience of those systems. Grøtan et al (2011) give a good account of the

theoretical foundations of complexity and how they can be applied to risk

assessment; the ‘Cynefin’ Framework provides a summary.

Figure 2-3 The ‘Cynefin’ Framework – Complexity and Risk Management (Grøtan

et al., 2011)

Generally the literature shows that whilst linear thinking has reached its limits

within system safety science, complexity theory has yet to be completely

applied to the problem. Zarboutis (2006) identifies how complexity theories can

be used to replace HAZOPS type safety analyses. The key inputs should be:

How can system entities co-adapt?

What will the probable effect be on the whole?

How can such patterns be eliminated?

The output of such an analysis should therefore be some means of avoiding the

emergent harmful properties. Dekker (2011) advises that complexity theories

can be applied to accident investigation if the search for a single cause is

dropped and multiple narratives are allowed to overlap and on occasion

contradict each other. The nature of complexity defies analysis; Cilliers (2005)

writes on the ‘incompressibility’ of complex systems, in that the only reliable

model of a complex system is that which has the same level of detail as the

system itself. Clearly this is impractical, yet as any model will involve

simplification, disregarded elements may have non-linear effects and the

magnitude of the potential outcomes may be non-trivial. However Cilliers (2005)

also states that whilst modelling and computing complex systems will never be

sufficient, it is still necessary.

2.3.2 Systems Thinking and Systems Engineering

The concept of a system is well-established with roots in philosophy and

thermodynamic theories leading to theories and practice surrounding systems

engineering. Hitchens (2003) provides one definition:

A system is an open set of complementary, interacting parts with properties,

capabilities and behaviours emerging both from the parts and their interactions.

The concept of emergence is an important one; accidents are emergent system

states of disorder. Systems engineering involves the generation of models to

represent a system (Oliver et al., 1997). Leveson (2011) first describes how

safety ought to fit into systems engineering’s primary activities – Needs

Analysis, Feasibility studies, Trade studies, System architecture development

and Interface analysis. This is the basis for system safety assessments

employed in generating evidence for airworthiness certification as per ARP

4761 (Dalton, 1996). Saleh (2010) distinguishes between failure modes

attributable to component failure and those failures attributable to emergent or

interactive failures; his thesis is that a systems theoretic approach addresses

this second set of failures. However he raises concerns that formal systems

theoretic approaches such as co-ordinatability and consistency in hierarchical

and multilevel systems are yet to be fully applied to safety analysis. Leveson’s

(2011) Systems-Theoretic Accident Model and Processes (STAMP) uses

control theory and processes as the key to prevention of accidents. It

decomposes the system across the complete lifecycle, from concept to

disposal, into a series of control loops. The key to prevention of accidents is

said to be keeping the entire system in a state of equilibrium, which is achieved

by applying constraints to implement control. The model is said to more

effectively deal with software than traditional notions of failure. STAMP utilises

descriptions of control loops at technological subsystem level, human controller

level and socio-technical organisation level, shown in Figure 2-4. STAMP uses

a taxonomy of control loop failure modes as an audit check list. Salmon et al

(2012) compare STAMP to other models, concluding that STAMP provides a

more comprehensive system description but it is difficult to incorporate human

failures into the model, which itself needs a highly developed understanding of

the whole system. This highlights the difficulty in applying theoretically strong

models of complexity to particular scenarios.

Figure 2-4 General Form of a Model of Socio-technical Control (Leveson,

2011)

2.3.3 Control Theory

STAMP (Leveson, 2011) suggests that safety can be treated as a control

engineering problem and Saleh (2010) identifies this idea as an important

corollary to the development of a systems thinking approach to safety.

Kontogiannis and Malakis (2012a) describe how the concept of a model with

control loops is fundamental to systems safety incorporating human and

organisational factors. Hollnagel and Woods (2005) produced an Extended

COntrol Model (ECOM) which describes generically how organisational

processes transfer downwards to interact directly with and control the technological

system and hence alter its state. The Viable System Model (VSM) uses

cybernetics principles to describe how safety goals are transferred downwards

through an organisation and how output is controlled by various measures such

as audit (Espejo, 1989). Kontogiannis and Malakis (2012a) combine these two models and

apply them to the crash of flight AEW-241 in

December 1997. As with many control and systems models in the safety literature,

Kontogiannis and Malakis (2012a) highlight the difficulty of applying the models for the

purposes of accident prevention. Kontogiannis and Malakis (2012b) also try to apply these

principles in a case study involving emergency helicopter operations.
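
The idea of a control loop can be illustrated with a deliberately simple sketch. It is not an implementation of STAMP, ECOM or the VSM; it only shows a controller that observes a process state through feedback and issues a corrective action so that a constraint is maintained. The function name `run_control_loop`, the parameters `gain` and `limit`, and the disturbance values are all hypothetical and chosen purely for illustration.

```python
def run_control_loop(disturbances, gain=0.5, limit=1.0):
    """Simulate a single feedback loop (illustrative assumption only).

    state is the deviation of a controlled process from its intended
    condition. At each step a disturbance pushes the state, then the
    controller observes the state (feedback) and applies a proportional
    correction. Returns the state history and whether the constraint
    |state| <= limit held at every step.
    """
    state = 0.0
    history = []
    for d in disturbances:
        state += d                 # disturbance acts on the process
        action = -gain * state     # controller decides from feedback
        state += action            # control action alters the process state
        history.append(state)
    constraint_held = all(abs(s) <= limit for s in history)
    return history, constraint_held


disturbances = [0.8, 0.8, 0.8, 0.8, 0.8]   # hypothetical steady pressure
_, with_control = run_control_loop(disturbances, gain=0.5)
_, without_control = run_control_loop(disturbances, gain=0.0)
print(with_control, without_control)  # True False
```

With the control action removed (`gain=0.0`) the same disturbances accumulate until the constraint is breached, which is the sense in which an accident can be viewed as a loss of control rather than a component failure.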

2.3.4 Non-Linear Dynamics

Control of complex socio-technical systems needs to address the problem of

non-linear behaviour. Bendat (1998) describes how physical and engineering

systems can be divided into linear and non-linear systems. A system $H$ is linear

if, for any inputs $x_1$ and $x_2$ and for any constants $c_1$ and $c_2$,

Equation 1 – Linear System (Bendat, 1998)

$$H[c_1 x_1 + c_2 x_2] = c_1 H[x_1] + c_2 H[x_2]$$

This leads to 2 properties:

Equation 2 - Additive Property (Bendat, 1998)

$$H[x_1 + x_2] = H[x_1] + H[x_2]$$

Equation 3 – Homogeneous Property (Bendat, 1998)

$$H[c x] = c H[x]$$

A non-linear system is therefore one where,

Equation 4 – Non Linear System; lack of Additive Property (Bendat, 1998)

$$H[x_1 + x_2] \neq H[x_1] + H[x_2]$$

Equation 5 – Non Linear System; lack of Homogeneous Property (Bendat, 1998)

$$H[c x] \neq c H[x]$$

This means that for a linear system whose input is random data with a Gaussian

probability density function (i.e. a normal distribution), the system

will transform that data and produce an output that also has a Gaussian probability

density function. Bendat (1998) also makes the point that any

physical system will display non-linear properties if the input conditions are

suitably wide. As this is true for numerous examples in flight dynamics it is also

true for various instances in safety and reliability, where oversimplifying

assumptions are made regarding the condition of equipment and its interaction

with maintenance and operating organisations. Human behaviour often defies

mathematical modelling due to its complexity and non-linear properties. As

previously described, it is common for safety analyses and models to assume

linear behaviour. In fact complex socio-technical systems generally exhibit a

lack of additive and homogeneous properties; where different inputs combine to

produce unexpected and ‘out-of-control’ outputs resulting in accidents. This

explains some of the difficulties encountered in producing a workable approach

to human and organisational reliability, as outlined by Rasmussen (1997).

Non-linear effects explain the concept of emergence; that is, the behaviour of linear

systems is predictable and tractable, yet non-linear systems produce

unexpected results. Grøtan et al (2011) outline how this leads to the concept of

‘Black Swan’ events, which are unexpected and have a huge impact – such as a

catastrophic accident with a complex system. These are understandable in

retrospect but could not have been predicted. Leveson (2011) describes how

such accidents are the result of non-linear interactions between components of

the system, whether human, organisational or technological. The key to

developing an improved method of managing safety and estimating risk will be

to understand and predict these non-linear interactions.
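
The additive and homogeneous properties, and the Gaussian-in, Gaussian-out behaviour of a linear system, can be checked numerically. The sketch below is illustrative only: `linear_system` (a gain of 3) and `nonlinear_system` (a squaring operation) are hypothetical stand-ins chosen for simplicity, and a check on a single pair of inputs can demonstrate non-linearity but cannot prove linearity in general.

```python
import random
import statistics


def linear_system(x):
    """Hypothetical linear system H: output is a scaled copy of the input."""
    return 3.0 * x


def nonlinear_system(x):
    """Hypothetical non-linear system H: output is the square of the input."""
    return x * x


def is_additive(h, x1, x2, tol=1e-9):
    """Check H[x1 + x2] == H[x1] + H[x2] for one pair of inputs."""
    return abs(h(x1 + x2) - (h(x1) + h(x2))) < tol


def is_homogeneous(h, c, x, tol=1e-9):
    """Check H[c x] == c H[x] for one constant and one input."""
    return abs(h(c * x) - c * h(x)) < tol


def skewness(samples):
    """Sample skewness; close to zero for symmetric (e.g. Gaussian) data."""
    mean = statistics.fmean(samples)
    sd = statistics.pstdev(samples)
    return sum(((s - mean) / sd) ** 3 for s in samples) / len(samples)


rng = random.Random(0)  # fixed seed so the run is repeatable
gaussian_input = [rng.gauss(0.0, 1.0) for _ in range(20000)]
linear_output = [linear_system(x) for x in gaussian_input]
nonlinear_output = [nonlinear_system(x) for x in gaussian_input]

print(is_additive(linear_system, 1.0, 2.0),
      is_additive(nonlinear_system, 1.0, 2.0))
print(is_homogeneous(linear_system, 2.0, 3.0),
      is_homogeneous(nonlinear_system, 2.0, 3.0))
print(skewness(linear_output), skewness(nonlinear_output))
```

The linear system passes both checks and its output remains symmetric, whereas the squaring system fails both and turns a symmetric Gaussian input into a strongly skewed output. This is the sense in which linear assumptions can misrepresent the spread of outcomes from a non-linear system.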

2.4 Resilience Engineering

The theory of resilience engineering is emerging as a response to the problems

posed to safety management and engineering by complexity theory and the age

of the organisational accident as described by Reason (1997). The central

theme is to move from a focus on failure, where notions of component reliability

are applied to complex systems, humans and organisations; to looking at how

systems can succeed under varying conditions. The literature on the subject is

somewhat fragmented, although a series of books has been published which

brings together the key ideas. One of the aviation organisations embracing

resilience engineering is EUROCONTROL, a multinational air traffic

management service provider; Leonhardt et al (2009) published a white

paper on the application of resilience engineering within the organisation. This

illustrates that there is a blurred line between ‘traditional resilience’ study as

applied to infrastructure, and resilience engineering which has emerged from

the study of safety. Hollnagel et al (2011) give a simple definition of resilience:

“Resilience is the intrinsic ability of a system to adjust its functioning prior to,

during, or following changes and disturbances, so that it can sustain required

operations under both expected and unexpected conditions.”

Woods and Hollnagel (2007) set the scene for resilience engineering. They

outline fundamentals which include a shift away from the traditional safety focus

on ‘what went wrong’ (hindsight) and what could go wrong (risk assessment) to

a focus on ‘what can go right’ for risk assessment and ‘what did go right’ for

accident analysis – also neatly summarised by Schafer (2012). Resilience

engineering also rejects the notion of human failure, error taxonomies and

reliability analysis of complex systems in favour of a theory that failures

represent either the breakdown in strategies for coping with complexity, or an

unfavourable combination of functional variability within a system (technological,

human or organisational). In resilience engineering, safety is redefined as the

ability to succeed under varying conditions. By observing how systems work

under everyday pressures, it should be possible to understand the level of

resilience in a system and how it might be engineered to increase this quality.

For the purposes of both accident investigation and risk assessment it is

necessary to move away from linear combinations of events to an

understanding of how a system might lose its dynamic stability and veer into an

accident trajectory (Hollnagel et al., 2007). In summary, there are four key

precepts to Resilience Engineering:

1. Performance conditions are always underspecified. Individuals

and organisations must therefore adjust what they do to match current

demands and resources. Because resources and time are finite, such

adjustments will inevitably be approximate.

2. Some adverse events can be attributed to a breakdown or

malfunctioning of components and normal system functions, but others

cannot. The latter can best be understood as the result of unexpected

combinations of performance variability.

3. Safety management cannot be based exclusively on hindsight,

nor rely on error tabulation and the calculation of failure probabilities.

Safety management must be proactive as well as reactive.

4. Safety cannot be isolated from the core (business) process, or

vice versa. Safety is the prerequisite for productivity, and productivity is

the prerequisite for safety. Safety must therefore be achieved by

improvements rather than by constraints.

These precepts define a theoretical approach drawn from various ideas about

organisational accidents and safety culture. The key development is the focus

on the functions within the system and the emphasis on improving their

combined performance, rather than a focus on the potential sources of hazards

and barriers for accident prevention. This positive standpoint is a key attraction

to the approach; the drive for operational performance improvement and safety

can be in synergy rather than in conflict. Hollnagel (2011) gives four

cornerstones to the practice of resilience engineering. The first is knowing what

to do to respond to everyday disturbances – the actual. The second is knowing

how to monitor potential threats from the environment and from the functioning

of the system itself – the critical. The third part of the practice is knowing what to

expect in terms of threats and opportunities in order to address the potential.

Finally, the fourth ‘cornerstone’ is the ability to address the factual

through learning.

A slightly different conceptual framework for Resilience Engineering is

presented by Madni (2009), offering more concrete requirements for

operationalising the practice (Figure 2-6).

[Figure: the four cornerstones, each paired with an ability. Responding (actual): knowing what to do. Monitoring (critical): knowing what to look for. Anticipating (potential): knowing what to expect. Learning (factual): knowing what has happened.]

Figure 2-5 The Four Cornerstones of Resilience (Hollnagel, 2007)

Figure 2-6 Conceptual Framework for Resilience Engineering (Madni, 2009)

2.4.1 Resilience Engineering as a Successor to Safety Management

Leonhardt et al (2009) put the resilience engineering approach to safety

management simply:

The more likely it is that something goes right, the less likely it is that it goes

wrong.

Cambon (2006) provides a resilience framework for assessing safety

management systems, proposing a number of metrics based on Tripod

theory, which essentially measure the performance conditions under which the

SMS operates. The balance of these performance conditions is said to

determine the stability of the SMS. ‘Engineering’ implies design and

Beauchamp (2006) notes how this can be achieved through organisational

learning to provide organisational resilience; a model for guidance is provided.

Zarboutis (2006) describes how, analogous to Rasmussen’s (1997) approach to

organisational drift, resilience engineering can identify symptoms of an erosion

in resilience. Johansson (2008) provides a ‘quick and dirty’ approach to

evaluating resilience in systems; this gives a helpful overview but does not prescribe

specific improvement or change activities. Stoker (2008) outlines a

comprehensive approach to the assessment of operational resilience,

effectively specifying a goal based hierarchy for elements contributing to

resilience and producing a check list approach. Whilst this is undoubtedly a

valuable activity, it is questionable whether it will be able to deal with the

emergence of safety issues.

2.4.2 Under Specification of Performance Conditions

Under specification of performance conditions, that is, of the factors that affect the

execution of a particular function, is a key concept in the literature (Hollnagel,

2007). In most organisations performance conditions are subject to control

through rules, with the idea that this will improve safety. Hale (2013) reviews the

literature on this, noting that there are two approaches: a classical top down

approach, which punishes transgression, and a bottom up approach that

sees expert ability to adapt to changing circumstances as paramount.

Nathanael (2006) notes that it is impossible to make what happens in practice

match that which is espoused by officialdom; the key to generating resilience is

dialogue between the hierarchical levels.

2.4.3 Performance Variability

Resilience engineering regards performance variability as inherently useful; it

allows operations to continue in underspecified conditions. It also provides the

potential for coupling between functions where upstream performance variability

combines with downstream performance variability to grow in amplitude. This

phenomenon can be harnessed for system success or else it provides an origin

for safety risk (Hollnagel, 2012).
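
The way in which upstream and downstream variability can combine may be sketched with a toy model. This is an assumption made for illustration and not part of the FRAM, which treats variability largely qualitatively: each function is taken to add its own variability to whatever arrives from upstream, scaled by a hypothetical `coupling` factor (below 1 the incoming variability is damped, above 1 it is amplified). The function name `propagate_variability` and all numbers are invented for the example.

```python
def propagate_variability(own_variability, coupling):
    """Propagate output variability along a chain of coupled functions.

    own_variability[i]: variability generated by function i itself.
    coupling[i]: how function i treats incoming upstream variability
                 (< 1 damps it, > 1 amplifies it).
    Returns the output variability of each function in the chain.
    """
    outputs = []
    incoming = 0.0
    for own, k in zip(own_variability, coupling):
        out = own + k * incoming   # upstream variability combines with local
        outputs.append(out)
        incoming = out
    return outputs


own = [1.0, 1.0, 1.0, 1.0]
damped = propagate_variability(own, coupling=[0.5, 0.5, 0.5, 0.5])
amplified = propagate_variability(own, coupling=[1.5, 1.5, 1.5, 1.5])
print(damped)     # [1.0, 1.5, 1.75, 1.875]
print(amplified)  # [1.0, 2.5, 4.75, 8.125]
```

In the damped chain the output variability settles towards a bound, whereas in the amplified chain it grows at every step even though no individual function is any more variable than before. Real couplings are unlikely to be this regular; the sketch only shows why the coupling, rather than any single function, determines the outcome.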

2.4.4 Examples of Resilience Engineering in Practice

Resilience engineering is more theoretical than its name suggests and

the practicality of implementing its precepts remains the subject of much

discussion. However, its principles can be found in evidence where it was not

specifically applied. Table 2-3 provides a brief summary of some examples.

Table 2-3 Examples of Resilience Engineering in Practice

Industry | Tools | Insights

Process Industry | Survey of workforce using Principal Component Analysis | Shirali et al. (2013) attempt quantitative measurement of resilience at an organisational level. Only possible to measure the potential for resilience rather than resilience itself. The following variables are given as indicators: top management commitment; just culture; learning culture; awareness and opacity; preparedness; flexibility.

Process Industry | Bayesian Networks Resilience Dashboard (not currently achievable) | Pasman et al. (2013) define a holistic control methodology for plant safety using leading indicators derived from process measurements within the plant. Also use of process simulation tools to develop scenarios. Traditional HAZOP/FMEA analyses do not capture all potential accident scenarios. Key points: technical resilience can be measured/simulated, organisational factors less so; importance of leading indicators to enable response to variations; difficulty in dealing with drift in safety metrics; safety gains made through interdepartmental cooperation vs common cause failures; advocate extensive use of bow-ties.

Aviation | Interviews, audit and expert analysis | An investigation into both the sources of resilience and sources of brittleness. Comparison of two comparable small air carriers. Identification through extensive interviews. Resilience and brittleness categorised and risk assessed (Saurin and Carim Junior, 2012).

Air Traffic Management | FRAM | Analysis of a mid-air collision fatal accident. Provides notes on buffering capacity, flexibility, margins, tolerance and cross scale interactions. There was no root cause – aircraft and ATM were operating normally. The system was inadequate (de Carvalho, 2011).

Aviation | Bayesian Belief Networks (BBN) | Examines the use of and qualification of experts to provide probability estimates for BBN. Hidden common causes in BBN – principally safety culture. Difficulty in estimating frequencies or probabilities of rare events. BBN assume the ‘Causal Markov Condition’ therefore common cause failures are difficult to deal with – maybe applying BBN to FRAM would solve this issue (Brooker, 2011).

Aviation | FRAM | Alaska Airlines flight 261 accident analysed to understand FRAM’s performance against 5 key resilience characteristics: buffering capacity, flexibility, margin, tolerance, and cross-scale interactions (Woltjer, 2007).

Railways | FRAM | Interdisciplinary safety analysis of complex socio-technological systems based on the Functional Resonance Accident Model: an application to railway traffic supervision (Belmonte et al., 2011).

Nuclear | FRAM | Specific case study surrounding a task to move Nuclear Fuel – a specific task analysis rather than a generic system approach (Lundberg, 2008).

2.4.5 Criticism of Resilience Engineering

Oxstrand and Sylvander (2010) argue that Resilience engineering is little more

than a rebranding of safety culture; they do not see how the practice can be

applied to the nuclear industry which already uses both PRA and human

reliability analyses in the licensing of nuclear plants. In this industry it is argued,

safety culture forms part of every operation. The nuclear industry defines safety

culture as:

“Safety Culture is that assembly of characteristics and attitudes in organisations

and individuals which establishes that, as an overriding priority, nuclear plant

safety issues receive the attention warranted by their significance.”

International Atomic Energy Agency (Edwards et al., 2013)

Clearly safety culture is fundamental to engineering resilience into a

socio-technical system. The theory of safety culture does not in and of itself propose a

different conceptual framework for the origin of unsafe system performance.

Also some safety culture literature describes a requirement for safety to become

the overriding priority for an organisation (Edwards et al., 2013). Clearly this is

at odds with notions of efficiency-thoroughness trade-offs and the requirement

to increase the proportion of activities that ‘go right’ as a means for reducing the

number that ‘go wrong’. Whilst Resilience Engineering draws on much of the

theory around safety culture, it goes a lot further in proposing ways in which

organisations can be designed, analysed and modified in order to deliver

resilience. Le Coze (2013) describes a number of criticisms of Resilience

engineering, the foremost amongst these being scepticism over the need to

introduce a new vocabulary to safety science. He also notes that the social

concept of power is missing from the resilience literature, although it could be

argued that the exercise of social power could be modelled as a function or a

resource. He also notes that many have argued that

resilience engineering does not present anything new; it simply

connects a number of existing ideas, foremost of which is the High Reliability

Organisation concept. He does note that the proof of the concept will be in its

application to real systems – testing the worth of the ‘engineering’ aspect of the

theory. McDonald (2008) asserts that Resilience Engineering is attractive

because other models are weak. He notes that the theory needs to be further

unified and demonstrated in practical examples.

2.4.6 Resilience Engineering and Airworthiness

Current MAA (2011a) policy is based on the idea that airworthiness is made up

of four pillars: the safety management system, compliance with recognised

standards, competence (of people and organisations) and independent

assessment. All of these activities and qualities are likely to contribute to the

resilience of an airworthiness system. Wilson (2008) provides a system model

for resilience of an airworthiness system and presents a number of key ideas:

The requirement for ‘organisational mindfulness’ – a safety culture keen

to seek out areas of risk.

Balancing ALARP principles with ‘And Still Stay In Business’ which could

be thought of as an efficiency thoroughness trade off; as per Hollnagel

(2011).

Understand how the organisational boundaries contribute to safety;

dealing with outsourcing, partnering and regulation.

Translate strategies into management frameworks for managing

organisational risk – these can be represented by ‘framework diagrams’

that show the factors that impact on safety management systems.

This work was succeeded by a thesis by Wilson (2012) which produced a

framework called RISK2VALUE, providing an integrated management

framework and decision support tool kit which addresses both safety and value

management at an organisational level. A generic diagram shown at Figure 2-7

is provided to support decisions – the use of which is illustrated by means of an

extensive diagram mapping various relationships. The strength of this approach

is that it either provides a generic approach to an audit of airworthiness or would

guide the construction of a new system. Equally it provides an assessment of

socio-technical factors surrounding accidents. A criticism that could be levelled

at the tool is that the linkages between the elements are not explicitly defined

and it is therefore unclear how changes would influence the path that the

organisation took through the diagram.

Figure 2-7 Framework for managing the impact organisation, technology and human factors have on safety management systems (Wilson, 2008)

2.4.7 Lean Resilience

Leonhardt et al (2009) note that modern business systems are largely premised on

‘just-in-time’ processes. This methodology increases efficiency and

consequently coupling between upstream and downstream functions. Individual

system boundaries are more difficult to define as, for example, maintenance

units become increasingly tightly dependent on supply chains. Carney (2010)

urged caution in the introduction of lean principles and envisaged a hybrid

between lean maintenance and a more traditional model. Resilience

engineering in other domains has shown that it is in fact possible to harness the

approach to introduce production improvement alongside safety (Hounsgaard,

2013). Lean methodology is profoundly linear in its thinking (Carney, 2010); this

methodology is easily deployable in a highly tractable system such as a

production line. In less tractable systems such as maintenance it is likely that

Resilience Engineering techniques will produce better results.

2.5 Functional Resonance Analysis Method

The resilience engineering literature lacks specific methodologies or tools for

practical implementation of resilience engineering principles. The notable

exception is Hollnagel’s (2012) Functional Resonance Analysis Method

(FRAM). This is a technique for building models of complex socio-technological

systems. It differs from STAMP, in that it is a method for generating a model

rather than a model. FRAM maps the system as a series of functions, defined

by their various ‘aspects’ and linked ‘activities’.

[Figure: a FRAM function drawn with six aspects: Input (I), Output (O), Preconditions (P), Resources (R), Time (T) and Control (C).]

Figure 2-8 FRAM Function

By analysing the output variability from each function and the extent to which

this variability is damped downstream, it is possible to begin to understand how to

analyse system performance from a resilience engineering point of view. The

FRAM forms the basis of the case study in later chapters and is described in

detail in Chapter 4.
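
As a minimal sketch of how a function and its six aspects might be held as data, the fragment below represents each function as a record and finds the couplings where one function's output is named in an aspect of another. It is not the spreadsheet model developed in this research; the class `FramFunction`, the helper `find_couplings` and the three example functions are hypothetical and serve only to show the structure.

```python
from dataclasses import dataclass, field


@dataclass
class FramFunction:
    """One FRAM function with its six aspects (I, O, P, R, T, C)."""
    name: str
    inputs: set = field(default_factory=set)
    outputs: set = field(default_factory=set)
    preconditions: set = field(default_factory=set)
    resources: set = field(default_factory=set)
    time: set = field(default_factory=set)
    control: set = field(default_factory=set)


def find_couplings(functions):
    """Return (upstream, downstream, aspect, item) wherever one function's
    output matches an item named in another function's receiving aspect."""
    receiving = ("inputs", "preconditions", "resources", "time", "control")
    couplings = []
    for up in functions:
        for down in functions:
            if up is down:
                continue
            for aspect in receiving:
                for item in sorted(up.outputs & getattr(down, aspect)):
                    couplings.append((up.name, down.name, aspect, item))
    return couplings


# Hypothetical example functions, not taken from the case study model.
plan = FramFunction("Plan maintenance", outputs={"work order"})
tools = FramFunction("Issue tools", outputs={"calibrated tools"})
rectify = FramFunction(
    "Rectify fault",
    inputs={"work order"},
    resources={"calibrated tools"},
    outputs={"serviceable aircraft"},
)

for coupling in find_couplings([plan, tools, rectify]):
    print(coupling)
```

Matching outputs to aspects by name keeps the couplings implicit in the function descriptions, so that adding or amending a function automatically changes the network; this is one plausible design choice among several.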

2.6 Quantifying Resilience

Most approaches to quantifying resilience rely on surveys and audit approaches

such as those described by Shirali et al (2013) or by Saurin and Carim Junior (2012). However, whilst

an overall system assessment is of value, system managers are interested in

particular risks and being able to quantify them and manage them towards

ALARP levels, as required by legislation. Within process industries a high

degree of automation can be achieved with intensive data collection and

monitoring. These aspects mean that it is comparatively easy to run simulations

and model different systems. Risks can therefore be assessed in a more

quantifiable manner (Pasman et al., 2013). A reliability approach to safety is easily

quantifiable through linear decomposition to produce probabilistic risk

assessment. By contrast it is much more difficult to provide quantitative

assessment using a resilience engineering approach. Luxhøj (2003) and

Williams (1996) present Bayesian Belief Networks as a potential solution to low

probability – high consequence risks. Slater (2013) has presented an approach

to nesting BBN within a FRAM model and hence providing a way of quantifying

risk analysis developed through FRAM. He presents this technique as an

alternative to HAZOPS for use in process and transport industry. Brooker

(2011) analyses BBN in the aviation domain, focusing specifically on the ability

of experts to provide accurate assessments of probability in the case of low

probability events. He notes the ‘Causal Markov Condition’ which is an

assumption in BBN that there is no common cause failure mode across the

network; issues such as ‘safety culture’ are therefore difficult to address. Other

potential techniques for quantification are the use of fuzzy logic or fuzzy set

theory with the use of Monte Carlo simulation (Shirali et al., 2013). An approach to

quantifying resilience in the context of civil infrastructure is presented by Vugrin

(2009), providing a menu of control engineering methodologies that may be

suitable. The issue of data collection in more human centric systems remains a

barrier to expansion of this method. Quantification is the key if Resilience

Engineering is going to gain ground against more traditional risk assessment

techniques.
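
The difficulty that a hidden common cause poses for probabilistic quantification can be shown with a short worked calculation. The sketch assumes two barriers whose failure probabilities both depend on an unmodelled factor such as safety culture; every probability below is an invented illustrative value, not data from the literature or from this research.

```python
# All probabilities below are illustrative assumptions, not data.
p_poor_culture = 0.1                 # P(hidden common cause is present)
p_fail_given_poor = 0.2              # P(a barrier fails | poor culture)
p_fail_given_good = 0.01             # P(a barrier fails | good culture)

# Marginal failure probability of a single barrier.
p_fail = (p_poor_culture * p_fail_given_poor
          + (1 - p_poor_culture) * p_fail_given_good)

# Joint failure of two barriers if they are (wrongly) treated as independent.
p_both_independent = p_fail ** 2

# Joint failure when both barriers share the hidden common cause: they are
# independent only once the culture state is known.
p_both_common_cause = (p_poor_culture * p_fail_given_poor ** 2
                       + (1 - p_poor_culture) * p_fail_given_good ** 2)

ratio = p_both_common_cause / p_both_independent
print(f"independent:  {p_both_independent:.6f}")
print(f"common cause: {p_both_common_cause:.6f}")
print(f"underestimate factor: {ratio:.1f}")
```

Under these assumed values, treating the barriers as independent underestimates the probability of both failing by a factor of nearly five. The size of the effect depends entirely on the numbers chosen, but the direction of the error is the point: a network that omits the shared cause will tend to be optimistic.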

2.7 Concluding Remarks

The various ages of safety theory were all products of the technology of their

time. Now in an age characterised by networked technology it is clearly time to

fully address notions of complexity for the purpose of providing safe systems.

This is certainly the case for the new generation of civil and military aircraft.

Resilience Engineering appears to offer a different approach to previous

theories and models. In particular the notion that accidents emerge from

unforeseen combinations of varying functional performance is a powerful one. It

offers the prospect that analysis from this perspective might provide risk insights

that may otherwise be missed. It also rings true from experience within an

airworthiness environment. Notions of ‘accident trajectories’ and holes in

processes or defences do not resonate in the same way. There is an

opportunity to combine efforts in process improvement and efficiency with

safety strategies. Resilience engineering offers the theoretical framework and

FRAM provides a potential method. This will be explored in subsequent

sections. It remains the case however that there is some way to go to

operationalise Resilience Engineering; Madni (2009) lists the key issues:

Help organizational decision makers in making trade-offs between

severe production pressures, required safety levels and acceptable risk.

Measure organizational resilience.

Identify ways to engineer the resilience of organizations.

The following chapters outline a case study in which this approach is tested.

3 METHODOLOGY

3.1 Introduction

In order to meet the research aim it was necessary to choose a technique with

which to model an airworthiness management system. The literature review

revealed that the Functional Resonance Analysis Method (FRAM) was the best

way to practically apply resilience engineering principles. The FRAM therefore

formed the basis of the practical element of the research. A single case study

organisation was used, with an aspiration of delivering an operationally useful

tool to the organisation at the end of the project. The case study was conducted

in two stages:

Stage 1 – Construct a FRAM Model of the Airworthiness Management

System and concurrently develop a visualisation tool.

Stage 2 – Test the model using scenarios drawn from occurrence

reporting and potential in-service airworthiness risks.

The model was developed iteratively, using expert opinion and data from a

variety of sources.

3.2 Working Arrangements

A key difficulty reported by other FRAM practitioners has been understanding

‘work as done’ rather than ‘work as imagined’. This was mitigated by conducting

the research from within the case study organisation on a part time basis, whilst

working within the Force Operations Centre. Moreover, this was preceded by 9

years’ work in other roles in military airworthiness, including quality assurance,

process improvement and error investigation roles. This provided insight into

‘work as done’ practice. Whilst there was a risk of bias, this was mitigated to

some extent through exposing parts of the model to other workers within the

organisation for verification.

3.3 Research Interviews

Semi-structured interviews were conducted with 19 different workers across all

of the functions. The interviews were flexibly arranged at the interviewees’ work


location (generally offices but control rooms and tool stores were also visited). A

pre-briefing was provided in the form of a two sided A4 document, shown at

Appendix C. The average interview duration was around 30 minutes, giving a

rough total of around nine and a half hours of interview time over the course of

the project. The general interview structure was as follows:

Check understanding and clarify scope of the study.

Confirm that participant was currently engaged in the function as part of

their daily activity.

Check accuracy of each of the function aspects.

Open questioning to highlight particular areas of variability in the

‘aspects’ of the function.

Open questions to ascertain whether any aspects had been missed.

Open questions to ascertain whether the participant’s work covered any

further relevant functions.

The following research interviews were conducted:

Deputy Continuing Airworthiness Manager

Engineering Authority – various team members.

Military Airworthiness Review Certificate team member.

Continuing Airworthiness Management Organisation Quality Manager.

Experienced Aircraft Technician at Inspector Level.

Tornado Forward Fleet Manager.

Front Line Squadron Senior and Junior Engineering Officers.

Front Line Squadron Rectification controller, Line Controller, Weapons,

Mechanical and Avionic Trade Managers, with additional contributions

from various mechanics, technicians, supervisors and inspectors.

Tool Stores Controller.

Rolls-Royce Technical Support Manager.

BAES Technical Support Manager.

BAES Reliability Engineering Manager.

Depth Workshops Supervisor.


Ground Support Equipment Trade Manager.

Station Air Safety Officer.

3.4 Model Development

Most if not all risk assessment or incident investigation methods require

practitioners to be trained in the application of the technique and are generally

most effectively applied in teams (e.g. HAZOPS, Safety Panels, etc.). Time and

resources precluded this approach for the case study; however insight from a

number of other practitioners’ case studies was gained through attendance at

the annual FRAM Workshop in Munich. Whilst Hollnagel’s (2012) guidelines for

FRAM model development were followed, the final Tornado Airworthiness

System Model used a number of innovative approaches. The main innovation

was the use of a Microsoft Visio drawing to provide an interactive ‘visualisation’

tool. This approach allowed the creation of a much larger model than has been

recorded to date in the literature. The visualisation tool was developed

concurrently with the spreadsheet model, which allowed for greater accuracy by

cross-checking between the two methods of describing the model. The final

model contains a total of 69 individual functions with 985 individual aspects

described. Where inconsistencies in the model became apparent or there was a

gap in knowledge, a variety of experts were used to provide additional

information through conversation or correspondence. In particular, various key

meetings were attended which provided insights that assisted with model

development:

Force Operations Centre Daily Summary.

Joint Qualifications and Trials Meeting.

Level B Capability Programme Reviews.

Various Upgrade Readiness Reviews.

Fleet Planning Meetings.

Scheduled Maintenance Reviews.

Depth HQ Value Stream Analysis – Continuous Improvement Event.

Mission Essential Equipment Continuous Improvement Event.

Air Safety Occurrence Investigators Workshop.


It is not possible to attribute individual model elements to particular sources; the

iterative nature of FRAM development precludes this in an experimental project

of this nature. It is recognised that this is a weakness in the process; however,

this is mitigated by the volume of cross-checking required to ensure model

consistency. The visualisation tool provided the final check of model

consistency in that all aspects had to be connected to another function or to an

external resource – loose ends were not allowed.
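The 'no loose ends' rule can be expressed as a simple automated check. The following is a minimal sketch under stated assumptions: the dictionary layout, the `loose_ends` helper and the link from 'Report Fault' to 'Fault Diagnosis' are hypothetical and chosen only for illustration; they do not reproduce the Visio tool or the TASM spreadsheet.

```python
# Illustrative only: a hypothetical, simplified model representation.
# Each function lists the named items it needs under its five incoming
# aspects, and the named items it produces under "Output".
EXTERNAL = {"Flying Programme"}  # items assumed to come from outside the model

model = {
    "Report Fault": {  # assumed upstream function, for illustration
        "Input": [], "Precondition": [], "Resource": [], "Control": [], "Time": [],
        "Output": ["Fault shown on rects control board"],
    },
    "Fault Diagnosis": {
        "Input": ["Fault shown on rects control board"],
        "Precondition": [], "Resource": [], "Control": [],
        "Time": ["Flying Programme"],
        "Output": ["Corrective Maintenance"],
    },
}

def loose_ends(model, external):
    """Return (function, aspect, item) triples that connect to nothing."""
    produced = {item for f in model.values() for item in f["Output"]}
    consumed = {
        item
        for f in model.values()
        for aspect, items in f.items() if aspect != "Output"
        for item in items
    }
    problems = []
    for name, f in model.items():
        for aspect, items in f.items():
            for item in items:
                if aspect == "Output":
                    # An output must feed another function or an external sink.
                    linked = item in consumed or item in external
                else:
                    # An incoming aspect must be produced somewhere or be external.
                    linked = item in produced or item in external
                if not linked:
                    problems.append((name, aspect, item))
    return problems

print(loose_ends(model, EXTERNAL))
```

In this toy model the output 'Corrective Maintenance' is reported as a loose end because no function consumes it; adding the consuming function, or declaring the item external, clears the report.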

3.5 Air Safety Information Management System Data

Data from the Air Safety Information Management System (ASIMS) was used to

provide information on the variability of output from various functions within the

model.

3.5.1 Data Extraction

Data was extracted from ASIMS using the ‘Search Reports’ facility (MAA,

2011a), which allows users to apply various filters to the database. This allowed

only Tornado DASORs to be considered, within an initial selected date range of

1 Jan 2006 to 1 Nov 2013. This range allowed consideration of a time period in

which organisational structures have been relatively stable e.g. since the end-

to-end logistics transformation process (2001-2006), where a large number of

previously in-house functions were outsourced to industry. This date range was

then reduced to the most recent 12 month period 1 November 2012 to 1

November 2013, once the work required to analyse each entry became

apparent. ASIMS uses a standard taxonomy to describe both Occurrence

Cause Groups (OCG) and event descriptors. The MAA describes OCG as the

“final link in the chain which caused the occurrence… the one and only final

cause” and event descriptors are other ‘events in the chain’ (MAA, 2011a). This

clearly represents a linear accident model rather than the complex model

represented by FRAM. That said, both OCG and event descriptors do provide a

useful indication as to instances where undesirable functional variability

occurred. A second issue was that the FRAM Model was limited to

airworthiness management rather than ‘flight safety’ or ‘air safety’ in totality.

Operator actions that affected the airworthiness of the aircraft were included in

the ASIMS download because such incidents rely on the performance

of additional functions to maintain continuing airworthiness following harmful

variability in the ‘operate aircraft’ function. For example, an operating pilot might

inadvertently cause a flap over-speed; this then relies on fault reporting, and

corrective maintenance functions (amongst many others) to perform within

acceptable limits in order to restore airworthiness. Table 3-1 shows the cause

and event descriptors that were included in the ASIMS ‘search report’ filter and

consequently the downloaded data. Reports that were captured in the ASIMS

filters but that were found to have no airworthiness aspect were deleted. This

left a total of 426 reports for analysis.

Table 3-1 D-ASOR Classifications included in Data

Cause and Event Descriptor | Sub-Categories Included in Download
Hostile Action | Nil
Human Factors (ATC/ABM) | Nil
Human Factors (Aircraft Operation) | Flap / Slat / Airbrake Overspeed, Fuel Management, Gear Overspeed, Inadvertent Operation, Incorrect In-flight Shutdown, Incorrect Switch / Control Selection / Position, Overcontrol, Overstress, Overtemp, Overtorque, Undercontrol, Access Not Closed, Equipment Not Secured, Incorrect Use of Emergency Equipment, Loose Article, Collision with Aircraft/Vehicle, Collision with Ground Object, Deep Landing, Downwash, Flap / Slat Overspeed, Gear Overspeed, Heavy Landing, Tail Strike, Blanks / Pins Not Removed, Missed on Walk Round, Wrong aircraft, Blanks / Pins Not Fitted, Chock Jump, Collision with Aircraft/Vehicle.
Human Factors (Maintenance) | All
Human Factors (Ground Services) | All
Human Factors (Other) | Material Dropped into Open System, Material left in Aircraft or Engine, Access not Closed, Equipment not Secured, Incorrect Use of Emergency Equipment.
Not Positively Determined | All
Organisational Fault | All
Technical Fault | All
Unsatisfactory Equipment | All
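The filter logic described in this section can be sketched as follows. This is an illustration only: the field names (`date`, `cause_group`, `descriptor`) are hypothetical, the real ASIMS export format is not reproduced, only a subset of Table 3-1 is encoded, and treating both end dates as inclusive is an assumption.

```python
from datetime import date

# Hypothetical field names; the real ASIMS export columns are not shown here.
# Subset of Table 3-1: "ALL" keeps every sub-category, an empty set keeps none.
INCLUDED = {
    "Hostile Action": set(),
    "Human Factors (Maintenance)": "ALL",
    "Technical Fault": "ALL",
    "Human Factors (Other)": {"Access not Closed", "Equipment not Secured"},
}
# Reduced 12 month range; inclusive bounds are an assumption.
START, END = date(2012, 11, 1), date(2013, 11, 1)

def in_scope(report):
    """True if the report is inside the date range and an included category."""
    if not (START <= report["date"] <= END):
        return False
    allowed = INCLUDED.get(report["cause_group"], set())
    return allowed == "ALL" or report["descriptor"] in allowed

# Invented example records, not real occurrence reports.
reports = [
    {"date": date(2013, 3, 4), "cause_group": "Technical Fault", "descriptor": "Any"},
    {"date": date(2013, 3, 4), "cause_group": "Hostile Action", "descriptor": "Any"},
    {"date": date(2010, 6, 1), "cause_group": "Technical Fault", "descriptor": "Any"},
]
selected = [r for r in reports if in_scope(r)]
print(len(selected))
```

A filter of this kind only narrows the download; the subsequent removal of reports with no airworthiness aspect was a manual judgement and is not represented here.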

3.5.2 Assignment of Related Functions to Incidents

Once ASIMS data had been exported into an MS Excel format, each report was

assigned to up to 3 functions to indicate that the occurrence was a result of

output variability from each of these functions. The three functions were

assigned in a rough order of proximity to the reported occurrence. For example,

an incident of nose wheel steering failure would show the mechanical

system function as the first and ‘closest’ function to the occurrence because the

variation in this function’s output was what was being reported. However, it may

have been variability in the output of the electrical system function that caused

downstream variability in the mechanical system; in this case the electrical

system function would be recorded second. Only three functions were assigned

to enable expedient data processing; the output was designed to be a rough

indicator of reported functional variability so this simplification was deemed

acceptable. Assignment was a matter of judgement formed by reading the ‘Brief

Title’, ‘Description’, ‘Investigation and Rectification work’, ‘Other Equipment

Involved’, ‘Cause Narrative’ and ‘Cause Observations’ fields. As the taxonomy

and language used by report authors did not correspond directly to the FRAM

model, this had to be carried out manually, which precluded a full analysis of

each report due to the time required. In addition to assigning three FRAM

functions to each report, a further set of fields was added to the data to show

the number of couplings by type of function e.g. Human-Human, Technological-

Organisational, Human-Organisational, etc. It was recognised that these

couplings might not be ‘direct function to function’ couplings and that there may

be intermediary functions identified in the FRAM model. Results from this

process are presented as they are used in Chapter 4.
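The additional coupling fields can be illustrated with a short sketch. Several points are assumptions rather than the method actually used: couplings are counted pairwise between all functions assigned to a report, the type labels are sorted alphabetically within each pair, and the classification of 'Corrective Maintenance' as a human function is chosen only for the example.

```python
from collections import Counter
from itertools import combinations

# Illustrative subset of functions with assumed type classifications.
FUNCTION_TYPE = {
    "Mechanical Systems": "Technological",
    "Armament & Electrical Systems": "Technological",
    "Corrective Maintenance": "Human",  # assumed classification
}

def coupling_counts(assigned):
    """Count couplings by function type for one occurrence report.

    `assigned` holds up to three function names, ordered by proximity
    to the reported occurrence (closest first).
    """
    if len(assigned) > 3:
        raise ValueError("at most three functions are assigned per report")
    counts = Counter()
    for a, b in combinations(assigned, 2):
        # Sort the two type labels so that each pairing has a single key.
        pair = "-".join(sorted((FUNCTION_TYPE[a], FUNCTION_TYPE[b])))
        counts[pair] += 1
    return counts

# Invented example loosely following the nose wheel steering case above.
example = ["Mechanical Systems", "Armament & Electrical Systems", "Corrective Maintenance"]
print(coupling_counts(example))
```

Because the pairs are formed between the assigned functions only, such a count says nothing about intermediary functions in the FRAM model, which matches the caveat noted above.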


4 BUILDING THE TORNADO AIRWORTHINESS SYSTEM

MODEL USING THE FUNCTIONAL RESONANCE

ANALYSIS METHOD

Chapter two described the theoretical background to resilience engineering and

identified the Functional Resonance Analysis Method (FRAM) as the most

practical way to apply the principles. As described in Chapter one, the RAF’s

Tornado GR4 fast jet aircraft fleet was used as a case study. The following

chapter describes how the Tornado Airworthiness System Model (TASM) was

constructed using the FRAM. The TASM was created within a Microsoft Excel

spreadsheet; Chapter 5 describes the accompanying Visualisation Tool, which

was developed concurrently with the spreadsheet model. A copy of the final

spreadsheet model is at Appendix A.

4.1 Basic Principles

A full description of the FRAM is given by Hollnagel (2012) and also on the

website www.functionalresonance.com (Hollnagel, 2014). Drawing on the

theoretical basis of resilience engineering already described, the basic

principles of FRAM are given as:

The Equivalence of Success and Failure. Things go right and wrong in

fundamentally the same way. Although outcomes may be different, the

underlying processes are not necessarily different.

Approximate Adjustments. The conditions under which work or activity is

conducted never entirely match those which are prescribed. Systems

normally adjust performance approximately to match existing conditions.

This approximation results in performance variability.

Emergence. Variability is not normally enough to cause an accident.

Variability may combine in unexpected ways leading to disproportionately

large, non-linear outcomes.

Functional Resonance. Occasionally functions reinforce each other

and cause unusually high output variability. This coupling effect is called

functional resonance, which may spread through the system. The

phenomenon is dynamic and is not attributable to a simple combination of

causal links.

4.2 Taxonomy

Modelling complex socio-technical systems requires clarity of definition for the

elements of the system. The taxonomy proposed by Slater (2013) is used in this

investigation:

Function – the means by which an outcome is achieved. Can be carried

out by mechanical or electrical technology or by humans or

organisations.

Aspect – those features that describe the operation of a function. These

are Input, Preconditions, Resources, Control, Time and Output.

Activity – The output from the whole system under consideration

requires linkages between the various functions via their aspects. These

linkages are activities.

Process – A process is a sequence of activities.

System – The collection of functions and their dependencies define the

system under consideration.

Input – That which the function processes or transforms or that which

starts the function.

Preconditions – Conditions that must exist before a function execution.

Resources – That which the function needs or consumes to produce the

output.

Control – How the function is monitored or controlled; plan, programme,

instructions etc.

Time – Temporal constraints affecting the function.

Output – That which is the result of the function; an entity or state

change, finishing time or duration.


Instantiation – A ‘time-sliced’ map of system activity showing the system

state at a particular time. This is likely to show one or more processes

underway.

Functional Resonance – the detectable signal that emerges from the

unintended interaction of the normal variabilities of many signals.

Figure 4-1 provides a visualisation of some aspects of the FRAM taxonomy. For

this study a convention has been adopted in that activities linking functions are

shown as dotted lines to illustrate the potential for their existence. The

illustration of a particular process or instantiation of a system state shows the

activities underway at the time as solid lines. Figure 4-1 also shows an

instantiation of an example process, which is shown by the purple lines. A

system will be complete when all functions are linked by potential activities or to

external dependencies or outputs.
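The taxonomy above maps naturally onto a small data structure. The sketch below is an assumption-laden illustration: the class names `Function` and `Activity` and the helper `potential_activities` are hypothetical, and matching outputs to aspects by identical item name is a simplification chosen for the example rather than a rule stated by Slater (2013) or Hollnagel (2012).

```python
from dataclasses import dataclass, field

ASPECTS = ("Input", "Precondition", "Resource", "Control", "Time", "Output")

@dataclass
class Function:
    """A function described by named items under each of its six aspects."""
    name: str
    aspects: dict = field(default_factory=lambda: {a: [] for a in ASPECTS})

@dataclass(frozen=True)
class Activity:
    """A potential link from one function's output to an aspect of another."""
    upstream: str
    downstream: str
    aspect: str  # aspect of the downstream function that receives the output
    item: str

def potential_activities(functions):
    """Derive potential activities by matching outputs to identical aspect items."""
    links = []
    for up in functions:
        for item in up.aspects["Output"]:
            for down in functions:
                for aspect in ASPECTS[:-1]:  # the five incoming aspects
                    if down is not up and item in down.aspects[aspect]:
                        links.append(Activity(up.name, down.name, aspect, item))
    return links

# Two placeholder functions, A feeding the Input of B.
a = Function("A")
a.aspects["Output"].append("x")
b = Function("B")
b.aspects["Input"].append("x")
print(potential_activities([a, b]))
```

Under this representation the full set of potential activities corresponds to the dotted lines in the convention described above, and an instantiation would be the subset of those activities that is underway at a particular time.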

Figure 4-1 FRAM Model Visualisation Demonstrating Taxonomy

[Figure content: FRAM functions labelled A to G, each drawn with its six aspects (I, P, R, C, T, O) and marked as an upstream or downstream function, together with an external dependency. Annotations identify a Function, its Aspects, an Activity and a Process.]

4.3 FRAM Step 0 – Recognise the Purpose of the FRAM

Analysis

The primary purpose of the analysis was, in line with the research objective, to

allow airworthiness risk assessment to be conducted. Hollnagel (2012) offers

the choice of conducting a FRAM assessment for either incident analysis or risk

assessment. The research objective is to provide a tool for airworthiness

management and given that resilience engineering is concerned with both

reactive and proactive management of risks, risk assessment is the primary

purpose of the model. This is likely to produce a more complete system than

that created to analyse particular incidents. In order for risk assessments to be

carried out the model must aid understanding of how the system operates and

the effect of any future disturbances. The scope of the analysis and the system

boundary was defined as follows:

The fundamental purpose of the system is to provide airworthy aircraft

for operations.

Only the functions that have the potential to affect the airworthiness of

the aircraft fleet were considered to be part of the system. For example,

whilst feeding and paying the technicians who maintain the aircraft is

important, these functions are considered as constant in output and

therefore not in scope.

Factors external to the system could be modelled as functions

themselves; Hollnagel (2012) describes these as ‘background

functions’. These external relationships are not described; activities

simply link to ‘External Factors’.

‘Management’ implies control of the system, therefore the system

boundary is set to encompass only those functions where the users of

the tools created through this application of the FRAM will have an

ability to exert control. Thus the regulatory system is considered an

external factor, noting that other studies have modelled this relationship

with FRAM (Herrera, 2010).

Environmental factors such as the weather were considered not to vary and

were therefore outside of the system boundary; these were mapped as

external factors linked to specific aspects of some functions.

The total British military fleet of Tornado aircraft was considered.

Functions performed by the Front Line Command, Defence Equipment &

Support and industry contractors were all considered to be part of the

system.

The System is concerned with the management of an in-service aircraft

fleet with an adequate safety case. There is no consideration of the

functions required to design or manufacture the base-line aircraft.

The aircraft itself is modelled as a series of functions which interact

with functions carried out by the aircrew and maintenance teams.

4.4 FRAM Step 1a – Identify and Describe the Initial Function

List

The difficulty with modelling systems for the purpose of risk assessment rather

than accident analysis is that more imagination is required to identify functions.

In a complex sociotechnical management system there is no shortage of

candidate functions; the difficulty lies in avoidance of duplication and modelling

an acceptable level of detail. The iterative nature of FRAM means that

unnecessary or duplicate functions will be removed at later stages as the model

is refined. Experience of aircraft operations was initially used to list a number of

candidate functions. This was conducted by noting various functions carried out

by the various organisations involved with the Tornado Force. For example,

starting at a frontline squadron, the daily activity was envisaged by mental

walkthrough of activity, recording the various tasks required to handle and

maintain the aircraft. This ‘thought experiment’ then moved further back through

the organisation. Once an initial list was generated the various policy

documents detailed below were consulted as memory joggers to identify

additional functions.

Tornado Continuing Airworthiness Management Exposition (MOD., 2013)

No. 1 Group Air Safety Management Plan (Dudman, 2012)


RAF Marham Air Safety Management Plan

Tornado Equipment Safety Management Plan (Woodbridge, 2012)

It was important at this stage to guard against confusing a task with a function

(Hollnagel, 2012). The Tornado Continuing Airworthiness Management

Exposition (MOD., 2013) was particularly helpful in this context because, having

only recently been written, it was assumed to provide a reasonably close

match to ‘work as done’. Deviations between ‘work as imagined’ and ‘work as

done’ became clearer as the model grew. Once a reasonably complete first set

of function names was produced, these were recorded on a spreadsheet as a

number of FRAM frames, as shown in Table 4-1. For each function, the aspects

were examined and recorded. An initial draft of potential aspect descriptions

was created from prior knowledge of the airworthiness system. This first draft

was then cross checked against the policy documents for consistency, with any

conflicts noted for later checking. As described in later steps, this draft was

subject to complete revision based on interviews with subject matter experts

which either validated or changed the first iteration of both the list of functions

and the description of each individual function’s aspects. Each function was

assigned a serial number; the order of which is not significant.

Table 4-1 Example FRAM frame for Fault Diagnosis

52 | Name of Function | Fault Diagnosis
Aspect | Description of Aspect
Input | Fault shown on rects control board
Output | Corrective Maintenance
Precondition | Authorised maintenance personnel; Aircraft in correct fuel state
Resource | Tools and TME; Approved data (Maintenance Procedures); Authorised maintenance personnel; Information from aircrew (debrief); Any unauthorised aide-memoires; GSE; Spare parts
Control | Approved data (Fault Diagnosis)
Time | Maintenance Programme; Flying Programme

In most cases it was effective to start the analysis with the input condition,

which generally led to some work on defining what represented a precondition

as opposed to the input. In order to bound the scope of the function, it was then

useful to define the output and at this stage some effort was required to cross

check against other functions to ensure that the output formed some link with

another function. This inevitably provided ideas for further functions, which were

added to the list. Resources were then identified; some care was taken to

generate a consistent set of named resources across various functions. If future

FRAM analyses are carried out on similar airworthiness systems, it would be

efficient to develop an initial taxonomy of resources based on those listed in this

model. It should be noted that the FRAM requires that functions should be

described as verbs, given that by definition they must perform some action. The

aircraft is described as a series of functions in terms of its subsystems;

structure, propulsion and so on. For the sake of brevity these are simply given

nouns, using the same terminology and subsystem structure that is used to sub-

divide airworthiness management tasks within the engineering authority. Clearly

however, all of these technological subsystems do provide a function. For

example in the case of the aircraft structure this is to react the loads imposed by

aircraft operation. Similarly the function of the propulsion system is to provide

thrust, electrical power, reduce fuel and also to record engine health data.

4.5 FRAM Step 1b – Verify Functions with Experts

The FRAM is designed to be conducted with groups of experts trained in its

use; this was impractical due to resource constraints and the experimental

nature of the approach. In order to verify the accuracy of each function at least

one person who currently formed part of the function was identified and was

voluntarily co-opted into working through the function as described in Chapter 3

(with the exception of technological functions).

Figure 4-2 TASM Step 12 – Screen Capture Showing Applicable Spreadsheet

Areas

The final and complete list of functions is given in Table 4-2.

[Figure 4-2 annotation: Step 1 Functions and their Aspects]

Table 4-2 Listing of TASM Functions

Number | Function Brief Description
1 | Flight Servicing
2 | Scheduled Maintenance
3 | Ground Handling
4 | 3 Month Flying Programme
5 | Task Maintenance
6 | Record Work Done on ac
7 | Train Maintenance Personnel
8 | Provide Authorised Maintenance Personnel
9 | Occurrence Reporting, Investigation & Follow Up
10 | Supply Chain
11 | Fit/Remove Role Equipment/Weapons/Explosives/Ejection Seats
12 | Maintain Ground Support Equipment (GSE)
13 | Provide & Account for Tools and Test Equipment
14 | Refuel/Defuel
15 | Assure Quality
16 | Coordinate Maintenance Documentation
17 | Defer Faults
18 | Locally Manufacture Parts
19 | Engine Health Monitoring
20 | Ground Services (Cooling, Power, Dehumidification, Steps, Staging, Bungs, Blanks)
21 | Force and A4 Operations
22 | Maintenance Programme Development
23 | Modify Aircraft
24 | Apply Special Instructions (Technical)
25 | Report Fault
26 | Replacement of service life limited parts
27 | Airworthiness Review Certification
28 | Repair Spares - Industry
29 | Publish Aircrew Publications
30 | Publish Approved Data (Tech Manuals & Policy)
31 | Publish Special Instructions (Technical)
32 | Cost Benefit Analysis / Hazard Analysis / ALARP Decision
33 | Acquire Spare Parts
34 | Repair/Maintain Spares R2
35 | Monitor Reliability Data
36 | Publish Release To Service
37 | Independent Advice
38 | Store, Service, Repair Weapons and Role Equipment
39 | Demand Spare Parts
40 | Repair Aircraft
41 | Structural Inspections & Corrosion Control
42 | Fault Diagnosis
43 | Corrective Maintenance
44 | Technical Assistance Process
45 | Avionic Flight Systems
46 | Defensive Aids
47 | Avionic Communications
48 | Armament & Electrical Systems
49 | Mechanical Systems
50 | Aircraft Structure
51 | Propulsion
52 | Crew Escape Systems
53 | Weapons
54 | Operate Aircraft
55 | Pre-Flight Checks
56 | Produce Airworthy Survival Equipment
57 | Handover
58 | Supervise Maintenance
59 | Independent Inspection
60 | Plan Weekly-Daily Flying Programme
61 | Rectification & Line Control Boards
62 | Manage Maintenance Extensions
63 | Configuration Management (LITS)
64 | Operate Shift Pattern
65 | Software
66 | Engine Performance Monitoring
67 | Engine Fleet Monitoring
68 | Design Organisation
69 | Chief Air Engineer

4.6 Step 2 – Identification of Output Variability

The purpose of the second step was to identify and characterise the potential

variability of the output of each function. It was first necessary to classify the

type of function; broad inferences could then be drawn from the literature as to

the likely nature of the variability. This was compared to the data gathered from

interviews, ASIMS and general system experience. Output variability was

described in two generic dimensions; frequency and amplitude. Frequency

referred to how often the output of the function typically varied and the

amplitude was a measure of this variation in terms of deviation from a normal

level. Figure 4-3 provides a graphical representation of the notion of output

variability:

Figure 4-3 Visualising Functional Output Variability

4.7 Step 2a – Identify the Type of Function

Hollnagel (2012) identifies three classifications of function; Technological,

Human and Organisational. The difficulty of classifying each function varies.

Some, such as the function carried out by an aircraft system e.g. ‘Defensive

Aids’ were clearly technological. The attribution of either ‘human’ or

‘organisational’ characteristics to functions was largely down to the number of

people involved. Broadly defined functions such as ‘Supply Chain’ were clearly

organisational in nature. Others such as ‘Refuel/Defuel’ are carried out by only

one or two people and hence were classified as ‘human’. In other cases, such

as ‘Scheduled Maintenance’ this was less clear, as the function represented the

conglomeration of a number of human functions but also required organisation

with a hierarchical structure. As this sub-step only provided an initial pointer


towards identifying output variability, these distinctions were not critical. As

described in chapter three, ASIMS data was used to show reported functional

variability. Figure 4-4 shows the number of times that functional variability was

reported. The majority of reports related to variability in technological functions,

which was because technological functions are those whose output has the most

direct impact on flight safety. In general, ASIMS reports did not identify

organisational or human factors related causes for occurrences. This is either

because all incidents were purely related to the reliability of the technology or

because investigations did not probe deeply enough into the incidents to uncover

such causes. Also, the majority of occurrences did not result in any harm;

near misses due to human or organisational factors may not be

reported in the same ratio as reliability issues.


Figure 4-4 Instances of Functional Output Variability Recorded in Occurrence

Reports 2012/13


Figure 4-5 Instances of Reported Functional Output Variability by Function Type

Figure 4-6 Total Instances of Functional Output Variability Recorded in

Occurrence Reports 2012/13

4.8 Step 2b – Identify Internal Sources of Output Variability

Using system experience, interview data and ASIMS data described above, the

sources of internal variability for each function were noted and then

characterised. Internal sources of output variability are those which are

produced from within the function due to its inherent nature. For example, technological

functions may suffer component failure due to wear-out, while human functions are

subject to a variety of psychological and physiological variations.


Table 4-3 Summary of Internal Variability (Hollnagel, 2012)

Type of Function | Possible internal sources of performance variability | Likelihood of performance variability
Technological | Few, well known | Low
Human | Very many | High frequency, large amplitude
Organisational | Many, function specific or relating to ‘culture’ | Low frequency, large amplitude

At this point in the analysis, notes were also made relating to any internal

damping mechanisms, for later reference. Damping mechanisms might include

internal redundancy in the case of technological functions, for instance a fail-

safe structure might continue to react loads to the full specification despite the

failure of one load pathway. In the case of an organisation, overlapping

responsibilities might provide cross checking of activity and reduce output

variability.

4.9 Step 2c – Identify External Sources of Output Variability

External output variability can be traced to some external dependency or linked

function in a process. The function ‘Ground Handling’ requires a variety of

resources in order for it to work (mechanics, drivers, tow tractor, etc.) and if

these aspects of the function vary in some respect then the potential exists for

the output of the ground handling function to also vary. For example, if the

ground handling team contained a particularly inexperienced worker then the

output of the function may potentially vary. Of course, damping factors whether

internal or external might remove this potential function output variability.

Damping factors could include additional supervision or time to complete the

task. As well as external variability within the defined function’s aspects (input,

precondition, resources, control and time) there are system-wide external

factors to consider that might exert influence on some or all functions, leading to

output variability. Such factors cannot be easily mapped in the FRAM Model;

they include environmental factors such as weather, infrastructure such as

heating, lighting, office space and IT reliability and also more intangible factors


such as cultural dimensions (such as ‘Just’, ‘Safety’ or ‘Reporting’ cultures).

Where external system-wide factors were potentially significant these were

noted at this step. The same data sources used for internal variability were also

used to produce notes on the external sources of variability for each function.

Table 4-4 Summary of External Variability (Hollnagel, 2012)

Type of Function | Possible external sources of performance variability | Likelihood of performance variability
Technological | Maintenance, misuse | Low
Human | Very many, social and organisational | High frequency, large amplitude
Organisational | Many, instrumental or ‘culture’ | Low frequency, large amplitude
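The generic expectations in Tables 4-3 and 4-4 act as a starting point that is then compared with interview, ASIMS and experience data. As a minimal sketch, they can be held as a lookup keyed on function type. The values are transcribed from the two tables; the dictionary layout and the `initial_pointer` helper are hypothetical and are not part of the TASM spreadsheet.

```python
# Values transcribed from Tables 4-3 and 4-4; the structure is illustrative.
LIKELY_VARIABILITY = {
    "Technological": "Low",
    "Human": "High frequency, large amplitude",
    "Organisational": "Low frequency, large amplitude",
}
SOURCES = {
    "Technological": {
        "internal": "Few, well known",
        "external": "Maintenance, misuse",
    },
    "Human": {
        "internal": "Very many",
        "external": "Very many, social and organisational",
    },
    "Organisational": {
        "internal": "Many, function specific or relating to 'culture'",
        "external": "Many, instrumental or 'culture'",
    },
}

def initial_pointer(function_type):
    """Return the generic starting assumption for a given type of function."""
    return {"likelihood": LIKELY_VARIABILITY[function_type], **SOURCES[function_type]}

print(initial_pointer("Organisational"))
```

The lookup only supplies a first assumption for each function; the notes recorded in the model remain specific to the function concerned.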

Initial notes on internal and external sources of output variability were entered in

the FRAM Model as shown in Table 4-5:

Table 4-5 Example TASM Recording of Step 2a-c for Function 67 - Engine Fleet

Monitoring

Type of Function | Internal Variability | External Variability
Organisational. This contains a variety of technological and human judgement functions which combine to provide an overall organisational function. | Internal variability is caused by human judgement elements of the function. | There is a variety of commercial and operational production pressures that influence this function. The ability and expertise of front line squadrons also provides context to the advice given out from Propulsion Support Team/Rolls-Royce.

4.10 Step 2d – Most Likely Dimension of Output Variability

Steps 2b and 2c identified the sources of output variability; the next step

characterised the potential output variability in its most likely dimensions. The

principles of conservation of energy and mass dictate that output must be in

some form of mass or energy transfer. For many functions this also provides for


some form of information transfer in various media (verbal, electronic, visual,

etc.). In order to keep the model at a manageable size, not all functional outputs

are described in exhaustive detail. The level of detail is in itself an ‘efficiency

thoroughness trade-off’; the validity of the judgement will be iteratively assessed

and adjusted as the model is used. All outputs were linked to aspects of other

functions, apart from the aircraft functions themselves which interact with the

external environment. The self-contained nature of the system provided a

mechanism for checking the internal consistency of the model – all outputs must

link to another function or to the external environment. Hollnagel (2012)

provides two options for characterising output variability; either a ‘simple’

solution or an ‘elaborate’ solution. The simple solution provides characterisation

in terms of time or precision. Given the broad scope of this model and the

potential wide range of activity covered by a single functional output line, the

elaborate solution was used to characterise output variability. Hollnagel (2012)

identifies 8 manifestations of output variability which are further divided into four

subgroups.

Table 4-6 Elaborate Description of Output Variability (Hollnagel, 2012)

Manifestation of Variability | Description
Timing/Duration | Too early/ too late/ omission.
Force/ Distance/ Direction | Too weak/ too strong/ too short/ not far enough/ wrong direction/ too long/ too far/ wrong type of movement.
Wrong Object | Wrong object or points to wrong object.
Sequence (of actions or information) | Omission, jumping, repetition, reversal, wrong part.
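The internal consistency check described earlier in this section (every output must link to another function or to the external environment) could be sketched as follows. This is a minimal illustration and not the TASM spreadsheet logic; the `EXTERNAL` marker, the `unlinked_outputs` helper and the example link targets are all assumptions made for the sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical marker for an output that leaves the modelled system.
EXTERNAL = "External environment"

# (function, output) -> downstream functions (or EXTERNAL) that use it.
OutputLinks = Dict[Tuple[str, str], List[str]]


def unlinked_outputs(links: OutputLinks) -> List[Tuple[str, str]]:
    """Return every (function, output) pair with no recorded destination."""
    return sorted(key for key, targets in links.items() if not targets)


# Illustrative entries only; the link targets are assumptions.
example: OutputLinks = {
    ("Flight Servicing", "Any faults recorded"): ["Report Faults & Husbandry"],
    ("Aircraft Structure", "Loads reacted"): [EXTERNAL],
    ("Supply Chain", "Part delivered"): [],  # deliberately left unlinked
}

print(unlinked_outputs(example))
```

An empty result would indicate that every recorded output has at least one destination; it would not show that all real-world links have been captured.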

Hollnagel (2012) emphasises the difference between actual variability and

potential variability. The main purpose of this model is to allow risk assessment;

potential variability is therefore the important issue and the subject of the initial

assessment that forms the basis of the model. Hollnagel describes potential

variability as what ‘could possibly go right or wrong’. Given the broad scope of

this model, this has been further clarified to the most likely potential variability.

This means that there is a steady state starting point from which the model can


be iteratively manipulated. The FRAM spreadsheet uses ‘drop-down’ selections

to allow allocation of ‘most likely’ output variability. The term ‘most likely’ allows

for the fact that some outputs may potentially be able to produce a variety of

manifestations. In particular instantiations of the model, these may not

correspond to the exact activity that is occurring. It is important to emphasise
that the model classifies the most likely output variability, not the most likely
output. There may therefore be a more likely form of output, but it is the rarer,
more variable form that is captured in the model. For example, Table 4-7 shows

the characterisation of the output variability for flight servicing. One output of

this function is ‘replenishment of aircraft systems’ (with oils, greases and

gases). The description of the most likely output variability states that the
replenishment may be omitted or the wrong fluid used. Of course, in the majority of cases (or

instantiations) of this function in operation, the correct fluid will be used in the

correct quantity, hence exhibiting no variability.
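The ‘drop-down’ allocation of a most likely dimension of output variability can be pictured as a constrained selection. The sketch below is an assumption-laden illustration in Python rather than the spreadsheet itself: the `Dimension` enumeration mirrors the four groups of Table 4-6, while `OutputVariability` and `from_selection` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Dimension(Enum):
    """The four groups of output variability listed in Table 4-6."""
    TIMING_DURATION = "Timing/Duration"
    FORCE_DISTANCE_DIRECTION = "Force/Distance/Direction"
    WRONG_OBJECT = "Wrong Object"
    SEQUENCE = "Sequence"


@dataclass(frozen=True)
class OutputVariability:
    output: str           # name of the functional output
    dimension: Dimension  # most likely dimension of variability
    description: str      # free-text description of that variability


def from_selection(output: str, selection: str, description: str) -> OutputVariability:
    """Build a record; Dimension(selection) raises ValueError if the
    selection is not one of the permitted drop-down values."""
    return OutputVariability(output, Dimension(selection), description)


record = from_selection("Any faults recorded", "Sequence", "Omission")
print(record.dimension.value)
```

Restricting the field to a fixed set of values is what keeps the classification consistent across a model with many functions.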

Table 4-7 Characterising Output Variability – Flight Servicing

Outputs | Most Likely Dimension of Output Variability | Description of Most Likely Output Variability | Frequency of Output Performance Variability | Amplitude of Output Performance Variability
AC visually inspected (Avionics, Electrical, Structure, Mechanical, Crew Escape, Weapons, Propulsion) | Sequence | Omission | High | Medium
Any faults recorded | Sequence | Omission | High | Medium
Husbandry jobs recorded in log | Sequence | Omission | High | Low
Flight Servicing Certificate Signed | Wrong Object | Sign up for wrong tail number or omit full information | Medium | High

As shown in Table 4-8, the FRAM spreadsheet was developed to define output
performance variability in one of three gradations for both frequency of most
likely variability and its most likely amplitude. It is important to note that these
characterisations are related to the observed performance of the system, which
is not necessarily an accurate indicator of future performance. Frequency of
variability was defined as the rate of occurrence of a performance deviation
from some accepted level. To provide consistency across the model a set of
qualitative and quantitative descriptions were established:

Table 4-8 Classifications for Frequency of Output Variability

Frequency of Variability | Qualitative Description | Quantitative Description
High | A not unusual occurrence in a monthly period across the whole fleet. | Occurs 1 to 10^-2 times per event/flying hour/work hour
Medium | The output of this function infrequently varies from the standard or prescribed form. | Occurs 10^-2 to 10^-4 times per event/flying hour/work hour
Low | Very rarely varies from the standard or prescribed form of output | Occurs 10^-4 or less times per event/flying hour/work hour

In many cases variability is a designed-in or an inherent part of the system and
other functions serve the purpose of damping the effects of the performance
variability. For example, it is expected that the aircraft structure (a function) will
develop cracks; this is what happens to aero-structures in service. However,
anticipating this, the design organisation specifies a maintenance schedule to
check for cracks. There is then a process (sequence of activities linking
functions) to mitigate the effects of cracking before the structure function is
allowed to vary in its output to the extent that loads are not reacted and
structural integrity is lost. This quality of complex socio-technological systems
makes it difficult to dissect the amplitude of performance variability from its
potential effect on the performance of downstream functions. In the case of a
cracked structure, the designer’s (and regulator’s) intent was that the cracks
must be spotted before they propagate to a length where integrity will be lost. In
this respect the only variability of the structure function occurs once integrity is
lost. If left uninspected the structure would of course eventually fail. Because
the system is in a state of balance, both frequency and amplitude are assessed
at the current system state (start of any iteration). Given the disparate nature of
functional output, from lubricant top-ups to an operational plan, it is not possible
to give quantitative descriptions of amplitude. A three tier qualitative system was
therefore employed:

Table 4-9 Classification of Amplitude of Performance Variability

Amplitude of Variability | Timing/Duration | Force/ Distance/Direction | Wrong Object | Sequence
High | Complete critical omissions or too late/early to produce useful effect | Gross error in force/distance/direction of output - requires major restorative action to correct | Totally wrong object is output or pointed at | Sequence completely jumbled/large or critical sections missed/critically wrong part inserted
Medium | Less than critical omissions or effect is late/early enough to cause difficulty for downstream functions | Error in force/distance/direction requires some restorative action to correct | Nearby or similar object is pointed at or output | Some significant omission/skipping/reversals or additional parts
Low | Omissions or late/early output cause minor difficulty to downstream functions | Minor error of force/distance/direction - requires little if any correction | Minor difference between the object output/pointed at and the correct object | Minor omission/skipping/reversals or additional parts

4.11 Step 3 – Aggregation of Variability

Once most likely output variability has been modelled, it is then necessary to
show where these outputs link to other functions and thereafter to aggregate the
effects of these varying upstream outputs on the downstream function. Figure 4-10
shows how the model uses cell-linking functionality to provide the input to
step three, from the details entered in step two.

Figure 4-8 Tracing Output Downstream Dependencies (Screen Capture)

Hollnagel (2012) suggests a variety of possible effects on downstream functions

based on the simple solution to characterising variability. These are used as a

guide for potential effects on the more complicated set of variability descriptions

used in the model. The potential effects are expressed as free text and, where
relevant, highlight the upstream outputs most likely to cause the downstream
function to vary. The effects of the upstream function output variability are

considered independently. For functions with multiple outputs and upstream

aspects there will be a large number of potential combinations of variability. It is

therefore not possible to express the overall effect of upstream variability other

than in a particular instantiation of the model. However as a visual aid, each

upstream aspect is rated as to the extent that its most likely potential output

variability will affect the downstream function in question. The possible effect on

this (downstream) Function Output Variability Score is given as either

‘increasing’, ‘no change’ or ‘decreasing’ in terms of upstream output variability.

This score provides an estimate of the likely downstream ability to damp out

upstream variability. To contrast with the frequency and amplitude ratings of the

upstream aspects, the possible effect on the downstream function is shown as a

shade of purple in Table 4-10:


Table 4-10 Aggregation of Variability for Flight Servicing

Name of Function: Flight Servicing

The Aspect and Description of Aspect columns come from Step 1 - Identify and Describe the Functions; the remaining columns record Step 3 - Aggregation of Variability.

Aspect | Description of Aspect | Upstream Function Number | Upstream Function Name | Upstream Function Aspect | Most Likely Dimension of Upstream Output Variability | Description of Most Likely Upstream Output Variability | Frequency of Upstream Output Performance Variability | Amplitude of Upstream Output Performance Variability | Possible effect on this (downstream) Function | Possible effect on this (downstream) Function Output Variability (Damping)
Input | Line Controller indicates task on boards | 61 | Rectification & Line Control Boards | Maintenance Information for tasking | Sequence | Inappropriate/unworkable plan | Medium | Medium | Potential to sway ETTO and cause omissions and errors | INCREASE
Output | AC systems replenished (propulsion, mechanical) | - | - | - | - | - | - | - | - | -
Output | AC visually inspected (Avionics, Electrical, Structure, Mechanical, Crew Escape, Weapons, Propulsion) | - | - | - | - | - | - | - | - | -
Output | Any faults recorded | - | - | - | - | - | - | - | - | -
Output | Husbandry jobs recorded in log | - | - | - | - | - | - | - | - | -
Output | Flight Servicing Certificate Signed | - | - | - | - | - | - | - | - | -
Precondition | Maintenance Activity complete | 16 | Coordinate Maintenance Documentation | F700 ready for flight servicing and ac captain (pre-flight checks) | Timing/Duration | F700 not available for crew walk | Medium | Medium | Unlikely to start flight servicing if pre-condition not in place. | NO CHANGE
Precondition | AC available at groundcrew location | 54 | Operate Aircraft | Return aircraft to groundcrew | Wrong Object | Parked in incorrect location | Low | High | Not possible to start flight servicing without access to aircraft | DECREASE
Resource | Fuels & Lubricants (from supply chain) | 10 | Supply Chain | Part Delivered to Corrective Maintenance, Scheduled Maintenance, Repair Aircraft, replacement or life limited parts, weapons & role equipment, tools & test equipment | Timing/Duration | Not delivered in time to meet requirement | High | Medium | Not possible to fully complete flight servicing without necessary consumables | INCREASE
Resource | Tools & Test Equipment | 13 | Provide & Account for Tools and Test Equipment | Tools and TME | Wrong Object | Incorrect tool | Medium | Low | Increases likelihood of using unsafe work-arounds. | INCREASE
Resource | Authorised manpower | 8 | Provide Authorised Maintenance Personnel Appropriately (to requirement) | Authorised Maintenance Personnel (record work done, fuel, scheduled maintenance, report faults, conduct quality tasks etc) | Sequence | Omission of a required authorised skill | High | Medium | Potential for unauthorised (and not competent) personnel to carry out servicing; however likely to be damped out by maintenance tasking function. | INCREASE
Control | Flight Servicing Schedule (approved data) | 30 | Publish Approved Data (Tech Manuals & Policy) | Flight Servicing Notes | Sequence | Omission | Medium | High | Misleading or inaccurate information causes variability. | INCREASE
Control | Supplementary Flight Servicing requirements (from defer faults) | 17 | Defer Faults | ADF or Lim entry to close job card hence allows co-ordination of maintenance documentation | Sequence | Element of ADF/Lim insufficiently defined or omitted | High | Medium | Additional tasking within the flight servicing increases human performance issues. | INCREASE
Time | Daily/Weekly Flying Programme | 60 | Plan Weekly-Daily Flying Programme | Flying Programme (Fuel, flight service, xx) | Sequence | ... plan | Medium | Medium | Insufficient time likely to cause inappropriate ETTO. | INCREASE
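The way step three draws on step two can be illustrated as a simple lookup: each upstream link on a downstream aspect pulls in the variability already recorded for that upstream output. The sketch below assumes this reading of the cell-linking approach; the `Step2Output`, `Link` and `step3_rows` names are hypothetical, and the two example entries are taken from Table 4-10.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Step2Output:
    dimension: str
    description: str
    frequency: str
    amplitude: str


@dataclass(frozen=True)
class Link:
    aspect: str             # downstream aspect receiving the output
    upstream_function: int  # upstream function number
    upstream_output: str    # name of the upstream output


def step3_rows(links: List[Link],
               step2: Dict[Tuple[int, str], Step2Output]) -> List[dict]:
    """Copy Step 2 results onto each downstream aspect.

    A KeyError signals a link to an output that Step 2 never described.
    """
    rows = []
    for link in links:
        out = step2[(link.upstream_function, link.upstream_output)]
        rows.append({
            "aspect": link.aspect,
            "upstream_function": link.upstream_function,
            "dimension": out.dimension,
            "description": out.description,
            "frequency": out.frequency,
            "amplitude": out.amplitude,
        })
    return rows


step2 = {
    (61, "Maintenance Information for tasking"):
        Step2Output("Sequence", "Inappropriate/unworkable plan", "Medium", "Medium"),
    (54, "Return aircraft to groundcrew"):
        Step2Output("Wrong Object", "Parked in incorrect location", "Low", "High"),
}
links = [
    Link("Input", 61, "Maintenance Information for tasking"),
    Link("Precondition", 54, "Return aircraft to groundcrew"),
]
rows = step3_rows(links, step2)
print(rows[1]["frequency"], rows[1]["amplitude"])
```

Because the step-three rows are derived rather than re-typed, a change to an upstream output in step two propagates to every downstream aspect that uses it.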


The amplitude and frequency classifications for upstream output variability
were then combined with the Possible Effect rating to produce a
Rough Downstream Function Variability Score. This Rough Score was
calculated as follows:

Numerical Score | 3 | 2 | 1
(a) Frequency of Upstream Output Performance Variability | High | Medium | Low
(b) Amplitude of Upstream Output Performance Variability | High | Medium | Low
(c) Possible effect on this (downstream) Function Output Variability (Damping) | INCREASE | NO CHANGE | DECREASE

Figure 4-9 Rough Score Matrix

Equation 6 - Rough Downstream Function Variability Score

Rough Score = a × b × c

Because each factor takes a value from 1 to 3, this gave a score between 1 and 27, shown on a scale from 0 to 27:

Figure 4-10 Rough Downstream Function Variability Score

This rough score is shown in the FRAM Model against each aspect of every

function, except for those aspects linked to external dependencies. These

external dependencies were assumed to be constant.

[Figure 4-10 scale: scores run from 0 to 27 in steps of 3. At the low end, the most likely upstream output does not affect downstream functional variability; at the high end, the most likely upstream output variability is highly likely to significantly affect the output variability of the downstream function. A rising score indicates an increasing effect on downstream function output variability, which may be manifested in either the frequency or the amplitude of downstream output variability.]
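As a worked illustration of Equation 6, the short Python sketch below applies the numerical values of the Rough Score Matrix to three rows of Table 4-10. The dictionary and function names are assumptions chosen for the sketch; only the 3/2/1 scoring and the multiplication come from the text.

```python
# Numerical values from the Rough Score Matrix (Figure 4-9).
LEVEL = {"High": 3, "Medium": 2, "Low": 1}
EFFECT = {"INCREASE": 3, "NO CHANGE": 2, "DECREASE": 1}


def rough_score(frequency: str, amplitude: str, effect: str) -> int:
    """Rough Score = a x b x c, where a is the frequency score, b the
    amplitude score and c the possible-effect (damping) score."""
    return LEVEL[frequency] * LEVEL[amplitude] * EFFECT[effect]


# Ratings taken from the Flight Servicing rows of Table 4-10.
input_row = rough_score("Medium", "Medium", "INCREASE")   # 2 * 2 * 3
aircraft_row = rough_score("Low", "High", "DECREASE")     # 1 * 3 * 1
fuels_row = rough_score("High", "Medium", "INCREASE")     # 3 * 2 * 3

print(input_row, aircraft_row, fuels_row)
```

Since every factor is at least 1, the smallest product is 1 and the largest is 27.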

4.12 Step 4 – Consequences of the Analysis

Traditional safety models focus on elimination of hazards, prevention of unsafe

conditions and protection from the consequences of unsafe conditions if they

occur. By contrast a FRAM Model may be used to prevent functional resonance

occurring in an activity linking two functions. There are two consequences of
constructing the model: how to monitor performance variability and how to
provide damping to prevent adverse performance variability.

4.12.1 Step 4a – Damping Factors

A number of damping factors to prevent adverse performance variability are

suggested in the final section of the model. These may be in the form of

additional functions, activities or changes to the performance of existing

functions or activities. Alternatively there may be some way in which additional

internal functional damping could be introduced.

4.12.2 Step 4b – Performance Indicators

The concept of monitoring safety performance indicators is a much studied

technique. FRAM offers a clear conceptual starting point from which such


indicators may be developed. Dependent on the outcome of the analysis, safety

indicators may be conceived to monitor either overall performance of the

function or particular activities. This is potentially the key means by which the

model can serve a useful purpose. The potential performance indicators shown

in the model represent an initial suggestion for review by subject matter experts;

where possible existing data should be used to generate indicators without

additional work.



Table 4-11 Example of Step 4 - Flight Servicing

Name of Function: Flight Servicing

The Aspect and Description of Aspect columns come from Step 1 - Identify and Describe the Functions; the final column records Step 4 - Consequences of the Analysis.

Aspect | Description of Aspect | Damping Factors / Potential Performance Indicators
Input | Line Controller indicates task on boards | Pre flight checks, fault reporting, engineering management supervision
Output | AC systems replenished (propulsion, mechanical) | Related fault reporting
Output | AC visually inspected (Avionics, Electrical, Structure, Mechanical, Crew Escape, Weapons, Propulsion) | Aircrew pre-flight checks - feedback info. Husbandry checks and Airworthiness Review
Output | Any faults recorded | Comparison of flt servicing fault reporting across shifts/sqns etc
Output | Husbandry jobs recorded in log | Comparison of flt servicing husbandry reporting across shifts/sqns etc
Output | Flight Servicing Certificate Signed | Captured in ASIMS reports if found
Precondition | Maintenance Activity complete | N/A
Precondition | AC available at groundcrew location | N/A
Resource | Fuels & Lubricants (from supply chain) | Pre flight checks, fault reporting, engineering management supervision
Resource | Tools & Test Equipment | Approved data specifies tools + Pre flight checks, fault reporting, engineering management supervision
Resource | Authorised manpower | Pre flight checks, fault reporting, engineering management supervision
Control | Flight Servicing Schedule (approved data) | The F765 process allows for reporting unsatisfactory features in the approved data
Control | Supplementary Flight Servicing requirements (from defer faults) | Feedback to engineering management on any inconsistencies with supplementary flight servicing requirements
Time | Daily/Weekly Flying Programme | Feedback to engineering management as to likely feasibility of the plan. Line controller is experienced technician and is able to judge plan.
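One of the indicators suggested in Table 4-11, the comparison of flight servicing fault reporting across shifts or squadrons, could be derived from existing records along the following lines. This is only a sketch: the function names, the grouping labels and all of the counts are hypothetical, and no threshold for an acceptable spread is implied by the source.

```python
from typing import Dict, Tuple


def reporting_rates(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Faults reported per flight servicing for each shift or squadron.

    counts maps a group name to (faults reported, flight servicings).
    Groups with no servicings are skipped to avoid division by zero.
    """
    return {
        group: faults / servicings
        for group, (faults, servicings) in counts.items()
        if servicings > 0
    }


def spread(rates: Dict[str, float]) -> float:
    """Difference between the highest and lowest reporting rate."""
    return max(rates.values()) - min(rates.values()) if rates else 0.0


# Hypothetical monthly counts, invented purely for illustration.
counts = {"Shift A": (12, 40), "Shift B": (3, 30), "Shift C": (0, 0)}
rates = reporting_rates(counts)
print(rates, spread(rates))
```

A large spread between groups doing comparable work would prompt a closer look; it would not by itself show which group's performance is varying.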


4.13 Summary of TASM Layout

The interconnected nature of the TASM Model means that it is challenging to

follow for those not involved in its creation. A summary example is given in
Figure 4-14, which shows a representation of the FRAM carried out on the
relationship between two functions: Function A, which is upstream, and Function B.

In this case Function B relies upon Function A to provide a time signal to start or

stop the function (or some element of it). This time aspect relationship is

identified in step one, along with the activities linking the other aspects of

Function B to other upstream functions.


Figure 4-13 Example FRAM for 2 Functions, A and B


[Figure content: downstream Function B drawn as a hexagon with its aspects (I, P, R, C, T, O), linked to several upstream functions including Function A and an External Dependency, and to a further downstream function. The spreadsheet entries for Function B are laid out under Steps 1 to 4 as follows.]

STEP 1 - Identify and Describe the Functions (Name of Function: Downstream Function B)

Aspect | Description of Aspect
Input | What causes the function to start?
Output | What is produced?
Precondition | What condition is required to allow function to occur?
Resource | What is used or consumed?
Control | What defines the operation?
Time | What sets the schedule?

STEP 2 - Identification of Output Variability

a. Type of Function: Human (Why this type?)
b. Internal/Endogenous Variability: Describe sources?
c. External/Exogenous Variability: Describe sources?
d. Outputs: Description. Most Likely Dimension of Output Variability: Sequence. Description of Most Likely Output Variability: Omission. Frequency of Output Performance Variability: Low. Amplitude of Output Performance Variability: Medium.

STEP 3 - Aggregation of Variability (Time aspect, from Function A Step 2)

Aspect: Time. Description of Aspect: Description of time signal.
Upstream Function: A, Function A. Upstream output: Description of time signal.
Most Likely Dimension of Upstream Variability: Sequence.
Description of Most Likely Upstream Variability: Describe how Function A's output variability is manifested?
Frequency of Upstream Output Performance Variability: Medium. Amplitude of Upstream Output Performance Variability: Medium.
Possible effect on this (downstream) Function: How does variation in time signal from Function A affect the output of Function B?
Possible effect on this (downstream) Function Output Variability: INCREASE

STEP 4 - Consequences of the Analysis

Damping Factors (Time): What may damp output variability? Other functions? External factors? Internal factors?
Potential Performance Indicators (Output): Identify potential performance...


Step 2a then identifies the function type (shown as human), followed by 2b

describing sources of output variability caused internally within the function.

Step 2c identifies external sources of variability, whether these are activities

linking function B to other functions or other general ‘environmental’ factors.

Step 2d first shows a repeater of the identities of the output activities shown in

step one. The most likely form of output variability is selected from the

phenotypes discussed previously (shown here as ‘Sequence’). A more detailed

description of the most likely output variability is given (shown as ‘omissions’ in

the sequence). The frequency and amplitude of the most likely variability are
then given as gross qualitative statements (shown as ‘low’ and
‘medium’ respectively). Step three first takes the results from Function A’s step
two for the output that provides Function B’s time aspect. It
then assesses the effect of Function A’s output variability on the
performance of Function B. This leads to a categorisation of the potential
effect on Function B’s output variability (shown here as an ‘Increase’ in
variability). Step four first looks at what damping factors might reduce the effect
of Function B’s output variability and then at what potential performance
indicators might be available to measure the output variability from Function B.

Note of course that there will be a consideration of performance indicators for

Function A’s output, which will be of use in monitoring the performance of

Function B also. This process will be repeated for all aspects of Function B and

then for all the other functions in the system. This process becomes iterative as

the linking activities are assessed from both sides. It is not possible to

exhaustively test the model to establish whether all of the activities between

functions have been captured. This can only be established by building the

model simultaneously with the visualisation tool. This tool is described in the

following chapter.
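The layout summarised in this section can be thought of as one record per function, with the entries for Steps 1 to 4 attached to its aspects. The following Python sketch is one possible representation under that assumption; the class and field names are hypothetical, and ‘Function C’ is an invented upstream link added only to show how an unrated link would be found during the iterative review.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

ASPECTS = ("Input", "Output", "Precondition", "Resource", "Control", "Time")


@dataclass
class AspectEntry:
    description: str                          # Step 1
    upstream_function: Optional[str] = None   # Step 3 link, if any
    effect: Optional[str] = None              # Step 3 rating
    damping_factors: str = ""                 # Step 4a
    performance_indicators: str = ""          # Step 4b


@dataclass
class FunctionRecord:
    name: str
    function_type: str          # Step 2a
    internal_variability: str   # Step 2b
    external_variability: str   # Step 2c
    aspects: Dict[str, List[AspectEntry]] = field(
        default_factory=lambda: {a: [] for a in ASPECTS}
    )


def unassessed_links(record: FunctionRecord) -> List[Tuple[str, str]]:
    """List (aspect, description) pairs linked upstream but not yet rated."""
    return [
        (aspect, entry.description)
        for aspect in ASPECTS
        for entry in record.aspects[aspect]
        if entry.upstream_function is not None and entry.effect is None
    ]


function_b = FunctionRecord("Function B", "Human",
                            "Describe sources", "Describe sources")
function_b.aspects["Time"].append(
    AspectEntry("Description of time signal", "Function A", "INCREASE")
)
# Hypothetical second link, not in the source figure, left unrated on purpose.
function_b.aspects["Control"].append(
    AspectEntry("What defines the operation?", "Function C")
)
print(unassessed_links(function_b))
```

Repeating such a check for every function is one way of picturing the iteration described above, in which each linking activity is assessed from both sides.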


5 TORNADO AIRWORTHINESS SYSTEM MODEL

VISUALISATION TOOL

5.1 Need for the Tool

Whilst the spreadsheet of the TASM is the actual model, its size and inherent

complexity make it difficult to interpret. There is a requirement for some other

form of representing the model. As shown in Figure 4-1, functions and their

aspects can be visualised as a series of hexagons linked by lines. This

technique formed the basis of a visualisation tool built using Microsoft Visio.

5.2 Microsoft Visio

If the model is to be of practical use for the RAF, then it must be able to be

interrogated using standard software available on Defence Information

Infrastructure (DII) computer systems. Software for viewing Visio drawings is

available as standard and the full version of Visio is available at an additional

cost to units. Furthermore it is also possible to incorporate interactive Visio

drawings into web-based Microsoft Sharepoint sites which are used for internal

communication, storage of documents and other tools. Visio was therefore

selected as the basis for the FRAM Visualisation Tool because of its ability to

host drawings that can be manipulated both during development and by end

users to aid interpretation. Key to this part of the project was the ability to

assign objects within the drawing to ‘layers’.

5.3 Building the Tool

The visualisation tool was developed concurrently with the spreadsheet model.

As the links within the spreadsheet are not easily viewable, the visualisation

provided a method of cross-checking the model for completeness. The

visualisation was developed in the following steps.

5.3.1 General Functional Areas

Because of the complexity of the model, it was important to minimise the

average distance between linked functional aspects so as to minimise the

number of drawing elements placed on top of each other. The first step was to


decide on groupings for functions. These were laid out as shown in Figure 5-1,

with the technological functions representing the physical aircraft system in the

top right-hand corner (blue), with the more human related functions involved in

line operations (peach) and maintenance (cream) to the left of the aircraft. The

green area hosts those functions involved in continued airworthiness

management and associated operations management. Type airworthiness

functions carried out by the DE&S Project Team and industry design

organisations are shown in the bottom left in mauve. The purple area in the top

left hosts functions in the supply and repair chain. Grey areas host functions

that are not described within the model; these are external resources or

regulations. It is important to note that these areas are a general aid to building

the visualisation and also an aid to interpretation. They are not definitive

features of the functions overlaid on them – many functions sit between the

areas and there is a limit to how they can be represented in a two dimensional

representation.
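The layout aim stated at the start of this section, minimising the average distance between linked functional aspects, can be expressed as a simple metric for comparing candidate layouts. The sketch below is illustrative only: the coordinates, the function placeholders and the `mean_link_length` helper are assumptions, and the tool itself was laid out by hand in Visio.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def mean_link_length(positions: Dict[str, Point],
                     links: List[Tuple[str, str]]) -> float:
    """Average straight-line distance between the ends of each link."""
    lengths = [math.dist(positions[a], positions[b]) for a, b in links]
    return sum(lengths) / len(lengths) if lengths else 0.0


# Two hypothetical layouts of the same three linked items.
links = [("A", "B"), ("B", "C")]
layout_1 = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (0.0, 8.0)}
layout_2 = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (3.0, 8.0)}

print(mean_link_length(layout_1, links), mean_link_length(layout_2, links))
```

A lower value suggests shorter connecting lines on average, although it does not measure line crossings or overlaps directly.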

[Figure: background layout of the functional areas, labelled Line Operations; Aircraft System; Aircraft Maintenance; Continuing Airworthiness & Operations Support; Supply & Repair Chain; Project Team/Type Airworthiness Authority & Industry Support; Regulation; External Resources and Functions. © Crown Copyright, 2013]

Figure 5-1 Visualisation Functional Groupings


5.3.2 Functions

The next step was to add the functions; these were shown in the same manner

as described in chapter two and in Figure 5-3:

[Figure: a function drawn as a hexagon labelled with its six aspects: Time (T), Control (C), Output (O), Resources (R), Preconditions (P) and Input (I).]

Figure 5-2 A Function and Its Aspects

Additionally a colour code was developed to show as pink those functions with

the potential to directly affect air safety by means of their outputs. These are

primarily the technological functions that comprise the aircraft system and

associated airborne equipment. Functions which directly affect the condition of

aircraft systems are shown as yellow, with other functions shown in grey. These

functions were overlaid as a new layer on the background, along with a key.

The functions are drawn as part of a single ‘functions’ layer and also as part of

layers specific to each function. An illustration of the background, ‘functions’
and ‘callouts’ layers is shown at Figure 5-4.


[Figure: screen capture of the visualisation tool showing the functional-area background with the numbered function hexagons overlaid, each labelled with its aspects (I, P, R, C, T, O), for example 1 Flight Servicing, 54 Operate Aircraft, 50 Aircraft Structure, 10 Supply Chain and 67 Engine Fleet Monitoring. Callouts labelled ‘Anywhere in system’ carry the system-wide factors ‘Reporting / Just Culture + Occurrence or Perception of Risk somewhere in system’ and ‘Quality Culture’.]

Figure 5-3 Screen Capture of Visualisation Tool with Functions Added


5.3.3 External Dependencies

The next step in the development was to add the external dependencies to the
functions. To differentiate these dependencies from the activities linking
functions, the dependencies are shown as straight lines. Each dependency
is drawn on a separate layer called ‘External’ and is also assigned to the
layer of every function to which it is linked. As Figure
5-5 shows, the visualisation already begins to resemble a complicated circuit
diagram; however, further development steps begin to make it clearer to
interpret.
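The layer scheme described in Sections 5.3.2 and 5.3.3 can be summarised as a mapping from layer names to drawing objects. The Python sketch below models that scheme outside Visio, under stated assumptions: `assign_layers` is a hypothetical helper, it does not use the Visio object model, and the example links between dependencies and functions are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def assign_layers(functions: List[str],
                  dependencies: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each layer name to the drawing objects assigned to it.

    Every function sits on the shared 'functions' layer and on its own
    layer; every external dependency sits on the 'External' layer and on
    the layer of each function to which it is linked.
    """
    layers: Dict[str, Set[str]] = defaultdict(set)
    for name in functions:
        layers["functions"].add(name)
        layers[name].add(name)
    for dependency, linked in dependencies.items():
        layers["External"].add(dependency)
        for name in linked:
            layers[name].add(dependency)
    return dict(layers)


# Illustrative links only; which functions each dependency serves is assumed.
layers = assign_layers(
    ["Fuel/Defuel", "Flight Servicing"],
    {"Bowser & Driver": ["Fuel/Defuel"],
     "Weather": ["Fuel/Defuel", "Flight Servicing"]},
)
print(sorted(layers["Fuel/Defuel"]))
```

Showing only the layer for one function then reveals that function together with its external dependencies, which is the behaviour that makes the crowded drawing easier to interrogate.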


[Figure: screen capture of the visualisation tool with external dependencies added to the functions as straight lines, for example Weather, Bowser & Driver, JSP 886 Pipeline Times, ATTAC Contract (BAE Systems), ROCET Contract (Rolls Royce) and Qualified and Current Aircrew.]

I

T

R


11

O

C

P

I

T

R


Control Boards

61

O

C

P

I

T

R

Ground Services

20

O

C

P

I

T

R


25

O

C

P

I

T

R

Weapons

53 O

C

P

I

T

R

Defensive AIds

46

O

C

P

I

T

R

Survival Equipment

56

O

C

P

I

T

R

Train Maintenance

Personnel

7

O

C

P

I

T

R


Certification

27

O

C

P

I

T

R

Chief Air Engineer

69

O

C

P

I

T

R


21

O

C

P

I

T

R


60

O

C

P

I

T

R


9

O

C

P

I

T

R

Maintain GSE

12

O

C

P

I

T

R


(LITS)

63O

C

P

I

T

R

Manage Maintenance

Extensions

62

O

C

P

I

T

R


Process

44

O

C

P

I

T

R

Modify Aircraft

23O

C

P

I

T

R


35

O

C

P

I

T

R


Development

22

O

C

P

I

T

R


Advice

37O

C

P

I

T

R


30

O

C

P

I

T

R

Release to Service

36O

C

P

I

T

R


19O

C

P

I

T

R


4

O

C

P

I

T

R

Assure Quality

15

O

C

P

I

T

R


67

O

C

P

I

T

R


29O

C

P

I

T

R


Analysis

32

O

C

P

I

T

R


68

Business Procedure BS013

RA 1300

RTSA

ITEA Contract

Codification

LITS ServersTAG TeamSEMA

DAOSBaseline Design

Warton Manpower Drawing SetDevelopment Aircraft Flight Trials

Materials

Figure 5-4 Screen Capture of Visualisation Tool with External Dependencies Added


5.3.4 Functional Activities

The most time-consuming and difficult part of developing the visualisation tool was the addition of functional activities. These had to be drawn to link all of the more than 900 functional aspects described in the spreadsheet model. As far as possible they were drawn as arcs around other functions; however, this was not always possible without introducing unnecessary complication. A small minority of activities could not be shown directly on the visualisation because they would have introduced confusion to the diagram. Functions that also draw activity from quality, safety or reporting cultures have these activities shown as cloud shapes, which avoids the need to link those functions to every other function. The addition of approximately 900 activities produces a diagram that itself becomes complicated enough to be described as both complex and intractable. However, it is important to note that the activity lines represent only the potential for these activities to occur and link the functions. Any particular instantiation of the model, that is, a representation of total activity at a specific moment, will not need to show every single activity occurring.

Figure 5-6 shows the visualisation tool with all of the activities shown. This illustrates that it is impossible to interpret the system with all activities shown at once.

[Screen capture of the visualisation tool. Regions: LINE OPERATIONS, INDUSTRY SUPPORT and MILITARY EFFECT. All numbered functions are shown with their six aspects (O, C, P, I, T, R), together with the external dependencies and every potential functional activity linking them.]

Figure 5-6 Screen Capture of Visualisation Tool with all Functional Activities Shown

5.4 Exploiting the Tool

The tool produces a complex and interesting visualisation of the entire

airworthiness management system. Exploitation of the tool will require particular

processes (that is a series of linked functions) to be analysed to ascertain

whether functional resonance has occurred or is likely to occur. The benefit of

using Visio to develop the tool is the ability to decompose the diagram by

selecting its constituent layers. For example Figure 5-7 shows the external

dependencies and activities linked to the aspects of the ‘Train Maintenance

Personnel’ function. This can easily be achieved by selecting the ‘Train

Maintenance Personnel’ layer within Visio. Figures 5-8 and 5-9 show how the

layers may be manipulated both within Visio and the DII Visio viewing tool. This

functionality will be exploited in the examples given in the following chapters.
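The layer-based decomposition described above can be illustrated with a minimal sketch. It is not the Visio implementation; it only assumes that each potential activity is stored as a coupling between two function serial numbers and that external dependencies are listed against each function. The names `Coupling` and `select_layer`, and all of the example links, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coupling:
    """A potential activity: the output of one function feeding an aspect of another."""
    source: int   # serial number of the upstream function (its Output)
    target: int   # serial number of the downstream function
    aspect: str   # receiving aspect: "I", "P", "R", "T" or "C"


def select_layer(function_id, couplings, dependencies):
    """Return the couplings and external dependencies that one function's layer would show."""
    linked = [c for c in couplings if function_id in (c.source, c.target)]
    external = sorted(dependencies.get(function_id, []))
    return linked, external


# Hypothetical data: the serial numbers, couplings and dependencies are illustrative only.
couplings = [
    Coupling(source=7, target=5, aspect="R"),
    Coupling(source=15, target=7, aspect="C"),
    Coupling(source=5, target=1, aspect="I"),
]
dependencies = {7: ["Trainee Maintenance Personnel", "Phase 1 & 2 Training"]}

linked, external = select_layer(7, couplings, dependencies)
for c in linked:
    print(f"{c.source} -> {c.target} (aspect {c.aspect})")
print(external)
```

Selecting a single layer therefore hides every activity that does not touch the chosen function, which is the effect relied upon when the full diagram is too dense to read.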

[Screen capture of the visualisation tool with the ‘Train Maintenance Personnel’ layer selected. Regions: LINE OPERATIONS, INDUSTRY SUPPORT, REGULATION and MILITARY EFFECT. The functions are shown with their six aspects (O, C, P, I, T, R), with the activities and external dependencies linked to the ‘Train Maintenance Personnel’ function highlighted.]

Figure 5-7 Activities and Dependencies Linked to Aspects of the ‘Train Maintenance Personnel’ Function


Figure 5-8 Selecting Layers within Visio – Screen Capture


Figure 5-9 DII Visio Viewer – Screen Capture


5.5 Summary

A visualisation tool to complement the TASM has been created using MS Visio.

The tool allows linked functions to be highlighted using the layers feature of the

Visio application or the Visio Viewer tool within the MOD’s standard IT system.

An overview of the tool is given in the key included within it and reproduced at

Figure 5-10. The tool allows processes to be investigated for the purpose of risk

assessment or incident investigation. When used with the TASM spreadsheet,

experienced engineers or safety managers will be able to assist in engineering

resilience into the Tornado airworthiness system by adjusting controls on

existing processes so as to prevent harmful functional resonance occurring.

Such system adjustments will need to be based on assessment of the risks

posed by particular hazards, which may only become apparent through

investigation of incidents using the tool. Examples of incident investigation and

risk assessment are given in chapters 6 and 7. System adjustments themselves

may take any form that alters the way in which particular functions perform. For

example, if a reliability problem arose with a particular technical subsystem,

resources may be increased such as the provision of additional funding to

procure more spares to feed scheduled maintenance. Control of the

maintenance function would need to change through changing the output of the

‘provide approved data’ function. Whilst all of these things may have been done

without the use of the tools, it is hoped that FRAM will provide insights into

‘whole system’ operation and emergent behaviour that would otherwise be

difficult to achieve.

TORNADO GR4 AIRWORTHINESS SYSTEM MODEL – Visualisation Tool

Using the Functional Resonance Analysis Method (FRAM) to model Complex Organisational, Human Factors and Technological Functions

Purpose of the tool: To allow visualisation of specific instantiations* of the Tornado GR4 airworthiness system to enable air safety/airworthiness occurrence investigation, airworthiness risk assessment and system improvement activity.

A function (identified by a serial number) has 6 types of aspects, which define its interaction with the system: Input (I), Output (O), Preconditions (P), Resources (R), Time (T) and Control (C).

Functions have potential couplings between their aspects. These couplings only exist for finite periods of time and represent activities. A process can be shown by an instantiation* of the model, showing a series of coupled functions forming a process.

*An instantiation is a ‘time-slice’ of the system showing a specific series of activities.

The complexity of the system makes the model intractable if all processes are considered together. Using the interactive layers, various processes can be visualised and cross referred to the spreadsheet model.

The FRAM Spreadsheet Model contains information regarding likely variability of functional outputs; if inadequately controlled this variability may lead to hazards and accidents.

Function symbols distinguish: functions with potential to produce direct air safety hazards through their output; and functions which directly affect the condition of aircraft & equipment.

Instructions for Highlighting a Process: Click the ‘Layers’ button above. Select a tick against each function identified as a part of the process. Select a new colour for each function that has been ticked – this must be the same colour for each function. If you wish to also highlight the external processes involved, both the ‘0 - External’ and ‘0 – External Resources’ layers must be selected and given the same colour as the functions.

Tracking the Process Further into the System: Simply keep selecting and colouring functions. You can highlight all potential activities by selecting the layer ‘0 – BLUE’. The background can be selected or deselected using the ‘0 - Background’ layer.

Printing an Instantiation: The Internet Explorer print function produces a poor quality image. Instead, press Ctrl + Prt Scr on your keyboard, paste into a Word document and then use the crop tool.

Further info: [email protected]

Figure 5-10 Visualisation Tool Key
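The six-aspect description of a function in the key lends itself to a simple data structure. The sketch below is an assumption about how such a model could be held in code, not a description of the TASM spreadsheet itself: the class `FramFunction` and the helper `potential_couplings` are hypothetical names, and couplings are found by matching an Output description against an identically worded aspect of another function. The ‘Approved data’ example follows the text, where approved data acts as a control on a human function; the aspect wording is illustrative.

```python
from dataclasses import dataclass, field

# The six aspect types named in the key.
ASPECTS = {
    "I": "Input", "O": "Output", "P": "Preconditions",
    "R": "Resources", "T": "Time", "C": "Control",
}


@dataclass
class FramFunction:
    serial: int
    name: str
    # Maps each aspect letter to the list of descriptions attached to it.
    aspects: dict = field(default_factory=lambda: {k: [] for k in ASPECTS})

    def add(self, letter, description):
        if letter not in ASPECTS:
            raise ValueError(f"unknown aspect {letter!r}")
        self.aspects[letter].append(description)


def potential_couplings(functions):
    """Pair each Output with any identically described aspect of another function."""
    found = []
    for up in functions:
        for out in up.aspects["O"]:
            for down in functions:
                if down is up:
                    continue
                for letter in "IPRTC":
                    if out in down.aspects[letter]:
                        found.append((up.serial, down.serial, letter, out))
    return found


# Illustrative functions; the aspect wording is an assumption made for this sketch.
publish = FramFunction(30, "Publish Approved Data")
publish.add("O", "Approved data")
servicing = FramFunction(1, "Flight Servicing")
servicing.add("C", "Approved data")

print(potential_couplings([publish, servicing]))
```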


6 USING THE TORNADO AIRWORTHINESS SYSTEM

MODEL FOR INCIDENT ANALYSIS

Chapter four described Step zero in the FRAM used to build the Tornado

Airworthiness System Model (TASM); this specified that the main purpose of the

TASM was for risk assessment. However, using the visualisation tool allows

ready decomposition of the model into parts pertinent to particular incidents.

Particular processes can be highlighted, with other functions being left as

background functions on the assumption that their variability was not significant

in controlling the processes involved in the incident.

6.1 Case for Using FRAM for Incident Modelling

Chapter one discusses commonly applied accident models, whether they are

technological, human or organisational. The military air safety management

system uses an Occurrence Investigation process manned by local personnel to

understand any occurrences that had the potential to pose an unacceptable air

safety risk. For accidents or serious occurrences the MAA will convene Service

Inquiries to investigate using experts from the military air accident investigation

branch. Similar arrangements exist within civilian operators and regulators. The

purpose of applying FRAM to incident analysis is to provide a resilience

engineering perspective to understanding how incidents occurred and to

provide recommendations that are more likely than traditional methods to

prevent reoccurrence of similar or unrelated incidents. By understanding how

functional performance variability combined to produce an adverse outcome it

should be possible to understand how performance conditions might be shaped

or controlled to produce more desirable outcomes in the future. In order to

explore this hypothesis, two particular incidents that have occurred within the

RAF Tornado Force were selected and analysed. As the following analyses rely

only on data from existing occurrence reports, no new findings will be

highlighted – this chapter just demonstrates how incidents can be described

using the TASM.


6.2 Incident One – Thrust Reverser Incidents

Tornado employs a thrust reverse system to provide braking on landing in order

to slow the aircraft to safe taxying speeds. In the event thrust reversers fail to

operate, wheel braking may be used although this does increase the likelihood

of fire hazards from hot brakes, both to the aircraft and to ground crews.

Significantly, thrust reverse is also required in the event of a high-speed

abort during take-off. Thrust reversers deploy as ‘clam-shell’ buckets directly

into the jet efflux, rear of the final nozzle in the RB199 engine exhaust system.

Figure 6-1 Tornado GR4 with Thrust Reversers Deployed (Cooke, 2004)


6.2.1 Description of Incidents

Tornado has experienced a recent history of thrust reverser incidents, some of

which are summarised here, using data taken from ASIMS:

Table 6-1 Thrust Reverser Air Safety Occurrence Reports 2012/13

# | Report ID | Date of Occurrence | Brief Title
1 | asor\Marham - RAF\2(AC) Sqn\Tornado\13\8805 | 20/09/2013 | Thrust Reverse Failure
2 | asor\OOA Kandahar\TorDet (LOS) - KAF\Tornado\13\8559 | 13/09/2013 | Thrust Reverse Failure on Landing
3 | | 25/07/2013 | Thrust reverse failure on landing.
4 | asor\Marham - RAF\9 Sqn\Tornado\13\4415 | 21/05/2013 | Thrust Reverse Fault on Taxi
5 | asor\OOA Kandahar\TorDet (MRM) - KAF\Tornado\13\3056 | 16/04/2013 | TR Failure on Landing
6 | asor\Lossiemouth - RAF\XV(R) Sqn\Tornado\13\2506 | 25/03/2013 | Thrust reverse failure on landing
7 | asor\Lossiemouth - RAF\XV(R) Sqn\Tornado\13\2299 | 18/03/2013 | Thrust Reverse failing to stow correctly
 | | 19/02/2013 | Thrust-Reverse Failure
 | | 13/02/2013 | TR bucket failed to stow
 | | 11/12/2012 | Thrust reverse failure on landing


The recent history shown in Table 6-1 has been preceded by a number of

detailed investigations into earlier incidents shown in Table 6-2:

Table 6-2 Thrust Reverse Occurrences with Detailed Investigation

# | Report ID | Date of Occurrence | Brief Title
12 | asor\Marham – RAF\31 Sqn\Tornado\12\18524 | 18/09/2012 | Lift Dump and thrust reverser failure on landing
13 | asor\Marham – RAF\9 Sqn\Tornado\12\17514 | |
14 | asor\OOA Kandahar\TorDet (LOS) – KAF\Tornado\10\133434 | 20/08/2010 | Thrust Reverse Failure and Brake Fire
15 | asor\Lossiemouth – RAF\14 Sqn\Tornado\10\133083 | |

In report number 12 the cause of the system failure was not positively

determined, although a throttle box was changed as there was speculation that

this may have caused a wiring fault. In reports 12-14 the thrust reverse system

did not operate because circuit-breakers (CB) had been pulled some time prior

to the flight, consequently when the pilot selected reverse thrust, the system did

not operate.

6.2.2 Summary of the Investigations

The ASIMS record contained various detailed investigations into occurrences

12-14; these all centred on the reasons for the circuit breakers remaining pulled

after the aircraft was released for flight. The investigations worked within a

frame of reference that relied on a hazards and barriers model of the situation.

In each case a number of missed opportunities were identified where the error

could have been spotted. Three particular circuit breakers (CBs) were the focus

of the investigations and in different combinations were responsible for the

failure of the thrust reversers to deploy. These circuit breakers had been

legitimately pulled to inhibit the thrust reverse system as a result of a

requirement to conduct engine ground runs, prior to the flights where each

occurrence happened. The circuit breakers prevented the activation of relays


which would put the electrical system into the on-the-ground rather than

airborne state, once the aircraft had landed. This relay system is in and of itself a

safety barrier to the operation of thrust reversers whilst airborne. The engine

ground runs were conducted because maintenance of the Environmental

Conditioning System (ECS) had been carried out. ECS has suffered numerous

reliability issues since the mid-life upgrade of the fleet to GR4 standard and had

been subject to considerable work by the Design Organisation (DO) and the

Engineering Authority (EA) since that upgrade programme. This resulted in the

issue of a complicated Routine Technical Instruction1 (RTI) that required the

ECS to be tested and adjusted during engine ground runs. This RTI was not

required in all cases but engine ground runs were required for other reasons

where it was not, as a result of mandatory maintenance procedures called up

during fault rectification. The difficulty in carrying out the RTI was raised as a

factor in some of the investigations. The approved maintenance data

required that yellow clip-on safety tags were to be fitted to any CB that has been

pulled, in order that their inoperative state became highly visible. It had become

normal practice for this not to happen during maintenance on front line

squadrons, the reason being that in many cases CBs are set and re-set multiple

times during fault diagnosis. However in some instances, CBs were in fact

pulled as a safety measure during engine ground runs where continual setting

and resetting was not a factor. Nonetheless, the practice of failing to safety-tag

pulled CBs had become an organisational norm. There obviously was a need to

reset CBs so that the system would perform as demanded in the air. However,

maintenance procedures often mandated a complicated series of CB setting

and resetting. It was highlighted that the layout of these procedures in the

written form was difficult to follow. In some cases there appears to have been

some confusion surrounding which technician had performed the final check of

the CB panel, however in all cases this was recorded as having been done. A

final opportunity to spot the pulled CBs was missed during the servicing of each

aircraft before flight. This action was specifically mandated by the flight

1 An RTI is a category of Special Instruction (Technical) instigated by the EA.


servicing notes and further amplified through the addition of a supplementary

flight servicing requirement in the aircraft logbook. The flight servicing task was

self-supervised. Shortly after occurrence 15 a local technical instruction was

enacted; the instruction mandated another independent check of CBs as an

additional barrier to failure. The instruction did not include the CB that prevented

the thrust reversers deploying in occurrence 14. In case 14, the approved data

was not rigorously followed as the technicians realised that this would require

functional tests to be duplicated. An unforeseen consequence of this approach

was the removal of an opportunity to check the condition of the CB, although to

add to the confusion, the maintenance procedure that was not followed

contained an error in that it required the CB to be pulled and not reset.

Checking the CBs visually was not easy due to their confined location on the

roof of the nose undercarriage bay; a technique often employed was to sweep a

hand over the panel to feel any raised CBs. In case 12 there was mention of

distraction caused by work being carried out by other tradesmen on systems on

the same aircraft, as well as the implications of the shift system being

employed. The investigations made some recommendations towards

considering a post-taxi check of the thrust reverser system. This was rejected

on the basis of independent technical advice from QinetiQ following a ground

trial.

6.2.3 Instantiation of the FRAM Model

The FRAM model described in Chapter 4 and shown in detail at appendix A

seeks to describe all potential and actual activities in the system linking the

various functions. In order to understand the incidents in question this model

must be ‘time-sliced’ to produce an instantiation of activity during the function. It

is not however quite as simple as describing the activities underway at a single

moment with respect to a single aircraft. The time-slice can be considered to be

moving as the activities permeate downstream through the model. For example

issue of approved data will have happened well before any downstream

coupling occurs between that function and any human function which requires it

as a control. The first step to describing an instantiation of the model in

reference to this series of incidents is to list the functions and make a note of

which have been referenced (directly or by implication) in the investigation

reports as contributing in some way to the incident:

Table 6-3 Thrust Reverser FRAM Instantiation

Number | Type | Function | Variability noted in DASOR
1 | Human | Flight Servicing | CB check missed or ineffective
5 | Human | Task Maintenance | Discontinuity in tasking shifts
6 | Human | Record Work Done on ac | Current CB configuration not correctly recorded; not recorded as work progressed
11 | Human | Fit/Remove Role Equipment/Weapons/Explosives/Ejection Seats | Unrelated ‘Litening’ Pod fit task caused distraction
15 | Organisational | Assure Quality | CB Clip normalised practice not dealt with
23 | Organisational | Modify Aircraft | GR4 Modification caused ECS system reliability issues
24 | Organisational | Apply Special Instruction (Technical)s | Application of complicated RTI required CBs to be disturbed - not completed correctly
30 | Organisational | Publish Approved Data (Tech Manuals & Policy) | Maintenance Procedures require CBs to be disturbed
31 | Organisational | Publish Special Instructions (Technical) | EA mandated repeated disturbance of CBs
32 | Human | Cost Benefit Analysis / Hazard Analysis / ALARP Decision | EA decision not to mandate Thrust Reverser system tests during taxy because of FOD risk
35 | Human | Monitor Reliability Data | RTI was means of achieving required reliability data
37 | Organisational | Independent Advice | QQ advice on FOD ingestion during taxi checks
42 | Human | Fault Diagnosis | ECS fault diagnosis called up engine ground runs
43 | Human | Corrective Maintenance | Replacement of parts called up engine ground runs
48 | Technological | Armament & Electrical Systems | Electrical system did not function in CB pulled state
49 | Technological | Mechanical Systems | ECS system not performing to specification
51 | Technological | Propulsion | Thrust reverse system did not operate - failure to provide thrust in required direction
54 | Human | Operate Aircraft | Pilot selected thrust reverse
55 | Human | Pre-Flight Checks | Did not highlight CBs incorrectly set or Thrust Reverse serviceability
58 | Human | Supervise Maintenance | Supervision (including self-supervision) not adequate to identify failure to set CBs
59 | Human | Independent Inspection | Not in place - now instigated
60 | Human | Plan Weekly-Daily Flying Programme | Task did not match maintenance personnel resource
64 | Organisational | Operate Shift Pattern | Caused discontinuity in tasking

These functions of interest can be selected as layers within the Visualisation

Tool, with colour added to highlight coupled functions. Because 23 functions

have been identified as having variability, the visualisation of the FRAM Model

Instantiation is still very complicated.
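The selection of an instantiation from the full model can be sketched as a simple filter followed by an upstream trace. The function numbers below are those listed in Table 6-3; everything else is an assumption made for illustration. In particular, the list of potential couplings is hypothetical (only the dependence of Propulsion, function 51, on the Armament & Electrical Systems, function 48, is taken from the text), and the helper names `instantiate` and `upstream_of` are not part of the TASM.

```python
from collections import deque

# Function numbers listed in Table 6-3 as showing variability.
INSTANTIATION = {1, 5, 6, 11, 15, 23, 24, 30, 31, 32, 35, 37,
                 42, 43, 48, 49, 51, 54, 55, 58, 59, 60, 64}


def instantiate(potential, selected):
    """Keep only the couplings whose two ends are both in the selected set."""
    return [(s, t) for s, t in potential if s in selected and t in selected]


def upstream_of(function_id, couplings):
    """Breadth-first walk against the coupling direction to find upstream functions."""
    parents = {}
    for s, t in couplings:
        parents.setdefault(t, set()).add(s)
    seen, queue = set(), deque([function_id])
    while queue:
        current = queue.popleft()
        for parent in sorted(parents.get(current, ())):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical potential couplings (source, target); function 7 is outside the instantiation.
potential = [(30, 43), (43, 48), (48, 51), (1, 48), (7, 43), (54, 51)]

active = instantiate(potential, INSTANTIATION)
print(active)
print(sorted(upstream_of(51, active)))
```

Tracing upstream from the function whose output varied is one way of narrowing a 23-function instantiation to the couplings most relevant to a given outcome.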


Figure 6-2 Thrust Reverser Incidents Visualisation



[Screen capture of the visualisation tool showing the thrust reverser instantiation: all numbered functions with their six aspects (O, C, P, I, T, R), with the functions of interest and their couplings highlighted.]

6.2.4 The Sources of Variability

The output of the propulsion system, and of the electrical system on which it relies, varied outside the required performance envelope: the thrust reversers did not deploy because the upstream electrical function did not provide the required output. In this case the electrical system output was an extreme case of output variability, in that no power was supplied to the thrust reverse circuit when demanded. There were potentially other dimensions in which the electrical output might have varied, e.g. power, current or voltage.

Figure 6-3 Propulsion & Electrical System

[Figure not reproduced. Labels recoverable from the figure: function 48, Propulsion (51) and the Dynamic Environment, with the couplings ‘No Electrical Supply to Thrust Reversers’ and ‘No Reverse Thrust’.]

Clearly this situation arose because the output of the upstream maintenance functions left the electrical system in the wrong configuration (CB pulled). Thrust reversers are used on nearly every Tornado sortie, so what was the key element of variability that made these instances different from most other times the aircraft was operated? In each case, human maintenance activity had been required, prior to the occurrence, on a system that had an upstream connection with the electrical system. Every Tornado flight requires a significant degree of variable human functions to allow it to take place. Using FRAM and the TASM, an occurrence investigator needs to establish:

- How did the functions come to combine in a manner that was potentially hazardous to the system?

- Given that functional output variability is normally sufficiently damped so as not to produce a hazardous output (e.g. thrust reverse normally operates correctly), what damping function that is normally present was not adequate in this case?

The TASM visualisation tool can be used to trace back through the system to

identify where functional resonance has occurred. Figure 6-4 highlights

potentially functionally resonant activities, which can then be examined in the

FRAM Model – shown with red outlines in Table 6-4:


Figure 6-4 Electrical System Potential Functionally Resonant Activities

[Figure not reproduced: Visualisation Tool output of the full TASM with the activities linked to the electrical system function highlighted, including the coupling ‘No Electrical Supply to Thrust Reversers’.]

Table 6-4 FRAM Model of Electrical System

Name of Function: Armament & Electrical Systems

Aspect | Description of Aspect | Upstream Function Number | Upstream Function Name | Upstream Function Aspect | Most Likely Dimension of Output Performance Variability | (heading lost) | Frequency of Upstream Output Performance Variability | (heading lost) | (heading lost) | (Damping) | Rough Downstream Function Variability Score

Input | Ground Services (Electrical Power Generation) | 20 | Ground Services (Cooling, Power, Dehumidification, Steps, Staging, Bungs, Blanks) | AC connected/removed to ground services - Arm Elect, Structure, Mech Sys, | Sequence | Omissions - items left attached or not fitted | Medium | Medium | No electrical output during maintenance | INCREASE | 12

Input | Propulsion System (Electrical Power Generation) | 51 | Propulsion | Electrical Power | Force/Distance/Direction | Fail to provide power | Low | High | No electrical power | INCREASE | 9

Output | Electrical Power/Signals

Precondition | Apply Special Instructions (Technical) | 24 | Apply Special Instruction (Technical) | Special Instruction (Technical) Applied to applicable | Timing/Duration | Instruction not complied within specified time | High | High | Unsafe condition develops | INCREASE | 27

Precondition | Scheduled Maintenance | 2 | Scheduled Maintenance | AC Inspected (all systems) | Sequence | Omission | High | Medium | Unsafe condition develops due to component failure | INCREASE | 18

Precondition | Repair Maintenance | 40 | Repair Aircraft | Aircraft Structure Repair | Timing/Duration | Not completed within time | High | Medium | Incorrect functional output or unsafe condition | INCREASE | 18

Precondition | Corrective Maintenance | 43 | Corrective Maintenance | System Restored to correct function | Force/Distance/Direction | Ineffective maintenance action | High | High | Function impaired | INCREASE | 27

Precondition | Modify Aircraft | 23 | Modify Aircraft | Aircraft Systems Modified under Service | Sequence | Modification occurs in wrong sequence - config control | Medium | High | Function impaired | INCREASE | 18

Precondition | Fit/Remove Role Equipment and Weapons | 11 | Fit/Remove Role Equipment/Weapons/Explosives/Ejection Seats | Aircraft in Changed Role Fit (Arm Elec/Weapons/Crew | Timing/Duration | Takes longer than forecast/required | High | Low | Incorrect connection impairs signal to weapons | INCREASE | 9

Precondition | Pre-Flight Checks | 55 | Pre-Flight Checks | Armament & Electrical Systems Checked | Timing/Duration | Omissions | High | Medium | Function impaired remains in a failed state whilst airborne | NO CHANGE | 12

Resource | Structure | 50 | Aircraft Structure | Armament & Electrical System Loads are | Force/Distance/Direction | Fails to react load | Low | High | Potential for electrical shorting or sparking | INCREASE | 9

Control | Operate Aircraft | 54 | Operate Aircraft | Inputs to aircraft systems | Force/Distance/Direction | Incorrect control input | High | High | Potential for unsafe condition | INCREASE | 27

Time | Not initially described | | | | | | | | | NO CHANGE | 0
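The scores in the final column of Table 6-4 appear to follow a simple product rule. The sketch below is an inference from the tabulated values only, not a definition taken from the model: it assumes ordinal weights of Low = 1, Medium = 2, High = 3 for the two rating columns, treats the second rating column (whose heading is not recoverable here) as an amplitude rating, and assumes damping multipliers of INCREASE = 3 and NO CHANGE = 2. The DECREASE multiplier of 1 is a guess, since no such row appears in the table. The names `rough_score`, `LEVEL` and `DAMPING` are hypothetical.

```python
# Assumed ordinal weights; chosen only because they reproduce Table 6-4.
LEVEL = {"Low": 1, "Medium": 2, "High": 3}
# INCREASE and NO CHANGE are inferred from the table; DECREASE is a guess.
DAMPING = {"DECREASE": 1, "NO CHANGE": 2, "INCREASE": 3}


def rough_score(frequency, amplitude, damping):
    """Rough downstream variability score as a product of three weights."""
    if frequency is None or amplitude is None:
        return 0  # aspect "not initially described", e.g. the Time row
    return LEVEL[frequency] * LEVEL[amplitude] * DAMPING[damping]


# (upstream function, first rating, second rating, damping, score in Table 6-4)
TABLE_6_4 = [
    ("Ground Services (20)", "Medium", "Medium", "INCREASE", 12),
    ("Propulsion (51)", "Low", "High", "INCREASE", 9),
    ("Apply SI(T) (24)", "High", "High", "INCREASE", 27),
    ("Scheduled Maintenance (2)", "High", "Medium", "INCREASE", 18),
    ("Repair Aircraft (40)", "High", "Medium", "INCREASE", 18),
    ("Corrective Maintenance (43)", "High", "High", "INCREASE", 27),
    ("Modify Aircraft (23)", "Medium", "High", "INCREASE", 18),
    ("Fit/Remove Role Equipment (11)", "High", "Low", "INCREASE", 9),
    ("Pre-Flight Checks (55)", "High", "Medium", "NO CHANGE", 12),
    ("Aircraft Structure (50)", "Low", "High", "INCREASE", 9),
    ("Operate Aircraft (54)", "High", "High", "INCREASE", 27),
    ("Time (not described)", None, None, "NO CHANGE", 0),
]

# Rows where the assumed rule disagrees with the tabulated score.
mismatches = [row for row in TABLE_6_4 if rough_score(*row[1:4]) != row[4]]
print(mismatches)
```

Under these assumptions every row of the table is reproduced, which suggests the score is intended as a coarse ranking aid rather than a calibrated measure.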




It is important to note that the visualisation tool automatically highlights all activities which are linked to the electrical system function and any other function identified in Table 6-3; this does not necessarily mean that these activities were functionally resonant. To understand the relationship further it is necessary to compare the model data to the occurrence report described above. This shows that neither the operator function (pilot) nor the propulsion system (providing power) output variability was significant during the occurrence. This left the Apply Special Instructions (Technical), Corrective Maintenance and Pre-Flight Checks aspects of the Electrical System function. These three upstream functions are linked by various activities to the ‘precondition’ aspect of the Electrical System function. In this occurrence, all three of these functions should have resulted in the CBs being correctly set. The variation in their functional output meant that the CBs were incorrectly set and the preconditions (otherwise termed ‘execution conditions’) for the Electrical System were not present; therefore the electrical signal was not sent to the thrust reverse element of the propulsion system. Table 6-5 shows these three preconditions highlighted within the FRAM Model.
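The narrowing step described above can be illustrated as a filter over the upstream couplings of the electrical system function. This is a minimal sketch rather than the Visualisation Tool's own logic: the couplings are those listed in Table 6-4, while the set of functions with noted variability is an illustrative subset assembled from the text (functions 24, 43 and 55) and the rows of Table 6-3 that mention functions 58, 59, 60 and 64. The names `UPSTREAM`, `NOTED_VARIABLE` and `candidate_couplings` are hypothetical.

```python
# Upstream couplings of the Armament & Electrical Systems function,
# grouped by the aspect they feed (numbers and names as in Table 6-4).
UPSTREAM = {
    "Input": {20: "Ground Services", 51: "Propulsion"},
    "Precondition": {
        24: "Apply SI(T)s",
        2: "Scheduled Maintenance",
        40: "Repair Aircraft",
        43: "Corrective Maintenance",
        23: "Modify Aircraft",
        11: "Fit/Remove Role Equipment and Weapons",
        55: "Pre-Flight Checks",
    },
    "Resource": {50: "Aircraft Structure"},
    "Control": {54: "Operate Aircraft"},
}

# Illustrative subset of functions whose output varied in the occurrence
# reports; this is an assumption, not the complete Table 6-3 list.
NOTED_VARIABLE = {24, 43, 55, 58, 59, 60, 64}


def candidate_couplings(upstream, noted_variable):
    """Keep only upstream functions that were noted as variable."""
    result = {}
    for aspect, functions in upstream.items():
        kept = {n: name for n, name in functions.items() if n in noted_variable}
        if kept:  # drop aspects with no variable upstream function
            result[aspect] = kept
    return result


candidates = candidate_couplings(UPSTREAM, NOTED_VARIABLE)
print(candidates)
```

With these inputs only the Precondition aspect survives, which matches the three functions singled out in the text; a fuller list of variable functions could of course retain other couplings.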


Table 6-5 Electrical System Precondition Variability

Name of Function: Armament & Electrical Systems

Table 6-5 repeats the content of Table 6-4 with the three preconditions discussed above highlighted. Those rows are:

Aspect | Description of Aspect | Upstream Function Number | Upstream Function Name | Upstream Function Aspect | Most Likely Dimension of Output Performance Variability | (heading lost) | Frequency of Upstream Output Performance Variability | (heading lost) | (heading lost) | (Damping) | Rough Downstream Function Variability Score

Precondition | Apply Special Instructions (Technical) | 24 | Apply Special Instruction (Technical) | Special Instruction (Technical) Applied to applicable | Timing/Duration | Instruction not complied within specified time | High | High | Unsafe condition develops | INCREASE | 27

Precondition | Corrective Maintenance | 43 | Corrective Maintenance | System Restored to correct function | Force/Distance/Direction | Ineffective maintenance action | High | High | Function impaired | INCREASE | 27

Precondition | Pre-Flight Checks | 55 | Pre-Flight Checks | Armament & Electrical Systems Checked | Timing/Duration | Omissions | High | Medium | Function impaired remains in a failed state whilst airborne | NO CHANGE | 12

By iteratively examining both the TASM and the visualisation tool, it is possible to construct an instantiation of the occurrence, in so far as the investigation report provides detail. By copying and pasting the functions and their linked activities into a new Visio drawing, a slightly clearer picture can be created, as shown in Figure 6-5. This instantiation uses the investigation report to describe how the activities shown in the TASM at Appendix A varied in this series of occurrences.

6.2.5 Insights from TASM

Figure 6-5 contains a number of insights that can be drawn from the accident reports. The functions coloured red produced variability in their output, either by leaving CBs in the incorrect position or by failing to spot this condition before the aircraft flew. Purple functions provided a damping external control to the red functions. In the reported occurrences these damping controls were inadequate to ensure that the red functional output was delivered inside appropriate bounds. For example, in the case of supervision, the function failed to adequately monitor and adjust the output of the functions that carried out work on the aircraft (corrective maintenance, apply SI(T), etc.). Moving further back into the system, it is possible to see how functions carried out by the Type Airworthiness Authority staff did not provide additional damping mechanisms through mandating pre-flight checks or independent checks of the CBs. In the latter case, this additional control loop was provided later. The balance of the system was tipped by the need to carry out Routine Technical Inspections on the Environmental Conditioning System (ECS), resulting in the disturbance of the CBs. This process was in itself a method of controlling the performance of the ECS, which had changed as a result of a modification some years prior to the occurrences.

Examination of Figure 6-5 shows that the provide maintenance personnel function was a significant source of variable system performance. A potential control loop existed to damp the effects of this variability: the planning process, which is a sub-process within the visualisation shown at Figure 6-5. Adjustments could have been made to maintain system stability by adjusting the three month flying programme to account for the lack of SQEP. This would have flowed through to the weekly flying programme, which provides a time constraint on the performance of the various maintenance functions. This method of incident analysis does not provide a chain of events or failure mode explanation of what happened; rather, it paints a picture of a system tipped out of balance, resulting in a hazardous physical output.


Figure 6-5 Instantiation of Thrust Reverse Occurrence Reports

[Figure not reproduced: FRAM instantiation diagram. Legend: functions are marked as Excess Output Variability, Inadequate Damping, Direct Interface with Aircraft or Aircraft Systems; lines are marked as Harmful Variability or Variable Activity.]

Activity labels recoverable from the figure: Time Available for Flight Servicing; Requirement to Rectify ECS Fault; Requirement to Record CB Positions; Litening Pod Fit Required System Access; GR4 Modification to ECS System introduced Reliability Issues; Requirement/Approval of GR4 ECS Modification; Decision on requirement for CB Checks; Requirement for CB checks; Requirement for ECS RTI; Requirement to Carry Out ECS RTI on Specific Aircraft; Specification for Time Window to Carry Out ECS RTI; Time Available for RTI; RTI Specifies CB Resets; Requirement to Pull CBs for EGRs; Time to Complete ECS Corrective Maintenance; Rectification of ECS Faults; ECS Failures; CBs not correctly reset; CBs not Reset after Fault Diagnosis; CBs not Reset After ECS Rectification; CB Incorrect Configuration Not Spotted; No Up-to-Date Record of CB Position as Tasks Progress; No Report of CB Issues; No Electrical Supply to Thrust Reversers; Thrust Reverse Pilot Input; Inexperienced Supervisor Assigned; Supervision not adequately carried out; Inadequate Time for Supervision; No Independent Inspection of CB position; No Requirement for Independent CB checks; Quality Culture; Programme Not Matched to Swing Shift Resource; Time Available for Checks; No Check of Thrust Reverse System; No Aircrew CB Checks; No Aircrew CB Check Specified; Task to Analyse FOD Risk From Thrust Reverse Pre-Flight Check; Advice on FOD Hazard; Inadequate SQEP for ECS RTI; Inadequate SQEP for ECS Fault Diagnosis; Inadequate SQEP for ECS Rectification; Inadequate SQEP for Minimum Shift List; Time Available for Litening Pod Fit.

Notes: Red lines show activities with unacceptable variability. Black lines show other activities recorded or inferred as having significant variability in the Occurrence Investigations. Other parts of the system are not shown for clarity. Variability could be traced back to other functions not currently shown, with further investigation.

6.2.6 Incident 2 – Missing Rigging Pin

The Tornado is equipped with a first generation fly-by-wire flight control system,

with a reversionary mechanical control system as a back-up. Flying control

surfaces are mechanically actuated through hydraulics. When components

within the system are disturbed or removed for maintenance, there is a

requirement to fit rigging pins to hold the remaining components in the correct

configuration. The issue of such pins is strictly controlled by tool control

processes mandated by regulation. These processes are designed to prevent

pins being inadvertently left fitted to the aircraft and restricting control

movement in flight.

6.2.7 Description of Incident

An incident (MAA, 2011b) occurred during maintenance when a set of rigging

pins was found to be deficient of a single pin, after they were provided from tool

stores to aid work on an aircraft. Having found that the pin was missing, all

further flights on the Squadron were delayed whilst a search for the missing

item was conducted. Following an examination of paperwork records, the pin

was eventually found inside a second aircraft where the set had previously been

used. This aircraft was in the process of being rebuilt to an airworthy condition

following maintenance. The incident clearly presented a near-miss in that it was

only by chance that the set of pins was re-used before the aircraft with the loose

pin inside was flown. The loose article hazard presented by the pin was

sufficient to potentially cause a control restriction and conceivably cause loss of

the aircraft.


6.2.8 Summary of Investigation

The Occurrence Investigation report identified a series of contributory factors. The occurrence took place during a period of disruption caused by military operations over Libya (Operation ELLAMY). The units remaining in the UK were left deficient of a variety of technicians, and the maintenance and flying task was felt not to have been adequately reduced to compensate. The first element of the maintenance task required removal of the High Lift and Wing Sweep Control Unit (HLWSCU). At this stage the required rigging pins were not fitted, this having become the normal practice at the unit concerned. When installation of the component occurred some days later, the technician involved at the time realised that pins had not been fitted and withdrew a set from the tool stores. Of the set of four pins that were required to be fitted for the HLWSCU replacement, only three were in fact fitted, with two of them being placed correctly and the third placed in an incorrect position and not fully engaged in the mechanism (see dashed line in Figure 6-6). There was some confusion between the technician and his supervisor over the correct location for the third pin, although neither of them referred to the approved data.


Figure 6-6 Location of Where Lost Pin was Installed (MAA, 2011b)

The task was identified as being particularly complicated with less than totally

clear information provided by maintenance procedures (approved data).

Compounding the issue was the lack of experience of the supervisor and the

overall lack of resources on the maintenance unit due to manpower being

drawn to the Libyan air campaign. As shown in Figure 6-7, the access to the

area was difficult for the technicians.


Figure 6-7 General Installation Location of Lost Pin (MAA, 2011b)

A week then elapsed before other functional tests were carried out; these initially failed because the rigging pins prevented control movement. The technician performing the test realised this and removed the two pins from the normal HLWSCU location but omitted to remove the third, erroneously placed pin. Because the third pin had not been correctly inserted, it failed to prevent movement of the controls during functional testing and its presence remained unnoticed. Once mechanical aspects of the task were complete, all tools were returned, including the set of pins. The technician who returned the set failed to check its contents and returned it directly to the tool stores shelf. Seeing that it had been returned to the shelf, the worker in tool stores did not check its contents either. The investigation concluded that the tool stores worker’s training for the task was inadequate. A further 100% check was required at the end of the shift but the inspector who should have carried this out did not do so. During the series of handovers as the task progressed, regulation required 100% tool checks at each handover. It is not clear whether these were carried out; however, the configuration of the box in which the pin set is kept is not conducive to highlighting missing items, e.g. there is no ‘shadowing’ of the items (see Figure 6-8). The contents list was also not clear.

Figure 6-8 Pin Location in Tool Kit (MAA, 2011b)


6.2.9 Instantiation of the TASM

The first step was to take the points raised in the DASOR investigation and map these to functions within the FRAM model:

Table 6-6 Functional Variability Noted From Investigation

Number | Type | Function | Variation noted in DASOR

4 | Organisational | 3 Month Flying Programme | Output not matched to shift resources

5 | Human | Task Maintenance | Lack of continuity in tasking e.g. broken activity

6 | Human | Record Work Done on ac | Pins not accounted for in records

7 | Organisational | Train Maintenance Personnel | Insufficient on the job training for supervisor

8 | Organisational | Provide Authorised Maintenance Personnel | Insufficient manpower to match the flying programme

13 | Organisational | Provide & Account for Tools and Test Equipment | Pin set returned to store with missing item, not checked

21 | Organisational | Force and A4 Operations | Most experienced manpower diverted to Operations elsewhere

30 | Organisational | Publish Approved Data (Tech Manuals & Policy) | Insufficient specification of rigging pin placements

43 | Human | Corrective Maintenance | Not carried out iaw approved data; functional test passed with pin in place

49 | Technological | Mechanical Systems | Failures required corrective maintenance, pinning required

50 | Technological | Aircraft Structure | Pin did not interface correctly with structure

57 | Human | Handover | 100% tool check not complete at handover; pin location not specified

58 | Human | Supervise Maintenance | Supervision not close enough to spot errors

64 | Organisational | Operate Shift Pattern | Shift pattern broke task into fractured elements causing discontinuity

Using the FRAM Visualisation these functional layers can be selected to produce the diagram shown in Figure 6-9.

As with the previous investigation, the visualisation tool only provides an initial step in the investigation process, highlighting all links between those functions noted as having significant output variability. These linked functions can be easily selected and pasted into a new drawing, where pertinent information relating to the variability can be added to the diagram to produce a more complete picture of the incident, illustrated in Figure 6-10.
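The mapping in Table 6-6 lends itself to a simple record structure from which layers can be selected. The following is only a sketch of that idea, not a description of how the spreadsheet model or the Visio tool is implemented; the names `FINDINGS` and `layers_by_type` are hypothetical, and the data are the function numbers, types and names from Table 6-6.

```python
from collections import defaultdict

# (function number, type, function name) as listed in Table 6-6.
FINDINGS = [
    (4, "Organisational", "3 Month Flying Programme"),
    (5, "Human", "Task Maintenance"),
    (6, "Human", "Record Work Done on ac"),
    (7, "Organisational", "Train Maintenance Personnel"),
    (8, "Organisational", "Provide Authorised Maintenance Personnel"),
    (13, "Organisational", "Provide & Account for Tools and Test Equipment"),
    (21, "Organisational", "Force and A4 Operations"),
    (30, "Organisational", "Publish Approved Data (Tech Manuals & Policy)"),
    (43, "Human", "Corrective Maintenance"),
    (49, "Technological", "Mechanical Systems"),
    (50, "Technological", "Aircraft Structure"),
    (57, "Human", "Handover"),
    (58, "Human", "Supervise Maintenance"),
    (64, "Organisational", "Operate Shift Pattern"),
]


def layers_by_type(findings):
    """Group the function numbers to be selected as layers by function type."""
    layers = defaultdict(list)
    for number, kind, _name in findings:
        layers[kind].append(number)
    return dict(layers)


layers = layers_by_type(FINDINGS)
for kind, numbers in layers.items():
    print(f"{kind}: {numbers}")
```

Grouping by type makes it easy to see that most of the variability noted in this investigation sits in organisational and human functions rather than in the aircraft systems themselves.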


Figure 6-9 Visualisation Tool Output for Rigging Tool Occurrence (see note 2)

[Figure not reproduced: Visualisation Tool output of the full TASM, with the layers for the functions listed in Table 6-6 selected.]

2 Note that a Visio software bug means that some connections become ‘un-glued’ when copying and pasting as images; this results in some lines being erroneously pasted into the corner of the drawing. On-screen performance is not affected.

Figure 6-10 Instantiation of Rigging Pin Occurrence

[Figure not reproduced: FRAM instantiation diagram. Legend: functions are marked as Excess Output Variability, Inadequate Damping, Direct Interface with Aircraft or Aircraft Systems; lines are marked as Harmful Variability or Variable Activity.]

Activity labels recoverable from the figure: Flying Requirement Not Sufficiently Reduced for Additional Operational Requirement; Flying Requirement Not Matched to Engineering Resource; Reduced number of SQEP Technicians on Shift List; Reduced Engineering Resources; Insufficient Manpower to Meet Requirement; Insufficient Total Manpower to Match the Task; Trained Personnel Diverted to Operations, No Back Fill; Lack of Continuity in Tasking; Discontinuity in Allocation of Personnel to Task; Insufficient Time to Conduct Continuous Work on Task; HLWSCU Fit/Removal/Test Repeatedly Interrupted to Divert Resources; Supervisor was not SQEP; Supervisor Had Received Insufficient On-the-Job Training; Supervisor Time Required for Higher Priority Tasks; Supervision Broken Across Shifts; Supervision did not highlight errors in pin placement; Supervision did not ensure correct pin placement; Placement of Rigging Pin Inadequately Defined; Position of Rigging Pins Inadequately Documented; Rigging Pin Incorrectly Positioned; Rigging Pin Left In Mechanical System; Functional Test Passed with Pin In place; Inadequate Time Allowed to Conduct Handover; Location of Rigging Pins Not Described in Handover; Handover Tool Checks not Completed; Rigging Pin Set Issued with Pin Missing; Rigging Pin Set Returned with Pin Missing; Tool Stores Worker Inadequately Trained; NCO in Charge of Tool Stores was Not Available.

Notes: Red lines show activities with unacceptable variability. Black lines show other activities recorded or inferred as having significant variability in the Occurrence Investigations. Other parts of the system are not shown for clarity. Variability could be traced back to other functions not currently shown, with further investigation.

6.2.10 Insights from TASM

The TASM again shows this incident as a control problem; Figure 6-10 provides an instantiation of the TASM. It shows in red a number of activities which linked functions and caused output variability to permeate downstream through the system. Other activities are shown where they are mentioned in the RAF investigation. Many of these activities could have exerted more control over the functions that produced variable output. In some cases a complete control loop was missing: despite the practice of CBs not being ‘safety-tagged’ when in the pulled condition, there was no quality control over this practice. The quality function is shown as having an unlinked activity, which for the purpose of this instantiation meant that there was no output. The main variability in question in this occurrence was that surrounding the performance of tool control processes and the way the corrective maintenance was conducted on the mechanical system. There was a potentially harmful variability from the corrective maintenance function in that the rigging pin was left in the system, which resonated with the way that the mechanical system performed under the functional test (passing with the pin in place). If the whole system had been operating within acceptable bounds of control, then the supervision and tool control functions would have provided further damping through checks and adjustments on the way that the scheduled maintenance was conducted. It was serendipity rather than ‘design for resilience’ that provided a warning that there was a tool control issue before the aircraft was released in an un-airworthy condition. A source of variability that is shown to permeate through the system was the operational plan to divert resources to Operation ELLAMY. The flying and shift programming functions did not adjust their outputs to compensate adequately and neither did the maintenance tasking function. A potential damping mechanism was therefore lost. The DASOR included an error management investigation which focussed on the organisational and human failings in the scenario. The benefit in using FRAM is that it shows how all of these aspects of the situation are linked.


7 USING THE TORNADO AIRWORTHINESS SYSTEM

MODEL FOR RISK ANALYSIS

The use of FRAM for risk analysis has been the subject of discussion amongst

researchers and an accepted form of practice has not yet been developed. This

chapter seeks to contribute to this development process. The FRAM attempts to

provide a more complete solution for managing risks in complex systems in

comparison to other more linear methods. In this chapter the existing risk

management process is described along with the current theoretical basis for

risk management. A new theoretical basis is proposed and then a new risk

assessment process is given. This is followed by a detailed example. The new

theoretical basis and assessment technique are then combined to give a new

approach to risk management.

7.1 Case for Using TASM for Risk Analysis

Airworthiness (or ‘equipment safety’) risk management systems employed by

the MOD in relation to Tornado include the construction and management of a

Weapon System Safety Case, which uses Goal Structuring Notation (GSN). Part

of the safety case argument (Manson, 2001) is the requirement to manage

equipment safety risks to within the MAA’s targets for Risk of Death from All

Causes as required by RA1210 (MAA, 2012a). This is achieved through the use

of an Equipment Hazard Management Process (LI-BS0056) summarised in

Figure 7-1 (MOD, 2013a). A Fault Tree Loss Model is used to aid this process

of assessment. The Air Safety Duty Holder then maintains a platform risk

register, which forms part of the overall Operational Safety Case. Bow tie

models are beginning to be used to aid the assessment of operating safety

risks. All of these models are based on the assumption that safety is a resultant

property of the aggregated activities, arguments or elements of the model or

safety case. The resilience engineering understanding of safety is that it is

equivalent to its converse condition (an accident) and both of these system

states are emergent properties. The purpose of developing the TASM for risk

assessment is to provide a resilience engineering view of safety risks as a more

realistic contrast to linear methods such as bow tie, which have the potential to


produce a false level of accuracy when applied to a human-centric

system such as the Tornado airworthiness system. A resilience engineering

perspective may produce either a more positive or negative view of a particular

risk but, depending on the level of complexity that applies to the process under

consideration, it is unlikely to be able to produce a quantifiable assessment. To

achieve quantification, Bayesian or fuzzy logic principles are required. It should

be noted of course that the risk assessment process is an implied part of the

‘cost-benefit analysis’ function carried out by the Tornado Engineering Authority.

Figure 7-1 Tornado Process for Emergent Airworthiness Issues (MOD, 2013)

With regard to safety critical complex systems in general, isolating individual

potential risks is challenging due to the interconnected nature of such systems.

Initially FRAM analysis will provide a Resilience Engineering assessment of


risks that have already been identified within a system. It may allow a more

realistic understanding of the nature of particular hazards and how they may be

avoided or mitigated. Typically, hazard logs and risk registers record isolated

hazards/risks; FRAM has the potential to more accurately describe how both

hazards and mitigating (or damping) factors are linked. Many accident reports

detail seemingly unlikely combinations of unfortunate circumstances; FRAM

seeks to deal with the issue of harmful combinations of varying functional output

more effectively. This provides a novel approach to assessing Common Cause

failures during airworthiness assessments, particularly with respect to

maintenance or design ‘error’. As the approach develops, it may be possible to

identify the potential for previously unexpected risks to emerge. Risk analysis

techniques need to work from a theoretical basis for the origin of risk.

7.2 Current Theoretical Basis for Airworthiness Risk

Management

The current theoretical basis for managing Tornado airworthiness risk is shown

in Figure 7-2, which is adapted from the Hazard log structure illustrated in Local

Instruction BS0056 (MOD, 2013a). The overall risk to life from a particular

accident scenario is calculated by means of adjusting the historical reliability or

event rate data within the fault tree loss model to reflect the new issue

identified. Alternatively a qualitative engineering judgement based assessment

using broad likelihood and consequence categories may be employed. Figure

7-2 does not illustrate a process; the arrows indicate the aggregation of risk.

The current theory, as illustrated, starts with various system controls which

prevent accident causes emerging, which in turn lead to hazards which are then

subject to additional controls. Potential accident scenarios may develop

dependent on the likelihood of the preceding elements in the chain. These

potential accidents may develop through a series of events, some of which may

prevent the situation developing into an actual accident. The likelihood and the

severity of the accident should it develop is based on an arithmetic or qualitative

aggregation of all the preceding elements in the chain. This provides an


estimate for risk to life for a particular scenario which is then managed

(including changing elements in the chain) in the manner outlined in Figure 7-1.

The current theoretical basis implies that unless explicitly connected in some

way, adjustments to the various controls will provide a resultant increase or

decrease in the risk to life attributable to a particular potential accident scenario.

The advantage of the current theory is that it provides a basis from which risks

can be separated, considered in isolation and managed in an auditable manner.

An overall quantitative or qualitative estimation of risk to life is based on a linear

aggregation of estimated probabilities of hazards occurring and the

effectiveness of the various mitigating pre or post-accident controls. Historical

data in the form of a loss model is used alongside data relating to specific

issues, such as failure rate data from inspections or tests (MOD, 2013). The

combination of loss models, hazard logs, and risk registers is used to model

the system on the basis of the theory shown in Figure 7-2.
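The linear aggregation described above can be illustrated with a minimal Python sketch. It is not taken from LI-BS0056 or the Tornado loss model; the function names, the independence assumption and all numerical values are hypothetical and are chosen only to show how a change to one control produces a proportional change in the estimated risk to life for a scenario.

```python
import math


def scenario_risk(cause_rate, control_failure_probs, p_accident_given_hazard):
    """Linear aggregation along one cause -> hazard -> accident chain.

    cause_rate: assumed rate at which the cause arises (per flying hour).
    control_failure_probs: assumed probability that each control fails,
        treated here as independent of every other control.
    p_accident_given_hazard: assumed probability that the hazard develops
        into the accident once all controls have failed.
    """
    rate = cause_rate
    for p_fail in control_failure_probs:
        rate *= p_fail
    return rate * p_accident_given_hazard


def total_risk(scenario_rates):
    """Sum scenario rates, assuming scenarios are separate and rare."""
    return sum(scenario_rates)


# Hypothetical values only: one cause, two controls, one outcome event.
baseline = scenario_risk(1e-3, [0.1, 0.05], 0.2)
# Halving the effectiveness of the first control doubles the estimate.
weakened = scenario_risk(1e-3, [0.2, 0.05], 0.2)
ratio = weakened / baseline
```

The sketch makes the limitation discussed in the following section visible: each control acts only on its own chain, so any interaction between controls has to be added explicitly by the analyst.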

7.3 Proposal of FRAM Based Airworthiness Risk Theory

Resilience engineering theory, and FRAM in particular, proposes that accidents

are an emergent result of system performance and accidents themselves are

mitigated or prevented by the system behaviour once a hazard has begun to

emerge. In the TASM, potential accident sequences³ may be modelled in very

broad terms through the ‘operate aircraft’ and the various aircraft subsystem

technological functions. Figure 7-3 proposes an alternative or complementary

3 With TASM, an accident sequence starts when an aircraft is operating hazardously. It does not

refer to the way the airworthiness management system is behaving at any particular time.

Figure 7-2 Current Theoretical Basis for Tornado Airworthiness Risk Management

[Figure labels: Controls; Cause; Hazard; Potential Accidents; Controls & Events; Accident; Risk to Life]


theoretical model for the derivation of risks to life to the current theory shown in

Figure 7-2. In this case it is not possible to directly calculate quantitative risks

without some means of describing the TASM in quantitative terms itself; through

some numerical or algebraic calculation of model behaviour. As the TASM

already contains some qualitative descriptions of performance, it should also be

possible to provide a qualitative output in terms of risk. The risk to life may only

be calculated by estimating the likelihood of the system developing into a

functionally resonant state (or states) where an accident is generated.

Figure 7-3 Proposed Functional Resonance Risk Management Theory -

Visualisation of a Generic Hazardous Process

[Figure 7-3 elements: Upstream Background Function; Upstream Forcing Function (two shown); Upstream Damping Function (two shown); External Dependency; Hazard Generating Function; HAZARD; Upstream/Downstream Damping Function; Downstream Aircraft System or Operating Function; ACCIDENT]

The theory shown in Figure 7-3 makes it difficult to extract a meaningful

description of any risk from the system, as this risk is the product of both the

damping and varying performance of a function. The literature does not provide

examples of how this can be done using the existing FRAM; a bespoke

technique has therefore been developed. For an in-service system such as

Tornado, many risks are currently recorded and it is likely other potential risks

remain unrecorded. Such unknown risks may become apparent through close

examination and experimentation with the TASM. The reassessment of

currently recorded risk is easier and is the focus of this initial study. Clearly

airworthiness risk to life will only manifest itself through unacceptable variation

in the output of one of the FRAM functions relating to a physical element of the

aircraft. For example, the mechanical system might prevent proper control of

the aircraft or the structure may fail to react loads through a loss of structural

integrity. Defining an associated risk likelihood element depends on the

upstream performance variability of the system. Likelihood of hazardous

variability will also depend on the effectiveness of upstream/downstream

functions in providing damping against harmful variability. Where this

upstream/downstream damping fails to control the hazardous variability,

functional resonance occurs. In order to examine the risk associated with a

particular hazard, the following terms are defined:

Hazardous Process – All functions and activities contributing to the

hazard generation.

Hazard Generating Function – The aircraft system function whose

variable output directly generates a hazardous condition.

Upstream Forcing Function – A function whose output contributes to

forcing the generation of a hazardously variable output from a

downstream function.

Upstream Damping Function – A function whose output reduces the

variability of a downstream Hazard Generating Function.

Background Function – A function whose output can be assumed not

to vary to any significant degree and does not contribute to the variability

of the hazard generating function.
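The terms defined above can be captured in a small data structure. The following Python sketch is illustrative only and is not part of the TASM spreadsheet or Visualisation Tool; the class and method names are hypothetical. The roles follow the corrosion discussion in this section, except for the background function, which is an assumed example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    HAZARD_GENERATING = "hazard generating function"
    FORCING = "upstream forcing function"
    DAMPING = "upstream damping function"
    BACKGROUND = "background function"


@dataclass
class HazardousProcess:
    """Functions assessed against one hazard, keyed by function name."""
    hazard: str
    roles: dict = field(default_factory=dict)

    def assign(self, function_name: str, role: Role) -> None:
        self.roles[function_name] = role

    def members(self, role: Role) -> list:
        return sorted(n for n, r in self.roles.items() if r is role)

    def contributing(self) -> list:
        """The hazardous process: every function that is not background."""
        return sorted(n for n, r in self.roles.items()
                      if r is not Role.BACKGROUND)


corrosion = HazardousProcess("Corrosion of the aircraft structure")
corrosion.assign("Aircraft Structure", Role.HAZARD_GENERATING)
corrosion.assign("Corrective Maintenance", Role.FORCING)   # damage during maintenance
corrosion.assign("Structural Inspections & Corrosion Control", Role.DAMPING)
corrosion.assign("Publish Aircrew Publications", Role.BACKGROUND)  # assumed example
```

Keeping the role as an explicit attribute of each function, rather than of the function itself, reflects the point that the same function may be forcing for one hazard and background for another.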

A hazardous process and hence a hazard can emerge in any part of the

system. An airworthiness related accident can only occur as a result of one or


more hazardous processes producing uncontrollable variable output from

aircraft system functions (structure fails, electrical fire, loss of power etc.). In the

majority of cases this would be deemed a technical failure in the existing

Tornado air safety risk management process. For example, a hazard might be

corrosion of the aircraft structure. An accident relating to this hazard would

involve the loss of structural integrity as a result of an out of limits variability in

the react-loads output from the aircraft structure function. Corrosion can be

considered to be largely due to internal variability as it is related to the material

of the structure. This is of course based on the assumption that the environment

is not a function within the TASM. If there is significant variability in the

environment experienced across the fleet, then this should be mapped as a new

function in the TASM. Corrosion may also be caused by external variability from

other functions – damage inflicted during maintenance or contamination from

other aircraft systems for example. These would be upstream forcing functions

and could link together in a hazardous process. Damping of this negative

variability is provided by a series of control processes acting on the aircraft

structure function, for example structural inspections called up during scheduled

maintenance and specified by the Engineering Authority in the approved data as

a result of a cost benefit analysis. Airworthiness related issues could also play a

part in generating accidents by providing an upstream forcing function for the

‘operate aircraft’ function whilst still providing an output that varies

within the bounds of the system specification. For example, the output of the

avionics flight system may provide an accurate but potentially confusing signal

to aircrew, which when combined with the internal variability of the ‘operate

aircraft’ function may result in an accident. Test and Evaluation is intended to

identify such hazardous variability.

7.4 Proposal for a FRAM Based Risk Assessment Process

As already discussed, quantitative risk assessment is difficult using the FRAM.

It would be possible to apply probabilistic data to the outputs of various

functions in the TASM; however, without using fuzzy or Bayesian

techniques it is not possible to model how these probabilities aggregate through


any particular process. To do so would require assumptions to be made that all

functions not considered are background functions and will not vary in output as

a result of the variability in the process under consideration. An inspection of the

TASM shows that the level of connectivity between all functions means that any

such assumption is of dubious validity. Figure 7-4 shows the risk assessment

methodology developed for this project. This iterative process expands through

the TASM allowing all functions in the hazardous process to be highlighted and

then the hazardous output variability to be re-assessed based on the forcing

and damping functions in play.
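One reading of the iterative process in Figure 7-4 can be sketched in Python as follows. This is a hedged illustration, not the implementation used in the Visualisation Tool: the function name, the dictionary layout and the simplified function links are assumptions, and the forcing, damping and background classifications are analyst judgements supplied as input rather than computed.

```python
from collections import deque


def trace_hazardous_process(start, upstream_roles, downstream, accident_capable):
    """Expand through the model from an initial hazard generating function.

    upstream_roles: function -> {upstream function: 'forcing' | 'damping' |
        'background'}, as judged by the analyst.
    downstream: function -> downstream functions judged to inherit the
        hazardous variability (potential further hazard generating functions).
    accident_capable: functions whose output can itself constitute an accident.
    """
    layers = {"forcing": set(), "damping": set(), "background": set()}
    visited, terminal = [], []
    queue = deque([start])
    while queue:
        function = queue.popleft()
        if function in visited:
            continue
        visited.append(function)
        # Assign every upstream function to a layer.
        for upstream, role in upstream_roles.get(function, {}).items():
            layers[role].add(upstream)
        # Stop expanding once the output can constitute an accident.
        if function in accident_capable:
            terminal.append(function)
        else:
            queue.extend(downstream.get(function, []))
    return visited, layers, terminal


# Simplified, partly assumed links based on the component lifing example.
upstream_roles = {
    "Configuration Management (LITS)": {
        "Record Work Done on ac": "forcing",
        "Repair Spares - Industry": "damping",
    },
    "Scheduled Maintenance": {
        "Supply Chain": "forcing",
        "Task Maintenance": "damping",
    },
}
downstream = {
    "Configuration Management (LITS)": ["Scheduled Maintenance",
                                        "Force and A4 Operations"],
    "Scheduled Maintenance": ["Replacement of service life limited parts"],
    "Force and A4 Operations": ["Replacement of service life limited parts"],
    "Replacement of service life limited parts": ["Mechanical Systems",
                                                  "Aircraft Structure"],
}
accident_capable = {"Mechanical Systems", "Aircraft Structure"}

visited, layers, terminal = trace_hazardous_process(
    "Configuration Management (LITS)", upstream_roles, downstream,
    accident_capable)
```

The qualitative reassessment of output variability for each hazard generating function is deliberately left outside the sketch, since that step relies on engineering judgement rather than calculation.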


Figure 7-4 FRAM Model Risk Assessment Process

[Flowchart boxes:
1. Identify Hazard
2. Identify the initial/next Hazard Generating Function and select in Visualisation Tool
3. Select Upstream Function
4. Decision: Is Upstream Function Forcing, Damping or Background? (Forcing: Assign to a Forcing Layer; Damping: Assign to a Damping Layer; Background: Assign to a Background Layer)
5. Decision (Yes/No): Any Upstream Functions remaining?
6. Produce/Print Visualisation Tool Output highlighting damping and forcing functions and activities
7. Decision (Yes/No): Can Function Output constitute accident?
8. Highlight Damping and Forcing activities within FRAM Spreadsheet Model
9. For each Hazard Generating Function assess whether combination of highlighted dependent activities (damping + forcing) changes Hazardous Output variability
10. Decision (Yes/No): Output Variability Needs Adjustment?
11. Adjust frequency/amplitude of hazardous output variability
12. Make qualitative assessment as to how likely it is that hazardous output variability will exceed safe level
13. Record Risk in terms of Likelihood and Severity]


The TASM includes a risk assessment in terms of frequency and amplitude of

variability and Figure 7-5 shows how functional variability may differ. There

will be some level which defines a safe limit, although this may differ according to

the total system performance condition.

[Figure 7-5 axes: Output (Force, Information, etc.) against Time. Labels: Safe limits of variability; Unsafe output – Accident sequence initiates; Digital Variability; Complicated Variability]

Figure 7-5 Demonstration of Variability Resulting in Accident Sequence

If this variability is digital in nature (e.g. failed or not failed) then the frequency of

failure may be treated as analogous to the likelihood element of a traditional risk

rating. If the variable output is more complicated in nature (as shown in Figure

7-5) then this is more problematic to map to a likelihood measure. Severity of

any potential accident depends on the extent of out of control variability in the

activities linking functions directly associated with any accident. This is

problematic as the starting point for risk analysis is usually upstream in the

system and there are a multitude of potential outcomes from this upstream

variability. The FRAM output could more reasonably be mapped to an area

rather than a point on a likelihood-severity risk matrix.
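The distinction between digital and complicated variability, and the idea of an area on the risk matrix, can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than a method from the TASM: the sample values, safe limits and matrix categories are hypothetical, and the fraction of samples outside the safe limits is only a crude stand-in for likelihood when the variability is complicated.

```python
def exceedance_fraction(samples, lower, upper):
    """Fraction of observed outputs that lie outside the safe limits."""
    if not samples:
        raise ValueError("at least one sample is required")
    outside = sum(1 for x in samples if x < lower or x > upper)
    return outside / len(samples)


def matrix_area(likelihood_range, severity_range):
    """Cells of a likelihood-severity matrix covered by an assessment.

    Each range is an inclusive (low, high) pair of category indices.
    """
    low_l, high_l = likelihood_range
    low_s, high_s = severity_range
    return [(l, s)
            for l in range(low_l, high_l + 1)
            for s in range(low_s, high_s + 1)]


# Digital variability: 0 = not failed, 1 = failed (hypothetical record).
digital = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
digital_rate = exceedance_fraction(digital, 0, 0)

# Complicated variability: continuous output against assumed limits of +/-1.0.
complicated = [0.1, 0.4, -0.2, 0.9, 1.3, 0.7, -0.6, 0.2, 1.1, 0.0]
complicated_rate = exceedance_fraction(complicated, -1.0, 1.0)

# An assessment spanning likelihood categories 2-3 and severity categories 3-5.
area = matrix_area((2, 3), (3, 5))
```

In this example both records give the same exceedance fraction, which shows why a single likelihood figure hides the difference in the shape of the variability; recording the result as a set of matrix cells keeps that uncertainty visible.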

Figure 7-5 shows that for every type of functional output from an aircraft system

there will be some level at which output will become unsafe; this depends on

the damping in place downstream at that moment. Where downstream damping

is inadequate, functional resonance occurs, system

performance becomes harmful and an accident develops. Fundamentally,


resilience engineering presents a very different notion of the origin of hazards

and nature of accidents. This does not easily fit within the bounds of current risk

management practice or regulation – RA 1210 (MAA, 2013b).

7.5 Risk Example – Operation of Components in Excess of

Cleared Life

Within the Tornado airworthiness system there have been historical difficulties

managing the quality of data relating to component lives within the Logistics

Information System (LITS). There have also been difficulties in tracking the

consumption of component shelf life for items in storage. This leads to a hazard

arising from out-of-life components that may be installed and then operated in

the aircraft systems in excess of their cleared life. Consequently there is an air

safety risk due to potential component failure. The issue clearly encompasses

technical, organisational and human functions; as such it makes an interesting

case study for the TASM. The mitigating/damping actions currently in place

include a Tornado Asset Gateway (TAG) Team whose task was the restoration

of LITS records for a series of safety critical airworthiness items.

7.5.1 Generating a FRAM Model Risk Assessment

The initial Hazard Generating Function is the ‘Configuration Management

Function’ – this function seeks to ensure that records are accurately maintained

on the lifing and modification state of components on and off the aircraft. The

primary tool to achieve this is LITS. Various records on this issue have been

examined; the ‘Annex A’ process highlighted the issue as it emerged (Freed

and Priday, 2008) and Aitken (2009) provided detailed follow-up examination.

The MOD then contracted BAE Systems to provide a Tornado Asset Gateway

(TAG) Team to resolve the problem (Singleton, 2009) and QinetiQ provided

independent advice (Jeffery, 2009). Once the issue had been resolved to the

satisfaction of the Operational Duty Holder an ALARP report was raised

(Bagwell, 2011). All of this information allowed a FRAM

Model for the residual risk to be constructed using the TASM as a baseline.

Aspects of the Hazard Generating Function have been highlighted as either

forcing (red), damping (purple) or background functions in Table 7-1. An

additional column has been added to show the manner in which damping or

forcing is achieved. Following a first iteration of the process for identifying

downstream functions (as described in Figure 7-4) a visualisation was

produced:

Figure 7-5 Operation of Components Beyond Cleared Life - First Stage Risk

Visualisation, Excluding Background Functions

It is worth noting that the scope of the configuration management in the TASM

is broad – it encompasses all aspects of managing the state of the components

fitted to the aircraft. That includes the compatibility of modifications and the

control of which items can be fitted for what length of time.

[Figure elements: function hexagons numbered 6, 8, 16, 28, 30, 34 and 63; name fragments ‘Documentation’ and ‘(LITS)’. Activity labels: Erroneous Entries in LITS; TAG Team Restoring Configuration Control for safety critical assets; ERCs provide hard copy cross check for component life (shown twice); Clear LITS Policy to Follow; Difficult to complete LITS actions if data corrupt]

Table 7-1 Configuration Management Aspects

Name of Function: Configuration Management (LITS)

Aspect | Description of Aspect | Upstream Function Number | Upstream Function Name | Upstream Function Aspect | Actual Variability in Risk Assessment
Input | Work Done on Aircraft Recorded | 6 | Record Work Done on ac | LITS Record (Configuration Management) | Erroneous Entries in LITS
Input | Lifing policy change from EA | 30 | Publish Approved Data (Tech Manuals & Policy) | Schedule of Life Limited Parts |
Input | Component Life extension | 62 | Manage Maintenance Extensions | Maintenance Extended on LITS |
Output | Information reports to EA and CAMSS | | | |
Output | Life limited item reports to Maintenance Organisations | | | |
Output | Life limited items reports to FOC Maintenance Taskers | | | |
Output | Pre-printed maintenance work orders and data input facility to maintenance orgs. | | | |
Output | LITS asset gateway to update records for items delivered from supply chain | | | |
Output | Modification retro-list | | | |
Precondition | Modification approved by TAA in TLARC | 23 | Modify Aircraft | TAA approval through TLARC and Issue of Approved Data (SM leaflet etc) |
Precondition | Previous Work (config change) Correctly recorded in LITS | 6 | Record Work Done on ac | LITS Record (Configuration Management) | Erroneous Entries in LITS
[...] | LITS team in EA, LITS Team on units | 8 | Provide Authorised [...] | [...] Appropriately (to requirement) [...] maintenance, report [...] tasks etc) | TAG Team Restoring Configuration Control for safety critical assets
[...] | Component Engineering Record Cards update - R2 | 34 | Repair/Maintain Spares R2 | Update Log cards | ERCs provide hard copy cross check for life
[...] | Component Engineering Record Cards update - industry | 28 | Repair Spares - Industry | Update Component Log Card | ERCs provide hard copy cross check for life
Control | LITS policy - DAP 300A-01 & 2(R) 1A | 30 | Publish Approved Data (Tech Manuals & Policy) | Support Policy | Clear LITS Policy to Follow
Time | Required to Coordinate maintenance documentation (individual updates) | 16 | Coordinate Maintenance Documentation | [...] (pre-flight checks) | Difficult to complete LITS actions if data corrupt

An erroneous configuration management output in terms of lifing information is

the hazard produced. In order to understand how the hazardous variability

potentially permeates through the system to generate an accident it is

necessary to look at the output from the configuration management function

using the process shown in Figure 7-4. Following the process, this leads to the

identification of a further eight hazard generating functions. Tracing these

functions within the FRAM Spreadsheet model reveals whether or not the

variability of lifing data is likely to cause downstream variability in these

functions. In some cases, the upstream activity required by the downstream

function does not relate to lifing, therefore these downstream functions remain

as background functions rather than additional hazard generating functions. In

other cases the configuration management output provides data for functions

with an element of data checking and correction – these functions provide an

upstream/downstream damping process. This step is summarised in Table 7-2.

There are two further downstream hazard generating functions identified (functions 2

and 21).

Table 7-2 Summary of Second Stage of Risk Assessment

Hazard Generating Function 1: Function 63

Function Number | Function Name | Activity | Actual Variability in Risk Assessment
Forcing Functions:
6 | Record Work Done on ac | LITS Record (Configuration Management) | Erroneous Entries in LITS
6 | Record Work Done on ac | LITS Record (Configuration Management) | Erroneous Entries in LITS
[...] | [...] Documentation | [...] (pre-flight checks) | Difficult to complete LITS actions if data corrupt
Damping Functions:
8 | Provide Authorised [...] | [...] Appropriately (to requirement) Authorised [...] | TAG Team Restoring Configuration Control for safety critical assets
34 | Repair/Maintain Spares R2 | Update Log cards | ERCs provide hard copy cross check for life
28 | Repair Spares - Industry | Update Component Log Card | ERCs provide hard copy cross check for life
30 | Publish Approved Data (Tech Manuals & Policy) | Support Policy | Clear LITS Policy to Follow
Potential Downstream Hazard Generating Functions:
2 | Scheduled Maintenance | Life limited item reports to Maintenance Organisations | Errors in Lifing details
[...] | [...] Documentation | Pre-printed maintenance work orders and data input facility to maintenance orgs. | No effect - No corrupt lifing data involved
17 | Defer Faults | [...] orgs. | [...] involved
21 | Force & A4 Operations | Life limited items reports to FOC Maintenance | Failure to task removal of life limited items during scheduled maintenance
25 | Report Faults and Husbandry | [...] orgs. | [...] involved
28 | Repair Spares - Industry | LITS asset gateway to update records for items delivered from supply chain | Variation damped out - ERCs provide hard copy cross check for life
34 | Repair/Maintain Spares R2 | [...] orgs. | Variation damped out - ERCs provide hard copy cross check for life
27 | Airworthiness Review Certification | Information reports to EA and CAMSS | Variation damped out - ARC process provides a dip check of data accuracy

The next stage in the process is to repeat the analysis for these two

downstream hazard generating functions, again using both the Visualisation

Tool and the TASM. In both cases we are interested in the variability of the

output relating to the replacement of life limited parts – this calls up a specific

additional function, which becomes another hazard generating function.

Table 7-3 Stage 2 - Scheduled Maintenance Function

Hazard Generating Function 2a: Function 2

Function Number | Function Name | Activity | Actual Variability in Risk Assessment
Forcing Functions:
63 | Configuration Management (LITS) | LITS 'pull' at start of shift (configuration management) | Hazardous Variability from Stage 2 - Erroneous LITS lifing data
10 | Supply Chain | Spare Parts (supply chain) | Potential for over-life components to be delivered from supply chain
21 | Force and A4 Operations | Force Operations Tasking | Potential for life limited part removal not to be included within CMU scheduled maintenance tasking.
Damping Functions:
5 | Task Maintenance | Rectification Control Tasking | Experienced Task Controllers spot items that have unusual life attached
62 | Manage Maintenance Extensions | Extend Maintenance (lined up to available window) | Experienced engineering management spot unusual life attached to item that requires extension
19 | Engine Health Monitoring | Engine Health Monitoring | Effects of over-life components within propulsion system may be spotted
15 | Assure Quality | Defence Quality Assurance Field Force Checks | Experienced DQAFF auditors may spot over-life items
[...] | [...] Extensions | Extend Maintenance (lined up to available window) | [...] requires extension
Various | Various | AC Inspected (all systems) | Physical degradation of over-life parts may be spotted.
Potential Downstream Hazard Generating Functions:
26 | Replacement of service life limited parts | Life limited parts change out tasking | Failure to initiate removal of over/due life parts
6 | Record Work Done on ac | Record Work Done on Aircraft | Record of installed life may allow later audit and recovery of situation
25 | Report Fault | Emergent Work (Report Faults) | Age related failure of life limited parts may be spotted and tasked for recovery
39 | Demand Spare Parts | Demand Spares | No effect
41 | Structural Inspections & Corrosion Control | Structural Inspection & Corrosion Control | Age related degradation of life limited parts may be spotted
[...] | [...] Certification | Airworthiness Review | Lifing errors may be spotted downstream at review

Table 7-4 Stage 2 - Force and A4 Operations Function (Part 1)

Hazard Generating Function 2b: Function 21

Function Number | Function Name | Activity | Actual Variability in Risk Assessment
Forcing Function:
[...] | [...] (LITS) | LITS mod configuration information | Erroneous lifing details
Damping Function:
[...] | [...] Manuals & Policy) | Maintenance Schedule (approved data) | Maintenance Schedule provides a hardcopy cross check for lifing details
Potential Downstream Hazard Generating Functions:
69 | Chief Air Engineer Authorisation | Force Operations Tasking |
40 | Repair Aircraft | Force Operations Tasking |
31 | Publish Special Instructions (Technical) | Force Operations Tasking |
[...] | [...] Manuals & Policy) | Force Operations Tasking |
29 | Publish Aircrew Publications | Force Operations Tasking |
[...] | [...] Certification | Force Operations Tasking | Lifing errors may be spotted downstream at review
7 | Train Maintenance Personnel | Force Operations Tasking |
4 | 3 Month Flying Programme | Force Operations Tasking |
2 | Scheduled Maintenance | Maintenance Schedule (From MOC via FOC) | Potential for life limited part removal not to be included within CMU scheduled maintenance tasking.
18 | Locally Manufacture Parts | [...] and modification programme |
23 | Modify Aircraft | [...] and modification programme |

Table 7-5 Stage 2 - Force and A4 Operations Function (Part 2)

Hazard Generating Function 2b (continued): Function 21

Function Number | Function Name | Activity | Actual Variability in Risk Assessment
Potential Downstream Hazard Generating Functions:
24 | Apply Special Instruction (Technical) | [...] and modification programme |
26 | Replacement of service life limited parts | [...] and modification programme | Failure to initiate removal of over/due life parts
29 | Publish Aircrew Publications | [...] and modification programme |
36 | Release To Service | [...] and modification programme |
41 | Structural Inspections & Corrosion Control | [...] and modification programme | Age related degradation of life limited parts may be spotted
42 | Fault Diagnosis | [...] and modification programme |
43 | Corrective Maintenance | [...] and modification programme |
44 | Technical Assistance Process | [...] and modification programme |
[...] | [...] Extensions | [...] and modification programme | [...] requires extension
10 | Supply Chain | Supply Chain Prioritisation |
20 | Ground Services | Mission Critical GSE list |
38 | Store, Service, Repair Weapons and Role Equipment | Role Equipment/Weapon Prioritisation |
4 | 3 Month Flying Programme | Level 0 Plan |
36 | Release To Service | Level 0 Plan |
7 | Train Maintenance Personnel | Force Operations (Manning) Prioritise Allocation |

Table 7-6 Stage 3 Replacement of Life Limited Parts Function

Hazard Generating Function 3: Function 26

Function Number | Function Name | Activity | Actual Variability in Risk Assessment
Forcing Functions:
5 | Scheduled Maintenance | Task Job | Failure to task removal of component before authorised life expires
10 | Supply Chain | Spare Parts | Component Supplied beyond authorised service life
[...] | [...] (LITS) | LITS (Configuration Management) | Erroneous LITS information
Damping Functions:
8 | Provide Authorised [...] | [...] Personnel | Experienced personnel inspect items and paperwork before fitting
[...] | [...] Manuals & Policy) | Approved Data | Approved life limits in Schedule of Life Limited items will allow cross-checking
Potential Downstream Hazard Generating Functions:
45 | Avionic Flight Systems | Replace Life Limited Parts | Failure to operate, fire hazards
47 | Avionic Communications | Replacement of Life Limited Parts | Failure to operate, fire hazards
49 | Mechanical Systems | Replacement of Life Limited Parts | Failure to operate, Leaks, fire etc
50 | Aircraft Structure | Replacement of Life Limited Parts | Failure to react loads (loss of structural integrity)
48 | Armament and Electrical Systems | Replacement of Life Limited Parts | Failure to operate, fire hazards

The data in Tables 7-3 to 7-6 is combined within a visualisation in Figure 7-6.

This diagram illustrates the relationship between the configuration management

function (mainly exercised through the LITS computer system), the tasking

function carried out by Force Operations, the scheduled maintenance activity

carried out on the aircraft and then the specific function involved in fitting and

removing life limited items. It demonstrates how various other functions force

variability in the hazard generating functions and also how other functions

provide damping on the frequency and amplitude of that variability. In the case

of the third hazard generating function, ‘Replacement of Life Limited Parts’, the

dimension of output under consideration is whether the part currently fitted is

within its authorised service life (as opposed to other considerations such as

whether the installation is achieved satisfactorily). Figure 7-7 then demonstrates

how an accident could develop as a result of the variability from the

‘Replacement of Life Limited Parts’ function. Four further functions are identified

as being additional downstream hazard generating functions – these are all

physically part of the aircraft system and therefore output variability from these

functions has the potential to generate an airworthiness related accident. The

TASM has been further dissected to show those activities that have the

potential to vary and generate an accident; in turn some of the activities are

linked to further downstream aircraft system functions. The operate aircraft

function has some considerable ability to damp out variability in aircraft system

functions – in other words the crew would often be able to deal with technical

failures of life limited items and still land safely, with actions such as using

redundant systems or limiting the flight envelope (G limits etc.). However, there

are some cases in which variability would be too high in amplitude for the crew

to successfully deal with; such as a sudden loss of structural integrity. Particular

risks such as fire have been highlighted in the diagram, this may occur due to

internal failure within a system producing an unlinked output. In most cases this

would actually be a product of out-of-control activity linking functions – such as

a fuel leak interacting with an electrical component chafing on structure. An

analysis of the stage four hazard generating functions also highlighted that

there are additional damping factors that may prevent excess variability in the

aircraft system outputs. These damping activities include inspection and

functional testing of the systems to highlight variable output (failures) prior to

flight. The damping would be accomplished though further downstream process

e.g. fault reporting and maintenance that is not shown but can be traced out in

the full FRAM Visualisation tool.


Figure 7-6 Visualisation of Hazard Generation Process

[Figure: FRAM visualisation. Functions shown (function number in brackets): Repair Spares – Industry (28); Repair/Maintain Spares R2 (34); Record Work done on Aircraft (6); Co-ordinate Maintenance Documentation (16); Configuration Management (LITS) (63); Publish Approved Data (30); Maintenance Personnel (8); Scheduled Maintenance (2); Supply Chain (10); Force & A4 Operations (21); Task Maintenance (5); Manage Maintenance Extensions (62); Engine Health Monitoring (19); Assure Quality (15); Replacement of service life limited parts (26). Hazard generating functions are marked HGF 1, HGF 2a, HGF 2b and HGF 3 (function 26).

Forcing annotations: Erroneous entries in LITS; Difficult to complete LITS actions if data corrupt; Incorrect items specified in LITS pull (Forward); Incorrect life limited items specified from LITS pull (Depth); Potential for over-life components to be delivered from supply chain; Erroneous data supplied with parts; FOC omits tasking for life limited item change.

Damping annotations: TAG Team restoring configuration control for safety critical assets; ERCs provide hard copy cross check for component life; Clear LITS policy to follow; Experienced Task Controllers spot items that have unusual life attached; Experienced engineering management spot unusual life attached to item that requires extension; Effects of over-life components within propulsion system may be spotted; Experienced DQAFF auditors may spot over-life items; Maintenance Schedule provides a hard copy cross check for lifing details; Experienced personnel inspect items and paperwork before fitting; Approved life limits in Schedule of Life Limited items will allow cross-checking.]

Figure 7-7 Visualisation of Potential Accident Processes

[Figure: FRAM visualisation. Functions shown (function number in brackets): Replacement of service life limited parts (26, HGF 3); Avionic Flight Systems (45); Defensive Aids (46); Avionic Communications (47); Armament and Electrical Systems (48); Mechanical Systems (49); Aircraft Structure (50); Propulsion (51); Crew Escape System (52); Weapons (53); Operate Aircraft (54); Pre-Flight Checks (55); Software (65); Flight Servicing (1); Structural Inspections & Corrosion Control (41); Corrective Maintenance (43). Downstream hazard generating functions are marked HGF 4a to HGF 4e, and a POTENTIAL ACCIDENT node is shown.

Coupling labels: Life Expired Item Fitted; Failure to React Loads; Power Failure; Electrical Signal Failure; Communication Failure; Incorrect Information Signalled; Aircraft Not Controllable/Life Support Failure; Fire, Electrocution etc.; Fire; Isolate Malfunctioning System; Exploit Redundancies; Restrict Envelope; Inspection; Functional testing; Fault indications.]

7.5.2 Insights into Risk

Figure 7-7 represents a multitude of potential scenarios and is a baseline from

which a risk assessment could be conducted for any specific component. The

loose-ended activities shown (largely fire) are given to represent activity that

involves what would normally be background environmental functions e.g. the

atmosphere. Damping processes such as the fire suppression (part of the

mechanical system) could also be plotted.

The risk assessment so far shows how difficult it is to ‘reverse engineer’ a

potential emergent accident process when considering all potential

dependencies and connections, rather than relying on linear assumptions. This

specific hazard is of interest because specific issues with upstream variability

had been noted – how can the risk of this variability triggering a downstream

accident sequence be isolated from the variability in the rest of the system?

Certain assumptions have to be continually made as to the variability of other

functions in the hypothetical system instantiation that is created for the purpose

of assessing the risk. The current safety management system for Tornado

separates a log of hazards and maps these to a number of ‘accident sets’ and

thereafter, at a top level, to a set of air safety risks. The FRAM provides an

alternative model for the analysis of these risks. The final stage in the process

described by Figure 7-6 requires a judgement on how the aggregated upstream

variability is likely to affect the output variability and then to express this in terms

of likelihood and severity. The general layout of the reduced FRAM Model frame

is shown in Table 7-7:


Table 7-7 Example Accident Generating Function FRAM Frame Layout

Name of Function: Avionic Flight Systems

Column headings (order partly uncertain): Aspect | Description of Aspect | Upstream Function (Number, Name, Aspect) | Actual Variability | Output Performance | Rough Downstream Output Performance | Potential Damping Factors to counter upstream-downstream coupling

Input | Operate Aircraft | 54 | Operate Aircraft | Inputs to aircraft systems | Isolate Malfunctioning System | High | High | Likely to induce system state that does not produce required outcome | DECREASE | Aircrew training and STANEVAL

Output | Aircraft Control Signals (mechanical systems)

Output | Information to Aircrew (operate aircraft)

Precondition | Flight Servicing | 1 | Flight Servicing | AC visually inspected (Avionics, Electrical, Structure, Mechanical, Crew Escape, Weapons, Propulsion) | Inspection | High | Medium | Induces fault - contamination etc. | DECREASE | Pre-flight checks

Precondition | Apply Special Instructions (Technical) | 24 | Apply Special Instruction (Technical) | Special Instruction (Technical) applied to applicable aircraft/equipment | Unsafe condition develops if not rectified | Maintenance document coordination

Precondition | Scheduled Maintenance | 2 | Scheduled Maintenance | Life limited parts change out tasking | Incorrect functional output or unsafe condition | Maintenance document coordination

Precondition | Repair Maintenance | 40 | Repair Aircraft | Aircraft Structure Repair | Function impaired | Maintenance document coordination

Precondition | Corrective Maintenance | 43 | Corrective Maintenance | System restored to correct function | Functional testing | High | High | Function impaired | DECREASE | Maintenance document coordination

Precondition | Modify Aircraft | 23 | Modify Aircraft | Aircraft Systems modified under Service Mod or Designer Mod | Function impaired | Maintenance document coordination

Precondition | Aircraft Structure | 50 | Aircraft Structure | Avionic Flight System loads are reacted | Failure to react loads | Low | High | Incorrect functional output or unsafe condition | INCREASE | 9 | Pre-flight checks

Precondition | Pre-Flight Checks | 55 | Pre-Flight Checks | Avionic Flight Systems checked | Inspection | High | Medium | Function impaired; remains in a failed state whilst airborne | DECREASE | Ground handler spots issues on dispatch

Resource | Electrical Power | 48 | Armament & Electrical Systems | Electrical Power/Signals | Power Failure | Low | High | Electronic failure | INCREASE | 9 | Redundancy within electrical system e.g. battery

Resource | Replace Life Limited Parts | 26 | Replacement of service life limited parts | Part Replaced (system concerned) | Life Expired Items Fitted | Medium | High | Burn-out of components - function impaired | INCREASE | 18 | Redundancy and internal failure monitoring

Control | Software | 65 | Software | Avionic Flight Systems Signal | Fault indications | Low | High | Spurious or misleading signals or failure condition | DECREASE | Aircrew training and STANEVAL

Time | Not initially described

Essentially a judgement is required as to the combined effect of the forcing

functions (red) and the damping functions (purple) taking into account their

respective frequencies and amplitude of variability. The effect will be manifested

in the output from the Hazard Generating function. The initial TASM provides a

prior assessment of the output variability of all of the stage four Hazard

Generating Functions. As these functions are all technological safety critical

systems that form the aircraft they are all also assessed as having a low

frequency of variability (i.e. their output is highly reliable) and the amplitude

of variability is high. The thinking behind the high amplitude categorisation is

that performance is more likely to cease entirely than be degraded; clearly

however there are more frequent but low amplitude (degraded system

performance) failures; the output from these functions is complicated. Table 7-8

shows how this is recorded in the baseline FRAM Model.
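The judgement described above can be made more traceable by recording each upstream coupling in a structured form. The following Python sketch is illustrative only. The ordinal scale (Low = 1, Medium = 2, High = 3), the product score and the names `Coupling` and `net_pressure` are assumptions made for this example and are not part of the TASM. Reading the INCREASE/DECREASE column of Table 7-7 as forcing or damping is also an interpretation. The rows are transcribed from Table 7-7.

```python
from dataclasses import dataclass

# Hypothetical ordinal scale; the TASM records categories, not numbers.
LEVEL = {"Low": 1, "Medium": 2, "High": 3}


@dataclass(frozen=True)
class Coupling:
    """One upstream coupling into a hazard generating function."""
    upstream: str    # name of the upstream function
    frequency: str   # frequency of variability: Low / Medium / High
    amplitude: str   # amplitude of variability: Low / Medium / High
    effect: str      # "INCREASE" (read as forcing) or "DECREASE" (read as damping)

    def score(self) -> int:
        # Assumed scoring rule: product of the two ordinal levels.
        return LEVEL[self.frequency] * LEVEL[self.amplitude]


def net_pressure(couplings):
    """Sum forcing scores minus damping scores (a crude, linear tally)."""
    total = 0
    for c in couplings:
        sign = 1 if c.effect == "INCREASE" else -1
        total += sign * c.score()
    return total


# Rows transcribed from Table 7-7 (Avionic Flight Systems).
couplings = [
    Coupling("Operate Aircraft", "High", "High", "DECREASE"),
    Coupling("Flight Servicing", "High", "Medium", "DECREASE"),
    Coupling("Aircraft Structure", "Low", "High", "INCREASE"),
    Coupling("Armament & Electrical Systems", "Low", "High", "INCREASE"),
    Coupling("Replacement of service life limited parts", "Medium", "High", "INCREASE"),
]

forcing = [c.upstream for c in couplings if c.effect == "INCREASE"]
print("Forcing functions:", forcing)
print("Net pressure on output variability:", net_pressure(couplings))
```

A tally of this kind is itself linear, so it should be treated as a bookkeeping aid for the analyst's judgement rather than as a risk estimate.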

Considering the risk posed by operation of components in excess of cleared

life, it is likely that the frequency of variability will increase if the hazard

propagates through the system to this fourth stage. The key point to consider

here is how meaningful any assessment of a generic risk based on this initial

hazard can be. For example a structural component may present a more

serious mode of failure than an avionic component. There will be a variety of

levels of redundancy within the different systems. Using the FRAM it is only

possible to generate a risk assessment in the case of a specific component or

class of components. As previously described this would also require functions

not involved in the hazardous or associated damping processes to remain

constant. This could be set by describing allowable boundaries for the

performance indicators described in the TASM.
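One way to express the allowable boundaries mentioned above is as a simple range check on each performance indicator. The sketch below is hypothetical: the indicator names, readings and limits are invented for illustration and do not come from the TASM.

```python
def out_of_bounds(readings, bounds):
    """Return the indicators whose reading falls outside its allowed range.

    readings: dict mapping indicator name -> observed value
    bounds:   dict mapping indicator name -> (lower, upper) inclusive limits
    """
    breaches = {}
    for name, value in readings.items():
        lower, upper = bounds[name]
        if not (lower <= value <= upper):
            breaches[name] = value
    return breaches


# Hypothetical indicators and limits, for illustration only.
bounds = {
    "lits_data_errors_per_month": (0, 5),
    "maintenance_extensions_open": (0, 10),
    "authorised_personnel_manning_pct": (85, 100),
}
readings = {
    "lits_data_errors_per_month": 7,
    "maintenance_extensions_open": 4,
    "authorised_personnel_manning_pct": 90,
}

breaches = out_of_bounds(readings, bounds)
print(breaches)
```

If any indicator is reported, the assumption that functions outside the hazardous process remain constant no longer holds, and the risk assessment for the component concerned would need to be revisited.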


Table 7-8 Avionic Flight Systems Output – Baseline FRAM Model

Name of Function: Avionic Flight Systems (function 45)

Columns: Aspect | Description of Aspect | Outputs | Most Likely Dimension of Output Variability | Description of Most Likely Output Variability | Frequency of Variability | Amplitude of Variability

Input | Operate Aircraft

Output | Aircraft Control Signals (mechanical systems) | Aircraft Control Signals (mechanical systems) | Sequence | Incorrect information signalled | Low | High

Output | Information to Aircrew (operate aircraft) | Information to Aircrew (operate aircraft) | Sequence | Incorrect information signalled | Low | High

Precondition | Flight Servicing

Precondition | Apply Special Instructions (Technical)

Precondition | [...]

Precondition | Repair Maintenance

Precondition | [...]

Precondition | Modify Aircraft

Precondition | Aircraft Structure

Precondition | Pre-Flight Checks

Resource | Electrical Power

Resource | Replace Life Limited Parts

Control | Software

Time | Not initially described

7.6 Proposal for a FRAM Based Risk Management

Whilst it is difficult to produce a quantified risk assessment along the lines of

current practice using a FRAM model, the model does provide for a greater

level of understanding of the system. There is potential to use this feature of the

approach to facilitate system redesign. A typical hazards and barrier approach

to safety would involve introducing additional barriers. Resilience engineering

principles advocate that safety may be increased by increasing the instances of

successful operations of a function or a system to prevent the instances of

harmful operations. With this in mind, focus on system re-design should be on

strengthening the damping internal to hazard generating functions and that

provided by the wider system. Figure 7-8 shows the means by which risks can

be managed; using the output from the various performance indicators given in

the TASM to allow managers to make adjustments through system redesign.

Redesign could be through improvements in the way functions operate in terms

of internal process, addition of new or better resource or by amending

controlling outputs from other functions.

Figure 7-8 Proposed Risk Management Process

[Figure: process diagram with the elements: Potential Accidents; Risk to Life; Adjustments to System; Performance Indicators; System Damping or Forcing; Functional Performance Variability.]

This can be achieved by understanding the ‘work as done’ rather than as imagined or prescribed. The only way to achieve this is through dialogue with the individuals involved in the process or, where data systems such as LITS are involved, by analysing the flow of that data. Similarly, those potential performance indicators already provided within the Baseline FRAM Model should be collected and monitored. Risk management in this manner will allow risks to life to be mitigated to an appropriate ALARP level.

7.7 Chapter Summary

At this stage of development it has only been possible to describe an outline

process for risk assessment methodology. This requires further work and it is

likely that more advanced mathematical techniques will be required. Whilst it is

more difficult to apply the FRAM for risk assessment than other more traditional

techniques, resilience engineering principles would suggest that a more

accurate result is more likely, albeit with a considerable level of uncertainty and

validity limited to specific system states. In order to progress, the FRAM should

first be used as a monitoring tool, developing leading safety performance

indicators as described in the model. Thereafter it can be used as an anticipating

tool to deal with threats from emergent safety risk noting Hollnagel’s (2011) four

cornerstones of resilience.


8 DISCUSSION

Chapters four and five described how a Tornado Airworthiness System Model

(TASM) and associated Visualisation Tool were created using the FRAM. The

use of these tools was then discussed both for incident investigation and risk

assessment in chapters six and seven. This chapter discusses the results of

these exercises and considers how well the research objective has been met,

with reference to the current literature on resilience engineering reviewed in

Chapter one.

8.1 Applicability of the Resilience Engineering Paradigm to

Airworthiness

Chapter 1 highlighted various views on the progress of safety science over the

last century. Experience within high hazard industries such as military

aerospace suggests that accident reports impose a certain degree of hindsight

simplification to deconstruct complexity (Dekker et al., 2011). The literature

review highlighted that resilience engineering is emerging as a new paradigm in

safety. Dekker (2011) also describes resilience engineering as a ‘post

Newtonian’ analysis of safety, which aptly describes the change in perspective

required to adopt its precepts. The boundaries of resilience engineering practice

are far from clear and it is not possible to isolate it from other elements of safety

engineering and management best practice. The theory is however markedly

different to most existing notions of safety.

The reality of everyday work in high hazard industries is one of compromise and

pragmatic application of safety regulation – through approximate adjustments.

Yet, given that such industries are now successfully meeting safety targets it

would appear that current safety management and analysis techniques may be

sufficient. Clearly better regulation and understanding of human factors and

organisational behaviour are responsible to some extent for increasing levels of

safety but will an enhanced understanding of the science of complexity enhance

safety? The Functional Resonance Analysis Method provides one way in which

resilience can truly be engineered into systems at organisational, technological


and human levels. It has the potential, not yet fully realised, to replace or

augment many existing safety engineering techniques. Resilience engineering

could potentially produce a paradigm shift in safety engineering, although much

further work is required to operationalise FRAM, which stands out as the most

useful methodology. Much of the literature on resilience engineering focusses

on operational aspects of high hazard industries. Continuing airworthiness is

within that particular scope and FRAM may well offer practical ways to address

continuing concerns over maintenance error. Wider than that, a focus on safety

being driven by increasing the probability of success instead of the prevention

of failure is an attractive engineering design philosophy in that it promotes a

more efficient use of mass, power and volume – an application that requires

further investigation. This study however focussed on the organisational

aspects of maintenance and modification. The key finding was that even by

modelling the socio-technical system at a relatively high level of abstraction

(with TASM) it became clear that all aspects of the airworthiness management

system have evolved to become extremely tightly connected in interlocking

processes.

Airworthiness regulation and practice are managed in a compartmentalised and

therefore mostly linear fashion. In the civil sector this is through a system of

licences for individual maintenance personnel and approvals for design,

production, maintenance and training organisations. Increasingly the UK military

regulator is adopting similar practice, which facilitates operational flexibility as it

provides a variety of potential options for sourcing airworthiness related activity.

In many cases, for example depth maintenance or maintenance programme

development, these functions have been contracted to industry. Type

Airworthiness Authorities must maintain oversight of the complexities of the

whole system. The technical safety cases for air platforms such as Tornado

(Mason, 2012) make use of Goal Structuring Notation to break the argument for

a safe system down into manageable and auditable portions. Such

techniques do not effectively deal with the interconnected nature of support

systems. Fundamentally, existing airworthiness practice assumes that safety is

a resultant rather than an emergent system property. The evidence presented in


the literature review makes a compelling case for the latter assumption. The

tools available to practitioners need to develop to match this new paradigm.

Ideas of complexity and resilience in safety are becoming more pertinent. There

is a clear trend for aircraft systems to become more complex, due to their size

(e.g. A380) or their advanced software driven systems (e.g. F-35 Lightning II).

The system of systems approach is becoming more relevant as individual

aircraft become more integrated into Air Traffic Management Systems in order

to increase efficiency (open skies) or safety (TCAS). Similarly, the advent of

Network Enabled Capability sees military air platforms integrated into wider

systems, which in the case of Remotely Piloted Air Systems (e.g. Reaper) have

the ability to directly control the aircraft. Hence the scope of airworthiness must

be extended beyond the actual aircraft itself, a key issue in the regulation of civil

unmanned aircraft (Hodson, 2008). In continuing airworthiness, maintenance is

conducted using networked systems to provide approved data to technicians

and interact with the aircraft directly to diagnose faults. Similarly the supply

chain is becoming fused with the aircraft (e.g. Boeing Gold system). On

the organisational front, there is a trend for traditional company or military

structures to become fragmented with various aspects of aircraft acquisition,

design, manufacture, support and maintenance outsourced or subcontracted.

Thus it could be argued that there is now a requirement to be able to model

these complexities and estimate how they might interact in productive or

counter-productive ways.

The practice of airworthiness management has remained strongly rooted in the

technical era of safety management. Accepted practice for linear combinations

of reliability assessment such as fault trees does not meet the ideal of resilience

engineering at the system level. The resilience engineering framework itself

draws on the rigour of systems engineering and some of the insights of

complexity theory. Resilience engineering is positioning itself as a successor to

conventional forms of safety management; this requires further justification

through real world examples. The line between reliability and safety assessment

has often seemed to become blurred; this is understandable for simple aircraft


systems. However software and human factors have made aircraft systems

increasingly intractable – it is not possible to predict their performance under all

conceivable conditions. For this reason Development Assurance Levels or

Safety Integrity Levels are used to control the design of safety critical software.

This is an example of resilience engineering already in practice – in the case of

software it is not possible to analyse it in a Newtonian-linear manner so instead

upstream controls are used on the development process in order to maximise

the likelihood of successful operation. It is quite common, particularly in military

aviation, for aircraft to be operated in a manner quite different to the way the

designer had originally envisaged. It is for this reason that both ‘type’ and

‘continuing’ airworthiness management are important. Airworthiness activity is in

practice delegated across multiple organisational boundaries. Managing the

output of these various organisations is therefore important. Human factors in

maintenance are also a subject of much debate and research (RAeS, 2013).

Research interviews found common complaints in the proliferation of regulation,

assurance activity and other barriers. Barriers designed with good intentions

have the potential to shape functions in unexpected ways, producing new

emergent properties from the system. This is especially the case when work is

viewed from an efficiency-thoroughness trade-off perspective. If additional work

is required with no increase in resources, it is inevitable that thoroughness will

suffer in some other area.

Accident theory has tended to focus on producing techniques that prove useful

in a large number of circumstances; perhaps at the expense of compliance with

a more rigorous theoretical basis. Given the complexity of sociotechnical

systems and the property of incompressibility, how can an analyst be sure that

a model is viable? Also, given the low probability of occurrence of the

catastrophic accidents that we are concerned with, how is it possible to validate

these models? In terms of accident investigation, FRAM deals with

incompressibility through its fractal property – functions can be continually

decomposed into further functions until an appropriate level of detail is provided.

Incompressibility leads to the requirement for dealing with cross-scale

interactions between very large and very small functions. For accident


investigation the TASM can be altered at will to account for this requirement and

the facts as they are found. This process cannot be reverse engineered for risk

assessment and the Tornado Airworthiness System Model (TASM) represents a

first guess at an appropriate level of decomposition. Assuming that the system

is in fact non-linear, the associated properties mean that a catastrophic out-of-

control condition may develop from an element in the system that exists below

the level of detail modelled. There is not yet a satisfactory answer for how to

deal with this.

Madni’s (2007) diagram at Figure 2-7 shows a variety of activities that would be

familiar to current safety managers within a variety of industries. How, then, is

resilience engineering different to safety management? Resilience engineering

seeks to engineer safety into the system at all stages of the lifecycle;

emphasising pro-activity rather than reactivity. It seeks to enhance performance

in synergy with the operational output rather than the introduction of additional

checks and barriers.

Resilience engineering offers a theoretical basis for understanding how

airworthiness management currently operates in practice and perhaps how it

might be better designed or ‘engineered’; providing a framework in which

technical, organisational and human factors risks can be managed more

holistically. It goes further than the trend for employing safety management

systems or striving to shape safety cultures. Wider than that, it offers a positive

outlook in which safety can only be improved by improving the way that

functions perform. New tools and techniques are required in order to bring

resilience engineering theory into practice. The literature presents a variety of

possible tools that may be applicable but none have yet gained widespread use

in comparable hazardous industries. The tool that appears to have the most

potential is the Functional Resonance Analysis Method; it was therefore

selected for use in the case study organisation.

8.2 The Tornado Airworthiness System Model – Initial Version

The key point which the model emphasises is the connectivity between the

different functions involved in Tornado Airworthiness. This is where the detail of


the model is contained and where its value is provided. How the functions

themselves internally operate is not explicitly described although much is

implicit from the activities which link functions. This is a weakness of the model;

it does not allow predictions to be made as to future behaviour or risks of

particular outcomes based on internal variability. Its strength however is in its

relative completeness. The iterative process of construction means that

activities highlighted during the data gathering phase must link either two

functions or out to some external or background function. In many cases this

property of completeness led to the identification of previously un-modelled

functions. Another weakness is the level of detail in which activities are

specified; these are generally described in a one-dimensional manner whereas

in reality each activity describes multiple changes of state, flows of information

or conversion of energy. Similarly with the description of variability, this is

described in its ‘most likely’ form whereas the form of variability that may prove

to be worst could be completely different in nature. However complete the

model is in this initial version, it is anticipated that if it is used over time it will be

added to and refined.

8.3 Incident Investigation

The accident or occurrence model currently employed within the MOD’s

ASIMS system is essentially linear in nature; it classifies the occurrence by

cause, event descriptors and by contributory factors. Use of the FRAM model

allows a more nuanced approach to the investigation. It is anticipated that future

investigations will be able to use the model as a framework to guide

investigations. Although ASIMS has a specific taxonomy which must be used,

there is the facility to attach files so outputs from the model can easily be

archived within the system. The most important aspect of using TASM during

incident investigation is its encapsulation of resilience engineering precepts and

more specifically the basic principles of FRAM. It is hoped that this will help

prevent future investigators using assumptions linked to more linear ages of

safety thinking.


8.3.1 Data Collection

A key part of investigating any incident or accident is the collection of evidence.

This data is then organised in some manner to provide a narrative description of

what happened. At this stage mental models of causation become relevant; the

TASM provides both a starting point and an aid to deciding on the completeness

of the data collection stage of any investigation. In broad terms it provides a

model for how each function performs; if investigations show that there was

unexpected variability from the output of a specific function, conclusions as to

why this occurred should not be drawn until all aspects of that function have

been investigated. A ‘guilty until proven innocent’ approach to output variability

needs to be taken so that the emphasis is on showing that output variability was

within safe bounds during the accident model instantiation. Using the model as

an investigative framework helps to guard against the arbitrary allocation of

‘root’ causes as described in Chapter one. Conversely, using the model as a

framework does pose a risk that investigations will be drawn into avenues that

do not fit the evidence as a result of assumptions made when constructing the

model. Any investigator must be highly sceptical as to the accuracy of the

model and where functions and activities occur that do not seem to fit, they

should propose alterations to the baseline model. That said, the level at which

the functions and activities have been mapped is sufficiently high to encompass

most levels of variability. The two case studies provided in Chapter six

demonstrate that this is the case.
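
The ‘guilty until proven innocent’ check on data collection can be sketched in code. The following is a minimal illustration only and is not part of the TASM: the evidence records and the helper names `uninvestigated_aspects` and `may_draw_conclusions` are hypothetical, and only the six standard FRAM aspects are taken from the method itself.

```python
# Hypothetical sketch: flag FRAM aspects of a function that still lack evidence.
FRAM_ASPECTS = ("input", "output", "precondition", "resource", "time", "control")


def uninvestigated_aspects(evidence):
    """Return the aspects for which no evidence has been recorded yet.

    `evidence` maps an aspect name to a list of evidence items (free text).
    """
    return [aspect for aspect in FRAM_ASPECTS if not evidence.get(aspect)]


def may_draw_conclusions(evidence):
    """Conclusions on output variability wait until every aspect is covered."""
    return not uninvestigated_aspects(evidence)


# Invented evidence for a 'Flight Servicing' function, for illustration only.
flight_servicing_evidence = {
    "input": ["tasking recorded on maintenance board"],
    "output": ["flight servicing certificate signed for wrong tail number"],
    "resource": ["tooling available"],
    "control": [],
}

outstanding = uninvestigated_aspects(flight_servicing_evidence)
print(outstanding)
print(may_draw_conclusions(flight_servicing_evidence))
```

In this invented case three aspects remain open, so the sketch would hold back any conclusion about why the output varied.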

8.3.2 Aids to Investigation

The use of Visio to provide an interactive visualisation tool allows for quick and

easy manipulation of potential scenarios within the model. This is highly useful

at the overview level but as detail is investigated it becomes difficult to trace

variability through the model. For further detail the spreadsheet model is

particularly useful and the detailed interactions between functions may be

tracked using the Excel ‘Trace Dependents’ tool.
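
The idea of tracing variability downstream through the model can be illustrated outside the spreadsheet. The sketch below is an assumption-laden example rather than a description of the TASM: the function names echo those used in the model, but the couplings and the `trace_dependents` helper are invented for illustration.

```python
from collections import deque

# Hypothetical couplings: each function maps to the functions that use its output.
# Function names echo the TASM; the links themselves are invented for illustration.
COUPLINGS = {
    "Scheduled Maintenance": ["Record Work Done", "Flight Servicing"],
    "Record Work Done": ["Task Maintenance"],
    "Task Maintenance": ["Scheduled Maintenance", "Flight Servicing"],
    "Flight Servicing": ["Operate Aircraft"],
    "Operate Aircraft": [],
}


def trace_dependents(couplings, start):
    """Breadth-first list of every function downstream of `start`."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for dependent in couplings.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


downstream = trace_dependents(COUPLINGS, "Record Work Done")
print(downstream)
```

A breadth-first search is used so that functions are listed in order of their distance from the starting function, and the `seen` set stops the trace from looping where couplings feed back on themselves.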

8.4 Risk Assessment

The area of risk assessment is particularly problematic as linear analysis

methods are able to aggregate risks and provide quantitative and semi-

quantitative risk assessments for particular accident scenarios or sets of

scenarios. Non-linear mathematical methods are required to estimate

probabilities for risks when the system is analysed from a FRAM perspective.

Linear methodologies such as Bow-Tie or Boolean fault and event trees rely on

digital interpretations of functions or events, e.g. the system/human/process has

failed or not failed. The FRAM functions used in the TASM do not specify all of

the potential dimensions of the output. For example as a technological function,

the mechanical systems will produce hydraulically actuated flying control

movement, breathing air and cabin conditioning amongst other outputs.

Similarly as an organisational function, scheduled maintenance produces a

variety of outputs which are as diverse as surface finishing, inspection,

lubrication and functional testing. Human functions further defy description with

a huge variety in possible outputs. In order to overcome this the number of

functions could be increased by further decomposing the current set of

functions. The nature of FRAM is that functions are fractal and can be

decomposed into an infinite number of functions depending on the level of detail

required in the model. These possible functions below the level of detail used in

the model may produce some or all of the outputs of the higher level functions.

This is illustrated in Figure 8-1, which also demonstrates how functional outputs

may take a variety of dimensions.
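
The fractal property can be sketched as a simple recursive data structure. This is a minimal illustration, not the structure used in the TASM spreadsheet: the outputs are those named in the text for scheduled maintenance, while the sub-function names and the `FramFunction` class are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FramFunction:
    """A FRAM function; only name, outputs and sub-functions are modelled here."""
    name: str
    outputs: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def leaf_outputs(self):
        """Collect outputs from the lowest modelled level of decomposition."""
        if not self.children:
            return list(self.outputs)
        collected = []
        for child in self.children:
            collected.extend(child.leaf_outputs())
        return collected


# Outputs are those named in the text; the sub-function names are hypothetical.
scheduled_maintenance = FramFunction(
    name="Scheduled Maintenance",
    outputs=["surface finishing", "inspection", "lubrication", "functional testing"],
    children=[
        FramFunction("Inspect", outputs=["inspection"]),
        FramFunction("Lubricate", outputs=["lubrication"]),
        FramFunction(
            "Restore and Test",
            children=[
                FramFunction("Refinish Surfaces", outputs=["surface finishing"]),
                FramFunction("Test Functions", outputs=["functional testing"]),
            ],
        ),
    ],
)

print(sorted(scheduled_maintenance.leaf_outputs()))
```

Each lower level function carries some of the outputs of its parent, and the decomposition can be continued to whatever depth the analysis requires.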

Figure 8-1 Fractal Property of the FRAM - Function Decomposed into Lower

Level Functions

Estimation of functional output variability is down to ‘expert’ opinion on the likely

performance of the function on the basis of the combination of upstream

activities presented. This provides for a less than satisfactory estimate of risk,

particularly given the doubts raised about the ability of ‘experts’ to produce

reliable estimates of risk for low probability events (Brooker, 2011). Rather than

a final method for estimating risk, the FRAM provides a framework from which

such a method may be developed in the future. A key point to be drawn from

the resilience engineering perspective is the need to improve the success rate

of the system and thus prevent the emergence of hazards and risks. This is a

different perspective for consideration of duty holders and regulators, despite

the difficulty of interfacing this perspective with the current legal system and

need to attribute blame and culpability in the event of an accident.

[Figure 8-1 diagram: five FRAM function hexagons, each labelled with the six aspects Input (I), Output (O), Precondition (P), Resource (R), Time (T) and Control (C).]

8.4.1 Hazard Management vs Functional Resonance Management

The key to managing risk to life in current practice is the management of

hazards. Such hazards are derived from historical records or from functional

hazard assessment techniques (FHA). Chapter 7 outlines a methodology which

to some extent seeks to replace some of these techniques. It is unlikely that the

process of managing functional resonance can completely replace these

standard techniques using the TASM. In order to fully evaluate all potential

accidents and the antecedent hazard generating functions within the TASM, it

would be necessary to conduct a similar FRAM modelling exercise for the

operation of the aircraft.

8.5 Utility of the TASM for Type Airworthiness Activities

The TAA’s responsibilities are set out in RA1015 (MAA, 2012a); the TASM will

facilitate many of these responsibilities in its current version. As is discussed

below, there is scope to develop this tool further and this will provide greater

assistance to the TAA. Table 8-1 details whether or not each of the

responsibilities required by the RA can be assisted by the TASM. The key way

in which the TASM is useful is that it provides a model for how the aircraft

system interacts with its supporting personnel and organisations; this will allow

the safety management system to be adapted to provide enhanced resilience.

Type airworthiness activities are predominantly organisational in nature and

therefore more susceptible to low frequency variability, known as organisational

drift; this is more difficult to sense and control. The use of a rigorous approach

such as the TASM will enable this feature of organisational safety to be more

effectively dealt with.

Table 8-1 Utility of TASM for TAA Activities

Each entry gives the RA1015 Type Airworthiness Authority role or responsibility, followed by the use of TASM Version 1 and the potential future uses.

a  RA1015 responsibility: Managing, on behalf of the OCD, airworthiness activity of an air system during its development.
   Use of TASM Version 1: Not applicable to Tornado but useful baseline for future acquisitions.
   Potential future uses: Predictive modelling and design of support systems (organisations) alongside technological design.

b  RA1015 responsibility: Compiling certification evidence to support military type certification/Release to Service Recommendation (RTSR)/Release to Service (RTS).
   Use of TASM Version 1: N/A
   Potential future uses: N/A

c  RA1015 responsibility: The completeness and accuracy of the Approved Data, elements of the Aircraft Document Set (ADS), and the upkeep of the Type including all Design.
   Use of TASM Version 1: Greater understanding of downstream effects of changes to ADS and aircraft modifications. For modifications, use model to demonstrate that excess variability will not be created.
   Potential future uses: Integrated data will provide near real time feedback of the effectiveness of ADS.

d  RA1015 responsibility: Developing, maintaining and enhancing a Safety Management System, compliant with the OCD approved project airworthiness strategy, which will contribute to the Operating Duty Holder’s (ODH’s) Air System Safety Case for each type.
   Use of TASM Version 1: Provides a framework on which the SMS can be based using resilience engineering principles.
   Potential future uses: Integrated data will provide near real time feedback of the effectiveness of SMS.

e  RA1015 responsibility: Ensuring that appropriate action is taken in response to airworthiness issues including, but not limited to, the issuing of Technical Instructions, and recommending to the OCD the stoppage of, or major restriction to, flying.
   Use of TASM Version 1: Planning the effective implementation and evaluating the second and third order effects of introducing SI(T)s. Assessing the full implications of (and resilience against) any newly identified hazard prior to any stoppage or restriction of flying recommendation.
   Potential future uses: Integrated data will allow more useful and near real time feedback, allowing more precisely targeted restrictions in flying where necessary.

f  RA1015 responsibility: Collecting, investigating and analysing reports of, and information related to, failures, malfunctions, defects or other occurrences to confirm that the type design remains airworthy.
   Use of TASM Version 1: Analysis of technical failure and malfunctions from a performance variability/resilience point of view. Analysis of occurrences using resilience engineering principles.
   Potential future uses: More detailed FRAM model of aircraft systems to replace fault tree based loss model.

g  RA1015 responsibility: Informing the type designer, other operators and the MAA of the outcome of any investigation into a significant airworthiness occurrence.
   Use of TASM Version 1: Presentation of airworthiness occurrence reports using FRAM visualisations.

h  RA1015 responsibility: Maintaining the Structural, Propulsion and Systems Integrity of platform type through life including platform type service experience against the design assumptions.
   Use of TASM Version 1: Testing the design assumptions (probably now only for modifications) against the in-service model for airworthiness management.

i  RA1015 responsibility: Ensuring that there is adequate co-ordination between design and production organizations.
   Use of TASM Version 1: N/A
   Potential future uses: Using data integration to understand and monitor any downstream effects of DO-PO issues.

j  RA1015 responsibility: Acceptance of build deviations from design, by tail number.
   Use of TASM Version 1: N/A
   Potential future uses: N/A

k  RA1015 responsibility: Retaining, or having access to, all relevant design information, drawings, build information and test & inspection reports to provide the information needed to support the safety argument that underpins Type Airworthiness.
   Use of TASM Version 1: N/A
   Potential future uses: N/A

l  RA1015 responsibility: Ensuring timely update and communication of changes to the Aircraft Document Set (ADS).
   Use of TASM Version 1: N/A
   Potential future uses: N/A

m  RA1015 responsibility: The provision of modifications necessitated by in-service experience or as requested by the DHs for safety, operational, or economic reasons.
   Use of TASM Version 1: Risk assessment and prediction of effectiveness of any safety modifications.
   Potential future uses: Greater quantification of risk to provide evidence for modification business cases.

n  RA1015 responsibility: The competency assessment and sub-delegation of elements of airworthiness authority within their Platform PT to relevant PT staff, where necessary, by means of a LoAA. Such LoAAs may be given only to individuals who require airworthiness authority to alter Approved Data and the ADS without reference to higher authority.
   Use of TASM Version 1: N/A
   Potential future uses: Integrated data will provide feedback on effectiveness of airworthiness decision making.

o  RA1015 responsibility: Ensuring that all sub-delegations are reviewed at least annually.
   Use of TASM Version 1: N/A
   Potential future uses: N/A

8.6 Utility of the TASM for Continuing Airworthiness Activities

The TASM details how airworthiness is managed across type and continuing

airworthiness areas. These boundaries are somewhat more blurred than is the

case in the civil sector because of the way the industries have evolved – the

Tornado Continuing Airworthiness Management Exposition (Casey, 2013)

highlights how the Continuing Airworthiness Management Organisation (CAMO)

tasks have been distributed across a variety of organisational boundaries. In

particular many of the tasks have been contracted to industry but the MOD

organisation responsible for these contracts works for the TAA. Whilst the

background layer in the TASM broadly shows areas of responsibility, CAMO

tasks do in fact stretch across most areas of the TASM. Table 8-2 shows how

the TASM will assist the CAMO in undertaking its tasks. The CAM has

responsibility for more dynamic elements of the system compared to steadier

state activities undertaken by TAA staff. Modelling this system will at least

promote greater understanding of how the various components interact. Where

the TASM is of more use to the TAA as a risk assessment tool, to the CAMO it is

more useful as a more general management tool. It may provide a useful check

on future system changes, allowing analysis of downstream implications of any

change prior to implementation. It could be argued that a knowledgeable and

experienced manager would have an instinctive handle on the implications of

change without use of the tool. The modelling process has shown a high level of

complexity exists so it is moot as to whether an intuitive decision making

process would be able to consider all factors in as complete a manner without a

model. The scope of management understanding of the control mechanisms

available to adjust the system is currently limited by experience.

Table 8-2 Potential CAMO Use of TASM

Each entry gives the RA 4947 CAMO task, followed by the use of TASM Version 1 and the potential future uses.

a  RA 4947 CAMO task: Develop and control a maintenance programme including any applicable reliability programme, proposing amendments and additions to the maintenance schedule to the TAA.
   Use of TASM Version 1: Qualitative understanding of the airworthiness system to understand any second and third order implications of changes to the maintenance programme.
   Potential future uses: Integrate reliability data into TASM to provide enhanced analysis of organisational or human causes for repeat arisings.

b  RA 4947 CAMO task: Manage the embodiment of modifications and repairs.
   Use of TASM Version 1: Assessment of the effect of tasking modification and repairs by generating an instantiation of the system.
   Potential future uses: Incorporate reporting of modification satisfaction into the TASM to provide feedback as to the success of the plan.

c  RA 4947 CAMO task: Ensure that all maintenance is carried out to the required standard and in accordance with the maintenance programme, and released in accordance with MRP Maintenance Certification Regulation.
   Use of TASM Version 1: Qualitative understanding as to whether the system as currently constructed is capable of reliably implementing the maintenance programme - for example assessing the resources required for variations in the programme.
   Potential future uses: Incorporation of resource data (e.g. manning spreadsheets) into the model.

d  RA 4947 CAMO task: Ensure that all applicable SI(T)s are applied.
   Use of TASM Version 1: Assessment of the system's ability to reliably carry out SI(T)s without excessive variability; understand all of the upstream dependencies on the SI(T) functions.
   Potential future uses: Incorporate SI(T) satisfaction data into the TASM. Better predictive capability for SI(T) satisfaction rate.

e  RA 4947 CAMO task: Ensure that all faults reported, or those discovered during scheduled maintenance, are managed correctly by a Military Maintenance Organization or MRP/Mil Part 145 Approved Maintenance Organization.
   Use of TASM Version 1: Analysis of the maintenance organisation's likelihood of managing faults correctly.
   Potential future uses: Use predictive data to provide indications of when organisations may fail.

f  RA 4947 CAMO task: Co-ordinate scheduled maintenance, the application of SI(T)s and the replacement of service life limited parts.
   Use of TASM Version 1: Provides a top level map of these activities.
   Potential future uses: Leading performance indicators to provide warning of potential failures.

g  RA 4947 CAMO task: Manage and archive all continuing airworthiness records and the MF700/operator's technical log.
   Use of TASM Version 1: N/A
   Potential future uses: N/A

h  RA 4947 CAMO task: Ensure that the weight and moment statement reflects the current status of the aircraft.
   Use of TASM Version 1: N/A
   Potential future uses: N/A

i  RA 4947 CAMO task: Initiate and coordinate any necessary actions and follow up activity highlighted by an occurrence report.
   Use of TASM Version 1: Allows more effective and more easily implemented recommendations to be generated from occurrence reports.
   Potential future uses: Incorporation of ASIMS data into the TASM.

8.7 Utility of TASM for Duty Holder Activity

The TASM was designed as an airworthiness management tool rather than an

air safety tool. Nonetheless it will provide a useful facility for the Duty Holder

and his staff engaged in air safety management; principally it will provide non-

airworthiness staff a better overview and understanding of the airworthiness

system and thus provide for greater challenge of the advice of specialists.

Table 8-3 Aviation Duty Holder Use of TASM

Each entry gives the RA 1020 Aviation Duty Holder role or responsibility, followed by the use of TASM Version 1 and the potential future uses.

a  RA 1020 responsibility: Cease routine aviation operations if RtL are identified that are not demonstrably at least Tolerable and ALARP.
   Use of TASM Version 1: More effective assessment of risks using a resilience point of view.
   Potential future uses: Potential for quantification of risk analyses.

b  RA 1020 responsibility: Establish and maintain an effective ASMS that, wherever possible, exploits the MOD’s existing aviation regulatory structures, publications and management practices, in order to demonstrate an acceptable means of compliance with the requirements in RA1200.
   Use of TASM Version 1: Evolve ASMS into a resilience engineering based system.
   Potential future uses: Introduce leading indicators to increase effectiveness of the ASMS - incorporated into the model.

c  RA 1020 responsibility: Promote and lead by example a questioning Air Safety culture.
   Use of TASM Version 1: Provide DH with an overview of the system to allow more effective questioning of the CAM and the TAA.
   Potential future uses: Hold TAA and the CAM to account using quantitative performance indicators.

d  RA 1020 responsibility: If necessary, challenge formally any option or action that is proposed or implemented by DH-facing organizations that may result in the activities for which they are responsible not being Tolerable and ALARP.
   Use of TASM Version 1: Provide a ready model which can be used to demonstrate the effect of DH facing organisations on airworthiness.
   Potential future uses: Quantify the effect of DH facing actions, using integrated data.

The duty holder is responsible for ‘holding’ the risk to life as a result of any

airworthiness issue that may be present in the system or may arise in the future.

For all of the reasons described in Chapter 2 this responsibility can be

discharged with greater realism if these risks are analysed from a resilience rather

than a traditional linear perspective. It is anticipated that the lack of clarity over

risk may cause significant concern, and this risk assessment element is the key

area for further development. However, there is a concern that linear

methods currently provide a spurious level of accuracy in their risk modelling;

duty holders may need to accept a greater degree of uncertainty around these

estimates. The difficulty is that the aim of the duty holder in dealing with any

airworthiness related risk to life is to ensure that risk has been reduced to

ALARP and at least within tolerable bounds. Greater uncertainty over

categorisation of risk to life may lead to greater conservatism in dealing with

emerging risks. Whilst this may be beneficial for safety it would be

disadvantageous from an operational perspective. To counter this problem it

should be emphasised that the model shows in greater detail than has been

described before the multiple layers (or loops) of control that provide damping

against harmful activity within the system. A specific insight that was gained

from the model is the degree of connectivity of a few specific functions. These

functions are highlighted visually within the model. It would be possible to

conduct a similar exercise for flight safety using a FRAM model and perhaps to

link the two models together.
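
The notion of a function's degree of connectivity can be made concrete with a small sketch. The example below is illustrative only: the coupling pairs are invented, the helper names `connectivity` and `most_connected` are hypothetical, and the TASM itself highlights such functions visually rather than by calculation.

```python
from collections import Counter

# Hypothetical couplings as (upstream function, downstream function) pairs.
COUPLING_PAIRS = [
    ("Scheduled Maintenance", "Record Work Done"),
    ("Scheduled Maintenance", "Flight Servicing"),
    ("Record Work Done", "Task Maintenance"),
    ("Task Maintenance", "Scheduled Maintenance"),
    ("Task Maintenance", "Flight Servicing"),
    ("Flight Servicing", "Operate Aircraft"),
]


def connectivity(pairs):
    """Count couplings touching each function (incoming plus outgoing)."""
    degree = Counter()
    for upstream, downstream in pairs:
        degree[upstream] += 1
        degree[downstream] += 1
    return degree


def most_connected(pairs, top_n=2):
    """Return the `top_n` functions with the highest coupling count."""
    ranked = sorted(connectivity(pairs).items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


print(most_connected(COUPLING_PAIRS))
```

Ties are broken alphabetically so that the ranking is repeatable; a fuller treatment might weight couplings by the variability they carry.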

8.8 Potential Use for System Improvement

The literature review outlined successes for using FRAM in process

improvement activity in other industries. Experience has shown that the RAF

has discontinued the use of a variety of ‘lean’ methodologies in the years

following their initial introduction. These linear methodologies have often failed

to deal adequately with either the complexity or the variability of aircraft

maintenance operations. Criticisms have been levelled that lean sought to

impose production line methodologies on maintenance, which was a poor fit

and, as Carney (2010) found, disadvantageous for safety. The FRAM

has a great potential to provide an alternative or complementary means of

achieving process improvement. For example, the TASM highlights the

variability in the way that maintenance is tasked; sometimes tasking is

generated through handovers, sometimes it is written on boards and sometimes

tasks are given verbally. A more detailed mapping of this element of the TASM

using FRAM, worked through in a facilitated workshop with a variety of

personnel involved in the activity day-to-day, may assist in developing a more

efficient and safer process. This type of system improvement workshop ought to

become the de facto response to any occurrence investigation. The current

system employed by the RAF provides for a hierarchical review and

implementation of recommendations made by occurrence investigators and

review groups. Whilst this accords with the need to vest decision making with

those ultimately responsible for the risk to life, it does remove decision making

further from those who may be best placed to understand the complexities of

the system. Further work is required on how best to implement FRAM into

decision making on occurrence report recommendations.

8.9 Potential for Further Development of the TASM

This project sought to test whether the FRAM could be applied to airworthiness

management and how useful a tool could be created. The version that is

presented in this report is an initial baseline version and whilst it ought to prove

a useful tool in its own right there is much scope for further development.

Currently the model exists as two files: a spreadsheet and a Visio

drawing which can be manipulated interactively using the layers feature. It is

possible to embed Visio drawings within Microsoft SharePoint sites as used

within the military IT systems (MOSS). It is also possible to attach data files

drawn from Excel to shapes within Visio drawings. Figure 8-2 shows a potential

development pathway for the TASM. The envisaged end state for development

is a tool hosted on standard desktop IT, providing a ‘dashboard’ type function to

show how the whole system is performing. It should be able to display output

from a variety of data sources such as LITS reports, manning information

spreadsheets, ASIMS data and quality audit reports.

Figure 8-2 TASM Development Pathway

[Figure 8-2 diagram: thumbnails of the TASM spreadsheet (Tornado GR4 Airworthiness System Model, showing the Flight Servicing function) and the visualisation tool, with the pathway steps: Occurrence Investigations; Review of Existing Hazard Log; Hyperlink TASM to Visualisation Tool; Hyperlink Safety Indicators to Tool; Develop Interactive SharePoint Dashboard Tool; Apply Bayesian Logic to model to generate risk assessment methodologies.]

8.9.1 Increased Model Fidelity

Throughout the development of the model to date there has been a continuous

set of assumptions and simplifications made regarding the operation of the real

world system. Chapter one discussed the nature of complexity and its inherent

incompressibility. With this in mind it is important to remember that the model

can only provide a rough description of what is happening within the system or

how it is likely to behave in any future scenarios. The only way that the fidelity of

the model can be improved is to exercise it against various scenarios and make

iterative adjustments. This process would need careful configuration control as

is currently exercised for the loss model and for risk registers.

8.9.2 Application of Bayesian and/or Fuzzy Logic

The inability to generate simple risk assessments is likely to be seen by users

as a key weakness of the FRAM approach. Whilst this level of uncertainty

is potentially a more realistic assessment of the risk, it would be useful to be able

to more accurately assess risk in order that different courses of action can be

compared. For example, one solution to the configuration management issues

highlighted in chapter seven would be to further automate the data capture

process, perhaps with portable devices that could be used alongside the

aircraft. This would remove some of the higher levels of variability that are

provided by human elements of functions such as ‘scheduled maintenance’ and

‘record work done’. It would however require a substantial investment to

introduce such a capability. This would require a business case to allow public

funds to be committed. Existing processes (MOD, 2013) require quantitative risk

assessment to achieve this and generally use ‘waterfall diagrams’ to

demonstrate how risk is mitigated over time. There is therefore a clear

requirement to introduce some further elements of quantification into the FRAM

and the TASM. The most promising approach at present is that

outlined by Slater (2013), who has developed a desktop interface to allow

development of Bayesian logic dependency diagrams. Slater’s tool does not

require a risk assessor to become competent themselves in Bayesian

mathematics. This approach uses FRAM as a framework on which Bayesian

decision nets can be constructed.
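
To indicate the kind of calculation that such a Bayesian layer might perform, the following minimal sketch treats a single coupling between two functions. It is not Slater's tool and makes no claim about its workings: the two-function chain, the helper names and all of the probabilities are invented for illustration.

```python
# Hypothetical two-function chain: upstream 'Record Work Done' feeds
# downstream 'Scheduled Maintenance'. All probabilities are invented.
P_UPSTREAM_VARIABLE = 0.10                      # P(U): upstream output is variable
P_DOWNSTREAM_GIVEN_UPSTREAM = {True: 0.40,      # P(D | U)
                               False: 0.05}     # P(D | not U)


def p_downstream_variable():
    """Marginal P(D) by the law of total probability."""
    p_u = P_UPSTREAM_VARIABLE
    return (P_DOWNSTREAM_GIVEN_UPSTREAM[True] * p_u
            + P_DOWNSTREAM_GIVEN_UPSTREAM[False] * (1.0 - p_u))


def p_upstream_given_downstream():
    """Posterior P(U | D) from Bayes' theorem."""
    return (P_DOWNSTREAM_GIVEN_UPSTREAM[True] * P_UPSTREAM_VARIABLE
            / p_downstream_variable())


print(round(p_downstream_variable(), 3))
print(round(p_upstream_given_downstream(), 3))
```

With these invented figures the sketch gives a marginal probability of downstream variability of 0.085 and a posterior of about 0.471 that the upstream function was variable, given that downstream variability was observed. The same pattern would need to be extended across every coupling in the model, which is where a dedicated tool becomes necessary.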

8.9.3 Expansion into Operational Safety Management

The entirety of the system for operating Tornado is captured within the ‘Operate

Aircraft’ function. It is recorded in this way because the aircrew interact with the

aircraft systems and hence affect the airworthiness of the aircraft in the short

term during a particular sortie or over the long term as patterns of usage affect

the condition of the aircraft systems. Clearly operating the aircraft is a human

function and is also heavily involved in most air accidents. So whilst outside of

the scope of this particular study, there is likely to be significant benefit in

developing a FRAM model to understand flight safety elements of aircraft

operations. This could be linked to the TASM to understand how airworthiness

and flight safety risks are interlinked.

8.10 Chapter Summary

This discussion centred on the applicability of resilience engineering concepts

to the practice of airworthiness engineering. It discussed how the framework

under which airworthiness related safety investigations are conducted could be

adapted to the new ideas. The increase of both realism and uncertainty in risk

assessment using resilience engineering techniques was discussed. The

potential for future development of the TASM was also described.

9 CONCLUSIONS

9.1 Summary

The project first reviewed the literature on the general background theories to

safety science and engineering. Three broad themes were identified as having

reached maturity: the technological age based on Boolean logic and

reliability studies, the age of human factors and the age of the

organisational accident. More recent developments included a study of the

effects of complexity and control theory in order to understand safety. From

these roots resilience engineering has been highlighted by some as a new

paradigm in safety. The literature around resilience engineering was found to be

somewhat fractured; however, several key themes were identified

and discussed. Principally, a resilience engineering perspective views safety as

the system’s ability to perform under disturbed and potentially unexpected

conditions. Safety therefore becomes a control problem rather than a reliability

problem. Safety is an emergent property rather than a direct linear resultant

property determined by the reliability of the system’s components and their

mode of use. The Functional Resonance Analysis Method (FRAM) was

identified as having the greatest potential to operationalise these new theories

in existing airworthiness organisational systems. The methodology describes

the system in terms of its functions, linked by activities. Harmful activity may

emerge as a result of the non-linear combination of variable functional outputs,

occasionally causing functional resonance which may propagate in the form of

uncontrollable output variability across the system. Using FRAM a spreadsheet

model was developed for Tornado Airworthiness; alongside this an interactive

visualisation tool was also developed. The model was based on a number of

interviews with personnel within the airworthiness system and on data from the

MOD’s Air Safety Information Management System, alongside a large amount

of policy documentation. This Tornado Airworthiness System Model (TASM)

was tested by taking the results from two separate incidents and describing the

scenarios in terms of functional resonance. This identified that the model was

consistent with both scenarios but also raised various questions over the

assumptions behind the investigations. The TASM was also used to investigate

the risk posed by the operation of components in excess of their cleared life on

the Tornado. This analysis highlighted that the model in its current form was not

able to quantify the risk in anything other than very general terms. However the

model did illustrate how the various factors were responsible for either forcing or

damping the variability of the functional output of the ‘replace life limited parts’

function within the model. This method of analysing risk scenarios provides

additional insight that traditional reporting techniques do not. Resilience

engineering, and the FRAM in particular, were shown to offer a great deal of

insight into how airworthiness may be more effectively managed. The research

objectives were:

Review the theoretical background to safety management and the

implications for airworthiness management.

Review the concepts of Resilience Engineering with an emphasis on

application to airworthiness management.

Establish a theoretical framework for a model of an airworthiness

management system.

Gather and use primary research data to establish and validate a model

of the airworthiness management system for the RAF Tornado Force.

Using the model, develop a tool to enhance the airworthiness

management system of the RAF Tornado Force.

All of these objectives were met and it can be concluded that the project has

produced an operationally useful tool which will enhance the management of

airworthiness across the RAF’s Tornado fleet using the latest safety thinking.

9.2 Recommendations

Whilst resilience engineering in general and the TASM in particular require

extensive development, the following specific recommendations are given with

respect to the RAF Tornado case study.

9.2.1 Manage Airworthiness as a Control Problem

Quantitative or probabilistic risk assessments are well suited to reliability

analysis of components or subsystems. Such analyses are of dubious validity

when considering complex systems and even more so where there is a large

element of human and organisational interaction. These cases apply to

airworthiness issues and as such it is better to combine reliability analyses with

a treatment of the achievement of airworthy systems as an ongoing control

problem.

9.2.2 Use the TASM to Control the Airworthiness System

Control of airworthiness systems can effectively be modelled by the Functional

Resonance Analysis Method, where harmful activity occurs when the output

from one function becomes coupled in a resonant manner with the aspect of

another function. The Tornado Airworthiness System Model (TASM) provides a

baseline model from which such analyses can be carried out.

The TASM will prove to be a powerful tool for occurrence investigation and

should be used as a baseline from which to conduct such investigations.
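A minimal sketch of how such a baseline model could be held as data is given below. Each coupling records which function's output feeds which aspect of another function, using the five non-output aspects named in Appendix C. The function names are borrowed from the text as labels only; the couplings between them are assumptions made for illustration and are not taken from the TASM spreadsheet. The sketch merely traces which functions lie downstream of a variable output; it does not model resonance itself.

```python
from dataclasses import dataclass

# The five aspects of a function that can receive another function's output.
ASPECTS = ("input", "precondition", "resource", "control", "time")


@dataclass(frozen=True)
class Coupling:
    source: str  # function whose output is passed on
    target: str  # function that receives the output
    aspect: str  # aspect of the target that the output feeds

    def __post_init__(self):
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {self.aspect}")


def affected_functions(couplings, start):
    """Return every function downstream of `start`, in breadth-first order."""
    seen = []
    frontier = [start]
    while frontier:
        current = frontier.pop(0)
        for coupling in couplings:
            target = coupling.target
            if coupling.source == current and target != start and target not in seen:
                seen.append(target)
                frontier.append(target)
    return seen


# Hypothetical couplings, assumed for illustration only.
couplings = [
    Coupling("scheduled maintenance", "record work done", "input"),
    Coupling("record work done", "replace life limited parts", "control"),
    Coupling("replace life limited parts", "operate aircraft", "precondition"),
]

downstream = affected_functions(couplings, "record work done")
print(downstream)
```

An investigator could start from the function where variability was first observed and use the returned list as a checklist of functions whose aspects should be examined.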

9.2.3 Review Airworthiness Risk from a Resilience Perspective

Where it is necessary for the Tornado TAA and Duty Holder to sentence

emerging air safety risks that have any connection to airworthiness

management, the TASM should be used to review the risk from a resilience

engineering perspective alongside existing methodologies required by the MAA.

9.2.4 Use FRAM as a Means to Improve System Resilience and

Efficiency

Where incidents occur, the TASM should be the baseline investigative tool.

Other quality and continuous improvement activity should use the TASM and

FRAM as a means to seek out improvements in safety and in efficiency across

organisations involved in airworthiness. In particular FRAM can be used as an

alternative to linear ‘lean’ techniques when dealing with complex working

environments.

9.3 Potential for Further Research and Development

This project has provided an initial look into resilience engineering with

respect to airworthiness. There is a large amount of further research that can be

conducted into this area. General themes should encompass:

The use of FRAM as a technique for investigating air accidents and air

safety occurrence reports.

The development of FRAM to produce quantitative and qualitative risk

assessments, particularly focussing on how it may be used as a

framework to develop Bayesian probability models.

Development of techniques, protocols and standards for conducting

FRAM workshops, whether for the analysis of safety issues or for the

purpose of improving quality or safety.

This project has created an initial version of the TASM which, while useful, will

require extensive further development:

Integration of the FRAM spreadsheet as data attached to the functional

shapes within the visualisation tool to allow easier interpretation.

Integration of the visualisation tool into a Microsoft SharePoint site to allow

further development as an airworthiness ‘dashboard’.

Development of the leading safety indicators identified within the TASM.

Linking of existing and new data sources as leading safety indicators in

a TASM ‘dashboard’ to provide a mechanism for day-to-day

management of the airworthiness system and enhance the ability of both

the CAM and the TAA to take appropriate airworthiness decisions.

9.4 Concluding Remarks

This study has taken a new set of safety science concepts and has sought to

apply them to the management of airworthiness. This activity has been largely

successful although inevitably there will need to be a further continuous process

of iteration and improvement to the tools produced. The background to this

project was the questions posed by the Nimrod Review. It is clear that, in the

light of the new safety paradigm described by resilience engineering, the

airworthiness system for Nimrod had gradually slipped out of control

and that a variety of functions had begun to resonate with each other resulting

eventually in uncontrollable interaction between the fuel, mechanical and

electrical systems to produce the catastrophic loss of the aircraft and crew.

Resilience engineering and FRAM provide a basis for more effective future

control of the organisational, technological and human functions involved in

airworthiness. Better upstream management of airworthiness controls will

prevent some future pilot having to “fight with the controls” in the face of some

potential downstream catastrophe.

REFERENCES

Anon, (2011) ‘Supervision High Up on the Equator - the Puma Force in Kenya’, Air Clues, July [Online], Available at: http://www.raf.mod.uk/rafcms/mediafiles/29D67908_5056_A318_A8AFDA410071E0B8.pdf (Accessed: 1 December 2013).

Aitken, H. (2009) LITS Business Data Corruption, MOD: Internal, DES/WYT/595441/4 20 May 09.

Apostolakis, G. E. (2004) ‘How useful is quantitative risk assessment?’, Risk Analysis, vol. 24, no. 3, pp. 515-520.

Bagwell, G. (2011) 1 Gp ODH ALARP Statement - Operation of Components in Excess of Cleared Life, MOD Internal RESTRICTED, TOR 01.

Beauchamp, E. (2006) ‘Learning from Diversity: Model-Based Evaluation of Opportunities for Process (Re)-Design and Increasing Company Resilience’, The Second Resilience Engineering Symposium, Antibes – Juan-Les-Pins, France 8-10 November 2006: Resilience Engineering Association, pp. 23.

Belmonte, F., Schön, W., Heurley, L. and Capel, R. (2011) ‘Interdisciplinary safety analysis of complex socio-technological systems based on the functional resonance accident model: An application to railway traffic supervision’, Reliability Engineering & System Safety, vol. 96, no. 2, pp. 237-249.

Bendat, J. S. (1998) Nonlinear system techniques and applications, New York: Wiley.

Brooker, P. (2011) ‘Experts, Bayesian Belief Networks, rare events and aviation risk estimates’, Safety Science, vol. 49, no. 8–9, pp. 1142-1155.

Cambon, J., Guarnieri, F. and Groeneweg, J. (2006) ‘Towards a new tool for measuring Safety Management Systems performance’, Learning from Diversity: The Second Resilience Engineering Symposium, Antibes – Juan-Les-Pins, France 8-10 November 2006: Resilience Engineering Association, pp. 53.

Carney, P. (2010) Critical Analysis of the Airworthiness Impact of Lean Production Principles in a Depth Maintenance Organisation. MSc thesis, Cranfield University.

Casey, T. (2013) Tornado Continuing Airworthiness Management Exposition (CAME) MOD Internal RESTRICTED, CAMO/CERT/2012/018.

Cilliers, P. (2005) ‘Knowing complex systems’, in Richardson, K. (ed.) Managing Organizational Complexity: Philosophy, Theory, and Application, Greenwich, CT: ISCE Publishing, pp. 7-19.

Cooke, P. (2004) Panavia Tornado GR4 [Online], Available at: http://www.airliners.net/photo/UK---Air/Panavia-Tornado-GR4/0636414/L/ (Accessed 5 March 2014).

Coury, B., Kolly, J., Gormley, E. and Dietz, A. (2008) ‘The central role of principal issues in aviation accident investigation’, Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Vol. 52, Sage Publications, pp. 99.

Crown Copyright (2009) 31 Squadron Tornado [Online], available at: http://www.raf.mod.uk/gallery/tornadogallery.cfm?start=1&viewmedia=4#pageContent (Accessed 5 March 2014).

de Carvalho, P. V. R. (2011) ‘The use of Functional Resonance Analysis Method (FRAM) in a mid-air collision to understand some characteristics of the air traffic management system resilience’, Reliability Engineering & System Safety, vol. 96, no. 11, pp. 1482-1498.

De Landre, J., Gibb, G. and Walters, N. (2006) ‘Using Incident Investigation Tools Proactively for Incident Prevention’, Meeting of the Australian and New Zealand Society of Air Safety Investigators. Australia: Australian and New Zealand Society of Air Safety Investigators [Online]. Available at: http://asasi.org/papers.htm (Accessed 13 November 2013).

Dekker, S. (2003) ‘When human error becomes a crime’, Human Factors and Aerospace Safety, vol. 3, pp. 83-92.

Dekker, S. (2005) ‘9 Why we need new accident models’, Contemporary issues in human factors and aviation safety, pp. 181.

Dekker, S., Cilliers, P. and Hofmeyr, J. (2011) ‘The complexity of failure: Implications of complexity theory for safety investigations’, Safety Science, vol. 49, no. 6, pp. 939-945.

Dudman, D. (2012) ‘No 1 Group Air Safety Management Plan’, 3rd ed., Royal Air Force Internal, Defence Intranet.

Edwards, J. R. D., Davey, J. and Armstrong, K. (2013) ‘Returning to the roots of culture: A review and re-conceptualisation of safety culture’, Safety Science, vol. 55, no. 0, pp. 70-80.

Espejo, R. (1989) ‘A cybernetic method to study organizations’, The Viable System Model: Interpretations and Applications of Stafford Beer’s VSM, pp. 361-382.

Freed and Priday, R. (2008) ‘Annex A to BP 1301 - Initial Report of Serious Occurrence or Fault’, MOD Internal.

Gale, I., Keeling, A. and Strasdin, S., (2013) ‘Perfect Storm’, Air Clues, July [Online], Available at: http://www.raf.mod.uk/rafcms/mediafiles/3AE4263C_5056_A318_A883FF5D10B24E91.pdf

Grøtan, T. O., Størseth, F. and Albrechtsen, E. (2011) ‘Scientific foundations of addressing risk in complex and dynamic environments’, Reliability Engineering & System Safety, vol. 96, no. 6, pp. 706-712.

Haddon-Cave, C. (2009) The Nimrod Review, London: The Stationery Office.

Hale, A. and Borys, D. (2013) ‘Working to rule or working safely? Part 2: The management of safety rules and procedures’, Safety Science, vol. 55, no. 0, pp. 222-231.

Heinrich, H. W., Petersen, D. and Roos, N. (1950) Industrial accident prevention, McGraw-Hill:New York.

Herrera, I. (2012) Proactive safety performance indicators. PhD thesis Norges teknisk-naturvitenskapelige universitet, Institutt for produksjons- og kvalitetsteknikk [Online]. Available at: http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-16990.

Hitchens, D. (2003) Advanced Systems Thinking, Engineering and Management, 1st ed, Artech House: Norwood.

Hodson, C. J. (2008) Civil Airworthiness for a UAV Control Station. MSc thesis. University of York [Online]. Available at: http://www-users.cs.york.ac.uk/~mark/projects/cjh507_project.pdf

Hollnagel, E. (2011) Resilience engineering in practice: A guidebook, Farnham, Surrey: Ashgate Publishing.

Hollnagel, E. (2012) FRAM: The Functional Resonance Analysis Method Modelling Complex Socio-Technical Systems, Farnham, Surrey: Ashgate Publishing.

Hollnagel, E. (2014) The Functional Resonance Analysis Method, 20 March [Online] Available at: www.functionalresonance.com.

Hollnagel, E. and Woods, D. (2005) Joint cognitive systems: Foundations of cognitive systems engineering, NW: CRC Press.

Hollnagel, E., Woods, D. and Leveson, N. (2007) Resilience Engineering Concepts and Precepts, Farnham, Surrey: Ashgate Publishing.

Hounsgaard, J. (2013) Using FRAM as a Quality Improvement Tool in Health Care [Online], available at: http://functionalresonance.com/onewebmedia/FRAMily_2013_Hounsgaard.pdf

ICAO (2001) Annex 13 to the Convention on International Civil Aviation - Aircraft Accident and Incident Investigation, 9th ed., ICAO [Online]. Available at: http://www.cad.gov.rs/docs/udesi/an13_cons.pdf.

Jeffery, D. (2009) Tornado Configuration Control and Impact on Continued Airworthiness, QinetiQ RESTRICTED, QINETIQ/MS/SES/CR0902379/1.

Johansson, B. and Lindgren, M. (2008) ‘A quick and dirty evaluation of resilience enhancing properties in safety critical systems’, Proceedings of the third symposium on resilience engineering, Juan-les-Pins, France, pp. 133.

Johnson, C. and Holloway, C. (2004) ‘On the over-emphasis of human ‘error’ as a cause of aviation accidents: ‘systemic failures’ and ‘human error’ in US NTSB and Canadian TSB aviation reports 1996–2003’, Proceedings of the 22nd International System Safety Conference (ISSC). Providence, RI: Systems Safety Society, Citeseer.

Kelly, T. P. and McDermid, J. A. (1999) ‘A Systematic Approach to Safety Case Maintenance’, Computer Safety, Reliability and Security 18th International Conference, SAFECOMP’99. Toulouse, France: Springer, pp. 13-26.

Kontogiannis, T. and Malakis, S. (2012a) ‘Recursive modelling of loss of control in human and organizational processes: A systemic model for accident analysis’, Accident Analysis & Prevention, vol. 48, no. 0, pp. 303-316.

Kontogiannis, T. and Malakis, S. (2012b) ‘A systemic analysis of patterns of organizational breakdowns in accidents: A case from Helicopter Emergency Medical Service (HEMS) operations’, Reliability Engineering & System Safety, vol. 99, no. 0, pp. 193-208.

Le Coze, J. (2013) ‘New models for new times. An anti-dualist move’, Safety Science, vol. 59, no. 0, pp. 200-218.

Leonhardt, J., Macchi, L., Hollnagel, E. and Kirwan, B. (2009) A White Paper on Resilience Engineering for ATM, EUROCONTROL [Online], Available at: www.eurocontrol.int.

Leveson, N. (2011) Engineering a safer world: Systems thinking applied to safety, London: MIT Press.

Lloyd, E. and Tye, W. (1982) Systematic safety, London: Civil Aviation Authority.

Lundberg, J. (2008) ‘FRAM as a risk assessment method for nuclear fuel transportation’, 3rd IET International Conference on System Safety. 20 – 22 October. NEC, Birmingham: Institute of Engineering and Technology.

Luxhøj, J. T. (2003) Probabilistic Causal Analysis for System Safety Risk Assessments in Commercial Air Transport, Department of Industrial and Systems Engineering, Rutgers University [Online]. Available at: shemesh.larc.nasa.gov/ira03/p02-luxhoj.pdf

Luxhøj, J. T. and Williams, T. P. (1996) ‘Integrated decision support for aviation safety inspectors’, Finite Elements in Analysis and Design, vol. 23, no. 2–4, pp. 381-403.

MAA, (2011a) Air Safety Information Management System User Manual, MAA [Online]. Available at: http://www.maa.mod.uk/linkedfiles/occurrence_reporting/20111005asims_user_guide_v42_finalu.pdf.

MAA (2011b) Missing Rigging Pin, asor\Lossiemouth - RAF\XV(R) Sqn\Tornado\11\9110, MAA: Air Safety Information Management System (MOD Internal System).

MAA, (2012a) Gen1000 Series Regulatory Articles, 2nd ed MAA [Online]. Available at: http://www.maa.mod.uk/linkedfiles/regulation/gen1000seriesprint.pdf.

MAA, (2012b) MAA02: Military Aviation Authority Master Glossary, Issue 3 ed., ed MAA [Online]. Available at: http://www.maa.mod.uk/linkedfiles/regulation/maa02.pdf.

MAA, (2013a) RA 1205 – Air System Safety Cases, 2nd ed., ed MAA [Online]. Available at: http://www.maa.mod.uk/linkedfiles/regulation/gen1000seriesprint.pdf.

MAA, (2013b) RA 1210 – Ownership and Management of Operating Risk (Risk to Life), 2nd ed., ed MAA [Online]. Available at: http://www.maa.mod.uk/linkedfiles/regulation/gen1000seriesprint.pdf.

Madni, A. M. and Jackson, S. (2009) ‘Towards a Conceptual Framework for Resilience Engineering’, Systems Journal, IEEE, vol. 3, no. 2, pp. 181-191.

Manson, S. M. (2001) ‘Simplifying complexity: a review of complexity theory’, Geoforum, vol. 32, no. 3, pp. 405-414.

Mason, M. (2012) Tornado Weapon System Safety Case Report Issue 1, EFIPT-ABW/06/01/13/06, MOD: Internal (RESTRICTED).

McDonald, N. (2008) ‘Challenges facing Resilience Engineering as a Theoretical and Practical Project’, Proceedings of the third symposium on resilience engineering, Juan-les-Pins, France, pp205-2010

McKenzie, K. (2012) MR.2 XV230 in the circuit at Kinloss in 2000, available at: http://www.aeroflight.co.uk/wp-content/uploads/2010/03/XV230-02.jpg (Accessed 8 October 2013).

MOD (2007) Safety Management Requirements for Defence Systems, Defence Standard 00-56, Issue 4, MOD.

MOD (2013a) Tornado Local Instruction - Equipment Risk Management, LI BS0056 Version 1.2, MOD: Internal (RESTRICTED).

MOD (2013b) Tornado Continuous Airworthiness Management Exposition, CAMO/CERT/2012/018, MOD: Internal (RESTRICTED).

Nathanael, D. and Marmaras, N. (2006) ‘The interplay between work practices and prescription: a key issue for organizational resilience’, Proceedings of the second symposium on resilience engineering, Juan-les-Pins, France pp. 229.

Oliver, D., Kelliher, T. and Keegan Jr, J. (1997) Engineering Complex Systems, McGraw-Hill.

Oxstrand, J. and Sylvander, C. (2010) ‘Resilience engineering: Fancy talk for safety culture: A Nordic perspective on resilience engineering’, Resilient Control Systems (ISRCS) 2010 3rd International Symposium on, IEEE, pp. 135.

Pasman, H. J., Knegtering, B. and Rogers, W. J. (2013) ‘A holistic approach to control process safety risks: Possible ways forward’, Reliability Engineering and System Safety, vol. 117, pp. 21-29.

RAeS (2013) ‘The Way We Do Things Around Here’ Culture in The Aviation Maintenance and Engineering Environment, Royal Aeronautical Society [Online]. Available at: http://aerosociety.com/Assets/Docs/Events/728/728Programme.pdf (Accessed 8th March).

Rasmussen, J. (1997) ‘Risk management in a dynamic society: a modelling problem’, Safety Science, vol. 27, no. 2–3, pp. 183-213.

Reason, J. (1997) Managing the Risks of Organizational Accidents, 1st ed, Farnham, Surrey: Ashgate.

Reason, J. T. and Hobbs, A. (2003) Managing maintenance error: a practical guide, Farnham, Surrey: Ashgate.

SAE (1996) Guidelines and Methods for Conducting the Safety Assessment Process on Civil Airborne Systems and Equipment, ARP4761, 1st ed. Washington: Society of Automotive Engineers.

SAE (2010) Guidelines for Development of Civil Aircraft and Systems, ARP4754 Rev A, Washington: Society of Automotive Engineers.

Saleh, J. H., Marais, K. B., Bakolas, E. and Cowlagi, R. V. (2010) ‘Highlights from the literature on accident causation and system safety: Review of major ideas, recent contributions, and challenges’, Reliability Engineering & System Safety, vol. 95, no. 11, pp. 1105-1116.

Salmon, P. M., Cornelissen, M. and Trotter, M. J. (2012) ‘Systems-based accident analysis methods: A comparison of Accimap, HFACS, and STAMP’, Safety Science, vol. 50, no. 4, pp. 1158-1170.

Saurin, T. A. and Carim Junior, G. C. (2012) ‘A framework for identifying and analyzing sources of resilience and brittleness: A case study of two air taxi carriers’, International Journal of Industrial Ergonomics, vol. 42, no. 3, pp. 312-324.

Schafer, D. (2012) A Resilience Engineering Primer, Michigan State University [Online]. Available at: https://www.msu.edu/~tariq/Resilience%20engineering%20primer.pdf (Accessed 25 October 2013).

Shirali, G. A., Mohammadfam, I. and Ebrahimipour, V. (2013) ‘A new method for quantitative assessment of resilience engineering by PCA and NT approach: A case study in a process industry’, Reliability Engineering and System Safety, vol. 119, pp. 88-94.

Singleton, C. (2009) Tornado Asset Gateway Proof of Concept - Final Report, 20090519_TAGProofOfConceptFinalReport_R, MOD: Internal (RESTRICTED).

Slater, D., (2013) SIOPS - The New HAZOPS?, Cambrensis [Online]. Available at: http://www.cambrensis.org/wp-content/uploads/2012/05/A-System-Integrity-and-operability-Study.pdf. (Accessed 4 April 2014).

Stolker, R., Karydas, D. and Rouvroye, J. (2008) ‘A comprehensive approach to assess operational resilience’, Proceedings of the third symposium on resilience engineering, Juan-les-Pins, France, pp. 28.

Stoop, J. (2013) To Certify, to Investigate or to Engineer, that is the Question, Resilience Engineering Association [Online], Available at: http://www.resilience-engineering-association.org/download/resources/symposium/symposium-2013/Stoop%20(REA%202013).%20To%20certify,%20to%20investigate%20or%20to%20engineer,%20that%20is%20the%20question.pdf. (Accessed 4 April 2014).

Sugden, G. (2011) Tornado Loss Model and Loss Model Database - December 2011 Update, BAE-WAW-RP-TOR-TGP-5209, BAE Systems: Internal (RESTRICTED).

Vugrin, E. D., Camphouse, R. C. and Sunderland, D., Quantitative Resilience Analysis Through Control Design, SAND2009-5957, Livermore, CA: Sandia National Laboratories [Online]. Available at: http://prod.sandia.gov/techlib/access-control.cgi/2009/095957.pdf (Accessed 4 April 2014).

Wilson, E. S. (2012) The Interaction Of Organisational, Human and Technology Factors On The Effectiveness Of Safety Management Systems And Value Achieved From Deploying New Technology, PhD Thesis. University of New South Wales [Online]. Available at: unsworks.unsw.edu.au/fapi/datastream/unsworks:10843/SOURCE01 (Accessed 4 April 2014).

Wilson, E. (2008) ‘Toward a model of the impact organisation, human and technology factors have on the effectiveness of safety management systems’, Journal of Achievements in Materials and Manufacturing Engineering, vol. 31, no. 2, pp. 827-836.

Woltjer, R. (2007) ‘A systemic functional resonance analysis of the Alaska Airlines flight 261 accident’, Human Factors and Economic Aspects on Safety, pp. 83.

Woodbridge, K., (2012) Tornado Equipment Safety Management Plan, 8.0th ed., MOD: Internal (RESTRICTED).

Zarboutis, N. and Wright, P. (2006) ‘Using complexity theories to reveal emerged patterns that erode the resilience of complex systems’, Proceedings of the Second Symposium on Resilience Engineering, Juan-les-Pins, France, pp. 1999.

Appendix A – TORNADO AIRWORTHINESS FRAM MODEL

The following file was submitted electronically: TASM V1.xls

Appendix B – TORNADO AIRWORTHINESS MODEL

VISUALISATION

The following files were submitted electronically:

TASM Visualisation Tool V1.vis (Visio file)

TASM Visualisation Tool V1.pdf (large image of Visio file showing all

layers)

TASM Visualisation Tool V1 BLUEPRINT.pdf (large image of Visio file highlighting

connections)

Appendix C – PARTICIPANTS BRIEFING SHEET

RESILIENCE ENGINEERING STUDY

Thank you for agreeing to take part in this postgraduate research study undertaken with Cranfield University. The aim is to improve the management of airworthiness in the RAF using Resilience Engineering principles.

What is Resilience?

Resilience is the intrinsic ability of a system to adjust its functioning prior to, during, or following changes and disturbances, so that it can sustain required operations under both expected and unexpected conditions.

What is Resilience Engineering?

Resilience Engineering is the practice of designing or modifying resilience into a system, whether the system is a piece of technology (such as a Tornado) or a complicated organisation (RAF and its contractors). It is a move away from ‘linear’ thinking which has produced overly simplistic models of safety such as the (in)famous ‘Swiss cheese’ or ‘bow ties’. It describes complex systems at a manageable level of detail without discarding critical connections.

Principles

1. The orders and instructions we work to never quite match the real world. Individuals and organisations must therefore adjust what they do to match current demands and resources – this is generally an approximation.

2. Some adverse events can be attributed to a breakdown or malfunctioning of components and normal system functions, but others cannot. The latter can best be understood as the result of unexpected combinations of performance variability.

3. Safety management cannot be based exclusively on hindsight (occurrence investigations), nor rely on error tabulation and the calculation of failure probabilities (risk registers). Safety management must be proactive as well as reactive.

4. Safety cannot be isolated from the core business of producing aircraft, nor vice versa. Safety is the prerequisite for productivity, and productivity is the prerequisite for safety. Safety must therefore be achieved by improvements rather than constraining how we work with a multitude of ‘safety barriers’.

How? – Understand Combinations of Performance Variability; Functional Resonance

The study will map the whole socio-technical system that produces an airworthy Tornado. This includes everything from an AMM servicing a jet; to a design engineer producing a modification; to the fleet planning office. The whole system comprises a variety of processes which are made up of a variety of functions (or activities). Functions are linked together by a variety of aspects – your subject matter expertise is needed to understand the different aspects of your function.

Aspects of Functions using an aircraft take-off as an example

Input – that which the function processes or transforms or that which starts the function. Clearance to take-off from ATC.

Preconditions – conditions that must exist before a function execution. Aircraft on the runway.

Resources – that which the function needs or consumes to produce the output. Aircraft, fuel, etc.

Control – how the function is monitored or controlled; plan, programme, instructions. Checklist.

Time – temporal constraints affecting the function, such as finishing time or duration. Take-off slot.

Output – that which is the result of the function, either an entity or a state change. Aircraft becomes airborne.

A Function and its Aspects:

A model built using the Functional Resonance Analysis Method. Source: Leonhardt et al. (2009)
