FRAMEWORK FOR THE ANALYSIS OF CONTROLLER RECOVERY FROM EQUIPMENT FAILURES IN AIR TRAFFIC CONTROL
Branka Subotic (MSc BSc)
April 2007
A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy of the University of London and for the Diploma of Membership of Imperial College London
Centre for Transport Studies
Department of Civil and Environmental Engineering
Imperial College London, United Kingdom
Declaration
At various stages during this PhD, I was involved in collaborative efforts with both
academic and industrial colleagues. In certain cases, the outputs of these collaborations
are included in this thesis to better explain and support the research presented. In
particular, during the period 2004 to 2005, colleagues from the Air Traffic Management
(ATM) Group at the Centre for Transport Studies, Imperial College London, assisted in the
questionnaire-based survey of air traffic controllers. This mainly involved the distribution of
questionnaires and collection of the responses.
Furthermore, a key element of the research presented in this thesis is the experiment
conducted at a facility owned and operated by a Civil Aviation Authority (CAA). The
experiment was facilitated by the assistance of various Air Traffic Control (ATC) Centre
staff including ATM specialists, ATC controllers, pseudo-pilots, engineers, and technicians.
Finally, EUROCONTROL staff provided a valuable contribution at various stages of this
research in terms of access to relevant publications, professional networks, and simulation
trials.
I hereby declare that, apart from the collaborations referred to above, I have personally
carried out the work described in this thesis:
…………………………………………………..
Branka Subotic
…………………………………………………..
Dr. Washington Yotto Ochieng
Abstract
An Air Traffic Control (ATC) system represents a set of components that act together to
achieve a safe and efficient flow of traffic in any given airspace. The elements of this
system are human operators, equipment, and procedures, along with all the interactions
between them. Failure of equipment, as one component of an ATC system, and its
interaction with human operators (i.e. air traffic controllers) is the main focus of the
research presented in this thesis. Specifically, the thesis focuses on the human recovery process
triggered by failures of the equipment that supports air traffic controllers in the provision of air
traffic services in a dedicated airspace. A detailed understanding of the controller recovery
process could contribute significantly to safety and operational efficiency in
the current and future ATC environment. At present, understanding of
the factors that influence the recovery process is very limited, particularly with respect to equipment
failures in ATC. This thesis builds on existing relevant research in other industries and
uses targeted experiments and mathematical modelling to develop a functional
relationship between recovery and its influencing factors.
The research presented in this thesis addresses two areas, namely equipment failures
in ATC and controller recovery. The first investigates the characteristics of ATC
equipment failures, drawing on past research, and derives the associated target level of safety.
Linking the target level of safety with available operational failure reports establishes a
means to validate the realism and operational significance of the equipment failure
characteristics. A subset of these characteristics relevant to ATC operations is then
used to develop a novel qualitative equipment failure impact assessment tool. This tool
enables the identification of the equipment failures that are most severe for ATC operations
and thus may pose the greatest challenge to controller performance.
Having identified the relevant equipment failure types and their characteristics, the thesis
carries out a critical review of the associated issues regarding the process of controller
recovery. A critical element of this is the review of past human reliability research and its
relationship to controller recovery from equipment failures in ATC. The findings from this
are augmented by questionnaire survey results based on responses from 134 air traffic
controllers from 34 countries. Both the past research and the questionnaire survey results
are used to highlight the importance of the context in which controller recovery
performance takes place and to define the recovery context through a set of 20 candidate
contextual factors or Recovery Influencing Factors (RIFs).
The thesis then uses the candidate RIFs to develop a novel approach for the quantitative
assessment of the recovery context through the concept of recovery context indicator. This
approach and its operational benefits are further validated by an experiment conducted in
a training facility of an ATC Centre with the participation of 30 operational air traffic
controllers. In addition to the verification of the generic methodology for the assessment of
the recovery context, the experimental data are used to analyse controller recovery
performance and investigate the outcome of the recovery process. The findings obtained
from the experimental investigation are in line with those obtained from past research and
the ATC operational environment.
Acknowledgements
Having started my research at the EUROCONTROL Experimental Centre (EEC) in
Bretigny sur Orge and continued it at Imperial College London, I find it a hard task to name
all those who have contributed to this work. I will nevertheless try; if some names are
not listed, my gratitude to them is no less than to those listed
below.
For help with the funding of my studies, I would like to thank the following organisations:
• EUROCONTROL Experimental Centre (EEC) in Bretigny sur Orge, France for the
award of a graduate internship and a further three-year research studentship;
• Universities UK for the Overseas Research Scheme (ORS) award for three
consecutive years; and
• the Centre for Transport Studies, Department of Civil and Environmental
Engineering, Imperial College London for the contribution to my tuition fee and a
three-year research bursary.
This PhD research would not have been possible without Christian Push and Dirk
Schaefer, who initially invited me to join the EUROCONTROL Human Factors group and to
start developing a research project satisfying both the needs of the EEC and my
own interests. Once started, this collaboration proved to be highly supportive in both
technical and financial terms. As a EUROCONTROL PhD student I had the privilege of
unlimited access to many aviation experts working “in house”: at the EEC, Headquarters
(Belgium), and the Maastricht Upper Area Control (UAC) Centre (Netherlands). Among
these were Nigel Makings, Catherine Gandolfi, Eric Perrin, Deirdere Bonini, Rachael
Gordon, Andrew Harvey, and the entire Gate-to-Gate (G2G) team and controllers involved
in simulations A and B, especially Diarmuid Houlihan ‘Motto’. I thank them all for the fruitful
collaboration. My special gratitude goes to Barry Kirwan and Oliver Straeter, whose
technical assistance and unlimited support were crucial to embarking upon the field of
human reliability, which was completely unknown to me at the beginning of this research. Their
assistance and interest in my research opened many doors and ensured access to information
and professional contacts of the highest quality.
At Imperial College there are many colleagues and research students who offered their
help at various stages and with various aspects of my work. Among them are Jackie Sime, William
Knottenbelt, Dimitri Panagiotakopoulos, Marie-Dominique Dupuy, Umar Bhatti, Victoria
Williams, and Wolfgang Shuster. However, my greatest gratitude goes to Arnab Majumdar
and to my supervisor, Washington Y. Ochieng. They played a critical role in supporting and
supervising my research and in helping it achieve excellence. Thanks to their
understanding, I was able to attend various technical meetings, seminars, conferences, courses,
and simulation trials. These contributed significantly, both directly and indirectly, to the
quality of the research presented in this thesis.
One of the critical parts of the research presented in this thesis would not have been feasible
without the technical support of the Irish Aviation Authority staff, especially Nick Lowth,
Bernard Mackessy, and Garrett MacNamara. My special gratitude goes to Alan
Byrne for making the impossible truly possible and allowing me to successfully complete a
key part of this research.
There are many other people who have helped in various ways. I would like to thank Yvette
Dalle-Mule, Veronique Begault, and Sonja Straussberger from EUROCONTROL EEC.
Furthermore, I would like to thank Rajkumar Pant from the Indian Institute of Technology,
Isa Alkalaj and Marek Bekier from Skyguide, Martin Richards and Vic Burgess from UK
NATS, Christopher Adams from Maastricht UAC, Bob Phillips from CASA Australia, Peter
Nalder from New Zealand Civil Aviation Authority (CAA), Jos Kuijper and Randal de Garis
from EUROCONTROL, Sarah Doherty and Joji Waites from the UK CAA, and Keshava
Sharma from the Airports Authority of India.
I want to thank my friend Tamara Pejovic for all the support that she gave me during the
years I have been working on this thesis. Last but not least, I want to express my deepest
gratitude to my brother and my mother, who have always been my core support in all the
journeys that I have embarked upon. I dedicate this thesis to them.
Table of Contents
DECLARATION
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
1 INTRODUCTION
1.1 Background to the problem
1.2 Research objectives
1.3 Outline of the thesis
2 FUNDAMENTALS OF AIR TRAFFIC MANAGEMENT AND CONTROL
2.1 Air Traffic Management
2.2 Air Traffic Control
2.2.1 Area Control service
2.2.2 Approach Control service
2.2.3 Aerodrome control service
2.3 Overall Air Traffic Control system architecture
2.3.1 Air Traffic Control functionalities
2.3.1.1 Communication function
2.3.1.2 Navigation function
2.3.1.2.1 Approach and landing navigation
2.3.1.2.2 Area navigation
2.3.1.2.3 Systems for control and monitoring of ground-based airport facilities
2.3.1.3 Surveillance function
2.3.1.3.1 Radar systems
2.3.1.3.2 Radar and auxiliary display
2.3.1.3.3 Terminal and ground surveillance
2.3.1.4 Data processing and distribution function
2.3.1.5 Supporting function
2.3.1.6 Safety Nets
2.3.1.7 Power supply
2.3.1.8 Pointing and input devices
2.3.1.9 System control and monitoring function
2.4 Characteristics of the generic Air Traffic Control Centre
2.5 The future of Air Traffic Control
2.5.1 Challenges of automation
2.5.2 Human-centred vs. technology-centred automation
2.5.3 The future of air navigation service
2.5.4 Impact of future ATM/ATC on controller recovery from equipment failures
2.6 Summary
3 PRELIMINARY ASSESSMENT OF EQUIPMENT FAILURES IN AIR TRAFFIC CONTROL
3.1 Definition of equipment failure
3.2 Definition of a hazard
3.3 Supporting data: operational failure reports
3.3.1 Reporting and data collection
3.3.2 Data pre-processing problems
3.3.3 Available operational failure reports
3.4 Methodology to assess the relevance of supporting data
3.4.1 The accident to incident ratio
3.4.2 Units of measurement
3.4.3 The acceptable risk or target level of safety (TLS)
3.4.3.1 Existing standards
3.4.3.1.1 Joint Aviation Authority
3.4.3.1.2 UK Civil Aviation Authority
3.4.3.1.3 International Civil Aviation Organisation
3.4.3.1.4 Summary of the various TLS analyses
3.4.4 Target level of safety and Air Traffic Control risk budgeting
3.4.5 Target level of safety and Air Traffic Control equipment risk budgeting
3.5 Preliminary analysis and validation of operational failure reports
3.6 Summary
4 EQUIPMENT FAILURES AND TECHNICAL DEFENCES IN AIR TRAFFIC CONTROL
4.1 Equipment failure characteristics
4.1.1 ATC functionality affected
4.1.2 Complexity of failure type
4.1.3 Time course of failure development
4.1.4 Duration of failure
4.1.5 Potential causes of equipment failures
4.2 Consequences of equipment failure
4.2.1 Impact on air traffic controller
4.2.2 Impact on operations room
4.2.3 Impact on ATC operations
4.2.4 Impact on ATM operations
4.3 Definition of technical defences (technical recovery)
4.3.1 Defences for recovering from failure (safety devices)
4.3.2 Defences for transmitting information regarding the failure (warning devices)
4.4 Analysis of operational failure reports
4.4.1 Data analysis methodology
4.4.2 Rate of equipment failures
4.4.3 Type of ATC functionality and equipment affected
4.4.4 Complexity of failure type
4.4.5 Severity of equipment failures
4.4.6 Duration of equipment failures
4.4.7 Additional statistical tests
6.7 Methodology for the questionnaire survey data analysis
6.7.1 Data pre-processing for analysis
6.7.2 Characteristics of the sample
6.7.2.1 Sampling per ATC Centre
6.7.2.2 Sampling of air traffic controllers
6.7.3 High-level analyses
6.7.3.1 Experience with equipment failures (Q1)
6.7.3.2 Factors that influence the controller recovery performance (Q2)
6.7.3.3 The most unreliable ATC systems/tools (Q3)
6.7.3.4 Organised exchange of information on equipment failures (Q4)
6.7.3.5 Status and quality of recovery procedures (Q5)
6.7.3.5.1 Other findings regarding the recovery procedures
6.7.3.6 Status and quality of training for recovery (Q6)
6.7.3.6.1 Other findings on training for recovery
6.7.3.7 Other findings on recovery performance
6.7.4 Interaction analyses
6.8 Summary
7 METHODOLOGY FOR A SELECTION OF RELEVANT AIR TRAFFIC CONTROLLER RECOVERY INFLUENCING FACTORS
7.1 Relevance of the recovery context
7.1.1 Example of the recovery context
7.2 Methodology to extract the candidate set of contextual factors
7.2.1 Human Reliability Assessment techniques
7.2.1.1 Human Error in ATM (HERA)
7.2.1.2 Technique for the Retrospective and Predictive Analysis of Cognitive Errors in ATC (TRACEr)
7.2.1.3 Recovery from Automation Failure (RAFT) Tool
7.2.1.4 Recovery from failures: understanding the positive role of human operators during incidents
7.2.1.5 Computerised Operator Reliability and Error Database (CORE-DATA)
7.2.1.6 Technique for Human Error Rate Prediction (THERP)
7.2.1.7 Human Error Assessment and Reduction Technique (HEART)
7.2.1.8 The Contextual Control Model (COCOM)
7.2.1.9 Cognitive Reliability and Error Analysis Method (CREAM)
7.2.1.10 Human Reliability Management System (HRMS)
7.2.1.11 A Technique for Human Event Analysis (ATHEANA)
7.2.1.12 Connectionism Assessment of Human Reliability (CAHR)
7.2.1.13 Nuclear Action Reliability Assessment (NARA)
7.2.1.14 Human Performance DataBase (HPDB)
7.2.1.15 Summary of the findings
7.2.2 Augmentation with equipment-failure related factors
7.2.3 Augmentation with dynamic situational factors
7.2.4 Further subdivision of the identified RIFs
7.3 Definition of qualitative descriptors
7.4 Summary
8 QUANTITATIVE ASSESSMENT OF THE RECOVERY CONTEXT
8.1 Lessons learnt from past research
8.1.1 Application of the CREAM technique
8.1.2 Connectionism Assessment of Human Reliability (CAHR)
8.2 Framework for the methodology for a quantitative assessment of recovery context
8.3 Probabilistic assessment of RIFs (Step 2)
8.3.1 Sources of information
8.3.1.1 Operational failure reports
8.3.1.2 Questionnaire survey
8.3.1.3 Input by ATM Specialists
8.3.1.4 Past literature
8.3.1.5 Aggregation of data
8.4.3 Quantification of RIFs interactions
8.5 Methodology for the determination of the cut-off points (Step 4)
8.6 Specific effects of RIFs on controller recovery performance (Step 5)
8.7 Calculation of the recovery context indicator (Step 6)
8.7.1 Re-calculation of RIF probabilities
8.7.2 Distribution of the recovery context indicator
8.7.3 Sensitivity analysis
8.7.4 Optimal solutions
8.8 Summary
9 EXPERIMENTAL INVESTIGATION OF THE AIR TRAFFIC CONTROLLER RECOVERY PERFORMANCE
9.1 High-level design of the experimental process
9.2 Rationale for the experiment
9.3 Assessment of the available resources
9.4 Planning for the experiment
9.5 Design of the experiment
9.6 Selection of the equipment failure to be simulated
9.7 Pilot study: lessons learnt
9.7.1 Summary of the findings from the pilot study
9.8 Experimental set up
10.3.7.2 Observed behaviour and attitude
10.3.7.3 Additional findings
10.4 Summary
11 CONCLUSIONS
11.1 Revisiting the research objectives
11.2 Conclusions
11.2.1 Literature review
11.2.2 Equipment failure types and their characteristics
11.2.3 Controller recovery performance, recovery context, and influencing factors
11.2.4 Framework for the analysis of controller recovery
11.3 Future work
11.4 Publications relating to this work
11.4.1 Publication format: journal – accepted subject to revision
11.4.2 Publication format: journal – published
11.4.3 Publication format: conference proceedings – published
12 LIST OF REFERENCES
APPENDICES
Appendix I The cost of delays induced by equipment failures
Appendix II Interviews with ATM staff
Appendix III Checklist for the Equipment Failure Scenarios in a specific European ATC Centre – An Aide-Memoire framework
Appendix IV The questionnaire design
Appendix V Example of one questionnaire response
Appendix VI Results extracted from question 5 of the questionnaire survey
Appendix VII Overview of contextual factors
Appendix VIII Probabilities for 20 Recovery Influencing Factors (RIFs)
Appendix IX Questions for the ATM Specialist
Appendix X Overview of RIFs, their corresponding levels, and designated probabilities
Appendix XI Validation of the RIFs interaction matrix
Appendix XII Distribution of 20 Recovery Influencing Factors (RIFs)
Appendix XIII Experimental material
Appendix XIV Overview of RIFs, their corresponding levels, determined in the experimental investigation and probabilities
Appendix XV Distribution of the recovery context indicator captured in the experiment
List of Figures
Figure 1-1 Overview of the thesis
Figure 2-1 Air transport system (from Subotic et al., 2005)
Figure 2-2 Flight profile (adapted from ICAO, 2001b)
Figure 2-3 ATM and ATC system components (adapted from ICAO, 2001a)
Figure 2-4 Communication function
Figure 2-5 Navigational function
Figure 2-6 Surveillance function
Figure 2-7 Data processing and distribution function
Figure 2-8 Supporting function
Figure 2-9 System monitoring and control function
Figure 3-1 Phases of an equipment failure occurrence
Figure 3-2 Different definitions
Figure 3-3 Reporting system
Figure 3-4 “Bathtub” model of reliability for electronic components (Leveson, 1995)
Figure 3-5 Aviation TLS and risk budgeting
Figure 4-1 Safety through design (adapted from Christensen and Manuele, 1999)
Figure 4-2 Technical and human recovery
Figure 4-3 Operational failure reports analyses
Figure 4-4 Total number of equipment failures per flight hours flown in each year for countries A, B, and C
Figure 4-5 Total number of equipment failures per flight hours flown in each year for country D (year 2000 incomplete)
Figure 4-6 Most affected ATC functionality (Country A)
Figure 4-7 Most affected ATC functionality (Country B)
Figure 4-8 Most affected ATC functionality (Country C)
Figure 4-9 Most affected ATC functionality (Country D)
Figure 4-10 Distribution of equipment failures according to their severity
Figure 4-11 Distribution of major equipment failures according to ATC functionality
Figure 4-12 Distribution of the failure duration according to four distinct categories
Figure 4-13 Qualitative equipment failure impact assessment tool
Figure 5-1 Analysis of outcome phase (adapted from EUROCONTROL, 2004e)
Figure 5-2 Recovery process phase model (Kanse, 2004)
Figure 5-3 The Recovery from Automation Failure Tool (RAFT) Framework (EUROCONTROL, 2004e)
Figure 5-4 Model of failure recovery in air traffic control. Where two nodes are connected by an arrow, signs (+, -, 0) indicate the direction of effect on the variable depicted in the right node, caused by an increase in the variable depicted in the left node (Wickens et al., 1998)
The flow diagram of organising a survey
Distribution of world air traffic per region for the years 2003 and 2023 (adapted from Airbus, 2004)
One-page example of the questionnaire
The flow chart of questionnaire survey analyses
Distribution of questionnaire responses per region
Distribution of operational experience
Distribution of air traffic controllers’ ratings
Controllers’ reliance on written procedures throughout the recovery process
Controllers’ reliance on situation-specific problem solving throughout the recovery process
Controllers’ reliance on past experience throughout the recovery process
Distribution of affected ATC functionalities as reported in the questionnaire survey
Methodology to extract a candidate set of RIFs
Framework for the quantitative assessment of the recovery context
Distribution of RIF5 levels amongst identified recovery contexts without interactions
Distribution of RIF5 levels amongst identified recovery contexts with interactions
Distribution of RIF1 levels amongst identified recovery contexts with interactions
Distribution of RIF20 levels amongst identified recovery contexts with interactions
Distribution fitting for the three cut-off points on the example of RIF5 Level 1
Cubic polynomial function f(x) fitted for the RIF5 to determine its minimum
Distribution of the recovery context indicator
The flow diagram of experimental investigation
Timeline of the experiment
Room setup
The visual representation of equipment failure on CWP: a) before the failure, b) after the failure
Framework for the analysis of experimental results
Distribution of operational experience
Distribution of controllers’ ratings
Distribution of the recovery context indicator in the experiment
Distribution of the recovery context indicator in the experiment with an increased value of the coefficient of interaction
Distribution of the recovery context indicator of 30 controllers
Recovery steps performed by each participant
Distribution of required recovery steps (S1 to S17)
Distribution of recovery effectiveness per category
Distribution of recovery duration
Distribution of the recovery outcome
Figure 10-12 Recovery phases, their corresponding influencing factors, and required recovery steps
List of Tables
Summary of available data, number of reports, and equipment failure incidents per country
Summary of various analyses on aviation TLS
Analysis of operational failure reports and results
Examples of equipment failures related to different ATC system functionalities (as defined in Chapter 2)
UK NATS severity rating (from NATS, 2002)
Country C’s severity rating as defined by its CAA
Country D severity rating as defined by the particular ATC Centre
Severity rating defined in this research and mapped with available sources
Most affected ATC equipment (Country A)
Most affected ATC equipment (Country B)
Most affected ATC equipment (Country C)
Most affected ATC equipment (Country D)
Summary of five ATC equipment types most affected by failures
Percentage of the multiple failure occurrences reported in the available datasets
Summary of five most affected equipment types from four datasets
Distribution of major failures lasting up to 15 minutes per ATC equipment affected
Statistical tests and results obtained
Main findings regarding interaction between ATC functionality and severity
Review of equipment failure characteristics with regard to their impact on ATC operations
Detailed overview of the primary and the secondary group of ATC functionalities
Phases of the recovery process identified in past research
Summary of relevant models of the human recovery process
Summary of the questionnaire survey sample
Mapping between most unreliable ATC functionalities and existing recovery procedures for sampled worldwide countries
Existence of recovery procedures, recovery training, and recurrent training as reported in the questionnaire survey
Interaction matrix
Statistical tests and results obtained
Factors influencing recovery from failures (from Kanse and van der Schaaf, 2000)
Factors influencing human actions in THERP (cited in Straeter, 2000)
Review of Human Reliability Assessment (HRA) techniques and relevant findings
Recovery Influencing Factors
Relevant recovery influencing factors and their corresponding qualitative descriptors
Overview of CREAM and CAHR differences
Distribution of probabilistic RIF ratings per source
ATM specialists involved in the assessment of RIFs
Overview of the sources of information used to determine RIF probabilities
Example of a potential recovery context represented as a 20-digit array
Interaction matrix: (1) validation by CREAM, (2) validation by CAHR, (3) validation by ATM specialists; and (x) not validated interactions
Mapping between RIFs and CAHR contextual factors
Recovery context (as presented in Table 8-5) after the incorporation of RIF interactions
Descriptive statistics for the three cut-off points on the example of RIF5 Level 1
Local minimums of polynomial functions
Cut-off points between the levels for all RIFs
Probabilities for the RIF5 and each of its levels (see Appendix VII)
Sensitivity analysis
Training, pilot study, and experiment sessions
Overview of the potential equipment failures to be simulated and their inclusion in the pilot study
Equipment failures used in the pilot study
The mapping between exercise characteristics and the controllers observations
Equipment failure in the experimental study
Availability of functions in the reduced flight data processing mode
Overview of independent and dependent variables
Overview of independent and extraneous variables
Overview and description of required recovery steps
Recovery process and its three main tasks
Characteristics of a sample of controllers participating in experiment
Verification of RIFs probabilities from a ‘generic’ approach (Chapter 8) and the experiment
Summary of RIFs defined through a single corresponding level
Verification of the distribution of the recovery context indicator obtained from a ‘generic’ approach (Chapter 8) and the experiment
A review of RIFs with the potential for recovery enhancement
A review of the proposed recovery solutions
Percentage of performed recovery steps in three experimental sessions
Comparison of recovery durations between three experimental sessions
Statistical tests and results
The outcome of the recovery process matrix (S stands for successful, T for tolerable, and U for unsuccessful recovery)
Statistical tests and results
Summary of additional findings
List of Abbreviations
ACAS Airborne Collision Avoidance System
ACC Area Control Centre
ADREP Accident/Incident Reporting
ADS Automatic Dependent Surveillance
ADS-B Automatic Dependent Surveillance-Broadcast
ADS-C Automatic Dependent Surveillance-Contract
AFTN Aeronautical Fixed Telecommunication Network
A/G Air-Ground communication
AGDP Air Ground Data Processor
AGL Aeronautical Ground Lighting
AIAA American Institute of Aeronautics and Astronautics
AIS Aeronautical Information Service
AMAN Arrival Manager
ANSP Air Navigation Service Provider
APP Approach Control Office
APR Automatic Position Reporting
APW Area Proximity Warning
ARO Air traffic services Reporting Office
ARTCC Air Route Traffic Control Centre
ASAS Airborne Surveillance and Separation Assurance
ASM Airspace Management
ASMT ATM Safety Monitoring Tool
ASMT Automatic Safety Monitoring Tool
ASTERIX All Purpose STructured Eurocontrol Radar Information Exchange
ATC Air Traffic Control
ATCT Air Traffic Control Tower
ATFM Air Traffic Flow Management
ATHEANA A Technique for Human Event Analysis
ATIS Automatic Terminal Information Service
ATM Air Traffic Management
ATS Air Traffic Service
AWOP All-Weather Operations Panel
BBN Bayesian Belief Network
BEST Beginning to End Skills Trainer
BEVOR German special occurrences database
CAA Civil Aviation Authority
CAHR Connectionism Assessment of Human Reliability
CATIS Computerised Automatic Terminal Information Service
CC Contextual Condition
CLAM Cleared Level Adherence Monitoring
CEATS Central European Air Traffic Services
CFMU Central Flow Management Unit
CMS Control and Monitoring System
CNS Communication Navigation Surveillance
COCOM Contextual Control Model
CORE-DATA Computerised Operator Reliability and Error Database
CPC Common Performance Condition
CPDLC Controller Pilot Data Link Communication
CPM Common Performance Modes
CRDS CEATS Research, Development and Simulation
CREAM Cognitive Reliability and Error Analysis Method
CS Commercial Service
CWP Controller Working Position
DARC Direct Access Radar Channel
DMAN Departure Manager
DME Distance Measuring Equipment
EASA European Aviation Safety Agency
ECAC European Civil Aviation Conference
ECSS European Cooperation for Space Standardisation
EGNOS European Geostationary Navigation Overlay Service
EOC Errors of Commission
EOO Errors of Omission
EPC Error Producing Condition
ESA European Space Agency
ESARR EUROCONTROL Safety Regulatory Requirements
ET Event Tree
EU European Union
EUROCONTROL European Organisation for the Safety of Air Navigation
FAA Federal Aviation Administration
FANS Future Air Navigation System
FDPD Flight Data Processing and Distribution
FDPS Flight Data Processing System
FIR Flight Information Region
FIS Flight Information Service
FL Flight Level
FMEA Failure Mode and Effect Analysis
FMECA Failure Modes, Effects, and Criticality Analysis
FMS Flight Management System
FPP Flight Plan Processing
FPS Flight Progress Strips
FT Fault Tree
G2G Gate to Gate
G/G Ground-Ground communication
GLONASS Global Orbiting Navigation Satellite System
GNSS Global Navigation Satellite Systems
GPS Global Positioning System
HEART Human Error Assessment and Reduction Technique
HEIDI Harmonisation of European Incident Definition Initiative
HEP Human Error Probability
HFACS Human Factors Analysis and Classification System
HERA Human Error in ATM Project
HF High Frequency
HF DL High Frequency Data Link
HMI Human Machine Interface
HPDB Human Performance DataBase
HRA Human Reliability Assessment
HRMS Human Reliability Management System
IANS Institute of Air Navigation Services
IC Intercom
Ic Recovery Context Indicator
ICAO International Civil Aviation Organization
IEC International Electrotechnical Commission
IEEE Institute of Electrical and Electronics Engineers
IFR Instrument Flight Rules
ILS Instrument Landing System
IMC Instrument Meteorological Conditions
IMC Industry Management Committee
INS Inertial Navigation Systems
IP Interphone
IRS Incident Reporting System
ISO International Organisation for Standardisation
JAA Joint Aviation Authority
JAR Joint Aviation Regulations
JHEDI Justification of Human Error Data Information
M Mean
MAESTRO Means to Aid Expedition and Sequencing of Traffic with Research and Optimisation
MANTAS Maastricht ATC New Tools And Systems MATS Manual of Air Traffic Services MDT Mean Down Time MET Meteorological METAR Meteorological Aerodrome Report Mil Military MLS Microwave Landing System MMI Man Machine Interface MMS Man Machine System MONA MONitoring Aids MORS Mandatory Occurrence Reporting Scheme MRP Multi Radar Processing MSAW Minimum Safe Altitude Warning MSL Mean Sea Level MTBF Mean Time Between Failure MTBM Mean Time Between Maintenance MTCD Medium Term Conflict Detection MTTR Mean Time To Repair MUAC Maastricht Upper Area Control Centre NATSPG North Atlantic Systems Planning Group MTOW Maximum Take Off Weight
NARA Nuclear Action Reliability Assessment
NAIPS National Aeronautical Information Processing System
NAS National Airspace System
NASA National Aeronautics and Space Administration
NATS National Air Traffic Services
NUCLARR Nuclear Computerised Library for Assessing Reactor Reliability
NDB Non-Directional Beacon
NLR National Aerospace Laboratory
NOTAM Notice to Airmen
NTL National Transportation Library
NTSB National Transportation Safety Board
OJT On-the-Job Training
OLDI On-line Data Interchange
OS Open Service
PABX Private Automatic Branch Exchange
PAR Precision Approach Radar
PARM Parallel Approach Runway Monitor
PPS Precise Positioning Service
PRA Probabilistic Risk Assessment
PRNAV Precision aRea NAVigation
PRS Public Regulated Service
Proc Procedural control
PRS Primary Radar Service
PSA Probabilistic Safety Assessment
PSF Performance Shaping Factor
PSR Primary Surveillance Radar
PTT Press To Talk
QRA Quantitative Risk Assessment
RAFT Recovery from Automation Failure Tool
RAM Route Adherence Monitoring
RCP Required Communication Performance
RDP Radar Data Processing
RDPS Radar Data Processing System
RDR Radar
RGCSP Review of the General Concept of Separation Panel
RIF Recovery Influencing Factor
RIMCAS Runway Incursion Monitoring and Conflict Alert System
RNP Required Navigation Performance
RSP Required Surveillance Performance
RT Radio Telephony
RTCA Radio Technical Commission for Aeronautics
RVSM Reduced Vertical Separation Minima
RVR Runway Visual Range
RWY Runway
SAR Special Administrative Region
SAR Search And Rescue
SAS Situational Awareness for Safety
SATCOM SATellite COMmunication
SHAPE Solutions for Human Automation Partnership in European ATM
SBAS Satellite-Based Augmentation Systems
SBJ Supersonic Business Jet
SD Standard Deviation
SE Standard Error
SEP Safety and Emergency Procedures
SES Single European Sky
SID Standard Instrument Departure
SME Subject Matter Expert
SMC Surface Movement Control
SMR Surface Movement Radar
SNET Safety Nets
SoL Safety-of-Life
SOR Stimulus-Organism-Response
SPS Standard Positioning Service
SRG Safety Regulatory Group
SRK Skill Rule Knowledge
SRP Single Radar Processing
SRU Safety Regulatory Unit
SSR Secondary Surveillance Radar
STAR Standard Terminal Arrival Route
STCA Short Term Conflict Alert
SUA Special Use Airspace
SYSCO System Supported COordination
TACAN TACtical Air Navigation
THERP Technique for Human Error Rate Prediction
TAR Terminal Approach Radar
TCAS Traffic Alert and Collision Avoidance System
TID Touch Input Device
TRACON Terminal Radar Approach CONtrol
TIP Touch Input Panels
TLS Target Level of Safety
TRACEr Technique for the Retrospective and Predictive Analysis of Cognitive Errors in ATC
TRUCE TRaining for Unusual Circumstances and Emergencies
TRM Team Resource Management
TTA Time To Alert
TWR Aerodrome Control Tower
TWY Taxiway
UAV Unmanned Aerial Vehicles
UHF Ultra High Frequency
UPS Uninterruptible Power Supply
US United States
UTC Coordinated Universal Time
VDL Very high frequency Data Link
VFR Visual Flight Rules
VHF Very High Frequency
VMC Visual Meteorological Conditions
VOR VHF Omnidirectional Range navigation system
VORTAC VHF Omnidirectional Range/TACtical Air Navigation
VSCS Voice Switching Communication System
WAAS World Aircraft Accident Summary
Chapter 1 Introduction
1 Introduction
The aim of this Chapter is to present the background to the problem of controller
recovery from equipment failures in Air Traffic Control (ATC) and to set the scene for
the research presented in this thesis. This Chapter defines the rationale behind the
need to better understand the impact that equipment failures have on controller
performance in the current as well as in the future ATC environment. Based on this
background, the principal research objectives are defined to ensure an in-depth
analysis of ATC equipment failures and controller recovery. This is followed by an
outline of the structure of the thesis and a summary of each Chapter.
1.1 Background to the problem
The aim of the research presented in this thesis is to provide a holistic assessment of
controller recovery from equipment failures in ATC. In order to achieve this, it is
essential to define the environment in which equipment failures are investigated, i.e.
the Air Traffic Management (ATM) system and its ATC component. While ATC is
responsible for the separation of air traffic, other components of the ATM system
manage air traffic flow and airspace design to assure minimal delays and optimal use
of airspace. The ATC system comprises people, equipment, and procedures that must
act together to achieve a common objective, i.e. a safe and efficient flow of air
traffic in a dedicated airspace. To achieve this, all three components must be
operational and fully integrated to enable the most effective and efficient air traffic
service. Nevertheless, if any component of the ATC system fails, the
remaining nominally operational components may still provide air traffic services, either
partially or fully, depending on the characteristics of the failure. The research presented
in this thesis focuses solely on failures of one component of the ATC system, namely
equipment.
In order to provide continuous air traffic services, various ‘defences’ or ‘barriers’ are
designed to prevent equipment failures or to mitigate their consequences. For example,
built-in technical defences offer protection against the majority of
equipment failures that can occur (NATS, 2002). In most cases, this protection is
triggered automatically and seamlessly. Hence, an equipment failure should not
impair the controller’s ability to carry out tasks safely, as such failures should be
resolved automatically with no interruption of the service (EUROCONTROL,
2004e). However, there are occasions when these technical defences are not sufficient
to maintain the normal ATC system state and protect against negative outcomes. On
such occasions, the intervention of the human, as a component of the ATC system, is
necessary. In other words, the intervention of the air traffic controller becomes crucial
for the provision of a safe, though not necessarily efficient, air traffic service. Note that
safety, rather than efficiency, is the key driver here.
In the past, major failures or total outages (i.e. failure of the entire system) were the
subject of detailed investigations. These investigations were aimed at resolving and
preventing similar failure occurrences by focusing mostly on the technology (National
Transportation Safety Board, 1996; General Accounting Office, 1982; General
Accounting Office, 1991; General Accounting Office, 1996; and General Accounting
Office, 1998). For a long time, the focus of reliability, system safety, and quality
management was purely on the prevention of equipment failures or the reduction of
their recurrence. For example, the US Federal Aviation Administration (FAA) requires
that the availability of the Voice Switching Communication System (VSCS) at the level
of the ATC Centre (facility-level1), including the backup VSCS, be no less than
0.9999999 (FAA, 1997). Various techniques have also been developed to assess
equipment failures, their causes, consequences, and appropriate defences. Despite
these significant efforts, equipment failures still occur, and every ATC system
eventually fails to perform its intended function, or part thereof. On these unexpected
occasions, recovery of the ATC system depends on the human operator implementing
an appropriate recovery strategy in a timely and effective manner. While past
research has focused on the technical aspects of equipment failures, very little has
addressed human factors, in particular controller recovery from such failures. The few
exceptions, such as research by Wickens et al. (1998), Low and Donohoe (2001), and
EUROCONTROL (2004e), are discussed in the following paragraphs.
1 The facility-level availability is based on a 50-position system. According to the FAA, system failure occurs when one or more critical functions are unavailable in more than 10 percent of the
positions.
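To make the availability requirement above more tangible, the following Python sketch converts an availability figure into the cumulative downtime it permits per year and encodes the failure criterion stated in footnote 1. It is a minimal illustration under stated assumptions: a 365.25-day year, the standard steady-state relation A = MTBF / (MTBF + MTTR), and a literal reading of the 10 percent rule. The function names are hypothetical and are not taken from the FAA specification.

```python
# Worked check of the facility-level availability figure (FAA, 1997) and the
# footnoted failure criterion. All function names are hypothetical.

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: 365.25-day year (31,557,600 s)


def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Standard steady-state availability A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def max_downtime_seconds(availability: float, period_s: float = SECONDS_PER_YEAR) -> float:
    """Longest cumulative outage over period_s that still meets the availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must lie in [0, 1]")
    return (1.0 - availability) * period_s


def is_system_failure(positions_lacking_critical_function: int, total_positions: int = 50) -> bool:
    """True when a critical function is unavailable at more than 10 percent of positions.

    The integer comparison 10 * k > n avoids floating-point rounding at the threshold.
    """
    return 10 * positions_lacking_critical_function > total_positions


if __name__ == "__main__":
    print(f"{max_downtime_seconds(0.9999999):.2f} s per year")
    print(is_system_failure(5), is_system_failure(6))
```

Under these assumptions, an availability of 0.9999999 leaves roughly 3.16 seconds of cumulative outage per year, and a 50-position system counts as failed once six or more positions lose a critical function.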
There is a substantial body of Human Reliability Assessment (HRA) research on recovery
from human error in areas such as the nuclear and chemical process industries.
However, this knowledge has not yet been fully exploited in aviation. For example, Zapf
and Reason (1994), Kontogiannis (1999), Kanse and van der Schaaf (2000), and
Kanse (2004) analysed recovery from the consequences of human error in various
non-ATC environments. Moreover, past HRA research has recognised the importance of
contextual factors that influence the recovery process. Various HRA techniques define
these factors according to the type of operation and the environment that surrounds the
human operator. These concepts of recovery from human error and of recovery
context are transferable to recovery from equipment failure, since both represent human
recovery occurring within a certain context, albeit triggered by different stimuli (human
error as opposed to technical failure).
The above findings led to a significant research effort being devoted to the area of
human recovery, from both human error and technical faults. For example, research on
automation in future ATM has shown that human operators are less likely to detect
failures in the automated process due to complacency and reduced situational
awareness (Wickens et al., 1998; Metzger and Parasuraman, 2005). Researchers at
the UK National Air Traffic Services (NATS) examined potential methodologies to
assess human recovery performance from failures of several automated systems (Low
and Donohoe, 2001). Several safety methods (e.g. Hazard and Operability study, HAZOP)
and psycho-physiological methods (e.g. eye movement tracking, situational awareness
assessment using SAGAT, subjective workload ratings using NASA TLX, and speech
workload) were investigated. While some of these methods are quite easy to implement
(e.g. HAZOP, SAGAT, NASA TLX), others require complex training and sophisticated
equipment (e.g. eye movement tracking, speech workload). Most of these methods
proved appropriate and informative, and were therefore recommended for future use.
However, owing to the confidential nature of this research, no further insight was given
into the human recovery process, its phases, or the impact of the context surrounding
the controllers.
Furthermore, the EUROCONTROL Gate to Gate (G2G) project, initiated to test future
advanced ATC concepts, underlined the impact and importance of ATC
equipment failures. ATC safety managers throughout Europe identified several
equipment-related areas of concern within their ATC Centres (Gordon and Makings,
2003): radio communication interference, equipment reliability, ATC tool
failures, and the relevance of emergency checklists for controllers and the appropriate
handling of emergency situations. This study drew attention to the consequences of
equipment unavailability in current as well as in future, more automated ATC environments.
Subsequent simulation trials attempted to identify and investigate safety-relevant
occurrences associated with future ATC concepts/tools (Medium Term Conflict
Detection-MTCD, MONitoring Aid-MONA, data link, Arrival Manager-AMAN, and
Airborne Separation Assistance System-ASAS). Various equipment failures were
identified amongst the potential safety-relevant occurrences2. They ranged from
problems with the Human Machine Interface (HMI) to problems with ASAS and data link
messages (Damidau, Kirwan, and Scrivani, 2006).
However, few studies have explicitly addressed equipment failures and recovery
jointly in ATC. The Panel on Human Factors in Air Traffic
Control Automation was formed at the request of the Federal Aviation Administration
(FAA) to study the air traffic control system, the national airspace system, and future
automation alternatives from a human factors perspective (Wickens et al., 1998). The
Panel’s deliberations highlighted, in particular, the role of automation reliability and
human recovery in the future ATC environment, characterised by higher levels of
automation, complexity, and traffic density. Similarly, the EUROCONTROL project on
Solutions for Human Automation Partnership in European Air Traffic Management
(SHAPE) dedicated one part to the analysis of human recovery from equipment failures
in the automated ATC environment. The findings highlighted the importance of the context
within which a failure occurs, as well as of recovery training and procedures designed to
aid recovery (EUROCONTROL, 2004e).
Overall, existing research has shown that there is a need to understand the
mechanisms behind failure and recovery in ATC. This applies both to the technical and
human perspectives as both are essential to ensuring the highest level of safety. In
order to develop a heuristic method to address these issues, it is necessary to define
the major research objectives. These are presented below.
2 Personal correspondence with EUROCONTROL G2G project team.
1.2 Research objectives
The need for an in-depth analysis of ATC equipment failures and the associated
controller recovery processes is outlined briefly above and is discussed in more
detail in the remainder of the thesis. Based on this background, four research
objectives have been formulated:
• Provide a systematic literature review connecting the disparate but related topics of
ATC equipment failures and controller recovery, which has previously been lacking in ATC;
• Identify potential equipment failure types and their characteristics;
• Identify contextual factors that affect controller recovery performance and derive
a methodology to quantitatively assess recovery context; and
• Propose a framework for the analysis of controller recovery, and verify this framework
with specific reference to a particular equipment failure type.
1.3 Outline of the thesis
This thesis is organised as follows. Chapter 2 discusses the architecture of the Air
Traffic Management (ATM) system with specific attention paid to its Air Traffic Control
(ATC) component, to portray the context of the research presented in this thesis. The
ATC architecture is presented in terms of nine functionalities and the corresponding
physical architecture, i.e. the equipment that supports each of them. Chapter 3 presents
a preliminary assessment of the equipment failures in ATC based on the sample of operational
failure reports available in this research. It provides definitions of equipment failure,
hazards, and built-in technical defences to be used in the research on recovery from
equipment failures in ATC. The Chapter continues by assessing how representative
the sample is of equipment failures occurring in the operational ATC environment. This is
achieved through a methodology that determines how much ATC equipment contributes
to the safety of the overall air transport system.
Having confirmed that the operational failure reports available in this thesis are
representative of the equipment failure types experienced operationally, Chapter 4
examines equipment failures and their impact on ATM and
ATC operations. It discusses the main equipment failure characteristics extracted from
available operational failure reports and past research. Assessed characteristics range
from the ATC functionality affected to the impact of equipment failure on ATC and ATM
operations. The Chapter concludes with the development of a novel tool for the
assessment of the overall impact of an equipment failure on ATC operations, known as
the qualitative equipment failure impact assessment tool.
Having established the framework for the assessment of equipment failures in
Chapters 3 and 4, Chapter 5 addresses the human factors aspects of relevance to
controller recovery performance in the event of an equipment failure. It discusses past
research on human reliability transferable to controller recovery performance. The
Chapter presents the initial theoretical findings on the recovery process, including the
relevance of the recovery context, past experience, recovery procedures, and recovery
training. It concludes by defining the potential variables that enable the assessment
and understanding of controller recovery performance.
The theoretical findings from Chapter 5 are further informed by the operational
experience extracted from the questionnaire survey results presented in Chapter 6.
This survey informed both the technical and human aspects of the research into
recovery from ATC equipment failures.
Having acknowledged the importance of recovery context both from past research
(Chapter 5) and operational experience (Chapter 6), this thesis continues by setting the
scene for the qualitative and quantitative assessment of the recovery context. Chapter
7 reviews past ATC and non-ATC research to extract the relevant factors important for
the definition of the context surrounding an ATC equipment failure occurrence. As a
result, this Chapter concludes with a set of 20 candidate Recovery Influencing Factors
(RIFs). Chapter 8 reviews relevant past research to build further on the findings of
Chapter 7. It then defines the methodology for the quantitative assessment of
the recovery context and the recovery context indicator.
To verify the methodology proposed in Chapter 8, Chapter 9 presents the
design of an experiment carried out at a particular ATC Centre, in which
30 operational controllers were exposed to an unexpected and complex equipment failure.
This particular equipment failure was carefully selected from several failure types based on
the findings in Chapters 4, 5, and 6. The analyses of the data collected on recovery
performance from this experiment are presented in Chapter 10. These analyses are
based on a set of variables that enable investigation of controller recovery as proposed
in Chapter 5. The thesis ends with Chapter 11, which draws together the conclusions
of this research and suggests areas for further research.
Figure 1-1 summarises the overall structure of this thesis.
Figure 1-1 Overview of the thesis
Chapter 2 Fundamental of ATM and ATC
2 Fundamentals of Air Traffic Management and Control
The main objective of the research presented in this thesis is to investigate the
recovery process adopted by air traffic controllers in the event of Air Traffic Control
(ATC) equipment failures. A further objective of this research is a framework for
analysing controller recovery that is transferable in time (i.e. applicable to both current
and future ATC Centres). The Chapter contributes to this objective in several ways. Firstly, it
defines the environment for the investigation of equipment failures, i.e. Air Traffic
Management (ATM) and its component ATC. Secondly, it discusses the ATC system
architecture including its specific functional elements. The Chapter proposes a unique
classification of equipment failures based on these functional elements that enables the
capture of all operational components of ATC. This classification is further built upon in
the remainder of the thesis (Chapter 4) to create a qualitative equipment failure impact
assessment tool. Thirdly, the Chapter reviews the characteristics of a generic ATC
Centre with regard to current and future technologies. The potential characteristics of
future ATC Centres are discussed with an emphasis on challenges that face human
operators (i.e. air traffic controllers) due to increasing levels of automation. The
Chapter concludes with discussions on the potential sources of technical and controller
performance deficiencies within future ATC Centres and their relevance to the recovery
process.
2.1 Air Traffic Management
The major components of the air transport system are aircraft, airline operations, ATM,
airport operations, and the operational environment in which these components exist
and interact (Figure 2-1). The objective of ATM is “to enable aircraft operators to meet
their planned times of departure and arrival and adhere to their preferred flight profiles
with minimum constraints, without compromising agreed levels of safety”
(EUROCONTROL, 2006a).
Figure 2-1 Air transport system (from Subotic et al., 2005)
An ATM system comprises two functionally integrated elements, namely airborne ATM
and ground-based ATM. The airborne ATM consists of several systems integrated into
the aircraft cockpit, such as the airborne Communication/Navigation/Surveillance
(CNS) system, the Flight Management System (FMS), and the Airborne Collision
Avoidance System (ACAS), also known as the Traffic Alert and Collision Avoidance
System (TCAS). The components of ground-based ATM (Figure 2-1) are Airspace
Management (ASM), Air Traffic Service (ATS), and Air Traffic Flow Management
(ATFM) (ICAO, 2001a).
Airspace Management (ASM) concerns the structure and organisation of national
airspace, which is managed at strategic (i.e. national ASM policy, planning, and coordination),
pre-tactical (i.e. daily management and temporary allocation of airspace), and tactical
levels (i.e. real-time activation, deactivation, reallocation of airspace, and civil/military
coordination). Air Traffic Service (ATS) is a generic term that combines various
services: the Air traffic services Reporting Office (ARO), the Air Traffic Control service
(ATC), and the Flight Information and alerting Service (FIS) (ICAO, 2001a). The ARO is
a unit established for the purpose of receiving reports concerning air traffic services
and flight plans submitted before flight departure. The ATC component of ATS provides
control of all air traffic in a dedicated airspace. This is discussed in detail in section 2.2
given its importance to the research presented in this thesis. The Flight Information and
alerting Service (FIS) gives advice and information useful for the safe and efficient
conduct of flights. The alerting service provides search and rescue assistance to
aircraft in distress and coordinates any action that may be required. Finally, Air Traffic
Flow Management (ATFM) is a service established to ensure that ATC capacity is
utilised to the maximum extent possible, and that the traffic volumes are compatible
with the capacities declared by the appropriate authority. Optimal flow of traffic is
achieved by continuously balancing the traffic demand and the ability of ATC to
accommodate that demand.
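The decomposition described in this section can be summarised as a simple tree. The Python sketch below is a hypothetical representation: the dictionary layout, the "ACAS/TCAS" label, and the helper names are illustrative assumptions rather than part of any ICAO data model. It records the airborne and ground-based elements named above and traces where ATC sits within ATM.

```python
from __future__ import annotations

# Hypothetical tree of the ATM decomposition in section 2.1 (after ICAO, 2001a).
# Keys are component names; values hold the sub-components (empty dict = leaf).
ATM_STRUCTURE: dict = {
    "ATM": {
        "Airborne ATM": {"Airborne CNS": {}, "FMS": {}, "ACAS/TCAS": {}},
        "Ground-based ATM": {
            "ASM": {},
            "ATS": {"ARO": {}, "ATC": {}, "FIS": {}},
            "ATFM": {},
        },
    }
}


def path_to(component: str, tree: dict = ATM_STRUCTURE, prefix: tuple = ()) -> list[str] | None:
    """Return the chain of components from the root down to `component`, or None."""
    for name, children in tree.items():
        path = prefix + (name,)
        if name == component:
            return list(path)
        found = path_to(component, children, path)
        if found is not None:
            return found
    return None


def leaves(tree: dict = ATM_STRUCTURE) -> list[str]:
    """Return the lowest-level components in depth-first order."""
    result: list[str] = []
    for name, children in tree.items():
        result.extend(leaves(children) if children else [name])
    return result


if __name__ == "__main__":
    print(" > ".join(path_to("ATC")))
```

For example, `path_to("ATC")` returns the chain ATM, Ground-based ATM, ATS, ATC, which is the branch followed in the remainder of this Chapter.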
2.2 Air Traffic Control
The research presented in this thesis is focused specifically on controller recovery from
equipment failures in Air Traffic Control (ATC). Therefore, this section focuses on the
main characteristics of ATC and the different services provided. Modern ATC services
are provided from ATC Centres by controllers and supporting staff (engineers,
managers, and administrators), working together to achieve the same objective. The
primary objective of an ATC service is to provide a safe flow of traffic both in the air and
on the ground (EUROCONTROL, 1999). In other words, the primary function is to
prevent collision between aircraft in the air as well as collision between aircraft and any
obstacles on the manoeuvring area, by providing and maintaining the required lateral
and vertical separations. The secondary function of an ATC service includes ensuring
orderly and expeditious traffic flow by providing traffic advisories, such as weather
information and navigation directions (i.e. vectors). To achieve these functions, the
service is divided into sections that provide an ATC service to aircraft depending on the
segment of the flight profile, i.e. phase of flight (Figure 2-2). According to the
International Civil Aviation Organisation (ICAO)1, ATC provides area, approach, and
aerodrome control services. These are discussed in the following sections.
Figure 2-2 Flight profile (adapted from ICAO, 2001b)
1 ICAO is the specialised agency of the United Nations concerned with the development of air
navigation and regulation of international air transport.
2.2.1 Area control service
The area control service is provided from an Area Control Centre (ACC), as defined by
ICAO. In the US, the equivalent facility is an Air Route Traffic Control Center
(ARTCC), as defined by the Federal Aviation Administration (FAA). The controllers
at ACCs provide instructions, clearances, and advice regarding flight conditions during
the cruise phase of the flight (see Figure 2-2). The controllers provide separation
between aircraft operating in the complex network of airways (predetermined air
routes). The controllers use radar to monitor the progress of flights and intervene when
the route or flight level of an aircraft brings it into conflict with another. This is achieved
through tactical air traffic control interventions such as heading or track change, flight
level change, speed control, or alteration of flight routes. In areas where it is impossible
to provide a radar service (i.e. oceanic airspace and other regions without radar
coverage), the controllers employ procedural (i.e. non-radar) control to ensure that
adequate separation exists between aircraft. Procedural control employs greater
separation standards because of the absence of direct radar surveillance (Nolan, 1998;
EUROCONTROL, 1999).
An ACC is usually sub-divided into controlled airspace sectors2 that have responsibility
for specific portions of airspace. This is a direct result of the large volumes of air traffic
that utilise the airspace in the cruise phase of the flight. The greater airspace is
sectorised into smaller, more manageable parts in an effort to prevent controller
overload (i.e. when the traffic in a sector exceeds available airspace capacity or a
controller is unable to safely control existing levels of air traffic).
2 Airspace is organised into adjacent portions, the so-called sectors, controlled by two or three
controllers, namely the executive or tactical controller, the planning controller, and the assistant or flight data controller.
Generally, each ATC sector is staffed by an executive controller and a planning controller,
each with clearly defined roles and responsibilities (EUROCONTROL, 1999). In cases of
high traffic complexity, the two sector controllers are supported by a third person,
i.e. an assistant or a flight data controller. The executive controller is responsible for the
correct identification of traffic within the sector’s area of responsibility and for the
control of all aircraft to ensure a safe, orderly, and expeditious flow of air traffic.
Additionally, the executive controller is required to provide pilots with navigation
assistance as needed and to assist aircraft in any emergency situation. The planning
controller assists the executive controller to the fullest extent by identifying traffic in
potential conflict, managing flight progress strips, and planning the flow of traffic within
the sector. In addition, the planning controller must ensure that traffic enters and
leaves the sector at the flight levels and exit points agreed with the adjacent sectors
(EUROCONTROL, 1999). The assistant or flight data controller ensures that the strip
printer functions properly. In addition, the assistant accepts and processes all received
messages in a timely manner, passes them to the appropriate position, and manually
inputs any tracks for which flight progress strips have not been produced.
The controllers operating in the sectors within an ACC work in close
cooperation and negotiate with each other on behalf of aircraft to optimise efficiency and
ensure safety. The area controller’s responsibility terminates when an aircraft is handed
over to an adjacent ACC or to an approach control office.
2.2.2 Approach control service
The approach control service is provided from the APProach control office or room
(APP), as defined by ICAO, or from the Terminal Radar Approach CONtrol (TRACON), as
defined by the FAA. According to ICAO (2001a), the approach control unit is
established to provide air traffic control service to controlled flights arriving at, or
departing from, one or more airports. This service is closely associated with the
characteristics of the airports. The radar controllers in the approach control office
provide separation between aircraft in descent during the arrival phase, and, during the
departure phase, between aircraft climbing to their assigned cruise or intermediate
assigned levels (see Figure 2-2). Therefore, the approach controllers are responsible
for providing a safe and expeditious service to departing aircraft in the initial phase of
flight and to arriving aircraft in the descent and final phases of flight (Nolan, 1998;
EUROCONTROL, 1999). The approach controller’s responsibility terminates when a
departing aircraft is handed over to an ACC or when an arriving aircraft has landed. Note
that APP is responsible for monitoring approaching aircraft, even after they are
transferred to the aerodrome control tower, until they land.
2.2.3 Aerodrome control service
The aerodrome control service is provided from the Aerodrome Control Tower (TWR),
as defined by ICAO, or the Air Traffic Control Tower (ATCT), as defined by the FAA. The
aerodrome controllers are responsible for the safe and efficient conduct of flights during
the take-off and landing phases. These controllers direct airport traffic so that it flows
smoothly and expeditiously. Working closely with the approach controller, they ensure
safety of airport operations by restricting traffic movements so that only one aircraft
may land or take off at a time (Nolan, 1998; EUROCONTROL, 1999). At airports that
use multi-runway operations, the aerodrome controller may be responsible for all
runway operations. Otherwise, the responsibility for multi-runway operations may be
divided between a number of controllers. For example, a parallel runway configuration,
where one runway is dedicated to departures and the other to arrivals, requires
separate departure and arrival controllers. In this case, close cooperation between the
two controllers is essential to ensure safe operations.
The aerodrome controller is responsible for all traffic operating in the designated area
of responsibility of the control tower. This includes aerodrome circuit traffic, aircraft
landing and taking off, and aircraft and vehicles operating on the manoeuvring areas
(ICAO, 2001a). When good visibility conditions prevail (i.e. visual meteorological
conditions or VMC), the controller may separate the traffic by visual means and a
reduction in standard separation is permissible. When poor visibility conditions prevail
(i.e. instrument meteorological conditions or IMC), the aerodrome controller works in
close cooperation with the approach controller. In such conditions, prescribed
separation standards must be applied between aircraft in the air.
Surface movement control (ground control in the US) is a supplementary service
to the aerodrome control service. At less busy airports, the aerodrome and surface
movement control functions can be combined and provided by the aerodrome
controller. Otherwise, the surface controller is responsible for issuing taxi clearances
that take all departing aircraft to the departure end of the runway (Nolan, 1998;
EUROCONTROL, 1999). In addition, the surface controller is responsible for the
movements of all aircraft and vehicular traffic on the manoeuvring areas of the airport.
ICAO (2001a) defines the manoeuvring areas as any part of the airport used for the
takeoff, landing, and taxiing of aircraft, excluding aprons. Surface movement control is
usually undertaken by visual means. However, in conditions of poor visibility the
controller relies upon surface movement radar (SMR). Working in close cooperation
with the aerodrome controller, the surface controller ensures that all active runways are
free from vehicular activity during aircraft movements.
2.3 Overall Air Traffic Control system architecture
The preceding paragraphs have highlighted the complexity of the ATM system and its
further decomposition down to the ATC system. Figure 2-3 presents ATC
as a system composed of people, equipment, and procedures integrated in an optimal
way to achieve a common objective. In order to understand how these components
come together, a more detailed explanation of the ATC architecture and its basic
functionalities is given below. In line with the objectives of the research presented in
this thesis, this section provides a deeper understanding of ATC functionalities and the
types of ATC equipment that can fail and thereby affect controller recovery.
[Figure 2-3 components. ATM: airspace management (ASM), air traffic flow management (ATFM), airborne ATM (e.g. airborne CNS, FMC, ACAS/TCAS), and ground-based ATM. Air traffic services (ATS): air traffic control (ATC), air traffic services reporting office (ARO), and flight information service (FIS). ATC components: people (controllers, engineers, management); equipment (HMI, hardware, software); procedures and training (operational procedures, engineering procedures).]
Figure 2-3 ATM and ATC system components (adapted from ICAO, 2001a)
The functional architecture of any system presents a high-level decomposition of the
overall system into a logical set of functional blocks. Each block may be further
decomposed into a series of sub-functions. The ATC functionalities and their related
sub-functions, as presented in this thesis, include all those of the current ATM/ATC
system as well as those under development for future inclusion (with 2020 taken
as the target year in this thesis, in line with the European Commission’s ‘Vision 2020’;
European Commission, 2001).
The starting point for the development of the ATC functional classification in this thesis
is the EUROCONTROL Harmonisation of European Incident Definition Initiative for
ATM (HEIDI) taxonomy. The HEIDI taxonomy identifies six ATC functionalities and the
related ATC equipment that supports each of them. The functionalities listed in HEIDI
are: communication, surveillance, navigation, data processing and distribution, support
information, and power supply (EUROCONTROL, 2001e). This taxonomy
is expanded in this thesis to meet both the classification needs and the
characteristics of the information derived from the processed operational failure
reports. The analysis of operational failure reports highlighted the need for
nine ATC functional blocks. The next set of layers dissects each ATC functional block
into relevant sub-functions which are then dissected further to the elemental level. This
approach enables the capture of all operational components of ATC. The resulting nine
ATC functional blocks, as defined in this thesis, are:
• Communication;
• Navigation;
• Surveillance;
• Data processing and distribution;
• Supporting;
• Safety nets;
• Power supply;
• Pointing and data input; and
• System monitoring and control.
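The layered classification described above (functional block, sub-function, element) can be expressed as a simple data structure. The following minimal Python sketch is illustrative only. The names `ATCFunction`, `HEIDI_TO_ATC`, and `SUB_FUNCTIONS` are hypothetical. Mapping HEIDI's "support information" onto the supporting block is an assumption. Only the navigation sub-functions from Section 2.3.1.2 are filled in.

```python
from enum import Enum


class ATCFunction(Enum):
    """The nine ATC functional blocks used in this thesis."""
    COMMUNICATION = "Communication"
    NAVIGATION = "Navigation"
    SURVEILLANCE = "Surveillance"
    DATA_PROCESSING = "Data processing and distribution"
    SUPPORTING = "Supporting"
    SAFETY_NETS = "Safety nets"
    POWER_SUPPLY = "Power supply"
    POINTING_AND_INPUT = "Pointing and data input"
    MONITORING_AND_CONTROL = "System monitoring and control"


# The six HEIDI functionalities (EUROCONTROL, 2001e) mapped onto the nine blocks.
# Mapping "support information" to SUPPORTING is an assumption.
HEIDI_TO_ATC = {
    "communication": ATCFunction.COMMUNICATION,
    "surveillance": ATCFunction.SURVEILLANCE,
    "navigation": ATCFunction.NAVIGATION,
    "data processing and distribution": ATCFunction.DATA_PROCESSING,
    "support information": ATCFunction.SUPPORTING,
    "power supply": ATCFunction.POWER_SUPPLY,
}

# Blocks introduced beyond HEIDI's six.
ADDED_BLOCKS = set(ATCFunction) - set(HEIDI_TO_ATC.values())

# Second layer (sub-functions), filled in here only for navigation.
SUB_FUNCTIONS = {
    ATCFunction.NAVIGATION: [
        "Approach and landing navigation",
        "Area navigation",
        "Control and monitoring of ground-based airport facilities",
    ],
}
```

Under the stated mapping, `ADDED_BLOCKS` recovers the three blocks introduced beyond HEIDI: safety nets, pointing and data input, and system monitoring and control.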
This classification is developed further in Chapter 4. The following
paragraphs give a detailed description of each functionality and the corresponding
physical components (i.e. hardware components that support each function).
2.3.1 Air Traffic Control functionalities
2.3.1.1 Communication function
The scope of the communication function covers the distribution of information to air- and
ground-based ATC system components in the form of voice, data, or both. This is
achieved using various communication methods. Currently, radio telephony (RT)
enables voice transfer of information via high frequencies (HF), very high frequencies
(VHF), and ultra-high frequencies (UHF). Controller-pilot data link communication
(CPDLC), as a concept currently used in Australasia and the Pacific, relies on the transfer
of data over high frequency data link (HF DL), very high frequency data link (VDL),
and satellite communication (SATCOM). In general, the communication function
provides connectivity and information transfer between users and providers that are
both internal and external to a particular ATC Centre. This function is supported by
various components (Figure 2-4) which are discussed in the following paragraphs. The
section concludes with a discussion of the future communication systems and the
concept of Required Communication Performance (RCP).
Figure 2-4 Communication function
Firstly, the communication function is supported by a Voice Switching Communication
System (VSCS) presented on Controller Working Positions (CWPs) via the VSCS
panel. This is a computer-controlled switching system that facilitates both the air-to-
ground (A/G) and ground-ground (G/G) communication necessary for ATC operations
(FAA, 1998). Controllers are able to use the VSCS for A/G communication by
accessing A/G transmitters and receivers through which they communicate with pilots
via HF, VHF, or UHF. The VSCS also ensures that incoming A/G communications from
pilots are routed to the appropriate control position. Controllers are able to use the
VSCS for G/G communication via intercom, interphone, and external circuits. Intercom
enables controllers to access other control positions or ancillary positions located within
the operational room. Interphone enables controllers to access positions located within
another ATC/ATM facility. Finally, external circuits of VSCS enable controllers to
access the public telephone network (FAA, 1998).
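The VSCS behaviour described above can be illustrated with a short Python sketch. It covers two points: ground-ground traffic is split into intercom, interphone, and external circuits, and incoming air-ground calls are routed to the correct position. This is a logical abstraction, not a model of any real VSCS product. The function names, facility identifiers, frequencies, and sector names are all hypothetical.

```python
from enum import Enum
from typing import Dict, Optional


class GGCircuit(Enum):
    """Ground-ground circuit types reachable through the VSCS."""
    INTERCOM = "intercom"      # another position in the same operations room
    INTERPHONE = "interphone"  # a position in another ATC/ATM facility
    EXTERNAL = "external"      # the public telephone network


def select_gg_circuit(own_facility: str, target_facility: Optional[str]) -> GGCircuit:
    """Choose the G/G circuit type; target_facility=None means a public-network call."""
    if target_facility is None:
        return GGCircuit.EXTERNAL
    if target_facility == own_facility:
        return GGCircuit.INTERCOM
    return GGCircuit.INTERPHONE


def route_ag_call(frequency: str, frequency_to_position: Dict[str, str]) -> str:
    """Route an incoming A/G transmission to the position assigned that frequency."""
    if frequency not in frequency_to_position:
        raise KeyError(f"no position is assigned frequency {frequency}")
    return frequency_to_position[frequency]


# Hypothetical sector assignments, for illustration only.
ASSIGNMENTS = {"127.100": "Sector North", "133.450": "Sector South"}
```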
Secondly, data is exchanged with adjacent ATC Centres via the Aeronautical Fixed
Telecommunication Network (AFTN), On-line Data Exchange (OLDI) automated
protocols, and the ICAO data interchange network, using both public and private telephone
networks. AFTN, administered by ICAO, is the means by which all information
concerning national and international air operations is exchanged. The data consists
of messages on aircraft movements, conditions of airports, weather, and other
information related to ATC. OLDI refers to operational use of connections between
various Flight Data Processing Systems (FDPS) at different Area Control Centres
(ACCs). Public and private telephone networks are used to communicate data on
individual flights between ATC Centres along the route of the flight. The data that is
exchanged includes flight level information, airspace boundary estimates of flights, and
other conditions that may be agreed between ATC Centres. This category incorporates
both systems for data exchange and any supporting equipment (e.g. AFTN printer,
console).
Thirdly, the Aeronautical Information Service (AIS) provides information of a permanent
or semi-permanent nature on subjects such as the geographical description of airspace,
in-flight procedures, sector procedures, communications data, surveillance data, and
specific airport characteristics data, either verbally or via datalink. In addition, local ATC
units provide a dynamic broadcast of relevant information to arriving and departing
pilots in the vicinity of the airport, known as the Automatic Terminal Information Service
(ATIS). This service uses local weather data (from the meteorological office) and AIS
data (e.g. runway and taxiway conditions, navigation aid status).
Fourthly, backup radio and telephone systems must be provided. These backup
systems may provide identical functionality if they duplicate the VSCS. However,
in some cases, redundancy is provided by similar but not identical systems which
cannot offer identical functionality. In these cases it is essential that controllers are
aware of these differences. Backup communication systems must be capable of
providing continuity of communication during outages (complete loss of the
communications at the level of an ATC Centre), as voice communication continues to
be the primary means of communicating ATC instructions to aircraft.
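The point about non-identical backups can be made concrete by comparing the capability sets of the primary and backup systems. In this hypothetical Python sketch, the capability labels are assumptions chosen for illustration. So is treating air-ground voice as the single essential function, which reflects its role as the primary means of issuing ATC instructions. The set difference lists what controllers would need to be aware of when the backup is in use.

```python
from typing import FrozenSet, NamedTuple

# Functions any backup must keep available (hypothetical labels).
ESSENTIAL: FrozenSet[str] = frozenset({"air-ground voice"})


class BackupAssessment(NamedTuple):
    identical: bool        # backup duplicates the primary (e.g. a duplicated VSCS)
    gaps: FrozenSet[str]   # functions controllers must know are missing in backup
    adequate: bool         # every essential function is still covered


def assess_backup(primary: FrozenSet[str], backup: FrozenSet[str]) -> BackupAssessment:
    """Compare primary and backup capability sets."""
    gaps = frozenset(primary - backup)
    return BackupAssessment(
        identical=(primary == backup),
        gaps=gaps,
        adequate=ESSENTIAL <= backup,
    )


# Hypothetical example: the backup keeps A/G voice and intercom only.
primary = frozenset({"air-ground voice", "intercom", "interphone", "external"})
backup = frozenset({"air-ground voice", "intercom"})
result = assess_backup(primary, backup)
```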
Finally, several other physical components are listed which have a role in providing the
overall communications function. These include but are not limited to pagers, headsets,
handsets, microphones, processors, press-to-talk buttons (PTT), buzzers, cables, and
footswitches.
The previous discussion has focused on current systems that support the
communication function. Current communication methods are mostly based on
analogue voice communication, which poses various limitations to the users (e.g. limited
coverage, accessibility, capability, integrity, and security). Moreover, the combination of
these limitations with current Radio Telephony (RT) procedures is linked to excessive
levels of controller workload (see Figure 21 in EUROCONTROL, 2004g). As a result,
future development of air navigation for civil aviation aims toward enhanced
communication links between aircraft and controllers. This was an important element of
ICAO’s Future Air Navigation Systems (FANS) concept (ICAO, 2007). With respect to
communication, a major development has been the advent of the Required
Communication Performance (RCP) concept. This concept characterises the
performance requirements for communications with no specific reference to
technology. Hence, the concept allows various technologies to be evaluated in terms of
communication process time (i.e. delay), integrity, availability, and continuity of function
(NASA, 2000). Until 2015, it is anticipated that the voice communication function will be
supported by a very high frequency data link (VDL) in addition to existing analogue
voice channels. In general, voice communication will be used for real-time, time-critical,
and non-routine messages (e.g. radar vectoring to avoid traffic). All other, more routine
communications will be served via data communication supported by VDL and satellite
communication (SATCOM) (NASA, 2000). The use of enhanced modes of data link will
enable several advanced features. Firstly, it will bring automatic data entry capabilities,
reducing the time spent on manual data entry and the potential for data entry errors.
Secondly, it will permit a significant reduction in transmission time and thus reduce RT
frequency congestion. Finally, it will eliminate misunderstandings arising from
broadcasting problems and language issues. As a result, communication in the 2020
time frame is expected to be characterised by a mix of analogue voice and digital
communication, with increased use of datalink to complement or replace existing
analogue voice communications.
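Because RCP is technology-independent, evaluating a candidate technology reduces to checking its measured performance against the four criteria named above. The Python sketch below assumes simple scalar thresholds. The class and function names and all the numbers are hypothetical and are not taken from any published RCP specification. The sketch also encodes the anticipated voice/data link split (NASA, 2000) as a one-line rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RCPRequirement:
    """Technology-independent requirement; values used below are hypothetical."""
    max_transaction_time_s: float  # communication process time (delay)
    min_continuity: float          # P(transaction completes once started)
    min_availability: float        # P(service usable when needed)
    max_integrity_risk: float      # tolerated probability of an undetected error


@dataclass(frozen=True)
class Performance:
    transaction_time_s: float
    continuity: float
    availability: float
    integrity_risk: float


def meets_rcp(perf: Performance, req: RCPRequirement) -> bool:
    """Pass only if all four criteria hold, regardless of the underlying technology."""
    return (
        perf.transaction_time_s <= req.max_transaction_time_s
        and perf.continuity >= req.min_continuity
        and perf.availability >= req.min_availability
        and perf.integrity_risk <= req.max_integrity_risk
    )


def preferred_medium(time_critical: bool, non_routine: bool) -> str:
    """Voice for time-critical or non-routine messages, data link otherwise."""
    return "voice" if (time_critical or non_routine) else "data link"


req = RCPRequirement(10.0, 0.999, 0.9999, 1e-5)
candidate_a = Performance(8.0, 0.9995, 0.99995, 1e-6)   # meets every criterion
candidate_b = Performance(12.0, 0.9995, 0.99995, 1e-6)  # too slow
```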
2.3.1.2 Navigation function
The main objective of the navigation function within ATC is to provide
aircraft with the means to navigate between the point of departure and the point of
arrival, i.e. to determine their position accurately and reliably during all phases of flight.
The quality of required navigational information (e.g. accuracy and integrity of aircraft
position) differs with the phase of flight. For example, the requirements in the
landing phase of the flight are the most stringent due to the proximity to the ground and
the high speed of the aircraft, which leave the pilot little time to take corrective action. The navigation
function block, as shown in Figure 2-5, focuses on three components, namely
approach and landing navigation systems, area navigation systems, and systems for
control and monitoring of ground-based airport facilities. These are explained in the
following sections, concluding with a discussion of the concept of Required Navigation
Performance (RNP).
Figure 2-5 Navigational function
2.3.1.2.1 Approach and landing navigation
This category within the navigation function consists of the systems that provide
precise guidance to an aircraft approaching a runway. The most widespread approach
aid is the Instrument Landing System (ILS), used for the most critical phases of the
flight, i.e. approach and landing. This system provides the pilot with both runway
centreline azimuth guidance (provided by the ILS localiser) and vertical guidance
(provided by the ILS glide slope) along the approach path of an aircraft. It allows pilots to
conduct the final approach and land safely even in conditions of poor visibility.
ICAO previously supported the Microwave Landing System (MLS) in areas
where it offered operational and economic advantages (e.g. increased runway
throughput/capacity). However, much more emphasis is now put on the
evaluation of satellite navigation techniques and the augmentations necessary to
support precision landing, with the long-term objective of replacing the ILS
(Aviation International News, 2001).
2.3.1.2.2 Area navigation
aRea NAVigation (RNAV) is a method of navigation that enables aircraft to fly any
chosen direct course within a network of navigation beacons, rather than navigating
directly to and from the individual beacons (EUROCONTROL, 2003h). Navigation
systems which provide RNAV capability include VHF Omni-directional Range/ Distance
It should be pointed out that although this research considers only failures which lead
to hazardous situations, other failures also occur. These other failures form the
majority and never affect the controllers’ performance, owing to the effectiveness of
built-in technical defences (NATS, 2002). However, these failures still require
intervention, repair, and maintenance by engineers from the ATC system control and
monitoring unit.
Having defined failure and hazard as used in this research, the next section examines
the nature of equipment failures in the operational environment and presents details of
the sample of equipment failure reports used.
3.3 Supporting data: operational failure reports
Operational experience in this research is captured through a sample of operational
failure reports. They originate from four countries, de-identified for confidentiality and
referred to as Countries A, B, C, and D. The following discussion focuses firstly on the
process of reporting equipment failures and their collection at the local level (i.e. the
database of the ATC Centre) and the national level (the database of the respective Civil
Aviation Authority (CAA)). The discussion continues by describing a range of data
pre-processing problems and the corresponding solutions.
3.3.1 Reporting and data collection
The aim of occurrence data collection is generally to record the safety performance of
the relevant unit (e.g. ATC Centre). The data are collected on a range of safety-
relevant occurrences, such as incidents, losses of separation, equipment failures, bird
strikes, runway incursions, level busts, and others. For example, at the European level,
the EUROCONTROL ESSAR 2 document (EUROCONTROL, 2000c) provides
recommendations on the reporting and assessment of safety occurrences in ATM. As a
result, the national Civil Aviation Authorities (CAAs) specify the types of ATM
occurrences to be collected, analysed, or investigated through their mandatory
occurrence reporting (MOR) schemes (Figure 3-3). The UK CAA, for instance, also
specifies who can report an occurrence, what the correct reporting procedure is, and
how the details should be disseminated (in the case of an investigation). The UK CAA
states that the objective of this reporting scheme is “to contribute to the improvement of
air safety by ensuring that relevant information on safety is reported, collected, stored,
protected, and disseminated. The sole objective of occurrence reporting is the
prevention of accidents and incidents and not to attribute blame or liability” (UK CAA,
2005).
Figure 3-3 Reporting system
In aviation generally, as in ATC, data is usually stored and sorted electronically in
different databases. Collection of data in hard copy has long been abandoned in most
developed countries. The type and level of database detail depends
on the unit/group/authority collecting the data (e.g. a system control and monitoring
unit, air navigation service provider, or national CAA). For example, when collecting
equipment failure occurrences, the most detailed information is available in the
database of the control and monitoring unit within the particular ATC Centre. This
database must contain information on all equipment failures that occurred in the ATC
Centre, regardless of their impact or severity, because engineering staff need
complete insight into all equipment failures, as they are responsible for repair and
maintenance.
However, not all equipment failures are required to be reported at a national level. The
selection of those that need to reach the respective CAA is made through a review of
reported incidents or safety events on a monthly, quarterly, and annual basis. As a
result, a national database will contain only occurrences of appropriate severity
characteristics and impact on operations. As an example, the UK CAA uses an MOR
database which contains, amongst others, reports on equipment failures that impact on
the controllers’ ability to provide air traffic services. These reports are fed in from the
Engineering Reporting Occurrence Database which contains details on all technical
problems, failures, and maintenance issues, of which the majority pass unnoticed by
controllers (due to the high level of ATC systems redundancy).
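The relationship between the local engineering database and the national MOR database amounts to a filter over the complete engineering log. The minimal Python sketch below assumes a hypothetical 0 to 4 severity scale and a threshold of 2. Actual selection criteria are set by each CAA and applied through the periodic reviews described above, which the sketch does not model. The record fields and function name are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureRecord:
    occurrence_id: str
    affects_ats: bool   # did it impair the controllers' ability to provide ATS?
    severity: int       # hypothetical scale: 0 (no effect) to 4 (most severe)


def national_subset(engineering_log: List[FailureRecord], min_severity: int = 2) -> List[FailureRecord]:
    """Keep only failures that impair ATS provision and reach the severity threshold."""
    return [r for r in engineering_log if r.affects_ats and r.severity >= min_severity]


log = [
    FailureRecord("E-001", affects_ats=False, severity=0),  # masked by redundancy
    FailureRecord("E-002", affects_ats=True, severity=1),
    FailureRecord("E-003", affects_ats=True, severity=3),
]
```

The engineering log retains all three records, while only `E-003` would reach the national database under the assumed threshold.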
Collected data is regularly analysed to assess the safety performance at national level
as well as at the level of the relevant units (e.g. ATC Centre). Furthermore, this
information is sometimes used on a wider basis for benchmarking studies and to record
the safety performance of a given region (e.g. European Civil Aviation Conference –
ECAC consisting of 41 European countries).
3.3.2 Data pre-processing problems
As previously mentioned, the research presented in this thesis uses operational failure
reports from four operational databases. Problems experienced with extracting failures
from different operational databases can be summarised as follows:
• Different reporting schemes produce different levels of reporting detail. The amount
and quality of information reported differ significantly from one report to another.
Therefore, inconsistencies between reports were identified in terms of failure impact
(i.e. severity), duration, and location.
• Terminology differs between reports (e.g. the Computerised Automatic Terminal
Information Service (CATIS) is the Automatic Terminal Information Service (ATIS) by
another name; a “hotline” is ground-to-ground communication, usually intercom; and the
National Aeronautical Information Processing System (NAIPS) corresponds to the
Aeronautical Information Service (AIS)), and some reports use very specific component
names (e.g. Air Ground Data Processor (AGDP), part of the datalink system).
• A weak reporting culture, which results in uncertainty about data reliability and
completeness.
These problems are addressed below highlighting the approaches adopted to mitigate
them.
All reports have a short, one-sentence summary followed by a description of the
equipment failure incident plus some additional information (e.g. date, occurrence
number, location, area code: flight information region or sector name). Unfortunately,
the additional information was not always available. Countries C and D also
provided their internal severity categorisations, and Country D provided information on
failure duration. Since Country D’s dataset originates from an engineering unit, the
duration variable was measured from the first log of the failure until its final resolution.
As a result, it was possible to extract four types of information. The type of
equipment/ATC functionality affected and the complexity of the failure type are usually
extracted from the short summary available for each report. The severity of an equipment
failure is taken from the available severity rating where one exists; otherwise, it is derived
by assessing the available information on the operational and safety impact of the failure
and applying the severity rating developed in this research (see Chapter 4, Table 4-5).
Finally, the duration variable is available only in the Country D database.
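Two extraction rules in this paragraph can be sketched in Python: duration measured from the first log entry to final resolution, and severity taken from the supplied rating or, failing that, assessed. The `Report` fields and function names are hypothetical. The `assess` callback is a placeholder standing in for applying the rating of Table 4-5, which is not reproduced here. Harmonising national severity scales with the thesis scale is not modelled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, List, Optional


@dataclass
class Report:
    summary: str
    reported_severity: Optional[int] = None   # supplied only by some countries
    log_times: List[datetime] = field(default_factory=list)
    resolved_at: Optional[datetime] = None


def failure_duration(report: Report) -> Optional[timedelta]:
    """First log entry to final resolution; None when either end is missing."""
    if not report.log_times or report.resolved_at is None:
        return None
    return report.resolved_at - min(report.log_times)


def failure_severity(report: Report, assess: Callable[[Report], int]) -> int:
    """Prefer the supplied rating; otherwise fall back to the `assess` callback."""
    if report.reported_severity is not None:
        return report.reported_severity
    return assess(report)
```

Taking the minimum of the log timestamps makes the duration independent of the order in which log entries were recorded.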
Data pre-processing is based on the classification of ATC system functionalities (see
Chapter 2). In certain reports it was very difficult to determine the type of equipment.
This problem was compounded when only an acronym indicated precisely what
the report referred to. Consequently, several interviews were conducted with
engineering staff from two European ATC Centres to identify and classify
the ambiguous reports correctly. A glossary of terms and acronyms proved to be
a very useful tool during the pre-processing stage. Such documents should
accompany (or be an integral part of) every database as part of normal
reporting practice.
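The glossary-based harmonisation recommended above can be sketched as a word-by-word substitution followed by a lookup of the functional block. The mappings below use only the terminology examples given in Section 3.3.2. Assigning all four terms to the communication function follows Section 2.3.1.1. The function names are hypothetical, and a real glossary would be far larger and compiled with engineering staff.

```python
import re
from typing import Set

# Local term (upper case) -> harmonised term, from the examples in Section 3.3.2.
GLOSSARY = {
    "CATIS": "ATIS",
    "HOTLINE": "intercom",
    "NAIPS": "AIS",
    "AGDP": "datalink",
}

# Harmonised term -> functional block (all within the communication function).
TERM_TO_BLOCK = {
    "ATIS": "Communication",
    "intercom": "Communication",
    "AIS": "Communication",
    "datalink": "Communication",
}

WORD = re.compile(r"[A-Za-z]+")


def normalise(summary: str) -> str:
    """Replace local acronyms and jargon with harmonised terms, word by word."""
    return WORD.sub(lambda m: GLOSSARY.get(m.group(0).upper(), m.group(0)), summary)


def classify(summary: str) -> Set[str]:
    """Functional blocks mentioned in a report summary, after normalisation."""
    return {TERM_TO_BLOCK[w] for w in WORD.findall(normalise(summary)) if w in TERM_TO_BLOCK}
```

For example, `normalise("CATIS unserviceable, hotline noisy")` yields `"ATIS unserviceable, intercom noisy"`.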
Within one country, the number of reports may not reflect the actual number of
equipment failure incidents in the ATC Centres, for a variety of reasons. Firstly,
failures may go unreported as a result of an inadequate reporting culture in
the ATC Centre and the aviation community overall. Secondly, not all equipment failures
are included in the CAA databases. As previously explained, only failures of certain
severity (i.e. impact on ATC operations and controller performance) tend to be reported
to the CAA. As a result, the available operational failure reports are not necessarily
complete or reliable (i.e. they may lack detail on the context surrounding a reported
occurrence). To date, no measure of the completeness and reliability of occurrence
databases has been produced. This is a task for future research.
3.3.3 Available operational failure reports
As stated previously, four sources of data on equipment failures are included in
this thesis: Countries A, B, C, and D. The first three data sets are from Civil Aviation
Authority (CAA) databases for a given time period. In other words, these are equipment
failures reported in the CAA database for all ATC Centres within the national
boundaries of these countries over a given time period (usually a year). The fourth data
source (Country D) represents data from the system control and monitoring unit of one
ATC Centre. Table 3-1 gives a summary of the available data.
Table 3-1 Summary of available data, number of reports, and equipment failure incidents per country
Country | Source of data | Time period available | Average flight hours flown for available time period | Total number of reports pre-processed | Total number of equipment failures reported
A | CAA | 1999-2003 | 1,375,800.00 | 1,378 | 791
B | CAA | 2001-2005 | 1,027,870.00 | 1,393 | 1,324
C | CAA | 1992-2004 | 389,245.68 | 3,340 | 448
D | System control unit/ATC Centre | 08/2000-2004 | 428,502.22 | 16,697 | 7,788
Total | | | | 22,808 | 10,351
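As a check on Table 3-1, the totals and each country's share of reports and failures can be recomputed directly from the table. This Python sketch uses only the report and failure counts in the table. It does not use the flight-hours column, which is not needed for these proportions. The variable names are illustrative.

```python
# Table 3-1: country -> (reports pre-processed, equipment failures identified)
TABLE_3_1 = {
    "A": (1_378, 791),
    "B": (1_393, 1_324),
    "C": (3_340, 448),
    "D": (16_697, 7_788),
}

total_reports = sum(reports for reports, _ in TABLE_3_1.values())
total_failures = sum(failures for _, failures in TABLE_3_1.values())

# Fraction of pre-processed reports identified as ATC equipment failures.
retained = total_failures / total_reports

share_of_failures = {c: f / total_failures for c, (_, f) in TABLE_3_1.items()}
share_of_reports = {c: r / total_reports for c, (r, _) in TABLE_3_1.items()}

for country in TABLE_3_1:
    print(f"{country}: {share_of_reports[country]:.1%} of reports, "
          f"{share_of_failures[country]:.1%} of failures")
```

Run as written, the code does four things:
- It confirms the totals of 22,808 reports and 10,351 failures.
- It shows that roughly 45 percent of the pre-processed reports were retained as equipment failures.
- It shows that Country D accounts for about 73 percent of the reports.
- It shows that Country D accounts for about 75 percent of the identified failures.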
After pre-processing of all available equipment failure reports (22,808), more than ten
thousand reports (i.e. 10,351) are identified as equipment failures in air traffic control
(Table 3-1). The remaining reports mainly comprised equipment-related reports from
outside the national airspace, multiple reports filed for the same occurrence to reflect
multiple findings or causes identified, as well as reports on non-ATC equipment and
other non-technical types of incidents (e.g. human error, runway closures due to non-
equipment issues, scheduled maintenance, software updates, and scheduled hardware
changes).
For Countries A and B, the time period studied could be considered steady (uniform)
with respect to the ATC service provided and other aviation-related factors (e.g. traffic
levels, jet fuel prices, airline fares, regulations). However, a modern ATC Centre was
opened in Country A in the second half of 2001. This resulted in a relatively large number
of early failures of individual components in early 2002, a recognised
characteristic of the initial life or ‘burn-in period’ of any newly implemented system
(Figure 3-4).
Figure 3-4 “Bathtub” model of reliability for electronic components (Leveson, 1995)
Country B underwent a complete modernisation of its ATM system in 2000. Given that
a typical ‘burn-in period’ ranges from 30 to 90 days (IEEE, 1998), it is reasonable to
assume that the system was well integrated and settled for the period of the data (i.e.
2001 to 2005). Therefore, the average number of incidents reported in this period could
be considered representative and appropriate for further analysis.
The time period available for Country C, however, consists of 13 consecutive years (i.e.
1992 to 2004). This country went through extensive regulatory changes throughout the
1980s. The change in air service licensing ensured that any operator that could prove
financial viability and meet safety standards would obtain a licence. As a result, by the
end of the 1980s, the number of operators had more than doubled. At about the same
time, the Government decided to commercialise most of its service provision activities.
Thus air traffic and other services formed new state-owned commercial enterprises.
All of these changes were firmly embedded in the system by the 1990s,
and therefore the sample provided could be considered stable and appropriate for
further analysis.
Country D is unique in that it provided data from a single engineering unit database and
therefore represents the most detailed data source in this research. It covers the
shortest period available (3.5 years) but contains the largest number of failures:
7,788, or 75 percent of all identified equipment failures.
Although the available sample has a significant number of operational failure reports,
this still does not indicate how representative these reports are of the operational ATC
environment. For this reason, a top-down methodology for total aviation system
safety is developed. This methodology enables determination of the contribution of
ATC equipment to the safety of the overall air transport system based on past
research. Once this is established, the same methodology is applied using the
operational failure reports and then the results are compared. This methodology and
the subsequent validation of the available operational data are presented in the
following section.
3.4 Methodology to assess the relevance of supporting data
This section develops the methodology for an assessment of the available sample of
operational failure reports. In order to assure the relevance of this sample, this section
builds a methodology for its validation. In short, the contribution or risk budget of
equipment failures to the overall safety of the air transport system, extracted from past
literature is compared to the result obtained from the analysis of available operational
failure reports. The section starts by identifying the overall aviation Target Level of
Safety (TLS) and derives risk budgets for ATM and its ATC component. It concludes by
determining the risk budget of ATC equipment. In other words, this methodology
determines the contribution of ATC equipment failures to the safety of the overall air
transport system. This finding is then compared to the results of the preliminary
analysis of the available operational failure reports.
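The top-down logic of this section can be written as a cascade of allocation shares, running from the overall TLS to the budget of ATM, then ATC, then ATC equipment. In the Python sketch below, the TLS of one accident in 10^7 flight hours is the target quoted in Section 3.4.1. The three shares (0.1, 0.5, 0.2) and the function names are hypothetical placeholders, not values derived in this thesis.

```python
TLS_PER_FLIGHT_HOUR = 1e-7  # one accident in 10^7 flight hours (see Section 3.4.1)


def cascade_budget(tls: float, *shares: float) -> float:
    """Multiply the TLS by successive allocation shares (ATM, then ATC, then equipment)."""
    budget = tls
    for share in shares:
        if not 0.0 <= share <= 1.0:
            raise ValueError(f"share {share} lies outside [0, 1]")
        budget *= share
    return budget


def within_budget(observed_rate: float, budget: float) -> bool:
    """Compare an observed accident rate (per flight hour) with the allocated budget."""
    return observed_rate <= budget


# Hypothetical allocation shares, for illustration only.
equipment_budget = cascade_budget(TLS_PER_FLIGHT_HOUR, 0.1, 0.5, 0.2)
```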
3.4.1 The accident to incident ratio
An aviation Target Level of Safety (TLS) expressed only in terms of accidents has two
potential limitations. Firstly, the number of accidents is too small for adequate
statistical analysis. Non-accident data, such as loss of standard separation between
aircraft in controlled airspace, is therefore necessary to establish the occurrence of any
trends. Secondly, the number of accidents (or accident rate) is not necessarily the best
measure of safety performance. For example, the currently used target of one accident
in 10^7 flight hours demands the collection of operational data over many years to
demonstrate whether the TLS has been met. A single accident may violate the TLS,
whilst many years without an accident will satisfy the TLS but conceal any
deterioration in safety prior to an accident (Graham, Kinnersly, and Joyce, 2002). In
this context, past safety analyses (not only in aviation) have used the number of
incidents together with the assumed accident/incident ratio. The United States Federal
Aviation Administration (FAA, 2000) cites several different analytical approaches. The
two most common of these are discussed below.
In the 1940s, Heinrich introduced the idea of the existence of accidents where injuries
did not occur, but considered only damage to property (Heinrich, 1941). This led to the
creation of the so-called ‘Heinrich pyramid’ with established proportions of accidents,
serious incidents, and incidents; 1:29:300 (Saldana et al., 2002). After these initial
studies, there was stagnation in the theoretical underpinnings of safety investigations
until the practical work of Byrd in the 1970s. Byrd carried out his work in a steel factory
and revised Heinrich’s proportions to 1:29:600 (Saldana et al., 2002).
However, whilst both of these studies are valuable in their statistical analyses, they do
not seem to be appropriate for dealing with equipment failures in ATC, at least not in the
ratios they offer. Both studies were designed to determine the risk and related ratio of
on-the-job accidents and incidents. The weaknesses of both studies may originate from
their design, in particular the bias of analysing accident reports filed only by supervisors
(who tend to blame injuries on workers), and from the much lower levels of
equipment reliability and integrity compared with the systems used in ATC today.
For the purpose of the research presented in this thesis, additional attention has been
given to the ratio between accidents and incidents induced by ATC equipment failures.
A EUROCONTROL safety assessment study assumed that one in 10,000
equipment failures will contribute to an aviation accident (EUROCONTROL, 2004c), an
assumption which is in line with the high reliability requirement for the overall ATC
system, as well as for ATC equipment. A number of arguments can be made to suggest
that in future this proposed ratio will decrease:
• The number of incidents should decrease due to continuous safety initiatives and
hazard prevention programmes;
• The probability of an incident leading to an accident should decrease due to
increases both in equipment reliability and in advanced solutions for redundancy
and diversity (dissimilar redundancy);
• The type of incidents occurring should change: as a result of
enhanced risk management approaches, the frequency of serious incidents
should fall;
• There should also be a decrease in the number of software-related incidents,
which are prevalent today, as discussed earlier. Hardware-related incidents
should also diminish.
The arguments discussed above imply a step change in software and hardware
reliability as a result of considerable operational experience, knowledge, and expertise.
For example, in its requirements for software configuration, EUROCONTROL states
that reporting, tracking, and corrective actions are put in place to mitigate any
software-related problem (EUROCONTROL, 2003i). Note also that any decrease in the
number of incidents should be assessed only for the steady state (i.e. useful life) phase
captured in the ‘bathtub’ reliability model (Figure 3-4).
It has been highlighted that perceiving risk only in terms of accidents tends to mask
the actual safety issues. For this reason, it is important to include the number of
incidents so as to estimate the appropriate accident/incident ratio. Having discussed
the accident/incident ratio, the following section discusses the units of
measurement used in aviation and the different perspectives they give in the
investigation of a critical event.
3.4.2 Units of measurement
The rate of any critical event represents the number of occurrences (e.g. equipment
failures, incidents, accidents) divided by the exposure to those events. For example,
aviation accident statistics are presented in a variety of ratios and units, called units of
measurement. The most frequently used are the number of accidents per operation
(take off or landing), per million flight hours flown, per flight, per million departures, per
million aircraft-miles, per million aircraft-hours, per million passenger-hours, and per
million passenger-miles.
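As a minimal sketch of the definition above, the hypothetical function below divides a count of occurrences by an exposure figure and scales the result to a chosen unit (per million by default). The accident count and exposure figures in the usage lines are invented for illustration only; they show that one set of occurrences yields different rates depending on the exposure unit chosen.

```python
def event_rate(occurrences: int, exposure: float, per: float = 1e6) -> float:
    """Occurrences per `per` units of exposure (e.g. per million flight hours).

    Hypothetical helper: `exposure` must already be expressed in the unit of
    interest (flight hours, departures, passenger-miles, ...).
    """
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return occurrences / exposure * per


if __name__ == "__main__":
    accidents = 12               # hypothetical number of accidents
    flight_hours = 30_000_000    # hypothetical exposure in flight hours
    departures = 15_000_000      # hypothetical exposure in departures
    print(f"{event_rate(accidents, flight_hours):.2f} per million flight hours")
    print(f"{event_rate(accidents, departures):.2f} per million departures")
```

With these made-up inputs the same twelve accidents give 0.40 per million flight hours but 0.80 per million departures, which is why a rate is meaningless unless its unit is stated.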
No single measurement gives a complete picture of the critical event under
investigation. Each of these units gives only one perspective, whilst possibly hiding
others. For example, rates per million passenger-miles are most useful for comparing
air transport with other modes of transport, whilst rates per aircraft departure are suitable
for comparing accidents between small commuter jets and large commercial jets (e.g.
BAe 146 and B747, respectively). In addition, for the determination of the required
performance of landing aids, e.g. the Instrument Landing System (ILS) or Microwave
Landing System (MLS), the only appropriate measure would be the number of landings
per time period of interest. Any other measure would mask the true performance
values.
In addition to the units of measurement, accident rates also depend on the definition of
the critical event. These critical events range from accidents, fatal accidents, and
hull losses to the number of fatalities or injuries. An accident, as defined by ICAO
Annex 13 (ICAO, 2001d), involves “an occurrence associated with the operation of an
aircraft, which takes place between the time that any persons board the aircraft with the
intention of flight and that all such persons have disembarked, in which any person
suffers death or serious injury, or in which the aircraft receives substantial damage.”
This definition therefore comprises fatal accidents as well as hull losses. Thus, in
dealing with various accident rates it is crucial to be aware of the precise definition of
both the critical event and the unit of measurement used.
The current rate of aircraft accidents per million flying hours has remained constant
over recent years. If the same accident rate is assumed for the future together with
predicted increases in traffic levels, there will be an increase in the absolute number of
accidents. Using the current accident rate, ICAO has predicted that by the year 2010
there will be one aircraft accident per week, i.e. 52 accidents per year (Hai, 2004). This
is why the US FAA and other aviation authorities have identified the need to
significantly decrease the risk of aircraft accidents.
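The argument above can be sketched as expected accidents = rate × flight hours, with flight hours compounding at a fixed annual growth rate. The base-year flight hours and the 4 percent growth rate used below are illustrative assumptions (the growth rate simply mirrors the UK CAA assumption discussed in section 3.4.3.1.2), and the function names are hypothetical. The sketch does not attempt to reproduce ICAO's figure of 52 accidents per year.

```python
def expected_accidents(rate_per_hour: float, base_hours: float,
                       annual_growth: float, years: int) -> float:
    """Expected accidents in a future year if the accident rate stays constant
    while flight hours grow at a fixed annual rate (simple compounding)."""
    hours = base_hours * (1.0 + annual_growth) ** years
    return rate_per_hour * hours


def rate_needed(target_accidents: float, base_hours: float,
                annual_growth: float, years: int) -> float:
    """Rate per flight hour required to hold the absolute number of accidents
    at `target_accidents` despite the same traffic growth."""
    hours = base_hours * (1.0 + annual_growth) ** years
    return target_accidents / hours


if __name__ == "__main__":
    RATE = 1e-6          # illustrative constant rate, accidents per flight hour
    BASE_HOURS = 30e6    # hypothetical base-year flight hours
    GROWTH = 0.04        # illustrative annual traffic growth
    for years in (0, 10, 20):
        print(years,
              round(expected_accidents(RATE, BASE_HOURS, GROWTH, years), 1),
              f"{rate_needed(30, BASE_HOURS, GROWTH, years):.2e}")
```

The second function expresses the same point from the regulator's side: holding the absolute number of accidents constant requires the rate per flight hour to fall as fast as traffic grows.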
The following sections propose a methodology for the derivation of an aviation target level
of safety (TLS) based on the rate of aircraft accidents (defined as the number of
accidents per flight hour). An accident is defined according to ICAO, while the flight
hour has been chosen as the most appropriate unit of exposure for the risk induced by
equipment failures. It is usually more convenient to work in terms of flight hours rather than
operational hours of an ATC unit or sector. This approach avoids difficulties and
differences associated with the geographical coverage of the system(s) being
considered, phase of flight, the density and complexity of airspace, as well as available
systems and equipment (e.g. number of radars, navigation systems, communication
systems). This is also in line with the Required Communication, Navigation, and
Surveillance Performance concepts (RCP, RNP, RSP) as defined in the previous Chapter. In
short, the proposed methodology starts by identifying the high-level aviation target level of
safety and then focuses on the precise contribution of equipment failures, the type of
occurrence under investigation in this thesis.
3.4.3 The acceptable risk or target level of safety (TLS)
The methodology to determine the contribution of equipment failures to the safety of
the overall air transport system is organised in several steps. Firstly, existing aviation
standards for Target Level of Safety (TLS) are assessed. Secondly, the contribution of
ATC to the risk of an aircraft accident is determined. Thirdly, the contribution of ATC
equipment to the ATC risk budget is determined. These findings are then extrapolated
to the year 2020, the target year of this research, in line with the European
Commission’s ‘Vision 2020’ (European Commission, 2001). The final step involves
validation of the available sample of operational data using the same methodology.
These steps are presented in the following sections.
3.4.3.1 Existing standards
Technology and engineering have brought numerous inventions and benefits to the
modern way of life. Whilst these benefits are welcome, the risks associated with them
are not. The high pressure on the engineering world to reduce risk and increase safety
comes at a financial price. Therefore, it is important to manage the trade-off between
risk and the cost of its reduction.
As a result, there are certain degrees of risk that must be accepted. Determining the
acceptable level of risk¹ is generally the responsibility of management and is based on
several principles. These are the objective to be achieved, the alternatives available,
and the consequences and values that can be identified. Based upon this, the TLS is a
quantified level of risk (or potential loss) that a system should be designed to deliver
(Brooker, 2004). In aviation, the TLS is usually expressed as a number of aircraft
accidents per flight hour flown, which is used in this thesis, as indicated previously.
The concepts of TLS and risk budgeting are directly linked. Indeed, risk budgeting
represents a top-down distribution of the TLS (or total aviation risk) between
independent sub-categories. The logic behind this process is to specify the maximum
acceptable risk for each sub-category, so that each one has to produce risk equal to or
lower than that prescribed (see Figures 2-1 and 2-3).
¹ Note the difference between acceptable and tolerable risk. Tolerability refers to a “willingness to live with a risk so as to secure certain benefits and in the confidence that it is being properly controlled.” Tolerable risk is not ignored, but is controlled and reduced further if possible. On the other hand, acceptable risk means that we are “prepared to take risk as it is” (Reid, 1996). It should also be noted that acceptable risk is a relative term and is based on different risk perceptions: individual, public (group of individuals), industry (industry usually needs additional pressure to declare a product unsafe), and risk perception by safety experts. They all differ in the level of risk they are willing to ‘accept’.
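A minimal sketch of the top-down risk budgeting described above follows, assuming that the risks of independent sub-categories can simply be added (a reasonable approximation for very small probabilities). The `Budget` class and `over_allocated` function are hypothetical names introduced for illustration: the function flags any node whose sub-category budgets together exceed that node's own allowance. The usage lines borrow the values derived in sections 3.4.3 to 3.4.5 (1E-08 overall, 3E-10 for ATC, 3E-11 for ATC equipment); the remaining allocations are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Budget:
    """A node in a risk budget tree (risk in accidents per flight hour)."""
    name: str
    limit: float
    children: list[Budget] = field(default_factory=list)


def over_allocated(node: Budget) -> list[str]:
    """Names of nodes whose children's limits sum to more than the node's limit.

    Assumes that risks of independent sub-categories are additive.
    """
    flagged = []
    if node.children and sum(child.limit for child in node.children) > node.limit:
        flagged.append(node.name)
    for child in node.children:
        flagged.extend(over_allocated(child))
    return flagged


if __name__ == "__main__":
    atc = Budget("ATC", 3e-10, [
        Budget("ATC equipment", 3e-11),
        Budget("human operators and procedures", 2.5e-10),  # placeholder value
    ])
    total = Budget("air transport", 1e-8, [atc, Budget("other", 9e-9)])  # placeholder
    print(over_allocated(total))  # an empty list means every budget is respected
```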
As pointed out by Brooker (2004), there are several methods to derive the TLS. In most
cases, the analysis starts from the current situation and uses an improvement factor to
derive the desired TLS. In some cases, this improvement factor may be obtained by
extrapolating a past trend into the future. It should incorporate traffic
growth factors and factors representing changes in the systems involved, the operational
procedures, and work practices. In other cases, it may be based on a common
agreement between technical experts, the main idea being to set
challenging but still realistic safety improvement targets.
The following sections provide an overview of the most relevant aviation TLS analyses.
The level of diversity between these approaches highlights the complexity of the
problem and the need for a consistent top-down total air transport system approach.
3.4.3.1.1 Joint Aviation Authorities
The Joint Aviation Authorities (JAA) document JAR 25.1309 is one of the main regulatory
documents in aviation. It defines the fundamental principles that govern aircraft
design and certification. JAR 25.1309 defines the risk of a serious accident due to
“operational and airframe-related causes” to be of the order of one per million hours of
flight. About ten percent of the accidents related to operational and airframe
causes are attributed to aircraft equipment failures (e.g. hydraulic and electrical
systems) and the rest (90 percent) to other operational aspects (JAA, 1994). A
EUROCONTROL review of existing TLS standards and practices (EUROCONTROL,
2000a) argues that this requirement is based on data from the 1960s and as such is
outdated. Furthermore, the JAR requirement relates to aircraft design,
encompassing only aircraft equipment, without consideration of the other components
of the air transport system (including ATM). Accordingly, this JAR requirement needs to
be updated to reflect the major changes in the aviation industry since the 1960s. The
following paragraphs outline several key factors that illustrate the changes and
growth in aviation since the 1960s.
There has been a rapid expansion in the air transport industry over the last four
decades due to a number of factors, including growth in the world economy,
advances in flight technology, and the deregulation of airline services. The result
of these forces has been a steady decline in airline costs and passenger fares, which
has further stimulated traffic growth. As an example of economic growth, ICAO reports
that total gross domestic product (GDP) increased by a factor of
3.8 over the same period (ICAO, 1997). GDP is considered to be the most
appropriate available measure of world output and indicates the health of the global
economy.
Changes in flight technology have also had a major effect on the growth in travel
demand. The modern era of air transportation began in the 1960s. The main driver was
the replacement of piston engines with jet engines, which brought
increased speed, reliability, and comfort. This change led to a reduction in operational
costs, which in turn led to increased travel demand.
In addition, changes in the regulatory environment in both the US and Europe
have had a major effect. The deregulation of airline services in the US in 1978 allowed
airlines to improve services, reduce average costs, increase routes, and increase
the efficiency of scheduling. In Europe, the introduction of a single market for aviation
services by the European Union in 1992 has brought changes similar to those seen in the
USA.
The ICAO Manual on Air Traffic Forecasting (ICAO, 1985) suggests three methods for
forecasting future civil aviation traffic. These methods are trend projection, econometric
analysis, and market and industry survey. Econometric forecasting is the only method
that takes into account various economic, social, and operational factors affecting air
traffic. The objective here is to translate the relevant factors into projections of future
traffic growth. The traffic growth factors are then reviewed further to incorporate
prospective changes in other factors that are not accommodated in the econometric
analysis.
The predicted traffic growth will influence target safety levels through the increase in
the number of flight hours forecast. However, there are other factors, not necessarily
included in this forecast of traffic growth, that have the potential to influence the level of
safety. Some of these factors are: the growth in the total number of aircraft flying as
well as in the passenger capacity of aircraft (e.g. Airbus 380, Airbus 350, Boeing 7E7
Dreamliner), increased airport and airspace congestion, technological development
(e.g. advanced safety nets, satellite-based CNS/ATM), and the pressure to find
tools to control and mitigate human error. Another important factor not considered is
the increasing effect of environmental policies on aviation, in particular on air fares,
costs, and restrictions to possible routes.
Therefore, in line with the EUROCONTROL argument, the JAR requirement should be
informed by an analysis based on an updated sample of accident rates from the
last four decades. At the same time, future predictions and regulations should be based
on econometric forecasting, which accounts for the effect of traffic growth as well as
other economic, technical, and operational factors.
3.4.3.1.2 UK Civil Aviation Authority
The UK Civil Aviation Authority (CAA) has calculated a worldwide fatal accident rate
using the Worldwide Aircraft Accident Summary (WAAS) aviation database sample² for
the period 1990-1999 (UK CAA, 2000). The CAA based its analysis on this sample and
the following assumptions (EUROCONTROL, 2005):
• A fixed annual traffic growth rate until the year 2020 (i.e. 4 percent for
western-built jets); and
• A constant number of fatal accidents per year (i.e. eight fatal accidents each
year).
Based on these assumptions, the UK CAA predicted a rate of 1.8E-07 fatal accidents
per flight for the year 2020. For the purpose of the methodology presented in this
Chapter, this target has been translated into a rate per flight hour using the
information available on the Boeing web site (Boeing, 2004) as follows. The average
flight in 1982 lasted approximately 1.4 hours, while in 2002 it lasted 1.94 hours. If this
trend continues, this research estimates that the average flight in 2020 will last 2.43
hours. Using this assumption, the UK CAA’s TLS for the year 2020 corresponds to
7.4E-08 fatal accidents per flight hour.
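The conversion above can be reproduced in a few lines, assuming that "if this trend continues" means a linear extrapolation of average flight duration through the 1982 and 2002 values (this assumption reproduces the 2.43-hour figure). The function name is illustrative.

```python
def extrapolate_linear(year0: float, value0: float,
                       year1: float, value1: float, target_year: float) -> float:
    """Extend the straight line through (year0, value0) and (year1, value1)."""
    slope = (value1 - value0) / (year1 - year0)
    return value1 + slope * (target_year - year1)


# Average flight duration in hours (Boeing, 2004, as quoted in the text)
avg_flight_2020 = extrapolate_linear(1982, 1.4, 2002, 1.94, 2020)

# UK CAA prediction for 2020, fatal accidents per flight
tls_per_flight = 1.8e-07
tls_per_flight_hour = tls_per_flight / avg_flight_2020

print(f"average flight in 2020: {avg_flight_2020:.2f} h")
print(f"TLS 2020: {tls_per_flight_hour:.1e} fatal accidents per flight hour")
```

The unrounded duration is 2.426 h, giving about 7.42E-08; dividing by the rounded 2.43 h instead gives about 7.41E-08. Either way the result rounds to the 7.4E-08 quoted.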
3.4.3.1.3 International Civil Aviation Organisation
There have been several attempts by ICAO to derive aviation target levels of safety.
These originate from a number of different studies and reports, which are presented
below, from the earliest to the most recent.
² Information published by Flight International (monthly publication of Reed Business
Information Group). Includes accidents and serious incidents worldwide, with the exception of the former Soviet Union (now the Commonwealth of Independent States, CIS) before 1990. The data set covered only commercial aircraft or aircraft with maximum take-off weight above 5.7 t.
• ICAO North Atlantic Systems Planning Group (NATSPG) - the ICAO NATSPG
initially developed a method using data on fatal accidents of jet aircraft in
the period from 1959 to 1966 (EUROCONTROL, 2000a). Based on the available
data³, this analysis estimated a fatal accident rate of 2.34E-06. The analysis
progressed by applying a factor of 0.1 to represent the share of accidents due to
collision. The basis for this assumption is neither evident nor recorded. An
improvement factor of between two and five was then applied to justify applying the
historical data to future targets (EUROCONTROL, 2000a). This resulted in a TLS of
between 1.2E-07 and 4.6E-08 fatal accidents per flight hour due to collision. Finally,
the analysis apportioned the TLS to the three dimensions of flight and thus calculated a
TLS for collision due to loss of lateral separation of between 4E-08 and
1.5E-08 fatal accidents per flight hour.
• ICAO Review of the General Concept of Separation Panel (RGCSP) - in 1995,
the ICAO RGCSP reviewed several approaches to deriving a TLS for ATM and
accepted the one developed by ICAO NATSPG. The RGCSP assumed a total
accident rate from all causes of 1E-07 per flight hour for the year 2010. This
TLS is based upon the NATSPG analysis extrapolated to the year 2010
(Brooker, 2004). Based on the contributions from the US (TLS ranging between
2E-09 and 7E-09) and the USSR⁴, the RGCSP agreed upon a TLS value that
should be used for establishing any vertical minimum performance
specification. This value is 5E-09 fatal accidents per flight hour or better,
arising from collisions due to any cause, for the period 2000 to 2010.
This TLS value also appears in ICAO Annex 11 (ICAO, 2001c);
• ICAO Annex 11 - in the situation where “fatal accidents per flight hour” is
considered to be an appropriate metric, ICAO Annex 11 (ICAO, 2001c)
proposes a TLS of 5E-09 fatal accidents per flight hour per dimension after the
year 2000. Although ICAO Annex 11 does not provide any justification for this
TLS, it is assumed that this value is taken from the ICAO RGCSP. For the
period prior to the year 2000, ICAO Annex 11 recommends the use of a TLS of
2E-08 fatal accidents per flight hour per dimension; and
• ICAO All-Weather Operations Panel (AWOP) - the objective of the ICAO AWOP
was to assess the required navigation performance (RNP) for approach,
landing, and departure phases of flight (ICAO, 1994). Based upon historical
data⁵, ICAO’s calculation determined the average hull loss rate to be 1.87E-06 per
flight or 1.27E-06 per flight hour. Based on this historical data, ICAO proposed a
TLS for hull loss of 1E-07 per flight hour. The rationale for this risk
improvement over the historical accident rate is the removal of pilot errors through
the use of glass cockpit aircraft and tunnel incident alarms. The glass cockpit is a
system of electronic displays presenting all information on an aircraft's situation,
position, and progress. The tunnel incident alarm is an alert that is triggered if
the aircraft unintentionally leaves the assigned flight path, the “tunnel”, during
the approach and landing phases of flight. Additionally, the objective in aviation
safety is to reduce the number of accidents despite increasing flight hours. This
is essential if public confidence in aviation is to be maintained as the global air
transport system expands.
³ Based on 36 fatal accidents and an estimate of 15.5 million flight hours during the period 1959-1966.
⁴ The USSR developed a series of targets for progressive implementation, such as 1E-08 from 1990 to 2000, 5E-09 for 2000-2010, and 2E-09 for 2010 onwards (ICAO, 1995).
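The ICAO figures above can be checked with simple arithmetic. This sketch uses only the values quoted in the text and footnotes; the equal split across the three dimensions is inferred from the quoted numbers rather than stated explicitly in the source. Note that the footnote figures (36 fatal accidents over 15.5 million flight hours) give about 2.32E-06, slightly below the 2.34E-06 quoted.

```python
# NATSPG: historical fatal accident rate for jets, 1959-1966 (per flight hour)
footnote_rate = 36 / 15.5e6          # about 2.32E-06, from the footnote figures
natspg_rate = 2.34e-06               # value quoted in the text
collision_share = 0.1                # share of accidents attributed to collision
improvement_factors = (2, 5)         # range of improvement factors applied

collision_tls = [natspg_rate * collision_share / f for f in improvement_factors]
# about 1.17E-07 and 4.68E-08 (reported as 1.2E-07 and 4.6E-08)

reported_collision_tls = (1.2e-07, 4.6e-08)
lateral_tls = [t / 3 for t in reported_collision_tls]
# 4.0E-08 and about 1.53E-08 (reported as 4E-08 and 1.5E-08), assuming an
# equal split across the lateral, longitudinal, and vertical dimensions

# ICAO Annex 11: 5E-09 per flight hour per dimension, three dimensions
annex11_total = 5e-09 * 3            # 1.5E-08 per flight hour (Table 3-2)

# ICAO AWOP: hull loss rate per flight converted with a 1.47 h average flight
awop_per_flight_hour = 1.87e-06 / 1.47   # about 1.27E-06 per flight hour

for label, value in [("footnote rate", footnote_rate),
                     ("collision TLS", collision_tls),
                     ("lateral TLS", lateral_tls),
                     ("Annex 11 total", annex11_total),
                     ("AWOP hull loss", awop_per_flight_hour)]:
    print(label, value)
```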
3.4.3.1.4 Summary of the various TLS analyses
The previous section has given an overview of the research on aviation TLS, which is
summarised in Table 3-2 (based on the information available). This table enables
comparison of the TLS taking into account the source of data, the time period covered
by the data set, the type of accident, the type of aircraft operation, and the TLS unit
used.
The differences in the derivation of TLS should once again be pointed out. The
summary presented shows the level of discrepancy in the methods, data sets, and
taxonomies used. The major factors that drive the differences in the calculation of
target levels of safety are:
• Type of accident (accident, fatal accident, hull loss),
• Weight of aircraft involved in the accident,
• Differences in the definitions (i.e. taxonomies used),
• Type of operations analysed: scheduled vs. non-scheduled, commercial vs.
non-commercial (military, freight, general aviation), registered vs.
non-registered, domestic vs. international,
• Type of aircraft included: jets vs. turboprops,
• Time frame of the data set analysed,
• Source of the data,
⁵ The data set covers hull loss accidents for the period from 1959 to 1990 for commercial jet aircraft
whose weight exceeds 60,000 lb. Exposure percentages are based on an average flight duration of 1.47 h. A hull loss accident is defined as an accident in which the aircraft is destroyed or damaged beyond economic repair.
• Region involved in the analysis (with or without the former Soviet Union),
• Targeted year for the TLS calculation: current vs. future levels.
Table 3-2 Summary of various analyses on aviation TLS

| Reference | Title | Database | Scope: region/time period | Scope: type of operation/weight/type of accident | Target year | TLS |
|---|---|---|---|---|---|---|
| Joint Aviation Authorities | JAR 25.1309 Large Aeroplanes - Advisory Material - AMJ | Not specified | Worldwide, 1960s | Serious accident | Not specified | 1E-06 per flight hour |
| UK Civil Aviation Authority | Aviation Safety Review, CAP 701 | WAAS | Worldwide, 1990-1999 | Jets & turboprops/MTOW>5.7t/fatal accidents | 2020 | 1.8E-07 per flight/7.4E-08 per flight hour |
| ICAO | North Atlantic Systems Planning Group (NATSPG) | Not specified | Worldwide, 1959-1966 | Jets | Not specified | 2.34E-06 per flight |
| ICAO | Review of the General Concept of Separation Panel (RGCSP) | Not specified | Not specified | Jets/fatal accidents | 2010 | 1E-07 per flight hour |
| ICAO | Annex 11 | Not specified | Worldwide | En route fatal accidents | After the year 2000 | 5E-09 per flight hour per dimension (1.5E-08 per flight hour) |
| ICAO | All-Weather Operations Panel (AWOP), 15th meeting | Not specified | Worldwide, 1959-1990 | Jets/MTOW>60,000lb/hull loss accidents | Not specified | 1E-07 per flight hour |
Key: MTOW = maximum take-off weight of the aircraft
After reviewing the most relevant analyses and methods of TLS calculation, a TLS
of 1E-08 accidents per flight hour is used as the baseline for the year 2020 (the target year
of the research presented in this thesis). The reasons for using this baseline are:
• The rate of 1E-07 is currently used as a target by ICAO for both fatal accidents
and hull loss accidents (see Table 3-2);
• With the overall aim of reducing the accident rate relative to current safety
targets, it is reasonable to aim at 1E-08 accidents per flight hour in the year
2020;
• The analysis conducted by the UK CAA to predict the rate of fatal accidents for
2020 (i.e. 7.4E-08 fatal accidents per flight hour).
Once the TLS for the year 2020 is determined, the next step is to apportion the
contribution of ATC to the overall air transport TLS. To establish this, several studies
have been reviewed. The key findings are presented in the following section.
3.4.4 Target level of safety and Air Traffic Control risk budgeting
The next step is to determine the risk budget allocation for the ATC system as a
component of the overall air transport system, i.e. to determine the contribution of ATC.
According to the results of the UK CAA’s analysis, the contribution of ATC and ground
aids to aircraft accidents is 1.7 percent (Table 13 in EUROCONTROL, 2005).
EUROCONTROL currently uses 2 percent as the maximum direct contribution of ATM to
aircraft accidents within the European Civil Aviation Conference (ECAC) region. This
figure was derived from historical data (ICAO ADREP database focused on the
ECAC region), from which the contribution of ATC was determined to be 1.1 percent
(EUROCONTROL, 2001a). Recognising that only ATC causes were accounted for
(without the contribution of other ATM components, such as ATS, ASM, and ATFM),
EUROCONTROL allowed an additional 0.9 percent, resulting in a 2 percent ATM
contribution to aircraft accidents. This figure has been further validated through
discussions with the Hazard Classification Matrix (HCM) task force of the
EUROCONTROL Safety Regulatory Commission. EUROCONTROL has defined “the maximum tolerable
probability of ATM directly contributing to an accident of a commercial air transport
aircraft” in the ECAC region to be 1.55E-08 per flight hour (EUROCONTROL, 2001b).
This figure is based on the aircraft accident rate for the year 1999 (extracted from the
ICAO ADREP database focusing on the ECAC region), the direct ATM contribution (2
percent), and a forecast 6.7 percent increase in traffic volumes for the period
1999-2015 (EUROCONTROL, 2001a).
In the Netherlands, a study by the National Aerospace Laboratory (NLR) used a sample of
civil aircraft accidents that occurred worldwide during the period 1980-1999, mostly
based on the ICAO database (van Es, 2003). This study determined that ATM-related
accidents represent 8 percent of the total number of accidents. Additionally, 28 percent
of these ATM-related accidents are directly caused by ATC, which makes the ATC
contribution to aircraft accidents approximately 2.2 percent. The difference in the
contribution of ATC in these two studies is due to the difference in classification of
causal factors. While the UK CAA analysis divided all underlying factors into primary,
causal, and circumstantial groups, the NLR analysis followed the recommendation by
ICAO and did not use this distinction. The NLR study considered an occurrence to be a
causal factor only if that occurrence was part of the chain of events leading to the
accident. The NLR approach therefore seems better suited to determining the overall
ATC contribution to aircraft accidents.
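The percentages quoted in this section combine as follows. The sketch only restates the arithmetic behind the EUROCONTROL, UK CAA, and NLR figures; the variable names are chosen for illustration.

```python
# EUROCONTROL (ECAC region): historical ATC share plus an allowance for
# other ATM components, in percent of aircraft accidents
eurocontrol_atc = 1.1
eurocontrol_allowance = 0.9
eurocontrol_atm = eurocontrol_atc + eurocontrol_allowance   # 2.0 percent

# UK CAA: ATC and ground aids, in percent of aircraft accidents
uk_caa_atc = 1.7

# NLR (worldwide, 1980-1999): ATM-related share of all accidents, and the
# share of those ATM-related accidents directly caused by ATC
nlr_atm_share = 0.08
nlr_atc_within_atm = 0.28
nlr_atc = 100 * nlr_atm_share * nlr_atc_within_atm          # about 2.24 percent

print(f"EUROCONTROL ATM: {eurocontrol_atm:.1f}%")
print(f"UK CAA ATC:      {uk_caa_atc:.1f}%")
print(f"NLR ATC:         {nlr_atc:.1f}%")
```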
The results presented above need to be increased to allow for possible statistical error
and uncertainties linked to the reporting processes, as well as to provide additional
protection for the future. As previously discussed, EUROCONTROL allowed an additional
0.9 percent for statistical error and uncertainties when calculating ATM safety
targets for the ECAC region from historical data covering only one component of ATM,
namely ATC (EUROCONTROL, 2001a). With this in mind, together with the results
from the UK CAA and NLR studies, this thesis uses a maximum ATC contribution of 3
percent. Thus, using the TLS for the air transport system for the year 2020 established
in the previous section, the apportioned contribution of ATC is taken to
be 3E-10 per flight hour. Having derived the TLS for ATC specifically, this functional
block should now be divided between human operators, equipment, and procedures.
This makes it possible to define the appropriate risk induced by failure of
ATC equipment, which is presented in the next section.
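Under the 3 percent maximum ATC contribution adopted above, the ATC budget is a single proportional allocation of the 2020 baseline. A minimal sketch, in which the function name `apportion` is illustrative:

```python
def apportion(parent_budget: float, share: float) -> float:
    """Allocate a fraction `share` (between 0 and 1) of a parent risk budget."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must lie between 0 and 1")
    return parent_budget * share


TLS_2020 = 1e-08     # accidents per flight hour, baseline adopted in this thesis
ATC_SHARE = 0.03     # maximum ATC contribution adopted in this thesis

atc_tls = apportion(TLS_2020, ATC_SHARE)
print(f"ATC TLS 2020: {atc_tls:.0e} per flight hour")
```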
3.4.5 Target level of safety and Air Traffic Control equipment risk budgeting
It is important to determine the contribution of equipment (or its failure or malfunction)
to the ATC risk budget. Historical data on the proportion of incidents in which
equipment failure is implicated vary to a certain degree. Interviews with system
control and monitoring staff at two European ATC Centres⁶, as well as the
approximation used in the CORA 2 documentation (EUROCONTROL, 2004c), reveal
that equipment failures are the causal factor in 0.01 (one percent) of all incidents.
Although this assumption is based on the ATM system and not only its ATC component,
it is used with other sources of information to inform ATC equipment risk
budgeting within the overall air transport system.
A more focused approach is provided by the NLR study (van Es, 2003). This study
determined that the particular causal factor ‘ATC ground aid malfunction or unavailable’
has been attributed to 5 percent of all ATM-related accidents, or 18 percent of all
ATC-related accidents. It should be noted that this causal factor includes ‘unavailable’ ATC
equipment, meaning equipment that was taken out of service by ATC staff, presumably
for maintenance reasons. In addition, the research was based on data samples that
incorporated older systems with lower levels of automation. Future systems are shifting
towards higher levels of automation and higher reliability, as discussed in the
previous Chapter.
⁶ Based upon private communications with staff at two European Area Control Centres (ACCs).
Therefore, it can be approximated that equipment failures represent the causal factor in
10 percent of all ATC-related accidents (or 3 percent of all ATM-related accidents). This
is based on the assumption that unscheduled failures constitute about 50 percent of
the failures in the NLR analysis discussed above (half of 18 and of 5 percent, rounded
up). This approach gives a risk of an ATC equipment failure leading to an aircraft
accident of 3E-11 per flight hour (10 percent of the ATC budget of 3E-10). The
reasoning presented seems consistent with the widespread argument that human
error represents the causal factor in 70-80 percent of all accidents (Reason, 1997),
although there is some evidence that the majority of these human errors are
organisational errors (Johnson and Holloway, 2004). A graphical representation of the
determined risk budgets is given in Figure 3-5.
Figure 3-5 Aviation TLS and risk budgeting
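The equipment share and the resulting 3E-11 figure shown in Figure 3-5 can be traced step by step. The 50 percent unscheduled-failure fraction is the assumption stated in the text, and the rounding of 9 percent up to 10 percent (and of 2.5 percent to 3 percent) follows the approximation made there. Variable names are illustrative.

```python
GROUND_AID_SHARE_OF_ATC = 0.18  # 'ATC ground aid malfunction or unavailable' (NLR)
GROUND_AID_SHARE_OF_ATM = 0.05
UNSCHEDULED_FRACTION = 0.5      # assumed share of genuine (unscheduled) failures

equipment_share_of_atc = GROUND_AID_SHARE_OF_ATC * UNSCHEDULED_FRACTION  # 0.09
equipment_share_of_atm = GROUND_AID_SHARE_OF_ATM * UNSCHEDULED_FRACTION  # 0.025

ADOPTED_EQUIPMENT_SHARE = 0.10  # 0.09 rounded up, as in the text
ATC_TLS_2020 = 3e-10            # accidents per flight hour (section 3.4.4)

equipment_tls_2020 = ATC_TLS_2020 * ADOPTED_EQUIPMENT_SHARE
print(f"equipment share of ATC accidents: {equipment_share_of_atc:.0%}")
print(f"equipment share of ATM accidents: {equipment_share_of_atm:.1%}")
print(f"ATC equipment TLS 2020: {equipment_tls_2020:.0e} per flight hour")
```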
After assessing the contribution of ATC equipment failures to the overall risk of an aircraft
accident, it is important to validate these findings against operational experience.
This is achieved in the following section by analysis of operational failure reports from
three countries.
3.5 Preliminary analysis and validation of operational failure reports
The previous sections described the process of deriving an overall aviation TLS for the
reference year 2020 and further risk budgeting for ATC equipment. In order to justify
the use of the available sample of operational reports in this thesis, this sample is
validated against the proposed TLS methodology. This is presented in the following
paragraphs.
Given the accident rate for the year 2000 (of the order of 1E-06 per flight hour;
EUROCONTROL, 2005) and the predicted accident rates for the years 2010 (1E-07;
Brooker, 2004) and 2020 (1E-08, used in this research), it is apparent that safety levels
are predicted to improve tenfold every decade. This is in line with the attempts of
various aviation institutions (e.g. FAA, ICAO) to significantly improve future aviation
safety levels. The next step is to apply the established rate of improvement to ATC
equipment failures.
Using the same analogy and the ratios within the air transport system, as presented in
Figure 3-5, it is possible to translate the 2020 rate of ATC equipment contribution to
aircraft accidents back to present levels (i.e. 2000). The calculation presented in section
3.4.5 showed that for the year 2020 this contribution is of the order of 3E-11 per flight
hour. Working backwards over two decades, each with a tenfold improvement, this
corresponds to 3E-09 for the year 2000. In other words, based on past research and
established ratios, the contribution of equipment failures to the overall safety of the air
transport system in the current period is of the order of 3E-09 per flight hour.
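The back-calculation above rests on one assumption: a constant tenfold improvement per decade, as inferred from the 2000, 2010, and 2020 accident rates. A sketch, in which the function name is illustrative:

```python
def rate_at(year: int, ref_year: int, ref_rate: float,
            improvement_per_decade: float = 10.0) -> float:
    """Rate in `year`, given `ref_rate` in `ref_year` and a constant improvement
    factor per decade (years before `ref_year` therefore have higher rates)."""
    decades_before_ref = (ref_year - year) / 10
    return ref_rate * improvement_per_decade ** decades_before_ref


# Overall accident rate: 1E-08 in 2020 implies 1E-07 in 2010 and 1E-06 in 2000
overall = {y: rate_at(y, 2020, 1e-08) for y in (2000, 2010, 2020)}

# ATC equipment contribution: 3E-11 in 2020 implies 3E-09 in 2000
equipment_2000 = rate_at(2000, 2020, 3e-11)
print(overall, f"{equipment_2000:.0e}")
```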
Having established the contribution of equipment failures to the overall safety of the air
transport system from past research, it is necessary to calculate the same value
using the available operational failure reports. Agreement between the ATC equipment
risk budget obtained from past research and the rate obtained from the available failure
reports would indicate that the available sample is representative of equipment failures
occurring in the operational ATC environment.
Firstly, it is important to discuss the overall commercial air transport accident rates for
the three countries analysed. These rates, ranging from 9E-06 to 1E-05 aircraft
accidents per flight hour, are about an order of magnitude higher than the worldwide
average (1E-06 per flight hour; see Figure 3-5). Secondly, it is necessary to discuss the
available sample of operational failure reports by focusing on the frequency of equipment failure reports per
Chapter 3 Preliminary Assessment
66
year and per source. The incident reports used in this section were from three sources,
namely three Civil Aviation Authorities (CAAs), presented as Country A (for the period
1999 to 2003), Country B (for the period 2001 to 2005), and Country C (for the period
1992 to 2004). The final results of this preliminarily analysis of available operational
reports are presented in Table 3-3. The average number of failures is calculated for all
three data sets (column 4). This is followed by the calculation of incident rates based
on the average flight hours flown for the given time periods (column 5). The final step
involved adjustment of the calculated incident rate to give the probability of accident
caused by equipment failure (using the accident to incident rate of 1 in 10,000) as
shown in the last column on Table 3-3. In other words this calculation produced the
operational level of safety for three countries and three respective time periods.
Table 3-3 Analysis of operational failure reports and results
Columns: (1) Country; (2) Year; (3) Total number of equipment failures reported; (4) Average number of equipment failures per year; (5) Rate of failure - incident (per flight hour); (6) Rate of failure - accident (per flight hour)
(1)  (2)   (3)  (4)     (5)       (6)
A    1999  100  158.2   1.15E-04  1.15E-08
     2000  107
     2001  122
     2002  287
     2003  175
B    2001  184  264.8   2.58E-04  2.58E-08
     2002  237
     2003  171
     2004  247
     2005  485
C    1992  28   34.46   8.85E-05  8.85E-09
     1993  38
     1994  41
     1995  21
     1996  16
     1997  42
     1998  40
     1999  25
     2000  38
     2001  27
     2002  46
     2003  42
     2004  44
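The arithmetic behind columns (4) to (6) of Table 3-3 can be reproduced in a few lines. A minimal sketch, using the yearly counts and incident rates from the table and the 1 in 10,000 accident-to-incident ratio stated above. The annual flight hours are not reported, so the sketch only back-calculates the values implied by columns (4) and (5); all function names are illustrative.

```python
# Yearly equipment failure reports, column (3) of Table 3-3
reports = {
    "A": [100, 107, 122, 287, 175],                              # 1999-2003
    "B": [184, 237, 171, 247, 485],                              # 2001-2005
    "C": [28, 38, 41, 21, 16, 42, 40, 25, 38, 27, 46, 42, 44],   # 1992-2004
}
# Incident rates per flight hour, column (5) of Table 3-3
incident_rate = {"A": 1.15e-4, "B": 2.58e-4, "C": 8.85e-5}

ACCIDENTS_PER_INCIDENT = 1 / 10_000  # accident-to-incident ratio used in the text


def average_per_year(counts):
    """Column (4): mean number of reported failures per year."""
    return sum(counts) / len(counts)


def accident_rate(incident_rate_per_fh):
    """Column (6): incident rate scaled by the accident-to-incident ratio."""
    return incident_rate_per_fh * ACCIDENTS_PER_INCIDENT


def implied_flight_hours(avg_failures, incident_rate_per_fh):
    """Annual flight hours implied by columns (4) and (5); not given in the table."""
    return avg_failures / incident_rate_per_fh


for country, counts in reports.items():
    avg = average_per_year(counts)
    rate = incident_rate[country]
    print(country, round(avg, 2), f"{accident_rate(rate):.2e}",
          f"{implied_flight_hours(avg, rate):.2e}")
```

The implied flight hours are only a consistency check on the table; they should not be read as reported traffic figures.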
Compared with the contribution of equipment failures to the overall safety of the air
transport system derived from past research and the overall TLS methodology (3E-09
per flight hour), the accident rates obtained from the operational reports (last column in
Table 3-3) show a degree of conformity: they are of the same order of magnitude,
between roughly three and nine times higher.
Conformity would be even closer if a higher overall accident rate were assumed for the
year 2000 (the data indicate 1E-05, as opposed to the 1E-06 accepted within the
aviation community). Furthermore, better tuning of the current and future trade-offs
within the air transport system (see Chapter 2, Figures 2-1 and 2-3) would further refine
the proposed methodology for determining the risk budget of ATC equipment.
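The comparison in the two paragraphs above can be made explicit as a ratio between the operational accident rates and the budget derived from past research. A minimal sketch; treating "conformity" as agreement within a factor of ten is an assumption made here for illustration, as is the proportional scaling of the equipment budget under the alternative 1E-05 baseline:

```python
BUDGET_2000 = 3e-9  # ATC equipment contribution for 2000 derived from past research

# Column (6) of Table 3-3: accidents per flight hour attributed to equipment failure
operational = {"A": 1.15e-8, "B": 2.58e-8, "C": 8.85e-9}


def conformity_ratio(observed, budget):
    """How many times the observed rate exceeds (>1) or falls below (<1) the budget."""
    return observed / budget


def same_order_of_magnitude(observed, budget):
    # Assumed test of 'conformity': observed rate within a factor of ten of the budget
    return 0.1 <= conformity_ratio(observed, budget) < 10


for country, rate in operational.items():
    print(country, round(conformity_ratio(rate, BUDGET_2000), 1),
          same_order_of_magnitude(rate, BUDGET_2000))

# Hypothetical alternative: a 2000 baseline of 1E-05 rather than 1E-06 accidents per
# flight hour, with the equipment share scaled proportionally (assumption)
budget_alt = BUDGET_2000 * (1e-5 / 1e-6)
print({c: round(conformity_ratio(r, budget_alt), 2) for c, r in operational.items()})
```

Under the alternative baseline the ratios fall below one and closer to unity, which is the sense in which a higher year-2000 level would improve conformity.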
Future advances in technology, changes in traffic levels, and broader changes in the
ATC/ATM philosophy (e.g. shifting separation responsibility from the ground to the air)
have the potential to improve safety. At the same time, it is reasonable to assume that
the distribution of risk within the air transport system will change. The ATC-specific
results given here could be used as an input to a complete safety analysis that
considers trade-offs between the various components of the aviation system to realise
risk budgets for a safe and cost-effective system. Finally, the severity of the reported
incidents could be used to weight them and so better reflect the accident-to-incident
ratio, as the above analysis treated all incidents equally.
In short, the above analysis indicates that the available operational failure reports are a
representative sample of equipment failures occurring in operational ATC Centres.
Having established the appropriateness of this sample, the following Chapter turns to
identifying the operational characteristics of equipment failures extracted from past
research and operational failure reports.
3.6 Summary
This Chapter starts with a precise definition of equipment failures and of hazards, the
sub-group of equipment failures that require human intervention (or human recovery). It
continues by presenting the sample of operational failure reports available in this
research. After discussing the reporting schemes designed to capture incident
occurrences, including equipment failures, the Chapter highlights data pre-processing
problems and the solutions applied to overcome them. To ensure the relevance of the
equipment failures captured in the available sample, the remainder of the Chapter
builds a framework for its validation. This risk assessment framework, based entirely on
past literature, begins with the risk assessment of the overall air transport system and
then focuses on one component, namely ATC equipment. In other words, this part of the
Chapter determines the maximum allowed accident risk imposed by ATC equipment
failures for the target year 2020.
The contribution of equipment failures to the overall safety of the air transport system
derived from past literature was then compared with the result obtained from the
analysis of the available sample. This analysis showed a degree of agreement between
the theoretically assumed and operationally extracted levels of ATC equipment risk
budgeting. In other words, the available operational failure reports are a representative
sample of equipment failures occurring in the operational ATC environment. Hence, the
next Chapter proceeds with a detailed assessment of the equipment failure
characteristics extracted from operational failure reports and available literature.
Chapter 4 Equipment Failures in ATC
69
4 Equipment Failures and Technical Defences in Air Traffic Control
The previous Chapter showed that operational failure reports available in this thesis
constitute a representative sample of equipment failures occurring in the operational Air
Traffic Control (ATC) environment. This Chapter moves toward the identification of the
operational characteristics of equipment failures. These are extracted from past
research and more than 20,000 operational failure reports. Special attention is paid to
the impact that equipment failures may have on ATC operations, and as a result a
severity rating scheme has been designed to support the research presented in this
thesis. Having discussed the consequences of equipment failures and their impact on
ATC operations, it is important to discuss how such consequences can be prevented or
mitigated. This involves the process of recovery from equipment failure, in which a
distinction can be made between technical and human recovery. This Chapter
discusses technical recovery by reviewing the existing technical built-in defences,
whilst the next Chapter discusses human (i.e. controller) recovery. A subset of
equipment failure characteristics relevant to ATC operations is then used in this
Chapter to develop a novel tool for the assessment of the severity of equipment
failures, known as the qualitative equipment failure impact assessment tool. This tool
enables an assessment of the overall impact of an equipment failure on ATC
operations.
4.1 Equipment failure characteristics
When dealing with any type of equipment failure, it is important to understand its
underlying characteristics, that is, to take into account issues such as causes,
consequences, duration, and complexity. A detailed hazard analysis should therefore
capture the most important characteristics of a failure and the context surrounding its
occurrence (Leveson, 1995). The following sections explain several important failure
characteristics:
• ATC functionality affected;
• Complexity of failure type;
• Time course of failure development;
• Duration of failure;
• Potential causes of equipment failure; and
• Consequences of equipment failure.
The consequences of equipment failures are discussed at several levels, from their
impact on the individual (i.e. the air traffic controller) through the operations room and
the ATC system to the overall ATM system.
4.1.1 ATC functionality affected
The methodology adopted in this thesis for the classification of ATC functionalities
results in a nine-category classification (Chapter 2, section 2.3). Several examples of
equipment failures related to different ATC functionalities are presented in Table 4-1.
These examples were randomly selected from the operational failure reports available
in this research (discussed previously in Chapter 3) and de-identified.
Table 4-1 Examples of equipment failures related to different ATC system functionalities (as defined in Chapter 2)
Type of failure: Example
Communication function: Total radio telephony failure on three frequencies (three sectors). The workstation had to be reset to the default fallback setting.
Navigation function: The Runway 15 Instrument Landing System (ILS) failed whilst an aircraft was on a 16 NM final approach in Instrument Meteorological Conditions (IMC). The Approach Control Centre was advised and the aircraft confirmed the failure. The aircraft was preparing for a missed approach when the ILS returned to service after recovery.
Surveillance function: Erroneous altitude readings were displayed on radar for a B777 and a B767 at FL340 and FL350, respectively. Short Term Conflict Alert (STCA) was activated.
Data processing function: Triple failure on suite flight data exchange. The system fully recovered after 40 min by manual intervention. Departures from two airports were stopped for approximately 10 min. The cause was the existence of duplicate flight identity numbers within the flight data held in the affected workstations.
Supporting function: A B737 was on final approach at 50 ft over the runway when the controller received a false Approach Monitoring Aid (AMA) warning. The controller was concerned that in low visibility conditions a go-around would have been given unnecessarily.
Safety nets (SNET): STCA failed to activate against two aircraft at FL120. One aircraft was dropping parachutes, with the other filming them; consequently, the aircraft were quite close to each other. Both were squawking Secondary Surveillance Radar (SSR) codes, but STCA failed to activate.
Power supply: At 0535 a power failure in the tower caused failures of the Radar Data Processing System (RDPS) and Flight Data Processing System (FDPS), radar, public telephone network, weather radar, and computers. At 0650 the position was rebooted and upgraded. ATC service returned to normal at 0730.
Pointing and input devices: Cursor frozen in the global ops field of the electronic flight strip. The controller was moved to an adjacent console and resumed operations from that position. There was only a brief interruption to the service.
System monitoring and control function: At 0215 the ATC system suffered a significant slowdown. The System Monitoring (SMS) shut itself down.
4.1.2 Complexity of failure type
Failures can be single or multiple component failures (Wickens et al., 1998). A single
failure, whether total or partial, affects only one piece of equipment or one of its
components. Multiple component failures can be independent of each other (which can
make diagnosis very difficult) or dependent (common cause, common mode, or cascade
failures) (Mauri, 2000). Common cause failures occur when
a single cause creates simultaneous (or near simultaneous) multiple failures (e.g. due
to fire, loss of power, or software bug). Common mode failures are a subset of common
cause failures whose observed effect on the system is identical. Cascade failures are
dependent failures that affect redundant components by shifting their load sequentially
(e.g. power grids or servers). Once the first level of redundancy is pushed beyond its
capacity (e.g. transformer), the load will be shifted onto the next redundant component
until all redundancies are exhausted (Mauri, 2000).
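The load-shifting mechanism of a cascade failure can be illustrated with a toy model. A minimal sketch, assuming a hypothetical set of redundant components that each carry some load: when one fails, its load moves to the next surviving component, and any component pushed beyond its capacity fails in turn. The loads, capacities, and the sequential ordering are illustrative assumptions, not data from the failure reports.

```python
def cascade(loads, capacities, first_failure):
    """Hypothetical cascade-failure model.

    When a component fails, its load is shifted onto the next component in
    sequence; if that pushes the component beyond its capacity, it fails too
    and passes the accumulated load on, until the load is absorbed or every
    redundancy is exhausted. Returns failed component indices in order."""
    loads = list(loads)
    failed = [first_failure]
    shifted = loads[first_failure]
    loads[first_failure] = 0
    i = first_failure
    while shifted > 0:
        i = (i + 1) % len(loads)
        if i in failed:                 # wrapped round: all redundancies exhausted
            break
        loads[i] += shifted
        if loads[i] <= capacities[i]:   # this redundancy absorbs the extra load
            break
        failed.append(i)                # pushed beyond capacity: it fails as well
        shifted, loads[i] = loads[i], 0
    return failed


# Three equally loaded components, none able to carry twice its normal load
print(cascade([60, 60, 60], [100, 100, 100], first_failure=0))
# A single component with spare capacity stops the cascade
print(cascade([60, 60, 60], [100, 150, 100], first_failure=0))
```

The second call shows why headroom in the first level of redundancy matters: the same initial failure either collapses every component or is contained at once.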
4.1.3 Time course of failure development
In terms of time course of failure development, there are sudden, gradual, or latent
failures. With sudden failures, the operator does not have much time to prepare for
recovery, but at the same time there is the potential advantage of immediate detection
of the failure. In contrast, gradual failures may degrade system capabilities in ways
that are not apparent to the operator (e.g. gradual loss of data integrity). This makes
failure detection, and therefore technical and human recovery, extremely difficult.
Latent failures are generally difficult to detect. These failures exist in the system
unnoticed until some other failure or unusual event reveals them (Wickens et al.,
1998). As a result, this group of failures is considered separately, as the time course of
their initial development is not known, i.e. these failures could initially occur as either
sudden or gradual failures.
4.1.4 Duration of failure
Duration of failure is defined as the time from the first log of the event (which
corresponds closely to failure detection) to its final closure. For a specific failure, it can
carry important information on recovery and on its impact on ATC, ATM, and overall
aviation safety. The categories defined in this research are based on evidence from the
available operational failure reports, whose analysis indicates a distribution of failure
duration corresponding to the following categories (section 4.4.6):
• Short period of time - order of magnitude is minutes;
• Moderate period of time - order of magnitude is minutes up to one hour; and
• Substantial period of time - order of magnitude is hours (it can extend to days).
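Applying this definition to a report means subtracting the first log time from the closure time and assigning the result to a category. A minimal sketch; the numeric cut-offs of 10 and 60 minutes are hypothetical, since the categories above are given only as orders of magnitude, and the times used in the usage line are taken from the power supply example in Table 4-1 as a stand-in for the first log and closure times:

```python
from datetime import datetime

# Hypothetical cut-offs in minutes; the text gives only orders of magnitude
SHORT_MAX_MIN = 10
MODERATE_MAX_MIN = 60


def failure_duration_minutes(first_log, closure, fmt="%H%M"):
    """Duration from the first log of the event to its final closure.
    Assumes both times fall on the same day (no midnight crossing)."""
    delta = datetime.strptime(closure, fmt) - datetime.strptime(first_log, fmt)
    return delta.total_seconds() / 60


def duration_category(minutes):
    """Map a duration onto the three categories (hypothetical thresholds)."""
    if minutes <= SHORT_MAX_MIN:
        return "short"
    if minutes <= MODERATE_MAX_MIN:
        return "moderate"
    return "substantial"


# Power supply example from Table 4-1: failure at 0535, normal service at 0730
minutes = failure_duration_minutes("0535", "0730")
print(minutes, duration_category(minutes))
```

Failures that extend over days would need full date-time stamps rather than the four-digit times used here.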
4.1.5 Potential causes of equipment failures
The causes of equipment failures come from three interacting sources. These are:
• Technical faults: defects or anomalies built into the system or its components;
• Human errors or violations: acts of omission or commission by the designer,
constructor, controller, engineer, or maintenance personnel that might result in a
failure; and
• External factors: unfortunate, unforeseen, or uncontrolled events, such as severe
weather, fire, accidents, vandalism, sabotage, or terrorism.
The listed causes of failures represent only the first layer of causation. Further analysis
might reveal the existence of organisational error, organisational loss of control, or
failure to anticipate all hazardous conditions and prepare appropriate defences against
them. As an example, the impact of a power outage should be anticipated by
management and consequently appropriate preventive strategies should be
implemented. Similarly, the threat of either terrorism or vandalism should be guarded
against through the provision of adequate internal security measures.
There are various techniques designed to investigate technical faults, human error, and
organisational error. For technical faults, Fault Trees (FT), Event Trees (ET), and
Probabilistic Safety Assessment (PSA) are mostly applied (Brooker, 2006); human
error is investigated by a range of Human Reliability Assessment (HRA) techniques
which are discussed in more detail in Chapters 7 and 8. Finally, organisational errors
are mostly investigated using the Reason model (Reason, 1997), the Human Factors
Analysis and Classification System (HFACS) (Shappell, 2000), or the qualitative
principles behind a safety culture (Sorensen, 2002).
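Of the techniques for technical faults listed above, a fault tree is the simplest to sketch: basic events are combined through AND gates (all inputs must fail) and OR gates (any input failing is enough). A minimal sketch, assuming independent basic events and hypothetical per-hour failure probabilities chosen only for illustration:

```python
from math import prod


def and_gate(*probs):
    """Output event occurs only if all inputs fail (e.g. main and standby).
    Assumes the input events are independent."""
    return prod(probs)


def or_gate(*probs):
    """Output event occurs if any input fails. Assumes independent inputs."""
    return 1 - prod(1 - p for p in probs)


# Hypothetical per-hour failure probabilities, for illustration only
p_main_radio, p_standby_radio, p_power = 1e-4, 1e-3, 1e-5

# Top event: loss of RT on a sector = power loss OR (main radio AND standby radio)
p_top = or_gate(p_power, and_gate(p_main_radio, p_standby_radio))
print(f"{p_top:.3e}")
```

The power supply is modelled as a separate OR branch because it is a common cause for both radios; folding it into the AND gate would wrongly treat the main and standby channels as independent, which is the trap described in section 4.1.2.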
Having briefly discussed these five failure characteristics, the next section turns to the
potential consequences of equipment failures. These are discussed at several levels,
from their impact on the individual (i.e. the controller), through the operations room and
the ATC system, to their impact on the ATM system as a whole.
4.2 Consequences of equipment failure
Equipment failures that penetrate existing technical built-in defences and hence affect
controller performance (called hazards) are the main focus of the research presented in
this thesis. Therefore, the consequences of these failures are assessed first at the level
of the controller, followed by the operations room, a given airspace (i.e. the impact on
ATC operations), and finally at regional level (i.e. the impact on ATM operations).
4.2.1 Impact on air traffic controller
The impact of equipment failures on controller performance represents the focus of this
thesis, and as such will be assessed in detail in the following Chapters. One equipment
failure occurrence in the Lisbon ATC Centre highlights the impact that equipment
failures could have on the controller (Sampaio and Guerra, 2004). In this very busy
sector, a sudden failure of the Radar Data Processing System (RDPS) affected only
one radar track. This failure went unnoticed for 21 minutes until a traffic advisory from
the cockpit-based Traffic Alert and Collision Avoidance System (TCAS) prompted the
controller to act. The controller had suspected a problem before the TCAS alert, but
focused only on a possible human error in the input of the relevant data (i.e. the SSR
code) and never considered the possibility of an equipment failure.
Post-incident investigation revealed that the cause of this failure was incompatibility of
the software developed for the installed radar with the software of the main ATC
system. However, the same investigation did not reveal why this failure affected only
one radar track and not all tracks informed by the same radar. This particular example
highlights how complex and severe an equipment failure can be.
4.2.2 Impact on operations room
The impact of equipment failures on the entire ATC operations room depends largely on
the failure characteristics in terms of the number of items of equipment/positions
affected. Another important factor is the overall ATC Centre architecture, since
exposure to failure varies greatly with the interconnectivity of different equipment, the
degree of channel separation (redundancy/variability), and failure complexity (single
failure vs. multiple failures). Based on operational experience (NATS, 2002) and ATC
operations room configuration, four categories can be differentiated, ranging from
impact on the entire operations room, through several sectors, to only one sector. The
categories are defined as follows:
• All workstations/all sectors affected;
• A number of workstations/different sectors affected;
• Several workstations (within same suite)/one sector affected; and
• One workstation/one sector affected.
The proposed categorisation by NATS follows the severity of the impact of failures on
the operations room starting with the most severe failure (known as outage) to the least
severe type of failure (affecting only one workstation). In addition, each ‘suite’ is
responsible for a specific portion of airspace (i.e. sector) whilst each sector has a
declared capacity (expressed in terms of the number of aircraft in the sector in the peak
hour). As a result, the failure characteristic ‘impact on operations room’ is linked with
the number of aircraft exposed to the impact of equipment failure.
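The link between the operations room category and the number of aircraft exposed can be sketched as a sum of declared sector capacities. A minimal sketch, assuming a hypothetical four-sector Centre; the sector names, capacities, and the affected sectors chosen for each category are illustrative, and summing peak-hour capacities gives only an upper bound on exposure:

```python
# Hypothetical declared sector capacities (aircraft in sector in the peak hour)
declared_capacity = {"S1": 30, "S2": 25, "S3": 35, "S4": 28}


def aircraft_exposed(affected_sectors, capacity=declared_capacity):
    """Upper bound on aircraft exposed to a failure: the sum of the declared
    peak-hour capacities of the sectors whose suites are affected."""
    return sum(capacity[s] for s in affected_sectors)


# One illustrative scenario per operations room category
scenarios = {
    "All workstations/all sectors": list(declared_capacity),
    "A number of workstations/different sectors": ["S1", "S3"],
    "Several workstations (same suite)/one sector": ["S2"],
    "One workstation/one sector": ["S2"],
}
for category, sectors in scenarios.items():
    print(category, aircraft_exposed(sectors))
```

The last two categories expose the same sector, so this measure does not separate them; that distinction rests on how much of the suite remains usable, as the NATS ordering by severity implies.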
4.2.3 Impact on ATC operations
The impact of equipment failures on Air Traffic Control (ATC) service provision should
incorporate effects from an operational, safety, and financial perspective. In terms of
ATC operation, equipment failures could result in an inadequate ATC service, leading
for example to unexpected or increased delays in service provision (aircraft performing
holding procedures due to a failure of the Instrument Landing System – ILS during the
landing phase of flight), delayed arrivals/departures, and limitations in capacity due to
traffic flow restrictions or stopped departures/arrivals.
From the safety perspective, failures make certain ATC functions unavailable. They also
increase workload, because failures are unexpected and highly stressful, which raises
the potential for incidents and accidents. Critically, safety can be jeopardised by any
data integrity problem in which the equipment provides timely but inaccurate
information. On such occasions, an equipment failure could go undetected for some
time (see the example discussed in section 4.2.1). All of these, combined with
inadequate or insufficient training, the absence of recovery procedures, and a lack of
experience, may create the potential for controller error.
From a financial perspective, equipment failures create planned and unplanned costs
of repair, training (of both controllers and technicians), and incident investigation.
However, the most significant costs are likely to be the additional costs placed on
airlines in the case of significant delays (e.g. loss of connecting flights and passenger
accommodation). These are discussed further in the next section.
Ideally, the combination of all three types of consequence of an equipment failure
should constitute its overall impact on ATC operations, or the failure's 'severity'.
However, in the operational environment the usual practice is to combine the safety
and operational impacts of an equipment failure to determine its severity rating. The
following paragraphs review severity ratings defined specifically for equipment failure
occurrences. They originate from safety regulations defined in two Air Navigation
Service Providers (ANSPs) and one Civil Aviation Authority (CAA).
The UK National Air Traffic Services (NATS) recognises four categories of failure based
on their impact on ATC operations, namely major impact, impact on workstation or suite,
ATC impact, and minimal impact (Table 4-2). Furthermore, analysis of
operational failure reports in this thesis identified the severity categorisation from one
CAA (referred to as Country C) and another ANSP (referred to as Country D). The CAA
of Country C defines the severity rating of equipment failures according to the potential
to cause a significant problem (see Table 4-3).
Table 4-2 UK NATS severity rating (from NATS, 2002)
Severity: Definition
Major impact to Ops room: Severe flow restrictions could be required
Impact to workstation/suite: May be necessary to combine/move positions immediately or sector flow restrictions may be required
ATC impact: Not immediately critical, will have greater operational impact over time
Minimal impact: Centre management required
Table 4-3 Country C's severity rating as defined by its CAA
Severity (Factor): Definition
CR (Critical): An occurrence or deficiency that caused, or on its own had the potential to cause, loss of life or limb.
MA (Major): An occurrence or deficiency involving a major ATC system component that caused, or had the potential to cause, significant problems to the function or effectiveness of that system.
MI (Minor): An isolated occurrence or deficiency not indicative of a significant ATC system problem.
Finally, the data for Country D originate from one particular ATC Centre. This Centre
determines the severity of an incident from the combined impact it has on both the
controllers (internally in this ATC Centre as well as externally in other ATC units) and
the system control and monitoring engineers. In general, in this ATC Centre the
determination of incident severity is the task of the system control and monitoring unit,
which distinguishes five severity classes. These are presented in Table 4-4.
Table 4-4 Country D severity rating as defined by the particular ATC Centre
Severity (Factor): Definition
1 (System down): A system outage affecting the total of ATC services provided.
2 (Critical): An error severely affecting a single or few random working positions or a single external service or an error on a "first" standby system.
3 (Urgent): An error affecting part of a single or few random working positions or part of an external service or an error on a backup system reducing backup capacity.
4 (Important): An error affecting a supportive service or a system for which automatic backup is available.
5 (Enhancement): An error having no direct operational impact and only slight non-operational impact.
These severity rating schemes indicate that each country follows its own severity index.
Furthermore, severity ratings differ between ANSPs and CAAs, as ANSPs are
concerned with the impact on their service provision business (e.g. delays), whilst
safety regulators are concerned with whether such an event could cause an accident.
Therefore, simply comparing the severity of occurrences between countries is unlikely
to produce useful findings. All of these classifications are largely qualitative and depend
upon experience and judgement, which always involves a degree of subjectivity. As a
result, it is necessary to define a single severity classification for the entire dataset
available in this study that corresponds to the existing equipment failure severity ratings
(UK NATS, Country C, and Country D). Consistent with operational practice, the
severity rating defined in the following paragraphs combines the safety and operational
impact of equipment failures, while disregarding the financial aspect due to lack of
data. Since the focus of this thesis is on the impact of equipment failures on ATC
operations (including their impact on controller performance), the exclusion of the
financial aspect does not have a detrimental effect on this severity rating or on the
subsequent quality of data analyses.
The result is a three-level severity rating (major, moderate, and minimal) of equipment
failures based on their impact on ATC operations, as perceived by the controller (Table
4-5). It is important to highlight that this severity categorisation is based on the
exposure of an ATC Centre to the failed equipment (affecting the entire ATC Centre, a
number of workstations, or only the backup system), regardless of the type of service
provided by the affected ATC Centre. The significant differences in the level of detail
between reports and the overall need for a consistent approach led to the exclusion of
the type of ATC service from the overall severity categorisation. This characteristic is
accounted for later in the thesis through the assessment of the recovery context
surrounding an equipment failure occurrence. As a result, its exclusion here does not
have a detrimental effect on the severity rating or the subsequent quality of data
analyses. In general, the severity rating is based on the failure type, the available
contextual conditions of the failure occurrence, and its impact on ATC operations.
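The exposure-based core of this rating can be written as a simple lookup. A minimal sketch; pairing "entire ATC Centre" with major, "a number of workstations" with moderate, and "backup system only" with minimal follows the order of the parenthetical list above and is an interpretation, and the `service_type` argument is deliberately ignored to mirror the exclusion of the type of ATC service. Contextual conditions (e.g. visibility) are not modelled.

```python
from enum import Enum


class Exposure(Enum):
    ENTIRE_CENTRE = "entire ATC Centre"
    WORKSTATIONS = "a number of workstations"
    BACKUP_ONLY = "backup system only"


# Pairing follows the order given in the text (an interpretation, not a quoted rule)
SEVERITY_BY_EXPOSURE = {
    Exposure.ENTIRE_CENTRE: "major",
    Exposure.WORKSTATIONS: "moderate",
    Exposure.BACKUP_ONLY: "minimal",
}


def severity(exposure, service_type=None):
    """Three-level severity from exposure alone. service_type is accepted but
    ignored, because the type of ATC service is assessed later in the thesis
    through the recovery context."""
    return SEVERITY_BY_EXPOSURE[exposure]


print(severity(Exposure.WORKSTATIONS, service_type="approach"))
```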
Table 4-5 Severity rating defined in this research and mapped with available sources
Columns: severity rating in this research; definition and examples of the severity rating in this research; mapping with severity ratings from available research

Major
Definition: This type of failure may cause severe disruptions on every workstation. It may require immediate traffic flow restrictions to contain workload to manageable levels that are safe for sustained ongoing operations.
Examples: loss of main Flight Data Processing System (FDPS), total voice communication outage, loss of Multiple Radar Processing (MRP), loss of Terminal Approach Radar (TAR), loss of Parallel Approach Runway Monitor (PARM), loss of radar coverage, either complete or over larger parts (Primary Surveillance Radar - PSR and Secondary Surveillance Radar - SSR), total power failure, loss of all Radio Telephony (RT) frequencies, incorrect barometer indication (as part of meteorological equipment), Instrument Landing System (ILS) failure during the approach phase in reduced visibility conditions, failure of runway/taxiway lights in reduced visibility conditions, wrong indication of runway/taxiway lights, Surface Movement Radar (SMR) failure or provision of wrong label indication.
Mapping: Major (UK NATS); Major (Country C); 1 (Country D)

Moderate
Definition: Only affects workstations reliant on the failed item or service. The disruption of ATC operation is contained and a normal level of operation may be resumed by physically moving and combining the role of the affected workstations with another within the sector suite or by physically moving the sector team to the stand-by suite. Under some conditions, sector flow restrictions may be applied.
Examples: loss of single sector frequency, loss of a number of frequencies, loss of one or two workstations in a sector suite, loss of entire sector suite, loss of telephone panel or Voice Switching And Communication System (VSCS) on a single workstation, loss of one radar (in multiple radar environment), loss of ground-based navigational aids (e.g. VHF Omnidirectional Range - VOR, Non-Directional Beacon - NDB, Distance Measuring Equipment - DME), loss of PSR (as it is a backup to SSR), SSR garbling, loss of safety nets (as these are only tools to support the controller).
Mapping: Impact on workstation/suite (UK NATS); Major (Country C); 2 and 3 (Country D)

Minimal
Definition: Initial disruption to ATC operations is not immediately critical, but could have greater impact over time (if not recovered within a reasonable time frame, disruptions to ATC operations may be prolonged/sustained). This escalation with time can restrict traffic flow into sector(s).
Examples: loss of processor, loss of link, loss of system control and monitoring unit, loss of headset, ILS failure during approach in normal visibility conditions (because the opportunity for a go-around always exists), failure of runway/taxiway lights in normal visibility conditions (as this system is only a visual aid to the instrument landing), failure in communication link to adjacent ATC Centre, loss of auxiliary display, temporary failure of strip printer or paper jam, inadequate strength of RT frequency, failure of left hand headset connector while right hand is functioning, disturbance/interference on a ground frequency, loss of sequencing tool, and loss of pointing/input devices.
Mapping: ATC and minimal impact (UK NATS); Minor (Country C); 4 and 5 (Country D)
Having defined the three-level severity rating to be used in this research, a mapping is
established with the existing severity ratings (as defined by UK NATS, the CAA of
Country C, and the ANSP of Country D). Comparing the specific categories from each
of the available sources shows how they match the 'major', 'moderate', and 'minimal'
ratings defined in this research (Table 4-5). Note, however, that the 'major' category as
defined by Country C had to be split between the 'major' and 'moderate' categories
defined in this research. The rationale behind this split is based on two criteria of equal
importance. The first criterion is the definition of the 'major' and 'moderate' categories
as presented in Table 4-5. In other words, the severity rating has to distinguish between
failures that affect the entire ATC Centre and those that affect only workstations reliant
on the failed item. The second criterion is based on the impact of a failure on ATC
operations. For example, loss of a VOR or NDB is rated as 'moderate' because
navigation can still be provided using radar surveillance or other means, such as the
Global Positioning System (GPS) and Automatic Dependent Surveillance (ADS).
However, loss of an ILS during the approach phase in reduced visibility conditions is
rated as 'major'. During this phase of flight the aircraft is in the landing configuration
(i.e. reduced speed, in close proximity to the ground). If visual contact with the ground is
not achieved at the moment of the failure, an immediate go-around procedure is
necessary. This is why the failure of an approach navigation aid (such as the ILS) is
considered more severe.
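The mapping in Table 4-5 can be expressed as lookup tables, with Country C's 'Major' resolved by the first criterion described above. A minimal sketch; the labels follow Tables 4-2 to 4-5, while the boolean `affects_entire_centre` is a simplification of the first criterion and the second criterion (operational context) is not modelled. Country C's CR (Critical) does not appear in Table 4-5, so the sketch rejects it rather than guess.

```python
# UK NATS categories (Table 4-2) mapped to the ratings of this research (Table 4-5)
NATS = {
    "Major impact to Ops room": "major",
    "Impact to workstation/suite": "moderate",
    "ATC impact": "minimal",
    "Minimal impact": "minimal",
}

# Country D classes (Table 4-4) mapped to the ratings of this research
COUNTRY_D = {1: "major", 2: "moderate", 3: "moderate", 4: "minimal", 5: "minimal"}


def map_country_c(code, affects_entire_centre):
    """Country C's 'MA' is split: Centre-wide failures stay 'major', failures
    affecting only reliant workstations become 'moderate'."""
    if code == "MA":
        return "major" if affects_entire_centre else "moderate"
    if code == "MI":
        return "minimal"
    raise ValueError(f"{code!r} has no mapping in Table 4-5")


print(NATS["ATC impact"], COUNTRY_D[3], map_country_c("MA", affects_entire_centre=False))
```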
4.2.4 Impact on ATM operations
As noted earlier, it is highly beneficial to analyse the impact of the failures on
operations both inside the control room and outside over a given airspace. At the same
time, it is also important to recognise that failures could have an impact not only on
ATC but also on the wider ATM system. The following examples show how severe the
impact of an equipment failure on ATM operations can be.
According to Aviation Week (reported in RISKS, 2000; NATS, 2004), the UK ATC
service suffered a flight data processing software failure at West Drayton ATC Centre
in June 2000. As a result of the failure, flight progress strips had to be handwritten,
which forced the ANSP to restrict the amount of traffic in UK airspace. While the ATC
system recovered after four hours, the effects of this failure were felt for several days,
with knock-on effects as far away as France and Germany. This is explained by the
centralised flow control of traffic in Europe (provided by the EUROCONTROL Central
Flow Management Unit). As a result of the failure’s severity and the subsequent flow
control measures, its impact spread over a sub-continental region.
Another example of a failure with a severe impact on a wide region is the brief power
failure which affected the US Federal Aviation Administration (FAA) Southern California
Terminal Radar Approach Control (TRACON) facility at Miramar on April 19, 2006. The
facility switched immediately to backup power. The outage lasted only 6 or 7 seconds,
but it affected airports from the Mexican border to halfway up the state of California,
due to the traffic flow control measures that were imposed (10News, 2006).
Another example of the severe impact that one single failure can induce is the outage
that occurred in the Chicago ATC Centre in 1995 when the en-route automation
component failed for two hours. This single occurrence cost the airlines an estimated
$12 million in delays (National Transportation Library, 1997). The National
Transportation Library (NTL) report mentions this example to make a case for the
replacement of the outdated main and backup Flight Data Processing Systems
(FDPS) involved in the reported incident. In short, these examples show how severe
the impact of an equipment failure on global ATM operations can be. This issue will
become especially important in a future gate-to-gate ATM system where the roles for
planning and control will have to be re-organised and distributed between controllers
and pilots.
Similar to ATC operations, the impact of failure on ATM can be analysed from several
different perspectives. From operational and safety perspectives, a higher degree of
workload will be experienced both on the ground by controllers, technicians, and
engineers and in the air by flight crew. From a financial perspective, in addition to costs
identified in ATC, it is necessary to add the cost of delays in a wider region. A small
exercise has been conducted on the cost of delays induced by ATC equipment failures
to indicate the financial impact of delays in the European Civil Aviation Conference
(ECAC) and US airspace. This is presented in Appendix I.
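The regional cost of delay can be illustrated with simple arithmetic. The sketch below assumes a flat cost per minute of delay applied to the knock-on delays in each affected region; the class and function names, region labels, flight counts, delays, and cost rate are illustrative placeholders, not figures from Appendix I or any other source.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RegionDelay:
    """Knock-on delay attributed to one failure in one region (hypothetical inputs)."""
    region: str
    delayed_flights: int
    mean_delay_min: float


def total_delay_cost(regions: Iterable[RegionDelay], cost_per_min: float) -> float:
    """Sum delayed flights x mean delay (minutes) x cost per minute over all regions."""
    return sum(r.delayed_flights * r.mean_delay_min * cost_per_min for r in regions)


# Illustrative placeholder values only; not data from this thesis or Appendix I.
impact = [
    RegionDelay("home airspace", 1000, 20.0),
    RegionDelay("neighbouring region 1", 300, 10.0),
    RegionDelay("neighbouring region 2", 200, 8.0),
]
print(f"{total_delay_cost(impact, cost_per_min=50.0):,.0f}")  # 1,230,000
```

Summing over regions makes explicit that flow control spreads the cost of a single failure well beyond the airspace in which it occurred.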
Having discussed the consequences of equipment failures, it is important to discuss
how such consequences could be prevented or mitigated. This involves the process of
recovery from equipment failure and a distinction can be made between technical and
human recovery. The following section focuses on technical recovery and the principles
used to prevent and in some cases to mitigate the impact of equipment failures. The
human recovery aspects are addressed in Chapter 5 and throughout the rest of the
thesis.
4.3 Definition of technical defences (technical recovery)
The aim of any design is to identify the functions of a system in advance and to
develop a method which assures the delivery of the intended functions. It is always
necessary to predict what may happen if something fails or if an operator handles a
system incorrectly. Experience shows that even the best-designed systems fail
occasionally. Therefore, it is crucial that every design concept includes a solution to
re-establish system operation and provide continuous service. These solutions are
grouped under the term ‘technical built-in defences’. They represent defences against
any unplanned or unwanted interruption of service. They are complex socio-technical
systems which combine technical, human, and organisational measures that prevent or
protect against an adverse effect (Smith et al., 2004). Verification of the existence and
appropriateness of existing defences provides confidence in the safety of a system and
is a requirement for system certification.
Safety is recognised as the ultimate imperative in ATC and therefore should be
addressed as early as possible in the design process. Having sound safety principles
built into each phase of the design (i.e. conceptual, preliminary, and detailed design
phase) is a useful way to avoid, prevent, and mitigate failures and their impact. Safety
through design is planned through five different principles (Figure 4-1) for hazard1
avoidance, elimination, or control, which are as follows (Christensen and Manuele,
1999; National Aeronautics and Space Administration, 2002; The European New
Machinery Directives cited in Piantek, 1999):
• Eliminate hazards;
• Design for minimum risk;
• Incorporate safety devices (i.e. devices designed to prevent any unwanted event);
• Provide warning devices (i.e. alerts that signal the occurrence of some unwanted
event); and
• Develop operating procedures and training schemes.
Figure 4-1 Safety through design (adapted from Christensen and Manuele, 1999)
1 Within system safety, a hazard is usually defined as a condition which can lead to an accident.
In this research, a hazard is defined as the ATC system state resulting from an equipment failure that penetrates all existing technical defences and affects the ability of the controller to perform his/her tasks.
The suggested principles follow a logical order of precedence. The first two
approaches focus on the elimination of the hazard from the system. However, if the
identified hazards cannot be eliminated (due to technical difficulties or cost), risk should be
reduced by using fixed, automatic, or other protective safety devices (i.e. defences for
seamless recovery from failure). When neither design nor safety devices can effectively
eliminate identified risks or adequately reduce them, devices should be used that
detect the unwanted condition and produce adequate warning signals to alert the
controller (i.e. defences for transmitting information regarding a failure). These warning
signals should be designed to minimise the probability of inappropriate human reaction
and response. Note that regardless of how a warning device performs (Figure 4-2), the
triggering failure represents a hazard (according to the definition in this thesis) as it
affects controller performance.
As explained before, the human operator remains the last line of defence (i.e. human
recovery). For this reason, when warning devices are not sufficient, special procedures
and training schemes should be designed. These must be periodically tested, verified,
and regularly updated to assure their effectiveness.
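The order of precedence can be expressed as an ordered search over the five principles of Figure 4-1. The sketch below is a hypothetical illustration: `is_adequate` stands in for the engineering judgement of whether a measure sufficiently controls a hazard, and in practice several measures are usually combined rather than a single one being selected.

```python
from typing import Callable

# The five safety-through-design principles, in order of precedence (Figure 4-1).
PRINCIPLES = (
    "Eliminate hazards",
    "Design for minimum risk",
    "Incorporate safety devices",
    "Provide warning devices",
    "Develop operating procedures and training schemes",
)


def highest_precedence_measure(is_adequate: Callable[[str], bool]) -> str:
    """Return the first principle, in order of precedence, judged adequate.

    `is_adequate` is a hypothetical stand-in for engineering judgement.
    """
    for principle in PRINCIPLES:
        if is_adequate(principle):
            return principle
    # If nothing higher is adequate, procedures and training remain, relying on
    # the controller as the last line of defence.
    return PRINCIPLES[-1]


# Example: the hazard cannot be eliminated or designed out and no safety device
# is feasible, but a warning device is (hypothetical judgement).
feasible = {"Provide warning devices", PRINCIPLES[-1]}
print(highest_precedence_measure(lambda p: p in feasible))  # Provide warning devices
```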
Similarly, when dealing with equipment failures in ATC, it is important to distinguish
between technical and human (i.e. controller) recovery (Figure 4-2). Both processes
start with the detection of failure (either by a technical system or controller) and
conclude with an outcome. The outcome can be nominal (pre-failure), non-nominal but
stable (i.e. degraded), or inadequate system state (leading to incident or accident). The
outcome of the equipment failure and recovery process is discussed in detail in the
following Chapter. The following paragraphs focus on technical recovery, while human
recovery is addressed in subsequent Chapters.
Figure 4-2 Technical and human recovery
As already highlighted, technical built-in defences can be divided into two different
categories according to the function they provide. These are defences for recovering
from failures (safety devices) and defences for transmitting relevant information on
failure (warning devices). Both categories are examined further in the following
sections.
4.3.1 Defences for recovering from failures (safety devices)
This group of technical built-in defences should include mechanisms designed to
prevent an unwanted event, i.e. safety devices (e.g. radiotelephony anti-blocking device,
availability of primary and secondary frequency, automatic switching from normal to
fallback operational mode, automatic switching from primary to secondary glide slope
transmitter), and the creation of fault-tolerant systems through redundancy/diversity. The
main objective of built-in defences is to prevent adverse events from happening (i.e.
preventive defences) or to lessen the impact of the consequences on operations (i.e.
mitigative or protective defences). If only preventive barriers exist against a failure, the
system lacks the fault tolerance that protective defences provide. For example, the
feasibility study of the EUROCONTROL eight states free route airspace concept was
conducted to ensure that free route airspace operations are as safe as the current
fixed route operations (EUROCONTROL, 2001c). The analysis identified 128
preventive defences but no protective defences. Therefore, this concept, in its current
state, fails to establish fault tolerance in the ATM system.
Fault-tolerant systems are designed to preserve the minimum required service in spite
of failure occurrence. This is achieved through the employment of redundancy.
Redundancy is the ability of a system to keep functioning normally in the event of an
equipment failure, by having backup components that perform duplicate functions
(Mauri, 2000). The goal of this process is to mask failure events from the controller, but
also to capture them and report them for the necessary maintenance. However, redundancy
itself is not always a solution due to common cause failures (e.g. fire or power outage).
Common cause failures are failures of several (including redundant) components that
result from a single shared cause. In order to prevent the occurrence of these types of
failures, emphasis is placed on diversity of the systems (i.e. different manufacturers),
equipment diversity in manufacturing (e.g. different software), and physical separation
(e.g. the redundant hydraulic system lines of commercial aircraft are physically separated
so that fire in a certain compartment does not affect all the lines simultaneously).
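Why redundancy alone may not suffice can be shown with a small calculation. The sketch below uses the beta-factor model of common cause failure, a standard reliability technique (used, for example, in IEC 61508) that is not part of this research: a fraction beta of each channel's failures is assumed to disable both channels at once. The per-channel failure probability q, the value of beta, and the function name are assumptions, and the formula is the usual rare-event approximation for a duplicated (one-out-of-two) system.

```python
def dual_channel_unavailability(q: float, beta: float = 0.0) -> float:
    """Approximate probability that both channels of a duplicated system are failed.

    q:    assumed probability that a single channel is failed
    beta: assumed fraction of failures due to a common cause (beta-factor model)
    """
    if not (0.0 <= q <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("q and beta must lie in [0, 1]")
    independent = ((1.0 - beta) * q) ** 2  # both channels fail independently
    common_cause = beta * q                # one shared cause (e.g. fire) fails both
    return independent + common_cause


q = 1e-3  # hypothetical single-channel failure probability
print(dual_channel_unavailability(q))            # about 1e-06
print(dual_channel_unavailability(q, beta=0.1))  # about 1.0e-04
```

With beta = 0.1 the duplicated system is only about ten times better than a single channel (roughly 1.0e-4 against 1e-3) instead of a thousand times better, which is the motivation for diversity and physical separation.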
4.3.2 Defences for transmitting information on failure (warning devices)
Alerts should be provided to the controller in the event of a critical change in the ATC
system or equipment status and to remind the controller of critical actions that must be taken. An
alert or a warning should enhance the probability of appropriate human reaction and
response (i.e. controller recovery performance). According to the FAA’s Human Factors
Design Standard (Federal Aviation Administration, 2003) warning devices should:
• Alert the operator to the fact that a problem exists;
• Inform the operator of the nature of the problem;
• Guide the operator’s initial responses (based on priority); and
• Confirm in a timely manner whether the operator’s response corrected the problem.
Alerts are usually generated immediately after the system detects any discrepancy
from predefined system performance. There are several ways in which ATC controllers
are informed of equipment failures or non-availability of certain functions. The most
common are colour-coding (e.g. change in the workstation’s border colour)
and textual messages, both presented on the Human Machine Interface (HMI). In
addition to the content and location of the alert message, it is equally important to
display an alert in a timely manner. Alert onset is defined as the time between a system’s
detection of a failure and the moment an alert is presented on the HMI either by colour
change or text message (i.e. time-to-alert or TTA). This timing is usually system-driven
(based on the system threshold) but there are novel initiatives toward human-driven or
cognitively-driven alert onset. In general there are three different types of alert onset:
• Immediate onset (an alert is presented on the HMI with the least possible delay after
the system detects the failure). This is the normal case for severe events.
• Delayed onset (an alert is presented on the HMI with a time-based or threshold-
based onset). For example, system requirements could be set up to inject an
alert with a specific time delay following the occurrence of a failure or to inject an
alert once a system-defined threshold has been reached (i.e. TTA). In the nuclear
industry this is known as alert sequencing or alert hierarchies, indicating the
urgency of actions needed. In this way, a hierarchy makes use of safety criticality,
injecting safety-relevant alerts first, followed by operational alerts. In satellite
navigation, the TTA value is one of the measures of the integrity of a satellite
navigation system (Feng et al., 2005).
• Cognitively convenient onset (an alert is presented on the HMI based on
cognitive convenience, which can be defined through the levels of controller
workload). This emerging concept has mostly been explored in the nuclear and
automotive industries, where cognitive convenience is determined by measuring
workload using physiological measures (e.g. heart rate, breathing rate, galvanic skin
response, eye tracking device). This concept has been tested on a US naval ship
as described in Daniels, Regli, and Franke (2002). This study proposes a method
to control the cognitive effects of task interruption by influencing the timing of an
alert and helping users to regain their situational awareness within the
interrupted task.
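The three onset types can be contrasted as scheduling policies. The sketch below is hypothetical: the function name, the workload samples (for example, derived from a physiological measure), the workload limit of 0.7, and the 60-second cap on waiting are assumptions, not values from the cited studies, and only the time-based variant of delayed onset is shown.

```python
from enum import Enum
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (time in seconds, normalised workload 0..1)


class Onset(Enum):
    IMMEDIATE = "immediate"
    DELAYED = "delayed"
    COGNITIVE = "cognitively convenient"


def alert_time(detect_t: float, onset: Onset, *, delay_s: float = 0.0,
               workload: Sequence[Sample] = (), workload_limit: float = 0.7,
               max_wait_s: float = 60.0) -> float:
    """Return when an alert would reach the HMI under a hypothetical onset policy."""
    if onset is Onset.IMMEDIATE:
        return detect_t  # least possible delay after detection
    if onset is Onset.DELAYED:
        return detect_t + delay_s  # time-based onset with a fixed delay
    # Cognitively convenient onset: wait for the first workload sample below the
    # limit, but never beyond the cap, so the alert cannot be held back forever.
    deadline = detect_t + max_wait_s
    for t, level in sorted(workload):
        if detect_t <= t <= deadline and level < workload_limit:
            return t
    return deadline


samples = [(10.0, 0.9), (12.0, 0.8), (15.0, 0.5)]
print(alert_time(10.0, Onset.IMMEDIATE))                    # 10.0
print(alert_time(10.0, Onset.DELAYED, delay_s=5.0))         # 15.0
print(alert_time(10.0, Onset.COGNITIVE, workload=samples))  # 15.0
```

The cap reflects a design choice: deferring an alert for cognitive convenience should not delay information on a severe failure indefinitely.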
After a detailed overview of the equipment failure characteristics as well as technical
recovery, the next section analyses the nature of equipment failures that manage to
penetrate the existing built-in defences and affect controller performance. For this
purpose, findings from existing literature have been augmented by results of the
analysis of more than ten thousand operational failure reports originating from four
different countries. This sample of equipment failure reports has already been
introduced in Chapter 3, and the following section analyses it further.
4.4 Analyses of operational failure reports
Existing literature on equipment failure characteristics has been reviewed in the
previous sections of this Chapter. This has been further augmented and informed by
the analyses of operational data from four countries (i.e. Countries A, B, C, and D), as
presented in detail in Chapter 3.
4.4.1 Data analysis methodology
Since the four countries are of different airspace size, equipage, traffic demand, and
density in their airspace, simple analysis of equipment failure rate would be of limited
value. Therefore, to gain a common metric to assess distribution of equipment failures
per year and per data source, it is necessary to normalise the rates of equipment
failures per appropriate unit of measurement. For example, the rates per ATC Centre
enable comparison of ATC Centres of similar traffic demands and thus equipage, but
otherwise fail to provide a meaningful performance measure. Similarly, the rate of radio
frequency failure per sector or per total number of available frequencies in a sector
(usually there are primary and secondary frequencies available in a sector) enables a
metric for the availability of voice communication in each sector. However, this unit is
not of practical use as the number of sectors changes hourly based upon changes in
air traffic demand. As a result, the rate of equipment failures per flight hour is used in
this research2. This approach avoids difficulties and differences associated with the
2 Hours flown data are collected for commercial airlines, including domestic, regional, and
international air traffic for each country.
geographical coverage of the datasets available and the availability of ATC systems
and equipment (e.g. number of radars, navaids, communication systems).
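The normalisation is a simple ratio. The sketch below is illustrative; the function name and the example counts are hypothetical and do not come from the datasets analysed here.

```python
def failure_rate(failures: int, flight_hours: float, per: float = 100_000) -> float:
    """Equipment failures per `per` flight hours (e.g. per 100,000 flight hours)."""
    if flight_hours <= 0:
        raise ValueError("flight_hours must be positive")
    return failures * per / flight_hours


# Hypothetical example: 250 reported failures in 1,250,000 flight hours.
print(failure_rate(250, 1_250_000))              # 20.0 per 100,000 flight hours
print(failure_rate(250, 1_250_000, per=10_000))  # 2.0 per 10,000 flight hours
```

Rates quoted per 10,000 flight hours (as for Country D later in this section) must be multiplied by ten before being compared with rates per 100,000 flight hours.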
The information on flight hours for each country has been extracted from the CAA
websites, annual incident summaries, and personal correspondence with the staff from
the engineering unit. After establishing the common ground with an appropriate unit of
measurement, further analyses are performed with available data structured around
four equipment failure characteristics, which could be extracted consistently
from the available datasets. These four equipment failure characteristics are: type of ATC
functionality and equipment affected, complexity, severity, and duration3 of equipment
failures. The type of equipment/ATC functionality affected and complexity of failure type
are extracted from the short summary available for each report. The severity of
equipment failure is taken from the available severity rating (where one existed) or,
otherwise, by assessing the available information on the operational and safety impact
of the equipment failure and then applying the severity rating derived in this research
(see Table 4-5). The duration variable was available only in the Country D database.
Finally, additional statistical tests have been performed to identify any relationships
between the four equipment failure characteristics. The structure of the data analyses is presented in
Figure 4-3.
The nature of the variables under consideration determined which statistical methods
could be used to analyse the data. As can be seen from their description in this
Chapter, most variables are categorical (type of equipment/ATC functionality affected,
complexity of failure type, and severity). Additionally, complexity of failure type and
severity are ordinal variables (i.e. their categories can be ranked). Only duration is a
continuous (ratio scale) variable4. This variable is first investigated for its overall
distribution and then split into categories to extract information regarding failures of
short duration (discussed in sections 4.1.4 and 4.4.6).
3 The duration characteristic is analysed last as it is available only in one database.
4 Variables can be either continuous or categorical. Continuous variables are numeric values on
an interval or ratio scale (e.g. age, income). Categorical variables can be either nominal or ordinal. Nominal variables differentiate between categories but do not assume any ranking between them (e.g. gender). On the other hand, ordinal variables differentiate between categories that can be rank-ordered (e.g. from lowest to highest).
[Figure 4-3 flowchart: operational failure reports from 4 countries (22,808 available reports) undergo data pre-processing and are analysed for rate of equipment failures (Countries A, B, C, and D; traffic figures from respective CAAs), type of ATC function and equipment affected (Countries A, B, C, and D; ATC functional classification, Chapter 2), complexity of failure type (Countries A, B, and C; Chapter 4, section 4.1.2), severity (Countries A, B, C, and D; severity rating, Chapter 4, Table 4-5), and duration (Country D database), followed by additional statistical tests]
Figure 4-3 Operational failure reports analyses
Using the SPSS statistical package, frequencies of related categories are identified and
the most frequent categories are reported for each variable. To establish relationships
between these variables, additional statistical tests are also performed. In this regard,
chi-square tests are used to test the relationships between two categorical variables.
The most important assumptions of the chi-square test are random sample data, a large
sample size, adequate cell sizes (expected frequencies of no less than 5 per cell),
independent observations, and normal distribution of deviations between observed and
expected values. The size and characteristics of the available datasets indicate
conformance with all listed assumptions. Furthermore, Cramer’s V is used to measure the
strength of association for nominal data (i.e. the ATC functionality variable), whilst
Kendall’s tau is used for ordinal data (i.e. the severity and duration variables). These
measures are briefly discussed in the following paragraphs.
Cramer’s V is a chi-square-based measure of the strength of the relationship
between nominal variables and is applicable to contingency tables larger
than 2×2 (Berenson et al., 2006). Cramer’s V coefficient is interpreted as a measure of
the relative strength of an association between two variables and it ranges from 0 to 1
(0 indicating no association and 1 a perfect association). Suppose that the null
hypothesis is that the two variables are independent. Based on the frequency table and
the null hypothesis, the chi-square statistic χ² is computed by taking, for each cell, the
squared difference between the observed (O) and expected (E) frequency, dividing it by
the expected frequency, and summing over all cells. Then, Cramer’s V coefficient is
defined in equation 4-1 below:
$$V = \sqrt{\frac{\chi^2}{n \times m}} = \sqrt{\frac{\sum \frac{(O - E)^2}{E}}{n \times m}} \tag{4-1}$$
where n represents the sample size and m represents the smaller of the number of rows
minus one and the number of columns minus one.
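Equation 4-1 and the chi-square statistic can be computed directly from a contingency table of observed counts. The sketch below is a minimal plain-Python illustration with hypothetical function names and counts; it also reports the smallest expected frequency so that the minimum cell size assumption can be checked. The 2×2 example is chosen only because it is easy to verify by hand; for such tables V reduces to the absolute value of the phi coefficient.

```python
import math
from typing import List, Sequence

Table = Sequence[Sequence[float]]


def expected_counts(table: Table) -> List[List[float]]:
    """Expected frequencies E under independence: row total x column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table: Table) -> float:
    """Chi-square statistic: sum over all cells of (O - E)^2 / E."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


def cramers_v(table: Table) -> float:
    """Cramer's V (equation 4-1): sqrt(chi2 / (n * m)), m = min(rows - 1, cols - 1)."""
    n = sum(sum(row) for row in table)
    m = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(chi_square(table) / (n * m))


observed = [[10, 20], [20, 10]]  # hypothetical counts
print(round(chi_square(observed), 3))                      # 6.667
print(round(cramers_v(observed), 3))                       # 0.333
print(min(min(row) for row in expected_counts(observed)))  # 15.0 (>= 5)
```

In practice, `scipy.stats.chi2_contingency` returns the statistic together with the expected frequencies; note that it applies Yates' continuity correction to 2×2 tables by default.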
Kendall’s tau is a rank-based measure of the strength of the relationship between
ordinal variables, computed from the numbers of concordant and discordant pairs of
observations, and is applicable to contingency tables of all sizes (Berenson
et al., 2006). Kendall’s tau coefficient has the following properties:
• If the agreement between the two rankings is perfect (i.e. the two rankings are the
same), the coefficient takes the value of 1.
• If the disagreement between the two rankings is perfect (i.e. one ranking is the
reverse of the other), the coefficient takes the value of -1.
• For all other associations the value lies between -1 and 1, and increasing values
imply increasing agreement between the rankings. If the rankings are completely
independent, the coefficient takes the value of 0.
Kendall’s tau coefficient is defined in equation 4-2 below:
$$\tau = \frac{2P}{\frac{1}{2}n(n-1)} - 1 = \frac{4P}{n(n-1)} - 1 \tag{4-2}$$
where n represents the number of observations (i.e. (X, Y) pairs) and P represents the
number of concordant pairs. In statistics, a concordant pair is a pair of observations
(X1, Y1) and (X2, Y2) from a two-variable dataset, where (equation 4-3):
$$\operatorname{sgn}(X_2 - X_1) = \operatorname{sgn}(Y_2 - Y_1) \tag{4-3}$$
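Correspondingly, a discordant pair is a pair where (equation 4-4):
$$\operatorname{sgn}(X_2 - X_1) = -\operatorname{sgn}(Y_2 - Y_1) \tag{4-4}$$
Sgn represents the sign function defined as (equation 4-5):
$$\operatorname{sgn} x = \begin{cases} 1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0 \end{cases} \tag{4-5}$$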
Therefore, a high value of P indicates that most pairs are concordant, i.e. the rankings
are consistent. A tied pair (sgn x = 0) is regarded as neither concordant nor discordant.
If there is a large number of ties, the total number of pairs (in the denominator of
equation 4-2) should be adjusted accordingly (Berenson et al., 2006).
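Equations 4-2 to 4-5 translate into a direct count of concordant and discordant pairs. The sketch below is a plain-Python illustration with hypothetical names and example codes; besides the no-ties form of equation 4-2 (often called tau-a), it computes the tau-b variant, one common way of adjusting the denominator for ties, which is also the variant that `scipy.stats.kendalltau` returns by default.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple


def sgn(x: float) -> int:
    """Sign function of equation 4-5: -1, 0 or 1."""
    return (x > 0) - (x < 0)


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (tau_a, tau_b) for paired observations (x[i], y[i])."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of at least two observations")
    concordant = discordant = tied_x = tied_y = 0
    for i, j in combinations(range(n), 2):
        s = sgn(x[j] - x[i]) * sgn(y[j] - y[i])
        if s > 0:
            concordant += 1  # equation 4-3 with non-zero signs
        elif s < 0:
            discordant += 1  # equation 4-4 with non-zero signs
        # pairs with s == 0 are ties and count as neither
        tied_x += int(x[i] == x[j])
        tied_y += int(y[i] == y[j])
    n0 = n * (n - 1) / 2  # total number of pairs
    tau_a = (concordant - discordant) / n0  # equals 4P/(n(n-1)) - 1 without ties
    denom = math.sqrt((n0 - tied_x) * (n0 - tied_y))
    tau_b = (concordant - discordant) / denom if denom else float("nan")
    return tau_a, tau_b


print([round(t, 3) for t in kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])])  # [0.667, 0.667]
# Hypothetical ordinal codes with ties (e.g. severity vs duration category).
print([round(t, 3) for t in kendall_tau([1, 1, 2, 3], [1, 2, 2, 3])])  # [0.667, 0.8]
```

The second example shows why the adjustment matters: with ties present, tau-a cannot reach 1 even when no pair is discordant, whereas tau-b removes tied pairs from the denominator.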
After presenting the overall methodology used for data analyses, the following sections
present some of the key findings and results.
4.4.2 Rate of equipment failures
From Figure 4-4, the rate of equipment failures for Country A initially increases greatly
before peaking in 2002, followed by a sharp drop in 2003. This corresponds to a large
number of early failures experienced with the opening of the new ATC Centre which
accounted for 63.4 percent of all reported equipment failures in that year. Country B’s
rate rises from 17.5 failures per 100,000 flight hours in 2001 to 25 failures per 100,000
flight hours in 2002. This is followed by a drop to 17.8 failures per 100,000 flight hours
in 2003 before increasing sharply in 2005. The reason for the high rates in 2004/2005 is
that the air navigation service provider directed controllers to be more diligent about
filling out incident reports to improve the quality of the incident database and the overall
safety management system. Country C’s rate exhibits a steady trend for the entire
period of 13 years, being on average nine failures per 100,000 flight hours.
[Figure 4-4 plot: rate of equipment failures per 100,000 flight hours (y-axis, 0 to 50) against year (x-axis, 1992 to 2005) for Country A, Country B, and Country C]
Figure 4-4 Total number of equipment failures per flight hours flown in each year for countries A, B, and C
The data available on the rate of equipment failures for Country D show a sharp rise
from 30 failures per 10,000 flight hours captured in the latter part of 2000 to 45 failures
per 10,000 flight hours in 2001 (Figure 4-5)5. This apparent rise is explained by the fact
that only five months of data were available for the year 2000. Apart from this artefact of
incomplete data, the rate of reported equipment failures in this ATC Centre shows a
decreasing trend.
[Figure 4-5 plot: rate of equipment failures per 10,000 flight hours (y-axis, 0 to 50) against year (x-axis, 1992 to 2005) for Country D]
Figure 4-5 Total number of equipment failures per flight hours flown in each year for country D (year 2000 incomplete)
5 Although the rates of equipment failure of Country D are tenfold higher compared to Countries
A, B, and C, Country D data are retained for subsequent analyses as they represent the most detailed and reliable source of operational failure reports.
The next section builds on this trend analysis and assesses affected ATC
functionalities. The classification of all ATC functionalities, as defined in Chapter 2, has
been used for this purpose and the findings are presented for each Country separately.
4.4.3 Type of ATC functionality and equipment affected
This section provides the analysis of ATC functionalities and their sub-functions
affected by equipment failure occurrences as reported for Countries A, B, C, and D.
Country A data shows that the two ATC functionalities most affected are the
communication and surveillance functions (Figure 4-6).
Figure 4-6 Most affected ATC functionality (Country A)
Further analysis of sub-functions and equipment most affected by failures identified the
following five types: air ground communication, secondary surveillance radar (SSR),
flight data processing system (FDPS), primary surveillance radar (PSR), and other
communication systems, ranging from pagers, headsets, microphones, cables, to
footswitches (Table 4-6).
Table 4-6 Most affected ATC equipment (Country A)
ATC equipment affected Percentage
air ground communication 33.1
secondary surveillance radar (SSR) 17.7
flight data processing system (FDPS) 10.1
primary surveillance radar (PSR) 5.2
other communication systems 4
Similar to the previous case, two ATC functionalities for Country B most affected by
equipment failures are the communication and surveillance functions (Figure 4-7).
Figure 4-7 Most affected ATC functionality (Country B)
Table 4-7 presents the six types of equipment most affected by failures (data exchange
network and runway/taxiway lighting share fifth place at 7.6 percent). These are: PSR,
air situational display or radar display, air ground communication, voice switching
communication system (VSCS), data exchange network, and runway/taxiway lighting.
Table 4-7 Most affected ATC equipment (Country B)
ATC equipment affected Percentage
primary surveillance radar (PSR) 17.2
air situational display 15.1
air ground communication 11.6
voice switching communication system (VSCS) 8.8
data exchange network 7.6
runway/taxiway lighting 7.6
Country C shows a slightly different trend in the distribution of equipment failures per
ATC functionality. The two most affected categories are the navigation and
communication functions (Figure 4-8).
Figure 4-8 Most affected ATC functionality (Country C)
Furthermore, the five most affected equipment types are: air ground communication,
instrument landing system (ILS), very high frequency omnidirectional radio range
(VOR), non-directional beacon (NDB), and air situational display (Table 4-8).
Table 4-8 Most affected ATC equipment (Country C)
ATC equipment affected Percentage
air ground communication 23.7
instrument landing system (ILS) 19.6
very high frequency omnidirectional radio range (VOR) 7.6
non-directional beacon (NDB) 6.5
air situational display 5.8
Country D shows a similar trend to Countries A and B, as the two most affected ATC
functionalities are communication and surveillance (Figure 4-9). Although the
navigation function seems not to be represented at all in Figure 4-9, there were only
two failures affecting this functionality and both were due to testing of Global Positioning
System (GPS) clock alarms. The reason for the under-representation of this ATC
functionality is the fact that the data originated from one particular ATC Centre that
provides area control service and as such is not responsible for the ground-based
navigational aids and airport-based equipment (e.g. meteorological equipment,
Data processing Flight Data Processing System Radar Data Processing System
Power supply Main power system Uninterruptible power supply (generator, battery)
Secondary
Communication
Data exchange network Back-up system Aeronautical Information Service Other
Navigation
Navigational aids (e.g. Very high frequency Omnidirectional Range - VOR, Distance Measuring Equipment - DME) Airport facilities control and monitor (navigation aids monitoring, aeronautical ground lighting)
Surveillance
Surface Movement radar Automatic Dependent Surveillance Aerodrome Traffic Monitor Other (radar link, radar console) Auxiliary Display
Data processing Flow control supporting equipment Fallback facility Other (e.g. strip printer)
Supporting function (ATC tools)
Monitoring aids Sequencing manager Other
Safety nets
Short Term Conflict Alert Minimum Safe Altitude Warning Area Proximity Warning Runway Incursion Monitoring and Conflict Alert System
Pointing and input devices
Pointing devices Input devices
System monitoring
Data recording and playback facility Control and monitoring Degraded modes Time management
Based on the selected characteristics of ATC equipment failures, it is possible to rate
the severity of each possible combination of characteristics. The three-level severity
rating defined previously, based on the impact of equipment failure on ATC operations,
has been used. This severity rating differentiates between major, moderate, and
minimal impact, as defined in section 4.2.3. In general, Figure 4-13 presents the
equipment failure impact assessment tool as a four-step methodology to assess the
severity of an equipment failure. After determining the exact characteristics of
equipment failure in each step, it is possible to follow the link to the final outcome, i.e.
van der Schaaf (1992) Nuclear industry: Detection, Localisation, Correction
Kanse (2004) Chemical industry: Detection, Explanation, Countermeasures
Kaarstad and Ludvigsen (2002) Nuclear industry: Detection, Explanation, Correction
Bove (2002)2 ATM industry: Detection, Correction
Therefore, in the research on recovery from equipment failures presented in this thesis,
past research is used to inform the phases of the controller recovery process.
2 Bove (2002) does not identify the diagnosis phase in the human error management process.
This may be due to the fact that this phase represents a covert human activity, difficult to observe, measure, and capture in incident reports.
Chapter 5 Air Traffic Controller Recovery
Detection of equipment failure is taken as the first phase, triggered by the mismatch
between ATC system feedback and active knowledge of the controller (expectation or
assumption). This phase is followed by the diagnosis and correction, leading toward
the outcome of the recovery process (as a result of both technical and controller
recovery).
Controller recovery is defined in this thesis as the ability of the controller to detect3,
diagnose, and correct any non-nominal system state resulting from ATC equipment
failure (adapted from van der Schaaf, 1995). The objective of the recovery process (i.e.
its outcome) is to restore the system to its nominal (pre-failure) state or at least to limit
the consequences of failure in the most efficient and effective way (by achieving a stable
non-nominal system state). The following sections discuss the phases of controller
recovery.
5.2.1 Detection
Human recovery is a sequential process whose first step is the detection of failure.
Without this detection there is no recovery process. Therefore, the first task of the
controller is to detect the failure. As previously explained, failures can initially be
detected either by a technical system or by a controller. Hallbert and Meyer (1995) note
that to accomplish detection by the human operator, the stimulus must be
recognisable. In other words, the stimulus must be something that a controller has
already experienced, is trained to observe, or is of sufficient intensity to interrupt the
monitoring process (e.g. visual or auditory alert positioned within the field of view but
different from the background ‘noise’ already present on the radar screen or other
operational support system).
Thus, detection is triggered by any mismatch between the expected effects and
observed outcomes. The mismatch arises when incoming information is compared against a
frame of reference, i.e. the range of expected system responses. For example, after
issuing an instruction for a flight level change to an aircraft, the controller expects
to see the old flight level gradually changing toward the new one. However, if the
controller observes a flight level change outside the expected values, this mismatch
will trigger the identification of some sort of ‘fault’. This ‘fault’ can be caused by
an erroneous flight level change by the pilot or by an erroneous system readout of the
aircraft altitude (e.g. due to radar garbling).
3 Failures can initially be detected either by a technical system or by a controller. Failures detected by a technical system may trigger the generation of an alert (via a warning device) transmitting information on the failure to the controller. However, failures can also go unnoticed by the technical system and be detected by a controller working with fallible equipment.
In the case of a total failure of a particular function, it is easier to detect and diagnose
the significance of the change, since the failure is obvious. However, in the case of a
partial failure of a particular ATC function (e.g. corruption of tracks and squawks),
detection may be more challenging. In these circumstances, detection is based on the
controller’s memory of aircraft’s past positions and future trajectories, aided by
available tools (e.g. flight strips). An example of potential difficulties encountered by
controllers in detecting partial equipment failure is reported by Sampaio and Guerra
(2004). In this example, a sudden failure of the Radar Data Processing System (RDPS)
affected only one radar track and went unnoticed by the controller for 21 minutes (see
Chapter 4, section 4.2.1).
Detection is also closely connected to the time course of equipment failure
development, namely sudden, gradual, or latent failures (see Chapter 4, section 4.1.3).
Sudden failures do not allow any time to prepare, but are usually detected immediately.
On the other hand, detection of gradual failures may be extremely difficult and delayed.
Persistent (latent) failures are almost impossible to detect. They might exist in the ATC
system for a long period of time before they are detected. This is confirmed by
interviews conducted during this research with the aim of augmenting the theoretical
sources of information. Engineers from three European ATC Centres confirmed that
latent failures (mostly software failures) tend to go unnoticed until some other event or
failure reveals their existence (for evidence see Appendix II).
There are various other factors that can hinder failure detection, such as difficulties in
observing system feedback or remembering expectations about effects. Detection can
also be made difficult by inappropriate system design (e.g. poor human machine
interface, poor quality or position of alert), workplace layout, or controller working
strategy. As an example, an alert that is barely visible or audible may remain
undetected even by a highly alert controller.
Successful detection often depends on a combination of design qualities and the
controller's mental resources, and poor design can undermine it. An example is taken
from one of the European ATC Centres where the label of an ATC function in the
‘general information window’ changes colour from white to yellow in the case of a
failure. However, in the training facility of the same ATC Centre, within the same
window, one specific label is designed to be colour-coded yellow regardless of its
status (i.e. the label ‘Lines’, which refers to the status of the communication lines
between a number of ATC Centres). Such a training platform design feature could lead
a controller to miss a failure, because yellow is continuously present in the
‘general information window’.
Besides the quality of an alert, its onset also plays an important role. As previously
discussed in Chapter 4, alert onset (i.e. Time-To-Alert or TTA) is defined as the time
between a system’s detection of a failure and the moment an alert is presented on the
Human Machine Interface (HMI), either by a colour change or a text message. More
importantly, the future concept of cognitively convenient alarm onset aims to
circumvent the human limitations in detection described above by presenting the alert
for a system-detected failure at a moment when the controller's workload allows it to
be detected (see Chapter 4, section 4.3.2).
The above discussions have highlighted that detection can be either enhanced or
hindered by a combination of technical and human-related factors. External stimuli,
past experience, appropriate design solutions, and the sudden development of equipment
failures tend to enhance detection. However, inappropriate system design, high levels
of workload, and fatigue may hinder failure detection. Similar conclusions are drawn
from the study on human recovery performance in nuclear power plants by Kaarstad
and Ludvigsen (2002). Based on a literature review, an experimental investigation, and
field studies, they identify the three most significant factors that affect the detection
phase. These are:
• communication - interaction with colleagues can provide information to detect a
failure;
• system feedback - cues directly found in the operational environment (e.g. alerts,
other unusual system events); and
• internal feedback - mismatch between the operator’s expectations of the
system/environment and the existing system status.
All of the above factors are relevant within the ATC environment. For example,
communication is an important factor, as information on an equipment failure can come
from the supervisor or the system control and monitoring unit. Similarly, in the ATC
environment internal feedback relates to the controller's ‘mental model’.
Once the controller is aware of an information mismatch, his or her task is to rapidly
determine the significance of that mismatch. Generally, the current system output is
compared with the previously observed one to determine whether the change is within
tolerance. For example, if an aircraft is in level flight, no flight level change should
occur, and any deviation from the cleared flight level should trigger the detection of
an unusual event (e.g. pilot error, radar garbling).
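The tolerance comparison described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the expected range is bounded by the previous and cleared flight levels plus a fixed tolerance; the function names and the tolerance of two flight levels are hypothetical and are not drawn from any ATC system or procedure.

```python
def expected_range(previous_fl: int, cleared_fl: int, tolerance: int) -> tuple[int, int]:
    """Return the (low, high) band of flight levels the controller expects.

    Level flight (previous == cleared) gives a narrow band around the cleared
    level; a climb or descent gives the band between the old and new levels.
    """
    low = min(previous_fl, cleared_fl) - tolerance
    high = max(previous_fl, cleared_fl) + tolerance
    return low, high


def mismatch_detected(previous_fl: int, cleared_fl: int, observed_fl: int,
                      tolerance: int = 2) -> bool:
    """True if the observed readout falls outside the expected band.

    A mismatch only signals an unusual event (e.g. pilot error or radar
    garbling); identifying which one is left to the diagnosis phase.
    The default tolerance of 2 flight levels is a hypothetical value.
    """
    low, high = expected_range(previous_fl, cleared_fl, tolerance)
    return not (low <= observed_fl <= high)


# Level flight at FL350: a readout of FL340 is outside tolerance.
print(mismatch_detected(350, 350, 350))  # False
print(mismatch_detected(350, 350, 340))  # True
# Climb from FL310 to FL350: FL330 is expected, FL370 is not.
print(mismatch_detected(310, 350, 330))  # False
print(mismatch_detected(310, 350, 370))  # True
```

The sketch captures only the comparison step of detection; it says nothing about whether the controller actually notices the readout, which the preceding discussion shows depends on design, workload, and fatigue.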
The detection phase is investigated further using data from a questionnaire survey and
an experiment in Chapters 6 and 10 respectively.
5.2.2 Diagnosis
Once detection occurs, the diagnosis phase (also known as explanation, localisation,
or identification phase) determines what the failure is, its cause, and what should be
done to correct it. A controller needs good knowledge of the failure to determine what is
occurring and its effects (e.g. what to expect in the near future, whether the function is
still partially available or totally lost, any problems with data integrity, and the
possible impact on other tools). This is especially important in the ATC environment,
where the overall system consists of highly integrated components and different failures
may present themselves to the controller in a similar manner. For example, a radio
frequency failure manifests itself in the same manner regardless of its cause (i.e. a
ground-based vs. an airborne failure). Therefore, it is up to the controller to identify
the true failure by ruling out alternatives. In this particular example, the controller
will first try to establish radio contact with other aircraft. If communication is
established with the other aircraft, it is reasonable to assume that the failure is on
the aircraft side. The controller will then try to identify whether it is a receiver or a
transmitter failure by asking the aircraft to squawk identification. If the aircraft
squawks identification, the pilot clearly heard the transmission, and the controller
knows that the aircraft has experienced a transmitter failure. By employing this
procedure, the controller determines the precise element of the equipment that failed
and can thus implement the most appropriate recovery procedure.
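The radio frequency example follows a simple elimination procedure, which can be sketched as a decision function. This is an illustrative simplification with hypothetical names: the text states only the ‘other aircraft respond’ and ‘squawks identification’ branches, so the ground-side conclusion when other aircraft do not respond, and the receiver-side conclusion when no identification is squawked, are assumptions added here to complete the logic.

```python
from enum import Enum


class RadioDiagnosis(Enum):
    GROUND_SIDE = "ground-based failure"
    AIRCRAFT_TRANSMITTER = "aircraft transmitter failure"
    # Assumption: the text does not name this branch explicitly.
    AIRCRAFT_RECEIVER = "aircraft receiver (or total radio) failure"


def diagnose_radio_failure(other_aircraft_respond: bool,
                           squawked_ident: bool) -> RadioDiagnosis:
    """Rule out alternatives in the order described in the text."""
    # Step 1: if other aircraft on the frequency can be contacted, the ground
    # equipment works and the fault is assumed to be on the aircraft side.
    if not other_aircraft_respond:
        return RadioDiagnosis.GROUND_SIDE  # assumption: inverse of the stated rule
    # Step 2: a squawked identification shows that the pilot heard the
    # instruction, so the aircraft receiver works and its transmitter has failed.
    if squawked_ident:
        return RadioDiagnosis.AIRCRAFT_TRANSMITTER
    return RadioDiagnosis.AIRCRAFT_RECEIVER


print(diagnose_radio_failure(True, True).value)  # aircraft transmitter failure
```

Each branch identifies which element failed, not why it failed; as discussed below, the cause is typically left to the system control and monitoring unit.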
Past research in non-ATM industries has shown that in some cases, after the detection
of a failure, the corrective actions are immediately known and implemented. In these
cases, the diagnosis phase is omitted (e.g. in the nuclear industry - Kaarstad and
Ludvigsen, 2002). Similarly, a study from the chemical process industry has shown
that the order of the phases is not always the same. More precisely, the diagnosis
phase does not necessarily follow the detection phase, especially in time-critical
operations. Often a quick fix might be necessary, or an initial correction might occur
even before the cause of a failure has been identified (Kanse, 2004).
The findings from non-ATM industries are not entirely applicable to the ATC/ATM
environment. It is difficult to see how the diagnosis phase could be omitted, because
proper recovery from ATC equipment failure is not possible without knowing the true
nature of the failure. However, the duration of the diagnosis phase and the attention
dedicated to it relate directly to the level of workload experienced by the controller
at the moment of failure occurrence and during the recovery process. Through
interviews, a EUROCONTROL study determined that on most occasions controllers do
not seek an explanation for the cause of a failure (EUROCONTROL, 2004e). They focus
only on identifying the system that failed, which is essential to implement an adequate
recovery strategy. An example could be the code-callsign conversion failure, where,
having detected a problem, the controller has to identify the pair of aircraft affected.
This tends to be a very time-consuming process leaving no time for the controller to
consider the cause of the failure. Another example is corruption of radar data. If the
controller doubts the quality of a particular radar source in the multi-radar coverage
airspace, it is possible to use information from other radar sources. If the same failure
occurs in the single-radar coverage airspace, the controller has to disregard radar data,
initiate procedural (non-radar) control, and pass the problem to the system control and
monitoring unit. In both cases, the controller has to determine what failed and what the
impact of that failure is, in order to implement an adequate recovery strategy. The
cause of the failure is left to the system control and monitoring unit to investigate.
From the discussion above, it is clear that the diagnosis phase is important to identify
the equipment that has failed. However, if the failure is identified immediately and the
corrective actions are immediately known, the controller may proceed directly to the
correction phase. The diagnosis phase and the factors that may influence it are
addressed further in Chapter 10 through an experimental investigation. Once the
controller diagnoses the failure type and its impact on the ATC system, the tasks shift
to more action-based activities. In short, the controller initiates the correction
phase, which is described below.
5.2.3 Correction
Failure recovery involves knowing how to undo or minimise the effect of failure and
achieve the desired system state (nominal or stable non-nominal system state,
respectively). The first priority is to minimise the effect on the air navigation service
and to limit the exposure to the problem, in terms of both the number of aircraft and the
length of time affected. Depending upon the equipment failure type, recovery should
follow available procedures (for details see section 5.5). Some of them are fairly
simple, such as switching to another radar source in multi-radar processing areas,
changing to the secondary radio frequency (if the primary one is blocked), replacing
unserviceable input devices (mouse or keyboard), and switching to another console (if
the current one is not operational). Other recovery strategies can be very complex and
both physically and mentally demanding. For example, if an automated conflict detection
tool fails to work properly (e.g. Short-Term Conflict Alert – STCA or Medium Term
Conflict Detection - MTCD), an alert might appear when there is no conflict, or
conversely the controller might detect a conflict that was not alerted automatically.
In both instances, the controller will diagnose that the conflict detection tool itself
is not functioning properly. Immediate action would be required to ensure the safety of
all traffic. In other words, the controller will have to detect all existing conflicts
and resolve them in a timely and efficient manner without the assistance of automated
safety nets (e.g. STCA). The second priority would be to test and restore the automated
function, which would be the responsibility of the system control and monitoring unit.
Past research in the nuclear industry has identified different types of decision events
that constitute the correction phase of recovery (Orasanu and Fischer, 1997; Kaarstad
and Ludvigsen, 2002). These are assessed for the ATC environment below:
• ignoring the failure – the error/failure has been detected but is ignored by the
operator for one of two possible reasons: the error/failure is considered irrelevant
(i.e. it has no impact on operations), or the operator assumes that his/her intervention
may make the situation worse. In either case, the failure would have to be reported;
• applying procedures – this seems to be the most common correction type.
Therefore, it is necessary to ensure that procedures exist and that they are
appropriate to a particular failure;
• choosing a solution – in theory, this is applicable when procedures are not available
and the human operator has to apply more conscious resources to comprehend the
situation. In many situations it may seem that only one solution is possible to
resolve the failure. However, in retrospect, more than one solution may have been
available, while only one was considered at the time; and
• creating a solution – in this case, the operator has no experience with the failure
type. No procedures, training, or past experience are available for the human
operator to draw upon. A completely new solution or strategy has to be created.
This represents the most resource-demanding option of all. This process
corresponds to human heuristic competence4 (Rigas and Elg, 1997).
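The four decision events can be read as a rough ordering by the knowledge available to the operator. The sketch below encodes that reading with hypothetical names; it is a simplification (for instance, it models only the ‘no impact on operations’ reason for ignoring a failure), not a classification procedure taken from the cited studies.

```python
from enum import Enum


class CorrectionType(Enum):
    IGNORE = "ignoring the failure"
    APPLY_PROCEDURE = "applying procedures"
    CHOOSE_SOLUTION = "choosing a solution"
    CREATE_SOLUTION = "creating a solution"


def correction_type(affects_operations: bool, procedure_available: bool,
                    relevant_experience: bool) -> CorrectionType:
    """Hypothetical mapping from available knowledge to correction type."""
    if not affects_operations:
        # The failure is judged irrelevant; it must still be reported.
        return CorrectionType.IGNORE
    if procedure_available:
        # Least demanding option: a predefined procedure exists.
        return CorrectionType.APPLY_PROCEDURE
    if relevant_experience:
        # No procedure, but training or past experience suggests candidate solutions.
        return CorrectionType.CHOOSE_SOLUTION
    # No procedure, training, or experience: a new strategy must be created.
    return CorrectionType.CREATE_SOLUTION


print(correction_type(True, False, False).value)  # creating a solution
```

Ordering the checks this way places the least resource-demanding option first and the most demanding one (‘creating a solution’) last, in line with the list above.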
In the context of ATC, if the failure penetrates all existing built-in defences and affects
controller performance, it cannot be ignored. Thus, the recovery from ATC equipment
failures can be accomplished by applying a predefined procedure, modifying an
existing plan, or developing a new one. Of these, applying an existing procedure is the
preferred option, as it places the least strain on the controller. Compared with the
nuclear environment, the chosen procedure has to be executed within a very short time
frame (EUROCONTROL, 2004e). An important aspect of the correction phase and of
recovery in general is coping with the stress induced by an unexpected failure.
Interviews with controllers conducted for the EUROCONTROL study confirmed that
unexpected failures tend to significantly increase workload and stress (EUROCONTROL,
2004e). Under these conditions, controllers are unable to perform their tasks
effectively, and their ability to cope with other adverse operational and environmental
conditions is greatly reduced. Furthermore, the controllers interviewed highlighted that
critical incident stress management is essential in managing the stress associated with
equipment failures (EUROCONTROL, 2004e).
The correction phase and the factors that may influence it are investigated further in
Chapters 6 and 10. From the discussions above, it is clear that existing recovery
procedures, recovery training, and past experience with equipment failures play an
important role in the overall recovery process. These three drivers build a knowledge
base for the choice or creation of the most appropriate solution for recovery from an
equipment failure. Having discussed the phases that constitute the recovery process,
the next section looks at the outcome of that process.
5.3 Outcome of the recovery process
Although the main recovery process consists of several phases, as explained
previously, these activities do not conclude the process itself (Figure 5-1). Prior to the
outcome phase, the human operator attempts to resolve the problem by implementing
a recovery strategy. This is followed in the outcome phase by post-correction
monitoring or post-recovery analysis to determine the actual outcome of the
implemented strategy. The first task in this phase is therefore monitoring, by both
controllers and engineers. Proper design solutions could aid this phase by
providing post-recovery system status indicators.
[Figure: equipment failure, hazard, recovery, and outcome; outcomes shown are recovery successful, recovery not successful, recovery continues, and incident with further consequences]
Figure 5-1 Analysis of the outcome phase (adapted from EUROCONTROL, 2004e)
4 There are two types of human competence: epistemic and heuristic. Epistemic competence refers to domain knowledge about the system which one seeks to control. It is the context-dependent component of the actual competence. Heuristic competence refers to a general competence for handling complex dynamic tasks. It is context-independent, but develops over many years through both training and experience. As a result, actions and decisions become fast and automatic, occurring without apparent conscious awareness.
It might be expected that at this stage human performance requirements are similar to
those of the detection phase. However, as observed by EUROCONTROL (2004e)
there is a crucial difference. Guided by implemented corrections (recovery strategies),
monitoring by both engineers and controllers is driven more by ‘top-down’ processes,
primarily expectation. Since at this stage in the recovery process the operators have
knowledge of the failure and its cause, they also have expectations about how the system
might behave after a correction is implemented. For instance, if the system remains
unstable, operators may expect a recurrence of the same problem or other related
problems (common-mode or common-cause failures), or they may have a general suspicion
that the assessment of the problem was wrong or misleading.
Following the period of monitoring or active checks, the controller must decide whether
recovery is successful. Recovery is considered successful if the system returns to the
nominal (pre-failure) state or to an intermediate, stable state (EUROCONTROL, 2004e).
An intermediate state is a degraded operational state (e.g. loss of any function or
item of equipment, or a significant overload condition causing increased system
response time) that has been detected and stabilised by controllers or engineers. In
essence, the system is in the intermediate state if the consequences of the failure are
still observable in system performance, but controllers are aware of the quality of the
information they are receiving from the system and thus of the quality of service they
can provide to traffic.
If recovery is unsuccessful, the controller will return to either diagnosis (to determine
the real cause of the problem) or correction phase to retry the previous strategy or
attempt a new one (Kanse and van der Schaaf, 2000; EUROCONTROL, 2004e). This
cycle of reapplied efforts continues as long as time is available for recovery.
Otherwise, if no time is available, the final outcome may be an incident with further
consequences (e.g. loss of separation).
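The outcome phase and its feedback loop can be sketched as follows. The sketch assumes, for illustration only, that each recovery attempt (re-diagnosis, correction, and monitoring) takes a known amount of time and ends in a known system state; the class and function names are hypothetical. ‘Recovery continues’ is used here for the case where the attempts made so far have failed but time remains.

```python
from dataclasses import dataclass
from enum import Enum


class SystemState(Enum):
    NOMINAL = "nominal (pre-failure)"
    INTERMEDIATE = "intermediate, stable (degraded)"
    UNSTABLE = "unstable"


@dataclass
class Attempt:
    duration_s: float             # time for re-diagnosis, correction and monitoring
    resulting_state: SystemState


def recovery_outcome(attempts: list[Attempt], time_available_s: float) -> str:
    """Replay recovery attempts until success or until time runs out."""
    elapsed = 0.0
    for attempt in attempts:
        elapsed += attempt.duration_s
        if elapsed > time_available_s:
            return "incident with further consequences"
        # Both the nominal and the intermediate, stable state count as success.
        if attempt.resulting_state is not SystemState.UNSTABLE:
            return "recovery successful"
    # Attempts so far have failed, but time remains for another cycle.
    return "recovery continues"


attempts = [Attempt(60, SystemState.UNSTABLE), Attempt(90, SystemState.INTERMEDIATE)]
print(recovery_outcome(attempts, time_available_s=300))  # recovery successful
print(recovery_outcome(attempts, time_available_s=100))  # incident with further consequences
```

The same second attempt succeeds or fails depending only on the time available, which is the dependency the text describes.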
The next section reviews the existing models of the failure and recovery process
developed to support research on human recovery in ATM and non-ATM industries.
5.4 Models of human recovery
In the reviewed literature, only a few models cover both equipment failure and
its recovery process. By contrast, an extensive body of research is dedicated
to models of recovery from human error. These models are the result of work in the
field of human reliability and can be transferred to recovery from equipment failure. In
chronological order, the review begins with the work of Frese et al. (1990) and Frese
(1991), which was based on office workers’ errors and error handling in using
computers. In 1992, as part of a PhD thesis on near miss reporting in the chemical
process industry, van der Schaaf (1992) developed the Eindhoven classification model
of system failures. This model was based on Rasmussen’s Skill-Rule-Knowledge
(SRK) model of human behaviour (Rasmussen, 1982), human behaviour being one of the
most dominant factors causing system failures in chemical process plants. The SRK model of human
behaviour was extended to system failures, incorporating additional root causes of
incidents, namely technical and organisational factors. The incorporation of all relevant
failure factors has created a comprehensive approach to safety management.
However, the approach has suffered from the limitations of the SRK model as
discussed below.
Bainbridge (1984) reports problems with using Rasmussen’s taxonomy of three main types
of cognitive behaviour, namely SRK. For example, the word ‘rule’ could be used for a
specific procedure, instructions, a standard method based on previous experience, or a
precise heuristic method. Another criticism concerns the associated model of the
organisation of cognitive behaviour, the so-called Rasmussen pyramid model. The model
places ‘skilled’ behaviour at the base and ‘knowledge-based’ behaviour at the top of the
pyramid. This model, although representing the general organisation of cognitive
behaviour, does not contain mechanisms for complex behaviour (see Bainbridge,
1984).
While the previous discussions focus mainly on models for recovering from human
error, this section further presents three models that focus on recovery from technical
failures. These are: the model by Kanse (2004) developed and tested in the chemical
process industry; the EUROCONTROL’s project on Solutions for Human Automation
Partnership in European ATM (SHAPE) and the Recovery from Automation Failure
Tool (RAFT) developed specifically for the Air Traffic Management (ATM) industry
(EUROCONTROL, 2004e); and the model of failure recovery in air traffic control by
Wickens et al. (1998). The model by Kanse originates in a non-ATM industry but focuses
not only on the human as a system component but also on equipment and procedures.
This model provides the conceptual basis for the RAFT. The RAFT and the model by
Wickens et al. were chosen because of their relevance to the research in this thesis,
as both assess the impact of future automation on recovery from potential failures.
5.4.1 Model by Kanse
The basic principle behind the model by Kanse (2004) is a sequence of phases that
constitute the process of human recovery: detection, explanation (i.e. diagnosis), and
countermeasures (i.e. correction). The model is based on past research and
operational data from three studies of near misses in chemical process plants. Near
misses are incidents that have the potential to result in a loss (e.g. an accident,
injury, or failure) but do not.
According to this qualitative phase model (Figure 5-2), the recovery process starts with
the detection of a failure. This is followed by any combination of explanation (referred
to as diagnosis in this thesis) and countermeasures (referred to as correction in this
thesis), including the omission of one or both of these phases as well as their recurrence. For example,
the assessment of the order of the recovery steps performed by plant operators in each
incident revealed that the intermediate phase (i.e. diagnosis) was omitted in more than
35 percent of incidents (see Table 3 in Kanse, 2004).
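The phase ordering permitted by the model can be expressed compactly as a pattern over phase codes. In this sketch, which is an illustrative encoding rather than Kanse’s own notation, D, E, and C stand for detection, explanation, and countermeasures. A sequence is valid if it begins with detection and is followed by any number of explanation and countermeasure phases, in any order and including none. The function names and example sequences are hypothetical and are not Kanse’s data.

```python
import re

# D = detection, E = explanation (diagnosis), C = countermeasures (correction)
VALID_SEQUENCE = re.compile(r"D[EC]*")


def is_valid_sequence(phases: str) -> bool:
    """Detection first, then any combination (or none) of E and C."""
    return VALID_SEQUENCE.fullmatch(phases) is not None


def diagnosis_omitted(phases: str) -> bool:
    """True for a valid sequence that contains no explanation phase."""
    return is_valid_sequence(phases) and "E" not in phases


def omission_rate(sequences: list[str]) -> float:
    """Share of valid sequences in which diagnosis was omitted."""
    valid = [s for s in sequences if is_valid_sequence(s)]
    return sum(diagnosis_omitted(s) for s in valid) / len(valid)


# Hypothetical incident sequences, not data from Kanse (2004).
print(omission_rate(["DEC", "DC", "DCEC", "DC"]))  # 0.5
print(is_valid_sequence("EC"))  # False: recovery cannot start without detection
```

A measure such as `omission_rate` corresponds to the kind of count behind the reported figure of more than 35 percent. The values printed here come only from the invented sequences.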
The model does not focus on the factors that influence the recovery process but
highlights that factors influencing recovery might be different in different domains.
Additionally, the model does not attempt to predict human performance, future errors,
or failures.
[Figure: BEGIN (problem situation arises as a result of one or more failures); D (detection of deviation); E (explanation of deviation and causes); C (countermeasures); END (of recovery process)]
Figure 5-2 Recovery process phase model (Kanse, 2004)
5.4.2 The RAFT Tool
EUROCONTROL’s SHAPE project addressed the effects of automation on human
performance and future ATM concepts. Part of this project focused on technical
failures and the controller’s ability to manage them, and resulted in the Recovery from
Automation Failure Tool (RAFT), a method for analysing technical failures.
The basic principle behind the RAFT is a sequence of phases that constitute the process
of failure and recovery (Figure 5-3). The RAFT tool starts by assessing the recovery
context, i.e. a number of important factors that influence the consequences of an
equipment failure and have the potential to influence the human recovery process
(Figure 5-3). This is followed by an assessment of the failure cause, the problem
definition (according to the RAFT framework, an equipment failure leads to a functional
disturbance), and the failure effects. Then, the RAFT tool moves toward the
investigation of the human recovery process. This is done separately for the controllers
and engineers involved. The final step in the failure analysis is the outcome phase,
which includes an assessment of the effectiveness of the implemented recovery strategy
(Figure 5-3).
The RAFT draws on past research and operational experience. It is based on a
qualitative model developed by Kanse and van der Schaaf (2000) for the chemical
process industry (further adapted by Kanse, 2004, as explained in the previous section).
The model by Kanse and van der Schaaf is augmented with operational experience
extracted from interviews with 31 ATM staff in four European ATC Centres.
In practice, the RAFT relies on an expert group evaluating each failure and predicting
how controllers are likely to respond to equipment failures. This tool is intended to be
used together with other SHAPE project outputs for predicting controller performance in
the future highly automated environment (e.g. predicting changes in controller skill
requirements, workload, and trust). The approach has not been verified against recovery
performance in either simulated or operational environments, and it still lacks a set of
recovery-relevant principles to guide designers of current and future ATM systems.
Second generation prospective Human Reliability Assessment (HRA) methods could be used
to develop a predictive capability for the RAFT tool and to inform safety-adequate
design principles related to controller recovery from equipment failures.
Figure 5-3 The Recovery from Automation Failure Tool Framework (EUROCONTROL, 2004e)
5.4.3 Model by Wickens et al.
In 1998, the Panel on Human Factors in Air Traffic Control Automation established by
the Federal Aviation Administration (FAA) studied various aspects of human factors
and the role of the human in proposed future automated systems. Amongst several
different issues, research by this Panel recognised the importance of equipment
failures and recovery. The Panel proposed a model of ATC failure recovery that places
an emphasis on the consequences of degradation of automated ATC functionalities
(Wickens et al., 1998). It is assumed that the model is based entirely on available
research, as the Panel focused on concepts that will characterise the future ATC
system. The basic principle behind this qualitative model is the impact of ATC
automation functionalities (left-hand side of Figure 5-4) on capacity, traffic density,
complexity, workload, situational awareness, manual skills, and recovery response
time. Each of these variables is associated with a sign (or a set of signs) indicating
whether automation is likely to increase or decrease the variable in question. However,
this model does not consider in detail how recovery is accomplished.
Figure 5-4 Model of failure recovery in air traffic control. Where two nodes are connected by an arrow, signs (+, -, 0) indicate the direction of effect on the variable depicted in the right node, caused by an increase in the variable depicted in the left node (Wickens et al., 1998)
The model also reflects the hypothetical function which relates recovery response time
to the level of automation (Figure 5-4). It is expected that recovery response time will
increase as the level of automation increases (shown as a dashed upward line on the
right side of Figure 5-4), due to increased complexity, skill degradation, and the overall
‘out of the loop’ phenomenon. The solid downward line reflects the decrease in the
reaction time available to controllers as a result of the introduction of higher levels of
automation. Controllers will have far less time to safely respond to any loss of
separation and fewer opportunities for effective solutions. In this way, the model
represents Bainbridge’s (1983) ‘ironies of automation’ by overlaying two critical time
variables against each other and as a function of automation-related changes. These
variables are: the time required to establish safe separation, given a degraded ATC
service, and the time available to a controller (or a team) to react and safely recover
from a failure.
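The comparison of the two time variables can be made concrete with a small numerical sketch. Wickens et al. (1998) present the relationship qualitatively, so the linear functions, the 0 to 5 automation scale, and all parameter values below are hypothetical. They are chosen only to show how a crossover point arises beyond which the time required for recovery exceeds the time available.

```python
# Hypothetical linear curves; Wickens et al. (1998) give only qualitative trends.
REQ_BASE, REQ_SLOPE = 60.0, 40.0        # required time rises with automation
AVAIL_BASE, AVAIL_SLOPE = 300.0, -50.0  # available time falls with automation


def required_time(level: float) -> float:
    """Time (s) needed to re-establish safe separation after a failure."""
    return REQ_BASE + REQ_SLOPE * level


def available_time(level: float) -> float:
    """Time (s) available to the controller to react and recover."""
    return AVAIL_BASE + AVAIL_SLOPE * level


def recovery_feasible(level: float) -> bool:
    """Recovery is possible only if the required time fits in the available time."""
    return required_time(level) <= available_time(level)


def crossover_level() -> float:
    """Automation level at which the two lines intersect."""
    return (AVAIL_BASE - REQ_BASE) / (REQ_SLOPE - AVAIL_SLOPE)


for level in range(6):
    print(level, required_time(level), available_time(level), recovery_feasible(level))
print(round(crossover_level(), 2))  # 2.67
```

The specific crossover value has no meaning beyond these invented parameters. What carries over from the model is the shape of the argument: as automation increases, the two curves converge, and the margin available for recovery shrinks.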
Table 5-2 summarises the characteristics of the three models relevant to controller
recovery from equipment failure in ATC described above and identifies the limitations
that are addressed later in the thesis. In general, all three models are qualitative and
based on the principle of a sequence of phases that constitute the process of human recovery.
They are based on past research, whilst only one model is based on operational data.
The limitations identified in the last column of Table 5-2 guided the research presented
in this thesis and the main principles behind the framework for the assessment of
controller recovery. In short, the research in this thesis is verified in a simulated
environment (experimental investigation – Chapter 10), based on operational
experience (from interviews with relevant ATM staff, operational data – Chapter 4, and
the questionnaire survey - Chapter 6), and based upon a detailed assessment of the
recovery context (Chapters 7 and 8).
Table 5-2 Summary of relevant models of the human recovery process
Model | Context | Operational input | Assessment of recovery | Prediction of recovery | Limitations
Kanse (2004) | Chemical industry | Yes (interviews and data) | Qualitative and quantitative | No | No assessment of the recovery context; no prediction of the recovery process
SHAPE’s RAFT tool | ATM | Yes (interviews) | Qualitative (expert-based) | Qualitative (expert-based) | Not verified in simulated/operational environment; based only on interviews and no operational reports
Wickens et al. (1998) | ATM | No | No | Qualitative and potentially quantitative (based on the recovery reaction time) | Theoretical approach
As stated previously, three major factors influence the quality of controller recovery: past experience, procedures, and training. Procedures and training are regulated within the aviation community. Operational experience, by contrast, accumulates over time, and controllers may or may not experience equipment failures during their career. For this reason, the next sections describe and discuss existing regulations on recovery procedures and training. Operational experience, extracted from the questionnaire survey, is investigated in the following chapter.
5.5 Procedures for handling ATC equipment failures
In both the literature and operational practice, procedures are recognised as the critical factor for effective recovery. The following section provides an overview of the existing international and national regulations on procedures for recovery from equipment failures in ATC. This is followed by a discussion of the key principles of recovery procedures in ATC identified in this research.
5.5.1 Existing regulations
Regulation on procedures for handling ATC equipment failures (i.e. recovery procedures) exists at three levels:
- the international level (the International Civil Aviation Organisation, ICAO);
- the regional or national level (e.g. the European Organisation for the Safety of Air Navigation, EUROCONTROL, at the regional level and the Civil Aviation Authorities, CAAs, at the national level); and
- the level of the air navigation service providers (ANSPs).
The main activity of ICAO is the establishment of International Standards, Recommended Practices and Procedures covering all technical fields of aviation. ‘Recommended Practices’ are desirable objectives with which ICAO member states should aim (but are not required) to conform. ‘Standards’, by contrast, are considered mandatory in the interest of the safety of international air navigation (FAA, 2005). ICAO Standards and Recommended Practices are passed to the relevant regional organisation (e.g. EUROCONTROL) or directly to the national CAAs for assessment and implementation. The national CAA is then responsible for assuring and monitoring the proper implementation of these standards by ANSPs at the level of ATC Centres. The current status of regulations on recovery procedures is discussed in the following sections.
5.5.1.1 International regulation
Since 1945 ICAO has specified the standards, practices, and procedures for ATC. The most recent edition of ICAO Annex 11, which covers air traffic services (ICAO, 2001c), advises that “air traffic services authorities should develop and promulgate contingency plans for implementation in the event of disruption or potential disruption of air traffic services and related supporting services in the airspace for which they are responsible for the provision of such services”. This ICAO recommendation summarises the key system safety principles that need to be considered within each air traffic service unit.

Moreover, several specific equipment failures are covered separately in the ICAO document dealing with procedures for air navigation services (ICAO, 2001a):
- radar equipment failure;
- ground radio failure (blocked frequency);
- ground Automatic Dependent Surveillance (ADS) failure; and
- failure of Controller Pilot Data Link Communication (CPDLC).

Compared with the findings of the analysis of operational failure reports presented in Chapter 4, ICAO has concentrated on the appropriate components of the communication and surveillance ATC functionalities. However, it has disregarded the data processing functionality.
In its guidance on recovery from these four failure types, ICAO recommends the steps to be taken by controllers and pilots, as well as by ATC Centre watch managers or supervisors. When necessary, ICAO also recommends collaboration with adjacent ATC units. The recovery process is therefore seen as the responsibility not only of controllers but of all parties involved within the affected ATC Centre and region. This includes adjacent ATC units, which can provide valuable assistance by restricting or rerouting the flow of traffic. All other failure types are left to national service providers to define and include in their Manuals of Air Traffic Services (MATS).
5.5.1.2 European and national regulation
At the European level, EUROCONTROL published guidance and recommendations for controller training in the handling of unusual/emergency situations, known as the ASSIST scheme (EUROCONTROL, 2003f). This scheme covers all procedures for aircraft emergencies but, paradoxically, does not cover any type of ATC equipment failure. The ASSIST programme is captured in a publicly available document. It is intended only as a framework, to be further customised and adapted to the specific requirements of each ATC Centre using local expertise. Thus, each ATC Centre is
required to assemble a team of experts, implement the current ASSIST programme,
and discuss other safety-critical events (e.g. ATC equipment failures) to be included in
Zagreb-Croatia). They also include ATC Centres with technically advanced ATC systems (e.g. Frankfurt, Amsterdam, Karlsruhe, Stavanger, and Melbourne). Finally, the controllers cover all levels of operational experience (ranging from 3 to 39 years in service) and all ratings. In short, these 27 ATC Centres capture the characteristics of the target population and are therefore included in the subsequent data analyses.
6.7.2.2 Sampling of air traffic controllers
The questionnaire survey captured information on the operational experience of controllers, namely years of experience, country of residence, and ATC facility location (i.e. city or airport). The survey data show that controllers have on average more than 13 years of operational experience (i.e. length of service), ranging from 1 to 39 years. More than 77 percent of the controllers surveyed have up to 20 years of experience. Length of service is therefore grouped into four categories: 1-10, 11-20, 21-30, and 31-40 years (Figure 6-6). The sample is reasonably representative of the population, as all categories are represented. There appear to be fewer respondents with over 30 years of experience in the sample. However, this is expected, as the majority of controllers with more than 30 years in service tend to move to operational support roles, including training, instructing, and management.
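The grouping of length of service into the four categories of Figure 6-6 can be sketched as below. This is a minimal illustration, assuming that length of service is recorded in whole years and that category bounds are inclusive. The function names and the example sample are hypothetical, not part of the survey processing described here.

```python
from collections import Counter

# Length-of-service categories used for Figure 6-6 (years, inclusive bounds).
CATEGORIES = [(1, 10), (11, 20), (21, 30), (31, 40)]


def service_category(years: int) -> str:
    """Map a controller's length of service (whole years) to a survey category."""
    for low, high in CATEGORIES:
        if low <= years <= high:
            return f"{low}-{high}"
    raise ValueError(f"length of service outside 1-40 years: {years}")


def distribution(years_of_service):
    """Percentage of respondents per category (empty categories report 0)."""
    counts = Counter(service_category(y) for y in years_of_service)
    total = sum(counts.values())
    return {f"{lo}-{hi}": 100 * counts[f"{lo}-{hi}"] / total for lo, hi in CATEGORIES}


print(distribution([3, 12, 25, 39]))  # hypothetical four-respondent sample
```

Reporting empty categories explicitly as 0 makes it easy to see whether every category is represented. That is the criterion used above to judge the sample reasonably representative.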
Figure 6-6 Distribution of operational experience
Furthermore, Figure 6-7 presents the distribution of the ratings of the controllers who participated in the survey. Most controllers hold ACC ratings. As a result, the data analyses may be biased towards experience within the ACC environment. This environment tends to be better staffed and to have more access to advanced equipment and tools, for example:
- radar coverage fed by multiple radar sites rather than a single radar site, as in APP and TWR control; and
- investment in more automated systems.
Figure 6-7 Distribution of controllers’ ratings (bar chart, percentage of controllers by rating: ACC & APP & TWR, ACC & APP, ACC & TWR, APP & TWR, ACC, APP, TWR)
6.7.3 High-level analyses
This section presents high-level results from simple percentage analyses of the entire dataset. The summaries are organised into seven sub-groups. The first six correspond to the six key questions that the survey was designed to answer (defined in section 6.1), and the last covers other findings on controller recovery (captured in question 5). The sub-groups are:
- experience with equipment failures in the ATC Centre;
- factors that influence recovery performance;
- the most unreliable ATC systems/tools;
- organised exchange of information on equipment failures;
- status and quality of recovery procedures;
- status and quality of training for recovery; and
- other findings.

Each sub-group is discussed below.
6.7.3.1 Experience with equipment failures (Q1)
In the sample obtained, 94.8 percent of controllers had experienced some kind of ATC equipment failure during their career. On average, this group of controllers experienced 17 equipment failures annually. The figures ranged from fewer than one per year up to 600 per year, as reported by one ATC Centre. This dispersion reflects wide variation in how respondents interpreted the term ‘equipment failure’:
- Some controllers interpreted the question as referring only to ‘major’ (more severe) failures. Their answers ranged from less than one failure annually (e.g. once in two years, once in five years, once in a career) to one failure annually (34.6 percent of responses).
- Other controllers reported the total number of failures experienced annually regardless of severity, with responses ranging from dozens to hundreds.

In short, the vast majority of controllers surveyed have experienced equipment failures.
6.7.3.2 Factors that influence controller recovery performance (Q2)
Controllers were asked to rate how much they relied upon written procedures,
situation-specific strategies (i.e. context), and other factors (e.g. past experience) in
handling equipment failures. The ratings ranged from one to five, where one stands for
‘very much’, two for ‘much’, three for ‘moderate’, four for ‘minimal’ and five for ‘not at
all’.
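The five-point scale and the ‘much’ or ‘very much’ summary used in the following paragraphs can be expressed compactly. This is a minimal sketch, assuming that responses are coded 1 to 5 as defined above. It also assumes that missing answers (coded here, hypothetically, as `None`) are excluded from the base. The function name is illustrative only.

```python
# Survey coding for question 2: 1 = 'very much' ... 5 = 'not at all'.
SCALE = {1: "very much", 2: "much", 3: "moderate", 4: "minimal", 5: "not at all"}


def high_reliance_share(ratings):
    """Percentage of valid ratings at 'much' or 'very much' (codes 1 or 2).

    Ratings outside the 1-5 scale (e.g. None for a missing answer) are
    excluded from the base, so percentages refer to valid responses only.
    """
    valid = [r for r in ratings if r in SCALE]
    if not valid:
        return 0.0
    return 100 * sum(1 for r in valid if r <= 2) / len(valid)


print(high_reliance_share([1, 2, 3, 4, None]))  # hypothetical responses
```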
The results show that more than 45 percent of the controllers surveyed rely on written procedures in the event of an equipment failure, at the level of either ‘much’ or ‘very much’ (see Figure 6-8). These controllers have on average more than 13 years of experience. Of the controllers who rated written procedures ‘much’ or ‘very much’:
- 96.4 percent operate in ATC Centres with recovery procedures; and
- 64.3 percent operate in ATC Centres with recovery training schemes.
Figure 6-8 Controllers’ reliance on written procedures throughout the recovery process (bar chart of response frequency: not at all 3.25%, minimal 13.01%, moderately 37.4%, much 22.76%, very much 23.58%)
When it comes to situation-specific problem solving, 63.48 percent of controllers rated this factor at the level of either ‘much’ or ‘very much’ (see Figure 6-9). As with the previous factor, the controllers who rated this factor highest have on average more than 13 years of operational experience. Of the controllers who rated situation-specific problem solving ‘much’ or ‘very much’:
- 94.5 percent operate in ATC Centres with recovery procedures; and
- 63 percent operate in ATC Centres with recovery training schemes.

The only difference from the previous group of controllers is that no controllers from the African region rated situation-specific problem solving highly. European controllers tend to rely much more on situation-specific problem solving (69.3 percent of responses from European controllers) than on written procedures (42.7 percent).
Figure 6-9 Controllers’ reliance on situation-specific problem solving throughout the recovery process (bar chart of response frequency: not at all 1.74%, minimal 10.43%, moderately 24.35%, much 35.65%, very much 27.83%)
Finally, 64.08 percent of controllers rated other factors (e.g. past experience) at the level of either ‘much’ or ‘very much’ (see Figure 6-10). As with the previous factors, the controllers who rated this factor highest have on average more than 13 years of operational experience. Of the controllers who rated other factors ‘much’ or ‘very much’:
- 90.8 percent operate in ATC Centres with recovery procedures; and
- 58.5 percent operate in ATC Centres with recovery training schemes.

European controllers rely most on other factors (e.g. past experience) when recovering from equipment failures (69.6 percent of responses from European controllers). This compares with 42.1 percent of responses from Asian controllers. The sample of African controllers is too small for any comparison.
Figure 6-10 Controllers’ reliance on other factors (e.g. past experience) throughout the recovery process (bar chart of response frequency: not at all 2.91%, minimal 3.88%, moderately 29.13%, much 31.07%, very much 33.01%)
Figures 6-8 to 6-10 and the frequency analysis rank the three factors by the share of controllers rating them ‘much’ or ‘very much’:
- other factors (e.g. past experience), 64.08 percent;
- situation-specific problem solving, 63.48 percent, a close second; and
- written procedures, 46.34 percent, some way behind.

Having investigated the factors that affect controller recovery, the next section turns to the survey objective of identifying the most unreliable ATC systems/tools.
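The ranking above can be checked directly from the percentages shown in Figures 6-8 to 6-10. The sketch sums the ‘much’ and ‘very much’ bars for each factor and sorts the factors by that share. The percentages are those read from the figures. The dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Percentages read from Figures 6-8 to 6-10, in the order:
# not at all, minimal, moderately, much, very much.
RELIANCE = {
    "written procedures": [3.25, 13.01, 37.40, 22.76, 23.58],
    "situation-specific problem solving": [1.74, 10.43, 24.35, 35.65, 27.83],
    "other factors (e.g. past experience)": [2.91, 3.88, 29.13, 31.07, 33.01],
}


def top_two_box(percentages):
    """Share of controllers answering 'much' or 'very much' (last two bars)."""
    return round(percentages[3] + percentages[4], 2)


ranking = sorted(RELIANCE, key=lambda factor: top_two_box(RELIANCE[factor]), reverse=True)
for factor in ranking:
    print(f"{factor}: {top_two_box(RELIANCE[factor])} %")
```

The sums reproduce the figures quoted in the text (64.08, 63.48 and 46.34 percent). They also show how narrow the gap is between past experience and situation-specific problem solving.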
6.7.3.3 The most unreliable ATC systems/tools (Q3)
The analysis of the most unreliable ATC equipment is based on two questions, 5 and 9. Question 5 asked for examples of equipment failures that severely affected the controller’s work. Question 9 asked controllers to list the three most unreliable ATC systems/subsystems they have experienced. The data obtained from both questions were collated and pre-processed to remove duplicate answers. This was necessary because controllers tended to give similar responses to both questions.
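The collation and de-duplication step can be sketched as follows. The sketch assumes that each controller’s answers to questions 5 and 9 have already been coded into equipment-type labels. The coding scheme, controller identifiers and example labels are hypothetical. A set union per controller removes an example that the same controller gave in both questions. An equipment type reported by several controllers is still counted once per controller.

```python
from collections import Counter


def collate_examples(q5_answers, q9_answers):
    """Merge coded answers to questions 5 and 9, dropping per-controller duplicates.

    Both arguments map a (hypothetical) controller identifier to a list of
    coded equipment types; controllers who answered only one question are kept.
    """
    merged = {}
    for controller in set(q5_answers) | set(q9_answers):
        merged[controller] = set(q5_answers.get(controller, [])) | set(
            q9_answers.get(controller, [])
        )
    return merged


def equipment_shares(merged):
    """Percentage of all de-duplicated examples attributed to each equipment type."""
    counts = Counter(item for items in merged.values() for item in items)
    total = sum(counts.values())
    return {item: 100 * n / total for item, n in counts.items()}


# Hypothetical coded answers for three controllers.
q5 = {"c1": ["air-ground communication"], "c2": ["primary surveillance radar"]}
q9 = {"c1": ["air-ground communication", "flight data processing system"],
      "c3": ["communication panel"]}
print(equipment_shares(collate_examples(q5, q9)))
```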
The results of the analysis of questionnaire responses from 34 countries were found to be similar to those obtained from the analysis of operational failure reports presented in Chapter 4. The questionnaire survey shows that the three most affected ATC functionalities are communication (37.2 percent of all examples provided), data processing (24.6 percent), and surveillance (23 percent) (Figure 6-11). More precisely, the five most affected equipment types are:
• air-ground communication (12.03 percent of all examples provided);
• primary surveillance radar (9.1 percent);
• flight data processing system (7.75 percent);
• communication panel (7.49 percent); and
• ground to ground communication (6.68 percent).
Figure 6-11 Distribution of affected ATC functionalities as reported in the questionnaire survey
Table 6-2 establishes the link between the most unreliable ATC functionalities and existing recovery procedures, as reported by 134 controllers from 34 countries representing various regions of the world. The link is established from the responses to questions 5, 9, 10, and 11. The analysis was conducted at the country level rather than the ATC Centre level, to avoid direct reference to sensitive information specific to individual ATC Centres. As a consequence, inaccuracies are possible only where controllers were not fully aware of the recovery procedures available in their ATC Centre.
Table 6-2 Mapping between most unreliable ATC functionalities and existing recovery procedures for the countries sampled
Country Most unreliable ATC functionalities
Existing recovery procedure
Ireland
Communication Frequency failure, telephone failure Navigation Failure of navigational aids Surveillance Radar failure (procedural/non-radar control) Data processing Strip printer failure (emergency strip printing) Pointing/input devices
Input device failure
Power outages, procedures for all failure types
Finland Communication Surveillance Data processing
Serbia
Communication Frequency failure, telephone failure Surveillance
Data processing Flight data processing system (FDPS) failure, radar data processing system (RDPS) failure
Switzerland
Communication Frequency failure, telephone failure Navigation Surveillance Radar failure, visualisation system (radar display) failure Data processing FDPS failure Pointing/input devices
Power supply failure
United Kingdom
Surveillance Procedures for all failure types
Netherlands
Communication Frequency failure
Surveillance Secondary surveillance radar (SSR) failure, radar fallback system failure, failure of the working position (radar display)
Data processing FDPS failure, RDPS failure Pointing/input devices
Total system failure (in various gradations)
Germany
Communication Surveillance Radar failure Data processing Total system failure
Spain
Communication Frequency failure Surveillance Total radar failure Data processing Fire contingencies
Norway
Communication Frequency failure, on-line data interchange (OLDI) link failure, communication panel failure, telephone failure, headset failure, intercom failure
Surveillance Radar failure, failure of the radar display Data processing FDPS failure Pointing/input devices
Italy
Communication Frequency failure Navigation Runway/taxiway lights failure Surveillance Radar failure Data processing
France
Communication Frequency failure, telephone failure Surveillance Radar failure Data processing FDPS failure, RDPS failure
Power outage, air conditioning failure, fire evacuation, meteorological equipment failure, failure of navigation aids
Sweden
Communication Frequency failure, telephone failure Surveillance Radar failures, surface movement radar failure Data processing Pointing/input devices
Safety nets
Procedures for most failure types, runway/taxiway lighting system failure, instrument landing system (ILS) failure
Slovenia Communication Frequency failure, telephone failure Data processing FDPS failure, RDPS failure Radar failure
Belgium Communication Frequency failure Surveillance Radar failure, radar fallback failure
Macedonia
Communication Frequency failure Data processing Pointing/input devices
Radar failure
Croatia
Communication Frequency failure, telephone failure Surveillance Radar failure Data processing Power outage
Moldova Radar failure
Iceland Communication Surveillance Data processing FDPS failure
Denmark Communication Frequency failure, telephone failure Data processing Radar failure
Portugal
Communication Frequency failure, telephone failure, voice switching and communication system (VSCS) failure
Tanzania Frequency failure, telephone failure, FDPS failure, power outage
India
Communication Telephone failure, intercom failure
Navigation Failure of navigation equipment, instrument landing system (ILS) failure
Surveillance Radar failure Data processing FDPS failure Pointing/input devices
Singapore Communication Frequency failure Surveillance Radar failures, failure of radar display
Tahiti
Communication Frequency failure, failure of satellite communication Surveillance Data processing Safety nets
Navigational aids failure, tsunami alert, aircraft diverting due to terrorist action
Australia Communication Surveillance
Austria Surveillance Data processing FDPS failure, RDPS failure, failure of strip printer
Pointing device failure, failure of touch input display (TID), frequency failure
Romania Communication Surveillance Procedures for all failure types
Malta
Communication Surveillance Radar failure Data processing Pointing/input devices
Power supply
Macau Special Administrative Region
Communication Frequency failure Navigation Navigation aids failure Data processing
Procedures for all failure types, radar failure, SSR failure
Kenya
Communication Frequency failure, telephone failure Navigation Surveillance Data processing Strip printer failure
New Zealand
Communication Frequency failure, telephone failure Surveillance Radar failure, radar screen failure Data processing FDPS failure, RDPS failure Safety nets
Partial and total failure of all ATC equipment, evacuation of ATC centre, mouse/keyboard failure, power outage
China Surveillance Radar failure FDPS failure, frequency failure
Malaysia
Communication Frequency failure Surveillance Data processing Safety nets
The instances in which identified failures are not supported by existing recovery procedures are highlighted in grey. In these cases, controllers experienced ATC equipment failures for which no recovery procedures were available in their ATC Centre. Conversely, procedures may exist for equipment failures that the sampled controllers have not yet experienced. These instances are highlighted in yellow and listed in the last row for each country. For example, suppose the communication function was affected specifically by frequency failure. If no recovery procedure existed for this particular failure type, the mapping is not established (coloured grey). In several cases controllers reported that their ATC Centre has procedures for all failure types. Clearly, it is not possible to cover every failure type. What is possible is to design generic procedures or guidelines to follow in the event of equipment failure.
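The grey and yellow categories of Table 6-2 amount to two set differences between the failures experienced and the procedures available. The sketch below is a minimal illustration, assuming both sides have been coded with identical failure-type labels. The example labels are hypothetical. In the thesis the mapping was established by hand from the questionnaire responses, and generic entries such as ‘procedures for all failure types’ would need special handling that exact matching cannot provide.

```python
def map_failures_to_procedures(experienced, procedures):
    """Compare failure types experienced in a country with available recovery procedures.

    Returns three sets:
      covered      - experienced failures with a matching procedure;
      uncovered    - experienced failures without a procedure (grey in Table 6-2);
      not_yet_seen - procedures for failures not yet experienced (yellow in Table 6-2).
    Assumes both inputs use identical, already-coded failure-type labels.
    """
    experienced, procedures = set(experienced), set(procedures)
    return {
        "covered": experienced & procedures,
        "uncovered": experienced - procedures,
        "not_yet_seen": procedures - experienced,
    }


# Hypothetical country-level example.
result = map_failures_to_procedures(
    experienced={"frequency failure", "radar failure", "FDPS failure"},
    procedures={"frequency failure", "radar failure", "power outage"},
)
print(result["uncovered"], result["not_yet_seen"])
```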
It can be concluded that the mapping between recovery procedures and the equipment failures experienced by controllers was inadequate in many cases. The most severe cases are those in which countries provide at most one type of recovery procedure. This was identified in:
- several European countries (i.e. Finland, Macedonia, Iceland, and Malta);
- two African countries (i.e. South Africa and Kenya); and
- two Asian/Pacific countries (i.e. Tahiti and Malaysia).

The most neglected ATC functionality was found to be data processing, followed by surveillance and communication. The paradox is that the qualitative equipment failure impact assessment tool (Chapter 4) identified exactly these three ATC functionalities as the most challenging for controller recovery.
6.7.3.4 Organised exchange of information on equipment failures (Q4)
Of the controllers surveyed, 40.3 percent reported that their ATC Centres have an organised exchange of information on equipment failures between colleagues. A further 49.3 percent reported a lack of such exchange, and 10.4 percent did not answer this question.
Contradictory responses were obtained from 14 ATC Centres. These were investigated further using the responses to the subsequent question, i.e. whether the organised exchange of experience is supported by management as good working practice. Of the ATC Centres that have an exchange of experience, 76 percent have formal processes approved by management. This contrasts with a practice based on ‘word of mouth’, which reaches only a small portion of controllers. The question was intended to capture management initiatives that provide the means to share experience of equipment failures in an organised manner. This may be achieved using different methods, such as seminars, company newsletters, safety bulletins, memoranda, and workshops. In this way, the lessons learnt are disseminated not only among the controllers directly experiencing the effects of the failure but within the entire ATC Centre, and often within the same country.
Based on this additional assessment, the following countries and ATC Centres have neither formal nor informal processes for the exchange of experience on equipment failures: Italy, Ireland, Croatia, India, Slovenia, Maastricht ATC Centre (as opposed to Amsterdam Centre), Switzerland, Macau SAR, and Kenya.
The data indicate that there is room for improvement. There is a clear need for formal processes for the exchange of experience on equipment failures, including failure modes and recovery processes. This should form part of a wider safety culture within ATC Centres, which is the responsibility of management. Past events have shown that this type of indirect training can have a beneficial safety impact similar to that of regular recurrent training.

The example discussed in Chapter 5 mentions an incident in which an A300 was struck on the left wing by a surface-to-air missile, resulting in the loss of all flight controls. Reacting rapidly, the captain recalled a television documentary on the DC-10 crash at Sioux City (Iowa). The documentary showed the thrust change technique employed by the captain and crew of the DC-10 to control their aircraft. Although the A300 crew had never practised this technique before, they quickly gained control despite the extreme stress of the situation (IFALPA, 2005).
6.7.3.5 Status and quality of recovery procedures (Q5)
A section of the questionnaire consisting of 11 questions (questions 10 to 20) was dedicated to the assessment of recovery procedures within each ATC Centre. The first question was designed to filter out immediately those ATC Centres without any written procedures in place. In this case, the controller would skip the rest of the section and proceed with the remainder of the questionnaire. Where recovery procedures exist, the remaining ten questions were designed to assess their quality. These questions focused on:
- completeness;
- currency;
- clarity;
- realism or feasibility;
- accessibility; and
- compatibility with other procedures.

In addition, controllers were given the opportunity to comment on any event in their working experience in which recovery procedures were inadequately applied.
The analysis of the questionnaire responses highlighted some inconsistencies (marked with ‘?’ in Table 6-3). In these cases, controllers from the same ATC Centre gave opposite responses to the questions on the existence of recovery procedures, recovery training, and/or recurrent training. These inconsistencies are investigated further using the responses to the subsequent questions on recovery procedures (questions 11 to 20), recovery training (questions 25 to 28), and recurrent training (questions 23 and 24). In this section, the existence of recovery procedures is investigated further for the Shannon, Cork, Brussels, and Nairobi ATC Centres (Table 6-3), using the answers to questions 11 to 20. Controllers from these ATC Centres reported a lack of recovery procedures in question 10. However, their subsequent answers revealed that these procedures do exist (at least for some failure types).
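The way centre-level entries in Table 6-3 arise from individual answers can be sketched as below. This is a minimal illustration under stated assumptions. Unanimous answers give ‘Yes’ or ‘No’, contradictory answers give ‘?’, and no answers give ‘Missing data’. The resolution rule in `resolve_procedure_status` is hypothetical: a ‘?’ is treated as ‘Yes’ when respondents went on to answer questions 11 to 20 about procedure quality. This mirrors the reasoning above but is not a documented algorithm from the thesis.

```python
def centre_status(answers):
    """Aggregate controllers' yes/no answers for one ATC Centre.

    'Yes' or 'No' if all respondents agree, '?' if they contradict each other,
    'Missing data' if nobody gave a yes/no answer (e.g. all None).
    """
    given = {a for a in answers if a in ("Yes", "No")}
    if not given:
        return "Missing data"
    if len(given) == 1:
        return given.pop()
    return "?"


def resolve_procedure_status(status, answered_follow_up):
    """Hypothetical rule: a '?' becomes 'Yes' if respondents answered the
    follow-up questions (11-20) on the quality of existing procedures."""
    if status == "?" and answered_follow_up:
        return "Yes"
    return status


print(resolve_procedure_status(centre_status(["Yes", "No", "Yes"]), True))
```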
Table 6-3 Existence of recovery procedures, recovery training, and recurrent training as reported in the questionnaire survey
Country | ATC Centre | Existence of recovery procedure | Existence of training for equipment failures | Existence of recurrent training
Ireland | Shannon | ? | Yes | ?
Ireland | Dublin | Yes | No | ?
Ireland | Cork | ? | ? | ?
Finland | Kemi | No | Yes | Yes
Serbia | Belgrade | Yes | No | No
Switzerland | Zurich | Yes | Yes | ?
Switzerland | Geneva | Yes | Yes | ?
United Kingdom | Bristol | Yes | Yes | No
Netherlands | Maastricht | Yes | ? | Yes
Netherlands | Nieuw Milligen | Yes | Yes | No
Netherlands | Amsterdam | Yes | Yes | Yes
Germany | Karlsruhe | Yes | Yes | No
Germany | Langen | Yes | Yes | No
Germany | Frankfurt | Yes | Yes | Yes
Spain | Seville | Yes | ? | No
Norway | Oslo | Yes | Yes | Yes
Norway | Kirkenes | Yes | Yes | No
Norway | Stavanger | Yes | No | Yes
Norway | Bodo | Yes | Yes | Yes
Italy | Rome | Yes | No | ?
Italy | Bologna | Yes | No | No
Italy | Naples | Yes | No | No
Italy | Venice | Yes | Yes | No
Italy | Milan | Yes | No | No
France | Paris | Yes | Yes | No
France | Nice | Yes | No | No
Sweden | Stockholm | Yes | No | No
Sweden | Malmo | Yes | Yes | Yes
Sweden | Gothenburg | Yes | Yes | Yes
Slovenia | Ljubljana | Yes | Yes | Yes
Belgium | Brussels | ? | No | No
Macedonia | Skopje | Yes | No | No
Croatia | Split | Yes | No | Yes
Croatia | Zagreb | Yes | No | No
Croatia | Pula | No | No | Missing data
Croatia | Zadar | No | No | Missing data
Moldova | Chisinau | Yes | Yes | Yes
Iceland | Reykjavik | Yes | No | ?
Denmark | Copenhagen | Yes | Yes | Yes
Portugal | Lisbon | Yes | ? | ?
South Africa | FAJS | Yes | Yes | Yes
Tanzania | Dar es Salaam | Yes | Yes | No
India | Mumbai | Yes | ? | Yes
India | Kolkata | Yes | ? | No
Singapore | Singapore | Yes | Yes | Yes
Tahiti | Papeete | Yes | ? | ?
Australia | Melbourne | Yes | No | No
Austria | Vienna | Yes | No | Yes
Romania | Bucharest | Yes | Yes | Yes
Malta | Malta | No | Yes | No
Malta | Luqa airport | Yes | Yes | Yes
Macau SAR | Macau | Yes | ? | ?
Kenya | Nairobi | ? | Yes | No
New Zealand | Wellington | Yes | Yes | No
New Zealand | Auckland | Yes | Yes | Yes
New Zealand | Christchurch | Yes | ? | Yes
China | Hong Kong | Yes | Yes | No
Malaysia | Subang | Yes | ? | No
Table 6-3 shows that 93.1 percent of the sampled ATC Centres (54 of 58) have some form of recovery procedure in place. The types of equipment failure most often covered by recovery procedures in the sampled ATC Centres are:
• radar failure (reported by 40.2 percent of controllers surveyed);
• failure of the communication function: radio telephony, ground to ground communication, and voice switching and communication system panel (reported by 43.3 percent of controllers surveyed); and
• flight data processing system failure (reported by 12.69 percent of controllers surveyed)4.
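The 93.1 percent figure can be checked against the recovery-procedure column of Table 6-3. The sketch below transcribes that column for all 58 ATC Centres. It counts the four ‘?’ entries (Shannon, Cork, Brussels, and Nairobi) as having procedures, in line with the follow-up analysis of questions 11 to 20 described above. The dictionary and function names are illustrative only.

```python
# Recovery-procedure column of Table 6-3 ('?' marks contradictory answers).
PROCEDURE_STATUS = {
    "Shannon": "?", "Dublin": "Yes", "Cork": "?", "Kemi": "No", "Belgrade": "Yes",
    "Zurich": "Yes", "Geneva": "Yes", "Bristol": "Yes", "Maastricht": "Yes",
    "Nieuw Milligen": "Yes", "Amsterdam": "Yes", "Karlsruhe": "Yes", "Langen": "Yes",
    "Frankfurt": "Yes", "Seville": "Yes", "Oslo": "Yes", "Kirkenes": "Yes",
    "Stavanger": "Yes", "Bodo": "Yes", "Rome": "Yes", "Bologna": "Yes", "Naples": "Yes",
    "Venice": "Yes", "Milan": "Yes", "Paris": "Yes", "Nice": "Yes", "Stockholm": "Yes",
    "Malmo": "Yes", "Gothenburg": "Yes", "Ljubljana": "Yes", "Brussels": "?",
    "Skopje": "Yes", "Split": "Yes", "Zagreb": "Yes", "Pula": "No", "Zadar": "No",
    "Chisinau": "Yes", "Reykjavik": "Yes", "Copenhagen": "Yes", "Lisbon": "Yes",
    "FAJS": "Yes", "Dar es Salaam": "Yes", "Mumbai": "Yes", "Kolkata": "Yes",
    "Singapore": "Yes", "Papeete": "Yes", "Melbourne": "Yes", "Vienna": "Yes",
    "Bucharest": "Yes", "Malta": "No", "Luqa airport": "Yes", "Macau": "Yes",
    "Nairobi": "?", "Wellington": "Yes", "Auckland": "Yes", "Christchurch": "Yes",
    "Hong Kong": "Yes", "Subang": "Yes",
}

# Centres whose '?' was resolved to 'Yes' by their answers to questions 11-20.
RESOLVED_TO_YES = {"Shannon", "Cork", "Brussels", "Nairobi"}


def procedure_coverage(status, resolved):
    """Return (centres with procedures, centres sampled, percentage to 1 d.p.)."""
    with_procedures = sum(
        1 for centre, s in status.items() if s == "Yes" or (s == "?" and centre in resolved)
    )
    total = len(status)
    return with_procedures, total, round(100 * with_procedures / total, 1)


print(procedure_coverage(PROCEDURE_STATUS, RESOLVED_TO_YES))  # (54, 58, 93.1)
```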
Of the controllers surveyed, 74 percent reported that these recovery procedures are kept up to date and reflect changes in hardware and software occurring in the ATC Centre. Similarly, 72 percent of controllers rated the available recovery procedures as comprehensive, while only 55 percent rated them as complete. The remaining 45 percent rated the available recovery procedures as incomplete, i.e. missing recovery steps necessary to re-establish a safe ATC service.

When asked which types of recovery procedures should be added, controllers mostly emphasised the need for:
- procedures for recovery from radar failure;
- procedures for recovery from communication systems failure;
- back-up systems; and
- procedures for handling outages at ATC Centre level.

Furthermore, 88 percent of controllers rated the available recovery procedures as clear and understandable, while 72 percent rated them as realistic and feasible to perform.
In addition, 69 percent of controllers surveyed reported that recovery procedure documentation is easily accessible, i.e. placed in close proximity to controller working positions.
4 The discussion presented in Chapter 5 showed that ICAO provides recovery procedures for
the communication and surveillance functionalities but not for the data processing functionality.
Finally, 77 percent of controllers reported that available recovery procedures are linked to, or harmonised with, other procedures specified within the Manual of Air Traffic Services (MATS). Examples include suite allocation of tasks (separation of responsibilities between executive and planner controller) and the duties of staff such as the approach controller, the ground controller, or the watch manager.
From the survey data and subsequent analyses, it can be concluded that the majority of sampled ATC Centres have some form of recovery procedure. The majority of controllers reported that these procedures are up to date, comprehensive, easily accessible, and compatible with other procedures, although 45 percent consider them incomplete. Moreover, controllers emphasise the need for procedures for radar and communication failures.
6.7.3.5.1 Other findings regarding the recovery procedures
In addition to the findings in the previous section, the narrative section of the questionnaire highlighted interesting safety-relevant issues regarding recovery procedures. These are individual comments rather than findings representative of the entire sample. The reported issues are categorised into three groups, namely equipment-specific, teamwork-specific, and generic recovery-related issues. These are discussed in the following paragraphs.
The equipment-related issues highlighted major problems with the flight data processing system that are not covered in the operational manuals. In addition, controllers reported a lack of back-up facilities. In one example, a particular ATC Centre experiencing a radio communication system failure had only ten emergency radio devices for an operations room with a 20-seat configuration.
On teamwork related issues, the controllers mostly reported inadequate familiarisation
with contingency procedures on the part of technical staff and controllers in
neighbouring sectors. In general, the controllers highlighted the important role of
teamwork and the need for an experienced planning controller in the event of
equipment failure. Another example drew attention to the unavailability of technical staff
during night shifts to immediately provide assistance in the case of equipment failure.
In short, controllers feel that teamwork is important in dealing with failures and that
Team Resource Management (TRM) training, aimed at enhancing teamwork efficiency,
should be mandatory for all ATC Centres.
Finally, many individual recovery-related issues, such as context, procedures, and
working practice, are also highlighted in the questionnaire’s narrative part. These are
as follows:
• Situation-specific problem solving plays a major role, as all equipment failures
occur within a specific context (e.g. bad weather, frequency jamming, high/low
traffic levels);
• Recovery procedures should follow an approach similar to that already available to
pilots. In other words, a comprehensive manual covering all possible failures and the
corresponding recovery steps is needed during controller training. For the
operational environment, an abbreviated version of this contingency manual should be
available at each controller working position (e.g. an aide-memoire in the form of a
checklist, see Appendix III); and
• Accurate and efficient strip marking is seen as the most reliable recovery tool in
the case of radar or flight data processing failure.
6.7.3.6 Status and quality of training for recovery (Q6)
A section of the questionnaire consisting of eight questions (questions 21 to 28)
was dedicated to the assessment of training in recovery from equipment failures within
each ATC Centre. The first question was designed to immediately filter out those
Centres without training schemes. In this case, the controller would skip the remainder
of this section and proceed with the final part of the questionnaire. Where a recovery
training scheme existed, the remaining seven questions were designed
to assess its quality by extracting information on the existence of recurrent training, its
frequency, content, and compatibility with other types of training. The final section of
the questionnaire provided the opportunity for controllers to comment on other issues
of relevance to training.
The analysis of the collected data first revealed inconsistencies in the responses to
questions on training (Table 6-3). The reason for this may be that some controllers
regarded their initial training, e.g. initial radar control training, as training for recovery.
Other controllers may have considered only separate training for emergency situations
and whether it involved some type of equipment failure.
Thirty ATC Centres (51.7 percent) have training for recovery from equipment failures, 18
ATC Centres (31 percent) do not, while data for 10 ATC Centres (17.2 percent) are
inconsistent (i.e. marked with ‘?’ in Table 6-2). In these cases, the controllers from the
same ATC Centre gave opposite responses to the questions on the existence of recovery
training. All these inconsistencies were investigated further using the subsequent
questions related to recovery training (i.e. questions 25 to 28). Although controllers
from these ATC Centres gave contradictory responses on the existence of recovery
training (i.e. question 21), their answers to the subsequent training-related questions did
not reveal any further information. Therefore, a conservative approach has been taken
and these 10 ATC Centres are considered not to have recovery training in place.
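The conservative, Centre-level classification described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the thesis: the helper names (classify_centre, summarise, percentages) and the "yes"/"no" encoding of question 21 are hypothetical, and any per-controller responses passed to them are illustrative, since individual answers are not reproduced here.

```python
from collections import Counter


def classify_centre(responses):
    """Classify one ATC Centre from its controllers' answers to question 21.

    Hypothetical encoding: each answer is the string "yes" or "no".
    Returns "yes" or "no" only when every respondent agrees; mixed (or
    empty) answers are returned as "inconsistent".
    """
    answers = set(responses)
    if answers == {"yes"}:
        return "yes"
    if answers == {"no"}:
        return "no"
    return "inconsistent"


def summarise(centres, conservative=True):
    """Count Centres per class; optionally treat "inconsistent" as "no".

    `centres` maps a Centre name to the list of its controllers' answers.
    """
    counts = Counter(classify_centre(r) for r in centres.values())
    if conservative:
        # Conservative rule: a Centre with contradictory answers is
        # assumed not to have recovery training in place.
        counts["no"] += counts.pop("inconsistent", 0)
    return counts


def percentages(counts, total):
    """Express each count as a percentage of `total`, to one decimal place."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}
```

Applied to the reported counts of 30, 18, and 10 out of 58 Centres, percentages() returns 51.7, 31.0, and 17.2 percent; after the conservative reclassification, the 28 Centres treated as having no recovery training correspond to 48.3 percent.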
In the case of recurrent training, the analysis shows that only 36.2 percent of the whole
sample of ATC Centres have recurrent training, 43 percent do not, while the rest of the
data are either inconsistent or missing. Recurrent training is provided once a year in 25
ATC Centres and twice a year in three ATC Centres (Oslo-Norway, Bucharest-Romania,
Auckland-New Zealand). In addition, the Geneva and Melbourne ATC Centres provide
recurrent training three times per year, while the Frankfurt ATC Centre provides recurrent
training 20 times per year; there, a contingency system is used every weekend to
train controllers.
Further analysis of the ATC Centres with a recurrent training frequency higher than once
a year shows that all have recovery procedures in place, while the majority (i.e. 64
percent) have an organised exchange of information on equipment failures. The
Auckland ATC Centre emphasised that recovery performance was difficult before the
introduction of clear and easy-to-follow procedures. Moreover, this ATC Centre
highlighted that operational events feed back into recovery training, as recent failure types are
included in the recurrent training. Although the Oslo ATC Centre has recovery
procedures, its controllers report the need for more comprehensive and easily available
procedures (e.g. checklist-type procedures on each console). These controllers
expressed a need to reduce their dependency on experience when
handling equipment failures.
Of the subset of controllers who have recurrent training once a year, 55 percent
believe that this is adequate, while the rest expressed the need for a higher frequency in
order to build competency in handling unexpected equipment failures. When asked if
the training covers all important equipment failures, the majority of controllers (i.e. 63
percent) answered negatively. The items most frequently suggested for addition to the
current training syllabus are:
• complete radar failure simulated in a comprehensive and realistic way;
• total power failure;
• facility evacuation;
• team resource management (TRM);
• different types of aircraft problems (e.g. communication failure, engine failure,
landing gear problem);
• hot standby procedures (system running in the background ready for immediate
use); and
• radar bypass (radar information is presented directly at the radar display without
having been processed, resulting in the presentation of uncorrelated tracks only).
Sixty-one percent of controllers believe that the training methods used in their ATC Centres
are suitable or, more precisely, realistic and varied. Furthermore, according to 63 percent
of the controllers surveyed, the recovery training is compatible with (i.e. linked to) other
training schemes. In general, it is essential to harmonise recovery
training within the overall training syllabus. One option is to include recovery training
within each training course, such as ab-initio training, conversion courses, continuity or
recurrent training, training for unusual situations, and TRM training. The other option is
to provide separate recovery training sessions on a regular basis. Regardless of the
approach, ATC management has to ensure an inclusive, regular, and consistent
approach to recovery training for its entire population of controllers.
From the survey data and subsequent analyses, it can be concluded that the majority
of the ATC Centres surveyed have some form of recovery training, although it is not
necessarily provided consistently throughout the Centre. The situation with recurrent
training is worse, as in the majority of cases this type of training is not provided
regularly. This results in extensive reliance on experience in dealing with equipment
failures, which may pose a significant safety threat in ATC Centres with a large
percentage of newly qualified and thus less experienced controllers. In general, the
controllers surveyed want to move away from over-reliance on experience and to be
trained as regularly as possible.
6.7.3.6.1 Other findings on training for recovery
In addition to the findings in the previous section, the questionnaire’s narrative section
highlighted interesting safety-relevant issues regarding recovery training. These are
individual comments rather than findings representative of the entire sample. The
reported issues focus on the quality and frequency of recovery training.
According to the controllers surveyed, the main problem is an overall lack of training
for supervisors, engineers, and controllers. The controllers believe that a couple of
hours of training per year is far too little practice, and some of them feel that recurrent
training is necessary at least twice a year. In the event of more critical equipment
failures (e.g. radar) with high traffic levels, there may be occasions when there is no time
to act upon the recovery procedures. On these occasions, training and teamwork
become much more important.
The controllers are aware that it is almost impossible to include everything that can go
wrong within the training syllabus, but emphasise that more training and guidance
should be given. They also highlight that training sessions in the simulated environment
should be as realistic as possible (e.g. higher traffic levels and regular use of the
radar fallback system). Currently, in some ATC Centres, the training focuses only
on outages (i.e. failure of the entire ATC system) and not on everyday failures.
One ATC Centre, where recurrent training takes place only on night shifts, illustrates
inconsistent provision of training throughout the Centre, as only those
controllers on a night shift receive recovery training.
6.7.3.7 Other findings on recovery performance
This section deals with additional findings extracted specifically from question 5. This
question gave controllers the opportunity to describe past experiences of equipment
failures that seriously affected their work. The findings extracted
from question 5 are presented in Appendix VI.
While section 6.7.3 has provided a high-level analysis and the results of the survey, the
following section carries out a more rigorous analysis of the data.
6.7.4 Interaction analyses
The data analyses started with the assessment of the sample characteristics and
proceeded with high-level summaries of controller responses. In this section, the
final set of data analyses investigates the relationships between the characteristics of
controllers (e.g. operational experience) and various recovery factors using appropriate
statistical tests. The section starts with a qualitative assessment of potential
interactions and the identification of those relevant to controller recovery. This is followed
by the presentation of appropriate statistical tests and their key findings.
Several reciprocal interactions amongst controller characteristics and recovery factors
(corresponding to the key questions defined in section 6.1) are chosen for further statistical
testing and marked with the symbol ‘√’ (Table 6-4). This choice is based on relationships
known from operational experience, which are then tested using rigorous statistical
assessment. The focus is placed on controller recovery and the factors that influence it,
which corresponds to a total of eight interactions.
Table 6-4 Interaction matrix
Columns (1) to (7) correspond to the numbered row variables.
Variable | (1) | (2) | (3) | (4) | (5) | (6) | (7)
(1) Operational experience (length of service) |   | √ | √ | √ |   |   |
(2) Rating |   |   | √ | √ |   |   |
(3) Experience with equipment failures (frequency per year) |   |   |   | √ |   |   |
(4) Factors that influence recovery performance |   |   |   | √ | √ |   |
(5) Formal (management supported) exchange of information |   |   |   |   |   |   |
(6) Existence of recovery procedures |   |   |   |   |   |   |
(7) Existence of recovery training |   |   |   |   |   |   |
Note: the √ in cell (4, 4) denotes the interactions among the individual factors themselves (written procedures, situation-specific problem solving, and other), as tested in Table 6-5.
The nature of the variables under consideration determined which statistical methods
could be used to analyse the data. As can be seen from their description in this
Chapter, three variables are categorical (rating, factors that influence recovery
performance, formal or management supported exchange of information on equipment
failures) whilst two are continuous, ratio-scale variables⁵ (operational experience,
measured as length of service, and experience with equipment failures, measured as frequency per year).
As the data differ significantly from the normal distribution, several non-parametric tests
at the 95 percent confidence level (i.e. a 0.05 significance level) have been used. As previously explained in Chapter
4 (section 4.4.1), chi-square tests are used to test the relationships between two
categorical variables. Furthermore, Cramér’s V is used to measure the
5 As mentioned in Chapter 4, variables can be either continuous or categorical. Continuous
variables are numeric values on an interval or ratio scale (e.g. age, income). Categorical variables can be either nominal or ordinal. Nominal variables differentiate between categories but do not assume any ranking between them (e.g. gender). On the other hand, ordinal variables differentiate between categories that can be rank-ordered (e.g. from lowest to highest).
strength of association for nominal data (i.e. the interactions of ‘factors that influence recovery
performance’ with ‘rating’ and with ‘existence of formal exchange of information on
equipment failures’), whilst Kendall’s tau is used for ordinal data (i.e. between the
individual ‘factors that influence recovery performance’). The relationship between two ratio variables is
also tested via non-parametric correlation, namely Kendall’s tau, which uses the ranks of the
data to calculate a correlation coefficient. The correlation coefficient ranges between -1 and
1; its sign indicates the direction of the relationship (positive or negative),
whilst its absolute value indicates the strength of the relationship.
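The two association measures named above can be illustrated with a short, dependency-free sketch. It assumes the tie-corrected tau-b variant of Kendall’s tau (the text does not state which variant was used) and computes Cramér’s V from the Pearson chi-square statistic of an r × c contingency table; p-values are not computed. The function names are hypothetical. In practice a statistics package would be used, for example scipy.stats.kendalltau and scipy.stats.chi2_contingency (the latter applies Yates’ continuity correction to 2 × 2 tables unless correction=False is passed).

```python
import math


def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows.

    Assumes at least two rows and two columns and no all-zero row or column.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))


def kendall_tau_b(x, y):
    """Kendall's tau-b (tie-corrected rank correlation), simple O(n^2) version."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
            if dx == 0:
                tied_x += 1
            if dy == 0:
                tied_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2  # total number of pairs
    denom = math.sqrt((n0 - tied_x) * (n0 - tied_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denom
```

Because questionnaire ratings contain many ties, the tie correction matters: for x = [1, 2, 2, 3] and y = [1, 2, 3, 3] there are four concordant pairs, no discordant pairs, and one tied pair in each variable, giving tau-b = 4 / 5 = 0.8.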
Finally, the relationship between a ratio variable and a categorical variable is tested using the
non-parametric Mann-Whitney test. The test is used to assess whether two samples of
observations come from the same distribution (Shier, 2004). The test involves the
calculation of a statistic, referred to as ‘U’ (see equation 6-1).
$$U = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1 \tag{6-1}$$
where n1 and n2 are the two sample sizes, and R1 is the sum of the ranks of all the
observations in sample 1. For samples larger than 20, the distribution of U is assumed to be
approximately normal, so the U statistic is converted to a z score using the formula in equation 6-2
(Shier, 2004):
$$Z = \frac{\text{largest } U \text{ value} - \frac{n_1 n_2}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} \tag{6-2}$$
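Equations 6-1 and 6-2 can be checked with the following sketch, which ranks the pooled observations (giving tied values their average rank), computes U for sample 1 as in equation 6-1, and converts the larger of the two U values to z as in equation 6-2. It is an illustration only: the function names are hypothetical, and the variance term has no tie correction (matching equation 6-2). Packages such as scipy.stats.mannwhitneyu may additionally apply tie and continuity corrections, so their p-values can differ slightly.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney(sample1, sample2):
    """Return (U1, U2, z) following equations 6-1 and 6-2 (no tie correction)."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])                       # rank sum of sample 1
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1      # equation 6-1
    u2 = n1 * n2 - u1                          # the two U values sum to n1 * n2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (max(u1, u2) - n1 * n2 / 2) / sd       # equation 6-2, largest U value
    return u1, u2, z
```

Since U1 + U2 = n1 n2, using the smaller U instead of the largest gives a z score of the same magnitude but negative sign, which is consistent with the negative z values reported in Table 6-5.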
The results of all tests are presented in Table 6-5.
Table 6-5 Statistical tests and results obtained
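Variable 1 | Variable 2 | Test | Statistical significance at 95 percent confidence level
Operational experience (length of service) | ACC rating | Mann-Whitney non-parametric test | p>0.05
Operational experience (length of service) | APP rating | Mann-Whitney non-parametric test | p<0.001 (U=1382.5, z=-3.56)
Operational experience (length of service) | TWR rating | Mann-Whitney non-parametric test | p=0.014 (U=3387.5, z=-2.46)
Operational experience (length of service) | Experience with equipment failures (frequency per year) | Non-parametric test (Kendall’s tau) | p>0.05
Operational experience (length of service) | Factors that influence recovery performance: written procedures | Mann-Whitney non-parametric test | p>0.05
Operational experience (length of service) | Factors that influence recovery performance: situation-specific problem solving | Mann-Whitney non-parametric test | p>0.05
Operational experience (length of service) | Factors that influence recovery performance: other | Mann-Whitney non-parametric test | p>0.05
ACC rating | Number of equipment failures experienced annually (Q4) | Mann-Whitney non-parametric test | p>0.05
APP rating | Number of equipment failures experienced annually (Q4) | Mann-Whitney non-parametric test | p>0.05
TWR rating | Number of equipment failures experienced annually (Q4) | Mann-Whitney non-parametric test | p>0.05
ACC rating | Factors that influence recovery performance | Non-parametric test (Cramér’s V) | p=0.008⁶
APP rating | Factors that influence recovery performance | Non-parametric test (Cramér’s V) | p>0.05
TWR rating | Factors that influence recovery performance | Non-parametric test (Cramér’s V) | p>0.05
Experience with equipment failures (frequency per year) | Factors that influence recovery performance: written procedures | Mann-Whitney non-parametric test | p>0.05
Experience with equipment failures (frequency per year) | Factors that influence recovery performance: situation-specific problem solving | Mann-Whitney non-parametric test | p>0.05
Experience with equipment failures (frequency per year) | Factors that influence recovery performance: other | Mann-Whitney non-parametric test | p>0.05
Factors that influence recovery performance: written procedures | Situation-specific problem solving | Non-parametric test (Kendall’s tau) | p>0.05
Factors that influence recovery performance: written procedures | Other | Non-parametric test (Kendall’s tau) | p>0.05
Factors that influence recovery performance: situation-specific problem solving | Other | Non-parametric test (Kendall’s tau) | p<0.001
Factors that influence recovery performance: written procedures | Formal exchange of information (Q7) | Non-parametric test (Cramér’s V) | p>0.05
Factors that influence recovery performance: situation-specific problem solving | Formal exchange of information (Q7) | Non-parametric test (Cramér’s V) | p>0.05
Factors that influence recovery performance: other | Formal exchange of information (Q7) | Non-parametric test (Cramér’s V) | p=0.029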
Statistical tests performed indicated five significant relationships (Table 6-5). Firstly,
significant relationships are found between holding an APP or a TWR rating and years
of operational experience (i.e. years in service). In the sample surveyed, controllers
with an APP rating have more operational experience than those without this
rating. Similarly, controllers with a TWR rating have more operational experience
than those without it. Secondly, a significant relationship is identified between
other factors that influence recovery performance and ACC rating. The data indicate that
controllers with an ACC rating tend to rely upon other factors (e.g. past experience) more
than those without an ACC rating. This is expected, as controllers with an ACC rating in the
available sample have more operational experience than those without an ACC rating.
Thirdly, a significant relationship is identified between controller reliance on situation-specific
problem solving and reliance on other factors (e.g. past experience) when recovering from
equipment failures. This is expected, as past experience represents one of the factors
that define the context surrounding an equipment failure. Finally, a
significant relationship is identified between controller reliance on other factors (e.g.
past experience) when recovering from equipment failures and management-supported
6 Relationship between ‘other’ factors that influence recovery performance and ACC rating.
exchange of information regarding equipment failures (Table 6-5). It may be the case
that controllers regard the exchange of information regarding equipment failures as a
type of past experience.
On the other hand, no relationship is identified between the factors that influence
recovery performance and operational experience (i.e. number of years active as a
controller). Although it was expected that less experienced controllers may rely more
on written procedures and that more experienced controllers may rely more on past
experience, statistical testing did not support these expectations. Years in service do
not differentiate between reliance upon written procedures, context, or other factors
(e.g. past experience). It may be the case that the overall safety culture built in the ATC
Centre determines what a controller uses as the main resource in recovering from
equipment failures. For example, if procedures are not available, controllers will rely more on
situation-specific problem solving, so this decision would be based on
organisational issues more than on their own experience.
6.8 Summary
This Chapter has discussed in detail the questionnaire survey that sampled 134
controllers in 58 ATC Centres from 34 countries. The survey was designed to achieve
four main objectives. Firstly, to build on the literature review to further investigate
equipment failures and factors that influence controller recovery by introducing
operational experience. Secondly, to support the information obtained from operational
failure reports (as presented in Chapter 4), which lacked input on controller
recovery. Thirdly, to assess the status and quality of recovery procedures and training
in the sampled set of ATC Centres. Finally, to contribute to the wider human reliability
research with a particular focus on controller recovery from equipment failures.
The analyses conducted on the data produced several interesting findings. These are
structured around the six key questions that this survey addresses.
• How often do controllers experience equipment failures (Q1)?
Almost 95 percent of controllers surveyed had experienced an ATC equipment failure during their
operational career. The investigation of the frequency of failures per year revealed that
major failures tend to occur only once a year or once in two years, while less severe
failures tend to occur with a relatively high frequency. These findings are in line with the
results obtained from operational failure reports and their categorisation based on
severity (presented in Chapter 4).
• What factors influence their recovery performance (Q2)?
Investigation of the factors that most influence controllers’ recovery performance
has revealed that factors other than written procedures and situation-specific problem
solving (e.g. past experience) have the greatest impact. However, the differences
between these ‘other’ factors and written procedures or situation-specific problem
solving are not large, i.e. the controllers rated the importance of all listed factors similarly.
• What is the most unreliable ATC equipment (Q3)?
Investigation of the most unreliable ATC equipment, based upon the experiences of the
controllers surveyed, has produced results that match those obtained from the analyses of
operational failure reports (as presented in Chapter 4). The most affected ATC
functionalities are communication, surveillance, and data processing. The most
unreliable ATC equipment comprises air-ground and ground-ground communication,
radar coverage, and the flight data processing system. These findings, together with
those from Chapter 4, led to the selection of the equipment failure to be simulated in
the experiment presented in Chapter 9 (i.e. the flight data processing system failure).
• Is there any organised exchange of information on equipment failures and/or other
types of unusual/emergency situations (Q4)?
The organised exchange of information on equipment failures represents an ‘indirect’
experience and a learning opportunity. Through presentations, seminars, and safety
bulletins, controllers can be made aware of failure types, the contextual conditions
surrounding each failure, and the difficulties experienced by their colleagues in
handling the situation. However, in the sample obtained, almost half of the controllers
did not have this kind of information exchange organised in their ATC Centres.
• Do recovery procedures exist (Q5)?
Assessment of the existence and quality of recovery procedures shows that the
majority of sampled ATC Centres have some type of recovery procedure in place,
mostly for radar failure, communication failure, and flight data processing system
failure. The analyses also show that most of these procedures are kept up-to-date but
are not always complete. Therefore, additional emphasis should be placed on the revision
of existing procedures to ensure that the recovery steps presented are complete and
follow a logical order. However, attention should be paid to the trade-off
between the thoroughness of the procedure and the limited time available to perform all
prescribed steps and thus to recover. An example of a concise checklist-type recovery
procedure developed in this thesis for a specific European ATC Centre is presented in
Appendix III. It is based on a format used previously by the German air traffic service
provider (DFS), which was accepted and published by EUROCONTROL (2003f).
• What do controllers feel about the quality of training currently available for recovery
from equipment failures (Q6)?
Assessment of the existence and quality of training for recovery shows that only half of
the ATC Centres surveyed have established training for recovery from equipment
failures. The situation with recurrent training is even worse, as only 36 percent of ATC
Centres surveyed organise regular recurrent training. In most cases, recurrent training
is provided only once a year, while in nine ATC Centres it is provided twice a year. On
the other hand, controllers support the idea of very frequent recurrent training. Almost
half of the respondents with annual recurrent training (i.e. 45 percent) feel that an annual
training session of a couple of hours is simply not enough to keep them proficient and
ready to deal with unexpected equipment failures.
The process of identifying factors that affect controller recovery started in the
previous Chapter with an overall assessment of past research relevant to controller
recovery. It has continued in this Chapter by expanding these findings with the
questionnaire survey results and the operational experience of controllers worldwide.
Based on these findings, the next Chapter finalises this rigorous process by identifying
factors that affect controller recovery, referred to as ‘Recovery Influencing Factors’
(RIFs).
Chapter 7 Methodology for a Selection of Relevant RIFs
178
7 Methodology for a Selection of Relevant Air Traffic Controller Recovery Influencing Factors
This Chapter builds on the findings from past research of relevance to controller
recovery (Chapter 5), further augmented by the operational experience extracted from
the questionnaire survey (Chapter 6), to develop a detailed understanding of the context
that surrounds a controller during the occurrence of an unexpected equipment failure.
The Chapter starts by illustrating the impact that contextual factors
have on controller recovery from equipment failures in Air Traffic Control (ATC). It
reviews both Air Traffic Management (ATM) and non-ATM related Human Reliability
Assessment (HRA) techniques to ensure a comprehensive investigation of contextual
factors relevant to controller recovery from equipment failures in ATC. This initial
selection is augmented by the findings from the equipment reliability literature,
operational failure reports, human reliability research, and interviews with ATM
specialists. The Chapter concludes by identifying a set of relevant contextual factors,
referred to as ‘Recovery Influencing Factors’ (RIFs), and their qualitative descriptors or
the levels of their influence on controller recovery performance.
7.1 Relevance of the recovery context
Analyses of accident investigations in various industries (e.g. aviation, nuclear and
chemical) have revealed that it is not possible to gain a full understanding of the
cause(s) of an accident from factual data alone. For example, the US National
Transportation Safety Board (NTSB) conducted dozens of detailed accident
investigations in which teams of experts assessed different contributory
factors and identified various issues with task design, procedures, cultural issues
(mostly relevant to language barriers within pilot-controller communication), personal
factors (e.g. a shift of attention in the 1972 L-1011 accident in the Everglades; NTSB, 1973),
and weather (e.g. the Pan Am Flight 759 accident, which was due to a thunderstorm and wind shear;
NTSB, 1983). Such factors can help explain why errors occur. Additionally, the
description of the context may also serve as a basis for defining ways of preventing or
reducing specific types of erroneous actions by means of technical recovery (i.e. built-
in defences) and human recovery.
It is also necessary to take into consideration contextual factors that traditionally may
not be recorded by investigating bodies, but which can have a significant impact on the
outcome of an accident. In support of this, Dekker et al. (2004) note that it is
“necessary to capture both a situation in which the action takes place and the action
itself”. Similar arguments were presented by researchers at the National Aeronautics
and Space Administration (NASA) Ames Research Centre, who pointed out that "we
must move beyond trying to pin the blame for accidents on a culprit but seek instead to
understand the systemic causes underlying the outcomes" (cited in Cox, 2005). The
research presented in this thesis expands the analysis of equipment-related incidents
to include the context in which controller recovery unfolds. Therefore, the objective of
this Chapter is to determine the relevant contextual factors that affect the process of
controller recovery from equipment failures in ATC.
In Air Traffic Management (ATM), the contextual factors relevant to controllers are
defined as “internal or external factors which influence the controller’s performance of
ATM tasks” (EUROCONTROL, 2002b). It is notable that this definition is generic and
thus does not give an indication as to when it is appropriate to stop looking further for
contextual factors. The so-called ‘stopping rule’ is taken to be directly linked to the
overall investigation process, where assessment of contextual factors represents only
one segment of that process. In other words, it is the role of the investigator to
determine the chain of events that constitute a safety-relevant occurrence. In this
respect, the analysis of contextual factors should cover the entire chain and assess the
relevant context for each link in the chain. The research presented in this thesis adapts
the EUROCONTROL definition of contextual factors. Hence, the contextual factors in
this research or ‘Recovery Influencing Factors’ (RIFs) are defined as internal or
external factors that influence the controller’s recovery from unexpected equipment
failures in ATC.
The factors extracted from the various techniques are known in the HRA literature as
noise), personal factors (e.g. alertness/fatigue), social and team factors (e.g.
handover/takeover), and organisational factors (e.g. conditions of work).
The main difference between TRACEr and HERA is that the former does not include
pilot actions and weather (see Appendix VII). Thus, no additional candidate factors
could be extracted from TRACEr.
7.2.1.3 Recovery from Automation Failure (RAFT) Tool
As previously discussed in Chapter 5, this tool has been developed as a part of the
“Solutions for the Human-Automation Partnerships in European ATM (SHAPE)” project,
managed by the Human Factors Division of EUROCONTROL. The SHAPE project
defines context as “any aspect of the operating environment that can influence a failure
or recovery process” (EUROCONTROL, 2004e). The project focused on the contextual
factors affecting recovery, which is in line with the objective of this thesis. The relevant
contextual factors or PSF categories recognised in RAFT are: task load and system
complexity, pilot-controller communication, procedures and documentation, training
and experience, human-machine interaction, personal factors, social and team factors,
logistical factors, and other organisational factors.
A review of the RAFT PSFs shows that ‘task load and system complexity’ represents the
workload facing the controller as a result of task performance and overall system
complexity. Therefore, this factor is a potential candidate for inclusion as a RIF. Compared to
HERA, RAFT disregards ‘pilot action’, ‘weather’, and ‘environment’ as relevant
contextual factors for human recovery from equipment failure in ATC. Whilst pilot
actions do not have much impact as explained in section 7.2.1.1, weather can bring
additional complexity to the occurrence of equipment failure. At the same time, RAFT
includes a ‘new’ category called ‘logistical factors’, which includes maintenance and
staffing issues.
Environmental issues (e.g. noise, temperature, and lighting) are excluded. The reason
for this is that controllers become accustomed to the ambient characteristics of the specific
ATC Centre in which they work. On the other hand, logistical factors will be assigned to the existing
organisational factors category. The reason for this lies in the fact that staffing and
maintenance issues should be anticipated and pre-planned at organisational or
managerial level (e.g. maintenance scheduling, availability, and assignment of
personnel, stock of equipment and spare parts, on-the-job training aids). The
management in any ATC Centre should anticipate as far as possible unscheduled
technical disturbances and provide necessary defences for their prevention.
The three techniques above (HERA, TRACEr, and the SHAPE/RAFT tool) were developed
specifically for the ATC/ATM environment. In general, they define context and
contextual factors in a similar way to this thesis. The assessment of
these three models identifies a total of nine candidate RIFs. These are: communication,
traffic and airspace, weather, procedures, training and experience, HMI, personal,
organisational factors, and task complexity.
Whilst the review of ATM-related HRA techniques yields many relevant contextual factors, it is worth examining relevant non-ATM HRA techniques to investigate whether other factors exist. The following sections summarise the relevant findings.
7.2.1.4 Recovery from failures: understanding the positive role of human operators during incidents
This research by Kanse and van der Schaaf (2000) sought to emphasise the positive role of human operators in overall system performance. It also proposed a preliminary failure compensation process model (or recovery model), derived initially for the chemical process industry, and recognised the importance of a taxonomy for describing the factors that influence recovery. Based on experience gained from field studies and the relevant literature, Kanse and van der Schaaf (2000) developed a list of RIFs. In their research, recovery factors were defined as factors that contribute to human recovery performance once an error or failure has occurred. This definition corresponds to the definition of RIFs adopted in this thesis. The categorisation of RIFs into six groups adopted by Kanse and van der Schaaf (2000) from the power plant industry is presented in Table 7-1.
Table 7-1 Factors influencing recovery from failures (from Kanse and van der Schaaf, 2000)
Categories of factors / Recovery Influencing Factors
Prioritisation of recovery-related tasks
Time available for recovery task, considering other tasks requiring attention; Urgency of recovery (amount of time until negative consequences arise); Importance of or need for recovery (seriousness of possible consequences if not recovered)
Occurrence-related
Type(s) of preceding failures; Performance phase in which the immediate result of the failure process is detected (during the planning phase/while carrying out the action/when the outcome of the action is observable); Available and applicable barriers/defences
Human (person) related
Overall work area knowledge; Work area and process related skills; General competency in job; Time elapsed since last (re)training in work area; Time since last (re)training with regard to specific problem occurrence; Suspicion/distrust/intuition; Personal attitude toward failure and failure compensation; System failure coping strategies; Self-efficacy (trust in own ability), self-esteem; Fatigue; Shift work coping ability; Feeling of personal responsibility for the failure or problem; Feeling of personal responsibility with regard to recovery; Pride regarding job well done; Previous experience with failures (any type); Previous experience with this failure (any type)
Social
Team attitude toward failures and failure compensation; Attitude toward teamwork; Team efficacy; Feeling of team responsibility for the failure or problem; Feeling of team responsibility with regard to recovery
Organisational
Availability of team members/colleagues; Organisation of work and responsibilities; Training plan; Competency assessment plan; Supervision; Personnel selection processes; Availability, quality and usability of procedures/instructions; Shift patterns and personnel planning; Organisational policy; Management attitudes towards failures & failure compensation
Technical/workplace/situational
Availability of equipment/materials needed; Operator-process interface properties
The majority of the identified factors are relevant to equipment failures in ATC and should be considered as potential RIFs. For example, 'available and applicable barriers/defences' are important for the detection, diagnosis, and correction of equipment failure. Time pressure is recognised under the 'prioritisation of recovery-related tasks'. Equipment failures in ATC are unexpected events that degrade the ATC service offered, yet controllers are still required to provide a service that ensures a safe flow of traffic. As a result, controller workload increases rapidly, potentially compromising controller performance. Therefore, time pressure should be analysed for potential inclusion among the RIFs. Occurrence-related factors are mostly specific to the power plant environment and as such cannot be applied directly to ATC. However, if adapted to the characteristics of the ATC environment, these factors may be relevant to equipment failure occurrence.
7.2.1.5 Computerised Operator Reliability and Error Database (CORE-DATA)
The CORE-DATA database was developed at the University of Birmingham to assist UK personnel involved in the assessment of hazardous systems such as nuclear, chemical, and offshore systems (Kirwan, Basra, and Taylor-Adam, 1997; EUROCONTROL, 2002b; EUROCONTROL, 2004d). It represents an attempt to develop a systematic approach to recording human errors. Several sources of data are used to populate the database, including real operating experience (incident and accident reports), simulation (both training and experimental simulators), experiments (from literature on performance), expert judgment (e.g. as used in risk assessments), and synthetic data (from human reliability quantification techniques). According to EUROCONTROL (2002b), CORE-DATA contains approximately four hundred data records describing particular errors that have occurred, together with their causes, error mechanisms, and probabilities of occurrence. PSFs are defined in CORE-DATA as underlying causes which influence human performance and indicate how the human error occurred. CORE-DATA's PSF taxonomy consists of alarms, communication, ergonomic design, ambiguous HMI, HMI feedback, labels, lack of supervision/checks,
There are a number of factors here of potential relevance to ATC and controller recovery. Firstly, alarms should be considered a particular type of technical built-in defence (discussed in Chapter 4) and are therefore important for the detection, diagnosis, and correction of equipment failure. This is also consistent with the work of Kanse and van der Schaaf (2000), explained in the previous section. Hence 'alarm' should be considered a potential RIF. Secondly, task novelty or task familiarity in the case of equipment failures in ATC should be considered under the training and experience RIF. Thirdly, time pressure has also been recognised by Kanse and van der Schaaf (2000) under the 'prioritisation of recovery-related tasks'. Therefore, this factor should be analysed for inclusion among the RIFs.
7.2.1.6 Technique for Human Error Rate Prediction (THERP)
The THERP technique was developed by Alan Swain at Sandia National Laboratories in the 1950s (Swain and Guttman, 1983; Straeter, 2000). THERP assumes that human information processing can be influenced by error conditions, known as Performance Shaping Factors (PSFs). THERP subdivides all PSFs into internal PSFs, external PSFs, and those that act as physiological and psychological stressors. However, the ways in which PSFs act on human performance are not explicitly specified. THERP further subdivides external PSFs into situational factors, task factors, and task instructions. Internal factors are defined as factors related to the organism (i.e. human factors). The PSFs recognised in THERP are presented in Table 7-2.
Table 7-2 Factors influencing human actions in THERP (cited in Straeter, 2000)
Category Factors influencing human actions
External Performance Shaping Factors
Situational factors
Design features; Quality of environment; Temperature, air humidity, air quality, radiation exposure, illumination, noise, vibration, cleanliness; Working hours; Breaks; Availability of special work resources; Job manning; Organisational structure (authority, responsibility, channels of communication); Actions by shift leader, worker, manager, supervisory authority; Remuneration structure (recognition, payment)
Factors in tasks and work resources
Requirements for perception; Requirements for motor system (speed, power expenditure, accuracy); Relationship between operators and display; Requirements for adaptation; Interpretation; Decision making; Complexity (information loading); Narrow nature of task; Short term and long term memory; Calculations; Feedback (knowledge regarding results of an action); Dynamic of gradual actions; Group structure and communications; Man-machine factors; Interface (design of work resources, test instruments, maintenance equipment, work aids, tools, accessories)
Work and task instructions
Required procedures (written, non-written); Written and verbal communication; Warnings and danger signs; Work-methods; Plant policy
Stressors
Psychological stressors
Suddenness of occurrence; Duration of stress; Task speed; Task load; High hazard risks; Threats (fear of failure, loss of job); Monotony, degrading or meaningless activities; Duration of uneventful periods of alertness; Work performance motive conflicts; Reinforcement of missing or negative sensory deprivation; Detractors (noise, blinding, motion, flickering, coloration); Inconsistent labelling
Physiological stressors
Duration of stress; Fatigue; Pain or discomfort; Hunger or thirst; Extreme temperatures; Radiation; Extreme gravitational forces; Extreme pressure conditions; Inadequate oxygen supply; Vibration; Restricted movements; Absence of physical exercise; Interruption of circadian rhythm
Internal Performance Shaping Factors
Factors relating to the organism (i.e. human factors)
Prior training, experience; State of momentary practice or abilities; Personality and intelligence variables; Motivation and attitudes; Emotional states; Stress (mental or physical); Knowledge about demanded performance prerequisites; Gender differences; Physical conditions; Attitudes deriving from family or groups; Group dynamic processes
A review of the THERP contextual factors reveals that most can be allocated to the RIFs identified by the first three ATM-related techniques. Several other factors, such as decision making and short-term and long-term memory (external PSFs), may be categorised as personal factors. These factors may become increasingly important with the planned modernisation of ATM (e.g. datalink, electronic strips, or a 'stripless' environment). Finally, the suddenness of occurrence factor identified in THERP cannot be categorised within the existing RIF groups. This factor is relevant to the occurrence of equipment failure in the ATC environment, as it greatly affects controller detection of the failure. Hence it should be treated as an additional potential RIF.
7.2.1.7 Human Error Assessment and Reduction Technique (HEART)
The HEART technique was developed by Jeremy Williams, a British ergonomist, in 1985. Reviews of this technique are available in EUROCONTROL (2004d) and Williams (1986). It is one of the most popular human error quantification techniques owing to its ease of implementation, and it is still used extensively in the nuclear, chemical, petrochemical, railway, and defence industries.
HEART was derived from a wide range of findings in the ergonomics literature. The technique defines a set of generic error probabilities for the tasks considered and identifies the Error Producing Conditions (EPCs) associated with them. EPCs include
Communication: Communication for recovery within team/ATC Centre
Traffic and airspace: Traffic complexity during the recovery process; Airspace characteristics during the recovery process
Weather: Weather conditions during the recovery process
Procedures: Existence of recovery procedure
Training and experience: Training for recovery from ATC equipment failures; Experience with equipment failures
HMI: Adequacy of HMI and operational support
Personal factors: Personal factors
Organisational factors: Adequacy of organisation
Task complexity: Conflicting issues in the situation (task complexity)
Time available & time pressure: Time necessary to recover
Available and applicable defences and barriers & alarms: Adequacy of alarms/alerts (as part of HMI)
Complexity of failure: Complexity of failure type
Suddenness of occurrence & Time course of failure development: Time course of failure development
Duration of failure type: Duration of failure
Impact on operational room (i.e. number of workstations/sectors affected): Number of workstations/sectors affected
Experience with system performance (reliance): Experience with system performance (reliance or trust in the system)
Ambiguity of information in the working environment: Ambiguity of information in the working environment
Adequacy of alarm/alert onset: Adequacy of alarm onset
7.3 Definition of qualitative descriptors
The final step involves defining the qualitative descriptors for each RIF. In this research, a qualitative descriptor defines the levels of impact that each RIF has on controller recovery performance. The simplest case would be a dichotomous descriptor distinguishing only two levels of impact for each recovery factor. However, this approach often loses valuable information and is not always suitable. Therefore, the qualitative descriptors have been constructed with three levels of impact: Level 1 refers to the most desirable level (in terms of ATC recovery), Level 2 to the tolerable or average level, and Level 3 to the least desirable level. For example, the RIF 'communication for recovery within team/ATC Centre' would have three qualitative descriptors, namely 'efficient communication', 'tolerable communication', and 'inefficient communication'. This approach is similar to that taken in the CREAM technique (Hollnagel, 1998; section 7.2.1.9).
On the other hand, the RIF 'Experience with the system performance (reliance or trust in the system)' would have two qualitative descriptors. The first would be an 'objective attitude toward the system'. The second would account for an inadequate attitude of the controller toward the ATC system and would include both 'positive experience with the system (overtrust) and negative experience with the system (undertrust)'. To present accurately the levels of impact that this RIF has on controller recovery performance, it was necessary to combine the cases of undertrust and overtrust in the ATC system. To all intents and purposes, both have a similar, undesirable effect on controller recovery performance. Undertrust in ATC systems leads to inefficient use of the available equipment, or to not using all of the available tools. Overtrust, on the other hand, leads to complete reliance on the information provided by the system, without consideration of the controller's own judgement or situational awareness of the position (lateral and longitudinal) and intent of the traffic within a dedicated airspace.
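As an illustration only, the descriptor scheme above could be encoded as follows. This is a minimal sketch assuming a simple Python representation; the names `COMMUNICATION_LEVELS`, `TrustAttitude`, and `reliance_level` are hypothetical and not part of the thesis. The communication RIF keeps its three levels, while overtrust and undertrust are collapsed into one undesirable level, using the level numbers listed for this RIF in Table 7-5 (2 for an objective attitude, 3 otherwise).

```python
from enum import Enum

# Hypothetical encoding of one three-level qualitative descriptor:
# Level 1 = most desirable, Level 2 = tolerable/average, Level 3 = least desirable.
COMMUNICATION_LEVELS = {
    1: "efficient communication",
    2: "tolerable communication",
    3: "inefficient communication",
}


class TrustAttitude(Enum):
    """Controller attitude toward the ATC system (reliance RIF)."""
    OBJECTIVE = "objective attitude toward the system"
    OVERTRUST = "positive experience with the system (overtrust)"
    UNDERTRUST = "negative experience with the system (undertrust)"


def reliance_level(attitude: TrustAttitude) -> int:
    """Return the reliance RIF level for a given attitude.

    Overtrust and undertrust have a similar, undesirable effect on recovery,
    so both map to the same level (3); an objective attitude maps to level 2,
    as listed for this RIF in Table 7-5.
    """
    if attitude is TrustAttitude.OBJECTIVE:
        return 2
    return 3
```

Collapsing the two extremes into one value is what gives the reliance RIF only two descriptors, even though the underlying attitude has three states.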
The above analyses led to a final set of 20 controller Recovery Influencing Factors (RIFs), divided into four main groups: internal factors (i.e. factors related to the controller), equipment failure related factors, external factors (i.e. factors related to working conditions), and airspace related factors. It should be noted that the definition of these 20 RIFs assumes that an equipment failure has occurred (i.e. the probability of equipment failure is 1). Otherwise, the 20 RIFs would have to be renamed and redefined to allow analysis of the context surrounding an event under investigation that is not an equipment failure. Table 7-5 presents the final set of factors relevant to recovery from equipment failures in ATC, together with their corresponding qualitative descriptors. These 20 RIFs represent high-level categories (e.g. personal factors), each consisting of several low-level factors (e.g. age, experience, stress, fatigue). Detailed definitions of the 20 RIFs are presented in Appendix VIII.
Table 7-5 Relevant recovery influencing factors and their corresponding qualitative descriptors
RIF group / RIF name: qualitative descriptor (Level)
Internal factors
Training for recovery from ATC equipment failure: Suitable to the situation in question (Level 1); Tolerable to the situation in question (Level 2); Counter productive to the situation in question (Level 3)
Experience with equipment failures: Experienced a particular type of failure or any other type of ATC equipment failure (Level 1); No experience with ATC equipment failures (Level 2)
Experience with the system performance (reliance): Objective attitude toward the system (Level 2); Positive experience with the system or negative experience with the system (Level 3)
Personal factors: Suitable for the recovery process (Level 1); Tolerable for the recovery process (Level 2); Counter productive for the recovery process (Level 3)
Communication for recovery within team/ATC Centre: Efficient (Level 1); Tolerable (Level 2); Inefficient (Level 3)
Equipment failure related factors
Complexity of failure type: Single system affected (Level 2); Multiple systems affected (Level 3)
Time course of failure development: Sudden failure (Level 1); Persistent or latent failure (Level 2); Gradual degradation of system (Level 3)
Number of workstations/sectors affected: One workstation/one sector or all workstations in one sector (Level 2); Several workstations/couple of sectors or all workstations/all sectors (Level 3)
Time necessary to recover: Adequate (Level 1); Inadequate (Level 3)
Existence of recovery procedure: Suitable to the situation in question (Level 1); Tolerable to the situation in question (Level 2); Inappropriate (Level 3)
Duration of failure: Short period of time (Level 2); Moderate or substantial period of time (Level 3)
External factors (factors related to working conditions)
Adequacy of HMI and operational support: Suitable to the situation in question (Level 1); Tolerable to the situation in question (Level 2); Counter productive to the situation in question (Level 3)
Ambiguity of information in the working environment: External working environment matches the controller's internal mental model (Level 1); External working environment mismatches the controller's internal mental model (Level 3)
Adequacy of alarms/alerts: Suitable to the situation in question (Level 1); Tolerable to the situation in question (Level 2); Counter productive to the situation in question (Level 3)
Adequacy of alarm/alert onset: Information from the external world enters the processing loop at the right time (Level 1); Information from the external world enters the processing loop at the wrong time (misleading sequence of alarms) (Level 3)
Adequacy of organisation: Efficient (Level 1); Tolerable (Level 2); Inefficient (Level 3)
Airspace related factors
Traffic complexity during the recovery process: Average traffic complexity (Level 2); High or low traffic complexity (Level 3)
Airspace characteristics during the recovery process: Adequate (Level 1); Tolerable (Level 2); Inappropriate (Level 3)
Weather conditions during the recovery process: Improved (Level 2); Deteriorated (Level 3)
Conflicting issues in the situation (task complexity): Average complexity of the situation (Level 2); Conflicting, multiple tasks or extremely low complexity of the situation (Level 3)
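Because every recovery context is a combination of one level per RIF, Table 7-5 also fixes the size of the context space. The short Python sketch below uses a hypothetical dictionary `RIF_LEVEL_COUNTS`, transcribed from the number of levels listed for each RIF in Table 7-5, and multiplies these counts together: nine RIFs have three levels and eleven have two, giving 3^9 × 2^11 = 40,310,784 distinct contexts.

```python
from math import prod

# Number of qualitative levels listed for each RIF in Table 7-5
# (hypothetical dictionary name; counts transcribed from the table).
RIF_LEVEL_COUNTS = {
    "Training for recovery from ATC equipment failure": 3,
    "Experience with equipment failures": 2,
    "Experience with the system performance (reliance)": 2,
    "Personal factors": 3,
    "Communication for recovery within team/ATC Centre": 3,
    "Complexity of failure type": 2,
    "Time course of failure development": 3,
    "Number of workstations/sectors affected": 2,
    "Time necessary to recover": 2,
    "Existence of recovery procedure": 3,
    "Duration of failure": 2,
    "Adequacy of HMI and operational support": 3,
    "Ambiguity of information in the working environment": 2,
    "Adequacy of alarms/alerts": 3,
    "Adequacy of alarm/alert onset": 2,
    "Adequacy of organisation": 3,
    "Traffic complexity during the recovery process": 2,
    "Airspace characteristics during the recovery process": 3,
    "Weather conditions during the recovery process": 2,
    "Conflicting issues in the situation (task complexity)": 2,
}

# Each recovery context takes exactly one level per RIF, so the number of
# distinct contexts is the product of the per-RIF level counts.
n_contexts = prod(RIF_LEVEL_COUNTS.values())
print(f"{len(RIF_LEVEL_COUNTS)} RIFs -> {n_contexts:,} possible contexts")
```

The count treats every combination of levels as a distinct context and says nothing about how likely each one is or how RIFs interact.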
To ensure a complete list of relevant contextual factors, a key step at this stage was the verification of the selected RIFs. An initial verification was provided by two ATM specialists from one European ATC Centre, both with extensive operational experience. They reviewed the candidate RIFs, their definitions, and the related qualitative descriptors (for evidence see Appendix II), and their feedback was valuable in confirming the selected RIFs. Further verification of the selected RIFs was conducted in the experiment (presented in Chapters 9 and 10). The probabilistic quantification of the 20 RIFs, their interactions, and their influence on controller recovery are discussed in more detail in the following Chapter.
7.4 Summary
The objective of this Chapter has been to define the recovery context via a set of contextual factors, known as 'Recovery Influencing Factors' or RIFs. The Chapter built on a review of existing HRA techniques and their corresponding contextual factors to identify which factors are relevant to recovery from equipment failure in ATC. This initial selection of relevant contextual factors was augmented with specific equipment failure related factors and dynamic situational factors. The methodology resulted in a set of 20 controller RIFs. The Chapter concluded with a definition of the qualitative descriptors for each RIF, i.e. the levels of impact that each RIF has on controller recovery performance. All results were initially verified by two ATM specialists, who reviewed the selected RIFs and their qualitative descriptors. The selected contextual factors (i.e. RIFs) and their qualitative descriptors are taken forward to the next Chapter to develop the methodology for the quantitative assessment of the recovery context.
Chapter 8 Quantitative Assessment of Recovery Context
206
8 Quantitative Assessment of Air Traffic Controller Recovery Context
The previous Chapter presented a selection of contextual factors relevant to recovery from equipment failures in Air Traffic Control (ATC), known as Recovery Influencing Factors (RIFs). This selection was based on a review of existing Human Reliability Assessment (HRA) techniques, augmented by specific equipment failure and dynamic situational factors. A set of 20 RIFs was identified and divided into four main groups: internal, equipment failure related, external, and airspace related factors. To facilitate a quantitative assessment of the recovery context, the selected RIFs were first assigned qualitative levels of impact, followed by their quantitative definition (i.e. the probability of each level occurring). This Chapter starts by reviewing relevant past research to formulate the methodology adopted in this thesis. The proposed methodology consists of six steps. The qualitative definition of the 20 RIFs from the previous Chapter (Step 1) is followed by the quantitative definition of each RIF (Step 2). This quantitative definition is based on various sources, such as past literature, operational failure reports, the expert input of eight ATM specialists, and the questionnaire survey. The Chapter continues with the incorporation of all existing interactions between relevant RIFs (Step 3). These are identified from operational experience and further validated by past research and expert input. Incorporating the interactions changes the RIF levels, which necessitates determining the cut-off point between any two consecutive levels (Step 4). Finally, the methodology defines the relationship between a particular RIF level and its effect on controller recovery performance (Step 5), and concludes with the definition of a numerical indicator for each recovery context (Step 6).
8.1 Lessons learnt from past research
The review of various HRA techniques (in Chapter 7) identified two issues relevant to this thesis. Firstly, it identified potential RIFs. Secondly, it revealed two HRA techniques that use contextual factors as the basis for quantitative human performance analysis: the Cognitive Reliability and Error Analysis Method, CREAM (Hollnagel, 1998), and the Connectionism Assessment of Human Reliability, CAHR (Straeter, 2000). A discussion of the CREAM technique and its relevance to this thesis is presented in sections 7.2.1.9 and 7.3 of Chapter 7 and is not repeated here. However, since the CREAM technique has been developed further in the work of Kim, Seong, and Hollnagel (2005) and Fujita and Hollnagel (2004), both of these developments have been assessed for their relevance to the research presented in this thesis.
8.1.1 Applications of the CREAM technique
The application of the CREAM technique by Kim, Seong, and Hollnagel (2005) attempted a probabilistic determination of contextual factors in order to determine the relevant control mode (tactical, opportunistic, scrambled, or strategic control, as defined in CREAM). In short, the authors proposed probability distributions for nine contextual factors, or Common Performance Conditions (CPCs), taking their dependencies into account. The advantage of their approach is that uncertainties can be incorporated straightforwardly. In other words, the approach is useful for contextual factors that are not clearly defined or understood. Because of this feature, the approach has been adopted in this thesis.
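A minimal sketch of this idea, assuming each RIF is stored as a mapping from level to probability. The helper names (`point_mass`, `uniform`, `is_valid`) and the weather probabilities are hypothetical and purely illustrative; they are not values from this thesis or from Kim, Seong, and Hollnagel (2005). A RIF whose level is known is a point mass on one level, a RIF about which nothing is known spreads its probability evenly, and a partly known RIF lies in between.

```python
# Hypothetical representation: a RIF is a dict mapping level -> probability.

def point_mass(level, levels=(1, 2, 3)):
    """RIF whose level is known exactly: all probability on one level."""
    return {lv: (1.0 if lv == level else 0.0) for lv in levels}


def uniform(levels=(1, 2, 3)):
    """RIF about which nothing is known: every level equally likely."""
    p = 1.0 / len(levels)
    return {lv: p for lv in levels}


def is_valid(dist, tol=1e-9):
    """Check that a RIF distribution is non-negative and sums to 1."""
    non_negative = all(p >= 0.0 for p in dist.values())
    return non_negative and abs(sum(dist.values()) - 1.0) < tol


# Partly known RIF with illustrative (invented) probabilities, e.g. weather,
# which has only Levels 2 and 3 in Table 7-5.
weather = {2: 0.7, 3: 0.3}
```

The same data structure covers all three cases, which is what makes uncertainty about a factor explicit rather than forcing a single level to be chosen.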
Furthermore, Kim, Seong, and Hollnagel (2005) link each level of a contextual factor to a specific type of control and assess all possible contexts using a Bayesian Belief Network (BBN). Littlewood, Strigini, Wright, and Courtois (1998) state that the use of BBNs allows safety experts to handle safety assessment better and potentially to make hidden safety arguments more visible, communicable, and auditable. In general, the BBN concept is probabilistic: it combines expert input and data, and is useful for building complex and uncertain applications. However, the approach of Kim et al. (2005), based on nine CPCs, proved too complex, and the authors subsequently simplified it by grouping the nine CPCs into groups of three, which were then assessed with the BBN. For this reason, this thesis uses a probabilistic approach implemented in C and based on the core methodology of Kim et al. (2005), to enable the incorporation of all 20 RIFs.
The application of the CREAM technique by Fujita and Hollnagel (2004) is designed as a practical application of CREAM for screening various scenarios and estimating the failure probability solely from the characteristics of the contextual conditions surrounding an occurrence (e.g. an accident). In this way, the method moves away from the notion of human error and focuses on context as the driving force of inadequate human performance, regardless of whether an individual or a team is involved. Although it demonstrates the usefulness of the CREAM methodology, this method is of limited relevance to this thesis.
8.1.2 Connectionism Assessment of Human Reliability (CAHR)
As previously discussed in section 7.2.1.12 of Chapter 7, CAHR is a data-driven HRA technique based on highly detailed databases of incident reports in the nuclear industry. The available incident reports made it possible to move away from an expert judgment based categorisation of PSFs towards a more analytical method. ATC, however, still lacks a high-level database that captures human performance in the event of an ATC related incident or accident. Therefore, an analysis of context as performed in CAHR is not yet achievable in the ATC industry. Initial attempts to establish a database capturing human performance data are planned by EUROCONTROL through the Human Error in ATM (HERA) project (EUROCONTROL, 2002d), but this database is currently incapable of supporting any meaningful statistical analysis.
Table 8-1 summarises the characteristics of CREAM, its two main applications, and CAHR. Section 8.2 builds on the relevant elements of the CREAM technique to define a framework for the quantitative assessment of recovery context.
Table 8-1 Overview of CREAM and CAHR differences
HRA technique: relevant area; number of contextual factors; interaction between contextual factors; output
CREAM by Hollnagel (1998): Theoretical approach toward human erroneous action; Nine; Included qualitatively; Quantitative probabilistic range
Improvement of CREAM by Fujita and Hollnagel (2004): Theoretical approach toward 'action' failure rate based on contextual factors; Ten; Included qualitatively (based on CREAM); Quantitative mean failure rate
Improvement of CREAM by Kim, Seong, and Hollnagel (2005): Theoretical approach toward human erroneous action; Nine; Included qualitatively (based on CREAM); Quantitative, probabilistic approach
CAHR by Straeter (2000): Data driven approach defined within nuclear industry; Thirty; Included quantitatively using the available data; Connectionism method facilitating qualitative and quantitative approach
8.2 Framework of the methodology for a quantitative assessment of recovery context
The proposed methodology is 'generic', as its aim is to present the framework for a 'generic' ATC Centre, as described in section 2.4 of Chapter 2. If used operationally, the methodology would have to be refined to reflect all the characteristics of the ATC Centre or event under investigation.
In general, the methodology consists of six steps (Figure 8-1). Firstly, the twenty RIFs identified in the previous Chapter must be reviewed for their relevance to the ATC Centre or event under investigation. In the 'generic' approach, all 20 factors are assessed and defined through their qualitative descriptors, i.e. their levels of impact on controller recovery performance (Step 1). Secondly, based on the available sources of information, each RIF is defined probabilistically (Step 2). As a result, the recovery context can be presented as a function of the identified RIFs and their corresponding levels. At this stage the interactions between RIFs are not considered, as the RIFs are assumed to be independent. To provide a more accurate approach, Step 3 takes into account all interactions between RIFs, which are assessed both qualitatively and quantitatively. This results in a distribution of RIF levels. Having a distribution of RIF levels, as opposed to discrete Levels 1, 2 and 3, necessitates identifying the cut-off point between any two consecutive levels (Step 4). Once these cut-off points are identified and the RIF levels redefined, the next step quantifies the relationship between a particular RIF level and its impact on controller recovery performance. This relationship is expressed via correlation coefficients (Step 5). At this stage, the previously determined probabilities of each RIF level (Step 2) are recalculated to account for RIF interactions. The result is an aggregated indicator of the recovery context, referred to as the recovery context indicator, Ic (Step 6).
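Step 4 can be pictured with a small sketch. Assuming that, once interactions are taken into account, a RIF level is expressed on a continuous scale running from 1 (most desirable) to 3 (least desirable), cut-off points map each value back onto a discrete level. The cut-off values 1.5 and 2.5 and the function name `discrete_level` are hypothetical placeholders; the methodology identifies its actual cut-off points in Step 4.

```python
from bisect import bisect_right

# Hypothetical cut-off points on a continuous level scale running from
# 1 (most desirable) to 3 (least desirable).
CUT_OFFS = (1.5, 2.5)


def discrete_level(value, cut_offs=CUT_OFFS):
    """Map a continuous RIF level onto discrete Levels 1, 2 or 3.

    bisect_right counts how many cut-offs are <= value, so a value lying
    exactly on a cut-off is assigned to the higher (less desirable) level.
    """
    return bisect_right(cut_offs, value) + 1
```

Whether a value exactly on a cut-off belongs to the upper or lower level is a design choice; using `bisect_left` instead would assign it to the lower level.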
Figure 8-1 below presents the six-step framework for the quantitative assessment of the recovery context. Since the previous Chapter identified and discussed all 20 RIFs and their levels of impact (qualitative descriptors), the following section discusses the next step, namely the probabilistic assessment of RIFs (Step 2). This is followed by the remaining steps of the proposed methodology (Figure 8-1).
Figure 8-1 Framework for the quantitative assessment of the recovery context
8.3 Probabilistic assessment of RIFs (Step 2)
Given that the aim of this Chapter is to present a reliable quantitative approach for analysing controller recovery performance, it is necessary to define probabilistically the levels of influence of each RIF on controller performance (referred to as qualitative descriptors). As previously discussed in section 7.3 of Chapter 7, the qualitative and quantitative definition of RIFs assumes that a failure has occurred (i.e. that the probability of failure is 1). In this way, every possible context can be defined as a combination of RIFs and their corresponding levels of influence, i.e. qualitative descriptors. This approach is important for the prospective analysis of controller performance as well as for retrospective event analysis. Even in retrospective analysis, specifying RIFs exactly is not straightforward owing to the lack of data and information about the context. When predicting future events or potentially hazardous contexts, specifying the RIFs accurately becomes much more difficult, and a level of uncertainty is inherent in the process.
The use of a probabilistic approach has several advantages. Firstly, if a certain RIF is
not clearly specified or known, it is possible to assume probabilities for each of its
levels based on operational data. In this way, any uncertainties identified for a certain
RIF can be considered more explicitly, as illustrated by Kim, Seong, and Hollnagel
(2005). Another advantage of this approach is that the probability distribution of the
context, and indirectly of controller performance, results from considering all possible
combinations of contextual factors or RIFs.
The definition of each RIF in terms of the probability of each of its levels is not
straightforward. However, it is necessary for any attempt to quantify the effectiveness
of controller recovery performance in a given context or environment. Major difficulties
are experienced in the quantification of internal RIFs (or factors related to the
controller), as it is hard to quantify any type of human performance. It is also difficult
to quantify some of the equipment failure related RIFs due to the lack of consistent data
collection in the available occurrence reporting schemes. In other words, some failure
characteristics, such as the number of workstations affected, are not consistently
reported. Finally, the majority of the external RIFs are highly ATC Centre-specific and,
as such, extremely hard to define in a generic form. Bearing this in mind, it is
understandable why the quantification of RIFs has been a challenge in the past.
For this reason, this Chapter captures the characteristics of a ‘generic’ ATC Centre as a
basis for any further fine-tuning of the proposed methodology and for its use as either a
retrospective or a prospective/predictive tool. Each ATC Centre has unique
characteristics that may be represented by different RIF probabilities. For example, the
‘number of workstations/sectors affected’ and ‘complexity of failure type’ depend on the
particular architecture of each ATC Centre, while ‘training for recovery’ and ‘adequacy
of organisation’ depend on its particular safety culture. In Chapter 10, the framework
developed in this Chapter is applied to a specific ATC Centre.
8.3.1 Sources of information
A total of four different sources of information have been consulted in order to
determine the necessary RIF probabilities: operational failure reports (presented in
Chapter 4), the responses from the questionnaire survey (presented in Chapter 6),
responses of ATM specialists, and past literature. Table 8-2 presents the number of RIFs
defined by each available source of information, while the following paragraphs explain
each source in detail. Two RIFs, however, are not informed by any of the available
sources (‘number of workstations/sectors affected’ and ‘adequacy of alarm/alert onset’).
In these cases, a conservative approach is taken and probabilities are assigned equally
across their levels. Details are presented in Appendix VIII. Furthermore, three RIFs are
informed by combined sources of information (last column in Table 8-2).
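The conservative rule for the two uninformed RIFs can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the helper names are hypothetical, and the level counts used in the example are assumptions (the actual levels and probabilities are given in Appendix VIII). The sketch assigns an equal share to each level and checks that a RIF's level probabilities form a valid distribution.

```python
from fractions import Fraction


def uniform_level_probabilities(n_levels):
    """Conservative rule: split the probability equally across a RIF's levels."""
    if n_levels not in (2, 3):
        raise ValueError("each RIF has either two or three levels")
    return [Fraction(1, n_levels)] * n_levels


def is_valid_distribution(probabilities):
    """The level probabilities of one RIF must be non-negative and sum to exactly 1."""
    return all(p >= 0 for p in probabilities) and sum(probabilities) == 1


# Hypothetical level counts, chosen only to illustrate the rule.
print(uniform_level_probabilities(3))  # three equal shares of 1/3
print(uniform_level_probabilities(2))  # two equal shares of 1/2
```

Exact fractions are used so that the sum-to-one check is not affected by floating-point rounding.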
Table 8-2 Distribution of probabilistic RIF ratings per source
Source of probabilistic assessment | Number of RIFs assessed directly (single source) | Number of RIFs assessed indirectly (combined sources)
4 Source: Bureau of Transport and Regional Economics (2006). Australian Government
Christchurch ACC/Oceanic Latest generation 5553
Tokyo ACC/Oceanic Older generation 2,2505
The responses from the ATM specialists surveyed are used to inform 12 RIFs. For three of
these RIFs, their responses have been used either to supplement the findings from past
research (for the ‘experience with the system performance’ RIF) or to validate findings
from the operational failure reports (for the ‘complexity of failure type’ and ‘duration of
failure’ RIFs).
For the majority of RIFs, the responses from the ATM specialists surveyed have been
consistent. However, for six RIFs some ATM specialists gave different answers. This was
the case with the following RIFs: ‘personal factors’, ‘communication for recovery within
team/ATC Centre’, ‘time course of failure development’, ‘adequacy of HMI and
operational support’, ‘airspace characteristics’, and ‘conflicting issues in the situation
(task complexity)’. For example, the majority of ATM specialists reported ‘personal
factors’ as ‘suitable for the recovery process’ in 70 to 90 percent of failure occurrences.
However, the Oslo and Tokyo ATM specialists reported personal factors as ‘suitable’ in
less than 15 percent of failure occurrences. These lower ratings of ‘personal factors’
indicate the ATM specialists’ perception of the readiness of air traffic controllers to face
unusual/emergency situations, such as equipment failure.
Similarly, potential gaps are identified at the Melbourne and Christchurch ATC Centres,
where the majority of failures appear to be latent (92 and 60 percent, respectively). This
is contrary to the answers provided by the other ATC Centres. Finally, potential gaps
regarding the ‘adequacy of airspace’ are identified by ATM specialists from the Auckland
and Tokyo ATC Centres. They rated airspace design and configuration as tolerable,
highlighting the potential for improving airspace characteristics to enhance controller
recovery performance.
It can be concluded that the ATM specialists from eight ATC Centres worldwide produced
similar ratings for the majority of RIFs. The identified inconsistencies reflect differences
between these ATC Centres in terms of ATC Centre culture (reflected in personal
factors), airspace design, and ATC Centre architecture. These differences are
reasonable, as they indicate the diversity that exists between ATC Centres within one
country as well as worldwide. As a result, the responses from the ATM specialists
surveyed have been taken to inform several RIFs. In future, a weighting scheme may be
used to account for the variability between ATC Centres (e.g. safety culture, ATC Centre
characteristics, ATM specialist experience).
5 Source: Air Traffic Activity at Area Control Centre (last available for 2003) from Ministry of
Land, Infrastructure, and Transport (2006)
8.3.1.4 Past literature
Finally, relevant data from past ATC research are used to inform the probabilities for the
RIF ‘experience with the system performance’. The probabilities are determined from the
findings of Hilburn and Flynn (2001) and EUROCONTROL (2000b), in which 18 percent of
controllers reported undertrust in technology. These findings are combined with the
responses from the ATM specialists surveyed on the percentage of controllers with
excessive trust in technology (i.e. overtrust). Both sources of information are therefore
used to establish the final probability rating for this RIF (presented in Appendix VIII).
8.3.1.5 Aggregation of data
The previous sections have described four different sources of information used to
determine RIF probabilities: operational failure reports, responses from a questionnaire
survey, responses from the ATM specialists surveyed, and past literature. Table 8-4
reviews these sources with respect to their level of confidence, which provides the
rationale behind the aggregation of data. Three data sources are rated with a high level
of confidence (questionnaire survey, responses from the ATM specialists surveyed, and
past literature). Only one source is rated with medium confidence: the operational failure
reports from the CAA databases, whose confidence level is not defined as ‘high’ due to
the lack of information on the reliability of the available reporting schemes. Reliability
issues in the reporting of safety occurrences are recognised by CAAs⁶. However, none of
the CAAs has a methodology in place to assess the reliability of its reporting scheme
and, therefore, the completeness of its occurrence database. The medium ranking for the
confidence level is thus an assumption informed by operational experience. As a result,
the data from this source are validated by the findings from another source (i.e. ATM
specialists’ input) to ensure reliable RIF ratings.
6 International workshop on the analysis of aviation incident/accident precursors. The workshop
was held on 25 and 26 May 2005 at Imperial College London.
Table 8-4 Overview of the sources of information used to determine RIF probabilities
Source | Level of confidence (subjective) | Comment
Operational failure reports from the CAAs | Medium | The confidence level is not defined as ‘high’ due to the lack of information on the reliability of available reporting schemes
Operational failure reports from the engineering unit of a particular ANSP | High | The confidence level is defined as ‘high’ because the engineering unit has to be aware of all equipment failures occurring in the ATC Centre, as it is directly responsible for their maintenance and repair
Questionnaire survey | High | Responses from 134 air traffic controllers from 58 ATC Centres in 34 countries worldwide
ATM specialists | High | Conducted with ATM specialists from eight ATC Centres worldwide
Past literature | High | Hilburn and Flynn (2001) and EUROCONTROL (2000b)
In summary, data from all four sources were used to define the probabilities for the 20
Recovery Influencing Factors (RIFs). These are presented in Appendix VIII.
8.3.2 Summary
The preceding paragraphs have taken the qualitative levels of impact of each RIF (i.e.
the qualitative descriptor) defined in Chapter 7 and defined each of them
probabilistically. An overview of all 20 RIFs, their corresponding levels, and designated
probabilities is provided in detail in Appendix VIII and in tabular form in Appendix X.
Having defined all 20 relevant recovery factors in the previous sections, it is possible to
define the recovery context. In general, the recovery context may be seen as a discrete
function, since every possible context is defined exactly by 20 elements and each RIF
has only two or three defined levels. In mathematical terms, the method can be
expressed as a function f of the 20 RIFs that defines the recovery context indicator (Ic),
as shown in equation 8-1:

$$
I_c = f(\mathrm{RIF}_1, \mathrm{RIF}_2, \ldots, \mathrm{RIF}_{20}) \qquad (8\text{-}1)
$$
The total number of possible recovery contexts is the number of combinations of the 20
RIFs, where nine of them have three levels of impact whilst eleven have only two. In
total, this approach generates 3^9 × 2^11 = 40,310,784 possible contexts, each having an
equal probability of occurrence of 1/40,310,784 ≈ 2.48E-08. In mathematical terms, this
is equivalent to enumerating all combinations (the Cartesian product) of the levels of the
20 RIFs. In addition, each recovery context has a specific value of the recovery context
indicator (Ic). The methodology to calculate this variable is presented in the remainder
of this Chapter.
Table 8-5 presents an example of a potential recovery context as a 20-digit array, where
each digit corresponds by its position to a particular RIF and by its value to the precise
impact of that RIF on controller performance. At this stage, all RIFs are considered
independently and their levels of influence on controller performance take integer
values, i.e. 1, 2, or 3.
Table 8-5 Example of a potential recovery context represented as a 20-digit array
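Because each context is a choice of one level per RIF, the 20-digit array can be treated as a mixed-radix number, which gives every context a unique integer index for enumeration. The sketch below assumes that each RIF's allowed levels are supplied explicitly, since some RIFs have only Levels 2 and 3 (footnote 10) or only Levels 1 and 2 (footnote 11); the function names and the three-RIF toy example are hypothetical, not the thesis code.

```python
from math import prod


def context_to_index(context, allowed_levels):
    """Encode a context (one level per RIF) as a single mixed-radix integer."""
    if len(context) != len(allowed_levels):
        raise ValueError("one level is needed for every RIF")
    index = 0
    for level, levels in zip(context, allowed_levels):
        # tuple.index raises ValueError if the level is not allowed for this RIF
        index = index * len(levels) + levels.index(level)
    return index


def index_to_context(index, allowed_levels):
    """Decode a mixed-radix integer back into one level per RIF."""
    if not 0 <= index < prod(len(levels) for levels in allowed_levels):
        raise ValueError("index out of range")
    context = []
    for levels in reversed(allowed_levels):
        index, position = divmod(index, len(levels))
        context.append(levels[position])
    return context[::-1]


# Hypothetical three-RIF toy example: a three-level RIF, a RIF with Levels 1-2,
# and a RIF with Levels 2-3.
TOY_LEVELS = [(1, 2, 3), (1, 2), (2, 3)]
print(context_to_index([3, 1, 3], TOY_LEVELS))  # 9
print(index_to_context(9, TOY_LEVELS))          # [3, 1, 3]
```

With the full list of 20 level sets, iterating the index from 0 to 40,310,783 would visit every recovery context exactly once.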
The way to approach this problem is, firstly, to determine all recovery contexts for which
RIF5 is at Level 1. In other words, it is necessary to determine the number of recovery
contexts for which the RIF5 level is less than or equal to the cut-off point between
Levels 1 and 2 (i.e. 1.7, Table 8-11). This is presented in equation 8-5 below:
$$
\mathrm{RIFX}_{j} =
\begin{cases}
\displaystyle\sum_{0 < j' \le C_{1,2}} \mathrm{RIFX}_{j'}, & j = 1\\[6pt]
\displaystyle\sum_{C_{1,2} < j' \le C_{2,3}} \mathrm{RIFX}_{j'}, & j = 2\\[6pt]
\displaystyle\sum_{C_{2,3} < j' \le 4.0} \mathrm{RIFX}_{j'}, & j = 3
\end{cases}
\qquad (8\text{-}5)
$$
where
X represents the different contextual factors, X = 1, 2, 3, …, 20;
j represents a level of RIFX and can take the values 1, 2, or 3;
j’ represents a level of RIFX after the incorporation of interactions, where 0.0 ≤ j’ ≤ 4.0; and
C_{j,j+1} represents the cut-off point between Levels j and j+1.
For example, for RIF5 (Table 8-11):

$$
\begin{array}{lll}
j = 1, & C_{1,2} = 1.7, & 0 < j' \le 1.7\\
j = 2, & C_{2,3} = 2.7, & 1.7 < j' \le 2.7\\
j = 3, & C_{3,4} = \text{N/A}, & 2.7 < j' < 4.0
\end{array}
$$
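Mapping an interaction-adjusted level j’ back to a discrete level amounts to locating j’ among the cut-off points. A minimal sketch, assuming intervals closed on the right as in equation 8-5 (a value equal to a cut-off belongs to the lower level); the `lowest_level` parameter for RIFs without Level 1 is an assumption of this sketch, and the function name is hypothetical.

```python
from bisect import bisect_left


def discrete_level(j_prime, cutoffs, lowest_level=1):
    """Map an interaction-adjusted level j' to a discrete RIF level.

    cutoffs are the ascending cut-off points C_{j,j+1}; a value equal to a
    cut-off belongs to the lower level, i.e. intervals are (C_{j-1,j}, C_{j,j+1}].
    lowest_level is 2 for RIFs that have no Level 1 (an assumption of this sketch).
    """
    if not 0.0 <= j_prime <= 4.0:
        raise ValueError("j' must lie between 0.0 and 4.0")
    # bisect_left counts the cut-offs strictly below j_prime
    return lowest_level + bisect_left(cutoffs, j_prime)


RIF5_CUTOFFS = [1.7, 2.7]  # C_{1,2} and C_{2,3} for RIF5 (Table 8-11)
print(discrete_level(0.89, RIF5_CUTOFFS))  # 1
print(discrete_level(2.7, RIF5_CUTOFFS))   # 2
print(discrete_level(3.2, RIF5_CUTOFFS))   # 3
```

`bisect_left` is used rather than `bisect_right` precisely so that a value lying on a cut-off is assigned to the lower level.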
Secondly, it is necessary to determine the subset of recovery contexts that correspond to
the newly determined level (i.e. 0.89). These are all recovery contexts with a RIF5 level
in the range (0.8, 0.9]. It should be noted that the level 0.89 is the value of the RIF5
level for one specific recovery context. Finally, the probability of the new level is
calculated as follows (equation 8-6):
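$$
p(\mathrm{RIFX}_{j'}) = p(\mathrm{RIFX}_{j}) \times \frac{f(\mathrm{RIFX}_{j'})}{f(\mathrm{RIFX}_{j})}
$$

$$
p(\mathrm{RIF5}_{0.89}) = p(\mathrm{RIF5}_{1}) \times \frac{f(\mathrm{RIF5}_{0.89})}{f(\mathrm{RIF5}_{1})} = 0.73 \times \frac{1{,}008{,}576}{13{,}476{,}924} = 0.055 \qquad (8\text{-}6)
$$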
where
X represents the different contextual factors, X = 1, 2, 3, …, 20;
j represents Levels 1, 2, or 3;
f represents the sum of all possible recovery contexts;
p(RIF5_j) represents the initial probability of occurrence of RIF5 at level j;
p(RIF5_j’) represents the probability of occurrence of RIF5 at its new level j’;
f(RIF5_j’) represents the sum of levels for 0.89 < j’ ≤ 0.90; and
f(RIF5_j) represents the sum of all levels that correspond to RIF5 Level 1 (i.e. 0.0 < j’ ≤ 1.7).
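Equation 8-6 rescales the initial Level 1 probability by the share of Level 1 contexts that fall into the new level. The sketch below reproduces the RIF5 example with the values from equation 8-6 (p = 0.73, f(RIF5_0.89) = 1,008,576 and f(RIF5_1) = 13,476,924); the function name and the input checks are illustrative assumptions rather than part of the thesis method.

```python
def recalculated_probability(p_initial, contexts_new_level, contexts_initial_level):
    """Equation 8-6: p(RIFX_j') = p(RIFX_j) * f(RIFX_j') / f(RIFX_j)."""
    if contexts_initial_level <= 0:
        raise ValueError("f(RIFX_j) must be positive")
    if not 0 <= contexts_new_level <= contexts_initial_level:
        raise ValueError("the new-level subset must lie within the Level j contexts")
    return p_initial * contexts_new_level / contexts_initial_level


# RIF5 figures from the worked example of equation 8-6
p_new = recalculated_probability(0.73, 1_008_576, 13_476_924)
print(round(p_new, 3))  # 0.055
```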
The new probability of occurrence (0.055) is low in magnitude, but it represents an
occurrence with a high probability of recovery. In other words, in this particular context,
RIF5 is enhanced by the influence of all the other RIFs that interact with it. The final
output of this methodology is the indicator of a specific recovery context (Ic), as
presented in equation 8-7. For example, if all 20 RIFs are at Level 1 with probability 1
and there are no interactions, the value of Ic equals 1. Similarly, if all 20 RIFs are at
Level 3 with probability 1 and there are no interactions, the value of Ic equals -1.
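$$
I_c = \frac{1}{N}\left[\sum_{\substack{X=1\\ \text{two-level RIFs}}}^{20}\;\sum_{j=1}^{2} p(\mathrm{RIFX}_{j'}) \times R_j \;+\; \sum_{\substack{X=1\\ \text{three-level RIFs}}}^{20}\;\sum_{j=1}^{3} p(\mathrm{RIFX}_{j'}) \times R_j\right] \qquad (8\text{-}7)
$$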
where
p(RIFX_j’) is the probability of RIFX at level j’, where X = 1, 2, 3, …, 20 and 0.0 ≤ j’ ≤ 4.0; the level j’ takes into account all interactions between RIFs;
R_j is the correlation coefficient between RIFX and controller recovery performance; depending upon level j’, it can take the values {-1, 0, +1};
N is the total number of recovery factors (i.e. 20); and
p(RIFX_j’) × R_j is the probability of the overall situation occurring in one ATC Centre, weighted by its impact: to capture the quantitative impact of each RIF on controller recovery performance, each probability is multiplied by the corresponding correlation coefficient.
All calculations relevant to the quantitative assessment of the recovery context in this
thesis were performed using the standard C programming language.
8.7.2 Distribution of the recovery context indicator
The recovery context indicator (Ic) is a numerical representation of the specific context
that surrounds controller recovery from an ATC equipment failure. Changes in the
factors that constitute the recovery context (i.e. the 20 RIFs), captured via changes in
their qualitative levels, interactions, and effect on controller performance, are reflected in
changes in the magnitude of Ic. In practical terms, a change in Ic indicates a context that
facilitates better or worse controller recovery.
After calculating all 40,310,784 possible contexts, the mean value of the recovery context
indicator (Ic) was found to be 0.027, with values ranging between -0.069 and 0.131. The
distribution of Ic is presented in Figure 8-8.
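Equation 8-7 can be sketched as a probability-weighted sum of correlation coefficients divided by N. This is a minimal illustration under two assumptions: N = 20, and R_j = +1, 0 and -1 for Levels 1, 2 and 3, which is the mapping consistent with Ic equalling 1 when all 20 RIFs are at Level 1 and -1 when all are at Level 3 (probability 1, no interactions). The input format (a list of level/probability pairs per RIF) and the function name are hypothetical.

```python
# Assumed mapping of discrete level to correlation coefficient R_j.
R = {1: 1.0, 2: 0.0, 3: -1.0}


def recovery_context_indicator(rifs, n_factors=20):
    """Equation 8-7: Ic = (1/N) * sum over RIFs and levels of p(RIFX_j') * R_j.

    rifs holds one list per RIF of (discrete_level, probability) pairs, where the
    probabilities are assumed to be already re-calculated for interactions (eq. 8-6).
    """
    if len(rifs) != n_factors:
        raise ValueError("expected one entry per RIF")
    total = sum(p * R[level] for levels in rifs for level, p in levels)
    return total / n_factors


all_level_1 = [[(1, 1.0)]] * 20
all_level_3 = [[(3, 1.0)]] * 20
print(recovery_context_indicator(all_level_1))  # 1.0
print(recovery_context_indicator(all_level_3))  # -1.0
```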
Figure 8-8 Distribution of the recovery context indicator (histogram; horizontal axis: recovery context indicator (Ic), approximately -0.07 to 0.13; vertical axis: frequency, 0 to 600,000)
This distribution is slightly positively skewed (right-skewed), since its tail in the positive
direction is longer than the other tail. This is also confirmed by the positive value of the
skewness statistic, indicating a concentration of values on the left side of the
distribution. The median, i.e. the value on the horizontal axis with exactly 50 percent of
the data on each side, is -0.023. This positive skew may result from the initial inputs into
the methodology for the quantitative (probabilistic) assessment of the recovery context
surrounding equipment failure in ATC. For example, the probability values for each RIF
and its corresponding levels show that 12 out of 20 RIFs have a higher probability of
enhancing recovery performance than of having no impact or a negative impact. In other
words, the probabilities of Level 1 for these 12 RIFs are higher than for the other level(s)
(i.e. Level 2 and Level 3; see Appendix X for details on RIF probabilities). Therefore, it
can be concluded that, for the ‘generic’ ATC Centre, the framework yields a recovery
context indicator close to 0.027. This indicates a large potential for improvement, i.e. for
shifting Ic values towards the positive side and thus enabling more favourable contextual
conditions.
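The direction of skew can be checked numerically. The thesis does not state which skewness statistic was used, so the sketch below assumes the common moment coefficient g1; with a long right tail, g1 is positive and the median typically lies below the mean, as with the median of -0.023 and mean of 0.027 reported here. The sample values are hypothetical, not thesis data.

```python
from statistics import mean, median


def moment_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Hypothetical right-skewed sample (not thesis data): one long tail to the right.
sample = [1, 2, 3, 10]
print(moment_skewness(sample) > 0)    # True: positive (right) skew
print(median(sample) < mean(sample))  # True: median below mean
```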
To fully understand the characteristics of Ic, the next step is to calculate its extreme
values, from the most negative to the most positive. In other words, it is necessary to
determine the ‘ideal’ recovery context, in which all RIFs are expressed via Level 1¹⁰, and
the ‘worst’ possible recovery context, in which all RIFs are expressed via Level 3¹¹. In
these cases, when there is no uncertainty about the probabilities of each RIF’s level, it is
possible to represent the most negative and the most positive recovery contexts.
Hence, the most negative value of Ic, calculated using equations 8-6 and 8-7, is -0.95.
This value represents the worst possible recovery context for controller recovery
performance in the ‘generic’ ATC Centre. Similarly, the most positive value of Ic
calculated using the same equations is 0.65. These two values are numerical
representations of two extreme recovery contexts, which are mutually exclusive.
Nevertheless, these extreme values are a good indicator of the scale of change that is
achievable within the ATC environment.
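The two extreme values follow directly from footnotes 10 and 11 when every RIF level is known with probability 1 and there are no interactions. The sketch assumes N = 20 and R_j = +1, 0, -1 for Levels 1, 2 and 3; under these assumptions the ideal context has 13 RIFs at Level 1 and 7 at Level 2, and the worst has 19 RIFs at Level 3 and RIF2 at Level 2, which reproduces 0.65 and -0.95. The function names are hypothetical.

```python
# Footnotes 10 and 11: RIFs lacking Level 1, and the RIF lacking Level 3.
NO_LEVEL_1 = {3, 6, 8, 11, 17, 19, 20}
NO_LEVEL_3 = {2}
R = {1: 1, 2: 0, 3: -1}  # assumed correlation coefficient per level


def extreme_context(most_positive):
    """Levels of RIF1..RIF20 in the 'ideal' (True) or 'worst' (False) context."""
    if most_positive:
        return [2 if x in NO_LEVEL_1 else 1 for x in range(1, 21)]
    return [2 if x in NO_LEVEL_3 else 3 for x in range(1, 21)]


def certain_context_ic(levels, n_factors=20):
    """Ic when every RIF level is known with probability 1 and there are no interactions."""
    return sum(R[level] for level in levels) / n_factors


print(certain_context_ic(extreme_context(True)))   # 0.65
print(certain_context_ic(extreme_context(False)))  # -0.95
```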
8.7.3 Sensitivity analysis
Because of the large number of recovery contexts (millions), it is reasonable to assume
normality in accordance with the central limit theorem (Berenson et al., 2006): when the
data set is large, the sampling distribution of the mean is approximately normal. Using
this assumption, it is possible to analyse the sensitivity of Ic to changes in any one
recovery influencing factor.
The first step is to determine an interval around the baseline (population) mean that
includes approximately 95 percent of the sample means, i.e. µ ± 2σ. According to the
statistics presented in Table 8-13, this range is 0.027 ± 0.058. The second step is to
implement a particular change and test whether the sampled recovery context indicator
comes from the same population. As an example, it is assumed that the ‘training for
recovery’ provided to air traffic controllers covers the equipment failure in question.
Since there is then no uncertainty, this RIF can be defined exactly via Level 1 with
probability p = 1. The sample statistics are presented in Table 8-13.
10 RIF3, RIF6, RIF8, RIF11, RIF17, RIF19, and RIF20 do not have the possibility of Level 1 and thus these will take the next most desirable level, being Level 2.
11 RIF2 does not have the possibility of Level 3 and thus it will take the next most undesirable level, being Level 2.
Table 8-13 Sensitivity analysis
Step change | Statistics (N, M, SD) | Baseline mean range
Baseline | N=40,310,784, M=0.027, SD=0.029 | (-0.031, 0.085)
Sample 1 (change of RIF1) | N=13,436,928, M=0.061, SD=0.035 |
Sample 2 (change of RIF1 and RIF2) | N=6,718,464, M=0.091, SD=0.023 |
With suitable training for the situation in question (e.g. a particular failure type), there is
no significant difference between the sample and baseline means (0.061 lies within the
baseline range), but the value of Ic visibly shifts towards a more positive value.
Therefore, a second sample was taken, assuming additionally that RIF2, ‘experience with
equipment failure’, matches precisely the equipment failure in question. In other words,
RIF2 can be defined exactly via Level 1 with probability p = 1. The result of this analysis
shows a significant change in the recovery context, since the obtained mean (0.091) falls
outside the 95 percent interval determined for the baseline (-0.031, 0.085). Therefore, the
enhanced recovery context (sample 2) comes from a population different from the
baseline recovery context. This finding indicates that the value of Ic is sensitive to
changes in individual RIFs.
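The µ ± 2σ check can be sketched with the figures reported in Table 8-13. This only illustrates the decision rule described above (a sample mean outside the baseline interval is treated as coming from a different population); it is not a formal hypothesis test, and the function names are hypothetical.

```python
def baseline_interval(mean, sd, k=2.0):
    """Interval mean +/- k*sd around the baseline mean (k = 2 for roughly 95 percent)."""
    return mean - k * sd, mean + k * sd


def differs_from_baseline(sample_mean, interval):
    """True if a sample mean falls outside the baseline interval."""
    low, high = interval
    return not low <= sample_mean <= high


# Baseline and sample means from Table 8-13
interval = baseline_interval(0.027, 0.029)
print(interval)  # approximately (-0.031, 0.085)
print(differs_from_baseline(0.061, interval))  # False: sample 1 (RIF1 changed)
print(differs_from_baseline(0.091, interval))  # True: sample 2 (RIF1 and RIF2 changed)
```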
8.7.4 Optimal solutions
The methodology for the quantitative assessment of the recovery context presented in
the previous sections allows the recovery context to be investigated for a particular ATC
Centre as well as for a particular equipment failure event. Furthermore, this approach
creates a basis for the quantitative assessment and choice of optimal solutions for
recovery enhancement. These solutions should be reviewed through the changes in
RIFs, their corresponding levels, and the resulting changes in the value of Ic.
Since not all RIFs can be enhanced, it is necessary to focus on those that can be
influenced. For instance, it is reasonable to assume that internal factors have significant
potential for change, either through enhanced training or through personal abilities on a
daily basis (e.g. fatigue, health, attitude, stress). A review of the other three RIF groups
(equipment related, external, and airspace related) reveals potential areas of change, as
well as factors that cannot be influenced at the level of a particular ATC Centre but
possibly can be at the regional level (e.g. traffic complexity can be influenced at the
regional ATM level through the central flow management unit).
The optimal change is defined as the best ratio between the benefit and the cost of the
proposed recommendations. Benefit is defined as a shift in the RIF levels toward the
more desirable Level 2 (average) or Level 1 (most favourable), and an overall shift in the
recovery context indicator (Ic) towards more positive values (e.g. the extreme positive
value). The cost should be defined through the inherent costs linked to the proposed
recommendation and, therefore, should include the actual rather than generic costs of
the proposed change within the specific ATC Centre. Thus the cost may include the
following:
• costs of technical changes, followed by any other operational costs (delay in the
use of a new system due to necessary maintenance, staff training);
• costs of designing a new procedure, followed by the cost of training the staff (i.e.
time and resources);
• cost of additional Team Resource Management (TRM) training;
• costs of creating a more adequate organisational environment, for example
improvements in terms of roles and responsibilities, the availability of team
members, the adequacy of supervision, the availability of additional support (e.g.
an assistant), the personnel selection process, shift patterns and personnel
planning, attitude to teamwork, safety culture, stress management programmes,
support for the organised exchange of past experience on non-nominal events,
and communication with management and technicians (e.g. briefings, exchange
of knowledge, bulletins, safety panels); and
• the costs of any potential changes in airspace design.
The methodology presented in this thesis can quantify the benefit of each proposed
solution. However, evaluating the related costs, as opposed to the benefit, is not so
straightforward and would necessitate input from ATC Centres. Therefore, another
approach may be used to ‘rate’ the benefit of implemented changes at the level of the
ATC Centre, namely the calculation of the ‘recovery context efficiency’. This variable
represents the ratio between the value of the current recovery context and the value of
the most positive recovery context feasible in a particular ATC Centre.
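Both benefit measures reduce to simple arithmetic on Ic. The sketch below is illustrative only: it uses the ‘generic’ ATC Centre figures from this Chapter (mean Ic of 0.027, most positive Ic of 0.65, and the sample 2 mean of 0.091) as stand-ins for a real centre’s values, and the function names are hypothetical. A negative current Ic yields a negative efficiency.

```python
def recovery_context_efficiency(ic_current, ic_most_positive):
    """Ratio between the current Ic and the most positive Ic feasible in the ATC Centre."""
    if ic_most_positive <= 0:
        raise ValueError("the most positive feasible Ic must be positive")
    return ic_current / ic_most_positive


def benefit(ic_before, ic_after):
    """Benefit of a change as the shift in Ic from the baseline to the new value."""
    return ic_after - ic_before


# Illustration only: generic-centre figures from this Chapter, not a real centre.
print(round(recovery_context_efficiency(0.027, 0.65), 3))  # 0.042
print(round(benefit(0.027, 0.091), 3))                     # 0.064
```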
8.8 Summary
This Chapter has presented a methodology for the quantitative assessment of recovery
context. It started by reviewing the past HRA research of relevance to the quantitative
analysis of contextual factors. This has resulted in the selection of the CREAM
technique and its application by Kim, Seong, and Hollnagel (2005) for further
development. Building on this, a novel methodology has been developed for the
research presented in this thesis. This method assesses controller recovery
performance based on 20 relevant contextual factors (RIFs) in several distinct steps.
Each RIF and its corresponding levels have been determined probabilistically using four
sources of information: operational failure reports, a questionnaire survey, input from
eight ATM specialists, and past ATM-related literature.
The methodology has further built on this and incorporated RIF interactions. This has
resulted in the change of the RIF levels and re-calculation of the corresponding
probabilities. The outcome of the entire methodology is the definition of the recovery
context indicator (Ic), as a numerical representation of a specific context surrounding
recovery from equipment failure in ATC. Ic is sensitive to the RIF changes and as such
may be used to investigate solutions to enhance the controller recovery. In other words,
the benefits of any safety-relevant changes in ATC Centres may be quantitatively
assessed in two separate ways. Firstly, the benefit can be assessed as a shift in the
distribution of the recovery context indicator from the baseline (pre-change) value to
the new value (as a result of implemented changes). Secondly, it is possible to
calculate the recovery context efficiency, i.e. the ratio between the current value of the
recovery context and its most positive value achievable within the particular ATC Centre.
Following this review of the methodology for the quantitative assessment of the
recovery context in a specific ATC environment, Chapter 9 describes an experimental
investigation designed to further verify the proposed methodology.
Chapter 9 Experimental Investigation
240
9 Experimental Investigation of the Air Traffic Controller Recovery Performance
Following the review of the methodology for the quantitative assessment of the recovery
context in the previous Chapter, this Chapter describes an experiment designed to
further validate the proposed methodology and to capture controller recovery
performance. The Chapter begins with a high-level design of the experimental process.
This is followed by the rationale for the experiment, defined through several objectives.
To achieve these objectives, the Chapter describes the overall design of the experiment
and the selection of potential equipment failures, initially tested in a pilot study. It
continues by providing the key requirements of the experiment relevant to this thesis,
the measured variables, and the experimental procedure.
Both the pilot and the main experiment were conducted in close collaboration with one
European Civil Aviation Authority (CAA)¹. This CAA provided all of the necessary
infrastructure and staff from two ATC Centres during the period of the experiment in
2005 and 2006. One ATC Centre was used for the pilot study, which tested the feasibility
of the experimental design and its overall methodology. The other ATC Centre was
used on three separate occasions to simulate a selected unexpected equipment failure
in order to capture data on the recovery performance of 30 licensed air traffic
controllers. The Chapter concludes with a discussion of the measured variables used to
capture the characteristics of controller recovery in ATC. The data collected are
subjected to rigorous analysis in Chapter 10.
1 This CAA performs the function of an Air Navigation Service Provider (ANSP), and the term CAA
will also be used to denote the ANSP in the remainder of this thesis.
9.1 High-level design of the experimental process
Figure 9-1 below shows the steps involved in organising and conducting this
experiment. The process starts with the rationale for an experiment designed to capture
controller recovery performance. It proceeds with the assessment of available
resources, focusing on two key requirements, namely access to an ATC simulator and
the participation of controllers. Once these requirements had been assured, the process
proceeded with the initial planning and design of the experiment (i.e. airspace and
traffic scenario, equipment failure type). Once this design had been tested in a pilot
study, the main experimental study was conducted. The collected data are pre-processed
and subjected to rigorous analysis to extract information on controller recovery in an
operational environment (presented in Chapter 10).
Figure 9-1 The flow diagram of the experimental process (steps shown: rationale for the experiment; assessment of the available resources; planning for the experiment; design of the experiment; selection of the equipment failure; pilot study; revision of the pilot study, in case of necessary changes; main experimental study; data processing and analysis)
9.2 Rationale for the experiment
The preceding Chapters presented a detailed overview of equipment failure
occurrences in the ATC environment from both technical and human perspectives. The
findings from past literature were augmented by operational failure reports (capturing
the technical aspect of equipment failures) and feedback from an international
questionnaire survey (capturing both technical and human aspects of equipment
failures). Furthermore, factors relevant to controller recovery were identified using both
theoretical and operational findings. These factors, referred to as Recovery Influencing
Factors (RIFs), created a basis for the quantitative assessment of the recovery context.
This Chapter builds on the preceding Chapters and generates ‘real’ operational data on
controller recovery. These data are further used in Chapter 10 to verify the quantitative
assessment of the recovery context developed in Chapter 8 and the relevance of RIFs
identified in Chapter 7.
9.3 Assessment of the available resources
An assessment of the requirements and necessary resources for the experiment
highlighted the need to perform it either at an ATC Centre or at an appropriately equipped
research institution. The critical requirements of the experimental design fall into two
categories: access to an ATC simulator and the availability of licensed controllers. Based
on these requirements, several potential locations were assessed:
• The Maastricht Upper Area Control Centre (MUAC) in the Netherlands. This is a
EUROCONTROL operational and simulation facility with the resources to provide
access to both simulators and controllers;
• The Human Factors Lab at the EUROCONTROL Experimental Centre (France),
providing access to simulators but not to controllers;
• The CEATS Research, Development and Simulation (CRDS) Centre in Budapest
(Hungary). This is a EUROCONTROL facility providing access to simulators but not
to controllers; and
• Various Civil Aviation Authorities (CAAs), air navigation service providers
(ANSPs) and their respective ATC Centres, providing access to both simulation
facilities and controllers.
Although the requirements for an experimental plan were ready at the initial stage of
the research, it took two years to gain access to the required facilities. After
considerable negotiations with all potential locations, only one CAA responded
positively and agreed to provide both simulation facilities and staff for this experiment.
Both the pilot and the main study were conducted using their facilities, assistance, and
manpower.
9.4 Planning for the experiment
The review of the relevant literature, presented in Chapter 5, revealed that there is a
lack of detailed knowledge of how controllers perform during unexpected or unusual
situations (including equipment failures). This is partly because no relevant data are
available in the public domain². It was therefore necessary to design an experiment in
this thesis to capture and exploit the relevant data.
As a result of close academic cooperation, one European CAA gave Imperial College
London the opportunity to plan, prepare, and run an experiment designed to study the
factors that drive the process that controllers follow to recover from ATC equipment
failures. This experiment was conducted in two phases (see Table 9-1). The first phase
involved a pilot study designed to test the feasibility of the experimental plan, including
the appropriateness of the recovery methodology, the serviceability of the equipment, and
the clarity of the instructions given to the participating controllers working in the ATC
Centre. The results of the pilot study were used to enhance the plan for the main
experiment. The second phase involved the execution of the main experiment, in which
data were collected for further analysis. A secondary objective was to assess and
augment the existing emergency training procedures as defined by this particular CAA in
its Manual of Air Traffic Services (MATS).
The planned experiments required the researcher to have the level of knowledge
necessary to fully comprehend the recovery process, in terms of the reactions and
actions of the controller in dealing with an unexpected equipment failure. For this
reason, it was essential to acquire certain skills before running the actual experiments.
To achieve this objective, the researcher completed practical simulator training prior to
the execution of the main experiments (Table 9-1). The scheduled training was preceded
by a review of relevant ATC topics in order to prepare efficiently for practical work on
the simulator. The relevant areas covered were ATC phraseology, operational procedures,
equipment, radar vectoring, speed control, level busts, and aircraft performance.
² Some research has been carried out within UK National Air Traffic Services (NATS),
but it has not been released for public use.
Table 9-1 Training, pilot study, and experiment sessions
Date | Phase | Objective | Comment
19–20 Feb 2005 | Planning for the experiment | Basic training for the ab initio student, APP training | Total of 10 h training on simulator
26–27 Feb 2005 | Planning for the experiment | APP training (arrivals and departures sequencing, radar vectoring) | Total of 10 h training on simulator
02 Nov 2005 | Phase I | Pilot study | Total of three controllers participated
29 Nov – 01 Dec 2005 | Phase II | Main study I | Total of eleven controllers participated
27 Feb – 02 Mar 2006 | Phase II | Main study II | Total of ten controllers participated
06 Jun – 09 Jun 2006 | Phase II | Main study III | Total of ten controllers participated
9.5 Design of the experiment
Since equipment failures are rare events³, the experiment aimed to represent failure in
the most realistic form, i.e. as an unexpected event. To ensure that the failure occurred
as an unexpected event, each controller participated in the experiment only once. The
experiment also assumed a single-controller ACC sector (as opposed to a team of
controllers) to make the best use of the available ATC staff and to reduce logistical
difficulties. Before the experiment, controllers were to be informed of the objectives of
the study only in highly generic terms. They were to be given the opportunity to ask
specific questions in the post-experiment debriefing session. Additionally, to ensure the
discretion and confidentiality of this study, each participant was to be required to sign
a consent form which included an agreement not to disclose any information regarding
this experiment. In this way, the true objective of the experiment, i.e. the injection of
an unexpected and unforeseen equipment failure, was preserved.
³ Most of the failures in the ATC environment are prevented or handled at the
technical/engineering level. Only a few failures manage to penetrate multiple
redundancies and fail-safe system design and affect controller performance.
The experiments were to be conducted during morning and afternoon sessions, ensuring
that participants were tested in equal proportion in the two sessions. The simulation
room conditions (lighting, temperature, noise) were to be consistent for all runs.
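The requirement that participants be tested in equal proportion in the morning and afternoon sessions can be illustrated with a simple balanced allocation. The following is a minimal sketch under stated assumptions: the participant identifiers, the fixed seed, and the shuffle-then-alternate scheme are hypothetical and are not the allocation procedure recorded in this study.

```python
import random


def allocate_sessions(participants, seed=0):
    """Split participants between morning and afternoon sessions.

    Hypothetical scheme: shuffle with a fixed seed, then alternate, so the
    two sessions differ in size by at most one participant.
    """
    rng = random.Random(seed)
    order = list(participants)
    rng.shuffle(order)
    allocation = {"morning": [], "afternoon": []}
    for index, participant in enumerate(order):
        session = "morning" if index % 2 == 0 else "afternoon"
        allocation[session].append(participant)
    return allocation


# Hypothetical identifiers for ten controllers in one session block.
groups = allocate_sessions([f"C{n:02d}" for n in range(1, 11)])
print(len(groups["morning"]), len(groups["afternoon"]))  # 5 5
```

Shuffling before alternating avoids any systematic link between the order in which controllers became available and the session they were assigned to; with an odd number of participants, one session necessarily receives one more controller.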
Each simulation run was planned to last approximately 30 minutes, followed by a
debriefing session of similar duration. The exact moment of failure injection was to be
determined during the pilot study, within the window between the 5th and 15th minute of
each run. The equipment failure would last 15 minutes. This duration was chosen for two
reasons. Firstly, operational data show that the majority of failures last up to 15
minutes (Chapter 4, section 4.4.6), which has been confirmed by the questionnaire survey
results (presented in Appendix VI). Secondly, a 15-minute failure gives enough time to
observe, capture, and assess the controller reactions, performance, and overall recovery
strategy.
The selection of the equipment failure to be simulated in the pilot study was based on
the results of the analysis of operational failure reports, the qualitative equipment
failure impact assessment tool, and the results of the questionnaire survey. However,
this selection was constrained by the technical capabilities of the available simulation
platform. In other words, it was important to simulate both the failure and the
restoration of the relevant equipment, so the simulator platform had to provide this
capability for the selected failure type. The final decision on the equipment failure to
be simulated would be made after testing candidate failure types during the pilot study.
The detailed rationale behind the selection of potential equipment failures for the pilot
and main experiment is given in the following section.
Another important element of the experiment was the involvement of a Subject Matter
Expert (SME). The role of the SME would be to act as an observer and as the coordinator
of the operations room. Upon request from a controller, the SME would be responsible
for issuing any relevant information about the failure and its effect on the ATC Centre
(as would happen in the operational environment upon receiving an update from the
system control and monitoring unit). Upon restoration of the equipment, there are
several steps that controllers must perform to assure equipment reliability and hence its
readiness for the restoration of normal service (i.e. post-restoration steps). Therefore,
additional time would be given to controllers in the post-restoration part of the
simulation run, from the 25th to the 30th minute of each run, so that they could
re-establish a normal working strategy after the effects of an unexpected equipment
failure.
Each simulation run would be observed by the researcher and the SME, and recorded
for the purpose of further data analysis. During each simulation run, notes would be
taken on each controller’s recovery performance and changes in attitude/behaviour
prior to and after the injection of a failure. This would enable both qualitative and
quantitative data to be captured.
The observation team would be positioned as unobtrusively as possible while still having
a clear view of the radar screen. The simulation runs would be followed by an immediate
debriefing session guided by the questionnaire and other material designed specifically
for this session. The controllers would assess all the factors that potentially influenced
their recovery performance, guided by the RIFs identified in Chapter 7. In addition, they
would be given an opportunity to judge their own performance and the credibility of the
simulated failure.
9.6 Selection of the equipment failure to be simulated
The classification of ATC system functionalities, presented in Chapter 2, identified nine
main categories. The critical subsystems, equipment, and tools were identified in each
category. This categorisation identified the large number of components that could fail
within the ATC system architecture. To further assess the characteristics of equipment failure
occurrence, Chapter 4 reviewed some of the main characteristics of failures in terms of
complexity, time course of failure development, overall exposure, and impact on ATC
and ATM operations.
Further assessment of equipment failure types is presented in Chapter 4 and is based
on the detailed analysis of operational failure reports from four different countries. This
analysis shows that equipment failures dominate within the communication, navigation,
surveillance, and data processing functionalities. A subsequent analysis of the level of
severity showed that most failures that have a major impact on ATC operations occur
within the communication, surveillance, and data processing functionalities.
Furthermore, the availability of the ‘duration’ variable in one of the datasets (Country
D) enabled the identification of equipment failures lasting up to 15 minutes, which is
the failure duration feasible within this experimental set up. Failures with a major
impact on ATC operations lasting up to 15 minutes include failures of the data exchange
network, other surveillance systems (predominantly the radar link), the flight data
processing system, and the air situational display (see Table 9-2).
Table 9-2 Overview of the potential equipment failures to be simulated and their inclusion in the pilot study
Potential equipment failure to simulate | Qualitative equipment failure impact assessment tool rating | Adequacy for the pilot study | Comment | Testing in the pilot study
Source: operational failure reports (selection focused on major failures of short duration)
Data exchange network | Secondary functionality | No | Impact can range from moderate to minor, whereas the selection focuses on major failures | -
Other surveillance systems (e.g. radar link) | Secondary functionality | No | - | -
Flight data processing system | Primary functionality | Yes | - | Reduced flight plan mode
Air situational display | Primary functionality | Yes | Not interesting enough from the controller recovery perspective | -
Source: questionnaire survey
Air-ground communication | Primary functionality | Yes | - | Aircraft radio communication failure
Primary surveillance radar | Primary functionality | Yes | Not possible to simulate failure of a single radar, only the complete loss of radar coverage | -
Flight data processing system | Primary functionality | Yes | - | Reduced flight plan mode
Communication panel | Primary functionality | No | Not interesting enough from the controller recovery perspective, as the controller would simply change position | -
Ground-ground communication | Primary functionality | No | Not interesting enough from the controller recovery perspective, as the controller would try to establish communication via other means | -
Furthermore, the analysis of the questionnaire survey responses in Chapter 6 (see also
Table 9-2) identified the five least reliable elements of ATC equipment: air-ground
communication, primary surveillance radar, the flight data processing system, the
communication panel, and ground-ground communication.
With these nine possible failure types identified, it was necessary to select candidate
failure types for a final assessment in the pilot study in order to determine the failure
to be simulated in the main experiment. This selection was based on the severity of the
failures as determined using the qualitative equipment failure impact assessment tool
(Chapter 4, section 4.5). This tool was developed on the premise that not all equipment
failures have the same severity of impact on ATC operations, and it identified the
failures with the largest impact. These are failures of the primary ATC functionality
that affect multiple systems/tools/equipment, either suddenly or gradually, and last up
to one hour (see Figure 4-9 and Table 9-2).
The process above, based on operational failure reports, the questionnaire survey, and
the qualitative equipment failure impact assessment tool, identified four potential failure
types. These are the failure of the flight data processing system, air situational display,
air-ground communication, and primary surveillance radar. These four candidate failure
types were further narrowed down by assessing both their significance from the controller
recovery perspective and their technical feasibility. In other words, the focus was on
failures that require controllers to recover using only the systems available at their
positions. As a result, the pilot study simulated two different equipment failures: a
reduced flight plan mode, as part of the flight data processing system, and an
air-ground radio communication failure.
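The screening described above can be sketched by encoding Table 9-2 as data and applying the two filtering steps in turn. This is an illustrative simplification under stated assumptions: in the study the screening was a judgement-based process, and the field names, the boolean "adequate" flag, and the exclusion dictionary are hypothetical devices that merely paraphrase the table's ratings and comments.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    source: str      # "reports" (operational failure reports) or "survey"
    failure: str
    rating: str      # impact assessment tool rating: "primary" or "secondary"
    adequate: bool   # "Adequacy for the pilot study" column of Table 9-2


TABLE_9_2 = [
    Candidate("reports", "Data exchange network", "secondary", False),
    Candidate("reports", "Other surveillance systems", "secondary", False),
    Candidate("reports", "Flight data processing system", "primary", True),
    Candidate("reports", "Air situational display", "primary", True),
    Candidate("survey", "Air-ground communication", "primary", True),
    Candidate("survey", "Primary surveillance radar", "primary", True),
    Candidate("survey", "Flight data processing system", "primary", True),
    Candidate("survey", "Communication panel", "primary", False),
    Candidate("survey", "Ground-ground communication", "primary", False),
]

# Hypothetical encoding of the scoping step, paraphrasing Table 9-2 comments.
SCOPING_EXCLUSIONS = {
    "Air situational display": "not interesting enough for controller recovery",
    "Primary surveillance radar": "only complete loss of coverage can be simulated",
}


def shortlist(rows):
    """Primary-functionality failures rated adequate, de-duplicated in table order."""
    selected = []
    for row in rows:
        if row.rating == "primary" and row.adequate and row.failure not in selected:
            selected.append(row.failure)
    return selected


def pilot_failures(rows):
    """Candidates surviving the recovery-significance and feasibility scoping."""
    return [f for f in shortlist(rows) if f not in SCOPING_EXCLUSIONS]


print(shortlist(TABLE_9_2))       # the four candidate failure types
print(pilot_failures(TABLE_9_2))  # ['Flight data processing system', 'Air-ground communication']
```

Keeping the flight data processing system twice in the table (once per source) and de-duplicating afterwards mirrors the text, where nine entries reduce to four distinct candidate failure types.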
Both failure types also conform to the requirements described in Chapter 5 (section
5.7.3) that the simulated equipment failure should allow one part of the diagnosis
phase of controller recovery to be performed overtly and thus be captured via
observations. For example, a flight data processing system failure may initially be
mistaken for an aircraft transponder or secondary surveillance radar failure.
Similarly, air-ground communication failure manifests itself in the same manner
regardless of its cause (i.e. ground- vs. airborne-based failure). In both cases, it is up to
the controller to identify the true failure by ruling out alternatives (e.g. communication
with pilot or adjacent ATC Centre) and this diagnostic process can be captured via
observations.
9.7 Pilot study: lessons learnt
Before conducting the main experiment, a pilot study was performed in order to
determine the feasibility of the experimental plan particularly with respect to the
serviceability of the equipment, ease of understanding of instructions, and logistical
issues. The study was designed to match the main experiment as far as possible.
Three controllers, selected at random and with no prior knowledge of the nature and
purpose of the experiment, participated in the study.
The pilot study was conducted on 2 November 2005. It was part of a pre-planned
simulation, designed to test a newly restructured and reorganised airspace in the Area
Control Centre (ACC) of this particular ATC Centre. Of the three controllers who
participated in the pilot study, one was part of the airspace simulation test programme.
The others were volunteers who participated upon completion of their operational shift.
A total of three simulation runs were conducted. The first run was discarded due to the
inappropriate timing of the injection of the equipment failure.
The set up of the pilot study involved two Controller Working Positions (CWPs), with
the same simulation exercise running simultaneously on both CWPs. The participating
controller was located at one CWP, whilst the researcher and the SME occupied the
second CWP. In addition, a video camera was positioned in front of the second position
so that the controller would not be intimidated by its presence. The pilot study
simulated two equipment failures (Table 9-3) chosen based on the findings from
several sources (as discussed in section 9.6). There was no recovery procedure in
place for the first failure. For the second failure, a recovery procedure has been
defined by international aviation organisations (see EUROCONTROL, 2003f; ICAO,
2001a), but it had not been implemented within the ATC Centre concerned.
Table 9-3 Equipment failures used in the pilot study
Type of failure | Effect | Existence of recovery procedure | Human Machine Interface (HMI) indication on CWP
Reduced flight plan mode (failure of flight data processing system) | Monitoring aid available only for flight plan tracks already displayed; flight data functions not available | No | General Information Window/Flight Data Processing (FDP) label changes from white to yellow
Aircraft radio communication failure | Inability of the controller to contact aircraft on the dedicated frequency as well as on the emergency frequency | No (not in the ATC Centre) | None
Several important conclusions were drawn from this pilot study and the lessons learnt
were used to enhance the main experimental design. These are as follows:
• Integration of a research experiment into any kind of on-going ATC training requires
significant collaboration with training instructors, the engineer in charge, and an
ATM specialist (SME). In spite of thorough preparation, the failure in the first
simulator run was not injected at the required instant because of unclear
instructions given to the pseudo pilots. This issue was corrected in the subsequent
runs. Therefore, for the main experiment, a complete understanding of the experimental
set up would have to be ensured between the training instructor, engineer in charge,
pseudo pilots, and the SME in order to avoid any misunderstanding. This should involve
detailed discussions prior to the first simulation run of the day.
• The initial intention was to inject an equipment failure in the 25th minute of the
simulation run, in order to give the controller adequate time to adjust to the traffic
scenario. However, the first run showed that this timing was inappropriate for two
reasons. Firstly, the controllers were all very experienced and thus did not require
the proposed length of time to adjust to the traffic scenarios. Secondly, the traffic
scenarios used had a low number of aircraft in the relevant sector from the 25th
minute onwards. This was contrary to the plan to inject an equipment failure during
periods of average to high traffic density. Both problems were corrected by
injecting a failure in the 10th minute of the simulation run and observing the
controller recovery process while traffic increased progressively during the
30-minute runs. Since the main experiment was to use fully licensed and experienced
controllers, the exact moment of failure injection would have to be based on the
number of aircraft in the sector. The aim would be to initiate the failure with traffic
levels starting at average and then progressing towards high.
• A need was identified for access to the simulator log files in order to capture all
of the controller’s inputs on the keyboard and HMI. The main purpose of these log files
would be to extract the precise reaction time of the controller following detection of
the equipment failure. However, difficulties were encountered in acquiring and decoding
these log files. Log files from simulation platforms tend to have a platform-specific
format and a level of detail that makes them cumbersome to decipher. In addition,
initial detection may not necessarily be captured in these log files (as an actual
action), because controllers may detect a failure but not take any action until they
have evaluated its impact on the operation. Having weighed the advantages and
disadvantages of using log files, it was decided to omit them. Instead, an alternative
was developed based on a camcorder with a precise timing capability (synchronised with
the CWP timer). In addition, a debriefing session with the SME was implemented to
validate the data captured throughout the recovery processes. The moment of detection
was further validated through interviews with the participating controllers in the
debriefing session.
• The debriefing session revealed that the questionnaire used in it would need some
changes. Several questions were to be amended to extract more information from the
participating controllers (e.g. traffic and airspace related questions were to be
presented in such a way as to extract more detailed information on precise
characteristics such as mix of traffic, vertical movements, crossing movements,
sector design, size of the sector, and number of entry and exit points).
• Due to a shortage of staff (i.e. ATM experts) and the significant duration of the
experiment (three sessions spread across 11 days), it was not possible to have two
SMEs observe the performance of each controller.
• It was possible to define the required recovery steps for the simulated equipment
failure types and thus to limit the variability between simulation runs (arising from
differences in experience, working strategies, traffic complexity at the instant of
failure injection, and inconsistencies in the pseudo-pilot inputs). The required
recovery steps were validated by the SME.
• Several issues of a more technical nature were recognised: the need for a voice
recording device in the debriefing stage of the experiment as a more efficient means of
capturing the controller responses; the need for two camcorders, or a combination of
one camcorder and radar replay, for the debriefing session; and the need for an 8mm
tape camcorder instead of digital camcorders because of the higher resolution achieved
in recording and replay.
• Another factor of note was that the controllers initially tended to stop working
when a failure occurred, because they assumed it was a software glitch or bug, which
is common in real-time simulations. Therefore, the instructions were to be updated to
inform the controllers that in the case of any unusual event they were expected to
continue working as they would in the operational environment. The experience of ATM
specialists showed that although controllers may anticipate an unusual occurrence,
this does not lead to better handling of the occurrence (for evidence see Appendix II).
Therefore, it was assumed that prior warning of some unusual situation would not alter
or enhance controller recovery performance. It was more important that participating
controllers did not have advance knowledge of the nature of that unusual occurrence,
i.e. an ATC equipment failure.
• Because of the large amount of data and observations to be collected, it was
realised that the main experiment would require an assistant. The primary task of
the assistant would be to observe and take notes/recordings of the controller’s overt
behaviour and attitude.
• Finally, although the simulation runs in the pilot study were designed to reflect high
traffic levels, failures were injected during a period of average to low traffic.
Additionally, no adverse weather was simulated, which would have added to the
complexity of the exercise. As a result, the traffic scenario in the main experiment
would require high traffic levels from the moment of failure injection throughout the
duration of the exercise. Additionally, adverse weather could be simulated, resulting
in the unplanned rerouting of air traffic.
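The lesson that the moment of failure injection should depend on the number of aircraft in the sector, starting at average traffic and progressing towards high, can be expressed as a simple trigger rule. This is a hypothetical operationalisation for illustration only: the per-minute sector counts, the "average" threshold, the 5th to 15th minute search window, and the "not falling over the next minute" test are all assumptions, not values or rules taken from the experiment.

```python
def injection_minute(counts, average, window=(5, 15)):
    """Return the first minute in `window` (inclusive) at which the sector count
    has reached `average` and does not fall over the next minute, else None.

    counts[m] is the (hypothetical) number of aircraft in the sector at minute m.
    """
    start, end = window
    last = min(end, len(counts) - 2)  # need counts[minute + 1] to exist
    for minute in range(start, last + 1):
        if counts[minute] >= average and counts[minute + 1] >= counts[minute]:
            return minute
    return None


# Invented per-minute traffic profile, for illustration only.
traffic = [2, 3, 4, 5, 6, 6, 7, 7, 8, 8, 9, 10, 11, 11, 12, 12, 12, 11, 10, 9]
print(injection_minute(traffic, average=9))  # 10
```

Requiring the count to be non-decreasing at the trigger point is one simple way of capturing "progressing towards high"; a real implementation would rely on the scripted traffic scenario and the judgement of the SME rather than on a fixed rule.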
9.7.1 Summary of the findings from the pilot study
As a result of the findings from the pilot study and subsequent discussions with
technical staff and the SME, the following lessons were learnt and used to enhance the
main experimental study:
• A complete understanding of all details of the experimental set up has to be
ensured between the training instructor, engineer in charge, and the SME. In this
manner it is possible to provide a consistent injection of failure, adverse weather
conditions, and timely recordings for each simulation run of the main experiment.
This would require detailed discussions prior to the first simulation run of the day.
• In the main experiment the failure should be injected in the tenth minute of the
simulation runs, when the traffic reaches average levels and progresses towards
higher traffic levels.
• The main experimental set up would require an assistant to observe and take
notes/recordings of the controller’s overt behaviour and attitude.
• The main experimental set up should be based upon one traffic scenario with
average to busy traffic and adverse weather conditions (pseudo pilots should be
briefed to ask for rerouting due to adverse weather conditions); and
• The pilot study tested two different equipment failures. Both failure types showed
potential for the experiment. However, the flight data processing system failure
was chosen for the main experiment as it is more demanding from the controller
recovery perspective. The failure would be injected as a sudden failure in the tenth
minute of each simulation run and would last for 15 minutes.
The following section discusses the process adopted to set up the actual experiment
including a description of the characteristics of the simulated airspace, traffic, and
equipment failure type.
9.8 Experimental set up
The main experimental study was conducted in an ATC Centre (different from the one
used in the pilot study) in three separate sessions: from November 29 to December 1,
2005, from February 27 to March 02, 2006, and from June 06 to June 09, 2006 (Table
9-1). A different ATC Centre from the one used for the pilot study was chosen in order
to access a larger population of controllers and the required simulation facilities.
There were several differences in the set up of the main experimental study when
compared to the pilot study. The differences are presented in the following paragraphs.
Note that the other design specifications were maintained as given in section 9.5.
The population for this experiment consisted of the controllers from the ATC
Centre where the experiment was carried out. The population characteristics to
be sampled in this experiment are age, operational experience (i.e. years in service),
and rating of the controllers. Based on the statistical characteristics of human (i.e.
controller) performance and potential modelling with the normal distribution, the
minimal number of simulation runs (and thus participants) would be 20 (Shier, 2004).
However, collecting a larger sample of controller recovery performance poses a
significant challenge because of accessibility (to both controllers and a simulator
facility) and other logistical problems.
As a result, the study had a total of 31 simulation runs (eleven runs in the first session
and ten runs in each of the second and third sessions) performed on the Beginning to End
Skills Trainer (BEST) simulation platform. The main study was conducted in collaboration
with various staff from the ATC Centre: one ATM specialist taking the role of the Subject
Matter Expert⁴ (SME), technical staff supporting the simulation runs, several pseudo
pilots, and a total of 31 controllers. All three sessions were designed to be as similar
as possible in a given ATC environment.
⁴ The SME participating in this study is an ATM Specialist with 20 years of experience in
many facets of ATC and 15 years of experience as an ATC instructor.
As mentioned previously, each simulation run was of approximately 30 minutes
duration, followed by a debriefing session of a similar duration. The experiment
(executed according to the timeline in Figure 9-2) used a pre-planned training exercise
modified for experimental use. After the first simulation run (which was discarded
afterwards), the exercise was amended to reproduce a busier traffic environment. In
other words, several arrivals were accelerated to achieve a busier period from the 10th
to the 25th minute of the exercise. The flight data processing system (FDPS) failure was
consistently injected in the 10th minute of each run by pseudo pilots, who manually
de-correlated each new radar track. In addition, pseudo pilots were instructed to
simulate adverse weather conditions en route by asking the controller for the necessary
rerouting. These weather-related requests were scheduled for the 5th and 15th minutes of
the run. The FDPS was consistently restored in the 25th minute of each run (see Figure
9-2).
Figure 9-2 Timeline of the experiment
The recovery process did not end with the restoration of the equipment (the 25th
minute) due to several steps that the controller had to perform to assure equipment
reliability and hence the readiness for the restoration of normal service. It usually took
one minute to accomplish these post-restoration steps. Additional time was given to
controllers in the post-restoration part of the simulation run (from the 25th to the 30th
minute of the run) to restore their normal working strategy and to calm down after the
effects of a highly stressful equipment failure occurrence.
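Since Figure 9-2 survives here only as a caption, the timeline of a main-study run can be summarised as data. The minute marks below are taken from the text (weather rerouting requests at minutes 5 and 15, FDPS failure injected at minute 10, FDPS restored at minute 25, end of run at minute 30); the class, the phase labels, and the method names are a hypothetical illustration rather than part of the experimental software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunTimeline:
    """Minute marks of a main-study simulation run, as described in the text."""
    weather_requests: tuple = (5, 15)
    failure_injected: int = 10
    failure_restored: int = 25
    run_end: int = 30

    @property
    def failure_duration(self) -> int:
        return self.failure_restored - self.failure_injected

    def phase(self, minute: float) -> str:
        """Hypothetical phase label for a given minute of the run."""
        if minute < 0 or minute > self.run_end:
            raise ValueError("minute outside the simulation run")
        if minute < self.failure_injected:
            return "normal operation"
        if minute < self.failure_restored:
            return "FDPS failure"
        return "post-restoration"

    def events(self):
        """Scheduled events in chronological order as (minute, label) pairs."""
        marks = [(m, "weather rerouting request") for m in self.weather_requests]
        marks += [(self.failure_injected, "FDPS failure injected"),
                  (self.failure_restored, "FDPS restored"),
                  (self.run_end, "end of run")]
        return sorted(marks)


timeline = RunTimeline()
print(timeline.failure_duration)                        # 15
print([timeline.phase(m) for m in (3, 12, 27)])
```

Treating the period from minute 25 onwards as a separate phase reflects the point made above: recovery does not end when the equipment is restored, because the post-restoration steps and the return to a normal working strategy still have to be completed.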
The SME involved in the study as an observer also acted as a coordinator to issue any
relevant information about the failure and its effect on the entire ATC Centre. This
notice was issued in response to queries from the participating controllers. However, if
a controller did not make any attempt to contact the coordinator, the SME issued this
information at the most suitable moment during the exercise (based on the level of the
controller’s workload).
Each simulation run was observed by the researcher, the assistant, and the SME; and
recorded for the purpose of further data analysis. The assistant was mainly responsible
for taking notes of the controllers’ overt behaviour prior to and after the injection of
failure. A checklist based on the SHAPE⁵ list of attitudes was used to guide the
assistant in performing this task (EUROCONTROL, 2004f). The assistant was positioned
as unobtrusively as possible, completely outside the controller’s field of view. On most
occasions, the observation team was positioned as far from the controller’s field of view
as possible, whilst still having a clear view of the radar screen. The precise set up of
the simulation room in which the experiment took place and the positions of all parties
involved are depicted in Figure 9-3.
Figure 9-3 Room set up
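The assistant's checklist can be pictured as a record of which SHAPE-derived attitudes were observed before and after the failure was injected. The attitude names below come from the SHAPE items listed in this chapter's footnote; representing the checklist as two sets of ticked items and comparing them is a hypothetical simplification for illustration, not the actual form used in the experiment.

```python
SHAPE_ATTITUDES = ("attentive", "active", "confident", "thoughtful",
                   "calm", "careful", "enquiring")


def attitude_changes(before, after):
    """Compare attitudes ticked before and after failure injection.

    before/after: sets of attitude names ticked on the (hypothetical) checklist.
    Returns (lost, gained) as alphabetically sorted lists.
    """
    unknown = (before | after) - set(SHAPE_ATTITUDES)
    if unknown:
        raise ValueError(f"not on the checklist: {sorted(unknown)}")
    return sorted(before - after), sorted(after - before)


# Invented observation for one controller, for illustration only.
lost, gained = attitude_changes({"attentive", "calm", "confident"},
                                {"attentive", "active", "enquiring"})
print(lost, gained)  # ['calm', 'confident'] ['active', 'enquiring']
```

Rejecting items that are not on the list keeps the notes from different runs comparable, which is the purpose of using a fixed checklist in the first place.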
The simulation runs were followed by an immediate debriefing session guided by the
questionnaire and other material designed specifically for this session. The controllers
were asked to evaluate all the factors that potentially influenced their recovery
performance. In addition, they were given an opportunity to judge their own
performance and the realism of the exercise itself. The questionnaire and other
material designed for the experiment and the debriefing session are presented in
Appendix XIII.
Equipment failure in ATC, like any other unusual or emergency event, is highly
stressful. In these instances the controllers are required to intervene with
complex strategies and employ their knowledge under significant pressure and high
psychological stress. For this reason, the debriefing session was used to help defuse
stress by creating a relaxed interview environment where the participating controllers
could evaluate their actions and performance. The session was structured in such a
way as to enable comparisons across the participants; to this end, a special
debriefing sheet had been designed prior to the simulation runs. The rationale behind this
structured approach to debriefing was to ensure a consistent and reliable acquisition of
data on controller recovery performance. The debrief segment of the experiment was
used to confirm and detail observations made during the simulation run via an
approach similar to a “cognitive walkthrough”. In other words, this part of the experiment
was used to discuss the sequence of recovery steps required by a controller to
accomplish a recovery, and to validate failure detection and the factors that influenced
each stage of the recovery (i.e. detection, diagnosis, and correction; further discussed
in Chapter 10).
⁵ The SHAPE project is briefly explained in Chapter 7, section 7.3.1.3. The list of attitudes used to guide
the assistant in the experimental process was derived from SHAPE attitude items, such as attentive, active, confident, thoughtful, calm, careful, and enquiring.
The following paragraphs give a brief description of the key elements of the
experiments in terms of airspace, traffic, and failure characteristics.
9.8.1 Airspace characteristics
The approach airspace of the ATC Centre where the experiment was carried out is
designated as class “C” airspace. This airspace extends horizontally over a radius of
30Nm from the airfield (runway 06/24, with an instrument landing system (ILS) at
both runway ends). The vertical limits are from the surface to 8,000 ft or FL80.
However, in the case of an early handover from area control, the area of responsibility
of the approach control increases. For example, if an aircraft is handed over at FL180
descending to FL80, all of the airspace in between becomes the responsibility of the
particular approach sector. On a scale of one (adequate airspace) to three
(inappropriate airspace), the participating controllers rated this airspace at 1.31 on
average, which translates to airspace of adequate to tolerable complexity (Table 9-4).
In addition, a series of in-depth questions on airspace characteristics were presented to
each controller to identify the specific features of this airspace. The most frequently
observed issues were:
• that a variety of flight levels and altitudes were utilised (from FL100 down to
FL90, 4500ft, 4000ft, 3500ft, 3000ft);
• that there were no specific entry and exit points (throughout the duration of this
experiment this particular airspace did not provide any standard instrument
departure and arrival routes, i.e. SIDs and STARs); and
• that the complexity of the neighbouring sectors influenced complexity within the
approach sector they operated in (e.g. two neighbouring sectors carry large
volumes of crossing traffic).
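The verbal reading of a mean rating such as 1.31 can be made explicit with a short sketch. It assumes that each integer point of the three-point scale carries a label, and that the middle point of the airspace scale is “tolerable” (the thesis names only the two end points); a mean lying between two integer points is read as spanning their two labels. The function name `describe_mean_rating` is hypothetical.

```python
import math
from typing import Sequence


def describe_mean_rating(mean: float, anchors: Sequence[str]) -> str:
    """Map a mean rating on a 1..len(anchors) scale to a verbal band.

    anchors[i] is the (assumed) label of scale point i + 1. A mean that
    falls between two integer points spans the labels of both points.
    """
    if not 1 <= mean <= len(anchors):
        raise ValueError("mean rating lies outside the scale")
    low, high = math.floor(mean), math.ceil(mean)
    if low == high:
        return anchors[low - 1]
    return f"{anchors[low - 1]} to {anchors[high - 1]}"


# Airspace scale: 1 = adequate, 2 = tolerable (assumed), 3 = inappropriate.
AIRSPACE_SCALE = ["adequate", "tolerable", "inappropriate"]
print(describe_mean_rating(1.31, AIRSPACE_SCALE))
```

The same function applies to the traffic scale used in section 9.8.2, whose orientation is reversed (1 = high complexity); with assumed labels `["high", "average", "low"]`, the reported mean of 1.66 reads as "high to average".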
Table 9-4 The mapping between exercise characteristics and the controllers’ observations
The exercise characteristics | The controllers’ observations
Airspace characteristics simulated as adequate | Adequate to tolerable
Weather conditions simulated as unchanged (pre- and post-failure) | Unchanged
Traffic characteristics simulated as high | Average to high
In addition, the weather conditions in the exercise simulated 15-25 knots southwest
wind, rain showers, half of the sky covered with cumulonimbus cloud (i.e. thunderstorm
cloud) with base at 1800ft, temperature of two degrees Celsius, and the pressure at
mean sea level (MSL) of 1032 hPa. Generally, in these conditions, icing will occur
inside cloud above 2000ft (in the ICAO standard atmosphere the temperature
decreases on average by 2 degrees Celsius/1000ft). Since the weather conditions pre-
and post-failure injection remained unchanged (i.e. re-routings requested by pilots in
both cases), the overall weather was marked as unchanged. This was confirmed by the
SME and participating controllers (Table 9-4).
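The icing statement can be checked roughly with a linear temperature profile. This sketch assumes that the reported two degrees Celsius is the surface temperature and uses the average lapse rate of 2 °C per 1000 ft quoted above. It also ignores the difference between height above the aerodrome and altitude above MSL. All constant and function names are illustrative.

```python
# Assumed inputs taken from the exercise weather described above.
SURFACE_TEMP_C = 2.0            # treated as the surface temperature (assumption)
LAPSE_RATE_C_PER_1000FT = 2.0   # ISA average (about 1.98), rounded as in the text
CLOUD_BASE_FT = 1800.0          # cumulonimbus base


def temperature_at(height_ft: float) -> float:
    """Linear estimate of air temperature (°C) at a height above the surface."""
    return SURFACE_TEMP_C - LAPSE_RATE_C_PER_1000FT * height_ft / 1000.0


def freezing_level_ft() -> float:
    """Height (ft) at which the linear profile reaches 0 °C."""
    return 1000.0 * SURFACE_TEMP_C / LAPSE_RATE_C_PER_1000FT


print(freezing_level_ft())               # height of the 0 °C level
print(temperature_at(CLOUD_BASE_FT))     # temperature at cloud base
print(temperature_at(2000.0))            # temperature at 2000 ft
```

Under these assumptions, the freezing level lies at about 1000 ft, below the 1800 ft cloud base. The cloud layer is therefore at sub-zero temperatures, which is a precondition for the in-cloud icing described above.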
9.8.2 Traffic characteristics
The exercise used in this experiment had a duration of 30 minutes and a total of 14
flights (one training aircraft, ten arrivals, and three departures), which translates to 28
aircraft per hour. In the peak segment of the training exercise, the controller was in
simultaneous radio contact with seven to eight aircraft. On a scale of one (high
complexity) to three (low complexity), the participating controllers rated the traffic
complexity at 1.66 on average. This rating translates to average to high traffic
complexity (Table 9-4). In addition, a series of in-depth questions on traffic
characteristics were presented to each controller to identify the traffic characteristics
most often observed in the given traffic scenario. These were:
• a mix of aircraft speeds, with indicated airspeeds (i.e. the speed read directly from
the airspeed indicator on an aircraft) ranging from 125 knots to 250 knots;
• the use of holding and the delays it induced;
• only Instrument Flight Rules (IFR) aircraft utilising the airspace;
• a high volume of traffic with vertical and crossing movements; and
• an average flight time in the sector of 10-15 minutes (longer than usual due to the
injected equipment failure).
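The hourly figure quoted above is a linear scaling of the 30-minute sample. The sketch below makes that arithmetic explicit. It implicitly assumes that the sample's traffic rate would be sustained for a full hour. The names `TRAFFIC_SAMPLE` and `hourly_rate` are illustrative.

```python
# Composition of the 30-minute traffic sample (section 9.8.2).
TRAFFIC_SAMPLE = {"training aircraft": 1, "arrivals": 10, "departures": 3}
EXERCISE_MINUTES = 30


def hourly_rate(total_flights: int, sample_minutes: int) -> float:
    """Scale a flight count observed over `sample_minutes` to flights per hour."""
    return total_flights * 60 / sample_minutes


total_flights = sum(TRAFFIC_SAMPLE.values())
rate = hourly_rate(total_flights, EXERCISE_MINUTES)
print(total_flights, rate)
```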
9.8.3 Equipment failure characteristics
The choice of the equipment failure was driven by the previous analyses and four
different sources of information (operational failure reports, questionnaire survey, the
qualitative equipment failure impact assessment tool, and the pilot study). The FDPS
failure was chosen for this experimental set-up for several reasons. Firstly, the data
available showed that this failure is both severe and frequent. Secondly, this failure
represents an example of major failures that affect multiple systems, as seen from the
qualitative equipment failure impact assessment tool. Thirdly, the participating CAA
does not have a written procedure for this particular failure, which makes the controllers’
recovery performance more dependent upon their knowledge, experience, and
personal abilities. Finally, the technical features of the Beginning to End Skills Trainer
(BEST) platform allowed this failure type to be injected and restored fairly easily. In
order to simulate equipment failure in the most realistic way, it was necessary
to be able not only to inject the failure but also to restore system functionality rapidly. This
was possible with the FDPS failure, whose degradation was simulated as a sudden
failure affecting the entire ATC Centre for a period of 15 minutes.
A visual representation of this type of equipment failure on the BEST platform is
presented in Figure 9-4. A correlated radar track with all relevant flight-related
information is shown on the left-hand side of Figure 9-4, whilst the uncorrelated
track (resulting from the FDPS failure), depicting only the aircraft position, is on the
right-hand side. It can be seen that the FDPS failure affected
multiple systems. The actual effects of the FDPS failure are presented in Table 9-5
and in more detail in Table 9-6.
Figure 9-4 The visual representation of equipment failure on CWP: (a) before the failure, (b) after the failure
Table 9-5 Equipment failure in the experimental study
Type of failure | Effects | Existence of recovery procedure | HMI indication on BEST simulation platform
Reduced flight data processing mode | Monitoring aid only available with existing flight plans; Flight data functions (flight plan management) not available; Safety Nets functions available; Radar data functions available | No | None
Table 9-6 Availability of functions in the reduced flight data processing mode
Function | Availability
Radar data source
Radar tracks | Available
Flight plan track | Only for flight plan tracks already displayed
Maps | Available
Tools | Available
Radar picture controls | Available
Flight plan facilities
Flight plan commands | Not available
Flight plan lists | Partially available (for display only, frozen lists)
ATC messages de-queue management | Not available
Transmission of ATC messages | Not available
Coordination message | Not available
Alarm and warning facilities | Partially available (no MTCA warnings update)
General information area | Available
Mail box management | Not available
Operational data management | Partially available (runway in use and airspace management are not available)
Sectorisation | Partially available (only displayable)
Aeronautical Information System | Available
Load management facilities | Not available
Air Traffic Flow Management facilities | Not available
Operational load forecast facilities | Not available
Current Operational Load facilities | Not available
System survey facilities | Partially available (percentage of use of SSR code indication that a flight plan has received message is incorrect and alerts are not available)
Operational room configuration | Partially available (only displayable)
Manual printing facilities | Available
Operator roles (eligibility rules) | Partially available (only displayable)
Off-line customisation | Available
User mode of ATC position | Available
Repetitive flight plan database version management | Not available
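To show at a glance how much CWP functionality was lost, Table 9-6 can be transcribed into a lookup table and tallied. This is an illustrative sketch. The three-level status is a simplification, and treating "Flight plan track" (only for tracks already displayed) as partial is our reading rather than the table's own wording.

```python
from collections import Counter

# Simplified availability of each function in reduced flight data processing
# mode, transcribed from Table 9-6 ("partial" covers every qualified entry).
FUNCTION_STATUS = {
    "Radar tracks": "available",
    "Flight plan track": "partial",
    "Maps": "available",
    "Tools": "available",
    "Radar picture controls": "available",
    "Flight plan commands": "not available",
    "Flight plan lists": "partial",
    "ATC messages de-queue management": "not available",
    "Transmission of ATC messages": "not available",
    "Coordination message": "not available",
    "Alarm and warning facilities": "partial",
    "General information area": "available",
    "Mail box management": "not available",
    "Operational data management": "partial",
    "Sectorisation": "partial",
    "Aeronautical Information System": "available",
    "Load management facilities": "not available",
    "Air Traffic Flow Management facilities": "not available",
    "Operational load forecast facilities": "not available",
    "Current Operational Load facilities": "not available",
    "System survey facilities": "partial",
    "Operational room configuration": "partial",
    "Manual printing facilities": "available",
    "Operator roles (eligibility rules)": "partial",
    "Off-line customisation": "available",
    "User mode of ATC position": "available",
    "Repetitive flight plan database version management": "not available",
}


def functions_with_status(status: str) -> list:
    """Alphabetical list of the functions that have the given status."""
    return sorted(name for name, s in FUNCTION_STATUS.items() if s == status)


status_counts = Counter(FUNCTION_STATUS.values())
print(status_counts)
print(functions_with_status("not available"))
```

With this reading, 10 of the 27 listed functions are unavailable outright, 8 are partially available, and 9 remain available.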
9.9 Experimental variables
The following sections define the variables that were taken into account in the design of
the experiment to capture the characteristics of the recovery process in ATC. The
variables are classified as independent, dependent, or extraneous (see Table 9-7 and
Table 9-8).
Table 9-7 Overview of independent and dependent variables
Independent variables | Dependent variables
Set of 20 RIFs | The recovery context (recovery context indicator)
The required recovery steps | The recovery effectiveness
- | The recovery duration
9.9.1 Independent Variables
There are two sets of independent variables in this experiment. These are the
Recovery Influencing Factors (RIFs) and required recovery steps, discussed in the
following sections.
9.9.1.1 Recovery Influencing Factors (RIFs)
The research carried out in this thesis includes an assessment of the factors that
influence controllers during the process of recovery from equipment failures in ATC (i.e.
RIFs; see Chapter 7). A total of 20 relevant factors (RIFs) were identified. During the
post-experiment debriefing session each participating controller was presented with a
questionnaire. This questionnaire enabled controllers to mark and briefly explain the
influence of each RIF on their recovery performance as experienced in the simulation
run. Although it would be beneficial to question controllers on their experience with the
interactions between RIFs, this would considerably increase the complexity of the
experimental design. Therefore, a statistical approach was taken instead (presented in
Chapter 8).
Table 9-8 briefly summarises each of the 20 factors, specifying the key considerations
taken into account in the design of the experiment. Each factor is defined as either an
independent or an extraneous variable. Seven RIFs were kept constant for all participating
controllers (Table 9-8), whilst two RIFs were not considered in this experiment (i.e.
‘adequacy of alarm’ and ‘adequacy of alarm onset’).
Table 9-8 Overview of independent and extraneous variables
Variable Independent
variable Extraneous
variable Comment
Training for recovery √ Assessed in the debriefing session.
Previous experience with equipment failures
√ Assessed in the debriefing session.
Experience with system performance
√ Assessed in the debriefing session.
Personal factors √ Assessed in the debriefing session.
Communication for recovery √
Existing studies from the nuclear industry have confirmed that communication within a team does have a significant impact on recovery performance (Kaarstad and Ludvigsen, 2002). Hence, the impact of this factor is fairly well known. Regardless, this variable will be assessed after the experiment.
Complexity of failure type
Constant (multiple systems affected)
Refers to single vs. multiple failure occurrences. The experimental set up should assess the impact of one failure which affects multiple ATC systems. Therefore this variable will be constant for all subjects.
Time course of failure development Constant (sudden failure)
This variable varies between sudden failure and gradual degradation of the system. This variable will be constant for all subjects.
Number of workstations/sectors affected
Constant (all workstations affected)
The experiment is conducted on a single workstation with one controller at a time, but the controller will be informed that the failure affects the entire ATC Centre.
Time necessary to recover √
This variable varies between adequate and inadequate time to recover. It can be influenced by several factors. Firstly, the characteristics of a given failure will drive the time necessary to recover through the criticality of the failed function and its detectability. Secondly, the controller characteristics will also have an effect. More experienced controllers may react and resolve an issue more quickly than less experienced ones. Finally, the characteristics of traffic at the moment of failure will drive the time necessary to recover. The more complex the traffic situation, the more recovery time the controller will need. This variable will be assessed in the debriefing session.
Existence of recovery procedure Constant (no procedure)
Theoretical review and various experiments in other safety-related industries have confirmed the relevance of procedures to recovery performance (Kaarstad and Ludvigsen, 2002; EUROCONTROL, 2004e; Kanse and van der Schaaf, 2000). Therefore, it was decided to choose a failure which does not have an appropriate recovery procedure.
Duration of failure
Constant (short
duration – 15min)
In the experimental set up, duration of failure should be long enough to capture all phases of the recovery (e.g. 15min) taking into account the total duration of experiment.
Adequacy of HMI and operational support
√ Assessed in the debriefing session.
Ambiguity of information √ Assessed in the debriefing session.
Adequacy of alarms/alerts Not applicable for technical
reasons
The experimental design aims to capture controller performance unaided by system tools, placing more emphasis on the controller’s readiness to detect and react to an unexpected occurrence. Additionally, past research has already shown that in most cases the existence of an alert does have a significant impact on recovery performance (Kaarstad and Ludvigsen, 2002; Theis and Straeter, 2001).
Adequacy of alarm/alert onset Not applicable for technical
reasons
Existing studies from various industries have confirmed that the alert onset or its ‘cognitive convenience’ does have a significant impact on recovery performance (Straeter, 2005).
Adequacy of organisation √ Assessed in the debriefing session.
Traffic complexity Constant
(average to high)
This variable will be kept constant for all subjects. The aim is to reflect the current levels of traffic as well as the future predicted traffic increase. The declared sector capacity is defined as the number of aircraft entering the sector per hour, respecting the peak hour pattern, when controller workload is 70 percent in that hour (Majumdar and Ochieng, 2002). Therefore, the aim of the proposed experimental set up is to use a 30-min peak hour traffic sample that adequately reflects the sector’s declared capacity. In addition, the scenario should aim at steady traffic increase up to the tenth minute into the scenario. The remaining 20 minutes of the scenario should reflect higher levels of traffic as well as controller workload.
Airspace characteristics √ This variable will be constant since each participant will experience the same airspace/sector characteristics. However, each controller will be able to assess the adequacy of airspace in the debriefing session.
Weather conditions during the recovery process
Constant This variable will be constant for all participants. Poor weather conditions will be experienced in both the pre- and post-failure periods.
Conflicting issues in the situation √ Assessed in the debriefing session.
Age √ Assessed in the debriefing session.
Overall experience as a controller √ Assessed in the debriefing session.
Required recovery steps √ Set of required recovery strategy steps will be defined prior to the experiment based on the type of failure, traffic sample, and airspace characteristics.
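The way the 20 RIFs were handled in the design can be cross-checked against the counts given in the text: seven held constant and two not considered. This sketch records each factor's treatment as read from the comment column of Table 9-8. It takes the 20 RIFs to be the table rows other than "Age", "Overall experience as a controller" and "Required recovery steps", which is our reading of the table. "Airspace characteristics" is coded as assessed because, although it was the same for all participants, it was rated in the debriefing.

```python
from collections import Counter

# Treatment of each Recovery Influencing Factor (RIF) in the design:
#   "assessed"       - rated by the controller in the debriefing session
#   "constant"       - held fixed for all participants
#   "not applicable" - excluded for technical reasons
RIF_TREATMENT = {
    "Training for recovery": "assessed",
    "Previous experience with equipment failures": "assessed",
    "Experience with system performance": "assessed",
    "Personal factors": "assessed",
    "Communication for recovery": "assessed",
    "Complexity of failure type": "constant",
    "Time course of failure development": "constant",
    "Number of workstations/sectors affected": "constant",
    "Time necessary to recover": "assessed",
    "Existence of recovery procedure": "constant",
    "Duration of failure": "constant",
    "Adequacy of HMI and operational support": "assessed",
    "Ambiguity of information": "assessed",
    "Adequacy of alarms/alerts": "not applicable",
    "Adequacy of alarm/alert onset": "not applicable",
    "Adequacy of organisation": "assessed",
    "Traffic complexity": "constant",
    "Airspace characteristics": "assessed",
    "Weather conditions during the recovery process": "constant",
    "Conflicting issues in the situation": "assessed",
}

treatment_counts = Counter(RIF_TREATMENT.values())
print(treatment_counts)
```

The tally reproduces the seven constant and two excluded factors stated in section 9.9.1.1. That leaves eleven RIFs whose influence was rated by the controllers themselves.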
9.9.1.2 Required recovery steps
The recovery performance of each participant was compared to the pre-determined set
of required recovery steps. These recovery steps were determined on the basis of
operational experience, since the participating Civil Aviation Authority (CAA) does not
have any official guidelines for this particular failure type (e.g. procedure, written
instruction). This set of required recovery steps was validated by the independent input
of the SME and two ATC instructors. It should be noted that controller performance
was highly dependent upon the traffic situation at the moment of failure and therefore
several different sequences of the recovery steps were possible. The seventeen
recovery steps listed in Table 9-9 represent one logical sequence. Whilst some steps
had to be performed only once (e.g. identification of a failure type, informing the
coordinator, and post-restoration), others had to be re-applied. For example, for each
new (uncorrelated) track entering the dedicated airspace, it was necessary to identify
the traffic and maintain that identification. In addition, timely and accurate strip marking
was essential, especially in a situation of degraded equipment reliability such as that
simulated in this experiment. A detailed evaluation of strip management and annotations
should be addressed in future research.
An important point to note is that these simulation runs were not entirely identical in
spite of the great effort to achieve consistency amongst participants. The observed
differences were due to pseudo-pilots’ manual actions, namely their incorporation of
requested weather rerouting, and to slight deviations in the moment of failure injection.
In particular, pseudo-pilots had to manually de-correlate each new track, which to some
extent influenced the traffic distribution in each simulation run.
Due to the small differences in the simulation runs, further analysis focused only on the
list of required recovery steps (Table 9-9), irrespective of their sequence. The objective
was to capture these core steps (including the post-restoration steps, S14-S17) and
evaluate any deviations.
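Because the analysis ignores the order in which steps were performed, the comparison of each participant with the required list reduces to a set difference. Re-applied steps, such as S4 and S7 for every new uncorrelated track, are counted only once. The sketch below assumes an observation log that is a plain list of step labels; that log format and the function names are hypothetical.

```python
from typing import Iterable, List

# Required recovery steps S1-S17 (Table 9-9); S14-S17 are post-restoration.
REQUIRED_STEPS = frozenset(f"S{i}" for i in range(1, 18))
POST_RESTORATION_STEPS = frozenset({"S14", "S15", "S16", "S17"})


def step_number(step: str) -> int:
    """Numeric part of a label such as 'S7', used for readable sorting."""
    return int(step[1:])


def missing_steps(observed: Iterable[str]) -> List[str]:
    """Required steps absent from the log; order and repeats are ignored."""
    return sorted(REQUIRED_STEPS - set(observed), key=step_number)


def missed_post_restoration(observed: Iterable[str]) -> List[str]:
    """Post-restoration steps (S14-S17) absent from the log."""
    return sorted(POST_RESTORATION_STEPS - set(observed), key=step_number)


# Hypothetical log: S4/S7 repeated for two new tracks, most later steps skipped.
log = ["S1", "S5", "S2", "S4", "S7", "S4", "S7", "S6", "S14"]
print(missing_steps(log))
print(missed_post_restoration(log))
```

A set-based comparison suits the stated aim of capturing the core steps and their deviations. It deliberately discards sequence information, which the small differences between runs made unreliable.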
Table 9-9 Overview and description of required recovery steps
Required recovery step | Description
S1 | Detect the problem either by pilot’s contact or visually on the radar display (detection of the uncorrelated track). In both cases, the first assumption may be a transponder failure. After confirmation that the aircraft transponder is operational, further check on ATC system performance should be conducted.
S2 | Locate traffic
S3 | Check identity of eastbound overflight
S4 | Identify all traffic using appropriate technique: Bearing/range or Turn method (turning the aircraft for 30 degrees or more)
S5 | Identify failure type (either by controller or by coordinator)
S6 | Inform all traffic on RTF of the failure and advise of possible restrictions
S7 | Maintain identification of all traffic
S8 | Ground the trainer
S9 | Refuse departing traffic permission to depart
S10 | All airborne traffic in inbound sequence should continue to be sequenced for landing (without unnecessary delay)
S11 | Maintain accurate and timely strip marking throughout the process
S12 | Provide vertical separation
S13 | Utilise holding patterns when necessary
S14 | After restoration has been confirmed by coordinator, re-identify all traffic
S15 | Confirm Mode C
S16 | Continue to monitor
S17 | Release all departures (which leads to the restoration of the normal service)
It is important to state that some of the recovery steps above are of greater importance
to maintaining a safe ATC service than others. For example, maintaining identification
of all traffic, conducting timely and efficient strip marking and board management, and
maintaining separation are considered critical to overall safety in a degraded situation.
Other recovery steps, such as grounding the trainer and preventing departures, are of
less importance in that they are workload reduction measures. Nevertheless, their
implementation contributes to a safer traffic environment in unusual situations.
9.9.2 Dependent Variables
This study was designed to capture several quantitative and qualitative dependent
variables. The reason for this lies in the fact that controller recovery cannot be captured
through only one recovery variable as highlighted previously in Chapter 5. The
dependent variables in this experimental set up are recovery context (recovery context
indicator), recovery effectiveness and recovery duration (see Table 9-7). The precise
methodology for the assessment of the recovery context both as a qualitative and a
quantitative variable is presented in Chapter 8. The following sections discuss the
remaining variables.
9.9.2.1 Recovery effectiveness
The recovery effectiveness of each participating controller was rated by combining
three separate sources of data. Firstly, each participant’s recovery performance was
rated during the simulation run. In general, this analysis was based on the performance
indicators for a particular airspace, such as optimal use of airspace (separation of
5-8Nm), radar vectoring, speed control, use of radio telephony (RT), prioritisation of
tasks, and appropriateness of traffic management. Secondly, the recovery
effectiveness was rated based on a set of required recovery steps as explained in
section 9.9.1.2. Thirdly, the steps identified earlier were grouped under three main tasks to
enable credible rating (see Table 9-10). These are:
• System protection, i.e. recovery steps which aimed to assure protection of the ATC
system in case of further equipment deterioration. Note that the reduction of the
controller’s workload through better traffic management is an integral part of system
protection and as such is included in this task;
• Maintaining situational awareness (i.e. an accurate mental picture of traffic and
airspace); and
• Post-restoration recovery steps.
Table 9-10 Recovery process and its three main tasks
System protection task | SA or mental picture task | Post-restoration task
Ground the trainer | Detect the problem | Re-identify all traffic
Refuse departures permission to depart | Identify failure type | Confirm Mode C
All airborne traffic in inbound sequence should continue to be sequenced for landing | Maintain accurate and timely strip marking | Continue to monitor
Utilise holding patterns when necessary | Identify all traffic (including eastbound overflight) | Release all departures
Inform all traffic and advise of possible restrictions | Locate traffic | -
Provide vertical separation | Maintain identification of all traffic | -
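The grouping in Table 9-10 can be expressed as a mapping from tasks to step labels. This makes it possible to report, for each task, how many of its steps a participant performed. In the sketch, the row "Identify all traffic (including eastbound overflight)" is read as covering both S3 and S4, which is our interpretation. The per-task counts are only an aid: as the text stresses next, the final rating was a judgement that combined three sources rather than a count.

```python
from typing import Dict, Iterable, Tuple

# Steps S1-S17 grouped into the three tasks of Table 9-10 (S3 + S4 read as
# the single row "Identify all traffic (including eastbound overflight)").
TASKS = {
    "system protection": frozenset({"S6", "S8", "S9", "S10", "S12", "S13"}),
    "situational awareness": frozenset({"S1", "S2", "S3", "S4", "S5", "S7", "S11"}),
    "post-restoration": frozenset({"S14", "S15", "S16", "S17"}),
}


def task_coverage(observed: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Return (steps performed, steps required) for each task."""
    done = set(observed)
    return {task: (len(steps & done), len(steps)) for task, steps in TASKS.items()}


# Hypothetical observation log for one participant.
log = ["S1", "S5", "S2", "S4", "S7", "S6", "S14"]
print(task_coverage(log))
```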
It should be noted that an assessment of controller performance is not a simple task of
counting the number of recovery steps performed versus the total number of required
steps, because each step has a different effect on the overall
recovery performance. Therefore, the three sources of information were combined into a
structured recovery assessment of each participant using the following five categories:
• Very good recovery performance (VG) - the controller employed a very good
recovery strategy and performed all recovery steps;
• Good recovery performance (G) - the controller employed a good recovery strategy
but failed to perform some of the steps;
• Adequate recovery performance (A) - the controller employed an adequate
recovery strategy but failed to completely protect the ATC system in case of further
equipment deterioration and failed to implement some of the post-restoration steps;
• … recovery strategy. In other words, there was a complete lack of ATC system
protection from possible further equipment degradation. In addition, the controller
did not assure timely and accurate strip management and therefore had no means
to support his/her situational awareness or mental picture of the traffic and
airspace. The post-restoration steps were performed only to some basic extent
without a proper check of the accuracy of new data; and
• Inadequate recovery performance (I) - the controller had no recovery strategy in
place and no plan to reduce his/her own workload, and therefore failed to protect the ATC
system in the case of further equipment deterioration. In addition, the controller
failed to implement most of the post-restoration steps.
Although not attempted in this thesis, future research should assess the relevance and
contribution of existing tests, such as the Situation Awareness Global Assessment
Technique (SAGAT), to the assessment of controller recovery.
9.9.2.2 Recovery duration
As previously discussed in Chapter 5, the recovery duration is measured as the time
from the controller’s first overt action to the end of the recovery process. The
measurement starts from the first overt action rather than from the moment of
actual failure detection, although the two can differ significantly. Identifying the moment
of failure detection can be extremely difficult, as this first reaction usually
represents covert behaviour (i.e. detection) that is not directly observable. In the current
experimental set-up and with the available apparatus, it was not possible to accurately
capture the moment of failure detection, only the controller’s first action as observed
on the ATC system.
More sophisticated equipment, such as an eye movement tracker (e.g. ASL Model
501), offers a better, but still not entirely accurate, approach to pinpointing the
moment of failure detection. The reason for this is that there is no integrated measure
of eye point of gaze and brain activity which would differentiate between fixations with
information gathering and ‘stares’, when no information has been gathered⁶. Therefore,
even with the use of this advanced eye tracking equipment, it would not be possible to
firmly state the precise moment of failure detection. Whilst the moment of failure
detection was investigated during the post-experimental debriefing, it still proved to be
difficult to determine.
⁶ Personal correspondence with human factors experts from the Netherlands National Aerospace
Laboratory (NLR) and EUROCONTROL Experimental Centre (Human Factors Lab).
For this reason, the research presented in this thesis uses the controller’s first action to
measure the recovery duration. It is necessary to highlight that this first observable
action may be postponed for two generic reasons. Firstly, the controller may not
necessarily detect the uncorrelated track as soon as it becomes visible on the radar
display. Secondly, the controller may detect it immediately (upon its presentation on the
radar display) but consciously delay any action due to the workload experienced or the
presence of a more urgent task which needs to be addressed first. For example, the
controller may need to address tasks that are completely unrelated to the
recovery process, such as turning an aircraft to intercept the ILS localiser for
approach and landing, or radar vectoring traffic with a speed differential. In other
words, the controller’s first action is the moment when the controller decides to initiate
an appropriate recovery strategy and not necessarily the actual time when he/she
detects the uncorrelated label. It is well known that controllers develop their own
working strategies as they gain experience and proficiency over years on
the job. This results in the gradual build-up of ‘personal criteria’ for separation limits and
methods for solving potential conflicts (whether by changing the aircraft’s speed,
flight level, or heading).
Based on the moment of the controller’s first action, the recovery duration was
determined by observation of simulation runs and the recorded video/audio material. It
should be noted that controller recovery performance did not stop with the restoration
of FDPS service, but continued to include all necessary post-restoration steps. The
post-restoration steps are required to restore normal service and to confirm that the
restored functionality provides accurate information. Discussion with the SME established
that this stage of the recovery should take up to one minute; this limit was applied
to bound the recovery duration of controllers who failed to perform all post-restoration steps.
As a result, the recovery duration was directly influenced by the duration of the failure
(15 minutes) and the period allowed for the post-restoration phase (one minute). Thus,
the recovery duration could reach its maximum of 16 minutes only if the controller
initiated recovery action(s) immediately. The later the controller initiated recovery
action, the shorter the recovery duration.
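The relationship just described can be written as a one-line calculation. The end of recovery is fixed at injection time plus the 15-minute outage plus the one-minute post-restoration window, so the measured duration shrinks one-for-one as the first action is delayed. In this sketch, times are minutes from the start of the run. The injection time is a parameter because it varied slightly between runs. The value of 10 minutes used in the example is illustrative only, chosen to be consistent with restoration around the 25th minute.

```python
FAILURE_DURATION_MIN = 15.0   # FDPS outage simulated in every run
POST_RESTORATION_MIN = 1.0    # time allowed for post-restoration steps


def recovery_duration(injection_min: float, first_action_min: float) -> float:
    """Minutes from the controller's first overt action to the end of the
    one-minute post-restoration window (times measured from start of run)."""
    if first_action_min < injection_min:
        raise ValueError("first action cannot precede failure injection")
    recovery_end = injection_min + FAILURE_DURATION_MIN + POST_RESTORATION_MIN
    return max(0.0, recovery_end - first_action_min)


print(recovery_duration(10.0, 10.0))   # immediate reaction: the 16-minute maximum
print(recovery_duration(10.0, 12.5))   # first action 2.5 minutes after injection
```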
The results of all three sources of information as well as the final rating for each
participant were confirmed by the single SME involved in the experiment. Clearly, having
more SMEs participate would increase the validity of the outcome of the
experiment. Future research should address how statistical representation could be
achieved given the logistical difficulties associated with these types of experiments.
9.9.3 Extraneous Variables
Extraneous variables influence the outcome of an experiment, although they are not
the variables of interest. These variables are undesirable because they add errors to
the experiment. A major goal in the experimental design is to eliminate the influence of
extraneous variables as much as possible. If it is not possible to eliminate them, they
should be controlled. Two extraneous variables in this experiment could not be
controlled. These are:
• Operational experience (i.e. years in service)
The differences in the level of experience were to be captured once the controllers were
recruited for the experiment. The experience variable is divided into the
following categories: 1-10; 11-20; 21-30; and 31-40 years.
• Personal factors
There is a wide variety of factors that could be categorised as personal. Some of these
are more complex to determine than others. For example, factors like health, vision,
level of confidence, complacency, level of trust in automation, self-esteem (i.e. trust in
own ability), personality, motivation, attitudes deriving from family or close social group,
personality type, etc. require specific sets of tests which can be too complex and too
time-consuming. However, age was to be captured once the controllers were recruited
for the experiment. Fatigue and stress were to be controlled by using rested controllers,
as were ‘time of day’ (i.e. relevance of circadian rhythm) and time into the shift
(i.e. level of situational awareness as well as fatigue). In short, the experiment was to
be conducted in the same periods of the day, with half of the subjects
tested in morning sessions and the other half in afternoon sessions.
9.10 Potential limitations
There are two limitations of the experimental set-up and its use to capture data. The first is individual differences between participants (i.e. controllers). These are characteristics that differ from one participant to another; their influence could be reduced by using random assignment or matched groups (to ensure that different groups are equivalent with respect to pre-selected characteristics, e.g. experience and age).
Secondly, validation of the recovery performance of each participating controller by only one SME creates a potential for bias. Although special attention was given to the choice of the SME (in terms of experience and expertise), only one SME was available for this experiment.
9.11 Summary
This Chapter has presented in detail the experiment designed to capture controller
recovery in ATC. The Chapter started by justifying the need for the field experiment.
This was followed by an assessment of the available resources and the key
requirements that had to be accomplished. The Chapter continued by discussing and
justifying the overall experimental set up and data acquisition. This included the
presentation of the rationale for the choice of the equipment failures to be tested in the
pilot study. Drawing on the lessons learnt from the pilot study, it was possible to implement
the final changes and fine-tune the set-up of the main experiment. This segment
focused on the characteristics of the simulated traffic, airspace, and equipment failure,
as well as on the research variables while highlighting potential limitations. The
following Chapter analyses the data captured from this experiment.
10 Analysis of Experimental Results
The previous Chapters identified a set of relevant contextual factors or Recovery
Influencing Factors (RIFs) and developed a novel approach for the quantitative
assessment of the recovery context. This approach and its operational benefits are
further verified in this Chapter by an experimental investigation conducted in a training
facility of an Air Traffic Control (ATC) Centre with the participation of 30 operational air
traffic controllers. In addition to the assessment of the recovery context, the
experimental data are used to assess controller recovery performance using the
recovery variables identified in Chapter 5.
The Chapter starts with the overall framework for the analysis of a unique set of data
on controller recovery performance. This is followed by the analysis of the
characteristics of the sample of controllers participating in the experiment. The Chapter
continues with an assessment of controller recovery performance using three recovery
variables, namely recovery context, duration, and effectiveness. It concludes by
focusing on the outcome of the recovery process, as captured in the experiment.
10.1 Overall framework
The main objective of the experiment conducted in this research is to capture data related specifically to controller recovery from equipment failure in ATC. Based on the experimental set-up (presented in Chapter 9), three experimental sessions were conducted with 30 controllers from a particular ATC Centre, who participated on a voluntary basis. The controllers were asked to complete one emergency training session (based on a simulated Flight Data Processing System (FDPS) failure), followed by a debriefing session.
The framework for the analysis of data collected on controller recovery from an FDPS failure is structured according to Figure 10-1. It starts by assessing the characteristics of the controllers who participated in the experiment. This is followed by a detailed analysis of the recovery variables defined in Chapter 5, their interactions, and other relevant findings obtained from the experiment.
[Figure 10-1: flow diagram from Participants (30 operational air traffic controllers; one particular ATC Centre; simulated Flight Data Processing System (FDPS) failure; age, operational experience, ratings) to Experimental results. Boxes shown: analyses of recovery variables; recovery context and recovery context indicator; analyses of dependent variables (recovery effectiveness, recovery duration); required recovery steps; the recovery phases; analysis of interactions; outcome of the recovery process; other findings (observed behaviour and attitude, additional findings).]
Figure 10-1 Framework for the analysis of experimental results
10.2 Participants
As discussed in section 9.8 (Chapter 9), it is important that statistical representation is
achieved in research that involves sampling of the population. In this case, such
representation is required for the ATC Centre where the experiment was to be carried
out. The main distinguishing characteristics of the controllers are age, operational
experience (i.e. years in service), and rating. This section analyses these characteristics
and relates them to statistical representation.
10.2.1 Age and operational experience
The average age of the controllers who participated in the experiment is 37 years,
ranging from 24 to 58 years. On average, they have more than 12 years of operational
experience, ranging from 2 to 35 years. Figure 10-2 shows the distribution of
operational experience of sampled controllers in terms of the four categories adopted
for the questionnaire survey in Chapter 6. The sample can be considered reasonably representative of the population of controllers in the particular ATC Centre, as all experience categories are represented. The under-representation of controllers with over 30 years of experience is to be expected, as the majority of controllers in this category tend to move to operational support roles (e.g. ATC instructors). This finding is in line with the results of the questionnaire survey (Chapter 6), where there were also fewer respondents with over 30 years of experience.
Figure 10-2 Distribution of operational experience
10.2.2 Ratings
Figure 10-3 presents the distribution of the ratings of the controllers who participated in the experiment. Considering that the training exercise was designed for the approach control (APP) course, it is important to highlight that 20 percent of the participants did not hold an APP rating. However, half of these participants held an ACC rating, which incorporates training in elements of approach control (as part of the low-level ACC course). Although the remaining participants held only a TWR rating, they had just completed an APP course and therefore possessed knowledge of all relevant elements of approach control.
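[Figure 10-3: bar chart of the percentage of participants by rating held. All three ratings (ACC, APP, TWR): 36.7; ACC and APP: 26.7; ACC and TWR: 3.3; APP and TWR: 10; ACC only: 6.7; APP only: 6.7; TWR only: 10. Vertical axis: percent, 0 to 40.]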
Figure 10-3 Distribution of controllers’ ratings
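Because 30 controllers took part, each bar in Figure 10-3 should correspond to a whole number of participants. The following Python sketch back-calculates those counts from the rounded percentages (the assignment of percentages to rating combinations is as read from the figure and is an assumption) and checks the statement above that 20 percent of participants did not hold an APP rating.

```python
# Percentages as read from Figure 10-3 (assumed mapping of bars to rating sets).
RATING_PERCENT = {
    ("ACC", "APP", "TWR"): 36.7,
    ("ACC", "APP"): 26.7,
    ("ACC", "TWR"): 3.3,
    ("APP", "TWR"): 10.0,
    ("ACC",): 6.7,
    ("APP",): 6.7,
    ("TWR",): 10.0,
}
N_PARTICIPANTS = 30

def back_calculate_counts(percent_by_group, n):
    """Convert rounded percentages back to whole participant counts."""
    return {group: round(p * n / 100) for group, p in percent_by_group.items()}

def share_without(counts, rating):
    """Percentage of participants whose set of ratings does not include `rating`."""
    total = sum(counts.values())
    without = sum(c for group, c in counts.items() if rating not in group)
    return 100 * without / total

counts = back_calculate_counts(RATING_PERCENT, N_PARTICIPANTS)
print(counts)
print(share_without(counts, "APP"))  # 20.0
```

The recovered counts (11, 8, 1, 3, 2, 2, 3) sum to 30; the six controllers without an APP rating split into three with an ACC rating and three with only a TWR rating, consistent with the "half of these" statement.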
Since the experiment was conducted in three separate sessions (as discussed in section 10.1), it is important to investigate whether the sampling on all three occasions was appropriate, that is, to show that all three sessions come from the same population of controllers at the ATC Centre and that, aggregated, they represent a proper sample (Table 10-1).
Table 10-1 Characteristics of a sample of controllers participating in the experiment
Variable                                   Session 1          Session 2           Session 3
Age, years (mean, standard deviation)      M=35.9, SD=8.95    M=37.9, SD=10.3     M=37.7, SD=9.73
Experience, years (mean, standard dev.)    M=10.7, SD=6.70    M=14.3, SD=11.08    M=13.7, SD=8.22
Category of experience (frequency)
  1-10 years                               5                  5                   4
  11-20 years                              4                  2                   5
  21-30 years                              1                  2                   0
  31-40 years                              0                  1                   1
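The per-session frequencies in Table 10-1 can be pooled to check that each session contributed ten controllers and to obtain the share of each experience category across all 30 participants, which is the kind of distribution summarised in Figure 10-2. A minimal Python sketch (the function names are illustrative):

```python
# Per-session frequencies of experience categories, copied from Table 10-1.
SESSION_FREQUENCIES = {
    "1-10": (5, 5, 4),
    "11-20": (4, 2, 5),
    "21-30": (1, 2, 0),
    "31-40": (0, 1, 1),
}

def session_sizes(freqs):
    """Number of controllers in each session (column sums of the table)."""
    return [sum(column) for column in zip(*freqs.values())]

def pooled_distribution(freqs):
    """Pool the sessions and return (count, percent) per experience category."""
    pooled = {cat: sum(row) for cat, row in freqs.items()}
    total = sum(pooled.values())
    return {cat: (c, round(100 * c / total, 1)) for cat, c in pooled.items()}

print(session_sizes(SESSION_FREQUENCIES))  # [10, 10, 10]
print(pooled_distribution(SESSION_FREQUENCIES))
```

Pooled, the categories hold 14, 11, 3, and 2 controllers (about 46.7, 36.7, 10.0, and 6.7 percent), which shows the smaller share of controllers with over 30 years of experience discussed in section 10.2.1.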
The non-parametric Mann-Whitney test was used to investigate differences in age and operational experience between controllers from the three experimental sessions. Details of this statistical test are presented in Chapter 6, section 6.7.4. The statistical tests¹, at the 95 percent confidence level, indicated no difference between the three experimental sessions (p>0.05). On this basis, the data were pooled for further analyses.
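For reference, the Mann-Whitney U statistic compares two independent samples through their pooled ranks. The Python sketch below computes U with average ranks for ties. It is an illustration only, not the statistical software used in this thesis. The example ages are hypothetical, and the conversion of U to a p-value is not included.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent samples; U_a + U_b = n_a * n_b."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a

# Hypothetical ages for two small groups (not the experimental data)
print(mann_whitney_u([24, 31, 35], [29, 40, 58]))  # (2.0, 7.0)
```

The smaller of the two U values is the one conventionally reported and compared with critical values or converted to a z score.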
10.3 Assessment of controller recovery performance
The main objective of the research presented in this thesis is to investigate controller
recovery from equipment failures in ATC. The discussions in Chapter 5 concluded that
the assessment of controller recovery needs to cover the recovery context, effectiveness, and duration, followed by the assessment of the outcome of the recovery process. This section assesses each of these variables in turn, continues with an analysis of the interactions between recovery variables, and concludes with a discussion of other relevant experimental findings.
10.3.1 Recovery context
The thesis used a set of RIFs, identified in Chapter 7, to develop a novel approach for
the quantitative assessment of the recovery context through the concept of a recovery
context indicator (presented in Chapter 8). The experiment carried out and presented in
Chapter 9 attempts to verify this approach and its operational benefits. The following
sections adapt the proposed methodology to the particular environment of the ATC
Centre used as a case study. This is achieved in several steps. Firstly, it is necessary
to assess all candidate RIFs and identify those relevant to a particular ATC Centre.
Secondly, the probabilities for each RIF (and its corresponding levels) are defined
based on the controllers’ input during the debriefing sessions. Thirdly, RIF interactions
are assessed and incorporated. Finally, the recovery context indicator is calculated as
a numerical representation of the context surrounding the simulated FDPS failure and
the subsequent controller recovery. These steps are presented in detail in the following
paragraphs.
10.3.1.1 Assessment of relevant RIFs
This step consists of the assessment of the 20 candidate RIFs and their relevance to the experiment and the particular ATC Centre involved. Of these RIFs, ‘adequacy of alarm’ and ‘adequacy of alarm onset’ are not relevant, since there was no alarm/alert in the design of the experiment (see Table 9-7, Chapter 9). There are two reasons for this. Firstly, the experiment in this research is designed to capture controller recovery unaided by system tools, and emphasis is placed on controller readiness to detect and react to an unexpected failure. Secondly, past research has already shown that in most cases the existence of an alert does have a significant impact on recovery performance (Kaarstad and Ludvigsen, 2002; Theis and Straeter, 2001). As a result, 18 RIFs were determined to be relevant to this experiment.
¹ Statistical tests investigated the null hypothesis for experimental sessions 1 and 2, 1 and 3, and 2 and 3, separately.
10.3.1.2 Probabilities of each RIF and the corresponding levels
Based on data collected during the post-experiment debriefing sessions, it was possible to derive probabilities for each RIF and its corresponding levels. The results for all 18 RIFs are presented in Appendix XIV. Furthermore, these probabilities are used to verify the RIF probabilities defined in Chapter 8 using the verification criteria (Table 10-2). In other words, a set of expectations was defined before comparing the RIF probabilities derived for a ‘generic’ ATC Centre (Chapter 8) and for a particular ATC Centre (used in the experiment).
Table 10-2 Verification of RIF probabilities from a ‘generic’ approach (Chapter 8) and the experiment
RIF group: Internal
  Verification criteria: No difference
  Result: No difference, except ‘Communication for recovery’
  Comment: The controllers who participated in the experiment rated their communication mostly as ‘tolerable’, compared to the ATM specialists, who rated it mostly as ‘efficient’. The experience of an equipment failure in the simulated environment may have revealed to the participating controllers some shortcomings in communication for recovery of which the ATM specialists were not aware.
RIF group: Equipment-related
  Verification criteria: No difference
  Result: No difference
  Comment: Note that five of the six RIFs in this group were controlled in the experimental design.
RIF group: External
  Verification criteria: Potential for difference
  Result: No difference, except ‘Adequacy of organisation’
  Comment: The controllers who participated in the experiment rated the organisation in their ATC Centre mostly as ‘tolerable’, while the overall rating from the ATM specialists was mostly ‘efficient’. This is attributed to local ATC Centre characteristics being masked by the more generic characteristics captured by the eight ATM specialists.
RIF group: Airspace-related
  Verification criteria: Potential for difference
  Result: Difference observed for ‘traffic complexity’ and ‘overall task complexity’
  Comment: This is expected, as the experimental design planned for high traffic levels and high overall task complexity (resulting from the simulated equipment failure).
The expected differences in RIF probabilities are a result of the experimental design (e.g. traffic complexity and task complexity) and the overall difference in the populations sampled (i.e. the various ATC Centres sampled in Chapter 8 compared to the single ATC Centre sampled in the experiment). In short, the comparison of RIF probabilities for a ‘generic’ and a particular ATC Centre shows that they are broadly similar.
10.3.1.3 Interactions between RIFs
This step consisted of an assessment and subsequent incorporation of interactions
between identified RIFs, as presented in Table 8-5 (Chapter 8). Based on the
methodology for the quantification of RIFs interactions developed in section 8.4.3 of
Chapter 8, it is possible to determine the coefficient of interaction for the interactions
between 18 relevant RIFs. This coefficient is k=1/(N-1)=1/17=0.059 (where N
represents the total number of relevant RIFs).
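The coefficient k = 1/(N-1) from section 8.4.3 can be evaluated exactly. In the Python sketch below, the optional `factor` argument (a name introduced here for illustration) reproduces the tenfold increase applied later in this chapter to the RIF ‘Adequacy of HMI and operational support’.

```python
from fractions import Fraction

def interaction_coefficient(n_relevant_rifs, factor=1):
    """k = factor / (N - 1): the coefficient applied to RIF interactions."""
    if n_relevant_rifs < 2:
        raise ValueError("at least two RIFs are needed for an interaction")
    return Fraction(factor, n_relevant_rifs - 1)

k = interaction_coefficient(18)
print(k, round(float(k), 3))          # 1/17 0.059
k_hmi = interaction_coefficient(18, factor=10)
print(k_hmi, round(float(k_hmi), 2))  # 10/17 0.59
```

Keeping the value as an exact fraction avoids rounding until the coefficient is reported.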
10.3.1.4 Recovery context indicator (Ic)
This particular study investigated 18 relevant RIFs, of which six are defined via three levels of impact and six via two levels of impact (according to the qualitative descriptors defined in Chapter 7, section 7.3). The remaining six RIFs are defined through only one level, either because the factors were controlled in the experiment or because the participants gave identical answers (for details see Table 10-3 and Chapter 9). In total, this approach generates 3⁶ × 2⁶ = 729 × 64 = 46,656 possible contexts, each defined through the corresponding recovery context indicator.
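The count of 46,656 contexts follows from multiplying the number of levels of each RIF, with the six single-level RIFs contributing a factor of one. The Python sketch below checks this both in closed form and by explicit enumeration. The level indices run from 1 to the number of levels and are placeholders, not the level codes of Table 10-3. The evaluation of Ic for each combination (Chapter 8) is not reproduced.

```python
from itertools import product
from math import prod

# Number of impact levels per relevant RIF in this experiment:
# six RIFs with three levels, six with two, six fixed at a single level.
LEVELS_PER_RIF = [3] * 6 + [2] * 6 + [1] * 6

def count_contexts(levels):
    """Closed-form count: product of the number of levels of each RIF."""
    return prod(levels)

def enumerate_contexts(levels):
    """Yield every combination of level indices (1-based), one tuple per context."""
    return product(*(range(1, n + 1) for n in levels))

print(count_contexts(LEVELS_PER_RIF))                      # 46656
print(sum(1 for _ in enumerate_contexts(LEVELS_PER_RIF)))  # 46656
```

Enumeration is cheap at this size, so each tuple could be passed directly to an Ic calculation to build a distribution such as Figure 10-4.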
Table 10-3 Summary of RIFs defined through a single corresponding level
RIF: Complexity of failure type
  Descriptor: Multiple systems affected; Probability: 1; Level: 3
  Comment: The simulated Flight Data Processing System (FDPS) failure affects multiple systems.
RIF: Time course of failure development
  Descriptor: Sudden failure; Probability: 1; Level: 1
  Comment: The FDPS failure is simulated as a sudden failure.
RIF: Number of workstations/sectors affected
  Descriptor: All workstations; Probability: 1; Level: 3
  Comment: The FDPS failure is simulated to affect the entire ATC Centre.
RIF: Existence of recovery procedure
  Descriptor: Inappropriate; Probability: 1; Level: 3
  Comment: The objective of the experimental investigation was to simulate a failure without a recovery procedure.
RIF: Duration of failure
  Descriptor: Short period of time; Probability: 1; Level: 2
  Comment: The FDPS failure is simulated to last long enough to capture all phases of the recovery.
RIF: Ambiguity of information in the working environment
  Descriptor: External working environment matches the controller’s internal mental model; Probability: 1; Level: 1
  Comment: The controllers responded positively to the question on the match between the external environment and their internal mental model, although they could not say that this match was one hundred percent.
After the calculation of all 46,656 possible contexts, it was determined that the mean value of Ic is 0.029, ranging from -0.088 to 0.121. The distribution of the recovery contexts is presented in Figure 10-4. Based on the shape of the Ic distribution, the data have been fitted with two normal distributions. The result of this fitting is presented in Appendix XV.
[Figure 10-4: histogram; horizontal axis: recovery context indicator (Ic), bins from -0.088 to 0.112; vertical axis: frequency, 0 to 800.]
Figure 10-4 Distribution of the recovery context indicator in the experiment
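The Ic distribution was fitted with two normal distributions, with details given in Appendix XV (not reproduced here). As an illustration of one common way to fit such a bimodal shape, the Python sketch below implements expectation-maximisation (EM) for a two-component normal mixture. It is not necessarily the fitting procedure used in this thesis, and the example data are synthetic.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def fit_two_normals(data, max_iter=500, tol=1e-10):
    """Fit a two-component 1-D normal mixture by expectation-maximisation.

    Returns (weights, means, sigmas). The components start at the lower and
    upper quartiles of the data, each with the overall standard deviation.
    """
    xs = sorted(data)
    n = len(xs)
    overall_mean = sum(xs) / n
    overall_sd = math.sqrt(sum((x - overall_mean) ** 2 for x in xs) / n)
    weights = [0.5, 0.5]
    means = [xs[n // 4], xs[(3 * n) // 4]]
    sigmas = [overall_sd, overall_sd]
    prev_loglik = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility of component 0 for each observation
        resp0 = []
        loglik = 0.0
        for x in xs:
            p0 = weights[0] * normal_pdf(x, means[0], sigmas[0])
            p1 = weights[1] * normal_pdf(x, means[1], sigmas[1])
            total = max(p0 + p1, 1e-300)  # guard against underflow
            resp0.append(p0 / total)
            loglik += math.log(total)
        # M-step: re-estimate weight, mean and standard deviation per component
        for k in (0, 1):
            r = resp0 if k == 0 else [1.0 - g for g in resp0]
            nk = max(sum(r), 1e-12)
            weights[k] = nk / n
            means[k] = sum(ri * x for ri, x in zip(r, xs)) / nk
            var = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, xs)) / nk
            sigmas[k] = math.sqrt(max(var, 1e-12))
        if loglik - prev_loglik < tol:
            break
        prev_loglik = loglik
    return weights, means, sigmas

# Synthetic, well-separated example (not experimental data)
sample = [-0.05 + 0.001 * i for i in range(-10, 11)] + [0.06 + 0.001 * i for i in range(-10, 11)]
weights, means, sigmas = fit_two_normals(sample)
print([round(m, 3) for m in means])
```

EM only finds a local optimum, so for real Ic data the starting values (here the quartiles) and the number of components would need to be checked against the shape of the histogram.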
Using the experimental results, the distribution of the Ic derived in Chapter 8 is
assessed using the verification criteria (Table 10-4). In other words, a set of
expectations was defined before comparing the distribution of Ic for a ‘generic’ ATC
Centre (Chapter 8) and a particular ATC Centre used in the experiment.
Table 10-4 Verification of the distribution of the recovery context indicator obtained from a ‘generic’ approach (Chapter 8) and the experiment
Recovery context indicator (Ic): shape, mean, median, range
  Verification criteria: Potential for difference as a result of the local characteristics of a particular ATC Centre as compared to a ‘generic’ ATC Centre
  Result:
    Shape: the difference is observed in the left tail of the distribution
    Mean: similar²
    Median: similar³
    Range: similar⁴
The main difference observed is the shape of the left tail of the distribution. This cannot be explained by the difference in RIF probabilities, as the previous section showed that they differed for only two RIFs as a result of the characteristics of the experimental design. Therefore, it is assumed that the shape of the left tail resulted from the local characteristics of the ATC Centre used in the experiment (Figure 10-4). Although these characteristics may have existed in the distribution of Ic obtained for a ‘generic’ ATC Centre (Chapter 8), they may have been masked by the ‘generic’ approach.
In particular, the deviation in the left tail may be caused by the incorporation of a single coefficient of interaction for all RIFs, as discussed in section 8.4.3 of Chapter 8. Although it is known from operational experience that RIF interactions do not all have the same level of influence, this thesis had to adopt a more generic approach to account for the lack of operational data.
The assumption that the change in the shape of the Ic distribution (in the left tail) results from a single value of the coefficient of interaction, which is no longer capable of properly accounting for local characteristics, is further assessed using the example of the RIF ‘Adequacy of HMI and operational support’. This RIF is chosen because the interaction matrix (Table 8-26, Chapter 8) indicates that it affects several other RIFs; a change in its coefficient of interaction may therefore have a significant impact on the Ic distribution. Accordingly, the coefficient of interaction relevant to this RIF is increased by a factor of 10, from the previous value of k=1/(N-1)=1/17=0.059 (section 10.3.1.3) to a new value of k=10/(N-1)=10/17=0.59. The resulting distribution of Ic, presented in Figure 10-5, shows a notable change in the shape of the left tail.
² The mean value of Ic is 0.027 for a ‘generic’ ATC Centre and 0.029 for the ATC Centre used in the experiment.
³ The median value of Ic is -0.023 for a ‘generic’ ATC Centre and -0.026 for the ATC Centre used in the experiment.
⁴ The range of Ic values is from -0.069 to 0.131 for a ‘generic’ ATC Centre and from -0.088 to 0.121 for the ATC Centre used in the experiment.
[Figure 10-5: histogram; horizontal axis: recovery context indicator (Ic), bins from -0.088 to 0.092; vertical axis: frequency, 0 to 800.]
Figure 10-5 Distribution of the recovery context indicator in the experiment with an increased value of the coefficient of interaction
In short, the comparison of the distribution of Ic obtained for a ‘generic’ ATC Centre and for the particular ATC Centre shows no notable difference in the mean, median, and range, only in the shape of the left tail. This difference in shape has been attributed to the inadequate definition of the coefficient of interaction. As previously discussed in Chapter 8, a more accurate definition of this coefficient will be possible once a detailed database of human performance becomes available in the ATM industry.
While the controllers’ responses provided a basis for defining the recovery context indicator (Ic) for each possible recovery context, it was also possible to define an indicator for each controller. In several cases, participants were not able to select the corresponding level for some RIFs. For example, in the case of the RIF ‘weather conditions during the recovery process’, several controllers were so preoccupied with the recovery process that they did not pay any attention to the weather conditions, and were therefore unable to select the appropriate level for this RIF. The missing responses were informed by those available for this RIF; that is, they were replaced with the answer ‘unchanged’ (corresponding to Level 2) reported by the majority of controllers. This is also in line with the actual design of the experiment, in which similar weather conditions were presented to the controllers in the pre- and post-failure periods. A similar approach was applied to other missing answers.
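The replacement rule described above, in which missing answers take the level reported by the majority of controllers, amounts to mode imputation. The following is a minimal Python sketch that assumes levels are coded as integers and missing answers as `None`. The answers in the example are hypothetical.

```python
from collections import Counter

def impute_with_majority(responses):
    """Replace missing answers (None) with the most frequently reported level.

    `responses` holds one entry per controller for a single RIF. Ties between
    equally frequent levels go to the level encountered first.
    """
    reported = [r for r in responses if r is not None]
    if not reported:
        raise ValueError("no reported levels to impute from")
    majority_level, _ = Counter(reported).most_common(1)[0]
    return [majority_level if r is None else r for r in responses]

# Hypothetical answers for 'weather conditions during the recovery process'
weather = [2, 2, None, 1, 2, None, 3, 2]
print(impute_with_majority(weather))  # [2, 2, 2, 1, 2, 2, 3, 2]
```

Mode imputation keeps every controller in the per-controller Ic calculation, at the cost of slightly overstating agreement on the imputed RIF.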
Figure 10-6 shows the distribution of recovery contexts for the 30 controllers. All values of Ic are positive and range between 0 and 0.1. This reflects an average or tolerable environment (values of Ic close to 0) with potential for improvement to facilitate better recovery from equipment failure.
Figure 10-6 Distribution of the recovery context indicator of 30 controllers
After the assessment of recovery contexts surrounding each controller, the next section
reviews the potential solutions to enhance the recovery context (and thus controller
recovery) using the methodology developed in Chapter 8. In other words, the next
section analyses the sensitivity of the Ic to changes in RIFs.
10.3.1.5 Optimal solutions
When searching for areas of potential enhancement to improve the controller’s recovery process, it is necessary to focus on RIFs that can be influenced at the level of the ATC Centre. Table 10-5 reviews the RIFs and identifies nine that could be enhanced, based on the responses of the controllers who participated in the experiment and the characteristics of the ATC Centre investigated.
Table 10-5 A review of RIFs with the potential for recovery enhancement
RIF                                          Potential for improvement
Internal RIFs
  Training for recovery                      √
  Previous experience                        -
  Experience with system performance         -
  Personal factors                           √
  Communication for recovery                 √
Equipment failure related RIFs
  Complexity of failure type                 -
  Time course of failure development         -
  Number of workstations affected            -
  Time necessary to recover                  √
  Existence of recovery procedure            √
  Duration of failure                        -
External RIFs
  Adequacy of HMI
  Ambiguity of information
  Adequacy of organisation
Figure 10-7 Recovery steps performed by each participant
6 Note that if a controller did not seek failure-related information from the coordinator, the
coordinator was advised to inform the controller but only after the controller detected the failure. As a result, the occurrence of this step is inevitable.
Figure 10-8 Distribution of required recovery steps (S1 to S17)
Further data analysis shows that, on average, each controller performed 74.2 percent of the required recovery steps, ranging from as low as 29 percent to 100 percent. The most neglected steps were the re-identification of all traffic (S14) and the confirmation of Mode C (i.e. confirmation of the accuracy of the post-restoration FDPS data, S15). These post-restoration steps of re-identifying traffic and validating Mode C are important, as they are considered best practice to ensure system safety in the aftermath of an FDPS failure. The re-identification process is necessary for two reasons. Firstly, the identification of traffic is lost whilst aircraft occupy a holding pattern: separation in a holding pattern is purely procedural and radar separation does not apply. Secondly, there is potential for label swapping and garbling of radar signals when aircraft are in close lateral proximity (e.g. in a holding pattern).
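The percentage of required steps performed is a simple ratio over the 17 required steps (S1 to S17 in Figure 10-8). With 17 steps, the reported minimum of 29 percent presumably corresponds to 5 steps (5/17 ≈ 29.4 percent). In the Python sketch below, the step lists are hypothetical. `neglect_counts` is an illustrative helper for identifying the most neglected steps, such as S14 and S15.

```python
REQUIRED_STEPS = [f"S{i}" for i in range(1, 18)]  # S1 to S17

def percent_performed(performed, required=REQUIRED_STEPS):
    """Share of the required recovery steps that a controller actually performed."""
    done = set(performed) & set(required)
    return 100 * len(done) / len(required)

def neglect_counts(per_controller, required=REQUIRED_STEPS):
    """Number of controllers who omitted each required step."""
    performed_sets = [set(steps) for steps in per_controller]
    return {s: sum(s not in p for p in performed_sets) for s in required}

print(round(percent_performed(["S1", "S2", "S3", "S4", "S5"]), 1))  # 29.4
print(percent_performed(REQUIRED_STEPS))                          # 100.0
```

Averaging `percent_performed` over all controllers gives the mean share of steps performed, and sorting the output of `neglect_counts` ranks the steps by how often they were omitted.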
Further investigation of the percentage of steps performed in the three sessions reveals a significant difference between the first and the third session: the percentage of steps carried out in the first session is significantly lower than in the third. On average, controllers performed 64 percent of the recovery steps in the first experimental session, 77 percent in the second, and 82 percent in the third (Table 10-7).
Table 10-7 Percentage of performed recovery steps in three experimental sessions
Session  Statistics           Paired sessions  Non-parametric Mann-Whitney test result
1        M=63.98, SD=21.69    1 and 2          p>0.05
2        M=77.06, SD=17.64    1 and 3          p=0.044, significant (U=23.5, z=-2.0)
3        M=81.77, SD=12.84    2 and 3          p>0.05
After the last experimental session, it was suspected that certain changes had been implemented in the training of controllers in the participating ATC Centre. The debriefing session with controllers participating in the third experimental session and input from management revealed the incorporation of a compulsory emergency training module within every rating conversion and continuation training course. This change was first incorporated in the SID/STAR training that started in May 2006. As a result, several controllers participating in the third experimental session (held in June 2006) benefited from this change. It seems likely that this change in the training syllabus contributed to the increased number of recovery steps performed and to the significant difference observed when compared to the first experimental session.
Statistical tests performed to determine the relationship between the percentage of recovery steps performed and the 18 RIFs showed that only RIF2 (‘previous experience with equipment failures’) has a statistically significant correlation. More precisely, the negative correlation identified (r=-0.31) indicates that controllers who have experienced equipment failures tend to perform more of the required recovery steps than those who have not. In other words, experience with equipment failures appears to enhance the controllers’ ability to recover. This finding should be reflected in the training syllabus of every ATC Centre.
10.3.3 Recovery effectiveness
As explained in the previous Chapter, this variable is based on data and information from three different sources, with each controller categorised as very good (VG), good (G), adequate (A), partially adequate (PA), or inadequate (I). The recovery performance of 43 percent of controllers is rated as partially adequate or inadequate (Figure 10-9). These controllers did not ensure ATC system protection from possible further equipment degradation and did not employ timely and accurate strip marking and strip board management. Therefore, they had little or no means of supporting their mental picture of traffic and airspace. The post-restoration steps were performed only to a basic extent, without any proper check of the accuracy of the new data. In addition, such a high percentage of inadequate performance indicates that there is room for improvement throughout the ATC Centre participating in this experimental investigation. The management of the ATC Centre should implement solutions to ensure more efficient handling of unusual/emergency situations. Such solutions could include emergency training on equipment failures, the design of recovery procedures, and regular briefings.
Figure 10-9 Distribution of recovery effectiveness per category (presented via frequencies and relative percentages)
Comparison of the recovery effectiveness for the three experimental sessions does not reveal any significant differences (using the non-parametric Mann-Whitney test). In spite of the change implemented in the participating ATC Centre (i.e. the compulsory emergency training module within the SID/STAR conversion training) and the increase in the number of recovery steps performed, the effectiveness of recovery performance did not differ from one session to another. This finding suggests that the rating of recovery effectiveness does not depend on a simple count of recovery steps performed, and it further justifies the use of pooled data from all three experimental sessions. Recovery effectiveness indicates the overall objective achieved through the execution of those steps, but without accounting for the time frame (recovery duration) within which the objective is achieved. The combined effect of recovery effectiveness and recovery duration is assessed in section 10.3.5.
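Because the effectiveness categories are ordered, they can be coded as ordinal scores before a rank-based test such as Mann-Whitney is applied. The coding from 5 (VG) to 1 (I) is an assumption made for this Python sketch. Of 30 controllers, 43 percent corresponds to 13 (13/30 ≈ 43.3 percent). The split of those 13 between PA and I in the example is hypothetical.

```python
from collections import Counter

# Ordinal coding of the five effectiveness categories (higher = better); assumed.
EFFECTIVENESS_SCORE = {"VG": 5, "G": 4, "A": 3, "PA": 2, "I": 1}

def to_scores(ratings):
    """Map category labels to ordinal scores so that they can be ranked."""
    return [EFFECTIVENESS_SCORE[r] for r in ratings]

def share_below_adequate(ratings):
    """Percentage of controllers rated partially adequate (PA) or inadequate (I)."""
    counts = Counter(ratings)
    return 100 * (counts["PA"] + counts["I"]) / len(ratings)

# Hypothetical ratings for 30 controllers (13 rated PA or I)
ratings = ["PA"] * 8 + ["I"] * 5 + ["A"] * 17
print(round(share_below_adequate(ratings), 1))  # 43.3
print(to_scores(["VG", "A", "I"]))              # [5, 3, 1]
```

Only the order of the scores matters for a rank-based test, so any increasing coding would give the same Mann-Whitney result.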
10.3.4 Recovery duration
The recovery duration is the time measured from the controller’s first action to the end of the recovery process. During the experiment, the first action was identified through observation and video recording of each controller’s performance, and further validated with the controller (during the post-experiment debriefing session) and the SME. For example, the time of the first action was the moment when a controller initiated a search for the uncorrelated track(s), contacted the Area Control Centre (ACC) to check on the uncorrelated track(s), or contacted aircraft to ask for a transponder check (using the phraseology “squawk ident”). The end of the recovery process in this particular experimental design was influenced by the restoration of the failed system and the performance of the necessary post-restoration steps.
In general, the recovery duration ranged between 12:08 and 15:49 minutes, with an average duration of 14:38 minutes (SD=0:55). The distribution of the recovery duration of all 30 controllers across four duration categories (12-13, 13-14, 14-15, and 15-16 minutes) is presented in Figure 10-10. Because the end of the recovery process was tied largely to the restoration of the failed system, a longer recovery duration corresponds to an earlier first recovery action. Figure 10-10 shows that 50 percent of controllers initiated the first recovery action within the first minute of the failure occurrence (and thus their recovery duration lasted between 15 and 16 minutes). The shortest recovery duration was recorded for two controllers (6.7 percent; Figure 10-10). These two controllers, although initiating recovery later than the others, implemented an excellent recovery strategy. This finding highlights that neither recovery duration nor recovery effectiveness alone is an appropriate indicator of the overall recovery outcome. To enable a safety assessment of the recovery performance, it is necessary to account for both, as presented in section 10.3.5.
Figure 10-10 Distribution of recovery duration
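Durations in this section are reported as minutes:seconds. The Python sketch below converts them to seconds and assigns them to the one-minute categories used in Figure 10-10. It also shows why an earlier first action yields a longer duration when the end of recovery is tied largely to system restoration. The timestamps in the last example (first action at 45 s and end of recovery at 960 s after failure onset) are hypothetical.

```python
def to_seconds(mm_ss):
    """Convert a 'mm:ss' duration string to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)

def to_mm_ss(total_seconds):
    """Format a number of seconds as 'm:ss'."""
    minutes, seconds = divmod(round(total_seconds), 60)
    return f"{minutes}:{seconds:02d}"

def duration_category(mm_ss, edges_min=(12, 13, 14, 15, 16)):
    """Assign a duration to one of the one-minute categories 12-13 ... 15-16."""
    minutes = to_seconds(mm_ss) / 60
    for lo, hi in zip(edges_min, edges_min[1:]):
        if lo <= minutes < hi:
            return f"{lo}-{hi}"
    raise ValueError(f"{mm_ss} lies outside the categories")

def recovery_duration(first_action_s, end_of_recovery_s):
    """Duration from the controller's first action to the end of recovery."""
    return end_of_recovery_s - first_action_s

print(duration_category("12:08"), duration_category("15:49"))  # 12-13 15-16
print(to_seconds("14:38"))                                     # 878
print(to_mm_ss(recovery_duration(45, 960)))                    # 15:15
```

Working in seconds avoids arithmetic errors with the mm:ss notation when means and standard deviations are computed.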
Comparison of the recovery duration for the three experimental sessions revealed significant differences. More precisely, the recovery duration in the third experimental session is significantly longer than in the first two sessions (Table 10-8). This reflects the controllers from the third session reacting to the identified failure more promptly than the controllers from the previous two sessions, which may be the result of the change in training implemented by the management of the participating ATC Centre prior to the third session. However, it should be noted that a more prompt reaction to the identified failure (i.e. a longer recovery duration) does not necessarily entail an effective recovery.
Table 10-8 Comparison of recovery durations between three experimental sessions
Session | Mean (min:s) | SD (min:s)
1 | 14:15 | 1:02
2 | 14:25 | 0:58
3 | 15:14 | 0:18
Paired sessions | Non-parametric Mann-Whitney test result
1 and 2 | p>0.05 (not significant)
1 and 3 | p=0.031, significant (U=21.5, z=-2.2)
2 and 3 | p=0.014, significant (U=17.5, z=-2.5)
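The z values in Table 10-8 are consistent with the usual normal approximation to the Mann-Whitney U statistic. The sketch below assumes ten controllers per session (30 controllers over three sessions) and applies no tie correction, so it is a check on the reported values rather than a reproduction of the statistical package used in the thesis; the function name is hypothetical.

```python
import math


def mann_whitney_z(u: float, n1: int, n2: int) -> tuple[float, float]:
    """Normal approximation to the Mann-Whitney U test (no tie correction).

    Returns the z statistic and the two-tailed p-value.
    """
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    # Two-tailed p-value P(|Z| >= |z|) for a standard normal Z.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

Under these assumptions, mann_whitney_z(21.5, 10, 10) gives z of about -2.15 and p of about 0.031, and mann_whitney_z(17.5, 10, 10) gives z of about -2.46 and p of about 0.014, matching the rounded values reported in the table.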
Non-parametric Kendall’s tau tests performed between recovery duration and various RIFs reveal four statistically significant correlations. These are presented in Table 10-9, while the details of the test are discussed in Chapter 6. Firstly, the analysis shows that the recovery duration tends to be longer (see footnote 7) if the last emergency training included a module on equipment failures. This finding indicates the benefit of emergency training for recovery duration, as it prepares controllers to react rapidly to an emergency situation. Secondly, a similar relationship is seen between recovery duration and enhanced communication for recovery. One interpretation is that controllers who initiate recovery sooner have more time to communicate the problem adequately to team members or a supervisor. Thirdly, the existence of adequate recovery procedures is associated with prompt recovery action, which is in line with the finding of the first test. Finally, recovery duration increases as traffic complexity decreases. This is expected, as a less demanding traffic situation allows the controller to initiate the first recovery action sooner rather than later.
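Table 10-9 Statistical tests and results
Variable 1 | Variable 2 | Test | Statistical significance at 95% confidence level
Recovery duration | Last emergency training (module on equipment failure) | Non-parametric correlation (Kendall’s tau) | p=0.018 (r=-0.39)
Recovery duration | Communication for recovery | Non-parametric correlation (Kendall’s tau) | p=0.10 (r=-0.39)
Recovery duration | Existence of the recovery procedure (see footnote 8) | Non-parametric correlation (Kendall’s tau) | p=0.15 (r=-0.41)
Recovery duration | Traffic complexity | Non-parametric correlation (Kendall’s tau) | p=0.004 (r=-0.46)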
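Kendall’s tau suits these tests because several RIFs are ordinal ratings with many tied values. A minimal Python sketch of the tau-b variant, which adjusts for ties, is given below. It is an illustration only: it computes the coefficient but not the p-value, the function name is hypothetical, and the thesis does not state which software or tau variant was used.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equal-length sequences.

    Ties are allowed (useful for ordinal ratings); no p-value is computed.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    untied_x = untied_y = 0  # pairs not tied on x, and pairs not tied on y
    for i, j in combinations(range(len(x)), 2):
        dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
        dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
        if dx != 0:
            untied_x += 1
        if dy != 0:
            untied_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    if untied_x == 0 or untied_y == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)
```

For example, kendall_tau_b([900, 880, 820, 760], [1, 2, 3, 3]) returns about -0.91 for these hypothetical durations (in seconds) and traffic complexity ratings: longer durations pair with lower complexity, as in the negative correlation reported above.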
After assessing both recovery effectiveness and recovery duration, it is clear that neither is, on its own, an appropriate indicator of the recovery outcome, as discussed in Chapter 5. Therefore, a safety assessment of the overall recovery performance requires both variables to be combined into the ‘outcome of the recovery process’ presented in the following section.
10.3.5 Outcome of the recovery process
The outcome of the recovery process represents the final stage in technical and controller recovery, as previously discussed in section 5.3 of Chapter 5. Since technical recovery was not taken into account in this experiment, the outcome of the recovery process focuses solely on the outcome of controller recovery. It is defined as a combination of two recovery variables. Firstly, recovery effectiveness accounts for the recovery steps carried out by a controller and the achievement of the three key objectives (i.e. ATC system protection, maintenance of situational awareness, and adequate post-restoration steps). Secondly, recovery duration accounts for the time frame in which these steps were performed. In line with the discussion in Chapter 5, the outcome of the recovery process distinguishes between successful and unsuccessful recovery. An additional category for a ‘tolerable’ recovery outcome is also defined in this thesis (Table 10-10).
7 A more prompt first recovery action by a controller corresponds to a longer recovery duration.
8 There is no recovery procedure for the simulated equipment failure in the participating ATC Centre, but some controllers stated that they had experienced similar failures as part of their initial simulator training. Discussion with the subject matter expert revealed that this particular equipment failure is not simulated in any training syllabus.
Table 10-10 The outcome of the recovery process matrix applicable to the experimental set-up presented in this thesis (S stands for successful, T for tolerable, and U for unsuccessful recovery)
Recovery effectiveness / recovery duration (minutes) | 12-13 | 13-14 | 14-15 | 15-16
Very good | T | T | S | S
Good | T | T | T | S
Adequate | U | T | T | T
Partially adequate | U | U | T | T
Totally inadequate | U | U | U | T
The recovery outcome matrix highlights that successful recovery requires the initiation of the recovery process within the first two minutes of the failure occurrence and the performance of the majority of the recovery steps (assuring achievement of all three objectives). An unsuccessful recovery results from a controller failing to achieve two or more key objectives while initiating the recovery more than one minute after the failure occurrence. The delayed first recovery action leaves the ATC system completely unprotected. Therefore, the temporal requirements for unsuccessful recovery span three categories of the recovery duration variable (Table 10-10). Everything outside the scope of successful and unsuccessful recovery is considered tolerable. This categorisation applies only to this experimental time frame and setting; it was derived from operational experience and further validated by the subject matter expert (SME).
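Because Table 10-10 is a fixed lookup over two categorical variables, it can be encoded directly as data. The Python sketch below is a hypothetical illustration of applying the matrix to a controller's assessed recovery effectiveness and duration band; the names are assumptions, and, as stated above, the matrix applies only to this experimental set-up.

```python
# Outcome matrix from Table 10-10: rows are recovery effectiveness levels,
# columns are recovery duration bands in minutes.
DURATION_BANDS = ("12-13", "13-14", "14-15", "15-16")

OUTCOME_MATRIX = {
    "Very good":          ("T", "T", "S", "S"),
    "Good":               ("T", "T", "T", "S"),
    "Adequate":           ("U", "T", "T", "T"),
    "Partially adequate": ("U", "U", "T", "T"),
    "Totally inadequate": ("U", "U", "U", "T"),
}

LABELS = {"S": "successful", "T": "tolerable", "U": "unsuccessful"}


def recovery_outcome(effectiveness: str, duration_band: str) -> str:
    """Look up the outcome of the recovery process for one controller."""
    row = OUTCOME_MATRIX[effectiveness]  # KeyError for an unknown level
    code = row[DURATION_BANDS.index(duration_band)]  # ValueError for an unknown band
    return LABELS[code]
```

For instance, recovery_outcome("Good", "15-16") returns "successful", whereas recovery_outcome("Good", "14-15") returns "tolerable". Keeping the matrix as data rather than nested conditions makes it easy to adapt if the categories are recalibrated for another setting.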
Based on the presented categorisation, the outcome of the recovery process for
controllers who participated in the experiment is mostly tolerable (Figure 10-11). This
finding again confirms that there is room for improvement of the recovery performance
in the ATC Centre used in this experiment.
Figure 10-11 Distribution of the recovery outcome
After assessing all recovery variables, the next section identifies any relevant
interactions between them.
10.3.6 Interactions
This section investigates the level of interactions between the recovery variables using
statistical testing (previously discussed in Chapter 6). Table 10-11 presents the results.
Table 10-11 Statistical tests and results (non-parametric Kendall’s tau test)
Variable 1 | Variable 2 | Statistical significance at 95 percent confidence level
Recovery context indicator | Recovery effectiveness | p=0.06, r=0.329 (see footnote 9)
Recovery context indicator | Outcome of the recovery process | p=0.017, r=-0.36
Recovery effectiveness | Outcome of the recovery process | p=0.01, r=0.57
Recovery duration | Outcome of the recovery process | p>0.05
Non-parametric Kendall’s tau tests indicated three significant relationships (Table 10-11). Firstly, a statistical test indicates a relationship between recovery effectiveness and the recovery context indicator at the 90 percent confidence level (p=0.06, r=0.329). Furthermore, a non-parametric Mann-Whitney test shows a relationship between the recovery context indicator and recovery effectiveness when the combined ‘very good’ and ‘good’ categories are compared with the combined ‘partially adequate’ and ‘totally inadequate’ categories (at the 90 percent confidence level, p=0.065). Secondly, a statistical test indicates a significant relationship between the recovery context indicator and the outcome of the recovery process at the 95 percent confidence level (p=0.017, r=-0.36). In other words, higher values of the recovery context indicator are associated with a better outcome of the recovery process, i.e. greater recovery success. Finally, a statistical test indicates a significant relationship between recovery effectiveness and the outcome of the recovery process (p=0.01, r=0.57). In other words, the greater the controller’s recovery effectiveness, the more successful the overall recovery. All findings are in line with operational experience.
9 Statistical significance at the 90 percent confidence level.
10.3.7 Other findings
In addition to the findings above, the following points are worthy of note. They are presented by considering, firstly, the phases of recovery and the corresponding influencing factors; secondly, the behaviour and attitude of the controllers, given that the simulated failure was unexpected; and finally, additional findings on controller recovery that are relevant to the management of the participating ATC Centre and to the wider aviation community.
10.3.7.1 The recovery phases
The following paragraphs provide a review of the three distinct recovery phases as
explained in Chapter 5, section 5.2. This review focuses on the factors that influenced
controller recovery performance in each phase.
10.3.7.1.1 Detection
In the simulated runs, detection (the recognition that something unusual is happening in the ATC system) was determined by several factors. The most prominent factor was the pilot's first contact with ATC. Following failure injection, two flights entered the approach sector simultaneously. Depending on the pseudo-pilots’ workload, either of these aircraft could contact the controller first. At the moment of first contact, the flights were still outside the controller’s area of responsibility (some 40Nm from the airport; see footnote 10), and controllers were busy providing approach control service in the vicinity of the airport. As a result, the aircraft were usually asked to stand by for radar identification. If the first aircraft with an uncorrelated track made contact late (when its track was almost visible on the radar screen, at about 35Nm from the airport), controllers searched for the track and detection of the problem was then immediate. The common factors that influenced the detection phase of the recovery process in this experiment were determined from observations, video recordings, and debriefings. These are as follows:
• The first radiotelephony (RT) contact by an aircraft with an uncorrelated track;
• Traffic complexity and the related level of controller workload at the moment of contact;
• Display range (set at 30Nm for this experiment);
• Type of equipment failure (uncorrelated tracks were immediately visible on the screen once within radar range); and
• Complexity of failure type (affecting a single item or multiple items of equipment simultaneously).
10 Note that the display range in this experiment was set to 30Nm for each controller.
It should be noted that the same set of factors also affected the instant of the first recovery action, since detection is a prerequisite for the first recovery action.
10.3.7.1.2 Diagnosis
In this experiment, after detecting one uncorrelated track, the controller’s first assumption was usually an aircraft transponder failure. This prompted a request to the pilot to squawk identification on the secondary transponder (i.e. to operate the designated Mode A code on the primary/secondary transponder). When this check did not produce a correlated track on the radar screen, further checks were necessary. At this stage, the second aircraft was usually well inside the radar display range and also uncorrelated. At this point, it became obvious to the controllers that they were experiencing some form of equipment failure, and they sought information from the ATC Centre coordinator as to the nature of the failure. The possible options were a secondary surveillance radar (SSR) failure or an FDPS failure. SSR failure was discounted as soon as a mix of correlated and uncorrelated tracks was visible, leaving FDPS failure as the remaining option. The coordinator was instructed to announce an FDPS failure affecting the entire ATC Centre. He also emphasised that flight plan tracks would remain correlated only for tracks already displayed, while all other tracks entering the system would appear uncorrelated. The common factors that influenced the diagnosis stage of the recovery process in this experiment were determined from observations, video recordings, and debriefings. These are as follows:
• The number of uncorrelated tracks observed on the radar display;
• Input by the coordinator;
• Type of equipment failure; and
• Complexity of failure type.
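The diagnostic sequence described above (an aircraft transponder check, followed by discrimination between SSR and FDPS failure) can be summarised as a small decision function. The Python sketch below is a simplified, hypothetical rendering of the reasoning observed in this experiment, not an operational procedure. The function and parameter names are illustrative, and the coordinator’s input, which confirmed the diagnosis in practice, is not modelled.

```python
def diagnose(uncorrelated_tracks: int,
             squawk_check_correlated: bool,
             correlated_tracks_visible: bool) -> str:
    """Hypothetical sketch of the diagnostic reasoning observed in the runs.

    Mirrors the sequence in the text only; it is not an ATC procedure.
    """
    if uncorrelated_tracks == 0:
        return "no anomaly detected"
    if squawk_check_correlated:
        # The squawk check restored correlation: an aircraft transponder problem.
        return "aircraft transponder problem"
    if uncorrelated_tracks == 1:
        # A single track and a failed check: more evidence is needed.
        return "suspected transponder failure; further checks needed"
    # Several uncorrelated tracks point to a ground equipment failure.
    if correlated_tracks_visible:
        # A mix of correlated and uncorrelated tracks rules out SSR failure.
        return "FDPS failure"
    return "SSR or FDPS failure; seek information from the coordinator"
```

For example, diagnose(2, False, True) returns "FDPS failure", which corresponds to the point at which controllers in the experiment could discount SSR failure.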
10.3.7.1.3 Correction
In the exercised traffic scenario, the correction phase consisted of identifying all traffic using an appropriate primary radar technique. A number of techniques are available to identify traffic. Those chosen by the controllers in this experiment were confirmation of the bearing/distance of the aircraft from a fix and the turn method (turning a single aircraft by 30 degrees or more to ascertain positive radar identification). Operationally, the bearing/range technique is considered more effective and expeditious, as it avoids misidentification due to more than one aircraft turning simultaneously. The next step in this process would be to inform all traffic of the exact nature of the equipment failure and to advise them of possible consequences (i.e. restrictions and delays). This would be followed by restricting any sport/training or non-commercial aircraft, refusing departing aircraft permission to depart, and using holding patterns for all arrivals. If the failure persisted (in this experiment it lasted 15 minutes), the controllers had to consider the steps needed to assure system safety in case equipment reliability deteriorated further. Thus, they had to provide vertical separation and preserve the highest level of situational awareness, which should be achieved by maintaining accurate and timely strip marking and strip board management (see footnote 11). The common factors that influenced the correction stage of the recovery process were determined from observations, video recordings, and debriefings. These are as follows:
• Traffic complexity;
• Existence of, and familiarity with, the recovery procedure(s);
• Duration of failure;
• Type of equipment failure; and
• Complexity of failure type.
Figure 10-12 links the key characteristics of each recovery phase in this particular
experiment with the recovery steps relevant for each phase.
11 The debriefing sessions investigated the overall quality of strip management and annotation without a more detailed analysis. In future, the debriefing sessions could place more emphasis on this segment of the recovery process.
Figure 10-12 Recovery phases, their corresponding influencing factors and required recovery steps
10.3.7.2 Observed behaviour and attitude
As discussed in Chapter 9, all observations of the controllers’ attitude and behaviour were recorded by the assistant. A checklist based on the SHAPE list of attitudes was used as an initial tool to guide the assistant in this task (see EUROCONTROL, 2004f). In addition, some of the observations were captured during the debriefing sessions.
In general, the observations from the first two experimental sessions show a difference in overt behaviour between the pre- and post-failure segments of the experimental investigation. In line with the results obtained for other recovery variables, the analysis of the data on controllers participating in the third session did not reveal significant changes in overt behaviour between the pre- and post-failure segments. Furthermore, the findings from the first two sessions are in line with previous findings on the consequences of stress for individual controllers (Costa, 1995). Whilst for some controllers the overall posture remained the same throughout the exercise, others changed markedly. The deviations from pre-failure behaviour involved the following:
• increased movement (i.e. overall posture, hands, feet, or head);
• forceful displacement of the strip holders;
• deviations from standard RT phraseology;
• hesitation in RT communication; and
• change in pitch or tone of voice.
The subject matter expert involved confirmed that most of these behavioural gestures reflect a typical reaction to a degraded mental picture of the traffic or reduced overall situational awareness. Even during the debriefing stage of the experiment, the change in the controllers’ behaviour was noticeable in the first two experimental sessions, for example a shaky voice, overall unease, high alertness, and seriousness. The controllers who performed the recovery process at either tolerable or good levels were noticeably more relaxed and talkative. In contrast, the controllers who performed at either partially adequate or inadequate levels were without exception more nervous and more reluctant to answer questions in detail or to carry out an objective review of their own performance. The overall conclusion is that the equipment failure was an unexpected event that contributed to a significant increase in controller workload (as reported subjectively by the participating controllers).
10.3.7.3 Additional findings
The remaining findings are presented here because they raise important issues for the management of the participating ATC Centre as well as for the wider aviation community. They are set out in the following paragraphs.
Although 73 percent of the controllers reported that their training was suitable for the equipment (i.e. FDPS) failure and traffic scenario in question, analysis of the data collected in the experiment showed that 43 percent of these controllers had received their last emergency training more than a year before the experiment (see footnote 12). Of the controllers who were able to recall it, 50 percent stated that their last emergency training session had included a module on equipment failures, predominantly radar failures. However, it was also noted that 40 percent of the controllers had not encountered any type of equipment failure in their last emergency training. As a result, 93 percent of the controllers who participated in the experiment reported that they would like more frequent training for unusual situations. The most desired frequency of emergency training sessions was every six months. This is in line with the findings of the questionnaire survey (Chapter 6), in which 45 percent of controllers stated that recurrent training once a year is not enough to develop and maintain the level of proficiency required for recovery from equipment failures.
12 Note that 27 percent of controllers had their last emergency training in the month prior to this experiment, as part of the approach rating course.
Interesting results were obtained on the question of whether a recovery procedure existed for the simulated FDPS failure. Although no procedure for this kind of failure exists in the Manual of Air Traffic Services (MATS), 20 percent of controllers believed that it did. Some of the controllers who had participated in the approach control course cited their training manual as the reference for this procedure. However, no evidence was found to support their statement. The most plausible explanation is that these controllers equated Secondary Surveillance Radar (SSR) failure with FDPS failure and relied on their recent radar fallback training, without fully understanding the implications of the loss of FDPS. The consequences of an FDPS failure differ significantly from those of a simple SSR failure: it is a more serious failure that requires immediate attention from appropriately skilled controllers.
On the issue of the Human Machine Interface (HMI) and operational support (e.g. auxiliary display, communication panel), 46.7 percent of controllers found the Beginning to End Skills Trainer (BEST) simulator platform suitable for the equipment failure and traffic scenario in question, 36.7 percent found it tolerable, and ten percent found it counterproductive. The remaining 6.7 percent of the controllers did not respond to this question. However, most of the controllers stated that the BEST platform’s HMI is not as good as the HMI used in the operational centre, for two reasons. Firstly, meteorological data needs better positioning (i.e. closer to the screen) to avoid head turns and changes of visual field; and secondly, there is no alert or warning that a failure has occurred (i.e. a colour change to yellow or red in the ‘general information window’).
Several organisational issues were raised during the debriefing sessions. The most frequently raised issues were that controllers:
• felt that supervisors should receive more dedicated training in the handling of unusual occurrences and system failures, and that their role in coordinating recovery actions should be more proactive. In addition, it was highlighted that coordination with technical services and adjacent ATC Centres should be the primary responsibility of the supervisor during a Centre crisis;
• felt that more emphasis could be placed on developing an understanding of the separate roles of controllers and engineers. This perceived lack of understanding of each peer group’s functions and tasks can create communication difficulties in the operational environment;
• identified a need to update the MATS with regard to on-suite task allocation between the executive and planning controllers. Additionally, controllers stated that team-related issues contributed to the last three incidents involving a loss of standard separation. Therefore, it is necessary to strengthen the relationship between executive and planning controllers and to define their precise roles and responsibilities;
• stated that their roles as currently defined in MATS are ideal but difficult to adhere to in reality, especially in a busy operational environment. They further stated that there are no guidelines available for handling unusual occurrences; and
• stated that competency checking, conducted once per year for only one hour, is not sufficient, and that refresher training in unusual occurrences is likewise limited to once per year. Once again, this finding is in line with the questionnaire survey results presented in Chapter 6.
In general, the participating controllers rated their own performance as between efficient and tolerable (47 percent rated their own performance as efficient and 50 percent as tolerable). This is not in accordance with the assessment of their recovery effectiveness, in which 43 percent of the controllers performed at the ‘partially adequate’ or ‘inadequate’ levels. This discrepancy is of some concern, especially considering that 46.7 percent of controllers stated that their performance in this study was no different from that on any other day. In addition, 45 percent of them rated their performance as highly representative of their overall ability to recover from an equipment failure in ATC. Finally, 70 percent of controllers stated that the task they experienced in the experiment was highly realistic.
Furthermore, 33 percent of the controllers stated that they were not fully aware of the impacts and implications of this particular failure or of equipment failures in general. Moreover, 87 percent of the controllers stated that they would like some form of aide-memoire available at each controller working position (CWP) to assist them in recognising the effects of a particular equipment failure and the steps to be taken to recover. Consequently, this thesis proposes a framework for the establishment of an aide-memoire (Appendix III). A summary of all additional findings is presented in Table 10-12.
Table 10-12 Summary of additional findings
Variable | Finding | Comment
Training | 73 percent reported that their training was suitable | The majority of these controllers had their last training on unusual situations more than a year ago. Only half of the respondents’ last training included an equipment failure.
Training | 93 percent of controllers would like more frequent training for unusual situations | -
Trust in ATC technology | 93 percent of controllers have an objective attitude toward ATC equipment | -
Recovery procedure | 20 percent of controllers believe that a procedure for FDPS failure exists | The procedure does not exist in the ATC Centre
HMI | 46.7 percent of controllers found the BEST platform suitable for their needs and only 10 percent found it counterproductive | Negative comments mostly relate to the differences between the BEST platform and the system used in the operations room
Overall recovery performance | 47 percent of controllers rated their performance efficient; 50 percent rated it tolerable | Not in accordance with their assessed performance: 43 percent of controllers were rated partially adequate or inadequate
Awareness of the impact of a particular failure | 33 percent of controllers are not completely aware | -
Availability of aide-memoire | 87 percent of controllers are in favour | A framework for an aide-memoire is provided in Appendix III
10.4 Summary
The Chapter set out to achieve several objectives. Firstly, it set out to verify a methodology for the quantitative assessment of the recovery context (defined in Chapter 8) and its operational benefits. Secondly, it set out to verify a framework for an in-depth analysis of controller recovery using the recovery variables previously identified in Chapter 5. Finally, it set out to assess the outcome of the recovery process. The experiment achieved all these objectives and produced several interesting findings, as follows:
• The majority of controllers tend to omit some critical recovery steps related to the post-restoration phase, namely re-identification of traffic and confirmation of the accuracy of the information provided by the restored equipment. The sampled controllers seemed to rely on the information provided without questioning its accuracy following the occurrence of a failure.
• Controllers with prior experience of equipment failures tend to carry out more recovery steps than those without prior experience. In other words, experience with any equipment failure tends to enhance the controllers’ ability to deal with equipment failures. Moreover, this type of stress-exposure training enhances the stress-coping skills of controllers and as such should be incorporated into the training syllabus of every ATC Centre.
• A high percentage of inadequate recovery performance indicates that there is room for improvement throughout the ATC Centre participating in the experiment. Hence, the ATC Centre management should implement solutions to assure efficient handling of unusual/emergency situations. Note, however, that the management of the ATC Centre where the experiment took place has implemented an initial process to train controllers to deal with unusual/emergency situations, in the form of a compulsory emergency training module within every rating conversion and continuation training course.
• The first recovery action tends to occur more promptly if a controller has had training for unusual/emergency situations.
• Controllers who initiate recovery sooner tend to communicate better with team members and the supervisor.
• The existence of adequate recovery procedures tends to promote prompt recovery action.
• Recovery duration tends to increase as traffic complexity decreases. This is expected, as a less demanding traffic situation allows the controllers to initiate recovery action sooner rather than later.
• The outcome of the recovery process variable has been defined as an overall safety indicator of the recovery process. It represents a combination of recovery effectiveness and recovery duration.
• The recovery context indicator is a good indicator of both recovery effectiveness and the outcome of the recovery process.
• Recovery duration on its own is not a good indicator of the outcome of the recovery process, whilst recovery effectiveness is.
• The framework for the analysis of controller recovery proposed in this thesis and verified in the operational environment shows potential for an in-depth analysis of controller recovery from equipment failures in ATC.
Chapter 11 Conclusions
301
11 Conclusions
This Chapter presents the main findings of the research on controller recovery from
equipment failures in Air Traffic Control (ATC) and suggests avenues for future work.
The approach taken for the former is to address each of the research objectives
formulated in Chapter 1 (repeated below for ease of reference) and to present the
corresponding findings. The Chapter concludes with the identification of research
questions and ideas to be explored in future research.
11.1 Revisiting the research objectives
Chapter 1 defined a set of four research objectives for this thesis. These are to:
• Provide a systematic literature review to connect the disparate but related topics of ATC equipment failures and controller recovery, previously lacking in the area of ATC;
• Identify potential equipment failure types and their characteristics;
• Identify contextual factors that affect controller recovery performance and derive a methodology to quantitatively assess recovery context; and
• Propose a framework for the analysis of controller recovery. This framework should be further verified with specific reference to a particular equipment failure type.
11.2 Conclusions
11.2.1 Literature review
The review of relevant literature aimed to connect ATC equipment failures with both
technical and air traffic controller recovery. With respect to the literature review, the
following conclusions are relevant:
1. The assessment of controller recovery from equipment failures in ATC has to
address technical and controller recovery together and not in isolation as has
been the case in the past. This holistic approach enables a complete
understanding of controller recovery and all of its influencing factors.
2. Because of the variety of equipment, components, and tools in both current and
future ATC system architectures, ATC equipment should be classified based on
the type of ATC functionality it supports. Such a functional classification is
flexible to changes in ATM/ATC and can capture both current and future
equipment failure types.
3. Recovery procedures, recovery training, and past experience with equipment failures are the main drivers of controller recovery performance. However, the provision of both recovery procedures and training is inconsistent across ATC Centres.
4. The context in which controller performance takes place plays an important role in controller recovery.
11.2.2 Equipment failure types and their characteristics
Equipment failure characteristics were determined from past research and operational
experience through the analysis of operational failure reports and responses from a
questionnaire survey of air traffic controllers. With respect to equipment failure
characteristics, the following conclusions are relevant:
5. The key characteristics of ATC equipment failure are: ATC functionality
affected, complexity of failure type, time course of failure development, duration
of failure, potential causes of equipment failure, and the consequences of
equipment failure.
6. Information on equipment failure characteristics has been used to develop a
novel qualitative equipment failure impact assessment tool. This tool enables
the identification of equipment failures that are most challenging to ATC
operations.
7. Communication, surveillance, and data processing are the ATC functionalities most affected by equipment failures, and failures affecting them have the most severe impact on ATC operations. This finding has been verified by operational failure reports and the results of the questionnaire survey.
8. According to operational failure reports, further verified by the results of the questionnaire survey, equipment failures that have a major impact on ATC operations mostly affect air-ground communication, radar surveillance coverage, and the Flight Data Processing System (FDPS).
9. According to operational failure reports, the most frequent equipment failures
last up to 15 minutes. Furthermore, analysis of the reports has shown that the
longer the failure, the less severe it is. This finding is expected as more severe
failures are attended to immediately.
The conclusions listed above, resulting from the investigation of equipment failure
types and their characteristics in the operational ATC environment, have the potential
to impact policy formulation and the operational aspects of ATC/ATM. The thesis
findings have highlighted, for the first time, the ATC functionalities that are most
affected by equipment failures as well as those which have the most severe impact on
ATC operations. These findings can be used in two ways. Firstly, they identify the
equipment failure types that must be covered by the recovery training and procedures
designed for an ATC Centre. Secondly, the qualitative equipment failure impact
assessment tool can be used as part of the incident investigation process and as a
design tool, supporting the design of recovery training scenarios.
11.2.3 Controller recovery performance, recovery context, and influencing factors
The main findings related to controller recovery performance and the recovery context
are drawn from two sources of information. Firstly, the questionnaire survey results
provided an initial insight into controller recovery and relevant factors. Secondly, a
review of several Human Reliability Assessment (HRA) techniques identified a set of
relevant contextual factors, the so-called Recovery Influencing Factors (RIFs). With
respect to controller recovery and the overall recovery context, the following
conclusions are relevant:
10. This thesis presents, for the first time, a comprehensive investigation of the
factors that influence controller recovery. This has been done through a
rigorous process comprising a review of relevant past research, a questionnaire
survey, targeted experiments, and statistical analyses used to develop a functional
relationship between controller recovery and its influencing factors.
11. The questionnaire survey showed that the majority of controllers experience
equipment failures annually.
12. Improvement in ATC Centre management is required to facilitate effective
recovery. This can be achieved through, for example, an organised exchange of
experience within ATC Centres, not only with respect to equipment failures but
also for all types of emergency/unusual situations. Statistical tests showed
that controllers regard the exchange of information about equipment
failures as a form of past experience.
13. The questionnaire survey showed that the vast majority of ATC Centres
surveyed have some form of recovery procedure. The most neglected
procedures are for ATC functionalities which are most challenging to controller
recovery (data processing, surveillance, and communication functionalities). In
addition, controllers highlighted the need for an abbreviated version of the
contingency manual which should be made available at each controller working
position (i.e. aide-memoire).
14. Recovery procedures should be up-to-date and complete, and should follow a logical
sequence of the steps that controllers should perform. In addition, recovery
procedures need to be compatible with other procedures within the ATC Centre.
In short, procedures should be seen as guidance to the controller: they should
be adaptable to any given situation and should take account of a variety of
contextual factors.
15. Half of the ATC Centres surveyed have programmes for training in recovery from
equipment failures. However, this recurrent training is usually provided only once a
year. The controllers believe that this frequency of recurrent training is inadequate
and are in favour of receiving as much training as possible on emergency/unusual
situations, including equipment failures.
16. Recurrent training must be up-to-date and compatible with other training
programmes. Moreover, recurrent training exercises should be varied and
realistic, covering both outages and less severe failures. ATC Centres should
adopt the practice of periodically reverting to backup systems, perhaps during less
busy traffic periods, in order to maintain controllers' proficiency in their use.
17. Regular training on system functionalities, upgrades, and degradation modes
could be a useful method to ensure consistent knowledge and familiarity with
the ATC system architecture.
18. The majority of controllers surveyed confirmed the importance of the context
surrounding an equipment failure occurrence. This confirms earlier findings
in the existing research literature.
19. The context surrounding controller recovery from equipment failure in ATC is
defined via 20 contextual factors, known as Recovery Influencing Factors
(RIFs). Each RIF can be further defined via its qualitative descriptor. This
establishes the relationship between each RIF and its influence on controller
performance.
20. An aggregated indicator of the entire recovery context has been proposed,
referred to as the recovery context indicator (Ic). This quantitative indicator of the
recovery context is sensitive to changes in the individual RIFs.
This thesis presents, for the first time, a comprehensive set of the factors that influence
controller recovery (RIFs). These factors can be used as part of an incident
investigation process, enabling a detailed investigation of the impact of context on
controller recovery performance. The identification and assessment of RIFs can also
support the identification and refinement of recommendations on various aspects of ATC
operation. However, the final choice of the optimal recommendation should be based
on the degree of positive shift in the value of the recovery context indicator
(as the quantitative indicator of the recovery context). Within the future ATM system,
this methodology could easily be modified to account for the shared responsibility for
the separation of aircraft and collaborative decision-making between airborne and
ground-based ATM system components.
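The aggregation used to compute the recovery context indicator is not restated in this chapter, so the sketch below does not reproduce it. It assumes, purely for illustration, that each RIF is rated on a hypothetical 1 to 5 scale (higher meaning a context more supportive of recovery) and that Ic is a normalised weighted mean of those ratings. The functions `recovery_context_indicator` and `best_recommendation`, the RIF names, and the weights are all hypothetical. The sketch only shows how candidate recommendations could be ranked by the positive shift they produce in such an indicator.

```python
from typing import Dict, Optional, Tuple

# Hypothetical sketch: the RIF names, the 1-5 rating scale and the weights are
# illustrative assumptions, not the definitions or values used in this thesis.
Ratings = Dict[str, float]


def recovery_context_indicator(ratings: Ratings, weights: Dict[str, float],
                               scale_min: float = 1.0,
                               scale_max: float = 5.0) -> float:
    """Aggregate RIF ratings into one value in [0, 1] as a normalised weighted mean.

    Higher ratings are assumed to describe a context more supportive of recovery.
    """
    if set(ratings) != set(weights):
        raise ValueError("ratings and weights must cover the same RIFs")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = 0.0
    for rif, rating in ratings.items():
        if not scale_min <= rating <= scale_max:
            raise ValueError(f"rating for {rif!r} is outside the scale")
        # Map each rating onto [0, 1] so that every RIF contributes on the same scale.
        normalised = (rating - scale_min) / (scale_max - scale_min)
        weighted_sum += weights[rif] * normalised
    return weighted_sum / total_weight


def best_recommendation(baseline: Ratings, candidates: Dict[str, Ratings],
                        weights: Dict[str, float]) -> Tuple[Optional[str], float]:
    """Return the candidate giving the largest positive shift in the indicator.

    Each candidate lists only the RIF ratings it is expected to change.
    Returns (None, 0.0) if no candidate improves on the baseline.
    """
    base_value = recovery_context_indicator(baseline, weights)
    best_name, best_shift = None, 0.0
    for name, changes in candidates.items():
        modified = {**baseline, **changes}
        shift = recovery_context_indicator(modified, weights) - base_value
        if shift > best_shift:
            best_name, best_shift = name, shift
    return best_name, best_shift


if __name__ == "__main__":
    # Made-up ratings for three hypothetical RIFs (not thesis data).
    weights = {"recovery_training": 2.0, "recovery_procedures": 1.5,
               "team_communication": 1.0}
    baseline = {"recovery_training": 2, "recovery_procedures": 3,
                "team_communication": 4}
    candidates = {
        "more_recurrent_training": {"recovery_training": 4},
        "aide_memoire_at_each_position": {"recovery_procedures": 5},
    }
    name, shift = best_recommendation(baseline, candidates, weights)
    print(name, round(shift, 3))  # more_recurrent_training 0.222
```

A weighted mean is used only because it makes the indicator respond to a change in any single RIF, which mirrors the sensitivity property claimed for Ic. The thesis's own definition should replace it wherever it is applied.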
11.2.4 Framework for the analysis of controller recovery
The framework for the analysis of controller recovery proposed in this thesis was
verified in an experimental investigation with specific reference to a particular
equipment failure type (i.e. FDPS) and a particular ATC Centre. With respect to the
framework for the analysis of controller recovery, the following conclusions are
relevant:
21. The recovery variables relevant to controller recovery from equipment failures in
ATC are the recovery context, effectiveness, and duration. This set of recovery
variables showed potential for the rigorous analysis of controller recovery.
22. The experiment showed that the controllers with previous experience of
equipment failures executed more required recovery steps. Overall, experience
with equipment failures enhances a controller’s ability to deal with any type of
equipment failure.
23. A further finding from the experiment is that recovery duration tends to be
longer the closer in time the emergency training (with a module on equipment
failures) is to the occurrence of the actual failure.
24. Communication with team members or the supervisor is enhanced when
controllers initiate recovery action sooner (i.e. as close as possible to the instant
of the occurrence of the failure).
25. Furthermore, the experiment showed that the existence of recovery procedures
(or any type of reference material, such as training manuals) promotes prompt
recovery action.
26. The experiment also showed that recovery duration increases with a decrease
in traffic complexity.
27. The recovery context indicator represents a good indicator of both recovery
effectiveness and the outcome of the recovery process (represented as a
combination of the recovery effectiveness and duration).
28. The thesis has identified a statistically significant correlation between the recovery
context indicator and the outcome of the recovery process. Hence, the outcome
of the recovery process represents a good safety indicator of the overall
recovery process.
The relevance of recovery training (either as an alternative or in addition to past
experience) and recovery procedures has been confirmed by experiment. Recovery
training and awareness of recovery procedures lead to more prompt recovery action,
better awareness of required recovery steps, and enhanced team communication.
These findings should directly inform the required policy on training and procedures for
handling unusual/emergency situations, highlighting required content, frequency, and
format. Furthermore, the recovery variables identified (recovery context, effectiveness,
and duration) have the potential to facilitate a rigorous analysis of controller recovery
from equipment failures in ATC and thus can be used in incident investigation
processes. Finally, the recovery context indicator represents a good indicator of the
outcome of the recovery process (represented as a combination of the recovery
effectiveness and duration). As such, the overall framework for the analysis of
controller recovery based on the identified recovery variables can be used to assess the
outcome of the recovery process in both current and future ATM environments.
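Items 27 and 28 describe the recovery outcome as a combination of recovery effectiveness and duration, and report a statistically significant correlation between this outcome and the recovery context indicator. The exact combination and the statistical test are not restated here, so the sketch below assumes both. The hypothetical `recovery_outcome` scales effectiveness (taken as the share of required recovery steps executed) by how quickly recovery finished relative to an assumed maximum duration. The correlation is Pearson's r, with the usual t statistic for testing zero correlation on n - 2 degrees of freedom. The thesis's own measure may differ; for ordinal RIF-based data a rank correlation such as Spearman's would be a reasonable alternative.

```python
import math
from typing import Sequence

# Hypothetical sketch: the outcome formula below (effectiveness scaled by how
# quickly recovery was completed) is an illustrative assumption, not the
# combination of effectiveness and duration defined in this thesis.


def recovery_outcome(effectiveness: float, duration_s: float,
                     max_duration_s: float) -> float:
    """Combine effectiveness (share of required recovery steps executed, 0-1)
    with recovery duration into a single score in [0, 1]."""
    if not 0.0 <= effectiveness <= 1.0:
        raise ValueError("effectiveness must lie in [0, 1]")
    if duration_s < 0 or max_duration_s <= 0:
        raise ValueError("durations must be non-negative and max_duration_s positive")
    # Durations at or beyond the assumed maximum earn no credit for speed.
    speed = 1.0 - min(duration_s, max_duration_s) / max_duration_s
    return effectiveness * speed


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient between two samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length (at least 2 pairs)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 pairs for a significance test")
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


if __name__ == "__main__":
    # Made-up values for five hypothetical simulator runs (not thesis data).
    ic = [0.35, 0.42, 0.55, 0.61, 0.78]
    runs = [(0.5, 420), (0.6, 360), (0.7, 300), (0.8, 300), (0.9, 180)]
    outcomes = [recovery_outcome(e, d, 600.0) for e, d in runs]
    r = pearson_r(ic, outcomes)
    print(f"r = {r:.2f}, t = {t_statistic(r, len(ic)):.2f} (df = {len(ic) - 2})")
```

The resulting t value would then be compared with the critical value of Student's t distribution at the chosen significance level.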
11.3 Future work
The research presented in this thesis demonstrates the capability to assess ATC
equipment failures and subsequent controller recovery performance. However, these
findings also suggest a number of directions for further research. These include:
• It is hard to find safety-related research in the aviation industry that does not rely
upon some type of occurrence data. However, few studies question the reliability
of the data available. To date, no measure of the reliability of occurrence
databases has been produced. Automatic tools exist in certain countries, for
example the Safety Monitoring Function (SMF), which captures all loss-of-separation
incidents in the controlled airspace of that country. Data from such a tool may
provide an indication of the reliability of the occurrence data.
• Future research should investigate ways to overcome the logistical difficulties with
capturing operational data and corresponding qualitative and quantitative aspects of
validation (e.g. in terms of questionnaire survey sample, number and characteristics
of ATM specialists, and subject matter experts).
• Further development of the qualitative equipment failure impact assessment
tool (Chapter 4) would be required to enable assessment of the impact of several
independent failures on ATC operations and thus on controller performance. The
output of this more advanced approach would be an indication of the most severe
combinations of independent failures. However, to achieve this, the tool would have
to be adapted to a specific ATC Centre to integrate the complexity of its ATC
architecture and the flow of data between its various ATC systems.
• The questionnaire survey used in any future research should apply rigorous design
methods to avoid ambiguity and ensure a consistent interpretation of key terms
(e.g. equipment failure).
• The relationship between a particular RIF level and its impact on controller
recovery (i.e. defined via qualitative descriptor in Chapter 7 and the correlation
coefficient in Chapter 8) could be defined as a function of RIF level. This approach
would be more sensitive to the changes resulting from the incorporation of RIF
interactions.
• Future research should simulate the impact of ATC equipment failures in a future
gate-to-gate ATM system, where the roles for planning and executive control will be
reorganised and distributed between controllers and pilots. Additionally, this future
environment will be characterised by dynamic real-time exchange and distribution
of flight-related information. Thus, safety assessments would have to consider
the exchange and distribution of corrupted data and its impact on both air and
ground services.
• The thesis has identified a statistically significant correlation between the recovery
context indicator and the outcome of the recovery process. Future research should
transfer this finding into a model that could be used operationally in an ATC Centre.
11.4 Publications relating to this work
The following publications have been produced in support of the research on controller
recovery from equipment failures in ATC. They consist of journal papers and
published conference proceedings, each accompanied by a comment on the precise
contribution of the listed co-authors.
11.4.1 Publication format: journal – accepted subject to revision
Subotic, B., Majumdar, A., and Ochieng, W.Y. (2007). Recovery from Equipment
Failures in Air Traffic Control (ATC): The findings from an international survey of
controllers. Accepted, subject to revision, by the International Journal of Engineering and
Operations: Air Traffic Control Quarterly. Air Traffic Control Association Institute, Inc.
11.4.2 Publication format: journal – published
Subotic, B., Ochieng, W.Y., and Straeter, O. (2007). Recovery from equipment failures
in ATC: An overview of contextual factors. The Reliability Engineering and System
Safety Journal, Vol 92 (7), pp. 858-870.
Subotic, B., Ochieng, W.Y., and Majumdar, A. (2005). Equipment Failures in Air Traffic
Control: Finding an Appropriate Safety Target. The Aeronautical Journal of the Royal
Aeronautical Society, Vol 109 (1096), pp.277-284.
11.4.3 Publication format: conference proceedings – published
Subotic, B., Ochieng, W. and Straeter, O. (2006). Recovery from Equipment Failures in
Air Traffic Control: A Probabilistic Assessment of Context. Proceedings of the
Probabilistic Safety Assessment (PSAM 08) conference, May 14-19, 2006, New
Orleans, USA.
Subotic, B., and Ochieng, W.Y. (2005). Recovery from Equipment Failures in Air Traffic
Control. In Contemporary Ergonomics 2005 (Eds. P.D. Bust and P.T. McCabe). Taylor
& Francis. Presented at the Ergonomics Society Annual Conference, De Havilland
Campus, University of Hertfordshire, Hatfield.
12 List of References
10News (2006). Power Outage Momentarily Interrupts Air Traffic Control. From http://www.10news.com/news/8831526/detail.html
Air Transport Action Group (2005). The economic & social benefits of air transport. From http://www.atag.org/files/Soceconomic-124721A.pdf
Air Transport Association (2006). Cost of ATC Delays. From http://www.airlines.org/economics/specialtopics/ATC+Delay+Cost.htm
Airbus (2004). Global Market Forecast 2004-2023. From http://www.airbus.com/en/myairbus/global_market_forcast.html
Airways New Zealand (2006a). Manual of Air Traffic Services (amendment 113). Airways New Zealand.
Airways New Zealand (2006b). Domestic and International Aircraft Movements by Calendar Year. From http://www.airways.co.nz/documents/avimove_stats.pdf
Aviation International News (2001). Europeans embracing MLS with a vengeance. From http://www.ainonline.com/issues/04_01/Apr_2001_europeanmlspg75.html
Bainbridge, L. (1983). Ironies of Automation. Automatica, 19, 775-779. From http://www.bainbrdg.demon.co.uk/Papers/Ironies.html
Bainbridge, L. (1984). Diagnostic Skill in Process Operation. Department of Psychology, University College London. From http://www.bainbrdg.demon.co.uk/Papers/DiagnosticSkill.html
Baker, S., and Weston, I. (2001). Mayday, mayday, mayday. From http://www.isasi.org/working_groups/ats/atsmayday.pdf
Berenson, M.L., Levine, D.M., and Krehbiel, T.C. (2006). Basic Business Statistics: Concepts and Applications. Prentice Hall: Upper Saddle River, NJ.
Billings, C.E. (1996). Aviation Automation: The Search for a Human-Centred Approach. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Boehm-Davis, D., Curry, R.E., Wiener, E.L., and Harrison, R.L. (1983). Human factors of flight-deck automation: Report on a NASA industry workshop. Ergonomics, 26, 953-961.
Boeing (2004). Statistical Summary of Commercial Jet Airplane Accidents: Worldwide Operations 1959 – 2003. From http://www.boeing.com/news/techissues/pdf/statsum.pdf
Bove, T. (2002). Development and Validation of a Human Error Management Taxonomy in Air Traffic Control. PhD dissertation. Risø National Laboratory, Roskilde. From http://www.risoe.dk/rispubl/SYS/syspdf/ris-r-1378.pdf
British Airways (2006). Flight Training Safety and Emergency Procedures (SEP) Training. From http://www.britishairwaysjobs.com/baweb1/?newms=info150
Brooker, P. (2004). Consistent and up-to-date aviation safety targets. Draft version. Cranfield University.
Brooker, P. (2006). Air Traffic Control Safety Indicators: What is Achievable? Eurocontrol: Safety R&D Seminar, 25-27 October 2006, Spain. From https://dspace.lib.cranfield.ac.uk/bitstream/1826/1372/1/Eurocontrol+2006+ATC-Brooker.pdf
Bureau of Transport and Regional Economics (2006). Aviation. Australian Government. From http://www.btre.gov.au/statistics/aviation.aspx
Bureau of Transportation Statistics (2004). Airline On-Time Statistics and Delay Causes. From http://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp
Bureau of Transportation Statistics (2006). Dictionary. From http://www.bts.gov/dictionary/list.xml?letter=A
CASA (2006). ADS-B: Automatic Dependent Surveillance – Broadcast. Civil Aviation Safety Authority Australia. From http://casa.gov.au/pilots/download/ADS-B.pdf
Christensen, W.C., and Manuele, F.A. (1999). Safety through Design: Best Practices. National Safety Council Press.
Cox, K. (2005). Teamwork and Trust: A Pilot’s Perspective. From http://safecopter.arc.nasa.gov/Pages/Columns/SBrief/SafeBrf1Articles/6Teamwork.html
Damidau, A., Kirwan, B., and Scrivani, P. (2006). Safety Getting Real: Safety Insights from Real Time Simulations. Proceedings from the EUROCONTROL Safety R&D Seminar, Barcelona 25-27 October 2006, Spain.
Daniels, J.J., Regli, S.H., and Franke, J.L. (2002). Support for Intelligent Interruption and Augmented Context Recovery. Proceedings from 7th IEEE Human Factors Meeting. Scottsdale, Arizona.
Dekker, S., Fields, B., and Wright, P. (2004). Human Error Recontextualised. From http://www.cs.mdx.ac.uk/staffpages/bobf/papers/glasgow.pdf
Department of Defense (2001). Global Positioning System: Standard Positioning Service Performance Standard. Command, Control, Communication, and Intelligence. Washington DC.
Endsley, M. (1997). Situation Awareness, Automation & Free Flight. From http://atm-seminar-97.eurocontrol.fr/endsley.htm
Endsley, M. R., and Kaber, D. B. (1999). Level of automation effects on performance, situation awareness and workload in a dynamic control task. Ergonomics, 42(3), pp. 462-492.
Endsley, M., and Kiris, E. (1995). The out-of-the-loop performance problem and level of control in automation. Human Factors, 37(2), pp. 381-394.
EUROCONTROL (1997). EUROCONTROL Standard Document for Radar Surveillance in En-Route Airspace and Major Terminal Areas. From http://www.eurocontrol.int/surveillance/gallery/content/public/documents/SURVSTD.pdf
EUROCONTROL (1999). CD-ROM: An introduction to ATM. EUROCONTROL Institute of Air Navigation Services.
EUROCONTROL (2000a). Safety Minima Study: Review Of Existing Standards And Practices. From http://www.eurocontrol.int/src/gallery/content/public/documents/deliverables/srcdoc1ri.pdf
EUROCONTROL (2000c). ESARR 2: Reporting and Assessment of Safety Occurrences in ATM. From http://www.atceuc.org/site/Eurocontrol/pdf02/esarr2%20v2.0%20en.pdf
EUROCONTROL (2001a). ECAC Safety Minima for ATM. EUROCONTROL Safety Regulation Commission.
EUROCONTROL (2001b). ESARR 4: Risk Assessment and Mitigation in ATM. EUROCONTROL Safety Regulation Commission. From http://www.eurocontrol.int/src/gallery/content/public/documents/deliverables/esarr4v1.pdf
EUROCONTROL (2001c). Safety assessment of the free route airspace concept: Feasibility phase. Working Draft 0.3. European Organisation for the Safety of Air Navigation, EUROCONTROL. From http://www.eurocontrol.int/airspace/gallery/content/public/documents/frap/safety_assessment_report_integrated
EUROCONTROL (2001d). European Manual of Personnel Licensing - Air Traffic Controllers: Guidance on Implementation. From http://www.eurocontrol.int/humanfactors/gallery/content/public/docs/DELIVERABLES/L2%20(HUM.ET1.ST08.10000-GUI-01)%20Released-withsig.pdf
EUROCONTROL (2001e). Harmonisation of European Incident Definitions Initiative for ATM – HEIDI Viewer Instructions for Use. Safety, Quality and Standardisation Unit (SQS).
EUROCONTROL (2001f). EUROCONTROL Airspace Strategy for the ECAC States. From http://www.eurocontrol.int/eatm/gallery/content/public/library/airspace.pdf
EUROCONTROL (2002b). Technical Review of Human Performance Models and Taxonomies of Human Error in ATM (HERA). From http://www.eurocontrol.int/humanfactors/gallery/content/public/docs/DELIVERABLES/HF26 (HRS-HSP-002-REP-01) Released.pdf
EUROCONTROL (2002c). Glossary of Terms and Definitions & List of Acronyms (SRC DOC 4). From http://www.eurocontrol.int/src/gallery/content/public/documents/deliverables/srcdoc4e2.pdf
EUROCONTROL (2002d). Short Report on Human Performance Models and Taxonomies of Human Error in ATM (HERA). From http://www.eurocontrol.int/humanfactors/gallery/content/public/docs/DELIVERABLES/HF27%20(HRS-HSP-002-REP-02)%20Released.pdf
EUROCONTROL (2003a). MADAP in a Nutshell. Maastricht Upper Area Control Centre, Netherlands.
EUROCONTROL (2003b). Summer: ATFM summary report. From http://www.cfmu.eurocontrol.int/ATFM/public/docs/publicreport_2003year.pdf
EUROCONTROL (2003c). EUROCONTROL ATM Strategy for the Years 2000+, Volume 1. From http://www.eurocontrol.int/eatm/gallery/content/public/library/ATM2000-EN-V1-2003.pdf
EUROCONTROL (2003d). HERA-JANUS training: Analysing Human Error in Incident Investigation. 18-20 November 2003. EUROCONTROL Institute of Air Navigation Service, Luxembourg.
EUROCONTROL (2003e). The Human Error in ATM Technique (HERA-JANUS). From http://www.eurocontrol.int/humanfactors/gallery/content/public/docs/DELIVERABLES/HF30 (HRS-HSP-002-REP-03) Released-withsig.pdf
EUROCONTROL (2003f). Guidelines for Controller Training in the Handling of Unusual/Emergency Situations. From http://www.eurocontrol.int/humanfactors/gallery/content/public/docs/DELIVERABLES/T11%20(Edition%202.0)%20HRS-TSP-004-GUI-05withsig.pdf
EUROCONTROL (2003g). Radio and Navigation Aids Course (IANS_ATC_RADNAV). EUROCONTROL Institute of Air Navigation Service, Luxembourg.
EUROCONTROL (2003h). Area Navigation Applications in Europe. From http://elearning.eurocontrol.int/ATMTraining/precourse/nav/rnav/index.html
EUROCONTROL (2003i). ESARR 6: Software in ATM Systems. Safety Regulatory Commission. From http://www.eurocontrol.int/src/gallery/content/public/documents/deliverables/esarr6_e10_ri.pdf
EUROCONTROL (2004a). Evaluating the True Cost to Airlines of One Minute of Airborne or Ground Delay. Prepared by the University of Westminster for Performance Review Unit. From www.eurocontrol.int/prc/gallery/content/public/Docs/cost_of_delay.pdf
EUROCONTROL (2004c). CORA 2 Safety Analysis: Exploratory Preliminary System Safety Assessment (PSSA). European Air Traffic Management Programme.
EUROCONTROL (2004d). Review of Techniques to Support the EATMP Safety Assessment Methodology. From http://www.eurocontrol.int/eec/gallery/content/public/documents/EEC_notes/2004/EEC_note_2004_01_1.pdf
EUROCONTROL (2004e). Managing System Disturbances in ATM: Background and Contextual Framework. From http://www.eurocontrol.int/humanfactors/gallery/content/public/docs/DELIVERABLES/HF47%20(HRS-HSP-005-REP-06)%20Released-withsig.pdf
EUROCONTROL (2004f). The Impact of Automation on Future Controller Skill Requirements and a Framework for SHAPE (HRS/HSP-005-REP-04). Human Factors Management Business Division (DAS/HUM).
EUROCONTROL (2004g). Model Based Simulation of the Turkish En-Route Airspace (EEC Report No. 396). From http://www.ans.dhmi.gov.tr/TR/ATCTR/proje/fts.pdf
EUROCONTROL (2005). ATM Contribution to Aircraft Accidents/Incidents: Review and Analysis of Historical Data. From http://www.eurocontrol.int/src/gallery/content/public/documents/deliverables/srcdoc2_e40_ri_web.pdf
EUROCONTROL (2006a). Air Traffic Control (ATC). From http://www.eurocontrol.int/corporate/public/standard_page/cb_airtraffic_controller.html
EUROCONTROL (2006b). What is PRNAV? From http://www.ecacnav.com/content.asp?PageID=82
EUROCONTROL (2006c). Performance Review Report covering the calendar year 2005. Performance Review Commission.
EUROCONTROL (2006d). The impact of fragmentation in European ATM/CNS. Performance Review Commission. From http://www.eurocontrol.int/prc/gallery/content/public/Docs/fragmentation.pdf
EUROCONTROL (2007a). Safety Nets. From http://www.eurocontrol.int/safety-nets/public/subsite_homepage/homepage.html
EUROCONTROL (2007b). Single European Sky. From http://www.eurocontrol.int/ses/public/subsite_homepage/homepage.html
European Commission (2001). Meeting society’s needs and winning global leadership. Report of the group of personalities. From http://ec.europa.eu/research/growth/aeronautics2020/pdf/aeronautics2020_en.pdf
European Commission (2006a). GNSS Autonomous Navigation Algorithms Critical Study (D3.2.2.1). Draft report. Sixth Framework Programme (2002-2006).
European Commission (2006b). Critical Analysis of Space-Based Navigation Technologies Usable for Civil Aviation (D3.1P). Draft report. Sixth Framework Programme (2002-2006).
European Space Agency (2002). Space Product Assurance: Safety (ESA Q-40-B). Requirements & Standards Division. Noordwijk, The Netherlands.
Federal Aviation Administration (1995). Approach Station Keeping (Ask) Experiment Plan and Final Report (DOT/FAA/CT-TN95/58). Department of Transportation: Federal Aviation Administration. From http://www.tc.faa.gov/acb300/techreports/TN9558.pdf
Federal Aviation Administration (1997). Hardware Product Specification Document for the Voice Switching and Control System (VSCS) (DTFA01–92–D–00004). Department of Transportation: Federal Aviation Administration.
Federal Aviation Administration (1998). Voice Switching and Control System: Attachment J-3 - Product Specification (FAA-E-2731G). Department of Transportation: Federal Aviation Administration.
Federal Aviation Administration (2000). System Safety Handbook, Chapter 3. Department of Transportation: Federal Aviation Administration. From http://www.asy.faa.gov/RISK/SSHandbook/contents.htm
Federal Aviation Administration (2003). The Human Factors Design Standard (HF-STD-001). Compact disk, William J. Hughes Technical Center, Atlantic City International Airport, NJ.
Federal Aviation Administration (2005). Air Transportation Operations Inspector's Handbook (Order 8400), Vol 1. Department of Transportation: Federal Aviation Administration. From http://www.faa.gov/library/manuals/examiners_inspectors/8400/
Feng, S., Ochieng, W., Walsh, D., and Ioannides, R. (2005). A Measurement Domain Receiver Autonomous Integrity Monitoring Algorithm. GPS Solutions. Springer Berlin/Heidelberg.
Frese, M. (1991). Error Management or Error Prevention: Two Strategies to Deal with Errors in Software Design. In H. J. Bullinger (Ed.) Human aspects in Computing: Design and Use of Interactive Systems and Work with Terminals. Amsterdam: Elsevier Science Publishers.
Frese, M., Brodbeck, F.C., Zapf, D., and Prumper, J. (1990). The Effects of Task Structure and Social Support on Users’ Errors and Error Handling. In D. Diaper et al. (Eds.) Human – Computer Interaction - INTERACT’90 (pp.35-41). Amsterdam, Elsevier Science Publishers.
Fujita, Y., and Hollnagel, E. (2004). Failures without errors: quantification of context in HRA. Reliability Engineering and System Safety, 83, pp. 145-151.
Funk, K., Lyall, B., and Riley, V. (1996). Perceived Human Factors Problems of Flightdeck Automation: Phase 1 Final Report. Federal Aviation Administration Grant 93-G-039. From http://www.flightdeckautomation.com/phase1/phase1report.aspx
General Accounting Office (1982). Computer Outages at Terminal Facilities and Their Correlation to Near mid-air Collisions (AFMD-82-43). US GAO, Washington DC.
General Accounting Office (1991). Air Traffic Control: FAA Can Better Forecast and Prevent Equipment Failures. US GAO, Washington DC.
General Accounting Office (1996). Air Traffic Control: Good Progress on Interim Replacement for Outage-Plagued System, but Risks Can Be Further Reduced. US GAO, Washington DC.
General Accounting Office (1998). Air Traffic Control: Information Concerning Equipment Outages at Two Kansas City Area Facilities. US GAO, Washington DC.
Gordon, R., and Makings, N. (2003). Gate 2 Gate: Stakeholder Safety Survey. EUROCONTROL Experimental Centre, France.
Graham, G.M., Kinnersly, S and Joyce, A. (2002). Safety Reporting and Aviation Target Levels of Safety. In C.W. Johnson, Investigation and Reporting of Incidents and Accidents (IRIA 2002). Department of Computing Science, University of Glasgow, Scotland.
Hai, L. (2004). Civil Aviation Safety Outline (2001-2020). From http://www.seaskyad.com/ad@cca_english/content/content_0206_special_articles/article16.htm
Hallbert, B.P., and Meyer, P. (1995). Summary of lessons learned at the OECD Halden reactor project for the evaluation of human-machine systems. Institutt for Energiteknikk, Halden, Norway.
Heinrich, H.W. (1941). Industrial Accident Prevention – A Scientific Approach. McGraw-Hill: New York and Wiley: London.
Hilburn, B. (2004). Cognitive Complexity in Air Traffic Control - A Literature Review. EUROCONTROL Experimental Centre, EEC Note 04/04.
Hilburn, B., and Flynn, M. (2001). Air Traffic Controller and Management Attitudes Toward Automation: An Empirical Investigation. 4th USA/EUROPE Air Traffic Management R&D Seminar, Santa Fe, USA.
Hollnagel, E. (1993). Human Reliability Analysis: Context and Control. Academic Press, London.
Hollnagel, E. (1998). Cognitive Reliability and Error Analysis Method (CREAM). Elsevier Science Ltd., London, UK.
IEEE (1998). IEEE Guide for Microwave Communications System Development: Design, Procurement, Construction, Maintenance, and Operation. IEEE-SA Standards Board. From http://ieeexplore.ieee.org/iel4/5643/15123/00690973.pdf?arnumber=690973
IFALPA (2005). Interpilot: 60th Annual Conference: Boeing 787 programme update. From http://216.239.59.104/search?q=cache:oJuuByAkeqEJ:www.ifalpa.org/Interpilot/2005/06inp01.pdf+Interpilot:+60th+Annual+Conference:+Boeing+787+programme+update&hl=en&ct=clnk&cd=1&gl=uk
IFATCA (2004). Produce Definition of Controller Tools (Agenda Item B.5.2). Proceedings from 43rd Annual Conference, Hong Kong, 22-26 March 2004.
IFATCA (2005). A Positive Step to Improve Aviation Safety. From http://www.ifatca.org/press/141105.pdf
International Civil Aviation Organization (1979). Annex 5: Units of Measurement to be Used in Air and Ground Operations. Montreal, Canada.
International Civil Aviation Organization (1985). Manual of Air Traffic Forecasting (Doc 8991-AT/722/2). Montreal, Canada.
International Civil Aviation Organization (1995). Review of the General Concept of Separation panel (RGCSP). Working Group A: A Review of Work on Deriving a Target Level of Safety (TLS) for En-route Collision Risk. Montreal, Canada.
International Civil Aviation Organization (1997). Outlook for Air Transport to the Year 2005 (ICAO Circular 270-AT/111). Montreal, Canada.
International Civil Aviation Organization (1998). Human Factors Training Manual – Doc 9683 (First Edition). Montreal, Canada.
International Civil Aviation Organization (2001a). Air Traffic Management Doc 4444. Montreal, Canada.
International Civil Aviation Organization (2001b). Annex 6: Operation of Aircraft. Montreal, Canada.
International Civil Aviation Organization (2001c). Annex 11: Air Traffic Services. Montreal, Canada.
International Civil Aviation Organization (2001d). Annex 13: Aircraft Accident and Incident Investigation. Montreal, Canada.
International Civil Aviation Organization (2003). Review the latest developments in the ATN Panel and the Aeronautical Mobile Communication Panel. From http://www.icao.int/icao/en/ro/apac/atn_2003/ip02.pdf
International Civil Aviation Organization (2005). Report of the Ninth Meeting of Communications, Navigation and Surveillance/Meteorology Sub-Group (CNS/MET/SG/9), Bangkok, Thailand, 11-15 July 2005. From http://www.icao.int/icao/en/ro/apac/2005/CNS_MET_SG9/CNSMET_SG9.pdf
International Civil Aviation Organization (2006a). Review Developments Relating to CNS/ATM Implementation: Review the Work by RNP Special Operational Requirements Study Group on the Implementation of RNP Operations. From http://www.icao.int/icao/en/ro/apac/2006/ATM_AIS_SAR_SG16/wp22.pdf
International Civil Aviation Organization (2006b). Contracting States. From http://www.icao.int/cgi/goto_m.pl?/cgi/statesDB4.pl?en
International Civil Aviation Organization (2007). CNS/ATM Systems. From http://www.icao.int/icao/en/ro/rio/execsum.pdf
Johnson, C. W. and Holloway, C.M. (2004). On the Over-Emphasis of Human ‘Error’ As A Cause of Aviation Accidents: ‘Systemic Failures’ and ‘Human Error’ in US NTSB and Canadian TSB Aviation Reports 1996-2003. From http://www.dcs.gla.ac.uk/~johnson/papers/Cause_comparisons/Error_and_accidents.PDF
Joint Aviation Authorities (1994). Joint Aviation Requirements for Large Aeroplanes (JAR–25).
Kaarstad, M. and Ludvigsen, J.T. (2002). Background study for further research in performance recovery. Presented at the Enlarged Halden Programme Group Meeting, Storefjell, C2/5/1-16.
Kaber, D.B. (1997). The Effect of Level of Automation and Adaptive Automation on Performance in Dynamic Control Environments (ANRCP-NG-ITWD-97-01). Amarillo, TX: Amarillo National Resource Center for Plutonium.
Kaber, D. B. and Riley, J. (1999). Adaptive automation of a dynamic control task based on secondary task workload measurement. International Journal of Cognitive Ergonomics, 3(3), 169-187.
Kaber, D.B., Prinzel, L.J., Wright, M.C., and Clamann, M.P. (2002). Workload-Matched Adaptive Automation Support of Air Traffic Controller Information Processing Stages (NASA/TP-2002-211932). National Aeronautics and Space Administration. From http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20020080640_2002133430.pdf
Kanse, L. (2004). Recovery uncovered: How people in the chemical process industry recover from failures. PhD dissertation. Eindhoven University of Technology.
Kanse, L. and van der Schaaf, T. (2000). Recovery from failures - understanding the positive role of human operators during incidents. In D. de Waard, C. Weikert, J. Hoonhout and J. Ramaekers (Eds.), Human System Interaction: Education, Research and Application in the 21st Century. Maastricht, Netherlands: Shaker Publishing.
Kennedy, R., Kirwan, B., and Summersgill, R. (2000). Making HRA a more consistent science. In M. Cottam, R.P. Pape, D.W. Harvey, and J. Tait (Eds.), Foresight & Precaution. Balkema, Rotterdam.
Kim, M.C., Seong, P.H., and Hollnagel, E. (2005). A probabilistic approach for determining the control mode in CREAM. Reliability Engineering and System Safety, pp. 1-9.
Kirwan, B. (1994). A Guide to Practical Human Reliability Assessment. Taylor & Francis, London, UK.
Kirwan, B. (1997). The development of a nuclear chemical plant human reliability management approach: HRMS and JHEDI. Reliability Engineering and System Safety, Vol 56, pp. 107-133.
Kirwan, B., Gibson, H., Edmunds, J., Cooksley, G., Kennedy, R., and Umbers, I. (1994). Nuclear Action Reliability Assessment (NARA): A Data-Based HRA Tool.
Kirwan, B., Basra, G., and Taylor-Adam, S.E. (1997). CORE-DATA: A Computerised Human Error Database for Human Reliability Support. Proceedings from the Sixth Annual Human Factors Meeting, Orlando, US.
Kontogiannis, T. (1999). User strategies in recovering from system failures in man-machine systems. Safety Science 32(1), pp. 49-68.
Kopardekar, P., and Magyarits, S. (2003). The measurement and prediction of dynamic density. Presented at the FAA-EUROCONTROL ATM 2003 Seminar, Budapest.
Lanzi, P., and Marti, P. (2001). Innovate or preserve: when technology questions cooperative processes. From http://www.dblue.it/pdf/ECCE11_Lanzi_Marti_v3.pdf
Layton, C., Smith, P. J., and McCoy, E. (1994). Design of a cooperative problem-solving system for en-route flight planning: An empirical evaluation. Human Factors, 36, pp. 94-119.
Leveson, N.G. (1995). Safeware: System Safety and Computers. Addison-Wesley Publishing Company, New York.
Littlewood, B., Strigini, L., Wright, D., and Courtois, P.J. (1998). Examination of Bayesian Belief Network for Safety Assessment of Nuclear Computer-Based Systems (ESPRIT DeVa Project 20072). From http://www.csr.city.ac.uk/people/lorenzo.strigini/ls.papers/DeVa_BBN_reports/DeVaTR70_year3.5a/DeVaTR70.pdf
Low, I. and Donohoe, L. (2001). Methods for assessing ATC controllers’ recovery from automation failures. In D. Harris (Ed.), Engineering Psychology and Cognitive Ergonomics, Volume 5: Aerospace and Transportation Systems. National Air Traffic Services (NATS), UK.
Majumdar, A., and Ochieng, W.Y. (2002). Estimation of European Airspace Capacity from a Model of Controller Workload. Journal of Navigation, Vol 55(3), pp. 381-403.
Majumdar, A., Ochieng, W.Y., McAuley, G., Lenzi, J.M., and Lepadatu, C. (2004). The Factors Affecting Airspace Capacity in Europe: A Cross-Sectional Time-Series Analysis Using Simulated Controller Workload. Journal of Navigation, Vol 57(3), pp. 385-405.
Massaiu, S., Haugset, H., and Bjorlo, T.J. (2003). Human Reliability Issues in Traffic Control Centres. Norwegian Research Council.
Mauri, G. (2000). Integrating Safety Analysis Techniques, Supporting Identification of Common Cause Failures. PhD thesis, The University of York.
Metzger, U., and Parasuraman, R. (2005). Automation in future air traffic management: Effects of decision aid reliability on controller performance and mental workload. Human Factors, 47(1), 35-49.
Ministry of Land, Infrastructure, and Transport (2006). Statistics. Air Traffic Activity at Cab Facilities: Area Control Center. From http://www.mlit.go.jp/koku/04_hoan/e/statistics/image/00_00.gif
Mohleji, S.C., Lacher, A.R., and Ostwald, P.A. (2003). CNS/ATM System Architecture Concepts and Future Vision of NAS Operations in the 2020 Timeframe. Center for Advanced Aviation System Development (CAASD), The MITRE Corporation. From http://www.mitre.org/work/tech_papers/tech_papers_03/mohleji_2020/mohleji_2020.pdf
National Aeronautics and Space Administration (2000). Required Communication Performance (RCP). From http://as.nasa.gov/aatt/wspdfs/Oishi.pdf
National Aeronautics and Space Administration (2002). NASA Safety Manual w/Changes through Change 1 (NPR 8715.3). NASA QS / Safety & Risk Management Division.
National Air Traffic Services (1999). Testing Operational Scenarios for Concepts in ATM (Phase II). WP2: Airspace Sectorisation Optimisation. European Commission.
National Air Traffic Services (2002). Manual of Air Traffic Services Part II. London Area Control Centre, edition 2/02.
National Air Traffic Services (2004). NATS apologises for delays experienced today. From http://www.nats.co.uk/news/news_stories/2004_06_03_2.html
National Transportation Library (1997). Potential Cost Savings Ideas for FAA and Users. From http://ntl.bts.gov/lib/000/500/511/costsav.pdf
National Transportation Safety Board (1973). Aircraft Accident Report (AAR-73-14). From http://amelia.db.erau.edu/reports/ntsb/aar/AAR73-14.pdf
National Transportation Safety Board (1983). Aircraft Accident Report (AAR-83-02). From http://amelia.db.erau.edu/reports/ntsb/aar/AAR83-02.pdf
National Transportation Safety Board (1996). Special Investigation Report: Air Traffic Control Equipment Outages. Washington, D.C.
Nolan, M. S. (1998). Fundamentals of Air Traffic Control. Belmont, USA: Wadsworth Publishing Company.
Nuclear Regulatory Commission (1998). Technical Basis and Implementation Guidelines for a Technique for Human Event Analysis (ATHEANA). NUREG-1624. U.S. Nuclear Regulatory Commission, Washington, DC.
Ochieng, W.Y. (2006). Future Air Traffic Management. Course presentation for Air Traffic Management Module (T23). Imperial College London.
Orasanu, J., and Fischer, P. (1997). Finding decisions in natural environments: the view from the cockpit. In C.E. Zsambok and G. Klein (Eds.), Naturalistic decision-making. Mahwah, New Jersey: Lawrence Erlbaum Associates Publishers.
Oren, T., and Ghasem-Aghaee, N. (2003). Personality Representation Processable in Fuzzy Logic for Human Behavior Simulation. Summer Computer Simulation Conference, July 20-24, 2003. Montreal, Canada. From http://www.site.uottawa.ca/~oren/pres/pres-of-2003-01-SCSC-personality.pdf
Parasuraman, R., and Riley, V. (1997). Humans and automation: use, misuse, disuse, abuse. Human Factors Vol 39, 230-253.
Parasuraman, R., Bahri, T., Deaton, J., Morrison, J., and Barnes, M. (1990). Theory and Design of Adaptive Automation in Aviation Systems. Technical Report No. CSL-N90-1, Cognitive Science Laboratory. Catholic University of America, Washington, DC.
Parasuraman, R., Mouloua, M., and Molloy, R. (1996). Effects of adaptive task allocation on monitoring of automated systems. Human Factors. 38. pp. 665-679.
Parasuraman, R., Wickens, C. D., and Sheridan, T. (2000). A model for types and levels of human interaction with automation. IEEE Transactions on Systems, Man, and Cybernetics, 30(3), 286-297.
Park, J., Jung, W., Ha, J., and Shin, Y. (2004). Analysis of operators’ performance under emergencies using a training simulator of the nuclear power plant. Reliability Engineering and System Safety, 83, pp. 179-186.
Perrow, C. (1999). Normal Accidents. Princeton University Press.
Piantek, T.W. (1999). Influence in contracting and purchasing. In W.C. Christensen and F.A. Manuele (Eds.), Safety Through Design: Best Practices. National Safety Council Press.
PPrune Forums (2006). ATC Issues. From http://www.pprune.org/forums/forumdisplay.php?s=ac64e2a0afd13472a93e7df2bba4b826&f=18
Rail Safety and Standards Board (2004). Rail-Specific HRA Tool for Driving Tasks Phase 1 Report. From http://www.rssb.co.uk/pdf/reports/research/T270 Rail-specific HRA tool for driving tasks Phase 1 report.pdf
Rasmussen, J. (1982). Human errors: A taxonomy for describing human malfunction in industrial installations. Journal of Occupational Accidents, 4, 311-335.
Reason, J.T. (1997). Managing the risks of organizational accidents. Aldershot, England: Ashgate Publishing.
Reid, J.W. (1996). Safety by Design. Lecture 4: Cost and acceptability of risk. Hazardous forum: London.
Rigas, G. and Elg, F. (1997). Mental models, confidence, and performance in a complex dynamic decision making environment. Department of Psychology, Uppsala University, Sweden. From http://www.ie.boun.edu.tr/labs/sesdyn/isdc97/TURKIA.doc
RISKS (2000). U.K. ATC System Failure. The RISKS Digest, Vol 20, issue 94. From http://catless.ncl.ac.uk/Risks/20.94.html
Rizzo, A., Ferrante, D., and Bagnara, S. (1995). Handling human error. In J.M. Hoc, P.C. Cacciabue, & E. Hollnagel (Eds.), Expertise and Technology: Cognition & Human-Computer Cooperation (pp. 195-212). Hillsdale, NJ: Lawrence Erlbaum.
Saldana, M. A. M., Herrero, S. G., del Campo, M. A. M. and Ritzel, D. O. (2002). Assessing Definitions and Concepts within the Safety Profession. From http://www.aahperd.org/iejhe/2003_first/ritzel.pdf.
Sampaio, J. J. M., and Guerra, A. A. (2004). The day god failed or overtrust in automation: The Portuguese case study. In Proceedings from the 2nd Conference on Human Performance Situation Awareness and Automation (HPSAA 2). Daytona Beach, FL.
Scerbo, M.W. (2005). Adaptive Automation. Department of Psychology, Old Dominion University. From http://www.cs.colorado.edu/~mozer/courses/6622/papers/aachpt05-12-15.htm
Sellen, A. J. (1994). Detection of everyday errors. Applied psychology: An International Review 43(4), pp. 475-498.
Shappell, S.A. (2000). The Human Factors Analysis and Classification System-HFACS (DOT/FAA/AM-00/7). Federal Aviation Administration. US Department of Transportation. From http://www.nifc.gov/safety_study/accident_invest/humanfactors_class&anly.pdf
Sheridan, T.B. (1980). Computer control and human alienation. Technology Review Vol 10, pp.61-73.
Shier, R. (2004). The Mann-Whitney U Test. Mathematics Learning Support Centre. From http://mlsc.lboro.ac.uk/documents/Mannwhitney.pdf
Shorrock, S. (2002). Error Classification for Safety Management: Finding the Right Approach. In C.W. Johnson (Ed.), Investigation and Reporting of Incidents and Accidents IRIA 2002 (pp. 57-67). From http://www.dcs.gla.ac.uk/~johnson/iria2002/IRIA_2002.pdf
Shorrock, S. T., and Kirwan, B. (2002). Development and application of a human error identification tool for air traffic control. Applied Ergonomics, Vol 33, pp. 319–336.
Smith, S.P., Harrison, M.D. and Schupp, B.A. (2004). How explicit are the barriers to failure in safety arguments? Computer Safety, Reliability, and Security (SAFECOMP'04). In M. Heisel, P. Liggesmeyer and S. Wittmann (Eds), Lecture Notes in Computer Science Vol 3219, pp. 325-337, Springer.
Sorensen, J.N. (2002). Safety culture: a survey of the state-of-the-art. Reliability Engineering and System Safety, Vol 76, pp. 189-204.
Straeter, O. (2000). Evaluation of human reliability on the basis of operational experience. Dissertation at Munich Technical University.
Straeter, O. (2001). The quantification process for human interventions. In: Kafka, P. (ed.) PSA RID – Probabilistic Safety Assessment in Risk Informed Decision making. EURO-Course. 4.- 9.3.2001. GRS. Germany.
Straeter, O. (2005). Cognition and Safety: An Integrated Approach to Systems Design and Performance Assessment. Ashgate: Aldershot.
Subotic, B., Ochieng, W.Y., and Majumdar, A. (2005). Equipment Failures in Air Traffic Control: Finding an appropriate safety target. The Aeronautical Journal of the Royal Aeronautical Society, Vol 109(1096), p. 277-284.
Subotic, B., Ochieng, W.Y., and Straeter, O. (2006a). Recovery from equipment failures in ATC: An overview of contextual factors. Reliability Engineering and System Safety Journal Vol 92 (7), pp. 858-870.
Subotic, B., Ochieng, W. and Straeter, O. (2006b). Recovery from Equipment Failures in Air Traffic Control: A Probabilistic Assessment of Context. Probabilistic Safety Assessment (PSAM 08) Conference, May 14-19, 2006, New Orleans, US.
Swain, A. D., and Guttman, H. E. (1983). Handbook of human reliability analysis with emphasis on nuclear power plant applications (NUREG/CR-1278). Washington D.C.
Theis, I. and Sträter, O. (2001). By-Wire Systems in Automotive Industry. Reliability Analysis of the Driver-Vehicle-Interface Proceedings. ESREL 2001, Turin.
THEMES (2001). Thematic Network for Safety Assessment of Waterborne Transport. Deliverable No. D5.1. Report on Safety and Environmental Assessment Method. From http://projects.dnv.com/themes/Deliverables/D5.1Final.pdf
Theureau, J., Jeffroy, F. and Vermersch, P. (2000). Controlling a nuclear reactor in accidental situations with symptom-based computerized procedures: a semiological & phenomenological analysis. Proceedings from CSEPC 2000. Taejon, Korea, 22-25 November.
UK Civil Aviation Authority (2003). United Kingdom Manual of Personnel Licensing - Air Traffic Controllers (CAP 744). Civil Aviation Authority. London.
UK Civil Aviation Authority (2004). Fact Sheet - SSR Mode S, Edition 1.2. From http://www.caa.co.uk/docs/810/DAP_SSM_Mode_S_SSR_Factsheet.pdf
UK Civil Aviation Authority (2005). Mandatory Occurrence Reporting Scheme. CAP 382. Civil Aviation Authority, London. From http://www.caa.co.uk/docs/33/CAP382.PDF
UK Civil Aviation Authority (2006). Manual of Air Traffic Services - Part 1 (CAP 493). Civil Aviation Authority, London. From http://www.caa.co.uk/docs/33/CAP493Part1.pdf
United Nations (2006). UN in Brief. From http://www.un.org/Overview/brief1.html#footnote
van der Schaaf, T. W. (1992). Near miss reporting in the chemical process industry. PhD thesis. Eindhoven University of Technology.
van der Schaaf, T.W. (1995). Human recovery of errors in man-machine systems. Proceedings of the Sixth IFAC/IFIP/IFORS/IEA Symposium on the Analysis, Design and Evaluation of Man–Machine Systems. Cambridge, MA.
van Es, G.W.H. (2003). Review of Air Traffic Management-related accidents worldwide: 1980-2001. National Aerospace Laboratory (NLR).
Ward, M., Grupen, L., Regehr, G. (2002). Measuring Self-assessment: Current State of the Art. Advances in Health Sciences Education, 7, pp. 63–80.
Weisberg, H.F., Krosnick, J.A., and Bowen, B.D. (1996). An Introduction to Survey Research, Polling, and Data Analysis. SAGE Publications: London.
Wickens, C.D. (1992). Engineering psychology and human performance, 2nd Ed. New York: Harper Collins.
Wickens, C.D. (2001). Attention to Safety and the Psychology of Surprise. From http://www.aviation.uiuc.edu/UnitsHFD/conference/Osukeynote01.pdf
Wickens, C.D., Lee, J.D., Liu, Y., and Gordon Becker, S.E. (2004). An Introduction to Human Factors Engineering. New Jersey: Pearson Prentice Hall.
Wickens, C.D., Mavor, A. and McGee, J.P. (Eds.) (1997). Flight to the Future: Human Factors in Air Traffic Control. Washington, DC: National Academy Press.
Wickens, C.D., Mavor, A. S., Parasuraman, R., and McGee, J.P. (1998). The Future of Air Traffic Control: Human Operators and Automation. National Academy Press: Washington, DC.
Wiener, E.L. and Curry, R.E. (1980). Flight deck automation: promises and problems. Ergonomics, Vol 23, pp. 995-1011.
Williams, J.C. (1986). HEART – A Proposed Method for Assessing and Reducing Human Error. In 9th Advances in Reliability Technology Symposium. University of Bradford, 1986.
Wood, A. (1996). Software Reliability Growth Models. From http://www.hpl.hp.com/techreports/tandem/TR-96.1.pdf
Zapf, D., and Reason, J.T. (1994). Introduction: Human Error and Error Handling. Applied psychology: An international review, Vol 43(4), pp. 427-432.
Appendices
Appendix I The cost of delays induced by ATC equipment failures
Appendix II Interviews with ATM staff
Appendix III Checklist for the Equipment Failure Scenarios in a specific European ATC Centre - An Aide-Memoire framework
Appendix IV The questionnaire design
Appendix V Example of one questionnaire response
Appendix VI Results extracted from question 5 of the questionnaire survey
Appendix VII Overview of contextual factors
Appendix VIII Probabilities for 20 Recovery Influencing Factors (RIFs)
Appendix IX Questions for the ATM Specialist
Appendix X Overview of RIFs, their corresponding levels, and designated probabilities
Appendix XI Validation of the RIFs interaction matrix
Appendix XII Distribution of 20 Recovery Influencing Factors (RIFs)
Appendix XIII Experimental material
Appendix XIV Overview of RIFs, their corresponding levels, and probabilities determined in the experimental investigation
Appendix XV Distribution of the recovery context indicator captured in the experiment
Appendix I The cost of delays induced by ATC equipment failures
The impact of an equipment failure on ATM can be analysed from several perspectives. From a financial perspective, it is necessary to consider both the costs incurred within ATC and the cost of delays across the wider region. A brief exercise was conducted to estimate the cost of delays induced by ATC equipment failures in European Civil Aviation Conference (ECAC) and US airspace.
Table 1 presents data from EUROCONTROL’s Central Flow Management Unit (CFMU) for the period 1999 to 2003, with delays induced by ATC equipment failures split into en-route and airport delays. Given that the cost of one minute of delay in Europe in 2002 is estimated at EUR72 (EUROCONTROL, 2004a), the last column of Table 1 presents the total costs incurred by airlines as a result of airborne and ground delays. It is important to highlight that this estimate of the cost of one minute of delay (EUR72) is based on primary delay costs, reactionary delay costs (i.e. the ‘knock-on’ effect on other aircraft), fuel, maintenance, ground handling of aircraft and passengers, passenger costs of delay to the airline, and future loss of market share due to lack of punctuality (EUROCONTROL, 2004a). As a result, the calculated annual cost of delays caused by ATC equipment failures captures the principal costs borne by airlines and demonstrates the high cost of technical failures.
Table 1 ATC equipment as a cause of airport and en-route delays (personal correspondence¹)
Year | En-route delay (min) | Airport delay (min) | Total delay (min) | Annual cost to airlines (million EUR, based on the 2002 cost per minute)
1999 | 609265 | 461290 | 1070555 | 77.08
2000 | 598660 | 265055 | 863715 | 62.19
2001 | 614534 | 406760 | 1021294 | 73.53
2002 | 425627 | 138045 | 563672 | 40.58
2003 | 149476 | 147528 | 297004 | 21.38
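As a check on the arithmetic, the last column of Table 1 can be reproduced by multiplying each year's total delay by the 2002 estimate of EUR72 per minute. The sketch below assumes exactly that: one flat rate for all five years, with no inflation adjustment or discounting. The function and constant names are illustrative only.

```python
# Reproduces the last column of Table 1: total delay (en-route + airport)
# multiplied by a flat EUR72 per minute of delay (2002 estimate, EUROCONTROL 2004a).
# Assumption: the same 2002 rate is applied to every year, with no inflation adjustment.

COST_PER_MINUTE_EUR = 72

# (year, en-route delay in minutes, airport delay in minutes), taken from Table 1
TABLE_1 = [
    (1999, 609265, 461290),
    (2000, 598660, 265055),
    (2001, 614534, 406760),
    (2002, 425627, 138045),
    (2003, 149476, 147528),
]


def annual_delay_cost(enroute_min, airport_min, rate=COST_PER_MINUTE_EUR):
    """Return (total delay in minutes, cost in million EUR rounded to 2 d.p.)."""
    total_min = enroute_min + airport_min
    return total_min, round(total_min * rate / 1_000_000, 2)


for year, enroute, airport in TABLE_1:
    total, cost = annual_delay_cost(enroute, airport)
    print(f"{year}: {total} min -> EUR {cost:.2f} million")
```

The computed totals and costs agree with all five rows of Table 1, which is consistent with the table applying the single 2002 rate throughout.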
There are several reasons for the year-to-year differences in the delays reported by the CFMU (Table 1). Global factors explaining the reductions in delay after 1999 include the general reduction in air traffic (as a result of the crisis in the aviation industry after September 11th 2001), the presence of severe factors (e.g. the closure of Yugoslav airspace in 1999), the introduction of new route structures in 1999, the influence of European ATM network programmes (e.g. Reduced Vertical Separation Minima (RVSM) and improved capacity management), and staffing issues, which reached a record level in 2002 (EUROCONTROL, 2003b).
¹ Personal correspondence with EUROCONTROL CFMU.
Similar calculations have been carried out for the impact of ATC equipment failures on the US National Aviation System (NAS). The US NAS consists of aircraft, pilots, facilities, controllers, airports and maintenance personnel, together with computers, communications equipment, satellite navigation aids and radars. The direct aircraft operating cost per minute of delay is taken from the Air Transport Association (ATA) estimate for 2005, which is $62.33 (Air Transport Association, 2006). This cost comprises fuel burn, extra crew time, maintenance, aircraft ownership costs and additional costs. The additional costs cover extra gates and ground manpower, as well as costs imposed on airline customers (passengers and cargo shippers) in the form of lost productivity, wages and customer satisfaction. The FAA estimates the average cost of delay to air travellers to be $30.26 per hour, or about $0.50 per minute (Air Transport Association, 2006). The resulting costs of delays induced by ATC equipment failures in 2004 and 2005 are given in Table 2.
Table 2 ATC equipment as a cause of US National Aviation System delays. Source: Bureau of Transportation Statistics (2004); summaries available only for the whole of 2004 and 2005
Year | Delay due to ATC equipment (min) | Average cost (million $)
2004 | 402644 | 25.10
2005 | 274126 | 17.09
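The same arithmetic check can be applied to Table 2. Multiplying the recorded delay minutes by the ATA direct operating cost of $62.33 per minute reproduces both entries, which suggests that the table applies this rate alone and does not add the FAA estimate of passenger delay cost. The sketch below rests on that inference and assumes one flat rate for both years; the names are illustrative.

```python
# Reproduces Table 2: ATC-equipment delay minutes multiplied by the ATA 2005
# direct aircraft operating cost of $62.33 per minute (Air Transport Association, 2006).
# Assumption (inferred from the table values): the FAA passenger delay cost
# of about $0.50 per minute is not added on top of this rate.

COST_PER_MINUTE_USD = 62.33

# year -> delay attributed to ATC equipment (minutes), taken from Table 2
TABLE_2 = {2004: 402644, 2005: 274126}


def delay_cost_million_usd(delay_min, rate=COST_PER_MINUTE_USD):
    """Cost of delay in million US dollars, rounded to 2 d.p."""
    return round(delay_min * rate / 1_000_000, 2)


for year, minutes in TABLE_2.items():
    print(f"{year}: {minutes} min -> ${delay_cost_million_usd(minutes):.2f} million")
```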
In general, these high-level analyses illustrate that equipment failures can significantly affect the operational, safety and financial aspects of both ATC and ATM systems. The methods employed for Europe and the US to calculate the cost of delay per minute are largely similar; the only difference is the monetary value assigned to each minute of delay. In addition, the ‘true’ cost of delay induced by equipment failure should also incorporate technical repair, unscheduled maintenance, training and additional staffing. However, these costs are assumed to be small compared with the cost of delay. On this assumption, the estimates are a reasonable representation of the total cost induced by ATC equipment failures in both the European and the US aviation markets.
Appendix II Interviews with ATM staff
Interviews with relevant Air Traffic Management (ATM) staff were conducted as a method of data collection to support the research presented in this thesis and to complement the available theoretical findings. They aimed to capture the operational experience of ATM specialists and experienced system control and monitoring engineers. The interviews focused on four research areas:
• classification of ambiguous operational failure reports;
• characteristics of air traffic controller training;
• characteristics of equipment failures in Air Traffic Control (ATC); and
• contextual factors relevant to controller recovery from equipment failures in ATC.
Interviews with ATM specialists focused on air traffic controller training (ab initio, recurrent, and emergency training) and on contextual factors relevant to controller recovery. Interviews with system control and monitoring engineers captured their experience of the characteristics of ATC equipment failures.
The sample of ATM staff interviewed was as follows:
• system control and monitoring engineers from four countries:
  o National Air Traffic Services (NATS), Corporate and Technical Centre (CTC) and Swanwick Centre, UK;
  o EUROCONTROL Maastricht Upper Area Control Centre (MUAC), Netherlands;
  o Irish Aviation Authority (IAA);
  o Airports Authority of India (AAI);
• ATM specialists from two countries:
  o EUROCONTROL Institute of Air Navigation Services (IANS), Luxembourg;
  o Irish Aviation Authority (IAA).
Findings related to each research area are presented below.
Table A-1 Findings related to the clarification of ambiguous operational data
Location | Number of participants interviewed | Research question | Finding | Agreement between study participants
UK NATS (CTC) | one experienced engineer | Ambiguous operational failure reports | Proper classification of all operational failure reports | Yes, clarified all ambiguities
EUROCONTROL MUAC | two experienced engineers | (as above) | (as above) | (as above)
Table A-2 Findings related to air traffic controller training
Location | Number of participants interviewed | Research question | Finding | Agreement between study participants
EUROCONTROL IANS | one ATM specialist | Usefulness of announcing the training for unusual/emergency situations | Although controllers may anticipate an unusual occurrence within their emergency training, this does not facilitate better performance as long as they do not know the nature of that unusual occurrence | Yes, both agreed
IAA | one ATM specialist | (as above) | (as above) | (as above)
Table A-3 Findings related to the characteristics of equipment failures in ATC
Location | Number of participants interviewed | Research question | Finding | Agreement between study participants
UK NATS (CTC) | one experienced engineer | Existence of latent failures | Latent failures tend to go unnoticed until some other event or failure reveals their existence | Yes, experienced latent software failures
EUROCONTROL MUAC | one experienced engineer | (as above) | (as above) | (as above)
IAA | one experienced engineer | (as above) | (as above) | (as above)
UK NATS (CTC) | one experienced engineer | Complexity of failure type | The majority of ATC equipment failures affect a single system | Yes
EUROCONTROL MUAC | two experienced engineers | (as above) | (as above) | (as above)
IAA | one experienced engineer | (as above) | (as above) | (as above)
UK NATS (CTC) | one experienced engineer | Time course of failure development | The majority of failures tend to manifest themselves suddenly | Yes
EUROCONTROL MUAC | two experienced engineers | (as above) | (as above) | (as above)
IAA | one experienced engineer | (as above) | (as above) | (as above)
Table A-4 Findings related to the contextual factors relevant to controller recovery from equipment failures in ATC
Location | Number of participants interviewed | Research question | Finding | Agreement between study participants
IAA | two ATM specialists | Contextual factors relevant to controller recovery from equipment failures in ATC | Validation of the candidate contextual factors | Agreed on the selected contextual factors and aided the definition of each factor
IAA | three ATM specialists | Interactions between contextual factors | Validation of interactions between contextual factors identified using operational experience and past research | Their feedback was similar. The inconsistencies identified were the result of the misperception of some factors, and all were clarified during the interview.
Appendix III Checklist for the Equipment Failure Scenarios in ATC Centre - An Aide-Memoire framework
This section provides a framework for the design of Aide-Memoire, or checklist-type, procedures for recovery from equipment failures in a particular ATC Centre. The proposed framework is adapted to the ATC Centre that participated in the experimental investigation segment of the research presented in this thesis. This Aide-Memoire provides a potential framework, which needs to be further discussed and developed in accordance with the in-house expertise of the system control and monitoring staff and ATM specialists of the respective ATC Centre. However, the concept and the design solution presented here are transferable across ATC Centres.
Contents
Once all equipment failures to be included in the Aide-Memoire have been defined, they could be categorised into three distinct groups based upon their impact on ATC operations (as discussed in Chapter 4). These three categories, with their potential colour coding in the Aide-Memoire, are as follows:
• Major impact to operations room (all sectors/all workstations) – severe flow restrictions possible. Potential colour coding: RED. Relevant failures are:
  o ONL LAN failure
  o Failure of the Surveillance Network
  o Failure of COMPAD
  o Loss of Flight Server
  o Loss of Track Server
  o Loss of SSR and PSR
  o Loss of FDPS
  o Loss of MRP
• Moderate impact to operations room – impact to one or several workstations in different suites, possible need to combine/move positions immediately and possible flow restrictions. Potential colour coding: YELLOW. Relevant failures are:
  o Reduced radar data mode
  o Reduced alert mode
  o Reduced communication mode
  o Loss of ARTAS
  o Loss of VCS panel
  o Loss of a single CWP
  o Loss of entire sector suite
  o Loss of SRP
  o Loss of adjacent sector
• Minimal impact – not immediately critical but may have greater operational impact over time. Potential colour coding: GREEN. Relevant failures are:
  o Radar Data Function failure
  o Loss of single frequency
  o Overload of SRP
  o Overload of MRP
  o Loss of external feeds to AIS
  o Loss of STCA
  o Loss of APW
  o Loss of MSAW
  o Loss of OLDI
  o Loss of paper strip printer
Note that the categorisation above lists some but not all possible failures. Those marked in italics are designed in the Aide-Memoire format and are presented below. Further input from system control and monitoring staff and ATM specialists may yield a more accurate and precise list of failure types and of the recovery steps to be taken.
Design
At the top of each procedure, it would be useful to show the pictorial Human Machine Interface (HMI) warning, if applicable (e.g. the highlighted labels on the General Information Window). This would be followed by two types of information: firstly, the required recovery steps, i.e. those that a controller must perform to recover effectively and ensure a safe air traffic control service; and secondly, the key effects of the equipment failure on the ATC system (i.e. the ATC system feedback). The rationale for this design solution is that the top part of the checklist should be reserved for the items that controllers should be aware of first, i.e. the recovery steps.
In addition, it is necessary to define procedures for the different personnel working in the operational environment, namely controllers (i.e. the different roles of executive, planner, and assistant controller), supervisors, and managers, to ensure a seamless recovery process. If, for example, radar services fail on all workstations, personnel should have a readily available guide to help them recover from the failure. These guidelines may vary according to the type of user, because different roles may require different information on equipment failures and recovery procedures.
Note that the colour-coded categorisation could also be used in a slightly different manner. If this Aide-Memoire becomes part of the generic procedures for handling emergency/unusual situations, then the use of colour should be restricted to categories such as ‘Aircraft Emergencies’, ‘Equipment Failures’, and ‘Fire and Building Evacuation’.
The Aide-Memoire, as a hard, laminated-copy flip chart, should be readily available at
each Controller Working Position (CWP). A more detailed version, providing local or
ATC Centre specific data, should be kept at the supervisor’s position. For simplicity and
efficiency, it is better to present each relevant failure on a single page highlighting the
two main areas: what recovery steps to perform and what feedback to expect from the
ATC system. This approach ensures the most efficient use of the tool.
The final version of the Aide-Memoire should not be considered an exhaustive list
but rather a living document. In other words, it will be necessary to update this tool on
an annual basis to reflect local expertise and to incorporate all changes to the ATC
system (both software and hardware).
ONL LAN Failure
ATCO actions:
− Inform Coordinator
− Inform all traffic
− Check spare ODS
− Maintain timely & accurate strip marking
− Restrict traffic
− Utilise holding patterns
− Use only verbal coordination channels
− Reaffirm traffic identification using the code on the FPS
− Identify any new tracks using the “Confirm Squawk?” method
− Seek SAS assistance and print screen if possible
− Ground all sport/non-commercial traffic ASAP
− Utilise strategic ATC techniques when possible
− Conduct regular checks of aircraft identification
− Monitor Mode C closely
− Be aware of the absence of Safety Nets and Monitoring Aids
− Cross check that exit conditions are achieved
− Expedite reduction in traffic load
ONL LAN Failure (Cont’d)
Expect:
The radar data is distributed via the RFS LAN
The following functions are NOT AVAILABLE:
− Safety Nets and Monitoring Aids (existing alarms maintained)
− Flight Plan function (no coupling, no RAM & CLAM)
− Radar Data function replaced by Radar Fallback function
− Flight plan commands (i.e. mod)
− Flight plan lists frozen with data at time of failure
− Reception Queues
− Message transmission
− Coordination messaging
− Mail box management
− Resectorisation
− SSR code management
− AIS (only data available at the time of failure)
− All correlation will be lost
Failure of the Surveillance Network
ATCO actions:
− Inform Coordinator
− Inform all traffic
− Employ procedural control techniques (if necessary utilise emergency vertical separation of 500 feet)
− Utilise holding patterns
− Deny departures
− Maintain timely & accurate strip marking
− Instruct aircraft to maintain VMC, if in VMC
− Reduce traffic load ASAP
− Seek assistance
− Relocate to contingency site if required
Expect:
All ODS frozen or blanked throughout the Centre
Failure of COMPAD
ATCO actions:
− Inform Coordinator
− Transmit on second sector COMPAD
− Access RBS and inform traffic of failure
− Reset COMPAD
− Seek assistance and relocate to spare CWP
− Inform traffic of restoration of normal service when service is restored
Expect:
Complete or Partial failure
Inability to transmit on RTF
Inability to access alternate RTF
Inability to use intercoms
Inability to access telephone network
Reduced Radar Data Mode
GIW will show “MRTS”
ATCO actions:
− Inform Coordinator
− Report failure
− Operate as normal
Expect:
All functions are available
The switch to RFS (MRTS) from ARTAS is automatic
Any position in by-pass before ARTAS failure will remain in by-pass
Reduced Alert Mode
GIW will show “SNMAP”
ATCO actions:
− Inform Coordinator
− Be aware of restricted, danger and prohibited airspace inc. TSAs
− Check MSAs at regional airports
− Double and cross check Oceanic Entry COPs and levels
− Maintain timely & accurate strip marking
− Utilise strategic traffic plans
− Ensure tactical ATCO action is accurate
− Employ TRM best practice
− Continuously scan Mode C
− Seek SAS assistance if necessary
Expect:
Any alert displayed prior to the reduced alert mode will remain displayed regardless of whether or not the alert is still valid.
The following functions are NOT AVAILABLE:
− Safety Net Function (STCA)
− ATC Tools (MSAW and APW)
− Monitoring Aids (RAM and CLAM)
− Coupling
− No APR sent to Flight Data function (no profile updates)
Reduced Flight Plan Mode
GIW will show “FDP”
ATCO actions:
− Inform Coordinator
− Check availability of FDP function on spare ODS
− Inform traffic of failure
− Maintain timely & accurate strip marking
− Use verbal coordination channels inter sector/centre
− Identify all new tracks using the “Confirm Squawk” technique
− Maintain identification by regular checks
− Restrict traffic flow where necessary
− Utilise holding patterns
− Be aware of unreliable Safety Nets and Monitoring Aids
− Seek SAS assistance where necessary
Expect:
The following functions are NOT AVAILABLE:
− Flight Plan tracks
− Tracks already displayed will remain displayed
− Flight Plan commands (i.e. mod, terminate)
− Message queues
− Message transmission
− Coordination messages
− Mailbox management
− Resectorisation
− Limited Safety Net and Monitoring Aids due to no update of the flight plans
Reduced Communication Mode
GIW will show “FDX”
ATCO actions:
− Inform Coordinator
− Use only verbal inter-centre coordination channels
− Inform all traffic on RTF
− Seek FDA assistance for AFTN or AIS information
− Maintain timely & accurate strip marking
− Seek SAS assistance where necessary
Expect:
The following functions are NOT AVAILABLE:
− Inter centre communications
− AFTN
− Coordination messages (except inter sector)
− Flight plans are not updated by external messaging
− AIS
No radar data function (neither ARTAS nor MRTS nor RFS)
Appendix IV The questionnaire design
Air Traffic Controller Questionnaire
Dear Sir/Madam, This questionnaire was created for the purpose of obtaining information on equipment failures and recovery in Air Traffic Control (ATC) System(s) from various standpoints. The information you provide will be used in a research project jointly supported by EUROCONTROL Experimental Centre and Imperial College London. We would greatly appreciate your completing the attached questionnaire. It will only take a few minutes of your time to answer the questions, which will contribute to our joint effort to introduce more real experience into ATC safety analysis. The data collection is intended to support recovery strategies for future ATM and to analyse the current status of this issue. The information that you provide will be used as an additional data source for a PhD dissertation being developed in this area. The questionnaire was created in Microsoft® Word 2000. It is our intention to enable you to fill it out electronically and send it directly to the following e-mail address ([email protected]). However, if it is more convenient, you can use the fax number provided below. Generally there are two formats of question, which require different ways of answering. For some questions you will have to choose the most appropriate answer by highlighting or marking it (e.g. yes/no answers), while for the others you will have to type in your full answer. Please fill out your questionnaire and try to answer the questions in as much detail as possible. Your answers will be strictly confidential and de-identified, so your personal details will not appear in any document connected to this research. Thank you in advance for your time and effort.
Sincerely, Branka Subotic
Research PhD student Imperial College London Centre for Transport Studies London SW7 2AZ
1. Total number of years active as a controller ____________
2. Please list the types of facilities that you have worked in, beginning with the most recent.
Table columns:
ATC Facility Name (beginning with the most recent) | Location | Country | Number of years worked in particular Unit | Type (Civilian/Military) | Position/Rating (ACC/RDR, ACC/PROC, APP/RDR, APP/PROC, TWR; or ARTCC, TRACON, ATCT in the USA)
3. Have you ever experienced ATC equipment failure during your work? Mark the corresponding letter. (If ‘No’ go to question 10) Y N
4. What is the average number of ATC equipment failures that you experience in one year? _________________________
5. Please fill in any previous experience with equipment failures that seriously impacted your work:
Table columns:
Type of equipment failure | System affected? (See Note below) | Frequency of the failure per year (in your own experience)? | Did you detect it and how? | If not, who detected it? | Duration of the failure in min, h, or days (if you can recall)? | Was the context* of the failure an important factor? If yes, did it have a positive or negative impact? | Did a recovery/contingency procedure exist or not? | Did recovery/contingency training exist or not? | Who initiated the recovery? | How was the recovery initiated? | Any additional comment
* Context is defined as any aspect of the operating context that influenced the failure or the recovery (e.g. workload, HMI, personal factors, team factors).
Note: The typical CWP (controller working position) contains one or more of the following systems (systems will vary from one center and country to another):
• Radar (SSR, PSR, Mode S, radar data processing (RDP), multi-radar processing (MRP), single radar processing (SRP))
• Ancillary screens (meteorological information, strip bay, traffic flow information, etc.)
o Flight Plan Processing (FPP)
o Flight Progress Strips (FPS)
• Pointing devices (mouse & trackball)
• Secondary input devices (keyboard or touch input device (TID))
• Communication panel
• R/T, telephone, headset, intercom
• Strip printer
• Ground based Safety Nets (SNET): STCA, MSAW, APW, or any other SNET available
• Other (e.g. power supply)
6. How much do you generally rely upon the written procedures in case of equipment failure and how much on situation-specific problem solving (i.e. improvisation)? Fill in the corresponding number for Procedures, Problem solving, AND Other.
1 (very much) 2 3 (moderately) 4 5 (not at all)
Written procedures
Situation-specific problem solving
Other (e.g. past experience)
7. Is there any organized exchange of the past experience in solving the equipment failures with your fellow colleagues?
Y N
8. If yes, is it supported by your management as a good work practice? Y N
9. According to your experience, what are the three most unreliable ATC systems/subsystems? Please use the device listing from the Note above to state those systems starting with the most unreliable one:
(Note: Reliability is defined in this questionnaire as the probability that a piece of equipment or component will perform its intended function without failure over the given time period and under specific or assumed conditions)
The following questions should be answered in relation to your current job, position, and level of experience (the first facility listed in question 2).
Procedures
10. Are recovery/contingency procedures available? Mark the corresponding letter. Y N
11. Which types of equipment failures (outages) are covered by procedures in your Center?
12. Are recovery/contingency procedures up-to-date? Y N
13. Are recovery/contingency procedures comprehensive? Y N
14. Are recovery/contingency procedures complete? Y N
15. If not, which procedure(s) would you add?
16. Are recovery/contingency procedures understandable? Y N
17. Are recovery/contingency procedures easily accessible? Y N
18. Are recovery/contingency procedures realistic/feasible? Y N
19. Are recovery/contingency procedures compatible with other procedures? Y N
20. Describe any situation in which you had a problem applying a recovery/contingency procedure, and explain why.
Training
21. Is training provided in recovery from equipment failures? Y N
22. Is there separate refresher training every year? Y N
23. If provided, how many times per year?
24. Is it enough? Y N
25. Does the training cover all important equipment failures? Y N
26. If not, what should be added?
27. Are training methods suitable (realistic, varied, etc)? Y N
28. Is recovery/contingency training compatible with and linked to other training? Y N
Conclusion
29. Please write down any other comments or suggestions based on your past experience or professional opinion that you might have on the issue of equipment failures, recovery/contingency procedures, or training.
Thank you for taking the time to answer these questions. Your time and participation are greatly appreciated.
--End--
Appendix V Example of one questionnaire response
Appendix VI Results extracted from question 5 of the questionnaire survey
Question 5 aimed to give controllers an opportunity to describe their past
experience with equipment failures that seriously impacted their work. In order to
provide a structured description of each example and extract all relevant information,
question 5 was presented in the form of a table, in which the rows dealt with different
failure types and the columns with various failure characteristics. These failure
characteristics were as follows:
1. Type of equipment failure and system affected (assessed in section 6.7.3.3
of Chapter 6);
2. Frequency of failure per year;
3. Individual who detected the failure;
4. Duration of the equipment failure;
5. Importance of the recovery context;
6. Existence of recovery procedure for a particular failure (assessed in Table
6-3, Chapter 6);
7. Existence of training for recovery for a particular failure;
8. Individual who initiated the recovery and method applied; and
9. Concluding remarks.
1. Frequency of failure per year
It was not possible to extract the frequency of failure experienced by controllers in
27.20 percent of cases. This was partly due to missing responses but mostly due to vague
and unclear responses (e.g. very often, rare). The available and pre-processed data
show that the frequency of failures is on average more than 14 per year, ranging
from less than once per year to as many as 730 annually (i.e. twice per day). The
wide dispersion of the data confirms that controllers interpret the term equipment
failure differently (as discussed in section 6.7.3.1 of Chapter 6).
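The conversion of free-text answers into an annual frequency can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used in the survey analysis: each answer is assumed to have already been parsed into a (count, period) pair, the period names and the conversion table `PERIODS_PER_YEAR` are hypothetical, and vague answers such as "very often" are represented as missing values and excluded from the average, as they were in the 27.20 percent of unusable cases.

```python
from typing import List, Optional, Tuple

# Hypothetical conversion table (an assumption, not from the survey):
# how many times each reporting period occurs in one year.
PERIODS_PER_YEAR = {"day": 365, "week": 52, "month": 12, "year": 1}


def annual_frequency(count: Optional[float], period: Optional[str]) -> Optional[float]:
    """Convert a parsed (count, period) answer into failures per year.

    Vague answers (e.g. "very often", "rare") carry no usable count or
    period and are returned as None so that they can be excluded.
    """
    if count is None or period not in PERIODS_PER_YEAR:
        return None
    return count * PERIODS_PER_YEAR[period]


def summarise(responses: List[Tuple[Optional[float], Optional[str]]]):
    """Return (share of unusable answers, mean, minimum, maximum) per year."""
    values = [annual_frequency(count, period) for count, period in responses]
    usable = [v for v in values if v is not None]
    missing_share = 1 - len(usable) / len(values)
    if not usable:
        return missing_share, None, None, None
    return missing_share, sum(usable) / len(usable), min(usable), max(usable)


# Made-up answers: "twice per day", "once a month", "3 per year", "very often".
example = [(2, "day"), (1, "month"), (3, "year"), (None, None)]
print(summarise(example))
```

With a 365-day year, "twice per day" maps to 2 × 365 = 730 failures per year, the upper end of the range reported above.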
2. Individual who detected the failure
The failures were detected most frequently by controllers (in 79.4 percent of examples)
and with the assistance of a system-generated failure alert (in 7.1 percent of
examples). Other cases included failure detection by watch supervisors, engineers,
pilots, or controllers from other ATC Centres (in the case of a failure affecting national
or regional airspace, such as failure of satellite communication, the flight data processing
system, or radar). These findings are expected, as NATS (2002) reports that most
failures do not affect controllers because they are prevented or recovered by the system
control and monitoring unit. Moreover, the results obtained from this questionnaire
survey emphasise that the prompt detection of any ATC system deficiency depends
mostly on the controller, as a direct result of the controller’s situational awareness.
Furthermore, the results show that failure detection may be aided by system-generated
failure alerts. This is an example of the synergy between technical and controller
recovery, achieved through built-in technical defences that transmit information on
failure (discussed in Chapter 4, section 4.3.2). Such technical defences are likely to
show greater potential in the highly integrated ATC environment of the future.
3. Duration of the equipment failure
As with the frequency variable, it was not possible to extract the duration of failures in
27.20 percent of examples. This was expected given the difficulty of recalling the
duration of past failures. Additional problems were encountered with vague qualitative
responses (e.g. several days, a couple of hours, a few minutes). The available and
pre-processed data show that the average duration of the reported failures was close to
one day, ranging from five minutes to one month. The large dispersion indicates that
different types of failures have different durations.
The duration variable is categorised in the same way as for the operational failure
reports (see Chapter 4, section 4.4.6), namely failures lasting up to 15 minutes, between
15 minutes and one hour, between one hour and one day, and more than one day. It is
interesting to note that the distributions of duration from the operational failure reports
and from the past experience captured in this survey are similar (Figure 1). The main
difference is observed in the third category (duration from one hour to one day): in the
operational environment, equipment failures of this duration appear to occur more
frequently than in the experience reported by controllers worldwide.
[Figure 1: bar charts of frequency by duration category (h): [0.00-0.25], [0.26-1.00], [1.01-24.00], [>24.01]; panel a) questionnaire survey; panel b) Country D operational failure reports]
Figure 1 Distribution of the duration variable a) from the questionnaire survey; b) from the Country D operational failure reports (see Chapter 4)
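The four duration categories used in Figure 1 can be expressed as a simple binning function. The Python sketch below is illustrative only: the category labels, the choice to make each upper bound inclusive, and the example durations are assumptions, not the author's actual processing code.

```python
from bisect import bisect_left
from collections import Counter
from typing import Dict, List

# Inclusive upper bounds in hours: 15 minutes, one hour, one day.
UPPER_BOUNDS_H = [0.25, 1.0, 24.0]
LABELS = ["<=15 min", "15 min-1 h", "1 h-1 day", ">1 day"]


def duration_category(hours: float) -> str:
    """Return the duration category for a failure lasting `hours` hours."""
    # bisect_left gives the index of the first bound >= hours, so a value
    # equal to a bound stays in the lower category.
    return LABELS[bisect_left(UPPER_BOUNDS_H, hours)]


def category_shares(durations_h: List[float]) -> Dict[str, float]:
    """Percentage of failures in each category, in LABELS order."""
    counts = Counter(duration_category(d) for d in durations_h)
    total = len(durations_h)
    return {label: 100 * counts[label] / total for label in LABELS}


# Made-up durations spanning the reported range: 5 minutes to about one month.
print(category_shares([5 / 60, 0.5, 2.0, 30 * 24]))
```

Applying the same function to both data sets is what makes the two panels of Figure 1 directly comparable.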
4. Importance of the recovery context
When asked about the context surrounding the occurrence of an equipment failure, the
controllers acknowledged its importance in the majority of examples (73 percent).
Furthermore, these controllers rated its impact mostly as negative (63.9
percent of examples). The negative issues mentioned regarding the context of the
equipment failures were reduction of capacity, increased workload, increased stress,
increased communication with aircraft, increased coordination with adjacent sectors,
and in some cases additional workload due to deterioration in the weather. However,
there were several instances in which controllers rated the context as positive, mostly
owing to efficient teamwork, the availability of an efficient assistant, low traffic levels at
the time of occurrence (i.e. no significant increase in workload), and the ability to work
with fallback systems. The importance of context identified in past research is therefore
confirmed by this questionnaire survey. The following chapters are dedicated to further
assessment of the recovery context.
5. Existence of training for recovery for a particular failure
Question 5 allowed mapping between ATC functionalities and the available recovery
training for the sampled equipment failures¹. The analysis showed that in 48 percent of
the examples provided, the controllers had received some type of recovery training. This
training was mostly provided for the communication, navigation, surveillance, and data
processing functions. A lack of training was identified for power outages and loss of
safety nets.
6. Individual who initiated the recovery and method applied
Recovery was initiated and applied predominantly by controllers rather than by watch
managers or engineers. This is understandable, as section 2 pointed out that most
equipment failures are detected by controllers. Having detected a problem with
equipment, the controllers have to inform engineers, indirectly through the watch
manager, which constitutes the initiation of the
recovery. In some simple cases (e.g. loss of microphone and loss of screen), the
controller tries to replace the failed equipment either by using the spare one or by
changing to another working position (if there are any spare ones). In more complex
situations, when a change of position is not possible, the controller has to continue
working with the remaining tools and equipment and potentially revert to procedural
control, assure vertical separation, use fallback systems, and/or transfer all flights to an
adjacent sector or flight information region. Engineers initiate the recovery process in
the case of failures of aeronautical data exchange with adjacent ATC Centres,
runway/taxiway lighting systems, and data processing system. However, the controller
still remains responsible for safe separation of all traffic in the affected airspace.
¹ Question 26, although intended to capture the types of recovery training missing in each
sampled ATC Centre, yielded mostly high-level comments on the impossibility of training for every potential equipment failure.
7. Concluding remarks
In general, controllers perceive equipment failures as stressful and distracting
events that pose a major safety problem due to increased workload and difficulties with
maintaining identification of aircraft (e.g. in the case of radar failure and data processing
failure). In one particular instance a controller commented that an equipment failure led
to a near miss. Another example pointed out the problems with equipment failures
occurring during the night shift, as technical staff are not always available during that
period.
Appendix VII Overview of contextual factors
Factor HERA
Eurocontrol HERA [12]
TRACEr Shorrock and Kirwan [19]
RAFT Eurocontrol
[20]
THERP Swain and Guttmann [24]
COCOM Hollnagel
[27]
CREAM Hollnagel [11]
External PSF Stressors Internal
PSF
1 Pilot-controller comm.
Pilot-controller comm.
Pilot-controller comm.
Written and verbal communication
2 Pilot actions
3 Traffic and airspace
Traffic and airspace Task load and system complexity
Complexity; Requirements for perception; requirements for motor speed
Task speed; Task load
4 Weather
5 Documentation and procedures
Procedures Procedures and documentation
Required procedures; Work-methods; Plant policy
Plans Availability of procedures/ plans
6 Training and experience
Training and experience
Training and experience
Prior training, experience
Normal/familiar process state
Adequacy of training and experience
7 Workplace design and HMI
Workplace design, HMI, and equipment factors
Human machine interaction
Design features; Factors in task and work resources; Warnings and danger signs; Man-machine factors; Interface
Inconsistent labelling
MMI and support
Adequacy of MMI and operational support
8 Environment Ambient environment
Quality of environment; T; Air quality; Situational factors
Usability of control; Usability of equipment; Positioning; Equivocation of equipment ; arrangement of equipment; display range; accuracy of display; Labelling; Marking; Reliability; Technical layout; Construction; Redundancy; Coupled equipment
Low signal to noise ratio; Overriding information easily accessible; no means to reverse an unintended action; Poor system feedback; Poor system feedback on activity progress
8 Technical/workplace/situational factors
Environmental factors and ergonomics
External event Poor environment
9 Person related factors Stress; Workload
Human performance capabilities at low point; Excessive workload
Processing; Information; Goal reduction
Operator under load/boredom; A conflict between intermediate and long-term objectives; Stress and ill-health; Information overload
Person issues; Demand of perception, cognition, etc.
10 Task organisation Social factors Poor handovers and team coordination problems
Team issues
11 Organisational factors Lack of supervision/checks
Non-optimal use of human resources
Low workforce morale or adverse organisational environment
12
13 Time Factors relevant for prioritisation of recovery-related factors
Time pressure Time constraints Time pressure Time pressure
The time needed to correctly perform tasks, steps, and actions
14 Occurrence-related factors
Appendix VIII Probabilities for 20 Recovery Influencing Factors (RIFs)
The relevant Recovery Influencing Factors (RIFs) are discussed in four main
groups: internal factors (i.e. related to the controller), equipment failure related factors,
external factors (i.e. factors related to working conditions), and airspace related factors.
The following paragraphs present the underlying considerations in developing the
probability values for each predefined RIF.
A.1 Internal factors
Internal factors represent a group of RIFs closely related to the air traffic controller.
These include quality of training, controller experience with equipment failures in
his/her professional career, experience with (or trust in) the ATC system, generic
assessment of personal factors (e.g. personality, fatigue, stress), and communication
for recovery as a result of detected equipment failure.
A.1.1 Training for recovery from ATC equipment failure
This factor describes the adequacy of training provided in recovery tasks based on the
existing recovery procedures and/or other ATC Centre specific equipment failures,
frequency of refresher training (e.g. once per year), and familiarity with ATC system
operational modes (ranging from full, through reduced/emergency, to failed operation).
The qualitative descriptor and the corresponding probabilities are determined from the
questionnaire survey responses based on percentages of ATC Centres that provide
training for recovery, those that provide this training but not consistently, and those that
do not provide any training for recovery (see Chapter 6, section 6.7.3.6 and Chapter 8,
section 8.3.1.2). The qualitative descriptor and the corresponding probabilities for this
RIF are presented in Table 1.
Table 1 Summary of the RIF ‘Training for recovery from ATC equipment failure’
RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation
Training for recovery from ATC equipment failure | suitable | The questionnaire survey | 134 | 52 | 0.52 | -
 | tolerable | | | 17 | 0.17 |
 | counterproductive | | | 31 | 0.31 |
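Each RIF is thus a discrete probability distribution over its qualitative levels, obtained by dividing the percentage of responses by 100. The Python sketch below illustrates this conversion for Table 1, together with a sanity check that the probabilities sum to one. The function name `rif_from_percentages` and the ±0.011 tolerance (chosen because rounded percentages may not add up to exactly 100) are assumptions for illustration.

```python
from math import isclose
from typing import Dict


def rif_from_percentages(percentages: Dict[str, float], tol: float = 0.011) -> Dict[str, float]:
    """Turn 'percentage of responses' per qualitative level into RIF probabilities.

    Rounded percentages may not add up to exactly 100, so the sum of the
    probabilities is only required to be within `tol` of 1 (an assumed tolerance).
    """
    probabilities = {level: pct / 100 for level, pct in percentages.items()}
    total = sum(probabilities.values())
    if not isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"probabilities sum to {total:.3f}, not 1")
    return probabilities


# Percentages of responses from Table 1 (134 questionnaire responses).
training_rif = rif_from_percentages(
    {"suitable": 52, "tolerable": 17, "counterproductive": 31}
)
print(training_rif)
```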
A.1.2 Previous experience with equipment failures
This factor describes the overall level of controller experience with equipment failures,
as well as the level of experience with a particular type of failure under assessment.
The qualitative descriptor is set at two levels (controllers can either have experience
with equipment failures or not), while the probabilities are determined from the
questionnaire survey, further validated by the responses from the ATM specialists
surveyed (Table 2).
Table 2 Summary of the RIF ‘Previous experience with equipment failures’
RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation
Previous experience with equipment failures | experienced any type of equipment failure | The questionnaire survey | 134 | 95 | 0.95 | ATM specialists surveyed
 | no experience with equipment failures | | | 5 | 0.05 |
A.1.3 Experience with system performance (reliance or trust in the system)
This dynamic factor describes the overall level of experience of the controller with the
ATC system including the tools and subsystems on the ATC console. The use of
automated tools depends upon the controllers’ trust in their reliability. The extreme
situations of undertrust or overtrust may lead to problems: the former may result in the
tool not being used, and the latter in overreliance of the controller on the available
tool. The probabilities are determined from the findings of the study by Hilburn
and Flynn (2001), also reported in EUROCONTROL (2000b), which involved a total of
79 controllers from seven European ATC Centres. This study used both focus group
discussions and survey data collections to extract controllers’ attitudes to future
automation needs, system development issues, and operational requirements. The
results showed that 18 percent of controllers sampled mistrust technology. On the
other hand, the responses from the ATM specialists surveyed in this thesis reveal that
10 percent of controllers have excessive trust in the system. Taking mistrust and
excessive trust together, the qualitative descriptor for this RIF is set at two levels and
the corresponding probabilities are shown below (Table 3).
Table 3 Summary of the RIF ‘Experience with system performance’
RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation
Experience with system performance (reliance or trust in the system) | objective attitude toward the ATC system | Past research and ATM specialists | 79/8 | 72 | 0.72 | -
 | excessive trust and mistrust | | | 28 | 0.28 |
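The arithmetic behind Table 3 can be made explicit with a short Python sketch. It assumes, as the text implicitly does when taking mistrust and excessive trust together, that the 18 percent of controllers who mistrust technology and the 10 percent with excessive trust are disjoint groups, so their shares can simply be added. The function name `trust_rif` is hypothetical.

```python
from typing import Dict


def trust_rif(mistrust: float, excessive_trust: float) -> Dict[str, float]:
    """Two-level RIF 'Experience with system performance'.

    Assumes the mistrust and excessive-trust groups do not overlap, so the
    probability of a miscalibrated attitude is the sum of the two shares.
    """
    miscalibrated = mistrust + excessive_trust
    if not 0.0 <= miscalibrated <= 1.0:
        raise ValueError("shares must add up to a value between 0 and 1")
    # Rounded to two decimals, as in the tables of this appendix.
    return {
        "objective attitude toward the ATC system": round(1.0 - miscalibrated, 2),
        "excessive trust and mistrust": round(miscalibrated, 2),
    }


# 18 % mistrust (Hilburn and Flynn, 2001) and 10 % excessive trust (ATM specialists).
print(trust_rif(0.18, 0.10))
```

If the two groups overlapped, simply adding the shares would overstate the second level; the sketch makes that assumption visible rather than resolving it.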
A.1.4 Personal factors
These are controller-related factors, which can be determined in a post-failure analysis
or predicted in the case of predictive analysis. This factor includes, but is not limited
to, the following: time of day (i.e. relevance of circadian rhythm), time into the shift
(i.e. level of situational awareness as well as fatigue), and age. Other factors are also
important, for example, the level of confidence, complacency, self-esteem (i.e. trust
in own ability), personality, motivation, attitudes deriving from family or close social
groups, and ability to cope with stress, but they require the application of various sets of
psychological tests. The current definition of personal factors accounts for all the
above-mentioned factors and sets the qualitative descriptor at three levels. The respective
probabilities are determined from the average of the responses from the ATM
specialists surveyed (Table 4).
Table 4 Summary of the RIF ‘Personal factors’
RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation
Personal factors | suitable | ATM specialists | 8 | 65 | 0.65 | -
 | tolerable | | | 26 | 0.26 |
 | counterproductive | | | 9 | 0.09 |
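The statement that the probabilities are determined from the average of the specialists' responses can be illustrated as below. This Python sketch assumes that each ATM specialist gave a percentage split across the three qualitative levels and that the per-level percentages are averaged arithmetically. The two specialist responses in the example are invented for illustration and chosen so that the result can be compared with Table 4; the survey itself involved eight specialists, whose individual answers are not reproduced here.

```python
from statistics import mean
from typing import Dict, List

LEVELS = ("suitable", "tolerable", "counterproductive")


def average_responses(responses: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each level's percentage across specialists; return probabilities.

    Each response is assumed to be a dict mapping every level in LEVELS to the
    percentage that specialist assigned to it.
    """
    return {
        level: round(mean(r[level] for r in responses) / 100, 2)
        for level in LEVELS
    }


# Two invented specialist responses (not data from the survey).
specialists = [
    {"suitable": 60, "tolerable": 30, "counterproductive": 10},
    {"suitable": 70, "tolerable": 22, "counterproductive": 8},
]
print(average_responses(specialists))
```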
A.1.5 Communication for recovery within team/ATC Centre
This factor includes only the communication that takes place between controllers for
the purpose of recovery from equipment failure. Therefore, it assesses the quality of
communication as well as the decision-making process, the quality of Team Resource
Management (TRM)², the familiarity of team members or the level of synergy between
them, the level of mutual understanding and knowledge of different working
strategies, team efficacy, intent recognition (i.e. overt communication), and other items.
In the case of a single-controller position this factor should be understood as a
communication with a supervisor or any other relevant personnel. The qualitative
descriptor is proposed at three levels while the corresponding probabilities are
determined from the average of the responses from the ATM specialists surveyed
(Table 5).
Table 5 Summary of the RIF ‘Communication for recovery within team/ATC Centre’
RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation
Communication for recovery within team/ATC Centre | efficient | ATM specialists | 8 | 73 | 0.73 | -
 | tolerable | | | 24 | 0.24 |
 | inefficient | | | 4 | 0.04 |
A.2 Equipment failure related factors
Equipment failure related factors represent a group of RIFs defining the characteristics
of failures relevant to the controller recovery process. These are complexity of failure
type, time course of failure development, number of workstations/sectors affected, time
necessary to recover, existence of recovery procedure, and duration of failure. Details
on failure characteristics can be found in Chapter 4.
A.2.1 Complexity of failure type
This factor identifies single versus multiple component failures (as discussed in
Chapter 4) and thus the qualitative descriptor is proposed at two levels. The
probabilities of each level are determined using the operational failure reports from
available Civil Aviation Authorities (Table 6). Due to the relatively low level of
confidence in the use of CAA occurrence databases (see Chapter 8, section 8.3.1.5),
these probabilities were validated by the responses from the ATM specialists surveyed,
which did not show a significant difference. Additionally, these results are in line with
the experience of system control and monitoring engineers interviewed for this study,
who stated that the majority of ATC equipment failures are single rather than
multiple failure occurrences (for evidence see Appendix II).
² TRM represents an effective use of all available resources for ATC personnel to assure safe
and efficient operation, to reduce error, avoid stress, and increase efficiency.
Table 6 Summary of the RIF ‘Complexity of failure type’
| RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation |
| Complexity of failure type | a single failure | Operational failure reports | 22,808 reports | 92 | 0.92 | ATM specialists responses and system control and monitoring engineers |
| | multiple failure | | | 8 | 0.08 | |
A.2.2 Time course of failure development
This factor defines the temporal characteristics of failure occurrence: failures may be
sudden, gradual, or latent/persistent. As a result, the qualitative descriptor is
set at three levels: sudden failure/gradual degradation of system/persistent or latent
failure. Based on the averaged responses from the ATM specialists surveyed, the
corresponding probabilities are presented in Table 7. These probabilities were
validated against interviews with system control and monitoring staff from several ATC
Centres, which did not show a significant difference (for evidence see Appendix II).
Table 7 Summary of the RIF ‘Time course of failure development’
| RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation |
| Time course of failure development | sudden | ATM specialists responses | 8 | 55 | 0.55 | System control and monitoring engineers |
| | gradual | | | 39 | 0.39 | |
| | latent | | | 7 | 0.07 | |
A.2.3 Number of workstations/sectors affected
This factor describes the immediate impact of a particular type of failure in terms of the
number of positions/sectors affected. It is closely linked to the overall ATC Centre
architecture, since exposure to failure varies greatly with the level of interconnectivity of
different systems, the level of availability of separate channels (redundancy/variability),
and complexity of failure (single vs. multiple failure). The qualitative descriptor is
proposed at two levels, differentiating between failures confined to the Controller
Working Positions (CWPs) of a single sector and failures affecting CWPs in several sectors.
Due to the lack of operational data, a conservative approach is taken and probabilities
are assigned equally between the two levels. Note that this RIF has no Level 1, i.e. the
most favourable level, simply because the number of workstations/sectors affected cannot
have any positive or favourable effect on controller performance (Table 8).
Table 8 Summary of the RIF ‘Number of workstations/sectors affected’
| RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation |
| Number of workstations/sectors affected | one CWP or several CWPs in a sector | N/A | | 50 | 0.5 | - |
| | several CWPs in several sectors/all CWPs in all sectors | | | 50 | 0.5 | |
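Tables 6 and 11 derive probabilities as relative frequencies in the operational failure reports, while Tables 8 and 15 fall back to an equal split because no operational data are available. The sketch below combines the two rules in one hypothetical helper; the level labels are shortened placeholders, and the equal split is simply the conservative default described in the text, not a general-purpose prior.

```python
from collections import Counter


def level_probabilities(levels, observations=None):
    """Return a probability for each descriptor level of one RIF.

    levels: the descriptor levels, in order.
    observations: observed levels (e.g. one per failure report), or None.
    With no observations, probability is split equally across the levels.
    """
    if not levels:
        raise ValueError("at least one level is required")
    obs = list(observations or [])
    if not obs:
        # No operational data: conservative equal assignment.
        return {lvl: 1 / len(levels) for lvl in levels}
    counts = Counter(obs)
    unknown = set(counts) - set(levels)
    if unknown:
        raise ValueError(f"unrecognised levels: {sorted(unknown)}")
    # Relative frequency of each level among the observations.
    return {lvl: counts[lvl] / len(obs) for lvl in levels}


# Hypothetical labels for the two levels of 'Number of workstations/sectors affected'.
print(level_probabilities(["one sector", "several sectors"]))
# {'one sector': 0.5, 'several sectors': 0.5}
```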
A.2.4 Time necessary to recover
This factor describes the time necessary for a controller to recover from the effect(s) of
equipment failure. This time should be measured from the moment of failure
occurrence until the establishment of a normal or stable system state (i.e. assurance of
safe but not necessarily efficient control of air traffic). The qualitative descriptor is set at
two levels, differentiating between availability and lack of time to recover, while the
corresponding probabilities are determined from the average of the responses from the
ATM specialists surveyed (Table 9).
Table 9 Summary of the RIF ‘Time necessary to recover’
| RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation |
| Time necessary to recover | less than time available3 | ATM specialists | 8 | 94 | 0.94 | - |
| | in excess of time available | | | 6 | 0.06 | |
3 Time available to the controller to react before the development of less than adequate separation.
A.2.5 Existence of recovery procedure
This factor takes into account the availability of a written procedure, rules, or guidelines
for a particular type of equipment failure, and its comprehensiveness and completeness.
In future, this RIF may also include the existence of some form of dynamically adaptable
procedure. The qualitative descriptor is set at three levels to capture the quality of
the existing procedure (Table 10). Probabilities are calculated from the questionnaire
survey responses, which showed that 13.8 percent of ATC Centres do not have any recovery
procedures. The distinction between suitable and tolerable procedures was derived by
taking into account that 45 percent of existing procedures are not complete, and
therefore only tolerable. It should be noted that this approach is limited, as it equates
incomplete procedures with tolerable procedures. A more accurate approach is achievable
when the proposed methodology is applied to a specific equipment failure and its context.
Table 10 Summary of the RIF ‘Existence of recovery procedure’
| RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation |
| Existence of recovery procedure | suitable | The questionnaire survey | 134 | 47 | 0.47 | - |
| | tolerable | | | 39 | 0.39 | |
| | inappropriate4 | | | 14 | 0.14 | |
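The text gives the inputs to Table 10 but not the arithmetic. One reading that reproduces the tabulated values is that Centres without any procedure are rated inappropriate (following footnote 4), and that the 45 percent of incomplete procedures applies only to the 86.2 percent of Centres that have a procedure at all. The sketch below encodes that assumption; the function name is hypothetical.

```python
def procedure_probabilities(share_without_procedure, share_incomplete):
    """Split ATC Centres into the three 'Existence of recovery procedure' levels.

    share_without_procedure: fraction of Centres with no recovery procedure
        (rated 'inappropriate').
    share_incomplete: fraction of *existing* procedures that are incomplete
        (treated as 'tolerable'); assumed to apply only to Centres with a procedure.
    """
    with_procedure = 1 - share_without_procedure
    return {
        "suitable": with_procedure * (1 - share_incomplete),
        "tolerable": with_procedure * share_incomplete,
        "inappropriate": share_without_procedure,
    }


p = procedure_probabilities(0.138, 0.45)
print({k: round(v, 2) for k, v in p.items()})
# {'suitable': 0.47, 'tolerable': 0.39, 'inappropriate': 0.14}
```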
A.2.6 Duration of failure
This particular factor represents the amount of time during which a failure persists.
Applied to a specific system, it can carry important information on recovery and the
impact of a particular failure on ATC and overall aviation safety. The qualitative
descriptor, proposed at two levels, was informed by the analysis of failure durations in
the operational failure reports. The corresponding probabilities are determined from the
operational failure reports (Chapter 4) and further validated against the responses from
the ATM specialists surveyed, which did not show a significant difference (Table 11).
4 If procedures are not available, ‘Inappropriate’ would be used.
Table 11 Summary of the RIF ‘Duration of failure’
| RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation |
| Duration of failure | short period of time (up to 15 minutes) | Operational failure reports | 22,808 reports | 56 | 0.56 | ATM specialists surveyed |
| | moderate to substantial period of time (failures longer than 15 minutes) | | | 44 | 0.44 | |
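Table 11 separates failures at a 15-minute boundary, with "up to 15 minutes" counted as short. A minimal sketch of how individual report durations could be mapped to the two levels and turned into relative frequencies is shown below; the durations in the example are hypothetical and are not drawn from the operational failure reports.

```python
from datetime import timedelta

# Boundary used in Table 11: 'up to 15 minutes' is short, anything longer is not.
THRESHOLD = timedelta(minutes=15)
LEVELS = ("short", "moderate to substantial")


def duration_level(duration):
    """Map one failure duration to a level of the 'Duration of failure' RIF."""
    return LEVELS[0] if duration <= THRESHOLD else LEVELS[1]


def duration_probabilities(durations):
    """Relative frequency of each level across a set of failure durations."""
    levels = [duration_level(d) for d in durations]
    if not levels:
        raise ValueError("no durations given")
    return {lvl: levels.count(lvl) / len(levels) for lvl in LEVELS}


# Hypothetical durations in minutes (illustration only).
sample = [timedelta(minutes=m) for m in (3, 15, 16, 40, 8)]
print(duration_probabilities(sample))
# {'short': 0.6, 'moderate to substantial': 0.4}
```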
A.3 External factors
External factors, or factors related to working conditions, represent the group of RIFs
describing the conditions surrounding a controller at the moment of failure. These are
adequacy of HMI and operational support, ambiguity of information in the working
environment, quality of alarms/alerts and the moment when they are triggered in the
system, and the overall adequacy of the organisational characteristics of an ATC Centre
from the safety and operational perspectives.
A.3.1 Adequacy of HMI and operational support
This factor includes the HMI and all available control panels (e.g. mode of operation,
radars in use, frequencies in use and dynamic flight information), situational display, as
well as the operational support provided by specifically designed decision aids. Note
that a controller receives all feedback on ATM system performance through the HMI.
The qualitative descriptor is set at three levels to capture
the quality of the HMI, while the probabilities are determined from the average of the
responses from the ATM specialists surveyed (Table 12).
Table 12 Summary of the RIF ‘Adequacy of HMI and operational support’
| RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation |
| Adequacy of HMI and operational support | suitable | ATM specialists | 8 | 53 | 0.53 | - |
| | tolerable | | | 45 | 0.45 | |
| | counter productive | | | 3 | 0.03 | |
A.3.2 Ambiguity of information in the working environment
This dynamic factor describes the transparency of the system, the level of system
interaction and redundancy, and the existence of symptoms that can be interpreted in more
than one way. In general, a lack of transparency in an ATC system has been observed to
lead people to form hypotheses on the causes of failures based on incomplete
information or best guesses (see Straeter, 2005). ATC subsystems are highly dependent
on each other. Information from one tool can be distributed to several different
subsystems at the same time. For example, information on aircraft position is sent
directly to the radar data processing system, air traffic flow management, ATC tools
(including the monitoring aid and the medium term conflict detection tool), safety nets
(e.g. the short term conflict alert tool), and the flight data processing system. In other
words, ATC systems are closely coupled and dependent upon dynamic information
exchange. For this reason, the architecture of any ATC Centre takes into account
existing interactions by building a network of redundancies. In addition, any symptom that
can be interpreted in more than one way will be interpreted wrongly in some instances.
Based on the above discussion, the qualitative descriptor is set at two levels, whilst
the corresponding probabilities are determined from the average of the responses from
the ATM specialists surveyed (Table 13).
Table 13 Summary of the RIF ‘Ambiguity of information in the working environment’
| RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation |
| Ambiguity of information in the working environment | the match between the external working environment and the controller's internal mental model | ATM specialists | 8 | 86 | 0.86 | - |
| | the mismatch between the external working environment and the controller's internal mental model | | | 14 | 0.14 | |
A.3.3 Adequacy of alarms/alerts
As explained in Chapter 4, the function of alarms/alerts is to alert operators (visually
and/or aurally) to potential non-nominal system states. The role of the human
operator is then to confirm the existence of a failure and take appropriate action.
Because of the complexity of current ATC consoles, it is believed that the availability,
adequacy, and other relevant characteristics of alerts should be considered separately
from the HMI. Therefore, this factor describes the availability and adequacy of
alarms/alerts which permit detection, diagnosis, and/or correction of failures, the
reliability of the information given, the number of alerts presented to the controller, and
the appropriate location and format of alert information (e.g. signal, colour coding,
warning/message). The qualitative descriptor is set at three levels, to account for
suitable, tolerable, and inadequate design solutions, while the probabilities are
determined from the average of the responses from the ATM specialists surveyed
(Table 14).
Table 14 Summary of the RIF ‘Adequacy of alarms/alerts’
| RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation |
| Adequacy of alarms/alerts | suitable | ATM specialists | 8 | 75 | 0.75 | - |
| | tolerable | | | 20 | 0.2 | |
| | counter productive | | | 5 | 0.05 | |
A.3.4 Adequacy of alarm/alert onset
This dynamic factor describes one important characteristic of the available
alerts/alarms, namely the ‘cognitive convenience’ of alert onset. In other words, the
moment at which an alert is triggered has a high impact on the overall recovery
performance. In addition, a misleading sequence of alerts can lead the controller
towards wrong assumptions through cognitive tunnelling on the initial alert,
thereby disregarding a later, possibly more relevant alert (Straeter, 2005). Since the
adequacy of alert onset depends directly on the complexity of traffic in the dedicated
airspace (dynamically changing every second), this RIF is given two levels.
Furthermore, due to the lack of ATC operational data on this advanced and futuristic
concept, a conservative approach is taken and probabilities are assigned equally
between the two levels (Table 15).
Table 15 Summary of the RIF ‘Adequacy of alarm/alert onset’
| RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation |
| Adequacy of alarm/alert onset | information from the external world enters the processing loop at the right time | N/A | N/A | 50 | 0.50 | - |
| | information from the external world enters the processing loop at the wrong time, i.e. misleading alarm or sequence of alarms | | | 50 | 0.50 | |
A.3.5 Adequacy of organisation
This factor describes several organisational characteristics of the ATC Centre. These
include but are not limited to the quality of roles and responsibilities, the availability of
team members, the availability and adequacy of supervision, the availability of
additional support (e.g. assistant), the personnel selection process, shift patterns and
personnel planning, attitude to teamwork, safety culture, existence of stress
management programs, support for the organised exchange of past experience on
equipment failures, and adequacy of communication with management and technicians
(e.g. briefings, exchange of knowledge, bulletins, safety panels). The qualitative
descriptor is proposed at three levels, with probabilities determined from the average
of the responses from the ATM specialists surveyed (Table 16).
Table 16 Summary of the RIF ‘Adequacy of organisation’
| RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation |
| Adequacy of organisation | efficient | ATM specialists | 8 | 67 | 0.67 | - |
| | tolerable | | | 31 | 0.31 | |
| | inefficient | | | 3 | 0.03 | |
A.4 Airspace related factors
Airspace related factors relate to the characteristics of the airspace affected by the
degraded system performance, traffic complexity at the moment of failure and during
the recovery process, and weather conditions. In addition, this group includes the
overall task complexity of the situation. For example, an equipment failure
coupled with a sudden increase in the amount of traffic, a sudden deterioration of weather,
or the presence of priority aircraft greatly increases the complexity of the overall situation.
A.4.1 Traffic complexity during the recovery process
This dynamic factor includes but is not limited to the following: the level and
characteristics of the traffic load, the mix of aircraft flying under instrument flight rules (IFR)
and visual flight rules (VFR), military aircraft (because of different performance
characteristics and speed differentials), and the presence of priority aircraft (e.g. low fuel,
government flights, and medical emergency). There have been various studies into
traffic complexity (Hilburn, 2004) and various attempts to provide a quantitative
indicator of traffic complexity, for example using dynamic density (Kopardekar and
Magyarits, 2003), cross-sectional time-series analysis methods (Majumdar et al., 2004),
and a traffic complexity indicator (EUROCONTROL, 2006c). Any of these
approaches may be used to inform the probabilities for the qualitative descriptor of this
particular RIF. Taking into account only the impact that traffic complexity may have on
controller performance, this qualitative descriptor is proposed at two levels. One
level accounts for average traffic complexity, whilst the other accounts for high and low
traffic complexity, as both negatively impact controller performance. The probabilities
are determined from the average of the responses from the ATM specialists surveyed
(Table 17).
Table 17 Summary of the RIF ‘Traffic complexity during the recovery process’
| RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation |
| Traffic complexity during the recovery process | High and low traffic complexity | ATM specialists | 8 | 19 | 0.19 | - |
| | Average traffic complexity | | | 81 | 0.81 | |
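The questionnaire in Appendix IX asks about traffic complexity (and total situation complexity) on three categories, whereas Tables 17 and 20 use two levels in which both extremes are merged. Assuming the merge is a simple sum of the averaged percentages, the mapping can be sketched as follows. The function name is hypothetical, and the 12/7 split in the example is invented for illustration; only its total of 19 percent matches Table 17.

```python
def collapse_to_two_levels(too_high, tolerable, too_low):
    """Merge three questionnaire categories into the two RIF levels.

    Both extremes degrade controller performance, so they share one level.
    Inputs are averaged percentages that should sum to 100.
    """
    total = too_high + tolerable + too_low
    if abs(total - 100) > 1e-6:
        raise ValueError(f"percentages sum to {total}, not 100")
    return {
        "average": tolerable / 100,
        "high and low": (too_high + too_low) / 100,
    }


# Hypothetical split of the extremes (only the 19% total comes from Table 17).
print(collapse_to_two_levels(too_high=12, tolerable=81, too_low=7))
```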
A.4.2 Airspace characteristics during the recovery process
This dynamic factor incorporates the characteristics and complexity of airspace (i.e. its
component sectors), based upon the sector design characteristics (for details see
NATS, 1999). These characteristics include the number of crossing points and their
position in relation to sector boundaries, number of flight levels, number of entry and
exit points, special use airspace (SUA) including zones of military activity,
characteristics of upper vs. lower airspace, airways configuration, and the number of
neighbouring sectors. It is important to highlight the difference between en-route and
terminal airspace in relation to recovery from equipment failures. Terminal airspace
is characterised by traffic constantly changing level (i.e. climbing or descending) and
by frequent changes in heading compared to en-route airspace, especially its higher
levels. Due to differences in controller tasks, en-route airspace in general provides
more time to recover than terminal airspace. In addition, interviews with ATM
specialists revealed that terminal airspace has radar coverage from a single
radar source, whereas en-route airspace is usually based on multi-radar
tracking (i.e. integration of data from several radar sites). The qualitative descriptor is
set at three levels, whilst the corresponding probabilities are determined from the
average of the responses from the ATM specialists surveyed (Table 18).
Table 18 Summary of the RIF ‘Airspace characteristics during the recovery process’
| RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation |
| Airspace characteristics during the recovery process | Adequate | ATM specialists | 8 | 64 | 0.64 | - |
| | Tolerable | | | 33 | 0.33 | |
| | Inappropriate | | | 3 | 0.03 | |
A.4.3 Weather conditions during the recovery process
This dynamic factor takes into account any change in weather conditions during the
recovery process. The qualitative descriptor is proposed at two levels whilst the
corresponding probabilities are determined from the responses from the ATM
specialists surveyed (Table 19).
Table 19 Summary of the RIF ‘Weather conditions during the recovery process’
| RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation |
| Weather conditions during the recovery process | Improved | ATM specialists | 8 | 89 | 0.89 | - |
| | Deteriorated | | | 11 | 0.11 | |
A.4.4 Conflicting issues during the recovery process (task complexity)
This dynamic factor describes the level of overall task complexity at the moment of
equipment failure. In the case of multiple conflicting tasks, the operator has to prioritise
between them (Straeter, 2005). In the case of any type of conflict alert (i.e. two or more
aircraft having conflicting intent), the controller has to give full attention to
resolving the conflict using the equipment that is still operational, while assuming
that some other subsystem might fail. In ATC, overall safety is the first priority. Due to
the dynamic nature of ATC, this qualitative descriptor is proposed at two levels: the
average complexity of the situation, and both high and low complexity of the situation
(as both have a negative effect on controller performance: increased workload and
boredom or monotony, respectively). The corresponding probabilities are determined
from the responses from the ATM specialists surveyed (Table 20).
Table 20 Summary of the RIF ‘Conflicting issues during the recovery process (overall task complexity)’
| RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation |
| Conflicting issues during the recovery process | The average complexity | ATM specialists | 8 | 72 | 0.72 | - |
| | Multiple tasks and low complexity | | | 28 | 0.28 | |
Appendix IX Questions for ATM Specialist
Note: The questions presented below investigate controller recovery from
equipment failures in ATC. All questions should be answered based upon your
operational experience and knowledge. Whilst some of them are very specific, and
therefore challenging to answer, please try to respond to all the questions, giving
the appropriate percentages.
How often has training (initial & refresher) in your ATC Centre been:
Suitable for potential equipment failures | Tolerable for potential equipment failures | Counter productive for potential equipment failures
Total: 100%
What is the percentage of ATCOs that have never experienced an equipment failure in their career? Please consider novice ATCOs as well and give your best estimate.
According to your best judgement, what percentage of ATCOs have:
Over-trust in the automation/systems they are using | Objective attitude toward ATC automation (ATCOs do trust automation but are aware of possible failures) | Under-trust in the automation/systems they are using
Total: 100%
In the event of equipment failure, how often have personal factors (stress, fatigue, self-esteem) been:
Suitable to the equipment failure in question | Tolerable to the equipment failure in question | Counter productive to the equipment failure in question
Total: 100%
How often has team-related communication for recovery been:
Efficient | Tolerable | Inefficient
Total: 100%
What is the percentage of equipment failures affecting:
One system only | Multiple systems at the same time
Total: 100%
What is the percentage of the following in your ATC Centre:
Sudden equipment failures | Gradual equipment failures | Latent equipment failures
Total: 100%
How often has the time necessary to recover (time before the development of any inadequate separation) been:
Adequate | Inadequate
Total: 100%
How often (in your overall experience) have existing recovery procedures been:
Suitable to the equipment failure in question | Tolerable to the equipment failure in question | Counter productive to the equipment failure in question
Total: 100%
What is the percentage of equipment failures lasting:
Up to 15 min | More than 15 min
Total: 100%
When there is a failure, how often has information presented on your HMI (i.e. radar screen) been:
Suitable to the recovery from equipment failure (e.g. provides appropriate cues, visual/auditory alerts) | Tolerable to the recovery from equipment failure | Counter productive to the recovery from equipment failure (e.g. provides wrong cues, misleads you)
Total: 100%
When there is a failure, how often have existing alarms/alerts on the radar screen been:
Suitable to the recovery from equipment failure | Tolerable to the recovery from equipment failure | Counter productive to the recovery from equipment failure
Total: 100%
In your opinion, what is the percentage of match between the controller's situational awareness and the dynamic airspace and traffic configuration (traffic mix, speed differentials, FL utilized, airways configuration) during the recovery process?
For what percentage of time have the organisational features in your ATC Centre been efficient, tolerable, or inefficient in supporting better recovery from equipment failures?
Efficient | Tolerable | Inefficient
Total: 100%
In the event of an equipment failure, how often has the traffic complexity been:
Too high | Tolerable | Too low
Total: 100%
In the event of an equipment failure, how often has airspace design and configuration been:
Adequate | Tolerable | Inappropriate
Total: 100%
In the event of an equipment failure, how often have the weather conditions been:
Improved | Deteriorated or worsened | Unchanged
Total: 100%
In the event of equipment failure, how often has the total complexity of the recovery situation been:
High | Average | Low
Total: 100%
Appendix X Overview of RIFs, their corresponding levels, and designated probabilities
| (1) ID | (2) RIF name | (3) Descriptor | (4) Probability (p) | (5) Expected effect on controller recovery performance | (6) Level | (7) Designator (R) | (8) Probability of overall situation occurring (p*R) |
Internal factors
| 1 | Training for recovery from ATC equipment failure | Suitable to the situation in question | 0.52 | Most favourable | 1 | 1 | 0.52 |
| | | Tolerable to the situation in question | 0.17 | Non significant | 2 | 0 | 0.00 |
| | | Counter productive to the situation in question | 0.31 | Least favourable | 3 | -1 | -0.31 |
| 2 | Previous experience with equipment failures | Experienced with a particular type of failure or Experienced with any other type of ATC equipment failure | 0.95 | Most favourable | 1 | 1 | 0.95 |
| | | No experience with ATC equipment failures | 0.05 | Non significant | 2 | 0 | 0.00 |
| 3 | Experience with the system performance (reliance) | Objective attitude toward the system | 0.72 | Non significant | 2 | 0 | 0.00 |
| | | Positive experience with the system (excessive trust) or Negative experience with the system (under-trust) | 0.28 | Least favourable | 3 | -1 | -0.28 |
| 4 | Personal factors | Suitable for the recovery process | 0.65 | Most favourable | 1 | 1 | 0.65 |
| | | Tolerable for the recovery process | 0.26 | Non significant | 2 | 0 | 0.00 |
| | | Counter productive for the recovery process | 0.09 | Least favourable | 3 | -1 | -0.09 |
| 5 | Communication for recovery within team/ATC Centre | Efficient | 0.73 | Most favourable | 1 | 1 | 0.73 |
| | | Tolerable | 0.24 | Non significant | 2 | 0 | 0.00 |
| | | Inefficient | 0.04 | Least favourable | 3 | -1 | -0.04 |
Equipment failure related factors
| 6 | Complexity of failure type | Single system affected | 0.92 | Non significant | 2 | 0 | 0.00 |
| | | Multiple systems affected | 0.08 | Least favourable | 3 | -1 | -0.08 |
| 7 | Time course of failure development | Sudden failure | 0.55 | Improve | 1 | 1 | 0.55 |
| | | Persistent or latent failure | 0.07 | Non significant | 2 | 0 | 0.00 |
| | | Gradual degradation of system | 0.39 | Least favourable | 3 | -1 | -0.39 |
| 8 | Number of workstations/sectors affected | One workstation/one sector or All workstations in one sector | 0.50 | Non significant | 2 | 0 | 0.00 |
| | | Several workstations/couple of sectors or All workstations/all sectors | 0.50 | Least favourable | 3 | -1 | -0.50 |
| 9 | Time necessary to recover | Adequate - less than available time | 0.94 | Most favourable | 1 | 1 | 0.94 |
| | | Inadequate - in excess of available time | 0.06 | Least favourable | 3 | -1 | -0.06 |
| 10 | Existence of recovery procedure | Suitable to the situation in question | 0.47 | Most favourable | 1 | 1 | 0.47 |
| | | Tolerable to the situation in question | 0.39 | Non significant | 2 | 0 | 0.00 |
| | | Inappropriate | 0.14 | Least favourable | 3 | -1 | -0.14 |
| 11 | Duration of failure | Short period of time | 0.56 | Non significant | 2 | 0 | 0.00 |
| | | Moderate period of time or Substantial period of time | 0.44 | Least favourable | 3 | -1 | -0.44 |
External factors or factors related to working conditions
| 12 | Adequacy of HMI and operational support | Suitable to the situation in question | 0.53 | Most favourable | 1 | 1 | 0.53 |
| | | Tolerable to the situation in question | 0.45 | Non significant | 2 | 0 | 0.00 |
| | | Counter productive to the situation in question | 0.03 | Least favourable | 3 | -1 | -0.03 |
| 13 | Ambiguity of information in the working environment | External working environment matches the controller's internal mental model | 0.86 | Most favourable | 1 | 1 | 0.86 |
| | | External working environment mismatches the controller's internal mental model | 0.14 | Least favourable | 3 | -1 | -0.14 |
| 14 | Adequacy of alarms/alerts | Suitable to the situation in question | 0.75 | Most favourable | 1 | 1 | 0.75 |
| | | Tolerable to the situation in question | 0.20 | Non significant | 2 | 0 | 0.00 |
| | | Counter productive to the situation in question | 0.05 | Least favourable | 3 | -1 | -0.05 |
| 15 | Adequacy of alarm/alert onset | Information from the external world enters the processing loop at the right time | 0.50 | Most favourable | 1 | 1 | 0.50 |
| | | Information from the external world enters the processing loop at the wrong time (misleading sequence of alarms) | 0.50 | Least favourable | 3 | -1 | -0.50 |
| 16 | Adequacy of organisation | Efficient | 0.67 | Most favourable | 1 | 1 | 0.67 |
| | | Tolerable | 0.31 | Non significant | 2 | 0 | 0.00 |
| | | Inefficient | 0.03 | Least favourable | 3 | -1 | -0.03 |
Airspace related factors
| 17 | Traffic complexity | Average traffic complexity | 0.81 | Non significant | 2 | 0 | 0.00 |
| | | Extremely high or extremely low traffic complexity | 0.19 | Least favourable | 3 | -1 | -0.19 |
| 18 | Airspace characteristics | Adequate (e.g. enroute higher levels) | 0.64 | Most favourable | 1 | 1 | 0.64 |
| | | Tolerable | 0.33 | Non significant | 2 | 0 | 0.00 |
| | | Inappropriate (e.g. enroute lower levels or terminal) | 0.03 | Least favourable | 3 | -1 | -0.03 |
| 19 | Weather conditions during the recovery process | Improved | 0.89 | Non significant | 2 | 0 | 0.00 |
| | | Deteriorated | 0.11 | Least favourable | 3 | -1 | -0.11 |
| 20 | Conflicting issues in the situation (task complexity) | Average complexity of the situation | 0.72 | Non significant | 2 | 0 | 0.00 |
| | | Conflicting, multiple tasks or Extremely low complexity of the situation (may lead to monotony) | 0.28 | Least favourable | 3 | -1 | -0.28 |
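In the table above, column (8) is the product of the probability in column (4) and the designator in column (7), and the designator follows directly from the level (level 1 gives R = 1, level 2 gives R = 0, level 3 gives R = -1). The appendix does not state how these products are combined across the levels of one RIF; summing them gives the expected value of R, which is shown below only as one possible use. The sketch transcribes five RIFs from the table; the dictionary layout and function names are hypothetical.

```python
# Level (column 6) -> designator R (column 7), as used throughout Appendix X.
DESIGNATOR = {1: 1, 2: 0, 3: -1}

# (descriptor, probability p, level) for a subset of RIFs from Appendix X.
RIFS = {
    "Training for recovery": [
        ("suitable", 0.52, 1), ("tolerable", 0.17, 2), ("counter productive", 0.31, 3)],
    "Communication for recovery": [
        ("efficient", 0.73, 1), ("tolerable", 0.24, 2), ("inefficient", 0.04, 3)],
    "Time course of failure development": [
        ("sudden", 0.55, 1), ("persistent or latent", 0.07, 2), ("gradual", 0.39, 3)],
    "Number of workstations/sectors affected": [
        ("one sector", 0.50, 2), ("several sectors", 0.50, 3)],
    "Adequacy of alarm/alert onset": [
        ("right time", 0.50, 1), ("wrong time", 0.50, 3)],
}


def column_8(levels):
    """Return p*R for each descriptor level of one RIF."""
    return {name: p * DESIGNATOR[lvl] for name, p, lvl in levels}


def expected_designator(levels, tol=0.015):
    """Sum of p*R over the levels, i.e. the expected value of R.

    tol allows for the two-decimal rounding that makes some RIFs sum to 1.01.
    """
    total_p = sum(p for _, p, _ in levels)
    if abs(total_p - 1) > tol:
        raise ValueError(f"probabilities sum to {total_p:.2f}")
    return sum(column_8(levels).values())


for rif, levels in RIFS.items():
    print(f"{rif}: E[R] = {expected_designator(levels):+.2f}")
# Training for recovery: E[R] = +0.21
# Communication for recovery: E[R] = +0.69
# Time course of failure development: E[R] = +0.16
# Number of workstations/sectors affected: E[R] = -0.50
# Adequacy of alarm/alert onset: E[R] = +0.00
```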
Appendix XI Validation of the RIFs interaction matrix
DIRECT INFLUENCE
Columns (upper row, left to right): Training for recovery | Previous experience with equip. failures | Experience with system performance | Personal factors | Comm. for recovery | Complexity of failure | Time course of failure development | Number of workstations affected | Time necessary to recover | Existence of recovery procedure | Duration of failure | Adequacy of HMI and oper. support | Ambiguity of information | Adequacy of alarms/alerts | Adequacy of alarms/alerts onset | Adequacy of organization | Traffic/traffic complexity | Airspace characteristics | Weather | Task complexity
Training for recovery from ATC equipment failures
x x
Previous experience with equip. failures
x
Experience with system performance (reliance)
x x x x
Personal factors
x x x x x x x x x x x x x x x x x x
Comm. for recovery within a team of controllers
x x x x x x x x x x x x x x x x x x
Complexity of failure type
x
Time course of failure development
x
Number of workstations/ sectors affected
x x
Time necessary to recover
x x x x x x x x x x x x x x x x x
Existence of recovery procedure
x
Duration of failure
x x
Adequacy of HMI and operational support
x x x x
Ambiguity of information in the working environment
x x x x x x x
Adequacy of alarms/alerts
x x x
Adequacy of alarms/alerts onset
x x x x x
Adequacy of organization
x x x x
Traffic/traffic complexity in the moment of failure
x x x
Airspace characteristics
x x x
Weather conditions during the recovery process
Task complexity
x x x x x x x x x x x x x x x x x
NOTE: Please mark the interactions between each factor in the upper row and each factor in the left column. For example, does 'Training for recovery' influence any of the factors on the left side ('previous experience', 'experience with the system', 'personal factors', and so on)? Please add or delete existing interactions as you find appropriate.
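The interaction matrix records direct influences as marks, and, following the note above, a mark in a row means that the column factor influences the row factor. Counting marks along each row gives how many factors act on a factor, and counting along each column gives how many factors it acts on. The Python sketch below assumes the marks are held as a mapping from row factor to the set of marked column factors; the function name and the three entries are hypothetical and do not reproduce the validated matrix.

```python
from collections import defaultdict


def influence_counts(marks):
    """Count direct influences in an RIF interaction matrix.

    marks: dict mapping each row factor to the set of column factors marked
    with 'x' in that row (a mark means the column factor influences the row
    factor). Returns (influenced_by, influences): how many factors act on each
    row factor, and how many row factors each column factor acts on.
    """
    influenced_by = {row: len(cols) for row, cols in marks.items()}
    influences = defaultdict(int)
    for cols in marks.values():
        for col in cols:
            influences[col] += 1
    return influenced_by, dict(influences)


# Hypothetical marks for three factors, for illustration only.
marks = {
    "Personal factors": {"Training for recovery", "Task complexity"},
    "Time necessary to recover": {"Task complexity", "Personal factors"},
    "Task complexity": set(),
}
influenced_by, influences = influence_counts(marks)
print(influenced_by)
print(influences)
```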
Appendix XII Distribution of 20 Recovery Influencing Factors (RIFs)
The experimental material consists of various documents used by the air traffic controllers
participating in the study and by the subject matter expert (SME). The documents
used by controllers are presented in the following order:
a) The controller handbook;
b) Debriefing interview sheet; and
c) Feedback form.
The documents used by the subject matter expert are presented in the following order:
d) Subject matter expert’s assessment; and
e) Best practice procedure sheet.
a) The controller handbook
The Controller Handbook
Researcher: Branka Subotic
Supervisor: Dr Washington Y. Ochieng
University: Imperial College London
Location of experiment: XXX
June 2006
SUBJECT INSTRUCTIONS
Strategic and tactical decision making in ATC
Dear Controller,
Welcome to the “Strategic and tactical decision making in ATC” research program. Because of your extensive experience as an Air Traffic Controller, you have been asked to participate in this study. Our aim is to test a new approach to better understanding the decision-making process of air traffic controllers. We will try to determine the cognitive processes that drive your decisions/actions during the dynamic and complex control of air traffic. The knowledge gained from this research will feed into the future design of computerized ATC tools. We are not in a position to reveal more information on this study at this point, as it may influence your behaviour, your actions, and the processes we wish to observe and analyze. At the end of this study you will be more familiar with our objectives and you will be able to ask as many questions as you find necessary. So please bear with us and help us make this study as realistic as possible.
Your understanding and help are crucial at every step of this study! This study is designed as an integrated part of regular emergency training in the Dublin ATC Centre, with minimal impact on the controller. Therefore, please consider and treat this training session as you would any other training session in your professional career. From time to time, additional information may be given to you by the training instructor or the researcher. On these occasions, please act as you would in the operational environment. Also, when information or instructions are given to you by the researcher, please regard them as if they came from a training instructor.
Now, we would like you to read the “Consent form”, which informs you of what the experiment involves and makes you fully aware of your rights while you are taking part in it. So please proceed to the next page, read the form, and sign it if you agree with all the terms and conditions. If you have any questions, please do not hesitate to contact the researcher. In addition, we will ask you to fill out a questionnaire and participate in a de-briefing after the training session. The de-briefing part of this experiment is highly important, as we will compare the recorded data with your own experience and decision-making process. Therefore, we would like to encourage you to give the researcher detailed input and explanations.
IMPERIAL COLLEGE LONDON RESEARCH SUBJECT INFORMED CONSENT FORM
The purpose of this research is to investigate the controller’s decision-making process. You will be asked to complete one emergency training session and therefore to provide an air traffic control service through one traffic scenario. The entire experiment is expected to take approximately 1.5 h to complete. The results of this experiment are for research purposes only and may be presented at professional meetings or published in the research literature. Your name will not be used in the reporting of results. Only recorded data will be used; all personal information will be kept completely confidential. A videotape of part of the experiment may be made for the purposes of data collection only. Neither your face nor your identity will ever be associated with any reporting of these results. In addition, because of the confidentiality of this experiment, you will be asked not to disclose any information about what you have experienced today to anyone (including family, colleagues, and friends) for the next 30 days. Only in this way can we be sure that the experiment will remain as realistic as possible. With your signature below, you are accepting these conditions. If for any reason you are unable to comply with any of the listed conditions, please inform the researcher right away and you will be released from any further obligations. Additionally, if you wish to withdraw from the experiment, you may do so at any time.
With Sincerest Thanks
I, ________________________________, understand that my participation in this experiment is completely voluntary and that I may refuse to participate, or withdraw from the experiment, at any time without penalty.
___________________________________ _________________
Participant Signature Date
I, _______________________________, the researcher, undertake to guarantee the confidentiality of the information you provided in this experiment. I understand that you reserve the right to seek legal redress should any aspect of this agreement be breached.
___________________________________ _________________
Researcher Signature Date
Prospective Research Subject: Read this consent form carefully and ask as many questions as you like before you decide whether you want to participate in this research study. You are free to ask questions at any time before or after your participation in this research.
Now you are ready for the training session!
~ When ready, contact the pseudo-pilot on the dedicated R/T frequency so that your training session can be initiated ~
POST-EXPERIMENT SESSION
Dear Controller,
Once again, thank you very much for your participation in this experimental trial. Now you understand what our true objective in the experiment was and why we had to keep it confidential. Our objective in this research project is to investigate controller recovery from equipment failures in ATC. However, for this rare occurrence to come as a genuine surprise, it was necessary to mask the real objective of this research. Our aim is therefore to determine how controllers manage equipment failures. The complexity of this experiment allowed us to test only one equipment failure, despite the large number of potential equipment failures in any ATC Centre. By observing your reactions, recovery strategy, and attitude, we aim to identify better solutions in the design of ATC tools/systems, recovery procedures, and training. We believe that today’s more automated ATC Centres need to provide better support to their main element: air traffic controllers. For the above reasons, we kindly remind you that you have agreed not to disclose any information or details of today’s experiment to your colleagues, family, and friends in the next 30 days.
If you need clarification at any point, please do not hesitate to contact the researcher!
How suitable was your previous training for the situation (equipment failure) that you have just experienced? Please answer this question taking into account the quality of the training syllabus as well as the frequency of training. (Circle the appropriate number)
1. Suitable to the situation in question
2. Tolerable to the situation in question
3. Counterproductive to the situation in question
When was your last emergency training?
1. In the last 30 days
2. In the last 6 months
3. 1 year ago
4. More than 1 year ago
Did you have training on equipment failures during that session? Y N
Do you need better or more frequent training for unusual situations, such as handling emergencies? Y N
Please mark the statement that is closest to your previous experience with equipment failures:
1. I have experienced a very similar or the same type of equipment failure in the past.
2. I have not experienced this particular type of failure, but have experienced other types of equipment failures previously.
3. I have never experienced an equipment failure in my professional career.
Please mark the statement that is closest to your experience with the ATC system:
1. I trust ATC technology more than I trust my own judgment.
2. I trust new ATC technology, but I am aware of possible failures.
3. I do not trust new ATC technology, even though it is designed to make my job easier.
Current rating: ACC RDR   ACC Proc   APP RDR   APP Proc   TWR
Age: ____
Years of experience as a controller: ____
How would you rate your personal ability in today’s training session? Personal ability comprises various factors, including but not limited to: your level of fatigue, stress, confidence, and complacency; your ability to cope with an emergency situation; any family or other social group issues; etc. Based on this explanation, rate your personal ability:
1. Suitable for the recovery process
2. Tolerable for the recovery process
3. Counterproductive for the recovery process
How would you rate your communication for recovery today?
1. Efficient
2. Tolerable
3. Inefficient
Would you say that you had enough time to recover from the effect(s) of the equipment failure (taking into account possible development of less than adequate separation)?
1. Yes, time was adequate. Time necessary to recover was less than available time in the simulation.
2. No, time was not adequate. Time necessary to recover was in excess of available time in the simulation.
Is there a relevant recovery procedure for this particular failure? Y N
If yes, in your opinion, is that procedure:
1. Suitable to the situation in question
2. Tolerable to the situation in question
3. Counterproductive to the situation in question
How familiar are you right now with that procedure?
1. Very familiar
2. Semi familiar
3. Not familiar at all
Would you say that the HMI and operational support have been:
1. Suitable to the situation in question
2. Tolerable to the situation in question
3. Counterproductive to the situation in question
Would you say that:
1. External working environment matched your internal mental model during recovery process
2. External working environment mismatched your internal mental model at any point of recovery
How would you rate the adequacy of organisation in your ATC Centre?
1. Efficient
2. Tolerable
3. Inefficient
How would you rate the traffic complexity during the recovery process (please note: only during the recovery process and not during the entire training session)?
1. High
2. Average
3. Low
How would you rate the complexity of the airspace in the scenario used? The airspace complexity was:
1. Adequate
2. Tolerable
3. Inappropriate
How would you rate weather conditions during the recovery process?
1. Improved
2. Unchanged
3. Deteriorated
The quality of roles and responsibilities
The availability and adequacy of supervision
Attitude to teamwork
Support for organised exchange of past experience on eq. failures
Personnel selection process
Shift patterns and personnel planning
Availability of team members
Availability of additional support (e.g. Assistant)
Safety culture
Communication with management and technicians (e.g. Briefings, exchange of knowledge, bulletins)
Existence of stress management programs
The mix of IFR/VFR
Military aircraft
The existence of priority aircraft
Speed mix of aircraft
Amount of vertical movements
Amount of crossing movements
Amount of conflicts
The number of crossing points
Proximity of crossing points to the sector boundaries
Number of flight levels
Number of entry points
Number of exit points
Special use airspace (SUAs)
Upper vs. Lower airspace
Airways configuration
The number of neighbouring sectors
Sector geometry (e.g. sharp edges)
Size of sector
Bidirectional vs. unidirectional routes
Route length
Proximity of route to sector boundary
Considering the entire training session, how would you rate the overall task complexity?
1. Conflicting, multiple tasks existed during this training session.
2. Average complexity of the situation.
3. Extremely low complexity of the situation.
How would you rate your recovery performance today?
1. Efficient
2. Tolerable
3. Inefficient
How different was your performance today from that on any other day?
1. Not different at all
2. Similar
3. Very different
How representative was today’s performance of your overall ability to recover from an equipment failure in ATC?
1. Highly representative
2. Average
3. Not representative at all
How realistic was today’s task?
1. Highly realistic
2. Moderately
3. Not realistic at all
Are you completely aware of the impact/implications of the particular failure that you have just experienced? Do you fully understand what will happen when this particular equipment fails? Y N
Any comment?
Would you like to see some form of Aide-Memoire (flip chart, small laminated booklet, HMI drop-down menu) available at each CWP to assist you in recognising the effects of a particular equipment failure and the steps to be taken toward its recovery? Y N
Is there any aspect of training, procedures, HMI, or teamwork that could have enhanced your recovery performance today?
Thank you!!!!
b) Debriefing interview structure
IMPERIAL COLLEGE LONDON
DEBRIEFING INTERVIEW STRUCTURE
Questions for each subject:
1. How did you notice/detect that there was an equipment failure? What info triggered the detection?
2. When exactly did detection occur?
3. What could have been the worst consequence if the situation had not been detected?
4. Did you find the diagnosis phase possible/necessary? If yes, go to question 5. If no, go to question 8.
5. What was your diagnosis?
6. What did you do with it (i.e. try to confirm it, or rule out alternatives)?
7. Was the recovery strategy influenced by the diagnosis?
8. How did you choose the recovery strategy to apply (i.e. based on training, own experience, colleague’s experience, any other source of info)?
9. What could have made the situation worse?
10. Can you think of any fall-back actions that could mitigate this situation? Can you suggest any changes to the procedures, phraseology, HMI design, or fall-back procedures that could improve the situation?
Note: The researcher should replay the video recording from the moment of failure injection and start further discussion with the subject.
c) Feedback form
FEEDBACK FORM
Concerning the study conducted by representatives of
Imperial College London at XXX ATC Centre 06/06/06 – 09/06/06
Dear Controller,
Now that you have participated in this study, we would like to ask you to provide feedback on the importance and value of this study. Please answer all questions as accurately as possible, since these answers will guide us in our future endeavours. Your answers will be used only to assess the usefulness of this study. Once again, thank you very much for participating in this study!
Please circle the appropriate answer:
Did you find participating in this study interesting? Y N
Do you think that this experience is beneficial for your future work? Y N
Do you feel that this experiment raised important issues? Y N
Do you feel that this experiment helped you to identify any gaps in your:
• Knowledge Y N
• Training Y N
• Skills Y N
• Awareness of effects of unusual events Y N
Would you be willing to participate in future studies of this type? Y N
Do you have any other comments on the experiment?
After completing it, please return this feedback form to the office of XXX.
Thank you for your time! Your cooperation is highly appreciated.
Researcher    Assistant
d) Subject matter expert’s assessment
Our objective in this research project is to analyse the recovery from equipment failures in ATC. Since ATC is a highly specialised area, it was necessary to evaluate the controller’s recovery performance using expert opinion. As a Subject Matter Expert (SME) in the area of Air Traffic Control (ATC), you are asked to help in the assessment of the subject-controller’s recovery performance. We kindly ask you not to disclose any information or details about this experiment to your colleagues in the next 30 days, so that the injected failure remains an unexpected event for each subject-controller.
Recovery effectiveness
Based on the controller performance that you observed in this experiment (either “live” or on the video recording of the experimental trial), please use your professional experience to assess the effectiveness of the controller’s recovery.
Recovery is considered successful if the system returns to the normal or intermediate (but still stable) state. In the short term (as simulated in this experiment), the situation should be stable and control of airspace should be considered safe, but not necessarily efficient.
Please note that the anchor points of each scale range from “Firmly Disagree” to “Firmly Agree.” Place a mark in one of the five boxes along each line, as shown in the following example.
Example
In general, I am professionally more efficient in the mornings than evenings.
1. The recovery strategy implemented by this controller can be considered successful.
[ ] Firmly Disagree   [ ] Partly Disagree   [ ] Neutral   [ ] Partly Agree   [ ] Firmly Agree
2. In this traffic scenario, it was possible to implement more than one recovery strategy.
If you answered ‘partly agree’ or ‘firmly agree’, this indicates that you thought of one or more alternative recovery strategies. Please briefly describe this/these alternative(s).
3. If you were in the place of the subject-controller, would you have implemented a different recovery strategy from the one he implemented?
[ ] Firmly Disagree   [ ] Partly Disagree   [ ] Neutral   [ ] Partly Agree   [ ] Firmly Agree
If you answered ‘partly agree’ or ‘partly disagree’, please specify your reasons for implementing a different recovery strategy and which recovery strategy that would be. In addition, please specify any particular or difficult issues regarding the traffic situation during the recovery process:
Evaluation of the contextual factors in the training scenario
Please circle the corresponding answers according to your professional experience and expertise:
How would you rate the complexity of the simulated failure type?
1. Single system affected
2. Multiple systems affected
How would you rate the time course of the simulated failure’s development?
1. It was a sudden failure.
2. It was a latent failure.
3. It was a gradual degradation of the system.
Would you say that the controller had enough time to recover from the effect(s) of the equipment failure?
1. Yes, time was adequate. The time necessary to recover was less than the time available for recovery in the simulation.
2. No, time was not adequate. The time necessary to recover was in excess of the time available for recovery in the simulation.
Is there a recovery procedure for this particular failure? Y N
If yes, is that procedure:
1. Suitable to the observed situation in question
2. Tolerable to the observed situation in question
3. Counterproductive to the observed situation in question
How would you rate the duration of the simulated equipment failure?
1. Short period of time (reasonably considered to be less than 15 min)
2. Moderate period of time (reasonably considered to be less than 1 h)
3. Substantial period of time (reasonably considered to be more than 1 h)
How would you rate the traffic complexity during the recovery process (please note: only during the recovery process and not during the entire training session)?
1. High
2. Average
3. Low
How would you rate the airspace complexity in the scenario used?
1. Adequate
2. Tolerable
3. Inappropriate
How would you rate the weather conditions during the recovery process?
1. Improved
2. Unchanged
3. Deteriorated
How realistic was today’s task?
1. Highly realistic
2. Moderately
3. Not realistic at all
Thank you!!!!
The mix of IFR/VFR
Military aircraft
The existence of priority aircraft
Speed mix of aircraft
Amount of vertical movements
Amount of crossing movements
Amount of conflicts
The number of crossing points
Proximity of crossing points to the sector boundaries
Number of flight levels
Number of entry points
Number of exit points
Special use airspace (SUAs)
Upper vs. Lower airspace
Airways configuration
The number of neighbouring sectors
Sector geometry (e.g. sharp edges)
Size of sector
Bidirectional vs unidirectional routes
Route length
Proximity of route to sector boundary
e) Best practice procedure sheet
BEST PRACTICE PROCEDURE FOR XXX SIMULATION
Detect the problem:
• Either by the pilot’s first contact, or
• Visually on the radar display (uncorrelated track). In this case, the first assumption may be a transponder failure. After confirming that the a/c transponder is serviceable, a further check on system performance should be conducted.
Identify the failure type, either by the ATCO or through input from the coordinator
Locate traffic
Check identity of all tracks (referring to the eastbound overflight)
Identify traffic using appropriate technique
• Bearing/range
• Turn method
Inform all traffic on RTF of the failure and advise of possible restrictions
Maintain identification of all traffic
Ground trainer
Refuse departures permission to depart
Get all airborne traffic to land
Maintain accurate and timely strip marking throughout the process
Provide vertical separation
Utilize holding patterns when necessary
After restoration has been confirmed by the coordinator:
• Re-identify all traffic
• Confirm Mode C
• Continue to monitor
• Release all departures
First possible detection/action may have occurred at: ______________
First actual action occurred at: ______________
End of the recovery process (release of the departures): ______________
Chapter 13 Appendices
Appendix XIV Overview of RIFs, their corresponding levels, and probabilities determined in the experimental investigation
(1) ID | (2) RIF name | (3) Descriptor | (4) Probability (p) | (5) Expected effect on controller recovery performance | (6) Level | (7) Designator (R) | (8) Probability of overall situation occurring (p*R)
Internal factors
1 | Training for recovery from ATC equipment failure | Suitable to the situation in question | 0.73 | Most favourable | 1 | 1 | 0.73
 | | Tolerable to the situation in question | 0.23 | Non-significant | 2 | 0 | 0
 | | Counterproductive to the situation in question | 0.03 | Least favourable | 3 | -1 | -0.03
2 | Previous experience with equipment failures | Experienced with a particular type of failure or experienced with any other type of ATC equipment failure | 0.83 | Most favourable | 1 | 1 | 0.83
 | | No experience with ATC equipment failures | 0.17 | Non-significant | 2 | 0 | 0
3 | Experience with the system performance (reliance or trust) | Objective attitude toward the system | 0.93 | Non-significant | 2 | 0 | 0
 | | Positive experience with the system (excessive trust) or negative experience with the system (under-trust) | 0.07 | Least favourable | 3 | -1 | -0.07
4 | Personal factors | Suitable for the recovery process | 0.83 | Most favourable | 1 | 1 | 0.83
 | | Tolerable for the recovery process | 0.13 | Non-significant | 2 | 0 | 0
 | | Counterproductive for the recovery process | 0.03 | Least favourable | 3 | -1 | -0.03
5 | Communication for recovery within team/ATC Centre | Efficient | 0.27 | Most favourable | 1 | 1 | 0.27
 | | Tolerable | 0.67 | Non-significant | 2 | 0 | 0
 | | Inefficient | 0.07 | Least favourable | 3 | -1 | -0.07
Equipment failure related factors
6 | Complexity of failure type | Single system affected | 0 | Non-significant | 2 | 0 | 0
 | | Multiple systems affected | 1 | Least favourable | 3 | -1 | -1
7 | Time course of failure development | Sudden failure | 1 | Improve | 1 | 1 | 1
 | | Persistent or latent failure | 0 | Non-significant | 2 | 0 | 0
 | | Gradual degradation of system | 0 | Least favourable | 3 | -1 | 0
8 | Number of workstations/sectors affected | One workstation/one sector or all workstations in one sector | 0 | Non-significant | 2 | 0 | 0
 | | Several workstations/couple of sectors or all workstations/all sectors | 1 | Least favourable | 3 | -1 | -1
9 | Time necessary to recover | Adequate (less than available time) | 0.86 | Most favourable | 1 | 1 | 0.86
 | | Inadequate (in excess of available time) | 0.14 | Least favourable | 3 | -1 | -0.14
10 | Existence of recovery procedure | Suitable to the situation in question | 0 | Most favourable | 1 | 1 | 0
 | | Tolerable to the situation in question | 0 | Non-significant | 2 | 0 | 0
 | | Inappropriate | 1 | Least favourable | 3 | -1 | -1
11 | Duration of failure | Short period of time | 1 | Non-significant | 2 | 0 | 0
 | | Moderate period of time or substantial period of time | 0 | Least favourable | 3 | -1 | 0
External factors or factors related to working conditions
12 | Adequacy of HMI and operational support | Suitable to the situation in question | 0.5 | Most favourable | 1 | 1 | 0.5
 | | Tolerable to the situation in question | 0.39 | Non-significant | 2 | 0 | 0
 | | Counterproductive to the situation in question | 0.11 | Least favourable | 3 | -1 | -0.11
13 | Ambiguity of information in the working environment | External working environment matches the controller's internal mental model | 1 | Most favourable | 1 | 1 | 1
 | | External working environment mismatches the controller's internal mental model | 0 | Least favourable | 3 | -1 | 0
16 | Adequacy of organisation | Efficient | 0.4 | Most favourable | 1 | 1 | 0.4
 | | Tolerable | 0.5 | Non-significant | 2 | 0 | 0
 | | Inefficient | 0.1 | Least favourable | 3 | -1 | -0.1
Airspace related factors
17 | Traffic complexity | Average traffic complexity | 0.35 | Non-significant | 2 | 0 | 0
 | | Extremely high or extremely low traffic complexity | 0.65 | Least favourable | 3 | -1 | -0.65
18 | Airspace characteristics | Adequate (e.g. en-route higher levels) | 0.8 | Most favourable | 1 | 1 | 0.8
 | | Tolerable | 0.1 | Non-significant | 2 | 0 | 0
 | | Inappropriate (e.g. en-route lower levels or terminal) | 0.1 | Least favourable | 3 | -1 | -0.1
19 | Weather conditions during the recovery process | Improved | 0.83 | Non-significant | 2 | 0 | 0
 | | Deteriorated | 0.17 | Least favourable | 3 | -1 | -0.17
20 | Conflicting issues in the situation (task complexity) | Average complexity of the situation | 0.3 | Non-significant | 2 | 0 | 0
 | | Conflicting, multiple tasks or extremely low complexity of the situation (may lead to monotony) | 0.7 | Least favourable | 3 | -1 | -0.7
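Column (8) of the table is the product of the probability in column (4) and the designator in column (7). The designator is fixed by the level: +1 for level 1, 0 for level 2 and -1 for level 3. As a minimal illustrative sketch (in Python, which the thesis itself does not use), the snippet below recomputes column (8) for a few rows copied from the table. The names DESIGNATOR, ROWS, weighted_designator and table_rows are hypothetical. Rounding to two decimals is assumed only to match the precision shown in the table. The sketch does not reproduce how the thesis derives the recovery context indicator from these values.

```python
# Mapping from level (column 6) to designator R (column 7), as read from the
# table above: level 1 (most favourable) -> +1, level 2 (non-significant) -> 0,
# level 3 (least favourable) -> -1.
DESIGNATOR = {1: 1, 2: 0, 3: -1}

# A subset of rows copied from the table: (RIF ID, descriptor, p, level).
ROWS = [
    (1, "Suitable to the situation in question", 0.73, 1),
    (1, "Tolerable to the situation in question", 0.23, 2),
    (1, "Counterproductive to the situation in question", 0.03, 3),
    (5, "Efficient", 0.27, 1),
    (5, "Tolerable", 0.67, 2),
    (5, "Inefficient", 0.07, 3),
    (17, "Average traffic complexity", 0.35, 2),
    (17, "Extremely high or extremely low traffic complexity", 0.65, 3),
]


def weighted_designator(p: float, level: int) -> float:
    """Return column (8), p*R, for one descriptor row."""
    return p * DESIGNATOR[level]


def table_rows(rows):
    """Yield (RIF ID, descriptor, p, R, p*R) per row, with p*R rounded to 2 d.p."""
    for rif_id, descriptor, p, level in rows:
        yield rif_id, descriptor, p, DESIGNATOR[level], round(weighted_designator(p, level), 2)


if __name__ == "__main__":
    for rif_id, descriptor, p, r, pr in table_rows(ROWS):
        print(f"{rif_id:>2}  {descriptor:<52}  p={p:<5}  R={r:>2}  p*R={pr}")
```

Running the script prints one line per row. For the rows included, the printed p*R values agree with column (8) of the table.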
Appendix XV Distribution of the recovery context indicator captured in the experiment
The distribution of the recovery context indicator (Ic) obtained from the experimental
results is presented in Figure 1.
[Figure 1: histogram; x-axis: Recovery context indicator (Ic), approximately -0.088 to 0.112; y-axis: Frequency, 0 to 800]
Figure 1 Distribution of the recovery context indicator in the experimental investigation
(six RIFs defined through one level)
Based on the shape of the Ic distribution, the data has been fitted with two normal
distributions according to equation 1 (Figure 2). The distribution on the left accounts for
unfavourable recovery contexts whose recovery context indicator takes the average
value of -0.04 (A1=141.4, SD1=0.02). The distribution on the right accounts for
favourable recovery contexts whose recovery context indicator takes an average value