FRAMEWORK FOR THE ANALYSIS OF CONTROLLER RECOVERY FROM EQUIPMENT FAILURES IN AIR TRAFFIC CONTROL
Branka Subotic (MSc BSc)
April 2007
A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy of the University of London and for the
Diploma of Membership of Imperial College London
Centre for Transport Studies, Department of Civil and Environmental Engineering
Imperial College London, United Kingdom
Declaration
At various stages during this PhD, I was involved in collaborative efforts with both
academic and industrial colleagues. In certain cases, the outputs of these collaborations
are included in this thesis to better explain and support the research presented. In
particular, during the period 2004 to 2005, colleagues from the Air Traffic Management
(ATM) Group at the Centre for Transport Studies, Imperial College London, assisted in the
questionnaire-based survey of air traffic controllers. This mainly involved the distribution of
questionnaires and collection of the responses.
Furthermore, a key element of the research presented in this thesis is the experiment
conducted at a facility owned and operated by a Civil Aviation Authority (CAA). The
experiment was facilitated by the assistance of various Air Traffic Control (ATC) Centre
staff including ATM specialists, ATC controllers, pseudo-pilots, engineers, and technicians.
Finally, EUROCONTROL staff provided a valuable contribution at various stages of this
research in terms of access to relevant publications, professional networks, and simulation
trials.
I hereby declare that besides the collaborations referred to above, I have personally
carried out the work described in this thesis:
…………………………………………………..
Branka Subotic
…………………………………………………..
Dr. Washington Yotto Ochieng
Abstract
An Air Traffic Control (ATC) system represents a set of components that act together to
achieve a safe and efficient flow of traffic in any given airspace. The elements of this
system are human operators, equipment, and procedures, along with all the interactions
between them. Failure of equipment, as one component of an ATC system, and its
interaction with human operators (i.e. air traffic controllers) is the main focus of the
research presented in this thesis. Thus, the thesis focuses on the human recovery process
triggered by failure of equipment that supports air traffic controllers in the provision of air
traffic services in a dedicated airspace. A detailed understanding of the controller recovery
process has the potential to significantly contribute to safety and operational efficiency in
the current and future ATC environment. Currently, there is a very limited understanding of
the factors that influence the recovery process, particularly with respect to equipment
failures in ATC. This thesis builds on existing relevant research in other industries and
uses targeted experiments and mathematical modelling to develop a functional
relationship between recovery and its influencing factors.
The research presented in this thesis addresses two areas, namely equipment failures
in ATC and controller recovery. The first investigates the characteristics of the ATC
equipment failures from past research and derives the associated target level of safety.
Linking the target level of safety with available operational failure reports establishes a
means to validate the realism and operational significance of the equipment failure
characteristics. A subset of these characteristics relevant to the ATC operations is further
used to develop a novel qualitative equipment failure impact assessment tool. This tool
enables the identification of equipment failures that are most severe to ATC operations
and thus may be most challenging to controller performance.
Having identified the relevant equipment failure types and their characteristics, the thesis
carries out a critical review of the associated issues regarding the process of controller
recovery. A critical element of this is the review of past human reliability research and its
relationship to controller recovery from equipment failures in ATC. The findings from this
are augmented by questionnaire survey results based on responses of 134 air traffic
controllers from 34 countries. Both the past research and the questionnaire survey results
are used to highlight the importance of the context in which controller recovery
performance takes place and to define the recovery context through a set of 20 candidate
contextual factors or Recovery Influencing Factors (RIFs).
The thesis then uses the candidate RIFs to develop a novel approach for the quantitative
assessment of the recovery context through the concept of a recovery context indicator. This
approach and its operational benefits are further validated by an experiment conducted in
a training facility of an ATC Centre with the participation of 30 operational air traffic
controllers. In addition to the verification of the generic methodology for the assessment of
the recovery context, the experimental data are used to analyse controller recovery
performance and investigate the outcome of the recovery process. The findings obtained
from the experimental investigation are in line with those obtained from past research and
the ATC operational environment.
Acknowledgements
Having started my research initially at the EUROCONTROL Experimental Centre (EEC) in
Bretigny sur Orge and then at Imperial College London, it is understandable that naming
all those people who have contributed to this work is quite a hard task. However, I will try
anyway, and if some names are not listed, my gratitude to them is no less than to those listed
below.
For help with the funding of my studies, I would like to thank the following organisations:
• EUROCONTROL Experimental Centre (EEC) in Bretigny sur Orge, France for the
award of a graduate internship and a further three-year research studentship;
• Universities UK for the Overseas Research Scheme (ORS) award for three
consecutive years; and
• the Centre for Transport Studies, Department of Civil and Environmental
Engineering, Imperial College London for the contribution to my tuition fee and a
three-year research bursary.
This PhD research would not have been possible without Christian Push and Dirk
Schaefer who invited me initially to join the EUROCONTROL Human Factors group and to
start developing a research project satisfying both the needs of the EEC as well as my
own interests. Once started, this collaboration proved to be highly supportive in both
technical and financial terms. As a EUROCONTROL PhD student I had the privilege of
unlimited access to many aviation experts working “in house”: at the EEC, Headquarters
(Belgium), and the Maastricht Upper Area Control (UAC) Centre (Netherlands). Among
these were Nigel Makings, Catherine Gandolfi, Eric Perrin, Deirdere Bonini, Rachael
Gordon, Andrew Harvey, and the entire Gate-to-Gate (G2G) team and controllers involved
in simulation A and B, especially Diarmuid Houlihan ‘Motto’. I thank them all for the fruitful
collaboration. My special gratitude goes to Barry Kirwan and Oliver Straeter whose
technical assistance and unlimited support were crucial to my embarking upon the field of
human reliability, completely unknown to me at the beginning of this research. Their
assistance and interest in my research opened many doors and assured the highest
quality of information and professional contacts.
At Imperial College there are many colleagues and research students who offered their
help at various stages and aspects of my work. Among them are Jackie Sime, William
Knottenbelt, Dimitri Panagiotakopoulos, Marie-Dominique Dupuy, Umar Bhatti, Victoria
Williams, and Wolfgang Shuster. However, my biggest gratitude goes to Arnab Majumdar
and to my supervisor, Washington Y. Ochieng. They had a critical role in the support,
supervision, and achievement of excellence in my research. Thanks to their
understanding, I attended various technical meetings, seminars, conferences, courses,
and simulation trials. These proved to be a significant direct and indirect contribution to the
quality of the research presented in this thesis.
One of the critical parts of the research presented in this thesis would not have been feasible
without the technical support of the Irish Aviation Authority staff, especially Nick Lowth,
Bernard Mackessy, and Garrett MacNamara. However, my special gratitude goes to Alan
Byrne for making the impossible truly possible and allowing me to successfully complete a
key part of this research.
There are many other people that have helped in various ways. I would like to thank Yvette
Dalle-Mule, Veronique Begault, and Sonja Straussberger from EUROCONTROL EEC.
Furthermore, I would like to thank Rajkumar Pant from the Indian Institute of Technology,
Isa Alkalaj and Marek Bekier from Skyguide, Martin Richards and Vic Burgess from UK
NATS, Christopher Adams from Maastricht UAC, Bob Phillips from CASA Australia, Peter
Nalder from New Zealand Civil Aviation Authority (CAA), Jos Kuijper and Randal de Garis
from EUROCONTROL, Sarah Doherty and Joji Waites from the UK CAA, and Keshava
Sharma from the Airports Authority of India.
I want to thank my friend Tamara Pejovic for all the support that she gave me during the
years I have been working on this thesis. Last but not least, I want to express my deepest
gratitude to my brother and my mother who were always the core support in all the
journeys that I have embarked upon. Hence, I am dedicating this thesis to them.
Table of Contents
DECLARATION
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
1 INTRODUCTION
1.1 Background to the problem
1.2 Research objectives
1.3 Outline of the thesis
2 FUNDAMENTALS OF AIR TRAFFIC MANAGEMENT AND CONTROL
2.1 Air Traffic Management
2.2 Air Traffic Control
2.2.1 Area Control service
2.2.2 Approach Control service
2.2.3 Aerodrome control service
2.3 Overall Air Traffic Control system architecture
2.3.1 Air Traffic Control functionalities
2.3.1.1 Communication function
2.3.1.2 Navigation function
2.3.1.2.1 Approach and landing navigation
2.3.1.2.2 Area navigation
2.3.1.2.3 Systems for control and monitoring of ground-based airport facilities
2.3.1.3 Surveillance function
2.3.1.3.1 Radar systems
2.3.1.3.2 Radar and auxiliary display
2.3.1.3.3 Terminal and ground surveillance
2.3.1.4 Data processing and distribution function
2.3.1.5 Supporting function
2.3.1.6 Safety Nets
2.3.1.7 Power supply
2.3.1.8 Pointing and input devices
2.3.1.9 System control and monitoring function
2.4 Characteristics of the generic Air Traffic Control Centre
2.5 The future of Air Traffic Control
2.5.1 Challenges of automation
2.5.2 Human-centred vs. technology-centred automation
2.5.3 The future of air navigation service
2.5.4 Impact of future ATM/ATC on controller recovery from equipment failures
2.6 Summary
3 PRELIMINARY ASSESSMENT OF EQUIPMENT FAILURES IN AIR TRAFFIC CONTROL
3.1 Definition of equipment failure
3.2 Definition of a hazard
3.3 Supporting data: operational failure reports
3.3.1 Reporting and data collection
3.3.2 Data pre-processing problems
3.3.3 Available operational failure reports
3.4 Methodology to assess the relevance of supporting data
3.4.1 The accident to incident ratio
3.4.2 Units of measurement
3.4.3 The acceptable risk or target level of safety (TLS)
3.4.3.1 Existing standards
3.4.3.1.1 Joint Aviation Authority
3.4.3.1.2 UK Civil Aviation Authority
3.4.3.1.3 International Civil Aviation Organisation
3.4.3.1.4 Summary of the various TLS analyses
3.4.4 Target level of safety and Air Traffic Control risk budgeting
3.4.5 Target level of safety and Air Traffic Control equipment risk budgeting
3.5 Preliminary analysis and validation of operational failure reports
3.6 Summary
4 EQUIPMENT FAILURES AND TECHNICAL DEFENCES IN AIR TRAFFIC CONTROL
4.1 Equipment failure characteristics
4.1.1 ATC functionality affected
4.1.2 Complexity of failure type
4.1.3 Time course of failure development
4.1.4 Duration of failure
4.1.5 Potential causes of equipment failures
4.2 Consequences of equipment failure
4.2.1 Impact on air traffic controller
4.2.2 Impact on operations room
4.2.3 Impact on ATC operations
4.2.4 Impact on ATM operations
4.3 Definition of technical defences (technical recovery)
4.3.1 Defences for recovering from failure (safety devices)
4.3.2 Defences for transmitting information regarding the failure (warning devices)
4.4 Analysis of operational failure reports
4.4.1 Data analysis methodology
4.4.2 Rate of equipment failures
4.4.3 Type of ATC functionality and equipment affected
4.4.4 Complexity of failure type
4.4.5 Severity of equipment failures
4.4.6 Duration of equipment failures
4.4.7 Additional statistical tests
4.5 Qualitative equipment failure impact assessment tool
4.6 Summary
5 AIR TRAFFIC CONTROLLER RECOVERY
5.1 Human recovery in air traffic control
5.1.1 Recovery by air traffic controllers
5.1.2 Recovery by system control and monitoring engineers
5.2 Phases of the controller recovery process
5.2.1 Detection
5.2.2 Diagnosis
5.2.3 Correction
5.3 Outcome of the recovery process
5.4 Models of human recovery
5.4.1 Model by Kanse
5.4.2 RAFT Tool
5.4.3 Model by Wickens et al.
5.5 Procedures for handling ATC equipment failures
5.5.1 Existing regulations
5.5.1.1 International regulation
5.5.1.2 European and national regulation
5.5.1.3 Air navigational service provider regulation
5.5.2 Main principles on recovery procedures in ATC
5.6 Training for handling ATC equipment failures
5.6.1 Existing regulations
5.6.1.1 International regulation
5.6.1.2 European and national regulation
5.6.1.2.1 UK Civil Aviation Authority regulation
5.6.1.3 Air navigational service provider regulation
5.6.2 Areas of concern related to recovery training
5.7 Definition of controller recovery performance in this thesis
5.7.1 Recovery context
5.7.2 Recovery effectiveness
5.7.3 Recovery duration
5.8 Summary
6 QUESTIONNAIRE SURVEY
6.1 Objectives of the questionnaire survey
6.2 Sampling
6.3 Survey methodology
6.4 Design of the questionnaire
6.5 Pilot survey
6.6 Full survey
6.6.1 Face-to-face interviews
6.6.2 Self-completion survey
6.6.3 Potential sources of errors
6.7 Methodology for the questionnaire survey data analysis
6.7.1 Data pre-processing for analysis
6.7.2 Characteristics of the sample
6.7.2.1 Sampling per ATC Centre
6.7.2.2 Sampling of air traffic controllers
6.7.3 High-level analyses
6.7.3.1 Experience with equipment failures (Q1)
6.7.3.2 Factors that influence the controller recovery performance (Q2)
6.7.3.3 The most unreliable ATC systems/tools (Q3)
6.7.3.4 Organised exchange of information on equipment failures (Q4)
6.7.3.5 Status and quality of recovery procedures (Q5)
6.7.3.5.1 Other findings regarding the recovery procedures
6.7.3.6 Status and quality of training for recovery (Q6)
6.7.3.6.1 Other findings on training for recovery
6.7.3.7 Other findings on recovery performance
6.7.4 Interaction analyses
6.8 Summary
7 METHODOLOGY FOR A SELECTION OF RELEVANT AIR TRAFFIC CONTROLLER RECOVERY INFLUENCING FACTORS
7.1 Relevance of the recovery context
7.1.1 Example of the recovery context
7.2 Methodology to extract the candidate set of contextual factors
7.2.1 Human Reliability Assessment techniques
7.2.1.1 Human Error in ATM (HERA)
7.2.1.2 Technique for the Retrospective and Predictive Analysis of Cognitive Errors in ATC (TRACEr)
7.2.1.3 Recovery from Automation Failure (RAFT) Tool
7.2.1.4 Recovery from failures: understanding the positive role of human operators during incidents
7.2.1.5 Computerised Operator Reliability and Error Database (CORE-DATA)
7.2.1.6 Technique for Human Error Rate Prediction (THERP)
7.2.1.7 Human Error Assessment and Reduction Technique (HEART)
7.2.1.8 The Contextual Control Model (COCOM)
7.2.1.9 Cognitive Reliability and Error Analysis Method (CREAM)
7.2.1.10 Human Reliability Management System (HRMS)
7.2.1.11 A Technique for Human Event Analysis (ATHEANA)
7.2.1.12 Connectionism Assessment of Human Reliability (CAHR)
7.2.1.13 Nuclear Action Reliability Assessment (NARA)
7.2.1.14 Human Performance DataBase (HPDB)
7.2.1.15 Summary of the findings
7.2.2 Augmentation with equipment-failure related factors
7.2.3 Augmentation with dynamic situational factors
7.2.4 Further subdivision of the identified RIFs
7.3 Definition of qualitative descriptors
7.4 Summary
8 QUANTITATIVE ASSESSMENT OF THE RECOVERY CONTEXT
8.1 Lessons learnt from past research
8.1.1 Application of the CREAM technique
8.1.2 Connectionism Assessment of Human Reliability (CAHR)
8.2 Framework for the methodology for a quantitative assessment of recovery context
8.3 Probabilistic assessment of RIFs (Step 2)
8.3.1 Sources of information
8.3.1.1 Operational failure reports
8.3.1.2 Questionnaire survey
8.3.1.3 Input by ATM Specialists
8.3.1.4 Past literature
8.3.1.5 Aggregation of data
8.3.2 Summary
8.4 Interactions between Recovery Influencing Factors (Step 3)
8.4.1 Identification of RIF interactions
8.4.2 Validation of RIF interactions
8.4.2.1 CREAM
8.4.2.2 CAHR
8.4.2.3 Validation by ATM specialists
8.4.2.4 Validation summary
8.4.3 Quantification of RIFs interactions
8.5 Methodology for the determination of the cut-off points (Step 4)
8.6 Specific effects of RIFs on controller recovery performance (Step 5)
8.7 Calculation of the recovery context indicator (Step 6)
8.7.1 Re-calculation of RIF probabilities
8.7.2 Distribution of the recovery context indicator
8.7.3 Sensitivity analysis
8.7.4 Optimal solutions
8.8 Summary
9 EXPERIMENTAL INVESTIGATION OF THE AIR TRAFFIC CONTROLLER RECOVERY PERFORMANCE
9.1 High-level design of the experimental process
9.2 Rationale for the experiment
9.3 Assessment of the available resources
9.4 Planning for the experiment
9.5 Design of the experiment
9.6 Selection of the equipment failure to be simulated
9.7 Pilot study: lessons learnt
9.7.1 Summary of the findings from the pilot study
9.8 Experimental set up
9.8.1 Airspace characteristics
9.8.2 Traffic characteristics
9.8.3 Equipment failure characteristics
9.9 Experimental variables
9.9.1 Independent Variables
9.9.1.1 Recovery Influencing Factors (RIFs)
9.9.1.2 Required recovery steps
9.9.2 Dependent Variables
9.9.2.1 Recovery effectiveness
9.9.2.2 Recovery duration
9.9.3 Extraneous Variables
9.10 Potential limitations
9.11 Summary
10 ANALYSIS OF EXPERIMENTAL RESULTS
10.1 Overall framework
10.2 Participants
10.2.1 Age and operational experience
10.2.2 Ratings
10.3 Assessment of controller recovery performance
10.3.1 Recovery context
10.3.1.1 Assessment of relevant RIFs
10.3.1.2 Probabilities of each RIF and its corresponding level
10.3.1.3 Interactions between RIFs
10.3.1.4 Recovery context indicator (Ic)
10.3.1.5 Optimal solutions
10.3.1.5.1 Impact of enhancing ‘recovery procedure’ on recovery context
10.3.2 Required recovery steps
10.3.3 Recovery effectiveness
10.3.4 Recovery duration
10.3.5 Outcome of the recovery process
10.3.6 Interactions
10.3.7 Other findings
10.3.7.1 The recovery phases
10.3.7.1.1 Detection
10.3.7.1.2 Diagnosis
10.3.7.1.3 Correction
10.3.7.2 Observed behaviour and attitude
10.3.7.3 Additional findings
10.4 Summary
11 CONCLUSIONS
11.1 Revisiting the research objectives
11.2 Conclusions
11.2.1 Literature review
11.2.2 Equipment failure types and their characteristics
11.2.3 Controller recovery performance, recovery context, and influencing factors
11.2.4 Framework for the analysis of controller recovery
11.3 Future work
11.4 Publications relating to this work
11.4.1 Publication format: journal – accepted subject to revision
11.4.2 Publication format: journal – published
11.4.3 Publication format: conference proceedings – published
12 LIST OF REFERENCES
APPENDICES
Appendix I The cost of delays induced by equipment failures
Appendix II Interviews with ATM staff
Appendix III Checklist for the Equipment Failure Scenarios in a specific European ATC Centre - An Aide-Memoire framework
Appendix IV The questionnaire design
Appendix V Example of one questionnaire response
Appendix VI Results extracted from question 5 of the questionnaire survey
Appendix VII Overview of contextual factors
Appendix VIII Probabilities for 20 Recovery Influencing Factors (RIFs)
Appendix IX Questions for the ATM Specialist
Appendix X Overview of RIFs, their corresponding levels, and designated probabilities
Appendix XI Validation of the RIFs interaction matrix
Appendix XII Distribution of 20 Recovery Influencing Factors (RIFs)
Appendix XIII Experimental material
Appendix XIV Overview of RIFs, their corresponding levels, and probabilities determined in the experimental investigation
Appendix XV Distribution of the recovery context indicator captured in the experiment
List of Figures
Figure 1-1 Overview of the thesis
Figure 2-1 Air transport system (from Subotic et al., 2005)
Figure 2-2 Flight profile (adapted from ICAO, 2001b)
Figure 2-3 ATM and ATC system components (adapted from ICAO, 2001a)
Figure 2-4 Communication function
Figure 2-5 Navigational function
Figure 2-6 Surveillance function
Figure 2-7 Data processing and distribution function
Figure 2-8 Supporting function
Figure 2-9 System monitoring and control function
Figure 3-1 Phases of an equipment failure occurrence
Figure 3-2 Different definitions
Figure 3-3 Reporting system
Figure 3-4 “Bathtub” model of reliability for electronic components (Leveson, 1995)
Figure 3-5 Aviation TLS and risk budgeting
Figure 4-1 Safety through design (adapted from Christensen and Manuele, 1999)
Figure 4-2 Technical and human recovery
Figure 4-3 Operational failure reports analyses
Figure 4-4 Total number of equipment failures per flight hours flown in each year for countries A, B, and C
Figure 4-5 Total number of equipment failures per flight hours flown in each year for country D (year 2000 incomplete)
Figure 4-6 Most affected ATC functionality (Country A)
Figure 4-7 Most affected ATC functionality (Country B)
Figure 4-8 Most affected ATC functionality (Country C)
Figure 4-9 Most affected ATC functionality (Country D)
Figure 4-10 Distribution of equipment failures according to their severity
Figure 4-11 Distribution of major equipment failures according to ATC functionality
Figure 4-12 Distribution of the failure duration according to four distinct categories
Figure 4-13 Qualitative equipment failure impact assessment tool
Figure 5-1 Analysis of outcome phase (adapted from EUROCONTROL, 2004e)
Figure 5-2 Recovery process phase model (Kanse, 2004)
Figure 5-3 The Recovery from Automation Failure Tool (RAFT) Framework (EUROCONTROL, 2004e)
Figure 5-4 Model of failure recovery in air traffic control. Where two nodes are connected by an arrow, signs (+, -, 0) indicate the direction of effect on the variable depicted in the right node, caused by an increase in the variable depicted in the left node (Wickens et al., 1998)
Figure 6-1 The flow diagram of organising a survey
Figure 6-2 Distribution of world air traffic per region for the year 2003 and 2023 (adapted from Airbus, 2004)
Figure 6-3 One-page example of the questionnaire
Figure 6-4 The flow chart of questionnaire survey analyses
Figure 6-5 Distribution of questionnaire responses per region
Figure 6-6 Distribution of operational experience
Figure 6-7 Distribution of air traffic controllers’ ratings
Figure 6-8 Controllers’ reliance on written procedures throughout the recovery process
Figure 6-9 Controllers’ reliance on situation-specific problem solving throughout the recovery process
Figure 6-10 Controllers’ reliance on past experience throughout the recovery process
Figure 6-11 Distribution of affected ATC functionalities as reported in the questionnaire survey
Figure 7-1 Methodology to extract a candidate set of RIFs
Figure 8-1 Framework for the quantitative assessment of the recovery context
Figure 8-2 Distribution of RIF5 levels amongst identified recovery contexts without interactions
Figure 8-3 Distribution of RIF5 levels amongst identified recovery contexts with interactions
Figure 8-4 Distribution of RIF1 levels amongst identified recovery contexts with interactions
Figure 8-5 Distribution of RIF20 levels amongst identified recovery contexts with interactions
Figure 8-6 Distribution fitting for the three cut-off points on the example of RIF5 Level 1
Figure 8-7 Cubic polynomial function f(x) fitted for the RIF5 to determine its minimum
Figure 8-8 Distribution of the recovery context indicator
Figure 9-1 The flow diagram of experimental investigation
Figure 9-2 Timeline of the experiment
Figure 9-3 Room setup
Figure 9-4 The visual representation of equipment failure on CWP: a) before the failure, b) after the failure
Figure 10-1 Framework for the analysis of experimental results
Figure 10-2 Distribution of operational experience
Figure 10-3 Distribution of controllers’ ratings
Figure 10-4 Distribution of the recovery context indicator in the experiment
Figure 10-5 Distribution of the recovery context indicator in the experiment with an increased value of the coefficient of interaction
Figure 10-6 Distribution of the recovery context indicator of 30 controllers
Figure 10-7 Recovery steps performed by each participant
Figure 10-8 Distribution of required recovery steps (S1 to S17)
Figure 10-9 Distribution of recovery effectiveness per category
Figure 10-10 Distribution of recovery duration
Figure 10-11 Distribution of the recovery outcome
Figure 10-12 Recovery phases, their corresponding influencing factors, and required recovery steps
List of Tables
Table 3-1 Summary of available data, number of reports, and equipment failure incidents per country
Table 3-2 Summary of various analyses on aviation TLS
Table 3-3 Analysis of operational failure reports and results
Table 4-1 Examples of equipment failures related to different ATC system functionalities (as defined in Chapter 2)
Table 4-2 UK NATS severity rating (from NATS, 2002)
Table 4-3 Country C’s severity rating as defined by its CAA
Table 4-4 Country D severity rating as defined by the particular ATC Centre
Table 4-5 Severity rating defined in this research and mapped with available sources
Table 4-6 Most affected ATC equipment (Country A)
Table 4-7 Most affected ATC equipment (Country B)
Table 4-8 Most affected ATC equipment (Country C)
Table 4-9 Most affected ATC equipment (Country D)
Table 4-10 Summary of five ATC equipment types most affected by failures
Table 4-11 Percentage of the multiple failure occurrences reported in the available datasets
Table 4-12 Summary of five most affected equipment types from four datasets
Table 4-13 Distribution of major failures lasting up to 15 minutes per ATC equipment affected
Table 4-14 Statistical tests and results obtained
Table 4-15 Main findings regarding interaction between ATC functionality and severity
Table 4-16 Review of equipment failure characteristics with regard to their impact on ATC operations
Table 4-17 Detailed overview of the primary and the secondary group of ATC functionalities
Table 5-1 Phases of the recovery process identified in past research
Table 5-2 Summary of relevant models of the human recovery process
Table 6-1 Summary of the questionnaire survey sample
Table 6-2 Mapping between most unreliable ATC functionalities and existing recovery procedures for sampled worldwide countries
Table 6-3 Existence of recovery procedures, recovery training, and recurrent training as reported in the questionnaire survey
Table 6-4 Interaction matrix
Table 6-5 Statistical tests and results obtained
Table 7-1 Factors influencing recovery from failures (from Kanse and van der Schaaf, 2000)
Table 7-2 Factors influencing human actions in THERP (cited in Straeter, 2000)
Table 7-3 Review of Human Reliability Assessment (HRA) techniques and relevant findings
Table 7-4 Recovery Influencing Factors
Table 7-5 Relevant recovery influencing factors and their corresponding qualitative descriptors
Table 8-1 Overview of CREAM and CAHR differences
Table 8-2 Distribution of probabilistic RIF ratings per source
Table 8-3 ATM specialists involved in the assessment of RIFs
Table 8-4 Overview of the sources of information used to determine RIF probabilities
Table 8-5 Example of a potential recovery context represented as a 20-digit array
Table 8-6 Interaction matrix: (1) validation by CREAM, (2) validation by CAHR, (3) validation by ATM specialists; and (x) not validated interactions
Table 8-7 Mapping between RIFs and CAHR contextual factors
Table 8-8 Recovery context (as presented in Table 8-5) after the incorporation of RIF interactions
Table 8-9 Descriptive statistics for the three cut-off points on the example of RIF5 Level 1
Table 8-10 Local minimums of polynomial functions
Table 8-11 Cut-off points between the levels for all RIFs
Table 8-12 Probabilities for the RIF5 and each of its levels (see Appendix VII)
Table 8-13 Sensitivity analysis
Table 9-1 Training, pilot study, and experiment sessions
Table 9-2 Overview of the potential equipment failures to be simulated and their inclusion in the pilot study
Table 9-3 Equipment failures used in the pilot study
Table 9-4 The mapping between exercise characteristics and the controllers’ observations
Table 9-5 Equipment failure in the experimental study
Table 9-6 Availability of functions in the reduced flight data processing mode
Table 9-7 Overview of independent and dependent variables
Table 9-8 Overview of independent and extraneous variables
Table 9-9 Overview and description of required recovery steps
Table 9-10 Recovery process and its three main tasks
Table 10-1 Characteristics of a sample of controllers participating in experiment
Table 10-2 Verification of RIFs probabilities from a ‘generic’ approach (Chapter 8) and the experiment
Table 10-3 Summary of RIFs defined through a single corresponding level
Table 10-4 Verification of the distribution of the recovery context indicator obtained from a ‘generic’ approach (Chapter 8) and the experiment
Table 10-5 A review of RIFs with the potential for recovery enhancement
Table 10-6 A review of the proposed recovery solutions
Table 10-7 Percentage of performed recovery steps in three experimental sessions
Table 10-8 Comparison of recovery durations between three experimental sessions
Table 10-9 Statistical tests and results
Table 10-10 The outcome of the recovery process matrix (S stands for successful, T for tolerable, and U for unsuccessful recovery)
Table 10-11 Statistical tests and results
Table 10-12 Summary of additional findings
List of Abbreviations
ACAS Airborne Collision Avoidance System
ACC Area Control Centre
ADREP Accident/Incident Reporting
ADS Automatic Dependent Surveillance
ADS-B Automatic Dependent Surveillance Broadcast
ADS-C Automatic Dependent Surveillance Contract
AFTN Aeronautical Fixed Telecommunication Network
A/G Air-Ground communication
AGDP Air Ground Data Processor
AGL Aeronautical Ground Lighting
AIAA American Institute of Aeronautics and Astronautics
AIS Aeronautical Information Service
AMAN Arrival Manager
ANSP Air Navigation Service Provider
APP Approach Control Office
APR Automatic Position Reporting
APW Area Proximity Warning
ARO Air traffic services Reporting Office
ARTCC Air Route Traffic Control Centre
ASAS Airborne Surveillance and Separation Assurance
ASM Airspace Management
ASMT ATM Safety Monitoring Tool
ASMT Automatic Safety Monitoring Tool
ASTERIX All Purpose STructured Eurocontrol Radar Information Exchange
ATC Air Traffic Control
ATCT Air Traffic Control Tower
ATFM Air Traffic Flow Management
ATHEANA A Technique for Human Event Analysis
ATIS Aeronautical Terminal Information Service
ATM Air Traffic Management
ATS Air Traffic Service
AWOP All-Weather Operations Panel
BBN Bayesian Belief Network
BEST Beginning to End Skills Trainer
BEVOR German special occurrences database
CAA Civil Aviation Authority
CAHR Connectionism Assessment of Human Reliability
CATIS Computerised Automatic Terminal Information Service
CC Contextual Condition
CLAM Cleared Level Adherence Monitoring
CEATS Central European Air Traffic Services
CFMU Central Flow Management Unit
CMS Control and Monitoring System
CNS Communication Navigation Surveillance
COCOM Contextual Control Model
CORE-DATA Computerised Operator Reliability and Error Database
CPC Common Performance Condition
CPDLC Controller Pilot Data Link Communication
CPM Common Performance Modes
CRDS CEATS Research, Development and Simulation
CREAM Cognitive Reliability and Error Analysis Method
CS Commercial Service
CWP Controller Working Position
DARC Direct Access Radar Channel
DMAN Departure Manager
DME Distance Measuring Equipment
EASA European Aviation Safety Agency
ECAC European Civil Aviation Conference
ECSS European Cooperation for Space Standardisation
EGNOS European Geostationary Navigation Overlay Service
EOC Errors of Commission
EOO Errors of Omission
EPC Error Producing Condition
ESA European Space Agency
ESSAR EUROCONTROL SAfety Regulatory Requirements
ET Event Tree
EU European Union
EUROCONTROL European Organisation for the Safety of Air Navigation
FAA Federal Aviation Administration
FANS Future Air Navigation System
FDPD Flight Data Processing and Distribution
FDPS Flight Data Processing System
FIR Flight Information Region
FIS Flight Information Service
FL Flight Level
FMEA Failure Mode and Effect Analysis
FMECA Failure Modes, Effects, and Criticality Analysis
FMS Flight Management System
FPP Flight Plan Processing
FPS Flight Progress Strips
FT Fault Tree
G2G Gate to Gate
G/G Ground-Ground communication
GLONASS Global Orbiting Navigation Satellite System
GNSS Global Navigation Satellite Systems
GPS Global Positioning System
HEART Human Error Assessment and Reduction Technique
HEIDI Harmonisation of European Incident Definition Initiative
HEP Human Error Probability
HFACS Human Factors Analysis and Classification System
HERA Human Error in ATM Project
HF High Frequency
HF DL High Frequency Data Link
HMI Human Machine Interface
HPDB Human Performance DataBase
HRA Human Reliability Assessment
HRMS Human Reliability Management System
IANS Institute of Air Navigation Services
IC Intercom
Ic recovery Context Indicator
ICAO International Civil Aviation Organization
IEC International Electrotechnical Commission
IEEE Institute of Electrical and Electronics Engineers
IFR Instrument Flight Rules
ILS Instrument Landing System
IMC Instrument Meteorological Conditions
IMC Industry Management Committee
INS Inertial Navigation Systems
IP Interphone
IRS Incident Reporting System
ISO International Organisation for Standardisation
JAA Joint Aviation Authority
JAR Joint Aviation Regulations
JHEDI Justification of Human Error Data Information
M Mean
MAESTRO Means to Aid Expedition and Sequencing of Traffic with Research and Optimisation
MANTAS Maastricht ATC New Tools And Systems
MATS Manual of Air Traffic Services
MDT Mean Down Time
MET Meteorological
METAR Meteorological Aerodrome Report
Mil Military
MLS Microwave Landing System
MMI Man Machine Interface
MMS Man Machine System
MONA MONitoring Aids
MORS Mandatory Occurrence Reporting Scheme
MRP Multi Radar Processing
MSAW Minimum Safe Altitude Warning
MSL Mean Sea Level
MTBF Mean Time Between Failure
MTBM Mean Time Between Maintenance
MTCD Medium Term Conflict Detection
MTTR Mean Time To Repair
MUAC Maastricht Upper Area Control Centre
NATSPG North Atlantic Systems Planning Group
MTOW Maximum Take Off Weight
NARA Nuclear Action Reliability Assessment
NAIPS National Aeronautical Information Processing System
NAS National Aviation System
NASA National Aeronautics and Space Administration
NATS National Air Traffic Service
NUCLARR Nuclear Computerized Library for Assessing Reactor Reliability
NDB Non-Directional Beacon
NLR National Aerospace Laboratory
NOTAM Notice to Airmen
NTL National Transportation Library
NTSB National Transportation Safety Board
OJT On-the-Job-Training
OLDI On-line Data Interchange
OS Open Service
PABX Private Automatic Branch Exchange
PAR Precision Approach Radar
PARM Parallel Approach Runway Monitor
PPS Precise Positioning Service
PRA Probabilistic Risk Assessment
PRNAV Precision aRea NAVigation
PRS Public Regulated Service
Proc Procedural control
PRS Primary Radar Service
PSA Probabilistic Safety Assessment
PSF Performance Shaping Factor
PSR Primary Surveillance Radar
PTT Press To Talk
QRA Quantitative Risk Assessment
RAFT Recovery from Automation Failure Tool
RAM Route Adherence Monitoring
RCP Required Communication Performance
RDP Radar Data Processing
RDPS Radar Data Processing System
RDR Radar
RGCSP Review of the General Concept of Separation Panel
RIF Recovery Influencing Factor
RIMCAS Runway Incursion Monitoring and Conflict Alert System
RNP Required Navigational Performance
RSP Required Surveillance Performance
RT Radio Telephony
RTCA Radio Technical Commission for Aeronautics
RVSM Reduced Vertical Separation Minima
RVR Runway Visual Range
RWY Runway
SAR Special Administrative Region
SAR Search And Rescue
SAS Situational Awareness for Safety
SATCOM SATellite COMmunication
SHAPE Solutions for Human Automation Partnership in European ATM
SBAS Satellite-Based Augmentation Systems
SBJ Supersonic Business Jet
SD Standard Deviation
SE Standard Error
SEP Safety and Emergency Procedures
SES Single European Sky
SID Standard Instrument Departure
SME Subject Matter Expert
SMC Surface Movement Control
SMR Surface Movement Radar
SNET Safety Nets
SoL Safety-of-Life
SOR Stimulus-Organism-Response
SPS Standard Positioning Service
SRG Safety Regulatory Group
SRK Skill Rule Knowledge
SRP Single Radar Processing
SRU Safety Regulatory Unit
SSR Secondary Surveillance Radar
STAR Standard Terminal Arrival Route
STCA Short Term Conflict Alert
SUA Special Use Airspace
SYSCO System Supported COordination
TACAN TACtical Air Navigation
THERP Technique for Human Error Rate Prediction
TAR Terminal Approach Radar
TCAS Traffic Alert and Collision Avoidance System
TID Touch Input Device
TRACON Terminal Radar Approach CONtrol
TIP Touch Input Panels
TLS Target Level of Safety
TRACEr Technique for the Retrospective and Predictive Analysis of Cognitive Errors in ATC
TRUCE TRaining for Unusual Circumstances and Emergencies
TRM Team Resource Management
TTA Time To Alert
TWR Aerodrome Control Tower
TWY Taxiway
UAV Unmanned Aerial Vehicles
UHF Ultra High Frequency
UPS Uninterruptible Power Supply
US United States
UTC Coordinated Universal Time
VDL Very high frequency Data Link
VFR Visual Flight Rules
VHF Very High Frequency
VMC Visual Meteorological Conditions
VOR VHF Omnidirectional Range navigation system
VORTAC VHF Omnidirectional Range /TACtical Air Navigation
VSCS Voice Switching Communication System
WAAS World Aircraft Accident Summary
Chapter 1 Introduction
1 Introduction
The aim of this Chapter is to present the background to the problem of controller
recovery from equipment failures in Air Traffic Control (ATC) and to set the scene for
the research presented in this thesis. This Chapter defines the rationale behind the
need to better understand the impact that equipment failures have on controller
performance in the current as well as in the future ATC environment. Based on this
background, the principal research objectives are defined to ensure an in-depth
analysis of ATC equipment failures and controller recovery. This is followed by the
specification of the structure of the thesis and a summary of each Chapter.
1.1 Background to the problem
The aim of the research presented in this thesis is to provide a holistic assessment of
controller recovery from equipment failures in ATC. In order to achieve this, it is
essential to define the environment in which equipment failures are investigated, i.e.
the Air Traffic Management (ATM) system and its ATC component. While ATC is
responsible for the separation of air traffic, other components of the ATM system
manage air traffic flow and airspace design to assure minimal delays and optimal use
of airspace. The ATC system is comprised of people, equipment, and procedures
required to act together to achieve the same objective, i.e. safe and efficient flow of air
traffic in a dedicated airspace. In order to achieve this, all three components must be
operational and fully integrated to enable the most effective and efficient air traffic
service. Consequently, in the case of failure of any component of the ATC system, the
remaining nominally operational components may still provide air traffic services, either
partially or fully, depending on the characteristics of the failure. The research presented
in this thesis focuses solely on failures of one component of the ATC system, namely
equipment.
In order to provide continuous air traffic services, various ‘defences’ or ‘barriers’ are
designed to prevent or mitigate the occurrence of equipment failures. For example, the
existence of technical built-in defences offers protection against the majority of
equipment failures that can occur (NATS, 2002). In most cases, this protection is
triggered automatically and seamlessly. Hence, an equipment failure should not result
in a problem that impacts on the controller’s ability to carry out tasks safely, as it
should be automatically resolved with no interruption of the service (EUROCONTROL,
2004e). However, there are occasions when these technical defences are not sufficient
to maintain the normal ATC system state and protect against negative outcomes. On
such occasions, the intervention of the human, as a component of the ATC system, is
necessary. In other words, the intervention of the air traffic controller becomes crucial
for the provision of a safe but not necessarily efficient air traffic service. Note that
safety represents the key driver here as opposed to efficiency.
In the past, major failures or total outages (i.e. failure of the entire system) were the
subject of detailed investigations. These investigations were aimed at resolving and
preventing similar failure occurrences by focusing mostly on the technology (National
Transportation Safety Board, 1996; General Accounting Office, 1982; General
Accounting Office, 1991; General Accounting Office, 1996; and General Accounting
Office, 1998). For a long time, the basic focus of reliability, system safety, and quality
management was purely on the prevention of equipment failures or the reduction of
their reoccurrence. Various techniques have been developed to assess equipment
failures, their causes, consequences, and appropriate defences. For example, the US
Federal Aviation Administration (FAA) requests that the availability of the Voice
Switching Communication System (VSCS) on the level of the ATC Centre (facility-level¹)
should not be less than 0.9999999, including the backup VSCS (FAA, 1997). In
spite of these significant efforts, equipment failures still occur and every ATC system
eventually fails to perform its intended function or part thereof. On these unexpected
occasions, the recovery of the ATC system is left to the human operator, who must implement
an appropriate recovery strategy in a manner that is both timely and effective. While past
research focused on the technical aspects of the occurrence of equipment failures,
very little has been done on human factors, with a particular reference to controller
recovery from such failures. Some examples, such as research by Wickens et al.
(1998), Low and Donohoe (2001), and EUROCONTROL (2004e), are discussed in the
following paragraphs.
1 The facility-level availability is based on a 50-position system. According to the FAA, system failure occurs when one or more critical functions are unavailable in more than 10 percent of the
positions.
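To give a sense of scale for the availability requirement and the facility-level failure criterion quoted above, the following minimal sketch converts an availability figure into the downtime it permits per year and applies the 10 percent rule to a 50-position system. The function names and the use of a 365-day year are assumptions made for illustration only; they are not taken from the FAA specification.

```python
# Illustrative arithmetic only. The 0.9999999 availability figure and the
# 50-position / 10 percent rule come from the text; the function names and
# the 365-day year are assumptions made for this sketch.
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year, assumed for simplicity


def annual_downtime_seconds(availability: float) -> float:
    """Unavailable time per year implied by a given availability."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def is_system_failure(affected_positions: int,
                      total_positions: int = 50,
                      threshold_percent: int = 10) -> bool:
    """True when MORE than threshold_percent of positions have lost a critical function."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return affected_positions * 100 > threshold_percent * total_positions


if __name__ == "__main__":
    print(f"{annual_downtime_seconds(0.9999999):.2f} seconds of downtime per year")
    print(is_system_failure(5), is_system_failure(6))
```

Under these assumptions, an availability of 0.9999999 corresponds to roughly three seconds of unavailability per year, and a 50-position system is counted as failed only once six or more positions are affected.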
There is a vast amount of Human Reliability Assessment (HRA) research on recovery
from human error in areas including the nuclear and chemical process industry.
However, this knowledge has not been fully exploited in aviation. For example, Zapf
and Reason (1994), Kontogiannis (1999), Kanse and van der Schaaf (2000), and
Kanse (2004) analysed recovery from the consequences of human error in various
non-ATC environments. Moreover, past HRA research recognised the importance of
contextual factors that influence the recovery process. Various HRA techniques defined
these factors depending on the type of operation and environment that surrounds the
human operator. In short, the concepts of recovery from human error and recovery
context are transferable to the recovery from equipment failure. Both represent human
recovery triggered by different stimuli (human error as opposed to technical failure)
occurring within a certain context.
The above findings led to a significant research effort being devoted to the area of
human recovery, from both human error and technical faults. For example, research on
automation in future ATM has shown that human operators are less likely to detect
failures in the automated process due to complacency and reduced situational
awareness (Wickens et al., 1998; Metzger and Parasuraman, 2005). Researchers at
the UK National Air Traffic Service (NATS) examined the potential methodologies to
assess human recovery performance from failures of several automated systems (Low
and Donohoe, 2001). Several different safety (e.g. hazard and operability-HAZOP) and
psycho-physiological methods (e.g. eye movement tracking, situational awareness
assessment-SAGAT, subjective workload ratings-NASA TLX, speech workload) were
investigated. While some of these methods are quite easy to implement (e.g. HAZOP,
SAGAT, NASA TLX), others require complex training and the use of sophisticated
equipment (e.g. eye movement tracking, speech workload). Most of these methods
proved to be appropriate and provided useful information, and were thus recommended for
future use. Due to the confidential nature of this research, no further insight was given
into the human recovery process, its phases, and the impact of the context surrounding
the controllers.
Furthermore, the EUROCONTROL Gate to Gate (G2G) project, initiated to test future
advanced ATC concepts, further highlighted the impact and importance of ATC
equipment failures. ATC safety managers throughout Europe highlighted several
equipment related areas of concern within their ATC Centres (Gordon and Makings,
2003). These are: radio communication interference, equipment reliability, ATC tools
failure, and relevance of emergency checklists for controllers and appropriate handling
of emergency situations. This study highlighted the consequences of equipment
unavailability in current as well as future more automated ATC environments.
Simulation trials that followed attempted to identify and investigate safety-relevant
occurrences associated with future ATC concepts/tools (Medium Term Conflict
Detection-MTCD, MONitoring Aid-MONA, data link, Arrival Manager-AMAN, and
Airborne Separation Assistance System-ASAS). Various equipment failures were
identified amongst the potential safety-relevant occurrences². They ranged from
problems with the Human Machine Interface (HMI) to problems with ASAS and data link
messages (Damidau, Kirwan, and Scrivani, 2006).
However, few studies have explicitly addressed equipment failures and recovery
jointly in the area of ATC. The Panel on Human Factors in Air Traffic
Control Automation was formed at the request of the Federal Aviation Administration
(FAA) to study the air traffic control system, the national airspace system, and future
automation alternatives from a human factors perspective (Wickens et al., 1998). The
Panel’s deliberations, in particular, highlighted the role of reliability of automation and
human recovery in the future ATC environment, characterised by higher levels of
automation, complexity, and traffic density. Similarly, the EUROCONTROL project on
Solutions for Human Automation Partnership in European Air Traffic Management
(SHAPE) dedicated one part to the analysis of human recovery from equipment failures
in the automated ATC environment. The findings highlighted the importance of context
within which a failure occurs as well as recovery training and procedures designed to
aid recovery (EUROCONTROL, 2004e).
Overall, existing research has shown that there is a need to understand the
mechanisms behind failure and recovery in ATC. This applies both to the technical and
human perspectives as both are essential to ensuring the highest level of safety. In
order to develop a heuristic method to address these issues, it is necessary to define
the major research objectives. These are presented below.
2 Personal correspondence with EUROCONTROL G2G project team.
1.2 Research objectives
The need for an in-depth analysis of ATC equipment failures and the associated
controller recovery processes is presented briefly above and is discussed in more
detail in the remainder of the thesis. Based on the background to the problem
presented above, four research objectives have been formulated:
• Provide a systematic literature review to connect the disparate but related topics of
ATC equipment failures and controller recovery, previously lacking in the area of
ATC;
• Identify potential equipment failure types and their characteristics;
• Identify contextual factors that affect controller recovery performance and derive
a methodology to quantitatively assess recovery context; and
• Propose a framework for the analysis of controller recovery. This framework
should be further verified with a specific reference to a particular equipment
failure type.
1.3 Outline of the thesis
This thesis is organised as follows. Chapter 2 discusses the architecture of the Air
Traffic Management (ATM) system with specific attention paid to its Air Traffic Control
(ATC) component, to portray the context of the research presented in this thesis. The
ATC architecture is presented in terms of nine functionalities and the corresponding
physical architecture (equipment). In other words, it specifies nine ATC functionalities
and equipment that supports each of them. Chapter 3 presents a preliminary
assessment of the equipment failures in ATC based on the sample of operational
failure reports available in this research. It provides definitions of equipment failure,
hazards, and built-in technical defences to be used in the research on recovery from
equipment failures in ATC. The Chapter continues by assessing how representative
the sample is of equipment failures occurring in the operational ATC environment. This is
achieved through a methodology that determines how much ATC equipment contributes
to the safety of the overall air transport system.
Having confirmed that the operational failure reports available in this thesis are
representative of the equipment failure types experienced operationally, Chapter 4
provides a good understanding of equipment failures and their impact on the ATM and
ATC operations. It discusses the main equipment failure characteristics extracted from
available operational failure reports and past research. Assessed characteristics range
from the ATC functionality affected to the impact of equipment failure on ATC and ATM
operations. The Chapter concludes with the development of a novel tool for the
assessment of the overall impact of an equipment failure on ATC operations, known as
the qualitative equipment failure impact assessment tool.
Having established the framework for the assessment of equipment failures in
Chapters 3 and 4, Chapter 5 addresses the human factors aspects of relevance to
controller recovery performance in the event of an equipment failure. It discusses past
research on human reliability transferable to controller recovery performance. The
Chapter presents the initial theoretical findings on the recovery process, including the
relevance of the recovery context, past experience, recovery procedures, and recovery
training. It concludes by defining the potential variables that enable the assessment
and understanding of controller recovery performance.
The theoretical findings from Chapter 5 are further informed by the operational
experience extracted from the questionnaire survey results presented in Chapter 6.
This survey informed both the technical and human aspects of the research into
recovery from ATC equipment failures.
Having acknowledged the importance of recovery context both from past research
(Chapter 5) and operational experience (Chapter 6), this thesis continues by setting the
scene for the qualitative and quantitative assessment of the recovery context. Chapter
7 reviews past ATC and non-ATC research to extract the relevant factors important for
the definition of the context surrounding an ATC equipment failure occurrence. As a
result, this Chapter concludes with a set of 20 candidate Recovery Influencing Factors
(RIFs). Chapter 8 reviews relevant past research to further exploit the findings from
Chapter 7. It continues by defining the methodology for the quantitative assessment of
the recovery context and definition of the recovery context indicator.
To further verify the methodology proposed in Chapter 8, Chapter 9 presents the
design of an experiment carried out at a particular ATC Centre that involved exposing
30 operational controllers to an unexpected but complex equipment failure. This
particular equipment failure was carefully selected from several failure types based on
the findings in Chapters 4, 5, and 6. The analyses of the data collected on recovery
performance from this experiment are presented in Chapter 10. These analyses are
based on a set of variables that enable investigation of controller recovery as proposed
in Chapter 5. The thesis ends with Chapter 11 drawing together the conclusions
achieved throughout this research together with suggested areas for further research.
Figure 1-1 crystallises the overall structure of this thesis.
Figure 1-1 Overview of the thesis
Chapter 2 Fundamental of ATM and ATC
2 Fundamentals of Air Traffic Management and Control
The main objective of the research presented in this thesis is to investigate the
recovery process adopted by air traffic controllers in the event of Air Traffic Control
(ATC) equipment failures. A desirable objective of the research in this thesis is a
framework for analysing controller recovery that is transferable in time (i.e. applicable to both
current and future ATC Centres). This Chapter contributes to this objective in several ways. Firstly, it
defines the environment for the investigation of equipment failures, i.e. Air Traffic
Management (ATM) and its component ATC. Secondly, it discusses the ATC system
architecture including its specific functional elements. The Chapter proposes a unique
classification of equipment failures based on these functional elements that enables the
capture of all operational components of ATC. This classification is further built upon in
the remainder of the thesis (Chapter 4) to create a qualitative equipment failure impact
assessment tool. Thirdly, the Chapter reviews the characteristics of a generic ATC
Centre with regard to current and future technologies. The potential characteristics of
future ATC Centres are discussed with an emphasis on challenges that face human
operators (i.e. air traffic controllers) due to increasing levels of automation. The
Chapter concludes with discussions on the potential sources of technical and controller
performance deficiencies within future ATC Centres and their relevance to the recovery
process.
2.1 Air Traffic Management
The major components of the air transport system are aircraft, airline operations, ATM,
airport operations, and the operational environment in which these components exist
and interact (Figure 2-1). The objective of ATM is “to enable aircraft operators to meet
their planned times of departure and arrival and adhere to their preferred flight profiles
with minimum constraints, without compromising agreed levels of safety”
(EUROCONTROL, 2006a).
Figure 2-1 Air transport system (from Subotic et al., 2005)
An ATM system comprises two functionally integrated elements, namely airborne ATM
and ground-based ATM. The airborne ATM consists of several systems integrated into
the aircraft cockpit, such as the airborne Communication/Navigation/Surveillance
(CNS) system, the Flight Management System (FMS), and the Airborne Collision
Avoidance System (ACAS) also known as the Traffic Alert and Collision Avoidance
System (TCAS). The components of ground-based ATM (Figure 2-1) are Airspace
Management (ASM), Air Traffic Service (ATS), and Air Traffic Flow Management
(ATFM) (ICAO, 2001a).
Airspace Management (ASM) concerns the structure and organisation of the national
airspace, and is organised at strategic (i.e. national ASM policy, planning, and coordination),
pre-tactical (i.e. daily management and temporary allocation of airspace), and tactical
levels (i.e. real-time activation, deactivation, reallocation of airspace, and civil/military
coordination). Air Traffic Service (ATS) is a generic term that combines various
services: the Air traffic services Reporting Office (ARO), the Air Traffic Control service
(ATC), and the Flight Information and alerting Service (FIS) (ICAO, 2001a). The ARO is
a unit established for the purpose of receiving reports concerning air traffic services
and flight plans submitted before flight departure. The ATC component of ATS provides
control of all air traffic in a dedicated airspace. This is discussed in detail in section 2.2
given its importance to the research presented in this thesis. The Flight Information and
alerting Service (FIS) gives advice and information useful for the safe and efficient
conduct of flights. The alerting service provides search and rescue assistance to
aircraft in distress and coordinates any action that may be required. Finally, Air Traffic
Flow Management (ATFM) is a service established to ensure that ATC capacity is
utilised to the maximum extent possible, and that the traffic volumes are compatible
with the capacities declared by the appropriate authority. Optimal flow of traffic is
achieved by continuously balancing the traffic demand and the ability of ATC to
accommodate that demand.
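The demand and capacity balancing described above can be illustrated with a minimal sketch that compares the traffic demand in each period against a declared capacity and reports any excess. The function name, the hourly periods, and all figures are hypothetical and serve only to illustrate the principle; they do not describe any operational ATFM tool.

```python
from typing import Dict


def overloaded_periods(demand: Dict[str, int], declared_capacity: int) -> Dict[str, int]:
    """Return the excess demand for each period in which demand exceeds capacity."""
    return {
        period: flights - declared_capacity
        for period, flights in demand.items()
        if flights > declared_capacity
    }


if __name__ == "__main__":
    # Hypothetical hourly demand (flights per hour) for a single sector.
    hourly_demand = {"08:00": 38, "09:00": 47, "10:00": 41}
    # Hypothetical declared capacity of 42 flights per hour.
    print(overloaded_periods(hourly_demand, declared_capacity=42))
```

Periods reported by such a check are those in which a flow management measure would be needed to bring demand back within the declared capacity.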
2.2 Air Traffic Control
The research presented in this thesis is focused specifically on controller recovery from
equipment failures in Air Traffic Control (ATC). Therefore, this section focuses on the
main characteristics of ATC and the different services provided. Modern ATC services
are provided from ATC Centres by controllers and supporting staff (engineers,
managers, and administrators), working together to achieve the same objective. The
primary objective of an ATC service is to provide a safe flow of traffic both in the air and
on the ground (EUROCONTROL, 1999). In other words, the primary function is to
prevent collision between aircraft in the air as well as collision between aircraft and any
obstacles on the manoeuvring area, by providing and maintaining the required lateral
and vertical separations. The secondary functions of an ATC service include ensuring
orderly and expeditious traffic flow by providing traffic advisories, such as weather
information and navigation directions (i.e. vectors). To achieve these functions, the
service is divided into sections that provide an ATC service to aircraft depending on the
segment of the flight profile, i.e. phase of flight (Figure 2-2). According to the
International Civil Aviation Organisation (ICAO)¹, ATC provides area, approach, and
aerodrome control services. These are discussed in the following sections.
Figure 2-2 Flight profile (adapted from ICAO, 2001b)
1 ICAO is the specialised agency of the United Nations concerned with the development of air
navigation and regulation of international air transport.
2.2.1 Area control service
The area control service is provided from an Area Control Centre (ACC), as defined by
ICAO. In the US, such a Centre is referred to as an Air Route Traffic Control Centre
(ARTCC) as defined by the US Federal Aviation Administration (FAA). The controllers
at ACCs provide instructions, clearances, and advice regarding flight conditions during
the cruise phase of the flight (see Figure 2-2). The controllers provide separation
between aircraft operating in the complex network of airways (predetermined air
routes). The controllers use radar to monitor the progress of flights and intervene when
the route or flight level of an aircraft brings it into conflict with another. This is achieved
through tactical air traffic control interventions such as heading or track change, flight
level change, speed control, or alteration of flight routes. In areas where it is impossible
to provide a radar service (i.e. oceanic airspace and other regions without radar
coverage), the controllers employ procedural (i.e. non-radar) control to ensure that
adequate separation exists between aircraft. Procedural control employs greater
separation standards because of the absence of direct radar surveillance (Nolan, 1998;
EUROCONTROL, 1999).
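The difference between radar and procedural control can be illustrated with a minimal sketch of a separation check, in which a pair of aircraft is considered separated when either the lateral or the vertical minimum is met. The class name, function name, and the numerical minima used below are assumptions chosen purely for illustration; actual separation standards depend on the airspace, the surveillance means, and the applicable regulations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeparationMinima:
    lateral_nm: float   # minimum lateral distance, nautical miles
    vertical_ft: float  # minimum vertical distance, feet


# Hypothetical values for illustration only; real minima vary by airspace.
RADAR = SeparationMinima(lateral_nm=5.0, vertical_ft=1000.0)
PROCEDURAL = SeparationMinima(lateral_nm=50.0, vertical_ft=1000.0)


def is_separated(lateral_nm: float, vertical_ft: float,
                 minima: SeparationMinima) -> bool:
    """A pair is separated if EITHER the lateral or the vertical minimum is met."""
    return lateral_nm >= minima.lateral_nm or vertical_ft >= minima.vertical_ft


if __name__ == "__main__":
    # Two aircraft at the same flight level, 8 NM apart.
    print(is_separated(8.0, 0.0, RADAR))       # separated under the assumed radar minima
    print(is_separated(8.0, 0.0, PROCEDURAL))  # not separated under the assumed procedural minima
```

The same pair of aircraft can therefore be acceptably separated under radar control yet require intervention under procedural control, which is one reason why the loss of radar surveillance reduces the traffic that a sector can handle.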
An ACC is usually sub-divided into controlled airspace sectors² that have responsibility
for specific portions of airspace. This is a direct result of the large volumes of air traffic
that utilise the airspace in the cruise phase of the flight. The greater airspace is
sectorised into smaller, more manageable parts in an effort to prevent controller
overload (i.e. when the traffic in a sector exceeds available airspace capacity or a
controller is unable to safely control existing levels of air traffic).
Generally, each ATC sector is manned by an executive controller and a planning controller,
each with clearly defined roles and responsibilities (EUROCONTROL, 1999). In the
case of high traffic complexity, the two sector controllers are supported by a third person,
i.e. an assistant or a flight data controller. The executive controller is responsible for the
correct identification of traffic within the sector’s area of responsibility and for the
control of all aircraft to ensure a safe, orderly, and expeditious flow of air traffic.
Additionally, the executive controller is required to assist pilots by providing required
navigation assistance and to assist aircraft in any emergency situation. The planning
controller assists the executive controller to the fullest extent by identifying traffic in
2 Airspace is organised into adjacent portions, the so-called sectors, controlled by two or three
controllers, namely executive or tactical controller, planning controller, and assistant or flight data controller.
Chapter 2 Fundamental of ATM and ATC
12
potential conflict, managing flight progress strips, and planning the flow of traffic within
the sector. In addition, the planning controller has to assure that traffic enters and
leaves the sector at flight levels and exit points as agreed with the adjacent sectors
(EUROCONTROL, 1999). The assistant or flight data controller ensures that the strip
printer functions properly. In addition, the assistant accepts and processes all received
messages in a timely manner and passes them to the appropriate position, manually
inputting any tracks for which flight progress strips have not been produced.
The controllers operating in the sectors within an ACC work in close
cooperation and negotiate with each other on behalf of aircraft to optimise efficiency and
ensure safety. The area controller’s responsibility terminates when an aircraft is handed
over to an adjacent ACC or to an approach control office.
2.2.2 Approach control service
The approach control service is provided from the APProach control office or room
(APP), as defined by ICAO, or Terminal Radar Approach CONtrol (TRACON), as
defined by the FAA. According to ICAO (2001a), the approach control unit is
established to provide air traffic control service to controlled flights arriving at, or
departing from, one or more airports. This service is closely associated with the
characteristics of the airports. The radar controllers in the approach control office
provide separation between aircraft in descent during the arrival phase, and, during the
departure phase, between aircraft climbing to their assigned cruise or intermediate
assigned levels (see Figure 2-2). Therefore, the approach controllers are responsible
for providing a safe and expeditious service to departing aircraft in the initial phase of
flight and to arriving aircraft in the descent and final phases of flight (Nolan, 1998;
EUROCONTROL, 1999). The approach controller’s responsibility terminates when a
departing aircraft is handed over to an ACC or when an arriving aircraft has landed. Note
that APP is responsible for monitoring approaching aircraft, even after they are
transferred to the aerodrome control tower, until they land.
2.2.3 Aerodrome control service
The aerodrome control service is provided from the Aerodrome Control Tower (TWR),
as defined by ICAO, or Air Traffic Control Tower (ATCT), as defined by the FAA. The
aerodrome controllers are responsible for the safe and efficient conduct of flights during
the take-off and landing phases. These controllers direct airport traffic so that it flows
smoothly and expeditiously. Working closely with the approach controller, they ensure
safety of airport operations by restricting traffic movements so that only one aircraft
may land or take-off at a time (Nolan, 1998; EUROCONTROL, 1999). In airports that
use multi-runway operations, the aerodrome controller may be responsible for all
runway operations. Otherwise, the responsibility for multi-runway operations may be
divided between a number of controllers. For example, a parallel runway configuration,
where one runway is dedicated to departures and the other to arrivals, requires
separate departure and arrival controllers. In this case, close cooperation between the
two controllers is essential to ensure a safe operation.
The aerodrome controller is responsible for all traffic operating in the designated area
of responsibility of the control tower. This includes aerodrome circuit traffic, aircraft
landing and taking off, and aircraft and vehicles operating on the manoeuvring areas
(ICAO, 2001a). When good visibility conditions prevail (i.e. visual meteorological
conditions or VMC), the controller may separate the traffic by visual means and a
reduction in standard separation is permissible. When poor visibility conditions prevail
(i.e. instrument meteorological conditions or IMC) the aerodrome controller works in
close cooperation with the approach controller. In such conditions, prescribed
separation standards must be applied between aircraft in the air.
The surface movement control or ground control (in the US) is a supplementary service
to the aerodrome control service. In less busy airports the aerodrome and surface
movement control functions can be combined and provided by the aerodrome
controller. Otherwise, the surface controller is responsible for issuing taxi clearance
which will take all aircraft to the departure end of the runway (Nolan, 1998;
EUROCONTROL, 1999). In addition, the surface controller is responsible for the
movements of all aircraft and vehicular traffic on the manoeuvring areas of the airport.
ICAO (2001a) defines the manoeuvring areas as any part of the airport used for the
takeoff, landing, and taxiing of aircraft, excluding aprons. Surface movement control is
usually undertaken by visual means. However, in conditions of poor visibility the
controller relies upon surface movement radar (SMR). Working in close cooperation
with the aerodrome controller, the surface controller ensures that all active runways are
free from vehicular activity during aircraft movements.
2.3 Overall Air Traffic Control system architecture
The preceding paragraphs have highlighted the complexity of the ATM system and its
further decomposition down to the ATC system. Additionally, Figure 2-3 presents ATC
as a system comprised of people, equipment, and procedures integrated in an optimal
way to achieve a common objective. In order to understand how these components
come together, a more detailed explanation of the ATC architecture and its basic
functionalities is given below. In line with the objectives of the research presented in
this thesis, this section provides a deeper understanding of ATC functionalities and the
types of ATC equipment that can fail, and therefore affect controller recovery.
[Figure 2-3 diagram blocks: ATM; Airspace Management (ASM); Air Traffic Flow Management (ATFM); Airborne ATM (e.g. airborne CNS, FMC, ACAS/TCAS); Ground-based ATM; Air Traffic Services (ATS); Air Traffic Control (ATC); Air traffic services Reporting Office (ARO); Flight Information Service (FIS). ATC components: People (controllers, engineers, management); Equipment (HMI, hardware, software); Procedures and training (operational procedures, engineering procedures).]
Figure 2-3 ATM and ATC system components (adapted from ICAO, 2001a)
The functional architecture of any system presents a high level decomposition of the
overall system into a logical set of functional blocks. Each block may be further
decomposed into a series of sub-functions. The ATC functionalities and their related
sub-functions, as presented in this thesis, include all those of the current ATM/ATC
system as well as those under development for inclusion in the future (i.e. with 2020 taken
as the target year in this thesis in line with the European Commission’s ‘Vision 2020’;
European Commission, 2001).
The starting point for the development of the ATC functional classification in this thesis
is the EUROCONTROL Harmonisation of European Incident Definition Initiative for
ATM (HEIDI) taxonomy. The HEIDI taxonomy identifies six different ATC functionalities and
the related ATC equipment that supports each of them. The functionalities listed in HEIDI
are: communication, surveillance, navigation, data processing and distribution, support
information functionality and power supply (EUROCONTROL, 2001e). This taxonomy
is subsequently expanded in this thesis by taking into account the needs for both the
classification and characteristics of the information derived from operational failure
reports processed. The analysis of operational failure reports highlighted the need for
nine ATC functional blocks. The next set of layers dissects each ATC functional block
into relevant sub-functions which are then dissected further to the elemental level. This
approach enables the capture of all operational components of ATC. The resulting nine
ATC functional blocks, as defined in this thesis, are:
• Communication;
• Navigation;
• Surveillance;
• Data processing and distribution;
• Supporting;
• Safety nets;
• Power supply;
• Pointing and data input; and
• System monitoring and control.
Additionally, this classification is further built upon in Chapter 4. The following
paragraphs give a detailed description of each functionality and the corresponding
physical components (i.e. hardware components that support each function).
2.3.1 Air Traffic Control functionalities
2.3.1.1 Communication function
The scope of the communication function covers the distribution of information to air- and
ground-based ATC system components in the form of voice, data, or both. This is
achieved using various communication methods. Currently, radio telephony (RT)
enables voice transfer of information via high frequencies (HF), very high frequencies
(VHF), and ultra-high frequencies (UHF). Controller-pilot data link communication
(CPDLC), as a concept currently used in Australasia and the Pacific, assumes transfer
of data based on high frequency data link (HF DL), very high frequency data link (VDL),
and satellite communication (SATCOM). In general, the communication function
provides connectivity and information transfer between users and providers that are
both internal and external to a particular ATC Centre. This function is supported by
various components (Figure 2-4) which are discussed in the following paragraphs. The
section concludes with a discussion of the future communication systems and the
concept of Required Communication Performance (RCP).
Figure 2-4 Communication function
Firstly, the communication function is supported by a Voice Switching Communication
System (VSCS) presented on Controller Working Positions (CWPs) via the VSCS
panel. This is a computer-controlled switching system that facilitates both the air-to-
ground (A/G) and ground-ground (G/G) communication necessary for ATC operations
(FAA, 1998). Controllers are able to use the VSCS for A/G communication by
accessing A/G transmitters and receivers through which they communicate with pilots
via HF, VHF, or UHF. The VSCS also ensures that incoming A/G communications from
pilots are routed to the appropriate control position. Controllers are able to use the
VSCS for G/G communication via intercom, interphone, and external circuits. Intercom
enables controllers to access other control positions or ancillary positions located within
the operational room. Interphone enables controllers to access positions located within
another ATC/ATM facility. Finally, external circuits of VSCS enable controllers to
access the public telephone network (FAA, 1998).
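The three ground-ground paths described above can be summarised in a minimal sketch that selects a circuit type from the location of the called party. The function name, the use of facility identifiers, and the convention that `None` denotes a party outside the ATC/ATM network are hypothetical and serve only to illustrate the distinction; for simplicity, any position in the same facility is treated as reachable by intercom.

```python
from enum import Enum
from typing import Optional


class Circuit(Enum):
    INTERCOM = "intercom"      # positions within the same operational room
    INTERPHONE = "interphone"  # positions in another ATC/ATM facility
    EXTERNAL = "external"      # public telephone network


def select_gg_circuit(own_facility: str, dest_facility: Optional[str]) -> Circuit:
    """Pick the ground-ground circuit type for a call.

    dest_facility is None for a party outside the ATC/ATM network
    (a convention assumed only for this sketch).
    """
    if dest_facility is None:
        return Circuit.EXTERNAL
    if dest_facility == own_facility:
        return Circuit.INTERCOM
    return Circuit.INTERPHONE
```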
Secondly, data is exchanged with adjacent ATC Centres via the Aeronautical Fixed
Telecommunication Network (AFTN), On-line Data Exchange (OLDI) automated
protocols, and ICAO data interchange network, using both public and private telephone
networks. AFTN, administered by ICAO, is the means by which all information
concerning national and international air operations is exchanged. The data consists
of messages on aircraft movements, conditions of airports, weather, and other
information related to ATC. OLDI refers to operational use of connections between
various Flight Data Processing Systems (FDPS) at different Area Control Centres
(ACCs). Public and private telephone networks are used to communicate data on
individual flights between ATC Centres along the route of the flight. The data that is
exchanged includes flight level information, airspace boundary estimates of flights, and
other conditions that may be agreed between ATC Centres. This category incorporates
both systems for data exchange and any supporting equipment (e.g. AFTN printer,
console).
Thirdly, the Aeronautical Information System (AIS) provides information of a permanent
or semi-permanent nature on subjects such as geographical description of airspace, in-
flight procedures, sector procedures, communications data, surveillance data, and
specific airport characteristics data, either verbally or via datalink. In addition, local ATC
units provide a dynamic broadcast of relevant information to arriving and departing
pilots in the vicinity of the airport, known as the Automatic Terminal Information Service
(ATIS). This service uses local weather data (from the meteorological office) and AIS
data (e.g. runway and taxiway conditions, navigational aids status).
Fourthly, backup radio and telephone systems must be provided. A backup
system may provide identical functionality if it is a duplicate of the VSCS. However,
in some cases, redundancy can be provided by similar but not identical systems which
cannot offer identical functionality. In these cases it is essential that controllers are
aware of these differences. Backup communication systems must be capable of
providing continuity of communication during outages (complete loss of the
communications at the level of an ATC Centre), as voice communication continues to
be the primary means of communicating ATC instructions to aircraft.
Finally, several other physical components are listed which have a role in providing the
overall communications function. These include but are not limited to pagers, headsets,
handsets, microphones, processors, press-to-talk buttons (PTT), buzzers, cables, and
footswitches.
The previous discussion has focused on current systems that support the
communication function. Current communication methods are mostly based on
analogue voice communication, which poses various limitations on users (e.g. limited
coverage, accessibility, capability, integrity, and security). Moreover, the combination of
these limitations with current Radio Telephony (RT) procedures is linked to excessive
levels of controller workload (see Figure 21 in EUROCONTROL, 2004g). As a result,
future development of air navigation for civil aviation aims toward enhanced
communication links between aircraft and controllers. This was an important element of
ICAO’s Future Air Navigation Systems (FANS) concept (ICAO, 2007). With respect to
communication, a major development has been the advent of the Required
Communications Performance (RCP) concept. This concept characterises the
performance requirements for communications with no specific reference to
technology. Hence, the concept allows various technologies to be evaluated in terms of
communication process time (i.e. delay), integrity, availability, and continuity of function
(NASA, 2000). Until 2015, it is anticipated that the voice communication function will be
supported by a very high frequency data link (VDL) in addition to existing analogue
voice channels. In general, voice communication will be used for real-time, time-critical,
and non-routine messages (e.g. radar vectoring to avoid traffic). All other, more routine
communications will be served via data communication supported by VDL and satellite
communication (SATCOM) (NASA, 2000). The use of enhanced modes of data link will
enable several advanced features. Firstly, it will bring automatic data entry capabilities
while reducing time spent on manual data entry and potential for data entry errors.
Secondly, it will permit a significant reduction in transmission time and thus reduce RT
frequency congestion. Finally, it will eliminate misunderstandings as a result of
broadcasting problems and language issues. As a result, communication in the 2020
time frame is expected to be characterised by a mix of analogue voice and digital
communication with increased use of datalink to complement or replace existing
analogue voice communications.
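The anticipated division of labour between voice and data link described above can be expressed as a simple rule. The sketch below is illustrative only; the `Message` fields and the function name are assumptions introduced for the example and do not correspond to any operational system.

```python
from dataclasses import dataclass


@dataclass
class Message:
    text: str
    time_critical: bool  # must be delivered and acted upon immediately
    routine: bool        # standard, predictable exchange


def select_medium(msg: Message) -> str:
    """Voice for time-critical or non-routine messages, data link otherwise."""
    if msg.time_critical or not msg.routine:
        return "voice"
    return "datalink"
```

Under this rule, a radar vector issued to avoid traffic would be passed by voice, whereas a routine frequency change could be served by data link.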
2.3.1.2 Navigation function
The main objective of the navigation function within air traffic control (ATC) is to provide
aircraft with the means to navigate between the point of departure and the point of
arrival, i.e. to accurately and reliably determine their position during all phases of flight.
The quality of required navigational information (e.g. accuracy and integrity of aircraft
position) differs based upon the phase of flight. For example, the requirements in the
landing phase of the flight are the most stringent due to proximity to the ground and
high speed of aircraft, leaving the pilot little time to take corrective action. The navigation
function block, as shown in Figure 2-5, focuses on three components, namely
approach and landing navigation systems, area navigation systems, and systems for
control and monitoring of ground-based airport facilities. These are explained in the
following sections, concluding with a discussion of the concept of Required Navigation
Performance (RNP).
Figure 2-5 Navigational function
2.3.1.2.1 Approach and landing navigation
This category within the navigation function consists of the systems that provide
precise guidance to an aircraft approaching a runway. The most widespread approach
aid is the Instrument Landing System (ILS) used for the most critical phases of the
flight, i.e. approach and landing. This system provides the pilot with both runway
centreline azimuth guidance (provided by an ILS localiser) and descent rate guidance
(provided by the ILS glide slope) along the approach path of an aircraft. It allows pilots to
conduct the final approach and land safely even in conditions of poor visibility.
Previously, a Microwave Landing System (MLS) was supported by ICAO in areas
where it offered operational and economic advantages (e.g. increased runway
throughput/capacity). However, in this domain much more emphasis is now put on
evaluation of satellite navigation techniques and the necessary augmentations to
support precision landing with the long term objective of replacing the ILS system
(Aviation International News, 2001).
2.3.1.2.2 Area navigation
aRea NAVigation (RNAV) is a method of navigation that enables aircraft to fly any
chosen direct course within a network of navigation beacons, rather than navigating
directly to and from the individual beacons (EUROCONTROL, 2003h). Navigation
systems which provide RNAV capability include VHF Omni-directional Range/ Distance
Measuring Equipment (VOR/DME), DME/DME, Non-Directional Beacon (NDB), self-
contained Inertial Navigation Systems (INS), and Global Positioning System (GPS).
Currently, area navigation is primarily supported by ground-based systems. Most
widespread is the VOR which provides a radial or bearing on which aircraft fly from one
VOR station to another (EUROCONTROL, 2003g). This aid is usually combined with
DME providing information on the distance of the aircraft from the VOR/DME beacon.
Therefore, any aircraft utilising this facility can determine its position in terms of
bearing and distance relative to the location of the VOR station. The VOR/DME
combination represents the primary ground-based aid for area navigation. Generally,
the maximum range of VOR stations is in the region of 250 NM due to the line-of-sight
nature of VHF signals and the curvature of the Earth (EUROCONTROL, 2003g). Each
air navigational service provider publishes the effective range of their VOR stations.
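To illustrate how a bearing and a distance together fix a position, the following minimal sketch converts a VOR radial and a DME distance into an offset from the station. It rests on simplifying assumptions that are not part of the source: a flat local frame, a single north reference (magnetic variation is ignored), and the DME distance treated as ground distance rather than slant range.

```python
import math
from typing import Tuple


def vor_dme_fix(radial_deg: float, distance_nm: float) -> Tuple[float, float]:
    """Aircraft offset (east_nm, north_nm) from the VOR/DME station.

    radial_deg:  bearing FROM the station to the aircraft, in degrees
                 clockwise from north.
    distance_nm: DME distance, treated here as ground distance
                 (slant range is ignored in this sketch).
    """
    theta = math.radians(radial_deg)
    east = distance_nm * math.sin(theta)
    north = distance_nm * math.cos(theta)
    return east, north
```

For example, an aircraft on the 090 radial at 10 NM lies 10 NM due east of the station in this simplified frame.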
Another system that uses a radio beacon is the NDB. It consists of two components: the
Automatic Direction Finder (ADF), which is the airborne component, and the
NDB transmitting unit, which is the ground component. The NDB broadcasts
continuously on a specific frequency. The ADF on the aircraft detects the bearing to
or from the NDB, allowing the aircraft to determine its position relative to the beacon. An
NDB bearing is a line passing through the station that points in a specific direction (e.g.
270 degrees, i.e. due west). This system may also be coupled with a DME. Although widely
used in the approach environment, it is less accurate and less reliable than VOR/DME
since it is susceptible to interference from thunderstorms and other atmospheric
phenomena. The power output determines the maximum range of an NDB, but
generally NDBs are usable in the range of 50-100 NM (EUROCONTROL, 2003g).
An INS is a completely self-contained navigational system located on board the aircraft
and independent of ground-based navigation aids. The basic INS consists of three
mutually orthogonal gyroscopes, three mutually orthogonal accelerometers, a
navigation computer, and a clock (EUROCONTROL, 2003g). Gyroscopes are
instruments that provide the orientation of an object (e.g. the aircraft’s angles of roll, pitch,
and yaw). Accelerometers sense acceleration (i.e. the rate of change of velocity) along a given
axis. The orthogonal accelerometer configuration provides three orthogonal
acceleration components. Combining the gyroscope orientation information with
the summed accelerometer outputs yields the total acceleration in three-dimensional
space. A navigation computer then integrates the total acceleration over time to obtain the
aircraft's velocity vector. This velocity vector is further time integrated, yielding the
position vector of the aircraft. These steps are continuously iterated throughout the
duration of the flight. Based on all of the data, the INS system determines the aircraft’s
position relative to a known point of departure (i.e. latitude and longitude coordinates of
the departure gate).
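The double integration described above can be sketched numerically as follows. This is a minimal illustration and not the algorithm of any particular INS: it assumes that the acceleration samples have already been rotated into the navigation frame using the gyroscope orientation and compensated for gravity, and it uses a simple fixed-step summation in place of the more careful integration schemes used in practice.

```python
def integrate_ins(accels, dt, v0=(0.0, 0.0, 0.0), p0=(0.0, 0.0, 0.0)):
    """Dead-reckon velocity and position from acceleration samples.

    accels: iterable of (ax, ay, az) in m/s^2, assumed to be expressed in
            the navigation frame and gravity-compensated (both steps are
            omitted from this sketch).
    dt:     sample interval in seconds.
    v0, p0: initial velocity (m/s) and position (m), e.g. at the known
            point of departure.
    Returns the final (velocity, position) as two 3-tuples.
    """
    vx, vy, vz = v0
    px, py, pz = p0
    for ax, ay, az in accels:
        # First integration: acceleration -> velocity
        vx += ax * dt
        vy += ay * dt
        vz += az * dt
        # Second integration: velocity -> position
        px += vx * dt
        py += vy * dt
        pz += vz * dt
    return (vx, vy, vz), (px, py, pz)
```

Because each position estimate is built on all previous samples, any sensor error accumulates over the flight, which is why the position is always relative to the known starting point.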
In recent years, Global Navigation Satellite Systems (GNSS) have been slowly
introduced where appropriate and cost effective. Two GNSS systems are currently in
operation: the United States GPS and the Russian Federation’s GLObal NAvigation
Satellite System (GLONASS)³. A third, the European Galileo system, is scheduled to
become operational in 2010. Each of the GNSS systems uses a constellation of
orbiting satellites working in conjunction with a network of ground stations. The GPS
system is available for civil use based on 24 operational satellites. Two distinctive GPS
services are available, namely the Standard Positioning Service (SPS) and the more
accurate Precise Positioning Service (PPS). The SPS is available to the civil users
worldwide without charge or restriction, while the PPS is available primarily to the
military. The SPS requirements are defined through the service availability standard of
more than 99% of time at an average location, with an average accuracy of 34m
horizontal and 77m vertical (95% threshold) (Department of Defence, 2001; European
Commission, 2006a). Similar standards are defined for the Galileo system, where five
distinctive navigation services will be available, namely the Open Service (OS), Safety-of-Life
service (SoL), Commercial Service (CS), Public Regulated Service (PRS), and
Search And Rescue service (SAR) (European Commission, 2006b). The SoL service is
intended primarily for aircraft navigation. Service performance requirements for SoL
with dual frequency correction are set to be 4m horizontally and 8m vertically (95%
threshold) (European Commission, 2006b).
In recent years, in addition to the concept of area navigation and its supporting systems,
a new concept referred to as Precision aRea NAVigation (PRNAV) has emerged.
PRNAV has been introduced to allow consistent terminal airspace operations in the
European region (i.e. European Civil Aviation Conference – ECAC member states).
This is based on the navigation requirement that procedures, design principles, and
aircraft capabilities should meet an accuracy of ±1 NM for at least 95% of the flight
time (EUROCONTROL, 2006b).
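A requirement of this form (an error bound to be met for at least 95% of the flight time) can be checked against recorded data as sketched below. The sketch assumes that the position errors are sampled at equal time intervals, so that the fraction of samples within the bound approximates the fraction of flight time; the function name and default values are illustrative only.

```python
def meets_accuracy(errors_nm, limit_nm=1.0, required_fraction=0.95):
    """True if |error| <= limit_nm for at least required_fraction of samples.

    errors_nm: position errors in nautical miles, assumed to be sampled at
               equal time intervals over the flight.
    """
    errors = list(errors_nm)
    if not errors:
        raise ValueError("at least one error sample is required")
    within = sum(1 for e in errors if abs(e) <= limit_nm)
    return within / len(errors) >= required_fraction
```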
3 ГЛОбальная НАвигационная Спутниковая Система (ГЛОНАСС) or Global'naya
Navigatsionnaya Sputnikovaya Sistema.
2.3.1.2.3 Systems for control and monitoring of ground-based airport facilities
In addition to all systems previously discussed, the navigation functional block also
includes systems for monitoring and control of ground-based airport facilities. Typically,
monitoring and control of ground-based airport facilities is physically provided via a
control desk with an interface panel designed to represent the airport facilities and
lighting services at a suitable scale (EUROCONTROL, 2003a). This component of the
navigation functional block supports, but is not limited to, the following elements:
navigational aids status, Aeronautical Ground Lighting (AGL) system (e.g. status of
runway, taxiway lighting panel), warning systems (e.g. runway in use), internal lighting,
meteorological equipment status, and alarming and reporting systems.
Finally, future development of air navigation for civil aviation aims toward enabling
aircraft navigation in four dimensions, seamlessly and gate-to-gate. The post-FANS
Required Navigation Performance (RNP) concept is intended to characterise airspace
through a statement of the navigation performance accuracy (RNP type) to be
achieved (Jeppesen, 2001). In addition, the RNP-RNAV concept has emerged to
overcome the lack of harmonisation between the different RNP/RNAV naming
conventions and to enable common understanding of the relationship between RNP
and RNAV system functionality (ICAO, 2006a). The enhanced navigation, landing, and
surface movement service will be predominantly provided by the satellite-based
systems including the various augmentations such as Satellite-Based Augmentation
Systems (SBAS) and Ground-Based Augmentation Systems (GBAS). Surface
movements in all weather operations will be assisted with enhanced vision systems
enabling aircraft to ‘see’ the airport surface in reduced visibility conditions. As a result,
navigation in the 2020 time frame is expected to be characterised by a mix of ground-
and satellite-based systems with increased functionality complementing or replacing
the existing ground-based systems (VOR, NDB, DME).
2.3.1.3 Surveillance function
The ATC surveillance function identifies all aircraft and presents their position on a
radar screen. Additional dynamic information on the aircraft is also provided depending
on the type of radar employed. The surveillance function block, as shown in Figure 2-6,
focuses on radars, radar and auxiliary display, and radars used predominantly for the
terminal and ground surveillance⁴. The section concludes with a discussion of the
concept of Required Surveillance Performance (RSP).
[Figure 2-6 diagram blocks: Surveillance; Primary Radar; SSR Mode A/C/S; Automatic Dependent Surveillance (ADS); Surface Movement Radar; Parallel Approach Runway Monitor; Terminal Approach Radar; Precision Approach Radar; Aerodrome Traffic Monitor; Display; Aux Display.]
Figure 2-6 Surveillance function
2.3.1.3.1 Radar systems
There are two basic types of radar. The Primary Surveillance Radar (PSR) is the
most basic form of radar which transmits a pulsed beam of ultrahigh frequency radio
waves through 360 degrees via a rotating radar head (EUROCONTROL, 1999). When
the waves reach the aircraft, some of the energy is reflected back. Every time the
aircraft reflects the transmitted energy it will be displayed on the radar screen, thus
plotting the course of the aircraft. The PSR only displays an aircraft track or course and
does not provide any other dynamic flight data. This form of radar is rarely used for
commercial aviation except in underdeveloped regions or as a backup to secondary
surveillance radar.
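The principle by which a primary radar locates a target from reflected energy can be illustrated with the standard range relation: the pulse travels to the target and back, so the range is half the round-trip distance. The sketch below is a worked example of that relation only; it ignores atmospheric effects and signal processing, and the function name is introduced for illustration.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # propagation speed of the radar pulse
METRES_PER_NM = 1852.0              # one international nautical mile


def echo_range_nm(round_trip_s: float) -> float:
    """Range to the target in NM from the pulse round-trip time in seconds.

    The pulse travels out and back, hence the division by two.
    """
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0 / METRES_PER_NM
```

The bearing of the target is obtained separately from the direction in which the rotating radar head is pointing when the echo is received.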
Secondary surveillance radar (SSR) is a more sophisticated form of radar which does
not rely on reflected radio waves. SSR transmits electromagnetic waves in the form of
pulses through 360 degrees (EUROCONTROL, 1999). These pulses are received by
4 The primary difference between enroute radars and those used in the terminal and ground
surveillance is the rate of radar information update (e.g. enroute radars update every 8s, whilst terminal radars update every 5s; EUROCONTROL, 1997).
equipment on board the aircraft known as a transponder. The radar pulses interrogate
the transponder and if the transponder recognises the pulses it will respond by
transmitting back to the radar. Recognition is achieved by a discrete four digit code
assigned by ATC. When the transponder transmits to the radar, it actually transmits
essential data about the flight such as aircraft identification (known as Mode A) and
altitude (known as Mode C). As a result, the combination of the PSR and SSR Modes
A and C or SSR alone provides a three dimensional representation of the traffic. In
addition to this information, Mode S possesses data link functionality and provides access to
the aircraft state vector (ground speed, track angle, turn rate, roll angle, climb rate,
magnetic heading, indicated air speed, Mach number) as well as aircraft intent
information or indication of the future path (UK CAA, 2004).
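The way in which Mode A and Mode C replies turn a two-dimensional radar position into a labelled, three-dimensional track can be sketched as follows. The class and function names are hypothetical. The sketch relies on two facts that are not stated in the text above: each of the four digits of a Mode A code lies in the range 0 to 7, and Mode C altitude is reported in increments of 100 ft (i.e. as a flight level).

```python
from dataclasses import dataclass


def is_valid_mode_a_code(code: str) -> bool:
    """A Mode A code has exactly four digits, each in the range 0-7."""
    return len(code) == 4 and all(c in "01234567" for c in code)


@dataclass
class Track:
    x_nm: float         # horizontal position from the radar, NM east
    y_nm: float         # horizontal position from the radar, NM north
    mode_a_code: str    # four-digit code assigned by ATC
    altitude_ft: float  # derived from the Mode C reply


def build_track(x_nm: float, y_nm: float, mode_a_code: str,
                flight_level: int) -> Track:
    """Combine the radar position with the Mode A and Mode C replies."""
    if not is_valid_mode_a_code(mode_a_code):
        raise ValueError(f"invalid Mode A code: {mode_a_code!r}")
    # Mode C reports altitude in hundreds of feet
    return Track(x_nm, y_nm, mode_a_code, flight_level * 100.0)
```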
A new surveillance initiative is directed toward the development of Automatic
Dependent Surveillance Broadcast (ADS-B) technology. This is a satellite-based
surveillance system in which the aircraft uses signals from a constellation of navigation
satellites to determine its position, altitude, velocity, and other parameters (CASA, 2006).
The data is broadcast to all possible recipients, in contrast to Automatic Dependent
Surveillance Contract (ADS-C), where only point-to-point data transfer is established. As a result, surveillance
in the 2020 time frame is expected to be characterised by a mix of airborne (ADS,
ADS-B, ADS-C) and ground-based functions with increased functionality
complementing or replacing the existing ground-based systems (PSR and SSR).
2.3.1.3.2 Radar and auxiliary display
All surveillance information is presented to controllers on the Human Machine Interface
(HMI) commonly known as air situational display or radar display. Therefore, this
component of surveillance function block includes both radar and auxiliary displays.
Auxiliary display acts as a support providing data such as flight plan data, traffic lists,
and static and dynamic aeronautical data (e.g. Notices to Airmen (NOTAMs),
meteorological messages, and airport related information).
2.3.1.3.3 Terminal and ground surveillance
The surveillance functional block also incorporates radar systems which are relevant to
terminal and ground surveillance (Figure 2-6). These are Surface Movement Radar
(SMR), Parallel Approach Runway Monitor (PARM), Terminal Approach Radar (TAR),
Precision Approach Radar (PAR), and Aerodrome Traffic Monitor (ATM).
Finally, future development of air navigation for civil aviation is focused on increased
accuracy of the aircraft position by integrating data from all available sources, such as
primary and secondary surveillance signals and Automatic Dependent Surveillance
Broadcast (ADS-B) (Mohleji, Lacher, and Ostwald, 2003). The Required Surveillance
Performance (RSP) defines the surveillance requirements according to the airspace
involved (e.g. oceanic/remote airspace vs. high density traffic airspace). In addition, the
ADS system will enable merging of communications, navigation, and surveillance
technologies. This will accelerate the movement toward Airborne Surveillance and
Separation Assurance (ASAS). In other words, the future surveillance technologies
(e.g. ADS) will enable pilots to participate actively in the process of safely separating
their flight from other flights. This will be achieved by the display of traffic information
within the cockpit, wake vortex hazard prediction and avoidance, three dimensional
terrain presentation, terrain avoidance system, and weather awareness (Ochieng,
2006). Moreover, the US FAA is developing a concept of Situational Awareness for
Safety (SAS). The SAS concept is based on the use of available data (e.g. satellite-
based position data, terrain, weather) and their exchange between all parties involved
(e.g. pilots, dispatchers, controllers). The primary objective of the SAS concept is to
create an environment promoting more efficient, safe, and free use of airspace (FAA,
1995).
2.3.1.4 Data processing and distribution function
The data processing and distribution function incorporates all systems required to
process flight related data (e.g. initial flight plan data, dynamic communication,
navigation, and surveillance flight data). These include the Flight Data Processing
System (FDPS) as well as the Radar Data Processing System (RDPS) enabling
controllers to 'see' in real-time the movement of aircraft in a dedicated airspace, as
represented on radar display. In addition, this function block also incorporates all
supporting equipment, such as strip printer (Figure 2-7).
[Figure content: Data Processing and Distribution block. Labelled elements: Flight Data
Processing System; Radar Data Processing System; Single Radar Processing; Multiple Radar
Processing; Fallback Radar Data Processing System; Fallback Flight Data Processing System;
Supporting equipment; Flight plan processing; Airspace data processing; Flight data
management & distribution; SSR management; MTCD; Trajectory prediction; MAESTRO.]
Figure 2-7 Data processing and distribution function
The FDPS handles flight plans and updates them through automatic events, manual
inputs, and triggered transitions from one state to another. Each state in the life of a
flight plan represents the condition of the flight plan at a specific time in its cycle. The
phase of the flight plan life cycle triggers certain system actions and directly affects what
actions the controller can take on the flight plan and therefore the actual flight. Through
the processing of flight progress strips (either manually or electronically), the controller
manages all traffic by interacting with flight related data (on the radar and auxiliary
display, and strip management board). The FDPS carries out the following specific
processes (EUROCONTROL, 2003a):
• initial flight plan processing which includes checking incoming flight plan
messages, creating a record of flight data, and storing it in the flight plan
database. In addition, the FDPS handles flight data throughout the ‘life’ of the
flight plan by constantly updating and distributing the flight data;
• airspace data processing and distribution which handles the complete airspace
information (e.g. airways and navigation beacons). In addition, it processes any
information on the special use of airspace to warn the controller about
infringements which require modification of flight trajectory;
• meteorological data processing and distribution;
• SSR code management which involves the assignment of SSR code to flights
and identification of all flights by SSR mode A. It also prevents assignment of
duplicate codes;
• trajectory prediction which is performed throughout the flight plan life cycle, taking
into account the initial flight plan as well as all modifications of the route;
• provision of system supported coordination and transfer of control within the ATC
Centre and between adjacent ATC Centres;
• processing of data link messages from/to the aircraft (A/G coordination);
• flight plan conflict detection which is performed inside a defined region (i.e.
sector) using flight plan data. This function is known as Medium Term Conflict
Detection (MTCD);
• workload monitoring and distribution essential for assisting the supervisor in the
adjustment of the existing sectorisation (i.e. collapse/de-collapse of sectors) and
computation of position/sector load;
• arrival sequencing which provides the approach and en-route controllers with a
proposed sequence number for each arrival flight; and
• establishment of code/callsign correlation as a mapping between radar tracks
and flight plan database.
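The SSR code management and code/callsign correlation processes listed above can be illustrated with a minimal Python sketch. The class name, the code range, and the callsigns are hypothetical and are not taken from EUROCONTROL (2003a); an operational FDPS applies code allocation rules that are not modelled here.

```python
class SsrCodePool:
    """Hypothetical pool of SSR Mode A codes (four octal digits)."""

    def __init__(self, first=0o2001, last=0o2077):
        # Codes are stored as integers and formatted as octal on output.
        self._free = list(range(first, last + 1))
        self._by_callsign = {}

    def assign(self, callsign):
        """Return the code for a flight, allocating one only if needed."""
        if callsign in self._by_callsign:
            return self._by_callsign[callsign]  # a flight never gets two codes
        if not self._free:
            raise RuntimeError("no free SSR code in this pool")
        code = format(self._free.pop(0), "04o")
        self._by_callsign[callsign] = code
        return code

    def release(self, callsign):
        """Return the code to the pool when the flight leaves the system."""
        code = self._by_callsign.pop(callsign)
        self._free.append(int(code, 8))

    def correlate(self, mode_a_code):
        """Code/callsign correlation: map a radar track's code to a flight."""
        for callsign, code in self._by_callsign.items():
            if code == mode_a_code:
                return callsign
        return None
```

Because a code is removed from the free list when it is assigned, two flights cannot hold the same code at the same time, which is the duplicate prevention property described in the list.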
A flight progress strip is a tool that controllers use to record the progress of each flight
as it moves through the sector. It represents a record of all ATC instructions given to
each aircraft. It is also used as a backup to the surveillance function in the event of a
failure. The flight strip printer facility, as an additional component in this functional
block, supports the printing of flight strips at the executive, planner, and/or flight data
assistant positions, depending on the suite configuration. This facility automates the
previous manual filling of a flight strip through access to a database of flight information
and a printout of the data when needed. The printed strip displays the non-dynamic
aspects of the flight, necessitating only tactical dynamic instructions to be manually
entered on the strip by the controller.
The RDPS processes radar pictures from all available sources (primary and secondary,
short range and long range, en-route and approach radars) to establish an accurate
picture of all traffic over a well-defined geographical area. In the case of multiple radar
coverage, the RDPS provides a composite air picture of the traffic while taking into
account radar biases for range and azimuth measurements (EUROCONTROL, 2003a).
The ATM surveillance tracker and server system (ARTAS) processes PSR, SSR, Mode
S, and ADS data. These highly accurate and reliable data are directly integrated into
the existing ATC environment by using a universal data exchange format. For example,
EUROCONTROL defined the All Purpose STructured Eurocontrol Radar Information
Exchange (ASTERIX) messaging format. This allows the transfer of information
between two parties (e.g. systems) using a mutually agreed format of data.
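To make the idea of a composite air picture more concrete, the following Python sketch removes assumed range and azimuth biases from individual radar plots and combines the corrected positions into a single estimate. It is only an illustration under simplifying assumptions (flat-earth coordinates in nautical miles, known biases, a weighted mean in place of a real tracking filter); the function names and parameters are hypothetical and do not describe the RDPS or ARTAS algorithms.

```python
import math


def plot_to_xy(radar_xy, range_nm, azimuth_deg,
               range_bias_nm=0.0, azimuth_bias_deg=0.0):
    """Convert one radar plot to x/y (NM) after removing known biases."""
    rng = range_nm - range_bias_nm
    az = math.radians(azimuth_deg - azimuth_bias_deg)
    # Azimuth is measured clockwise from north: x uses sin, y uses cos.
    return (radar_xy[0] + rng * math.sin(az),
            radar_xy[1] + rng * math.cos(az))


def composite_position(positions, weights):
    """Weighted mean of the positions reported by several radars."""
    total = sum(weights)
    x = sum(w * p[0] for p, w in zip(positions, weights)) / total
    y = sum(w * p[1] for p, w in zip(positions, weights)) / total
    return (x, y)
```

In this sketch the weights stand in for the relative quality of each radar's measurement, for example a higher weight for the radar closer to the aircraft.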
The data processing and distribution functional block also incorporates both a fallback
flight data processing system and fallback radar data processing system, as necessary
redundant systems in every ATC Centre. These fallback systems may provide identical
functionality if they are duplicates of the FDPS and RDPS systems. However, in some
cases these fallback systems do not necessarily provide the same range of functions
as the main systems. The necessity of redundant systems in ATC is discussed further
in Chapter 4.
2.3.1.5 Supporting function
The supporting function comprises various ATC tools that enable integrated air traffic
management operations that enhance safety and increase airspace capacity. The main
objective of these tools is to lessen the cognitive workload on the controller while
focusing on the relevant (task specific) information (IFATCA, 2004). They also assist in
the detection and resolution of potential problems. It is important to note that these
tools do not replace the need for controller decision making processes; they simply aid
them. The supporting function includes the following tools (Figure 2-8):
• Monitoring tools assist with detection and recording of any safety-related events
(e.g. the Automatic Safety Monitoring Tool – ASMT), reduce the workload
associated with traffic monitoring tasks by identifying the potential and actual
deviations or non-conformance with the planned flight trajectory (e.g. MONitoring
Aid – MONA), and automatically check if aircraft are adhering to their planned
route (e.g. Route Adherence Monitoring – RAM) or cleared flight level (e.g.
Cleared Level Adherence Monitoring – CLAM) by comparing ‘planned’ or
‘cleared’ information with the aircraft actual position (EUROCONTROL, 2001f);
• The Medium Term Conflict Detection (MTCD) system is a tool which enables
controllers to predict and identify future conflict between aircraft in the predefined
region by applying separation rules (EUROCONTROL, 2001f); and
• Sequencing managers (e.g. Arrival Manager - AMAN, Departure Manager -
DMAN, Means to Aid Expedition and Sequencing of Traffic with Research and
Optimisation - MAESTRO) are decision making tools for providing the approach
and en-route controllers with the control and sequencing actions to properly
expedite traffic to the destination airports and runways (EUROCONTROL, 2001f).
Figure 2-8 Supporting function
These tools aim to enhance the controller’s appreciation of the current and predicted
traffic situation and facilitate the decision making process. They are an integral part of
the HMI (i.e. radar display) and are informed by the output of the data processing and
distribution function.
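The comparison of ‘cleared’ or ‘planned’ information with the actual aircraft position, as performed by monitoring tools such as CLAM and RAM, can be sketched as follows. The function names, the tolerance values, and the flat-earth geometry are assumptions made for illustration only; they are not the thresholds or algorithms of any operational tool.

```python
import math


def level_deviation(cleared_fl, reported_fl, tolerance_fl=2):
    """CLAM-style check: True if the reported flight level differs from
    the cleared flight level by more than the tolerance."""
    return abs(reported_fl - cleared_fl) > tolerance_fl


def route_deviation(pos, seg_start, seg_end, tolerance_nm=5.0):
    """RAM-style check: True if the position is farther than the tolerance
    from the planned route segment (x/y coordinates in NM)."""
    (px, py), (ax, ay), (bx, by) = pos, seg_start, seg_end
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        t = 0.0
    else:
        # Project the position onto the segment and clamp to its end points.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    closest = (ax + t * dx, ay + t * dy)
    return math.hypot(px - closest[0], py - closest[1]) > tolerance_nm
```

For example, an aircraft cleared to flight level 350 and reported at flight level 354 would be flagged by `level_deviation(350, 354)` under the assumed tolerance of two flight levels.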
2.3.1.6 Safety Nets
A safety net (SNET) is an airborne and/or ground-based function alerting the pilot or
controller to the imminent possibility of collision between aircraft, between aircraft and
terrain/obstacles, as well as penetration of dangerous airspace (IFATCA, 2004). The
most common safety nets are Short Term Conflict Alert (STCA), Minimum Safe
Altitude Warning (MSAW), Area Proximity Warning (APW), and Runway Incursion
Monitoring and Conflict Alert System (RIMCAS).
The previous section described medium term conflict detection (MTCD) as an ATC tool
which assists the controllers in early detection and prediction of conflicts (e.g. 20
minutes in advance). Similarly, the STCA function detects two system tracks predicted
to be in conflict (i.e. two tracks where both horizontal and vertical separations are about
to be compromised). This system then alerts the controller to the imminence of a
separation minima infringement through the display of visual alarms presented on the
affected traffic on the HMI. However, whilst MTCD is for early detection and prediction
of conflicts, the STCA is used as a safety net or defence against imminent conflict
(EUROCONTROL, 2007a). The exact moment of STCA alarm depends upon
predetermined settings (usually it is set to trigger the alert between 90 seconds and two
minutes prior to conflict).
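The STCA logic described above can be approximated by a short Python sketch that extrapolates two system tracks linearly and reports the first time at which both horizontal and vertical separation are predicted to be lost. The track fields, the separation minima of 5 NM and 1,000 ft, and the two-minute look-ahead are assumptions chosen for illustration; operational STCA implementations use site-specific parameters and more elaborate filtering.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    x_nm: float       # east position
    y_nm: float       # north position
    vx_kt: float      # east ground speed component
    vy_kt: float      # north ground speed component
    alt_ft: float     # current altitude
    vrate_fpm: float  # vertical rate (positive = climb)


def predicted_conflict_time(a, b, lookahead_s=120, step_s=1,
                            h_min_nm=5.0, v_min_ft=1000.0):
    """Return the first time (s) at which both separations are predicted
    to be infringed within the look-ahead, or None if there is none."""
    for t in range(0, lookahead_s + 1, step_s):
        hours, minutes = t / 3600.0, t / 60.0
        dx = (a.x_nm + a.vx_kt * hours) - (b.x_nm + b.vx_kt * hours)
        dy = (a.y_nm + a.vy_kt * hours) - (b.y_nm + b.vy_kt * hours)
        dz = (a.alt_ft + a.vrate_fpm * minutes) - (b.alt_ft + b.vrate_fpm * minutes)
        if math.hypot(dx, dy) < h_min_nm and abs(dz) < v_min_ft:
            return t
    return None
```

An alert is raised only when both conditions hold at the same predicted time, which mirrors the requirement that horizontal and vertical separations are both about to be compromised.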
The MSAW function enables detection of a radar track predicted to infringe the
minimum safe altitude above an obstacle. MSAW processing takes into account the
track altitude (i.e. altitude of the track extracted from Mode C or present altitude
corrected for pressure at mean sea level known as QNH pressure, thus providing the
altitude above mean sea level), attitude indicator (i.e. climb or descent), position and
speed vector. In addition, the system will detect if a radar track is predicted to deviate
from the approach path of an airport (EUROCONTROL, 2007a).
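A minimal sketch of the MSAW prediction, assuming a single minimum safe altitude for the area and a linear extrapolation of the track altitude, is given below. The function name, parameters, and the 60-second look-ahead are hypothetical; the position and speed vector, which an operational system uses to select the relevant terrain or obstacle, are deliberately left out.

```python
def msaw_alert(alt_ft, vertical_rate_fpm, min_safe_alt_ft, lookahead_s=60):
    """True if the track is below, or is predicted to descend below,
    the minimum safe altitude within the look-ahead time."""
    predicted_alt_ft = alt_ft + vertical_rate_fpm * lookahead_s / 60.0
    return min(alt_ft, predicted_alt_ft) < min_safe_alt_ft
```

With these assumptions, a track at 3,500 ft descending at 1,000 ft/min over a 3,000 ft minimum safe altitude triggers the warning, whereas the same track descending at 400 ft/min does not.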
The APW is used to designate areas which are dangerous for an aircraft to enter (e.g.
missile firing, military training, and air display areas). These areas can be identified as:
prohibited, restricted, dangerous, military training, segregated, special use, temporary
restricted, and permanently restricted. The APW ensures that any aircraft infringing or
predicted to infringe on one of these areas is detected by this system and an advance
warning is presented to the controllers (EUROCONTROL, 2007a).
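The APW check can be sketched as a point-in-polygon test applied to the current and extrapolated positions of a track. The sketch below is an illustration only: the area is a hypothetical polygon in flat-earth coordinates, vertical limits and activation times of the area are ignored, and the look-ahead values are assumptions.

```python
def inside(point, polygon):
    """Ray-casting test: True if the point lies inside the polygon,
    given as a list of (x, y) vertices in NM."""
    x, y = point
    hit = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit


def apw_alert(pos_nm, velocity_kt, polygon, lookahead_s=120, step_s=10):
    """True if the track is inside the area now or is predicted to
    enter it within the look-ahead time (linear extrapolation)."""
    for t in range(0, lookahead_s + 1, step_s):
        hours = t / 3600.0
        predicted = (pos_nm[0] + velocity_kt[0] * hours,
                     pos_nm[1] + velocity_kt[1] * hours)
        if inside(predicted, polygon):
            return True
    return False
```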
RIMCAS is an airport monitoring and conflict alert system which detects and alerts
controllers before a runway incursion is about to occur. The system gives the controller
an opportunity to react within a realistic and effective timeframe. This system is also
known as the ground short term conflict alert system. The main requirement of this
system is to be supplied with reliable surveillance data as any false alert unnecessarily
increases controller workload. As a result, the Automatic Dependent Surveillance
Broadcast (ADS-B) system should enhance surveillance capability for airport
monitoring and conflict prevention through the Advanced Surface Movement Guidance
and Control Systems (A-SMGCS) (ICAO, 2005).
2.3.1.7 Power supply
The availability of electrical power is a prerequisite in a computer driven environment,
such as an ATC Centre. Electrical power is obtained from public utilities, but in case of
interruptions or non-availability, the ATC Centre's own installations are required to
provide electrical power. This is most commonly achieved by diesel-powered
generators or powerful batteries, supporting an Uninterrupted Power Supply (UPS)
capability. These components are required to provide uninterrupted electrical power
supply in order to prevent computers shutting down.
2.3.1.8 Pointing and input devices
The Human Machine Interface (HMI) represents the entire ATC system to the controller
on each Controller Working Position (CWP). In order to interact with available systems,
the controller uses input and pointing devices. Input devices include Touch Input
Panels (TIP), the mouse, and keyboard. However, the most frequent pointing devices
are the mouse and trackerball. Using the input and pointing devices, the controller
‘communicates’ with the entire ATC system, and edits and reads ‘live’ flight plans. All
the changes and interactions made by controllers via input and pointing devices are
presented on displays (i.e. radar, auxiliary display, and communication panel).
2.3.1.9 System control and monitoring function
This function is supported by a computer and monitor system that controls the overall
ATC system from a centralised position, i.e. the system control and monitoring unit.
The main purpose of this system is to display the actual state of the core systems and
subsystems within the CNS/ATM infrastructure, to manage incidents, and to perform
the reconfiguration of resources within its infrastructure. This functional block
constantly checks the functionality of the overall system, involving the software and
hardware configuration in order to ensure high system availability (EUROCONTROL,
2003a). The system monitoring and control functionality is supported by several
different facilities which are explained in the following paragraphs (Figure 2-9).
Figure 2-9 System monitoring and control function
The data recording and playback facility enables automatic recording of all transactions
made by the radar data, flight data, radar display, and communication functions. This
includes all controllers’ modifications to flight plans, received messages, and display
setting modifications (EUROCONTROL, 2003a). The recorded data are used for further
data analysis and for playback of the specific air traffic situation (i.e. in the case of an
incident). The recordings are stored on disks for the time deemed necessary by the
relevant aviation authority (the legal requirement is 30 days but could be longer if
necessary for incident investigation).
One of the most requested system control and monitoring functions is the ability to
detect faults in the supervised ATC system by continuous control and monitoring of the
system operation. This facility provides detailed information on the equipment states
within the managed systems and the relevant alarm conditions which may affect the
operating mode. It also logs events and enables the remote control of supervised
equipment and setting of the system thresholds (EUROCONTROL, 2003a). Its main
sub-functions are: fault management (i.e. alarm management, threshold setting),
configuration management (i.e. equipment descriptions), performance management
(i.e. identification of trends and problems), and security management (i.e.
authentication, identification, password protection, tailored user interface). The control
and monitoring is performed on all positions, external lines, and connections.
Each ATC system is designed to have several operational system modes
(EUROCONTROL, 2003a). These modes switch in automatically if any of the major
processing systems fails. The objective is that the controller always has some
functionality available despite the degradation of equipment. Reduced radar, alert, flight
plan, and communication modes are the most frequent types of reduced operational
modes available in current ATC systems.
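The automatic selection of a reduced operational mode can be pictured as a simple lookup from the failed processing system to the mode that switches in. The mapping below is hypothetical and serves only to illustrate the principle; the system labels (including "SNET" and "VCS" for the safety net and voice communication functions) are assumptions and do not describe any particular ATC system.

```python
# Hypothetical mapping from a failed system to the mode that switches in.
FALLBACK_MODE = {
    "RDPS": "reduced radar mode",
    "SNET": "reduced alert mode",
    "FDPS": "reduced flight plan mode",
    "VCS": "reduced communication mode",
}


def operational_modes(failed_systems):
    """Return the reduced modes active for the given set of failed
    systems, or normal mode when nothing has failed."""
    modes = [mode for system, mode in FALLBACK_MODE.items()
             if system in failed_systems]
    return modes or ["normal mode"]
```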
The time management facility uses the external time received from the GPS signal for
synchronising time on all computers (i.e. all Controller Working Positions - CWPs). The
time is expressed in Coordinated Universal Time (UTC), also known as Zulu time.
Originally, it was a time scale based on the local standard time on the 0° longitude
meridian which runs through Greenwich, United Kingdom. Today, UTC uses precise
atomic clocks and satellites to ensure a reliable and accurate time standard for air and
ground operations (ICAO, 1979).
2.4 Characteristics of the generic Air Traffic Control Centre
The preceding paragraphs presented the architecture (functional and physical) of an
Air Traffic Control (ATC) system. However, a more complete understanding of the ATC
system (i.e. people, equipment, procedures) is possible within the context of an ATC
Centre providing specific types of services. Therefore, this section reviews the main
characteristics of a ‘generic’ ATC Centre with particular focus on current technologies.
The following section focuses on technologies that will determine the characteristics of
the generic ATC Centre in the future.
There are significant variations in equipment between ATC Centres, both in Europe
and worldwide. At the European level, EUROCONTROL, the European Organisation
for the Safety of Air Navigation, has taken on the role of promoting harmonisation,
integration, and standardisation while improving the safety and overall performance of the
ATM/ATC systems in its member states. For example, EUROCONTROL (2006d) has considered
the costs of fragmentation of the European ATM system. At a global level, ICAO
standardisation activities are undertaken when new systems or technologies are
mature, have demonstrated their ability to provide safety enhancements compared to
existing systems, and are cost beneficial to international civil aviation (ICAO, 2003).
ICAO has established standards and recommended practices for all of its contracting
states (ICAO, 2006b).
In spite of the significant effort to date to standardise ATM/ATC within the aviation
community, there are still significant differences. For this reason, the methodology
adopted in this thesis for the assessment of controller recovery from equipment failures
in ATC is designed on the basis of a ‘generic’ ATC Centre. This is defined below.
The ATC Centre should be based on a fully automated and integrated system with a
fail-safe design based on duplicated processors and open architecture in accordance
with existing industrial standards. It also has to have graceful degradation modes. The
data processing functional block should be able to support acquisition and processing
of data from several radars (i.e. multiradar tracking), automatic collection and
processing of flight plans, automatic allocation of SSR codes, coordination achieved
through direct connection to adjacent centres (e.g. On-Line Data Interchange - OLDI),
coordination of civil and military flights via a separate military suite, and automatic flight
progress monitoring (continuous calculation of flight profile and update based on radar
data). The air situational picture should be presented on the HMI (radar and auxiliary
display) with necessary alert facilities (e.g. STCA, MSAW, CLAM, RAM). The playback
function of radar pictures should be available for incident investigation, testing,
development, and training.
The ATC Centre should be capable of presenting paper strips on the strip
console. A flight progress strip is a single strip of paper that contains all information on
a flight and its evolution through a particular sector of airspace. It is used as a quick
way to record the progress of the flight and to keep a legal record of the instructions
issued. It is also used to allow the planning controller to predict future conflicts and to
ensure that sector entry/exit conditions are achieved. In addition, in the case of radar
failure, flight progress strips represent the primary control tool. The strip, mounted in a
strip holder, is placed with other strips in a 'strip board' which displays all flights in a
particular sector of airspace or at an airport.
In recent years, there have been initiatives aimed at electronic strip presentation, used
in many European ATC Centres and airports. However, as Lanzi and Marti (2001) point
out, controllers do not generally find electronic strips to have the same level of flexibility
and support as paper strips. On the other hand, more radical attempts have been made
toward a stripless environment, where aircraft information is tagged to the label on the
radar screen that can be expanded as necessary. In this environment generally three
modes of the same aircraft label exist: the standard label that is always displayed on
the screen, the highlighted label that is bigger and contains more information, and the
extended label that contains all information not immediately required by the controller
(for details see Lanzi and Marti, 2001).
The previous sections have discussed the current technologies relevant to an ATC
Centre. This forms a part of the definition of a ‘generic’ ATC Centre. In addition, the
generic ATC Centre should be adaptable to changes in technologies. Hence, the
following section addresses the future of ATC and how this is likely to impact on an
ATC Centre.
2.5 The future of Air Traffic Control
The research presented in this thesis has to take into account the future challenges
that may face controllers with the increased exposure to more automated systems. In
this regard, this section briefly discusses the key challenges of automation,
characteristics of human-centred design, as well as the concept behind the ICAO’s
Future Air Navigation Systems (FANS). The section concludes with a discussion of
the potential sources of technical and human performance deficiencies within the future
ATC Centres and their relevance to the equipment failures and the recovery process as
investigated in this thesis.
2.5.1 Challenges of automation
There are various definitions of automation, residing in different contexts. In the context
of Air Traffic Management (ATM), the National Research Council Panel on Human
Factors in Air Traffic Control Automation (Wickens et al., 1998) defined automation as:
“a device or system that accomplishes (partially or fully) a function that was previously
carried out (partially or fully) by a human operator.”
According to Wickens (1992) automation is mainly applied to perform or assist
functions in which humans are naturally limited (e.g. accessibility to toxic, dangerous,
unreachable environments; or inherent working memory limitation). In addition,
automation is used to replace humans in operations which are time consuming, costly,
or induce high workload (e.g. complex monitoring or analytical processes). While often
seen as replacing humans, in reality, automation changes the role of the human
operator from direct manual control to largely supervisory control. In other words, in this
new role, the human operator plans and inputs tasks and the computer systems
implement these tasks automatically. Automation does not totally replace human
activity, it just changes the nature of the work that humans do. This change is often
completely unintended or unexpected by automation designers (Parasuraman and
Riley, 1997).
Past research has identified three sources of human performance deficiencies when
using high level automation (Bainbridge, 1983; Wickens et al., 1998; Wiener and Curry,
1980; Boehm-Davis et al., 1983). Firstly, humans become less likely to detect failures in
the automation itself or in the automated process. Secondly, they lose some
awareness of the state of the automated process. Finally, human operators eventually
lose skills in performing the actions manually if these actions have been previously
automated. These three phenomena are commonly known in the literature as ‘out of the
loop’ performance problems. The deterioration of manual skills is
particularly relevant to controllers and flight crews. As Bainbridge (1983) points out, an
irony is that the more reliable the automation, the more prone the operator will be to
‘out of the loop’ performance problems. This is the direct result of increased
complacency, over-trust in automation, and deterioration of the manual skills of both
controllers and pilots.
Experiments have shown that operators’ ability to recover from emergency
automation failure improves significantly with levels of automation that require human
involvement in the implementation of a task. Thus automation strategies that allow
operators to focus on current operations may contribute to improved situational
awareness and reduction in workload (Endsley, 1997). As a result, a new approach to
automation evolved resulting in human-centred designs instead of technology- or
automation-centred designs.
2.5.2 Human-centred vs. technology-centred automation
Traditionally, automation was perceived in an all-or-none fashion. At one extreme,
automation was employed completely and expected to eliminate human error. At the
other extreme, automation was kept to an absolute minimum, keeping the operator as
much as possible in the control loop. This traditional approach to automation has been
known as ‘static’, where the level of automated assistance was unchanged over time
(Parasuraman et al., 1990). However, decades of research showed that between these
two extremes, different levels of automation can be specified by the degree to which a
task is automated. This way of thinking led to a concept of human-centred automation
which is essentially developed around the idea of keeping the operator in control of the
situation (Billings, 1996; Parasuraman et al., 1990; Sheridan, 1980). As Layton et al.
(1994) note, the design of any automated system should be seen as the design of a
new collaboration between the machine and the human operator.
According to Wickens et al. (1998) the choice of what to automate should be simply
guided by the need to compensate for human vulnerabilities and to exploit human
strengths. However, this simplistic approach may again lead to static automation, not
exploiting and adapting automation to the characteristics of the context (surrounding
the human operator). Therefore, it seems more reasonable to move beyond traditional
automation approaches toward the principles of dynamic allocation of control between
human and machine, i.e. ‘adaptive automation’ (Scerbo, 2005; Kaber, 1997; Kaber and
Riley, 1999; Parasuraman et al., 1996; Parasuraman et al., 2000; Kaber, Prinzel,
Wright, and Clamann, 2002).
In short, the presence of automation is inevitable in all future concepts of air navigation.
Current design initiatives are more focused on human-centred automation, while
initial steps have begun to be taken toward adaptive automation. For example, the
concept of cognitively convenient alarm onset has been tested on a US naval ship as
described in Daniels et al. (2002). Based on the previous discussion on the main
principles of automation, it is necessary to review how these principles are
implemented in the design of future ATC systems and tools. The following section
presents the key concepts that will characterise Communications, Navigation, and
Surveillance / Air Traffic Management (CNS/ATM) up to the year 2020.
2.5.3 The future of air navigation service
The problems with the current air traffic management system can be summarised in
two areas. Firstly, the fragmentation of national systems prevents optimal use of global
airspace, as aircraft have to be controlled by many different air traffic systems.
Secondly, inherent limitations of current Air Traffic Control (ATC) technologies and
operational procedures are well known and make it impossible to achieve enhanced
efficiency and required capacity for the future (Ochieng, 2006).
To respond to the identified areas of concern, the International Civil Aviation
Organisation (ICAO) developed the Future Air Navigation Systems (FANS) concept built
around the Communications, Navigation, and Surveillance in Air Traffic Management
(CNS/ATM) system. As a result, future concepts and strategies in ATM/ATC will follow
a global approach to ATM and no longer focus solely on national needs. In this overall
environment, ATM/ATC technologies will undergo necessary changes and developments that
are currently in the conceptual or design phase. The general drivers of future ATM/ATC
are structured around communication, navigation, and surveillance functionalities and
are summarised below:
• communication in the 2020 time frame is expected to be characterised by a mix
of analogue voice and digital communication with increased use of datalink (VHF
based datalink-VDL, SSR Mode S datalink) and satellite communication
(SATCOM) to complement or replace existing analogue voice communications.
• navigation in the 2020 time frame is expected to be characterised by a mix of
ground- and satellite-based systems with increased use of satellite systems (e.g.
GPS, Galileo) for all phases of flight.
• surveillance in the 2020 time frame is expected to be characterised by a mix of
airborne (ADS, ADS-B, ADS-C, A-SMGCS, cockpit situational awareness-SAS)
and ground-based functions (SSR Mode S) with increased functionality
complementing or replacing the existing ground-based systems (PSR and SSR).
This succinct statement of the evolution of CNS/ATM within the 2020 time frame needs to
be further discussed from the perspective of a generic ATC Centre. In other words, it is
necessary to discuss the potential characteristics of the generic ATC Centre in 2020.
Based on ICAO and EUROCONTROL future concepts, the following changes are
expected in the generic ATC Centre in 2020:
• in support of Gate to Gate (G2G) flight management the following ATC systems
and tools are proposed for the period from 2010 onwards: four dimensional flight
trajectory prediction, sequencing managers (AMAN, DMAN), MTCD, monitoring
aid (MONA), system supported coordination (SYSCO);
• stripless environment;
• datalink communication;
• autonomous or free flight concept less reliant on ground-based navigational aids;
• transfer of separation responsibility to the flight deck giving controllers more of
a monitoring role;
• electronic (silent) coordination; and
• dynamic optimisation of airspace through the Single European Sky (SES)
initiative (EUROCONTROL, 2007b) and the concept of flexible use of airspace
(see MANTAS concept; EUROCONTROL, 2004b).
After presenting the system design principles and characteristics of future ATM/ATC, it
is important to discuss the impact that those changes may have on equipment and
human reliability. Following the main objective of the research presented in this thesis,
it is necessary to identify the potential sources of technical and human performance
deficiencies and their relevance to the controller recovery process.
2.5.4 Impact of future ATM/ATC on controller recovery from equipment failures
With the accumulated knowledge of the modern integrated ATC systems, it is
reasonable to assume that future overall equipment reliability will remain similar to
current standards. However, the nature and types of equipment failure may change.
While eliminating single points of failure, future ATC Centres may experience increased
problems with software reliability and data integrity (e.g. presentation of inaccurate
data). This will be the direct result of a more complex and integrated ATC architecture
as well as incompatibility between current and future, more automated ATC equipment.
In other words, the future ATC Centres may be faced with failure types that will be
harder to detect and repair. The highly integrated ATC architecture may mask some of
these failures and hide the real cause(s) of the problem.
When discussing human reliability issues in future ATM/ATC environment, it is
reasonable to assume that automation design will create situations where controllers
will not be able to cope with its complexity or simply will not have enough time
available. This is a direct result of the assumed ‘out of the loop’ performance and the
reduced separation between aircraft (as a requirement to provide necessary capacity).
As noted by Wickens et al. (1998), the time available to safely respond to an
emergency situation will decrease with decreased separation, while the operator
response time may increase due to ‘out of the loop’ performance. One alleviating factor
may be the transfer of responsibility for separation management from controllers to
pilots, giving the former more time to effect recovery. However, the environment of collaborative
decision-making and real-time information exchange threatens to distribute
false or inaccurate information from the ground to the air. In this case, ATC equipment
failure may affect the airborne segment of ATM and cockpit instruments (e.g. Flight
Management System - FMS).
The European Organisation for the Safety of Air Navigation (EUROCONTROL) recognised
that the role and nature of controller tasks will change as a result of the addition of
increased automation within the ATM system. As a result, it initiated the Solutions
for Human-Automation Partnerships in European ATM (SHAPE) project to better
understand interactions between automated support and controllers (EUROCONTROL,
2004f). SHAPE has identified seven factors that need to be addressed to ensure
harmonisation between automated support and the controller. Amongst factors such as
trust, situational awareness, team issues, skills, ageing, and workload, SHAPE
recognised the importance of managing system disturbances (details are presented in
Chapters 5 and 7). As a result, the assessment of controller recovery presented in the
remainder of this thesis considers the interactions between human and automation. A
flexible approach has been developed to assess controller recovery in any possible
context.
In short, the role of the human operator will remain significant in the future ATC
environment. Due to the transfer of responsibility for separation management from
controllers to pilots, recovery performance will evolve from purely the controller’s
actions to collaboration between controller and pilot. To support human performance in
the future, more automated environment (both on the ground and in the cockpit), special
attention will have to be given to the areas of human-computer interaction, training, and
procedures for both normal and abnormal situations.
2.6 Summary
The aim of this Chapter is to create a basis for the research on recovery from
equipment failures in ATC. There are several findings that will be taken forward from
this Chapter. Firstly, this Chapter defined ATM and its component ATC and thus
indicated the scope of the research presented in this thesis. Secondly, this Chapter
placed additional emphasis on the ATC functional classification. This classification
starts with the main ATC functional blocks, further dissected to element level. It has
been defined based on both current and future ATC systems and tools in accordance
with principles and initiatives of ICAO and EUROCONTROL. As such, this ATC
functional breakdown is flexible to changes in ATM/ATC and should capture both
current and future equipment failure types. Finally, this Chapter defined characteristics
of a ‘generic’ ATC Centre in both current and future ATC environments. This finding
creates a base for the entire research presented in this thesis.
The next Chapter focuses more on the equipment component of the ATC system.
Since the aim of the overall thesis is to assess the impact of equipment failures, the
following Chapters provide relevant definitions, identify types of equipment failure, and assess their
contribution to the safety of the overall air transport system. A sample of operational
failure reports used in this research is validated through a framework based on the
contribution of equipment failures to the overall safety of the air transport system.
Chapter 3 Preliminary Assessment
41
3 Preliminary Assessment of Equipment Failures in Air Traffic Control
The previous Chapter presented the context of the research in this thesis by describing
the Air Traffic Management (ATM) system and its component, the Air Traffic Control
(ATC) system. Furthermore, it detailed the range of functions provided in an ATC
Centre. The main characteristics of current ATC Centres as well as the concepts
shaping their future characteristics were also covered. A comprehensive analysis of
equipment failure should follow its ‘life’ by assessing all the phases that this occurrence
undergoes throughout the ATC system (Figure 3-1). An equipment failure firstly
encounters the existing technical built-in defences. If these inherent defences are
insufficient to prevent the failure impacting on the ATC system, the failure now
becomes a hazard. Hazards represent a sub-group of equipment failures that penetrate
existing technical built-in defences and hence require human intervention (or human
recovery). An equipment failure occurrence concludes with the outcome which is the
result of the collaboration between technical and human recovery.
Figure 3-1 Phases of an equipment failure occurrence
Following the equipment failure ‘life’, the Chapter starts with the relevant definitions of
equipment failures and hazards. While the human recovery and outcome phases of the
equipment failure ‘life’ are discussed in the remainder of the thesis, this Chapter
continues by presenting the available sample of operational failure reports. It also
discusses the reporting schemes used to obtain equipment failure reports and data
pre-processing issues. The appropriateness of this sample is assessed by using a
methodology that determines how much ATC equipment contributes to the safety of the
overall air transport system. Agreement between the findings obtained from past
research and the analysis of available operational failure reports indicates the validity
of this sample. Once this is achieved, the thesis continues with a more in-depth
assessment of the available sample in the following Chapter.
3.1 Definition of equipment failure
The focus of aviation safety and reliability management has mainly been on the
prevention of technical failures, human failures (also known as human errors), and
more recently organisational or management failures (Reason, 1997). The European
Organisation for the Safety of Air Navigation (EUROCONTROL) defines failures in the ATC
system as “the inability of any element of that system to perform its intended function or
to perform it correctly within specified limits” (EUROCONTROL, 2002c). As discussed
in Chapter 2, the ATC system comprises people, equipment, and procedures
integrated in an optimal way to achieve a common objective. However, the research
presented in this thesis focuses solely on failures of one component of the ATC system,
namely equipment. Therefore, in the following text, the term ‘failure’ will only apply to
equipment failures or malfunctions.
Leveson (1995) defines failure as the “inability of the system or component to perform
its intended function for a specified time under specified environmental conditions”. The
definitions by Leveson and EUROCONTROL are similar as both take into account
failure in a much wider sense. In this research a failure occurs when any component of
ATC equipment terminates unexpectedly and no longer performs the required function,
while the overall ATC system remains operational. If the entire ATC system becomes
unavailable, the failure is known as an outage. For example, communication failure is
observable in an ATC Centre if there is unexpected failure of radio communication
equipment on one console. However, if the failure affects the entire ATC Centre (e.g.
due to loss of power), this failure is known as an outage. It is important not to restrict
the term failure only to catastrophic events. Small-scale failures can combine to act
more severely in different environmental conditions (contexts). According to Wickens et
al. (1998) the source of such problems could be software bugs, erroneous or delayed
data exchange, or design deficiencies. Figure 3-2 illustrates the definitions discussed
previously.
[Figure 3-2: diagram relating the components of the Air Traffic Control (ATC) system (people; equipment; procedures and training) to the corresponding failures (human failure = human error; equipment failure; failure of procedure and/or training). An equipment failure has local impact (console/sector); an outage or fallback has overall impact (entire ATC Centre). Failure mode: failure effect observable on equipment and/or ATC system.]
Figure 3-2 Different definitions
In a similar way, it is necessary to differentiate between total and partial equipment
failures. Using the example above, a total radio communications failure will result in a
situation where a controller working position (or a sector) can no longer provide air
traffic services due to the inability to communicate clearances or instructions to aircraft.
However, if a failure affects only one element, either the transmitter or receiver, and the
other component is still operational on that position (or the sector), the radio
communication failure will be regarded as partial. In other words, if the equipment no
longer performs any aspect of the required function the failure is total, but if at least
some portion of the required functionality still exists, the failure is only partial.
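The distinctions drawn above (failure versus outage, total versus partial) can be sketched as a small classification routine. The following Python fragment is only an illustration of the definitions used in this section; the names `Scope` and `classify_failure`, and the idea of counting lost functions, are hypothetical and are not part of the methodology of this thesis.

```python
from enum import Enum


class Scope(Enum):
    LOCAL = "console/sector"       # local impact
    CENTRE = "entire ATC Centre"   # overall impact


def classify_failure(scope: Scope, functions_lost: int, functions_total: int) -> str:
    """Label an equipment failure using the definitions of Section 3.1.

    functions_lost and functions_total are an assumed way of expressing how
    much of the required functionality is gone (e.g. a radio with a
    transmitter and a receiver has two functions).
    """
    if functions_total <= 0 or not 0 < functions_lost <= functions_total:
        raise ValueError("expected 0 < functions_lost <= functions_total")
    if scope is Scope.CENTRE:
        # The entire ATC Centre is affected (e.g. loss of power).
        return "outage"
    if functions_lost == functions_total:
        # No aspect of the required function is still performed.
        return "total failure"
    # At least some portion of the required functionality still exists.
    return "partial failure"
```

For the radio communication example, a receiver failure on one working position would be classified as `classify_failure(Scope.LOCAL, 1, 2)`, i.e. a partial failure.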
All technical items are designed to fulfil one or more functions. A failure mode is thus
defined as an inability to partially or completely fulfil one of these functions (Figure 3-2).
It is also defined as the visible effect of a failure on the ATC system. Note that
equipment failures may not have any visible impact on the ATC service due to the
availability and effectiveness of built-in defences (e.g. redundancy) discussed in more
detail in Chapter 4. In this case, the only visible effect on the system (i.e. failure mode)
would be the engagement of the first level of redundancy. In some cases, this transition
is done seamlessly and it is only apparent to technical staff, but not to controllers. The
UK national air navigation service provider (NATS) differentiates between fallback and
failure modes. According to NATS, fallback mode is a condition which occurs only if
there is a major failure or when the level of redundancy is significantly eroded (NATS,
2002). Thus, the NATS definition of fallback modes corresponds closely to outages
defined previously.
It is very important to distinguish between equipment failures and human operator
failures, known as human errors (Figure 3-2). Note that it could be said that all failures
are human in their nature, since most of them involve humans at some stage of the
process, e.g. system designers might fail to anticipate a certain equipment state.
Humans are also involved in manufacturing, testing, validation, certification, and
maintenance. Any of these human operators can be directly or indirectly responsible for
a failure occurring in ATC. It is also important to note that non-technical failures should
not be directly considered as human failures. Frequently, a failure that has no obvious
technical cause is directly attributed to the human, due to a lack of a deep and
objective analysis of its causes and dynamic relations between technical and human
components of the system (Straeter, 2001).
The following sections start with the definition of a hazard, as a sub-group of equipment
failures that penetrate existing technical built-in defences and hence require human
intervention, which is the focus of the research presented in this thesis. This is followed
by the presentation of the sample of operational failure reports available in this thesis.
3.2 Definition of a hazard
The research in this thesis focuses on failures that penetrate technical defences (i.e.
technical recovery) and therefore impact (with different levels of severity) on a
controller’s performance. In this thesis, a hazard is defined as the ATC system state
resulting from an equipment failure that penetrates all existing technical defences and
affects the ability of the controller to perform his/her tasks. In different contexts a
hazard may have different definitions. For example, EUROCONTROL (2002c) defines
a hazard as “any condition, event or circumstance, which could induce an accident or
incident”. This EUROCONTROL definition is too broad and thus not in line with the
scope of this research. Thus, the term hazard in this research takes into account only
failures that require controller intervention (i.e. human recovery). The failures that
belong to this category are addressed in this thesis.
The following examples may help to clarify the difference between failure, hazard,
technical and human recovery, as defined in this research:
• A blocked radio frequency (failure) prevents exchange of information between a
controller and pilot. This failure presents a hazardous situation and requires the
controller’s immediate action (human recovery). Changing the frequency on the
same working position or moving to another available working position are
possible ways to recover.
• A power loss (failure) affects one set of Controller Working Positions (CWP).
Due to the independent Uninterruptible Power Supplies (UPS) electrical energy
is continuously provided and the controller does not notice this failure (no
hazard). The automatic changeover to UPS represents one example of built-in
technical defence or technical recovery (see Chapter 4 for detailed
explanation). If the continuous supply of electrical energy is not provided,
several CWPs may experience a problem, creating a hazardous situation and
requiring controller intervention (human recovery).
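The two examples above follow the same decision: whether the built-in technical defences contain the failure. A minimal sketch of that decision is given below; the function name `occurrence_phase` and its single boolean input are assumptions made for illustration only.

```python
def occurrence_phase(defence_effective: bool) -> dict:
    """Trace an equipment failure through the phases described in this Chapter.

    defence_effective: True if built-in technical defences (e.g. automatic
    changeover to UPS) contain the failure before it reaches the controller.
    """
    if defence_effective:
        # Technical recovery succeeds: the controller does not notice the failure.
        return {"hazard": False, "recovery": "technical"}
    # The failure penetrates all technical defences and becomes a hazard.
    return {"hazard": True, "recovery": "human"}


# Power loss with a working UPS versus a blocked radio frequency.
power_loss_with_ups = occurrence_phase(defence_effective=True)
blocked_frequency = occurrence_phase(defence_effective=False)
```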
It should be pointed out that although this research considers only failures which lead
to hazardous situations, there are other failures as well. These other failures represent
the majority which never affect the controllers’ performance due to the effectiveness of
technical built-in defences (NATS, 2002). However, these failures still require
intervention, repair, and maintenance by engineers from the ATC system control and
monitoring unit.
After defining a failure and hazard as used in this research, the next section analyses
the nature of equipment failures in the operational environment. Details on this sample
of equipment failure reports are presented in the following section.
3.3 Supporting data: operational failure reports
Operational experience in this research is captured through a sample of operational
failure reports. They originate from four de-identified countries, referred to as Country
A, B, C, and D due to confidentiality. The following discussion focuses firstly on the
process of reporting equipment failures and their collection at the local level (i.e.
database of the ATC Centre) and national level (database of the respective Civil
Aviation Authority-CAA). The discussion continues by revealing a range of data pre-
processing problems and the corresponding solutions.
3.3.1 Reporting and data collection
The aim of occurrence data collection is generally to record the safety performance of
the relevant unit (e.g. ATC Centre). The data are collected on a range of safety-
relevant occurrences, such as incidents, losses of separation, equipment failures, bird
strikes, runway incursions, level busts, and others. For example, at the European level,
the EUROCONTROL ESARR 2 document (EUROCONTROL, 2000c) provides
recommendations on the reporting and assessment of safety occurrences in ATM. As a
result, the national Civil Aviation Authorities (CAAs) specify the types of ATM
occurrences to be collected, analysed, or investigated through their mandatory
occurrence reporting (MOR) schemes (Figure 3-3). For example, the UK CAA also
specifies who can report an occurrence, what the correct reporting procedure is, and
how the details should be disseminated (in the case of the investigation). The UK CAA
states that the objective of this reporting scheme is “to contribute to the improvement of
air safety by ensuring that relevant information on safety is reported, collected, stored,
protected, and disseminated. The sole objective of occurrence reporting is the
prevention of accidents and incidents and not to attribute blame or liability” (UK CAA,
2005).
Figure 3-3 Reporting system
In aviation generally, as in ATC, data is usually stored and sorted electronically in
different databases. Collection of data in hardcopy has long been abandoned in most
of the developed countries worldwide. The type and level of database detail depends
on the unit/group/authority collecting the data (e.g. a system control and monitoring
unit, air navigation service provider, or national CAA). For example, when collecting
equipment failure occurrences, the most detailed information is available in the
database of the control and monitoring unit within the particular ATC Centre. This
database must contain information on all equipment failures that occurred in the ATC
Centre regardless of their impact or severity. This is because
engineering staff must have complete insight into all equipment failures, as they are
responsible for repair and maintenance.
However, not all equipment failures are required to be reported at a national level. The
choice of those that need to reach respective CAAs is made through a review of
reported incidents or safety events on a monthly, quarterly, and annual basis. As a
result, a national database will contain only occurrences of appropriate severity
characteristics and impact on operations. As an example, the UK CAA uses a MOR
database which contains, amongst others, reports on equipment failures that impact on
the controllers’ ability to provide air traffic services. These reports are fed in from the
Engineering Reporting Occurrence Database which contains details on all technical
problems, failures, and maintenance issues, of which the majority pass unnoticed by
controllers (due to the high level of ATC systems redundancy).
Collected data is regularly analysed to assess the safety performance at national level
as well as at the level of the relevant units (e.g. ATC Centre). Furthermore, this
information is sometimes used on a wider basis for benchmarking studies and to record
the safety performance of a given region (e.g. European Civil Aviation Conference –
ECAC consisting of 41 European countries).
3.3.2 Data pre-processing problems
As previously mentioned, the research presented in this thesis uses operational failure
reports from four operational databases. Problems experienced with extracting failures
from different operational databases can be summarised as follows:
• Different reporting schemes produce different levels of reporting detail. The amount
and quality of information reported differ significantly from one report to another.
Therefore, inconsistencies between reports were identified in terms of failure impact
(i.e. severity), duration, and location.
• There are differences in the terminology used (e.g. Computerised Automatic Terminal
Information Service - CATIS for Automatic Terminal Information Service - ATIS;
“hotline” for ground-to-ground communication, usually intercom; and
National Aeronautical Information Processing System - NAIPS for Aeronautical
Information Service - AIS), as well as the usage of very specific component names (e.g. Air
Ground Data Processor - AGDP, as part of the datalink system).
• A lack of reporting culture that results in uncertainty related to data reliability and
completeness.
These problems are addressed below highlighting the approaches adopted to mitigate
them.
All reports have a short, one sentence long, summary followed by a description of the
equipment failure incident plus some additional information (e.g., date, occurrence
number, location, area code: flight information region or sector name). Unfortunately,
the additional information was not always available. Additionally, Countries C and D
provided their internal severity categorisation, while Country D provided information on
failure duration. Since Country D’s dataset originates from an engineering unit, the
duration variable was measured from the first log of the failure until its final resolution.
As a result, it was possible to consistently extract four types of information. The type of
equipment/ATC functionality affected and complexity of failure type are extracted
usually from the short summary available for each report. The severity of equipment
failure is extracted using the available severity rating (if it existed) or by assessing the
available information on the operational and safety impact of the equipment failure and then
applying the severity rating derived in this research (see Chapter 4, Table 4-5). Finally,
the duration variable is available only in the Country D database.
Data pre-processing is based on the classification of ATC system functionalities (see
Chapter 2). In certain reports it was very difficult to determine the type of equipment.
This problem was compounded by having only an acronym to explain precisely what
the report referred to. Consequently, several interviews were conducted with
engineering staff from two European ATC Centres to correctly identify and classify
those ambiguous problems. A glossary of terms and
acronyms was found to be a very useful tool during the pre-processing stage. Such
documents should accompany (or be an integral part of) every database as part of a
normal reporting practice.
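The role of such a glossary during pre-processing can be sketched as a simple lookup from local or system-specific terms to the generic ATC terms used in the functional classification. The mapping below contains only the examples named in this section; the function name `normalise_terms` and the tokenisation are hypothetical, and the actual pre-processing relied on manual review and interviews rather than on this code.

```python
# Hypothetical glossary: local or system-specific terms mapped to the generic
# terms named in Section 3.3.2. A real glossary would be far longer.
GLOSSARY = {
    "CATIS": "ATIS",
    "NAIPS": "AIS",
    "HOTLINE": "ground-ground communication",
    "AGDP": "datalink",
}


def normalise_terms(summary: str) -> list[str]:
    """Return the generic terms found in a one-sentence report summary."""
    found = []
    # Upper-case the summary and treat commas and full stops as separators.
    for word in summary.upper().replace(",", " ").replace(".", " ").split():
        term = GLOSSARY.get(word.strip('"\'()'))
        if term is not None and term not in found:
            found.append(term)
    return found
```

For a made-up summary such as `'CATIS unavailable, "hotline" to tower lost.'`, the function returns the generic terms ATIS and ground-ground communication, which can then be matched against the ATC functional classification.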
Within one country, the number of reports may not reflect the actual number of
equipment failure incidents in the ATC Centres for a variety of reasons. Firstly, there
may be a lack of reporting as a result of an inadequate reporting culture in
the ATC Centre and the aviation community overall. Secondly, not all equipment failures
are included in the CAA databases. As previously explained, only failures of certain
severity (i.e. impact on ATC operations and controller performance) tend to be reported
to the CAA. As a result, the available operational failure reports are neither necessarily
complete nor reliable (i.e. they lack the detail on the context surrounding a reported
occurrence). To date, no measure of completeness and reliability of occurrence
databases has been produced. This is a task for future research.
3.3.3 Available operational failure reports
As stated previously, there are four sources of data on equipment failures included in
this thesis, Countries A, B, C, and D. The first three data sets are from Civil Aviation
Authority (CAA) databases for a given time period. In other words, these are equipment
failures reported in the CAA database for all ATC Centres within the national
boundaries of these countries over a given time period (usually a year). The fourth data
source (Country D) represents data from the system control and monitoring unit of one
ATC Centre. Table 3-1 gives a summary of the available data.
Table 3-1 Summary of available data, number of reports, and equipment failure incidents per country
Country | Source of data                 | Time period available | Average flight hours flown for available time period | Total number of reports pre-processed | Total number of equipment failures reported
A       | CAA                            | 1999-2003             | 1,375,800.00                                         | 1,378                                 | 791
B       | CAA                            | 2001-2005             | 1,027,870.00                                         | 1,393                                 | 1,324
C       | CAA                            | 1992-2004             | 389,245.68                                           | 3,340                                 | 448
D       | System control unit/ATC Centre | 08/2000-2004          | 428,502.22                                           | 16,697                                | 7,788
Total   |                                |                       |                                                      | 22,808                                | 10,351
After pre-processing of all available equipment failure reports (22,808), more than ten
thousand reports (i.e. 10,351) are identified as equipment failures in air traffic control
(Table 3-1). The remaining reports mainly comprised equipment-related reports
outside of the national airspace, multiple reports filed for the same occurrence to reflect
multiple findings or causes identified, as well as reports on non-ATC equipment and
other non-technical types of incidents (e.g. human error, runway closures due to non-
equipment issues, scheduled maintenance, software updates, and scheduled hardware
changes).
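The filtering summarised in Table 3-1 can be checked with a few lines of arithmetic. The sketch below uses only the report counts given in the table; the variable names are illustrative, and the derived percentages are simple ratios rather than results reported elsewhere in this thesis.

```python
# (reports pre-processed, equipment failures identified) per country, from Table 3-1
TABLE_3_1 = {
    "A": (1_378, 791),
    "B": (1_393, 1_324),
    "C": (3_340, 448),
    "D": (16_697, 7_788),
}

total_reports = sum(r for r, _ in TABLE_3_1.values())
total_failures = sum(f for _, f in TABLE_3_1.values())

# Fraction of each country's reports retained as ATC equipment failures.
retained = {c: f / r for c, (r, f) in TABLE_3_1.items()}
# Each country's share of all identified equipment failures.
share = {c: f / total_failures for c, (_, f) in TABLE_3_1.items()}

for c in TABLE_3_1:
    print(f"Country {c}: retained {retained[c]:.1%}, share of failures {share[c]:.1%}")
print(f"Total: {total_failures} of {total_reports} reports ({total_failures / total_reports:.1%})")
```

The totals reproduce the 22,808 reports and 10,351 equipment failures quoted above, and show how strongly the proportion of retained reports varies between the four sources.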
The time period studied, for countries A and B, could be considered steady (uniform)
with respect to the ATC service provided and other aviation related factors (e.g. traffic
levels, jet fuel prices, airline fares, regulations). However, one modern ATC Centre was
opened in Country A in the second half of 2001. This resulted in a relatively large number
of failures of individual components early in 2002. This is a recognised
characteristic of the initial life or ‘burn-in period’ of any newly implemented system
(Figure 3-4).
Figure 3-4 “Bathtub” model of reliability for electronic components (Leveson, 1995)
Country B underwent a complete modernisation of its ATM system in 2000. Given that
a typical ‘burn-in period’ ranges between 30 and 90 days (IEEE, 1998), it is reasonable to
assume that the system was well integrated and settled for the period of the data (i.e.
2001 to 2005). Therefore, the average number of incidents reported in this period could
be considered representative and appropriate for further analysis.
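The shape of the ‘bathtub’ model referred to above can be sketched numerically. The fragment below is an illustration only: it assumes that the failure rate can be written as the sum of a decreasing Weibull hazard (early failures), a constant term (useful life), and an increasing Weibull hazard (wear-out). All parameter values are invented for the purpose of the example and do not describe any system analysed in this thesis.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_days: float) -> float:
    """Illustrative bathtub-shaped failure rate (failures per day), for t_days > 0."""
    early = weibull_hazard(t_days, shape=0.5, scale=60.0)       # decreasing: burn-in
    useful_life = 1e-3                                           # constant: steady state
    wear_out = weibull_hazard(t_days, shape=4.0, scale=3000.0)   # increasing: ageing
    return early + useful_life + wear_out


# Failure rate shortly after commissioning, after one year, and late in life.
for day in (10, 365, 6000):
    print(day, round(bathtub_hazard(day), 5))
```

With these assumed parameters the failure rate is highest shortly after commissioning, falls during the useful life, and rises again as the equipment ages, which is why data collected immediately after the opening of a new ATC Centre should be treated with care.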
However, the time period available for Country C consists of 13 consecutive years (i.e.
1992 to 2004). This country went through extensive regulatory changes throughout the
1980’s. The change in air service licensing assured that any operator that could prove
financial viability and meet safety standards would obtain a license. As a result, by the
end of the 1980’s, the number of operators had more than doubled. At about the same
time, the Government decided to commercialise most of its service provision activities.
Thus air traffic and other services formed new state-owned commercial enterprises.
However, all of these changes were firmly embedded into the system by the 1990s,
and therefore, the sample provided could be considered stable and appropriate for
further analysis.
Country D is unique in that it provided data from a single engineering unit database and
therefore represents the most detailed data source in this research. It covers the
shortest period available (3.5 years) but contains the highest proportion of failures,
about 75 percent of all equipment failures identified (Table 3-1).
Although the available sample has a significant number of operational failure reports,
this still does not indicate how representative these reports are of the operational ATC
environment. For this reason, a top-down methodology for total aviation system
safety is developed. This methodology enables determination of the contribution of
ATC equipment to the safety of the overall air transport system based on past
research. Once this is established, the same methodology is applied using the
operational failure reports and then the results are compared. This methodology and
the subsequent validation of the available operational data are presented in the
following section.
3.4 Methodology to assess the relevance of supporting data
This section develops the methodology for an assessment of the available sample of
operational failure reports. In order to assure the relevance of this sample, this section
builds a methodology for its validation. In short, the contribution or risk budget of
equipment failures to the overall safety of the air transport system extracted from past
literature is compared to the result obtained from the analysis of available operational
failure reports. The section starts by identifying the overall aviation Target Level of
Safety (TLS) and derives risk budgets for ATM and its ATC component. It concludes by
determining the risk budget of ATC equipment. In other words, this methodology
determines the contribution of ATC equipment failures to the safety of the overall air
transport system. This finding is then compared to the results of the preliminary
analysis of the available operational failure reports.
3.4.1 The accident to incident ratio
Aviation Target Level of Safety (TLS) expressed only in terms of accidents has two
potential limitations. Firstly, the number of accidents is small for any adequate
statistical analysis. Non-accident data, such as loss of standard separation between
aircraft in controlled airspace, is therefore necessary to establish the occurrence of any
trends. Secondly, the number of accidents (or accident rate) is not necessarily the best
measure of safety performance. For example, the currently used target of one accident
in 10⁷ flight hours demands the collection of operational data over many years to
demonstrate whether the TLS has been met. A single accident may violate the TLS,
whilst many years without an accident will satisfy the TLS, but conceal any
deterioration in safety prior to an accident (Graham, Kinnersly, and Joyce, 2002). In
this context, past safety analyses (not only in aviation) have used the number of
incidents together with the assumed accident/incident ratio. The United States Federal
Aviation Administration (FAA, 2000) cites several different analytical approaches. The
two most common of these are discussed below.
In the 1940s, Heinrich introduced the idea of accidents in which no injuries
occurred and only property was damaged (Heinrich, 1941). This led to the
creation of the so-called ‘Heinrich pyramid’ with established proportions of accidents,
serious incidents, and incidents of 1:29:300 (Saldana et al., 2002). After these initial
studies, there was stagnation in the theoretical underpinnings of safety investigations
until the practical work of Bird in the 1970s. Bird carried out his work in a steel factory
and revised Heinrich’s proportions to 1:29:600 (Saldana et al., 2002).
However, whilst both of these studies are valuable in their statistical analyses, they do
not seem to be appropriate in dealing with equipment failures in ATC, at least not in the
ratios they offer. Both studies are designed to determine the risk and related ratio of
on-the-job accidents and incidents. The reason for the weaknesses in both studies may
originate from their design and in particular, the bias of analysing accident reports filed
by supervisors only (which tend to blame injuries on workers) and much lower levels of
equipment reliability and integrity compared to the systems used in ATC today.
For the purpose of the research presented in this thesis, additional attention has been
given to the ratio between accidents and incidents induced by ATC equipment failures.
However, a EUROCONTROL safety assessment study assumed that one in 10,000
equipment failures will contribute to an aviation accident (EUROCONTROL, 2004c), an
assumption which is in line with the high reliability requirement for the overall ATC
systems, as well as ATC equipment. A number of arguments can be made to suggest
that in future, this proposed ratio will decrease:
• The number of incidents should decrease due to continuous safety initiatives and
hazard prevention programmes;
• The probability of an incident leading to an accident should decrease due to
increases both in equipment reliability and advanced solutions for redundancy
and diversity (dissimilar redundancy);
• Changes should be seen in the type of incidents occurring, in that as a result of
enhanced risk management approaches, the frequency of serious incidents
should reduce;
• There should also be a decrease in the number of software-related incidents,
which are prevalent today as discussed earlier. Hardware-related incidents
should also diminish.
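The difference in scale between the ratios discussed in this section can be made concrete with a short calculation. The sketch below compares the number of lower-level events per accident implied by the two industrial pyramids with the one in 10,000 assumption of EUROCONTROL (2004c). The function name and the example figure of 25,000 failures are hypothetical and serve only to illustrate the arithmetic.

```python
HEINRICH = (1, 29, 300)  # accidents : serious incidents : incidents
REVISED = (1, 29, 600)   # revised proportions from the 1970s steel-factory study
FAILURES_PER_ACCIDENT = 10_000  # assumption quoted from EUROCONTROL (2004c)


def events_per_accident(pyramid: tuple) -> int:
    """Number of non-accident events per accident implied by a pyramid."""
    accidents, *lower_levels = pyramid
    return sum(lower_levels) // accidents


def expected_accident_contributions(n_failures: int,
                                    failures_per_accident: int = FAILURES_PER_ACCIDENT) -> float:
    """Expected number of failures contributing to an accident under the assumed ratio."""
    return n_failures / failures_per_accident


print(events_per_accident(HEINRICH))            # events per accident, Heinrich pyramid
print(events_per_accident(REVISED))             # events per accident, revised pyramid
print(expected_accident_contributions(25_000))  # hypothetical 25,000 equipment failures
```

The industrial pyramids imply one accident for every few hundred lower-level events, whereas the ATC assumption is more than an order of magnitude more demanding, which is consistent with the argument that the former ratios are not appropriate for ATC equipment failures.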
The arguments discussed above imply a step change in software and hardware
reliability as a result of considerable operational experience, knowledge, and expertise.
For example, in its requirements for the software configuration EUROCONTROL states
that reporting, tracking, and corrective actions are set in place to mitigate any software-
related problem (EUROCONTROL, 2003i). Note also that a decrease in the number of
incidents should only consider the steady state (i.e. useful life) as captured in the ‘bath
tub’ reliability model (Figure 3-4).
It has been highlighted that perception of risk only in terms of accidents tends to mask
the actual safety issues. For this reason, it is important to include the number of
incidents so as to estimate the appropriate accident/incident ratio. Following this
discussion of the accident/incident ratio, the next section discusses the units of
measurement used in aviation and the different perspectives they provide in the
investigation of a critical event.
3.4.2 Units of measurement
The rate of any critical event represents the number of occurrences (e.g. equipment
failures, incidents, accidents) divided by the exposure to those events. For example,
aviation accident statistics are presented in a variety of ratios and units, called units of
measurement. The most frequently used are the number of accidents per operation
(take off or landing), per million flight hours flown, per flight, per million departures, per
million aircraft-miles, per million aircraft-hours, per million passenger-hours, and per
million passenger-miles.
No single measurement gives a complete picture of the critical event under
investigation. Each of these units gives only one perspective, whilst possibly hiding
others. For example, rates per million passenger-miles are most useful for comparing
air transport and other modes of transport, whilst aircraft departures are suitable for
comparison of accidents between small commuter jets and large commercial jets (e.g.
BA46 and B747, respectively). In addition, for the determination of the required
performance of the landing aids e.g. Instrument Landing System (ILS) or Microwave
Landing System (MLS), the only appropriate measure would be the number of landings
per time period of interest. Any other measure would mask the true performance
values.
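The dependence of a rate on the chosen unit of exposure can be illustrated with a minimal sketch. The accident count and exposure figures below are hypothetical and chosen only for illustration; they are not taken from any data set referred to in this thesis.

```python
# Hypothetical figures, for illustration only (not from any real data set).
accidents = 12
exposure = {
    "flight hours": 30_000_000,
    "departures": 15_000_000,
}


def rate_per_million(occurrences: int, exposure_units: float) -> float:
    """Rate of a critical event per million units of exposure."""
    return occurrences / exposure_units * 1_000_000


# The same number of occurrences yields a different rate for each unit.
rates = {unit: rate_per_million(accidents, n) for unit, n in exposure.items()}
for unit, value in rates.items():
    print(f"{value:.2f} accidents per million {unit}")
```

The same set of occurrences therefore appears twice as frequent when expressed per departure as when expressed per flight hour in this example, which is why the unit of measurement must always be stated alongside the rate.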
In addition to the units of measure, accident rates are determined by the definition of
the critical event as well. These critical events range from accidents, fatal accidents,
hull losses, to the number of fatalities or injuries. An accident, as defined by ICAO
Annex 13 (ICAO, 2001d), involves “an occurrence associated with the operation of an
aircraft, which takes place between the time that any persons board the aircraft with the
intention of flight and that all such persons have disembarked, in which any person
suffers death or serious injury, or in which the aircraft receives substantial damage.”
This definition therefore comprises fatal accidents as well as hull losses. Thus, in
dealing with various accident rates it is crucial to be aware of the precise definition of
both the critical event and the unit of measurement used.
The current rate of aircraft accidents per million flying hours has remained constant
over recent years. If the same accident rate is assumed for the future together with
predicted increases in traffic levels, there will be an increase in the absolute number of
accidents. Using the current accident rate, ICAO has predicted that by the year 2010
there will be an aircraft accident per week, i.e. 52 accidents per year (Hai, 2004). This
is the reason why the US FAA and other aviation authorities have identified the need to
significantly decrease the risk of aircraft accidents.
The following sections propose a methodology for the derivation of aviation target level
of safety (TLS) based on the rate of aircraft accidents (defined as a number of
accidents per flight hour). An accident is defined according to ICAO, while the flight
hour has been chosen as the most appropriate measure of risk induced by equipment
failures. It is usually more convenient to work in terms of flight hours rather than
operational hours of an ATC unit or sector. This approach avoids difficulties and
differences associated with the geographical coverage of the system(s) being
considered, phase of flight, the density and complexity of airspace, as well as available
systems and equipment (e.g. number of radars, navigation systems, communication
systems). This is also in line with Required Communication, Navigation, and
Surveillance Concepts (RNC, RNC, RSC) as defined in the previous Chapter. In short,
the proposed methodology starts by identifying the high-level aviation target level of
safety and then focuses on the precise contribution of equipment failures, the type of
occurrence under investigation in this thesis.
3.4.3 The acceptable risk or target level of safety (TLS)
The methodology to determine the contribution of equipment failures to the safety of
the overall air transport system is organised in several steps. Firstly, existing aviation
standards for Target Level of Safety (TLS) are assessed. Secondly, the contribution of
ATC to the risk of an aircraft accident is determined. Thirdly, the contribution of ATC
equipment to the ATC risk budget is determined. These findings are then extrapolated
to the year 2020, as the target year in this research in line with the European
Commission’s ‘Vision 2020’ (European Commission, 2001). The final step involves
validation of the available sample of operational data using the same methodology.
These steps are presented in the following sections.
3.4.3.1 Existing standards
Technology and engineering have brought numerous inventions and benefits to the
modern way of life. Whilst these benefits are welcome, the risks associated with them
are not. The high pressure on the engineering world to reduce risk and increase safety
comes at a financial price. Therefore, it is important to manage the trade-off between
risk and the cost of its reduction.
As a result, there are certain degrees of risk that must be accepted. Determining the
acceptable level of risk¹ is generally the responsibility of management and is based on
several principles. These are the objective to be achieved, the alternatives available,
and the consequences and values that can be identified. Based upon this, the TLS is a
quantified level of risk (or potential loss) that a system should be designed to deliver
(Brooker, 2004). In aviation, the TLS is usually expressed as a number of aircraft
accidents per flight hour flown, which is used in this thesis, as indicated previously.
The concepts of TLS and risk budgeting are directly linked. Indeed, risk budgeting
represents a top-down distribution of TLS (or total aviation risk) between the
independent sub-categories. The logic behind this process is to specify the maximum
acceptable risk for each sub-category, so that each one has to produce equal or lower
risk than prescribed (see Figures 2-1 and 2-3).
¹ Note the difference between acceptable and tolerable risk. Tolerability refers to a
“willingness to live with a risk so as to secure certain benefits and in the confidence that
it is being properly controlled”. Tolerable risk is not ignored, but is controlled and
reduced further if possible. On the other hand, acceptable risk means that we are
“prepared to take risk as it is” (Reid, 1996). It should also be noted that acceptable risk
is a relative term and is based on different risk perceptions: individual, public (group of
individuals), industry (industry usually needs additional pressure to declare a product as
unsafe), and risk perception by safety experts. They all differ in the level of risk they are
willing to ‘accept’.
As pointed out by Brooker (2004), there are several methods to derive the TLS. In most
cases, the analysis starts from the current situation and uses an improvement factor to
derive the desired TLS. In some cases, this improvement factor may be established as
a continuing trend from the past translated into the future. It should incorporate traffic
growth factors, factors representing changes in the systems involved, the operational
procedures, and work practices. In other cases, it may be based on a common
agreement between technical experts, with the main idea underlying it being to set
challenging, but still realistic safety improvement targets.
The following sections provide an overview of the most relevant aviation TLS analyses.
The level of diversity between these approaches highlights the complexity of the
problem and the need for a consistent top-down total air transport system approach.
3.4.3.1.1 Joint Aviation Authorities
The Joint Aviation Authorities (JAA) document JAR 25.1309 is one of the main regulatory
documents in aviation. It also defines the fundamental principles that govern aircraft
design and certification. JAR 25.1309 defines the risk of a serious accident due to
“operational and airframe-related causes” to be in the order of one per million hours of
flight. About ten percent of the number of accidents related to operational and airframe
causes is attributed to aircraft equipment failures (e.g. hydraulics and electrical
systems) and the rest (90 percent) to other operational aspects (JAA, 1994). A
EUROCONTROL review of existing TLS standards and practices (EUROCONTROL,
2000a) argues that this requirement is based on data from the 1960s and as such is
outdated. Furthermore, the JAR requirement is related to aircraft design,
encompassing only aircraft equipment, without consideration for the other components
of the air transport system (including ATM). Accordingly, this JAR requirement needs to
be updated to reflect the major changes in the aviation industry since the 1960s. The
following paragraphs outline several key factors that characterise the changes and
growth in aviation since the 1960s.
There has been a rapid expansion in the air transport industry over the last four
decades due to a number of factors, including growth in the world economy,
advancement in flight technology and the deregulation of the airline services. The result
of these forces has been a steady decline in airline costs and passenger fares, which
has further stimulated traffic growth. As an example of economic growth, ICAO cites
that there has been an increase in total gross domestic product (GDP) by a factor of
3.8 over the same period (ICAO, 1997). The GDP is considered to be the most
appropriate available measure of world output and indicates the health of the global
economy.
Changes in flight technology have also had a major effect on the growth in travel
demand. The modern era of air transportation began in the 1960s. The major drive was
the replacement of piston engines with jet engines, which was accompanied by
increased speed, reliability, and comfort. This change led to a reduction in operational
costs, which in turn led to increased travel demand.
In addition to this, changes in the regulatory environment in both the US and Europe
have had a big effect. The deregulation of airline services in the US in 1978 allowed
airlines to improve services, reduce average costs, increase routes, and increase
efficiency of scheduling. In Europe, the introduction of a single market for aviation
services by the European Union in 1992 has brought changes similar to those seen in the
US.
The ICAO Manual on Air Traffic Forecasting (ICAO, 1985) suggests three methods for
forecasting future civil aviation traffic. These methods are trend projection, econometric
analysis, and market and industry survey. Econometric forecasting is the only method
that takes into account various economic, social, and operational factors affecting air
traffic. The objective here is to translate the relevant factors into projections of future
traffic growth. Then the traffic growth factors are reviewed further to incorporate
prospective changes by other factors that are not accommodated in the econometric
analysis.
The predicted traffic growth will influence target safety levels through the increase in
the number of flight hours forecast. However, there are other factors, not necessarily
included in this forecast of traffic growth, that have the potential to influence the level of
safety. Some of these factors are: the growth in the total number of aircraft flying as
well as in the passenger capacity of aircraft (e.g. Airbus 380, Airbus 350, Boeing 7E7
Dreamliner), increased airport and airspace congestion, technological development
(e.g. advanced safety nets, satellite-based CNS/ATM), and pressure on finding the
tools to control and mitigate human error. Another important factor not considered is
the increasing effect of environmental policies on aviation, in particular on air fares,
costs, and restrictions to possible routes.
Therefore, in line with the EUROCONTROL argument, the JAR requirement should be
updated using an analysis based on a more recent data sample of accident rates from the
last four decades. At the same time, future predictions and regulations should be based
on econometric forecasting, which captures the effect of traffic growth as well as
other economic, technical, and operational factors.
3.4.3.1.2 UK Civil Aviation Authority
The UK Civil Aviation Authority (CAA) has calculated a worldwide fatal accident rate
using the Worldwide Aircraft Accident Summary (WAAS) aviation database sample² for
the period 1990-1999 (UK CAA, 2000). The CAA based its analysis on this sample and
the following assumptions (EUROCONTROL, 2005):
• A fixed annual traffic growth rate until the year 2020 (i.e. 4 percent for western
built jets); and
• A constant number of fatal accidents per year (i.e. eight fatal accidents each
year).
Based on these assumptions, the UK CAA predicted a rate of 1.8E-07 fatal accidents
per flight for the year 2020. For the purpose of the methodology presented in this
Chapter, this target has been translated into the rate per flight hour using the
information available on the Boeing web site (Boeing, 2004) as follows. The average
flight in 1982 was approximately 1.4 hours, while in 2002 it was 1.94 hours. If this trend
continues, it is determined in this research that the average flight in 2020 will be 2.43
hours. Using this assumption, the UK CAA’s TLS for the year 2020 corresponds to
7.4E-08 fatal accidents per flight hour.
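The conversion above can be reproduced with a short sketch. It assumes that "if this trend continues" means a straight-line extrapolation through the 1982 and 2002 values, which is an interpretation rather than a method stated in the text, although it does return the quoted 2.43 hours. The function and variable names are illustrative.

```python
def extrapolate_linear(x0: float, y0: float, x1: float, y1: float, x: float) -> float:
    """Extend the straight line through (x0, y0) and (x1, y1) to x."""
    slope = (y1 - y0) / (x1 - x0)
    return y1 + slope * (x - x1)


# Average flight durations quoted in the text (hours).
avg_flight_hours_2020 = extrapolate_linear(1982, 1.4, 2002, 1.94, 2020)

# UK CAA prediction for 2020, expressed per flight.
tls_per_flight = 1.8e-07

# Dividing by the average flight duration gives the rate per flight hour.
tls_per_flight_hour = tls_per_flight / avg_flight_hours_2020

print(f"Average flight in 2020: {avg_flight_hours_2020:.2f} h")
print(f"TLS per flight hour:    {tls_per_flight_hour:.1e}")
```

Under this assumption the average flight in 2020 is about 2.43 hours and the resulting rate rounds to 7.4E-08 fatal accidents per flight hour, in agreement with the values used in this Chapter.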
3.4.3.1.3 International Civil Aviation Organisation
There have been several attempts by ICAO to derive aviation target levels of safety.
These originate from a number of different studies and reports, which are presented
below, from the earliest to the most recent.
² Information published by Flight International (monthly publication of Reed Business
Information Group). Includes accidents and serious incidents worldwide, with the
exception of the Commonwealth of Independent States (CIS, former Soviet Union)
before 1990. The data set covered only commercial aircraft or aircraft with maximum
take-off weight above 5.7 t.
• ICAO North Atlantic Systems Planning Group (NATSPG) - the ICAO NATSPG
initially developed a method using the data on fatal accidents of jet aircraft in
the period from 1959 to 1966 (EUROCONTROL, 2000a). Based on available
data³, this analysis estimated a fatal accident rate of 2.34E-06. The analysis
progressed by assigning a factor of 0.1 for accidents due to collision. The basis for
this assumption is not evident or recorded. An improvement factor of between two
and five was further applied to justify the use of historical data for future targets
(EUROCONTROL, 2000a). This resulted in a TLS ranging from 1.2E-07 to
4.6E-08 fatal accidents per flight hour due to collision. Finally, the analysis
apportioned the value of the TLS to three flight dimensions and thus calculated a
TLS for collision due to loss of lateral separation of between 4E-08 and
1.5E-08 fatal accidents per flight hour;
• ICAO Review of the General Concept of Separation Panel (RGCSP) - in 1995,
the ICAO RGCSP reviewed several approaches to deriving a TLS for ATM and
accepted the one developed by ICAO NATSPG. The RGCSP assumed a total
accident rate from all causes of 1E-07 per flight hour for the year 2010. This
TLS is based upon the NATSPG analysis extrapolated to the year 2010
(Brooker, 2004). Based on the contributions from the US (TLS ranging between
2E-09 and 7E-09) and the USSR⁴, the RGCSP agreed upon a TLS value that
should be used for establishing any vertical minimum performance
specification. This value is equal to or better than 5E-09 fatal accidents per
flight hour arising from collisions due to any cause for the period 2000 to 2010.
This value of the TLS is also indicated in ICAO Annex 11 (ICAO, 2001c);
• ICAO Annex 11 - in the situation where “fatal accidents per flight hour” is
considered to be an appropriate metric, ICAO Annex 11 (ICAO, 2001c)
proposes a TLS of 5E-09 fatal accidents per flight hour per dimension after the
year 2000. Although ICAO Annex 11 does not provide any justification for this
TLS, it is assumed that this value is taken from the ICAO RGCSP. For the
period prior to the year 2000, ICAO Annex 11 recommends the use of a TLS of
2E-08 fatal accidents per flight hour per dimension; and
• ICAO All-Weather Operations Panel (AWOP) - the objective of the ICAO AWOP
was to assess the required navigational performance (RNP) for approach,
landing, and departure phases of flight (ICAO, 1994). Based upon historical
data⁵, ICAO’s calculation determined the average hull loss rate to be 1.87E-06 per
flight or 1.27E-06 per flight hour. Based on this historical data, ICAO proposed a
TLS for hull loss per flight hour of 1E-07. The rationale for this risk
improvement over the historical accident rate is the removal of pilot errors by
the use of glass cockpit aircraft and tunnel incident alarm. The glass cockpit is a
system of electronic displays presenting all information on an aircraft's situation,
position, and progress. The tunnel incident alarm is an alert that is triggered if
the aircraft unintentionally leaves the assigned flight path, the “tunnel”, during
the approach and landing phases of flight. Additionally, the objective in aviation
safety is to reduce the number of accidents despite increasing flight hours. This
is essential if public confidence in aviation is to be maintained as the global air
transport system expands.
³ Based on 36 fatal accidents and an estimate of 15.5 million flight hours during the period
1959-1966.
⁴ The USSR developed a series of targets for progressive implementation, such as 1E-08 from
1990 to 2000, 5E-09 for 2000-2010, and 2E-09 for 2010 onwards (ICAO, 1995).
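The arithmetic behind the NATSPG and AWOP figures quoted above can be retraced with a minimal sketch. All input values are those given in the text and its footnotes; the variable names are illustrative, and the order of operations is an assumption inferred from the description rather than a documented ICAO procedure.

```python
# NATSPG: fatal accident rate quoted in the text.
fatal_accident_rate = 2.34e-06
collision_share = 0.1            # factor assigned to accidents due to collision
improvement_factors = (2, 5)     # range of improvement factors applied
dimensions = 3                   # lateral, longitudinal, vertical

# TLS for collision, for the lower and upper improvement factor.
collision_tls = [fatal_accident_rate * collision_share / f for f in improvement_factors]

# Equal apportionment of the collision TLS between the three dimensions.
per_dimension_tls = [tls / dimensions for tls in collision_tls]

# AWOP: hull loss rate per flight converted with the 1.47 h average flight.
hull_loss_per_flight = 1.87e-06
hull_loss_per_flight_hour = hull_loss_per_flight / 1.47

print([f"{v:.2e}" for v in collision_tls])
print([f"{v:.2e}" for v in per_dimension_tls])
print(f"{hull_loss_per_flight_hour:.2e}")
```

The computed values agree with the quoted ranges to within the rounding used in the source documents.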
3.4.3.1.4 Summary of the various TLS analyses
The previous section has given an overview of the research on aviation TLS which is
summarised in Table 3-2 (based on the information available). This table enables
comparison of the TLS taking into account the source of data, the time period covered
by the data set, the type of accident, the type of aircraft operation, and the TLS unit
used.
Once again the differences in the derivation of TLS should be pointed out. The
summary presented shows the level of discrepancy in the method, data set, and
taxonomies used. The major factors that drive the differences in the calculation of
target levels of safety are:
• Type of accident (accident, fatal accident, hull loss),
• Weight of aircraft involved in the accident,
• Differences in the definitions (i.e. taxonomies used),
• Type of operations analysed: scheduled vs. non-scheduled, commercial vs.
non-commercial (military, freight, general aviation), registered vs. non-registered,
domestic vs. international,
• Type of aircraft included: jets vs. turbo props,
• Time frame of the data set analysed,
• Source of the data,
• Region involved in the analysis (with or without former Soviet Union),
• Targeted year for the TLS calculation: current vs. future levels.
⁵ Data set covers hull loss accidents for the period from 1959 to 1990 for commercial jet
aircraft whose weight exceeds 60,000 lb. Exposure percentages are based on an average
flight duration of 1.47 h. A hull loss accident is defined as an accident where the primary
cause is hull loss or aircraft damage beyond economical repair.
Table 3-2 Summary of various analyses on aviation TLS
Reference | Title | Database | Scope: region/time period | Scope: type of operation/weight/type of accident | Target year | TLS
Joint Aviation Authorities | JAR 25.1309 Large Aeroplanes - Advisory Material - AMJ | Not specified | Worldwide, 1960s | Serious accident | Not specified | 1E-06 per flight hour
UK Civil Aviation Authority | Aviation Safety Review CAP 701 | WAAS | Worldwide, 1990-1999 | Jets & turbo props/MTOW>5,700 kg/fatal accidents | 2020 | 1.8E-07 per flight/7.4E-08 per flight hour
ICAO | North Atlantic Systems Planning Group (NATSPG) | Not specified | Worldwide, 1959-1966 | Jets | Not specified | 2.34E-06 per flight
ICAO | Review of the General Concept of Separation Panel (RGCSP) | Not specified | Not specified | Jets/fatal accidents | 2010 | 1E-07 per flight hour
ICAO | Annex 11 | Not specified | Worldwide | En route fatal accidents | After the year 2000 | 5E-09 per flight hour per dimension (1.5E-08 per flight hour)
ICAO | All-Weather Operations Panel (AWOP) 15th meeting | Not specified | Worldwide, 1959-1990 | Jets/MTOW>60,000 lb/hull loss accidents | Not specified | 1E-07 per flight hour
Key: MTOW = maximum take-off weight of the aircraft
After the review of the most relevant analyses and methods of TLS calculation, a TLS
of 1E-08 accidents per flight hour is used as the baseline for the year 2020 (target year
of the research presented in this thesis). The reasons for using this baseline are:
• The rate of 1E-07 is currently used as a target by ICAO for both fatal accidents
and hull loss accidents (see Table 3-2);
• With the overall aim of reducing the accident rate given the current safety
targets, it is reasonable to aim at 1E-08 accidents per flight hour in the year
2020;
• The analysis conducted by the UK CAA to predict the rate of fatal accidents for
2020 (i.e. 7.4E-08 fatal accidents per flight hour).
Once the TLS for the year 2020 is determined, the next step is to apportion the
contribution of ATC in the overall air transport TLS. To establish this, several studies
have been reviewed. The key findings are presented in the following section.
3.4.4 Target level of safety and Air Traffic Control risk budgeting
The next step is to determine the risk budget allocation for the ATC system as a
component of the overall air transport system, i.e. determine the contribution of ATC.
According to the results of the UK CAA’s analysis, the contribution of ATC and ground
aids to aircraft accidents is 1.7 percent (Table 13 in EUROCONTROL, 2005).
EUROCONTROL currently uses 2 percent as a maximum direct contribution of ATM to
aircraft accidents within the European Civil Aviation Conference (ECAC) region. This
figure was derived based upon historical data (ICAO ADREP database focused on the
ECAC region) from which a contribution of ATC is determined to be 1.1 percent
(EUROCONTROL, 2001a). Recognising that only ATC causes were accounted for
(without the contribution of other ATM components, such as ATS, ASM, ATFM),
EUROCONTROL allowed an additional 0.9 percent, resulting in a 2 percent ATM
contribution to aircraft accidents. This figure has been further validated via discussions
with the EUROCONTROL Safety Regulatory Commission’s task force on the Hazard
Classification Matrix (HCM). EUROCONTROL has defined “the maximum tolerable
probability of ATM directly contributing to an accident of a commercial air transport
aircraft” in the ECAC region to be 1.55E-08 per flight hour (EUROCONTROL, 2001b).
This figure is based on the rate of aircraft accident for the year 1999 (extracted from
ICAO ADREP database focusing on the ECAC region) with direct ATM contribution (2
percent) and a forecast of 6.7 percent increase in the traffic volumes for the period
1999-2015 (EUROCONTROL, 2001a).
In the Netherlands, a study by the National Aerospace Laboratory (NLR) used a sample of
civil aircraft accidents that occurred worldwide during the period 1980-1999, mostly
based on the ICAO database (van Es, 2003). This study determined that ATM-related
accidents represent 8 percent of the total number of accidents. Additionally, 28 percent
of these ATM-related accidents are directly caused by ATC, which makes the ATC
contribution to aircraft accidents approximately 2.2 percent. The difference in the
contribution of ATC in these two studies is due to the difference in classification of
causal factors. While the UK CAA analysis divided all underlying factors into primary,
causal, and circumstantial groups, the NLR analysis followed the recommendation by
ICAO and did not use this distinction. The NLR study considered an occurrence as a
causal factor only if that occurrence was part of the chain of events leading to the
accident. The NLR approach seems to reflect better the aim of determining the overall
ATC contribution to aircraft accidents.
The results presented above need to be augmented to allow for possible statistical error
and uncertainties linked to the reporting processes, as well as to provide additional
protection for the future. As previously discussed, EUROCONTROL allowed an additional
0.9 percent for statistical error and uncertainties in the calculation of the ATM safety
targets for the ECAC region, based upon historical data for only one component of ATM,
namely ATC (EUROCONTROL, 2001a). With this in mind, together with the results
from the UK CAA and NLR studies, this thesis uses a maximum contribution of ATC of 3
percent. Thus, using the TLS for the air transport system established in the previous
section for the year 2020 (1E-08 per flight hour), the apportioned contribution of ATC is
considered to be 3E-10 per flight hour. Having derived the TLS for ATC specifically, this
functional block should be divided between human operators, equipment, and procedures.
This approach gives the opportunity to define the appropriate risk induced by failure of
ATC equipment, which is presented in the next section.
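The apportionment described in this section can be summarised in a short sketch. The percentages are those quoted from the UK CAA, EUROCONTROL, and NLR studies; the variable names are illustrative, and the 3 percent ceiling is the value adopted in this thesis rather than a figure from any of the cited sources.

```python
# ATC (or ATM) contribution to aircraft accidents reported by each study.
nlr_atm_share = 0.08        # ATM-related accidents as a share of all accidents
nlr_atc_share_of_atm = 0.28 # share of ATM-related accidents directly caused by ATC

reported_shares = {
    "UK CAA (ATC and ground aids)": 0.017,
    "EUROCONTROL (ATM, incl. 0.9 percent allowance)": 0.011 + 0.009,
    "NLR (ATC)": nlr_atm_share * nlr_atc_share_of_atm,
}

# Values adopted in this thesis.
aviation_tls_2020 = 1e-08   # accidents per flight hour
atc_max_share = 0.03        # maximum ATC contribution

atc_tls_2020 = aviation_tls_2020 * atc_max_share

for source, share in reported_shares.items():
    print(f"{source}: {share:.1%}")
print(f"ATC risk budget for 2020: {atc_tls_2020:.0e} per flight hour")
```

All three reported contributions fall below the adopted 3 percent ceiling, which yields an ATC risk budget of 3E-10 per flight hour.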
3.4.5 Target level of safety and Air Traffic Control equipment risk budgeting
It is important to determine the contribution of equipment (or their failure or malfunction)
to the ATC risk budget. The historical data on the proportion of incidents in which
equipment failure is implicated varies to a certain degree. Interviews with system
control and monitoring staff at two European ATC Centres⁶, as well as the
approximation used by the CORA 2 documentation (EUROCONTROL, 2004c), reveal
that equipment failures are the causal factor in one percent (0.01) of all incidents.
Although this assumption is based on the ATM system and not its ATC component
only, it is used with other sources of information to inform the ATC equipment risk
budgeting within the overall air transport system.
⁶ Based upon private communications with staff at two European Area Control Centres (ACCs).
A more focused approach is provided by the NLR study (van Es, 2003). This study
determined that the particular causal factor ‘ATC ground aid malfunction or unavailable’
has been attributed to 5 percent of all ATM related accidents or 18 percent of all ATC
related accidents. It should be noted that this causal factor includes ‘unavailable’ ATC
equipment, meaning equipment that was taken out of service by ATC staff, presumably
for maintenance reasons. In addition, the research was based on data samples that
incorporated older systems with lower levels of automation. Future systems are shifting
towards a higher level of automation and higher reliability, as discussed in the
previous Chapter.
Therefore, it can be approximated that equipment failures represent the causal factor in
10 percent of all ATC related accidents (or 3 percent of all ATM related accidents). This
is based on the assumption that unscheduled failures constitute about 50 percent of
the failures in the NLR analysis discussed above. This approach derives a risk of an
ATC equipment failure leading to an aircraft accident of 3E-11 per flight hour. The
reasoning presented seems to correlate with the widespread argument that human
error represents the causal factor in 70-80 percent of all accidents (Reason, 1997),
although there is some evidence that the majority of these human errors represent
organisational errors (Johnson and Holloway, 2004). A graphical representation of the
determined risk budgets is given in Figure 3-5.
Figure 3-5 Aviation TLS and risk budgeting
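The step from the NLR percentages to the equipment risk budget can be retraced as follows. This is a sketch under the assumptions stated in the text (about half of the NLR 'malfunction or unavailable' cases are unscheduled failures, and the results are rounded up to 10 and 3 percent); the variable names are illustrative.

```python
# NLR shares for 'ATC ground aid malfunction or unavailable'.
share_of_atc_accidents = 0.18
share_of_atm_accidents = 0.05

# Assumption used in the text: about half of these are unscheduled failures.
unscheduled_fraction = 0.5

failure_share_of_atc = share_of_atc_accidents * unscheduled_fraction  # about 9 percent
failure_share_of_atm = share_of_atm_accidents * unscheduled_fraction  # about 2.5 percent

# Rounded values adopted in the text, applied to the ATC risk budget for 2020.
adopted_share_of_atc = 0.10
atc_tls_2020 = 3e-10  # per flight hour
equipment_tls_2020 = atc_tls_2020 * adopted_share_of_atc

print(f"Unscheduled failures: {failure_share_of_atc:.1%} of ATC related accidents")
print(f"Unscheduled failures: {failure_share_of_atm:.1%} of ATM related accidents")
print(f"ATC equipment risk budget for 2020: {equipment_tls_2020:.0e} per flight hour")
```

The unrounded shares of 9 and 2.5 percent show that the adopted values of 10 and 3 percent are slightly conservative roundings.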
After assessing the contribution of ATC equipment failures to the overall risk of aircraft
accident, it is important to validate these findings with some operational experience.
This is achieved in the following section by analysis of operational failure reports from
three countries.
3.5 Preliminary analysis and validation of operational failure reports
The previous sections described the process of deriving an overall aviation TLS for the
reference year 2020 and further risk budgeting for ATC equipment. In order to justify
the use of the available sample of operational reports in this thesis, this sample is
validated by the proposed TLS methodology. This is presented in the following
paragraphs.
Given the accident rate for the year 2000 (EUROCONTROL, 2005) and the predicted
accident rates for the years 2010 (1E-07; Brooker, 2004) and 2020 (1E-08, used in this
research), it is apparent that future safety levels are predicted to improve tenfold every
decade. This is in line with the attempts of various aviation institutions to significantly
improve future aviation safety levels (e.g. FAA, ICAO). The next step is to apply
the established rate of improvement to ATC equipment failures.
Using the same analogy and the ratios within an air transport system, as presented in
Figure 3-5, it is possible to translate the 2020 rate of ATC equipment contribution to
aircraft accidents to the present levels (i.e. 2000). The calculation presented in section
3.4.5 showed that for the year 2020 this effect is of the order of 3E-11 per flight hour.
Using the reverse logic, this effect corresponds to a level of 3E-09 for the year 2000. In
other words, based on past research and the established ratios, the contribution of
equipment failures to the overall safety of the air transport system in the current period
is of the order of 3E-09 per flight hour.
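The reverse extrapolation can be expressed as a small function. It assumes, as in the text, a constant tenfold improvement per decade; the function name and signature are illustrative only.

```python
def rate_in_year(rate_ref: float, year_ref: int, year: int,
                 factor_per_decade: float = 10.0) -> float:
    """Scale a rate from year_ref to year, assuming a constant improvement per decade."""
    decades_earlier = (year_ref - year) / 10
    return rate_ref * factor_per_decade ** decades_earlier


# Overall aviation TLS, anchored on the 2020 baseline of 1E-08 per flight hour.
aviation_tls = {year: rate_in_year(1e-08, 2020, year) for year in (2000, 2010, 2020)}

# ATC equipment contribution translated from 2020 back to 2000.
equipment_2000 = rate_in_year(3e-11, 2020, 2000)

for year, rate in aviation_tls.items():
    print(f"{year}: {rate:.0e} accidents per flight hour")
print(f"ATC equipment contribution in 2000: {equipment_2000:.0e} per flight hour")
```

With these inputs the function returns 1E-06, 1E-07, and 1E-08 for the overall TLS in 2000, 2010, and 2020, and 3E-09 per flight hour for the equipment contribution in 2000.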
Having established the contribution of equipment failures to the overall safety of the air
transport system based on past research, it is necessary to calculate the same value
using the available operational failure reports. The conformance of ATC equipment
budgeting obtained from past research and available failure reports would indicate that
the available sample is representative of equipment failures occurring in the operational
ATC environment.
Firstly, it is important to discuss the overall commercial air transport accident rates for
the three countries analysed. These rates are slightly higher than the worldwide
average (1E-06 per flight hour; see Figure 3-5), ranging from 1E-05 and 9E-06 aircraft
accidents per flight hour). Secondly, it is necessary to discuss the available sample of
operational failure reports by focusing on the frequency of equipment failure reports per
Chapter 3 Preliminary Assessment
66
year and per source. The incident reports used in this section were from three sources,
namely three Civil Aviation Authorities (CAAs), presented as Country A (for the period
1999 to 2003), Country B (for the period 2001 to 2005), and Country C (for the period
1992 to 2004). The final results of this preliminarily analysis of available operational
reports are presented in Table 3-3. The average number of failures is calculated for all
three data sets (column 4). This is followed by the calculation of incident rates based
on the average flight hours flown for the given time periods (column 5). The final step
involved adjustment of the calculated incident rate to give the probability of an accident
caused by equipment failure (using the accident to incident ratio of 1 in 10,000) as
shown in the last column of Table 3-3. In other words, this calculation produced the
operational level of safety for the three countries and their respective time periods.
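Table 3-3 Analysis of operational failure reports and results
Country | Year | Total number of equipment failures reported | Average number of equipment failures per year | Rate of failure - incident (per flight hour) | Rate of failure - accident (per flight hour)
(1) | (2) | (3) | (4) | (5) | (6)
A | 1999 | 100 | 158.2 | 1.15E-04 | 1.15E-08
A | 2000 | 107 | | |
A | 2001 | 122 | | |
A | 2002 | 287 | | |
A | 2003 | 175 | | |
B | 2001 | 184 | 264.8 | 2.58E-04 | 2.58E-08
B | 2002 | 237 | | |
B | 2003 | 171 | | |
B | 2004 | 247 | | |
B | 2005 | 485 | | |
C | 1992 | 28 | 34.46 | 8.85E-05 | 8.85E-09
C | 1993 | 38 | | |
C | 1994 | 41 | | |
C | 1995 | 21 | | |
C | 1996 | 16 | | |
C | 1997 | 42 | | |
C | 1998 | 40 | | |
C | 1999 | 25 | | |
C | 2000 | 38 | | |
C | 2001 | 27 | | |
C | 2002 | 46 | | |
C | 2003 | 42 | | |
C | 2004 | 44 | | |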
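As an illustrative check of columns 4 and 6 of Table 3-3, the following Python sketch recomputes the average number of reported failures per year and applies the accident to incident ratio of 1 in 10,000 to the incident rates. The variable and function names are hypothetical and are not part of the original analysis. The flight hours behind column 5 are not reproduced here, so the incident rates are taken directly from the table rather than recalculated.

```python
# Yearly counts of reported equipment failures, copied from column 3 of Table 3-3.
reported_failures = {
    "A": [100, 107, 122, 287, 175],
    "B": [184, 237, 171, 247, 485],
    "C": [28, 38, 41, 21, 16, 42, 40, 25, 38, 27, 46, 42, 44],
}

# Incident rates per flight hour, copied from column 5 of Table 3-3.
incident_rate = {"A": 1.15e-04, "B": 2.58e-04, "C": 8.85e-05}

# Accident to incident ratio of 1 in 10,000, as stated in the text.
ACCIDENT_TO_INCIDENT_RATIO = 1 / 10_000


def average_failures_per_year(counts):
    """Column 4: mean number of reported failures per year."""
    return sum(counts) / len(counts)


def accident_rate(incident_rate_per_flight_hour):
    """Column 6: incident rate scaled by the accident to incident ratio."""
    return incident_rate_per_flight_hour * ACCIDENT_TO_INCIDENT_RATIO


averages = {
    country: round(average_failures_per_year(counts), 2)
    for country, counts in reported_failures.items()
}
accident_rates = {
    country: accident_rate(rate) for country, rate in incident_rate.items()
}

for country in reported_failures:
    print(country, averages[country], f"{accident_rates[country]:.2E}")
```

The averages and accident rates obtained in this way can be compared directly with columns 4 and 6 of the table.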
Compared with the contribution of equipment failures to the overall safety of the air transport
system derived from past research and the overall TLS methodology (3E-09 per flight
hour), the accident rates obtained from operational reports (last column in Table 3-3;
8.85E-09 to 2.58E-08 per flight hour) show a degree of conformity, lying within one order
of magnitude of the theoretical value.
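One possible way to quantify this degree of conformity is to express the gap between each operationally derived accident rate and the 3E-09 reference value in orders of magnitude. The sketch below is illustrative only: the function name and the use of a base-10 logarithm as the measure of agreement are assumptions made here, not a method taken from the thesis.

```python
import math

# Reference contribution of ATC equipment failures for the year 2000 (per flight hour).
REFERENCE_RATE = 3e-09

# Accident rates per flight hour from the last column of Table 3-3.
observed_rates = {"A": 1.15e-08, "B": 2.58e-08, "C": 8.85e-09}


def orders_of_magnitude_apart(observed_rate, reference_rate=REFERENCE_RATE):
    """Signed gap between observed and reference rates, in powers of ten."""
    return math.log10(observed_rate / reference_rate)


gaps = {
    country: orders_of_magnitude_apart(rate)
    for country, rate in observed_rates.items()
}

for country, gap in gaps.items():
    print(f"Country {country}: {gap:.2f} orders of magnitude above the reference")
```

A gap between 0 and 1 means that the observed rate is above the reference value but by less than a factor of ten.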
Even closer conformity would be achieved by setting a higher TLS
for the year 2000 (the data indicate 1E-05, as opposed to the 1E-06 accepted within the aviation
community). Furthermore, better tuning of the current and future trade-offs within the
air transport system (see Chapter 2, Figures 2-1 and 2-3) would additionally enhance
the proposed methodology for determination of risk budgeting of the ATC equipment.
Future advancements in technology, changes in the levels of traffic, and overall
changes in the ATC/ATM philosophy (e.g. shifting of separation responsibility from the
ground to the air) have the potential to improve safety. At the same time, it is reasonable
to assume that the distribution of the levels of risk within the air transport system will
change. The results specific to ATC given here could be used as an input to a
complete safety analysis that should consider trade-offs between the various
components of the aviation system to realise risk budgets for a safe and cost effective
system. Finally, the severity of the reported incidents could be used to inform the
weighting scheme and to better reflect the accident to incident ratio, as the above
analysis considered all incidents equally.
In short, the above analysis indicates that the available operational failure reports are a
representative sample of equipment failures occurring in ATC Centres worldwide.
Having established the appropriateness of this sample, the following Chapter moves
toward the identification of operational characteristics of equipment failures extracted
from past research and operational failure reports.
3.6 Summary
This Chapter starts with a precise definition of equipment failures and hazards,
representing a sub-group of equipment failures that require human intervention (or
human recovery). It continues by presenting a sample of operational failure reports
available in this research. After a discussion of the reporting schemes designed to
capture incident occurrences, including equipment failures, the Chapter continues by
highlighting data pre-processing problems and solutions applied to overcome them. In
order to assure the relevance of equipment failures captured in the sample available,
the remainder of the Chapter builds a framework for its validation. This framework for
risk assessment, based entirely on past literature, begins from the risk assessment of
the overall air transport system and focuses on one component, namely ATC
equipment. In other words, this section determines the maximum allowed accident risk
imposed by ATC equipment failures for the target year 2020.
The contribution of equipment failures to the overall safety of the air transport system
extracted from past literature has then been compared with the result obtained from
the analysis of the available sample. This analysis showed a degree of agreement between
the theoretically assumed and operationally extracted levels of ATC equipment risk
budgeting. In other words, the available operational failure reports are a representative
sample of equipment failures occurring in the operational ATC environment. Hence, the
next Chapter proceeds with a detailed assessment of the equipment failure
characteristics extracted from operational failure reports and available literature.
Chapter 4 Equipment Failures in ATC
4 Equipment Failures and Technical Defences in Air Traffic Control
The previous Chapter showed that operational failure reports available in this thesis
constitute a representative sample of equipment failures occurring in the operational Air
Traffic Control (ATC) environment. This Chapter moves toward the identification of the
operational characteristics of equipment failures. These are extracted from past
research and more than 20,000 operational failure reports. Special attention is paid to
the impact that equipment failures may have on ATC operations, and as a result a
severity rating scheme has been designed to support the research presented in this
thesis. Having discussed the consequences of equipment failures and their impact on
ATC operations, it is important to discuss how such consequences can be prevented or
mitigated. This involves the process of recovery from equipment failure and a
distinction can be made between technical and human recovery. This Chapter
discusses technical recovery by reviewing the existing technical built-in defences,
whilst the next Chapter discusses human (i.e. controller) recovery. A subset of
equipment failure characteristics relevant to ATC operations is then used in this
Chapter to develop a novel tool for the assessment of the severity of equipment
failures, known as the qualitative equipment failure impact assessment tool. This tool
enables an assessment of the overall impact of an equipment failure on ATC
operations.
4.1 Equipment failure characteristics
When dealing with any type of equipment failure, it is important to understand its
underlying characteristics. In other words, it is important to take into account issues like
causes, consequences, duration, and complexity. Thus, a detailed hazard analysis
would capture the most important characteristics of a failure and the context
surrounding its occurrence (Leveson, 1995). The following sections explain several
important failure characteristics:
• ATC functionality affected;
• Complexity of failure type;
• Time course of failure development;
• Duration of failure;
• Potential causes of equipment failure; and
• Consequences of equipment failure.
The consequences of equipment failures are discussed on several different levels,
ranging from their impact on the individual (i.e. the air traffic controller), the operations
room, the ATC system, and the impact they have on the overall ATM system.
4.1.1 ATC functionality affected
The methodology adopted in this thesis for the classification of ATC functionalities
results in a nine-category classification (Chapter 2, section 2.3). Several examples of
the equipment failures related to different ATC functionalities are presented in Table 4-1.
These examples are randomly selected and de-identified from operational failure
reports available in this research, as discussed previously in Chapter 3.
Table 4-1 Examples of equipment failures related to different ATC system functionalities (as defined in Chapter 2)
Type of failure | Example
Communication function | Total radio telephony failure on three frequencies (three sectors). Workstation had to be reset to default fallback setting.
Navigation function | Runway 15 Instrument Landing System (ILS) failed whilst aircraft on 16 NM final approach in Instrument Meteorological Conditions (IMC). Approach Control Centre was advised and aircraft confirmed the failure. Aircraft was preparing for a missed approach, when the ILS returned to service after recovery.
Surveillance function | Erroneous altitude readings displayed on radar for B777 and B767 at FL340 and FL350, respectively. Short term conflict alert (STCA) was activated.
Data processing function | Triple failure on suite flight data exchange. System fully recovered after 40 min by manual intervention. Departures from two airports were stopped for approximately 10 min. The cause was the existence of duplicate flight identity numbers within the flight data held in the affected workstations.
Supporting function | B737 was on the final approach at 50 ft over the runway when the controller received a false Approach Monitoring Aid (AMA) warning. The controller was concerned that in low visibility conditions a go-around would have been unnecessarily given.
Safety nets (SNET) | STCA failed to activate against two aircraft at FL120. One aircraft was dropping parachutes, with the other filming them. Consequently, the aircraft were quite close to each other. They were both squawking Secondary Surveillance Radar (SSR) codes, but Short term Conflict Alert (STCA) failed to activate.
Power supply | At time 0535 power failure in the tower caused Radar Data Processing System (RDPS) and Flight Data Processing System (FDPS), radar, public telephone network, weather radar, and computer failure. At time 0650 position rebooted and upgraded. ATC service returned to normal at 0730.
Pointing and input devices | Cursor frozen in global ops field of electronic flight strip. The controller was moved to an adjacent console and resumed operations from that position. There was only a brief interruption to the service.
System monitoring and control function | At 0215 the ATC system suffered a significant slowdown. The System Monitoring (SMS) shut itself down.
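For readers who wish to code failure reports against this classification, the nine functionality categories of Table 4-1 could be represented as an enumeration. The following Python sketch is purely illustrative: the class names, field names, and the shortened report summaries are hypothetical and do not reflect how the reports were actually stored or processed in this research.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class ATCFunctionality(Enum):
    """The nine functionality categories listed in Table 4-1."""
    COMMUNICATION = "Communication function"
    NAVIGATION = "Navigation function"
    SURVEILLANCE = "Surveillance function"
    DATA_PROCESSING = "Data processing function"
    SUPPORTING = "Supporting function"
    SAFETY_NETS = "Safety nets (SNET)"
    POWER_SUPPLY = "Power supply"
    POINTING_AND_INPUT = "Pointing and input devices"
    SYSTEM_MONITORING = "System monitoring and control function"


@dataclass(frozen=True)
class FailureReport:
    """Hypothetical minimal record for one de-identified failure report."""
    functionality: ATCFunctionality
    summary: str


def count_by_functionality(reports):
    """Number of reports per functionality category."""
    return Counter(report.functionality for report in reports)


# Shortened summaries based on three of the examples in Table 4-1.
reports = [
    FailureReport(ATCFunctionality.COMMUNICATION,
                  "Total radio telephony failure on three frequencies"),
    FailureReport(ATCFunctionality.SAFETY_NETS,
                  "STCA failed to activate against two aircraft at FL120"),
    FailureReport(ATCFunctionality.POWER_SUPPLY,
                  "Power failure in the tower"),
]

counts = count_by_functionality(reports)
for functionality, n in counts.items():
    print(functionality.value, n)
```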
4.1.2 Complexity of failure type
Failures can be single or multiple component failures (Wickens et al., 1998). A single
failure can be total or partial affecting only one piece of equipment or one of its
components. Multiple component failures can be independent of each other (which can
make the process of diagnosis very difficult) or dependent failures (common cause,
common mode, or cascade failures) (Mauri, 2000). Common cause failures occur when
a single cause creates simultaneous (or near simultaneous) multiple failures (e.g. due
to fire, loss of power, or software bug). Common mode failures are a subset of common
cause failures whose observed effect on the system is identical. Cascade failures are
dependent failures that affect redundant components by shifting their load sequentially
(e.g. power grids or servers). Once the first level of redundancy is pushed beyond its
capacity (e.g. transformer), the load will be shifted onto the next redundant component
until all redundancies are exhausted (Mauri, 2000).
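The cascade mechanism described above can be sketched in a few lines of Python. This is a deliberately simplified model, assuming that the whole load moves to the next redundant component as soon as the current one is pushed beyond its capacity. The function name, the load units, and the capacity values are hypothetical.

```python
def cascade(load, capacities):
    """Shift a load sequentially across redundant components.

    Returns a tuple (carrier, failed): the index of the component that ends
    up carrying the load (or None if all redundancies are exhausted) and the
    list of component indices that failed along the way.
    """
    failed = []
    for index, capacity in enumerate(capacities):
        if load <= capacity:
            # This component can carry the load, so the cascade stops here.
            return index, failed
        # Component pushed beyond its capacity: it fails and the load moves on.
        failed.append(index)
    # Every redundant component has failed.
    return None, failed


# Three redundant components with hypothetical capacities.
print(cascade(120, [100, 110, 150]))  # load is finally carried by the third component
print(cascade(200, [100, 110, 150]))  # all redundancies are exhausted
```

In the second call no component can carry the load, which corresponds to the situation in which all redundancies are exhausted.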
4.1.3 Time course of failure development
In terms of time course of failure development, there are sudden, gradual, or latent
failures. With sudden failures, the operator does not have much time to prepare for
recovery, but at the same time there is the potential advantage of immediate detection
of the failure. Contrary to this, gradual failures may degrade system capabilities in ways
that are not apparent to the operator (e.g. gradual loss of data integrity). This makes
failure detection, and therefore technical and human recovery, extremely difficult. Latent
failures are generally difficult to detect. They exist in the system unnoticed
until some other failure or unusual occurrence reveals them
(Wickens et al., 1998). As a result, this group of failures is
considered separately, as the time course of their initial development is not known, i.e.
these failures could initially occur either suddenly or gradually.
4.1.4 Duration of failure
Duration of failure is defined as the time from the first log of the event (which corresponds
closely to failure detection) until its final closure. Applied to a specific failure, it can
carry important information on recovery and its impact on ATC, ATM, and overall
aviation safety. The categories defined in this research are based on the evidence from
the available operational failure reports. Their analysis indicates the distribution of
failure duration which corresponds to the following categories (section 4.4.6):
• Short period of time - order of magnitude is in minutes;
• Moderate period of time - order of magnitude is in minutes up to one hour; and
• Substantial period of time - order of magnitude is in hours (it can extend to days).
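The definition of failure duration and the three categories above can be illustrated with a short Python sketch. The one-hour limit follows the wording of the categories, whereas the boundary between 'short' and 'moderate' is not stated numerically here, so the 10-minute threshold below is an assumption made only for illustration. The function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

# ASSUMPTION: the text gives no numeric boundary between 'short' and 'moderate'.
SHORT_LIMIT = timedelta(minutes=10)
# The 'moderate' category extends up to one hour, as listed above.
MODERATE_LIMIT = timedelta(hours=1)


def failure_duration(first_log, final_closure):
    """Time from the first log of the event until its final closure."""
    return final_closure - first_log


def duration_category(duration):
    """Assign a duration to one of the three categories."""
    if duration < timedelta(0):
        raise ValueError("final closure precedes the first log")
    if duration <= SHORT_LIMIT:
        return "short"
    if duration <= MODERATE_LIMIT:
        return "moderate"
    return "substantial"


# Power supply example from Table 4-1: failure at 0535, service normal at 0730.
# The calendar date is a placeholder, as the reports are de-identified.
duration = failure_duration(datetime(2000, 1, 1, 5, 35), datetime(2000, 1, 1, 7, 30))
print(duration, duration_category(duration))
```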
4.1.5 Potential causes of equipment failures
The causes of equipment failures come from three interacting sources. These are:
• Technical faults as defects or anomalies built into the system or its components;
• Human errors or violations as acts of omission or commission by the designer,
constructor, controller, engineer, or maintenance personnel that might result in a
failure; and
• External factors or unfortunate, unforeseen, or uncontrolled events, such as severe
weather, fire, accidents, vandalism, sabotage, or terrorism.
The listed causes of failures represent only the first layer of causation. Further analysis
might reveal the existence of organisational error, organisational loss of control, or
failure to anticipate all hazardous conditions and prepare appropriate defences against
them. As an example, the impact of a power outage should be anticipated by
management and consequently appropriate preventive strategies should be
implemented. Similarly, the threat of either terrorism or vandalism should be guarded
against through the provision of adequate internal security measures.
There are various techniques designed to investigate technical faults, human error, and
organisational error. For technical faults, Fault Trees (FT), Event Trees (ET), and
Probabilistic Safety Assessment (PSA) are mostly applied (Brooker, 2006); human
error is investigated by a range of Human Reliability Assessment (HRA) techniques
which are discussed in more detail in Chapters 7 and 8. Finally, organisational errors
are mostly investigated using the Reason model (Reason, 1997), the Human Factors
Analysis and Classification System-HFACS (Shappell, 2000), or qualitative principles
behind a safety culture (Sorensen, 2002).
After a brief discussion of these five failure characteristics, the next section discusses the
potential consequences of equipment failures. The consequences of equipment failures
are discussed at several levels, from their impact on the individual (i.e. the controller),
the operations room, the ATC system, concluding with their impact on the ATM system
as a whole.
4.2 Consequences of equipment failure
Equipment failures that penetrate existing technical built-in defences and hence affect
controller performance (called hazards) are the main objective of the research
presented in this thesis. Therefore, the consequences of these failures are initially
assessed at the level of the controller, followed by the operations room, a given
airspace (i.e. the impact on ATC operations), and finally at regional level (i.e. the
impact on ATM operations).
4.2.1 Impact on air traffic controller
The impact of equipment failures on controller performance represents the focus of this
thesis, and as such will be assessed in detail in the following Chapters. One equipment
failure occurrence in the Lisbon ATC Centre highlights the impact that equipment
failures could have on the controller (Sampaio and Guerra, 2004). In this very busy
sector, a sudden failure of the Radar Data Processing System (RDPS) affected only
one radar track. This failure went unnoticed for 21 minutes until a traffic advisory by the
cockpit-based Traffic Alert and Collision Avoidance System (TCAS) triggered an action by
the controller. The controller did suspect a problem prior to the TCAS alert, but
focused only on possible human error in the input of relevant data (i.e. SSR code).
Unfortunately, the controller never considered the possibility of an equipment failure.
Post-incident investigation revealed that the cause of this failure was incompatibility of
the software developed for the installed radar with the software of the main ATC
system. However, the same investigation did not reveal why this failure affected only
one radar track and not all tracks informed by the same radar. This particular example
highlights how complex and severe an equipment failure can be.
4.2.2 Impact on operations room
The impact of equipment failures on the entire ATC operations room depends entirely
upon the failure characteristics in terms of the number of equipment/positions affected.
Another important factor is the overall ATC Centre architecture, since exposure to
failure varies greatly based on the interconnectivity of different equipment, the level of
separate channels (redundancy/variability), and failure complexity (single failure vs.
multiple failures). Based on operational experience (NATS, 2002) and ATC operations
room configuration, four categories can be differentiated. These categories range from
an impact on the entire operations room, through several sectors, to only one sector. The
categories are defined as follows:
• All workstations/all sectors affected;
• A number of workstations/different sectors affected;
• Several workstations (within same suite)/one sector affected; and
• One workstation/one sector affected.
The proposed categorisation by NATS follows the severity of the impact of failures on
the operations room starting with the most severe failure (known as outage) to the least
severe type of failure (affecting only one workstation). In addition, each ‘suite’ is
responsible for a specific portion of airspace (i.e. sector) whilst each sector has a
declared capacity (expressed in terms of the number of aircraft in the sector in the peak
hour). As a result, the failure characteristic ‘impact on operations room’ is linked with
the number of aircraft exposed to the impact of equipment failure.
4.2.3 Impact on ATC operations
The impact of equipment failures on Air Traffic Control (ATC) service provision should
incorporate effects from an operational, safety, and financial perspective. In terms of
ATC operation, equipment failures could result in an inadequate ATC service, leading
for example to unexpected or increased delays in service provision (aircraft performing
holding procedures due to a failure of the Instrument Landing System – ILS during the
landing phase of flight), delayed arrivals/departures, and limitations in capacity due to
traffic flow restrictions or stopped departures/arrivals.
From the safety perspective, failures generate unavailability of certain ATC functions.
They also generate increased workload as a result of unexpected and highly stressful
failure occurrences increasing the potential for incident/accident occurrence. Vitally,
safety could be jeopardised by any type of data integrity equipment issue when the
equipment provides timely but inaccurate information. On such occasions, an
equipment failure could go undetected for some time (see the example discussed in
section 4.2.1). All of these, combined with inadequate or insufficient training, the
absence of recovery procedures, and a lack of experience may create the potential for
controller error.
From a financial perspective, equipment failures create planned and unplanned costs
of repair, training (of both controllers and technicians), and incident investigation.
However, the most likely costs are measured in terms of additional costs placed on
airlines in the case of significant delays (e.g. loss of connecting flights and passenger
accommodation). These are discussed further in the next section.
Ideally the combination of all three consequences of an equipment failure should
constitute the overall impact on ATC operations or the particular failure’s ‘severity’.
However, in the operational environment the most usual practice is to combine safety
and the operational impact of an equipment failure to determine its severity rating. The
following paragraphs review severity ratings defined specifically for equipment failure
occurrences. They originate from safety regulations defined in two Air Navigation
Service Providers (ANSPs) and one Civil Aviation Authority (CAA).
The UK National Air Traffic Services (NATS) recognises four categories of failure types
based on their impact on ATC operations, namely major impact, impact on workstation
or suite, ATC impact, and minimal impact (Table 4-2). Furthermore, analysis of
operational failure reports in this thesis identified the severity categorisation from one
CAA (referred to as Country C) and another ANSP (referred to as Country D). The CAA
of Country C defines the severity rating of equipment failures according to the potential
to cause a significant problem (see Table 4-3).
Table 4-2 UK NATS severity rating (from NATS, 2002)
Severity | Definition
Major impact to Ops room | Severe flow restrictions could be required
Impact to workstation/suite | May be necessary to combine/move positions immediately or sector flow restrictions may be required
ATC impact | Not immediately critical, will have greater operational impact over time
Minimal impact | Centre management required
Table 4-3 Country C’s severity rating as defined by its CAA
Severity | Factor | Definition
CR | Critical | An occurrence or deficiency that caused, or on its own had the potential to cause, loss of life or limb.
MA | Major | An occurrence or deficiency involving a major ATC system component that caused, or had the potential to cause, significant problems to the function or effectiveness of that system.
MI | Minor | An isolated occurrence or deficiency not indicative of a significant ATC system problem.
Finally, the data for Country D originate from one particular ATC Centre. This Centre
determines the severity of an incident as a result of the combination of the impact it has
on both the controllers (internally in this ATC Centre as well as externally in other ATC
units) and system control and monitoring engineers. In general, in this particular ATC
Centre the determination of the severity of an incident is the task of the system control
and monitoring unit, which distinguishes five severity classes. These are presented in
Table 4-4.
Table 4-4 Country D severity rating as defined by the particular ATC Centre
Severity | Factor | Definition
1 | System down | A system outage affecting the total of ATC services provided
2 | Critical | An error severely affecting a single or few random working positions or a single external service or an error on a “first” standby system.
3 | Urgent | An error affecting part of a single or few random working positions or part of an external service or an error on a backup system reducing backup capacity.
4 | Important | An error affecting a supportive service or a system for which automatic backup is available.
5 | Enhancement | An error having no direct operational impact and only slight non-operational impact.
These severity rating schemes indicate that each country follows its own severity index.
Furthermore, there is a difference in severity ratings between ANSPs and CAAs, as
ANSPs are concerned about the impact on their service provision business (e.g.
delays), whilst safety regulators are concerned about whether such an event causes an
accident. Therefore, simply comparing the severity of occurrences between countries is
unlikely to produce useful findings. All classifications are rather qualitative and depend
upon experience and judgement, which always involves a degree of subjectivity. As a
result, it is necessary to define a unique severity classification for the entire dataset
available in this study corresponding to the existing equipment failure severity ratings
(UK NATS, Country C, and Country D). Consistent with operational practice, the
severity rating defined in the following paragraphs combines safety and operational
impact of equipment failures, while disregarding the financial aspect due to lack of
data. Since the focus of this thesis is on the impact of equipment failures on ATC
operations (including its impact on controller performance), the exclusion of the
financial aspect of severity rating does not have a detrimental effect on this severity
rating and the subsequent quality of data analyses.
The result is a three-level severity rating (major, moderate, and minimal) of equipment
failures based on their impact on ATC operations, as would be appreciated by the
controller (Table 4-5). It is important to highlight that this severity categorisation is
based on the exposure of an ATC Centre to the failed equipment (affecting the entire
ATC Centre, a number of workstations, or only the backup system) regardless of the
type of service provided by the affected ATC Centre. The significant difference in the
level of detail in the reports and the overall need for a consistent approach led to the
exclusion of the type of ATC service in the overall severity categorisation. This
characteristic is accounted for later on in the thesis through the assessment of the
recovery context surrounding an equipment failure occurrence. As a result, this
exclusion does not have a detrimental effect on the severity rating and the
subsequent quality of data analyses. In general, the severity rating is based on the
failure type, available contextual conditions of the failure occurrence, and its impact on
ATC operations.
Table 4-5 Severity rating defined in this research and mapped with available sources
Severity rating in this research | Definition of the severity rating in this research | Mapping with severity ratings from available research
Major | Definition: This type of failure may cause severe disruptions on every workstation. It may require immediate traffic flow restrictions to contain workload to manageable levels, which are safe for sustained ongoing operations. Examples: loss of main Flight Data Processing System (FDPS), total voice communication outage, loss of Multiple Radar Processing (MRP), loss of Terminal Approach Radar (TAR), loss of Parallel Approach Runway Monitor (PARM), loss of radar coverage, either complete or over larger parts (Primary Surveillance Radar - PSR and secondary surveillance radar - SSR), total power failure, loss of all Radio Telephony (RT) frequencies, incorrect barometer indication (as part of meteorological equipment), Instrument Landing System (ILS) failure during approach phase and in the reduced visibility conditions, failure of runway/taxiway lights in reduced visibility conditions, wrong indication of runway/taxiway lights, Surface Movement Radar (SMR) failure or provision of wrong label indication. | Major (UK NATS); Major (Country C); 1 (Country D)
Moderate | Definition: Only affects workstations reliant on the failed item or service. The disruption of ATC operation is contained and a normal level of operation may be resumed by physically moving and combining the role of the affected workstations with another within the sector suite or by physically moving the sector team to the stand-by suite. Under some conditions, sector flow restrictions may be applied. Examples: loss of single sector frequency, loss of a number of frequencies, loss of one or two workstations in a sector suite, loss of entire sector suite, loss of telephone panel or Voice Switching And Communication System (VSCS) on a single workstation, loss of one radar (in multiple radar environment), loss of ground-based navigational aids (e.g. Very high frequency Omnidirectional Range - VOR, Non-Directional Beacon - NDB, Distance Measuring Equipment - DME), loss of PSR (as it is a backup to SSR), SSR garbling, loss of safety nets (as these are only tools to support controller). | Impact on workstation/suite (UK NATS); Major (Country C); 2 and 3 (Country D)
Minimal | Definition: Initial disruption to ATC operations is not immediately critical, but could have greater impact over time (If not recovered within a reasonable time frame, disruptions to ATC operations may be prolonged/sustained). This escalation with time can restrict traffic flow into sector(s). Examples: loss of processor, loss of link, loss of system control and monitoring unit, loss of headset, ILS failure during approach in normal visibility conditions because the opportunity for go-around always exists, failure of runway/taxiway lights (in normal visibility conditions) as this system is only a visual aid to the instrument landing, failure in communication link to adjacent ATC Centre, loss of auxiliary display, temporary failure of strip printer or paper jam, inadequate strength of RT frequency, failure of left hand headset connector while right hand is functioning, disturbance/interference on a ground frequency, loss of sequencing tool, and loss of pointing/input devices. | ATC and minimal impact (UK NATS); Minor (Country C); 4 and 5 (Country D)
Having defined the three-level severity rating to be used in this research, appropriate
mapping is established with the existing severity ratings (as defined by UK NATS, the
CAA of Country C, and the ANSP of Country D). The comparison of specific categories
from each of the available sources reveals the matching with ‘major’, ‘moderate’, and
‘minimal’ ratings as defined in this research (Table 4-5). Note however that the ‘major’
category, as defined by Country C, had to be split between ‘major’ and ‘moderate’
categories, as defined in this research. The rationale behind this split is based on two
criteria of equal importance. The first criterion is the definition of ‘major’ and ‘moderate’
categories as presented in Table 4-5. In other words, the severity rating has to
distinguish between failures that affect the entire ATC Centre and those that affect only
workstations reliant on the failed item. The second criterion is based on the impact of a
failure on ATC operations. For example, loss of a VOR or NDB is rated as ‘moderate’
because navigation may still be provided using radar surveillance or other navigational
aids (e.g. Global Positioning System - GPS, Automatic Dependent Surveillance - ADS).
However, loss of an ILS during the approach phase or in reduced visibility conditions is
rated as ‘major’. During this phase of flight the aircraft is in the landing configuration
(i.e. reduced speed, in close proximity to the ground). If visual contact with the ground is
not achieved at the moment of the failure, an immediate go-around procedure is
necessary. Because of this, the failure of an approach navigation aid (such as ILS) is
considered more severe.
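The mapping between the existing severity ratings and the three-level rating defined in this research lends itself to a simple lookup table. The Python sketch below is illustrative only and uses hypothetical names. It encodes the mapping of Table 4-5 and returns more than one candidate where a source category is split, as is the case for the ‘major’ category of Country C. In that case the two criteria discussed above still have to be applied by an analyst. The ‘critical’ category of Country C does not appear in Table 4-5 and is therefore not included.

```python
# Mapping of source severity ratings to the ratings defined in this research,
# following Table 4-5. Keys are (source, source rating) pairs.
SEVERITY_MAPPING = {
    ("UK NATS", "Major impact"): {"major"},
    ("UK NATS", "Impact on workstation/suite"): {"moderate"},
    ("UK NATS", "ATC impact"): {"minimal"},
    ("UK NATS", "Minimal impact"): {"minimal"},
    ("Country C", "Major"): {"major", "moderate"},  # split between two categories
    ("Country C", "Minor"): {"minimal"},
    ("Country D", "1"): {"major"},
    ("Country D", "2"): {"moderate"},
    ("Country D", "3"): {"moderate"},
    ("Country D", "4"): {"minimal"},
    ("Country D", "5"): {"minimal"},
}


def candidate_ratings(source, source_rating):
    """Return the set of ratings in this research that a source rating maps to."""
    try:
        return frozenset(SEVERITY_MAPPING[(source, source_rating)])
    except KeyError:
        raise KeyError(f"no mapping defined for {source!r} rating {source_rating!r}")


def needs_manual_review(source, source_rating):
    """True when the source rating does not map to a single category."""
    return len(candidate_ratings(source, source_rating)) > 1


print(sorted(candidate_ratings("Country D", "3")))
print(sorted(candidate_ratings("Country C", "Major")))
print(needs_manual_review("Country C", "Major"))
```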
4.2.4 Impact on ATM operations
As noted earlier, it is highly beneficial to analyse the impact of the failures on
operations both inside the control room and outside over a given airspace. At the same
time, it is also important to recognise that failures could have an impact not only on
ATC but also on the wider ATM system. The following examples show how severe the
impact of an equipment failure on ATM operations can be.
According to Aviation Week (reported in RISKS, 2000; NATS, 2004), the UK ATC
service suffered a flight data processing software failure at West Drayton ATC Centre
in June 2000. As a result of the failure, flight progress strips had to be hand written,
which forced the ANSP to restrict the amount of traffic in UK airspace. While the ATC
system recovered after four hours, the effects of this failure were felt for several days
with knock-on effects as far as France and Germany. This is understandable given the
centralised flow control of traffic in Europe (provided by the EUROCONTROL Central
Flow Management Unit, CFMU). As a result of the failure’s severity and subsequent flow
control, its impact spread over a sub-continental region.
Another example of a failure with a severe impact on a wide region is the brief power
failure which affected the US Federal Aviation Administration (FAA) Southern California
Terminal Radar Approach Control (TRACON) facility at Miramar on April 19, 2006. The
facility switched immediately to backup power. The outage lasted only 6 or 7 seconds,
but had an impact on airports from the Mexican border and half way through the state
of California, due to imposed traffic flow control (10News, 2006).
Another example of the severe impact that one single failure can induce is the outage
that occurred in the Chicago ATC Centre in 1995 when the en-route automation
component failed for two hours. This single occurrence cost the airlines an estimated
$12 million in delays (National Transportation Library, 1997). The National
Transportation Library (NTL) report mentions this example to make a case for the
replacement of the outdated main and back up Flight Data Processing Systems
(FDPS), involved in the reported incident. In short, these examples show how severe
the impact of an equipment failure on global ATM operations can be. This issue will
become especially important in a future gate-to-gate ATM system where the roles for
planning and control will have to be re-organised and distributed between controllers
and pilots.
Similar to ATC operations, the impact of failure on ATM can be analysed from several
different perspectives. From operational and safety perspectives, a higher degree of
workload will be experienced both on the ground by controllers, technicians, and
engineers and in the air by flight crew. From a financial perspective, in addition to costs
identified in ATC, it is necessary to add the cost of delays in a wider region. A small
exercise has been conducted on the cost of delays induced by ATC equipment failures
to indicate the financial impact of delays in the European Civil Aviation Conference
(ECAC) and US airspace. This is presented in Appendix I.
Having discussed the consequences of equipment failures, it is important to discuss
how such consequences could be prevented or mitigated. This involves the process of
recovery from equipment failure and a distinction can be made between technical and
human recovery. The following section focuses on technical recovery and the principles
used to prevent and in some cases to mitigate the impact of equipment failures. The
human recovery aspects are addressed in Chapter 5 and throughout the rest of the
thesis.
4.3 Definition of technical defences (technical recovery)
The aim of any design is to identify the functions of a system in advance and to
develop a method which assures the delivery of the intended functions. It is always
necessary to predict what may happen if something fails or if an operator handles a
system incorrectly. Experience shows that even the best designed systems fail
occasionally. Therefore, it is crucial that every design concept includes a solution to re-
establish system operation and provide continuous service. These solutions are
grouped under the term ‘technical built-in defences’. They represent defences against
any unplanned or unwanted interruption of service. They are complex socio-technical
systems which combine technical, human, and organisational measures that prevent or
protect against an adverse effect (Smith et al., 2004). Verification of the existence and
appropriateness of existing defences provides confidence in the safety of a system and
is a requirement for system certification.
Safety is recognised as the ultimate imperative in ATC and therefore, should be
addressed as early as possible in the design process. Having sound safety principles
built into each phase of the design (i.e. conceptual, preliminary, and detailed design
phase) is a useful way to avoid, prevent, and mitigate failures and their impact. Safety
through design is planned through five different principles (Figure 4-1) for hazard¹
avoidance, elimination, or control, which are as follows (Christensen and Manuele,
1999; National Aeronautics and Space Administration, 2002; The European New
Machinery Directives cited in Piantek, 1999):
• Eliminate hazards;
• Design for minimum risk;
• Incorporate safety devices (i.e. devices designed to prevent any unwanted event);
• Provide warning devices (i.e. alert that signals the occurrence of some unwanted
event); and
• Develop operating procedures and training schemes.
Figure 4-1 Safety through design (adapted from Christensen and Manuele, 1999)
1 Within system safety, a hazard is usually defined as a condition which can lead to an accident.
In this research, a hazard is defined as the ATC system state resulting from an equipment failure that penetrates all existing technical defences and affects the ability of the controller to perform his/her tasks.
The suggested principles follow the logical order of precedence. The first two
approaches focus on the elimination of the hazard from the system. However, if the
identified hazards cannot be eliminated (due to difficulties or cost), risk should be
reduced by using fixed, automatic, or other protective safety devices (i.e. defences for
seamless recovery from failure). When neither design nor safety devices can effectively
eliminate identified risks or adequately reduce them, devices should be used that
detect the unwanted condition and produce adequate warning signals to alert the
controller (i.e. defences for transmitting information regarding a failure). These warning
signals should be designed to minimise the probability of inappropriate human reaction
and response. Note that regardless of how a warning device performs (Figure 4-2), the
triggering failure represents a hazard (according to the definition in this thesis) as it
affects controller performance.
As explained before, the human operator remains the last line of defence (i.e. human
recovery). For this reason, when warning devices are not sufficient, special procedures
and training schemes should be designed. These must be periodically tested, verified,
and regularly updated to assure their effectiveness.
Similarly, when dealing with equipment failures in ATC, it is important to distinguish
between technical and human (i.e. controller) recovery (Figure 4-2). Both processes
start with the detection of failure (either by a technical system or controller) and
conclude with an outcome. The outcome can be nominal (pre-failure), non-nominal but
stable (i.e. degraded), or inadequate system state (leading to incident or accident). The
outcome of the equipment failure and recovery process is discussed in detail in the
following Chapter. The following paragraphs focus on technical recovery, while human
recovery is addressed in subsequent Chapters.
Figure 4-2 Technical and human recovery
As already highlighted, technical built-in defences can be divided into two different
categories according to the function they provide. These are defences for recovering
from failures (safety devices) and defences for transmitting relevant information on
failure (warning devices). Both categories are examined further in the following
sections.
4.3.1 Defences for recovering from failures (safety devices)
This group of technical built-in defences includes safety devices, i.e. mechanisms
designed to prevent an unwanted event (e.g. radiotelephony anti-blocking device,
availability of primary and secondary frequency, automatic switching from normal to
fallback operational mode, automatic switching from primary to secondary glide slope
transmitter), and the creation of fault-tolerant systems through redundancy/diversity. The
main objective of built-in defences is to prevent adverse events from happening (i.e.
preventive defences) or to lessen the impact of the consequences on operations (i.e.
mitigative or protective defences). If a failure is covered only by a preventive barrier, the
system lacks the fault tolerance that protective defences provide. For example, the
feasibility study of the EUROCONTROL eight states free route airspace concept was
established to ensure that free route airspace operations are as safe as the current
fixed route operations (EUROCONTROL, 2001c). The analysis identified 128
preventive defences but no protective defences. Therefore, this concept, in its current
state, fails to establish fault tolerance in the ATM system.
Fault-tolerant systems are designed to preserve the minimum required service in spite
of failure occurrence. This is achieved through the employment of redundancy.
Redundancy is the ability of a system to keep functioning normally in the event of an
equipment failure, by having backup components that perform duplicate functions
(Mauri, 2000). The goal of this process is to mask failure events from the controller, but
also to capture and report them for the necessary maintenance. However, redundancy
itself is not always a solution because of common cause failures, i.e. multiple failures
that result from a single shared cause (e.g. fire or power outage). In order to prevent the
occurrence of these types of failures, emphasis is placed on diversity of the systems (i.e.
different manufacturers), equipment diversity in manufacturing (e.g. different software
packages), and/or functional diversity (e.g. physically independent components; the
redundant hydraulic system lines of commercial aircraft are physically separated so
that fire in a certain compartment does not affect all the lines simultaneously).
4.3.2 Defences for transmitting information on failure (warning devices)
Alerts should be provided to the controller in the event of a critical change in the ATC
system or equipment status and to remind him/her of critical actions that must be taken. An
alert or a warning should enhance the probability of appropriate human reaction and
response (i.e. controller recovery performance). According to the FAA’s Human Factors
Design Standard (Federal Aviation Administration, 2003) warning devices should:
• Alert the operator to the fact that a problem exists;
• Inform the operator of the nature of the problem;
• Guide the operator’s initial responses (based on priority); and
• Confirm in a timely manner whether the operator’s response corrected the problem.
Alerts are usually generated immediately after the system detects any discrepancy
from predefined system performance. There are several ways in which ATC controllers
are informed of equipment failures or non-availability of certain functions. The most
usual ones are through colour-coding (e.g. change in the workstation’s border colour)
and textual messages, all presented on the Human Machine Interface (HMI). In
addition to the content and location of the alert message, it is equally important to
display an alert in a timely manner. Alert onset is defined as the time between a system’s
detection of a failure and the moment an alert is presented on the HMI either by colour
change or text message (i.e. time-to-alert or TTA). This timing is usually system-driven
(based on the system threshold) but there are novel initiatives toward human-driven or
cognitively-driven alert onset. In general, there are three different types of alert onset:
• Immediate onset (an alert is presented on the HMI with the least possible time delay
after the system detects the failure). This is the normal case for severe events.
• Delayed onset (an alert is presented on the HMI with a time-based or threshold-
based onset). For example, system requirements could be set up to inject an
alert with a specific time delay following the occurrence of a failure or to inject an
alert once a system-defined threshold has been reached (i.e. TTA). In the nuclear
industry this is known as alert sequencing or alert hierarchies, which indicate the
urgency of actions needed. In this way, a hierarchy makes use of safety criticality,
injecting safety-relevant alerts first, followed by operational alerts. In satellite
navigation, the TTA value is one of the measures of the integrity of a satellite
navigation system (Feng et al., 2005).
• Cognitively convenient onset (an alert is presented on the HMI based on
cognitive convenience, which can be defined through the levels of controller
workload). This futuristic concept is mostly used in the nuclear and automobile
industries, where cognitive convenience is determined by measuring workload
using physiological measures (e.g. heart rate, breathing rate, galvanic skin
response, eye tracking). This concept has been tested on a US naval ship
as described in Daniels, Regli, and Franke (2002). This study proposes a method
to control the cognitive effects of task interruption by influencing the timing of an
alert and helping a user to regain their situational awareness within the
interrupted task.
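The three types of alert onset listed above can be illustrated with a minimal sketch. All names (Onset, Failure, alert_time, time_to_alert, workload_low_at) are hypothetical and introduced only for illustration; the sketch assumes that immediate onset can be modelled as zero delay and that a cognitively convenient moment is supplied externally (e.g. by a workload monitor). It does not describe any operational ATC system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Onset(Enum):
    IMMEDIATE = "immediate"
    DELAYED = "delayed"
    COGNITIVE = "cognitively convenient"


@dataclass
class Failure:
    detected_at: float   # time (s) at which the system detects the failure
    onset: Onset         # onset policy applied to this failure
    delay: float = 0.0   # fixed delay (s); only used for DELAYED onset


def alert_time(failure: Failure, workload_low_at: Optional[float] = None) -> float:
    """Return the time (s) at which the alert is presented on the HMI."""
    if failure.onset is Onset.IMMEDIATE:
        # Assumption: "least time delay" is modelled as no delay at all.
        return failure.detected_at
    if failure.onset is Onset.DELAYED:
        return failure.detected_at + failure.delay
    # COGNITIVE: wait for a low-workload moment, never earlier than detection.
    if workload_low_at is None:
        raise ValueError("workload_low_at is required for cognitive onset")
    return max(failure.detected_at, workload_low_at)


def time_to_alert(failure: Failure, workload_low_at: Optional[float] = None) -> float:
    """Time-to-alert (TTA): interval between detection and presentation."""
    return alert_time(failure, workload_low_at) - failure.detected_at


# Hypothetical failures, all detected at t = 10 s.
immediate = Failure(detected_at=10.0, onset=Onset.IMMEDIATE)
delayed = Failure(detected_at=10.0, onset=Onset.DELAYED, delay=5.0)
cognitive = Failure(detected_at=10.0, onset=Onset.COGNITIVE)

print(time_to_alert(immediate))
print(time_to_alert(delayed))
print(time_to_alert(cognitive, workload_low_at=12.5))
```

In this sketch the TTA is the only quantity that differs between the three policies, which mirrors the definition of alert onset given in the text.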
After a detailed overview of the equipment failure characteristics as well as technical
recovery, the next section analyses the nature of equipment failures that manage to
penetrate the existing built-in defences and affect controller performance. For this
purpose, findings from existing literature have been augmented by results of the
analysis of more than ten thousand operational failure reports originating from four
different countries. This sample of equipment failure reports has already been
introduced in Chapter 3 and the following section further analyses this sample.
4.4 Analyses of operational failure reports
Existing literature on equipment failure characteristics has been reviewed in the
previous sections of this Chapter. This has been further augmented and informed by
the analyses of operational data from four countries (i.e. Countries A, B, C, and D), as
presented in detail in Chapter 3.
4.4.1 Data analysis methodology
Since the four countries are of different airspace size, equipage, traffic demand, and
density in their airspace, simple analysis of equipment failure rate would be of limited
value. Therefore, to gain a common metric to assess distribution of equipment failures
per year and per data source, it is necessary to normalise the rates of equipment
failures per appropriate unit of measurement. For example, the rates per ATC Centre
enable comparison of ATC Centres of similar traffic demands and thus equipage, but
otherwise fail to provide a meaningful performance measure. Similarly, the rate of radio
frequency failure per sector or per total number of available frequencies in a sector
(usually there are primary and secondary frequencies available in a sector) enables a
metric for the availability of voice communication in each sector. However, this unit is
not of practical use as the number of sectors changes hourly based upon changes in
air traffic demands. As a result, the rate of equipment failures per flight hours is used in
this research². This approach avoids difficulties and differences associated with the
2 Hours flown data are collected for commercial airlines, including domestic, regional, and
international air traffic for each country.
geographical coverage of the datasets available and the availability of ATC systems
and equipment (e.g. number of radars, navaids, communication systems).
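The normalisation described above amounts to dividing the yearly number of reported failures by the hours flown in that year and scaling to a convenient base. The following minimal sketch shows the arithmetic; the function name failure_rate and all counts and flight hours are hypothetical and are not taken from the datasets analysed in this research.

```python
def failure_rate(failures: int, flight_hours: float, per: int = 100_000) -> float:
    """Return the number of equipment failures per `per` flight hours."""
    if flight_hours <= 0:
        raise ValueError("flight_hours must be positive")
    return failures * per / flight_hours


# Hypothetical yearly figures: year -> (reported failures, flight hours flown).
yearly = {
    2001: (175, 1_000_000),
    2002: (250, 1_000_000),
}

rates = {year: failure_rate(f, h) for year, (f, h) in yearly.items()}
for year, rate in sorted(rates.items()):
    print(f"{year}: {rate:.1f} failures per 100,000 flight hours")
```

Changing the `per` argument (e.g. to 10,000) only rescales the rate, which is why rates quoted on different bases must be converted before they are compared.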
The information on flight hours for each country has been extracted from the CAA
websites, annual incident summaries, and personal correspondence with the staff from
the engineering unit. After establishing common ground with an appropriate unit of
measurement, further analyses are performed with available data structured around
four equipment failure characteristics that could be extracted consistently
from the available datasets. These four equipment failure characteristics are: type of ATC
functionality and equipment affected, complexity, severity, and duration³ of equipment
failures. The type of equipment/ATC functionality affected and complexity of failure type
are extracted from the short summary available for each report. The severity of
equipment failure is extracted using the available severity rating (if it existed) or
assessing the available information of the operational and safety impact of equipment
failure and thus applying the severity rating derived in this research (see Table 4-5).
The duration variable was available only in the Country D database. Finally, additional
statistical tests have been performed to identify any relationship between the four
equipment failure characteristics. The structure of the data analyses is presented in
Figure 4-3.
The nature of the variables under consideration determined which statistical methods
could be used to analyse the data. As can be seen from their description in this
Chapter, most variables are categorical (type of equipment/ATC functionality affected,
complexity of failure type, and severity). Additionally, the complexity of failure type and
severity variables have an ordinal character (assuming a ranking between possible
categories). Only duration represents a continuous or ratio scale variable⁴. This
variable is first investigated for its overall distribution and then split into categories
to extract information regarding failures of short duration (discussed in sections 4.1.4
and 4.4.6).
3 The duration characteristic is analysed last as it is available only in one database.
4 Variables can be either continuous or categorical. Continuous variables are numeric values on
an interval or ratio scale (e.g. age, income). Categorical variables can be either nominal or ordinal. Nominal variables differentiate between categories but do not assume any ranking between them (e.g. gender). On the other hand, ordinal variables differentiate between categories that can be rank-ordered (e.g. from lowest to highest).
[Figure 4-3: flow diagram. Available data: operational failure reports (4 countries, 22,808 available reports), followed by data pre-processing and the analyses listed below, concluding with additional statistical tests.
Rate of equipment failures: Countries A, B, C, and D (reference: traffic figures from respective CAAs)
Type of ATC function and equipment affected: Countries A, B, C, and D (reference: ATC functional classification, Chapter 2)
Complexity of failure type: Countries A, B, and C (reference: Chapter 4, section 4.1.2)
Severity: Countries A, B, C, and D (reference: severity rating, Chapter 4, Table 4-5)
Duration: Country D (reference: Country D database)]
Figure 4-3 Operational failure reports analyses
Using the SPSS statistical package, frequencies of related categories are identified and
the most frequent categories are reported for each variable. To establish relationships
between these variables, additional statistical tests are also performed. In this regard,
chi-square tests are used to test the relationships between two categorical variables.
The most important assumptions of the chi-square test are random sample
data, a large sample size, adequate cell sizes (no fewer than 5 observations per cell),
independent observations, and normal distribution of deviations between observed and
expected values. The size and characteristics of the available datasets imply
conformance with all listed assumptions. Furthermore, the Cramer’s V test is used to
measure the association for nominal data (i.e. ATC functionality variable) whilst the
Kendall tau test is used for ordinal data (i.e. severity and duration variables). These
tests are briefly discussed in the following paragraphs.
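As an illustration of the chi-square test of independence and of the cell-size assumption, the following minimal sketch computes the expected frequencies, the test statistic, and the degrees of freedom for a small contingency table. The counts are hypothetical, the function name chi_square is introduced only for this example, and the cell-size check is applied here to the expected frequencies, which is an assumption about how the rule is interpreted. The analyses in this research were carried out in SPSS, not with this code.

```python
def chi_square(observed):
    """Return (statistic, expected, dof) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, expected, dof


# Hypothetical counts: rows = two ATC functionalities, columns = three severity ratings.
table = [
    [30, 50, 20],
    [20, 40, 40],
]

statistic, expected, dof = chi_square(table)
min_expected = min(min(row) for row in expected)

print(f"chi-square = {statistic:.3f}, degrees of freedom = {dof}")
print(f"smallest expected cell count = {min_expected:.1f}")
```

The statistic would then be compared with the chi-square distribution for the given degrees of freedom to decide whether the null hypothesis of independence can be rejected.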
Cramer’s V is a chi-square-based measure of the strength of the relationship
between nominal variables and is applicable across contingency tables of size greater
than 2×2 (Berenson et al., 2006). Cramer’s V coefficient is interpreted as a measure of
the relative strength of an association between two variables and it ranges from 0 to 1
(i.e. 0 representing no association and 1 representing a strong association). Suppose that
the null hypothesis is that two variables are independent random variables. Based on the
frequency table and the null hypothesis, the chi-square statistic X² can be computed as
the sum, over all cells, of the squared difference between the observed (O) and expected
(E) frequency in each cell, divided by the
expected frequency. Then, Cramer’s V coefficient is defined in equation 4-1 below:
$$V = \sqrt{\frac{X^2}{n \times m}} = \sqrt{\frac{\sum \frac{(O - E)^2}{E}}{n \times m}} \qquad \text{(4-1)}$$
where n represents the sample size and m represents the smaller of the number
of rows minus one and the number of columns minus one.
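The definition of Cramer’s V can be checked with a short worked example. The sketch below computes the chi-square statistic from a contingency table and then divides by n × m before taking the square root, with m being the smaller of (rows − 1) and (columns − 1). The tables are hypothetical and the function name cramers_v is introduced only for illustration.

```python
import math


def cramers_v(observed):
    """Cramer's V for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Chi-square statistic: sum over cells of (O - E)^2 / E.
    x2 = 0.0
    for row, r in zip(observed, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / n
            x2 += (o - e) ** 2 / e
    m = min(len(row_totals) - 1, len(col_totals) - 1)
    return math.sqrt(x2 / (n * m))


# Hypothetical 2x3 table (weak association) and 2x2 table (perfect association).
weak = [
    [30, 50, 20],
    [20, 40, 40],
]
perfect = [
    [10, 0],
    [0, 10],
]

v_weak = cramers_v(weak)
v_perfect = cramers_v(perfect)
print(f"V (weak table) = {v_weak:.3f}")
print(f"V (perfect table) = {v_perfect:.3f}")
```

Because the statistic is divided by n × m, the coefficient does not grow with the sample size, which makes it comparable across datasets of different sizes.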
Kendall’s tau is a rank-based measure of the strength of the relationship
between ordinal variables, applicable across contingency tables of all sizes (Berenson
et al., 2006). Kendall’s tau coefficient has the following properties:
• If the agreement between the two rankings is perfect (i.e. the two rankings are the
same) the coefficient takes the value of 1.
• If the disagreement between the two rankings is perfect (i.e. one ranking is the
reverse of the other) the coefficient takes the value of -1.
• For all other associations the value lies between -1 and 1, and increasing values
imply increasing agreement between the rankings. If the rankings are completely
independent, the coefficient takes the value of 0.
Kendall tau coefficient is defined in equation 4-2 below:
$$\tau = \frac{2P}{\tfrac{1}{2}\,n(n-1)} - 1 = \frac{4P}{n(n-1)} - 1 \qquad \text{(4-2)}$$
where n represents the number of observations (so that ½n(n−1) is the total number of
pairs) and P represents the number of concordant pairs.
In statistics, a concordant pair is a pair of two-variable observations {X₁,Y₁}
and {X₂,Y₂}, where (equation 4-3):
$$\operatorname{sgn}(X_2 - X_1) = \operatorname{sgn}(Y_2 - Y_1) \qquad \text{(4-3)}$$
Correspondingly, a discordant pair is a pair where (equation 4-4):
$$\operatorname{sgn}(X_2 - X_1) = -\operatorname{sgn}(Y_2 - Y_1) \qquad \text{(4-4)}$$
Sgn represents the sign function defined as (equation 4-5):
$$\operatorname{sgn}(x) = \begin{cases} -1, & x < 0 \\ 0, & x = 0 \\ 1, & x > 0 \end{cases} \qquad \text{(4-5)}$$
Therefore, a high value of P indicates that most pairs are concordant, i.e. the rankings
are consistent. A tied pair (sgn x = 0) is not regarded as concordant or discordant. If
there is a large number of ties, the total number of pairs (in the denominator of
equation 4-2) should be adjusted accordingly (Berenson et al., 2006).
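The definition of Kendall’s tau can be illustrated by counting concordant pairs directly. The sketch below assumes that there are no tied pairs, so that τ = 4P / (n(n − 1)) − 1 with n observations and P concordant pairs; no adjustment for ties is attempted. The rankings are hypothetical and the function names sgn and kendall_tau are introduced only for illustration.

```python
from itertools import combinations


def sgn(x):
    """Sign function: -1 for x < 0, 0 for x = 0, 1 for x > 0."""
    return (x > 0) - (x < 0)


def kendall_tau(xs, ys):
    """Kendall's tau from the number of concordant pairs (assumes no ties)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length of at least 2")
    concordant = 0
    for i, j in combinations(range(n), 2):
        sx = sgn(xs[j] - xs[i])
        sy = sgn(ys[j] - ys[i])
        # A pair is concordant when both differences have the same non-zero sign.
        if sx == sy and sx != 0:
            concordant += 1
    return 4 * concordant / (n * (n - 1)) - 1


# Hypothetical rankings of four items.
same = kendall_tau([1, 2, 3, 4], [1, 2, 3, 4])      # identical rankings
reverse = kendall_tau([1, 2, 3, 4], [4, 3, 2, 1])   # reversed rankings
swapped = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])   # one adjacent swap

print(same, reverse, round(swapped, 3))
```

With four observations there are six pairs; a single adjacent swap leaves five of them concordant, which is why the third value lies between 0 and 1.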
After presenting the overall methodology used for data analyses, the following sections
present some of the key findings and results.
4.4.2 Rate of equipment failures
From Figure 4-4, the rate of equipment failures for Country A initially increases greatly
before peaking in 2002, followed by a sharp drop in 2003. This corresponds to a large
number of early failures experienced with the opening of the new ATC Centre which
accounted for 63.4 percent of all reported equipment failures in that year. Country B’s
rate rises from 17.5 failures per 100,000 flight hours in 2001 to 25 failures per 100,000
flight hours in 2002. This is followed by a drop to 17.8 failures per 100,000 flight hours
in 2003 before increasing sharply in 2005. The reason for the high rates in 2004/2005 is
that the air navigation service provider directed controllers to be more diligent about
filling out incident reports to improve the quality of the incident database and the overall
safety management system. Country C’s rate exhibits a steady trend for the entire
period of 13 years, being on average nine failures per 100,000 flight hours.
[Figure 4-4: line chart. x-axis: Year (1992 to 2005); y-axis: Rate (in 100,000), 0 to 50; series: Country A, Country B, Country C]
Figure 4-4 Total number of equipment failures per flight hours flown in each year for countries A, B, and C
The data available on the rate of equipment failures for Country D reveal a sharp rise
in the number of equipment failures from 30 failures per 10,000 flight hours captured in the
last half of the year 2000 to 45 failures per 10,000 flight hours in 2001 (Figure 4-5)⁵.
The reason for this is that only five months of data were available for the year 2000.
Therefore, we can conclude that the rate of reported equipment failures in this ATC
Centre decreases in absolute numbers.
[Figure 4-5: line chart. x-axis: Year (1992 to 2005); y-axis: Rate (in 10,000), 0 to 50; series: Country D]
Figure 4-5 Total number of equipment failures per flight hours flown in each year for country D (year 2000 incomplete)
5 Although the rates of equipment failure of Country D are tenfold higher compared to Countries
A, B, and C, Country D data are retained for subsequent analyses as they represent the most detailed and reliable source of operational failure reports.
The next section builds on this trend analysis and assesses affected ATC
functionalities. The classification of all ATC functionalities, as defined in Chapter 2, has
been used for this purpose and the findings are presented for each Country separately.
4.4.3 Type of ATC functionality and equipment affected
This section provides the analysis of ATC functionalities and their sub-functions
affected by equipment failure occurrences as reported for Countries A, B, C, and D.
Country A data shows that the two ATC functionalities most affected are the
communication and surveillance functions (Figure 4-6).
Figure 4-6 Most affected ATC functionality (Country A)
Further analysis of sub-functions and equipment most affected by failures identified the
following five types: air ground communication, secondary surveillance radar (SSR),
flight data processing system (FDPS), primary surveillance radar (PSR), and other
communication systems, ranging from pagers, headsets, microphones, cables, to
footswitches (Table 4-6).
Table 4-6 Most affected ATC equipment (Country A)
ATC equipment affected Percentage
air ground communication 33.1
secondary surveillance radar (SSR) 17.7
flight data processing system (FDPS) 10.1
primary surveillance radar (PSR) 5.2
other communication systems 4
Similar to the previous case, the two ATC functionalities most affected by
equipment failures for Country B are the communication and surveillance functions (Figure 4-7).
Figure 4-7 Most affected ATC functionality (Country B)
Table 4-7 presents the types of equipment most affected by failures (six are listed
because the last two share the same percentage). These are: PSR,
air situational display or radar display, air ground communication, voice switching
communication system (VSCS), data exchange network, and runway/taxiway lighting.
Table 4-7 Most affected ATC equipment (Country B)
ATC equipment affected Percentage
primary surveillance radar (PSR) 17.2
air situational display 15.1
air ground communication 11.6
voice switching communication system (VSCS) 8.8
data exchange network 7.6
runway/taxiway lighting 7.6
Country C shows a slightly different trend in the distribution of equipment failures per
ATC functionality. The two most affected categories are the navigation and
communication functions (Figure 4-8).
Figure 4-8 Most affected ATC functionality (Country C)
Furthermore, the five most affected equipment types are: air ground communication,
instrument landing system (ILS), very high frequency omnidirectional radio range
(VOR), non-directional beacon (NDB), and air situational display (Table 4-8).
Table 4-8 Most affected ATC equipment (Country C)
ATC equipment affected Percentage
air ground communication 23.7
instrument landing system (ILS) 19.6
very high frequency omnidirectional radio range (VOR) 7.6
non-directional beacon (NDB) 6.5
air situational display 5.8
Country D shows a similar trend to Countries A and B, as the two most affected ATC
functionalities are communication and surveillance (Figure 4-9). Although the
navigation function seems not to be represented at all in Figure 4-9, there were only
two failures affecting this functionality and both are due to testing of Global Positioning
System (GPS) clock alarms. The reason for the under-representation of this ATC
functionality is that the data originated from one particular ATC Centre that
provides area control service and as such is not responsible for the ground-based
navigational aids and airport-based equipment (e.g. meteorological equipment,
runway/taxiway lighting, ILS, Surface Movement Radar (SMR)).
[Figure 4-9: bar chart. x-axis: ATC functionality (Communication, Navigation, Surveillance, Data processing, Supporting, Safety nets, Power supply, Pointing/input, System monitoring); y-axis: Frequency, 0 to 3500]
Figure 4-9 Most affected ATC functionality (Country D)
Further analysis of data for Country D shows that the following five equipment types
are most affected by equipment failures: air situational display (radar display), data
exchange network, air ground communication, other surveillance systems (mostly
referring to radar links), and other communication systems, such as pagers, headsets,
microphones, cables, and footswitches (Table 4-9).
Table 4-9 Most affected ATC equipment (Country D)
ATC equipment affected Percentage
air situational display 21.9
data exchange network 15.7
air ground communication 11.6
other surveillance systems 8.7
other communication systems 4
Table 4-10 collates the five ATC equipment types most affected by failures, from each
available dataset. Findings are structured according to the ATC functionality they
support (in rows) and sources (in columns). Overall it can be concluded that Countries
A, B, and D are quite similar in relation to the most affected ATC functionalities. Results
of data analyses from these three countries indicate that failures mostly affect the
communication and surveillance functionalities. On the other hand, results of data
analysis from Country C differ as failures mostly affect the navigation functionality.
These are mostly failures of ILS, followed by failures of VOR, NDB, DME, as well as
airport lighting facilities (runway and taxiway lighting). Furthermore, the only equipment
type frequently affected by failures in all four countries is air-ground communication.
Other equipment types common in available datasets are air situational display, radar,
data exchange network, and supporting communication system (e.g. pagers, headsets,
microphones, cables, and footswitches).
Table 4-10 Summary of the five ATC equipment types most affected by failures
ATC functionality | Country A | Country B | Country C | Country D
Communication | A/G communication | A/G communication | A/G communication | A/G communication
Communication | other communication systems | VSCS | - | other communication systems
Communication | - | data exchange network | - | data exchange network
Surveillance | PSR | PSR | - | other surveillance systems
Surveillance | SSR | air situational display | air situational display | air situational display
Data processing and distribution | FDPS | - | - | -
Navigation | - | runway/taxiway lighting | ILS | -
Navigation | - | - | VOR | -
Navigation | - | - | NDB | -
4.4.4 Complexity of failure type
As discussed previously in section 4.1.2, failures can affect single or multiple
components at the same time. The analysis of complexity of failure type was based on
extraction of the number of failures reported in each occurrence report, i.e. single or
multiple failures. It is assumed that failures that affect multiple components, regardless
of whether they are dependent or independent, were reported in the same operational
failure report. The personal correspondence with CAA staff in charge of the occurrence
databases from Countries A and B confirmed this assumption. According to them, if
two different items of equipment fail, but the time between failures is such that the
failure of one does not contribute to the failure of the other, then two 'single' failures are
reported separately. However, if the failures occur close together such that the failure
of one could have impacted on the failure of the other or, if unrelated, the fact that two
items failed close together meant that the controller workload is significantly increased,
then ‘multiple’ failures are reported in the same occurrence report. Based on these
findings, it was necessary to capture the frequency of reports that mentioned more than
one equipment failure. This was attempted consistently for the datasets from Countries A,
B, C, and D, although the last two required special treatment. The Country C dataset had
to be assessed separately owing to the specifics of its reporting system: in Country C, the
database holds multiple records for each occurrence, as each finding and cause is reported
separately. As a result, the assessment of the multiple failure occurrences had to be
performed by assessing each individual case and excluding all non-equipment failure
reports. Similarly, the Country D dataset had to be excluded from this analysis, as the
reporting system of the system control and monitoring unit accounts for each failure
independently. Table 4-11 presents the
percentage of multiple failures amongst the available operational failure reports.
Table 4-11 Percentage of the multiple failure occurrences reported in the available datasets
Country | Number of reports with multiple failure occurrences | Total number of reports | Comment
A | 42 | 1378 |
B | 206 | 1393 |
C | 24 | 448 | separate assessment due to the specific reporting system
D | N/A | N/A | not applicable due to the specific reporting system
Aggregated data | 272 (8.4%) | 3219 |
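The aggregated share in Table 4-11 follows directly from the per-country counts. As a minimal check, assuming only the counts reported in the table (Country D is left out because it is not applicable), the arithmetic can be reproduced as follows. The variable and function names are illustrative and not part of the original analysis.

```python
# Counts taken from Table 4-11: (reports with multiple failures, total reports).
# Country D is omitted because its reporting system logs each failure independently.
reports = {
    "A": (42, 1378),
    "B": (206, 1393),
    "C": (24, 448),
}


def share(multiple, total):
    """Percentage of reports that mention more than one equipment failure."""
    return 100.0 * multiple / total


per_country = {c: share(m, t) for c, (m, t) in reports.items()}
total_multiple = sum(m for m, _ in reports.values())
total_reports = sum(t for _, t in reports.values())
aggregated = share(total_multiple, total_reports)

for country, pct in per_country.items():
    print(f"Country {country}: {pct:.1f}% of reports involve multiple failures")
print(f"Aggregated: {total_multiple} of {total_reports} reports ({aggregated:.1f}%)")
```

The per-country shares differ considerably, so the aggregated figure is best read alongside the individual rows of the table.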
Using the severity categorisation defined in section 4.2.3, it is possible to categorise all
available equipment failure reports from operational and safety perspectives. The
following section assesses the ATC functionalities affected by equipment failure with
respect to their severity or impact on ATC operations.
4.4.5 Severity of equipment failures
Figure 4-10 presents the distribution of equipment failures according to the severity of
their impact on ATC operations. As discussed previously, three severity ratings are
recognised, namely major, moderate, and minimal (Table 4-5). Although major failures
are the least frequent, their impacts on ATC operations and controller recovery
performance are the most severe. For this reason, the rest of the analysis focuses on
‘major’ equipment failures. The distribution of the ATC functionalities most affected by
major failures may be skewed due to the Country D dataset which does not incorporate
failures of the navigation functionality (see section 4.4.3). Future research should
address ‘moderate’ and ‘minimal’ severity categories as these are prone to errors of
controller recovery in the absence of written and practiced procedures.
Figure 4-10 Distribution of equipment failures according to their severity
The ‘major’ category accounts for 7 percent, 14.4 percent, 12.7 percent and 6.5
percent of the equipment failures within Countries A, B, C, and D respectively. These
results show the importance of assessing the degree of severity for each of the
equipment failure occurrences. For example, the majority of failures reported in the
Country D dataset tend to have minimal impact on ATC operations and controller
performance (Figure 4-13). However, if we observe only major equipment failures, or
failures that affect an entire ATC Centre or a major part of it, it is notable that the most
affected ATC functionalities are: communication accounting for 45.3 percent of all
aggregated equipment failure reports, surveillance accounting for 29 percent, followed
by data processing and distribution accounting for 15 percent (Figure 4-11).
[Figure 4-11: chart of failure frequency (vertical axis, 0 to 250) by ATC functionality (Comm, Nav, Surv, Data proc, Power, Pointing/input, System mon), broken down by country (A to D)]
Figure 4-11 Distribution of major equipment failures according to ATC functionality
Further, the major failures of the communication functionality are mostly due to the loss
of air ground communication or available frequencies, and to problems with the data
exchange network (when used as a coordination channel). This is determined by
observing the frequency of equipment types that support the communication
functionality affected by a major failure. Using a similar approach, the frequency of
equipment types that support the surveillance functionality affected by a major failure is
determined. These are: air situational display and radar. Within the data processing
and distribution function, more than half of the major failures are due to one particular
piece of equipment, namely the Flight Data Processing System (FDPS). This particular
system handles flight plans, making them ‘live’ through automatic events, manual
inputs, and transitions from one state to the other. This information is provided via the
air situational display or radar display (Table 4-12).
Table 4-12 Summary of the five most affected equipment types from four datasets
ATC functionalities | Major failures
Communication | air ground communication; data exchange network
Surveillance | air situational display; primary and secondary surveillance radar (= loss of radar coverage)
Data processing and distribution | flight data processing system (FDPS)
4.4.6 Duration of equipment failures
This section provides the distribution of equipment failures according to their duration.
As discussed previously in section 4.1.4, three categories are distinguished, namely
short period of time (order of magnitude in minutes), moderate period of time (order of
magnitude in minutes up to one hour), and substantial period of time (order of
magnitude in hours or days). This categorisation is informed by the characteristics of
the failure duration extracted from the Country D dataset as it is the only dataset which
has this information available. In general, the data shows that equipment failures can
last for a significant amount of time, with an average duration of more than ten
hours (M=10.25h, SD=77.6h). This variable is measured from the first log of the event
until its final closure, which may have occurred some days later. This is the reason for
the significant spread of the duration variable around its mean. Data analysis revealed
that more than 600 failures lasted more than 24h. One particular failure of radar
telephone lines was particularly extreme in its duration as it was logged initially on
November 20, 2003 and closed on June 09, 2004, lasting more than six months.
Figure 4-12 shows the distribution of the failure duration according to the four
categories. It can be seen that the majority of failures last for less than one day, while
34.5 percent of equipment failures last up to 15 minutes (corresponding to short
durations). This particular category of equipment failures (short period of time) is
relevant to controller recovery. Equipment failures lasting up to 15 minutes require
ad hoc thinking, use of past experience, training, and existing recovery procedures to
select and implement an optimal recovery strategy for the relevant contextual
conditions. Moreover, short duration failures lend themselves to experimental study of
controller recovery, as presented in Chapter 9. Equipment failures lasting from 15
minutes to one hour belong to the moderate duration category. Available data shows that
approximately 26 percent of equipment failures belong to the ‘moderate period of time’
category. The final duration category, substantial period of time, is further divided into
two additional sub-categories, failures that last up to one day and those that last longer
than a day. This is done to extract more information as about 40 percent of the
equipment failures belong to the ‘substantial period of time’ category. The results of the
analysis suggest that eight percent of reported equipment failures in Country D lasted
more than one day. Further investigation of equipment types affected by failures lasting
more than one day revealed that the majority of these are data exchange network
problems, air situational display, flight data processing system, links with radar sites,
and air ground communication.
[Figure 4-12: chart of failure frequency (vertical axis, 0 to 3,000) by duration category in hours: 0.00-0.25 (34.51%), 0.26-1 (25.85%), 1.01-24 (31.6%), >24.01 (8.04%)]
Figure 4-12 Distribution of the failure duration according to four distinct categories
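The duration categories used in Figure 4-12 can be expressed as a simple classification rule. The following is a minimal sketch, not the procedure actually applied to the Country D dataset: the function names are hypothetical, the category bounds are read from the figure, and treating each upper bound as inclusive is an assumption.

```python
from datetime import datetime

# Upper bounds (in hours) of the duration categories shown in Figure 4-12.
# ASSUMPTION: each bound is treated as inclusive for the purpose of this sketch.
CATEGORIES = [
    (0.25, "short period of time (up to 15 minutes)"),
    (1.0, "moderate period of time (15 minutes to one hour)"),
    (24.0, "substantial period of time (up to one day)"),
]
LONGEST = "substantial period of time (more than one day)"


def duration_hours(first_log, closure):
    """Duration from the first log of the event until its final closure."""
    return (closure - first_log).total_seconds() / 3600.0


def categorise(hours):
    """Map a failure duration in hours onto one of the four categories."""
    if hours < 0:
        raise ValueError("closure precedes first log")
    for upper_bound, label in CATEGORIES:
        if hours <= upper_bound:
            return label
    return LONGEST


# The radar telephone line failure mentioned in the text, using the logged and
# closed dates given there (times of day are not reported; midnight is assumed).
hours = duration_hours(datetime(2003, 11, 20), datetime(2004, 6, 9))
print(round(hours / 24), "days ->", categorise(hours))
```

Because the duration runs from first log to final closure, a single long-lived record such as this one contributes heavily to the spread around the mean.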
Since this research addresses controller recovery from ATC equipment failures, the
focus is on ‘major’ failures within the ‘short period of time’ category. Table 4-13
presents the distribution of the major failures lasting up to 15 minutes, according to the
ATC equipment affected. It can be seen that the equipment most affected is the data
exchange network, followed by other surveillance systems (mostly radar links),
flight data processing system, air situational display, and air ground
communication.
Table 4-13 Distribution of major failures lasting up to 15 minutes per ATC equipment affected
ATC equipment affected | Percentage
data exchange network | 28
other surveillance systems | 16
flight data processing system | 13.7
air situational display | 12
air ground communication | 7.4
4.4.7 Additional statistical tests
After the summary statistics presented for each of the datasets available and for four
relevant variables (ATC functionality, complexity of failure type, severity, and duration),
the final step is to test any interactions that may exist between these variables. The
ATC functionality variable is used because it has only nine categories, compared to the
ATC equipment variable which has more than 60 different categories. The rationale
behind the choice of statistical tests performed is explained in section 4.4.1. The results
are presented in Table 4-14.
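Table 4-14 Statistical tests and results obtained
Country | Variable 1 | Variable 2 | Test | Statistical significance at 95 percent confidence level
Country A | ATC functionality | Severity | Non-parametric test (Cramer's V) | p<0.001
Country B | ATC functionality | Severity | Non-parametric test (Cramer's V) | p<0.001
Country C | ATC functionality | Severity | Non-parametric test (Cramer's V) | p<0.001
Country D | ATC functionality | Severity | as above | p<0.001
Country D | ATC functionality | Duration | as above | p<0.001
Country D | Severity | Duration | Non-parametric test (Kendall’s tau) | p=0.021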
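For readers unfamiliar with the association measure named in Table 4-14, Cramér's V is derived from the chi-square statistic of a contingency table of two categorical variables. The sketch below is a plain implementation of the standard formula; the counts are hypothetical and are not taken from the datasets analysed here. The p-values in the table would come from the underlying chi-square test, which the sketch does not compute.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical counts, NOT taken from the thesis datasets:
# rows = ATC functionality (communication, surveillance, navigation),
# columns = severity rating (major, moderate, minimal).
example = [
    [30, 40, 130],
    [20, 35, 95],
    [5, 10, 85],
]
print(f"Cramér's V = {cramers_v(example):.3f}")
```

The measure ranges from 0 (no association) to 1 (complete association), which makes it suitable for comparing tables of different sizes, such as those built from the four datasets.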
All statistical tests revealed significant relationships. For all available datasets there is a
significant relationship between the type of ATC functionality affected and the
equipment failure severity rating. The main findings from these tests indicate the
dominance of equipment failures affecting the communication and surveillance
functionalities with both minimal and major impact (see Table 4-15). The last test,
namely the relationship between failure severity and duration for Country D’s dataset,
indicates a significant negative relationship. In other words, the data indicates that the
longer the failure lasts, the less severe it tends to be. This finding is expected as more
severe failures tend to be attended to immediately and thus the time between the first
log and closure of these failures may be shorter.
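The negative relationship between severity and duration can be illustrated with Kendall's tau, a rank correlation based on counting concordant and discordant pairs of observations. The sketch below implements the tau-b variant, which allows for ties; which variant was used in the analysis is not stated, so tau-b is an assumption, and the data are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation, which allows for tied values."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: counts towards neither term
        elif dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denominator = math.sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denominator


# Hypothetical records, NOT taken from the Country D dataset:
# severity coded as 1 = minimal, 2 = moderate, 3 = major; duration in hours.
severity = [3, 3, 2, 2, 1, 1, 1]
duration = [0.1, 0.2, 0.5, 2.0, 6.0, 30.0, 0.3]
print(f"tau-b = {kendall_tau_b(severity, duration):.3f}")
```

A negative value indicates that records with higher severity codes tend to have shorter durations, which is the direction of the relationship reported for Country D.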
Table 4-15 Main findings regarding interactions between ATC functionality and severity
Country | Severity rating: Major | Severity rating: Minimal
Country A | surveillance | communication
Country B | communication | communication and navigation
Country C | navigation
Country D | communication and surveillance | communication and surveillance
After qualitative and quantitative assessment of the equipment failures in ATC, the next
section derives a framework of the equipment failure impact assessment tool. This tool
is designed to assess equipment failures and provide an indication of their severity or
overall impact on ATC operations.
4.5 Qualitative equipment failure impact assessment tool
The ATC functionality classification defined in Chapter 2 is used as a basis for the
framework of the qualitative equipment failure impact assessment tool, as designed in
this research. This tool takes into account the proposed classification as well as the
failure characteristics relevant to controller performance. Thus, all previously defined
equipment failure characteristics must be examined for their relevance to ATC
operations. Table 4-16 provides the list of equipment failure characteristics relevant to
this tool. These are the type of ATC functionality provided by the failing system,
complexity of failure type, time course of failure development, and duration of failure.
Table 4-16 Review of equipment failure characteristics with regard to their impact on ATC operations
Equipment failure characteristics | Impact on ATC operations | Comment
ATC functionality affected | √ | To be considered
Complexity of failure type | √ | To be considered
Time course of failure development | √ | To be considered
Duration of failure | √ | To be considered
Impact on operational room | x | Output
Impact on ATC operations (severity) | x | Output
Impact on ATM operations (capacity, delays) | x | Not relevant within the scope of this research
The inclusion of all failure characteristics in this tool except ‘ATC functionality affected’
is relatively straightforward. When including the characteristic ‘time course of failure
development’, out of three possible categories (i.e. sudden, gradual, and latent) the
category ‘latent’ was omitted. The reason for this lies in the fact that latent failures tend
to be overlooked in the overall ATC system for long periods of time until triggered by
some other failure. As such, they have a profound effect on the controller, but only
once they are triggered by another failure.
The ‘ATC functionality affected’ represents the key failure characteristic in terms of
effect on controller performance. It is significantly different if the controller is left to
operate without some key functionality (e.g. radar picture, communication, power
supply) as opposed to some auxiliary tools or equipment (e.g. monitoring tool, headset,
mouse). Therefore, it is necessary to separate ATC functionalities according to their
importance for the radar control of air traffic in a dedicated airspace. The separation is
intended to simply differentiate between primary and secondary ATC functionalities.
Their precise definitions informed by various examples are given in the following
paragraphs and Table 4-17.
Primary ATC functionalities are considered primary tools for achieving safe and
efficient flow of air traffic in any dedicated airspace. This group consists of the key
components, equipment, or tools of the communication, navigation, surveillance, data
processing, and power supply functionalities. These ATC functionalities are
categorised as primary ATC functions because they provide the critical information to
the controller. This critical information consists of: voice (and data) communication with
the aircraft in a dedicated airspace, aircraft horizontal and vertical position relative to
other traffic, and navigational directions or vectors to comply with the requirements of
the flight plan. These data are presented to the controller via an operational display
used for tracking the progress of multiple aircraft at any given moment. In modern ATC
Centres, the communication function is provided via the Voice Switching
Communication System (VSCS) touch panel (see Chapter 2 for more details). In
addition, it is necessary to highlight that the power functionality also represents a
primary function. This is a direct consequence of the computer driven ATC environment
where electrical power supplies all of the above mentioned systems. Therefore, in case
of any disruption (either from public utilities or an ATC Centre's own installation), the
controller may lose some or all primary functionalities. Table 4-17 captures the primary
ATC functionalities.
Secondary ATC functionalities (Table 4-17) represent supporting tools to achieve the
primary objective of the ATC service. Their function is important but not irreplaceable
by other, primary ATC functionalities. This group consists of: input/pointing devices,
system monitoring, safety nets, supporting ATC tools, as well as various components
of the communication, navigation, surveillance, and data processing functionalities. For
example, STCA, as a safety net, gained popularity over the past few years because of
its increased safety application and as a last ground-based technical defence against
mid-air collisions. Its sole purpose is to alert the controller to unsafe projected proximity
of two or more aircraft. Therefore, this system cannot be considered a primary function
in ATC but more of a supportive one. Furthermore, ATC tools, such as arrival and
departure managers, help sequence takeoff and landing of aircraft to provide the most
efficient utilisation of available resources (i.e. runway and airspace capacity). Overall,
without these tools, the controller may still provide the same functionality with
potentially less efficiency and increased workload.
Table 4-17 Detailed overview of the primary and the secondary group of ATC functionalities
ATC functionality group | ATC functionality | Sub-functionalities (equipment, sub-systems, tools)
Primary | Communication | Air-ground; Ground-ground; Voice Switching Communication System
Primary | Navigation | Instrument Landing System (ILS) (during approach phase and in the case of reduced visibility)
Primary | Surveillance | Primary Surveillance Radar; Secondary Surveillance Radar; Parallel Approach Runway Monitor; Terminal Approach Radar; Precision Approach Radar; Air Situational Display
Primary | Data processing | Flight Data Processing System; Radar Data Processing System
Primary | Power supply | Main power system; Uninterruptible power supply (generator, battery)
Secondary | Communication | Data exchange network; Back-up system; Aeronautical Information Service; Other
Secondary | Navigation | Navigational aids (e.g. Very high frequency Omnidirectional Range - VOR, Distance Measuring Equipment - DME); Airport facilities control and monitor (navigation aids monitoring, aeronautical ground lighting)
Secondary | Surveillance | Surface Movement radar; Automatic Dependent Surveillance; Aerodrome Traffic Monitor; Other (radar link, radar console); Auxiliary Display
Secondary | Data processing | Flow control supporting equipment; Fallback facility; Other (e.g. strip printer)
Secondary | Supporting function (ATC tools) | Monitoring aids; Sequencing manager; Other
Secondary | Safety nets | Short Term Conflict Alert; Minimum Safe Altitude Warning; Area Proximity Warning; Runway Incursion Monitoring and Conflict Alert System
Secondary | Pointing and input devices | Pointing devices; Input devices
Secondary | System monitoring | Data recording and playback facility; Control and monitoring; Degraded modes; Time management
Based on the selected characteristics of ATC equipment failures, it is possible to rate
the severity of each possible combination of characteristics. The three-level severity
rating defined previously, based on the impact of equipment failure on ATC operations,
has been used. This severity rating differentiates between major, moderate, and
minimal impact, as defined in section 4.2.3. In general, Figure 4-13 presents the
equipment failure impact assessment tool as a four-step methodology to assess the
severity of an equipment failure. After determining the exact characteristics of
equipment failure in each step, it is possible to follow the link to the final outcome, i.e.
severity rating.
Figure 4-13 Qualitative equipment failure impact assessment tool
The output of this tool is an assessment of the overall impact of an equipment failure
on ATC operations and consequently controller performance. The rationale behind the
severity ratings presented in Figure 4-13 is as follows:
• Loss of primary functionality tends to have moderate to major severity, depending
on other equipment failure characteristics (e.g. complexity of failure type) and
relevant contextual conditions (e.g. traffic). The moderate to major severity rating is
due to the fact that the primary ATC functionalities represent the critical tools for
achieving a safe and efficient flow of air traffic in any airspace.
• Loss of secondary functions tends to have minimal to moderate severity, depending
on the additional variables such as complexity of failure type, time course of failure
development, and duration. The minimal to moderate severity rating is due to the fact
that the secondary ATC functionalities only provide assistance for more efficient air
traffic control, but do not represent the systems without which the control of the air
traffic flow becomes unfeasible.
• Multiple failure occurrences may have a more severe impact on ATC operations
than a single failure occurrence simply because controllers have to cope with more
than one failure simultaneously.
• Gradual failures (e.g. gradual loss of data integrity) may have a more severe impact
on ATC operations than sudden failures (e.g. sudden loss of data).
• Duration of failure and severity rating tend to be inversely related. Data
analysis indicates that the longer the failure lasts, the less severely it tends to
affect ATC operations and controller performance. The rationale behind this is that more
severe failures tend to be attended to immediately and repaired in a shorter time.
Moreover, if it is known that a certain primary functionality will not be available for a
considerable amount of time, an ATC Centre may impose strict flow restrictions. For
example, strict flow restrictions may be imposed in the event of total failure of the
surveillance function (loss of primary and secondary radar). Partial failure (loss of
secondary radar) would allow traffic but at a restrictive flow rate. Even if a
prolonged failure affects secondary ATC functionality (e.g. strip printer), the
controller working position will have to be closed. This is due to the disruption
caused by replacement of a previously automated task with manual input of flight
information for each flight entering a dedicated airspace. As a result, it seems that
the most severe impact can be expected mainly from short to medium duration
failures.
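The rationale listed above can be sketched as a small rule-based function. This is an illustration only: the exact links between the four steps in Figure 4-13 are not reproduced in the text, so the escalation rule below, together with all names used, is an assumption rather than the tool itself.

```python
from dataclasses import dataclass
from itertools import product

# Severity range per functionality group, following the rationale above.
RANGE = {
    "primary": ("moderate", "major"),
    "secondary": ("minimal", "moderate"),
}


@dataclass
class Failure:
    group: str        # "primary" or "secondary" (see Table 4-17)
    complexity: str   # "single" or "multiple"
    time_course: str  # "sudden" or "gradual"
    duration: str     # "short", "moderate", or "substantial"


def rate(failure):
    """Return a severity rating within the range allowed for the group.

    ASSUMPTION: the rating starts at the lower end of the range and moves to
    the upper end when the failure is multiple or gradual and is not of
    substantial duration. Figure 4-13 may link the steps differently.
    """
    low, high = RANGE[failure.group]
    aggravated = failure.complexity == "multiple" or failure.time_course == "gradual"
    if aggravated and failure.duration != "substantial":
        return high
    return low


# Enumerate every combination of characteristics and keep those rated major.
ALL = [
    Failure(*combo)
    for combo in product(
        RANGE,
        ("single", "multiple"),
        ("sudden", "gradual"),
        ("short", "moderate", "substantial"),
    )
]
major = [f for f in ALL if rate(f) == "major"]
print(len(major), "of", len(ALL), "combinations are rated major in this sketch")
```

Reading the function forwards rates a given failure, whereas filtering the enumeration by rating lists the combinations of characteristics that lead to a given severity.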
The emphasis of this research is on equipment failures which may have a major impact
on ATC operations, including air traffic controller performance. Therefore, the output
of the qualitative equipment failure impact assessment tool in Figure 4-13 is useful for
selecting potential equipment failures of relevance to the research on controller
recovery (used to inform the experimental design in Chapter 9).
Considering this, the qualitative tool could be used in an operational environment in two
ways. Firstly, the left-to-right approach allows investigation of past equipment failure
occurrences and their impact on ATC operations. Secondly, using the right-to-left
approach this qualitative tool can be used as a method for design of the most severe
training scenarios. The training instructors could easily adjust the set of primary ATC
functionalities to the taxonomy of their systems/equipment and the characteristics of
the ATC system architecture. The qualitative equipment failure impact assessment tool
may be used as a design tool for the regular refresher unusual/emergency situation
training as recommended by the EUROCONTROL ASSIST scheme (EUROCONTROL,
2003f).
The main disadvantage of the qualitative equipment failure impact assessment tool is
its inability to simultaneously assess the impact of several independent failures on
controller performance; rather, it assesses one failure at a time, as well as common
cause and common mode failures through the ‘complexity of failure type’ category. However,
previous research has already highlighted that multiple failure occurrences create
the highest workload (Wickens et al., 1997). As such, the current version of the
qualitative equipment failure impact assessment tool is sufficient for selection of the
most severe failure types, independent of each other. Future research should look into
the enhancement of this tool to enable the assessment of the impact of several
independent failures on controller performance. The output of this more advanced
approach would be to indicate the most severe independent multiple failure
combinations. However, to achieve this, the tool would have to be designed for a
specific ATC Centre to integrate the complexity of its ATC architecture and flow of data
between the various components of the ATC system.
4.6 Summary
In line with the objective of the research presented in this thesis, this Chapter has
identified potential equipment failure types and their key characteristics. Special
attention has been paid to the consequences of equipment failures and their impact on
ATC operations. A severity rating has been defined and applied to available operational
failure reports. The Chapter has further discussed technical recovery designed to
prevent or mitigate the impact of equipment failures on ATC operations and controller
performance.
Stepping away from theoretical findings from past literature, this Chapter has provided
operational input through the analyses of operational failure reports from four countries.
These analyses focused on four variables: the type of ATC functionality and equipment
affected by the failure, complexity of failure type, severity of its impact, and the overall
duration of the failure. Using the available reports it has been possible to identify
distributions of equipment failures in relation to these four variables. Although these
countries are different in terms of the volume and characteristics of airspace they
control, traffic levels, and equipment types, the analyses have shown that
communication and surveillance functionalities are affected most by equipment failures.
When observing only major failures, the most affected are the communication,
surveillance, data processing, and power supply functionalities. Further investigation of
major failures lasting a short period of time has revealed the most affected ATC
equipment. These are the data exchange network (as part of the communication
functionality), the flight data processing system (as part of the data processing
functionality), and air situational display (as part of the surveillance functionality).
The Chapter has concluded with development of a framework for the assessment of
the impact that every single equipment failure has on ATC operations. In general, the
knowledge acquired from equipment failure literature, informed by the analyses of
operational failure reports has been incorporated into the qualitative equipment failure
impact assessment tool and its severity output. These will inform the choice of
equipment failure and its characteristics for the experiment designed to assess
controller recovery.
The safety-critical industry is aware of the fact that hazardous equipment failures
cannot be avoided and that absolute safety is not achievable. Thus, the same attention
given to their analysis should be given to the overall human recovery process. Kanse
(2004) points out that “what we really want to prevent is not so much the failures
themselves, but the negative consequences of these failures.” As a result, the following
Chapter gives appropriate attention to the controller recovery process.
5 Air Traffic Controller Recovery
The previous Chapter explained the characteristics of equipment failures and the
notion of technical recovery. This Chapter reviews the associated issues of the process
of controller recovery. In Air Traffic Control (ATC), the human recovery process
involves two groups of individuals. One group consists of controllers and the other
consists of system control and monitoring engineers¹. The Chapter starts with a brief
discussion of the roles controllers and engineers have in the recovery process. As the
focus of this thesis is on controller recovery from equipment failures, the Chapter
continues with a review of past research of relevance to this subject. In this respect, the
Chapter reviews in detail the phases of controller recovery and the corresponding
models developed for the Air Traffic Management (ATM) and non-ATM industries. This
is followed by a discussion of the major factors that influence the quality of controller
recovery. The Chapter concludes by proposing a set of variables used for a detailed
assessment of controller recovery performance later in this thesis. This set of recovery
variables is also used as a guide to the design of the experiment to capture real data
on controller recovery in Chapter 9.
5.1 Human recovery in air traffic control
The human recovery process in the ATC environment involves two distinct groups of
individuals. One group is represented by air traffic controllers and can consist of a
single controller or a team of controllers depending on the configuration of the ATC
Centre and the traffic levels at any given moment. Engineers from the system control
and monitoring unit belong to the second group. This section gives a brief description
of the role of each group and the specific tasks to be executed to recover from
equipment failures in ATC.
1 Referred to as ‘engineers’ throughout the thesis.
5.1.1 Recovery by air traffic controllers
In the case of any equipment failure that affects controller performance (referred to as
a hazard in this thesis), controllers are responsible for recovering the system and
achieving a safe but not necessarily efficient level of operation. There are many human
factors issues that affect controller performance under normal conditions, and it is
reasonable to assume that the same factors are even more critical under abnormal
conditions, such as equipment failures. In other words, the context in which controller
performance takes place is important in understanding controller reliability. A detailed
review of contextual factors that may influence controller recovery and a methodology
for assessing their potential influence on controller performance are presented in
Chapters 7 and 8, respectively.
While a recovery procedure may or may not exist for a given equipment failure, most
ATC Centres have developed procedures for reporting and resolving such failures. Any
equipment failure should be reported to the supervisor, whilst those with operational
and safety impact must be reported under the mandatory occurrence reporting scheme
(for details see Chapter 3). Details of the failure are also forwarded to the system
control and monitoring unit.
When a failure has been rectified, the system control and monitoring unit notifies the
supervisor that the equipment has been restored to service. Then it is the duty of the
supervisor to inform the relevant sector staff and ensure that the restored equipment is
functioning correctly before updating the status of the failure in the database. In the
event that the system control and monitoring unit identifies a failure occurring in the
operations room, it is the duty of this unit to inform the supervisor, who will subsequently
inform the controllers.
5.1.2 Recovery by system control and monitoring engineers
Failures are not necessarily detected only by controllers. Due to the layers of built-in
defences that exist in modern ATC Centres, the majority of equipment failures do not
affect the controller (NATS, 2002). These failures are detected by the technical system
and resolved by engineers from the system control and monitoring unit (e.g. by
receiving a system-generated alert and using redundant equipment, respectively).
EUROCONTROL (2004e) refers to an ATC system control and monitoring unit as ‘a
critical partner in maintaining ATC systems’. Engineers monitor and control equipment
that supports controllers. They reconfigure and maintain degraded or failed equipment
with minimum disruption to controller tasks and regularly upgrade the software as
operational requirements deem necessary. System control personnel have rapid and
reliable communication links with the ATC operations room via the supervisor. They
utilise this communication channel to inform ATC staff of the status and performance of
equipment and systems or to receive reports of technical problems and equipment
failures from the operations room. Therefore, EUROCONTROL (2004e) concludes that
recovering the ATC system from failure is a result of close coordination and
cooperation between controllers, technicians, and management.
Following this brief discussion of the roles and responsibilities of controllers and
engineers in the recovery process, the next section reviews the past research on the
human recovery process and its phases, developed for the Air Traffic Management
(ATM) and non-ATM industries. The main findings are then applied to a particular
process of controller recovery.
5.2 Phases of the controller recovery process
Existing literature on the human recovery process (either from human error or technical
failure) is largely based on the concept of a sequence of phases that constitute the
process of recovery. The human recovery process has become an important topic in
many areas of applied psychology, particularly in safety research in the chemical
industry (e.g. van der Schaaf, 1992; Kanse and van der Schaaf, 2000; and Kanse,
2004), the nuclear industry (Kaarstad and Ludvigsen, 2002), and the ATM industry
(Bove, 2002). Other examples include research on errors in the use of human-
computer interfaces (e.g. Kontogiannis, 1999; Rizzo, Ferrante, Bagnara, 1995; Zapf
and Reason, 1994), in the office environment (e.g. Frese, Broadbeck, Zapf, and
Prumper, 1990), in software design (Frese, 1991), and in the assessment of everyday
slips and mistakes (e.g. Sellen, 1994).
As can be seen from Table 5-1, there is consensus amongst researchers in various
domains on the existence of at least three phases of the human recovery process. A
few researchers, focusing on errors in the design of human-computer
interfaces, include a phase before the actual detection: the occurrence of an error
(Zapf and Reason, 1994) or the emergence of a mismatch (Rizzo et al., 1995), with the
latter being a precursor of the detection phase. The emergence of a mismatch involves
the discrepancy between feedback and active knowledge (active expectations or
implicit assumptions). Rizzo et al. (1995) discuss and explain the difference between
mismatch and detection processes through several examples of human error.
Mismatch is considered as a breakdown of the action-perception loop. However, only
after actual detection of mismatch will it be understood as an error or a failure.
From the detection phase onwards, some phases, including diagnosis and correction,
are recognised by most researchers even though sometimes different terminology is
used (Table 5-1). For example, the diagnosis phase is often referred to as the
explanation, localisation, or identification phase. Similarly, the correction phase is often
referred to as the handling, planning and execution, recovery, or countermeasure
phase.
Table 5-1 Phases of the recovery process identified in past research
Author(s) | Context of research | Phases of the recovery process
Frese (1991) | Software design | Error detection; Error explanation; Error handling
Kontogiannis (1999) | Human Machine Interface | Error detection; Error explanation or localisation; Error correction
Zapf and Reason (1994) | Human Machine Interface | Error occurrence; Error diagnosis (detection + explanation); Error recovery (planning + execution)
Rizzo, Ferrante, and Bagnara (1995) | Human Machine Interface | Mismatch emergence; Detection; Recovery
Sellen (1994) | Assessment of everyday slips and mistakes | Error detection; Error identification; Error recovery
van der Schaaf (1992) | Chemical industry | Detection; Localisation; Correction
Kanse (2004) | Chemical industry | Detection; Explanation; Countermeasures
Kaarstad and Ludvigsen (2002) | Nuclear industry | Detection; Explanation; Correction
Bove (2002)2 | ATM industry | Detection; Correction
Therefore, in the research on recovery from equipment failures presented in this thesis,
past research is used to inform the phases of the controller recovery process.
2 Bove (2002) does not identify the diagnosis phase in the human error management process.
This may be due to the fact that this phase represents a covert human activity, difficult to observe, measure, and capture in incident reports.
Detection of equipment failure is taken as the first phase, triggered by the mismatch
between ATC system feedback and active knowledge of the controller (expectation or
assumption). This phase is followed by diagnosis and correction, leading toward
the outcome of the recovery process (as a result of both technical and controller
recovery).
Controller recovery is defined in this thesis as the ability of the controller to detect3,
diagnose, and correct any non-nominal system state resulting from ATC equipment
failure (adapted from van der Schaaf, 1995). The objective of the recovery process (i.e.
its outcome) is to restore the system to its nominal (pre-failure) state or at least to limit
the consequences of failure in the most efficient and effective way (by achieving stable
non-nominal system state). The following sections discuss the phases of controller
recovery.
5.2.1 Detection
Human recovery is a sequential process whose first step is the detection of failure.
Without this detection there is no recovery process. Therefore, the first task of the
controller is to detect the failure. As previously explained, failures can first be
detected either by a technical system or by a controller. Hallbert and Meyer (1995) note
that to accomplish detection by the human operator, the stimulus must be
recognisable. In other words, the stimulus must be something that a controller has
already experienced, is trained to observe, or is of sufficient intensity to interrupt the
monitoring process (e.g. visual or auditory alert positioned within the field of view but
different from the background ‘noise’ already present on the radar screen or other
operational support system).
Thus, detection is triggered by any mismatch between the expected effects and
observed outcomes. The mismatch can be explained on the basis of the information
that is matched against the frame of reference or range of the expected system
responses. For example, after issuing an instruction for a flight level change to an
aircraft, the controller expects to see the old flight level gradually changing toward the
new one. However, if the controller observes a flight level change outside the expected
3 Failures can first be detected either by a technical system or by a controller. Failures
detected by a technical system may trigger the generation of an alert (via a warning device) transmitting information on the failure to the controller. However, failures can also go unnoticed by the technical system and be detected by a controller working with fallible equipment.
values, then this violated expectation will trigger the identification of some sort of ‘fault’. This
‘fault’ can be caused by an erroneous flight level change by the pilot or by an erroneous system readout
of the aircraft altitude (e.g. due to radar garbling).
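The expectation-based check in the flight level example can be sketched as follows. This is an illustration only: the function name, the representation of flight levels as plain numbers, and the tolerance of 2 flight levels are assumptions made for this sketch and are not taken from any operational ATC system.

```python
def is_mismatch(old_fl, cleared_fl, observed_fl, tolerance=2):
    """Return True if the observed flight level lies outside the range the
    controller would expect after clearing an aircraft from old_fl to cleared_fl.

    tolerance (in flight levels) is a hypothetical allowance for normal
    readout variation; it is an assumption of this sketch.
    """
    low = min(old_fl, cleared_fl) - tolerance
    high = max(old_fl, cleared_fl) + tolerance
    return not (low <= observed_fl <= high)


# Aircraft cleared to climb from FL300 to FL340.
print(is_mismatch(300, 340, 320))  # readout between the old and the new level
print(is_mismatch(300, 340, 280))  # readout below both levels: a possible 'fault'
```

A mismatch flagged in this way only indicates that something is wrong; whether the cause is a pilot error or a faulty readout is left to the diagnosis phase.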
In the case of a total failure of a particular function, it is easier to detect and diagnose
the significance of the change, since the failure is obvious. However, in the case of a
partial failure of a particular ATC function (e.g. corruption of tracks and squawks),
detection may be more challenging. In these circumstances, detection is based on the
controller’s memory of aircraft’s past positions and future trajectories, aided by
available tools (e.g. flight strips). An example of potential difficulties encountered by
controllers in detecting partial equipment failure is reported by Sampaio and Guerra
(2004). In this example, a sudden failure of the Radar Data Processing System (RDPS)
affected only one radar track and went unnoticed by the controller for 21 minutes (see
Chapter 4, section 4.2.1).
Detection is also closely connected to the time course of equipment failure
development, namely sudden, gradual, or latent failures (see Chapter 4, section 4.1.3).
Sudden failures do not allow any time to prepare, but are usually detected immediately.
On the other hand, detection of gradual failures may be extremely difficult and delayed.
Persistent (latent) failures are almost impossible to detect. They might exist in the ATC
system for a long period of time before they are detected. This is confirmed by
interviews conducted during this research with the aim of augmenting the theoretical
sources of information. Engineers from three European ATC Centres confirmed that
latent failures (mostly software failures) tend to go unnoticed until some other event or
failure reveals their existence (for evidence see Appendix II).
There are various other factors that can hinder failure detection, such as difficulties in
observing system feedback or remembering expectations about effects. Detection can
also be made difficult by inappropriate system design (e.g. poor human machine
interface, poor quality or position of alert), workplace layout, or controller working
strategy. As an example, an alert that is barely visible or audible may remain
undetected even by a highly alert controller.
Often, successful detection occurs as a consequence of a combination of design
qualities and mental resources. An example is taken from one of the European ATC
Centres where the label of the ATC function positioned in the ‘general information
window’ changes its colour from white to yellow in the case of a failure. However, in the
training facility of the same ATC Centre, within the same window, one specific label is
designed to be colour-coded yellow regardless of its status (i.e. label ‘Lines’ refers to
the status of the communication lines between a number of ATC Centres). Such a
training platform design feature has the potential to result in the missed detection of a
failure by a controller as a result of a continuous and consistent presence of the yellow
colour in the ‘general information window’.
Besides the quality of an alert, its onset also plays an important role. As previously
discussed in Chapter 4, alert onset (i.e. Time-To-Alert or TTA) is defined as the time
between a system’s detection of a failure and the moment an alert is presented on the
Human Machine Interface (HMI), either by colour change or text message. More
importantly, the future concept of cognitively convenient alarm onset aims to
circumvent human limitations in alert detection by providing an alert, for the system-detected
failure occurrence, at the moment when levels of controller workload allow its detection
(see Chapter 4, section 4.3.2).
The above discussions have highlighted that detection can be either enhanced or
hindered by a combination of technical and human related factors. External stimulus,
past experience, appropriate design solutions, and sudden development of equipment
failures tend to enhance detection. However, inappropriate system design, high levels
of workload and fatigue may hinder failure detection. Similar conclusions are drawn
from the study on human recovery performance in nuclear power plants by Kaarstad
and Ludvigsen (2002). Based on a literature review, an experimental investigation, and
field studies, they identify the three most significant factors that affect the detection
phase. These are:
• communication - interaction with colleagues can provide information to detect a
failure;
• system feedback - cues directly found in the operational environment (e.g. alerts,
other unusual system events); and
• internal feedback - mismatch between operator’s expectations of
system/environment and the existing system status.
All of the above-mentioned factors are relevant within the ATC environment. For example,
communication represents an important factor as the information on an equipment
failure can come from the supervisor or the system control and monitoring unit.
Similarly, in the ATC environment internal feedback is referred to as the ‘mental model’.
Once the controller is aware of an information mismatch, his or her task is to rapidly
determine the significance of that mismatch. Generally, the existing system output is
compared with the previously observed one, to determine whether the change is within
tolerance. For example, if an aircraft is in level flight no flight level change should occur
and any deviation from the cleared flight level should trigger the detection of an
unusual event (e.g. pilot error, radar garbling).
The detection phase is investigated further using data from a questionnaire survey and
an experiment in Chapters 6 and 10 respectively.
5.2.2 Diagnosis
Once detection occurs, the diagnosis phase (also known as explanation, localisation,
or identification phase) determines what the failure is, its cause, and what should be
done to correct it. A controller needs a good knowledge of a failure to determine what is
occurring and its effects (e.g. what to expect in the near future, whether the function is
still partially available or totally lost, any problem with data integrity and possible impact
on other tools). This is especially important in the ATC environment where the overall
system consists of highly integrated components and different failures may present
themselves to the controller in a similar manner. For example, a radio frequency failure
manifests itself in the same manner regardless of its cause (i.e. ground- vs. airborne-
based failure). Therefore, it is up to the controller to identify the true failure by ruling out
alternatives. In this particular example, the controller will first try to establish radio
contact with other aircraft. If communication is established with the other aircraft it is
reasonable to assume that the failure is on the aircraft side. The controller will then try
to identify if it is a receiver or a transmitter failure by asking the aircraft to squawk
identification. If the aircraft squawks identification then the pilot clearly heard the
transmission. The controller then knows that the aircraft has experienced a transmitter
failure. By employing this procedure, the controller determines the precise element of
the equipment that failed, and thus implements the most appropriate recovery
procedure.
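The elimination procedure in the radio failure example can be sketched as a small decision function. The function name and return labels are hypothetical. Two branches go beyond what the text states and are assumptions of this sketch: that no contact with any other aircraft points to a ground-based failure, and that no response to the identification request points to an airborne receiver failure.

```python
def diagnose_radio_failure(contact_with_other_aircraft, aircraft_squawked_ident):
    """Rule out alternatives for a loss of radio contact with one aircraft.

    contact_with_other_aircraft: True if radio contact with other aircraft
        on the frequency can be established.
    aircraft_squawked_ident: True if the affected aircraft squawks
        identification when asked to do so.
    """
    if not contact_with_other_aircraft:
        # Assumption: if no aircraft can be reached, the ground side is suspected.
        return "ground-based failure"
    if aircraft_squawked_ident:
        # The pilot heard the request, so the aircraft cannot transmit.
        return "airborne transmitter failure"
    # Assumption: no ident response suggests the pilot did not hear the request.
    return "airborne receiver failure"


print(diagnose_radio_failure(True, True))
```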
Past research in non-ATM industries has shown that in some cases, after the detection
of a failure, the corrective actions are immediately known and implemented. In these
cases, the diagnosis phase is omitted (e.g. in the nuclear industry - Kaarstad and
Ludvigsen, 2002). Similarly, a study from the chemical process industry has shown
that the order of the phases is not always the same. More precisely, the diagnosis
phase does not necessarily follow the detection phase, especially in time-critical
operations. Often a quick fix might be necessary or an initial correction might occur
even before the cause of a failure has been identified (Kanse, 2004).
The findings from non-ATM industries are not entirely applicable to the ATC/ATM
environment. It is difficult to see how the diagnosis phase could be omitted simply
because proper ATC equipment failure recovery is not possible without knowing the
true nature of a failure. However, the duration and the attention dedicated to the
diagnosis phase relates directly to the level of workload experienced by the controller
at the moment of failure occurrence and during the recovery process. Through
interviews, a EUROCONTROL study determined that controllers on most occasions do
not seek an explanation for the cause of a failure (EUROCONTROL, 2004e). They focus
only on identifying the system that failed, which is essential to implement an adequate
recovery strategy. An example could be the code-callsign conversion failure, where,
having detected a problem, the controller has to identify the pair of aircraft affected.
This tends to be a very time-consuming process leaving no time for the controller to
consider the cause of the failure. Another example is corruption of radar data. If the
controller doubts the quality of a particular radar source in the multi-radar coverage
airspace, it is possible to use information from other radar sources. If the same failure
occurs in the single-radar coverage airspace, the controller has to disregard radar data,
initiate procedural (non-radar) control, and pass the problem to the system control and
monitoring unit. In both cases, the controller has to determine what failed and what the
impact of that failure is, in order to implement an adequate recovery strategy. The
cause of the failure is left to the system control and monitoring unit to investigate.
From the discussion above, it is clear that the diagnosis phase is important to identify
the equipment that has failed. However, if the failure is identified and corrective actions
are immediately known, further diagnosis is omitted in favour of the subsequent correction phase. The
diagnosis phase and the factors that may influence it are addressed further in Chapter
10 on an experimental investigation. Once the controller diagnoses the failure type and
its impact on the ATC system, the tasks shift to more action-based activities. In short,
the controller initiates the correction phase which is described below.
5.2.3 Correction
Failure recovery involves knowing how to undo or minimise the effect of failure and
achieve the desired system state (nominal or stable non-nominal system state,
respectively). The first priority is to minimise the effect on the air navigation service and
the exposure of the problem in terms of aircraft and time. Depending upon the
equipment failure type, recovery should follow available procedures (for details see
section 5.5). Some of them could be fairly simple like switching to another radar source
in multi-radar processing areas, changing to the secondary radio frequency (if the
primary one is blocked), changing unserviceable input devices (mouse or keyboard),
and switching to another console (if the current one is not operational). Other recovery
strategies could be very complex and both physically and mentally demanding. For
example, if an automated conflict detection tool fails to work properly (e.g. Short-Term
Conflict Alert (STCA) or Medium Term Conflict Detection (MTCD)), an alert might
appear when there is no conflict, or conversely the controller might detect a conflict that
was not alerted automatically. In both instances, the controller will diagnose that the
conflict detection tool itself is not functioning properly. Immediate action would be
required to ensure the safety of all traffic. In other words, the controller will have to
detect all existing conflicts and resolve them in a timely and efficient manner without
the assistance of automated safety nets (e.g. STCA). The second priority would be to
test and restore the automated function, which would be the responsibility of the
system control and monitoring unit.
Past research in the nuclear industry has identified different types of decision events
that constitute the correction phase of recovery (Orasanu and Fischer, 1997; Kaarstad
and Ludvigsen, 2002). These are assessed for the ATC environment below:
• ignoring the failure – error/failure has been detected, but ignored by the operator for
two possible reasons: error/failure is considered irrelevant (i.e. no impact on
operations) or the operator assumes that his/her intervention may make the
situation worse. In any case the failure would have to be reported;
• applying procedures – this seems to be the most common correction type.
Therefore, it is necessary to ensure that procedures exist and that they are
appropriate to a particular failure;
• choosing a solution – in theory this is applicable when procedures are not available
and the human operator has to apply more conscious resources to comprehend the
situation. In many situations it may seem that only one solution is possible to
resolve the failure. However, in retrospect, more than one solution may be
available, while only one was considered at the time; and
• creating a solution – in this case the operator has no experience with the failure
type. No procedures, training, or past experience are available for the human
operator to draw upon. A completely new solution or strategy has to be created.
This represents the most resource-demanding option of all. This process
corresponds to human heuristic competence4 (Rigas and Elg, 1997).
In the context of ATC, if the failure penetrates all existing built-in defences and affects
controller performance, it cannot be ignored. Thus, the recovery from ATC equipment
failures can be accomplished by applying a predefined procedure, modifying an
existing plan, or developing a new one. However, application of an existing procedure
would be the preferred option as it puts the least strain upon the controller. Compared
to the nuclear environment, the execution of the chosen procedure has to be done in a
very short time frame (EUROCONTROL, 2004e). An important aspect of the correction
phase and recovery is coping with stress induced by unexpected failure. Interviews
with controllers conducted for the EUROCONTROL study confirmed that unexpected
failures tend to significantly increase workload and stress (EUROCONTROL, 2004e).
Under these conditions, controllers are unable to perform their tasks effectively, and their
ability to cope with other adverse operational and environmental conditions is greatly reduced.
Furthermore, the controllers interviewed highlighted that critical incident stress
management is essential in managing the stress associated with equipment failures
(EUROCONTROL, 2004e).
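The preference among the three ATC correction options named at the start of this paragraph (applying a predefined procedure, modifying an existing plan, or developing a new one) can be sketched as a simple selection rule. The names below are hypothetical. The text states only that an existing procedure is the preferred option; ranking plan modification ahead of creating a new solution is an assumption of this sketch, based on the remark that creating a solution is the most resource-demanding option.

```python
from enum import Enum


class Correction(Enum):
    APPLY_PROCEDURE = "apply a predefined procedure"
    MODIFY_PLAN = "modify an existing plan"
    CREATE_SOLUTION = "develop a new solution"


def choose_correction(procedure_available, adaptable_plan_available):
    """Pick the least demanding correction type that is available."""
    if procedure_available:
        # Preferred option: it puts the least strain upon the controller.
        return Correction.APPLY_PROCEDURE
    if adaptable_plan_available:
        # Assumed to be less demanding than starting from scratch.
        return Correction.MODIFY_PLAN
    return Correction.CREATE_SOLUTION


print(choose_correction(False, True).value)
```

Ignoring the failure is deliberately absent from the enumeration, since a failure that affects controller performance cannot be ignored.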
The correction phase and the factors that may influence it are investigated further in
Chapters 6 and 10. From the discussions above, it is clear that existing recovery
procedures, recovery training, and past experience with equipment failures play an
important role in the overall recovery process. These three drivers build a knowledge
base for the choice or creation of the most appropriate solution for recovery from an
equipment failure. The discussion above, of the phases that constitute the process of
recovery, is followed in the next section by looking at the outcome of the recovery
process.
5.3 Outcome of the recovery process
Although the main recovery process consists of several phases, as explained
previously, these activities do not conclude the process itself (Figure 5-1). Prior to the
4 There are two types of human competences: epistemic and heuristic. Epistemic competence
refers to domain knowledge about the system which one seeks to control. It is context dependent component of the actual competence. Heuristic competence refers to a general competence for handling complex dynamic tasks. It is context independent, but it is developed over many years through both training and experience. As a result, actions and decisions become fast, automatic, without apparent conscious awareness.
[Figure 5-1 box labels: Equipment failure; Hazard; Recovery; Outcome; Recovery successful; Recovery not successful; Recovery continues; Incident with further consequences]
outcome phase, the human operator attempts to resolve the problem, by implementing
a recovery strategy. This is followed in the outcome phase by post-correction
monitoring or post-recovery analysis to determine the actual outcome of the
implemented strategy. Therefore, the first task in this phase is the monitoring itself,
both by controllers and engineers. Proper design solutions could aid this phase by
providing post-recovery system status indicators.
Figure 5-1 Analysis of the outcome phase (adapted from EUROCONTROL, 2004e)
It might be expected that at this stage human performance requirements are similar to
those of the detection phase. However, as observed by EUROCONTROL (2004e)
there is a crucial difference. Guided by implemented corrections (recovery strategies),
monitoring by both engineers and controllers is driven more by ‘top-down’ processes,
primarily expectation. Since at this stage in the recovery process the operators have
knowledge of the failure and its cause, they also have expectations on how the system
might behave after a correction is implemented. For instance, if the system remains
unstable, operators may expect a reoccurrence of the same problem, other related
problems (common-mode or common-cause failures), or have a general suspicion that
the assessment of the problem was wrong or misleading.
Following the period of monitoring or active checks, the controller must decide whether
recovery is successful. Recovery is considered successful if the system returns to the
nominal (pre-failure) or intermediate, stable state (EUROCONTROL, 2004e).
The intermediate state represents a degraded operational state (e.g. loss of any function,
item of equipment, or a significant overload condition causing increased system
response time) which is detected and stabilised either by controllers or engineers. In
essence, the system is in the intermediate state if the consequences of failure are still
observable in the system performance while controllers are aware of the quality of
information they are receiving from the system and thus the quality of service they can
provide to traffic.
If recovery is unsuccessful, the controller will return either to the diagnosis phase (to determine
the real cause of the problem) or to the correction phase, to retry the previous strategy or
attempt a new one (Kanse and van der Schaaf, 2000; EUROCONTROL, 2004e). This
cycle of reapplied efforts continues as long as there is the time available for recovery.
Otherwise, if no time is available, the final outcome may be an incident with further
consequences (e.g. loss of separation).
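The cycle of monitoring, judging the outcome, and retrying while time remains can be sketched as follows. This is a simplified illustration: the state labels, the function name, and the fixed time cost per recovery attempt are assumptions made for the sketch. The text says only that an incident may result once no time is available, so the final branch should be read as a possible outcome rather than a certain one.

```python
SUCCESS_STATES = {"nominal", "intermediate_stable"}


def recovery_outcome(observed_states, time_available, time_per_attempt):
    """Classify the outcome of a series of recovery attempts.

    observed_states: system state observed after each attempt, in order
        (e.g. "unstable", "intermediate_stable", "nominal").
    time_available, time_per_attempt: same (arbitrary) time unit; a fixed
        cost per attempt is an assumption of this sketch.
    """
    elapsed = 0
    for state in observed_states:
        if elapsed + time_per_attempt > time_available:
            break  # no time left for this attempt
        elapsed += time_per_attempt
        if state in SUCCESS_STATES:
            return "recovery successful"
        # Otherwise the controller returns to diagnosis or correction.
    if elapsed + time_per_attempt > time_available:
        return "incident with further consequences"
    return "recovery continues"


print(recovery_outcome(["unstable", "nominal"], time_available=10, time_per_attempt=3))
```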
The next section reviews the existing models of failure and recovery process
developed to support the research on human recovery in ATM and non-ATM industries.
5.4 Models of human recovery
Throughout the reviewed literature, only a few models cover both equipment failure and
its recovery process. On the other hand, an extensive volume of research is dedicated
to models of recovery from human error. These models are the result of work in the
field of human reliability and can be transferred to recovery from equipment failure. In
chronological order, the review begins with the work of Frese et al. (1990) and Frese
(1991), which was based on office workers’ errors and error handling in using
computers. In 1992, as part of a PhD thesis on near miss reporting in the chemical
process industry, van der Schaaf (1992) developed the Eindhoven classification model
of system failures. This model was based on Rasmussen’s Skill-Rule-Knowledge
(SRK) model of human behaviour (Rasmussen, 1982), since human behaviour is one of the most dominant
factors causing system failures in chemical process plants. The SRK model of human
behaviour was extended to system failures, incorporating additional root causes of
incidents, namely technical and organisational factors. The incorporation of all relevant
failure factors has created a comprehensive approach to safety management.
However, the approach has suffered from the limitations of the SRK model as
discussed below.
Bainbridge (1984) reports problems using Rasmussen’s taxonomy of three main types
of cognitive behaviour, namely SRK. For example, the word ‘rule’ could be used for a
specific procedure, instructions, standard method based on previous experience, or
precise heuristic method. Another criticism is of the associated model for organisation
of cognitive behaviour, the so-called Rasmussen’s pyramid model. The model places
‘skilled’ behaviour at the base and ‘knowledge’ based behaviour at the top of the
pyramid. This model, although representing the general organisation of cognitive
behaviour, does not contain mechanisms for complex behaviour (see Bainbridge,
1984).
While the previous discussions focus mainly on models for recovering from human
error, this section further presents three models that focus on recovery from technical
failures. These are: the model by Kanse (2004) developed and tested in the chemical
process industry; the EUROCONTROL’s project on Solutions for Human Automation
Partnership in European ATM (SHAPE) and the Recovery from Automation Failure
Tool (RAFT) developed specifically for the Air Traffic Management (ATM) industry
(EUROCONTROL, 2004e); and the model of failure recovery in air traffic control by
Wickens et al. (1998). The model by Kanse originates in a non-ATM industry but focuses
not only on the human as a system component, but on equipment and procedures as well.
This model lays down the ideas for RAFT. RAFT and the model by Wickens et al.
were chosen because of their relevance to the research in this thesis, as both assess the
impact of future automation on recovery from potential failures.
5.4.1 Model by Kanse
The basic principle behind the model by Kanse (2004) is a sequence of phases that
constitute the process of human recovery: detection, explanation (i.e. diagnosis), and
countermeasures (i.e. correction). The model is based on past research and
operational data from three studies of near misses in chemical process plants. Near
misses are incidents that have the potential to result in a loss (e.g. an
accident, injury, failure), but do not.
According to this qualitative phase model (Figure 5-2) the recovery process starts by
detection of a failure. This is followed by any combination of explanation (referred to as
diagnosis in this thesis) and countermeasures (referred to as correction in this thesis),
including omitting one or both of these phases but also their recurrences. For example,
the assessment of the order of the recovery steps performed by plant operators in each
incident revealed that the intermediate phase (i.e. diagnosis) was omitted in more than
35 percent of incidents (see Table 3 in Kanse, 2004).
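The ordering constraint of the phase model, as described above, can be expressed as a pattern over single-letter phase codes. This is a minimal sketch: the letter codes D, E, and C follow the labels in Figure 5-2, while the function name is hypothetical. It also assumes that detection occurs exactly once, at the start of the sequence, which is how the description above reads.

```python
import re

# D = detection, E = explanation (diagnosis), C = countermeasures (correction).
# Detection first, then any combination of E and C, including none at all.
_PHASE_PATTERN = re.compile(r"D[EC]*")


def follows_phase_model(sequence):
    """Return True if a string of phase codes fits the described ordering."""
    return _PHASE_PATTERN.fullmatch(sequence) is not None


for seq in ["DEC", "DC", "D", "DCEC", "ED"]:
    print(seq, follows_phase_model(seq))
```

The sequence "DC" corresponds to the incidents in which the diagnosis phase was omitted, and "DCEC" to a quick fix followed by explanation and a further countermeasure.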
The model does not focus on the factors that influence the recovery process but
highlights that factors influencing recovery might be different in different domains.
Additionally, the model does not make any attempts toward the prediction of human
performance, future errors, or failures.
[Figure 5-2 box labels: BEGIN (problem situation arises as a result of one or more failures); D (detection of deviation); E (explanation of deviation and causes); C (countermeasures); END (of recovery process)]
Figure 5-2 Recovery process phase model (Kanse, 2004)
5.4.2 The RAFT Tool
The EUROCONTROL’s SHAPE project addressed the effects of automation on human
performance and future ATM concepts. A part of this project focused on the technical
failures and the controller’s ability to manage them and resulted in the Recovery from
Automation Failure Tool (RAFT), as a method for analysing technical failures.
The basic principle behind RAFT is a sequence of phases that constitute the process of
failure and recovery (Figure 5-3). Recognising that a number of important factors influence
the consequences of an equipment failure, RAFT starts by assessing the
recovery context, which has the potential to influence the human recovery process.
This is followed by an assessment of the failure cause, the problem definition
(according to the RAFT framework an equipment failure leads to a functional
disturbance), and the failure effects. RAFT then moves on to the
investigation of the human recovery process. This is done separately for the controllers
and engineers involved. The final step in the failure analysis is the outcome phase,
which includes an assessment of the effectiveness of the implemented recovery strategy.
RAFT is based on past research and operational experience. Its starting point is a
qualitative model developed by Kanse and van der Schaaf (2000) for the chemical
process industry (further adapted by Kanse, 2004, as explained in the previous section).
The model by Kanse and van der Schaaf is further augmented with operational
experience, extracted from interviews with 31 ATM staff in four European ATC Centres.
The practical use of RAFT relies on an expert group-based
evaluation of each failure and a prediction of how controllers are likely to respond to
equipment failures. The tool is intended to be used together with other SHAPE project
outputs for predicting controller performance in the future highly automated
environment (e.g. a prediction of changes in controller skill requirements, workload,
trust). The approach has not been verified against recovery performance in either
simulated or operational environments and still lacks a set of recovery-relevant
principles to guide designers of current and future ATM systems. Second generation
prospective Human Reliability Assessment (HRA) methods could be used to develop a
predictive capability of the RAFT tool and to inform safety-adequate design principles
related to controller recovery from equipment failures.
Figure 5-3 The Recovery from Automation Failure Tool Framework (EUROCONTROL, 2004e)
5.4.3 Model by Wickens et al.
In 1998, the Panel on Human Factors in Air Traffic Control Automation established by
the Federal Aviation Administration (FAA) studied various aspects of human factors
and the role of the human in proposed future automated systems. Amongst several
different issues, research by this Panel recognised the importance of equipment
failures and recovery. The Panel proposes a model of ATC failure recovery and places
an emphasis on the consequences of degradation of automated ATC functionalities
(Wickens et al., 1998). It is assumed that the model is based entirely on available
research as the Panel focused on concepts that will characterise the future ATC
system. The basic principle behind this qualitative model is the impact of ATC
automation functionalities (left-hand side on Figure 5-4) on capacity, traffic density,
complexity, workload, situational awareness, manual skills, and recovery response
time. Each of these variables is associated with a sign (or a set of signs) indicating
whether automation is likely to increase or decrease the variable in question. However,
this model does not consider in detail how recovery is accomplished.
Figure 5-4 Model of failure recovery in air traffic control. Where two nodes are connected by an arrow, signs (+, -, 0) indicate the direction of effect on the variable depicted in the right node, caused by an increase in the variable depicted in the left node (Wickens et al., 1998)
The model also reflects the hypothetical function which relates recovery response time
to the level of automation (Figure 5-4). It is expected that recovery response time will
increase as the level of automation increases (shown as a dashed upward line on the
right side of Figure 5-4), due to increased complexity, skill degradation, and the overall
‘out of the loop’ phenomenon. The solid downward line reflects the decrease in the
reaction time available to controllers as a result of the introduction of higher levels of
automation. Controllers will have far less time to safely respond to any loss of
separation and fewer opportunities for effective solutions. As a result, this model
represents Bainbridge’s (1983) ‘ironies of automation’ by overlaying two critical time
variables against each other and as a function of automation-related changes. These
variables are: the time required to establish safe separation, given a degraded ATC
service, and the time available to a controller (or a team) to react and safely recover
from a failure.
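The relationship between these two time variables can be illustrated with a minimal sketch. The linear forms and all numerical values below are assumptions made purely for illustration; Wickens et al. (1998) present the function as hypothetical and do not supply these figures. The automation level is assumed to be normalised to the range 0 to 1.

```python
def time_required(level: float) -> float:
    """Hypothetical recovery response time (s); rises with automation level."""
    return 30.0 + 60.0 * level


def time_available(level: float) -> float:
    """Hypothetical time available to the controller (s); falls with automation level."""
    return 120.0 - 80.0 * level


def recovery_margin(level: float) -> float:
    """Time available minus time required; a negative value means recovery
    cannot be completed in the time available."""
    return time_available(level) - time_required(level)


def critical_level(steps: int = 1000):
    """Scan automation levels from 0 to 1 and return the first level at which
    the margin becomes negative, or None if it never does."""
    for i in range(steps + 1):
        level = i / steps
        if recovery_margin(level) < 0:
            return level
    return None
```

Under these assumed values the two lines cross at an automation level of roughly 0.64; beyond that point the time required exceeds the time available. The crossing point itself carries no empirical meaning and only illustrates the shape of the argument.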
Table 5-2 summarises the characteristics of the three models relevant to controller
recovery from equipment failure in ATC described above, and identifies the limitations
addressed later in the thesis. In general, all three models are qualitative and based on
a principle of a sequence of phases that constitute the process of human recovery.
They are based on past research, whilst only one model is based on operational data.
The limitations identified in the last column of Table 5-2 guided the research presented
in this thesis and the main principles behind the framework for the assessment of
controller recovery. In short, the research in this thesis is verified in the simulated
environment (experimental investigation – Chapter 10), based on operational
experience (from interviews with relevant ATM staff, operational data – Chapter 4, and
the questionnaire survey - Chapter 6), and based upon detailed assessment of the
recovery context (Chapters 7 and 8).
Table 5-2 Summary of relevant models of the human recovery process
Model | Context | Operational input | Assessment of recovery | Prediction of recovery | Limitations
Kanse (2004) | Chemical industry | Yes (interviews and data) | Qualitative and quantitative | No | No assessments of the recovery context; no prediction of the recovery process
SHAPE’s RAFT tool | ATM | Yes (interviews) | Qualitative (expert-based) | Qualitative (expert-based) | Not verified in simulated/operational environment; based only on interviews and no operational reports
Wickens et al. (1998) | ATM | No | No | Qualitative and potentially quantitative (based on the recovery reaction time) | Theoretical approach
As stated previously, there are three major factors that influence the quality of
controller recovery, i.e. past experience, procedures, and training. Whilst procedures
and training are regulated within the aviation community, operational experience is
accumulated over time and controllers may or may not experience equipment failures
during their career. For this reason, the next sections describe and discuss existing
regulations regarding recovery procedures and training. Operational experience,
extracted from the questionnaire survey, is investigated in the following Chapter.
5.5 Procedures for handling ATC equipment failures
In both the literature and operational practice, procedures are recognised as the critical
factor for effective recovery. The following section provides an overview of the existing
international and national regulations on procedures for recovery from equipment
failures in ATC. This is followed by a discussion of the key principles of recovery
procedures in ATC identified in this research.
5.5.1 Existing regulations
Regulation on procedures for handling ATC equipment failures, i.e. recovery
procedures, exists at three levels. These are: international (i.e. by the International Civil
Aviation Organisation - ICAO), regional or national (e.g. by the European Organisation
for the Safety of Air Navigation – EUROCONTROL at the regional level and Civil Aviation
Authorities – CAAs at the national level), and the air navigation service provider (ANSP)
level.
The main activity of ICAO is the establishment of International Standards,
Recommended Practices and Procedures covering all technical fields of aviation. The
‘Recommended Practices’ are desirable objectives with which ICAO member states
should aim (but are not required) to conform, whilst ‘Standards’ are considered
mandatory or required in the interest of safety of international air navigation (FAA,
2005). ICAO Standards and Recommended Practices are passed to the respective
regional organisation (e.g. EUROCONTROL) or directly to the national CAAs for
assessment and implementation. The national CAA is then responsible for assurance
and monitoring that these standards are properly implemented by ANSPs at the level of
ATC Centres. The current status of regulations on recovery procedures is discussed in
the following sections.
5.5.1.1 International regulation
Since 1945 ICAO has specified the standards, practices, and procedures for ATC. The
most recent edition of ICAO Annex 11 on air traffic services (ICAO, 2001c)
advises that “air traffic services authorities should develop and promulgate contingency
plans for implementation in the event of disruption or potential disruption of air traffic
services and related supporting services in the airspace for which they are responsible
for the provision of such services”. This ICAO recommendation represents a summary
of the key system safety principles that need to be considered within each air traffic
service unit. Moreover, several particular equipment failures are covered separately in
the ICAO document dealing with procedures for air navigation service (ICAO, 2001a).
These are radar equipment failure, ground radio failure (blocked frequency), ground
Automatic Dependent Surveillance (ADS) failure, and failure of Controller Pilot Data Link
Communication (CPDLC). The findings from the analysis of operational failure reports
presented in Chapter 4 indicate that ICAO has concentrated upon the appropriate
components in terms of the communication and surveillance ATC functionalities whilst
disregarding the data processing functionality.
In its guidance for recovery from these four failure types, ICAO recommends the necessary
steps to be taken by controllers and pilots, as well as ATC Centre watch managers or
supervisors. When necessary, ICAO also recommends collaboration with adjacent ATC
units. Therefore, the recovery process is seen as the responsibility not only of
controllers but of all parties involved within the affected ATC Centre and region (including
the adjacent ATC unit which can provide valuable assistance in restricting or rerouting
the flow of traffic). All other failure types are left to national service providers to include
and define in their Manuals of Air Traffic Services (MATS).
5.5.1.2 European and national regulation
At the European level, EUROCONTROL published guidance and recommendations for
controller training in the handling of unusual/emergency situations, known as the
ASSIST scheme (EUROCONTROL, 2003f). This scheme covers all procedures for
aircraft emergencies but paradoxically does not cover any type of ATC equipment
failure. The ASSIST programme, captured in a publicly available document, is intended
to represent only a framework to be further customised and adapted to the specific
requirements of each ATC Centre utilising local expertise. Thus, each ATC Centre is
required to assemble a team of experts, implement the current ASSIST programme,
and discuss other safety-critical events (e.g. ATC equipment failures) to be included in
emergency procedures, training, and/or aide-memoire.
5.5.1.3 Air navigation service provider regulation
National air traffic service providers may publish their own procedures for
emergency/unusual situations in the MATS. The MATS contains procedures,
instructions, and information which form the basis of air traffic services within a country.
It is published for the guidance of civil air traffic controllers, but may also be of general
interest to other associated parties within civil aviation. For example, the UK MATS is
arranged in two parts. Part 1 is published by the UK CAA (as CAP 493; UK CAA, 2006)
and consists of instructions which apply to all UK ATC units. Part 2 is published by the
UK National Air Traffic Service Provider (NATS) and consists of instructions which
apply to a particular air traffic control unit (e.g. the London Area Control Centre).
NATS publishes specific recovery or fallback procedures in their internal MATS Part 2
document. This document defines 33 failure types and relevant strategies for their
recovery (NATS, 2002) and thus reflects the particular ATC system characteristics of
the UK ATC Centres. No information regarding the methodology used to compile these
recovery strategies is available. It can only be assumed that these recovery procedures
are a direct result of expert discussions, operational experience, and experience with
ATC system performance.
The manual advises that the planning controller should be the focal point in the sector
team for the duration of the failure, with the main objective of ensuring that the
tactical/executive controller is supported at all times. The recovery procedure for each
of the 33 defined failures consists of the following:
• a short description of the failure (i.e. what a controller should expect, what are the
potential effects on the ATC system);
• a description of the system-generated alert (e.g. brown border, text message);
and
• a list of required recovery steps (these steps are separately defined for planner,
tactical/executive, assistant controllers, and watch supervisor).
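The three-part structure listed above can be pictured as a simple record. The sketch below is an illustration only: the class name, field names, and placeholder strings are hypothetical and do not reproduce the content or format of the NATS MATS Part 2 document, which is not publicly available.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryProcedure:
    """Hypothetical record mirroring the three elements of a recovery procedure."""

    failure_description: str  # what to expect and potential effects on the ATC system
    system_alert: str         # system-generated alert shown to the controller
    # Required recovery steps, defined separately for each role
    steps_by_role: dict[str, list[str]] = field(default_factory=dict)

    def steps_for(self, role: str) -> list[str]:
        """Return the recovery steps for a role, or an empty list if none are defined."""
        return self.steps_by_role.get(role, [])


# Placeholder content only; no real procedure text is reproduced here.
example = RecoveryProcedure(
    failure_description="placeholder: expected symptoms and potential effects",
    system_alert="placeholder: e.g. brown border, text message",
    steps_by_role={
        "planner": ["placeholder step 1", "placeholder step 2"],
        "tactical/executive": ["placeholder step 1"],
        "assistant": [],
        "watch supervisor": ["placeholder step 1"],
    },
)
```

Keying the steps by role reflects the point that each member of the sector team has a separately defined list, so a lookup such as `example.steps_for("planner")` returns only the steps relevant to that position.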
The New Zealand air navigation service provider (i.e. Airways New Zealand) publishes
MATS as required by the Civil Aviation Authority of New Zealand. This document
recommends the use of the recovery procedures for failures of significant components
(e.g. radar data processing, flight data processing, the overall communication system),
as these have the most severe effect on ATC operations. The recovery procedures are
published as a separate document designed to be readily available at each position
(Failure Modes Quick Reference Guide-FMQRG; Airways New Zealand, 2006a). The
main objective of this document is to provide ready and quick assistance to operational
staff for handling equipment failures (i.e. aide-memoire).
The German air traffic service provider (DFS) defines emergency checklists for various
aircraft-related as well as military-specific emergencies. This document created a basis
for the development of EUROCONTROL’s guidance for controller training in the
handling of unusual/emergency situations and the ASSIST scheme (EUROCONTROL,
2003f). However, emergency checklists developed by DFS (same as the
EUROCONTROL ASSIST scheme) do not cover any ATC equipment failures.
While ICAO provides generic recommendations for recovery, ANSPs tend to publish
recovery procedures in the form of a checklist of recovery steps that controllers need to
perform upon detection of any of the pre-defined unusual situations. This form is
practical and easy to follow, especially in the case of unexpected and emergency
situations, such as equipment failures. Similar to other types of emergency situations, it
is possible to define a set of equipment failure recovery steps whose implementation
leads to system protection and assurance of accurate situational awareness. The
selection of relevant recovery steps as well as the timely manner in which they are
implemented lead to effective or successful recovery. It is important to highlight that in
general all emergency/unusual situation procedures are intended as a general guide,
and controllers are expected to use their best judgment in any given situation.
As stated above, air navigation service providers that recognise the importance of the
existence of procedures for equipment failures publish them in their relevant manuals.
These unusual situations are slowly being included into a list of regular emergency
procedures. However, MATS manuals are not available in the public domain. For this
reason, it was necessary to set up a questionnaire survey to investigate the current
status and quality of procedures and training worldwide. The results of this survey are
presented in Chapter 6. The review of recovery procedures in ATC is concluded in the
following section by a discussion on identified areas of concern.
5.5.2 Main principles behind recovery procedures in ATC
Following the discussion of available recovery procedures in the aviation community,
this section summarises the key principles of recovery procedures in ATC. These
are availability, design, and contents, as presented below.
The EUROCONTROL report on managing technical disturbances (EUROCONTROL,
2004e) concludes that procedures represent a critical factor for effective recovery. If no
procedures are available to the controllers, they may use their own mental models of
the ATC system and operational environment to decide on the most effective recovery
strategy. Such ad hoc performance can significantly vary depending on the quality of
the controller diagnosis of the failure occurrence, experience, available information,
and the failure complexity. Therefore, to assure minimal required safety performance, it
is essential to provide recovery procedures to controllers.
Recovery procedure design should focus on phases of the recovery process and steps
that the controller must perform to recover effectively and ensure a safe ATC service.
Furthermore, the procedure should also contain the key effects of the failure on the
operational system, so that there is no potential that the controller may implement the
wrong procedure. Appendix III presents a framework for a checklist type of controller
recovery procedure or aide-memoire that should be available at each Controller
Working Position (CWP). This aide-memoire, designed in this research, is based upon
the characteristics of the ATC Centre that participated in the experimental investigation
(presented in Chapters 9 and 10).
Finally, assuming that recovery procedures are available, their contents must be
accurate and kept up to date (i.e. reflecting all modifications/updates in the ATC system
architecture). They must be realistic, comprehensive, clear and easy to use, easily
accessible, and linked to regular emergency training.
After discussion on the recovery procedures and their key principles in ATC, the
following section discusses training for handling ATC equipment failures in a similar
manner.
5.6 Training for handling ATC equipment failures
Like recovery procedures, training is also recognised as a critical enabler for
effective recovery. This section reviews the existing regulations on training for recovery
from equipment failures in Air Traffic Control (ATC) at three levels: international,
regional/national, and air navigation service provider. This is followed by a discussion
of several areas of concern relating to training for unusual/emergency situations in ATC, as
identified in this research.
5.6.1 Existing regulations
Regulation on training for handling ATC equipment failures, i.e. recovery training, exists
at three levels. These are: international (i.e. by ICAO), regional or national (e.g. by
EUROCONTROL at the regional level and CAAs at the national level), and the ANSP
level.
5.6.1.1 International regulation
ICAO guidance on human factors can be found in the Human Factors Training Manual
(ICAO document 9683; ICAO, 1998). According to ICAO, human factors principles
account for design, certification, training, operations, and maintenance, as well as safe
interfaces between humans and systems. The relevant module of the Human Factors Training
Manual highlights the necessity of training controllers in skills such as the controller-equipment
relationship and operational aspects of automation (e.g. staying in the loop,
situational awareness, and the appropriate use of automated ATC equipment).
However, there is no specific guidance on training for emergency/unusual situations.
5.6.1.2 European and national regulation
A number of countries have realised the benefits of regular emergency training for
controllers and consequently have initiated training programmes. In addition, on a
European scale, the EUROCONTROL European Manual of Personnel Licensing - Air
Traffic Controllers (EUROCONTROL, 2001d) now contains a requirement that ATC
units must include training for emergency/unusual situations in their training
procedures. It should consist of two segments: the first is to prepare trainees, prior to
validation, in the procedures used in the event of an emergency situation and the
second is for routine refresher training to enable qualified controllers to respond to
unusual or emergency situations in a competent and professional manner. The
importance of practising unusual situations that have occurred elsewhere is recognised
and recommended as best practice. In general, the EUROCONTROL European
Manual of Personnel Licensing document details minimum standards for professional
qualification of controllers and has the aim of harmonising licensing schemes in
Europe. The following section describes how a particular incident made a significant
impact on the regulations related to emergency training within one Civil Aviation
Authority (CAA).
5.6.1.2.1 UK Civil Aviation Authority regulation
An emergency situation that occurred in the UK airspace highlighted both the
importance of the existence of training in unusual situations and the necessity for
refresher training. In short, a particular aircraft reported dangerously low oil pressures
in both engines and consequently declared an emergency situation. In this incident the
controller on duty handled the situation with a ‘text book’ performance. The controller
informed the crew of the closest diversion airport, minimised radio frequency
transmissions while still passing all relevant information, and arranged direct routeing and
descent towards the chosen airport. During the course of the subsequent investigation,
the controller, a young trainee, pointed out that his actions were timely and efficient as
a direct result of the training in handling emergencies received the day before the
incident occurred (Baker and Weston, 2001).
As a result of the recommendations made in the report on this incident, in 1994 the UK
CAA’s Safety Regulatory Group (SRG) decided to mandate such training for all UK
controllers (Baker and Weston, 2001). In 1999, an initial set of guidelines was
broadened to include team related aspects and to place additional focus on unusual
events rather than just emergencies. This change was reflected in the TRaining for
Unusual Circumstances and Emergencies (TRUCE) scheme. TRUCE was designed to
ensure that staff involved in the provision of an air traffic control service are trained to
recognise and handle emergency occurrences and unusual circumstances in a
competent manner. Some of the emergency/unusual situations that severely affect the
ATC operations, including equipment failures, are mandatory in the TRUCE scheme
(UK CAA, 2003).
5.6.1.3 Air navigation service provider regulation
As noted above, aviation authorities recognise the importance of regular training for
equipment failures. According to the regulations issued by Civil Aviation Authorities
(CAAs), training for emergency/unusual situations is usually set up by air navigation
service providers within their respective ATC Centres. However, the type and
frequency of emergency training can deviate from existing regulations (due to shortage
of staff and infrastructure and the high costs involved). To further augment the
regulations on recovery training available from CAAs, it was necessary to set up a
questionnaire survey to investigate the current provision of recovery training worldwide.
The results of this survey are presented in Chapter 6.
5.6.2 Areas of concern related to recovery training
Currently in ATC there are several issues of concern related to training. Firstly, the
recovery training should follow the phases of the recovery process, where adequate
time and guidelines should be given for failure diagnosis. Secondly, established
controllers have been trained in non-radar or procedural control, which is not the case
with newly qualified controllers. This means that established controllers possess the
skills to handle any degree of radar failure (as one of the most severe equipment failure
types).
Thirdly, the frequency, comprehensiveness and range of unusual situations for training
in the simulated ATC environment vary from Centre to Centre. While some ATC
Centres offer comprehensive initial training, supported by annual refresher training,
other Centres offer little or no opportunity for staff to practise coping with unusual
occurrences in a simulated environment (EUROCONTROL, 2004e). This lack of
regular training and the infrequent occurrence of serious equipment failures may lead
to a serious lack of experience with recovery performance. In addition, as newer,
more automated ATC systems tend to be reliable, controllers are deprived of the
opportunity to experience equipment failure and recovery in the operational
environment, and therefore need to gain these experiences through regular training.
Fourthly, in spite of the clear need for regular training, the lack of resources
(infrastructure and staff) makes it impossible to train controllers for all different types of
emergency/unusual situations and all equipment failure types. For this reason an
organised exchange of experience at the level of ATC Centres, countries, or regions
(e.g. ECAC states, EUROCONTROL member states) may provide valuable knowledge
and insight into various unusual situations and strategies to resolve them. As an
example, in 2003 an A300 was struck on the left wing by a surface-to-air missile, resulting
in a complete loss of hydraulics and therefore loss of all flight controls. Reacting
rapidly, the captain recalled a television documentary he had seen about a DC-10
crash at Sioux City, Iowa, and the thrust change technique employed by the captain
and crew of the DC-10 to control their aircraft. Although the A300 crew had never
practised this technique before, they quickly gained control despite the extreme stress
of the situation (IFALPA, 2005). This example shows the importance of exchanging
information on knowledge, performance, and strategy between human operators.
Similar experience could be achieved in the area of ATM by supporting workshops,
newsletters, and other forms of information exchange on best practices and handling of
unusual events.
Finally, the EUROCONTROL (2004e) report on managing technical failures in ATC
points out potential future problems identified through controller interviews. Firstly, it
suggests that the mental picture of the traffic situation will be more difficult to form in
the future ATC environment. Secondly, it suggests that in the future, the controller may
require more knowledge of the ATC system architecture when compared to today.
Thirdly, the report suggests that newly qualified controllers and fully established
controllers have different perceptions of one another: newly qualified controllers are
perceived by some fully established controllers to be more trusting of the reliability of
new equipment, having rarely experienced failures in the past, while established
controllers are perceived by some newly qualified controllers as less computer literate
and more suspicious of technology.
The previous sections of this Chapter revealed the complexity of controller recovery by
discussing its relevant phases, from failure detection to the outcome of the recovery
process. In addition, past research identified factors that influence the quality of
controller recovery. The next section defines a set of variables that capture the
important characteristics of controller recovery. These are the context that surrounds
the controller recovery process, the recovery effectiveness, as well as the recovery
duration. These variables guide the design of the experiment to capture real data on
controller recovery later on in the thesis.
5.7 Definition of controller recovery in this thesis
This thesis investigates the process of controller recovery from equipment failures in
Air Traffic Control (ATC). From discussions in the preceding sections of this Chapter, it
is clear that controller recovery (as a human recovery in a particular context of the ATC
system) is a complex process that involves a number of steps that can be assessed
using different methods and variables. In summary, a credible assessment of controller
recovery should answer the following questions:
• What are the factors that influence controller recovery performance and choice of
recovery strategy (i.e. characteristics of the recovery context)?
• What is the effectiveness of the selected and implemented recovery strategy (i.e.
the required recovery steps and the outcome or effectiveness of recovery)?
• How efficiently does a controller respond to an equipment failure (i.e. the recovery
duration)?
These questions are discussed in the following sections.
5.7.1 Recovery context
Human reliability assessment research over the years has shown the important role of
the context in which human performance takes place. Recent techniques now place
more emphasis on the definition of key contextual factors and their impact on the
reliability of human performance. Context affects every part of the process of
recovering from equipment failures and thus includes past experience and the status of
recovery procedure and training relevant to a particular equipment failure under
investigation. As stated by EUROCONTROL (2004e), ‘context is everything’. Chapter 7
of this thesis presents a detailed review of the current understanding of contextual
factors in various ATM and non-ATM industries. The research presented in this thesis
uses these findings together with results from controller interviews to identify the
contextual factors relevant to controller recovery from equipment failures in ATC.
Furthermore, these factors are used in conjunction with an appropriate methodology to
further analyse controller performance during the process of recovery from failures and
to quantitatively define the recovery context indicator (Chapter 8). In addition, the
importance of the recovery context is further explored in the experiment (Chapters 9
and 10).
5.7.2 Recovery effectiveness
The recovery effectiveness of each controller responding to an unusual, emergency, or
non-nominal situation can be characterised by a set of required recovery steps.
Sections 5.5 and 5.6 reviewed the existing schemes for handling emergency
occurrences, achieved through defined recovery procedures and training. Existing
procedures and schemes were reviewed, including the UK CAA’s TRUCE scheme, UK
NATS fallback procedures, the Airways New Zealand quick reference guide, and the
German air traffic service provider (DFS) emergency checklists, all designed to ensure
that staff involved in the provision of ATC service are trained to recognise and resolve
any emergency situation in a competent manner. In addition, the review included an
overview of EUROCONTROL’s and ICAO’s guidance for recovery procedures and recovery
training (EUROCONTROL, 2003f; ICAO, 2001a).
In general, these safety schemes create a checklist of recovery steps that follow the
phases of the recovery process (i.e. detection, diagnosis, correction). These checklists
are written procedures and controllers are expected to know and follow them. In a
similar way, an ATC equipment failure is considered as one type of unusual/emergency
situation. Although equipment failure related procedures or checklists are not always
available, it is possible to define a set of required recovery steps, whose
implementation can assist in the protection of the system and preservation of accurate
situational awareness. The selection of relevant recovery steps and the time frame in
which they are implemented contribute to an effective or successful outcome of the
recovery process. This is explained further in the following section.
5.7.3 Recovery duration
The duration of the controllers’ recovery process is the time measured from the first overt
controller action to the end of the recovery process. The end of the recovery process is
influenced by the restoration of the failed component or by the reversion to the backup
facilities (i.e. fallback systems). The analysis of operational failure reports (Chapter 4)
indicates that the longer the failure, the less severe it tends to be. As a result, the
research presented in this thesis focuses on failures of short duration. Furthermore,
past research has focused on the reaction time, while putting more emphasis on its
extreme values (see Wickens, 2001). However, extracting the controller reaction time
can be an extremely difficult task as this first reaction usually represents covert (i.e. not
directly observable) behaviour. For this reason, the research presented in this thesis
focuses on the controllers’ first action that is observed on the ATC system (e.g.
communication regarding identified failure, interaction with HMI).
Apart from the moment of actual detection, the recovery duration variable may also
lack some aspects of the diagnosis phase. In other words, the cognitive processes
behind understanding the new situation and prioritisation of the recovery tasks to be
performed may also occur covertly. For example, the real cause of the communication
failure is not immediately obvious as the controller needs to investigate if the failure
affects ground ATC equipment or airborne radio equipment. Both of these features of
controller recovery are considered in the design of the experimental investigation
presented in Chapter 9.
5.8 Summary
As pointed out at the beginning of this Chapter, a good understanding of recovery
requires a detailed assessment of the recovery process from both the technical and
human perspectives. Whilst the previous Chapter discussed the technical recovery, this
Chapter focuses on controller recovery. The Chapter starts by distinguishing the
objectives of two separate groups of operators involved in recovery from equipment
failures, namely controllers and engineers. While this thesis focuses solely on controller
recovery from equipment failures, the reviewed theoretical background to human
recovery is applied to the controller recovery by identifying its major phases. As a
result, the main phases of controller recovery together with the outcome of the overall
recovery process have been described. Finally, various models of human recovery,
developed for both ATM and non-ATM industries, have been discussed with emphasis
on three of the most relevant ones to controller recovery. These are: the model by
Kanse derived for recovery performance in the chemical process industry, the RAFT
tool derived specifically for the ATC operational environment, and the model by
Wickens generally focusing on the impact of different levels of automation on the
recovery process.
Apart from identifying the main phases of the controller recovery process, the review of
the theoretical background has also highlighted the factors that influence the quality of
controller recovery, namely past experience, recovery procedures and training. While
past experience is aggregated throughout the controller’s operational experience, the
current status and quality of recovery procedures and training are regulated by
international and national aviation authorities. Thus, the Chapter reviews and discusses
the current status of regulation regarding recovery procedures and training, whilst the
feedback regarding controllers’ past experience is gained from the
questionnaire survey presented in the following Chapter. After reviewing theoretical
findings extracted from ATM and non-ATM research relevant to controller recovery, the
Chapter concludes by proposing a set of variables for an in-depth assessment of
controller recovery. This is achieved by assessing the context, quality, and temporal
characteristics of the controller recovery process. These variables also guide the
experimental design to collect real data on controller recovery (Chapter 9).
Chapter 6 Questionnaire Survey
139
6 Questionnaire Survey
Chapter 5 showed that limited research has been carried out globally on human
reliability in relation to controller recovery. Hence, this Chapter presents the details of a
questionnaire survey scheme with the aim of overcoming the lack of knowledge and
further supporting the research in this thesis. The specific objectives of the questionnaire
survey are to investigate controller experience with equipment failures and to identify
factors that affect their recovery, to extract more operational experience, to investigate
the status and quality of recovery procedures and training, and to contribute to the
wider human reliability research by assessing the specific controller recovery performance. The
Chapter starts with the definition of the target population and sampling. It proceeds by
discussing the survey methodology identified for the collection of questionnaire
responses, design of the questionnaire, and the refinements identified by a pilot survey.
This is followed by the description of the full survey scheme (Figure 6-1). The Chapter
concludes with the methodology for the questionnaire survey data analyses structured
in three segments. These are: assessment of the sample characteristics, high-level
frequency analyses, and in-depth assessment of interactions between recovery factors.
Figure 6-1 The flow diagram of organising a survey
6.1 Objectives of the questionnaire survey
One of the objectives of the research presented in this thesis is to address the general
lack of knowledge in the area of controller recovery from equipment failure. This is vital
in order to enhance safety and operational efficiency in the current and future ATC
environment. As described in Chapter 5, although significant human reliability research
has been undertaken in other industries, such as nuclear and chemical processing, it is
not directly transferable to the highly dynamic ATC environment. In order to address
the issues above, the questionnaire survey presented in this Chapter focuses on four
objectives. Firstly, the survey is designed to investigate controller experience with
equipment failures and to identify factors that affect controller recovery. This is to be
achieved by extracting the operational experience from the sample of air traffic
controllers. Secondly, the survey is to be used to augment the information obtained
from the operational failure reports (as presented in Chapter 4) which lack any input on
controller recovery. This is achieved by questioning the participating controllers as to
the most severe failures they have experienced. Thirdly, the survey contributes to the
determination of the status and quality of recovery procedures and training in ATC
Centres (and thus augments the findings from Chapter 5). Finally, the survey is
designed to contribute to the wider human reliability research by assessing the specific
controller recovery performance.
Six key questions were formulated in order to achieve the four objectives. The
questions (below) address ATC equipment, controller recovery performance, and
status of recovery procedures and training:
• How often do controllers experience equipment failures (Q1)?
• What factors influence their recovery performance (Q2)?
• What is the most unreliable ATC equipment (Q3)?
• Is there any organised exchange of information on equipment failures and/or other
types of unusual/emergency situations (Q4)?
• Do recovery procedures exist (Q5)?
• What do controllers feel about the quality of training currently available for recovery
from equipment failures (Q6)?
Given the objectives of the questionnaire survey above, the next section defines the
target population and sample size.
6.2 Sampling
The population for this questionnaire survey should consist of controllers from various
ATC Centres worldwide. The population characteristics to be sampled in this survey
are ATC Centres with different levels of traffic and airspace complexity, and ATC
system automation, and controllers with a range of operational experience (i.e. years in
service, rating).
Using the United Nations (UN) statistics that there are 191 independent countries
worldwide (United Nations, 2006), it is possible to estimate the total number of ATC
Centres. However, data on the number of ATC Centres for each country were not
available to this research (personal correspondence with the International Federation of
Air Traffic Controllers' Associations (IFATCA) revealed that these data are not
available). Therefore another approach based on the distribution of
global air traffic (Airbus, 2004) has been used. In other words, the ideal sample should
consist of regional distributions of sampled controllers that correspond to the air traffic
distribution as presented in Figure 6-2. Moreover, it is also important to obtain a sample
which not only represents the current distribution of air traffic but also accounts for its
predicted future growth. The predicted growth in air traffic to the year 2023 indicates the
importance of Asia/Pacific and Middle East regions, while other markets remain steady
(Figure 6-2). Airbus (2004) predicts that Asian airlines will experience the fastest
growth rates. This prediction is in line with observed changes in the aviation market
and the shift towards Asian operations (Airbus, 2004; Air Transport Action Group,
2005). Moreover, it is predicted that by 2023 the already mature North American
domestic market will lose its historical dominance to both Europe and the dynamic
Asia/Pacific region. Based on all these findings, the target of the questionnaire survey
should be to collect responses from Asia/Pacific, Europe, and North America
corresponding to characteristics of the population surveyed (i.e. different levels of traffic
and airspace complexity, ATC system automation, and controller experience).
[Figure 6-2: bar chart of the percentage of world air traffic (vertical axis) per region (horizontal axis: Africa, Latin America and Caribbean, Asia and Pacific, Europe, North America, Middle East), with one bar per region for each of the years 2003 and 2023]
Figure 6-2 Distribution of world air traffic per region for the years 2003 and 2023 (adapted from Airbus, 2004)
Having defined a target population and its characteristics to be sampled, it is important
to define the size of the sample. Collecting a large sample of data would pose a
significant challenge as it would be a logistically huge task and very time consuming for
one single researcher. Therefore, the sample size needed to be contained within
manageable proportions. However, the sample still needed to be representative of the
population of controllers. As guidance, the modelling of controller operational
experience with the normal distribution requires approximately 20 data points (Shier,
2004). Increasing this minimal sample size by a factor of 5, the target sample size was
initially set at 100 controller responses. This sample size is in line with the sample
used to support a Federal Aviation Administration (FAA) study of similar scope (i.e. 128
responses from aviation experts; Funk, Lyall, and Riley, 1996). However, the target sample
size (in terms of number of controllers and ATC Centres sampled) would vary
according to the choice of data collection method and available resources.
6.3 Survey methodology
Surveys have long been recognised as a valid method for measuring attitudes (or
preferences), beliefs, or facts (including past behavioural experiences). Indeed, one of
the most common uses of surveys is to measure individuals’ past behavioural
experiences (Weisberg, Krosnick, and Bowen, 1996). The aim of the questionnaire
survey presented in this thesis is to collect facts regarding equipment failures and
controller recovery, in particular the operational experience and status of procedures
and training for equipment failures. Therefore, using a survey to collect these types of
data is justified.
Due to the nature of this survey, the methods available were either to gather the
information directly from face-to-face interviews with controllers in various ATC Centres
or remotely by self-completion via the internet and professional networks. Although less
reliable, the use of the internet and professional networks is useful in presenting a
wider picture of controller experience and recovery from equipment failures. The
advantages and disadvantages of both methods are presented below.
Data gathering through face-to-face interviews requires visits to ATC Centres and
direct access to controllers. This approach is comparatively more reliable since it
presents the opportunity to clarify any issues either prior to or during the interview.
Moreover, it facilitates representative sampling, for example within an ATC Centre, as
more than one controller can be asked to participate. The drawbacks of this approach
are the practical and financial issues related to the cost of travel and access to enough
ATC Centres to generate a representative sample depending on the characteristics of
the population.
In a self-completion survey, the questionnaires are distributed using a professional
network or popular aviation related internet forums. Compared to face-to-face
interviews, this method saves time and enables more questionnaires to be distributed.
However, research has shown that the response rate is inferior to face-to-face
interviews. A response rate of 10 to 50 percent is usually achieved with self-completion
questionnaires compared to 100 percent in the case of face-to-face interviews. This
means that in order to collect 100 responses, between 200 and 1000 questionnaires
should be distributed. The questionnaires may be distributed via personal/professional
network and corresponding emails. However, accessing the email addresses of 200-
1000 controllers worldwide presents a significant obstacle to the distribution of
questionnaires.
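The relationship between the expected response rate and the number of questionnaires to be distributed can be illustrated with a short calculation. This is a minimal sketch of the arithmetic only; the function name and its parameters are illustrative and are not part of the thesis.

```python
import math


def questionnaires_needed(target_responses, response_rate_percent):
    """Number of questionnaires to distribute so that the expected
    number of returns reaches the target (rounded up)."""
    return math.ceil(target_responses * 100 / response_rate_percent)


# Response rates quoted in the text for self-completion surveys.
best_case = questionnaires_needed(100, 50)   # 50 percent response rate
worst_case = questionnaires_needed(100, 10)  # 10 percent response rate
print(best_case, worst_case)  # 200 1000
```

With a 100 percent response rate, as assumed for face-to-face interviews, the same function returns the target itself.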
Additional problems with the self-completion method are the number of responses and
the quality of survey sample obtained. The self-completion method depends entirely on
the intention and willingness of the controller to participate in the survey. Thus it is
harder to control the number of responses obtained. Apart from the high likelihood of
a low response rate in a self-completion survey, another drawback is that the quality of
the answers cannot be controlled. Even in the case of straightforward questions,
respondents may misinterpret some of the questions or may need more information on
the subject under investigation. The presence of the researcher, while the respondent
is answering the questions, provides the advantage of ensuring that the respondent
understands what is required from the survey.
After careful consideration of both the advantages and disadvantages of the two survey
methods (face-to-face and self-completion), both were adopted in this thesis. This
decision was based on the need to exploit the strong points of both methods
particularly given the timing and response rate constraints. In order to maximise the
benefit of the combined approach, the design of the questionnaire must account for
the unique characteristics of each method.
6.4 Design of the questionnaire
It is very important when designing a questionnaire to focus on information needed for
the study and to present questions in an unbiased fashion to enable responses with a
high degree of fidelity. The length of the questionnaire should also be considered.
Given the decision to use both face-to-face interviews and self-completion surveys, it
was necessary to focus on a questionnaire design that meets the requirements for both
methods. While face-to-face interviews allow a more complicated structure for the
questionnaire, additional attention has to be paid to the length of the interview. A
self-completion survey allows detailed questions to be designed using a less complex
structure. This survey method requires a written introduction to explain the objectives of
the study, its added value, and the key features of the survey itself (e.g. format, type of
questions, approximate time required for the survey completion).
One of the possible solutions was to design two sets of questionnaires; one for the
face-to-face interview and the other for the self-completion survey. However, to assure
the highest reliability and completeness of responses, it was decided to use one
questionnaire design in both survey methods. The aim was to design the questionnaire
survey to extract the maximum information whilst ensuring convenience for both face-
to-face and self-completion respondents. This was achieved following several design
principles. Firstly, special attention was given to clarity of questions to avoid any
ambiguity in the self-completion survey. Secondly, emphasis was placed on closed
questions, where the respondents’ answers did not require the presence of the
researcher. Closed-ended questions can be answered finitely by one of the given
answers; the simplest form being the yes/no answer. In general, these questions are
restrictive and can be answered in a few words. Thirdly, all key terms were defined.
Finally, for open questions, a list of potential answers was provided to guide the
respondents (e.g. for the question on the most unreliable ATC equipment, a
comprehensive list of various ATC equipment was provided). Open-ended questions
allow respondents to answer in their own words, providing a narrative. In general, these
questions solicit additional information, as they require more than one- or two-word
responses. Furthermore, the questionnaire was designed in a way that ensured that
any inconsistencies in responses could be identified. This was achieved through the
careful choice of questions and by having multiple questions assessing a particular
issue (e.g. recovery procedures).
The questionnaire has been structured around the main objective of the research
presented in this thesis. In other words, all the questions have been designed to
support the research on controller recovery from equipment failures in Air Traffic
Control (ATC). Based on the type of information obtained, the questionnaire is
structured in four distinct groups totalling 29 questions. The first group consists of
general and specific questions. The former cover the overall operational experience,
ratings, and the country/ATC Centre where the respondent works. The latter inquire
specifically about experience with equipment failure, asking the respondent to list
several examples in greater detail. This first group consists of five questions.
The second group of questions inquires about the factors that affect controller recovery
by asking the respondent to rate the importance of three factors. This is followed by the
question on the most unreliable ATC systems/components, as well as the
organisational issues relevant for recovery. In total, this second group consists of four
questions.
The third group of questions focuses on the existence and quality of recovery
procedures at the ATC Centre where the respondent works. This group consists of 11
questions.
The fourth group of questions focuses on the existence and quality of training for
recovery at the ATC Centre where the respondent works. This group has nine
questions. The final question gives the respondent an opportunity to add
comments and suggestions related to the entire questionnaire.
The following is a one-page example of the questionnaire which was used during the
survey (Figure 6-3). It is the second page of the questionnaire. A complete
questionnaire is included in Appendix IV, while an example of a response to the
questionnaire is provided in Appendix V.
Figure 6-3 One-page example of the questionnaire
6.5 Pilot survey
Before conducting the full survey, a small-scale pilot survey was performed to verify the
clarity of questions and the time necessary to complete the questionnaire. It surveyed
two EUROCONTROL in-house controllers, two ATM specialists, and three
psychologists with backgrounds in ATC and the design of questionnaire surveys. No
conflicting issues were identified among them. Their input included only minor
amendments in the design of the questionnaire, such as additional emphasis on the
added value of the survey and how the results would be used. This information was
included in the introductory page of the questionnaire (i.e. the first page). Additionally,
the pilot survey revealed the need for some examples of ATC equipment/tools which
were added as a note after question 5. These changes were incorporated in the final
design of the questionnaire.
The following sections discuss how the survey methodology has been exploited to
achieve the target sample size.
6.6 Full survey
As discussed previously, responses were gathered using face-to-face interviews
and self-completion methods. The results are briefly presented below.
6.6.1 Face-to-face interviews
Professional visits to various ATC Centres and relevant organisations were used to
distribute questionnaires to available controllers and capture their responses through
face-to-face interviews. Using this approach, responses were received firstly from the
visited ATC Centres (involving controllers from India, Serbia, and Ireland), their training
facilities and various controllers in training (involving Irish and Maltese controllers).
Secondly, responses were received from the controllers involved in the
EUROCONTROL’s Gate to Gate project on real-time simulations (involving controllers
from the Netherlands, Germany, Italy, France, Sweden, Spain, and Slovenia). Finally,
responses were received from controllers on various courses run by
EUROCONTROL’s Institute for Air Navigation Services (IANS), which provides regular
courses to ATC staff from all EUROCONTROL Member States (i.e. 37 European
countries). These responses involved controllers
from Belgium, Ireland, Switzerland, the Netherlands, Romania, and Sweden. In spite of the
high costs involved, approximately 40 percent of the data were collected using face-to-
face interviews, where controllers had an opportunity to clarify any doubt before
answering questions.
6.6.2 Self-completion survey
The self-completion survey involved electronic distribution of questionnaires by Imperial
College colleagues visiting various ATC Centres and via professional networks and
popular aviation related internet forums. Countries visited included Tahiti, South Africa,
Tanzania, a number of European countries, Macau, New Zealand, Singapore, and
China. In addition, the Imperial College colleagues exploited professional networks of
controllers to gain more responses. These networks have links to EUROCONTROL
and ATM specialists in various air navigational service providers, and hence resulted in
responses from Croatia, Finland, Switzerland, Macedonia, Moldova, India, and
Germany. Additionally, the Professional Pilots Rumour Network (PPRuNe) forum, an
aviation website dedicated to airline pilots and others in the aviation business including air
traffic control staff, was also used for obtaining survey data (see PPRuNe, 2006). The
aims and objectives of the survey and the overall research were posted on this
particular internet forum on two separate occasions to attract controllers worldwide. If
interested in participating in this survey, controllers were advised to contact the
researcher and thus obtain an electronic copy of the questionnaire survey. In spite of
an initially high level of interest, only a few responses were collected using this method
(including Australia and the United Kingdom). Overall, approximately 60 percent of the data
were collected using the self-completion method.
6.6.3 Potential sources of errors
There are two main potential sources of error in the survey. These sources are the
respondent and data pre-processing. In general, respondent errors may occur for a
variety of reasons. It was noted, for example, that controllers from the same ATC
Centre gave contradictory answers to particular questions. Possible causes for this
include imprecision in the formulation of questions, lack of knowledge on the part of
controllers (on existence of recovery procedures, training, organised exchange of
information), and misinterpretation of questions. The imprecision in the formulation of
questions was addressed by the pilot study and thus should not have played a
significant role in generating respondent errors.
Lack of knowledge on the part of controllers was noted for the questions on the status
of recovery procedures, training, and organised exchange of information within their
ATC Centre. For example, while a group of controllers from an ATC Centre was aware
of the recovery procedures, others stated that these procedures do not exist. These
inconsistent responses were further investigated using the related questions. For
example, if the controller responded that no recovery procedures were defined within
his/her ATC Centre (the first question related to recovery procedures), the subsequent
questions related to recovery procedures were also examined (e.g. adequacy,
completeness, currency). The final judgement was based on all answers that were
provided in relation to recovery procedures (and not only the first one).
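The first step of this check, identifying ATC Centres whose controllers disagree on a given question, can be sketched as follows. This is an illustrative sketch only: the record layout, the field names (`centre`, `procedures_exist`), and the sample records are hypothetical and do not reproduce the survey data or the exact procedure used in this research.

```python
from collections import defaultdict


def centres_with_conflicting_answers(responses, question):
    """Return the ATC Centres whose respondents gave different
    answers to the same question (unanswered items are ignored)."""
    answers_by_centre = defaultdict(set)
    for response in responses:
        answer = response.get(question)
        if answer is not None:
            answers_by_centre[response["centre"]].add(answer)
    return sorted(
        centre for centre, answers in answers_by_centre.items()
        if len(answers) > 1
    )


# Made-up records for illustration only.
sample = [
    {"centre": "Centre A", "procedures_exist": "yes"},
    {"centre": "Centre A", "procedures_exist": "no"},
    {"centre": "Centre B", "procedures_exist": "yes"},
    {"centre": "Centre B", "procedures_exist": "yes"},
]
conflicts = centres_with_conflicting_answers(sample, "procedures_exist")
print(conflicts)  # ['Centre A']
```

Centres flagged in this way would then be examined using the related questions, as described above.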
Misinterpretation was also noted in the question on the number of equipment failures
experienced annually. In this particular case, the data collection reflected the overall
misinterpretation of the term ‘equipment failure’ and the consequent variation in the
answers. While some controllers reported all equipment failures they experienced
within one year regardless of severity, others reported only major failures classified as
infrequent high severity occurrences.
The possibility of errors arising from pre-processing of the responses was mitigated by
extra care at the data input stage (i.e. double checking of each input). In the case of
multiple response questions or questions returning a range instead of a single value, a
consistent approach was taken. For example, in response to question 4 ‘What is the
average number of ATC equipment failures during one year that you experience?’ the
respondents tended to provide a single numerical value, a range, or a textual
answer. In the case of a range, the middle value was taken. This method has been
applied consistently to other questions where necessary. Textual answers have been
transformed into numerical values (e.g. ‘once in two years’ was considered as 0.5 per
year). However, sometimes these textual answers could not be transformed to
numerical values and thus the answer was omitted (e.g. question 5 segment on
frequency and duration of failure was answered ‘minutes’, ‘very frequent’, ‘very often’,
‘rarely’, ‘very rarely’, or ‘once in career’).
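The pre-processing rules described above can be sketched as follows. This is a minimal illustration rather than the procedure actually applied in this research: the function name, the accepted range formats, and the phrase table are assumptions, and only the 'once in two years' conversion is taken from the text.

```python
import re

# Hypothetical phrase table; only this single entry is taken from the text.
TEXT_TO_RATE = {"once in two years": 0.5}

NUMBER = r"\d+(?:\.\d+)?"


def normalise_annual_failures(answer):
    """Convert a free-form answer to failures per year.

    Returns a float, or None when the answer cannot be converted
    (such answers are omitted from the analysis).
    """
    text = answer.strip().lower()
    if text in TEXT_TO_RATE:
        return TEXT_TO_RATE[text]
    # Range such as '2-4' or '2 to 4': take the middle value.
    match = re.fullmatch(rf"({NUMBER})\s*(?:-|to)\s*({NUMBER})", text)
    if match:
        low, high = float(match.group(1)), float(match.group(2))
        return (low + high) / 2
    # Single numerical value.
    if re.fullmatch(NUMBER, text):
        return float(text)
    return None


for raw in ["3", "2-4", "Once in two years", "very rarely"]:
    print(raw, "->", normalise_annual_failures(raw))
```

Answers such as 'very often' or 'rarely' fall through to `None`, which mirrors the decision to omit answers that cannot be expressed numerically.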
The next section describes the methodology behind the analysis of questionnaire
survey results.
6.7 Methodology for the questionnaire survey data analysis
This section starts with a discussion on the questionnaire data pre-processing issues. It
then proceeds with the analysis of questionnaire survey data organised in three
segments. The first segment deals with the characteristics of survey sample in terms of
number of countries, ATC Centres, and controllers surveyed. This segment also
focuses on the characteristics of controllers by assessing their operational experience
(i.e. number of years in service) and rating (differentiating between Area Control,
Approach Control, and Tower ratings). The second segment of the questionnaire
survey data analysis presents the high-level summaries of responses, i.e. simple
percentage analysis (Figure 6-4). These summaries are organised in seven sub-
groups, six of which correspond to the six key questions that the questionnaire survey was
designed to answer (see section 6.1) whilst the seventh sub-group presents other
findings captured in the survey (presented in Appendix VI). The final segment of the
questionnaire survey data analysis provides an in-depth investigation of the interaction
between recovery factors previously analysed. The following sections discuss the
results and findings generated using the process in Figure 6-4.
[Figure 6-4: flow chart. Questionnaire survey data → Characteristics of the sample (58 ATC Centres, 134 controllers) → High-level analyses (experience with equipment failures; factors that influence recovery performance; the most unreliable ATC systems; organised exchange of information on equipment failures; status of recovery procedures; status of training for recovery; other findings reported in Appendix VI) → Interaction analyses]
Figure 6-4 The flow chart of questionnaire survey analyses
6.7.1 Data pre-processing
The data collected during the survey was subjected to further statistical analysis using
the SPSS statistics package. Each respondent was given a numerical identifier (serial
number) but no identifying information, such as the person’s name, was used. The
choices made in the questionnaire by each respondent were recorded under each
corresponding serial number.
During the process of data pre-processing and analysis, all available responses were
taken into account. A special ‘scoring’ technique was used for questions that required
the ranking of choices (question 6). In this particular case, the controllers were asked to
‘score’ their reliance upon written procedures, situation-specific problem solving, and
other factors during the recovery process. This approach is explained in detail in
section 6.7.3.2.
6.7.2 Characteristics of the sample
A total of 134 questionnaire responses were received from 58 ATC Centres spread
across 34 countries (Table 6-1). According to UN data, this questionnaire survey
covers 17.8 percent of independent countries worldwide.
Table 6-1 Summary of the questionnaire survey sample
Country           ATC Centre        Number of responses    Number of responses
                                    per ATC Centre         per country
Ireland           Shannon           7                      16
                  Dublin            4
                  Cork              5
Finland           Kemi              1                      1
Serbia            Belgrade          3                      3
Switzerland       Zurich            8                      16
                  Geneva            8
United Kingdom    Bristol           1                      1
Netherlands       Maastricht        2                      4
                  Nieuw Milligen    1
                  Amsterdam         1
Germany           Karlsruhe         1                      3
                  Langen            1
                  Frankfurt         1
Spain             Seville           5                      5
Norway            Oslo              5                      8
                  Kirkenes          1
                  Stavanger         1
                  Bodo              1
Italy             Rome              2                      7
                  Bologna           1
                  Naples            2
                  Venice            1
                  Milan             1
France            Paris             1                      2
                  Nice              1
Sweden            Stockholm         3                      8
                  Malmo             3
                  Gothenburg        2
Slovenia          Ljubljana         1                      1
Belgium           Brussels          3                      3
Macedonia         Skopje            1                      1
Croatia           Split             1                      4
                  Zagreb            1
                  Pula              1
                  Zadar             1
Moldova           Chisinau          1                      1
Iceland           Reykjavik         2                      2
Denmark           Copenhagen        3                      3
Portugal          Lisbon            4                      4
South Africa      FAJS              2                      2
Tanzania          Dar es Salaam     1                      1
India             Mumbai            3                      7
                  Kolkata           4
Singapore         Singapore         2                      2
Tahiti            Papeete           6                      6
Australia         Melbourne         1                      1
Austria           Vienna            2                      2
Romania           Bucharest         2                      2
Malta             Malta             2                      3
                  Luqa airport      1
Macau SAR         Macau             3                      3
Kenya             Nairobi           4                      4
New Zealand       Wellington        1                      5
                  Auckland          2
                  Christchurch      2
China             Hong Kong         1                      1
Malaysia          Subang            2                      2
Total: 34         58                134                    134
Section 6.2 defined the sampling methodology to correspond to the distribution of
global air traffic per region for the year 2003, taking into account the predicted growth
and estimates to the year 2023 (Airbus, 2004; Air Transport Action Group, 2005).
Assuming a similar distribution of traffic for the period of the survey (2005 and 2006)
and predicted changes in the distribution of future air traffic, the questionnaire sample
lacks the input from two key markets, namely North America and the Middle East
(Figure 6-5).
[Figure 6-5: bar chart of the percentage of questionnaire responses (vertical axis) per region (horizontal axis): Africa 5, Latin America and Caribbean 0, Asia and Pacific 20, Europe 75, North America 0, Middle East 0]
Figure 6-5 Distribution of questionnaire responses per region
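The regional shares shown in Figure 6-5, and the country coverage quoted in section 6.7.2, can be reproduced from the per-country totals in Table 6-1. The following is a minimal sketch: the assignment of countries to regions is an assumption made for this illustration, and the variable and function names are not part of the thesis.

```python
# Responses per country transcribed from Table 6-1. The grouping of
# countries into regions is an assumption made for this sketch.
RESPONSES_BY_REGION = {
    "Europe": {
        "Ireland": 16, "Finland": 1, "Serbia": 3, "Switzerland": 16,
        "United Kingdom": 1, "Netherlands": 4, "Germany": 3, "Spain": 5,
        "Norway": 8, "Italy": 7, "France": 2, "Sweden": 8, "Slovenia": 1,
        "Belgium": 3, "Macedonia": 1, "Croatia": 4, "Moldova": 1,
        "Iceland": 2, "Denmark": 3, "Portugal": 4, "Austria": 2,
        "Romania": 2, "Malta": 3,
    },
    "Asia and Pacific": {
        "India": 7, "Singapore": 2, "Tahiti": 6, "Australia": 1,
        "Macau SAR": 3, "New Zealand": 5, "China": 1, "Malaysia": 2,
    },
    "Africa": {"South Africa": 2, "Tanzania": 1, "Kenya": 4},
}

UN_INDEPENDENT_COUNTRIES = 191  # United Nations (2006), as quoted in section 6.2


def regional_shares(responses_by_region):
    """Percentage of all responses per region, rounded to whole numbers."""
    totals = {region: sum(counts.values())
              for region, counts in responses_by_region.items()}
    grand_total = sum(totals.values())
    return {region: round(100 * n / grand_total)
            for region, n in totals.items()}


shares = regional_shares(RESPONSES_BY_REGION)
countries = sum(len(counts) for counts in RESPONSES_BY_REGION.values())
coverage = round(100 * countries / UN_INDEPENDENT_COUNTRIES, 1)

print(shares)     # {'Europe': 75, 'Asia and Pacific': 20, 'Africa': 5}
print(countries)  # 34
print(coverage)   # 17.8
```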
However, looking back at the characteristics of the population surveyed, the sample
still manages to capture the diverse levels of traffic and airspace complexity, ATC
system automation, and controllers with a range of operational experience (i.e. years in
service, rating). For example, in the European region the responses from Paris,
Frankfurt, Amsterdam, Zurich, Geneva, and Maastricht represent the input from some
of the busiest European ATC Centres. Likewise from Asia, the responses from
Mumbai, Hong Kong, and Singapore represent some of the busiest ATC Centres on
the continent as well as those that have experienced considerable growth in recent
years. Finally, the sample also includes ATC Centres with technically advanced
systems, e.g. Malmo ACC in Sweden, Maastricht ACC in the Netherlands, Shannon ATC
in Ireland, and the Oceanic Control Centre in Auckland, New Zealand.
Although only five percent of responses were received from the African continent, the
ATC Centres sampled were considered carefully. Johannesburg and Nairobi airports
represent the leading airports in Africa for both passengers and cargo (Air Transport
Action Group, 2005). Both regions are experiencing an increase in passenger
movement mostly as a result of growth in tourism. Failure of ATC equipment and the
recovery response of controllers are of considerable importance in such busy ATC
Centres, more so than in other ATC Centres in Africa with considerably less traffic.
Given the difficulties encountered in accessing ATC Centres and controllers worldwide
(e.g. security, logistics, related costs) and the characteristics of the population
surveyed, the obtained sample can be considered as representative of the population.
The next section assesses the adequacy of sampling achieved within each ATC
Centre.
6.7.2.1 Sampling per ATC Centre
Although 27 ATC Centres had only one response each, analysis of these ATC
Centres shows that their characteristics do not differ from the characteristics of the
remaining sample. For example, these ATC Centres include some of the busiest ATC
Centres (e.g. Frankfurt, Paris, Hong Kong) as well as those with low traffic and
airspace complexity (Kemi-Finland, Bristol-UK, Bologna-Italy, Ljubljana-Slovenia,
Zagreb-Croatia). They also include ATC Centres with technically advanced ATC
systems (e.g. Frankfurt, Amsterdam, Karlsruhe, Stavanger, and Melbourne). Finally, the
characteristics of controllers include all levels of operational experience (i.e. ranging
from 3 to 39 years in service) and ratings. In short, these 27 ATC Centres capture the
characteristics of the target population and as such will be included in the further data
analyses.
6.7.2.2 Sampling of air traffic controllers
The questionnaire survey captured interesting information related to the operational
experience of controllers, namely years of experience, country of residence, and ATC
facility location (i.e. city or airport). The survey data show that on average controllers
have more than 13 years of operational experience (i.e. length of service), ranging from
1 to 39 years. More than 77 percent of the controllers surveyed have up to 20 years of
experience. The length of service captured in this survey is split
into four categories: 1-10, 11-20, 21-30, and 31-40 years (Figure 6-6). The sample is
reasonably representative of the population as all categories are represented. There
appear to be fewer respondents with over 30 years of experience in the sample
collected. However, this is expected as the majority of controllers with more than 30
years in service tend to move to operational support roles, including training,
instructing, and management.
Figure 6-6 Distribution of operational experience
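The banding of length of service into the four categories of Figure 6-6 can be sketched as follows. This is only an illustration: the `sample_years` list is hypothetical and does not reproduce the survey data, and the `band` helper is a name introduced here.

```python
from collections import Counter

# The four experience categories used in Figure 6-6 (inclusive bounds, in years).
BANDS = [(1, 10), (11, 20), (21, 30), (31, 40)]


def band(years):
    """Return the label of the experience category that contains `years`."""
    for low, high in BANDS:
        if low <= years <= high:
            return f"{low}-{high}"
    raise ValueError(f"{years} years falls outside the 1-40 range")


# Hypothetical years in service for ten controllers (NOT the survey data).
sample_years = [1, 3, 8, 12, 13, 15, 19, 22, 27, 39]

counts = Counter(band(y) for y in sample_years)
up_to_20 = counts["1-10"] + counts["11-20"]
share_up_to_20 = 100 * up_to_20 / len(sample_years)

for low, high in BANDS:
    label = f"{low}-{high}"
    print(f"{label} years: {counts[label]} controllers")
print(f"Up to 20 years: {share_up_to_20:.0f} percent")
```

Checking that every band has a non-zero count corresponds to the representativeness argument made above, i.e. that all categories are represented in the sample.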
Furthermore, Figure 6-7 presents the distribution of the ratings of the controllers who
participated in the survey. In general, most controllers have ACC ratings. As a result,
the data analyses may be biased towards experience within the ACC environment,
which tends to be better staffed and to have more access to advanced equipment/tools
(e.g. radar coverage fed by multiple radar sites rather than the single radar site typical
of APP and TWR control, and greater investment in more automated systems).
[Figure 6-7: bar chart. x-axis: Rating (ACC & APP & TWR, ACC & APP, ACC & TWR, APP & TWR, ACC, APP, TWR); y-axis: Percentage (0 to 35). Data labels as extracted: 3.73, 2.24, 31.34, 15.67, 9.7, 26.12, 10.45.]
Figure 6-7 Distribution of controllers’ ratings
6.7.3 High-level analyses
This section presents high-level results from the simple percentage analyses of the
entire dataset. These summaries are organised into seven sub-groups, corresponding
to the six key questions that the survey was designed to answer (defined in section 6.1)
and concluding with other findings on controller recovery (captured in question 5).
Therefore, the relevant sub-groups are: experience with equipment failures in the ATC
Centre, factors that influence the recovery performance, the most unreliable ATC
systems/tools, organised exchange of information on equipment failures, status and
quality of recovery procedures, status and quality of training for recovery, and other
findings. Each of the sub-groups is discussed below.
6.7.3.1 Experience with equipment failures (Q1)
In the sample obtained, 94.8 percent of controllers had experienced some kind of ATC
equipment failure in their career. This group of controllers experienced on
average 17 equipment failures annually, ranging from fewer than one per year up to 600,
as reported by one ATC Centre. This dispersion of the results reflects the wide
variation in the interpretation of equipment failures. Some controllers interpreted the
question on equipment failures in terms of only ‘major’ (more severe) failures. Their
answers ranged from less than one (e.g. once in two years, once in five years, once in
a career) to one failure annually (34.6 percent of responses). Other controllers reported
the total number of failures experienced annually regardless of their level of severity, as
their responses ranged from dozens to hundreds. In short, the vast majority of
controllers surveyed have experienced equipment failures.
6.7.3.2 Factors that influence controller recovery performance (Q2)
Controllers were asked to rate how much they relied upon written procedures,
situation-specific strategies (i.e. context), and other factors (e.g. past experience) in
handling equipment failures. The ratings ranged from one to five, where one stands for
‘very much’, two for ‘much’, three for ‘moderate’, four for ‘minimal’ and five for ‘not at
all’.
The results show that more than 45 percent of the controllers surveyed rely on written
procedures in the event of an equipment failure at the levels of either ‘much’ or ‘very
much’ (see Figure 6-8). These controllers have on average more than 13 years of
experience, they operate in ATC Centres with recovery procedures (96.4 percent of
controllers who rated written procedures ‘much’ or ‘very much’) and recovery training
schemes (64.3 percent of controllers who rated written procedures ‘much’ or ‘very much’).
[Figure 6-8: bar chart. x-axis: Written procedures (Not at all, Minimal, Moderately, Much, Very much); y-axis: Frequency (0 to 50). Data labels: 3.25%, 13.01%, 37.4%, 22.76%, 23.58%.]
Figure 6-8 Controllers’ reliance on written procedures throughout the recovery process
When it comes to situation-specific problem solving, 63.48 percent of controllers rated
this factor at the levels of either ‘much’ or ‘very much’ (see Figure 6-9). Similar to the
previous factor, the operational experience of controllers who rated this factor highest
is on average more than 13 years, they operate in ATC Centres with recovery
procedures (94.5 percent of controllers who rated situation-specific problem solving
‘much’ or ‘very much’) and recovery training schemes (63 percent of controllers who
rated situation-specific problem solving ‘much’ or ‘very much’). The only difference
observed with the previous group of controllers is that no controllers from the African
region rated situation-specific problem solving highly. European controllers tend to rely
much more on situation-specific problem solving (69.3 percent of responses captured
from European controllers) compared to their reliance on written procedures (42.7
percent).
[Figure 6-9: bar chart. x-axis: Situation-specific problem solving (Not at all, Minimal, Moderately, Much, Very much); y-axis: Frequency (0 to 50). Data labels: 1.74%, 10.43%, 24.35%, 35.65%, 27.83%.]
Figure 6-9 Controllers’ reliance on situation-specific problem solving throughout the recovery process
Finally, 64.08 percent of controllers rated other factors (e.g. past experience) at the
level of either ‘much’ or ‘very much’ (see Figure 6-10). Similar to the previous factors,
the operational experience of controllers who rated this factor highest is on average
more than 13 years, they operate in ATC Centres with recovery procedures (90.8
percent of controllers who rated other factors ‘much’ or ‘very much’) and recovery
training schemes (58.5 percent of controllers who rated other factors ‘much’ or ‘very
much’). European controllers rely most on other factors (e.g. past experience) when
recovering from equipment failures (69.6 percent of responses captured from European
controllers) compared to Asian controllers (42.1 percent of responses captured from
Asian controllers). The sample of African controllers is too small for any comparison.
[Figure 6-10: bar chart. x-axis: Past experience (Not at all, Minimal, Moderately, Much, Very much); y-axis: Frequency (0 to 40). Data labels: 2.91%, 3.88%, 29.13%, 31.07%, 33.01%.]
Figure 6-10 Controllers’ reliance on other factors (e.g. past experience) throughout the recovery process
Figures 6-8 to 6-10 and frequency analysis show that controllers mostly rely upon other
factors (e.g. past experience) when dealing with equipment failures. This is followed by
situation-specific problem solving and finally written procedures. With the factors
that affect controller recovery examined, the next section turns to the third survey
question: the assessment of the most unreliable ATC systems/tools.
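The ordering of the three factors follows from summing the ‘much’ and ‘very much’ shares in Figures 6-8 to 6-10. The sketch below assumes that the percentage labels recovered from each figure are listed in the same order as the rating categories on the axis; that pairing is an assumption, although the resulting sums agree with the 63.48 and 64.08 percent quoted in the text.

```python
# Rating categories in axis order (assumed to match the order of the bar labels).
CATEGORIES = ["not at all", "minimal", "moderate", "much", "very much"]

# Percentage of controllers per category, taken from Figures 6-8, 6-9 and 6-10.
ratings = {
    "written procedures": [3.25, 13.01, 37.4, 22.76, 23.58],
    "situation-specific problem solving": [1.74, 10.43, 24.35, 35.65, 27.83],
    "other factors (e.g. past experience)": [2.91, 3.88, 29.13, 31.07, 33.01],
}


def top_two_share(percentages):
    """Combined share of the 'much' and 'very much' ratings."""
    by_category = dict(zip(CATEGORIES, percentages))
    return round(by_category["much"] + by_category["very much"], 2)


top_two = {factor: top_two_share(p) for factor, p in ratings.items()}

# Rank the factors from most to least relied upon.
ranking = sorted(top_two, key=top_two.get, reverse=True)

for factor in ranking:
    print(f"{factor}: {top_two[factor]} percent")
```

The gap between the first two factors is small (0.6 percentage points), whereas written procedures trail both by roughly 17 points.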
6.7.3.3 The most unreliable ATC systems/tools (Q3)
The data used for the analysis of the most unreliable ATC equipment are based on two
particular questions, 5 and 9. Question 5 consisted of examples of equipment failures
that severely impacted on the controller’s work. Question 9 asked controllers to list the
three most unreliable ATC systems/subsystems they have experienced. The data
obtained from both questions were collated and pre-processed to remove any duplicate
answers. This was necessary as controllers tended to give similar responses to both
questions.
The results of the analysis of questionnaire responses from 34 countries were found to
be similar to those obtained from the analysis of operational failure reports, presented
in Chapter 4. The questionnaire survey shows that the three most affected ATC
functionalities are: communication (37.2 percent of all examples provided), data
processing (24.6 percent), and surveillance (23 percent) (Figure 6-11). More precisely,
the following five equipment types are affected most:
• air-ground communication (12.03 percent of all examples provided);
• primary surveillance radar (9.1 percent);
• flight data processing system (7.75 percent);
• communication panel (7.49 percent); and
• ground to ground communication (6.68 percent).
Figure 6-11 Distribution of affected ATC functionalities as reported in the questionnaire survey
Table 6-2 establishes the link between the most unreliable ATC functionalities and
existing recovery procedures, as reported by 134 controllers from 34 countries
representing various regions of the world. The link is established based on responses
to questions 5, 9, 10, and 11. In addition, the analysis was conducted at the country
level rather than ATC Centre level to avoid direct reference to sensitive information
specific to ATC Centres. It should be noted that, as a result, inaccuracies are
possible only in cases where the controllers were not fully aware of the
availability of recovery procedures in their ATC Centres.
Table 6-2 Mapping between most unreliable ATC functionalities and existing recovery procedures for the countries sampled
Country Most unreliable ATC functionalities
Existing recovery procedure
Ireland
Communication Frequency failure, telephone failure Navigation Failure of navigational aids Surveillance Radar failure (procedural/non-radar control) Data processing Strip printer failure (emergency strip printing) Pointing/input devices
Input device failure
Power outages, procedures for all failure types
Finland Communication Surveillance Data processing
Serbia
Communication Frequency failure, telephone failure Surveillance
Data processing Flight data processing system (FDPS) failure, radar data processing system (RDPS) failure
Switzerland
Communication Frequency failure, telephone failure Navigation Surveillance Radar failure, visualisation system (radar display) failure Data processing FDPS failure Pointing/input devices
Power supply failure
United Kingdom
Surveillance Procedures for all failure types
Netherlands
Communication Frequency failure
Surveillance Secondary surveillance radar (SSR) failure, radar fallback system failure, failure of the working position (radar display)
Data processing FDPS failure, RDPS failure Pointing/input devices
Total system failure (in various gradations)
Germany
Communication Surveillance Radar failure Data processing Total system failure
Spain
Communication Frequency failure Surveillance Total radar failure Data processing Fire contingencies
Norway
Communication Frequency failure, on-line data interchange (OLDI) link failure, communication panel failure, telephone failure, headset failure, intercom failure
Surveillance Radar failure, failure of the radar display Data processing FDPS failure Pointing/input devices
Italy
Communication Frequency failure Navigation Runway/taxiway lights failure Surveillance Radar failure Data processing
France
Communication Frequency failure, telephone failure Surveillance Radar failure Data processing FDPS failure, RDPS failure
Power outage, air conditioning failure, fire evacuation, meteorological equipment failure, failure of navigation
aids
Sweden
Communication Frequency failure, telephone failure Surveillance Radar failures, surface movement radar failure Data processing Pointing/input devices
Safety nets
Procedures for most failure types, runway/taxiway lighting system failure, instrument landing system (ILS) failure
Slovenia Communication Frequency failure, telephone failure Data processing FDPS failure, RDPS failure Radar failure
Belgium Communication Frequency failure Surveillance Radar failure, radar fallback failure
Macedonia
Communication Frequency failure Data processing Pointing/input devices
Radar failure
Croatia
Communication Frequency failure, telephone failure Surveillance Radar failure Data processing Power outage
Moldova Radar failure
Iceland Communication Surveillance Data processing FDPS failure
Denmark Communication Frequency failure, telephone failure Data processing Radar failure
Portugal
Communication Frequency failure, telephone failure, voice switching and communication system (VSCS) failure
Navigation Surveillance Radar failure, radar display failure Data processing Strip printer failure
South Africa Communication Radar failure
Tanzania Frequency failure, telephone failure, FDPS failure, power outage
India
Communication Telephone failure, intercom failure
Navigation Failure of navigation equipment, instrument landing system (ILS) failure
Surveillance Radar failure Data processing FDPS failure Pointing/input devices
Singapore Communication Frequency failure Surveillance Radar failures, failure of radar display
Tahiti
Communication Frequency failure, failure of satellite communication Surveillance Data processing Safety nets
Navigational aids failure, tsunami alert, aircraft diverting due to terrorist action
Australia Communication Surveillance
Austria Surveillance Data processing FDPS failure, RDPS failure, failure of strip printer
Pointing device failure, failure of touch input display (TID), frequency failure
Romania Communication Surveillance Procedures for all failure types
Malta
Communication Surveillance Radar failure Data processing Pointing/input devices
Power supply
Macau Special Administrative Region
Communication Frequency failure Navigation Navigation aids failure Data processing
Procedures for all failure types, radar failure, SSR failure
Kenya
Communication Frequency failure, telephone failure Navigation Surveillance Data processing Strip printer failure
New Zealand
Communication Frequency failure, telephone failure Surveillance Radar failure, radar screen failure Data processing FDPS failure, RDPS failure Safety nets
Partial and total failure of all ATC equipment, evacuation of ATC centre, mouse/keyboard failure, power outage
China Surveillance Radar failure FDPS failure, frequency failure
Malaysia
Communication Frequency failure Surveillance Data processing Safety nets
The instances in which identified failures are not supported by existing recovery
procedures are highlighted in grey. In these cases, controllers experienced ATC
equipment failures for which recovery procedures were not available in their ATC
Centre. On the other hand, the instances in which sampled controllers have not yet
experienced equipment failures, for which procedures exist, are highlighted in yellow
and separated as the last row for each country. As an example, if the communication
function was affected specifically by frequency failure, the mapping is not established
(coloured grey) if the recovery procedure did not exist for this particular failure type. In
several cases controllers reported that their ATC Centre has procedures for all failure
types. Clearly, it is not possible to cover every failure type; what is possible is to design
generic procedures or guidelines to follow in the case of equipment failure.
It can be concluded that inadequate mapping between recovery procedures and
equipment failures experienced by controllers occurred in many cases. The most
severe cases are those in which countries provide at most one type of recovery
procedure. This was identified in several European countries (i.e. Finland, Macedonia,
Iceland, and Malta), in two African countries (i.e. South Africa and Kenya), and two
Asian/Pacific countries (i.e. Tahiti and Malaysia). The most neglected ATC functionality
was found to be data processing, followed by surveillance and communication. The
paradox is that the qualitative equipment failure impact assessment tool (Chapter 4)
identified exactly these three ATC functionalities as the most challenging to controller
recovery.
6.7.3.4 Organised exchange of information on equipment failures (Q4)
40.3 percent of the controllers surveyed reported that their ATC Centres have
organised exchange of information on equipment failures between colleagues. 49.3
percent reported a lack of this exchange of experience whilst 10.4 percent did not
answer this question.
Contradictory responses were obtained from 14 ATC Centres and are further
investigated using the responses to the subsequent question, i.e. whether the organised
exchange of experience is supported by management as a good working practice.
Of the ATC Centres that have an exchange of experience, 76 percent have formal
processes approved by management, as opposed to practice based on ‘word of
mouth’ that reaches only a small portion of controllers. The question was intended to
capture initiatives by management to provide means to share experience on equipment
failures in an organised manner. This may be achieved using different methods, such
as seminars, company newsletters, safety bulletins, memorandums, and workshops. In
these ways the lessons learnt are disseminated not only between the controllers
directly experiencing the effects of the failure, but within the entire ATC Centre and
often within the same country.
Based on this additional assessment, the following countries and ATC Centres have
neither formal nor informal processes for exchange of experience on equipment failures:
Italy, Ireland, Croatia, India, Slovenia, Maastricht ATC Centre (as opposed to Amsterdam
Centre), Switzerland, Macau SAR, and Kenya.
The data indicate that there is room for improvement. There is a clear need for the
implementation of formal processes for exchange of experience on equipment failures
including failure modes and recovery processes. This should form part of a wider safety
culture within ATC Centres, which is the responsibility of management. Past experience has
shown this type of indirect training to have a beneficial safety impact in a similar way to
regular recurrent training. The example discussed in Chapter 5 concerns an incident
in which an A300 was struck on the left wing by a surface-to-air missile, resulting in a
loss of all flight controls. Reacting rapidly, the captain recalled a television documentary
on a DC-10 crash at Sioux City (Iowa) and the thrust change technique employed by
the captain and crew of the DC-10 to control their aircraft. Although the A300 crew had
never practised this technique before, they quickly gained control despite the extreme
stress of the situation (IFALPA, 2005).
6.7.3.5 Status and quality of recovery procedures (Q5)
A section of the questionnaire consisting of 11 questions (from 10th to 20th question)
was dedicated to the assessment of recovery procedures within each ATC Centre. The
first question was designed to immediately filter out those ATC Centres without any
written procedures in place. In this case, the controller would skip the rest of this
section and proceed with the rest of the questionnaire. In cases where recovery
procedures exist, the remaining ten questions were designed to assess the quality of
those procedures. These questions focused on the completeness of the recovery
procedure, the level of currency, clarity, realism or feasibility, accessibility, and
compatibility with other procedures. In addition, controllers were given the opportunity
to comment on any event for which there was an inadequate application of recovery
procedures in their working experience.
The analysis of the questionnaire responses highlighted some inconsistencies (marked
with ‘?’ in Table 6-3). In these cases, the controllers from the same ATC Centre gave
opposite responses to the questions on the existence of recovery procedures, recovery
training, and/or recurrent training. These are further investigated using the responses
to the subsequent questions related to recovery procedure (11th to 20th question),
recovery training (25th to 28th question), and recurrent training (23rd and 24th question).
In this section, further investigation regarding the existence of recovery procedures is
conducted for Shannon, Cork, Brussels, and Nairobi ATC Centres (Table 6-3) using the
answers provided from 11th to 20th question. Although controllers from these ATC
Centres reported a lack of recovery procedures in the 10th question, their subsequent
answers revealed that these procedures do exist (at least for some failure types).
Table 6-3 Existence of recovery procedures, recovery training, and recurrent training as reported in the questionnaire survey
Country          ATC Centre       Existence of          Existence of training     Existence of
                                  recovery procedure    for equipment failures    recurrent training
Ireland          Shannon          ?                     Yes                       ?
                 Dublin           Yes                   No                        ?
                 Cork             ?                     ?                         ?
Finland          Kemi             No                    Yes                       Yes
Serbia           Belgrade         Yes                   No                        No
Switzerland      Zurich           Yes                   Yes                       ?
                 Geneva           Yes                   Yes                       ?
United Kingdom   Bristol          Yes                   Yes                       No
Netherlands      Maastricht       Yes                   ?                         Yes
                 Nieuw Milligen   Yes                   Yes                       No
                 Amsterdam        Yes                   Yes                       Yes
Germany          Karlsruhe        Yes                   Yes                       No
                 Langen           Yes                   Yes                       No
                 Frankfurt        Yes                   Yes                       Yes
Spain            Seville          Yes                   ?                         No
Norway           Oslo             Yes                   Yes                       Yes
                 Kirkenes         Yes                   Yes                       No
                 Stavanger        Yes                   No                        Yes
                 Bodo             Yes                   Yes                       Yes
Italy            Rome             Yes                   No                        ?
                 Bologna          Yes                   No                        No
                 Naples           Yes                   No                        No
                 Venice           Yes                   Yes                       No
                 Milan            Yes                   No                        No
France           Paris            Yes                   Yes                       No
                 Nice             Yes                   No                        No
Sweden           Stockholm        Yes                   No                        No
                 Malmo            Yes                   Yes                       Yes
                 Gothenburg       Yes                   Yes                       Yes
Slovenia         Ljubljana        Yes                   Yes                       Yes
Belgium          Brussels         ?                     No                        No
Macedonia        Skopje           Yes                   No                        No
Croatia          Split            Yes                   No                        Yes
                 Zagreb           Yes                   No                        No
                 Pula             No                    No                        Missing data
                 Zadar            No                    No                        Missing data
Moldova          Chisinau         Yes                   Yes                       Yes
Iceland          Reykjavik        Yes                   No                        ?
Denmark          Copenhagen       Yes                   Yes                       Yes
Portugal         Lisbon           Yes                   ?                         ?
South Africa     FAJS             Yes                   Yes                       Yes
Tanzania         Dar es Salaam    Yes                   Yes                       No
India            Mumbai           Yes                   ?                         Yes
                 Kolkata          Yes                   ?                         No
Singapore        Singapore        Yes                   Yes                       Yes
Tahiti           Papeete          Yes                   ?                         ?
Australia        Melbourne        Yes                   No                        No
Austria          Vienna           Yes                   No                        Yes
Romania          Bucharest        Yes                   Yes                       Yes
Malta            Malta            No                    Yes                       No
                 Luqa airport     Yes                   Yes                       Yes
Macau SAR        Macau            Yes                   ?                         ?
Kenya            Nairobi          ?                     Yes                       No
New Zealand      Wellington       Yes                   Yes                       No
                 Auckland         Yes                   Yes                       Yes
                 Christchurch     Yes                   ?                         Yes
China            Hong Kong        Yes                   Yes                       No
Malaysia         Subang           Yes                   ?                         No
Table 6-3 shows that 93.1 percent of sampled ATC Centres have some form of
recovery procedure in place (i.e. 54 ATC Centres). The types of equipment failure
most commonly covered by recovery procedures in the sampled ATC Centres are:
• radar failure (reported by 40.2 percent of controllers surveyed);
• failure of communication function: radio telephony, ground to ground
communication, voice switching and communication system panel (reported by
43.3 percent of controllers surveyed); and
• flight data processing system failure (reported by 12.69 percent of controllers
surveyed)⁴.
74 percent of controllers reported that these recovery procedures are kept up-to-date
and reflect the changes in hardware and software occurring in the ATC Centre.
Similarly, 72 percent of controllers rated available recovery procedures as
comprehensive, while only 55 percent rated them as complete. The remaining 45
percent of controllers surveyed rated available recovery procedures as incomplete (i.e.
missing recovery steps necessary to re-establish a safe ATC service). When asked
which types of recovery procedures should be added, the controllers mostly
emphasised the requirement for recovery procedures from radar failure,
communication systems failure, the need for back-up systems, and procedures for
handling outages at ATC Centre level. Furthermore, 88 percent of controllers rated
available recovery procedures as clear and understandable, while 72 percent rated
them as realistic and feasible to perform.
69 percent of controllers surveyed reported that recovery procedures documentation is
easily accessible, i.e. they are placed in close proximity to controller working positions.
⁴ The discussion presented in Chapter 5 showed that ICAO provides recovery procedures for
the communication and surveillance functionalities but not for the data processing functionality.
Finally, 77 percent of controllers reported that available recovery procedures are linked
or harmonised to other procedures specified within the Manual of Air Traffic Services
(MATS), e.g. on suite allocation of tasks (separation of responsibilities between
executive and planner controller), and duties of the staff such as the approach
controller, the ground controller, or the watch manager.
From the survey data and subsequent analyses, it can be concluded that the majority of
sampled ATC Centres have some form of recovery procedures. The majority of
controllers reported that these procedures are up-to-date, comprehensive, easily
accessible, and compatible with other procedures. Moreover, controllers emphasise
the need for procedures on radar and communication failures.
6.7.3.5.1 Other findings regarding the recovery procedures
In addition to the findings in the previous section, the questionnaire’s narrative section
highlighted interesting safety-relevant issues regarding recovery procedures. These are
individual comments rather than findings representative of the entire sample. The
reported issues are categorised in three groups, namely equipment specific, teamwork
specific, and generic recovery related issues. These are discussed in the following
paragraphs.
The equipment related issues highlighted major problems with the flight data
processing system not covered in the operational manuals. In addition, controllers
reported a lack of back-up facilities. One example indicated that during radio
communication system failure, a particular ATC Centre had only ten emergency radio
devices for the operational room with a 20 seat configuration.
On teamwork related issues, the controllers mostly reported inadequate familiarisation
with contingency procedures on the part of technical staff and controllers in
neighbouring sectors. In general, the controllers highlighted the important role of
teamwork and the need for an experienced planning controller in the event of
equipment failure. Another example drew attention to the unavailability of technical staff
during night shifts to immediately provide assistance in the case of equipment failure.
In short, controllers feel that teamwork is important in dealing with failures and that
Team Resource Management (TRM) training, aimed at enhancing teamwork efficiency,
should be mandatory for all ATC Centres.
Finally, many individual recovery related issues, such as context, procedures, and
working practice, are also highlighted in the questionnaire’s narrative part. These are
as follows:
• Situation-specific problem solving plays a major role as all equipment failures
occur within a specific context (e.g. bad weather, frequency jamming, high/low
traffic levels);
• There is a need for an approach to recovery procedures similar to that available to
pilots. In other words, a comprehensive manual with all possible failures and
corresponding recovery steps is needed during controller training. For the
operational environment, it would be necessary to design an abbreviated version
of the contingency manual available at each controller working position
(e.g. an aide-memoire in the form of a check-list, see Appendix III); and
• Accurate and efficient strip marking is seen as the most reliable recovery tool in
the case of radar or flight data processing failure.
6.7.3.6 Status and quality of training for recovery (Q6)
A section of the questionnaire consisting of eight questions (from 21st to 28th question)
was dedicated to the assessment of training in recovery from equipment failures within
each ATC Centre. The first question was designed to immediately filter out those
Centres without training schemes. In this case, the controller would skip the remainder
of this section and proceed with the final part of the questionnaire. In the case of the
existence of a recovery training scheme, the remaining seven questions were designed
to assess its quality by extracting information on the existence of recurrent training, its
frequency, content, and compatibility with other types of training. The final section of
the questionnaire provided the opportunity for controllers to comment on other issues
of relevance to training.
The analysis of the collected data firstly revealed inconsistencies in the responses to
questions on training (Table 6-3). The reason for this may be that some controllers
assumed their initial training, e.g. initial radar control training, as training for recovery.
Other controllers may have considered only separate training for emergency situations
and whether it involved some type of equipment failure.
30 ATC Centres (51.7 percent) have training for recovery from equipment failures, 18
ATC Centres (31 percent) do not, while data for 10 ATC Centres (17.3 percent) are
inconsistent (i.e. marked with ‘?’ in Table 6-3). In these cases, the controllers from the
same ATC Centre gave opposite responses to the questions on existence of recovery
training. All these inconsistencies are further investigated using the subsequent
questions related to recovery training (i.e. 25th to 28th question). Although controllers
from these ATC Centres reported contradictory responses on existence of the recovery
training (i.e. 21st question), their answers to subsequent training-related questions did
not reveal any further information. Therefore, a conservative approach has been taken
and these 10 ATC Centres are considered not to have recovery training in place.
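The per-Centre classification described above can be illustrated with a minimal sketch. It assumes each controller response is reduced to a (Centre, yes/no) pair for the 21st question; the function names and the sample data are hypothetical and are not taken from the survey.

```python
from collections import defaultdict


def classify_centres(responses):
    """Label each Centre 'yes', 'no', or 'inconsistent'.

    responses: iterable of (centre, has_training) pairs, one per controller.
    A Centre is 'inconsistent' when its controllers gave opposite answers.
    """
    answers = defaultdict(set)
    for centre, has_training in responses:
        answers[centre].add(bool(has_training))

    status = {}
    for centre, seen in answers.items():
        if seen == {True}:
            status[centre] = "yes"
        elif seen == {False}:
            status[centre] = "no"
        else:
            status[centre] = "inconsistent"
    return status


def conservative(status):
    """Conservative rule: only unanimous 'yes' counts as having training."""
    return {centre: label == "yes" for centre, label in status.items()}


# Hypothetical responses (Centre names are invented for illustration).
sample = [
    ("Centre A", True), ("Centre A", True),
    ("Centre B", False),
    ("Centre C", True), ("Centre C", False),
]
status = classify_centres(sample)
final = conservative(status)
```

Under this rule, a Centre with contradictory responses is counted together with those reporting no recovery training, which mirrors the conservative approach taken for the 10 inconsistent ATC Centres.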
In the case of recurrent training, the analysis shows that only 36.2 percent of the whole
sample of ATC Centres have recurrent training, 43 percent do not, while the rest of the
data is either inconsistent or missing. Recurrent training is provided once a year in 25
ATC Centres and bi-annually in three ATC Centres (Oslo-Norway, Bucharest-Romania,
Auckland-New Zealand). In addition, Geneva and Melbourne ATC Centres provide
recurrent training three times per year, while Frankfurt ATC Centre provides recurrent
training 20 times per year. In the latter, a contingency system is used every weekend to
train controllers.
Further analysis of the ATC Centres with recurrent training frequency higher than once
a year shows that all have recovery procedures in place, while the majority (i.e. 64
percent) have an organised exchange of information on equipment failures. The
Auckland ATC Centre emphasised that recovery performance was difficult before the
introduction of clear and easy to follow procedures. Moreover, this ATC Centre
highlighted that operations impact on recovery training as the recent failure types are
included in the recurrent training. Although the Oslo ATC Centre has recovery
procedures, its controllers report the need for more comprehensive and easily available
procedures (e.g. checklist type procedures on each console). These controllers
expressed a need to step away from increased dependency on experience when
handling equipment failures.
From the subset of controllers who have recurrent training once a year, 55 percent
believe that this is adequate, while the rest express the need for a higher frequency in
order to build competency in handling unexpected equipment failures. When asked if
the training covers all important equipment failures, the majority of controllers (i.e. 63
percent) answered negatively. The issues most frequently proposed for addition to the
current training syllabus are:
• complete radar failure simulated in a comprehensive and realistic way;
• total power failure;
• facility evacuation;
• team resource management (TRM);
• different types of aircraft problems (e.g. communication failure, engine failure,
landing gear problem);
• hot standby procedures (system running in the background ready for immediate
use); and
• radar bypass (radar information is presented directly at the radar display without
having been processed, resulting in the presentation of uncorrelated tracks only).
61 percent of controllers believe that the training methods utilised in their ATC Centres
are suitable, or more precisely, realistic and varied. Furthermore, according to the
responses from 63 percent of controllers surveyed, the recovery training is compatible
with (i.e. linked to) other training schemes. In general, it is essential to harmonise recovery
training within the overall training syllabus. One option is to include recovery training
within each training course, such as ab-initio training, conversion course, continuity or
recurrent training, training for unusual situations, and TRM training. The other option is
to provide separate recovery training sessions on a regular basis. Regardless of the
approach, ATC management has to assure an inclusive, regular, and consistent
approach in training for recovery to its entire population of controllers.
From the survey data and subsequent analyses, it can be concluded that the majority
of the ATC Centres surveyed have some form of recovery training although not
necessarily provided consistently throughout the Centre. The situation with recurrent
training is worse as in the majority of cases, this type of training is not provided
regularly. This results in the extensive reliance on experience in dealing with equipment
failures which may pose a significant safety threat in ATC Centres with a large
percentage of newly established and thus less experienced controllers. In general, the
controllers surveyed want to step away from over-reliance on experience and to be
trained as regularly as possible.
6.7.3.6.1 Other findings on training for recovery
In addition to the findings in the previous section, the questionnaire’s narrative section
highlighted interesting safety-relevant issues regarding recovery training. These are
individual comments rather than findings representative of the entire sample. The
reported issues focus on the quality and frequency of recovery training.
According to the controllers surveyed, the main problem is the overall lack of training
for supervisors, engineers, and controllers. The controllers believe that a couple of
hours of training per year is far too little practice and some of them feel that recurrent
training is necessary at least twice a year. In the event of more critical equipment
failures (e.g. radar) with high traffic levels, there may be occasions when there is no time
to act upon the recovery procedures. On these occasions, the role of training as well as
teamwork has a much greater importance.
The controllers are aware that it is almost impossible to include everything that can go
wrong within the training syllabus, but emphasise that more training and guidance
should be given. They also highlight that training sessions should be as realistic as
possible in the simulated environment (e.g. higher traffic levels and the need to use
radar fallback system regularly). Currently, in some ATC Centres, the training only
focuses on outages (i.e. failure of the entire ATC system) and not on everyday failures.
An example of an ATC Centre where recurrent training takes place only on a night shift
highlighted inconsistent provision of training throughout the ATC Centre, as only those
controllers on a night shift get recovery training.
6.7.3.7 Other findings on recovery performance
This section deals with additional findings extracted specifically from question 5. This
question aimed to provide an opportunity to controllers to discuss their past experience
with equipment failures which seriously impacted on their work. The findings extracted
from question 5 are presented in Appendix VI.
While section 6.7.3 has provided a high-level analysis and results of the survey, the
following section carries out a more rigorous analysis of the data.
6.7.4 Interaction analyses
The data analyses started with the assessment of the sample characteristics and
proceeded with the high-level summaries of controller responses. In this section, the
final set of data analyses investigates the relationships between the characteristics of
controllers (e.g. operational experience) and various recovery factors using appropriate
statistical tests. The section starts with the qualitative assessment of potential
interactions and the identification of those relevant to controller recovery. This is followed
by the presentation of appropriate statistical tests and their key findings.
Several reciprocal interactions amongst controller characteristics and recovery factors
(corresponding to the key questions defined in section 6.1) are chosen for further statistical
testing and marked with the symbol ‘√’ (Table 6-4). This choice is based on relationships
known from operational experience, which are then tested using rigorous statistical
assessment. The focus is placed on controller recovery and factors that influence it,
which corresponds to a total of eight interactions.
Table 6-4 Interaction matrix
Columns: (1) Operational experience; (2) Rating; (3) Experience with equipment failures;
(4) Factors that influence recovery performance; (5) Formal exchange of information;
(6) Existence of recovery procedures; (7) Existence of recovery training

Row variable                                              (1)  (2)  (3)  (4)  (5)  (6)  (7)
Operational experience (length of service)                      √    √    √
Rating                                                               √    √
Experience with equipment failures (frequency per year)                   √
Factors that influence recovery performance                               √    √
Formal (management supported) exchange of information
Existence of recovery procedures
Existence of recovery training
The nature of the variables under consideration determined which statistical methods
could be used to analyse the data. As can be seen from their description in this
Chapter, three variables are categorical (rating, factors that influence recovery
performance, formal or management supported exchange of information on equipment
failures) whilst two are continuous (ratio scale) variables⁵ (operational
experience-length of service, experience with equipment failures-frequency per year).
As the data differ significantly from the normal distribution, several non-parametric tests
have been used at the 95 percent confidence level (i.e. a 5 percent significance level).
As previously explained in Chapter 4 (section 4.4.1), chi-square tests are used to test the
relationships between two categorical variables. Furthermore, Cramer’s V is used to
measure the association for nominal data (i.e. interactions between ‘factors that influence
recovery performance’ with ‘rating’ and ‘existence of formal exchange of information on
equipment failures’) whilst Kendall’s tau is used for ordinal data (i.e. ‘factors that
influence recovery performance’). The relationship between two ratio variables is tested
via non-parametric correlation (Kendall’s tau), which uses the ranks of the
data to calculate a correlation coefficient. The correlation coefficient ranges between -1 and
1, where its sign indicates the direction of the relationship (either positive or negative)
whilst its absolute value indicates the strength of the relationship.
⁵ As mentioned in Chapter 4, variables can be either continuous or categorical. Continuous
variables are numeric values on an interval or ratio scale (e.g. age, income). Categorical variables can be either nominal or ordinal. Nominal variables differentiate between categories but do not assume any ranking between them (e.g. gender). On the other hand, ordinal variables differentiate between categories that can be rank-ordered (e.g. from lowest to highest).
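As an illustration of the two association measures named above, the following is a minimal sketch using only the Python standard library. It is not the procedure used in this thesis: the function names are hypothetical, the Kendall variant shown is tau-a (no correction for ties), and the contingency table is assumed to have no empty row or column.

```python
import math


def cramers_v(table):
    """Cramer's V for an r x c contingency table given as a list of rows.

    V = sqrt(chi2 / (n * (min(r, c) - 1))), where chi2 is the Pearson
    chi-square statistic. Assumes every row and column total is non-zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)  # +1, -1, or 0 for a tie
    return score / (n * (n - 1) / 2)


# Invented data: perfect association, no association, perfect rank agreement.
v_perfect = cramers_v([[10, 0], [0, 10]])
v_none = cramers_v([[5, 5], [5, 5]])
tau_up = kendall_tau_a([1, 2, 3, 4], [10, 20, 30, 40])
tau_down = kendall_tau_a([1, 2, 3, 4], [40, 30, 20, 10])
```

Cramer's V lies between 0 and 1 and carries no sign, whereas Kendall's tau lies between -1 and 1, which is why only the latter indicates the direction of a relationship.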
Finally, the relationship between a ratio and a categorical variable is tested using the
non-parametric Mann-Whitney test. The test is used to assess whether two samples of
observations come from the same distribution (Shier, 2004). The test involves the
calculation of a statistic, referred to as ‘U’ (see equation 6-1).
$$U = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1 \qquad \text{(6-1)}$$
where n1 and n2 are the two sample sizes, and R1 is the sum of the ranks of all the
observations in sample 1. For samples larger than 20, the distribution of U is assumed to be
approximately normal, and the U statistic is therefore converted to a Z score using the
formula in equation 6-2 (Shier, 2004):
$$Z = \frac{\text{largest } U \text{ value} - \dfrac{n_1 n_2}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} \qquad \text{(6-2)}$$
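Equations 6-1 and 6-2 can be traced step by step with a short sketch. It uses only the standard library and assumes that tied observations receive the average of their ranks and that no tie correction is applied to the variance; the function names and the sample values are hypothetical. Because equation 6-2 uses the largest U value, the Z score computed here is non-negative; statistical packages that work with the smaller U value report the same magnitude with a negative sign.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney(sample1, sample2):
    """Return (U from equation 6-1, largest U value, Z from equation 6-2)."""
    n1, n2 = len(sample1), len(sample2)
    ranks = rank_with_ties(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])                      # sum of ranks in sample 1
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1     # equation 6-1
    u2 = n1 * n2 - u1                         # U for the other sample
    u_largest = max(u1, u2)
    z = (u_largest - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, u_largest, z


# Invented values: two small samples with no overlap.
u1, u_largest, z = mann_whitney([1, 2, 3], [4, 5, 6])
tied_ranks = rank_with_ties([1, 2, 2, 3])
```

The samples in this example are far smaller than 20, so the normal approximation would not be appropriate for them in practice; they are kept small only so that the arithmetic can be checked by hand.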
The results of all tests are presented in Table 6-5.
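Table 6-5 Statistical tests and results obtained

Variable 1 | Variable 2 | Test | Statistical significance at 95 percent confidence level
-----------|------------|------|---------------------------------------------------------
Operational experience (length of service) | Rating: ACC | Mann-Whitney non-parametric test | p>0.05
Operational experience (length of service) | Rating: APP | Mann-Whitney non-parametric test | p<0.001 (U=1382.5, z=-3.56)
Operational experience (length of service) | Rating: TWR | Mann-Whitney non-parametric test | p=0.014 (U=3387.5, z=-2.46)
Operational experience (length of service) | Experience with equipment failures (frequency per year) | Non-parametric test (Kendall’s tau) | p>0.05
Operational experience (length of service) | Written procedures | Mann-Whitney non-parametric test | p>0.05
Operational experience (length of service) | Situation-specific problem solving | Mann-Whitney non-parametric test | p>0.05
Operational experience (length of service) | Other | Mann-Whitney non-parametric test | p>0.05
Rating: ACC | Number of equipment failures experienced annually (Q4) | As above (Mann-Whitney non-parametric test) | p>0.05
Rating: APP | Number of equipment failures experienced annually (Q4) | As above (Mann-Whitney non-parametric test) | p>0.05
Rating: TWR | Number of equipment failures experienced annually (Q4) | As above (Mann-Whitney non-parametric test) | p>0.05
Rating: ACC | Factors that influence recovery performance | Non-parametric test (Cramer's V) | p=0.008⁶
Rating: APP | Factors that influence recovery performance | Non-parametric test (Cramer's V) | p>0.05
Rating: TWR | Factors that influence recovery performance | Non-parametric test (Cramer's V) | p>0.05
Experience with equipment failures (frequency per year) | Written procedures | Mann-Whitney non-parametric test | p>0.05
Experience with equipment failures (frequency per year) | Situation-specific problem solving | Mann-Whitney non-parametric test | p>0.05
Experience with equipment failures (frequency per year) | Other | Mann-Whitney non-parametric test | p>0.05
Factors that influence recovery performance: Written procedures | Situation-specific problem solving | Non-parametric test (Kendall’s tau) | p>0.05
Factors that influence recovery performance: Written procedures | Other | Non-parametric test (Kendall’s tau) | p>0.05
Factors that influence recovery performance: Situation-specific problem solving | Other | Non-parametric test (Kendall’s tau) | p<0.001
Factors that influence recovery performance: Written procedures | Formal exchange of information (Q7) | Non-parametric test (Cramer's V) | p>0.05
Factors that influence recovery performance: Situation-specific problem solving | Formal exchange of information (Q7) | Non-parametric test (Cramer's V) | p>0.05
Factors that influence recovery performance: Other | Formal exchange of information (Q7) | Non-parametric test (Cramer's V) | p=0.029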
The statistical tests performed indicated five significant relationships (Table 6-5). Firstly,
significant relationships are found between APP rating and TWR rating and years
of operational experience (i.e. years in service). In the sample surveyed, controllers
with APP rating have more operational experience compared to those without this
rating. Similarly, controllers with TWR rating have more operational experience
compared to those without it. Secondly, a significant relationship is identified between
other factors that influence recovery performance and ACC rating. The data indicate that
controllers with ACC rating tend to rely upon other factors (e.g. past experience) more
than those without ACC rating. This is expected as controllers with ACC rating in the
available sample have more operational experience than those without ACC rating.
Thirdly, a significant relationship is identified between controller reliance on situation-
specific problem solving and other factors (e.g. past experience) when recovering from
equipment failures. This is expected as past experience represents one of the factors
that define the situation (context) surrounding an equipment failure. Finally, a
significant relationship is identified between controller reliance on other factors (e.g.
past experience) when recovering from equipment failures and management supported
exchange of information regarding equipment failures (Table 6-5). It may be the case
that controllers account for exchange of information regarding equipment failures as a
type of past experience.
⁶ Relationship between the other factors that influence recovery performance and ACC rating.
On the other hand, no relationship is identified between the factors that influence
the recovery process and operational experience (i.e. number of years active as a
controller). Although it was expected that less experienced controllers may rely more
on written procedures and that more experienced controllers may rely more on past
experience, statistical testing did not support these expectations. Years in service do
not differentiate between reliance upon a written procedure, context, or other factors
(e.g. past experience). It may be the case that the overall safety culture built in the ATC
Centre determines what a controller may use as the main resource in recovering from
equipment failures. For example, if procedures are not available, controllers will rely more on
situation-specific problem solving. This decision would thus be based on
organisational issues more than on their own experience.
6.8 Summary
This Chapter has discussed in detail the questionnaire survey that sampled 134
controllers in 58 ATC Centres from 34 countries. The survey was designed to achieve
four main objectives. Firstly, to build on the literature review to further investigate
equipment failures and factors that influence controller recovery by introducing
operational experience. Secondly, to support the information obtained from operational
failure reports (as presented in Chapter 4), which lacked input on controller
recovery. Thirdly, to assess the status and quality of recovery procedures and training
in the sampled set of ATC Centres. Finally, to contribute to the wider human reliability
research with a particular focus on controller recovery from equipment failures.
The results of the analyses conducted on the data consist of several interesting
findings. These are structured around six key questions that this survey addresses.
• How often do controllers experience equipment failures (Q1)?
Almost 95 percent of controllers surveyed experienced ATC equipment failure in their
operational career. The investigation of frequency of failures per year revealed that
major failures tend to occur only once a year or once in two years, while less severe
failures tend to occur with a relatively high frequency. These findings are in line with the
results obtained from operational failure reports and their categorisation based on
severity (presented in Chapter 4).
• What factors influence their recovery performance (Q2)?
Investigation of the factors that mostly influence controller’s recovery performance
has revealed that factors other than written procedures and situation-specific problem
solving have the greatest impact, e.g. past experience. However, differences
between these ‘other’ factors (e.g. past experience) compared to written procedures
and situation-specific problem solving are not large, i.e. the controllers rated the
importance of all listed factors similarly.
• What is the most unreliable ATC equipment (Q3)?
Investigation of the most unreliable ATC equipment, based upon the experiences of the
controllers surveyed, has shown a match with the results obtained from the analyses of
operational failure reports (as presented in Chapter 4). The most affected ATC
functionalities are communication, surveillance, and data processing. The most
unreliable ATC equipment incorporates air-ground and ground-ground communication,
radar coverage, and the flight data processing system. These findings, together with
those from Chapter 4, led to the selection of the equipment failure to be simulated in
the experiment presented in Chapter 9 (i.e. the flight data processing system failure).
• Is there any organised exchange of information on equipment failures and/or other
types of unusual/emergency situations (Q4)?
The organised exchange of information on equipment failures represents an ‘indirect’
experience and a learning opportunity. Through presentations, seminars, and safety
bulletins, the controllers could be presented with failure types, contextual conditions
surrounding the failure, and the difficulties experienced by their colleagues in
handling the situation. However, in the sample obtained, almost half of the controllers
did not have this kind of information exchange organised in their ATC Centres.
• Do recovery procedures exist (Q5)?
Assessment of the existence and quality of recovery procedures shows that the
majority of sampled ATC Centres have some type of recovery procedure in place,
mostly for radar failure, communication failure, and flight data processing system
failure. The analyses also show that most of these procedures are kept up-to-date but
not always complete. Therefore, additional emphasis should be placed on the revision
of existing procedures to assure that the recovery steps presented are complete and
that these follow a logical order. However, attention should be paid to the trade-off
between the thoroughness of the procedure and limited time available to perform all
prescribed steps and thus to recover. An example of a concise check-list type recovery
procedure developed in this thesis for a specific European ATC Centre is presented in
Appendix III. It is based on a format used previously by the German air traffic service
provider (DFS), accepted and published by EUROCONTROL (2003f).
• What do controllers feel about the quality of training currently available for recovery
from equipment failures (Q6)?
Assessment of the existence and quality of training for recovery shows that only half of
the ATC Centres surveyed have established training for recovery from equipment
failures. The situation with recurrent training is even worse as only 36 percent of ATC
Centres surveyed organise regular recurrent training. In most cases, recurrent training
is provided only once a year, while in nine ATC Centres it is provided twice a year. On
the other hand, controllers support the idea of very frequent recurrent training. Almost
half of the respondents (i.e. 45 percent) feel an annual training session for a couple of
hours is simply not enough to keep them proficient and ready to deal with unexpected
equipment failures.
The process of identification of factors that affect controller recovery started in the
previous Chapter by an overall assessment of past research relevant to controller
recovery. It has continued in this Chapter by expanding these findings with the
questionnaire survey results and operational experience of controllers worldwide.
Based on these findings, the next Chapter finalises this rigorous process by identifying
factors that affect controller recovery, referred to as ‘Recovery Influencing Factors’
(RIFs).
7 Methodology for a Selection of Relevant Air Traffic Controller Recovery Influencing Factors
This Chapter builds on the findings from past research of relevance to controller
recovery (Chapter 5) further augmented by the operational experience extracted from
the questionnaire survey (Chapter 6) to realise a detailed understanding of the context
that surrounds a controller during the occurrence of an unexpected equipment failure.
The Chapter starts by illustrating the importance of the impact that contextual factors
have on controller recovery from equipment failures in Air Traffic Control (ATC). It
reviews both Air Traffic Management (ATM) and non-ATM related Human Reliability
Assessment (HRA) techniques to assure a comprehensive investigation of contextual
factors relevant to controller recovery from equipment failures in ATC. This initial
selection is augmented by the findings from the equipment reliability literature,
operational failure reports, human reliability research, and interviews with ATM
specialists. The Chapter concludes by identifying a set of relevant contextual factors,
referred to as ‘Recovery Influencing Factors’ (RIFs), and their qualitative descriptors or
the levels of their influence on controller recovery performance.
7.1 Relevance of the recovery context
Analyses of accident investigations in various industries (e.g. aviation, nuclear and
chemical) have revealed that it is not possible to gain a full understanding of the
cause(s) of an accident from factual data alone. For example, the US National
Transportation Safety Board (NTSB) conducted dozens of detailed accident
investigations in which the teams of experts managed to assess different contributory
factors and identified various issues with task design, procedures, culture
(mostly relevant to language barriers within pilot-controller communication), personal
factors (e.g. a shift in attention in the 1972 L-1011 accident in the Everglades; NTSB, 1973), and
weather (e.g. the Pan Am Flight 759 accident was due to a thunderstorm and wind shear;
NTSB, 1983). Such factors can help explain why errors occur. Additionally, the
description of the context may also serve as a basis for defining ways of preventing or
reducing specific types of erroneous actions by means of technical recovery (i.e. built-
in defences) and human recovery.
It is also necessary to take into consideration contextual factors that traditionally may
not be recorded by investigating bodies, but which can have a significant impact on the
outcome of an accident. In support of this, Dekker et al. (2004) note that it is
“necessary to capture both a situation in which the action takes place and the action
itself”. Similar arguments were presented by researchers at the National Aeronautics
and Space Administration (NASA) Ames Research Center, who pointed out that “we
must move beyond trying to pin the blame for accidents on a culprit but seek instead to
understand the systemic causes underlying the outcomes” (cited in Cox, 2005). The
research presented in this thesis expands the analysis of equipment-related incidents
to include the context in which controller recovery unfolds. Therefore, the objective of
this Chapter is to determine the relevant contextual factors that affect the process of
controller recovery from equipment failures in ATC.
In Air Traffic Management (ATM), the contextual factors relevant to controllers are
defined as “internal or external factors which influence the controller’s performance of
ATM tasks” (EUROCONTROL, 2002b). It is notable that this definition is generic and
thus does not give an indication as to when it is appropriate to stop looking further for
contextual factors. The so-called ‘stopping rule’ is taken to be directly linked to the
overall investigation process, where assessment of contextual factors represents only
one segment of that process. In other words, it is the role of the investigator to
determine the chain of events that constitute a safety-relevant occurrence. In this
respect, the analysis of contextual factors should cover the entire chain and assess the
relevant context for each link in the chain. The research presented in this thesis adapts
the EUROCONTROL definition of contextual factors. Hence, the contextual factors in
this research or ‘Recovery Influencing Factors’ (RIFs) are defined as internal or
external factors that influence the controller’s recovery from unexpected equipment
failures in ATC.
The factors extracted from the various techniques are known in the HRA literature as
Contextual Conditions – CCs (EUROCONTROL, 2002b), Performance Shaping
Factors – PSFs (Shorrock, 1992; Shorrock and Kirwan, 2002; EUROCONTROL, 2004e;
THEMES, 2001; Swain and Guttman, 1983), Error Producing Conditions – EPCs
(EUROCONTROL, 2004d; Williams, 1986), Common Performance Modes – CPMs
(Hollnagel, 1993), Common Performance Conditions – CPCs (Hollnagel, 1998), or
Recovery Influencing Factors – RIFs (Kanse and van der Schaaf, 2000).
However, not all contextual factors are appropriate to describe the context around
recovery from equipment failures. This is because, firstly, many factors have been listed
and recognised as generic factors without a good understanding of their influence
specifically on the recovery process. Secondly, many of the existing contextual factors
are derived from the nuclear and process industries. Such factors are not always
transferable to the highly dynamic and time-dependent ATC environment. Thirdly,
some of the past research was based on models of human performance that are not
representative of specific ATC tasks.
It should be noted that the research presented in this thesis does not rely exclusively
on any particular model of human information processing. Instead, it simply assesses
the importance of the recovery context and aims to derive a set of contextual factors
that best determines the controller recovery performance. The following section
presents two equipment failure incidents to highlight the importance of the context in
which controller recovery takes place.
7.1.1 Examples of the recovery context
Two real examples taken from an incident database of a Civil Aviation Authority (CAA)
are presented below to illustrate the relationship between failure, recovery, and
contextual factors. Because of their confidential nature, the examples are de-identified.
Although brief in the description of equipment failure, the two reports identified various
contextual factors and their impact on controller performance.
The first report contained the following: “At 2230 advice was received that there would
be a load test performed on the electrical system which would involve changing from
mains power supply to generators. Assurance was received that there would be no risk
of service interruption. Shortly after the power changeover two XX consoles crashed
followed by the remaining two. The Voice Switching Communication System (VSCS)
also failed as did the wall clock adjacent to the XX area. At the same time the simulator
also failed.” It was subsequently established that the root cause of the reported failure
had been within the ATC organisation which did not set up appropriate maintenance
procedures on the ‘live’ ATC system (i.e. organisational factor). Additionally, this report
highlighted the relevance of other contextual factors such as: the number of
workstations/sectors affected (i.e. loss of four workstations and the simulation platform),
time course of failure development (i.e. sudden failure), and complexity of failure type
(i.e. multiple failure: several workstations, clock, and simulation platform affected).
The second report contained the following: “The loss of radar display and VSCS at a
time of moderate traffic (approximately 10 aircraft on frequency) created substantial
workload on the controller. Thankfully, there were two controllers in the near vicinity
who were able to assist with a transition to a nearby controller working position and to
help maintain situational awareness and communications with the various aircraft via
air-ground (AG) bypass.” This report highlighted the impact of traffic complexity at the
moment of failure occurrence (i.e. ten aircraft in simultaneous communication with the
controller), personal factors (i.e. substantial workload), communication for recovery
within a team (i.e. assistance with handling the traffic and maintaining traffic awareness
in spite of the loss of all critical systems: visual representation of traffic on display and
direct communication with relevant aircraft), adequacy of organisation (i.e. availability
of additional support), number of workstations affected (i.e. one workstation), and
complexity of failure type (i.e. multiple systems affected: radar display and
communication system).
The two brief cases above taken from an incident database illustrate the important
relationship between failure, recovery, and relevant contextual factors. In other words,
these equipment failure examples have shown that the context in which human
performance takes place is important in understanding human reliability. Although the
examples do not convey the complete picture of the occurrence of equipment failure
(e.g. the first example makes no mention of personal issues or weather), several
contextual factors have been captured. As a result, research on controller recovery
from equipment failures in ATC requires a precise definition of the context surrounding
any failure type. In order to achieve this objective, it is necessary to review the specific
contextual factors defined in various HRA techniques. This is used together with
information from equipment reliability literature to identify the ‘Recovery Influencing
Factors’ (RIFs).
7.2 Methodology to extract the candidate set of contextual factors
In order to determine a candidate set of contextual factors relevant to controller
recovery from ATC equipment failures, it is necessary to start with a review of
contextual factors as identified in the most relevant current HRA techniques (i.e. ATM-
specific HRA techniques). It is important to highlight that this overview is not focused
on human error per se or the underlying human information processing theory. The
literature on human error has been used simply to investigate the relevant factors that
influence the human performance in unusual/unexpected events (i.e. contextual
factors). As a result, human information processing theories used in assessed HRA
techniques are outside the scope of this thesis.
It is also important to note that although there are currently three HRA techniques used
in the ATM sector, the review presented here has also considered other HRA
approaches employed in other domains to assure a complete set of RIFs. Furthermore,
a review of relevant equipment-failure characteristics and dynamic situational factors
has been conducted in order to augment the results from the review of the HRA
techniques. This is to ensure a complete and reliable determination of the RIFs. The
RIFs are then verified by interviews with ATM specialists. Figure 7-1 presents the
methodology used in this thesis to extract a candidate set of contextual factors relevant
to controller recovery from ATC equipment failures.
[Figure 7-1 (flowchart): the review of ATM-related HRA techniques produces an output and
identified gaps; the gaps are addressed in turn by augmentation with findings from other
HRA techniques, with equipment-failure related characteristics, and with dynamic
situational factors. The selected RIFs are then verified by two ATM specialists.]
Figure 7-1 Methodology to extract a candidate set of RIFs
7.2.1 Human reliability assessment techniques
The methodology for the selection of contextual factors relevant to controller recovery
starts with a review of contextual factors as identified in the most relevant current HRA
techniques.
7.2.1.1 Human Error in ATM (HERA)
The HERA project represents the most recent approach for the analysis of human error
in the ATM domain. It evolved as a result of European and US initiatives (see footnote 1)
to produce a distinctive HRA tool. HERA is based on an extensive literature review and the
operational involvement of air traffic controllers, incident investigators, and safety
managers. The HERA project developed an initial set of Contextual Conditions (CCs) for
ATM based on UK incident reports, discussions with controllers, and the extensive
literature on human factors
(EUROCONTROL, 2002b; EUROCONTROL, 2003d; EUROCONTROL, 2003e;
EUROCONTROL, 2004d). HERA uses eleven groups of CCs
to define context: pilot-controller communications, pilot actions, traffic & airspace,
weather, documentation & procedures, training & experience, workplace design & HMI,
environment, personal factors, team factors, and organisational factors. Each of the CC
groups is further sub-divided, resulting in more than 200 contextual factors. HERA
recommends that CCs should be applied individually to each error that occurred during
an incident, rather than just once for the entire incident. This supports the concept
presented in this thesis that analysis of contextual factors should cover the entire chain
of events leading to an incident. Thus it should assess contextual factors relevant for
each link in that chain (see section 7.1).
The majority of contextual factors defined in HERA are relevant to controller recovery
from equipment failures in ATC. Thus, the HERA technique represents a good starting
point for compiling a list of RIFs. For example, severe weather conditions can degrade
controller performance by adding workload to the already complex recovery
task. As such, weather should be incorporated in the list of RIFs.
There are also some factors defined in HERA that are not applicable to recovery
from equipment failure in ATC. For example, pilot actions are relevant to ATM but not
ATC. Therefore, this particular factor will be excluded from the final choice of RIFs.
Footnote 1: The US Federal Aviation Administration (FAA) developed the Human Factors
Analysis and Classification System (HFACS) tool.
Additionally, pilot-controller communication is not relevant in the immediate event of
equipment failure. Although not addressed in this thesis, there are circumstances in which
pilot actions are of importance, such as in the case of a major failure or when
unplanned or erroneous pilot actions increase controller workload. More
important for efficient recovery is the communication within a team of controllers.
In this respect, communication (for recovery) and team factors
could be combined into one factor, since the entire team interaction takes place
through the communication for recovery. Only in the event of severe equipment failure
(i.e. a failure that adversely affects the availability of an Air Traffic Service (ATS) over a
significant period) is a controller obliged to inform all traffic (i.e. pilots) in the affected
airspace of a reduced level of ATS. Finally, there is a tendency to exclude
environmental issues when looking at more specific events, such as equipment failure,
on the basis that controllers are familiar with working in a specific ATC Centre. This is
discussed further in section 7.2.1.3.
7.2.1.2 Technique for the Retrospective and Predictive Analysis of Cognitive Errors in ATC (TRACEr)
This approach was developed by the UK National Air Traffic Services (NATS) to gain a
better understanding of controller error. It is a model-based approach, which performs
both a retrospective and a prospective analysis. The original version of TRACEr
contains eight different taxonomies, one of which describes context (Shorrock, 1992;
Shorrock and Kirwan, 2002). The CC groups derived in HERA were based largely on
the context defined in TRACEr. The TRACEr technique uses the Performance Shaping
Factors (PSF) taxonomy and “classifies factors that have influenced or could influence
controller performance, aggravating the occurrence of errors, or perhaps assisting error
recovery” (Shorrock and Kirwan, 2002). Thus, it can be concluded that TRACEr defines
context in a similar way to HERA, i.e. by defining relevant groups of PSFs. As with
HERA, each PSF group is further sub-divided, resulting in approximately 60 PSFs in
the TRACEr Light version. The PSF groups recognised by TRACEr are: traffic and
airspace (e.g. traffic complexity), pilot/controller communications (e.g. RT workload),
procedures (e.g. accuracy), training and experience (e.g. task familiarity), workplace
design, HMI and equipment factors (e.g. radar display), ambient environment (e.g.
noise), personal factors (e.g. alertness/fatigue), social and team factors (e.g.
handover/takeover), and organisational factors (e.g. conditions of work).
The main difference between TRACEr and HERA is that the former does not include
pilot actions and weather (see Appendix VII). Thus, no additional candidate factors
could be extracted from TRACEr.
7.2.1.3 Recovery from Automation Failure (RAFT) Tool
As previously discussed in Chapter 5, this tool has been developed as a part of the
“Solutions for the Human-Automation Partnerships in European ATM (SHAPE)” project,
managed by the Human Factors Division of EUROCONTROL. The SHAPE project
defines context as “any aspect of the operating environment that can influence a failure
or recovery process” (EUROCONTROL, 2004e). The project focused on the contextual
factors affecting recovery, which is in line with the objective of this thesis. The relevant
contextual factors or PSF categories recognised in RAFT are: task load and system
complexity, pilot-controller communication, procedures and documentation, training
and experience, human-machine interaction, personal factors, social and team factors,
logistical factors, and other organisational factors.
A review of the RAFT PSFs shows that ‘task load and system complexity’ represents the
workload facing the controller as a result of task performance and overall system
complexity. Therefore, this factor has the potential to be included as a RIF. Compared to
HERA, RAFT disregards ‘pilot action’, ‘weather’, and ‘environment’ as relevant
contextual factors for human recovery from equipment failure in ATC. Whilst pilot
actions do not have much impact as explained in section 7.2.1.1, weather can bring
additional complexity to the occurrence of equipment failure. At the same time, RAFT
includes a ‘new’ category called ‘logistical factors’, which includes maintenance and
staffing issues.
Environmental issues (e.g. noise, temperature, and lighting) are excluded from the
candidate RIFs because controllers become accustomed to the ambient characteristics of
the specific ATC Centre in which they work. Logistical factors, on the other hand, will be
assigned to the existing organisational factors category, because staffing and
maintenance issues should be anticipated and pre-planned at the organisational or
managerial level (e.g. maintenance scheduling, availability and assignment of
personnel, stock of equipment and spare parts, on-the-job training aids). The
management of any ATC Centre should anticipate unscheduled technical disturbances
as far as possible and provide the necessary defences for their prevention.
The three techniques above (HERA, TRACEr, and the SHAPE/RAFT tool) were developed
specifically for the ATC/ATM environment. In general, they define context and
contextual factors in a way similar to that adopted in this thesis. The assessment of
these three techniques identifies a total of nine candidate RIFs. These are: communication,
traffic and airspace, weather, procedures, training and experience, HMI, personal,
organisational factors, and task complexity.
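
The screening described in sections 7.2.1.1 to 7.2.1.3 can be sketched as a small set of list operations. The sketch below is only one possible reading of the text, not a procedure stated in the thesis: the shortened labels, the treatment of pilot-controller communications and team factors as a single 'communication' factor, and the function name screen_candidates are assumptions made for illustration.

```python
# Contextual-condition groups listed for HERA in section 7.2.1.1.
HERA_GROUPS = [
    "pilot-controller communications", "pilot actions", "traffic and airspace",
    "weather", "documentation and procedures", "training and experience",
    "workplace design and HMI", "environment", "personal factors",
    "team factors", "organisational factors",
]

# Groups the text excludes for recovery from equipment failure.
EXCLUDED = {"pilot actions", "environment"}

# ASSUMPTION: shortened labels, with communication and team factors merged
# into one 'communication' (for recovery) factor as the text suggests.
RENAMED = {
    "pilot-controller communications": "communication",
    "team factors": "communication",
    "documentation and procedures": "procedures",
    "workplace design and HMI": "HMI",
    "personal factors": "personal",
    "organisational factors": "organisational",
}

# Added from RAFT; RAFT's logistical factors are folded into 'organisational'.
ADDED = ["task complexity"]


def screen_candidates(groups, excluded, renamed, added):
    """Drop excluded groups, apply the label mapping, then append new factors."""
    candidates = []
    for group in groups:
        if group in excluded:
            continue
        label = renamed.get(group, group)
        if label not in candidates:  # merged groups appear only once
            candidates.append(label)
    for factor in added:
        if factor not in candidates:
            candidates.append(factor)
    return candidates


candidate_rifs = screen_candidates(HERA_GROUPS, EXCLUDED, RENAMED, ADDED)
print(len(candidate_rifs), candidate_rifs)
```

Under these assumptions the sketch arrives at the nine candidate RIFs named above; a different grouping of the communication and team factors would change the labels but not the logic.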
Whilst the review of ATM-related HRA techniques gives many relevant contextual
factors, it is worth examining relevant non-ATM HRA techniques to investigate whether other
factors exist. The following sections provide an insight into the relevant findings.
7.2.1.4 Recovery from failures: understanding the positive role of human operators during incidents
This research attempted to emphasise the positive role of human operators in the
overall system performance. In addition, it proposed a preliminary failure compensation
process model (or recovery model) derived initially for the chemical process industry.
Furthermore, the importance of a taxonomy used to describe the factors influencing
recovery was recognised. Based on the experience gained from field studies and the
relevant literature, Kanse and van der Schaaf (2000) developed a list of RIFs. In their
research the recovery factors were defined as factors that contribute to human
recovery performance once an error or failure has occurred. This definition
corresponds to the definition of RIFs adopted in this thesis. A categorisation into six
groups of RIFs adopted by Kanse and van der Schaaf (2000) from the power plant
industry is presented in Table 7-1.
Table 7-1 Factors influencing recovery from failures (from Kanse and van der Schaaf, 2000)
Categories of factors | Recovery Influencing Factors
Prioritisation of recovery-related tasks | Time available for recovery task, considering other tasks requiring attention; Urgency of recovery (amount of time until negative consequences arise); Importance of or need for recovery (seriousness of possible consequences if not recovered)
Occurrence-related | Type(s) of preceding failures; Performance phase in which the immediate result of the failure process is detected (during the planning phase/while carrying out the action/when the outcome of the action is observable); Available and applicable barriers/defences
Human (person) related | Overall work area knowledge; Work area and process related skills; General competency in job; Time elapsed since last (re)training in work area; Time since last (re)training with regard to specific problem occurrence; Suspicion/distrust/intuition; Personal attitude toward failure and failure compensation; System failure coping strategies; Self-efficacy (trust in own ability), self esteem; Fatigue; Shift work coping ability; Feeling of personal responsibility for the failure or problem; Feeling of personal responsibility with regard to recovery; Pride regarding job well done; Previous experience with failures (any type); Previous experience with this failure (any type)
Social | Team attitude toward failures and failure compensation; Attitude toward teamwork; Team efficacy; Feeling of team responsibility for the failure or problem; Feeling of team responsibility with regard to recovery
Organisational | Availability of team members/colleagues; Organisation of work and responsibilities; Training plan; Competency assessment plan; Supervision; Personnel selection processes; Availability, quality and usability of procedures/instructions; Shift patterns and personnel planning; Organisational policy; Management attitudes towards failures & failure compensation
Technical/workplace/situational | Availability of equipment/materials needed; Operator-process interface properties
The majority of the identified factors are relevant to equipment failures in ATC and
should be considered as potential RIFs. For example, ‘available and applicable
barriers/defences’ are important with respect to detection, diagnosis, and correction of
equipment failure. Time pressure is recognised under the ‘prioritisation of
recovery-related tasks’. Equipment failures in ATC are unexpected events which degrade the
ATC service offered, yet controllers are still required to provide a service that
ensures a safe flow of traffic. As a result, controller workload increases rapidly,
potentially compromising controller performance. Therefore, time pressure should be
analysed for potential inclusion in the RIFs. Occurrence-related factors are mostly
applicable to the power plant environment and as such cannot be directly applied to
ATC. However, if adapted to the characteristics of the ATC environment, these
factors may be relevant to equipment failure occurrence.
7.2.1.5 Computerised Operator Reliability and Error Database (CORE-DATA)
The CORE-DATA database was developed at the University of Birmingham to assist
UK personnel involved in the assessment of hazardous systems such as nuclear,
chemical, and offshore systems (Kirwan, Basra, and Taylor-Adams, 1997;
EUROCONTROL, 2002b; EUROCONTROL, 2004d). It represents an attempt to
develop a systematic approach to recording human errors. Several sources of data are
used to populate the database, including real operating experience (incident and
accident reports), simulation (both training and experimental simulators), experiments
(from literature on performance), expert judgment (e.g. as used in risk assessments),
and synthetic data (from human reliability quantification techniques). According to
EUROCONTROL (2002b), CORE-DATA contains approximately four hundred data
records describing particular errors that have occurred, together with their causes, error
mechanisms, and their probabilities of occurrence. PSFs are defined in CORE-DATA
as underlying causes which influence human performance and indicate how the human
error occurred. CORE-DATA’s PSF taxonomy consists of alarms, communication,
ergonomic design, ambiguous HMI, HMI feedback, labels, lack of supervision/checks,
procedures, refresher training, stress, task complexity, task criticality, task novelty, time
pressure, training, and workload.
There are a number of factors here of potential relevance to ATC and controller
recovery. Firstly, alarms should be considered as a particular type of technical built-in
defence (discussed in Chapter 4) and are therefore important with respect to detection,
diagnosis, and correction of equipment failure. This is also in accordance with the work
done by Kanse and van der Schaaf (2000) as explained in the previous section. Hence
‘alarm’ should be considered as a potential RIF. Secondly, task novelty or task
familiarity in the case of equipment failures in ATC should be considered under the
training and experience RIF. Thirdly, time pressure has also been recognised in the
work done by Kanse and van der Schaaf (2000) under the ‘prioritisation of
recovery-related tasks’. Therefore, this factor should be analysed for inclusion in the RIFs.
7.2.1.6 Technique for Human Error Rate Prediction (THERP)
The THERP technique was developed by Alan Swain at Sandia National Laboratories
in the 1950s (Swain and Guttman, 1983; Straeter, 2000). The THERP technique
assumes that human information processing can be influenced by error conditions
(Performance Shaping Factors, PSFs). THERP subdivides all PSFs into internal PSFs,
external PSFs, and those that act as physiological and psychological stressors. However, the
ways in which PSFs act on human performance are not explicitly specified.
Furthermore, THERP sub-divides external PSFs into situational factors, task factors,
and task instructions. Internal factors are defined as factors related to the organism (i.e.
human factors). The PSFs recognised in THERP are presented in Table 7-2.
Table 7-2 Factors influencing human actions in THERP (cited in Straeter, 2000)
Category | Factors influencing human actions
External Performance Shaping Factors
Situational factors | Design features; Quality of environment; Temperature, air humidity, air quality, radiation exposure, illumination, noise, vibration, cleanliness; Working hours; Breaks; Availability of special work resources; Job manning; Organisational structure (authority, responsibility, channels of communication); Actions by shift leader, worker, manager, supervisory authority; Remuneration structure (recognition, payment)
Factors in tasks and work resources | Requirements for perception; Requirements for motor system (speed, power expenditure, accuracy); Relationship between operators and display; Requirements for adaptation; Interpretation; Decision making; Complexity (information loading); Narrow nature of task; Short term and long term memory; Calculations; Feedback (knowledge regarding results of an action); Dynamic of gradual actions; Group structure and communications; Man-machine factors; Interface (design of work resources, test instruments, maintenance equipment, work aids, tools, accessories)
Work and task instructions | Required procedures (written, non-written); Written and verbal communication; Warnings and danger signs; Work-methods; Plant policy
Stressors
Psychological stressors | Suddenness of occurrence; Duration of stress; Task speed; Task load; High hazard risks; Threats (fear of failure, loss of job); Monotony, degrading or meaningless activities; Duration of uneventful periods of alertness; Work performance motive conflicts; Reinforcement of missing or negative sensory deprivation; Detractors (noise, blinding, motion, flickering, coloration); Inconsistent labelling
Physiological stressors | Duration of stress; Fatigue; Pain or discomfort; Hunger or thirst; Extreme temperatures; Radiation; Extreme gravitational forces; Extreme pressure conditions; Inadequate oxygen supply; Vibration; Restricted movements; Absence of physical exercise; Interruption of circadian rhythm
Internal Performance Shaping Factors
Factors relating to the organism (i.e. human factors) | Prior training, experience; State of momentary practice or abilities; Personality and intelligence variables; Motivation and attitudes; Emotional states; Stress (mental or physical); Knowledge about demanded performance prerequisites; Gender differences; Physical conditions; Attitudes deriving from family or groups; Group dynamic processes
A review of the contextual factors relevant to THERP reveals that most can be
allocated to the RIFs identified by the first three ATM-related techniques. Several other
factors, such as decision making and short-term and long-term memory (external PSFs),
may be categorised as personal factors. These factors may become increasingly
important within the planned modernisation of ATM (i.e. datalink, electronic strips, or a
‘stripless’ environment). Finally, the suddenness of occurrence factor identified in
THERP cannot be categorised within the existing RIF groups. This factor is relevant
to the occurrence of equipment failure in the ATC environment as it greatly affects
detection by the controller. Hence, it should be treated as an additional potential RIF.
7.2.1.7 Human Error Assessment and Reduction Technique (HEART)
The HEART technique was developed by Jeremy Williams, a British ergonomist, in
1985. The review of this technique is available in EUROCONTROL (2004d) and
Williams (1986). It is one of the most popular human error quantification techniques
due to its ease of implementation and is still used extensively in the nuclear, chemical,
petrochemical, railway, and defence industries.
HEART was derived from a wide range of findings in the ergonomics literature. The
technique defines a set of generic error probabilities for the tasks considered, and
identifies the Error Producing Conditions (EPCs) associated with these. EPCs include
particular ergonomic, task (e.g. inactivity, repetitious, or low mental workload tasks,
additional team members necessary to perform task normally), and environmental
factors that could each have a negative effect on human performance. In other words,
the definition of contextual factors or EPCs emphasises purely their negative impact on
human performance. The extent to which each EPC affects performance is
quantified and the human error probability is calculated as a function of the precise
effect of each EPC on a particular task. HEART assumes that basic human reliability is
dependent upon the generic nature of the task to be performed and that under nominal
conditions this level of reliability will tend to be consistent (Williams, 1986).
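
The calculation outlined above is not spelled out in this section. As commonly described for HEART, the nominal error probability of the generic task is multiplied, for each applicable EPC, by an assessed effect equal to (maximum multiplier - 1) x assessed proportion + 1. The sketch below illustrates that arithmetic only; the function names and all numerical inputs are hypothetical and are not taken from this thesis.

```python
def assessed_effect(max_multiplier, proportion):
    """Scale an EPC's maximum multiplier by the assessed proportion (0 to 1)."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return (max_multiplier - 1.0) * proportion + 1.0


def heart_hep(generic_task_hep, epcs):
    """Return the human error probability for one task.

    generic_task_hep: nominal error probability of the generic task type.
    epcs: iterable of (max_multiplier, proportion) pairs, one per EPC.
    The result is capped at 1.0 because it is a probability.
    """
    hep = generic_task_hep
    for max_multiplier, proportion in epcs:
        hep *= assessed_effect(max_multiplier, proportion)
    return min(hep, 1.0)


# Illustrative inputs only (hypothetical values, not thesis data):
# a nominal probability of 0.003 and two EPCs judged to apply partially.
example_hep = heart_hep(0.003, [(17.0, 0.4), (11.0, 0.2)])
print(round(example_hep, 4))
```

An EPC judged not to apply (proportion 0) leaves the nominal probability unchanged, while one judged to apply fully contributes its whole multiplier, which reflects the purely negative role of EPCs noted above.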
This technique identified 38 different EPCs. These can be
categorised into two groups: those directly transferable to ATC and those that are not.
The EPCs relevant to ATC can be further sub-divided into those that fit within existing
RIF categories and those that do not. The former are, for example, ‘unfamiliarity with a
situation which is potentially important but which only occurs infrequently or which is
new’, ‘a shortage of time available for error detection and correction’, and ‘a channel
capacity overload’. The EPC concerned with ‘unfamiliarity with a situation’ may be
captured through two RIFs, i.e. training and experience. Unusual or emergency
situations (such as ATC equipment failures) are rare but highly demanding events that
require an efficient and effective response from each controller. Regular and
comprehensive training plays a key role in building the skills and experience
necessary to cope with such unusual situations. ‘Shortage of time available’ has
already been discussed and recommended for inclusion as a candidate RIF (see
section 7.2.1.5). Finally, ‘channel capacity overload’ is a term used for the workload
caused by simultaneous presentation of critical information to the human operator. As
such, it can be classified under personal factors.
The EPCs not relevant to ATC include several factors. For example, the category
‘mismatch between the educational level and the requirements of the task’ is not
applicable to controllers. The level of education and training for an ATC licence is
standardised and reflects the knowledge controllers should acquire. Furthermore, the
category ‘an incentive to use more dangerous procedures’ is also not applicable to
ATC as ‘dangerous’ procedures or working practices are direct violations of the rules.
7.2.1.8 The Contextual Control Model (COCOM)
The COCOM model, developed by Hollnagel (1993), describes how human
performance is dynamically determined by the current context, as an alternative to the
common information processing models. This is a generic HRA approach not related to
any specific industry.
COCOM represents a control model of cognition focusing on two important aspects:
the conditions under which a person changes from one mode to another and the
characteristics of human performance in a given mode. COCOM recognises four
control modes: scrambled, opportunistic, tactical, and strategic. According to this
approach human actions are determined by the context as well as specific
characteristics and mechanisms of human cognition. In Hollnagel’s view, humans do
not passively react to events; they actively look for information and act based on
intentions as well as external developments. Therefore, it was concluded that human
actions are only meaningful when considered in the appropriate context.
In this regard, COCOM defines Common Performance Modes (CPMs) as the conditions
under which human performance takes place. Hollnagel (1993) divides them into
CPMs that may increase and CPMs that may reduce human reliability. The former include sufficient
available time, available plans, adequate Man Machine Interface (MMI) and support,
few simultaneous goals, normal/familiar process state, and adequate organisation. The
CPMs that may reduce reliability include insufficient available time, plans not available,
inadequate MMI and support, many simultaneous goals, abnormal process state, and
inadequate organisation.
According to Hollnagel (1993), the objective is not to find a precise probability of a
specific action but rather to identify the specific steps that are particularly prone to
produce hazardous consequences. This knowledge can then be used to change the
design of the system, to introduce specific measures of compensation, and to construct
defences and recovery options. Generally, the objective of the recovery performance
assessment should be to identify the context that is likely to result in an inadequate
recovery performance. The characteristics of the context resulting in an inadequate
recovery performance would be used to define the necessary changes to the ATC
system/component design (e.g. technical defences, recovery procedures and training).
This should allow the whole ATC system to be safer and more reliable.
The COCOM technique was subsequently used in the development of another method
discussed in the next section. Therefore, the final choice of potential RIFs from
both techniques is discussed within the next section.
7.2.1.9 Cognitive Reliability and Error Analysis Method (CREAM)
The CREAM methodology represents a further development of the COCOM model that
deals with the duality of competence and control in human cognition (Hollnagel, 1998).
Basing the work on COCOM’s model of cognition and four distinctive control modes,
CREAM represents a practical approach for both human performance analysis (i.e.
retrospective analysis) and performance prediction. The method is cyclical rather than
sequential and has well-defined conditions that identify when an analysis should end.
Similar to COCOM, CREAM represents a generic approach not related to any specific
industry.
Drawing on past research (i.e. the THERP technique), Hollnagel (1998) attempts a more
structured approach in which related categories of contextual factors are grouped
together. As a result, he defines a small set of Common Performance Conditions (CPCs)
that contain the general determinants of performance (i.e. common modes) including:
adequacy of organisation, working conditions, adequacy of MMI and operational
support, availability of procedures/plans, number of simultaneous goals, available time,
time of day (circadian rhythm), adequacy of training and experience, and crew
collaboration quality. The proposed CPCs were intended to have a minimal degree of
overlap, although they are not independent.
Hollnagel (1998) argues that there is a significant similarity between PSFs and CPCs.
However, the difference lies in the scope of these factors. Similar to the CPMs in the
previous COCOM technique, CPC categories are more generic conditions, designed
to be applied in the early stage of the analysis to characterise the context for
the entire human operational task. PSFs, on the other hand, tend to be more specific
and focused on a particular stage of that task.
Hollnagel (1998) went one step further to define the levels that each CPC can take and
their corresponding effects on performance reliability (the so-called ‘typical values’ of
CPCs). These levels are based on general human factors knowledge and experience
from the HRA discipline. Hollnagel used the general principle that advantageous
performance conditions improve reliability, whereas disadvantageous conditions are
likely to reduce it. If reliability is improved, operators are expected to fail less often in
their tasks and perform better in general. He proposed an expected effect of each CPC
on performance reliability at three levels: improved, not significant, or reduced. The
advantages of this approach can be seen in the direct link between the descriptors
used for CPCs and expected effect on human performance reliability. As such, the
research presented in this thesis adopted this approach (further explained in section
7.3).
In order to determine the overall effect of the context on human performance, the
CREAM technique relies on expert judgement of the relevance of each CPC to the
particular event under investigation and of its impact on the probability of failure (no
impact, improves, reduces). The resulting score is used to determine the expected
control mode, which, as previously mentioned, is scrambled, opportunistic, tactical, or
strategic control.
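
The scoring step described above can be illustrated by tallying how many CPCs are judged to improve or to reduce performance reliability. The sketch below uses the nine CPC names listed earlier in this section; the individual judgements are invented for illustration, and the control_mode thresholds are hypothetical placeholders, since the mapping from the tally to a control mode is not reproduced in this section and should be taken from Hollnagel (1998).

```python
from collections import Counter

LEVELS = ("improved", "not significant", "reduced")


def tally_cpc_effects(assessment):
    """Return (number of 'reduced' CPCs, number of 'improved' CPCs)."""
    for cpc, effect in assessment.items():
        if effect not in LEVELS:
            raise ValueError(f"unknown effect for {cpc!r}: {effect!r}")
    counts = Counter(assessment.values())
    return counts["reduced"], counts["improved"]


def control_mode(n_reduced, n_improved):
    """HYPOTHETICAL mapping from the tally to a control mode.

    The thresholds are placeholders chosen for this sketch only; they are
    not the regions defined in CREAM.
    """
    net = n_improved - n_reduced
    if net >= 4:
        return "strategic"
    if net >= 0:
        return "tactical"
    if net >= -4:
        return "opportunistic"
    return "scrambled"


# Illustrative expert judgements (invented, not data from the thesis).
example_assessment = {
    "adequacy of organisation": "not significant",
    "working conditions": "not significant",
    "adequacy of MMI and operational support": "reduced",
    "availability of procedures/plans": "improved",
    "number of simultaneous goals": "reduced",
    "available time": "reduced",
    "time of day (circadian rhythm)": "not significant",
    "adequacy of training and experience": "improved",
    "crew collaboration quality": "not significant",
}

reduced, improved = tally_cpc_effects(example_assessment)
print(reduced, improved, control_mode(reduced, improved))
```

Keeping the tally separate from the mapping means the placeholder thresholds can be replaced by the published regions without touching the rest of the sketch.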
Taking account of the review of both the CPMs (COCOM) and CPCs (CREAM), the
majority of the factors identified are directly transferable to ATC. The exceptions are
the number of simultaneous goals and normal/familiar process state (see Appendix VII).
Regarding the number of simultaneous goals, it is important to highlight that air traffic
control implies the simultaneous processing of multiple tasks. For example, a
controller may be in radio contact with 10-20 aircraft while simultaneously performing
computer-related tasks (e.g. entering assigned altitude information, handing off flights
to another controller). High levels of multitasking are therefore an inherent
characteristic of ATC (Wickens, 1992), and as such this factor will be excluded from the
list of RIFs. The other factor (normal/familiar process state) is highly relevant to recovery
performance but has to be mapped indirectly to training and experience.
7.2.1.10 Human Reliability Management System (HRMS)
The HRMS technique was developed to derive a comprehensive and accurate
assessment of human contribution to risk in the nuclear industry, through a detailed
task and error analysis, quantification, and practical error reduction scheme. Since this
technique was too resource-intensive, it was necessary to develop an additional fast
screening technique. This ‘light’ version required a detailed approach only for those
scenarios that showed critical human involvement. This led to a subsequent
technique, the Justification of Human Error Data Information (JHEDI). Six PSFs were
identified based on the assessment of several HRA techniques (Kirwan, 1997): time,
quality of information and interface, training/expertise/experience/competence,
procedures, task organisation, and task complexity. Context is defined as the complete
task design, the working and organisational environment, and the entire history of the
task and the individual(s) performing the task. In fact, context encompasses all the
conventionally used PSFs, plus a myriad of factors, including culture, many too
microscopic and idiosyncratic, or even possibly too macroscopic and intangible, to allow
a tractable predictive analysis (Kirwan, 1997).
The HRMS approach is based on its own audit document and consists of fifty questions
as an assumed limit for an acceptable and practicable tool. The expert inputs to each
of these questions (‘yes’, ‘no’, ‘not applicable’) are used to rate each PSF on a scale from
zero to ten, where a value of zero represents a near-perfect design and ten a poor
design. As a result, a profile of PSFs is created for each task and further linked to the
known value of human error probability for that task (extracted from the available
incident database). The quantitative assessment of each new task consists of
comparing it with known tasks (and their PSF profiles) and deriving an extrapolation rule
to predict its outcome.
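The profile comparison described above can be illustrated with a minimal sketch. The source does not specify the HRMS extrapolation rule, so the rule used here (log-linear interpolation of the human error probability against the mean PSF rating) is purely an assumption, as are the PSF labels, the two reference tasks, and their probabilities, all of which are invented for illustration.

```python
import math

# Assumed PSF order: time, interface, training, procedures, organisation, complexity.
# Ratings run from 0 (near-perfect design) to 10 (poor design).
# Both reference tasks and their human error probabilities (HEPs) are made up.
KNOWN_TASKS = [
    {"profile": [1, 2, 1, 0, 2, 1], "hep": 1e-4},
    {"profile": [8, 7, 9, 6, 8, 9], "hep": 1e-1},
]


def mean_rating(profile):
    """Collapse a PSF profile into a single score (an assumed simplification)."""
    return sum(profile) / len(profile)


def extrapolate_hep(profile, known=KNOWN_TASKS):
    """Estimate the HEP of a new task from two known tasks.

    Assumed rule: log10(HEP) varies linearly with the mean PSF rating.
    This is not the HRMS rule, only a stand-in for it.
    """
    (x0, y0), (x1, y1) = sorted(
        (mean_rating(t["profile"]), math.log10(t["hep"])) for t in known
    )
    slope = (y1 - y0) / (x1 - x0)
    log_hep = y0 + slope * (mean_rating(profile) - x0)
    return min(1.0, 10 ** log_hep)


if __name__ == "__main__":
    new_task = [5, 4, 5, 3, 5, 5]  # hypothetical ratings for a new task
    print(f"Estimated HEP: {extrapolate_hep(new_task):.2e}")
```

A task whose profile lies between the two reference profiles receives an estimate between their probabilities, which is the only property this sketch is meant to show.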
Looking at the PSFs identified in HRMS and JHEDI above, it is clear that ‘time’ is an
important factor also relevant to controller recovery. The time it takes to recover from
the occurrence of an equipment failure is important in ATC due to its highly dynamic
nature and the potential for development of an unsafe situation (e.g. loss of standard
separation distance between aircraft). The other factors (e.g. quality of interface,
training, procedures) are also relevant to ATC and are already discussed for their
inclusion as potential RIFs.
7.2.1.11 A Technique for Human Event Analysis (ATHEANA)
The US Nuclear Regulatory Commission supported the development of ATHEANA as
a technique to overcome the shortcomings of the first generation HRA techniques
(Nuclear Regulatory Commission, 1998). ATHEANA is a context driven technique in
the identification and analysis of human failure events. This technique was intended to
provide a means for analysing Errors Of Commission (EOC). ATHEANA moved away
from random human errors under nominal conditions to errors which result from error-
forcing contexts. According to ATHEANA, an error-forcing context comprises two
components (i.e. plant conditions and associated PSFs) and is associated with (human)
unsafe actions. Thus, the emphasis is placed on the negative impact of context on
human performance (similar to the HEART technique). ATHEANA borrows its methodology
from HEART (see section 7.2.1.7) but incorporates various plant conditions into the
analysis. Starting from the basic scenario (i.e. nominal plant mode), various alternative
deviation scenarios were developed. The deviation scenarios include additional events
that increase the likelihood of certain error mechanisms being triggered (Nuclear
Regulatory Commission, 1998).
As in most other HRA methods, the PSFs derived for ATHEANA are broad categories
which need to be assessed for adequacy by the HRA analyst. These are: procedures,
training, communications, supervision, staffing, human-system interface, organisational
factors, stress, and environmental conditions. All these factors are relevant to controller
recovery from equipment failures in ATC and have already been discussed in the
previous sections.
7.2.1.12 Connectionism Assessment of Human Reliability (CAHR)
The CAHR technique was developed as part of a PhD dissertation and a project for the
German nuclear industry (Straeter, 2000). The objective of this dissertation was to
develop a method for evaluation of human reliability within plant events. The novelty in
this approach is that it is based on very detailed databases introduced to facilitate
international exchange of experiences on events in the nuclear industry. These
databases are: the Nuclear Computerized Library for Assessing Reactor Reliability
(NUCLARR), the Incident Reporting System (IRS), and the German special
occurrences database (BEVOR). These databases collect mandatory occurrences data
to enable international exchange of experiences on events in nuclear systems (Straeter,
2000).
The CAHR technique is based on the evaluation of the operator’s task from the incident
description and identification of interactions between various PSFs. In general, PSFs
are defined here as causes or conditions necessary for the occurrence of an error.
Straeter (2000) considered a weighting scheme for each PSF. Since the available data
sources (i.e. databases) offered a high-level event description, it was possible to move
away from a judgment based categorisation of PSFs towards a more analytical method.
Straeter (2000) determined the frequencies with which a shaping factor was observed
in connection to a human error of a certain type. However, as much as this approach
seems reasonable, it requires access to highly detailed datasets of human reliability
performance. Amongst the investigated events, Straeter (2000) determined 30
conditions under which human errors occurred. These were categorised into six groups:
• task (e.g. preparation, simplicity/complexity, precision, time pressure);
• order issue (clarity of procedures, design of procedure, content, completeness,
presence);
• person (e.g. processing, information, goal reduction);
• activity (e.g. usability of control, usability of equipment, monotony, positioning,
quality assurance, equivocation of equipment);
• feedback (e.g. arrangement of equipment, display range, accuracy of display,
labelling, marking, reliability); and
• system (e.g. technical layout, external event, construction, redundancy, coupled
equipment).
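The frequency-based step attributed to Straeter (2000) above, in which the share of errors of a given type that were accompanied by a given shaping factor is determined, can be sketched as follows. The event records, error types, and field names are hypothetical, and the sketch covers only the counting step, not the full CAHR method.

```python
from collections import Counter, defaultdict

# Hypothetical event records. Each record gives the error type observed and the
# shaping factors noted in the incident description.
EVENTS = [
    {"error": "omission", "factors": ["time pressure", "clarity of procedures"]},
    {"error": "omission", "factors": ["time pressure"]},
    {"error": "commission", "factors": ["labelling", "time pressure"]},
    {"error": "commission", "factors": ["labelling"]},
]


def factor_frequencies(events):
    """Return, per error type, the relative frequency of each shaping factor."""
    counts = defaultdict(Counter)  # error type -> factor -> number of events
    totals = Counter()             # error type -> number of events
    for event in events:
        totals[event["error"]] += 1
        for factor in set(event["factors"]):  # count each factor once per event
            counts[event["error"]][factor] += 1
    return {
        error: {factor: n / totals[error] for factor, n in factor_counts.items()}
        for error, factor_counts in counts.items()
    }


if __name__ == "__main__":
    for error, freqs in factor_frequencies(EVENTS).items():
        print(error, freqs)
```

Reliable frequencies of this kind require a large number of detailed event records, which is the data requirement noted in the text.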
The identified PSFs are applicable to recovery from equipment failures in ATC and
have already been considered for inclusion in the candidate RIFs (e.g. task, order issue
– procedures, person, activity – operational support, feedback – HMI). The last CAHR
category (i.e. system) is also relevant as a potential RIF, especially as it deals with
technical layout or system architecture and level of redundancy (as a type of built-in
technical defence). However, these factors are important from a technical point of view
since they directly determine the reliability and availability of the ATC service. The
research presented in this thesis focuses on controller recovery performance once all
redundant systems fail and affect the controller’s ability to control traffic in dedicated
airspace. As a result, more emphasis should be placed on built-in defences
transmitting information to the controller regarding the failure (e.g. alarms, alerts) since
these have an effect on the quality of the controller recovery process (for details see
Chapter 4, section 4.3.2). This also directly corresponds to findings by Kanse and van
der Schaaf (2000) reviewed in section 7.2.1.4.
7.2.1.13 Nuclear Action Reliability Assessment (NARA)
The Nuclear Industry Management Committee (IMC) and British Energy supported an
initiative to produce an enhanced and updated version of the HEART technique
specific to the nuclear industry and known as Nuclear Action Reliability Assessment -
NARA (Kirwan et al., 1994). A review of the data sources used for the original version
of HEART pointed out the need for a detailed human error probability database
(CORE-DATA) which overcame some of the shortcomings detected in the intervening
years. NARA is based on a combination of CORE-DATA and real accident/incident
data available from the nuclear industry, augmented by expert judgement.
In this technique, contextual factors are referred to as Error Producing Conditions
(EPCs). However, the set of EPCs included in NARA was based simply on a review of
the data sources used in the original version of HEART. From the original thirty-eight
PSFs identified in HEART, eighteen were included in NARA based on the findings from
the research by Kennedy et al. (2000). The factors relevant to controller recovery are
the same as those in the HEART model.
7.2.1.14 Human Performance DataBase (HPDB)
Park et al. (2004) emphasised the need to collect plant-specific or domain-specific data
in order to identify the key factors that can degrade/enhance a plant’s safety. To fulfil
this requirement they initiated the Human Performance DataBase (HPDB) under the
Korea Atomic Energy Research Institute. The objective of this database was to
provide the reliable human performance information needed to perform HRA,
especially for plant-specific emergencies. In order to achieve this objective, they
collected operational emergency reports from regular training sessions. Information
that was considered relevant for an appropriate HRA analysis was grouped under the
following categories:
• available procedure;
• description of the different tasks, steps, and actions, and their dependence;
• demand of perception, cognition, and action to perform necessary tasks and
actions;
• person or team issues;
• level of experience; and
• time needed to correctly perform tasks, steps, and actions.
The third category ‘demand of perception, cognition, and action to perform necessary
tasks and actions’ refers to the operator’s workload. This factor has been subsumed
under the personal factors, similar to the approach taken in section 7.2.1.5. All other
factors have already been assessed as relevant to the recovery from equipment
failures in ATC.
Similar to the main objective of HPDB, the research presented in this thesis is relevant
to the advancement of knowledge of controller performance under emergency/unusual
situations, such as equipment failure in ATC. When an equipment failure occurs,
controller behaviour tends to differ from normal everyday routine behaviour. For this
reason, it is necessary to review relevant internal or external factors that influence the
controller’s recovery from unexpected equipment failures in ATC.
The discussions presented in the previous sections attempted to extract relevant
factors from a range of human reliability research to ensure a complete representation of
the recovery context under research in this thesis. The following section gives a
summary of the findings.
7.2.1.15 Summary of the findings
The Recovery Influencing Factors (RIFs) relevant to ATC equipment failure have been
selected on the basis of several sources of information. In general, the definitions of
contextual factors throughout the assessed HRA techniques show great similarity,
where contextual factors are seen as causes, conditions, or factors that influence
human performance. The only difference is observed in three techniques (HEART,
ATHEANA, and CAHR) which focus purely on negative human performance.
The process followed to select the relevant RIFs started with an initial selection based
on the review of contextual factors identified in three ATC/ATM related human reliability
techniques, namely HERA, TRACEr, and RAFT (Table 7-3). As a result, nine groups of
RIFs have been determined as relevant to ATC: communication, traffic and airspace,
weather, procedures, training and experience, HMI, personal factors, organisational
factors, and task complexity. These initial findings are augmented with a review of non-
ATM related HRA techniques (as presented in the previous sections). Therefore, the
second step involved a review of eleven HRA techniques mostly designed to analyse
human error in the nuclear and process industries. These generated three additional
factors of relevance to controller recovery (see Table 7-3).
Table 7-3 Review of Human Reliability Assessment (HRA) techniques and relevant findings
HRA technique | Industry | Terminology used for contextual factors | Definition of contextual factors | Extracted contextual factors
HERA | ATM | Contextual Conditions (CCc) | Corresponds to the definition in this research | Communication for recovery; Traffic and airspace; Weather; Procedures; Training; HMI; Personal factors; Organisational factors
TRACEr | ATM | Performance Shaping Factors (PSFs) | No definition is provided | as above
RAFT | ATM | as above | No definition is provided | Task complexity
Recovery from failures | Chemical | Recovery Influencing Factors (RIFs) | Corresponds to the definition in this research | Occurrence-related factors (available and applicable defences such as alarm); Group of factors relevant for prioritisation of recovery-related factors (time available/time pressure)
CORE-DATA | Nuclear, chemical, offshore | Performance Shaping Factors (PSFs) | Corresponds to the definition in this research | as above
THERP | Nuclear | Performance Shaping Factors (PSFs) | Corresponds to the definition in this research | Suddenness of occurrence (or time course of failure development)
HEART | Nuclear, chemical, petrochemical, railway, defence | Error Producing Conditions (EPCs) | Corresponds to the definition in this research | as above
COCOM | Generic | Common Performance Modes (CPMs) | More generic definition | as above
CREAM | Generic | Common Performance Conditions (CPCs) | More generic definition | as above
HRMS | Nuclear | Performance Shaping Factors (PSFs) | Additionally include myriad of other factors | as above
ATHEANA | Nuclear | Performance Shaping Factors (PSFs) | Emphasis is placed on purely negative context | as above
CAHR | Nuclear | Performance Shaping Factors (PSFs) | Emphasis is placed on purely negative context | as above
NARA | Nuclear | Error Producing Conditions (EPCs) | Corresponds to the definition in this research | as above
HPDB | Nuclear | Factors | No definition is provided | as above
The assessed HRA techniques and their related factors are presented in tabular form
in Appendix VII. Factors from all techniques are compared to HERA, as the most
recent HRA technique in the ATC/ATM domain. In most cases, the comparison was
straightforward since certain factors were identified in almost all techniques (e.g. the
factor ‘procedures’). However, a number of factors could not be identified as belonging
to any of the HERA categories and were thus categorised separately (shown as
dashed boxes in Appendix VII). Although these did not specifically ‘fit’ any of the HERA
categories, they were retained because of their relevance to the recovery from
equipment failures in ATC. Table 7-3 gives an overview of the RIFs that are taken
forward for further analysis in the next section.
7.2.2 Augmentation with equipment-failure related factors
Once the relevant factors had been determined based on the relevant HRA
techniques (Table 7-3), it was necessary to complement the identified RIFs with
equipment failure related factors. The reason for this is to better reflect the context
surrounding the occurrence of equipment failure and its subsequent controller recovery.
Chapter 4 yielded a further set of recovery factors related to some of the key
characteristics of equipment failures: ATC functionality affected (this is taken into
account separately through the classification of ATC functionalities as defined in
Chapter 2), complexity of failure type, time course of failure development, duration of
failure, impact on operations room (i.e. number of workstations/sectors affected), and
impact on ATC/ATM. As a result, the following RIFs have been added to the previous
list: complexity of failure type, time course of failure development, duration of failure,
and impact on operations room (i.e. number of workstations/sectors affected).
The relevance of the additional equipment-related RIFs has been confirmed in the
analysis of more than 20,000 operational failure reports from four different countries (as
presented in Chapters 3 and 4). However, even the two brief operational reports
given in section 7.1.1 confirmed the relevance of the equipment-related RIFs, namely
number of workstations affected, time course of failure development, and complexity of
failure type.
7.2.3 Augmentation with dynamic situational factors
It was observed that the chosen RIFs represented more static aspects of the working
environment. As observed by Straeter (2005), dynamic situational factors play an
important role in human decision making and behaviour in emergencies (e.g.
unexpected equipment failure). Straeter (2005) identified a total of seven dynamic
situational factors subdivided into time-related and system-related. Time-related
dynamic situational factors are suddenness of onset of a system development,
operational phase of a task, and involvement of the operator. System-related dynamic
situational factors are: experience with system performance (reliance), conflicting
issues in the situation (task complexity), ambiguity of information in the working
environment, and misleading information processing (priming).
Based on the overview of these seven dynamic situational factors, it was possible to
identify three additional factors relevant to the recovery from equipment failures in ATC.
These are: experience with system performance (reliance), ambiguity of information in
the working environment, and adequacy of alarm/alert onset (adapted ‘suddenness of
onset of a system development’ factor). The remaining dynamic situational factors were
either already incorporated amongst candidate RIFs (i.e. task complexity) or were not
considered relevant in the ATM industry (e.g. ‘operational phase of a task’ and
‘misleading information processing’ are more relevant for the non-ATM industries).
7.2.4 Further subdivision of the identified RIFs
In certain cases, the identified recovery factors were too generic to capture the specific
characteristics of the environment at the moment of failure. In order to avoid any
ambiguity, two principles are adopted at this stage of the research. Firstly, each
identified contextual factor is rephrased to better reflect the research presented in this
thesis. For example, ‘communication’ is rephrased to ‘communication for recovery
within team/ATC Centre’. In this way, the selected RIF precisely reflects which segment
of communication is taken into account (i.e. in relation to the recovery process) and
between which parties (i.e. team of controllers or entire ATC Centre). The second
principle represents the subdivision of identified contextual factors whenever necessary
(see Table 7-4). As an example, the ‘traffic and airspace’ factor is too generic to
capture the characteristics of both traffic and airspace and was therefore broken down
into two separate categories. A similar approach is applied to ‘training and experience’.
Table 7-4 Recovery Influencing Factors
Identified contextual factors | Corresponding Recovery Influencing Factors (RIFs)
Communication | Communication for recovery within team/ATC Centre
Traffic and airspace | Traffic complexity during the recovery process; Airspace characteristics during the recovery process
Weather | Weather conditions during the recovery process
Procedures | Existence of recovery procedure
Training and experience | Training for recovery from ATC equipment failures; Experience with equipment failures
HMI | Adequacy of HMI and operational support
Personal factors | Personal factors
Organisational factors | Adequacy of organisation
Task complexity | Conflicting issues in the situation (task complexity)
Time available & time pressure | Time necessary to recover
Available and applicable defences and barriers & alarms | Adequacy of alarms/alerts (as part of HMI)
Complexity of failure | Complexity of failure type
Suddenness of occurrence & Time course of failure development | Time course of failure development
Duration of failure type | Duration of failure
Impact on operational room (i.e. number of workstations/sectors affected) | Number of workstations/sectors affected
Experience with system performance (reliance) | Experience with system performance (reliance or trust in the system)
Ambiguity of information in the working environment | Ambiguity of information in the working environment
Adequacy of alarm/alert onset | Adequacy of alarm onset
7.3 Definition of qualitative descriptors
The final step involves the definition of the qualitative descriptors for each RIF. In this
research, a qualitative descriptor defines the levels of impact that each RIF has in the
context of controller recovery performance. The simplest case would be a dichotomous
descriptor distinguishing only two levels of impact of each recovery factor. However,
this approach often loses valuable information and is not always suitable.
Therefore, qualitative descriptors have been constructed providing three levels of
impact. These range from Level 1, referring to the most desirable level (in terms of ATC
recovery), through Level 2, referring to the tolerable or average level, to
Level 3, referring to the least desirable level. For example, the RIF ‘communication for
recovery within team/ATC Centre’ would have three qualitative descriptors, namely
‘efficient communication’, ‘tolerable communication’, and ‘inefficient communication’.
This approach is similar to that taken in the CREAM technique (Hollnagel, 1998;
section 7.2.1.9).
On the other hand, the RIF ‘Experience with the system performance (reliance or trust
in the system)’ would have two qualitative descriptors. The first would be ‘objective
attitude toward the system’. The second would account for inadequate attitude of the
controller toward the ATC system and would include both ‘positive experience with the
system (overtrust) and negative experience with the system (undertrust)’. In order to
accurately present the levels of impact that this particular RIF has in the context of
controller recovery performance, it was necessary to combine the cases of undertrust
and overtrust in the ATC system. To all intents and purposes, they both have a similar,
undesirable effect on controller recovery performance. Undertrust in ATC systems
leads to inefficient use of available equipment or all of the available tools. On the other
hand, overtrust leads to complete reliance on the information provided by the system
without consideration of the controller’s own judgement or situational awareness of the
position (lateral and longitudinal) and intent of the traffic within a dedicated airspace.
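The qualitative descriptors discussed above can be represented as a simple lookup structure. The sketch below encodes only the two RIFs used as examples in the text, with level numbers taken from Table 7-5. The helper names and the attitude labels (`objective`, `overtrust`, `undertrust`) are hypothetical and are introduced purely for illustration.

```python
# Qualitative descriptors per RIF, keyed by level (1 = most desirable,
# 3 = least desirable). Only two RIFs are shown. The level numbers follow
# Table 7-5.
RIF_DESCRIPTORS = {
    "Communication for recovery within team/ATC Centre": {
        1: "Efficient",
        2: "Tolerable",
        3: "Inefficient",
    },
    "Experience with the system performance (reliance)": {
        2: "Objective attitude toward the system",
        3: "Positive experience with the system or negative experience with the system",
    },
}

# Hypothetical attitude labels. Overtrust and undertrust map to the same level
# because both are treated as having a similar, undesirable effect on recovery.
TRUST_TO_LEVEL = {"objective": 2, "overtrust": 3, "undertrust": 3}


def trust_level(attitude):
    """Return the level of the reliance RIF for a given controller attitude."""
    return TRUST_TO_LEVEL[attitude]


def describe(rif_name, level):
    """Return the qualitative descriptor of a RIF at a given level."""
    return RIF_DESCRIPTORS[rif_name][level]


if __name__ == "__main__":
    rif = "Experience with the system performance (reliance)"
    for attitude in ("objective", "overtrust", "undertrust"):
        level = trust_level(attitude)
        print(f"{attitude}: Level {level}, {describe(rif, level)}")
```

Keying the descriptors by level number rather than by position lets a RIF with only two descriptors keep the same level semantics as a RIF with three.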
The above analyses led to a final set of 20 controller Recovery Influencing Factors
(RIFs) divided into four main groups: internal factors (i.e. factors related to the
controller), equipment failure related factors, external factors (i.e. factors related to
working conditions), and airspace related factors. Finally, it has to be noted that the
definition of these 20 RIFs assumes that an equipment failure has occurred (i.e.
probability of equipment failure is 1). Otherwise, these 20 RIFs would have to be re-
named and re-defined to allow an analysis of the context surrounding a particular event
under investigation, no longer being an equipment failure. Table 7-5 presents the final
set of factors relevant to the recovery from equipment failures in ATC, together with
their corresponding qualitative descriptors. It has to be noted that these 20 RIFs
represent high-level categories (e.g. personal factors) consisting of several low-level
factors (e.g. age, experience, stress, fatigue). The detailed definitions of these 20 RIFs
in this thesis are presented in Appendix VIII.
Table 7-5 Relevant recovery influencing factors and their corresponding qualitative descriptors
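Group | RIF name | Qualitative descriptor | Level
Internal factors | Training for recovery from ATC equipment failure | Suitable to the situation in question | 1
Internal factors | Training for recovery from ATC equipment failure | Tolerable to the situation in question | 2
Internal factors | Training for recovery from ATC equipment failure | Counter productive to the situation in question | 3
Internal factors | Experience with equipment failures | Experienced a particular type of failure or any other type of ATC equipment failure | 1
Internal factors | Experience with equipment failures | No experience with ATC equipment failures | 2
Internal factors | Experience with the system performance (reliance) | Objective attitude toward the system | 2
Internal factors | Experience with the system performance (reliance) | Positive experience with the system or negative experience with the system | 3
Internal factors | Personal factors | Suitable for the recovery process | 1
Internal factors | Personal factors | Tolerable for the recovery process | 2
Internal factors | Personal factors | Counter productive for the recovery process | 3
Internal factors | Communication for recovery within team/ATC Centre | Efficient | 1
Internal factors | Communication for recovery within team/ATC Centre | Tolerable | 2
Internal factors | Communication for recovery within team/ATC Centre | Inefficient | 3
Equipment failure related factors | Complexity of failure type | Single system affected | 2
Equipment failure related factors | Complexity of failure type | Multiple systems affected | 3
Equipment failure related factors | Time course of failure development | Sudden failure | 1
Equipment failure related factors | Time course of failure development | Persistent or latent failure | 2
Equipment failure related factors | Time course of failure development | Gradual degradation of system | 3
Equipment failure related factors | Number of workstations/sectors affected | One workstation/one sector or all workstations in one sector | 2
Equipment failure related factors | Number of workstations/sectors affected | Several workstations/couple of sectors or all workstations/all sectors | 3
Equipment failure related factors | Time necessary to recover | Adequate | 1
Equipment failure related factors | Time necessary to recover | Inadequate | 3
Equipment failure related factors | Existence of recovery procedure | Suitable to the situation in question | 1
Equipment failure related factors | Existence of recovery procedure | Tolerable to the situation in question | 2
Equipment failure related factors | Existence of recovery procedure | Inappropriate | 3
Equipment failure related factors | Duration of failure | Short period of time | 2
Equipment failure related factors | Duration of failure | Moderate or substantial period of time | 3
External factors (factors related to working conditions) | Adequacy of HMI and operational support | Suitable to the situation in question | 1
External factors (factors related to working conditions) | Adequacy of HMI and operational support | Tolerable to the situation in question | 2
External factors (factors related to working conditions) | Adequacy of HMI and operational support | Counter productive to the situation in question | 3
External factors (factors related to working conditions) | Ambiguity of information in the working environment | External working environment matches the controller's internal mental model | 1
External factors (factors related to working conditions) | Ambiguity of information in the working environment | External working environment mismatches the controller's internal mental model | 3
External factors (factors related to working conditions) | Adequacy of alarms/alerts | Suitable to the situation in question | 1
External factors (factors related to working conditions) | Adequacy of alarms/alerts | Tolerable to the situation in question | 2
External factors (factors related to working conditions) | Adequacy of alarms/alerts | Counter productive to the situation in question | 3
External factors (factors related to working conditions) | Adequacy of alarm/alert onset | Information from the external world enters the processing loop at the right time | 1
External factors (factors related to working conditions) | Adequacy of alarm/alert onset | Information from the external world enters the processing loop at the wrong time (misleading sequence of alarms) | 3
External factors (factors related to working conditions) | Adequacy of organisation | Efficient | 1
External factors (factors related to working conditions) | Adequacy of organisation | Tolerable | 2
External factors (factors related to working conditions) | Adequacy of organisation | Inefficient | 3
Airspace related factors | Traffic complexity during the recovery process | Average traffic complexity | 2
Airspace related factors | Traffic complexity during the recovery process | High or low traffic complexity | 3
Airspace related factors | Airspace characteristics during the recovery process | Adequate | 1
Airspace related factors | Airspace characteristics during the recovery process | Tolerable | 2
Airspace related factors | Airspace characteristics during the recovery process | Inappropriate | 3
Airspace related factors | Weather conditions during the recovery process | Improved | 2
Airspace related factors | Weather conditions during the recovery process | Deteriorated | 3
Airspace related factors | Conflicting issues in the situation (task complexity) | Average complexity of the situation | 2
Airspace related factors | Conflicting issues in the situation (task complexity) | Conflicting, multiple tasks or extremely low complexity of the situation | 3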
In order to ensure a complete list of relevant contextual factors, a key step at this stage
included verification of the selected RIFs. An initial verification was provided by two
ATM specialists (from one European ATC Centre) with extensive operational
experience. They had an opportunity to review the candidate RIFs, their definitions,
and related qualitative descriptors (for evidence see Appendix II) and their feedback
was valuable in the approval of selected RIFs. Further verification of the selected RIFs
has been conducted in the experiment (presented in Chapters 9 and 10). A discussion
on the process to quantify the probabilistic definition of 20 RIFs, their interactions, and
their influence on controller recovery is presented in more detail in the following
Chapter.
7.4 Summary
The objective of this Chapter was to define the recovery context via a set of contextual
factors, known as ‘Recovery Influencing Factors’ or RIFs. The Chapter has built on the
review of existing HRA techniques and their corresponding contextual factors to identify
which factors are relevant to recovery from equipment failure in ATC. This initial
selection of relevant contextual factors has been augmented with specific equipment
failure related factors and dynamic situational factors. The methodology resulted in a
set of 20 controller RIFs. The Chapter concludes with a definition of the qualitative
descriptors for each RIF or the levels of impact that each RIF has in the context of
controller recovery performance. All results obtained have been initially verified by two
ATM specialists who reviewed the choice of selected RIFs and their qualitative
descriptors. The selection of relevant contextual factors (i.e. RIFs) and their qualitative
descriptors are taken forward to the next Chapter to develop the methodology for the
quantitative assessment of the recovery context.
Chapter 8 Quantitative Assessment of Recovery Context
206
8 Quantitative Assessment of Air Traffic Controller Recovery Context
The previous Chapter presented a selection of contextual factors relevant to recovery
from equipment failures in Air Traffic Control (ATC), known as Recovery Influencing
Factors (RIFs). This selection was based on a review of existing Human Reliability
Assessment (HRA) techniques, augmented by specific equipment failure and dynamic
situational factors. A set of 20 RIFs were identified and distributed in four main groups:
internal, equipment failure related, external, and airspace related factors. In order to
facilitate quantitative assessment of the recovery context, the selected RIFs were firstly
assigned potential qualitative levels of impact followed by their quantitative definition
(i.e. probability of each level occurring). The Chapter starts by reviewing relevant past
research to formulate the methodology adopted in this thesis. The proposed
methodology consists of six steps. The qualitative definition of 20 RIFs from the
previous Chapter (Step 1) is followed by the quantitative definition of each RIF (Step 2).
This quantitative definition is based on various sources, such as past literature,
operational failure reports, expert input of eight ATM specialists, and the questionnaire
survey. The Chapter continues with the implementation of all existing interactions
between relevant RIFs (Step 3). These are identified by utilising operational experience
and further validated by past research and expert input. Incorporation of interactions
results in changes to RIF levels, which necessitates determination of the cut-off point
between any two consecutive levels (Step 4). Finally, the methodology defines the
relationship between a particular RIF level and its effect on controller recovery
performance (Step 5), to conclude with the definition of a numerical indicator for each
recovery context (Step 6).
8.1 Lessons learnt from past research
The review of various HRA techniques (in Chapter 7) identified two issues relevant to
this thesis. Firstly, it identified potential RIFs. Secondly, it revealed the two HRA
techniques which use contextual factors as the basis for quantitative human
performance analysis. These are: the Cognitive Reliability and Error Analysis Method -
CREAM (Hollnagel, 1998) and Connectionism Assessment of Human Reliability -
CAHR (Straeter, 2000). A discussion of the CREAM technique and its relevance to
this thesis is presented in sections 7.2.1.9 and 7.3 of Chapter 7 and will not be
repeated here. However, since the CREAM technique has been further developed in
the work by Kim, Seong, and Hollnagel (2005) and Fujita and Hollnagel (2004), both
approaches have been assessed for their relevance to the research presented in this
thesis.
8.1.1 Applications of the CREAM technique
The application of the CREAM technique by Kim, Seong, and Hollnagel (2005)
attempted a probabilistic determination of contextual factors to determine the relevant
control mode (tactical, opportunistic, scrambled, and strategic control as defined in
CREAM). In short, the authors proposed probability distributions for nine contextual
factors or CPCs, taking into account their dependencies. The advantage of their
approach is the straightforward incorporation of uncertainties. In other words, this
approach is useful in the case of contextual factors which are not clearly defined or
understood. Because of this particular feature, this approach has been adopted in this
thesis.
Furthermore, Kim, Seong, and Hollnagel (2005) link each level of a contextual factor to
a specific type of control and assess all possible contexts using the Bayesian Belief
Network (BBN) approach. Littlewood, Strigini, Wright, and Courtois (1998) state that
the use of BBNs allows safety experts to better handle safety assessment and
potentially make hidden safety arguments more visible, communicable, and auditable.
In general, the concept of BBN is based on a probabilistic approach. It combines expert
input and data, and is useful for building complex and uncertain applications. However,
the approach by Kim et al. (2005) based on nine CPCs was too complex.
Subsequently, Kim et al. simplified it by grouping the nine CPCs into groups of
three, which were then assessed by the BBN approach. For this reason, a probabilistic approach
based upon programs written in C and the core methodology by Kim et al. (2005) is
used in this thesis to enable incorporation of all 20 RIFs.
The application of the CREAM technique by Fujita and Hollnagel (2004) is designed as
a practical application of CREAM for screening various scenarios and estimating the
failure probability solely from the characteristics of the contextual conditions
surrounding an occurrence (e.g. accident). In this way, the method moves away from
the notion of human error and focuses more on context as a driving force of inadequate
human performance, regardless of whether an individual or a team is involved.
Although it demonstrates the usefulness of the CREAM methodology, this method is
not very relevant to this thesis.
8.1.2 Connectionism Assessment of Human Reliability (CAHR)
As previously discussed in section 7.2.1.12 of Chapter 7, CAHR is a data-driven HRA
technique based on highly detailed databases of incident reports in the nuclear industry.
Using the available incident reports, it was possible to move away from an expert
judgment based categorisation of PSFs towards a more analytical method. However,
ATC still lacks a high-level database that captures human performance in the event of
an ATC related incident/accident. Therefore, an analysis of context as performed in
CAHR is still not achievable in the ATC industry. Some initial attempts to establish a
database that captures the human performance data are planned by EUROCONTROL
through the Human Error in ATM (HERA) project (EUROCONTROL, 2002d), but
currently this is incapable of supporting any meaningful statistical analysis.
Table 8-1 below summarises the characteristics of CREAM, its two main
applications, and CAHR. Section 8.2 builds on the relevant elements of the CREAM
technique to define a framework for the quantitative assessment of recovery context.
Table 8-1 Overview of CREAM and CAHR differences
HRA technique | Relevant area | Number of contextual factors | Interaction between contextual factors | Output
CREAM by Hollnagel (1998) | Theoretical approach toward human erroneous action | Nine | Included qualitatively | Quantitative probabilistic range
Improvement of CREAM by Fujita and Hollnagel (2004) | Theoretical approach toward ‘action’ failure rate based on contextual factors | Ten | Included qualitatively (based on CREAM) | Quantitative mean failure rate
Improvement of CREAM by Kim, Seong, and Hollnagel (2005) | Theoretical approach toward human erroneous action | Nine | Included qualitatively (based on CREAM) | Quantitative, probabilistic approach
CAHR by Straeter (2000) | Data driven approach defined within nuclear industry | Thirty | Included quantitatively using the available data | Connectionism method facilitating qualitative and quantitative approach
8.2 Framework of the methodology for a quantitative assessment of recovery context
The proposed methodology is ‘generic’ as its aim is to present the framework for a
‘generic’ ATC Centre, as described in Chapter 2, section 2.4. Used operationally, this
methodology would have to be refined to reflect and incorporate all the characteristics
of the ATC Centre or event under investigation.
In general, this methodology consists of six steps (Figure 8-1). Firstly, it is necessary to
review the twenty RIFs identified in the previous Chapter and their relevance to the
ATC Centre or event under investigation. In the ‘generic’ approach, all 20 factors are
assessed and defined through their qualitative descriptor or their levels of impact on
controller recovery performance (Step 1). Secondly, based on available sources of
information each RIF is probabilistically defined (Step 2). As a result, it is possible to
present the recovery context as a function of identified RIFs and their corresponding
levels. At this stage, there is no consideration of the interactions between RIFs, as they
are considered to be independent. To provide an accurate approach, Step 3 takes into
account all interactions between RIFs. These are assessed both qualitatively and
quantitatively. This results in a distribution of RIF levels. Having a distribution of RIF
levels, as opposed to discrete Levels 1, 2 and 3, necessitates identification of the cut-
off point between any two consecutive levels (Step 4). Once these cut-off points are
identified and RIF levels re-defined, the next step quantifies the relationship between
the particular level of RIF and its impact on controller recovery performance. This
relationship is expressed via correlation coefficients (Step 5). At this stage, previously
determined probabilities of each RIF level (Step 2) are re-calculated to account for
RIF interactions. The result is the definition of an aggregated indicator of the recovery
context, referred to as the recovery context indicator – Ic (Step 6).
Figure 8-1 below presents the six-step framework for the quantitative assessment
of the recovery context. Since the previous Chapter identified and discussed all 20
RIFs and their levels of impact (qualitative descriptor), the following section discusses
the consequent step, namely probabilistic assessment of RIFs (Step 2). This is
followed by the remaining steps of the proposed methodology (Figure 8-1).
Figure 8-1 Framework for the quantitative assessment of the recovery context
8.3 Probabilistic assessment of RIFs (Step 2)
Given that the aim of this Chapter is to present a reliable quantitative approach for the
analysis of the controller recovery performance, it is necessary to probabilistically
define levels of influence of each RIF on controller performance (referred to as
qualitative descriptor). As previously discussed in Chapter 7 (section 7.3), the
qualitative and quantitative definition of RIFs assumes that a failure occurred (i.e. that
the probability of failure is 1). In this way, it is possible to define every possible context
as a combination of RIFs and their corresponding levels of influence, i.e. qualitative
descriptor. This approach is important for the prospective analysis of controller
performance, as well as a retrospective event analysis. Even in the case of
retrospective analysis, specifying RIFs exactly is not straightforward due to the lack of
data and information about the context. In the case of predicting future events or
potential hazardous contexts, specifying the RIFs accurately becomes much more
difficult and a level of uncertainty is inherent in the process.
The use of a probabilistic approach has several advantages. Firstly, if a certain RIF is
not clearly specified or known, it is possible to assume probabilities for each of its
levels based on operational data. In this way any uncertainties identified for a certain
RIF can be considered more explicitly as illustrated by Kim, Seong, and Hollnagel
(2005). Another advantage of this approach is that the probability distribution of the
context, and indirectly controller performance, is a result of considering all possible
combinations of contextual factors or RIFs.
The definition of each RIF in terms of the probability of each of its levels is not
straightforward. However, this is necessary for any attempt to quantify the
effectiveness of controller recovery performance in a given context or environment.
Major difficulties are experienced in the quantification of internal RIFs (or factors
related to the controller), as it is hard to quantify any type of human performance. It is
also difficult to quantify some of the equipment failure related RIFs due to the lack of
consistent data collection in the available occurrence reporting schemes. In other
words, some failure characteristics, such as the number of workstations affected, are
not consistently reported. Finally, the majority of the external RIFs are highly ATC
Centre specific and as such extremely hard to define in a generic form. Bearing this in
mind, it is understandable why the quantification of RIFs has been a challenge in the
past.
For this reason, it should be noted that this Chapter captures the characteristics of the
‘generic’ ATC Centre as a base for any further fine tuning of the proposed methodology
and its usage as either a retrospective or prospective/predictive tool. Each ATC Centre
has its unique characteristics that may be represented by different RIF probabilities.
For example, the ‘number of workstations/sectors affected’ and ‘complexity of failure
type’ depend on a particular architecture in each ATC Centre, while ‘training for
recovery’ as well as ‘adequacy of organisation’ depend on a particular safety culture.
The framework developed in this Chapter is applied to a unique ATC Centre, presented
in Chapter 10.
8.3.1 Sources of information
A total of four different sources of information have been consulted in order to
determine the necessary RIFs probabilities. These are: operational failure reports
(presented in Chapter 4), the responses from the questionnaire survey (presented in
Chapter 6), responses of ATM specialists, and past literature. Table 8-2 presents the
number of RIFs defined by each available source of information, while the following
paragraphs explain each source in detail. However, two RIFs are not informed by any
of the available sources (‘number of workstations/sectors affected’ and ‘adequacy of
alarm/alert onset’). In these cases, a conservative approach is taken and probabilities
are equally assigned between their levels. Details are presented in Appendix VIII.
Furthermore, three RIFs are informed by combined sources of information (last column
in Table 8-2).
Table 8-2 Distribution of probabilistic RIF ratings per source
Source of probabilistic assessment | Number of RIFs assessed directly (single source) | Number of RIFs assessed indirectly (combined sources)
Operational failure reports | - | 1 (RIF11), 1 (RIF6)
Questionnaire survey | 3 | -
Averaged ATM specialists input | 12 | 1 (RIF11), 1 (RIF3), 1 (RIF6)
Past literature | - | 1 (RIF3)
No available source | 2 | -
Sum | 17 | 3 (i.e. RIF3, RIF6, and RIF11)
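The equal assignment of probabilities used for the two RIFs without an informing source can be sketched as follows. This is a minimal illustration only; the function name `uniform_level_probabilities` is hypothetical and is not taken from the thesis software.

```python
from fractions import Fraction


def uniform_level_probabilities(n_levels):
    """Spread probability equally over the levels of a RIF with no informing source.

    Hypothetical helper: a RIF in this chapter has either two or three levels.
    """
    if n_levels not in (2, 3):
        raise ValueError("a RIF is defined with two or three levels")
    return {level: Fraction(1, n_levels) for level in range(1, n_levels + 1)}


# Equal probabilities for a two-level and a three-level RIF.
two_level = uniform_level_probabilities(2)
three_level = uniform_level_probabilities(3)
```

Exact fractions are used here so that the level probabilities of each RIF always sum to exactly one.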
8.3.1.1 Operational failure reports
The probabilistic assessment of the recovery factors is informed by the analysis of
more than 20,000 operational failure reports on equipment failures originating from
three Civil Aviation Authorities (referred to as Countries A, B, and C) and one ATC
Centre system control and monitoring database (referred to as Country D). Detailed
analyses of these reports are presented in Chapter 4.
The analyses of operational failure reports are used to inform two particular RIF
probabilities. The first one is ‘complexity of failure type’. The probabilities relevant to
this RIF are determined by comparing the number of reports describing only a single failure
with the number reporting more than one failure. These findings are further validated
by the responses from the eight ATM specialists surveyed. The second RIF is ‘duration
of failure’. This RIF is informed by the analysis of data from the Country D database, as it
was the only database that captured the duration of failure. These findings are likewise
validated by the responses from the eight ATM specialists surveyed.
8.3.1.2 Questionnaire survey
The responses from the questionnaire survey, received from 34 different countries,
captured the experiences of more than one hundred air traffic controllers (average
controller experience is 13.8 years, ranging from 1 to 39 years). The detailed
assessment of this dataset is presented in Chapter 6. This source provided an input for
three RIF probabilities. These are: ‘training for recovery from ATC equipment failure’,
‘previous experience with a particular type of equipment failure’, and ‘existence of
recovery procedure’.
The first RIF (‘training for recovery from ATC equipment failure’) is more difficult to
determine than the other two RIFs. The questionnaire survey determined that 51.7
percent of sampled ATC Centres have established training for recovery (which informs the
probability of RIF1 at Level 1) and that 31 percent have not (which informs the
probability of RIF1 at Level 3). The remaining 17.4 percent of sampled ATC
Centres showed inconsistent responses, and this result is translated into the probability
of RIF1 at Level 2, or the ‘tolerable’ level. It is assumed that inconsistent
responses on the existence of recovery training, within the same ATC Centre, may
suggest that training is not organised in a consistent manner.
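The translation of the survey percentages for RIF1 into level probabilities can be illustrated with the short sketch below. The three reported percentages add up to 100.1, presumably because of rounding, so the sketch rescales them to sum to one. This rescaling, and the helper name `percentages_to_probabilities`, are assumptions made for illustration and are not described in the thesis.

```python
def percentages_to_probabilities(percentages):
    """Rescale reported percentages so that the level probabilities sum to one."""
    total = sum(percentages.values())
    if total <= 0:
        raise ValueError("percentages must sum to a positive value")
    return {level: value / total for level, value in percentages.items()}


# Survey results for RIF1 as reported in the text:
# Level 1 = training established, Level 2 = inconsistent responses, Level 3 = no training.
rif1_percent = {1: 51.7, 2: 17.4, 3: 31.0}
rif1 = percentages_to_probabilities(rif1_percent)
```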
8.3.1.3 Input by ATM specialists
Several probabilities are captured through the input from relevant ATM specialists from
eight similar ATC Centres. The ATM specialists from Ireland, Norway, Sweden, Austria,
New Zealand, Australia, and Japan participated in the small-scale survey. In two cases
the relevant probabilities are captured through face-to-face interviews (with ATM
specialists from Ireland and Norway), whilst in all other cases a predefined set of
questions was distributed for self-completion. These questions were designed to
investigate the factors that impact on controller recovery (as defined via 20 RIFs). For
example, their input informed the probabilities which could not be captured using other
sources of information either because of their confidential nature (e.g. ‘time course of
failure development’) or because of the general unavailability of data (‘adequacy of HMI
and operational support’, ‘adequacy of organisation’). The form used with both face-to-
face interviews and self-completion methods of response collection is available in
Appendix IX.
The ATM specialists surveyed have wide ATM operational experience and worked as
either rated air traffic controllers or as engineers in the operational ATM environment.
However, their resident ATC Centres needed to be assessed to establish the level of
similarity that may be reflected in their RIF ratings (Table 8-3). All eight ATC Centres
provide Area Control Service (ACC) while some also provide oceanic air traffic services,
i.e. control of traffic transiting oceanic areas where the absence of radar coverage
necessitates the use of procedural control. Furthermore, six ATC Centres are equipped
with advanced ATC systems, utilising the latest automated tools such as Short Term
Conflict Alert (STCA), Area Proximity Warning (APW), and Minimum Safe Altitude
Warning (MSAW). Finally, although the traffic is reported at the country level, all ATC
Centres provide the majority of ACC services in their respective countries. For this
reason, country-level traffic figures can be taken as a good indicator of the amount of
traffic controlled by each respective ATC Centre. Reviewing the available traffic figures,
only Japan differs significantly compared to other countries. The Tokyo area represents
one of the busiest airspaces in the world, comparable to the London and Maastricht
areas of Europe.
Table 8-3 ATM specialists involved in the assessment of RIFs
Resident ATC Centre | ATC service provided | ATC system status¹ | Total IFR flights controlled within the country in 2005 (in thousands)
Shannon | ACC/Oceanic | Latest generation | 621²
Oslo | ACC | Latest generation | 488²
Malmo | ACC | Latest generation | 686²
Vienna | ACC | Older generation | 819²
Auckland | ACC/Oceanic | Latest generation | 555³
Melbourne | ACC/Oceanic | Latest generation | 647⁴
Christchurch | ACC/Oceanic | Latest generation | 555³
Tokyo | ACC/Oceanic | Older generation | 2,250⁵
1 Source: personal correspondence with Dr Arnab Majumdar who visited all listed ATC Centres
2 Source: EUROCONTROL Performance Review Report (EUROCONTROL, 2006c)
3 Source: Airways New Zealand (2006b)
4 Source: Bureau of Transport and Regional Economics (2006). Australian Government
The responses from the ATM specialists surveyed are used to inform 12 RIFs. For
three RIFs their responses have been used to either supplement the findings from the
past research (for the ‘experience with the system performance’ RIF) or validate
findings from the operational failure reports (for the ‘complexity of failure type’ and
‘duration of failure’ RIFs).
For the majority of RIFs, the responses from the ATM specialists surveyed have been
consistent. However, for six RIFs some ATM specialists gave different answers. This
was the case with the following RIFs: ‘personal factors’, ‘communication for recovery
within team/ATC Centre’, ‘time course of failure development’, ‘adequacy of HMI and
operational support’, ‘airspace characteristics’, and ‘conflicting issues in the situation
(task complexity)’. For example, for ‘personal factors’ the majority of ATM specialists
reported this RIF as ‘suitable for the recovery process’ in 70 to 90 percent of failure
occurrences. However, Oslo and Tokyo ATM specialists reported personal factors as
‘suitable’ in less than 15 percent of failure occurrences. These lower ratings of
‘personal factors’ reflect the ATM specialists’ perception of the readiness of air traffic
controllers to face unusual/emergency situations, such as equipment failure.
Similarly, potential gaps are identified with Melbourne and Christchurch ATC Centres
where the majority of failures seem to be latent (accounted for 92 and 60 percent,
respectively). This is contrary to the answers provided from other ATC Centres. Finally,
the potential gaps regarding the ‘adequacy of airspace’ are identified by ATM
specialists from Auckland and Tokyo ATC Centres. They ranked airspace design and
configuration as tolerable, highlighting the potential for improvement of airspace
characteristics to enhance controller recovery performance.
It can be concluded that the ATM specialists from the eight ATC Centres worldwide produced
similar ratings for the majority of RIFs. Identified inconsistencies reflect differences that
exist between these ATC Centres in terms of the ATC Centre culture (reflected in
personal factors), airspace design, and ATC Centre architecture. These differences are
reasonable as indicators of the diversity that exists between ATC Centres within one
country as well as worldwide. As a result, the responses from the ATM specialists
surveyed have been taken to inform several RIFs. In future, a weighting scheme may
be used to account for the variability between ATC Centres (e.g. safety culture,
differences between ATC Centres, ATM specialists’ experience).
5 Source: Air Traffic Activity at Area Control Centre (last available for 2003) from Ministry of
Land, Infrastructure, and Transport (2006)
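A weighting scheme of the kind suggested for future work could take the form of a weighted mean of the specialists' ratings, as in the sketch below. The centre labels, ratings, and weights are purely illustrative values and are not survey data; the function `weighted_mean` is likewise a hypothetical helper. With no weights supplied, the function reduces to a simple average.

```python
def weighted_mean(ratings, weights=None):
    """Combine per-centre probability ratings into one value.

    With weights=None every centre counts equally (a simple average).
    """
    if weights is None:
        weights = {name: 1.0 for name in ratings}
    total_weight = sum(weights[name] for name in ratings)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(ratings[name] * weights[name] for name in ratings) / total_weight


# Illustrative (not surveyed) ratings of one RIF level by three centres.
ratings = {"Centre A": 0.80, "Centre B": 0.90, "Centre C": 0.10}
plain_average = weighted_mean(ratings)

# Hypothetical weights, e.g. reflecting the experience of each specialist.
weights = {"Centre A": 2.0, "Centre B": 1.0, "Centre C": 1.0}
weighted_average = weighted_mean(ratings, weights)
```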
8.3.1.4 Past literature
Finally, the relevant data from past ATC research are used to inform probabilities for
the RIF ‘experience with the system performance’. The probabilities are determined
from the findings of Hilburn and Flynn (2001) and EUROCONTROL (2000b) in which
18 percent of controllers reported undertrust in technology. These findings are
combined with the responses from the ATM specialists surveyed on the percentage of
controllers with an excessive trust in technology (i.e. overtrust). Therefore, both
sources of information are used to establish the final probability rating for this particular
RIF (presented in Appendix VIII).
8.3.1.5 Aggregation of data
The previous sections have described four different sources of information used to
determine RIF probabilities. These are: operational failure reports, responses from a
questionnaire survey, responses from the ATM specialists surveyed, and past literature.
Table 8-4 reviews all four sources of information with respect to the level of confidence
and therefore the rationale behind the aggregation of data. Three data sources are
rated with a high level of confidence (questionnaire survey, responses from the ATM
specialists surveyed, and past literature). Only one source is rated with medium
confidence. More precisely, the confidence level for operational failure reports from the
CAA databases is not defined as ‘high’ due to the lack of information on the reliability of
available reporting schemes. There are reliability issues regarding the reporting of
safety occurrences recognised by CAAs⁶. However, none of the CAAs has a
methodology in place to assess the reliability of their reporting scheme, and therefore,
the completeness of the occurrence databases. Therefore, the medium ranking for the
confidence level is an assumption informed by operational experience. As a result, the
data from this source are validated by the findings from another source of data (i.e.
ATM specialists input) to assure reliable RIF ratings.
6 International workshop on the analysis of aviation incident/accident precursors. The workshop
was held on 25 and 26 May 2005 at Imperial College London.
Table 8-4 Overview of the sources of information used to determine RIF probabilities
Source | Level of confidence (subjective) | Comment
Operational failure reports from the CAAs | Medium | The confidence level is not defined as ‘high’ due to the lack of information on the reliability of available reporting schemes
Operational failure reports from the engineering unit of a particular ANSP | High | The confidence level is defined as ‘high’ because the engineering unit has to be aware of all equipment failures occurring in the ATC Centre, as it is directly responsible for their maintenance and repair
Questionnaire survey | High | Responses from 134 air traffic controllers, from 58 ATC Centres and 34 countries worldwide
ATM specialists | High | Conducted with ATM specialists from eight ATC Centres worldwide
Past literature | High | Hilburn and Flynn (2001) and EUROCONTROL (2000b)
In general, the above analyses employed the data from all four sources to define the
probabilities for 20 Recovery Influencing Factors (RIFs). These are presented in
Appendix VIII.
8.3.2 Summary
The preceding paragraphs have used the qualitative levels of the impact of each of the
RIFs (i.e. qualitative descriptor) defined in Chapter 7 and probabilistically defined each.
An overview of all 20 RIFs, their corresponding levels, and designated probabilities is
provided in detail in Appendix VIII and in a tabular form in Appendix X.
Having defined all 20 relevant recovery factors in the previous sections, it is possible to
define recovery context. In general the recovery context may be seen as a discrete
function since all possible contexts are defined exactly by 20 elements, and since each
RIF has only two or three defined levels. In mathematical terms, the existing method
can be expressed as a function f using a set of 20 RIFs to define the recovery context
indicator (Ic) as shown in equation 8-1:
$I_c = f(RIF_1, RIF_2, \ldots, RIF_{20})$    (8-1)
The total number of possible recovery contexts represents the number of combinations
of the 20 RIFs, where nine of them have three levels whilst eleven have only two levels
of impact. In total, this approach generates $3^9 \times 2^{11}$ = 40,310,784 possible contexts,
each having an equal probability of occurrence of 1/40,310,784, or approximately 2.5E-08. In
mathematical terms this is equivalent to finding all variations with repetition of the 20 RIFs
and their corresponding levels. In addition, each recovery context will have a specific
value of the recovery context indicator (Ic). The methodology to calculate this variable
is presented in the remainder of this Chapter.
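The size of the context space quoted above can be checked with a few lines of code. The sketch only reproduces the counting argument (nine three-level RIFs and eleven two-level RIFs); the constant names are illustrative.

```python
# Counts taken from the text: nine RIFs with three levels, eleven RIFs with two levels.
THREE_LEVEL_RIFS = 9
TWO_LEVEL_RIFS = 11

# Every context picks one level per RIF, so the counts multiply.
n_contexts = 3 ** THREE_LEVEL_RIFS * 2 ** TWO_LEVEL_RIFS

# If every context is taken as equally likely, each has probability 1 / n_contexts.
uniform_probability = 1 / n_contexts

print(n_contexts)                    # 40310784
print(f"{uniform_probability:.2e}")  # 2.48e-08
```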
Table 8-5 presents an example of a potential recovery context as a 20-digit array
where each digit corresponds by its position to a particular RIF and by its value to the
precise impact of a particular RIF on controller performance. At this stage, all RIFs are
considered independently and their corresponding levels of influence on controller
performance take integer value, i.e. 1, 2, or 3.
Table 8-5 Example of a potential recovery context represented as a 20-digit array
RIF ID RIF1 RIF2 RIF3 RIF4 RIF5 RIF6 RIF7 RIF8 RIF9 RIF10
Level 1 1 2 1 1 2 1 2 1 1
RIF ID RIF11 RIF12 RIF13 RIF14 RIF15 RIF16 RIF17 RIF18 RIF19 RIF20
Level 2 2 1 1 3 3 3 1 3 3
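The 20-digit representation in Table 8-5 maps naturally onto a simple data structure. The sketch below parses the example context from the table into a mapping from RIF identifier to level; the function name `parse_context` is hypothetical. Because the text does not state here which RIFs have two levels and which have three, the check only enforces that every level is 1, 2, or 3.

```python
# Levels of RIF1..RIF20 as listed in Table 8-5, written as one 20-digit string.
EXAMPLE_CONTEXT = "1121121211" + "2211333133"


def parse_context(digits):
    """Turn a 20-digit string into a {RIF identifier: level} mapping."""
    if len(digits) != 20:
        raise ValueError("a recovery context is defined by exactly 20 RIF levels")
    levels = [int(ch) for ch in digits]
    if any(level not in (1, 2, 3) for level in levels):
        raise ValueError("each RIF level must be 1, 2, or 3")
    # The position in the string gives the RIF, the digit gives its level.
    return {f"RIF{i}": level for i, level in enumerate(levels, start=1)}


context = parse_context(EXAMPLE_CONTEXT)
```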
The following sections show how the existing RIF interactions may change the RIF
levels in either direction (i.e. increase the value of the level which corresponds to the
deterioration in controller performance or decrease the value of the level which
corresponds to an improvement in controller performance).
8.4 Interactions between Recovery Influencing Factors (Step 3)
The methodology for the assessment of the recovery context surrounding the
equipment failure occurrence presented in this Chapter is based upon 20 relevant
contextual factors or RIFs. In order to provide an accurate approach, this methodology
has to take into account all the interactions between these contextual factors. The
interactions have been initially established based upon operational experience and
validated by findings from HRA techniques and ATM specialists. The selection of all
relevant RIFs and establishment of their interactions creates a basis for the generation
of all possible recovery contexts and the calculation of the numerical indicator for each
context (Ic). The steps taken to identify RIF interactions are presented in the following
sections.
8.4.1 Identification of RIF interactions
At first glance, the identified RIFs reveal possible interactions between them. For
example, a poorly designed display (i.e. HMI) as well as inadequate knowledge of ATC
system modes (i.e. inadequate training) may lead to delayed failure detection and less
efficient recovery. Furthermore, stress as a personal factor cannot be independent of
traffic and airspace complexity. If a controller deals with increased levels of traffic, it is
reasonable to assume that stress levels will be higher.
In order to determine the effect of contextual factors on controller performance it is
therefore necessary to describe these interactions, in addition to describing how they
affect controller performance. The analysis of interactions makes it possible to gain a
more accurate picture of the context and thus a better understanding of the recovery
process. In other words, this permits a broader retrospective analysis as well as a more
precise prediction of the effectiveness of the improvement measures. As noted by
Straeter (2000), such interactions could also point to additional factors previously
omitted, such as potential organisational shortcomings.
Straeter (2000) tackles this problem in CAHR by looking at the common appearance of
different factors (using available databases). The analysis is based on capturing the
observed interactions between reported contextual factors. The availability of a detailed
database is however a prerequisite to this approach. Hollnagel (1998) on the other
hand establishes these interactions in CREAM by considering each contextual
condition with respect to how it generally influences the others (there is no mention
whether expert judgement or operational expertise have been used). It is also
important to say that CREAM assumes reciprocal interaction between the contextual
conditions.
The interactions amongst the 20 predefined RIFs have been determined based on known
relationships from operational experience and are marked with the symbol ‘√’ in Table 8-6.
Each mark represents an irreversible (i.e. one-directional) influence between two RIFs, showing
how the RIF in the first row affects the RIF in the left hand column. The reason for irreversible influence lies in the
characteristics of the air traffic environment where one factor may influence the other
one without any reverse effect. For example, complex traffic can influence controller
personal capabilities in terms of increased stress, anxiety, and workload; while the
opposite influence (impact of personal capabilities on traffic complexity in the sector) is
simply not logical.
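One way to hold such one-directional interactions in code is as a set of ordered pairs (influencing RIF, influenced RIF). The sketch below encodes only the traffic example given in the text, using the RIF numbering of Table 8-6 (RIF4 personal factors, RIF17 traffic). The names `interactions`, `influences`, and `influenced_by` are illustrative assumptions, not part of the thesis software.

```python
# RIF identifiers as numbered in Table 8-6.
PERSONAL_FACTORS = 4
TRAFFIC = 17

# Each pair reads (influencing RIF, influenced RIF); the reverse pair is
# deliberately absent because the influence is one-directional.
interactions = {(TRAFFIC, PERSONAL_FACTORS)}


def influences(source, target):
    """Return True if `source` has a direct influence on `target`."""
    return (source, target) in interactions


def influenced_by(target):
    """List the RIFs that directly influence `target`, in ascending order."""
    return sorted(s for s, t in interactions if t == target)


traffic_affects_person = influences(TRAFFIC, PERSONAL_FACTORS)
person_affects_traffic = influences(PERSONAL_FACTORS, TRAFFIC)
```

An ordered pair, rather than an unordered one, is what distinguishes this matrix from the reciprocal interactions assumed in CREAM.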
Table 8-6 Interactions matrix: (c) validation by CREAM, (h) validation by CAHR, (a) validation by ATM specialists; and (x) not validated interactions
Columns (RIF exerting the direct influence): 1 Training for recovery from ATC equipment failures; 2 Previous experience with equip. failures; 3 Experience with system performance (reliance); 4 Personal factors; 5 Comm. for recovery within a team of controllers; 6 Complexity of failure; 7 Time course of failure development; 8 Number of workstations/sectors affected; 9 Time necessary to recover; 10 Existence of recovery procedure; 11 Duration of failure; 12 Adequacy of HMI; 13 Ambiguity of info in the working environment; 14 Adequacy of alarms/alerts; 15 Adequacy of alarms/alerts onset; 16 Adequacy of organisation; 17 Traffic; 18 Airspace characteristics; 19 Weather conditions; 20 Task complexity.
Rows (RIF being influenced), each followed by its interaction marks in left-to-right order; the column to which each individual mark belongs is not preserved in this text layout.
1 Training for recovery from ATC equipment failures | √ (a), √ (c/a)
2 Previous experience with equip. failures | √ (a)
3 Experience with system perf. (reliance) | √ (a), √ (h/a), √ (h/a), √ (h/a)
4 Personal factors | √ (a), √ (a), √ (a), √ (a), √ (a), √ (x), √ (h/a), √ (h), √ (x), √ (h/a), √ (h/a), √ (h/a), √ (h/a), √ (a), √ (h/a), √ (h), √ (h/a), √ (h/a)
5 Comm. for recovery within a team of controllers | √ (c/a), √ (c/a), √ (c/a), √ (a), √ (a), √ (x), √ (h/a), √ (h), √ (x), √ (h), √ (h/a), √ (h/a), √ (h/a), √ (c/a), √ (a), √ (x), √ (a), √ (h/a)
6 Complexity of failure type | √ (a)
7 Time course of failure develop. | √ (a)
8 Number of workstations/sectors affected | √ (a), √ (a)
9 Time necessary to recover | √ (h/a), √ (h/a), √ (h/a), √ (c/h/a), √ (c/h/a), √ (a), √ (a), √ (c/a), √ (a), √ (c/h/a), √ (c/h/a), √ (c/h), √ (c/h/a), √ (h/a), √ (h/a), √ (h/a), √ (c/h/a)
10 Existence of recovery procedure | √ (c/a)
11 Duration of failure | √ (a), √ (a)
12 Adequacy of HMI | √ (a), √ (a), √ (a), √ (c/a)
13 Ambiguity of info in the working environment | √ (a), √ (a), √ (a), √ (a), √ (a), √ (c/a), √ (c/a)
14 Adequacy of alarms/alerts | √ (a), √ (a), √ (c/a)
15 Adequacy of alarms/alerts onset | √ (a), √ (a), √ (a), √ (a), √ (c/a)
16 Adequacy of org. | √ (a), √ (a), √ (a), √ (a)
17 Traffic | √ (a), √ (a), √ (a)
18 Airspace char. | √ (a), √ (a), √ (x)
19 Weather | (no interactions marked)
20 Task complexity | √ (h/a), √ (h/a), √ (h/a), √ (a), √ (a), √ (a), √ (h/a), √ (c/h/a), √ (a), √ (c/h/a), √ (c/h/a), √ (c/a), √ (c/h/a), √ (a), √ (a), √ (a), √ (a)
8.4.2 Validation of RIF interactions
This section validates the interactions identified in the previous section. This was
carried out in two stages. The first stage (sections 8.4.2.1 and 8.4.2.2) addresses
interactions identified in existing literature (CREAM and CAHR techniques). Although
Chapter 7 presented the basic principles behind these two techniques and extracted
candidate RIFs, this Chapter focuses only on the assessment of the interactions
between contextual factors identified in both techniques. The second stage (section
8.4.2.3) identifies the interactions based on the input by three ATM specialists. The
self-completion method was used to collect their responses.
8.4.2.1 CREAM
A comparison of the interactions between contextual factors defined in the CREAM
technique (i.e. CPCs) and those defined between RIFs (Table 8-6) shows a degree of
mapping. A direct link was found with all interactions except those relevant to ‘working
conditions’ and ‘number of simultaneous goals’ CPCs. As already explained in Chapter
7, these two contextual factors are excluded from the list of RIFs. Note that the
interactions relevant to the ‘crew collaboration quality’ CPC are compared with those
related to the ‘communication for recovery’ RIF, because teamwork after the
detection of an equipment failure is mostly verbal.
The CREAM technique is developed as a generic technique for the analysis of human
actions. Therefore, it is not specifically ATC oriented and cannot entirely reflect the
characteristics of the ATC environment. For this reason, several RIFs could not be
mapped to the CPCs. These are personal factors (except ‘time of the day’ as one of the
contextual factors identified in CREAM), complexity of failure type, time course of
failure development, number of workstations/sectors affected, duration of failure, traffic
complexity, airspace characteristics, and weather conditions. Overall, 22 percent of the
interactions identified amongst the RIFs are reflected in CREAM. The mapping between
the CPC interactions in CREAM and the RIF interactions is marked with the symbol ‘c’
in Table 8-6.
8.4.2.2 CAHR
A comparison of the interactions between the six Man-Machine System (MMS) components
and their corresponding PSFs defined in CAHR and those defined between RIFs (Table 8-6)
shows a degree of mapping. This mapping is presented in Table 8-7.
Table 8-7 Mapping between RIFs and CAHR contextual factors
RIF                                   MMS
Personal factors                      Person
Complexity of failure type            Task
Number of workstations affected       System
Duration of failure                   Task
Time necessary to recover             Task
Time course of failure development    System
Existence of recovery procedure       Order-issue
Adequacy of HMI                       Feedback
Adequacy of alarms/alerts
Airspace-related factors              Task/activity
Several of the identified PSFs are specific to nuclear plants (e.g. task preparation,
precision, labelling, marking), whilst the majority are applicable to recovery from
equipment failures in ATC (e.g. time pressure, procedures, HMI). Straeter (2000)
presents reciprocal interactions between PSFs in CAHR as captured through the
analysis of the common appearance of different factors in individual events from
nuclear databases. These interactions are marked with ‘h’ in Table 8-6. Overall,
35 percent of the RIF interactions are captured by CAHR.
8.4.2.3 Validation by ATM specialists
Various interactions between failure characteristics, airspace, traffic, personal factors,
ambiguity of information in the working environment, and the time necessary to recover
have not been confirmed through the preceding validation processes. However, the
existence of links between these factors has been validated independently by three
ATM specialists.
These ATM specialists come from the same ATC Centre and have more than ten years
of operational experience in the ATC domain. They reviewed the existing
interactions and marked those with which they disagreed. Their input was collected
through a small-scale self-completion survey based on the interactions identified in
Table 8-6 and marked with ‘√’. The exact form used in this small-scale survey is
presented in Appendix XI. The comparison of their independent validations showed
similarities. Several inconsistencies were identified, mostly because the ATM specialists
initially misread the matrix. These were clarified via personal correspondence
before the final validation. As a result, 90 percent of the RIF interactions from Table 8-6
have been validated by the ATM specialists (marked with ‘a’ in Table 8-6).
8.4.2.4 Validation summary
95 percent (107 interactions out of 113) of the RIF interactions have been validated by
existing literature and ATM specialists. The remaining six interactions were not
validated by either of the available sources. These, marked with ‘x’ in Table 8-6, are:
• impact of ‘number of workstations/sectors affected’ on ‘personal factors’;
• impact of ‘duration of failure’ on ‘personal factors’;
• impact of ‘number of workstations/sectors affected’ on ‘communication for
recovery’;
• impact of ‘duration of failure’ on ‘communication for recovery’;
• impact of ‘airspace characteristics’ on ‘communication for recovery’; and
• impact of ‘weather’ on ‘airspace characteristics’.
From the perspective of past research and the input of ATM experts, these six interactions
do not exhibit any correlation; the research presented in this thesis therefore excludes
them from the remaining analysis. However, a more quantitative approach would be
required in future. For example, further development of the HERA database could allow
additional validation of RIF interactions (including these six). Furthermore, it could allow
the quantification of their level of influence through the definition of the coefficient of
interaction. Details on the coefficient of interaction are presented in the next section.
8.4.3 Quantification of RIFs interactions
The validated RIFs interactions above were used to develop a method to quantify the
level of interactions. The most accurate approach would be to analyse each interaction
separately as presented in equation 8-2:
\[
RIFY_{j'} = RIFY_j + \sum_{x} k_{xy} \times R_x \qquad \text{(8-2)}
\]
where,
RIFY_j represents level j of RIFY; j = 1, 2, or 3;
RIFY_{j'} represents level j' of RIFY after incorporation of RIF interactions, 0.0 ≤ j' ≤ 4.0;
k_{xy} represents the coefficient of interaction between RIFX and RIFY (k_{xy} ≠ k_{yx});
R_x depends upon the level of RIFX: R_x ∈ {+1, 0, -1}
In other words, kxy is the numerical representation of the direct influence that RIFX has
on RIFY. Note that the interaction is directional rather than reciprocal (i.e. kxy ≠ kyx).
Taking into account the overall lack of quantitative assessment of context in the area of
ATC, it is difficult to determine each coefficient kxy separately. As already discussed in
section 8.1.2, some initial attempts to establish a detailed database that captures the
human performance data are planned by EUROCONTROL through the Human Error in
ATM (HERA) project (EUROCONTROL, 2002d). Although the interactions do not
necessarily have the same level of influence, this thesis had to define a more generic
approach to account for lack of operational data. Nevertheless, if the RIFs interactions
become quantifiable (e.g. via HERA database), the methodology presented in this
Chapter will still be valid.
As a result, this thesis follows the assumption that all determined interactions have the
same level of influence, referred to as k. Namely, it is assumed that interactions
between all pairs of RIFs are equal and as such that there is only one coefficient, k=1/
(N-1). N represents the total number of relevant RIFs for a particular ATC Centre or a
particular incident under investigation. In addition, (N-1) is used because one factor
cannot influence itself. Therefore, in the case of 20 relevant factors, the coefficient of
interaction would be calculated as k=1/19=0.053.
One important assumption made here is that the RIFs which influence a particular RIF
can never change its level by more than one unit, e.g. from Level 3 to Level 2 but not
from Level 3 to Level 1. The reason for this is that it takes more than 50 percent of the
relevant RIFs, all influencing one particular RIF in exactly the same manner, to
change its level (either enhancing or worsening it). For example, in the generic
approach where all 20 RIFs are relevant, it will take at least 11 RIFs, all defined via
Level 1, to influence one particular RIF in order to enhance its level by one unit, either
from Level 3 to Level 2 or from Level 2 to Level 1. This concept is similar to the
approach presented in CREAM (Hollnagel, 1998).
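The uniform coefficient and the one-unit bound described above can be illustrated with a short sketch. It is not the thesis implementation; the function names are hypothetical, and it only assumes k = 1/(N-1) with each of the other N-1 RIFs contributing at most one step of size k.

```python
def interaction_coefficient(n_relevant_rifs: int) -> float:
    """Uniform coefficient k = 1/(N-1); a RIF cannot influence itself."""
    if n_relevant_rifs < 2:
        raise ValueError("at least two RIFs are needed for an interaction")
    return 1.0 / (n_relevant_rifs - 1)


def max_level_shift(n_relevant_rifs: int) -> float:
    """Largest possible shift: all N-1 other RIFs push in the same direction."""
    return (n_relevant_rifs - 1) * interaction_coefficient(n_relevant_rifs)


k = interaction_coefficient(20)
print(round(k, 3))  # 0.053, as in the text for 20 relevant RIFs
print(max_level_shift(20))
```

Because the shifts sum to at most (N-1) × k = 1, no combination of influences can move a RIF by more than one level under this assumption.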
As a consequence of incorporating RIF interactions, the RIF levels change. Table 8-8
presents the change in the RIF levels from the initial integer values (i.e. 1, 2, or 3)
presented in Table 8-5. If the numerical level of any RIF decreases, this means that
other RIFs impacted this particular RIF in such a way that the change enhances
controller performance (see RIF20 in Tables 8-5 and 8-8, which decreased from the
initial value of 3 to a new value of 2.74). Similarly, if the numerical level of a RIF
increases, this means that other RIFs impacted this particular RIF in such a way that the
change degrades controller performance (see RIF18 which increased from the initial
value of 1 to a new value of 1.11). It is important to note that the probability of the
occurrence of any context, with or without incorporation of RIF interactions, is the same
(1/40,310,784=2.4E-08 as previously reported in section 8.3.2).
Table 8-8 Recovery context (as presented in Table 8-5) after the incorporation of RIF interactions
RIF ID  RIF1   RIF2   RIF3   RIF4   RIF5   RIF6   RIF7   RIF8   RIF9   RIF10
Level   1.00   0.95   1.95   0.84   0.89   2.05   1.05   2.05   0.74   1.05
RIF ID  RIF11  RIF12  RIF13  RIF14  RIF15  RIF16  RIF17  RIF18  RIF19  RIF20
Level   1.95   2.00   0.89   1.05   2.95   2.89   2.95   1.11   3.00   2.74
In short, a change (increase or decrease) in the value of a particular RIF represents the
final outcome of all possible interactions with that particular RIF. For example, the RIF5
level changes from 1 to 0.89 as a result of the influence of 15 different
RIFs, as seen from the matrix in Table 8-6 (see row 5).
In this particular example, RIF1, RIF2, RIF4, RIF9, RIF10, RIF13, and RIF14 influence
RIF5 in a positive way as they are defined via Level 1. As a result, each of these seven
RIFs changes the RIF5 level by -1/19 = -0.053. However, RIF15, RIF16, RIF17, RIF19,
and RIF20 influence RIF5 in a negative way as they are defined via Level 3. As a result,
each of these five RIFs changes the RIF5 level by +0.053. Other RIFs, namely RIF3,
RIF6, and RIF12, do not have any influence on RIF5 as their level is 2, which assumes
no significant influence on human performance. Furthermore, RIF7, RIF8, RIF11, and
RIF18 have no impact on RIF5 and are therefore not considered. The result of this is
an overall decrease in the RIF5 level as follows (equation 8-3):
\[
RIF5_{j'} = RIF5_j + 7 \times (-k) + 5 \times (+k) = 1 + 2 \times (-0.053) = 1 - 0.106 = 0.894 \qquad \text{(8-3)}
\]
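The RIF5 calculation can be reproduced with a minimal sketch. The names below are hypothetical. The levels are those stated in the text for the 15 RIFs that interact with RIF5, and the sign convention (Level 1 lowers the numerical level, Level 3 raises it) is taken from the worked example rather than from a general definition.

```python
# Initial levels of the 15 RIFs that interact with RIF5 in the worked example.
INFLUENCING_LEVELS = {
    1: 1, 2: 1, 4: 1, 9: 1, 10: 1, 13: 1, 14: 1,   # Level 1: enhance RIF5
    3: 2, 6: 2, 12: 2,                             # Level 2: no influence
    15: 3, 16: 3, 17: 3, 19: 3, 20: 3,             # Level 3: degrade RIF5
}

# Assumed direction of the shift per influencing level (as in equation 8-3).
DIRECTION = {1: -1, 2: 0, 3: +1}


def effective_level(initial_level, influencing_levels, n_relevant_rifs=20):
    """Initial level plus the sum of k * direction over all influencing RIFs."""
    k = 1.0 / (n_relevant_rifs - 1)
    shift = sum(k * DIRECTION[level] for level in influencing_levels)
    return initial_level + shift


rif5 = effective_level(1, INFLUENCING_LEVELS.values())
print(round(rif5, 3))  # 0.895
```

With the exact k = 1/19 the result is 1 - 2/19 ≈ 0.895; the text obtains 0.894 because it uses the rounded value k = 0.053. Both round to the 0.89 reported in Table 8-8.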
The incorporation of all identified RIF interactions applied to all the identified recovery
contexts (all 40,310,784 of them) made it possible to identify the distribution of all RIFs.
Prior to incorporation of RIF interactions, the distribution of each level is the same. For
example, Figure 8-2 represents the distribution of RIF5 without incorporation of RIF
interactions. This graph represents three levels of RIF5 in a symmetrical manner, each
accounting for exactly 13,436,928 contexts or one third of the total (Figure 8-2). This
results in equal representation of each level in the 40,310,784 possible recovery
contexts.
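As a consistency check, the total number of contexts can be recovered as the product of the number of levels per RIF. The level counts below are an inference, not a statement from this section: RIFs shown with an N/A or a single cut-off point in Table 8-11 are assumed to have two levels and all others three.

```python
from math import prod

# Assumption: these RIFs have two levels (N/A or single cut-off in Table 8-11).
TWO_LEVEL_RIFS = {2, 3, 6, 8, 9, 11, 13, 15, 17, 19, 20}

levels_per_rif = {rif: (2 if rif in TWO_LEVEL_RIFS else 3) for rif in range(1, 21)}

total_contexts = prod(levels_per_rif.values())
contexts_per_rif5_level = total_contexts // levels_per_rif[5]

print(total_contexts)           # 40310784
print(contexts_per_rif5_level)  # 13436928
```

Under this assumption the product 3^9 × 2^11 matches the 40,310,784 contexts quoted in the text, and each of the three RIF5 levels accounts for 13,436,928 of them.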
Figure 8-2 Distribution of RIF5 levels amongst identified recovery contexts without interactions (plot of Frequency against Level)
However, due to the identified interactions, the distribution of RIF5 levels amongst all
possible recovery contexts takes a different, more dispersed, shape (Figure 8-3). It is
notable that the more interactions exist with a particular RIF, the more dispersed the
distribution of levels will be. The example utilised in this section (i.e. RIF5) is
affected by a substantial number of other contextual factors, namely 15. However, in
some cases the number of identified interactions can be small (e.g. one or two), while in
the case of RIF19 (weather conditions) there are no identified interactions and thus the
distribution of this RIF is similar to that of RIF5 without interactions (Figure 8-2). In any
case, the total number of recovery contexts where RIF5 (or any other RIF) is defined via
Level 1 remains the same whether RIF interactions are incorporated or not. The
distribution of the levels for each of the 20 RIFs is presented in Appendix XII in a tabular format.
Figure 8-3 Distribution of RIF5 levels amongst identified recovery contexts with interactions (plot of Frequency against Level)
Once the RIF interactions have been identified and their impact quantitatively
determined, the next step is to re-calculate existing RIF probabilities to more accurately
reflect newly determined RIF levels. However, to achieve this step it is necessary to
determine the cut-off points between any two consecutive levels of influence, i.e. to
determine the precise boundaries between Level 1, Level 2, and Level 3. Another
option would be to consider each of the distributions separately, i.e. covering the entire
spectrum (-∞, +∞). In this way, there is no cut-off point and there is coherency between
all results as well. However, both approaches yield similar results as there is very little
overlap between these distributions. The following section explains the method applied
to determine the cut-off points between any two consecutive RIF levels.
8.5 Methodology for the determination of the cut-off points (Step 4)
Because the interactions affecting each RIF differ (see Table 8-6), the cut-off points
between levels vary from one RIF to the other. The shape and dispersion of the
distribution of levels for each RIF depend upon the number and type of interactions
with other RIFs. As an example, observe the difference in the distribution of levels for
RIF1 (Figure 8-4) and RIF20 (Figure 8-5), where RIF1 is impacted by two different RIFs
while RIF20 is impacted by 17 different RIFs.
Figure 8-4 Distribution of RIF1 levels amongst identified recovery contexts with interactions (plot of Frequency against Level)
Figure 8-5 Distribution of RIF20 levels amongst identified recovery contexts with interactions (plot of Frequency against Level)
The statistical method for determining the cut-off points between the levels for each
RIF is based on the 95 percent confidence interval for each level. For example, a 95
percent confidence interval for Level 1 of RIF1 would cover 95 percent of the normal
curve, where the probability of observing a value of Level 1 RIF1 outside of this area
would be less than 0.05. Under the assumption of a normal distribution⁷, the interval
range (µ - 2σ, µ + 2σ) captures approximately 95 percent of the data.
The advantage of this method is that it is a common statistical approach. However,
it relies upon known values of µ and σ in order to define the interval
range for each level. In other words, to calculate the values of µ and σ for RIF1 Level 1,
it is necessary to already have an assumption about the sample size (depicted as N in
equation 8-4).
\[
\mu = \frac{\sum_{n=1}^{N} X_n}{N}, \qquad
\sigma = \sqrt{\frac{\sum_{n=1}^{N} (X_n - \mu)^2}{N}} \qquad \text{(8-4)}
\]
where
µ represents population mean for RIF1 Level 1 (population of all possible recovery
contexts where RIF1 is defined through Level 1);
σ represents population standard deviation for RIF1 Level 1;
N represents the total number of recovery contexts in which RIF1 is defined via Level 1;
Xn represents the n-th value of the variable RIF1 Level 1 (n=1,2, …. , 40,310,784).
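Equation 8-4 and the µ ± 2σ interval can be sketched as follows. The sample values are hypothetical and serve only to exercise the functions; the population (divide-by-N) form of the standard deviation is used, as in equation 8-4.

```python
from statistics import fmean, pstdev


def two_sigma_interval(values):
    """Return (mu, sigma, (mu - 2*sigma, mu + 2*sigma)) for a population."""
    mu = fmean(values)
    sigma = pstdev(values, mu)  # population SD: divides by N, not N-1
    return mu, sigma, (mu - 2 * sigma, mu + 2 * sigma)


# Hypothetical effective levels spread around Level 1 in steps of k = 1/19.
sample = [1 + n / 19 for n in (-3, -2, -1, 0, 1, 2, 3)]
mu, sigma, interval = two_sigma_interval(sample)
print(round(mu, 3), round(sigma, 3))
print(tuple(round(bound, 3) for bound in interval))
```

In the thesis setting the input would be the effective levels of all recovery contexts assigned to one level of a RIF, which is why a provisional cut-off point is needed before µ and σ can be computed.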
To overcome this, three different interval values or three different cut-off points
(assumed based upon the initial distribution of data) are tested. For example, when
assessing the cut-off points between levels of RIF5, three different values between
Level 1 and Level 2 have been tested (namely Fit 1, Fit 2, and Fit 3 in Figure 8-6).
7 Corresponds to the symmetrical distribution of levels around the values of 1, 2 and 3, but also
to the large number of observations.
Figure 8-6 Distribution fitting for the three cut-off points on the example of RIF5 Level 1⁸
The normal distribution parameters, as presented in Table 8-9, show no difference
between the distributions of RIF5 Level 1 data when the first and second cut-off points are
applied. However, the use of the third cut-off point yields a different distribution. This
is expected as the third cut-off incorporates data which show an increased frequency at
the value of 1.8 (see Figure 8-7 and Table 8-9). Based on this, Fit 1 and Fit 2,
corresponding to cut-off points 1.6 and 1.7 respectively, are taken forward. However, it
is necessary to determine which of these two values will be taken as the final cut-off point.
Table 8-9 Descriptive statistics for the three cut-off points on the example of RIF5 Level 1
RIF5 Level 1   Cut-off point used   Mean   Standard deviation   Standard error on the mean
Fit 1          1.6                  1.18   0.17                 4.59E-05
Fit 2          1.7                  1.18   0.17                 4.65E-05
Fit 3          1.8                  1.19   0.19                 5.11E-05
In order to precisely determine the optimal cut-off point, it is necessary to apply a
polynomial function to the data between the mean values for Level 1 and Level 2 and
determine the minimum of that function. The polynomial function minimum rounded to
the first decimal should indicate the cut-off point (either 1.6 or 1.7). Table 8-10 presents
three different polynomial functions applied to distribution of RIF5 Level 1 and Level 2
8 Probability density function approach represents distributions so that the sum of the areas of
the rectangles equals 1.
data. The calculation of the function minimum⁹ shows that regardless of the type of
polynomial function, the local minimum corresponds to the cut-off point at 1.7 (Table 8-10).
The fit of a cubic polynomial function to RIF5 Level 1 data is presented in Figure 8-7.
Since Table 8-9 shows that the choice between cut-off points 1.6 and 1.7 makes no
significant difference, and since the function minimum is closer to the value of 1.7, this
value is taken forward as the cut-off point between RIF5 Level 1 and Level 2.
Table 8-10 Local minimums of polynomial functions
Polynomial function   f(x)                                                            Local minimum
Quadratic             1E07(1.3472x² - 4.5848x + 3.9200)                               1.7016
Cubic                 1E07(-0.5613x³ + 4.2097x² - 9.3510x + 6.5076)                   1.6653
Quartic               1E08(-0.1785x⁴ + 1.1574x³ - 2.6289x² + 2.4203x - 0.7121)        1.6756
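The local minima in Table 8-10 can be checked from the fitted coefficients alone. The sketch below is illustrative and uses hypothetical helper names. The scale factors 1E07 and 1E08 are omitted because they do not move the minimum, and the quartic is assumed to have a positive cubic coefficient (+1.1574), which is the sign that reproduces the tabulated minimum.

```python
import math


def quadratic_min(a, b, c):
    """Vertex of a*x^2 + b*x + c for a > 0 (c does not affect the location)."""
    return -b / (2 * a)


def cubic_local_min(a, b, c, d):
    """Root of f'(x) = 3a*x^2 + 2b*x + c at which f''(x) > 0."""
    A, B, C = 3 * a, 2 * b, c
    root_disc = math.sqrt(B * B - 4 * A * C)
    roots = [(-B + sign * root_disc) / (2 * A) for sign in (1, -1)]
    return next(r for r in roots if 2 * A * r + B > 0)


def bisect_root(g, lo, hi, tol=1e-10):
    """Bisection for a sign change of g on [lo, hi]."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (g(mid) > 0) == (g_lo > 0):
            lo, g_lo = mid, g(mid)
        else:
            hi = mid
    return (lo + hi) / 2


def quartic_local_min(a, b, c, d, e, lo, hi):
    """Root of f'(x) on [lo, hi] where f' goes from negative to positive."""
    return bisect_root(lambda x: 4 * a * x**3 + 3 * b * x**2 + 2 * c * x + d, lo, hi)


# Search window for the quartic: between the Level 1 and Level 2 peaks.
print(round(quadratic_min(1.3472, -4.5848, 3.9200), 4))
print(round(cubic_local_min(-0.5613, 4.2097, -9.3510, 6.5076), 4))
print(round(quartic_local_min(-0.1785, 1.1574, -2.6289, 2.4203, -0.7121, 1.2, 2.2), 4))
```

All three minima round to 1.7 at one decimal place, which is the cut-off point taken forward in the text.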
Figure 8-7 Cubic polynomial function f(x) fitted for the RIF5 data to determine its minimum (plot of Frequency against Level for levels 1.2 to 2.2; f(x) = 1E07(-0.5613x³ + 4.2097x² - 9.3510x + 6.5076))
Similarly, the value of 2.7 is taken as a cut-off point between Level 2 and Level 3 (see
Table 8-11). Using the same methodology, the cut-off points are determined for all
RIFs and their corresponding levels. The established values are reported in Table 8-11.
Table 8-11 Cut-off points between the levels for all RIFs
RIF ID   Cut-off point between Level 1 and Level 2   Cut-off point between Level 2 and Level 3
1        1.5                                         2.5
2        1.5                                         N/A
3        N/A                                         2.5
4        1.7                                         2.7
5        1.7                                         2.7
6        N/A                                         2.5
7        1.5                                         2.5
8        N/A                                         2.5
9        2.2 (single cut-off point)
10       1.5                                         2.5
11       N/A                                         2.5
12       1.5                                         2.5
13       2.0 (single cut-off point)
14       1.5                                         2.5
15       2.0 (single cut-off point)
16       1.5                                         2.5
17       N/A                                         2.5
18       1.6                                         2.6
19       N/A                                         2.5
20       N/A                                         2.7
9 In the case of quartic polynomial functions, it is necessary to specify which local minimum is meant (the
first derivative of this polynomial function has up to three roots and thus the function potentially has two minimums).
8.6 Specific effects of RIFs on controller recovery performance (Step 5)
While the previous section identified the cut-off points between consecutive levels of
each RIF, it is necessary to quantify the relationship between the particular level of a
RIF and its impact on controller recovery performance. This relationship has
already been defined qualitatively in Chapter 7 through the definition of the qualitative
descriptor. In short, Level 1 corresponds to the most desirable level, Level 2 to the
tolerable or average level, whilst Level 3 corresponds to the least desirable level in the
context of controller recovery performance.
To begin examining the quantitative impact of each RIF level on controller
recovery performance, a correlation coefficient is proposed. This correlation
coefficient is defined as: +1.00 corresponding to Level 1 (high positive relationship),
0.00 corresponding to Level 2 (no relationship), and -1.00 corresponding to Level 3
(high negative relationship). This approach is in line with the approach presented in
Oren and Ghasem-Aghaee (2003), who also introduced a correlation coefficient as an
indicator of the relationship between the factors that define a personality (e.g.
openness, extroversion) and different personality types.
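The level-to-coefficient mapping just described is small enough to state as a lookup. The names are hypothetical; the values are those given in the text.

```python
# Correlation coefficient between a RIF level and controller recovery performance.
CORRELATION_BY_LEVEL = {1: +1.0, 2: 0.0, 3: -1.0}


def correlation_coefficient(level: int) -> float:
    """Return +1 for Level 1, 0 for Level 2 and -1 for Level 3."""
    try:
        return CORRELATION_BY_LEVEL[level]
    except KeyError:
        raise ValueError(f"unknown RIF level: {level!r}") from None


print([correlation_coefficient(level) for level in (1, 2, 3)])  # [1.0, 0.0, -1.0]
```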
Once the relevant RIFs and their corresponding levels have been defined and linked to
the controller recovery performance, the next step is to present the recovery context as
a function of all contextual factors, their interactions, and their impact on controller
recovery performance. The following section presents the definition of the recovery
context via the recovery context indicator.
8.7 Calculation of the recovery context indicator (Step 6)
Based on the determination of the boundaries between consecutive levels for each RIF,
it is possible to proceed with the re-calculation of RIF probabilities and the
determination of the numerical indicator of each recovery context (i.e. recovery context
indicator - Ic). These are presented in the following sections.
8.7.1 Re-calculation of RIF probabilities
The main task at this stage is to re-calculate the probabilities that correspond to more
realistic (effective) levels resulting from the incorporation of all RIF interactions. The
previous example of one randomly chosen recovery context showed that RIF5 changed
from Level 1 (Table 8-5) to a new effective level (0.89; Table 8-8). Therefore, if the
probability of RIF5 at Level 1 is 0.73 (see Table 8-12), then it is necessary to determine
the probability of the new, effective level 0.89.
Table 8-12 Probabilities for the RIF5 and each of its levels (see Appendix X)
RIF5: Communication for recovery within team/ATC Centre
Qualitative descriptor   Level   p(L)
Efficient                1       0.73
Tolerable                2       0.24
Inefficient              3       0.04
The way to approach this problem is firstly to determine all recovery contexts for which
RIF5 is represented via Level 1. In other words, it is necessary to determine the
number of recovery contexts for which the RIF5 level is smaller than or equal to the cut-off
point between Levels 1 and 2 (i.e. 1.7, Table 8-11). This is presented in equation 8-5
below:
\[
RIFX_j = \sum_{j'} RIFX_{j'} =
\begin{cases}
RIFX_{1} = \sum_{j'=0}^{C_{1,2}} RIFX_{j'}, & j = 1,\; 0 < j' \le C_{1,2} \\
RIFX_{2} = \sum_{j'=C_{1,2}}^{C_{2,3}} RIFX_{j'}, & j = 2,\; C_{1,2} < j' \le C_{2,3} \\
RIFX_{3} = \sum_{j'=C_{2,3}}^{4} RIFX_{j'}, & j = 3,\; C_{2,3} < j' \le 4.0
\end{cases}
\qquad \text{(8-5)}
\]
where
X represents different contextual factors, X= 1,2,3…,20;
j represents a level of RIFX and can take the values of 1, 2 or 3;
j’ represents a level of RIFX after incorporation of interactions where 0.0 ≤j’≤4.0;
C_{j,j+1} represents the cut-off point between Levels j and j+1.
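For example, for RIF5 (Table 8-11):
\[
\begin{array}{lll}
j = 1, & C_{j,j+1} = C_{1,2} = 1.7, & 0 < j' \le 1.7 \\
j = 2, & C_{j,j+1} = C_{2,3} = 2.7, & 1.7 < j' \le 2.7 \\
j = 3, & C_{j,j+1} = \text{N/A}, & 2.7 < j' \le 4.0
\end{array}
\]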
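The assignment of an effective level j' to a nominal level j can be sketched as a simple threshold test. This illustration covers only a RIF with three levels and two cut-off points, such as RIF5; the function name is hypothetical, and an effective level of exactly 0.0 is treated as Level 1.

```python
RIF5_CUTOFFS = (1.7, 2.7)  # C(1,2) and C(2,3) for RIF5, from Table 8-11


def nominal_level(effective_level, cutoffs):
    """Map an effective level j' in [0.0, 4.0] to Level 1, 2 or 3."""
    c12, c23 = cutoffs
    if not 0.0 <= effective_level <= 4.0:
        raise ValueError("effective level must lie between 0.0 and 4.0")
    if effective_level <= c12:
        return 1
    if effective_level <= c23:
        return 2
    return 3


print(nominal_level(0.89, RIF5_CUTOFFS))  # 1
print(nominal_level(2.74, RIF5_CUTOFFS))  # 3
```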
Secondly, it is necessary to determine the subset of recovery contexts which corresponds
to the newly determined level (i.e. 0.89). These are all recovery contexts having a RIF5
level in the range (0.8, 0.9]. It should be noted that level 0.89 represents the value of the
RIF5 level for one specific recovery context. Finally, the probability of the new level is
calculated as follows (equation 8-6):
\[
p(RIFX_{j'}) = p(RIFX_j) \times \frac{f(RIFX_{j'})}{f(RIFX_j)}
\]
\[
p(RIF5_{0.89}) = 0.73 \times \frac{f(RIF5_{0.89})}{f(RIF5_1)} = 0.73 \times \frac{1{,}008{,}576}{13{,}476{,}924} = 0.055 \qquad \text{(8-6)}
\]
where
X represents different contextual factors, X= 1,2,3…,20;
j represents levels 1, 2, or 3;
f represents the sum of all possible recovery contexts;
p (RIF5 j) represents initial probability of occurrence of RIF5 for level j;
p (RIF5 j’) represents probability of occurrence of RIF5 for its new level j’;
f (RIF5 j’) represents the sum of levels for 0.89 < j’ ≤ 0.90; and
f (RIF5 j) represents the sum of all levels that correspond to the RIF5 Level 1
(i.e. 0.0 < j’ ≤ 1.7).
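The re-calculation in equation 8-6 is a proportional rescaling, sketched below with the RIF5 figures quoted above. The function and argument names are hypothetical.

```python
def effective_level_probability(p_level, contexts_at_effective_level, contexts_at_level):
    """Scale the initial level probability by the share of contexts at level j'."""
    if contexts_at_level <= 0:
        raise ValueError("the nominal level must contain at least one context")
    return p_level * contexts_at_effective_level / contexts_at_level


p_new = effective_level_probability(0.73, 1_008_576, 13_476_924)
print(round(p_new, 3))  # 0.055
```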
The new probability of occurrence (0.055) is low in magnitude, but represents an
occurrence with a high probability of recovery. In other words, in this particular
context, RIF5 is enhanced by the influence of all the other RIFs that have interaction
with it. The final output of this methodology is the indicator of a specific recovery
context (Ic), as presented in equation 8-7. The characteristics of Ic are that, for
example, in the case of all 20 RIFs defined via Level 1 with the probability 1 and no
interactions, the value of Ic equals 1. Similarly, in the case of all 20 RIFs defined via
Level 3 with the probability 1 and no interactions, the value of Ic equals -1.
\[
I_c = \frac{\displaystyle \sum_{\substack{i=1 \\ \text{RIFs, 3 levels}}}^{20} \sum_{j=1}^{3} p(RIFX_{j'}) \times R_j \;+\; \sum_{\substack{i=1 \\ \text{RIFs, 2 levels}}}^{20} \sum_{j=1}^{2} p(RIFX_{j'}) \times R_j}{N} \qquad \text{(8-7)}
\]
where
p(RIFX_{j'}) is the probability of RIFX with level j', where X = 1, 2, 3, …, 20 and 0.0 ≤ j' ≤ 4.0. The level j' takes into account all interactions between RIFs;
R_j is the correlation coefficient between RIFX and controller recovery performance. Depending upon level j', it can take values {-1, 0, +1};
N is the total number of RIFs (i.e. 20); and
p(RIFX_{j'}) × R_j is the probability of the overall situation occurring in one ATC Centre. In order to look at the quantitative impact that each RIF has on the controller recovery performance, each of the probabilities has to be multiplied by the correlation coefficient.
All calculations relevant to the quantitative assessment of the recovery context
conducted in this thesis are performed using the standard C programming language.
8.7.2 Distribution of the recovery context indicator
The recovery context indicator (Ic) is the numerical representation of a specific
context that surrounds controller recovery from an ATC equipment failure. For
example, changes in the factors that constitute the recovery context (i.e. 20 RIFs),
captured via the change of their qualitative levels, interactions, and effect on controller
performance, are reflected in the change of the Ic magnitude. In practical terms, this
change facilitates better or worse controller recovery.
After the calculation of all 40,310,784 possible contexts it was determined that the
mean value of the recovery context indicator (Ic) is 0.027, ranging between -0.069 and
0.131. The distribution of the Ic variable is presented in Figure 8-8.
Figure 8-8 Distribution of the recovery context indicator (plot of Frequency against Recovery context indicator (Ic))
This distribution is slightly positively skewed (right-skewed) since it has a longer tail in
the positive direction relative to the other tail. This is also confirmed by the positive
value of the statistical test indicating the concentration of values on the left side of the
distribution. The median value or value on the horizontal axis which has exactly 50
percent of the data on each side is -0.023. This positive skew may result from initial
inputs into the methodology for the quantitative (probabilistic) assessment of the
recovery context surrounding equipment failure in ATC. For example, observing the
probability values for each RIF and its corresponding levels it is clear that 12 out of 20
RIFs have a higher probability of enhancing recovery performance as opposed to
having no impact or negative impact. In other words, the probabilities of Level 1 for
these 12 RIFs are higher than for other level(s) (i.e. Level 2 and Level 3, see Appendix
X for details on RIF probabilities). Therefore, it can be concluded that, for the
‘generic’ ATC Centre, the framework yields a recovery context indicator close to
0.027. This indicates that there is a large potential for improvement and for a shift
of the Ic values towards the positive side, thus enabling more appropriate contextual
conditions.
In order to fully comprehend the characteristics of Ic, the next step is to calculate the
extreme values of Ic, from the most negative towards the most positive value of Ic. In
other words, it is necessary to determine the ‘ideal’ recovery context where all RIFs can
be expressed via Level 1¹⁰. Similarly, it is necessary to determine the ‘worst’ possible
recovery context where all RIFs can be expressed via Level 3¹¹. In these cases, when
there is no uncertainty related to the probabilities of each RIF’s level, it is possible to
represent the most negative and the most positive recovery context.
Hence, the most negative value of Ic calculated using equations 8-6 and 8-7 takes the
value of -0.95. This value represents the worst possible recovery context for
controller recovery performance in the ‘generic’ ATC Centre. Similarly, the
most positive value of Ic calculated using the same equations is 0.65. These two
values are numerical representations of two extreme recovery contexts which are
mutually exclusive. However, these extreme values may be used as a good indicator of
the scale of changes that are possible to achieve within the ATC environment.
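The two extreme values can be reproduced with a short sketch of the indicator for the case without uncertainty. Several assumptions apply: every chosen level has probability 1, the normaliser is the number of RIFs (20), the correlation coefficients are +1, 0 and -1 for Levels 1, 2 and 3, and the RIFs lacking a Level 1 or a Level 3 are those listed in the accompanying footnotes. All names are hypothetical.

```python
N_RIFS = 20
NO_LEVEL_1 = {3, 6, 8, 11, 17, 19, 20}  # take Level 2 in the 'ideal' context
NO_LEVEL_3 = {2}                        # takes Level 2 in the 'worst' context
CORRELATION = {1: +1, 2: 0, 3: -1}


def context_indicator(levels, probabilities=None):
    """Sum of p * R over all RIFs, divided by the number of RIFs."""
    if probabilities is None:
        probabilities = {rif: 1 for rif in levels}  # no uncertainty
    total = sum(probabilities[rif] * CORRELATION[level] for rif, level in levels.items())
    return total / N_RIFS


ideal = {rif: (2 if rif in NO_LEVEL_1 else 1) for rif in range(1, N_RIFS + 1)}
worst = {rif: (2 if rif in NO_LEVEL_3 else 3) for rif in range(1, N_RIFS + 1)}

print(context_indicator(ideal))  # 0.65
print(context_indicator(worst))  # -0.95
```

Thirteen RIFs at Level 1 give 13/20 = 0.65 and nineteen RIFs at Level 3 give -19/20 = -0.95, matching the values quoted above.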
8.7.3 Sensitivity analysis
Because of the large number of recovery contexts (millions) it is reasonable to use the
assumption of normality in accordance with the central limit theorem (Berenson et al.,
2006). When the data set is large, the sampling distribution of the mean is
approximately normally distributed. Using this assumption, it is possible to carry out an
analysis of the sensitivity of Ic to changes in any one recovery influencing factor.
The first step is to determine an interval around the baseline (population) mean that
includes 95 percent of the sample means or µ±2σ. According to the statistics presented
in Table 8-13 this range is 0.027+/-0.058. The second step is to implement a particular
change and test whether the sampled recovery context indicator comes from the same
population. As an example, it is assumed that the ‘training for the recovery’ provided to
air traffic controllers includes the equipment failure in question. Therefore, since there
are no uncertainties, this RIF can be defined exactly via Level 1 and its corresponding
probability (p=1). Sample statistics are presented in Table 8-13.
10 RIF3, RIF6, RIF8, RIF11, RIF17, RIF19, and RIF20 do not have the possibility of Level 1 and thus these will take the next most desirable level, being Level 2.
11 RIF2 does not have the possibility of Level 3 and thus it will take the next most undesirable level, being Level 2.
Table 8-13 Sensitivity analysis
Step change                           N            M       SD      Baseline mean range
Baseline                              40,310,784   0.027   0.029   (-0.031, 0.085)
Sample 1 (change of RIF1)             13,436,928   0.061   0.035
Sample 2 (change of RIF1 and RIF2)    6,718,464    0.091   0.023
With suitable training for the situation in question (e.g. a particular failure type) there is
no significant difference between the sample and baseline means but it is observable
that the value of Ic shifts toward a more positive value. Therefore, a second sample
was taken, assuming additionally that RIF2 or ‘experience with equipment failure’
matches precisely the equipment failure in question. In other words, RIF2 can be
defined exactly via Level 1 and its corresponding probability (p=1). The result of this
analysis shows that there is a significant change in the recovery context, since the
obtained mean does not fit the 95 percent confidence interval determined for the
baseline. Therefore, the enhanced recovery context (sample 2) comes from a
population different from the baseline recovery context. This finding indicates that the
value of Ic is sensitive to changes in the individual RIFs.
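The comparison described above can be reproduced from the summary statistics in Table 8-13. The following is a minimal sketch, not the procedure used in the thesis: the function and variable names are illustrative assumptions, and the only inputs are the published means, standard deviations, and sample sizes. The number of levels attributed to RIF1 and RIF2 is inferred here from the ratios of the sample sizes and is not stated in this form in the text.

```python
# Sketch of the sensitivity check, using only the figures in Table 8-13.
# All names are illustrative assumptions, not part of the thesis method.

BASELINE_MEAN = 0.027
BASELINE_SD = 0.029

def baseline_range(mean, sd, k=2.0):
    """Return the (lower, upper) bounds of mean +/- k * sd."""
    return (mean - k * sd, mean + k * sd)

def outside_baseline(sample_mean, bounds):
    """Return True when a sample mean falls outside the baseline range."""
    lower, upper = bounds
    return not (lower <= sample_mean <= upper)

bounds = baseline_range(BASELINE_MEAN, BASELINE_SD)

sample_means = {
    "Sample 1 (RIF1 fixed)": 0.061,
    "Sample 2 (RIF1 and RIF2 fixed)": 0.091,
}
results = {name: outside_baseline(m, bounds) for name, m in sample_means.items()}

# Fixing one RIF at a single level divides the number of level
# combinations by the number of levels that RIF could otherwise take.
N_BASELINE = 40_310_784
N_SAMPLE_1 = 13_436_928
N_SAMPLE_2 = 6_718_464
levels_removed_rif1 = N_BASELINE // N_SAMPLE_1  # inferred levels of RIF1
levels_removed_rif2 = N_SAMPLE_1 // N_SAMPLE_2  # inferred levels of RIF2

for name, is_outside in results.items():
    print(name, "outside baseline range" if is_outside else "within baseline range")
```

Under these figures, only the second sample falls outside the baseline range, which agrees with the interpretation given in the text.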
8.7.4 Optimal solutions
The methodology for the quantitative assessment of the recovery context presented in
the previous sections allows for the investigation of the recovery context in a particular
ATC Centre as well as for a particular equipment failure event. Furthermore, this
approach creates a basis for quantitative assessment and the choice of optimal
solutions for recovery enhancement. These solutions should be reviewed through the
changes in RIFs, their corresponding level, and the resulting changes in the value of Ic.
Whilst not all RIFs could be enhanced, it is necessary to focus on those which may be
affected. For instance, it is reasonable to assume that internal factors have a significant
potential for change either by enhancement of training or personal abilities on a daily
basis (e.g. fatigue, health, attitude, stress). A review of the other three RIF groups
(equipment related, external, and airspace related) reveals potential areas of change
as well as factors which cannot be influenced at the level of a particular ATC Centre
but possibly at the level of a region (e.g. traffic complexity can be influenced at the
regional ATM level through the central flow management unit).
The optimal change is defined as the best ratio between the benefit and the cost of the
proposed recommendations. Benefit is defined as a shift in the RIF levels toward more
desirable Level 2 (average) or Level 1 (most favourable) and an overall shift in the
recovery context indicator (Ic) towards more positive values (e.g. extreme positive
value). The cost should be defined through the inherent costs linked to the proposed
recommendation and therefore, should include actual rather than generic costs of the
proposed change within the specific ATC Centre. Thus the cost may include the
following:
• costs of technical changes, followed by any other operational costs (delay in the
use of new system due to necessary maintenance, staff training);
• costs of designing a new procedure, followed by the cost of training the staff (i.e.
time and resources);
• cost of additional Team Resource Management (TRM) training;
• costs of creating a more adequate organisational environment. Examples are
improvements in terms of roles and responsibilities, the availability of team
members, the adequacy of supervision, the availability of additional support (e.g.
assistant), the personnel selection process, shift patterns and personnel planning,
attitude to teamwork, safety culture, stress management programs, support for
the organised exchange of past experience on non-nominal events,
communication with management and technicians (e.g. briefings, exchange of
knowledge, bulletins, safety panels); and
• the costs of any potential changes in airspace design.
The methodology presented in this thesis is able to provide the benefit of each
proposed solution. However, the evaluation of the related costs, as opposed to the
benefit, is not so straightforward and would necessitate input from ATC Centres.
Therefore, another approach may be utilised to ‘rate’ the benefit of implemented
changes on the level of ATC Centre, namely by the calculation of the ‘recovery context
efficiency’. This variable represents the ratio between the value of current recovery
context and the value of the most positive recovery context feasible in a particular ATC
Centre.
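The ratio described above can be written as a one-line calculation. The sketch below is an illustration only: the function name is an assumption, the input values are hypothetical and are not taken from the thesis, and it is assumed that the most positive feasible value of the recovery context is greater than zero so that the ratio is meaningful.

```python
def recovery_context_efficiency(current_ic, best_feasible_ic):
    """Ratio of the current recovery context value to the most positive
    value feasible in a given ATC Centre (assumed here to be positive)."""
    if best_feasible_ic <= 0:
        raise ValueError("best_feasible_ic is assumed to be positive")
    return current_ic / best_feasible_ic

# Hypothetical values chosen purely for illustration.
efficiency = recovery_context_efficiency(current_ic=0.25, best_feasible_ic=0.5)
print(efficiency)
```

A value close to 1 would indicate that the Centre is already operating near the most positive recovery context available to it, leaving little room for further enhancement.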
8.8 Summary
This Chapter has presented a methodology for the quantitative assessment of recovery
context. It started by reviewing the past HRA research of relevance to the quantitative
analysis of contextual factors. This has resulted in the selection of the CREAM
technique and its application by Kim, Seong, and Hollnagel (2005) for further
development. Building on this, a novel methodology has been developed for the
research presented in this thesis. This method assessed controller recovery
performance based on 20 relevant contextual factors (RIFs) and through several
distinct steps. Each RIF and its corresponding levels have been probabilistically
determined using four sources of information. These are operational failure reports,
questionnaire survey, input from eight ATM specialists, and past ATM related literature.
The methodology has further built on this and incorporated RIF interactions. This has
resulted in the change of the RIF levels and re-calculation of the corresponding
probabilities. The outcome of the entire methodology is the definition of the recovery
context indicator (Ic), as a numerical representation of a specific context surrounding
recovery from equipment failure in ATC. Ic is sensitive to the RIF changes and as such
may be used to investigate solutions to enhance the controller recovery. In other words,
the benefits of any safety-relevant changes in ATC Centres may be quantitatively
assessed in two separate ways. Firstly, the benefit can be assessed as a shift in the
distribution of the recovery context indicator from the baseline (pre-change) value to
the new value (as a result of implemented changes). Secondly, it is possible to
calculate the recovery context efficiency, i.e. the ratio between the current value of the
recovery context and its most positive value achievable within the particular ATC Centre.
After the review of the methodology for the quantitative assessment of recovery context
in a specific ATC environment, the following Chapter 9 describes an experimental
investigation designed to further verify the proposed methodology.
Chapter 9 Experimental Investigation
9 Experimental Investigation of the Air Traffic Controller Recovery Performance
After the review of the methodology for the quantitative assessment of the recovery
context in the previous Chapter, this Chapter describes an experiment designed to
further validate the proposed methodology and capture the controller recovery
performance. This Chapter begins with a high-level design for the process adopted for
the experiment. This is followed by the rationale behind the need for the experiment
defined through several objectives. In order to achieve these objectives, this Chapter
describes the overall design of the experiment and selection of potential equipment
failures initially tested in a pilot study. It continues by providing the key requirements for
the experiment of relevance to this thesis, measured variables, and experimental
procedure.
Both the pilot and the main experiment were conducted in close collaboration with one
European Civil Aviation Authority (CAA)1. This particular CAA provided all of the
necessary infrastructure and staff from two ATC Centres during the period of the
experiment in 2005 and 2006. One ATC Centre was used for the pilot study which
tested the feasibility of the experimental design and its overall methodology. The other
ATC Centre was used on three separate occasions to simulate a selected unexpected
equipment failure in order to capture data on the recovery performance of 30 licensed
air traffic controllers. The Chapter concludes with a discussion of measured variables
used to capture the characteristics of controller recovery in ATC. The data collected is
subjected to a rigorous analysis in Chapter 10.
1 This CAA performs the function of Air Navigational Service Provider (ANSP) and the term CAA
will be used to denote also ANSP in the remainder of this thesis.
9.1 High-level design of the experimental process
Figure 9-1 below indicates the steps of organising and conducting this experiment. The
process starts with the rationale behind the need for an experiment designed to capture
controller recovery performance. It proceeds with the assessment of available
resources, with a focus on two key requirements, namely access to an ATC simulator
and the participation of controllers. Once these requirements had been assured, the
experimental process proceeded with the initial planning and design of the experiment
(i.e. airspace and traffic scenario, equipment failure type). Once this design had been
tested in a pilot study, the experimental process proceeded with the main experimental
study. The collected data were pre-processed and subjected to a rigorous analysis to extract
information on controller recovery in an operational environment (presented in
Chapter 10).
[Figure 9-1: flow diagram. Boxes: Rationale for the experiment; Assessment of the available resources; Planning for the experiment; Design of the experiment; Selection of the equipment failure; Pilot study; Revision of the pilot study (in case of necessary changes); Main experimental study; Data processing and analysis]
Figure 9-1 The flow diagram of the experimental process
9.2 Rationale for the experiment
The preceding Chapters presented a detailed overview of equipment failure
occurrences in the ATC environment from both technical and human perspectives. The
findings from past literature were augmented by operational failure reports (capturing
the technical aspect of equipment failures) and feedback from an international
questionnaire survey (capturing both technical and human aspects of equipment
failures). Furthermore, factors relevant to controller recovery were identified using both
theoretical and operational findings. These factors, referred to as Recovery Influencing
Factors (RIFs), created a basis for the quantitative assessment of the recovery context.
This Chapter builds on the preceding Chapters and generates ‘real’ operational data on
controller recovery. These data are further used in Chapter 10 to verify the quantitative
assessment of the recovery context developed in Chapter 8 and the relevance of RIFs
identified in Chapter 7.
9.3 Assessment of the available resources
An assessment of the requirements and necessary resources for the experiment
highlighted the need to perform it either at an ATC Centre or at an appropriately
equipped research institution. The critical requirements of the experimental design can be
grouped into two categories: access to an ATC simulator
and the availability of licensed controllers. Based on these requirements, several
potential locations were assessed:
• The Maastricht Upper Area Control Centre (MUAC) in the Netherlands. This is a
EUROCONTROL operational and simulation facility having the resources to support
both access to simulators and controllers;
• Human Factors Lab at the EUROCONTROL Experimental Centre (France),
providing access to simulators but not controllers;
• The CEATS Research, Development and Simulation (CRDS) Centre in Budapest
(Hungary). This is a EUROCONTROL facility providing access to simulators but not
controllers; and
• Various Civil Aviation Authorities (CAAs), air navigational service providers
(ANSPs) and their respective ATC Centres providing access to both simulation
facilities and controllers.
Although the requirements for an experimental plan were ready at the initial stage of
the research, it took two years to gain access to the required facilities. After
considerable negotiations with all potential locations, only one CAA responded
positively and agreed to provide both simulation facilities and staff for this experiment.
Both the pilot and the main study were conducted using their facilities, assistance, and
manpower.
9.4 Planning for the experiment
The review of the relevant literature, presented in Chapter 5, revealed that there is a
lack of detailed knowledge of how controllers perform during unexpected or unusual
situations (including equipment failures). This is partly due to the fact that there is no
relevant data available in the public domain2. This necessitated the design of an
experiment in this thesis to capture and exploit the relevant data.
As a result of close academic cooperation, one European CAA gave Imperial College
London the opportunity to plan, prepare, and run an experiment designed to study the
factors that drive the process that controllers follow to recover from ATC equipment
failures. This experiment was conducted in two phases (see Table 9-1). The first phase
involved a pilot study designed to test the feasibility of the experimental plan including
the appropriateness of the recovery methodology, serviceability of the equipment, and
clarity of the instructions to the participating controllers working in the ATC Centre. The
results of the pilot study were used to enhance the plan for the main experiment. The
second phase of the study involved the execution of the main experiment where data
was collected for further analysis. A secondary objective was to assess and augment
the existing emergency training procedures as defined by this particular CAA in their
Manual of Air Traffic Services (MATS).
The planned experiments assumed a level of knowledge (on the part of the researcher)
necessary to fully comprehend the recovery process, in terms of the reactions and
actions of the controller in dealing with unexpected equipment failure. For this reason, it
was essential to acquire certain skills before running the actual experiments. To
achieve this objective, practical simulator training was completed by the researcher
prior to the execution of the main experiments (Table 9-1). The scheduled training was
preceded by a review of relevant ATC topics in order to prepare efficiently for practical
work on the simulator. The relevant areas covered were ATC phraseology, operational
procedures, equipment, radar vectoring, speed control, level busts, and aircraft
performance.
2 Some research was done in the UK National Air Traffic Services (NATS), but was not released for public use.
Table 9-1 Training, pilot study, and experiment sessions
Date | Phase | Objective | Comment
19-20 Feb 2005 | Planning for the experiment | Basic training for the ab initio student, APP training | Total of 10h training on simulator
26-27 Feb 2005 | Planning for the experiment | APP training (arrivals and departures sequencing, radar vectoring) | Total of 10h training on simulator
02 Nov 2005 | Phase I | Pilot study | Total of three controllers participated
29 Nov – 01 Dec 2005 | Phase II | Main study I | Total of eleven controllers participated
27 Feb – 02 Mar 2006 | Phase II | Main study II | Total of ten controllers participated
06 Jun – 09 Jun 2006 | Phase II | Main study III | Total of ten controllers participated
9.5 Design of the experiment
Since equipment failures are rare events3, the experiment aimed to represent failure in
its most realistic form, i.e. as an unexpected event. To ensure that the failure occurred as
an unexpected event, each controller participated only once in the experiment. The
experiment also assumed a single-controller ACC sector (as opposed to a team of
controllers) to allow best utilisation of available ATC staff and to lessen any logistical
difficulties. Before the experiment, controllers were to be informed of the objectives of
the study in highly generic terms. They were to be given the opportunity to ask specific
questions in the post-experiment debriefing session. Additionally, to assure the
discretion and confidentiality of this study, each participant was to be required to sign a
consent form which incorporated an agreement not to disclose any information
regarding this experiment. In this way, the true objective of the experiment, i.e. the
injection of the unexpected and unforeseen equipment failure, was preserved.
3 Most of the failures in the ATC environment are prevented or handled at the
technical/engineering level. Only a few failures manage to penetrate multiple redundancies and fail-safe system design and affect controller performance.
The experiments were to be conducted during morning and afternoon sessions, with
participants tested in equal proportions across the two sessions. The
simulation room conditions (lighting, temperature, noise) were to be consistent for all
runs.
Each simulation run was planned to last approximately 30 minutes, followed by a
debriefing session of similar duration. The instant of the injection of equipment failure
was planned to be precisely determined during the pilot study, occurring between the
5th and 15th minute of each run. The equipment failure would last 15 minutes. This was
decided based on two factors. Firstly, operational data shows that the majority of
failures last up to 15 minutes (Chapter 4 section 4.4.6). This has been confirmed by the
questionnaire survey results (presented in Appendix VI). Secondly, the 15 minute
duration of failure represents enough time to observe, capture, and assess the
controller reactions, performance, and overall recovery strategy.
The selection of the equipment failure to be simulated in the pilot study was based on
the results of the analysis of operational failure reports, the qualitative equipment
failure impact assessment tool, and the results of the questionnaire survey. However,
this selection was constrained by the technical capabilities of the available simulation
platform. In other words, it was important to simulate failure as well as the restoration of
the relevant equipment. Thus, the simulator platform would have to provide this
particular capability for a selected failure type. The final decision on the equipment
failure to be simulated would be achieved after testing candidate failure types during
the pilot study. The detailed rationale behind the selection of potential equipment
failures for the pilot and main experiment is given in the following section.
Another important factor of the experiment was the involvement of a Subject Matter
Expert (SME). The role of the SME would be to act as an observer and the coordinator
of the operations room. Upon a request from a controller, the SME would be
responsible for issuing any relevant information about the failure and its effect on the
ATC Centre (as would be required in the operational environment upon receiving an
update from the system control and monitoring unit). Upon restoration of the
equipment, there are several steps that controllers must perform to assure equipment
reliability and hence its readiness for the restoration of normal service (i.e. post-
restoration steps). Therefore, additional time would be given to controllers in the post-
restoration part of the simulation run, from the 25th to the 30th minute of each run. This
is to restore a normal working strategy after the effects of an unexpected equipment
failure.
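The timing constraints of a simulation run described in this section can be summarised in a short sketch. It is an illustration under stated assumptions rather than part of the experimental design: the names are hypothetical, the run is taken to last exactly 30 minutes, and the injection at the 10th minute is assumed because, with a 15 minute failure, it reproduces the post-restoration window from the 25th to the 30th minute.

```python
# Planned structure of one simulation run (all times in minutes).
RUN_LENGTH_MIN = 30           # approximate run length stated in the text
FAILURE_DURATION_MIN = 15     # planned duration of the simulated failure
INJECTION_WINDOW = (5, 15)    # failure to be injected in this interval

def run_timeline(injection_minute):
    """Return the (start, end) minutes of each phase of a run."""
    earliest, latest = INJECTION_WINDOW
    if not earliest <= injection_minute <= latest:
        raise ValueError("injection must fall within the planned window")
    restoration = injection_minute + FAILURE_DURATION_MIN
    return {
        "pre_failure": (0, injection_minute),
        "failure": (injection_minute, restoration),
        "post_restoration": (restoration, RUN_LENGTH_MIN),
    }

# Assumed injection at the 10th minute.
timeline = run_timeline(10)
for phase, (start, end) in timeline.items():
    print(f"{phase}: minute {start} to {end}")
```

The sketch also makes the trade-off visible: the later the failure is injected within the planned window, the less time remains for the post-restoration steps.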
Each simulation run would be observed by the researcher and the SME, and recorded
for the purpose of further data analysis. During each simulation run, notes would be
taken on each controller’s recovery performance and changes in attitude/behaviour
prior to and after the injection of a failure. This would enable both qualitative and
quantitative data to be captured.
The observation team would be positioned in the most unobtrusive way, still having a
clear view of the radar screen. The simulation runs would be followed by an immediate
debriefing session guided by the questionnaire and other material designed specifically
for this session. The controllers would assess all the factors that potentially influenced
their recovery performance, guided by the RIFs identified in Chapter 7. In addition, they
would be given an opportunity to judge their own performance and the credibility of the
simulated failure.
9.6 Selection of the equipment failure to be simulated
The classification of ATC system functionalities, presented in Chapter 2, identified nine
main categories. The critical subsystems, equipment, and tools were identified in each
category. This categorisation identified the number of components that could fail within
the ATC system architecture. To further assess the characteristics of equipment failure
occurrence, Chapter 4 reviewed some of the main characteristics of failures in terms of
complexity, time course of failure development, overall exposure, and impact on ATC
and ATM operations.
Further assessment of equipment failure types is presented in Chapter 4 and is based
on the detailed analysis of operational failure reports from four different countries. This
analysis shows that equipment failures dominate within the communication, navigation,
surveillance, and data processing functionalities. A subsequent analysis of the level of
severity showed that most failures that have a major impact on ATC operations occur
within the communication, surveillance, and data processing functionalities.
Furthermore, the availability of the ‘duration’ variable in one of the datasets (Country
D) enabled the identification of equipment failures lasting up to 15 minutes, which is the failure
duration feasible within this experimental set up. Failures with a major impact on ATC
operations lasting for a period of up to 15 minutes include: data exchange network,
other surveillance systems (predominantly radar link), the flight data processing
system, and air situational display (see Table 9-2).
Table 9-2 Overview of the potential equipment failures to be simulated and their inclusion in the pilot study
Source | Potential equipment failures to simulate | Qualitative equipment failure impact assessment tool rating | Adequacy for the pilot study | Comment | Testing in the pilot study
Operational failure reports (selection focused on major failures of short duration) | Data exchange network | Secondary functionality | No | It can range from moderate to minor and the selection tries to focus on major failures | -
Operational failure reports (selection focused on major failures of short duration) | Other surveillance systems (e.g. radar link) | Secondary functionality | No | | -
Operational failure reports (selection focused on major failures of short duration) | Flight data processing system | Primary functionality | Yes | - | Reduced flight plan mode
Operational failure reports (selection focused on major failures of short duration) | Air situational display | Primary functionality | Yes | Not interesting enough from the controller recovery perspective | -
Questionnaire survey | Air-ground communication | Primary functionality | Yes | - | Aircraft radio communication failure
Questionnaire survey | Primary surveillance radar | Primary functionality | Yes | Not possible to simulate failure of one radar, but only the complete loss of radar coverage | -
Questionnaire survey | Flight data processing system | Primary functionality | Yes | - | Reduced flight plan mode
Questionnaire survey | Communication panel | Primary functionality | No | Not interesting enough from the controller recovery perspective as the controller would simply change the position | -
Questionnaire survey | Ground-ground communication | Primary functionality | No | Not interesting enough from the controller recovery perspective as the controller would try to establish communication via other means | -
Furthermore, the analyses of the questionnaire survey responses in Chapter 6 (Table
9-2) identified the five most unreliable aspects of ATC equipment. These systems are:
air-ground communication, primary surveillance radar, flight data processing system,
communication panel, and ground-ground communication.
With these nine possible failure types identified, it was necessary to select candidate
failure types for a final assessment in the pilot study in order to determine the failure to
be simulated in the main experiment. The rationale for this selection was based on the
severity of the failures as determined using the qualitative equipment failure impact
assessment tool (Chapter 4, section 4.5). The development of this tool was based
around the fact that not all equipment failures have the same severity of impact on ATC
operations. This tool identified the failures with the largest impact on ATC operations.
These are failures of the primary ATC functionality, which affect multiple
systems/tools/equipment either suddenly or gradually up to one hour in duration (see
Figure 4-9 and Table 9-2).
The process above, based on operational failure reports, the questionnaire survey, and
the qualitative equipment failure impact assessment tool, identified four potential failure
types. These are the failure of the flight data processing system, air situational display,
air-ground communication, and primary surveillance radar. These four candidate failure
types were further narrowed down by assessing both their significance from the controller
recovery perspective and their technical feasibility. In other words, the focus was on the
failures which require controllers to recover using only the systems available at their
positions. As a result, the pilot study simulated two different equipment failures. These
were a reduced flight plan mode, as part of a flight data processing system failure, and
an air-ground radio communication failure.
Both failure types also conform to the requirements described in Chapter 5 (section
5.7.3) that the simulated equipment failure should allow one part of the diagnosis
phase of controller recovery to be performed overtly and thus be captured via
observations. For example, the flight data processing system failure may initially be
mistaken for an aircraft transponder or secondary surveillance radar failure.
Similarly, air-ground communication failure manifests itself in the same manner
regardless of its cause (i.e. ground- vs. airborne-based failure). In both cases, it is up to
the controller to identify the true failure by ruling out alternatives (e.g. communication
with pilot or adjacent ATC Centre) and this diagnostic process can be captured via
observations.
9.7 Pilot study: lessons learnt
Before conducting the main experiment, a pilot study was performed in order to
determine the feasibility of the experimental plan particularly with respect to the
serviceability of the equipment, ease of understanding of instructions, and logistical
issues. The study was designed to match the main experiment as far as possible.
Three controllers, selected at random and with no prior knowledge of the nature and
purpose of the experiment, participated in the study.
The pilot study was conducted on 2 November, 2005. It was part of a pre-planned
simulation, designed to test a newly restructured and reorganised airspace in the Area
Control Centre (ACC) of this particular ATC Centre. Of the three controllers who
participated in the pilot study, one was part of the airspace simulation test programme.
The others were volunteers who participated upon completion of their operational shift.
A total of three simulation runs were conducted. The first run was discarded due to the
inappropriate timing of the injection of the equipment failure.
The set up of the pilot study involved two Controller Working Positions (CWPs), with
the same simulation exercise running simultaneously on both CWPs. The participating
controller was located at one CWP, whilst the researcher and the SME occupied the
second CWP. In addition, a video camera was positioned in front of the second position
so that the controller would not be intimidated by its presence. The pilot study
simulated two equipment failures (Table 9-3) chosen based on the findings from
several sources (as discussed in section 9.6). There were no recovery procedures in
place for the first failure. The second failure has a recovery procedure defined by
international aviation organisations (see EUROCONTROL, 2003f; ICAO, 2001a), but
this procedure is not implemented within the respective ATC Centre.
Table 9-3 Equipment failures used in the pilot study
Type of failure | Effect | Existence of recovery procedure | Human Machine Interface (HMI) indication on CWP
Reduced flight plan mode – failure of flight data processing system | Monitoring aid available only for flight plan tracks already displayed. Flight data functions not available | No | General Information Window/Flight Data Processing (FDP) label changes from white to yellow
Aircraft radio communication failure | Inability of the controller to contact aircraft on the dedicated frequency as well as emergency frequency. | No (not in the ATC Centre) | None
Several important conclusions were drawn from this pilot study and the lessons learnt
were used to enhance the main experimental design. These are as follows:
• Integration of a research experiment into any kind of on-going ATC training requires
significant collaboration with training instructors, the engineer in charge, and an
ATM specialist (SME). In spite of thorough preparation, the injection of failure in the
first simulator run did not occur at the required instant due to the unclear
instructions given to pseudo pilots. This issue was corrected in the subsequent
runs. Therefore, for the main experiment a complete understanding of the set up of
the experiment would have to be ensured between the training instructor, engineer
in charge, pseudo pilots, and the SME in order to avoid any misunderstanding. This
should involve detailed discussions prior to the first simulation run of the day.
• The initial intention was to inject an equipment failure in the 25th minute of the
simulation run, in order to give the controller adequate time to adjust to the traffic
scenario. However, the first run showed that this timing was inappropriate for two
reasons. Firstly, the controllers were all very experienced and thus did not require
the proposed length of time to adjust to the traffic scenarios. Secondly, the traffic
scenarios used had a low number of aircraft in the dedicated sector from the 25th
minute onwards. This was contrary to the plan to inject an equipment failure during
the periods of average to high traffic density. Both problems were corrected by
injecting a failure in the 10th minute of the simulation run and observing the
controller recovery process while traffic increased progressively during the 30
minute runs. Since the main experiment was to use fully licensed and experienced
controllers, the exact moment of failure injection would have to be based on the
number of aircraft in the sector. The aim would be to initiate failure with traffic levels
starting with average and then progressing towards high.
• The need for access to the simulator log files was identified for the purpose of
capturing all of the inputs of the controller on the keyboard and HMI. The main
purpose for these log files would be to extract the precise reaction time of the
controller following detection of the equipment failure. However, difficulties were
encountered in the acquisition and decoding of these log files. Log files from
simulation platforms tend to have a specific format and level of detail too
cumbersome to decipher. In addition, initial detection may not necessarily be
captured in these log files (as an actual action). This is because controllers may
detect the failures but not take any action until they have evaluated the impact of
the failure on the operation. Having considered all the advantages and
disadvantages of using log files, it was decided to omit them. An alternative was
developed based on the use of a camcorder with a precise timing capability
(synchronised with the CWP timer). In addition, a debriefing session with the SME
was implemented to validate the data captured throughout the recovery processes.
The moment of detection was further validated through the results of the interviews
with the participating controllers in the debriefing session.
• The debriefing session revealed that some changes to the questionnaire used in
the debriefing session would be necessary. This would involve amending several
questions to extract more information from the participating controllers (e.g. traffic
and airspace related questions were to be presented in such a way as to extract
more detailed information on precise characteristics such as mix of traffic, vertical
movements, crossing movements, sector design, size of the sector, and number of
entry and exit points).
• Due to a staff shortage (i.e. ATM experts) and the significant duration of the
experiment (three sessions spread across 11 days), it was not possible to access
two SMEs to observe the performance of each controller.
• It was possible to define required recovery steps for the simulated equipment failure
types and thus avoid a level of variability in each simulation run (as a result of
differences in experience, working strategies, traffic complexity at the instant of
failure injection, and inconsistencies in the pseudo-pilot inputs). The required
recovery steps were validated by the SME.
• Several issues of a more technical nature were recognised: the need for a
voice recording device in the debriefing stage of the experiment as a more efficient
means of capturing the controller responses, the need for two camcorders or a
combination of one camcorder and radar replay for the debriefing session, and the
need for an 8mm tape camcorder instead of digital camcorders due to the
higher resolution achieved in recording and replay.
• Another factor of note was that the controllers tended initially to stop their work
when a failure occurred. This was because they felt this was a software
glitch/bugging error, common to real-time simulations. Therefore, the instructions
were to be updated to inform the controllers that in the case of any unusual event
they are expected to continue working as they would in the operational
environment. The experience of ATM specialists showed that although the
controllers may anticipate an unusual occurrence, this does not facilitate a better
handling of the occurrence (for evidence see Appendix II). Therefore, it was
assumed that prior warning of some unusual situation may not alter or enhance
controller recovery performance. It was more important that participating controllers
did not have advance knowledge of the nature of that unusual occurrence, i.e. ATC
equipment failure.
• Because of the great amount of data and observations to be collected, it was
realised that the main experiment would require an assistant. The primary task of
the assistant would be to observe and take notes/recordings of the controller’s overt
behaviour and attitude.
• Finally, although the simulation runs in the pilot study were designed to reflect high
traffic levels, failures were injected during a period of average to low traffic.
Additionally, no adverse weather was simulated, which would add to the complexity
of the exercise. As a result, the traffic scenario in the main experiment would
necessitate high traffic levels from the moment of failure injection throughout the
duration of the exercise. Additionally, adverse weather could be simulated resulting
in the unplanned rerouting of air traffic.
9.7.1 Summary of the findings from the pilot study
As a result of the findings from the pilot study and subsequent discussions with
technical staff and the SME, the following lessons were learnt and used to enhance the
main experimental study:
• A complete understanding of all details on the experimental set up has to be
ensured between the training instructor, engineer in charge, and the SME. In this
manner it is possible to provide a consistent injection of failure, adverse weather
conditions, and timely recordings for each simulation run of the main experiment.
This would require detailed discussions prior to the first simulation run of the day.
• In the main experiment the failure should be injected in the tenth minute of the
simulation runs, when the traffic reaches average levels and progresses towards
higher traffic levels.
• The main experimental set up would require an assistant to observe and take
notes/recordings of the controller’s overt behaviour and attitude.
• The main experimental set up should be based upon one traffic scenario with
average to busy traffic and adverse weather conditions (pseudo pilots should be
briefed to ask for rerouting due to adverse weather conditions).
• The pilot study tested two different equipment failures. Both failure types showed
the potential for the experiment. However, the flight data processing system failure
was chosen for the main experiment as it is more demanding from the controller
recovery perspective. The failure would be injected as a sudden failure in the tenth
minute of each simulation run and it would last for 15 minutes.
The following section discusses the process adopted to set up the actual experiment
including a description of the characteristics of the simulated airspace, traffic, and
equipment failure type.
9.8 Experimental set up
The main experimental study was conducted in an ATC Centre (different from the one
used in the pilot study) in three separate sessions: from November 29 to December 1,
2005, from February 27 to March 02, 2006, and from June 06 to June 09, 2006 (Table
9-1). The reason for choosing a different ATC Centre from the one used for the pilot
study was to access a larger population of controllers and the required simulation facilities.
There were several differences in the set up of the main experimental study when
compared to the pilot study. The differences are presented in the following paragraphs.
Note that the other design specifications were maintained as given in section 9.5.
The population for this experiment should consist of the controllers from the ATC
Centre where the experiment was to be carried out. The population characteristics to
be sampled in this experiment are age, operational experience (i.e. years in service),
and rating of the controllers. Based on the statistical characteristics of human (i.e.
controller) performance and potential modelling with the normal distribution, the
minimal number of simulation runs (and thus participants) would be 20 (Shier, 2004).
However, collecting a larger sample of controller recovery performance poses a
significant challenge because of accessibility (to both controllers and a simulator
facility) and other logistical problems.
As a result, the study had a total of 31 simulation runs (eleven runs in the first session
and ten runs in each of the second and third sessions) performed on the Beginning to End Skills
Trainer (BEST) simulation platform. The main study was conducted in collaboration
with various staff from the ATC Centre. They were: one ATM specialist taking the role
of the Subject Matter Expert4 (SME), technical staff supporting the simulation runs,
several pseudo pilots, and a total of 31 controllers. All three sessions were designed to
be as similar as possible in a given ATC environment.
4 The SME participating in this study is an ATM Specialist with 20 years of experience in many
facets of ATC and has 15 years of experience as an ATC instructor.
As mentioned previously, each simulation run was of approximately 30 minutes
duration, followed by a debriefing session of a similar duration. The experiment
(executed according to the timeline in Figure 9-2) used a pre-planned training exercise
modified for experimental use. After the first simulation run (which was discarded
afterwards), the exercise was amended to reproduce a busier traffic environment. In
other words, several arrivals were accelerated to achieve a busier period from the 10th
to the 25th minute of the exercise. The FDPS failure was consistently injected in the 10th
minute of each run by pseudo pilots who manually de-correlated each new radar track.
In addition, pseudo pilots were instructed to simulate adverse weather conditions en
route by asking for necessary rerouting from the controller. Weather conditions were
scheduled for the fifth and fifteenth minute of the run. The FDPS was consistently
restored in the 25th minute of each run (see Figure 9-2).
Figure 9-2 Timeline of the experiment
The recovery process did not end with the restoration of the equipment (the 25th
minute) due to several steps that the controller had to perform to assure equipment
reliability and hence the readiness for the restoration of normal service. It usually took
one minute to accomplish these post-restoration steps. Additional time was given to
controllers in the post-restoration part of the simulation run (from the 25th to the 30th
minute of the run) to restore their normal working strategy and to calm down after the
effects of a highly stressful equipment failure occurrence.
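The timeline described above can be summarised in a short Python sketch. It is illustrative only and was not part of the experimental apparatus: the constant and function names are hypothetical, and treating each event as falling exactly on a whole-minute mark is a simplifying assumption about the phrase "in the 10th minute".

```python
# Minimal sketch of the simulation-run timeline (all names are hypothetical).
# Assumption: events fall exactly on whole-minute marks of the run.
RUN_LENGTH_MIN = 30             # approximate duration of each simulation run
WEATHER_REQUEST_MIN = (5, 15)   # scheduled weather-related rerouting requests
FAILURE_START_MIN = 10          # FDPS failure injected
FAILURE_END_MIN = 25            # FDPS restored


def phase_at(minute: float) -> str:
    """Return the phase of the run that a given minute falls into."""
    if not 0 <= minute <= RUN_LENGTH_MIN:
        raise ValueError("minute lies outside the simulation run")
    if minute < FAILURE_START_MIN:
        return "pre-failure"
    if minute < FAILURE_END_MIN:
        return "failure"
    return "post-restoration"


failure_duration = FAILURE_END_MIN - FAILURE_START_MIN

if __name__ == "__main__":
    print(f"Failure duration: {failure_duration} min")
    for minute in (0, 5, 10, 15, 25, 30):
        print(minute, phase_at(minute))
```

Under these assumptions the failure phase lasts 15 minutes, both weather requests precede the restoration, and the final five minutes belong to the post-restoration part of the run.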
The SME involved in the study as an observer also acted as a coordinator to issue any
relevant information about the failure and its effect on the entire ATC Centre. This
notice was issued in response to queries from the participating controllers. However, if
a controller did not make any attempt to contact the coordinator, the SME issued this
information at the most suitable moment during the exercise (based on the level of the
controller’s workload).
Each simulation run was observed by the researcher, the assistant, and the SME; and
recorded for the purpose of further data analysis. The assistant was mainly responsible
for taking notes of the controllers’ overt behaviour prior to and after injection of failure.
A check-list using the SHAPE5’s list of attitudes was used to guide the assistant in
performing this task (EUROCONTROL, 2004f). The assistant was positioned in the
least intrusive way to the controller, completely outside of his/her field of view. On most
occasions, the observation team was positioned as far from the controller’s field of view
as possible, whilst still having a clear view of the radar screen. The precise set up of
the simulation room in which the experiment took place and the positions of all parties
involved are depicted in Figure 9-3.
Figure 9-3 Room set up
The simulation runs were followed by an immediate debriefing session guided by the
questionnaire and other material designed specifically for this session. The controllers
were asked to evaluate all the factors that potentially influenced their recovery
performance. In addition, they were given an opportunity to judge their own
performance and the realism of the exercise itself. The questionnaire and other
material designed for the experiment and the debriefing session are presented in
Appendix XIII.
Equipment failure in ATC, like any other unusual or emergency event, represents a
highly stressful event. In these instances the controllers are required to intervene with
complex strategies and employ their knowledge under significant pressure and high
psychological stress. For this reason, the debriefing session was used to help defuse
stress by creating a relaxed interview environment where the participating controllers
could evaluate their actions and performance. This session was structured in such a
way as to enable comparisons across the participants. For this reason, a special
5 The SHAPE project is briefly explained in Chapter 7, section 7.3.1.3. The list of attitudes used to guide
the assistant in the experimental process was derived from SHAPE attitude items, such as attentive, active, confident, thoughtful, calm, careful, and enquiring.
debriefing sheet had been designed prior to simulation runs. The rationale behind this
structured approach to debriefing was to ensure a consistent and reliable acquisition of
data on controller recovery performance. The debrief segment of the experiment was
used to confirm and detail observations made during the simulation run via an
approach similar to a “cognitive walkthrough”. In other words, this part of the experiment
was used to discuss the sequence of recovery steps required by a controller to
accomplish a recovery, and to validate failure detection and the factors that influenced
each stage of the recovery (i.e. detection, diagnosis, and correction; further discussed
in Chapter 10).
The following paragraphs give a brief description of the key elements of the
experiments in terms of airspace, traffic, and failure characteristics.
9.8.1 Airspace characteristics
The approach airspace of the ATC Centre where the experiment was carried out is
designated as class “C” airspace. This airspace extends horizontally over a radius of
30Nm from the airfield (runway 06/24, instrument landing system - ILS equipped on
both runway ends). The vertical limits are from the surface to 8,000 ft or FL80.
However, in the case of an early handover from area control, the area of responsibility
of the approach control increases. For example, if an aircraft is handed over at FL180
descending to FL80, all of the airspace in between becomes the responsibility of the
particular approach sector. On a scale of one (adequate airspace) to three
(inappropriate airspace) the participating controllers ranked this airspace as 1.31 on
average, which translates to airspace of adequate to tolerable complexity (Table 9-4).
In addition, a series of in-depth questions on airspace characteristics were presented to
each controller to identify the specific features of this airspace. The most frequently
observed issues with traffic complexity were:
• that there were a variety of flight levels and altitudes utilised (from FL100 down to
FL90, 4500ft, 4000ft, 3500ft, 3000ft);
• that there were no specific entry and exit points (throughout the duration of this
experiment this particular airspace did not provide for any standard instrument
departure and arrival routes, i.e. SIDs and STARs); and
• that the complexity of the neighbouring sectors did influence complexity within the
approach sector they operated in (e.g. two neighbouring sectors have large
numbers of crossing traffic).
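The reading of the mean airspace rating quoted above (1.31 on the three-point scale) as "adequate to tolerable" can be illustrated with a short sketch. It assumes that the middle point of the scale is labelled "tolerable", which is inferred from the wording of the text rather than stated explicitly; the function and constant names are hypothetical.

```python
import math

# Assumption: the middle label "tolerable" is inferred from the phrase
# "adequate to tolerable"; only the end points of the scale are stated.
AIRSPACE_LABELS = ("adequate", "tolerable", "inappropriate")


def describe_mean_rating(mean: float, labels: tuple) -> str:
    """Describe a mean rating on a 1-3 scale using the two adjacent labels."""
    if not 1.0 <= mean <= 3.0:
        raise ValueError("mean rating must lie between 1 and 3")
    lower = math.floor(mean)
    if mean == lower:
        # The mean coincides with a scale point, so a single label applies.
        return labels[lower - 1]
    return f"{labels[lower - 1]} to {labels[lower]}"


if __name__ == "__main__":
    print(describe_mean_rating(1.31, AIRSPACE_LABELS))
```

The same function could be applied to any other three-point scale by passing a different tuple of labels.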
Table 9-4 The mapping between exercise characteristics and the controllers' observations
| The exercise characteristics | The controllers' observations |
|---|---|
| Airspace characteristics simulated as adequate | Adequate to tolerable |
| Weather conditions simulated as unchanged (pre- and post-failure) | Unchanged |
| Traffic characteristics simulated as high | Average to high |
In addition, the weather conditions in the exercise simulated 15-25 knots southwest
wind, rain showers, half of the sky covered with cumulonimbus cloud (i.e. thunderstorm
cloud) with base at 1800ft, temperature of two degrees Celsius, and the pressure at
mean sea level (MSL) of 1032 hPa. Generally, in these conditions, icing will occur
inside cloud above 2000ft (in the ICAO standard atmosphere the temperature
decreases on average by 2 degrees Celsius/1000ft). Since the weather conditions pre-
and post-failure injection remained unchanged (i.e. re-routings requested by pilots in
both cases), the overall weather was marked as unchanged. This was confirmed by the
SME and participating controllers (Table 9-4).
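The temperature figures quoted above can be checked with a simple linear calculation. The sketch below is illustrative only: it assumes that the simulated 2 degrees Celsius applies at the surface, that heights are measured from that surface, and that the average lapse rate of 2 degrees Celsius per 1000ft holds throughout. The names are hypothetical.

```python
# Assumptions: 2 deg C is the surface temperature, heights are above that
# surface, and the average lapse rate applies linearly at all heights.
SURFACE_TEMP_C = 2.0
LAPSE_RATE_C_PER_1000FT = 2.0


def temperature_at(height_ft: float) -> float:
    """Estimated air temperature (deg C) at a given height in feet."""
    return SURFACE_TEMP_C - LAPSE_RATE_C_PER_1000FT * height_ft / 1000.0


if __name__ == "__main__":
    for height in (0, 1000, 2000, 3000):
        print(f"{height:>5} ft: {temperature_at(height):+.1f} C")
```

Under these assumptions the estimated temperature at 2000ft is below zero, which is consistent with icing inside cloud at and above that height.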
9.8.2 Traffic characteristics
The exercise used in this experiment had a duration of 30 minutes and a total of 14
flights (one training aircraft, ten arrivals, and three departures), which translates to 28
aircraft per hour. In the peak segment of the training exercise, the controller was in
simultaneous radio contact with seven to eight aircraft. On a scale of one (high
complexity) to three (low complexity) the participating controllers ranked the traffic
complexity as 1.66 on average. This rating translates to average to high traffic
complexity (Table 9-4). In addition, a series of in-depth questions on traffic
characteristics were presented to each controller to identify the traffic characteristics
mostly observed in the given traffic scenario. These were:
• aircraft speed mix or the difference in indicated airspeeds ranging from 125 knots to
250 knots (i.e. the speed read directly from the airspeed indicator on an aircraft);
• the utilisation of hold and thus induced delays;
• only Instrument Flight Rules (IFR) aircraft utilising the airspace;
• high volume of traffic with vertical and crossing movements; and
• an average flight time in the sector of 10-15 minutes (longer than usual due to the
injected equipment failure).
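The hourly traffic rate quoted at the start of this section follows directly from the composition of the exercise. A minimal sketch of the arithmetic is given below; the variable names are hypothetical.

```python
# Composition of the 30-minute exercise as stated in the text.
flights = {"training aircraft": 1, "arrivals": 10, "departures": 3}
EXERCISE_DURATION_MIN = 30

total_flights = sum(flights.values())
# Scale the 30-minute count to an equivalent hourly rate.
aircraft_per_hour = total_flights * 60 / EXERCISE_DURATION_MIN

if __name__ == "__main__":
    print(f"{total_flights} flights in {EXERCISE_DURATION_MIN} min "
          f"= {aircraft_per_hour:.0f} aircraft per hour")
```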
9.8.3 Equipment failure characteristics
The choice of the equipment failure was driven by the previous analyses and four
different sources of information (operational failure reports, questionnaire survey, the
qualitative equipment failure impact assessment tool, and the pilot study). The FDPS
failure was chosen for this experimental set up for several reasons. Firstly, the data
available showed that this failure is both severe and frequent. Secondly, this failure
represents an example of major failures that affect multiple systems, as seen from the
qualitative equipment failure impact assessment tool. Thirdly, the participating CAA
does not have a written procedure for this particular failure, which makes the controller
recovery performance more dependent upon their knowledge, experience, and
personal abilities. Finally, the technical features of the Beginning to End Skills Trainer
(BEST) platform allowed injection of this failure type and its restoration in a fairly easy
way. In order to simulate equipment failure in the most realistic way, it was necessary
to have the ability to inject failure but also to restore system functionality rapidly. This
was possible with the FDPS failure and its degradation was simulated as a sudden
failure affecting the entire ATC Centre for a period of 15 minutes.
A visual representation of this type of equipment failure on the BEST platform is
presented in Figure 9-4. Correlated radar track with all relevant flight-related
information is presented on the left-hand side of Figure 9-4, whilst the uncorrelated
track (resulting from the FDPS failure) depicting only the aircraft position is on the right-
hand side. It can be seen that the FDPS failure represented a failure which affects
multiple systems. The actual effects of the FDPS failure are presented in Table 9-5
and in more detail in Table 9-6.
(a) (b)
Figure 9-4 The visual representation of equipment failure on CWP: a) before the failure, b) after
the failure
Table 9-5 Equipment failure in the experimental study
| Type of failure | Effects | Existence of recovery procedure | HMI indication on BEST simulation platform |
|---|---|---|---|
| Reduced flight data processing mode | Monitoring aid only available with existing flight plans; flight data functions (flight plan management) not available; Safety Nets functions available; radar data functions available | No | None |
CALLSIGN TYPE
AFL XPT GS
CFL XFL ADES
Table 9-6 Availability of functions in the reduced flight data processing mode
| Function | Availability |
|---|---|
| Radar data source | |
| Radar tracks | Available |
| Flight plan track | Only for flight plan tracks already displayed |
| Maps | Available |
| Tools | Available |
| Radar picture controls | Available |
| Flight plan facilities | |
| Flight plan commands | Not available |
| Flight plan lists | Partially available (for display only, frozen lists) |
| ATC messages de-queue management | Not available |
| Transmission of ATC messages | Not available |
| Coordination message | Not available |
| Alarm and warning facilities | Partially available (no MTCA warnings update) |
| General information area | Available |
| Mail box management | Not available |
| Operational data management | Partially available (runway in use and airspace management are not available) |
| Sectorisation | Partially available (only displayable) |
| Aeronautical Information System | Available |
| Load management facilities | Not available |
| Air Traffic Flow Management facilities | Not available |
| Operational load forecast facilities | Not available |
| Current Operational Load facilities | Not available |
| System survey facilities | Partially available (percentage of use of SSR code indication that a flight plan has received message is incorrect and alerts are not available) |
| Operational room configuration | Partially available (only displayable) |
| Manual printing facilities | Available |
| Operator roles (eligibility rules) | Partially available (only displayable) |
| Off-line customisation | Available |
| User mode of ATC position | Available |
| Repetitive flight plan database version management | Not available |
9.9 Experimental variables
The following sections define the variables that were taken into account in the design of
the experiment to capture the characteristics of the recovery process in ATC. They are
defined as independent, dependent, and extraneous variables (see Table 9-7 and
Table 9-8) and discussed in the following sections.
Table 9-7 Overview of independent and dependent variables
| Independent variable | Dependent variable |
|---|---|
| Set of 20 RIFs | The recovery context (recovery context indicator) |
| The required recovery steps | The recovery effectiveness |
| | The recovery duration |
9.9.1 Independent Variables
There are two sets of independent variables in this experiment. These are the
Recovery Influencing Factors (RIFs) and required recovery steps, discussed in the
following sections.
9.9.1.1 Recovery Influencing Factors (RIFs)
The research carried out in this thesis includes an assessment of the factors that
influence controllers during the process of recovery from equipment failures in ATC (i.e.
RIFs; see Chapter 7). A total of 20 relevant factors (RIFs) were identified. During the
post-experiment debriefing session each participating controller was presented with the
questionnaire. This questionnaire enabled controllers to mark and briefly explain the
influence of each RIF on their recovery performance as experienced in the simulation
run. Although it would be beneficial to question controllers on their experience with the
interactions between RIFs, this would considerably increase the complexity of the
experimental design. Therefore, a statistical approach was taken instead (presented in
Chapter 8).
Table 9-8 briefly summarises each of the 20 factors, specifying the key considerations
taken into account in the design of the experiment. Each factor is defined as either an
independent or an extraneous variable. Seven RIFs were kept constant for all participating
controllers (Table 9-8), whilst two RIFs were not considered in this experiment (i.e.
‘adequacy of alarm’ and ‘adequacy of alarm onset’).
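The treatment of the 20 RIFs described above can be tallied with a short sketch. The three treatment labels used below ("assessed", "constant", "not considered") are hypothetical shorthand introduced for illustration; the assignment of each factor follows the entries of Table 9-8, with 'airspace characteristics' counted as assessed because controllers rated it in the debriefing session.

```python
from collections import Counter

# Treatment labels are illustrative shorthand, not terms used in the thesis.
RIF_TREATMENT = {
    "Training for recovery": "assessed",
    "Previous experience with equipment failures": "assessed",
    "Experience with system performance": "assessed",
    "Personal factors": "assessed",
    "Communication for recovery": "assessed",
    "Complexity of failure type": "constant",
    "Time course of failure development": "constant",
    "Number of workstations/sectors affected": "constant",
    "Time necessary to recover": "assessed",
    "Existence of recovery procedure": "constant",
    "Duration of failure": "constant",
    "Adequacy of HMI and operational support": "assessed",
    "Ambiguity of information": "assessed",
    "Adequacy of alarms/alerts": "not considered",
    "Adequacy of alarm/alert onset": "not considered",
    "Adequacy of organisation": "assessed",
    "Traffic complexity": "constant",
    "Airspace characteristics": "assessed",
    "Weather conditions during the recovery process": "constant",
    "Conflicting issues in the situation": "assessed",
}

counts = Counter(RIF_TREATMENT.values())

if __name__ == "__main__":
    for treatment, number in counts.items():
        print(f"{treatment}: {number}")
```

The tally reproduces the figures given in the text: seven factors held constant and two not considered, leaving eleven factors assessed by the controllers.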
Table 9-8 Overview of independent and extraneous variables
Variable | Independent variable | Extraneous variable | Comment
Training for recovery √ Assessed in the debriefing session.
Previous experience with equipment failures
√ Assessed in the debriefing session.
Experience with system performance
√ Assessed in the debriefing session.
Personal factors √ Assessed in the debriefing session.
Communication for recovery √
Existing studies from the nuclear industry have confirmed that communication within a team does have a significant impact on recovery performance (Kaarstad and Ludvigsen, 2002). Hence, the impact of this factor is fairly well known. Regardless, this variable will be assessed after the experiment.
Complexity of failure type
Constant (multiple systems affected)
Refers to single vs. multiple failure occurrences. The experimental set up should assess the impact of one failure which affects multiple ATC systems. Therefore this variable will be constant for all subjects.
Time course of failure development Constant (sudden failure)
This variable varies between sudden failure and gradual degradation of the system. This variable will be constant for all subjects.
Number of workstations/sectors affected
Constant (all workstations affected)
Experiment is conducted on a single workstation with one controller at a time. But the controller will be informed that the failure affects the entire ATC Centre.
Time necessary to recover √
This variable varies between adequate and inadequate time to recover. It can be influenced by several factors. Firstly, the characteristics of a given failure will drive the time necessary to recover through the criticality of the failed function and its detectability. Secondly, the controller characteristics will also have an effect. More experienced controllers may react and resolve an issue more quickly than less experienced ones. Finally, the characteristics of traffic at the moment of failure will drive the time necessary to recover. The more complex the traffic situation, the more recovery time the controller will need. This variable will be assessed in the debriefing session.
Existence of recovery procedure Constant (no procedure)
Theoretical review and various experiments in other safety-related industries have confirmed the relevance of procedures to recovery performance (Kaarstad and Ludvigsen, 2002; EUROCONTROL, 2004e; Kanse and van der Schaaf, 2000). Therefore, it was decided to choose a failure which does not have an appropriate recovery procedure.
Duration of failure
Constant (short
duration – 15min)
In the experimental set up, duration of failure should be long enough to capture all phases of the recovery (e.g. 15min) taking into account the total duration of experiment.
Adequacy of HMI and operational support
√ Assessed in the debriefing session.
Ambiguity of information √ Assessed in the debriefing session.
Adequacy of alarms/alerts Not applicable for technical
reasons
The experimental design aims to capture controller performance unaided by system tools, emphasising controller readiness to detect and react to an unexpected occurrence. Additionally, past research has already shown that in most cases the existence of an alert does have a significant impact on recovery performance (Kaarstad and Ludvigsen, 2002; Theis and Straeter, 2001).
Adequacy of alarm/alert onset Not applicable for technical
reasons
Existing studies from various industries have confirmed that the alert onset or its ‘cognitive convenience’ does have a significant impact on recovery performance (Straeter, 2005).
Adequacy of organisation √ Assessed in the debriefing session.
Traffic complexity Constant
(average to high)
This variable will be kept constant for all subjects. The aim is to reflect the current levels of traffic as well as the future predicted traffic increase. The declared sector capacity is defined as the number of aircraft entering the sector per hour, respecting the peak hour pattern, when controller workload is 70 percent in that hour (Majumdar and Ochieng, 2002). Therefore, the aim of the proposed experimental set up is to use a 30-min peak hour traffic sample that adequately reflects the sector’s declared capacity. In addition, the scenario should aim at steady traffic increase up to the tenth minute into the scenario. The remaining 20 minutes of the scenario should reflect higher levels of traffic as well as controller workload.
Airspace characteristics √ This variable will be constant since each participant will experience the same airspace/sector characteristics. However, each controller will be able to assess the adequacy of airspace in the debriefing session.
Weather conditions during the recovery process
Constant This variable will be constant for all participants. Poor weather conditions will be experienced both pre- and post-failure period.
Conflicting issues in the situation √ Assessed in the debriefing session.
Age √ Assessed in the debriefing session.
Overall experience as a controller √ Assessed in the debriefing session.
Required recovery steps √ Set of required recovery strategy steps will be defined prior to the experiment based on the type of failure, traffic sample, and airspace characteristics.
9.9.1.2 Required recovery steps
The recovery performance of each participant was compared to the pre-determined set
of required recovery steps. These recovery steps were determined on the basis of
operational experience, since the participating Civil Aviation Authority (CAA) does not
have any official guidelines for this particular failure type (e.g. procedure, written
instruction). This set of required recovery steps was validated by the independent input
of the SME and two ATC instructors. It should be noted that controller performance
was highly dependent upon the traffic situation at the moment of failure and therefore
several different sequences of the recovery steps were possible. The list of
seventeen recovery steps in Table 9-9 represents one logical sequence of these
steps. Whilst some steps had to be performed only once (e.g. identification of
a failure type, informing the coordinator, and post restoration), others had to be re-
applied. For example, for each new (uncorrelated) track entering the dedicated
airspace, it was necessary to identify the traffic and maintain that identification. In
addition, timely and accurate strip marking was essential, especially in the situation of
degraded equipment reliability, as simulated in this experiment. A detailed evaluation of
strip management and annotations should be addressed in future research.
An important point to note is that these simulation runs were not entirely identical in
spite of the great effort to achieve consistency amongst participants. The observed
differences were due to pseudo pilots’ manual actions, namely their incorporation of
requested weather rerouting and slight deviations of the moment of failure injection. In
short, pseudo pilots had to manually de-correlate each new track which influenced to
some extent the traffic distribution in each simulation run.
Due to the small differences in the simulation runs, further analysis focused only on the
list of required recovery steps (Table 9-9), irrespective of their sequence. The objective
was to capture these core steps (including the post-restoration steps, S14-S17) and
evaluate any deviations.
Table 9-9 Overview and description of required recovery steps
| Required recovery step | Description |
|---|---|
| S1 | Detect the problem either by pilot’s contact or visually on the radar display (detection of the uncorrelated track). In both cases, the first assumption may be a transponder failure. After confirmation that the aircraft transponder is operational, a further check on ATC system performance should be conducted. |
| S2 | Locate traffic |
| S3 | Check identity of eastbound overflight |
| S4 | Identify all traffic using appropriate technique: bearing/range or turn method (turning the aircraft by 30 degrees or more) |
| S5 | Identify failure type (either by controller or by coordinator) |
| S6 | Inform all traffic on RTF of the failure and advise of possible restrictions |
| S7 | Maintain identification of all traffic |
| S8 | Ground the trainer |
| S9 | Refuse departing traffic permission to depart |
| S10 | All airborne traffic in inbound sequence should continue to be sequenced for landing (without unnecessary delay) |
| S11 | Maintain accurate and timely strip marking throughout the process |
| S12 | Provide vertical separation |
| S13 | Utilise holding patterns when necessary |
| S14 | After restoration has been confirmed by coordinator re-identify all traffic |
| S15 | Confirm Mode C |
| S16 | Continue to monitor |
| S17 | Release all departures (which leads to the restoration of the normal service) |
It is important to state that some of the recovery steps above are of greater importance
to maintaining a safe ATC service than others. For example, maintaining identification
of all traffic, conducting timely and efficient strip marking and board management, and
maintaining separation are considered critical to overall safety in a degraded situation.
Other recovery steps, such as grounding the trainer and preventing departures, are of
less importance in that they are workload reduction measures. Nevertheless, their
implementation contributes to a safer traffic environment in unusual situations.
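The sequence-independent comparison of observed performance against the required recovery steps can be sketched as follows. This is an illustration under stated assumptions and not the analysis procedure used in the study: the step labels are abbreviated from Table 9-9, the function name is hypothetical, and treating S7, S11, and S12 as the safety-critical steps is an interpretation of the paragraph above (in particular, 'provide vertical separation' is taken to stand for 'maintaining separation').

```python
# Step labels abbreviated from Table 9-9.
REQUIRED_STEPS = {
    "S1": "Detect the problem",
    "S2": "Locate traffic",
    "S3": "Check identity of eastbound overflight",
    "S4": "Identify all traffic using appropriate technique",
    "S5": "Identify failure type",
    "S6": "Inform all traffic of the failure",
    "S7": "Maintain identification of all traffic",
    "S8": "Ground the trainer",
    "S9": "Refuse departing traffic permission to depart",
    "S10": "Continue sequencing inbound traffic for landing",
    "S11": "Maintain accurate and timely strip marking",
    "S12": "Provide vertical separation",
    "S13": "Utilise holding patterns when necessary",
    "S14": "Re-identify all traffic after restoration",
    "S15": "Confirm Mode C",
    "S16": "Continue to monitor",
    "S17": "Release all departures",
}

# Assumption: these three steps correspond to the safety-critical activities.
SAFETY_CRITICAL = {"S7", "S11", "S12"}


def find_deviations(observed):
    """Return the required steps that were not observed, ignoring their order."""
    unknown = set(observed) - set(REQUIRED_STEPS)
    if unknown:
        raise ValueError(f"unknown step identifiers: {sorted(unknown)}")
    missing = sorted(set(REQUIRED_STEPS) - set(observed),
                     key=lambda step: int(step[1:]))
    return {
        "critical": [s for s in missing if s in SAFETY_CRITICAL],
        "other": [s for s in missing if s not in SAFETY_CRITICAL],
    }


if __name__ == "__main__":
    # Hypothetical run in which two steps were not observed.
    performed = set(REQUIRED_STEPS) - {"S8", "S11"}
    print(find_deviations(performed))
```

Because the comparison uses sets, the order in which a controller performed the steps does not affect the result; only omissions are reported, separated into safety-critical and other steps.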
9.9.2 Dependent Variables
This study was designed to capture several quantitative and qualitative dependent
variables. The reason for this lies in the fact that controller recovery cannot be captured
through only one recovery variable as highlighted previously in Chapter 5. The
dependent variables in this experimental set up are recovery context (recovery context
indicator), recovery effectiveness and recovery duration (see Table 9-7). The precise
methodology for the assessment of the recovery context both as a qualitative and a
quantitative variable is presented in Chapter 8. The following sections investigate other
variables.
9.9.2.1 Recovery effectiveness
The recovery effectiveness of each participating controller was rated by combining
three separate sources of data. Firstly, each participant’s recovery performance was
rated during the simulation run. In general, this analysis was based on the performance
indicators for a particular airspace, such as optimal use of airspace (separation of
5-8 NM), radar vectoring, speed control, use of radio telephony (RT), prioritisation of
tasks, and appropriateness of traffic management. Secondly, the recovery
effectiveness was rated based on a set of required recovery steps as explained in
section 9.7.1.2. Thirdly, the steps identified earlier were grouped under three main tasks
to enable credible rating (see Table 9-10). These are:
• System protection, i.e. recovery steps which aimed to assure protection of the ATC
system in case of further equipment deterioration. Note that the reduction of the
controller’s workload through better traffic management is an integral part of system
protection and as such is included in this task;
• Maintaining situational awareness (i.e. an accurate mental picture of traffic and
airspace); and
• Post-restoration recovery steps.
Table 9-10 Recovery process and its three main tasks
System protection task | SA or mental picture task | Post-restoration task
--- | --- | ---
Ground the trainer | Detect the problem | Re-identify all traffic
Refuse departures permission to depart | Identify failure type | Confirm Mode C
All airborne traffic in inbound sequence should continue to be sequenced for landing | Maintain accurate and timely strip marking | Continue to monitor
Utilise holding patterns when necessary | Identify all traffic (including eastbound overflight) | Release all departures
Inform all traffic and advise of possible restrictions | Locate traffic |
Provide vertical separation | Maintain identification of all traffic |
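As a bookkeeping aid only, the grouping in Table 9-10 can be held in a small data structure so that the steps a controller performed are listed per task rather than as one overall count. The following is a minimal Python sketch; the dictionary layout, the function name `steps_by_task`, and the example set of observed steps are illustrative assumptions and were not part of the rating procedure used in the experiment.

```python
# Hypothetical bookkeeping sketch: recovery steps grouped by the three tasks
# of Table 9-10. Names and structure are illustrative assumptions.
RECOVERY_TASKS = {
    "System protection": [
        "Ground the trainer",
        "Refuse departures permission to depart",
        "All airborne traffic in inbound sequence should continue to be sequenced for landing",
        "Utilise holding patterns when necessary",
        "Inform all traffic and advise of possible restrictions",
        "Provide vertical separation",
    ],
    "SA or mental picture": [
        "Detect the problem",
        "Identify failure type",
        "Maintain accurate and timely strip marking",
        "Identify all traffic (including eastbound overflight)",
        "Locate traffic",
        "Maintain identification of all traffic",
    ],
    "Post-restoration": [
        "Re-identify all traffic",
        "Confirm Mode C",
        "Continue to monitor",
        "Release all departures",
    ],
}


def steps_by_task(performed):
    """Return, for each task, a pair (performed steps, missed steps)."""
    summary = {}
    for task, steps in RECOVERY_TASKS.items():
        done = [s for s in steps if s in performed]
        missed = [s for s in steps if s not in performed]
        summary[task] = (done, missed)
    return summary


# Invented example of observed steps for one controller (not experimental data).
observed = {"Ground the trainer", "Detect the problem", "Confirm Mode C"}
for task, (done, missed) in steps_by_task(observed).items():
    print(f"{task}: {len(done)} performed, {len(missed)} missed")
```

Such a listing only organises the observations by task; it does not weight the steps and so cannot by itself produce a rating.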
It should be noted that an assessment of controller performance is not a simple task of
counting the number of recovery steps performed versus the total number of required
steps. The reason for this lies in the different effects that each step has on the overall
recovery performance. Therefore, the three sources of information enabled a structured
recovery assessment of each participant using the following five categories:
• Very good recovery performance (VG) - the controller employed a very good
recovery strategy and all recovery steps;
• Good recovery performance (G) - the controller employed a good recovery strategy
but failed to perform some of the steps;
• Adequate recovery performance (A) - the controller employed an adequate
recovery strategy but failed to completely protect the ATC system in case of further
equipment deterioration and failed to implement some of the post-restoration steps;
• Partially adequate recovery performance (PA) - the controller employed an inadequate
recovery strategy. In other words, there was a complete lack of ATC system
protection from possible further equipment degradation. In addition, the controller
did not assure timely and accurate strip management and therefore had no means
to support his/her situational awareness or mental picture of the traffic and
airspace. The post-restoration steps were performed only to some basic extent
without a proper check of the accuracy of new data; and
• Inadequate recovery performance (I) - the controller had no recovery strategy in
place, no plan to reduce his/her own workload, and therefore failed to protect the ATC
system in the case of further equipment deterioration. In addition, the controller
failed to implement most of the post-restoration steps.
Although not attempted in this thesis, future research should assess the relevance and
contribution of existing tests, such as the Situation Awareness Global Assessment
Technique (SAGAT), to the assessment of controller recovery.
9.9.2.2 Recovery duration
As previously discussed in Chapter 5, the recovery duration is measured as the time
from the controller’s first overt action to the end of the recovery process. The
measurement starts from the first overt action rather than from the moment of
actual failure detection, even though the two can differ significantly. Identifying the
moment of failure detection can be extremely difficult, as detection is usually
covert behaviour that is not directly observable. In the current
experimental set up and with the available apparatus, it was not possible to accurately
capture the moment of failure detection but only the controller’s first action as observed
on the ATC system.
More sophisticated equipment, such as an eye movement tracker (e.g. ASL Model
501), offers a better, but still not entirely accurate, approach to the discrimination of the
moment of failure detection. The reason for this is that there is no integrated measure
of eye point of gaze and brain activity which would differentiate between fixations with
information gathering and ‘stares’, when no information has been gathered [6]. Therefore,
even with the use of this advanced eye tracking equipment, it would not be possible to
firmly state the precise moment of failure detection. Whilst the moment of failure
detection was investigated during the post-experimental debriefing, it still proved to be
difficult to determine.
[6] Personal correspondence with human factors experts from Netherlands National Research
Laboratory (NLR) and EUROCONTROL Experimental Centre (Human Factors Lab).
For this reason, the research presented in this thesis uses the controller’s first action to
measure the recovery duration. It is necessary to highlight that this first observable
action may be postponed for two generic reasons. Firstly, the controller may not
necessarily detect the uncorrelated track as soon as it becomes visible on the radar
display. Secondly, the controller may detect it immediately (upon its presentation on the
radar display) but consciously delay any action due to the workload experienced or the
presence of a more urgent task which needs to be addressed first. For example, the
controller may need to address tasks that are completely unrelated to the
recovery process, such as turning an aircraft to intercept the ILS localiser for the
approach and landing, or radar vectoring of traffic with a speed differential. In other
words, the controller’s first action is the moment when the controller decides to initiate
an appropriate recovery strategy and not necessarily the actual time when he/she
detects the uncorrelated label. It is well known that controllers develop their own
working strategies as they gain experience and proficiency over their years on
the job. This results in the gradual build-up of ‘personal criteria’ for separation limits and
methods for solving potential conflicts (whether by changing the speed of the aircraft,
its flight level, or its heading).
Based on the moment of the controller’s first action, the recovery duration was
determined by observation of simulation runs and recorded video/audio material. It
should be noted that controller recovery performance did not stop with the restoration
of FDPS service, but continued to include all necessary post-restoration steps. The
post-restoration steps are required to restore normal service and to confirm that the
restored functionality provides accurate information. Discussion with the SME indicated
that this stage of the recovery should take up to one minute; this limit was applied simply
to bound the recovery duration for controllers who fail to perform all post-restoration steps.
As a result, the recovery duration was directly influenced by the duration of the failure
(15 minutes) and the period required for the post-restoration phase (one minute). Thus,
the recovery duration could reach a maximum of 16 minutes only if the controller
initiated recovery action(s) immediately. The longer the controller takes to
initiate recovery action, the shorter the recovery duration will be.
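The arithmetic behind this upper bound can be sketched as follows. This is a minimal Python illustration, assuming that the delay between failure onset and the controller's first overt action is available in minutes; the function name and its argument are hypothetical, while the 15-minute and one-minute values are those stated above.

```python
FAILURE_DURATION_MIN = 15.0   # duration of the simulated failure (from the text)
POST_RESTORATION_MIN = 1.0    # period allowed for post-restoration steps (from the text)


def max_recovery_duration(first_action_delay_min):
    """Upper bound on recovery duration (minutes), measured from the first overt action.

    first_action_delay_min is a hypothetical input: the time between the onset
    of the failure and the controller's first observable recovery action.
    """
    if not 0.0 <= first_action_delay_min <= FAILURE_DURATION_MIN:
        raise ValueError("delay must lie within the failure period")
    return FAILURE_DURATION_MIN + POST_RESTORATION_MIN - first_action_delay_min


print(max_recovery_duration(0.0))   # immediate action: the 16-minute maximum
print(max_recovery_duration(2.5))   # a later first action gives a shorter duration
```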
The results of all three sources of information as well as the final rating for each
participant were confirmed by the one SME involved in the experiment. Clearly, having
the participation of more SMEs would increase the validity of the outcome of the
experiment. Future research should address how statistical representation could be
achieved given the logistical difficulties associated with these types of experiments.
9.9.3 Extraneous Variables
Extraneous variables influence the outcome of an experiment, although they are not
the variables of interest. These variables are undesirable because they add errors to
the experiment. A major goal in the experimental design is to eliminate the influence of
extraneous variables as much as possible. If it is not possible to eliminate them, they
should be controlled. Two extraneous variables in this experiment could not be
controlled. These are:
• Operational experience (i.e. years in service)
The differences in the level of experience were to be captured once the controllers were
recruited for the experiment. The experience variable is divided into the
following categories: 1-10; 11-20; 21-30; and 31-40 years.
• Personal factors
There is a wide variety of factors that could be categorised as personal. Some of these
are more complex to determine than others. For example, factors like health, vision,
level of confidence, complacency, level of trust in automation, self-esteem (i.e. trust in
own ability), personality, motivation, attitudes deriving from family or close social group,
personality type, etc. require specific sets of tests which can be too complex and too
time consuming. However, age was to be captured once the controllers were recruited
for the experiment. Fatigue and stress were to be controlled by using rested controllers,
as were ‘time of the day’ (i.e. relevance of circadian rhythm) and time into the shift
(i.e. level of situational awareness as well as fatigue). In short, the experiment was to
be conducted in the same periods of the day, with half of the subjects
tested in the morning sessions and the other half in the afternoon sessions.
9.10 Potential limitations
There are two limitations of the experimental set up and its use to capture data. Firstly,
one limitation is the individual differences of the participants (i.e. controllers). These are
characteristics that differ from one participant to another; their effect could be overcome
by using random assignment or even matched groups (to ensure that different groups
are equivalent with respect to pre-selected characteristics, e.g. experience and age).
Secondly, validation of recovery performance of each participating controller by only
one SME creates a potential for bias. Although special attention was given to the
choice of the SME (in terms of experience and expertise), only one SME was
available for this experiment.
9.11 Summary
This Chapter has presented in detail the experiment designed to capture controller
recovery in ATC. The Chapter started by justifying the need for the field experiment.
This was followed by an assessment of the available resources and the key
requirements that had to be accomplished. The Chapter continued by discussing and
justifying the overall experimental set up and data acquisition. This included the
presentation of the rationale for the choice of the equipment failures to be tested in the
pilot study. After the lessons learnt from the pilot study, it was possible to implement
the final changes and fine tune the set up of the main experiment. This segment
focused on the characteristics of the simulated traffic, airspace, and equipment failure,
as well as on the research variables while highlighting potential limitations. The
following Chapter analyses the data captured from this experiment.
Chapter 10 Analysis of Experimental Results
270
10 Analysis of Experimental Results
The previous Chapters identified a set of relevant contextual factors or Recovery
Influencing Factors (RIFs) and developed a novel approach for the quantitative
assessment of the recovery context. This approach and its operational benefits are
further verified in this Chapter by an experimental investigation conducted in a training
facility of an Air Traffic Control (ATC) Centre with the participation of 30 operational air
traffic controllers. In addition to the assessment of the recovery context, the
experimental data are used to assess controller recovery performance using the
recovery variables identified in Chapter 5.
The Chapter starts with the overall framework for the analysis of a unique set of data
on controller recovery performance. This is followed by the analysis of the
characteristics of the sample of controllers participating in the experiment. The Chapter
continues with an assessment of controller recovery performance using three recovery
variables, namely recovery context, duration, and effectiveness. It concludes by
focusing on the outcome of the recovery process, as captured in the experiment.
10.1 Overall framework
The objective of the experiment conducted in this research is mainly to capture data
related specifically to controller recovery from equipment failure in ATC. Based on the
experimental set up (presented in Chapter 9), three experimental sessions were
conducted with 30 controllers from a particular ATC Centre who participated on a
voluntary basis. The controllers were asked to complete one emergency training
session (based on a simulated Flight Data Processing System-FDPS failure), followed
by a debriefing session.
The framework for the analysis of data collected on controller recovery from an FDPS
failure is structured according to Figure 10-1. It starts by assessing the characteristics
of the controllers who participated in the experiment. This is followed by a detailed
analysis of the recovery variables defined in Chapter 5, their interactions, and other
relevant findings obtained from the experiment.
[Figure 10-1: flow diagram. Recoverable box labels: Participants (30 operational air traffic controllers; one particular ATC Centre; simulated Flight Data Processing System (FDPS) failure; age, operational experience, ratings); Analyses of recovery variables / dependent variables (recovery context, recovery context indicator, recovery effectiveness, recovery duration; required recovery steps; the recovery phases; observed behaviour and attitude); Analysis of interactions; Outcome of the recovery process; Other findings / additional findings; Experimental results]
Figure 10-1 Framework for the analysis of experimental results
10.2 Participants
As discussed in section 9.8 (Chapter 9), it is important that statistical representation is
achieved in research that involves sampling of the population. In this case, such
representation is required for the ATC Centre where the experiment was to be carried
out. The main distinguishing characteristics of the controllers are age, operational
experience (i.e. years in service), and rating. This section analyses these and makes a
link to statistical representation.
10.2.1 Age and operational experience
The average age of the controllers who participated in the experiment is 37 years,
ranging from 24 to 58 years. On average, they have more than 12 years of operational
experience, ranging from 2 to 35 years. Figure 10-2 shows the distribution of
operational experience of sampled controllers in terms of the four categories adopted
for the questionnaire survey in Chapter 6. It can be seen that the sample is reasonably
representative of the population of controllers in the particular ATC Centre as all
experience categories have been represented. The under representation of controllers
with over 30 years of experience is to be expected as the majority of the controllers in
this category tend to move to operational support roles (e.g. ATC instructors). This
finding is in line with the results of the questionnaire survey (Chapter 6) where there
were fewer respondents with over 30 years of experience.
Figure 10-2 Distribution of operational experience
10.2.2 Ratings
Figure 10-3 presents the distribution of the ratings of the controllers who participated in
the experiment. Considering that the training exercise was designed for the approach
control course (APP), it is important to highlight that 20 percent of the participants did
not have an APP rating. However, half of these participants had an ACC rating, which
incorporates training in elements of approach control (as a part of the low level ACC
course). Although the remaining participants had only a TWR rating, they had just
completed an APP course and therefore possessed knowledge of all relevant elements
of approach control.
[Figure 10-3: bar chart. x-axis: Ratings; y-axis: Percent (0 to 40). Values as read from the bar labels, in order: All (ACC, APP, TWR) 36.7; ACC and APP 26.7; ACC and TWR 3.3; APP and TWR 10; ACC 6.7; APP 6.7; TWR 10]
Figure 10-3 Distribution of controllers’ ratings
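The shares quoted in the text can be cross-checked against the bar labels of Figure 10-3 with a few lines of Python. In this sketch the pairing of each percentage with a rating combination is an assumption based on the order in which the labels and values appear in the figure; the variable names are illustrative.

```python
# Percentages taken from the bar labels of Figure 10-3. The pairing of label
# and value is assumed from their order in the figure.
rating_share = {
    ("ACC", "APP", "TWR"): 36.7,
    ("ACC", "APP"): 26.7,
    ("ACC", "TWR"): 3.3,
    ("APP", "TWR"): 10.0,
    ("ACC",): 6.7,
    ("APP",): 6.7,
    ("TWR",): 10.0,
}

# Share of participants without an APP rating, and the part of it holding ACC.
no_app = sum(p for ratings, p in rating_share.items() if "APP" not in ratings)
no_app_with_acc = sum(
    p for ratings, p in rating_share.items()
    if "APP" not in ratings and "ACC" in ratings
)

print(round(no_app, 1))           # share without an APP rating
print(round(no_app_with_acc, 1))  # share without APP but with ACC
```

Under this pairing the two sums come to about 20 and 10 percent, which agrees with the statement that 20 percent of participants lacked an APP rating and that half of these held an ACC rating.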
Since the experiment was conducted in three separate sessions (as discussed in
section 10.1), it is important to investigate whether the sampling on all three occasions
was appropriate. In other words, it is important to show that all three sessions come
from the same population of controllers from the ATC Centre, and that aggregated,
they represent a proper sample (Table 10-1).
Table 10-1 Characteristics of a sample of controllers participating in experiment
Variables | Experimental session 1 | Experimental session 2 | Experimental session 3
--- | --- | --- | ---
Age (mean, standard deviation) | M=35.9, SD=8.95 | M=37.9, SD=10.3 | M=37.7, SD=9.73
Experience (mean, standard deviation) | M=10.7, SD=6.70 | M=14.3, SD=11.08 | M=13.7, SD=8.22
Category of experience (frequency): 1-10 | 5 | 5 | 4
Category of experience (frequency): 11-20 | 4 | 2 | 5
Category of experience (frequency): 21-30 | 1 | 2 | 0
Category of experience (frequency): 31-40 | 0 | 1 | 1
The Mann-Whitney non-parametric test was used to investigate the differences
in age and operational experience between controllers from the three experimental
sessions. Details of this statistical test are presented in Chapter 6, section 6.7.4. The
statistical tests [1] at the 95 percent confidence level indicated that there is no
significant difference between the three experimental sessions (p>0.05). Based upon this,
data were pooled for further analyses.
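The pairwise comparison of sessions can be sketched in Python as below. This is an illustration only: it uses the normal approximation to the Mann-Whitney U distribution without tie or continuity corrections, which is not necessarily the variant applied in the thesis, and the ages listed are invented placeholders rather than the experimental data. Statistical packages (for example `scipy.stats.mannwhitneyu`) provide exact and corrected versions of the test.

```python
from itertools import combinations
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, normal approximation (no tie or continuity correction)."""
    n1, n2 = len(x), len(y)
    # Count pairs in which the x value exceeds the y value; ties count one half.
    u1 = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2.0
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_value = erfc(abs(z) / sqrt(2.0))  # two-sided p-value of a standard normal z
    return u, p_value


# Invented placeholder ages for three sessions (NOT the experimental data).
sessions = {
    1: [24, 28, 30, 33, 35, 36, 38, 41, 45, 49],
    2: [25, 27, 31, 34, 37, 39, 42, 44, 48, 58],
    3: [26, 29, 32, 34, 36, 38, 40, 43, 47, 52],
}

# Sessions are compared pairwise: 1 and 2, 1 and 3, 2 and 3.
for a, b in combinations(sorted(sessions), 2):
    u, p = mann_whitney_u(sessions[a], sessions[b])
    verdict = "no significant difference" if p > 0.05 else "significant difference"
    print(f"sessions {a} and {b}: U={u:.1f}, p={p:.3f} ({verdict})")
```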
10.3 Assessment of controller recovery performance
The main objective of the research presented in this thesis is to investigate controller
recovery from equipment failures in ATC. The discussions in Chapter 5 concluded that
the assessment of controller recovery needs to assess the recovery context,
effectiveness, and duration, followed by the assessment of the outcome of the recovery
process. The section continues with an analysis of the interactions between recovery
variables and concludes with the discussion of other relevant experimental findings.
10.3.1 Recovery context
The thesis used a set of RIFs, identified in Chapter 7, to develop a novel approach for
the quantitative assessment of the recovery context through the concept of a recovery
context indicator (presented in Chapter 8). The experiment carried out and presented in
Chapter 9 attempts to verify this approach and its operational benefits. The following
sections adapt the proposed methodology to the particular environment of the ATC
Centre used as a case study. This is achieved in several steps. Firstly, it is necessary
to assess all candidate RIFs and identify those relevant to a particular ATC Centre.
Secondly, the probabilities for each RIF (and its corresponding levels) are defined
based on the controllers input during the debriefing sessions. Thirdly, RIF interactions
are assessed and incorporated. Finally, the recovery context indicator is calculated as
a numerical representation of the context surrounding the simulated FDPS failure and
the subsequent controller recovery. These steps are presented in detail in the following
paragraphs.
10.3.1.1 Assessment of relevant RIFs
This step consists of the assessment of the 20 candidate RIFs and their relevance to
the experiment and the particular ATC Centre involved. Of these RIFs, ‘adequacy of
alarm’ and ‘adequacy of alarm onset’ are not relevant since there was no alarm/alert in
the design of the experiment (see Table 9-7, Chapter 9). There are two reasons for
this. Firstly, the experiment in this research is designed to capture controller recovery
unaided by system tools, and emphasis is placed on controller readiness to detect and
react to an unexpected failure. Secondly, past research has already shown that in
most cases the existence of an alert does have a significant impact on recovery
performance (Kaarstad and Ludvigsen, 2002; Theis and Straeter, 2001). As a result, 18
RIFs were determined to be relevant to this experiment.
[1] Statistical tests investigated the null hypothesis for experimental sessions 1 and 2, 1 and 3,
and 2 and 3, separately.
10.3.1.2 Probabilities of each RIF and the corresponding levels
Based on data collected during the post-experiment debriefing session it was possible
to derive probabilities of each RIF and its corresponding levels. The results for all 18
RIFs are presented in Appendix XIV. Furthermore, these probabilities are used to verify
the RIF probabilities defined in Chapter 8 using the verification criteria (Table 10-2). In
other words, a set of expectations was defined before comparing the RIFs probabilities
derived for a ‘generic’ ATC Centre (Chapter 8) and a particular ATC Centre (used in
the experiment).
Table 10-2 Verification of RIF probabilities from a ‘generic’ approach (Chapter 8) and the experiment
RIF groups | Verification criteria | Result | Comment
--- | --- | --- | ---
Internal | No difference | No difference, except ‘Communication for recovery’ | The controllers who participated in the experiment rated their communication mostly as ‘tolerable’, compared to the ATM specialists who rated it mostly as ‘efficient’. The experience with an equipment failure in the simulated environment may have revealed to participating controllers some shortcomings in the communication for recovery of which ATM specialists were not aware.
Equipment-related | No difference | No difference | Note that five out of six RIFs in this group were controlled in the experimental design.
External | Potential for difference | No difference, except ‘Adequacy of organisation’ | The controllers who participated in the experiment rated the organisation in their ATC Centre mostly ‘tolerable’ while the overall rating from ATM specialists was mostly ‘efficient’. This is a result of the local ATC Centre characteristics being masked within the more generic characteristics captured by eight ATM specialists.
Airspace-related | Potential for difference | Difference is observed with ‘traffic complexity’ and ‘overall task complexity’ | This is expected as the experimental design planned for high traffic levels and overall task complexity (resulting from the simulated equipment failure).
The expected differences in RIF probabilities are a result of the experimental design
(e.g. traffic complexity and task complexity) and the overall difference in the
populations sampled (i.e. various ATC Centres sampled in Chapter 8 compared to the
ATC Centre sampled in the experiment). In short, the comparison of RIFs probabilities
for a ‘generic’ and a particular ATC Centre shows similarity.
10.3.1.3 Interactions between RIFs
This step consisted of an assessment and subsequent incorporation of interactions
between identified RIFs, as presented in Table 8-5 (Chapter 8). Based on the
methodology for the quantification of RIFs interactions developed in section 8.4.3 of
Chapter 8, it is possible to determine the coefficient of interaction for the interactions
between 18 relevant RIFs. This coefficient is k=1/(N-1)=1/17=0.059 (where N
represents the total number of relevant RIFs).
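The value of the coefficient follows directly from the number of relevant RIFs, as the short Python sketch below shows. The function name is an illustrative assumption; only the relation k=1/(N-1) is taken from the text.

```python
def interaction_coefficient(n_rifs):
    """Single coefficient of interaction k = 1/(N-1) for N relevant RIFs."""
    if n_rifs < 2:
        raise ValueError("at least two RIFs are needed for an interaction")
    return 1.0 / (n_rifs - 1)


k = interaction_coefficient(18)
print(round(k, 3))  # 0.059
```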
10.3.1.4 Recovery context indicator (Ic)
This particular study investigated 18 relevant RIFs, where six RIFs are defined via
three levels of impact and six RIFs via two levels of impact (according to qualitative
descriptors defined in Chapter 7, section 7.3). The remaining six RIFs are defined
through only one level, either because factors were controlled in the experiment or the
participants gave identical answers. For details see Table 10-3 and Chapter 9. In total,
this approach generates 3^6 × 2^6 = 46,656 possible contexts, each defined through the
corresponding recovery context indicator.
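The number of possible contexts can be checked by enumerating every combination of RIF levels. The Python sketch below assumes only the level counts given above (six RIFs with three levels, six with two, six with one); the option indices it generates are placeholders and do not correspond to the actual level numbers or probabilities of any RIF.

```python
from itertools import product

# Number of levels per RIF: six RIFs with three levels, six with two, six with one.
levels_per_rif = [3] * 6 + [2] * 6 + [1] * 6


def count_contexts(levels):
    """Multiply the number of levels of each RIF to obtain the number of contexts."""
    total = 1
    for n in levels:
        total *= n
    return total


# Explicit enumeration: one option index per RIF (placeholder indices, not real levels).
contexts = product(*(range(n) for n in levels_per_rif))
n_enumerated = sum(1 for _ in contexts)

print(count_contexts(levels_per_rif), n_enumerated)  # both equal 3**6 * 2**6
```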
Table 10-3 Summary of RIFs defined through a single corresponding level
Recovery Influencing Factor (RIF) | Descriptor | Probability | Level | Comment
--- | --- | --- | --- | ---
Complexity of failure type | Multiple systems affected | 1 | 3 | Simulated Flight Data Processing System (FDPS) failure affects multiple systems
Time course of failure development | Sudden failure | 1 | 1 | The FDPS failure is simulated as a sudden failure
Number of workstations/sectors affected | All workstations | 1 | 3 | The FDPS failure is simulated to affect the entire ATC Centre
Existence of recovery procedure | Inappropriate | 1 | 3 | The objective of the experimental investigation was to simulate failure without recovery procedure
Duration of failure | Short period of time | 1 | 2 | The FDPS failure is simulated to last long enough to capture all phases of the recovery
Ambiguity of information in the working environment | External working environment matches the controller’s internal mental model | 1 | 1 | The controllers responded positively to the question on match between external environment and internal mental model, although they could not say that this match was one hundred percent.
After the calculation of all 46,656 possible contexts it was determined that the mean
value of the Ic is 0.029, ranging from -0.088 to 0.121. The distribution of the recovery
contexts is presented in Figure 10-4. Based on the shape of the Ic distribution, the data
has been fitted with two normal distributions. The result of this fitting is presented in
Appendix XV.
[Figure 10-4: histogram. x-axis: Recovery context indicator (Ic), tick labels from -0.088 to 0.112; y-axis: Frequency, 0 to 800]
Figure 10-4 Distribution of the recovery context indicator in the experiment
Using the experimental results, the distribution of the Ic derived in Chapter 8 is
assessed using the verification criteria (Table 10-4). In other words, a set of
expectations was defined before comparing the distribution of Ic for a ‘generic’ ATC
Centre (Chapter 8) and a particular ATC Centre used in the experiment.
Table 10-4 Verification of the distribution of the recovery context indicator obtained from a ‘generic’ approach (Chapter 8) and the experiment
Recovery context indicator (Ic) | Verification criteria | Result / Comment
--- | --- | ---
Shape | Potential for difference as a result of the local characteristics of a particular ATC Centre as compared to a ‘generic’ ATC Centre (applies to all four characteristics) | Shape: the difference is observed with the left tail of the distribution
Mean | | Mean: similar [2]
Median | | Median: similar [3]
Range | | Range: similar [4]
The main difference observed is the shape of the distribution in the left tail. This cannot
be explained by the difference in the RIF probabilities as the previous section showed
that they differed for only two RIFs, as a result of the characteristics of the experimental
design. Therefore, it is assumed that the shape of the left tail resulted from the local
characteristics of the ATC Centre used in the experiment (Figure 10-4). Although these
characteristics may have existed in the distribution of Ic obtained from a ‘generic’ ATC
Centre (Chapter 8), they may be masked by a ‘generic’ approach.
Therefore, the cause of the deviation in the left tail may be the incorporation of a single
coefficient of interaction between all RIFs, as discussed in section 8.4.3 of Chapter 8.
Although it is known from the operational experience that the RIF interactions do not
have the same level of influence, this thesis had to define a more generic approach to
account for the lack of operational data.
The assumption that the change in the shape of the Ic distribution (in the left tail) is a
result of a single value of the coefficient of interaction, which is no longer capable of
properly accounting for local characteristics, is further assessed using the example of the RIF
‘Adequacy of HMI and operational support’. This RIF is chosen because the interaction
matrix (Table 8-26, Chapter 8) indicates that this RIF impacts on several other RIFs.
Thus a change of its coefficient of interaction may have a significant impact on the Ic
distribution. Accordingly, the coefficient of interaction relevant to this RIF is increased
from the previous value of k=1/(N-1)=1/17=0.059 (section 10.3.1.3) by a factor of 10 to the
new value of k=10/(N-1)=10/17=0.59. The resulting distribution of Ic, presented in
Figure 10-5, shows a notable change in the shape of the left tail.
[2] The mean value of Ic for a ‘generic’ ATC Centre is 0.027, whilst for the ATC Centre used in the
experiment it is 0.029.
[3] The median value of Ic for a ‘generic’ ATC Centre is -0.023, whilst for the ATC Centre used in the
experiment it is -0.026.
[4] The range of Ic values for a ‘generic’ ATC Centre is from -0.069 to 0.131, whilst for the ATC
Centre used in the experiment it is from -0.088 to 0.121.
[Figure 10-5: histogram. x-axis: Recovery context indicator (Ic), tick labels from -0.088 to 0.092; y-axis: Frequency, 0 to 800]
Figure 10-5 Distribution of the recovery context indicator in the experiment with an increased value of the coefficient of interaction
In short, the comparison of the distribution of Ic obtained from a ‘generic’ ATC Centre
and from the particular ATC Centre shows no difference in the mean, median, and
range, but only in the shape of the left tail. This difference in the shape has been
explained by the inadequate definition of the coefficient of the interaction. As previously
discussed in Chapter 8, more accurate definition of this coefficient will be possible once
a detailed database of human performance becomes available in the ATM industry.
While the controllers’ responses gave a basis for the definition of the recovery context
indicator (Ic) for each possible recovery context, it was also possible to define
indicators for each controller. In several cases, the participants were not able to select
the corresponding level for some RIFs. For example, in the case of the RIF ‘weather
conditions during the recovery process’ several controllers were so preoccupied with
the recovery process that they did not pay any attention to the weather conditions.
Therefore, they were unable to select the appropriate level for this RIF. The missing
responses were inferred from those available for this RIF. In other words, the missing
responses were replaced with the answer ‘unchanged’ (corresponding to Level 2)
reported by the majority of controllers. This is also in line with the actual design of the
experiment, where similar weather conditions were presented to the controllers in the
pre- and post-failure period. A similar approach is applied for other missing answers.
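The replacement of missing answers by the majority answer can be sketched in Python as follows. The function name, the use of `None` to mark a missing answer, and the example responses (including the label ‘worsened’) are illustrative assumptions and do not reproduce the debriefing data.

```python
from collections import Counter


def fill_with_majority(responses):
    """Replace missing responses (None) with the most frequently reported answer."""
    reported = [r for r in responses if r is not None]
    if not reported:
        raise ValueError("no reported answers available to infer from")
    majority, _count = Counter(reported).most_common(1)[0]
    return [majority if r is None else r for r in responses]


# Invented example for the RIF 'weather conditions during the recovery process'.
weather = ["unchanged", None, "unchanged", "worsened", None, "unchanged"]
print(fill_with_majority(weather))
```

This approach assumes that the majority answer is a reasonable stand-in for the missing ones, which in the weather example is supported by the experimental design.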
Figure 10-6 shows the distribution of recovery contexts for the 30 controllers. All values of
the Ic are positive and range between 0 and 0.1. This reflects an average or tolerable
environment (values of Ic are close to 0) that has potential for improvement to
facilitate better recovery from equipment failure.
Figure 10-6 Distribution of the recovery context indicator of 30 controllers
After the assessment of recovery contexts surrounding each controller, the next section
reviews the potential solutions to enhance the recovery context (and thus controller
recovery) using the methodology developed in Chapter 8. In other words, the next
section analyses the sensitivity of the Ic to changes in RIFs.
10.3.1.5 Optimal solutions
In searching for the areas for potential enhancement to improve the controller’s
recovery process, it is necessary to focus on RIFs which may be affected at the level of
the ATC Centre. Table 10-5 presents the nine RIFs that could be enhanced, based on
the responses of the controllers who participated in the experiment and the
characteristics of the ATC Centre investigated.
Table 10-5 A review of RIFs with the potential for recovery enhancement
RIFs                                     Potential for improvement
Internal RIFs
  Training for recovery                  √
  Previous experience                    -
  Experience with system performance     -
  Personal factors                       √
  Communication for recovery             √
Equipment failure related RIFs
  Complexity of failure type             -
  Time course of failure development     -
  Number of workstations affected        -
  Time necessary to recover              √
  Existence of recovery procedure        √
  Duration of failure                    -
External RIFs
  Adequacy of HMI                        √
  Ambiguity of information               -
  Adequacy of organisation               √
Airspace related RIFs
  Traffic complexity                     -
  Airspace characteristics               √
  Weather conditions                     -
  Task complexity                        √
It is important to note that the remaining RIFs are not taken into account for several
reasons. Firstly, in the particular experiment, a number of RIFs attained their most
favourable levels. In such cases, the majority of controllers expressed satisfaction with
the ATC system and expressed no desire for improvement of the particular RIFs.
Furthermore, several RIFs were controlled in the experiment and as such cannot be
changed. These are: complexity of failure type, time course of failure development,
number of workstations affected, and duration of failure. Finally, certain RIFs simply
cannot be changed, such as weather and experience with a particular type of
equipment failure, whilst traffic complexity cannot be influenced at the level of the ATC
Centre. This resulted in a total of nine RIFs that have the potential to enhance the
recovery context and thus controller recovery performance (Table 10-5). The next
section illustrates how the improvement of one RIF (‘existence of the recovery
procedure’) could influence the recovery context.
10.3.1.5.1 Impact of enhancing ‘recovery procedure’ on recovery context
As the participating ATC Centre does not have a recovery procedure for FDPS failure
in place, this factor is chosen as the most practical and effective way of supporting
controllers and enhancing their recovery performance (footnote 5). Assuming that the
management at the ATC Centre implements recovery procedures for FDPS failure, the
‘existence of recovery procedure’ RIF would be enhanced from Level 3 to Level 1 and
thus defined as ‘suitable to the situation in question’ (the probability of Level 1 equals
1.00; Table 10-6). This approach also assumes that all other RIFs remain unchanged
and that any potential impact of this change on other RIFs will be reflected through
identified RIF interactions.
The resulting recovery context would take the mean value of 0.091 (SD=0.0398; Table
10-6). The difference in the distribution of the Ic with and without change in the
recovery procedures has been tested using the non-parametric Mann-Whitney test
(presented in Chapter 6, section 6.7.4). Overall, the baseline recovery context differs
significantly from the recovery context which incorporated the proposed enhancement.
This means that the design of an appropriate recovery procedure significantly
enhances the recovery context and thus creates a better environment for controller
recovery.
Table 10-6 A review of the proposed recovery solutions
Potential RIF for change:                                  Existence of recovery procedure
Initial level (probabilities of Levels 1, 2, 3):           0, 0, 1
Ic at initial level (M, SD, SE):                           M=0.029, SD=0.036
Level after iteration (probabilities of Levels 1, 2, 3):   1, 0, 0
Ic after iteration (M, SD, SE):                            M=0.091, SD=0.039
Statistical significance with 95% confidence interval:     p<0.001, Sig (U=3E08, z=-196.2)
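To illustrate the kind of comparison summarised in Table 10-6, the following Python sketch computes a Mann-Whitney U statistic and its normal-approximation z score for two samples. It is an illustrative sketch only and not the procedure used to produce the reported values: the sample values are invented, the helper names are hypothetical, and the z score omits the tie and continuity corrections that a statistical package may apply.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U, z): the smaller U statistic and its normal-approximation z."""
    n1, n2 = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n1])
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    u_b = n1 * n2 - u_a
    u = min(u_a, u_b)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    return u, (u - mean_u) / sd_u


# Invented Ic values for a baseline and an enhanced recovery context.
baseline = [0.01, 0.03, 0.02, 0.05]
enhanced = [0.08, 0.09, 0.07, 0.12]
u_stat, z_score = mann_whitney_u(baseline, enhanced)
print(f"U={u_stat}, z={z_score:.2f}")
```

Because the smaller of the two U statistics is used, the resulting z score is zero or negative, which matches the sign convention of the z values reported in this chapter.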
It has to be noted that the proposed change in the recovery procedure represents only
one possible form of recovery context enhancement. In reality, one ATC Centre may
undertake several other solutions to enhance controller recovery. Furthermore, the
proposed change assumes the definition of the recovery procedure for a particular
equipment failure. Therefore, the calculated recovery context indicator is valid for this
failure type only and it would have to be recalculated for other failure types.
Footnote 5: The only available procedures in this ATC Centre are those defined by ICAO. As previously
discussed in Chapter 5, ICAO does not define recovery practice for the FDPS failure.
This approach may be used to rate the significance of each proposed change and
compare it with its related cost. However, the evaluation of the related costs, as
opposed to the benefit, is not so straightforward and would necessitate an input from
the specific ATC Centre. Therefore, another approach presented in Chapter 8 may be
utilised to ‘rate’ the benefit of implemented changes by the calculation of the ‘recovery
context efficiency’. The ratio between the value of the current recovery context (mean
value of 0.04; Figure 10-5) and the value of the most positive recovery context feasible
in the particular ATC Centre (i.e. Ic=0.44) indicates that roughly a tenfold improvement
(0.44/0.04 = 11) is needed to achieve the most positive value of Ic.
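The arithmetic behind this ratio can be sketched in a few lines of Python. The function name `recovery_context_efficiency` is hypothetical, and the sketch assumes that the efficiency is simply the current Ic divided by the most positive feasible Ic, as the sentence above suggests; the full definition is given in Chapter 8.

```python
def recovery_context_efficiency(current_ic, best_feasible_ic):
    """Ratio of the current recovery context indicator to the best feasible one."""
    if best_feasible_ic <= 0:
        raise ValueError("best feasible Ic must be positive")
    return current_ic / best_feasible_ic


# Values quoted in the text: current mean Ic of 0.04, best feasible Ic of 0.44.
efficiency = recovery_context_efficiency(0.04, 0.44)
improvement_factor = 1 / efficiency  # how many times Ic would need to increase
print(f"efficiency = {efficiency:.2f}, improvement factor = {improvement_factor:.0f}")
```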
The next section analyses the recovery steps taken by the controllers and their overall
recovery effectiveness.
10.3.2 Required recovery steps
The recovery performance of each participant has been compared to the pre-
determined set of required recovery steps. Figure 10-7 presents the ratio of recovery
steps performed by each participant to the total number of steps, whilst Figure 10-8
presents the distribution of recovery steps carried out. Only three out of 17 steps were
performed by all participating controllers. These are detection of the problem, location
of traffic, and identification of failure type (footnote 6).
Figure 10-7 Recovery steps performed by each participant (x-axis: participants 1 to 30; y-axis: percentage of recovery steps performed, 0 to 100; series: steps performed and steps not performed)
Footnote 6: Note that if a controller did not seek failure-related information from the
coordinator, the coordinator was advised to inform the controller but only after the controller detected the failure. As a result, the occurrence of this step is inevitable.
Figure 10-8 Distribution of required recovery steps (S1 to S17) (x-axis: required recovery steps S1 to S17; y-axis: number of participants, 0 to 30)
Further data analysis shows that on average each controller performed 74.2 percent of
the required recovery steps, ranging from as low as 29 percent to 100 percent. The
most neglected steps were the re-identification of all traffic (S14) and confirmation of
Mode C (i.e. confirmation of the accuracy of the post restoration FDPS data – S15).
The post restoration recovery steps of re-identifying traffic and validating Mode C are
important as these steps are considered best practice to ensure system safety in the
aftermath of an FDPS failure. The re-identification process is necessary for two
reasons. Firstly, the identification of traffic is lost whilst aircraft occupy a holding
pattern. Separation in a holding pattern is purely procedural and radar separation does
not apply. Secondly, there is potential for label swapping and garbling of radar
signals when aircraft are in close lateral proximity (such as in a holding pattern).
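The percentage of required recovery steps performed by a controller could be computed as sketched below in Python. The step labels S1 to S17 follow Figure 10-8, while the function name and the two sample controllers are hypothetical illustrations rather than experimental data.

```python
REQUIRED_STEPS = [f"S{i}" for i in range(1, 18)]  # S1 to S17


def percent_steps_performed(performed):
    """Percentage of the 17 required recovery steps found in `performed`."""
    done = set(performed) & set(REQUIRED_STEPS)
    return 100.0 * len(done) / len(REQUIRED_STEPS)


# Hypothetical observations for two controllers.
observed = {
    "controller_A": REQUIRED_STEPS,                   # all steps performed
    "controller_B": ["S1", "S2", "S3", "S4", "S5"],   # five steps performed
}
percentages = {name: percent_steps_performed(steps) for name, steps in observed.items()}
mean_percentage = sum(percentages.values()) / len(percentages)
print(percentages, round(mean_percentage, 1))
```

With 17 required steps, a controller who performs five of them reaches about 29 percent, which corresponds to the lower end of the range reported above.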
Further investigation of the percentage of the steps performed in three sessions
reveals a significant difference between the first and the third session. The percentage
of the steps carried out in the first session is significantly lower than in the third
session. The relevant statistics are presented in Table 10-7. The percentage of the
performed recovery steps in the first experimental session is on average 64 percent,
increasing in the second experimental session to 77 percent, reaching 82 percent in
the third experimental session (Table 10-7).
Table 10-7 Percentage of performed recovery steps in three experimental sessions
Session  Statistics           Paired sessions  Non-parametric Mann-Whitney test results
1        M=63.98, SD=21.69    1 and 2          p>0.05
2        M=77.06, SD=17.64    1 and 3          p=0.044, Sig (U=23.5, z=-2.0)
3        M=81.77, SD=12.84    2 and 3          p>0.05
After the last experimental session, it was suspected that certain changes had been
implemented in the training of controllers in the participating ATC Centre. The
debriefing session with controllers participating in the third experimental session and
the input from management revealed the incorporation of a compulsory emergency
training module within every rating conversion and continuation training course. This
change was first incorporated in the SID/STAR training that started in May 2006. As
a result, several controllers participating in the third experimental session (taking place
in June 2006) benefited from this change. It seems that this change in the training
syllabus led to the increased number of recovery steps performed and the significant
difference observed when compared to the first experimental session.
Statistical tests performed to determine the relationship between the percentage of
recovery steps performed and the 18 RIFs showed that only RIF2 (‘previous experience
with equipment failures’) has a statistically significant correlation. More precisely, the
negative correlation identified (r=-0.31) indicates that controllers who have experienced
equipment failures tend to perform more of the required recovery steps compared to
those who have not experienced failure. In other words, experience with equipment
failures enhances the controllers’ ability to recover. This finding should be transferred
into the training syllabus of every ATC Centre.
10.3.3 Recovery effectiveness
As explained in the previous Chapter, this variable is based on data and information
from three different sources, where each controller is categorised as follows: very good
(VG), good (G), adequate (A), partially adequate (PA), and inadequate (I). The
recovery performance of 43 percent of controllers is rated as partially adequate or
totally inadequate (Figure 10-9). These controllers did not assure ATC system
protection from possible further equipment degradation and did not employ timely and
accurate strip marking and strip board management. Therefore, they had little or no
means of supporting their mental picture of traffic and airspace. The post-restoration
steps were performed only to some basic extent without any proper check of the new
data accuracy. In addition, such a high percentage of inadequate performance
indicates that there is room for improvement throughout the ATC Centre participating in
this experimental investigation. The management of the ATC Centre should implement
solutions to assure a more efficient handling of unusual/emergency situations. Such
solutions could include emergency training on equipment failures, design of recovery
procedures, and regular briefings.
Figure 10-9 Distribution of recovery effectiveness per category (presented via frequencies and relative percentages)
Comparison of the recovery effectiveness for the three experimental sessions does not
reveal any significant differences (using the non-parametric Mann-Whitney test). In
spite of the implemented change in the participating ATC Centre (i.e. compulsory
emergency training module within the SID/STAR conversion training) and the increase
in the number of recovery steps performed, the effectiveness of the recovery
performance did not differ from one session to the other. This finding confirms that the
rating of recovery effectiveness does not depend on a simple count of recovery steps
performed. This finding further justifies the use of pooled data from all three
experimental sessions. Recovery effectiveness is an indication of the overall objective achieved with the
execution of those steps, but it takes no account of the time frame (recovery duration)
within which the objective is achieved. The combined effect of recovery effectiveness
and recovery duration is assessed in section 10.3.5.
10.3.4 Recovery duration
The recovery duration is the time measured from the controller’s first action to the end
of the recovery process. During the experiment the first action was identified by the
observation and video recording of each controller’s performance, further validated with
the controller (during the post-experiment debriefing session) and the SME. For
example, the time of the first action was the moment when a controller initiated a
search for the uncorrelated track(s), contacted Area Control Centre (ACC) to check on
the uncorrelated track(s) or contacted aircraft to ask for a transponder check (using the
phraseology “squawk ident”). The end of the recovery process in this particular
experimental design was influenced by the restoration of the failed system and the
performance of the necessary post-restoration steps.
In general, the recovery duration ranged between 12:08 and 15:49 minutes, with an
average duration of 14:38 minutes (SD=0:55). The distribution of the recovery duration
of all 30 controllers per four duration categories is presented in Figure 10-10. These
categories are: 12-13, 13-14, 14-15, and 15-16 minutes. Figure 10-10 shows that 50
percent of controllers initiated the first recovery action within the first minute of the
failure occurrence (and thus their recovery duration lasted between 15 and 16
minutes). The shortest recovery duration is captured in the recovery performance of
two controllers (6.7 percent; Figure 10-10). These two controllers, although initiating
recovery later than the others, implemented an excellent recovery strategy. This finding
highlights that neither recovery duration nor recovery effectiveness alone is an
appropriate indicator of the overall recovery outcome. To enable a safety assessment
of the recovery performance it is necessary to account for both, as presented in section
10.3.5.
Figure 10-10 Distribution of recovery duration
Comparison of the recovery duration for the three experimental sessions revealed
significant differences. More precisely, the recovery duration in the third experimental
session is significantly longer than in the first two sessions (Table 10-8). This is a result
of the controllers from the third session reacting to the identified failure more promptly
compared to the controllers from the previous two sessions. This may be the result of
the change in the training implemented by the management in the participating ATC
Centre prior to the third session. However, it has to be noted that more prompt reaction
to the identified failure (i.e. longer recovery duration) does not necessarily entail an
effective recovery.
Table 10-8 Comparison of recovery durations between three experimental sessions
Session  Statistics          Paired sessions  Non-parametric Mann-Whitney test results
1        M=14:15, SD=1:02    1 and 2          p>0.05
2        M=14:25, SD=0:58    1 and 3          p=0.031, Sig (U=21.5, z=-2.2)
3        M=15:14, SD=0:18    2 and 3          p=0.014, Sig (U=17.5, z=-2.5)
Non-parametric Kendall’s tau tests performed between recovery duration and various
RIFs, reveal four statistically significant correlations. These are presented in Table 10-9
while the details of this test are discussed in Chapter 6. Firstly, the analysis shows that
the recovery duration tends to be longer (footnote 7) if the last emergency training had a module
on equipment failures. This finding indicates the benefit that emergency training has on
recovery duration (as it prepares controllers to react rapidly to an emergency situation).
Secondly, a similar effect on recovery duration is seen with enhanced communication
for recovery. In other words, if the controllers initiate recovery sooner, they have more
time to adequately communicate the problem to team members or a supervisor.
Thirdly, the existence of adequate recovery procedures promotes prompt recovery
action. This is in line with the finding of the first test. Finally, recovery duration
increases with a decrease in traffic complexity. This is expected as the less demanding
traffic situation allows more prompt action and initiation of the first recovery action
sooner rather than later.
Table 10-9 Statistical tests and results
Variable 1: Recovery duration. Test: non-parametric correlation (Kendall’s tau).
Variable 2                                               Statistical significance at 95% confidence level
Last emergency training (module on equipment failure)    p=0.018 (r=-0.39)
Communication for recovery                               p=0.10 (r=-0.39)
Existence of the recovery procedure (footnote 8)         p=0.15 (r=-0.41)
Traffic complexity                                       p=0.004 (r=-0.46)
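As an illustration of the correlation measure reported in Table 10-9, the following Python sketch computes Kendall's tau-b (the variant that corrects for ties) between an ordinal RIF level and the recovery duration. The data are invented and the function name is hypothetical. The sketch returns the coefficient only and does not compute a p-value, and it is an assumption that the tie-corrected variant matches the one used for the reported results.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b for paired observations, with a correction for ties."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from every count
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = math.sqrt((untied + ties_x_only) * (untied + ties_y_only))
    if denominator == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denominator


# Invented example: traffic complexity level (1 = low, 3 = high) against
# recovery duration in seconds for six hypothetical controllers.
traffic_complexity = [1, 1, 2, 2, 3, 3]
recovery_duration = [940, 925, 900, 910, 780, 860]
print(round(kendall_tau_b(traffic_complexity, recovery_duration), 2))
```

In this invented sample the coefficient is negative, meaning that higher traffic complexity is paired with shorter recovery durations.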
After assessing both recovery effectiveness and recovery duration, it is realised that
independently they are not appropriate indicators of the recovery outcome, as
discussed in Chapter 5. Therefore, a safety assessment of the overall recovery
performance necessitates the use of both variables combined into the ‘outcome of the
recovery process’ presented in the following section.
Footnote 7: A more prompt first recovery action by a controller corresponds to a longer recovery
duration.
Footnote 8: There is no recovery procedure for the simulated equipment failure in the participating ATC
Centre, but some controllers stated that they had experienced similar failures as part of their initial simulator training. Discussion with the subject matter expert revealed that this particular equipment failure is not simulated in any training syllabus.
10.3.5 Outcome of the recovery process
The outcome of the recovery process represents the final stage in technical and
controller recovery as previously discussed in section 5.3 of Chapter 5. Since no
technical recovery was taken into account in this experiment, the outcome of the
recovery process focuses solely on the outcome of controller recovery. This is defined
as a combination of two recovery variables. Firstly, recovery effectiveness
accounts for the recovery steps carried out by a controller and the achievement of the three
key objectives (i.e. ATC system protection, maintenance of situational awareness, and
adequate post-restoration steps). Secondly, recovery duration accounts for the time
frame in which these steps were performed. In line with the discussion in Chapter 5,
the outcome of the recovery process accounts for successful and unsuccessful
recovery. An additional category for ‘tolerable’ recovery outcome is also defined in this
thesis (Table 10-10).
Table 10-10 The outcome of the recovery process matrix applicable to the experimental set up presented in this thesis (S stands for successful, T for tolerable, and U for unsuccessful recovery)
Recovery effectiveness    Recovery duration (minutes)
                          12-13   13-14   14-15   15-16
Very good                 T       T       S       S
Good                      T       T       T       S
Adequate                  U       T       T       T
Partially adequate        U       U       T       T
Totally inadequate        U       U       U       T
The recovery outcome matrix highlights that successful recovery requires the initiation
of the recovery process within the first two minutes from the instant of the failure
occurrence and the performance of the majority of the recovery steps (assuring
achievement of all three objectives). An unsuccessful recovery is a result of a controller
failing to achieve two or more key objectives while initiating the recovery after more
than one minute from the instant of the failure occurrence. The delayed first recovery
action leaves the ATC system completely unprotected. Therefore, the temporal
requirements for the unsuccessful recovery account for three categories of the
recovery duration variable (Table 10-10). Everything outside the scope of the
successful and unsuccessful recovery is considered tolerable. The above discussions
are only applicable to this experimental time frame and setting, and are extracted
based on operational experience, with a further validation by the SME.
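The matrix in Table 10-10 can be expressed as a simple lookup, as sketched below in Python. The cell values are taken from Table 10-10. The function names, the use of seconds as input, and the treatment of category boundaries (for example, a duration of exactly 13:00 is placed in the 13-14 category) are assumptions made for illustration.

```python
# Outcome matrix from Table 10-10: S = successful, T = tolerable, U = unsuccessful.
OUTCOME_MATRIX = {
    "very good":          {"12-13": "T", "13-14": "T", "14-15": "S", "15-16": "S"},
    "good":               {"12-13": "T", "13-14": "T", "14-15": "T", "15-16": "S"},
    "adequate":           {"12-13": "U", "13-14": "T", "14-15": "T", "15-16": "T"},
    "partially adequate": {"12-13": "U", "13-14": "U", "14-15": "T", "15-16": "T"},
    "totally inadequate": {"12-13": "U", "13-14": "U", "14-15": "U", "15-16": "T"},
}


def duration_category(duration_seconds):
    """Map a recovery duration in seconds to its one-minute category label."""
    minutes = duration_seconds // 60
    if not 12 <= minutes <= 15:
        raise ValueError("duration outside the 12 to 16 minute range of the matrix")
    return f"{minutes}-{minutes + 1}"


def recovery_outcome(effectiveness, duration_seconds):
    """Look up the recovery outcome for an effectiveness rating and a duration."""
    return OUTCOME_MATRIX[effectiveness.lower()][duration_category(duration_seconds)]


# A 'very good' recovery lasting 14:38 falls in the 14-15 minute category.
print(recovery_outcome("Very good", 14 * 60 + 38))
```

As noted above, the matrix applies only to this experimental time frame and setting, so the lookup would need to be redefined for other failure durations.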
Based on the presented categorisation, the outcome of the recovery process for
controllers who participated in the experiment is mostly tolerable (Figure 10-11). This
finding again confirms that there is room for improvement of the recovery performance
in the ATC Centre used in this experiment.
Figure 10-11 Distribution of the recovery outcome
After assessing all recovery variables, the next section identifies any relevant
interactions between them.
10.3.6 Interactions
This section investigates the level of interactions between the recovery variables using
statistical testing (previously discussed in Chapter 6). Table 10-11 presents the results.
Table 10-11 Statistical tests and results
Test: non-parametric test (Kendall’s tau)
Variable 1                   Variable 2                        Statistical significance at 95 percent confidence interval
Recovery context indicator   Recovery effectiveness            p=0.06, r=0.32 (footnote 9)
Recovery context indicator   Outcome of the recovery process   p=0.017, r=-0.36
Recovery effectiveness       Outcome of the recovery process   p=0.01, r=0.57
Recovery duration            Outcome of the recovery process   p>0.05
Footnote 9: Statistical significance at the 90 percent confidence interval.
Non-parametric Kendall’s tau statistical tests indicated three significant relationships
(Table 10-11). Firstly, a statistical test indicates a relationship between recovery
effectiveness and recovery context indicator at the 90 percent confidence level
(p=0.06, r=0.32). Furthermore, the Mann-Whitney non-parametric test shows a
difference in the recovery context indicator between the combined category of ‘very
good’ and ‘good’ recovery effectiveness on one side and ‘partially adequate’ and ‘totally
inadequate’ on the other (at the 90 percent confidence level, p=0.065). Secondly, a
statistical test indicates a significant relationship between the recovery context indicator
and the outcome of the recovery process at the 95 percent confidence level (p=0.017,
r=-0.36). In other words, the higher values of the recovery context indicator enhance
the outcome of the recovery process or the recovery success. Finally, a statistical test
indicates a significant relationship between recovery effectiveness and the outcome of
the recovery process. In other words, the greater the controller recovery effectiveness, the
more successful the overall recovery. All findings are in line with operational
experience.
10.3.7 Other findings
In addition to the findings above, the following points are worthy of note. These are
presented firstly by considering the phases of recovery and the corresponding
influencing factors, and secondly by considering the behaviour and attitude of the
controllers, as the simulated failure was unexpected. Finally, additional findings related
to controller recovery of relevance to the management of the particular ATC Centre and
the wider aviation community are also presented.
10.3.7.1 The recovery phases
The following paragraphs provide a review of the three distinct recovery phases as
explained in Chapter 5, section 5.2. This review focuses on the factors that influenced
controller recovery performance in each phase.
10.3.7.1.1 Detection
In the simulated runs, detection, or recognition that there is something unusual in the
ATC system, was determined by several factors. The most prominent factor was the
pilot's first contact with ATC. There were two flights entering the approach sector
simultaneously following failure injection. Depending on the pseudo-pilots’ workload,
either of these aircraft could contact the controller first. At the moment of the first
contact the flights were still outside of the controller’s area of responsibility (some
40Nm away from the airport; footnote 10) and controllers were sufficiently busy in the vicinity of
the airport providing approach control service. As a result, the aircraft were usually
asked to stand by for radar identification. In the case of late contact by the first
uncorrelated track (once the track was almost visible on the radar screen or at about
35Nm from the airport), controllers searched for the track and detection of the problem
was then immediate. The common factors that influenced the detection phase of the
recovery process in this experiment were determined based on observations, video
recordings, and debriefings. These are as follows:
• The first radio contact (RT) of uncorrelated track;
• Traffic complexity and related level of controller workload at the moment of contact;
• Display range (set at 30Nm for this experiment);
• Type of the equipment failure (uncorrelated tracks were immediately visible on the
screen once within radar range); and
• Complexity of failure type (affecting single or multiple equipment simultaneously).
Footnote 10: Note that the display range in this experiment was set to 30Nm for each controller.
It should be noted that the same set of factors also affected the instant of the first
recovery action. The reason is that detection is a prerequisite for the first recovery
action.
10.3.7.1.2 Diagnosis
In this experiment, after the detection of one uncorrelated track, the controller’s first
assumption was usually aircraft transponder failure. This prompted a request to the
pilot to squawk identification on the secondary transponder (i.e. to operate the
designated Mode A code on the primary/secondary transponder). When this check did
not produce a correlated track on the radar screen, further checks were necessary. At
this stage, the second aircraft was usually well inside the radar display range and also in an
uncorrelated state. At this point, it became obvious to the controllers that they were
experiencing some form of equipment failure and they sought information from the ATC
Centre coordinator as to the nature of the failure. The possible options were failure of
secondary surveillance radar (SSR) or FDPS failure. SSR failure was discounted as soon as
the mix of correlated and uncorrelated tracks was visible. The remaining option was FDPS failure.
The coordinator was instructed to announce that it was FDPS failure affecting the
entire ATC Centre. Moreover, he also emphasised that flight plan tracks would remain
correlated only for tracks already displayed, while all other tracks entering the system
would appear uncorrelated. The common factors that influenced the diagnosis stage of
the recovery process in this experiment were determined based on observations, video
recordings, and debriefings. These are as follows:
• The number of uncorrelated tracks observed on the radar display;
• Input by the coordinator;
• Type of equipment failure; and
• Complexity of failure type.
10.3.7.1.3 Correction
In the exercised traffic scenario, the correction phase consisted of the identification of
all traffic using an appropriate primary radar technique. There are a number of
available techniques to identify traffic. Those chosen by the controllers in this
experiment were confirmation of bearing/distance of the aircraft from a fix and the turn
method (turning a single aircraft by 30 degrees or more to ascertain positive radar
identification). Operationally, the bearing/range technique is considered to be more
effective and expeditious, as it avoids misidentification due to simultaneous turning of
more than one aircraft. The next step in this process would be to inform all traffic of the
exact nature of the equipment failure and to advise them of possible consequences
(i.e. restrictions and delays). This would be followed by restricting any sport/training or
non-commercial aircraft, refusing departing aircraft permission to depart, and utilising the
holding pattern for all arrivals. If the failure was persistent (in this experiment it lasted
15 minutes), the controllers had to consider the steps needed to assure system safety in the
case of further deterioration of the equipment reliability. Thus, they had to provide
vertical separation and preserve the highest level of situational awareness. This should
be achieved by maintaining accurate and timely strip marking and strip board
management (footnote 11). The common factors that influenced the correction stage of the
recovery process were determined based on observations, video recordings, and
debriefings. These are as follows:
• Traffic complexity;
• Existence and familiarity with the recovery procedure(s);
• Duration of failure;
• Type of equipment failure; and
• Complexity of failure type.
Figure 10-12 links the key characteristics of each recovery phase in this particular
experiment with the recovery steps relevant for each phase.
Footnote 11: The debriefing sessions investigated the overall quality of strip management and annotation without going into a more detailed analysis. In future, the structure of the debriefing session may place more emphasis on this segment of the recovery process.
Figure 10-12 Recovery phases, their corresponding influencing factors and required recovery steps
10.3.7.2 Observed behaviour and attitude
As discussed in Chapter 9, all the observations of the controllers’ attitude and
behaviour were captured by the assistant. A check-list using the SHAPE’s list of
attitudes was used as an initial tool and guidance to the assistant in performing this
task (see EUROCONTROL, 2004f). In addition, some of the observations were
captured during the debriefing sessions.
In general, the observations in the first two experimental sessions show a difference in
overt behaviour in the pre- and post-failure segment of the experimental investigation.
In line with the results obtained with other recovery variables, the analysis of the
relevant data on controllers participating in the third session did not reveal significant
changes in overt behaviour in the pre- and post-failure segment of the experiment.
Furthermore, the findings from the first two sessions are in line with the previous
findings on the consequences of stress on individual controllers (Costa, 1995). Whilst
for some controllers the overall posture remained the same throughout the exercise,
others displayed the complete opposite. The deviations from the pre-failure behaviour
involved the following:
• increased movement (i.e. overall posture, hands, feet, or head);
• forceful displacement of the strip holders;
• deviations from standard RT phraseology;
• hesitation in RT communication; and
• change in pitch or tone of voice.
The subject matter expert involved confirmed that most of these behavioural gestures
depict a typical reaction to a reduced mental picture of either the traffic or overall
situational awareness. Even during the debriefing stage of the experiment, the change
in the controllers’ behaviour was noticeable for the first two experimental sessions.
Examples include shaky voice, overall unease, high alertness, and seriousness. The
controllers who performed the recovery process at either tolerable or good levels were
noticeably more relaxed and talkative. On the other hand, the controllers who
performed at either partially adequate or inadequate levels were without exception
more nervous and reluctant to answer questions in detail, and carry out an objective
review of their own performance. The overall conclusion is that the equipment failure
was an unexpected event and contributed to a significant increase in the controller’s
workload (as reported subjectively by the participating controllers).
10.3.7.3 Additional findings
The additional findings are presented in the following paragraphs because they raise
important issues for the management of the participating ATC Centre as well as the
wider aviation community.
Although 73 percent of the controllers reported that their training was suitable to the
equipment (i.e. FDPS) failure and traffic scenario in question, analysis of data collected
in the experiment showed that 43 percent (of the 73 percent) had received their last
emergency training more than a year prior to the experiment [12]. Of the controllers
who were able to recall, 50 percent stated that the emergency training session they
participated in had a module on equipment failures, predominantly on radar failures.
However, it was also noted that 40 percent of the controllers did not have any type of
equipment failure in their last emergency training. As a result, 93 percent of controllers
who participated in the experiment reported they would like to have more frequent
training for unusual situations. The most desired frequency of emergency training
sessions was every six months. This is in line with the findings obtained in the
questionnaire survey (Chapter 6) where 45 percent of controllers believe that recurrent
training once a year is not enough to develop and maintain the level of proficiency
required for recovery from equipment failures.
[12] Note that 27 percent of controllers had their last emergency training in the month prior to this experiment, as a part of the approach rating course.
Interesting results were obtained for the question about the existence of a recovery
procedure for the simulated FDPS failure. Although the procedure for this kind of failure
does not exist in the Manual of Air Traffic Services (MATS), 20 percent of controllers
believed that this particular procedure does exist. Some of the controllers, who had
participated in the approach control course, quoted their training manual as the
reference for this procedure. However, no evidence was found to support their
statement. The best explanation for this is that these controllers identified Secondary
Surveillance Radar (SSR) failure with FDPS failure and relied on their recent radar
fallback training, without fully understanding what the implications of the loss of FDPS
are. The outcome of FDPS failure is significantly different from simple SSR failure, as it
represents a more serious failure that requires immediate attention from the controllers
with the required skills.
On the issue of Human Machine Interface (HMI) and operational support (e.g. auxiliary
display, communication panel), 46.7 percent of controllers found the Beginning to End
Skills Trainer (BEST) simulator platform suitable to the equipment failure and traffic
scenario in question, 36.7 percent found it tolerable, and 10 percent found it
counterproductive; 6.7 percent of the controllers did not respond to this question. However,
most of the controllers stated that the BEST platform’s HMI is not as good as the HMI
used in the operational centre. There are two reasons for this. Firstly, meteorological
data needs better positioning (i.e. closer to the screen) to avoid head turns and changes
of visual field. Secondly, there is no alert or warning that a failure has occurred (i.e.
a colour change to yellow or red in the ‘general information window’).
Several organisational issues were raised during the debrief sessions. The most
frequent issues raised were that controllers:
• felt that supervisors should receive more dedicated training in the handling of
unusual occurrences and system failures. Their role in coordinating recovery
actions should be more proactive. In addition, it was highlighted that coordination
with technical services and adjacent ATC Centres should be the primary
responsibility of the supervisor during a Centre crisis;
• felt that more emphasis could be placed on developing an understanding of the
separate roles of both controllers and engineers. This perceived lack of
understanding of each peer group’s function and tasks can create communication
difficulties in the operational environment;
• identified a need for an update of the MATS with regard to the on suite task
allocation between the executive and planning controller. Additionally, controllers
stated that the last three incidents involving a loss of standard separation involved
team related issues that contributed to the events. Therefore, it is necessary to
strengthen the relationship between executive and planning controllers and to
define their precise roles and responsibilities;
• stated that their roles as currently defined in MATS are ideal but in reality are
difficult to adhere to, especially in a busy operational environment. They further
stated that in the event of an unusual occurrence, there are no guidelines available
for the handling of such situations;
• stated that competency checking, conducted once per year for only one hour, is not
sufficient. They also stated that the availability of refresher training in unusual
occurrences is limited to once per year. Once again, this finding is in line with
the questionnaire survey results presented in Chapter 6.
In general, the participating controllers rated their own performance between efficient
and tolerable (47 percent rated their own performance as efficient and 50 percent as
tolerable). This is not in accordance with the overall assessment of their performance
(recovery effectiveness) where 43 percent of the controllers performed at the ‘partially
adequate’ and ‘inadequate’ levels. This should pose some concern especially
considering that 46.7 percent of controllers stated that their performance in this study
was no different from any other day. In addition, 45 percent of them marked their
performance as highly representative of their overall ability to recover from an
equipment failure in ATC. Finally, 70 percent of controllers stated that the task they
experienced in the experiment was highly realistic.
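The gap between self-rated and assessed performance can be made explicit with a short piece of arithmetic. The following is a minimal sketch only: it assumes that the self-rating scale and the assessment scale are comparable (i.e. that 'efficient' and 'tolerable' both rank above 'partially adequate' and 'inadequate'), and the variable names are chosen here for illustration.

```python
# Percentages reported in the text (share of all participating controllers).
self_rated_efficient = 47
self_rated_tolerable = 50
assessed_low = 43  # assessed as 'partially adequate' or 'inadequate'

# Share of controllers who did not rate themselves 'efficient' or 'tolerable'
# (a lower self-rating or no rating at all).
self_rated_other = 100 - (self_rated_efficient + self_rated_tolerable)

# Even if every one of those controllers was also assessed as low, the
# rest of the low-assessed group must have rated themselves higher than
# they were assessed. This gives a lower bound on over-rating.
min_overrated = assessed_low - self_rated_other

print(self_rated_other)  # 3
print(min_overrated)     # 40
```

Under the stated assumption, at least 40 percent of the participating controllers rated their own recovery performance above the level at which it was assessed.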
Furthermore, 33 percent of the controllers stated that they were not aware of the
complete impacts/implications of a particular failure or equipment failures in general. As
a result, 87 percent of the controllers stated that they would like to have some form of
aide memoire available at each CWP to assist them in recognising the effects of a
particular equipment failure and steps to be taken to recover. As a consequence, this
thesis proposes a framework for the establishment of an aide-memoire (in Appendix
III). A summary of all additional findings is presented in Table 10-12.
Table 10-12 Summary of additional findings
Variable | Finding | Comment
Training | 73 percent reported that their training was suitable | Majority of these controllers had the last training on unusual situations more than a year ago. Only half of the respondents had equipment failure training.
Training | 93 percent of controllers would like more frequent training for unusual situations | -
Trust in ATC technology | 93 percent of controllers have an objective attitude toward ATC equipment | -
Recovery procedure | 20 percent of controllers believe that the procedure for FDPS failure exists | The procedure does not exist in the ATC Centre
HMI | 46.7 percent of controllers found the BEST platform suitable to their needs and only 10 percent found it counterproductive | Negative comments are mostly related to the differences between BEST platform and the system used in the operations room
Overall recovery performance | 47 percent of controllers rated efficient; 50 percent of controllers rated tolerable | Not in accordance with their overall performance. 43 percent of controllers were rated partially adequate or inadequate.
Awareness of the impact of a particular failure | 33 percent of controllers are not completely aware | -
Availability of aide memoire | 87 percent of controllers are in favour | A framework of aide memoire is provided in Appendix III
10.4 Summary
This Chapter set out to achieve several objectives. Firstly, it set out to verify a
methodology for the quantitative assessment of the recovery context (defined in
Chapter 8) and its operational benefits. Secondly, it set out to verify a framework for an
in-depth analysis of controller recovery using recovery variables previously identified in
Chapter 5. The final objective was to assess the outcome of the recovery process.
All these objectives have been achieved by the experiment and several interesting
findings have been produced. These are as follows:
• The majority of controllers tend to omit some critical recovery steps related to the
post-restoration phase. These are re-identification of traffic and confirmation of
the accuracy of information provided by the restored equipment. The sampled
controllers seemed to rely on the information provided without questioning its
accuracy following the occurrence of a failure.
• Controllers with prior experience of equipment failures tend to carry out more
recovery steps compared to those without prior experience. In other words,
experience with any equipment failure tends to enhance the controllers’ ability to
deal with equipment failures. Moreover, this type of stress-exposure training
enhances the stress-coping skills of controllers and as such should be
incorporated into the training syllabus of every ATC Centre.
• A high percentage of inadequate recovery performance indicates that there is
room for improvement throughout the ATC Centre participating in the experiment.
Hence, the ATC Centre management should implement solutions to assure
efficient handling of unusual/emergency situations. Note, however, that the
management of the ATC Centre where the experiment took place implemented
an initial process to train controllers to deal with unusual/emergency situations.
This was in the form of a compulsory emergency training module within every
rating conversion and continuation training course.
• The first recovery action tends to occur more promptly if a controller has had
training for unusual/emergency situations.
• If the controllers initiate recovery sooner, they communicate better with team
members and the supervisor.
• The existence of adequate recovery procedures tends to promote prompt
recovery action.
• Recovery duration tends to increase with a decrease in traffic complexity. This is
expected as the less demanding traffic situation allows the controllers to initiate
recovery action sooner rather than later.
• The outcome of the recovery process variable has been defined as an overall
safety indicator of the recovery process. It represents a combination of the
recovery effectiveness and duration.
• The recovery context indicator represents a good indicator of both recovery
effectiveness and the outcome of the recovery process.
• Recovery duration itself is not a good indicator of the outcome of the recovery
process, whilst recovery effectiveness is.
• The framework for the analysis of controller recovery proposed in this thesis, and
verified in the operational environment, shows a potential for an in-depth analysis
of controller recovery from equipment failures in ATC.
Chapter 11 Conclusions
301
11 Conclusions
This Chapter presents the main findings of the research on controller recovery from
equipment failures in Air Traffic Control (ATC) and suggests avenues for future work.
The approach taken for the former is to address each of the research objectives
formulated in Chapter 1 (repeated below for ease of reference) and to present the
corresponding findings. The Chapter concludes with the identification of research
questions and ideas to be explored in future research.
11.1 Revisiting the research objectives
Chapter 1 defined a set of four research objectives for this thesis. These are to:
• Provide a systematic literature review to connect disparate but related topics of
ATC equipment failures and controller recovery, previously lacking in the area of
ATC;
• Identify potential equipment failure types and their characteristics;
• Identify contextual factors that affect controller recovery performance and derive a
methodology to quantitatively assess recovery context; and
• Propose a framework for the analysis of controller recovery. This framework should
be further verified with specific reference to a particular equipment failure type.
11.2 Conclusions
11.2.1 Literature review
The review of relevant literature aimed to connect ATC equipment failures with both
technical and air traffic controller recovery. With respect to the literature review, the
following conclusions are relevant:
1. The assessment of controller recovery from equipment failures in ATC has to
address technical and controller recovery together and not in isolation as has
been the case in the past. This holistic approach enables a complete
understanding of controller recovery and all of its influencing factors.
2. Because of the variety of equipment, components, and tools in both current and
future ATC system architectures, ATC equipment should be classified based on
the type of ATC functionality it supports. Such a functional classification is
flexible to changes in ATM/ATC and can capture both current and future
equipment failure types.
3. Recovery procedures, recovery training, and past experience with equipment
failures are the main drivers of controller recovery performance. However, the
provision of both recovery procedures and training is inconsistent across ATC
Centres.
4. The context in which controller performance takes place has an important role
in controller recovery.
11.2.2 Equipment failure types and their characteristics
Equipment failure characteristics were determined from past research and operational
experience through the analysis of operational failure reports and responses from a
questionnaire survey of air traffic controllers. With respect to equipment failure
characteristics, the following conclusions are relevant:
5. The key characteristics of ATC equipment failure are: ATC functionality
affected, complexity of failure type, time course of failure development, duration
of failure, potential causes of equipment failure, and the consequences of
equipment failure.
6. Information on equipment failure characteristics has been used to develop a
novel qualitative equipment failure impact assessment tool. This tool enables
the identification of equipment failures that are most challenging to ATC
operations.
7. Communication, surveillance, and data processing ATC functionalities are
affected most by equipment failures and have the most severe impact on ATC
operations. This finding has been verified by operational failure reports and the
results of the questionnaire survey.
8. According to operational failure reports, further verified with the results of the
questionnaire survey, equipment failures that have a major impact on ATC
operations mostly affect air-ground communication, radar surveillance
coverage, and the Flight Data Processing System (FDPS).
9. According to operational failure reports, the most frequent equipment failures
last up to 15 minutes. Furthermore, analysis of the reports has shown that the
longer the failure, the less severe it is. This finding is expected as more severe
failures are attended to immediately.
The conclusions listed above, resulting from the investigation of equipment failure
types and their characteristics in the operational ATC environment, have the potential
to impact policy formulation and the operational aspects of ATC/ATM. The thesis
findings have highlighted, for the first time, the ATC functionalities that are most
affected by equipment failures as well as those which have the most severe impact on
ATC operations. The uses of these findings are twofold. Firstly, they can be used to
identify the equipment failure types that are mandatory for recovery
training/procedures designed for an ATC Centre.
Secondly, the qualitative equipment failure impact assessment tool can be used as a
part of the incident investigation process as well as a design tool, supporting the design
of recovery training scenarios.
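The six key characteristics of an ATC equipment failure listed in conclusion 5 can be captured in a simple record, for example when logging failures for incident investigation or when drafting training scenarios. The following is a minimal sketch only: the class name, field names, category labels, and example values are assumptions made for illustration and are not the format of the impact assessment tool described in Chapter 4. The 15-minute default reflects the reported finding that the most frequent failures last up to 15 minutes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EquipmentFailureRecord:
    """One equipment failure described by the six key characteristics."""
    functionality_affected: str   # e.g. communication, surveillance, data processing
    complexity: str               # hypothetical label, e.g. "single" or "multiple"
    time_course: str              # hypothetical label, e.g. "sudden" or "gradual"
    duration_minutes: float
    potential_causes: List[str] = field(default_factory=list)
    consequences: List[str] = field(default_factory=list)

    def is_short_duration(self, threshold_minutes: float = 15.0) -> bool:
        """True if the failure lasted no longer than the threshold."""
        return self.duration_minutes <= threshold_minutes


# Hypothetical example record; the values are illustrative, not experimental data.
fdps_failure = EquipmentFailureRecord(
    functionality_affected="data processing",
    complexity="single",
    time_course="sudden",
    duration_minutes=10.0,
    potential_causes=["software fault"],
    consequences=["loss of flight plan data on the controller working position"],
)

print(fdps_failure.is_short_duration())
```

Keeping the characteristics as separate fields makes it straightforward to group or filter records by the ATC functionality affected.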
11.2.3 Controller recovery performance, recovery context, and influencing factors
The main findings related to controller recovery performance and the recovery context
are drawn from two sources of information. Firstly, the questionnaire survey results
provided an initial insight into controller recovery and relevant factors. Secondly, a
review of several Human Reliability Assessment (HRA) techniques identified a set of
relevant contextual factors, the so-called Recovery Influencing Factors (RIFs). With
respect to controller recovery and the overall recovery context, the following
conclusions are relevant:
10. This thesis presents, for the first time, a comprehensive investigation of the
factors that influence controller recovery. This has been done through a
rigorous process that started with relevant past research and continued with a
questionnaire survey, targeted experiments, and statistical analyses to develop
a functional relationship between controller recovery and its influencing factors.
11. The questionnaire survey showed that the majority of controllers experience
equipment failures annually.
12. Improvement in ATC Centre management is required to facilitate effective
recovery. This can be achieved through, for example, organised exchange of
experience within ATC Centres, not only with respect to equipment failures but
also with all types of emergency/unusual situations. Statistical tests identified
that controllers account for exchange of information regarding equipment
failures as a type of past experience.
13. The questionnaire survey showed that the vast majority of ATC Centres
surveyed have some form of recovery procedure. The most neglected
procedures are for ATC functionalities which are most challenging to controller
recovery (data processing, surveillance, and communication functionalities). In
addition, controllers highlighted the need for an abbreviated version of the
contingency manual which should be made available at each controller working
position (i.e. aide-memoire).
14. Recovery procedures should be up-to-date, complete, and follow a logical
sequence of steps that the controllers should perform. In addition, recovery
procedures need to be compatible with other procedures within the ATC Centre.
In short, procedures should be seen as guidance to the controller, they should
be adaptable to any given situation, and should take account of a variety of
contextual factors.
15. Half of the ATC Centres surveyed in the questionnaire survey have
programmes for training in recovery from equipment failures. However, this
recurrent training is usually provided once a year. The controllers believe that
the frequency of recurrent training is inadequate and are in favour of receiving
as much training as possible on emergency/unusual situations, including
equipment failures.
16. Recurrent training must be up-to-date and compatible with other training
programmes. Moreover, the recurrent training exercises should be varied and
realistic covering both outages and less severe failures. The ATC Centre should
adopt a custom of periodically reverting to backup systems in order to maintain
controllers’ proficiency with their usage, perhaps during less busy traffic
periods.
17. Regular training on system functionalities, upgrades, and degradation modes
could be a useful method to ensure consistent knowledge and familiarity with
the ATC system architecture.
18. The majority of controllers surveyed confirmed the importance of context
surrounding an equipment failure occurrence. This confirmed the earlier finding
from existing research literature.
19. The context surrounding controller recovery from equipment failure in ATC is
defined via 20 contextual factors, known as Recovery Influencing Factors
(RIFs). Each RIF can be further defined via its qualitative descriptor. This
establishes the relationship between each RIF and its influence on controller
performance.
20. An aggregated indicator of the entire recovery context has been proposed,
referred to as recovery context indicator (Ic). This quantitative indicator of the
recovery context is sensitive to changes in the individual RIFs.
This thesis presents, for the first time, a comprehensive set of the factors that influence
controller recovery (RIFs). These factors can be used as part of an incident
investigation process, enabling a detailed investigation of the impact of context on
controller recovery performance. The identification and assessment of RIFs can also
be used for the identification of recommendations on various aspects of ATC operation
and their refinement. However, the final decision on the optimal recommendation should
be based on the degree of positive shift in the value of the recovery context indicator
(as the quantitative indicator of the recovery context). Within the future ATM system,
this methodology could be easily modified to account for the shared responsibility of
separation of aircraft and collaborative decision-making between airborne and ground
based ATM system components.
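The idea of choosing between recommendations by the positive shift they produce in the recovery context indicator can be sketched as follows. This is an illustration only: the weighted-mean aggregation is a stand-in and is not the formula for Ic defined in Chapter 8, and the three RIF names, their weights, their levels, and the two recommendations are all hypothetical.

```python
from typing import Dict, List, Tuple


def context_indicator(levels: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of RIF levels (a stand-in for Ic, not the thesis formula)."""
    total_weight = sum(weights.values())
    return sum(weights[name] * levels[name] for name in weights) / total_weight


def rank_recommendations(
    baseline: Dict[str, float],
    weights: Dict[str, float],
    recommendations: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Order recommendations by the shift they produce in the indicator."""
    base_value = context_indicator(baseline, weights)
    shifts = {}
    for label, changed_levels in recommendations.items():
        adjusted = {**baseline, **changed_levels}  # apply the recommendation
        shifts[label] = context_indicator(adjusted, weights) - base_value
    return sorted(shifts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical RIF weights and levels on a 0 (worst) to 1 (best) scale.
weights = {"training": 2.0, "procedures": 1.0, "traffic_complexity": 1.0}
baseline = {"training": 0.5, "procedures": 0.0, "traffic_complexity": 0.5}
recommendations = {
    "six-monthly recovery training": {"training": 1.0},
    "publish FDPS recovery procedure": {"procedures": 0.5},
}

ranked = rank_recommendations(baseline, weights, recommendations)
for label, shift in ranked:
    print(f"{label}: shift = {shift:+.3f}")
```

With these illustrative numbers the training recommendation produces the larger positive shift and would be preferred; with the actual RIF definitions and the Chapter 8 formula the ordering could differ.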
11.2.4 Framework for the analysis of controller recovery
The framework for the analysis of controller recovery proposed in this thesis was
verified in an experimental investigation with specific reference to a particular
equipment failure type (i.e. FDPS) and a particular ATC Centre. With respect to the
framework for the analysis of controller recovery, the following conclusions are
relevant:
21. Recovery variables relevant to controller recovery from equipment failures in
ATC are the recovery context, effectiveness, and duration. This set of recovery
variables showed a potential for the rigorous analysis of controller recovery.
22. The experiment showed that the controllers with previous experience of
equipment failures executed more required recovery steps. Overall, experience
with equipment failures enhances a controller’s ability to deal with any type of
equipment failure.
23. A further finding from the experiment is that recovery duration tends to be
longer the closer in time the emergency training (with a module on equipment
failures) is to the occurrence of the actual failure.
24. Communication with team members or the supervisor is enhanced when
controllers initiate recovery action sooner (i.e. as close as possible to the instant
of the occurrence of the failure).
25. Furthermore, the experiment showed that the existence of recovery procedures
(or any type of reference material, such as training manuals) promotes prompt
recovery action.
26. The experiment also showed that recovery duration increases with a decrease
in traffic complexity.
27. The recovery context indicator represents a good indicator of both recovery
effectiveness and the outcome of the recovery process (represented as a
combination of the recovery effectiveness and duration).
28. The thesis has identified a statistically significant correlation between recovery
context indicator and the outcome of the recovery process. Hence, the outcome
of the recovery process represents a good safety indicator of the overall
recovery process.
The relevance of recovery training (either as an alternative or an addition to past
experience) and recovery procedures has been confirmed by experiment. Recovery
training and awareness of recovery procedures lead to more prompt recovery action,
better awareness of required recovery steps, and enhanced team communication.
These findings should directly inform the required policy on training and procedures for
handling unusual/emergency situations, highlighting required content, frequency, and
format. Furthermore, the recovery variables identified (recovery context, effectiveness,
and duration) have the potential to facilitate a rigorous analysis of controller recovery
from equipment failures in ATC and thus can be used in incident investigation
processes. Finally, the recovery context indicator represents a good indicator of the
outcome of the recovery process (represented as a combination of the recovery
effectiveness and duration). As such, the overall framework for the analysis of
controller recovery based on identified recovery variables can be used to assess the
outcome of the recovery process in both current and future ATM environments.
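The kind of association reported between the recovery context indicator and the outcome of the recovery process can be illustrated with a rank correlation. The following is a minimal sketch only: the choice of Spearman's coefficient, the function names, and all data values are assumptions made for illustration and are not taken from the thesis. The sketch computes the coefficient only and does not test statistical significance.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: one indicator value and one ordinal outcome score
# (higher is better) for each of five controllers.
context_values = [0.2, 0.4, 0.5, 0.7, 0.9]
outcome_scores = [1, 2, 2, 3, 4]

rho = spearman(context_values, outcome_scores)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based coefficient is used in this sketch because outcome categories are ordinal; a value of rho close to 1 would indicate that a more favourable recovery context tends to accompany a better recovery outcome.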
11.3 Future work
The research presented in this thesis demonstrates the capability to assess ATC
equipment failures and subsequent controller recovery performance. However, these
findings also suggest a number of directions for further research. These include:
• It is hard to find safety related research in the aviation industry which does not rely
upon some type of occurrence data. However, such research seldom questions
the reliability of the data available. To date, no measure of
reliability of occurrence databases has been produced. Automatic tools exist in
certain countries, for example the Safety Monitoring Function (SMF), which
captures all loss of separation incidents in the controlled airspace of that country.
Data from such a tool may provide an indication of the reliability of the occurrence
data.
• Future research should investigate ways to overcome the logistical difficulties with
capturing operational data and corresponding qualitative and quantitative aspects of
validation (e.g. in terms of questionnaire survey sample, number and characteristics
of ATM specialists, and subject matter experts).
• The further development of the qualitative equipment failure impact assessment
tool (Chapter 4) would be required to enable assessment of the impact of several
independent failures on ATC operations and thus controller performance. The
output of this more advanced approach would be to indicate the most severe
independent multiple failures. However, to achieve this, the tool would have to be
adapted to a specific ATC Centre to integrate the complexity of its ATC architecture
and the flow of data between various ATC systems.
• The questionnaire survey used in any future research should apply rigorous design
methods to avoid ambiguities and facilitate interpretation or perception of key terms
(e.g. equipment failure).
• The relationship between the particular RIF level and its impact on controller
recovery (i.e. defined via qualitative descriptor in Chapter 7 and the correlation
coefficient in Chapter 8) could be defined as a function of RIF level. This approach
would be more sensitive to the changes resulting from the incorporation of RIF
interactions.
• It would be necessary to simulate the impact of ATC equipment failures in a future
gate-to-gate ATM system where the roles for planning and executive control will be
reorganised and distributed between controllers and pilots. Additionally, this future
environment will be characterised by dynamic real-time exchange and distribution
of flight-related information. Thus, the safety assessments would have to consider
the exchange and distribution of corrupted data and its impact on both air and
ground services.
• The thesis has identified a statistically significant correlation between recovery
context indicator and the outcome of the recovery process. Future research should
transfer this finding into a model that could be used operationally in an ATC Centre.
11.4 Publications relating to this work
The following publications have been produced in support of the research on controller
recovery from equipment failures in ATC. The publications consist of journal
publications and published conference proceedings, each commented on the precise
contribution of listed co-authors.
11.4.1 Publication format: journal – accepted subject to revision
Subotic, B., Majumdar, A., and Ochieng, W.Y. (2007). Recovery from Equipment
Failures in Air Traffic Control (ATC): The findings from an international survey of
controllers. Accepted subject to revision to the International Journal of Engineering and
Operations: Air Traffic Control Quarterly. Air Traffic Control Association Institute, Inc.
11.4.2 Publication format: journal - published
Subotic, B., Ochieng, W.Y., and Straeter, O. (2007). Recovery from equipment failures
in ATC: An overview of contextual factors. The Reliability Engineering and System
Safety Journal, Vol 92 (7), pp. 858-870.
Subotic, B., Ochieng, W.Y., and Majumdar, A. (2005). Equipment Failures in Air Traffic
Control: Finding an Appropriate Safety Target. The Aeronautical Journal of the Royal
Aeronautical Society, Vol 109 (1096), pp.277-284.
11.4.3 Publication format: conference proceedings - published
Subotic, B., Ochieng, W. and Straeter, O. (2006). Recovery from Equipment Failures in
Air Traffic Control: A Probabilistic Assessment of Context. Proceedings of the
Probabilistic Safety Assessment (PSAM 08) conference, May 14-19, 2006, New
Orleans, USA.
Subotic, B., and Ochieng, W.Y. (2005). Recovery from Equipment Failures in Air Traffic
Control. In Contemporary Ergonomics 2005 (Eds. P.D. Bust and P. T. McCabe). Taylor
& Francis. Presented at the Ergonomics Society Annual Conference, De Havilland
Campus, University of Hertfordshire, Hatfield.
Chapter 12 List of References
309
12 List of References
10News (2006). Power Outage Momentarily Interrupts Air Traffic Control. From http://www.10news.com/news/8831526/detail.html
Air Transport Action Group (2005). The economic & social benefits of air transport. From http://www.atag.org/files/Soceconomic-124721A.pdf
Air Transport Association (2006). Cost of ATC Delays. From http://www.airlines.org/economics/specialtopics/ATC+Delay+Cost.htm
Airbus (2004). Global Market Forecast 2004-2023. From http://www.airbus.com/en/myairbus/global_market_forcast.html
Airways New Zealand (2006a). Manual of Air Traffic Services (amendment 113). Airways New Zealand.
Airways New Zealand (2006b). Domestic and International Aircraft Movements by Calendar Year. From http://www.airways.co.nz/documents/avimove_stats.pdf
Aviation International News (2001). Europeans embracing MLS with a vengeance. From http://www.ainonline.com/issues/04_01/Apr_2001_europeanmlspg75.html
Bainbridge, L. (1983). Ironies of Automation. Automatica, 19, 775-779. From http://www.bainbrdg.demon.co.uk/Papers/Ironies.html
Bainbridge, L. (1984). Diagnostic Skill in Process Operation. Department of Psychology, University College London. From http://www.bainbrdg.demon.co.uk/Papers/DiagnosticSkill.html
Baker, S., and Weston, I. (2001). Mayday, mayday, mayday. From http://www.isasi.org/working_groups/ats/atsmayday.pdf
Berenson, M.L., Levine, D.M., and Krehbiel, T.C. (2006). Basic Business Statistics: Concepts and Applications. Prentice Hall: Upper Saddle River, NJ.
Billings, C.E. (1996). Aviation Automation: The Search for a Human-Centred Approach. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Boehm-Davis, D., Curry, R.E., Wiener, E.L., and Harrison, R.L. (1983). Human factors of flight-deck automation: Report on a NASA industry workshop. Ergonomics, 26, 953-961.
Boeing (2004). Statistical Summary of Commercial Jet Airplane Accidents: Worldwide Operations 1959 – 2003. From http://www.boeing.com/news/techissues/pdf/statsum.pdf.
Bove, T. (2002). Development and Validation of a Human Error Management Taxonomy in Air Traffic Control. PhD dissertation. Risø National Laboratory, Roskilde. From http://www.risoe.dk/rispubl/SYS/syspdf/ris-r-1378.pdf
British Airways (2006). Flight Training Safety and Emergency Procedures (SEP) Training. From http://www.britishairwaysjobs.com/baweb1/?newms=info150
Brooker, P. (2004). Consistent and up-to-date aviation safety targets. Draft version. Cranfield University.
Brooker, P. (2006). Air Traffic Control Safety Indicators: What is Achievable? Eurocontrol: Safety R&D Seminar, 25-27 October 2006, Spain. From https://dspace.lib.cranfield.ac.uk/bitstream/1826/1372/1/Eurocontrol+2006+ATC-Brooker.pdf
Bureau of Transport and Regional Economics (2006). Aviation. Australian Government. From http://www.btre.gov.au/statistics/aviation.aspx
Bureau of Transportation Statistics (2004). Airline On-Time Statistics and Delay Causes. From http://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp
Bureau of Transportation Statistics (2006). Dictionary. From http://www.bts.gov/dictionary/list.xml?letter=A
CASA (2006). ADS-B: Automatic Dependent Surveillance – Broadcast. Civil Aviation Safety Authority Australia. From http://casa.gov.au/pilots/download/ADS-B.pdf
Christensen, W.C., and Manuele, F.A. (1999). Safety through Design: Best Practices. National Safety Council Press.
Cox, K. (2005). Teamwork and Trust: A Pilot’s Perspective. From http://safecopter.arc.nasa.gov/Pages/Columns/SBrief/SafeBrf1Articles/6Teamwork.html
Damidau, A., Kirwan, B., and Scrivani, P. (2006). Safety Getting Real: Safety Insights from Real Time Simulations. Proceedings from the EUROCONTROL Safety R&D Seminar, Barcelona 25-27 October 2006, Spain.
Daniels, J.J., Regli, S.H., and Franke, J.L. (2002). Support for Intelligent Interruption and Augmented Context Recovery. Proceedings from 7th IEEE Human Factors Meeting. Scottsdale, Arizona.
Dekker, S., Fields, B., and Wright, P. (2004). Human Error Recontextualised. From http://www.cs.mdx.ac.uk/staffpages/bobf/papers/glasgow.pdf
Department of Defense (2001). Global Positioning System: Standard Positioning Service Performance Standard. Command, Control, Communication, and Intelligence. Washington DC.
Endsley, M. (1997). Situation Awareness, Automation & Free Flight. From http://atm-seminar-97.eurocontrol.fr/endsley.htm
Endsley, M. R., and Kaber, D. B. (1999). Level of automation effects on performance, situation awareness and workload in a dynamic control task. Ergonomics, 42(3), pp. 462-492.
Endsley, M., and Kiris, E. (1995). The out-of-the-loop performance problem and level of control in automation. Human Factors, 37(2), pp. 381-394.
EUROCONTROL (1997). EUROCONTROL Standard Document for Radar Surveillance in En-Route Airspace and Major Terminal Areas. From http://www.eurocontrol.int/surveillance/gallery/content/public/documents/SURVSTD.pdf
EUROCONTROL (1999). CD-ROM: An introduction to ATM. EUROCONTROL Institute of Air Navigation Services.
EUROCONTROL (2000a). Safety Minima Study: Review Of Existing Standards And Practices. From http://www.eurocontrol.int/src/gallery/content/public/documents/deliverables/srcdoc1ri.pdf
EUROCONTROL (2000b). Conflict Resolution Assistant Level 2 (CORA2): Controller Assessments (ASA.01.CORA.2.DEL02-b.RS).
EUROCONTROL (2000c). ESARR 2: Reporting and Assessment of Safety Occurrences in ATM. From http://www.atceuc.org/site/Eurocontrol/pdf02/esarr2%20v2.0%20en.pdf
EUROCONTROL (2001a). ECAC Safety Minima for ATM. EUROCONTROL Safety Regulation Commission.
EUROCONTROL (2001b). ESARR 4: Risk Assessment and Mitigation in ATM. EUROCONTROL Safety Regulation Commission. http://www.eurocontrol.int/src/gallery/content/public/documents/deliverables/esarr4v1.pdf
EUROCONTROL (2001c). Safety assessment of the free route airspace concept: Feasibility phase. Working Draft 0.3. European Organisation for the Safety of Air Navigation, EUROCONTROL. From http://www.eurocontrol.int/airspace/gallery/content/public/documents/frap/safety_assessment_report_integrated
EUROCONTROL (2001d). European Manual of Personnel Licensing - Air Traffic Controllers: Guidance on Implementation. From http://www.eurocontrol.int/humanfactors/gallery/content/public/docs/DELIVERABLES/L2%20(HUM.ET1.ST08.10000-GUI-01)%20Released-withsig.pdf
EUROCONTROL (2001e). Harmonisation of European Incident Definitions Initiative for ATM – HEIDI Viewer Instructions for Use. Safety, Quality and Standardisation Unit (SQS).
EUROCONTROL (2001f). EUROCONTROL Airspace Strategy for the ECAC States. From http://www.eurocontrol.int/eatm/gallery/content/public/library/airspace.pdf
EUROCONTROL (2002b). Technical Review of Human Performance Models and Taxonomies of Human Error in ATM (HERA). From http://www.eurocontrol.int/humanfactors/gallery/content/public/docs/DELIVERABLES/HF26 (HRS-HSP-002-REP-01) Released.pdf
EUROCONTROL (2002c). Glossary of Terms and Definitions & List of Acronyms (SRC DOC 4). From http://www.eurocontrol.int/src/gallery/content/public/documents/deliverables/srcdoc4e2.pdf
EUROCONTROL (2002d). Short Report on Human Performance Models and Taxonomies of Human Error in ATM (HERA). From http://www.eurocontrol.int/humanfactors/gallery/content/public/docs/DELIVERABLES/HF27%20(HRS-HSP-002-REP-02)%20Released.pdf
EUROCONTROL (2003a). MADAP in a Nutshell. Maastricht Upper Area Control Centre, Netherlands.
EUROCONTROL (2003b). Summer: ATFM summary report. From http://www.cfmu.eurocontrol.int/ATFM/public/docs/publicreport_2003year.pdf
EUROCONTROL (2003c). EUROCONTROL ATM Strategy for the Years 2000+, Volume 1. From http://www.eurocontrol.int/eatm/gallery/content/public/library/ATM2000-EN-V1-2003.pdf
EUROCONTROL (2003d). HERA-JANUS training: Analysing Human Error in Incident Investigation. 18-20 November 2003. EUROCONTROL Institute of Air Navigation Service, Luxembourg.
EUROCONTROL (2003e). The Human Error in ATM Technique (HERA-JANUS). From http://www.eurocontrol.int/humanfactors/gallery/content/public/docs/DELIVERABLES/HF30 (HRS-HSP-002-REP-03) Released-withsig.pdf
EUROCONTROL (2003f). Guidelines for Controller Training in the Handling of Unusual/Emergency Situations. From http://www.eurocontrol.int/humanfactors/gallery/content/public/docs/DELIVERABLES/T11%20(Edition%202.0)%20HRS-TSP-004-GUI-05withsig.pdf
EUROCONTROL (2003g). Radio and Navigation Aids Course (IANS_ATC_RADNAV). EUROCONTROL Institute of Air Navigation Service, Luxembourg.
EUROCONTROL (2003h). Area Navigation Applications in Europe. From http://elearning.eurocontrol.int/ATMTraining/precourse/nav/rnav/index.html
EUROCONTROL (2003i). ESARR 6: Software in ATM Systems. Safety Regulatory Commission. From http://www.eurocontrol.int/src/gallery/content/public/documents/deliverables/esarr6_e10_ri.pdf
EUROCONTROL (2004a). Evaluating the True Cost to Airlines of One Minute of Airborne or Ground Delay. Prepared by the University of Westminster for Performance Review Unit. From www.eurocontrol.int/prc/gallery/content/public/Docs/cost_of_delay.pdf
EUROCONTROL (2004b). MANTAS Basic Operational Concept, Version: Draft 0.2. EUROCONTROL.
EUROCONTROL (2004c). CORA 2 Safety Analysis: Exploratory Preliminary System Safety Assessment (PSSA). European Air Traffic Management Programme.
EUROCONTROL (2004d). Review of Techniques to Support the EATMP Safety Assessment Methodology. From http://www.eurocontrol.int/eec/gallery/content/public/documents/EEC_notes/2004/EEC_note_2004_01_1.pdf
EUROCONTROL (2004e). Managing System Disturbances in ATM: Background and Contextual Framework. From http://www.eurocontrol.int/humanfactors/gallery/content/public/docs/DELIVERABLES/HF47%20(HRS-HSP-005-REP-06)%20Released-withsig.pdf
EUROCONTROL (2004f). The Impact of Automation on Future Controller Skill Requirements and a Framework for SHAPE (HRS/HSP-005-REP-04). Human Factors Management Business Division (DAS/HUM).
EUROCONTROL (2004g). Model Based Simulation of the Turkish En-Route Airspace (EEC Report No. 396). From http://www.ans.dhmi.gov.tr/TR/ATCTR/proje/fts.pdf
EUROCONTROL (2005). ATM Contribution to Aircraft Accidents/Incidents: Review and Analysis of Historical Data. From http://www.eurocontrol.int/src/gallery/content/public/documents/deliverables/srcdoc2_e40_ri_web.pdf
EUROCONTROL (2006a). Air Traffic Control (ATC). From http://www.eurocontrol.int/corporate/public/standard_page/cb_airtraffic_controller.html
EUROCONTROL (2006b). What is PRNAV? From http://www.ecacnav.com/content.asp?PageID=82
EUROCONTROL (2006c). Performance Review Report covering the calendar year 2005. Performance Review Commission.
EUROCONTROL (2006d). The impact of fragmentation in European ATM/CNS. Performance Review Commission. From http://www.eurocontrol.int/prc/gallery/content/public/Docs/fragmentation.pdf
EUROCONTROL (2007a). Safety Nets. From http://www.eurocontrol.int/safety-nets/public/subsite_homepage/homepage.html
EUROCONTROL (2007b). Single European Sky. From http://www.eurocontrol.int/ses/public/subsite_homepage/homepage.html
European Commission (2001). Meeting society’s needs and winning global leadership. Report of the group of personalities. From http://ec.europa.eu/research/growth/aeronautics2020/pdf/aeronautics2020_en.pdf
European Commission (2006a). GNSS Autonomous Navigation Algorithms Critical Study (D3.2.2.1). Draft report. Sixth Framework Programme (2002-2006).
European Commission (2006b). Critical Analysis of Space-Based Navigation Technologies Usable for Civil Aviation (D3.1P). Draft report. Sixth Framework Programme (2002-2006).
European Space Agency (2002). Space Product Assurance: Safety (ESA Q-40-B). Requirements & Standards Division. Noordwijk, The Netherlands.
Federal Aviation Administration (1995). Approach Station Keeping (Ask) Experiment Plan and Final Report (DOT/FAA/CT-TN95/58). Department of Transportation: Federal Aviation Administration. From http://www.tc.faa.gov/acb300/techreports/TN9558.pdf
Federal Aviation Administration (1997). Hardware Product Specification Document for the Voice Switching and Control System (VSCS) (DTFA01–92–D–00004). Department of Transportation: Federal Aviation Administration.
Federal Aviation Administration (1998). Voice Switching and Control System: Attachment J-3 - Product Specification (FAA-E-2731G). Department of Transportation: Federal Aviation Administration.
Federal Aviation Administration (2000). System Safety Handbook, Chapter 3. Department of Transportation: Federal Aviation Administration. From http://www.asy.faa.gov/RISK/SSHandbook/contents.htm.
Federal Aviation Administration (2003). The Human Factors Design Standard (HF-STD-001). Compact disk, William J. Hughes Technical Center, Atlantic City International Airport, NJ.
Federal Aviation Administration (2005). Air Transportation Operations Inspector's Handbook (Order 8400), Vol 1. Department of Transportation: Federal Aviation Administration. From http://www.faa.gov/library/manuals/examiners_inspectors/8400/
Feng, S., Ochieng, W., Walsh, D., and Ioannides, R. (2005). A Measurement Domain Receiver Autonomous Integrity Monitoring Algorithm. GPS Solutions. Springer Berlin/Heidelberg.
Frese, M. (1991). Error Management or Error Prevention: Two Strategies to Deal with Errors in Software Design. In H. J. Bullinger (Ed.) Human aspects in Computing: Design and Use of Interactive Systems and Work with Terminals. Amsterdam: Elsevier Science Publishers.
Frese, M., Brodbeck, F.C., Zapf, D., & Prumper, J. (1990). The Effects of Task Structure and Social Support on Users’ Errors and Error Handling. In D. Diaper et al. (Eds.) Human – Computer Interaction - INTERACT’90 (pp.35-41). Amsterdam, Elsevier Science Publishers.
Fujita, Y., and Hollnagel, E. (2004). Failures without errors: quantification of context in HRA. Reliability Engineering and System Safety, 83, pp. 145-151.
Funk, K., Lyall, B., and Riley, V. (1996). Perceived Human Factors Problems of Flightdeck Automation: Phase 1 Final Report. Federal Aviation Administration Grant 93-G-039. From http://www.flightdeckautomation.com/phase1/phase1report.aspx
General Accounting Office (1982). Computer Outages at Terminal Facilities and Their Correlation to Near mid-air Collisions (AFMD-82-43). US GAO, Washington DC.
General Accounting Office (1991). Air Traffic Control: FAA Can Better Forecast and Prevent Equipment Failures. US GAO, Washington DC.
General Accounting Office (1996). Air Traffic Control: Good Progress on Interim Replacement for Outage-Plagued System, but Risks Can Be Further Reduced. US GAO, Washington DC.
General Accounting Office (1998). Air Traffic Control: Information Concerning Equipment Outages at Two Kansas City Area Facilities. US GAO, Washington DC.
Gordon, R., and Makings, N. (2003). Gate 2 Gate: Stakeholder Safety Survey. EUROCONTROL Experimental Centre, France.
Graham, G.M., Kinnersly, S., and Joyce, A. (2002). Safety Reporting and Aviation Target Levels of Safety. In C.W. Johnson, Investigation and Reporting of Incidents and Accidents (IRIA 2002). Department of Computing Science, University of Glasgow, Scotland.
Hai, L. (2004). Civil Aviation Safety Outline (2001-2020). From http://www.seaskyad.com/ad@cca_english/content/content_0206_special_articles/article16.htm.
Hallbert, B.P., and Meyer, P. (1995). Summary of lessons learned at the OECD Halden reactor project for the evaluation of human-machine systems. Institutt for Energiteknikk, Halden, Norway.
Heinrich, H.W. (1941). Industrial Accident Prevention – A Scientific Approach. McGraw-Hill: New York and Wiley: London.
Hilburn, B. (2004). Cognitive Complexity in Air Traffic Control - A Literature Review. EUROCONTROL Experimental Centre, EEC Note 04/04.
Hilburn, B., and Flynn, M. (2001). Air Traffic Controller and Management Attitudes Toward Automation: An Empirical Investigation. 4th USA/EUROPE Air Traffic Management R&D Seminar, Santa Fe, USA.
Hollnagel, E. (1993). Human Reliability Analysis: Context and Control. Academic Press, London.
Hollnagel, E. (1998). Cognitive Reliability and Error Analysis Method (CREAM). Elsevier Science Ltd., London, UK.
IEEE (1998). IEEE Guide for Microwave Communications System Development: Design, Procurement, Construction, Maintenance, and Operation. IEEE-SA Standards Board. From http://ieeexplore.ieee.org/iel4/5643/15123/00690973.pdf?arnumber=690973
IFALPA (2005). Interpilot: 60th Annual Conference: Boeing 787 programme update. From http://216.239.59.104/search?q=cache:oJuuByAkeqEJ:www.ifalpa.org/Interpilot/2005/06inp01.pdf+Interpilot:+60th+Annual+Conference:+Boeing+787+programme+update&hl=en&ct=clnk&cd=1&gl=uk
IFATCA (2004). Produce Definition of Controller Tools (Agenda Item B.5.2). Proceedings from 43rd Annual Conference, Hong Kong, 22-26 March 2004.
IFATCA (2005). A Positive Step to Improve Aviation Safety. From http://www.ifatca.org/press/141105.pdf
International Civil Aviation Organization (1979). Annex 5: Units of Measurement to be Used in Air and Ground Operations. Montreal, Canada.
International Civil Aviation Organization (1985). Manual of Air Traffic Forecasting (Doc 8991-AT/722/2). Montreal, Canada.
International Civil Aviation Organization (1994). All-Weather Operations Panel. Fifteenth meeting. Montreal, Canada.
International Civil Aviation Organization (1995). Review of the General Concept of Separation Panel (RGCSP). Working Group A: A Review of Work on Deriving a Target Level of Safety (TLS) for En-route Collision Risk. Montreal, Canada.
International Civil Aviation Organization (1997). Outlook for Air Transport to the Year 2005 (ICAO Circular 270-AT/111). Montreal, Canada.
International Civil Aviation Organization (1998). Human Factors Training Manual – Doc 9683 (First Edition). Montreal, Canada.
International Civil Aviation Organization (2001a). Air Traffic Management Doc 4444. Montreal, Canada.
International Civil Aviation Organization (2001b). Annex 6: Operation of Aircraft. Montreal, Canada.
International Civil Aviation Organization (2001c). Annex 11: Air Traffic Services. Montreal, Canada.
International Civil Aviation Organization (2001d). Annex 13: Aircraft Accident and Incident Investigation. Montreal, Canada.
International Civil Aviation Organization (2001e). Annex 1: Personnel Licensing. Montreal, Canada.
International Civil Aviation Organization (2003). Review the latest developments in the ATN Panel and the Aeronautical Mobile Communication Panel. From http://www.icao.int/icao/en/ro/apac/atn_2003/ip02.pdf
International Civil Aviation Organization (2005). Report of the Ninth Meeting of Communications, Navigation and Surveillance/Meteorology Sub-Group (CNS/MET/SG/9), Bangkok, Thailand, 11-15 July 2005. From http://www.icao.int/icao/en/ro/apac/2005/CNS_MET_SG9/CNSMET_SG9.pdf
International Civil Aviation Organization (2006a). Review Developments Relating to CNS/ATM Implementation: Review the Work by RNP Special Operational Requirements Study Group on the Implementation of RNP Operations. From http://www.icao.int/icao/en/ro/apac/2006/ATM_AIS_SAR_SG16/wp22.pdf
International Civil Aviation Organization (2006b). Contracting States. From http://www.icao.int/cgi/goto_m.pl?/cgi/statesDB4.pl?en
International Civil Aviation Organization (2007). CNS/ATM Systems. From http://www.icao.int/icao/en/ro/rio/execsum.pdf
Jeppesen (2001). Required Navigation Performance (RNP). Jeppesen Briefing Bulletin. From http://www.jeppesen.com/download/briefbull/den01-j.pdf
Johnson, C. W. and Holloway, C.M. (2004). On the Over-Emphasis of Human ‘Error’ As A Cause of Aviation Accidents: ‘Systemic Failures’ and ‘Human Error’ in US NTSB and Canadian TSB Aviation Reports 1996-2003. From http://www.dcs.gla.ac.uk/~johnson/papers/Cause_comparisons/Error_and_accidents.PDF
Joint Aviation Administration (1994). Joint Aviation Requirements for Large Aeroplanes (JAR–25).
Kaarstad, M., and Ludvigsen, J.T. (2002). Background study for further research in performance recovery. Presented at Enlarged Halden Programme Group Meeting, Storefjell, C2/5/1–16.
Kaber, D.B. (1997). The Effect of Level of Automation and Adaptive Automation on Performance in Dynamic Control Environments (ANRCP-NG-ITWD-97-01). Amarillo, TX: Amarillo National Resource Center for Plutonium.
Kaber, D. B. and Riley, J. (1999). Adaptive automation of a dynamic control task based on secondary task workload measurement. International Journal of Cognitive Ergonomics, 3(3), 169-187.
Kaber, D.B., Prinzel, L.J., Wright, M.C., and Clamann, M.P. (2002). Workload-Matched Adaptive Automation Support of Air Traffic Controller Information Processing Stages (NASA/TP-2002-211932). National Aeronautics and Space Administration. From http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20020080640_2002133430.pdf
Kanse, L. (2004). Recovery uncovered: How people in the chemical process industry recover from failures. PhD dissertation. Eindhoven University of Technology.
Kanse, L. and van der Schaaf, T. (2000). Recovery from failures - understanding the positive role of human operators during incidents. In D. de Waard, C. Weikert, J. Hoonhout and J. Ramaekers (Eds.), Human System Interaction: Education, Research and Application in the 21st Century. Maastricht, Netherlands: Shaker Publishing.
Kennedy, R., Kirwan, B., and Summersgill, R. (2000). Making HRA a more consistent science. In Foresight & Precaution, Eds. Cottam, M., Pape, R.P., Harvey, D.W., and Tait, J. Balkema, Rotterdam.
Kim, M.C., Seong, P.H., and Hollnagel, E. (2005). A probabilistic approach for determining the control mode in CREAM. Reliability Engineering and System Safety, pp. 1-9.
Kirwan, B. (1994). A Guide to Practical Human Reliability Assessment. Taylor & Francis, London, UK.
Kirwan, B. (1997). The development of a nuclear chemical plant human reliability management approach: HRMS and JHEDI. Reliability Engineering and System Safety, Vol 56, pp. 107-133.
Kirwan, B., Gibson, H., Edmunds, J., Cooksley, G., Kennedy, R., and Umbers, I. (1994). Nuclear Action Reliability Assessment (NARA): A Data-Based HRA Tool.
Kirwan, B., Basra, G., and Taylor-Adam, S.E. (1997). CORE-DATA: A Computerised Human Error Database for Human Reliability Support. Proceedings from the Sixth Annual Human Factors Meeting, Orlando, US.
Kontogiannis, T. (1999). User strategies in recovering from system failures in man-machine systems. Safety Science 32(1), pp. 49-68.
Kopardekar, P., and Magyarits, S. (2003). The measurement and prediction of dynamic density. Presented at the FAA-EUROCONTROL ATM 2003 Seminar, Budapest.
Lanzi, P., and Marti, P. (2001). Innovate or preserve: when technology questions cooperative processes. From http://www.dblue.it/pdf/ECCE11_Lanzi_Marti_v3.pdf
Layton, C., Smith, P. J., and McCoy, E. (1994). Design of a cooperative problem-solving system for en-route flight planning: An empirical evaluation. Human Factors, 36, pp. 94-119.
Leveson, N.G. (1995). Safeware: System Safety and Computers. Addison-Wesley Publishing Company, New York.
Littlewood, B., Strigini, L., Wright, D., and Courtois, P.J. (1998). Examination of Bayesian Belief Network for Safety Assessment of Nuclear Computer-Based Systems (ESPRIT DeVa Project 20072). From http://www.csr.city.ac.uk/people/lorenzo.strigini/ls.papers/DeVa_BBN_reports/DeVaTR70_year3.5a/DeVaTR70.pdf
Low, I. and Donohoe, L. (2001). Methods for assessing ATC controllers’ recovery from automation failures. In D. Harris (Ed.), Engineering Psychology and Cognitive Ergonomics Volume 5: Aerospace and Transportation Systems. National Air Traffic Service (NATS), UK.
Majumdar, A., and Ochieng, W.Y. (2002). Estimation of European Airspace Capacity from a Model of Controller Workload. Journal of Navigation, Vol 55(3), pp. 381-403.
Majumdar, A., Ochieng, W.Y., McAuley, G., Lenzi, J.M., and Lepadatu, C. (2004). The Factors Affecting Airspace Capacity in Europe: A Cross-Sectional Time-Series Analysis Using Simulated Controller Workload. Journal of Navigation, Vol 57(3), pp.385-405.
Massaiu, S., Haugset, H., and Bjorlo, T.J. (2003). Human Reliability Issues in Traffic Control Centres. Norwegian Research Council.
Mauri, G. (2000). Integrating Safety Analysis Techniques, Supporting Identification of Common Cause Failures. PhD thesis, The University of York.
Metzger, U., and Parasuraman, R. (2005). Automation in future air traffic management: Effects of decision aid reliability on controller performance and mental workload. Human Factors, 47(1), 35-49.
Ministry of Land, Infrastructure, and Transport (2006). Statistics. Air Traffic Activity at Cab Facilities: Area Control Center. From http://www.mlit.go.jp/koku/04_hoan/e/statistics/image/00_00.gif
Mohleji, S.C., Lacher, A.R., and Ostwald, P.A. (2003). CNS/ATM System Architecture Concepts and Future Vision of NAS Operations in 2020 Timeframe. Center for Advanced Aviation System Development (CAASD), The MITRE Corporation. From http://www.mitre.org/work/tech_papers/tech_papers_03/mohleji_2020/mohleji_2020.pdf
National Aeronautics and Space Administration (2000). Required Communication Performance (RCP). From http://as.nasa.gov/aatt/wspdfs/Oishi.pdf
National Aeronautics and Space Administration (2002). NASA Safety Manual w/Changes through Change 1 (NPR 8715.3). NASA QS / Safety & Risk Management Division.
National Air Traffic Services (1999). Testing Operational Scenarios for Concepts in ATM (Phase II). WP2: Airspace Sectorisation Optimisation. European Commission.
National Air Traffic Services (2002). Manual of Air Traffic Services Part II. London Area Control Centre, edition 2/02.
National Air Traffic Services (2004). NATS apologises for delays experienced today. From http://www.nats.co.uk/news/news_stories/2004_06_03_2.html
National Transportation Library (1997). Potential Cost Savings Ideas for FAA and Users. From http://ntl.bts.gov/lib/000/500/511/costsav.pdf
National Transportation Safety Board (1973). Aircraft Accident Report (AAR-73-14). From http://amelia.db.erau.edu/reports/ntsb/aar/AAR73-14.pdf
National Transportation Safety Board (1983). Aircraft Accident Report (AAR-83-02). From http://amelia.db.erau.edu/reports/ntsb/aar/AAR83-02.pdf
National Transportation Safety Board (1996). Special Investigation Report: Air Traffic Control Equipment Outages. Washington, D.C.
Nolan, M. S. (1998). Fundamentals of Air Traffic Control. Belmont, USA: Wadsworth Publishing Company.
Nuclear Regulatory Commission (1998). Technical Basis and Implementation Guidelines for a Technique for Human Event Analysis (ATHEANA). NUREG-1624. U.S. Nuclear Regulatory Commission, Washington, DC.
Ochieng, W.Y. (2006). Future Air Traffic Management. Course presentation for Air Traffic Management Module (T23). Imperial College London.
Orasanu, J., and Fischer, P. (1997). Finding decisions in natural environments: the view from the cockpit. In Zsambok, C.E. & Klein, G. (Eds.), Naturalistic decision-making. Mahwah, New Jersey: Lawrence Erlbaum Associates Publishers.
Oren, T., and Ghasem-Aghaee, N. (2003). Personality Representation Processable in Fuzzy Logic for Human Behavior Simulation. Summer Computer Simulation Conference, July 20-24, 2003. Montreal, Canada. From http://www.site.uottawa.ca/~oren/pres/pres-of-2003-01-SCSC-personality.pdf
Parasuraman, R., and Riley, V. (1997). Humans and automation: use, misuse, disuse, abuse. Human Factors Vol 39, 230-253.
Parasuraman, R., Bahri, T., Deaton, J., Morrison, J., and Barnes, M. (1990). Theory and Design of Adaptive Automation in Aviation Systems. Technical Report No. CSL-N90-1, Cognitive Science Laboratory. Catholic University of America, Washington, DC.
Parasuraman, R., Mouloua, M., and Molloy, R. (1996). Effects of adaptive task allocation on monitoring of automated systems. Human Factors, 38, pp. 665-679.
Parasuraman, R., Wickens, C. D., and Sheridan, T. (2000). A model for types and levels of human interaction with automation. IEEE Transactions on Systems, Man, and Cybernetics, 30(3), 286-297.
Park, J., Jung, W., Ha, J., and Shin, Y. (2004). Analysis of operators’ performance under emergencies using a training simulator of the nuclear power plant. Reliability Engineering and System Safety, 83, pp. 179-186.
Perrow, C. (1999). Normal Accidents. Princeton University Press.
Piantek, T.W. (1999). Influence in contracting and purchasing. In Safety Through Design: Best Practices (Eds. Christensen, W.C., and Manuele, F.A.). National Safety Council Press.
PPrune Forums (2006). ATC Issues. From http://www.pprune.org/forums/forumdisplay.php?s=ac64e2a0afd13472a93e7df2bba4b826&f=18
Rail Safety and Standards Board (2004). Rail-Specific HRA Tool for Driving Tasks Phase 1 Report. From http://www.rssb.co.uk/pdf/reports/research/T270 Rail-specific HRA tool for driving tasks Phase 1 report.pdf
Rasmussen, J. (1982). Human errors: A taxonomy for describing human malfunction in industrial installations. Journal of Occupational Accidents, 4, 311-335.
Reason, J.T. (1997). Managing the risks of organizational accidents. Aldershot, England: Ashgate Publishing.
Reid, J.W. (1996). Safety by Design. Lecture 4: Cost and acceptability of risk. Hazardous forum: London.
Rigas, G. and Elg, F. (1997). Mental models, confidence, and performance in a complex dynamic decision making environment. Department of Psychology, Uppsala University, Sweden. From http://www.ie.boun.edu.tr/labs/sesdyn/isdc97/TURKIA.doc
RISKS (2000). U.K. ATC System Failure. The RISKS Digest, Vol 20, issue 94. From http://catless.ncl.ac.uk/Risks/20.94.html
Rizzo, A., Ferrante, D., and Bagnara, S. (1995). Handling human error. In J.M. Hoc, P.C. Cacciabue, & E. Hollnagel (Eds.), Expertise and Technology: Cognition & Human-Computer Cooperation (pp. 195-212). Hillsdale, NJ: Lawrence Erlbaum.
Saldana, M. A. M., Herrero, S. G., del Campo, M. A. M. and Ritzel, D. O. (2002). Assessing Definitions and Concepts within the Safety Profession. From http://www.aahperd.org/iejhe/2003_first/ritzel.pdf.
Sampaio, J. J. M., and Guerra, A. A. (2004). The day god failed or overtrust in automation: The Portuguese case study. In Proceedings from the 2nd Conference on Human Performance Situation Awareness and Automation (HPSAA 2). Daytona Beach, FL.
Scerbo, M.W. (2005). Adaptive Automation. Department of Psychology, Old Dominion University. From http://www.cs.colorado.edu/~mozer/courses/6622/papers/aachpt05-12-15.htm
Sellen, A. J. (1994). Detection of everyday errors. Applied Psychology: An International Review, 43(4), pp. 475-498.
Shappell, S.A. (2000). The Human Factors Analysis and Classification System-HFACS (DOT/FAA/AM-00/7). Federal Aviation Administration. US Department of Transportation. From http://www.nifc.gov/safety_study/accident_invest/humanfactors_class&anly.pdf
Sheridan, T.B. (1980). Computer control and human alienation. Technology Review Vol 10, pp.61-73.
Shier, R. (2004). The Mann-Whitney U Test. Mathematics Learning Support Centre. From http://mlsc.lboro.ac.uk/documents/Mannwhitney.pdf
Shorrock, S. (1992). Error Classification for Safety Management: Finding the Right approach. In C.W. Johnson (Ed.), Investigation and Reporting of Incidents and Accidents IRIA 2002 (pp. 57-67). From http://www.dcs.gla.ac.uk/~johnson/iria2002/IRIA_2002.pdf
Shorrock, S. T., and Kirwan, B. (2002). Development and application of a human error identification tool for air traffic control. Applied Ergonomics, Vol 33, pp. 319–336.
Smith, S.P., Harrison, M.D. and Schupp, B.A. (2004). How explicit are the barriers to failure in safety arguments? Computer Safety, Reliability, and Security (SAFECOMP'04). In M. Heisel, P. Liggesmeyer and S. Wittmann (Eds), Lecture Notes in Computer Science Vol 3219, pp. 325-337, Springer.
Sorensen, J.N. (2002). Safety culture: a survey of the state-of-the-art. Reliability Engineering and System Safety, Vol 76, pp. 189-204.
Straeter, O. (2000). Evaluation of human reliability on the basis of operational experience. Dissertation at Munich Technical University.
Straeter, O. (2001). The quantification process for human interventions. In: Kafka, P. (ed.) PSA RID – Probabilistic Safety Assessment in Risk Informed Decision making. EURO-Course. 4.- 9.3.2001. GRS. Germany.
Straeter, O. (2005). Cognition and Safety: An Integrated Approach to Systems Design and Performance Assessment. Ashgate: Aldershot.
Subotic, B., Ochieng, W.Y., and Majumdar, A. (2005). Equipment Failures in Air Traffic Control: Finding an appropriate safety target. The Aeronautical Journal of the Royal Aeronautical Society, Vol 109(1096), pp. 277-284.
Subotic, B., Ochieng, W.Y., and Straeter, O. (2006a). Recovery from equipment failures in ATC: An overview of contextual factors. Reliability Engineering and System Safety Journal Vol 92 (7), pp. 858-870.
Subotic, B., Ochieng, W. and Straeter, O. (2006b). Recovery from Equipment Failures in Air Traffic Control: A Probabilistic Assessment of Context. Probabilistic Safety Assessment (PSAM 08) Conference, May 14-19, 2006, New Orleans, US.
Swain, A. D., and Guttman, H. E. (1983). Handbook of human reliability analysis with emphasis on nuclear power plant applications (NUREG/CR-1278). Washington D.C.
Theis, I. and Sträter, O. (2001). By-Wire Systems in Automotive Industry. Reliability Analysis of the Driver-Vehicle-Interface Proceedings. ESREL 2001, Turin.
THEMES (2001). Thematic Network for Safety Assessment of Waterborne Transport. Deliverable No. D5.1. Report on Safety and Environmental Assessment Method. From http://projects.dnv.com/themes/Deliverables/D5.1Final.pdf
Theureau, J., Jeffroy, F., and Vermersch, P. (2000). Controlling a nuclear reactor in accidental situations with symptom-based computerized procedures: a semiological & phenomenological analysis. Proceedings from CSEPC 2000. Taejon, Korea, 22-25 November.
UK Civil Aviation Authority (2000). Aviation safety review 1990-1999 (CAP 701). Civil Aviation Authority, London.
UK Civil Aviation Authority (2003). United Kingdom Manual of Personnel Licensing - Air Traffic Controllers (CAP 744). Civil Aviation Authority. London.
UK Civil Aviation Authority (2004). Fact Sheet - SSR Mode S, Edition 1.2. From http://www.caa.co.uk/docs/810/DAP_SSM_Mode_S_SSR_Factsheet.pdf
UK Civil Aviation Authority (2005). Mandatory Occurrence Reporting Scheme. CAP 382. Civil Aviation Authority, London. From http://www.caa.co.uk/docs/33/CAP382.PDF
UK Civil Aviation Authority (2006). Manual of Air Traffic Services - Part 1 (CAP 493). Civil Aviation Authority, London. From http://www.caa.co.uk/docs/33/CAP493Part1.pdf
United Nations (2006). UN in Brief. From http://www.un.org/Overview/brief1.html#footnote
van der Schaaf, T. W. (1992). Near miss reporting in the chemical process industry. PhD thesis. Eindhoven University of Technology.
van der Schaaf, T.W. (1995). Human recovery of errors in man-machine systems. Proceedings of the Sixth IFAC/IFIP/IFORS/IEA Symposium on the Analysis, Design and Evaluation of Man–Machine Systems. Cambridge, MA.
van Es, G.W.H. (2003). Review of Air Traffic Management-related accidents worldwide: 1980-2001. National Aerospace Laboratory (NLR).
Ward, M., Grupen, L., Regehr, G. (2002). Measuring Self-assessment: Current State of the Art. Advances in Health Sciences Education, 7, pp. 63–80.
Weisberg, H.F., Krosnick, J.A., and Bowen, B.D. (1996). An Introduction to Survey Research, Polling, and Data Analysis. SAGE Publications: London.
Wickens, C.D. (1992). Engineering psychology and human performance, 2nd Ed. New York: Harper Collins.
Wickens, C.D. (2001). Attention to Safety and the Psychology of Surprise. From http://www.aviation.uiuc.edu/UnitsHFD/conference/Osukeynote01.pdf
Wickens, C.D., Lee, J.D., Liu, Y., and Gordon Becker, S.E. (2004). An Introduction to Human Factors Engineering. New Jersey: Pearson Prentice Hall.
Wickens C.D, Mavor, A. and McGee, J.P. (Eds.) (1997). Flight to the Future: Human Factors in Air Traffic Control. Washington, DC: National Academy Press.
Wickens, C.D., Mavor, A. S., Parasuraman, R., and McGee, J.P. (1998). The Future of Air Traffic Control: Human Operators and Automation. National Academy Press: Washington, DC.
Wiener, E.L. and Curry, R.E. (1980). Flight deck automation: promises and problems. Ergonomics, Vol 23, pp. 995-1011.
Williams, J.C. (1986). HEART – A Proposed Method for Assessing and Reducing Human Error. In 9th Advances in Reliability Technology Symposium. University of Bradford, 1986.
Wood, A. (1996). Software Reliability Growth Models. From http://www.hpl.hp.com/techreports/tandem/TR-96.1.pdf
Zapf, D., and Reason, J.T. (1994). Introduction: Human Error and Error Handling. Applied psychology: An international review, Vol 43(4), pp. 427-432.
Appendices
Appendix I The cost of delays induced by ATC equipment failures
Appendix II Interviews with ATM staff
Appendix III Checklist for the Equipment Failure Scenarios in a specific European ATC Centre - An Aide-Memoire framework
Appendix IV The questionnaire design
Appendix V Example of one questionnaire response
Appendix VI Results extracted from the question 5 of the questionnaire survey
Appendix VII Overview of contextual factors
Appendix VIII Probabilities for 20 Recovery Influencing Factors (RIFs)
Appendix IX Questions for the ATM Specialist
Appendix X Overview of RIFs, their corresponding levels, and designated probabilities
Appendix XI Validation of the RIFs interaction matrix
Appendix XII Distribution of 20 Recovery Influencing Factors (RIFs)
Appendix XIII Experimental material
Appendix XIV Overview of RIFs, their corresponding levels, and probabilities determined in the experimental investigation
Appendix XV Distribution of the recovery context indicator captured in the experiment
Appendix I The cost of delays induced by ATC equipment failures
The impact of an equipment failure on ATM can be analysed from several different
perspectives. From a financial perspective, it is necessary to consider the costs
identified in ATC and the cost of delays in a wider region. A small exercise has been
conducted on the cost of delays induced by ATC equipment failures in the European
Civil Aviation Conference (ECAC) and US airspace.
From EUROCONTROL’s Central Flow Management Unit (CFMU) data for the period
from 1999 to 2003 (Table 1), delays induced by ATC equipment failures are split between
en route and airport delays. Given that the cost of one minute of delay in Europe in
the year 2002 is estimated at EUR72 (EUROCONTROL, 2004a), the last column of
Table 1 presents the total costs incurred by airlines as a result of airborne and ground
delays. It is important to highlight that the estimate of EUR72 per minute of delay
is based on primary delay costs, reactionary delay costs (e.g. the ‘knock-on’
effect on other aircraft), as well as fuel, maintenance, ground handling of aircraft
and passengers, passenger costs of delay to the airline, and future loss of market
share due to lack of punctuality (EUROCONTROL, 2004a). As a result, the calculated
annual cost of delays caused by ATC equipment failures accounts for all relevant costs
and thus demonstrates the high cost of technical failures.
Table 1 ATC equipment as a cause of airport and en route delays (personal correspondence, footnote 1)
Year | En route delay (min) | Airport delay (min) | Total delay (min) | Annual cost for the airlines (million EUR), based on the year 2002
1999 | 609265 | 461290 | 1070555 | 77.08
2000 | 598660 | 265055 | 863715 | 62.19
2001 | 614534 | 406760 | 1021294 | 73.53
2002 | 425627 | 138045 | 563672 | 40.58
2003 | 149476 | 147528 | 297004 | 21.38
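The arithmetic behind the last two columns of Table 1 can be checked with a short sketch. It assumes, as the text states, that the annual cost is simply the total delay in minutes multiplied by the 2002 rate of EUR72 per minute. The function and variable names are illustrative only and do not come from the thesis.

```python
# Delay minutes from Table 1 as (en route, airport), reported by the CFMU.
DELAY_MINUTES = {
    1999: (609265, 461290),
    2000: (598660, 265055),
    2001: (614534, 406760),
    2002: (425627, 138045),
    2003: (149476, 147528),
}

# Cost of one minute of delay in Europe in 2002 (EUROCONTROL, 2004a).
COST_PER_MINUTE_EUR = 72


def annual_cost_million_eur(en_route_min, airport_min, rate=COST_PER_MINUTE_EUR):
    """Return (total delay in minutes, cost in million EUR rounded to 2 dp)."""
    total = en_route_min + airport_min
    return total, round(total * rate / 1e6, 2)


for year, (en_route, airport) in sorted(DELAY_MINUTES.items()):
    total, cost = annual_cost_million_eur(en_route, airport)
    print(f"{year}: {total} min, {cost:.2f} million EUR")
```

Under this assumption the sketch reproduces the tabulated totals and costs for all five years.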
There are a number of reasons for the differences in the delay reported by the CFMU
(Table 1) for a given period. Some global factors explaining the delay reductions in the
decade beginning in 2000 are the general reduction of air traffic (as a result of the
crisis in the aviation industry after September 11th 2001), the presence of severe factors
(e.g. closure of Yugoslav airspace in 1999), the introduction of new route structures in
1999, the influence of European ATM network programs (e.g. Reduced Vertical
Separation Minima (RVSM), improved capacity management), and staffing issues that
reached a record high in 2002 (EUROCONTROL, 2003b).
1 Personal correspondence with EUROCONTROL CFMU.
Similar calculations have been carried out for the impact of ATC equipment failures on
the overall US National Aviation System (NAS). The US NAS consists of aircraft,
pilots, facilities, controllers, airports, and maintenance personnel, together with computers,
communications equipment, satellite navigation aids, and radars. The direct aircraft
operating cost per minute of delay is taken from the Air Transport
Association (ATA) estimate for the year 2005, which is $62.33 (Air Transport
Association, 2006). This cost comprises fuel burn, extra crew time, maintenance,
aircraft ownership costs, and additional costs. These additional costs account for costs
of extra gates and manpower on the ground and costs imposed on airline customers
(passengers and cargo shippers) in the form of lost productivity, wages, and customer
satisfaction. The FAA estimates the average cost of delay to air travelers to be $30.26 per
hour or $0.50 per minute (Air Transport Association, 2006). As a result, the average
costs of delays induced by ATC equipment failures for the years 2004 and 2005 are given
in Table 2.
Table 2 ATC equipment as a cause of the US National Aviation System delays. From Bureau of Transportation Statistics (2004); summaries available only for the whole of 2004 and 2005
Year | ATC equipment (min) | Average cost (millions $)
2004 | 402644 | 25.10
2005 | 274126 | 17.09
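As a rough check on Table 2, the sketch below multiplies the delay minutes by the ATA direct operating cost of $62.33 per minute. The tabulated figures appear consistent with applying that rate alone, without the FAA passenger delay cost; this is an observation from the arithmetic rather than a statement made in the text. All names are illustrative.

```python
# ATA direct aircraft operating cost per minute of delay for 2005 (USD).
ATA_COST_PER_MINUTE_USD = 62.33

# FAA estimate of the cost of delay to air travelers (USD per hour).
FAA_PASSENGER_COST_PER_HOUR_USD = 30.26

# Delay minutes attributed to ATC equipment in Table 2.
ATC_EQUIPMENT_DELAY_MIN = {2004: 402644, 2005: 274126}


def delay_cost_million_usd(minutes, rate=ATA_COST_PER_MINUTE_USD):
    """Cost of delay in million USD, rounded to two decimal places."""
    return round(minutes * rate / 1e6, 2)


# Hourly passenger cost converted to a per-minute value.
passenger_cost_per_minute = round(FAA_PASSENGER_COST_PER_HOUR_USD / 60, 2)

for year, minutes in sorted(ATC_EQUIPMENT_DELAY_MIN.items()):
    print(f"{year}: {delay_cost_million_usd(minutes):.2f} million USD")
print(f"Passenger cost per minute: ${passenger_cost_per_minute:.2f}")
```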
In general, these high-level analyses illustrate that equipment failures can significantly
affect operational, safety, and financial aspects of both ATC and ATM systems. Both
methods (employed for Europe and the US) for calculating the cost of the delay per
minute are largely similar. The only difference is the financial value assigned to each
minute of delay in Europe and the US. In addition, the ‘true’ cost of equipment failure
induced delay should also incorporate technical repair, unscheduled maintenance,
training, and additional staffing. However, it is assumed that these costs represent only
a small fraction of the cost of delay. Therefore, it can be
concluded that these estimates are a reasonable representation of the total cost
induced by ATC equipment failure in both the European and the US aviation markets.
Appendix II Interviews with ATM staff
Interviews with relevant Air Traffic Management (ATM) staff, as a method of data
collection, have been conducted to support the research presented in this thesis and to
augment available theoretical findings. They aimed to extract operational experience of
ATM specialists and experienced system control and monitoring engineers. The focus
of these interviews has been on four research areas. These are:
• classification of ambiguous operational failure reports;
• characteristics of air traffic controller training;
• characteristics of equipment failures in Air Traffic Control (ATC); and
• contextual factors relevant to controller recovery from equipment failures in ATC.
Interviews with ATM specialists focused on the air traffic controller training (ab initio,
recurrent, and emergency training) and contextual factors relevant to controller
recovery. Interviews with system control and monitoring engineers revealed their
experiences related to the characteristics of ATC equipment failures.
The sample of ATM staff interviewed is as follows:
• system control and monitoring engineers from four countries:
o National Air Traffic Services (NATS), Corporate and Technical Centre (CTC) and Swanwick Centre, UK;
o EUROCONTROL Maastricht Upper Area Control Centre (MUAC), Netherlands;
o Irish Aviation Authority (IAA);
o Airports Authority of India (AAI);
• ATM specialists from two countries:
o EUROCONTROL Institute of Air Navigation Services (IANS), Luxembourg;
o Irish Aviation Authority (IAA).
Findings related to each research area are presented below.
Table A-1 Findings related to the clarification of ambiguous operational data
Location | Number of participants interviewed | Research question | Finding | Agreement between study participants
UK NATS (CTC) | one experienced engineer | Ambiguous operational failure reports | Proper classification of all operational failure reports | Yes, clarified all ambiguities
EUROCONTROL MUAC | two experienced engineers | (as above) | (as above) | (as above)
Table A-2 Findings related to the air traffic controllers training
Location | Number of participants interviewed | Research question | Findings | Agreement between study participants
EUROCONTROL IANS | one ATM specialist | Usefulness of announcing the training for unusual/emergency situations | Although controllers may anticipate an unusual occurrence within their emergency training, this does not facilitate better performance as long as they do not know the nature of that unusual occurrence | Yes, both agreed
IAA | one ATM specialist | (as above) | (as above) | (as above)
Table A-3 Findings related to the characteristics of equipment failures in ATC
Location | Number of participants interviewed | Research question | Finding | Agreement between study participants
UK NATS (CTC) | one experienced engineer | Existence of latent failures | Latent failures tend to go unnoticed until some other event or failure reveals their existence. | Yes, experienced latent software failures
EUROCONTROL MUAC | one experienced engineer | (as above) | (as above) | (as above)
IAA | one experienced engineer | (as above) | (as above) | (as above)
UK NATS (CTC) | one experienced engineer | Complexity of failure type | Majority of ATC equipment failures affect single system. | Yes
EUROCONTROL (MUAC) | two experienced engineers | (as above) | (as above) | (as above)
IAA | one experienced engineer | (as above) | (as above) | (as above)
UK NATS (CTC) | one experienced engineer | Time course of failure development | Majority of failures tend to manifest themselves suddenly | Yes
EUROCONTROL (MUAC) | two experienced engineers | (as above) | (as above) | (as above)
IAA | one experienced engineer | (as above) | (as above) | (as above)
Table A-4 Findings related to the contextual factors relevant to controller recovery from equipment failures in ATC
Location | Number of participants interviewed | Research question | Finding | Agreement between study participants
IAA | two ATM specialists | Contextual factors relevant to controller recovery from equipment failures in ATC | Validation of the candidate contextual factors | Agreed on selected contextual factors and aided the definition of each factor
IAA | three ATM specialists | Interactions between contextual factors | Validation of interactions between contextual factors identified using operational experience and the past research | Their feedback was similar. Identified inconsistencies were further clarified during the interview and were the result of the misperception of some factors. All inconsistencies were clarified.
Appendix III Checklist for the Equipment Failure Scenarios in ATC Centre - An Aide-Memoire framework
This section provides a framework for the design of Aide-Memoire or checklist type
procedures for recovery from equipment failures in a particular ATC Centre. The
proposed framework is adapted to an ATC Centre that participated in the experimental
investigation segment of the research presented in this thesis. This Aide-Memoire
provides a potential framework, which needs to be further discussed and developed in
accordance with the in-house expertise of the system control and monitoring staff and
ATM specialists of the respective ATC Centre. However, the concept and the design
solution presented here are transferable across ATC Centres.
Contents
Once all equipment failures to be included in the Aide-memoire have been defined,
they could be categorised into four distinct groups based upon their impact on ATC
operations (as discussed in Chapter 4). These four categories are as follows:
• Major impact to operations room (all sectors/all workstations) – severe flow
restrictions possible (potential colour coding in Aide-Memoire: RED). Relevant failures are:
o ONL LAN failure
o Failure of the Surveillance Network
o Failure of COMPAD
o Loss of Flight Server
o Loss of Track Server
o Loss of SSR and PSR
o Loss of FDPS
o Loss of MRP
• Moderate impact to operations room – impact to one or several workstations in
different suites, possible need to combine/move positions immediately and
possible flow restrictions (potential colour coding in Aide-Memoire: YELLOW). Relevant failures are:
o Reduced radar data mode
o Reduced alert mode
o Reduced communication mode
o Loss of ARTAS
o Loss of VCS panel
o Loss of a single CWP
o Loss of entire sector suite
o Loss of SRP
o Loss of adjacent sector
• Minimal impact – not immediately critical but may have greater operational
impact over time (potential colour coding in Aide-Memoire: GREEN). Relevant failures are:
o Radar Data Function failure
o Loss of single frequency
o Overload of SRP
o Overload of MRP
o Loss of external feeds to AIS
o Loss of STCA
o Loss of APW
o Loss of MSAW
o Loss of OLDI
o Loss of paper strip printer
Note that the categorisation above lists some but not all possible failures. Those
marked in italics are designed in the Aide-Memoire format and are presented below.
Further input from system control and monitoring staff and ATM specialists may yield
more accurate and precise types of failures and recovery steps to be taken.
Design
At the top of each procedure, it would be useful to show how the
Human Machine Interface (HMI) warning appears, if applicable (e.g. the highlighted labels on
the General Information Window). This would be followed by the presentation of
two types of information. Firstly, the required recovery steps, i.e. those that a controller
must perform to recover effectively and ensure safe air traffic control service. Secondly,
the key effects of the equipment failure on the ATC system (i.e. the ATC system
feedback). The rationale for this design solution is that the top part of the checklist
should be reserved for the items that controllers should be aware of first, i.e. recovery
steps.
In addition, it is necessary to define procedures for different personnel working in the
operational environment, namely controllers (i.e. different roles for executive, planner,
and assistant controller), supervisors, and managers to assure a seamless recovery
process. If, for example, radar services fail on all workstations, personnel should have
a readily available guide to help them recover from the failure. These guidelines may
vary according to the type of user, because different roles may require different
information on equipment failures and recovery procedures.
Note that the colour-coded categorisation could be used in a slightly different manner
as well. If this Aide-Memoire becomes a part of the generic procedures for handling
emergency/unusual situations, then the use of colour should be restricted to categories
such as ‘Aircraft Emergencies’, ‘Equipment Failures’, ‘Fire and Building Evacuation’.
The Aide-Memoire, as a hard, laminated copy flip chart, should be readily available on
each Controller Working Position (CWP). A more detailed version, providing local or
ATC Centre specific data, should be at the supervisor’s position. For simplicity and
efficiency, it is better to present each relevant failure on a single page highlighting the
two main areas: what recovery steps to perform and what feedback to expect from the
ATC system. This approach assures the most efficient usage of the tool.
The final version of the Aide-Memoire should not be considered as an exhaustive list
but more of a living document. In other words, it will be necessary to update this tool on
an annual basis to reflect the local expertise and to compile all changes (i.e. changes in
the ATC system, both software and hardware).
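The single-page layout described above (recovery steps first, expected system feedback second, with a colour code per impact category) could be captured in a simple data structure. The following is a minimal sketch only: the class, field, and dictionary names are hypothetical, the mapping of RED, YELLOW, and GREEN to the major, moderate, and minimal categories is an assumption based on the colour notes in this appendix, and the example content is taken from the Radar Data Function failure page.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed mapping of impact category to colour code (illustrative).
CATEGORY_COLOUR = {"major": "RED", "moderate": "YELLOW", "minimal": "GREEN"}


@dataclass
class AideMemoirePage:
    """One single-page entry: recovery steps first, expected feedback second."""
    failure: str
    colour: str
    atco_actions: List[str] = field(default_factory=list)
    expect: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the page as plain text in the order proposed in the design."""
        lines = [f"{self.failure} [{self.colour}]", "ATCO actions:"]
        lines += [f"- {step}" for step in self.atco_actions]
        lines.append("Expect:")
        lines += [f"- {item}" for item in self.expect]
        return "\n".join(lines)


page = AideMemoirePage(
    failure="Radar Data Function failure",
    colour=CATEGORY_COLOUR["minimal"],
    atco_actions=["Inform Coordinator", "Select radar by-pass services"],
    expect=["No radar data function (neither ARTAS nor MRTS nor RFS)"],
)
print(page.render())
```

Keeping each failure as one record of this kind would make the annual update a matter of editing entries, and role-specific versions (controller, supervisor, manager) could be produced from the same records.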
ONL LAN Failure
ATCO actions:
− Inform Coordinator
− Inform all traffic
− Check spare ODS
− Maintain timely & accurate strip marking
− Restrict traffic
− Utilise holding patterns
− Use only verbal coordination channels
− Reaffirm traffic identification using the code on the FPS
− Identify any new tracks using the “Confirm Squawk?” method
− Seek SAS assistance and print screen if possible
− Ground all sport/non-commercial traffic ASAP
− Utilise strategic ATC techniques when possible
− Conduct regular checks of aircraft identification
− Monitor Mode C closely
− Be aware of the absence of Safety Nets and Monitoring Aids
− Cross check that exit conditions are achieved
− Expedite reduction in traffic load
Expect:
The radar data is distributed via the RFS LAN
The following functions are NOT AVAILABLE:
− Safety Nets and Monitoring Aids (existing alarms maintained)
− Flight Plan function (no coupling, no RAM & CLAM)
− Radar Data function replaced by Radar Fallback function
− Flight plan commands (i.e. mod)
− Flight plan lists frozen with data at time of failure
− Reception Queues
− Message transmission
− Coordination messaging
− Mail box management
− Resectorisation
− SSR code management
− AIS (only data available at the time of failure)
− All correlation will be lost
Failure of the Surveillance Network
ATCO actions:
− Inform Coordinator
− Inform all traffic
− Employ procedural control techniques (if necessary utilise emergency vertical separation of 500 feet)
− Utilise holding patterns
− Deny departures
− Maintain timely & accurate strip marking
− Instruct aircraft to maintain VMC, if in VMC
− Reduce traffic load ASAP
− Seek assistance
− Relocate to contingency site if required
Expect:
All ODS frozen or blanked throughout the Centre
Failure of COMPAD
ATCO actions:
− Inform Coordinator
− Transmit on second sector COMPAD
− Access RBS and inform traffic of failure
− Reset COMPAD
− Seek assistance and relocate to spare CWP
− Inform traffic of restoration of normal service when service is restored
Expect:
Complete or Partial failure
Inability to transmit on RTF
Inability to access alternate RTF
Inability to use intercoms
Inability to access telephone network
Reduced Radar Data Mode
GIW will show “MRTS”
ATCO actions:
− Inform Coordinator
− Report failure
− Operate as normal
Expect:
All functions are available
The switch to RFS (MRTS) from ARTAS is automatic
Any position in by-pass before ARTAS failure will remain in by-pass
Reduced Alert Mode
GIW will show “SNMAP”
ATCO actions:
− Inform Coordinator
− Be aware of restricted, danger and prohibited airspace inc. TSA’s
− Check MSA’s at regional airports
− Double and cross check Oceanic Entry COP’s and levels
− Maintain timely & accurate strip marking
− Utilise strategic traffic plans
− Ensure tactical ATCO action is accurate
− Employ TRM best practice
− Continuously scan Mode C
− Seek SAS assistance if necessary
Expect:
Any alert displayed prior to the reduced alert mode will remain displayed regardless of whether or not the alert is still valid.
The following functions are NOT AVAILABLE:
− Safety Net Function (STCA)
− ATC Tools (MSAW and APW)
− Monitoring Aids (RAM and CLAM)
− Coupling
− No APR sent to Flight Data function (no profile updates)
Reduced Flight Plan Mode
GIW will show “FDP”
ATCO actions:
− Inform Coordinator
− Check availability of FDP function on spare ODS
− Inform traffic of failure
− Maintain timely & accurate strip marking
− Use verbal coordination channels inter sector/centre
− Identify all new tracks using the “Confirm Squawk” technique
− Maintain identification by regular checks
− Restrict traffic flow where necessary
− Utilise holding patterns
− Be aware of unreliable Safety Nets and Monitoring Aids
− Seek SAS assistance where necessary
Expect:
The following functions are NOT AVAILABLE:
− Flight Plan tracks
− Tracks already displayed will remain displayed
− Flight Plan commands (i.e. mod, terminate)
− Message queues
− Message transmission
− Coordination messages
− Mailbox management
− Resectorisation
− Limited Safety Net and Monitoring Aids due to no update of the flight plans
Reduced Communication Mode
GIW will show “FDX”
ATCO actions:
− Inform Coordinator
− Use only verbal inter-centre coordination channels
− Inform all traffic on RTF
− Seek FDA assistance for AFTN or AIS information
− Maintain timely & accurate strip marking
− Seek SAS assistance where necessary
Expect:
The following functions are NOT AVAILABLE:
− Inter centre communications
− AFTN
− Coordination messages (except inter sector)
− Flight plans are not updated by external messaging
− AIS
Radar Data Function failure
ATCO actions:
− Inform Coordinator
− Select radar by-pass services
Expect:
No radar data function (neither ARTAS nor MRTS nor RFS)
Appendix IV The questionnaire design
Air Traffic Controller Questionnaire
Dear Sir/Madam,
This questionnaire is created for the purpose of obtaining information on equipment failures and recovery in Air Traffic Control (ATC) System(s) from various standpoints. The information you provide will be used in a research project jointly supported by EUROCONTROL Experimental Centre and Imperial College London.
We would greatly appreciate your completing the attached questionnaire. It will only take a few minutes of your time to answer the questions, which will contribute to our joint effort to introduce more real experience into ATC safety analysis. Data collection intends to support recovery strategies of future ATM and analyse the current status on this issue. The information that you provide will be used as an additional data source for the PhD dissertation being developed in this area.
The questionnaire is created in Microsoft® Word 2000. It is our intention to enable you to fill it out electronically and send it directly to the following e-mail address ([email protected]). However, if it is more convenient you can use the fax number provided below.
Generally there are two formats of questions, which require different ways of answering. For some questions you will have to choose the most appropriate answer by highlighting or marking it (e.g. yes/no answers), while for the others you will have to type in your full answer. Please fill out your questionnaire and try to answer the questions in as much detail as possible.
Your answers will be strictly confidential and de-identified, thus your personal details will not appear in any document connected to this research.
Thank you in advance for your time and effort.
Sincerely, Branka Subotic
Research PhD student Imperial College London Centre for Transport Studies London SW7 2AZ
Phone +44 (0)2075946 022 Fax +44 (0) 2075946 102
Air Traffic Controller Questionnaire
1. Total number of years active as a controller ____________
2. Please list the types of facilities that you have worked in, beginning with the most recent.
ATC Facility Name (beginning with the most recent) | Location | Country | Number of years worked in particular Unit | Type (Civilian/Military) | Position/Rating (ACC/RDR, ACC/PROC, APP/RDR, APP/PROC, TWR or ARTCC, TRACON, ATCT (USA))
3. Have you ever experienced ATC equipment failure during your work? Mark the corresponding letter. (If ‘No’ go to question 10) Y N
4. What is the average number of ATC equipment failures during one year that you experience? _________________________
5. Please fill in any previous experience with equipment failures which seriously impacted your work:
Type of equipment failure | System affected? (See Note below) | Frequency of the failure per year (in your own experience)? | Did you detect it and how? If not, who detected it? | Duration of the failure min, h, days (If you can recall)? | Was the context* of the failure an important factor? If yes, has it positive or negative impact? | Recovery/contingency procedure existed or not? | Recovery/contingency training existed or not? | Who initiated the recovery? How was the recovery initiated? | Any additional comment
* Context is defined as any aspect of the operating context that influenced the failure or recovery aspect (e.g. workload, HMI, personal factors, team factors).
Note: The typical CWP (controller working position) contains one or more of the following systems (systems will vary from one center and country to another):
• Radar (SSR, PSR, Mode S, radar data processing (RDP), multi-radar processing (MRP), single radar processing (SRP))
• Ancillary screens (meteorological information, strip bay, traffic flow information, etc.)
o Flight Plan Processing (FPP)
o Flight Progress Strips (FPS)
• Pointing devices (mouse & trackball)
• Secondary input devices (keyboard or touch input device (TID))
• Communication panel
• R/T, telephone, headset, intercom
• Strip printer
• Ground based Safety Nets (SNET): STCA, MSAW, APW, or any other SNET available
• Other (e.g. power supply)
6. How much do you generally rely upon the written procedures in case of equipment failure and how much on situation-specific problem solving (i.e. improvisation)? Fill in the corresponding number for Procedures, Problem solving, AND Other.
1 (very much) 2 3 (moderately) 4 5 (not at all)
Written procedures
Situation-specific problem solving
Other (e.g. past experience)
7. Is there any organized exchange of the past experience in solving the equipment failures with your fellow colleagues?
Y N
8. If yes, is it supported by your management as a good work practice? Y N
9. According to your experience, what are the three most unreliable ATC systems/subsystems? Please use the device listing from the Note above to state those systems starting with the most unreliable one:
(Note: Reliability is defined in this questionnaire as the probability that a piece of equipment or component will perform its intended function without failure over the given time period and under specific or assumed conditions)
Following questions should be answered in relation to your current job, position, and level of experience (the first one cited in the question 2).
Procedures
10. Are recovery/contingency procedures available? Mark the corresponding letter. Y N
11. Which types of equipment failures (outages) are covered by procedures in your Center?
12. Are recovery/contingency procedures up-to-date? Y N
13. Are recovery/contingency procedures comprehensive? Y N
14. Are recovery/contingency procedures complete? Y N
15. If not, which procedure(s) would you add?
16. Are recovery/contingency procedures understandable? Y N
17. Are recovery/contingency procedures easily accessible? Y N
18. Are recovery/contingency procedures realistic/feasible? Y N
19. Are recovery/contingency procedures compatible with other procedures? Y N
20. Describe the situation when you had a problem applying the recovery/contingency procedure and why?
Training
21. Is training provided in recovery from equipment failures? Y N
22. Is there separate refreshment training every year? Y N
23. If provided, how many times per year?
24. Is it enough? Y N
25. Does the training covers all important equipment failures? Y N
26. If not, what should be added?
27. Are training methods suitable (realistic, varied, etc)? Y N
28. Is recovery/contingency training compatible with and linked to other training? Y N
Conclusion
29. Please write down any other comments or suggestions based on your past experience or professional opinion that you might have on the issue of equipment failures, recovery/contingency procedures, or training.
Thank you for taking the time to answer these questions. Your time and participation are greatly appreciated.
--End--
Appendix V Example of one questionnaire response
Appendix VI Results extracted from question 5 of the questionnaire survey
Question 5 aimed to give controllers an opportunity to discuss their past
experience with equipment failures which seriously impacted their work. In order to
provide a structured description of each example and extract all relevant information,
question 5 was presented in the form of a table. The rows dealt with different failure
types while the columns dealt with various failure characteristics. These failure
characteristics were as follows:
1. Type of equipment failure and system affected (assessed in section 6.7.3.3
of Chapter 6);
2. Frequency of failure per year;
3. Individual who detected the failure;
4. Duration of the equipment failure;
5. Importance of the recovery context;
6. Existence of recovery procedure for a particular failure (assessed in Table
6-3, Chapter 6);
7. Existence of training for recovery for a particular failure;
8. Individual who initiated the recovery and method applied; and
9. Concluding remarks.
1. Frequency of failure per year
It was not possible to extract the frequency of failure experienced by controllers in 27.20
percent of cases. This was partially due to missing responses but mostly due to vague
and unclear responses (e.g. very often, rare). The available and pre-processed data
show that the frequency of failures per year is on average more than 14, ranging
from less than once per year to as many as 730 annually (or twice per day). The
great dispersion of data confirms that respondents interpreted equipment failures
differently (as discussed in section 6.7.3.1 of Chapter 6).
2. Individual who detected the failure
The failures were detected most frequently by controllers (in 79.4 percent of examples)
and with the assistance of the system-generated failure alert (in 7.1 percent of
examples). Other cases include failure detection by watch supervisors, engineers,
pilots, or controllers from other ATC Centres (in the case of a failure affecting national
or regional airspace, such as failure of satellite communication, flight data processing
system, or radar). These findings are expected as NATS (2002) reports that most
failures do not affect the controllers as these are prevented or recovered by the system
control and monitoring unit. Moreover, the results obtained from this questionnaire
survey emphasise that the prompt detection of any ATC system deficiency depends
mostly on the controller, as a direct result of the controller’s situational awareness.
Furthermore, the results show that failure detection may be aided by system-generated
failure alerts. This is an example of the synergy that exists between technical and
controller recovery achieved through the technical built-in defences for transmitting
information on failure (discussed in Chapter 4, section 4.3.2). These technical systems
will demonstrate more potential in the future, highly integrated ATC environment.
3. Duration of the equipment failure
Similar to the frequency variable, it was not possible to extract the duration of failures in
27.20 percent of examples. This was expected due to the difficulties with recalling the
duration of past failures. Additional problems were encountered with vague qualitative
responses (e.g. several days, a couple of hours, a few minutes). The available and pre-
processed data show that the average duration of the reported failures was close to
one day, ranging from five minutes to one month. The large dispersion indicates
different durations for different types of failures.
The same categorisation of duration variables is applied as previously with the
operational failure reports (see Chapter 4, section 4.4.6). More precisely, the
categorisation focused on failures up to 15 minutes, between 15 minutes and one hour,
between one hour and one day, and those lasting more than one day. It is interesting to
note that the distributions of duration from the operational failure reports and from the past
experience captured in this survey are similar (Figure 1). The main difference is
observed in the third category (duration from one hour to one day). It seems that in the
operational environment, equipment failures of this duration occur more
frequently than in the experience of controllers worldwide.
[Figure 1: two bar charts, panels a) and b); horizontal axis: duration category (h); vertical axis: frequency. Categories and percentage labels in extraction order. a) >24.01, 1.01-24.00, 0.26-1.00, 0.00-0.25: 7.23%, 19.15%, 31.06%, 42.55%. b) >24.01, 1.01-24, 0.26-1, 0.00-0.25: 8.04%, 31.6%, 25.85%, 34.51%.]
Figure 1 Distribution of the duration variable a) from the questionnaire survey; b) from the Country D operational failure reports (see Chapter 4)
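The four duration categories described above can be expressed as a simple binning rule. The following Python sketch is illustrative only: the function names, the treatment of boundary values (a failure of exactly 15 minutes is placed in the first category, following the 0.00-0.25 label in Figure 1), and the example durations are assumptions and are not survey data.

```python
from collections import Counter

# Upper bounds (in hours) of the first three duration categories.
# The last category (more than one day) has no upper bound.
CATEGORIES = [
    (0.25, "up to 15 minutes"),
    (1.0, "15 minutes to one hour"),
    (24.0, "one hour to one day"),
]
LAST_CATEGORY = "more than one day"


def duration_category(hours: float) -> str:
    """Return the duration category for a failure lasting `hours` hours."""
    if hours < 0:
        raise ValueError("duration cannot be negative")
    for upper_bound, label in CATEGORIES:
        if hours <= upper_bound:  # assumption: upper bounds are inclusive
            return label
    return LAST_CATEGORY


def category_shares(durations_h):
    """Percentage of failures falling in each category, in category order."""
    if not durations_h:
        raise ValueError("at least one duration is required")
    counts = Counter(duration_category(h) for h in durations_h)
    labels = [label for _, label in CATEGORIES] + [LAST_CATEGORY]
    return {label: 100.0 * counts[label] / len(durations_h) for label in labels}


# Hypothetical durations in hours (not survey data): 5 min, 30 min, 3 h, 2 days.
example = [5 / 60, 0.5, 3.0, 48.0]
shares = category_shares(example)
```

Applying `category_shares` to the survey durations and to the operational failure reports separately would give two distributions that can be compared category by category, as in Figure 1.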
4. Importance of the recovery context
When asked about the context surrounding the occurrence of an equipment failure, the
controllers acknowledged its importance in the majority of examples (73 percent).
Furthermore, these controllers rated its impact mostly as negative (63.9
percent of examples). The negative issues mentioned regarding the context of the
equipment failures were reduction of capacity, increased workload, increased stress,
increased communication with aircraft, increased coordination with adjacent sectors,
and in some cases additional workload due to deterioration in the weather. However,
there were several instances in which controllers rated the context as positive, mostly
owing to efficient teamwork, the availability of an efficient assistant, low traffic levels at the
time of occurrence (i.e. no significant increase in workload), and the ability to work with
fallback systems. As a result, the importance of context identified in past research is
confirmed by this questionnaire survey. The following Chapters are dedicated to further
assessment of recovery context.
5. Existence of training for recovery for a particular failure
Question 5 allowed mapping between ATC functionalities and available recovery
training for the sampled equipment failures1. The analysis showed that in 48 percent of
examples provided, the controllers had some type of recovery training. This training
was mostly provided for the communication, navigation, surveillance, and data
processing functions. Lack of training is identified for power outages and loss of safety
nets.
6. Individual who initiated the recovery and method applied
The individuals that initiated and applied recovery processes came predominately from
the controller population when compared with watch managers and engineers. This is
understandable as section 2 pointed out that most equipment failures are detected by
controllers. Having detected a problem with equipment, the controllers have to inform
engineers, indirectly through the watch manager, which constitutes the initiation of the
recovery. In some simple cases (e.g. loss of microphone and loss of screen), the
controller tries to replace the failed equipment either by using the spare one or by
changing to another working position (if there are any spare ones). In more complex
situations, when a change of position is not possible, the controller has to continue
working with the remaining tools and equipment and potentially revert to procedural
control, assure vertical separation, use fallback systems, and/or transfer all flights to an
adjacent sector or flight information region. Engineers initiate the recovery process in
the case of failures of aeronautical data exchange with adjacent ATC Centres,
runway/taxiway lighting systems, and data processing systems. However, the controller
remains responsible for safe separation of all traffic in the affected airspace.
1 Question 26, although intended to capture the type of recovery training missing in each
sampled ATC Centre, yielded mostly high-level comments on the impossibility of training for every potential equipment failure.
7. Concluding remarks
In general, the controllers perceive equipment failures as stressful and distracting
events that pose a major safety problem due to increased workload and difficulties with
maintaining identification of aircraft (e.g. in case of radar failure and data processing
failure). In one particular instance a controller commented that an equipment failure led
to a near miss. Another example pointed out the problems with equipment failures
occurring during the night shift, as technical staff are not always available during that
period.
Appendix VII Overview of contextual factors
Factor HERA
Eurocontrol HERA [12]
TRACEr Shorrock and Kirwan [19]
RAFT Eurocontrol
[20]
THERP Swain and Guttmann [24]
COCOM Hollnagel
[27]
CREAM Hollnagel [11]
External PSF Stressors Internal
PSF
1 Pilot-controller comm.
Pilot-controller comm.
Pilot-controller comm.
Written and verbal communication
2 Pilot actions
3 Traffic and airspace
Traffic and airspace Task load and system complexity
Complexity; Requirements for perception; requirements for motor speed
Task speed; Task load
4 Weather
5 Documentation and procedures
Procedures Procedures and documentation
Required procedures; Work-methods; Plant policy
Plans Availability of procedures/ plans
6 Training and experience
Training and experience
Training and experience
Prior training, experience
Normal/familiar process state
Adequacy of training and experience
7 Workplace design and HMI
Workplace design, HMI, and equipment factors
Human machine interaction
Design features; Factors in task and work resources; Warnings and danger signs; Man-machine factors; Interface
Inconsistent labelling
MMI and support
Adequacy of MMI and operational support
8 Environment Ambient environment
Quality of environment; T; Air quality; Situational factors
Detractors; Extreme T; radiation; Pressure; Inadequate oxygen supply; Vibration; Restricted movements
Working conditions
9 Personal factors Personal factors Personal factors
Perception; Motor system; Memory; Decision-making; Short-term and long-term memory
Duration of stress; Pain; Thirst; Fatigue; Threats; Monotony; Work performance; Circadian rhythm
State of momentarily abilities personality and intelligence; motivation and attitudes; emotional state; stress; gender
Time of the day (circadian rhythm)
10 Team factors Social and team factors
Social and team factors
Attitudes deriving from family or groups; group dynamic processes
Crew collaboration quality
11 Organisational factors
Organisational factors
Other organisational factors, Logistical factors
Organisational structure; Working hours; Actions by shift leader, manager; Remuneration structure
Adequate organisation
Adequacy of organisation
12 Few simultaneous goals
Number of simultaneous goals
13 Suddenness of occurrence
Available time Available time
14
Factor HRMS
Kirwan [28]
Recovery from Failures
Kanse and van der Schaaf [21]
CORE-DATA Eurocontrol
[13]
ATHEANA U.S. NRC
[29]
CAHR Straeter [16]
NARA Kirwan et al.
[30]
HPDB Park et al.
[32]
1 Communication
2
3 Task organisation & Task complexity
Task complexity & Task criticality & Task novelty
Task preparation; Task simplicity; Complexity of the task; Precision; Monotony of activity
Dependencies of the different tasks/steps/actions
4
5 Procedures Procedures
Clarity/Precision of procedures; Design of procedures; Content; Completeness; Presence
Shortfalls in the quality of information conveyed by procedures; use of more dangerous procedures
Available procedure & description of all steps and tasks
6 Training/expertise/experience/competence
Person related factors Refresher training & Training
Inexperience
Operator inexperience; Unfamiliarity (situation occurs infrequently)
Level of experience
7 Quality of information/ interface
Technical/workplace/situational factors
Ergonomic design & HMI ambiguous & HMI feedback; Alarms; Labels
Unfamiliar plant conditions
Usability of control; Usability of equipment; Positioning; Equivocation of equipment ; arrangement of equipment; display range; accuracy of display; Labelling; Marking; Reliability; Technical layout; Construction; Redundancy; Coupled equipment
Low signal to noise ratio; Overriding information easily accessible; no means to reverse an unintended action; Poor system feedback; Poor system feedback on activity progress
8 Technical/workplace/situational factors
Environmental factors and ergonomics
External event Poor environment
9 Person related factors Stress; Workload
Human performance capabilities at low point; Excessive workload
Processing; Information; Goal reduction
Operator under load/boredom; A conflict between intermediate and long-term objectives; Stress and ill-health; Information overload
Person issues; Demand of perception, cognition, etc.
10 Task organisation Social factors Poor handovers and team coordination problems
Team issues
11 Organisational factors Lack of supervision/checks
Non-optimal use of human resources
Low workforce morale or adverse organisational environment
12
13 Time Factors relevant for prioritisation of recovery-related factors
Time pressure Time constraints Time pressure Time pressure
The time needed to correctly perform tasks, steps, and actions
14 Occurrence-related factors
Appendix VIII Probabilities for 20 Recovery Influencing Factors (RIFs)
The relevant Recovery Influencing Factors (RIFs) are discussed in four main
groups: internal factors (i.e. related to the controller), equipment failure related factors,
external factors (i.e. factors related to working conditions), and airspace related factors.
The following paragraphs present the underlying considerations in developing the
probability values for each predefined RIF.
A.1 Internal factors
Internal factors represent a group of RIFs closely related to the air traffic controller.
These include quality of training, controller experience with equipment failures in
his/her professional career, experience with (or trust in) the ATC system, generic
assessment of personal factors (e.g. personality, fatigue, stress), and communication
for recovery as a result of detected equipment failure.
A.1.1 Training for recovery from ATC equipment failure
This factor describes the adequacy of training provided in recovery tasks based on the
existing recovery procedures and/or other ATC Centre specific equipment failures,
frequency of refresher training (e.g. once per year), and familiarity with ATC system
operational modes (ranging from full, through reduced/emergency, to failed operation).
The qualitative descriptor and the corresponding probabilities are determined from the
questionnaire survey responses based on percentages of ATC Centres that provide
training for recovery, those that provide this training but not consistently, and those that
do not provide any training for recovery (see Chapter 6, section 6.7.3.6 and Chapter 8,
section 8.3.1.2). The qualitative descriptor and the corresponding probabilities for this
RIF are presented in Table 1.
Table 1 Summary of the RIF ‘Training for recovery from ATC equipment failure’
RIF: Training for recovery from ATC equipment failure
Data source for probabilistic assessment: The questionnaire survey
Number of responses: 134
Nature of the validation: -
Qualitative descriptor (percentage of responses; RIF probability):
suitable (52; 0.52)
tolerable (17; 0.17)
counter productive (31; 0.31)
A.1.2 Previous experience with equipment failures
This factor describes the overall level of controller experience with equipment failures,
as well as the level of experience with a particular type of failure under assessment.
The qualitative descriptor is set at two levels (controllers can either have experience
with equipment failures or not), while the probabilities are determined from the
questionnaire survey, further validated by the responses from the ATM specialists
surveyed (Table 2).
Table 2 Summary of the RIF ‘Previous experience with equipment failures’
RIF: Previous experience with equipment failures
Data source for probabilistic assessment: The questionnaire survey
Number of responses: 134
Nature of the validation: ATM specialists surveyed
Qualitative descriptor (percentage of responses; RIF probability):
experienced any type of equipment failure (95; 0.95)
no experience with equipment failures (5; 0.05)
A.1.3 Experience with system performance (reliance or trust in the system)
This dynamic factor describes the overall level of experience of the controller with the
ATC system including the tools and subsystems on the ATC console. The use of
automated tools depends upon the controllers’ trust in their reliability. The extreme
situations of undertrust or overtrust may lead to problems. The former may result in the
tool not being used and the latter in over-reliance of the controller on the available
tool. The probabilities are determined from the findings of the study by Hilburn
and Flynn (2001) also reported in EUROCONTROL (2000b), which involved a total of
79 controllers from seven European ATC Centres. This study used both focus group
discussions and survey data collections to extract controllers’ attitudes to future
automation needs, system development issues, and operational requirements. The
results showed that 18 percent of controllers sampled mistrust technology. On the
other hand, the responses from the ATM specialists surveyed in this thesis reveal that
10 percent of controllers have excessive trust in the system. Taking mistrust and
excessive trust together, the qualitative descriptor for this RIF is set at two levels and
the corresponding probabilities are shown below (Table 3).
Table 3 Summary of the RIF ‘Experience with system performance’
RIF: Experience with system performance (reliance or trust in the system)
Data source for probabilistic assessment: Past research and ATM specialists
Number of responses: 79/8
Nature of the validation: -
Qualitative descriptor (percentage of responses; RIF probability):
objective attitude toward the ATC system (72; 0.72)
excessive trust and mistrust (28; 0.28)
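The probabilities in Table 3 follow from simple arithmetic on the two percentages quoted in the text. The Python sketch below is a minimal illustration, assuming that the mistrust and excessive-trust groups do not overlap and that the two percentages, although drawn from different samples, can be added; the variable names are illustrative.

```python
# Percentages quoted in the text of section A.1.3.
mistrust_pct = 18          # controllers who mistrust technology (Hilburn and Flynn, 2001)
excessive_trust_pct = 10   # controllers with excessive trust (ATM specialists surveyed)

# Assumption: the two groups are disjoint, so their shares can be summed.
unfavourable_pct = mistrust_pct + excessive_trust_pct
objective_pct = 100 - unfavourable_pct

rif_probabilities = {
    "objective attitude toward the ATC system": objective_pct / 100,
    "excessive trust and mistrust": unfavourable_pct / 100,
}
```

If the two groups overlapped, the combined share would be smaller than 28 percent, so the value used for the unfavourable level can be read as an upper bound under this assumption.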
A.1.4 Personal factors
These are controller-related factors, which can be determined in a post-failure analysis
or predicted in the case of predictive analysis. This factor includes, but is not limited
to, the following: time of the day (i.e. relevance of circadian rhythm), time into the shift
(i.e. level of situational awareness as well as fatigue), and age. Although other factors
are important, for example, the level of confidence, complacency, self-esteem (i.e. trust
in own ability), personality, motivation, attitudes deriving from family or close social
groups, and ability to cope with stress, they require the application of various sets of
psychological tests. The current definition of the personal factors accounts for all the above
mentioned factors and sets the qualitative descriptor at three levels. The respective
probabilities are determined from the average of the responses from the ATM
specialists surveyed (Table 4).
Table 4 Summary of the RIF ‘Personal factors’
RIF: Personal factors
Data source for probabilistic assessment: ATM specialists
Number of responses: 8
Nature of the validation: -
Qualitative descriptor (percentage of responses; RIF probability):
suitable (65; 0.65)
tolerable (26; 0.26)
counter productive (9; 0.09)
A.1.5 Communication for recovery within team/ATC Centre
This factor includes only the communication that takes place between controllers for
the purpose of recovery from equipment failure. Therefore, it assesses the quality of
communication as well as the decision-making process, quality of Team Resource
Management (TRM)2, familiarity of team members or the level of synergy between
them, the level of mutual understanding and the knowledge of different working
strategies, team efficacy, intent recognition (i.e. overt communication), and other items.
In the case of a single-controller position, this factor should be understood as
communication with a supervisor or any other relevant personnel. The qualitative
descriptor is proposed at three levels while the corresponding probabilities are
determined from the average of the responses from the ATM specialists surveyed
(Table 5).
Table 5 Summary of the RIF ‘Communication for recovery within team/ATC Centre’
RIF: Communication for recovery within team/ATC Centre
Data source for probabilistic assessment: ATM specialists
Number of responses: 8
Nature of the validation: -
Qualitative descriptor (percentage of responses; RIF probability):
efficient (73; 0.73)
tolerable (24; 0.24)
inefficient (4; 0.04)
A.2 Equipment failure related factors
Equipment failure related factors represent a group of RIFs defining the characteristics
of failures relevant to the controller recovery process. These are complexity of failure
type, time course of failure development, number of workstations/sectors affected, time
necessary to recover, existence of recovery procedure, and duration of failure. Details
on failure characteristics can be found in Chapter 4.
A.2.1 Complexity of failure type
This factor identifies single versus multiple component failures (as discussed in
Chapter 4) and thus the qualitative descriptor is proposed at two levels. The
probabilities of each level are determined using the operational failure reports from
available Civil Aviation Authorities (Table 6). Due to the relatively low level of
confidence in the use of CAA occurrence databases (see Chapter 8, section 8.3.1.5),
these probabilities were validated by the responses from the ATM specialists surveyed
which did not show a significant difference. Additionally, these results are in line with
the experience of system control and monitoring engineers interviewed for this study
who stated that the majority of ATC equipment failures represent single as opposed to
multiple failure occurrence (for evidence see Appendix II).
2 TRM represents an effective use of all available resources for ATC personnel to assure safe
and efficient operation, to reduce error, avoid stress, and increase efficiency.
Table 6 Summary of the RIF ‘Complexity of failure type’
RIF: Complexity of failure type
Data source for probabilistic assessment: Operational failure reports
Number of responses: 22,808 reports
Nature of the validation: ATM specialists responses and system control and monitoring engineers
Qualitative descriptor (percentage of responses; RIF probability):
a single failure (92; 0.92)
multiple failure (8; 0.08)
A.2.2 Time course of failure development
This factor defines the temporal characteristics of failure occurrence. These are
sudden, gradual, and latent/persistent failures. As a result, the qualitative descriptor is
set at three levels: sudden failure/gradual degradation of system/persistent or latent
failure. Based on the averaged responses from the ATM specialists surveyed, the
corresponding probabilities are presented in Table 7. These probabilities were
validated by the interviews with system control and monitoring staff from several ATC
Centres, which did not show a significant difference (for evidence see Appendix II).
Table 7 Summary of the RIF ‘Time course of failure development’
RIF: Time course of failure development
Data source for probabilistic assessment: ATM specialists responses
Number of responses: 8
Nature of the validation: System control and monitoring engineers
Qualitative descriptor (percentage of responses; RIF probability):
sudden (55; 0.55)
gradual (39; 0.39)
latent (7; 0.07)
A.2.3 Number of workstations/sectors affected
This factor describes the immediate impact of a particular type of failure in terms of the
number of positions/sectors affected. It is closely linked to the overall ATC Centre
architecture, since exposure to failure varies greatly with the level of interconnectivity of
different systems, the level of availability of separate channels (redundancy/variability),
and complexity of failure (single vs. multiple failure). The qualitative descriptor is
proposed at two levels, differentiating between a failure affecting a single and multiple
Controller Working Positions (CWPs) and sectors. Due to the lack of operational data,
a conservative approach is taken and probabilities are assigned equally to the two
levels. Note that this RIF has no Level 1, i.e. the most favourable level, simply because
the number of workstations/sectors affected cannot have any positive or favourable
effect on controller performance (Table 8).
Table 8 Summary of the RIF ‘Number of workstations/sectors affected’
RIF: Number of workstations/sectors affected
Data source for probabilistic assessment / Number of responses: N/A
Nature of the validation: -
Qualitative descriptor (percentage of responses; RIF probability):
one CWP or several CWPs in a sector (50; 0.5)
several CWPs in several sectors/all CWPs in all sectors (50; 0.5)
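The equal assignment used for this RIF, where no operational data are available, amounts to giving each level the same share of the total probability. The Python sketch below illustrates this; the function name `equal_probabilities` is an assumption introduced for illustration, and the level labels are taken from Table 8.

```python
from fractions import Fraction


def equal_probabilities(levels):
    """Assign the same probability to every level when no data are available."""
    if not levels:
        raise ValueError("at least one level is required")
    share = Fraction(1, len(levels))  # exact fraction avoids rounding drift
    return {level: share for level in levels}


# Levels of the RIF 'Number of workstations/sectors affected' (Table 8).
levels = [
    "one CWP or several CWPs in a sector",
    "several CWPs in several sectors/all CWPs in all sectors",
]
probabilities = equal_probabilities(levels)
```

The same rule applies to any other two-level RIF for which no data source is available, and the values can be replaced as soon as operational data become available.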
A.2.4 Time necessary to recover
This factor describes the time necessary for a controller to recover from the effect(s) of
equipment failure. This time should be measured from the moment of failure
occurrence until the establishment of a normal or stable system state (i.e. assurance of
safe but not necessarily efficient control of air traffic). The qualitative descriptor is set at
two levels, differentiating between availability and lack of time to recover, while the
corresponding probabilities are determined from the average of the responses from the
ATM specialists surveyed (Table 9).
Table 9 Summary of the RIF ‘Time necessary to recover’
RIF: Time necessary to recover
Data source for probabilistic assessment: ATM specialists
Number of responses: 8
Nature of the validation: -
Qualitative descriptor (percentage of responses; RIF probability):
less than time available3 (94; 0.94)
in excess of time available (6; 0.06)
3 Time available to controller to react before the development of less than adequate separation.
A.2.5 Existence of recovery procedure
This factor takes into account the availability of a written procedure, rules, or guidelines
for a particular type of equipment failure, the level of its comprehensiveness and
completeness. In the future, this RIF may even include the existence of some sort of a
dynamically adaptable procedure. The qualitative descriptor is set at three levels to
capture the quality of the existing procedure (Table 10). Probabilities are calculated
based on the findings from the questionnaire survey responses, which showed that 13.8
percent of ATC Centres do not have any recovery procedures. The distinction between
suitable and tolerable procedures was derived taking into account that 45 percent of
existing procedures are not complete, and therefore only tolerable. It should be noted
that this approach is limited as it associates incomplete procedures with tolerable
procedures. A more accurate approach is achievable when the proposed methodology
is applied to a specific equipment failure and its context.
Table 10 Summary of the RIF ‘Existence of recovery procedure’
RIF: Existence of recovery procedure
Data source for probabilistic assessment: The questionnaire survey
Number of responses: 134
Nature of the validation: -
Qualitative descriptor (percentage of responses; RIF probability):
suitable (47; 0.47)
tolerable (39; 0.39)
inappropriate4 (14; 0.14)
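One reading of the calculation behind Table 10 is sketched below in Python. It assumes that the 45 percent of incomplete procedures applies only to the ATC Centres that have a recovery procedure; under that assumption the rounded values match the table. The variable names are illustrative.

```python
# Figures quoted in the text of section A.2.5.
no_procedure = 0.138       # share of ATC Centres without any recovery procedure
incomplete_share = 0.45    # share of existing procedures that are not complete

# Assumption: the incomplete share applies only to Centres that have a procedure.
has_procedure = 1.0 - no_procedure
tolerable = has_procedure * incomplete_share
suitable = has_procedure * (1.0 - incomplete_share)

probabilities = {
    "suitable": round(suitable, 2),
    "tolerable": round(tolerable, 2),
    "inappropriate": round(no_procedure, 2),
}
```

As the text notes, equating incomplete procedures with tolerable ones is a simplification, so these values are best treated as generic defaults until a specific equipment failure and its context are assessed.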
A.2.6 Duration of failure
This particular factor represents the amount of time during which a failure persists.
Applied to a specific system, it can carry important information on recovery and the
impact of a particular failure on ATC and overall aviation safety. The qualitative
descriptor, proposed at two levels, follows from the discussion of the duration of failures
based on the results of the operational failure report analysis. The corresponding
probabilities are determined from the operational failure reports (Chapter 4), further
validated by the responses from the ATM specialists surveyed, which did not show a
significant difference (Table 11).
4 If procedures are not available, ‘Inappropriate’ would be used.
Table 11 Summary of the RIF ‘Duration of failure’
RIF: Duration of failure
Data source for probabilistic assessment: Operational failure reports
Number of responses: 22,808 (reports)
Nature of the validation: ATM specialists surveyed
Qualitative descriptor (percentage of responses; RIF probability):
short period of time (up to 15 minutes) (56; 0.56)
moderate to substantial period of time (failures longer than 15 minutes) (44; 0.44)
A.3 External factors
External factors or factors related to working conditions represent the group of RIFs
related to the working conditions surrounding a controller at the moment of failure.
These are adequacy of HMI, operational support, quality of alarms/alerts and the
moment when they are triggered in the system, and the overall adequacy of the
organisational characteristics in an ATC Centre from the safety and operational
perspectives.
A.3.1 Adequacy of HMI and operational support
This factor includes the HMI and all available control panels (e.g. mode of operation,
radars in use, frequencies in use and dynamic flight information), situational display, as
well as the operational support provided by specifically designed decision aids. It is
important to highlight that a controller receives the entire feedback on the ATM system
performance through the HMI. The qualitative descriptor is set at three levels to capture
the quality of the HMI, while the probabilities are determined from the average of the
responses from the ATM specialists surveyed (Table 12).
Table 12 Summary of the RIF ‘Adequacy of HMI and operational support’
RIF: Adequacy of HMI and operational support
Data source for probabilistic assessment: ATM specialists
Number of responses: 8
Nature of the validation: -
Qualitative descriptor (percentage of responses; RIF probability):
suitable (53; 0.53)
tolerable (45; 0.45)
counter productive (3; 0.03)
A.3.2 Ambiguity of information in the working environment
This dynamic factor describes the transparency of the system, the level of system
interaction and redundancy, and existence of symptoms that can be interpreted in more
than one way. In general, it is observed that a lack of transparency of an ATC system
leads people to make hypotheses on the causes of failures based on incomplete
information or best guess (see Straeter, 2005). ATC subsystems are highly dependent
on each other. Information from one tool can be distributed to several different
subsystems at the same time. For example, information on aircraft position is sent
directly to the radar data processing system, air traffic flow management, ATC tools
(including the monitoring aid and the medium term conflict detection tool), safety nets
(e.g. the short term conflict alert tool), and flight data processing system. In other
words, ATC systems are closely coupled and dependent upon dynamic information
exchange. For this reason the architecture of any ATC Centre takes into account
existing interactions by building a net of redundancies. In addition, any symptoms that
can be interpreted in more than one way will be interpreted wrongly in some instances.
Based on the above discussion, the qualitative descriptor is set at two levels whilst
the corresponding probabilities are determined from the average of the responses from
the ATM specialists surveyed (Table 13).
Table 13 Summary of the RIF ‘Ambiguity of information in the working environment’
RIF: Ambiguity of information in the working environment
Data source for probabilistic assessment: ATM specialists
Number of responses: 8
Nature of the validation: -
Qualitative descriptor (percentage of responses; RIF probability):
the match between the external working environment and the controller's internal mental model (86; 0.86)
the mismatch between the external working environment and the controller's internal mental model (14; 0.14)
A.3.3 Adequacy of alarms/alerts
As explained in Chapter 4, the function of alarms/alerts is to alert operators (visually
and/or aurally) to potential non-nominal system states. The role of the human
operator is then to confirm the existence of a failure and take appropriate actions.
Because of the complexity of current ATC consoles, it is believed that the availability,
adequacy of alerts, and other relevant characteristics should be considered separately
from HMI. Therefore, this factor describes the availability and adequacy of
alarms/alerts which permit detection, diagnosis, and/or correction of failures, the
reliability of given information, the number of alerts presented to the controller, and the
appropriate location and format of alert information (e.g. signal, colour coding,
warning/message). The qualitative descriptor is set at three levels, to account for
suitable, tolerable, and inadequate design solutions, while the probabilities are
determined from the average of the responses from the ATM specialists surveyed
(Table 14).
Table 14 Summary of the RIF ‘Adequacy of alarms/alerts’
RIF: Adequacy of alarms/alerts
Data source for probabilistic assessment: ATM specialists
Number of responses: 8
Nature of the validation: -
Qualitative descriptor (percentage of responses; RIF probability):
suitable (75; 0.75)
tolerable (20; 0.2)
counter productive (5; 0.05)
A.3.4 Adequacy of alarm/alert onset
This dynamic factor describes one important characteristic of the available
alerts/alarms, namely the ‘cognitive convenience’ of alert onset. In other words, alert
onset has a high impact on the overall recovery performance depending on the
moment of its onset. In addition, a misleading sequence of alerts can lead the controller
towards wrong assumptions through cognitive tunnelling based on the initial alert,
thereby disregarding a later, possibly more relevant alert (Straeter, 2005). Since the
adequacy of alert onset depends directly on the complexity of traffic in the dedicated
airspace (dynamically changing every second), this RIF is given two levels.
Furthermore, due to the lack of ATC operational data on this advanced and futuristic
concept, a conservative approach is taken and probabilities are equally assigned
between two levels (Table 15).
Table 15 Summary of the RIF ‘Adequacy of alarm/alert onset’
RIF: Adequacy of alarm/alert onset
Data source for probabilistic assessment: N/A
Number of responses: N/A
Nature of the validation: -
Qualitative descriptor (percentage of responses; RIF probability):
information from the external world enters the processing loop at the right time (50; 0.50)
information from the external world enters the processing loop at the wrong time, i.e. misleading alarm or sequence of alarms (50; 0.50)
A.3.5 Adequacy of organisation
This factor describes several organisational characteristics of the ATC Centre. These
include but are not limited to the quality of roles and responsibilities, the availability of
team members, the availability and adequacy of supervision, the availability of
additional support (e.g. assistant), the personnel selection process, shift patterns and
personnel planning, attitude to teamwork, safety culture, existence of stress
management programs, support for the organised exchange of past experience on
equipment failures, and adequacy of communication with management and technicians
(e.g. briefings, exchange of knowledge, bulletins, safety panels). Three qualitative
descriptors can be distinguished with probabilities determined from the average of the
responses from the ATM specialists surveyed (Table 16).
Table 16 Summary of the RIF ‘Adequacy of organisation’
RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation
Adequacy of organisation | Efficient | ATM specialists | 8 | 67 | 0.67 | -
Adequacy of organisation | Tolerable | ATM specialists | 8 | 31 | 0.31 | -
Adequacy of organisation | Inefficient | ATM specialists | 8 | 3 | 0.03 | -
A.4 Airspace related factors
Airspace related factors concern the characteristics of the airspace affected by the
degraded system performance, traffic complexity at the moment of failure and during
the recovery process, and weather conditions. In addition, this group includes the
overall task complexity of the situation. For example, an equipment failure
coupled with a sudden increase in the amount of traffic, a sudden deterioration of weather, or
the presence of priority aircraft greatly increases the complexity of the overall situation.
A.4.1 Traffic complexity during the recovery process
This dynamic factor includes but is not limited to the following: the level and
characteristics of the traffic load, the mix of aircraft flying under instrument flight rules (IFR)
and visual flight rules (VFR), military aircraft (because of different performance
characteristics and speed differentials), and the existence of priority aircraft (e.g. low fuel,
government flights, and medical emergencies). There have been various studies of
traffic complexity (Hilburn, 2004) and various attempts to provide a quantitative
indicator of it, for example using dynamic density (Kopardekar and
Magyrtis, 2003), cross-sectional time-series analysis methods (Majumdar et al., 2004),
and a traffic complexity indicator (EUROCONTROL, 2006c). Any of these
approaches may be used to inform the probabilities for the qualitative descriptor of this
particular RIF. Taking into account only the impact that traffic complexity may have on
controller performance, this qualitative descriptor is proposed at two levels. One
level accounts for average traffic complexity whilst the other accounts for both high and low
traffic complexity, as both negatively affect controller performance. The probabilities
are determined from the average of the responses from the ATM specialists surveyed
(Table 17).
Table 17 Summary of the RIF ‘Traffic complexity during the recovery process’
RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation
Traffic complexity during the recovery process | High and low traffic complexity | ATM specialists | 8 | 19 | 0.19 | -
Traffic complexity during the recovery process | Average traffic complexity | ATM specialists | 8 | 81 | 0.81 | -
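The step from survey answers to the two-level probabilities can be illustrated with a minimal Python sketch. It assumes that each specialist splits 100% across three options (too high, tolerable, too low), that the answers are averaged per option, and that the two extremes are then merged into a single level, with the 'tolerable' option standing for average complexity. The response values, function names, and this mapping are hypothetical and are not the survey data behind Table 17.

```python
from statistics import mean

# Hypothetical answers (percent of occasions) from eight ATM specialists to a
# three-option question: (too high, tolerable, too low). Each row sums to 100.
# These numbers are invented for illustration; they are NOT the survey data.
responses = [
    (10, 85, 5),
    (15, 80, 5),
    (20, 75, 5),
    (10, 80, 10),
    (5, 90, 5),
    (15, 75, 10),
    (10, 85, 5),
    (20, 70, 10),
]


def average_levels(rows):
    """Average each answer option across respondents and convert to a probability."""
    n_options = len(rows[0])
    return [mean(row[i] for row in rows) / 100 for i in range(n_options)]


def collapse_to_two_levels(p_high, p_mid, p_low):
    """Merge the two extremes into one level; the middle option is the other level."""
    return {"high_or_low": p_high + p_low, "average": p_mid}


p_high, p_mid, p_low = average_levels(responses)
rif_levels = collapse_to_two_levels(p_high, p_mid, p_low)
print({name: round(p, 2) for name, p in rif_levels.items()})
```

Merging the extremes reflects the argument in the text that both very high and very low traffic complexity degrade controller performance, so only their combined probability matters for this RIF.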
A.4.2 Airspace characteristics during the recovery process
This dynamic factor incorporates the characteristics and complexity of airspace (i.e. its
component sectors), based upon the sector design characteristics (for details see
NATS, 1999). These characteristics include the number of crossing points and their
position in relation to sector boundaries, number of flight levels, number of entry and
exit points, special use airspace (SUAs) including zones of military activity,
characteristics of upper vs. lower airspace, airways configuration, and the number of
neighbouring sectors. It is important to highlight the difference between en-route and
terminal airspace in relation to recovery from equipment failures. Terminal airspace
is characterised by traffic in constant level change (i.e. ascending or descending) and
frequent changes in heading, compared to en-route airspace and especially its higher
levels. Due to differences in controller tasks, en-route airspace in general provides
more time to recover than terminal airspace. In addition, interviews with ATM
specialists revealed that terminal airspace has radar coverage provided from one
radar source, whereas en-route airspace is usually covered by multi-radar
tracking (i.e. integration of data from several radar sites). The qualitative descriptor is
set at three levels whilst the corresponding probabilities are determined from the
average of the responses from the ATM specialists surveyed (Table 18).
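Table 18 Summary of the RIF ‘Airspace characteristics during the recovery process’
RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation
Airspace characteristics during the recovery process | Adequate | ATM specialists | 8 | 64 | 0.64 | -
Airspace characteristics during the recovery process | Tolerable | ATM specialists | 8 | 33 | 0.33 | -
Airspace characteristics during the recovery process | Inappropriate | ATM specialists | 8 | 3 | 0.03 | -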
A.4.3 Weather conditions during the recovery process
This dynamic factor takes into account any change in weather conditions during the
recovery process. The qualitative descriptor is proposed at two levels whilst the
corresponding probabilities are determined from the responses from the ATM
specialists surveyed (Table 19).
Table 19 Summary of the RIF ‘Weather conditions during the recovery process’
RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation
Weather conditions during the recovery process | Improved | ATM specialists | 8 | 89 | 0.89 | -
Weather conditions during the recovery process | Deteriorated | ATM specialists | 8 | 11 | 0.11 | -
A.4.4 Conflicting issues during the recovery process (task complexity)
This dynamic factor describes the level of overall task complexity at the moment of
equipment failure. In the case of multiple conflicting tasks, the operator has to prioritise
between them (Straeter, 2005). In the case of any type of conflict alert (i.e. two or more
aircraft having conflicting intent), the controller has to give full attention to the
resolution of the conflict using the equipment which is still operational, while assuming
that some other subsystem might fail. In ATC, overall safety is the first priority. Due to
the dynamic nature of ATC, this qualitative descriptor is proposed at two levels: the
average complexity of the situation, and both high and low complexity of the situation
(as both have a negative effect on controller performance: increased workload and
boredom or monotony, respectively). The corresponding probabilities are determined
from the responses from the ATM specialists surveyed (Table 20).
Table 20 Summary of the RIF ‘Conflicting issues during the recovery process (overall task complexity)’
RIF | Qualitative descriptor | Data source for probabilistic assessment | Number of responses | Percentage of responses | RIF probability | Nature of the validation
Conflicting issues during the recovery process | The average complexity | ATM specialists | 8 | 72 | 0.72 | -
Conflicting issues during the recovery process | Multiple tasks and low complexity | ATM specialists | 8 | 28 | 0.28 | -
Appendix IX Questions for ATM Specialist
Note: The set of questions presented below investigates controller recovery from
equipment failures in ATC. All questions should be answered based upon your
operational experience and knowledge. Whilst some of them are very specific, and
therefore pose a challenge to answer, please try to respond to all the questions giving
the appropriate percentages.
How often has training (initial & refresher) in your ATC Centre been:
a) Suitable for potential equipment failures
b) Tolerable for potential equipment failures
c) Counter productive for potential equipment failures
Total: 100%
What is the percentage of ATCOs that have never experienced equipment failure in their career? Please think of novice ATCOs as well and try to make the best estimation.
According to your best judgement, what percentage of ATCOs:
a) Over-trust the automation/systems they are using
b) Have an objective attitude toward ATC automation (ATCOs do trust automation but are aware of possible failures)
c) Under-trust the automation/systems they are using
Total: 100%
In the event of equipment failure, how often have personal factors (stress, fatigue, self esteem) been:
a) Suitable to the equipment failure in question
b) Tolerable to the equipment failure in question
c) Counter productive to the equipment failure in question
Total: 100%
How often has team-related communication for recovery been:
a) Efficient
b) Tolerable
c) Inefficient
Total: 100%
What is the percentage of equipment failures affecting:
a) One system only
b) Multiple systems at the same time
Total: 100%
In your ATC Centre, what is the percentage of:
a) Sudden equipment failures
b) Gradual equipment failures
c) Latent equipment failures
Total: 100%
How often has the time necessary to recover (time before the development of any inadequate separation) been:
a) Adequate
b) Inadequate
Total: 100%
How often (in your overall experience) have existing recovery procedures been:
a) Suitable to the equipment failure in question
b) Tolerable to the equipment failure in question
c) Counter productive to the equipment failure in question
Total: 100%
What is the percentage of equipment failures lasting:
a) Up to 15 min
b) More than 15 min
Total: 100%
When there is a failure, how often has information presented on your HMI (i.e. radar screen) been:
a) Suitable to the recovery from equipment failure (e.g. provides appropriate cues, visual/auditory alerts)
b) Tolerable to the recovery from equipment failure
c) Counter productive to the recovery from equipment failure (e.g. provides wrong cues, misleads you)
Total: 100%
When there is a failure, how often have existing alarms/alerts on the radar screen been:
a) Suitable to the recovery from equipment failure
b) Tolerable to the recovery from equipment failure
c) Counter productive to the recovery from equipment failure
Total: 100%
In your opinion, what is the percentage of match between the controller's situational awareness and the dynamic airspace and traffic configuration (traffic mix, speed differentials, FL utilized, airways configuration) during the recovery process?
For what percentage of time are the organisational features in your ATC Centre, regarding the support for better recovery from equipment failures:
a) Efficient
b) Tolerable
c) Inefficient
Total: 100%
In the event of an equipment failure, how often has the traffic complexity been:
a) Too high
b) Tolerable
c) Too low
Total: 100%
In the event of an equipment failure, how often has airspace design and configuration been:
a) Adequate
b) Tolerable
c) Inappropriate
Total: 100%
In the event of an equipment failure, how often have the weather conditions been:
a) Improved
b) Deteriorated or worsened
c) Unchanged
Total: 100%
In the event of equipment failure, how often has the total complexity of the recovery situation been:
a) High
b) Average
c) Low
Total: 100%
Appendix X Overview of RIFs, their corresponding levels, and designated probabilities
(1) ID | (2) RIF name | (3) Descriptor | (4) Probability (p) | (5) Expected effect on controller recovery performance | (6) Level | (7) Designator (R) | (8) Probability of overall situation occurring (p*R)
Internal factors
1 | Training for recovery from ATC equipment failure | Suitable to the situation in question | 0.52 | Most favourable | 1 | 1 | 0.52
 |  | Tolerable to the situation in question | 0.17 | Non significant | 2 | 0 | 0.00
 |  | Counter productive to the situation in question | 0.31 | Least favourable | 3 | -1 | -0.31
2 | Previous experience with equipment failures | Experienced with a particular type of failure or Experienced with any other type of ATC equipment failure | 0.95 | Most favourable | 1 | 1 | 0.95
 |  | No experience with ATC equipment failures | 0.05 | Non significant | 2 | 0 | 0.00
3 | Experience with the system performance (reliance) | Objective attitude toward the system | 0.72 | Non significant | 2 | 0 | 0.00
 |  | Positive experience with the system (excessive trust) or Negative experience with the system (under-trust) | 0.28 | Least favourable | 3 | -1 | -0.28
4 | Personal factors | Suitable for the recovery process | 0.65 | Most favourable | 1 | 1 | 0.65
 |  | Tolerable for the recovery process | 0.26 | Non significant | 2 | 0 | 0.00
 |  | Counter productive for the recovery process | 0.09 | Least favourable | 3 | -1 | -0.09
5 | Communication for recovery within team/ATC Centre | Efficient | 0.73 | Most favourable | 1 | 1 | 0.73
 |  | Tolerable | 0.24 | Non significant | 2 | 0 | 0.00
 |  | Inefficient | 0.04 | Least favourable | 3 | -1 | -0.04
Equipment failure related factors
6 | Complexity of failure type | Single system affected | 0.92 | Non significant | 2 | 0 | 0.00
 |  | Multiple systems affected | 0.08 | Least favourable | 3 | -1 | -0.08
7 | Time course of failure development | Sudden failure | 0.55 | Improve | 1 | 1 | 0.55
 |  | Persistent or latent failure | 0.07 | Non significant | 2 | 0 | 0.00
 |  | Gradual degradation of system | 0.39 | Least favourable | 3 | -1 | -0.39
8 | Number of workstations/sectors affected | One workstation/one sector or All workstations in one sector | 0.50 | Non significant | 2 | 0 | 0.00
 |  | Several workstations/couple of sectors or All workstations/all sectors | 0.50 | Least favourable | 3 | -1 | -0.50
9 | Time necessary to recover | Adequate - less than available time | 0.94 | Most favourable | 1 | 1 | 0.94
 |  | Inadequate - in excess of available time | 0.06 | Least favourable | 3 | -1 | -0.06
10 | Existence of recovery procedure | Suitable to the situation in question | 0.47 | Most favourable | 1 | 1 | 0.47
 |  | Tolerable to the situation in question | 0.39 | Non significant | 2 | 0 | 0.00
 |  | Inappropriate | 0.14 | Least favourable | 3 | -1 | -0.14
11 | Duration of failure | Short period of time | 0.56 | Non significant | 2 | 0 | 0.00
 |  | Moderate period of time or Substantial period of time | 0.44 | Least favourable | 3 | -1 | -0.44
External or factors related to working conditions
12 | Adequacy of HMI and operational support | Suitable to the situation in question | 0.53 | Most favourable | 1 | 1 | 0.53
 |  | Tolerable to the situation in question | 0.45 | Non significant | 2 | 0 | 0.00
 |  | Counter productive to the situation in question | 0.03 | Least favourable | 3 | -1 | -0.03
13 | Ambiguity of information in the working environment | External working environment matches the controller's internal mental model | 0.86 | Most favourable | 1 | 1 | 0.86
 |  | External working environment mismatches the controller's internal mental model | 0.14 | Least favourable | 3 | -1 | -0.14
14 | Adequacy of alarms/alerts | Suitable to the situation in question | 0.75 | Most favourable | 1 | 1 | 0.75
 |  | Tolerable to the situation in question | 0.20 | Non significant | 2 | 0 | 0.00
 |  | Counter productive to the situation in question | 0.05 | Least favourable | 3 | -1 | -0.05
15 | Adequacy of alarm/alert onset | Information from the external world enters the processing loop at the right time | 0.50 | Most favourable | 1 | 1 | 0.50
 |  | Information from the external world enters the processing loop at the wrong time (misleading sequence of alarms) | 0.50 | Least favourable | 3 | -1 | -0.50
16 | Adequacy of organisation | Efficient | 0.67 | Most favourable | 1 | 1 | 0.67
 |  | Tolerable | 0.31 | Non significant | 2 | 0 | 0.00
 |  | Inefficient | 0.03 | Least favourable | 3 | -1 | -0.03
Airspace related factors
17 | Traffic complexity | Average traffic complexity | 0.81 | Non significant | 2 | 0 | 0.00
 |  | Extremely high or extremely low traffic complexity | 0.19 | Least favourable | 3 | -1 | -0.19
18 | Airspace characteristics | Adequate (e.g. enroute higher levels) | 0.64 | Most favourable | 1 | 1 | 0.64
 |  | Tolerable | 0.33 | Non significant | 2 | 0 | 0.00
 |  | Inappropriate (e.g. enroute lower levels or terminal) | 0.03 | Least favourable | 3 | -1 | -0.03
19 | Weather conditions during the recovery process | Improved | 0.89 | Non significant | 2 | 0 | 0.00
 |  | Deteriorated | 0.11 | Least favourable | 3 | -1 | -0.11
20 | Conflicting issues in the situation (task complexity) | Average complexity of the situation | 0.72 | Non significant | 2 | 0 | 0.00
 |  | Conflicting, multiple tasks or Extremely low complexity of the situation (may lead to monotony) | 0.28 | Least favourable | 3 | -1 | -0.28
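The last column of this overview is the product of the level probability p and the designator R. As a minimal sketch, the Python below recomputes that column for three RIFs whose values are copied from the overview, and checks that the level probabilities of each RIF sum to one within a rounding tolerance. The function names and the 0.015 tolerance are assumptions made for this illustration and are not part of the thesis method.

```python
# Level probabilities p and designators R for three RIFs, copied from the
# overview of RIFs in this appendix. Each entry is a (p, R) pair.
rifs = {
    "Training for recovery from ATC equipment failure": [(0.52, 1), (0.17, 0), (0.31, -1)],
    "Time course of failure development": [(0.55, 1), (0.07, 0), (0.39, -1)],
    "Adequacy of alarm/alert onset": [(0.50, 1), (0.50, -1)],
}


def weighted_levels(levels):
    """Return p*R for each (p, R) pair, rounded to two decimals as in the table."""
    return [round(p * r, 2) for p, r in levels]


def probabilities_sum_to_one(levels, tol=0.015):
    """Check that the level probabilities sum to 1 within a rounding tolerance.

    The tolerance is an assumption: the published probabilities are rounded to
    two decimals, so some RIFs sum to 1.01 rather than exactly 1.
    """
    return abs(sum(p for p, _ in levels) - 1.0) <= tol


for name, levels in rifs.items():
    print(name, weighted_levels(levels), probabilities_sum_to_one(levels))
```

A designator of 0 removes a level from the last column altogether, which is why every 'Non significant' row shows 0.00 regardless of its probability.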
Appendix XI Validation of the RIFs interaction matrix
DIRECT INFLUENCE
Column headings (upper row, left to right): Training for recovery; Previous experience with equip. failures; Experience with system performance; Personal factors; Comm. for recovery; Complexity of failure; Time course of failure development; Number of workstations affected; Time necessary to recover; Existence of recovery procedure; Duration of failure; Adequacy of HMI and oper. support; Ambiguity of information; Adequacy of alarms/alerts; Adequacy of alarms/alerts onset; Adequacy of organization; Traffic/traffic complexity; Airspace characteristics; Weather; Task complexity.
Row headings (left column) and marked interactions:
Training for recovery from ATC equipment failures
x x
Previous experience with equip. failures
x
Experience with system performance (reliance)
x x x x
Personal factors
x x x x x x x x x x x x x x x x x x
Comm. for recovery within a team of controllers
x x x x x x x x x x x x x x x x x x
Complexity of failure type
x
Time course of failure development
x
Number of workstations/ sectors affected
x x
Time necessary to recover
x x x x x x x x x x x x x x x x x
Existence of recovery procedure
x
Duration of failure
x x
Adequacy of HMI and operational support
x x x x
Ambiguity of information in the working environment
x x x x x x x
Adequacy of alarms/alerts
x x x
Adequacy of alarms/alerts onset
x x x x x
Adequacy of organization
x x x x
Traffic/traffic complexity in the moment of failure
x x x
Airspace characteristics
x x x
Weather conditions during the recovery process
Task complexity
x x x x x x x x x x x x x x x x x
NOTE: Please mark the interactions between each factor in the upper row and each factor in the left column. For example, does 'Training for recovery' influence any of the factors on the left side ('previous experience', 'experience with the system', 'personal factors', and so on)? Please add or delete existing interactions as you find appropriate.
Appendix XII Distribution of 20 Recovery Influencing Factors (RIFs)
Level RIF1 RIF2 RIF3 RIF4 RIF5 RIF6 RIF7 RIF8 RIF9 RIF10
0.1 0 0 0 0 0 0 0 0 0 0
0.2 0 0 0 0 0 0 0 0 0 0
0.3 0 0 0 0 0 0 0 0 0 0
0.4 0 0 0 0 0 0 0 0 0 0
0.5 0 0 0 168 24 0 0 0 96 0
0.6 0 0 0 5964 2244 0 0 0 4272 0
0.7 0 0 0 67956 37908 0 0 0 58656 0
0.8 0 0 0 379116 266508 0 0 0 383184 0
0.9 2239488 0 0 1227984 1008576 0 0 0 1422000 0
1 8957952 13436928 0 2513604 2310156 0 8957952 0 3279840 8957952
1.1 2239488 6718464 0 1653636 1621692 0 4478976 0 2337228 4478976
1.2 0 0 0 3393708 3512088 0 0 0 5184840 0
1.3 0 0 0 2513604 2750052 0 0 0 4234404 0
1.4 0 0 0 1227984 1398444 0 0 0 2283432 0
1.5 0 0 0 379284 442464 0 0 0 786768 0
1.6 0 0 0 73920 82008 0 0 0 162216 0
1.7 0 0 0 73920 44760 0 0 0 17670 0
1.8 0 0 248832 379284 266688 0 0 0 780 0
1.9 2239488 0 3483648 1227984 1008576 0 0 0 6 0
2 8957952 13436928 8709120 2513604 2310156 13436928 8957952 10077696 0 8957952
2.1 2239488 6718464 3981312 1653636 1621692 6718464 4478976 6718464 0 4478976
2.2 0 0 3483648 3393708 3512088 0 0 3359232 0 0
2.3 0 0 248832 2513604 2750052 0 0 0 0 0
2.4 0 0 0 1227984 1398444 0 0 0 0 0
2.5 0 0 0 379284 442464 0 0 0 96 0
2.6 0 0 0 73920 82008 0 0 0 4272 0
2.7 0 0 0 73920 44760 0 0 0 58656 0
2.8 0 0 248832 379284 266688 0 0 0 383184 0
2.9 2239488 0 3483648 1227984 1008576 0 0 0 1422000 0
3 8957952 0 8709120 2513604 2310156 13436928 8957952 10077696 3279840 8957952
3.1 2239488 0 3981312 1653636 1621692 6718464 4478976 6718464 2337228 4478976
3.2 0 0 3483648 3393708 3512088 0 0 3359232 5184840 0
3.3 0 0 248832 2513604 2750052 0 0 0 4234404 0
3.4 0 0 0 1227984 1398444 0 0 0 2283432 0
3.5 0 0 0 379116 442440 0 0 0 786768 0
3.6 0 0 0 67956 79764 0 0 0 162216 0
3.7 0 0 0 5964 6852 0 0 0 17670 0
3.8 0 0 0 168 180 0 0 0 780 0
3.9 0 0 0 0 0 0 0 0 6 0
4 0 0 0 0 0 0 0 0 0 0
Level RIF11 RIF12 RIF13 RIF14 RIF15 RIF16 RIF17 RIF18 RIF19 RIF20
0.1 0 0 0 0 0 0 0 0 0 0
0.2 0 0 0 0 0 0 0 0 0 0
0.3 0 0 0 0 0 0 0 0 0 0
0.4 0 0 0 0 0 0 0 0 0 0
0.5 0 0 0 0 0 0 0 0 0 0
0.6 0 0 0 0 0 0 0 0 0 0
0.7 0 0 20736 0 0 0 0 0 0 0
0.8 0 248832 684288 0 124416 0 0 0 0 0
0.9 0 2488320 3836160 746496 2363904 1492992 0 0 0 0
1 0 5474304 7527168 5971968 7589376 5225472 0 6718464 0 0
1.1 0 2488320 3545856 3732480 4354560 2985984 0 4478976 0 0
1.2 0 2488320 3836160 2985984 4976640 3359232 0 2239488 0 0
1.3 0 248832 684288 0 746496 373248 0 0 0 0
1.4 0 0 20736 0 0 0 0 0 0 6
1.5 0 0 0 0 0 0 0 0 0 696
1.6 0 0 0 0 0 0 0 0 0 14778
1.7 0 0 0 0 0 0 0 0 0 131736
1.8 0 248832 0 0 0 0 0 0 0 638880
1.9 0 2488320 0 746496 0 1492992 1119744 0 0 1903896
2 10077696 5474304 0 5971968 0 5225472 8957952 6718464 20155392 3719892
2.1 6718464 2488320 0 3732480 0 2985984 5598720 4478976 0 2405976
2.2 3359232 2488320 0 2985984 0 3359232 4478976 2239488 0 4929648
2.3 0 248832 0 0 0 373248 0 0 0 3719892
2.4 0 0 0 0 0 0 0 0 0 1903902
2.5 0 0 0 0 0 0 0 0 0 639576
2.6 0 0 0 0 0 0 0 0 0 146514
2.7 0 0 20736 0 0 0 0 0 0 146514
2.8 0 248832 684288 0 124416 0 0 0 0 639576
2.9 0 2488320 3836160 746496 2363904 1492992 1119744 0 0 1903902
3 10077696 5474304 7527168 5971968 7589376 5225472 8957952 6718464 20155392 3719892
3.1 6718464 2488320 3545856 3732480 4354560 2985984 5598720 4478976 0 2405976
3.2 3359232 2488320 3836160 2985984 4976640 3359232 4478976 2239488 0 4929648
3.3 0 248832 684288 0 746496 373248 0 0 0 3719892
3.4 0 0 20736 0 0 0 0 0 0 1903896
3.5 0 0 0 0 0 0 0 0 0 638880
3.6 0 0 0 0 0 0 0 0 0 131736
3.7 0 0 0 0 0 0 0 0 0 14778
3.8 0 0 0 0 0 0 0 0 0 696
3.9 0 0 0 0 0 0 0 0 0 6
4 0 0 0 0 0 0 0 0 0 0
Appendix XIII Experimental material
Experimental material consists of various documents used by air traffic controllers
participating in the study, as well as the subject matter expert (SME). The documents
used by controllers are presented in the following order:
a) The controller handbook;
b) Debriefing interview sheet; and
c) Feedback form.
The documents used by the subject matter expert are presented in the following order:
d) Subject matter expert’s assessment; and
e) Best practice procedure sheet.
a) The controller handbook
The Controller Handbook
Researcher: Branka Subotic
Supervisor: Dr Washington Y. Ochieng
University: Imperial College London
Location of experiment: XXX
June 2006
SUBJECT INSTRUCTIONS
Strategic and tactical decision making in ATC
Dear Controller,
Welcome to the “Strategic and tactical decision making in ATC” research program. Because of your extensive experience as an Air Traffic Controller, you have been asked to participate in this study. Our aim is to test a new approach to better understanding the decision making process of air traffic controllers. We will try to determine the cognitive processes that drive your decisions/actions during the dynamic and complex control of air traffic. The knowledge gained from this research will feed into future design solutions for computerized ATC tools. We are not in a position to reveal more information on this study at this point, as it may influence your behaviour, your actions, and the processes we wish to observe and analyze. At the end of this study you will be more familiar with our objectives and you will be able to ask as many questions as you find necessary. So please bear with us and help us make this study as realistic as possible.
Your understanding and help are crucial at every step of this study! This study is designed as an integrated part of regular emergency training in Dublin ATC Centre with minimal impact on the controller. Therefore, please consider and treat this training session as any other training session you have had in your professional career. From time to time, additional information may be given to you by the training instructor or researcher. On these occasions please act as you would in the operational environment. Also, when information or instructions are given to you by the researcher, please regard them as if they came from a training instructor.
Now, we would like you to read the “Consent form”, which aims to inform you what the experiment involves and to make you fully aware of your rights while you are taking part in it. So please proceed to the next page, read the form, and sign it if you agree with all terms and conditions. If you have any questions, please do not hesitate to contact the researcher. In addition, we will ask you to fill out a questionnaire and participate in a de-briefing after the training session. The de-briefing part of this experiment is of high importance as we will compare the recorded data with your own experience and decision-making process. Therefore, we would like to encourage you to give the researcher detailed input and explanation.
IMPERIAL COLLEGE LONDON
RESEARCH SUBJECT INFORMED CONSENT FORM
The purpose of this research is to investigate the controller’s decision making process. You will be asked to complete one emergency training session and therefore to provide air traffic control service through one traffic scenario. The entire experiment is expected to take approximately 1.5 hours to complete.
The results of this experiment are for research purposes only, and may be presented at professional meetings or published in research literature. Your name will not be used in the reporting of results. Only recorded data will be used; all personal information will be kept completely confidential. A videotape of part of the experiment may be taken for purposes of data collection only. Neither your face nor identity will ever be associated with any reporting of these results.
In addition, because of the confidentiality of this experiment, you will be asked not to disclose any information about what you have experienced today to anyone (including family, colleagues, and friends) for the next 30 days. Only in this way can we be assured that the experiment will remain as realistic as possible. With your signature below you are accepting these conditions. If for any reason you are unable to comply with any of the listed conditions, please inform the researcher right away and you will be released from any other obligations. Additionally, if you wish to withdraw from the experiment, you may do so at any time.
With Sincerest Thanks
I, ________________________________, understand that my participation in this experiment is completely voluntary and that I may refuse to participate, or withdraw from the experiment, at any time without penalty.
___________________________________ _________________
Participant Signature Date
I, _______________________________, the researcher, undertake to guarantee the confidentiality of the information you provide in this experiment. I understand that you reserve the right to seek legal redress should any aspect of this agreement be breached.
___________________________________ _________________
Researcher Signature Date
Prospective Research Subject: Read this consent form carefully and ask as many questions as you like before you decide whether you want to participate in this research study. You are free to ask questions at any time before or after your participation in this research.
Now you are ready for the training session!
~ When ready, contact the pseudo-pilot on the dedicated R/T frequency so that your training session can be initiated ~
POST-EXPERIMENT SESSION
Dear Controller,
Once again, thank you very much for your participation in this experimental trial. Now you understand what our true objective in the experiment was and why we had to keep it confidential. Our objective in this research project is to investigate controller recovery from equipment failures in ATC. However, in order to preserve the unexpected nature of this rare occurrence, it was necessary to mask the real objective of this research. Our aim is therefore to determine how controllers manage equipment failures. The complexity of this experiment gave us the opportunity to test only one equipment failure, in spite of the large number of potential equipment failures in any ATC Centre. By observing your reactions, recovery strategy, and attitude, we are aiming to identify better solutions in the design of ATC tools/systems, recovery procedures, and training. Our belief is that current, more automated ATC Centres need to provide better support to their main element, the air traffic controllers.
For the above reasons, we kindly remind you that you have agreed not to disclose any information and details from today’s experiment to your colleagues, family, and friends in the next 30 days.
Once again, we would like to highlight that without your help and understanding this research would not be possible!
Post experiment questionnaire
If you need clarification at any point, please do not hesitate to contact the researcher!
How suitable was your previous training to the situation (equipment failure) that you have just experienced? Please answer this question taking into account the quality of the training syllabus as well as the frequency of training. (Circle the appropriate number)
1. Suitable to the situation in question
2. Tolerable to the situation in question
3. Counter productive to the situation in question
When was your last emergency training?
1. In the last 30 days
2. In the last 6 months
3. 1 year ago
4. More than 1 year ago
Did you have training on equipment failures during that session? Y N
Do you need better or more frequent training for unusual situations, such as handling emergencies? Y N
Please mark the statement that is closest to your previous experience with equipment failures:
1. I have experienced a very similar or the same type of equipment failure in the past.
2. I have not experienced this particular type of failure, but have experienced other types of equipment failures previously.
3. I have never experienced an equipment failure in my professional career.
Please mark the statement that is closest to your experience with the ATC system:
1. I trust ATC technology more than I trust my own judgement.
2. I trust new ATC technology but I am aware of possible failures.
3. I do not trust new ATC technology, even though it is designed to make my job easier.
Current rating: ACC RDR Proc / APP RDR Proc / TWR
Age: ____
Years of experience as a controller: ____
How would you rate your personal ability in today’s training session? Personal ability comprises different factors, including but not limited to: your level of fatigue, stress, confidence, complacency, your ability to cope with an emergency situation, any family or other social group issues, etc. Based on this explanation, rate your personal ability:
1. Suitable for the recovery process
2. Tolerable for the recovery process
3. Counter productive for the recovery process
How would you rate your communication for recovery today:
1. Efficient
2. Tolerable
3. Inefficient
Would you say that you had enough time to recover from the effect(s) of the equipment failure (taking into account possible development of less than adequate separation)?
1. Yes, time was adequate. Time necessary to recover was less than available time in the simulation.
2. No, time was not adequate. Time necessary to recover was in excess of available time in the simulation.
Is there a relevant recovery procedure for this particular failure? Y N
If yes, in your opinion is that procedure:
1. Suitable to the situation in question
2. Tolerable to the situation in question
3. Counter productive to the situation in question
How familiar are you right now with that procedure?
1. Very familiar
2. Semi familiar
3. Not familiar at all
Would you say that HMI and operational support have been:
1. Suitable to the situation in question
2. Tolerable to the situation in question
3. Counter productive to the situation in question
Would you say that:
1. External working environment matched your internal mental model during the recovery process
2. External working environment mismatched your internal mental model at any point of recovery
How would you rate the adequacy of organisation in your ATC Centre?
1. Efficient
2. Tolerable
3. Inefficient How would you rate traffic complexity during the recovery process (please note: only during the recovery process and not during the entire training session):
1. High
2. Average
3. Low How would you rate the complexity of the airspace in the used scenario? The airspace complexity was:
1. Adequate 2. Tolerable 3. Inappropriate
How would you rate weather conditions during the recovery process?
1. Improved 2. Unchanged 3. Deteriorated
The quality of roles and responsibilities
The availability and adequacy of supervision
Attitude to teamwork
Support for organised exchange of past experience on eq. failures
Personnel selection process
Shift patterns and personnel planning
Availability of team members
Availability of additional support (e.g. Assistant)
Safety culture
Communication with management and technicians (e.g. Briefings, exchange of knowledge, bulletins)
Existence of stress management programs
The mix of IFR/VFR
Military aircraft
The existence of priority aircraft
Speed mix of aircraft
Amount of vertical movements
Amount of crossing movements
Amount of conflicts
The number of crossing points
Proximity of crossing points to the sector boundaries
Number of flight levels
Number of entry points
Number of exit points
Special use airspace (SUAs)
Upper vs. Lower airspace
Airways configuration
The number of neighbouring sectors
Sector geometry (e.g. sharp edges)
Size of sector
Bidirectional vs. unidirectional routes
Route length
Proximity of route to sector boundary
Considering the entire training session how would you rate the overall task complexity:
1. Conflicting, multiple tasks existed during this training session.
2. Average complexity of the situation.
3. Extremely low complexity of the situation.
How would you rate your recovery performance today?
1. Efficient
2. Tolerable
3. Inefficient
How different is your performance today from any other day?
1. Not different at all
2. Similar
3. Very different
How representative has today’s performance been of your overall ability to recover from an equipment failure in ATC?
1. Highly representative
2. Average
3. Not representative at all
How realistic was today’s task?
1. Highly realistic
2. Moderately realistic
3. Not realistic at all
Are you completely aware of the impact/implications of the particular failure that you have just experienced? Do you fully understand what will happen when particular equipment fails? Y N
Any comment?
Would you like to see some form of Aide-Memoire (flip chart, small laminated booklet, HMI drop down menu) available at each CWP to assist you in recognising the effects of a particular equipment failure and the steps to be taken toward its recovery? Y N
Is there any aspect of training, procedures, HMI, or teamwork that could have enhanced your recovery performance today?
Thank you!!!!
b) Debriefing interview structure
IMPERIAL COLLEGE LONDON
DEBRIEFING INTERVIEW STRUCTURE
Questions for each subject:
1. How did you notice/detect that there was an equipment failure? What info triggered the detection?
2. When exactly did detection occur?
3. What could have been the worst consequence if the situation was not detected?
4. Did you find a diagnosis phase possible/necessary? If yes, go to question 5. If no, go to question 8.
5. What was your diagnosis?
6. What did you do with it (i.e. tried to confirm, or rule out alternatives)?
7. Was the recovery strategy influenced by diagnosis?
8. How did you choose the recovery strategy to apply (i.e. based on training, own experience, colleague’s experience, any other source of info)?
9. What could have made the situation worse?
10. Can you think of any fall-back actions which could mitigate this situation? Can you suggest any changes to the procedures, phraseology, HMI design, or fall-back procedures that could improve the situation?
Note: The researcher should replay the video recording from the moment of failure
injection and start further discussion with the subject.
c) Feedback form
FEEDBACK FORM
Concerning the study conducted by representatives of
Imperial College London at XXX ATC Centre 06/06/06 – 09/06/06
Dear Controller,
Now that you have participated in this study, we would like to ask for your feedback on its importance and value. Please answer all questions as accurately as possible, since your answers will guide us in our future endeavours. Your answers will be used only to assess the usefulness of this study. Once again, thank you very much for participating!
Please circle the appropriate answer:
Did you find participating in this study interesting? Y N
Do you think that this experience is beneficial for your future work? Y N
Do you feel that this experiment raised important issues? Y N
Do you feel that this experiment helped you to identify any gaps in your:
• Knowledge Y N
• Training Y N
• Skills Y N
• Awareness of effects of unusual events Y N
Would you be willing to participate in future studies of this type? Y N
Do you have any other comments on the experiment?
After completing this feedback form, please return it to the office of XXX. Thank you for your time! Your cooperation is highly appreciated.
Researcher Assistant
XXX, June 2006
d) Subject matter expert’s assessment
ASSESSMENT OF THE DEPENDENCY VARIABLE IN THE EXPERIMENT
Our objective in this research project is to analyse the recovery from equipment failures in ATC. Since the area of ATC is highly specialised, it was necessary to evaluate the controller’s recovery performance using expert opinion. As a Subject Matter Expert (SME) in the area of Air Traffic Control (ATC), you are asked to help in the assessment of the subject controller’s recovery performance. We kindly ask you not to disclose any information or details on this experiment to your colleagues in the next 30 days, so that the injected failure remains an unexpected event for each subject controller.
Recovery effectiveness
According to the controller performance that you observed in this experiment (either “live” or on the video recording of the experimental trial) it is necessary to use your professional experience and assess the effectiveness of the controller’s recovery.
Recovery is considered successful if the system returns to the normal or intermediate (but still stable) state. In the short term (as simulated in this experiment), the situation should be stable and control of airspace should be considered safe, but not necessarily efficient.
Please note that the anchor points of each scale range from “Firmly Disagree” to “Firmly Agree.” Place a mark in one of the five boxes along each line, as shown in the following example.
Example
In general, I am professionally more efficient in the mornings than evenings.
x
Firmly Disagree | Partly Disagree | Neutral | Partly Agree | Firmly Agree
1. The recovery strategy implemented by this controller can be considered successful.
Firmly Disagree | Partly Disagree | Neutral | Partly Agree | Firmly Agree
2. In this traffic scenario, it was possible to implement more than one recovery strategy.
Firmly Disagree | Partly Disagree | Neutral | Partly Agree | Firmly Agree
If you answered ‘partly agree’ or ‘firmly agree’, your answer implies that you thought of one or more alternative recovery strategies. Please briefly describe the alternative(s).
3. If you were in the place of the subject controller, would you implement a different recovery strategy than he did?
Firmly Disagree | Partly Disagree | Neutral | Partly Agree | Firmly Agree
If you answered ‘partly agree’ or ‘partly disagree’, please specify your reasons for implementing a different recovery strategy and which recovery strategy that would be. In addition, please specify any particular/difficult issues regarding the traffic situation during the recovery process:
Evaluation of the contextual factors in the training scenario: Please circle corresponding answers according to your professional experience and expertise:
How would you rate the complexity of the simulated failure type?
1. Single system affected 2. Multiple systems affected
How would you rate the time course of the simulated failure development?
1. It was a sudden failure. 2. It was a latent failure. 3. It was a gradual degradation of the system.
Would you say that the controller had enough time to recover from the effect(s) of the equipment failure?
1. Yes, time was adequate. Time necessary to recover was less than available time for recovery in the simulation.
2. No, time was not adequate. Time necessary to recover was in excess of available time for recovery in the simulation.
Is there a recovery procedure for this particular failure? Y N
If yes, is that procedure:
1. Suitable to the observed situation in question
2. Tolerable to the observed situation in question
3. Counter productive to the observed situation in question
How would you rate the duration of the simulated equipment failure?
1. Short period of time (is it reasonable to consider them less than 15min)
2. Moderate period of time (is it reasonable to consider them less than 1h)
3. Substantial period of time (is it reasonable to consider them more than 1h)
How would you rate traffic complexity during the recovery process (please note: only during the recovery process and not during the entire training session)?
1. High 2. Average 3. Low
How would you rate airspace complexity in the used scenario?
1. Adequate 2. Tolerable 3. Inappropriate
How would you rate weather conditions during the recovery process?
1. Improved 2. Unchanged 3. Deteriorated
How realistic was today’s task?
1. Highly realistic
2. Moderately realistic
3. Not realistic at all
Thank you!!!!
The mix of IFR/VFR
Military aircraft
The existence of priority aircraft
Speed mix of aircraft
Amount of vertical movements
Amount of crossing movements
Amount of conflicts
The number of crossing points
Proximity of crossing points to the sector boundaries
Number of flight levels
Number of entry points
Number of exit points
Special use airspace (SUAs)
Upper vs. Lower airspace
Airways configuration
The number of neighbouring sectors
Sector geometry (e.g. sharp edges)
Size of sector
Bidirectional vs unidirectional routes
Route length
Proximity of route to sector boundary
e) Best practice procedure sheet
BEST PRACTICE PROCEDURE FOR XXX SIMULATION
Detect the problem
• Either by pilot’s first contact or
• Visually on the radar display (uncorrelated track). In this case the first assumption may be transponder failure. After confirmation that the a/c transponder is serviceable, a further check on system performance should be conducted.
Identify failure type either by ATCO or by input from the coordinator
Locate traffic
Check identity of all tracks (referring to the eastbound overflight)
Identify traffic using appropriate technique
• Bearing/range
• Turn method
Inform all traffic on RTF of the failure and advise of possible restrictions
Maintain identification of all traffic
Ground trainer
Refuse departures permission to depart
Get all airborne traffic to land
Maintain accurate and timely strip marking throughout the process
Provide vertical separation
Utilize holding patterns when necessary
After restoration has been confirmed by coordinator:
• Re-identify all traffic
• Confirm Mode C
• Continue to monitor
• Release all departures
First possible detection/action may have occurred at: ______________
First actual action occurred at: ______________
End of the recovery process (release of the departures): ______________
Appendix XIV Overview of RIFs, their corresponding levels, and probabilities determined in the experimental investigation
(1) | (2) | (3) | (4) | (5) | (6) | (7) | (8)
ID | RIF name | Descriptor | Probability (p) | Expected effect of controller recovery performance | Level | Designator (R) | Probability of overall situation occurring (p*R)

Internal factors
1 | Training for recovery from ATC equipment failure | Suitable to the situation in question | 0.73 | Most favourable | 1 | 1 | 0.73
  |  | Tolerable to the situation in question | 0.23 | Non significant | 2 | 0 | 0
  |  | Counter productive to the situation in question | 0.03 | Least favourable | 3 | -1 | -0.03
2 | Previous experience with equipment failures | Experienced with a particular type of failure or Experienced with any other type of ATC equipment failure | 0.83 | Most favourable | 1 | 1 | 0.83
  |  | No experience with ATC equipment failures | 0.17 | Non significant | 2 | 0 | 0
3 | Experience with the system performance (reliance or trust) | Objective attitude toward the system | 0.93 | Non significant | 2 | 0 | 0
  |  | Positive experience with the system (excessive trust) or Negative experience with the system (under-trust) | 0.07 | Least favourable | 3 | -1 | -0.07
4 | Personal factors | Suitable for the recovery process | 0.83 | Most favourable | 1 | 1 | 0.83
  |  | Tolerable for the recovery process | 0.13 | Non significant | 2 | 0 | 0
  |  | Counter productive for the recovery process | 0.03 | Least favourable | 3 | -1 | -0.03
5 | Communication for recovery within team/ATC Centre | Efficient | 0.27 | Most favourable | 1 | 1 | 0.27
  |  | Tolerable | 0.67 | Non significant | 2 | 0 | 0
  |  | Inefficient | 0.07 | Least favourable | 3 | -1 | -0.07

Equipment failure related factors
6 | Complexity of failure type | Single system affected | 0 | Non significant | 2 | 0 | 0
  |  | Multiple systems affected | 1 | Least favourable | 3 | -1 | -1
7 | Time course of failure development | Sudden failure | 1 | Improve | 1 | 1 | 1
  |  | Persistent or latent failure | 0 | Non significant | 2 | 0 | 0
  |  | Gradual degradation of system | 0 | Least favourable | 3 | -1 | 0
8 | Number of workstations/sectors affected | One workstation/one sector or All workstations in one sector | 0 | Non significant | 2 | 0 | 0
  |  | Several workstations/couple of sectors or All workstations/all sectors | 1 | Least favourable | 3 | -1 | -1
9 | Time necessary to recover | Adequate - less than available time | 0.86 | Most favourable | 1 | 1 | 0.86
  |  | Inadequate - in excess of available time | 0.14 | Least favourable | 3 | -1 | -0.14
10 | Existence of recovery procedure | Suitable to the situation in question | 0 | Most favourable | 1 | 1 | 0
  |  | Tolerable to the situation in question | 0 | Non significant | 2 | 0 | 0
  |  | Inappropriate | 1 | Least favourable | 3 | -1 | -1
11 | Duration of failure | Short period of time | 1 | Non significant | 2 | 0 | 0
  |  | Moderate period of time or Substantial period of time | 0 | Least favourable | 3 | -1 | 0

External factors or factors related to working conditions
12 | Adequacy of HMI and operational support | Suitable to the situation in question | 0.5 | Most favourable | 1 | 1 | 0.5
  |  | Tolerable to the situation in question | 0.39 | Non significant | 2 | 0 | 0
  |  | Counter productive to the situation in question | 0.11 | Least favourable | 3 | -1 | -0.11
13 | Ambiguity of information in the working environment | External working environment matches the controller's internal mental model | 1 | Most favourable | 1 | 1 | 1
  |  | External working environment mismatches the controller's internal mental model | 0 | Least favourable | 3 | -1 | 0
16 | Adequacy of organisation | Efficient | 0.4 | Most favourable | 1 | 1 | 0.4
  |  | Tolerable | 0.5 | Non significant | 2 | 0 | 0
  |  | Inefficient | 0.1 | Least favourable | 3 | -1 | -0.1

Airspace related factors
17 | Traffic complexity | Average traffic complexity | 0.35 | Non significant | 2 | 0 | 0
  |  | Extremely high or extremely low traffic complexity | 0.65 | Least favourable | 3 | -1 | -0.65
18 | Airspace characteristics | Adequate (e.g. enroute higher levels) | 0.8 | Most favourable | 1 | 1 | 0.8
  |  | Tolerable | 0.1 | Non significant | 2 | 0 | 0
  |  | Inappropriate (e.g. enroute lower levels or terminal) | 0.1 | Least favourable | 3 | -1 | -0.1
19 | Weather conditions during the recovery process | Improved | 0.83 | Non significant | 2 | 0 | 0
  |  | Deteriorated | 0.17 | Least favourable | 3 | -1 | -0.17
20 | Conflicting issues in the situation (task complexity) | Average complexity of the situation | 0.3 | Non significant | 2 | 0 | 0
  |  | Conflicting, multiple tasks or Extremely low complexity of the situation (may lead to monotony) | 0.7 | Least favourable | 3 | -1 | -0.7
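The last column of the table in Appendix XIV is the product of the level probability (p) and the designator (R). The following is a minimal Python sketch of that arithmetic, using the three levels of RIF 1 as transcribed from the table. The names `TRAINING_LEVELS`, `weighted_levels`, and `total_probability` are illustrative assumptions and do not come from the thesis.

```python
# Levels of RIF 1 (training for recovery), transcribed from the table as
# (descriptor, probability p, designator R). Names here are hypothetical.
TRAINING_LEVELS = [
    ("Suitable", 0.73, 1),
    ("Tolerable", 0.23, 0),
    ("Counter productive", 0.03, -1),
]


def weighted_levels(levels):
    """Return (descriptor, p * R) per level, rounded to two decimals."""
    return [(name, round(p * r, 2)) for name, p, r in levels]


def total_probability(levels):
    """Sum the level probabilities of one RIF, rounded to two decimals."""
    return round(sum(p for _, p, _ in levels), 2)


if __name__ == "__main__":
    for name, value in weighted_levels(TRAINING_LEVELS):
        print(f"{name}: p*R = {value}")
    print("Sum of p:", total_probability(TRAINING_LEVELS))
```

Because the tabulated probabilities are rounded to two decimals, their sum for a given RIF need not be exactly 1 (for RIF 1 it is 0.99).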
Appendix XV Distribution of the recovery context indicator captured in the experiment
The distribution of the recovery context indicator (Ic) obtained from the experimental
results is presented in Figure 1.
[Figure 1: histogram. Horizontal axis: Recovery context indicator (Ic), from -0.088 to 0.112. Vertical axis: Frequency, from 0 to 800.]
Figure 1 Distribution of the recovery context indicator in the experimental investigation
(six RIFs defined through one level)
Based on the shape of the Ic distribution, the data has been fitted with two normal
distributions according to equation 1 (Figure 2). The distribution on the left accounts for
unfavourable recovery contexts whose recovery context indicator takes the average
value of -0.04 (A1=141.4, SD1=0.02). The distribution on the right accounts for
favourable recovery contexts whose recovery context indicator takes an average value
of 0.04 (A2=632.8, SD2=0.04).
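\[
f(x) = A_1 \times e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} + A_2 \times e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}}
     = 141.4 \times e^{-\frac{(x+0.04)^2}{2 \times 0.02^2}} + 632.8 \times e^{-\frac{(x-0.04)^2}{2 \times 0.04^2}}
\qquad (1)
\]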
Figure 2 Fitting of the two normal distributions
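The fitted curve in Appendix XV can be evaluated numerically. The following is a minimal Python sketch, assuming the amplitudes, means, and standard deviations reported in the text (A1=141.4, mean -0.04, SD1=0.02; A2=632.8, mean 0.04, SD2=0.04). The function names `gaussian` and `fitted_frequency` are illustrative and not taken from the thesis.

```python
import math


def gaussian(x, amplitude, mean, sd):
    """Unnormalised normal curve: amplitude * exp(-(x - mean)^2 / (2 sd^2))."""
    return amplitude * math.exp(-((x - mean) ** 2) / (2 * sd ** 2))


def fitted_frequency(x):
    """Sum of the two fitted curves, with parameters as reported in the text."""
    unfavourable = gaussian(x, amplitude=141.4, mean=-0.04, sd=0.02)
    favourable = gaussian(x, amplitude=632.8, mean=0.04, sd=0.04)
    return unfavourable + favourable


if __name__ == "__main__":
    # Evaluate the fitted curve at the two reported mean values.
    for ic in (-0.04, 0.04):
        print(f"Ic = {ic:+.2f}: fitted frequency = {fitted_frequency(ic):.2f}")
```

At each mean the curve is dominated by its own component. The other component still contributes, which is why the value at Ic = -0.04 is noticeably above A1.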