Managing the Operational Safety Case in High-Risk Systems
Michael Salter
Supervisor: Dr Mark Nicholson
Submitted in part fulfilment of the MSc in Safety Critical Systems Engineering Department of Computer Science
University of York
September 2006
Michael Salter MSc in Safety Critical Engineering Department of Computer Science University of York
MSc Dissertation
Managing the Operational Safety Case in High-Risk Systems
This report contains 30611 words in total, less the Appendices, using the Microsoft Word XP word count command.
Last Updated: 15 September 2006
In flying, I have learned that carelessness and overconfidence are usually far more dangerous than deliberately accepted risks.
— Wilbur Wright in a letter to his father, September 1900.
Aviation in itself is not inherently dangerous. But to an even greater degree than the sea, it is terribly unforgiving of any carelessness, incapacity or neglect.
— Captain A. G. Lamplugh, British Aviation Insurance Group, London. Circa early 1930's.
Abstract
The need to develop safety cases for high-risk systems has been identified through reports and investigations of pivotal accidents, including Flixborough and Piper Alpha. The Health and Safety at Work etc. Act 1974 and, more recently, Defence Standards and civil regulations have defined the requirements for safety cases and what constitutes a successful safety management system. However, much of the focus of safety case development has been on the procurement and through-life safety management of equipment, and little has been formulated about the requirements of operational safety cases. This project investigates the definition of an operational safety case in the context of safety critical systems and proposes some safety requirements that are not normally included in the regulatory documents. The project recognizes the significance of identifying the Duty Holder at the appropriate level in an organization, to ensure that the safety case scope captures the operational aspects that may apply across systems of systems. The project also investigates the feasibility of reusing GSN safety case arguments across differing domains in military aerospace and evaluates the benefits of such a process. Finally, the project identifies areas of further work that will provide the impetus to define the safety requirements of operational safety cases in other domains and to provide better assessment of the soft issues in the safety case argument.
Acknowledgements
This project represented a major focus of my life over the past three years and the achievement would not have been possible without the help and generous support of others. I would like to thank the staff of the Department of Computer Science at the University of York, in particular Dr Rob Weaver who set me on the right track, and Dr Mark Nicholson who had the patience and encouragement to lead me surely through the final vital stages. Additionally, I would like to thank ERA Technology for making all this possible. Finally, my thanks go to my wife Patricia for her endless patience, who had to forego the kind of quality time that I would have spent with her had it not been for this opusculum.
Disclaimer
It is recognized that some material reviewed as part of this project is not in the public domain but has been available to the author during the course of his normal work. Where such material is referenced, it is explicitly acknowledged and no content-sensitive material has been used. In particular, the information used in this project has been sanitized to cover only sufficient information on the approach and process to support the general arguments used. All views and comments expressed in this project are those of the author and should not be attributed to any organization.
Table of Contents
Abstract
Acknowledgements
Table of Contents
Table of Figures
Table of Tables
Table of Appendices
1. Introduction
1.1. Background
1.2. The Need for Operational Safety Cases
1.2.1. Flixborough (1974)
1.2.2. Herald of Free Enterprise (1987)
1.2.3. Piper Alpha (1988)
1.2.4. Mid-Air Collision in Southern Germany
1.2.5. Columbia Accident (2002)
1.3. The Aim of the Project
1.4. Content of the Report
2. Literature Review
2.1. Review Objectives
2.2. Operational Safety
2.2.1. International Atomic Energy Agency – Operational Safety
2.3. Operational Safety Case
2.4. Safety Culture
2.4.1. Overconfidence and Complacency
2.4.2. Buncefield accident
2.4.3. Underestimating Risk
2.4.4. Safety Culture Definition
2.5. Relationship between Incidents and Accidents in the Military Environment
2.6. Accident Causal Factors in Offshore Helicopter Accidents
2.6.1. Aircrew Competence
2.6.2. Offshore Installations Safety Case Requirements
2.7. Operational Safety Case – Historical Definition
2.7.1. Military capability
2.8. Responsibility of Duty Holders
2.8.1. The effect of the Duty Holder’s level of responsibility in the organization on the Operational Safety Case
2.8.2. Operational Safety Case – Proposed Definition
2.9. Corporate Manslaughter and Corporate Homicide Bill
2.9.1. Recent Accident Inquiries
2.9.2. Current Legislation
2.9.3. Corporate Manslaughter and Corporate Homicide Bill Proposals
2.9.4. Current Legislation Failure
2.9.5. Sufficiency of the Bill
2.9.6. Obtaining Successful Prosecution
2.10. Review Summary
3. To conduct a critical appraisal and appreciation of the methodology of operational safety case development and the relevant standards that govern them
3.1. Introduction
3.2. Standards Governing Safety Case Development
3.2.1. Defence Safety Regulations
3.2.2. Defence Standard 00-56 Issue 3
3.2.3. JSP 550
3.2.4. JSP 553
3.2.5. Civilian Aerospace Regulations
3.3. The London Underground Railway Safety Case
3.3.1. The London Underground Safety Case – Appraisal
3.3.2. Compliance Summary
3.4. Introduction to GSN
3.4.1. The GSN Notation
3.4.2. Selection of GSN for Operational Safety Case Argument
3.5. Evaluation of Reuse Process from a GSN Military ATC Pattern to develop a System of Systems Operational Safety Case
3.5.1. Safety Case Domain and Scope Extension
3.5.2. Additional Safety Case Complexity for System of Systems
3.5.3. Safety Case Operational Argument
3.6. The Proposed Safety Case Reuse Evaluation
3.6.1. The Warren Safety Argument Reuse Process Background
3.6.2. The Warren Five-Step SAR Process
3.6.3. Conclusion
3.7. Safety Case Reuse Trial using the SAR Process
3.7.1. Step 1 – Recognize Challenge
3.7.2. Step 2 – Express Challenge in GSN Terms
3.7.3. Step 3 – Use GSN to Identify Impact
3.7.4. Step 4 – Decide upon Recovery Action
3.7.5. Step 5 – Recover Identified Damaged Argument
3.8. SAR Process Evaluation
3.8.1. Rigour of Process
3.8.2. Extension of Scope
3.8.3. Degree of Diversity
3.8.4. SAR Summary
3.9. Appraisal conclusions
3.9.1. Textual Safety Case Appraisal
3.9.2. GSN Safety Case Appraisal
3.9.3. Safety Culture Assessment
3.9.4. Safety Argument Reuse Assessment
4. To discuss the development of operational safety cases and risk assessment in high-risk Systems
4.1. The Operational Safety Case Development Process
4.1.1. Review Documented Requirements
4.1.2. Elicit Information
4.1.3. Structure Safety Case in GSN
4.2. Development of an Initial Operational Safety Case in GSN for a Multiple Unit Flying Organization
4.2.1. Top-Level Safety Case Argument
4.2.2. Equipment is acceptably safe to use
4.2.3. Maintenance Standards Argument
4.2.4. Ground Resources Support Operations Safely
4.2.5. Aircrew Resources Operate Safely
4.2.6. Aircrew Training Argument
4.2.7. Aircrew Standardization Argument
4.2.8. Mission Support Functions Sustain Acceptably Safe Operations
4.2.9. Operations from Non Main Operating Bases are Acceptably Safe
4.2.10. Operational Safety Management System
4.2.11. Hazard Log Argument
4.2.12. Hazard Log Contains the Required Elements Argument
4.3. The Operational Loss Model Hazard Identification Process
4.4. Safety Culture Assessment
4.5. Human Factors Occurrences
4.6. Safety Case Development Summary
4.7. Evaluation of GSN Development Process
4.7.1. Elicitation Techniques
4.7.2. Use of Patterns
4.7.3. Further work identified
5. Evaluate the contribution of the process to communicating the effectiveness of a proposed SMS
5.1. Evaluation Objectives
5.2. Operational Safety Case Criteria
5.3. London Underground Safety Case Evaluation
5.3.1. London Underground Evaluation Summary
5.4. Multi-Unit Operational Safety Case Evaluation
5.4.1. Multi-Unit Operational Safety Case Evaluation Summary
5.5. Safety Argument Reuse Evaluation
5.6. Use of Patterns
5.7. Conclusion
6. To summarize the areas of further work that have been identified
6.1. Introduction
6.2. Summary
6.3. Conclusions
6.4. Further Work
6.4.1. Operational Safety Case for UAVs
6.4.2. Adaptation of Operational Safety Case to Commercial Environments
6.4.3. Identify Hazards Through Loss Model Development
6.4.4. Safety Culture and Human Factors Assessment
Table of Figures
Figure 1 RAF Category 4 and 5 Air Accidents 1994-2004
Figure 2 RAF Air Incidents 1994-2004
Figure 3 MoD AMS components of military capability
Figure 4 Safety related decisions applicable to level of authority
Figure 5 Hazard Log Development in a Safety Lifecycle
Figure 6 Kelly Safety Case Change Process
Figure 7 High Level Claim Structure Challenged
Figure 8 Organizational Argument Structure Challenged
Figure 9 Staff Competence Argument Structure Challenged
Figure 10 Staff Support Arguments Challenged
Figure 11 Staff Competence Argument - New Structure
Figure 12 Top Level Safety Case Argument – 1
Figure 13 Equipment Safety Case Argument – 2
Figure 14 Maintenance Standards Argument – 3
Figure 15 Ground Resources Argument – 4
Figure 16 Aircrew Resources Argument – 5
Figure 17 Aircrew Training Argument – 17
Figure 18 Aircrew Standardization Argument – 18
Figure 19 Mission Support Argument – 6
Figure 20 Aircraft Operations from Non MOBs Argument – 14
Figure 21 Safety Management Argument – 7
Figure 22 Hazard Log Established Argument – 15
Figure 23 Hazard Log Required Elements Argument – 20
Figure 24 Top Level of Operational Loss Model
Figure C-1: High Level Claim Structure
Figure C-2: System Argument Structure
Figure C-3: Safety Nets Structure
Figure C-4: Organisational Argument Structure
Figure C-5: Lower Level Organisational Structure
Figure C-6: Process & Procedure Argument Structure
Figure C-7: Change Argument Structure
Figure C-8: Lower Level Procedure Objectives Achieved
Figure C-9: Staff Competence Argument Structure
Figure C-10: Staff Support Arguments
Figure C-11: Functions Structure Arguments
Table of Tables
Table 1 London Underground Safety Case – Compliance with Operational Safety Case Requirements
Table 2 Elements Challenged in the ATC Safety Case
Table 3 Spinal and Contextual Impact
Table of Appendices
Appendix A Developing and Maintaining an Effective Safety Culture
Appendix B London Underground Workplace Risk Assessment – Generic Controls
Appendix C RAF Waddington GSN Argument
1. Introduction
1.1. Background
The need to maintain an acceptably safe environment with safety critical systems has become increasingly important as systems become more complex and the management of risk-based safety cases becomes more convoluted. A century ago there was an almost fatalistic approach to safety, with untimely death or serious injury regarded as an inevitable risk of society. However, after notable disasters such as Flixborough (1974), the Herald of Free Enterprise (1987) and Piper Alpha (1988) there was a compelling argument to improve the management of safety risk. Nevertheless, the attitudes to risk within the civilian and military environments had been out of kilter until the introduction of the Health and Safety at Work etc Act (1974) (HSWA) [Ref 1] and the abolition of Crown Immunity provided the catalyst to adopt similar standards within the two environments. Furthermore, the introduction of the Corporate Manslaughter and Corporate Homicide Bill to the House of Commons on 20th July 2006 [Ref 2] has opened the final chapter in requiring senior managers to recognize their responsibility to reduce the safety risk to the people to whom they owe a duty of care. This will include employees and members of the public who are within the boundary of the system.
The main instrument to provide the corporate governance required has been the adoption of a coherent Safety Management System (SMS). Organizations responsible for systems in safety critical environments have normally developed an integrated SMS with a detailed risk-based Safety Case. This type of Safety Case relies on a well-founded hazard log for much of the continued assessment of safety risk through the life of the system. Equipment safety cases are now well established in safety critical systems, particularly in the UK MoD and the major UK aerospace manufacturing companies. Progress in the US DoD and US industry has been less pronounced but has been influenced by the requirements imposed by UK contracts. The non-military domains in the UK have been led by the nuclear and rail industries; the rail industry has well-defined guidance for the safety management process, published in the ‘Yellow Book’ [Ref 3].
Although the space industry has recognized for some time the need to improve its safety management system, the Columbia accident report [Ref 4] in 2003 highlighted cultural and managerial failings similar to those expressed in the Challenger accident report [Ref 5] in 1986. Specifically, during the Columbia investigation, the Board received several unsolicited comments from NASA personnel regarding pressure to meet the space station completion date of 19 February 2004. Board members at first thought the target date for completing the core space station noteworthy but unrelated to the accident. However, as the investigation continued, the report states,
“It became apparent that the complexity and political mandates surrounding the international space station program, as well as shuttle program management's responses to them, resulted in pressure to meet an increasingly ambitious launch schedule.” [Ref 5]
Much effort has been expended in providing robust arguments that describe the design attributes giving the necessary safety assurance to the duty holder. The maintenance of this design integrity has been achieved by the development of management methods to monitor and assess system changes within the equipment safety case. However, the adoption of the ‘inclusive’ Operational Safety Case is less well developed; large systems and systems of systems in high-risk environments tend to be very complex and subject to many human-induced variables. By contrast, the safety case for the introduction of a simple piece of equipment operated by a small team can be developed relatively easily using the traditional techniques. The definition of the operational safety case is developed in the next Chapter but, to provide an insight into the extended concept, the definition from paragraph 2.8.2 is reproduced here:
A safety case, owned by the duty holder responsible for the operators, that provides a compelling, comprehensible and valid case, that the combination of elements comprising operational capability, when used together in a defined operating environment to achieve agreed objectives, demonstrates that the system is acceptably safe.
Where systems involve humans, who themselves have required years of training costing several million pounds, corporate governance of the safety aspects is very difficult to establish and maintain. The measurement and management of risk in these chaotic environments requires a much deeper understanding of the system’s integration complexity, its operating limitations and the effect of the inconsistent competences of the actors. Nevertheless, the Operational Safety Case demands the same attributes of ‘a well-structured argument supported by relevant evidence’ as the equipment safety case. However, the fluid nature of some high-risk environments and the large number of variables place additional emphasis on understanding and defining the degrees of freedom experienced in the operational domain. Almost every policy, investment or operational decision has an impact on the operational safety case, and a balance has to be struck between the conflicting demands of cost, performance and safety. As there are no pure “safety decisions”, the duty holders and deciders have to find that balance in every decision that they take.
1.2. The Need for Operational Safety Cases
This section introduces the need for special considerations when developing and managing operational safety cases by considering lessons identified from major accidents attributed to operational decisions. Specifically, operational aspects include the external environment and the cultural and human factors that affect management, competence and procedures. These have not always been dealt with convincingly in equipment safety cases.
1.2.1. Flixborough (1974)
On Saturday 1st June 1974, an explosion demolished the chemical plant owned by Nypro (UK) Ltd, killing 28 people, injuring 36 and causing numerous other injuries in the surrounding area. 1821 houses and 167 shops were damaged or destroyed [Ref 6]. Prior to the explosion, on 27th March 1974, it was discovered that a vertical crack in reactor No.5 was leaking cyclohexane. The plant was subsequently shut down for an investigation that identified a serious problem with the reactor. The decision was taken to remove it and install a bypass assembly to connect reactors No.4 and No.6 so that the plant could continue production. However, during the late afternoon of 1st June 1974 the 20-inch bypass system ruptured, which may have been caused by a fire on a nearby 8-inch pipe. This resulted in the escape of a large quantity of cyclohexane, which formed a flammable mixture and subsequently found a source of ignition. This led to a massive vapour cloud explosion that caused extensive damage and started numerous fires that burned for several days. An investigation found failings in the technical measures taken, such as:
• The plant was modified without a full assessment of the potential consequences, and only limited calculations were undertaken on the integrity of the bypass line.
• No pressure testing was carried out on the installed pipe work modification.
• Those concerned with the design, construction and layout of the plant did not consider the potential for a major disaster happening instantaneously.
• The incident happened during start up when critical decisions were made under operational stress.
The last two findings indicate the significance of considering the operational aspects of a complex system. Every eventuality should be assessed and mitigated or eliminated where possible. The final finding, in particular, emphasises the need to identify the competencies of the operators and to ensure that assumptions made about their capabilities when working under stress are warranted. Additionally, initial and refresher training may be required to ensure that assumptions in the safety case remain valid.
1.2.2. Herald of Free Enterprise (1987)
In March 1987 the roll-on roll-off ferry Herald of Free Enterprise capsized in shallow water while departing from Zeebrugge with the outer and inner bow doors fully open [Ref 7]. 193 people were killed on this inherently unstable ship. Blame was attributed to four crewmembers and the Townsend Thoresen management. However, the significant human errors included the following:
• The assistant bosun was directly responsible for closing the doors, but was asleep in his cabin, having just been relieved from maintenance and cleaning duties.
• The bosun noticed that the bow doors were still open, but did not close them, as he did not see that as part of his duties.
• It seems that the captain was to assume that the doors were safely closed unless told otherwise, but it was nobody's particular duty to tell him. The written procedures were unclear.
• The chief officer, responsible for ensuring door closure, testified that he thought he saw the assistant bosun going to close the door. The chief officer was also required to be on the bridge 15 minutes before sailing time.
• ‘The Board of Directors ... did not apply their minds to the question: What orders should be given for the safety of our ships? ... From top to bottom the body corporate was infected with the disease of sloppiness ... The failure on the part of the shore management to give proper and clear directions was a contributory cause of the disaster.’
Examples of this sloppiness included the following significant latent errors. There was no information display (not even a single warning light) to tell the captain if the bow doors were open. Two years earlier, following a similar incident when he had gone to sea with his bow doors open, the captain of a similar vessel owned by the same company had requested that a warning light should be installed. Company management had treated the request with derision. Following the loss of the Herald, bow door warning lights were made mandatory on roll-on roll-off car ferries. The inquiry identified the Board of Directors as having some responsibility for the accident, but a corporate conviction could not be secured because the law required an individual person to be convicted of manslaughter. The design of the ship was such that it was top heavy, and the operational significance of putting to sea with the bow doors open was not considered.
1.2.3. Piper Alpha (1988)
In July 1988 an explosion followed by a fire on the Piper Alpha oil rig in the North Sea killed 165 of the 226 people on board; a further 2 rescue workers were killed. Lord Cullen carried out the inquiry into the accident [Ref 8]. This was a very thorough review and the findings were published in 1990. He concluded that there was:
• Inadequate risk assessment.
• Inaction concerning known deficiencies in the system.
• Lack of enforcement of the permit to work system.
• No formal training in the permit to work system.
• Inadequate auditing of the permit to work system.
• Inadequate senior management commitment to Occupational Safety and Health (OSH) management.
Most significant of the 106 recommendations was the requirement for a safety case: “The operator should be required by regulation to submit to the regulatory body a safety case in respect of each of its installations.” [Ref 8]
“The safety case should demonstrate that certain objectives have been met, including: Safety management system of the installation is adequate to ensure that the design and operation of the installation and its equipment are safe. That the potential hazards of the installation and the risks to personnel have been identified and appropriate controls identified. That adequate provision is made for ensuring in the event of a major emergency affecting the installation:
• A temporary safe refuge for personnel in the installation, and
• Their safe and full evacuation, escape and rescue.” [Ref 8]
The inclusion of the safety case and the type of activities that should be included was significant, and this has led to the improvement and refinement of the safety case development process. Storey [Ref 9] states that within many safety critical or safety-related industries, the safety systems will require certification by a regulating authority before entering operations. He defines certification as follows: ‘Certification is the process of issuing a certificate to indicate conformance with a standard, a set of guidance or some similar document.’ As part of the certification process, the operator may be required to produce a safety case.
Safety cases are a means of justification that the system or equipment is safe to be deployed. These safety cases describe the design and assessment techniques used to develop the system, and the results of the assessment are used to argue that the system is sufficiently safe to be operated. They provide the arguments and evidence that all potential hazards have been identified and that the appropriate steps have been taken to deal with them. Experience shows that the safety case cannot prove that a system is absolutely safe, because such proof would require an inordinate amount of resource in terms of cost or time to obtain. Storey states that the argument within the safety case is normally based on engineering judgement rather than strict formal logic: ‘Determining an adequate level of safety inevitably involves personal judgement and opinion. This is complicated by the fact that safety is an emotive subject and people’s perceptions of it are varied and illogical’.
1.2.4. Mid-Air Collision in Southern Germany
In July 2002, a mid-air collision at Flight Level (FL) 350 occurred between a Tupolev TU154M and a Boeing B757-200 over southern Germany under control from Air Traffic Control (ATC) Centre (ACC) Zurich, resulting in the loss of all 71 crew and passengers.
Both aircraft were approved for Reduced Vertical Separation Minimum (RVSM) operations and both were using serviceable Traffic Collision Avoidance Systems (TCAS). The Accident Investigation Report [Ref 10] identified operational safety factors related to safety culture, procedures and training that contributed to the accident as described below.
Safety culture
• Zurich ACC had reissued their Safety Management policy in October 2001 in which it was stressed that a safety culture was to be evolved in which managers and employees were to be aware of their critical importance for safe operations. Arising from the incomplete implementation of this policy, the management and quality assurance of the air navigation service company did not ensure that, during the night, controllers continuously staffed all open workstations. Also, the company tolerated for years that during times of low traffic at night only one controller worked and the other retired to rest. As a result, several related events, such as a lack of information on known equipment shortfalls during the night of the accident, lack of clarity of the roles of those on shift, and one controller working the 2 aircraft that collided plus another aircraft on 2 separate frequencies at 2 workstations, contributed to the accident.
Procedures and training
• The accident investigation report noted that the flight operations manuals for both aircraft did contain provisions for operating TCAS, but did not contain detailed operating procedures for the flight crew. Moreover, in the operations manual for the TU154M the TCAS description wording was such that ATC had the highest priority in collision avoidance.
• The separation infringement was not noticed by ATC in time.
• The TU154M crew followed the ATC instructions to descend and continued to do so even after TCAS advised them to climb. This manoeuvre was performed contrary to the generated TCAS Resolution Advisory (RA). The Boeing 757-200 crew responded to the descend RA. The TU154M crew's decision to continue the descent did not take into account that the other aircraft involved would very likely receive a complementary RA at the same time as their own. The TU154M crew queried the continuation of the descent with ATC but received no reply.
• The scope of training for flight crew and ATC operators was a contributory cause of the accident.
1.2.5. Columbia Accident (2003)
The Columbia accident exposed weaknesses that might be considered unacceptable considering the funding and intellectual might available to the space industry. Expectations had been raised that assumed that the industry would have the best understanding of designing safe systems. However, the National Aeronautics and Space Administration (NASA) safety culture was not all it seemed, as evinced by Daniel Goldin (NASA Administrator, April 1, 1992 – November 17, 2001) in his statement on 29 July 1996: “Since I came to NASA [1992], we have spent billions of dollars on shuttle upgrades without knowing how much they improve safety. I want a tool to help decisions on risk”.
The space industry continued to review its safety assessment process but, more recently, the Columbia accident on 1 February 2003 highlighted weaknesses in the safety management process. The comments made by Goldin were echoed in the August 2003 Columbia report [Ref 4] with the telling recommendation to NASA: “Safety will be the core of the program and that the safety program, a new safety program, will have accountability”. This was emphasized by the quote from Richard Feynman, who was on the Challenger Accident Investigation Board and who, to illustrate the organizational problems of safety awareness, attached a personal appendix to the Report:
"It appears that there are enormous differences of opinion as to the probability of a failure with loss of vehicle and of human life. The estimates range from roughly 1 in 100 to 1 in 100,000. The higher figures come from the working engineers, and the very low figures from management. What are the causes and consequences of this lack of agreement? … We could properly ask, "What is the cause of management's fantastic faith in the machinery?"
From these notable accidents throughout several decades, it would seem that while the technology and understanding of equipment safety has improved, the
decision making process by managers and operators while the equipment is in service has not. The methodology employed to gather and interpret the risk data provides the basis for managerial oversight. However, the manner in which this information is interpreted dictates the safety culture of an organization.
1.3. The Aim of the Project
Developing safety cases to provide visibility of the safety argument strategy is well understood in some high-risk systems, especially where the equipment safety is the focus. However, the development of safety cases to include the operational safety argument, especially for a system of systems as experienced in safety critical domains such as the nuclear and aerospace environment, is less well developed. The aim of the project is to provide guidance to:
• Extract and summarise best practice from other domains.
• Investigate issues of particular relevance to military aircraft operations.
• Develop a process that may be used for similar operational safety cases.
• Evaluate the contribution of the process to communicating the effectiveness of a proposed SMS.
1.4. Content of the Report
This report will review the applicable standards, regulations, guidance and academic papers to identify the safety case requirements. The search will extend to operational safety cases in industry and identify the best practice definition of operational in the context of large high-risk systems. Chapter 2 concludes with a proposed definition of operational safety case and provides guidance on the corporate manslaughter legislation with reference to safety management. Chapter 3 presents the domain specific regulations for the defence and air environment and investigates the feasibility of developing a safety case using a safety argument reuse process. It continues with the evaluation of a textual safety case against the criteria for the operational safety case derived in Chapter 2. Chapter 4 develops a large-scale operational safety case for a high-risk system and the evaluation of the process is provided in Chapter 5. A list of abbreviations and references follows.
Appendix A is an article from the DASC Journal about the Typhoon aircraft and discusses the importance of safety culture when introducing new systems. Appendix B is a London Underground Risk Assessment Form and Appendix C is the RAF Waddington ATC GSN Safety Argument used in the safety case reuse evaluation trial.
Literature Review
2. Literature Review
2.1. Review Objectives
The key objectives of this chapter are to assess the requirements for Operational Safety Cases and place them in perspective through:
• Defining what is meant by operational safety and safety management systems.
• Reviewing why safety cases are required.
• Identifying the applicable standards to be applied to operations in the Defence, civilian aerospace and other domains.
• Identifying other publications including papers associated with operational safety cases.
• Identifying the responsibilities associated with operational safety cases.
The information required to support the review objectives has been obtained through literature search, internet search and experiential sources.
2.2. Operational Safety
In order to define the concept of operational safety it has been necessary to take a consensus of definitions in a variety of domains. The term operational safety case is used by some organizations, such as those in the rail, North Sea oil and nuclear industries, in a way that seems consistent but without a definition, and it has been difficult to judge the additional requirements expected. A trawl of operational safety cases in international industries was made as far as possible in the public domain, but the term is not widely used. Therefore, as a general starting point, the project will work from reference text.
A reference to operations can be found in Donald Waters' book, Operations Management [Ref 11]. He states that every organization makes a product. These products may be tangible (cars, chemicals etc.) or intangible: services such as insurance or, as in aircraft operations, providing offensive capability or simply moving goods and personnel safely. At the heart of the organization is the set of operations that make this product. Waters goes on to state that central to every organization are the activities that make these products. These activities are the ‘operations’. Put simply, the operations describe what the organization does. Operations at IBM make computers; operations in civil airlines move people and goods from one place to another; military operations deliver a capability.
In principle, operations are very simple. Organizations take a variety of inputs (such as raw materials, money, people and equipment), and perform operations (such as manufacturing, serving and training) to give outputs (which include goods and services). Waters states that operations management involves the following key activities:
• Planning – to establish goals, the means of achieving these goals, and timescales.
• Organizing – structuring the organization in the best way to achieve its goals.
• Staffing – making sure there are suitable people to do all the jobs.
• Directing – coaching and guiding employees.
• Motivating – empowering and encouraging employees to do their jobs well.
• Allocating – assigning resources to specific jobs.
• Monitoring – to check progress towards the goals.
• Controlling – to make sure the organization keeps moving towards its goals.
• Informing – keeping everyone informed of progress.
Hence, it can be seen that while in principle operations are simple, in reality they are complex, and involve not just systems (equipment) as support tools, but also people and procedures.
The term safety management can also be open to interpretation. For example, if an organization has safety as one of its core policies or missions, then does safety management cover all activities that contribute to that policy throughout the organization? Or is it confined to those actual activities involved in setting up the safety management policies and procedures? This chapter will scrutinize the regulatory resources from different domains to show how safety management systems have been developed for their individual needs.
Civil Aviation Publication (CAP) 712 [Ref 12] describes safety management as ‘the systematic management of the risks associated with flight operations, related ground operations and aircraft engineering or maintenance activities to achieve high levels of safety performance’. It further defines a ‘Safety Management System’ as an explicit element of the corporate management responsibility which sets out a company’s safety policy and defines how it intends to manage safety as an integral part of its overall business.
In order to provide a comparison with more familiar concepts, CAP 712 compares a Safety Management System with a financial management system as a method of systematically managing a vital business function. This is a useful analogy, as financial targets are set, budgets are prepared, levels of authority are established and so on. The formalities associated with a financial management system include ‘checks and balances’. The whole system includes a monitoring element so that corrections can be made if performance falls short of set targets. The outputs from a financial management system are usually felt across the company. Risks are still taken but the finance procedures should ensure that there are no ‘business surprises’. If there are, it can be disastrous for a small company. For the larger company, unwelcome media attention usually follows an unexpected loss. An aircraft accident is also ‘an unexpected loss’ and not one that any company in the civil aviation industry wishes to suffer. The concept of ‘Loss’ is developed further in the following chapters when developing loss models associated with hazards.
It should be apparent that the management of safety must attract at least the same focus as that of finance. The adoption of an effective Safety Management System (SMS) will provide this. A developed SMS provides a transparent, documented system to manage safety and deserves at least the same degree of care that would be applied to a financial management system.
2.2.1. International Atomic Energy Agency – Operational Safety
Operational Safety has been a recognized term in the nuclear field since 1982 when the Operational Safety Review Team (OSART) was set up by the International Atomic Energy Agency (IAEA) [Ref 13]. The areas covered by the
OSART are management, organization and administration; training and qualification; operations; maintenance; technical support; operating experience; radiation protection; chemistry; and emergency planning and preparedness. OSARTs focus on the safety and reliability of plant operation. They review the operation of the plant and the performance of the plant’s management and staff and, in particular, the factors affecting the management of safety and the performance of personnel, such as organizational structure, roles and responsibilities, management goals and the qualification of personnel. Safety culture in the plant is also reviewed as an integral part of each review area and summarized to strengthen the team leader’s overview of safety performance. The review takes place over a period of 3 weeks and it is expected that an impartial evaluation of the operational performance of the nuclear plant would be produced over this time. The OSART is carried out with due warning and it might be expected that the plant is at a high degree of readiness prior to the OSART. However, the independent observers who are often taken from similar plants in other countries will be highly qualified and experienced in the domain and be able to ascertain when operations are not the normal routine. A particular strength of this form of assessment is the cross fertilization of best practice across all the member countries. This form of assessment is similar to the Tactical Evaluation of military operations within the NATO member states. While the OSART investigates the technical safety of the nuclear plants and thus verifies some aspects of the equipment safety case, it is important to note that it also assesses the culture and management processes. It is these soft issues and other external aspects that are often not included in safety cases. 
The Project considers that it is the recognition of the effect that these and human factors have on the safety of systems (and systems of systems) that informs the operational safety case.
2.3. Operational Safety Case
In his report ‘The Eurofighter “Operational” Safety Case’ [Ref 14], Henery discusses the requirements of an operational safety case but confines himself to the equipment facets and investigates the need for quick assessments when malfunctions of the aircraft systems occur. He is particularly interested in the reassessment of the safety case to ascertain if the aircraft is fit ‘enough’ to fly for a specified mission. The MSc course notes by J McDermid and T Kelly [Ref 15] state that the Operational Safety Case presents the argument that a system is acceptably safe in a given context. It is considered that in many instances, it has been difficult to define the ‘given context’ in sufficient detail to allow the risks to be managed sufficiently.
The operational safety cases of aircraft systems define the environment by use of flight envelopes and the Statement of Operating Intent and Usage (SOIU). These are further articulated through the aircrew manual and, in the military context, the Release to Service (RTS). The ‘given context’ or environment that provides the relevance for the argument is defined for the basic operation of the equipment. However, many operational factors are generally omitted from the safety case. The reason for the omissions is sometimes that the concept of operations is still being formulated at the time the equipment is introduced into service, but there are also omissions because the complete environment had not been included in the statement of requirements. Therefore, at best imaginative assumptions are implied; at worst the complete operational context is ignored. Aspects to be considered for systems include
cultural issues combined with personnel performance. In military systems, threats from hostile factions such as the enemy’s order of battle and the effectiveness of offensive and defensive equipment should be included. Sometimes the pressure to achieve a certain level of operational capability, at a managerial level and political level, causes the focus on safety to become blurred. Civilian aircraft or other transport systems may now include in their safety case: the terrorist, saboteur, deranged passenger, cultural and a variety of human factor issues affecting the crew and the transport management team. The external pressure from stakeholders in profit making organizations and the, often self imposed, obligations and deadlines mandated on the operations team of all organizations can change the mood and concept of the ‘given context’ from that which was envisaged in the original perception of operations employed to develop the original safety case.
2.4. Safety Culture
Many of the human factors attributes considered above define the safety culture of the organization, and Leveson, in her book Safeware [Ref 16], states that major accidents often stem from flaws in the safety culture. She defines safety culture as ‘the general attitude and approach to safety reflected by those who participate in that industry: management, workers, and government regulators’. She identifies causes of accidents emanating from flaws in this culture, specifically: overconfidence and complacency, a disregard or low priority for safety, or flawed resolution of conflicting goals.
2.4.1. Overconfidence and Complacency
In the accident summaries in Chapter 1 above, the safety culture is often called into question and was highlighted in the Challenger accident where two related causes were identified: complacency and a belief that less safety, reliability and quality-assurance activity was required during ‘routine’ Shuttle operations.
In an operational environment, it is often assumed by the management team that the very low figures of risk for individual systems or operating conditions are independent. Therefore, when assessing the system safety risk it appears that the overall risk is very low, whereas in practice these risks are often dependent. An example in flight operations would be the probability of food poisoning affecting the pilot and the pilot's ability to sense and avoid an aircraft on a collision path.
Another facet of overconfidence is experienced when there is an over-reliance on redundancy. This is often viewed from the concept of single point failure where, although the majority of a system is duplex or triplex, the voting system may be flawed. However, this may also be attributed to a human factor where the training was flawed and the operator or crew reacted incorrectly given one particular set of conditions.
Some difficult risk assessments are underestimated, especially when there is no quantified hazard analysis available. In unusual environments, such as a new area of operations or unfamiliar operating rules, there is a tendency to make an assessment using the term ‘operational experience’ as appropriate evidence that sufficient risk assessment had been carried out to ensure that the risk was now as low as reasonably practicable (ALARP). Numerical assessments only measure
what they can measure and not necessarily what needs to be measured. The immeasurable factors such as the management errors or, in a military context, operational capability of the opposition are ignored, even though they may have a greater influence on safety than those that are measurable. The management team often believe that the numbers actually have some relation to the real risk of accidents, rather than being a way to evaluate specific aspects of the system.
Leveson states that in major accidents, she has identified that the important causal factors in terms of accident prevention are often the immeasurable ones. She cites the Bhopal accident [Ref 17] where methyl isocyanate (MIC), a highly reactive, toxic, volatile, flammable and unstable chemical, was released into the atmosphere. The accident happened after the operations and maintenance staff numbers were cut by half to reduce operating costs, and many skilled workers left for more secure jobs. The accident involved such immeasurable factors as: the refrigeration being disconnected, an operator ignoring or not believing a value on a gauge, operators putting off investigating the smell of MIC until after tea break, the vent scrubbers being turned off, the insufficient design and capacity of the scrubbers and flare tower, and the failure to inform the local community about what to do in the case of emergency.
2.4.2. Buncefield accident
Some of the factors at Bhopal are similar to the circumstances highlighted recently in the HSE Third Progress Report [Ref 18] into the Buncefield accident at Hemel Hempstead. The accident occurred when an unleaded fuel tank on the Hertfordshire Oil Storage Ltd (HOSL) West site overflowed at around 05.30 hours on 11 December 2005 while being filled at a high rate. Local eyewitnesses had smelt the fuel vapour and had seen the large mist cloud forming around the site.
The fuel vapour was eventually ignited at 06.00 hours, causing a series of large explosions and considerable damage to the local area but no loss of life.
From 19.00 on 10 December, Tank 912 was being filled and at approximately 03.00, the level gauge for the tank recorded an unchanged reading. However, filling of Tank 912 continued at a rate of around 550 m³/hour. Calculations show that at around 05.20, Tank 912 would have been completely full and starting to overflow. Evidence suggests that the protection system that should have automatically closed valves to prevent any more filling did not operate. From 05.20 onwards, continued pumping caused fuel to cascade down the side of the tank and through the air, leading to the rapid formation of a rich fuel/air mixture that collected in bund A. At 05.38, CCTV footage shows vapour from the escaped fuel starting to flow out of the northwest corner of bund A towards the west. The vapour cloud was about 1 m deep. At 05.46, the vapour cloud had thickened to about 2 m deep and was flowing out of bund A in all directions. Between 05.50 and 06.00, the pumping rate down the pipeline to Tank 912 gradually rose to around 890 m³/hour.
The sequence of events is important as it indicates a lack of reaction to the failure of the level gauge. Also, the automatic shut down system failed and the alarm system did not provide the visual and audible warnings to the operators or to the British Pipeline Agency (BPA) control centre at Kingsbury, Warwickshire. The smell of fuel vapour and the visibility of the vapour cloud would have given about 40 minutes warning, but it would be pure conjecture to expect that any action by this time would have prevented the accident.
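The timings quoted above can be checked with some simple arithmetic. The following is a minimal sketch only: it uses the clock times and pumping rates given in the text, and the overflow estimate additionally assumes (hypothetically) that the rate stayed at 550 m³/hour until 05.50, then rose linearly to 890 m³/hour by 06.00, and that all fuel pumped after 05.20 escaped from the tank. It is not a reconstruction of the investigators' own calculations.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> float:
    """Minutes elapsed between two same-day clock times given as HH:MM."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


FILL_RATE = 550.0   # m3/hour, "around 550" as quoted in the text
FINAL_RATE = 890.0  # m3/hour, rate reached by about 06:00

# Period during which filling continued with the level gauge stuck.
unmonitored_minutes = minutes_between("03:00", "05:20")
unmonitored_volume = FILL_RATE * unmonitored_minutes / 60

# Warning window between the calculated start of overflow and ignition.
warning_minutes = minutes_between("05:20", "06:00")

# Rough overflow estimate (ASSUMPTION: constant rate, then a linear ramp).
steady_volume = FILL_RATE * minutes_between("05:20", "05:50") / 60
ramp_volume = (FILL_RATE + FINAL_RATE) / 2 * minutes_between("05:50", "06:00") / 60
overflow_estimate = steady_volume + ramp_volume

print(f"Filled with gauge stuck: {unmonitored_minutes:.0f} min, "
      f"about {unmonitored_volume:.0f} m3")
print(f"Warning window: {warning_minutes:.0f} min")
print(f"Estimated overflow before ignition: about {overflow_estimate:.0f} m3")
```

Under these assumptions the 40 minute warning period quoted in the text corresponds to the interval between the calculated overflow time and ignition, while the gauge had already been static for well over two hours before the tank became full.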
2.4.3. Underestimating Risk
Leveson has identified that another consequence of complacency is the unsystematic consideration of serious risks. It is often attractive for an organization with limited funds to identify a few quick wins by controlling the most likely hazards without prioritizing against a risk analysis. Therefore, hazards with high severity and (assumed) low probability are dismissed as not being worth resource investment. When the accident eventually occurs it is often found that the cause had been dismissed as incredible.
Within the military aircraft environment, the lifetime of a platform is often more than 30 years. Before safety management was practised in the present regulated manner, it was often considered that risk decreased over time. There was a belief that a system must be safe because it has operated without an accident for many years. In reality, risk may decrease, remain constant, or even increase over time. However, there are many reasons for the change of risk over time. Sometimes the lack of an accident encourages less cautious approaches to operation. On the other hand, it is possible that the ‘given context’ of the safety case has changed and the role required of the platform has become more benign or the operating envelope has been reduced to save fatigue life or reduce the maintenance costs. Trade-offs between safety and other factors sometimes run the other way when increased performance is required and some maintenance levels are omitted, as in the case of the recent leaning programme. Leveson postulates that it is possible that as error rates in a system decrease and the reliability increases, the risk of accidents may actually be increasing.
2.4.4. Safety Culture Definition
Safety culture is a sub-set of the overall culture of an organisation and it follows that the safety performance of organizations is greatly influenced by aspects of management that have not traditionally been seen as part of safety.
The Health and Safety Commission (HSC) have provided guidance on the definition of Safety Culture but, having defined it, it is important to be able to measure it so that it can be managed. The definition of safety culture suggested by the HSC is: ‘The safety culture of an organisation is the product of the individual and group values, attitudes, competencies and patterns of behaviour that determine the commitment to, and the style and proficiency of, an organization’s health and safety programmes. Organizations with a positive safety culture are characterized by communications founded on mutual trust, by shared perceptions of the importance of safety, and by confidence in the efficacy of preventative measures.’
A positive safety culture implies that the whole is more than the sum of the parts. The different aspects interact together to give added effect in a collective commitment. In a negative safety culture the opposite is the case, with the commitment of some individuals strangled by the cynicism of others. The HSC state that certain factors appear to characterize organizations with a positive safety culture. These factors include:
• The importance of leadership and the commitment of the chief executive.
• The executive safety role of line management.
• The involvement of all employees.
• Effective communications and commonly understood and agreed goals.
• Good organizational learning and responsiveness to change.
• Manifest attention to workplace safety and health.
• A questioning attitude and a rigorous and prudent approach by all individuals.
Furthermore, the HSC Advisory Committee on the Safety of Nuclear Installations (ACSNI) report [Ref 19] contains a prompt-list of indicators of positive safety culture intended to assist organizations in reviewing their own culture. Improving safety culture is something which must be seen as a long term and systematic process, based on an initial assessment of the existing safety culture, determining priorities for change and the actions necessary to effect the change, and then going on to review progress before repeating the process indefinitely. This process will be required in the operational safety case.
2.5. Relationship between Incidents and Accidents in the Military Environment
The final aspect of complacency is discounting the warning signs; however, it is evident from Joint Service Publication (JSP) 551 [Ref 20] and Aerospace Recommended Practice (ARP) 5510 [Ref 21] that incident recording and action in the military and civil aviation environments is well understood and monitored. As accidents are usually too infrequent to be statistically significant, risk assessment based on accident occurrence alone is flawed. If the number and type of incidents (rather than accidents) are analysed, it is found that they do not correlate with the causal trends for typical accidents. This was recorded in the recent published data from the MoD Defence Aviation Safety Centre (DASC) [Ref 22] as shown in Figure 1 and Figure 2 below:
Figure 1 RAF Category 4 and 5 Air Accidents 1994-2004 [pie chart: Human Factors (Aircrew) 57%; Technical Fault 21%; Other 9%; Natural and Operating Risk 6%; Human Factors (Non-Aircrew) 4%; Not Positively Determined 3%]
Figure 2 RAF Air Incidents 1994-2004
It is evident from the RAF data that the most common causal factor in Category 4 and 5 accidents (defined as the loss of or major damage to an aircraft) is human factors (aircrew), at 57%. By contrast, the major causal factor for incidents (defined as those occurrences where no major damage or injury was evident) was technical fault, at 57%, and human factors (aircrew) accounted for only 5% of the total. This anomaly could be explained by a poor safety culture in which the reporting of human factors (aircrew) incidents was discouraged or perceived to be career limiting. Several campaigns have been introduced to improve the reporting of human factors incidents, but the stigma of the blame culture appears to override all the good intentions of the safety management organization. This loss of occurrence data is limiting the effectiveness of the operational safety case and has been recognized by the Typhoon Integrated Project Team in their article ‘Developing and Maintaining an Effective Safety Culture’ [Ref 23], published in the Defence Aviation Safety Centre Journal 2005. For reference, the article has been reproduced at Appendix A. However, this article, having recognized the need to change the safety culture, has focused on the technical areas and has skated over the operating and operational aspects. The difficulty in implementing a safety culture change is not underestimated and is highlighted in the text: ‘A safety culture does not happen overnight or arrive at a set point in the project. Rather it achieves the required level and then seeks to evolve throughout the project. This adds value and proactively identifies areas of risk and initiates the mitigation or removal of risk by providing early warnings. The most obvious example of this would be the analysis of safety data across a number of key subsystems. Each system is looked at as to how the pattern of minor occurrence may become apparent’.
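The disparity between the two causal distributions can be tabulated directly. The following is a minimal sketch that uses only the percentages shown in Figure 1 and Figure 2; the names ACCIDENT_SHARE, INCIDENT_SHARE, share_gap and ranked_gaps are illustrative assumptions and are not part of the DASC data or analysis.

```python
# Percentages transcribed from Figure 1 (accidents) and Figure 2 (incidents),
# RAF data 1994-2004. All names below are illustrative, not from the source.
ACCIDENT_SHARE = {
    "Human Factors (Aircrew)": 57,
    "Technical Fault": 21,
    "Other": 9,
    "Natural and Operating Risk": 6,
    "Human Factors (Non-Aircrew)": 4,
    "Not Positively Determined": 3,
}
INCIDENT_SHARE = {
    "Technical Fault": 57,
    "Natural and Operating Risk": 19,
    "Not Positively Determined": 11,
    "Human Factors (Aircrew)": 5,
    "Other": 5,
    "Human Factors (Non-Aircrew)": 3,
}


def share_gap(accidents, incidents):
    """Return {cause: accident share minus incident share}, in percentage points."""
    return {cause: accidents[cause] - incidents[cause] for cause in accidents}


def ranked_gaps(accidents, incidents):
    """List (cause, gap) pairs, largest positive gap first.

    A large positive gap means the cause features far more often in accidents
    than in reported incidents.
    """
    gaps = share_gap(accidents, incidents)
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for cause, gap in ranked_gaps(ACCIDENT_SHARE, INCIDENT_SHARE):
        print(f"{cause:<30} {gap:+d} percentage points")
```

A large positive gap for a cause is consistent with that cause being under-reported at incident level, but it does not prove under-reporting on its own; the two figures describe shares of different populations.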
[Figure 2 pie chart segments: Technical Fault 57%; Natural and Operating Risk 19%; Not Positively Determined 11%; Human Factors (Aircrew) 5%; Other 5%; Human Factors (Non-Aircrew) 3%]
However, the nature of the operational system of systems is that the personnel are always changing and the time taken to change attitudes is too long to achieve a benefit by concentrating on just one of the systems. Therefore, the solution is to provide everyone entering the domain with initial training to foster a
common approach. This policy has to be driven from the most senior level of management: the Duty Holder.
2.6. Accident Causal Factors in Offshore Helicopter Accidents
In his report on Helicopter Safety Offshore [Ref 24], G Morrison of the HSE states that helicopter travel to and from offshore installations generates one of the main sources of risk for offshore workers. This is more significant on modern installations, where equipment risks are low and helicopter transport may be the dominant risk. In the past, helicopter accidents in the North Sea, for instance, were too common and largely due to technical failure. The Civil Aviation Authority (CAA) recognized that North Sea helicopter accident rates were an order of magnitude greater than those of fixed-wing aircraft and recommended the use of Health and Usage Monitoring Systems (HUMS) in order to redress this imbalance. The Chinook accident in 1986 was the driver for the voluntary introduction of HUMS by oil companies in the UK sector. This measure is considered by many experts to be the most significant advance in aviation safety in recent years. The incidence of accidents is now very low and risks to passengers are comparable with flights in similar fixed-wing aircraft. Morrison has calculated that the five-year moving average of fatal accidents has reduced from 0.8 per 100,000 flights before 1985 to less than 0.2 today. The number of reported incidents has similarly reduced over the period. However, he recognizes that this improvement cannot continue at the same rate and, just as Leveson reflected in paragraph 2.4.3 above, Morrison has noted: ‘Although the overall trend is downward, the extrapolation into the future of relatively low accident figures from a small sample period should not be taken for granted. There are currently considerable economic and other pressures on installation and helicopter operators and their staff that could eventually have an effect on safety’.
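For readers unfamiliar with the measure, a five-year moving average accident rate can be sketched as follows. This is a minimal illustration only: it assumes a pooled rate (total fatal accidents divided by total flights within each five-year window), which may differ from the method Morrison used, and the input figures are placeholders rather than data from [Ref 24]. The function names are hypothetical.

```python
def rate_per_100k(fatal_accidents, flights):
    """Fatal accidents per 100,000 flights."""
    if flights <= 0:
        raise ValueError("flights must be positive")
    return 100_000 * fatal_accidents / flights


def moving_average_rate(accidents_by_year, flights_by_year, window=5):
    """Pooled accident rate for each trailing window of `window` years.

    Assumption: each window's rate is total accidents / total flights in that
    window, not the mean of the individual annual rates.
    """
    if len(accidents_by_year) != len(flights_by_year):
        raise ValueError("both series must cover the same years")
    if window <= 0:
        raise ValueError("window must be positive")
    rates = []
    for end in range(window, len(accidents_by_year) + 1):
        accidents = sum(accidents_by_year[end - window:end])
        flights = sum(flights_by_year[end - window:end])
        rates.append(rate_per_100k(accidents, flights))
    return rates


if __name__ == "__main__":
    # Placeholder inputs for illustration only; not taken from any report.
    accidents = [2, 1, 0, 1, 1, 0]
    flights = [200_000] * 6
    for rate in moving_average_rate(accidents, flights):
        print(f"{rate:.2f} fatal accidents per 100,000 flights")
```

Because each window contains only a handful of events, a single accident moves the rate noticeably, which is the small-sample caution Morrison raises in the quotation above.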
Morrison has divided causes of accidents between aircraft mechanical failure and human factors, usually pilot error. Historically, most fatalities among passengers and crew have been from drowning, following mechanical failure that led to the aircraft ditching in the sea. In recent years, aircraft systems have become more reliable and a greater proportion of accidents can now be attributed to human error. Nearly all accidents can be traced back to show a human factors contribution at the operational, maintenance, manufacturing or design stage. It is the purpose of this Project to investigate how these factors can be reflected in the safety case. Morrison believes that future improvements in helicopter safety offshore are most likely to be achieved through continuous improvements to:
• The design of helicopters by aircraft manufacturers.
• Increased use of helicopter onboard monitoring systems such as HUMS.
• Improved maintenance of aircraft.
• Influencing human factors that affect the behaviour of aircrew, helideck crew, radio operators, logistics staff and others.
• Designing and operating helidecks to take full account of operations on an installation.
2.6.1. Aircrew Competence
Morrison emphasizes the skill of pilots as a significant factor to be reflected in the safety case. The competence of professional pilots flying to offshore installations
is ensured by qualifications, training and experience and is monitored by the aviation regulators. However, pilots sometimes fly in very arduous conditions (wind, rain, and low visibility at night or in fog) to land on a relatively small landing area offshore. The helideck may be moving significantly, as on a floating platform, and the landing may have to be made during an emergency evacuation and in bad weather. All pilots flying in Europe are required to be highly trained. Many have been flying offshore for years and are highly experienced. Several installation Duty Holders specify high levels of experience in their service contracts with the helicopter operators. The CAA has reported in its Helicopter Airworthiness Review Panel (HARP) report, CAP 491 [Ref 25], that pilot error has been a common cause of helicopter accidents offshore and that ‘most of the human error accidents were operational in character, such as flying into obstructions, flying in meteorological conditions for which the pilot was not qualified or pilot disorientation’. Human factors affecting pilot performance and judgement are probably now the major hazard to offshore flights. The CAA has now mandated improved pilot training, especially in the area of crew cooperation, and all helicopter operators now give Crew Resource Management Training (CRMT). This was also recognized in the MoD, and CRMT is now an important part of multi-crew aircraft pilot training, with courses provided by the DASC at RAF Bentley Priory. CAP 491 [Ref 25] also examined duty times achieved in North Sea helicopter operations and noted that ‘Stress and fatigue were endemic to the pilots’ way of life’. This study concluded that pilots felt under increasing pressure to fly for commercial reasons even in difficult conditions. ‘There is a high pilot workload associated with the take-off and landing phases of the hundreds of offshore flights that take place every week. This workload is particularly high in conditions of low visibility and adverse weather’.
2.6.2. Offshore Installations Safety Case Requirements
When the Cullen inquiry findings were published [Ref 8], there was considerable emphasis on the permit to work process and this was actioned very quickly in offshore working practices. However, the safety case should encompass a wider scope and include the risk associated with the transfer of workers and visitors. Similar to the practices required by the Yellow Book [Ref 3] for the railways, offshore safety cases now require inspection and acceptance by a third party. In the offshore industry, it is the Health and Safety Executive (HSE) that accepts the safety case in accordance with The Offshore Installations (Safety Representatives and Safety Committees) Regulations 1989 [Ref 26]. These regulations have been progressively revised and the latest version, 2005 [Ref 27], came into force in April 2006. The regulations require the duty holder to provide installation safety representatives with a written summary of the main features of the case (and let them see a full copy if they wish). The Regulations now include the detail required for helicopter emergencies through reference to The Offshore Installations (Prevention of Fire and Explosion, and Emergency Response) Regulations 1995 (PFEER) [Ref 28].
2.7. Operational Safety Case – Historical Definition
In his report ‘Operational Safety Case’, C Blagrove defined the operational safety case in the military environment as:
An operational safety case is a compelling, comprehensible and valid case that the combination of elements comprising military capability, when used together in a defined operating environment to perform a given task, demonstrate that the system is acceptably safe.
His use of the term ‘combination of elements comprising military capability’ is further explained as follows:
2.7.1. Military capability
The UK MoD Acquisition¹ Management System (AMS) [Ref 29] defines military capability as comprising the components shown in Figure 3 and described below:
Figure 3 MoD AMS components of military capability
¹ Acquisition is described in the MoD AMS Handbook as the process of requirement setting, procurement management, support management and termination/disposal, implying a whole-life approach to defence capability.
Concepts and Doctrine. The application of current military thinking, tactics, techniques and procedures.
Personnel. Current and planned manpower numbers needed to populate the armed forces, MoD organisation and its constituent parts.
Equipment and Technology. Provision of platforms, systems, weapons, and their supporting infrastructures, including updates to legacy systems.
Training. Provision of individual and collective training and exercises.
Sustainability. Consumption of resources (such as fuel, lubricants, ammunition, spares, rations etc) and the maintenance of equipment and supporting technical standards, facilities and infrastructure.
Structures and Estates. Investment in the defence estate, infrastructure, facilities and non-operational management systems required to support the armed forces, MoD organisation and structures, associated equipment and personnel.
Blagrove’s definition is based on the Acquisition Management System devised by the Defence Procurement Agency (DPA) and Defence Logistics Organization (DLO) and is naturally focused on the procurement function preparing equipment for entry into service. However, in this Project there is a need to extend the definition to include the front line operating environment that has been discussed in paragraph 2.3. The additional areas of interest are the operating organization, including the front line personnel (both ground crew and aircrew); the actual areas of operation, which may include hostile environments not specifically included in the SOIU; and the more difficult to quantify factors developed in the Safety Culture paragraphs above. The SOIU is a document providing several generic types of operation, showing the heights and speeds of the flights, to assist the Integrated Project Team’s assessment of fatigue usage for inclusion in the equipment (platform² or weapon system³) safety case.
2.8. Responsibility of Duty Holders
Responsibility for decisions that affect safety lies with the duty holders, defined by the HSE’s Principle and Guidance [Ref 30] as: full-time principals or senior managers of the business having an understanding of, and responsibility for, the health and safety and other legal requirements relating to the system and who have a duty to control and reduce risks "as low as is reasonably practicable" (ALARP). Fundamentally, the Duty Holder is the person responsible for the safety management of the system.
The approach described in this report will help to ensure that their decisions are compatible with those taken by other members of the domain, which is especially important when two organizations share responsibility for the safety of different parts of the system. Underpinning this Project is the recognition that the acceptable level of safety is a contentious topic, with many strongly held and diverse views within high-risk systems that have caused considerable distrust and misunderstanding among the general public. The directors and senior managers of a company take the corporate decisions that determine the company’s approach and attitude to risk and safety, and are ultimately responsible. The management policies that they set empower the operational staff to take proper decisions that reflect that approach and attitude. The arguments that they advance to regulators and Ministers influence the development of public policy.
² Platform is the term given to the complete aircraft with all systems required for safe flight.
³ Weapon System is the term given to the combination of equipment, including weapons and other stores carried, that constitutes the complete aircraft fitted for its many roles; it includes any ground stations required for operations, as in the case of UAVs.
When delegating decision-taking authority to operational staff within their organization, the senior managers need to reflect that the HSWA distinguishes two types of victim – employees and others (including passengers and affected third parties). It imposes essentially the same duty for each type, and does
not distinguish between different types of accident. However, that does not imply that the level of care is the same in every case. The legal duty is ultimately determined by society’s attitude as to what is reasonable. Society might demand that an organization must do more to prevent an accident that kills five people than it must do to prevent five accidents that each kill one person. If that is what society demands, then the law follows it by setting different thresholds of reasonable practicability. Similarly, society might demand that an organization does more to protect a passenger or worker than to protect a trespasser. Almost every decision that is taken by an organization or its employees, and those taken by regulators and government, can affect safety. Decisions that affect safety may lie at any point along an axis that runs from political through operational to routine as shown in Figure 4, reproduced from the paper ‘How Safe is Safe Enough’ [Ref 31]. However, the duty holder of an operational safety case is more likely to lie in the region where the nature of his decisions interfaces between two organizations.
Figure 4 Safety related decisions applicable to level of authority
2.8.1. The effect of the Duty Holder’s level of responsibility in the organization on the Operational Safety Case
The concept of operational capability allows the duty holder to make decisions that can be balanced against risk; this can be illustrated by considering the requirement to place a weapon on a target. The duty holder can assess the different weapon delivery systems available, such as a long-range gun, a rocket, a ground-launched or air-launched missile, or delivery by UAV or piloted aircraft. Through analysis of the accuracy required, the probability of failure, and the probability and consequence of collateral damage, the duty holder can make a risk-based decision. From Figure 4, the level at which a duty holder is likely to be responsible for an operational safety case is one sufficiently high that he has the authority to make decisions over two or more organizations. This, coupled with the requirement that the duty holder is also responsible for the personnel who are operating the equipment, further defines the meaning of an
operational safety case. A safety case (and Hazard Log) owned and maintained by an IPTL is clearly an equipment safety case or, if it were for a complete aircraft, a platform safety case. JSP 553 [Ref 32] states that the Release to Service Authority (RTSA) owns the safety case, but since he does not have direct responsibility for the personnel who operate the aircraft, the safety case should still be regarded as a platform safety case. It is only when the safety case in question and its associated hazard log are owned by the Aircraft Operating Authority (AOA), who has responsibility for how the aircraft is used and who operates it, that we can justify the definition of operational safety case. The extension of the definition into the civilian environment or other domains will not be explicitly covered in this report but will be included in the further work section. However, the difference between an equipment safety case, a platform safety case and an operational safety case is more obvious when identifying the responsibilities associated with the Duty Holder who has ownership of the safety case and the personnel who operate the equipment. The higher the level in the organization and the nearer to the front line, the more likely the safety case will have the attributes of an operational safety case. It is considered that the explanation of the context in a military environment helps to demonstrate the difference between a platform or equipment safety case and an operational safety case. However, it is also important to ensure that the operational safety case addresses the areas highlighted by Kelly [Ref 15]:
Equipment – where it impacts operations, such as where the equipment limits the operational requirement.
Procedures – clearly defined for use by the operators and maintainers. Guidance on the type and structure of training through life.
People – Establishment against strength and level of competence required are apparent. Management of change, supervision and responsibility.
Safety Management System – Safety management organization role defined. Approval routes. Safety monitoring system that defines how incidents are handled. Information feedback system. Safety documents and maintenance process.
2.8.2. Operational Safety Case – Proposed Definition
There is no appropriate definition of operational safety in UK MoD documentation in the public domain. Similarly, dictionaries do not provide a suitable definition applicable to operational safety cases. The United States Air Force (USAF) Policy Directive 63-12 [Ref 33] provides a definition of operational effectiveness, which helps to define the word ‘operational’ as follows:
Operational Effectiveness: The overall degree of mission accomplishment of a system or end-item used by representative personnel in the environment planned or expected (e.g., natural, electronic, threat) for operational employment of the system or end-item considering organization, doctrine, tactics, information assurance, force protection, survivability, vulnerability, and threat (including countermeasures; initial nuclear weapons effects; and nuclear, biological, and chemical contamination threats). (Air Force Instruction (AFI) 99-102).
This provides a much more robust and descriptive definition of ‘operational’ than that chosen by Blagrove. Furthermore, and to bring Blagrove’s definition up to date, the latest AMS Guidance [Ref 34] now states that:
The aim for Safety Management applied to military operations should be to assess the likely Hazards in advance and to have appropriate control measures and risk management integrated into military planning. Safety continues to be important during war. Safety Assessment should provide commanders with systems that are safe for their military role, and with information to enable them to make good decisions when on operations.
This has widened the original guidance and provides a better understanding of the military requirements included at the front line. However, in addition, this report will include the operational factors associated with the organizational and personnel aspects (which will include concepts such as safety culture), and the airspace environment (which will include Air Traffic Management (ATM), airborne surveillance and control, and threat capability). From the discussion above and for the purposes of this Project, the following definition of Operational Safety Case is proposed:
A safety case, owned by the duty holder responsible for the operators, that provides a compelling, comprehensible and valid case, that the combination of elements comprising operational capability, when used together in a defined operating environment to achieve agreed objectives, demonstrates that the system is acceptably safe.
The definition is similar to Blagrove’s, but the meaning is extended to define the duty holder’s role and by the expanded definitions of ‘operational capability’ and ‘operating environment’, which include the soft issues of organization and safety culture, and the ‘threat’ and ‘airspace’ environment.
2.9. Corporate Manslaughter and Corporate Homicide Bill
The Corporate Manslaughter and Corporate Homicide Bill [Ref 2] was introduced to Parliament in July 2006 and still has some way to go before it receives Royal Assent; however, organizations are preparing for the changes to legislation.
It should be emphasized that there are certain exemptions that are pertinent to the domains investigated in this Project. In particular, the Armed Forces are exempt in the following manner:
‘Military activities
(1) Any duty of care owed by the Ministry of Defence in respect of:
(a) operations within subsection (2),
(b) activities carried on in preparation for, or directly in support of, such operations, or
(c) training of a hazardous nature, or training carried out in a hazardous way, which it is considered needs to be carried out, or carried out in that way, in order to improve or maintain the effectiveness of the armed forces with respect to such operations,
is not a “relevant duty of care”.
(2) The operations within this subsection are operations, including peacekeeping operations and operations for dealing with terrorism, civil unrest or serious public disorder, in which members of the armed forces come under attack or face the threat of attack or violent resistance.
(3) Any duty of care owed by the Ministry of Defence in respect of activities carried on by members of the special forces is not a “relevant duty of care”.’
The explicit exemption of certain operational activities should not imply that these operations do not require a safety case. The definition is nevertheless helpful in identifying some of the activities included within military operational safety cases.
2.9.1. Recent Accident Inquiries
The introduction of the Bill and the results of recent accident inquiries have brought into sharp focus the responsibility of Duty Holders.
Hatfield Rail Accident. The London to Leeds express derailed near Hatfield in October 2000 after a stretch of track broke apart as the train was passing at 115 mph. A rail fault had been diagnosed some 21 months earlier and measures could have been taken to avoid the catastrophe. Balfour Beatty was fined £10 million and Network Rail £3.5 million for offences under the HSWA. The Judge, Mr Justice Mackay, stated, “Balfour Beatty’s failure to maintain track at Hatfield in a safe condition was one of the worst examples of industrial negligence in a high risk industry I had ever seen”. He put the negligence into context by stating that, “These were breaches of general duty to the public at large. Something over three-quarters of a million passengers would have been put at risk by passing over this area”. Network Rail’s fine was the highest ever awarded against a rail company for health and safety breaches. However, manslaughter charges against Network Rail and Balfour Beatty managers and personnel were dismissed.
The five highest safety fines imposed against organizations prior to the Hatfield rail accident were as follows:
Larkhall Gas Explosion. Transco was fined £15 million after a family of four were killed when a gas explosion destroyed their home on 22 December 1999.
Ladbroke Grove Rail Crash. Thames Trains was fined £2 million following a train crash at Ladbroke Grove on 5 October 1999. 31 people died when a train leaving Paddington Station went through a red signal and collided with another train.
Port Ramsgate Gangway Collapse. Four companies were fined a total of £1.7 million for their part in the walkway collapse in September 1994, which claimed the lives of six passengers as they were boarding the Prins Filip ferry from Ramsgate to Ostend. Swedish engineering companies FEAB and FKAB were given fines of £750,000 and £250,000 respectively, Lloyd’s Register of Shipping was fined £500,000 and Port Ramsgate Ltd was fined £200,000.
Southall Rail Disaster. Great Western Trains was fined £1.5 million after seven passengers were killed and 150 injured when a high speed train travelling from Swansea to Paddington went through a red signal and collided with an empty freight train at Southall, west London, in September 1997. Corporate manslaughter charges against the train operator were dismissed on the direction of the trial judge.
Heathrow Tunnel Collapse. Balfour Beatty was fined £1.2 million in February 1999 after a rail tunnel collapsed beneath Heathrow airport in October 1994. Fortunately, no injuries were caused. The Austrian technical consultancy Geoconsult was also fined £500,000.
2.9.2. Current Legislation
The existing offence is known as ‘Involuntary Manslaughter’ and requires an individual to kill as a result of some blameworthy act on their part but without actually intending to cause death or serious injury. There is also the concept of gross negligence manslaughter. According to the Crown Prosecution Service (CPS), it has to be established that: there was a duty of care owed by the accused to the deceased; there was a breach of the duty of care by the accused; the death of the deceased was caused by the breach of the duty of care by the accused; and the breach of the duty of care by the accused was so great as to be characterized as gross negligence and therefore a crime. However, the problem lies in that for a company to be prosecuted for manslaughter, including gross negligence manslaughter, it is necessary to identify a ‘controlling mind’ who is also personally guilty of manslaughter. It is not possible under that law to add up the negligence of several individuals to show the company as grossly negligent. A specific individual has to be identified as a controlling mind for corporate manslaughter to be proven. An example of where the evidence could not support a manslaughter charge was the Potters Bar rail crash, where seven passengers lost their lives and over 70 were injured when a train derailed just outside Potters Bar station on 10 May 2002. The HSE investigation into the accident revealed that it was caused by a points failure. According to the Executive, the points had been poorly maintained, while other sets of points in the Potters Bar area were found to have similar, though less serious, maintenance deficiencies. However, the CPS’s Principal Legal Adviser, Mr C Newell, stated, “After giving careful consideration to the large volume of evidence provided, the CPS has advised that it does not provide a realistic prospect of conviction for an offence of gross negligence manslaughter against any individual or corporation”.
2.9.3. Corporate Manslaughter and Corporate Homicide Bill Proposals
Under the proposed legislation, an organization is guilty of the offence of corporate manslaughter if the way in which any of the organization’s activities are managed or organized by senior managers both causes a person’s death and amounts to a gross breach of a relevant duty of care owed by the organization to the deceased. The senior manager must play a significant role in making decisions about the activities that brought about the death or be the actual manager of the activities. Furthermore, the gross breach is a breach of a duty of care by the organization that falls below what can reasonably be expected of the organization in the circumstances. To decide that question, the jury has to consider whether the evidence shows that the organization failed to comply with any relevant Health and Safety legislation or guidance. The Bill also requires the jury to consider whether the senior managers sought to cause the organization to profit from its failure; that is, whether they deliberately cut corners to reduce costs or boost profits.
2.9.4. Current Legislation Failure
There have been a number of transport sector accidents that failed to secure a manslaughter charge because it has been difficult to identify, ‘beyond reasonable doubt’, the controlling mind at a sufficiently high level in the organization to satisfy the public demand for leaders of companies to be made personally liable.
Furthermore, the law did not provide a deterrent to companies to ensure that their safety standards were continually improved.
2.9.5. Sufficiency of the Bill
The Bill does not include a provision that will extend the new offence to individual directors, yet it is individuals, not companies, who make decisions. It is also likely that the penalty will be a fine, and this is unlikely to be a real deterrent to large corporations. The HSWA legislation gives the HSE wide powers and the law permits unlimited fines; therefore, the sanctions under the Corporate Manslaughter and Corporate Homicide Bill add little to the present regulatory framework.
2.9.6. Obtaining Successful Prosecution
The new Bill, whilst not requiring a jury to identify a senior manager who is personally guilty of manslaughter before the company itself can be prosecuted, will require the prosecution to show that there was a senior management failure. This may be difficult to prove if responsibility for decision-making is deliberately delegated down the management chain to a lower level in the organization, thereby allowing the organization to escape manslaughter prosecution. Therefore, it would still be difficult to prosecute a large corporation, whereas it would be easier to prosecute smaller firms, as in the case described in the following quote from The Argus, 12 Mar 2005 [Ref 35]:
‘Some bosses will serve jail time for serious safety offences - but it continues to be those running small firms that face a custodial sentence rather than their generally better resourced and better remunerated blue chip equivalents, none of whom have ever faced imprisonment for workplace safety offences. In March, garage manager Glen Hawkins started a nine-month prison sentence for manslaughter, after his gross negligence led to the death of trainee mechanic Lewis Murphy. Hawkins had helped the 18-year-old trainee pour a mix of petrol and diesel into a waste oil tank at the Anchor Garage, Peacehaven. Fumes were sucked into the flue of a recently installed boiler sparking a massive fireball on February 18 last year, killing Mr Murphy and injuring Hawkins. Garage owner Howard Hawkins, Glen Hawkins’ father, was fined £10,000 for failing to ensure the safety of his employees. He was told he would be jailed for six months if he failed to pay the fine and was ordered to pay £15,000 towards prosecution costs of £54,000. Only 11 company directors have ever been convicted of manslaughter following a work-related death and of those 11 convictions, just five directors were imprisoned. Of the remaining six directors convicted of manslaughter, five received suspended sentences and one was given community service.’ The Argus 12 Mar 2005
Furthermore, for a gross breach of a duty of care, a jury must consider whether senior managers ‘sought to cause the organization to profit from that failure’. This evidence is extremely difficult to obtain, and absence of evidence will be used by organizations to show that their conduct was not grossly negligent.
2.10. Review Summary
The standards and regulations supporting the requirement for safety cases have been identified along with the major disasters that triggered the need for safety case development. The definition of ‘Operational’ in the context of this report has
been developed from first principles and an academic paper, through to military instructions, as there was no definition identified for civil organizations in the public domain. A definition of operational safety case in this context has been proposed. The responsibilities associated with the ownership of the safety case and, in particular, the Duty Holder’s responsibility have been discussed and placed into context with the Corporate Manslaughter and Corporate Homicide Bill. Leveson and the HSC identified flaws in safety culture as a root cause of accidents; this observation was evaluated against the Bhopal and Buncefield accidents. It was considered that a good safety culture was a key goal in an operational safety case. The apparent disparity between the proportion of incidents associated with human factors and the number of aircraft accidents caused by human factors was highlighted to emphasize the urgent need to include these soft issues in the operational safety case argument.
Critical Appraisal
3. To conduct a critical appraisal and appreciation of the methodology of operational safety case development and the relevant standards that govern them
3.1. Introduction
During a mission where the time to make decisions is reduced or the demands on an aircraft commander’s judgement are heightened (such as a Search and Rescue pilot deciding to approach a ship in marginal weather conditions or a transport aircraft crew entering a war zone), the effectiveness of the operational safety case in reflecting the context of the operation is crucial to providing the assurance required by the Duty Holder that the risk to which the pilot and passengers are exposed is acceptable. Safety cases have tended to focus on the airworthiness of platforms; however, the operational safety case (in the context of this Project) demands a greater understanding of external hazards and risk analysis. This Chapter will examine the military standards and regulations providing safety requirements and guidance to the MoD and appraise safety cases developed for the operational environment. The resultant evaluation will provide the methodology to be adopted for the development of a large-scale operational safety case in the Military environment.
3.2. Standards Governing Safety Case Development
Many standards have been written or evolved to provide regulation and guidance on how systems may be designed, operated, maintained, and disposed of in an acceptably safe manner. The concept of safety cases espoused by the Cullen Report in 1990 has filtered through to most safety critical systems and this review will explore how these standards deal with the notion of operational safety case. Specifically,
the regulations and standards relating to operational safety cases in the military and civilian aerospace and rail domains will be examined.
3.2.1. Defence Safety Regulations
The Defence regulatory process has evolved over many years. Defence Standards such as Def Stan 05-123, which defines the certification requirements of aircraft, have developed into documents respected throughout the global aircraft industry. This evolutionary process has provided a background of stability in which both customer and equipment manufacturer have adopted change through consensus and agreement.
3.2.2. Defence Standard 00-56 Issue 3
Def Stan 00-56 [Ref 32] was produced to provide requirements for the management of safety through the life of the project. A Defence Standard is normally used by Defence Contractors as directed in the contract. However, it also provides guidance that can be adopted by the Integrated Project Team (IPT). Def Stan 00-56 sets out the requirements for safety during system acquisition and emphasizes compliance with both safety legislation and MoD safety policy. It requires the safety management system to be auditable and a safety case to be developed and maintained to demonstrate how safety is being achieved and maintained. Furthermore, there is considerable regulation and guidance on the application of the ALARP principle, and this is applicable to the Duty Holder’s perspective. The need to reflect the contractor’s responsibility in Def Stan 00-56 ensures that the prime focus of the standard is the ‘system’, such as a platform or weapon system. However, the operational safety case includes additional systems that may not be reflected in the platform safety case. While the principles of safety management are valid, the scope of the operational safety case in the context of this report is wider.
Although the following JSPs do not have the status required for contracting purposes, they define more readily the requirements of an operational safety case and associated management system. A fundamental requirement of Def Stan 00-56 is that risks should be managed. It states that, ‘Risk management is the process of ensuring that hazards and potential accidents are identified and managed, and is a process managed within the Safety Management System. The outputs from the risk management process are a key part of the Safety Case. The Contractor shall identify all hazards and potential accidents, so far as is reasonably practicable, and manage their associated risks as appropriate. The Contractor shall seek to ensure that all risks are broadly acceptable. Where this is not possible, risks shall be reduced to levels that are tolerable and ALARP’. The risk management process includes: hazard identification, hazard analysis, risk estimation, risk and ALARP evaluation, risk reduction and risk acceptance. This process is carried out throughout the life of the system and can be applied equally well to an operational safety case covering a system of systems as to a platform. The process can be described in the form of a V model and compared with the Society of Automotive Engineers (SAE) 4754 [Ref 36] process in Figure 5 below:
Figure 5 Hazard Log Development in a Safety Lifecycle
[Figure 5: V model running from Platform Concept to Safe Platform. SAE 4754 process stages: PHI, FHA, PSSA, SSA. Def Stan 00-56 process stages: Hazard ID and Analysis, Risk Estimation, Risk and ALARP Evaluation, Risk Reduction, Revise Risk Estimation, Risk Acceptance.]
3.2.3. JSP 550
JSP 550 Regulation 445 [Ref 37] is the policy statement for the Defence Aviation Safety Management System (DASMS) and presents the policy authorized by the Defence Aviation Safety Board (DASB). The policy does not just deal with the technical and Platform issues but embraces the organizational and operating issues. It requires MoD organizations to: Minimize risks of an accident occurring as far as reasonably practicable whilst taking cognisance of Secretary of State’s (SofS’s) (for Defence) directive to maintain an appropriate balance of safety and the delivery of Operational Capability.
ALARP. This statement makes clear that minimizing risk is the principal safety objective. The As Low As Reasonably Practicable (ALARP) principle recognizes that risks must be balanced against reality and resources. It is implicit, therefore, that risks must be identified and their likelihood and severity evaluated before their acceptability can be judged. The Regulation continues to expand on the ALARP requirements and states that, ‘organizations should recognize and acknowledge the need for a balanced approach in ascribing a high priority to Aviation Safety and environmental protection relative to operational, training, public relations, commercial (where applicable) and working practice pressures’.
Safety Case. In the policy statement requiring a Safety Management System (SMS), the Regulation implies that the Service Divisions require an overarching safety case. It states that, ‘Service Divisions are to implement a formal SMS, which is consistent with this document and prescribes for the active
management of interfaces between other aviation SMSs where required. This may include active integration with other SMSs or elements thereof. In particular, platform Safety Cases and any associated Service Deviations must be linked to environmental or “System” SMSs. For example, aircraft Safety Cases are to link with ATC or Airborne Surveillance and Control System (ASACS) SMS for their operating area’.
Safety Culture. The JSP acknowledges the importance of safety culture issues and reminds the Aircraft Operating Authority (AOA) to, ‘Recognize individual and management responsibility for safety performance and fostering a positive safety culture (in this context this includes the commanders and supervisors themselves, the staff reporting to them and any contractors for whom they are responsible)’.
3.2.4. JSP 553
JSP 553 [Ref 38] explicitly requires the development and maintenance of a Safety Case, Release To Service (RTS) and Aircraft Document Set (ADS) for each aircraft, its systems and equipment. Safety analysis is required to be sufficient to justify why the achieved level of airworthiness meets prescribed design criteria. Safety analysis evidence and any resulting limitations and mitigation must be fully described within the Safety Case. JSP 553 clearly states that design changes are to be assessed for safety and that documentation must reflect each change in design, limitation, or use as described in the Statement of Operating Intent and Usage (SOIU), which forms part of the RTS. Application of the ALARP principle is explicit within JSP 553, including the provision of sufficient resources and funds to produce an airworthy design and carry out necessary safety management activities. However, under certain conditions, the RTS Authority (RTSA) is empowered to approve flying to the provisions of Operational Emergency Clearance (OEC), which have been called for as part of Contingency Planning.
These conditions include war, hostile action, and situations of direct threat to MoD aircraft from a potential enemy, including direct threat from terrorism and patrolling over potentially hostile territory. This contingency option includes the use of OEC releases for work-up flying and weapon range practice needed to train to counter the potential threat or hostilities. Where the RTSA judges that additional releases may be needed to counter a threat, he will task the Integrated Project Team Leader (IPTL) to investigate extending the provisions of the Release. This might include the recommendation of further Reduced Operating Standards (ROS), Military Operating Standards (MOS) or OEC releases. Use of the ROS involves higher than normal risk and authority to use the ROS is restricted. In conjunction with the AOA and the IPTL, the RTSA ensures that the authority required, guidance on the risk involved, and related operating techniques are specified in the Operating Data Manual (ODM) and Aircrew Manual (AM) and in appropriate flying orders. Use of the MOS involves a higher risk than that for the ROS; authority to use the MOS is restricted, and management precautions similar to those for the ROS are taken.
If the operational need is deemed to be overriding (such as during actual conflict, disaster relief or Search and Rescue (SAR)), formal approval and issue may be made, at the operational location or HQ, of either existing OEC Releases or of a new Operational Necessity Service Deviation (ONSD). The issue and approval is made by the highest level of operational command (normally the Head of the AOA) that is able to give approval in the required operational time scale. JSP 553 puts all the policy in place to cater for the operational environment, but only in so far as it affects the airworthiness of the aircraft. JSP 550, although scant in its guidance to AOAs, is the only document that deals with operational safety management.
3.2.5. Civilian Aerospace Regulations
CAP 712 [Ref 12] is primarily a guide for commercial air transport operations and maintenance activities; it sets out to inform and aid organizations of any size to develop an effective SMS for managing safety. It defines the requirements for a safety organization and indicates the need for a safety assurance document for that organization, which it suggests is a safety case. However, the main plank of the safety guidance is the need for the organization to identify and review hazards and then assess and remove or mitigate the risks. In order to make this an auditable process the guide advises the use of a hazard log. This process is both reactive and proactive and the scope includes Flight Operations, Engineering and Maintenance, Ground Operations and all other departments whose activities contribute to the operator’s safety performance. CAP 712 therefore extends the risk focus beyond the platform to the operating organizations. Of particular note, CAP 712 emphasizes the need for safety awareness training for management and staff and stresses the importance of a positive safety culture.
CAP 712 holds that the commitment of an organization’s top management (those who direct and control the organization at the highest level with the responsibility of duty holder) towards safety, safety practices and safety oversight will determine the performance of the SMS. The safety culture of the company underpins the entire safety achievement of the company and is crucial to its success. The ideal safety culture is one that is supportive of the staff and systems of work, and recognises that errors will be made and that apportionment of blame does not resolve the problems. This recognition that a ‘blame culture’ inhibits the safety related data collection process was discussed in paragraph 2.5. Therefore, a supportive culture will encourage open reporting, seek to learn from failures, and be just in dealing with those involved. CAP 712 provides useful advice by expressing the notion that punitive action must not follow automatically from the open acknowledgement of human error. However, it makes clear that indemnity must not be guaranteed where there has been gross negligence. There is a final reminder that human factors should be reported by anyone involved in the incident, not necessarily the one who made the error. This point needs to be captured in the safety case argument, and sufficient evidence needs to be accumulated to be able to assess the efficacy of the system; methods of achieving this will be discussed in Chapter 6 for further
work. The message is clearly expressed, ‘the front line defence is that operating staff must not accept unsafe behaviour from their peers’.
3.3. The London Underground Railway Safety Case
The guidance for the railway Engineering Safety Management safety case is laid out by Railtrack in the Yellow Book [Ref 2], and it is emphasized that the railway regulations draw the distinction between the Engineering Safety Case and the Railway Safety Case. Specifically, the Engineering Safety Case presents the justification for a planned change to the rail system, and the Railway Safety Case is a document that describes the organization’s arrangements for safety management. However, the Engineering Safety Case definition makes the point that it cannot separate engineering from other factors that affect safety, such as human factors. The London Underground safety case has been prepared under the requirements of The Railways (Safety Case) Regulations 2000 [Ref 39]. This Section will present a compliance appraisal to assess whether the Safety Case meets the general concepts of an operational safety case and indicate whether there are any omissions in the context of the requirements developed in paragraph 2.8.2.
3.3.1. The London Underground Safety Case – Appraisal
The London Underground Safety Case Version 4.00 - Nov 2005 [Ref 40] is in textual form, available on the web as a pdf file, and has been formatted to ensure ease of navigation. At 298 pages, the Safety Case has not been included as an appendix to this report. It was considered that translation of the Safety Case into a Goal Structuring Notation (GSN) argument would be beneficial to understanding the strategy; however, time was not available to proceed with this additional appraisal methodology. Table 1 below highlights the arguments provided in the London Underground Safety Case that cover the operational specific requirements developed in paragraph 2.8.2.
The degree of satisfaction achieved against the operational safety case criteria is assessed. The Safety Case provides a strategy and plan for achieving the defined safety criteria as follows: ‘The London Underground carry out a qualitative analysis, supported by a quantitative estimate, of the risk to those individuals considered most at risk within the group being assessed. If this ‘quantification’ indicates that, for an individual, the risk of fatality exceeds 1 in 1,000 per year for employees and suppliers, or 1 in 10,000 per year for a member of the public, then the risk is regarded as intolerable. Under such circumstances, they would take immediate action to reduce the risk to below these levels, including if necessary suspending the activity giving rise to the risk. If their ‘quantification’ indicates that, for an individual, the risk of fatality is below 1 in 1,000,000 per year, the level of risk is regarded as broadly acceptable’ [Ref 40]. At this level, they would not expend any significant effort to find further risk reduction measures, but would focus on ensuring the continuing effectiveness of existing risk control measures. Effectively they are considering this value to be ALARP. The development of the ALARP argument was not clear in the Safety Case and further investigation would be required to confirm the validity of the process. In addition, Health and
Safety and Environmental performance was monitored through a number of Safety Key Performance Indicators. The Operational topics covered were comprehensive and Table 1 identifies extensive measures to satisfy the requirements for an Operational safety case. The People issues are dealt with thoroughly, with considerable evidence of a competence assessment process; Appendix B provides an example of the manner in which workplace risks are controlled. However, while the Safety Case defines the mitigation factors and responsible person, it does not provide the evidence of how successful each control has been. Human Factors are mentioned, but it was not obvious how London Underground managed these aspects when PPP suppliers carry out a large proportion of work. The term ‘incentivise’ is used to describe the contractual arrangements that London Underground has with its suppliers; these involve infrastructure service charges and penalty points to maintain long-term improvements, and a £50 million per year contingency for any safety critical work identified as urgent. There was no evidence of an anonymous reporting system for incidents and there may be room for improvement in this area. However, the Safety Case provided evidence of a thorough communications system, including verbal communication in small and large groups at many levels and directed paper and ‘e’ magazines.
3.3.2. Compliance Summary
The compliance appraisal was based on the textual document, but much of the evidence and the methodology behind the evidence, especially the ALARP process, was unavailable in the public domain. Furthermore, the structure of the safety case argument would have had more clarity if presented in Goal Structuring Notation. Paragraph 3.4 provides the background to the GSN technique and an explanation of the terms and process for building a GSN safety case. In the 298 pages of the London Underground Safety Case, it became difficult to identify the justification of the safety arguments.
The Safety Case was produced in an ‘e’ version but required extensive navigation between chapters to piece together the same elements of an argument. By its nature the document provided introductions and summaries and the level of detail varied for different strands of the argument in each chapter. If the argument had been expressed in GSN then it would have been readily apparent if there were missing links in the strategy or evidence without an argument and if there was insufficient evidence. An argument without evidence is unfounded but evidence without an argument is unexplained and may lead to misinterpretation.
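The tolerability criteria quoted from the London Underground Safety Case in paragraph 3.3.1 can be expressed as a simple classification rule. The following Python sketch is illustrative only: the function name, the group labels and the label given to the middle band ('tolerable if ALARP', borrowed from the Def Stan 00-56 wording quoted in paragraph 3.2.2) are assumptions of this sketch and are not taken from the Safety Case itself.

```python
# Individual risk of fatality per year, as quoted from the London
# Underground Safety Case [Ref 40]. Group labels are assumed names.
INTOLERABLE_ABOVE = {
    "employee": 1e-3,   # 1 in 1,000 per year
    "supplier": 1e-3,   # 1 in 1,000 per year
    "public": 1e-4,     # 1 in 10,000 per year
}
BROADLY_ACCEPTABLE_BELOW = 1e-6  # 1 in 1,000,000 per year


def classify_risk(group: str, annual_fatality_risk: float) -> str:
    """Place an individual annual fatality risk in a tolerability band."""
    if group not in INTOLERABLE_ABOVE:
        raise ValueError(f"unknown group: {group}")
    if annual_fatality_risk > INTOLERABLE_ABOVE[group]:
        # Immediate action is required, up to suspending the activity.
        return "intolerable"
    if annual_fatality_risk < BROADLY_ACCEPTABLE_BELOW:
        # Effort shifts to keeping existing controls effective.
        return "broadly acceptable"
    # Assumed label for the region between the two thresholds.
    return "tolerable if ALARP"


print(classify_risk("public", 5e-4))    # intolerable
print(classify_risk("employee", 5e-4))  # tolerable if ALARP
```

The same estimated risk falls in different bands for a member of the public and for an employee, which is why the group being assessed has to be stated alongside the estimate. The sketch says nothing about how the ALARP demonstration itself is made, which is the part of the process that was not visible in the public document.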
Table 1 London Underground Safety Case – Compliance with Operational Safety Case Requirements
Op SC Requirement / London Underground Ref / Comment
Equipment – where it impacts operations such as where the equipment limits the operational requirement
Tunnels protected from surface stock by detectors. Ventilation to control smoke and quality of atmosphere. Pumps and floodgates.
2.8 Signalling equipment.
Signals ensure safe spacing and routing of trains. Older signals still in use that do not provide required level of protection for signal passed at danger (SPAD). Only certain trains have ATP; others have automatic tripcock. 2 lines have automatic overspeed systems.
2.9 Trains.
All trains have been designed and are maintained to:
• Operate within a declared swept envelope that is within the structure gauge.
• Control the risk of parts falling from trains to minimise the risk of derailment.
• Minimise the risk of collision through train protection, signalling and braking systems.
• Minimise the consequences of collisions through structural design.
2.10 Stations.
The risks associated with station operation are at the platform train interface, station area accident, station fire, lift fire, escalator fire, power failure and staff assaults.
Procedures – clearly defined for use by the operators and maintainers. Guidance on the type and structure of training through life.
Para 2.11 3.13.3 6.13
London Underground maintains its equipment through contractors. The contracts are its key mechanisms for asset maintenance and improvement. However, the Safety Case states that the PPP Suppliers and PFI Suppliers are responsible for maintaining and improving designated packages of assets, and that ensuring that they do so safely is a key function of all of the management arrangements and risk control systems described in Sections 3 to 8 of the Safety Case. The PPP Contracts incentivise long-term improvement in the asset base through a balance of infrastructure service charges and penalty points. Each contract also contains a safety agreement that sets out minimum safety responsibilities and a £50 million per year contingency for any safety critical work identified as urgent. A Reference Manual is issued which contains the operational arrangements for operating the railway safely. The Reference Manual contains clearly identified standards, procedures, directions and information for safe operation of the railway in normal and degraded conditions. They apply to all staff and suppliers involved in the operation or maintenance of the railway. Access to the system for maintenance purposes is strictly controlled to ensure that risk levels are ALARP by clearly defining interfaces between the various organisations seeking access and mitigating against competitive pressures. This is achieved through:
• Safety being the primary criterion for allowing or denying access.
• A contractually mandated and disciplined approach to planning access.
• A demonstrably fair division of access between the parties, including a baseline of closures to enable enough engineering work to be completed safely whilst taking into account operational needs.
• Flexible arrangements for changing those arrangements in real time.
• Appropriate incentives or compensation, or both, when access is frustrated accidentally, or by force majeure.
People – Establishment against strength and level of competence required are apparent. Management of change, supervision and responsibility.
6.8.1 3.13.3
The Human Resources policy in LU includes:
− resourcing
− competence
− performance management
− drugs and alcohol, monitoring and testing programmes.
Competence and Training Standards are provided for people working on or about the operations to ensure they have the necessary skills, knowledge and qualifications to do so. They represent best practice that has developed over a number of years to control our specific risk profile. At all levels of management control, individuals are accountable or responsible for meeting minimum prescribing conditions and standards. These conditions and standards specify:
• What must be achieved.
• Why it must be achieved.
• When it must be achieved by.
• How it is to be achieved.
6.9 6.10
Staff are made aware of changes to every edition of the Reference Manual. The physical safety of personnel in the workplace from customer violence is a major issue and LU has introduced a programme which includes further training improvements with the accreditation of trainers, arrangements with the British Transport Police (BTP) database and consideration of whether to set up a joint LU/BTP Workplace Violence Unit. Improvements are planned for support to staff who suffer assault, and for refocusing operational managers’ performance goals to encourage the development of a more supportive culture. The Risk Management system defines how risks are managed and lists the Hazards and Risk Controls. Typical of the tables is the Control of Workplace Risks that deals with Human Factors. This Table is reproduced at Appendix B. This section deals with the detail of Competence management and provides the methodology for recruitment, learning and development, and discipline. Considerable emphasis is placed on the regulations and competence standards imposed on employees associated with safety critical work.
Safety Management System – Safety management organization role defined. Approval routes. Safety monitoring system that defines how incidents are handled. Information feedback system. Safety documents maintenance process.
3.13.2
Safety decision-making is explicit in specifying the requirements to ensure that safety decisions are made in a consistent and transparent manner and demonstration that safety risks have been reduced to a level that is ALARP. Furthermore, Standards detail the requirements for identifying risks to employees, customers and the
environment and implementing controls to reduce risks to a level that is ALARP. Detailed arrangements exist relating to assessing and controlling risks to particular groups e.g. pregnant workers, those arising from specific activities e.g. manual handling, or those arising from specific hazards e.g. noise.
5 6.11
The Incident Reporting and Investigation process for notifying, recording and reporting incidents, dangerous occurrences and near misses ensures that recording and investigating incidents is maintained in order to prevent recurrence. The arrangements also include those for responding to recommendations from incident reports from third parties. There is a requirement for the local management chain to undertake systematic, risk-based inspections of all LU workplaces to identify and rectify any hazards, substandard conditions or substandard practices. Each level of the local management chain seeks assurance that health, safety and environmental risks are controlled. An informal reporting process called “What’s Wrong?” is also used. Contractual Safety Cases are established as part of the contractual arrangements with the larger suppliers to provide LU with assurance that they understand the risks they pose to the LU network and are controlling them. There is an extensive programme improvement plan with hard milestones and lead managers identified to complete each improvement. Communication media are a strong point of the safety case to provide feedback and raise issues. They include,
Manager briefing events 3 monthly, Team Talk monthly meetings, Speak up monthly chaired by directors, Employee survey and bulletins and induction training, and a magazine.
7
Health, safety and environmental performance is monitored through a number of Safety Key Performance Indicators (SKPIs). The SKPIs used are:
• Customer accidental fatalities.
• Customer major injuries.
• Employee and contractor fatalities.
• Employee major injuries.
• Signals passed at danger.
• Confirmed fires.
• Number of section 12 contraventions.
• Employee lost time injuries.
• Workplace and work related violence.
• Incorrect train door opening.
• Person and train incidents.
3.4. Introduction to GSN
As identified in the conclusions to the appraisal of the London Underground Safety Case, the thread of the argument was difficult to follow in places and it was not obvious that the evidence was appropriate or sufficient. There may have been missing links in the argument, but the timescale of the appraisal did not allow a thorough investigation. The author has experience of translating textual safety cases into GSN, and this has highlighted significant omissions and circular arguments that were not immediately obvious in large complex safety cases. GSN represents a notation and a methodology, and in order to create arguments effectively it is necessary to understand both. The following introduction offers a concise account of the fundamentals and is sourced from T Kelly’s DPhil submission [Ref 41].
3.4.1. The GSN Notation
GSN consists of a number of elements that represent the 'building blocks' of an argument:
Goals - represent the claims and sub-claims of the argument.
Strategies - describe how claims are related to their sub-claims.
Solutions - provide the evidence upon which claims can be substantiated.
Contexts - define the circumstances within which claims are valid.
Assumptions/Justifications - clarify the rationale behind the approaches used.
Models - refer to detailed accounts of systems and processes mentioned within the argument.
GSN also provides two distinct link types that can be used to describe the relationships between argument elements:
Solved By Links - demonstrate which goals, strategies or solutions are being used to support a claim.
In Context Of Links - indicate the elements that are providing contextual information for a given claim or strategy.
There are three main objectives in using GSN to communicate a safety argument:
• Make the argument clear in terms of individual statements and the flow of logic.
• Make the argument defensible by providing rationale where necessary to support the argument.
• Make the argument mutually understandable through provision of context where necessary to avoid ambiguities.
The elements of GSN serve the following purposes:
• Goals are requirements, targets or constraints to be met by the system. They should be phrased as propositions, which can be said to be true or false (although we are not concerned whether the proposition is actually true or false). The correct form of the proposition for a goal should be a single sentence consisting of a <Noun-Phrase><Verb-Phrase>.
• Strategies are used to break the goal down into a set of sub goals. A strategy can be regarded as a rule to be invoked in the solution of the goal. Strategies can be implicit or explicit. A strategy statement should succinctly describe the argument approach adopted, ideally in a form similar to ‘Argument by <approach>’, ‘Approach over <approach>’, ‘Argument using <approach>’ or ‘Argument of <approach>’.
• Context is used to state how the argument relates to, and depends upon, information from other viewpoints. Context can be used to refer to any form of information. The context object can be associated with goals, strategies and solutions. When context information is associated with a goal, then all sub goals inherit the context as well. A context statement should either provide a reference to contextual information, or be a statement of contextual information. Context information can take two forms, Labels – references to information, in the form of a Noun-Phrase Verb-Phrase statement. For context of this type, the subject Noun-Phrase will typically be a term/concept from the goal.
• Assumptions and Justifications are used to describe the rationale behind the argument strategy that has been adopted. Assumptions should state any assumption on which the strategy or goal is being put forward as a solution to a parent goal. Justifications state the reasons why a particular strategy or goal is being put forward, or provide a justification for the adequacy of the strategy.
• The solution provides a direct reference to external information at a level within the goal structure where no further expansion, refinement or explanation is required. Solutions should be phrased as Noun-Phrase statements.
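The rule that sub goals inherit the context of a parent goal can be illustrated with a short sketch. The function name effective_contexts and the dictionary representation of the links are assumptions made for this illustration; the element identifiers are borrowed from Figure 7, but the wiring between them is assumed, since the figure's links cannot be recovered from the text.

```python
# Solved By links: parent id -> child ids (assumed wiring, for illustration only).
solved_by = {
    "G1": ["S1", "S3"],
    "S1": ["G2"],
    "S3": ["G3"],
}

# In Context Of links: element id -> context ids attached directly to it.
in_context_of = {
    "G1": ["C1", "C2", "C3"],
    "G3": ["C5"],
}


def effective_contexts(element, solved_by, in_context_of):
    """Return contexts attached to `element` plus those inherited from ancestors."""
    # Invert the Solved By links so the argument can be walked upwards.
    parents = {}
    for parent, children in solved_by.items():
        for child in children:
            parents.setdefault(child, []).append(parent)

    result, seen, stack = [], set(), [element]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for ctx in in_context_of.get(node, []):
            if ctx not in result:
                result.append(ctx)
        stack.extend(parents.get(node, []))
    return result


print(effective_contexts("G3", solved_by, in_context_of))
```

Under this reading, a context attached high in the structure applies to everything beneath it, so the position of a context element is itself a design decision in the argument.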
The methodology for assembling a Safety argument will not be described in this document; it is well defined within Kelly’s D Phil Submission [Ref 41]. However, it is worth noting here that there are other advantages associated with the use of GSN to develop safety cases. These are:
• It should allow for the systematic evaluation of the effects of proposed changes to the safety argument, resulting in a more efficient, and hence more cost-effective, safety case maintenance process.
• It should reduce the dependence on safety case domain ‘experts’ to write and maintain safety cases, improving project efficiency, and reducing safety overheads.
• It provides the potential to re-use safety arguments and evidence between projects, especially within a common application domain, or when common elements of a system are being employed, although any actual material re-use must be undertaken with great care.
• It allows for greater clarity of safety arguments, and hence provides the potential for improvement in the overall quality of safety cases.
• It allows common safety case argument structures to be identified and generated in the form of re-use patterns. Ultimately these re-use patterns ensure that years of experience in how to generate a safety case are captured. Re-use patterns should provide benefits in terms of reducing the cost of the safety elements of the projects, as well as the risk of unacceptable safety arguments.
3.4.2. Selection of GSN for Operational Safety Case Argument

The experience with the London Underground Safety Case has directed this Project towards the adoption of GSN to express the safety case argument. Furthermore, it was believed that the potential for safety case re-use patterns would be applicable to this Project. The development of safety cases is a resource-hungry activity and normally requires the skills of a safety case practitioner with a considerable amount of domain knowledge. Therefore, the potential for using patterns to assist with the safety case argument generation is worth investigating, as there may be significant savings to be made. The intention of this Project is to develop an operational safety case for a complex system of systems and evaluate the possibility of creating patterns for use when preparing other major operational safety cases. However, the first step is to develop the operational safety case. M Warren has produced a Military ATC GSN safety case in his MSc submission [Ref 42] using a technique of reuse that he developed and evaluated. Therefore, this Project will appraise the methodology Warren used when applied to the much larger system of systems operational safety case. Section 3.5 evaluates the reuse of a safety case argument from a pattern in a similar operational domain (military air operations), and assesses the relative merits and disadvantages of this process when developing a multi-unit flying operation GSN safety case.

3.5. Evaluation of Reuse Process from a GSN Military ATC Pattern to develop a System of Systems Operational Safety Case
The GSN pattern developed by Warren [Ref 42] referred to a safety case for a small specialist element of a single independent unit (equivalent to a business unit). Specifically, the safety case was restricted to the ATC part of an operating base and was only applicable to the operating range of the radar equipment. However, the proposed operational safety case that will be developed as the subject of this Project in Chapter 4 will require the Domain to be extended to include additional specializations, and the Scope to be increased to include many units and the airspace where the aircraft will operate.

3.5.1. Safety Case Domain and Scope Extension

The proposed reuse will be projected onto a Domain that is an extension from the local ATM system of Warren’s project to all operating systems on a unit, including the ground and aircraft operations. Furthermore, the Scope will be extended from his single unit operation to include all the units under the responsibility of the group Duty Holder. This will cover all the airspace and bases wherever the aircraft operate, even when the airspace is not under the Duty Holder’s direct control. The scope will need to include the many units operating different aircraft under a group organization within the Duty Holder’s responsibility to achieve his overall objective. Therefore, the proposed safety case to be developed will be for a system of systems. Warren’s project was deliberately restricted as it was developing a systematic process to allow safety case reuse for a military ATC system across differing military units. However, this Project extends the challenge to include all air related activities at units and will include arguments for the operations wherever the aircraft fly, regardless of airspace ownership, or destination or en route airfields used.
This challenging increase in definition is due to the selection of a Duty Holder at a higher level of the organization who is responsible for, and has the authority to make decisions about, the operation even when the airspace management is not under his jurisdiction.

3.5.2. Additional Safety Case Complexity for System of Systems

The reason for the increased scope with the operational safety case is the need to consider the safety standards of other systems that are not under the direct responsibility of the Duty Holder. The Duty Holder’s responsibility for maintaining his safety target for his personnel is not removed when the aircraft crosses into another airspace. There is considerable standardization being imposed within European airspace but, when operating in remote areas or under other forces with a different appreciation of safety targets, the Duty Holder will need to ensure he has the necessary controls in place in the safety case to mitigate the risks.

3.5.3. Safety Case Operational Argument

The safety case developed by Warren did not intend to incorporate all the operational issues. Therefore the Operational Safety Case developed in Chapter 4 will include those aspects identified in paragraph 2.8.2, including the safety arguments for soft issues such as human factors, supervision and safety culture. The safety culture aspects were highlighted by J McDermid in Section 9 of [Ref 15] as distinctive customs, achievements, products, and outlook of an organization with respect to safety. In practice, there may be different degrees of safety culture and it will be necessary to incorporate a process to measure the success of the safety culture achieved in an operational environment. This will allow a degree of measurement to assess if the required level has been achieved and will provide appropriate evidence for inclusion in the safety case. One metric for identifying the level of safety culture in an organization could be the number of human factors reports raised on incidents. However, it was evident from paragraph 2.5 of this Report that the human factors incidents had not been consistently identified through the normal reporting processes; therefore some other indicator should be investigated, which may be suitable for further work to be discussed in Chapter 6.

3.6. The Proposed Safety Case Reuse Evaluation

In his project [Ref 42], Warren developed a safety case reuse process that he intended to apply to similar domains. However, his choice of new domain was so similar to his original domain that he found the process to be unjustified. Therefore in this project, as discussed in section 3.5, the safety case will be developed for a domain that is an extension of his baseline safety case.
This Project will evaluate the possible use of Warren’s process on a larger scale. It is intended to use his five-step Safety Argument Reuse (SAR) process (summarized in Paragraph 3.6.2 for clarity). In order to trial the process in a realistic setting and to avoid his dilemma when working with a very similar domain, it was considered that the system of systems operational safety case would provide a challenging trial. Much of the extended domain used in the system of systems would be external to the argument Warren developed in his project (reproduced at Appendix C). Therefore, it was decided for the purposes of the evaluation to carry out the Reuse steps only on the ‘organizational argument structure’ of his safety case, with particular focus on ‘resources’ and ‘competency’. The structures are reproduced in paragraphs 3.7.1 to 3.7.5 with the SAR process carried out as part of this Project superimposed on the elements. The new operational safety case developed in Chapter 4 will, where possible, make use of these argument structures modified by the SAR process.

3.6.1. The Warren Safety Argument Reuse Process Background

The Warren SAR process was derived from the safety case maintenance process described by T Kelly and J McDermid [Ref 43], which was developed to enable changes over time such as modifications, regulatory amendments or societal influence, to be considered and reflected in the GSN safety case. This is an incremental process and Kelly intended the process to provide a systematic auditable methodology that ensured the safety argument was not undermined. A diagrammatic view of the Kelly safety case change process is shown at Figure 6. It is considered important to emphasize that the steps in the Kelly process have specific meanings, which in most cases are different from those in the Warren SAR process. However, the Kelly process is considered fundamental to the long-term maintenance of safety cases and will be referred to at the end of this Project as a suitable management process for large-scale operational safety cases where the need for traceability is paramount.
Figure 6 Kelly Safety Case Change Process
[Figure 6 diagram: five steps in sequence, grouped into a Damage Phase and a Recovery Phase. Step 1 Recognise Challenge to Safety Case; Step 2 Express Challenge in GSN Terms; Step 3 Use GSN to Identify Impact; Step 4 Decide upon Recovery Action; Step 5 Recover Identified Damaged Argument. Labelled products: Safety Case Challenge, GSN Challenge, GSN Impact, Recovery Action.]

3.6.2. The Warren Five-Step SAR Process

The Warren Five-Step SAR Process is briefly summarized as follows. The steps have different meanings from those in the Kelly model in Figure 6:
• Step 1 – Recognize Challenge to the Safety Case. The aim of this step is to recognize the challenge to the safety case posed by the differences in the safety cases from intersecting domains. This may be considered obvious from the chosen domains and the challenges can be compared with the results from the next steps to provide a form of validation.
• Step 2 – Express Challenge in GSN Terms. The aim of this step is to develop the challenges exposed in Step 1 into a GSN argument. This is regarded as a fundamental step to the success of the process and should be completed thoroughly before moving on to the next step. However, it is expected that a degree of iteration may be required.
• Step 3 – Use GSN to Identify Impact. This step aims to assess the changes required to the argument structure due to the challenged elements. This step is significant where the domains are sufficiently dissimilar and may prove to be the Achilles heel of the Five-Step SAR Process, where too many challenged elements would reduce the benefit of reuse.
• Step 4 – Decide upon Recovery Action. The aim of this step is to decide the recovery action for the argument; together with Step 3 it provides the bulk of the work within the process. After the deletions have been carried out, the safety argument must be repaired. If the argument has been weakened it may be necessary to introduce diversity, or to improve or provide additional evidence. The context of the structure also needs to be scrutinized to ensure that all child goals under the context are still valid.
• Step 5 – Recover Identified Damaged Argument. This process is undertaken via a top down approach starting with the highest-level claim that is challenged. This process uses the original GSN process rules shown at paragraph 3.4.1 and is continued down the argument until all claims are adequately supported by appropriate evidence.
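The top-down check described in Step 5 can be sketched as a traversal of the Solved By links that reports any claim left without supporting evidence. This is an illustrative sketch only: the function name undeveloped and the dictionary layout are hypothetical, and the wiring of the Figure 9 elements used in the example is assumed. The example treats solution E3.2.6.1 as already withdrawn, so goal G3.2.6 is left unsupported.

```python
def undeveloped(top, solved_by, solutions):
    """Walk down from `top` and return elements that end a branch without evidence."""
    found, visited, stack = [], set(), [top]
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # shared evidence and circular references are visited once
        visited.add(node)
        children = solved_by.get(node, [])
        if not children and node not in solutions:
            found.append(node)
        # Reverse so that children are examined in their listed order.
        stack.extend(reversed(children))
    return found


# Assumed wiring of the Staff Competence argument (element ids from Figure 9).
solved_by = {
    "G3.2": ["S13", "S14"],
    "S13": ["G3.2.1", "G3.2.2", "G3.2.3"],
    "S14": ["G3.2.4", "G3.2.5", "G3.2.6"],
    "G3.2.1": ["E3.2.1"],
    "G3.2.2": ["E3.2.2"],
    "G3.2.3": ["E3.2.1"],
    "G3.2.4": ["E3.2.4.1", "E3.2.4.2"],
    "G3.2.5": ["E3.2.5.1"],
    # G3.2.6 has no support once E3.2.6.1 is withdrawn.
}
solutions = {"E3.2.1", "E3.2.2", "E3.2.4.1", "E3.2.4.2", "E3.2.5.1"}

print(undeveloped("G3.2", solved_by, solutions))
```

A check of this kind only confirms that every branch terminates in a solution; whether the evidence is appropriate or sufficient remains a matter of judgement.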
3.6.3. Conclusion

Kelly’s process for GSN safety case change is viewed as an incremental process that is applied to a given safety case. The changes are easily recognized and the changes to the argument structure would be expected to follow the expectations and regulatory disciplines of the parent safety case. Conversely, Warren’s objective was for his SAR process to be used to allow an argument to be transferred between domains. This he achieved in the specific area of local airspace management crossing the civil to military regulatory boundaries. It is considered feasible for the process to be extended to a larger system of systems. Therefore, the process will be trialled on a portion of Warren’s ATC Safety Case that is common to the larger operational safety case to be developed in Chapter 4.

3.7. Safety Case Reuse Trial using the SAR Process

The 5 Steps of the Warren SAR process will be used on the Military ATC Safety Case developed by him for RAF Waddington. The GSN argument was analyzed spine by spine to identify which elements and arguments were either obsolete, required changing or were additional to the new argument. It was appreciated that additional structure or elements may be required to satisfy new requirements. As it was intended to extend the scope and the domain of the safety case, it was anticipated that the additional structure required would be significant. The convention used by Warren was to mark those challenged elements mapped onto the original argument with red crosses; those elements that are additional or obsolete are marked with a blue circle; and any additional structure required is marked with a green triangle.

3.7.1. Step 1 – Recognize Challenge

Using the selected organizational argument structure of ‘resources’ and ‘competency’ (Figures C-1, C-4, C-9 and C-10 in his mapping convention), the following GSN diagrams transferred from Appendix C have been subjected to the SAR steps using the criteria relevant to this Project.
Figure 7 High Level Claim Structure Challenged
In the operational system of systems safety case, it was expected that unit-specific elements would require amendment; consequently, C1 in Figure 7 is challenged. The 3 strategies are challenged simply to amalgamate them into a single strategy. Finally, it is considered that the context C7 should be located higher in the argument to apply to Goals G2 and G3.
[Figure 7 diagram elements]
G1 - The unit is acceptably safe to operate
C1 - Unit = RAF Waddington ATC System and external facilities on which it depends
C2 - Operate = Provision Local, Approach and Zonal Services within area of responsibility
C3 - Acceptably safe means meeting the requirements of DEF STAN 00-56, JSP 552 and RAF ATC Orders
S1 - Argument by addressing all system elements
S2 - Qualitative argument by appeal to depth of defences
S3 - Argument by addressing all organisational aspects
G2 - The system is acceptably safe throughout its life
G3 - Organisational aspects are acceptably safe
G4 - The ATC system safety nets provide barriers between hazards and losses
C4 - System configuration as defined in JSPs and local orders
C5 - Organisational aspects are the management structure, processes and people required to support the operation
C6 - Identified hazards for unit ATC system
C7 - ATC system description
C8 - Operation under normal, abnormal & emergency modes
In Figure 8, it is considered that G3.4 and G3.4.1 may be part of the Staff Competence argument. Solution E3.4.1 may require additional audit evidence.
[Figure 8 diagram elements; some labels are clipped in the source]
G3 - Organisational aspects are acceptably safe
S7 - Argument to appeal to adequacy of management
S8 - Argument by appeal to adequacy of process
S9 - Argument by appeal to adequacy of staffing
G3.1 - The organisational structure supports a safe ATC operation
G3.2 - Staff are competent to perform their operational duties
G3.3 - The processes and procedures are designed to ensure they are adequately safe
G3.4 - Users are cognisant in the use of operational procedures
G3.5 - All changes to proce[...] include safety assura[...] about their adequac[...]
G3.6 - All staff resources are in place to support the operation
G3.4.1 - The users have been trained to use the procedures where appropriate
G3.4.2 - Staff are made aware of the safety risks associated with safety related procedures
C12 - Users include operations and maintenance staff
C14 - Adequacy indicates the processes are in place and are working
C15 - Processes include operational, contingency and abnormal procedures, safety management processes, capacity monitoring, configuration and change management
E3.4.1 - Evidence of COPP, HFOR and OSCAR procedures
E3.3.1.2 - All procedures tagged with their safety significance
Mapping diagram references: Fig C-1, Fig C-5, Fig C-6, Fig C-7, Fig C-9, Fig C-10
Figure 9 Staff Competence Argument Structure Challenged

It is considered that C23 could be extended to be explicit about the review process including an audit process. The evidence of staff competence achieved described in E3.2.6.1 could be converted to a goal to provide an opportunity for each element of competency achieved to be provided in a solution.
[Figure 9 diagram elements]
G3.2 - Staff are competent to perform their operational duties
C16 - Adequacy of staffing includes staff recruitment, competence, motivation, supervision, levels & allocation
C21 - Competent means being adequately qualified to complete an assigned task(s)
C22 - Staff include managers, Air Traffic Controllers & engineers directly involved in the provision of the air traffic service
C23 - 'when appropriate' is a result of incidents, near misses, or following system, procedure or organisation changes
S13 - Argument by appeal to adequacy of competence requirements
S14 - Argument by appeal to achievement of competence requirements
G3.2.1 - Competence requirements are managed through clearly defined processes
G3.2.2 - Operational roles / functions have well defined competency requirements
G3.2.3 - Competence requirements are reviewed and changed when appropriate
G3.2.4 - Staff are recruited / selected with the appropriate skills and attributes to undertake activities and functions
G3.2.5 - Staff are trained to meet the competence requirements
G3.2.6 - Staff competence is monitored and reviewed in operations
E3.2.1 - Evidence of competence mgmt processes (shown several times in the diagram)
E3.2.2 - Job descriptions define competency requirements
E3.2.4.1 - Personnel selection process
E3.2.4.2 - Psychometric testing
E3.2.5.1 - Evidence of competence requirements feeding into training requirements
E3.2.6.1 - Evidence of individual staff competence achievement
Mapping diagram reference: Fig C-4
Figure 10 Staff Support Arguments Challenged
It is possible that the G3.2.5 Training Goal embraces G3.6.1, and E3.2.6.1 should be included under G3.2. Consideration should be given to introducing a context to define the contractors’ employment specialization unless this was implicit in one of the higher contexts.
[Figure 10 diagram elements]
G3.6 - All staff resources are in place to support the operation
C24 - Defined operating hours of the unit
C25 - Staff resources from all operational areas, including ATC, engineering and facilities management
C26 - Planning of staff workload is both tactical & strategic
C27 - Scheduling makes allowances for rest periods and scheduling legislative requirements
C31 - Planning of required staff takes into account the different skill set / competence requirements
S15 - Argument by appeal to meeting demand
S16 - Argument by appeal to supply of appropriate skills
(unlabelled goal) - Support for systems on which the unit depends are in place and agreed
E3.6.2.2.3.2.1 - Engineering workload predicted using knowledge, experience & JSPs
E3.6.2.2.3.2.2 - Maintenance requirements clearly defined using ILS techniques
E3.6.2.2.3.2.3 - Contractor support is in place
E3.6.2.2.2.3.1 - Support for systems is clearly defined
E3.6.2.2.2.4 - Flight planning data is available
3.7.2. Step 2 – Express Challenge in GSN Terms

In Figure 7 to Figure 10, the elements challenged are identified in accordance with the SAR process. The Step 2 process, with the justification, is presented in Table 2 where the challenged elements are identified in GSN terms. The comment column gives the GSN impact for each element changed or the requirement for a new structure.

Military ATC Element | Challenge | Operational Safety Case Requirement | Comment
Fig 7 C1 | Context is unit specific and includes external facilities | The context will require instantiating for all units and additional context required to define the external facilities | New Context element required
S1, S2, S3 | 3 single Strategies do not add to the total argument | Provide a single strategy to show how G1 is argued | New Strategy required under G1
C7 | The ATC system description is required for the other goals G2 and G3 | ATC system description is required for Goal G3 being evaluated | Link C7 to G1
Fig 8 G3.4 | This goal would be more appropriate in the Staff Competence Argument | Users’ understanding of procedures is a competence issue | Reposition under G3.2
G3.4.1 | Training is a competence goal | | Reposition under G3.2
E3.4.1 | Evidence of the procedures does not solve the goal of training | Evidence of correct use of procedures would require audit for quality of completed documents | Introduce further evidence to solve the argument
Fig 9 C23 | Context requirements for review do not include periodic audit | Requirement for continuous improvement | Include a periodic check of competency adequacy through audit
E3.2.1 | Evidence of competence management process is a goal | Management process requires set of solutions to satisfy the process requirements | New argument structure required
Fig 10 G3.6.1 | Training is already covered in G3.2.5 of Fig 9 | Delete G3.6.1 and supporting Elements | Check training requirements for G3.2.4
E3.2.6.1 | Evidence of competence not required under staff resources | Individual Competency evidence included under G3.2 | Evidence of Competence included under G3.2.6
G3.6.3 | Contractor staff are competent | Not apparent where the staff are employed - engineering or ATC | Context for G3.6.3 Contractor Staff required

Table 2 Elements Challenged in the ATC Safety Case
3.7.3. Step 3 – Use GSN to Identify Impact

This step requires spinal and contextual impacts to be identified. The step process will be limited to Figure 9 to provide an indication of the extent of the work. It became apparent at this point that the reuse of the original argument structure was not providing the benefits anticipated; this will be discussed further in the summary. The results appear at Table 3.

Element Number | Spinal Impact (Propagation of challenges to Goals, Strategies and Solutions) | Contextual Impact (Propagation of challenges to Context, Models, Justifications and Assumptions)
C23 | | Elements below this context will need to reflect additional audit checks in the review process
E3.2.6.1 | This solution will be changed to a goal and the spinal impact will be reflected in the new structure below |

Table 3 Spinal and Contextual Impact
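The two kinds of impact in Table 3 can be illustrated by propagating a challenge along the links of the argument. The sketch below is one possible reading and not the procedure defined in [Ref 41] or [Ref 42]: it assumes that a challenged solution puts in question every goal and strategy above it, and that a challenged context affects the element it is attached to and everything beneath that element. The function names and the wiring of the Figure 9 elements (including the attachment of C23 to G3.2.3) are assumptions.

```python
def spinal_impact(challenged, solved_by):
    """Goals and strategies above `challenged`, nearest first."""
    parents = {}
    for parent, children in solved_by.items():
        for child in children:
            parents.setdefault(child, []).append(parent)

    impacted, queue = [], list(parents.get(challenged, []))
    while queue:
        node = queue.pop(0)
        if node not in impacted:
            impacted.append(node)
            queue.extend(parents.get(node, []))
    return impacted


def contextual_impact(context, in_context_of, solved_by):
    """Elements carrying `context` directly, followed by everything below them."""
    impacted = []
    queue = [elem for elem, ctxs in in_context_of.items() if context in ctxs]
    while queue:
        node = queue.pop(0)
        if node not in impacted:
            impacted.append(node)
            queue.extend(solved_by.get(node, []))
    return impacted


# Assumed fragment of the Figure 9 structure.
solved_by = {
    "G3.2": ["S13", "S14"],
    "S13": ["G3.2.3"],
    "S14": ["G3.2.6"],
    "G3.2.3": ["E3.2.1"],
    "G3.2.6": ["E3.2.6.1"],
}
in_context_of = {"G3.2.3": ["C23"]}

print(spinal_impact("E3.2.6.1", solved_by))
print(contextual_impact("C23", in_context_of, solved_by))
```

The length of the returned lists gives a rough indication of how much of the parent argument must be revisited for each challenged element.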
The next phase of Step 3 is to check for the evidence impact effect. This step required knowledge of the evidence available and the expectation that the evidence in the original safety case was explicit. Unfortunately, the type of evidence provided in the solutions was not easily identified and insufficient domain knowledge was available at this stage to make sound comment on all solutions. However, it was judged that solutions worded ‘Evidence of competence requirements feeding into training requirements’ would not be used in the new safety case. It was anticipated that the evidence would be available in electronic or paper form and provide actual results of audits, a defined publication or authorized process, with evidence that the process was successful. Therefore, in view of the lack of domain knowledge and expected evidence, this part of Step 3 will be omitted, as it will be rectified in Steps 4 and 5.

3.7.4. Step 4 – Decide upon Recovery Action

This Step selects those branches of the safety argument that require repair work to provide the support for the higher level claims. As an example of this phase of the process, the selected element, solution E3.2.6.1 (Evidence of Individual Staff Competence Achievement), from Step 3 in Paragraph 3.7.3 will be developed to form the following structure. The new structure has been developed from information gathered at the units and from the ATC Examining Board (ATCEB) as part of this Project, independently of Warren, and is shown in Figure 11.
Figure 11 Staff Competence Argument - New Structure
[Figure 11 diagram elements]
Strat ATC Training - Strategy over ATC Initial Training and Training at Units
StdMaint - ATC Competence Maintained
StdStnInitTrg - Standard of competence achieved during initial Station Training
GroundSchool - Completion of Stn Ground School
LocalExam - Pass the practical assessment and Working Knowledge Examination
TrgObject1 - Training Objectives completed for Part Task Trainer (Simulator)
TrgObject2 - Training Objectives completed for Live OJT
StdCheckInt - Internal Standards Check passed
StdCheckExt - ATCEB Standards Check passed
ATCPerfStnd - ATC Performance Standard Decode
GrndSchCERT - Ground School Certificate
WklyTrgSumm - Weekly Training Summary
ATCTrgRecord - Individual ATC Training Record
F8000 - Individual Form 8000
F5994 - F5994 Certificate of Competency

3.7.5. Step 5 – Recover Identified Damaged Argument

This step provides assurance that the argument is complete by a systematic process taking a top down approach to ensure that the available evidence adequately supports the claims. The argument must be complete, relevant and have sufficient quality to support the top-level goal. In this case, the ‘Staff Competence Argument’ will be analysed in the light of the changes made in Step 4. In developing the new structure, it was evident that the training aspects were integral with competence monitoring since the training, both initial and on the job training (OJT), resulted in either examination or assessment. Since the initial and continuation training was essential in achieving competence, the structure was further adjusted and Goal G3.2.5 could be deleted.

3.8. SAR Process Evaluation

The SAR process was performed on a part of a safety case structure that had the potential for reuse in a larger system of systems safety case being produced as part of this Project. The Warren SAR process provided a logical method to ensure that all features of the parent structure were scrutinized for suitability and relevance in the new structure being developed. However, there were drawbacks in that the difference between the parent and clone domains should not be too great. The trial used in this project was probably too ambitious and a more incremental approach would have provided a more convincing argument to prove the value of the process. Consequently, it was considered that three main areas should be considered when contemplating using the process. These were:
• The benefits to be gained when constrained by the rigour of process.
• The effect of extending the scope of the safety case.
• The degree of diversity between the domains in terms of output, organizational structure or regulatory limitations.

3.8.1. Rigour of Process

The employment of the SAR process provided a valuable function in sustaining the rigour required to generate the rationale behind the challenges and changes to the evidence and argument structure. This was an important requirement in safety case development and provided a thorough auditable database of the decisions made. It was found that the systematic approach of the SAR process provided all the links and evidence that would be vital when progressing to a safety case maintenance programme using the Kelly process [Ref 43].

3.8.2. Extension of Scope

The Domain of ATM was extended in the new safety case to embrace numerous units within the responsibility of the new Duty Holder and the new argument tended to be easily adapted. However, the new safety case author may have different ways of expressing the same argument and it was necessary to understand fully the rationale behind the parent argument first. The extension of the scope to include operations in airspace controlled by ATM outside the direct responsibility of the Duty Holder also needed to be incorporated into the safety case. This was required to provide the evidence that duty of care was being maintained when operating in airspace subject to different regulatory instruments. This aspect was more difficult to structure using the SAR process and had to be structured from basics. Furthermore, many of the new goals were not satisfied by the original solutions and additional evidence was required. This was to be expected and is not a disadvantage of the SAR process but needs to be taken into account before adopting the process. The effect of this consideration is proportional to the degree of scope extension. There would come a point when the benefit of reusing the parent safety case was insignificant compared with the degree of original work required to structure the new portions of argument.

3.8.3. Degree of Diversity

The parent argument structure provided a starting point for the operational safety case in the area of ATM, but the diversity introduced by the complex system of systems required a wider approach such as grouping all ground competence arguments together instead of single strands for each specialization. Conversely, the aircrew competence required the development of a completely new argument structure because of the complexity of the competency management system. Therefore, the benefits from using the SAR process are limited when the diversity of domains is large.

3.8.4. SAR Summary

The 5 Step SAR process was employed over a proportion of the Military ATC safety case to assess the viability of extending the safety case to a larger domain. The formalized reuse process provided an audit trail for the justifications developed for each challenge and the recovery action, and the benefits of this database would be realized during the safety case maintenance programme. It was considered that the trial performed in this Project was too ambitious because the domain and scope were appreciably greater in degree and complexity, which masked any appreciable savings in resources. However, it would be expected that the reuse of a safety case with incremental differences would be viable and realize savings in the resources required to develop a clone safety case. This process would provide a guide to a suitable strategy to support the top-level claims but care should be taken to ensure that the rationale behind the GSN argument is completely understood and is compatible with the new organization. Furthermore, the identification of evidence in a new domain, especially in a legacy system, is a resource-hungry part of safety case development; therefore, significant savings may be made in using the evidence data to simplify the search process. In conclusion, where the organization domains were similar, such as oil rig platforms producing similar products in similar conditions, the SAR process would be beneficial, but where the differences were too great, the benefits would be proportionally less.

3.9. Appraisal Conclusions

This Chapter has investigated the requirement for safety cases in the military environment and examined how the operational factors previously developed in Paragraph 2.8.2 were incorporated in safety cases. The factors identified were:
• Equipment – where it impacts operations such as where the equipment limits the operational requirement
• Procedures – clearly defined for use by the operators and maintainers. Guidance on the type and structure of training through life.
• People – Establishment against strength and level of competence required are apparent. Management of change, supervision and responsibility. Attitude to whistle blowing.
• Safety Management System – Safety management organization role defined. Approval routes. Safety monitoring system that defines how incidents are handled. Information feedback system. Safety documents maintenance process.
• Safety Culture – CAP 712 [Ref 12] stresses the importance of safety culture in maintaining an acceptably safe system.
The Chapter applied the operational safety case requirements to the London Underground textual safety case to identify if the requirements had been incorporated. Finally, a process of Safety Argument Reuse was evaluated to assess the benefits that may be realized when developing large-scale safety cases.
3.9.1. Textual Safety Case Appraisal
One large textual safety case was appraised for compliance with the operational safety case requirements. It was found that there was a large degree of compliance, but the argument was not easy to follow, which hindered the identification of the argument strategy and the evidence. This led to the decision to use GSN when developing a large-scale operational safety case.
3.9.2. GSN Safety Case Appraisal
The next stage was to assess if a safety case, developed in GSN for a similar domain, would be suitable to form the basis for the operational safety case to be developed for this project. The Military ATC safety case produced by Warren was exposed to his SAR process [Ref 42] in the context of the system of systems operational safety case proposed in this Project. The use of GSN was confirmed as the most explicit methodology to analyse the safety argument, but there were significant changes to be made to the argument and evidence to accommodate the change in focus of the new safety case.
3.9.3. Safety Culture Assessment
The safety cases did not cover all the issues and, in the case of safety culture, it was difficult to identify the type of evidence that should be used. This aspect warrants further work to address the evidence required to support the claim that ‘the organization has an acceptable safety culture’. The operational safety case will need to embrace the softer issues surrounding human factors both with the maintenance staff and the operators. Furthermore, where the duty holder is remote from the operators, and supervision less easy to exert, the claim that ‘supervision is acceptable’ would require special handling. An example of such a scenario would be where commanders of aircraft are operating far from the home base, or in a rapidly changing environment such as might be expected in a search and rescue setting.
3.9.4. Safety Argument Reuse Assessment
The project identified the benefits of employing the SAR and the factors to be considered before contemplating the use of the SAR process. These factors were: the benefits of using a rigorous process that would assist in future safety case maintenance, and the disproportionality between both the domain and scope of the parent and clone safety cases. In view of the disparity between the ATC safety case and the system of systems safety case, it was decided to use the traditional GSN approach. If the safety case extension had been more incremental, the SAR process would have realized significant savings in resource. Considering the complexity of the domain and the requirement to view it as a system of systems, it was decided to employ a traditional approach to develop the operational safety case for a flying operation with multiple aircraft types and multiple units. Therefore, the safety case to be developed in Chapter 4 will use the process with the following phases:
• Review Documented Requirements – Review the standards, regulations, orders and procedures developed for the domain.
• Elicit Information – Conduct interviews at all levels and in all relevant parts of the domain. This is extremely resource intensive and requires a degree of skill from the facilitator to tease out the important information. It helps to have a plan of the claims important to the safety case top-level goal, but it is inevitable that further goals will be required to support the argument as the facilitator gains more domain knowledge.
• Structure Safety Case in GSN – The methodology developed by Kelly [Ref 41] and described in paragraph 3.4 will be used to develop a safety case argument in GSN as this is more appropriate for an operational safety case covering a system of systems. A safety case of this magnitude would be difficult to follow in a textual document and there would be a high probability of errors and omissions that would be difficult to identify. Additionally, a GSN safety case would provide an opportunity for a more systematic approach to safety case maintenance.
Operational Safety Case Development
4. To discuss the development of operational safety cases and risk assessment in high-risk Systems
4.1. The Operational Safety Case Development Process
The domain chosen as the subject of the Operational Safety Case Development process was the management of a group of flying units operating different types of military aircraft in a variety of roles. In order to keep the project in the public domain, the detail of the roles and the types of aircraft have been omitted. However, the processes and regulatory aspects, where they are pertinent to the operational safety case development, have been included. The GSN safety case development process used in this project was described in Chapter 3 and was based on the (now) traditional GSN lines. The gathering of information to develop the argument and identify evidence involved considerable travel to 10 units from Morayshire in the north to Cornwall in the south and Anglesey in the west to Suffolk in the east. As discussed in Paragraph 3.4, the GSN safety case development process required the following elements.
• Review Documented Requirements
• Elicit Information
• Structure Safety Case in GSN
4.1.1. Review Documented Requirements
The standards, regulations and orders appertaining to the military environment were laid down in the Defence Standard 00-56, JSP 553 and Project Oriented Safety Management System Manual (POSMS) for the equipment procurement and through life management. The operating orders were laid out in the JSP 550 series and local orders, with a remit to comply with the Air Navigation Order (ANO) wherever possible. In addition, the CAP 712 was used as guidance for the elements required in a safety management system where these were not fully explicit in the military documents. There was a move to work to the civilian orders where possible and only military where necessary but this was not reflected in this project. Numerous units’ and headquarters’ orders and instructions were analysed and incorporated in the overall structure requirements but these are omitted from this Project for clarity.
4.1.2. Elicit Information
The main form of information gathering was by interview with key personnel at many levels in the units and headquarters. The level of success of this process depended on the facilitator having a reasonable knowledge of the domain. However, where this was not possible it was constructive to have a subject matter expert at the meeting to draw out the important features of the operating and management process. Extensive notes were made and the discipline of maintaining an auditable database was paramount. Some of the units carried out similar functions in slightly different ways and the database was useful to compare and contrast the processes. The anomalies were referred to the next level and the preferred structure was used in the argument with a note indicating how the process was to be audited in the future. This cross checking was part of the organization’s normal audit function and often led to processes changing to accept best practice; however, the audit function was outside the scope of this Project. In the ATM domain, it helped to have a plan of the claims, derived from the SAR process, supporting the safety case top-level goal, but it seemed inevitable that the pattern needed to be modified to correspond with the larger domain.
4.1.3. Structure Safety Case in GSN
Using the methodology developed by Kelly [Ref 41] and described in Paragraph 3.4, the 6 steps to preparing a GSN safety case were employed as follows:
• Identify goals to be supported.
• Define basis on which goals stated.
• Identify strategy to support goals.
• Define basis on which strategy stated.
• Elaborate strategy.
• Identify basic solution.
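The steps listed above map naturally onto a small tree of goals, strategies and solutions. The following is a minimal sketch in Python and is not part of the tooling used in this Project: the class Node, the function undeveloped_goals and the identifiers G1, G2, G3, S1 and Sn1 are hypothetical names chosen purely for illustration. It shows one way such a structure might be held and how goals that still lack supporting argument or evidence could be listed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One element of a goal structure (illustrative only)."""
    ident: str
    kind: str  # assumed values: "goal", "strategy" or "solution"
    text: str
    context: List[str] = field(default_factory=list)      # basis on which the element is stated
    children: List["Node"] = field(default_factory=list)  # supporting elements


def undeveloped_goals(node: Node) -> List[str]:
    """Return identifiers of goals that have no supporting strategy, goal or solution."""
    found = []
    if node.kind == "goal" and not node.children:
        found.append(node.ident)
    for child in node.children:
        found.extend(undeveloped_goals(child))
    return found


# Hypothetical example: one top-level goal, one strategy, two sub-goals.
top = Node(
    "G1", "goal", "Operations are acceptably safe",
    context=["Definition of acceptably safe"],
    children=[
        Node("S1", "strategy", "Argument over supporting goals", children=[
            Node("G2", "goal", "Equipment is safe to use", children=[
                Node("Sn1", "solution", "Approved platform safety case"),
            ]),
            Node("G3", "goal", "Safety culture is maintained"),  # no support yet
        ]),
    ],
)

print(undeveloped_goals(top))
```

In this hypothetical structure only G3 is reported, because it is the one goal with neither a strategy nor a solution beneath it. A listing of this kind could support safety case maintenance by showing where the argument remains to be developed.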
4.2. Development of an Initial Operational Safety Case in GSN for a Multiple Unit Flying Organization
The argument relies on the strategy that operational safety risks can be managed and maintained ALARP provided the supporting goals are fulfilled. The next paragraphs detail the safety case argument structure, and expand progressively each of the supporting goals starting with the top-level goal.
4.2.1. Top-Level Safety Case Argument
The summary of the top-level safety case argument is shown below as a goal structure with the context and assumptions removed from sub goals for clarity. The plan superimposed on each figure identifies where the diagram on the page belongs in the structure. The numbers refer to the number in each figure title. However, where a sub goal and associated structure could be included in the parent structure diagram without loss of clarity, it was included to reduce the number of pages.
Figure 12 Top Level Safety Case Argument – 1
The top-level argument includes the ‘equipment safe’ goal, but this could be attached as an assumption or an away goal because the safety management is carried out by the IPT. Furthermore, the RTSA is not directly responsible for the operational duty and is therefore not the Duty Holder for the operational safety case. However, the RTSA is responsible for the limitations applied to the equipment’s usage. It has been found convenient to include the equipment detail in the operational safety case as the electronic version of the operational safety case can contain a hyperlink to the equipment safety cases and the RTS documents, which are required for reference by the aircrew. In addition, the Duty Holder is required to have visibility of the equipment hazard log as mitigations can impact on operational effectiveness.
4.2.2. Equipment is acceptably safe to use
The Equipment argument differentiates between airborne equipment and ground-based equipment. The aircraft platform argument is supported by the aircraft goals in the context that they are operated in accordance with the SOIU and the additional limitations that may be incorporated in the RTS. The RTS is prepared under the arrangements of the Generic Aircraft Release Process (GARP) and is the responsibility of the RTSA with the supporting evidence provided by the IPTL and incorporated in the Aircraft Platform safety case.
[Figure 12 diagram: goal structure node text as recovered from the figure]
• AirOpSafe: All system units' Air Operations are acceptably safe
• EquipSafe: Equipment is safe to use
• TolDiff: Tolerability differs between operational and non-operational activities
• SysDef: All components of the system defined in Org Manual
• OpDef: All operational activities defined in Org Manual
• TolSafeDef: Definition of 'Tolerably Safe' determined from JSP 550 and SMS
• MaintStandard: Standard of aircraft maintenance ensures continued safe operations
• GrndResSuff: Org Ground resources are sufficient to meet tasking safely
• MissionSupport: Mission support functions sustain safe operations
• SafetyMgtPrac: Org operational safety is managed in accordance with good practice
• AircrewOpSafe: Sufficient Aircrew to operate safely
• OpTarget: Operational activities defined as acceptably safe if a probability of 1 accident per y fg hr is achieved
Figure 13 Equipment Safety Case Argument – 2
In Figure 13 above, the assumption that the Aircraft Document Set (ADS) reflects the as-flown condition implies correctly that the IPTL is responsible for the ADS and has Customer Supplier Agreements (CSAs) in place to ensure that it is current. The Platform Hazard Log is also maintained by the IPTL but any risks that need to be incorporated in the operational Hazard Log are notified to the Operating Organization under the terms of a CSA.
[Figure 13 diagram: goal structure node text as recovered from the figure]
• EquipSafe: Equipment is safe to use
• AircraftSafe: Aircraft Platforms are safe to carry out operations within Org
• SupportEquip: Support equipment and facilities are safe
• AircraftOpDef: Aircraft operations defined by and within the environment specified in the respective platform SOIU
• OperationsLimit: Operations constrained by the limitations, cautions and warnings contained in the aircraft RTS
• AcTypeXSafe: Aircraft Type X safe to operate
• AcTypeXSafCas: Aircraft Type X platform Safety Case has been approved by Safety Management Board
• AcTypeXADS (assumption): Ac type X ADS maintained in line with as (text truncated in the figure)
• AcTypeXRTS: Ac Type X RTS issued
• AcTypeXRTS: Aircraft Type X RTS
• AcTypeXHL: Aircraft Type X Platform Hazard Log
• Ac Type XSC: Aircraft Type X Platform safety case
4.2.3. Maintenance Standards Argument
The Standard of aircraft maintenance ensures continued acceptably safe operations. This top-level goal is supported by a strategy that there are quality systems in place both at the units and coordinated by the Operating Organization. The safety case does not reflect the recent changes introduced by the transfer of Depth Maintenance to the DLO, as the change was made after the structure was drafted. However, the Safety Case may be changed as part of the maintenance process developed by Kelly [Ref 43]; it would require the introduction of an Assumption or Away Goal.
Figure 14 Maintenance Standards Argument – 3
4.2.4. Ground Resources Support Operations Safely
This is a particularly complex goal and further work is required to identify how the establishments (number of personnel required to achieve their work safely) for some of the operations have been derived. Under each organization, there is a requirement for an impact statement to notify the Duty Holder of safety risks that would impinge on the achievement of the overall safety criteria due to insufficient staffing. The goal structure is at Figure 15.
[Figure 14 diagram: goal structure node text as recovered from the figure]
• MaintStandard: Standard of aircraft maintenance ensures continued safe operations
• QualSysStrat: Argument over Org quality system and Unit implementation
• QualSysDef: Quality system defined in JAP 101A and 101B
• OrgISO9000: Org quality system maintained IAW ISO9000
• MaintPol: Required maintenance standard is defined by MoD Policy
• UntitOrgQual: Unit organization supports safe maintenance
• UnitMaintQS: Unit Maintenance quality system
• OrgQualSys: Org quality system maintains required level of maintenance standards
• QReports: Quality reports are managed
• Aircraft MPS: Aircraft are maintained iaw IPT Maintenance Policy Statement
Figure 15 Ground Resources Argument – 4
4.2.5. Aircrew Resources Operate Safely
The Aircrew Resources goal structure to argue that aircrew operate safely has been investigated in detail and a complex goal structure was developed to argue sufficiency and competence. The competence strand incorporates the requirements for each facet of training and argues that aircrew are also experienced and current. In addition, an effective level of supervision is a very important factor in maintaining safe operations. The structure for this argument appears at Figure 16. The next level below this structure continues on Figure 17 and Figure 18.
[Figure 15 diagram: goal structure node text as recovered from the figure]
• GrndResSafe: Ground resources support operations safely
• AdeqRes: Org air operations resources are adequate to support safe operations
• PersEquipStrat: Argument over personnel and equipment
• HumRes: Argument over Human resources are competent and sufficient for safe operations
• GrndCompRes: Ground Personnel are competent for the role they fulfil
• SuffGroundC: Sufficient groundcrew available
• EquipRes: Equipment provided is sufficient and suitable for safe operations
• Engineering: Engineering personnel have the requisite training and experience
• EngAuth: Engineering authorizations managed to ensure competence
• AirTraffic: Air traffic personnel have the requisite training experience and currency
• OPCap: Operational Capability assessed by SATCO
• SafetyMan: Safety Managers are competent
• ATCEstPeakReg: Establishment of ATC Calculated for peak workload
• EstabAssess: Establishment of air and ground crew assessed by competent organization
• Establishment Review?
• ResConstrain: Resources constrained by budgetary limits
• Postitions: Positions dictate number of controls required
• AP3392: AP3392 provides the establishment calculation
• AuthReq: JSP100A-01 defines the general requirement
• AP100A-01AuthData: P100A-01AuthDataba (text truncated in the figure)
• SFSOCourse: SFSO Course
• Training Courses: Training Provided for special to type and general basic skills
• IMPact: Impact Statement created when staff believe establishment (text truncated in the figure)
• UnitLUE: Unit LUE provides evidence of establishment
• GPLogsCapDevRep: Gp logs Capability Development Report confirm acceptance of LSAR
• LSAR: Logistics Support Analysis Register
• LeanAssess: Logistics Transformation Programme Assessment
Figure 16 Aircrew Resources Argument – 5
4.2.6. Aircrew Training Argument
The aircrew training argument examines the requirement for training standards to be met at each phase of the training process. The GSN structure appears at Figure 17.
[Figure 16 diagram: goal structure node text as recovered from the figure]
• AircrewResSafe: Aircrew resources operate safely
• FlyPolRegDir: Flying Policy regulation and directives defined in JSP 550
• OrgFlyStd: Org flying standards and directives defined in Org orders
• CompResStrat: Argument over Competent resources and supervision
• AircrewComp: Aircrew are competent to operate on task
• CompCont: Aircrew Competence defined as sufficient training, experience and are current as defined by the Org Manual
• TrgStndStrat: Argument over Aircrew training, standardization and currency
• AircrewTrg: Aircrew receive sufficient training
• AircrewStnd: Aircrew Flying is Standardized
• AircrewCur: Aircrew meet currency requirements
• CurCont: Currency requirements defined in JSP 550 and Org Manual
• SuffAirC: Sufficient aircrew available
• EstAircrew: Aircrew Established by Competent Org
• CFSSuff: CFS Staff Sufficient
• AircrewSupervision: Formal Aircrew supervision maintained
• SafConsid: Ensure Safety Consideration carried out prior to authorization of flight
• PowersofAuth: Powers of Authorization Promulgated by unit
• WhiteTicket: White Ticket Process
• FLACComp: Flying Authorizers Course Complete
• SenSupRec: Senior Supervisor Recommendation
• OrgOrd: Org Orders
• OrgFgOB: Org Flying Order Book
• SAMA: SAMA Records
Figure 17 Aircrew Training Argument – 17
4.2.7. Aircrew Standardization Argument
The aircrew standardization argument examines the standardization of aircrew during their normal tour of operation and ensures that the standard of training reached through the training phase does not diminish over time. The GSN structure appears at Figure 18.
Operations
The goal that mission support functions sustain acceptably safe operations incorporates all the ground based operations in direct support of the flying operations and this includes the Air Traffic Management (ATM) in the local area, which was the subject of Warren’s safety case. However, the operational safety case must include the ATM at the Deployed Operating Bases (DOB) and en route. This is to allow the Duty Holder to exercise his responsibility for the crew under his command while they are operating in remote locations. Other aspects that are not normally considered but are vital to safe operations are the catering requirements when in flight and at bases en route. All the associated orders for safe operations are included in this cluster of goals. The top-level of the Mission Support structure is shown at Figure 19. The next level includes the argument for safe operations at non main operating bases (MOBs) and appears at Figure 20.
[Aircrew standardization diagram (top goal AircrewStnd): goal structure node text as recovered from the figure]
• AircrewStnd: Aircrew Flying is Standardized
• OrgStndSys: Org has an effective standardization system
• StdRqtProm: Standard requirements promulgated in Org training Orders and JSP 550
• StndReqDef: Requirements for and content of standardization flights defined in....
• StndFlt: Sqn Standardization Flights carried out successfully
• HQCFSSdFlt: HQCFS Standardization Flights carried out successfully
• TrgRecMaint: Training Records are Maintained
• TrgRecAud: Training Records Audited by Standards Flt
• FormatCRF: Format of the CRF defined in SFT/1/CR/1
• CRFStud: Course record Folder for students raised for the Duration of the Course
• TrgFolder: Training Folder for qualifications and clearances
• CFSTgRecFold: CFS Training Record Folders
• STANRep: Standards Flt Reports
• F5000: F5000 records
• Logbook: Individual pilot logbooks
• BTRC: Basic Training Card
• LCR - CR: CR Status achieved
• JSP550: JSP550
• OrgTgOrd: Org training orders
Figure 19 Mission Support Argument – 6
4.2.9. Operations from Non Main Operating Bases are Acceptably Safe
The argument that non MOBs are suitable and acceptably safe for aircraft operations is of necessity wide ranging because of the diversity of type of support expected at each location. It may be an unprepared strip or a civil airport; therefore, there is a series of reports and instructions to be taken into consideration before the Duty Holder can be satisfied that the operation will be acceptably safe. The goal ‘Org Action Group assesses known operational hazards’ requires development and it is this that would identify the strategy to argue safe passage through airspace and incorporate mitigations against such hazards as friendly fire, mid air collision or controlled flight into the ground.
[Figure 19 diagram: goal structure node text as recovered from the figure]
• MissionSupport: Mission support functions sustain safe operations
• SafeEnvir: Safe Environment for operations is provided
• OpBases: Operating Bases are suitable and safe
• MOB: MOB is suitable and safe for aircraft operations
• NonMOB: Non MOB is suitable and safe for aircraft operations
• ATMHazStrat: Argument that ATM Hazards are identified and controlled effectively
• ATMSafe: Operational and training tasks performed by the ATM organization are (text truncated in the figure)
• RegFrame: Framework of regulations maintained
• CrewComp: Regulations cover requirements of specific mission for crew competency, environmental min equipment, crew duty times
• CatStand: Catering Standards are set and maintained
• CatStndSet: Catering Standards are set
• CatStandMon: Catering Standards are monitored and maintained
Figure 20 Aircraft Operations from Non MOBs Argument – 14
4.2.10. Operational Safety Management System
The Operational Safety Management is maintained in accordance with the best practice principles of safety management as shown in Figure 21. The goal ‘an acceptable safety case is established’ remains undeveloped until the argument has been satisfied by the production of a complete, audited and approved operational safety case. An important goal in this section is that appropriate safety management and control activities are in place. This requires safety assessments at every stage of operations, which should be recorded and documented in operation orders and similar documents. It also requires suitably qualified personnel to make the assessments and, to this end, a training programme would be implemented. The safety management argument also includes a goal that the safety culture is maintained at an acceptable level. This requires development because a suitable assessment process has not been identified. This will form another strand for future work in Chapter 6.
[Figure 20 diagram: goal structure node text as recovered from the figure]
• NonMOB: Non MOB is suitable and safe for aircraft operations
• WellFoundB: Well Founded Bases are acceptably safe
• WellFoundSC (assumption): Well Founded Bases are assumed to have an approved Safety Case
• WellFoundSurv: All Well Founded bases are subject to Airfield Survey before first use
• AFSurvGd: Carried out iaw Org Airfield Survey Guide
• AFSurvRep: Airfield Survey Report
• WellFoundValid: Crews using Well Founded bases required to make usage report
• UseRep: Usage Reports
• AustBase: Austere Bases are acceptably safe
• AustRecce: All Austere bases are subject to Recce before use
• RecceGuide: Carried out in accordance with Org Recce guide
• RecRep: Recce Reports
• AustValid: Crews using Austere bases required to make usage report
• SurvMaint: Results of previous surveys are maintained
• OverviewAF: Overview of Airfields within a region is maintained
• AFSuppDoc: Airfields Support Information Document
• SuppServ: Support Services are provided for operations away from MOB
• AFServ: Airfield Services provided
• MaintCont: Enabling contracts Maintained
• LiaisLoc: Liaison with local contractors
• LocRegExp: Local Regional Experts employed to support operations at non MOB
• OrgActGp: Org Action Group assess known operational hazards
• OpOrder: Operation Order Raised
• PostDetReport: Post Detachment Report raised
Figure 21 Safety Management Argument – 7
[Figure 21 diagram: goal structure node text as recovered from the figure]
• SafetyMgtPrac: Org operational safety is managed in accordance with good practice
• SMPrinc (justification): Key strategic goal of JSP 550 is the (text truncated in the figure)
• SMPrinc0056Ctxt: MoD Safety Management Principles defined in Def Stan 00-56 Issue (text truncated in the figure)
• AvSMPEstablished: Appropriate Aviation Safety Management Plan established and maintained
• AvSMP: Org Aviation Safety Management Plan
• SafeMgtIntegral: Management of the safety tasks integrated throughout the Org Operations activities
• CompSafePers: Competent safety personnel assigned to Org Safety Management tasks
• AppropSMRegime: Appropriate Safety Management and control activities in place
• OpOrders: All Org Operation Orders contain cross reference to safety targets, associated hazards and risk
• OpOrderInst: Reference to Operation Order instructions that support the claim that safety targets are cross referenced
• FormalRev: Safety programme analyses and assessments formally reviewed
• SafCulture: Organization Safety Culture is maintained at acceptable level
• SafCultLevel: Safety Culture level is set by reference to HSC Guidance
• AccSafCase: Acceptable safety case established for Org Operations
• SSA: System Safety Analysis performed to determine whether risk associated with Org Operations reduced to an acceptable level
• HazLogEstd: Argument over Hazard Log established and suitably managed
• HLDefSMPCont: Org Aviation SMP defines Hazard Log
• SysCharVerif: Target Safety Characteristics of Org Operations verified in practice
4.2.11. Hazard Log Argument
An important argument supporting the safety management system is the establishment of a Hazard Log to record and manage all the operational risks to ALARP. This goal carries a large proportion of the justification of the safety case argument that air operations are acceptably safe. The argument relies on the identification of all credible hazards and this goal is supported by the undeveloped goal of establishing a Loss Model. This is a methodology employed to assist the operators to establish a complete list of hazards and controls for inclusion in the hazard log and the process could form a topic for further work in Chapter 6; an introduction is provided at paragraph 4.3. The Hazard Log argument has been expanded in Figure 22 and the constituent parts supporting this goal are at Figure 23.
Figure 22 Hazard Log Established Argument – 15
4.2.12. Hazard Log Contains the Required Elements Argument
This part of the argument expands some of the reporting mechanisms that provide the hazard related data for assessment. Of note is the reliance on the Flight Safety Information Management System (FSIMS) that is intended to collect all the incident data and provide human factors analysis for each occurrence. The structure is developed in Figure 23.
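The idea that a hazard log entry should carry a minimum set of elements can be illustrated with a short sketch. The Python below is an assumption made for illustration only and does not describe the Org hazard database: the class HazardEntry, its field names and the function missing_elements are hypothetical, and the element list is a simplified reading of the goals in Figure 23 (a description of the hazard, its controls, references to supporting analyses and reports, and the reporting source).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HazardEntry:
    """Illustrative hazard log record; all field names are hypothetical."""
    ident: str
    description: str = ""
    controls: List[str] = field(default_factory=list)       # mitigations recorded against the hazard
    evidence_refs: List[str] = field(default_factory=list)  # analyses and reports cited as evidence
    source: str = ""                                        # reporting mechanism, e.g. "FSIMS" or "F765"


def missing_elements(entry: HazardEntry) -> List[str]:
    """Return the names of the assumed required elements that are absent from an entry."""
    missing = []
    if not entry.description.strip():
        missing.append("description")
    if not entry.controls:
        missing.append("controls")
    if not entry.evidence_refs:
        missing.append("evidence_refs")
    if not entry.source.strip():
        missing.append("source")
    return missing


# Hypothetical entries: one complete, one awaiting controls and evidence.
complete = HazardEntry("H1", "Example hazard", ["Example control"], ["Example report"], "FSIMS")
partial = HazardEntry("H2", "Example hazard with no controls yet", source="F765")

print(missing_elements(complete))
print(missing_elements(partial))
```

A check of this kind would only confirm that an entry is populated; it says nothing about whether the recorded controls reduce the risk to ALARP, which remains a matter for review by the Safety Management Board.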
[Figure 22 diagram: goal structure node text as recovered from the figure]
• HazLogEstd: Argument over Hazard Log established and suitably managed
• HazLogProg: Hazard recording and tracking defined in Org AvSMP
• SMSCon: Hazards are identified and managed iaw Org AvSMP
• HazLogCont: Hazard Log contains required elements
• HazLogMan: Hazard Log is effectively managed
• HLMgtProcs: Procedures defined for the management and control of the Hazard Log
• HLReview: Hazard Log regularly reviewed by the Safety Management Board
• HLUpdate: Hazard Log updated as new hazards discovered or changes made
• HLRetained: Hazard Log retained and backed up to ensure audit trail is not lost
• HLMonProg: Hazard Log employed as the principal means of monitoring progress of risk reduction
• SMBEndorseHL: Hazard Log entries endorsed at Safety Management Board meetings
• HazID: All credible hazards are identified
• OrgAG: Org Action Group identify hazards
• Loss Model: Loss Model Established
• TrendAnalysis: Occurrence reports are analysed to identify hazard trends
• ALARP: Risks reduced to ALARP
Figure 23 Hazard Log Required Elements Argument – 20
4.3. The Operational Loss Model Hazard Identification Process
There has been insufficient time to include the development of an operational loss model to assist the identification of hazards and their controls. This process follows the format of an event tree and is facilitated to ensure most value is obtained from the process. A starting point for the top level of the loss model is proposed in Figure 24 but the development would be appropriate for further work and is included in Chapter 6.
Figure 24 Top Level of Operational Loss Model
[Figure 24 diagram: loss model goals as recovered from the figure]
• G01: Probability of Loss of Life, Airframe, or damage to the Environment is acceptable
• G02: Probability of Loss due to technical failure is acceptable
• G03: Probability of incorrect Release to Service is acceptable
• G04: Probability of inaccurate Aircraft Document Set is acceptable
• G05: Probability of unsafe maintenance working practice is acceptable
• G07: Probability of crew health and fatigue factors are acceptable
• G08: Probability of loss from airspace hazard is acceptable
Key: Goals applicable to Platform Safety Case; Goals applicable to Operational Safety Case
[Figure 23 diagram: goal structure node text as recovered from the figure]
• HazLogCont: Hazard Log contains required elements
• ReqHLContents: Required contents for hazard Log defined in Def Stan 00-56 and CAP712
• HLDataBase: Hazard Data base developed for recording and managing issues and hazards
• HLSysDesc: Hazard Log includes a description and scope of the Org system
• HLDetails: Hazard Log contains details pertinent to each hazard and accident
• HLRefEvidence: Hazard Log provides references to all analyses and reports for each hazard
• HazControlDoc: Hazard Log provides documentation of hazards and their controls
• FSIMS: FSIMS Supports Hazard Log Input
• OpOcRec: Operational Occurrences Recorded
• TechFailure: Technical failures recorded
• ACCIncidents: Airspace Coordination and Control Occurrences
• EnvOccRep: Environmental aspects of Occurrences Recorded
• HumFact: Human factors Occurrences Recorded
• F765: F765 Occurrence Reports
• FSIMS: FSIMS
• Pandora: Pandora Database
• HFORS: HFORS
4.4. Safety Culture Assessment In Chapter 2, it was recognized that a fundamental property of an operational safety case was the achievement of a particular level of safety culture in the organization. The ability to be able to measure such a subjective concept is important to satisfy the solution to the safety culture goal. The IAEA use a specialist team called the Safety Culture Assessment Review Team (SCART) to achieve a ranking. In the medical domain, the American healthcare system has developed a safety culture tool [Ref 44] for improving patient safety in healthcare organizations. This is based on the collection of data using a list of questions prepared to avoid bias. It would seem appropriate to develop such a tailored tool for operational personnel. This would use tangible criteria to measure safety culture objectively. This initiative would provide a strand for future work and is included in Chapter 6. 4.5. Human Factors Occurrences A major concern with the collection of safety related data was identified in Chapter 2 as the disproportionality between human factors caused incidents compared with human factors caused accidents. An initiative to capture the lost data has been sponsored by the MoD to introduce the Flight Safety Information Management system (FSIMS). This process is still at the trial stage but it is hoped that the project will plug this important gap in data collection. The process is based on an interactive series of questions to guide the author through a logical sequence to identify the human factors elements of the occurrence. 4.6. Safety Case Development Summary The Safety Case GSN argument structure was developed from interviews over a period of a year using the Kelly safety case development process [Ref 43]. The justification for producing the safety case was defined in the DASMS requirements produced in Regulation 445 of JSP 550 [Ref 37]. 
The safety case argument is not complete but indicates the complexity of producing the operational aspects of a safety case compared with the equipment or platform safety case. The organization was a system of systems and the Duty Holder responsible for the safety case had the authority to make decisions at the interface between 2 organizations, thus satisfying one of the criteria for an operational safety case as defined in paragraph 2.8.

4.7. Evaluation of GSN Development Process
The GSN development process has provided the author with an organized set of data providing the logic and an auditable database for the justification of the structure selected. The process relied heavily on efficient and effective information elicitation techniques. It was found that the qualities required of a stakeholder and requirements analyst, as described by G Luettgen in the York University Requirements Engineering Module [Ref 45], were most apposite for this phase of the work. The traditional techniques of background reading, interviews, questionnaires and observing behaviour were most effective. However, during the visits to units it was also relevant to use the collaborative technique of brainstorming for large groups of specialists.
4.7.1. Elicitation Techniques
A great deal of information was established about the organization by reading the extensive regulations and specific orders applicable to specialized roles. However, it was recognized very early in the project that there was a large amount of tacit and semi-tacit knowledge that it was essential to capture if an accurate safety case argument were to be developed. The most used technique was the interview, tailored to suit the level of the interviewee. At the higher level, the interviews tended to be open but included an element of introductory presentation to explain the purpose, as the concept of safety cases was unfamiliar to most of the people contacted. The subject matter experts’ interviews were structured around a questionnaire to ensure that their position in the organization, safety responsibilities and stakeholder relationships were clearly identified. At the working level, the brainstorming technique was successful in developing some of the streams of tacit knowledge. These sessions involved a thorough briefing giving the background to safety management and creating a rapport with the team by explaining their role in the Project. The discussions became very wide-ranging after the initiation process but, with careful facilitation, the semi-tacit knowledge was elicited, and the team was often surprised at how important these aspects were to the safety management process. Finally, and probably of most importance in some of the high-risk areas, the technique of observing behaviour was adopted to expose the tacit knowledge. This involved some close observations during training tasks.

4.7.2. Use of Patterns
It is recognized that there is still a long way to go, and the size of the challenge of incorporating the safety case into this Project was initially underestimated. This is a valuable lesson, as it had been expected that some resource savings might be realized if the concept of reuse had been successful in this instance.
Furthermore, the option of using patterns was considered but it was eventually decided that patterns were useful for small, well-defined and regulated parts of the safety case but could not be used with confidence over large-scale strands of argument.

4.7.3. Further Work Identified
Further work is required to establish an effective operational Hazard Log that will integrate with the platform hazard logs; the development of an operational Loss Model would assist in this process. Finally, further work was identified to satisfy the solutions for safety culture assessment and the identification of human factors occurrences.
Evaluation
5. Evaluate the contribution of the process to communicating the effectiveness of a proposed SMS
5.1. Evaluation Objectives
The aim of the evaluation process is to evaluate the contribution of the operational safety case development to communicating the effectiveness of a proposed SMS. The aim will be achieved by the following objectives:
• Use the definition of operational safety case as developed in Chapter 2 and evaluate the criteria over:
o The London Underground safety case.
o The multi-unit safety case developed in Chapter 4.
• Evaluate the validity of the SAR process against the Kelly methodology for GSN safety case development.
• Use of patterns.

5.2. Operational Safety Case Criteria
The criteria will be based on the definition derived in Chapter 2 as follows:
A safety case, owned by the duty holder responsible for the operators, that provides a compelling, comprehensible and valid case, that the combination of elements comprising operational capability, when used together in a defined operating environment to achieve agreed objectives, demonstrates that the system is acceptably safe.
The constituent parts of an organization that achieves the operational capability would need to be defined for each organization and this may be more difficult to define in the service industries. However, the first safety case to be evaluated should be relatively uncomplicated as the requirement has been running since 1863 and the equipment is introduced in a structured and formal manner with numerous safety requirements to satisfy. In order to provide additional depth to the top-level definition, the criteria will be based on the requirements emphasized by Kelly [Ref 15] and reproduced in paragraph 2.8.1, with the addition of safety culture and human factors elements. The final evaluation criteria are:
• Duty Holder – Duty Holder’s responsibility for operators.
• Equipment – Limitations for use provided for particular environments or operator competence.
• Procedures – Use by the operators and maintainers. Guidance on the type and structure of training through life.
• People – Establishment against strength and level of competence required are apparent. Management of change, supervision and responsibility. Human factors managed.
• Safety Culture – Safety culture assessment.
• Safety Management System – Safety management organization role defined. Operational hazard and risk management system.

5.3. London Underground Safety Case Evaluation
Duty Holder
The duty holder of the safety case for the London Underground was identified in the first paragraph as the ‘London Underground’, with any queries about the content being sent to the Safety Quality and Environment (SQE) Systems and Assurance Manager. This reference to an organization rather than a person was not expected, as it was considered that a position of this importance would be identified by name or at least by post. The wording provided might be construed to mean the Executive Directors. However, for the purposes of this Project it would have to be assumed that the duty holder is responsible for the operators and therefore meets the first criterion. The safety case mentions that while the operators are employed directly by LU, the equipment and track maintenance staff are contracted through PPP agreements. This project has not investigated the legal aspects of contractors in relation to the safety case, but it would be expected that the normal HSWA regulations apply, in that the duty holder is responsible when contractors are on the owner's premises. However, this aspect warrants further legal analysis.
Equipment
The equipment description and maintenance requirements are laid down. The limitations for use are not clearly identified in the safety case, although it includes a great deal of detail about the operation and the safety measures in place to prevent train overspeeding or Speed Control After Tripping (SCAT) when a train passes a danger signal. The safety case refers to the ‘Reference Manual’ for information about operating in degraded conditions but this document would need to be assessed to confirm its sufficiency.
Procedures
Safety management procedures have been identified for operators to follow to ensure they comply with health and safety requirements. It is apparent that the safety case concentrates on the health and safety processes but does not provide much detail on the procedures for operating the system. The procedures are referenced out to the ‘Reference Manual’, which contains clearly identified standards, procedures, directions and information for safe operation of the railway in normal and degraded conditions.
People
Competence and training standards are laid down in an external document and there is a detailed system for change management. Separate instructions are provided for the contractor personnel.
Safety Culture
Safety culture was mentioned in the safety case as requiring a step change in improvement, and was to be included in the Safety Improvement Plan. However, specific reference to safety culture was not found in the plan and the means of assessment was omitted.
Safety Management System
There was ample reference to the safety management organization and the roles of each person with safety responsibility. Furthermore, there was a detailed description of the occurrence reporting system and hazard identification.

5.3.1. London Underground Evaluation Summary
The criteria for an operational safety case were largely met. The duty holder would probably be better defined as the Managing Director but the definition given in the safety case would not introduce confusion. The person was at a high enough position to have authority to make decisions over at least 2 organizations. The only weak point in the safety case, as provided, was the lack of safety culture analysis or of a defined way ahead to make the improvements that were highlighted in the safety case. However, on balance it is considered that the LU safety case met the criteria for an operational safety case.

5.4. Multi-Unit Operational Safety Case Evaluation
Duty Holder
The duty holder of the Multi-Unit operational safety case was clearly defined in the organization Manual as the AOA. This post is responsible for the operators, including the first line maintainers, and detailed contracts were made with the other personnel responsible for maintenance at locations not under the AOA’s jurisdiction. Therefore, the safety case meets the first criterion.
Equipment
The equipment description and maintenance requirements are laid down in the ADS, which is referred to in the safety case. All limitations for operating the aircraft are explicit in the RTS.
Procedures
The procedures for safety management, and for operators to follow when flying or using the ground equipment, are well defined in the ADS for each equipment.
People
Competence and training standards are laid down in organization Training Orders. The number of staff required is reviewed and is subject to a formalized impact statement process if a shortfall in staff is considered to increase safety risks.
Safety Culture
A safety culture goal was included in the safety case but a process for assessing the level of safety culture was not available.
Safety Management System
The safety management system was based on the traditional flight safety system, which tended to focus on a very comprehensive occurrence reporting system. The use of hazard logs at the equipment level and in the ATM environment was evident but the operational risks were not included in a hazard log; some major issues were incorporated in a high-level document.
5.4.1. Multi-Unit Operational Safety Case Evaluation Summary
The criteria for an operational safety case were largely met. The weak points in the safety case were the lack of a safety culture analysis process and the absence of an operational Hazard Log and management system. The safety case was considered to include all the goals necessary to meet the criteria for an operational safety case, but some of the evidence was unavailable at the time of the evaluation.

5.5. Safety Argument Reuse Evaluation
The SAR process was trialled in Chapter 3, which identified the benefits of employing the SAR process and the factors to be considered before contemplating its use. In summary, the rigorous, systematic process that provided the basis for future auditable safety case maintenance was extremely useful. However, where the complexity or size of the organization was considerably different, the work involved was not appreciably less than that required for a traditional GSN safety case development process. If the safety case extension contemplated could be defined as incremental, then it is considered that there were benefits in reduced resource and time from using the SAR process.

5.6. Use of Patterns
The use of patterns is becoming more common as a library of patterns is being created. Small patterns for clearly defined arguments, such as the ALARP pattern, are useful to apply to a safety case argument because they have been tested under peer review and, provided the regulatory principles do not change, the pattern remains valid. It has been suggested that large-scale patterns could be developed but the experience gained in the development of the safety case in this Project would militate against this.

5.7. Conclusion
The London Underground safety case and Multi-Unit safety case were evaluated against the criteria based on the definition of an operational safety case developed in Chapter 2.
Although the LU safety case was not described as an operational safety case, it met most of the criteria developed in this project. The shortcoming in safety culture assessment is understandable, as little evidence was found in the public domain apart from in the US medical domain and the IAEA. The Multi-Unit safety case was developed specifically to meet the operational requirement and included all the essential goals but lacked evidence for the operational Hazard Log and the safety culture assessment. Finally, the use of the SAR process and the development of safety cases using large-scale patterns were considered. The use of the SAR process was considered beneficial for incremental differences in organization, scope or domain. However, where the differences were large, any benefits in using the process were masked by the amount of original work required to develop the argument. Similarly, the use of patterns was expected to provide benefits in resource and confidence in accuracy. While small, well-defined patterns were useful, it was considered that large patterns would be suitable only for very specific organizations and that, unless all essential criteria defining the system could be identified as similar, there was potential for error.
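The evaluation in this chapter can be pictured as a simple checklist applied to each safety case. The following is a minimal sketch in Python, not a tool used in the Project. The six criterion names are taken from section 5.2; the three-level rating scale, the rule that up to two weak criteria still counts as "largely met", and the example ratings (a loose reading of the weak points named in section 5.4.1) are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    # Assumed three-level scale; the dissertation rates criteria in prose only.
    MET = "met"
    PARTIAL = "partially met"
    NOT_MET = "not met"


# The six criterion headings listed in section 5.2.
CRITERIA = (
    "Duty Holder",
    "Equipment",
    "Procedures",
    "People",
    "Safety Culture",
    "Safety Management System",
)


@dataclass
class Finding:
    criterion: str
    rating: Rating
    note: str = ""


def summarise(findings, max_weak=2):
    """Return (verdict, weak criteria); max_weak is a hypothetical threshold."""
    by_criterion = {f.criterion: f for f in findings}
    missing = [c for c in CRITERIA if c not in by_criterion]
    if missing:
        raise ValueError("no finding recorded for: " + ", ".join(missing))
    weak = [c for c in CRITERIA if by_criterion[c].rating is not Rating.MET]
    if not weak:
        verdict = "met"
    elif len(weak) <= max_weak:
        verdict = "largely met"
    else:
        verdict = "not met"
    return verdict, weak


# Illustrative ratings only; they are not scores recorded by the author.
example = [
    Finding("Duty Holder", Rating.MET),
    Finding("Equipment", Rating.MET),
    Finding("Procedures", Rating.MET),
    Finding("People", Rating.MET),
    Finding("Safety Culture", Rating.NOT_MET, "no assessment process"),
    Finding("Safety Management System", Rating.PARTIAL, "no operational hazard log"),
]
print(summarise(example))
```

Recording a note against each weak criterion keeps the reason for the rating alongside the verdict, which mirrors the way the summaries in 5.3.1 and 5.4.1 name their weak points.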
Conclusions and Further Work
6. To summarize the areas of further work that have been identified
6.1. Introduction
The aim of this chapter is to summarize the work undertaken during this project, draw conclusions from the project findings and summarize the further work that has been identified.

6.2. Summary
The project set out to:
• Extract and summarize best practice from other domains.
• Investigate issues of particular relevance to military aircraft operations.
• Develop a process that may be used for similar operational safety cases.
• Evaluate the contribution of the process to communicating the effectiveness of a proposed SMS.
Chapter 2 formed the basis of the review of this project and identified the standards and regulations applicable to military and civilian domains. The necessity for different types of safety case was discussed: from the ‘equipment’ through ‘platform’ to ‘operational’. The background to the operational definition was investigated, as the concept of operational safety case had evolved over the past decade. The importance of identifying the Duty Holder and of assessing his level of responsibility and authority was discussed. It was argued that the duty holder’s responsibility for the operators was a criterion for applying the operational status to the safety case. The operational definitions developed for the military environment were analysed from the military capability concept defined in the MoD and the operational effectiveness concept defined in the USAF. Finally, the chapter proposed a definition of operational safety case.

Chapter 3 analysed the regulations specifying safety case requirements in the aerospace domain that would be used to guide the development process described in Chapter 4. The operational safety case definition was then evaluated against a safety case outside the aerospace domain to test the definition’s relevance. The London Underground safety case was not defined as an operational safety case, but it followed very closely the criteria developed in Chapter 2. Therefore, it was concluded that it fulfilled the purpose of an operational safety case. The chapter then investigated the Safety Argument Reuse process developed by Warren [Ref 42] and applied it to a military ATM safety case when developing a multi-unit operational safety case. The benefits of the process were evaluated and considered to be of most value when the differences between the two safety cases were minor. However, the benefit of using the process decreased as the differences in domain or scale between the safety cases increased. The chapter concluded that the (now) traditional GSN approach to developing the multi-unit operational safety case should be used in Chapter 4.

Chapter 4 developed the multi-unit safety case in GSN and was able to make use of the experience gained from using the SAR process, but the final argument structure was developed from extensive elicitation. Information was gathered from standards, regulations and orders; interviews with personnel at all levels; brainstorming sessions with operators; and observation of training tasks. In common with the London Underground safety case, it was found that safety culture was recognized as an important goal in an operational safety case but it was not obvious how this should be assessed.

Finally, in Chapter 5 the London Underground and Multi-Unit safety cases were evaluated against the criteria developed for the operational safety case definition in Chapter 2. Although the London Underground safety case was not presented as an ‘operational’ safety case, it closely met the criteria, as did the Multi-Unit safety case.

6.3. Conclusions
The conclusions to be drawn from this Project stem from the definition of an operational safety case developed in Chapter 2. The original definition proposed by Blagrove has been extended to include reference to the duty holder who owns the safety case. The duty holder who is responsible for the personnel who operate the equipment or provide the service owns the ‘operational’ safety case.
However, the ‘operational’ definition also includes the caveat that the duty holder should be of sufficiently high level in the organization that he is authorized to make decisions across 2 or more organizations. If the duty holder has ownership of a safety case but is not responsible for the operators, such as an RTSA, then it is probably not an operational safety case. The progression of a safety case from supplier to operator was discussed. The supplier’s ‘equipment’ safety case would probably be further developed by a procurer, along with other safety cases, to form a platform safety case. However, the conclusion of this Project is that the ‘platform’ safety case does not become an operational safety case until its ownership moves to the operators’ duty holder. Furthermore, the operational safety case would include the operational hazards and the operational mitigations in the hazard log. The platform safety case may have the operational mitigations articulated as assumptions, but the operational safety case would manage them as goals. The operational safety case development using the SAR process was trialled and it was concluded that it would be appropriate to use where the differences in domain or scope were incremental. For developing large, complex system of systems safety cases, it would be appropriate to use the GSN safety case argument development process.
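The distinction drawn in 6.3 between a mitigation held as an assumption and the same mitigation managed as a goal can be sketched as a small data transformation. The Python below is an illustrative sketch only: the class names, the `promote` and `lacking_evidence` helpers, the identifier scheme and the two example entries are all hypothetical and do not come from any hazard log examined in the Project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformAssumption:
    """An operational mitigation that a platform safety case merely assumes."""
    hazard_id: str
    text: str


@dataclass(frozen=True)
class OperationalGoal:
    """The same mitigation, managed as a goal that needs its own evidence."""
    goal_id: str
    text: str
    source_hazard: str
    evidence: tuple = ()


def promote(assumptions, prefix="OG"):
    """Turn each platform assumption into a numbered operational goal."""
    return [
        OperationalGoal(f"{prefix}{n}", a.text, a.hazard_id)
        for n, a in enumerate(assumptions, start=1)
    ]


def lacking_evidence(goals):
    """Identifiers of goals for which no evidence has been recorded yet."""
    return [g.goal_id for g in goals if not g.evidence]


# Hypothetical entries, invented purely for illustration.
assumed = [
    PlatformAssumption("H1", "Operators are trained for degraded-mode operation"),
    PlatformAssumption("H2", "Supervision is in place during high-risk tasks"),
]
goals = promote(assumed)
print([g.goal_id for g in goals])
print(lacking_evidence(goals))
```

Keeping the source hazard identifier on each goal is one way the operational hazard log could stay traceable to the platform hazard logs; whether that is the right link would depend on how the logs are actually structured.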
6.4. Further Work
This section suggests additional research and development that the author believes would contribute to the extension of the Project and further the operational safety case management process in high-risk systems.

6.4.1. Operational Safety Case for UAVs
The operational safety case definition developed in the project has included the criterion that the duty holder would be responsible for the operators. In the case of Uninhabited Air Vehicles (UAVs), the operators are employed on the ground and operate the equipment remotely. This fits well with the definition. However, the development of UAVs has now allowed them to operate independently, and the use of swarms of UAVs is not far away, where the onboard system decides on the exact flight path, identifies targets and instructs other UAVs in the swarm to deliver weapons. The friendly fire argument, for example, would need to be very carefully constructed to ensure that sufficient barriers are in place without affecting operational capability. The concept of ownership of the operational safety case for a system of UAVs needs to be considered more fully and would contribute to the scope of this Project.

6.4.2. Adaptation of Operational Safety Case to Commercial Environments
The Project focused on the operational definitions gleaned from MoD and USAF orders and instructions. However, the feasibility of extending the definition of ‘operational capability’ to the civilian commercial environment was not pursued in depth. It may be possible to replace the concept of operational capability with a civilian equivalent such as ‘ability to make a profit’ or ‘provide a service’. Developing the use of these terms to explain better the meaning of ‘operational’ in the civilian environment would help to extend the scope of this project.

6.4.3. Identify Hazards Through Loss Model Development
The project identified the need to identify and manage operational hazards and risks. The use of a ‘loss model’ to assist in a formal HAZID process has been used successfully to identify platform hazards. However, the unusual hazards experienced in a military operational environment need to be identified and managed, and the development of an operational hazards loss model would inform this work.

6.4.4. Safety Culture and Human Factors Assessment
The goal to ‘have a high level of safety culture’ was considered an important feature of an operational safety case, but little work was discovered in the public domain that would support a reproducible assessment process similar to the processes introduced to assess competence in an organization. It is considered that research into the soft goals in a safety case, such as human factors occurrence reporting and safety culture assessment, would merit further work.
Abbreviations
7. Abbreviations
ACC       Air Traffic Control Centre
ACSNI     Advisory Committee on the Safety of Nuclear Installations
ADS       Aircraft Document Set
AFI       Air Force Instruction (USAF)
ALARP     As Low As is Reasonably Practicable
AM        Aircrew Manual
AMS       Acquisition Management System
ANO       Air Navigation Order
AOA       Aircraft Operating Authority
ARP       Aerospace Recommended Practice
ASACS     Airborne Surveillance and Control System
ATC       Air Traffic Control
ATCEB     ATC Examining Board
ATM       Air Traffic Management
ATP       Automatic Train Protection
BPA       British Pipelines Agency
BTP       British Transport Police
CAA       Civil Aviation Authority
CAP       Civil Aviation Publication
CCTV      Closed Circuit Television
CPS       Crown Prosecution Service
CRMT      Crew Resource Management Training
CSA       Customer Supplier Agreement
DASB      Defence Aviation Safety Board
DASC      Defence Aviation Safety Centre
DASMS     Defence Safety Management System
Def Stan  Defence Standard
DLO       Defence Logistics Organization
DOB       Deployed Operating Base
DoD       Department of Defense (US Government)
DPA       Defence Procurement Agency
FSIMS     Flight Safety Information Management System
GARP      Generic Aircraft Release Process
GSN       Goal Structuring Notation
HARP      Helicopter Airworthiness Review Panel
HOSL      Hertfordshire Oil Storage Ltd
HSWA      Health and Safety at Work etc Act (1974)
HSC       Health and Safety Commission
HSE       Health and Safety Executive
HUMS      Health and Usage Monitoring Systems
IAEA      International Atomic Energy Agency
IPT       Integrated Project Team
IPTL      Integrated Project Team Leader
JSP       Joint Service Publication
LU        London Underground
MIC       Methyl isocyanate
MOB       Main Operating Base
MoD       Ministry of Defence
MOS       Military Operating Standards
NASA      National Aeronautics and Space Administration
ODM       Operating Data Manual
OEC       Operational Emergency Clearance
OJT       On The Job Training
ONSD      Operational Necessity Service Deviation
OSART     Operational Safety Review Team
OSH       Occupational Safety and Health
PFEER     Prevention of Fire and Explosion and Emergency Response
PFI       Private Finance Initiative
POSMS     Project Oriented Safety Management System Manual
PPP       Public-Private Partnership
RA        Resolution Advisory
RAF       Royal Air Force
ROS       Reduced Operating Standards
RTS       Release To Service
RTSA      Release To Service Authority
RVSM      Reduced Vertical Separation Minimum
SAE       Society of Automotive Engineers
SAR       Search and Rescue
SAR       Safety Argument Reuse
SCART     Safety Culture Assessment Review Team
SCAT      Speed Control After Tripping
SKPI      Safety Key Performance Indicator
SMS       Safety Management System
SofS      Secretary of State
SOIU      Statement of Operating Intent and Usage
SPAD      Signal Passed At Danger
SQE       Safety Quality and Environment
TCAS      Traffic Collision Avoidance Systems
UAV       Uninhabited Air Vehicles
USAF      United States Air Force
References
8. References
Number. Title. Author. ISBN/Publisher.
1. Health and Safety at Work etc Act 1974. UK Government. HMSO 1974.
2. Corporate Manslaughter and Corporate Homicide Bill as introduced in the House of Commons on 20th July 2006. SofS for Home Dept. http://www.publications.parliament.uk/pa/cm200506/cmbills/220/06220.i-ii.html Accessed September 2006.
3. Engineering Safety Management. Railtrack. Railtrack PLC January 2000.
4. Columbia Accident Investigation Board Report Oct 2003. Columbia Accident Investigation Board. http://www.nasa.gov/columbia/home/index.html Accessed September 2006.
5. Presidential Commission on the Space Shuttle Challenger Accident June 1986. W P Rogers. http://history.nasa.gov/rogersrep/genindex.htm Accessed September 2006.
6. The Flixborough Disaster: Report of the Court of Inquiry
Appendix C RAF Waddington GSN Argument
This appendix details the GSN pattern developed by M Warren [Ref 42] for assuring acceptable safety of an in-service military ATC system.
Figure C-1: High Level Claim Structure
G1: The unit is acceptably safe to operate
C1: Unit = RAF Waddington ATC System and external facilities on which it depends
C2: Operate = Provision Local, Approach and Zonal Services within area of responsibility
C3: Acceptably safe means meeting the requirements of DEF STAN 00-56, JSP 552 and RAF ATC Orders
S1: Argument by addressing all system elements
S3: Argument by addressing all organisational aspects
G2: The system is acceptably safe throughout its life
S2: Qualitative argument by appeal to depth of defences
G3: Organisational aspects are acceptably safe
G4: The ATC system safety nets provide barriers between hazards and losses
C6: Identified hazards for unit ATC system
C8: Operation under normal, abnormal & emergency modes
C7: ATC system description
C5: Organisational aspects are the management structure, processes and people required to support the operation
C4: System configuration as defined in JSPs and local orders
Argument through consideration of non-executive functions
G3.1.3.1: All roles and responsibilities are clearly defined
E3.1.3.1: No overlap in roles and responsibilities
G3.1.3.4: The organisation is physically structured to support safe operations
G3.1.3.4.1: There is an organisation in place to provide air traffic control
C33: Provision of ATC includes direct control of aircraft, management of airspace design, management of ATC procedures, acceptance of new systems and changes to existing systems
G3.1.3.4.2: There is an organisation in place to provide engineering support
E3.1.3.4.2.1: Independent Quality Audit of organisation to ensure compliance with ISO 2000/9000
E3.1.3.4.2.2: Evidence of compliance with orders, procedures and ISO 9000/2000
G3.1.3.4.3: There is an organisation in place to support the facilities
C36: Facilities support includes provision of adequate physical security arrangements
C35: Support of facilities includes maintenance of heating, ventilation and electrical supply systems
G3.1.3.5: The organisation has a section charged with ensuring safety standards are met at all times
C34: Engineering support includes systems monitoring & control, maintenance and repair, management and acceptance of change, configuration control, software support facilities and performance monitoring
G3.1.3.6: The organisation operates a systematic safety management system
E3.1.3.5.1: Agreed safety accountabilities across the organisation
E3.1.3.5.2: Evidence of safety survey and safety audit plans
E3.1.3.5.3: Clearly defined safety policies and principles
E3.1.3.6.3: Safety Committees/wkg group
E3.1.3.6.2: Evidence of safety performance monitoring process
G3.1.3.3: Inter & intra co-ordination & communications support safe operations (see Fig C-5)
G3.1.3.6.1: The organisation makes all personnel (not just ATC staff) aware of safety issues
E3.1.3.6.1.2: Sufficient staffing is available to allow promulgation and