Software System Safety
Page 1: Software System Safety

© Nancy G. Leveson
MIT Aero/Astro Dept. ([email protected])
http://sunnyday.mit.edu

Copyright by the author, November 2004. All rights reserved. Copying without fee is permitted provided that the copies are not made or distributed for direct commercial advantage and provided that credit to the source is given. Abstracting with credit is permitted.

Page 2: Software System Safety

Accident with No Component Failures

[Figure: batch chemical reactor controlled by a computer — catalyst valve into the reactor, vapor line to a condenser cooled by a water valve, reflux returned to the reactor; other labels: LC, LA, vent, gearbox.]

Types of Accidents

Component Failure Accidents
- Single or multiple component failures
- Usually assume random failure

System Accidents
- Arise in interactions among components
- No components may have "failed"
- Caused by interactive complexity and tight coupling
- Exacerbated by the introduction of computers

Page 3: Software System Safety

Confusing Safety and Reliability

From a blue ribbon panel report on the V-22 Osprey problems:
"Safety [software]: ... Recommendation: Improve reliability, then verify by extensive test/fix/test in challenging environments."

From an FAA report on ATC software architectures:
"The FAA's en route automation meets the criteria for consideration as a safety-critical system. Therefore, en route automation systems must possess ultra-high reliability."

[Figure: overlapping circles labeled Safety and Reliability — related, but not the same property.]

Accidents in high-tech systems are changing their nature, and we must change our approaches to safety accordingly.

Page 4: Software System Safety

Reliability Engineering Approach to Safety

Reliability: The probability an item will perform its required function in the specified manner over a given time period and under specified or assumed conditions.

(Note: Most software-related accidents result from errors in specified requirements or function and deviations from assumed conditions.)

Concerned primarily with failures and failure rate reduction:
- Parallel redundancy
- Standby sparing
- Safety factors and margins
- Derating
- Screening
- Timed replacements

Failure: Nonperformance or inability of system or component to perform its intended function for a specified time under specified environmental conditions. A basic abnormal occurrence, e.g.,
- burned out bearing in a pump
- relay not closing properly when voltage applied

Fault: Higher-order events, e.g., relay closes at wrong time due to improper functioning of an upstream component.

All failures are faults, but not all faults are failures.

Does Software Fail?

Page 5: Software System Safety

Reliability Engineering Approach to Safety (2)

Assumes accidents are the result of component failure:
- Techniques exist to increase component reliability
- Failure rates in hardware are quantifiable

Omits important factors in accidents. May even decrease safety:
- Many accidents occur without any component "failure"
- e.g., accidents may be caused by equipment operation outside the parameters and time limits upon which reliability analyses are based
- Or may be caused by interactions of components all operating according to specification

Highly reliable components are not necessarily safe.

Software-Related Accidents

Are usually caused by flawed requirements:
- Incomplete or wrong assumptions about operation of the controlled system or required operation of the computer
- Unhandled controlled-system states and environmental conditions

Merely trying to get the software "correct" or to make it reliable will not make it safer under these conditions.

Page 6: Software System Safety

Software-Related Accidents (con't.)

Software may be highly reliable and "correct" and still be unsafe:
- Correctly implements requirements, but the specified behavior is unsafe from a system perspective
- Requirements do not specify some particular behavior required for system safety (incomplete)
- Software has unintended (and unsafe) behavior beyond what is specified in requirements

A Possible Solution

- Enforce discipline and control complexity
- Limits have changed from structural integrity and physical constraints of materials to intellectual limits
- Improve communication among engineers
- Build safety in by enforcing constraints on behavior

Example (batch reactor)

System safety constraint:
  Water must be flowing into reflux condenser whenever catalyst is added to reactor.
Software safety constraint:
  Software must always open water valve before catalyst valve.
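To make the constraint concrete, here is a minimal sketch (mine, not from the slides) of enforcing the batch-reactor ordering constraint in software; the valve interface and names are hypothetical.

```python
# Illustrative sketch only: enforcing "open water valve before catalyst valve".
# The Valve class and its interface are hypothetical, not from the slides.

class Valve:
    def __init__(self, name):
        self.name = name
        self.is_open = False

    def open(self):
        self.is_open = True

water_valve = Valve("water")
catalyst_valve = Valve("catalyst")

def open_catalyst():
    # Enforce the software safety constraint as an explicit guard, not an
    # implicit calling convention: catalyst flow is refused unless cooling
    # water is already flowing into the reflux condenser.
    if not water_valve.is_open:
        raise RuntimeError("safety constraint: water valve must be open first")
    catalyst_valve.open()

water_valve.open()
open_catalyst()   # succeeds only because the water valve opened first
```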

Page 7: Software System Safety

The Problem to be Solved

The primary safety problem in computer-based systems is the lack of appropriate constraints on design.

The job of the system safety engineer is to identify the design constraints necessary to maintain safety and to ensure the system and software design enforces them.

Page 8: Software System Safety

System Safety

MIL-STD-882: A planned, disciplined, and systematic approach to preventing or reducing accidents throughout the life cycle of a system.

"Organized common sense" (Mueller, 1968)

Primary concern is the management of hazards:
- Hazard identification, evaluation, elimination, and control
- through analysis, design, and management

An Overview of The Approach

"Engineers should recognize that reducing risk is not an impossible task, even under financial and time constraints. All it takes in many cases is a different perspective on the design problem."
  -- Mike Martin and Roland Schinzinger, Ethics in Engineering

Page 9: Software System Safety

System Safety (2)

Hazard analysis and control is a continuous, iterative process throughout system development and use:
  Conceptual development -> Design -> Development -> Operations
with hazard identification, hazard resolution, verification, change analysis, and operational feedback at every stage, overseen by management.

Hazard resolution precedence:
1. Eliminate the hazard
2. Prevent or minimize the occurrence of the hazard
3. Control the hazard if it occurs
4. Minimize damage

Process Steps

1. Perform a Preliminary Hazard Analysis
   - Produces hazard list
2. Perform a System Hazard Analysis (not just Failure Analysis)
   - Identifies potential causes of hazards
3. Identify appropriate design constraints on system, software, and humans.
4. Design at system level to eliminate or control hazards.
5. Trace unresolved hazards and system hazard controls to software requirements.

Page 10: Software System Safety

Process Steps (2)

6. Software requirements review and analysis
   - Completeness
   - Simulation and animation
   - Software hazard analysis
   - Robustness (environment) analysis
   - Mode confusion and other human error analyses
   - Human factors analyses (usability, workload, etc.)

Specifying Safety Constraints

- Derive from system hazard analysis
- Most software requirements only specify nominal behavior
  - Need to specify off-nominal behavior
- Need to specify what software must NOT do
  - What it must not do is not the inverse of what it must do

Page 11: Software System Safety

Process Steps (3)

7. Implementation with safety in mind
   - Defensive programming
   - Assertions and run-time checking
   - Separation of critical functions
   - Elimination of unnecessary functions
   - Exception handling, etc.
8. Off-nominal and safety testing

Process Steps (4)

9. Operational Analysis and Auditing
   - Change analysis
   - Incident and accident analysis
   - Performance monitoring
   - Periodic audits
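As an illustration of "assertions and run-time checking" (mine, not from the slides), the sketch below rejects implausible or stale sensor input before it reaches the control algorithm; the names and limits are hypothetical.

```python
# Illustrative sketch only: assertions and run-time checking as a
# defensive-programming pattern. Names and limits are hypothetical.

MAX_CLIMB_RATE_FPM = 6000      # hypothetical plausibility limit
SENSOR_TIMEOUT_S = 0.5         # hypothetical freshness requirement

def checked_climb_rate(raw_value, age_seconds):
    # Run-time checks reject out-of-range and stale sensor data rather
    # than silently feeding it to the control algorithm.
    assert age_seconds <= SENSOR_TIMEOUT_S, "stale sensor reading"
    assert -MAX_CLIMB_RATE_FPM <= raw_value <= MAX_CLIMB_RATE_FPM, \
        "climb rate outside physically plausible range"
    return raw_value

print(checked_climb_rate(1500, age_seconds=0.1))  # passes both checks
```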

Page 12: Software System Safety

A Human-Centered, Safety-Driven Design Process

[Figure: a two-track process diagram pairing System Safety Engineering with Human Factors activities around shared system-engineering steps — identify system goals and environmental assumptions; generate system and operational requirements and design constraints; allocate tasks and generate system design; design and construct components, controls and displays, operator manuals and training materials; verification; field testing, installation, operations, and training.

System Safety Engineering track: Preliminary Hazard Analysis, Hazard List, Fault Tree Analysis, Safety Requirements and Constraints, Completeness/Consistency Analysis, State Machine Hazard Analysis, Deviation Analysis (FMECA), Mode Confusion Analysis, Human Error Analysis, Timing and other analyses, Safety Testing, Software FTA, System Hazard Analysis, Safety Verification, Operational Analysis (Performance Monitoring, Change Analysis, Periodic audits, Incident and accident analysis).

Human Factors track: Preliminary Task Analysis, Operator Goals and Responsibilities, Task Allocation Principles, Operator Task and Training Requirements, Operator Task Analysis, Simulation/Experiments, Usability Analysis, Other Human Factors Evaluation (workload, situation awareness, etc.), Performance Monitoring, Operational Analysis.]

Page 13: Software System Safety

Preliminary Hazard Analysis

1. Identify system hazards
2. Translate system hazards into high-level system safety design constraints
3. Assess hazards if required to do so
4. Establish the hazard log

System Hazards for Automated Train Doors

- Train starts with door open.
- Door opens while train is in motion.
- Door opens while improperly aligned with station platform.
- Door closes while someone is in doorway.
- Door that closes on an obstruction does not reopen, or reopened door does not reclose.
- Doors cannot be opened for emergency evacuation.

Page 14: Software System Safety

System Hazards for Air Traffic Control

- Controlled aircraft violate minimum separation standards (NMAC).
- Airborne controlled aircraft enters an unsafe atmospheric region.
- Controlled airborne aircraft enters restricted airspace without authorization.
- Controlled airborne aircraft gets too close to a fixed obstacle other than a safe point of touchdown on assigned runway (CFIT).
- Controlled airborne aircraft and an intruder in controlled airspace violate minimum separation.
- Controlled aircraft operates outside its performance envelope.
- Controlled aircraft executes an extreme maneuver within its performance envelope.
- Loss of aircraft control.
- Aircraft on ground comes too close to moving objects or collides with stationary objects or leaves the paved area.
- Aircraft enters a runway for which it does not have clearance.

Exercise: Identify the system hazards for this cruise-control system.

The cruise control system operates only when the engine is running. When the driver turns the system on, the speed at which the car is traveling at that instant is maintained. The system monitors the car's speed by sensing the rate at which the wheels are turning, and it maintains desired speed by controlling the throttle position. After the system has been turned on, the driver may tell it to start increasing speed, wait a period of time, and then tell it to stop increasing speed. Throughout the time period, the system will increase the speed at a fixed rate, and then will maintain the final speed reached.

The driver may turn off the system at any time. The system will turn off if it senses that the accelerator has been depressed far enough to override the throttle control. If the system is on and senses that the brake has been depressed, it will cease maintaining speed but will not turn off. The driver may tell the system to resume speed, whereupon it will return to the speed it was maintaining before braking and resume maintenance of that speed.
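The description above amounts to a small mode machine; the sketch below (my illustration, not part of the original exercise) encodes the stated transitions so the off-nominal cases — e.g., what "resume" means after the system has been turned off — stand out.

```python
# Illustrative sketch of the cruise-control modes described above.
# Mode names and the transition table are my reading of the text,
# not an official specification.

OFF, MAINTAIN, ACCELERATING, BRAKE_SUSPENDED = "off", "maintain", "accel", "suspended"

class CruiseControl:
    def __init__(self):
        self.mode = OFF
        self.set_speed = None

    def turn_on(self, current_speed, engine_running):
        if engine_running:                  # operates only when engine runs
            self.mode, self.set_speed = MAINTAIN, current_speed

    def start_increase(self):
        if self.mode == MAINTAIN:
            self.mode = ACCELERATING        # speed rises at a fixed rate

    def stop_increase(self, current_speed):
        if self.mode == ACCELERATING:       # maintain the final speed reached
            self.mode, self.set_speed = MAINTAIN, current_speed

    def brake_pressed(self):
        if self.mode != OFF:                # ceases maintaining, does not turn off
            self.mode = BRAKE_SUSPENDED

    def resume(self):
        if self.mode == BRAKE_SUSPENDED:    # returns to the prior set speed
            self.mode = MAINTAIN

    def accelerator_override(self):
        self.mode = OFF                     # deep depression turns the system off

cc = CruiseControl()
cc.turn_on(current_speed=60, engine_running=True)
cc.brake_pressed()
cc.resume()
print(cc.mode, cc.set_speed)   # maintain 60
```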

Page 15: Software System Safety

Hazards must be translated into design constraints.

HAZARD -> DESIGN CRITERION (automated train doors)

- Train starts with door open.
  -> Train must not be capable of moving with any door open.
- Door opens while train is in motion.
  -> Doors must remain closed while train is in motion.
- Door opens while improperly aligned with station platform.
  -> Door must be capable of opening only after train is stopped and properly aligned with platform unless emergency exists (see below).
- Door closes while someone is in doorway.
  -> Door areas must be clear before door closing begins.
- Door that closes on an obstruction does not reopen, or reopened door does not reclose.
  -> An obstructed door must reopen to permit removal of obstruction and then automatically reclose.
- Doors cannot be opened for emergency evacuation.
  -> Means must be provided to open doors anywhere when the train is stopped for emergency evacuation.

Example PHA for ATC Approach Control

HAZARDS -> REQUIREMENTS/CONSTRAINTS

1. A pair of controlled aircraft violate minimum separation standards.
   1a. ATC shall provide advisories that maintain safe separation between aircraft.
   1b. ATC shall provide conflict alerts.
2. A controlled aircraft enters an unsafe atmospheric region (icing conditions, windshear areas, thunderstorm cells).
   2a. ATC must not issue advisories that direct aircraft into areas with unsafe atmospheric conditions.
   2b. ATC shall provide weather advisories and alerts to flight crews.
   2c. ATC shall warn aircraft that enter an unsafe atmospheric region.
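As an illustration (mine, not from the slides) of turning one of these design criteria into code, the guard below enforces "train must not be capable of moving with any door open"; the class and method names are hypothetical.

```python
# Illustrative sketch: a state guard enforcing the train-door design
# criterion "Train must not be capable of moving with any door open."
# The Train class and its interface are hypothetical.

class Train:
    def __init__(self, num_doors):
        self.doors_open = [False] * num_doors
        self.moving = False

    def command_move(self):
        # The interlock refuses the move command rather than trusting
        # every caller to check the doors first.
        if any(self.doors_open):
            raise RuntimeError("interlock: cannot move with a door open")
        self.moving = True

    def open_door(self, i):
        if self.moving:
            raise RuntimeError("interlock: doors remain closed while moving")
        self.doors_open[i] = True

train = Train(num_doors=4)
train.open_door(0)
try:
    train.command_move()
except RuntimeError as e:
    print(e)   # interlock: cannot move with a door open
```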

Page 16: Software System Safety

Example PHA for ATC Approach Control (2)

HAZARDS -> REQUIREMENTS/CONSTRAINTS

3. A controlled aircraft enters restricted airspace without authorization.
   3a. ATC must not issue advisories that direct an aircraft into restricted airspace unless avoiding a greater hazard.
   3b. ATC shall provide timely warnings to aircraft to prevent their incursion into restricted airspace.
4. A controlled aircraft gets too close to a fixed obstacle or terrain other than a safe point of touchdown on assigned runway.
   4. ATC shall provide advisories that maintain safe separation between aircraft and terrain or physical obstacles.
5. A controlled aircraft and an intruder in controlled airspace violate minimum separation standards.
   5. ATC shall provide alerts and advisories to avoid intruders if at all possible.
6. Loss of controlled flight or loss of airframe integrity.
   6a. ATC must not issue advisories outside the safe performance envelope of the aircraft.
   6b. ATC advisories must not distract or disrupt the crew from maintaining safety of flight.
   6c. ATC must not issue advisories that the pilot or aircraft cannot fly or that degrade the continued safe flight of the aircraft.
   6d. ATC must not provide advisories that cause an aircraft to fall below the standard glidepath or intersect it at the wrong place.

Page 17: Software System Safety

Classic Hazard Level Matrix

SEVERITY \ LIKELIHOOD   A Frequent  B Moderate  C Occasional  D Remote  E Unlikely  F Impossible
I   Catastrophic        I-A         I-B         I-C           I-D       I-E         I-F
II  Critical            II-A        II-B        II-C          II-D      II-E        II-F
III Marginal            III-A       III-B       III-C         III-D     III-E       III-F
IV  Negligible          IV-A        IV-B        IV-C          IV-D      IV-E        IV-F

Another Example Hazard Level Matrix

[Figure: a severity-by-likelihood matrix (Catastrophic, Critical, Marginal, Negligible vs. Frequent, Probable, Occasional, Remote, Improbable, Impossible) whose cells rank hazards from 1 (highest) to 12 (lowest). Recoverable band annotations include: hazard must be controlled or hazard probability reduced; operation requires elimination or control of the hazard; negligible hazard, normally no action required; and, for the Impossible column, assume the hazard will not occur. The exact cell-to-rank mapping is not recoverable from this transcript.]
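A hedged sketch of how such a matrix is used in practice: index a (severity, likelihood) pair to a hazard level. The table encodes only the classic matrix above; the priority ordering is illustrative, not from the slides.

```python
# Illustrative use of the classic hazard level matrix above.
# Severity I..IV and likelihood A..F index a hazard level such as "I-A";
# the lower the combined rank, the more urgent the resolution effort.

SEVERITIES = ["I", "II", "III", "IV"]          # Catastrophic .. Negligible
LIKELIHOODS = ["A", "B", "C", "D", "E", "F"]   # Frequent .. Impossible

def hazard_level(severity, likelihood):
    if severity not in SEVERITIES or likelihood not in LIKELIHOODS:
        raise ValueError("unknown severity or likelihood category")
    return f"{severity}-{likelihood}"

def priority_rank(severity, likelihood):
    # Illustrative ordering only: severity dominates, then likelihood.
    return SEVERITIES.index(severity) * len(LIKELIHOODS) + LIKELIHOODS.index(likelihood)

print(hazard_level("I", "C"), priority_rank("I", "C"))   # I-C 2
```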

Page 18: Software System Safety

Hazard Causal Analysis

- Used to refine the high-level safety constraints into more detailed constraints.
- Requires some type of model (even if only in the head of the analyst).
- Almost always involves some type of search through the system design (model) for states or conditions that could lead to system hazards:
  - Top-down or bottom-up
  - Forward or backward

Forward vs. Backward Search

[Figure: two diagrams over initiating events A, B, C, D and states W, X, Y, Z. A forward search follows events from initiating states forward to final states, most of which are nonhazards and some of which are hazards; a backward search starts from the hazard and traces back through the events and states that can produce it.]

Page 19: Software System Safety

Can use to refine system design constraints.

FTA and Software

- System fault trees are helpful in identifying potentially hazardous software behavior.
- Appropriate for qualitative analyses, not quantitative ones.
- FTA can be used to verify code:
  - Not looking for failures but incorrect paths (functions)
  - Identifies any paths from inputs to hazardous outputs, or provides some assurance they don't exist.

Fault Tree Example

[Figure: fault tree for a tank explosion. The top event "Explosion" is an AND of "pressure too high," "relief valve 1 does not open," and "relief valve 2 does not open." Recoverable lower branches (OR gates) include: computer does not issue command to open valve 1, or issues it too late; valve failure; computer failure; sensor failure; and operator does not know to open valve 2, fed by "valve 1 position indicator fails on," "open indicator light fails on," and "operator inattentive."]
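To show the mechanics (my sketch, using the gate structure described in the figure above), a fault tree can be represented and evaluated as nested AND/OR nodes over boolean basic events.

```python
# Illustrative sketch: evaluating the explosion fault tree above as
# nested AND/OR gates over boolean basic events. Event names follow the
# figure; the exact gate nesting below the top event is my reading of it.

def AND(*children): return lambda ev: all(c(ev) for c in children)
def OR(*children):  return lambda ev: any(c(ev) for c in children)
def leaf(name):     return lambda ev: ev.get(name, False)

relief_valve_1_fails = OR(leaf("computer does not command valve 1"),
                          leaf("valve 1 failure"))
operator_misses = OR(leaf("position indicator fails on"),
                     leaf("open indicator light fails on"),
                     leaf("operator inattentive"))
relief_valve_2_fails = OR(AND(operator_misses, leaf("valve 2 needs manual open")),
                          leaf("valve 2 failure"))
explosion = AND(leaf("pressure too high"),
                relief_valve_1_fails,
                relief_valve_2_fails)

events = {"pressure too high": True,
          "computer does not command valve 1": True,
          "operator inattentive": True,
          "valve 2 needs manual open": True}
print(explosion(events))   # True: this combination of events reaches the top
```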

Page 20: Software System Safety

Example Fault Tree for ATC Arrival Traffic

[Figure: fault tree for the hazard "a pair of controlled aircraft violate minimum separation standards." Intermediate events recoverable from the transcript include: controller does not issue speed advisory; controller issues speed advisory too late to avoid the separation violation; controller issues a speed advisory that does not avoid the violation; controller issues the advisory to the wrong aircraft; controller issues appropriate speed advisory but the pilot does not receive it (human physical failure, radio failure, radio on wrong frequency); pilot receives it but does not follow it; controller instructions do not cause the aircraft to make the necessary speed change; psychological slip, e.g., wrong label or label in a misleading place on the planview display associated with the aircraft.]

Example Fault Tree for ATC Arrival Traffic (2)

[Figure: continuation of the tree. Recoverable events include: violation of minimum in-trail separation between two aircraft on final approach to the same runway; two aircraft on final approach to parallel runways not spatially staggered; an aircraft violates the non-transgression zone while the airport is conducting independent approaches to parallel runways; two aircraft landing consecutively on different runways in intersecting or converging operations violate the minimum difference in threshold crossing time; violation of distance or time separation between streams of aircraft landing on different runways; violation of minimum separation between arrival traffic and departure traffic from nearby feeder airports; an aircraft fails to make the turn from base to final approach.]

Page 21: Software System Safety

Requirements Completeness

- Most software-related accidents involve software requirements deficiencies.
- Accidents often result from unhandled and unspecified cases.
- We have defined a set of criteria to determine whether a requirements specification is complete:
  - Derived from accidents and basic engineering principles.
  - Validated (at JPL) and used on industrial projects.

Completeness: Requirements are sufficient to distinguish the desired behavior of the software from that of any other undesired program that might be designed.

Requirements Completeness Criteria (2)

How were the criteria derived?
- Mapped the parts of a control loop to a state machine
- Defined completeness for each part of the state machine:
  - States, inputs, outputs, transitions (I/O)
  - Mathematical completeness
- Added basic engineering principles (e.g., feedback)
- Added what we have learned from accidents

Page 22: Software System Safety

Requirements Completeness Criteria (3)

About 60 criteria in all, including human-computer interaction (won't go through them all; they are in the book):
- Startup, shutdown
- Mode transitions
- Inputs and outputs
- Value and timing
- Load and capacity
- Environment capacity
- Failure states and transitions
- Human-computer interface
- Robustness
- Data age
- Latency
- Feedback
- Reversibility
- Preemption
- Path robustness

Most are integrated into the SpecTRM-RL language design, or simple tools can check them.
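As one concrete reading of the "data age" and "value and timing" criteria, here is a small sketch (mine, not from the book) of such a requirement made checkable: an input is usable for control only within a bounded freshness window.

```python
# Illustrative sketch of a "data age" completeness criterion made
# executable: a measurement may only be used for control while fresh.
# The 2-second bound is a hypothetical requirement value.

import time

MAX_DATA_AGE_S = 2.0   # hypothetical bound stated by the requirements

class Measurement:
    def __init__(self, value):
        self.value = value
        self.timestamp = time.monotonic()

    def current(self):
        return time.monotonic() - self.timestamp <= MAX_DATA_AGE_S

def control_output(measurement):
    if not measurement.current():
        # The requirements must say what to do here (off-nominal case),
        # not just what to do with fresh data.
        return "revert to fail-safe output"
    return f"command computed from {measurement.value}"

m = Measurement(42.0)
print(control_output(m))   # command computed from 42.0
```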

Page 23: Software System Safety

Requirements Analysis

- Completeness
- State Machine Hazard Analysis (backwards reachability)
- Software Deviation Analysis
- Human Error Analysis
- Model Execution, Animation, and Visualization
- Test Coverage Analysis and Test Case Generation
- Automatic code generation?

Model Execution and Animation

- SpecTRM-RL models are executable.
- Inputs can come from another model or simulator, and output can go into another model or simulator.
- Model execution is animated.
- Results of execution could be input into a graphical visualization.

Page 24: Software System Safety

Design for Safety

- Design should incorporate basic safety design principles.
- Software design must enforce safety constraints.
- Should be able to trace from requirements to code (and vice versa).

Safe Design Precedence
(effectiveness increases and cost decreases toward the top of the list)

HAZARD ELIMINATION
- Substitution
- Simplification
- Decoupling
- Elimination of human errors
- Reduction of hazardous materials or conditions

HAZARD REDUCTION
- Design for controllability
- Barriers (lockins, lockouts, interlocks)
- Failure minimization (safety factors and margins, redundancy)

HAZARD CONTROL
- Reducing exposure
- Isolation and containment
- Protection systems and fail-safe design

DAMAGE REDUCTION

Page 25: Software System Safety

Accident models provide the basis for:
- Investigating and analyzing accidents
- Preventing accidents
  - Hazard analysis
  - Design for safety
- Assessing risk (determining whether systems are suitable for use)
- Performance modeling and defining safety metrics

STAMP
(Systems Theory Accident Modeling and Processes)

STAMP is a new theoretical underpinning for developing more effective hazard analysis techniques for complex systems.

Page 26: Software System Safety

Chain-of-Events Models

- Explain accidents in terms of multiple events, sequenced as a forward chain over time.
- Events almost always involve component failure, human error, or an energy-related event.
- Form the basis of most safety-engineering and reliability-engineering analysis (e.g., Fault Tree Analysis, Probabilistic Risk Assessment, FMEA, Event Trees) and design (e.g., redundancy, overdesign, safety margins, ...).

Chain-of-Events Example

Moisture -> Corrosion -> Weakened metal -> Tank rupture (under operating pressure) -> Fragments projected -> Personnel injured AND/OR Equipment damaged

Countermeasures attached to each link:
- Moisture: Use desiccant to keep moisture out of tank.
- Corrosion: Use stainless steel, or coat or plate carbon steel to prevent contact with moisture.
- Weakened metal: Overdesign metal thickness so corrosion will not reduce strength to failure point during foreseeable lifetime.
- Tank rupture: Use burst diaphragm to rupture before tank does, preventing more extensive damage and fragmentation; reduce operating pressure as tank ages.
- Fragments projected: Provide mesh screen to contain possible fragments.
- Personnel injured / equipment damaged: Keep personnel from vicinity of tank while it is pressurized; locate tank away from equipment susceptible to damage.

Page 27: Software System Safety

Limitations of Event Chain Models

- Social and organizational factors in accidents
  - Models need to include the social system as well as the technology and its underlying science.
  - "Underlying every technology is at least one basic science, although the technology may be well developed long before the science emerges. Overlying every technical or civil system is a social system that provides purpose, goals, and decision criteria." -- Ralph Miles Jr.
- System accidents
- Software error

Chain-of-Events Example: Bhopal

E1: Worker washes pipes without inserting slip blind.
E2: Water leaks into MIC tank.
E3: Explosion occurs.
E4: Relief valve opens.
E5: MIC vented into air.
E6: Wind carries MIC into populated area around plant.

Page 28: Software System Safety

Limitations of Event Chain Models (2)

Human error:
- Deviation from normative procedure vs. established practice.
- Cannot effectively model human behavior by decomposing it into individual decisions and actions and studying it in isolation from the
  - physical and social context
  - value system in which it takes place
  - dynamic work process

Adaptation:
- Major accidents involve systematic migration of organizational behavior under pressure toward cost effectiveness in an aggressive, competitive environment.

Zeebrugge

[Figure: how decisions made separately in harbor design, vessel design, cargo management, passenger management, traffic scheduling, and vessel operation combined in the Zeebrugge ferry capsizing — berth design, change of docking procedure, transfer of the Herald to Zeebrugge, standing orders from Calais, excess numbers in passenger management, excess load routines, shipyard equipment load added, crew working patterns, captain's planning, operations management heuristics and time pressure — producing impaired stability, an unsafe docking procedure, and the capsizing.]

Accident Analysis: The combinatorial structure of possible accidents can easily be identified.
Operational Decision Making: Decision makers from separate departments in an operational context very likely will not see the forest for the trees.

Page 29: Software System Safety

Ways to Cope with Complexity

Analytic Reduction (Descartes):
- Divide system into distinct parts for analysis purposes.
- Examine the parts separately.

Three important assumptions:
1. The division into parts will not distort the phenomenon being studied.
2. Components are the same when examined singly as when playing their part in the whole.
3. Principles governing the assembling of the components into the whole are themselves straightforward.

Ways to Cope with Complexity (con't.)

Statistics:
- Treat as a structureless mass with interchangeable parts.
- Use Law of Large Numbers to describe behavior in terms of averages.
- Assumes components are sufficiently regular and random in their behavior that they can be studied statistically.

Page 30: Software System Safety

Systems Theory

- Developed for biology (Bertalanffy) and cybernetics (Norbert Wiener).
- For systems too complex for complete analysis and too organized for statistical analysis:
  - Separation into non-interacting subsystems distorts results.
  - The most important properties are emergent.
- Concentrates on analysis and design of the whole as distinct from the parts (basis of system engineering).
- Some properties can only be treated adequately in their entirety, taking into account all social and technical aspects. These properties derive from relationships between the parts of systems -- how they interact and fit together.

What about software?

- Too complex for complete analysis:
  - Separation into non-interacting subsystems distorts the results.
  - The most important properties are emergent.
- Too organized for statistics:
  - Too much underlying structure that distorts the statistics.

Page 31: Software System Safety

Systems Theory (2)

Two pairs of ideas:

1. Emergence and hierarchy
- Levels of organization, each more complex than the one below.
- Levels characterized by emergent properties:
  - Irreducible
  - Represent constraints upon the degree of freedom of components at a lower level.
- Safety is an emergent system property:
  - It is NOT a component property.
  - It can only be analyzed in the context of the whole.

Systems Theory (3)

2. Communication and control
- Hierarchies characterized by control processes working at the interfaces between levels.
- A control action imposes constraints upon the activity at one level of a hierarchy.
- Open systems are viewed as interrelated components kept in a state of dynamic equilibrium by feedback loops of information and control.
- Control in open systems implies the need for communication.

Page 32: Software System Safety

A Systems Theory Model of Accidents

- Safety is an emergent system property.
- Accidents arise from interactions among
  - people
  - societal and organizational structures
  - engineering activities
  - physical system components
  that violate the constraints on safe component behavior and interactions.
- Not simply chains of events or linear causality, but more complex types of causal connections.
- Need to include the entire socio-technical system.

Page 33: Software System Safety

STAMP
(Systems-Theoretic Accident Model and Processes)

- Based on systems and control theory.
- Systems not treated as a static design:
  - A socio-technical system is a dynamic process continually adapting to achieve its ends and to react to changes in itself and its environment.
  - Preventing accidents requires designing a control structure to enforce constraints on system behavior and adaptation.

STAMP (2)

- Views accidents as a control problem:
  - Result from lack of enforcement of safety constraints.
  - e.g., O-ring did not control propellant gas release by sealing gap in field joint.
  - Software did not adequately control descent speed of Mars Polar Lander.
- Events are the result of the inadequate control.
- To understand accidents, need to examine the control structure itself to determine why it was inadequate to maintain the safety constraints and why the events occurred.

Page 34: Software System Safety

[Figure: general socio-technical safety control structure, with parallel SYSTEM DEVELOPMENT and SYSTEM OPERATIONS control hierarchies. Each level imposes constraints on the level below and receives feedback from it:

- Congress and Legislatures: legislation down; government reports, lobbying, hearings and open meetings, and accidents up.
- Government Regulatory Agencies, Industry Associations, User Associations, Unions, Insurance Companies, Courts: regulations, standards, certification, legal penalties, and case law down; certification info, change reports, whistleblowers, accidents and incidents, accident and incident reports, operations and maintenance reports up.
- Company Management: safety policy, standards, and resources down; status reports, risk assessments, incident reports, and operations reports up.
- Project Management (development) and Operations Management (operations): safety standards, safety-related changes, work instructions, and operating procedures down; hazard analyses, progress reports, test reports and test requirements, review results, audit reports, problem reports, change requests, and performance audits up.
- Design, documentation, implementation and assurance, and manufacturing on the development side hand off safety constraints, hazard analyses, design rationale, work and safety reports, and manufacturing inspections and audits to the Operating Process and its maintenance and evolution on the operations side, which return operating assumptions, operating procedures, revised operating procedures, software revisions, and hardware replacements.
- At the bottom, the physical process is run by human and automated controllers through actuators and sensors.]

Page 35: Software System Safety

Accidents occur when:

- Design does not enforce safety constraints
  - unhandled disturbances, failures, dysfunctional interactions
- Inadequate control actions
- Control structure degrades over time; asynchronous evolution
- Control actions inadequately coordinated among multiple controllers
  - boundary areas
  - overlap areas (side effects of decisions and control actions)

[Figure: two controllers sharing or bordering the same process(es), illustrating boundary and overlap areas.]

Note: This does not imply the need for a "controller." Component failures may be controlled through design (e.g., redundancy, interlocks, fail-safe design) or through process (e.g., manufacturing processes and procedures, maintenance procedures). But it does imply the need to enforce the safety constraints in some way. The new model includes what we do now, and more.

Page 36: Software System Safety

Process Models

[Figure: a human supervisor (controller) and an automated controller in a control loop. The human holds a model of the process and a model of the automation; the automation holds a model of the process and a model of its interfaces. The automated controller acts on the controlled process through actuators (controlled variables) and observes it through sensors (measured variables); the process receives process inputs and disturbances and produces process outputs; displays and controls connect the human to the automation.]

Relationship between Safety and Process Models

Process models must contain:
- Required relationship among process variables
- Current state (values of process variables)
- The ways the process can change state

Accidents occur when the models do not match the process and incorrect control commands are given (or correct ones not given).

How do the models become inconsistent?
- Wrong from the beginning
- Missing or incorrect feedback; not updated correctly
- Time lags not accounted for

Explains most software-related accidents:
- unhandled or incorrectly handled system component failures
- unhandled process states (e.g., uncontrolled disturbances)
- inadvertently commanding the system into a hazardous state

[Note: these are related to what we called system accidents.]
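A minimal sketch (mine) of the point above: a controller acts on its internal process model, so missing feedback lets the model and the real process diverge until a "correct" command is unsafe. All names and dynamics are invented for illustration.

```python
# Illustrative sketch: a controller's process model diverging from the
# real process when feedback is lost. The tank, rates, and setpoint are
# invented for illustration.

class TankController:
    def __init__(self):
        self.model_level = 0.0          # controller's belief about the level

    def update_from_sensor(self, measured_level):
        self.model_level = measured_level

    def command(self):
        # The decision is based on the MODEL, not on the process itself.
        return "open inflow" if self.model_level < 10.0 else "close inflow"

actual_level, controller = 0.0, TankController()
for step in range(8):
    cmd = controller.command()
    if cmd == "open inflow":
        actual_level += 3.0
    if step < 2:                        # feedback works only at first...
        controller.update_from_sensor(actual_level)
    # ...then the sensor goes silent: the model freezes at a low level,
    # the controller keeps commanding inflow, and the tank overfills.
print(cmd, controller.model_level, actual_level)  # open inflow 6.0 24.0
```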

Page 37: Software System Safety

Safety and Human Mental Models

- Explains developer errors: developers may have an incorrect model of
  - required system or software behavior
  - development process
  - physical laws
  - etc.
- Also explains most human/computer interaction problems: pilots and others not understanding the automation --
  - Why did it do that?
  - What did it just do?
  - What will it do next?
  - How did it get us into this state?
  - How do I get it to do what I want?
  - Why won't it let us do that?
- Or they don't get the feedback to update their mental models, or they disbelieve it:
  - What caused the failure?
  - What can we do so it does not happen again?

Validating and Using the Model

- Can it explain (model) accidents that have already occurred?
- Is it useful?
  - In accident and mishap investigation
  - In preventing accidents
    - Hazard analysis
    - Designing for safety
- Is it better for these purposes than the chain-of-events model?

Page 38: Software System Safety

Modeling Accidents Using STAMP

Three types of models are needed:
1. Static safety control structure
   - Safety requirements and constraints
   - Flawed control actions
   - Context (social, political, etc.)
   - Mental model flaws
   - Coordination flaws
2. Dynamic structure
   - Shows how the safety control structure changed over time
3. Behavioral dynamics
   - Dynamic processes behind the changes, i.e., why the system changed

Using STAMP in Accident and Mishap Investigation and Root Cause Analysis

Page 39: Software System Safety

ARIANE 5 LAUNCHER

[Figure: control structure -- the On-Board Computer (OBC) commands the booster nozzles and the main engine nozzle; the SRI and backup SRI send horizontal velocity plus diagnostic and flight information to the OBC; both SRIs read the strapdown inertial platform.]

Ariane 5: A rapid change in attitude and high aerodynamic loads, stemming from a high angle of attack, create aerodynamic forces that cause the launcher to disintegrate at 39 seconds after the command for main engine ignition (H0).

Self-Destruct System: Triggered (as designed) by the boosters separating from the main stage at an altitude of 4 km and 1 km from the launch pad.

Nozzles: Full nozzle deflections of the solid boosters and main engine lead to an angle of attack of more than 20 degrees.

OBC (On-Board Computer): Executes the flight program; controls the nozzles of the solid boosters and the Vulcain cryogenic engine.

OBC Safety Constraint Violated: Commands from the OBC to the nozzles must not result in the launcher operating outside its safe envelope.

Unsafe Behavior: Control command sent to the booster nozzles, and later to the main engine nozzle, to make a large correction for an attitude deviation that had not occurred.

Process Model: Model of the current launch attitude is incorrect, i.e., it contains an attitude deviation that had not occurred. Results in incorrect commands being sent to the nozzles.

Control Algorithm Flaw: Interprets diagnostic information from the SRI as flight data and uses it for flight control calculations. With both SRI and backup SRI shut down, and therefore no possibility of getting correct guidance and attitude information, loss was inevitable.

Interface Model: Incomplete or incorrect (not enough information in the accident report to determine which) -- does not include the diagnostic information from the SRI that is available on the databus.

Feedback: Diagnostic information received from SRI.

Page 40: Software System Safety

[Figure: same ARIANE 5 control structure as on the previous page -- OBC, SRI, backup SRI, booster nozzles, main engine nozzle, strapdown inertial platform.]

SRI (Inertial Reference System): Measures the attitude of the launcher and its movements in space.

SRI Safety Constraint Violated: The SRI must continue to send guidance information as long as it can get the necessary information from the strapdown inertial platform.

Control Algorithm: Calculates the Horizontal Bias (an internal alignment variable used as an indicator of alignment precision over time) using the horizontal velocity input from the strapdown inertial platform (C). Conversion from a 64-bit floating point value to a 16-bit signed integer leads to an unhandled overflow exception while calculating the horizontal bias. The algorithm was reused from Ariane 4, where the horizontal bias variable does not get large enough to cause an overflow.

Process Model: Does not match Ariane 5 (based on Ariane 4 trajectory data); assumes smaller horizontal velocity values than are possible on Ariane 5.

Unsafe Behavior: At 36.75 seconds after H0, the SRI detects an internal error and turns itself off (as it was designed to do) after putting diagnostic information on the bus (D).

Backup SRI (Inertial Reference System): Measures the attitude of the launcher and its movements in space; takes over if the SRI is unable to send guidance info.

Backup SRI Safety Constraint Violated: The backup SRI must continue to send guidance information as long as it can get the necessary information from the strapdown inertial platform.

Control Algorithm and Process Model: Identical to the SRI's. Because the algorithm was the same in both SRI computers, the overflow resulted in the same behavior, i.e., shutting itself off.

Unsafe Behavior: At 36.75 seconds after H0, the backup SRI detects an internal error and turns itself off (as it was designed to do).
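To make the failure mechanism concrete, here is a small sketch (mine, in Python rather than the original Ada) of an unprotected 64-bit-float to 16-bit-signed-integer conversion: within Ariane 4's envelope it works; outside it, the conversion raises an unhandled exception that shuts the unit down. The velocity numbers are invented.

```python
# Illustrative sketch of the Ariane 5 SRI overflow: converting a 64-bit
# float to a 16-bit signed integer without a protective range check.
# Python stands in for the original Ada; the numbers are invented.

INT16_MIN, INT16_MAX = -32768, 32767

def to_int16_unprotected(x):
    i = int(x)
    if not INT16_MIN <= i <= INT16_MAX:
        # In the SRI this case was assumed impossible (true for Ariane 4
        # trajectories) and therefore left unhandled.
        raise OverflowError(f"{x} does not fit in a 16-bit signed integer")
    return i

print(to_int16_unprotected(1013.25))   # fine: within the Ariane 4 envelope

try:
    to_int16_unprotected(52000.0)      # Ariane 5 horizontal-velocity scale
except OverflowError as e:
    print("SRI shuts down:", e)        # unhandled in flight -> unit off
```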

Page 41: Software System Safety

Titan 4/Centaur/Milstar

[Figure: safety control structure for the Titan 4/Centaur/Milstar mission, spanning DEVELOPMENT and OPERATIONS.

Development side: Space and Missile Systems Center Launch Directorate (SMC) (responsible for administration of the LMA contract); Defense Contract Management Command (contract administration, software surveillance, overseeing the process); Prime Contractor (LMA) (responsible for design and construction of the flight control system); LMA System Engineering; LMA Quality Assurance; Flight Control Software -- Software Design and Development; Honeywell (IMS software); Aerospace (monitor software development and test); Analex (IV&V), with Analex Denver (IV&V of flight software) and Analex-Cleveland (verify design); LMA FAST Lab (system test of INU).

Operations side: Third Space Launch Squadron (3SLS) (responsible for ground operations management); Ground Operations (CCAS); Titan/Centaur/Milstar.

Partially recoverable findings attached to the structure include: designed an IV&V process that did not include the load tape; used default values for testing the software implementation; validated the design constant but not the actual constant; misunderstanding of the load tape creation process; misunderstanding about what could be tested; all safety-critical data and software must be included; IV&V must be performed on the as-flown system.]

Page 42: Software System Safety

Dynamic Structure

[Figure: Walkerton water contamination -- the safety control structure and its evolution over time.

System Hazard: Public is exposed to E. coli or other health-related contaminants through drinking water.

System Safety Constraints: (1) Water quality must not be compromised. (2) Public health measures must reduce the risk of exposure if water quality is compromised (e.g., notification and procedures to follow). The safety control structure must prevent exposure of the public to contaminated water.

Recoverable elements of the control structure: Federal and Provincial Government guidelines and budgets; Ministry of the Environment (ODWO and Chlorination Bulletin, certificates of approval, operator certification, inspection and other reports, regulatory policy); Ministry of Health; BGOS Medical Dept. of Health (advisories, warnings, status-report requests); Walkerton PUC operations (chlorination, chlorine residual measurement, water samples, operator problem reporting); Testing Lab (reports); Ministry of Agriculture, Food, and Rural Affairs; Walkerton residents (complaints, hospital reports, and input from the medical community); and the physical system -- wells (Well 5, Well 7) with design flaws including no chlorinator, shallow location, porous bedrock, minimal overburden, heavy rains, and contaminants entering the water.

The repeated panels show this structure at successive points in time as feedback channels and oversight (inspections, reporting, budgets) eroded.]

Page 43: Software System Safety

Modeling Behavioral Dynamics

[Figure: a system dynamics model of the behavioral dynamics behind the Walkerton accident. Recoverable variables and loops include: MOE budget and municipality budget; oversight; complacency; public awareness; operator competence; quality of training; quality of wells; equipment maintenance; performance of chlorinators; operator problem reporting; reported and sampled water quality; mental models of the process held by the operators, the PUC, the MOE, and the BGOS medical unit; effectiveness of chlorination and of the BGOS and municipal controls; risk of E. coli infection; and the rate of drinking-water contamination. Most arc labels are not recoverable from this transcript.]

Page 44: Software System Safety

A (Partial) System Dynamics Model of the Columbia Accident

[Figure: causal-loop diagram. Variables include: External Pressure; Performance Pressure; Expectations; Launch Rate; Budget cuts; Budget cuts directed toward safety; Priority of safety programs; System safety efforts; Safety; Rate of safety increase; Complacency; Rate of increase in complacency; Success; Success Rate; Accident Rate; Perceived Risk. Named feedback loops include reinforcing loop R1 "Pushing the Limit" and balancing loops "Limits to Success" and "Problems have been fixed."]
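A toy sketch (mine; the equations and coefficients are invented, not NASA data) of how such loops are simulated: success breeds complacency, complacency erodes safety effort, and eroded effort raises risk.

```python
# Toy system-dynamics sketch of the "success breeds complacency" loop in
# the figure above. Coefficients and initial values are invented purely
# to illustrate the simulation style (simple Euler integration).

DT = 1.0                       # one time step per "year"
complacency, safety_effort, risk = 0.1, 1.0, 0.1

for year in range(10):
    success = 1.0 - risk                        # fewer losses look like success
    complacency += DT * 0.3 * success           # success feeds complacency
    safety_effort += DT * (-0.2 * complacency)  # complacency erodes effort
    safety_effort = max(safety_effort, 0.0)
    risk = min(1.0, max(0.0, 0.5 - 0.4 * safety_effort))  # effort suppresses risk
    print(f"year {year}: effort={safety_effort:.2f} risk={risk:.2f}")
```

Running it shows risk creeping upward year over year while nothing "fails" -- the drift-toward-danger pattern the model is meant to expose.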

Page 45: Software System Safety

STAMP vs. Traditional Accident Models

- Examines interrelationships rather than linear cause-effect chains.
- Looks at the processes behind the events.
- Includes the entire socio-economic system.
- Includes behavioral dynamics (changes over time):
  - Want not just to react to accidents and impose controls for a while, but to understand why controls drift toward ineffectiveness over time, and
  - Detect the drift before accidents occur.

Steps in a STAMP analysis:

1. Identify
   - System hazards
   - System safety constraints and requirements
   - Control structure in place to enforce constraints
   - Inadequate control actions and decisions
   - Control flaws (e.g., missing feedback loops)
   - Context in which decisions were made
   - Mental model flaws
   - Coordination flaws
2. Model dynamic aspects of the accident:
   - Changes to the static safety control structure over time
   - Dynamic processes in effect that led to the changes
   - Change those factors if possible
3. Create the overall explanation for the accident.

Page 46: Software System Safety

Using STAMP to Prevent Accidents

- Hazard Analysis
- Risk Assessment
- Safety Metrics and Performance Auditing

STAMP-Based Hazard Analysis (STPA)

- Provides information about how safety constraints could be violated.
- Used to eliminate, reduce, and control hazards in system design, development, manufacturing, and operations.
- Assists in designing safety into the system from the beginning -- not just after-the-fact analysis.
- Includes software, operators, system accidents, management, regulatory authorities.
- Can use a concrete model of control (SpecTRM-RL) that is executable and analyzable.

Page 47: Software System Safety

STPA -- Step 1: Identify hazards and translate into high-level requirements and constraints on behavior.

TCAS Hazards

1. A near mid-air collision (NMAC) (a pair of controlled aircraft violate minimum separation standards)
2. A controlled maneuver into the ground
3. Loss of control of aircraft
4. Interference with other safety-related aircraft systems
5. Interference with ground-based ATC system
6. Interference with ATC safety-related advisory

STPA -- Step 2: Define basic control structure.

[Figure: TCAS control structure. The FAA sits above Airline Ops Management and Local ATC Ops Management; an Air Traffic Controller (via radar, a Flight Data Processor, and radio advisories) and the pilots control their aircraft; each aircraft's TCAS unit exchanges own and other aircraft information with the other aircraft, presents displays and aural alerts to its pilot, and accepts an operating mode.]

Page 48: Software System Safety

STPA -- Step 3: Identify potential inadequate control actions that could lead to a hazardous process state.

In general:
1. A required control action is not provided.
2. An incorrect or unsafe control action is provided.
3. A potentially correct or adequate control action is provided too late (at the wrong time).
4. A correct control action is stopped too soon.

For the NMAC hazard:

TCAS:
1. The aircraft are on a near collision course and TCAS does not provide an RA.
2. The aircraft are in close proximity and TCAS provides an RA that degrades vertical separation.
3. The aircraft are on a near collision course and TCAS provides an RA too late to avoid an NMAC.
4. TCAS removes an RA too soon.

Pilot:
1. The pilot does not follow the resolution advisory provided by TCAS (does not respond to the RA).
2. The pilot incorrectly executes the TCAS resolution advisory.
3. The pilot applies the RA but too late to avoid the NMAC.
4. The pilot stops the RA maneuver too soon.
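These four guide categories lend themselves to a simple table-generation step; the sketch below (my illustration, not a tool from the slides) crosses each control action with the categories to produce candidate unsafe control actions for an analyst to assess.

```python
# Illustrative sketch: generating candidate unsafe control actions by
# crossing each control action with STPA's four general categories.
# The control actions listed are examples from the TCAS/pilot discussion.

CATEGORIES = [
    "required action not provided",
    "incorrect or unsafe action provided",
    "correct action provided too late (wrong time)",
    "correct action stopped too soon",
]

control_actions = ["TCAS issues resolution advisory (RA)",
                   "Pilot executes RA maneuver"]

candidates = [(action, category)
              for action in control_actions
              for category in CATEGORIES]

for action, category in candidates:
    print(f"{action} -- {category}")
# Each row is a prompt for the analyst: can this combination lead to
# the NMAC hazard, and under what conditions?
```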

Page 49: Software System Safety

STPA -- Step 4: Determine how potentially hazardous control actions could occur.

- Eliminate from design, or control or mitigate in design or operations.
- Guided by a set of generic control loop flaws.
- Can use a concrete model in SpecTRM-RL:
  - Assists with communication and completeness of analysis.
  - Provides a continuous simulation and analysis environment to evaluate the impact of faults and the effectiveness of mitigation features.
- Where a human or organization is involved, must evaluate:
  - Context in which decisions are made
  - Behavior-shaping mechanisms (influences)

Step 4a: Augment the control structure with process models for each control component.
Step 4b: For each of the inadequate control actions, examine the parts of the control loop to see if they could cause it.
Step 4c: Consider how the designed controls could degrade over time.

Page 50: Software System Safety

[Figure: SpecTRM-RL model of TCAS as the automated controller in a generic control loop (human supervisor/controller with controls and displays; automated controller with models of the process, interfaces, and automation; actuators and sensors around the controlled process, with process inputs, outputs, measured and controlled variables, and disturbances).

Own Aircraft Model state variables (values in brackets): Status [Airborne, On ground, Unknown]; Altitude Layer [Layer 1-4, Unknown]; Sensitivity Level [1-7, Unknown]; Current RA Sense [None, Climb, Descend, Unknown]; Current RA Level / RA Strength [None, VSL 0, VSL 500, VSL 1000, VSL 2000, Nominal 1500, Increase 2500, Unknown]; Climb Inhibit, Descent Inhibit, Increase Climb Inhibit, Increase Descent Inhibit [Inhibited, Not Inhibited, Unknown]; Crossing [Non-Crossing, Int-Crossing, Own-Cross, Unknown]; Reversal [Reversed, Not Reversed, Not Selected, Unknown]; System Start; Fault Detected.

Other Aircraft (1..30) Model state variables: Classification [Other Traffic, Proximate Traffic, Potential Threat, Threat, Unknown]; Altitude Reporting [Yes, No, Lost, Unknown]; RA Sense; Status.

INPUTS FROM OWN AIRCRAFT: Radio Altitude and Radio Altitude Status, Barometric Altitude and Barometric Altimeter Status, Air Status, Altitude Rate, Prox Traffic Display, Aircraft Altitude Limit, Config Climb Inhibit, Altitude Climb Inhibit, Increase Climb Inhibit Discrete, Own Mode S address, Traffic Display Permitted.

INPUTS FROM OTHER AIRCRAFT: Other Altitude, Other Altitude Valid, Other Bearing, Other Bearing Valid, Range, Mode S Address, Sensitivity Level, Equippage.]

Page 51: Software System Safety

STPA -- Step 4b: Examine the control loop for the potential to cause inadequate control actions.

Inadequate control actions (enforcement of constraints):
- Design of the control algorithm (process) does not enforce constraints
- Process models inconsistent, incomplete, or incorrect (lack of linkup)
  - Flaw(s) in the creation or updating process
  - Inadequate or missing feedback
    - Not provided in system design
    - Communication flaw
    - Time lags and measurement inaccuracies not accounted for
  - Inadequate sensor operation (incorrect or no information provided)
- Inadequate coordination among controllers and decision-makers (boundary and overlap areas)

Inadequate execution of control actions:
- Communication flaw
- Inadequate "actuator" operation
- Time lag

STPA -- Step 4c: Consider how designed controls could degrade over time.

E.g., specified procedures ==> effective procedures.

Use the information to design protection against changes:
- Controls over changes and maintenance activities
- Auditing procedures and performance metrics
- Management feedback channels to detect unsafe changes
  - e.g., operational procedures

Use system dynamics models?

Page 52: Software System Safety

Comparisons with Traditional HA Techniques

- Considers more than just component failures and failure events.
- Guidance in doing the analysis (vs. FTA):
  - General model of inadequate control
  - Concrete model (not just in the analyst's head)
- Handles dysfunctional interactions, software, management, etc.
- Includes the HAZOP model but is more general:
  - HAZOP guidewords are based on a model of accidents being caused by deviations in system variables
  - Not physical structure (HAZOP) but control (functional) structure
- Top-down (vs. bottom-up like FMECA)
- Compared with the TCAS II fault tree (MITRE): STPA results more comprehensive.