Critical Systems Specification
IS301 – Software Engineering
Lecture #18 – 2003-10-23
M. E. Kabay, PhD, CISSP
Dept of Computer Information Systems, Norwich University
[email protected]
Copyright © 2003 M. E. Kabay. All rights reserved.
Acknowledgement
All of the material in this presentation is based directly on slides kindly provided by Prof. Ian Sommerville on his Web site at http://www.software-engin.com
Used with Sommerville’s permission, as extended by him, for all non-commercial educational use
Copyright in Kabay’s name applies solely to the appearance of and minor changes to Sommerville’s work, or to original materials, and is used solely to prevent commercial exploitation of this material
Topics
Software reliability specification
Safety specification
Security specification
Dependable Systems Specification
Processes and techniques for developing specifications for:
System availability
Reliability
Safety
Security
Functional and Non-Functional Requirements
System functional requirements:
Define error checking
Recovery facilities and features
Protection against system failures
Non-functional requirements:
Required reliability
Availability of system
System Reliability Specification
Hardware reliability
P{hardware component failing}?
Time to repair component?
Software reliability
P{incorrect output}?
Software can often continue operation after an error; hardware faults often cause a stoppage
Operator reliability
P{operator error}?
What Happens When All Components Must Work?
Consider a system with two independent components A and B, where
P{failure of A} = PA
P{failure of B} = PB
Recall that P{not A} = 1 – P{A} and, for independent events, P{A and B} = P{A} × P{B}
Operation of the system depends on both components, so the system fails if at least one of them fails:
P{A will not fail} = (1 – PA)
P{B will not fail} = (1 – PB)
P{A and B will both not fail} = (1 – PA)(1 – PB)
P{system failure} = 1 – [(1 – PA)(1 – PB)]
General Principles
If there are n elements, each element i having probability of failure Pi, and all of them have to work for the system to work, then the probability of system failure PS is

PS = 1 – (1 – P1)(1 – P2)…(1 – Pn)

Therefore, as the number of components (all of which need to function) increases, the probability of system failure increases
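The formula above is easy to check numerically. A minimal Python sketch (the function name is my own, not from the slides):

```python
from math import prod

def series_failure_probability(failure_probs):
    """Probability that a system fails when every component must work:
    P_S = 1 - (1 - P_1)(1 - P_2)...(1 - P_n)."""
    return 1 - prod(1 - p for p in failure_probs)

# Three components, each failing with probability 0.01:
print(series_failure_probability([0.01, 0.01, 0.01]))  # ~0.0297

# Thirty such components: system failure probability rises to ~0.26
print(series_failure_probability([0.01] * 30))
```

Note how quickly the system failure probability grows with the number of required components, which is the point of the slide.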
Component Replication
If components with failure probability P are replicated so that the system works as long as any one of the n components works, then the probability of system failure is

PS = P{all n will fail} = P^n

If instead the system fails when any one of the components fails, then the probability of system failure is

PS = P{at least 1 will fail} = P{not all will work} = 1 – (1 – P)^n
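The contrast between the two cases can be sketched in Python (function names are my own):

```python
def replicated_failure(p, n):
    """System works if ANY of n replicas works, so it fails
    only when all n fail: P_S = p ** n."""
    return p ** n

def chain_failure(p, n):
    """System fails if ANY of n required components fails:
    P_S = 1 - (1 - p) ** n."""
    return 1 - (1 - p) ** n

# With per-component failure probability 0.01 and n = 3:
print(replicated_failure(0.01, 3))  # about 1e-06: redundancy helps
print(chain_failure(0.01, 3))       # about 0.0297: more required parts hurt
```

The same component count either strengthens or weakens the system, depending on whether the components are redundant replicas or a chain of required parts.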
Examples of Functional Reliability Requirements
A predefined range for all values input by the operator shall be defined, and the system shall check that all operator inputs fall within this predefined range
The system shall check all disks for bad blocks when it is initialized
The system must use N-version programming to implement the braking control system
The system must be implemented in a safe subset of Ada and checked using static analysis
Non-Functional Reliability Specification
The required level of system reliability should be expressed quantitatively
Reliability is a dynamic system attribute, so reliability specifications related to the source code are meaningless:
“No more than N faults/1000 lines” -- BAD
Useful only for post-delivery process analysis -- trying to assess quality of development techniques
An appropriate reliability metric should be chosen to specify overall system reliability
Reliability Metrics
Reliability metrics: units of measurement of system reliability
Count number of operational failures
Relate failures to demands on the system and to the time the system has been operational
A long-term measurement program is required to assess the reliability of critical systems
Reliability Metrics
POFOD (Probability of failure on demand): The likelihood that the system will fail when a service request is made. For example, a POFOD of 0.001 means that 1 out of a thousand service requests may result in failure.
ROCOF (Rate of failure occurrence): The frequency with which unexpected behaviour is likely to occur. For example, a ROCOF of 2/100 means that 2 failures are likely to occur in each 100 operational time units. This metric is sometimes called the failure intensity.
MTTF (Mean time to failure): The average time between observed system failures. For example, an MTTF of 500 means that 1 failure can be expected every 500 time units.
MTTR (Mean time to repair): The average time between a system failure and the return of that system to service.
AVAIL (Availability): The probability that the system is available for use at a given time. For example, an availability of 0.998 means that in every 1000 time units, the system is likely to be available for 998 of these.
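These metrics are related to one another. For example, steady-state availability follows from MTTF and MTTR via the standard textbook relation AVAIL = MTTF / (MTTF + MTTR), which is not stated on the slide but is consistent with its example:

```python
def availability(mttf, mttr):
    """Steady-state availability: fraction of time the system is usable.
    Standard relation: AVAIL = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)

# A system that runs 499 time units between failures and takes
# 1 time unit to repair matches the slide's AVAIL example of 0.998:
print(availability(499, 1))
```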
Probability of Failure on Demand (POFOD)
Probability that the system will fail when a service request is made. Useful when demands for service are intermittent and relatively infrequent
Appropriate for protection systems where services are demanded occasionally and where there are serious consequences if the service is not delivered
Relevant for many safety-critical systems with exception management components, e.g., the emergency shutdown system in a chemical plant
Rate of Failure Occurrence (ROCOF)
Reflects the rate of occurrence of failures in the system
A ROCOF of 0.002 means 2 failures are likely in each 1,000 operational time units, e.g., 2 failures per 1,000 hours of operation
Relevant for operating systems and transaction processing systems where the system has to process a large number of similar requests relatively frequently, e.g., a credit card processing system or an airline booking system
Mean Time to Failure
A measure of the time between observed failures of the system; the reciprocal of ROCOF for stable systems
An MTTF of 500 means the mean time between failures is 500 time units
Relevant for systems with long transactions, i.e., where system processing takes a long time; MTTF should be longer than the transaction length, e.g., computer-aided design systems where a designer will work on a design for several hours, or word processor systems
Availability
A measure of the fraction of time the system is available for use; takes repair and restart time into account
An availability of 0.998 means the software is available for 998 out of every 1,000 time units
Relevant for non-stop, continuously running systems, e.g., telephone switching systems and railway signaling systems
Failure Consequences
Reliability measurements do NOT take the consequences of failure into account
Transient faults may have no real consequences, but other faults may cause data loss, data corruption, or loss of system service
Identify different failure classes and use different metrics for each of them; the reliability specification must be structured
Failure Consequences
When specifying reliability, it is not just the number of system failures that matters but also the consequences of these failures
Failures with serious consequences are clearly more damaging than those where repair and recovery are straightforward
In some cases, therefore, different reliability specifications may be defined for different types of failure
Failure Classification
Transient: Occurs only with certain inputs
Permanent: Occurs with all inputs
Recoverable: System can recover without operator intervention
Unrecoverable: Operator intervention needed to recover from failure
Non-corrupting: Failure does not corrupt system state or data
Corrupting: Failure corrupts system state or data
Steps to Reliability Specification
For each sub-system, analyze the consequences of possible system failures
From the system failure analysis, partition failures into appropriate classes
For each failure class identified, set out the required reliability using an appropriate metric; different metrics may be used for different reliability requirements
Identify functional reliability requirements to reduce the chances of critical failures
Bank Auto-Teller System
Expected usage statistics:
Each machine in the network is used 300 times per day
Lifetime of a software release: 2 years
Each machine handles about 220,000 transactions over 2 years
Total throughput:
The bank has 1,000 ATMs
~300,000 database transactions per day
~110M transactions per year
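The throughput figures above follow from simple arithmetic, which a short Python sketch can reproduce:

```python
# Reproduce the slide's ATM throughput estimates.
uses_per_machine_per_day = 300
machines = 1_000
release_lifetime_days = 2 * 365   # 2-year software release lifetime

per_machine_lifetime = uses_per_machine_per_day * release_lifetime_days
daily_total = uses_per_machine_per_day * machines
yearly_total = daily_total * 365

print(per_machine_lifetime)  # 219000, i.e. about 220,000 per machine
print(daily_total)           # 300000 database transactions per day
print(yearly_total)          # 109500000, i.e. about 110M per year
```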
Bank ATM (cont’d)
Types of failure:
Single-machine failures: affect an individual ATM
Network failures: affect groups of ATMs and lower throughput
Central database failures: potentially affect the entire network
Examples of Reliability Spec.
Failure class: Permanent, non-corrupting
Example: System fails to operate with any card that is input; software must be restarted to correct the failure
Reliability metric: ROCOF of 1 occurrence per 1,000 days

Failure class: Transient, non-corrupting
Example: Magnetic-stripe data cannot be read on an undamaged card that is input
Reliability metric: POFOD of 1 in 1,000 transactions

Failure class: Transient, corrupting
Example: A pattern of transactions across the network causes database corruption
Reliability metric: Unquantifiable! Should never happen in the lifetime of the system
Specification Validation
It is impossible to validate very high reliability specifications empirically
E.g., in the ATM example, “no database corruptions” = a POFOD of less than 1 in 220 million
If a transaction takes 1 second, then simulating one day’s ATM transactions on a single system would take 300,000 seconds ≈ 3.5 days
Testing a single run of 110M transactions would take ~3.5 years; it would take longer than the system’s lifetime (2 years) to test it for reliability
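The test-time arithmetic on this slide can be checked directly, assuming (as the slide does) 1 second per transaction:

```python
SECONDS_PER_DAY = 86_400

day_of_transactions = 300_000        # seconds to replay one day's load
year_of_transactions = 110_000_000   # seconds to replay one year's load

print(day_of_transactions / SECONDS_PER_DAY)           # ~3.47 days
print(year_of_transactions / (SECONDS_PER_DAY * 365))  # ~3.49 years
```

Both results round to the slide's 3.5-day and 3.5-year figures, confirming that empirical validation is impractical within the 2-year software lifetime.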
Topics
Software reliability specification
Safety specification
Security specification
Safety Specification
Safety requirements of a system should be:
Separately specified
Based on an analysis of possible hazards and risks
Safety requirements usually apply to the system as a whole rather than to individual sub-systems
In systems engineering terms, the safety of a system is an emergent property
Safety Life-Cycle
[Figure: the safety life-cycle. Concept and scope definition leads to hazard and risk analysis, then safety requirements derivation and allocation; planning and development covers planning, safety-related systems development, and external risk reduction facilities; followed by installation and commissioning, safety validation, operation and maintenance, and system decommissioning.]
Safety Processes
Hazard and risk analysis: assess the hazards and risks of damage associated with the system
Safety requirements specification: specify a set of safety requirements which apply to the system
Designation of safety-critical systems: identify sub-systems whose incorrect operation may compromise system safety; ideally, these should be as small a part as possible of the whole system
Safety validation: check overall system safety
Hazard and Risk Analysis
[Figure: the hazard and risk analysis process. Hazard identification produces a hazard description; risk analysis and hazard classification produce a risk assessment; hazard decomposition (e.g., fault tree analysis) and risk reduction assessment yield the preliminary safety requirements.]
Hazard Analysis Stages
Hazard identification: identify potential hazards which may arise
Risk analysis and hazard classification: assess the risk associated with each hazard
Hazard decomposition: decompose hazards to discover their potential root causes
Risk reduction assessment: define how each hazard must be taken into account when the system is designed
Fault-Tree Analysis
A method of hazard analysis that starts with an identified fault and works backward to the causes of the fault
Used at all stages of hazard analysis, from preliminary analysis through detailed software checking
A top-down hazard analysis method; may be combined with bottom-up methods, which start with system failures and lead to hazards
Fault-Tree Analysis
Identify the hazard
Identify potential causes of the hazard; there are usually several alternative causes
Link these on the fault tree with ‘or’ or ‘and’ symbols
Continue the process until root causes are identified
Consider the following example: how data might be lost in a system where a backup process is running
Fault Tree Example
[Figure: fault tree for the top event “Data deleted”, with ‘or’ gates throughout. First-level causes: H/W failure, S/W failure, External attack, Operator failure. S/W failure decomposes into Operating system failure and Backup system failure; lower-level causes include Incorrect configuration, Incorrect operator input, Execution failure, Timing fault, Algorithm fault, Data fault, UI design fault, Training fault, and Human error.]
Risk Assessment
Assesses hazard severity, hazard probability, and accident probability
The outcome of risk assessment is a statement of acceptability:
Intolerable: must never arise or result in an accident
As low as reasonably practical (ALARP): must minimize the possibility of the hazard given cost and schedule constraints
Acceptable: the consequences of the hazard are acceptable, and no extra costs should be incurred to reduce the hazard probability
Levels of Risk
Unacceptable regionrisk cannot be tolerated
Risk tolerated only ifrisk reduction is impractical
or grossly expensive
Acceptableregion
Negligible risk
ALARPregion
As low as reasonably
practical
RIS
KS
COSTS
37 Copyright © 2003 M. E. Kabay. All rights reserved.
Risk Acceptability
The acceptability of risk is determined by human, social, and political considerations
In most societies, the boundaries between the regions are pushed upwards with time; i.e., society is increasingly less willing to accept risk
For example, the costs of cleaning up pollution may be less than the costs of preventing it, but pollution may not be socially acceptable
Risk assessment is often highly subjective: we often lack hard data on real probabilities, and whether risks are identified as probable, unlikely, etc. depends on who is making the assessment
Why Do We Lack Firm Risk Probabilities and Costs?
Failure of observation: don’t notice
Failure of reporting: don’t tell anyone
Variability of systems: can’t pool data
Difficulty of classifying incidents: can’t compare problems
Difficulty of measuring costs: don’t know all the repercussions
Risk Reduction
The system should be specified so that hazards do not arise or result in an accident
Hazard avoidance: design the system so the hazard can never arise during correct system operation
Hazard detection and removal: design the system so hazards are detected and neutralized before they result in an accident
Damage limitation or mitigation: design the system so the consequences of an accident are minimized or at least reduced
Specifying Forbidden Behavior: Examples
The system shall not allow users to modify access permissions on any files they have not created (security)
The system shall not allow reverse thrust mode to be selected when the aircraft is in flight (safety)
The system shall not allow simultaneous activation of more than three alarm signals (safety)
Topics
Software reliability specification
Safety specification
Security specification
Security Specification
Similar to safety specification:
Not possible to specify security requirements quantitatively
Requirements are often ‘shall not’ rather than ‘shall’ requirements
Differences:
No well-defined notion of a security life cycle for security management
Generic threats rather than system-specific hazards
Mature security technology (encryption, etc.), but problems in transferring it into general use because of corporate culture
Security Specification Process
[Figure: the security specification process. Asset identification produces a system asset list; threat analysis and risk assessment produce an asset and threat description; threat assignment produces a threat and risk matrix; technology analysis produces a security technology analysis; security requirements specification yields the security requirements.]
Stages in Security Specification (1)
Asset identification and evaluation: assets (data and programs) are identified, along with their required degree of protection (criticality and sensitivity)
Threat analysis and risk assessment: possible threats are identified and risks are estimated
Threat assignment: identified threats are related to assets, producing, for each identified asset, a list of associated threats
Stages in Security Specification (2)
Technology analysis: identify available security technologies and assess their applicability against the identified threats
Security requirements specification: covers policy, procedure, and technology
HOMEWORK
Apply the full Read-Recite-Review phases of SQ3R to Chapter 17 of Sommerville’s text
For next class (Tuesday), apply the Survey-Question phases to Chapter 18 on Critical Systems Development
For Thursday 30 Oct 2003, REQUIRED: hand in responses to Exercises 17.1 (2 points), 17.2 (6), 17.3 (4), 17.4 (4), 17.5 (2), 17.6 (6), and 17.7 (6) = 30 points total
OPTIONAL by 6 Nov: 17.8 and/or 17.9 for 3 extra points each
DISCUSSION