Title Slide
“Forensic Software Engineering: Reliability, Security, Cost and other woes”
by
Les Hatton
Professor of Forensic Software Engineering, CISM, University of Kingston
An observation
• All the evidence suggests that many, if not most, failures exhibited by software-controlled systems could have been avoided by techniques we already know how to apply.
Principles
The two engineering obligations
• When a system fails (and it will), it should be designed in such a way as to minimise deleterious effects on its user, by means of built-in redundancy or otherwise.
• When a system fails, the diagnostic system should always provide an efficient means of finding the corresponding fault or faults so they can be corrected.
How to get it wrong: an Airbus having a bad day
A Tarom Airlines Airbus performed an uncontrolled dive, climb, roll and spin near Orly in 1995 due to ‘a fault in the automatic pilot’. The plane landed safely, a tribute to the pilots’ skill.
How to get it wrong: Ariane 5
How to get it wrong: more avionics …
28/Jul/2003. “As recently as February, test pilots of the new F/A-22 (Raptor) fighter were spending an average of 14 minutes per flight rebooting critical systems. This is now down to only 36 seconds per flight.”
Washington Post.
Whoops ...
Human Computer Interfaces, a popular way of screwing things up …
Whoops ...
My first medical system experience: a medical records system which, each night, backed itself up with the message …
Storing
Whoops ...
Unfortunately, it was delivered in the Netherlands, which, after suitable translation, yields …
Jamming
Automobile industry:
• 14/Apr/2004. Ford is recalling 363,440 of its 2001-2003 Ford Escape vehicles due to software problems in the power-train causing engine stalling.
Detroit News
• 17/Mar/04. 2003 US vehicle recalls hit 19.5 million in spite of ‘engineering never being better’. Experts cite problem-prone computers as significant factor.
• 09/Mar/04. Toyota faces a US safety investigation and potential recall of 1 million of its best-selling Camry and Lexus ES300 sedans because of reports of unexpected acceleration causing 30 crashes.
Detroit Free Press
Cost ...
Automobile industry:
• 06/02/2005. A whole string of problems: a shaking Mercedes, a Ford that bakes back-seat passengers … http://www.nytimes.com/2005/02/06/automobiles/06AUTO.html
• 26/10/2004. BMW disables dynamic stability control and ABS. Two police drivers vindicated after investigation.
• Forensic Systems Analysis
  • OS reliability, security, environment failures (e.g. arithmetic), compiler quality, implications for design …
In each area we are trying to answer the question “Why?” to avoid future occurrences.
“Planning is an unnatural process. It's much more fun to get on with it. The real benefit of not planning is that failure comes as a complete surprise and is not preceded by months of worry.”
Sir John Harvey-Jones.
Forensic Process Analysis
When the train of ambition pulls away from the platform of reality
Planning data from a grand ‘unified’ programming project (produced after the project seemed to be struggling).
Note that ‘unify’ appears next to ‘unintelligible’ in the OCD.
Unsuccessful project (abandoned)
[Figure: predicted days to completion (axis 0 to 90) plotted against the day of prediction (days 1 to 74).]
Ruthlessly controlling tasks
Project restarted with (far) less ambitious goals and tracked weekly, with results published on the staff notice board.
Successful project (about 10% overrun)
[Figure: predicted days to completion (axis 0 to 160) plotted against the day of prediction (days 1 to 176).]
Forensic Process Analysis: results so far
The following are necessary (but may not be sufficient) for satisfactory project planning:-
• No sub-task with software systems should be longer than 1 week
• Projects should be tracked weekly with progress published
• Programmers underestimate the time taken to do things by about 50%
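The three planning rules above are mechanical enough to check automatically. The sketch below is a minimal illustration, not the method used on the projects shown: the names `Task`, `oversized`, `corrected_days` and `completion_slip` are hypothetical, a "week" is assumed to be 5 working days, and "underestimate by about 50%" is assumed to mean actual ≈ 1.5 × estimate (under the other reading, that estimates are half the true time, the factor would be 2.0). `completion_slip` measures how far the predicted finishing day moves between weekly predictions, which is what separates the two charts above.

```python
from dataclasses import dataclass

# Assumptions (not from the talk): a week is 5 working days, and
# "underestimate by about 50%" is read as actual ≈ 1.5 × estimate.
MAX_TASK_DAYS = 5
UNDERESTIMATE_FACTOR = 1.5


@dataclass
class Task:
    name: str
    estimate_days: float


def oversized(tasks, limit=MAX_TASK_DAYS):
    """Names of sub-tasks longer than the one-week limit; these should be split."""
    return [t.name for t in tasks if t.estimate_days > limit]


def corrected_days(tasks, factor=UNDERESTIMATE_FACTOR):
    """Sum of the programmers' estimates, inflated by the underestimation factor."""
    return sum(t.estimate_days for t in tasks) * factor


def completion_slip(predictions):
    """predictions: weekly (day_of_prediction, predicted_days_to_completion) pairs.

    Returns how many days the predicted finishing day has moved between the
    first and the latest prediction. A healthy project stays near 0; a value
    that keeps growing week after week is a warning sign.
    """
    finish_days = [day + remaining for day, remaining in predictions]
    return finish_days[-1] - finish_days[0]


if __name__ == "__main__":
    plan = [Task("parser", 4), Task("user interface", 8), Task("tests", 3)]
    print(oversized(plan))                                # ['user interface']
    print(corrected_days(plan))                           # 22.5
    print(completion_slip([(1, 40), (8, 45), (15, 50)]))  # 24
```

Publishing the output of something like `completion_slip` every week is one way to make the "tracked weekly with progress published" rule concrete.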
Forensic Product Analysis:
Here we are essentially analysing the software product itself to understand the nature of its failures.
The T-experiments
Multi-industry study using static inspection, 1990-1992
E-S Aerospace ......
Single-industry study using N-version techniques, 1990-1993
Earth Science
Nuclear Control
Stages
• Observed many repeating faults in development of SKS
• Developed F77 parsing engine to study other packages, 1988-1992
• Developed C parsing engine to study similar problems in a different language, 1990-1994
• Measured around 100 major systems, 1988-1997
• Developed more advanced C parsing engine, 1996-2000; restarted experiments on embedded control systems
1988-1997: The T1 Fault experiments
Fault frequencies in C applications
[Figure: weighted faults per 1000 lines (axis 0 to 25) for C applications in the areas Graphics, General, Elec-eng, Design, System, Control, Database, Parsing, Insurance, Utilities and Comms.]
Average of 8
Survey: 1993-1998
Recent examples:
• Netscape JavaScript interpreter, 2003: 14.78 per KSLOC
• F1 racing car software, 2003: 13.47 per KSLOC
• Government agency, 2005: 0 per KSLOC
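Fault densities like those above are simple ratios, but the weighting and the averaging both affect the headline number. The sketch below assumes a hypothetical two-class weighting (`WEIGHTS`), because the talk reports *weighted* faults without giving the scheme. It also assumes the chart average is the mean of per-package densities rather than one pooled ratio over all lines; the two can differ substantially when package sizes vary.

```python
# Hypothetical severity weights: the talk reports *weighted* fault rates but
# does not give its weighting scheme, so these numbers are placeholders.
WEIGHTS = {"serious": 1.0, "minor": 0.2}


def faults_per_ksloc(faults, lines):
    """Raw fault density: faults per 1000 source lines."""
    return 1000.0 * faults / lines


def weighted_faults_per_ksloc(fault_counts, lines, weights=WEIGHTS):
    """Weighted density: each fault class contributes count × weight."""
    weighted = sum(weights[kind] * n for kind, n in fault_counts.items())
    return 1000.0 * weighted / lines


def average_density(packages):
    """Mean of per-package densities (an assumption about how the chart
    averages were formed). packages: list of (faults, lines) pairs."""
    densities = [faults_per_ksloc(f, n) for f, n in packages]
    return sum(densities) / len(densities)


if __name__ == "__main__":
    print(weighted_faults_per_ksloc({"serious": 3, "minor": 10}, 2000))  # 2.5
    print(average_density([(30, 2000), (0, 5000)]))                     # 7.5
```

With the same two hypothetical packages, a pooled ratio would give 1000 × 30 / 7000 ≈ 4.3 per KSLOC instead of 7.5, so it matters which convention a survey uses.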
Fault frequencies in Fortran 77 applications
[Figure: weighted faults per 1000 lines (axis 0 to 25) for Fortran 77 applications in the areas general, elc-eng, Earth Sci, parsing, Cad Cam, ChemMod, fld-eng, mch-eng, nuc-eng, oper-rs, the-phys, Geodesy and Aerospace.]
Average of 12
Same application area: one at 140 / KLOC and one at 0 / KLOC
How long before a fault fails for the first time? (Adams 1984)
Mean time to fail
[Figure: percentage of all faults (axis 0 to 35) against mean time to fail in years (1.6, 5, 16, 50, 160, 500, 1600, 5000).]
Stages
An observation: failure experiments are REALLY expensive compared with fault experiments.
“T2” experiment, 1990-1993
• Funded by Enterprise Oil plc in the UK
• Compared the output of 9 packages, all in Fortran 77, developed independently
• Carried out with a colleague, Andy Roberts
1990-1996: Failure experiments
How to collect seismic data
Borrow around 20 million dollars and buy one of these.
If it doesn’t work out, you can always run booze cruises.
T2 details
• 9 independently developed commercial versions of the same ~750,000-line F77 package of signal processing algorithms.
• Same input data tapes.
• Same processing parameters (46-page monitored specification document).
• All algorithms published with precise specifications (e.g. FFT, deconvolution, finite-difference wave-equation solutions, tridiagonal matrix inversions and so on).
• All companies had detailed QA and testing procedures.
Similarity v. coordinate: No feedback
Defect example 1: feedback detail
Similarity v. coordinate: Feedback to company 8
Defect example 2: feedback detail
Similarity v. coordinate: Feedback to company 3
The end product: 9 subtly different views of the geology
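The "similarity v. coordinate" plots compare the outputs of the independent packages trace by trace. The talk does not say which similarity measure was used, so the sketch below swaps in one plausible choice, zero-lag normalised cross-correlation, and reports the worst pairwise value at each coordinate. The names `similarity` and `similarity_profile` and the input layout are hypothetical.

```python
import math
from itertools import combinations


def similarity(x, y):
    """Zero-lag normalised cross-correlation of two traces, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        # Two silent traces agree perfectly; a silent and a live trace do not.
        return 1.0 if norm_x == norm_y else 0.0
    return dot / (norm_x * norm_y)


def similarity_profile(outputs):
    """outputs: {version_name: [trace_at_coord_0, trace_at_coord_1, ...]},
    with at least two versions and the same number of coordinates each.

    Returns, for each coordinate, the worst pairwise similarity between the
    versions, so a dip marks a place where at least two packages disagree.
    """
    names = sorted(outputs)
    n_coords = len(outputs[names[0]])
    profile = []
    for c in range(n_coords):
        worst = min(
            similarity(outputs[a][c], outputs[b][c])
            for a, b in combinations(names, 2)
        )
        profile.append(worst)
    return profile
```

Taking the minimum rather than the mean makes a single divergent package visible. That mirrors the feedback examples, where one company's defect pulled its output away from the rest.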
Useful lessons
The differences are due to subtle defects.
Forensic response: design tests with sufficient precision for the desired accuracy
These defects had exceptionally long lives and can cost a fortune. Software which will accumulate thousands of execution years should depend more on static testing methods than dynamic testing methods.
Forensic response: carefully balance test resources between static and dynamic methods to match the expected life-time exposure of the software.
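The argument for static methods becomes clearer with a rough calculation. Assuming, purely for illustration, that each fault's time to first failure is exponentially distributed with the mean shown in the Adams chart, the chance that dynamic testing ever exposes a long-lived fault is tiny. The same fault is quite likely to fail once the software accumulates thousands of execution years in the field. The exponential model and the function name `p_fail_within` are assumptions of this sketch, not claims from the talk.

```python
import math

# Mean-time-to-fail classes (years) from the x-axis of the Adams (1984)
# chart; the per-class fault percentages are not reproduced here.
MTTF_CLASSES_YEARS = [1.6, 5, 16, 50, 160, 500, 1600, 5000]


def p_fail_within(exposure_years, mttf_years):
    """Chance that a fault with the given mean time to fail fails at least
    once during the stated execution exposure, ASSUMING exponential
    time-to-failure."""
    return 1.0 - math.exp(-exposure_years / mttf_years)


if __name__ == "__main__":
    # e.g. one execution-year of testing vs. a large installed base
    for exposure in (1, 10_000):
        row = ", ".join(
            f"{m}y: {p_fail_within(exposure, m):.4f}" for m in MTTF_CLASSES_YEARS
        )
        print(f"exposure {exposure} execution-years -> {row}")
```

Under these assumptions, a fault in the 5000-year class has about a 0.02% chance of showing up in one execution-year of testing, but about 86% over 10,000 execution-years. Static inspection finds such a fault regardless of how rarely it is triggered.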
The outcome …
A summary of 10 years of failure experiments
Seismic processing software environment | Significant figures of agreement | Degradation type
32-bit floating point arithmetic | 6 | -
Same software on different platforms, same data | 4 | Portability degradation
Same software on same platform, 5-1 lossy compression | 3-4 | Compression degradation
Same software subjected to continual 'enhancement' | 1-2 | Maintenance degradation
T2: different software, same specs, same data, same language, same parameters | 1 | Diversity degradation
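The table measures agreement in significant figures, and the earlier forensic response asks for tests with enough precision for the desired accuracy. A test harness needs some way to compute that number. The sketch below uses floor(-log10(relative difference)) as an approximation. This convention is assumed here rather than taken from the talk, and it can differ by one from a digit-by-digit comparison near rounding boundaries. `agreeing_sig_figs` and `trace_agreement` are hypothetical names.

```python
import math


def agreeing_sig_figs(a, b):
    """Approximate number of significant figures on which a and b agree,
    measured as floor(-log10(relative difference))."""
    if a == b:
        return math.inf
    rel = abs(a - b) / max(abs(a), abs(b))
    # Relative differences near or above 1 (e.g. opposite signs) mean no agreement.
    return max(0, math.floor(-math.log10(rel)))


def trace_agreement(xs, ys):
    """Worst-case agreement over two output traces compared sample by sample."""
    return min(agreeing_sig_figs(x, y) for x, y in zip(xs, ys))


if __name__ == "__main__":
    print(agreeing_sig_figs(1.0, 1.001))                        # 3
    print(trace_agreement([1.0, 100.0], [1.001, 101.0]))        # 2
```

A regression test could then assert, for example, `trace_agreement(reference, output) >= 4` where four figures are required, instead of relying on exact equality or an arbitrary tolerance.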
Beware
Spreadsheets are not normally subjected to the same quality control in an organisation. 91% of all spreadsheets analysed had errors affecting the results by at least 5% (Ray Panko, University of Hawaii).
Spreadsheets
Forensic Product Analysis: results so far
The following seem well supported:
• Modern programming languages are riddled with poorly defined behaviour which programmers regularly fall prey to
• Numerical computations are often wrong however they are done
• The choice of technology is irrelevant; it is the fluency of the programmers in that technology which matters most
• Beware of spreadsheets.
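One small, reproducible illustration of numerical fragility is the following. In IEEE 754 binary floating point, which Python's `float` uses, addition is not associative, and decimal fractions such as 0.1 are not exactly representable. Even a summation therefore depends on the order of operations. The sketch uses an explicit loop because recent Python versions compensate rounding inside the built-in `sum()`. `math.fsum` is the standard library's exactly rounded summation.

```python
import math

values = [0.1, 0.2, 0.3]

left_to_right = (values[0] + values[1]) + values[2]   # 0.6000000000000001
right_to_left = values[0] + (values[1] + values[2])   # 0.6
print(left_to_right == right_to_left)                 # False: order matters

# Adding 0.1 ten times with plain floating-point addition does not give 1.0.
total = 0.0
for _ in range(10):
    total += 0.1
print(total)                                           # 0.9999999999999999

# math.fsum tracks the intermediate rounding errors and returns the
# correctly rounded sum of the inputs.
print(math.fsum([0.1] * 10))                           # 1.0
print(math.fsum(values))                               # 0.6
```

None of these results is a bug in Python. They are the specified behaviour of binary arithmetic, which is precisely why programmers fall prey to them.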
Forensic Systems Analysis:
Here we are essentially analysing the systems environment in which software functions to understand the nature of its failures.
Forensic Systems Analysis
We can identify at least the following areas:
• OS reliability
• Security
• Arithmetic environment
• Compiler quality
OS Reliability
Mean Time Between Failures of various operating systems
[Figure: MTBF in hours on a logarithmic axis (0.1 to 10,000) for W'95, Macintosh 7.5-8.1, NT 4.0, 2000/XP, Linux and Sparc 4.1.3c; one entry is annotated "> 50,000 hours".]
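An MTBF figure like those in the chart is a ratio of observed operating time to observed failures. The sketch below is a minimal point estimate under stated assumptions. Uptime since the last crash counts as operating time but not as a failure, and a machine with no failures yet only has a lower bound, signalled here as `math.inf`. The 500-hour threshold echoes the figure used in the systems conclusions. `mtbf_hours` and `meets_target` are hypothetical names.

```python
import math

RELIABLE_MTBF_HOURS = 500  # the ">~500 hours" figure from the systems conclusions


def mtbf_hours(hours_between_failures, hours_since_last_failure=0.0):
    """Point estimate of MTBF: total observed operating time / number of failures.

    hours_between_failures: uptime that ended in each observed failure.
    hours_since_last_failure: current uptime that has not (yet) ended in a failure.
    """
    failures = len(hours_between_failures)
    total = sum(hours_between_failures) + hours_since_last_failure
    if failures == 0:
        return math.inf  # no failures observed: only a lower bound is known
    return total / failures


def meets_target(mtbf, target=RELIABLE_MTBF_HOURS):
    """True if the estimated MTBF exceeds the reliability threshold."""
    return mtbf > target


if __name__ == "__main__":
    estimate = mtbf_hours([120, 300, 90])
    print(estimate, meets_target(estimate))  # 170.0 False
```

A handful of failures gives only a rough estimate. Figures at the top of the chart, such as "> 50,000 hours", need a large pool of machine-hours before they mean much.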
Security
A very big subject which includes:
• Monolithic v. modular design
• The use of binary format files
• How permissions and users are defined
• Software failures (many security breaks are due to buffer overflows caused by programmers using inappropriate functions, e.g. strcpy instead of strncpy)
24 8-hour days in August 2005 on a Tripwire-monitored Linux machine:
• 18533 attempted intrusions
• 282 explicit attempts to break in
• 11211 port scans
• 52 attempts to hijack the machine for spam relay
Strong evidence of repeated attacks by a small number of intruders.
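Two simple calculations back up observations like these: the overall attack rate, and how concentrated the attacks are among source addresses. The sketch below assumes a HYPOTHETICAL plain-text log with one event per line in the form `<timestamp> <source-ip> <event>`. Real Tripwire or firewall logs have their own formats and would need their own parser. The function names are illustrative only.

```python
from collections import Counter

OBSERVATION_HOURS = 24 * 8  # 24 eight-hour days, as in the August 2005 log


def attempts_per_hour(attempts, hours=OBSERVATION_HOURS):
    """Average event rate over the observation window."""
    return attempts / hours


def source_counts(lines):
    """Tally events per source address.

    Assumes the hypothetical format '<timestamp> <source-ip> <event>';
    lines with fewer than three fields are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            counts[fields[1]] += 1
    return counts


def top_share(counts, n=3):
    """Fraction of all events that come from the n busiest sources."""
    total = sum(counts.values())
    return sum(c for _, c in counts.most_common(n)) / total if total else 0.0


if __name__ == "__main__":
    print(round(attempts_per_hour(18533), 1))  # 96.5 attempted intrusions per hour
    sample = [
        "2005-08-01T10:00 192.0.2.7 portscan",
        "2005-08-01T10:01 192.0.2.7 login",
        "2005-08-01T10:05 198.51.100.3 relay",
    ]
    print(top_share(source_counts(sample), n=1))
```

If a few sources account for most of the events, as `top_share` measures, that is the kind of evidence behind "repeated attacks by a small number of intruders".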
Security
Recovering a tainted Windows XP machine …
• Even disc scan failed. To save the disc, mount it in a FireWire caddy and back up under Unix (3 hrs).
• Reformat hard disc (1 hr).
• Contact supplier to find XP Home CD is 70 quid *$&%^** !
• Reload from own disc and restore (4 hrs).
• Install ZoneAlarm, download upgrades, service packs and security fixes (10 hrs).
• Norton anti-virus now fights it out with XP SP2 for the privilege of protecting us, and switches off messages to avoid duplicates.
• ZoneAlarm informs us that it has blocked 54 intrusion attempts whilst we were downloading upgrades.
Security
Speaking of Windows …
Fortunately nobody would be stupid enough to put this in a critical system.
Speaking of Windows …
06/09/2004
• Royal Navy to run warships on Windows 2000.
This follows on from the deployment of Windows NT on the USS Yorktown in 1997, which then had to be rebooted frequently and was occasionally towed to port.
22/09/2004
• Total air-traffic failure at Los Angeles after the Windows 2000 server that replaced a Unix system hung because they forgot to reboot it frequently enough. The Unix systems had never failed.
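One well-known way a system can come to "need" periodic rebooting is a fixed-width uptime counter that eventually wraps around. The sketch below does not reproduce the Los Angeles system. It assumes a hypothetical unsigned 32-bit millisecond tick counter and shows both the wrap period and the classic bug of subtracting raw tick values. `read_counter`, `naive_elapsed` and `wrap_safe_elapsed` are illustrative names.

```python
MS_PER_DAY = 1000 * 60 * 60 * 24
COUNTER_MODULUS = 2**32  # hypothetical unsigned 32-bit millisecond counter


def read_counter(uptime_ms):
    """What a 32-bit millisecond tick counter shows after uptime_ms of running."""
    return uptime_ms % COUNTER_MODULUS


def naive_elapsed(start, now):
    # Bug pattern: plain subtraction goes negative once the counter wraps.
    return now - start


def wrap_safe_elapsed(start, now):
    # Modular subtraction is correct across one wrap (intervals < 2**32 ms).
    return (now - start) % COUNTER_MODULUS


if __name__ == "__main__":
    print(COUNTER_MODULUS / MS_PER_DAY)  # about 49.7 days until the counter wraps
    start = read_counter(COUNTER_MODULUS - 1000)  # 1 s before the wrap
    now = read_counter(COUNTER_MODULUS + 500)     # 0.5 s after the wrap
    print(naive_elapsed(start, now), wrap_safe_elapsed(start, now))
```

Regular reboots hide this class of fault rather than remove it. The forensic fix is the modular subtraction, or a wider counter.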
Arithmetic environment
Even in 2004, computers still get arithmetic wrong:
• Embedded System Paranoia extends the venerable Paranoia program to embedded control systems, with similar results: http://www.leshatton.org/ESP_903.html
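Arithmetic probes of the Paranoia family work by asking the hardware questions whose answers are known in advance. The sketch below is a small probe in that spirit, not a port of Paranoia or of Embedded System Paranoia. It uses a variant of Malcolm's method to discover the floating-point radix, then finds the machine epsilon by repeated halving, and compares both with what the Python runtime reports in `sys.float_info`. Python floats are C doubles, so on IEEE 754 hardware the expected answers are radix 2 and epsilon 2**-52.

```python
import sys


def float_radix():
    """Variant of Malcolm's method for discovering the floating-point radix."""
    a = 1.0
    # Grow a until adding 1.0 to it is no longer exact.
    while ((a + 1.0) - a) - 1.0 == 0.0:
        a *= 2.0
    b = 1.0
    # Find a power of two large enough to change a; the change is the radix.
    while (a + b) - a == 0.0:
        b *= 2.0
    return (a + b) - a


def machine_epsilon():
    """Smallest power of two eps with 1.0 + eps > 1.0 under the default rounding."""
    eps = 1.0
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps


if __name__ == "__main__":
    print("radix  :", float_radix())
    print("epsilon:", machine_epsilon(), "reported:", sys.float_info.epsilon)
```

A mismatch between the probed and reported values is exactly the kind of "inconvenient flaw" worth finding before the software ships. Probes like this become more valuable on embedded targets with software floating-point libraries.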
Compiler Quality
Note the following:
• In April 2000, NIST formally stopped validating compilers in any language.
• Most compilers fail the existing validation suites in some way or another; these departures are not documented, so you assume the risk.
• It seems very likely that the situation will get worse as languages get more complicated
Forensic Systems Analysis: results so far
The following seem well supported:
• If you need a reliable OS environment (MTBF >~ 500 hours), do not use Windows.
• If you need a secure OS environment, do not use Windows or binary file formats.
• Test your computer arithmetic; it will probably have inconvenient flaws and may have major failures.
• Do not take your compiler quality for granted. Seek written assurances from the supplier if possible.
Principles
Scope
Conclusions
Overview
Conclusions
Forensic Software Engineering seeks:
• To analyse failures in various categories and to disseminate this information in a searchable way, allowing developers and scientists to avoid future occurrences of these failures.
All the evidence suggests that relatively simple use of avoidance strategies can lead to extraordinarily reliable applications.
Reference site
For more information, downloadable papers and software, see:-