Software in Practice: a series of four lectures on why software projects fail, and what you can do about it. Martyn Thomas, Founder: Praxis High Integrity Systems Ltd; Visiting Professor of Software Engineering, Oxford University Computing Laboratory.

Dec 22, 2015

Software in Practice: a series of four lectures on why software projects fail, and what you can do about it

Martyn Thomas
Founder: Praxis High Integrity Systems Ltd
Visiting Professor of Software Engineering, Oxford University Computing Laboratory

Lecture 2: Software Failures

Developing software is very difficult: it is easy to make mistakes … and they are unlikely to be found by testing.

Errors can be introduced in every phase of software development: requirements capture, specification, design, programming, building, error correction, modification, re-use ...

Finding faults by testing?

type Alert is (Warning, Caution, Advisory);

function RingBell(Event : Alert) return Boolean
-- return True for Event = Warning or Event = Caution,
-- return False for Event = Advisory
is
   Result : Boolean;
begin
   if Event = Warning then
      Result := True;
   elsif Event = Advisory then
      Result := False;
   end if;
   return Result;
end RingBell;

-- C130J code: Caution returns uninitialised (usually TRUE, as required).
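
Why tests tend to pass on code like this can be sketched with a small model. The following is a hypothetical Python illustration, not the C130J Ada code: the `stale_memory` parameter is an assumed stand-in for whatever value happens to occupy the uninitialised `Result`, and `ring_bell_fixed` is just one possible repair, not the fix that was actually applied.

```python
from enum import Enum


class Alert(Enum):
    WARNING = 1
    CAUTION = 2
    ADVISORY = 3


# The behaviour the comments in the original function ask for.
EXPECTED = {Alert.WARNING: True, Alert.CAUTION: True, Alert.ADVISORY: False}


def ring_bell_faulty(event, stale_memory):
    """Model of the slide's function; stale_memory plays the uninitialised Result."""
    result = stale_memory
    if event is Alert.WARNING:
        result = True
    elif event is Alert.ADVISORY:
        result = False
    # No branch for CAUTION, so result keeps the stale value.
    return result


def ring_bell_fixed(event):
    """One possible repair: the answer is defined for every Alert value."""
    return event in (Alert.WARNING, Alert.CAUTION)


def passes_all_tests(func):
    """Run func on every Alert value and compare with the requirement."""
    return all(func(event) == EXPECTED[event] for event in Alert)
```

In this model, `passes_all_tests(lambda e: ring_bell_faulty(e, stale_memory=True))` succeeds even though every input is exercised, because the stale value happens to match the requirement. The same tests fail only when `stale_memory` is `False`, which a test rig may never produce.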

Taurus

Taurus was a £50m system to provide electronic share trading for the London Stock Exchange in 1991, removing paper share certificates. (This would revolutionise the job of share registrars.)

It overran: a recovery strategy was put in place. It reached 85% complete and a date for cut-over was announced later the same year. A few weeks later, the project was cancelled.

City firms had wasted £350m on new systems to interface to Taurus.

Taurus: a requirements problem

The system was over-complicated and had failed to reconcile conflicting requirements, especially those from the share registrars.

This lesson has not been learnt ...

No public-sector civil project has ever been put out to tender with a formal specification.

For example, eFDP took two years to agree a set of requirements. The remaining difficulties were put in the requirements as six-month “design studies”. Four weeks after the RfP, the project was abandoned.

Nancy Leveson’s Torpedo: gaps in the specification

How to stop a torpedo blowing up the launch ship?

If it malfunctions or starts to come back: sink it; blow it up.

On live test, a torpedo failed whilst still in the torpedo tube …

London Ambulance Service (LAS)

they took the lowest bid ...

LAS: The Manual System

LAS covers 600 sq miles, carries >5000 patients each day and handles 2000-2500 calls daily, including 1300-1600 emergency calls. 750 ambulances.

Emergency call written on a form. Location looked up on a map. Form and map co-ordinates placed on a conveyor belt to central dispatch, who remove duplicates and route to a zone to contact an ambulance.

This took ~3 minutes and 200 staff. Decision to implement Computer-Aided Dispatch.

LAS: Computer Aided Dispatch (CAD) version 1

1980s. £7.5 million spent. System built but failed its load test and was abandoned. LAS sued the Supplier, who had not understood the requirement properly.

1990: Requirements started for Version 2. New CAD to be “fully automated”.

Automatic lookup of location; automatic selection of the best ambulance.

No similar system in existence.

LAS: CAD Version 2

New system much more complex than Version 1: CAD + Map Display + Automatic Vehicle Location Service (AVLS)

Andersen Consulting had estimated that a package solution without AVLS, if one existed, would cost £1.5m and take 19 months to implement.

This seems to have become the project budget for a custom system.

LAS: Version 2 bids

35 companies looked and 19 bid; most said the project needed more time and money than the budget allowed.

The only bidder who promised to meet all the requirements on time and within budget was a consortium of Apricot (hardware), Systems Options (SO - a small software house) and Datatrak (AVLS).

SO bid only £35K to develop the CAD software! Total bid: £937,463.

The next lowest bid was £700K more!!

LAS: Version 2 development

Phase 1 system (no radio messaging): client and server lock-ups.

Phase 2 system (with radio messaging): unstable, overloaded at shift change, radio blackspots, unable to cope with staff taking the “wrong” vehicle.

Managers decided to go live on 26 October 1992, ignoring an independent review.

LAS: Result

26 October: control room reconfigured to use CAD. No manual backup system.
System progressively lost track of ambulances.
Screens filled with exception messages that scrolled off and were lost.
System delayed incidents, waiting for ambulances, so the public called again, increasing the workload.
Several or zero ambulances sent to each incident.
Staff stress caused operator errors.
Network congestion, slowdown, system collapse.
Oct 27th: semi-manual operation, but system crashed through a memory leak. System abandoned.

Radiotherapy

Therac-25

(not the system on the previous slide)

A system for treatment of tumours:
Mode 1: low energy electron beam treatment.
Mode 2: very high energy beam (25 MeV) with a thick metal plate in front, for X-rays.

Therac-20 had a mechanical switch to change beam, and an interlock to stop a change to high energy without the plate.

Therac-25 interlock was in software.

Therac-25 User Interface

Set up treatment time.
Electron beam: type e.
X-ray beam: type x. System puts the plate in place before switching beam to X-rays.
System: “Beam Ready”. Operator types b to start treatment.

Operator station in a different room from the patient, to protect staff from radiation.

Therac: Accident

Ray Cox, oil worker, on the table for his regular e-beam treatment for a tumour on his shoulder.

Operator goes to the other room, types x, realises the mistake, types “edit”, e, “enter” - all within 8 seconds.

System says “Malfunction”. Operator cleared the error, got “beam ready” and hit b: same error message, so tried again. Twice.

Ray felt a painful jolt - not like previous treatments. Shouted in pain but no-one heard. Third time he got off the table and went to find the nurse.

Therac-25: outcome

Ray Cox died of radiation overdose 4 months later.

Meanwhile another patient experienced the same accident, but this time a technician realised there was a problem and reported it.

The same problem had occurred in Georgia, Canada and Washington.

Therac: what went wrong?

The operator’s actions exposed a race condition in the (multi-tasking) code.

The result was a full-power beam without the plate in place. 125-fold overdose!

The particular sequence of actions had never occurred in testing.

Made worse because the audio intercom and video link were both out of service. System error messages were not informative (and usually meant that treatment had not occurred).
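
The shape of such a race condition can be sketched with a deliberately simplified model. This is a hypothetical Python illustration and not the structure of the real Therac-25 software: the two "tasks", the `edit` timings and all names are assumptions made for the example. One task reads the requested mode once at the start of a slow setup; an edit that arrives during that window changes the mode for the plate but not for the beam.

```python
def treat(edit):
    """Return (beam_power, plate_in_place) after a simplified setup.

    edit is None (no correction), "fast" (correction arrives while setup
    is still running) or "slow" (correction arrives after setup is done).
    """
    requested_mode = "x"                      # operator types x by mistake

    # Beam task: reads the mode once, at the start of a slow setup.
    mode_seen_by_beam_task = requested_mode

    if edit == "fast":
        # Mode changes mid-setup; the beam task does not notice.
        requested_mode = "e"
    elif edit == "slow":
        # Mode changes after setup; setup restarts and re-reads the mode.
        requested_mode = "e"
        mode_seen_by_beam_task = requested_mode

    # Plate task: positions the plate from the current mode.
    plate_in_place = requested_mode == "x"

    # Beam task finishes using its own copy of the mode.
    beam_power = "full" if mode_seen_by_beam_task == "x" else "low"
    return beam_power, plate_in_place


def is_hazardous(beam_power, plate_in_place):
    """Full power with no plate in front is the dangerous combination."""
    return beam_power == "full" and not plate_in_place
```

In this model only `treat("fast")` is hazardous. A test suite that never makes the correction inside the setup window exercises only the `None` and `"slow"` paths and sees nothing wrong.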

Therac: Failings

Safety Case claimed a probability of 10^-11 for “computer selects wrong energy”. No evidence for the claim.

No low-complexity protection system (fuse and/or interlock).

Poor software engineering.

Poor investigation of reported accidents.

Manufacturer did not consider a possible software fault until after several accidents.
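
What "low-complexity protection" means can be shown with a few lines of logic. In a real machine this role would normally be played by hardware; the Python below is only a hypothetical sketch of the decision such an interlock makes, with invented names, and is not taken from any Therac design.

```python
def interlock_permits_beam(beam_power, plate_in_place):
    """Independent final check: full power is allowed only behind the plate."""
    if beam_power == "full":
        return plate_in_place
    return beam_power == "low"


def fire_beam(beam_power, plate_in_place):
    """Fire only if the interlock agrees, whatever the control software decided."""
    if not interlock_permits_beam(beam_power, plate_in_place):
        return "beam inhibited"
    return "beam fired at " + beam_power + " power"
```

The check is small enough to review exhaustively: it looks only at the final power setting and plate position, so it does not depend on how the control software arrived at them.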

Ariane V: European Space Agency launch vehicle

Ariane V: Explosion

Initial launch exploded.

Failure traced to the inertial navigation system (INS).

Overflow on conversion from 64-bit floating point to 16-bit integer; exception not trapped.

Primary and back-up INS both failed for the same reason, and stopped.

Loss of INS led to auto-destruction.
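
The kind of conversion involved can be sketched as follows. This is a hypothetical Python illustration, not the INS code (which was not written in Python): the function names are invented, and clamping is only one possible way to handle an out-of-range value, chosen here as an assumption. Whether clamping, a fallback value or a shutdown is right depends on the system.

```python
import math

INT16_MIN = -32768
INT16_MAX = 32767


def to_int16_checked(value):
    """Round a float to an integer, refusing values outside the 16-bit signed range."""
    if math.isnan(value):
        raise ValueError("NaN has no integer value")
    if not INT16_MIN <= value <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a 16-bit signed integer")
    return round(value)


def to_int16_saturating(value):
    """One possible handler: clamp out-of-range values instead of stopping."""
    try:
        return to_int16_checked(value)
    except OverflowError:
        return INT16_MAX if value > 0 else INT16_MIN
```

With made-up inputs, `to_int16_checked(40000.0)` raises `OverflowError`, while `to_int16_saturating(40000.0)` returns the largest representable value. The design question is what the caller does with the exception; leaving it untrapped is the case the slide describes.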

Ariane V: cause of failure

INS software re-used from Ariane IV.

Ariane IV flight profile guaranteed this parameter could not overflow.

Ariane V specification was different, in a way that affected the requirements for the INS.

Formal specification would catch this fault.

Conclusions (1)

Software development is hard - all sorts of things go wrong.

It is an engineering task. You dare not do without discipline and rigour.

Even the best people make mistakes. That’s why we use reviews, checklists, type-checkers and other static analysis tools, testing, and proof.

Conclusions (2)

A safety-critical software team must have:
Good domain knowledge
Excellent systems engineering / software engineering knowledge, skills, processes
Good knowledge of safety assessment principles, standards, practice and law,
… and finally ...

…a strong safety culture

Developing safety-critical software is the subject of my next lecture.