8/2/2019 SwRelResearch1
1/26
Software Reliability Research
Pankaj Jalote
Professor, CSE, IIT Kanpur, India
System Reliability
System: an entity that provides defined behavior at interfaces
System is a hierarchy of subsystems, each subsystem being a system
Reliability of a system: its ability to provide failure-free operation
Failure: the system behavior is incorrect or not as expected; is a random phenomenon
Reliability Quantification
Reliability of a system is defined as the probability of failure-free operation over a time period
R(t) = Prob that system has not failed by time t
For reliability work, the distribution underlying R(t) is often specified
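As a rough illustration of R(t), the sketch below estimates it empirically as the fraction of observed units that have not failed by time t. The function name `empirical_reliability` and the failure times are hypothetical and are not taken from the talk.

```python
def empirical_reliability(failure_times, t):
    """Fraction of observed units that have not failed by time t."""
    if not failure_times:
        raise ValueError("need at least one observed failure time")
    survivors = sum(1 for ft in failure_times if ft > t)
    return survivors / len(failure_times)


# Hypothetical failure times (hours) for ten units; not data from the talk.
failure_times = [12, 30, 45, 45, 60, 88, 120, 150, 200, 310]

for t in (0, 50, 100, 400):
    print(t, empirical_reliability(failure_times, t))
```

With enough observations this estimate approaches the true R(t); with few observations it is only a coarse step function.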
Reliability Quantification (continued)
Reliability can also be quantified by Mean Time to Failure (MTTF)
Also by failure rate (number of failures per unit time)
From R(t), MTTF or failure rate can be determined
Under some assumptions, failure rate and MTTF are inversely related
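One common case of the inverse relation can be worked out directly. The derivation below assumes a constant failure rate, written here as $\lambda$ (a symbol chosen for this sketch), which gives an exponential R(t).

```latex
\begin{align*}
\text{MTTF} &= \int_0^\infty R(t)\,dt \\
R(t) &= e^{-\lambda t} \quad \text{(constant failure rate } \lambda\text{)} \\
\text{MTTF} &= \int_0^\infty e^{-\lambda t}\,dt = \frac{1}{\lambda}
\end{align*}
```

When the failure rate changes with time, the first line still holds but the simple reciprocal relation does not.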
Software Reliability
Software (un)reliability is caused not by aging but by bugs
The more bugs, the lower the reliability of the software
Failures still seem random, hence reliability theory can be applied
Software Reliability Research
Two main threads
Software reliability modeling: how to model and predict software reliability
Improving software reliability: by removing defects through program checking, verification, testing, etc.
Will discuss some work being done here in these two threads
Software Reliability Modeling
Software Reliability
Software systems are often one-off
Measuring reliability in the lab is not practical, as too much failure data is needed; requires time
Failures often result in fault removal, leading to reliability improvement
Predicting future reliability from measured reliability is harder
Hence different models are needed
Software Reliability Growth Models
Assume that reliability is a function of the defect level and that, as defects are removed, reliability improves
Model the failure-fix process of software evolution
Many models have been proposed in the last 3 decades
Model parameters are determined from past data on failures and fixes
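The talk does not name a particular growth model. As one well-known example, the sketch below uses the Goel-Okumoto form, in which the expected number of failures seen by time t is a(1 - e^{-bt}). The parameter values are hypothetical; in practice they would be estimated from past failure and fix data.

```python
import math


def expected_failures(t, a, b):
    """Goel-Okumoto mean value function: expected failures observed by time t."""
    return a * (1.0 - math.exp(-b * t))


def failure_intensity(t, a, b):
    """Derivative of the mean value function: failures per unit time at t."""
    return a * b * math.exp(-b * t)


# Hypothetical parameters: a = total expected defects, b = detection rate per week.
a, b = 100.0, 0.1

for week in (0, 10, 20, 40):
    print(week,
          round(expected_failures(week, a, b), 1),
          round(failure_intensity(week, a, b), 2))
```

The failure intensity falls as defects are found and fixed, which is the reliability growth these models are meant to capture.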
Reliability of Software Products
For software products, a large population exists in the field and faults are not removed as failures occur
According to SRGMs, the reliability should remain the same
I.e. the failure rate should be constant
Average Failure Rate of a MS Product
[Figure: Failure intensity. x-axis: months from release (1 to 11); y-axis: failures/month/unit (0 to 0.09)]
Reasons for this Phenomenon
Users learn with time and avoid failure-causing situations
Users start by exploring more, then limit themselves to some part of the product
Most users use a few product features
Configuration-related failures are much more frequent at the start
These failures reduce with time
A New Model for Product Reliability
For a user, there is a transient failure rate, which decays with a factor
With time the transient goes away, and the failure rate reaches a steady state
Steady-state failure rate represents the reliability of the product
Failure Rate of a Unit
Failure rate for one unit is $\lambda(i) = \lambda_0 \alpha^i + \lambda_f$
$\lambda_0$ is the initial transient rate
$\lambda_f$ is the final steady-state rate
$\alpha$ is the decay factor
[Figure: Failure rate of a unit. x-axis: time; y-axis: failure rate]
Applying it to a Product
Considered the failure and sales data of a real MS product
Applying the model to the data and determining parameters, we get
$\lambda_0$ = 0.04 failures/month
$\lambda_f$ = 0.008 failures/month
$\alpha$ = 0.4 (i.e. 40% decay each month)
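A minimal sketch of how the fitted parameters might be used, assuming the unit failure rate has the form (initial transient rate) x (decay factor)^i + (steady-state rate), with i counting months since the unit was installed and the decay factor applied as a monthly multiplier. The sales figures and function names are hypothetical; the real sales data is not given in the slides.

```python
LAMBDA_0 = 0.04   # initial transient rate, failures/month (from the slide)
LAMBDA_F = 0.008  # steady-state rate, failures/month (from the slide)
ALPHA = 0.4       # decay factor (from the slide)


def unit_failure_rate(i, lam0=LAMBDA_0, lam_f=LAMBDA_F, alpha=ALPHA):
    """Failure rate of one unit in its i-th month of use (i = 0 at install)."""
    return lam0 * alpha ** i + lam_f


def average_failure_rate(month, sales):
    """Average rate over all units in the field in a given month.

    sales[k] is the number of units sold in month k; a unit sold in
    month k has been in use for (month - k) months.
    """
    in_field = sales[: month + 1]
    total_units = sum(in_field)
    if total_units == 0:
        return 0.0
    expected = sum(n * unit_failure_rate(month - k)
                   for k, n in enumerate(in_field))
    return expected / total_units


# Hypothetical monthly sales; not the product data used in the talk.
sales = [1000, 800, 600, 400, 300, 200]

for m in range(len(sales)):
    print(m, round(average_failure_rate(m, sales), 4))
```

Because newly sold units keep entering the field with a high transient rate, the product-level average decays more slowly than the rate of any single unit.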
Example
Steady-state failure rate is 1/6th of the average rate in month 2, 1/3rd of the average rate in month 4
I.e. initial MTTF could be 1/6th the steady-state MTTF
Steady state is reached quite soon: in two to three months
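The 1/6 ratio can be checked against the fitted parameters. The arithmetic below assumes the unit model $\lambda(i) = \lambda_0 \alpha^i + \lambda_f$ with $\lambda_0 = 0.04$, $\lambda_f = 0.008$, $\alpha = 0.4$, and treats MTTF as the reciprocal of the failure rate, which holds only if the rate is roughly constant over the period considered.

```latex
\begin{align*}
\lambda(0) &= 0.04 + 0.008 = 0.048, \qquad
\frac{\lambda(0)}{\lambda_f} = \frac{0.048}{0.008} = 6 \\
\text{MTTF}_{\text{initial}} &\approx \frac{1}{0.048} \approx 20.8 \text{ months}, \qquad
\text{MTTF}_{\text{steady}} = \frac{1}{0.008} = 125 \text{ months} \\
\lambda(2) &= 0.04 \cdot 0.4^2 + 0.008 = 0.0144, \qquad
\lambda(3) = 0.04 \cdot 0.4^3 + 0.008 = 0.01056
\end{align*}
```

These are per-unit figures indexed by months in use. The month 2 and month 4 averages quoted above are product-level values that also depend on when units were sold, so this check covers only the MTTF ratio and the speed of the decay.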
Software Architecture Based Reliability Estimation
Software Architecture
Architecture is the components in the system and how they are connected
Is decided very early in a software project
If reliability and performance can be modeled from the architecture, the architecture can be improved
Some work is going on in architecture-based performance and reliability modeling
Program Verification
Program Verification
Basic goal: to ensure that the program is free of defects (bugs) as much as possible
Good program verification leads to higher reliability
Program Verification Techniques
Testing: program is executed with test data to find bugs
Static analysis: program source code is analyzed
Dynamic analysis: program is run on some data and assertions are made
Model checking
Formal verification
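To make the first and third techniques concrete, the sketch below shows a small function that carries a runtime assertion (a dynamic check made while the program runs) and a few test cases that execute it with chosen data. The function `binary_search` is a hypothetical example, not code from the talk.

```python
def binary_search(items, target):
    """Return an index of target in sorted items, or -1 if absent."""
    # Dynamic check: the precondition is asserted while the program runs.
    assert all(items[k] <= items[k + 1] for k in range(len(items) - 1)), \
        "input must be sorted"
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


# Testing: execute with chosen test data and compare with expected results.
assert binary_search([1, 3, 5, 7], 5) == 2
assert binary_search([1, 3, 5, 7], 4) == -1
assert binary_search([], 1) == -1
```

The tests only exercise the inputs chosen by the tester, while the assertion fires on any run that violates the precondition, which is one way the two techniques can complement each other.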
Techniques
Most techniques work in isolation
Sometimes they are complementary in their defect detection capability
Combining techniques meaningfully can improve reliability
We are working on techniques for combining testing and static analysis
State-based Testing Automation
Testing
Testing remains the main verification activity: most reliance is on it
Consumes as much as half of the total effort in a software product
Testing: test case design, execution, checking the results, then debugging, fixing, retesting
Each step is expensive
Test Automation
Test automation can help reduce cost and make testing more effective
Most test automation approaches focus on data collection, re-testing
Little effort in complete end-to-end automation
We are working on automating OO testing using state-based models
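The slides do not describe the approach itself, so the following is only a sketch of the general idea of state-based testing: a state model predicts how an object should respond to each method call, test sequences are generated automatically, and each observed state is checked against the model. The class `BoundedStack`, the `MODEL` table, and all function names are hypothetical.

```python
from itertools import product


class BoundedStack:
    """Hypothetical class under test: a stack holding at most `capacity` items."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.items = []

    def push(self, x):
        if len(self.items) >= self.capacity:
            raise OverflowError("stack is full")
        self.items.append(x)

    def pop(self):
        if not self.items:
            raise IndexError("stack is empty")
        return self.items.pop()


# State model for capacity 2: (state, method) -> next state.
# A missing pair means the call is illegal in that state and must raise.
MODEL = {
    ("EMPTY", "push"): "PARTIAL",
    ("PARTIAL", "push"): "FULL",
    ("PARTIAL", "pop"): "EMPTY",
    ("FULL", "pop"): "PARTIAL",
}


def observed_state(stack):
    """Map the concrete object onto one of the abstract model states."""
    if not stack.items:
        return "EMPTY"
    return "FULL" if len(stack.items) == stack.capacity else "PARTIAL"


def run_sequence(methods):
    """Execute one generated test case; True if the object follows the model."""
    stack, state = BoundedStack(capacity=2), "EMPTY"
    for name in methods:
        expected = MODEL.get((state, name))
        raised = False
        try:
            if name == "push":
                stack.push(0)
            else:
                stack.pop()
        except (OverflowError, IndexError):
            raised = True
        if expected is None:
            # Illegal call: must raise and leave the state unchanged.
            if not raised or observed_state(stack) != state:
                return False
        else:
            if raised or observed_state(stack) != expected:
                return False
            state = expected
    return True


def run_all(max_len=4):
    """Generate every call sequence up to max_len; return the failing ones."""
    failures = []
    for n in range(1, max_len + 1):
        for methods in product(("push", "pop"), repeat=n):
            if not run_sequence(methods):
                failures.append(methods)
    return failures


print(run_all())
```

Here test case design, execution, and result checking are all driven from the model, which is the kind of end-to-end automation the slide points to. Exhaustive enumeration is used only because the example is tiny; a realistic tool would need a coverage criterion to select sequences.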
Summary
Software reliability is a rich and wide area
Exciting work is going on across the world in modeling, analysis, program checking, testing, etc.
Lots of open issues