SAS_08_Preventing_Eliminating_SWfaults_Goseva-Popstojanova
Preventing and Eliminating Software Faults through the Life Cycle
PI: Katerina Goseva-Popstojanova
Student: Margaret Hamill
Lane Dept. Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV
E-mail: [email protected]
• NASA spends time and effort to track problem reports/change data for every project. These repositories are rich, underused sources about the way software systems fail and the software faults that cause these failures.
• Our goal: Based on systematic and thorough analysis of the available empirical data, build quantitative and qualitative knowledge that contributes towards improvement of software quality by
  – preventing introduction of faults into the system
  – more efficiently eliminating them through the life cycle
  – compiling lessons learned & recommendations to be used
• We used a large NASA mission as a pilot study
  – 21 Computer Software Configuration Items (CSCIs)
  – millions of lines of code
  – over 8,000 files
  – developed at two different locations
• We analyzed
  – over 2,800 Software Change Requests (SCRs) entered due to non-conformance with requirements
    • collected through the software life cycle (i.e., development, testing and on-orbit)
    • over a period of almost 10 years
• To the best of our knowledge, this is the largest dataset considered so far in the published literature
• Sources of failures (i.e., types of faults)
  – Identified most common fault types
  – Showed both the internal and external validity of the results
• Activities when the problem was discovered (e.g., inspection, testing, analysis, on-orbit)
  – Only 3% of SCRs are on-orbit
  – Identified dominant fault types during Development & testing and On-orbit
• Severity
  – Only around 8% of SCRs are safety critical (less than 1% On-orbit)
  – Analyzed severity of different fault types
• Compiled internal document on lessons learned & recommendations for product and process improvement
Most common sources of failure for all 21 CSCIs grouped together
  – Requirements faults (incorrect, changed & missing requirements): 33%
  – Coding faults: 33%
  – Data problems: 14%
• This distribution of faults across life cycle activities contradicts the common belief that the majority of faults are introduced during early life cycle activities (i.e., requirements and design), a belief that dates back to some of the earliest empirical studies [Boehm et al. 1975, Endres 1975, Basili et al. 1984]
• Important question: Internal & External validity of our results
Source of failures: Internal validity
CSCIs grouped by the number of releases

# of releases  % files  % SCRs  % CSCIs  % requirements faults  % coding faults  % data problems
1              2.56     13.72   4.76     31.12                  39.54            13.01
2              24.64    10.11   38.10    39.79                  34.60            11.42
3              27.33    18.51   33.33    33.65                  25.14            16.82
4              16.53    14.42   9.52     28.40                  43.45            9.47
5              6.84     9.20    4.76     31.94                  24.71            11.03
7              22.09    34.04   9.52     32.58                  30.73            15.52

• The same three most common sources consistently dominate the fault types, accounting for 68% to 86% of the SCRs in each group
  – Requirements faults: 28% - 40%
  – Coding faults: 25% - 43%
  – Data problems: 9% - 17%
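
The ranges quoted above can be rechecked directly from the per-group table. The following is a minimal sketch, not part of the original study: the names `GROUPS`, `top_three_share` and `weighted_overall` are hypothetical, and it assumes that each fault percentage is relative to the SCRs of its own group, so that weighting by % SCRs approximates the figures reported for all 21 CSCIs together.

```python
# Per-group percentages transcribed from the internal validity table:
# number of releases -> (% SCRs, % requirements faults, % coding faults, % data problems)
GROUPS = {
    1: (13.72, 31.12, 39.54, 13.01),
    2: (10.11, 39.79, 34.60, 11.42),
    3: (18.51, 33.65, 25.14, 16.82),
    4: (14.42, 28.40, 43.45, 9.47),
    5: (9.20, 31.94, 24.71, 11.03),
    7: (34.04, 32.58, 30.73, 15.52),
}


def top_three_share(groups):
    """Combined share of the three dominant fault types in each group."""
    return {
        releases: round(req + code + data, 2)
        for releases, (_, req, code, data) in groups.items()
    }


def weighted_overall(groups):
    """SCR-weighted average of each fault type across all groups.

    Assumption (not stated in the slides): fault percentages are relative
    to the SCRs of their own group, so % SCRs is the right weight.
    """
    total_weight = sum(scrs for scrs, *_ in groups.values())
    sums = [0.0, 0.0, 0.0]
    for scrs, *faults in groups.values():
        for i, pct in enumerate(faults):
            sums[i] += scrs * pct
    return tuple(round(s / total_weight, 1) for s in sums)


shares = top_three_share(GROUPS)
overall = weighted_overall(GROUPS)
print(shares)   # per-group sum of requirements + coding + data percentages
print(overall)  # (requirements, coding, data) weighted by % SCRs
```

Under that assumption, the per-group sums span roughly 68% to 86%, and the weighted averages round to 33%, 33% and 14%, matching the figures reported for all CSCIs grouped together.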
• We compared our results, based on 2,858 non-conformance SCRs, with results from several recent large empirical studies
  – 199 anomaly reports, 7 JPL unmanned spacecraft [Lutz et al. 2004]
  – 600 software faults from several releases, switching system [Yu 1998]
  – 427 pre- and post-release modification requests, optical network element [Leszak et al. 2002]
  – 668 faults, 12 open source projects [Duraes et al. 2006]
  – 408 faults, IBM operating system [Christmansson et al. 1996]
• Consistent trend across different domains, languages, development processes & organizations
Percentage of problems reported due to coding, interface and integration faults together is approximately the same or even higher than the percentage of faults due to early life cycle activities (i.e., requirements and design)
• Severity is assigned by the review board when deciding whether the SCR needs to be addressed
  – Sev 1: results in loss of a safety critical function
  – Sev 1N: Sev 1 with an established workaround
  – Sev 2: results in loss of a critical mission support capability
  – Sev 2N: Sev 2 with an established workaround
  – Sev 3: perceivable by operator but neither Sev 1 nor Sev 2
  – Sev 4: discrepancy not perceivable to the FSW user and usually an insignificant violation of FSW requirements
  – Sev 5: not perceivable to the FSW user and usually a case where a programming standard is violated
Around 8% of all SCRs are safety critical (Development & testing 7% and On-orbit 1%)
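
The severity scale lends itself to a small data structure for tallying SCRs by class. The sketch below is illustrative only: the `Severity` enum, the `SAFETY_CRITICAL` set and `safety_critical_share` are hypothetical names, the sample records are made up and are not project data, and treating Sev 1 and Sev 1N together as "safety critical" is an assumption read from the level definitions rather than something the slides state.

```python
from enum import Enum


class Severity(Enum):
    """Severity levels as listed by the review board."""
    SEV_1 = "1"    # loss of a safety critical function
    SEV_1N = "1N"  # Sev 1 with an established workaround
    SEV_2 = "2"    # loss of a critical mission support capability
    SEV_2N = "2N"  # Sev 2 with an established workaround
    SEV_3 = "3"    # perceivable by operator, neither Sev 1 nor Sev 2
    SEV_4 = "4"    # not perceivable, insignificant requirements violation
    SEV_5 = "5"    # not perceivable, programming standard violation


# Assumption: "safety critical" covers Sev 1 and Sev 1N.
SAFETY_CRITICAL = {Severity.SEV_1, Severity.SEV_1N}


def safety_critical_share(severities):
    """Percentage of SCRs whose severity is in SAFETY_CRITICAL."""
    if not severities:
        return 0.0
    hits = sum(1 for sev in severities if sev in SAFETY_CRITICAL)
    return 100.0 * hits / len(severities)


# Made-up sample of ten SCR severities, for illustration only.
sample = [Severity.SEV_3] * 5 + [Severity.SEV_4] * 3 + [Severity.SEV_2, Severity.SEV_1N]
print(safety_critical_share(sample))
```

Applied to the real SCR repository, the same tally could be split by discovery activity (Development & testing versus On-orbit) to reproduce a breakdown like the one quoted above.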
• Based on our results and the feedback from the IV&V team and project team we compiled a document for internal use which summarizes the Lessons Learned & Recommendations for Product and Process Improvement
  – Prevent the introduction of faults and improve the effectiveness of detection
    • Example: Increase effort spent on design and implementation of data repository used to share data between CSCIs
  – Improve the quality of the data & change tracking process
    • Example: Ensure the changes to the software artifacts (e.g., requirements, code, etc.) made to fix the problem are recorded and can be easily associated with a specific SCR
• Understanding why, how, and when faults manifest as failures is essential to determining how their introduction into the software systems can be prevented and when and how they can be eliminated most effectively
– For the pilot project and many other NASA missions that undergo incremental development and require sustained engineering for a long period of time, these results can be used to improve the system quality
– The internal and external validity of our results indicates that several observed trends are not project specific. Rather, they seem to be intrinsic characteristics of software faults and failures that apply across projects
– Parts of our Lessons Learned document which are related to improvement of the problem/change tracking systems and data quality can be used by newer initiatives such as Constellation, thus proactively avoiding common pitfalls, leading to more accurate data and more cost efficient improvement of software quality
• Assuring data quality is an important step of any empirical research effort
  – Inaccurate data may lead to misleading observations
  – Both the IV&V team and the project team have been extremely valuable in helping us to understand the change tracking system, determine the meaning of different attributes, and verify the quality of the data
• The research approach and analysis techniques can be used by any project that tracks problem reports/change data
  – However, due to the lack of a unified change tracking system, some amount of unique work on exploration of the data format and automation of data extraction may be needed