7/22/04 Report Back: Performance Analysis Track Dr. Carol Smidts Wes Deadrick
Track Members
• Carol Smidts (UMD) – Track Chair
– Integrating Software into PRA
• Ted Bennett and Paul Wennberg (Triakis)
– Empirical Assurance of Embedded Software Using Realistic Simulated Failure Modes
• Dolores Wallace (GSFC)
– System and Software Reliability
• Bojan Cukic (WVU)
– Compositional Approach to Formal Models
• Kalynnda Berens (GRC)
– Software Safety Assurance of Programmable Logic
– Injecting Faults for Software Error Evaluation of Flight Software
• Hany Ammar (WVU)
– Risk Assessment of Software Architectures
Agenda
• Characterization of the Field
• Problem Statement
• Benefits of Performance Analysis
• Future Directions
• Limitations
• Technology Readiness Levels
Characterization of Field
• Goal: Prediction and Assessment of Software Risk/Assurance Level (Mitigation optimization)
• System Characteristics of interest
– Risk (Off-nominal situations)
– Reliability, availability, maintainability = Dependability
– Failures - general sense
• Performance Analysis Techniques - modeling and simulation, data analysis, failure analysis, design analysis focused on criticality
Problem Statement
• Why should NASA do performance analysis? - We care if things fail!
• Successfully conducting SW and System Performance Analysis gives us the data necessary to make informed decisions in order to improve performance and overall quality
• Performance analysis permits:
– Ability to determine if/when system meets requirements
– Risk reduction and quantification
– Application of new knowledge to future systems
– A better understanding of the processes by which systems are developed, which enables NASA to exercise continual improvement
Benefits of Performance Analysis
• Reduced development and operating costs
• Manage and optimize current processes, thereby resulting in more efficient and effective processes
– Defined and repeatable process – reduced time to do same volume of work
• Reduces risk and increases safety and reliability
• Better software architecture designs
• More maintainable systems
• Enable NASA to handle more complex systems in the future
• Put the responsibility where it belongs from an organizational perspective - focuses accountability
Future Directions for Performance Analysis
• Automation of modeling and data collection – increased efficiency and accuracy
• A more useful, better reliability model
– useful = user friendly (enable the masses, not just the domain experts), increased usability of the data (learn more from what we have)
– better = greater accuracy and predictability
• Define and follow repeatable methods/processes for data collection and analysis, including:
– education and training
– use of simulation
– gold nugget = accurate and complete data
Future Directions for Performance Analysis (Cont.)
• Develop a method for establishing accurate performance predictions earlier in life cycle
• Evolve to refine system level assessment – factor in the human element
• Establish and define an approach to performing trade-off of attributes – reliability, etc.
• Need for early guidance on criticality of components
• Optimize a defect removal model
• Methods and metrics for calculating/defending return on investment of conducting performance analysis
Why not
• Standard traps - Obstacles
– Uncertainty about scalability
– User friendliness
– Lack of generality
– “Not invented here” syndrome
• Costs and benefits
– Difficult to assess and quantify
– Long term project benefit tracking recommended
Technology Readiness Level
• Integrating Software into PRA – Taxonomy (7)
• Test-Based Approach for Integrating SW in PRA (3)
• Empirical Assurance of Embedded Software Using Realistic Simulated Failure Modes (5)
• Maintaining system and SW test consistency (8)
• System Reliability (3)
• Software Reliability (9)
• Compositional Approach to Formal Models (2)
• Software Safety Assurance of Programmable Logic (2)
• Injecting Faults for Software Error Evaluation of Flight Software (9)
• Risk Assessment of Software Architectures (5)
Research Project Summaries
Integrating Software Into PRA
Dr. Carol Smidts, Bin Li
Objective:
• PRA is a methodology to assess the risk of large technological systems
• The objective of this research is to extend current classical PRA methodology to account for the impact of software on mission risk
Integrating Software Into PRA (Cont)
Achievements
1. Developed a software related failure mode taxonomy
2. Validated the taxonomy on multiple projects (ISS, Space Shuttle, X38)
3. Proposed a step-by-step approach to integration in the classical PRA framework with quantification of input and functional failures.
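To illustrate what quantifying software failures inside a classical PRA model can look like, here is a minimal sketch, not the authors' method. It assumes a single OR gate over independent basic events, and every probability value and name below is hypothetical.

```python
from math import prod


def or_gate(probabilities):
    """Probability that at least one of several independent basic events occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical per-mission probabilities, for illustration only.
p_hardware = 1e-3        # classical hardware basic event
p_sw_input = 2e-4        # software input failure
p_sw_functional = 5e-4   # software functional failure

# Top-event probability without and with the software contributions.
p_without_sw = or_gate([p_hardware])
p_with_sw = or_gate([p_hardware, p_sw_input, p_sw_functional])

print(f"without software: {p_without_sw:.6f}")
print(f"with software:    {p_with_sw:.6f}")
```

The gap between the two results is the share of mission risk that a hardware-only model would leave out, under the independence assumption stated above.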
Problem
Disconnect exists between System and software development loops
[Figure: system and software development loops. Labels in the SYSTEM loop: Design/Debug; Analyze/Test/V&V; Model, Simulate, Prototype, ES, etc. Labels in the SW loop: Interpretation; Requirements; Analyze/Test/Verify; Design/Debug; Build. Also labeled: Integration Testing.]
Most embedded SW faults found at integ. test are traceable to Rqmts. & interface misunderstanding
TRIAKIS Corporation
Approach
• Develop & simulate entire system design using executable specifications (ES)
• Verify total system design with suite of tests
• Simulate controller hardware
• Replace controller ES with simulated HW running object (flight) software
• Test SW using system verification tests
When SW passes all system verification tests, it has correctly implemented all of the tested requirements
Mini-AERCam
IV&V Facility
Empirical Assurance of Embedded SW Using Realistic Simulated Failure Modes
• Problem: FMEA Limitations
– Expensive & time-consuming
– List of possible failure modes extensive
– Focuses on prioritized subset of failure modes
• Approach: Test SW w/sim’d Failures
– Create pure virtual simulation of Mini-AERCam HW & flight environment running on PC
– Induce realistic component/subsystem failures
– Observe flight SW response to induced failures
Can we improve coverage by testing SW resp. to sim’d failures?
Compare results with project-sponsored FMEA, FTA, etc.:
– # Issues uncovered?
– # Failure modes evaluated?
– Effort involved?
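The induce-and-observe loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the Triakis simulation: the sensor class, the plausibility check standing in for the software under test, and the failure-mode names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimulatedRangeSensor:
    """Hypothetical simulated hardware component with an injectable failure."""
    true_range_m: float
    injected_output: Optional[float] = None  # None means no fault is injected

    def read(self) -> float:
        if self.injected_output is not None:
            return self.injected_output
        return self.true_range_m


def software_response(reading_m: float) -> str:
    """Stand-in for the software under test: a simple range plausibility check."""
    return "NOMINAL" if 0.0 <= reading_m <= 100.0 else "SAFE_MODE"


def run_campaign(failure_modes):
    """Inject each failure mode in turn and record the software's response."""
    results = {}
    for name, injected in failure_modes.items():
        sensor = SimulatedRangeSensor(true_range_m=12.0, injected_output=injected)
        results[name] = software_response(sensor.read())
    return results


campaign = run_campaign({
    "no_fault": None,
    "open_circuit": -1.0,          # out-of-range output
    "stuck_at_last_value": 12.0,   # plausible but frozen output
})
for mode, response in campaign.items():
    print(mode, "->", response)
```

In this toy run the out-of-range failure is caught, while the stuck-at failure passes the check unnoticed. A gap of that kind is what would be counted under "issues uncovered" when comparing against an FMEA.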
Software and System Reliability
Dolores Wallace, Bill Farr, Swapna Gokhale
• Addresses the need to evaluate and assess the reliability and availability of large complex software intensive systems by predicting (with associated confidence intervals):
– The number of software/system faults,
– Mean time to failure and restore/repair,
– Availability,
– Estimated release time from testing.
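As a rough illustration of how two of the predicted quantities relate, the sketch below computes point estimates of mean time to failure (MTTF), mean time to repair (MTTR), and steady-state availability, using the standard relation availability = MTTF / (MTTF + MTTR). The data values are hypothetical. The sketch does not reproduce the reliability growth models or the confidence intervals mentioned above.

```python
from statistics import mean

# Hypothetical observations, in hours, for illustration only.
times_between_failures = [120.0, 95.0, 160.0, 105.0]
repair_times = [2.0, 4.0, 3.0, 3.0]


def steady_state_availability(mttf: float, mttr: float) -> float:
    """Long-run fraction of time the system is up."""
    return mttf / (mttf + mttr)


mttf = mean(times_between_failures)
mttr = mean(repair_times)
availability = steady_state_availability(mttf, mttr)

print(f"MTTF = {mttf:.1f} h, MTTR = {mttr:.1f} h, availability = {availability:.4f}")
```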
2003 & 2004 Research
2003 (Software Based)
• Literature search completed
• New models were selected: 1) Enhanced Schneidewind (includes risk assessment and trade-off analysis) and 2) Hypergeometric Model
• Incorporated the new software models into the established public domain tool SMERFS^3
• Applied the new models on a Goddard software project
• Made the latest version of SMERFS^3 available to the general public
2004 (System Based)
• Conducted similar research effort for System Reliability and Availability
• Will enhance SMERFS^3 and validate the system models on a Goddard data set
A Compositional approach to Validation of Formal Models
• Problem
– Significant number of faults in real systems can be traced back to specifications.
– Current methodologies of specification assurance have problems:
  • Theorem Proving: Complex
  • Model Checking: State explosion problems
  • Testing: Incomplete
• Approach
– Combine them!
  • Use test coverage to build abstractions.
  • Abstractions reduce the size of the state space for model checking.
  • Develop visual interfaces to improve the usability of the method.
Dejan Desovski, Bojan Cukic
Software Fault Injection Process
Kalynnda Berens, Dr. John Crigler, Richard Plastow
• Standardized approach to test systems with COTS and hardware interfaces
• Provides a roadmap of where to look to determine what to test
[Flowchart of the fault injection process. Box labels: Start; Obtain Source Code and Documentation; Identify Interfaces and Critical Sections; Error/Fault Research; Estimate Effort Required; decision "Sufficient time and funds?" (branch labeled Yes); Importance Analysis; Select Subset; Test Case Generation; Fault Injection Testing; Document Results, Metrics, Lessons Learned; Feedback to FCF Project; End]
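One plausible reading of the decision step in this process is: test everything when time and funds allow, otherwise rank candidates by importance and select a subset. The sketch below is based on that reading and is not the documented procedure. The greedy selection rule, the field names, and all numbers are assumptions made for illustration.

```python
def select_test_targets(candidates, budget_hours):
    """Return the names of the interfaces/sections to test within the budget.

    If the total estimated effort fits, everything is tested. Otherwise
    candidates are ranked by importance and added greedily while they fit.
    """
    total_effort = sum(c["effort_hours"] for c in candidates)
    if total_effort <= budget_hours:
        return [c["name"] for c in candidates]

    selected, used = [], 0.0
    for c in sorted(candidates, key=lambda c: c["importance"], reverse=True):
        if used + c["effort_hours"] <= budget_hours:
            selected.append(c["name"])
            used += c["effort_hours"]
    return selected


# Hypothetical candidates with estimated effort and importance scores.
candidates = [
    {"name": "serial_interface", "importance": 0.9, "effort_hours": 40.0},
    {"name": "cots_driver", "importance": 0.6, "effort_hours": 30.0},
    {"name": "logging", "importance": 0.2, "effort_hours": 20.0},
]

print(select_test_targets(candidates, budget_hours=100.0))
print(select_test_targets(candidates, budget_hours=60.0))
```

A greedy rule is only one option. With a tight budget it can skip a mid-importance item in favour of a cheaper, less important one, so the selection rule itself is worth reviewing on a real project.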
Programmable Logic at NASA
Kalynnda Berens, Jacqueline Somos
• Issues
– Lack of good assurance of PLCs and PLDs
– Increasing complexity = increasing problems
– Usage and Assurance Survey - SA involved in less than 1/3 of the projects; limited knowledge
• Recommendations
– Trained SA for PLCs
– PLDs – determine what is complex; use process assurance (SA or QA)
• Training Created
– Basic PLC and PLD training aimed at SA
– Process assurance for hardware QA
Year 2 of Research
• What are industry and other government agencies doing for assurance and verification?
– An intensive literature search of white papers, manuals, standards, and other documents that illustrated what various organizations were doing.
– Focused interviews with industry practitioners. Interviews were conducted with assurance personnel (both hardware and software) and engineering practitioners in various industries, including biomedical, aerospace, and control systems.
– Meeting with FAA representatives. Discussions with FAA representatives led to a more thorough understanding of their approach and the pitfalls they have encountered along the way.
• Position paper, with recommendations for NASA Code Q
Current Effort
• Implement some of the recommendations
– Develop coursework to educate software and hardware assurance engineers
– Three courses
  • PLCs for Software Assurance personnel
  • PLDs for Software Assurance personnel
  • Process Assurance for Hardware QA
– Guidebook
• Other recommendations
– For Code Q to implement if desired
– Follow-up CSIP to try software-style assurance on complex electronics
Severity Analysis Methodology
Hany Ammar, Katerina Goseva-Popstojanova, Ajith Guedem, Kalaivani Appukutty, Walid AbdelMoez, and Ahmad Hassan
• We have developed a methodology to assess severity of failures of components, connectors, and scenarios based on UML models
• This methodology is applied on NASA’s Earth Observing System (EOS)
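[Figure: requirements matrix. Rows list requirements R1, R2, …, Rm with their scenarios (S1, S2, S3, …); columns list failure modes FM1, FM2, …, FMn. Each cell holds a risk factor (Rf); the highlighted cell is the risk factor of scenario S1 in failure mode FM2.]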
• Requirements are mapped to UML use case and scenarios
• A failure mode refers to the way in which a scenario fails to achieve its requirement
• According to Dr. Martin Feather’s DDP Process, “The Requirements matrix maps the impacts of each failure mode on each requirement.”
Requirement Risk Analysis Methodology
• We have developed a methodology for assessing requirements based risk using normalized dynamic complexity and severity of failures. This can be used in the DDP process developed at JPL.
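The slide does not give the formula, so the following is only a sketch of how normalized dynamic complexity and failure severity could be combined into a per-scenario risk factor. It assumes the risk factor is the product of the two, that complexity is normalized by the total across scenarios, and that severity classes map to the numeric indices shown. All of these choices, and the data, are hypothetical.

```python
# Hypothetical mapping from severity class to a numeric index.
SEVERITY_INDEX = {
    "minor": 0.25,
    "marginal": 0.50,
    "critical": 0.75,
    "catastrophic": 0.95,
}


def normalize(complexities):
    """Scale raw dynamic complexity values so that they sum to 1."""
    total = sum(complexities.values())
    return {name: value / total for name, value in complexities.items()}


def risk_factors(complexities, severities):
    """Risk factor per scenario = normalized complexity * severity index (assumed)."""
    normalized = normalize(complexities)
    return {
        name: normalized[name] * SEVERITY_INDEX[severities[name]]
        for name in complexities
    }


# Hypothetical scenarios for one requirement and one failure mode.
raw_complexity = {"S1": 30.0, "S2": 50.0, "S3": 20.0}
severity = {"S1": "catastrophic", "S2": "marginal", "S3": "minor"}

factors = risk_factors(raw_complexity, severity)
for scenario, rf in sorted(factors.items(), key=lambda kv: kv[1], reverse=True):
    print(scenario, round(rf, 3))
```

Under these assumptions S1 ranks highest even though S2 is the most complex scenario, because the severity of its failure dominates.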
What to Read
• Key works in the field
• Tutorials
• Web sites
– Will be completed at a later time