SSS Validation and Testing
September 11, 2003
Rockville, MD
William McLendon
Neil Pundit
Erik DeBenedictis
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.
Overview
• APItest
• Release Testing Experiences at Sandia
• Status daemon
Distributed Runtime System Testing
• Complex system of interactions
• Approach to testing
– Component Testing
– Benchmarks (Performance / Functionality)
– Operational Profile
– Stress Testing
• Users expect a high degree of quality in today’s high-end systems!
APItest
APItest - Overview
• Unit-testing tool for network components
– Targeted at networked applications
– Extensible framework
– Dependency calculus for inter-test relationships
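APItest's dependency calculus is not spelled out in these slides. One plausible reading is shown in the minimal sketch below. Tests run in dependency order, and a test whose prerequisites did not all pass is skipped rather than run, as in the "[0 of 1] T DEPENDENCY FAILURE(S)" row of the sample output. The registry shape, the function name `run_with_dependencies`, and the dependencies in the demo are hypothetical. The demo borrows test names from the sample runs in this talk, but the dependencies between them are invented for illustration.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# Hypothetical registry shape: test name -> (names of prerequisite tests, test body).
# A test body returns True on pass and False on fail.
Registry = Dict[str, Tuple[List[str], Callable[[], bool]]]


def run_with_dependencies(tests: Registry) -> Dict[str, str]:
    """Run tests in dependency order, skipping any test whose prerequisites did not all pass."""
    for name, (deps, _) in tests.items():
        missing = [d for d in deps if d not in tests]
        if missing:
            raise KeyError(f"{name} depends on unregistered test(s): {missing}")
    graph = {name: set(deps) for name, (deps, _) in tests.items()}
    results: Dict[str, str] = {}
    # static_order() yields every prerequisite before its dependents (CycleError on cycles).
    for name in TopologicalSorter(graph).static_order():
        deps, body = tests[name]
        if any(results[d] != "PASS" for d in deps):
            # The body is never called, so the test records zero iterations run.
            results[name] = "DEPENDENCY FAILURE(S)"
        else:
            results[name] = "PASS" if body() else "FAIL"
    return results


if __name__ == "__main__":
    # Hypothetical dependencies between names borrowed from the sample runs.
    demo: Registry = {
        "add-location": ([], lambda: True),
        "del-location": (["add-location"], lambda: False),
        "val-removal": (["del-location"], lambda: True),
    }
    for test, verdict in run_with_dependencies(demo).items():
        print(f"{test:<14} {verdict}")
```

In this design, a failure propagates transitively. Any test downstream of a failed or skipped test is also reported as a dependency failure, so the root cause is easy to spot.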
iterations  test name  % matched  Pass/Fail  message
----------  ---------  ---------  ---------  ----------------
[1 of 1]    A          100.00%    PASS
[1 of 1]    K          100.00%    FAIL       m[0.0% : 0.0%]
[1 of 1]    J            0.00%    FAIL       m[90.0% : 90.0%]
[5 of 5]    M          100.00%    PASS
[1 of 1]    L          100.00%    FAIL       m[0.0% : 0.0%]
[1 of 1]    N          100.00%    PASS
[0 of 1]    T          DEPENDENCY FAILURE(S)
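The rows above are consistent with one reading: the m[lo : hi] message is the acceptable range for % matched. Under that reading, J matched 0% where 90% was required, while K and L matched 100% where 0% was expected. This is an inference from these rows only, not documented APItest behavior. The minimal sketch below encodes that assumption. The class name `IterationSummary`, its fields, and the default window of 100% are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class IterationSummary:
    name: str
    run: int        # iterations actually executed
    planned: int    # iterations requested
    matched: int    # iterations whose output matched the expectation
    # Assumed acceptance window for % matched; the default demands that every iteration match.
    window: Tuple[float, float] = (100.0, 100.0)

    @property
    def percent_matched(self) -> float:
        # A test that never ran (e.g. skipped on a dependency failure) counts as 0%.
        return 100.0 * self.matched / self.run if self.run else 0.0

    def verdict(self) -> str:
        lo, hi = self.window
        return "PASS" if lo <= self.percent_matched <= hi else "FAIL"

    def report_line(self) -> str:
        """Format one row in the style of the sample output; the window is shown only on FAIL."""
        status = self.verdict()
        message = "" if status == "PASS" else f"m[{self.window[0]:.1f}% : {self.window[1]:.1f}%]"
        row = f"[{self.run} of {self.planned}]  {self.name:<12} {self.percent_matched:7.2f}%  {status:<9} {message}"
        return row.rstrip()


if __name__ == "__main__":
    rows = [
        IterationSummary("A", 1, 1, 1),
        IterationSummary("K", 1, 1, 1, (0.0, 0.0)),
        IterationSummary("J", 1, 1, 0, (90.0, 90.0)),
        IterationSummary("M", 5, 5, 5),
    ]
    for r in rows:
        print(r.report_line())
```

A window that is not the default lets a test assert an expected failure rate as well as an expected success, which would explain K and L failing despite a 100% match.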
iterations  test name        % matched  Pass/Fail  message
----------  ---------------  ---------  ---------  -------
[1 of 1]    add-location     100.00%    PASS
[1 of 1]    QuerySDComps     100.00%    PASS
[1 of 1]    QuerySDHost      100.00%    PASS
[1 of 1]    QuerySDProtocol  100.00%    PASS
[1 of 1]    QuerySDPort      100.00%    PASS
[1 of 1]    del-location     100.00%    PASS
[1 of 1]    val-removal      100.00%    PASS
iterations  test name     % matched  Pass/Fail  message
----------  ------------  ---------  ---------  -------
[1 of 1]    sss-getproto  100.00%    PASS
[1 of 1]    sss-getport   100.00%    PASS
[1 of 1]    sss-gethost   100.00%    PASS
[1 of 1]    sss-getcomp   100.00%    PASS
[1 of 1]    sss-getproto  100.00%    PASS
[1 of 1]    sss-getport   100.00%    PASS
[1 of 1]    sss-gethost   100.00%    PASS
[1 of 1]    sss-getcomp   100.00%    PASS
Release Testing…
Tales from Cplant Release Testing
• Methodical execution of production jobs and third-party benchmarks to identify system instabilities so that they can be resolved. Examples:
– Rapid job turnover rate (caused mismatches between the scheduler and the allocator)
– Heavy I/O (“yod-io”: I/O that passes through the launch-node process instead of going directly to ENFS)
• Wrapping the above codes in the Ctest framework to enable portable compilation, launch, and analysis of synthetic workloads
Ctest
• Extension of Mike Carifio’s work
– Presented at the SciDAC meeting in Houston in fall 2002
– Make-based structure that holds a suite of independent applications
– Tools to launch the suite as a reproducible workload
– Goal: 30 users and 60 concurrent apps
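Ctest is described here only as a make structure plus launch tools, and its code is not shown. To illustrate what a reproducible workload can mean, the sketch below (not Ctest) derives the whole submission schedule for 30 users from a single seed. It then replays that schedule under a cap of 60 concurrently running applications. The application names, mean arrival gap, runtime range, and first-come, first-served replay are all assumptions chosen for illustration.

```python
import heapq
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Job:
    user: str
    app: str
    submit: float   # seconds after the workload starts
    runtime: float  # seconds


def make_workload(seed: int, apps: List[str], n_users: int = 30, jobs_per_user: int = 4) -> List[Job]:
    """Generate a job list deterministically: the same seed and inputs always give the same workload."""
    rng = random.Random(seed)  # private generator, independent of the global random state
    jobs: List[Job] = []
    for u in range(n_users):
        t = 0.0
        for _ in range(jobs_per_user):
            t += rng.expovariate(1 / 60.0)       # assumed mean gap of 60 s between a user's submissions
            runtime = rng.uniform(30.0, 600.0)   # assumed runtimes between 30 s and 10 min
            jobs.append(Job(f"user{u:02d}", rng.choice(apps), round(t, 1), round(runtime, 1)))
    return sorted(jobs, key=lambda j: (j.submit, j.user))


def peak_concurrency(jobs: List[Job], max_concurrent: int = 60) -> int:
    """Replay jobs first-come, first-served with at most max_concurrent apps running; return the peak."""
    finish_times: List[float] = []  # min-heap of finish times of running apps
    clock = 0.0
    peak = 0
    for job in jobs:  # jobs must already be sorted by submit time
        clock = max(clock, job.submit)
        while finish_times and finish_times[0] <= clock:
            heapq.heappop(finish_times)
        if len(finish_times) >= max_concurrent:
            clock = heapq.heappop(finish_times)  # wait until the earliest-finishing app frees a slot
        heapq.heappush(finish_times, clock + job.runtime)
        peak = max(peak, len(finish_times))
    return peak


if __name__ == "__main__":
    apps = ["app_a", "app_b", "app_c"]  # hypothetical stand-ins for the suite's applications
    w = make_workload(seed=2003, apps=apps)
    assert w == make_workload(seed=2003, apps=apps)  # same seed, same workload
    print(len(w), "jobs, peak concurrency", peak_concurrency(w))
```

Because the schedule depends only on the seed, a run that exposes an instability can be replayed exactly while the fix is verified.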
Sample Load Profile on Cplant
Issue Tracking
• SNL uses a program called RT
– A centralized repository for issue tracking gives an overall picture of outstanding problems.
– Helps summarize progress.
• Bugzilla is on the SciDAC SSS website
– http://bugzilla.mcs.anl.gov/scidac-sss/