Ten Years of Software Sustainability at The Infrared Processing and Analysis Center
G. Bruce Berriman and John Good, NASA Exoplanet Science Institute, Infrared Processing and Analysis Center, Caltech, USA
Ewa Deelman, Information Sciences Institute, University of Southern California, USA
Anastasia Alexov, Astronomical Institute Anton Pannekoek, Amsterdam, Netherlands
Presentation at AHM 2010, Cardiff, September 2010.
The Role of IPAC in Astronomy
http://www.ipac.caltech.edu
Long-term archive
Curation of data
Dissemination to the community
Size and Usage Have Grown
Archives contain data from 30 missions and projects
Space-based, ground-based and knowledge-based
Archives Built on a Common Hardware And Software Architecture
85 million queries
3 TB/month downloaded
A Common Software Architecture
Application is usually a CGI program
Each component is a module with a standard interface that communicates with other components and fulfills one general function
Modules are stand-alone, portable ANSI-C tools
Components are plugged together and controlled by an executive library
The executive starts components as child services and parses their return values
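The executive pattern above (run a component, read what it reports, decide what to do next) can be sketched in a few lines. This is a minimal illustration, not IPAC's executive library: it models each component as a child process and assumes a hypothetical return line of the form `[struct stat="OK", key=value, ...]`. The names `parse_return`, `run_component`, `run_pipeline` and `python_component` are all hypothetical.

```python
import subprocess
import sys


def parse_return(line):
    """Parse a return line in a hypothetical '[struct key=value, ...]' format.

    Values are split on commas, so this simple parser assumes no value
    contains a comma.
    """
    body = line.strip()
    if not (body.startswith("[struct ") and body.endswith("]")):
        raise ValueError(f"unrecognised return value: {line!r}")
    fields = {}
    for item in body[len("[struct "):-1].split(","):
        key, _, value = item.strip().partition("=")
        fields[key] = value.strip('"')
    return fields


def run_component(argv):
    """Start one component as a child process and parse the last line it prints."""
    proc = subprocess.run(argv, capture_output=True, text=True, check=False)
    lines = proc.stdout.strip().splitlines()
    if not lines:
        return {"stat": "ERROR", "msg": f"no output (exit code {proc.returncode})"}
    return parse_return(lines[-1])


def run_pipeline(steps):
    """Run components in order and stop at the first that does not report OK."""
    results = []
    for argv in steps:
        status = run_component(argv)
        results.append(status)
        if status.get("stat") != "OK":
            break
    return results


def python_component(return_line):
    """Hypothetical stand-in component: a Python one-liner that prints a return line."""
    return [sys.executable, "-c", f"print({return_line!r})"]


if __name__ == "__main__":
    steps = [
        python_component('[struct stat="OK", count=3]'),
        python_component('[struct stat="ERROR", msg="no data"]'),
        python_component('[struct stat="OK", count=9]'),  # never reached
    ]
    for result in run_pipeline(steps):
        print(result)
```

The third component is never started because the second reports an error. Keeping the contract to "child process plus one parseable status line" is what lets stand-alone tools be recombined without shared memory between them.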
Applications are generally simple web forms or Web services that search for data
The “smarts” are on the server side, to optimize complex queries on large data sets
Component-based architecture enables strong re-use and adaptation
Optimized for astronomical spatial searches and complex, general queries regardless of wavelength and type of mission
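The simplest astronomical spatial search is a cone search: return everything within some angular radius of a sky position. The sketch below shows only the geometric predicate, using the haversine formula for great-circle separation, applied as a brute-force scan. It is an illustration under that assumption, not the ISIS query engine, and the function names are hypothetical.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, Dec) positions in degrees.

    Uses the haversine formula, which stays numerically accurate at the small
    separations typical of archive searches.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(sources, ra, dec, radius_deg):
    """Return every (name, ra, dec) source within radius_deg of (ra, dec).

    This is a brute-force scan over all rows; an archive would use an index.
    """
    return [s for s in sources
            if angular_separation_deg(ra, dec, s[1], s[2]) <= radius_deg]


if __name__ == "__main__":
    sources = [("a", 10.0, 20.0), ("b", 10.5, 20.0), ("c", 359.9, 0.0), ("d", 0.1, 0.0)]
    # Positions on either side of RA = 0 are correctly treated as neighbours.
    print([s[0] for s in cone_search(sources, 0.0, 0.0, 0.2)])  # → ['c', 'd']
```

Working on the sphere rather than in raw (RA, Dec) differences is what makes the result independent of where on the sky, or in which mission's catalog, the search happens.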
All services are integrated into the Infrared Science Information System (ISIS)
Components are generic; minimize dependencies on third-party software or environments
Avoid shared memory or system calls
All database queries are performed in one module
300 KLOC
New projects automatically inherit functionality
Supports efficient development and controls maintenance costs
Engage Your Users!
Concerted program of user engagement to attract new users and build a user community
Method
User Surveys
End User Group (drawn from the community)
Exhibits and demos
Coffee pot conversations
Advertise in newsletters
Number of end-users has increased to 18,000
12% of peer-reviewed papers cited IPAC archives or data
Actively seek feedback, e.g.
Watch users as they try services; see where they get stuck
User Surveys ask respondents to write down their views rather than answer questions
Listen to the advice you don’t want to hear
Speed Is King In An Archive
Image data sets are becoming very large: the Spitzer Space Telescope will deliver over 100 million images, with varying footprints on the sky.
Searches for spatially extended images are slow: a scan of Spitzer images can take 2,000 s
… results pages are becoming more complex.
What matters more – fast access? Or interactivity? Speed won hands down.
R-tree Indexing
Uses hierarchically nested minimum bounding boxes
Performance scales as log(N)
Performance gain of x1000 over table scan
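The idea of hierarchically nested minimum bounding boxes can be illustrated with a small static R-tree. This sketch bulk-loads the tree with a simple "Nearest-X" packing (sort by x-centre, group a fixed number of nodes per parent) rather than the dynamic insertion a production R-tree would use, treats footprints as planar boxes (ignoring RA wrap-around and spherical geometry), and is not IPAC's implementation. `FANOUT`, `Node`, `build` and `query` are hypothetical names.

```python
import random
from dataclasses import dataclass, field

FANOUT = 8  # hypothetical node capacity


@dataclass
class Node:
    box: tuple                                    # (xmin, ymin, xmax, ymax): minimum bounding box
    children: list = field(default_factory=list)  # empty for leaf entries
    item: int = -1                                # index of the indexed box; -1 for internal nodes


def overlaps(a, b):
    """True if two axis-aligned boxes intersect (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def enclose(boxes):
    """Minimum bounding box of a list of boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def build(boxes):
    """Bulk-load a static R-tree from a non-empty list of boxes.

    Each level is sorted by box x-centre and packed FANOUT nodes per parent
    (a simple "Nearest-X" packing) until a single root remains.
    """
    level = [Node(box=b, item=i) for i, b in enumerate(boxes)]
    while len(level) > 1:
        level.sort(key=lambda n: n.box[0] + n.box[2])
        groups = [level[i:i + FANOUT] for i in range(0, len(level), FANOUT)]
        level = [Node(box=enclose([c.box for c in g]), children=g) for g in groups]
    return level[0]


def query(node, q, hits=None):
    """Indices of all boxes overlapping q; subtrees whose bounding box misses q are skipped."""
    if hits is None:
        hits = []
    if overlaps(node.box, q):
        if node.item >= 0:
            hits.append(node.item)
        for child in node.children:
            query(child, q, hits)
    return hits


if __name__ == "__main__":
    rng = random.Random(42)
    boxes = []
    for _ in range(10_000):  # synthetic footprints, not real data
        x, y = rng.uniform(0, 360), rng.uniform(-90, 90)
        boxes.append((x, y, x + 0.5, y + 0.5))
    q = (100.0, 10.0, 101.0, 11.0)
    root = build(boxes)
    indexed = sorted(query(root, q))
    scanned = [i for i, b in enumerate(boxes) if overlaps(b, q)]  # full table scan
    print(len(indexed), indexed == scanned)
```

The tree's height grows as log_FANOUT(N): with FANOUT = 8, even 100 million leaf entries sit under only nine levels of internal nodes (8^9 ≈ 1.3 × 10^8), and a small query descends only into the few subtrees whose boxes it touches, whereas the table scan always examines every row.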
Memory-mapped files: a segment of virtual memory is assigned a byte-for-byte correlation with part of a file
Parallelization / cluster processing
REST-based web services
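A memory-mapped file lets a program treat a file's bytes as if they were already in memory, so a fixed-width table can be indexed by record number without explicit reads. The sketch below uses Python's standard `mmap` and `struct` modules; the record layout (an int32 image id plus RA and Dec as float64) and the helper names are hypothetical, chosen only to illustrate the technique.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-width record: image id (int32), RA and Dec (float64), little-endian.
RECORD = struct.Struct("<idd")


def write_table(path, rows):
    """Write rows of (id, ra, dec) as packed binary records."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(RECORD.pack(*row))


def read_record(mm, i):
    """Decode record i directly from the mapped bytes.

    The operating system typically pages in only the parts of the file that
    are touched, so no explicit read() or seek() of the file is needed.
    """
    return RECORD.unpack_from(mm, i * RECORD.size)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "images.bin")
    write_table(path, [(i, i * 0.01, -i * 0.005) for i in range(100_000)])
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        print(len(mm) // RECORD.size, read_record(mm, 54_321))
```

Because record i always starts at byte i × RECORD.size, lookup is a single offset computation, which pairs naturally with an index that returns record numbers.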
Modernization of Scanpi
Written in 1983, Scanpi co-adds scans from the far-infrared IRAS survey. By 2007 it was used in an average of 15 papers per year.
Sensitivity gain of x5 over survey data products
Improve spatial resolution of extended or confused sources
User panel strongly recommended modernization because of its value in supporting interpretation of data from current IR missions Spitzer and Herschel.
But it was coughing up blood and was a classic legacy program
Written in F66, it had become a patchwork of scripts and bug fixes and was a maintenance nightmare.
Dependent modules for data compression etc. no longer supported.
Stranded on Solaris 2.8
Developer retiring
Scanpi Workflow
Input: source info → Get scans → Co-register scans → Co-add all scans → Background and source fitting → Output: results and files on Web
Re-usable components: plotting, background, table manipulation, bulk download, coordinate transformation
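The central steps of the workflow are co-registering scans and co-adding them. The sketch below works in one dimension, assumes each scan's pointing offset is already known as a whole number of samples, and forms a weighted mean that skips gaps. It does not reproduce Scanpi's actual registration or weighting, which the slides do not describe; `coregister`, `coadd` and `rms` are hypothetical names.

```python
import math
import random


def coregister(scan, shift):
    """Shift a 1-D scan by an integer number of samples (positive moves data to
    higher indices), marking samples with no data as None."""
    n = len(scan)
    return [scan[i - shift] if 0 <= i - shift < n else None for i in range(n)]


def coadd(scans, weights=None):
    """Weighted mean of co-registered scans, sample by sample, skipping gaps."""
    if weights is None:
        weights = [1.0] * len(scans)
    out = []
    for samples in zip(*scans):
        pairs = [(v, w) for v, w in zip(samples, weights) if v is not None]
        wsum = sum(w for _, w in pairs)
        out.append(sum(v * w for v, w in pairs) / wsum if wsum else None)
    return out


def rms(values):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


if __name__ == "__main__":
    rng = random.Random(1)
    n, nscans, noise = 400, 25, 0.5
    truth = [math.exp(-0.5 * ((i - 200) / 4.0) ** 2) for i in range(n)]  # synthetic source
    aligned = []
    for _ in range(nscans):
        offset = rng.randint(-10, 10)  # hypothetical, known pointing offset
        observed = coregister([t + rng.gauss(0, noise) for t in truth], offset)
        aligned.append(coregister(observed, -offset))  # undo the offset
    stacked = coadd(aligned)
    off_source = slice(50, 150)  # far from the source and from the gapped edges
    print("single-scan rms:", rms(aligned[0][off_source]))
    print("co-added rms:   ", rms(stacked[off_source]))
```

For N scans with equal, independent noise, the mean's noise falls as 1/sqrt(N), so with 25 synthetic scans the two printed values should differ by roughly a factor of 5, up to random scatter. Co-registration comes first because averaging misaligned scans would smear the source instead of reinforcing it.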
Rewritten from the ground up in C
Developed as a workflow application that gives visibility into the processing steps
Calls existing components, reducing the code base to 21 KLOC (cf. 102 KLOC for the original)
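A workflow application gains visibility by running named steps in order and recording what each one produced. The runner below is a minimal sketch of that idea, not the rewritten Scanpi: the step names and the toy lambdas are hypothetical stand-ins for calls to real components.

```python
import time


def run_workflow(steps, state):
    """Run (name, function) steps in order over a shared state dict.

    Each function receives the current state and returns a dict of new products.
    The returned log records what every step produced and how long it took,
    which is what makes the intermediate processing visible.
    """
    log = []
    for name, func in steps:
        start = time.perf_counter()
        products = func(state)
        state = {**state, **products}
        log.append({"step": name,
                    "produced": sorted(products),
                    "seconds": time.perf_counter() - start})
    return state, log


if __name__ == "__main__":
    # Hypothetical stand-ins for Scanpi-like stages; real components would be called here.
    steps = [
        ("get_scans", lambda s: {"scans": [[1.0, 2.0, 1.0], [1.2, 2.2, 0.8]]}),
        ("coadd", lambda s: {"coadd": [sum(v) / len(v) for v in zip(*s["scans"])]}),
        ("fit_source", lambda s: {"peak": max(s["coadd"])}),
    ]
    final, log = run_workflow(steps, {"target": "hypothetical source"})
    for entry in log:
        print(f'{entry["step"]:<10} produced {entry["produced"]}')
    print("peak:", final["peak"])
```

Because every product is recorded under the name of the step that made it, a failing or suspicious stage can be identified directly, and each step can delegate to an existing component rather than duplicating its code.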