Supporting Polar Research with National Cyberinfrastructure
Justin Miller
High Performance File Systems, Indiana University
Polar HPDC Workshop, Rutgers University
December 4, 2014
Supporting Polar Research with National Cyberinfrastructure
• Indiana University is a partner with the Center for Remote Sensing of Ice Sheets (CReSIS) at the University of Kansas
  – CReSIS develops radar instruments and processing software
  – IU provides the computational support
• Data collection in the field
• Data storage and processing on IU resources
• Instruments and compute systems are installed on airborne science platforms
  – NASA Operation IceBridge
  – NSF-funded campaigns
Polar Research Operations Center (PROC)
• Data collection in the field
  – Forward Observer
• In-flight data collection and processing system
  – Captures multiple copies of the data simultaneously
  – Provides the ability to process data in near real-time
    » Quality control
    » Observation
• Ground lab hardware
  – Generates derived products in the field
• Data management back at home
  – Computational and archival storage
  – Data and metadata management
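As an illustration of capturing multiple copies of the data simultaneously, the following is a minimal Python sketch. It does not describe the actual Forward Observer software; the function names `write_copies` and `verify_copy`, and the use of SHA-256 checksums, are assumptions made only for this example. It writes each incoming chunk to every destination as it arrives and returns a checksum that can later be used to confirm each copy.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Sequence


def write_copies(chunks: Iterable[bytes], destinations: Sequence[Path]) -> str:
    """Write every chunk to all destinations; return the SHA-256 of the stream.

    Hypothetical helper: the real in-flight system is not described in the slides.
    """
    digest = hashlib.sha256()
    handles = [open(path, "wb") for path in destinations]
    try:
        for chunk in chunks:
            digest.update(chunk)
            # The same chunk goes to each destination before the next one is read.
            for handle in handles:
                handle.write(chunk)
    finally:
        for handle in handles:
            handle.close()
    return digest.hexdigest()


def verify_copy(path: Path, expected_sha256: str) -> bool:
    """Re-read one copy from disk and compare its checksum with the expected one."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest() == expected_sha256
```

Recording a checksum at capture time is one simple way to check, back in the ground lab, that every copy is still identical to what the instrument produced.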
Data Storage and Processing
• Data from the field is shipped to the University of Kansas and Indiana University (two copies)
  – At KU, they copy the data to long-term archival storage
  – At IU, we load the data onto a 3.5 PB filesystem for processing
• A complete season of radar data is approximately 80 TB
• Able to keep multiple complete seasons online for reprocessing
• Researchers use custom MATLAB and Python code to generate data products from the raw data
• Data products are available online
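To put the storage figures in proportion, here is a rough back-of-the-envelope calculation in Python. It assumes decimal units (1 PB = 1000 TB) and introduces a hypothetical `reserve_fraction` parameter for space used by other projects or derived products; neither assumption comes from the slides, so the result is only an upper bound on how many raw seasons could stay online.

```python
TB_PER_PB = 1000  # assumption: decimal units; the slides do not specify


def seasons_that_fit(filesystem_pb: float, season_tb: float,
                     reserve_fraction: float = 0.0) -> int:
    """Number of complete seasons of raw data that fit on the filesystem.

    reserve_fraction is a hypothetical share of the filesystem set aside
    for other uses (derived products, other projects).
    """
    usable_tb = filesystem_pb * TB_PER_PB * (1 - reserve_fraction)
    return int(usable_tb // season_tb)


if __name__ == "__main__":
    print(seasons_that_fit(3.5, 80))                        # 43
    print(seasons_that_fit(3.5, 80, reserve_fraction=0.5))  # 21
```

Even with half the filesystem reserved for other uses, the sketch suggests room for many seasons, which is consistent with keeping multiple complete seasons online for reprocessing.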
Funded by the National Science Foundation, Award #ACI-1445604
pti.iu.edu/jetstream (Award #1445604)
What is Jetstream?
• NSF’s first cloud for science and engineering research across all areas of activity supported by the NSF
• Jetstream will be a user-friendly cloud environment designed to give researchers and research students access to interactive computing and data analysis resources “on demand.”
• It will provide a library of virtual machines that users can select from to do their research.
• Software creators and researchers will also be able to create their own customized virtual machines -or- their own “private computing system” within Jetstream.
• It will enable countless discoveries across disciplines such as biology, atmospheric science, economics, network science, observational astronomy, and social sciences.
Jetstream Hardware Components
                                     # Physical          Total RAM   Node Local     Secondary      Connection to
Jetstream Site             # CPUs    Cores       PFLOPS  (GB)        Storage (TB)   Storage (TB)   Internet2 (Gbps)
Production Systems
  IU                       640       7,680       0.258   40,960      640            960            100
  TACC                     640       7,680       0.258   40,960      640            960            100
Test & Development System
  Arizona                  32        384         0.013   2,048       32             192            100
Total                      1,312     15,744      0.529   83,968      1,312          2,112          300
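The per-site figures in the hardware table can be cross-checked against the Total row with a short Python sketch. The dictionary keys (`cpus`, `cores`, and so on) are names chosen for this example only; the numbers are copied from the table. The sketch also derives two ratios that the table implies but does not state: cores per CPU and RAM per CPU.

```python
# Per-site figures copied from the Jetstream hardware table.
SITES = {
    "IU": {"cpus": 640, "cores": 7680, "pflops": 0.258, "ram_gb": 40960,
           "local_tb": 640, "secondary_tb": 960, "internet2_gbps": 100},
    "TACC": {"cpus": 640, "cores": 7680, "pflops": 0.258, "ram_gb": 40960,
             "local_tb": 640, "secondary_tb": 960, "internet2_gbps": 100},
    "Arizona": {"cpus": 32, "cores": 384, "pflops": 0.013, "ram_gb": 2048,
                "local_tb": 32, "secondary_tb": 192, "internet2_gbps": 100},
}


def column_totals(sites):
    """Sum every column across all sites."""
    totals = {}
    for row in sites.values():
        for column, value in row.items():
            totals[column] = totals.get(column, 0) + value
    # Round away floating-point noise in the PFLOPS sum.
    totals["pflops"] = round(totals["pflops"], 3)
    return totals


def per_cpu(sites, column):
    """Ratio of a column to the CPU count, per site."""
    return {name: row[column] / row["cpus"] for name, row in sites.items()}


if __name__ == "__main__":
    print(column_totals(SITES))
    print(per_cpu(SITES, "cores"))   # 12 cores per CPU at every site
    print(per_cpu(SITES, "ram_gb"))  # 64 GB of RAM per CPU at every site
```

The computed totals match the Total row of the table, and the ratios show that the test and development system uses the same node proportions as the two production systems at a smaller scale.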
Jetstream System Diagram
News for software developers
• Jetstream is enabling cyberinfrastructure
• RESTful APIs
• You build, package, deploy
• Users run on NSF-funded hardware
• Can leverage Globus technology for data movement and authentication
Citations and License Terms
• Slides about Jetstream are from the presentation “Jetstream – A national science & engineering cloud” by Craig Stewart, PI ([email protected])
• Slides about Wrangler are from the presentation “A Transformational Data Intensive Resource for the Open Science Community” by Dan Stanzione, Director and PI ([email protected]), and Niall Gaffney, Director for Data Intensive Computing ([email protected])
• Except where otherwise noted, contents of this presentation are copyright 2013 by the Trustees of Indiana University.
• This document is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share – to copy, distribute and transmit the work and to remix – to adapt the work under the following conditions: attribution – you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.