HDF5 and The HDF Group
The HDF Group
May 2014
www.hdfgroup.org
Nov 11, 2014
THE HDF GROUP
Mission
To provide high quality software for managing large complex data,
to provide outstanding services for users of these technologies,
and to ensure effective management of data throughout the data life cycle.
Goals of The HDF Group
• To create, maintain, and evolve software and services that enable society to manage large complex data at every stage of the data life cycle.
• To establish and maintain a sustainable organization with a highly skilled and committed team devoted to accomplishing the first goal.
The HDF Group
• 1988-2006: Software group at the University of Illinois’ National Center for Supercomputing Applications (NCSA)
• 2005-present: Non-profit company in Champaign, IL
• Passionate about managing large, complex, heterogeneous data throughout its life cycle
• Creators and stewards of HDF4 and HDF5
• Own HDF4 and HDF5
• Formats, libraries, and tools are open and free
• Committed to high quality and reliability
• Currently employs 33 staff
Current projects of The HDF Group
• NASA – Earth Observing System (EOS)
  – The basis for global climate research
  – HDF is the standard archive and distribution format for EOS
  – Hundreds of data products; 8-petabyte archive and growing
• NOAA/NASA – JPSS
  – Next-generation weather satellite system and EOS
  – HDF5 is the primary distribution format (6 TB/day)
• Sandia National Laboratory
  – High-throughput, multi-stream satellite image management
• Synchrotron community
  – Scalable solutions for high-throughput data acquisition and management
• ExaHDF5 (Lawrence Berkeley National Lab)
  – High-end scientific simulations
  – Tuning HDF5 for high-performance parallel I/O
• FastForward Computing (DOE)
  – Solving I/O challenges for exascale computing
The HDF Group Services
• Helpdesk and Mailing Lists
  – Available to all users as a first level of support
• Priority Support
  – Rapid issue resolution and advice
• Consulting
  – Needs assessment, troubleshooting, design reviews, etc.
• Training
  – Tutorials and hands-on practical experience
• Enterprise Support
  – Coordinating HDF activities across departments
• Special Projects
  – Adapting customer applications to HDF
  – New features and tools
  – Research and development
WHO USES HDF5?
Who uses HDF5?
• Applications that deal with large or complex data
• Over 200 different application areas
• More than 2 million data product users worldwide
• Academia, government agencies, industry
Members of the HDF support community
• NASA – Earth Observing System
• NOAA/NASA/Riverside Tech – NPOESS
• A large financial institution
• DOE – projects with LBNL & PNNL, ANL & ORNL
• Lawrence Livermore National Lab
• Army Geospatial Center
• NIH/Geospiza (bio software company)
• Lawrence Berkeley National Lab
• University of Illinois/NCSA
• Sandia National Lab
• A leading U.S. aerospace company
• Projects for the petroleum industry, vehicle testing, weapons research, and others
• “In-kind” support
New Areas We’re Exploring
• Fusion research data storage
  – Submitted a proposal for the ITER project’s data management with a large industrial fusion partner
• Astronomy
  – Submitted an NSF SI2 grant proposal with NRAO
  – Working toward a new standard for radio astronomy data storage
• Electron Microscopy
  – Submitted an NSF SI2 grant proposal with LSU, et al.
  – Proposing a new standard for storing imaging data
• Synthesis of HDF5 and cloud storage with Microsoft
  – Developing a “RESTful” API for accessing HDF5 data in the Azure cloud
HDF5 SCIENCE APPLICATIONS
NASA EOS Remotely Sensed Data
• HDF format is the standard file format for storing data from NASA's Earth Observing System (EOS) mission.
• Petabytes of data stored in HDF4 and HDF5 to support the Global Climate Change Research Program.
What is JPSS?
• JPSS is the next generation of NOAA's polar-orbiting environmental satellites.
• JPSS observations enable forecasting severe weather like hurricanes, tornadoes and blizzards, and assessing environmental hazards such as droughts, forest fires, poor air quality and harmful coastal waters.
• JPSS will provide continuity of critical global Earth observations, including our atmosphere, oceans, and land, through 2025.
• During Hurricane Sandy in October 2012, JPSS data helped forecasters and scientists accurately predict Sandy's hurricane track and infamous 'left hook' landfall into New York and New Jersey more than five days in advance.
CFD General Notation System
What is CFD?
Computational fluid dynamics (CFD) is a branch of fluid mechanics that uses numerical methods and algorithms to solve and analyze problems that involve fluid flows.
This computer-generated CFD image shows a model of the space shuttle. CFD has taken the place of wind tunnels for many evaluations of aircraft, and as computing power increases and computer models become more sophisticated, CFD will largely replace wind tunnels.
What is CGNS?
• Standard Interface Data Structures (SIDS)
  – Collection of conventions and definitions that defines the intellectual content of CFD-related data
• SIDS to ADF Mapping
  – ADF: Advanced Data Format
• SIDS to HDF5 Mapping
  – Defines how the SIDS is represented in HDF5
• CGNS Mid-Level Library (MLL)
  – Application Programming Interface (API) that conforms to the SIDS
  – Built on top of ADF/HDF5, which perform the I/O operations
CGNS and HDF5*
• CGNS was originally built using the ADF format.
• However, ADF does not have parallel I/O or data compression capabilities, and does not have the support and tools that HDF5 offers.
• HDF5 has rapidly grown to become a world-wide format standard for scientific data.
• HDF5 has parallel capability as well as a broader support base than ADF.
• Therefore, CGNS has adopted HDF5 as the default (official) data storage mechanism.
* Paraphrased from http://cgns.sourceforge.net/hdf5.html.
ENZO
• An adaptive mesh refinement (AMR), grid-based hybrid code designed to simulate cosmological structure formation.
Image credit: Alexei Kritsuk, Paolo Padoan & Mike Norman
What is ENZO for?
• At UC San Diego, the ENZO cosmology code is used to simulate the universe from first principles, starting near the Big Bang.
• Researchers using ENZO have conducted the most detailed simulations ever of a region of the universe more than 1.5 billion light years across.
• “We need to zoom in on these dense regions to capture the key physical processes -- including gravitation, flows of normal and ‘dark’ matter, and shock heating and radiative cooling of the gas,” said Mike Norman. “This requires ENZO’s ‘adaptive mesh refinement’ capability.”
• “AMR codes begin with a coarse grid spacing, and then spawn more detailed subgrids as needed to track key processes in higher density regions.”
• “We achieved unprecedented detail by reaching seven levels of subgrids throughout the survey volume -- something never done before -- producing more than 400,000 subgrids,” said SDSC computational scientist Robert Harkness.
• “Norman is one of the largest users of supercomputing time in the world, with 16 million computing hours at the TACC, and millions more on TeraGrid systems at SDSC, PSC, and NCSA.”
• “The HDF Group provided important support for handling the output, and SDSC’s data storage environment allowed the researchers to efficiently store and manage the massive data.”
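The AMR idea quoted above (start with a coarse grid, then spawn finer subgrids where density is high) can be illustrated with a toy sketch. This is a simple cell-by-cell quadtree refinement on a hypothetical Gaussian density clump, not ENZO's block-structured AMR; the density field, the threshold of 0.5, and the limit of three refinement levels are all illustrative assumptions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Cell:
    x: float      # lower-left corner
    y: float
    size: float   # edge length
    level: int    # 0 = coarse grid
    children: list = field(default_factory=list)


def density(x, y):
    # Hypothetical density field: one dense clump centered at (0.5, 0.5).
    r2 = (x - 0.5) ** 2 + (y - 0.5) ** 2
    return math.exp(-r2 / 0.05)


def refine(cell, threshold, max_level):
    """Split a cell into 4 subcells if its center density is at least threshold."""
    cx, cy = cell.x + cell.size / 2, cell.y + cell.size / 2
    if cell.level >= max_level or density(cx, cy) < threshold:
        return
    half = cell.size / 2
    for dx in (0, half):
        for dy in (0, half):
            child = Cell(cell.x + dx, cell.y + dy, half, cell.level + 1)
            cell.children.append(child)
            refine(child, threshold, max_level)  # recurse into denser regions


def count_by_level(cell, counts):
    counts[cell.level] = counts.get(cell.level, 0) + 1
    for child in cell.children:
        count_by_level(child, counts)
    return counts


# Coarse 4x4 grid on the unit square.
n = 4
roots = [Cell(i / n, j / n, 1 / n, 0) for i in range(n) for j in range(n)]
for root in roots:
    refine(root, threshold=0.5, max_level=3)

counts = {}
for root in roots:
    count_by_level(root, counts)
print(counts)  # cells per refinement level
```

Only the four coarse cells around the clump are refined, and the finest cells concentrate at its center, which is the behavior the quote describes at a much smaller scale.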
NeXus
What is NeXus?
• In recent years, scientists and programmers in neutron and synchrotron facilities around the world concluded that a common data format would fulfill a valuable function in the scattering community.
• As instrumentation becomes more complex and data visualization more challenging, scientists find it difficult to keep up with new developments.
• A common data format makes it easier to exchange experimental results and to exchange ideas about how to analyze them. It promotes greater cooperation in software development and stimulates the design of more sophisticated visualization tools.
• The NeXus data format has been developed in response to these needs.
HDF5 TECHNOLOGIES
Data challenges addressed by HDF5
• Ability to organize complex collections of data
• Efficient and scalable data storage and access
• A growing need to integrate a wide variety of types of data
• The evolution of data technologies
• Long term preservation of data
HDF is…
• HDF stands for ‘Hierarchical Data Format’
• A file format for storing any kind of data
• Software system to manage data in the format
• Designed for high-volume or complex data
• Designed for every size and type of system
• Open format and software library, tools
• There are two HDFs: HDF4 and HDF5
• Here we focus on HDF5
HDF5 Technology Platform
• HDF5 data model
  – The “building blocks” for data organization
• HDF5 software
  – Library, language interfaces, tools
• HDF5 file format
  – Byte-level organization of data
Professionally managed
• Source under version control, public access
• Automatic daily testing
  – 200+ configurations
  – Performance, backward/forward compatibility
• C, C++, Fortran, Java, Python APIs
• Build supports Autotools (configure) and CMake
• Sound development and coding practices
• Maintenance releases every May and November
Professionally supported
• Helpdesk
• Forum and mailing lists
• Extensive web documentation – User’s Guide, Reference Manual, examples, tutorials, other docs
• Community friendly
  – Integrate contributions from external developers
  – Solicit feedback on new features and pre-releases
  – Collaborate on projects, especially in testing
HDF5 file
lat | lon | temp
----|-----|-----
12  | 23  | 3.1
15  | 24  | 4.2
17  | 21  | 3.6
Experiment Notes:
Serial Number: 99378920
Date: 3/13/09
Configuration: Standard 3
An HDF5 file is a container that holds data objects.
HDF5 file organization
• Root group: “/”
• Groups: “SimOut”, “Viz”
• Data objects: the lat/lon/temp table; Experiment Notes (Serial Number: 99378920, Date: 3/13/09, Config: Standard 3); Parameters (10;100;1000); Timestep (36,000)
HDF5 groups and links organize data objects.
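The objects in this figure can be built programmatically. The sketch below uses h5py, a third-party Python binding for HDF5 (assumed installed, along with NumPy). The figure does not say exactly where each object lives, so the dataset name “table”, storing the experiment notes as attributes of that table, storing “Timestep” as an attribute of “SimOut”, and placing “Parameters” under “SimOut” are all assumptions for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Compound (table-like) datatype for the lat/lon/temp rows in the figure.
row_type = np.dtype([("lat", "i4"), ("lon", "i4"), ("temp", "f4")])
rows = np.array([(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)], dtype=row_type)

path = os.path.join(tempfile.mkdtemp(), "example.h5")  # hypothetical file name

with h5py.File(path, "w") as f:            # a new file starts with the root group "/"
    sim_out = f.create_group("SimOut")     # groups organize objects, like directories
    f.create_group("Viz")

    table = sim_out.create_dataset("table", data=rows)  # hypothetical dataset name
    # Attributes: small named metadata attached to an object (assumed placement).
    table.attrs["Serial Number"] = 99378920
    table.attrs["Date"] = "3/13/09"
    table.attrs["Config"] = "Standard 3"

    sim_out.create_dataset("Parameters", data=[10, 100, 1000])
    sim_out.attrs["Timestep"] = 36000      # assumed placement

# Reopen read-only and walk the hierarchy by path.
with h5py.File(path, "r") as f:
    names = []
    f.visit(names.append)                  # records every object path below "/"
    print(sorted(names))
    print(f["SimOut/table"]["temp"])       # read one column of the compound table
    print(f["SimOut/table"].attrs["Date"])
```

Objects are addressed by path names such as "SimOut/table", which is how groups and links give the file its hierarchy.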
HDF5 Philosophy
A single platform with multiple uses
• One general data model
• One general format
• One library
• Adaptable for almost any kind of data
• Works on almost any architecture
• Ability to interact well with other technologies
• Attention to past, present, and future compatibility
HDF5 Software Layers & Storage
• Tools: h5dump, HDFView, h5repack, …
• High-Level APIs
• Language Interfaces: C, Fortran, C++
• HDF5 Data Model: Groups, Datasets, Attributes, …
• HDF5 Library internals: datatype conversion, data compression, chunked storage, version compatibility, and so on…
• I/O Drivers: Posix I/O, Split Files, Parallel I/O, Custom
• HDF5 File Format
• Storage: File, Split Files, File on Parallel Filesystem, Other?
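Two of the library internals in this stack, chunked storage and data compression, are chosen per dataset when it is created. A minimal sketch using h5py, the third-party Python binding (assumed installed, along with NumPy); the file name, dataset name, chunk shape, and gzip level are illustrative choices, not recommendations.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "chunked.h5")  # hypothetical file name
data = np.arange(1000 * 1000, dtype="f8").reshape(1000, 1000)

with h5py.File(path, "w") as f:
    dset = f.create_dataset(
        "grid",               # hypothetical dataset name
        data=data,
        chunks=(100, 100),    # stored as 100x100 tiles; compression works per chunk
        compression="gzip",   # deflate filter applied inside the HDF5 library
        compression_opts=4,   # gzip level, 0-9
    )
    print(dset.chunks, dset.compression)

with h5py.File(path, "r") as f:
    # Reading a small slab only needs the chunks that overlap it.
    block = f["grid"][250:260, 500:510]
    print(block.shape)
```

The chunk shape is the main design decision: chunks that match the typical access pattern keep partial reads cheap, while very small chunks add per-chunk overhead.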
HDF ecosystem
• Applications: EOS applications, MATLAB, IDL
• HDF-EOS Library: EOS domain data objects (Swath, Grid, Point, etc.)
• HDF Library and HDF tools
• Storage
Other Software
• The HDF Group
  – HDFView – an HDF4 & HDF5 browser
  – Command-line utilities
  – Regression and performance testing software
• 3rd Party
  – NetCDF-4, IDL, MATLAB, Mathematica, PyTables, Pandas
• Communities
  – EOS, ASC, CGNS, Energistics, NeXus
• Integration with other software
  – iRODS, OPeNDAP, MPI