
HDF5 and The HDF Group

Nov 11, 2014



This is the latest information about The HDF Group and HDF5.
Page 1: HDF5 and The HDF Group

www.hdfgroup.org

The HDF Group

HDF5 and The HDF Group

May 2014

Page 2: HDF5 and The HDF Group


THE HDF GROUP


Page 3: HDF5 and The HDF Group


Mission

To provide high-quality software for managing large, complex data,

to provide outstanding services for users of these technologies,

and to ensure effective management of data throughout the data life cycle.


Page 4: HDF5 and The HDF Group


Goals of The HDF Group

• To create, maintain, and evolve software and services that enable society to manage large, complex data at every stage of the data life cycle.

• To establish and maintain a sustainable organization with a highly skilled and committed team devoted to accomplishing the first goal.


Page 5: HDF5 and The HDF Group


The HDF Group

• 1988-2006: Software group at the University of Illinois National Center for Supercomputing Applications (NCSA)

• 2005-present: Non-profit company in Champaign, IL
• Passionate about managing large, complex, heterogeneous data throughout its life cycle
• Creators and stewards of HDF4 and HDF5
• Own HDF4 and HDF5
• Formats, libraries, and tools are open and free
• Committed to high quality and reliability

• Currently employs 33 staff

Page 6: HDF5 and The HDF Group


Current projects of The HDF Group

• NASA – Earth Observing System (EOS)
  – The basis for global climate research
  – HDF is the standard archive and distribution format for EOS
  – Hundreds of data products; 8-petabyte archive and growing

• NOAA/NASA – JPSS
  – Next-generation weather satellite system and EOS
  – HDF5 is the primary distribution format (6 TB/day)

• Sandia National Laboratories
  – High-throughput, multi-stream satellite image management

• Synchrotron community
  – Scalable solutions for high-throughput data acquisition and management

• ExaHDF5 (Lawrence Berkeley National Lab)
  – High-end scientific simulations
  – Tuning HDF5 for high-performance parallel I/O

• FastForward Computing (DOE)
  – Solving I/O challenges for exascale computing

Page 7: HDF5 and The HDF Group


The HDF Group Services

• Helpdesk and Mailing Lists
  – Available to all users as a first level of support

• Priority Support
  – Rapid issue resolution and advice

• Consulting
  – Needs assessment, troubleshooting, design reviews, etc.

• Training
  – Tutorials and hands-on practical experience

• Enterprise Support
  – Coordinating HDF activities across departments

• Special Projects
  – Adapting customer applications to HDF
  – New features and tools
  – Research and development

Page 8: HDF5 and The HDF Group


WHO USES HDF5?


Page 9: HDF5 and The HDF Group


Who uses HDF5?

• Applications that deal with large or complex data

• More than 200 different application areas
• More than 2 million data product users worldwide
• Academia, government agencies, industry

Page 10: HDF5 and The HDF Group


Members of the HDF support community

• NASA – Earth Observing System
• NOAA/NASA/Riverside Tech – NPOESS
• A large financial institution
• DOE – projects with LBNL & PNNL, ANL & ORNL
• Lawrence Livermore National Lab
• Army Geospatial Center
• NIH/Geospiza (bio software company)
• Lawrence Berkeley National Lab
• University of Illinois/NCSA
• Sandia National Lab
• A leading U.S. aerospace company
• Projects for petroleum industry, vehicle testing, weapons research, others
• “In kind” support

Page 11: HDF5 and The HDF Group


New Areas We’re Exploring

• Fusion research data storage
  – Submitted proposal for ITER project’s data management with a large industrial fusion partner

• Astronomy
  – Submitted NSF SI2 grant with NRAO
  – Working toward a new standard for radio astronomy data storage

• Electron microscopy
  – Submitted NSF SI2 grant with LSU, et al.
  – Proposing a new standard for storing imaging data

• Synthesis of HDF5 and cloud storage with Microsoft
  – Developing a “RESTful” API for accessing HDF5 data in the Azure cloud

Page 12: HDF5 and The HDF Group


HDF5 SCIENCE APPLICATIONS


Page 13: HDF5 and The HDF Group


NASA EOS Remote Sensed Data

• HDF format is the standard file format for storing data from NASA's Earth Observing System (EOS) mission.

• Petabytes of data stored in HDF4 and HDF5 to support the Global Climate Change Research Program.

Page 14: HDF5 and The HDF Group


Page 15: HDF5 and The HDF Group


What is JPSS?

• The Joint Polar Satellite System (JPSS) is the next generation of NOAA's polar-orbiting environmental satellites.

• JPSS observations enable forecasting severe weather like hurricanes, tornadoes and blizzards, and assessing environmental hazards such as droughts, forest fires, poor air quality and harmful coastal waters.

• JPSS will provide continuity of critical global Earth observations (including our atmosphere, oceans, and land) through 2025.

• During Hurricane Sandy in October 2012, JPSS data helped forecasters and scientists accurately predict Sandy's hurricane track and infamous 'left hook' landfall into New York and New Jersey more than five days in advance.


Page 16: HDF5 and The HDF Group


CFD General Notation System


Page 17: HDF5 and The HDF Group


What is CFD?

Computational fluid dynamics (CFD) is a branch of fluid mechanics that uses numerical methods and algorithms to solve and analyze problems that involve fluid flows.


Page 18: HDF5 and The HDF Group


This computer-generated CFD image shows a model of the space shuttle. CFD has taken the place of wind tunnels for many aircraft evaluations, and as computing power increases and computer models become more sophisticated, CFD will largely replace wind tunnels.


Page 19: HDF5 and The HDF Group


What is CGNS?

• Standard Interface Data Structures (SIDS)
  – Collection of conventions and definitions that defines the intellectual content of CFD-related data

• SIDS to ADF Mapping
  – Defines how the SIDS is represented in ADF (Advanced Data Format)

• SIDS to HDF5 Mapping
  – Defines how the SIDS is represented in HDF5

• CGNS Mid-Level Library (MLL)
  – Application Programming Interface (API) that conforms to the SIDS
  – Built on top of ADF/HDF5, which perform the I/O operations

Page 20: HDF5 and The HDF Group


CGNS and HDF5*

• CGNS was originally built using the ADF format.

• However, ADF does not have parallel I/O or data compression capabilities, and does not have the support and tools that HDF5 offers.

• HDF5 has rapidly grown to become a world-wide format standard for scientific data.

• HDF5 has parallel capability as well as a broader support base than ADF.

• Therefore, CGNS has adopted HDF5 as the default (official) data storage mechanism.

* Paraphrased from http://cgns.sourceforge.net/hdf5.html.


Page 21: HDF5 and The HDF Group


• ENZO: an adaptive mesh refinement (AMR), grid-based hybrid code designed to simulate cosmological structure formation.


Page 22: HDF5 and The HDF Group


Image credit: Alexei Kritsuk, Paolo Padoan & Mike Norman

Page 23: HDF5 and The HDF Group


What is ENZO for?

• At UC San Diego, the ENZO cosmology code is used to simulate the universe from first principles, starting near the Big Bang.

• Researchers using ENZO have conducted the most detailed simulations ever of a region of the universe more than 1.5 billion light years across.

• “We need to zoom in on these dense regions to capture the key physical processes -- including gravitation, flows of normal and ‘dark’ matter, and shock heating and radiative cooling of the gas,” said Mike Norman. “This requires ENZO’s ‘adaptive mesh refinement’ capability.”


Page 24: HDF5 and The HDF Group


• “AMR codes begin with a coarse grid spacing, and then spawn more detailed subgrids as needed to track key processes in higher density regions.”

• “We achieved unprecedented detail by reaching seven levels of subgrids throughout the survey volume -- something never done before -- producing more than 400,000 subgrids,” said SDSC computational scientist Robert Harkness.

• “Norman is one of the largest users of supercomputing time in the world, with 16 million computing hours at the TACC, and millions more on TeraGrid systems at SDSC, PSC, and NCSA.”

• “The HDF Group provided important support for handling the output, and SDSC’s data storage environment allowed the researchers to efficiently store and manage the massive data.”
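The refinement idea in the first quote above can be illustrated with a deliberately tiny sketch. This is not ENZO's algorithm: ENZO works in three dimensions with structured subgrids, whereas the code below refines a 1-D line cell by cell. The function names `refine` and `peaked_density`, the density profile, the threshold, and the refinement ratio of 2 are all hypothetical choices for illustration.

```python
from typing import Callable, List, Tuple

# (refinement level, left edge, right edge) of one leaf cell
Cell = Tuple[int, float, float]


def refine(density: Callable[[float], float], lo: float, hi: float,
           n_coarse: int, threshold: float, max_level: int,
           ratio: int = 2) -> List[Cell]:
    """Toy 1-D AMR: start from a coarse grid and split every cell whose
    density (sampled at the cell center) exceeds `threshold`."""
    leaves: List[Cell] = []

    def visit(level: int, a: float, b: float) -> None:
        center = 0.5 * (a + b)
        if level < max_level and density(center) > threshold:
            # Spawn `ratio` finer cells covering the same interval.
            width = (b - a) / ratio
            for k in range(ratio):
                visit(level + 1, a + k * width, a + (k + 1) * width)
        else:
            leaves.append((level, a, b))

    dx = (hi - lo) / n_coarse  # coarse grid spacing
    for i in range(n_coarse):
        visit(0, lo + i * dx, lo + (i + 1) * dx)
    return leaves


def peaked_density(x: float) -> float:
    """Hypothetical density profile with a sharp peak at x = 0.5."""
    return 1.0 / (1e-3 + (x - 0.5) ** 2)


leaves = refine(peaked_density, 0.0, 1.0, n_coarse=8,
                threshold=100.0, max_level=3)
for level, a, b in leaves:
    print(f"level {level}: [{a:.5f}, {b:.5f})")
```

With the sharp peak at x = 0.5, the coarse cells near the edges stay at level 0 while cells near the peak are split down to `max_level`, which is the "spawn more detailed subgrids as needed" behavior in miniature.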


Page 25: HDF5 and The HDF Group

NeXus


Page 26: HDF5 and The HDF Group


What is NeXus?

• In recent years, scientists and programmers in neutron and synchrotron facilities around the world concluded that a common data format would fulfill a valuable function in the scattering community.

• As instrumentation becomes more complex and data visualization more challenging, scientists find it difficult to keep up with new developments.

• A common data format makes it easier to exchange experimental results and to exchange ideas about how to analyze them. It promotes greater cooperation in software development and stimulates the design of more sophisticated visualization tools.

• The NeXus data format has been developed in response to these needs.


Page 27: HDF5 and The HDF Group


HDF5 TECHNOLOGIES


Page 28: HDF5 and The HDF Group


Data challenges addressed by HDF5

• Ability to organize complex collections of data

• Efficient and scalable data storage and access

• A growing need to integrate a wide variety of types of data

• The evolution of data technologies

• Long term preservation of data


Page 29: HDF5 and The HDF Group


HDF is…

• HDF stands for ‘Hierarchical Data Format’
  – A file format for storing any kind of data
  – A software system to manage data in the format

• Designed for high-volume or complex data
• Designed for every size and type of system
• Open format and software library, tools

• There are two HDFs: HDF4 and HDF5
  – Here we focus on HDF5

Page 30: HDF5 and The HDF Group


HDF5 Technology Platform

• HDF5 data model
  – The “building blocks” for data organization

• HDF5 software
  – Library, language interfaces, tools

• HDF5 file format
  – Byte-level organization of data
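The byte-level organization of the file format can be made concrete with its signature. Per the HDF5 File Format Specification, a superblock begins with the 8-byte signature `\x89HDF\r\n\x1a\n`, which the library searches for at byte offset 0 and then at 512, 1024, 2048, and so on, so that an optional user block may precede it. The minimal sketch below uses plain Python with no HDF5 library, and the function name `find_superblock_offset` is hypothetical; it only checks for that signature and does not validate the rest of the file.

```python
import io
from typing import BinaryIO, Optional

# 8-byte signature that starts an HDF5 superblock: 0x89 'H' 'D' 'F' CR LF 0x1A LF
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_superblock_offset(fh: BinaryIO) -> Optional[int]:
    """Return the offset of the HDF5 signature in `fh`, or None if absent.

    Candidate offsets are 0, 512, 1024, 2048, ... (each one after 512
    doubles the previous), leaving room for an optional user block.
    """
    offset = 0
    while True:
        fh.seek(offset)
        head = fh.read(len(HDF5_SIGNATURE))
        if len(head) < len(HDF5_SIGNATURE):
            return None  # ran past the end of the data
        if head == HDF5_SIGNATURE:
            return offset
        offset = 512 if offset == 0 else offset * 2


# In-memory examples: plain file, file with a 512-byte user block, non-HDF5 data.
print(find_superblock_offset(io.BytesIO(HDF5_SIGNATURE + bytes(64))))  # 0
print(find_superblock_offset(io.BytesIO(bytes(512) + HDF5_SIGNATURE)))  # 512
print(find_superblock_offset(io.BytesIO(b"not an HDF5 file")))  # None
```

For a real file, pass a handle from `open(path, "rb")`; users of the h5py binding can get a similar yes/no answer from `h5py.is_hdf5(path)`.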


Page 31: HDF5 and The HDF Group


Professionally managed

• Source under version control, with public access
• Automated daily testing
  – 200+ configurations
  – Performance, backward/forward compatibility

• C, C++, Fortran, Java, Python APIs
• Build supports Autotools (configure) and CMake
• Sound development and coding practices
• Maintenance releases every May and November


Page 32: HDF5 and The HDF Group


Professionally supported

• Helpdesk
• Forum and mailing lists
• Extensive web documentation: User’s Guide, Reference Manual, examples, tutorials, other docs
• Community friendly
  – Integrate contributions from external developers
  – Solicit feedback on new features and pre-releases
  – Collaborate on projects, especially in testing


Page 33: HDF5 and The HDF Group


HDF5 file

lat | lon | temp
----|-----|-----
12  | 23  | 3.1
15  | 24  | 4.2
17  | 21  | 3.6

Experiment Notes:

Serial Number: 99378920

Date: 3/13/09

Configuration: Standard 3

An HDF5 file is a container that holds data objects.
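To make the container idea concrete, here is a minimal sketch that stores the table and the experiment notes in one HDF5 file. It assumes the third-party h5py binding and NumPy are installed; the dataset name `readings`, the column types, and the decision to keep the notes as attributes on the root group are illustrative choices, not part of the slide. The `core` driver with `backing_store=False` keeps the file in memory, so nothing is written to disk.

```python
import h5py
import numpy as np

# Compound (record) datatype mirroring the lat | lon | temp table.
# Integer lat/lon and 32-bit float temp are assumptions for illustration.
row_type = np.dtype([("lat", "i4"), ("lon", "i4"), ("temp", "f4")])
rows = np.array([(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)], dtype=row_type)

# In-memory file: the core driver without a backing store never touches disk.
with h5py.File("experiment.h5", "w", driver="core", backing_store=False) as f:
    # The table becomes a dataset (hypothetical name "readings").
    f.create_dataset("readings", data=rows)

    # The experiment notes become small named attributes on the root group.
    f.attrs["Serial Number"] = 99378920
    f.attrs["Date"] = "3/13/09"
    f.attrs["Configuration"] = "Standard 3"

    # Read everything back while the file is still open.
    table = f["readings"][...]
    notes = {name: f.attrs[name] for name in f.attrs}

print(table["lat"], table["temp"])
print(notes)
```

Keeping small notes as attributes rather than as a separate dataset is a common HDF5 convention; larger or more structured notes could just as well be a dataset of their own.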


Page 34: HDF5 and The HDF Group


HDF5 file organization

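lat | lon | temp
----|-----|-----
12  | 23  | 3.1
15  | 24  | 4.2
17  | 21  | 3.6

Experiment Notes:
Serial Number: 99378920
Date: 3/13/09
Config: Standard 3

Figure: the root group “/” with links to the groups SimOut and Viz; the objects shown include the table above, the experiment notes, Parameters (10; 100; 1000), and Timestep (36,000).

HDF5 groups and links organize data objects.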
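The grouping in this figure can also be sketched with h5py, again assuming h5py and NumPy are installed. Placing Parameters under SimOut, storing Timestep as its attribute, and the link name `params_alias` are guesses for illustration, since the slide does not say which object lives where. The sketch shows both kinds of link: a second hard link makes one dataset reachable from two groups, and a soft link stores a path that is resolved on access.

```python
import h5py
import numpy as np

with h5py.File("organized.h5", "w", driver="core", backing_store=False) as f:
    # Groups under the root group "/".
    sim = f.create_group("SimOut")
    viz = f.create_group("Viz")

    # A dataset inside SimOut, with Timestep attached as an attribute.
    params = sim.create_dataset("Parameters", data=np.array([10, 100, 1000]))
    params.attrs["Timestep"] = 36000

    # Hard link: assigning an existing object adds a second name for it.
    viz["Parameters"] = params

    # Soft link: stores the path "/SimOut/Parameters", resolved when accessed.
    viz["params_alias"] = h5py.SoftLink("/SimOut/Parameters")

    viz_members = sorted(viz.keys())
    # The attribute written via /SimOut is visible via /Viz: same object.
    shared_timestep = int(f["Viz/Parameters"].attrs["Timestep"])
    alias_values = f["Viz/params_alias"][...].tolist()
    alias_path = viz.get("params_alias", getlink=True).path

print(viz_members, shared_timestep, alias_values, alias_path)
```

The two link types differ when the original name is deleted: after `del f["SimOut/Parameters"]`, the data would remain reachable through the hard link /Viz/Parameters, while the soft link would dangle.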


Page 35: HDF5 and The HDF Group


HDF5 Philosophy

A single platform with multiple uses
• One general data model
• One general format
• One library
• Adaptable for almost any kind of data
• Works on almost any architecture
• Ability to interact well with other technologies
• Attention to past, present, future compatibility

Page 36: HDF5 and The HDF Group


HDF5 Software Layers & Storage

Figure: HDF5 software layers, top to bottom
• Tools: h5dump, HDFView, h5repack, …
• High Level APIs
• Language Interfaces: C, Fortran, C++
• HDF5 Library
  – HDF5 Data Model: Groups, Datasets, Attributes, …
  – Internals: datatype conversion, data compression, chunked storage, version compatibility, and so on…
  – I/O Drivers: POSIX I/O, Split Files, Parallel I/O, Custom
• HDF5 File Format
• Storage: File, Split Files, File on Parallel Filesystem, Other?
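Two of the library internals named in this layer diagram, chunked storage and data compression, are exposed to users as dataset creation options. The sketch below assumes h5py and NumPy; the dataset name, array size, 100 x 100 chunk shape, and gzip level are arbitrary examples. It stores a 2-D array in compressed chunks and then reads back a small block, for which the library only needs to read and decompress the chunks that intersect the selection.

```python
import h5py
import numpy as np

# 1000 x 1000 array of float64 values 0, 1, 2, ... (about 8 MB uncompressed).
data = np.arange(1000 * 1000, dtype="f8").reshape(1000, 1000)

with h5py.File("chunked.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "field",               # hypothetical dataset name
        data=data,
        chunks=(100, 100),     # stored as independent 100 x 100 tiles
        compression="gzip",    # deflate filter, included with h5py
        compression_opts=4,    # gzip level, 0 (fastest) to 9 (smallest)
        shuffle=True,          # byte-shuffle filter, often improves compression
    )

    # Partial I/O: read a 10 x 5 block without loading the whole dataset.
    block = dset[250:260, 500:505]
    chunk_shape = dset.chunks
    filter_name = dset.compression
    stored_bytes = dset.id.get_storage_size()

print(chunk_shape, filter_name, block.shape, block[0, 0])
print(f"stored {stored_bytes} bytes for {data.nbytes} bytes of data")
```

Chunk shape is a trade-off: chunks that match the typical read pattern keep partial reads cheap, while very small chunks add per-chunk overhead.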


Page 37: HDF5 and The HDF Group


HDF ecosystem

Figure: the HDF ecosystem, showing EOS applications, MATLAB, and IDL; the HDF-EOS Library and its EOS domain data objects (Swath, Grid, Point, etc.); the HDF Library; HDF tools; and storage.


Page 38: HDF5 and The HDF Group


Other Software

• The HDF Group
  – HDFView, an HDF4 & HDF5 browser
  – Command-line utilities
  – Regression and performance testing software

• Third party
  – NetCDF-4, IDL, MATLAB, Mathematica, PyTables, Pandas

• Communities
  – EOS, ASC, CGNS, Energistics, NeXus

• Integration with other software
  – iRODS, OPeNDAP, MPI

Page 39: HDF5 and The HDF Group

www.hdfgroup.org