HDF Update

Mike Folk, The HDF Group
HDF and HDF-EOS Workshop X, November 29, 2006

Update on HDF, including recent changes to the software, upcoming releases, collaborations, and future plans. Includes an overview of the upcoming HDF5 1.8 release, and updates on the netCDF4/HDF5 merge, HDF5 support for indexing, BioHDF, the HDF5-Storage Resource Broker project, the NPOESS BAA, the HDF5-OPeNDAP project, HDF-EOS library and website support, and the HDF spin-off, THG.
Transcript
Page 1: HDF Update

HDF Update

Mike Folk
The HDF Group

HDF and HDF-EOS Workshop X
November 29, 2006

Page 2: HDF Update

Nov. 29, 2006

HDF Workshop X, Landover MD 2

Outline

• Organizational info
• HDF Software Update
• Other Activities of Interest

Page 3: HDF Update

Organizational info

Page 4: HDF Update

“The HDF Group” = “THG”

Founded Dec. 2006
Went solo July 15, 2006
Non-profit

Page 5: HDF Update

THG mission

To support the vast community of HDF users and to ensure the sustainable development of HDF technologies and the ongoing accessibility of HDF-stored data.

Page 6: HDF Update

The HDF Team

Frank Baker, Christian Chilan, Peter Cao, Vailin Choi, Mike Folk, Anne Jennings, Barbara Jones, Quincey Koziol, James Laird, Raymond Lu, John Mainzer, Matthew Needham, Pedro Nunes, Tammi O’Neill, Elena Pourmal, Binh-minh Ribler, Randy Ribler, Rishi Sinha, Kent Yang

And all those wonderful folks out there who contribute ideas, requests, bug reports, code, and support.

Page 7: HDF Update

HDF Software Update

Page 8: HDF Update

HDF4 update

Page 9: HDF Update

Platforms to be dropped

• Operating systems
  • HPUX 11.00
  • Crays SV1 and TS IEEE
  • AIX 5.1 and 5.2
  • SGI IRIX64-6.5
  • Linux 2.4
  • Solaris 2.7, 2.8, 2.9
  • Windows 2000
  • MAC OSX 10.3
• Compilers
  • GNU C compilers older than 3.4 (Linux)
  • Intel 8.*
  • PGI V. 5.*, 6.0

Page 10: HDF Update

Platforms to be added

• Systems
  • MAC OSX 10.4 (Intel)
  • Solaris 2.* on Intel
  • Cray XT3
  • Windows 64-bit (?)
  • Linux 2.6
  • HPUX 11.23
  • IBM Power 5
• Compilers
  • g95
  • PGI V. 6.1
  • Intel 9.*

Page 11: HDF Update

New features

• Configuration
  • Switched to the F77_FUNC macro for better Fortran support (no hard-coded compilers anymore!)
  • Support for shared libraries
• Library
  • No hard-coded limit on the number of open files
  • New APIs to control the number of files opened by an application
  • Fortran support for SZIP compression

Page 12: HDF Update

Bug fixes

• Tools
  • Many improvements to the hdp, hrepack, hdiff and hdfimport utilities based on users’ feedback
• Library
  • Data corruption bug when several unlimited-dimension SDSs are open
  • Better handling of SDSs with duplicated names in SDgetdimscale, and more

Page 13: HDF Update

HDF5 update

Page 14: HDF Update

No new releases!

• Focus on HDF5 release 1.8
• HDF5-1.8.0 Alpha 5 release is available from:
  hdf.ncsa.uiuc.edu/HDF5/release/alpha/obtain518.html

Page 15: HDF Update

Platforms to be dropped

• Operating systems
  • HPUX 11.00
  • MAC OS 10.3
  • AIX 5.1 and 5.2
  • SGI IRIX64-6.5
  • Linux 2.4
  • Solaris 2.8 and 2.9
• Compilers
  • GNU C compilers older than 3.4 (Linux)
  • Intel 8.*
  • PGI V. 5.*, 6.0
  • MPICH 1.2.5

http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html

Page 16: HDF Update

Platforms to be added

• Systems
  • Alpha Open VMS
  • MAC OSX 10.4 (Intel)
  • Solaris 2.* on Intel (?)
  • Cray XT3
  • Windows 64-bit (32-bit binaries)
  • Linux 2.6
  • BG/L
• Compilers
  • g95
  • PGI V. 6.1
  • Intel 9.*
  • MPICH 1.2.7
  • MPICH2

Page 17: HDF Update

New Features in HDF5 1.8

Page 18: HDF Update

HDF5 1.8 – new library features

• Datatype and dataspace features
  • Serialized dataspaces and datatypes
  • Ability to create data type from text description
  • Integer to float conversions during I/O
  • Revised exception handling during type conversion
  • Compact storage for N-bit data types
  • Offset+size storage filter, saving space
  • “Null” dataspace – datasets with no elements
  • Data transformation filter

Page 19: HDF Update

HDF5 1.8 – new library features

• Group revisions
  • Creation order access
  • Compact groups – small groups take less space
  • Large group storage improvements
  • Intermediate group creation

Page 20: HDF Update

HDF5 1.8 – new library features

• Link improvements
  • External links – can refer to objects in another file
  • User-defined links – apps create their own kinds of links
• Attribute improvements
  • Storage improvements for large numbers of attributes
  • Iterate or look up by creation order

Page 21: HDF Update

HDF5 1.8 – new library features

• Support for Unicode UTF-8 character set

• Shared header info – duplicate header info shared, possibly saving space

• Metadata cache improvements – faster I/O on files with many objects

• Data transformation filter
• Stackable Virtual File Drivers
• Better UNIX/Linux portability

Page 22: HDF Update

HDF5 1.8 – new APIs

• New extendible error-handling API
• New APIs for fast copying of objects between files
• Dimension scale model and API
• “HDFpacket” – API to read/write packets efficiently

Page 23: HDF Update

HDF5 1.8 – backward and forward compatibility

Page 24: HDF Update

HDF5 1.8 vs. 1.6.5

• Differences between 1.8 and 1.6.5
  • Some file format changes
  • Several new routines added
  • Old APIs deprecated, to be removed in a later release
• Consequences
  • Applications requiring 1.8 format changes will write objects that the 1.6.5 library cannot read
  • To exploit 1.8 changes, apps need to be rewritten

Page 25: HDF Update

Principle of “Maximum file format compatibility”

Unless instructed otherwise, the HDF5 library will write objects using the earliest version of the format possible for describing the information.

Assures forward compatibility with the older versions whenever possible – objects in new files can be read with old libraries if those objects are “known” to the old libraries.
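
The principle above can be illustrated with a small sketch. This is not how the HDF5 library is implemented; the feature names and the feature-to-version table below are hypothetical and chosen only to show the idea of picking the earliest format version that can still describe what an object uses.

```python
# Hypothetical table (an assumption for illustration): the earliest library
# release whose file format can describe each feature.
MIN_VERSION_FOR_FEATURE = {
    "contiguous_dataset": (1, 6),
    "chunked_dataset": (1, 6),
    "external_link": (1, 8),
    "creation_order_index": (1, 8),
}


def earliest_format(features):
    """Return the earliest format version able to describe every feature used."""
    return max((MIN_VERSION_FOR_FEATURE[f] for f in features), default=(1, 6))


def readable_by(library_version, features):
    """True if a library of the given version 'knows' the format that would be written."""
    return earliest_format(features) <= library_version


if __name__ == "__main__":
    print(earliest_format(["chunked_dataset"]))
    print(earliest_format(["chunked_dataset", "external_link"]))
    print(readable_by((1, 6), ["chunked_dataset"]))
```

Under these assumptions, an object that uses only older features stays readable by a 1.6 library even when written by 1.8, while an object that uses a 1.8-only feature does not.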

Page 26: HDF Update

Command line tools

Page 27: HDF Update

New features for old tools

• h5dump
  • Dump data in binary format
• h5diff
  • Compare dataset regions
• Parallel h5diff (ph5diff)
  • Compare two files in MPI parallel environment
• h5repack
  • Efficient data copy using H5Gcopy()
  • Able to handle big datasets

Page 28: HDF Update

New HDF5 Tools

• h5copy
  • Copies a group, dataset or named datatype from one location to another
  • Copies within a file or across files
• h5check
  • Verifies an HDF5 file against the defined HDF5 File Format Specification
• h5stat
  • Reports statistics about a file and objects in a file

Page 29: HDF Update

HDF Java Products

Page 30: HDF Update

HDFView changes

• Quality improvements for HDF-java package
  • Full documentation of hdf-java object package
  • Test suite for hdf-java object package
• Support 64-bit Java on Linux and Solaris
• Many new features, including
  • Change font size easily
  • Grab and move image
  • Create new table (compound dataset) from template
  • Filter out fill value for image creation
  • -geometry option for very high resolution displays

Page 31: HDF Update

Future work for Java

• Update HDF5 JNI APIs for HDF5 1.8 release

• Release HDFView 2.4 with bug fixes/new features with HDF5 1.8 release

• New GUI features dealing with table, image and animation

• Writing capability for HDF5-SRB model

Page 32: HDF Update

Website Development for HDF-EOS Tools & Information Center

Page 33: HDF Update

Website for HDF-EOS Tools

• THG now manages HDF-EOS web site
  • Registered domain names: hdfeos.net/.org/.com
  • Re-implemented major topic areas
  • Re-designed interface
  • Registered google search
• Will continue maintenance
• Phase two
  • Host mailing list
  • Support simple forum features

Page 34: HDF Update

Website for HDF-EOS Tools

Page 35: HDF Update

Other Activities of Interest

Page 36: HDF Update

Performance R&D

Page 37: HDF Update

HDF5 - PnetCDF performance comparison

[Chart: Flash I/O Benchmark (checkpoint files), uP: Power 5. x-axis: number of processors (10 to 310); y-axis: MB/s (0 to 2500). Series: PnetCDF, HDF5 collective, HDF5 independent.]

I/O performance of PnetCDF is comparable with parallel HDF5 when the libraries are used in similar manners.

Page 38: HDF Update

PnetCDF4 - PnetCDF comparison

I/O performance of parallel NetCDF4 is comparable with PnetCDF: about 15% slower on average for the output of the ROMS history file.

[Chart: x-axis: number of processors (0 to 144); y-axis: bandwidth in MB/s (0 to 160). Series: PNetCDF collective, NetCDF4 collective.]

Page 39: HDF Update

Collective I/O improvements

• HDF5 supports collective IO for non-regular selections

• Collective IO for chunked storage is not trivial.

• Non-regular selection performance optimizations:
  • Added IO options to achieve good collective IO performance
  • Added APIs for applications to participate in the optimization process

• See the poster

Page 40: HDF Update

DOE Labs

Sandia National Laboratories
Lawrence Livermore National Laboratory

Page 41: HDF Update

DOE ASC* and Others

• Support HDF5 on major systems at Sandia & Lawrence Livermore National Laboratories

• R&D efforts underway
  • File recovery after a crash
  • Very fast write speed – goal is 300 MB/sec
  • Read-while-writing capability
  • Java library and HDFView improvements

* Advanced Scientific Computing project

Page 42: HDF Update

Flight test

Page 43: HDF Update

Flight test – collect, then process

Page 44: HDF Update

Boeing HDF5 for flight test data

• Boeing 787 active archive
  • 10 TB per flight-test day
• Must handle raw, real-time data
  • High speed ingest, by “packet”
  • Post-processing, by “time-history”
• Boeing High Level APIs
  • HDFpacket – released with HDF5 1.8
  • HDFtime_history – new, open version likely
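
To give a feel for what "10 TB per flight-test day" means for ingest speed, here is a back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and that the data arrive evenly over the ingest window; the window length is an assumption, not something stated on the slide.

```python
def sustained_rate_mb_per_s(terabytes_per_day, hours_of_ingest=24.0):
    """Average write rate in MB/s needed to absorb a daily data volume.

    Assumes decimal units and a perfectly even data stream over the
    ingest window (both are simplifying assumptions).
    """
    total_bytes = terabytes_per_day * 10**12
    seconds = hours_of_ingest * 3600
    return total_bytes / seconds / 10**6


if __name__ == "__main__":
    # 10 TB spread over a full 24-hour day
    print(round(sustained_rate_mb_per_s(10), 2))
    # The same volume collected during a hypothetical 8-hour test window
    print(round(sustained_rate_mb_per_s(10, hours_of_ingest=8), 2))
```

A shorter collection window raises the required sustained rate proportionally, which is why ingest is organized by "packet" rather than by finished time-history.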

Page 45: HDF Update

Product data

STEP

Page 46: HDF Update

Bioinformatics

caacaagccaaaactcgtacaa
Cgagatatctcttggaaaaact
gctcacaatattgacgtacaag
gttgttcatgaaactttcggta
Acaatcgttgacattgcgacct
aatacagcccagcaagcagaat

Managing genomic data

Page 47: HDF Update

C# HDF5 API for Agilent

Page 48: HDF Update

Agilent C# project

• Why?
  • Heavy use of C# at Agilent
  • Compatibility with Matlab
  • Other interest in HDF5 at Agilent
• What?
  • Prototype API in C# for Windows XP
  • Basic functions to create, open, close, read, write
  • Limited datatypes, no partial I/O
• When?
  • March 2007

Page 49: HDF Update

HDF5 Software

[Diagram components: HDF File; Tools & Applications; HDF I/O Library; language interfaces: C API, Fortran, C++, Java, C#]

Page 50: HDF Update

NetCDF 4

Page 51: HDF Update

NetCDF 4 project

• Enhanced NetCDF-4 Interface to HDF5
  • Combine features of netCDF and HDF5
  • Take advantage of their separate strengths

• Collaboration between NCSA, THG, Unidata

• Currently in Alpha Release
  • Waiting for beta release

Page 52: HDF Update

NetCDF-4 Architecture

[Diagram components: netCDF-3 applications; netCDF-3 interface; netCDF-4 applications; netCDF-4 library; HDF5 applications; HDF5 library; netCDF files; netCDF-4 HDF5 files; HDF5 files]

• Supports access to netCDF files and HDF5 files created through netCDF-4 interface

Page 53: HDF Update

Archival formats

• Proposal to NOAA Scientific Data Stewardship program

• Will investigate use of OAIS “Archive Information Package” standard with HDF5

• PI: Ruth Duerr (NSIDC) and Kent Yang

OAIS: Open Archival Information System

Page 54: HDF Update

Asymmetries between collecting and accessing data

Page 55: HDF Update

• Huge streams of data collected …

• To be accessed in little bits…

Page 56: HDF Update

Challenge – efficient remote access

• How do we efficiently find and access data from distributed repositories, when the data are big and complex?

• Storage Resource Broker (SRB)
  • Efficient access to HDF5 objects in repository

• OPeNDAP
  • Powerful protocol for remote querying and subsetting of scientific data

Page 57: HDF Update

Example – Storage resource broker

• Storage Resource Broker – repository for heterogeneous data collections

• Simplifies storage, query and access to massive amounts of scientific data

• Has data in HDF5, netCDF, other formats

Page 58: HDF Update

Normal SRB configuration

[Diagram components: client; SRB Server; MCAT; HDF5; HDF5 File (whole file or a sequence of bytes)]

Page 59: HDF Update

OPeNDAP-HDF5 project

• OPeNDAP
  • Powerful protocol for remote querying and subsetting of scientific data
  • Replaces direct file access with remote query and access
  • Widely used in Earth Sciences
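
As a rough illustration of "remote querying and subsetting", a DAP2 request is an ordinary URL with a constraint expression appended, where each dimension of a variable is selected with a [start:stride:stop] hyperslab. The sketch below only builds such a URL; the server address, file name and variable name are hypothetical, and it is not taken from the HDF5-OPeNDAP project code.

```python
def dap2_subset_url(dataset_url, variable, slabs, response="dods"):
    """Build a DAP2-style subsetting URL.

    slabs is a list of (start, stride, stop) tuples, one per dimension,
    with stop inclusive as in DAP2 constraint expressions.
    response is the response suffix, e.g. "dods" (binary data),
    "ascii", "dds" or "das".
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in slabs
    )
    return f"{dataset_url}.{response}?{variable}{hyperslab}"


if __name__ == "__main__":
    # Hypothetical server, file and variable names, for illustration only.
    url = dap2_subset_url(
        "http://example.org/opendap/data/sample.h5",
        "temperature",
        [(0, 1, 9), (0, 2, 4)],
    )
    print(url)
```

The client never opens the file itself: it asks the server for just the slice it needs, which is the sense in which remote query replaces direct file access.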

Page 60: HDF Update

OPeNDAP – HDF5 Project

• A NASA ROSES NRA project
• Tasks
  • HDF5-DAP2 server (now a prototype)
  • HDF5-DAP4 server
  • DAP4 to HDF5 conversion utility
  • Investigate integrated DAP-aware HDF5 library

Page 61: HDF Update

SQL Server and HDF5 with Microsoft

Page 62: HDF Update

SQL Server and HDF5

• Microsoft “dream environment for scientists”
  • Combine data management, computing

• SQL Server 2005 solution
  • Combine RDBMS with scientific analysis tools, together in one integrated system.

• HDF5 & other formats manage scientific objects

Page 63: HDF Update

HDF5 in SQL Server

[Diagram components: Entity Framework (EDM, eSQL, O-R mapping); HDF5 EDM model; Visualization Libraries (MATLAB, …); HDF5 files; Web Services (XML, REST, RSS); OLAP and Data Mining; Reporting; HDF5 type; HDF5 Index; HDF5 FS blob; HDF5 TVFs; .NET Languages with Language Integrated Query; SQL Server]

Page 64: HDF Update

Thank you all
and
Thank you NASA!

Page 65: HDF Update

Acknowledgement

This report is based upon work supported in part by a Cooperative Agreement with NASA under NASA NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.

Page 66: HDF Update

Questions/comments?

Page 67: HDF Update

Information Sources

• HDF website: http://hdfgroup.org/

• HDF5 Information Center: http://hdfgroup.org/HDF5/

• HDF [email protected]

• HDF users mailing [email protected]

coming soon: [email protected]