HDF Update
Mike Folk
The HDF Group
HDF and HDF-EOS Workshop X
November 29, 2006
Nov. 29, 2006
HDF Workshop X, Landover MD 2
Outline
• Organizational info
• HDF Software Update
• Other Activities of Interest
Organizational info
“The HDF Group” = “THG”
Founded Dec. 2005
Went solo July 15, 2006
Non-profit
THG mission
To support the vast community of HDF users and to ensure the sustainable development of HDF technologies and the ongoing accessibility of HDF-stored data.
The HDF Team
Frank Baker, Christian Chilan, Peter Cao, Vailin Choi, Mike Folk, Anne Jennings, Barbara Jones, Quincey Koziol, James Laird, Raymond Lu,
John Mainzer, Matthew Needham, Pedro Nunes, Tammi O’Neill, Elena Pourmal, Binh-minh Ribler, Randy Ribler, Rishi Sinha, Kent Yang
And all those wonderful folks out there who contribute ideas, requests, bug reports, code, and support.
HDF Software Update
HDF4 update
Platforms to be dropped
• Operating systems
  • HPUX 11.00
  • Cray SV1 and TS IEEE
  • AIX 5.1 and 5.2
  • SGI IRIX64-6.5
  • Linux 2.4
  • Solaris 2.7, 2.8, 2.9
  • Windows 2000
  • Mac OS X 10.3
• Compilers
  • GNU C compilers older than 3.4 (Linux)
  • Intel 8.*
  • PGI V. 5.*, 6.0
Platforms to be added
• Systems
  • Mac OS X 10.4 (Intel)
  • Solaris 2.* on Intel
  • Cray XT3
  • Windows 64-bit (?)
  • Linux 2.6
  • HPUX 11.23
  • IBM Power 5
• Compilers
  • g95
  • PGI V. 6.1
  • Intel 9.*
New features
• Configuration
  • Switched to use F77_FUNC macro for better Fortran support (no hard-coded compilers anymore!)
  • Support for shared libraries
• Library
  • No hard-coded limit on number of opened files
  • New APIs to control number of files opened by application
  • Fortran support for SZIP compression
Bug fixes
• Tools
  • Many improvements to the hdp, hrepack, hdiff and hdfimport utilities based on users’ feedback
• Library
  • Data corruption bug when several unlimited-dimension SDSs are open
  • Better handling of SDSs with duplicate names in SDgetdimscale, and more
HDF5 update
No new releases!
• Focus on HDF5 release 1.8
• HDF5-1.8.0 Alpha 5 release is available from: hdf.ncsa.uiuc.edu/HDF5/release/alpha/obtain518.html
Platforms to be dropped
• Operating systems
  • HPUX 11.00
  • Mac OS X 10.3
  • AIX 5.1 and 5.2
  • SGI IRIX64-6.5
  • Linux 2.4
  • Solaris 2.8 and 2.9
• Compilers
  • GNU C compilers older than 3.4 (Linux)
  • Intel 8.*
  • PGI V. 5.*, 6.0
  • MPICH 1.2.5
http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html
Platforms to be added
• Systems
  • Alpha Open VMS
  • Mac OS X 10.4 (Intel)
  • Solaris 2.* on Intel (?)
  • Cray XT3
  • Windows 64-bit (32-bit binaries)
  • Linux 2.6
  • BG/L
• Compilers
  • g95
  • PGI V. 6.1
  • Intel 9.*
  • MPICH 1.2.7
  • MPICH2
New Features in HDF5 1.8
HDF5 1.8 new library features
• Datatype and dataspace features
  • Serialized dataspaces and datatypes
  • Ability to create data type from text description
  • Integer to float conversions during I/O
  • Revised exception handling during type conversion
  • Compact storage for N-bit data types
  • Offset+size storage filter, saving space
  • “Null” dataspace – datasets with no elements
  • Data transformation filter
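The space saving promised by an "offset+size" style of storage can be illustrated with a minimal sketch. This is not the HDF5 filter itself, only the general idea under simple assumptions: integer data, one shared minimum, and a fixed number of bits per stored offset. The function names and sample values are hypothetical.

```python
def pack_offset_size(values):
    """Return (minimum, bits_per_value, offsets) for a non-empty list of integers.

    Illustrative only: stores each value as a small non-negative offset
    from the block minimum instead of as a full-width integer.
    """
    if not values:
        raise ValueError("need at least one value")
    minimum = min(values)
    offsets = [v - minimum for v in values]
    # Bits needed for the largest offset (use at least 1 bit).
    bits = max(1, max(offsets).bit_length())
    return minimum, bits, offsets


def unpack_offset_size(minimum, offsets):
    """Rebuild the original values from the minimum and the offsets."""
    return [minimum + o for o in offsets]


# Hypothetical sensor readings that cluster near a large base value.
readings = [100003, 100000, 100007, 100001]
minimum, bits, offsets = pack_offset_size(readings)
restored = unpack_offset_size(minimum, offsets)
packed_bits = bits * len(readings)   # payload only, excluding the stored minimum
plain_bits = 32 * len(readings)      # the same values as 32-bit integers
```

In this example the payload needs 3 bits per value rather than 32, at the cost of storing the minimum and the bit width once per block.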
HDF5 1.8 – new library features
• Group revisions
  • Creation order access
  • Compact groups – small groups take less space
  • Large group storage improvements
  • Intermediate group creation
HDF5 1.8 – new library features
• Link improvements
  • External links – can refer to objects in another file
  • User defined links – apps create own kinds of links
• Attribute improvements
  • Storage improvements for large numbers of attributes
  • Iterate or look up by creation order
HDF5 1.8 – new library features
• Support for Unicode UTF-8 character set
• Shared header info – duplicate header info shared, possibly saving space
• Metadata cache improvements – faster I/O on files with many objects
• Data transformation filter
• Stackable Virtual File Drivers
• Better UNIX/Linux portability
HDF5 1.8 – new APIs
• New extendible error-handling API
• New APIs to copy objects between files fast
• Dimension scale model and API
• “HDFpacket” – API to read/write packets efficiently
HDF5 1.8 – backward and forward compatibility
HDF5 1.8 vs. 1.6.5
• Differences between 1.8 and 1.6.5
  • Some file format changes
  • Several new routines added
  • Old APIs deprecated, to be removed in a later release
• Consequences
  • Applications requiring 1.8 format changes will write objects that the 1.6.5 library cannot read
  • To exploit 1.8 changes, apps need to be rewritten
Principle of “Maximum file format compatibility”
Unless instructed otherwise, the HDF5 library will write objects using the earliest version of the format possible for describing the information.
This assures forward compatibility with older library versions whenever possible: objects in new files can be read with old libraries if those objects are “known” to the old libraries.
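The principle can be sketched as a small decision rule. The feature table below is a hypothetical illustration, not the library's internal logic: it assumes each feature an object uses has an earliest release whose format can describe it, and that the object is written with the newest of those minimums.

```python
# Hypothetical table: feature -> earliest release whose file format can
# describe it. The entries are illustrative examples, not a complete list.
FEATURE_MIN_VERSION = {
    "contiguous dataset": (1, 6),
    "chunked dataset": (1, 6),
    "external link": (1, 8),
    "creation-order index": (1, 8),
}

OLDEST_FORMAT = min(FEATURE_MIN_VERSION.values())


def earliest_version(features_used):
    """Earliest format version able to describe every feature an object uses."""
    return max(
        (FEATURE_MIN_VERSION[f] for f in features_used),
        default=OLDEST_FORMAT,
    )


def readable_by(library_version, features_used):
    """True if a library of the given version "knows" the object's format."""
    return earliest_version(features_used) <= library_version


plain = ["chunked dataset"]
linked = ["chunked dataset", "external link"]
```

Under these assumptions, an object that uses only older features stays readable by a 1.6 library even when written by 1.8, while an object using an external link does not.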
Command line tools
New features for old tools
• h5dump
  • Dump data in binary format
• h5diff
  • Compare dataset regions
• Parallel h5diff (ph5diff)
  • Compare two files in MPI parallel environment
• h5repack
  • Efficient data copy using H5Gcopy()
  • Able to handle big datasets
New HDF5 Tools
• h5copy
  • Copies a group, dataset or named datatype from one location to another location
  • Copies within a file or across files
• h5check
  • Verifies an HDF5 file against the defined HDF5 File Format Specification
• h5stat
  • Reports statistics about a file and objects in a file
HDF Java Products
HDFView changes
• Quality improvements for HDF-java package
  • Full documentation of hdf-java object package
  • Test suite for hdf-java object package
• Support 64-bit Java on Linux and Solaris
• Many new features, including
  • Change font size easily
  • Grab and move image
  • Create new table (compound dataset) from template
  • Filter out fill value for image creation
  • -geometry option for very high resolution displays
Future work for Java
• Update HDF5 JNI APIs for HDF5 1.8 release
• Release HDFView 2.4 with bug fixes/new features with HDF5 1.8 release
• New GUI features dealing with table, image and animation
• Writing capability for HDF5-SRB model
Website Development for HDF-EOS Tools & Information Center
Website for HDF-EOS Tools
• THG now manages HDF-EOS web site
  • Registered domain names: hdfeos.net/.org/.com
  • Re-implemented major topic areas
  • Re-designed interface
  • Registered google search
• Will continue maintenance
• Phase two
  • Host mailing list
  • Support simple forum features
Other Activities of Interest
Performance R&D
HDF5 - PnetCDF performance comparison
[Figure: Flash I/O Benchmark (checkpoint files) on Power 5; MB/s vs. number of processors; series: PnetCDF, HDF5 collective, HDF5 independent]
I/O performance of PnetCDF is comparable with parallel HDF5 when the libraries are used in similar manners.
PnetCDF4 - PnetCDF comparison
I/O performance of parallel NetCDF4 is comparable with PnetCDF, about 15% slower on average for the output of the ROMS history file.
[Figure: bandwidth (MB/s) vs. number of processors; series: PNetCDF collective, NetCDF4 collective]
Collective I/O improvements
• HDF5 supports collective IO for non-regular selections
• Collective IO for chunked storage is not trivial.
• Non-regular selection performance optimizations:
  • Added IO options to achieve good collective IO performance
  • Added APIs for applications to participate in the optimization process
• See the poster
DOE Labs
Sandia National Laboratory
Lawrence Livermore National Laboratory
DOE ASC* and Others
• Support HDF5 on major systems at Sandia & Lawrence Livermore National Laboratories
• R&D efforts underway
  • File recovery after a crash
  • Very fast write speed – goal is 300 MB/sec
  • Read-while-writing capability
  • Java library and HDFView improvements
* Advanced Scientific Computing project
Flight test
Flight test – collect, then process
Boeing HDF5 for flight test data
• Boeing 787 active archive
  • 10 TB per flight-test day
• Must handle raw, real-time data
  • High speed ingest, by “packet”
  • Post-processing, by “time-history”
• Boeing High Level APIs
  • HDFpacket – released with HDF5 1.8
  • HDFtime_history – new, open version likely
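The contrast between ingest "by packet" and post-processing "by time-history" can be sketched in plain Python. This is not the HDFpacket or HDFtime_history API; the function, the parameter names, and the sample values are hypothetical. It assumes each packet is a timestamp plus whatever parameters happened to be sampled at that instant.

```python
from collections import defaultdict


def to_time_histories(packets):
    """Regroup packets into one time-ordered series per parameter.

    packets: iterable of (timestamp, {parameter: value}) in arrival order,
    which need not be timestamp order.
    """
    histories = defaultdict(list)
    for timestamp, samples in packets:
        for parameter, value in samples.items():
            histories[parameter].append((timestamp, value))
    # Each series is sorted by timestamp for post-processing.
    for series in histories.values():
        series.sort(key=lambda sample: sample[0])
    return dict(histories)


# Hypothetical packets as they might arrive during collection.
packets = [
    (0.00, {"airspeed": 251.0, "altitude": 10000.0}),
    (0.02, {"airspeed": 251.4}),
    (0.01, {"altitude": 10002.0}),
]
histories = to_time_histories(packets)
```

Appending packets in arrival order keeps collection cheap; the regrouping step is deferred to processing, where analysts typically want one parameter over time.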
Product data
STEP
Bioinformatics
Managing genomic data
C# HDF5 API for Agilent
Agilent C# project
• Why?
  • Heavy use of C# at Agilent
  • Compatibility with Matlab
  • Other interest in HDF5 at Agilent
• What?
  • Prototype API in C# for Windows XP
  • Basic functions to create, open, close, read, write
  • Limited datatypes, no partial I/O
• When?
  • March 2007
HDF5 Software
[Diagram labels: Tools & Applications; C API, Fortran, C++, Java, C#; HDF I/O Library; HDF File]
NetCDF 4
NetCDF 4 project
• Enhanced NetCDF-4 Interface to HDF5
  • Combine features of netCDF and HDF5
  • Take advantage of their separate strengths
• Collaboration between NCSA, THG, Unidata
• Currently in Alpha Release
  • Waiting for beta release
NetCDF-4 Architecture
[Diagram labels: netCDF-3 applications, netCDF-4 applications, HDF5 applications; netCDF-3 Interface, netCDF-4 Library, HDF5 Library; netCDF files, netCDF-4 HDF5 files, HDF5 files]
• Supports access to netCDF files and HDF5 files created through netCDF-4 interface
Archival formats
• Proposal to NOAA Scientific Data Stewardship program
• Will investigate use of OAIS “Archive Information Package” standard with HDF5
• PIs: Ruth Duerr (NSIDC) and Kent Yang
OAIS: Open Archival Information System
Asymmetries between collecting and accessing data
• Huge streams of data collected …
• To be accessed in little bits…
Challenge – efficient remote access
• How do we efficiently find and access data from distributed repositories, when the data are big and complex?
• Storage Resource Broker (SRB)
  • Efficient access to HDF5 objects in repository
• OPeNDAP
  • Powerful protocol for remote querying and subsetting of scientific data
Example – Storage resource broker
• Storage Resource Broker – repository for heterogeneous data collections
• Simplifies storage, query and access to massive amounts of scientific data
• Has data in HDF5, netCDF, other formats
Normal SRB configuration
[Diagram labels: client; SRB Server; MCAT; HDF5; HDF5 File (whole file or a sequence of bytes)]
OPeNDAP-HDF5 project
• OPeNDAP
  • Powerful protocol for remote querying and subsetting of scientific data
  • Replaces direct file access with remote query and access
  • Widely used in Earth Sciences
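Remote subsetting can be made concrete with a small sketch that builds a DAP2-style request URL. It assumes the DAP2 array constraint syntax `variable[start:stride:stop]` with an inclusive stop index and a `.dods` response suffix; the server URL and variable name are hypothetical, and nothing is actually fetched.

```python
def dap2_hyperslab(variable, slices):
    """Build a DAP2 array constraint such as temperature[0:1:9][0:2:18].

    slices: one (start, stride, stop) triple per dimension. The stop index
    is inclusive, unlike a Python slice.
    """
    parts = []
    for start, stride, stop in slices:
        if start < 0 or stride < 1 or stop < start:
            raise ValueError(f"invalid hyperslab {(start, stride, stop)}")
        parts.append(f"[{start}:{stride}:{stop}]")
    return variable + "".join(parts)


def dap2_url(dataset_url, variable, slices, response="dods"):
    """Compose the request URL: dataset + response suffix + constraint."""
    return f"{dataset_url}.{response}?{dap2_hyperslab(variable, slices)}"


# Hypothetical dataset and variable, for illustration only.
url = dap2_url(
    "http://example.org/opendap/data/sample.h5",
    "temperature",
    [(0, 1, 9), (0, 2, 18)],
)
```

The request names only the first ten rows and every second column up to index 18, so the server can return that subset instead of the whole file. Some clients percent-encode the brackets before sending.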
OPeNDAP – HDF5 Project
• A NASA ROSES NRA project
• Tasks
  • HDF5-DAP2 server (now a prototype)
  • HDF5-DAP4 server
  • DAP4 to HDF5 conversion utility
  • Investigate integrated DAP-aware HDF5 library
SQL Server and HDF5 with Microsoft
SQL Server and HDF5
• Microsoft “dream environment for scientists”
  • Combine data management, computing
• SQL Server 2005 solution
  • Combine RDBMS with scientific analysis tools, together in one integrated system.
• HDF5 & other formats manage scientific objects
HDF5 in SQL Server
[Diagram labels: Entity Framework (EDM, eSQL, O-R mapping); HDF5 EDM model; Visualization Libraries (MATLAB,…); HDF5 files; Web Services (XML, REST, RSS); OLAP and Data Mining; Reporting; HDF5 type; HDF5 Index; HDF5 FS blob; HDF5 TVFs; .NET Languages with Language Integrated Query; SQL Server]
Thank you all
and
Thank you NASA!
Acknowledgement
This report is based upon work supported in part by a Cooperative Agreement with NASA under NASA NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.
Questions/comments?
Information Sources
• HDF website: http://hdfgroup.org/
• HDF5 Information Center: http://hdfgroup.org/HDF5/
• HDF [email protected]
• HDF users mailing [email protected]
coming soon: [email protected]