HDF
HDF/HDF-EOS Workshop III, Sept. 14-16, 1999
Mike Folk, HDF Group
http://hdf.ncsa.uiuc.edu/
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
Topics
I. Overview
II. NCSA HDF Activities
III. HDF5
IV. HDF4 vs. HDF5
I. HDF Overview
To develop, promote, deploy, and support open and free technologies that facilitate scientific data storage, exchange, access, analysis and discovery.
What is HDF?
• Scientific data file format & supporting software
– I/O library
– Files
• For images, arrays, tables, other structures
• Features
– Portability across architectures
– Efficient I/O
– Efficient storage
Why use HDF?
• Manage data
• Share data
• Use software that understands HDF
• Improve I/O performance
• Improve storage efficiency
• Use an open standard
An HDF File: A Collection of Scientific Data Objects
[Figure: HDF file containing four 3-D arrays]
Mixing HDF Objects in One File
• Free software
– NCSA HDF library and utilities
– Other software
• Commercial/other software that “understands”
– all of HDF (Noesys, IDL, HDF Explorer)
– certain HDF objects (MATLAB, WebWinds)
– certain HDF applications (SHARP, WIM)
• http://hdf.ncsa.uiuc.edu/tools.html
What platforms does HDF run on?
• Sun: Solaris
• SGI: Indy, Power Challenge, Origin, Cray C90, YMP, T3E
• HP9000, HP-Convex Exemplar
• IBM: RS6000, SP2
• DEC: Alpha/Digital UNIX, OpenVMS
• VAX: OpenVMS
• Intel: Solaris x86, Linux, FreeBSD, Windows NT/98
• PowerPC: Mac OS
A Sampling of HDF Users
• NCSA-affiliated science teams: visualization, data exchange, fast I/O, ...
• Mathworks, Fortner Software, Research Systems Inc., etc.: format supported by vendors of visualization and data analysis software
• Boeing: space-time change detection in images
• Distributed Oceanographic Data System (DODS): remote access to earth science data
• Army Research Lab: network distributed global memory
• Center for Analysis & Prediction of Storms: fast parallel I/O, portability, multi-resolution grids
• TRAPPIST (Euro consortium): exchange, analysis & visualization of non-destructive testing data
Major User #1: EOSDIS
• ESDIS Project
– open standard exchange format and I/O library for EOSDIS
– EOS applications
• HDF requirements
– Earth science data types (HDF-EOS, etc.)
– User support for scientists, data producers, etc.
– Library and file structure improvements
– HDF tools, utilities, access software
– Software maintenance and QA
Major User #2: ASCI
• ASCI Data Models and Formats (DMF) Group
– open standard exchange format and I/O library for ASCI
– DOE tri-lab ASCI applications
• HDF requirements
– large datasets (> a terabyte)
– ASCI data types, especially meshes
– good performance in massively parallel environments
– primarily HDF5
II. NCSA HDF Activities
Java applications
• HDF APIs
– Basis for tools that access HDF
• HDF Viewers
– HDF browser/visualizer
• HDF4 Data Server Prototype
– Lessons learned about remote access to
Remote Data Access
• The SDB: Web-based Server-side Data Browser
• Java for remote access
• WP-ESIP: DODS project
• Computational Grids (Globus/GASS)
HDF Standardization
• To share files, users must organize them similarly.
• HDF user groups create standard profiles
– Ways to organize data in HDF files
– Metadata
– API
• To simplify the tasks of defining, comparing, and producing HDF-EOS files
• Formal (ODL) descriptions of HDF-EOS objects
HCR of Swath
/* Project XYZ */
/* First version defined on June 10th, 1998 */
OBJECT = SWATH
  NAME = SCAN1
  OBJECT = Dimension
    NAME = GeoTrack
    Size = 1200
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = GeoCrossTrack
    Size = 205
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = DataX
    Size = 2410
  END_OBJECT = Dimension
END_OBJECT = SWATH
END
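A description in this nested OBJECT/END_OBJECT style maps naturally onto a tree. The following is a minimal sketch, assuming a simplified grammar (one `KEY = value` pair per line, whole-line comments only); `parse_odl` and the dictionary layout are hypothetical names chosen for illustration, not part of any HCR or ODL tool.

```python
def parse_odl(text):
    """Parse OBJECT/END_OBJECT blocks into nested dicts (simplified sketch)."""
    root = {"type": "ROOT", "fields": {}, "children": []}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("/*"):
            continue  # skip blank lines and whole-line comments
        if line == "END":
            break  # end of the description
        key, _, value = (part.strip() for part in line.partition("="))
        if key == "OBJECT":
            node = {"type": value, "fields": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif key == "END_OBJECT":
            if stack[-1]["type"] != value:
                raise ValueError(f"unbalanced END_OBJECT = {value}")
            stack.pop()
        else:
            stack[-1]["fields"][key] = value  # plain KEY = value pair
    if len(stack) != 1:
        raise ValueError("missing END_OBJECT")
    return root


sample = """\
/* Project XYZ */
OBJECT = SWATH
  NAME = SCAN1
  OBJECT = Dimension
    NAME = GeoTrack
    Size = 1200
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = GeoCrossTrack
    Size = 205
  END_OBJECT = Dimension
END_OBJECT = SWATH
END
"""

swath = parse_odl(sample)["children"][0]
dims = {d["fields"]["NAME"]: int(d["fields"]["Size"]) for d in swath["children"]}
print(swath["fields"]["NAME"], dims)
```

Once parsed, two descriptions can be compared as ordinary data structures, which is the kind of task a formal description is meant to simplify.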
HCR
• HCR Utilities:
– Converters between HCR and HDF-EOS
– Edit HCR and HDF-EOS
– Compare HCR with HDF-EOS file
• Current projects:
– Extend HCR converters to all of HDF4
– Similar work with HDF5
– XML too
III. HDF5
Why HDF5?
• HDF shortcomings exposed by EOSDIS, ASCI and others...
– Limits on object & file size (<2GB)
– Limited number of objects (<20K)
– Rigid data models
– I/O performance
– Aging software infrastructure (code entropy)
• … new demands ...
– Bigger, faster machines and storage systems
• massive parallelism, parallel file systems
• teraflop speeds, terabyte storage
– Greater complexity
• complex data structures
• complex subsetting
– More emphasis on remote & distributed access
• … and ASCI requirements
– Compatibility with vector bundle model
– Compatibility with MPI-IO
– Ability to transform data between memory & storage
– Parallel file systems: PIOFS, HPSS, etc.
New HDF5 Features
• More scalable
– Larger arrays and files
– More objects
• Improved data model
– New datatypes
– Single comprehensive dataset object
• Improved software
– More flexible, robust library
– More flexible API
– More I/O options
HDF5 data model
• Two primary objects
• Dataset
– multidimensional array of elements
– rich variety of datatypes
• Group
– directory-like structure
– contains datasets, groups, other objects
Dataset components
• multidimensional array
• header with metadata
– datatype
– dataspace
– attributes
– storage properties
Simple datatypes
• The usual scalars: integer & float
• user-defined scalars (e.g. 13-bit integers)
• variable length (e.g. strings)
• pointers to objects or regions of datasets
• enumeration
• opaque
Compound datatypes
• User-defined
• Comparable to C structs
• Members can be simple or compound types
• Members can be multidimensional
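Since compound datatypes are comparable to C structs, the idea can be sketched with Python's standard `ctypes` module. This is only an analogy under stated assumptions: the record layout and the names `Position`, `Record`, and `samples` are hypothetical, and this is not the HDF5 datatype interface.

```python
import ctypes


class Position(ctypes.Structure):
    """A small compound type used as a member of another compound type."""
    _fields_ = [("lat", ctypes.c_double), ("lon", ctypes.c_double)]


class Record(ctypes.Structure):
    """A user-defined record with simple, compound, and array members."""
    _fields_ = [
        ("time", ctypes.c_double),               # simple member
        ("pressure", ctypes.c_int16),            # simple member
        ("where", Position),                     # compound member
        ("samples", (ctypes.c_int16 * 3) * 2),   # 2 x 3 multidimensional member
    ]


# A one-dimensional "dataset" of four records.
records = (Record * 4)()
records[1].time = 32.5
records[1].pressure = 987
records[1].where.lat = 40.5
records[1].samples[1][2] = 7

print(ctypes.sizeof(Record), records[1].where.lat, records[1].samples[1][2])
```

Each element of the array is one whole record, so a single element carries all of its members together.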
Data Spaces
• How data are organized to form a dataset
– rank
– dimensions
• Subsetting during I/O operations
– What subset of data is to be moved
– In-memory organization of data
– In-file organization of data
HDF5 dataset: array of records
[Figure: array of records, with dimensions labeled 5 and 3]
Dataspaces: Reading Dataset into Memory from File
[Figure: a read operation from file into memory]
Selection: Examples of mappings between file selections and memory selections.
(a) A hyperslab from a 2D array to the corner of a smaller 2D array.
(b) A regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array.
(c) A sequence of points from a 2D array to a sequence of points in a 3D array.
(d) Union of slabs in file to union of slabs in memory. No. of elements must be equal.
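Mapping (b) can be illustrated with a small pure-Python model. It is a sketch, not the HDF5 library: the helper names (`hyperslab`, `read_selection`) are hypothetical, the regular selection is assumed to be described by start, stride, count, and block per dimension, and nested lists stand in for the file and memory arrays.

```python
from itertools import product


def hyperslab(start, stride, count, block):
    """Return the coordinates of a regular selection in row-major order."""
    per_dim = []
    for s, st, c, b in zip(start, stride, count, block):
        # 'count' blocks of 'block' elements, spaced 'stride' apart.
        per_dim.append([s + i * st + j for i in range(c) for j in range(b)])
    return list(product(*per_dim))


def get_item(array, coord):
    for i in coord:
        array = array[i]
    return array


def set_item(array, coord, value):
    for i in coord[:-1]:
        array = array[i]
    array[coord[-1]] = value


def read_selection(file_array, file_sel, mem_array, mem_sel):
    """Copy the selected file elements into the selected memory elements."""
    if len(file_sel) != len(mem_sel):
        raise ValueError("selections must contain the same number of elements")
    for src, dst in zip(file_sel, mem_sel):
        set_item(mem_array, dst, get_item(file_array, src))


# "File": a 4 x 6 array whose element at (r, c) holds r * 10 + c.
file_array = [[r * 10 + c for c in range(6)] for r in range(4)]
# "Memory": a 1D buffer of 12 elements.
mem = [0] * 12

# Regular series of 1 x 2 blocks in the file ...
file_sel = hyperslab(start=(0, 1), stride=(2, 3), count=(2, 2), block=(1, 2))
# ... mapped to a contiguous run of 8 elements at offset 3 in memory.
mem_sel = hyperslab(start=(3,), stride=(1,), count=(8,), block=(1,))

read_selection(file_array, file_sel, mem, mem_sel)
print(mem)
```

The two selections have different shapes and ranks; only the element counts have to agree, which is what the length check enforces.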
Attributes
• Named pieces of data
• Stored in a dataset or group header
• Operations are scaled down versions of the dataset operations
– Not extendible
– No compression
– No partial I/O
Property list
• Properties of objects or operations
• Describe how to create, store, access and transfer data
Some Properties
• chunked: better subsetting access time; extendable
• compressed: improves storage efficiency, transmission speed
• extendable: datasets can be extended in any direction
• split file: metadata in one file, raw data in another
[Figure: dataset “Fred” stored as a split file, with the metadata for Fred in File A and the data for Fred in File B]
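One way to see why chunked storage can help subsetting is to count how many chunks a requested region overlaps. The sketch below assumes a regular grid of equal-sized chunks; `chunks_touched` and the sizes used are hypothetical illustrations, not library calls or measured results.

```python
from itertools import product


def chunks_touched(start, count, chunk_shape):
    """Return the grid indices of every chunk that a rectangular subset overlaps."""
    ranges = []
    for s, n, c in zip(start, count, chunk_shape):
        first = s // c              # chunk holding the first selected index
        last = (s + n - 1) // c     # chunk holding the last selected index
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


# A 1000 x 1000 dataset stored as 100 x 100 chunks (assumed sizes).
chunk_shape = (100, 100)

# Subset: all 1000 rows, but only columns 250..259.
touched = chunks_touched(start=(0, 250), count=(1000, 10), chunk_shape=chunk_shape)

elements_requested = 1000 * 10
elements_in_touched_chunks = len(touched) * chunk_shape[0] * chunk_shape[1]
print(len(touched), elements_requested, elements_in_touched_chunks)
```

In this model only one column of chunks has to be visited to satisfy a column-oriented request, rather than the whole dataset.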
Dataset components
• Metadata
– Datatype: int16
– Dataspace: Rank=2, Dim_1=5, Dim_2=4, Dim_3=2
– Storage properties: chunked; compressed
– Attributes: time = 32.4, pressure = 987, temp = 56
• Data
Groups
• Structures for organizing the file
• Like Vgroups in HDF4
• Like directories in hierarchical file system
• Every file starts with a root group
• Groups have attributes
Groups
• A mechanism for collections of related objects
• Every file starts with a root group
• Can have attributes
• Like directories in Unix, but a graph, rather than a tree
Groups
Groups and members of groups can be shared
[Figure: group hierarchy starting at “root”]
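The difference between a tree and a graph shows up as soon as one object is reachable by two paths. Below is a minimal in-memory model, assuming groups are simply named links to other objects; the classes `Group` and `Dataset`, the function `resolve`, and the path names are hypothetical and are not the HDF5 API.

```python
class Group:
    """A directory-like object: named links to other objects, plus attributes."""
    def __init__(self):
        self.links = {}
        self.attrs = {}


class Dataset:
    """A stand-in for a dataset: some data plus attributes."""
    def __init__(self, data):
        self.data = data
        self.attrs = {}


def resolve(root, path):
    """Follow a slash-separated path from the root group."""
    node = root
    for name in path.strip("/").split("/"):
        if name:
            node = node.links[name]
    return node


root = Group()
root.links["obs"] = Group()
root.links["model"] = Group()

# One dataset, linked from two different groups (shared, not copied).
temps = Dataset([56, 57, 58])
root.links["obs"].links["temp"] = temps
root.links["model"].links["reference_temp"] = temps

# A change made through one path is visible through the other.
resolve(root, "/obs/temp").attrs["units"] = "F"
print(resolve(root, "/model/reference_temp").attrs)
```

Because both links refer to the same object, the structure is a graph with shared members rather than a strict tree.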
Mounting
[Figure: mounting, shown with the root groups of File A and File B]
Reading & writing with HDF5
• Set properties
• Describe the data
– datatypes
– rank and dimensions
– mapping between file and memory
• Read/write
Files needn’t be files - Virtual File Layer