HDF. Mike Folk, HDF Group, http://hdf.ncsa.uiuc.edu/. National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign. HDF/HDF-EOS Workshop III, Sept. 14-16, 1999.
HDF

May 12, 2015

Source: http://hdfeos.org/workshops/ws03/presentations/MikeIII.ppt
Transcript
Page 1: HDF

NCSA/Univ of Illinois at Urbana-Champaign

HDF

Mike Folk, HDF Group
http://hdf.ncsa.uiuc.edu/

National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign

HDF/HDF-EOS Workshop III
Sept. 14-16, 1999

Page 2: HDF

Topics

I. Overview

II. NCSA HDF Activities

III. HDF5

IV. HDF4 vs. HDF5

Page 3: HDF

I. HDF Overview

Page 4: HDF

HDF Mission

To develop, promote, deploy, and support open and free technologies that facilitate scientific data storage, exchange, access, analysis and discovery.

Page 5: HDF

What is HDF?

• Scientific data file format & supporting software
• For images, arrays, tables, other structures
• Features
  – Portability across architectures
    • I/O library
    • Files
  – Efficient I/O
  – Efficient storage

Page 6: HDF

Why use HDF?

• Manage data

• Share data

• Use software that understands HDF

• Improve I/O performance

• Improve storage efficiency

• Use an open standard

Page 7: HDF

An HDF File: A Collection of Scientific Data Objects

Figure: HDF file containing four 3-D arrays

Page 8: HDF

Mixing HDF Objects in One File

Figure: an HDF file containing 3-D arrays, raster images, a palette, a group, and a table:

Lat  lon  temp
---- ---- -----
12   23   3.1
15   24   4.2
17   21   3.6
16   35   5.7

Page 9: HDF

HDF Software

Figure: HDF software layers, from top to bottom:

• General applications: utilities and applications for manipulating, viewing, and analyzing data.
• Application programming interfaces (HDF I/O library): high-level, object-specific APIs.
• Low-level interface (HDF I/O library): low-level API for I/O to files, etc.
• HDF file: file or other data source.

Page 10: HDF

HDF Applications Software

• Free software
  – NCSA HDF library and utilities
  – Other software
• Commercial/other software that “understands”
  – all of HDF (Noesys, IDL, HDF Explorer)
  – certain HDF objects (MATLAB, WebWinds)
  – certain HDF applications (SHARP, WIM)
• http://hdf.ncsa.uiuc.edu/tools.html

Page 11: HDF

What platforms does HDF run on?

• Sun: Solaris
• SGI: Indy, Power Challenge, Origin, Cray C90, YMP, T3E
• HP9000, HP-Convex Exemplar
• IBM: RS6000, SP2
• DEC: Alpha/Digital UNIX, OpenVMS
• VAX: OpenVMS
• Intel: Solaris x86, Linux, FreeBSD, Windows NT/98
• PowerPC: Mac-OS

Page 12: HDF

A Sampling of HDF Users

User                                                      Use of HDF
NCSA-affiliated Science teams                             Visualization, data exch, fast I/O, ...
Mathworks, Fortner Software, Research Systems Inc., etc.  Format supported by vendors of vis and data analysis software
Boeing                                                    Space-time change detection in images
Distributed Oceanographic Data System (DODS)              Remote access to earth science data
Army Research Lab                                         Network distributed global memory
Center for Analysis & Prediction of Storms                Fast parallel I/O, portability, multi-resolution grids
TRAPPIST (Euro consortium)                                Exchange, analysis & visualization of non-destructive testing data

Page 13: HDF

Major User #1: EOSDIS

• ESDIS Project
  – open standard exchange format and I/O library for EOSDIS
  – EOS applications
• HDF requirements
  – Earth science data types (HDF-EOS, etc.)
  – User support for scientists, data producers, etc.
  – Library and file structure improvements
  – HDF tools, utilities, access software
  – Software maintenance and QA

Page 14: HDF

Major User #2: ASCI

• ASCI Data Models and Formats (DMF) Group
  – open standard exchange format and I/O library for ASCI
  – DOE tri-lab ASCI applications
• HDF requirements
  – large datasets (> a terabyte)
  – ASCI data types, especially meshes
  – good performance in massively parallel environments
  – primarily HDF5

Page 15: HDF

II. NCSA HDF Activities

Page 16: HDF

Java applications

• HDF APIs
  – Basis for tools that access HDF
• HDF Viewers
  – HDF browser/visualizer
• HDF4 Data Server Prototype
  – Lessons learned about remote access to

Page 17: HDF

Remote Data Access

• The SDB: Web-based Server-side Data Browser

• Java for remote access

• WP-ESIP: DODS project

• Computational Grids (Globus/GASS)

Page 18: HDF

HDF Standardization

• To share files, users must organize them similarly.
• HDF user groups create standard profiles
  – Ways to organize data in HDF files
  – Metadata
  – API

• Examples: HDF-EOS, ASCI DMF

Page 19: HDF

HDF-EOS software layers

Figure: the HDF software layers (general applications, application programming interfaces, low-level interface, HDF file) extended with HDF-EOS applications, the HDF-EOS API, and HDF-EOS profiles.

Page 20: HDF

“HDF Configuration Record” (HCR)

• To simplify the tasks of defining, comparing, and producing HDF-EOS files

• Formal (ODL) descriptions of HDF-EOS objects

Page 21: HDF

HCR of Swath

/* Project XYZ */
/* First version defined on June 10th, 1998 */
OBJECT = SWATH
  NAME = SCAN1
  OBJECT = Dimension
    NAME = GeoTrack
    Size = 1200
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = GeoCrossTrack
    Size = 205
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = DataX
    Size = 2410
  END_OBJECT = Dimension
END_OBJECT = SWATH
END
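The record above is plain text with a simple nesting rule, so it lends itself to mechanical processing. The following is a minimal sketch in Python, assuming only the `OBJECT = ...`, `END_OBJECT = ...`, and `KEY = value` line forms visible in this one example. The full ODL grammar is richer, and `parse_hcr` is a hypothetical helper, not one of the HCR utilities.

```python
import re

HCR_TEXT = """
/* Project XYZ */
/* First version defined on June 10th, 1998 */
OBJECT = SWATH
  NAME = SCAN1
  OBJECT = Dimension
    NAME = GeoTrack
    Size = 1200
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = GeoCrossTrack
    Size = 205
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = DataX
    Size = 2410
  END_OBJECT = Dimension
END_OBJECT = SWATH
END
"""

def parse_hcr(text):
    """Parse the small ODL subset shown above into nested dicts (illustrative only)."""
    root = {"type": "ROOT", "fields": {}, "children": []}
    stack = [root]
    for raw in text.splitlines():
        line = re.sub(r"/\*.*?\*/", "", raw).strip()   # drop /* ... */ comments
        if not line:
            continue
        if line == "END":                              # end of the whole record
            break
        key, _, value = (part.strip() for part in line.partition("="))
        if key == "OBJECT":                            # open a nested object
            node = {"type": value, "fields": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif key == "END_OBJECT":                      # close the innermost object
            closed = stack.pop()
            if closed["type"] != value:
                raise ValueError(f"mismatched END_OBJECT = {value}")
        else:                                          # plain KEY = value field
            stack[-1]["fields"][key] = int(value) if value.isdigit() else value
    return root["children"]

swath = parse_hcr(HCR_TEXT)[0]
dims = {d["fields"]["NAME"]: d["fields"]["Size"] for d in swath["children"]}
print(swath["type"], swath["fields"]["NAME"], dims)
```

A structure like `dims` is the kind of thing a comparison between a description and an actual file could start from.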

Page 22: HDF

HCR

• HCR Utilities:
  – Converters between HCR and HDF-EOS
  – Edit HCR and HDF-EOS
  – Compare HCR with HDF-EOS file
• Current projects:
  – Extend HCR converters to all of HDF4
  – Similar work with HDF5
  – XML too

Page 23: HDF

III. HDF5

Page 24: HDF

Why HDF5?

• HDF shortcomings exposed by EOSDIS, ASCI and others...
  – Limits on object & file size (<2GB)
  – Limited number of objects (<20K)
  – Rigid data models
  – I/O performance
  – Aging software infrastructure (code entropy)

Page 25: HDF

• …new demands...
  – Bigger, faster machines and storage systems
    • massive parallelism, parallel file systems
    • teraflop speeds, terabyte storage
  – Greater complexity
    • complex data structures
    • complex subsetting
  – More emphasis on remote & distributed access

Page 26: HDF

• … and ASCI requirements
  – Compatibility with vector bundle model
  – Compatibility with MPI-IO
  – Ability to transform data between memory & storage
  – Parallel file systems: PIOFS, HPSS, etc.

Page 27: HDF

New HDF5 Features

• More scalable
  – Larger arrays and files
  – More objects
• Improved data model
  – New datatypes
  – Single comprehensive dataset object
• Improved software
  – More flexible, robust library
  – More flexible API
  – More I/O options

Page 28: HDF

HDF5 data model

• Two primary objects
• Dataset
  – multidimensional array of elements
  – rich variety of datatypes
• Group
  – directory-like structure
  – contains datasets, groups, other objects

Page 29: HDF

Dataset components

• multidimensional array
• header with metadata
  – datatype
  – dataspace
  – attributes
  – storage properties
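One way to picture these components is as a header object that travels with the array. The sketch below is a conceptual model in plain Python, not the HDF5 library API. The class names, field names, and sample values are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from math import prod

@dataclass
class Dataspace:
    dims: tuple                      # size of each dimension

    @property
    def rank(self):
        return len(self.dims)        # number of dimensions

    @property
    def npoints(self):
        return prod(self.dims)       # total number of elements

@dataclass
class Dataset:
    name: str
    datatype: str                                    # e.g. "int16"
    dataspace: Dataspace
    attributes: dict = field(default_factory=dict)   # small named pieces of data
    storage: dict = field(default_factory=dict)      # e.g. chunked, compressed
    data: list = field(default_factory=list)         # flat, row-major element values

    def check(self):
        """The header must describe exactly as many elements as are stored."""
        return len(self.data) == self.dataspace.npoints

# Hypothetical example values, chosen only to exercise the model.
ds = Dataset(
    name="temperature",
    datatype="int16",
    dataspace=Dataspace(dims=(5, 4)),
    attributes={"units": "K"},
    storage={"layout": "chunked", "compressed": True},
    data=[0] * 20,
)
print(ds.dataspace.rank, ds.dataspace.npoints, ds.check())
```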

Page 30: HDF

Simple datatypes

• The usual scalars: integer & float

• user-defined scalars (e.g. 13-bit integers)

• variable length (e.g. strings)

• pointers to objects or regions of datasets

• enumeration

• opaque

Page 31: HDF

Compound datatypes

• User-defined

• Comparable to C structs

• Members can be simple or compound types

• Members can be multidimensional
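Since compound datatypes are comparable to C structs, Python's standard `struct` module gives a quick feel for the idea. This is an analogy only, not HDF5 code. The record layout (two 16-bit integers and a 32-bit float, little-endian, named lat/lon/temp) is an assumption chosen to echo the lat/lon/temp table shown earlier.

```python
import struct

# One record: lat (int16), lon (int16), temp (float32), little-endian, no padding.
RECORD = struct.Struct("<hhf")
FIELDS = ("lat", "lon", "temp")

rows = [(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6), (16, 35, 5.7)]

# A 1-D "dataset" of records is the packed records laid end to end.
buffer = b"".join(RECORD.pack(*row) for row in rows)

def read_record(buf, index):
    """Return record `index` as a dict of named members."""
    values = RECORD.unpack_from(buf, index * RECORD.size)
    return dict(zip(FIELDS, values))

third = read_record(buffer, 2)
print(RECORD.size, len(buffer), third["lat"], third["lon"], round(third["temp"], 1))
```

Each record occupies a fixed number of bytes, so any record can be located by its index without reading the ones before it.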

Page 32: HDF

Data Spaces

• How data are organized to form a dataset
  – rank
  – dimensions
• Subsetting during I/O operations
  – What subset of data is to be moved
  – In-memory organization of data
  – In-file organization of data

Page 33: HDF

HDF5 dataset: array of records

Figure: a dataset with dimensionality 5 x 3 in which each element is a record.

Datatype: record with fields int8, int4, int16, float32

Page 34: HDF

Dataspaces: Reading Dataset into Memory from File

Figure: a 2D array of integers in the file is read into a 3D array of floats in memory.

Page 35: HDF

Selection: Examples of mappings between file selections and memory selections.

(a) A hyperslab from a 2D array to the corner of a smaller 2D array.

(b) A regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array.

(c) A sequence of points from a 2D array to a sequence of points in a 3D array.

(d) Union of slabs in file to union of slabs in memory. No. of elements must be equal.
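Case (a) can be made concrete with a small model of hyperslab selection. The sketch below is plain Python rather than the HDF5 API, and the function names and the start/count parameterization are assumptions for illustration. It copies a 2 x 3 hyperslab from a 4 x 6 "file" array into the corner of a 3 x 4 "memory" array, pairing selected elements in order.

```python
from itertools import product

def hyperslab(start, count):
    """Coordinates of a contiguous block: `count` elements from `start` per dimension."""
    ranges = [range(s, s + c) for s, c in zip(start, count)]
    return list(product(*ranges))          # row-major order

def transfer(src, src_sel, dst, dst_sel):
    """Copy selected elements of src to selected elements of dst, in order."""
    if len(src_sel) != len(dst_sel):
        raise ValueError("selections must contain the same number of elements")
    for (si, sj), (di, dj) in zip(src_sel, dst_sel):
        dst[di][dj] = src[si][sj]

# "File" array: 4 x 6, each value encodes its position (row * 10 + column).
file_array = [[r * 10 + c for c in range(6)] for r in range(4)]
# "Memory" array: smaller, 3 x 4, initially zero.
memory = [[0] * 4 for _ in range(3)]

file_sel = hyperslab(start=(1, 2), count=(2, 3))   # rows 1-2, columns 2-4 in the file
mem_sel = hyperslab(start=(0, 0), count=(2, 3))    # top-left corner in memory

transfer(file_array, file_sel, memory, mem_sel)
print(memory)
```

The element-count check in `transfer` mirrors the rule stated in (d): the two selections may differ in shape, but they must contain the same number of elements.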

Page 36: HDF

Attributes

• Named pieces of data
• Stored in a dataset or group header
• Operations are scaled-down versions of the dataset operations
  – Not extendible
  – No compression
  – No partial I/O

Page 37: HDF

Property list

• Properties of objects or operations

• Describe how to create, store, access and transfer data

Page 38: HDF

Some Properties

• chunked: better subsetting access time; extendable
• compressed: improves storage efficiency, transmission speed
• extendable: datasets can be extended in any direction
• split file: metadata in one file, raw data in another

Figure: dataset “Fred” stored as a split file, with the metadata for Fred in File A and the data for Fred in File B.
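The access-time benefit of chunking comes from reading only the chunks that overlap a requested subset. A rough sketch of that bookkeeping in plain Python follows. It is a model, not library code, and the dataset size, chunk shape, and function name are made-up values for illustration.

```python
from itertools import product

def chunks_touched(start, count, chunk_shape):
    """Chunk indices overlapped by the block [start, start + count) in each dimension."""
    per_dim = []
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch                  # chunk holding the first selected element
        last = (s + c - 1) // ch         # chunk holding the last selected element
        per_dim.append(range(first, last + 1))
    return list(product(*per_dim))

dataset_shape = (1000, 1000)             # hypothetical dataset
chunk_shape = (100, 100)                 # hypothetical chunk size

# Read a 50 x 250 block starting at element (120, 480).
touched = chunks_touched(start=(120, 480), count=(50, 250), chunk_shape=chunk_shape)

total_chunks = (dataset_shape[0] // chunk_shape[0]) * (dataset_shape[1] // chunk_shape[1])
print(touched)
print(f"{len(touched)} of {total_chunks} chunks read")
```

In this example the subset falls within a single row of chunks, so only a handful of the chunks in the dataset need to be read and, if compression is on, decompressed.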

Page 39: HDF

Dataset components

Figure: a dataset consists of metadata and data.

Metadata
• Datatype: int16
• Dataspace: Rank=2, Dim_1=5, Dim_2=4, Dim_3=2
• Storage properties: chunked; compressed
• Attributes: time = 32.4, pressure = 987, temp = 56

Data

Page 40: HDF

Groups

• Structures for organizing the file

• Like Vgroups in HDF4

• Like directories in hierarchical file system

• Every file starts with a root group

• Groups have attributes

Page 41: HDF

Groups

• A mechanism for collections of related objects
• Every file starts with a root group
• Can have attributes
• Like directories in Unix, but a graph, rather than a tree

Figure: a graph of groups starting at “root”.

Page 42: HDF

Groups

Groups and members of groups can be shared

Figure: a graph of groups starting at root.
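Sharing means that one object can be reached by more than one path. Here is a minimal model in plain Python, using nested dictionaries as groups. The group and dataset names are hypothetical, and this is not the HDF5 API.

```python
def resolve(root, path):
    """Follow a /-separated path from the root group to an object."""
    node = root
    for name in path.strip("/").split("/"):
        if name:                     # skip empty components, so "/" is the root
            node = node[name]
    return node

# A dataset, modelled as a small dict.
temperature = {"kind": "dataset", "values": [3.1, 4.2, 3.6]}

# Two groups hold a link to the same dataset object, so the structure
# is a graph rather than a tree.
root = {
    "run1": {"temperature": temperature},
    "latest": {"temperature": temperature},
}

a = resolve(root, "/run1/temperature")
b = resolve(root, "/latest/temperature")
a["values"].append(5.7)          # a change made through one path ...
print(a is b, b["values"])       # ... is visible through the other
```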

Page 43: HDF

Mounting

Figure: mounting (“mount!”), showing File A and File B, each with its own root group.
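Mounting can be pictured much like mounting a file system: the root group of one file becomes reachable under a group of another. The toy sketch below uses plain Python dictionaries to stand in for files and groups. The names, the choice of mount point, and which file is mounted into which are assumptions, and the real library call is not shown.

```python
def resolve(root, path, mounts):
    """Resolve a path, switching to a mounted file's root at each mount point."""
    node, current = root, ""
    for name in path.strip("/").split("/"):
        if not name:
            continue
        node = node[name]
        current += "/" + name
        if current in mounts:          # crossing a mount point
            node = mounts[current]
    return node

# File A has an empty group "/external"; File B holds a dataset under its root.
file_a = {"local": {"x": [1, 2, 3]}, "external": {}}
file_b = {"results": {"y": [10, 20]}}

mounts = {"/external": file_b}         # mount File B's root at /external in File A

print(resolve(file_a, "/external/results/y", mounts))
print(resolve(file_a, "/local/x", mounts))
```

After the mount, objects in File B are addressed with paths that start in File A, while File A's own objects are unaffected.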

Page 44: HDF

Reading & writing with HDF5

• Set properties
• Describe the data
  – datatypes
  – rank and dimensions
  – mapping between file and memory

• Read/write

Page 45: HDF

Files needn’t be files - Virtual File Layer

VFL: A public API for writing I/O drivers

Figure: a “file” handle (hid_t) passes through the VFL (Virtual File I/O Layer) to the I/O drivers (memory, mpio, stdio, network), which reach the underlying “storage”: memory, files, or the network.
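The idea behind a virtual file layer can be sketched as one small driver interface with interchangeable back ends. The Python classes below are an illustrative model only. The class and method names are invented here and do not correspond to the actual VFL API.

```python
import os
import tempfile
from abc import ABC, abstractmethod

class Driver(ABC):
    """Minimal byte-addressed storage interface (hypothetical)."""
    @abstractmethod
    def write(self, offset, data): ...
    @abstractmethod
    def read(self, offset, size): ...

class MemoryDriver(Driver):
    """Keeps the whole "file" in a growable in-memory buffer."""
    def __init__(self):
        self.buf = bytearray()
    def write(self, offset, data):
        end = offset + len(data)
        if end > len(self.buf):                      # grow with zero bytes
            self.buf.extend(b"\0" * (end - len(self.buf)))
        self.buf[offset:end] = data
    def read(self, offset, size):
        return bytes(self.buf[offset:offset + size])

class StdioDriver(Driver):
    """Stores the bytes in an ordinary file on disk."""
    def __init__(self, path):
        self.f = open(path, "w+b")
    def write(self, offset, data):
        self.f.seek(offset)
        self.f.write(data)
    def read(self, offset, size):
        self.f.seek(offset)
        return self.f.read(size)
    def close(self):
        self.f.close()

def store_and_load(driver):
    """Library-level code: identical regardless of where the bytes live."""
    driver.write(0, b"HDF5")
    driver.write(8, b"data")
    return driver.read(0, 4), driver.read(8, 4)

mem_result = store_and_load(MemoryDriver())

fd, path = tempfile.mkstemp()
os.close(fd)
disk = StdioDriver(path)
disk_result = store_and_load(disk)
disk.close()
os.remove(path)

print(mem_result, disk_result, mem_result == disk_result)
```

The point of the design is that `store_and_load` never needs to know which driver it was given. A network or parallel driver would slot in the same way.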

Page 46: HDF

HDF5 tools

• Current
  – hdf5ls - lists contents of HDF5 file
  – h5dumper - higher level view
  – hdf5hdf4 converter
• Future
  – Convert HDF5 ascii, binary, GIFF, etc
  – Convert HDF4 HDF5
  – Java tools - VisAD, etc.
  – File/code generation from DDL description
  – Talking to vendors

Page 47: HDF

Other HDF5 activities

• Performance tuning

• Object model

• Fortran and C++ API

• Thread-safe HDF5

Page 48: HDF

IV. HDF4 vs. HDF5

Page 49: HDF

HDF4 vs. HDF5

• HDF4
  – Original format and library
  – Compatible with all earlier versions
  – 6 primary objects
    • multidim array of scalars
    • raster image, palette
    • table
    • annotation
    • group
  – Biggest current user: Earth Observing System Data and Info System (EOSDIS)
• HDF5 - successor to HDF4
  – New format and library
  – Not compatible with earlier versions
  – 2 primary objects
    • multidim. array of records
    • group
  – Biggest current user: Accelerated Strategic Computing Initiative (ASCI)

Page 50: HDF

HDF4 object types can be derived from HDF5 datasets and groups

Figure: HDF4 objects and the HDF5 objects they can be derived from.

HDF5 dataset:
• HDF4 SDS: n-dim array of scalars
• HDF4 8-bit raster
• HDF4 24-bit raster: 2-dim array of multi-component scalars
• HDF4 Vdata: 1-dim array of records

HDF5 group:
• HDF4 Vgroup

Sample contents shown in the figure:

03 04 43 43 43 -3 72 44 50 34 45 77 34 23 57 45 67 87 00 45

March 15, 1990. Simulation with k=10.0, beta=1.22e3. Calculate the magnitude ...

lat  lon  temp
12   23   3.1
15   24   4.2
17   21   3.6
23   35   7.2
25   31   6.3

Page 51: HDF

Status of HDF4 vs. HDF5

• HDF4 is still an EOS standard
• HDF5 likely also
• HDF4 maintenance
  – Maintained as long as EOS needs it
  – Minimal new features
• New applications: use HDF5 if possible!
  – New features, performance improvements, etc.

Page 52: HDF

HDF Information

• HDF Information Center
  – http://hdf.ncsa.uiuc.edu/
• HDF Help email address
  – [email protected]
• HDF users mailing list
  – [email protected]