HDF

Post on 12-May-2015

DESCRIPTION

Source: http://hdfeos.org/workshops/ws03/presentations/MikeIII.ppt

Transcript

1

NCSA/Univ of Illinois at Urbana-Champaign

HDF

Mike Folk, HDF Group

http://hdf.ncsa.uiuc.edu/

National Center for Supercomputing Applications

University of Illinois at Urbana-Champaign

HDF/HDF-EOS Workshop III, Sept. 14-16, 1999

2

Topics

I. Overview

II. NCSA HDF Activities

III. HDF5

IV. HDF4 vs. HDF5

3

I. HDF Overview

4

HDF Mission

To develop, promote, deploy, and support open and free technologies that facilitate scientific data storage, exchange, access, analysis and discovery.

5

What is HDF?

• Scientific data file format & supporting software

• For images, arrays, tables, other structures

• Features
– Portability across architectures
  • I/O library
  • Files
– Efficient I/O
– Efficient storage

6

Why use HDF?

• Manage data

• Share data

• Use software that understands HDF

• Improve I/O performance

• Improve storage efficiency

• Use an open standard

7

An HDF File: A Collection of Scientific Data Objects

[Figure: HDF file containing four 3-D arrays]

8

Mixing HDF Objects in One File

[Figure: one HDF file containing two 3-D arrays, two raster images, a palette, a group, and a table]

Table:

Lat   lon   temp
---   ---   ----
12    23    3.1
15    24    4.2
17    21    3.6
16    35    5.7

9

HDF Software

[Figure: HDF software layers, from top to bottom]

• General Applications: utilities and applications for manipulating, viewing, and analyzing data.

• HDF I/O library
– Application Programming Interfaces: high-level, object-specific APIs.
– Low-level Interface: low-level API for I/O to files, etc.

• HDF file: file or other data source.

10

HDF Applications Software

• Free software
– NCSA HDF library and utilities
– Other software

• Commercial/other software that “understands”
– all of HDF (Noesys, IDL, HDF Explorer)
– certain HDF objects (MATLAB, WebWinds)
– certain HDF applications (SHARP, WIM)

• http://hdf.ncsa.uiuc.edu/tools.html

11

What platforms does HDF run on?

• Sun: Solaris

• SGI: Indy, Power Challenge, Origin, Cray C90, YMP, T3E

• HP9000, HP-Convex Exemplar

• IBM: RS6000, SP2

• DEC: Alpha/Digital UNIX, OpenVMS; VAX: OpenVMS

• Intel: Solaris x86, Linux, FreeBSD, Windows NT/98

• PowerPC: Mac-OS

12

A Sampling of HDF Users

• NCSA-affiliated science teams: visualization, data exchange, fast I/O, ...

• Mathworks, Fortner Software, Research Systems Inc., etc.: format supported by vendors of visualization and data analysis software

• Boeing: space-time change detection in images

• Distributed Oceanographic Data System (DODS): remote access to earth science data

• Army Research Lab: network distributed global memory

• Center for Analysis & Prediction of Storms: fast parallel I/O, portability, multi-resolution grids

• TRAPPIST (Euro consortium): exchange, analysis & visualization of non-destructive testing data

13

Major User #1: EOSDIS

• ESDIS Project
– open standard exchange format and I/O library for EOSDIS
– EOS applications

• HDF requirements
– Earth science data types (HDF-EOS, etc.)
– User support for scientists, data producers, etc.
– Library and file structure improvements
– HDF tools, utilities, access software
– Software maintenance and QA

14

Major User #2: ASCI

• ASCI Data Models and Formats (DMF) Group
– open standard exchange format and I/O library for ASCI
– DOE tri-lab ASCI applications

• HDF requirements
– large datasets (> a terabyte)
– ASCI data types, especially meshes
– good performance in massively parallel environments
– primarily HDF5

15

II. NCSA HDF Activities

16

Java applications

• HDF APIs
– Basis for tools that access HDF

• HDF Viewers
– HDF browser/visualizer

• HDF4 Data Server Prototype
– Lessons learned about remote access to

17

Remote Data Access

• The SDB: Web-based Server-side Data Browser

• Java for remote access

• WP-ESIP: DODS project

• Computational Grids (Globus/GASS)

18

HDF Standardization

• To share files, users must organize them similarly.

• HDF user groups create standard profiles
– Ways to organize data in HDF files
– Metadata
– API

• Examples: HDF-EOS, ASCI DMF

19

HDF-EOS software layers

[Figure: HDF-EOS software layers, from top to bottom]

• HDF-EOS Applications and General Applications

• Application Programming Interfaces, including the HDF-EOS API (HDF-EOS profiles)

• Low-level Interface

• HDF file

20

“HDF Configuration Record” (HCR)

• To simplify the tasks of defining, comparing, and producing HDF-EOS files

• Formal (ODL) descriptions of HDF-EOS objects

21

HCR of Swath

/* Project XYZ */
/* First version defined on June 10th, 1998 */
OBJECT = SWATH
  NAME = SCAN1
  OBJECT = Dimension
    NAME = GeoTrack
    Size = 1200
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = GeoCrossTrack
    Size = 205
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = DataX
    Size = 2410
  END_OBJECT = Dimension
END_OBJECT = SWATH
END
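As a rough illustration of how a record like this could be processed, the sketch below parses the nested OBJECT/END_OBJECT text of the swath example into Python dictionaries. It is not the implementation used by the HCR utilities: `parse_hcr` and the dictionary layout are hypothetical, and only the simple `KEY = value` lines and `/* ... */` comments seen in this example are handled.

```python
import re

SAMPLE = """\
/* Project XYZ */
/* First version defined on June 10th, 1998 */
OBJECT = SWATH
  NAME = SCAN1
  OBJECT = Dimension
    NAME = GeoTrack
    Size = 1200
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = GeoCrossTrack
    Size = 205
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = DataX
    Size = 2410
  END_OBJECT = Dimension
END_OBJECT = SWATH
END
"""


def parse_hcr(text):
    """Parse simple ODL-style text into nested dicts (hypothetical helper)."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.S)  # drop /* ... */ comments
    root = {"type": "ROOT", "attrs": {}, "children": []}
    stack = [root]  # stack[-1] is the object currently being filled in
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "END":  # a bare END terminates the record
            break
        key, _, value = (part.strip() for part in line.partition("="))
        if key == "OBJECT":
            node = {"type": value, "attrs": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif key == "END_OBJECT":
            if len(stack) == 1 or stack[-1]["type"] != value:
                raise ValueError(f"unbalanced END_OBJECT = {value}")
            stack.pop()
        else:
            # Plain attribute; keep integers as int, everything else as text.
            stack[-1]["attrs"][key] = int(value) if value.isdigit() else value
    if len(stack) != 1:
        raise ValueError("missing END_OBJECT")
    return root["children"]


swath = parse_hcr(SAMPLE)[0]
dims = {d["attrs"]["NAME"]: d["attrs"]["Size"] for d in swath["children"]}
print(swath["attrs"]["NAME"], dims)
```

Once the record is in this form, comparing two descriptions reduces to comparing dictionaries, which is one way the "define, compare, produce" tasks could be approached.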

22

HCR

• HCR Utilities:
– Converters: HCR ↔ HDF-EOS
– Edit HCR and HDF-EOS
– Compare HCR with HDF-EOS file

• Current projects:
– Extend HCR converters to all of HDF4
– Similar work with HDF5
– XML too

23

III. HDF5

24

Why HDF5?

• HDF shortcomings exposed by EOSDIS, ASCI and others...
– Limits on object & file size (<2GB)
– Limited number of objects (<20K)
– Rigid data models
– I/O performance
– Aging software infrastructure (code entropy)

25

• … new demands ...
– Bigger, faster machines and storage systems
  • massive parallelism, parallel file systems
  • teraflop speeds, terabyte storage
– Greater complexity
  • complex data structures
  • complex subsetting
– More emphasis on remote & distributed access

26

• … and ASCI requirements
– Compatibility with vector bundle model
– Compatibility with MPI-IO
– Ability to transform data between memory & storage
– Parallel file systems: PIOFS, HPSS, etc.

27

New HDF5 Features

• More scalable
– Larger arrays and files
– More objects

• Improved data model
– New datatypes
– Single comprehensive dataset object

• Improved software
– More flexible, robust library
– More flexible API
– More I/O options

28

HDF5 data model

• Two primary objects

• Dataset
– multidimensional array of elements
– rich variety of datatypes

• Group
– directory-like structure
– contains datasets, groups, other objects

29

Dataset components

• multidimensional array

• header with metadata
– datatype
– dataspace
– attributes
– storage properties

30

Simple datatypes

• The usual scalars: integer & float

• user-defined scalars (e.g. 13-bit integers)

• variable length (e.g. strings)

• pointers to objects or regions of datasets

• enumeration

• opaque

31

Compound datatypes

• User-defined

• Comparable to C structs

• Members can be simple or compound types

• Members can be multidimensional
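Since compound datatypes are comparable to C structs, Python's standard `struct` module can illustrate the idea of a fixed record layout. This is only an analogy and does not use the HDF5 API: the field names (lat, lon, temp) are borrowed from the table example earlier in the talk, and the field widths chosen here are assumptions.

```python
import struct

# One record: two 16-bit integers and one 32-bit float,
# little-endian with no padding between members (an assumed layout).
RECORD = struct.Struct("<hhf")
FIELDS = ("lat", "lon", "temp")

rows = [(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)]

# Pack the records back to back, as a 1-D array of records would be laid out.
buffer = b"".join(RECORD.pack(*row) for row in rows)


def read_record(buf, index):
    """Return record number `index` from the packed buffer as a dict."""
    values = RECORD.unpack_from(buf, index * RECORD.size)
    return dict(zip(FIELDS, values))


second = read_record(buffer, 1)
print(RECORD.size, second["lat"], second["lon"], round(second["temp"], 1))
```

Because every record has the same size, any element can be located by multiplying its index by the record size, which is what makes an array of records practical to subset.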

32

Data Spaces

• How data are organized to form a dataset
– rank
– dimensions

• Subsetting during I/O operations
– What subset of data is to be moved
– In-memory organization of data
– In-file organization of data

33

HDF5 dataset: array of records

[Figure: a 5 x 3 array in which each element is a record]

Dimensionality: 5 x 3

Datatype: record with members int8, int4, int16, float32

34

Dataspaces: Reading Dataset into Memory from File

[Figure: a read operation moves data from a 2D array of integers in the file to a 3D array of floats in memory]

35

Selection: Examples of mappings between file selections and memory selections.

(a) A hyperslab from a 2D array to the corner of a smaller 2D array.

(b) A regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array.

(c) A sequence of points from a 2D array to a sequence of points in a 3D array.

(d) Union of slabs in file to union of slabs in memory. The number of elements must be equal.
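Mapping (a) can be sketched in plain Python to make the idea of paired selections concrete. The code below does not call the HDF5 library. It borrows the start/stride/count/block vocabulary that HDF5 uses to describe regular hyperslabs, while the helper names `hyperslab` and `transfer` and the array sizes are assumptions made for the illustration.

```python
from itertools import product


def hyperslab(start, stride, count, block):
    """List the coordinates selected by a regular hyperslab.

    Along each dimension the selection is `count` blocks of `block`
    elements, with consecutive blocks starting `stride` elements apart.
    """
    per_dim = []
    for s, st, c, b in zip(start, stride, count, block):
        per_dim.append([s + i * st + j for i in range(c) for j in range(b)])
    return list(product(*per_dim))


def transfer(src, src_sel, dst, dst_sel):
    """Copy the selected elements of `src` to the selected slots of `dst`."""
    if len(src_sel) != len(dst_sel):
        raise ValueError("selections must contain the same number of elements")
    for (r, c), (dr, dc) in zip(src_sel, dst_sel):
        dst[dr][dc] = src[r][c]


# "File" array: 6 x 8, where each value encodes its own position.
file_array = [[10 * r + c for c in range(8)] for r in range(6)]
# "Memory" array: a smaller 4 x 4 buffer.
memory = [[0] * 4 for _ in range(4)]

# A 2 x 3 hyperslab starting at (1, 2) goes to the corner of the memory array.
file_sel = hyperslab(start=(1, 2), stride=(1, 1), count=(2, 3), block=(1, 1))
mem_sel = hyperslab(start=(0, 0), stride=(1, 1), count=(2, 3), block=(1, 1))
transfer(file_array, file_sel, memory, mem_sel)
print(memory[0][:3], memory[1][:3])
```

The element-count check in `transfer` mirrors the rule stated for mapping (d): the two selections may have different shapes, but they must contain the same number of elements.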

36

Attributes

• Named pieces of data

• Stored in a dataset or group header

• Operations are scaled-down versions of the dataset operations
– Not extendable
– No compression
– No partial I/O

37

Property list

• Properties of objects or operations

• Describe how to create, store, access and transfer data

38

Some Properties

• chunked: better subsetting access time; extendable

• compressed: improves storage efficiency, transmission speed

• extendable: datasets can be extended in any direction

• split file: metadata in one file, raw data in another

[Figure: dataset “Fred” stored as a split file, with the metadata for Fred in File A and the data for Fred in File B]
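One way to see why chunked storage can help subsetting is to count how many chunks a requested region overlaps, since only those chunks need to be read. The sketch below is plain Python rather than HDF5 code; the function name `chunks_touched`, the dataset size, and the chunk shape are all assumptions chosen for the example.

```python
from itertools import product


def chunks_touched(start, shape, chunk_shape):
    """Return grid coordinates of the chunks overlapping a rectangular subset.

    `start` and `shape` describe the subset in element coordinates;
    `chunk_shape` is the size of one chunk along each dimension.
    """
    ranges = []
    for s, n, c in zip(start, shape, chunk_shape):
        first = s // c            # chunk holding the first selected element
        last = (s + n - 1) // c   # chunk holding the last selected element
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


# Assume a 1000 x 1000 dataset stored in 100 x 100 chunks (100 chunks total).
chunk_shape = (100, 100)

# Read a 50 x 250 window whose first element is at (120, 380).
touched = chunks_touched(start=(120, 380), shape=(50, 250), chunk_shape=chunk_shape)
print(len(touched), touched)
```

In this example the window overlaps 4 of the 100 chunks, so the remaining chunks would not need to be read or decompressed.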

39

Dataset components

[Figure: a dataset drawn as metadata plus data]

Metadata:

• Dataspace: Rank=2; Dim_1=5, Dim_2=4, Dim_3=2

• Datatype: int16

• Storage properties: chunked; compressed

• Attributes: time = 32.4, pressure = 987, temp = 56

Data

40

Groups

• Structures for organizing the file

• Like Vgroups in HDF4

• Like directories in hierarchical file system

• Every file starts with a root group

• Groups have attributes

41

Groups

[Figure: a hierarchy of groups under “root”]

• A mechanism for collections of related objects

• Every file starts with a root group

• Can have attributes

• Like directories in Unix, but a graph, rather than a tree

42

Groups

Groups and members of groups can be shared

[Figure: a group graph under root in which some members belong to more than one group]
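The statement that groups form a graph rather than a tree can be illustrated with a small in-memory model in which the same object is linked from two groups. This is a sketch in plain Python, not the HDF5 API; the `Group`, `Dataset`, and `resolve` names and the paths used are hypothetical.

```python
class Group:
    """A minimal stand-in for a group: named links to other objects."""

    def __init__(self):
        self.links = {}
        self.attrs = {}


class Dataset:
    """A minimal stand-in for a dataset holding some data."""

    def __init__(self, data):
        self.data = data


def resolve(root, path):
    """Follow a Unix-style path from the root group to an object."""
    obj = root
    for name in path.strip("/").split("/"):
        if name:
            obj = obj.links[name]
    return obj


root = Group()
a, b = Group(), Group()
root.links["a"] = a
root.links["b"] = b

temps = Dataset([3.1, 4.2, 3.6])
a.links["temps"] = temps   # first link to the dataset
b.links["shared"] = temps  # second link: same object, different path

print(resolve(root, "/a/temps") is resolve(root, "/b/shared"))
```

Both paths lead to the same object, which is what distinguishes a graph of groups from a strict directory tree.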

43

Mounting

[Figure: File A and File B, each with its own root group, combined by a mount operation]

44

Reading & writing with HDF5

• Set properties

• Describe the data
– datatypes
– rank and dimensions
– mapping between file and memory

• Read/write

45

Files needn’t be files - Virtual File Layer

VFL: A public API for writing I/O drivers

[Figure: a “File” handle (hid_t) passes through the VFL (Virtual File I/O Layer) to I/O drivers (memory, mpio, stdio, network), which in turn reach the “storage”: files, memory, or the network]
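The layering described for the VFL can be sketched as a small driver interface, with library-side code that stays the same whichever driver sits behind it. The classes below are hypothetical and written in plain Python. They are not the HDF5 VFL API, and the method names and signatures are assumptions made for the illustration.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical driver interface: byte-addressed reads and writes."""

    @abstractmethod
    def write(self, offset, data): ...

    @abstractmethod
    def read(self, offset, size): ...


class MemoryDriver(Driver):
    """Keeps the 'file' in a growable in-memory buffer."""

    def __init__(self):
        self.buf = bytearray()

    def write(self, offset, data):
        end = offset + len(data)
        if end > len(self.buf):
            self.buf.extend(bytes(end - len(self.buf)))  # zero-fill the gap
        self.buf[offset:end] = data

    def read(self, offset, size):
        return bytes(self.buf[offset:offset + size])


class StdioDriver(Driver):
    """Stores the bytes in an ordinary file on disk."""

    def __init__(self, path):
        self.fh = open(path, "w+b")

    def write(self, offset, data):
        self.fh.seek(offset)
        self.fh.write(data)

    def read(self, offset, size):
        self.fh.seek(offset)
        return self.fh.read(size)

    def close(self):
        self.fh.close()


def store_and_fetch(driver):
    """Library-side code: identical regardless of the driver behind it."""
    driver.write(0, b"HEADER")
    driver.write(16, b"payload")
    return driver.read(16, 7)


path = os.path.join(tempfile.mkdtemp(), "demo.bin")
disk = StdioDriver(path)
results = [store_and_fetch(MemoryDriver()), store_and_fetch(disk)]
disk.close()
print(results)
```

A network or parallel I/O driver would, in the same spirit, implement the same two operations against a different kind of storage.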

46

HDF5 tools

• Current
– hdf5ls - lists contents of HDF5 file
– h5dumper - higher level view
– hdf5-to-hdf4 converter

• Future
– Convert between HDF5 and ascii, binary, GIFF, etc.
– Convert HDF4 to HDF5
– Java tools - VisAD, etc.
– File/code generation from DDL description
– Talking to vendors

47

Other HDF5 activities

• Performance tuning

• Object model

• Fortran and C++ API

• Thread-safe HDF5

48

IV. HDF4 vs. HDF5

49

HDF4 vs. HDF5

• HDF4
– Original format and library
– Compatible with all earlier versions
– 6 primary objects
  • multidim. array of scalars
  • raster image, palette
  • table
  • annotation
  • group
– Biggest current user: Earth Observing System Data and Info System (EOSDIS)

• HDF5 - successor to HDF4
– New format and library
– Not compatible with earlier versions
– 2 primary objects
  • multidim. array of records
  • group
– Biggest current user: Accelerated Strategic Computing Initiative (ASCI)

50

HDF4 object types can be derived from HDF5 datasets and groups

[Figure: HDF4 object types shown as special cases of HDF5 objects]

• HDF5 dataset
– HDF4 SDS: n-dim array of scalars
– HDF4 8-bit raster
– HDF4 24-bit raster: 2-dim array of multi-component scalars
– HDF4 Vdata: 1-dim array of records

• HDF5 group
– HDF4 Vgroup

Example array values: 03 04 43 43 43 -3 72 44 50 34 45 77 34 23 57 45 67 87 00 45

Example text: March 15, 1990. Simulation with k=10.0, beta=1.22e3. Calculate the magnitude ...

Example table of records:

lat   lon   temp
12    23    3.1
15    24    4.2
17    21    3.6
23    35    7.2
25    31    6.3

51

Status of HDF4 vs. HDF5

• HDF4 is still an EOS standard

• HDF5 likely also

• HDF4 maintenance
– Maintained as long as EOS needs it
– Minimal new features

• New applications: use HDF5 if possible!
– New features, performance improvements, etc.

52

HDF Information

• HDF Information Center
– http://hdf.ncsa.uiuc.edu/

• HDF Help email address
– hdfhelp@ncsa.uiuc.edu

• HDF users mailing list
– hdfnews@ncsa.uiuc.edu
