September 9, 2008 SPEEDUP Workshop - HDF5 Tutorial 1 New Features in HDF5

Dec 25, 2015


Karen Small

New Features in HDF5

Why new features?


Why new features?

• HDF5 1.8.0 was released in February 2008
• Major update of HDF5 1.6.* series (stable set of features and APIs since 1998)
  • New features
  • 200 new APIs
  • Changes to file format
  • Changes to APIs
  • Backward compatible
• New releases in November 2008
  • HDF5 1.6.8 and 1.8.2
  • Minor bug fixes
  • Support for new platforms and compilers


Information about the release

http://www.hdfgroup.org/HDF5/doc/

Follow “New Features and Compatibility Issues” links


Why new features?

• Need to address some deficiencies in initial design
• Examples:
  • Big overhead in file sizes
  • Non-tunable metadata cache implementation
  • Handling of free-space in a file


Why new features?

• Need to address new requirements
• Add support for
  • New types of indexing (object creation order)
  • Big volumes of variable-length data (DNA sequences)
  • Simultaneous real-time streams (fast append to one-dimensional datasets)
  • UTF-8 encoding for objects’ path names
  • Accessing objects stored in other HDF5 files (external or user-defined links)


Outline

• Dataset and datatype improvements
• Group improvements
• Link revisions
• Shared object header messages
• Metadata cache improvements
• Error handling
• Backward/forward compatibility
• HDF5 and NetCDF-4


Dataset and Datatype Improvements


Text-based data type descriptions

• Why:
  • Simplify data type creation
  • Make data type creation code more readable
  • Facilitate debugging by printing the text description of a data type
• What:
  • New routines to create an HDF5 data type through the text description of the data type and get a text description from the HDF5 data type


Text data type description

Example

/* Create the data type from DDL text description */
dtype = H5LTtext_to_dtype("H5T_IEEE_F32BE\n", H5LT_DDL);

/* Convert the data type back to text */

/* A first call with a NULL buffer returns the required length in str_len */
H5LTdtype_to_text(dtype, NULL, H5LT_DDL, &str_len);
dt_str = (char*)calloc(str_len, sizeof(char));

H5LTdtype_to_text(dtype, dt_str, H5LT_DDL, &str_len);


Serialized datatypes and dataspaces

• Why:
  • Allow datatype and dataspace info to be transmitted between processes
  • Allow datatype/dataspace to be stored in non-HDF5 files
• What:
  • A new set of routines to serialize/deserialize HDF5 datatypes and dataspaces.


Serialized datatypes and dataspaces

Example

/* Find the buffer length and encode a datatype into buffer */

status = H5Tencode(t_id, NULL, &cmpd_buf_size);
cmpd_buf = (unsigned char*)calloc(1, cmpd_buf_size);
H5Tencode(t_id, cmpd_buf, &cmpd_buf_size);

/* Decode a binary description of a datatype and return a datatype handle */

t_id = H5Tdecode(cmpd_buf);


Integer to float convert during I/O

• Why:
  • HDF5 1.6 and earlier supported conversion within the same class (16-bit integer <-> 32-bit integer, 64-bit float <-> 32-bit float)
  • Conversion needed to support NetCDF 4 programming model
• What:
  • Integer to float conversion supported during I/O


Integer to float convert during I/O

Example: conversion is transparent to application

/* Create a dataset of 64-bit little-endian floating-point type */

dset_id = H5Dcreate(loc_id, "Mydata", H5T_IEEE_F64LE, space_id, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

/* Write integer data to "Mydata" */

status = H5Dwrite(dset_id, H5T_NATIVE_INT, …);


Revised conversion exception handling

• Why:
  • Give apps greater control over exceptions (range errors, etc.) during datatype conversion
  • Needed to support NetCDF 4 programming model
• What:
  • Revised conversion exception handling


Revised conversion exception handling

• To handle exceptions during conversions, register handling function through H5Pset_type_conv_cb().

• Cases of exception:
  • H5T_CONV_EXCEPT_RANGE_HI
  • H5T_CONV_EXCEPT_RANGE_LOW
  • H5T_CONV_EXCEPT_TRUNCATE
  • H5T_CONV_EXCEPT_PRECISION
  • H5T_CONV_EXCEPT_PINF
  • H5T_CONV_EXCEPT_NINF
  • H5T_CONV_EXCEPT_NAN

• Return values: H5T_CONV_ABORT, H5T_CONV_UNHANDLED, H5T_CONV_HANDLED
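The callback model on this slide can be sketched in plain C, without the HDF5 library. Everything named below (conv_except, conv_ret, conv_cb, convert_long_to_i16, count_and_flag) is a hypothetical stand-in, not the HDF5 API; only two of the exception cases are modeled, and clamping on an unhandled range error is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for two exception cases and the three return values */
typedef enum { EXCEPT_RANGE_HI, EXCEPT_RANGE_LOW } conv_except;
typedef enum { CONV_ABORT = -1, CONV_UNHANDLED = 0, CONV_HANDLED = 1 } conv_ret;

/* Hypothetical callback type: inspect the offending value, optionally fill *dst */
typedef conv_ret (*conv_cb)(conv_except what, long src, int16_t *dst, void *op_data);

/* Convert n long values to 16-bit integers.
 * Returns 0 on success, -1 if the callback asks to abort. */
int convert_long_to_i16(const long *src, int16_t *dst, size_t n,
                        conv_cb cb, void *op_data)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        if (src[i] >= INT16_MIN && src[i] <= INT16_MAX) {
            dst[i] = (int16_t)src[i];      /* value fits: no exception */
            continue;
        }
        conv_except what = (src[i] > INT16_MAX) ? EXCEPT_RANGE_HI
                                                : EXCEPT_RANGE_LOW;
        conv_ret r = cb ? cb(what, src[i], &dst[i], op_data) : CONV_UNHANDLED;
        if (r == CONV_ABORT)
            return -1;                     /* stop the whole conversion */
        if (r == CONV_UNHANDLED)           /* assumed fallback: clamp to range */
            dst[i] = (what == EXCEPT_RANGE_HI) ? INT16_MAX : INT16_MIN;
        /* CONV_HANDLED: the callback already stored a value in dst[i] */
    }
    return 0;
}

/* Example callback: count range errors and store a sentinel value */
conv_ret count_and_flag(conv_except what, long src, int16_t *dst, void *op_data)
{
    (void)what;
    (void)src;
    *(size_t *)op_data += 1;
    *dst = -1;
    return CONV_HANDLED;
}
```

Passing NULL as the callback exercises the fallback path; passing count_and_flag with a pointer to a size_t counter lets the application decide what is stored for each out-of-range value.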


Compression filter for n-bit data

• Why:
  • Compact storage for user-defined datatypes
• What:
  • When data stored on disk, padding bits chopped off and only significant bits stored
  • Supports most datatypes
  • Works with compound datatypes


N-bit compression example

• In memory, one value of N-Bit datatype is stored like this:

| byte 3 | byte 2 | byte 1 | byte 0 |
|????????|????SPPP|PPPPPPPP|PPPP????|

S-sign bit P-significant bit ?-padding bit

• After passing through the N-Bit filter, all padding bits are chopped off, and the bits are stored on disk like this:

| 1st value | 2nd value |
|SPPPPPPP PPPPPPPP|SPPPPPPP PPPPPPPP|...

• Opposite (decompress) when going from disk to memory
• Limited to integer and floating-point data
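The bit layout above (precision 16, offset 4 inside a 32-bit value) can be illustrated with a few shifts and masks. This is a sketch of the idea only, not the filter's implementation: the function names are hypothetical, and restoring the padding bits as zero is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `precision` bits set (1 <= precision <= 32) */
uint32_t nbit_mask(unsigned precision)
{
    assert(precision >= 1 && precision <= 32);
    return (precision == 32) ? UINT32_MAX
                             : ((UINT32_C(1) << precision) - 1);
}

/* Drop the padding: keep `precision` bits starting at bit `offset` */
uint32_t nbit_pack(uint32_t value, unsigned offset, unsigned precision)
{
    assert(offset + precision <= 32);
    return (value >> offset) & nbit_mask(precision);
}

/* Put the significant bits back in place; padding bits come back as zero
 * in this sketch */
uint32_t nbit_unpack(uint32_t packed, unsigned offset, unsigned precision)
{
    assert(offset + precision <= 32);
    return (packed & nbit_mask(precision)) << offset;
}
```

With offset 4 and precision 16, nbit_pack keeps exactly the S and P bits of the diagram, so two values fit in the space one 32-bit value occupied in memory.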


N-bit compression example

Example

/* Create a N-bit datatype */

dt_id = H5Tcopy(H5T_STD_I32LE);
H5Tset_precision(dt_id, 16);
H5Tset_offset(dt_id, 4);

/* Create and write a dataset */

dcpl_id = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_chunk(dcpl_id, …);
H5Pset_nbit(dcpl_id);
dset_id = H5Dcreate(…,…,…,…,…,dcpl_id,…);
H5Dwrite(dset_id,…,…,…,…,buf);


Offset+size storage filter

• Why:
  • Use less storage when less precision needed
• What:
  • Performs scale/offset operation on each value
  • Truncates result to fewer bits before storing
  • Currently supports integers and floats
  • Precision may be lost


Example with floating-point type

• Data: {104.561, 99.459, 100.545, 105.644}
• Choose scaling factor: decimal precision to keep. E.g. scale factor D = 2

1. Find minimum value (offset): 99.459
2. Subtract minimum value from each element
   Result: {5.102, 0, 1.086, 6.185}
3. Scale data by multiplying by 10^D = 100
   Result: {510.2, 0, 108.6, 618.5}
4. Round the data to integer
   Result: {510, 0, 109, 619}
5. Pack and store using min number of bits
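The five steps above can be traced in a short standalone C sketch. It only mirrors the arithmetic on this slide and is not the filter's actual code; the function names are hypothetical. Because 618.5 sits exactly on a rounding boundary, binary floating point may round the last element to 618 rather than 619, which does not change the bit count.

```c
#include <assert.h>
#include <stddef.h>

/* Steps 1-4: find the minimum, subtract it, scale by 10^d_scale, round.
 * Returns the minimum (the offset needed to decode the values later). */
double scale_offset_encode(const double *in, unsigned long *out,
                           size_t n, unsigned d_scale)
{
    assert(in != NULL && out != NULL && n > 0);

    double min = in[0];
    for (size_t i = 1; i < n; i++)
        if (in[i] < min)
            min = in[i];

    double factor = 1.0;
    for (unsigned k = 0; k < d_scale; k++)
        factor *= 10.0;

    /* After subtracting the minimum every value is >= 0, so adding 0.5
     * and truncating rounds to the nearest integer. */
    for (size_t i = 0; i < n; i++)
        out[i] = (unsigned long)((in[i] - min) * factor + 0.5);

    return min;
}

/* Step 5: number of bits needed to hold the largest encoded value */
unsigned min_bits(const unsigned long *v, size_t n)
{
    assert(v != NULL && n > 0);

    unsigned long max = v[0];
    for (size_t i = 1; i < n; i++)
        if (v[i] > max)
            max = v[i];

    unsigned bits = 0;
    while (max > 0) {
        bits++;
        max >>= 1;
    }
    return bits;
}
```

For the slide's data and D = 2, the largest encoded value is just above 600, so each element needs 10 bits instead of the 32 or 64 bits of the original floating-point type.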


Offset+size storage filter

Example

/* Use scale+offset filter on integer data; let library figure out the minimum number of bits necessary to store the data without loss of precision */

H5Pset_scaleoffset(dcpl_id, H5Z_SO_INT, H5Z_SO_INT_MINBITS_DEFAULT);

H5Pset_chunk(dcpl_id,…,…);

dset_id = H5Dcreate(…,…,…,…,…,dcpl_id, …);

/* Use scale+offset filter on floating-point data; compression may be lossy */

H5Pset_scaleoffset(dcpl_id, H5Z_SO_FLOAT_DSCALE, 2);


“NULL” Dataspace

• Why:
  • Allow datasets with no elements to be described
  • NetCDF 4 needed a “place holder” for attributes
• What:
  • A dataset with no dimensions, no data


NULL dataspace

Example

/* Create a dataset with “NULL” dataspace*/

sp_id = H5Screate(H5S_NULL);

dset_id = H5Dcreate(…,"IntArray",…,sp_id,…,…,…);

HDF5 "SDS.h5" {
GROUP "/" {
   DATASET "IntArray" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  NULL
      DATA {
      }
   }
}
}


HDF5 file format revision


HDF5 file format revision

• Why:
  • Address deficiencies of the original file format
  • Address space overhead in an HDF5 file
  • Enable new features
• What:
  • New routine that instructs the HDF5 library to create all objects using the latest version of the HDF5 file format (by default, the library uses the earliest format version in which the object became available, e.g. array datatype)
  • Will talk about the versioning later


HDF5 file format revision

Example

/* Use the latest version of a file format for each object created in a file */

fapl_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_latest_format(fapl_id, 1);
fid = H5Fcreate(…,…,…,fapl_id);

or

fid = H5Fopen(…,…,fapl_id);


Group Revisions


Better large group storage

• Why:
  • Faster, more scalable storage and access for large groups
• What:
  • New format and method for storing groups with many links


Informal benchmark

• Create a file and a group in a file
• Create up to 10^6 groups with one dataset in each group
• Compare file sizes and performance of HDF5 1.8.1 using the latest group format with the performance of HDF5 1.8.1 (default, old format) and 1.6.7
• Note: Default 1.8.1 and 1.6.7 became very slow after 700,000 groups


Time to open and read a dataset


Time to close the file


File size


Access links by creation-time order

• Why:
  • Allow iteration & lookup of group’s links (children) by creation order as well as by name order
  • Support netCDF access model for netCDF 4
• What:
  • Option to access objects in group according to relative creation time


Access links by creation-time order

Example

/* Track and index creation order of the links */

H5Pset_link_creation_order(gcpl_id, (H5P_CRT_ORDER_TRACKED | H5P_CRT_ORDER_INDEXED));

/* Create a group */

gid = H5Gcreate(fid, GNAME, H5P_DEFAULT, gcpl_id, H5P_DEFAULT);


Example: h5dump --group=1 tordergr.h5

HDF5 "tordergr.h5" {
GROUP "1" {
   GROUP "a" {
      GROUP "a1" {
      }
      GROUP "a2" {
         GROUP "a21" {
         }
         GROUP "a22" {
         }
      }
   }
   GROUP "b" {
   }
   GROUP "c" {
   }
}
}


Example: h5dump --sort_by=creation_order

HDF5 "tordergr.h5" {
GROUP "1" {
   GROUP "c" {
   }
   GROUP "b" {
   }
   GROUP "a" {
      GROUP "a1" {
      }
      GROUP "a2" {
         GROUP "a22" {
         }
         GROUP "a21" {
         }
      }
   }
}
}
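The difference between the two dumps comes down to which key the links of a group are ordered by. A minimal sketch in plain C, assuming a hypothetical in-memory record (link_rec) that carries a link name and a creation index; none of this is the library's internal representation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory record of one link in a group */
typedef struct {
    const char *name;
    unsigned    creation_index;  /* 0 for the first link created, then 1, ... */
} link_rec;

/* Order by link name (the default index) */
int by_name(const void *a, const void *b)
{
    const link_rec *x = a;
    const link_rec *y = b;
    return strcmp(x->name, y->name);
}

/* Order by creation index (available when creation order is tracked) */
int by_creation(const void *a, const void *b)
{
    const link_rec *x = a;
    const link_rec *y = b;
    return (x->creation_index > y->creation_index)
         - (x->creation_index < y->creation_index);
}

/* Sort the links of one group by the chosen index */
void sort_links(link_rec *links, size_t n, int use_creation_order)
{
    assert(links != NULL);
    qsort(links, n, sizeof *links,
          use_creation_order ? by_creation : by_name);
}
```

For group "1" in the dumps, where "c" was created first and "a" last, sorting by name yields a, b, c and sorting by creation index yields c, b, a.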


“Compact groups”

• Why:
  • Save space and access time for small groups
  • If groups small, don’t need B-tree overhead
• What:
  • Alternate storage for groups with few links
  • Default storage when “latest format” is specified
  • Library converts to “original” storage (B-tree based) using default or user-specified threshold


“Compact groups”

• Example
  • File with 11,600 groups
  • With original group structure, file size ~ 20 MB
  • With compact groups, file size ~ 12 MB
  • Total savings: 8 MB (40%)
  • Average savings/group: ~700 bytes


Compact groups

Example

/* Change storage to “dense” if number of group members is bigger than 16 and go back to compact storage if number of group members is smaller than 12 */

H5Pset_link_phase_change(gcpl_id, 16, 12);

/* Create a group */

g_id = H5Gcreate(…,…,…,gcpl_id,…);
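The two thresholds form a hysteresis band, so a group whose size hovers around one value does not flip between storage forms on every change. A minimal sketch of that decision in plain C; the enum and function names are hypothetical and only model the rule stated in the comment above.

```c
#include <assert.h>

/* Hypothetical model of the two group storage forms */
typedef enum { STORAGE_COMPACT, STORAGE_DENSE } group_storage;

/* Decide the storage form after the link count changes.
 * max_compact: largest link count kept in compact form (16 above)
 * min_dense:   smallest link count kept in dense form (12 above) */
group_storage next_storage(group_storage current, unsigned nlinks,
                           unsigned max_compact, unsigned min_dense)
{
    assert(min_dense <= max_compact);
    if (current == STORAGE_COMPACT && nlinks > max_compact)
        return STORAGE_DENSE;    /* grew past the upper threshold */
    if (current == STORAGE_DENSE && nlinks < min_dense)
        return STORAGE_COMPACT;  /* shrank below the lower threshold */
    return current;              /* inside the band: keep the current form */
}
```

With thresholds 16 and 12, a dense group that shrinks to 14 links stays dense, and a compact group that grows to 14 links stays compact.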


Intermediate group creation

• Why:
  • Simplify creation of a series of connected groups
  • Avoid having to create each intermediate group separately, one by one
• What:
  • Intermediate groups can be created when creating an object in a file, with one function call


Intermediate group creation

• Want to create “/A/B/C/dset1”
• “A” exists, but “B/C/dset1” does not

[Figure: group hierarchy /A, B, C, dset1]

One call creates groups “B” & “C”, then creates “dset1”


Intermediate group creation

Example

/* Create link creation property list */

lcrp_id = H5Pcreate(H5P_LINK_CREATE);

/* Set flag for intermediate group creation

Groups B and C will be created automatically */

H5Pset_create_intermediate_group(lcrp_id, TRUE);

ds_id = H5Dcreate(file_id, "/A/B/C/dset1", …, …, lcrp_id, …, …);


Link Revisions


What are links?

• Links connect groups to their members
• “Hard” links point to a target by address
• “Soft” links store the path to a target

[Figure: a root group with a hard link holding the <address> of a dataset and a soft link holding the path “/target dataset”]


New: External Links

• Why:
  • Access objects stored in other HDF5 files in a transparent way
• What:
  • Store location of file and path within that file
  • Can link across files


New: External Links

[Figure: in file1.h5, the root group holds “External_link”, which stores the file name “file2.h5” and the path “/A/B/C/D/E”; in file2.h5, the root group leads to a group holding the <address> of the “target object”]

External link object “External_link” in file1.h5 points to the group /A/B/C/D/E in file2.h5


External links

Example

/* Create an external link */

H5Lcreate_external(TARGET_FILE, "/A/B/C/D/E", source_file_id, "External_link", …, …);

/* We will use external link to create a group in a target file */

gr_id = H5Gcreate(source_file_id, "External_link/F", …, …, …);

/* We can access group “External_link/F” in the source file and group “/A/B/C/D/E/F” in the target file */


New: User-defined Links

• Why:
  • Allow applications to create their own kinds of links and link operations, such as
    • Create “hard” external link that finds an object by address
    • Create link that accesses a URL
    • Keep track of how often a link is accessed, or other behavior
• What:
  • Applications can create new kinds of links by supplying custom callback functions
  • Can do anything HDF5 hard, soft, or external links do


Traversing an HDF5 file


Traversing HDF5 file

• Why:
  • Allow applications to iterate through the objects in a group or visit recursively all objects under a group
• What:
  • New APIs to traverse a group hierarchy
  • New APIs to iterate through a group using different types of indices (name or creation order)
  • H5Giterate is deprecated in favor of new functions


Traversing HDF5 file

Example of some new APIs

/* Check if object "A/B" exists in a root group */
H5Lexists(file_id, "A/B", …);

/* Iterate through group members of a root group using name as an index; this function doesn't recursively follow links into subgroups */

H5Literate(file_id, H5_INDEX_NAME, H5_ITER_INC, &idx, iter_link_cb, &info);

/* Visit all objects under the root group; this function recursively follows links into subgroups */

H5Lvisit(file_id, H5_INDEX_NAME, H5_ITER_INC, visit_link_cb, &info);


Traversing HDF5 file

• Things to remember
  • Never use H5Ldelete in any HDF5 iterate or visit callback functions
  • Always close parent object before deleting a child object


Shared Object Header Messages


Shared object header messages

• Why: metadata duplicated many times, wasting space

• Example:
  • You create a file with 10,000 datasets
  • All use the same datatype and dataspace
  • HDF5 needs to write this information 10,000 times!

[Figure: Dataset 1, Dataset 2, and Dataset 3, each storing its own data together with its own copy of the datatype and dataspace]


Shared object header messages

• What:
  • Enable messages to be shared automatically
  • HDF5 shares duplicated messages on its own!

[Figure: Dataset 1 and Dataset 2 each store their own data and share a single datatype and dataspace]


Shared Messages

• Happens automatically
• Works with datatypes, dataspaces, attributes, fill values, and filter pipelines
• Saves space if these objects are relatively large
• May be faster if HDF5 can cache shared messages
• Drawbacks
  • Usually slower than non-shared messages
  • Adds overhead to the file
    • Index for storing shared datatypes
    • 25 bytes per instance
  • Older library versions can’t read files with shared messages


Two informal tests

• File with 24 datasets, all with same big datatype
  • 26,000 bytes normally
  • 17,000 bytes with shared messages enabled
  • Saves 375 bytes per dataset
• But, make a bad decision: invoke shared messages but only create one dataset…
  • 9,000 bytes normally
  • 12,000 bytes with shared messages enabled
  • Probably slower when reading and writing, too.
• Moral: shared messages can be a big help, but only in the right situation!


Error Handling


Extendible error-handling APIs

• Why: Enable app to integrate error reporting with HDF5 library error stack
• What: New error handling API
  • H5Epush – push major and minor error ID on specified error stack
  • H5Eprint – print specified stack
  • H5Ewalk – walk through specified stack
  • H5Eclear – clear specified stack
  • H5Eset_auto – turn error printing on/off for specified stack
  • H5Eget_auto – return settings for specified stack traversal


Error-handling programming model

• Create new class, major and minor error messages
• Register messages with the HDF5 library
• Manage errors
  • Use default or create new error stack
  • Push error
  • Print error stack
  • Close stack


Error-handling example

#define ERR_CLS_NAME "Error Test"

#define PROG_NAME "Error Program"

#define PROG_VERS "1.0"

……

#define ERR_MAJ_TEST_MSG "Error in test"

#define ERR_MIN_MYFUNC_MSG "Error in my function"

……

/* Initialize error information for application */

ERR_CLS = H5Eregister_class(ERR_CLS_NAME, PROG_NAME, PROG_VERS);

ERR_MAJ_TEST = H5Ecreate_msg(ERR_CLS, H5E_MAJOR, ERR_MAJ_TEST_MSG);

ERR_MIN_MYFUNC = H5Ecreate_msg(ERR_CLS, H5E_MINOR, ERR_MIN_MYFUNC_MSG);

……..

/* Unregister major and minor error, and class handles when done */

H5Eclose_msg(ERR_MAJ_TEST);

H5Eclose_msg(ERR_MIN_MYFUNC);

H5Eunregister_class(ERR_CLS);


Error-handling example

/* This function creates and writes a dataset */

static herr_t my_function(hid_t fid)

{

…….

/* Force this function to fail and make it push error */

H5E_BEGIN_TRY {

dataset = H5Dcreate1(FAKE_ID, DSET_NAME, H5T_STD_I32BE, space,

H5P_DEFAULT);

} H5E_END_TRY;

if(dataset < 0) {

H5Epush(H5E_DEFAULT, __FILE__, FUNC_my_function, __LINE__, ERR_CLS, ERR_MAJ_IO, ERR_MIN_CREATE, "H5Dcreate failed");

goto error;

} /* end if */

……


Error-handling example

Error Test-DIAG: Error detected in Error Program (1.0) thread 0:
  #000: error_example.c line 160 in main(): Error stack test failed
    major: Error in test
    minor: Error in my function
  #001: error_example.c line 100 in my_function(): H5Dcreate failed
    major: Error in IO
    minor: Error in H5Dcreate
HDF5-DIAG: Error detected in HDF5 (1.8.1) thread 0:
  #002: H5Ddeprec.c line 154 in H5Dcreate1(): not a location ID
    major: Invalid arguments to routine
    minor: Inappropriate type
  #003: H5Gloc.c line 241 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value


Metadata cache


HDF5 metadata

● Metadata – extra information about user’s data
● Two types of metadata:
  ● Structural metadata: stores information about user’s data
    ● When you create a group, you really create:
      ● Group header
      ● B-Tree (to index entries), and
      ● Local heap (to store entry names)


HDF5 metadata

● User defined metadata (for example, created via the H5A calls)
● Usually small – less than 1 KB
● Accessed frequently
● Small disk accesses still expensive


Overview of HDF5 metadata cache

Scenario | Working set size (subset of metadata in use) | Number of metadata cache accesses
Create datasets A,B,C,D 10^6 chunks under root group | < 1MB | < 50K
Initialize the chunks using a round robin (1 from A, 1 from B, 1 from C, 1 from D, repeat until done) | < 1MB | ~30M
10^6 random accesses across A,B,C and D | ~120MB | ~4M
10^6 random accesses to A only | ~40MB | ~4M


HDF5 metadata cache

• Challenges peculiar to metadata caching in HDF5
  • Varying metadata entry sizes
    • Most entries are less than a few hundred bytes
    • Entries may be of almost any size
    • Encountered variations from a few bytes to megabytes
  • Varying working set sizes
    • < 1MB for most applications most of the time
    • ~ 8MB (astrophysics simulation code)
  • Metadata cache competes with application in core
    • Cache must be big enough to hold working set
    • Should never be significantly bigger lest it starve the user program for core


Metadata Cache in HDF5 1.6.3 and before

[Figure: hash table whose slots point to individual metadata entries]

• Fast
• No provision for collisions
• Eviction on collision
• For small hash table performance is bad since frequently accessed entries hash to the same location
• Good performance requires big size of hash table
• Inefficient use of core
• Unsustainable as HDF5 file size and complexity increases
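Why "eviction on collision" hurts can be seen in a toy direct-mapped cache. The sketch below is an illustration in plain C, not the library's cache code; the slot count, the modulo hash, and all names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 8  /* deliberately small, hypothetical table size */

typedef struct {
    unsigned long addr;   /* file address of the cached metadata entry */
    int           valid;
} slot;

typedef struct {
    slot          slots[NSLOTS];
    unsigned long evictions;
} dm_cache;

void dm_init(dm_cache *c)
{
    assert(c != NULL);
    for (size_t i = 0; i < NSLOTS; i++)
        c->slots[i].valid = 0;
    c->evictions = 0;
}

/* Access an entry: returns 1 on a hit, 0 on a miss.
 * On a miss the entry takes over its slot, evicting whatever was there. */
int dm_access(dm_cache *c, unsigned long addr)
{
    assert(c != NULL);
    slot *s = &c->slots[addr % NSLOTS];
    if (s->valid && s->addr == addr)
        return 1;
    if (s->valid)
        c->evictions++;   /* collision: no chaining, the old entry is lost */
    s->addr = addr;
    s->valid = 1;
    return 0;
}
```

Two frequently used entries whose addresses hash to the same slot (1 and 9 with 8 slots) evict each other on every access, even though the rest of the table is empty.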


Metadata Cache in HDF5 1.6.4 and 1.6.5

• Entries are stored in a hash table as before
• Collisions handled by chaining
• Maintain a LRU list to select candidates for eviction
• Maintain a running sum of the sizes of the entries
• Entries are evicted when a predefined limit on this sum is reached
• Size of the metadata cache is bounded
  • Hard coded to 8MB
  • Doesn’t work when working set size is bigger
• Larger variations on a working set sizes are anticipated
• Manual control over the cache size is needed!!!!
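The size-bounded LRU policy described above can be sketched in plain C. This is a simplified illustration, not the library's implementation: lookup is a linear scan over a small fixed array instead of a chained hash table, recency is tracked with a logical clock instead of a linked list, and all names and the entry limit are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENTRIES 16

typedef struct {
    unsigned long addr;       /* file address identifying the entry */
    size_t        size;       /* entry size in bytes */
    unsigned long last_used;  /* logical time of the most recent access */
    int           in_use;
} cache_entry;

typedef struct {
    cache_entry   entries[MAX_ENTRIES];
    size_t        total_size; /* running sum of the sizes of cached entries */
    size_t        max_size;   /* predefined limit on that sum */
    unsigned long clock;
} lru_cache;

void lru_init(lru_cache *c, size_t max_size)
{
    assert(c != NULL);
    for (size_t i = 0; i < MAX_ENTRIES; i++)
        c->entries[i].in_use = 0;
    c->total_size = 0;
    c->max_size = max_size;
    c->clock = 0;
}

/* Return the entry cached for addr, or NULL */
cache_entry *lru_find(lru_cache *c, unsigned long addr)
{
    for (size_t i = 0; i < MAX_ENTRIES; i++)
        if (c->entries[i].in_use && c->entries[i].addr == addr)
            return &c->entries[i];
    return NULL;
}

/* Return an unused slot, or NULL if every slot is taken */
static cache_entry *find_free(lru_cache *c)
{
    for (size_t i = 0; i < MAX_ENTRIES; i++)
        if (!c->entries[i].in_use)
            return &c->entries[i];
    return NULL;
}

/* Evict the least recently used entry */
static void evict_lru(lru_cache *c)
{
    cache_entry *victim = NULL;
    for (size_t i = 0; i < MAX_ENTRIES; i++) {
        cache_entry *e = &c->entries[i];
        if (e->in_use && (victim == NULL || e->last_used < victim->last_used))
            victim = e;
    }
    assert(victim != NULL);
    c->total_size -= victim->size;
    victim->in_use = 0;
}

/* Access an entry: returns 1 on a hit, 0 on a miss (entry is then cached) */
int lru_access(lru_cache *c, unsigned long addr, size_t size)
{
    assert(c != NULL && size <= c->max_size);
    c->clock++;
    cache_entry *e = lru_find(c, addr);
    if (e != NULL) {
        e->last_used = c->clock;
        return 1;
    }
    /* Evict until a slot is free and the new entry fits under the limit */
    while ((e = find_free(c)) == NULL || c->total_size + size > c->max_size)
        evict_lru(c);
    e->addr = addr;
    e->size = size;
    e->last_used = c->clock;
    e->in_use = 1;
    c->total_size += size;
    return 0;
}
```

The fixed max_size plays the role of the hard-coded 8MB limit: once the working set exceeds it, every new entry pushes out one that will be needed again soon, which is the failure mode the last bullets describe.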


Metadata Cache in HDF5 1.6.4 and 1.6.5

[Figure: hash table with chained entries (Metadata 1 through Metadata 9) and an LRU list ordering the same entries]


Metadata Cache improvements

• Why:
  • Improve I/O performance and memory usage when accessing many objects
• What:
  • New metadata cache APIs
    • control cache size
    • monitor actual cache size and current hit rate
  • Under the hood: adaptive cache resizing
    • Automatically detects the current working set size
    • Sets max cache size to the working set size


Metadata cache improvements

• Note: most applications do not need to worry about the cache

• See “Special topics” in the HDF5 User’s Guide for details

• And if you do see unusual memory growth or poor performance, please contact us. We want to help you.


Forward-backward compatibility


What do we promise to our users?

• Backward compatibility
  • Newer version of the library will always read files created with an older version
• Forward compatibility
  • Application written to work with an older version will compile, link and run as expected with a newer version
  • Requires compilation flag
• For more information see “API Compatibility Macros in HDF5”

http://www.hdfgroup.org/HDF5/doc/index.html


HDF5 1.8.0 file format changes

• File format changes
  • Support new features
    • Object creation order, UTF-8 encoding, external links
  • Reduce file overhead
    • New format for global and local heaps
    • New groups storage – compact groups
    • Shared object messages
• Enabled only by using specific API
• If no new features requested, created file should be read by older versions of the HDF5 library


Example

• How can an application create a 1.6.*-incompatible file?
• Latest format is used for storing compound datatypes

fapl = H5Pcreate(H5P_FILE_ACCESS);

H5Pset_latest_format(fapl, TRUE);

file = H5Fcreate(filename, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

tid = H5Tcreate(H5T_COMPOUND, sizeof(struct s1));

H5Tinsert(…);

dset = H5Dcreate(file, “New compound”, tid,………);

H5Dwrite(dset, …);

……

Page 79: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

HDF5 1.8.0 API changes

• HDF5 uses API versioning
    H5Gcreate1(loc_id, "New/My old group", 0)
    H5Gcreate2(loc_id, "New/My new group", lcpl_id, gcpl_id, gapl_id)
• HDF5 uses macros to map a versioned API to a generic one
    H5Gcreate(loc_id, "New/My old group", 0)
    H5Gcreate(loc_id, "New/My new group", lcpl_id, gcpl_id, gapl_id)
• The mapping is set up at library build time; it can be overridden by the application if the application is built with special compilation flags
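
The macro mapping described on this slide can be illustrated without the HDF5 library. The following is a minimal sketch, not the HDF5 implementation: the names demo_create, demo_create1, demo_create2 and the flag DEMO_USE_OLD_API are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a versioned API; these are NOT HDF5 functions. */

/* Version 1: the original signature with a single size hint. */
static int demo_create1(const char *name, int size_hint)
{
    assert(name != NULL);
    (void)size_hint;
    return 1;   /* reports which version handled the call */
}

/* Version 2: the newer signature with three property arguments. */
static int demo_create2(const char *name, int lcpl, int gcpl, int gapl)
{
    assert(name != NULL);
    (void)lcpl;
    (void)gcpl;
    (void)gapl;
    return 2;   /* reports which version handled the call */
}

/* The generic name is mapped to one versioned function at compile time.
 * Building with -DDEMO_USE_OLD_API (an illustrative flag) selects version 1;
 * otherwise the generic name resolves to version 2. */
#ifdef DEMO_USE_OLD_API
#define demo_create demo_create1
#else
#define demo_create demo_create2
#endif
```

With this pattern, existing source that calls the generic name keeps compiling against the old signature when the flag is set, while both versioned names remain callable directly.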

Page 80: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

HDF5 1.8.0 API changes

• Examples of the new APIs
    H5Acreate1, H5Aopen1 - deprecated
    H5Acreate2, H5Aopen2
    H5Gcreate1, H5Gopen1
    H5Gcreate2, H5Gopen2, H5G_link_hard
    H5Rget_obj_type1
    H5Rget_obj_type2
• New APIs have more parameters to set up creation and access properties for the objects
• Passing the default value H5P_DEFAULT for the new parameters emulates the old behavior
    H5Gcreate(loc_id, "New/My old group", 0)
    H5Gcreate(loc_id, "New/My new group", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT)

Page 81: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

HDF5 Library configuration

| Configure flag (global settings) | Public symbols are mapped to | Are deprecated symbols available? |
|---|---|---|
| --with-default-api-version=v18 (default) | 1.8 (e.g. H5Gcreate is mapped to H5Gcreate2; the old H5Gcreate is H5Gcreate1) | Yes |
| --disable-deprecated-symbols | 1.8 (e.g. H5Gcreate is mapped to H5Gcreate2; H5Gcreate1 is not available) | No |
| --with-default-api-version=v16 | 1.6 (e.g. H5Gcreate is mapped to H5Gcreate1; H5Gcreate2 is available) | Yes |

Page 82: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

HDF5 application configuration

Global level
• HDF5 API mapping can be done by the application
• Assuming both deprecated and new symbols are available in the library:

    h5cc my_program.c
      H5Gcreate1, H5Gcreate2, and H5Gcreate may all be used

    h5cc -DH5_NO_DEPRECATED_SYMBOLS my_program.c
      Only new symbols are available to the application; H5Gcreate is mapped to H5Gcreate2; the application may use both H5Gcreate2 and H5Gcreate, but cannot use H5Gcreate1

    h5cc -DH5_USE_16_API my_program.c
      H5Gcreate is mapped to H5Gcreate1; all three of H5Gcreate1, H5Gcreate2, and H5Gcreate can be used

Page 83: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

HDF5 application configuration

Per-function level
• Version and mapping can be set per function
• Assuming both deprecated and new symbols are available in the library:

    h5cc -D H5Gcreate_vers=1 -D H5Acreate_vers=2 my_program.c
    • Maps H5Gcreate to H5Gcreate1
    • Maps H5Acreate to H5Acreate2
    • Both H5Gcreate1 and H5Gcreate2 may be used; the same holds for H5Acreate1 and H5Acreate2

    h5cc -D H5Gcreate_vers=2 my_program.c
    • Maps H5Gcreate to H5Gcreate2
    • Both H5Gcreate1 and H5Gcreate2 may be used
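
Per-function selection can be sketched in plain C as one version macro per function. This is an illustrative sketch under stated assumptions, not HDF5 source: the demo_* names are hypothetical, and the two defaults below simulate a build with -D demo_gcreate_vers=1 -D demo_acreate_vers=2.

```c
#include <assert.h>

/* Hypothetical versioned functions (not HDF5 calls); each returns its version. */
static int demo_gcreate1(void) { return 1; }
static int demo_gcreate2(void) { return 2; }
static int demo_acreate1(void) { return 1; }
static int demo_acreate2(void) { return 2; }

/* Simulate: cc -D demo_gcreate_vers=1 -D demo_acreate_vers=2 my_program.c
 * A value given on the command line takes precedence over these defaults. */
#ifndef demo_gcreate_vers
#define demo_gcreate_vers 1
#endif
#ifndef demo_acreate_vers
#define demo_acreate_vers 2
#endif

/* Map each generic name independently of the others. */
#if demo_gcreate_vers == 1
#define demo_gcreate demo_gcreate1
#elif demo_gcreate_vers == 2
#define demo_gcreate demo_gcreate2
#else
#error "demo_gcreate_vers must be 1 or 2"
#endif

#if demo_acreate_vers == 1
#define demo_acreate demo_acreate1
#elif demo_acreate_vers == 2
#define demo_acreate demo_acreate2
#else
#error "demo_acreate_vers must be 1 or 2"
#endif

/* Usage check: generic names follow the per-function selection,
 * and the versioned names stay callable regardless of the mapping. */
static void demo_check_mapping(void)
{
    assert(demo_gcreate() == 1);
    assert(demo_acreate() == 2);
    assert(demo_gcreate2() == 2);
    assert(demo_acreate1() == 1);
}
```

The point of the per-function form is that one family of calls can be migrated to the new signature while another stays on the old one within the same program.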

Page 84: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

Example: --with-default-api-version=v18

hid_t file_id, grp1_id, grp2_id, grp3_id; /* identifiers */
...
/* Open "file.h5" */
file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);

/* Create several groups in a file */
grp1_id = H5Gcreate(file_id, "New/A", H5P_DEFAULT, gcpt, gapt); /* H5Gcreate maps to H5Gcreate2 */
grp2_id = H5Gcreate1(file_id, "/B", 0);
…
grp3_id = H5Gcreate2(file_id, "New/C", H5P_DEFAULT, gcpt, gapt);

Page 85: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

Example: --with-default-api-version=v16

hid_t file_id, grp1_id, grp2_id, grp3_id; /* identifiers */
...
/* Open "file.h5" */
file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);

/* Create several groups in a file */
grp1_id = H5Gcreate(file_id, "/A", 0); /* H5Gcreate maps to H5Gcreate1 */
grp2_id = H5Gcreate1(file_id, "/B", 0);
grp3_id = H5Gcreate2(file_id, "New/C", H5P_DEFAULT, gcpt, gapt);

Page 86: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

Example: --disable-deprecated-symbols

hid_t file_id, grp1_id, grp2_id, grp3_id; /* identifiers */
...
/* Open "file.h5" */
file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);

/* Create several groups in a file */
grp1_id = H5Gcreate(file_id, "New/A", H5P_DEFAULT, gcpt, gapt); /* H5Gcreate maps to H5Gcreate2 */
/* Compilation will fail on the next line: H5Gcreate1 is not available */
grp2_id = H5Gcreate1(file_id, "/B", 0);
grp3_id = H5Gcreate2(file_id, "New/C", H5P_DEFAULT, gcpt, gapt);

Page 87: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

HDF5 and NetCDF-4

Page 88: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

NetCDF-3 and HDF5

| | NetCDF-3 | HDF5 |
|---|---|---|
| Development, maintenance, and funding | UCAR Unidata; NSF | The HDF Group; NASA, DOE, other |
| Advantages | Popular, simple data model, lots of tools, multiple implementations (Java); data may be recovered after system crash | Flexible, works on 32-bit and 64-bit platforms, high-performance, efficient storage, rich collection of data types, extensible |
| Disadvantages | Separate implementation to support parallel I/O (Argonne) and 64-bit platforms, limited number of data types (e.g. no support for structures), different limitations on variables, modifications after creation are inefficient | Complex, steep learning curve, easy to misuse; system crash may corrupt HDF5 file |

Page 89: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

Goals of NetCDF/HDF combination

• Create NetCDF-4, combining desirable characteristics of NetCDF-3 and HDF5, while taking advantage of their separate strengths
  • Widespread use and simplicity of NetCDF-3
  • Generality and performance of HDF5

• Make NetCDF more suitable for high-performance computing, large datasets

• Provide simple high-level application programming interface (API) for HDF5

• Demonstrate benefits of combination in advanced Earth science modeling efforts

Page 90: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

What is NetCDF-4?

• NASA-funded effort to improve
  • Interoperability among scientific data representations
  • Integration of observations and model outputs
  • I/O for high-performance computing
• Extended NetCDF-3 data model for scientific data (better data organization, richer data types and support for …)
• Extended set of NetCDF-3 APIs for using the model
• A new format for NetCDF data based on HDF5

Page 91: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

What is NetCDF-4?

• Developed at Unidata in collaboration with The HDF Group
• Supported by Unidata
• Released in June 2008
• Based on HDF5 1.8.1
• Available from http://www.unidata.ucar.edu/software/netcdf/netcdf-4

Page 92: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

NetCDF-4 Architecture

[Figure: NetCDF-4 architecture diagram with boxes for netCDF-3 applications, netCDF-4 applications, HDF5 applications, the netCDF-3 interface, the netCDF-4 library, the HDF5 library, netCDF files, netCDF-4 HDF5 files, and HDF5 files]

• Supports access to NetCDF files and HDF5 files created through the NetCDF-4 interface

Page 93: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

NetCDF vs HDF5 terminology

| NetCDF | HDF5 |
|---|---|
| Dataset | HDF5 file |
| Variable | Dataset |
| Coordinate variable | Dimension scale |

Page 94: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

Extended NetCDF model

• NetCDF-3 models multidimensional arrays of primitive types with Variables, Dimensions, and Attributes; only one unlimited dimension is allowed

• HDF5 models multidimensional arrays of complex structures with Datasets and Attributes; multiple unlimited dimensions are allowed.

• NetCDF-4 implements an extended data model with enhancements made possible with HDF5:
  • Structure types: like C structures, except portable
  • Multiple unlimited dimensions
  • Groups
  • Variable-length objects
  • New primitive types: strings, unsigned types, opaque
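
One way to read "like C structures, except portable" is that a structure type is described to the library field by field (name, byte offset, size) rather than relying on one compiler's memory layout. The sketch below shows only that description step in plain C; reading_t, member_desc_t, and find_member are hypothetical names for illustration, and no HDF5 or NetCDF calls are made.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record type used only for this illustration. */
typedef struct {
    int    station_id;
    double temperature;
    float  pressure;
} reading_t;

/* One entry per field: the kind of information a structure-type
 * definition needs in order to be independent of compiler padding. */
typedef struct {
    const char *name;
    size_t      offset;  /* byte offset of the field inside reading_t */
    size_t      size;    /* size of the field in bytes */
} member_desc_t;

static const member_desc_t reading_members[] = {
    { "station_id",  offsetof(reading_t, station_id),  sizeof(int)    },
    { "temperature", offsetof(reading_t, temperature), sizeof(double) },
    { "pressure",    offsetof(reading_t, pressure),    sizeof(float)  }
};

/* Look up a field description by name; returns NULL if there is no such field. */
static const member_desc_t *find_member(const char *name)
{
    size_t i;
    const size_t n = sizeof(reading_members) / sizeof(reading_members[0]);

    assert(name != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(reading_members[i].name, name) == 0)
            return &reading_members[i];
    }
    return NULL;
}
```

Using offsetof instead of hand-computed offsets keeps the description correct even when the compiler inserts padding between fields.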

Page 95: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

NetCDF-3 Data Model

[Figure: class diagram of the NetCDF-3 data model]

• Dataset: location: URL; open( )
• Variable: name: String; shape: Dimension[ ]; type: DataType; Array read( )
• Dimension: name: String; length: int; isUnlimited( )
• Attribute: name: String; type: DataType; value: 1 D Array
• DataType: char, byte, short, int, float, double

Page 96: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

HDF5 Data Model

[Figure: class diagram of the HDF5 data model]

• HDF5 File: location: multiple; open( )
• Group: name: String; members: Variable[ ]
• Variable: name: String; shape: Dimension[ ]; type: DataType; Array read( )
• Structure: name: String; members: Variable[ ]
• Attribute: name: String; value: Variable
• DataType: byte, unsigned byte, short, unsigned short, int, unsigned int, long, unsigned long, float, double, String, BitField, Enumeration, DateTime, Opaque, Reference, VariableLength

Page 97: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

A Common Data Model

[Figure: class diagram of the common data model]

• Dataset: location: URL; open( )
• Group: name: String; members: Variable[ ]
• Variable: name: String; shape: Dimension[ ]; type: DataType; Array read( )
• Structure: name: String; members: Variable[ ]
• Dimension: name: String; length: int; isUnlimited( ); isVariableLength( )
• Attribute: name: String; type: DataType; value: 1 D Array
• DataType: byte, unsigned byte, short, unsigned short, int, unsigned int, long, unsigned long, float, double, char, String, Opaque

Page 98: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

NetCDF-4 Data Model

[Figure: class diagram of the NetCDF-4 data model]

• NetCDF-4 Dataset: location: URL; open( )
• Group: name: String; members: Variable[ ]
• Variable: name: String; shape: Dimension[ ]; type: DataType; Array read( )
• Structure: name: String; members: Variable[ ]
• Dimension: name: String; length: int; isUnlimited( ); isVariableLength( )
• Attribute: name: String; type: DataType; value: 1 D Array
• DataType: byte, unsigned byte, short, unsigned short, int, unsigned int, long, unsigned long, float, double, char, String, Opaque

Page 99: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

Glance at NetCDF-4 Performance

Page 100: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

Sequential

NetCDF-3 and NetCDF-4

Page 101: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

NetCDF-3 and NetCDF-4

• Preliminary performance study
  • “NetCDF-4 performance report”, June 2008: http://www.hdfgroup.uiuc.edu/papers/papers/
• Used real NASA data for benchmarks
• Compares NetCDF-3 and NetCDF-4 performance for reading and writing data of different dimensionality using different storage layouts and system caching parameters

Page 102: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

NetCDF-3 and NetCDF-4

• Summary
  • For contiguous access patterns, the performance of NetCDF-4 is comparable with NetCDF-3
  • For non-contiguous access patterns, the chunking feature can improve performance (use the right size!)
  • NetCDF-4 compression reduces data storage size and I/O time

Page 103: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

Non-contiguous access

• Logical layout for 2-dimensional arrays

[Figure: logical layout of a 2-dimensional array; dimension labels shown: 16384, 1, 16, 256, 256]

Page 104: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

Non-contiguous access

• Data layout in a file

Chunk size [16384][1]

Chunk size [8192][1]

Chunk size [4096][1]

16384 non-adjacent data points

Page 105: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

Non-contiguous write

Page 106: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

Non-contiguous read

Page 107: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

Non-contiguous access

• Word of caution
  • The chunk sizes shown will cause poor performance if the access pattern is by row or by several contiguous rows
  • If the access pattern is not known, choose a chunk size that accommodates “extreme” cases
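
The caution above can be made concrete by counting how many chunks a rectangular selection touches. The following is a rough sketch under stated assumptions: the helper names are hypothetical, and the 16384 x 256 array shape used in the usage note is assumed from the figure labels rather than stated in the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Number of chunks along one dimension that intersect [start, start + count). */
static size_t chunks_spanned(size_t start, size_t count, size_t chunk)
{
    size_t first, last;

    assert(chunk > 0 && count > 0);
    first = start / chunk;
    last  = (start + count - 1) / chunk;
    return last - first + 1;
}

/* Number of chunks touched by a 2-D rectangular selection that starts at
 * (row0, col0) and covers nrows x ncols elements, for chunks of size
 * chunk_rows x chunk_cols. */
static size_t chunks_touched_2d(size_t row0, size_t col0,
                                size_t nrows, size_t ncols,
                                size_t chunk_rows, size_t chunk_cols)
{
    return chunks_spanned(row0, nrows, chunk_rows) *
           chunks_spanned(col0, ncols, chunk_cols);
}
```

For an assumed 16384 x 256 array with chunk size [16384][1], chunks_touched_2d(0, 5, 16384, 1, 16384, 1) gives 1 chunk for a single column, while chunks_touched_2d(7, 0, 1, 256, 16384, 1) gives 256 chunks for a single row, so a row read has to visit every chunk in the dataset.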

Page 108: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

Parallel

NetCDF-4, PNetCDF and HDF5

Page 109: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

HDF5, NetCDF-4 and PnetCDF

• Performance results for
  • A parallel version of NetCDF-3 from ANL/Northwestern University (PnetCDF)
  • HDF5 parallel library 1.6.5
  • NetCDF-4 beta1
• For more details, see materials under http://www.hdfgroup.uiuc.edu/papers/papers/

Page 110: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

HDF5 and PnetCDF Performance Comparison

• Flash I/O website: http://flash.uchicago.edu/~zingale/flash_benchmark_io/
• Rob Ross et al., “Parallel NetCDF: A Scientific High-Performance I/O Interface”

Page 111: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

HDF5 and PnetCDF performance comparison

[Figure: performance charts for two systems. Bluesky: Power 4; uP: Power 5]

Page 112: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

HDF5 and PnetCDF performance comparison

[Figure: performance charts for two systems. Bluesky: Power 4; uP: Power 5]

Page 113: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

Parallel NetCDF-4 and PnetCDF

• Fixed problem size = 995 MB
• Performance of NetCDF-4 is close to PnetCDF

[Figure: bandwidth (MB/s, axis from 0 to 160) versus number of processors (axis from 0 to 144) for PnetCDF from ANL and NetCDF-4]

Page 114: September 9, 2008SPEEDUP Workshop - HDF5 Tutorial1 New Features in HDF5.

Thank you!

Questions?