Page 1: Python and HDF5: Overview

Python and HDF5

Andrew Collette

University of Colorado

Page 2: Python and HDF5: Overview

What makes scientific data special?

It’s meant to be shared - collaborative

Ad-hoc or changing structure - flexible

Archived and preserved - robust

Python and HDF5 together address all three

Page 4: Python and HDF5: Overview

Python (the language)

High-level language

Almost no “boilerplate” code

“Exception” error handling

Fully object-oriented

First-class module/namespace support

Readable

Self-documenting

Free

Page 5: Python and HDF5: Overview

Python (the platform)

Python itself is “batteries included”

Mature numerical, plotting and scientific modules

Hundreds of specialized science packages

Thousands more general-purpose

Page 6: Python and HDF5: Overview

Core analysis packages

NumPy - Array objects and basic operations

SciPy - Advanced science & engineering library

Matplotlib - Publication-quality plots (both rendered and interactive)

Page 7: Python and HDF5: Overview

Thousands of others

Unit testing - unittest module in stdlib

Only need to write code for your problem

Web servers and development - literally hundreds

Interface: F2PY (Fortran), Cython (C), ctypes, others

Distribution - distutils/pip single-command installs

Page 8: Python and HDF5: Overview

Python highlights

Page 9: Python and HDF5: Overview

Readable

Page 10: Python and HDF5: Overview

Iteration

C

IDL

Python
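
The code shown on this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of what iteration looks like on the Python side of the comparison. The sample values are made up for illustration.

```python
# Hypothetical sample data; the slide's original values are not in the transcript.
samples = [1.5, 2.5, 4.0]

# Iterate directly over the items: no index variable and no bounds to manage.
total = 0.0
for value in samples:
    total += value

# enumerate() supplies the index when it is actually needed.
labeled = [f"{i}: {value}" for i, value in enumerate(samples)]
```

The loop body never touches an array length or a counter, which is the readability point this part of the talk appears to be making.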

Page 11: Python and HDF5: Overview

Speed

FFTs and optimized routines built into NumPy/SciPy

ctypes and Cython
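
To illustrate the built-in FFT routines mentioned on this slide, here is a small sketch using NumPy. The test signal (a 50 Hz sine sampled at 1 kHz) is an assumption chosen for the example, not something taken from the talk.

```python
import numpy as np

# Assumed test signal: a 50 Hz sine sampled at 1 kHz for one second.
sample_rate = 1000                       # samples per second
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 50 * t)

# rfft computes the FFT of real-valued input in compiled code,
# so there is no Python-level loop over the samples.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1 / sample_rate)

# The largest spectral bin should sit at the sine's frequency.
peak_hz = freqs[np.argmax(spectrum)]
```

The heavy lifting happens inside the compiled routine; the Python code only describes what to compute.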

Page 14: Python and HDF5: Overview

ctypes

Advanced foreign function interface

Call C libraries from pure Python code
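
As a sketch of calling a C library from pure Python, the following loads the C standard library and calls `strlen`. The choice of `strlen` is ours, and `find_library("c")` is assumed to resolve on typical Linux and macOS systems; other platforms may need a different library name.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library. This lookup is an assumption that
# holds on typical Linux and macOS systems.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# Python bytes are passed to C as a NUL-terminated char pointer.
length = libc.strlen(b"HDF5")
```

Declaring `argtypes` and `restype` is optional for simple cases, but it lets ctypes check arguments and convert the return value correctly.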

Page 15: Python and HDF5: Overview

Cython

Example from the HDF5 C Library:

Page 16: Python and HDF5: Overview

HDF5

Page 18: Python and HDF5: Overview

Hierarchical Data Format

3 things:

File specification and object model

C library

Ecosystem of users and developers

Page 19: Python and HDF5: Overview

Objects

Datasets: homogeneous arrays of data

Groups: containers holding datasets and groups

Attributes: arbitrary metadata on groups & datasets

Standard constructs using these, or make your own!
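
The three object types above can be sketched with h5py as follows. The file name, group and dataset names, and attribute values are all hypothetical, and the file is kept in memory (core driver without a backing store) so that nothing is written to disk.

```python
import h5py
import numpy as np

# In-memory file so nothing touches the disk. All names below are hypothetical.
with h5py.File("objects_demo.h5", "w", driver="core", backing_store=False) as f:
    # Group: a container for datasets and other groups.
    run = f.create_group("run_001")

    # Dataset: a homogeneous array, here 100 float64 values.
    voltage = run.create_dataset("voltage", data=np.linspace(0.0, 1.0, 100))

    # Attributes: small pieces of metadata attached to a group or a dataset.
    run.attrs["operator"] = "example user"
    voltage.attrs["units"] = "V"

    names = list(run.keys())
    shape = voltage.shape
    units = voltage.attrs["units"]
```

Groups behave much like dictionaries and datasets much like NumPy arrays, so most of the interface needs no separate vocabulary.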

Page 20: Python and HDF5: Overview

Dataset features

Partial I/O: read and write just what you want

(In Python, we even use the array-access syntax!)

Automatic type conversion

On-the-fly compression

Parallel reads & writes with MPI

(Directly from Python!)
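
A minimal h5py sketch of the first three features (partial I/O, type conversion and compression) might look like this. Dataset names and sizes are made up for illustration, and the parallel MPI case is left out because it needs an MPI-enabled build of HDF5.

```python
import h5py
import numpy as np

with h5py.File("features_demo.h5", "w", driver="core", backing_store=False) as f:
    # On-the-fly compression: chunks are gzip-compressed as they are written.
    dset = f.create_dataset("image", shape=(1000, 1000), dtype="int32",
                            compression="gzip")
    compression = dset.compression

    # Partial write with array-access syntax: only this 10x10 block is touched.
    # The int64 source values are converted to the dataset's int32 on write.
    dset[0:10, 0:10] = np.arange(100, dtype="int64").reshape(10, 10)

    # Partial read: slicing returns a NumPy array holding just this block.
    block = dset[0:10, 0:10]

    # Type conversion on read: HDF5 fills a float64 buffer from int32 storage.
    as_float = np.empty((10, 10), dtype="float64")
    dset.read_direct(as_float, source_sel=np.s_[0:10, 0:10])
```

Only the selected region is read or written; the rest of the million-element dataset is never loaded into memory.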

Page 21: Python and HDF5: Overview

Metadata & Organization

Groups form a POSIX-style “filesystem” in the file

Attributes can store arbitrary data on arbitrary objects

How should the file be organized?

You decide!

Thousands of domain-specific “application formats”

Anyone can read them because HDF5 is self-describing!
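
The filesystem-style paths and the self-describing property can be sketched as follows, assuming h5py. The layout used here (`/experiment/probe_a/current`) is hypothetical and does not correspond to any real application format.

```python
import h5py
import numpy as np

with h5py.File("layout_demo.h5", "w", driver="core", backing_store=False) as f:
    # Intermediate groups in a path are created automatically.
    f.create_dataset("/experiment/probe_a/current", data=np.zeros(8))
    f["/experiment/probe_a"].attrs["position_cm"] = 12.5
    f["/experiment"].attrs["description"] = "hypothetical layout"

    # Path-style access, much like a filesystem.
    current = f["/experiment/probe_a/current"]
    parent_name = current.parent.name

    # Self-describing: walk the file and record every object's path and type
    # without knowing the layout in advance.
    found = []
    f.visititems(lambda name, obj: found.append((name, type(obj).__name__)))
```

A reader who has never seen the file can run the same walk to discover its structure, then inspect `attrs` on whatever it finds.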

Page 22: Python and HDF5: Overview

Example

Page 23: Python and HDF5: Overview

Open an HDF5 file

Extract a particular dataset

Read the data

Make an interactive plot

Close the file
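
The code from these slides is not in the transcript. A minimal sketch of the five steps, assuming h5py and Matplotlib, could look like the following. The file path, the dataset name `probe/signal` and the helper `plot_signal` are hypothetical, and a small file is written first so that there is something to open.

```python
import os
import tempfile

import h5py
import numpy as np

# Setup for this sketch only: write a small file so there is something to open.
path = os.path.join(tempfile.mkdtemp(), "shot.h5")
with h5py.File(path, "w") as out:
    out.create_dataset("probe/signal",
                       data=np.sin(np.linspace(0, 2 * np.pi, 200)))


def plot_signal(values):
    """Show an interactive plot; Matplotlib is imported only when called."""
    import matplotlib.pyplot as plt
    plt.plot(values)
    plt.show()


f = h5py.File(path, "r")        # 1. open an HDF5 file
dset = f["probe/signal"]        # 2. extract a particular dataset
data = dset[...]                # 3. read the data into a NumPy array
# 4. make an interactive plot by calling plot_signal(data)
f.close()                       # 5. close the file
```

Step 4 is left as a call for the reader to make, so the sketch also runs on machines without a display. Because `data` is an ordinary NumPy array, it remains usable after the file is closed.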

Page 29: Python and HDF5: Overview

Demo

Page 30: Python and HDF5: Overview

Real-world use

Page 31: Python and HDF5: Overview

UCLA Large Plasma Device

Image credit: Basic Plasma Science Facility

Page 33: Python and HDF5: Overview

Laser Experiment

Image credit: Basic Plasma Science Facility

Page 34: Python and HDF5: Overview

LAPD Data Products

Acquisition file - “Planes” of data in HDF5

Metadata: timestamps, digitizer settings, probe positions, background plasma conditions…

Packaged into HDF5 following “lab layout”

Users take their data back home and analyze

Page 35: Python and HDF5: Overview

Visualization

Page 36: Python and HDF5: Overview

Python 2D plotting

A. Collette et al., Phys. Rev. Lett. 105, 195003 (2010)

Page 37: Python and HDF5: Overview

Only 160 lines of code!

A. Collette et al., Phys. Rev. Lett. 105, 195003 (2010)

Page 38: Python and HDF5: Overview

Python does 3D too!

“MayaVi” 3D visualizer

Development sponsored by Enthought

Both offline (scripted) and interactive modes

A. Collette et al. Phys. Plasmas 18, 055705 (2011)

Page 39: Python and HDF5: Overview

CU Accelerator

Page 43: Python and HDF5: Overview

CU Accelerator

Raw data

HDF5 Shot file

Automated speed/mass calculation

MySQL

Data search

HDF5 file for user

Page 44: Python and HDF5: Overview

Where to get Python

Distributions are the best way to get started

(they include HDF5/h5py!)

Anaconda (Windows, Mac, Linux): http://continuum.io

PythonXY (Windows): http://pythonxy.googlecode.com

Page 46: Python and HDF5: Overview

Questions?

