Using compression filters in HDF5
HDF5's new external filter interface in action
Eugen Wintersberger, ICALEPCS 2017, 8.10.2017
Eugen Wintersberger | Using compression filters | 8.10.2017 | Page 2
Motivation
Applying different compression algorithms to individual datasets is one of the key features of HDF5.
➔ Apply compression only where feasible
➔ Other data can be read and written without any performance penalty
➔ We can pick the optimum algorithm for each dataset
Performance key figures for a compression algorithm:
➔ Throughput (MByte/s)
➔ Compression ratio
Both depend on the nature of the data passed to the algorithm!
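A minimal sketch of how these two key figures could be measured. It uses `zlib` and `bz2` from the Python standard library as stand-ins (they are not the HDF5 filter plugins discussed in this talk), and the two synthetic buffers and the helper name `measure` are assumptions made purely for illustration:

```python
import bz2
import os
import time
import zlib


def measure(compress, data):
    """Return (compression ratio, throughput in MByte/s) for one call."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(packed)
    # Guard against a zero timer reading on very fast runs
    throughput = len(data) / 1e6 / elapsed if elapsed > 0 else float("inf")
    return ratio, throughput


# Two buffers of equal size with very different structure
regular = bytes(range(256)) * 4096   # 1 MiB, strongly repetitive
noisy = os.urandom(len(regular))     # 1 MiB, essentially incompressible

if __name__ == "__main__":
    for name, compress in (("zlib", zlib.compress), ("bz2", bz2.compress)):
        for label, data in (("regular", regular), ("noisy", noisy)):
            ratio, rate = measure(compress, data)
            print(f"{name:5s} {label:8s} ratio={ratio:8.1f} "
                  f"throughput={rate:8.1f} MByte/s")
```

The same algorithm yields very different figures for the two buffers, which is why choosing the filter per dataset is useful.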
The situation before HDF5 1.8.11
Two issues
➔ The application's source code must be changed to register a filter
➔ Not possible for commercial (closed-source) applications!
#define H5Z_FILTER_BZIP2 305

/* declare a filter function */
size_t H5Z_filter_bzip2(unsigned flags, size_t cd_nelmts,
                        const unsigned cd_values[], size_t nbytes,
                        size_t *buf_size, void **buf);

const H5Z_class2_t H5Z_BZIP2[1] = {{
    H5Z_CLASS_T_VERS,                /* H5Z_class_t version */
    (H5Z_filter_t)H5Z_FILTER_BZIP2,  /* Filter id number */
    1,                               /* encoder_present flag (set to true) */
    1,                               /* decoder_present flag (set to true) */
    "bzip2",                         /* Filter name for debugging */
    NULL,                            /* The "can apply" callback */
    NULL,                            /* The "set local" callback */
    (H5Z_func_t)H5Z_filter_bzip2,    /* The actual filter function */
}};

/* somewhere in the code */
status = H5Zregister(H5Z_BZIP2);
Currently used by
➔ Eiger detector
➔ PyTables
➔ h5py
Applications could use custom filter algorithms for reading and writing
New approach since HDF5 1.8.11
[Diagram: the application calls the HDF5 library, which loads the filter plugins libLZ4.so, libbitshuffle.so and libBZ2.so from HDF5_PLUGIN_PATH=... and selects them by filter ID]
The library looks for the appropriate filter by itself using the ID of the filter!
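To check which plugin files the library could find, the search directories can be inspected from Python. This is only a sketch under assumptions: the function names are hypothetical, the fallback directory is the default used on Unix-like systems as far as the HDF5 documentation states, and the real lookup inside the HDF5 library additionally opens each shared library and compares its filter ID, which is not reproduced here.

```python
import os
from pathlib import Path

# Assumed fallback on Unix-like systems when HDF5_PLUGIN_PATH is not set
DEFAULT_PLUGIN_DIR = "/usr/local/hdf5/lib/plugin"

# File extensions of shared libraries on Linux, macOS and Windows
SHARED_LIB_SUFFIXES = {".so", ".dylib", ".dll"}


def plugin_search_dirs(environ=None):
    """Return the directories listed in HDF5_PLUGIN_PATH (or the default)."""
    environ = os.environ if environ is None else environ
    raw = environ.get("HDF5_PLUGIN_PATH", DEFAULT_PLUGIN_DIR)
    # Entries are separated by ':' on Unix and ';' on Windows
    return [Path(entry) for entry in raw.split(os.pathsep) if entry]


def candidate_plugins(environ=None):
    """List shared libraries found in the existing search directories."""
    found = []
    for directory in plugin_search_dirs(environ):
        if directory.is_dir():
            found.extend(sorted(p for p in directory.iterdir()
                                if p.suffix in SHARED_LIB_SUFFIXES))
    return found


if __name__ == "__main__":
    for plugin in candidate_plugins():
        print(plugin)
```

An empty listing usually means that HDF5_PLUGIN_PATH does not point to the directory where the plugins were installed.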
Where to get the filter plugins?
Supported platforms
➔ Windows
➔ Linux
➔ macOS
Installing the filters – on Windows
Install the filters – on Linux (Debian)
Add repository key and sources list
$ wget -q -O - http://repos.pni-hdri.de/debian_repo.pub.gpg | apt-key add -
$ cd /etc/apt/sources.list.d
$ wget http://repos.pni-hdri.de/jessie-pni-hdri.list
Install the package
$ apt-get update
$ apt-get install hdf5-plugin-lz4
Install the filters – on Linux (Ubuntu)
Supported versions
➔ Ubuntu 14.04 (Trusty Tahr)
➔ Ubuntu 16.04 (Xenial Xerus)
Install the filters – on macOS
Install the dependencies
$ brew install cmake
$ brew install git
$ brew install hdf5
$ brew install lz4
Build the code
$ git clone https://github.com/nexusformat/HDF5-External-Filter-Plugins.git
$ cd HDF5-External-Filter-Plugins
$ git checkout new_build
$ cmake -DENABLE_LZ4_PLUGIN=ON -DENABLE_BITSHUFFLE_PLUGIN=ON \
    -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local/opt/hdf5 .
$ make
$ make test
Make the installation available
$ make install
Using the filter plugins (from Python)
> Reading – there is nothing you have to do
> Writing
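import h5py

f = h5py.File("bitshuffle_file.h5", "w")
filter_id = 32008  # ID of the bitshuffle filter

# bitshuffle with additional options passed to the filter
d1 = f.create_dataset("with_lz4", (100, 100), compression=filter_id,
                      compression_opts=(0, 2))
# bitshuffle with the filter's default options
d2 = f.create_dataset("without_lz4", (100, 100), compression=filter_id)
f.close()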
➔ No additional packages need to be imported
➔ You need to know
The filter's ID
The compression options accepted by the filter
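The two pieces of information listed above can be kept in one place with a small helper. This is a sketch under assumptions: the helper name `bitshuffle_kwargs` is hypothetical, the ID 32004 for LZ4 is taken from the list of filters registered with The HDF Group, and the meaning of the two option values (block size with 0 for the filter's default, then 2 to select LZ4) is an assumption based on the bitshuffle filter's documentation rather than on this talk.

```python
# Filter IDs (assumption: as registered with The HDF Group)
FILTER_LZ4 = 32004
FILTER_BITSHUFFLE = 32008

# Assumed meaning of the second bitshuffle option: 2 selects LZ4
BITSHUFFLE_COMPRESS_LZ4 = 2


def bitshuffle_kwargs(use_lz4=True, block_size=0):
    """Build keyword arguments for h5py's create_dataset().

    block_size=0 is assumed to let the filter choose its own block size.
    """
    kwargs = {"compression": FILTER_BITSHUFFLE}
    if use_lz4:
        kwargs["compression_opts"] = (block_size, BITSHUFFLE_COMPRESS_LZ4)
    return kwargs


if __name__ == "__main__":
    # Intended use, with h5py and the plugin installed:
    #   f.create_dataset("with_lz4", (100, 100), **bitshuffle_kwargs())
    print(bitshuffle_kwargs())
    print(bitshuffle_kwargs(use_lz4=False))
```

Keeping the numeric values behind named constants avoids scattering magic numbers such as 32008 and (0, 2) through the writing code.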
Current status
➔ Included filters:
BZIP2
LZ4
LZ4+bitshuffle
➔ Installation packages for:
Windows (VS2015),
Linux (Debian, Ubuntu)
➔ Simplified build for Windows using Conan
Todos
➔ Create GitHub pages
➔ Update the documentation
➔ Review of the LZ4 API calls for the new LZ4 1.4 version
➔ BLOSC filter is still missing
➔ Installation packages for
macOS
RPM-based Linux distributions (RedHat, CentOS, …)
➔ Update Debian packages
Thank you for your attention!
Questions?