Dec 31, 2015
www.hdfgroup.org
The HDF Group
Multi-threading in HDF5: Paths Forward
Current implementation - Future directions
May 30-31, 2012 HDF5 Workshop at PSI
Outline
• Introduction
• Current implementation
• Paths forward:
  • Improve concurrency
  • Reduce latency
• Conclusions and Recommendations
Introduction
• HDF5 design principles
  • Flexibility
  • Adaptability to new computational environments
• Current challenges:
  • Multi-threaded applications run on multi-core systems
  • HDF5 thread-safe library cannot support concurrency built into such applications
Current Implementation
• HDF5 uses single global semaphore
• Controls modification of memory and file data structures:
  • One thread at a time enters the library
  • An application thread enters HDF5 API routine, acquires semaphore
  • Other threads are blocked until the thread completes API call and releases semaphore
  • No simultaneous modifications of data structures that can cause file corruption
  • No race conditions when several threads try to modify a memory data structure
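The single-lock scheme described above can be sketched in a few lines. This is a conceptual illustration in Python, not HDF5 source code: `_global_lock`, `api_call`, `create_object` and `run_threads` are hypothetical names chosen for the example.

```python
import threading

# Hypothetical stand-in for the library-wide semaphore; not an HDF5 identifier.
_global_lock = threading.RLock()
_open_objects = {}  # shared in-memory state guarded by the lock


def api_call(func):
    """Wrap an API routine so that only one thread runs inside the library."""
    def wrapper(*args, **kwargs):
        with _global_lock:  # acquired on entry, released when the call returns
            return func(*args, **kwargs)
    return wrapper


@api_call
def create_object(name):
    # Read-modify-write on shared state; safe because callers are serialized.
    next_id = len(_open_objects)
    _open_objects[name] = next_id
    return next_id


def run_threads(n_threads=8, per_thread=100):
    """Call the API from several threads and return the shared table."""
    with _global_lock:
        _open_objects.clear()

    def worker(t):
        for i in range(per_thread):
            create_object(f"t{t}/obj{i}")

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return _open_objects
```

Every call is safe, but no two calls ever overlap: a thread that only wants to read one dataset still waits for a thread that is writing another.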
Current Implementation
• Pros:
  • Current implementation provides thread-safety needed to avoid corruption of data structures
• Cons:
  • No concurrent use of HDF5 library by multi-threaded applications
Improving Concurrency
• Replace single global semaphore with semaphores that guard individual data structures
• Pros:
  • Greater level of concurrency
  • No corruption of internal data structures
  • Each thread waits only when it needs to modify a data structure locked by another thread
  • Reduces waiting time for a resource to become available
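A minimal sketch of per-structure locking, for contrast with the single global semaphore. It is written in Python for illustration only; `GuardedTable`, `chunk_cache`, `metadata_cache` and `run_concurrently` are hypothetical names and do not correspond to HDF5 internals.

```python
import threading


class GuardedTable:
    """A data structure that carries its own lock (hypothetical example)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def insert(self, key, value):
        with self._lock:  # blocks only threads that use this same table
            self._entries[key] = value

    def size(self):
        with self._lock:
            return len(self._entries)


# Two independent structures, loosely modeled on a chunk cache and a metadata cache.
chunk_cache = GuardedTable()
metadata_cache = GuardedTable()


def fill(table, prefix, count):
    for i in range(count):
        table.insert(f"{prefix}{i}", i)


def run_concurrently(count=1000):
    """Fill both tables from separate threads; neither thread blocks the other."""
    threads = [
        threading.Thread(target=fill, args=(chunk_cache, "chunk", count)),
        threading.Thread(target=fill, args=(metadata_cache, "meta", count)),
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return chunk_cache.size(), metadata_cache.size()
```

The sketch is easy only because each operation touches a single structure. Once an operation needs two locks at the same time, the order in which they are taken has to be consistent everywhere to avoid deadlock.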
Improving Concurrency
• Cons:
  • Replacing the global semaphore with individual semaphores, locks, etc. requires careful analysis of HDF5 data structures and their interactions
  • 300K lines of C code in library will require 4-6 FTE years of knowledgeable staff
  • Significant future maintenance effort
  • Testing challenges
Reducing Latency
• Reduce waiting time for each thread to acquire global semaphore
• Reduce time by removing known HDF5 bottlenecks:
  • I/O performance
  • "Compute bound" (CB) operations
    • Datatype conversions
    • Compression and other filters
  • General overhead
    • E.g., structures for storing and accessing chunked datasets and metadata
Reducing Latency: I/O Performance
• Support asynchronous I/O (AIO) access to data in HDF5 file
  • AIO initiated within the library in response to an API call
  • Completes in the background after API call has returned
  • Global semaphore is released when API call returns: less waiting time
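The AIO idea above can be sketched as follows: the API call holds the global lock only long enough to queue the request, and the actual write completes on a background worker. This is an illustrative Python sketch under assumed names (`write_async`, `_write_at`, `demo`); it does not show the HDF5 AIO interface.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor

_global_lock = threading.Lock()               # stand-in for the global semaphore
_io_pool = ThreadPoolExecutor(max_workers=2)  # background I/O workers


def _write_at(path, offset, data):
    # Runs in the background, outside the global lock.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
    return len(data)


def write_async(path, offset, data):
    """Queue a write and return at once; the caller waits on the Future later."""
    with _global_lock:  # held only while the request is set up
        return _io_pool.submit(_write_at, path, offset, bytes(data))


def demo():
    """Issue four writes, wait for them, and return (bytes written, file contents)."""
    fd, path = tempfile.mkstemp()
    os.write(fd, bytes(16))  # preallocate 16 zero bytes
    os.close(fd)
    try:
        futures = [write_async(path, i * 4, bytes([i]) * 4) for i in range(4)]
        written = sum(f.result() for f in futures)  # wait for completion
        with open(path, "rb") as f:
            return written, f.read()
    finally:
        os.remove(path)
```

The caller gets control back as soon as the request is queued, so other threads can enter the library while the disk is still busy.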
Reducing Latency: CB operations
• Use multiple threads within HDF5 library to
  • Perform datatype conversion
  • Perform compression on one chunk
    • Multiple threads work on one chunk
  • Perform compression on many chunks
    • Each thread works on a chunk
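A minimal sketch of the "each thread works on a chunk" case, using Python's standard `zlib` and a thread pool. The function names and the chunk size are assumptions made for the example; the case where several threads cooperate on a single chunk is not shown.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, chunk_size):
    """Cut a byte string into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def compress_chunks(chunks, workers=4):
    """Compress each chunk on a pool worker; results keep the chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, chunks))


def decompress_chunks(compressed):
    """Reassemble the original bytes from independently compressed chunks."""
    return b"".join(zlib.decompress(c) for c in compressed)
```

Because every chunk is compressed independently, the workers share no state and need no locking among themselves.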
Reducing Latency: General Optimizations
• Traditional optimizations
• Some examples:
  • Algorithm improvements for handling
    • Chunk cache
    • Hyperslab selections
    • Memory usage
  • Data structure improvements
    • Chunk indices with O(1) lookup speed
    • Advanced B-tree implementations
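One way to obtain O(1) chunk lookup, assuming a dataset of fixed size, is a flat array of chunk addresses indexed by the linearized chunk coordinates. The slides do not say which structure was planned, so the sketch below is only an illustration; `FixedArrayChunkIndex` and its methods are hypothetical names.

```python
class FixedArrayChunkIndex:
    """Maps element coordinates to chunk file addresses in constant time."""

    def __init__(self, dataset_shape, chunk_shape):
        self.chunk_shape = tuple(chunk_shape)
        # Number of chunks along each dimension (ceiling division).
        self.grid = tuple(-(-d // c) for d, c in zip(dataset_shape, chunk_shape))
        total = 1
        for n in self.grid:
            total *= n
        self.addresses = [None] * total  # None means "chunk not allocated"

    def _slot(self, element_coords):
        slot = 0
        for coord, chunk, n in zip(element_coords, self.chunk_shape, self.grid):
            slot = slot * n + coord // chunk  # row-major linearization
        return slot

    def set_address(self, element_coords, address):
        self.addresses[self._slot(element_coords)] = address

    def lookup(self, element_coords):
        return self.addresses[self._slot(element_coords)]
```

The lookup cost depends on the number of dimensions, not on the number of chunks, whereas a B-tree lookup grows with the logarithm of the chunk count.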
Reducing Latency
• Pros:
  • Smaller development effort, ~1.5 FTE years
  • Localized changes to the library
  • Easier to maintain
  • Incremental improvements
• Cons:
  • Still uses global semaphore
Decision
• Reduce Latency
• Decision factors:
  • Available expertise
  • Cost
  • Already funded features:
    • AIO
    • Using multiple threads to compress a chunk
  • Future maintainability
Other considerations
• Approaches are not mutually exclusive
• Both can be implemented in the future if funding is available