MS 15: Data-Aware Parallel Computing
• Data-Driven Parallelization in Multi-Scale Applications
– Ashok Srinivasan, Florida State University
• Dynamic Data Driven Finite Element Modeling of Brain Shape Deformation During Neurosurgery
– Amitava Majumdar, San Diego Supercomputer Center
• Dynamic Computations in Large-Scale Graphs
– David Bader, Georgia Tech
• Tackling Obesity in Children
– Radha Nandkumar, NCSA
www.cs.fsu.edu/~asriniva/presentations/siampp06
Data-Driven Parallelization in Multi-Scale Applications
Ashok Srinivasan
Computer Science, Florida State University
http://www.cs.fsu.edu/~asriniva
Aim: Simulate for long time spans
Solution features: Use data from prior simulations to parallelize the time domain
Acknowledgements: NSF, ORNL, NERSC, NCSA
Collaborators: Yanan Yu and Namas Chandra
Outline
• Background
– Limitations of Conventional Parallelization
– Example Application: Carbon Nanotube Tensile Test
• Small Time Step Size in Molecular Dynamics Simulations
• Data-Driven Time Parallelization
• Experimental Results
– Scaled efficiently to ~1000 processors, for a problem where conventional parallelization scales to just 2-3 processors
• Other time parallelization approaches
• Conclusions
Background
• Limitations of Conventional Parallelization
• Example Application: Carbon Nanotube
Tensile Test
– Molecular Dynamics Simulations
• Problems with Multiple Time-Scales
Limitations of Conventional Parallelization
• Conventional parallelization decomposes the state space across processors
– It is effective for a large state space
– It is not effective when computational effort arises from a large number of time steps
• … or when granularity becomes very fine due to a large number of processors
Example Application: Carbon Nanotube Tensile Test
• Pull the CNT at a constant velocity
– Determine stress-strain response and yield strain (when the CNT starts breaking) using MD
• Strain rate dependent
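As a rough illustration of the bookkeeping behind such a test, pulling one end at a constant velocity translates into engineering strain as follows (a minimal sketch; the function name and the numbers in the example are hypothetical, not from the talk):

```python
def engineering_strain(pull_velocity, elapsed_time, initial_length):
    """Engineering strain when one end of the specimen is pulled at a
    constant velocity: elongation divided by the initial length."""
    return pull_velocity * elapsed_time / initial_length

# Example in consistent units: pulling at 4.0 length-units/s for 0.25 s
# on a specimen of initial length 100.0 gives 1% strain.
strain = engineering_strain(4.0, 0.25, 100.0)
```

Because the strain at a given simulated time is fixed by the pull velocity, reaching a realistic strain at a realistic (low) strain rate requires a very long simulated time span, which motivates the time-parallel approach.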
A Drawback of Molecular Dynamics
• Molecular dynamics
– In each time step, the forces of atoms on each other are modeled using some potential
– After forces are computed, update positions
– Repeat for the desired number of time steps
• Time step size ~10^-15 seconds, due to physical and numerical considerations
– The desired time range is much larger
• A million time steps are required to reach 10^-9 s
• Around a day of computing for a 3000-atom CNT
• MD uses unrealistically large strain rates
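The per-step structure described above (compute forces, then update positions) can be sketched with a toy two-atom system. This is only illustrative: a hypothetical harmonic-spring force stands in for the real carbon potential, and units are dimensionless, whereas real MD steps are ~10^-15 s, which is why millions of steps are needed.

```python
def spring_force(x, k=1.0, x0=1.0):
    """Toy pair force: two atoms joined by a harmonic spring of rest
    length x0 (a stand-in for a real interatomic potential)."""
    stretch = (x[1] - x[0]) - x0
    return [k * stretch, -k * stretch]

def md_run(x, v, m, n_steps, dt=0.01):
    """Velocity-Verlet integration: half-kick, drift, recompute forces,
    half-kick, repeated for n_steps time steps."""
    f = spring_force(x)
    for _ in range(n_steps):
        v = [vi + 0.5 * dt * fi / m for vi, fi in zip(v, f)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        f = spring_force(x)
        v = [vi + 0.5 * dt * fi / m for vi, fi in zip(v, f)]
    return x, v
```

The cost of a long simulated time span is the serial loop over time steps: halving the wall-clock time by splitting the spatial work only helps while there is enough spatial work per processor, which is the limitation the next sections address.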
Problems with multiple time-scales
• Fine-scale computations (such as MD) are more accurate, but more time consuming
– Much of the detail at the finer scale is unimportant, but some of it matters
A simple schematic of multiple time scales
Data-Driven Time Parallelization
• Time parallelization
• Data Driven Prediction
– Dimensionality Reduction
– Relate Simulation Parameters
– Static Prediction
– Dynamic Prediction
• Verification
Time Parallelization
• Each processor simulates a different time interval
• Initial state is obtained by prediction, except for processor 0
• Verify if prediction for end state is close to that computed by MD
• Prediction is based on dynamically determining a relationship between the current simulation and those in a database of prior results
• If the time interval is sufficiently large, then the communication overhead is small
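The predict-and-verify structure above can be sketched on a toy one-dimensional system. Everything here is illustrative: the simple ODE, the tolerance, and the single-trajectory "database" are stand-ins for the MD simulation and the database of prior runs described in the talk.

```python
def exact_step(state, n_steps, dt=0.01):
    """Stand-in for the accurate fine-scale simulation (here dx/dt = -x,
    integrated with explicit Euler)."""
    for _ in range(n_steps):
        state = state - dt * state
    return state

def predict(t_index, database):
    """Predict the state at the start of interval t_index from prior
    results; the talk's method relates runs via reduced-order models."""
    return database[t_index]

def time_parallel(x0, n_intervals, steps_per_interval, database, tol=1e-2):
    """Each interval would run on its own processor; emulated serially.
    Interval i starts from a predicted state and is accepted only if that
    prediction matches the previous interval's computed end state."""
    starts = [x0] + [predict(i, database) for i in range(1, n_intervals)]
    ends = [exact_step(s, steps_per_interval) for s in starts]
    accepted = 1
    for i in range(1, n_intervals):
        if abs(starts[i] - ends[i - 1]) > tol:
            break  # verification fails; later intervals must be redone
        accepted += 1
    return ends, accepted
```

When every prediction verifies, all intervals are accepted after one parallel sweep; a failed verification at interval i means intervals i onward restart from the corrected state, which is where the quality of the data-driven predictor determines the speedup.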
Dimensionality Reduction
• Movement of atoms in a 1000-atom CNT can be considered the motion of a point in 3000-dimensional space
• Find a lower dimensional subspace close to which the points lie
• We use principal orthogonal decomposition
– Find a low dimensional affine subspace
• Motion may, however, be complex in this subspace
– Use results for different strain rates
• Velocity = 10 m/s, 5 m/s, and 1 m/s
• Dynamically choose the closest simulation for prediction
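A minimal sketch of the POD idea in two dimensions: mean-center the snapshots and take the dominant eigenvector of their covariance as the leading mode. The real computation works on 3000-dimensional atomic configurations, typically via an SVD of snapshot data; this closed-form 2x2 version is only illustrative.

```python
import math

def pod_direction(points):
    """Leading POD mode of mean-centered 2-D snapshots: the dominant
    eigenvector of the 2x2 covariance matrix, in closed form."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Larger eigenvalue of [[cxx, cxy], [cxy, cyy]]
    lam = 0.5 * (cxx + cyy + math.hypot(cxx - cyy, 2 * cxy))
    # Corresponding (unnormalized) eigenvector, then normalize
    vx, vy = lam - cyy, cxy
    norm = math.hypot(vx, vy)
    return (mx, my), (vx / norm, vy / norm)
```

Projecting a new trajectory onto the leading modes gives the low-dimensional coordinates in which states from the current run can be compared against the database of prior runs.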
[Figure: Speedup, 450 K at 2 m/s run vs. linear speedup]
[Figure: Stress-strain, exact 450 K result (blue) vs. 200-processor result (red)]
Other time parallelization approaches
• Waveform relaxation
– Repeatedly solve for the entire time domain
– Parallelizes well, but convergence can be slow
– Several variants to improve convergence
• Parareal approach
– Features similar to ours and to waveform relaxation
• Precedes our approach
– Not data-driven
– Sequential phase for prediction
– Not very effective in practice so far
• Has much potential to be improved
Conclusions
• Data-driven time parallelization shows significant improvement in speed, without sacrificing accuracy significantly
• Direct prediction is very effective when applicable
• The 980-processor simulation attained a flop rate of ~420 Gflops
– Its per-atom rate of 420 Mflops/atom is likely the largest in classical MD simulations
Future Work
• More complex problems
– Better prediction
• POD is good for representing data, but not necessarily for identifying patterns
• Use better dimensionality reduction / reduced order modeling techniques
• Use experimental data for prediction
– Better learning
– Better verification
– In CP8: Application of Dimensionality Reduction Techniques to Time Parallelization, Yanan Yu
• Tomorrow, 2:30 – 3:00 pm