Adaptive Parallel Simulation of a Two-Timescale Model for Apoptotic Receptor-Clustering on GPUs
Cooperation with M. Daub • G. Schneider

Extending the Scope of Approximate Computing to Scientific Computing and Simulation Technology
Dipl.-Inf. Alexander Schöll, Prof. Dr. Hans-Joachim Wunderlich
E-mail: [email protected], [email protected]
Institute of Computer Architecture and Computer Engineering

Motivation
Heterogeneous computer architectures
Goal: Efficient and fault-tolerant execution of simulation applications

Simulation on Reconfigurable Heterogeneous Architectures

Challenges
Reliability
• Simulation applications are often executed for days or months
• Modern CMOS devices are increasingly vulnerable to reliability threats
• Required: Fault-tolerant simulation algorithms
Achieving optimal performance
• Performance depends on the combination of implementation and utilized architecture

SimTech Cluster of Excellence
www.simtech.uni-stuttgart.de

Approximate Computing
• Trade off precision for efficiency
• Often limited to applications with inherent error tolerance
Applying approximate computing to simulation technology
• Tight accuracy constraints
• Often low error resilience

Acceleration of Markov-Chain Monte-Carlo Molecular Simulations
Cooperation with J. Castillo • J.
Groß

Markov-Chain Monte-Carlo (MCMC)
• Core of many tasks in thermodynamics
• Mapping to GPU: exploiting parallel energy calculations and speculative evaluation of Monte-Carlo moves
• Heterogeneous mapping to CPU and GPU results in significant speedups

Collaborations in SimTech

Current work

Figure: Molecular Configuration

Motivation
• Deeper understanding of the activation of apoptosis
Simulation: Dominated by extensive computing times
Goals
• Reduction of computation time
• … to obtain extensive and detailed conclusions about the clustering behavior
Computational Performance Results
• Adaptive discretization of time and heterogeneous mapping to CPU and GPU result in significant speedups
Biological Evaluation
Figure: Evolution of ligand-receptor clusters in less than 0.5 s

Preconditioned Conjugate Gradient (PCG): Important sparse linear system solver
• Iterative solving method
Goal: PCG on approximate hardware with guaranteed result accuracy
Challenges:
• Error resilience changes over time
• Overhead from additional operations to monitor error resilience
Solution:
• Use efficient fault tolerance to monitor and adapt approximation at runtime
Experimental Results
• Hardware utilization and iteration count compared to execution on precise hardware (plot axis: 50% to 130%)

[1] A. Schöll, C. Braun, M. A. Kochte, and H.-J. Wunderlich, "Efficient Algorithm-Based Fault Tolerance for Sparse Matrix Operations", in Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN'16, Toulouse, France, 28. June-1. July, 2016, pp. 251-262.
[2] A. Schöll, C. Braun, and H.-J. Wunderlich, "Applying Efficient Fault Tolerance to Enable the Preconditioned Conjugate Gradient Solver on Approximate Computing Hardware", in Proceedings of the 29th Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT'16, University of Connecticut, USA, 19.-20. September, 2016, pp. 21-26.
DFTS Best Paper Award 2016.
[3] A. Schöll, C. Braun, M. A. Kochte, and H.-J. Wunderlich, "Low-Overhead Fault-Tolerance for the Preconditioned Conjugate Gradient Solver", in Proceedings of the IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT'15, Amherst, MA, USA, 12.-14. October, 2015, pp. 60-66.
[4] C. Braun, S. Holst, J. Castillo, J. Groß, and H.-J. Wunderlich, "Acceleration of Monte-Carlo Molecular Simulations on Hybrid Computing Architectures", in Proceedings of the 30th IEEE International Conference on Computer Design, ICCD'12, Montreal, Canada, 30. September-3. October, 2012, pp. 207-212.
[5] A. Schöll, C. Braun, M. Daub, G. Schneider, and H.-J. Wunderlich, "Adaptive Parallel Simulation of a Two-Timescale Model for Apoptotic Receptor-Clustering on GPUs", in Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2014, Belfast, UK, 2.-5. November, 2014, pp. 424-431. SimTech Best Paper Award 2014.

Figure: Central Processing Unit (CPU), Graphics Processing Unit (GPU), and Field Programmable Gate Array (FPGA) components. Labels shown: x86-64, ARM, SPARC, Intel MIC, AMD Excavator, Intel Skylake, Nvidia Pascal, Xilinx Zynq, Xilinx Virtex, Altera Stratix.

Emerging Approximate Computing (AC) Paradigm
• Trade off precision for a gain in efficiency
• Required: Exploit inherent error tolerance of applications
Figure: Approximate Computing in image processing
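
For readers unfamiliar with the Preconditioned Conjugate Gradient (PCG) solver named in the poster, the following is a minimal sketch of the iteration on precise hardware. It assumes a Jacobi (diagonal) preconditioner, a small dense symmetric positive definite matrix stored as nested lists, and a relative residual tolerance. The function name `pcg_jacobi`, the parameters, and the example system are illustrative assumptions and are not taken from the poster or the cited papers. The sketch does not model approximate hardware or the fault-tolerance monitoring.

```python
def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A.

    Uses conjugate gradients with a Jacobi (diagonal) preconditioner.
    Returns the solution estimate and the number of iterations performed.
    """
    n = len(b)

    def matvec(M, v):
        # Dense matrix-vector product (a sparse format would be used in practice).
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner M^-1
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0              # avoid dividing by zero for b = 0

    for k in range(max_iter):
        # Stop once the relative residual norm is below the tolerance.
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                   # step length along p
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz                        # keeps directions A-conjugate
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


# Hypothetical 3x3 example system (symmetric, diagonally dominant).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iterations = pcg_jacobi(A, b)
```

The iteration count returned here is the quantity the poster compares against execution on precise hardware; each iteration is dominated by one matrix-vector product.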