ModSim Workshop 2014
Multi-scale, Multi-objective, Behavioral Modeling & Emulation of Extreme-scale Systems
* NSF Center for High-Performance Reconfigurable Computing (CHREC)
+ Center for Compressible Multiphase Turbulence (CCMT)
University of Florida
Dr. Herman Lam*+ Assoc. Professor of ECE
Assoc. Director of CHREC
Dr. Alan George*+ Professor of ECE
Director of CHREC
Dr. Greg Stitt*+ Assoc. Professor of ECE
Nalini Kumar*+
Carlo Pascoe*+
Dylan Rudolph*+ Graduate Research Assistants
Dept. of ECE
Outline: Overview
§ Background and goal
§ How to study Exascale w/o Exascale
§ Related works
* NSF Center for High-Performance Reconfigurable Computing
+ Predictive Science Academic Alliance Program
Background & Goal
Project Goal: Study Exascale before the existence of Exascale to provide advanced visibility for CMT studies
n Project conducted by researchers from CHREC*
q As part of the Center for Compressible Multiphase Turbulence (CCMT)
n CCMT supported by DOE, National Nuclear Security Administration
q Advanced Simulation and Computing Program (PSAAP+ Program)
q In first year of 5-year support
n CMT poses a grand-challenge problem
q Significant importance in many environmental, industrial, & national security applications
q Objective is for CMT simulation code to run on Exascale systems for fundamental breakthroughs
How to Study Exascale Systems?
q How may we study Exascale w/o Exascale?
§ Analytical studies – systems too complicated
§ Software simulation – simulations too slow at scale
§ Behavioral emulation – to be defined herein
§ Functional emulation – systems too massive and complex
§ Prototype device – future technology, does not exist
§ Prototype system – future technology, does not exist
q Many pros and cons with the various methods
§ We believe behavioral emulation is the foundation in terms of balance (accuracy, timeliness, scale, versatility)

Related Works
Reference list in Appendix
§ Device (micro-scale) & node (meso-scale) simulators
* Gilbert Hendry, Joseph Kenny, Jeremy Wilke, and Benjamin Allan, SST/macro Tutorial, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2013
§ System (macro-scale) simulators
§ ROSS: C. D. Carothers et al., 2013, 2002
§ SST MACRO: C. L. Janssen et al., 2010
§ FASE: Grobelny, Bueno, Troxel, George, and Vetter, 2007
§ BIGSIM: G. Zheng, G. Kakulapati, L. V. Kale, 2004
§ ISE: George, Fogarty, Markwell, and Miars, 1999
§ PARSIM: A. Symons, V. L. Narasimhan, 1995
n Component-based simulation
q Fundamental constructs called Behavioral Emulation Objects (BEOs)
q Characterize & represent Exascale application, devices, nodes, & systems as fabrics of interconnected Architecture BEOs & Application BEOs
n Multi-scale simulation
q Hierarchical method based upon experimentation and exploration
n Multi-objective simulation
q Performance, power, reliability, and other environmental factors
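The component-based idea above can be pictured in miniature. The following is a hypothetical Python sketch, not the actual BEO framework: class names, latency values, and the trace format are all illustrative assumptions. An application BEO replays an operation trace against architecture BEOs, each carrying a simple latency model, and the fabric accumulates predicted time.

```python
class ArchBEO:
    """Hypothetical architecture BEO: models one component's cost per operation."""
    def __init__(self, name, latencies):
        self.name = name
        self.latencies = latencies  # operation -> simulated cost (seconds)

    def service_time(self, op):
        return self.latencies[op]


class AppBEO:
    """Hypothetical application BEO: an ordered trace of (operation, target) pairs."""
    def __init__(self, trace):
        self.trace = trace


def emulate(app, fabric):
    """Replay the application trace over the architecture fabric,
    accumulating predicted wall-clock time (a sequential toy model)."""
    clock = 0.0
    for op, target in app.trace:
        clock += fabric[target].service_time(op)
    return clock


# Toy fabric of two interconnected architecture BEOs and one application BEO
fabric = {"cpu": ArchBEO("cpu", {"compute": 1e-6}),
          "nic": ArchBEO("nic", {"send": 5e-6, "recv": 5e-6})}
app = AppBEO([("compute", "cpu"), ("send", "nic"), ("recv", "nic")])
predicted = emulate(app, fabric)  # predicted makespan for this toy trace
```

A real behavioral emulator would of course interleave many such objects concurrently; this sketch only shows the characterize-then-replay structure.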
Apps | Arch BEO Models
Skeleton-apps | System BEO fabrics
§ Models abstracted from Meso-scale
§ Testbed experimentation in support
§ Notional Exascale system exploration
Mini-apps | Node BEO fabrics
§ Models abstracted from Micro-scale
§ Testbed experimentation in support
§ Notional Exascale node exploration
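One way to picture the multi-scale flow above: measurements at one scale calibrate a coarser model at the next. A hypothetical sketch follows, with all function names and numbers being illustrative assumptions rather than the project's actual models.

```python
def calibrate_node(device_times):
    """Micro-scale -> meso-scale: summarize measured per-device operation
    times (seconds) into a single per-task cost for the node model."""
    return sum(device_times)


def system_time(num_nodes, tasks_per_node, node_cost, link_latency):
    """Meso-scale -> macro-scale: perfectly parallel nodes followed by one
    all-to-one reduction over the interconnect (a deliberately toy assumption)."""
    return tasks_per_node * node_cost + (num_nodes - 1) * link_latency


# Device-level measurements (e.g., a compute op and a memory op) feed the node model,
# which in turn feeds a notional system-scale estimate.
node_cost = calibrate_node([2e-6, 3e-6])
estimate = system_time(num_nodes=1024, tasks_per_node=100,
                       node_cost=node_cost, link_latency=1e-6)
```

The point of the hierarchy is that each level only needs the abstracted outputs of the level below it, which is what keeps notional Exascale exploration tractable.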
… Stratix-III FPGAs), each with 4.25GB SDRAM
2010: 24 more ProcStar III cards (96 more Stratix-III FPGAs), each with 4.25GB SDRAM
2011: 24 ProcStar IV cards (96 top-end Stratix-IV FPGAs), each with 8.50GB SDRAM
2012: 24 more ProcStar IV cards (96 more Stratix-IV FPGAs), each with 8.50GB SDRAM
2014: 32 ProceV cards (32 top-end Stratix-V FPGAs), with high-speed 4x4x2 torus
Funded by U of Florida w/ generous help from Altera and GiDEL
Conclusions & Questions
What is the major contribution of your research?
q A novel approach for behavioral simulation & emulation of large systems and applications up to Exascale
n At multiple scales & for multiple objectives
q Use of reconfigurable hardware (FPGAs) to provide the performance and scalability required for study of extreme-scale systems

What is the bigger picture for your research area? (identify synergistic projects, complementary projects in a technical sense, etc.)
Big picture: DOE's $100M effort on Exascale architecture exploration
q Coarse-grained simulation approach (rapid virtual prototyping, RVP)
q Provide a first-order approximation for design-space exploration
q Complementary to other (detailed & slow) fine-grained simulation/emulation efforts
What are the gaps you identify in the research coverage in your area?
1) Characterizing processors, networks, apps, et al. at multiple scales (single device to Exascale system) with behavioral objects as surrogates
n DOE Co-design centers; FastForward and DesignForward for vendor roadmaps to Exascale; parallel coarse-grained and fine-grained simulators
2) Mapping these behavioral objects onto systems of reconfigurable processors to maximize the number and speed of these objects
3) Adapting synchronization and congestion-modeling techniques to support simulation experiments with millions of these behavioral objects
n Parallel large-scale network simulators
4) Measuring, managing, and visualizing complex behaviors in performance, resilience, and energy of systems and apps up to Exascale
n Visualization tools for extreme-scale systems
5) Augmenting initial focus upon performance evaluation of systems and apps to include evaluation of resilience and energy consumption
n Modeling and simulation tools for resilience and energy consumption
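Gap 3 above concerns keeping millions of behavioral objects causally consistent. As a toy illustration of one standard conservative approach used in parallel discrete-event simulation (a lower-bound-on-time-stamp horizon plus lookahead; the function names and values here are hypothetical, not from any particular simulator):

```python
def lbts(next_event_times):
    """Lower bound on time stamp: the earliest pending event across all ranks."""
    return min(next_event_times)


def safe_horizon(next_event_times, lookahead):
    """Events earlier than this time can be processed without risking an
    out-of-order message arriving later from another rank."""
    return lbts(next_event_times) + lookahead


# Next pending event time on each of three simulator ranks; each rank may
# safely process its events up to the shared horizon, then resynchronize.
ranks = [3.0, 5.5, 4.2]
horizon = safe_horizon(ranks, lookahead=1.0)
```

Scaling this kind of windowed synchronization to millions of objects, and combining it with congestion modeling, is precisely the open problem the gap identifies.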
(Looking for synergistic & complementary projects for leveraging & collaboration!)
CCMT
Appendix
References (1)
System (macro-scale) Simulators
q C. Engelmann and T. Naughton, "A Hardware/Software Performance/Resilience/Power Co-Design Tool for Extreme-scale Computing", Workshop on Modeling & Simulation of Exascale Systems & Applications, September 18-19, 2013. xSim
q C. D. Carothers, R. B. Ross, J. S. Vetter, et al., "Combining Aspen with Massively Parallel Simulation for Effective Exascale Co-Design", Workshop on Modeling & Simulation of Exascale Systems & Applications, September 18-19, 2013. ROSS
q R. Cledat, J. Fryman, I. Ganev, S. Kaplan, R. Khan, et al., "Functional Simulator for Exascale System Research", Workshop on Modeling & Simulation of Exascale Systems & Applications, September 18-19, 2013. FSim
q C. L. Janssen, H. Adalsteinsson, S. Cranford, J. P. Kenny, A. Pinar, D. A. Evensky, and J. Mayo, "A simulator for large-scale parallel architectures", International Journal of Parallel and Distributed Systems, vol. 1, no. 2, pp. 57-73, 2010. SST MACRO
q E. Grobelny, D. Bueno, I. Troxel, A. D. George, and J. S. Vetter, "FASE: A Framework for Scalable Performance Prediction of HPC Systems and Applications", Simulation, Vol. 83, No. 10, pp. 721-745, Oct. 2007. FASE
q L. Carrington, A. Snavely, and N. Wolter, "A performance prediction framework for scientific applications", Future Generation Computer Systems, 22(3), 336-346. PMaC
q G. Zheng, G. Kakulapati, L. V. Kale, "BigSim: A parallel simulator for performance prediction of extremely large parallel machines", 18th IPDPS, pp. 78, 2004. BIGSIM
References (2)
System (macro-scale) Simulators, continued
q A. D. George, R. B. Fogarty, J. S. Markwell, and M. D. Miars, "An Integrated Simulation Environment for Parallel and Distributed System Prototyping", Simulation, vol. 72, pp. 283-294, May 1999. ISE
q A. Symons and V. L. Narasimhan, "PARSIM - message PAssing computeR SIMulator", IEEE First International Conference on Algorithms and Architectures for Parallel Processing (ICAPP), vol. 2, pp. 621-630, 1995. PARSIM

Device (micro-scale) & Node (meso-scale) Simulators
q J. Wang, J. Beu, S. Yalamanchili, and T. Conte, "Designing Configurable, Modifiable and Reusable Components for Simulation of Multicore Systems", 3rd International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, November 2012. MANIFOLD
q M. Hsieh, R. Riesen, K. Thompson, W. Song, and A. Rodrigues, "SST: A Scalable Parallel Framework for Architecture-Level Performance, Power, Area and Thermal Simulation", Computer Journal, vol. 55, no. 2, pp. 181-191, 2012. SST MICRO
q N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, R. Sen, K. Sewell, M. Shoaib, N. Vaish, M. D. Hill, and D. A. Wood, "The gem5 simulator", SIGARCH Comput. Archit. News 39, 2 (August 2011), 1-7. GEM5
q S. R. Alam, R. F. Barrett, M. R. Fahey, J. M. Larkin, and P. H. Worley, "Cray XT4: An Early Evaluation for Petascale Scientific Simulation", 2007.
q A. Hoisie, G. Johnson, D. J. Kerbyson, M. Lang, and S. Pakin, "A Performance Comparison Through Benchmarking and Modeling of Three Leading Supercomputers: Blue Gene/L, Red Storm, and Purple", (November), 1-10, 2006.

Hardware Emulation
q Z. Tan, A. Waterman, H. Cook, S. Bird, K. Asanović, and D. Patterson, "A Case for FAME: FPGA Architecture Model Execution", ISCA '10, June 19-23, 2010, Saint-Malo, France, 290-301.
q J. Wawrzynek, D. A. Patterson, S. Lu, and J. C. Hoe, "RAMP: A Research Accelerator for Multiple Processors", 2006.