Computer Science and Mathematics Division Overview
Briefing for RAMS Program
Barney Maccabe, PhD
Director, Computer Science and Mathematics Division
Oak Ridge National Laboratory
January 21, 2009
Computer Science and Mathematics Division
Mission
Our mission includes basic research in computational sciences and the application of advanced computing systems and of computational, mathematical, and analysis techniques to the solution of scientific problems of national importance. We work collaboratively with universities throughout the world to enhance science education and progress in the computational sciences.
Vision
The Computer Science and Mathematics Division (CSMD) seeks to maintain its position as the premier location among DOE laboratories, and to become the premier location worldwide, where outstanding scientists, computer scientists, and mathematicians can perform interdisciplinary computational research.
The Computer Science and Mathematics Division (CSMD) is ORNL's premier source of basic and applied research in high-performance computing, applied mathematics, and intelligent systems. Basic and applied research programs are focused on computational sciences, intelligent systems, and information technologies.
Complex Systems
J. Barhen, P. A. Boyd
Y. Y. Braiman, C. W. Glover, T. S. Humble, N. Imam, S. M. Lenhart7, B. Liu5, L. E. Parker7, N. S. V. Rao, D. B. Reister7, F. M. Rudakov3, D. R. Tufano, B. D. Sullivan
Computer Science Research
G. A. Geist, C. Sonewald
P. K. Agarwal, D. E. Bernholdt, M. L. Chen, X. Cheng, J. J. Dongarra7, W. R. Elwasif, C. Engelmann, S. S. Hampton5, G. H. Kora8, J. A. Kohl, A. Longo2, X. Ma2, T. J. Naughton, H. H. Ong, B-H. Park, N. F. Samatova2, S. L. Scott, A. Tikotekar, A. Shet5a, G. R. Vallee, S. Vazhkudai, W. R. Wing, M. Wolf2
Statistics and Data Science
A. B. Maccabe (acting), L. E. Thurston
R. W. Counts, B. L. Jackson, G. Ostrouchov, L. C. Pouchard, D. D. Schmoyer, K. A. Stewart6, D. A. Wolf
Computational Earth Sciences
J. B. Drake, C. Sonewald
M. L. Branstetter, D. J. Erickson, K. J. Evans, M. W. Ham, F. M. Hoffman, R. T. Mills, P. H. Worley
Computational Mathematics
E. F. D'Azevedo, L. E. Thurston
V. Alexiades7, R. K. Archibald, B. V. Asokan5, V. K. Chakravarthy, R. Deiterding, G. I. Fann, S. N. Fata, L. J. Gray, C. S. Groer, R. Hartman-Baker6, J. C. Hill, J. Jia5, A. K. Khamayseh, M. R. Leuze7, V. Mantic2a, Y. Ou8, S. Pannala, P. Plechac2, O. Sallah2a
Computational Materials Science
J. C. Wells, K. L. Lewis
G. Alvarez6a, R. M. Day8, Y. Gao2, A. A. Gorin, I. Jouline2, P. R. Kent8, T. A. Maier6a, D. M. Nicholson, P. K. Nukala, B. Radhakrishnan, G. B. Sarma, S. Simunovic, X. Tao5, X-G. Zhang6a
Computational Chemical Sciences
R. J. Harrison, L. E. Thurston
E. Apra, J. Bernholc, A. Beste8, D. Crawford2, E. Cruz-Siva5, T. Fridman8, M. Goswami5, J. Huang5, W. Lu2, V. Meunier, M. B. Nardelli2, C. Pan, K. Saha5, W. A. Shelton, B. G. Sumpter, A. Vazquez-Mayagoitia
J. Cobb, Lead, TeraGrid
Future Technologies
J. S. Vetter, T. S. Darland
S. R. Alam, D. A. Bader2, C. B. McCurdy5, J. S. Meredith, K. J. Roche, P. C. Roth, T. L. Sterling2a, O. O. Storaasli, V. Tipparaju
ADVISORY COMMITTEE: Jerry Bernholc, David Keyes, Thomas Sterling, Warren Washington
Computer Science and Mathematics: A. B. Maccabe, Director
L. M. Wolfe, Division Secretary
S. W. Poole, Chief Scientist and Director of Special Programs
Operations Council
Finance: U. F. Henderson
Technical Information and Communications: B. A. Riley6b
Human Resources: N. Y. Wright
Recruiter: W. L. Peper
Facility 5600/5700: J. Gergel
Computer Security: J. G. Winfree6
ESH/Safety Officer: D. L. Pack
Quality Assurance: L. C. Pouchard6
CSB Computing Center Manager: M. W. Dobbs6
Division Training Officer: L. M. Wolfe
1 Co-op; 2 Joint Faculty / 2a ORISE Faculty; 3 Wigner Fellow; 4 Householder Fellow; 5 Postdoc / 5a Postmaster; 6 Dual Assignment / 6a Matrix / 6b Off-site Assignment; 7 Part-time; 8 JICS
Center for Molecular Biophysics: J. C. Smith, Director
Center for Engineering Science Advanced Research (CESAR): J. Barhen, Director
Org chart as of 1/7/2009
Computational Astrophysics
A. Mezzacappa6a, P. A. Boyd
C. Y. Cardall6a, E. Endeve5,6a, H. R. Hix6a, E. Lentz5,6a, B. Messer6a
Experimental Computing Laboratory (ExCL): J. S. Vetter, Director
Computational Engineering and Energy Sciences
J. A. Turner, T. S. Darland
Application Performance Tools
R. Graham, T. S. Darland
R. F. Barrett, L. G. Broto5, S. W. Hodson, T. R. Jones, G. A. Koenig, J. A. Kuehn, J. S. Ladd, T. C. Schulthess
Computer Science and Mathematics
• General CS Areas
− Architectures & Performance measurement
− Tools & I/O
− System software & Programming environments
• General Math Areas
− Computational Math (Linear Algebra, Meshing, Multiscale methods)
− Data Science
• "Rigorous Confidence Measures for Decision Making"
− V&V, Uncertainty Quantification, Sensitivity Analysis
Computational Science
• Materials Science
• Chemistry
• Astrophysics
• Molecular Biophysics
• Engineering and Energy Science
• Complex Systems
− Spanning virtual to physical
The Big Challenge: Computation at Scale
• The end goal is impact on science related to energy
• Computational Science provides the context
• Mathematics and Data Science provide the foundations
• Computer Science provides the architectures, tools and programming environments needed for computation
• How do we provide the right computational environment for science?
• How do we think differently about computation?
Institutes
• Institute for Advanced Architectures and Algorithms
− Math (Algorithms) and Computer Science (Architecture and Tools)
− ORNL and Sandia
• Extreme Scale System Center
− Focus on DARPA HPCS systems
− DoD, DARPA, DOE and ORNL
• Understand and influence architectures
Computer Science Research
• Heterogeneous Distributed Computing – PVM, Harness, Open MPI (NCCS)
• Holistic Fault Tolerance – CIFTS
• CCA – changing the way scientific software is developed and used
• Cluster computing management and reliability – OSCAR, MOLAR
• Building a new way to do Neutron Science – SNS portal
• Building tools to enable the LCF science teams – Workbench
• Data-Intensive Computing for Complex Biological Systems – BioPilot
• Earth System Grid – turning climate datasets into community resources (SDS, Climate)
• Robust storage management from supercomputer to desktop – FreeLoader
• UltraScience Net – defining the future of national networking
• Miscellaneous – electronic lab notebooks, Cumulvs, bilab, smart dust
Perform basic research and develop software and tools to make high performance computing more effective and accessible for scientists and engineers.
Sponsors include:
SciDAC
• Performance Engineering Research Institute
• Scientific Data Management
• Petascale Data Storage Institute
• Visualization (VACET)
• Fusion
• COMPASS
DOE Office of Science
• Fast OS – MOLAR
• Software Effectiveness
DOD
• HPC Mod Program
NSA
• Peta-SSI FASTOS
DARPA
• Peta-Scale Performance (DARPA HPCS)
LDRD
• Performance Tools for Large Scale Systems
• FPGA Programmability
• Perumulla
NCCS
• Vendor Interaction Evaluations
• Scientific Computing
Special Projects / Future Technologies
Research Mission: performs basic research in core technologies for future generations of high-end computing architectures and system software, including experimental computing systems, with the goal of improving the performance, reliability, and usability of these architectures for users.
Topics include
• Emerging architectures
− IBM Cell (i.e., PlayStation)
− Graphics processors (e.g., NVIDIA)
− FPGAs (e.g., Xilinx, Altera, Cray, SRC)
− Cray XMT multithreaded architecture
• Operating systems
− Hypervisors
− Lightweight kernels for HPC
• Programming systems
− Portable programming models for heterogeneous systems
• Parallel I/O
− Improving Lustre for Cray
• Performance modeling and analysis
− Improving performance on today's systems
− Modeling performance on tomorrow's systems (e.g., DARPA HPCS)
− Tools for understanding performance
• Applications
− Fusion
− Physics
− Bio
• Visualization
− New methods for CNMS
Complex Systems
Examples of current research topics:
• Missile defense: C2BMC (tracking and discrimination), NATO, flash hyperspectral imaging
• Modeling and simulation: sensitivity and uncertainty analysis, global optimization
• Laser arrays: directed energy, ultraweak signal detection, terahertz sources, underwater communications, SNS laser stripping
• Terascale embedded computing: emerging multicore processors for real-time signal processing applications (CELL, HyperX, ...)
• Anti-submarine warfare: ultra-sensitive detection, coherent sensor networks, advanced computational architectures for nuclear submarines, Doppler-sensitive waveforms, synthetic aperture sonar
• Quantum optics: cryptography, quantum teleportation
• Computer science: UltraScience network (40-100 Gb per λ)
• Intelligent systems: neural networks, mobile robotics
Mission: Support DOD and the Intelligence Community – Theory, Computation, Experiments
Sponsors: DOD (AFRL, DARPA, MDA, ONR, NAVSEA), DOE (SC), NASA, NSF, IC (CIA, DNI/DTO, NSA); UltraScience Net
Statistics and Data Science
• Chemical and Biological Mass Spectrometer Project
• Discrimination of UXO
• Forensics – Time Since Death
• A Statistical Framework for Guiding Visualization of Petascale Data Sets
• Statistical Visualization on Ultra High Resolution Displays
• Local Feature Motion Density Analysis
• Statistical Decomposition of Time Varying Simulation Data
• Site-wide Estimation of Item Density from Limited Area Samples
• Network Intrusion Detection
• Bayesian Individual Dose Estimation
• Sparse Matrix Computation for Complex Problems
• Environmental Tobacco Smoke (ETS)
• Explosive Detection Canines
• Fingerprint Uniqueness Research
• Chemical Security Assessment Tool (CSAT)
• Sharing a World of Data: Scaling the Earth Systems Grid to Petascale
• Group Violent Intent Modeling Project
• ORCAT: A desktop tool for the intelligence analyst
• Data model for end-to-end simulations with Leadership Class Computing
• A Knowledge-Based Middleware and Visualization Framework for the Virtual Soldier Project
• GWAVA: An Information Retrieval Web Application for the Virtual Autopsy
Computational Mathematics
• Development of multiresolution analysis for integro-differential equations and Y-PDE
• Boundary integral modeling of Functionally Graded Materials (FGMs)
• Large-scale parallel Cartesian structured adaptive mesh refinement
• Fast Multipole / non-uniform FFTs
• New large-scale first principles electronic structure code
• New electronic structure method
• Fracture of 3-D cubic lattice system
• Adventure system
• Eigensolver with Low-rank Upgrades for Spin-Fermion Models
The Science Case for Peta- (and Exa-) scale Computing
• Energy
− Climate, materials, chemistry, combustion, biology, nuclear fusion/fission
• Fundamental sciences
− Materials, chemistry, astrophysics
• There are many others
− QCD, accelerator physics, wind, solar, engineering design (aircraft, ships, cars, buildings)
• What are key system attribute issues?
− Processor speed
− Memory (capacity, B/W, latency)
− Interconnect (B/W, latency)
Computational Chemical Sciences
• Application areas
− Chemistry, materials, nanoscience, electronics
• Major techniques
− Atomistic and quantum modeling of chemical processes
− Statistical mechanics and chemical reaction dynamics
• Strong ties to experiment
− Catalysis, fuel cells (H2), atomic microscopy, neutron spectroscopy
• Theory
− Electronic structure, statistical mechanics, reaction mechanisms, polymer chemistry, electron transport
• Development
− Programming models for petascale computers
− Petascale applications
Computational Chemical Sciences is focused on the development and application of major new capabilities for the rigorous modeling of large molecular systems found in, e.g., catalysis, energy science, and nanoscale chemistry.
[Diagram] Chemical Science: a partnership of experiment, theory and simulation working towards shared goals – applied math, computer science, and theoretical & computational chemistry combine to produce new simulation capabilities.
Funding Sources: OBES, OASCR, NIH, DARPA
Science Case: Example – Chemistry
• 250 TF
− Accurate large-scale, all-electron, density functional simulations
− Absorption on transition metal oxide surface (important catalytic phenomenon)
− Benchmarking of density functionals (importance of Hartree-Fock exchange)
• 1 PF
− Dynamics of few-electron systems
− Model of few-electron systems interacting with intense radiation to a guaranteed finite precision
• Sustained PF
− Treatment of the absorption problem with larger unit cells to avoid any source of error
• 1 EF
− Extension of the interaction with intense radiation to more realistic systems containing more electrons
Design catalysts, understand radiation interaction with materials, quantitative prediction of absorption processes
System Attribute: Example - Memory B/W
• Application drivers
− Multi-physics, multi-scale applications stress memory bandwidth as much as they do node memory capacity
− "Intuitive" coding styles often produce poor memory access patterns: poor data structures, excessive data copying, indirect addressing
• Algorithm drivers
− Unstructured grids, linear algebra
− Ex: AMR codes work on blocks or patches, encapsulating small amounts of work per memory access to make the codes readable and maintainable; this sort of structure requires very good memory B/W to achieve good performance
• Memory B/W suffers in the multi-core future
− Will apps just have to "get used to not having any"?
• Methods to (easily) exploit multiple levels of hierarchical memory are needed
• Cache blocking, cache blocking, cache blocking (see the sketch below)
• Gather-scatter
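To make the cache-blocking point concrete, here is a minimal sketch (not from the briefing) of a tiled matrix-matrix multiply in C. The tile size `TILE` is a hypothetical tuning parameter; the idea is simply that each small block is reused many times while it is still cache-resident, instead of streaming full rows and columns through memory on every pass.

```c
#include <stddef.h>

#define TILE 64  /* tile edge; tune so three TILE x TILE blocks fit in cache */

/* C += A * B for n x n row-major matrices, with loop tiling (cache blocking).
   The three inner loops work on blocks that stay resident in cache. */
void dgemm_blocked(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE)
                for (size_t i = ii; i < ii + TILE && i < n; ++i)
                    for (size_t k = kk; k < kk + TILE && k < n; ++k) {
                        double aik = A[i * n + k];
                        for (size_t j = jj; j < jj + TILE && j < n; ++j)
                            C[i * n + j] += aik * B[k * n + j];
                    }
}
```

In practice the same effect is usually obtained by calling a tuned BLAS routine rather than hand-tiling, which is one reason the library usage surveyed later in this briefing matters.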
System Attribute: Interconnect Latency
• Application drivers
− Biology: the stated goal of 1 ms simulation time per wall-clock day requires 0.1 μs of wall-clock time per time step using current algorithms
− Others: chemistry, materials science
• Algorithm drivers
− Explicit algorithms using nearest-neighbor or systolic communication
− Medium- to fine-grain parallelization strategies (e.g., various distributed-data approaches in computational chemistry)
• The speed of light puts a fundamental limit on interconnect latency
− Yet raw compute power keeps getting faster, increasing the imbalance
• Path forward
− Need new algorithms to meet performance goals
− The combination of SW and HW must allow communication and computation to be fully overlapped (see the sketch below)
− Specialized hardware for common ops? Synchronization; global and semi-global reductions
− Vectorization/multithreading of communication?
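As an illustration of what "fully overlap communication and computation" means in code, here is a minimal sketch (not from the briefing) of a 1-D halo exchange with non-blocking MPI: the halo-independent interior work proceeds while the messages are in flight, and only the two boundary points wait for the exchange. The smoothing kernel is trivial and chosen purely for illustration.

```c
#include <mpi.h>

/* Interior points need no halo data; the two end points do.
   (A trivial 1-D Jacobi-style smoothing step, purely for illustration.) */
static void compute_interior(const double *u, double *unew, int n) {
    for (int i = 1; i < n - 1; ++i)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
}
static void compute_boundary(const double *u, double *unew, int n,
                             double halo_lo, double halo_hi) {
    unew[0]     = 0.5 * (halo_lo + u[1]);
    unew[n - 1] = 0.5 * (u[n - 2] + halo_hi);
}

/* Post the non-blocking halo exchange, do the halo-independent interior work
   while the messages are in flight, then finish the two boundary points. */
void step_overlapped(const double *u, double *unew, int n,
                     int lo_rank, int hi_rank, MPI_Comm comm)
{
    double halo_lo = 0.0, halo_hi = 0.0;
    MPI_Request req[4];

    MPI_Irecv(&halo_lo, 1, MPI_DOUBLE, lo_rank, 0, comm, &req[0]);
    MPI_Irecv(&halo_hi, 1, MPI_DOUBLE, hi_rank, 1, comm, &req[1]);
    MPI_Isend((void *)&u[0],     1, MPI_DOUBLE, lo_rank, 1, comm, &req[2]);
    MPI_Isend((void *)&u[n - 1], 1, MPI_DOUBLE, hi_rank, 0, comm, &req[3]);

    compute_interior(u, unew, n);              /* hides communication latency */

    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    compute_boundary(u, unew, n, halo_lo, halo_hi);
}
```

Whether this actually hides latency depends on the MPI implementation and network hardware making asynchronous progress, which is exactly the kind of SW/HW co-design issue the slide points to.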
Analysis of applications drives architecture
[Table] Software dependencies of the surveyed applications (DNS, MILC, NAMD, WRF, POP, HOMME, CICE, RMG, PARSEC, Espresso, LSMS, SPECFEM3D, Chimera, GTC, GAMESS), grouped into programming models (Fortran 77/90/95, C/C++, MPI, OpenMP, Pthreads, CAF/UPC, SHMEM, MPI-2, assembler), scaling aids, performance libraries (BLAS, FFTW, LAPACK, PBLAS, PETSc, ScaLAPACK, Cray SciLib), and I/O libraries (HDF5, MPI-IO, netCDF), with a mark for each model or library an application uses.
[Bar chart] Importance scores of system attributes across the application suite (maximum possible score = 97, minimum possible score = 19). Attributes scored: node peak flops, memory bandwidth, interconnect latency, memory latency, interconnect bandwidth, node memory capacity, disk bandwidth, local storage capacity, disk latency, WAN network bandwidth, mean time to interrupt, and archival storage capacity; individual scores ranged from 96 down to 38.
• System design driven by usability across a wide range of applications
• We understand the applications' needs and their implications for architectures to deliver science at the exascale
Engaging application communities to determine system requirements
Community engaged | Process of engagement | Outcome
Users | Enabling Petascale Science and Engineering Applications Workshop, December 2005, Atlanta, GA | Science breakthroughs requiring sustained petascale; community benchmarks
Application developers | Sustained Petaflops: Science Requirements, Application Design and Potential Breakthroughs Workshop, November 2006, Oak Ridge, TN | Application walk-throughs; identification of important HPC system characteristics
Application teams | Weekly conference calls, November 2006–January 2007 | Benchmark results; model problems
Application teams | Cray Application Projection Workshop, January 2007, Oak Ridge, TN | Projection of benchmarks on the XT4 to model problems on a sustained-petaflops system
Software Implementations
• Fortran still winning
• NetCDF and HDF5 use is widespread, but not their parallel equivalents (a typical serial netCDF write is sketched below)
• Widespread use of BLAS and LAPACK
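For concreteness, this is roughly what that widespread serial (non-parallel) netCDF usage looks like: a minimal sketch using the netCDF C API, with an arbitrary file name, dimension, and variable. In the serial pattern the survey points to, each rank typically writes its own file like this or funnels data through one rank, rather than using parallel netCDF or parallel HDF5.

```c
#include <stdio.h>
#include <netcdf.h>

/* Write a small 1-D array to "sample.nc" with the serial netCDF C API. */
int main(void)
{
    int ncid, dimid, varid;
    double data[4] = {1.0, 2.0, 3.0, 4.0};

    if (nc_create("sample.nc", NC_CLOBBER, &ncid) != NC_NOERR) return 1;
    nc_def_dim(ncid, "x", 4, &dimid);                    /* one dimension   */
    nc_def_var(ncid, "data", NC_DOUBLE, 1, &dimid, &varid);
    nc_enddef(ncid);                                     /* leave define mode */
    nc_put_var_double(ncid, varid, data);                /* write the variable */
    nc_close(ncid);
    return 0;
}
```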
Mission: Deploy and operate the computational resources needed to tackle global challenges
Vision: Maximize scientific productivity and progress on the largest-scale computational problems
• Climate change
• Terrestrial sequestration of carbon
• Sustainable nuclear energy
• Bio-fuels and bio-energy
• Clean and efficient combustion
• Nanoscale materials
• Energy, ecology and security
• Providing world class computational resources and specialized services for the most computationally intensive problems
• Providing a stable hardware/software path of increasing scale to maximize productive applications development
• Series of increasingly powerful systems for science
Plan for delivering exascale systems in a decade:
• FY2009: Cray "Baker", 1+ PF (AMD multi-core socket G3), with upgrade to 3-6 PF
• FY2011: 20 PF system, with upgrade to 40 PF
• FY2014: 100 PF system, with upgrade to 250 PF, based on disruptive technologies
• FY2017: 1 EF system, with upgrade to 2 EF
ORNL design for a 20 PF system
• 264 compute cabinets, 54 router cabinets, 25,000 uplink cables
− Up to ~20 m long
− 3.125 Gbps signaling
− Potential for optical links
• Fat-tree network
• 20 PF peak, 1.56 PB memory, 234 TB/s global bandwidth
• 6,224 SMP nodes, 112 I/O nodes
• Disk: 46 PB, 480 GB/s
• 7,600 square feet, 14.6 MW
Performance projected from 68 TF Cray XT4 to 20 PF Cray Cascade
Application | % Peak for benchmark | Projected time for model problem | Performance projection for model problem | Projected improvement in socket performance (max = 3200%)
DNS | 13.20% | 5.79 s/step | 1.5 PF |
MILC | 18-50% | 29.2 hours | 1 PF |
NAMD | 15.00% | 7 hours | 3.5 PF |
WRF | 11% | 22.63 hours | 1 PF | 1200%
POP | 3.80% | | | 965%
HOMME | 12% | | | 1200%
RMG | 36.50% | 310/iter | | 2200%
PARSEC | 35% | 1.3 hours/iter | | 2400%
Espresso | 36% | 786 s/iter | | 1080%
LSMS | 68% | | 10 PF | 2000%
Chimera | 19% | 4.825 s/step | | 1400%
GTC | 13.30% | | | 1100%
SPECFEM3D | 22% | 5.6 hours | 1.6 PF | 1550%
Bottlenecks: the original chart also flagged, per application, which of interconnect bandwidth, interconnect latency, interconnect injection bandwidth, node performance, node memory bandwidth, and problem size limited performance.
Processor Performance Trends
[Chart] Projected GFLOPS per socket by year, 2007-2012, under High, Low, Likely, and Disruptive technology scenarios (roughly 0 to 2,200 GFLOPS per socket over the period).
How do Disruptive Technologies change the game?
• Assume 1 TF to 2 TF per socket
• Assume 400 sockets/rack == 400 to 800 TF/rack
• 25 to 50 racks for a 20 PF system
• Memory technology investment needed to get the bandwidth without using 400,000 DIMMs
• Roughly 150 kW/rack
• Water-cooled racks
• Lots of potential partners: Sun, Cray, Intel, AMD, IBM, DoD
Building a 100-250 PF system
• Assume 5 TF to 10 TF per socket
• Assume 400 sockets/rack == 2 PF to 4 PF/rack
• 50 to 100 racks for a 100-250 PF system
• Memory technology investment needed to get the bandwidth without using 800,000 DIMMs
• Various 3D technologies/solutions could be available
• Partial photonic interconnect
• Roughly ~150 kW/rack
• Water-cooled racks
• Liquid-cooled processors
• Hybrid is certainly an option
• Potential partners: Sun, Cray, Intel, IBM, AMD, nVidia??
Building an Exaflop System
• Assume 20 TF to 40 TF per socket
• Assume 400 sockets/rack == 8 PF to 16 PF/rack
• 125 to 250 racks for a 1-2 EF system (see the arithmetic sketch below)
• Memory technology investment needed to get the bandwidth without using 1,000,000 DIMMs
• Various 3D technologies will be available, new memory design(s)
• All-photonic interconnect (no copper)
• Roughly ~250 kW/rack
• Water-cooled racks
• Liquid-cooled processors
• Hybrid is certainly an option
• Potential partners: Sun, Cray, Intel, IBM, AMD, nVidia??
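The rack counts on the 20 PF and exaflop slides follow from the same back-of-the-envelope arithmetic (racks = target flops / per-rack flops); the short sketch below, which is not from the briefing, simply reproduces it from the slides' assumed per-socket performance and 400 sockets per rack. The same formula covers the 100-250 PF case.

```c
#include <stdio.h>

/* Racks needed = target TF / (sockets per rack * TF per socket).
   Per-socket figures are the slides' assumptions, not measurements. */
static double racks(double target_tf, double tf_per_socket, int sockets_per_rack)
{
    return target_tf / (tf_per_socket * sockets_per_rack);
}

int main(void)
{
    /* 20 PF at 1-2 TF/socket and 400 sockets/rack -> 25 to 50 racks */
    printf("20 PF:  %.0f-%.0f racks\n", racks(20e3, 2, 400), racks(20e3, 1, 400));
    /* 1-2 EF at 20 TF/socket (8 PF/rack) -> 125 to 250 racks */
    printf("1-2 EF: %.0f-%.0f racks\n", racks(1e6, 20, 400), racks(2e6, 20, 400));
    return 0;
}
```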
What investments are needed?
• Memory bandwidth on processors
− Potential 2-10X
− Need 2-4X just to maintain the memory B/W per FLOP ratio
− Potential joint project with AMD and Sun to increase effective bandwidth
− Another possibility is 3D memory – assume $15M + partners
• Interconnect performance
− Working with IBM, Intel, Mellanox, Sun
− Define 8, 16, 32X (4, 12X already defined)
− Define ODR in IBTA
− Already showing excellent latency characteristics
− All optical (exists currently)
• Systems software, tools, and applications
• Proprietary networks: possible 10X, but large investment and risk; assume $20M
• Packaging density and cooling will continue to require investment
• Optical cabling/interconnect is a requirement, yet always in the future
Potential InfiniBand Performance
[Chart] Signaling rate (Gb/s) versus data rate generation (IB-SDR, IB-DDR, IB-QDR, IB-ODR) for link widths IB-4X, IB-8X, IB-12X, IB-16X, and IB-32X, spanning roughly 0 to 700 Gb/s.
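For context on those labels (a sketch under stated assumptions, not data from the briefing): an InfiniBand link's signaling rate is its lane count times the per-lane rate. SDR, DDR, and QDR lanes signal at 2.5, 5, and 10 Gb/s respectively; ODR was not yet defined in 2009, so 20 Gb/s per lane is assumed here, which is what pushes a 32X link toward the top of the chart.

```c
#include <stdio.h>

int main(void)
{
    /* Per-lane signaling rates in Gb/s. SDR/DDR/QDR are the defined IBTA
       rates; ODR did not exist in 2009, so 20 Gb/s is an assumption. */
    const char  *rate_name[] = {"SDR", "DDR", "QDR", "ODR (assumed)"};
    const double lane_gbps[] = {2.5, 5.0, 10.0, 20.0};
    const int    widths[]    = {4, 8, 12, 16, 32};   /* IB-4X ... IB-32X */

    for (int r = 0; r < 4; ++r)
        for (int w = 0; w < 5; ++w)
            printf("IB-%dX %-13s %6.1f Gb/s\n",
                   widths[w], rate_name[r], widths[w] * lane_gbps[r]);
    return 0;
}
```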
Strengths in five key areas will drive project success with reduced risk
• Exceptional operational expertise and experience
• In-depth applications expertise in house
• Strong partners
• Proven management team
• Driving architectural innovation needed for exascale
• Superior global and injection bandwidths
• Purpose-built for scientific computing
• Leverages DARPA HPCS technologies
• Broad system software development partnerships
• Experienced performance/optimization tool development teams
• Partnerships with vendors and agencies to lead the way
• Multidisciplinary application development teams
• Partnerships to drive application performance
• Science base and thought leadership
Petaflops to Exaflops:
• Power (reliability, availability, cost)
• Space (current and growth path)
• Global network access capable of 96 × 100 Gb/s