-
Understanding the building blocks of matter by solving Quantum Chromodynamics
SciDAC-3 PI Meeting, Rockville, Maryland, July 26th, 2013
David Richards, Jefferson Lab
Who we are
Why we are
Exploiting new architectures for LQCD
Algorithms for NP calculations
A sampling of Science
-
SciDAC-3 Scientific Computation Application Partnership project
Computing Properties of Hadrons, Nuclei and Nuclear Matter from QCD

Project Director: Frithjof Karsch, Brookhaven National Laboratory
Project Co-director for Science: David Richards, JLab
Project Co-director for Computation: Richard Brower, Boston University

Teams:
Frithjof Karsch - Brookhaven National Laboratory
Richard Brower - Boston University
Robert Edwards - Thomas Jefferson National Accelerator Facility
Martin Savage - University of Washington
John Negele - Massachusetts Institute of Technology
Rob Fowler - University of North Carolina (SUPER)
Andreas Stathopoulos - College of William and Mary
-
Lattice QCD for NP: Science
-
Lattice QCD for NP: Computation
Leadership-class Clusters + GPUs + MIC
Software + Algorithms + FASTMath
Multigrid with HYPRE for Lattice QCD, Andrew Pochinsky
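The point of the multigrid work is to remove the critical slowing down of plain Krylov solvers. The core idea can be shown with a toy two-grid V-cycle for a 1D Poisson operator - a minimal sketch of the technique, not the HYPRE interface or the actual lattice Dirac operator:

```python
import numpy as np

def poisson_matrix(n):
    """Tridiagonal 1D Poisson operator (toy stand-in for the lattice operator)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def two_grid_vcycle(A, b, x, n_smooth=3, omega=2.0 / 3.0):
    """One two-grid V-cycle: smooth, restrict the residual, solve the
    Galerkin coarse problem exactly, prolong the correction, smooth again."""
    n = len(b)
    D = np.diag(A)
    for _ in range(n_smooth):                 # pre-smoothing: weighted Jacobi
        x = x + omega * (b - A @ x) / D
    r = b - A @ x
    nc = (n - 1) // 2
    P = np.zeros((n, nc))                     # linear-interpolation prolongator
    for j in range(nc):
        i = 2 * j + 1
        P[i - 1, j] = 0.5
        P[i, j] = 1.0
        P[i + 1, j] = 0.5
    Ac = P.T @ A @ P                          # Galerkin coarse operator
    ec = np.linalg.solve(Ac, P.T @ r)         # exact coarse-grid solve
    x = x + P @ ec                            # coarse-grid correction
    for _ in range(n_smooth):                 # post-smoothing
        x = x + omega * (b - A @ x) / D
    return x

n = 63
A = poisson_matrix(n)
b = np.ones(n)
x = np.zeros(n)
for _ in range(10):
    x = two_grid_vcycle(A, b, x)
residual = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
```

The smoother kills the high-frequency error, the coarse grid kills the smooth error the smoother cannot touch; recursing on the coarse solve gives the full multigrid hierarchy that HYPRE provides.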
-
QDP++ and Chroma on GPUs
• QDP-JIT/C is production ready.
• QDP-JIT/PTX is fully featured; requires some small integration with QUDA.
• Have performed full 2+1-flavor gauge generation on Blue Waters with both variants.
• JIT/PTX has used Chroma’s internal solvers; JIT/C has used QUDA solvers.
• Paper submitted to SC’13.
• Prognosis: excellent; tasks will be complete by end of year.
QDP-JIT - Just-in-time compilation
AIM: put all of the application code on the GPU
Porting Lattice QCD Calculations to Novel Architectures: Balint Joo, Frank Winter
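QDP-JIT generates device code for QDP++ expressions at run time rather than ahead of time. The mechanism can be caricatured in a few lines of Python - an illustration of run-time code generation only; the real system emits PTX or C for the GPU, and the names here are invented:

```python
def make_axpy_kernel(alpha):
    """Generate and compile, at run time, a fused kernel for y <- alpha*x + y.
    In QDP-JIT the analogous step turns a whole QDP++ expression into a
    single generated GPU kernel, avoiding one launch per arithmetic op."""
    src = (
        "def kernel(x, y):\n"
        f"    return [{alpha} * xi + yi for xi, yi in zip(x, y)]\n"
    )
    namespace = {}
    exec(compile(src, "<jit>", "exec"), namespace)  # "compile" the generated source
    return namespace["kernel"]

axpy2 = make_axpy_kernel(2.0)
result = axpy2([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])  # [12.0, 24.0, 36.0]
```

The payoff of the real thing is the same as in this toy: the expression is known at run time, so the generated kernel can be specialized and fused before it is compiled.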
-
[Figure: Strong scaling, QUDA+Chroma+QDP-JIT(PTX) - sustained TFLOPS (0-450) vs. Titan nodes (GPUs, up to 4608), for BiCGStab and DD+GCR solvers at lattice volumes 72^3 x 256 and 96^3 x 256]
B. Joo, F. Winter (JLab), M. Clark (NVIDIA)
GPUs and Heterogeneous Architectures
• 2010: QUDA parallelized (SC ’10 paper)
• 2011: 256 GPUs on Edge cluster (SC ’11 paper)
• 2012: 768 GPUs on TitanDev (ACSS ’12 contribution, invited APS contribution)
• 2013: On Blue Waters (NVIDIA GTC contribution)
Solver performance
DD: domain-decomposed solver - architecture-aware
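The architecture-aware trade in the DD solver is to do extra local work inside each domain so that the preconditioner needs no inter-device communication. A minimal 1D additive-Schwarz sketch of that idea (my own toy, not the QUDA algorithm; block sizes and the damped-Richardson outer loop are illustrative choices):

```python
import numpy as np

def schwarz_precond(A, r, blocks):
    """Additive Schwarz: solve A restricted to each overlapping block
    independently. On a GPU machine each block lives on one device, so
    applying the preconditioner requires no inter-device communication."""
    z = np.zeros_like(r)
    for idx in blocks:
        Ab = A[np.ix_(idx, idx)]
        z[idx] += np.linalg.solve(Ab, r[idx])
    return z

n = 64
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)     # toy SPD operator
b = np.ones(n)
half = n // 2
blocks = [np.arange(0, half + 4), np.arange(half - 4, n)]  # 8-point overlap

# Damped Richardson iteration preconditioned by additive Schwarz
x = np.zeros(n)
for _ in range(300):
    x = x + 0.5 * schwarz_precond(A, b - A @ x, blocks)
residual = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
```

In production the Schwarz step sits inside a GCR outer iteration and the domain solves run in reduced precision, which is where the extra solver performance on the scaling plot comes from.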
-
[Figure: Time for trajectory 226 (sec) vs. XK7 nodes (up to 1792), comparing CPU, QDP-JIT (PTX), and QDP-JIT (PTX) + QUDA]
Anisotropic Clover HMC. V=48x48x48x512, physical-pion-mass attempt, NCSA Blue Waters
Gauge Generation: HMC - I
Aim: to put the whole of the HMC code on the GPU, so as to exploit it fully
Strong scaling for anisotropic clover HMC - key for spectroscopy; V = 48^3 x 512
QUDA solver
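For orientation, the HMC algorithm being ported can be sketched for a trivial one-dimensional Gaussian action - an illustration only; the production code evolves gauge fields under the anisotropic clover action, with QUDA supplying the solves inside the fermion force:

```python
import math
import random

def hmc_step(x, action, grad, n_leapfrog=20, eps=0.1, rng=random):
    """One HMC trajectory: refresh the momentum, integrate with leapfrog,
    then accept/reject with a Metropolis test on the energy change."""
    p = rng.gauss(0.0, 1.0)                      # refresh conjugate momentum
    x_new, p_new = x, p
    p_new -= 0.5 * eps * grad(x_new)             # opening half-step
    for i in range(n_leapfrog):
        x_new += eps * p_new                     # full position step
        if i < n_leapfrog - 1:
            p_new -= eps * grad(x_new)           # full momentum step
    p_new -= 0.5 * eps * grad(x_new)             # closing half-step
    dH = (0.5 * p_new ** 2 + action(x_new)) - (0.5 * p ** 2 + action(x))
    if dH < 0 or rng.random() < math.exp(-dH):   # Metropolis accept/reject
        return x_new, True
    return x, False

# Sample a unit Gaussian: S(x) = x^2/2, so dS/dx = x
random.seed(7)
action = lambda x: 0.5 * x * x
grad = lambda x: x
samples, x = [], 0.0
for _ in range(5000):
    x, _accepted = hmc_step(x, action, grad)
    samples.append(x)
mean = sum(samples) / len(samples)
var = sum(s * s for s in samples) / len(samples)
```

Almost all of the cost in the lattice version is in `grad` - the fermion force, which requires Dirac-operator inversions - which is why moving the solver and then the whole update chain onto the GPU dominates the trajectory time.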
-
Gauge Generation: HMC - II
Current production running
-
Intel Xeon Phi
[Figure: Preconditioned Wilson CG in single precision - GFLOPS (roughly 93-237) for uncompressed vs. compressed gauge fields on Intel® Xeon® E5-2680 (SNB-EP), Intel® Xeon Phi™ (KNC) 5110P, and Intel® Xeon Phi™ (KNC) B1PRQ-7110P, at lattice volumes V=24x24x24x128, 32x32x32x128, 40x40x40x96, 48x48x24x64, 32x40x24x96]
from B. Joo, D. Kalamkar, K. Vaidyanathan, M. Smelyanskiy, K. Pamnany, V. Lee, P. Dubey, W. Watson III, ISC 2013, LNCS 7905, pp. 40-54, 2013
KNC 5110P: 60 cores @ 1.053 GHz, MPSS 2.1.4346-16 (Gold)
KNC B1PRQ-7110P: 61 cores @ 1.1 GHz, B1 stepping, MPSS 2.1.3552-1, 7936 MB [email protected] GHz
Collaboration with Intel Parallel Labs
• Performance competitive with NVIDIA
• Exploit e.g. Stampede
-
Typical LQCD Workflow
• Generate the configurations - leadership level: few big jobs, few big files
• Analyze - typically mid-range level: many small jobs, many big files, I/O movement
• Extract - extract information from measured observables
IMPORTANT FOR NP
-
Correlation functions: Distillation
• Use the new “distillation” method (M. Peardon et al., PRD 80, 054506 (2009)): smear the quark fields with the low-lying eigenvectors of the gauge-covariant Laplacian.
• Truncate the sum at sufficient i to capture the relevant physics modes - we use 64 - and set the “weights” f to be unity.
• Decompose the meson correlation function using the “distillation” operator into eigenvectors of the Laplacian and “perambulators”.
This is a capacity-computing task: a solver with many RHS -> GPUs
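Spelled out (my reading of the slide, following Peardon et al., PRD 80, 054506 (2009)), the construction is:

```latex
% Distillation smearing operator: a rank-N projector built from the
% lowest N eigenvectors \xi^{(i)} of the gauge-covariant 3D Laplacian
\Box_{xy}(t) = \sum_{i=1}^{N} f(\lambda_i)\, \xi^{(i)}_x(t)\, \xi^{(i)\dagger}_y(t),
\qquad f(\lambda_i) = 1, \quad N = 64 .
% A meson two-point function built from distilled fields factorizes into
% operator matrices \Phi and perambulators \tau:
C(t',t) = \mathrm{Tr}\big[\, \Phi(t')\, \tau(t',t)\, \Phi(t)\, \tau(t,t') \,\big],
\qquad \tau_{ij}(t',t) = \xi^{(i)\dagger}(t')\, M^{-1}\, \xi^{(j)}(t) .
```

The perambulators τ are the only objects that need Dirac-operator inversions - one solve per eigenvector, spin component and source time-slice - which is exactly the many-right-hand-side capacity workload the slide refers to.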
-
All-Mode Averaging - I
Chulwoo Jung, BNL
T. Blum, T. Izubuchi, E. Shintani, arXiv:1208.4349
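The all-mode-averaging estimator of Blum, Izubuchi and Shintani (sketched here from the paper, not from the slide) replaces most exact solves by cheap approximate ones and corrects the bias:

```latex
% Improved (unbiased) estimator: many cheap "sloppy" solves plus a
% bias-correction term computed with the exact solver at a few sources
O^{(\mathrm{imp})} = \frac{1}{N_G} \sum_{g \in G} O^{(\mathrm{appx})}_g
  \;+\; \Big( O - O^{(\mathrm{appx})} \Big) .
```

Here the sum runs over N_G symmetry transformations g (e.g. source translations) of the approximate observable, computed with a relaxed solver, while the correction in parentheses uses the exact solver at only a few sources; the estimator is unbiased provided O^(appx) is covariant under G.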
-
All-Mode Averaging - II
Electromagnetic form factor of the proton
Andreas Stathopoulos - AM/CS at William and Mary - development of methods for multiple right-hand sides
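One family of multiple-right-hand-side tricks recycles information from earlier solves. The sketch below seeds each new CG solve with a Galerkin projection onto the span of previous solutions - a simplified stand-in of my own, not Stathopoulos's actual eigCG/deflation machinery:

```python
import numpy as np

def cg(A, b, x0, tol=1e-10, max_iter=500):
    """Plain conjugate gradients for SPD A; returns solution and iteration count."""
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    rr = r @ r
    for k in range(max_iter):
        if np.sqrt(rr) < tol * np.linalg.norm(b):
            return x, k
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x, max_iter

def solve_many(A, B):
    """Solve A x = b for each column of B, seeding each solve with a
    Galerkin projection onto the span of previously found solutions."""
    X, iters = [], []
    for b in B.T:
        if X:
            V = np.column_stack(X)                 # previous solutions
            # x0 = V y, with y minimizing the A-norm error: (V^T A V) y = V^T b
            y = np.linalg.solve(V.T @ (A @ V), V.T @ b)
            x0 = V @ y
        else:
            x0 = np.zeros_like(b)
        x, k = cg(A, b, x0)
        X.append(x)
        iters.append(k)
    return np.column_stack(X), iters

rng = np.random.default_rng(0)
n, nrhs = 100, 8
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                        # SPD test operator
B = rng.standard_normal((n, 2)) @ rng.standard_normal((2, nrhs))  # correlated RHS
X, iters = solve_many(A, B)
err = np.linalg.norm(A @ X - B) / np.linalg.norm(B)
```

Because form-factor sources are highly correlated, later right-hand sides start very close to their solutions and converge in far fewer iterations - the same effect the production deflation methods exploit.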
-
Wick Contraction Methods
K. Orginos, W. Detmold
Nuclear Physics calculations involve many contractions...
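The combinatorial cost behind this slide: pairing n quark fields of one flavour at the source with n at the sink gives n! Wick contractions, each carrying a fermionic sign. A naive enumeration (toy counting sketch, ignoring spin and colour structure) makes the growth explicit:

```python
from itertools import permutations
from math import factorial

def wick_contractions(n_quarks):
    """Enumerate pairings of n source quarks with n sink antiquarks.
    Each permutation is one Wick contraction; its sign is the permutation
    parity, as required for anticommuting quark fields."""
    contractions = []
    for perm in permutations(range(n_quarks)):
        # count inversions -> parity -> fermionic sign of this contraction
        inversions = sum(
            1
            for i in range(n_quarks)
            for j in range(i + 1, n_quarks)
            if perm[i] > perm[j]
        )
        sign = -1 if inversions % 2 else +1
        contractions.append((perm, sign))
    return contractions

# A deuteron-like six-quark system already needs 6! = 720 contractions per
# flavour pairing; larger nuclei are what motivate the improved contraction
# algorithms of Detmold and Orginos.
c3 = wick_contractions(3)
assert len(c3) == factorial(3)
```

The improved methods reorganize this sum (e.g. into recursively built blocks) so the cost no longer scales as the raw factorial.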
-
Momentum-dependent Phase Shifts
[Figure: phase shift (0-180 degrees) vs. centre-of-mass energy (800-1050 MeV)]
Extension to I = 1 ππ elastic scattering
I = 2 ππ elastic scattering
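A momentum-dependent phase shift like this is typically described by a Breit-Wigner form to extract the resonance mass and width. A sketch with invented, purely illustrative parameter values (not the fitted lattice results):

```python
import math

def breit_wigner_phase(E, m_R, g):
    """Elastic P-wave Breit-Wigner phase shift (degrees) for pi-pi scattering.
    E and m_R in MeV; g is the dimensionless resonance coupling."""
    m_pi = 139.57
    k = math.sqrt(E ** 2 / 4.0 - m_pi ** 2)            # pion momentum in the CM frame
    gamma = (g ** 2 / (6.0 * math.pi)) * k ** 3 / E ** 2  # P-wave width
    delta = math.atan2(E * gamma, m_R ** 2 - E ** 2)      # phase shift
    return math.degrees(delta)

# With illustrative values m_R = 775 MeV, g = 6, the phase rises through
# 90 degrees at the resonance mass:
below = breit_wigner_phase(700.0, 775.0, 6.0)
at_res = breit_wigner_phase(775.0, 775.0, 6.0)
above = breit_wigner_phase(850.0, 775.0, 6.0)
```

On the lattice, the discrete finite-volume energies are converted to phase-shift points via the Lüscher method before any such parametrization is fit.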
[Figure: meson spectrum (mass scale 0.5-2.5) showing isoscalar and isovector states, exotics, and the YM glueball, for negative and positive parity]
-
Hadron Structure
Momentum fraction of quarks in proton
How is spin apportioned in a proton?
-
Three- and Four-body systems
[Figure: binding energies −B (MeV, 0 to −200) of two-, three- and four-body systems - d, nn, nΣ, H-dib, nΞ, 3He, 3ΛH, 3ΛHe, 3ΣHe, 4He, 4ΛHe, 4ΛΛHe - grouped by strangeness s = 0, −1, −2, with J^P labels]
Binding energies at physical strange-quark mass
-
Chiral Transition Temperature
-
OUTLOOK
• Effective use of JIT to exploit accelerated architectures is the basis of work with SUPER to develop a domain-specific compiler for LQCD; apply the lessons to other domain-specific frameworks
• Effective exploitation of Xeon Phi
• Collaboration with FASTMath for multigrid methods
• Lattice QCD for nuclear physics is characterised by heavy capacity requirements as well as capability requirements
  - Methods for multiple RHS
  - Wick-contraction methods
• Exciting physics