U.S. Department of Energy’s Office of Science Midrange Scientific Computing Requirements Jefferson Lab Robert Edwards October 21, 2008.

Page 1

U.S. Department of Energy’s Office of Science

Midrange Scientific Computing Requirements

Jefferson Lab

Robert Edwards

October 21, 2008

Page 2

Office of Science

U.S. Department of Energy

Jefferson Lab

• Lab doubling beam energy
• Adding new experimental Hall
• CD-3: JLab Receives DOE Approval to Start Construction of $310 Million Upgrade

Page 3

Science Goals

• Understanding structure, spectroscopy and interactions of hadrons from QCD is the central challenge of nuclear physics
• How are charge, current and spin distributed in the nucleon?
• What are the effective degrees of freedom describing the low-energy spectrum of the theory?
• How do the nucleon-nucleon and hadron-hadron interactions arise from QCD?

Page 4

NP Milestones in Hadron Physics

• NP2009 and NP2012: measurement & determination of mass & electromagnetic properties of low-lying baryons (hadrons)
• New Hall D: seek information about exotic measurements. Flagship of JLab upgrade
• NP2014: determination of Generalized Parton Distributions within nucleons. These characterize how quarks interact with nucleons

Page 5

Computing efforts in support of mission

• Experimental physics data acquisition, storage, and analysis (farm computing)
• Lattice QCD theory calculations of fundamental quantities

Page 6

Theory context: USQCD Collaboration

• Consists of nearly all high energy and nuclear physicists in the US involved in lattice QCD. Formed nine years ago to develop infrastructure for these studies
• Research directly in support of DOE experimental facilities at BNL, FNAL, JLab
• SciDAC I & II: software development for lattice QCD
  • FY06-11: ~$2.0M/yr HEP+NP, $0.4M/yr JLab
• USQCD Facilities: LQCD I – Midrange computing
  • FY06-09: ~$2.5M/yr HEP+NP
  • Access to USQCD dedicated hardware is allocated within a peer-reviewed process
• LQCD II: FY10-14 – proposal submitted & reviewed

Page 7

Example: Spectroscopy

• Experimental and ab initio N* and Exotic-meson programs aim at discovering effective degrees of freedom of QCD; NP2009 and NP2012 milestones
• Excited Baryon Analysis Center (EBAC) at Jefferson Lab
• Spectroscopy of Exotic Mesons is a flagship component of CEBAF@12GeV

[Figure: Nucleon spectrum (MeV). Vertical axis from 550 to 3300 MeV; columns labeled by spin-parity ½+, 3/2+, 5/2+, ½-, 3/2-, 5/2-.]

Page 8

Nature of the LQCD calculations

Calculations are performed in two steps:

• Monte Carlo methods are used to generate gauge configurations with a probability proportional to their weight in the Feynman path integrals that define QCD
• These configurations are stored, and used to calculate a wide variety of physical observables
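The two-step workflow can be illustrated with a toy model. The sketch below is not lattice QCD: it assumes a free scalar field on a small periodic one-dimensional lattice as a stand-in for gauge fields, and every function name and parameter value is hypothetical. Step 1 draws configurations with probability proportional to exp(-S) using a Metropolis update and stores them; step 2 reuses the stored ensemble to measure more than one observable.

```python
import math
import random


def local_action_change(phi, i, new_value, mass2):
    """Change in S = sum_i [(phi[i+1]-phi[i])^2/2 + mass2*phi[i]^2/2] when site i is updated."""
    n = len(phi)
    neighbours = phi[(i + 1) % n] + phi[(i - 1) % n]
    old = phi[i]
    return ((new_value ** 2 - old ** 2) * (1.0 + 0.5 * mass2)
            - (new_value - old) * neighbours)


def generate_configurations(n_sites, mass2, n_configs, sweeps_between=10,
                            thermalisation=200, step=1.0, seed=1):
    """Step 1: Metropolis sampling with probability proportional to exp(-S)."""
    rng = random.Random(seed)
    phi = [0.0] * n_sites
    stored = []
    total_sweeps = thermalisation + n_configs * sweeps_between
    for sweep in range(1, total_sweeps + 1):
        for i in range(n_sites):
            proposal = phi[i] + rng.uniform(-step, step)
            d_s = local_action_change(phi, i, proposal, mass2)
            # Accept with probability min(1, exp(-dS)).
            if d_s <= 0.0 or rng.random() < math.exp(-d_s):
                phi[i] = proposal
        # Store a copy every `sweeps_between` sweeps once thermalised.
        if sweep > thermalisation and (sweep - thermalisation) % sweeps_between == 0:
            stored.append(list(phi))
    return stored


def field_squared(config):
    """Observable 1: lattice average of phi^2."""
    return sum(x * x for x in config) / len(config)


def correlator(config, t):
    """Observable 2: translation-averaged two-point function at separation t."""
    n = len(config)
    return sum(config[i] * config[(i + t) % n] for i in range(n)) / n


def ensemble_average(configs, observable):
    """Step 2: average an observable over the stored ensemble."""
    return sum(observable(c) for c in configs) / len(configs)


configs = generate_configurations(n_sites=8, mass2=1.0, n_configs=500)
phi2 = ensemble_average(configs, field_squared)
c1 = ensemble_average(configs, lambda c: correlator(c, 1))
print(len(configs), phi2, c1)
```

Because the ensemble is stored, adding a new observable only requires another pass over `configs`, not a new Monte Carlo run.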

Page 9

Nature of the LQCD calculations (2)

• Leadership machines: gauge generation uses large core count, single sequence of computing, requires multi TF sustained on one job
• Midrange machines: analysis jobs typically smaller core count, each configuration an independent job, typically a few 100 GF sustained
• Fine grained parallelism; regular hypercubic problems
• Computations and communication equally important; low latency required
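One way to see why communication matters as much as computation on a regular hypercubic problem is the surface-to-volume ratio of the sub-lattice held by each process. The sketch below assumes a four-dimensional lattice split evenly in every direction with nearest-neighbour halo exchange; the global extent of 32 and the function names are hypothetical, chosen only for illustration.

```python
def sub_lattice_extent(global_extent, ranks_per_dim):
    """Sites per direction held by one process after an even split."""
    if global_extent % ranks_per_dim != 0:
        raise ValueError("global extent must divide evenly among ranks")
    return global_extent // ranks_per_dim


def halo_fraction(extent, dims=4):
    """Halo sites exchanged per local site for a hypercubic sub-lattice.

    Each of the 2*dims faces holds extent**(dims-1) sites, so the ratio
    is 2*dims/extent: it grows as the sub-lattice shrinks.
    """
    volume = extent ** dims
    halo = 2 * dims * extent ** (dims - 1)
    return halo / volume


for ranks_per_dim in (2, 4, 8):
    extent = sub_lattice_extent(32, ranks_per_dim)
    print(ranks_per_dim ** 4, "processes:", extent, "sites per direction,",
          halo_fraction(extent), "halo sites per local site")
```

Spreading a fixed lattice over more processes shrinks the local extent and raises the fraction of sites that must be communicated on every operator application, which is consistent with the low-latency requirement stated above.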

Page 10

Leadership Machines

• USQCD aggressively pursuing national resources:
  • INCITE 07: ORNL: Cray XT4 - 10M-hrs (largest allocation)
  • INCITE 08-10: yearly allocations, will increase
    • ORNL: Cray XT4 - 7M-hrs
    • ANL: BG/P – 20M-hrs
  • ESP 08: ANL: BG/P - 250M-hrs [11 TF-yr]
  • ESP 09: ORNL: XT-5 ~ 60M-hrs ??? [7 TF-yr]
• Groups (JLab) also pursuing NSF + other resources
  • 2007: 1 TF-yr (PSC+SDSC)
  • 2008: 2 TF-yr (PSC+SDSC+TACC)
  • 2008: 1 TF-yr (LANL/NNSA)
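The two ESP entries quote both an allocation in hours and an equivalent in TF-yr, which implies a sustained rate per core. The sketch below works that out, assuming "M-hrs" means millions of core-hours and a year of 365.25 days; both are assumptions, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumed length of a year in hours


def implied_gflops_per_core(tf_years, million_core_hours):
    """Sustained GF per core implied by a TF-yr figure and a core-hour allocation."""
    core_years = million_core_hours * 1e6 / HOURS_PER_YEAR
    return tf_years * 1e3 / core_years


esp08 = implied_gflops_per_core(tf_years=11, million_core_hours=250)  # ANL BG/P
esp09 = implied_gflops_per_core(tf_years=7, million_core_hours=60)    # ORNL XT-5
print(round(esp08, 2), "GF/core sustained (ESP 08)")
print(round(esp09, 2), "GF/core sustained (ESP 09)")
```

Under these assumptions the quoted figures correspond to roughly 0.4 GF per core on BG/P and roughly 1 GF per core on the XT-5.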

Page 11

LQCD Computing Project Hardware (Midrange)

Year  Computer  Site  Nodes  Performance (TF/s)
2002  QCD       FNAL    127  0.15
2004  4g        JLab    384  0.36
2005  Pion      FNAL    518  0.86
2005  QCDOC     BNL   12288  4.20
2006  6n        JLab    256  0.62
2006  Kaon      FNAL    600  2.56
2007  7n        JLab    396  2.98
2008  J/Psi     FNAL    400  6.00

Labels alongside the table: w/ OASCR, SciDAC I, LQCD I
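As a quick reading aid, the listed performance can be divided by the node count to show how per-node throughput grew from one machine to the next. The sketch below copies the rows from the hardware table on this slide; the helper name is hypothetical, and the result is simple arithmetic on the listed figures rather than an independent measurement.

```python
# (year, computer, site, nodes, listed performance in TF/s), copied from the table
MACHINES = [
    (2002, "QCD", "FNAL", 127, 0.15),
    (2004, "4g", "JLab", 384, 0.36),
    (2005, "Pion", "FNAL", 518, 0.86),
    (2005, "QCDOC", "BNL", 12288, 4.20),
    (2006, "6n", "JLab", 256, 0.62),
    (2006, "Kaon", "FNAL", 600, 2.56),
    (2007, "7n", "JLab", 396, 2.98),
    (2008, "J/Psi", "FNAL", 400, 6.00),
]


def gflops_per_node(nodes, tflops):
    """Listed performance per node, converted from TF/s to GF/s."""
    return tflops * 1000.0 / nodes


per_node = {name: gflops_per_node(nodes, tf) for _, name, _, nodes, tf in MACHINES}
for year, name, site, _, _ in MACHINES:
    print(year, name, site, round(per_node[name], 2), "GF/s per node")
```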

Page 12

Distribution of Jobs

• Leadership machines: few K to 128K cores
• Clusters: currently up to 1K cores. Will grow as lattice sizes scale up on leadership machines

Page 13

Future Hardware goals (BNL+FNAL+JLab) by fiscal year

Initial year based upon initial INCITE and leadership class awards. Moore’s law growth thereafter

Dedicated hardware is for all resources running that year; assumes clusters with 3.5 year life
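The two projection rules stated here can be sketched in a few lines. This is an illustration under stated assumptions, not the project's actual planning model: the 1.5-year doubling time is an assumed value for "Moore's law growth" (the slide gives none), the half-weight for a machine's fourth year is one possible reading of a 3.5 year life, and the function names are hypothetical. The deployment list reuses the midrange hardware table from the earlier slide.

```python
def moores_law_projection(initial, start_year, year, doubling_years=1.5):
    """Capacity in `year` if `initial` grows exponentially from `start_year`.

    The doubling time is an assumption; the slide does not state one.
    """
    return initial * 2.0 ** ((year - start_year) / doubling_years)


def capacity_in_service(deployments, year, lifetime_years=3.5):
    """Total TF/s of machines still running in `year`.

    A machine counts fully while it has at least one year of life left
    and fractionally in its final partial year (assumed convention).
    """
    total = 0.0
    for deployed_year, tflops in deployments:
        age = year - deployed_year
        if age < 0:
            continue  # not yet deployed
        fraction = min(1.0, max(0.0, lifetime_years - age))
        total += fraction * tflops
    return total


# (deployment year, TF/s) from the midrange hardware table
DEPLOYMENTS = [(2002, 0.15), (2004, 0.36), (2005, 0.86), (2005, 4.20),
               (2006, 0.62), (2006, 2.56), (2007, 2.98), (2008, 6.00)]

print(capacity_in_service(DEPLOYMENTS, 2008))
print(moores_law_projection(10.0, 2010, 2013))  # hypothetical 10 TF/s starting point
```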

Page 14

Distribution of Resources (2010-2014)

Page 15

LQCD II budget request

Page 16

LQCD II review: Jan. 2008

Charges to the collaboration by DOE:

• Why is a new project needed if OASCR is providing access to Leadership Class machines? In particular, is dedicated hardware, such as additional clusters, essential and cost effective in such an environment? What is the optimal mix of machines, given realistic budget constraints?
• What are the plans at FNAL, TJNAF, & BNL for LQCD computing? How are these plans incorporated into your plans for LQCD II?

Page 17

LQCD II review: Jan. 2008

Review findings:

• The 1-1 mix of Leadership and clusters advocated by LQCD II is the most suitable hardware mix for this project.
• The review committee advocates full funding for LQCD II at the level described in their proposal. The scenario in which the funding for LQCD II would be flat with LQCD I would cause the project to miss opportunities that would otherwise enhance several other fields that the Office of Science supports, such as computer hardware and software, nuclear and astrophysics.

Page 18

Characteristics of computing:

• Dominant part of calculation: large sparse matrix system solve; application of matrix on vector
• Regular grid per compute core: moving to hybrid threaded + MPI model. SciDAC II software development effort
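The "sparse solve built from repeated matrix-on-vector applications" pattern can be sketched with a conjugate gradient solver and a matrix-free nearest-neighbour operator. This is a minimal illustration, not the SciDAC software: it assumes a two-dimensional periodic grid and a simple symmetric positive-definite stencil as a stand-in for the lattice Dirac operator, and all names are hypothetical.

```python
def apply_operator(v, n, mass2):
    """y = A v with A = (mass2 + 4) I - (sum of the 4 nearest neighbours) on an n x n periodic grid."""
    out = [0.0] * (n * n)
    for x in range(n):
        for y in range(n):
            i = x * n + y
            hop = (v[((x + 1) % n) * n + y] + v[((x - 1) % n) * n + y]
                   + v[x * n + (y + 1) % n] + v[x * n + (y - 1) % n])
            out[i] = (mass2 + 4.0) * v[i] - hop
    return out


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(b, n, mass2, tol=1e-10, max_iter=1000):
    """Solve A x = b; the matrix is never stored, only applied to vectors."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # search direction
    rr = dot(r, r)
    b_norm2 = rr
    if b_norm2 == 0.0:
        return x, 0
    for iteration in range(1, max_iter + 1):
        ap = apply_operator(p, n, mass2)   # the dominant cost per iteration
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        if rr_new <= tol * tol * b_norm2:
            return x, iteration
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter


n, mass2 = 8, 0.5
source = [0.0] * (n * n)
source[0] = 1.0                      # point source at the origin
solution, iterations = conjugate_gradient(source, n, mass2)
residual = [ai - bi for ai, bi in zip(apply_operator(solution, n, mass2), source)]
print(iterations, max(abs(ri) for ri in residual))
```

Each iteration needs one operator application, which touches only nearest neighbours, plus two global dot products. In a distributed run the first becomes a halo exchange and the second a global reduction.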

Page 19

Opportunities for Optimization

• Clusters configured for a single application (LQCD) can be better optimized than those serving a dozen applications
  • Memory and disk lean
  • Pruned fat tree network (e.g. Infiniband) due to highly local communications pattern
  • Lower aggregate bandwidth to disk compared to checkpoint-intensive simulations
• Overall impact compared to generic clusters: 50% more computing capacity/dollar
• 2x-5x more cost effective for analysis jobs than leadership class machines

Page 20

Performance/$ for LQCD Applications

• Commodity compute nodes (leverage marketplace & Moore’s law)
• Low latency, high bandwidth network to exploit full I/O capability

[Figure: Mflops / $ on a logarithmic axis (10^-2 to 10^1) versus year (1990 to 2010). Labeled points: QCDSP, QCDOC, JLab SciDAC Prototype Clusters, JLab clusters, 2008/9 cluster at FNAL, Japanese Earth Simulator, BlueGene/L, BlueGene/P, and supercomputers / leadership class machines. Year labels 2002, 2003 and 2004 also appear on the plot.]

Page 21

NP Approved Missions

Data analysis and support for 12 GeV:

Why: Needed to support the current 6 GeV and future 12 GeV programs (detector simulation)

How it’s done today: small cluster / farm, plus all necessary infrastructure (tape library, cache disk, …)

Status: Constrained by tight budgets, but currently keeping up with requirements (barely)

Barriers: Future: multi-threading to prevent memory requirement blow-up on many-core architectures

Special Features: integer intensive and i/o intensive

Page 22

NP Approved Missions

Lattice QCD:

Why: Theory calculations in support of JLab experimental physics program

How it’s done today: NSF+INCITE computers + midrange computing within USQCD

Status: need more runs of larger size (more cores), plus more statistics (longer runs)

Barriers: multi-threading to reduce comms. Need LQCD II.

Special Features: floating point intensive + balanced communications – fine grained parallelism

Page 23

NP Proposed Missions or Initiatives

• No new proposed missions
• Currently fulfilling approved missions, with funding requested to continue those approved missions