Scientific Computing at SLAC
Richard P. Mount
Director: Scientific Computing and Computing Services
Stanford Linear Accelerator Center
HEPiX
October 11, 2005
SLAC Scientific Computing: A Balancing Act
• Aligned with the evolving science mission of SLAC
but neither
• Subservient to the science mission
nor
• Unresponsive to SLAC mission needs
SLAC Scientific Computing Drivers
• BaBar (data-taking ends December 2008)
  – The world's most data-driven experiment
  – Data analysis challenges until the end of the decade
• KIPAC
  – From cosmological modeling to petabyte data analysis
• Photon Science at SSRL and LCLS
  – Ultrafast science, modeling and data analysis
• Accelerator Science
  – Modeling electromagnetic structures (PDE solvers in a demanding application)
• The Broader US HEP Program (aka LHC)
  – Contributes to the orientation of SLAC Scientific Computing R&D
DOE Scientific Computing Funding at SLAC
• Particle and Particle Astrophysics
– $14M SCCS
– $5M Other
• Photon Science
– $0M SCCS
– $1M SSRL?
• Computer Science
– $1.5M
Scientific Computing: the relationship between Science and the components of Scientific Computing

[Diagram: a layered stack, each layer with examples.]
• Application Sciences: high-energy and particle-astro physics, accelerator science, photon science …
• Issues addressable with "computing": particle interactions with matter, electromagnetic structures, huge volumes of data, image processing …
• Computing techniques: PDE solving, algorithmic geometry, visualization, meshes, object databases, scalable file systems …
• Computing architectures: single system image, low-latency clusters, throughput-oriented clusters, scalable storage …
• Computing hardware: processors, I/O devices, mass-storage hardware, random-access hardware, networks and interconnects …
(SCCS staffing shown on the diagram: ~20 FTE and ~26 FTE.)
Scientific Computing: SLAC's goals for Scientific Computing

[Diagram: the same layered stack, with two overlaid goals – "Computing for Data-Intensive Science" and "The Science of Scientific Computing" – spanning the computing-techniques, architectures and hardware layers; "SLAC + Stanford Science" at the application-science and issues layers; all pursued in collaboration with Stanford and industry.]
Scientific Computing: current SLAC leadership and recent achievements in Scientific Computing
• World's largest database
• Internet2 Land-Speed Record; SC2004 Bandwidth Challenge
• Huge-memory systems for data analysis
• Scalable data management
• GEANT4 photon/particle interaction in complex structures (in collaboration with CERN)
• PDE solving for complex electromagnetic structures
What does SCCS run (1)?
• Data analysis "farms" (also good for HEP simulation)
  – ~4000 processors
  – Linux and Solaris
• Shared-memory multiprocessor
  – SGI Altix 3700
  – 72 processors
  – Linux
• 256 dual-Opteron Sun V20zs
• Myrinet cluster
  – 128 processors
  – Linux
What does SCCS run (2)?
• Application-specific clusters
  – each 32 to 128 processors
  – Linux
• PetaCache prototype
  – 64 nodes
  – 16 GB memory per node
  – Linux/Solaris
• And even …
What does SCCS run (3)?
• Disk servers
  – About 500 TB, network attached
  – Mainly xrootd; some NFS, some AFS
  – About 120 TB of Sun FibreChannel disk arrays on Sun/Solaris servers
• Tape storage
  – 6 STK Powderhorn silos
  – Up to 6 petabytes capacity; currently store 2 petabytes
  – HPSS
What does SCCS run (4)?
• Networks
  – 10 Gigabits/s to ESnet
  – 10 Gigabits/s for R&D
  – 96 fibers to Stanford
  – 10 Gigabits/s core in the computer center (as soon as we unpack the boxes)
SLAC Computing - Principles and Practice (Simplify and Standardize)
• Lights-out operation – no operators for the last 10 years
  – Run 24x7 with 8x5 (in theory) staff
  – (When there is a cooling failure on a Sunday morning, 10–15 SCCS staff are on site by the time I turn up)
• Science (and business-unix) raised-floor computing
  – Adequate reliability essential
  – Solaris and Linux
  – Scalable "cookie-cutter" approach
  – Only one type of farm CPU bought each year
  – Only one type of file server + disk bought each year
  – Highly automated OS installation and maintenance
    • e.g., see the talk on how SLAC does clusters by Alf Wachsmann:
      http://www.slac.stanford.edu/~alfw/talks/RCcluster.pdf
SLAC-BaBar Computing Fabric

[Diagram: clients, disk servers and tape servers joined by a Cisco IP network.]
• Clients: 1700 dual-CPU Linux boxes and 800 single-CPU Sun/Solaris boxes
• Disk servers: 120 dual/quad-CPU Sun/Solaris systems with ~400 TB of Sun FibreChannel RAID arrays (+ some SATA), running HEP-specific ROOT software (xrootd) + the Objectivity/DB object database, some NFS
• Tape servers: 25 dual-CPU Sun/Solaris systems driving 40 STK 9940B and 6 STK 9840A drives in 6 STK Powderhorn silos, holding over 1 PB of data under HPSS + SLAC enhancements to ROOT and Objectivity server code
Scientific Computing Research Areas (1)
(Funded by DOE-HEP, DOE SciDAC and DOE-MICS)
• Huge-memory systems for data analysis (SCCS Systems group and BaBar)
  – Expected major growth area (more later)
• Scalable data-intensive systems (SCCS Systems and Physics Experiment Support groups)
  – "The world's largest database" (OK, not really a database any more)
  – How to maintain performance with data volumes growing like "Moore's Law"?
  – How to improve performance by factors of 10, 100, 1000, …? (intelligence plus brute force)
  – Robustness, load balancing and troubleshootability in 1,000–10,000-box systems
  – Astronomical data analysis on a petabyte scale (in collaboration with KIPAC)
Scientific Computing Research Areas (2)
(Funded by DOE-HEP, DOE SciDAC and DOE-MICS)
• Grids and security (SCCS Physics Experiment Support, Systems and Security groups)
  – PPDG: building the US HEP Grid – Open Science Grid
  – Security in an open scientific environment
  – Accounting, monitoring, troubleshooting and robustness
• Network research and stunts (SCCS Network group – Les Cottrell et al.)
  – Land-speed record and other trophies
• Internet monitoring and prediction (SCCS Network group)
  – IEPM: Internet End-to-End Performance Monitoring (~5 years)
  – INCITE: Edge-based Traffic Processing and Service Inference for High-Performance Networks
Scientific Computing Research Areas (3)
(Funded by DOE-HEP, DOE SciDAC and DOE-MICS)
• GEANT4: simulation of particle interactions in million- to billion-element geometries (SCCS Physics Experiment Support group – M. Asai, D. Wright, T. Koi, J. Perl …)
  – BaBar, GLAST, LCD …
  – LHC program
  – Space
  – Medical
• PDE solving for complex electromagnetic structures (Kwok Ko's Advanced Computing Department + SCCS clusters)
Growing Competences
• Parallel computing (MPI …)
  – Driven by KIPAC (Tom Abel) and ACD (Kwok Ko)
  – SCCS competence in parallel computing (= Alf Wachsmann, currently)
  – MPI clusters and the SGI SSI system
• Visualization
  – Driven by KIPAC and ACD
  – SCCS competence is currently experimental-HEP focused (WIRED, HepRep …)
  – (A polite way of saying that growth is needed)
The PetaCache Project
A Leadership-Class Facility for Data-Intensive Science
Richard P. Mount
Director, SLAC Computing Services
Assistant Director, SLAC Research Division
Washington DC, April 13, 2004
PetaCache Goals
• The PetaCache architecture aims to revolutionize the query and analysis of scientific databases with complex structure.
– Generally this applies to feature databases (terabytes–petabytes) rather than bulk data (petabytes–exabytes)
• The original motivation comes from HEP
– Sparse (~random) access to tens of terabytes today, petabytes tomorrow
– Access by thousands of processors today, tens of thousands tomorrow
PetaCache: The Team
• David Leith, Richard Mount, PIs
• Randy Melen, Project Leader
• Chuck Boeheim (Systems group leader)
• Bill Weeks, performance testing
• Andy Hanushevsky, xrootd
• Systems group members
• Network group members
• BaBar (Stephen Gowdy)
Random-Access Storage Performance

[Chart: "Latency and Speed – Random Access" – retrieval rate (MBytes/s, log scale from 1e-9 to 1000) versus log10(object size in bytes, 0–10) for PC2100 DRAM, a WD 200 GB disk, and an STK 9940B tape drive.]
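The shape of those three curves follows from a simple model: for a random read, effective throughput is object_size / (access_latency + object_size / streaming_bandwidth). A minimal Python sketch of that model follows; the latency and bandwidth figures are order-of-magnitude assumptions for the three technologies, not the measured numbers behind the chart.

```python
# Simple random-access model: rate = size / (latency + size / bandwidth).
# Latency/bandwidth figures are rough assumptions for illustration only,
# not the measurements plotted on the slide.
DEVICES = {
    "PC2100 DRAM":    {"latency_s": 100e-9, "bw_Bps": 2.1e9},  # ~100 ns, ~2.1 GB/s
    "WD 200 GB disk": {"latency_s": 10e-3,  "bw_Bps": 50e6},   # ~10 ms seek, ~50 MB/s
    "STK 9940B tape": {"latency_s": 60.0,   "bw_Bps": 30e6},   # ~1 min mount/seek, ~30 MB/s
}

def retrieval_rate_MBps(size_bytes: float, latency_s: float, bw_Bps: float) -> float:
    """Effective MBytes/s when every object costs one random access."""
    return size_bytes / (latency_s + size_bytes / bw_Bps) / 1e6

for size in (100, 10_000, 1_000_000, 100_000_000):
    row = {name: retrieval_rate_MBps(size, d["latency_s"], d["bw_Bps"])
           for name, d in DEVICES.items()}
    print(f"{size:>11} B", {k: f"{v:.3g} MB/s" for k, v in row.items()})
```

For small objects the latency term dominates, which is why memory beats disk by several orders of magnitude and tape by far more; for very large objects all three converge toward their streaming bandwidth.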
The PetaCache Strategy
• Sitting back and waiting for technology is a BAD idea
• Scalable petabyte memory-based data servers require much more than just cheap chips. Now is the time to develop:
  – Data-server architecture(s) delivering latency and throughput cost-optimized for the science
  – Scalable data-server software supporting a range of data-serving paradigms (file access, direct addressing, …)
  – "Liberation" from entrenched legacy approaches to scientific data analysis that are founded on the "knowledge" that accessing small data objects is crazily inefficient
• Applications will take time to adapt not just their codes, but their whole approach to computing, to exploit the new architecture
• Hence: three phases

1. Prototype machine (in operation)
   • Commodity hardware
   • Existing "scalable data server software" (as developed for disk-based systems)
   • HEP-BaBar as co-funder and principal user
   • Tests of other applications (GLAST, LSST …)
   • Tantalizing "toy" applications only (too little memory for flagship analysis applications)
   • Industry participation

2. Development machine (next proposal)
   • Low-risk (purely commodity hardware) and higher-risk (flash-memory system requiring some hardware development) components
   • Data-server software: improvements to performance and scalability, investigation of other paradigms
   • HEP-BaBar as co-funder and principal user
   • Work to "liberate" BaBar analysis applications
   • Tests of other applications (GLAST, LSST …)
   • Major impact on a flagship analysis application
   • Industry partnerships, DOE lab partnerships

3. Production machine(s)
   • Storage-class memory with a range of interconnect options matched to the latency/throughput needs of differing applications
   • Scalable data-server software offering several data-access paradigms to applications
   • Proliferation: machines deployed at several labs
   • Economic viability: cost-effective for programs needing dedicated machines
   • Industry partnerships transitioning to commercialization
Prototype Machine (Operational)

[Diagram: data servers and clients linked by Cisco switches.]
• Data servers (PetaCache: MICS + HEP-BaBar funding): 64–128 nodes, each a Sun V20z with 2 Opteron CPUs and 16 GB memory; up to 2 TB total memory; Solaris or Linux (mix and match)
• Clients (existing HEP-funded BaBar systems): up to 2000 nodes, each with 2 CPUs and 2 GB memory; Linux
Object-Serving Software
• Xrootd/olbd (Andy Hanushevsky/SLAC)
  – Optimized for read-only access
  – File-access paradigm (filename, offset, bytecount) – see the sketch below
  – Makes 1000s of servers transparent to user code
  – Load balancing
  – Self-organizing
  – Automatic staging from tape
  – Failure recovery
• Allows BaBar to start getting benefit from a new data-access architecture within months, without changes to user code
• The application can ignore the hundreds of separate address spaces in the data-cache memory
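For illustration, a minimal Python sketch of what the (filename, offset, bytecount) paradigm looks like to application code. It reads from the local POSIX filesystem purely as a stand-in: the real xrootd client speaks the xroot protocol to a redirector that locates the right server. The function name and path below are hypothetical.

```python
import os

def read_object(filename: str, offset: int, bytecount: int) -> bytes:
    """Fetch bytecount bytes starting at offset from the named file.
    Under xrootd the caller uses the same three parameters and never
    learns which of the 1000s of servers actually held the data."""
    fd = os.open(filename, os.O_RDONLY)
    try:
        return os.pread(fd, bytecount, offset)  # local stand-in for a remote read
    finally:
        os.close(fd)

# Hypothetical usage: pull one small event record out of a large file.
# data = read_object("/store/babar/run42/events.dat", offset=4_096_000, bytecount=8_192)
```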
Prototype Machine: Performance Measurements
• Latency
• Throughput (transaction rate)
• (Aspects of) Scalability
Latency (microseconds) versus data retrieved (bytes)

[Chart: measured request latency from 0 to 250 µs for reads of 100 to 8100 bytes, stacked into components: server xrootd overhead, server xrootd CPU, client xroot overhead, client xroot CPU, TCP stack/NIC/switching, and minimum transmission time.]
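A back-of-envelope reconstruction of that latency budget as a Python sketch. Only the wire-transmission term is computed from first principles (bytes × 8 / link rate, assuming Gigabit Ethernet); the fixed per-request components are illustrative placeholders, not the measured values in the chart.

```python
LINK_BPS = 1e9  # assume Gigabit Ethernet between client and data server

# Assumed fixed per-request costs in microseconds: placeholders standing in
# for the stacked bands in the chart, not the measured SLAC numbers.
FIXED_US = {
    "server xrootd overhead": 30,
    "server xrootd CPU": 10,
    "client xroot overhead": 20,
    "client xroot CPU": 10,
    "TCP stack, NIC, switching": 50,
}

def request_latency_us(nbytes: int) -> float:
    """Fixed overheads plus the minimum time to clock nbytes onto the wire."""
    transmission_us = nbytes * 8 / LINK_BPS * 1e6
    return sum(FIXED_US.values()) + transmission_us

for nbytes in (100, 1_000, 4_000, 8_000):
    print(f"{nbytes:>5} bytes -> {request_latency_us(nbytes):6.1f} us")
```

The model shows why latency is nearly flat for small reads (fixed overheads dominate) and grows roughly linearly once transmission time takes over.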
Throughput Measurements

[Chart: transactions per second (0–100,000) versus number of clients for one server (1–50), for three setups: Linux client – Solaris server, Linux client – Linux server, and Linux client – Solaris server (bge).]

22 processor microseconds per transaction
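That 22 µs of processor time per transaction caps the rate one server can sustain, however many clients connect. A quick Python check; the dual-CPU server count is our assumption, matching the V20z data servers described earlier.

```python
CPU_US_PER_TRANSACTION = 22   # measured figure from the slide
CPUS_PER_SERVER = 2           # assumption: dual-CPU data server

per_cpu = 1e6 / CPU_US_PER_TRANSACTION    # ~45,000 transactions/s per CPU
per_server = per_cpu * CPUS_PER_SERVER    # ~91,000 transactions/s per server

print(f"per CPU:    {per_cpu:,.0f} transactions/s")
print(f"per server: {per_server:,.0f} transactions/s")
```

That ceiling of roughly 90,000 transactions/s is consistent with where the measured curves flatten out as clients are added.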
Storage-Class Memory
• New technologies are coming to market in the next 3–10 years (Jai Menon, IBM)
• The current not-quite-crazy example is flash memory
Development Machine Plans

[Diagram: two tiers of data servers and the existing clients, linked by a switch with 10 Gigabit ports and the Cisco switch fabric.]
• Data servers (PetaCache): 80 nodes, each with 8 Opteron CPUs and 128 GB memory; up to 10 TB total memory; Solaris/Linux
• Data servers (PetaCache): 30 nodes, each with 2 Opteron CPUs and 1 TB flash memory; ~30 TB total memory; Solaris/Linux
• Clients (SLAC-BaBar system): up to 2000 nodes, each with 2 CPUs and 2 GB memory; Linux
Minor Details?
• 1970s
  – SLAC Computing Center designed for ~35 Watts/square foot
  – 0.56 MWatts maximum
• 2005
  – Over 1 MWatt by the end of the year
  – Locally high densities (16 kW racks)
• 2010
  – Over 2 MWatts likely needed
• Onwards
  – Uncertain, but increasing power/cooling need is likely
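The scale of the problem falls out of simple arithmetic, sketched below; the ~20 square-foot rack footprint (rack plus aisle share) is our assumption for illustration.

```python
DESIGN_W_PER_SQFT = 35       # 1970s design point
DESIGN_MAX_W = 0.56e6        # 0.56 MW maximum

floor_area_sqft = DESIGN_MAX_W / DESIGN_W_PER_SQFT   # ~16,000 sq ft

RACK_W = 16_000              # a 2005-era 16 kW rack
RACK_FOOTPRINT_SQFT = 20     # assumed footprint including aisle share

rack_density = RACK_W / RACK_FOOTPRINT_SQFT          # ~800 W/sq ft
print(f"designed floor area: {floor_area_sqft:,.0f} sq ft")
print(f"16 kW rack density:  {rack_density:,.0f} W/sq ft, "
      f"~{rack_density / DESIGN_W_PER_SQFT:.0f}x the 1970s design point")
```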
Crystal Ball (1)
• The pessimist’s vision:
– PPA computing winds down to about 20% of its current level as BaBar analysis ends in 2012 (GLAST is negligible, KIPAC is small)
– Photon Science is dominated by non-SLAC/non-Stanford scientists who do everything at their home institutions
– The weak drivers from the science base make SLAC unattractive for DOE computer science funding
Crystal Ball (2)
• The optimist's vision:
  – PPA computing in 2012 includes:
    • Vigorous/leadership involvement in LHC physics analysis using innovative computing facilities
    • Massive computational cosmology/astrophysics
    • A major role in LSST data analysis (petabytes of data)
    • Accelerator simulation for the ILC
  – Photon Science computing includes:
    • A strong SLAC/Stanford faculty, leading much of LCLS science, fully exploiting SLAC's strengths in simulation, data analysis and visualization
    • A major BES accelerator research initiative
  – Computer Science includes:
    • National/international leadership in computing for data-intensive science (supported at $25M to $50M per year)
  – SLAC and Stanford:
    • University-wide support for establishing leadership in the science of scientific computing
    • A new SLAC/Stanford scientific computing institute