Page 1:

Dynamically Creating Big Data Centers for the LHC

Frank Würthwein

Professor of Physics, University of California San Diego

September 25th, 2013

Page 2:

Outline

• The Science
• Software & Computing Challenges
• Present Solutions
• Future Solutions

Page 3:

The Science

Page 4:

~67% of the energy in the universe is “dark energy”. We have no clue what this is.

~29% of the matter is “dark matter”. We have some ideas, but no proof, of what it is.

All of what we know makes up only about 4% of the universe.

The Universe is a strange place!

Page 5:

To study Dark Matter, we need to create it in the laboratory.


[Aerial view of the LHC ring between Lake Geneva and Mont Blanc, showing the four detectors: ALICE, ATLAS, CMS, and LHCb.]

Page 6:

Page 7:


“Big bang” in the laboratory

• We gain insight by colliding particles at the highest energies possible to measure:
  – Production rates
  – Masses & lifetimes
  – Decay rates

• From this we derive the “spectroscopy” as well as the “dynamics” of elementary particles.

• Progress is made by going to higher energies and brighter beams.

Page 8:

Explore Nature over 15 orders of magnitude: perfect agreement between theory & experiment.

Dark Matter is expected somewhere below this line.

Page 9:

And for the sci-fi buffs … Imagine our 3D world to be confined to a 3D surface in a 4D universe.

Imagine this surface to be curved such that the 4th-D distance is short for locations light years away in 3D.

Imagine space travel by tunneling through the 4th D.

The LHC is searching for evidence of a 4th dimension of space.

Page 10:

Recap so far …

• The beams cross in the ATLAS and CMS detectors at a rate of 20 MHz.

• Each crossing contains ~10 collisions.

• We are looking for rare events that are expected to occur in roughly 1 in 10,000,000,000,000 (10^13) collisions, or less.
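To make those numbers concrete, here is a back-of-the-envelope rate estimate as a short Python sketch, using only the round figures quoted above:

```python
# Back-of-the-envelope rate estimate using the round numbers quoted above.
crossing_rate_hz = 20e6          # beam crossings per second in ATLAS/CMS
collisions_per_crossing = 10     # ~10 proton-proton collisions per crossing
rarity = 1e-13                   # a rare process: ~1 in 10^13 collisions

collision_rate_hz = crossing_rate_hz * collisions_per_crossing  # ~2e8 collisions/s
rare_events_per_second = collision_rate_hz * rarity             # ~2e-5 events/s
seconds_per_rare_event = 1.0 / rare_events_per_second

print(f"{collision_rate_hz:.1e} collisions per second")
print(f"one rare event every ~{seconds_per_rare_event / 3600:.0f} hours")  # ~14 hours
```

In other words, even at roughly 200 million collisions per second, a process this rare shows up only about twice a day.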

Page 11:

Software & Computing Challenges

Page 12:

The CMS Experiment

Page 13:

The CMS Experiment

• 80 million electronic channels
     × 4 bytes
     × 40 MHz
  -----------------------
  ~ 10 Petabytes/sec of information
     × 1/1000 zero-suppression
     × 1/100,000 online event filtering
  -----------------------
  ~ 100-1000 Megabytes/sec of raw data to tape;
  1 to 10 Petabytes of raw data per year written to tape, not counting simulations.
  (The arithmetic is spelled out in the sketch after this list.)

• 2000 scientists (1200 with a Ph.D. in physics)
  – ~180 institutions
  – ~40 countries

• 12,500 tons, 21 m long, 16 m diameter
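The reduction chain in the first bullet is easy to reproduce. A minimal Python sketch; the per-year figure assumes roughly 10^7 live seconds of beam per year, which is an assumption not stated on the slide:

```python
# Reproduce the data-reduction chain quoted on the slide.
channels = 80e6                  # electronic channels
bytes_per_channel = 4
crossing_rate_hz = 40e6          # 40 MHz bunch-crossing rate

raw_rate = channels * bytes_per_channel * crossing_rate_hz   # ~1.3e16 B/s, i.e. ~10 PB/s
after_zero_suppression = raw_rate / 1_000                     # ~13 TB/s
to_tape = after_zero_suppression / 100_000                    # ~130 MB/s, within 100-1000 MB/s

# Assumption (not on the slide): ~1e7 live seconds of beam per year.
live_seconds_per_year = 1e7
raw_data_per_year = to_tape * live_seconds_per_year           # ~1.3 PB/year, within 1-10 PB

print(f"{raw_rate / 1e15:.1f} PB/s off the detector")
print(f"{to_tape / 1e6:.0f} MB/s of raw data to tape")
print(f"{raw_data_per_year / 1e15:.1f} PB of raw data per year")
```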

Page 14:

Active Scientists in CMS


5-40% of the scientific members are actively doing large-scale data analysis in any given week.

~1/4 of the collaboration, scientists and engineers, contributed to the common source code of ~3.6M C++ SLOC.

Page 15:

Evolution of LHC Science Program

Event rate written to tape: 150 Hz → 1000 Hz → 10,000 Hz.

Page 16:

The Challenge

How do we organize the processing of 10s to 1000s of Petabytes of data by a globally distributed community of scientists, and do so with manageable “change costs” for the next 20 years?

Guiding Principles for Solutions

• Choose technical solutions that allow computing resources to be as distributed as the human resources.
• Support distributed ownership and control within a global single sign-on security context.
• Design for heterogeneity and adaptability.

Page 17:

Present Solutions

Page 18:

Federation of National Infrastructures. In the U.S.A.: Open Science Grid

Page 19:

Among the top 500 supercomputers, there are only two that are bigger when measured by power consumption.

Page 20:

Tier-3 Centers

• Locally controlled resources not pledged to any of the 4 collaborations.
  – Large clusters at major research universities that are time-shared.
  – Small clusters inside departments and individual research groups.

• Requires the global sign-on system to be open to dynamically adding resources.
  – Easy-to-support APIs
  – Easy to work around unsupported APIs

Page 21:

Me -- My friends -- The grid/cloud

[Layered architecture diagram: “Me” = O(10^4) users with a thin client; “My friends” = O(10^1-2) VOs providing thick, domain-science-specific VO middleware & support; “the anonymous Grid or Cloud” = O(10^2-3) sites behind a thin “Grid API”, common to all sciences and industry.]

Page 22:

“My Friends” Services

• Dynamic resource provisioning

• Workload management
  – schedule resources, establish the runtime environment, execute the workload, handle results, clean up

• Data distribution and access
  – input, output, and relevant metadata

• File catalogue
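The workload-management bullet compresses a whole lifecycle into one line. The sketch below spells that lifecycle out in Python; the provisioner, catalogue, and storage objects and their methods are hypothetical placeholders, not the actual CMS middleware API:

```python
# Conceptual workload lifecycle, mirroring the bullet above:
# schedule a resource -> establish the runtime environment -> execute -> handle results -> clean up.
# All objects and methods here are hypothetical placeholders, not real CMS middleware calls.

def run_workload(job, provisioner, catalogue, storage):
    slot = provisioner.acquire_slot(job.requirements)        # schedule a resource
    try:
        env = slot.setup_runtime(job.software_release)       # establish the runtime environment
        inputs = [catalogue.resolve(lfn) for lfn in job.input_files]
        result = env.execute(job.executable, inputs)         # execute the workload
        storage.stage_out(result.output_files)               # handle results
        catalogue.register(result.output_files, job.metadata)
        return result.exit_code
    finally:
        slot.cleanup()                                        # clean up, even on failure
        provisioner.release_slot(slot)
```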

Page 23:

Optimize Data Structure for Partial Reads

Page 24:

[Histogram: number of files read vs. the fraction of each file that is read; the last bin is an overflow bin.]

For the vast majority of files, less than 20% of the file is read.

Average: 20-35%; median: 3-7% (depending on the type of file).
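These numbers can be obtained by logging the (offset, length) of every read issued against a file and comparing the union of the read byte ranges to the file size. A minimal sketch of that bookkeeping; the read log itself is assumed to come from the storage or IO layer:

```python
# Estimate the fraction of a file that was actually read, given a log of
# (offset, length) read requests. Overlapping requests are merged so that
# re-reads of the same bytes are not double-counted.

def fraction_read(reads, file_size):
    """reads: iterable of (offset, length) pairs; file_size: total bytes."""
    intervals = sorted((off, off + length) for off, length in reads)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:       # disjoint from the current run
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:                                         # overlaps: extend the current run
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / file_size

# Example: two overlapping reads plus one small read near the end of a 1 GB file.
print(fraction_read([(0, 5_000_000), (4_000_000, 2_000_000), (900_000_000, 1_000_000)],
                    1_000_000_000))   # -> 0.007
```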

Page 25:

Future Solutions

Page 26:

From present to future

• Initially, we operated a largely static system.
  – Data was placed quasi-statically before it could be analyzed.
  – Analysis centers have contractual agreements with the collaboration.
  – All reconstruction is done at centers with custodial archives.

• Increasingly, we have too much data to afford this.
  – Dynamic data placement
    • Data is placed at T2s based on job backlog in global queues (see the sketch after this list).
  – WAN access: “Any Data, Anytime, Anywhere”
    • Jobs are started on the same continent as the data instead of on the same cluster attached to the data.
  – Dynamic creation of data processing centers
    • Tier-1 hardware is bought to satisfy steady-state needs instead of peak needs.
    • Primary processing as data comes off the detector => steady state
    • Annual reprocessing of accumulated data => peak needs
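Backlog-driven placement amounts to a simple policy loop. A conceptual Python sketch; the threshold and the helper callables are hypothetical, standing in for the global workload queues and the transfer system:

```python
# Conceptual backlog-driven data placement: add a replica of a dataset at a
# Tier-2 when the job backlog per existing replica grows too large.
# BACKLOG_THRESHOLD and the helper callables are hypothetical placeholders.

BACKLOG_THRESHOLD = 500   # pending jobs per replica before we request another copy

def rebalance(pending_jobs_by_dataset, sites_hosting, request_replica, candidate_t2s):
    for dataset, backlog in pending_jobs_by_dataset.items():
        hosts = sites_hosting(dataset)                     # T2s that already have a replica
        if backlog / max(len(hosts), 1) > BACKLOG_THRESHOLD:
            target = next((t2 for t2 in candidate_t2s if t2 not in hosts), None)
            if target is not None:
                request_replica(dataset, target)           # e.g. file a transfer request
```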

Page 27:

Any Data, Anytime, Anywhere


A global redirection system unifies all CMS data into one globally accessible namespace.

This is made possible by paying careful attention to the IO layer, to avoid inefficiencies due to IO-related latencies.
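From a job's point of view, the effect is that a file missing from local storage can still be opened over the WAN through a global redirector, which locates a site that hosts it. A minimal sketch of that fallback logic; the redirector hostname and the local path layout are illustrative assumptions, not the actual CMS configuration:

```python
import os

# Fall back from local storage to a global xrootd redirector, in the spirit of
# "Any Data, Anytime, Anywhere". The hostname and local prefix below are
# illustrative examples, not the real CMS configuration.
GLOBAL_REDIRECTOR = "xrootd-redirector.example.org"   # hypothetical hostname
LOCAL_PREFIX = "/storage/cms"                         # hypothetical local mount point

def resolve(lfn):
    """Map a logical file name to something the job can actually open."""
    local_path = LOCAL_PREFIX + lfn
    if os.path.exists(local_path):
        return local_path                              # fast path: read from local storage
    # WAN fallback: the redirector finds a site that hosts the file.
    return f"root://{GLOBAL_REDIRECTOR}/{lfn}"

print(resolve("/store/data/example/file.root"))
```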

Page 28:

Vision going forward

The vision was implemented for the first time in Spring 2013, using the Gordon supercomputer at SDSC.

Page 29:

Page 30:

CMS “My Friends” Stack

• CMSSW release environment
  – NFS exported from the Gordon IO nodes
  – Future: CernVM-FS via Squid caches
    • J. Blomer et al.; 2012 J. Phys.: Conf. Ser. 396 052013
• Security context (CA certs, CRLs) via the OSG worker node client
• CMS calibration data access via FroNTier
  • B. Blumenfeld et al.; 2008 J. Phys.: Conf. Ser. 119 072007
  – Squid caches installed on the Gordon IO nodes
• glideinWMS
  • I. Sfiligoi et al.; doi:10.1109/CSIE.2009.950
  – Implements “late binding” provisioning of CPU and job scheduling
  – Submits pilots to Gordon via BOSCO (GSI-SSH)
• WMAgent to manage CMS workloads
• PhEDEx data transfer management
  – Uses SRM and GridFTP

(On the slide, these items are grouped under two side labels: “Job environment” and “Data and job handling”.)
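The glideinWMS entry above is the key to using a resource like Gordon opportunistically: CPU is provisioned by pilots first, and a CMS job is bound to a slot only after the pilot has validated it (“late binding”). A conceptual Python sketch of that idea; the slot and queue objects and their methods are placeholders, not the glideinWMS API:

```python
# Conceptual "late binding": the pilot occupies the batch slot first, validates it,
# and only then pulls a matching payload from the VO's central queue.
# The slot/queue objects and their methods are hypothetical placeholders.

def pilot_main(slot, central_queue):
    # Validate the slot before admitting any payload (e.g. CMSSW visible, Squid reachable).
    if not slot.validate_environment():
        return                                   # give the slot back untouched
    # Keep pulling payloads while enough walltime remains; each payload is bound
    # to the slot "late", i.e. only now, at runtime.
    while slot.remaining_walltime() > slot.min_job_walltime:
        job = central_queue.fetch_matching_job(slot.capabilities())
        if job is None:                          # nothing suitable queued right now
            break                                # exit cleanly instead of idling
        slot.run_payload(job)
```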

Page 31:

This is clearly mighty complex! So let's focus only on the parts that are specific to incorporating Gordon as a dynamic data processing center.

Page 32:

Items highlighted in red on the slide were deployed or modified to incorporate Gordon:

• BOSCO
• A minor modification of the PhEDEx config file
• Deploy Squid
• Export CMSSW & the WN client

Page 33:

Gordon Results

• Work completed in February/March 2013 as the result of a “lunch conversation” between SDSC & US-CMS management
  – Dynamically responding to an opportunity

• 400 million RAW events processed
  – 125 TB in and ~150 TB out
  – ~2 million core-hours of processing
  (a quick back-of-the-envelope check follows after this list)

• Extremely useful both for the science results and as a proof of principle in software & computing.
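As a sanity check on those throughput numbers, a short sketch using only the round figures quoted above:

```python
# Back-of-the-envelope check on the Gordon campaign numbers quoted above.
events = 400e6                 # RAW events processed
core_hours = 2e6               # ~2 million core-hours
data_in_tb = 125.0
data_out_tb = 150.0

core_seconds_per_event = core_hours * 3600 / events    # ~18 core-seconds per event
mb_in_per_event = data_in_tb * 1e6 / events            # ~0.3 MB of input per event
mb_out_per_event = data_out_tb * 1e6 / events          # ~0.4 MB of output per event

print(f"{core_seconds_per_event:.0f} core-seconds per event")
print(f"{mb_in_per_event:.2f} MB in / {mb_out_per_event:.2f} MB out per event")
```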

Page 34:

Summary & Conclusions

• Guided by the principles:
  – Support distributed ownership and control in a global single sign-on security context.
  – Design for heterogeneity and adaptability.

• The LHC experiments very successfully developed and implemented a set of new concepts to deal with BigData.

Page 35:

Outlook

• The LHC experiments had to largely invent an island of BigData technologies, with limited interactions with industry and other domain sciences.

• Is it worth building bridges to other islands?
  – IO stack and HDF5?
  – MapReduce?
  – What else?

• Is there a mainland emerging that is not just another island?
