
Robin Middleton

RAL-PPD/EGEE/GridPP

Grid Computing

A high-level look at Grid Computing in the world of Particle Physics and at the LHC in particular.

I am indebted to the EGEE, LCG and GridPP projects and to colleagues therein for much of the material presented here.

2

Overview

• e-Science and The Grid
• Grids in Particle Physics

– EGEE LCG GridPP

– Virtual Organisations

• Computing Model (very high level !)
• Components of the EGEE/LCG/GridPP Grid

– security

– information service

– resource brokering

– data management

• Monitoring & User Support
• Other Projects / Sciences
• Sustainability & EGI
• Further information

– Links

3

What is e-Science ? What is the Grid ?

e-Science

• …also : e-Infrastructure, cyberinfrastructure, e-Research, …
• Includes

– grid computing (e.g. WLCG, EGEE, EGI, OSG, TeraGrid, NGS…)
  • computationally and/or data intensive; highly distributed over a wide area

– digital curation

– digital libraries

– collaborative tools (e.g. Access Grid)

– …many other areas

• Most UK Research Councils active in e-Science
– BBSRC

– NERC (e.g. climate studies, NERC DataGrid)

– ESRC (e.g. NCeSS)

– AHRC (e.g. studies in collaborative performing arts)

– EPSRC (e.g. eMinerals, MyGrid, …)

– STFC (formerly PPARC and CCLRC) (e.g. GridPP, AstroGrid)

4

5

e-Science in 2000

• Dr John Taylor (former Director General of Research Councils, Office of Science and Technology)
– ‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’
– ‘e-Science will change the dynamic of the way science is undertaken.’

• SR2000 E-Science Budgets
– £80m Collaborative projects
– Generic Challenges: EPSRC (£15m), DTI (£15m)
– Industrial Collaboration (£40m)
– Academic Application Support Programme: Research Councils (£74m), DTI (£5m)
  • PPARC (£26m), BBSRC (£8m), MRC (£8m), NERC (£7m), ESRC (£3m), EPSRC (£17m), CLRC (£5m)

And 9 Years on…

• An independent panel of international experts has judged the UK's e-Science Programme as "world-leading", citing that "investments are already empowering significant contributions to wellbeing in the UK and the world beyond".

“The panel found the e-Science Programme to have had a positive economic impact, especially in the important areas of life sciences and medicine, materials, and energy and sustainability. Attractive to industry from its inception, the programme has drawn in around £30 million from industrial collaborations, both in cash and in-kind. Additionally it has already contributed to 138 stakeholder collaborations, 30 licenses or patents, 14 spin-off companies and 103 key results taken up by industry and early indications show there are still more to come.”

http://www.rcuk.ac.uk/news/100210.htm

6

Grids, clouds, supercomputers, etc.

7

(Ack: Bob Jones - EGEE Project Director)


Grids
• Collaborative environment
• Distributed resources (political/sociological)
• Commodity hardware (also supercomputers)
• (HEP) data management
• Complex interfaces (bug not feature)

Supercomputers
• Expensive
• Low latency interconnects
• Applications peer reviewed
• Parallel/coupled applications
• Traditional interfaces (login)
• Also SC grids (DEISA, TeraGrid)

Clouds
• Proprietary (implementation)
• Economies of scale in management
• Commodity hardware
• Virtualisation for service provision and encapsulating application environment
• Details of physical resources hidden
• Simple interfaces (too simple?)

Volunteer computing
• Simple mechanism to access millions of CPUs
• Difficult if (much) data involved
• Control of environment check
• Community building – people involved in science
• Potential for huge amounts of real work

Many different problems: amenable to different solutions

No right answer


Consider ALL as a combined e-Infrastructure ecosystem
Aim for interoperability and combine the resources into a consistent whole

Grids and Clouds

• GridPP4 will take a closer look at clouds
• Issues

– relative costs

– I/O bandwidth : getting the data into the cloud in the first place !

– data security : entrusting our data to an external body

8

© GridTalk

9

What is the Grid ?

• Much more than the web…

• “Grid computing [is] distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation... we review the "Grid problem", which we define as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources - what we refer to as virtual organizations."
– From "The Anatomy of the Grid: Enabling Scalable Virtual Organizations" by Foster, Kesselman and Tuecke

• “The Web on Steroids” !

10

What is the Grid ?

• The Grid : Blueprint for a New Computing Infrastructure (Ian Foster & Carl Kesselman)

– “A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.”

– http://www.mkp.com/mk/default.asp?isbn=1558604758

• What is the Grid ? A Three Point Checklist (Ian Foster)
i) Co-ordinates resources that are not subject to centralised control
   - see the EGEE/LCG/GridPP Grid
ii) …using standard, open, general-purpose protocols and interfaces
   - see Open Grid Forum, X.509 (also Globus, Condor, gLite)
iii) …to deliver nontrivial qualities of service
   - see the LCG MoU, service availability, resource promises (CPU, storage, network)

– http://www-fp.mcs.anl.gov/~foster/Articles/WhatIsTheGrid.pdf

11

Haven’t we been here before ?

• Multiple standards
• Interoperability
• Distributed resource sharing
• No common management
• Concurrent access
• Flexible

Analogies continue…
• Far from perfect
• Much room for improvement
• A “trip” hazard !

Acknowledgement: J.Gordon

12

Grids in Particle Physics

(with the LHC as an example)

13

EGEE / LCG / GridPP

LCG – LHC Computing Grid
Distributed production environment for physics data processing

In 2007 : 100,000 CPUs, 15 PB/yr, 5,000 physicists, 500 institutes

EGEE – Enabling Grids for E-sciencE
Starts from the LCG infrastructure

Production Grid in 27 countries

HEP, BioMed, CompChem, Earth Science, …

GridPP
Grid computing for HEP in the UK

Major contributor to LCG & EGEE

19 Institutes

14

EGEE-III
• 140 partners in 33 countries
• ~32M€ (EU); 2 years
• ~250 sites in 45 countries
• >75,000 CPU-cores
• >270 VOs
• ~210k jobs/day (peak 230k)
• ~76 million jobs run in the year to Aug08

[Charts: CPU time delivered (CPU months) by EGEE federation, including UK/Ireland (x 2); number of jobs per month, reaching 231k jobs/day]

15

EGEE-III

• 140 partners in 33 countries
• ~32M€ (EU); 2 years
• ~250 sites in 45 countries
• >75,000 CPU-cores
• >270 VOs
• ~210k jobs/day (peak 230k)
• ~76 million jobs run in the year to Aug08
• “Other VOs” 30k jobs/day

EGEE “reach” in 2008

16

LCG – LHC Computing Grid

• Worldwide LHC Computing Grid

• Framework to deliver distributed computing for the LHC experiments
– Middleware / Deployment
– Service/Data Challenges
– Security
– Applications Software
– Distributed Analysis
– Private Optical Network
– MoUs

• Coverage
– Europe: EGEE
– USA: OSG
– Asia: NAREGI, Taipei, China…
– Other…

17

GridPP

• Integrated within the LCG/EGEE framework

• UK Service Operations (LCG/EGEE)

– Tier-1 & Tier-2s

• HEP Applications Integration / Exploitation by experiments
– @ LHC, FNAL, SLAC
– GANGA (LHCb & ATLAS)

• Increasingly closer working with NGS

• Phase 1 : 2001-2004
– Prototype (Tier-1)

• Phase 2 : 2004-2008
– “From Prototype to Production”
– Production (Tier-1 & 2)

• Phase 3 : 2008-2011
– “From Production to Exploitation”
– Reconstruction, Monte Carlo, Analysis

• Phase 4 : 2011-2014…
– routine operation during LHC running

Tier-1 Farm Usage

18

Virtual Organisations

19

LHC Data Rates

• 40 MHz beam crossing
• 10^7 channels
  → ~10^15 Bytes/s
• Reduce by 10^7 in the trigger
• Few MByte/event
• 100 Hz “to tape”
• ~10^7 s in a year !!
  → 10^8 x 10^7 = 1 PByte/yr per experiment
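As a sanity check on the arithmetic above, here is a short, purely illustrative Python snippet; the event size and live time are round numbers, not experiment-specific values.

# Back-of-the-envelope LHC raw-data volume (illustrative values only).
trigger_rate_hz = 100          # events written "to tape" per second
event_size_bytes = 1e6         # "few MByte/event"; 1 MB used here for simplicity
live_seconds_per_year = 1e7    # ~10^7 s of data taking per year

bytes_per_year = trigger_rate_hz * event_size_bytes * live_seconds_per_year
print(f"~{bytes_per_year / 1e15:.1f} PB per experiment per year")  # ~1.0 PB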

20

LHC Computing Model

[Diagram: the tiered LHC computing model. CERN Tier 0 (the LHC Computing Centre) feeds national Tier 1 centres (Germany, USA, UK, France, Italy, ……….); Tier 2 centres are built from labs and universities and serve physics and regional groups; Tier 3 resources sit in physics departments and on desktops.]

21

22

Tier Centres & the UK

23

A typical Access Pattern

[Diagram: a typical LHC experiment access pattern for one year of data acquisition and analysis]
  Raw Data: ~1000 TBytes
  Reco-V1, Reco-V2: ~1000 TBytes each
  ESD-V1.1, ESD-V1.2, ESD-V2.1, ESD-V2.2: ~100 TBytes each
  AOD (multiple versions): ~10 TBytes each

  Access rates (aggregate, average):
  - 100 MBytes/s (2-5 physicists)
  - 500 MBytes/s (5-10 physicists)
  - 1000 MBytes/s (~50 physicists)
  - 2000 MBytes/s (~150 physicists)

24

Principal Components

25

The Middleware Stack

[Layered diagram of the middleware stack]
  Application level services: user interfaces, applications – GANGA, etc.; EGEE/LCG & experiments
  “Collective” services: job management (Resource Brokers), data management (LFC), information system, application monitoring (Dashboards, SAM)
  “Basic” services: user access, security (X.509), data transfer (FTS), information schema – gLite (+ Globus, Condor, etc.)
  System software: operating system (Scientific Linux), local scheduler (PBS, Condor, LSF, …), file system (NFS, AFS, …)
  Hardware: computing cluster, network resources, data storage

26

gLite

27

Security – Job Submission

[Diagram: user (with user cert and proxy), VO, MyProxy server, WMS, information system and CE (with host cert and authz)]
  1. VO affiliation (AccessControlBase) – low frequency
  2. cert upload to the MyProxy server
  3. job submission to the WMS – high frequency
  4. which CEs are in the authz list for the VO ? (looked up via the information system)

VO credential is used by the resource broker to pre-select available CEs.

28

Security – Running a Job

[Diagram: long-term certificate, MyProxy server, WMS, VO and CE (host cert, proxy, authz); the proxy is created with voms-proxy-init]
  1. cert download – the WMS retrieves a renewed proxy from the MyProxy server
  2. job start on the CE, passing the authentication & authorization info to LCAS/LCMAPS

LCAS: authorization based on (multiple) VO/group/role attributes
LCMAPS: mapping to a user pool and to (multiple) groups
  - default VO = default UNIX group
  - other VO/group/role = other UNIX group(s)

VO credential for authorization and mapping on the CE.
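To make the LCMAPS mapping step concrete, here is a minimal, purely illustrative Python sketch of how VOMS attributes might be mapped to a pooled local account and UNIX groups; the FQAN strings, pool names and mapping rules are invented for the example and are not the actual LCAS/LCMAPS configuration.

# Toy illustration of LCMAPS-style mapping: a VOMS FQAN (VO/group/role
# attribute) is mapped to a pooled local account and a set of UNIX groups.
# All FQANs, pools and rules below are invented for illustration only.
POOL_ACCOUNTS = {"atlas": [f"atlas{n:03d}" for n in range(1, 51)],
                 "lhcb":  [f"lhcb{n:03d}" for n in range(1, 51)]}
GROUP_RULES = {"/atlas":                 ["atlas"],             # default VO -> default group
               "/atlas/Role=production": ["atlas", "atlasprd"], # role -> extra group
               "/lhcb":                  ["lhcb"]}

next_free = {vo: 0 for vo in POOL_ACCOUNTS}   # very naive pool allocator

def map_user(fqan: str):
    """Return (local_account, unix_groups) for a VOMS FQAN, or None if not authorized."""
    vo = fqan.lstrip("/").split("/")[0]
    if fqan not in GROUP_RULES or vo not in POOL_ACCOUNTS:
        return None                            # the LCAS step would reject here
    account = POOL_ACCOUNTS[vo][next_free[vo] % len(POOL_ACCOUNTS[vo])]
    next_free[vo] += 1
    return account, GROUP_RULES[fqan]

print(map_user("/atlas/Role=production"))      # ('atlas001', ['atlas', 'atlasprd'])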

29

Information System

• At the core of making the Grid function
• Hierarchy of distributed BDII/LDAP servers
• Information organised using the GLUE Schema

30

Information System - LDAP

• Lightweight Directory Access Protocol:
– structures data as a tree
– DIT = Directory Information Tree

• Following a path from the node back to the root of the DIT, a unique name is built (the DN):

“id=ano,ou=PPD,or=STFC,st=Chilton,c=UK,o=grid”

[Example DIT]
  o = grid (root of the DIT)
    c = US, c = UK, c = Spain
      st = Chilton
        or = STFC
          ou = PPD, ou = ESC
            objectClass: person
            cn: A.N.Other
            phone: 5555666
            office: R1-3.10
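As an illustration of how such a directory is queried in practice, the sketch below uses the python-ldap module to search a top-level BDII for Computing Elements published with the GLUE 1.x schema; the hostname is a placeholder and would need to be replaced by a real, reachable BDII endpoint.

# Illustrative query of a grid information system (BDII) over LDAP.
# The BDII hostname below is a placeholder (any ldap://<host>:2170 top-BDII
# publishing under base "o=grid" could be substituted).
import ldap  # python-ldap

conn = ldap.initialize("ldap://your-top-bdii.example.org:2170")
conn.simple_bind_s()  # BDIIs allow anonymous read access

# Ask for all Computing Elements and print their IDs and free-CPU counts.
results = conn.search_s(
    "o=grid", ldap.SCOPE_SUBTREE,
    "(objectClass=GlueCE)",
    ["GlueCEUniqueID", "GlueCEStateFreeCPUs"],
)
for dn, attrs in results:
    ce_id = attrs.get("GlueCEUniqueID", [b"?"])[0].decode()
    free = attrs.get("GlueCEStateFreeCPUs", [b"?"])[0].decode()
    print(f"{ce_id}: {free} free CPUs")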

31

WMS – Workload Management System

• The WMS is composed of the following parts:
1. User Interface (UI) : access point for the user to the WMS
2. Resource Broker (RB) : the broker of Grid resources, responsible for finding the “best” resources to which to submit jobs
3. Job Submission Service (JSS) : provides a reliable submission system
4. Information Index (BDII) : a server (based on LDAP) which collects information about Grid resources – used by the Resource Broker to rank and select resources
5. Logging and Bookkeeping services (LB) : store job information, available for users to query

• (However, this is evolving with the moves to the gLite RB and the gLite CE !)

Executable = “/bin/echo”;
Arguments = “Good Morning”;
StdError = “stderr.log”;
StdOutput = “stdout.log”;
OutputSandbox = {“stderr.log”, “stdout.log”};

Executable = “gridTest”;

StdError = “stderr.log”;

StdOutput = “stdout.log”;

InputSandbox = {“/home/robin/test/gridTest”};

OutputSandbox = {“stderr.log”, “stdout.log”};

InputData = “lfn:testbed0-00019”;

DataAccessProtocol = “gridftp”;

Requirements = other.Architecture==“INTEL” && other.OpSys==“LINUX” && other.FreeCpus >= 4;

Rank = “other.GlueHostBenchmarkSF00”;

Example JDL
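The Requirements and Rank expressions above drive the Resource Broker's matchmaking. The following Python sketch is only a toy illustration of that idea (it is not the gLite implementation, and the CE records are invented): it keeps the CEs that satisfy the requirements and picks the one with the highest rank.

# Toy matchmaking in the spirit of the Resource Broker: filter Computing
# Elements on the JDL Requirements, then order them by the Rank expression.
ces = [
    {"id": "ce01.example.org", "Architecture": "INTEL", "OpSys": "LINUX",
     "FreeCpus": 12, "GlueHostBenchmarkSF00": 1800},
    {"id": "ce02.example.org", "Architecture": "INTEL", "OpSys": "LINUX",
     "FreeCpus": 2, "GlueHostBenchmarkSF00": 2400},
    {"id": "ce03.example.org", "Architecture": "PPC", "OpSys": "LINUX",
     "FreeCpus": 40, "GlueHostBenchmarkSF00": 1500},
]

def requirements(ce):
    # Mirrors: other.Architecture=="INTEL" && other.OpSys=="LINUX" && other.FreeCpus >= 4
    return ce["Architecture"] == "INTEL" and ce["OpSys"] == "LINUX" and ce["FreeCpus"] >= 4

def rank(ce):
    # Mirrors: Rank = "other.GlueHostBenchmarkSF00" (higher is better)
    return ce["GlueHostBenchmarkSF00"]

matches = [ce for ce in ces if requirements(ce)]
best = max(matches, key=rank) if matches else None
print(best["id"] if best else "no matching CE")   # -> ce01.example.org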

32

Data Management

• DPM – Disk Pool Manager
– also (dCache), CASTOR

• LFC – LCG File Catalogue
• FTS – File Transfer Service

33

Storage - DPM

• Disk Pool Manager: lightweight disk-only storage element
– disk-only storage with focus on manageability

• Features
– secure: authentication via GSI or Kerberos 5, authorisation via VOMS

– full POSIX ACL support with DN (userid) and VOMS groups

– disk pool management (direct socket interface)

– storage name space (aka. storage file catalog)

– DPM can act as a site local replica catalog

– SRMv1, SRMv2.1 and SRMv2.2

– gridFTP, rfio

• Other Storage Element technologies…
– dCache

– CASTOR

34

File Catalogues

• LFC
– secure (authn: GSI, authz: VOMS) file and replica catalogue; DLI

– supports full POSIX namespace and ACLs

– central file catalogue and local file catalogue modes

• Fireman
– secure (authn: GSI, authz: VOMS/ACL) file, replica and meta-data catalog; data location interface (DLI) for WMS

– web-service interface with bulk operations

• AMGA
– grid meta-data catalogue

– streaming socket interface

Glossary
SURL = Storage URL

GUID = Global Unique ID

LFN = Logical File Name
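To illustrate how these identifiers relate in a catalogue such as the LFC, here is a small, purely illustrative Python sketch mapping LFNs to a GUID and a set of replica SURLs; the file names, GUIDs and storage hosts are invented.

# Toy file catalogue: each logical file name (LFN) resolves to one GUID,
# and each GUID can have several replicas identified by SURLs.
import uuid

catalogue = {}          # LFN  -> GUID
replicas = {}           # GUID -> list of SURLs

def register_file(lfn, surl):
    """Register a new file: create its GUID and record the first replica."""
    guid = str(uuid.uuid4())
    catalogue[lfn] = guid
    replicas[guid] = [surl]
    return guid

def add_replica(lfn, surl):
    """Attach an additional replica (SURL) to an existing LFN."""
    replicas[catalogue[lfn]].append(surl)

def list_replicas(lfn):
    """Resolve an LFN to all of its replica SURLs."""
    return replicas.get(catalogue.get(lfn), [])

register_file("lfn:/grid/myvo/data/run1234.root",
              "srm://se01.example.org/dpm/example.org/home/myvo/run1234.root")
add_replica("lfn:/grid/myvo/data/run1234.root",
            "srm://se02.example.org/castor/example.org/myvo/run1234.root")
print(list_replicas("lfn:/grid/myvo/data/run1234.root"))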

35

File Transfer Service

• The File Transfer Service is a data movement fabric service
– multi-VO service, used to balance usage of site resources according to VO and site policies
– uses the SRM and gridFTP services of an SE

• Why is it needed ?
– For the user, it provides reliable point-to-point movement of Storage URLs (SURLs) among Storage Elements
– For the site manager, it provides a reliable and manageable way of serving file movement requests from their VOs
– For the VO manager, it provides the ability to control requests coming from users (re-ordering, prioritization, ...)
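To make the idea of balancing transfers between VOs concrete, here is a toy Python sketch of per-VO fair-share scheduling on a single transfer channel; the VO names, shares and queue contents are invented, and this is not the actual FTS algorithm.

# Toy per-VO fair-share scheduling on a transfer channel, in the spirit of
# what an FTS channel does according to site policy. Illustrative only.
from collections import deque

shares = {"atlas": 0.5, "cms": 0.3, "lhcb": 0.2}       # site policy per VO
queues = {vo: deque() for vo in shares}                 # pending transfers per VO
served = {vo: 0 for vo in shares}                       # transfers started so far

for i in range(5):
    queues["atlas"].append(f"srm://a/file{i} -> srm://b/file{i}")
    queues["cms"].append(f"srm://c/file{i} -> srm://d/file{i}")

def next_transfer():
    """Pick the non-empty VO queue that is furthest below its share."""
    total = sum(served.values()) or 1
    candidates = [vo for vo in shares if queues[vo]]
    if not candidates:
        return None
    vo = min(candidates, key=lambda v: served[v] / total - shares[v])
    served[vo] += 1
    return vo, queues[vo].popleft()

for _ in range(4):
    print(next_transfer())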

36

Grid Portals

37

Ganga
• Job Definition & Management
• Implemented in Python
• Extensible – plug-ins
• Used by ATLAS, LHCb & non-HEP

http://ganga.web.cern.ch/ganga/index.php
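Since Ganga exposes job definition as Python objects, a minimal job might look like the sketch below. This assumes an interactive Ganga session (or a script run with the ganga command), where Job, Executable and LCG are provided by Ganga itself; the executable and arguments are just examples.

# Minimal Ganga job sketch (run inside a Ganga session, where Job,
# Executable and LCG are injected by Ganga). Values are illustrative.
j = Job(name="hello-grid")
j.application = Executable(exe="/bin/echo", args=["Good", "Morning"])
j.backend = LCG()          # submit via the gLite/LCG grid backend
j.submit()

# Later, in the same session:
#   jobs                   # list all jobs and their status
#   j.peek("stdout")       # inspect the job's standard output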

38

GENIUS (INFN)

http://grid.infn.it/modules/italian/index.php?pagenum=6

39

Monitoring the Grid

User Support

40

Monitoring

• Lots and lots of it !
– SAM – Service Availability Monitor
  • https://twiki.cern.ch/twiki/bin/view/LCG/SAMOverview
– Network Monitoring – GridMon
  • http://gridmon.dl.ac.uk/gridmon/graph.html
– Google Maps & Real-time Monitors
– Grid Map
  • http://gridmap.cern.ch/gm/
– Application Level
  • ARDA Dashboard
    – CMS Dashboard
    – ATLAS Dashboard
    – LHCb Dashboard
    – ALICE Dashboard

41

Google Map – Site Status

• http://goc02.grid-support.ac.uk/googlemaps/sam.html

42

Google Map – Site Status

43

LCG Real-time Monitor

44

LCG Real-time Monitor

45

GridMap (2008)

GridMap (2009)

46

47

ARDA Dashboard http://dashboard.cern.ch/

• Used by all 4 LHC experiments to monitor jobs and file movements

48

ARDA Dashboard http://dashboard.cern.ch/

• Used by all 4 LHC experiments to monitor jobs and file movements

49

ARDA Dashboard http://dashboard.cern.ch/

• Used by all 4 LHC experiments to monitor jobs and file movements

50

ARDA Dashboard http://dashboard.cern.ch/

• Used by all 4 LHC experiments to monitor jobs and file movements

UK Status (& SAM Tests)

51

52

User Support

• Documentation
– oodles of it !

– but much room for improvement !

• Experiment (VO) specific contacts

• GGUS – Global Grid User Support
– ticket based
– Linked to
  • regional centres
  • software experts

53

Other Projects

Other Sciences

54

EGEE Related Infrastructure Projects

DEISA, TeraGrid

Coordination in SA1 for:

• EELA, BalticGrid, EUMedGrid, EUChinaGrid, SEE-GRID

Interoperation with

• OSG, NAREGI

SA3:

• DEISA, ARC, NAREGI

55

EGEE Collaborating Projects

Applications: improved services for academia, industry and the public

Support Actions: key complementary functions

Infrastructures: geographical or thematic coverage

EGEE - Communities

• Astronomy & Astrophysics
– large-scale data acquisition, simulation, data storage/retrieval

• Computational Chemistry
– use of software packages (incl. commercial) on EGEE

• Earth Sciences
– Seismology, Atmospheric modeling, Meteorology, Flood forecasting, Pollution

• Fusion (build up to ITER)
– Ion Kinetic Transport, Massive Ray Tracing, Stellarator Optimization

• Grid Observatory
– collect data on Grid behaviour (Computer Science)

• High Energy Physics
– four LHC experiments, BaBar, D0, CDF, Lattice QCD, Geant4, SixTrack, …

• Life Sciences
– Medical Imaging, Bioinformatics, Drug discovery
– WISDOM – drug discovery for neglected / emergent diseases (malaria, H5N1, …)

56

ESFRI Projects

• Many are starting to look at their e-Science needs
– some at a similar scale to the LHC (petascale)

– project design study stage

– http://cordis.europa.eu/esfri/

57

Cherenkov Telescope Array

National eScience Centre and other eScience Centres

• Edinburgh & Glasgow collaboration
• e-Science Institute
• Lectures & presentations
• Meeting place
• NeSC Mission Statement

– To stimulate and sustain the development of e-Science in the UK, to contribute significantly to its international development and to ensure that its techniques are rapidly propagated to commerce and industry.

– To identify and support e-Science projects within and between institutions in Scotland, and to provide the appropriate technical infrastructure and support in order to ensure rapid uptake of e-Science techniques by Scottish scientists.

– To encourage the interaction and bi-directional flow of ideas between computing science research and e-Science applications

– To develop advances in scientific data curation and analysis and to be a primary source of top quality systems and repositories that enable management, sharing and best use of research data.

58

Digital Curation

59

• Digital Curation Centre
– Edinburgh, NeSC, HATII, UKOLN, STFC
– Objectives
  • Provide strategic leadership in digital curation and preservation for the UK research community, with particular emphasis on science data
  • Influence and inform national and international policy
  • Provide advocacy and expert advice and guidance to practitioners and funding bodies
  • Create, manage and develop an outstanding suite of resources and tools
  • Raise the level of awareness and expertise amongst data creators and curators, and other individuals with a curation role
  • Strengthen community curation networks and collaborative partnerships
  • Continue our strong association with our research programme

• Particle Physics
– Study group / workshops (DESY & SLAC) in 2009 -> intermediate report to ICFA

60

Sustainability

EGI
European Grid Infrastructure

Move to EGI/NGI

• De-centralised
– emphasises NGIs

– still some centralised tasks

• Governed by NGIs
• Initial co-funding from EU
• For all disciplines
– sciences, humanities, …

62

e-IRG Recommendation, 12/2005:

“The e-IRG recognizes that the current project-based financing model of grids (e.g., EGEE, DEISA) presents continuity and interoperability problems, and that new financing and governance models need to be explored – taking into account the role of national grid initiatives as recommended in the Luxembourg e-IRG meeting.”


• Specialised Support Centres
- for VOs / disciplines (e.g. HEP)
- externally funded

EGI - Management Structure

63

EGI - Tasks

• Accounting, Security, User Support, Problem Tracking, Middleware testing, Deployment, VO Registration, Monitoring, Grid Information Systems, etc.

64

EGI - Transition

65

• EGI-DS Project
– establish Blueprint for EGI
– establish EGI.org

• EGEE
– begin transition ~Spring 2009

• EGI
– operational Spring 2010

• Continuity of service is KEY
– not only for LHC…

EGI - Status

• EU bids
– Proposals submitted – Nov’09

– Significant (for HEP) SSCs not invited to hearing !

– EGI-Inspire & EMI hearing yesterday -> anticipate infrastructure & middleware development will be funded

• New legal entity, EGI.eu, created last week in Amsterdam
– …and soon recruiting

• Proto UK NGI based on NGS & GridPP is in place

66

67

Further Information

iSGTW

International Science Grid This Week

68

http://www.isgtw.org/

69

Links

• GridPP http://www.gridpp.ac.uk/
• LCG http://lcg.web.cern.ch/LCG/
– LCG wiki https://twiki.cern.ch/twiki/bin/view/LCG/WebHome
– monitoring & status
• EGEE http://www.eu-egee.org/
– gLite http://glite.web.cern.ch/glite/
• EGI (European Grid Initiative)
– Design Study http://web.eu-egi.org/
• Computing in…
– ATLAS http://atlas-computing.web.cern.ch/atlas-computing/computing.php
– CMS http://cms.cern.ch/iCMS/jsp/page.jsp?mode=cms&action=url&urlkey=CMS_COMPUTING
– LHCb http://lhcb-comp.web.cern.ch/lhcb-comp/
• Portals
– Ganga http://ganga.web.cern.ch/ganga/
– GILDA https://gilda.ct.infn.it/
• Open Grid Forum http://www.ogf.org/
• Globus http://www.globus.org/
• Condor http://www.cs.wisc.edu/condor/

Robin Middleton

RAL-PPD/EGEE/GridPP

The End
