
November 16, 2007 Dominique Boutigny – CC-IN2P3 Grids: Tools for e-Science DoSon AC GRID School.



Main characteristics of a Grid

A grid is an architecture and a set of software tools designed to federate distributed computing resources.

Resources are in principle heterogeneous.

Each node of the grid is administered locally, but there should be central coordination in order to keep the system coherent.

An information system (even a very light one) should be present in order to match the computing tasks to the computing environment.

The underlying network is crucial.

A security and authorization system should be present.
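The role of the information system can be illustrated with a small matchmaking sketch. It is only an illustration under stated assumptions: the `Site` and `Job` classes, their attributes, and the site names are hypothetical and do not correspond to any real grid information schema.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Attributes a site might publish (hypothetical schema)."""
    name: str
    os: str
    free_cores: int
    memory_gb_per_core: float


@dataclass
class Job:
    """Requirements a computing task might declare (hypothetical schema)."""
    name: str
    os: str
    cores: int
    memory_gb_per_core: float


def match(job, sites):
    """Return the names of the sites whose published attributes satisfy the job."""
    return [
        s.name
        for s in sites
        if s.os == job.os
        and s.free_cores >= job.cores
        and s.memory_gb_per_core >= job.memory_gb_per_core
    ]


# Hypothetical catalogue, as a very light information system might hold it.
SITES = [
    Site("site-a", "SL4", 120, 2.0),
    Site("site-b", "SL4", 0, 4.0),      # right OS, but no free cores
    Site("site-c", "Debian", 64, 2.0),  # free cores, but wrong OS
]

JOB = Job("reco", "SL4", 8, 2.0)
MATCHES = match(JOB, SITES)
```

Because resources are heterogeneous and locally administered, the matching step relies only on what each site chooses to publish, which is why even a very light catalogue is enough to make it work.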


Different kinds of production Grids

Computing Grid

Data Grid

Both Computing and Data

Molecular docking

Medical imagery

Astronomical data

LHC data processing


Grids are a good way to increase the computing power available to a scientific community by pooling resources

Grids federate and contribute to building scientific communities

But

Grids are often complicated to manage. A large grid requires strong coordination between the participating sites


The LHC Computing Grid

LCG


Figure (height comparison): Concorde (15 km), Balloon (30 km), CD stack with 1 year of LHC data (~20 km), Mt. Blanc (4.8 km)

4 LHC experiments

15 PetaByte of data per year

We have got a problem with data

100 Million SpecInt2000

This is ~5000 of today's 8-core computers

~15 M$

Relatively easy to set up: each CPU core is independent of the others

15 PetaByte of data per year

Today, this is ~20 M$ if you want to put them on disk

And you also need to store the Monte Carlo simulation

Need to store data securely for the whole life of the experiments

Complicated architecture, as the data have to move worldwide

Each LHC contributor should be able to have access to any data
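The orders of magnitude on this slide can be cross-checked with a short back-of-envelope calculation. Only the inputs marked "from the slide" come from the talk. The CD capacity (700 MB), the disc thickness (1.2 mm) and the use of decimal units are assumptions made for this sketch.

```python
# Back-of-envelope check of the slide's figures.

PB = 10**15  # bytes in a petabyte (assumption: decimal units)
TB = 10**12  # bytes in a terabyte (assumption: decimal units)

DATA_PER_YEAR_BYTES = 15 * PB  # from the slide
TOTAL_SI2K = 100e6             # from the slide (SpecInt2000)
N_MACHINES = 5000              # from the slide
CORES_PER_MACHINE = 8          # from the slide
CPU_COST_USD = 15e6            # from the slide
DISK_COST_USD = 20e6           # from the slide

CD_CAPACITY_BYTES = 700 * 10**6  # assumption: 700 MB per CD
CD_THICKNESS_M = 1.2e-3          # assumption: 1.2 mm per bare disc


def cd_stack_height_km(data_bytes, capacity=CD_CAPACITY_BYTES,
                       thickness=CD_THICKNESS_M):
    """Height in km of a stack of CDs holding `data_bytes`."""
    n_discs = data_bytes / capacity
    return n_discs * thickness / 1000.0


STACK_KM = cd_stack_height_km(DATA_PER_YEAR_BYTES)
SI2K_PER_CORE = TOTAL_SI2K / (N_MACHINES * CORES_PER_MACHINE)
COST_PER_MACHINE_USD = CPU_COST_USD / N_MACHINES
DISK_COST_PER_TB_USD = DISK_COST_USD / (DATA_PER_YEAR_BYTES / TB)
```

With these assumptions the stack comes out at roughly 26 km, the same order of magnitude as the ~20 km quoted on the slide; a slightly thinner disc or a larger capacity per disc would bring the two closer. The CPU figures imply about 2500 SpecInt2000 per core and about 3000 $ per machine, and the disk figure corresponds to roughly 1300 $ per TB.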


A Hierarchical Grid Architecture in an International Framework

T0

T1 (11): CC-IN2P3, FZK, PIC, NDGF, NIKHEF, ASCC, Brookhaven, Fermilab, TRIUMF, RAL, CNAF

T2 (52)

T3 (many)

Sites in France shown on the diagram: Île de France, Clermont, Nantes, Strasbourg, Marseille, Lyon, Annecy, CC-IN2P3


LCG Vs EGEE

In Europe the LHC Computing Grid is based on the multidisciplinary project EGEE

Middleware

Grid operation infrastructure

Pilot New

The Grid was a necessity for the LHC Computing

It was a very good opportunity for other disciplines

EGEE is also providing a very sophisticated operational framework

• Monitoring

• Ticketing system

EGEE-II: 90 partners – 32 countries – 32 M€

Crucial for the success of the project


LCG Vs EGEE


Interoperability

3 grid infrastructures are being used for LHC Computing
– EGEE in Europe
– NorduGrid in Nordic Countries
– OSG in the US

They are based on different middleware

These 3 infrastructures are now able to interoperate
– Job submission
– Operation

Developments on interoperability
– Short term: GIN (Grid Interoperability Now)
– Longer term: SAGA / JSDL etc.

Developed within the OGF framework


GRID Services for the LHC

Computing services
– Computing Element (CE)
– Worker nodes (WN), SL4
– Workload Management System

Storage
– Based on SRM
– dCache, Castor, StoRM, DPM

File Management
– Transfer: FTS
– Cataloguing: LFC

Database replication
– 3D Project

VOMS
– Virtual Organization Management

Specific experiment services
– VO Boxes

Will be used for priority management


The LHC Optical Private Network


LCG and emerging countries

The grid is a complex environment, which is mandatory to provide the huge computing resources necessary for the LHC
– The learning curve is steep!

Complexity … But…
– It provides a framework in which all the data will be available for every collaborator everywhere

This is a unique opportunity for laboratories in emerging countries to fully participate in the physics analysis


Lightweight Grids


BOINC

Network

Main server

BOINC provides a framework for a lightweight Grid targeting CPU-intensive applications running on small datasets
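The pattern described here, a main server handing small independent work units to many PCs, can be sketched as follows. This is a single-process simulation and not the BOINC API: the function names, the prime-counting stand-in for the science code, and the `pc-N` client labels are all hypothetical.

```python
from collections import deque


def make_work_units(start, stop, size):
    """Split [start, stop) into independent chunks of at most `size` items."""
    return deque((lo, min(lo + size, stop)) for lo in range(start, stop, size))


def is_prime(n):
    """Trial-division primality test, deliberately CPU-bound."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def cpu_intensive_task(lo, hi):
    """Stand-in for the science code: count the primes in [lo, hi)."""
    return sum(1 for n in range(lo, hi) if is_prime(n))


def run(server_queue, n_clients):
    """Simulated clients take turns pulling a unit, computing, and reporting back."""
    results = {}
    client = 0
    while server_queue:
        unit = server_queue.popleft()
        results[unit] = (f"pc-{client}", cpu_intensive_task(*unit))
        client = (client + 1) % n_clients
    return results


RESULTS = run(make_work_units(0, 100, 25), n_clients=3)
TOTAL_PRIMES = sum(count for _, count in RESULTS.values())
```

Each work unit is described by two integers and returns one, while the computation in between is comparatively expensive. That ratio of small data to heavy CPU work is what makes an application a good fit for this kind of lightweight grid.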


BOINC / Einstein@home

Data analysis from the giant interferometers LIGO and GEO: search for pulsar-generated gravitational waves

Fast Fourier transforms are computed on many chunks of the best data-taking periods.

Search for gravitational wave signals in 30 000 directions spread across the sky

Huge combinatorial problem

• Use of individual PCs

Big success: > 160 000 participants

Contribution to scientific outreach

Gravitational wave detection

http://einstein.phys.uwm.edu/
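The idea of transforming many independent chunks of data and looking for a periodic signal in each can be sketched in a few lines. This is not the Einstein@home pipeline: to stay dependency-free the sketch uses a plain discrete Fourier transform in place of an FFT, the input is a synthetic sine wave, and every name in it is hypothetical.

```python
import cmath
import math


def dft_power(samples):
    """Power spectrum (bins 0 .. n/2 - 1) from a direct O(n^2) DFT."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples))) ** 2
        for k in range(n // 2)
    ]


def strongest_bin(samples):
    """Index of the strongest non-DC frequency bin in one chunk."""
    power = dft_power(samples)
    return max(range(1, len(power)), key=power.__getitem__)


def analyse(signal, chunk_len):
    """Cut the signal into consecutive chunks and analyse each independently."""
    chunks = [signal[i:i + chunk_len]
              for i in range(0, len(signal) - chunk_len + 1, chunk_len)]
    return [strongest_bin(c) for c in chunks]


CHUNK = 64
# Synthetic periodic signal: 5 cycles per chunk, 4 chunks in total.
SIGNAL = [math.sin(2 * math.pi * 5 * t / CHUNK) for t in range(4 * CHUNK)]
PEAKS = analyse(SIGNAL, CHUNK)
```

Since each chunk is analysed on its own, the chunks can be shipped to different PCs, and repeating the search for every sky direction multiplies the number of independent tasks, which is where the combinatorial size of the problem comes from.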


BOINC

BOINC provides a framework for a lightweight Grid that can be used to federate the usage of distributed PCs

Standalone usage is possible in many domains. BOINC is already used by several teams working in Biology.

Certainly a way to explore for laboratories with limited computing resources


Java Job Submission (JJS)

Developed at CC-IN2P3 by Pascal Calvat

Java Job Submission is a very simple User Interface to submit jobs on the Grid
– Works on Mac, Windows and Linux
– Direct submission to Computing Element
– Very efficient
• Especially for short jobs
– Includes a learning system in order to dynamically build a list of the "best" submission sites based on their response time
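One way such a learning system could rank sites is to keep a running average of observed response times and sort on it. The sketch below is an assumption about how this might be done, not a description of how JJS actually works. It is written in Python rather than Java for brevity, and the `SiteRanker` class, the smoothing factor and the `ce-*` names are hypothetical.

```python
class SiteRanker:
    """Rank submission sites by an exponential moving average of response time."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # weight given to the newest observation
        self.avg = {}       # site name -> smoothed response time in seconds

    def record(self, site, response_s):
        """Fold a newly observed response time into the site's average."""
        prev = self.avg.get(site)
        if prev is None:
            self.avg[site] = response_s
        else:
            self.avg[site] = self.alpha * response_s + (1 - self.alpha) * prev

    def best(self, n=3):
        """Return up to `n` sites, fastest smoothed response first."""
        return sorted(self.avg, key=self.avg.get)[:n]


RANKER = SiteRanker(alpha=0.5)
OBSERVATIONS = [
    ("ce-a", 40.0), ("ce-b", 10.0), ("ce-c", 25.0),
    ("ce-a", 4.0), ("ce-b", 30.0),
]
for site, seconds in OBSERVATIONS:
    RANKER.record(site, seconds)

BEST = RANKER.best(2)
```

A moving average lets the list react when a site slows down or recovers, without letting a single unusually fast or slow response reorder everything.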


SRB: an example of a data Grid

Developed at the San Diego Supercomputer Center


SRB a Data Grid middleware (1)

Many scientific applications are based on data production and analysis

Figure: example sequence data (ATAGGCATAGGCTATAGGCCAGATTAA)


SRB a Data Grid middleware (2)

User wants the complexity to be hidden

Inspired by:

http://legacy-web.nbirn.net/Resources_rd/Educational/Tutorials/SRB/021202SRBTutorial/021202SRBIntroBIRN.ppt

Diagram labels: Put data, Get data, SRB, DB, Metadata Catalog
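How a metadata catalog can hide storage complexity behind simple put and get operations is sketched below. This is a toy model and not the SRB API: the `MetadataCatalog` and `DataGrid` classes, the resource names and the logical path are hypothetical, and in-memory dictionaries stand in for real storage systems.

```python
class MetadataCatalog:
    """Maps a logical file name to (storage resource, physical path)."""

    def __init__(self):
        self._entries = {}

    def register(self, logical_name, resource, path):
        self._entries[logical_name] = (resource, path)

    def locate(self, logical_name):
        return self._entries[logical_name]


class DataGrid:
    """Users only see logical names; the catalog resolves where data lives."""

    def __init__(self, resources):
        self.catalog = MetadataCatalog()
        self.resources = resources  # resource name -> dict acting as storage

    def put(self, logical_name, data, resource):
        """Store the data on a resource and record its location."""
        path = f"/{resource}/{len(self.resources[resource])}.dat"
        self.resources[resource][path] = data
        self.catalog.register(logical_name, resource, path)

    def get(self, logical_name):
        """Fetch the data without the caller knowing where it is stored."""
        resource, path = self.catalog.locate(logical_name)
        return self.resources[resource][path]


GRID = DataGrid({"disk-1": {}, "tape-1": {}})
GRID.put("/home/alice/run42.raw", b"ATAGG", resource="tape-1")
DATA = GRID.get("/home/alice/run42.raw")
```

The caller of `get` never names a resource or a physical path, so data can be moved between storage systems by updating the catalog alone.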


Biomedical applications using SRB

Diagram labels: MRI (Siemens MAGNETOM Sonata Maestro Class 1.5 T); Acquisition; Control PC; push DICOM; Export PC (DICOM server, SRB client)


The BIRN Project

Biomedical Informatics Research Network

Brain imagery – Study of brain diseases

http://www.nbirn.net/


SRB application in HEP

SuperNovae Factory project

Data acquisition in Hawaii remotely controlled from France

Data are exported to CC-IN2P3 and made available to physicists through SRB

BaBar data distribution has been using SRB for several years

Hundreds of TB of data have been transferred and referenced


Grid5000: a research grid

Grid5000 is a project to build a 5000-node grid dedicated to research on grid technologies

9 French sites are currently hosting 3166 Grid5000 nodes

Sites are connected together on a 10 Gb/s backbone

A booking system allows users to reserve nodes to run experiments. It is possible to install and deploy a complete software package, from the OS up to the applications, on all the nodes
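The booking idea, reserving a number of nodes for a time window and refusing requests that would exceed capacity, can be sketched as follows. This is a hypothetical toy and not the scheduler Grid5000 actually uses: the `Booking` class, the integer time slots and the 10-node capacity are assumptions made for illustration.

```python
class Booking:
    """Toy reservation table for a pool of identical nodes."""

    def __init__(self, n_nodes):
        self.n_nodes = n_nodes
        self.reservations = []  # list of (start, end, nodes), end exclusive

    def _used(self, start, end):
        """Nodes held by every reservation overlapping [start, end).

        Conservative: reservations that overlap the window but not each
        other are still summed, so a request can be refused even though
        it would fit.
        """
        return sum(n for s, e, n in self.reservations if s < end and start < e)

    def reserve(self, start, end, nodes):
        """Record the reservation if capacity allows; return True on success."""
        if nodes + self._used(start, end) > self.n_nodes:
            return False
        self.reservations.append((start, end, nodes))
        return True


BOOKING = Booking(n_nodes=10)
FIRST = BOOKING.reserve(0, 4, 6)   # accepted: the pool is empty
SECOND = BOOKING.reserve(2, 6, 6)  # refused: overlaps FIRST, 6 + 6 > 10
THIRD = BOOKING.reserve(4, 8, 6)   # accepted: starts when FIRST ends
```

Exclusive reservations are what make it safe to reinstall a node from the OS up, since nobody else is using it during the booked window.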

A network connection has recently been established between Grid5000 and the Japanese Grid NAREGI

A close collaboration between Research Grids and Production Grids is essential

Research Grids will develop the future software for the production grids

Production Grids will provide the framework to test new developments


Networks and the Digital Divide (1)

ICFA Standing Committee on Interregional Connectivity

R. Les Cottrell and Shahryar Khan http://www.slac.stanford.edu/xorg/icfa/icfa-net-paper-jan07/

PingER system running on 649 sites – 128 countries – 11 world regions


Networks and the Digital Divide (2)

Behind Europe:
– 6 Yrs: Russia, Latin America
– 7 Yrs: Mid-East, SE Asia
– 8-9 Yrs: So. Asia
– 11 Yrs: Cent. Asia
– 12 Yrs: Africa


The ORIENT / TEIN2 network

Internet connection difficulties are often related to the "last mile problem"

The institute's local network, the institute's connection to the main country backbone, etc. are often a problem

Hong Kong is also connected to GLORIAD

45 Mb/s

622 Mb/s to be upgraded to 2x2.5 Gb/s


Conclusions

Different kinds of grid systems have been presented
– They are adapted to different kinds of research
– They can be very light (BOINC) or much more complicated (LCG)

There are different ways to do Grid computing
– Can be very simple (a single User Interface)
– Can be more sophisticated (by deploying a complete Grid node)

But in any case the network quality is crucial!
– Emerging countries should put the focus on network development

The Grid is nothing by itself; only scientific applications matter!