SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF

Dec 25, 2015

Page 1: SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF.

SAMGrid – A fully functional computing grid based on standard technologies

Igor Terekhov for the JIM team FNAL/CD/CCF

Page 2: SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF.

Igor Terekhov, FNAL

Outlook

• Brief History, D0 and CDF computing
• Grid Jobs and Information Management
  • Architecture
  • Job management
  • Information management
• Summary

Page 3: SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF.

History
• Run II CDF and D0 are the two largest currently running collider experiments.
• Each experiment is to accumulate ~1 PB/yr of raw, reconstructed, and analyzed data by 2007. Get the Higgs jointly.
• Real data acquisition: 5 /wk, 25 MB/s, 1 TB/day, plus MC.

Page 4: SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF.

Globally Distributed Computing and Grid
• D0: 80 institutions, 18 countries. CDF: 60 institutions, 12 countries.
• Many institutions have computing (including storage) resources, dozens for each of D0 and CDF.
• Some of these are actually shared, regionally or experiment-wide.
• Sharing is good:
  • It is a possible contribution by the institution to the collaboration while keeping the resources local.
  • The recent Grid trend (and its funding) encourages it.

Page 5: SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF.

Goals of Globally Distributed Computing in Run II

• To distribute data to processing centers: SAM is a way.
• To benefit from the pool of distributed resources: minimize job turnaround time, yet keep a single interface.
• To facilitate and automate job placement.
• To reliably execute jobs spread across multiple resources.
• To provide an aggregate view of the system and its activities, and keep track of what’s happening.
• To maintain security.
• Finally, to learn and prepare for the LHC computing.

Page 6: SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF.

Data Distribution - SAM
• SAM is Sequential data Access via Meta-data.
• http://{d0,cdf}db.fnal.gov/sam
• Presented numerous times, CHEPs.
• Core features: meta-data cataloguing, global data replication and routing, co-allocation of compute and data resources.
• Global data distribution:
  • MC import from remote sites
  • Off-site analysis centers
  • Off-site reconstruction (D0)
• See Lee Lueking’s talk for more details.

Page 7: SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF.

Now that the Data’s Distributed: JIM
• Grid Jobs and Information Management.
• Owes its existence to the D0 Grid funding: PPDG (the FNAL team), UK GridPP (Rod Walker, ICL).
• Very young: started in 2001.
• We actively explore, adopt, enhance, and develop new Grid technologies.
• We collaborate with the Condor team from the University of Wisconsin on job management.
• JIM with SAM is also called the SAMGrid.

Page 8: SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF.


Page 9: SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF.

Job Management Strategies
• We distinguish grid-level (global) job scheduling (selection of a cluster to run on) from local scheduling (distribution of the job within the cluster).
• We distinguish structured jobs from unstructured jobs:
  • Structured jobs have their details known to Grid middleware.
  • Unstructured jobs are mapped as a whole onto a cluster.
• In the first phase, we want reasonably intelligent scheduling and reliable execution of unstructured data-intensive jobs.

Page 10: SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF.

Job Management Highlights
• We seek to provide automated resource selection (brokering) at the global level, with final scheduling done locally.
• Focus on data-intensive jobs:
  • Execution time is composed of:
    • Time to retrieve any missing input data
    • Time to process the data
    • Time to store output data
  • To leading order, we rank sites by the amount of data cached at the site (minimizing missing input data).
  • The scheduler is interfaced with the data handling system.
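
The leading-order ranking described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the JIM implementation: the `Site` class, the file names, and the helpers `missing_inputs` and `rank_sites` are hypothetical, and in the real system the cache contents would come from the data handling system rather than from in-memory sets.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical description of an execution site and its data cache."""
    name: str
    cached_files: set = field(default_factory=set)


def missing_inputs(site, job_inputs):
    """Return the job's input files that are not yet cached at the site."""
    return set(job_inputs) - site.cached_files


def rank_sites(sites, job_inputs):
    """Order sites so that the one with the fewest missing inputs comes first.

    Ties are broken by site name to keep the ordering deterministic.
    """
    return sorted(
        sites,
        key=lambda s: (len(missing_inputs(s, job_inputs)), s.name),
    )


if __name__ == "__main__":
    job_inputs = ["f1", "f2", "f3", "f4"]  # hypothetical input files
    sites = [
        Site("site_a", {"f1"}),
        Site("site_b", {"f1", "f2", "f3"}),
        Site("site_c"),
    ]
    for site in rank_sites(sites, job_inputs):
        print(site.name, "missing:", len(missing_inputs(site, job_inputs)))
```

Only the data-retrieval term of the execution time is modelled here; processing and output-storage times are ignored, which is what "leading order" amounts to in this sketch.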

Page 11: SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF.

Job Management – Distinct JIM Features

• Decision making is based on both:
  • Information existing irrespective of jobs (resource description)
  • Functions of (job, resource)
• Decision making is interfaced with data handling middleware rather than with individual storage elements (SEs) or a replica catalog (RC) alone: this allows incorporation of data handling (DH) considerations.
• Decision making is entirely in the Condor framework (no resource broker (RB) of our own): strong promotion of standards and interoperability.
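
The two kinds of input to decision making can be illustrated with a small Python sketch. It is a plain-Python stand-in, not Condor matchmaking and not JIM code: the dictionary keys (`os`, `free_slots`, `dataset`), the function names, and the `cache_lookup` callable (which stands in for a query to the data handling middleware) are all assumptions made for the example.

```python
def matches(job, resource):
    """Filter using the static resource description.

    The resource attributes exist irrespective of any job; the job only
    supplies the requirements they are compared against.
    """
    return resource["os"] == job["required_os"] and resource["free_slots"] > 0


def cached_fraction(job, resource, cache_lookup):
    """A function of (job, resource): fraction of the job's dataset cached there.

    cache_lookup is a hypothetical stand-in for the data handling system.
    """
    return cache_lookup(job["dataset"], resource["name"])


def select_resource(job, resources, cache_lookup):
    """Return the matching resource with the highest cached fraction, or None."""
    candidates = [r for r in resources if matches(job, r)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: cached_fraction(job, r, cache_lookup))


if __name__ == "__main__":
    resources = [
        {"name": "cluster_1", "os": "linux", "free_slots": 4},
        {"name": "cluster_2", "os": "linux", "free_slots": 0},
        {"name": "cluster_3", "os": "linux", "free_slots": 2},
    ]
    cache = {
        ("dataset_x", "cluster_1"): 0.2,
        ("dataset_x", "cluster_2"): 0.9,
        ("dataset_x", "cluster_3"): 0.6,
    }

    def lookup(dataset, site):
        return cache.get((dataset, site), 0.0)

    job = {"required_os": "linux", "dataset": "dataset_x"}
    chosen = select_resource(job, resources, lookup)
    print(chosen["name"] if chosen else "no match")
```

In this sketch the cluster with the most cached data is skipped because its static description shows no free slots, so both kinds of information affect the outcome.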

Page 12: SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF.

[Figure: job management architecture. Labels recoverable from the diagram: JOB, User Interface, Submission Client, Job Management, Broker, Match Making Service, Information Collector, Execution Site #1 through Execution Site #n, Computing Element, Queuing System, Grid Sensors, Storage Element, Data Handling System.]

Page 13: SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF.

Monitoring Highlights

• Sites (resources) and jobs
• Distributed knowledge about jobs etc.
• Incremental knowledge building
• GMA for current state inquiries, logging for recent history studies
• All Web based

Page 14: SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF.

[Figure: JIM Monitoring. Labels recoverable from the diagram: Web Browser, Web Server 1 through Web Server N, Site 1 Information System through Site N Information System, IP.]

Page 15: SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF.

Information Management – Implementation and Technology Choices

• XML for representation of site configuration and (almost) all other information
• XQuery and XSLT for information processing
• Xindice and other native XML databases for database semantics
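
As a rough illustration of keeping site configuration in XML and querying it, the following Python sketch parses a small document and extracts information from it. Everything in it is an assumption for the example: the element and attribute names are invented and are not the SAMGrid schema, and Python's standard `xml.etree.ElementTree` with its limited XPath subset is used as a stand-in for XQuery, XSLT, and a native XML database.

```python
import xml.etree.ElementTree as ET

# Hypothetical site configuration; the element and attribute names are invented.
SITE_CONFIG = """
<site name="example_site">
  <cluster name="farm_1" batch_system="pbs">
    <computing_element nodes="32"/>
    <storage_element cache_gb="500"/>
  </cluster>
  <cluster name="farm_2" batch_system="condor">
    <computing_element nodes="8"/>
    <storage_element cache_gb="120"/>
  </cluster>
</site>
"""


def clusters_with_batch_system(xml_text, batch_system):
    """Return names of clusters whose batch_system attribute matches."""
    root = ET.fromstring(xml_text)
    path = f"./cluster[@batch_system='{batch_system}']"
    return [cluster.get("name") for cluster in root.findall(path)]


def total_cache_gb(xml_text):
    """Sum the advertised cache size over all storage elements at the site."""
    root = ET.fromstring(xml_text)
    return sum(int(se.get("cache_gb")) for se in root.iter("storage_element"))


if __name__ == "__main__":
    print(clusters_with_batch_system(SITE_CONFIG, "pbs"))
    print(total_cache_gb(SITE_CONFIG))
```

The same document could serve both resource advertisement and monitoring queries, which is the appeal of a single XML representation.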

Page 16: SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF.

[Figure: XML schemas. Labels recoverable from the diagram: Meta-Schema, Main Site/cluster Config Schema, Resource Advertisement, Monitoring Schema, DataHandling, HostingEnvironment.]

Page 17: SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF.

SAMGrid Project Status
• Core SAM is in maintenance stage.
• JIM: delivered prototype for D0 on Oct 10, 2002. Now deploying V1.
  • Remote job submission
  • Brokering based on data cached
  • Web-based monitoring
• SC-2002 demo: 11 sites (D0, CDF), big success.
• Post V1: OGSA, Web services, NG logging service.

Page 18: SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF.


Page 19: SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF.

Summary
• Run II experiments’ computing is highly distributed; the Grid trend is very relevant.
• The JIM (Jobs and Information Management) part of the SAMGrid addresses the needs for global and grid computing at Run II.
• We use Condor and Globus middleware to schedule jobs globally (based on data), and provide Web-based monitoring.

Page 20: SAMGrid – A fully functional computing grid based on standard technologies Igor Terekhov for the JIM team FNAL/CD/CCF.

Acks
• V. White, who created SAM and (co-)led it to success
• PPDG project management, for making d0grid possible
• GridPP project in the UK, for its funding
• Members of the Condor team, for fruitful discussions