Experiment Applications: applying the power of the grid to real science Rick Cavanaugh University of Florida GriPhyN/iVDGL External Advisory Committee 13 January, 2002


Page 1:

Experiment Applications: applying the power of the grid to real science

Rick Cavanaugh

University of Florida

GriPhyN/iVDGL

External Advisory Committee

13 January, 2002

Page 2:

GriPhyN/iVDGL and ATLAS

Argonne, Boston, Brookhaven, Chicago, Indiana, Berkeley, Texas

Page 3:

13.01.2003 EAC Review 3

ATLAS at SC2002

• Grappa: manages the overall grid experience
• Magda: distributed data management and replication
• Pacman: defines and produces software environments
• DC1 production with GRAT: data challenge simulations for ATLAS
• Instrumented Athena: grid monitoring of ATLAS analysis applications
• vo-gridmap: virtual organization management
• Gridview: monitoring U.S. ATLAS resources
• WorldGrid: world-wide US/EU grid infrastructure

Page 4:

Pacman at SC2002

How did we install our software for this demo?
• % pacman -get iVDGL:WorldGrid ScienceGrid

Pacman lets you define how a mixed tarball/rpm/gpt/native software environment is
• Fetched
• Installed
• Set up
• Updated

This can be figured out once and exported to the rest of the world via caches
• % pacman -get atlas_testbed

Page 5:

The caches you have decided to trust

Installed software, pointer to local documentation

Dependencies are automatically resolved
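A minimal sketch of what automatic dependency resolution could look like, assuming a cache is simply a mapping from a package name to the packages it depends on. The cache contents and every package name other than atlas_testbed are hypothetical, and this is not Pacman's actual implementation; it only illustrates installing dependencies before the packages that need them.

```python
# Hypothetical cache: package name -> packages it depends on.
# Only "atlas_testbed" appears in the slides; the other names are made up.
CACHE = {
    "atlas_testbed": ["grid_middleware", "athena_runtime"],
    "athena_runtime": ["compiler_libs"],
    "grid_middleware": ["compiler_libs"],
    "compiler_libs": [],
}


def install_order(package, cache):
    """Return packages in an order where dependencies come first."""
    order = []
    done = set()
    in_progress = set()

    def visit(name):
        if name in done:
            return
        if name in in_progress:
            raise ValueError(f"circular dependency involving {name!r}")
        if name not in cache:
            raise KeyError(f"{name!r} not found in any trusted cache")
        in_progress.add(name)
        for dep in cache[name]:
            visit(dep)
        in_progress.discard(name)
        done.add(name)
        order.append(name)

    visit(package)
    return order


plan = install_order("atlas_testbed", CACHE)
print(plan)
```

A depth-first walk is used here because it naturally yields each dependency once, before its dependents, and makes circular definitions easy to detect.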

Pacman at SC2002


Page 6:

Grappa at SC2002

Web-based interface for Athena job submission to Grid resources

Based on XCAT Science Portal technology developed at Indiana

EDG JDL backend to Grappa

Common submission to US gatekeepers and EDG resource broker (through EDG “user interface” machine)

Page 7:

Grappa Communications Flow

[Diagram: Grappa communications flow. Labels:]
• Grappa Portal Machine: XCAT tomcat server
• Web Browsing Machine (JavaScript): Netscape/Mozilla/Int.Expl/PalmScape
• https - JavaScript
• http: JavaScript, Cactus framework
• Script-Based Submission: interactive or cron-job
• Compute Resources: Resource A ... Resource Z
• MAGDA: registers file/location, registers file metadata
• http:// browse catalogue
• CoG: Submission, Monitoring
• CoG: Data Copy
• Data Storage: Data Disk, HPSS
• Magda (spider)
• Input files

Page 8:

Instrumented Athena at SC2002

Part of SuperComputing 2002 ATLAS demo

Prophesy (http://prophesy.mcs.anl.gov/)

• An Infrastructure for Analyzing & Modeling the Performance of Parallel & Distributed Applications

• Normally a Parse & auto-instrument approach (C & FORTRAN).

NetLogger (http://www-didc.lbl.gov/NetLogger/)

• End-to-End Monitoring & Analysis of Distributed Systems

• C, C++, Java, Python, Perl, Tcl APIs

• Web Service Activation

Page 9:

GriPhyN/iVDGL and CMS

Caltech, Fermilab, Florida, San Diego, Wisconsin

Page 10:

Bandwidth Gluttony at SC2002

"Grid-Enabled" particle physics analysis application
• issued remote database selection queries
• prepared data object collections
• moved collections across the WAN using specially enhanced TCP/IP stacks
• rendered the results in real time on the analysis client workstation in Baltimore

Page 11:

MonaLisa at SC2002

MonaLisa (Caltech)

– Deployed on the US-CMS Test-bed

– Dynamic information/resource discovery mechanism using agents

– Implemented in
> Java / Jini with interfaces to SNMP, MDS, and Ganglia
> WSDL / SOAP with UDDI

– Proved critical during live CMS production runs

Pictures taken from Iosif Legrand

Page 12:

MOP and Clarens at SC2002

Simple, robust grid planner integrated with CMS production software

1.5 million simulated CMS events produced over 2 months (~30 CPU years)

[Diagram: MOP and Clarens architecture. Labels: VDT Client; MCRunJob; Linker; ScriptGen; Config; Req.; Self Des; Master; mop-submitter; DAGMan/Condor-G; GridFTP; Clarens Client; VDT Server 1 through VDT Server N, shown with Condor, GridFTP, and Clarens Server]

Page 13:

Chimera Production at SC2002

Used VDL to describe virtual data products and their dependencies

Used the Chimera Planners to map abstract workflows onto concrete grid resources

Implemented a WorkRunner to continuously schedule jobs across all grid sites

[Diagram: CMS pipeline stages Generator, Simulator, Formator, Reconstructor, and Ntuple, grouped under Production and Analysis; inputs labeled params, exec., and data]

Example CMS concrete DAG: Stage File In, Execute Job, Stage File Out, Register File
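The step from an abstract pipeline to a concrete DAG can be sketched as follows. This is an illustration only, not the Chimera planner: the function name, the node naming scheme, and the site name are hypothetical. It expands each abstract stage named on the slide into the four concrete steps of the example DAG and chains the stages together.

```python
STAGES = ["Generator", "Simulator", "Formator", "Reconstructor", "Ntuple"]
CONCRETE_STEPS = ["stage_in", "execute", "stage_out", "register"]


def concretize(stages, site):
    """Expand abstract stages into concrete nodes and (parent, child) edges."""
    nodes = []
    edges = []
    previous_last = None
    for stage in stages:
        step_nodes = [f"{stage}.{step}@{site}" for step in CONCRETE_STEPS]
        nodes.extend(step_nodes)
        # Steps within one stage run strictly in sequence.
        edges.extend(zip(step_nodes, step_nodes[1:]))
        # A stage starts only after the previous stage registered its output.
        if previous_last is not None:
            edges.append((previous_last, step_nodes[0]))
        previous_last = step_nodes[-1]
    return nodes, edges


# The site name is a placeholder, not a real grid site.
nodes, edges = concretize(STAGES, site="hypothetical_site")
for parent, child in edges:
    print(f"{parent} -> {child}")
```

A real planner would also choose a site per job and skip transfers for files that are already local; the sketch keeps a single site to stay short.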

Page 14:

mass = 200; decay = WW; stability = 1; event = 8

mass = 200; decay = WW; stability = 1; plot = 1

mass = 200; decay = WW; plot = 1

mass = 200; decay = WW; event = 8

mass = 200; decay = WW; stability = 1

mass = 200; decay = WW; stability = 3

mass = 200

mass = 200; decay = WW

mass = 200; decay = ZZ

mass = 200; plot = 1

mass = 200; event = 8

A virtual space of simulated data is created for future use by scientists...

Data Provenance at SC2002

Page 15:

mass = 200; decay = WW; stability = 1; event = 5

mass = 200; decay = WW; stability = 1; plot = 1

mass = 200; decay = WW; plot = 1

mass = 200; decay = WW; event = 8

mass = 200; decay = WW; stability = 1

mass = 200; decay = WW; stability = 3

mass = 200

mass = 200; decay = WW

mass = 200; decay = ZZ

mass = 200; plot = 1

mass = 200; event = 8

Search for WW decays of the Higgs Boson where only stable, final-state particles are recorded: mass = 200; decay = WW; stability = 1
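The search above amounts to matching a set of requested attributes against the attributes recorded for each virtual data product. A minimal sketch, assuming each product is described by a plain dictionary of parameters; the catalog list and the function name are hypothetical and are not the Chimera catalog API.

```python
# Attribute sets copied from the nodes shown on this slide.
CATALOG = [
    {"mass": 200},
    {"mass": 200, "decay": "WW"},
    {"mass": 200, "decay": "ZZ"},
    {"mass": 200, "plot": 1},
    {"mass": 200, "event": 8},
    {"mass": 200, "decay": "WW", "stability": 1},
    {"mass": 200, "decay": "WW", "stability": 3},
    {"mass": 200, "decay": "WW", "plot": 1},
    {"mass": 200, "decay": "WW", "event": 8},
    {"mass": 200, "decay": "WW", "stability": 1, "plot": 1},
    {"mass": 200, "decay": "WW", "stability": 1, "event": 5},
]


def search(catalog, **query):
    """Return products whose recorded attributes include every query item."""
    return [
        product
        for product in catalog
        if all(product.get(key) == value for key, value in query.items())
    ]


matches = search(CATALOG, mass=200, decay="WW", stability=1)
for product in matches:
    print(product)
```

Products that never recorded a stability value are excluded, since a missing attribute does not equal the requested one.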

Data Provenance at SC2002

Page 16:

mass = 200; decay = WW; stability = 1; LowPt = 20; HighPt = 10000

mass = 200; decay = WW; stability = 1; event = 8

mass = 200; decay = WW; stability = 1; plot = 1

mass = 200; decay = WW; plot = 1

mass = 200; decay = WW; event = 8

mass = 200; decay = WW; stability = 1

mass = 200; decay = WW; stability = 3

mass = 200

mass = 200; decay = WW

mass = 200; decay = ZZ

mass = 200; plot = 1

mass = 200; event = 8

...The scientist adds a new derived data branch... and continues to investigate!

Data Provenance at SC2002

Page 17:

GriPhyN and LIGO (Laser Interferometer Gravitational-wave Observatory)

ISI, Caltech, Milwaukee

Page 18:

LIGO’s Pulsar Search

[Diagram: LIGO pulsar search data flow. Labels: Interferometer; raw channels; Long time frames; Short time frames; Store; archive; Single Frame; Extract channel; transpose; Short Fourier Transform; Extract frequency range; Construct image; Time-frequency Image (axes: Hz, Time); Find Candidate event; DB; 30 minutes]

Page 19:

Developed at ISI as part of the GriPhyN project

Configurable system that can map and execute complex workflows on the Grid

Integrated with the GriPhyN Chimera system
• Receives an abstract workflow (AW) description from Chimera and produces a concrete workflow (CW)
• Submits the CW to DAGMan for execution
• Optimizations of the CW are done from the point of view of Virtual Data

Can perform AW planning based on application-level metadata attributes.

Given attributes such as time interval, frequency of interest, location in the sky, etc., Pegasus is currently able to produce any virtual data products present in the LIGO pulsar search

Pegasus: Planning for Execution in Grids
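One reading of optimizing "from the point of view of Virtual Data" is that jobs whose outputs already exist need not run again. The sketch below illustrates that idea under stated assumptions: the job names echo labels from the pulsar search diagram, but the workflow layout, the file names, and the set of existing replicas are all hypothetical, and this is not the Pegasus implementation.

```python
# Hypothetical abstract workflow: job -> (input files, output files).
ABSTRACT_WORKFLOW = {
    "extract_channel": (["raw_frames"], ["channel"]),
    "short_fourier_transform": (["channel"], ["sft"]),
    "construct_image": (["sft"], ["image"]),
    "find_candidates": (["image"], ["candidates"]),
}


def reduce_workflow(workflow, requested, existing):
    """Keep only the jobs needed to produce `requested` given `existing` files."""
    producer = {out: job for job, (_, outs) in workflow.items() for out in outs}
    needed_jobs = set()
    pending = [requested]
    while pending:
        lfn = pending.pop()
        if lfn in existing:
            continue  # already materialized somewhere, no job required
        if lfn not in producer:
            raise KeyError(f"no job produces {lfn!r} and no replica exists")
        job = producer[lfn]
        if job not in needed_jobs:
            needed_jobs.add(job)
            pending.extend(workflow[job][0])
    # Preserve the original job order for readability.
    return [job for job in workflow if job in needed_jobs]


# Assume a replica of the Fourier-transformed data is already registered.
jobs = reduce_workflow(ABSTRACT_WORKFLOW, "candidates", existing={"sft"})
print(jobs)
```

Walking backwards from the requested product means upstream jobs are pruned automatically once an existing intermediate file is found.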

Page 20:

Metadata Driven Configuration

[Diagram: metadata-driven configuration. Labels recovered from the figure:]

Components: Request Manager; MCS; RLS; Transformation Catalog; MDS; Current State Generator; VDL Generator; Chimera; Abstract and Concrete Planner; Concrete Planner; In-time scheduler; Submit File Generator for Condor-G; DAGMan Submission and Monitoring; Condor-G/DAGMan; Resource Selection Interface; Replica Selection Interface; Abstract DAG reduction

Other labels: User's VO information; Available Resources; In development (legend)

Numbered flows:
(1) Metadata Attributes
(2) Metadata Attributes
(3) Metadata Attributes
(4) List of Existing Virtual Data Products Matching the Request (LFNs)
(5) Logical File Names (LFNs)
(6) Physical File Names (PFNs)
(7) Current State
(8) Metadata Attributes, Current State
(9) Concrete DAG
(9b) Derivations
(10) concrete DAG
(10b) VDLx
(11) Physical Transformations
(12) Execution Environment Information
(13) DAGMan files
(14) DAGMan files
(15) DAG
(16) Log Files
(17) Monitoring
(18) Results

Page 21:

LIGO’s pulsar search at SC2002

The pulsar search conducted at SC 2002
• Used LIGO's data collected during the first scientific run of the instrument
• Targeted a set of 1000 locations of known pulsars as well as random locations in the sky
• Results of the analysis were published via LDAS (LIGO Data Analysis System) to the LIGO Scientific Collaboration
• Performed using LDAS and compute and storage resources at Caltech, University of Southern California, and University of Wisconsin Milwaukee

Page 22:

Results

SC 2002 demo
Over 58 pulsar searches
Total of
• 330 tasks
• 469 data transfers
• 330 output files
The total runtime was 11:24:35

To date
185 pulsar searches
Total of
• 975 tasks
• 1365 data transfers
• 975 output files
Total runtime 96:49:47
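For a rough sense of scale, the totals above can be turned into per-search averages. This is simple arithmetic on the quoted figures, assuming the runtimes are written as hours:minutes:seconds and that the demo figure refers to 58 searches; the helper name and the dictionary layout are hypothetical.

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Figures as quoted on the slide.
RUNS = {
    "SC 2002 demo": {"searches": 58, "tasks": 330, "transfers": 469, "runtime": "11:24:35"},
    "to date": {"searches": 185, "tasks": 975, "transfers": 1365, "runtime": "96:49:47"},
}

for label, run in RUNS.items():
    total = to_seconds(run["runtime"])
    tasks_per_search = run["tasks"] / run["searches"]
    minutes_per_search = total / run["searches"] / 60
    print(
        f"{label}: {tasks_per_search:.2f} tasks per search, "
        f"{minutes_per_search:.1f} minutes per search"
    )
```

The averages are only indicative, since the slide does not say whether searches ran concurrently or how the total runtime was measured.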

Page 23:

Virtual Galaxy Cluster System:

An Application of the GriPhyN Virtual Data Toolkit to Sloan Digital Sky Survey Data

Chicago, Argonne, Fermilab

Page 24:

The Brightest Cluster Galaxy Pipeline

[Diagram: pipeline with data products tsObj, Field, BRG, Core, Cluster, and Catalog, connected by transformations numbered 1 to 5]

Interesting intermediate data reuse made possible by Chimera:

maxBcg is a series of transformations

Cluster finding works well with 1 Mpc radius apertures. If one instead was looking for the sites of gravitational lensing, one would rather use a 1/4 Mpc radius. This would start at transformation 3.

1: extract galaxies from the full tsObj data set.
2: filter the field for Bright Red Galaxies.
3: calculate the weighted BCG likelihood for each galaxy (the most expensive step).
4: is this galaxy the most likely galaxy in the neighborhood?
5: remove extraneous data, and store in a compact format.
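The reuse of intermediate data can be sketched as a cache keyed by transformation number and by the parameters that affect it. This is an illustration only: the stage bodies are placeholders that record lineage rather than doing real SDSS processing, the function and variable names are hypothetical, and the assumption that the aperture first matters at transformation 3 is taken from the slide text.

```python
cache = {}
executed = []


def run_step(step, aperture_mpc):
    """Run transformation `step` (1 to 5), reusing cached upstream results."""
    # Steps 1 and 2 do not depend on the aperture, so their key omits it.
    key = (step,) if step < 3 else (step, aperture_mpc)
    if key in cache:
        return cache[key]
    upstream = run_step(step - 1, aperture_mpc) if step > 1 else "tsObj"
    executed.append(key)
    # Placeholder for the real transformation: just record the lineage.
    cache[key] = f"{upstream}>step{step}"
    return cache[key]


run_step(5, aperture_mpc=1.0)    # cluster finding with a 1 Mpc aperture
first_pass = list(executed)
executed.clear()
run_step(5, aperture_mpc=0.25)   # lensing-style search with a 1/4 Mpc aperture
second_pass = list(executed)
print(first_pass)
print(second_pass)
```

The second request only executes transformations 3 to 5, because the results of transformations 1 and 2 are found in the cache.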

Page 25:

The DAG

[Diagram: DAG with stages labeled BRG, Core, Cluster, and Catalog]

Page 26:

A DAG for 50 Fields

744 files, 387 nodes, 40 minutes

[Diagram: DAG for 50 fields, with counts labeled 108, 168, 60, and 50]

Page 27:

Example: Sloan Galaxy Cluster Analysis

With Jim Annis & Steve Kent, FNAL

[Figure: panels labeled Sloan Data, DAG, and Galaxy cluster size distribution. The plot shows Number of Clusters (axis ticks 1 to 100000) against Number of Galaxies (axis ticks 1 to 100).]

Page 28:

Conclusion

Built a virtual cluster system based on Chimera and SDSS cluster finding.

Described the five stages and data dependencies in VDL.

Tested the system on a virtual data grid.

Conducting performance analysis.

Helped improve Chimera.

Page 29:

Some CMS Issues/Challenges

How to generate more buy-in from the experiments? Sociological trust problem, not technical.

More exploitation of (virtual) collections of objects and further use of web services (work already well underway).

What is required to store the complete provenance of data generated in a grid environment?

Creation of collaborative peer-to-peer environments.

Data Challenge 2003-4: generate and analyze 5% of the expected data at startup (~1/2 year of continuous production).

What is the relationship between WorldGRID and the LCG?

Robust, portable applications!

Virtual Organization Management and Policy Enforcement.

Page 30:

Some ATLAS Issues/Challenges

How to generate more buy-in from the experiments? Sociological trust problem, not technical.

Fleshing out the notion of Pacman "Projects" and prototyping them

What is the best integration path for Chimera infrastructure with international ATLAS catalog systems? Need standardized Virtual Data API?

Packaging and distribution of ATLAS SW releases for each step in the production/analysis chain: gen, sim, reco, analysis.

LCG SW application development env. is now SCRAM: ATLAS evaluating possible migration from CMT to SCRAM

Page 31:

SDSS Challenges

Cluster Finding
• Distribution of clusters in the universe
• Evolution of the mass function
• Balanced I/O and compute

Power Spectrum
• Distribution of galaxies in the universe
• Direct constraints on cosmological parameters
• Compute intensive, prefer MPI systems
• Premium on discovering similar results

Analyses based on pixel data
• Weak lensing analysis of the SDSS coadded southern survey data
• Near Earth asteroid searches
• Galaxy morphological properties: NVO Galaxy Morphology Demo
• All involve moving around terabytes of data
• Or choosing not to

Page 32:

LIGO Challenges