Managing VO data and process flows
Matthew J. Graham, CACR/Caltech
NVO Summer School 2005, 10 Sep 2005
THE US NATIONAL VIRTUAL OBSERVATORY




Overview

• Astronomical data
• VOStore/VOSpace
• Workflows
• Astrogrid workflow
• CEA


The importance of data

• Data is the raison d’être of the VO
• LSST is the data source nonpareil:
  – data rates of 540 MB/s (~16 TB in 8 hrs)
  – final archive > 3 PB of data

VO Wheel™

• Well-established ways of handling distributed data:
  – SRB
  – PVFS
  – OGSA-DAI
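As a quick check on the LSST numbers above: a sustained 540 MB/s over an 8-hour night comes to about 15.6 TB, matching the quoted ~16 TB. This sketch assumes decimal units (1 TB = 10^6 MB) and a constant rate; with binary units the figure would be somewhat lower.

```python
# Assumptions: decimal units (1 TB = 10**6 MB) and a rate held constant
# for the whole observing period.
MB_PER_TB = 1_000_000
SECONDS_PER_HOUR = 3600


def volume_tb(rate_mb_per_s: float, hours: float) -> float:
    """Data volume in TB produced at rate_mb_per_s over `hours` hours."""
    return rate_mb_per_s * hours * SECONDS_PER_HOUR / MB_PER_TB


if __name__ == "__main__":
    print(f"{volume_tb(540, 8):.2f} TB per 8-hour night")  # 15.55 TB
```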


Data use cases

• Client has data:
  – stored locally: client transfers it to the service
  – stored locally: service retrieves it
  – stored elsewhere: service retrieves it

• Service generates data:
  – stores it locally: notifies client of location
  – transfers it to the client’s local store
  – transfers it to a client-designated store
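The six cases differ mainly in who moves the bytes. The sketch below encodes them as plain records; the field names and the `transfer_by` classification are assumptions made for illustration, not part of any VO specification. In the first service case the service only reports a location, so it is recorded as no transfer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Party(Enum):
    CLIENT = "client"
    SERVICE = "service"


@dataclass(frozen=True)
class DataUseCase:
    producer: Party               # who has or generates the data
    source: str                   # where the data starts
    destination: str              # where the data ends up
    transfer_by: Optional[Party]  # who moves it; None = no move, location reported


USE_CASES: List[DataUseCase] = [
    # Client has data
    DataUseCase(Party.CLIENT, "client local store", "service", Party.CLIENT),
    DataUseCase(Party.CLIENT, "client local store", "service", Party.SERVICE),
    DataUseCase(Party.CLIENT, "elsewhere", "service", Party.SERVICE),
    # Service generates data
    DataUseCase(Party.SERVICE, "service", "service local store", None),
    DataUseCase(Party.SERVICE, "service", "client local store", Party.SERVICE),
    DataUseCase(Party.SERVICE, "service", "client-designated store", Party.SERVICE),
]


def moved_by(party: Optional[Party]) -> List[DataUseCase]:
    """Use cases in which `party` performs the transfer (None: no transfer)."""
    return [uc for uc in USE_CASES if uc.transfer_by is party]
```

In this encoding the service does the moving in four of the six cases.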


VOStore

• Provides a uniform interface to existing or new data storage locations (Facade pattern)

• Structured and unstructured data are both first-class
• Methods:


  – get
  – put
  – list / listAll
  – importInit
  – importData (sync/async)
  – exportInit
  – exportData (sync/async)
  – delete
  – rename
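To illustrate the Facade idea, here is a minimal sketch in which one abstract interface fronts two unrelated backends (an in-memory dictionary and a local directory). Only a subset of the slide's methods is shown (`get`, `put`, `list`, `delete`, `rename`); the signatures are assumptions, and the two-phase import/export operations are omitted.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List


class VOStore(ABC):
    """Uniform facade over a storage backend.

    Method names follow the slide; the signatures are illustrative
    assumptions, not taken from a VOStore specification.
    """

    @abstractmethod
    def get(self, name: str) -> bytes: ...

    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def list(self) -> List[str]: ...

    @abstractmethod
    def delete(self, name: str) -> None: ...

    @abstractmethod
    def rename(self, old: str, new: str) -> None: ...


class MemoryStore(VOStore):
    """Backend that keeps objects in a dictionary."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def get(self, name: str) -> bytes:
        return self._objects[name]

    def put(self, name: str, data: bytes) -> None:
        self._objects[name] = data

    def list(self) -> List[str]:
        return sorted(self._objects)

    def delete(self, name: str) -> None:
        del self._objects[name]

    def rename(self, old: str, new: str) -> None:
        self._objects[new] = self._objects.pop(old)


class DirectoryStore(VOStore):
    """Backend that maps object names to files in a local directory."""

    def __init__(self, root: str) -> None:
        self._root = Path(root)
        self._root.mkdir(parents=True, exist_ok=True)

    def get(self, name: str) -> bytes:
        return (self._root / name).read_bytes()

    def put(self, name: str, data: bytes) -> None:
        (self._root / name).write_bytes(data)

    def list(self) -> List[str]:
        return sorted(p.name for p in self._root.iterdir() if p.is_file())

    def delete(self, name: str) -> None:
        (self._root / name).unlink()

    def rename(self, old: str, new: str) -> None:
        (self._root / old).rename(self._root / new)


def demo(store: VOStore) -> List[str]:
    """Client code written once against the facade, whatever the backend."""
    store.put("a.fits", b"SIMPLE")
    store.rename("a.fits", "b.fits")
    return store.list()


if __name__ == "__main__":
    print(demo(MemoryStore()))            # ['b.fits']
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(DirectoryStore(tmp)))  # ['b.fits']
```

Because `demo` only sees `VOStore`, swapping the backend needs no change to client code, which is the point of the Facade pattern.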


VOSpace

• Orchestrates VOStores:

  – data collections: directories, user-defined
  – authorisation: user groups
  – processing efficiency: where is the nearest copy?


• move
• copy
• identifiers
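A sketch of the orchestration role, assuming each store advertises a hypothetical numeric access cost: VOSpace keeps one namespace of identifiers, answers "where is the nearest copy?" by picking the cheapest store that holds the item, and implements `copy` and `move` across stores. The `vos://example/...` identifier in the demo is made up, and collections and authorisation are not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Store:
    """Stand-in for one VOStore: named, with a hypothetical access cost
    (e.g. network distance from where processing will run)."""
    name: str
    cost: float
    objects: Dict[str, bytes] = field(default_factory=dict)


class VOSpace:
    """Orchestrates several stores behind one namespace of identifiers."""

    def __init__(self, stores: List[Store]) -> None:
        self.stores = {s.name: s for s in stores}

    def locations(self, ident: str) -> List[str]:
        """Names of all stores holding a copy of `ident`."""
        return [n for n, s in self.stores.items() if ident in s.objects]

    def nearest_copy(self, ident: str) -> str:
        """Lowest-cost store that holds `ident`."""
        holders = self.locations(ident)
        if not holders:
            raise KeyError(ident)
        return min(holders, key=lambda n: self.stores[n].cost)

    def get(self, ident: str) -> bytes:
        return self.stores[self.nearest_copy(ident)].objects[ident]

    def copy(self, ident: str, target: str) -> None:
        """Replicate `ident` into `target`, reading from the nearest copy."""
        self.stores[target].objects[ident] = self.get(ident)

    def move(self, ident: str, source: str, target: str) -> None:
        """Relocate `ident` from `source` to `target`."""
        self.stores[target].objects[ident] = self.stores[source].objects.pop(ident)


if __name__ == "__main__":
    local, remote = Store("local", cost=1.0), Store("remote", cost=10.0)
    space = VOSpace([local, remote])
    remote.objects["vos://example/img1"] = b"pixels"  # made-up identifier
    print(space.nearest_copy("vos://example/img1"))   # remote
    space.copy("vos://example/img1", "local")
    print(space.nearest_copy("vos://example/img1"))   # local
```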


A virtual super-peer data network?


How to manage the flows?

• Way of describing a flow:
  – processes/steps, inputs/outputs, serial/parallel execution, control logic, variables, inline scripting
  – preferably XML (verbose but rigorous)

• Way of controlling a flow: an engine
• e-Science vs. e-Business:
  – open-ended vs. closed
  – verification and publication
  – static vs. dynamic workflows
  – volume and type of data
  – meta-transactions
  – customer, manager and user vs. scientist
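To make the split between describing and controlling a flow concrete, the sketch below describes a workflow as plain data and interprets it with a tiny engine. It borrows the `sequence`/`flow` vocabulary of the Astrogrid language, here taken to mean "run children in order" and "run children concurrently"; the tuple encoding and the shared variables dictionary are assumptions, not any real engine's design.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict

# Hypothetical encoding of a workflow description as nested tuples:
#   ("step", name, func)     func(variables) -> value stored as variables[name]
#   ("sequence", [children]) run children one after another
#   ("flow", [children])     run children concurrently, then continue


def run(node: tuple, variables: Dict[str, Any]) -> None:
    """Minimal engine: interpret one node of the description."""
    kind = node[0]
    if kind == "step":
        _, name, func = node
        variables[name] = func(variables)
    elif kind == "sequence":
        for child in node[1]:
            run(child, variables)
    elif kind == "flow":
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(run, child, variables) for child in node[1]]
            for future in futures:
                future.result()  # wait for every branch; re-raise failures
    else:
        raise ValueError(f"unknown node type: {kind!r}")


EXAMPLE = ("sequence", [
    ("step", "ra", lambda v: 21),
    ("flow", [
        ("step", "a", lambda v: v["ra"] + 1),
        ("step", "b", lambda v: v["ra"] * 2),
    ]),
    ("step", "total", lambda v: v["a"] + v["b"]),
])

if __name__ == "__main__":
    env: Dict[str, Any] = {}
    run(EXAMPLE, env)
    print(env["total"])  # 64
```

The parallel branches write different variables here, so no locking is needed; a real engine would also have to handle the control logic, scripting and error handling listed above.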


Workflow patterns

• Sequence
• Parallel split / Synchronisation (AND)
• Exclusive choice / Simple merge (XOR)
• Multi-choice
• Multi-choice + Multi-merge
• Multi-choice + Synchronizing merge
• Multi-choice + Discriminator
• Deferred choice
• Multiple instances with/without synchronisation
• Implicit termination
• Interleaved parallel routing
• Milestone
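A few of these patterns map directly onto standard concurrency primitives. The sketch below implements three of them with `concurrent.futures`: parallel split followed by synchronisation (AND), exclusive choice followed by a simple merge (XOR), and the discriminator, which continues as soon as the first branch finishes. It is illustrative only; in this sketch the slower discriminator branches keep running in the background rather than being cancelled.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Any, Callable, List

Task = Callable[[Any], Any]


def parallel_split_and_sync(tasks: List[Task], arg: Any) -> List[Any]:
    """AND split + synchronisation: start every branch, wait for all of them."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(t, arg) for t in tasks]
        return [f.result() for f in futures]


def exclusive_choice(condition: Callable[[Any], bool], if_true: Task,
                     if_false: Task, arg: Any) -> Any:
    """XOR split + simple merge: exactly one branch runs, then flow continues."""
    branch = if_true if condition(arg) else if_false
    return branch(arg)


def discriminator(tasks: List[Task], arg: Any) -> Any:
    """Discriminator: continue with whichever branch completes first.
    The remaining branches are not cancelled; their results are ignored."""
    pool = ThreadPoolExecutor()
    futures = [pool.submit(t, arg) for t in tasks]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)  # do not block on the slower branches
    return next(iter(done)).result()


if __name__ == "__main__":
    print(parallel_split_and_sync([lambda x: x + 1, lambda x: x * 2], 5))  # [6, 10]
    print(exclusive_choice(lambda x: x > 0, lambda x: "positive",
                           lambda x: "non-positive", -3))                  # non-positive
    slow = lambda x: (time.sleep(0.5), "slow")[1]
    print(discriminator([slow, lambda x: "fast"], None))                   # fast
```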


Workflow kerfuffle

• Workflow languages: BPEL (BPEL4WS, WSBPEL, WSFL, XLANG), BPML, WS-CDL (WSCL, WSCI), XPDL, BPSS, PSL, AGWL, DGL, DPML, GJobDL, GSFL, GFDL, GWorkflowDL, MoML, SWFL, YAWL, SCUFL/Xscufl, WPDL, PIF, OWL-S, xWFL, XPL, INCA

• Workflow engines: Taverna, Kepler, Pegasus, DiscoveryNet, Triana, SPA, Geodise, ICENI, Askalon, GridNexus, BioPipe, BizTalk, BPWS4J, DAGMan, GridAnt, GJH, GRMS, GWFE, GWES, ITIEE, JIGSA, Karajan, ScyFLOW, SDSC Matrix, SHOP2, wftk, YAWL Engine, WFEE


Astrogrid workflow components

• JES (Job Execution System)
  – Astrogrid workflow engine
  – Manages control flow
  – Runs steps in a controlled, asynchronous fashion

• CEC (Common Execution Controller)
  – Manages step execution
  – Manages data flow

• CEA (Common Execution Architecture) apps
  – datacenters: support complex queries against archives
  – processing: consume data files and reduce them
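One way to picture the division of labour between JES and CEC is sketched below with `asyncio`. The names `jes_run`, `cec_execute` and the in-memory `STORE` (standing in for MySpace) are hypothetical. The engine decides the order of steps and only ever handles result locations, while the controller moves the actual data in and out of the store, so control flow and data flow stay separate. For simplicity, steps are awaited one at a time.

```python
import asyncio
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical shared store standing in for MySpace: results live here and
# only their locations travel along the control flow.
STORE: Dict[str, Any] = {}


async def cec_execute(step: str, app: Callable[..., Any], inputs: List[str]) -> str:
    """CEC-like controller: read inputs from the store, run the application
    in a worker thread, write its output back, and return the location."""
    data = [STORE[ref] for ref in inputs]
    result = await asyncio.to_thread(app, *data)
    ref = f"store://{step}"
    STORE[ref] = result
    return ref


async def jes_run(steps: List[Tuple[str, Callable[..., Any], List[str]]]) -> Dict[str, str]:
    """JES-like engine: decides when each step runs (control flow) and keeps
    only the locations the controller hands back, never the data itself."""
    refs: Dict[str, str] = {}
    for name, app, deps in steps:
        refs[name] = await cec_execute(name, app, [refs[d] for d in deps])
    return refs


if __name__ == "__main__":
    workflow = [
        ("a", lambda: 21, []),
        ("b", lambda x: x * 2, ["a"]),
    ]
    print(asyncio.run(jes_run(workflow)))  # {'a': 'store://a', 'b': 'store://b'}
    print(STORE["store://b"])              # 42
```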


Astrogrid workflow schematic

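[Figure: schematic of the Astrogrid workflow components (Portal, Client library, Registry, MySpace, JES, CEC, Command Line CEA, Datacenter CEA), linked by the interactions Save/load workflow, Save/load data, Resolve application, Application list and Submit workflow.]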
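The "Resolve application" and "Application list" interactions can be sketched as two lookups against a registry: a portal lists what is available, and the engine turns a step's tool name and interface into the endpoint of a CEA service. The record layout and the example.org URLs below are hypothetical; only the tool name `toolA` and the interface `simpleInterface` come from the workflow-language example in this talk.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ApplicationEntry:
    """Hypothetical registry record describing one CEA application."""
    name: str                    # tool name as used in a workflow step
    endpoint: str                # CEA service that runs it (made-up URL)
    interfaces: Tuple[str, ...]  # interfaces the application offers


class Registry:
    def __init__(self, entries: List[ApplicationEntry]) -> None:
        self._entries: Dict[str, ApplicationEntry] = {e.name: e for e in entries}

    def application_list(self) -> List[str]:
        """What a portal would show the user when building a workflow."""
        return sorted(self._entries)

    def resolve(self, name: str, interface: str) -> str:
        """What the engine needs at run time: where to send this step."""
        entry = self._entries.get(name)
        if entry is None:
            raise LookupError(f"unknown application {name!r}")
        if interface not in entry.interfaces:
            raise LookupError(f"{name!r} has no interface {interface!r}")
        return entry.endpoint


REGISTRY = Registry([
    ApplicationEntry("toolA", "http://cea.example.org/toolA", ("simpleInterface",)),
    ApplicationEntry("toolB", "http://dc.example.org/query", ("queryInterface",)),
])

if __name__ == "__main__":
    print(REGISTRY.application_list())                   # ['toolA', 'toolB']
    print(REGISTRY.resolve("toolA", "simpleInterface"))  # http://cea.example.org/toolA
```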


Astrogrid workflow language

<workflow name="a workflow">
  <description>description of the workflow</description>
  <sequence/flow>
    <set var="dec" value="15"/>
    <step name="a" result-var="a-results">
      <tool name="toolA" interface="simpleInterface">
        <input>
          <parameter name="RA"><value>21</value></parameter>
          <parameter name="Dec"><value>${dec}</value></parameter>
        </input>
        <output>
          <parameter name="results" indirect="true"><value>ftp://aServer/myResults</value></parameter>
        </output>
      </tool>
    </step>
    <step name="b">…
  </sequence/flow>
  <script>…
  <if test=…> <while test=…> <for var=… items=…> <parfor var=… items=…> <try> <catch>
</workflow>
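To show how a `<set>` variable reaches a step parameter, the sketch below parses a well-formed, sequence-only version of the example with Python's `xml.etree.ElementTree` and substitutes `${dec}` from the preceding `<set>`. The substitution rule (plain text replacement, document-order scoping) is a simplifying assumption; the real JES expression handling may differ. The `indirect="true"` output, which presumably marks the value as a reference to the data rather than the data itself, is left untouched.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict

WORKFLOW = """\
<workflow name="a workflow">
  <description>description of the workflow</description>
  <sequence>
    <set var="dec" value="15"/>
    <step name="a" result-var="a-results">
      <tool name="toolA" interface="simpleInterface">
        <input>
          <parameter name="RA"><value>21</value></parameter>
          <parameter name="Dec"><value>${dec}</value></parameter>
        </input>
        <output>
          <parameter name="results" indirect="true"><value>ftp://aServer/myResults</value></parameter>
        </output>
      </tool>
    </step>
  </sequence>
</workflow>
"""

VAR = re.compile(r"\$\{([^}]+)\}")  # matches ${name}


def resolved_inputs(xml_text: str) -> Dict[str, Dict[str, str]]:
    """Return {step name: {input parameter: value}}, replacing ${var}
    with values from <set> elements seen earlier in document order."""
    root = ET.fromstring(xml_text)
    variables: Dict[str, str] = {}
    steps: Dict[str, Dict[str, str]] = {}
    for elem in root.iter():  # document order, so <set> precedes later steps
        if elem.tag == "set":
            variables[elem.get("var")] = elem.get("value")
        elif elem.tag == "step":
            params: Dict[str, str] = {}
            for p in elem.findall("tool/input/parameter"):
                raw = p.findtext("value", default="")
                params[p.get("name")] = VAR.sub(lambda m: variables[m.group(1)], raw)
            steps[elem.get("name")] = params
    return steps


if __name__ == "__main__":
    print(resolved_inputs(WORKFLOW))  # {'a': {'RA': '21', 'Dec': '15'}}
```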


CEA

• Creates a uniform interface and model for an application and its parameters

• Provides a higher-level description than WSDL:
  – restricts how interfaces can be expressed
  – provides specific semantics for astronomical quantities
  – adds extra information, such as default values and GUI labels

• VOResource extensions for a general application
• Provides asynchronous operation:
  – callback, polling and job identification

• Allows separate data and control flows
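The asynchronous operation described above (a job identifier returned immediately, then either polling or a callback) can be sketched with threads. The class `AsyncService`, the `poll` helper and the phase names are assumptions for illustration, not the CEA interface.

```python
import threading
import time
import uuid
from enum import Enum
from typing import Any, Callable, Dict, Optional


class Phase(Enum):
    # Phase names are assumptions for this sketch.
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    ERROR = "ERROR"


class AsyncService:
    """Hypothetical CEA-style service: execute() returns a job id at once;
    callers then poll status() or receive a callback on completion."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._jobs: Dict[str, Dict[str, Any]] = {}

    def execute(self, func: Callable[..., Any], *args: Any,
                callback: Optional[Callable[[str, Phase, Any], None]] = None) -> str:
        job_id = str(uuid.uuid4())           # job identification
        self._set(job_id, Phase.PENDING)

        def worker() -> None:
            self._set(job_id, Phase.RUNNING)
            try:
                self._set(job_id, Phase.COMPLETED, func(*args))
            except Exception as exc:         # report failures as a phase
                self._set(job_id, Phase.ERROR, exc)
            if callback is not None:         # push notification
                callback(job_id, self.status(job_id), self.result(job_id))

        threading.Thread(target=worker).start()
        return job_id

    def _set(self, job_id: str, phase: Phase, result: Any = None) -> None:
        with self._lock:
            self._jobs[job_id] = {"phase": phase, "result": result}

    def status(self, job_id: str) -> Phase:
        with self._lock:
            return self._jobs[job_id]["phase"]

    def result(self, job_id: str) -> Any:
        with self._lock:
            return self._jobs[job_id]["result"]


def poll(service: AsyncService, job_id: str, interval: float = 0.01,
         timeout: float = 5.0) -> Phase:
    """Client-side polling until the job reaches a final phase."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        phase = service.status(job_id)
        if phase in (Phase.COMPLETED, Phase.ERROR):
            return phase
        time.sleep(interval)
    raise TimeoutError(job_id)
```

A client that cannot accept incoming calls would use `poll`; one that can would pass a `callback` and carry on with other work.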


Minimum CEA compliance

• Must implement the CommonExecutionConnector interface

• Must send a message to services implementing the ResultsListener interface

• Should send messages to services implementing the JobMonitor interface

• Should perform basic type checking on all parameter types during the init phase
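The compliance rules can be read as three interfaces plus one obligation on the init phase. The sketch below expresses them as Python abstract classes; the method names and signatures are hypothetical and do not reproduce the actual CEA interfaces, and `EchoApplication` is a toy that simply returns its type-checked parameters.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List, Tuple


class ResultsListener(ABC):
    @abstractmethod
    def results(self, job_id: str, results: Dict[str, Any]) -> None: ...


class JobMonitor(ABC):
    @abstractmethod
    def monitor(self, job_id: str, phase: str) -> None: ...


class CommonExecutionConnector(ABC):
    @abstractmethod
    def init(self, params: Dict[str, str]) -> str: ...

    @abstractmethod
    def execute(self, job_id: str) -> None: ...


class EchoApplication(CommonExecutionConnector):
    """Toy application that returns its type-checked parameters as results."""

    # Hypothetical parameter declarations: name -> type used for checking.
    PARAMETERS = {"RA": float, "Dec": float}

    def __init__(self, listener: ResultsListener,
                 monitors: Iterable[JobMonitor] = ()) -> None:
        self._listener = listener
        self._monitors = list(monitors)
        self._jobs: Dict[str, Dict[str, Any]] = {}

    def init(self, params: Dict[str, str]) -> str:
        checked: Dict[str, Any] = {}
        for name, typ in self.PARAMETERS.items():  # basic type checking
            if name not in params:
                raise ValueError(f"missing parameter {name}")
            try:
                checked[name] = typ(params[name])
            except (TypeError, ValueError) as exc:
                raise ValueError(f"{name} is not a valid {typ.__name__}") from exc
        job_id = f"job-{len(self._jobs) + 1}"
        self._jobs[job_id] = checked
        return job_id

    def execute(self, job_id: str) -> None:
        self._notify(job_id, "RUNNING")                            # should
        self._listener.results(job_id, dict(self._jobs[job_id]))  # must
        self._notify(job_id, "COMPLETED")                          # should

    def _notify(self, job_id: str, phase: str) -> None:
        for monitor in self._monitors:
            monitor.monitor(job_id, phase)


class Recorder(ResultsListener, JobMonitor):
    """Listener and monitor that records what it receives."""

    def __init__(self) -> None:
        self.results_seen: List[Tuple[str, Dict[str, Any]]] = []
        self.phases_seen: List[Tuple[str, str]] = []

    def results(self, job_id: str, results: Dict[str, Any]) -> None:
        self.results_seen.append((job_id, results))

    def monitor(self, job_id: str, phase: str) -> None:
        self.phases_seen.append((job_id, phase))
```

Rejecting a non-numeric `RA` in `init`, before any work is scheduled, is the cheap early check the last rule asks for.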