10 Sep 2005
NVO Summer School 2005
Managing VO data and process flows
Matthew J. Graham, CACR/Caltech
THE US NATIONAL VIRTUAL OBSERVATORY
Overview
• Astronomical data
• VOStore/VOSpace
• Workflows
• Astrogrid workflow
• CEA
The importance of data
• Data is the raison d’être of the VO
• LSST is the data source nonpareil
  – data rates of 540 MB/s, ~16 TB in 8 hrs
  – final archive > 3 PB of data
VO Wheel™
• Well-established ways of handling distributed data:
  – SRB
  – PVFS
  – OGSA-DAI
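As a quick check on the LSST figures above, the nightly volume follows directly from the sustained rate. This is a back-of-envelope sketch only; decimal units (1 TB = 1,000,000 MB) are an assumption, and the function name is illustrative.

```python
# Back-of-envelope check of the LSST nightly data volume quoted above.
# Assumption: decimal units (1 TB = 1,000,000 MB).
RATE_MB_PER_S = 540
HOURS_PER_NIGHT = 8


def nightly_volume_tb(rate_mb_per_s: float, hours: float) -> float:
    """Return the data volume in TB for a sustained rate over `hours`."""
    seconds = hours * 3600
    return rate_mb_per_s * seconds / 1_000_000


volume = nightly_volume_tb(RATE_MB_PER_S, HOURS_PER_NIGHT)
print(f"{volume:.2f} TB per night")
```

The result is roughly 15.6 TB, consistent with the ~16 TB quoted for an 8-hour night.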
Data use cases
• Client has data:
  – stored locally: transfers it to service
  – stored locally: service retrieves it
  – stored elsewhere: service retrieves it
• Service generates data:
  – stores it locally: notifies client of location
  – transfers it to the client’s local store
  – transfers it to a client-designated store
VOStore
• Provides a uniform interface to existing or new data storage locations (Facade pattern)
• Structured/unstructured data both first level
• Methods:
  – get
  – put
  – list / listAll
  – importInit
  – importData (sync/async)
  – exportInit
  – exportData (sync/async)
  – delete
  – rename
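The Facade idea above can be sketched in a few lines: callers see the same small set of methods whatever the backend is. This is a minimal illustration, not the VOStore interface definition; the class names, signatures and the in-memory backend are assumptions, and the import/export methods are left out.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class VOStore(ABC):
    """Facade: the same small set of calls regardless of the backend."""

    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, name: str) -> bytes: ...

    @abstractmethod
    def list(self) -> List[str]: ...

    @abstractmethod
    def delete(self, name: str) -> None: ...

    @abstractmethod
    def rename(self, old: str, new: str) -> None: ...


class InMemoryStore(VOStore):
    """Hypothetical backend used only for illustration."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def put(self, name, data):
        self._items[name] = data

    def get(self, name):
        return self._items[name]

    def list(self):
        return sorted(self._items)

    def delete(self, name):
        del self._items[name]

    def rename(self, old, new):
        self._items[new] = self._items.pop(old)


store = InMemoryStore()
store.put("catalog.vot", b"<VOTABLE/>")
store.rename("catalog.vot", "m31.vot")
print(store.list())
```

A backend wrapping SRB or a local file system would subclass the same abstract class, so client code would not need to change.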
VOSpace
• Orchestrates VOStores:
  – data collections: directories, user-defined
  – authorisation: user groups
  – processing efficiency: where is the nearest copy?
• move
• copy
• identifiers
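To make the orchestration role concrete, here is a small sketch of a layer that copies and moves items between stores and answers the "nearest copy" question. Everything in it is hypothetical: the stores are plain dictionaries, the "store/name" identifier scheme is invented for the example, and the cost table stands in for whatever distance measure a real service would use.

```python
from typing import Dict, Tuple


class VOSpace:
    """Hypothetical orchestrator over several stores (here, plain dicts)."""

    def __init__(self, stores: Dict[str, Dict[str, bytes]]) -> None:
        self.stores = stores

    @staticmethod
    def _split(identifier: str) -> Tuple[str, str]:
        # Assumed identifier scheme "store/name"; not from the slides.
        store, _, name = identifier.partition("/")
        return store, name

    def copy(self, src: str, dst: str) -> None:
        s_store, s_name = self._split(src)
        d_store, d_name = self._split(dst)
        self.stores[d_store][d_name] = self.stores[s_store][s_name]

    def move(self, src: str, dst: str) -> None:
        # A move is a copy followed by removal of the source item.
        self.copy(src, dst)
        s_store, s_name = self._split(src)
        del self.stores[s_store][s_name]

    def nearest(self, name: str, cost: Dict[str, float]) -> str:
        """Return the identifier of the cheapest-to-reach copy of `name`."""
        holders = [s for s, items in self.stores.items() if name in items]
        best = min(holders, key=lambda s: cost[s])
        return f"{best}/{name}"


space = VOSpace({"caltech": {"m31.fits": b"..."}, "jhu": {}})
space.copy("caltech/m31.fits", "jhu/m31.fits")
print(space.nearest("m31.fits", {"caltech": 40.0, "jhu": 5.0}))
```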
A virtual super-peer data network?
How to manage the flows?
• Way of describing a flow:
  – processes/steps, inputs/outputs, serial/parallel execution, control logic, variables, inline scripting
  – preferably XML (verbose but rigorous)
• Way of controlling a flow: engine
• e-Science vs. e-Business:
  – open-ended vs. closed
  – verification and publication
  – static vs. dynamic workflows
  – volume and type of data
  – meta-transactions
  – customer, manager and user vs. scientist
Workflow patterns
• Sequence
• Parallel split (AND), Synchronisation
• Exclusive choice (XOR), Simple merge
• Multi choice
• Multi merge
• Multi choice + Synchronizing merge
• Multi choice + Multi merge
• Multi choice + Discriminator
• Deferred choice
• Multiple instances with/without synchronisation
• Implicit termination
• Interleaved parallel routing
• Milestone
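Three of the patterns listed above (parallel split, synchronisation and exclusive choice) can be sketched with the Python standard library. This is only an illustration of the control-flow shapes, not a workflow engine; the `fetch` step and the survey names are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(survey: str) -> str:
    # Stand-in for a real step such as a query against an archive.
    return f"{survey}: done"


def run_parallel(surveys):
    # Parallel split: one task per branch.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fetch, s) for s in surveys]
        # Synchronisation: block until every branch has finished.
        return [f.result() for f in futures]


def exclusive_choice(n_rows: int) -> str:
    # Exclusive choice (XOR): exactly one branch is taken.
    return "cross-match" if n_rows > 0 else "stop"


results = run_parallel(["SDSS", "2MASS"])
print(results, exclusive_choice(len(results)))
```

Results come back in submission order because each future is read in the order it was created, whatever order the branches actually finish in.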
Workflow kerfuffle
• Workflow languages: BPEL (BPEL4WS, WSBPEL, WSFL, XLANG), BPML, WS-CDL (WSCL, WSCI), XPDL, BPSS, PSL, AGWL, DGL, DPML, GJobDL, GSFL, GFDL, GWorkflowDL, MoML, SWFL, YAWL, SCUFL/Xscufl, WPDL, PIF, OWL-S, xWFL, XPL, INCA
• Workflow engines: Taverna, Kepler, Pegasus, DiscoveryNet, Triana, SPA, Geodise, ICENI, Askalon, GridNexus, BioPipe, BizTalk, BPWS4J, DAGMan, GridAnt, GJH, GRMS, GWFE, GWES, ITIEE, JIGSA, Karajan, ScyFLOW, SDSC Matrix, SHOP2, wftk, YAWL Engine, WFEE
Astrogrid workflow components
• JES (Job Execution System)
  – Astrogrid workflow engine
  – Manages control flow
  – Runs steps in a controlled asynchronous fashion
• CEC (Common Execution Controller)
  – Manages step execution
  – Manages data flow
• CEA (Common Execution Architecture) apps
  – datacenters: support complex queries against archives
  – processing: consume data files and reduce them
Astrogrid workflow schematic
[Schematic. Components: Portal, Registry, MySpace, Command Line CEA, Datacenter CEA, JES, Client library, CEC. Labelled interactions: Save/load workflow, Save/load data, Resolve application, Application list, Submit workflow.]
Astrogrid workflow language
<workflow name="a workflow">
  <description>description of the workflow</description>
  <sequence/flow>
    <set var="dec" value="15"/>
    <step name="a" result-var="a-results">
      <tool name="toolA" interface="simpleInterface">
        <input>
          <parameter name="RA"><value>21</value></parameter>
          <parameter name="Dec"><value>${dec}</value></parameter>
        </input>
        <output>
          <parameter name="results" indirect="true"><value>ftp://aServer/myResults</value></parameter>
        </output>
      </tool>
    </step>
    <step name="b">…
  </sequence/flow>
  <script>… <if test=…> <while test=…> <for var=… items=…> <parfor var=… items=…> <try> <catch>
</workflow>
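To show how a document like the one above might be read, the sketch below parses a cut-down, well-formed version with the Python standard library and substitutes `${dec}` from the `<set>` element. It is not how JES works internally; the reduced document (a plain `<sequence>`, no output or script elements) and the function name `resolve_inputs` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from string import Template

# Cut-down, well-formed variant of the workflow shown above (assumption).
WORKFLOW = """
<workflow name="a workflow">
  <sequence>
    <set var="dec" value="15"/>
    <step name="a" result-var="a-results">
      <tool name="toolA" interface="simpleInterface">
        <input>
          <parameter name="RA"><value>21</value></parameter>
          <parameter name="Dec"><value>${dec}</value></parameter>
        </input>
      </tool>
    </step>
  </sequence>
</workflow>
"""


def resolve_inputs(xml_text: str) -> dict:
    """Map step name -> {parameter name: value with variables substituted}."""
    root = ET.fromstring(xml_text)
    variables = {}
    steps = {}
    # Walk the sequence in document order so <set> applies to later steps.
    for node in root.find("sequence"):
        if node.tag == "set":
            variables[node.get("var")] = node.get("value")
        elif node.tag == "step":
            params = {}
            for p in node.findall("./tool/input/parameter"):
                raw = p.findtext("value")
                params[p.get("name")] = Template(raw).substitute(variables)
            steps[node.get("name")] = params
    return steps


print(resolve_inputs(WORKFLOW))
```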
CEA
• Creates a uniform interface and model for an application and its parameters
• Provides a higher-level description than WSDL:
  – Restricts how interfaces can be expressed
  – Provides specific semantics for astronomical quantities
  – Adds extra information, such as default values and GUI labels
• VOResource extensions for a general application
• Provides asynchronous operation:
  – callback, polling and job identification
• Allows separate data and control flows
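The asynchronous style described above (a job identifier returned at once, then either polling or a callback) can be sketched as follows. This is a generic illustration, not the CEA interface: the `JobService` class, its method names and the status strings "RUNNING" and "COMPLETED" are all assumptions.

```python
import threading
import time
import uuid


class JobService:
    """Hypothetical asynchronous service: submit returns at once with a job id."""

    def __init__(self) -> None:
        self._status = {}
        self._lock = threading.Lock()

    def submit(self, work, on_done=None) -> str:
        job_id = uuid.uuid4().hex
        with self._lock:
            self._status[job_id] = "RUNNING"

        def run():
            work()
            with self._lock:
                self._status[job_id] = "COMPLETED"
            if on_done is not None:
                on_done(job_id)  # callback-style notification

        threading.Thread(target=run).start()
        return job_id

    def status(self, job_id: str) -> str:
        with self._lock:
            return self._status[job_id]


def wait_for(service, job_id, interval=0.01, timeout=5.0) -> str:
    """Polling style: ask for the status until the job completes."""
    deadline = time.monotonic() + timeout
    while service.status(job_id) != "COMPLETED":
        if time.monotonic() > deadline:
            raise TimeoutError(job_id)
        time.sleep(interval)
    return service.status(job_id)


service = JobService()
notified = []
job = service.submit(lambda: time.sleep(0.05), on_done=notified.append)
final_status = wait_for(service, job)
print(final_status)
```

Only the job identifier and status travel over this path; the data itself could be written to a separate store, which is the point of keeping data and control flows apart.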
Minimum CEA compliance
• Must implement CommonExecutionConnector interface
• Must send a message to services implementing ResultsListener interface
• Should send messages to services implementing JobMonitor interface
• Should perform basic type checking on all parameter types during init phase
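The last requirement, basic type checking during the init phase, might look something like the sketch below. The declared parameter table and the function name are hypothetical; a real CEA application would take its parameter definitions from its registered description.

```python
# Hypothetical parameter declarations: name -> expected Python type.
DECLARED = {"RA": float, "Dec": float, "radius": float, "catalog": str}


def check_parameters(declared, supplied):
    """Return a list of problems; an empty list means the job may start."""
    problems = []
    for name, raw in supplied.items():
        expected = declared.get(name)
        if expected is None:
            problems.append(f"unknown parameter: {name}")
            continue
        try:
            expected(raw)  # basic check: can the raw value be converted?
        except (TypeError, ValueError):
            problems.append(f"{name}: expected {expected.__name__}, got {raw!r}")
    return problems


print(check_parameters(DECLARED, {"RA": "21", "Dec": "north"}))
```

Rejecting a bad value at init time means the failure is reported before any data is moved or any processing is scheduled.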