Workflow Topics for the Next-Generation SDM-Center
Ilkay Altintas, altintas@SDSC.edu, San Diego Supercomputer Center
Bertram Ludäscher, ludaesch@UCDAVIS.edu, UC DAVIS Department of Computer Science
SciDAC SDM AHM, Oct 5-6, 2005, NCSU, Raleigh, NC
[Image: Sir Walter Raleigh]
Overview
• Kepler/SPA:
  – What we have (The GOOD)
  – What we don’t (yet) have (The BAD)
  – What we really need?? (The UGLY)
• Things we might do; prioritization
SDM-AHM-10-05 NCSU Next SDM-C: Workflows
Macro Definitions …
• #define KEPLER KEPLER/SPA
• #define KEPLER KEPLER*SPA
• By the end:
• #define SPA KEPLERHPC
What we have – The GOOD
• Big Heritage from Ptolemy II
  – Vergil GUI for design and (some) execution monitoring
  – Actor-Oriented Modeling & Design
    • Director / Actor Separation
    • Models of Computation: PN, SDF, DE, …
    • Nested Workflows & Hierarchical Modeling
    • Research Results on Modeling Complex Systems
      – modal models, mobile models, reconfig’able models, model lifecycle management, higher-order actors, …
  – head-start for CCA Extensions, e.g.
    • SciRUN-2 Extensions (Steve P. et al.)
    • Self-Managing, Dynamically-Adaptive, Autonomous Components (Manish et al.)
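The director / actor separation can be illustrated with a minimal sketch. It is not the Ptolemy II or Kepler API: the names `sdf_like_director` and `pn_like_director` are hypothetical, and the two functions only loosely mimic a statically scheduled (SDF-style) and a process-network (PN-style) execution. The point is that the same unchanged actors run under either one.

```python
import queue
import threading
from typing import Callable, List

Actor = Callable[[int], int]  # assumption: an actor maps one input token to one output token
_DONE = object()              # end-of-stream marker used by the PN-like director


def sdf_like_director(actors: List[Actor], tokens: List[int]) -> List[int]:
    """Fire every actor once per token, in a fixed order decided up front."""
    results = []
    for token in tokens:
        for actor in actors:
            token = actor(token)
        results.append(token)
    return results


def pn_like_director(actors: List[Actor], tokens: List[int]) -> List[int]:
    """Run each actor in its own thread; actors communicate over FIFO queues."""
    channels = [queue.Queue() for _ in range(len(actors) + 1)]

    def run(actor: Actor, inbox: queue.Queue, outbox: queue.Queue) -> None:
        while True:
            token = inbox.get()
            if token is _DONE:
                outbox.put(_DONE)  # pass the end marker downstream and stop
                return
            outbox.put(actor(token))

    threads = [
        threading.Thread(target=run, args=(actor, channels[i], channels[i + 1]))
        for i, actor in enumerate(actors)
    ]
    for thread in threads:
        thread.start()
    for token in tokens:
        channels[0].put(token)
    channels[0].put(_DONE)

    results = []
    while True:
        token = channels[-1].get()
        if token is _DONE:
            break
        results.append(token)
    for thread in threads:
        thread.join()
    return results
```

The actors contain no scheduling logic, so swapping the director changes how the workflow executes without touching them. The sketch has no error handling: an actor that raises would stall the PN-like version.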
What we have – The GOOD
• Kepler Extensions (to Ptolemy II)
  – Mostly: loosely coupled, e.g. WS (web service) workflows
  – Many generic actors
    • ssh, scp, cmd-line, SRB, Globus, …
    • new R expression actor
  – Many custom actors
    • e.g. in PIW, TSI-1, TSI-2, GEON, SEEK, Resurgence, …
  – Several ad-hoc extensions & (initial) research, e.g.
    • External job scheduling (e.g. NIMROD, …)
    • Director extensions (fault tolerance via WS “retry”)
    • WF-Templates (structured combination of dataflow & control-flow)
• … and some scientific users … (TSI-1/2, PIW, GEON, SEEK, …)
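The "fault tolerance via WS retry" idea can be sketched as a wrapper that re-invokes a failing call a bounded number of times. This is an illustration under assumptions, not the Kepler director extension itself: `call_with_retry`, `attempts` and `delay_s` are hypothetical names, and the service call is any zero-argument Python callable.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(call: Callable[[], T], attempts: int = 3, delay_s: float = 0.0) -> T:
    """Invoke `call`, retrying on any exception up to `attempts` times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception as err:  # a real director would likely retry only transient faults
            last_error = err
            if attempt < attempts:
                time.sleep(delay_s)
    raise RuntimeError(f"call failed after {attempts} attempts") from last_error
```

Placing the retry policy in the wrapper (or in a director) rather than in each actor keeps the actors free of fault-handling code.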
Concept-based Actor Search
• Concept-based Actor Search
  – Implemented as proof-of-concept
    • Additional operations slated for next Kepler Release (data search, port-based actor search, etc.)
• Biggest Challenges
  – Building/searching a repository …
  – Making changes to MoML (see KAR)
  – GUI changes
  – Ontology management
[Figure labels: Workflow Components (MoML/KAR); Semantic Annotations (urn ids, instance expressions); Ontologies (OWL), Default + Other]
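A minimal sketch of what concept-based actor search could look like: actors carry semantic annotations (concept names), and a query for a concept also matches actors annotated with any of its subconcepts. The ontology, the actor names and the function names below are all hypothetical; a real implementation would read OWL ontologies and MoML/KAR annotations instead of Python dictionaries.

```python
from typing import Dict, List, Set

# Hypothetical mini-ontology: concept -> its direct superconcepts
SUPERCONCEPTS: Dict[str, List[str]] = {
    "BiomassMeasurement": ["Measurement"],
    "TemperatureMeasurement": ["Measurement"],
    "Measurement": [],
    "FileTransfer": [],
}

# Hypothetical semantic annotations: actor name -> annotated concepts
ANNOTATIONS: Dict[str, List[str]] = {
    "BiomassReader": ["BiomassMeasurement"],
    "TemperatureReader": ["TemperatureMeasurement"],
    "FileCopier": ["FileTransfer"],
}


def concept_and_ancestors(concept: str, superconcepts: Dict[str, List[str]]) -> Set[str]:
    """Return the concept together with all of its transitive superconcepts."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(superconcepts.get(current, []))
    return seen


def find_actors(query: str,
                annotations: Dict[str, List[str]],
                superconcepts: Dict[str, List[str]]) -> List[str]:
    """Find actors annotated with `query` or with any subconcept of it."""
    return sorted(
        name
        for name, concepts in annotations.items()
        if any(query in concept_and_ancestors(c, superconcepts) for c in concepts)
    )
```

Searching for "Measurement" returns both reader actors even though neither is annotated with that exact term, which is the difference from plain keyword search.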
The GOOD: Kepler Archives
• Purpose: Encapsulate WF data and actors in an archive file
  – … inlined or by reference
  – … version control
• More robust workflow exchange
• Easy management of semantic annotations
• Plug-in architecture (Drop in and use)
• Easy documentation updates
• A jar-like archive file (.kar) including a manifest
• All entities have unique ids (LSID)
• Custom object manager and class loader
• UI and API to create, define, search and load .kar files
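To make "a jar-like archive file including a manifest" concrete, here is a small sketch that writes a zip archive with a `META-INF/MANIFEST.MF` entry and looks up an entity by its id. The per-entry `lsid:` manifest attribute and the helper names are assumptions made for illustration; the actual .kar layout is not specified on the slide.

```python
import io
import zipfile
from typing import Dict, Optional, Tuple


def build_archive(entries: Dict[str, Tuple[str, bytes]]) -> bytes:
    """Build a jar-like archive. `entries` maps path -> (lsid, content)."""
    manifest_lines = ["Manifest-Version: 1.0", ""]
    for path, (lsid, _content) in sorted(entries.items()):
        # One manifest section per entry; the "lsid" attribute is an assumption.
        manifest_lines += [f"Name: {path}", f"lsid: {lsid}", ""]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("META-INF/MANIFEST.MF", "\n".join(manifest_lines))
        for path, (_lsid, content) in sorted(entries.items()):
            archive.writestr(path, content)
    return buffer.getvalue()


def find_by_lsid(archive_bytes: bytes, lsid: str) -> Optional[bytes]:
    """Return the content of the entry whose manifest section carries `lsid`."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        manifest = archive.read("META-INF/MANIFEST.MF").decode("utf-8")
        current_name = None
        for line in manifest.splitlines():
            if line.startswith("Name: "):
                current_name = line[len("Name: "):]
            elif line == f"lsid: {lsid}" and current_name is not None:
                return archive.read(current_name)
    return None
```

Because the manifest indexes entries by id, a loader can find an actor or data object without unpacking the whole archive.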
KAR File Example
<entity name="Multiply or Divide" class="ptolemy.kernel.ComponentEntity">
<property name="entityId" value="urn:lsid:localhost:actor:80:1"
• Designed to access local and distributed objects
• Objects: data, metadata, annotations, actor classes, supporting libraries, native libraries, etc. archived in kar files
• Advantages:
  – Reduce the size of Kepler distribution
    • Only ship the core set of generic actors and domains
  – Easy exchange of full or partial workflows for collaborations
  – Publish full workflows with their bound data
    • Becomes a provenance system for derived data objects
=> Separate SPA workflow repository and distribution
Provenance Framework
• Provenance
  – Track origin and derivation information about scientific workflows, their runs and derived information (datasets, metadata…)
• Need for Provenance
  – Association of process and results
  – reproduce results
  – “explain & debug” results (via lineage tracing, parameter settings, …)
  – optimize: “Smart Re-Runs”
• Types of Provenance Information:
  – Data provenance
    • Intermediate and end results including files and db references
  – Process (=workflow instance) provenance
    • Keep the wf definition with data and parameters used in the run
  – Error and execution logs
  – Workflow design provenance (quite different)
    • WF design is a (little supported) process (art, magic, …)
    • for free via cvs: edit history
    • need more “structure” (e.g. templates) for individual & collaborative workflow design
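Lineage tracing, mentioned above as a way to "explain & debug" results, can be sketched as a walk over recorded derivations. The class and method names here are hypothetical and the data ids are made up; the sketch only shows the idea of recording which inputs each output was derived from and then following those links upstream.

```python
from typing import Dict, List, Set


class LineageLog:
    """Records, for each derived data item, the actor and inputs that produced it."""

    def __init__(self) -> None:
        self._inputs_of: Dict[str, List[str]] = {}
        self._actor_of: Dict[str, str] = {}

    def record(self, output_id: str, actor: str, input_ids: List[str]) -> None:
        """Store one derivation step: `actor` produced `output_id` from `input_ids`."""
        self._inputs_of[output_id] = list(input_ids)
        self._actor_of[output_id] = actor

    def lineage(self, data_id: str) -> Set[str]:
        """Return every data item that `data_id` was directly or indirectly derived from."""
        seen: Set[str] = set()
        stack = list(self._inputs_of.get(data_id, []))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self._inputs_of.get(current, []))
        return seen
```

The same records also tell which downstream results become stale when an input changes, which is the information a smart re-run needs.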
Kepler Provenance Recording Utility
• Parametric and customizable
  – Different report formats
  – Variable levels of detail
    • Verbose-all, verbose-some, medium, on error
  – Multiple cache destinations
• Saves information on
  – User name, Date, Run, etc…
Joint work with Oscar Barney
Provenance: Next Steps
• .kar file generation, registration and search for provenance information
• Possible data/metadata formats
• Automatic report generation from accumulated data
• A relational schema for the provenance info in addition to the existing XML
• Smart re-runs
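One possible shape for a relational provenance schema is sketched below using SQLite. The table and column names are assumptions chosen to match the items the slides mention (user name, date, run, derived data); they are not the schema the authors planned.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: one row per workflow run, one row per derivation step.
SCHEMA = """
CREATE TABLE run (
    run_id     INTEGER PRIMARY KEY,
    workflow   TEXT NOT NULL,
    user_name  TEXT NOT NULL,
    run_date   TEXT NOT NULL
);
CREATE TABLE derivation (
    run_id     INTEGER NOT NULL REFERENCES run(run_id),
    actor      TEXT NOT NULL,
    input_id   TEXT NOT NULL,
    output_id  TEXT NOT NULL
);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory provenance store with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def runs_that_produced(conn: sqlite3.Connection, output_id: str) -> List[Tuple]:
    """Return (run_id, workflow, user_name) for every run that produced `output_id`."""
    cursor = conn.execute(
        "SELECT DISTINCT r.run_id, r.workflow, r.user_name "
        "FROM run r JOIN derivation d ON d.run_id = r.run_id "
        "WHERE d.output_id = ? ORDER BY r.run_id",
        (output_id,),
    )
    return cursor.fetchall()
```

A relational form makes questions such as "which runs produced this file?" a single query, which is awkward to answer from per-run XML logs.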
The Future
• From GOOD via BAD to UGLY
• The good news (about ‘bad’ and ‘ugly’)
  – Lots of interesting challenges!
  – … so ‘ugly’ is actually good!
What we don’t (yet) have … THE BAD
• Much is still to do (or still ongoing)
  – Detached execution
    • many options; depend on requirements
  – Kepler WF repository w/ dynamic actor plug-in
  – Smart Reruns
    • avoid doing (old) work twice
  – Smarter Reruns (too smart?)
    • reuse previous results for speed-up of (new) work
  – NIMROD Director, CONDOR Director …
  – Task manager / monitor
  – Support for WF design & reuse
• Via CCA (SciRUN-2, Ccaffeine, …) (Cipres uses CORBA)
• HPC needs: code-coupling as efficient & flexible as possible (e.g. Scott’s challenges…)
  – memory-to-memory (single node or shared memory),
  – MPI (multiple-nodes)
  – optimizations for transfer of data & control (streaming, socket-based connections)
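A smart rerun ("avoid doing (old) work twice") can be sketched as a result cache keyed by a hash of the actor name, its inputs and its parameters. `RerunCache` and its methods are hypothetical, and the sketch assumes inputs and parameters are JSON-serializable and that actors are deterministic.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class RerunCache:
    """Skips an actor execution when the same actor already ran on the same inputs."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}
        self.executions = 0  # how many times an actor function was actually called

    @staticmethod
    def _key(actor_name: str, inputs: Any, params: Any) -> str:
        payload = json.dumps([actor_name, inputs, params], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, actor_name: str, func: Callable[[Any, Any], Any],
            inputs: Any, params: Any) -> Any:
        """Return the cached result if present; otherwise execute and remember it."""
        key = self._key(actor_name, inputs, params)
        if key not in self._results:
            self._results[key] = func(inputs, params)
            self.executions += 1
        return self._results[key]
```

Changing a parameter changes the key, so only the affected steps are recomputed on the next run.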
Accord-CCA: Ccaffeine w/ Self-Managed Behavior
Source: Hua Liu and Manish Parashar
cf. w/ mobile models, reconfiguration in Ptolemy II
… begging for a Kepler design and implementation …
Different “Directors” for Different Concerns
• Example:
  – Ptolemy Directors – “factoring out” the concern of workflow “orchestration” (MoC)
  – common aspects of overall execution not left to the actors
• Similarly:
  – “Black Box” (“flight recorder”)
    • a kind of “recording central” to avoid wiring 100’s of components to recording-actor(s)
  – “Red Box” (error handling, fault tolerance)
    • use ftsh ideas; templates
  – “Yellow Box” (type checking)
    • for workflow design
  – “Blue Box” (shipping-and-handling)
    • central handling of data transport (by value, by reference, by scp, SRB, GridFTP, …)
  – “CCA++ Boxes”
    • Change behavior (e.g. algorithm) of a component
    • Change behavior (i.e., wiring) of a workflow in-flight
[Figure labels: SDF/PN/DE/…; Provenance Recorder; SHA @; Static Analysis; On Error; Component Mgr; Composition Mgr]
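The "Black Box" idea, a central recorder instead of wiring every component to a recording actor, can be sketched as a director that logs each firing itself. `RecordingDirector` is a hypothetical name and the actors are plain Python callables; this is not a Ptolemy II or Kepler class.

```python
from typing import Any, Callable, List, Tuple


class RecordingDirector:
    """Runs a chain of actors and records every firing in one central log."""

    def __init__(self, actors: List[Tuple[str, Callable[[Any], Any]]]) -> None:
        self._actors = actors
        # Each log entry is (actor name, input token, output token or raised error).
        self.log: List[Tuple[str, Any, Any]] = []

    def run(self, token: Any) -> Any:
        for name, actor in self._actors:
            try:
                result = actor(token)
            except Exception as err:
                self.log.append((name, token, err))  # failures are recorded too
                raise
            self.log.append((name, token, result))
            token = result
        return token
```

The actors know nothing about recording. Error handling, type checking or data transport could be factored out in the same way, which is the point of the colored boxes.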
Summary
• The GOOD:
  – lots to build upon
• The BAD:
  – no common / integrated architecture
    • use Kepler/SPA as a glue
    • this might be harder than it sounds
    • needs a mix of end-to-end application-drive and serious design effort for the integration architecture
• The UGLY:
  – HPC challenges: close coupling, fault tolerance, …
  – The good news: there’s work to be done!
Use of Semantics in SWF…
• “Smart” Search
  – Concept-based, e.g., “find all datasets containing biomass measurements”
• Improved Linking, Merging, Integration
  – Establishing links between data through semantic annotations & ontologies
  – Combining heterogeneous sources based on annotations
  – Concatenate, Union (merge), Join, etc.
• Transforming
  – Construct mappings from schema S1 to S2 based on annotations
• Semantic Propagation
  – “Pushing” semantic annotations through transformations/queries
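Combining heterogeneous sources based on annotations can be sketched as follows: each source maps its own column names to shared concepts, and a union is taken over the concept-level view. The column names, concept names and functions are hypothetical, and the sketch ignores units and type conversion.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def to_concepts(rows: List[Row], annotation: Dict[str, str]) -> List[Row]:
    """Rename annotated columns to their concepts; drop columns without annotation."""
    return [
        {annotation[column]: value for column, value in row.items() if column in annotation}
        for row in rows
    ]


def union_by_concept(sources: List[Tuple[List[Row], Dict[str, str]]]) -> List[Row]:
    """Union (merge) several sources after mapping each one to shared concepts."""
    merged: List[Row] = []
    for rows, annotation in sources:
        merged.extend(to_concepts(rows, annotation))
    return merged
```

Two tables that name the same quantity differently (for example `biomass_g` and `bm`) line up once both columns are annotated with one concept.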
Helping with “shims” / adapters
• Services can be semantically compatible, but structurally incompatible
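A minimal sketch of a shim, assuming two made-up services: both deal with a geographic point (semantically compatible), but one emits a "lat,lon" string while the other expects a dictionary (structurally incompatible). All three function names and the coordinate values are hypothetical.

```python
from typing import Dict


def producer_service() -> str:
    """Hypothetical upstream service: returns a point as a 'lat,lon' string."""
    return "32.88,-117.24"


def consumer_service(point: Dict[str, float]) -> str:
    """Hypothetical downstream service: expects a dict with 'lat' and 'lon' keys."""
    return f"lat={point['lat']:.2f} lon={point['lon']:.2f}"


def latlon_shim(text: str) -> Dict[str, float]:
    """Adapter that converts the producer's structure into the consumer's structure."""
    lat_text, lon_text = text.split(",")
    return {"lat": float(lat_text), "lon": float(lon_text)}
```

The shim carries no domain logic of its own; it only restructures the data so the two services can be connected.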