Versioning for Workflow Evolution
Eran Chinthaka Withana, Beth Plale
School of Informatics and Computing, Indiana University, Bloomington, Indiana
Roger Barga, Nelson Araujo
Microsoft Research, Microsoft Corporation, Redmond, Washington
3rd International Workshop on Data Intensive Distributed Computing, Chicago, IL, US; June 22, 2010; Eran C. Withana
  – Sequence of activities
  – Set of configurable parameters and input data
  – Produces outputs to be analyzed and evaluated further
• Evolution of Research
  – Changes in research artifacts
Workflow Evolution
• Workflows as a good tool to track evolution of research
  – Automate repeatable tasks in an efficient manner
  – Algorithms & experimental procedures encoded into workflows
  – Tracking workflows tracks research too
• Tracking effects over time
  – Provenance of data products
  – Lineage and the roots of errors and affected data products
• Comparing Results
  – More than one research direction in a given experiment
  – Comparing outputs from different paths of the research
• Attribution
  – Attribution of credit based on who performed, who owns/created, who owns data products
  – Sharing and attribution of research can and should be an integral part of research
• E.g., sub-modules from myexperiments.org
• Workflow Evolution Framework and versioning model
  – Enables the management of knowledge encoded in workflow executions
Related Work
• Workflow evolution shares a lot in common with provenance collection frameworks
  – I. T. Foster, J.-S. Vockler, M. Wilde, and Y. Zhao. Chimera: A virtual data system for representing, querying, and automating data derivation. In Proceedings of the 14th International Conference on Scientific and Statistical Database Management, pages 37-46, Washington, DC, USA, 2002. IEEE Computer Society.
• Existing evolution frameworks
  – J. Freire, C. Silva, S. Callahan, E. Santos, C. Scheidegger, and H. Vo. Managing rapidly-evolving scientific workflows. Lecture Notes in Computer Science, 4145:10, 2006.
• Evolution Data Models
  – L. Bavoil, S. P. Callahan, P. J. Crossno, J. Freire, C. E. Scheidegger, C. T. Silva, and H. T. Vo. Vistrails: Enabling interactive multiple-view visualizations. In IEEE Visualization, 2005. VIS 05, pages 135-142.
• Versioning at different levels
  – Application level: D. Santry, M. Feeley, N. Hutchinson, and A. Veitch. Elephant: The file system that never forgets. In Workshop on Hot Topics in Operating Systems, pages 2-7. IEEE Computer Society, 1999.
  – System/database level: R. Chatterjee, G. Arun, S. Agarwal, B. Speckhard, and R. Vasudevan. Using applications of data versioning in database application development. In ICSE '04: Proceedings of the 26th International Conference on Software Engineering, pages 315-325, Washington, DC, USA, 2004. IEEE Computer Society.
  – Disk storage level: M. Flouris and A. Bilas. Clotho: Transparent data versioning at the block I/O level. In Proceedings of the 12th NASA Goddard, 21st IEEE Conference on Mass Storage Systems and Technologies (MSST 2004), pages 315-328, 2004.
Use Cases
1. Research Reproduction
2. Scientific Workflows
– In LEAD tracking namelist input files and visualizations
– Tracking activity binaries
Versioning Model
• Dimensions of workflow evolution
  – Direct evolution occurs when a user of the workflow performs one of the following actions:
    • Changes the flow and arrangement of the components within the system
    • Changes the components within the workflow
    • Changes input and/or output parameters or configuration parameters of different components within the workflow
  – Contributions tracks components that are reused from a previous system
• Workflow Evolution Capturing Stages
  – User explicitly saves the workflow
  – User closes the workflow editor
  – Execution of a workflow
• Warning: This granularity might not capture all edits
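The versioning model above can be illustrated with a minimal sketch. It is not the Trident implementation: the names `DirectChange`, `CaptureEvent`, `WorkflowVersion`, and `VersionRecorder` are hypothetical, the workflow is represented as a plain string, and the choice to skip a capture when the content hash is unchanged is an assumption made only for this example. The sketch records a version at the three capturing stages, tags it with the kinds of direct evolution and any reused components, and shows why edits made between capture points are not recorded.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class DirectChange(Enum):
    """The three kinds of direct evolution named in the versioning model."""
    FLOW = "flow or arrangement of components"
    COMPONENT = "components within the workflow"
    PARAMETER = "input, output, or configuration parameters"


class CaptureEvent(Enum):
    """The three stages at which a version is captured."""
    SAVE = "user explicitly saves the workflow"
    EDITOR_CLOSE = "user closes the workflow editor"
    EXECUTE = "workflow is executed"


@dataclass(frozen=True)
class WorkflowVersion:
    digest: str                              # content hash of the workflow (assumption)
    parent: Optional[str]                    # digest of the previous version, if any
    event: CaptureEvent                      # stage that triggered the capture
    changes: Tuple[DirectChange, ...] = ()   # direct-evolution dimension
    contributions: Tuple[str, ...] = ()      # components reused from a previous system


class VersionRecorder:
    """Hypothetical recorder: stores a version only at a capture event."""

    def __init__(self) -> None:
        self.versions: List[WorkflowVersion] = []

    def capture(self, event, definition, changes=(), contributions=()):
        digest = hashlib.sha256(definition.encode("utf-8")).hexdigest()
        head = self.versions[-1] if self.versions else None
        # Assumption for this sketch: an unchanged workflow creates no new version.
        if head is not None and head.digest == digest:
            return None
        version = WorkflowVersion(
            digest=digest,
            parent=head.digest if head else None,
            event=event,
            changes=tuple(changes),
            contributions=tuple(contributions),
        )
        self.versions.append(version)
        return version


recorder = VersionRecorder()
first = recorder.capture(CaptureEvent.SAVE, "A -> B",
                         contributions=("shared-sub-module",))
# The user edits to "A -> B -> C" and then to "A -> C" without saving:
# the intermediate state is never seen by the recorder.
second = recorder.capture(CaptureEvent.EXECUTE, "A -> C",
                          changes=[DirectChange.FLOW, DirectChange.COMPONENT])
# Closing the editor with no further edits produces no new version.
third = recorder.capture(CaptureEvent.EDITOR_CLOSE, "A -> C")
```

In this run only two versions exist, and the intermediate edit "A -> B -> C" is lost, which is the granularity limitation the warning refers to.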
Architecture within Trident Scientific Workflow Workbench