VisTrails: Second Provenance Challenge
Tommy Ellkvist, David Koop, Juliana Freire
Joint work with: Erik Andersen, Steven P. Callahan, Emanuele Santos, Carlos E. Scheidegger, Cláudio Silva, and Huy T. Vo

VisTrails

Dec 31, 2015


Transcript
Page 1: VisTrails

VisTrails

Second Provenance Challenge

Tommy Ellkvist

David Koop

Juliana Freire

Joint work with: Erik Andersen, Steven P. Callahan, Emanuele Santos, Carlos E. Scheidegger, Cláudio Silva, and Huy T. Vo

Page 2: VisTrails

Outline

VisTrails Introduction
VisTrails Demo
Provenance Model and API
Challenge Results
Issues and Future Work

Page 3: VisTrails

VisTrails

Comprehensive provenance infrastructure for computational tasks
Support for exploratory tasks such as visualization and data mining
Workflows are iteratively refined as users generate and test hypotheses
New change-based provenance model
Uniformly captures data and workflow provenance

Page 4: VisTrails

Change-based Provenance

Provenance is stored as a tree of actions

[Figure labels: add module, add connection]
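A tree of actions can be read as follows: a workflow version is never stored whole; it is rebuilt by replaying the actions on the path from the root to that version. The sketch below illustrates this under stated assumptions. The `Action` class, the operation names, the module names, and `materialize` are hypothetical and are not taken from the VisTrails code base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    id: int
    prev_id: Optional[int]  # parent action; None marks the root
    op: str                 # "add_module" or "add_connection" (hypothetical names)
    payload: tuple


def materialize(actions, version):
    """Rebuild the workflow for `version` by replaying actions from the root."""
    path = []
    current = version
    while current is not None:
        path.append(actions[current])
        current = actions[current].prev_id
    modules, connections = set(), set()
    for action in reversed(path):  # replay root first
        if action.op == "add_module":
            modules.add(action.payload[0])
        elif action.op == "add_connection":
            connections.add(action.payload)
    return modules, connections


# A small tree with two branches that share version 1 as an ancestor.
tree = {
    1: Action(1, None, "add_module", ("reader",)),
    2: Action(2, 1, "add_module", ("contour",)),
    3: Action(3, 2, "add_connection", ("reader", "contour")),
    4: Action(4, 1, "add_module", ("property",)),
}

print(materialize(tree, 3))
print(materialize(tree, 4))
```

Because both branches share the action for version 1, the common prefix is stored once, and every version in the tree stays reachable.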

Page 5: VisTrails

Provenance: Storing Actions

Each change writes new actions to the tree

<action id="27" prevId="26" user="dakoop" date="2007-06-20">
  <add what="module" objectId="12">
    <module id="12" name="vtkProperty" cache="1">
      <location id="17" x="-7.0" y="97.0"/>
    </module>
  </add>
  <add what="connection" objectId="13">
    <connection id="13">
      <port type="source" moduleId="10"/>
      <port type="destination" moduleId="12"/>
    </connection>
  </add>
</action>
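An action record like this one can be read with Python's standard `xml.etree.ElementTree`. The sketch below is only an illustration: `summarize_action` and the shape of its return value are assumptions, and the XML is the slide's example with plain ASCII quotes.

```python
import xml.etree.ElementTree as ET

ACTION_XML = """
<action id="27" prevId="26" user="dakoop" date="2007-06-20">
  <add what="module" objectId="12">
    <module id="12" name="vtkProperty" cache="1">
      <location id="17" x="-7.0" y="97.0"/>
    </module>
  </add>
  <add what="connection" objectId="13">
    <connection id="13">
      <port type="source" moduleId="10"/>
      <port type="destination" moduleId="12"/>
    </connection>
  </add>
</action>
"""


def summarize_action(xml_text):
    """Return the action id, its parent id, the user, and the list of changes."""
    action = ET.fromstring(xml_text.strip())
    changes = [
        (add.get("what"), int(add.get("objectId")))
        for add in action.findall("add")
    ]
    return {
        "id": int(action.get("id")),
        "parent": int(action.get("prevId")),
        "user": action.get("user"),
        "changes": changes,
    }


summary = summarize_action(ACTION_XML)
print(summary)
```

The `prevId` attribute is what links this action to its parent in the tree.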

Page 6: VisTrails

Change-based Provenance

Data provenance: where does a specific data product come from?
Workflow evolution: how has workflow structure changed over time?
Treat workflow versions as data and store the provenance of workflows

Page 7: VisTrails

Layered Provenance

[Figure]

Page 8: VisTrails

Layered Provenance

[Figure]

Page 9: VisTrails

Layered Provenance

[Figure]

Page 10: VisTrails

Layered Provenance

[Figure]

Page 11: VisTrails

VisTrails Provenance

Normalized information, no redundancy!
  Each layer provides more specific information but refers to parent layers
  Layers: Workflow Evolution, Workflow, Execution
Extensible storage options
  Support for both relational and XML
Flexible annotation framework: users can specify application-specific provenance information
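One way to picture normalized layers is sketched below: the execution layer stores only ids that point up into the workflow and evolution layers, so module names and authorship are looked up rather than copied. The dictionary layout, the version id `3`, and both helper functions are assumptions made for illustration; only `dakoop`, `vtkProperty`, and module id `12` come from the slides.

```python
# Evolution layer: one entry per version in the version tree.
evolution = {3: {"parent": 2, "user": "dakoop"}}

# Workflow layer: the modules that make up each version.
workflows = {3: {"modules": {12: "vtkProperty"}}}

# Execution layer: log records hold references only, never copies.
executions = [{"version": 3, "module_id": 12, "completed": True}]


def module_name_for(execution):
    """Follow the reference into the workflow layer."""
    return workflows[execution["version"]]["modules"][execution["module_id"]]


def user_for(execution):
    """Follow the reference into the evolution layer."""
    return evolution[execution["version"]]["user"]


record = executions[0]
print(module_name_for(record), user_for(record))
```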

Page 12: VisTrails

Provenance for Reproducibility and Beyond

Infrastructure for querying and reusing provenance
  Query workflows by example
  Create workflows by analogy
Collaborative exploration
Scalable derivation of data products

Page 13: VisTrails

VisTrails Demo

Page 14: VisTrails

Supporting Different Provenance Backends

VisTrails has powerful tools to query and reuse provenance information

There are many powerful workflow systems that produce such information

Problem: How to integrate different provenance backends?

Our approach: A mediation-based approach to provenance interoperability

Page 15: VisTrails

Mediator Architecture

Mapping from global schema to data-source-specific schema

Page 16: VisTrails

Mediated Provenance

Mapping from general model to engine-specific model

Page 17: VisTrails

Combining Provenance

Establish a model
Produce an API for this model
Wrap provenance access for each system so that queries run natively over that system's provenance data
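The three steps can be sketched as a common interface plus one wrapper per system. Everything named here is hypothetical: `ProvenanceWrapper`, `DictWrapper`, and `Mediator` are illustrative, and a dictionary stands in for each system's native store. The `vt3` and `pas2` prefixes mirror the identifiers in the challenge results.

```python
from abc import ABC, abstractmethod


class ProvenanceWrapper(ABC):
    """Common API; each subclass answers it with a system-native query."""

    prefix = ""

    @abstractmethod
    def get_executed_modules(self, wf_exec_id):
        ...


class DictWrapper(ProvenanceWrapper):
    """Stand-in backend: a dict plays the role of the native provenance store."""

    def __init__(self, prefix, store):
        self.prefix = prefix
        self.store = store

    def get_executed_modules(self, wf_exec_id):
        return [f"{self.prefix}:{m}" for m in self.store.get(wf_exec_id, [])]


class Mediator:
    """Routes a prefixed id to the wrapper that owns it."""

    def __init__(self, wrappers):
        self.wrappers = {w.prefix: w for w in wrappers}

    def get_executed_modules(self, qualified_id):
        prefix, _, local_id = qualified_id.partition(":")
        return self.wrappers[prefix].get_executed_modules(local_id)


mediator = Mediator([
    DictWrapper("vt3", {"run1": ["0", "1"]}),
    DictWrapper("pas2", {"run9": ["softmean"]}),
])
print(mediator.get_executed_modules("vt3:run1"))
```

A real wrapper would replace the dictionary lookup with an XPath or SPARQL query against its own store; the caller would not need to change.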

Page 18: VisTrails

Provenance Model

Follows the layered architecture
Versions map to workflows
Workflows are modeled as graphs
Parameters capture module state
User-defined annotations are available at each layer of the model
Module Definition stores information about the computational pieces
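A minimal sketch of how these pieces could fit together as plain data classes follows. The class and field names are assumptions based on the bullet points, not the schema used in the challenge, and the parameter and annotation values are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModuleDefinition:
    name: str  # describes the computational piece itself


@dataclass
class Module:
    id: int
    definition: ModuleDefinition
    parameters: dict = field(default_factory=dict)   # module state
    annotations: dict = field(default_factory=dict)


@dataclass
class Workflow:
    modules: dict = field(default_factory=dict)      # id -> Module
    connections: list = field(default_factory=list)  # (source id, destination id)
    annotations: dict = field(default_factory=dict)


@dataclass
class Version:
    id: int
    parent_id: Optional[int]
    workflow: Workflow
    annotations: dict = field(default_factory=dict)


align = Module(1, ModuleDefinition("align_warp"), parameters={"model": "12"})
reslice = Module(2, ModuleDefinition("reslice"))
workflow = Workflow(modules={1: align, 2: reslice}, connections=[(1, 2)])
version = Version(id=5, parent_id=4, workflow=workflow,
                  annotations={"note": "hypothetical example"})
print(version.workflow.modules[1].definition.name)
```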

Page 19: VisTrails

Provenance Model

[Figure]

Page 20: VisTrails

Provenance API

Implements common access queries and operations over the provenance model

Examples:

getParent(module)

getChildren(module)

getUpstream(module)

getDownstream(module)

getAnnotations(module | workflow | …)

getDataItems(module_exec)

getParameters(module)

getVersion(time)

getExecutedModules(workflow)

getConnection(data_item)

getPorts(connection)

findModulesByParameter(search_params)

findModulesByAnnotation(search_params)

findExecutionsByAnnotation(search_params)

findVersionsByModules(search_params)
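The slides list these calls without defining them. The sketch below assumes that `getParent` and `getChildren` return direct neighbors in the workflow graph, while `getUpstream` and `getDownstream` return everything reachable in each direction. `WorkflowGraph` and its method names are hypothetical; the module names come from the challenge results.

```python
from collections import defaultdict


class WorkflowGraph:
    def __init__(self, connections):
        self._parents = defaultdict(set)
        self._children = defaultdict(set)
        for source, dest in connections:
            self._children[source].add(dest)
            self._parents[dest].add(source)

    def get_parents(self, module):
        return set(self._parents.get(module, ()))

    def get_children(self, module):
        return set(self._children.get(module, ()))

    def _reachable(self, module, neighbours):
        # Iterative depth-first traversal; `seen` also guards against cycles.
        seen, stack = set(), [module]
        while stack:
            for nxt in neighbours(stack.pop()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def get_upstream(self, module):
        return self._reachable(module, self.get_parents)

    def get_downstream(self, module):
        return self._reachable(module, self.get_children)


graph = WorkflowGraph([
    ("align_warp1", "reslice1"),
    ("align_warp2", "reslice2"),
    ("reslice1", "softmean"),
    ("reslice2", "softmean"),
])
print(sorted(graph.get_upstream("softmean")))
```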

Page 21: VisTrails

Provenance API Example

getExecutedModules(wf_exec)

VisTrails (XPath)

    def getExecutedModules(self, wf_exec):
        newdataitems = []
        q = '//exec[@id="' + wf_exec.pid.key + '"]/@moduleId'
        dataitems = self.logcontext.xpathEval(q)
        ...

Pasoa (XPath)

    def getExecutedModules(self, wf_exec):
        q = "//ps:relationshipPAssertion[ps:localPAssertionId='" + wf_exec.pid.key + "']/ps:relation"
        dataitems = self.context.xpathEval(q)
        ...

Taverna (SPARQL)

    def getExecutedModules(self, wf_exec):
        " "
        q = '''
            SELECT ?mi
            FROM <''' + self.path + '''>
            WHERE { <''' + wf_exec.pid.key + '''>
                    <http://www.mygrid.org.uk/provenance#runsProcess> ?mi }
            '''
        return self.processQueryAsList(q, pModuleInstance)
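For a version of the VisTrails query that runs as-is, the sketch below uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath. It cannot select an attribute with `/@moduleId`, so the sketch matches the `exec` elements and then reads the attribute. The log layout in `SAMPLE_LOG` and the function name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical execution log: one <exec> element per executed module.
SAMPLE_LOG = """
<log>
  <exec id="e1" moduleId="10"/>
  <exec id="e2" moduleId="12"/>
</log>
"""


def get_executed_module_ids(log_root, exec_id):
    """Return the moduleId of every <exec> element whose id matches exec_id."""
    # exec_id is interpolated into the path, so it should come from a trusted source.
    matches = log_root.findall(f".//exec[@id='{exec_id}']")
    return [element.get("moduleId") for element in matches]


root = ET.fromstring(SAMPLE_LOG.strip())
print(get_executed_module_ids(root, "e1"))
```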

Page 22: VisTrails

Provenance API Results

Implemented queries for each system and a combination of all three
Annotation issues for a couple of queries
Example: Query 1 Results

vt3:4 --> vt3:7
vt3:1 --> vt3:4
vt3:0 --> vt3:1
pas2:http://relation.org/softmean --> vt3:0
myg1:urn:www.mygrid.org.uk/process#reslice1 --> pas2:http://relation.org/softmean
myg1:urn:www.mygrid.org.uk/process#reslice2 --> pas2:http://relation.org/softmean
myg1:urn:www.mygrid.org.uk/process#reslice3 --> pas2:http://relation.org/softmean
myg1:urn:www.mygrid.org.uk/process#reslice4 --> pas2:http://relation.org/softmean
myg1:urn:www.mygrid.org.uk/process#align_warp1 --> myg1:urn:www.mygrid.org.uk/process#reslice1
myg1:urn:www.mygrid.org.uk/process#align_warp2 --> myg1:urn:www.mygrid.org.uk/process#reslice2
myg1:urn:www.mygrid.org.uk/process#align_warp3 --> myg1:urn:www.mygrid.org.uk/process#reslice3
myg1:urn:www.mygrid.org.uk/process#align_warp4 --> myg1:urn:www.mygrid.org.uk/process#reslice4
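Each result is an edge of the form `source --> target`, and the prefix before the first colon (`vt3`, `pas2`, `myg1`) appears to identify the system that recorded the node. Under that assumption, the sketch below parses a few of the edges and picks out those that cross from one system to another. The helper names are hypothetical.

```python
RESULTS = """\
vt3:0 --> vt3:1
pas2:http://relation.org/softmean --> vt3:0
myg1:urn:www.mygrid.org.uk/process#reslice1 --> pas2:http://relation.org/softmean
myg1:urn:www.mygrid.org.uk/process#align_warp1 --> myg1:urn:www.mygrid.org.uk/process#reslice1
""".splitlines()


def parse_edges(lines):
    """Split each 'source --> target' line into a (source, target) pair."""
    edges = []
    for line in lines:
        source, sep, target = line.partition(" --> ")
        if sep:
            edges.append((source.strip(), target.strip()))
    return edges


def system_of(node_id):
    """Return the prefix before the first colon, e.g. 'vt3'."""
    return node_id.split(":", 1)[0]


def cross_system_edges(edges):
    return [(s, t) for s, t in edges if system_of(s) != system_of(t)]


edges = parse_edges(RESULTS)
for source, target in cross_system_edges(edges):
    print(system_of(source), "->", system_of(target))
```

The cross-system edges are the points where the combined query had to join provenance from two different stores.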

Page 23: VisTrails

Provenance API Integration

Developed VisTrails Provenance Query Language for first challenge
Plan to integrate API with query language
Plan to integrate query language with VisTrails interfaces

Page 24: VisTrails

Interoperability Issues

Uniquely identifying intermediate results
  Intermediate file names were not specified and varied
Tracing ids is difficult for users; this should be transparent
A common query language should use concepts familiar to users
Mediator vs. Warehousing approach

Page 25: VisTrails

Performance Issues

Redundant information can make queries inefficient
What is the best storage backend?
  RDBMS vs. XML database?
What is the best data model?
  XML vs. Relational vs. RDF?
Need good benchmarks on large data!

Page 26: VisTrails

Questions?

Page 27: VisTrails

Mediated Provenance

[Figure: user queries go to the Prov API over the General Provenance Model; wrappers map from the generic provenance model into the models of different systems (Taverna, Pasoa, …)]

Page 28: VisTrails

Mediator Architecture

[Figure: user SQL/ODBC queries go to the Mediator, which holds the Global Schema; wrappers map from the global schema into the source schemas of three Data Sources]