Apache Taverna NERSC Workflow Day, Berkeley Lab, California 2015-02-20 http://taverna.incubator.apache.org/ Stian Soiland-Reyes @soilandreyes [email protected] http://orcid.org/0000-0001-9842-9718 Donal Fellows @donalfellows [email protected] http://orcid.org/0000-0002-9091-5938 This work is licensed under a Creative Commons Attribution 3.0 Unported License.
Taverna summary

Jul 17, 2015

myGrid team
Page 1: Taverna summary

Apache Taverna

NERSC Workflow Day, Berkeley Lab, California 2015-02-20

http://taverna.incubator.apache.org/

Stian Soiland-Reyes @soilandreyes

[email protected] http://orcid.org/0000-0001-9842-9718

Donal Fellows @donalfellows

[email protected] http://orcid.org/0000-0002-9091-5938

This work is licensed under a Creative Commons Attribution 3.0 Unported License.

Page 2: Taverna summary

Taverna Workflow Ecosystem

• Workflow Language: SCUFL2 (and t2flow)

• Workflow Engine: Taverna

• Used in…

– Taverna Command Line Tool

– Taverna Server

– Taverna Workbench

• Allied services

– myExperiment, workflow repository

– Service Catalographer, service catalog software

  • Instantiated as BioCatalogue, BiodiversityCatalogue, …

NERSC Workflow Day 2

Page 3: Taverna summary

Map of the Taverna Ecosystem

[Diagram labels: Taverna Workbench, Taverna Command Line Tool, Taverna Server (REST API, SOAP API), Taverna Player (Ruby client), Taverna Online and Taverna Lite, several with UI Plugins; Taverna APIs, Taverna Engine, Activity Plugins, Taverna Core; Components; Other Servers: Workflow Repository, Service Catalogs, many services…; Application-Specific Portals]

Page 4: Taverna summary

Taverna In Use

Users, Scientific Areas, Projects

Page 5: Taverna summary

Taverna Users Worldwide

Page 6: Taverna summary

Taverna Uses — Scientific Areas

• Biodiversity — BioVeL project

• Digital Preservation — SCAPE project

• Astronomy — AstroTaverna product

• Solar Wind Physics — HELIO project

• In silico Medicine — VPH-Share project

Page 7: Taverna summary

Biodiversity: BioVeL

• Virtual e-Laboratory for Biodiversity

– Service and knowledge commons

– Supporting biodiversity research

– Integrating with third-party applications

  • For example, IPython Notebook

• Portal for running production-grade workflows on users’ data

– Powered by Taverna Server

– Integration with major biodiversity databases

– Interaction support made to support

Page 8: Taverna summary

Digital Preservation: SCAPE

• Automated petabyte-scale digital collection maintenance

– Century of scanned newspapers

– Whole national radio/TV output

– Major Web archives

• Processing engine powered by Taverna

– Lift simple workflows to work at collection level

– Metadata management

– Semantic annotations and components for guided workflow construction

Page 9: Taverna summary

Astronomy: AstroTaverna

• Taverna plugin: IVOA (Virtual Observatory)

– Astronomy data services and tools

• Example workflow:

– List of galaxy names → Look up VO properties → Find similar/near galaxies → Add bibliography

• VOTable support (select/merge/split/..)

– Later adapted by the bioinformatics community

• Projects: CANUBE, Wf4Ever, VAMDC, ER-Flow

• Taverna Workbench used on the desktop:

– IVOA service registry user interface

– Integrated with standalone astronomy tools (SAMP protocol): Aladin, TOPCAT

Page 10: Taverna summary

Astrophysics: HELIO

• Virtual laboratory for Solar Wind Science

– Observation catalogs

– Processing

– Data integration platform

• Taverna is workflow glue

– Taverna Server created to support

– Workflows manage catalog access

– Workflows manage data processing

Page 11: Taverna summary

Medicine and Physiology: VPH-Share

• Platform for computer-aided medicine

– Support for diagnosis and treatment prognosis

  • Osteoarthritis, Dementia, Liver disease, Cardiovascular disease

– Driven by specially-configured cloud instances

• Taverna is the control and data management layer

– Coordinates processing within cloud instances

– User communication with cloud instances via Taverna interactions

  • Including complex 3D tasks

Page 12: Taverna summary

Inside the Taverna Ecosystem

Introduction to the Taverna Workflow Language and its Executors

Page 13: Taverna summary

The Basics of a Taverna Workflow

Input Ports (data in)

SOAP processor (web service call)

XML handling processors

Data Links (connect processors)

Output Ports (data out)

Get concept suggestions from term, Eelke van der Horst, http://www.myexperiment.org/workflows/4590.html
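
The parts labelled on this slide (input ports, processors, data links, output ports) can be modelled as a small graph. The Python sketch below is a hypothetical data model for illustration only. It is not the SCUFL2 format or any Taverna API, and the classes `PortRef`, `Processor` and `Workflow`, along with the processors `lookup` and `extract`, are assumptions. It checks that every data link starts where data can come from and ends where data can go.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PortRef:
    """A named port on a processor, or on the workflow itself when node is None."""
    node: Optional[str]
    port: str


@dataclass
class Processor:
    name: str
    inputs: List[str]   # input port names
    outputs: List[str]  # output port names


@dataclass
class Workflow:
    inputs: List[str]   # workflow input ports (data in)
    outputs: List[str]  # workflow output ports (data out)
    processors: Dict[str, Processor] = field(default_factory=dict)
    links: List[Tuple[PortRef, PortRef]] = field(default_factory=list)

    def add(self, proc: Processor) -> None:
        self.processors[proc.name] = proc

    def _is_source(self, ref: PortRef) -> bool:
        # Data can come from a workflow input port or a processor output port.
        if ref.node is None:
            return ref.port in self.inputs
        proc = self.processors.get(ref.node)
        return proc is not None and ref.port in proc.outputs

    def _is_sink(self, ref: PortRef) -> bool:
        # Data can go to a processor input port or a workflow output port.
        if ref.node is None:
            return ref.port in self.outputs
        proc = self.processors.get(ref.node)
        return proc is not None and ref.port in proc.inputs

    def link(self, src: PortRef, dst: PortRef) -> None:
        """Add a data link, rejecting links between ports that do not exist."""
        if not (self._is_source(src) and self._is_sink(dst)):
            raise ValueError(f"invalid data link {src} -> {dst}")
        self.links.append((src, dst))


# Hypothetical two-step workflow shaped like the one on this slide.
wf = Workflow(inputs=["term"], outputs=["concepts"])
wf.add(Processor("lookup", inputs=["query"], outputs=["response"]))  # e.g. a SOAP call
wf.add(Processor("extract", inputs=["xml"], outputs=["names"]))      # e.g. XML handling
wf.link(PortRef(None, "term"), PortRef("lookup", "query"))
wf.link(PortRef("lookup", "response"), PortRef("extract", "xml"))
wf.link(PortRef("extract", "names"), PortRef(None, "concepts"))
```

Keeping link validation in one place means a malformed workflow is rejected while it is being built, before anything runs.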

Page 14: Taverna summary

Taverna Workflows

• Describe how data flows between processing nodes

– Control dependencies also supported

• Processing service nodes of various kinds

– Invoke programs (local or on cluster or grid or …)

– Call services (SOAP or REST)

– Read from and write to databases

– Transfer data

– Interact with the user

• Built-in parallelism and iteration

– Processes lists of data in parallel

• Large data usually handled by reference

– Avoids having to transfer it where not necessary
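
Implicit iteration and pass-by-reference can be sketched in Python as follows. This assumes a simplified model, not Taverna's implementation. A processor written for a single item is applied to every element of a (possibly nested) list, and the elements of each list are processed concurrently up to a limit. Large values live in a store, so only short reference strings move between steps. `ReferenceStore`, `iterate` and `max_parallel` are hypothetical names.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict


class ReferenceStore:
    """Keeps values in one place and hands out short reference strings instead."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, value: Any) -> str:
        ref = f"ref:{uuid.uuid4()}"
        self._data[ref] = value
        return ref

    def get(self, ref: str) -> Any:
        return self._data[ref]


def iterate(func: Callable[[Any], Any], value: Any, max_parallel: int = 4) -> Any:
    """Apply a single-item function, mapping it over (possibly nested) lists.

    Items of each list are processed concurrently, at most max_parallel at a
    time per list; results come back in the same order as the input list.
    """
    if not isinstance(value, list):
        return func(value)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda item: iterate(func, item, max_parallel), value))


store = ReferenceStore()
# The large value stays in the store; only its reference string is passed around.
refs = [store.put("A" * 1_000_000), store.put("B" * 10)]
lengths = iterate(lambda ref: len(store.get(ref)), refs)
print(lengths)  # [1000000, 10]
```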

Page 15: Taverna summary

Taverna Workflows can get complex…

BioVeL Population Model Construction and Analysis, Maria Paula Balcázar-Vargas, Jonathan Giddy and Gerard Oostermeijer, http://www.myexperiment.org/workflows/3684.html

Page 16: Taverna summary

Managing Workflow Complexity

• Subworkflows

– Put smaller workflows within larger ones

– Like using a user-defined function in a programming language

– Can hide contents of subworkflow

• Components

– “Black box” (but implemented with subworkflow)

– Semantically-annotated; described behaviour

– Like using a library in a programming language
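
The function-and-library analogy can be made concrete with a small, hypothetical Python sketch. Here a workflow is a function from named input values to named output values. For brevity, `sequence` runs its steps one after another rather than scheduling them by data availability. Using a whole workflow as a single step of a bigger one plays the role of a subworkflow. `as_component` hides the intermediate values behind a declared set of outputs and a description, loosely standing in for a component's black-box interface and annotations. None of these names come from Taverna.

```python
from typing import Callable, Dict, List

Ports = Dict[str, object]      # port name -> value
Step = Callable[[Ports], Ports]


def sequence(*steps: Step) -> Step:
    """Chain steps; each sees every value produced so far and adds its outputs."""
    def run(inputs: Ports) -> Ports:
        values = dict(inputs)
        for step in steps:
            values.update(step(values))
        return values
    return run


def as_component(workflow: Step, outputs: List[str], description: str) -> Step:
    """Wrap a workflow as one step that exposes only the named outputs."""
    def run(inputs: Ports) -> Ports:
        result = workflow(inputs)
        return {name: result[name] for name in outputs}  # intermediates stay hidden
    run.__doc__ = description  # stand-in for a semantic annotation
    return run


# Inner workflow: two steps that normalise a piece of text.
normalise = sequence(
    lambda p: {"clean": p["text"].strip()},
    lambda p: {"lower": p["clean"].lower()},
)
normalise_step = as_component(normalise, ["lower"], "Trim and lower-case text")

# Outer workflow uses the whole inner workflow as a single step.
outer = sequence(
    lambda p: {"text": p["raw"]},
    normalise_step,
    lambda p: {"words": p["lower"].split()},
)
result = outer({"raw": "  Hello Taverna World "})
print(result["words"])  # ['hello', 'taverna', 'world']
```

The intermediate value `clean` never appears in the outer workflow's results, which is the sense in which the wrapped workflow behaves as a black box.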

Page 17: Taverna summary

Taverna Engine

• Executes (“enacts”) Taverna Workflows

• Pushes data through system in parallel

– Subject to limits described in workflow

• Processor nodes invoked when their data becomes available

– Turn inputs into outputs

• Captures detailed trace of what happened (“provenance”)

– Follows W3C PROV specification
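
The firing rule described above, where a processor runs once every one of its inputs has a value, can be sketched as a small scheduler. This is a hypothetical, single-threaded simplification and not the Taverna engine: it has no parallelism, iteration or limits. Each provenance record is a plain dict whose `used` and `generated` keys only loosely echo W3C PROV terms. The `enact` function and its argument shapes are assumptions.

```python
def enact(processors, links, inputs):
    """Run a dataflow, firing each processor once all of its inputs have values.

    processors: name -> (list of input port names, function returning {port: value})
    links: ((src_node, src_port), (dst_node, dst_port)); node None means the workflow
    inputs: workflow input port name -> value
    Returns (workflow outputs, provenance trace).
    """
    values = {(None, port): value for port, value in inputs.items()}
    sources = {dst: src for src, dst in links}  # each input port has one incoming link
    trace = []
    pending = set(processors)
    while pending:
        fired = False
        for name in sorted(pending):  # sorted() copies, so discarding below is safe
            in_ports, func = processors[name]
            srcs = [sources[(name, port)] for port in in_ports]
            if all(src in values for src in srcs):
                args = {port: values[src] for port, src in zip(in_ports, srcs)}
                outs = func(**args)
                for port, value in outs.items():
                    values[(name, port)] = value  # now visible to downstream processors
                trace.append({"processor": name, "used": args, "generated": outs})
                pending.discard(name)
                fired = True
        if not fired:
            raise RuntimeError(f"processors never received all inputs: {sorted(pending)}")
    outputs = {dst[1]: values[src] for src, dst in links if dst[0] is None}
    return outputs, trace


processors = {
    "double": (["x"], lambda x: {"y": 2 * x}),
    "add": (["a", "b"], lambda a, b: {"sum": a + b}),
}
links = [
    ((None, "n"), ("double", "x")),
    (("double", "y"), ("add", "a")),
    ((None, "n"), ("add", "b")),
    (("add", "sum"), (None, "result")),
]
outputs, trace = enact(processors, links, {"n": 5})
print(outputs)                                # {'result': 15}
print([step["processor"] for step in trace])  # ['double', 'add']
```

Although `add` is checked first, it only fires after `double` has produced `y`: the order comes from data availability, not from the order in which processors are listed.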

Page 18: Taverna summary

Taverna Command Line Tool

• Simple wrapper round Taverna Workflow Engine

• Inputs as simple files

• Outputs as directory structure

• Provenance packaged in Research Object

– ZIP Archive

– Inputs, Outputs, Intermediate values

– Workflow, Provenance, Overall metadata
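
To illustrate what packaging a run into one ZIP archive involves, this Python sketch uses only the standard `zipfile` and `json` modules. It stores inputs, outputs, intermediate values, the workflow, provenance and some overall metadata together. The entry names (`workflow.txt`, `provenance.json`, `metadata.json` and the `inputs/`, `outputs/`, `intermediates/` folders) are hypothetical. They do not reproduce the Research Object layout that the Taverna Command Line Tool actually writes.

```python
import json
import os
import tempfile
import zipfile


def package_run(path, workflow_text, inputs, outputs, intermediates, provenance):
    """Write one workflow run into a single ZIP archive (hypothetical layout)."""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("workflow.txt", workflow_text)
        # One entry per port value, grouped by role.
        for folder, values in (("inputs", inputs), ("outputs", outputs),
                               ("intermediates", intermediates)):
            for name, value in values.items():
                zf.writestr(f"{folder}/{name}", str(value))
        zf.writestr("provenance.json", json.dumps(provenance, indent=2))
        # Overall metadata: which ports the run had.
        zf.writestr("metadata.json", json.dumps(
            {"inputs": sorted(inputs), "outputs": sorted(outputs)}))


with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "run.zip")
    package_run(archive, "(workflow definition)", {"n": 5}, {"result": 15},
                {"double/y": 10}, [{"processor": "double", "used": {"x": 5}}])
    with zipfile.ZipFile(archive) as zf:
        names = sorted(zf.namelist())
        metadata = json.loads(zf.read("metadata.json"))
```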

Page 19: Taverna summary

Taverna Server

• Extends Workflow Engine to work for multiple simultaneous users

– Isolates workflows from each other

– Allows asynchronous usage

– Manages resources

– Clients can be in any language, not just Java

• Designed to sit behind a Portal

– User interfaces are domain-specific
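
The server properties listed above can be illustrated with an in-memory Python sketch:

- Isolation: each submitted run gets a private working directory.
- Asynchronous use: `submit` returns a run id at once, so the caller can poll later.
- Resource management: a bounded thread pool caps how many runs execute together.

The `RunManager` class, its methods and the status strings are hypothetical. A real Taverna Server client would talk to the server over its REST or SOAP interface instead of calling Python methods.

```python
import tempfile
import uuid
from concurrent.futures import Future, ThreadPoolExecutor, wait
from pathlib import Path
from typing import Callable, Dict, Optional


class RunManager:
    """Hypothetical multi-user run manager: isolated, asynchronous, resource-limited."""

    def __init__(self, max_concurrent_runs: int = 2) -> None:
        # Resource management: at most this many runs execute at once.
        self._pool = ThreadPoolExecutor(max_workers=max_concurrent_runs)
        self._runs: Dict[str, Future] = {}
        self._dirs: Dict[str, tempfile.TemporaryDirectory] = {}

    def submit(self, job: Callable[[Path], object]) -> str:
        """Start a run in its own working directory; return its id immediately."""
        run_id = str(uuid.uuid4())
        workdir = tempfile.TemporaryDirectory(prefix="run-")
        self._dirs[run_id] = workdir
        self._runs[run_id] = self._pool.submit(job, Path(workdir.name))
        return run_id

    def status(self, run_id: str) -> str:
        future = self._runs[run_id]
        if future.done():
            return "finished"
        return "running" if future.running() else "queued"

    def result(self, run_id: str, timeout: Optional[float] = None):
        """Block until the run ends and return what the job returned."""
        return self._runs[run_id].result(timeout=timeout)

    def destroy(self, run_id: str) -> None:
        """Wait for the run to end, then forget it and delete its directory."""
        wait([self._runs.pop(run_id)])
        self._dirs.pop(run_id).cleanup()

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def job(workdir: Path) -> str:
    # A run only reads and writes inside its own directory.
    out = workdir / "output.txt"
    out.write_text(workdir.name)
    return out.read_text()


manager = RunManager()
ids = [manager.submit(job) for _ in range(3)]  # returns at once (asynchronous)
results = [manager.result(run_id) for run_id in ids]
states = [manager.status(run_id) for run_id in ids]
for run_id in ids:
    manager.destroy(run_id)
manager.close()
```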

Page 20: Taverna summary

Taverna Server Architecture

[Diagram labels: Deployment Host; Tomcat Container + CXF Framework; Taverna Server Webapp; Common System Model; Per-User File Manager; Per-Run Taverna Workflow Engine; Processing Service, Catalog Services, Storage Services; clients: Web Portal, Ruby Client, Taverna Workbench (forthcoming); Common Management Model; Selected Notification Endpoints; Management Interface (separate auth)]

Page 21: Taverna summary

Taverna Workbench

• IDE for Taverna Workflows

• Design workflows

• Run workflows

• Analyze workflows

• Access workflow repository

Page 22: Taverna summary

Taverna Online

Web IDE for Taverna

Page 23: Taverna summary

The Future of Taverna

Apache Taverna and Future Releases

Page 24: Taverna summary

The Apache Software Foundation

• Non-profit organization, forming a community of open-source software projects.

• Strong emphasis on openness, collaboration and a consensus-based development process.

• Examples:

– Apache HTTP Server, Tomcat, Maven, Hadoop, OpenOffice, Subversion

Page 25: Taverna summary

Why Apache Taverna?

• Open development: Everything on mailing list

• Engagement: Encourage developer involvement – not just making plugins

• Independence: Apache Taverna is an independent project

– Not a “Manchester thing”

• Shared ownership: equal participation

• Sustainability: self-managed community

Page 26: Taverna summary

Apache Incubator

Gradually becoming an Apache project

• Intellectual Property assigned to ASF

– License changed to Apache License 2.0

• Infrastructure change – everything at *.apache.org

• Community building – growing developer base

• Mentoring on the “Apache Way” by volunteers from other Apache projects

Page 27: Taverna summary

Taverna Releases

• Current stable release: Taverna 2.5

– Command Line (2.5.1), Server (2.5.4), Workbench (2.5.1)

• http://www.taverna.org.uk/download/

• Taverna 3 Release plan:

– Apache Taverna Language

  • API for workflow definitions

– Apache Taverna Engine & Command Line

  • Can also run workflows from Taverna 2 Workbench

– Apache Taverna Server

– Apache Taverna Workbench

Page 28: Taverna summary

Try Taverna!

• Get Taverna:

– http://taverna.org.uk/download/

• Documentation:

– http://www.taverna.org.uk/documentation/taverna-2-x/

• Code:

– http://taverna.incubator.apache.org/code/

• Getting involved:

– http://taverna.incubator.apache.org/community/
