Page 1
Apache Taverna
NERSC Workflow Day, Berkeley Lab, California 2015-02-20
http://taverna.incubator.apache.org/
Stian Soiland-Reyes, @soilandreyes
http://orcid.org/0000-0001-9842-9718
Donal Fellows, @donalfellows
http://orcid.org/0000-0002-9091-5938
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
Page 2
Taverna Workflow Ecosystem
• Workflow Language: SCUFL2 (and t2flow)
• Workflow Engine: Taverna
• Used in…
– Taverna Command Line Tool
– Taverna Server
– Taverna Workbench
• Allied services
– myExperiment, workflow repository
– Service Catalographer, service catalog software
• Instantiated as BioCatalogue, BiodiversityCatalogue, …
Page 3
Map of the Taverna Ecosystem
[Diagram. Labels: UI Plugins, Taverna Workbench, Taverna Command Line Tool, Taverna APIs, Taverna Engine, Activity Plugins, Taverna Core, Taverna Server, REST API, SOAP API, Taverna Player, Ruby client, Taverna Online, Taverna Lite, Components, Other Servers, Workflow Repository, Service Catalogs, many services…, Application-Specific Portals]
Page 4
Taverna In Use
Users, Scientific Areas, Projects
Page 5
Taverna Users Worldwide
Page 6
Taverna Uses — Scientific Areas
• Biodiversity — BioVeL project
• Digital Preservation — SCAPE project
• Astronomy — AstroTaverna product
• Solar Wind Physics — HELIO project
• In silico Medicine — VPH-Share project
Page 7
Biodiversity: BioVeL
• Virtual e-Laboratory for Biodiversity
– Service and knowledge commons
– Supporting biodiversity research
– Integrating with third-party applications
• For example, IPython Notebook
• Portal for running production-grade workflows on users’ data
– Powered by Taverna Server
– Integration with major biodiversity databases
– Interaction support made to support
Page 8
Digital Preservation: SCAPE
• Automated petabyte-scale digital collection maintenance
– Century of scanned newspapers
– Whole national radio/TV output
– Major Web archives
• Processing engine powered by Taverna
– Lift simple workflows to work at collection level
– Metadata management
– Semantic annotations and components for guided workflow construction
Page 9
Astronomy: AstroTaverna
• Taverna plugin: IVOA (Virtual Observatory)
– Astronomy data services and tools
• Example workflow:
– List of galaxy names → Look up VO properties → Find similar/near galaxies → Add bibliography
• VOTable support (select/merge/split/..)
– Later adapted by bioinformatics community
• Projects: CANUBE, Wf4Ever, VAMDC, ER-Flow
• Taverna Workbench used on the desktop:
– IVOA service registry user interface
– Integrated with standalone astronomy tools (SAMP protocol): Aladin, TOPCAT
Page 10
Astrophysics: HELIO
• Virtual laboratory for Solar Wind Science
– Observation catalogs
– Processing
– Data integration platform
• Taverna is workflow glue
– Taverna Server created to support
– Workflows manage catalog access
– Workflows manage data processing
Page 11
Medicine and Physiology: VPH-Share
• Platform for computer-aided medicine
– Support for diagnosis and treatment prognosis
• Osteoarthritis, Dementia, Liver disease, Cardiovascular disease
– Driven by specially-configured cloud instances
• Taverna is control and data management layer
– Coordinates processing within cloud instances
– User communication with cloud instances via Taverna interactions
• Including complex 3D tasks
Page 12
Inside the Taverna Ecosystem
Introduction to the Taverna Workflow Language and its Executors
Page 13
The Basics of a Taverna Workflow
Input Ports (data in)
SOAP processor (web service call)
XML handling processors
Data Links (connect processors)
Output Ports (data out)
Example: “Get concept suggestions from term”, Eelke van der Horst, http://www.myexperiment.org/workflows/4590.html
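The parts labelled on this slide (input ports, processors, data links, output ports) can be pictured as a small directed graph. The sketch below is a hypothetical in-memory model written for illustration only. It is not the SCUFL2 format or any Taverna API, and the class names and the processor names `suggest_concepts` and `extract_labels` are invented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Port:
    """A named connection point; owner is a processor name or 'workflow'."""
    owner: str
    name: str


@dataclass
class Workflow:
    inputs: list = field(default_factory=list)      # workflow input ports
    outputs: list = field(default_factory=list)     # workflow output ports
    processors: list = field(default_factory=list)  # processor names
    links: list = field(default_factory=list)       # (source Port, sink Port)

    def link(self, source, sink):
        """Add a data link carrying values from source to sink."""
        self.links.append((source, sink))

    def upstream_of(self, processor):
        """Return the owners whose data feeds the given processor."""
        return sorted({src.owner for src, dst in self.links if dst.owner == processor})


# Hypothetical three-link workflow: input -> service call -> XML handling -> output.
wf = Workflow(
    inputs=[Port("workflow", "term")],
    outputs=[Port("workflow", "concepts")],
    processors=["suggest_concepts", "extract_labels"],
)
wf.link(Port("workflow", "term"), Port("suggest_concepts", "query"))
wf.link(Port("suggest_concepts", "response"), Port("extract_labels", "xml"))
wf.link(Port("extract_labels", "labels"), Port("workflow", "concepts"))

print(wf.upstream_of("extract_labels"))
```

Keeping links as plain (source, sink) pairs mirrors the slide's point that data links are what connect processors; nothing here executes the workflow.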
Page 14
Taverna Workflows
• Describe how data flows between processing nodes
– Control dependencies also supported
• Processing service nodes of various kinds
– Invoke programs (local or on cluster or grid or …)
– Call services (SOAP or REST)
– Read from and write to databases
– Transfer data
– Interact with the user
• Built-in parallelism and iteration
– Processes lists of data in parallel
• Large data usually handled by reference
– Avoids having to transfer it where not necessary
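The "built-in parallelism and iteration" point can be illustrated with a small sketch: a processor written for a single value is applied once per item when it is handed a list, with the items processed concurrently. This is only a sketch of the idea and not how the Taverna Engine is implemented; `to_upper` and `invoke` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def to_upper(sequence):
    """Stand-in for a service call that accepts ONE value."""
    return sequence.upper()


def invoke(processor, value, max_workers=4):
    """Call processor once for a single value, or once per item for a list.

    List items are processed concurrently; map() keeps results in input order.
    """
    if isinstance(value, list):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(lambda item: invoke(processor, item, max_workers), value))
    return processor(value)


results = invoke(to_upper, ["acgt", "ttga", "ccat"])
print(results)
```

The workflow author never writes the loop: the shape of the data decides whether the processor runs once or many times.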
Page 15
Taverna Workflows can get complex…
Example: “BioVeL Population Model Construction and Analysis”, Maria Paula Balcázar-Vargas, Jonathan Giddy and Gerard Oostermeijer, http://www.myexperiment.org/workflows/3684.html
Page 16
Managing Workflow Complexity
• Subworkflows
– Put smaller workflows within larger ones
– Like using a user-defined function in a programming language
– Can hide contents of subworkflow
• Components
– “Black box” (but implemented with subworkflow)
– Semantically-annotated; described behaviour
– Like using a library in a programming language
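The function analogy for subworkflows can be made concrete. In the sketch below, a small chain of steps is wrapped up and then reused as a single node inside a larger chain. All names are hypothetical, and the steps are plain Python functions standing in for processors.

```python
def compose(*steps):
    """Chain steps so each one's output feeds the next; the result is itself a step."""
    def workflow(value):
        for step in steps:
            value = step(value)
        return value
    return workflow


# Hypothetical steps standing in for processors.
def strip_whitespace(text):
    return text.strip()


def lowercase(text):
    return text.lower()


def count_words(text):
    return len(text.split())


# A small workflow, reused below as a single node of a larger one.
normalise = compose(strip_whitespace, lowercase)
word_count = compose(normalise, count_words)

print(word_count("  Taverna Workflow Engine  "))
```

The caller of `word_count` does not need to know what `normalise` contains, which is the sense in which a subworkflow can hide its contents.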
Page 17
Taverna Engine
• Executes (“enacts”) Taverna Workflows
• Pushes data through system in parallel
– Subject to limits described in workflow
• Processor nodes invoked when their data becomes available
– Turn inputs into outputs
• Captures detailed trace of what happened (“provenance”)
– Follows W3C PROV specification
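A minimal sketch of data-driven enactment, assuming a much simpler model than the real engine: each processor fires once every one of its input ports has data, and each firing is recorded in a trace. The function `enact` and the record layout are hypothetical. The field names only loosely echo W3C PROV terms and are not a PROV serialization, and this version runs sequentially rather than in parallel.

```python
from datetime import datetime, timezone


def enact(processors, links, inputs):
    """Run each processor once all of its input ports have received data.

    processors: {name: (function, [input port names])}
    links:      {(processor, port): source}, where source is a workflow input
                name or the name of an upstream processor
    inputs:     {workflow input name: value}
    Returns (values, trace); trace is a list of provenance-style records.
    """
    values = dict(inputs)   # data made available so far, keyed by source
    trace = []
    pending = dict(processors)
    while pending:
        ready = [
            name for name, (_, ports) in pending.items()
            if all(links[(name, port)] in values for port in ports)
        ]
        if not ready:
            raise ValueError(f"no processor can fire: {sorted(pending)}")
        for name in ready:
            function, ports = pending.pop(name)
            used = {port: values[links[(name, port)]] for port in ports}
            values[name] = function(**used)
            trace.append({
                "activity": name,
                "used": used,
                "generated": values[name],
                "ended": datetime.now(timezone.utc).isoformat(),
            })
    return values, trace


# Hypothetical two-processor workflow: total = double(n) + m.
processors = {
    "total": (lambda a, b: a + b, ["a", "b"]),
    "double": (lambda x: 2 * x, ["x"]),
}
links = {
    ("double", "x"): "n",
    ("total", "a"): "double",
    ("total", "b"): "m",
}
values, trace = enact(processors, links, {"n": 3, "m": 4})
print(values["total"], [record["activity"] for record in trace])
```

Although `total` is declared first, it cannot fire until `double` has produced its output, so execution order follows data availability rather than declaration order.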
Page 18
Taverna Command Line Tool
• Simple wrapper round Taverna Workflow Engine
• Inputs as simple files
• Outputs as directory structure
• Provenance packaged in Research Object
– ZIP Archive
– Inputs, Outputs, Intermediate values
– Workflow, Provenance, Overall metadata
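Because the Research Object is packaged as a ZIP archive, ordinary ZIP tooling is enough to look inside one. The sketch below builds a tiny stand-in archive in memory and groups its entries by top-level folder. Every path in it is hypothetical and chosen only to mirror the categories listed on this slide; the layout of a real bundle may differ.

```python
import io
import zipfile


def summarise_bundle(zip_bytes):
    """Group the entries of a ZIP archive by their top-level folder."""
    summary = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as bundle:
        for name in bundle.namelist():
            top, _, rest = name.partition("/")
            summary.setdefault(top if rest else "(root)", []).append(name)
    return summary


# Build a tiny stand-in archive; every path here is hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as bundle:
    bundle.writestr("inputs/term.txt", "malaria")
    bundle.writestr("outputs/concepts.txt", "Plasmodium")
    bundle.writestr("intermediates/step1.txt", "<xml/>")
    bundle.writestr("workflow.xml", "<workflow/>")
    bundle.writestr("provenance.ttl", "# PROV statements would go here")

for folder, names in sorted(summarise_bundle(buffer.getvalue()).items()):
    print(folder, names)
```

To inspect a real bundle, the same function could be given the bytes of the file written by the command line tool.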
Page 19
Taverna Server
• Extends Workflow Engine to work for multiple simultaneous users
– Isolates workflows from each other
– Allows asynchronous usage
– Manages resources
– Clients can be in any language, not just Java
• Designed to sit behind a Portal
– User interfaces are domain-specific
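Since clients can be written in any language, a portal only needs an HTTP library to talk to the server. The sketch below builds, but does not send, two requests: one submitting a workflow as a new run and one asking the server to start it. The base URL is hypothetical, and the resource paths, the `application/vnd.taverna.t2flow+xml` media type and the `Operating` state name are assumptions recalled from the Taverna Server 2.x REST interface; check them against the documentation of the server version in use.

```python
import urllib.request

# Hypothetical deployment URL; replace with a real server's base address.
BASE = "http://localhost:8080/taverna-server/rest"


def create_run_request(t2flow_xml):
    """Build (not send) the request that submits a workflow as a new run."""
    return urllib.request.Request(
        f"{BASE}/runs",  # assumed resource path
        data=t2flow_xml.encode("utf-8"),
        headers={"Content-Type": "application/vnd.taverna.t2flow+xml"},  # assumed media type
        method="POST",
    )


def start_run_request(run_url):
    """Build the request that asks the server to start an existing run."""
    return urllib.request.Request(
        f"{run_url}/status",  # assumed resource path
        data=b"Operating",    # assumed state name
        headers={"Content-Type": "text/plain"},
        method="PUT",
    )


create = create_run_request("<workflow/>")
start = start_run_request(f"{BASE}/runs/RUN-ID")  # RUN-ID is a placeholder
print(create.get_method(), create.full_url)
print(start.get_method(), start.full_url)
```

Asynchronous usage then amounts to sending these with `urllib.request.urlopen`, returning to the user straight away, and polling the run's status later.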
Page 20
Taverna Server Architecture
[Diagram. Labels: Deployment Host, Tomcat Container + CXF Framework, Taverna Server Webapp, Common System Model, Common Management Model, Per-User File Manager, Per-Run Taverna Workflow Engine, Web Portal, Ruby Client, Taverna Workbench (forthcoming), Processing Service, Catalog Services, Storage Services, Selected Notification Endpoints, Management Interface (separate auth)]
Page 21
Taverna Workbench
• IDE for Taverna Workflows
• Design workflows
• Run workflows
• Analyze workflows
• Access workflow repository
Page 22
Taverna Online
Web IDE for Taverna
Page 23
The Future of Taverna
Apache Taverna and Future Releases
Page 24
The Apache Software Foundation
• Non-profit organization, forming a community of open-source software projects.
• Strong emphasis on openness, collaboration and a consensus-based development process.
• Examples:
– Apache HTTP Server, Tomcat, Maven, Hadoop, OpenOffice, Subversion
Page 25
Why Apache Taverna?
• Open development: Everything on mailing list
• Engagement: Encourage developer involvement – not just making plugins
• Independence: Apache Taverna is an independent project
– Not a “Manchester thing”
• Shared ownership: equal participation
• Sustainability: self-managed community
Page 26
Apache Incubator
Gradually becoming an Apache project
• Intellectual Property assigned to ASF
– License changed to Apache License 2.0
• Infrastructure change – everything at *.apache.org
• Community building – growing developer base
• Mentoring on the “Apache Way” by volunteers from other Apache projects
Page 27
Taverna Releases
• Current stable release: Taverna 2.5
– Command Line (2.5.1), Server (2.5.4), Workbench (2.5.1)
• http://www.taverna.org.uk/download/
• Taverna 3 Release plan:
– Apache Taverna Language
• API for workflow definitions
– Apache Taverna Engine & Command Line
• Can also run workflows from Taverna 2 Workbench
– Apache Taverna Server
– Apache Taverna Workbench
Page 28
Try Taverna!
• Get Taverna:
– http://taverna.org.uk/download/
• Documentation:
– http://www.taverna.org.uk/documentation/taverna-2-x/
• Code:
– http://taverna.incubator.apache.org/code/
• Getting involved:
– http://taverna.incubator.apache.org/community/