Kepler/SPA Extensions for Scientific Workflows – Now and Upcoming
Ilkay Altintas, SWAT lead, San Diego Supercomputer Center, altintas@sdsc.edu
Bertram Ludäscher, Dept. of Computer Science & Genome Center, University of California, Davis, [email protected]
+ many other SDM/SPA & Kepler contributors!
San Diego Supercomputer Center
UC Davis Department of Computer Science


Jan 20, 2016


Shanna Williams
Transcript
Page 1

Kepler/SPA Extensions for Scientific Workflows – Now and Upcoming

Ilkay Altintas, SWAT lead, San Diego Supercomputer Center

altintas@sdsc.edu

Bertram Ludäscher, Dept. of Computer Science & Genome Center, University of California, Davis

[email protected]@ucdavis.edu

+ many other SDM/SPA & Kepler contributors!

San Diego Supercomputer Center

UC Davis Department of Computer Science

Page 2

KEPLER/CSP: Contributors, Sponsors, Projects

Ilkay Altintas: SDM, NLADR, Resurgence, EOL, …
Kim Baldridge: Resurgence, NMI
Chad Berkley: SEEK
Shawn Bowers: SEEK
Terence Critchlow: SDM
Tobin Fricke: ROADNet
Jeffrey Grethe: BIRN
Christopher H. Brooks: Ptolemy II
Zhengang Cheng: SDM
Dan Higgins: SEEK
Efrat Jaeger: GEON
Matt Jones: SEEK
Werner Krebs: EOL
Edward A. Lee: Ptolemy II
Kai Lin: GEON
Bertram Ludaescher: SDM, SEEK, GEON, BIRN, ROADNet
Mark Miller: EOL
Steve Mock: NMI
Steve Neuendorffer: Ptolemy II
Jing Tao: SEEK
Mladen Vouk: SDM
Xiaowen Xin: SDM
Yang Zhao: Ptolemy II
Bing Zhu: SEEK

www.kepler-project.org

LLNL, NCSU, SDSC, UCB, UCD, UCSB, UCSD, …, Zurich

Collab. tools: IRC, CVS, Skype, Wiki: hotTopics, FAQs, …

[Figure labels: Ptolemy II, SPA]

Page 3

GEON Dataset Generation & Registration
(a co-development in KEPLER)

[Figure: workflow annotated with contributors: Xiaowen (SDM/SPA); Edward et al. (Ptolemy); Yang (Ptolemy); Efrat (GEON); Ilkay (SDM/SPA); SQL database access (JDBC): Matt, Chad, Dan et al. (SEEK)]

% Makefile $> ant run

Page 4

Update: endo-SPA (exo-Kepler), endo-Kepler (exo-SPA), … w/o counting peas…

• No/minor changes:
  – XSLT, email, …
• Web service actor (SDM)
  – Updated: dynamic operation display, error reporting
• Command line actor (SDM)
  – Updated: improved interface and error handling
• SSH2 actor (SDM)
  – New: implements the SSH2 protocol for remote execution (no plain-text password sent over the wire)
• Timestamp actor (SDM)
  – New: for logging
• BrowserUI v2.0 (SDM)
  – Reimplemented, improved interface
  – v3.0 planned (“catching” HTTP GET/POST via localhost)
• Execution logger (SDM)
  – New: workflow “black box” for keeping track of runs
• Documentation framework (SDM)
  – Autogenerated actor documentation (new doclets and taglets)
• Ontology-based actor and dataset classification (SEEK)
  – Finding relevant components: actors and datasets, suggesting possible connections, …
• Kepler/SRB toolkit (GEON, SDM, SEEK, …)
  – Improved interfaces, new functions
• …
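The execution logger's “black box” idea can be illustrated with a minimal Python sketch, assuming each actor firing can be modelled as a function call. `ExecutionLogger`, `FiringRecord`, and `wrap` are hypothetical names invented for this illustration; they are not Kepler's (Java) logger API. Every firing is recorded with its run ID, inputs, output, timing, and any error, so a run can be inspected after the fact.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class FiringRecord:
    """One log entry: which actor fired, with which inputs, and when."""
    run_id: str
    actor: str
    inputs: Dict[str, Any]
    output: Any
    started: float
    finished: float
    error: str = ""


@dataclass
class ExecutionLogger:
    """Hypothetical 'black box' recorder (not Kepler's actual logger API)."""
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    records: List[FiringRecord] = field(default_factory=list)

    def wrap(self, name: str, fn: Callable[..., Any]) -> Callable[..., Any]:
        """Return a version of fn that logs every call (firing)."""
        def logged(**inputs: Any) -> Any:
            start = time.time()
            try:
                result = fn(**inputs)
            except Exception as exc:
                # Record the failure before re-raising so the log survives the crash.
                self.records.append(FiringRecord(self.run_id, name, inputs, None,
                                                 start, time.time(), repr(exc)))
                raise
            self.records.append(FiringRecord(self.run_id, name, inputs, result,
                                             start, time.time()))
            return result
        return logged


logger = ExecutionLogger()
scale = logger.wrap("Scale", lambda x, factor: x * factor)
offset = logger.wrap("Offset", lambda x, delta: x + delta)
print(offset(x=scale(x=3, factor=2), delta=1))  # 7
for r in logger.records:
    print(r.actor, r.inputs, r.output)
```

Recording a failure before re-raising keeps the log complete even when a run aborts, which is the point of a black box.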

Page 5

Application Pull vs Technology Push

• Use case driven (application pull)
  – PIW, TSI-1, TSI-2, …
  – Solve technology issues along the way
  (+) solves the particular scientists’ problem
  (-) one-of-a-kind solutions, little generic & reusable technology

Example:
  – TSI-1 and TSI-2 are conceptually almost identical scientific (“Grid/HPC/HTC”) workflows
  – but they are implemented very differently, so reuse is limited: e.g., evolving/customizing one into the other is hard/impossible…

Page 6

Application Pull vs Technology Push

• Technology driven (technology push)
  – Generic application integration mechanisms:
    • web service actor, harvester, command-line actors, ssh2 actor, BrowserUI, …
  – Specialized interfaces to HPC/HTC systems:
    • Large-scale data management:
      – SDSC SRB toolkit (set of SRB actors)
      – SRM?, PVFS2?, MPI-IO?, …
    • Interfacing with generic job schedulers:
      – NIMROD, Condor, APST, …
  – Interfacing with scientific packages:
    – Statistics toolkit (R, …), GIS (GRASS, ArcIMS, MapServer, …)
    – GAMESS toolkit, APBS (visualization), …
  (+) developing reusable technology / toolkits
  (!) still needs guidance from domain scientists’ problems, but one-off solutions must be lifted into a general SWF engineering methodology
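As a rough illustration of a generic command-line actor, the Python sketch below wraps an external program behind a single call that returns its standard output and turns failures into explicit errors carrying the exit code and stderr. `run_command_actor` and `CommandLineActorError` are hypothetical names; Kepler's actual command-line actor is a Java/Ptolemy II component with its own ports and parameters. The usage line runs the current Python interpreter as a stand-in for a legacy tool.

```python
import subprocess
import sys
from typing import List


class CommandLineActorError(RuntimeError):
    """Raised when the wrapped command fails; the message carries stderr."""


def run_command_actor(argv: List[str], stdin_text: str = "",
                      timeout: float = 60.0) -> str:
    """Hypothetical command-line 'actor': run argv, feed stdin, return stdout.

    Errors are reported with the exit code and stderr instead of being swallowed.
    """
    try:
        proc = subprocess.run(argv, input=stdin_text, capture_output=True,
                              text=True, timeout=timeout)
    except FileNotFoundError as exc:
        raise CommandLineActorError(f"executable not found: {argv[0]}") from exc
    except subprocess.TimeoutExpired as exc:
        raise CommandLineActorError(f"timed out after {timeout}s: {argv}") from exc
    if proc.returncode != 0:
        raise CommandLineActorError(
            f"exit code {proc.returncode}: {proc.stderr.strip()}")
    return proc.stdout


# Usage: the Python interpreter plays the role of an external tool.
out = run_command_actor(
    [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"],
    stdin_text="hello")
print(out.strip())  # HELLO
```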

Page 7

Increasing number of Kepler actors…

Page 8

… creating prototype workflows and test cases (for automated tests) …

Page 9

… putting them together in generic, reusable packages, e.g., the Kepler/SRB toolkit

SRB holdings @ SDSC only: 404 TB in 59 million files across 5167 users (12/16/’04, Reagan Moore)

Page 10

KEPLER/R Toolkit (under development)

Source: Dan Higgins, Kepler/SEEK

Page 11

Page 12

New Developments & Directions

Page 13

Ontology-based Actor & Dataset Discovery

[Figure: ontology-based actor (service) and dataset search; result display]
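A minimal sketch of ontology-based discovery, assuming each actor is annotated with a single semantic type drawn from an is-a concept hierarchy: a query for a concept returns every actor whose type is that concept or one of its subconcepts. The ontology terms and actor names below are hypothetical placeholders, not SEEK ontology content; a real deployment would more likely query an OWL ontology than a parent dictionary.

```python
from typing import Dict, List, Optional, Set

# Hypothetical mini-ontology: each concept maps to its parent (is-a) concept.
IS_A: Dict[str, Optional[str]] = {
    "Data": None,
    "Measurement": "Data",
    "BiomassMeasurement": "Measurement",
    "TemperatureMeasurement": "Measurement",
    "Geometry": "Data",
}

# Hypothetical actors, each annotated with the semantic type of its input.
ACTORS: Dict[str, str] = {
    "BiomassAggregator": "BiomassMeasurement",
    "TemperatureInterpolator": "TemperatureMeasurement",
    "ShapefileReader": "Geometry",
}


def ancestors(concept: str) -> Set[str]:
    """All concepts that subsume `concept`, including the concept itself."""
    result: Set[str] = set()
    current: Optional[str] = concept
    while current is not None:
        result.add(current)
        current = IS_A[current]
    return result


def find_actors(query: str) -> List[str]:
    """Actors whose semantic type is the query concept or a subconcept of it."""
    return sorted(name for name, concept in ACTORS.items()
                  if query in ancestors(concept))


print(find_actors("Measurement"))  # ['BiomassAggregator', 'TemperatureInterpolator']
print(find_actors("Geometry"))     # ['ShapefileReader']
```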

Page 14

Page 15

Page 16

Example: GAMESS Quantum-mechanics Cheminformatics Workflow

• Job management infrastructure in place
• Results database: under development
• Goal: 1000’s of GAMESS jobs (quantum mechanics)

Page 17

Towards a Framework for “Grid/HPC/HTC” WFs & Job Management
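One common shape for Grid/HPC/HTC workflows is stage-execute-fetch: copy inputs to where the job runs, run it, and copy results back. The Python sketch below simulates that cycle locally, with a temporary directory standing in for remote scratch space and a small function standing in for a compute job such as a GAMESS run. The function names are hypothetical, and real staging would go through SRB, GridFTP, or a scheduler rather than local file copies. The loop at the end mimics a small parameter sweep of independent jobs.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def stage_execute_fetch(inputs: Dict[str, str],
                        execute: Callable[[Path], None],
                        results_dir: Path) -> List[Path]:
    """Stage-execute-fetch, simulated on the local disk (hypothetical helper).

    A temporary directory stands in for the remote (Grid/HPC) workspace.
    """
    fetched: List[Path] = []
    with tempfile.TemporaryDirectory() as scratch_name:
        scratch = Path(scratch_name)
        # 1. Stage: copy job inputs into the simulated remote workspace.
        for name, content in inputs.items():
            (scratch / name).write_text(content)
        # 2. Execute: run the job inside the workspace.
        execute(scratch)
        # 3. Fetch: copy *.out files back before the workspace is deleted.
        results_dir.mkdir(parents=True, exist_ok=True)
        for out in sorted(scratch.glob("*.out")):
            dest = results_dir / out.name
            shutil.copy(out, dest)
            fetched.append(dest)
    return fetched


def toy_job(workdir: Path) -> None:
    """Stand-in compute job: sums the integers in input.txt."""
    numbers = [int(tok) for tok in (workdir / "input.txt").read_text().split()]
    (workdir / "sum.out").write_text(str(sum(numbers)))


# A tiny parameter sweep: independent jobs, each staged, run, and fetched.
results = Path(tempfile.mkdtemp())
for i in range(3):
    job_results = results / f"job{i}"
    stage_execute_fetch({"input.txt": " ".join(str(n) for n in range(i + 2))},
                        toy_job, job_results)
    print(i, (job_results / "sum.out").read_text())
```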

Page 18

Technology-oriented meeting: May 12th Ptolemy/Kepler Miniconference in Berkeley

Page 19

What’s needed, what’s next

• Build generic toolkits / packages
• Don’t reinvent – reuse!
  – Improved R coupling, SCIRun coupling, …
• SWF framework that lets scientists choose…
  – SRB (Sput, Sget, …), SRM, MPI-IO, GlobusTK (GridFTP, …), Sabul, …, pNetCDF, parallel-R, … packages
  – Condor, Nimrod, … schedulers
  – GRASS, …
• General-purpose SWF system/PSE that scientists can use themselves
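The idea of a framework that “lets scientists choose” backends can be sketched as plug-in registries: the workflow is written once against abstract transfer and scheduling steps, and concrete backends are selected by name at run time. Everything below (`register`, `TRANSFER`, `SCHEDULE`, `run_workflow`) is hypothetical; the placeholder backends only return tuples describing the intended action and do not call SRB, GridFTP, Condor, or Nimrod.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, ...]  # a description of what a backend would do

# Hypothetical plug-in registries (not a Kepler API).
TRANSFER: Dict[str, Callable[[str, str], Step]] = {}
SCHEDULE: Dict[str, Callable[[str], Step]] = {}


def register(registry: Dict, name: str) -> Callable:
    """Decorator that adds a backend function to the given registry."""
    def decorator(fn: Callable) -> Callable:
        registry[name] = fn
        return fn
    return decorator


# Placeholder backends: they record the intended action and call no real tool.
@register(TRANSFER, "srb")
def srb_transfer(src: str, dest: str) -> Step:
    return ("srb", src, dest)


@register(TRANSFER, "gridftp")
def gridftp_transfer(src: str, dest: str) -> Step:
    return ("gridftp", src, dest)


@register(SCHEDULE, "condor")
def condor_schedule(job: str) -> Step:
    return ("condor", job)


@register(SCHEDULE, "nimrod")
def nimrod_schedule(job: str) -> Step:
    return ("nimrod", job)


def run_workflow(transfer: str, schedule: str, data: str, job: str) -> List[Step]:
    """Identical workflow logic; only the chosen plumbing differs."""
    stage = TRANSFER[transfer]
    submit = SCHEDULE[schedule]
    return [stage(data, "remote:" + data), submit(job), stage("remote:out", "out")]


print(run_workflow("srb", "condor", "input.dat", "job.cfg"))
print(run_workflow("gridftp", "nimrod", "input.dat", "job.cfg"))
```

Swapping `"srb"` for `"gridftp"` or `"condor"` for `"nimrod"` changes the plumbing without touching `run_workflow`.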

Page 20

Towards a KEPLER School of Expression
(Flow-based Design Patterns)

• Generality vs specialization of actors
  – also loosely coupled vs tightly coupled
• Data transformation pipelines
  – alternate compute and data transformation steps
• Stage-execute-fetch pattern (Grid/HPC/HTC-WFs)
• Loops, higher-order functions (map, foldr, …)
  – cf. Taverna’s automatic loop insertion based on data types

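[Figure: design-pattern diagrams. Map: a producer emits [x1, x2, …, xn]; map applies f to each element, giving [f(x1), …, f(xn)]. F-map: a producer of methods/functions [f1, f2, …, fn]. Actors A, B, C linked by “connect” (JDBC/SRB connection tokens, proxies, certificates).]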
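The higher-order patterns named above (map, foldr) can be written as plain Python functions to show what a map-style actor computes: the same function applied to every token [x1, …, xn] to yield [f(x1), …, f(xn)], and a right fold that combines tokens starting from the right. `map_actor`, `fmap_actor`, and `foldr` are hypothetical names; `fmap_actor` reflects only one plausible reading of the “F-map” box (a producer of functions [f1, …, fn] applied to a single input), not a documented Kepler actor.

```python
from functools import reduce
from typing import Callable, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def map_actor(f: Callable[[A], B], xs: Iterable[A]) -> List[B]:
    """map: [x1, ..., xn] -> [f(x1), ..., f(xn)]."""
    return [f(x) for x in xs]


def fmap_actor(fs: Iterable[Callable[[A], B]], x: A) -> List[B]:
    """Apply each function from a producer of functions [f1, ..., fn] to one input x."""
    return [f(x) for f in fs]


def foldr(f: Callable[[A, B], B], init: B, xs: List[A]) -> B:
    """Right fold: f(x1, f(x2, ... f(xn, init)))."""
    return reduce(lambda acc, x: f(x, acc), reversed(xs), init)


print(map_actor(lambda x: x * x, [1, 2, 3]))           # [1, 4, 9]
print(fmap_actor([abs, str, lambda v: v + 1], -5))     # [5, '-5', -4]
print(foldr(lambda x, acc: [x] + acc, [], [1, 2, 3]))  # [1, 2, 3]
print(foldr(lambda x, acc: x - acc, 0, [1, 2, 3]))     # 2
```

The subtraction example shows why the fold direction matters: foldr computes 1 - (2 - (3 - 0)) = 2, whereas a left fold would give ((0 - 1) - 2) - 3 = -6.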

Page 21

Blurring Design (ToDo) and Execution

Page 22

Kepler@UC Davis Genome Center: Scientific Workflows to Support the Complete (Wet-lab) Experiment Lifecycle

• Try to capture and (semi-)automate the experiment lifecycle:
  – Discover similar experiments, …
  – Reuse, customize
  – Execute, monitor
  – Manage results
  – Register back to an experiment repository

Support Experiment Design, Execution, & Reuse

Scientific workflows and semantic extensions (ontologies, metadata++)

Page 23

Summary: What we could/should do

• Push technology:
  – Distributed Kepler & “detached” execution
  – Making Kepler more X-aware, where …
    • … X = Data plumbing (SRB toolkit, GridTK, others, …)
    • … X = Grid & Scheduling (need a “Grid director”? Condor director?)
    • … X = Parameter-sweep (“Nimrod/APST”… director?)
    • … X = Statistics & other specialized packages (R, parallel-R?, …, GRASS, …)
    • … X = Visualization (SCIRun, …)
  – Semantic extensions
    • Actors and datasets have “semantic types” to support resource discovery, WF design, …
• Create “Packages” or “Rolls”
  – … targeting certain scientific user groups & communities
• SWF life-cycle support:
  – Design, execution, monitoring, archival, re-use/re-run
  – Design patterns, “Kepler School of Expression”