Michael Pan (nephosity): pomsets, Workflow management for your cloud

pomsets: Workflow management for your cloud
conference.scipy.org/scipy2010/slides/michael_pan_pomsets.pdf

Mar 05, 2021

Page 1

Michael Pan
nephosity

pomsets
Workflow management for your cloud

Page 2

In the future, the rapidity with which any given discipline advances is likely to depend on how well the community acquires the necessary expertise in database, workflow management, visualization, and cloud computing technologies.

“Beyond the Data Deluge”, Science, Vol. 323, no. 5919, pp. 1297-1298, 2009.

Page 3

Workflow management is…

the design, specification, and coordination of the execution of tasks and task dependencies.
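To make the definition concrete, here is a minimal Python sketch (not pomsets code) that specifies tasks and their dependencies and then coordinates their execution so that every task runs only after the tasks it depends on. The task names and the `run_workflow` helper are hypothetical; the ordering comes from the standard-library `graphlib.TopologicalSorter`, which assumes Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical workflow: each task name maps to the set of tasks it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "fetch": set(),
    "clean": {"fetch"},
    "stats": {"clean"},
    "plot": {"clean"},
    "report": {"stats", "plot"},
}


def run_workflow(deps: Dict[str, Set[str]],
                 actions: Dict[str, Callable[[], None]]) -> List[str]:
    """Execute every task after all of its dependencies; return the order used."""
    order: List[str] = []
    # static_order() lists each task after its dependencies and raises
    # graphlib.CycleError if the dependency graph contains a cycle.
    for task in TopologicalSorter(deps).static_order():
        actions[task]()
        order.append(task)
    return order


if __name__ == "__main__":
    actions = {name: (lambda n=name: print("running", n)) for name in DEPENDENCIES}
    print(run_workflow(DEPENDENCIES, actions))
```

This sketch runs tasks one at a time; "stats" and "plot" have no dependency on each other, so a real workflow manager is free to run them in parallel.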

Page 4

Why workflow management + cloud computing?

• Cloud computing provides the ability to scale compute resources with the work that needs to be done

• Better than what has been available, i.e. WFM + grid

• WFM is critical to a successful long-term cloud computing strategy
  • A critical component of the cloud computing software stack
  • Growing recognition of the need for workflow management

Page 5

Issues with WFM+grid

• Jobs submitted to grids queue up behind jobs of other users, which reduces the operational efficiencies provided by the WFMS

• Heterogeneous compute environments may produce different task results

• Grids are not easily federated, limiting burst computing

• Available only to institutions with the resources to deploy their own grid and implement their own WFMS

Page 6

Components of a cloud computing software stack

• Virtual machines (VMware, Xen, Virtuozzo, KVM)
• Dynamic provisioning (Amazon EC2, Eucalyptus)
• Task partitioning (MapReduce, Hadoop, Disco, Sphere)
• Data distribution (GFS, HDFS, Ceph, Sector, MongoDB, CouchDB)
• Unified messaging (Qpid, RabbitMQ, ZeroMQ)
• Workflow management (Azkaban, Kepler, Oozie, Pipeline, Pegasus, Taverna, Triana, pomsets)
• Analytics (RightScale, Nagios, Ganglia, Graphite)

Page 7

Growing recognition of the need for workflow management

(screencap 2009-12-04, currently 59 watchers)

Page 8

Why pomsets?

• Other existing workflow management systems are made for programmers

• Non-programmers in enterprises need an easier way to manage their data-intensive computational workflows

Page 9

Oozie

Page 10

Cascading

Page 11

Pig

Page 12

Shell script

Page 13

pomsets is …

• A mathematical model (pomset = partially ordered multiset), first used in 1985 by Vaughan Pratt to describe concurrent processes

• An application that implements the mathematical model as the data structures that represent workflow components, facilitates the design and specification of workflows, and coordinates the execution of workflow tasks on cloud deployments
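As a rough illustration of "the mathematical model as data structures", the Python sketch below stores a pomset as a labelled partial order: a set of events, a label for each event (labels may repeat, which is the "multiset" part), and precedence edges whose transitive closure gives the order. The `Pomset` class and its methods are hypothetical and are not the pomsets-core API; the point is only that two events left unordered by the model are free to run concurrently.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Pomset:
    """Hypothetical sketch of a labelled partial order (not pomsets-core code)."""

    labels: Dict[str, str]  # event id -> label; the same label may occur many times
    edges: Set[Tuple[str, str]] = field(default_factory=set)  # (a, b): a precedes b

    def order(self) -> Set[Tuple[str, str]]:
        """Return the strict order: the transitive closure of `edges`."""
        unknown = {e for edge in self.edges for e in edge} - set(self.labels)
        if unknown:
            raise ValueError(f"edges mention unknown events: {sorted(unknown)}")
        closure = set(self.edges)
        events = list(self.labels)
        # Warshall's algorithm: allow each event k in turn as an intermediate step.
        for k in events:
            for i in events:
                for j in events:
                    if (i, k) in closure and (k, j) in closure:
                        closure.add((i, j))
        if any(a == b for a, b in closure):
            raise ValueError("edges contain a cycle, so they do not define a partial order")
        return closure

    def concurrent(self, a: str, b: str) -> bool:
        """Distinct events are concurrent when neither is ordered before the other."""
        strict = self.order()
        return a != b and (a, b) not in strict and (b, a) not in strict


if __name__ == "__main__":
    # Two "align" events (a repeated label) that both precede one "merge" event.
    p = Pomset({"e1": "align", "e2": "align", "e3": "merge"},
               {("e1", "e3"), ("e2", "e3")})
    print(p.concurrent("e1", "e2"))  # True: the two align steps may run in parallel
    print(p.concurrent("e1", "e3"))  # False: merge must wait for e1
```

The closure is recomputed on every call (cubic in the number of events), which keeps the sketch short; a real implementation would cache it.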

Page 14

The mathematical definition
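The slide's own formula did not survive in this transcript. For orientation, the standard formulation from the concurrency literature is roughly the following; it is stated here independently and is not a transcription of the slide.

```latex
A \emph{labelled partial order} (lpo) over an alphabet $\Sigma$ is a triple
$(V, \le, \lambda)$, where $V$ is a set of events, $\le \subseteq V \times V$ is a
partial order (reflexive, antisymmetric, transitive), and $\lambda : V \to \Sigma$
assigns a label to each event.

Two lpos $(V, \le, \lambda)$ and $(V', \le', \lambda')$ are isomorphic when there is
a bijection $f : V \to V'$ such that, for all $u, v \in V$,
\[
  u \le v \iff f(u) \le' f(v)
  \qquad\text{and}\qquad
  \lambda'(f(v)) = \lambda(v).
\]

A \emph{pomset} (partially ordered multiset) is an isomorphism class of lpos.
Distinct events $u \ne v$ with neither $u \le v$ nor $v \le u$ are \emph{concurrent}.
```

Read as a workflow, events are task executions, labels name the task being run (so one task may appear several times), and the order records which executions must finish before others start.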

Page 15

The workflow management system

• 2 components
  • pomsets-core is the backend and provides an API
  • pomsets-gui is the front end and interacts with the user

Page 16

Features
• Parallel computing
• Data flow
• Flow control
• Workflow reusability
• Compute cloud agnosticism
• Execution environment agnosticism
• Task partitioning
• Shell commands, Hadoop, Python functions, etc.
• Intuitive GUI
• Simple API
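Two of the listed features, parallel computing and data flow, can be illustrated with a small standard-library sketch: every task whose dependencies have finished is submitted to a thread pool at once, and each task receives its upstream results as keyword arguments. This is an assumption-laden illustration, not how pomsets-core schedules work; the `run_parallel` name and the keyword-argument convention for passing data are hypothetical, and `graphlib` assumes Python 3.9 or later.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Set


def run_parallel(deps: Dict[str, Set[str]],
                 funcs: Dict[str, Callable[..., Any]],
                 max_workers: int = 4) -> Dict[str, Any]:
    """Run each task as soon as its dependencies finish; return all results.

    A task's function receives its dependencies' results as keyword arguments
    (a simple form of data flow). Hypothetical helper, not the pomsets-core API.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are cyclic
    results: Dict[str, Any] = {}
    running: Dict[Future, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies have all completed.
            for task in sorter.get_ready():
                inputs = {dep: results[dep] for dep in deps.get(task, set())}
                running[pool.submit(funcs[task], **inputs)] = task
            # Block until at least one task finishes, then release its dependents.
            finished, _ = wait(set(running), return_when=FIRST_COMPLETED)
            for fut in finished:
                task = running.pop(fut)
                results[task] = fut.result()  # re-raises the task's exception, if any
                sorter.done(task)
    return results


if __name__ == "__main__":
    deps = {"load": set(), "square": {"load"}, "double": {"load"},
            "total": {"square", "double"}}
    funcs = {
        "load": lambda: [1, 2, 3],
        "square": lambda load: [x * x for x in load],  # may run alongside "double"
        "double": lambda load: [2 * x for x in load],
        "total": lambda square, double: sum(square) + sum(double),
    }
    print(run_parallel(deps, funcs)["total"])  # 26
```

Threads keep the sketch self-contained; on a cloud deployment the same scheduling loop would hand ready tasks to remote workers instead of a local pool.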

Page 17

Demo

How to create the following script in pomsets

Page 18

Demo

Page 19

Growing recognition
• nephosity was showcased at Structure 2010 as one of the 11 most promising startups, due to its focus on workflow management in the cloud for non-programmers

Page 20

nephosity.com
enable the cloud

@nephosity

Michael Pan
[email protected]