HPSearch and NaradaBrokering: Workflow Scripting and Stream Management PTLIU Laboratory for Community Grids Geoffrey Fox, Harshawardhan S. Gadgil, Shrideep Pallickara Indiana University, Bloomington IN 47404 http://www.hpsearch.org http://www.naradabrokering.org [email protected] Edinburgh December 3 2003
Page 1: HPSearch and NaradaBrokering: Workflow Scripting and Stream Management

PTLIU Laboratory for Community Grids

Geoffrey Fox, Harshawardhan S. Gadgil, Shrideep Pallickara

Indiana University, Bloomington IN 47404

http://www.hpsearch.org http://www.naradabrokering.org

[email protected]

Edinburgh December 3 2003

Page 2: Backdrop

Workflow systems have several components:
• Development environment (graphical user interface)
• Specification language such as BPEL4WS
• Some interface between specification and run-time (a compiler?)
• Run-time managing linkage of services, error handling and notification

This project contributes to:
• Procedural specification of workflow
• The stream management part of the run-time

Workflow is sufficiently complex that we ought to agree on a general architecture, so that each of us can build parts and link them together.

Page 3: Comments on Standards

In this talk, workflow is synonymous with “Programming the Grid/Internet”

We never agreed on a programming model for the simpler case of “Programming a CPU”, so it is not very likely that we will agree on standards for workflow

We did roughly agree on standards within a particular language: Fortran v. Java v. C++ v. C# v. Lisp

We also had a little more agreement on a common run-time than on languages, but it was not complete

Page 4: What to remember about this talk

All streams (flows between ports of a Web service) are handled by a publish-subscribe messaging infrastructure
• Allows robust data transfer with adaptive routing, e.g. allows use of GridFTP
• Supports full concurrency within and between streams

Data, streams, files and Web Services are manipulated by a scripting language analogous to Shell and Perl in UNIX

http://www.hpsearch.org has the details. The software will be included in the open-source release of http://www.naradabrokering.org, which includes HPSearch
• NB Version 0.93 today; 1.0 February 04; 2.0 for SC04

Page 5: Scripting Environment I

HPSearch is designed as a scripting interface to the Internet (Grid), currently using the Rhino implementation of JavaScript
• Could also use Python or Perl
• Called HPSearch because variables could be accessed either by URI or by a search interface (to the Google Web Service), but this is not relevant here

For example:
    x = `wsdl:http://156.56.104.155:8080/axis/services/Calculator.jws/add(5, 6)`;
or, in general:
    x = `wsdl:WSDL for WS/function(arguments)`;
This returns x = 11 if the function adds its arguments; it could be followed by y = x + 1, setting y = 12, etc.

Any data can be accessed in this fashion, supporting the normal capabilities found in most languages (set x = data and use I/O)
• Perhaps we prefer all I/O to go through Web Services
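To make the wsdl: primitive concrete, here is a minimal JavaScript sketch of how such a reference might be split into a service endpoint, an operation and its arguments, then dispatched. The grammar assumed (wsdl:<service URL>/<operation>(<comma-separated arguments>)) is inferred from the Calculator example above; `parseWsdlRef`, `wsdl` and the local `fakeServices` table are hypothetical stand-ins. No SOAP request is made, and this is not HPSearch's implementation.

```javascript
// Hypothetical sketch: split an HPSearch-style "wsdl:" reference into parts.
// Assumed grammar (inferred from the slide): wsdl:<service URL>/<operation>(<args>)
function parseWsdlRef(ref) {
  // Greedy (.+) backtracks to the LAST "/" that is followed by "name(...)".
  const m = /^wsdl:(.+)\/([A-Za-z_$][\w$]*)\((.*)\)$/.exec(ref.trim());
  if (!m) throw new Error("not a wsdl: reference: " + ref);
  const args = m[3].trim() === ""
    ? []
    : m[3].split(",").map((a) => {
        const s = a.trim();
        const n = Number(s);
        return Number.isNaN(n) ? s : n; // numeric literals become numbers
      });
  return { endpoint: m[1], method: m[2], args: args };
}

// Stand-in for a real SOAP call: a local table of "services" keyed by endpoint.
const fakeServices = {
  "http://156.56.104.155:8080/axis/services/Calculator.jws": {
    add: (a, b) => a + b,
  },
};

function wsdl(ref) {
  const { endpoint, method, args } = parseWsdlRef(ref);
  const service = fakeServices[endpoint];
  if (!service || typeof service[method] !== "function") {
    throw new Error("unknown operation " + method + " at " + endpoint);
  }
  return service[method](...args);
}

const x = wsdl("wsdl:http://156.56.104.155:8080/axis/services/Calculator.jws/add(5, 6)");
const y = x + 1;
console.log(x, y);
```

With the stand-in table, x is 11 and y is 12, matching the slide; a real engine would replace `fakeServices` with a SOAP client generated from the WSDL.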

Page 6: Scripting Environment II

So the scripting environment can manipulate its own variables and methods as usual, but it can also invoke any Web service with the wsdl: primitive

The xpath: primitive evaluates an XPath query against a local variable defined as an XML instance (say, by the return from a Web service)

Multiple communicating scripting engines are possible

[Figure: scripting engines Script1, Script2 and Script3 controlling Web Services WS1 to WS6.]

This is scripting control; the workflow is between Web services

Page 7: NaradaBrokering Broker Network

[Figure: a NaradaBrokering broker network connecting diverse clients (minicomputer, computer server, PDA, modem, laptop, workstation, peers and audio/video conferencing clients), including across a firewall; a stream between Web Service A and Web Service B passes through NaradaBrokering queues.]

Page 8: Grid Messaging Substrate

[Figure, top: standard client-server style communication, in which a consumer and a service talk directly over SOAP+HTTP, GridFTP, RTP, …. Bottom: consumer and service talk over SOAP+HTTP, GridFTP, RTP, … to a messaging substrate, which can carry the traffic over any protocol satisfying the QoS requirements.]

Substrate-mediated communication removes transport protocol dependence.

Protocols have become overloaded: e.g. one MUST use UDP to meet A/V latency requirements, but MUST NOT use UDP because the firewall will not support it …

Page 9: NaradaBrokering

Based on a network of cooperating broker nodes
• A cluster-based architecture allows the system to scale in size

Originally designed to provide uniform software multicast to support real-time collaboration, linked to publish-subscribe for asynchronous systems

Now has several core functions:
• Reliable, order-preserving, “optimized” message transport (based on performance measurement) in heterogeneous multi-link fashion with TCP, UDP, SSL and HTTP; GridFTP will be added
• General publish-subscribe, including JMS and JXTA, and support for RTP-based audio/video conferencing
• Distributed XML event selection using the XPath metaphor
• QoS and security profiles for sent and received messages
• Interface with reliable storage for persistent events
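Reliable, order-preserving transport is commonly built from per-message sequence numbers plus a receiver-side reorder buffer. The JavaScript sketch below assumes exactly that scheme; the `OrderedReceiver` class is hypothetical and illustrates the general technique, not NaradaBrokering's actual reliable-delivery protocol.

```javascript
// Order-preserving delivery over a link that may reorder, duplicate or drop
// messages: the sender numbers messages from 0, the receiver buffers early
// arrivals and releases them strictly in sequence.
class OrderedReceiver {
  constructor(deliver) {
    this.deliver = deliver;   // callback for in-order messages
    this.next = 0;            // next sequence number we may deliver
    this.pending = new Map(); // seq -> payload, for early arrivals
  }

  receive(seq, payload) {
    if (seq < this.next || this.pending.has(seq)) return; // duplicate: ignore
    this.pending.set(seq, payload);
    // Release the longest contiguous run starting at `next`.
    while (this.pending.has(this.next)) {
      this.deliver(this.pending.get(this.next));
      this.pending.delete(this.next);
      this.next++;
    }
  }

  // Sequence numbers still missing below the highest one seen; a real
  // transport would ask the sender to retransmit these.
  missing() {
    const highest = Math.max(this.next - 1, ...this.pending.keys());
    const gaps = [];
    for (let s = this.next; s < highest; s++) {
      if (!this.pending.has(s)) gaps.push(s);
    }
    return gaps;
  }
}

const out = [];
const rx = new OrderedReceiver((m) => out.push(m));
// Arrival order after reordering plus one duplicate; message 2 is initially lost.
for (const [seq, m] of [[1, "b"], [0, "a"], [3, "d"], [1, "b"]]) rx.receive(seq, m);
console.log(out, rx.missing()); // [ 'a', 'b' ] [ 2 ]
rx.receive(2, "c");              // the retransmission arrives
console.log(out);                // [ 'a', 'b', 'c', 'd' ]
```

Message "d" is held back until the gap at 2 is filled, which is what makes delivery order-preserving even when the underlying link is not.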

Page 10: Laudable Features of NaradaBrokering

Is open source: http://www.naradabrokering.org
Has an end-point “plug-in” as well as standalone brokers
Will have a discovery service to find the nearest brokers and manage topics
Does tunnel through many firewalls without requiring ports to be opened
Supports JXTA, JMS (Java Message Service) and a more powerful native mode
Transit time < 1 millisecond per broker
Initial version of a setup and broker network administration module
• Currently expect to use HPSearch scripts to specify setup

Page 11: NaradaBrokering Naturally Supports

Filtering of events to support different client requirements (e.g. PDA versus desktop, slow lines, different A/V codecs)
Virtualization of addressing, routing and interfaces
Federation and mediation of multiple instances of Grid services, as illustrated by
• Composition of Gridlets into full Grids (Gridlets are single computers in the P2P case)
• JXTA with a peer group forming a Gridlet
Monitoring of messages for service management and general autonomic functions
Fault-tolerant data transport
Virtual Private Grid with a fine-grain security model

Page 12: NaradaBrokering Communication

Applications interface to NaradaBrokering through UserChannels, which NB constructs as a set of links between NB brokers acting as “waystations”; these brokers may need to be dynamically instantiated

UserChannels have publish/subscribe semantics with XML topics

Links implement a single conventional “data” protocol.
• Interface to add new transport protocols within the framework
• Administrative channel negotiates the best available communication protocol for each link

Different links can have different underlying transport implementations
• Implementations in the current release include support for TCP, UDP, Multicast, SSL, RTP and HTTP.
• GridFTP is the most interesting new protocol
• Supports communication through proxies and firewalls such as iPlanet, Netscape, Apache, Microsoft ISA and Checkpoint.
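Per-link negotiation can be pictured as choosing, for each link of a UserChannel, the first protocol in a preference list that both ends support and that nothing on the path blocks. In this JavaScript sketch the preference order, the endpoint capability lists and the firewall rule are illustrative assumptions, and `negotiate` is a hypothetical helper; NaradaBrokering's own negotiation over the administrative channel is not reproduced here.

```javascript
// Sketch of per-link protocol selection. Preference order, capability lists
// and blocked protocols are illustrative assumptions.
const PREFERENCE = ["UDP", "TCP", "SSL", "HTTP"]; // hypothetical: lowest latency first

// Pick the first preferred protocol that both ends support and the path allows.
function negotiate(link) {
  for (const protocol of PREFERENCE) {
    const bothSupport =
      link.a.protocols.includes(protocol) && link.b.protocols.includes(protocol);
    if (bothSupport && !link.blocked.includes(protocol)) return protocol;
  }
  return null; // no usable common protocol on this link
}

// A UserChannel modeled as a chain of links through broker "waystations".
const client = { name: "client", protocols: ["UDP", "TCP", "HTTP"] };
const broker1 = { name: "broker1", protocols: ["UDP", "TCP", "SSL", "HTTP"] };
const broker2 = { name: "broker2", protocols: ["UDP", "TCP", "SSL", "HTTP"] };
const service = { name: "service", protocols: ["TCP", "HTTP"] };

const channel = [
  { a: client, b: broker1, blocked: ["UDP", "TCP"] }, // firewall blocks UDP and raw TCP
  { a: broker1, b: broker2, blocked: [] },
  { a: broker2, b: service, blocked: [] },
];

for (const link of channel) {
  console.log(`${link.a.name} -> ${link.b.name}: ${negotiate(link)}`);
}
```

Here the three links of one channel end up on HTTP, UDP and TCP respectively, which is the point of the slide: different links can use different transports while the application sees a single UserChannel.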

Page 13: Manipulating Streams

The flow: primitive manages streams between Web services

There is service-oriented workflow, where streams are typically implicit. Here HPSearch supports UNIX-style pipe and tee, and we have trivial examples

For stream-oriented workflow, the streams are explicit. We have built a sophisticated system, GlobalMMCS, but it is currently not supported in HPSearch

HPSearch will become the control engine for NaradaBrokering when streams are “just” message flows on the Grid. Here one would use NB discovery services to find streams, and monitor them
• In this view, a client talking to a Web Service is workflow

Page 14: HPSearch Flow Example

    // The input file
    x = "file:///u/hgadgil/datafile.txt";
    // Reverses every line in the input, e.g. abcd becomes dcba
    y1 = "156.56.104.155:5050";
    // Computes the length of each line, excluding the trailing \n or \r
    y2 = "156.56.104.155:6060";
    // And finally the outputs...
    z1 = "file:///u/hgadgil/reversed.txt";
    z2 = "file:///u/hgadgil/length.txt";
    `flow: x &> (y1 | z1), (y2 | z2)`;

[Figure: the input x is teed (T) into two pipes, y1 to z1 and y2 to z2; each link is a NaradaBrokering queue.]
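Ignoring distribution, this flow computes two derived files from one input. The in-process JavaScript sketch below assumes each service can be modeled as a local function applied line by line and each output file as an array; the helper names `pipe`, `tee`, `reverseLine` and `lineLength` are hypothetical, and the remote services and NaradaBrokering queues are not modeled.

```javascript
// In-process sketch of what `flow: x &> (y1 | z1), (y2 | z2)` computes.

// Strip a trailing line terminator (\n or \r) before processing.
const body = (line) => line.replace(/[\r\n]+$/, "");

// y1: reverse every line, e.g. "abcd" becomes "dcba"
const reverseLine = (line) => body(line).split("").reverse().join("");
// y2: length of each line, not counting the terminator
const lineLength = (line) => body(line).length;

// Pipe (|): pass every item of a stream through one service.
const pipe = (stream, service) => stream.map(service);
// Tee (&>): give each branch its own copy of the same stream.
const tee = (stream, ...branches) => branches.map((branch) => branch(stream.slice()));

// Stands in for the contents of datafile.txt.
const x = ["abcd\n", "hello\n", "grid\r"];

const [z1, z2] = tee(
  x,
  (s) => pipe(s, reverseLine), // y1 | z1
  (s) => pipe(s, lineLength)   // y2 | z2
);
console.log(z1); // [ 'dcba', 'olleh', 'dirg' ]
console.log(z2); // [ 4, 5, 4 ]
```

Each branch receives its own copy of the stream, mirroring the way a tee hands the same messages to several independent consumers.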

Page 15: Another Example

    `flow: x &> (y1|z1 &> p,(q|storage1)), (y2|z2|storage2)`;

Note that this approach allows, for example, all workflow streams to use RMI, GridFTP or RTP: your choice, or rather NaradaBrokering's.

[Figure: x is teed into y1 to z1 and into y2 to z2 to storage2; z1 is in turn teed into p and into q to storage1. Each link is a NaradaBrokering topic (queue).]
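The flow: notation is small enough to parse directly. This JavaScript sketch assumes a grammar inferred from the two examples on these slides (a pipeline of names joined by |, optionally followed by &> and a comma-separated list of branches, where a branch is a name or a parenthesized flow) and turns an expression into the list of directed links, each of which would need a NaradaBrokering topic. `flowEdges` is hypothetical, not part of HPSearch.

```javascript
// Hypothetical parser for the flow: notation. Grammar inferred from the slides:
//   flow   := NAME ('|' NAME)* ('&>' branch (',' branch)*)?
//   branch := NAME | '(' flow ')'
// Returns directed links [from, to]; pass only the expression, without the
// surrounding `flow: ...` quoting.
function flowEdges(expression) {
  const tokens = expression.match(/&>|[|(),]|[A-Za-z_]\w*/g) || [];
  const edges = [];
  let pos = 0;

  const peek = () => tokens[pos];
  const take = (expected) => {
    if (tokens[pos] !== expected) throw new Error(`expected ${expected} at token ${pos}`);
    pos++;
  };
  const name = () => {
    const t = tokens[pos];
    if (t === undefined || !/^[A-Za-z_]\w*$/.test(t)) throw new Error(`expected a name at token ${pos}`);
    pos++;
    return t;
  };

  // A branch is a single service or a parenthesized sub-flow; return its head.
  function branch() {
    if (peek() !== "(") return name();
    take("(");
    const head = flow();
    take(")");
    return head;
  }

  // Parse a pipeline, then an optional tee; return the first node.
  function flow() {
    const head = name();
    let last = head;
    while (peek() === "|") {   // pipe: the last stage feeds the next one
      take("|");
      const next = name();
      edges.push([last, next]);
      last = next;
    }
    if (peek() === "&>") {     // tee: the last stage feeds every branch
      take("&>");
      edges.push([last, branch()]);
      while (peek() === ",") {
        take(",");
        edges.push([last, branch()]);
      }
    }
    return head;
  }

  flow();
  if (pos !== tokens.length) throw new Error(`unexpected token ${tokens[pos]}`);
  return edges;
}

const links = flowEdges("x &> (y1|z1 &> p,(q|storage1)), (y2|z2|storage2)");
for (const [from, to] of links) console.log(`${from} -> ${to}`);
```

For this expression the parser yields eight links (x to y1, y1 to z1, z1 to p, z1 to q, q to storage1, x to y2, y2 to z2, z2 to storage2), matching the figure; the simpler Page 14 flow yields four.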

Page 16: Stream-oriented Workflow

As in audio-video conferencing and multimedia file delivery, where the media streams are the “point”

Services generate and transform streams, but one thinks of streams going through services rather than services generating streams

Multicast streams, where video from one client is sent to all other participants in a collaborative session, are common

One thinks of a stream being published and participants subscribing to it.

[Figure: a stream is published to a pub/sub queue, and participants subscribe to it.]
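The publish/subscribe view of a stream can be sketched as an in-memory topic table: a publisher sends to a topic and every current subscriber receives each message. In this JavaScript sketch `PubSubQueue` and the topic name `session42/video/alice` are hypothetical; real NaradaBrokering topics are carried across a broker network rather than a single in-process map.

```javascript
// Minimal in-process publish/subscribe sketch (hypothetical names, not the
// NaradaBrokering API): a topic maps to the callbacks of its subscribers.
class PubSubQueue {
  constructor() {
    this.subscribers = new Map(); // topic -> array of callbacks
  }

  subscribe(topic, callback) {
    if (!this.subscribers.has(topic)) this.subscribers.set(topic, []);
    this.subscribers.get(topic).push(callback);
    // Return an unsubscribe function so a participant can leave the session.
    return () => {
      const list = this.subscribers.get(topic) || [];
      const i = list.indexOf(callback);
      if (i >= 0) list.splice(i, 1);
    };
  }

  publish(topic, message) {
    // Copy the list so callbacks may unsubscribe while we iterate.
    const list = (this.subscribers.get(topic) || []).slice();
    for (const callback of list) callback(message);
  }
}

// One client's video stream, delivered to the other participants.
const queue = new PubSubQueue();
const received = { bob: [], carol: [] };
const topic = "session42/video/alice"; // hypothetical topic name

queue.subscribe(topic, (frame) => received.bob.push(frame));
const carolLeaves = queue.subscribe(topic, (frame) => received.carol.push(frame));

queue.publish(topic, "frame-1"); // both participants receive it
carolLeaves();
queue.publish(topic, "frame-2"); // only bob is still subscribed
console.log(received);
```

The publisher never names its receivers, so participants can join or leave a session without the video source changing anything.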

Page 17: XGSP Web Service MCU Architecture

[Figure: SIP, H.323, Access Grid, native XGSP and Admire clients connect through gateways that convert to uniform XGSP messaging. NaradaBrokering carries all messaging, both high-performance (RTP) and XML/SOAP, and is annotated “NB scales as distributed”. Media servers and filters, and a session server with XGSP-based control, are provided as Web Services.]

Use multiple media servers to scale to many codecs and many versions of audio/video mixing; this should allow all e-Scientists to be connected

Page 18: Service-oriented Workflow I

As in the flow of data between different simulation programs, where one has a program view (each program becomes a Web service) and the data flow between programs is often not explicitly interesting

[Figure: simulation services linked by data flow: Elastic Dislocation, Pattern Recognizers, Fault Model BEM, Viscoelastic Layered BEM, Viscoelastic FEM, Elastic Dislocation Inversion.]

Page 19: Service-oriented Workflow II

Initial input and output files are identified, perhaps with a visualization as output

In many implementations, such as ours in the earthquake example, one writes and reads files for the stream interface
• Sometimes one wants the intermediate output files

AVS and similar visualization and image processing systems have such a model using streams

Multicast is not important per se; use a publish/subscribe mechanism because it is fault-tolerant and higher performance, not because of its multicast support

Page 20: Streams and Data

The scripting engine can either define topics or find them out from the NaradaBrokering discovery service

The run-time ensures that all I/O goes through NaradaBrokering
• Note that one either uses a proxy or builds a NaradaBrokering interface into the Web service
• The proxy should be near the Web service, as only NaradaBrokering “guarantees” firewall penetration, fault tolerance and performance
• NaradaBrokering needs an improved discovery system

NaradaBrokering and scripts are distributed, so there are no central bottlenecks

Page 21: NaradaBrokering in Practice

One can “best” insert a NaradaBrokering end-point interface into each client or Web service

But the proxy model is easiest for existing applications

[Figure, top: client and Web service each embed an NB end-point and exchange NB streams over the NB broker network. Bottom: client and Web service each connect to a proxy holding the NB end-point; the hop between an application and its proxy is “native communication”, which cannot use the added value of NB, including fault tolerance. Also labelled: current GridFTP implementation.]

Page 22: Entities in HPSearch

Each script is a Web Service

Each Web Service, file and Web page has a URI and can be accessed by a script
• HPSearch at its heart was URIs bound to JavaScript

The publish/subscribe system defines topics, which are the URIs of streams. Note the syntax is often
• topic://Session URI/stream1, with classic hierarchical labeling

Scripts need a discovery system to keep track of URIs, and in particular of the session URI (which plays the role of context); currently this is the same as the NaradaBrokering discovery system
• Pub/sub streams typically support conversations with related streams topic://Session URI/stream1/WS-A and topic://Session URI/stream1/WS-B, to allow Web services A and B to interact

Page 23: Publish/Subscribe Topics

One has “data”, which perhaps has an intrinsic URI

For files and Web pages, we have the location URL as well

I think a publish/subscribe topic is like the URI for streams, and it is instantiated as a particular queue (or set of queues) in NB

In NB, topics are integers (for performance), URI-style strings or general XML instances

Note that the session topic can be thought of as the “context” for messages sent to the topic, as it provides intrinsic information about the meaning of the stream (cf. OGSI, WS-Addressing, WS-Context, WS-ReliableMessaging and WS-Routing)

Topics for streams and sessions virtualize destination, routing and context

Page 24: Role of Pub/Sub Queues

One can think of NB as providing an operating service to transmit streams between end-points, with various value-added capabilities

Messages are the units of a stream

Events are messages with time-stamps (which could be absent), so events are messages and vice versa

Streams are ordered collections of messages
• NB manipulates streams and collections of streams
• Delivery is guaranteed to be order-preserving

NB provides a virtual stream desktop which you can use to manipulate streams in the same way you manipulate files in a conventional O/S

Page 25: Multiple Input and Output Ports

We can deal with Web Services with multiple inputs and outputs using an array notation, but the &> (tee) and | (pipe) notation gets clumsy

So one can use an explicit notation such as
    x.port[0].publish = NBTopicA;
    y1.port[0].subscribe = NBTopicA;
    y2.port[0].subscribe = NBTopicA;

This would also be the natural way of implementing stream-oriented workflow

Errors and notifications would be easy in this syntax:
    notifyTOPIC = SessionTOPIC + '/notify';
    x.notify.publish = notifyTOPIC;
    scriptasaWS.port[1].subscribe = notifyTOPIC;
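One way to realize this explicit port notation in JavaScript is with property setters, so that `port.publish = TOPIC` and `port.subscribe = TOPIC` register the port with a broker. In this sketch `Broker`, `Port` and `Service` are hypothetical in-memory stand-ins and the topic strings are made up; it only shows how the assignments above could wire ports together, not how HPSearch implements them.

```javascript
// Hypothetical in-memory stand-ins for the explicit port notation; these are
// not HPSearch or NaradaBrokering classes.
class Broker {
  constructor() {
    this.topics = new Map(); // topic -> Set of subscribed ports
  }
  attach(topic, port) {
    if (!this.topics.has(topic)) this.topics.set(topic, new Set());
    this.topics.get(topic).add(port);
  }
  deliver(topic, message) {
    for (const port of this.topics.get(topic) || []) port.received.push(message);
  }
}

class Port {
  constructor(broker) {
    this.broker = broker;
    this.publishTopic = null;
    this.received = []; // messages delivered to this port
  }
  // `port.publish = TOPIC` sends everything this port emits to TOPIC.
  set publish(topic) {
    this.publishTopic = topic;
  }
  // `port.subscribe = TOPIC` delivers everything sent to TOPIC to this port.
  set subscribe(topic) {
    this.broker.attach(topic, this);
  }
  emit(message) {
    if (this.publishTopic === null) throw new Error("port has no publish topic");
    this.broker.deliver(this.publishTopic, message);
  }
}

class Service {
  constructor(broker, nPorts) {
    this.port = Array.from({ length: nPorts }, () => new Port(broker));
    this.notify = new Port(broker); // separate channel for errors and notifications
  }
}

const nb = new Broker();
const x = new Service(nb, 1);
const y1 = new Service(nb, 1);
const y2 = new Service(nb, 1);
const scriptasaWS = new Service(nb, 2);
const SessionTOPIC = "topic://session1";    // hypothetical session topic
const NBTopicA = SessionTOPIC + "/streamA"; // hypothetical stream topic

// The assignments from the slide, unchanged apart from declarations.
x.port[0].publish = NBTopicA;
y1.port[0].subscribe = NBTopicA;
y2.port[0].subscribe = NBTopicA;

const notifyTOPIC = SessionTOPIC + "/notify";
x.notify.publish = notifyTOPIC;
scriptasaWS.port[1].subscribe = notifyTOPIC;

x.port[0].emit("record 1");
x.notify.emit("x: input file missing");
console.log(y1.port[0].received, y2.port[0].received, scriptasaWS.port[1].received);
```

Because wiring is just assignment, the same mechanism handles data streams and notification streams; only the topic names differ.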

Page 26: HPSearch Administrative Interface to NB

One can build administrative policies and procedures by flowing administrative and monitoring data to the appropriate scripting engines:
    performanceTOPIC = SessionTOPIC + '/performance';
    nbws = NBDiscover("aggregateperformancews");
    nbws.performancedata.publish = performanceTOPIC;
    scriptasaWS.port[2].subscribe = performanceTOPIC;
    Niftyperformanceanalyser(scriptasaWS.port[2]);
    …

This example pipes performance data from NaradaBrokering and spawns some analysis

For each link (broker to broker, broker to end-point), NB provides the available bandwidth, used bandwidth, latency, etc.
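`Niftyperformanceanalyser` is a placeholder on the slide. A hypothetical JavaScript stand-in is sketched here: it assumes each record on the performance topic carries a link name, a latency sample and the link's available and used bandwidth (these field names are assumptions), and it reports mean latency, its population standard deviation and bandwidth utilization per link, flagging heavily used links. The sample values are invented for illustration and are not NaradaBrokering measurements.

```javascript
// Hypothetical stand-in for Niftyperformanceanalyser. Record shape assumed:
// { link, latencyMs, availableMbps, usedMbps }
function analysePerformance(records, utilizationAlarm = 0.8) {
  // Group samples by link, preserving first-seen order.
  const byLink = new Map();
  for (const r of records) {
    if (!byLink.has(r.link)) byLink.set(r.link, []);
    byLink.get(r.link).push(r);
  }

  const report = [];
  for (const [link, rs] of byLink) {
    const n = rs.length;
    const mean = rs.reduce((s, r) => s + r.latencyMs, 0) / n;
    // Population standard deviation of the latency samples.
    const sd = Math.sqrt(rs.reduce((s, r) => s + (r.latencyMs - mean) ** 2, 0) / n);
    const last = rs[n - 1]; // bandwidth figures from the most recent sample
    const utilization = last.usedMbps / last.availableMbps;
    report.push({
      link,
      samples: n,
      meanLatencyMs: mean,
      sdLatencyMs: sd,
      utilization,
      alarm: utilization > utilizationAlarm,
    });
  }
  return report;
}

// Invented sample data, for illustration only.
const samples = [
  { link: "broker1->broker2", latencyMs: 1.0, availableMbps: 100, usedMbps: 20 },
  { link: "broker1->broker2", latencyMs: 3.0, availableMbps: 100, usedMbps: 30 },
  { link: "broker2->client7", latencyMs: 5.0, availableMbps: 10, usedMbps: 9 },
];
for (const row of analysePerformance(samples)) console.log(row);
```

In a script, the analyser would be fed whatever arrives on scriptasaWS.port[2]; an alarm could in turn be published on the session's notify topic.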

Page 27: Other NB Features to be Added to HPSearch

Full details of available brokers and stable storage
Pending queue sizes
Message statistics (size, number per second, time since last message) at brokers and end-points
Current stream sequence number at different parts of the pipeline from source to destination
Heartbeat information
Active topics and lists of publishers and subscribers (subject to security restrictions)
Fault tolerance statistics, including subscribed end-points which are “down”

Page 28: Mean Transit Delay in NaradaBrokering

Test setup: Pentium-3, 1 GHz, 256 MB RAM, 100 Mbps LAN, JRE 1.3, Linux

[Figure: mean transit delay for message samples in NaradaBrokering for different communication hops (hop-2, hop-3, hop-5, hop-7). Axes: transit delay (milliseconds) versus message payload size (bytes).]

Page 29: Standard Deviation of Transit Delay

[Figure: standard deviation for message samples in NaradaBrokering for different communication hops (hop-2, hop-3, hop-5, hop-7), internal machines. Axes: standard deviation (milliseconds) versus message payload size (bytes).]

Page 30: Delay for 50 Video Clients

[Figure: average delay per packet for 50 video clients, NaradaBrokering-RTP versus JMF-RTP. Axes: delay (milliseconds) versus packet number. Averages: NaradaBrokering 2.23 ms, JMF 3.08 ms.]

Page 31: Jitter for 50 Video Clients

[Figure: average jitter (standard deviation) for 50 video clients, NaradaBrokering-RTP versus JMF-RTP. Axes: jitter (milliseconds) versus packet number. Averages: NaradaBrokering 0.95 ms, JMF 1.10 ms.]