HPSearch and NaradaBrokering: Workflow Scripting and Stream Management
PTLIU Laboratory for Community Grids
Geoffrey Fox, Harshawardhan S. Gadgil, Shrideep Pallickara
Indiana University, Bloomington IN 47404
http://www.hpsearch.org http://www.naradabrokering.org
Edinburgh, December 3 2003
Backdrop
Workflow systems have several components
• Development environment: graphical user interface
• Specification language such as BPEL4WS
• Some interface between specification and run-time (compiler?)
• Run-time managing linkage of services, error handling and notification
This project contributes to
• Procedural specification of workflow
• Stream management part of run-time
Workflow is sufficiently complex that we ought to agree on a general architecture so we can each build parts and link them together
Comments on Standards
In this talk, workflow is synonymous with “Programming the Grid/Internet”
We never agreed on a programming model for the simpler case of “Programming a CPU”, so it is not very likely that we will agree on standards for workflow
We did roughly agree on standards within a particular language: Fortran v. Java v. C++ v. C# v. Lisp
We also had a little more agreement on a common run-time than on languages, but not complete agreement
What to remember about this talk
All streams (flows between ports of a web service) are handled by a publish-subscribe messaging infrastructure
• Allows robust data transfer with adaptive routing, e.g. allows use of GridFTP
• Supports full concurrency between and within streams
Data, streams, files and Web Services are manipulated by a scripting language analogous to Shell and Perl in UNIX
http://www.hpsearch.org has the details
Software will be included in the open-source release of http://www.naradabrokering.org
• NB Version 0.93 today; 1.0 February 04; 2.0 for SC04
Scripting Environment I
HPSearch is designed as a scripting interface to the Internet (Grid), currently using the Rhino implementation of JavaScript
• Could use Python or Perl
• Called HPSearch because it could access variables either by URI or by a search interface (to the Google Web Service), but this is not relevant here
x = `wsdl:http://156.56.104.155:8080/axis/services/Calculator.jws/add(5, 6)`;
Or, in general form:
x = `wsdl:WSDL for WS/function(arguments)`;
Returns x=11 if the function adds its arguments
Could follow with y=x+1, setting y=12, etc.
Can access any data in this fashion and support the normal capabilities found in most languages (set x=data and use I/O)
• Perhaps prefer all I/O to go through Web Services
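To make the shape of a wsdl: expression concrete, here is a minimal sketch in modern JavaScript (Node.js rather than the 2003-era Rhino) that splits such a string into the WSDL location, the operation name and the arguments. The helper name `parseWsdlExpression` and the returned object layout are assumptions made for illustration; the slides do not say how HPSearch parses or dispatches these strings, and the sketch never contacts a service.

```javascript
// Hypothetical helper: split "wsdl:<WSDL URL>/<function>(<args>)" into parts.
// It only parses the string; it does not invoke the web service.
function parseWsdlExpression(expr) {
  const prefix = "wsdl:";
  if (!expr.startsWith(prefix)) {
    throw new Error("not a wsdl: expression: " + expr);
  }
  const body = expr.slice(prefix.length);
  const open = body.lastIndexOf("(");
  const close = body.lastIndexOf(")");
  if (open < 0 || close < open) {
    throw new Error("missing argument list: " + expr);
  }
  // The operation name is whatever follows the last "/" before the "(".
  const slash = body.lastIndexOf("/", open);
  if (slash < 0) {
    throw new Error("missing operation name: " + expr);
  }
  const rawArgs = body.slice(open + 1, close).trim();
  return {
    service: body.slice(0, slash),
    operation: body.slice(slash + 1, open),
    // Arguments are kept as strings; type conversion is left to the caller.
    args: rawArgs === "" ? [] : rawArgs.split(",").map((a) => a.trim()),
  };
}

const call = parseWsdlExpression(
  "wsdl:http://156.56.104.155:8080/axis/services/Calculator.jws/add(5, 6)"
);
console.log(call.service, call.operation, call.args);
```

Keeping the arguments as strings is a deliberate simplification: the slides only show numeric arguments, so nothing here assumes how other types would be encoded.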
Scripting Environment II
So the scripting environment can manipulate its own variables and methods as usual, but can also invoke any web service with the wsdl: primitive
The xpath: primitive evaluates an XPath query against a local variable defined (say, by the return from a Web service) as an XML instance
Can have multiple communicating scripting engines
[Figure: Script1, Script2 and Script3 alongside web services WS1 through WS6]
This is scripting control; workflow is between web services
Protocols have become overloaded, e.g. MUST use UDP for A/V latency requirements but MUST NOT use UDP as firewall will not support ………
NaradaBrokering
Based on a network of cooperating broker nodes
• Cluster-based architecture allows the system to scale in size
Originally designed to provide uniform software multicast to support real-time collaboration, linked to publish-subscribe for asynchronous systems
Now has several core functions
• Reliable, order-preserving, “optimized” message transport (based on performance measurement) in heterogeneous multi-link fashion with TCP, UDP, SSL and HTTP; GridFTP will be added
• General publish-subscribe including JMS and JXTA, and support for RTP-based audio/video conferencing
• Distributed XML event selection using the XPath metaphor
• QoS and security profiles for sent and received messages
• Interface with reliable storage for persistent events
Laudable Features of NaradaBrokering
Is open source: http://www.naradabrokering.org
Has end-point “plug-in” as well as standalone brokers
Will have a discovery service to find nearest brokers and manage topics
Does tunnel through many firewalls without requiring ports to be opened
Supports JXTA, JMS (Java Message Service) and a more powerful native mode
Transit time < 1 millisecond per broker
Initial version of setup and broker network administration module
• Currently expect to use HPSearch scripts to specify setup
NaradaBrokering Naturally Supports
Filtering of events to support different client requirements (e.g. PDA versus desktop, slow lines, different A/V codecs)
Virtualization of addressing, routing, interfaces
Federation and mediation of multiple instances of Grid services, as illustrated by
• Composition of Gridlets into full Grids (Gridlets are single computers in the P2P case)
• JXTA with a peer-group forming a Gridlet
Monitoring of messages for service management and general autonomic functions
Fault-tolerant data transport
Virtual Private Grid with fine-grain security model
NaradaBrokering Communication
Applications interface to NaradaBrokering through UserChannels, which NB constructs as a set of links between NB brokers acting as “waystations”, which may need to be dynamically instantiated
UserChannels have publish/subscribe semantics with XML topics
Links implement a single conventional “data” protocol
• Interface to add new transport protocols within the framework
• Administrative channel negotiates the best available communication protocol for each link
Different links can have different underlying transport implementations
• Implementations in the current release include support for TCP, UDP, Multicast, SSL, RTP and HTTP
• GridFTP is the most interesting new protocol
• Supports communication through proxies and firewalls such as iPlanet, Netscape, Apache, Microsoft ISA and Checkpoint
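The per-link negotiation can be pictured as choosing, among the transports that both ends support and that the path permits, the one with the best measured performance. The sketch below is only an illustration of that idea: the function `chooseTransport`, its inputs and the latency figures are all made up, and the slides do not describe the actual negotiation algorithm or which metrics it uses.

```javascript
// Hypothetical sketch of per-link transport selection.
// None of these names are NaradaBrokering APIs; the numbers are invented.
function chooseTransport(endA, endB, blocked, measuredLatencyMs) {
  // Keep only transports both ends offer, that are not blocked on this
  // link, and for which a measurement exists.
  const candidates = endA.filter(
    (p) => endB.includes(p) && !blocked.includes(p) && p in measuredLatencyMs
  );
  if (candidates.length === 0) return null;
  // Pick the candidate with the lowest measured latency.
  return candidates.reduce((best, p) =>
    measuredLatencyMs[p] < measuredLatencyMs[best] ? p : best
  );
}

const chosen = chooseTransport(
  ["TCP", "UDP", "SSL", "HTTP"],      // transports offered by one end
  ["TCP", "UDP", "HTTP"],             // transports offered by the other end
  ["UDP"],                            // e.g. a firewall drops UDP on this link
  { TCP: 1.2, UDP: 0.6, HTTP: 4.0 }   // invented latency measurements (ms)
);
console.log(chosen);
```

Here UDP would be fastest but is blocked, and SSL is not offered by both ends, so the choice falls to TCP over HTTP.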
Manipulating Streams
The flow: primitive manages streams between Web services
There is service-oriented workflow, where streams are typically implicit. Here HPSearch supports UNIX-style pipe and tee, and we have trivial examples
For stream-oriented workflow, the streams are explicit. We have built a sophisticated system, GlobalMMCS, but it is currently not supported in HPSearch
HPSearch will become the control engine for NaradaBrokering when streams are “just” message flows on the Grid. Here one would use NB discovery services to find streams and monitor them
• In this view a client talking to a Web Service is workflow
HPSearch Flow Example
// The input file
x = "file:///u/hgadgil/datafile.txt";
// Reverses every line in the input, e.g. abcd becomes dcba
y1 = "156.56.104.155:5050";
// Computes the length of each line minus the last (\n or \r)
y2 = "156.56.104.155:6060";
// And finally the outputs...
z1 = "file:///u/hgadgil/reversed.txt";
z2 = "file:///u/hgadgil/length.txt";
`flow: x &> (y1 | z1), (y2 | z2)`;
[Figure: x feeds a tee (T) whose two branches are the pipes y1 to z1 and y2 to z2, carried over a NaradaBrokering queue]
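The expression `flow: x &> (y1 | z1), (y2 | z2)` describes a small directed graph. As a rough sketch (this is not the HPSearch parser), the same topology can be written out with two hypothetical helpers, `pipe` and `tee`, that return lists of edges; under the publish-subscribe design described in this talk, each edge would correspond to one stream carried by NaradaBrokering.

```javascript
// Hypothetical helpers that model a flow expression as a list of edges.
// pipe(a, b, c) models a | b | c; tee(src, b1, b2) models src &> b1, b2.
function pipe(...stages) {
  const edges = [];
  for (let i = 0; i + 1 < stages.length; i++) {
    edges.push([stages[i], stages[i + 1]]);
  }
  return { head: stages[0], edges };
}

function tee(source, ...branches) {
  const edges = [];
  for (const branch of branches) {
    edges.push([source, branch.head]); // the source feeds each branch head
    edges.push(...branch.edges);       // then the branch's own pipe stages
  }
  return { head: source, edges };
}

// flow: x &> (y1 | z1), (y2 | z2)
const flow = tee("x", pipe("y1", "z1"), pipe("y2", "z2"));
for (const [from, to] of flow.edges) {
  console.log(from + " -> " + to);
}
```

Representing the flow as plain edges keeps the sketch independent of how the streams are transported, which matches the point that the transport is NaradaBrokering's choice rather than the script's.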
Another Example
`flow: x &> (y1|z1 &> p,(q|storage1)), (y2|z2|storage2)`;
Note that this approach allows, for example, all workflow streams to use RMI, GridFTP or RTP: your choice, or rather NaradaBrokering’s
[Figure: flow graph over the nodes x, y1, z1, p, q, storage1, y2, z2 and storage2, with streams carried on a NaradaBrokering topic (queue)]
Stream-oriented Workflow
As in audio-video conferencing and multimedia file delivery, where it is the media streams that are the “point”
Services generate and transform streams, but one thinks of streams going through services rather than services generating streams
Multicast streams, where video from one client is sent to all other participants in a collaborative session, are common
One thinks of a stream being published and participants subscribing to it
[Figure: Pub/Sub queue with Publish and Subscribe]
XGSP Web Service MCU Architecture
[Figure labels: SIP, H323, Access Grid, Native XGSP, Admire; gateways convert to uniform XGSP messaging; high performance (RTP) and XML/SOAP and ..; media servers and filters; session server with XGSP-based control; NaradaBrokering for all messaging; NB scales as distributed Web Services]
Use multiple media servers to scale to many codecs and many versions of audio/video mixing; should allow all e-Scientists to be connected
Service-oriented Workflow I
As in the flow of data between different simulation programs, where one has a program (which becomes a Web service) view, and the data flow between programs is often not explicitly interesting
[Figure: services Elastic Dislocation, Pattern Recognizers, Fault Model BEM, Viscoelastic Layered BEM, Viscoelastic FEM and Elastic Dislocation Inversion]
Service-oriented Workflow II
Initial input and output files are identified, with perhaps a visualization as output
In many implementations, such as ours in the earthquake example, one writes and reads files for the stream interface
• Sometimes one wants the intermediate output files
AVS and similar visualization and image processing systems have such a model using streams
Multicast is not important per se; one uses a publish/subscribe mechanism because it is fault-tolerant and higher performance, not because of multicast support
Streams and Data
Scripting engine can either define topics or find them out from the NaradaBrokering discovery service
Run-time ensures that all I/O goes through NaradaBrokering
• Note one either uses a proxy or builds a NaradaBrokering interface into the Web service
• Proxy should be near Web Service as only NaradaBrokering
• NaradaBrokering needs improved discovery system
NaradaBrokering and scripts are distributed, so there are no central bottlenecks
NaradaBrokering in practice
One can “best” insert a NaradaBrokering end-point interface into each client or web service
But the proxy model is easiest for existing applications
[Figure: two configurations. In the first, Client and WS connect through NB end-points and NB streams to the NB broker network; in the second, Client and WS each connect through a proxy to NB end-points, NB streams and the NB broker network]
“Native Communication”: cannot use added value of NB, including fault tolerance. Current GridFTP implementation
Entities in HPSearch
Each script is a Web Service
Each Web Service, file and web page has a URI and can be accessed by a script
• HPSearch at its heart was URIs bound to JavaScript
Publish/Subscribe system defines topics, which are the URIs of streams. Note syntax is often
• topic://Session URI/stream1 with classic hierarchical labeling
Scripts need a discovery system to keep track of URIs and in particular the session URI (which plays the role of context); currently this is the same as the NaradaBrokering Discovery System
• Pub/Sub streams typically support conversations with related streams topic://Session URI/stream1/WS-A and topic://Session URI/stream1/WS-B to allow Web services A and B to interact
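The hierarchical topic labels lend themselves to simple string construction and prefix tests. The following is a minimal sketch under that assumption; `topicFor` and `isUnder` are hypothetical helpers, "Session URI" is kept as the literal placeholder used on the slide, and nothing here reflects how NaradaBrokering actually matches topics.

```javascript
// Hypothetical helpers for hierarchical topic names of the form
// topic://<session URI>/<stream>/<endpoint>.
function topicFor(session, ...parts) {
  return ["topic://" + session, ...parts].join("/");
}

// True when `topic` is `ancestor` itself or lies beneath it in the hierarchy.
function isUnder(topic, ancestor) {
  return topic === ancestor || topic.startsWith(ancestor + "/");
}

const session = "Session URI"; // placeholder, as written on the slide
const stream1 = topicFor(session, "stream1");
const toA = topicFor(session, "stream1", "WS-A");
const toB = topicFor(session, "stream1", "WS-B");

console.log(toA);
console.log(isUnder(toA, stream1), isUnder(toB, toA));
```

Because the session URI is the common prefix, it can act as the shared context for every stream and conversation that hangs beneath it.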
Publish/Subscribe Topics
One has “data”, which has perhaps an intrinsic URI
For files and web pages, we have as well the location URL
I think a Publish/Subscribe topic is like the URI for streams, and it is instantiated as a particular queue (or set of queues) in NB
In NB, topics are integers (for performance), URI style or general XML instances
Note that the session topic can be thought of as “context” for messages sent to a topic, as it provides intrinsic information as to the meaning of the stream (cf. OGSI, WS-Addressing, WS-Context, WS-Reliable Messaging and WS-Routing)
Topics for streams and sessions virtualize destination, routing and context
Role of Pub/Sub Queues
One can think of NB as providing an operating service to transmit streams between end-points with various value-added capabilities
Messages are the units of a stream
Events are messages with time-stamps (which could be absent); so events are messages and vice versa
Streams are ordered collections of messages
• NB manipulates streams and collections of streams
• Delivery is guaranteed to be order preserving
NB provides a virtual stream desktop, which you can use to manipulate streams in the same way you manipulate files in a conventional O/S
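One common way to obtain order-preserving delivery is to tag each message with a sequence number and hold back anything that arrives early. The sketch below shows that general technique only; the class `OrderedStream` is hypothetical, and the slides do not state that NaradaBrokering implements ordering this way.

```javascript
// Hypothetical reorder buffer: releases messages strictly by sequence number.
class OrderedStream {
  constructor(deliver) {
    this.deliver = deliver;   // callback invoked in sequence order
    this.next = 0;            // next sequence number to release
    this.pending = new Map(); // early arrivals waiting for a gap to fill
  }

  receive(seq, payload) {
    // Ignore anything already delivered or already buffered.
    if (seq < this.next || this.pending.has(seq)) return;
    this.pending.set(seq, payload);
    // Release every message that is now contiguous with what was delivered.
    while (this.pending.has(this.next)) {
      this.deliver(this.pending.get(this.next));
      this.pending.delete(this.next);
      this.next += 1;
    }
  }
}

const received = [];
const stream = new OrderedStream((m) => received.push(m));
stream.receive(1, "b"); // held back: message 0 has not arrived yet
stream.receive(0, "a"); // releases "a", then the buffered "b"
stream.receive(1, "b"); // duplicate, ignored
stream.receive(2, "c");
console.log(received.join(""));
```

The buffer trades a little latency and memory for ordering, which is why a lost message would stall delivery until it is retransmitted or skipped by some separate policy not shown here.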
Multiple Input and Output Ports
We can deal with Web Services with multiple inputs and outputs using an array notation, but the &> tee and | pipe notation gets clumsy
So one can use explicit notation such as
• x.port[0].publish = NBTopicA;
• y1.port[0].subscribe = NBTopicA;
• y2.port[0].subscribe = NBTopicA;
This would also be a natural way of implementing stream-oriented workflow
Errors and notifications would be easy in this syntax
• notifyTOPIC = SessionTOPIC + '/notify';
• x.notify.publish = notifyTOPIC;
• scriptasaWS.port[1].subscribe = notifyTOPIC;
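To illustrate what the explicit port notation amounts to, the sketch below wires one publisher and two subscribers to a shared topic through a tiny in-memory stand-in for a broker. `TinyBroker` is a hypothetical class written for this example and is not the NaradaBrokering API; the two subscribers borrow their behaviour (reverse each line, compute each line's length) from the earlier flow example.

```javascript
// Hypothetical in-memory stand-in for a broker: topic -> subscriber callbacks.
class TinyBroker {
  constructor() {
    this.subscribers = new Map();
  }
  subscribe(topic, handler) {
    if (!this.subscribers.has(topic)) this.subscribers.set(topic, []);
    this.subscribers.get(topic).push(handler);
  }
  publish(topic, message) {
    // Deliver to every subscriber of the topic; this fan-out is the "tee".
    for (const handler of this.subscribers.get(topic) || []) handler(message);
  }
}

const broker = new TinyBroker();
const NBTopicA = "topic://Session URI/stream1"; // placeholder topic name

// y1.port[0].subscribe = NBTopicA;  (y1 reverses each line)
const y1Out = [];
broker.subscribe(NBTopicA, (line) => y1Out.push([...line].reverse().join("")));

// y2.port[0].subscribe = NBTopicA;  (y2 computes each line's length)
const y2Out = [];
broker.subscribe(NBTopicA, (line) => y2Out.push(line.length));

// x.port[0].publish = NBTopicA;     (x emits the input lines)
for (const line of ["abcd", "xyz"]) broker.publish(NBTopicA, line);

console.log(y1Out, y2Out);
```

Adding a third consumer, or a notification topic such as the one on the slide, only requires another `subscribe` call; the publisher does not change.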
HPSearch Administrative Interface to NB
One can build administrative policies and procedures by flowing administrative and monitoring data to appropriate scripting engines
• performanceTOPIC = SessionTOPIC + '/performance';
• nbws = NBDiscover("aggregateperformancews");
• nbws.performancedata.publish = performanceTOPIC;
• scriptasaWS.port[2].subscribe = performanceTOPIC;
• Niftyperformanceanalyser(scriptasaWS.port[2]);
• …….
This example pipes performance data from NaradaBrokering and spawns some analysis
NB provides, for each link (broker to broker, broker to end-point), available bandwidth, used bandwidth, latency etc.
Other NB Features to be added to HPSearch
Full details of available brokers and stable storage
Pending queue sizes
Message statistics (size, number per second, time since last message) at brokers and end-points
Current stream sequence number at different parts of the pipeline from source to destination
Heartbeat information
Active topics and list of publishers and subscribers (subject to security restrictions)
Fault tolerance statistics, including those subscribed end-points which are “down”
[Figure: Mean transit delay for message samples in NaradaBrokering: different communication hops (hop-2, hop-3, hop-5, hop-7). x-axis: Message Payload Size (Bytes), 100 to 1000; y-axis: Transit Delay (Milliseconds), 0 to 9. Test setup: Pentium-3, 1 GHz, 256 MB RAM, 100 Mbps LAN, JRE 1.3, Linux]
[Figure: Standard deviation for message samples in NaradaBrokering: different communication hops (hop-2, hop-3, hop-5, hop-7), internal machines. x-axis: Message Payload Size (Bytes), 1000 to 5000; y-axis: Standard Deviation (Milliseconds), 0 to 0.8]
[Figure: Average delays per packet for 50 video clients. NaradaBrokering avg = 2.23 ms, JMF avg = 3.08 ms. Series: NaradaBrokering-RTP, JMF-RTP. x-axis: Packet Number, 0 to 2000; y-axis: Delay (Milliseconds), 0 to 60]
[Figure: Average jitter (std. dev.) for 50 video clients. NaradaBrokering avg = 0.95 ms, JMF avg = 1.10 ms. x-axis: Packet Number, 0 to 2000; y-axis: Jitter (Milliseconds), 0 to 8]