Managing Grid Messaging Middleware

Managing Grid and Web Services and their exchanged messages
OGF19 Workshop on Reliability and Robustness
Friday Center, Chapel Hill, NC
January 31, 2007

Authors: Harshawardhan Gadgil (his PhD topic), Geoffrey Fox, Shrideep Pallickara, Marlon Pierce
Community Grids Lab, Indiana University
Presented by Geoffrey Fox
Management Problem I
Characteristics of today's (Grid) applications:
Increasing complexity
Components widely dispersed and disparate in nature and access
Span different administrative domains, under differing network / security policies
Limited access to resources due to the presence of firewalls, NATs etc. (major focus in prototype)
Dynamic: components (nodes, network, processes) may fail
Services must meet general QoS and life-cycle features
(User-defined) application-specific criteria
Need to manage services to provide these capabilities:
Dynamic monitoring and recovery
Static configuration and composition of systems from subsystems
Management Problem II
Management operations* include:
Configuration and lifecycle operations (CREATE, DELETE)
Handle RUNTIME events
Monitor status and performance
Maintain system state (according to user-defined criteria)
Protocols like WS-Management / WS-DM define inter-service negotiation and how to transfer metadata
We are designing/prototyping a system that will manage a general worldwide collection of services and their network links
Need to address Fault Tolerance, Scalability, Performance, Interoperability, Generality, Usability
We are starting with our messaging infrastructure because we need it to be robust in the Grids that use it (sensor and material science), we use it in the management system itself, and it has critical network requirements
* From WS Distributed Management: http://devresource.hp.com/drc/slide_presentations/wsdm/index.jsp
Core Features of Management Architecture
Remote management: allow management irrespective of the location of the resource (as long as that resource is reachable by some means)
Traverse firewalls and NATs: firewalls complicate management by disabling access to some transports and to internal resources
Utilize tunneling capabilities and multi-protocol support of the messaging infrastructure
Extensible: management capabilities evolve with time; we use a service-oriented architecture to provide extensibility and interoperability
Scalable: the management architecture should scale as the number of managees increases
Fault-tolerant: management itself must be fault-tolerant; failure of transports or management components should not cause the management architecture to fail
Management Architecture
Built in terms of a Hierarchical Bootstrap System, made robust by replication
Managees in different domains can be managed with separate policies for each domain
Periodically spawns a System Health Check that ensures components are up and running
Registry for metadata (distributed database), made robust by standard database techniques and by our system itself for the service interfaces
Stores managee-specific information (user-defined configuration / policies, external state required to properly manage a managee)
Generates a unique ID per instance of registered component
Architecture: Scalability via Hierarchical Distribution
Replicated ROOT, with domains such as US and EUROPE and sub-domains such as FSU, CGL and CARDIFF (e.g. /ROOT/EUROPE/CARDIFF)
Active bootstrap nodes
Responsible for maintaining a working set of management components in the domain
Always the leaf nodes in the hierarchy
Passive bootstrap nodes
Only ensure that all child bootstrap nodes are always up and running
Spawn them if not present and ensure they stay up and running
Messaging nodes form a scalable messaging substrate
Message delivery between managers and managees
Provides transport-protocol-independent messaging between distributed entities
Can provide secure delivery of messages
Managers
Active stateless agents that manage managees
Both general and managee-specific management threads perform the actual management
Multi-threaded to improve scalability with many managees
Managees
What you are managing (the managee / service to manage); our system makes it robust
There is NO assumption that the managed system uses messaging nodes
Wrapped by a Service Adapter which provides a Web Service interface
It is assumed that ONLY modest state needs to be stored/restored externally; the managee could front end ...
Architecture: Conceptual Idea (Internals)
[Diagram: several Resources to Manage (Managees), each wrapped by a Service Adapter, connect to a Messaging Node for sending and receiving messages. Managers (speaking WS-Management) also connect to the Messaging Node and read/write managee-specific external state from/to the Registry; manager processes periodically check available managees to manage. The user writes the system configuration to the Registry. The Bootstrap Service periodically spawns a System Health Check, which always ensures the managers and the Messaging Node are up and running.]
Architecture: User Component
Managee characteristics are determined by the user
Events generated by the managees are handled by the manager
Event processing is determined via WS-Policy constructs
E.g. wait for the user's decision on handling specific conditions
Auto-instantiate a failed service, with the service responsible for doing this consistently even when the failed service has not actually failed but is just unreachable
Administrators can set up services (managees) by defining characteristics; writing this information to the registry can be used to ...
Issues in the Distributed System: Consistency
Examples of inconsistent behavior:
Two or more managers managing the same managee
Old messages / requests arriving after new requests
Multiple copies of a managee existing at the same time / orphan managees, leading to inconsistent system state
We use a registry-generated, monotonically increasing Unique Instance ID (IID) to distinguish between new and old instances of managers, managees and messages
Requests from manager thread A are considered obsolete IF IID(A) < IID(B)
The Service Adapter stores the last known MessageID (IID:seqNo), allowing it to differentiate between duplicate AND obsolete messages
Periodic renewal with the registry: IF IID(manageeInstance_1) < IID(manageeInstance_2) THEN manageeInstance_1 is deemed OBSOLETE, SO EXECUTE policy (e.g. instruct manageeInstance_1 to silently shut down)
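The IID and MessageID rules above can be sketched in a few lines of Java (the prototype's implementation language). The class and method names here are illustrative assumptions, not the actual prototype API:

```java
// Sketch of the staleness/duplicate checks a Service Adapter could perform,
// given the last known MessageID (IID:seqNo). Names are hypothetical.
public class MessageIdCheck {
    private final long lastIid;    // IID part of the last known MessageID
    private final long lastSeqNo;  // seqNo part of the last known MessageID

    public MessageIdCheck(long lastIid, long lastSeqNo) {
        this.lastIid = lastIid;
        this.lastSeqNo = lastSeqNo;
    }

    // A request from an older instance (lower IID) is obsolete.
    public boolean isObsolete(long senderIid) {
        return senderIid < lastIid;
    }

    // Same instance but an already-seen sequence number is a duplicate.
    public boolean isDuplicate(long senderIid, long seqNo) {
        return senderIid == lastIid && seqNo <= lastSeqNo;
    }

    // Accept only messages that are neither obsolete nor duplicates.
    public boolean accept(long senderIid, long seqNo) {
        return !isObsolete(senderIid) && !isDuplicate(senderIid, seqNo);
    }
}
```

Because the registry issues IIDs monotonically, a lower IID always identifies an older instance, so the comparison alone suffices to detect obsolete requests.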
Implemented WS Specifications
WS-Management (June 2005) parts: WS-Transfer [Sep 2004], WS-Enumeration [Sep 2004] and WS-Eventing (could use WS-DM)
WS-Eventing (leveraged from the WS-Eventing capability implemented in OMII)
WS-Addressing [Aug 2004] and SOAP v1.2 used (needed for WS-Management)
Used XmlBeans 2.0.0 for manipulating XML in a custom container
Currently implemented using JDK 1.4.2 but will switch to JDK 1.5
Released on http://www.naradabrokering.org in February 2007
Performance Evaluation: Results
Extreme case with many catastrophic failures
Response time increases with an increasing number of concurrent requests
Response time is MANAGEE-DEPENDENT and the times shown are typical
MAY involve 1 or more registry accesses, which increase overall response time
Increases rapidly as the number of managees exceeds 150 to 200
Performance Evaluation: How much infrastructure is required to manage N managees?
N = number of managees to manage
M = max. number of entities connected to a single messaging node
D = max. number of managees managed by a single manager process
R = min. number of registry service instances required to provide fault-tolerance
Assume every leaf domain has 1 messaging node; hence we have N/M leaf domains. Further, the number of managers required per leaf domain is M/D.
Total components in the lowest level
= (R registries + 1 Bootstrap Service + 1 Messaging Node + M/D Managers) * (N/M leaf domains)
= (2 + R + M/D) * (N/M)
Thus the percentage of additional infrastructure is
= [(2 + R)/M + 1/D] * 100%
Performance Evaluation: How much infrastructure is required to manage N managees?
Additional infrastructure = [(2 + R)/M + 1/D] * 100%
A few cases:
Typical values of D and M are 200 and 800; assuming R = 4, Additional Infrastructure = [(2+4)/800 + 1/200] * 100% ≈ 1.2%
Shared registry (one registry interface per domain, R = 1): Additional Infrastructure = [(2+1)/800 + 1/200] * 100% ≈ 0.87%
If NO messaging node is used (assume D = 200): Additional Infrastructure = [(R registries + 1 bootstrap node + N/D managers)/N] * 100% = [(1+R)/N + 1/D] * 100% ≈ 100/D % (for N >> R) = 0.5%
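The cases above follow directly from the formula; a minimal Java check (class name is illustrative) reproduces the slide's numbers:

```java
// Additional-infrastructure cost from the slide:
// overhead% = [(2 + R)/M + 1/D] * 100
public class InfrastructureCost {
    public static double overheadPercent(double R, double M, double D) {
        return ((2.0 + R) / M + 1.0 / D) * 100.0;
    }

    public static void main(String[] args) {
        // Typical values M = 800, D = 200, R = 4: about 1.2%
        System.out.println(overheadPercent(4, 800, 200)); // 1.25
        // Shared registry, R = 1: about 0.87%
        System.out.println(overheadPercent(1, 800, 200)); // 0.875
    }
}
```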
Performance Evaluation: How much infrastructure is required to manage N managees?
How cost varies with maximum managees per manager (D), for R = 1, M = 800:
D (max. managees per manager):      1      3     5     7    10   25   50  100  150  200
Additional infrastructure (%):  100.4   33.7  20.4  14.7  10.4  4.4  2.4  1.4  1.0  0.9
Performance Evaluation: XML Processing Overhead
XML processing overhead is measured as the total marshalling and un-marshalling time required
For broker management interactions, typical processing time (including validation against schema) is about 5 ms
Broker management operations are invoked only during initialization and recovery from failure
Reading broker state using a GET operation involves 5 ms overhead and is invoked periodically (e.g. every 1 minute, depending on policy)
Further, for most operations that change broker state, the actual operation processing time >> 5 ms, and hence the XML overhead of 5 ms is acceptable
Prototype: Managing Grid Messaging Middleware
We illustrate the architecture by managing the distributed messaging middleware: NaradaBrokering
This example is motivated by the presence of a large number of dynamic peers (brokers) that need configuration and deployment in specific topologies
Runtime metrics provide dynamic hints on improving routing, which leads to redeployment of the messaging system (possibly) using a different configuration and topology
Can use (dynamically) optimized protocols (UDP v TCP v Parallel TCP) and go through firewalls
Broker Service Adapter
Note NB illustrates an electronic entity that didn't start off with an administrative Service interface
So we add a wrapper over the basic NB BrokerNode object that provides a WS-Management front-end
Allows CREATION, CONFIGURATION and MODIFICATION of brokers and broker topologies
Messaging (NaradaBrokering) Architecture
[Diagram: a messaging substrate connecting diverse endpoints, including media devices, a slow client behind a modem, a laptop, a PDA, computers, a file server, a compute server behind a firewall, a media server, and audio/video conferencing clients and users.]
Typical Use of Grid Messaging in NASA (June 19, 2006, Community Grids Lab, Bloomington IN, CLADE 2006)
[Diagram: a Datamining Grid, a Sensor Grid implemented using NB, and a GIS Grid connected through NB.]
NaradaBrokering Management Needs
The NaradaBrokering distributed messaging system consists of peers (brokers) that collectively form a scalable messaging substrate
Optimizations and configurations include:
Where should brokers be placed and how should they be connected? E.g. RING, BUS, TREE, HYPERCUBE etc.; each TOPOLOGY has varying resource utilization, routing, cost and fault-tolerance characteristics
Static topologies, or topologies created using static rules, may be inefficient in some cases
E.g. in CAN and Chord a new incoming peer randomly joins nodes in the network; network distances are not taken into account, so some lookup queries may span the entire diameter of the network
Runtime metrics provide dynamic hints on improving routing, which leads to redeployment of the messaging system (possibly) using a different configuration and topology
Can use (dynamically) optimized protocols (UDP v TCP v Parallel TCP) and go through firewalls, but there is no good way to make these choices dynamically
These actions are collectively termed Managing the Messaging
Prototype: Costs (Individual Managees are NaradaBrokering Brokers)
Time in msec (average values):
Operation            Un-Initialized (first time)   Initialized (later modifications)
Set Configuration    160 ± 2                       33 ± 3
Create Broker        778 ± 5                       57 ± 2
Create Link          610 ± 6                       27 ± 2
Delete Link          104 ± 2                       20 ± 1
Delete Broker        142 ± 1                       129 ± 2
Recovery: Typical Time
Recovery time = T(read state from registry) + T(bring managee up to speed)
= T(read state) + T[SetConfig + CreateBroker + CreateLink(s)]
Assuming a 5 ms read time from the registry per managee object:
Ring (N nodes, N links with 1 outgoing link per node; 2 managee objects per node):
10 + (778.0 + 610.1 + 160.5) ≈ 1548 msec
Cluster (N nodes, links per broker vary from 0 to 3; 1 to 4 managee objects per node):
Max: 20 + 778.0 + 610.1 + 160.5 + 2 * 26.67 ≈ 1622 msec
Min: 5 + 778.0 + 610.1 ≈ 1393 msec
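The estimate above can be written as a small Java helper. The per-operation constants are the average first-time costs from the costs slide, the 5 ms registry read is the slide's stated assumption, and the class name is illustrative; the extra per-link wait term in the Max cluster case is not modeled here.

```java
// Recovery time = T(read state) + T[SetConfig + CreateBroker + CreateLink(s)],
// using the average first-time operation costs (msec) reported on the slides.
public class RecoveryEstimate {
    static final double READ_PER_OBJECT = 5.0;   // assumed registry read, msec
    static final double SET_CONFIG      = 160.5;
    static final double CREATE_BROKER   = 778.0;
    static final double CREATE_LINK     = 610.1;

    // manageeObjects: configuration entries read back from the registry;
    // withSetConfig: whether a Set Configuration call is needed on restore.
    public static double estimate(int manageeObjects, int links,
                                  boolean withSetConfig) {
        double t = manageeObjects * READ_PER_OBJECT
                + CREATE_BROKER + links * CREATE_LINK;
        return withSetConfig ? t + SET_CONFIG : t;
    }

    public static void main(String[] args) {
        // Minimal cluster case: 1 object, 1 link, no SetConfig
        System.out.println(estimate(1, 1, false)); // 1393.1 msec
        // Ring case: 2 objects, 1 link, with SetConfig
        System.out.println(estimate(2, 1, true));  // 1558.6 msec
    }
}
```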
Prototype: Observed Recovery Cost per Managee
Operation                        Average (msec)
*Spawn Process                   2362 ± 18
Read State                       8 ± 1
Restore (1 Broker + 1 Link)      1421 ± 9
Restore (1 Broker + 3 Links)     1616 ± 8
The time for Create Broker depends on the number and type of transports opened by the broker
E.g. an SSL transport requires negotiation of keys and would require more time than simply establishing a TCP connection
If brokers connect to other brokers, the destination broker MUST be ready to accept connections, else topology recovery takes more time
Management Console: Creating Nodes and Setting Properties

Management Console: Creating Links

Management Console: Policies

Management Console: Creating Topologies
Conclusion
We have presented a scalable, fault-tolerant management framework that
Adds acceptable cost in terms of the extra resources required (about 1%)
Provides a general framework for management of distributed entities
Is compatible with existing Web Service specifications
We have applied our framework to manage managees that are loosely coupled and have modest external state (important to improve scalability of the management process)
An outside effort is developing a Grid Builder, which combines BPEL and this management system to manage the initial specification