Managing Grid Messaging Middleware (Jan 31, 2007)

Transcript
Slide 1/29

Managing Grid and Web Services and their exchanged messages

OGF19 Workshop on Reliability and Robustness
Friday Center, Chapel Hill, NC, January 31, 2007

Authors: Harshawardhan Gadgil (his PhD topic), Geoffrey Fox, Shrideep Pallickara, Marlon Pierce
Community Grids Lab, Indiana University

Presented by Geoffrey Fox

Slide 2/29

Management Problem I: Characteristics of today's (Grid) applications

- Increasing complexity
- Components widely dispersed and disparate in nature and access
  - Span different administrative domains
  - Operate under differing network / security policies
  - Limited access to resources due to the presence of firewalls, NATs etc. (a major focus in the prototype)
- Dynamic: components (nodes, network, processes) may fail
- Services must meet general QoS and life-cycle features and (user-defined) application-specific criteria
- Need to manage services to provide these capabilities
  - Dynamic monitoring and recovery
  - Static configuration and composition of systems from subsystems

Slide 3/29

Management Problem II: Management operations* include

- Configuration and lifecycle operations (CREATE, DELETE)
- Handling RUNTIME events
- Monitoring status and performance
- Maintaining system state (according to user-defined criteria)

Protocols like WS-Management / WS-DM define inter-service negotiation and how to transfer metadata.

We are designing/prototyping a system that will manage a general worldwide collection of services and their network links. It needs to address fault tolerance, scalability, performance, interoperability, generality, and usability.

We are starting with our messaging infrastructure because:
- we need it to be robust in the Grids we are using it in (sensor and material science),
- we are using it in the management system itself, and
- it has critical network requirements.

*From WS Distributed Management: http://devresource.hp.com/drc/slide_presentations/wsdm/index.jsp
Slide 4/29

Core Features of Management Architecture

- Remote management: allow management irrespective of the location of the resource (as long as that resource is reachable by some means)
- Traverse firewalls and NATs
  - Firewalls complicate management by disabling access to some transports and to internal resources
  - Utilize the tunneling capabilities and multi-protocol support of the messaging infrastructure
- Extensible: management capabilities evolve with time; we use a service-oriented architecture to provide extensibility and interoperability
- Scalable: the management architecture should scale as the number of managees increases
- Fault-tolerant: management itself must be fault-tolerant; failure of transports OR management components should not cause the management architecture to fail

Slide 5/29

The Management Architecture is built in terms of:

- A Hierarchical Bootstrap System, itself made robust by replication
  - Managees in different domains can be managed with separate policies for each domain
  - Periodically spawns a System Health Check that ensures components are up and running
- A Registry for metadata (a distributed database), made robust by standard database techniques and by our system itself for its service interfaces
  - Stores managee-specific information (user-defined configuration / policies, and the external state required to properly manage a managee)
  - Generates a unique ID per instance of a registered component
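Conceptually, the per-instance ID is just a monotonic counter. A minimal Java sketch (class and method names are ours, not the prototype's API; in the real system the counter would live in the replicated database so IDs stay monotonic across registry replicas):

```java
public class InstanceRegistry {
    private long counter = 0;

    // Issue the next instance ID (IID); synchronized so IDs are
    // strictly increasing even under concurrent registrations.
    public synchronized long register(String componentName) {
        return ++counter;
    }
}
```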

Slide 6/29

Architecture, Scalability: Hierarchical distribution

[Figure: a hierarchy of bootstrap nodes with a replicated ROOT; ROOT has children US and EUROPE, US has children CGL and FSU, and EUROPE has child CARDIFF (addressed as /ROOT/EUROPE/CARDIFF).]

- Active bootstrap nodes: responsible for maintaining a working set of management components in the domain; always the leaf nodes in the hierarchy
- Passive bootstrap nodes: only ensure that all child bootstrap nodes are always up and running; spawn them if not present and keep them running (a minimal loop for this is sketched below)
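A minimal sketch of the passive-node check just described, assuming a hypothetical ChildBootstrap interface (the prototype's actual classes are not shown in the slides):

```java
import java.util.List;

public class PassiveBootstrapNode {

    /** Minimal, illustrative view of a child bootstrap node. */
    public interface ChildBootstrap {
        boolean isRunning();
        void spawn();
    }

    private final List<ChildBootstrap> children;

    public PassiveBootstrapNode(List<ChildBootstrap> children) {
        this.children = children;
    }

    /** Periodically ensure every child bootstrap node is up,
        respawning any that have died. */
    public void healthCheckLoop(long periodMillis) throws InterruptedException {
        while (true) {
            for (ChildBootstrap child : children) {
                if (!child.isRunning()) {
                    child.spawn();
                }
            }
            Thread.sleep(periodMillis);
        }
    }
}
```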

Slide 7/29

- Messaging Nodes form a scalable messaging substrate
  - Message delivery between managers and managees
  - Provides transport-protocol-independent messaging between distributed entities
  - Can provide secure delivery of messages
- Managers: active stateless agents that manage managees
  - Both general and managee-specific management threads perform the actual management
  - Multi-threaded to improve scalability with many managees (see the sketch after this list)
- Managees: what you are managing (the managee / service to manage); our system makes them robust
  - There is NO assumption that the managed system uses Messaging Nodes
  - Wrapped by a Service Adapter which provides a Web Service interface
  - Assumed that ONLY modest state needs to be stored/restored externally
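The manager loop described above might look roughly like the following sketch; the Registry interface and method names here are illustrative assumptions, not the prototype's API:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Manager {

    /** Illustrative registry view: which managees still need a manager. */
    public interface Registry {
        List<String> findUnmanagedManagees();
    }

    private final Registry registry;
    private final ExecutorService pool = Executors.newCachedThreadPool();

    public Manager(Registry registry) {
        this.registry = registry;
    }

    /** Periodically claim unmanaged managees and run one management
        thread per managee, as the slide describes. */
    public void run(long pollMillis) throws InterruptedException {
        while (true) {
            for (final String manageeId : registry.findUnmanagedManagees()) {
                pool.submit(new Runnable() {
                    public void run() {
                        manage(manageeId);
                    }
                });
            }
            Thread.sleep(pollMillis);
        }
    }

    private void manage(String manageeId) {
        // Restore managee-specific external state from the registry and
        // monitor/repair the managee via its Service Adapter (omitted).
    }
}
```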

Slide 8/29

Architecture: Conceptual Idea (Internals)

[Figure: several Resources to Manage (Managees), each wrapped by a Service Adapter, connect to a Messaging Node for sending and receiving messages. Multiple Managers also connect to the Messaging Node and speak WS-Management to the Service Adapters. Manager processes periodically check the available managees to manage, and read/write managee-specific external state from/to the Registry; the user writes the system configuration to the Registry. A Bootstrap Service periodically spawns a System Health Check, which ensures the Managers and the Messaging Node are always up and running.]
Slide 9/29

Architecture: User Component

- Managee characteristics are determined by the user
- Events generated by the managees are handled by the manager
- Event processing is determined via WS-Policy constructs, e.g.:
  - Wait for the user's decision on handling specific conditions
  - Auto-instantiate a failed service, with the service responsible for doing this consistently even when the "failed" service has not actually failed but is just unreachable
- Administrators can set up services (managees) by defining their characteristics and writing that information to the registry

Slide 10/29

Issues in the Distributed System: Consistency

Examples of inconsistent behavior:
- Two or more managers managing the same managee
- Old messages / requests arriving after new requests
- Multiple copies of a managee existing at the same time / orphan managees, leading to inconsistent system state

We use a registry-generated, monotonically increasing unique Instance ID (IID) to distinguish between new and old instances of managers, managees, and messages:
- Requests from manager thread A are considered obsolete IF IID(A) < IID(B)
- The Service Adapter stores the last known MessageID (IID:seqNo), allowing it to differentiate between duplicate AND obsolete messages (sketched below)
- Periodic renewal with the registry: IF IID(manageeInstance_1) < IID(manageeInstance_2) THEN manageeInstance_1 is deemed OBSOLETE, SO EXECUTE the policy (e.g. instruct manageeInstance_1 to silently shut down)
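A small sketch of the duplicate/obsolete test implied by the IID:seqNo scheme (names are illustrative; the slide does not give the actual implementation):

```java
public class MessageId implements Comparable<MessageId> {
    final long iid;    // registry-issued, monotonically increasing instance ID
    final long seqNo;  // per-instance sequence number

    MessageId(long iid, long seqNo) { this.iid = iid; this.seqNo = seqNo; }

    // Order first by instance, then by sequence number within an instance.
    public int compareTo(MessageId other) {
        if (iid != other.iid) return Long.compare(iid, other.iid);
        return Long.compare(seqNo, other.seqNo);
    }
}

class ServiceAdapterState {
    private MessageId lastSeen = new MessageId(0, 0);

    // Accept a request only if it is strictly newer than the last one:
    // equal => duplicate, smaller => obsolete (e.g. from an old manager).
    synchronized boolean accept(MessageId incoming) {
        if (incoming.compareTo(lastSeen) <= 0) return false;
        lastSeen = incoming;
        return true;
    }
}
```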

Slide 11/29

Slide 12/29

Implemented WS specifications:

- WS-Management (June 2005), in parts: WS-Transfer (Sep 2004), WS-Enumeration (Sep 2004) and WS-Eventing (could instead use WS-DM)
- WS-Eventing (leveraged from the WS-Eventing capability implemented in OMII)
- WS-Addressing (Aug 2004) and SOAP 1.2 (needed for WS-Management)

Used XmlBeans 2.0.0 for manipulating XML in a custom container.

Currently implemented using JDK 1.4.2, but will switch to JDK 1.5.

Released on http://www.naradabrokering.org in February 2007.
Slide 13/29

Performance Evaluation: Results

[Figure: response time versus number of concurrent requests, including an extreme case with many catastrophic failures.]

- Response time increases with an increasing number of concurrent requests
- Response time is MANAGEE-DEPENDENT, and the times shown are typical
- A request MAY involve one or more registry accesses, which will increase the overall response time
- Response time increases rapidly as the number of managees exceeds about 150 to 200

Slide 14/29

Performance Evaluation: How much infrastructure is required to manage N managees?

- N = number of managees to manage
- M = max. number of entities connected to a single messaging node
- D = max. number of managees managed by a single manager process
- R = min. number of registry service instances required to provide fault tolerance

Assume every leaf domain has 1 messaging node; hence we have N/M leaf domains. Further, the number of managers required per leaf domain is M/D. The total number of components in the lowest level is therefore

= (R registry + 1 Bootstrap Service + 1 Messaging Node + M/D managers) * (N/M such leaf domains)
= (2 + R + M/D) * (N/M)

Thus the percentage of additional infrastructure is
= [(2 + R)/M + 1/D] * 100 %
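For completeness, the step from the component count to the quoted percentage is just division by the N managees being managed:

```latex
\frac{\left(2 + R + \frac{M}{D}\right)\cdot\frac{N}{M}}{N}\times 100\%
  \;=\; \left[\frac{2+R}{M} + \frac{1}{D}\right]\times 100\%
```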

Slide 15/29

Performance Evaluation. Research question: how much infrastructure is required to manage N managees?

Additional infrastructure = [(2 + R)/M + 1/D] * 100 %

A few cases (a small arithmetic check follows this list):
- Typical values of D and M are 200 and 800; assuming R = 4, then
  Additional infrastructure = [(2 + 4)/800 + 1/200] * 100 % ≈ 1.2 %
- Shared registry (one registry interface per domain, so R = 1), then
  Additional infrastructure = [(2 + 1)/800 + 1/200] * 100 % ≈ 0.87 %
- If NO messaging node is used (assume D = 200), then
  Additional infrastructure = [(R registry + 1 bootstrap node + N/D managers)/N] * 100 %
  = [(1 + R)/N + 1/D] * 100 % ≈ 100/D % (for N >> R) ≈ 0.5 %
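These cases are easy to reproduce; a trivial check of the arithmetic (the class and method names are ours, not part of the system):

```java
public class OverheadCheck {
    // Percentage of additional infrastructure: [(2+R)/M + 1/D] * 100 %
    static double overheadPercent(int r, int m, int d) {
        return ((2.0 + r) / m + 1.0 / d) * 100.0;
    }

    public static void main(String[] args) {
        System.out.println(overheadPercent(4, 800, 200)); // 1.25, slide quotes ~1.2 %
        System.out.println(overheadPercent(1, 800, 200)); // 0.875, slide quotes ~0.87 %
    }
}
```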

Slide 16/29

Performance Evaluation. Research question: how much infrastructure is required to manage N managees?

How the cost varies with the maximum number of managees per manager (D), for R = 1, M = 800:

D (max. managees per manager):      1      3     5     7    10    25   50   100  150  200
Additional infrastructure (%):  100.4   33.7  20.4  14.7  10.4   4.4  2.4   1.4  1.0  0.9

Slide 17/29

Performance Evaluation: XML Processing Overhead

- XML processing overhead is measured as the total marshalling and un-marshalling time required
- For broker management interactions, the typical processing time (including validation against the schema) is about 5 ms
- Broker management operations are invoked only during initialization and recovery from failure
- Reading broker state using a GET operation involves about 5 ms of overhead and is invoked periodically (e.g. every 1 minute, depending on policy)
- Further, for most operations that change broker state, the actual operation processing time is >> 5 ms, and hence the XML overhead of about 5 ms is acceptable

Slide 18/29

Prototype: Managing Grid Messaging Middleware

We illustrate the architecture by managing the distributed messaging middleware NaradaBrokering:
- This example is motivated by the presence of a large number of dynamic peers (brokers) that need configuration and deployment in specific topologies
- Runtime metrics provide dynamic hints on improving routing, which leads to redeployment of the messaging system, possibly using a different configuration and topology
- Can use (dynamically) optimized protocols (UDP vs TCP vs Parallel TCP) and go through firewalls

Broker Service Adapter:
- Note that NB illustrates an electronic entity that didn't start off with an administrative Service interface
- So we add a wrapper over the basic NB BrokerNode object that provides a WS-Management front-end (a sketch follows)
- Allows CREATION, CONFIGURATION and MODIFICATION of brokers and broker topologies
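A hedged sketch of such a wrapper; the BrokerNode methods shown here are assumptions standing in for NaradaBrokering's real API, and plain Java methods stand in for the WS-Management bindings:

```java
import java.util.Properties;

public class BrokerServiceAdapter {

    /** Assumed view of the wrapped, pre-existing broker object. */
    public interface BrokerNode {
        void configure(Properties config);
        void connectTo(String address);
        void disconnectFrom(String address);
    }

    private final BrokerNode broker;

    public BrokerServiceAdapter(BrokerNode broker) {
        this.broker = broker;
    }

    // Each method would be bound to a WS-Management action in the
    // real front-end; the names mirror the operations on slide 22.
    public void setConfiguration(Properties config) { broker.configure(config); }
    public void createLink(String targetBrokerAddress) { broker.connectTo(targetBrokerAddress); }
    public void deleteLink(String targetBrokerAddress) { broker.disconnectFrom(targetBrokerAddress); }
}
```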

Slide 19/29

Messaging (NaradaBrokering) Architecture

[Figure: the messaging substrate connecting heterogeneous endpoints: media devices, laptops, PDAs, computers, file servers, compute servers behind firewalls, media servers, audio/video conferencing clients, users, and slow clients behind modems.]

Slide 20/29

Typical use of Grid Messaging in NASA

[Figure (from CLADE 2006, June 19, 2006, Community Grids Lab, Bloomington IN): a Sensor Grid implemented using NB, connected by NB to a Datamining Grid and a GIS Grid.]

Slide 21/29

NaradaBrokering Management Needs

The NaradaBrokering distributed messaging system consists of peers (brokers) that collectively form a scalable messaging substrate. Optimizations and configurations include:

- Where should brokers be placed, and how should they be connected? E.g. RING, BUS, TREE, HYPERCUBE etc.; each TOPOLOGY has varying degrees of resource utilization, routing, cost, and fault-tolerance characteristics
- Static topologies, or topologies created using static rules, may be inefficient in some cases
  - E.g., in CAN and Chord a new incoming peer joins nodes in the network randomly; network distances are not taken into account, and hence some lookup queries may span the entire diameter of the network
- Runtime metrics provide dynamic hints on improving routing, which leads to redeployment of the messaging system, possibly using a different configuration and topology
- Can use (dynamically) optimized protocols (UDP vs TCP vs Parallel TCP) and go through firewalls, but there is no good way to make these choices dynamically

These actions are collectively termed Managing the Messaging Middleware.

Slide 22/29

Prototype: Costs (individual managees are NaradaBrokering brokers)

Time (msec), average values:

Operation          | Un-Initialized (first time) | Initialized (later modifications)
Set Configuration  | 778 ± 5                     | 33 ± 3
Create Broker      | 610 ± 6                     | 57 ± 2
Create Link        | 160 ± 2                     | 27 ± 2
Delete Link        | 104 ± 2                     | 20 ± 1
Delete Broker      | 142 ± 1                     | 129 ± 2

Slide 23/29

Recovery: Typical Time

Recovery time = T(read state from registry) + T(bring managee up to speed)
              = T(read state) + T[SetConfig + CreateBroker + CreateLink(s)]

Assuming a 5 ms read time from the registry per managee object:

- Ring topology (N nodes, N links: 1 outgoing link per node; 2 managee objects per node):
  10 + (778.0 + 610.1 + 160.5) ≈ 1548 msec
- Cluster topology (N nodes, links per broker vary from 0 to 3; 1 to 4 managee objects per node):
  Min: 5 + 778.0 + 610.1 ≈ 1393 msec
  Max: 20 + 778.0 + 610.1 + 160.5 + 2 × 26.67 ≈ 1622 msec
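Written out, and assuming (as the slide's figures suggest) 5 ms per managee-object registry read, the un-initialized CreateLink cost (160.5 ms) for the first link, and the initialized cost (about 26.67 ms) for each additional link, the cluster bounds are:

```latex
\begin{aligned}
T_{\min} &= \underbrace{1 \times 5}_{\text{read state}} + \underbrace{778.0 + 610.1}_{\text{SetConfig} + \text{CreateBroker}} &&\approx 1393\ \text{ms}\\
T_{\max} &= \underbrace{4 \times 5}_{\text{read state}} + 778.0 + 610.1 + \underbrace{160.5 + 2 \times 26.67}_{\text{3 links}} &&\approx 1622\ \text{ms}
\end{aligned}
```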

Slide 24/29

Prototype: Observed Recovery Cost per Managee

Operation                     | Average (msec)
*Spawn process                | 2362 ± 18
Read state                    | 8 ± 1
Restore (1 broker + 1 link)   | 1421 ± 9
Restore (1 broker + 3 links)  | 1616 ± 82

- The time for Create Broker depends on the number and type of transports opened by the broker; e.g. an SSL transport requires negotiation of keys and would take more time than simply establishing a TCP connection
- If brokers connect to other brokers, the destination broker MUST be ready to accept connections, else topology recovery takes more time

Slide 25/29

Management Console: Creating Nodes and Setting Properties

Slide 26/29

Management Console: Creating Links

Slide 27/29

Management Console: Policies

Slide 28/29

Management Console: Creating Topologies

Slide 29/29

Conclusion

We have presented a scalable, fault-tolerant management framework that:
- Adds an acceptable cost in terms of the extra resources required (about 1%)
- Provides a general framework for the management of distributed entities
- Is compatible with existing Web Service specifications

We have applied our framework to manage managees that are loosely coupled and have modest external state (important for improving the scalability of the management process).

An outside effort is developing a Grid Builder which combines BPEL and this management system to manage the initial specification.