
The Journal of Systems and Software 80 (2007) 972–983

Adaptive network QoS in layer-3/layer-2 networks as a middleware service for mission-critical applications

Balakrishnan Dasarathy *, Shrirang Gadgil, Ravi Vaidyanathan, Arnie Neidhardt, Brian Coan, Kirthika Parmeswaran, Allen McIntosh, Frederick Porter

Applied Research, Telcordia Technologies, One Telcordia Drive, Piscataway, NJ 08854, United States

Available online 13 November 2006

Abstract

We present adaptive network Quality of Service (QoS) technology that provides delay bounds and capacity guarantees for traffic belonging to mission-critical tasks. Our technology uses a Bandwidth Broker to provide admission control and leverages the differentiated aggregated traffic treatment provided by today's high-end COTS layer-3/2 switches. The technology adapts to changes in network resources, workload and mission requirements, using two components that are a particular focus of this paper: Fault Monitor and Performance Monitor. Our technology is being developed and applied in a CORBA-based multi-layer resource management framework.

© 2006 Elsevier Inc. All rights reserved.

Keywords: Bandwidth brokering; Admission control; Layer-3/layer-2 networks; Real-time middleware

1. Introduction

A new generation of distributed real-time and embedded (DRE) middleware is needed to address the performance needs of mission-critical military applications. Current capabilities are largely limited to fixed static allocation of resources in support of predefined mission capabilities. A static allocation strategy limits the ability of a military application to adapt to conditions that vary from the original system design. Dynamic resource management systems can adapt to changes in mission requirements, workload distributions, and available resources, including resource reduction caused by fault conditions.

We focus on dynamic resource management for network resources. We are integrating and validating our adaptive network Quality of Service (QoS) solution as part of a Multi-Layer Resource Management (MLRM) framework (Lardieri et al., 2006) being created by the DARPA Adaptive and Reflective Middleware Systems (ARMS) program using CORBA middleware and component technology. The purpose of the MLRM architecture is to push middleware technologies beyond current commercial capabilities, especially in their ability to detect mission-impacting events and adapt in a timely manner. It is being applied to shipboard computing.

0164-1212/$ - see front matter © 2006 Elsevier Inc. All rights reserved. doi:10.1016/j.jss.2006.09.030

A preliminary version of this paper appeared in RTAS 2005. See Dasarathy et al. (2005). This work is supported by DARPA Contract NBCH-C-03-0132; approved for Public Release, Distribution Unlimited.

* Corresponding author. Tel.: +1 732 6992430; fax: +1 732 3367015. E-mail addresses: [email protected] (B. Dasarathy), [email protected] (B. Coan).

The goals of our adaptive network QoS solution are to guarantee a required minimal level of QoS for mission-critical traffic, provide a simple way to express QoS needs, detect and adapt to adverse events, and optimize overall network use. They are achieved with a Bandwidth Broker (BB) that provides admission control of application packet flows into various traffic classes. Admission control ensures that a flow of a given class has enough available capacity. For a delay-sensitive flow, having enough capacity means that the flow complies with an off-line computed occupancy bound on each link on its path. This compliance check ensures an upper bound on delay for this flow and previously admitted flows. The solution leverages widely available mechanisms that support layer-3 DiffServ (Differentiated Services) and layer-2 CoS (Class of Service) features in commercial routers and switches for enforcement.

This paper is organized as follows. Section 2 provides an overview of the MLRM middleware framework and explains how our QoS technology fits in. Section 3 describes the networks of interest. In Section 4, we describe the Bandwidth Broker and our overall QoS architecture. Section 5 describes the two QoS feedback mechanisms, Fault Monitor and Performance Monitor. In Section 6, we explain how the Bandwidth Broker performs policy-driven mode changes. Section 7 presents our experimental results. Section 8 compares and contrasts our work with the work reported in the literature. Section 9 is the summary.

2. MLRM middleware framework

The ARMS MLRM (Lardieri et al., 2006) is a framework for multi-layer resource management whereby complex resource allocation and scheduling can be handled in a divide-and-conquer manner. The goal of the MLRM framework at its highest layer is to maximize mission coverage. The framework supports the incorporation of different algorithms at different layers in a plug-and-play manner using the CORBA component and middleware technology, specifically the CIAO (Wang et al., 2003; http://www.cs.wustl.edu/~schmidt/CIAO-intro.html) implementation for C++ built over a C++ real-time ORB, TAO (http://www.cs.wustl.edu/~schmidt/TAO.html), and OpenCCM (http://openccm.objectweb.org/) with JacORB (http://www.jacorb.org/), a Java ORB, supporting development in Java.

One may use a utility function to formulate an optimization problem in a particular layer. For example, a utility function may penalize heavily if the timeliness of a mission-critical function cannot be met. The MLRM framework supports multiple QoS dimensions, such as survivability, timeliness (hard and soft), security, and efficient resource utilization. A key assumption in MLRM is that

[Figure: the three MLRM layers (Services Layer, Resource Pool Layer, Physical Resource Layer) with their components: Infrastructure Allocator, Operational String Manager Global, Pool Manager, Resource Allocator, OSM Pool Agent, Bandwidth Broker, Network Performance Monitor, Network Fault Monitor, Flow Provisioner, Node Provisioner, and the Global, Pool and Node Resource Status Services.]

Fig. 1. MLRM middleware framework.

the level of service in one QoS dimension can be coordinated with and/or traded off against the levels of service in other dimensions. The goal of the framework, furthermore, is to enable rapid deployment of applications, monitoring of QoS in different dimensions, and rapid re-deployment of applications if their QoS is being violated. MLRM is a federated resource management middleware service; it is a layer in the software architecture sandwiched between the network/operating system and the application layer. The MLRM resource management hierarchy, as shown in Fig. 1, comprises three layers: Services Layer, Resource Pool Layer and Physical Resource Layer. Each layer has allocation, scheduling, management or configuration functions as well as feedback functions.
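The utility-function idea mentioned earlier can be made concrete with a minimal sketch; the function shape, deadline semantics and penalty value below are illustrative assumptions, not the MLRM formulation:

```python
def mission_utility(latency_ms: float, deadline_ms: float,
                    critical: bool, penalty: float = 100.0) -> float:
    """Toy utility: full credit when a task meets its deadline,
    a steep penalty when a mission-critical task misses it."""
    if latency_ms <= deadline_ms:
        return 1.0
    return -penalty if critical else 0.0

# A resource allocator would pick the assignment maximizing total utility.
total = mission_utility(40, 50, critical=True) + mission_utility(80, 50, critical=False)
```

The heavy negative term makes any allocation that misses a mission-critical deadline dominated by one that does not, which is the behavior the text describes.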

• Services Layer: The Services Layer receives explicit resource management requests from applications along with command and policy inputs. Two key allocation, scheduling, management or configuration components at this layer are Infrastructure Allocator (IA) and Operational String Manager Global (OSM Global). The IA component provides coarse-grained global resource allocation. It assigns applications or operational strings to resource pools, taking into account their inter-pool communication needs using the Bandwidth Broker. An operational string, commonly known as a task in real-time computing, is a sequence of applications that interact to provide a service satisfying certain QoS requirements. A pool is a collection of resources often determined by factors such as physical proximity and type (e.g., processors in a data center). The OSM Global component coordinates deployment of operational strings across resource pools.

• (Resource) Pool Layer: The Pool Manager (PM) uses multiple Resource Allocators (RAs) to assign applications to computing nodes, taking into account their intra-pool communication needs. The OSM Pool Agent component is responsible for managing operational substrings (assigned to a pool), and for monitoring and controlling the applications within each operational substring.



• Physical Resources Layer: The Physical Resources Layer deals with the specific instances of resources in the system. Each Node Provisioner (NP) handles management and provisioning of an individual resource, specifically a host resource to run applications allocated to the host. It configures OS process priorities and scheduling classes of deployed components across a variety of operating systems (e.g., Linux and VxWorks).

• Resource Status Service (RSS): RSS operates across all these layers and provides continuous feedback on the status of non-network resources toward determining how well the QoS concerns are being met by applications and operational strings. RSS is classified into Global RSS, Pool RSS and Node RSS based on the layer at which it operates. At the simplest, Node, level, RSS consists of processor and process failure detectors. At higher layers, RSS consists of condition monitors and detectors to monitor for and detect QoS violations. A violation can be set to be triggered in the broader context of mission importance and policy directives.

The Bandwidth Broker (BB) component is the main entity responsible for managing network QoS. The BB component is shown in the Resource Pool Layer, as the network is collectively viewed as a resource pool of network elements. Its services are used by the RA and IA components for managing the network resources in a coordinated fashion within a pool (e.g., data center) and across pools (e.g., between data centers), respectively. The Bandwidth Broker invokes the Flow Provisioner to provision and configure routers/switches at the Physical Resource Layer. The Bandwidth Broker also interacts with two network feedback components at the pool layer, Network Fault Monitor and Network Performance Monitor. Finally, the BB component also provides feedback to the PM and OSM Global components on network-related QoS problems. This paper is about these network QoS components.

The forward control flow among the MLRM components is as follows. After the allocation decision, the IA component invokes the OSM Global, which then invokes the OSM Pool Agent. The OSM Pool Agent component coordinates allocation of resources within a pool of host resources through the PM component. The PM component then invokes the NP for provisioning node resources (e.g., setting operating system process priority). The IA and PM (through RA) components invoke the BB to determine the availability of, reserve, and provision network resources across pools and within a pool, respectively. The RA component also performs a schedulability analysis for timing compliance of critical operational strings at the pool level. Based on the timing analysis results, a pool-level reallocation may need to be coordinated by the PM component. The PM component, in turn, may have to defer to the IA component for a reallocation decision across pools. In Fig. 1, we show status propagation using two-way arrows, as it could be done using a synchronous request-reply or an event propagation paradigm. The coordination among the various MLRM components is described in more detail in Lardieri et al. (2006). That paper also illustrates the MLRM framework dynamism using load and mission change scenarios, and we refer the reader to it for details on an empirical evaluation of the MLRM architecture.
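The forward control flow above can be sketched as a chain of calls; the component and method names below are illustrative stand-ins, not the ARMS interfaces:

```python
# Hypothetical sketch of the forward control flow: IA -> OSM Global ->
# OSM Pool Agent -> PM -> NP, with the Bandwidth Broker consulted by the
# IA (inter-pool) and by the PM via an RA (intra-pool).
calls = []

def ia_allocate():                # Infrastructure Allocator
    calls.append("IA")
    bb_reserve("inter-pool")      # IA consults the Bandwidth Broker
    osm_global_deploy()

def osm_global_deploy():          # Operational String Manager Global
    calls.append("OSM-Global")
    osm_pool_agent_deploy()

def osm_pool_agent_deploy():      # OSM Pool Agent
    calls.append("OSM-Pool-Agent")
    pm_allocate()

def pm_allocate():                # Pool Manager
    calls.append("PM")
    bb_reserve("intra-pool")      # via a Resource Allocator
    np_provision()

def np_provision():               # Node Provisioner
    calls.append("NP")

def bb_reserve(scope):            # Bandwidth Broker
    calls.append(f"BB({scope})")

ia_allocate()
# calls == ["IA", "BB(inter-pool)", "OSM-Global", "OSM-Pool-Agent",
#           "PM", "BB(intra-pool)", "NP"]
```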

3. Network architecture

The network architecture illustrated in Fig. 2 consists of four pools, each served by three access (edge) switches. The access switches within a pool are fully meshed and across pools are partially meshed. The rich connectivity among switches (numbering in the 10s) is to enable continued operation in several catastrophic situations at a cost that is acceptable. The illustrative architecture shown is a specialized case of a robust wireline, enterprise network architecture. Increasingly, there is only one (IP) network in enterprises that carries all the traffic – data, voice, and video. The network carries both point-to-point traffic (e.g., synchronous RPC, voice over IP calls) and multi-point traffic (e.g., pub/sub, announcements, broadcast video).

Our network QoS component design and implementation is generic for these layers and is capable of handling a variety of enterprise network configurations, including all layer-3 and all layer-2 with layer-3 awareness at the edges. Typically, high-end gigabit Ethernet switches, such as the Cisco 6500 we used, can operate either at layer-3 or at layer-2 based on configuration settings. Our network QoS design is also generic with respect to the topology it can support. However, our experimental studies are done using the cluster architecture shown in Fig. 2.

4. Network QoS components

Fig. 3 illustrates our network QoS component architecture, using CORBA Component Model (CCM) notation to describe the interactions among the components. The four major components of the QoS management architecture are: (1) Bandwidth Broker, (2) Flow Provisioner, (3) (Network) Performance Monitor and (4) (Network) Fault Monitor. Our network QoS components provide adaptive admission control that ensures there are adequate network resources to match the needs of admitted flows. The Fault Monitor and Performance Monitor are the two feedback mechanisms in support of this adaptive behavior.

4.1. Bandwidth Broker

The functions provided by the Bandwidth Broker to other MLRM components are:


[Figure: the Bandwidth Broker connected, via CCM facets, receptacles, event sources and event sinks, to the Flow Provisioner (provisioning requests toward routers/switches), the Performance Monitor (performance queries and events), the Fault Monitor (fault queries and events), hosts running applications/middleware (reservation and resource queries), and other MLRM components (QoS problem events).]

Fig. 3. QoS components and their interactions.

[Figure: four pools (Pool 1–4), each containing three layer-3/2 edge switches; switches are fully meshed within each pool and partially meshed across pools.]

Fig. 2. Network of interest.


• Flow Admission: Reserve, commit, modify, and delete flows in support of allocation and scheduling, for use by the PM and IA components.

• Queries: Provide information about bandwidth availability in different classes among pairs of pools and subnets in support of coarse-level allocation of processes to processors, for use by the IA component.

• Events: Provide notification of high-level QoS-affecting events (e.g., the Bandwidth Broker's inability to meet QoS of a previously admitted flow because of a network fault, repeated deadline violations on a flow, inability to provision a switch for desired QoS), for use by the OSM Global and PM components.

• Bandwidth Allocation Policy Changes: Adapt existing and future bandwidth reservations in support of mission mode changes, for use by the IA component. (See Section 6.)

The Bandwidth Broker leverages DiffServ (Blake et al., 1998; Nichols et al., 1998) in layer-3 and CoS mechanisms in layer-2 network elements to provide end-to-end QoS guarantees. Transitions between DiffServ and CoS are transparent to end users. CoS mechanisms provide functionality at layer-2 similar to what DiffServ mechanisms provide at layer-3. Layer-2 CoS support is somewhat restrictive, however. Layer-2 supports a 3-bit Class of Service (CoS) marking, or eight classes, as opposed to the 6-bit Differentiated Services Code Point (DSCP) with potentially 64 different classes. Moreover, CoS has limited support mechanisms for scheduling and buffer management. The DiffServ and CoS features are typically implemented in software and in ASIC (Application-Specific Integrated Circuit) hardware, respectively. They both provide aggregated traffic treatment throughout the network and per-flow treatment at the network ingress.

DiffServ and CoS features by themselves are insufficient to guarantee end-to-end network QoS, because the traffic presented to the network must be made to conform to the network capacity. We need admission control that ensures there are adequate network resources to match the needs of admitted flows.

4.1.1. Path discovery in layer-3/layer-2 networks

To do its admission control job, the Bandwidth Broker needs to be aware of the path that will be traversed by each flow, track how much bandwidth is being committed on each link for each traffic class, and estimate whether the traffic demands of new flows can be accommodated. Our bandwidth tracking is for both capacity assurance and deadline assurance. Capacity assurance makes sure that there is enough bandwidth on a link. Deadline assurance ensures that the occupancy of all of the deadline-sensitive flows traversing a link is kept low enough that the worst-case bounds on burst size, and thus on delay, continue to hold on the link. (See Section 4.1.3.) As tracking bandwidth on links is an important aspect of the Bandwidth Broker, path discovery is a major functional component of the Bandwidth Broker. Path discovery finds out which network links are used by a flow of traffic between two hosts.
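The per-link bookkeeping described above can be sketched as follows; the data layout and function names are illustrative assumptions, not the Bandwidth Broker's implementation:

```python
# Sketch: committed bandwidth is tracked per (link, class), and a flow is
# admitted only if every link on its path can absorb its rate within that
# class's capacity share.
committed = {}  # (link, traffic_class) -> committed bits/s

def admit(path, traffic_class, rate_bps, class_capacity_bps):
    for link in path:  # capacity assurance on every link of the path
        if committed.get((link, traffic_class), 0) + rate_bps > class_capacity_bps:
            return False
    for link in path:  # reserve on every link of the path
        committed[(link, traffic_class)] = (
            committed.get((link, traffic_class), 0) + rate_bps)
    return True

path = ["sw1-sw2", "sw2-sw3"]
admit(path, "gold", 400_000_000, 1_000_000_000)  # fits on both links
admit(path, "gold", 700_000_000, 1_000_000_000)  # rejected: would exceed 1 Gb/s
```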

4.1.1.1. Layer-3 path discovery. There are two approaches to layer-3 path discovery. Active techniques introduce packets into the network and generally use Internet Control Message Protocol (ICMP) mechanisms to determine the path taken by the introduced packets. Passive techniques typically rely on the monitoring of layer-3 routing tables.

Active Layer-3 Path Discovery (traceroute): Traceroute, a widely used active path discovery technique in layer-3, relies on the Time-to-Live (TTL) field in the IP header and ICMP error messages to track the hop-by-hop IP path between source and destination. Essentially, ICMP or UDP packets with TTL values of 1, 2, . . . are sent from the source to the destination. The IP TTL field is decremented at each IP hop. When the TTL value reaches 0, IP routers originate an ICMP Time Exceeded error message (ICMP type 11) to the source. The traceroute program reconstructs the hop-by-hop IP path from these ICMP error messages.
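The TTL mechanism can be illustrated without raw sockets by a toy model of a known router path; this is a simulation of the principle (it also glosses over the destination answering with a different ICMP type than intermediate routers), not the traceroute implementation:

```python
def simulated_traceroute(path):
    """Model of traceroute over a known router path: probes with
    TTL = 1, 2, ... expire at successive routers, and each expiry
    reveals one more hop of the path."""
    discovered = []
    for ttl in range(1, len(path) + 1):
        hop = ttl - 1                 # TTL is decremented once per hop,
        discovered.append(path[hop])  # so probe ttl expires at path[ttl-1]
        if path[hop] == path[-1]:
            break                     # destination reached
    return discovered

hops = simulated_traceroute(["r1", "r2", "r3", "dst"])
# hops == ["r1", "r2", "r3", "dst"]
```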

Passive Layer-3 Path Discovery: Passive techniques rely on the monitoring of layer-3 routing tables. One of the most effective ways to achieve this is to passively participate as a peer in the layer-3 link state routing protocol (e.g., OSPF or IS-IS). The Bandwidth Broker can peer with link state routing protocols and receive and reconstruct the link state topology of the network in real time, just like any other link state router in the network. The peering arrangement enables the Bandwidth Broker to obtain routing information as quickly as network routers. The peering arrangement does not, however, mean that the Bandwidth Broker offers any routes or plays any role in packet forwarding. This technique eliminates the need to do traceroute.

Equal-Cost Multi-Path (ECMP): Layer-3 path discovery mechanisms (both active and passive) work well in cases where a single best path is available between a source and destination. When there is more than one equal-cost path from a source to a destination, COTS routers support a feature known as Equal-Cost Multi-Path (ECMP). With ECMP, routers balance the traffic load on multiple equal-cost paths between two points. Typically, this type of load balancing is done in a way that keeps the packets of the same flow together (in order to minimize packet re-ordering). In general, with passive layer-3 path discovery, it is difficult to predict the specific path that a flow will use, since this depends largely on the vendor's ECMP implementation. Further, depending on the vendor's ECMP implementation, active techniques such as traceroute may or may not be able to discover the specific best-cost path used for a specific flow. For instance, ICMP packets may be treated differently by the ECMP implementation than, say, TCP flows. When layer-3 path discovery is able to predict accurately the specific path used, admission control decisions are accurate. When path discovery is unable to predict which of the shortest paths will be used, a conservative approach that accounts for flows along every possible equal-cost path should be employed.
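The conservative approach in the last sentence can be sketched directly: when the chosen path is unpredictable, charge the flow's rate against every link of every candidate path. The data layout is an illustrative assumption:

```python
# Conservative ECMP accounting (sketch): a flow is admitted only if all
# links on all equal-cost candidate paths can absorb its rate, and its
# rate is then reserved on all of them.
from collections import defaultdict

reserved = defaultdict(int)  # link -> reserved bits/s

def admit_ecmp(equal_cost_paths, rate_bps, link_capacity_bps):
    links = {link for path in equal_cost_paths for link in path}
    if any(reserved[link] + rate_bps > link_capacity_bps for link in links):
        return False
    for link in links:
        reserved[link] += rate_bps
    return True

# Two equal-cost paths a-b-d and a-c-d: the flow is charged on all four links.
admit_ecmp([["a-b", "b-d"], ["a-c", "c-d"]], 300_000_000, 1_000_000_000)
```

This over-reserves (the flow actually uses only one path), which is the price of correctness when the vendor's hashing decision cannot be predicted.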

4.1.1.2. Layer-2 path discovery. Layer-2 switches do not run routing protocols; instead, they communicate using the spanning tree protocol (STP). Layer-2 network segments are broadcast domains. In a layer-2 network topology, redundant connectivity (or a loop) could lead to broadcast storms, where broadcast packets are repeatedly forwarded around the loop. STP eliminates such loops by marking certain layer-2 ports as non-forwarding. Thus, one of the keys to layer-2 path discovery is to discover the state of the spanning tree, i.e., which ports are active or blocked. The Bandwidth Broker uses SNMP MIBs to track spanning tree state and VLAN membership. It uses this information to compute layer-2 paths within a single VLAN (Decker et al., 1993).
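Once the active (forwarding) links are known, path computation is simple, because the links left forwarding by STP form a tree. A minimal sketch, assuming the link list has already been obtained (e.g., from bridge MIBs):

```python
# Layer-2 path computation over the active spanning tree (sketch): in a
# tree there is exactly one path between two switches, found here by BFS.
from collections import deque

def l2_path(active_links, src, dst):
    adj = {}
    for a, b in active_links:            # links left forwarding by STP
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    parent, queue = {src: None}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:                  # walk parents back to the source
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in adj.get(node, []):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None                          # dst not reachable in this VLAN
```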

4.1.1.3. Hybrid network path discovery. A two-pass approach to path discovery in a hybrid layer-3/2 network is used. In the first pass, the end-to-end layer-3 path is discovered. Each switch in the layer-3 path is the gateway switch from one VLAN to the next VLAN. In the second pass, the layer-2 path between each of the layer-3 segments discovered in the first pass is computed.

4.1.2. Layer-3 and layer-2 QoS treatment

The Bandwidth Broker realizes QoS treatment in layers 2 and 3 for traffic flows that are admitted into the network. Our implementation of QoS treatment uses policing and marking of traffic at the network edge. Policing and marking are done at the granularity of each flow. In the core of the network, scheduling and buffer management are performed at the granularity of each traffic class.

Policing and Marking: These functions are performed at the edge of the network. The Bandwidth Broker uses Access Control Lists (ACLs) available on COTS network elements to classify traffic into flows that can be policed. Typically, the classification is based on the TCP/IP five-tuple: <source address, destination address, protocol, source port, destination port>. However, in the MLRM architecture, the port numbers are generally not known when allocation decisions are made. In our present scheme, the Bandwidth Broker returns a DSCP marking whenever a flow reservation is made, based on the traffic type or QoS requested for the flow. The sending application, or the ORB middleware on behalf of the application, then needs to mark the packet with the DSCP marking returned. Each class of traffic is then policed by vendor mechanisms such as Committed Access Rate (CAR). In the policing process, an aggregate rate or bandwidth is ensured for the flow through a combination of rate and allowable burst parameters. The Bandwidth Broker can be configured to either drop packets that exceed the rate profile or re-mark those packets to best-effort treatment. Actual provisioning of individual network elements for policing and marking is done by the Flow Provisioner under the direction of the Bandwidth Broker.
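A sending application can apply a broker-returned DSCP value through the standard socket API; the DSCP value below is illustrative (the paper does not prescribe one), and the snippet assumes a platform where `IP_TOS` is settable (e.g., Linux):

```python
# Marking outgoing packets with a DSCP (sketch): the DSCP occupies the
# upper 6 bits of the IP TOS / Traffic Class byte, hence the shift by 2.
import socket

def mark_socket(sock, dscp):
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
mark_socket(s, 46)  # 46 = Expedited Forwarding, a common DSCP for delay-sensitive traffic
tos = s.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
s.close()
```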

Scheduling and Buffer Management: Scheduling mechanisms vary significantly across COTS vendor implementations, from simple round-robin mechanisms, to strict priority queuing, to sophisticated mechanisms such as weighted fair queuing. The Bandwidth Broker uses available vendor scheduling mechanisms. High-priority mission-critical traffic is accorded strict priority treatment, within bounds (dictated by admission control policies), while other traffic classes can share link bandwidth in accordance with configurable weights or percentages. Typically at layer-3, vendor products isolate traffic classes by using separate queues for each class. Where sophisticated buffer management schemes such as Weighted Random Early Detection (WRED) are available, they are applied to TCP/SCTP traffic. In the absence of such schemes, tail drop is the only option.
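The bandwidth split described above can be sketched arithmetically: the strict-priority class is served first (its rate capped by admission control), and the remainder is divided among the other classes by their weights. Class names and numbers are illustrative:

```python
# Weighted split of residual link bandwidth (sketch): after the
# strict-priority class takes its admitted share, remaining capacity is
# divided among the other classes in proportion to configured weights.
def class_shares(link_bps, strict_priority_bps, weights):
    remaining = link_bps - strict_priority_bps
    total = sum(weights.values())
    return {cls: remaining * w / total for cls, w in weights.items()}

shares = class_shares(1_000_000_000, 200_000_000, {"gold": 3, "silver": 1})
# shares == {"gold": 600000000.0, "silver": 200000000.0}
```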

Transport of QoS markings: In layer-3 network segments, the DSCP marking is visible in the IP header, and scheduling and buffer management decisions can be based on this information. The layer-3/layer-2 integrated switches, even when they are configured as layer-2 switches, can typically process DSCP codepoints. When traffic traverses layer-2 network segments, DSCP markings are translated to corresponding layer-2 CoS values. At most eight distinct classes of service can be identified on the layer-2 segments. In our implementation experience, we have seldom found the need for more than eight classes of service; however, if the deployment requires it, then multiple layer-3 classes may map onto the same layer-2 class. Typically, layer-2 network segments have increased bandwidth and forwarding capacity in contrast to layer-3 network segments, making such a deployment workable. In layer-2, multiple classes of traffic may be forced to share the same queue. In such a situation it is difficult to maintain isolation between traffic belonging to different classes. For each queue, one or more configurable drop thresholds may be available. The drop thresholds indicate the percent queue utilization at which frames are discarded from the queue. Multiple drop thresholds can be associated with a single queue. For instance, a drop threshold of 40% can be assigned to CoS 3 in queue 1, and a drop threshold of 90% can be assigned to CoS 4 in queue 1. If traffic marked with CoS 3 arrives for queue 1 when it is, say, 50% full, that traffic will be discarded. However, traffic with CoS 4 will be accepted and enqueued.

4.1.3. Delay-bound support in the Bandwidth Broker

The Bandwidth Broker admission decision for a flow is based not only on the capacity or bandwidth requested on each link traversed by the flow, but also on the delay bounds requested for the flow. The delay bounds for new flows need to be guaranteed without damaging the delay guarantees for previously admitted flows and without redoing the expensive job of readmitting every previously admitted flow. We have developed computational techniques to provide both deterministic and statistical delay-bound guarantees. Delay guarantees raise the level of abstraction of the Bandwidth Broker to the higher-layer MLRM components and enable these components to provide better end-to-end mission guarantees. The basic framework we have developed is capable of dealing with any number of priority classes and, within a priority class, any number of weighted fair queuing subclasses. These guarantees rest on relatively expensive computations of occupancy (utilization) bounds for the various classes of traffic, performed only at the time of network configuration/reconfiguration, and relatively inexpensive checks for violations of these bounds at the time of admission of a new flow. We provide an overview of the calculations involved.
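The split between expensive off-line bound computation and cheap on-line checking can be sketched as follows. The data structures (occupancy, bound, and capacity maps) and function signature are our illustrative assumptions, not the Bandwidth Broker's actual interfaces.

```python
def admit(flow_rate, path_links, class_k, occupancy, bound, capacity):
    """On-line admission check: admit the flow only if adding its rate
    keeps class-k occupancy within the precomputed bound a_k on every
    link of its path.  occupancy maps (link, class) -> committed rate,
    bound maps class -> a_k, capacity maps link -> service rate C."""
    for link in path_links:
        committed = occupancy.get((link, class_k), 0.0)
        if (committed + flow_rate) / capacity[link] > bound[class_k]:
            return False  # would violate the occupancy bound on this link
    # Commit the reservation only after every link passes the check.
    for link in path_links:
        occupancy[(link, class_k)] = occupancy.get((link, class_k), 0.0) + flow_rate
    return True
```

Because the per-link check is a single comparison against a precomputed a_k, admitting a new flow never requires revisiting flows that were admitted earlier.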

For deterministic bounds, the off-line computation (i.e., at the time of configuration/reconfiguration) of an occupancy bound a_k for each traffic class k requires the following input data:

• service rate C on links;
• maximum packet or frame size M;
• diameter h of the network (longest path length in the network);
• propagation delay D on the longest path;
• burst times T_s for various service subclasses s;
• WFQ weights w_s within each priority class k;
• target delay d_s for each service class/subclass s.

From these values, elementary off-line calculations directly yield the occupancy bounds that need to be satisfied to ensure that target delays are met. To provide this assurance even for the worst case consistent with the information that will be employed during the on-line admission decisions (basically, just that the occupancy bounds will be respected), the off-line calculations need to account for the fact that a flow's burstiness effectively grows with each hop at which queuing can occur. In more detail, if the first packet of a burst in the flow's traffic encounters a queuing delay d at one link (presumably from bursts in other flows competing for that link), and if the later packets of the flow are served immediately after the first packet (presumably because the queue-growing bursts in the traffic of all the competing flows ended just as the first packet of the first flow arrived and encountered the big queue), then, at the following link, the flow's effective burst size will have increased by q_f*d, where q_f is the flow's rate parameter. The implication is especially serious for a link l whose class-k occupancy limit a_k is practically filled by class-k flows traversing long, h-hop paths ending with link l as the hth hop (so Σq_f = a_k*C, where C is the service rate of link l, and the sum is over these flows). Specifically, the implication is that if these flows all encountered a queuing delay of d on each of their earlier hops, then, ignoring the original burstiness of these flows as they entered the network, just the corresponding aggregate increase in effective burstiness arriving at link l is (h − 1)*a_k*C*d, corresponding to a contribution of (h − 1)*a_k*d/(1 − a_>k) to the queuing delay for class-k traffic at link l, where a_>k is the aggregate occupancy bound for classes of higher priority than k (the factor (1 − a_>k) reflects the fact that class-k traffic, in effect, can count on link l only for the fraction (1 − a_>k) of its capacity). Of course, there are other contributions, and the actual calculations do take these other contributions into account, but the consideration of this one contribution alone already leads to a severe constraint on the occupancy bound a_k. Specifically, this contribution alone had better be smaller than d if the class-k queuing delay at link l is to be bounded by d, and this requirement is equivalent to requiring that a_k be strictly smaller than (1 − a_>k)/(h − 1), which is a rather low value if the diameter h is even moderately high. The actual calculations pick an even smaller value for a_k to account for the other contributions, but this first contribution typically has the dominant effect. In any case, the class-isolating occupancy bounds for the different service classes are calculated starting from the highest priority class. During the on-line admission control calculations, the only calculation required is to determine whether the

admission of the new flow would violate the occupancy bound of the flow's traffic class/subclass on any of the links the flow would traverse, so there is no need to readmit already admitted flows.
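The dominant constraint derived above, a_k < (1 − a_>k)/(h − 1), can be captured in a few lines. This sketch computes only that one dominant limit; as the text notes, the actual off-line calculation picks a strictly smaller a_k to cover the remaining contributions.

```python
def occupancy_bound_upper_limit(h, higher_priority_occupancy):
    """Upper limit on the class-k occupancy bound a_k implied by the
    dominant burst-growth term alone:
        (h - 1) * a_k * d / (1 - a_>k) < d   =>   a_k < (1 - a_>k) / (h - 1)
    where h is the network diameter and a_>k the aggregate occupancy
    bound of all higher-priority classes.  Illustrative only: the real
    computation must also cover the non-dominant contributions."""
    if h <= 1:
        # A single hop incurs no multi-hop burst growth; the whole
        # residual fraction of the link is available to class k.
        return 1.0 - higher_priority_occupancy
    return (1.0 - higher_priority_occupancy) / (h - 1)
```

For example, with diameter h = 5 and 20% of each link reserved for higher-priority classes, class k can be allowed at most 20% occupancy from this term alone, illustrating how quickly the bound falls as the diameter grows.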

The statistical bound calculations require, in addition to the input values listed above, the tolerance values for the probabilities of violating delay targets. Many simplifying assumptions are made in our statistical calculations, some pessimistic and others optimistic. One optimistic assumption is that the burstiness of a flow's traffic as it arrives at one link on its route is the same as the burstiness at other links along the route. So, in contrast to the deterministic case discussed above, the dominant contribution to the depression of occupancy bounds for classes requiring worst-case guarantees is absent in our calculations for classes requiring only statistical guarantees, and this absence raises the corresponding occupancy bounds appreciably. Moreover, with the tolerance values set to zero (the most constraining case), the occupancy bound calculations are especially simple, and that is what we are implementing in the Bandwidth Broker. The deterministic and statistical delay bounds are currently being incorporated in the Bandwidth Broker admission control process for the highest priority class (e.g., the DiffServ EF class) and the second highest priority class (the DiffServ AF class; in the AF class, there can be two or more weighted fair queuing subclasses).

4.2. Flow Provisioner

The Flow Provisioner translates technology-independent configuration directives generated by the Bandwidth Broker into vendor-specific router and switch commands to classify, mark, and police packets belonging to a flow. The Flow Provisioner component enables the enforcement of the Bandwidth Broker admission control primitives on the network elements. On Cisco devices, enforcement is primarily by platform-specific variants of Cisco's Committed Access Rate (CAR). CAR implements a variant of a token-bucket scheme that allows individual flows to be policed to a specific rate with a specific burst size. The subset of flows to which CAR rules are applied is specified by Access Control Lists (ACLs). ACLs allow the matching of an individual flow (or group of flows) by using the five-tuple as well as additional fields such as the DSCP codepoint. We have implemented Flow Provisioners for layer-3 IOS Cisco routers (e.g., Cisco 3600, 7200, and 7500), layer-2/3 Catalyst and IOS switches (e.g., Cisco 6500 and 4507), and layer-3 Linux routers to demonstrate the viability of this QoS architecture for a variety of network topologies and equipment.
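The token-bucket policing that CAR-style mechanisms implement can be sketched as follows. This is a simplified single-rate bucket for illustration, not Cisco's exact CAR semantics.

```python
class TokenBucket:
    """Single-rate token-bucket policer sketch: tokens accrue at `rate`
    bytes/s up to a depth of `burst` bytes; a packet conforms if enough
    tokens are available when it arrives."""

    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, 0.0  # bucket starts full

    def conforms(self, now, packet_bytes):
        # Refill tokens for the elapsed time, capped at the burst depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True   # in-profile: forward with the class marking
        return False      # out-of-profile: drop or mark down
```

The burst depth is what lets a policed flow briefly exceed its nominal rate, a point that matters when interpreting the FTP measurements in Section 7.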

5. Feedback mechanisms

Two QoS feedback mechanisms to the Bandwidth Broker are implemented by the Performance Monitor and Fault Monitor components. Both components provide ongoing feedback as asynchronous events; they also support synchronous queries. The Performance Monitor provides information on the current performance status of flows and traffic classes. The Fault Monitor provides information on the up/down status of links and switches.

5.1. Performance Monitor

The performance monitoring features we support can be classified as follows:

• Delay measurement: Delay measurement determines how well critical flows are meeting their timing constraints, specifically their end-to-end latency (delay) metric.

• Detection of overflow: Detection of overflow of traffic for an admitted flow can identify a mission-critical task that requires additional capacity.

5.1.1. Delay measurement

We employ an active probe technique to measure delay (Alberi et al., 2003). The infrastructure can be easily extended to measure jitter and packet loss. The measurement/probe infrastructure, as illustrated in Fig. 4, has three main components: the performance data management component (consisting of the Performance Monitor Servant, Probe Sink, and Probe Control), the Probe Platform that manages the setting up of a probe (measurement job), and the probes that run on hosts (shown as Probe A and Probe B).

A few of the highlights of this Performance Monitoring component are:

Fig. 4. Delay measurement component.

• The measurement infrastructure is able to measure the delay experienced by specific traffic flows, or the delay between a pair of hosts for one or more traffic classes. Averaging window sizes can be specified. The interface supports both synchronous requests to query current delay and asynchronous events to report violations of thresholds or to provide periodic updates on delay.

• The analysis and management of performance data is separated from the probes that collect raw data. The measurement data is analyzed and managed by the Performance Monitor Servant and stored in an HSQLDB (http://hsqldb.org/) in-core database.

• Probe job configurations are stored in a persistent medium using MySQL (http://dev.mysql.com/). Probe jobs can be recovered in case of probe platform or probe host failures.

• In setting up a probe job, one can vary the packet size, the gap/time between packets, the number of packets in a packet train, and the periodicity.

• The probe job packet train generation is done at the (Linux) kernel level to control or minimize the time-related vagaries in generating packets.

• Clocks are synchronized between two measurement (Linux) hosts with GPS, using a non-network interface between the hosts, to achieve delay measurement accuracy in the microsecond range.
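The tunable probe-job parameters listed above can be gathered into a small configuration record. The field names here are ours for illustration, not the actual Probe Platform API.

```python
from dataclasses import dataclass

@dataclass
class ProbeJob:
    """Illustrative probe-job configuration: one periodic packet train
    between a probe source and the Probe Sink."""
    packet_size_bytes: int = 64       # size of each probe packet
    inter_packet_gap_ms: float = 1.0  # gap between packets in a train
    train_length: int = 10            # packets per train
    period_s: float = 5.0             # how often a train is emitted

    def train_duration_ms(self) -> float:
        # Time to emit one train, ignoring serialization delay.
        return (self.train_length - 1) * self.inter_packet_gap_ms
```

Persisting such records (in MySQL, as described above) is what allows probe jobs to be re-instantiated after a probe platform or probe host failure.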

5.1.2. Detection of overflow

The Bandwidth Broker's admission control does not allow overload in the network. Excess offered load is detected at the ingress of the network and policed. The policing function provided by an ingress network element either drops or marks down all excess packets. Policing is orchestrated by our Flow Provisioner capability, which sets up the policing attributes for flows at the ingress of the network. The monitoring program uses the policing functions provided by an ingress network element to determine whether the rate at which packets are dropped or marked down exceeds a threshold during several consecutive intervals.
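The detection rule just described (a drop/mark-down rate above a threshold for several consecutive polling intervals) can be sketched as follows; the rule's shape is from the text, while the parameterization is our illustrative assumption.

```python
def overflow_event(drop_rates, threshold, consecutive):
    """Return True if the policer's drop/mark-down rate exceeds
    `threshold` in at least `consecutive` successive polling intervals.
    `drop_rates` is the sequence of per-interval rates, oldest first."""
    run = 0
    for rate in drop_rates:
        run = run + 1 if rate > threshold else 0
        if run >= consecutive:
            return True
    return False
```

Requiring several consecutive violating intervals filters out momentary bursts, so an overflow event signals a sustained demand for more capacity by a mission-critical task rather than transient jitter.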

5.2. Fault Monitor

A key feature of a resource management system in a dynamic battlespace environment is the ability to detect and react to network faults. We illustrate the problem we are trying to address using the network shown in Fig. 5. If the link between switches A and B goes down, then a flow Y between A and B may be routed through switch C (shown in dashed lines). Similarly, a flow Z between E and A that originally used the links EB and BA may now use links ED and DA (shown in dotted lines). However, links AC, CB, ED, and DA may now be oversubscribed, jeopardizing the QoS guarantees for Y and Z as well as for the flows that had been using these links prior to the occurrence of the fault. Our techniques encompass both reactive and proactive analyses. In the reactive mode, when and only when a network fault is detected, we recompute the layer-3 and/or layer-2 topology and the paths for the individual flows. The proactive mode essentially involves precomputing network paths for various failure conditions, enabling a faster response. We currently restrict the proactive analysis to single-mode faults (a single link or switch failure). On the occurrence of any single-mode failure, a simple lookup operation yields the new network path information. The goal of the Fault Monitor is not to perform root cause analysis or to enable fixing the fault, but to restore QoS. If the QoS of a previously admitted flow cannot be guaranteed, the Fault Monitor raises a fault exception event to the Bandwidth Broker. The Bandwidth Broker, in turn, raises a higher-level event to other MLRM components, specifically OSM Global and PM. The three functional aspects of the Fault Monitor component are as follows:

Fig. 5. Fault and its impact on QoS.

• Fault detection: We use SNMP traps to detect link failures (and links coming back into service). A switch failure is detected when SNMP trap notifications for all links to the switch are received from the adjacent switches.

• Impact analysis: For each admitted flow, the impact analysis involves determining whether the flow has changed its path, using the path discovery algorithms. If the path for a flow has changed, that flow has been impacted by the failure and is a candidate for readmission.

• QoS restoration: Our design is capable of supporting different algorithms satisfying different utility functions or optimality criteria. The first step, regardless of the algorithm used, is to temporarily remove the affected flows. In the current implementation, the affected flows are then readmitted one at a time, from the highest priority to the lowest and, within the same priority, smallest bandwidth first. Here, we are trying to readmit the maximum number of higher-priority flows whose paths have changed. We may substitute an algorithm that admits more flows. We can also employ a preemption algorithm: for instance, if the admission of a flow would lead to capacity violations on a link, the process preempts a lower-priority flow that uses this link. The preempted flow then becomes a candidate for readmission.
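The readmission ordering used in the current implementation (highest priority first; within a priority, smallest bandwidth first) can be sketched as follows. The flow representation is our illustrative assumption.

```python
def readmission_order(affected_flows):
    """Order affected flows for readmission after a fault: highest
    priority first and, within a priority, smallest bandwidth first,
    so the maximum number of higher-priority flows fit back in.
    Flows are (priority, bandwidth, flow_id) tuples with a larger
    priority value meaning more important."""
    return sorted(affected_flows, key=lambda f: (-f[0], f[1]))
```

For example, two priority-2 flows of 1 and 3 Mbits/s are readmitted (smaller first) before a priority-1 flow of 5 Mbits/s, regardless of the order in which the fault affected them.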

The QoS restoration functions described above restore QoS based on the bandwidth required. When there is a network fault, the diameter of the network may have increased, causing the occupancy bounds for the various service classes to decrease. If the occupancy bounds have changed, the QoS of flows whose paths have not changed can also degrade. To honor the delay bounds, readmission therefore has to be carried out on all previously admitted flows. The readmission of an affected flow is then based on the new occupancy bound for the flow's class on each link of the new path to be traversed by the flow. If a flow is preempted, the new occupancy values are likewise used in readmitting the preempted flow.

Finally, the Fault Monitor also tracks the paths and the resources in use so that, when the fault disappears, guarantees can be restored to their original level. We carry out the entire process of impact analysis and restoration, as described above, when the fault condition disappears.

6. Support for mission mode changes

In addition to adapting to faults and overload, the Bandwidth Broker supports dynamically changing among modes. A mode is a major operational situation, such as normal, alert, and battle mode in a military environment. Our work in support of mode changes deals with global policy changes affecting the entire network, including changes in the fraction of the bandwidth allocated to the various traffic classes. Implementing a bandwidth policy change involves sending reconfiguration instructions to every switch to change its QoS parameters, such as queue size, number of scheduling slots allocated, and packet drop rules for every traffic class. Moreover, a policy change invariably results in a reduction of the bandwidth allocated to one or more traffic classes, so the QoS for flows already admitted in these classes might no longer be guaranteed. Identifying the affected flows and readmitting them are similar in spirit to the impact analysis and QoS restoration performed in response to network faults, but the details are somewhat different. A flow is affected in this case if there is a link in its path for which the total bandwidth allocated to the flow's class exceeds the class's link capacity and/or the current occupancy value of the flow's class exceeds the corresponding threshold for the class. The affected flows are sorted primarily by priority, from lowest to highest, and within each priority by bandwidth in descending order. We keep deleting flows from the affected list, starting with the lowest-priority, highest-bandwidth flow, until no link in the path used by any remaining flow has its total allocated class bandwidth exceeding the class's link capacity and/or its current occupancy value exceeding the corresponding threshold. The utility function used here minimizes the number of higher-priority flows that risk being denied their QoS. When a flow is deleted, the bandwidth used, or the current occupancy value, on all the links used by the flow is adjusted down.
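The deletion ordering for a mode change is the mirror image of the fault-restoration ordering: lowest priority first, and within a priority, largest bandwidth first, so each deletion frees the most capacity at the least cost to important traffic. The flow representation below is our illustrative assumption.

```python
def deletion_order(affected_flows):
    """Order in which over-subscribed flows are removed after a policy
    change: lowest priority first and, within a priority, largest
    bandwidth first, minimizing the number of higher-priority flows
    that lose their QoS.  Flows are (priority, bandwidth, flow_id)
    tuples with a larger priority value meaning more important."""
    return sorted(affected_flows, key=lambda f: (f[0], -f[1]))
```

Flows are then deleted in this order only until the capacity and occupancy thresholds are satisfied again on every affected link, so a large low-priority flow may be sacrificed to keep several smaller high-priority flows intact.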

7. Experimentation and validation

To demonstrate that the Bandwidth Broker does indeed improve both performance and predictability, we report on a simple experiment using a testbed consisting of Cisco 3600 series routers. The results of our experiments are summarized in Table 1.

We have a simple configuration consisting of two routers, each serving a host, connected by a link of 10 Mbits/s capacity. We transfer a 10-Mbyte (80-Mbit) file over the network using FTP. The first row corresponds to file transfers done as best effort. The second row corresponds to file transfers done using a High Reliability traffic class (an Assured Forwarding (AF) class in DiffServ) policed at the rate of 2 Mbits/s. The columns correspond to contention traffic of 0, 1, 2, and 3 Mbits/s.

Table 1
Illustration of the Bandwidth Broker improving both performance and predictability

                                                 Contention traffic
                                                 None     1 Mbits/s   2 Mbits/s   3 Mbits/s
Flow not admitted by the Bandwidth Broker        11.4 s   70 s        218 s       >5 min
Flow admitted by the Bandwidth Broker
at 2 Mbits/s in an AF class                      30.2 s   30.5 s      30.3 s      30.3 s

As can be seen in Row 1, when there is no QoS treatment (best effort), the FTP transfer time (elapsed wall time for the transfer) gets worse as the contention traffic increases: from 11.4 s to 70 s to 218 s to more than 300 s. Basically, the file transfer flow and the contention traffic get the same, equally bad treatment. When the FTP transfer uses the AF class, as can be seen in Row 2, the performance of the FTP transfer stays essentially constant at about 30.5 s as the contention traffic increases. With the policing rate for the flow at 2 Mbits/s, it should have taken at least 40 s to transfer an 80-Mbit file; the transfers took only about 30.5 s. This discrepancy is explained by the nominal burst size allowance. In our policing configuration, we instructed the router to drop packets, rather than marking them down to best effort, when the rate exceeds 2 Mbits/s. This is consistent with how TCP traffic behaves.
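The arithmetic behind the 40 s figure is worth making explicit: with no burst allowance, a flow held strictly to its policed rate cannot finish faster than the file size divided by that rate.

```python
def min_transfer_time_s(file_bits, policed_rate_bps):
    """Lower bound on transfer time for a flow held strictly to its
    policed rate, with no token-bucket burst allowance."""
    return file_bits / policed_rate_bps

# An 80-Mbit file policed at 2 Mbits/s needs at least 40 s without bursts;
# the measured ~30.5 s therefore shows the burst allowance at work.
```

The gap between the 40 s floor and the observed ~30.5 s is exactly the capacity lent by the token bucket's burst depth.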

In the experiments, only 30% of capacity (3 Mbits/s) was allocated to this AF class. If there were a request to the Bandwidth Broker to admit another flow to the same AF class at a rate greater than 1 Mbits/s, the Bandwidth Broker would have rejected this new flow request, ensuring that the already admitted 2-Mbits/s FTP traffic in the AF class gets the right QoS treatment. In fact, in one of the extended experiments we turned off the admission control of the Bandwidth Broker, allowing flows in the AF class to exceed the class capacity on some links. As expected, packets were dropped in all the competing AF flows using those links, violating their QoS requirements.

The network overload ‘‘event’’ generation capabilities (see Section 5.1.2) have been demonstrated for a key ‘‘gate’’ test in the ARMS Phase I program. (Gate tests are instituted by the program to measure progress.) The gate test showed that our technology is applicable in dynamic resource management and increases mission survivability.

Finally, the Bandwidth Broker consistently processes add and delete reservation requests in under 100 ms.

8. Related work

The two main technologies for providing differentiated treatment of traffic are DiffServ/CoS and IntServ. The Bandwidth Broker makes use of DiffServ/CoS. In IntServ, every router on the path of a requested flow decides whether or not to admit the flow with a given QoS requirement. Each router in the network keeps the status of all flows that it has admitted as well as the remaining available (uncommitted) bandwidth on its links. Some drawbacks of IntServ are that (1) it requires per-flow state at each router, which can be an issue from a scalability perspective; (2) it makes its admission decisions based on local information rather than some adaptive, network-wide policy; and (3) it is applicable only to layer-3 IP networks. Our network QoS solution has none of these drawbacks (see http://www.cisco.com/en/US/tech/tk543/tk766/technologies_white_paper09186a00800a3e2f.shtml).


Our delay-bound work for DiffServ/CoS networks with deterministic guarantees is closely related to Le Boudec and Thiran (2004). Our mathematical formulation, for both deterministic and statistical bounds, is broader than Le Boudec and Thiran (2004) and Wang et al. (2001) in that any number of priority classes, and any number of weighted fair queuing classes within a priority class, can be handled; thus our admission control can support delay guarantees for any DiffServ/CoS class of traffic. Telcordia has successfully applied Bandwidth Broker technologies to other Government projects and toward commercial offerings (Kim and Sebuktekin, 2002; Chadha et al., 2003). None of these endeavors, however, deals with layer-2 QoS, let alone unified management of QoS across multiple layers. None of these works, moreover, is reflective and adaptive, with fault and performance monitoring as part of the QoS framework. Furthermore, integration into middleware and into an end-to-end resource management framework has not been the focus of these efforts. Proactive fault impact analysis and QoS restoration in a faulty network, as explored in our research, has to our knowledge not been explored by other researchers.

9. Concluding remarks

Our network QoS components provide a unified QoS solution that guarantees network performance for mission-critical applications in complex wireline, layer-3/2 network topologies. Our implementation is flexible and unique in the mix of guarantees it can provide – deterministic delay guarantees, statistical delay guarantees, and capacity (bandwidth) assurance – to various mission tasks. Our ability to detect and respond appropriately to faults, changing mission modes, and the changing needs of high-priority and time-critical applications is essential to providing an end-to-end adaptive allocation and scheduling service for mission-critical systems. Our adaptive behavior is supported by continual performance monitoring at the network level. Much of the component functionality described in this paper is in place; we are realizing the rest. Experimentation and validation are ongoing.

Acknowledgments

The design and architecture of ARMS is a collaborative effort by many institutions. Our thanks to our ARMS colleagues from BBN, Boeing, Carnegie-Mellon University, Lockheed-Martin, Johns Hopkins University Applied Physics Lab., Ohio University, PrismTech, Raytheon, SRC, and Vanderbilt University.

References

Alberi, J.L., McIntosh, A., Pucci, M., Raleigh, T., 2003. On achieving greater accuracy and effectiveness in network measurements. NYMAN 2003, New York.

Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., Weiss, W., 1998. An architecture for differentiated services. IETF RFC 2475.

Chadha, R. et al., 2003. PECAN: policy-enabled configuration across networks. In: IEEE 4th International Workshop on Policies for Distributed Systems and Networks.

Dasarathy, B., Gadgil, S., Vaidyanathan, R., Parmeswaran, K., Coan, B., Conarty, M., Bhanot, V., 2005. Network QoS assurance in a multi-layer adaptive resource management scheme for mission-critical applications using the CORBA middleware framework. In: Proceedings of RTAS 2005, pp. 246–255.

Decker, E., McCloghrie, K., Langille, P., Rijsinghani, A., 1993. Definitions of managed objects for source routing bridges. IETF RFC 1525.

Kim, B., Sebuktekin, I., 2002. An integrated IP QoS architecture-performance. Milcom'02, Anaheim, CA.

Lardieri, P., Balasubramanian, J., Schmidt, D.C., Thaker, G., Gokhale, A., Damiano, T., 2006. A multi-layered resource management framework for dynamic resource management in enterprise DRE systems. Elsevier Journal of Systems and Software, this (Special) Issue on Dynamic Resource Management in Distributed Real-Time Systems, edited by Charles Cavanaugh, Frank Drews, Lonnie Welch.

Le Boudec, J.-Y., Thiran, P., 2004. Network Calculus: A Theory of Deterministic Queuing Systems for the Internet, Chapter 2, online version of the book. Springer-Verlag, LNCS, vol. 2050.

Nichols, K., Blake, S., Baker, F., Black, D., 1998. Definition of the differentiated services field (DS field) in the IPv4 and IPv6 headers. IETF RFC 2474.

Wang, N., Schmidt, D.C., Gokhale, A., Rodrigues, C., Natarajan, B., Loyall, J.P., Schantz, R.E., Gill, C.D., 2003. QoS-enabled middleware. In: Mahmoud, Qusay (Ed.), Middleware for Communications. Wiley and Sons, pp. 131–162.

Wang, S., Xuan, D., Bettati, R., Zhao, W., 2001. Differentiated services with statistical QoS guarantees in static-priority scheduling networks. In: Proceedings of the IEEE Real-Time Systems Symposium, London, UK.

Balakrishnan ‘‘Das’’ Dasarathy is a Chief Scientist at Telcordia Applied Research. He has over 25 years of experience in software research and development (R&D) and software R&D management. His current areas of interest include middleware, real-time systems, and network QoS. He received his Ph.D. in Computer and Information Science from the Ohio State University.

Shrirang (‘‘Shree’’) Gadgil is currently a Senior Research Scientist at Telcordia Applied Research. He has nearly 15 years of experience in network software development and system software. His current areas of interest include policy-based network management systems, network QoS, and traffic engineering. He received his M.S. in Computer Science from Columbia University.

Ravi Vaidyanathan is currently a Senior Scientist at Telcordia Applied Research and has been with Telcordia since 1999. His accomplishments at Telcordia include development of a Border Gateway Protocol toolkit, development of a QoS assurance architecture for wireless 802.11b networks, design of a policy framework for traffic engineering and QoS provisioning in IP/MPLS networks, and development of simulation models of ad hoc wireless networks. He received his M.S. in Electrical Engineering from the University of Maryland.

Arnie Neidhardt is currently a Senior Research Scientist at Telcordia Applied Research. He has been with Telcordia Technologies since 1984. His areas of interest include network management, performance analysis, and traffic modeling, and he has published extensively in these areas. He received his B.S. from Purdue University, and his M.A. and Ph.D. from the University of Wisconsin, all in mathematics.

Brian Coan is currently a Director of the Distributed Computing Research Group at Telcordia Applied Research. He has been affiliated with Telcordia Technologies (and previously with Bell Laboratories) continuously since 1978. His current work concentrates on providing resilient networking and information services in adverse environments, possibly caused by cyber attacks, for the U.S. Army FCS program. He has degrees in Computer Science from Princeton (B.S.E.), Stanford (M.S.), and MIT (Ph.D.).

Kirthika Parmeswaran is currently a Research Scientist at Telcordia Applied Research. Her current areas of interest include policy management and security for wired and wireless tactical networks, real-time middleware, and network QoS. She has a B.E. in Computer Engineering from Pune Institute of Computer Technology (PICT), India, and an M.S. in Computer Science from Washington University in St. Louis.

Allen McIntosh is currently a Senior Scientist at Telcordia Applied Research, and has been with Telcordia since 1987. His research interests include statistical computing, linear models, and large datasets. He received his Ph.D. in Statistics from the University of Toronto.

Frederick (Rick) Porter is currently a Senior Scientist in Telcordia Applied Research. He has been with Telcordia Technologies since 1984. His current areas of interest include middleware and defense against cyber attacks. He has degrees in Electrical Engineering from Cornell (B.S.) and Stanford (M.S.).