Scalable QoS-based Resource Allocation

A Dissertation Submitted to the Graduate Education Committee at the Department of Electrical and Computer Engineering, Carnegie Mellon University, in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Electrical and Computer Engineering

by Sourav Ghosh

Committee members:
Advisor: Prof. Ragunathan (Raj) Rajkumar
Dr. Jeffery Hansen
Prof. John Lehoczky
Prof. Dan Siewiorek

Pittsburgh, Pennsylvania
August, 2004

Copyright © 2004 Sourav Ghosh

This research was supported by the DoD Multidisciplinary University Research Initiative (MURI) program administered by the Office of Naval Research (ONR) under Grant N00014-01-1-0576 and in part by the Defense Advanced Research Projects Agency (DARPA). The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies or endorsements, either expressed or implied, of DoD, ONR or DARPA.
queueing (WF²Q) [7], etc. Stoica et al. presented core-stateless fair queueing (CSFQ), which makes the fair-queueing principle scalable to large networks where per-flow management becomes intractable [79]. CSFQ differentiates between core and edge routers: edge routers perform per-flow management, while core routers avoid it by operating on flow aggregates instead. There are also other contributions that do not follow the fluid-flow model, such as fair Shortest Remaining Processing Time (fair-SRPT) scheduling [55], quantized EDF scheduling [39] and deadline-monotonic packet scheduling [28].
Utility-based QoS
Shenker first suggested the use of utility functions for modeling QoS in networks [72]. Bharghavan et al. adopt a similar utility-function-based QoS optimization method in the wireless environment [26]. They consider throughput, fairness, delay and loss as their system-wide QoS parameters. Similar to Q-RAM, they associate concave and continuous utility functions with QoS parameters, and they maximize the global utility of the system by allocating channel bandwidth to applications subject to a channel capacity constraint. They build their adaptive algorithm around specific utility functions such as U(r) = log(r). Unlike Q-RAM, their algorithm relies on the choice of a specific utility function.¹
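As a concrete illustration (a sketch, not code from [26]): with utility U(r) = log(r) and a single channel-capacity constraint, maximizing the weighted sum Σᵢ wᵢ log(rᵢ) subject to Σᵢ rᵢ ≤ C has the closed-form solution rᵢ = wᵢC/Σw (weighted proportional sharing), which follows from the KKT conditions. The weights and capacity below are hypothetical.

```python
def allocate_log_utility(weights, capacity):
    """Maximize sum(w_i * log(r_i)) subject to sum(r_i) <= capacity.
    The KKT conditions give the closed form r_i = w_i * C / sum(w)."""
    total = sum(weights)
    return [w * capacity / total for w in weights]

# Three flows sharing a 100-unit channel; the middle flow has double weight.
print(allocate_log_utility([1.0, 2.0, 1.0], 100.0))  # [25.0, 50.0, 25.0]
```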
1.3.3 QoS and Radar
As mentioned earlier, there are many real-time systems where the physical environment plays a key role in determining the QoS of applications. Because of the dynamic nature of the environment, QoS-based resource management has to adapt to changes in environmental factors such as temperature and noise. Consequently, a whole range of resource constraints, such as power and energy, comes into play. A radar system is a classic example of such a system.
¹ More details related to network QoS are described in Chapter 4.

Many recent studies have focused on phased-array radar systems. The focus has primarily been on performing schedulability analysis of radar tasks for their given execution
times. For example, Kuo et al. proposed a reservation-based approach for real-time radar
scheduling [48]. This approach allows the system to guarantee the performance requirement
when the schedulability condition holds. However, they do not consider energy constraints.
Shih et al. use a template-based scheduling algorithm in which a set of templates is constructed offline, and tasks are fitted into the templates at run-time [74, 73]. The templates
consider both the timing and power constraints. They also consider interleaving of dwells
that allow beam transmissions (or receptions) on one target to be interleaved with beam
transmissions and receptions on another. The space requirements of templates limit the
number of templates that can be used, and “service classes” designed offline determine how
QoS operating points are assigned to discrete sets of task configurations across an expected
operating range. Goddard et al. addressed real-time back-end scheduling of radar tracking algorithms using a data-flow model [33]. Our work in radar QoS optimization is most similar to that of Jha et al. [67], who use their adaptive QoS middleware framework (mentioned in Section 1.3.1) for QoS-based resource allocation and schedulability analysis in radar systems.
1.3.4 QoS and Embedded Systems
There has been comparatively little work on QoS in distributed embedded systems, which are mostly binary control systems. Abdelzaher et al. first introduced the notion of QoS in such systems [1]. They developed a negotiation model that adjusts the QoS levels of applications in real-time while maximizing the application-perceived system utility, or reward. They incorporated the proposed QoS mechanism into a middleware service called "RTPOOL". It uses a QoS optimization heuristic that starts with the maximum QoS of all tasks, and then repeatedly lowers the QoS of the task whose drop in reward is smallest at the next lower QoS level. Next, they introduce a distributed QoS-optimization protocol in which the hosts negotiate with each other and share the load based on the reward of accepting a
task of a certain reward level. Based on that work, Sanfridson introduces the concept of
integrating QoS with a feedback control mechanism for automotive systems [70].
1.4 Organization of this Dissertation
We organize this dissertation as follows.
In Chapter 2, we describe our generic model of QoS and distributed systems. In Chap-
ter 3, we describe our resource allocation algorithms for large multiprocessor systems. In
Chapter 4, we describe our distributed resource allocation scheme for large hierarchical
networks with large numbers of resources and tasks where each task requires many re-
sources. In Chapter 5, we describe an integrated resource allocation and scheduling model
for a Phased Array Radar System as a dynamic scalable real-time system with many dif-
ferent constraints. In Chapter 6, we describe QoS-based resource allocation in distributed
embedded systems. Finally, in Chapter 7, we summarize our research contributions and
discuss future work.
Chapter 2
System Model
2.1 Introduction
In this chapter, we describe a generic model of distributed systems that we use throughout
this dissertation. A distributed system consists of multiple tasks and multiple resources. A
task executes on the system by using the resources. An end-user derives a benefit or utility
from the system due to the execution of these tasks.
This chapter is divided into three parts. First, we define the terms task and resource and discuss their interactions. Second, we elaborate on our mathematical model of a distributed system. Finally, we describe our basic optimization algorithms [31] that maximize the accrued utility of the end-user while allocating resources to tasks.
2.1.1 Definitions: Tasks and Resources
In computer systems, a task is a basic unit of programming that an operating system
controls. Depending on how the operating system defines a task in its design, this unit
of programming may be an entire program or each successive invocation of a program.
A task is considered to be a container that holds a set of instantiating objects known as
threads, as in the case of Mach, Mach-type operating systems and Linux. In BSD Unix-like
environments, however, the word process is used instead of task. When multiple processes
work in the same “context” (address space) of a task, we refer to them as threads.
In this dissertation, we refer to a task as an application that provides a service to the
end-user. It can be a video conferencing task, or a tracking task that tracks a target using a
phased array radar. In its implementation, it consists of one or more processes or threads.
A resource, on the other hand, is defined as a source of aid or supply that can be drawn
upon when needed. Tasks need resources to be executed. Furthermore, a resource is a
measurable entity that has a finite supply. The major computer system resource categories
are processor cycles, network bandwidth, memory, and disk space. Embedded systems may
have other resources. For example, in a radar system, resources include antenna bandwidth
and antenna power.
Resources can be classified into two main categories: (1) time-shared resources, and (2) spatial resources. Each is discussed in the following subsections.
2.1.2 Time-shared resources
A resource is time-shared when at a given instant, only one task receives the entire supply
of a resource, while other tasks that require it receive none of it. Processor (CPU) cycles,
and network bandwidth are time-shared resources. We can express a time-shared resource as a system-wide supply of an amount ∆R available in every small time unit ∆T, such that the rate of supply can be expressed as:

    r(t) = ∆R/∆T.    (2.1)

A task τi can specify its requirement as a total share Ci over a time span Di. In this case, we can express the task's average usage rate as Ci/Di.
If a task is periodic, its requirement can be expressed as Ci units of resource in every period of Ti time units.¹ The resource requirement of a periodic task can then be expressed as a rate by:

    si(t) = Ci/Ti.    (2.2)

¹ Periodic and aperiodic tasks are described in [59] in more detail.
For n periodic tasks, Liu and Layland [16] introduced a fixed-priority scheduling scheme in which the scheduling priority of a task is inversely proportional to its period, and a higher-priority task can instantly preempt a lower-priority task with no context-switching overhead. This is known as the rate-monotonic scheduling (RMS) algorithm. They proved that each task τi obtains its share of Ci units in every period Ti if

    (1/r(t)) Σ_{i=1}^{n} si(t) ≤ n(2^{1/n} − 1).    (2.3)

The quantity n(2^{1/n} − 1) is the least upper bound on the utilization of a time-shared resource under RMS. In other words, Equation (2.3) is only a sufficient condition, not a necessary one; average-case behavior permits a much higher utilization under RMS than the bound in Equation (2.3).
As n → ∞, this bound approaches ln 2 ≈ 0.693. In addition, if a task has non-preemptive regions during its resource usage, it causes blocking of higher-priority tasks. This is known as priority inversion [71, 65]: a low-priority task holds a resource that a high-priority task is waiting for. Denoting the blocking time by Bi, Equation (2.3) becomes:

    (1/r(t)) Σ_{i=1}^{n} (Ci + Bi)/Ti ≤ n(2^{1/n} − 1).    (2.4)

Thus, the effective least upper bound on the utilization (1/r(t)) Σ_{i=1}^{n} Ci/Ti is reduced below the least upper bound of 0.693. However, as mentioned before, this is a pathological case.
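The sufficient test of Equation (2.4) is easy to check programmatically. The sketch below (with hypothetical task parameters) evaluates the Liu-Layland bound and the schedulability condition:

```python
import math

def rms_bound(n):
    """Liu-Layland least upper bound n(2^(1/n) - 1) from Eq. (2.3)."""
    return n * (2 ** (1.0 / n) - 1)

def rms_schedulable(tasks, rate=1.0):
    """Sufficient (not necessary) test per Eq. (2.4).
    tasks: list of (C_i, B_i, T_i) tuples; rate is r(t)."""
    utilization = sum((c + b) / t for c, b, t in tasks) / rate
    return utilization <= rms_bound(len(tasks))

print(round(rms_bound(2), 3))                    # 0.828
print(round(math.log(2), 3))                     # 0.693, the limit as n grows
print(rms_schedulable([(1, 0, 4), (2, 0, 8)]))   # True: utilization 0.5
```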
If the periods of the tasks are harmonic, then the utilization bound for the rate-monotonic scheduling algorithm is 1.0. Hence, in the special case where all tasks have the same constant small period T and the context-switching cost is zero, Equation (2.4) reduces to a Generalized Processor Sharing (GPS) [63] model, which is a special case of the rate-monotonic model with harmonic periods. In this case, the task-set is schedulable when:

    (1/r(t)) Σ_{i=1}^{n} Ci/T = 1.    (2.5)
2.1.3 Spatial resources
A spatial resource can be shared by multiple tasks simultaneously. Disk space is a good
example of a spatial resource. Each task requires a certain amount of disk-space to store
its data and instructions. A memory buffer is also a spatial resource. If R is the total size
of a spatial resource at any time t and Si is the demand made by task i for that resource,
then

    Σ_{i=1}^{n} Si ≤ R    (2.6)

is the constraint on the resource demands.
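Equation (2.6) yields a one-line admission test for a spatial resource; a minimal sketch with hypothetical demands:

```python
def admit_spatial(demands, capacity):
    """Eq. (2.6): the task set fits iff total demand stays within supply R."""
    return sum(demands) <= capacity

print(admit_spatial([10, 25, 40], 100))  # True: 75 <= 100
print(admit_spatial([60, 50], 100))      # False: 110 > 100
```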
In addition, there are other resources such as memory that can be divided spatially into
multiple time-shared resources. We will discuss this as future work in Chapter 7 of this
dissertation. In the next section, we will discuss our resource allocation model.
2.2 QoS and Resource Allocation
In our QoS optimization model, each task is assumed to have multiple QoS settings, each
of which provides a different quality level to the user. Each setting is associated with
certain resource levels. We employ a modified version of the existing QoS-Based Resource
Allocation Model (Q-RAM) [49, 51, 50, 66] as the basic building block of the optimization
process. Our model determines the near optimal quality levels of each task and apportions
the available resources to them. We assume a simple model of resources where each resource
can be divided among the tasks, either in a time-shared or in a spatial manner. In the case
of a time-shared resource, we limit the total allocable amount by its exact schedulability
bound or an approximate utilization bound.
As a generic model, let us consider a distributed system with m shared resources
r1, . . . , rm. Resources can be of any type including CPU, memory, link bandwidth, or
even radar bandwidth in the case of a radar tracking application. We use the term Re-
source Vector to describe a set of resource units (e.g., a processor of certain frequency,
a network link of certain bandwidth) in a multi-resource environment. For example, the
resource vector ~R^max = (r^max_1, . . . , r^max_m) denotes the capacity of the individual resources.
The resources are shared by a set of n independent tasks τ1, . . . , τn. Each task is as-
sumed to have a set of parameters that can be changed to configure its quality levels and
resource demands. We commonly refer to these parameters as dimensions. They are classified into two main categories: operational dimensions and environmental dimensions. From the user's perspective, however, there is only one type of dimension, known as QoS dimensions. We discuss each of these in detail next.
2.2.1 Operational Dimensions
Operational dimensions are the control knobs that are directly controlled by the user or
the system administrator. Values of these dimensions determine the resource allocation of
the application and hence directly or indirectly influence its quality. The choice of a coding
scheme for video conferencing, and the choice of a route for a networked application between
its source and destination are examples of operational dimensions.
Operational Space: This is defined as the set of operational points, as shown for task τi in Equation (2.7), where Φij is the jth operational dimension and N_Φi is the number of operational dimensions:

    Φi = Φi1 × · · · × Φi,N_Φi.    (2.7)
Operational Indices: An index in {1, 2, . . . , |Φij|} enumerating the possible values of operational dimension j is called an operational index. Operational dimensions can be of two types: monotonic and non-monotonic.
Monotonic Operational Dimensions: The value of this type of dimension is directly
or inversely related to the utility of the task. In other words, increasing values along this
dimension either increases or decreases utility. For example, increasing the frequency of a
tracking task in radar increases the quality of tracking.
Non-Monotonic Operational Dimensions: The value of this dimension is not directly
or inversely related to the utility of a task. An example is the selection of a video coding
algorithm for a video task. There may be multiple types of video coding algorithms, but it
may not be possible to sort them in the increasing or decreasing order of utility.
Next, we will introduce another type of dimension that affects the QoS of tasks and
hence the utility, but is not in the direct control of the user or the system administrator.
2.2.2 Environmental Dimensions
The quality obtained by a task may even depend on factors in the environment in addition
to the operational settings. For example, the quality of a video conferencing task in a
wireless medium can depend not only on the strength of the wireless signal received at the
receiver, but also on factors such as environmental noise. Noise is therefore an example of an environmental dimension.
Environmental Space: This is defined as the set of environmental points, as shown for task τi in Equation (2.8), where Θij is the jth environmental dimension and N_Θi is the number of environmental dimensions:

    Θi = Θi1 × · · · × Θi,N_Θi.    (2.8)
Environmental Indices: An index in {1, 2, . . . , |Θij|} enumerating the possible values of environmental dimension j is called an environmental index.
Different values of the operational and environmental dimensions yield different QoS settings for a task. Next, we discuss the dimensions that are of direct relevance to the end-user and that provide QoS to the end-user.
2.2.3 QoS Dimensions
The dimensions that are of direct relevance to the user are known as QoS dimensions. For
example, the frame rate of a video-conferencing task and the tracking precision of a radar
tracking task are QoS dimensions. A higher value along a QoS dimension generally requires
higher resource levels.
QoS dimensions are derived from operational and environmental dimensions. A QoS dimension can also be the same as a monotonic operational dimension. For example, the frame rate of a video-conferencing task is an operational dimension (a controllable knob) that is also a QoS dimension.
Users derive satisfaction, or utility, from the values of the QoS dimensions. The higher the value along a QoS dimension, the higher the utility to the user. For example, a
higher frame-rate in a video-conferencing application provides a higher utility to the user.
The value of the utility along different QoS dimensions depends on the task, and perhaps
the user.
In the context of QoS dimensions, we use the following terms from [49].
Quality Space: This is defined as a set of quality points, as given by:

    Qi = Qi1 × · · · × Qi,N_Qi,    (2.9)

for task τi, where Qij is the jth QoS dimension and N_Qi is the number of QoS dimensions.
Figure 2.1: Dimensions and Their Relations (the system-centric operational and environmental dimensions determine the resource requirements and map to the user-centric QoS dimensions, which in turn determine utility)
Quality Indices: An index in {1, 2, . . . , |Qij|} enumerating the quality levels for dimension j, arranged in increasing order of quality, is called a quality index.
Dimension-wise Utility: This is the utility associated with a particular quality level of a QoS dimension. In other words, it is defined as the mapping uij : Qij → ℝ representing the utility achieved by assigning quality level qij to dimension Qij.
Application Utility: It is normally expressed as the weighted sum of dimension-wise utilities across all QoS dimensions as a mapping ui : Qi → ℝ.
For example, if an application has two QoS dimensions, a particular QoS setting is denoted by (qj1, qk2), where j and k are the quality indices of the respective QoS dimensions. The utility of the application at this QoS setting is expressed as (w1 uj1 + w2 uk2), where w1 and w2 are the respective weights of the two dimensions.
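The weighted sum above can be sketched directly. The weights (0.4, 0.6) and the three service levels per dimension follow the video-conferencing example discussed later in this chapter, while the per-level utility values below are illustrative assumptions:

```python
# Hypothetical per-level utilities for a two-dimension video task.
frame_rate_utility = {1: 0.2, 2: 0.7, 3: 1.0}   # 10, 20, 30 frames/sec
resolution_utility = {1: 0.3, 2: 0.8, 3: 1.0}
w1, w2 = 0.4, 0.6

def application_utility(qj1, qk2):
    """Weighted sum of dimension-wise utilities: w1*u_j1 + w2*u_k2."""
    return w1 * frame_rate_utility[qj1] + w2 * resolution_utility[qk2]

print(round(application_utility(3, 2), 2))  # 0.88 = 0.4*1.0 + 0.6*0.8
```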
Based on the operational and environmental dimensions, we generate the different operating points of a task. We refer to them as set-points.
Definition 2.2.1 (Set-point). A set-point is an operating point of the task. It consists of a particular value of each of the task's operational and environmental dimensions, together with a utility value.
2.2.4 Set-point Generation
Set-points are generated by creating a QoS Profile and a Resource Profile [49].
QoS Profile Generation
The QoS Profile consists of different QoS levels of the task and the values of the corre-
sponding utilities. For some tasks, the operational dimensions and QoS dimensions may
be equivalent and there may be no environmental dimensions, but in general we say that
there is a Quality Function fqi : Φi ×Θi → Qi mapping each point in the cross product of
the operational space and environment space to a point in the quality space. The relation
between operational, environmental and QoS dimensions is illustrated in Figure 2.1.
Resource Profile Generation
In order for a task to operate at a particular set-point φi, it requires resources. We define a function gi : Φi → ~Ri specifying the amount of resources required for the task to operate at each set-point, where ~Ri = (r_i1, . . . , r_im) is defined as the Resource Vector describing the resource requirements of the task at that set-point. Apart from its resource requirement, a task may also have a deployment constraint, which is given by a non-monotonic operational dimension. For example, in a networked system, if a task requires bandwidth between a source and a destination, the multiple choices of paths belong to a non-monotonic operational dimension [31].
For each task, all QoS dimensions Qij must satisfy the condition

    ∂r_k/∂q_ij ≥ 0,  ∀k ∈ {1, . . . , m},    (2.10)
where rk denotes the kth resource. That is, an increase in any quality index value never results in a decrease in any resource requirement. Set-points that do not satisfy this condition can be dropped from consideration, because other set-points yield higher QoS with fewer resources. The same condition applies to monotonic operational dimensions.
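This pruning rule amounts to a Pareto filter over (utility, resource-vector) pairs; the set-points and values in the sketch below are hypothetical:

```python
def prune_dominated(setpoints):
    """Keep only set-points not dominated in the sense of Eq. (2.10): drop a
    point if some other point offers at least as much utility using no more
    of any resource (strictly better in utility or in some resource)."""
    def dominates(a, b):
        (ua, ra), (ub, rb) = a, b
        return (ua >= ub
                and all(x <= y for x, y in zip(ra, rb))
                and (ua > ub or any(x < y for x, y in zip(ra, rb))))
    return [p for p in setpoints if not any(dominates(q, p) for q in setpoints)]

# (utility, (cpu, bandwidth)) pairs; the middle point wastes resources.
pts = [(0.5, (2, 1)), (0.5, (3, 2)), (0.9, (4, 4))]
print(prune_dominated(pts))  # [(0.5, (2, 1)), (0.9, (4, 4))]
```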
However, for non-monotonic operational dimensions, the conditions are given by:

    ∃k ∈ {1, . . . , m}: ∂r_k/∂φ_ij < 0,    (2.11)
    ∃k′ ∈ {1, . . . , m}, k′ ≠ k: ∂r_k′/∂φ_ij > 0.    (2.12)
These equations indicate that switching from one "resource configuration" to another removes resource demand from one or more resource elements and adds it to one or more other resource elements.
2.2.5 Example Application
As an example, consider a video conference application with QoS and operational dimensions
as shown in Table 2.1. There are two monotonic operational dimensions that have one-to-
one correspondence with QoS dimensions: frame rate and resolution. They are assumed
to have weights 0.4 and 0.6 respectively. The weights represent the relative importance of
the QoS dimensions from the user’s perspective. For frame rate, there are three possible
levels of service at 10 frames/sec, 20 frames/sec and 30 frames/sec. A quality index is
associated with each of these service levels with 1 for the lowest level of service, and 3 for
the highest level of service. The user of the application has assigned utility values to each
of these levels of service indicating the relative desirability of these service levels. Similar
quality index and utility values are assigned for various resolutions.
In addition to the monotonic operational dimensions, there are also two non-monotonic
operational dimensions. The first operational dimension is the format, or codec, to use for
[Table 2.1: QoS/Monotonic Operational Dimensions (weight), Levels, Quality/Monotonic Operational Index, Utility]

2.3 Existing Optimization Algorithm
In this section, we briefly describe the optimization technique presented in [51, 49]. We
denote the number of tasks by n and the number of resources by m. Let Ci represent the
set of utility-resource pairs for task τi, as shown:
    Ci = 〈(u_i1, ~R_i1), . . . , (u_iki, ~R_iki)〉.    (2.13)
Next, we would like to determine and compare the costs of the resource vectors in order
to choose one which gives higher utility at a lower cost. When there is a single resource in
the system, the cost of a set-point is simply equal to its resource amount. When there are
multiple resources, a scalar metric known as compound resource is computed.
To compute the compound resource, we first compute a penalty vector ~P = (p_1, . . . , p_m) (assuming we have m resources) that assigns a "price" to each resource. The value of an element in the vector is directly related to the overall demand on the corresponding resource, and is defined as:

    p_k = r^sum_k / r^max_k + 1,    (2.14)
where r^sum_k is computed as the sum of the kth resource elements of all the set-points of all the tasks, as given by:

    r^sum_k = Σ_{all tasks} Σ_{all set-points} r_jk.    (2.15)
The compound resource h is a scalar metric defined for each set-point by:

    h = √((r_1 p_1)² + · · · + (r_m p_m)²).    (2.16)
The metric h is used to compare the relative cost of each of the resource combinations. We
now augment Ci by adding h to get:
    Cic = 〈(u_i1, ~R_i1, h_i1), . . . , (u_iki, ~R_iki, h_iki)〉.    (2.17)
Cic is called a compound resource vector. We use the parameters in Cic to determine the
near-optimal resource allocation for tasks that maximizes the global utility value. The
algorithm is called Approximate Multiple Resource Multiple Dimension or AMRMD1 [51]. It
is briefly presented in Algorithm 1.
    input : profiles of tasks
    output: resource allocation of tasks by maximizing utility
    Calculate initial penalty vector;
    for iter = 0 to max_iter do  // max_iter is usually set to 3
        for all tasks i do
            Generate compound resource Cic for each task τi;
            Perform concave majorant optimization [51] on Cic;
        end
        Create slope list by merging set-points of all Cics based on their slopes;
        Go through the entire slope list and enter/update the resource allocation of the tasks;
        Update penalty vector from the usage of the individual resources;
        if the utility in the previous iteration differs from this utility by a small fraction ε then
            Break from the loop;
        end
    end
    Finalize resource allocations of the tasks;

Algorithm 1: Basic "AMRMD1" algorithm

This algorithm computes the compound resource vector for each task. The procedure concave_majorant() retains the points in Cic falling along the line of highest slope. The slope of the utility function at a set-point j is defined by:

    slope(j) = (u(j) − u(j−1)) / (h(j) − h(j−1)),    (2.18)
where h(j) and u(j) are the compound resource and the utility at the set-point j respectively.
This is also known as the marginal utility.
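Equations (2.14) through (2.18) can be sketched in a few lines. The task profile below is hypothetical, and set-points are represented as (utility, resource-vector) pairs sorted by increasing utility:

```python
import math

def penalty_vector(all_profiles, r_max):
    """Eqs. (2.14)/(2.15): p_k = r_sum_k / r_max_k + 1, where r_sum_k sums
    the k-th resource element over every set-point of every task."""
    m = len(r_max)
    r_sum = [0.0] * m
    for profile in all_profiles:
        for _, r in profile:
            for k in range(m):
                r_sum[k] += r[k]
    return [r_sum[k] / r_max[k] + 1.0 for k in range(m)]

def compound_resource(r, p):
    """Eq. (2.16): h = sqrt(sum_k (r_k * p_k)^2)."""
    return math.sqrt(sum((rk * pk) ** 2 for rk, pk in zip(r, p)))

def slopes(profile, p):
    """Eq. (2.18): marginal utility between consecutive set-points."""
    pts = [(u, compound_resource(r, p)) for u, r in profile]
    return [(u2 - u1) / (h2 - h1) for (u1, h1), (u2, h2) in zip(pts, pts[1:])]

# One task with three set-points over two resources of capacity 10 each.
profiles = [[(0.2, (1, 0)), (0.6, (2, 1)), (0.9, (4, 2))]]
p = penalty_vector(profiles, (10, 10))
print(p)                                              # [1.7, 1.3]
print([round(s, 3) for s in slopes(profiles[0], p)])  # [0.206, 0.082]
```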
2.3.2 Drawbacks of the AMRMD1 Algorithm
There are two problems in applying the above algorithm in a multi-resource environment. We describe them in order of importance.
Static Penalty Vector Computation
The AMRMD1 algorithm statically computes the “penalty” vector. It is determined based on
the aggregate potential demand placed on a resource, and penalizes the choices of resources
that are perceived to be heavily loaded in favor of the less loaded resources. The aggregate
is determined by summing the resource requirements of all set-points of all the tasks. In
a true sense, the computation of the penalty vector should reflect the real usage of the
resources at any given point in time during resource allocation. In other words, the penalty
vector should be computed dynamically each time a set-point gets admitted, based on the
quality points that have already been admitted into the system so far. This is particularly
true for a large distributed system where a task can have multiple values of its operational
dimensions in terms of its resource trade-offs. Adding all possible resource trade-off values will unnecessarily create heavy penalties for small resources. If the dynamic computation is to be avoided for complexity reasons², we need a smarter way of evaluating the penalty vector that does not unnecessarily penalize small resources.
Neglecting Co-located Points
Even after using the static penalty vector computation, there can still be many set-points
that have the same values of utility and compound resource but different resource vectors
(or resource combinations). These set-points are known as co-located set-points.
There can be multiple co-located set-points, and keeping only one of them can be potentially sub-optimal. However, while determining the concave majorant, the AMRMD1 algorithm will choose only one of those co-located points (whichever appears first in the list) and eliminate the others completely from consideration. This decision may not be the best one
simply because during the course of the resource allocation process, one point may be in-
feasible while another co-located point with the same utility may be feasible. This depends
on the status of the current allocation of resources. As a result, AMRMD1 may stop allocating

² We discuss the complexity of dynamic penalty vector computation in Section 2.4.1.

[Table 2.5: AMRMD1 Resource Allocation for Unequal Processors]
In short, the original AMRMD1 algorithm can clearly lead to sub-optimal solutions. In
the next two sections, we will discuss two new algorithms that attempt to overcome these
limitations of AMRMD1.
2.4 Enhanced Optimization Algorithms
We now describe two new algorithms that address the limitations of the AMRMD1 algorithm
described earlier.
2.4.1 Dynamic Penalty Vector (AMRMD DP)
In this algorithm, we compute the penalty vector dynamically as we assign set-points for
the tasks. It works as follows.
    input : profiles of tasks
    output: resource allocation of tasks by maximizing utility using dynamic penalty vectors
    Calculate initial penalty vector;
    while number of set-points of all tasks is more than 1 and resources are available do
        Create sorted slope list by merging all set-points of the tasks based on their slopes;
        Allocate set-point of highest slope/marginal utility;
        Eliminate the set-points of the task with the same or lower utilities;
        Recompute penalty vector based on the available resources;
        Update compound resources of the remaining set-points of the tasks;
    end
    Finalize resource allocations of the tasks;

Algorithm 2: Basic "AMRMD DP" algorithm
First, it creates the Ci lists. Without performing the concave majorant operation, it computes the marginal utility as the slope of the compound resource/utility curve. Next, it selects the point with the highest marginal utility to be allocated. If the allocation is successful, it updates the penalty vector. This step requires an update to the compound resource parameters of all tasks containing the remaining set-points. Thus, the set-points migrate from one location to another in the compound resource-utility space as the algorithm progresses. It repeats this procedure until all the set-points of all the tasks, or the resources, are exhausted.

Figure 2.5: AMRMD DP Algorithm (utility vs. compound resource: each update of the penalty vector shifts the remaining set-points, leaving fewer points as allocation proceeds)
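This loop can be sketched compactly. The code below is an illustrative reading of Algorithm 2, not the thesis implementation: the dynamic penalty here grows linearly with the fraction of each resource already consumed, and the marginal utility of an upgrade is priced on its incremental demand:

```python
import math

def compound(r, p):
    """Eq. (2.16) applied to an incremental resource demand."""
    return math.sqrt(sum((rk * pk) ** 2 for rk, pk in zip(r, p)))

def amrmd_dp(profiles, r_max):
    """Greedy sketch of the dynamic-penalty loop. Each profile is a list of
    (utility, resource-vector) set-points sorted by increasing utility."""
    remaining = list(r_max)
    level = [-1] * len(profiles)        # -1 means nothing allocated yet
    granted = [None] * len(profiles)    # resources currently held per task
    while True:
        # Penalty grows with the fraction of each resource already consumed.
        p = [(cap - rem) / cap + 1.0 for rem, cap in zip(remaining, r_max)]
        best, best_slope = None, 0.0
        for i, prof in enumerate(profiles):
            j = level[i] + 1
            if j >= len(prof):
                continue                # task already at its top set-point
            u_new, r_new = prof[j]
            u_old = prof[level[i]][0] if level[i] >= 0 else 0.0
            held = granted[i] or [0] * len(r_max)
            extra = [a - b for a, b in zip(r_new, held)]
            if any(e > rem for e, rem in zip(extra, remaining)):
                continue                # upgrade is infeasible
            slope = (u_new - u_old) / (compound(extra, p) or 1e-12)
            if slope > best_slope:
                best, best_slope = i, slope
        if best is None:
            break
        j = level[best] + 1
        held = granted[best] or [0] * len(r_max)
        extra = [a - b for a, b in zip(profiles[best][j][1], held)]
        remaining = [rem - e for rem, e in zip(remaining, extra)]
        granted[best] = list(profiles[best][j][1])
        level[best] = j
    return level

# Two tasks competing for a single 3-unit resource.
profiles = [[(0.5, (1,)), (0.8, (2,))], [(0.4, (1,))]]
print(amrmd_dp(profiles, (3,)))  # [1, 0]: both served, task 0 upgraded
```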
Complexity of AMRMD DP
The asymptotic computational complexity of AMRMD DP is as follows. The initial computation of the penalty vector takes O(nL) operations, where n is the number of tasks and L is the maximum number of set-points per task. Within the loop, the procedure for updating the compound resources takes O(nL) operations, the procedure for selecting a set-point takes O(nL) operations and the procedure for adjusting the penalty takes O(nL) operations. The loop can repeat nL times in the worst case. This yields a total complexity of O(n²L²). Therefore, this algorithm has a higher complexity than AMRMD1, whose complexity is O(nL log(nL)); however, unlike AMRMD1, AMRMD DP yields the optimal solutions for both the examples discussed in Section 2.3.2.
2.4.2 Co-mapping of Quality Points (AMRMD CM)
Figure 2.6: AMRMD CM Algorithm. (Utility vs. compound resource in three steps: (1) co-located points are gathered; (2) the K list is generated and the concave majorant is performed, eliminating some points; (3) the essential co-located points eliminated by the concave majorant are brought back.)
The AMRMD CM algorithm explicitly keeps track of co-located quality points, and performs both the penalty vector and concave majorant computations in ways that differ from AMRMD1.
Penalty Vector Computation
Similar to AMRMD1, the AMRMD CM algorithm also evaluates the penalty vector statically. However, its computation differs from that of AMRMD1. In AMRMD1, all the resource deployment options are added together to determine the potential resource demand. However, we know that only one out of multiple resource options needs to be selected for a task, and each resource option may not be equally likely to be selected. Therefore, we would like to include the likelihood of selection of a resource option while computing the penalty vector of resources.
Let us consider the likelihood of a particular resource trade-off being chosen. At a particular utility value, let us denote the resource vector of the jth trade-off by R⃗_j = (r_j1, . . . , r_jm), where r_jk, r_k^max and m denote the demand of the kth resource, the capacity of the kth resource and the number of resources in the resource vector, respectively. In this context, we define the following terms.

Definition 2.4.1 (Bottleneck Resource). At a given utility level, the kth resource is said to be the bottleneck resource of the resource vector corresponding to the jth trade-off if

    r_jk / r_k^max ≥ r_jl / r_l^max,  ∀ 1 ≤ l ≤ m.    (2.19)

Definition 2.4.2 (Bottleneck Factor). At a given utility level, for the jth trade-off, if the kth resource is the bottleneck resource, then the factor β_j = r_k^max / r_jk is defined as the Bottleneck Factor of the jth trade-off.

Definition 2.4.3 (Selection Factor). The selection factor of the jth trade-off at a fixed utility level is given by:

    ρ_j = β_j / Σ_{i=1}^{N_T} β_i,    (2.20)

where N_T denotes the number of elements along the trade-off dimension.
Using Definition 2.4.3, at a given utility level, multiple resource allocations are weighed
based on the values of their Selection Factors in order to evaluate their demands. For
example, if a task is allocable to 2 processors of unequal capacities, the selection factor of
the larger processor is higher. However, if the task is allocable to only one processor, the
selection factor of that processor is 1 while that of the other one is 0. We evaluate the
demand of resources by modifying Equation 2.15 as given by:
    r_i^sum = Σ_{all n tasks} Σ_{all L set-points} ρ_j r_i.    (2.21)
Next, we compute the penalty vector using Equation (2.14) and consequently derive
compound resource for the set-points using Equation (2.16).
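A minimal sketch of Definitions 2.4.1–2.4.3 is shown below: the bottleneck factor of each trade-off and the resulting selection factors. Variable names are illustrative, not from the dissertation.

```python
# Sketch of the bottleneck/selection-factor computation (Defs. 2.4.1-2.4.3).

def bottleneck_factor(rvec, capacity):
    """beta_j = r_k^max / r_jk, where k is the bottleneck resource
    (the resource with the highest demand-to-capacity ratio)."""
    k = max(range(len(rvec)), key=lambda i: rvec[i] / capacity[i])
    return capacity[k] / rvec[k]

def selection_factors(tradeoffs, capacity):
    """rho_j = beta_j / sum_i beta_i  (Equation 2.20)."""
    betas = [bottleneck_factor(rvec, capacity) for rvec in tradeoffs]
    total = sum(betas)
    return [b / total for b in betas]
```

For a task that needs 10 units on either of two processors of capacities 100 and 50, `selection_factors([[10, 0], [0, 10]], [100, 50])` gives (2/3, 1/3): the larger processor receives the higher selection factor, as in the example in the text.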
Concave Majorant Computation
Similar to AMRMD1, the AMRMD CM algorithm performs the concave majorant procedure to retain only the necessary set-points. However, unlike AMRMD1, it retains all the co-located
set-points which yield the same utility values with the same compound resource values but
with different resource vectors.
From the Cic lists, we create another compound resource list Ki by including only those elements of Cic that have distinct values of compound resource (h). In other words, if two or more elements in Cic have the same value³ for h but different resource vectors r, then they are mapped to a single element in the Ki list. This is called “co-mapping” of set-points. Each element in Ki also stores the indices of the corresponding elements in Cic.
AMRMD CM then performs the concave majorant operation on the Ki list instead of the
Cic list, and maintains the set of co-located points of the same utility if they lie on the
concave majorant. It attempts to allocate one of the co-located points of a task if a point is infeasible due to resource constraints, and allocation continues until all the points of all the tasks are exhausted.³

³In order to account for floating-point precision issues, we consider two points co-located when the fractional difference between their compound resource values is less than a small fraction ε, which is typically set to 0.1.

input : profiles of tasks
output: resource allocation of tasks by maximizing utility
Calculate initial penalty vector;
for iter = 0 to max_iter do    // max_iter is usually set to 3
    for all tasks i do
        Generate compound resource Cic for each task τi;
        Generate new list Ki where multiple co-located points in Cic are mapped to a single point in Ki;
        Perform concave majorant optimization [51] on Ki;
        Retain the corresponding set-points of Cic that map to the remaining set-points in Ki in terms of compound resources and utilities, and discard the rest;
    end
    Create slope list by merging set-points of all Cic's of all tasks based on their slopes;
    Go through the entire slope list and enter/update the resource allocation of the tasks;
    After the procedure is finished, update penalty vector from the usage of the individual resources;
    if the utility in the previous iteration differs from this utility by a small fraction ε then
        Break from the loop;
    end
end
Finalize resource allocations of the tasks;

Algorithm 3: Basic "AMRMD CM" algorithm
The process of forming a Ki list and the corresponding retrieval of the relevant co-
located points are illustrated in Figure 2.6. In Step (1), co-located points are gathered. In
Step (2), the concave majorant is determined. In Step (3), only points (and their co-located
points) along the concave majorant are used for making resource allocation decisions. The
procedure is briefly described in Algorithm 3.
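The co-mapping step can be sketched as follows. The point representation and the use of the previous entry as the comparison anchor are illustrative simplifications; the fractional-difference test with ε mirrors the footnoted tolerance.

```python
# Sketch of "co-mapping": elements of a C list whose compound-resource values
# differ by less than a fraction eps (and whose utilities match) are folded
# into one K-list element that remembers their indices.

def co_map(points, eps=0.1):
    """points: list of (h, u, rvec) tuples sorted by compound resource h.
    Returns the K list as (h, u, [indices of co-located points in `points`])."""
    k_list = []
    for idx, (h, u, _rvec) in enumerate(points):
        if k_list:
            h0, u0, members = k_list[-1]
            if u == u0 and abs(h - h0) <= eps * h0:
                members.append(idx)  # co-located with the previous entry
                continue
        k_list.append((h, u, [idx]))
    return k_list
```

Each K-list entry keeps the indices of its co-located set-points, so that after the concave majorant an alternative resource vector can be tried when the first choice is infeasible.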
Complexity of AMRMD CM
The asymptotic computational complexity of AMRMD CM can be obtained as follows. Let L = max_{i=1..n} |Qi| and L′ = max_{i=1..n} |Ci|. In other words, L is the maximum number of QoS levels and L′ is the maximum number of set-points, where a task may have multiple set-points at a particular QoS level. The procedures for creating the Ki list and co-mapping the set-points take O(nL′) operations, while the concave majorant operation takes O(nL log(L)) [51]. The merging operation takes O(nL′ log(n)). Therefore, the complexity of the algorithm is: O(nL′(1 + log(n))) + O(nL log(L)) = O(nL log(L)) + O(nL′ log(n)).
This is somewhat higher than that of AMRMD1 since L′ ≥ L, but much smaller than that of
AMRMD DP. In addition, AMRMD CM yields the optimal results for both the examples discussed
in Section 2.3.2, similar to AMRMD DP.
2.5 Large-scale Optimization Issues
Based on the above discussion, we shall use AMRMD CM as the algorithm for optimization in the rest of the dissertation. It exhibits a benign computational complexity (O(nL log(nL))) compared to the optimal algorithm, which is NP-hard. However, other problems arise when using this algorithm directly in a large-scale distributed system.
2.5.1 Set-Point Generation Complexity
The computational complexity of O(nL log(nL)) ignores the cost of generating the set-points from the various dimensions. In many applications, generating the different values of the operational dimensions can itself be of much higher complexity than the core optimization process. For example, in a networked system, determining all the possible paths between a source and a destination can be much more complex than finding a suitable path with an appropriate bandwidth by Q-RAM. This will be discussed in more detail in Chapter 4.
Figure 2.7: Q-RAM & Scheduler Admission Control. (Tasks supply QoS and resource specifications to the Q-RAM resource allocator, which issues resource reservation requests to the OS scheduler's admission control over the CPU, disk and network resources; success/failure responses determine the QoS and resource assignments for the tasks.)
If the task is highly configurable, i.e., it has a large number of possible values per dimension, we should generate only a few values of each dimension before performing the concave majorant, instead of exhaustively generating all possible values. This reduces the complexity of the concave majorant step, O(nL log(L)), by reducing L.
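One way to realize this pruning is sketched below: sample at most k values from each operational dimension before taking the Cartesian product. The even-spacing policy is an assumption; the dissertation does not prescribe a particular sampling rule.

```python
# Sketch: subsample each operational dimension to at most k values before
# enumerating set-points, so the concave majorant sees a much smaller L.
from itertools import product

def subsample(values, k):
    """Keep at most k roughly evenly spaced values from a sorted list."""
    if len(values) <= k:
        return list(values)
    step = (len(values) - 1) / (k - 1)
    return [values[round(i * step)] for i in range(k)]

def generate_set_points(dimensions, k):
    """Enumerate set-points over the subsampled dimensions only."""
    return list(product(*(subsample(dim, k) for dim in dimensions)))
```

Two dimensions with 100 and 50 candidate values would yield 5,000 exhaustive set-points, but `generate_set_points(dims, 3)` produces only 9, while still covering the extremes of each dimension.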
2.5.2 Core Algorithm Complexity
After reducing L, we would like to reduce the second part of the complexity that comes
from merging all set-points of all n tasks, which is O(nL log(n)). This is solved by dividing
the problem into smaller subproblems and solving these subproblems as independently as
possible. This includes the clustering of the tasks into a small number of groups and the
division of the entire distributed system into a number of small partitions. This reduces
both the complexity of the concave majorant and the merging operations. The technique
to perform this division varies depending on the type of the system.
Figure 2.8: Dynamic Q-RAM Optimization. (Requested tasks queue up for the Q-RAM resource allocator, which is driven by a reconfiguration clock; its output task settings pass through scheduler admission control, and admitted tasks join the reconfigurable task queue.)
2.5.3 QoS Optimization and Resource Scheduling
In our QoS optimization model, we assume a simple model of resources in which each resource is perfectly divisible among the tasks, either in a time-shared or a spatial manner. However, in the case of time-shared resources, in order to obtain real-time guarantees, we need to perform scheduler admission tests for tasks once the resources are allocated to them by Q-RAM. Moreover, the admission tests of multiple resources must be integrated with each other [29, 69]. The interaction between a resource scheduler and Q-RAM optimization is shown in Figure 2.7. Q-RAM can allocate resources more optimistically or conservatively depending on the utilization bounds assigned to the resources; the bound must be set less than or equal to 1.
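A minimal sketch of the utilization-bound check that follows a Q-RAM allocation on a time-shared resource is given below. The bound value 0.8 is illustrative; the text only requires that it not exceed 1.

```python
# Sketch of an admission check on one time-shared resource: the summed
# utilization of the reserved tasks must stay within the assigned bound.

def admit(reservations, bound=0.8):
    """reservations: (compute_time, period) pairs reserved on one resource.
    Accept only if the summed utilization stays within the bound."""
    utilization = sum(c / t for c, t in reservations)
    return utilization <= bound
```

Here `admit([(1, 10), (2, 10)])` succeeds at 30% utilization, while `admit([(5, 10), (4, 10)])` is rejected at 90%; lowering the bound makes Q-RAM correspondingly more conservative.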
In addition, in a dynamic scenario where the task set is not fixed and tasks are continuously arriving at and departing from the system, Q-RAM optimization needs to be performed either reactively or at regular intervals (known as the reconfiguration rate) as a background process. In this case, the arriving tasks form a queuing system with Q-RAM as the “server” [38]. Q-RAM accepts multiple newly arrived tasks, performs optimizations along with the existing
schedulable tasks, and finally produces the resource allocations of those tasks. In this
process, only a few out of all existing tasks need to be selected for optimization along with
the newly arrived tasks. The process is illustrated in Figure 2.8. The details of this dynamic
process are described in [38] along with experimental results.
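This batched reconfiguration cycle can be sketched as a toy loop. Here `optimize` stands in for the Q-RAM allocator, and all names and the batch size are illustrative.

```python
# Toy sketch of the dynamic setting: arriving tasks queue up, and at each
# reconfiguration tick Q-RAM re-optimizes a batch of them together with the
# already-admitted tasks.
from collections import deque

def reconfigure_loop(arrivals, optimize, batch=4):
    """Drain up to `batch` queued tasks per tick and re-optimize them with the
    current admitted set. Returns the final admitted set and the tick count."""
    queue = deque(arrivals)
    admitted = []
    ticks = 0
    while queue:
        fresh = [queue.popleft() for _ in range(min(batch, len(queue)))]
        admitted = optimize(admitted, fresh)  # new allocation for old + new
        ticks += 1
    return admitted, ticks
```

With a pass-through `optimize` that admits everything, ten arrivals processed in batches of four take three ticks; the achievable reconfiguration rate thus bounds how many tasks the allocator can handle per interval.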
Based on the above model, it must be noted that the scalability of Q-RAM optimization depends on how many tasks it can handle within a given reconfiguration interval. In the rest of this dissertation, we investigate improving the scalability of Q-RAM.
2.6 Chapter Summary
In this chapter, we developed a generic model of a distributed system consisting of multiple
resources and applications. We also presented our QoS model, which is based on Q-RAM.
In the context of Q-RAM, we have presented new QoS optimization algorithms that handle
resource trade-offs more efficiently in a multi-resource environment. Finally, we highlighted
the challenges involved in performing QoS-based resource allocation in large systems. In the
next chapter, we will discuss QoS-based resource allocation in large multiprocessor systems.
Chapter 3
Resource Allocation in
Multiprocessor Systems
3.1 Introduction
In this chapter, we present our approach to QoS-based resource allocation in a multiprocessor environment. The tasks are assumed to be independent of each other, i.e., there is no communication among the tasks, and they are indivisible. We also consider the fault-tolerance requirements of the tasks along with standard QoS requirements such as timeliness.
A typical multiprocessor system consists of multiple processors connected via a bus, as
shown in Figure 3.1. Typical examples of multiprocessor systems are present in distributed
embedded environments such as automotive systems, back-end processors in phased-array
radar and distributed server systems.
There are existing algorithms such as bin-packing [11, 18, 19, 17, 3, 8, 43, 42] and load-balancing [75] for deploying tasks with fixed resource requirements onto a fixed set of resources. There are also QoS-based resource allocation schemes, such as Q-RAM, that determine the QoS settings and associated resource allocations for tasks in a generic distributed system. However, these algorithms are not effective at performing resource allocation in multiprocessor systems that must integrate QoS requirements with the fault-tolerance requirements of applications. In this chapter, we address this problem with new algorithms that combine QoS optimization and fault-tolerance with resource selection.
Figure 3.1: Typical Multiprocessor System. (Processors P1–P4 connected via a bus with I/O.)
One other problem with existing QoS optimization algorithms is that they are not scalable to very large numbers of resources and tasks. We present a new hierarchical decomposition technique for solving very large optimization problems. The hierarchical technique divides the problem into smaller sub-problems and then solves these sub-problems individually. As we shall see, this leads to a reduction in execution time of two or more orders of magnitude.
3.2 Q-RAM in Multiprocessor Systems
In this chapter, we make the following four assumptions in our model of a multiprocessor system.

• A task has no specific bias or preference for any processor. In other words, a task can be deployed on any processor as long as there is space for it to be allocated.

• The number of QoS dimensions and the number of elements along any dimension are both small. We have limited our analysis to 2 or 3 QoS dimensions, each with only 2 or 3 discrete levels.
Table 3.1: Experimental Settings with Optimal Algorithm
    Number of QoS dimensions (q): 2
    Length of each dimension: 3
    Utilities for QoS dimension (u(q)): (0.5, 0.7, 0.8)
    Weight for each QoS dimension: random(0.00, 1.00)
    Minimum resource for each task: random(1, 3) units
    Resource increment for higher QoS: random(1, 2) units
    Number of processors: 5
    Resource amount per processor: 10 units
• Tasks do not communicate with one another. In other words, there are no communi-
cation bandwidth requirements among the tasks.
• A task is deployed to only one (or in the case of fault-tolerance, several) resource
(processor) from a pool of resources.
As discussed, Q-RAM has an algorithm called AMRMD1 that performs resource allocation for tasks in a multi-resource environment. In Chapter 2, we presented two algorithms, AMRMD CM and AMRMD DP, as modified versions of AMRMD1 designed to handle resource deployment trade-offs more efficiently. In this section, we evaluate the performance of these algorithms in multiprocessor systems. For a given number of task profiles and resources, our experiments focus on measuring the following performance metrics:
• the maximum number of tasks that can be admitted while satisfying the minimum
QoS requirements of all the admitted tasks,
• the utility obtained with the maximum number of admitted tasks when their minimum
QoS is 0, and
• the execution time of the algorithms.
Figure 3.2: Utility Variation. (Accrued utility vs. number of tasks, 2 to 52, for mrmd-optimal, amrmd1, amrmd_cm and amrmd_dp.)
3.2.1 Comparison with Optimal Algorithm
The first experiment compares all three AMRMD algorithms that are presented in Chapter 2,
along with the optimal exhaustive search algorithm called MRMD. The optimal algorithm is
presented in [49].
For convenience in running the exponentially complex optimal algorithm, a small multiprocessor system consisting of 5 processors is assumed. The assumed configurations of the tasks and of the system are presented in Table 3.1.
Figure 3.2 shows the variation in utility as the number of tasks is varied from 2 to 52. The result is averaged over 50 runs. It shows that AMRMD CM performed closest to the optimal MRMD scheme in terms of utility, with AMRMD1 the farthest.

Figure 3.3 shows the variation of the execution time of the algorithms. The results are plotted on a log scale, as the optimal solution runs approximately 30,000 times slower than AMRMD1. The execution time of AMRMD CM is approximately 1.5 times greater than that
Figure 3.3: Run-time Variation. (Run-times in μsec, log scale, vs. number of tasks for mrmd-optimal, amrmd1, amrmd_cm and amrmd_dp.)
of AMRMD1 as expected. The algorithm AMRMD DP has quadratic complexity and thus runs
slower by an order of magnitude compared to the other two AMRMD algorithms.
These results show that AMRMD CM yields utility values closest to those of the optimal
algorithm with somewhat higher execution times than those of AMRMD1.
3.2.2 Results for Larger Systems
In this experiment, we consider a system with 20 processors. The full experimental set-up
is given in Table 3.2. In the first case, we do not allocate the tasks with their minimum QoS
before performing the optimization. Instead, we compare the three algorithms in terms of the maximum number of tasks they can admit, where each task has non-zero QoS requirements that must be satisfied for admission into the system.
The results for the maximum number of tasks that can be admitted under each algo-
rithm, averaged over 100 randomly generated task configurations, are shown in Figure 3.4.
Figure 3.4: Number of Admitted Tasks (20 processors). (Bar chart of the number of admitted tasks for amrmd1, amrmd_dp and amrmd_cm.)
As can be seen, AMRMD CM is able to admit 6 times as many tasks as AMRMD1 and twice as many as AMRMD DP.
Figure 3.5 shows the standard deviation of the results for the 3 algorithms. We observed a very high relative standard deviation for AMRMD1, with AMRMD CM having the lowest. The reason for this behavior is the following. The AMRMD1 algorithm randomly selects one of any co-mapped resource deployment points for a QoS setting and discards the rest based on its own concave majorant operation. This random selection makes a significant difference in performance and contributes to the large standard deviation of AMRMD1. AMRMD DP, on the other hand, uses a better technique by evaluating the penalty vector dynamically at each resource allocation. However, it follows the same technique as AMRMD1 in discarding deployment options (trade-offs), resulting in a high standard deviation similar to that of AMRMD1. This shows that AMRMD CM is the most consistent and predictable: its results do not depend on randomness in the ordering of the input data.
In the next experiment, we vary the number of tasks and determine the utility accrued
under each algorithm. In this case, we assume all tasks have the zero minimum QoS
Figure 3.5: Percentage standard deviation (= 100 × (standard deviation)/mean) of the number of admitted tasks on 20 processors, for amrmd1, amrmd_dp and amrmd_cm.
requirements and thus all can be “admitted”. This assumption equalizes the algorithms in terms of their admission control characteristics and allows us to compare utility accrual. We plot the results, averaged over 100 randomly generated problems, in Figures 3.6 and 3.7. As AMRMD CM accommodates more resource options for each task, the results show higher utility for AMRMD CM than for AMRMD1. Although AMRMD1 yields utility fairly close to that of AMRMD CM here, this is again due to the randomized ordering of trade-off values and the distribution of utilities among tasks, and it can be made worse by choosing a different utility distribution for tasks. In particular, in the case of AMRMD1, a few tasks obtained very high utility values at the expense of many tasks remaining at the 0 utility level. In terms of execution times, AMRMD CM shows slightly higher execution time than AMRMD1, while AMRMD DP needs close to two orders of magnitude (or higher) more execution time than the other two.
Figure 3.6: Utility Variation of Three Algorithms in a System of 20 Processors. (Utility vs. number of tasks for amrmd, amrmd_cm and amrmd_dp.)
3.2.3 Results on Fault-tolerance
The notion of incorporating fault-tolerance in the QoS-based Resource Allocation Model
(Q-RAM) was explained in Section 2.2.6 of Chapter 2. We assume that fault-tolerance is
supported using replication on multiple processors. The higher the degree of replication, the
higher the utility obtained along the fault-tolerance dimension. In this experiment, tasks
are assumed to have fault-tolerance as the only QoS dimension for ease of comparison. Table 3.3 lists the experimental specifications.

Table 3.2: Settings for Second Experiment
    Number of QoS dimensions (q): 2
    Length of each dimension: 3
    Utilities for each quality dimension (u(q)): (0.5, 0.7, 0.8)
    Computational resource on each processor: 100
    Minimum resource for each task: random(1, 25)
    Resource increment for higher QoS: random(1, 10)
    Number of processors: 20

Figure 3.7: Run-time Variation (log-scale) of Three Algorithms in a System of 20 Processors. (Run-times in μsec vs. number of tasks for amrmd, amrmd_cm and amrmd_dp.)
The results shown in Figures 3.8 and 3.9 demonstrate that AMRMD CM outperforms the other two algorithms with respect to the number of tasks it admits and the utility it achieves for a fixed number of tasks. The number of tasks it typically admits varies between 2 and 6 times that of AMRMD1. Essentially, if tasks are allowed to be admitted with zero QoS (i.e., effectively “rejected”), AMRMD1 can maximize utility fairly well, depending on the utility values of the tasks. However, if all incoming tasks must be admitted at a non-zero QoS level, AMRMD1 performs poorly compared to AMRMD CM. In other words, AMRMD1 admits fewer tasks and thus provides a higher average utility value per task, whereas AMRMD CM admits more tasks and thus provides a lower average utility value per task. For certain utility values of tasks where a minimum QoS gives a very large marginal utility, the utility of the result of AMRMD1 can be made arbitrarily worse than that of AMRMD CM.
The results show an abundance of co-located set-points in the case of fault-tolerant multi-processor scheduling.
Table 3.3: Settings for Experiment on Fault-Tolerance
    Number of QoS dimensions: 1
    Number of copies: 1–2
    Number of quality indices: 2
    Utilities: (0.5, 0.7, 0.8)
Figure 3.8: Utility Variation under Fault-Tolerance. (Utility vs. number of tasks for amrmd1, amrmd_cm and amrmd_dp.)
3.3 Hierarchical Q-RAM in Multiprocessor System
None of the algorithms we have presented so far scales well when there is a large number of resources. For example, in multi-dimensional radar systems, 64 or more processing nodes are common. Under these conditions, all three algorithms can consume large amounts of computation time and memory, making them usable only offline. This scalability bottleneck
Figure 3.9: Number of Admitted Tasks (20 processors) under Fault-Tolerance. (Bar chart, two copies on 20 processors, for amrmd1, amrmd_cm and amrmd_dp.)
Figure 3.10: Run-time Variation (log-scale). (Execution time in μsec vs. number of tasks for the fault-tolerant case: amrmd1, amrmd_cm and amrmd_dp.)
arises because the factor L¹ increases proportionally with the number of resources in the system. This happens because we enumerate all possible allocations of each task on every resource unit (processor), even though a task is allocated to only one resource unit out of many, unless fault-tolerance is required. Even under fault-tolerance, the number of resource units assigned equals the number of replicas needed for a task. Thus, we propose a hierarchical Q-RAM approach in which we partition the problem into smaller sub-problems, each dealing with a smaller number of resources.
Before we discuss the details of our approach, we provide the following definitions and
then state a theorem based on theorems of constrained extrema in linear programming [83].
Definition 3.3.1 (Task Profile). The profile or type of a task is defined by its set-points
containing different values of its operational and environmental dimensions and the associ-
ated utility values.
Definition 3.3.2 (Identical Tasks). Two tasks are said to be identical if they are of the
same type, i.e., they have identical task profiles.
Definition 3.3.3 (Utility Function). The utility function of a task describes the variation of its utility relative to its allocated resource(s).
Figure 3.11: A Typical Continuous Utility Function. (Utility f(r) plotted against resource r.)

¹L denotes the maximum number of set-points per task, as mentioned in the complexity analysis in Section 2.4 of Chapter 2.
Theorem 3.3.4 (Resource Distribution of Identical Tasks). If a resource has to be distributed among identical tasks with a continuous, monotonically increasing, concave utility function, the total utility is maximized when each task is allocated an equal amount of the resource.
Proof. The proof of this theorem follows directly from the theorems of constrained extrema in linear programming [83], as a special case of the Karush–Kuhn–Tucker theorem [64].
Corollary 3.3.5. If a resource has to be allocated among tasks of a fixed number of types with continuous concave utility functions, the maximum utility is obtained when the same resource amount is allocated to all tasks of the same type.
Proof. This corollary can also be derived from the Karush–Kuhn–Tucker theorem.
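A quick numeric illustration of Theorem 3.3.4, using sqrt as an example of a continuous, monotonically increasing, concave utility function (the choice of sqrt is ours, not the dissertation's):

```python
# Splitting 10 resource units equally between two identical tasks with a
# concave utility f(r) = sqrt(r) beats any unequal split.
import math

def total_utility(split, f=math.sqrt):
    """Total utility of identical tasks receiving the given resource shares."""
    return sum(f(r) for r in split)

equal = total_utility([5.0, 5.0])            # 10 units split equally
for split in ([4.0, 6.0], [1.0, 9.0], [0.0, 10.0]):
    assert total_utility(split) < equal      # every unequal split does worse
```

For instance, sqrt(5) + sqrt(5) ≈ 4.47 exceeds sqrt(4) + sqrt(6) ≈ 4.45; concavity makes the gap grow as the split becomes more lopsided.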
Corollary 3.3.5 guides us in designing an efficient QoS allocation scheme when tasks can be classified into a finite set of categories. In our case, tasks do not have continuous utility functions. Instead, each has a few discrete set-points corresponding to a set of discrete utility–resource pairs. Lee et al. [49] derived the bound on the obtained global utility relative to the optimal utility when the Karush–Kuhn–Tucker theorem is applied in such cases.
3.3.1 Hierarchical Q-RAM Algorithm
In order to reduce the complexity of Q-RAM optimization, we employ a divide-and-conquer technique. In other words, we would like to divide the problem into identical subproblems and solve these subproblems independently. Each subproblem is considered a cluster, and each cluster contains an equal number of resources. Hence, the total number of resources in the system must be an integral multiple of the number of clusters created.
Figure 3.12: Initial Slope of a Task. (Utility vs. resource; the initial slope is Y/X, the utility Y over the resource requirement X at the minimum non-zero QoS level.)
Next, we assume that there are only a small number of types of tasks. We would like to
allocate the computing resource equally to all tasks of the same type. Hence, we distribute
tasks to clusters where each cluster contains identical numbers of tasks of the same type.
If the number of tasks of a particular type is not an integral multiple of the number of
clusters, we will have a few residual tasks of that type which cannot be distributed equally
among the clusters. We keep those tasks temporarily un-allocated.
We then sort the un-allocated tasks in decreasing order of the initial slopes of their utility functions. As shown in Figure 3.12, the initial slope of a task is the ratio of the utility to the resource requirement at its minimum non-zero QoS level. In other words, we prioritize these tasks by their initial marginal utility values. We sequentially choose a task from the list and allocate it to the cluster least populated with tasks of its type. Finally, we perform resource allocation within each cluster by executing the AMRMD CM algorithm for each of them individually.
For example, let us consider a multiprocessor system of R = l × P processors. If we divide the system into P clusters p1, ..., pP, each cluster will contain R/P = l resources. Let us assume that we have 2 types of tasks, a and b, and that the numbers of tasks of these types are na and nb respectively. Hence, each cluster also obtains ⌊na/P⌋ tasks of type a and ⌊nb/P⌋ tasks of type b. The numbers of un-allocated tasks are
Figure 3.13: Hierarchical QoS Optimization with Clustering. (Tasks of types a and b are distributed evenly across the single-processor clusters.)
(na − P⌊na/P⌋) and (nb − P⌊nb/P⌋) for types a and b respectively.
Based on task profiles, let us assume that the initial slope of a task of type a is higher
than that of type b. Therefore, we allocate the remaining tasks of type a among the clusters
first in a load-balancing manner followed by the remaining tasks of type b. In this way, we
approximately divide the system into P near-identical subsystems. Figure 3.13 illustrates
the process where each cluster contains a single processor i.e., l = 1. Algorithm 4 details
the whole procedure.
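The distribution step of this procedure can be sketched as follows. The task tuples (type, id) and the slope table are illustrative, and the sketch balances residuals on overall cluster size for brevity, whereas the dissertation balances on the population of the same type.

```python
# Sketch of the H-Q-RAM task-distribution step: tasks of each type are split
# evenly across P clusters; residual tasks are ordered by initial slope
# (utility / resource at the minimum non-zero QoS level) and placed,
# highest slope first, on the least-populated cluster.

def distribute(tasks_by_type, initial_slope, P):
    """tasks_by_type: {type: [(type, id), ...]}; initial_slope: {type: float}.
    Returns P clusters, each a list of tasks."""
    clusters = [[] for _ in range(P)]
    leftovers = []
    for ttype, tasks in tasks_by_type.items():
        share = len(tasks) // P
        for c in range(P):                      # equal share per cluster
            clusters[c].extend(tasks[c * share:(c + 1) * share])
        leftovers.extend(tasks[P * share:])     # residual tasks of this type
    leftovers.sort(key=lambda t: initial_slope[t[0]], reverse=True)
    for task in leftovers:                      # highest initial slope first
        min(clusters, key=len).append(task)
    return clusters
```

With 5 tasks of type a and 3 of type b over P = 2 clusters, each cluster receives 2 a's and 1 b directly, and the two residual tasks are spread so both clusters end up with 4 tasks.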
Complexity Analysis of H-Q-RAM
First, let us estimate the complexity of AMRMD CM in a multiprocessor system. Let |Qm| denote the maximum number of QoS settings of a task and R the number of processors. If
input : tasks of a fixed number of types and a multiprocessor system
output: QoS assignment and resource allocation of tasks
Cluster the resources/processors into P groups;
Divide the tasks of each type into P identical groups; if the number of tasks of a type is not an integral multiple of P, keep the remaining tasks un-allocated;
Assign each of the P identical groups of tasks to a distinct resource cluster;
for all remaining tasks do
    Perform concave majorant;
    Order by the initial slope of the utility–resource curve;
end
for all sorted remaining tasks do
    Choose the task with the highest initial slope and allocate it to the cluster least populated with tasks of the same type;
end
for all processor groups do
    Run the AMRMD algorithm for QoS optimization;
    Run the selected algorithm only once for multiple identical groups of tasks, applying the result directly to the subsequent identical groups;
end

Algorithm 4: Hierarchical Q-RAM Optimization for Multiprocessor System
tasks do not have any fault-tolerance requirements, the maximum number of set-points of
a task is given by L = |Qm|R. Using the expression of complexity obtained for the basic
AMRMD CM in Chapter 2, the complexity of the Q-RAM optimization in a multiprocessor
system is O(nL log(nL)) = O(n|Qm|R log(n|Qm|R)), where n is the total number of tasks.
For H-Q-RAM, if we divide the system into P clusters, the maximum number of set-points of a task within a cluster is given by L = |Qm|R/P. Each cluster contains ⌈n/P⌉ tasks. Hence the complexity of the optimization for each cluster is O(⌈n/P⌉ |Qm| (R/P) log(⌈n/P⌉ |Qm| (R/P))). If we run the operation on a single processor, the total complexity is O(⌈n/P⌉ |Qm| R log(⌈n/P⌉ |Qm| (R/P))), but since the optimization per cluster can be performed in parallel, the total complexity is reduced to the complexity of a single cluster.
As can be seen from the expressions, the complexity of H-Q-RAM is reduced by a factor of P^2 log(P) compared to that of Q-RAM, where P is the number of clusters.
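As a quick numerical illustration of these expressions (a sketch of ours; the parameter values are arbitrary), the estimated cost of the flat optimization can be compared with the per-cluster cost of H-Q-RAM:

```python
import math

def qram_cost(n, qm, r):
    """Flat Q-RAM: O(nL log(nL)) with L = |Qm| * R set-points per task."""
    L = qm * r
    return n * L * math.log(n * L)

def hqram_cluster_cost(n, qm, r, p):
    """H-Q-RAM: each of the p clusters holds ceil(n/p) tasks with
    L = |Qm| * R / p set-points; clusters are optimized in parallel,
    so the parallel cost is that of a single cluster."""
    nc = math.ceil(n / p)
    L = qm * r / p
    return nc * L * math.log(nc * L)

n, qm, r = 256, 10, 32
speedups = {p: qram_cost(n, qm, r) / hqram_cluster_cost(n, qm, r, p)
            for p in (2, 4, 8, 16)}
```

For these parameters the speedup grows somewhat faster than P^2, consistent with the factor claimed above.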
Number of task types:        8 (0,1,2,3,4,5,6,7)
Type of a task:              random(0,7)
Utilities on QoS dimension:  [0.5, 0.7, 0.8] / [0.4, 0.6, 0.65]
Minimum resource:            24 − 2·Type
Resource increment:          random(16 − 2·Type, 20 − 2·Type)
Distribution of task types:  12.5% each on average
Number of processors:        32
Number of groups formed:     1, 2, 4, 8 and 16
Table 3.4: Experimental Specifications (H-Q-RAM)
3.4 Performance Evaluation: H-Q-RAM
In this section, we evaluate the scalability of H-Q-RAM. In this process, we primarily
measure the variation of execution time of H-Q-RAM with respect to the number of clusters
and the corresponding performance in terms of the accrued utility2.
We assume the presence of 8 types of tasks in the system. Each task is independently,
randomly assigned a task type out of these 8 different types. The specifications of the tasks
are detailed in Table 3.4. We assume a multiprocessor system consisting of 32 processors.
Hence, we can create 1, 2, 4, 8 and 16 possible clusters under 5 different configurations and
each cluster has 32, 16, 8, 4 and 2 processors in those 5 configurations respectively. Having
a single cluster of all 32 processors is equivalent to the basic Q-RAM algorithm.
3.4.1 Multi-processor Resource Allocation
This experiment deals with the tasks that have no Fault-tolerant QoS specification. We
measure the maximum number of tasks admitted under each of the 5 configurations and
plot them in Figure 3.14. The result is the average over 100 randomly generated task-sets.
The results in Figure 3.14 show that the Hierarchical AMRMD (H-AMRMD) under each
grouping was able to admit nearly the same number of tasks as the non-hierarchical AMRMD.
The maximum drop in the number of admitted tasks was only 5%, from 291 in the non-hierarchical version to 276 in the hierarchical version with 16 groups.

2H-Q-RAM reduces to the basic Q-RAM when there is only one cluster.

Figure 3.14: Number of Tasks (32 processors) [plot of the maximum number of tasks admitted (0 to 300) against the number of groups formed: 1, 2, 4, 8 and 16]
Next, we keep the number of tasks constant at 276, the maximum that can be admitted with 16 groups, and measure the execution time under each case. The results
averaged over 100 iterations are presented in Figure 3.15, which shows a very sharp drop in execution
time as the number of groups is increased. For example, the execution time for non-
hierarchical AMRMD is 73 times that of hierarchical AMRMD with groups of 16.
Next, we vary the number of input tasks from 16 to 256 and plot the utility and execution
time of the algorithms against the number of tasks in Figures 3.16 and 3.17. The results
show a negligible difference in utilities for a fixed number of tasks. It also shows a huge
drop in execution time (73 times for 276 tasks) as the number of groups is increased from
1 to 16.
3.4.2 Fault-tolerance and Hierarchical Q-RAM
In this experiment, we consider tasks having fault-tolerance as the only QoS dimension.
The specifications for the fault-tolerance are the same as presented in Table 3.3 of the non-hierarchical case. Other configuration parameters are maintained to be the same as those of the previous experiments.

Figure 3.15: Run-time (276 tasks) [plot of the time taken to run (usec, 0 to 9 × 10^5) against the number of groups formed: 1, 2, 4, 8 and 16]
As observed from Figures 3.18 and 3.19, a relatively smaller number of tasks (82, down from 112)
are admitted with 16 groups. This is because each group of 16 has only 2 processors, and the
maximum number of replicas for the fault-tolerance is 2, thereby significantly reducing the
number of possible trade-off options. On the other hand, we observe a 37,000-fold reduction
in the execution time as the grouping is increased from 1 to 16 (Figure 3.19). In addition,
all of the groupings produce a near-identical utility curve when we vary the number of tasks
from 16 to 76, as shown in Figure 3.21.
In summary, H-Q-RAM obtains a near-optimal system utility while reducing execution
time by two orders of magnitude or more.
3.5 Chapter Summary
In this chapter, we investigated extensions to Q-RAM to apply it to multi-processor systems.
We showed that AMRMD CM was able to admit more tasks and achieve larger global utility values compared to the basic algorithm AMRMD1 with only a small increase in the execution time. A similar pattern was observed when we used reliability as a QoS dimension.

Figure 3.16: Utility Variation (max 256 tasks) [plot of the utility accrued (0 to 7000) against the number of tasks (0 to 300) for 1, 2, 4, 8 and 16 groups]
Unfortunately, algorithms AMRMD1, AMRMD CM and AMRMD DP take too long to run when
allocating resources on large multi-processor systems. For example, a radar tracking system
may consist of a bank of 64 or more processors for signal processing tasks. It would take
around 5s to perform the resource allocation under AMRMD CM, which may be unacceptably
long.
We then presented a hierarchical decomposition approach for applying our QoS opti-
mization algorithms to such systems. In this approach, we divided the system into multiple
smaller identical subsystems and uniformly distributed tasks into those subsystems. Then,
we performed QoS optimization on each of these subsystems independently. We showed that
this hierarchical approach significantly reduced the execution time for all of the algorithms.
In particular, the resource allocation problem involving fault tolerance as a QoS dimension becomes feasible as a result of our hierarchical approach, since it reduces the execution time by 5 orders of magnitude for a system of 32 processors. This difference increases with the size of the system.

Figure 3.17: Run-time plot with grouping for 32 processors (max 256 tasks) [run-time in usec (0 to 5 × 10^7) against the number of tasks (0 to 300) for 1, 2, 4, 8 and 16 groups]

Figure 3.18: Number of Tasks under Fault-Tolerance [maximum number of tasks admitted (0 to 120) against the number of groups formed: 1, 2, 4, 8 and 16]

Figure 3.19: Run-time (log-scale) under Fault-Tolerance [time taken to run in usec, log scale (10^2 to 10^8), against the number of groups formed, for 76 tasks]

Figure 3.20: Run-time plot in log-scale with grouping for 32 processors under fault-tolerance (max 76 tasks) [execution time in usec, log scale, against the number of tasks for 1, 2, 4, 8 and 16 groups]

Figure 3.21: Utility Plot under Fault-Tolerance (max 76 tasks) [utility against the number of tasks for 1, 2, 4, 8 and 16 groups]
In the next chapter, we consider the extension of the first assumption where a task has
constraints in selecting resource trade-offs. For example, if a task needs a route between a
source and destination in a network, the selection of the links (as resources) is not arbitrary
and is dependent on the topology of the network.
Chapter 4
Resource Allocation in Networks
4.1 Introduction
In this chapter, we discuss QoS optimization in distributed networked environments. Apart
from the Internet, examples of distributed networked systems include sensor networks, au-
tonomous systems and overlay networks. In order to provide QoS to tasks executing on these
systems, we need to guarantee the allocation and scheduling of resources. The resources
include computational cycles, storage and network bandwidth across a route between the
source and the destination. For example, a typical video transmission application requires
a certain amount of network bandwidth and CPU cycles from various network links and
routers respectively. Higher quality in terms of its frame rates and resolutions requires a
greater quantity of these resources.
For a large number of tasks to be deployed on a system consisting of a large number
of resources, we designed a hierarchical scheme in Chapter 3 that provides near-optimal
resource allocation in a scalable manner. The hierarchical technique divides the problem
into smaller independent sub-problems. Specifically, it divides the system into identical
subsystems, assigns tasks to these subsystems in an equitable fashion so that each subsystem
obtains a (nearly) identical number of tasks of the same type, and then makes resource
allocation decisions within each subsystem independently. Implementing this scheme on a
networked system, however, presents two major difficulties. First, it is difficult to divide a
networked system into a number of identical subsystems if the architecture is heterogeneous
(even if it is hierarchical). Secondly and most importantly, it is not possible to isolate the
subsystems in the network. This is because the route of a task can potentially span a very
large number of links and routers over the entire network. If we consider each network sub-
domain as a subsystem, many tasks can have routes across multiple sub-domains and thus
the resource allocation in one subsystem may be dependent on that obtained in another
and vice versa. Hence, multiple subsystems need to negotiate with each other in order to
determine near-optimal resource allocations.
4.1.1 Our Contribution
In the context of network QoS, we make our contribution in network bandwidth allocation
and route selection. However, our model differs from traditional network QoS models in two fundamental ways. First, instead of
specifying a single QoS requirement, our Q-RAM-based QoS model allows a task/flow to
specify multiple levels of bandwidth and delay requirements for different levels of service.
Second, our resource allocation scheme determines the allocation of a near-optimal route
and a near-optimal network bandwidth along the route for each flow. The scheme relies on a
signaling protocol such as RSVP and packet scheduling policies across the network in order
to satisfy the network bandwidth reservations. In addition, as we will discuss later in this
chapter, it can also exploit the existing routing protocols to perform efficient optimization.
4.2 Modeling of Networked System
In this section, we describe our model of a distributed networked system. We first briefly
describe our generic resource allocation model based on Q-RAM. Next, we introduce a
graph-theoretical model of the network and demonstrate how to formulate and solve the
network QoS optimization problem in Q-RAM.
4.2.1 Network Model and QoS
We assume that the network is a distributed system consisting of multiple resources where
each resource corresponds to the link capacity in terms of the available bandwidth of the
link1. We consider a set of tasks that involve the transfer of data from one node in the
network to another. Each task has a set of QoS set-points in terms of bandwidth and
delay requirements. In addition, there is a utility associated with each of its set-points. In
general, a higher bandwidth provides higher quality and hence higher utility for a task. If
a network is modeled as an undirected graph, these tasks can be modeled as flows across
the graph with variable capacity requirements.
Q-RAM optimization in a network works as follows. Using the edges of the graph as
network links with a certain amount of bandwidth R, we construct a resource capacity
vector ~R = (R1, . . . , Rm), where m is the total number of weighted edges of the graph and Ri
is the bandwidth of the ith edge. We enumerate the operational dimensions of each task as
follows.
Set of bandwidth settings
The set of bandwidth settings of a task τi is given by:

Bi = {bi1, · · · , biNBi}, (4.1)

where NBi = the number of possible bandwidth settings for task τi. The bandwidth maps
directly to the resource requirement on the network link.
1It is relatively straightforward to extend our formulation to include processing resources but we do notdo so for simplicity of presentation.
70 Chapter 4. Resource Allocation in Networks
Set of delay settings
The set of delay settings of τi is given by:

Di = {di1, · · · , diNDi}, (4.2)

where NDi = the number of delay levels for τi.
The network delay encountered by a flow is dependent on the value of total bandwidth
(or speed) of the network link(s) used. It is expressed as the sum of three components:
(1) circuit delay (propagation delay of 1 bit), (2) transmission delay, and (3) switching
delay [68]. In our model, for simplicity, we assume that the circuit delay is much smaller
compared to the other two factors and much smaller than the minimum delay requirements
of the applications. The transmission delay is the manifestation of bandwidth capacities
of the links along a route. In other words, it is expressed as the sum of the transmission
delays of a single packet across each link. Finally, the switching delay is the sum of the
queueing delay and the processing delay at each node of the route. Assuming the node has
enough computing power, the queuing delay is a more dominant factor than the processing
delay. This, in turn, depends on the scheduling policy of the packet scheduler on the node.
Since our QoS model deals with resource allocation that separates it from the scheduling
concern at the lower level, we only need to consider the bandwidth of the links for our
model. We assume that once the bandwidth has been allocated, the router will have enough
processing cycles to process the packets between its incoming and outgoing links, and its
lower level packet scheduler can schedule the packets appropriately so that each flow meets
its deadlines2.
In conclusion, assuming that the routers can provide scheduling guarantees to meet the
deadlines of the packets, the delay encountered by a flow is simply expressed as the sum
of the transmission times along all links in the route. In this case, we can also add an
2A lot of work on packet scheduling has been done in the past, with varying degrees of schedulable utilization bounds on the routers [79, 39, 28].
estimated queueing delay time along each hop in the route. Having this constraint will
prevent the QoS optimizer Q-RAM from choosing an excessively long route. However, the delay
must be managed by a proper packet scheduling scheme once the bandwidth is allocated to
each flow or task.
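Under these assumptions, the delay of a candidate route is just the sum of per-link transmission times, optionally padded with an estimated queueing delay per hop. A minimal sketch (function names and numbers are ours, for illustration only):

```python
def route_delay(link_bandwidths_bps, packet_bits, per_hop_queueing_s=0.0):
    """End-to-end delay = sum of transmission delays along the route
    plus an estimated queueing delay at each hop; circuit (propagation)
    delay is ignored, as in the model above."""
    transmission = sum(packet_bits / b for b in link_bandwidths_bps)
    return transmission + per_hop_queueing_s * len(link_bandwidths_bps)

# a 3-link route carrying 12000-bit packets, 1 ms estimated queueing per hop
delay = route_delay([10e6, 5e6, 10e6], 12000, per_hop_queueing_s=0.001)
```

A set-point's delay requirement can then be checked against this value when routes are enumerated, steering the optimizer away from long routes.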
Set of routes

The set of routes of a task τi is given by:

Pi = {pi1, · · · , piNPi}. (4.3)
For a connected graph, we always have |Pi| ≥ 1.
The procedure for determining all the routes for a fixed source-destination (S-D) pair is
described in Algorithm 5. This is similar to the basic broadcast route discovery except that
all possible routes are discovered in this case. First, the source node broadcasts its route
request to the destination to its neighboring nodes. Each neighboring node, upon receiving
the request, constructs a temporary route, and forwards that route along with the original
request to all of its neighbors other than the sender of the request (in this case, it was the
source node). Each intermediate node copies that route, creates a new route adding itself
and sends it to its other neighbors. An intermediate node prevents cycles by not forwarding the request to a neighbor that is already included in the temporary route being copied. This process continues recursively until a node has no neighbor left to forward the request to, or the destination node is reached.
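The flooding procedure above amounts to enumerating all cycle-free (simple) routes between the source and the destination. A compact recursive sketch in Python (the adjacency-list representation and names are ours):

```python
def all_routes(adj, src, dst):
    """Enumerate every cycle-free route from src to dst, mirroring the
    broadcast discovery: each step forwards the pending route to all
    neighbors not already on it."""
    routes = []

    def forward(node, pending):
        for nxt in adj[node]:
            if nxt in pending:          # cycle prevention
                continue
            if nxt == dst:              # destination discovered
                routes.append(pending + [nxt])
            else:
                forward(nxt, pending + [nxt])

    forward(src, [src])
    return routes

# small test graph: two disjoint routes from A to D
adj = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
paths = all_routes(adj, "A", "D")
```

The number of simple routes grows exponentially with the size of the graph, which is precisely what motivates the selective routing techniques later in this chapter.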
input : Source vertex S, Destination vertex D, current vertex I
// I = S when the algorithm is called for the first time
output: Set of routes connecting S and D
// p = pending/incomplete route under consideration
// Vp = set of vertices for p
for all edges ei leaving I do
    r ← 0; // accept the link by default
    if the next vertex N of the edge already belongs to the pending route then
        r ← 1;
    end
    if r ≠ 1 then
        if N = D then
            A route is constructed;
            Insert the route into the list of routes;
            Update the routing table entries of the vertices falling on this route;
        else
            I ← N;
            Call this algorithm with (S, D, N) as inputs; // recursion
        end
    end
end

Algorithm 5: Basic Route Discovery Algorithm

Basic Q-RAM Algorithm

By combining Equations (4.1), (4.2) and (4.3), we obtain the set-points of the tasks as Si = Bi × Di × Pi. The utility of a set-point is obtained from the QoS dimensions as Bi → u, while the corresponding resource requirements are obtained as Bi × Di × Pi → R. Thus a set-point is represented by ⟨qj, uj, (rj1, . . . , rjm), hj⟩, where
qj = Quality level,
uj = Utility level,
(rj1 , . . . , rjm) = resource vector representing resource requirement at each edge of the
system, and
hj = compound resource describing a cost of allocating the resource.
The procedure is detailed in Algorithm 6.
input : profiles of tasks with bandwidths and network routes
output: route and bandwidth allocation of tasks by maximizing utility
for each task i = 1 to n do
    Determine the QoS points as bandwidths Bi;
    Determine Pi, the set of resource options, using Algorithm 5;
    Generate the set-points Si = Bi × Di × Pi for τi and map them to resource requirements Si → R in terms of link bandwidths;
    Determine the "compound resource" as a scalar cost metric for each set-point;
    Determine the concave majorant of the set-points based on their (compound resource, utility) values and the corresponding gradients;
end
Merge the set-points of the n tasks in decreasing order of their gradients and perform a global resource allocation starting with the point of highest gradient;

Algorithm 6: Basic Global QoS Optimization For Networks
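The concave-majorant and gradient-ordered allocation steps of Algorithm 6 can be sketched as follows (an illustrative Python sketch of ours: set-points are (compound resource, utility) pairs with distinct costs, and the budget is spent fractionally for simplicity, whereas Q-RAM allocates discrete set-points):

```python
def concave_majorant(points):
    """Upper hull of (compound-resource cost, utility) set-points,
    sorted by cost; keeps only points on the concave envelope."""
    hull = []
    for p in sorted(points):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # drop hull[-1] if it lies on or below the chord hull[-2] -> p
            if (y2 - y1) * (p[0] - x2) <= (p[1] - y2) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def greedy_allocate(tasks, budget):
    """Merge the majorant segments of all tasks by decreasing gradient
    and spend the compound-resource budget on the steepest first.
    Along each task's majorant the slopes decrease, so a global sort
    never allocates a later segment before an earlier one."""
    segs = []
    for tid, pts in enumerate(tasks):
        h = concave_majorant(pts)
        for (x1, y1), (x2, y2) in zip(h, h[1:]):
            segs.append(((y2 - y1) / (x2 - x1), x2 - x1, tid))
    segs.sort(reverse=True)                 # highest gradient first
    utility = 0.0
    for slope, dx, tid in segs:
        if budget <= 0:
            break
        take = min(dx, budget)
        utility += slope * take
        budget -= take
    return utility

tasks = [[(0, 0), (1, 5), (2, 6)], [(0, 0), (1, 3), (2, 8)]]
total = greedy_allocate(tasks, budget=2.0)
```

In the second task the point (1, 3) falls below the chord from (0, 0) to (2, 8), so the majorant discards it, exactly as the concave majorant step of Algorithm 6 prescribes.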
Algorithm 6 is the most direct way of solving the problem of network bandwidth allo-
cation in Q-RAM. However, there are two main drawbacks to this approach.
First, it requires each task to enumerate all of its set-points, which, in turn, requires
them to determine all possible routes Pi between the source and destination. As the size
of the network increases, |Pi| increases exponentially, and the complexity of the whole
route discovery process dominates the complexity of the optimization, making the process
intractable for large networks. Therefore, we must use an efficient route discovery technique
that can exploit the architecture of the network, namely hierarchical route discovery [34][53].
Second, suppose for the sake of simplicity that each task has a small set (e.g., ≤ 10) of QoS levels. Even in this case, since Pi is the enumerated list of all routes between
two nodes in the network, it can potentially be very large. Therefore, we must select a few
routes to make the problem tractable. The challenge is to pick these few routes such that
the resulting utility is close to what would be achieved if the exhaustive lists of routes were
considered.
4.3 Hierarchical Network Architecture
In this section, we first formulate the hierarchical network problem using Graph-theoretical
techniques. Next, we describe how this formulation can be used in decomposing our opti-
mization process.
4.3.1 Graph-Theoretical Representation
We follow the description of the hierarchical network model as presented for the Internet
[10, 34, 81]. The ATM forum also adopts a hierarchical architecture for their network [14].
The entire network is represented as a connected undirected graph G = (V,E) as shown
in Figure 4.1, where V denotes the set of vertices and E denotes the set of edges.
The nodes or vertices of a graph represent switches, and the edges represent links. The
bandwidth across each link ej is expressed as the capacity cj of an edge in the graph. If the
network is hierarchically organized, Gp represents the network architecture at a particular
layer p.
The nodes get clustered to form the graph of the next layer. The nodes of the same layer
that are clustered into the same higher layer are said to belong to the same peer group [14].
At a particular layer, a set of edges partition the graph into multiple induced subgraphs,
whose vertices form peer groups. This set of edges defines the edges of the graph at the
next higher layer. We call these edges backbone-edges. If two subgraphs are connected by a
single edge, their connecting backbone-edge becomes a cut-edge of the graph.
Figure 4.1: Hierarchical Graph Model of Network [(a) Layer 1 architecture, showing Layer 1 and Layer 2 backbone-edges; (b) Layer 2 architecture, with Layer 2 vertices / Layer 1 supervertices; (c) Layer 3, with Layer 3 vertices / Layer 2 supervertices]

If we collapse all the vertices and edges of a subgraph Gi of G into a single vertex, it is called a supervertex. Thus the graph at a higher layer is the supervertex graph of that of
the next lower layer. This layered architecture is illustrated in Figure 4.1. Expanding each
supervertex at any layer reveals the entire network of nodes in that subgraph at the lower
layer.
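Forming the supervertex graph is the standard quotient-graph construction: contract each peer group to one vertex and keep only the edges that cross groups. A small sketch (the `membership` map and names are ours, for illustration):

```python
def supervertex_graph(edges, membership):
    """Collapse each peer group into one supervertex; edges whose
    endpoints fall in different groups survive as backbone-edges of
    the next-higher layer."""
    super_edges = set()
    for u, v in edges:
        gu, gv = membership[u], membership[v]
        if gu != gv:                      # backbone-edge survives
            super_edges.add((min(gu, gv), max(gu, gv)))
    return super_edges

# peer groups 0, 1 and 2; edges (b,c), (b,d) and (d,e) cross groups
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("b", "d")]
membership = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}
se = supervertex_graph(edges, membership)
```

Applying the function repeatedly, with a coarser membership map at each layer, yields the layered architecture of Figure 4.1.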
Let us consider a task that sends data from a source node x to a destination node y. We
define PG(x, y) to be the set of all possible routes from x to y. For a connected graph, we
have |PG(x, y)| ≥ 1. Let us also define pG(x, y) ∈ PG(x, y) as a particular route from x to
y. This is formed by concatenating a set of edges that connect x and y. This includes the
edges inside multiple sub-graphs and the backbone edges connecting them. Let us assume
that Vx and Vy are sets of vertices of two subgraphs of G such that x ∈ Vx,y ∈ Vy and
Vx ∩ Vy = ∅. Let the supervertices v′x and v′y of the supervertex graph G′ represent the sets
of vertices Vx and Vy in the original graph G. By definition, PG′(v′x, v′y) denotes the set of
routes between the supervertices v′x and v′y. Therefore, for every pG′(v′x, v′y) ∈ PG′(v′x, v′y),
there is at least one corresponding pG(x, y) ∈ PG(x, y).
Definition 4.3.1 (Border vertices). The vertices in two different induced sub-graphs that
are connected by one or more backbone-edges are known as border vertices.
Definition 4.3.2 (Sub-Route). The set of edges of a particular route connecting two
border vertices of an induced sub-graph between two backbone-edges is called a “sub-route”
or a “child-route”.
Definition 4.3.3 (Parent Route). The route in the supervertex graph that connects the
source and the destination supervertices is called the “parent route” of the “sub-routes”
internal to each supervertex of the (supervertex) graph.
According to the above definitions, each parent route has sub-routes within each super-
vertex it connects. Using the same notation, PG′(v′x, v′y) denotes the set of parent routes, and
each element in PG(x, y) consists of a concatenation of the edges from a route in PG′(v′x, v′y)
and its sub-routes, one from each of the supervertices it traverses. As an example, in the case of the Internet, border vertices denote the edge routers that connect two sub-domains, a parent route represents a route corresponding to "Inter-domain routing", and a sub-route represents one corresponding to "Intra-domain routing".

Figure 4.2: Network sub-domain and Supervertex Graph Example for |PG′(v′x, v′y)| = 1 [source "src" in Sub-domain 1 and destination "dst" in Sub-domain 3, connected through Sub-domain 2 by Backbone 1 and Backbone 2; in the supervertex graph, all routes collapse to a single route between S and D]
Next, we state theorems dealing with route selection for a given flow with a fixed capacity
(or bandwidth) constraint.
Lemma 4.3.4 (Backbone edge and Route selection). If all routes in PG(x, y) share
the same set of backbone edges in Graph G, then |PG′(v′x, v′y)| = 1.
Proof. If all routes in PG share the same set of backbone edges, they go through the same
set of subgraphs. In the supervertex graph G′, these subgraphs are replaced by vertices.
Thus all routes in PG(x, y) collapse to having the same set of supervertices and hence are
connected by the same set of edges in G′. Therefore they collapse to a single route. In other
words, |PG′(v′x, v′y)| = 1.
Let us consider the network of 3 sub-domains illustrated in Figure 4.2. The source node
is present in Sub-domain 1 while the destination node is present in Sub-domain 3. As can
be seen from the figure, every route connecting the source “src” and the destination “dst”
has to go through the same sub-domains 1, 2, 3 and the backbone edges 1 and 2 connecting
those sub-domains. Hence, in the supervertex graph, all routes collapse to a single route
that traverses across 3 supervertices.
Next, we would like to determine the routes internal to each sub-domain. Using the same
example in Figure 4.2, we build a complete route between the source and the destination
by selecting a sub-route within each sub-domain that connects the backbone edges. We can
have multiple possible choices of sub-routes inside each sub-domain. If the selection of the
sub-route in one sub-domain does not affect that in another, we say that the sub-routes
can be chosen independently of each other. Based on that, we state Lemma 4.3.5 under the
situation where we would like to determine a route of a particular bandwidth for a flow.
Lemma 4.3.5 (Independent Sub-Route Selection). For a fixed route pG′(v′x, v′y) ∈
PG′(v′x, v′y) in the supervertex graph with a fixed capacity (bandwidth) requirement, the sub-
routes inside each sub-graph can be chosen independently of each other.
Proof. Let us consider a hierarchical Graph G consisting of multiple induced subgraphs and
backbone edges joining them. The source node and the destination node of a particular task
are denoted by x and y respectively. Any route pG(x, y) ∈ PG(x, y) traverses a fixed set of
subgraphs g1, . . . , gl and a fixed set of backbone edges L1, . . . , Ll−1. If pg1 , . . . , pgl are the sub-routes in the respective subgraphs g1, . . . , gl of the route pG(x, y), then we express pG(x, y) as

pG(x, y) = pg1 ∪ L1 ∪ pg2 ∪ · · · ∪ Ll−1 ∪ pgl . (4.4)

The choice of each sub-route pgi is constrained only by the edges and capacities inside gi. This shows that selecting edges inside each subgraph can be performed independently
under a fixed capacity constraint.
Delay and Hierarchical Routing Lemma 4.3.5 holds true when delay is not considered.
The approximated delay is the main drawback of hierarchical routing [53]. In order to satisfy
the delay constraint in terms of the number of hops as mentioned in Section 4.2.1, we divide
the delay requirements equally in each subgraph falling in the route, similar to what is done
in [39].
Based on Lemma 4.3.4 and Lemma 4.3.5, we state a lemma on the complexity of route
selections.
Lemma 4.3.6 (Complexity of Route Selections). Suppose all routes in PG(x, y) share
the same set of backbone edges L1, . . . , Ll−1, and hence the same set of subgraphs g1, . . . , gl
in Graph G. Furthermore, suppose that the set of edges for the route within a subgraph
gi can be chosen in si different ways under a bandwidth constraint. Then the number of
possible routes is s1 × · · · × sl = ∏_{i=1}^{l} si and the number of computational steps required to choose a route is s1 + · · · + sl = ∑_{i=1}^{l} si.
Proof. Using the notation from (4.4), the set of links pg1 satisfying the bandwidth constraint
from the sub-graph g1 can be chosen in s1 different ways. From Lemma 4.3.5, for each choice
in g1, we can choose the set of links in g2 by s2 different ways and so on. Therefore, the
maximum number of possible ways a route can be selected is s1 × · · · × sl = ∏_{i=1}^{l} si.
Next, the number of steps required to choose the near-optimal set of edges inside a
subgraph gi (sub-route) is si. Since all routes map to a single route in the supervertex
domain, Lemma 4.3.5 proved that the selection of edges in each subgraph can be done
independently of each other under a fixed capacity requirement. Therefore, the maximum
number of steps required to choose a suitable route is s1 + · · · + sl = ∑_{i=1}^{l} si.
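The two counts in the lemma can be checked on a toy instance (an illustrative sketch; the sub-route costs are arbitrary): enumerating every combination of sub-routes examines the product of the per-subgraph choices, while selecting the best sub-route independently within each subgraph examines only their sum.

```python
from itertools import product

# three subgraphs g1, g2, g3 offering s_i candidate sub-routes with these costs
subroute_costs = [[4, 2, 7], [3, 5], [6, 1, 2, 9]]

# exhaustive search: prod(s_i) = 3 * 2 * 4 = 24 combinations
combos = list(product(*subroute_costs))
best_exhaustive = min(sum(c) for c in combos)

# independent selection (Lemma 4.3.5): sum(s_i) = 3 + 2 + 4 = 9 examinations
best_independent = sum(min(costs) for costs in subroute_costs)
```

Both strategies find the same least-cost route, but the independent selection does so in ∑ si steps instead of ∏ si.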
Based on Lemma 4.3.6, we describe our hierarchical route discovery method next. Later, we will also discuss how it assists in hierarchical QoS optimization.
4.3.2 Hierarchical Route Discovery
The process of route discovery assumes a top-down approach. In other words, we obtain
the set of routes for a task at its highest level of network hierarchy. Next, for each of the
(super)vertices in each route, we obtain the sub-routes inside the subgraphs represented by
those vertices. The process starts with the highest level of the task and continues to the
lowest level of the hierarchy. This recursive procedure is described in Algorithm 7. From
Lemma 4.3.6, if we would like to determine ηth sub-routes for each sub-domain, the complexity of hierarchical route discovery is O(p · ηth), where p is the number of sub-domains. On the other hand, a flat route discovery has a complexity of O(ηth^p) for the same set of routes.
For each subgraph at every level, Algorithm 7 determines the set of routes between the
two edge routers using Algorithm 5 or the more efficient Algorithm 8.
4.4 Selective Routing
As proved in Lemma 4.3.6, the hierarchical scheme is able to reduce the complexity of
the route discovery process. However, it does not reduce the overall number of routes per
task. In order to reduce the complexity of the Q-RAM optimization, we must also limit the number of routes considered per task.
input : Level of the hierarchy of the graph, source and destination nodes
output: Hierarchical routes between source and destination nodes
Determine routes between the source and destination nodes within their domain (AS);
// Use Algorithm 5 or 8 or something similar
if the level of the routes is not the lowest then
    for each node in the route do
        Obtain the corresponding subgraph represented by the node;
        Determine the entry router node and the exit router node of the subgraph;
        Call this procedure between the above two nodes within the domain of the subgraph recursively;
    end
end

Algorithm 7: Hierarchical Route Discovery
The route discovery process employed in our scheme is developed in three phases, start-
ing from generating the exhaustive lists of routes for each task to a smart discovery of
fewer routes, with the aim of improving the execution time without incurring any
significant loss in overall utility.
4.4.1 Broadcast Routing
Broadcast routing is the basic approach that uses flooding from the source across the net-
work to determine all possible routes to the destination. It assumes that each node only
knows its neighbors. This process can potentially yield an exponentially large number of
routes, and can therefore become intractable as the size of the network increases.
4.4.2 Smart Route Discovery
Instead of choosing all possible routes between a source and a destination, we would like to
select only a few best or least-cost routes. We use a metric called Route Count Threshold.
Definition 4.4.1 (Route Count Threshold). The route count threshold is defined as
the maximum number of choices of routes for a particular source-destination pair.
input : Source vertex S, Destination vertex D
output: Set of routes connecting S and D
// I = current vertex; I = S when the algorithm is called for the first time
// p = pending/incomplete route under consideration
// Vp = set of vertices for p
if T > Tth then
    Sort all edges ei connected to I within the graph in terms of their minimum cost of routing to D;
    // T = Task ID, Tth = Task Count Threshold
end
for all edges ei do
    r ← 0; // accept the link by default
    if the cost of a potential route, added by the load of the edge, exceeds the maximum cost of the routes already included and the number of routes ηi = ηth then
        r ← 1; // reject this link from route discovery
    else if the next vertex N of the edge already belongs to the pending route then
        r ← 1;
    end
    if r ≠ 1 then
        if N = D then
            A route is constructed;
            Insert the route into the list of routes;
            Update the routing table entries of the vertices falling on this route;
        else
            I ← N;
            Call this algorithm recursively;
        end
    end
end

Algorithm 8: Smart Route Discovery
We denote this limit by ηth. We assume that the number of hops is the measure of the
cost of a route. Using this principle, for ηth = 1, we know that Dijkstra’s shortest route
algorithm can provide the best route between a source and a destination [22]. However,
running Dijkstra's algorithm for each source-destination pair has O(|V|²) complexity, where |V| is the
number of nodes. This can be quite expensive for large networks.
Another alternative is the Bellman-Ford algorithm. This algorithm finds the shortest
routes from a single source vertex to all other vertices in a weighted, directed graph [6, 24].
The algorithm initializes the distance to the source vertex to 0 and all other vertices to
∞. It then does |V | − 1 passes over all edges relaxing, or updating, the distance to the
destination of each edge. The time complexity is O(|V ||E|), where |E| is the number of
edges. A variant of this algorithm is used for distance-vector routing in the Internet, in protocols such as RIP, BGP, ISO IDRP, and Novell IPX.
In our routing scheme that we call “Smart Route Discovery”, we use a modified version
of the Bellman-Ford algorithm within each sub-domain of a network, where we determine
ηth shortest routes for each source-destination pair.
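The idea of keeping only the ηth cheapest loop-free routes, with hop count as the cost, can be sketched as a best-first enumeration; this is an illustrative stand-in for the modified Bellman-Ford used in the thesis, not its exact implementation:

```python
import heapq

def k_cheapest_routes(adj, src, dst, eta_th):
    # Pop partial routes in nondecreasing hop count; the first eta_th
    # completed routes are therefore the cheapest loop-free ones.
    heap = [(0, [src])]
    found = []
    while heap and len(found) < eta_th:
        hops, route = heapq.heappop(heap)
        node = route[-1]
        if node == dst:
            found.append(route)
            continue
        for nxt in adj.get(node, []):
            if nxt not in route:          # keep routes loop-free
                heapq.heappush(heap, (hops + 1, route + [nxt]))
    return found
```

For ηth = 1 this returns the same route a shortest-path algorithm would, but the worst-case work is exponential; the routing-table sort and rejection tests in Algorithm 8 are what keep the real discovery tractable.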
4.4.3 Route Caching
In a distance-vector routing algorithm, a router learns routes from neighboring routers'
perspectives and then advertises the routes from its own perspective. We implement a
reactive distance vector routing protocol in our simulation.
According to this protocol, each node (router) is initialized with the routes of its next
hop neighbors. The algorithm discovers routes of a task starting from its source. Once a
route is established, each node across the route adds the entry to its routing table. The
existing routing table, in turn, is exploited in route discovery. During this process, at any
intermediate node, we sort the neighboring vertices in increasing order of the minimum cost
of routing to the destination based on their routing tables, and reject the neighbors with
more expensive routing in their tables once the number of routes reaches the limit ηth. This
algorithm can provide a potentially sub-optimal route compared to the exhaustive discovery
of the best routes. Therefore, we would like to use this routing information to assist in this
step only after we finish discovering routes for a sufficient number of tasks. We define a
parameter called Task Count Threshold Tth.
Definition 4.4.2 (Task Count Threshold). The task count threshold is defined as the
number of tasks whose routes are determined by exhaustive search using only the next-hop
routing information for each node.
Cached routes are exploited simply to reduce the complexity of the route discovery process. The process becomes intractable for a large dynamic system if we perform
the exhaustive search for every incoming request. Instead, we eliminate this complexity
using cached routing information present in the nodes. However, we need to make sure that
the network has been sufficiently explored by the nodes. Otherwise, the cached route information
at the intermediate nodes may provide sub-optimal routes. Therefore, we would like to
have Tth to be sufficiently large so that the cached routing information can be used without
significantly sacrificing optimality. Route caching is very important in a dynamic networked
system, where flows dynamically enter and leave the system.
Algorithm 8 describes the procedure for Smart Route Discovery that includes the usage
of parameters ηth and Tth.
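The interplay of Tth and the route cache can be sketched as a simple gate around route discovery; the function and cache names below are illustrative, not from the thesis code:

```python
def discover_routes(task_index, src, dst, route_cache, exhaustive_search,
                    t_th):
    # For the first t_th tasks, run the exhaustive (next-hop-only) search
    # and let every node on each discovered route cache a table entry;
    # afterwards, answer from the cached routing information when possible.
    if task_index <= t_th or (src, dst) not in route_cache:
        routes = exhaustive_search(src, dst)
        for r in routes:
            for node in r:
                route_cache.setdefault((node, dst), []).append(r)
    else:
        routes = route_cache[(src, dst)]
    return routes
```

A larger t_th means more tasks pay the exhaustive-search cost, but later lookups draw from a better-populated cache, which is the optimality/run-time trade-off evaluated in Section 4.6.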
4.4.4 QoS Optimization in Large Networks
So far, we have discussed a single centralized optimization scheme that distributes band-
width among tasks. In a large network, a centralized scheme is likely to be infeasible. In
addition, it may not scale well with a very large number of tasks. In the next section, we
will describe a hierarchical QoS optimization technique that exploits the inherent hierarchy
of the network. It can also be distributed across the entire system, thus making the QoS
optimization feasible and scalable for a large network using a large number of tasks or flows.
4.5 Hierarchical QoS Optimization (H-Q-RAM)
In this section, we present H-Q-RAM for networks that utilizes the hierarchical architecture
of networks [81]. In this dissertation, we confine our discussion to only 2 levels of hierarchy
for ease of presentation. The process is divided into two major steps. They are: (1) hierar-
chical concave majorant operation, and (2) distributed resource allocation. The process is
outlined in Algorithm 9, which will be described in detail in the following sections.
[Figure 4.3: Compound Resource Composition — a task routed from source S to destination D through subgraphs g1 and g2. The subgraph set-points (Sg1 : u, q, <R>g1, hg1) and (Sg2 : u, q, <R>g2, hg2) combine into the composite set-point (S : u, q, <<R>g1, <R>g2>, hg1 + hg2).]
4.5.1 Hierarchical Concave Majorant Operation
This process is divided into two steps. First, we generate separate profiles for each task in
each of the sub-domains containing its sub-routes. Second, we combine information from
each sub-domain and update the set-points.
Creation of Multiple Profiles
At the lowest level for each sub-graph, we obtain the set of tasks whose routes include the
sub-graph. Next, we generate local set-points Si = Bi × Dig × Pig for these tasks, where
Pig is the set of sub-routes inside the subgraph g and Dig is the delay assigned for the
route inside subgraph g. As mentioned before, a set-point consists of a utility value, a
corresponding QoS level and a resource vector specifying the route inside the subgraph and
the bandwidth requirement of the links of that route. Thus each task has distinct profiles
within each subgraph.
Next, we evaluate the compound resources for set-points. Using compound resource
values, we prune the list of set-points and discard the ones that are “inefficient”. A set-
point is called inefficient if it has a larger compound resource value than another point at
the same utility level. In other words, if we have multiple set-points for a particular value
of utility, we keep the one that has the smallest compound resource value and discard the
rest. If there is more than one set-point with the same minimum compound resource value
at a utility level, we keep all of those points as co-located set-points (see Chapter 2).
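This pruning rule can be sketched directly; representing a set-point as a (utility, quality, route, h) tuple is an illustrative encoding of our own:

```python
def prune_inefficient(setpoints):
    # For each utility value keep only the set-point(s) with the minimum
    # compound resource h; exact ties survive as co-located set-points.
    best = {}
    for sp in setpoints:
        u, h = sp[0], sp[3]
        if u not in best or h < best[u][0][3]:
            best[u] = [sp]              # strictly cheaper: replace the group
        elif h == best[u][0][3]:
            best[u].append(sp)          # co-located set-point, keep it too
    return [sp for group in best.values() for sp in group]
```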
Creation of Composite Profiles
We next merge the profiles of multiple subgraphs or sub-domains into a single profile for
each task. First, we choose a single set-point for each utility value from each subgraph for
each parent route, and then combine the compound resource values of all subgraphs. Since
all the resources in this case are considered to be of identical type (as network links), the
compound resource of the global set-point of a task spanning two subgraphs g1 and g2 is
given by:
hcomp = hg1 + hg2 , (4.9)
where hg1 and hg2 are the compound resource values of the task (or flow) at its particular
quality setting in the two sub-domains g1 and g2. The generation of a composite set-point
is illustrated in Figure 4.3, where the local set-points of the subgraphs are assumed to be
(Sg1 : u, q, < R >g1 , hg1) and (Sg2 : u, q, < R >g2 , hg2) for a particular value of utility u and
quality level q.
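Equation (4.9) makes the composition step a simple sum over sub-domains. A sketch follows, with each subgraph's set-points represented as a map from (utility, quality) to compound resource h — an illustrative encoding, not the thesis's data structure:

```python
def composite_setpoints(per_subgraph):
    # per_subgraph: list of {(u, q): h} maps, one per subgraph on the route.
    # A composite set-point exists only where every subgraph offers the
    # (u, q) level; its compound resource is the sum h_g1 + h_g2 + ...
    keys = set(per_subgraph[0])
    for sg in per_subgraph[1:]:
        keys &= set(sg)
    return {k: sum(sg[k] for sg in per_subgraph) for k in keys}
```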
[Figure 4.4: Distributed QoS Optimization — optimization threads 1 and 2 for sub-domains G1 and G2 of G, coordinating through global information transactions.]
Second, we determine the concave majorant of these global set-points.
Third, we replace the compound resource values of the local set-points in each sub-
domain by the corresponding composite compound resource values. For example, as shown
in Figure 4.3, the set-points for a task in subgraphs g1 and g2 are changed from (Sg1 : u, q, <
R >g1 , hg1) and (Sg2 : u, q, < R >g2 , hg2) to (Sg1 : u, q, < R >g1 , hg1 + hg2) and (Sg2 : u, q, <
R >g2 , hg1 +hg2) respectively. In addition, since the concave majorant operation eliminates
set-points, a few global set-points may be discarded. In that case, we also discard the
corresponding local set-points in the subgraphs.
Finally, we merge all the local set-points of tasks in each sub-domain to create lists of
set-points called slope lists (see Section 2.3), which are going to be traversed for resource
allocation purposes. As mentioned in Chapter 2, the set-points in the slope list are ordered
by increasing slope or marginal utility values. We will discuss the resource allocation in the
next section.
4.5.2 Transaction-based Resource Allocation
We perform concurrent resource allocation within each sub-domain. Thus, the entire global
resource allocation problem is partitioned into multiple sub-problems within each subgraph,
for Each sub-domain in the network do
    for Each task in the sub-domain do
        Determine set-points Qi = Bi × Di × Pg(i);
        // Pg(i) = number of sub-routes for task τi in the domain
    end
end
for Each task in the entire network do
    Generate global set-points by combining compound resources at each utility level;
    Perform concave majorant on global set-points;
end
for Each sub-domain in the network do
    for Each task in the sub-domain do
        Discard the set-points whose global counterpart has been eliminated by the concave majorant operation;
    end
    Merge the remaining set-points of all tasks in the sub-domain into a single list;
end
for Each sub-domain in the network do
    Execute transaction-based resource allocation as described in Figure 4.5;
end
Algorithm 9: Hierarchical QoS Optimization (H-Q-RAM)
similar to the situation in Chapter 3. However, the sub-problems are not completely in-
dependent of each other in this case, since some tasks may be present in more than one
sub-problem. Such tasks must be assigned the resources to achieve the same utility value
(or quality setting) in all the sub-problems that they are present in. This requires coordi-
nation between these sub-problems, since a resource allocation in one sub-domain may be
infeasible in another sub-domain. In this context, we define three terms.
Definition 4.5.1 (Local Task). A task is called a local task if its source and destination
nodes are in the same sub-domain.
Definition 4.5.2 (Global Task). A task is called a global task if its source and destination
nodes are in different sub-domains.
Definition 4.5.3 (Locality of Tasks). The locality is the fraction of tasks that are local.
Distributed Negotiation
The resource allocator in each sub-domain sequentially goes through its slope list. If it finds that the next set-point in the list belongs to a local task, it determines the feasibility of the allocation locally, and accepts or rejects it based on the availability of local resources. Hence, it works independently for local tasks, assuming that the best route for a local task is available within the sub-domain it belongs to. (A network sub-domain is designed in such a way that the best route for a local task falls within the sub-domain unless its links are extremely crowded.)

When the allocator comes across a set-point of a global task that needs a route spanning multiple subgraphs, it does the following. First, it checks whether the corresponding global set-point has already been rejected. This happens when another sub-domain included in the parent route of the task fails to allocate its corresponding local set-point. In that case, the current allocator also discards the set-point and moves on. Otherwise, it marks the set-point as allocable and waits until every other sub-domain along the route decides the allocation of its corresponding set-point. During this time, it goes to sleep
and wakes up only when all other sub-domains make their decisions. Upon waking
up, it checks if the allocation has been successful. The allocation becomes successful when
all sub-domains are able to allocate their corresponding local set-points that complete the
route with a specific utility value. The allocation is unsuccessful if one of the sub-domains
fails. Upon a successful allocation, it finalizes the local allocation. Otherwise, it rejects the
initial tentative allocation. Next, it proceeds further to complete the operation of QoS-based
resource allocation.
Deadlock Avoidance in Negotiation
Since allocators negotiate the allocation of set-points belonging to global tasks, it is important to ensure that a deadlock never happens. An allocator follows the slope list, which is ordered by increasing marginal utility values, and it is possible for multiple set-points belonging to different tasks or flows, or to different routes of the same task, to have the same marginal utility value. In that case, we must implement an ordering mechanism for set-points to avoid any deadlock.
We implement two levels of ordering to avoid the deadlock. First, we assign a global
number to each flow or task in the entire network. This global number can be obtained as a
combination of IP addresses of the source and the destination nodes, and the corresponding
port numbers.
Second, we also assign a global number to each “Parent Route” within a flow. Using
these numbers, we resolve the contention in the slope list when multiple set-points have the
same marginal utility value. First, we order them in the increasing order of their global
flow IDs. Next, for multiple co-located set-points of the same flow, we order them in the
increasing order of their Parent Route IDs. For the co-located points of the same Parent
Route of the same task, we do not require any ordering since their selections are independent
in sub-domains, as proved in Lemma 4.3.5. (The marginal utility of a task is defined as the ratio of the difference in utility values to the difference in compound resource values between two successive set-points of different utility values.) The allocation process is illustrated in Figure 4.4
and is detailed by a flow-chart in Figure 4.5.
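The two-level tie-breaking can be expressed as a single sort key: because every allocator orders contended set-points by the same (marginal utility, flow ID, parent-route ID) tuple, no cyclic wait can arise. The record layout below is illustrative:

```python
def slope_list_order(setpoints):
    # setpoints: iterable of (marginal_utility, flow_id, parent_route_id)
    # records. Sorting by the full tuple resolves marginal-utility ties
    # first by global flow ID, then by parent-route ID.
    return sorted(setpoints, key=lambda sp: (sp[0], sp[1], sp[2]))
```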
[Figure 4.5: Distributed Resource Allocator — flow chart. The slope list is fed with the sorted set-points of all tasks and task utilities are initialized to 0.0. For each set-point that increases the current utility of its task: a local task is allocated if the allocation is locally feasible; a global task whose allocation is locally and globally feasible is marked allocable, and the allocator sleeps until the other allocators decide, then allocates locally on success or de-allocates locally on failure; infeasible set-points are marked rejected, along with any co-located set-points of the same parent path. The process finishes when no set-point is left.]
4.5.3 Complexity of Network QoS Optimization
In this section, we compare the complexities of the Q-RAM and the H-Q-RAM optimization.
Q-RAM Complexity
Suppose there are n tasks in the entire network. Using the same notation as before, let us assume that |Qm| denotes the maximum number of QoS settings, and let ηth = max_{1≤i≤n} |PG(i)|. This yields the maximum number of set-points L = |Qm|ηth. Hence, the complexity of the concave majorant operation is O(n|Qm| log |Qm|), and the complexity of the merging operation is O(n|Qm|ηth log(n)).

Since the complexity of the Q-RAM optimization is the sum of the complexities of the concave majorant and the merging operation, the total complexity is O(n|Qm|(log |Qm| + ηth log(n))).
H-Q-RAM Complexity
For H-Q-RAM, initial local set-point pruning has O(lnl|Qm|ηth) complexity per sub-domain,
where l equals the number of sub-domains and nl equals the maximum number of tasks per
sub-domain. Unlike the Q-RAM optimization, ηth denotes the upper limit on the number
of routes inside each sub-domain for a task.
Next, we have the concave majorant operation that has the global complexity of O(n|Qm| log(|Qm|)).
The second pruning operation after the concave majorant also has the same complexity
O(lnl|Qm|ηth).
The merging operation requires O(lnl|Qm|ηth log(nl)) steps, and the distributed trans-
action requires a maximum of O(nlηth|Qm|) steps per sub-domain.
We can now express the generic complexity of H-Q-RAM as the sum O(l·nl·|Qm|·ηth) + O(n|Qm| log |Qm|) + O(l·nl·|Qm|·ηth·log(nl)), which is dominated by O(n|Qm| log |Qm| + l·nl·|Qm|·ηth·log(nl)).

From the expression, in the worst case, when every task has a profile in every sub-domain, we have nl = n. Then, the complexity of H-Q-RAM is higher than that of Q-RAM.
In the best case, when every flow is a local task that does not span sub-domains, we have nl = n/l, which yields a complexity lower than that of Q-RAM. However,
in a very large network (the size of the Internet), it is very unlikely that a task traverses
across all sub-domains. Therefore, H-Q-RAM performs better than Q-RAM for practical
cases. Since H-Q-RAM computations can be distributed (one node per sub-domain), we
can further reduce the complexity to O(nl|Qm|(log |Qm| + (nl/n)·ηth·log(nl))). Thus, H-Q-RAM
can scale well with large networks.
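The crossover between the two complexity expressions can be checked numerically. The operation-count functions below drop all constant factors, and interpolating nl between n/l (all tasks local) and n (every task in every sub-domain) by a locality fraction is our own modeling assumption, not a formula from the text:

```python
from math import log2

def qram_ops(n, qm, eta):
    # O(n |Qm| (log |Qm| + eta_th log n)), constants dropped
    return n * qm * (log2(qm) + eta * log2(n))

def hqram_ops(n, l, qm, eta, locality):
    # dominant terms O(n |Qm| log |Qm| + l nl |Qm| eta_th log nl),
    # with nl interpolated by the fraction of local tasks (an assumption)
    nl = locality * (n / l) + (1.0 - locality) * n
    return n * qm * log2(qm) + l * nl * qm * eta * log2(nl)

# all-local tasks (nl = n/l) favor H-Q-RAM, while fully global tasks
# (nl = n) favor flat Q-RAM, matching the best/worst cases above
```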
4.6 Experimental Evaluation
Our experimental evaluation is intended to quantify the performance of H-Q-RAM and Q-
RAM in terms of the trade-off between optimality and scalability. We focus on measuring
two main parameters:
• the global utility obtained by the optimization, and
• the total execution time of the algorithm.
First, we investigate the efficiency of our enhancements in route discovery. We deter-
mine how a selective set of routes obtained through our smart route discovery process can
eliminate the necessity of selecting a large number of routes for optimization purposes.
We also investigate the performance of the optimization when we vary the parameter Tth.
Second, we compare the performance of H-Q-RAM optimization with respect to Q-RAM
optimization.
4.6.1 Experimental Configuration
In order to validate our technique, we generate network topologies using BRITE [56, 57], a topology generation tool. The bandwidth distribution of the network links is presented in
Table 4.1: Settings of Tasks

    Number of QoS dimensions (Bandwidth, Delay)   2
    Length of bandwidth dimension                 random(1, 4)
    Length of delay dimension                     1
    Minimum bandwidth (Bmin)                      min((Rayleigh distr.: µ = 152 Kbps), 8000.0 Kbps)
    Bandwidth increment                           0.3 Bmin
    Maximum delay                                 random(16, 20) hops
    Utilities for QoS dimension (u(q))            (0.5, 0.7, 0.8)
Table 4.2: Settings of Networks

    Network topology generator     BRITE [56]
    Intra-domain link bandwidth    10.0 Mbps
    Inter-domain link bandwidth    10000.0 Mbps
Table 4.2.
The specifications of the tasks are presented in Table 4.1. As seen from the table,
the minimum bandwidth is randomly chosen following a Rayleigh distribution with µ =
152 Kbps. This distribution ensures a positive value for the minimum bandwidth of any
task. For simplicity, we choose a single value of delay, which is expressed by a certain
maximum number of hops for a route. The source and the destination nodes of a task are
chosen randomly across the entire network. The experiments are performed on a 2.0 GHz
Pentium IV processor with 768 MB of memory.
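The task generator implied by Table 4.1 can be sketched as follows; treating µ = 152 Kbps as the Rayleigh scale parameter, as well as the field names, are assumptions on our part:

```python
import math
import random

def rayleigh(rng, sigma):
    # inverse-CDF sampling: R = sigma * sqrt(-2 ln(1 - U)), U ~ Uniform[0,1)
    return sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))

def make_task(rng, sigma=152.0):
    # One task per Table 4.1: Rayleigh-distributed minimum bandwidth capped
    # at 8000 Kbps, 1-4 bandwidth levels spaced by 0.3*Bmin, and a maximum
    # delay of 16-20 hops.
    b_min = min(rayleigh(rng, sigma), 8000.0)        # Kbps
    n_levels = rng.randint(1, 4)
    bandwidths = [b_min * (1.0 + 0.3 * i) for i in range(n_levels)]
    return {"bandwidths": bandwidths,
            "max_delay_hops": rng.randint(16, 20)}
```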
4.6.2 Performance Evaluation of Selective Routing
In this section, we evaluate the performance of the selective routing algorithms.
Results on Smart Route Selection
In this experiment, we demonstrate the effectiveness of smart route selection as described
in Section 4.4.2.
[Figure 4.6: Comparison of Smart Route Discovery and Random Route Discovery — utility vs. number of tasks for the two schemes.]
[Figure 4.7: Utility Variation with Number of Routes — utility vs. number of tasks for ηth = 1, 2, 5, 80, and ∞ (the ηth = ∞ case shown as a bar graph).]
First, we compare the smart route discovery algorithm with the random route discovery algorithm, where we randomly select ηth routes out of all possible routes. We vary the number of tasks in the system in geometric progression as N = 10, 20, 40, . . . , 640.
We plot the accrued utility against the number of tasks for ηth = 5 under both schemes
in Figure 4.6. The results show that a random route selection scheme yields a much lower
utility (29.5% for N = 320) compared to the smart route selection.
Next, we compare smart route selection for different values of ηth. In this case, we use
5 values of ηth as [1, 2, 5, 80,∞]. The value ∞ signifies that all possible routes are chosen
for each source-destination pair. The plots of utility against the number of tasks are shown
in Figure 4.7. The “ηth =∞” case is shown by the bar graph instead of a line.
From the bar graph, we observe that we do not have any data beyond N = 40 for
ηth = ∞. This is because for N ≥ 80, the route discovery and the optimization processes
become intractable. This is further confirmed by its steep rise in execution time as shown
in Figure 4.8.
On average, the utility increases as ηth increases since it provides more alternative
routes for each task. However, the difference between utilities at ηth = 5 and ηth = ∞
is statistically insignificant (< 0.09%), whereas the reduction in execution time for ηth = 5
is 93.6% (or, 15.6 times). Overall, we observe a 99.997% (or, 38239.4 times) reduction in
execution time for ηth = 5 relative to ηth = ∞ when the number of tasks is 40. Even for
ηth = 2, the reduction in utility is only 3.57% relative to ηth = 80 for 640 tasks, with a
run-time reduction of 96.9%.
Results on Route Caching
This experiment demonstrates how caching route information helps in reducing the execu-
tion time of the optimization. In this case, we fix the number of tasks N to 640 and vary
the parameter Task Count Threshold Tth. Figure 4.9 shows the percentage drop in utility
for different values of Tth relative to the case with no route caching, i.e., Tth = ∞. The
[Figure 4.8: Run-Time Variation with Number of Routes — optimization run-time in seconds (log scale) vs. number of tasks for ηth = ∞ (all paths), 80, 5, 2, and 1.]
[Figure 4.9: Percentage Utility Drop with Routing Task Count Threshold — percentage utility drop vs. Tth.]
[Figure 4.10: Percentage Run-Time Variation with Routing Task Count Threshold — percentage execution-time change vs. Tth.]
value of ηth is kept constant at 5.
We observe that even for Tth = 1, where we start exploiting cached route information right after the first task's routes have been determined, the percentage loss of utility is less than 3%. On the other hand, we also observe a huge drop in execution time
(> 60%) as shown in Figure 4.10.
Using the route caching technique, the route discovery time per task will reduce with
time as nodes keep adding more entries to their routing tables. Figure 4.11 shows the plot
of route discovery time per task against the number of tasks considered for optimization.
It clearly shows that the route discovery time decreases exponentially with the number of
tasks. Hence it decreases with time in a dynamic system where tasks regularly arrive in and
depart from the system. In other words, we can claim that in a dynamic scenario, in steady
state, the optimization time dominates the route discovery time. This is also corroborated
in Figure 4.12, which shows the ratio of route discovery time and optimization time per
task.
[Figure 4.11: Average Execution Time for Route-discovery Simulation Per Task — route-discovery time per task vs. number of tasks.]
[Figure 4.12: Ratio of Q-RAM Optimization Time To Route-Discovery Per Task — run-time ratio vs. number of tasks.]
Table 4.3: Specifications of the Networks

    Network ID   Number of Sub-domains   Number of nodes   Number of links
    1            5                       100               207
    2            8                       160               334
    3            15                      450               930
    4            20                      600               1240
4.6.3 Performance Evaluation of Hierarchical Optimization
In this section, we evaluate the performance of Hierarchical QoS optimization. We use 2
levels of hierarchy for our experimental evaluation. We use the same specifications of tasks
as mentioned in Table 4.1. In order to validate the usefulness of H-Q-RAM, we use larger
networks, consisting of 5, 8, 15 and 20 sub-domains respectively. Their specifications are
presented in Table 4.3, and their bandwidth distributions are as specified in Table 4.2. For these large networks, we use ηth = 2 and Tth = 1, since these settings have provided reasonably good utility values (within 5% of the optimal) with large reductions in execution time for smaller networks.
[Figure 4.13: Absolute Utility Variation in Q-RAM and H-Q-RAM — utility vs. number of tasks for the two schemes.]
In the first experiment, we use Network 3 from Table 4.3. In this case, we vary the
number of tasks for optimization between 100 and 10240 in a geometric progression with
[Figure 4.14: Absolute Execution Time Variation in Q-RAM and H-Q-RAM — execution time (sec) vs. number of tasks for the two schemes.]
a power of 2. Figure 4.13 shows the variation of utility between Q-RAM and H-Q-RAM
against the number of tasks. Figure 4.14 shows the variation of execution time against the
number of tasks.
We observe that H-Q-RAM reduces the optimization time for 10,240 tasks by 64% while incurring a utility reduction of less than 2% relative to Q-RAM. From Figure 4.14, we also observe that the execution-time difference between Q-RAM and H-Q-RAM increases further with the increase in the number of tasks.
Implementation Considerations: As can be seen from Figure 4.14, the execution time
of the optimization increases exponentially for a large number of tasks to be deployed
in larger networks. This is because the simulation becomes memory-intensive under this
situation and hence many page faults and swapping operations cause the non-linear (expo-
nential) increase in the execution times. Consequently, it becomes difficult to simulate the
hierarchical optimization of a very large network in a single host, as the memory requirement of the optimization process also increases. This effectively suggests the necessity of
[Figure 4.15: Variation of Percentage Utility Loss for 6400 Tasks with the Number of Sub-domains — percentage loss in global utility vs. locality of tasks (%) for networks with 5, 15, and 20 sub-domains.]
studying the performance improvement of distributed transaction-based optimization using
H-Q-RAM. The execution time for H-Q-RAM will be reduced further if the optimization
is distributed over multiple hosts. This will be the only option available, since running the
Q-RAM optimization for all the tasks in a single host becomes intractable.
Next, we measure the performance of H-Q-RAM relative to the locality of tasks in
different sub-domains. From our complexity analysis, we know that H-Q-RAM performs
best when the source and the destination of a task are confined within a single domain, which
in turn also eliminates transactions between sub-domains during the optimization step. In
this experiment, we keep the number of tasks constant at 6400 and vary the locality of tasks
between 0% and 96% and measure the performance of Q-RAM and H-Q-RAM. The results
are taken for Networks 1, 3 and 4 from Table 4.3.
Figure 4.15 plots the percentage loss in utility under H-Q-RAM, which does not exceed
4.5%. In addition, the loss drops with the increase in the locality of the task and with the
increase in the size of the network.
[Figure 4.16: Variation of Percentage Run-Time Reduction for 6400 Tasks with the Number of Sub-domains — percentage reduction in execution time vs. locality of tasks (%) for networks with 5, 15, and 20 sub-domains, with the y = 0 line marked.]
Figure 4.16 plots the percentage gain in execution time under H-Q-RAM. As seen from
the figure, H-Q-RAM actually has 20% higher execution time under 0% task locality for
the smallest network (Network 1 with 5 sub-domains). However, the gain increases with the size of the network as well as with the locality of the tasks. Moreover, the rate of increase in
percentage gain decreases with the increase in the size of the network. In other words, for a
very large network, H-Q-RAM performs better than Q-RAM and the significance of locality
on this performance decreases.
The above experiment shows that H-Q-RAM provides a significant gain in performance
when (a) the size of the network is large, and (b) the locality of the tasks is high. These
results are in agreement with the complexity analysis of H-Q-RAM.
In Figure 4.17, we also plot the number of transactions against the locality of the
tasks. As expected, the number of transactions decreases with the increase in task locality.
However, we observe a larger number of transactions with the increase in the size of the
network. This affects the absolute execution time of H-Q-RAM in our simulation due to
a large amount of switching among optimization threads and the consequent page faults.
Figure 4.18 plots the H-Q-RAM execution time against the number of sub-domains under
[Figure 4.17: Number of Transactions for 6400 Tasks with the Number of Sub-domains — number of transactions vs. locality of tasks (%) for networks with 5, 8, 15, and 20 sub-domains.]
different values of the task localities for 6400 tasks. At a very high task locality (96%), the
number of transactions becomes negligible, and the execution time becomes independent of
the number of sub-domains. On the other hand, the execution time monotonically increases
with the number of sub-domains for lower task locality values.
Based on the above results, we conclude that H-Q-RAM performs well for large networks
compared to Q-RAM, which makes it feasible to employ QoS-based optimization in large
networked environments. However, we also observe that the number of transactions increases with the size of the network. Therefore, we would like to reduce
the number of transactions for future implementations.
[Figure 4.18: Variation of (H-Q-RAM Execution Time / Number of Sub-domains) for 6400 Tasks with the Number of Sub-domains — time in seconds (log scale) vs. number of sub-domains for task localities 0.0, 0.24, 0.48, 0.72, and 0.96.]
4.7 Chapter Summary
In this chapter, we have discussed a resource allocation scheme for a networked system based on Q-RAM. First, we proposed several pruning algorithms for smart route selection that make the basic optimization more scalable without any significant loss in the optimality of the solution. Our main goal was to analyze the trade-off between optimality and the execution time of our QoS optimization. Although the specific values may vary depending on the topology, restricting the maximum number of routes to 2 reduces optimality by only 5%. In addition, exploiting cached route information across the network becomes more useful as the size of the network increases.
Next, we presented a transaction-based hierarchical scheme (H-Q-RAM) that can make
the problem more scalable by exploiting the presence of hierarchy in networks. The perfor-
mance of H-Q-RAM improves with the increase in the size of the network and the locality
of the tasks. We also observed that the simulation is memory-intensive, and it becomes increasingly expensive on a single host as the size of the network grows. Therefore, a centralized scheme becomes infeasible for a network the size of the Internet. Since H-Q-RAM can be executed concurrently on multiple machines using distributed transactions, it can be run in parallel to address large networks. In addition, we would also like to reduce the number of transactions, which increases with the size of the network. This can be done if we can aggregate multiple tasks into a few “super-tasks” and perform transactions on those “super-tasks”. Hence, our future work will investigate efficient methods of task aggregation.
Chapter 5
Resource Allocation in Phased
Array Radar
5.1 Introduction
There are certain systems where a task has a large number of operational dimensions and/or
a large number of elements across these dimensions¹. For example, an application that has 10 operational dimensions with 10 levels along each dimension will have \(10^{10}\) set-points or more. These tasks are called highly configurable tasks.
In addition, there are also certain tasks for which environmental factors play a key role
in deciding the QoS levels. In a dynamically changing environment, the mapping between
resources and utility may change with time. That necessitates frequent QoS optimizations
in order to allocate the resources among the tasks in a near-optimal manner.
Certain distributed embedded systems operate in conditions where both the above situ-
ations hold. An example of such a system is a phased array radar tracking system. A radar
system is an example for which environmental factors outside the direct control of the sys-
tem affect the relationship between the level of service and the resource requirements of the
¹The operational and the environmental dimensions are defined in Chapter 2.
tasks, which in turn affect the perceived utility. In these systems, a finite amount of radar
bandwidth and computing resources must be apportioned among multiple tasks tracking
and searching targets in the sky. In addition, environmental factors such as noise, heating
constraints of the radar and the speed, distance and maneuverability of the tracked targets
dynamically affect the mapping between the level of service and resource requirements as
well as the mapping between the level of service and the user-perceived utility. Their highly
dynamic nature and stringent time constraints lead to complex cross-layer interactions in
these systems. Therefore, the design of such systems has long been a handcrafted mixture of pre-computed schedules, pessimistic resource allocations, cautious energy usage, and operator intuition.
In this chapter, we consider an integrated framework for QoS optimization and schedul-
ing for a phased-array radar system. The antenna in a phased-array radar system can
electronically steer the energy beam in a desired direction. This allows it to track targets at differing frequencies depending upon each target's distance and characteristics such as speed and acceleration. Some characteristics of a radar system are as follows.
• The longer the distance between the target and the radar, the higher the energy
requirement.
• Once a beam transmission starts, it cannot be preempted.
The goal of the radar system is to utilize its finite energy and time resources to maximize
the quality of tracking. In addition to the tracking tasks, the system also includes search tasks and target confirmation tasks. A search task searches for new targets in the sky, and a target confirmation task confirms a target after it is detected by the search task.
A radar system must make two sets of decisions. First, it must decide what fraction
of resources (energy and time) to spend on each target. It must then schedule the radar
antenna(s) to transmit the beams and receive the return echoes in a non-preemptive fashion.
Since targets in the sky are continually moving, resource allocation and scheduling decisions
must be made on a frequent basis. Due to the multi-dimensional nature of radar resource
allocation, the problem of maximizing the benefits gained is NP-hard.
In our scheme, we develop an integrated framework that performs a near-optimal re-
source allocation and scheduling of the tracking tasks in real-time. We show that such
decisions can be made near-optimally, while maintaining schedulability and satisfying the
resource constraints of the system. We concentrate primarily on the radar antenna resources
as these are generally scarce compared to the computing resources. Unlike traditional radar
systems, we use two layered components. A QoS optimization component is concerned
with determining how much of the resources should be given to each task, and a scheduling
component is concerned with determining when radar tracking tasks should be scheduled.
In short, our radar resource management scheme deals with two primary concerns: the
selections of operating points and ensuring schedulability.
Selection of Operating Points: This is performed by using Q-RAM. In this chapter,
we describe a scalable Q-RAM technique for allocating resources to radar tasks. This is
also presented in [27, 32, 36].
Ensuring Schedulability: In general, Q-RAM uses only straightforward resource constraints (such as requiring that the total usage of any resource be less than some utilization bound). In the radar system, a given allocation generated by Q-RAM may or
may not be schedulable, and furthermore, jitter constraints can be violated even if all the
resource utilizations are less than 100%. In other words, the QoS allocator (Q-RAM) and
the scheduler need to be tightly integrated. Therefore, we present a scheme that integrates
the Q-RAM framework with the radar schedulability test. This is also described in [30].
Although radar scheduling incorporates a pipelined scheduling of a back-end and a front-
end, the front-end antenna remains the bottleneck resource. Therefore, we concentrate our
schedulability analysis on the front-end only. In order to provide stringent periodic jitter
constraints, we use harmonic periods for tasks [52].
The rest of this chapter is organized as follows. Section 5.2 presents our model of the
Figure 5.1: Radar System Model (angular regions covered by R1 only, R2 only, R3 only, and R4 only, plus the overlap regions R1&R2, R2&R3, R3&R4, and R1&R4)
radar system, its associated resources and constraints. Section 5.6 describes our radar dwell scheduling scheme. In Section 5.9, we present an evaluation of our experimental results. Finally, in Section 5.10, we summarize our concluding remarks and provide a brief description of our future work.
5.2 Radar Task Model
We assume the same radar model as used in [30, 32]. It consists of a single ship with 4 radar antennas oriented at 90° to each other as shown in Figure 5.1. We also assume that each antenna is capable of tracking targets over a 120° arc. This means that there are regions
of the sky that are capable of being tracked by only one radar antenna, as well as regions
that can be tracked by two antennas. The antennas are assumed to share a large pool of
processors used for tracking and signal-processing algorithms, and a common power source
to supply energy to the antennas. The main tasks of an antenna are search and tracking.
• Search: There are multiple search tasks that cover the entire angular range of the
radar.
• Tracking: There is one tracking task corresponding to each target being tracked.
A single instance of tracking a particular target consists of sending a radar signal con-
sisting of a series of high frequency pulses and receiving the echo of those pulses. This
instance is known as a dwell, as shown in Figure 5.2. It is characterized in terms of a trans-
mit power Ai, a transmission time txi, a wait time twi and a receive time tri. Note that in
an actual radar, the transmission time actually consists of a series of rapid pulses over a
time period txi as opposed to a continuous transmission. Generally, txi = tri, and the wait
time is based on the round-trip time of the radar signal (e.g., about 1 ms for a target 100
miles away). Also, while the radar may dissipate some power while receiving, this power is
much smaller than the transmit power. For simplicity, we assume that the receive power
is negligible compared to the transmit power. The time between two successive dwells is
called the dwell period (Ti).
Figure 5.2: Radar Dwell (transmit power Ai during transmit time txi, followed by wait time twi and receive time tri, repeating with period Ti)
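As a concrete illustration of the dwell parameters above, here is a small Python sketch; the class layout, field names, and the 100-mile example are ours, not part of the thesis:

```python
from dataclasses import dataclass

C = 299_792_458.0  # speed of light in m/s

@dataclass
class Dwell:
    # One dwell, following Figure 5.2; field names are illustrative.
    A: float   # transmit power (W)
    tx: float  # transmission time (s)
    tw: float  # wait time (s)
    tr: float  # receive time (s), typically equal to tx
    T: float   # dwell period (s)

def wait_time(target_range_m: float) -> float:
    # Round-trip travel time of the radar signal to the target and back.
    return 2.0 * target_range_m / C

# A target 100 miles (~160.9 km) away gives a round trip of about 1.07 ms,
# consistent with the "about 1 ms" figure quoted in the text.
tw_100mi = wait_time(100 * 1609.344)
```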
In order to appropriately track a target, the dwell needs to have a sufficient number of
pulses (target illumination time or txi) with a sufficient amount of power (Ai) on the pulses
to traverse through the air, illuminate the target and return back after reflection. Larger
txi (more pulses) and Ai provide better tracking information. The value of Ai required to
adequately track a target is proportional to the 4th power of distance between the target
and the radar [46]. The transmit power is limited not only by the output capability of the energy source but also by the heat dissipation constraint of the radar. The tracking information is also
dependent on many environmental factors beyond the radar system’s control such as the
speed, the acceleration, the distance and the type of the target, the presence of noise in the
atmosphere and the use of electronic counter-measures by the target.
Based on the received pulses, an appropriate signal-processing algorithm must be used
in order to properly estimate the target range, velocity, acceleration, type, etc. There are
many tracking algorithms used in radar systems. They provide trade-offs between the noise
tolerance and dealing with target maneuverability. They also have different computational
requirements. Thus, each radar task consists of a front-end sub-task at the antenna and a
back-end signal-processing sub-task at the processors.
Since a target can maneuver to avoid being tracked, the estimates are valid only for the
duration of illumination time. Based on these data, the time-instant of the next dwell for the
task must be determined. Therefore, the tracking task needs to be repeated periodically with
a smaller period providing better estimates. In the absence of any jitter, the tracking period
is equal to the temporal distance between two consecutive dwells. For a large temporal
distance, the estimated error can be so large that the dwell will miss the target. On the
other hand, a small temporal distance will require higher resource utilization. The radar
needs to track the targets with higher importance using greater tracking precision than the
ones with lower importance [30].
A radar task is periodic with a strict jitter constraint. For example, for a task with
period Ti, the start of each dwell must be exactly² Ti milliseconds from the start of the
previous dwell. We make the seemingly conservative choice of using only harmonic periods
for radar tasks since by using harmonics we can automatically satisfy the stringent periodic
jitter constraints (a pin-wheel scheduling problem [52]).
²In practice, if two successive dwells are not separated exactly by Ti, lower tracking quality will result. If the jitter is higher than a (small) threshold, an entire track may be lost.
5.3 Radar Resource Model
The radar resource space consists of the following resource dimensions: radar bandwidth, radar power, and computing resources.
5.3.1 Radar Bandwidth
As we mentioned earlier, a radar can track only a limited number of targets at a specific
time. Since the radar is unused during the waiting period of a dwell, this time can often be
used by other dwells through interleaving. This gives us a radar utilization value of:

\[ U_r = \sum_{i=1}^{N} \frac{t_{xi} + t_{ri}}{T_i}. \tag{5.1} \]

If we assume the receiving time to be equal to the transmission time, we obtain:

\[ U_r = 2 \sum_{i=1}^{N} \frac{t_{xi}}{T_i}. \tag{5.2} \]
5.3.2 Radar Power Constraints
In addition to timing constraints, radars also have power constraints. Violating a power
constraint can lead to overheating and even permanent damage to the radar. The radar can
have both long-term and short-term constraints. For example, there may be a long-term constraint of operating below an average power of 1 kW, and a (less stringent) short-term constraint of operating below an average power of 1.25 kW in a 200 ms window. The short-term constraint is generally specified using an exponential weighting within a sliding window.
Long-Term Power Utilization Bound
If Pmax is the maximum sustained long-term power dissipation for the radar, then we define
the long-term power utilization for a set of N tasks as:
\[ U_P = \frac{1}{P_{max}} \sum_{i=1}^{N} \frac{A_i t_{xi}}{T_i}. \tag{5.3} \]

That is, the long-term power is given by the fraction of time each task is transmitting, multiplied by the transmit power for that task. Dividing by Pmax gives a utilization value that cannot exceed 1. To handle long-term constraints in Q-RAM, we simply treat power as a resource, and denote the amount of that resource consumed by task i as \( \frac{1}{P_{max}} \frac{A_i t_{xi}}{T_i} \).
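The radar utilization of (5.2) and the long-term power utilization of (5.3) are straightforward to compute; a minimal sketch, assuming each task is represented as an (A_i, tx_i, T_i) triple (a representation we chose for illustration):

```python
def radar_utilization(tasks):
    # U_r from (5.2), assuming t_ri = t_xi for every task.
    return 2.0 * sum(tx / T for _, tx, T in tasks)

def long_term_power_utilization(tasks, p_max):
    # U_P from (5.3): duty cycle of each task times its transmit power,
    # normalized by the sustained power limit P_max.
    return sum(A * tx / T for A, tx, T in tasks) / p_max

tasks = [(2000.0, 0.002, 0.1),   # (A_i in W, tx_i in s, T_i in s) -- made-up values
         (1500.0, 0.004, 0.2)]
ur = radar_utilization(tasks)
up = long_term_power_utilization(tasks, p_max=1000.0)
```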
Short-Term Power Utilization Bound
We will now derive a short-term power utilization bound. Short-term power needs are
defined in terms of a sliding window [5] with time constant τ . With an exponential sliding
window, pulses transmitted more recently have a larger impact on the average power value
than less recently transmitted pulses. Also, the rate at which the average power decreases is proportional to the average power value, meaning that immediately after transmitting a pulse, we have a relatively high but steadily decreasing cooling rate. The use of a sliding
exponential window has two benefits: it is memory-less, and it closely models thermal
cooling, which is the primary motivation for the constraint.
In order to define the short-term average power, we first define instantaneous power
dissipation as p(t). This function is 0 when the radar is not transmitting and Ai while pulse
i is being transmitted. We then define the average power at time t for a time constant τ as:
\[ P^{\tau}(t) = \frac{1}{\tau} \int_{-\infty}^{t} p(x)\, e^{(x-t)/\tau}\, dx. \tag{5.4} \]
Figure 5.3: Average Power Exponential Window

Figure 5.4: Cool-Down Time

Figure 5.5: Non-Optimal Initial Average Power
Figure 5.3 shows an example of the average power value for a set of pulses along with the
exponential sliding window at time t0. The shaded bars represent the transmitted radar
energy, and the dotted line represents the sliding window at time t0. The short-term average power constraint is considered satisfied if (5.4) never exceeds some bound \(P^{\tau}_{max}\). This bound is called the power threshold over a look-back period τ. Alternatively, the expression \(E_{th} = P^{\tau}_{max}\,\tau\) is defined as the energy threshold of the system.
Now, we would like to translate the short-term energy constraint of the radar antenna
to a timing constraint. In this context, we define a timing parameter called the cool-down
time tci that precedes a dwell of each task i.
Definition 5.3.1 (Cool-down Time). The cool-down time for a task is the time required for \(P^{\tau}(t)\) to fall from \(P^{\tau}_{max}\) to a value just low enough that, at the end of the transmit phase of a dwell, \(P^{\tau}(t)\) will be restored to \(P^{\tau}_{max}\).
The effect of cool-down time is shown in Figure 5.4. It is a function of the transmit time txi and the transmit power Ai of a dwell, the time constant τ, and the short-term average power constraint \(P^{\tau}_{max}\). This factor allows the power constraints to be converted into simple timing constraints.
We will now derive the cool-down time tci for a task i. We will assume that for this task \(A_i \ge P^{\tau}_{max}\). For a task with \(A_i \le P^{\tau}_{max}\), there is no need for a cool-down time, since the radar cools down even while it continues transmitting; that is, \(t_{ci} = 0\). Let \(P_s\) be the average power at the beginning of the cool-down period, \(P_{in}\) be the average power at the end of the cool-down period, and \(P_{out}\) be the average power at the end of the transmission. We want \(P_s = P_{out} = P^{\tau}_{max}\). We can express \(P_{in}\) in terms of \(P_s\) as:

\[ P_{in} = P_s\, e^{-t_{ci}/\tau}, \tag{5.5} \]
and \(P_{out}\) in terms of \(P_{in}\) as:

\[ P_{out} = P_{in}\, e^{-t_{xi}/\tau} + A_i (1 - e^{-t_{xi}/\tau}). \tag{5.6} \]

Substituting \(P^{\tau}_{max}\) for \(P_{out}\) in (5.6) and solving for \(P_{in}\), we get:

\[ P_{in} = \frac{P^{\tau}_{max} - A_i (1 - e^{-t_{xi}/\tau})}{e^{-t_{xi}/\tau}}. \tag{5.7} \]

We can now substitute \(P^{\tau}_{max}\) for \(P_s\) in (5.5), set the forward and backward definitions (5.5) and (5.7) for \(P_{in}\) equal, and solve for tci to yield the expression for the cool-down time:

\[ t_{ci} = -\tau \ln \frac{P^{\tau}_{max} - A_i (1 - e^{-t_{xi}/\tau})}{P^{\tau}_{max}\, e^{-t_{xi}/\tau}}. \tag{5.8} \]
We now present the following theorem:
Theorem 5.3.2. For any set of N periodic radar tasks that does not violate the short-term average power constraint (where \(A_i \ge P^{\tau}_{max}\) for all tasks), the total short-term average power utilization given by

\[ U_{\tau} = \sum_{i=1}^{N} \frac{t_{ci} + t_{xi}}{T_i} \tag{5.9} \]

must be no greater than 1.
Proof. Assume that we have a set of tasks for which \(U_{\tau} = 1\). From (5.8), it can be shown that any decrease in \(P^{\tau}_{max}\) will cause the tci to increase and thus cause \(U_{\tau}\) to exceed 1. If we can show that when \(U_{\tau} = 1\) the optimal schedule must include a point where the average power \(P^{\tau}(t)\) equals \(P^{\tau}_{max}\), then this implies that the theorem must hold. Now, assume that we have a schedule S where tasks are scheduled such that each dwell transmission period txi is preceded by an idle time of tci, with the cool-down time for each dwell beginning exactly at the end of the previous dwell's transmission. Now let \(P_s\) be the average power at the beginning of the cool-down period preceding a dwell transmission. It can be shown from (5.5) and (5.6) that if \(P_s < P^{\tau}_{max}\), then the \(P_{out}\) for that dwell must satisfy \(P_s < P_{out} < P^{\tau}_{max}\), as shown in Figure 5.5, due to the fact that the cooling rate is proportional to the current average power. This implies that at the end of each transmit period for each successive dwell, the average power will increase until it converges to \(P^{\tau}_{max}\). This means that in the steady state, the average power will be \(P^{\tau}_{max}\) at the end of the transmission period of every dwell. The schedule S must be optimal, since moving a dwell any sooner would result in an increase in \(P_{in}\) for that dwell and thus increase \(P_{out}\) as well (exceeding \(P^{\tau}_{max}\)). Moving a dwell any later would trade off the efficient cooling immediately after the transmission, when the average power is at \(P^{\tau}_{max}\), for less efficient cooling before the transmission, resulting in a violation after the next dwell. This shows that the schedule S must be optimal and that it must have a point where the average power is equal to \(P^{\tau}_{max}\).
Based on (5.9), we model the short-term average power constraint in the Q-RAM optimization framework by treating power as a pseudo-resource with a maximum value of 1 and treating each radar task as if it consumes \(\frac{t_{ci} + t_{xi}}{T_i}\) units of that resource, with tci computed using (5.8). Hence, the expression in (5.9) is also referred to as the cool-down utilization \(U_c\) of the system.
It is interesting to note that if we take the limit \(\tau \to \infty\) in Equation (5.8), it can be shown that

\[ t_{ci} = \left( \frac{A_i}{P^{\tau}_{max}} - 1 \right) t_{xi}. \tag{5.10} \]
If we then substitute the above into (5.9), we obtain:

\[ U_{\tau=\infty} = \frac{1}{P^{\tau}_{max}} \sum_{i=1}^{N} \frac{A_i t_{xi}}{T_i}. \tag{5.11} \]

We see that this equation has exactly the same form as the long-term power utilization given in Equation (5.3).
Computational Resource
In addition to the radar resource, each track requires computing resources to process the
radar data, and to predict the next location of the target. The computing resources required
depend on the tracking algorithm Πi used, and the period Ti. We assume that the required
CPU is of the form CΠi/Ti where CΠi is the coefficient representing the computational cost
of algorithm Πi in each time period Ti. If we treat the back-end multiprocessor system as
a single resource, then we have the CPU constraint:
\[ \sum_{i} C_{\Pi_i} / T_i \le C_{max}, \tag{5.12} \]
where Cmax represents the total processing power of the bank of processors. This abstraction
is reasonable as long as the amount of processing required by each of the individual tasks
is small compared with the amount available on each of the processors.
5.3.3 Radar QoS Model
In [32], we developed a Q-RAM model for the radar tracking problem. There are two
principal QoS dimensions in the quality space of the radar tracking problem: tracking error
and search quality.
Tracking Error
This is the difference between the actual position and the tracked position of the target.
Although one cannot know the true tracking error, many tracking algorithms yield a precision estimate for a particular tracking result. As mentioned in Section 5.3, this tracking precision
is dependent on the availability of the physical resources in addition to the computing re-
sources. A smaller tracking error leads to better tracking precision and hence better quality
of tracking. Therefore, we assume that the tracking quality qtrack is inversely dependent on
tracking error ε, as given by:
\[ q_{track} = \frac{1}{\varepsilon}. \tag{5.13} \]
Search Quality
We also define a QoS parameter for the search task. A search task must span the entire
angle in order to find the targets. It consists of multiple dwells (radar beams) to search
a particular angular space. Hence, the searching QoS increases with the increase in the
number of beams within a fixed angular space.
Reliability
This is the probability that there is no hardware/software failure in a specified time interval.
Higher reliability of a task is obtained by replicating resources, such as using two radars
to track a single target. Since we handle the use of replicas in Chapter 3, we will consider tracking error and search quality as the only QoS dimensions here.
Next, we list the operational and environmental dimensions of the system.
Operational Dimensions
In our tracking model, the operational dimensions are the dwell period (Ti), the dwell time
(txi), the dwell power (Ai), and the choice of the tracking algorithm Πi.
The above parameters can be controlled by the system designer or the optimizer in order
to achieve the desired quality of tracking of a target.
Environmental dimensions
The environmental dimensions we consider are the type of target ξi (e.g., airplane, helicopter, missile, etc.), the distance of the target from the radar ri, the velocity vector of the target \(\vec{v}_i\), the acceleration vector of the target \(\vec{a}_i\), the active noise or the presence of electro-magnetic interference as counter-measures ni, and the angular location of the target in the sky.
Higher tracking quality yields higher utility. Hence we assume that the utility of tracking
a target for a certain quality qtrack is given by the following concave exponential function:
\[ U(q_{track}) = W_{track}(1 - e^{-\beta q_{track}}), \tag{5.20} \]
where β is a parameter specific to the ranges of the speeds of three different types of
targets (airplane, missile, or helicopter). Equation 5.20 assumes that the utility increases with tracking precision and ultimately saturates at a very high precision [49]. The parameter Wtrack is a weight factor that determines the importance of the target and is also dependent on the type of the target. Moreover, it is assumed to be proportional to the speed of the target and inversely proportional to its distance. We assume a weight
factor Wtrack of the form:
\[ W_{track} = K_t \left( \frac{v_i}{r_i + K_r} \right), \tag{5.21} \]
providing an estimate of the importance of a particular target. The Kt and Kr terms
represent the importance based on the target type, and the right-most term represents the
time-to-intercept (i.e., the time that would be required for a target to reach the ship if flying
directly toward it).
The objective of our optimization is to allocate resources to each tracking process such
that the total utility is maximized. From our stated assumptions on tracking precision,
Figure 5.6: Resource Management Model of Radar Tracking System (a task profiler feeds the Q-RAM QoS optimizer; the resulting allocation passes through scheduler admission control, with a utilization-bound adjustment loop back to the optimizer)
quality and utility, we obtain an expression for utility as a function of the tracking error,
\[ U(\varepsilon) = W_{track}(1 - e^{-\gamma/\varepsilon}), \tag{5.22} \]
where γ is a function of β and the relation between quality and tracking error. The required
values of operational dimensions needed to obtain a particular value of tracking error from
(5.14) can be translated into the resource usages.
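As an illustration of the utility model in (5.21) and (5.22), here is a small sketch with made-up parameter values (the function names and defaults are ours):

```python
import math

def track_weight(v, r, K_t=1.0, K_r=1.0):
    # W_track from (5.21): grows with target speed v, shrinks with range r.
    # K_t and K_r depend on target type; the values here are illustrative.
    return K_t * (v / (r + K_r))

def track_utility(eps, w_track, gamma=1.0):
    # U(eps) from (5.22): concave in the precision 1/eps, saturating at
    # W_track as the tracking error eps goes to zero.
    return w_track * (1.0 - math.exp(-gamma / eps))

w = track_weight(v=300.0, r=10_000.0)  # a fast, close target gets more weight
```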
5.4 Resource Management in Phased Array Radar
Since radar systems are very dynamic, with a constantly changing environment, it is necessary for the radar to continuously redistribute its resources among the tasks. The resource allocation process needs to be repeated at regular intervals; hence, its efficient execution is critical for the scheme to be of practical use. Our proposed radar resource management
approach consists of 3 main steps: (1) QoS-based resource allocation, (2) resource scheduler
admission test, and (3) utilization bound adjustment. These steps may need to be repeated
more than once in order to obtain a near-optimal solution. We next describe these three
steps.
1. QoS-based Resource Allocation: Basic Q-RAM optimization maximizes the
global utility of the system by allocating the resources to the tasks. We use the
input: schedulability fails or Ub needs adjustment
/* Ub = present utilization bound, Up = previous utilization bound,
   Umax = upper level of bound, Umin = lower level of bound */
if schedulability fails then
    Umax ← Ub
    Un ← (Umax + Umin)/2          // Un = next utilization bound
    if previous schedule was successful and (Un − Up)/Un < 0.1% then
        switch to previous schedule
        Ub ← Up
        return 1                   // previous schedule is selected
    else
        Ub ← Un                    // utilization bound is reduced
        Up ← Ub
        return 0                   // return to Q-RAM
    end
else
    Umin ← Ub                      // successful schedule
    Un ← (Umax + Umin)/2
    if (Un − Ub)/Un < 0.1% then
        return 1                   // current schedule is selected
    else
        Ub ← Un                    // utilization bound is increased
        Up ← Ub
        return 0                   // return to Q-RAM
    end
end

Algorithm 10: Utilization Bound Adjustment
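Algorithm 10 can be sketched as a simple binary search; in the sketch below, `is_schedulable` stands in for a full Q-RAM reallocation plus scheduler admission test at the candidate bound, and the convergence test uses a single absolute tolerance rather than the separate success/failure branches of the pseudocode:

```python
def adjust_utilization_bound(is_schedulable, u_min=0.0, u_max=1.0, precision=1e-3):
    # Binary search for the largest utilization bound that still yields a
    # schedulable task set, in the spirit of Algorithm 10.
    best = None
    ub = u_max
    while u_max - u_min >= precision:
        if is_schedulable(ub):
            best = ub      # successful schedule: raise the lower bracket
            u_min = ub
        else:
            u_max = ub     # schedulability failed: lower the upper bracket
        ub = (u_max + u_min) / 2.0
    return best

# Toy admission test that accepts any bound up to 0.7: the search
# converges to just under 0.7.
bound = adjust_utilization_bound(lambda u: u <= 0.7)
```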
current snapshot of the sky at a particular instant during which the environmental
dimensions of the objects are constant. Next, we generate a profile for each target from the values of its environmental parameters and the chosen ranges of values of the operational parameters. Profiles are used in the optimization process to provide
resource allocations for the tasks. The resulting resource allocation does not always
guarantee schedulability. This is due to the non-preemptive nature of the radar front-end tasks, which requires us to perform a sophisticated scheduler admission test to determine schedulability.
2. Scheduler Admission Test: The resource scheduler takes the results of the Q-RAM
resource allocations, interleaves the tasks and then runs the schedulability test. If the
task set is not schedulable, we reduce the utilization bound of the radar and return to Step 1 in order to produce a schedulable task-set.
3. Utilization Bound Adjustment: This function reduces the utilization bound if the
interleaved tasks are not schedulable, or increases the utilization bound if they are
schedulable. Thus, it searches for the maximum utilization bound for a schedulable
task-set using a binary search technique. This is described in Algorithm 10.
The entire resource allocation process iteratively searches for the best possible utilization bound. It stops when it reaches a schedulable task-set and the utility values from two successive iterations differ by only a small value (such as 0.1%), called the “utility precision factor”. This is detailed in the flow-chart of Figure 5.6.
5.5 Resource Allocation with Q-RAM
As we recall, the Q-RAM optimization involves the following steps:
• Generate set-points for each task.
• Construct the concave majorant of its set-points.
• Merge all the set-points of all tasks based on their marginal utility values.
• Traverse the sorted list of set-points to generate resource allocations to the tasks.
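The concave-majorant step above can be implemented with a standard monotone-chain upper-hull sweep; a minimal sketch over (compound resource, utility) pairs (the tuple representation and function name are ours):

```python
def concave_majorant(points):
    # Upper concave hull of (resource, utility) set-points via a
    # monotone-chain sweep; O(L log L) from the sort.
    pts = sorted(set(points))
    hull = []
    for x, y in pts:
        # Pop the last hull point while it lies on or below the chord from
        # its predecessor to the new point (it would break concavity).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append((x, y))
    return hull
```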
The basic Q-RAM optimization requires that each task explicitly provide the list of all
possible set-points. The concave majorant is determined on these input set-points next. We
know that the best-known algorithm for computing the exact concave majorant of L set-
points is O(L log L). Even though it is a relatively benign complexity, it has two drawbacks.
• We need to generate all possible set-points of a task and use them in determining the
penalty vector of the resources before we determine their concave majorant.
• The computational complexity of the concave majorant operation can be prohibitively
expensive when the number of set-points is large even when the number of output set-
points it generates is much smaller.
Since an application with d operational dimensions and p index values per dimension has a total of \(l = p^d\) set-points, the number of set-points can quickly become unmanageable
when there are a large number of operational dimensions. In the following sections, we
describe algorithms that traverse the set-point space generating the subsets of set-points
that are likely to lie on the concave majorant and thus eliminate the requirement of enu-
merating all possible set-points. For simplicity, we will first assume all tasks have only
monotonic operational dimensions. Later we discuss the general case in which some tasks
have operational dimensions that are non-monotonic.
5.5.1 Slope-based Traversal (ST)
This is the simplest approach to the traversal process. Let the minimum set-point for a task τi, for which all operational dimensions are monotonic, be defined as \(\vec{\Phi}^{min}_i = \langle 1, \ldots, 1 \rangle\), and let the maximum set-point be defined as \(\vec{\Phi}^{max}_i = \langle \phi^{max}_{i1}, \ldots, \phi^{max}_{iN^{\Phi}_i} \rangle\). Clearly, all of the set-points in the utility/compound-resource space that lie below a “terminating” line from \((u(\vec{\Phi}^{min}_i), h(\vec{\Phi}^{min}_i))\) to \((u(\vec{\Phi}^{max}_i), h(\vec{\Phi}^{max}_i))\), as shown in Figure 5.7, cannot be on the
Figure 5.7: Slope-Based Traversal of Concave Majorant (utility vs. compound resource)
concave majorant. These points can be eliminated immediately without being passed on
to the concave majorant step. We call this heuristic “slope-based traversal” (ST). While
this heuristic can reduce the time to compute the concave majorant by a constant factor, it
must still scan all of the set-points to determine if they are above or below the terminating
line.
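As an illustration, the terminating-line filter can be sketched as follows (a minimal sketch in Python; the `(h, u)` point representation and the function name are our own, not from the radar implementation):

```python
def slope_filter(points):
    """Slope-based traversal (ST): drop set-points strictly below the
    'terminating' line from the minimum to the maximum set-point.
    Each point is an (h, u) pair: compound resource and utility."""
    pts = sorted(points)                 # sort by compound resource h
    (h0, u0), (h1, u1) = pts[0], pts[-1]
    slope = (u1 - u0) / (h1 - h0)
    # Keep a point only if it lies on or above the terminating line.
    return [(h, u) for (h, u) in pts if u >= u0 + slope * (h - h0)]

# The minimum and maximum set-points always survive; interior points
# below the line are eliminated before the concave-majorant step.
kept = slope_filter([(0.0, 0.0), (0.5, 0.2), (0.5, 0.8), (1.0, 1.0)])
```

As the section notes, this still scans every set-point once; it only shrinks the input to the concave majorant step by a constant factor.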
5.5.2 Fast Set-point Traversals
We now consider a set of fast traversal heuristics that do not require computations on all
of the set-points. We (temporarily) assume that all operational dimensions are monotonic.
A key observation we make is that when the actual concave majorant is generated using
all of the set-points for typical tasks, the concave majorant tends to consist of sequences
of set-points that vary in only one dimension at a time with occasional jumps between
sequences of points. This insight suggests that we can use local search techniques to follow
the set-points up the concave majorant. We also know that $\vec{\Phi}^{min}_i$ will always be the first point on the concave majorant, and $\vec{\Phi}^{max}_i$ will always be the last. The methods presented
5.5. Resource Allocation with Q-RAM 129
Figure 5.8: Incremental Traversal (set-point sequences $\langle 1,1,* \rangle$, $\langle 1,*,5 \rangle$, and $\langle *,7,5 \rangle$ along the outer envelope; utility vs. composite cost)
here differ primarily in the method used to perform the local search.
As an example, consider a task with three operational dimensions. If we consider the subset of the set-points $\langle 1, 1, * \rangle$ consisting of all the set-points for which dimensions 1 and 2 have index value 1, these points will tend to form a line as shown in Figure 5.8. The concave majorant will tend to follow such a line until it switches to some other line, in this case $\langle 1, *, 5 \rangle$ followed by $\langle *, 7, 5 \rangle$.
While the fast traversal heuristics presented in this section are not guaranteed to find the exact concave majorant, in Section 5.8 we will show that these heuristics produce very good approximations to the concave majorant in our radar QoS optimization and, more importantly, that the drop in system utility from using the approximations is negligible.
First-Order Fast Traversal (FOFT)
In first-order fast traversal (FOFT), we keep a current point $\vec{\Phi}_i$ for each task $\tau_i$, which we initialize to $\vec{\Phi}^{min}_i$. We then compute the marginal utility for all the set-points adjacent to $\vec{\Phi}_i$. A set-point is adjacent if all of its index values except for one are identical, and the one that differs varies by only one (i.e., they have a Manhattan distance of one). We, in fact, need only consider positive index-value changes. We then choose the point that has the highest marginal utility, add it to the concave majorant, and make that point the current point. Formally, if $\vec{\Phi}_i$ is the current point, we choose the next current point $\vec{\Phi}'_i = \vec{\Phi}_i + \vec{\Xi}_j$, where $j$ maximizes the marginal utility:
$$\frac{u(\vec{\Phi}_i + \vec{\Xi}_j) - u(\vec{\Phi}_i)}{h(\vec{\Phi}_i + \vec{\Xi}_j) - h(\vec{\Phi}_i)}, \qquad (5.23)$$
and where $\vec{\Xi}_j$ is a vector that is zero everywhere except in dimension $j$, where it is equal to 1. We repeat this step until we reach $\vec{\Phi}^{max}_i$. After we have generated this set of points, the resulting curve may not be a concave majorant. Hence, we perform a final concave majorant operation (albeit on a much smaller number of points than before).
The number of set-points generated before the final concave majorant step will be the Manhattan distance between $\vec{\Phi}^{min}_i$ and $\vec{\Phi}^{max}_i$, which is $\sum_{j=1}^{N^{\Phi}_i} (\phi^{max}_{ij} - \phi^{min}_{ij})$. Ignoring boundary conditions, at each point we only consider $N^{\Phi}_i$ possible next set-points. This means that when we have $d$ dimensions and $k$ index levels per dimension, the complexity of this algorithm is $O(kd^2)$. If we include the complexity of the concave majorant determination, we have a total complexity of $O(kd^2 + kd \log kd)$.
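The FOFT loop can be sketched as follows (a hedged sketch: `u` and `h` stand in for the utility and compound-resource functions over index vectors, and we assume `h` strictly increases with every index increment; the toy profile at the end is purely illustrative):

```python
def foft(u, h, phi_max):
    """First-Order Fast Traversal: greedily follow the adjacent set-point
    with the highest marginal utility (Eq. 5.23) from the minimum to the
    maximum set-point. u(phi), h(phi): utility and compound resource."""
    phi = tuple(1 for _ in phi_max)          # Phi_min = <1, ..., 1>
    path = [phi]
    while phi != tuple(phi_max):
        best, best_mu = None, None
        for j in range(len(phi)):            # one +1 step per dimension
            if phi[j] >= phi_max[j]:
                continue
            nxt = phi[:j] + (phi[j] + 1,) + phi[j + 1:]
            mu = (u(nxt) - u(phi)) / (h(nxt) - h(phi))  # marginal utility
            if best_mu is None or mu > best_mu:
                best, best_mu = nxt, mu
        phi = best
        path.append(phi)
    return path   # a final concave-majorant pass over `path` would follow

# Toy profile: additive utility, quadratic resource cost per dimension.
u = lambda p: sum(p)
h = lambda p: sum(x * x for x in p)
path = foft(u, h, (3, 3))
```

The path length matches the Manhattan-distance bound given above: with $\vec{\Phi}^{min} = (1,1)$ and $\vec{\Phi}^{max} = (3,3)$, the traversal visits $1 + \sum_j(\phi^{max}_j - \phi^{min}_j) = 5$ points.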
5.5.3 Higher-Order Fast Traversal Methods
We can generalize the FOFT algorithm to an $m$-step, $p$-order fast traversal algorithm as follows. Just as in the FOFT heuristic, initialize the current point $\vec{\Phi}_i$ to $\vec{\Phi}^{min}_i$. Then choose the next point $\vec{\Phi}'_i = \vec{\Phi}_i + \vec{Z}$, where $\vec{Z} \in G^p_m$ such that the marginal utility is maximized, and $G^p_m$ is defined as:
$$G^1_m = \bigcup_{1 \le j \le N^{\Phi},\, 1 \le k \le m} \{k\vec{\Xi}_j\}, \qquad (5.24)$$
$$G^p_m = \{\vec{X} + \vec{Y} : \vec{X} \in G^1_m,\ \vec{Y} \in G^{p-1}_m,\ \vec{X} \cdot \vec{Y} \equiv 0\} \cup G^{p-1}_m. \qquad (5.25)$$
That is, we look at all of the next set-points that can be reached from the current set-point by increasing up to $p$ dimensions by up to $m$ steps each. The FOFT algorithm described above then corresponds to $G^1_1$. As with FOFT, we need to perform a final concave majorant operation on the points generated by this heuristic. As we observe, if we let $m$ take as large a value as possible, the procedure becomes a standard concave majorant operation.
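One way to enumerate the candidate step vectors $G^p_m$ of Equations (5.24) and (5.25) is sketched below (an illustrative enumeration of our own; by allowing a step size of zero in a chosen dimension, the lower-order sets $G^{p-1}_m$ are included automatically):

```python
from itertools import combinations, product

def steps(n_dims, p, m):
    """Candidate step vectors G^p_m: increase up to p dimensions,
    each by up to m index levels, returned as a set of tuples."""
    out = set()
    for dims in combinations(range(n_dims), p):      # choose p dimensions
        for sizes in product(range(0, m + 1), repeat=p):
            z = [0] * n_dims
            for d, s in zip(dims, sizes):
                z[d] = s                             # size 0 covers G^{p-1}_m
            if any(z):                               # exclude the zero vector
                out.add(tuple(z))
    return out
```

For example, `steps(2, 1, 1)` yields exactly the two FOFT steps, while `steps(2, 2, 2)` yields the eight non-zero vectors with entries in {0, 1, 2}.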
5.5.4 Non-Monotonic Dimensions
The fast traversal algorithms described above assume that all of the operational dimensions
are monotonic. Unlike monotonic dimensions, non-monotonic operational dimensions gen-
erally do not have a structure that can be easily exploited. For example, the choice of a
coding scheme for a video, the choice of a route in a networked system, or the choice of a
tracking algorithm in a radar system can be considered to be non-monotonic.
Suppose that some of the operational dimensions are non-monotonic. Then, for every
combination of the index values of the non-monotonic dimensions, we simply apply the fast
traversal algorithms to the subset that is monotonic. We then form the union of all these
results and apply a concave majorant. In the worst case, when a task has only non-monotonic dimensions, this simply reduces to a full concave majorant operation. For example, in a
radar tracking system, we can apply fast-traversal methods for each of the three tracking
algorithms separately, and then merge all three results and perform a concave majorant
operation.
If there is a large number of non-monotonic operational dimensions, we apply smart
heuristics to guess the best possible values of those dimensions to perform the same traversal.
How this is done depends on the characteristic of a particular system and the influence of
the dimensions on the resource requirements of the task.
5.5.5 Complexity of Traversal
Since we linearly traverse the points and do not examine a point more than once, the worst-case complexity of these schemes is $O(L)$ instead of $O(L \log L)$. However, the cost is likely to be much smaller on average, since we go through only a small number of points. We discuss the experiments related to these techniques in detail in Section 5.8.
5.5.6 Discrete Profile Generation
In certain situations, even efficient profile generation at run-time can take too long. An alternative is to generate the profiles off-line, but the profile space has multiple dimensions with wide ranges; off-line computation and storage of full profiles would therefore require exponentially large space and become unwieldy. The approach we adopt
is to quantize each continuous environmental dimension into a collection of discrete regions.
We then only need to generate a number of discrete task profiles offline for a variety of
environmental conditions. At run-time, we simply map each task into one of the discrete
profiles. Any quantization carried out must be such that (1) the storage needs of the discrete
profiles are practical, and (2) there is no significant drop in the quality of the tracks (as
measured by the total system utility). The quantization along any dimension can employ
an arithmetic, a geometric or some other progression.
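As an illustration of such a quantization, a geometric progression over a continuous environmental dimension (distance, say) could be built as follows (a sketch with arbitrary ranges; the function names and parameter values are ours, not from the thesis):

```python
def geometric_bins(lo, hi, n):
    """Quantize [lo, hi] into n regions whose boundaries grow geometrically."""
    ratio = (hi / lo) ** (1.0 / n)
    return [lo * ratio ** k for k in range(n + 1)]

def region_index(x, bins):
    """Map a continuous environmental value to the discrete region used
    to select a pre-computed task profile at run-time."""
    for i in range(len(bins) - 1):
        if x < bins[i + 1]:
            return i
    return len(bins) - 2

bins = geometric_bins(30.0, 400.0, 8)   # e.g., target distance in km
idx = region_index(100.0, bins)
```

A geometric progression gives finer resolution at the low end of the range, where profile changes tend to matter more; an arithmetic progression would use uniform bin widths instead.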
5.6 Scheduling Considerations
Our model of each radar dwell task, as discussed in Section 5.2, consists of 4 phases: cool-down time ($t_c$), transmission time ($t_x$), waiting time ($t_w$), and receiving time ($t_r$), as shown
in Figure 5.9(a). The durations tx and tr are non-preemptive, since a radar can only perform
a single transmission or a single reception at a time. However, the tc of one task can be
Figure 5.9: Interleaving of Radar Dwells. (a) Dwell with cool-down time ($t_c$, $t_x$, $t_w$, $t_r$); (b) proper nesting example; (c) improper nesting example.
overlapped with tr or tw of another task, since the radar can cool down during the waiting
and the receiving period.
Considering the entire duration of a dwell (from transmission start to reception end) as
a non-preemptive job wastes resources and decreases the schedulability of the system [74].
Task dwells can be interleaved to improve schedulability. Dwells can be interleaved in two
ways: (1) properly nested interleaving and (2) improperly nested interleaving. An optimal
construction of interleaved schedules using a branch-and-bound method has been described
in [74] and [73].
In this thesis, we focus on fast and inexpensive construction of dwell interleavings in the
presence of dynamically changing task-sets. The interleavings that we construct may not
necessarily be optimal in the sense of [74], but they will be schedulable.
5.6.1 Proper Nesting of Dwells
Two dwells are said to be properly nested if one dwell fits inside the waiting time ($t_w$) of another. Figure 5.9(b) demonstrates this situation, in which dwell $W_2$ fits in the waiting time of dwell $W_1$. The necessary condition for this interleaving is given by
$$t_{w_{W_1}} \ge t_{c_{W_2}} + t_{x_{W_2}} + t_{w_{W_2}} + t_{r_{W_2}}. \qquad (5.26)$$
input : n > 1
nv ← n; // n = number of input tasks, nv = number of virtual tasks
Create a sorted list of the tasks in increasing order of (tc + tx + tw + tr);
Create a sorted list of the tasks in increasing order of tw;
while 1 do
    if nv > 1 then
        Choose the task τa with the smallest tc + tx + tw + tr;
        Find the task τw with the smallest possible tw that can properly nest τa in its tw;
        if no task τw is found then
            Break from the loop;
        else
            Fit τa inside τw by proper nesting (Figure 3) to form a single virtual task;
            Remove the original two tasks from the sorted lists and insert the new virtual task into them;
            nv ← nv − 1;
        end
    else
        Break from the loop;
    end
end

Algorithm 11: Proper Nesting Algorithm
We define a phase offset for a proper interleaving as given by:
$$o_p = t_{w_{W_1}} - (t_{c_{W_2}} + t_{x_{W_2}} + t_{w_{W_2}} + t_{r_{W_2}}). \qquad (5.27)$$
For instance, we can schedule the cool-down time of the dwell $W_2$ right after the transmission time of $W_1$. Thus, the value of the phase offset determines how tightly two nested tasks fit together. Our aim is to minimize this offset.
The proper nesting procedure is detailed in Algorithm 11. The core of the scheme deals
with fitting a dwell of the smallest size into a dwell with the smallest feasible waiting time.
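The proper-nesting condition (5.26) and offset (5.27) are simple to check for a pair of dwells (a sketch assuming each dwell is represented as a `(tc, tx, tw, tr)` tuple, a representation we introduce here for illustration):

```python
def can_properly_nest(inner, outer):
    """Eq. (5.26): dwell `inner` fits entirely inside the waiting
    time tw of dwell `outer`. Dwells are (tc, tx, tw, tr) tuples."""
    tc, tx, tw, tr = inner
    return outer[2] >= tc + tx + tw + tr

def proper_offset(inner, outer):
    """Phase offset of Eq. (5.27); smaller means a tighter fit."""
    tc, tx, tw, tr = inner
    return outer[2] - (tc + tx + tw + tr)
```

Algorithm 11 pairs the dwell with the smallest total duration against the candidate outer dwell with the smallest feasible waiting time, which keeps this offset small.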
5.6.2 Improper Nesting of Dwells
Two dwells are said to be improperly nested when one dwell only partially overlaps with another (e.g., as illustrated in Figure 5.9(c)). Suppose that task $W_1$ is improperly interleaved with task $W_2$, where $W_1$ starts first. Task $W_1$ is called the leading task and task $W_2$ is called the trailing task. Based on the phasing illustrated in Figure 5.9(c), the necessary conditions for the interleaving to occur are given by
$$t_{w_{W_1}} \ge t_{c_{W_2}} + t_{x_{W_2}}, \qquad (5.28)$$
$$t_{c_{W_2}} + t_{x_{W_2}} + t_{w_{W_2}} \ge t_{w_{W_1}} + t_{r_{W_1}}. \qquad (5.29)$$
We define a phase offset for this case by
$$o_i = t_{c_{W_2}} + t_{x_{W_2}} + t_{w_{W_2}} - (t_{w_{W_1}} + t_{r_{W_1}}). \qquad (5.30)$$
Our improper nesting scheme is given in Algorithm 12. It starts with the task with
the largest waiting time (tw), and attempts to interleave it with the task with the largest
possible tw that is smaller than that of the original task and satisfies the conditions stated
in Equations (5.28) and (5.29). The algorithm repeats the process until it reaches the task
with the smallest tw that can no longer be interleaved, or all tasks are interleaved to form
a single virtual task.
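Similarly, the improper-nesting conditions (5.28) and (5.29) and the offset (5.30) can be checked as follows (a sketch assuming each dwell is a `(tc, tx, tw, tr)` tuple, a representation introduced for illustration):

```python
def can_improperly_nest(lead, trail):
    """Eqs. (5.28)-(5.29): `trail` (W2) can partially overlap the
    leading dwell `lead` (W1). Dwells are (tc, tx, tw, tr) tuples."""
    tc1, tx1, tw1, tr1 = lead
    tc2, tx2, tw2, tr2 = trail
    return tw1 >= tc2 + tx2 and tc2 + tx2 + tw2 >= tw1 + tr1

def improper_offset(lead, trail):
    """Phase offset of Eq. (5.30); Algorithm 12 prefers the pairing
    (leading or trailing) with the smaller offset."""
    tc1, tx1, tw1, tr1 = lead
    tc2, tx2, tw2, tr2 = trail
    return tc2 + tx2 + tw2 - (tw1 + tr1)
```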
5.6.3 Dwell Scheduler
The responsibilities of the radar dwell scheduler are as follows:
• Obtain the period and the dwell-time information (tc, tx, tw, tr) from Q-RAM for each
task.
• Interleave tasks with the same period using proper and/or improper nesting to create
a smaller number of virtual tasks.
input : Set of tasks with n > 1
output: Modified set of virtual (improperly interleaved) tasks with nv ≥ 1
nv ← n; // n = number of input tasks, nv = number of virtual tasks
Sort the list of tasks in increasing order of tw;
while nv > 1 do
    Start with the task τw with the biggest tw;
    while A task is found do
        Find a task τwn with the biggest possible tw smaller than that of τw that can be the leading task in improper nesting with τw;
        if τwn is found then
            Compute the nesting offset as on;
        end
        Find a task τwi with the biggest possible tw smaller than that of τw that can be the trailing task in improper nesting with τw;
        if τwi is found then
            Compute the nesting offset as oi;
        end
        if Both τwn and τwi are found then
            if on < oi then
                Merge τw and τwn by improper nesting with τwn as the leading task;
            else
                Merge τw and τwi by improper nesting with τwi as the trailing task;
            end
            nv ← nv − 1;
            Remove the merged two tasks from the sorted list and insert the new virtual task into it;
        else if Only τwn is found then
            Merge τw and τwn by improper nesting with τwn as the leading task;
            nv ← nv − 1;
            Remove the merged two tasks from the sorted list and insert the new virtual task into it;
        else if Only τwi is found then
            Merge τw and τwi by improper nesting with τwi as the trailing task;
            nv ← nv − 1;
            Remove the merged two tasks from the sorted list and insert the new virtual task into it;
        else
            Go to the task with the next lower tw;
        end
    end
end

Algorithm 12: Improper Nesting Algorithm
• Perform a non-preemptive schedulability test for the virtual tasks.
Next, we describe our schedulability test.
Schedulability Test
As mentioned earlier, in order to satisfy the jitter requirements, only relatively harmonic periods are used for the dwells³. We define the following terms:
• $N_i$ = number of tasks with a period $T_i$
• $C_{ij}$ = total run-time of the $j$th task among the tasks within the period $T_i$
• $N_T$ = total number of distinct periods
• $T_i < T_j, \forall i < j$ (periods are indexed in increasing order, so a smaller index means a higher priority)
The response time $t_{R_i}$ of the tasks for a given period $T_i$ is given by
$$t_{R_i} = \underbrace{\sum_{j=1}^{i-1} \left\lceil \frac{T_i}{T_j} \right\rceil \sum_{k=1}^{N_j} C_{jk}}_{\text{run-time of higher-priority tasks}} + \underbrace{\sum_{k=1}^{N_i} C_{ik}}_{\text{run-time of tasks with period } T_i} + \underbrace{B_i}_{\text{blocking term}}. \qquad (5.31)$$
The blocking term $B_i$ is defined as the maximum run-time $C_{mn}$ among tasks with lower priority:
$$B_i = \underbrace{\max(C_{mn}),\ \forall\, i < m \le N_T,\ 1 \le n \le N_m}_{\text{maximum task size among all tasks of lower priority}}. \qquad (5.32)$$
As already mentioned, each radar task (virtual or otherwise) is considered to be non-
preemptive under the schedulability test.
For a task-set to be schedulable, it must satisfy:
$$t_{R_i} \le T_i, \quad \forall\, 1 \le i \le N_T. \qquad (5.33)$$
³As we show in the next section, our model of the radar system does not show significant degradation in the accrued utility due to the restricted use of harmonic periods.
It must be remembered that, using nesting, we combine multiple tasks into a few virtual dwell tasks within each period. The run-time of a task is given by $C_{jk} = t_{c_{jk}} + t_{x_{jk}} + t_{w_{jk}} + t_{r_{jk}}$, where the parameters $t_{c_{jk}}$, etc., may be virtual parameters if the dwells are nested.
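The test of Equations (5.31) through (5.33) can be sketched as follows (a minimal sketch assuming periods are indexed in increasing order, so index 0 has the highest priority; the data layout is our own):

```python
import math

def schedulable(periods, runtimes):
    """Non-preemptive schedulability test of Eqs. (5.31)-(5.33).
    periods: harmonic periods T_1 < T_2 < ... < T_NT (increasing).
    runtimes[i]: list of (virtual) task run-times C_ik for period T_i."""
    NT = len(periods)
    for i in range(NT):
        # Interference from higher-priority (shorter-period) tasks.
        hp = sum(math.ceil(periods[i] / periods[j]) * sum(runtimes[j])
                 for j in range(i))
        own = sum(runtimes[i])                 # tasks sharing period T_i
        # Blocking term B_i: largest run-time among lower-priority tasks.
        lower = [c for m in range(i + 1, NT) for c in runtimes[m]]
        blocking = max(lower, default=0)
        if hp + own + blocking > periods[i]:   # violates Eq. (5.33)
            return False
    return True
```

Each `runtimes[i][k]` corresponds to a $C_{ik}$, which may itself be a virtual run-time produced by the nesting algorithms above.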
5.7 Experimental Configuration
Parameter      Type   Range
Distance       All    [30, 400] km
Acceleration   All    [0.001g, 6g]
Noise          All    [kTB_w, 10^3 kTB_w]^a
This shows that although the error is always < 100%, it can be made arbitrarily close to 1 when $\Delta r \gg R$.
Case 3: $(n_1 + n_2) r_1 \ge R > (n_1 r_1 + n_2 r_2)$:
This case represents the situation where sufficient resources are available to maximize
the QoS levels of all tasks of both types. The optimal utility is given by:
$$U_{opt} = (n_1 + n_2)u. \qquad (6.14)$$
Using the approximated classification, the obtained utility is given by:
$$U_s = n_2 u + \frac{R\, n_1}{(n_1 + n_2) r_1} u. \qquad (6.15)$$
The fractional utility loss is given by:
$$\varepsilon = \frac{n_1 \left[ (n_1 + n_2) r_1 - R \right]}{(n_1 + n_2)^2\, r_1}. \qquad (6.16)$$
Considering the lowest value of $R$ in this range, we obtain:
$$\varepsilon \le \frac{1 - \frac{r_2}{r_1}}{\frac{n_1}{n_2} + \frac{n_2}{n_1} + 2}. \qquad (6.17)$$
The worst case happens when r2 = 0 and n2 = n1. In this case, the value of error is 25%.
Based on the above results, we conclude that the approximated classification is applicable
where ∆r, the difference between maximum resource requirements of two tasks of two types,
is less than the capacity of the entire resource R, which is usually the case for most systems.
In addition, the worst case happens when the optimal utility value is infinitesimally small.
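The error expressions of this case can be sanity-checked numerically (a quick sketch; it assumes the factor $r_1$ in the denominator of the fractional-loss expression, which is what makes the bound follow at $R = n_1 r_1 + n_2 r_2$):

```python
def frac_utility_loss(n1, n2, r1, r2, R):
    """Fractional utility loss (Eq. 6.16) for Case 3,
    (n1 + n2) r1 >= R > n1 r1 + n2 r2.
    Assumption: denominator includes the factor r1."""
    return n1 * ((n1 + n2) * r1 - R) / ((n1 + n2) ** 2 * r1)

def bound(n1, n2, r1, r2):
    """Worst-case bound (Eq. 6.17), reached at R = n1 r1 + n2 r2."""
    return (1 - r2 / r1) / (n1 / n2 + n2 / n1 + 2)
```

At $r_2 = 0$ and $n_1 = n_2$ the bound evaluates to 1/4, matching the 25% worst case stated above, and the loss shrinks as $R$ grows toward $(n_1 + n_2) r_1$.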
In the next section, we will discuss in detail the design of our H-Q-RAM algorithm, which classifies tasks based on their average slope values, i.e., the slope of the line joining their end-points.
6.4 H-Q-RAM Algorithm Design
This algorithm is a more generalized version of Algorithm 4 described in Chapter 3. The
whole process is divided into 4 main parts: (1) Task classification, (2) Clustering, (3)
172 Chapter 6. Resource Allocation in Distributed Embedded Systems
Virtual task formation, and (4) Hierarchical resource allocation.
First, we classify tasks by ordering them in decreasing values of their average slopes.
Next, we create two groups and allocate tasks in each group alternatively. This way, the
average slopes of tasks in each group are similar. Once two groups are formed, we com-
pute the resource demand of each group and allocate processors and communication (bus)
bandwidth to them proportional to their demands. We then recursively divide each group
hierarchically until we allocate at most 2 processors per group.
6.4.1 Task Classification
Tasks are classified based on the average slopes of their utility functions. We make the
following assumptions.
• A task always needs some amount of processing resource.
• If two tasks communicate with each other, we can eliminate their communication bandwidth requirements by placing them on the same processor. Thus, unlike CPU bandwidth, communication bandwidth may not always be needed by a task. We assume a weight of 0.5 for network (sending) bandwidth and a weight of 1.0 for CPU bandwidth.
Based on the above observations, we express the resource requirement of task i in terms
of a 2-element resource vector consisting of communication (bus) bandwidth and processing
bandwidth at each QoS level1.
If $r_{c_j}$ is the processing resource requirement at QoS level $j$ and $r_{n_j}$ is the corresponding total communication (bus) bandwidth requirement with other tasks, we define a composite resource metric at level $j$ by:
$$H_j = \sqrt{r_{c_j}^2 + (0.5\, r_{n_j})^2}. \qquad (6.18)$$
¹We have yet to obtain resource dimensions for tasks, since clustering is not done. Hence these settings are not set-points in the pure sense.
Thus, we construct set-points of a task by computing the composite resource value at each
QoS level. Next, we compute the average slope of the utility function.
Definition 6.4.1 (Average Slope of a Task). The average slope of a task $i$ is given by the following expression:
$$s_i = \frac{U_{max} - U_{min}}{H_{max} - H_{min}}, \qquad (6.19)$$
where $H_{min}$ = composite resource at the lowest QoS level, $H_{max}$ = composite resource at the highest QoS level, $U_{min}$ = utility at the lowest QoS level, and $U_{max}$ = utility at the highest QoS level.
Tasks are sorted based on their average slopes. We can either sort the tasks in decreasing
order of their slopes, or perform a radix sorting [45], in which we can divide the slope range
into N discrete slots and fit the tasks into one of the slots.
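The classification step can be sketched as follows (a minimal sketch; the slot mapping and its parameters are illustrative assumptions, not values from the thesis):

```python
import math

def composite_resource(rc, rn, w_net=0.5):
    """Composite resource metric H_j (Eq. 6.18): CPU requirement rc
    combined with weighted network (sending) bandwidth rn."""
    return math.sqrt(rc ** 2 + (w_net * rn) ** 2)

def average_slope(levels, utils):
    """Average slope of a task (Eq. 6.19). `levels` is a list of
    (rc, rn) pairs ordered from lowest to highest QoS level."""
    h = [composite_resource(rc, rn) for rc, rn in levels]
    return (utils[-1] - utils[0]) / (h[-1] - h[0])

def slope_slot(s, n_slots=100, s_max=10.0):
    """Radix-style classification: fit a slope into one of n_slots
    discrete slots over an assumed slope range [0, s_max)."""
    return min(int(s / s_max * n_slots), n_slots - 1)
```

Sorting or radix-binning by `slope_slot` then groups tasks of similar marginal value per unit of composite resource, which is the notion of "class" used by the clustering step below.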
6.4.2 Clustering
First, we create two clusters and allocate tasks to them. The allocation is performed in such a way that tasks of the same class (i.e., with similar average slopes) are distributed in equal numbers between the clusters. Each cluster can again be divided into two more clusters, and thus the clustering process continues recursively until we reach a maximum threshold number of tasks or a maximum threshold amount of resources per cluster. The scalability and the accuracy of the solution depend on these thresholds.
In this chapter, for simplicity, we assume that the threshold for resources is 2 processors
per cluster. We do not assume any threshold number for tasks.
In order to distribute tasks and resources to the clusters, we require the following defi-
nitions.
Definition 6.4.2 (Mean Slope of a Cluster). The mean slope of a cluster is given by
the arithmetic mean of the average slopes of tasks present in the cluster.
Figure 6.4: Slope-based Task Clustering Procedure (division of tasks among groups/clusters; allocation of processors to groups)
Definition 6.4.3 (Average Demand of a Task). The average demand of a particular type of resource (processing bandwidth or communication bandwidth) of a task is given by:
$$r^{avg}_c = \frac{\sum_{j=1}^{N} r_{c_j}}{N}, \qquad (6.20)$$
where $N$ = the number of QoS levels of the task, and $r_{c_j}$ = the resource vector consisting of two components, CPU and network bandwidth, at the $j$th level.
Definition 6.4.4 (Resource Demand of a Cluster). The resource demand of a cluster
is given by the sum of the average demands of its tasks.
Task Clustering
At a particular stage of the clustering algorithm, we start with the task with the highest
slope, and allocate tasks to each cluster so that the mean slopes of the two clusters are
nearly equal.
In order to minimize the network resource requirements, we would like to allocate tasks
that communicate with each other in the same cluster. This may conflict with our goal of
balancing mean slopes on clusters. As mentioned before, we divide the slope range into a
number of discrete slots and fit each task into one of the slots. Within each slot, we sort the
tasks by their communication (sending) bandwidth requirements in increasing order. Next,
we select a cluster for each task. If both clusters have the same mean slope, we allocate the task to the cluster whose tasks have the largest total communication bandwidth with it. If the mean slopes are not equal, the task is allocated to whichever cluster better equalizes the mean slopes.
In this way, we ensure two aspects. First, the mean slopes of the clusters are equalized.
Second, the tasks that communicate with one another with larger communication bandwidth
requirements fall in the same cluster and may eventually be allocated to the same processor.
If two tasks are allocated to the same processor, their mutual communication bandwidth is
eliminated.
Resource Clustering
Once two clusters are formed, we apportion the processing and communication resources
based on the resource demands of the clusters. For example, if $R_{d_1}$ and $R_{d_2}$ are the total processing resource demands of the two clusters, and $P$ is the total number of processors, each of capacity $C$, then the resource allocations $R_{a_1}$ and $R_{a_2}$ are given by:
$$R_{a_1} = \frac{P C R_{d_1}}{R_{d_1} + R_{d_2}}, \qquad (6.21)$$
$$R_{a_2} = \frac{P C R_{d_2}}{R_{d_1} + R_{d_2}}. \qquad (6.22)$$
This resource allocation may lead to a fractional allocation of processors, which must
be managed while performing the scheduler admission test on each processor. In the same
way, we distribute the bus bandwidth between the clusters. The process of hierarchical
Figure 6.5: Virtual Task Creation Procedure
clustering for one iteration is illustrated in Algorithm 13.
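The proportional split of Equations (6.21) and (6.22) is straightforward to compute (a sketch; the demand and capacity values are illustrative):

```python
def apportion(demands, total_capacity):
    """Split a resource between clusters in proportion to their demands
    (Eqs. 6.21-6.22); may yield fractional processor shares."""
    total = sum(demands)
    return [total_capacity * d / total for d in demands]

# 16 processors of 2 GHz each, split between two clusters whose
# processing demands are 30 and 10 GHz-equivalents (illustrative).
shares = apportion([30.0, 10.0], 16 * 2.0)
```

The same function applies unchanged to the bus bandwidth; as noted above, fractional processor shares must then be handled by the per-processor scheduler admission test.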
6.4.3 QoS Optimization
So far, we have been able to divide the system into multiple independent subsystems called
clusters. In this step, we perform QoS optimization independently on each cluster. We will
compare this with basic Q-RAM optimization in which we directly model the entire system
without performing clustering. The optimization process is divided into two steps: Virtual
Task Formation and Resource Allocation.
Virtual Task Formation
As mentioned earlier, when a group of tasks communicate only with each other, their
network bandwidth requirements disappear when they are allocated to the same processor.
Therefore, the resource allocations of these tasks are mutually dependent. We form virtual
tasks by combining these tasks, as shown in Figure 6.5. Consequently, we generate profiles
of virtual tasks by enumerating their resource allocation in the cluster (or in the entire
system for basic Q-RAM).
Parameter                                          Value
Number of QoS dimensions q                         1
Number of elements of each dimension               3
Utility range for QoS dimension (u(q))             random [0.1, 1.0]
Weight range for each QoS dimension                random [0.01, 1.00]
CPU requirement for a task                         random [2 MHz, 200 MHz]
Network bandwidth requirement between two tasks    random [20 Kbps, 200 Kbps]
Number of communicating tasks for each task        random [1, 8]
Number of processors                               16
Resource capacity per processor                    2 GHz
Network bandwidth capacity of the bus              100 Mbps

Table 6.2: Experimental Settings with Optimal Algorithm
Resource Allocation
The resource allocation within each cluster follows the basic AMRMD CM algorithm, as men-
tioned in Chapter 2. We perform allocation in each cluster independently.
6.5 Experimental Results
In this section, we compare the performances of H-Q-RAM and Q-RAM optimizations. As
in previous chapters, our experiment focuses on measuring two parameters: (1) the global
utility obtained by the optimization, and (2) the total execution time of the algorithm.
We consider a distributed system consisting of 16 processors, each with a frequency of
2GHz, connected by a bus of bandwidth 100Mbps. The assumed configuration of the tasks
and that of the system are presented in Table 6.2.
In the case of Q-RAM optimization, we enumerate all possible choices of deployment of
tasks in the system in order to obtain the optimal result. In H-Q-RAM optimization, we
implement Algorithm 13 to divide the system into multiple subsystems or clusters, repeat
the clustering process until we have fewer than 3 processors per cluster, enumerate possible
choices of deployment of tasks within each cluster, and determine the near-optimal resource
allocation within each cluster independently.
Create 2 clusters;
Create a 3rd cluster; // this stores the odd-numbered task from each region
Linearly divide the slope range (0, ∞) into nth discrete regions; // nth = number of discrete regions, 100, for example
Fit the tasks into the regions based on the average slope values of their utility functions;
Classify a task to be of a type based on its presence in a region;
for Each slope region do
    Within the region, sort the tasks in decreasing order of their average transmission bandwidth requirements;
    if the number of tasks is odd then
        Put the last task in the 3rd cluster; // this takes the last odd-numbered task out of the next loop
    end
    for Each task in the region do
        Determine the proportion of communication bandwidth requirements for tasks already allocated in the 2 clusters;
        if Each cluster has an equal number of tasks of this type then
            Allocate the task to the cluster whose tasks have greater communication bandwidth with this task;
        else
            Allocate the task to the cluster that has fewer tasks of this type;
        end
    end
end
/* We would like to take tasks out of the 3rd cluster and put them into the first two */
for Each task in the 3rd cluster do
    if The mean slopes of the first 2 clusters are equal then
        Determine the proportion of communication bandwidth requirements for tasks already allocated in the 2 clusters;
        Allocate the task to the cluster that has more tasks communicating with this task;
    else
        Allocate the task to the cluster that balances the mean slopes of the 2 clusters;
    end
end

Algorithm 13: Clustering Algorithm for Communicating Heterogeneous Tasks
Figure 6.6: Utility Variation
We vary the number of tasks as N = 50, 100, . . . , 300, and measure the accrued utility and execution time for Q-RAM and H-Q-RAM. Each configuration is averaged over 50 iterations.
Figure 6.6 shows the bar-graph containing the variation of the obtained utility against
the number of tasks. From the figure, we notice that H-Q-RAM yields a utility very close
to that of Q-RAM. In fact, the maximum reduction in utility is less than 4%. In addition,
this drop decreases with an increase in the number of tasks, as shown in Figure 6.7.
We plot execution times for Q-RAM and H-Q-RAM in Figure 6.8. As expected, H-Q-RAM shows a large improvement in execution time. For example, the reduction in execution time for H-Q-RAM is 85% for 300 tasks. Moreover, the difference in execution times between the algorithms increases with the number of tasks in the system. This demonstrates the usefulness of H-Q-RAM for large distributed embedded systems.
Figure 6.7: Percentage Utility Reduction
Figure 6.8: Execution Time Variation
6.6 Chapter Summary
In this chapter, we investigated the QoS-based resource allocation problem in distributed
embedded systems. This is an extension of the resource allocation we discussed for mul-
tiprocessor systems in Chapter 3. However, we relaxed a few assumptions that we had
made in Chapter 3. First, tasks can communicate with each other. Therefore, we need to
consider allocating the network bandwidth (which is assumed to be bus bandwidth) along
with the processor cycles. Second, we did not assume any fixed set of types of tasks. In
other words, a task can have any possible profile within certain ranges of processor cycles
and network bandwidth requirements. In this case, in order to implement a similar hierar-
chical decomposition technique, we discretized profiles based on their average slopes. We
also minimized the usage of network bandwidth by clustering heavily communicating tasks
together as much as possible. This ensures that highly communicating tasks are likely to
be allocated to the same processor thereby eliminating the network bandwidth requirement
among themselves. The results also demonstrated that our H-Q-RAM is scalable enough
to be used as an adaptive run-time QoS optimizer for distributed embedded systems.
As future work, we would like to implement this as adaptive QoS-aware middleware in specific types of embedded systems, such as automotive systems. We would also like to integrate this approach with a design-time code-generation tool such as Time Weaver [21].
Chapter 7
Conclusion and Future Work
The fundamental motivation for this dissertation is the growing need for the development
of scalable resource management infrastructure for large, dynamic and distributed real-
time systems. Instead of maximizing the throughput of one or more resources, the goal
of our scheme is to maximize the satisfaction of the end-users. We consider traditional
distributed systems as well as embedded distributed systems that interact directly with the
physical environment, and hence operate under physical constraints. In all such systems,
the satisfaction of the end-users is the primary parameter that must be maximized.
Our goal was to address the complexity of resource management schemes that allocate
resources to a large number of tasks, perform their deployment in the system and ensure
their timing guarantees by interacting with the admission control of the scheduler. Since solving this problem optimally is NP-hard, we investigated heuristic solutions that scale well with the size of the system.
7.1 Contributions
The contribution of this dissertation can be divided into three major areas. First, we
designed a generic model of a distributed system consisting of resources and other physical
constraints. Secondly, we developed a set of algorithms that perform the QoS optimization
in a large system in a scalable manner while obtaining a global utility close to that of the
optimal algorithm. Finally, we designed and implemented a scheme that integrates our
QoS optimization model with the admission control mechanisms of resources for guaranteed
schedulability.
7.1.1 Modeling
We borrowed the existing model of resources and tasks from the QoS-based Resource Allocation Model (Q-RAM) [49]. In Q-RAM, a system is represented by a resource vector whose number of elements denotes the number of resources and whose element values denote the capacities of the corresponding resources. A task is represented by a set of QoS dimensions as user-level dimensions. Each QoS dimension is associated with a utility function.
A particular QoS level of a task contains a fixed value for each of the QoS dimensions.
Hence, each QoS level is associated with a utility that is a sum of the utilities obtained
from the individual QoS dimensions. We also defined a set of system-level dimensions that
influence the allocation of resources of a task. These include operational dimensions and
environmental dimensions.
Operational Dimensions: The operational dimensions are parameters under the control of the system administrator that influence the resource demands of an application. Some operational dimensions may be of direct relevance to the user in terms of quality, and hence can also serve as QoS dimensions. Examples of operational dimensions include resource deployment options, coding schemes for video applications, etc.
Environmental Dimensions: The environmental dimensions are parameters that are not under the control of the system administrator or the user. An example is the noise in a
wireless environment. Changes in the environmental conditions require us to re-optimize
the QoS of the system.
The system dimensions determine the resource requirements, the values of the QoS di-
mensions and the corresponding utility values. Combining all these dimensions, we generate
set-points of tasks, where each set-point consists of a utility value and a particular setting
of operational and environmental dimensions, which includes a QoS level and a resource
configuration.
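The task model above can be sketched as a small data structure. This is our illustration of the concepts, not code from Q-RAM; the class names, dimension names, and utility values are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SetPoint:
    qos: dict            # QoS dimensions, e.g. {"frame_rate": 30}
    operational: dict    # operational dimensions, e.g. coding scheme
    environmental: dict  # environmental dimensions, e.g. channel noise
    resources: dict      # resource configuration: resource -> demand
    utility: float       # sum of per-QoS-dimension utilities

@dataclass
class Task:
    name: str
    set_points: list = field(default_factory=list)

# A hypothetical video task with a low- and a high-quality set-point.
video = Task("video", [
    SetPoint({"frame_rate": 10}, {"codec": "mpeg1"}, {}, {"cpu": 0.1, "net": 1.0}, 2.0),
    SetPoint({"frame_rate": 30}, {"codec": "mpeg2"}, {}, {"cpu": 0.3, "net": 4.0}, 5.0),
])
best = max(video.set_points, key=lambda sp: sp.utility)
```

Each set-point thus bundles a QoS level, a resource configuration, and the utility the optimizer trades off against other tasks.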
7.1.2 Scalable QoS Optimization
Our QoS optimization algorithm chooses a set-point for each task and allocates resources
to tasks according to the requirements of their assigned set-points. We define the global
utility of the system as the sum of the utilities of the assigned set-points of tasks. The
optimization process maximizes the global utility.
We developed a basic algorithm of polynomial complexity called AMRMD CM as a modified
version of the AMRMD1 algorithm for Q-RAM [51]. AMRMD CM extends the functionality of the
basic algorithm by handling trade-offs more efficiently for tasks with multiple resource
deployment options. The complexity of the basic algorithm is O(nL log(nL)), where n
equals the number of tasks and L equals the maximum number of set-points per task.
Although this complexity is seemingly benign, it grows monotonically with both n and L, which can be problematic when either is very large. To manage this
complexity, we have developed a hierarchical decomposition technique, collectively called
Hierarchical Q-RAM or H-Q-RAM for large distributed systems.
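The greedy, marginal-utility idea behind such algorithms can be sketched for a single resource. This is a simplified sketch under the assumption of concave (diminishing-returns) set-point profiles, not the dissertation's AMRMD CM implementation; the O(nL log(nL)) term comes from the one global sort.

```python
def allocate(tasks, capacity):
    # tasks: {name: [(resource_demand, utility), ...]}, set-points sorted
    # by increasing demand. Returns the chosen set-point index per task
    # (-1 means the task gets no allocation).
    upgrades = []
    for name, sps in tasks.items():
        prev_r, prev_u = 0.0, 0.0
        for i, (r, u) in enumerate(sps):
            dr, du = r - prev_r, u - prev_u
            if dr > 0:
                upgrades.append((du / dr, dr, name, i))
            prev_r, prev_u = r, u
    choice = {name: -1 for name in tasks}
    used = 0.0
    # Apply upgrades in order of marginal utility per unit of resource;
    # sorting all n*L upgrades gives the O(nL log(nL)) complexity.
    for slope, dr, name, i in sorted(upgrades, reverse=True):
        if used + dr <= capacity and i == choice[name] + 1:
            used += dr
            choice[name] = i
    return choice

tasks = {"a": [(1.0, 3.0), (2.0, 4.0)], "b": [(1.0, 2.0), (3.0, 5.0)]}
choice = allocate(tasks, capacity=3.0)
```

With concave profiles, each task's upgrade slopes decrease, so the global sort never tries an upgrade before its predecessor.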
In multiprocessor systems, where a task can be allocated to any of the processors, we
divide the problem into multiple sub-problems, and solve these subproblems independently.
This is done by distributing the processors into near identical processor-groups, distributing
the tasks into near-identical task-clusters, assigning each task-cluster to each processor-
group to form near-identical subsystems, and finally performing the QoS optimization in
each of the subsystems concurrently.
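The decomposition step can be sketched as follows. Round-robin splitting is our simplification here; the dissertation's clustering is more careful, and the processor and task names are illustrative.

```python
def make_subsystems(processors, tasks, g):
    # Spread processors into g near-identical groups and tasks into g
    # near-identical clusters, then pair group i with cluster i so each
    # subsystem can be optimized concurrently and independently.
    groups = [processors[i::g] for i in range(g)]
    clusters = [tasks[i::g] for i in range(g)]
    return list(zip(groups, clusters))

subsystems = make_subsystems(list(range(8)), [f"t{i}" for i in range(6)], 2)
# Each of the 2 subsystems gets 4 processors and 3 tasks.
```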
A hierarchical networked architecture similar to the Internet consists of loosely connected
sub-domains. Each sub-domain can be considered a separate subsystem. However, if a task has its source and destination fixed in different sub-domains, the routes between them can span multiple sub-domains, and in such cases we cannot perform the QoS optimization of each sub-domain independently.
The resource allocation is very likely to be made locally within a sub-domain if the source
node and the destination node of a task both fall inside the same sub-domain. This type
of task is called a local task. A task whose source and destination nodes belong to different
sub-domains is called a global task. Sub-domains negotiate with each other using transaction
techniques to allocate resources (route and bandwidth) to a global task.
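The local/global distinction can be sketched directly; the node names and sub-domain membership below are illustrative.

```python
# Hypothetical mapping from node to its sub-domain.
domain_of = {"n1": "D1", "n2": "D1", "n3": "D2"}

def is_local(task):
    # A task is local when its source and destination nodes fall inside
    # the same sub-domain; otherwise it is global and requires an
    # inter-sub-domain transaction for its route and bandwidth.
    src, dst = task
    return domain_of[src] == domain_of[dst]

is_local(("n1", "n2"))   # local: handled within sub-domain D1
is_local(("n1", "n3"))   # global: spans D1 and D2
```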
For certain systems, the complexity arises from the size of L, i.e., the number of set-points per task. In such a system, a task may have a very large number of possible configurations; a typical example is a radar system. For this case, we have developed efficient algorithms that select only a few important set-points per task without enumerating all possible set-points. We studied the performance of these algorithms in
terms of the global utility and the execution time.
7.1.3 Integration of QoS Optimization and Scheduling
In this dissertation, we have presented an integrated approach that simultaneously maxi-
mizes overall system utility, performs task scheduling analysis and satisfies multi-resource
constraints in dynamic real-time systems such as a radar system. In our implementation
of a resource manager for a phased array radar system, we show that our approach is not
only efficient enough to be used on-line in real-time, but also performs within 10% of the
optimal solution. In this process, we develop efficient scheduling schemes for radar tracking
tasks that can generate high resource utilization of the radar by interleaving tasks with each
other.
7.2 Future Work
This dissertation analyzes the complexities associated with QoS-based resource management
in distributed systems, and outlines a scalable framework for it. This has opened up multiple directions for future research, from modest, incremental improvements to broader and more fundamental ones. We present these areas of future work below.
7.2.1 Implementation
We have a prototype implementation of a middleware that performs the QoS optimization
in a distributed networked system consisting of 12 nodes. A global server, known as the “Session Coordinator” (SesCo), runs on a single node and performs QoS-based resource allocation for
the entire system [40]. It enforces resource reservations by interacting with “Local Resource
Managers (LRMD)” running on individual hosts. LRMD, in turn, relies on the reservation
mechanisms of the real-time operating system Linux/RK running on individual hosts [62].
Following the principles of H-Q-RAM, we would like to extend this prototype by incorporating a distributed implementation of SesCo. In addition, large-scale networks can also be emulated in this test-bed for future research problems.
7.2.2 Stochastic QoS and Resource Requirements
In this dissertation, we have implicitly assumed deterministic resource requirements for tasks. Whether these requirements are worst-case or average-case estimates, the variation in the resource usage of tasks was not considered. If the resource requirement of a task changes, the current framework reruns the optimization to generate a new resource allocation. However, this may not be sufficient if the resource requirements of a task vary rapidly. In this case, our system may not meet the deadlines of all the tasks if it uses average-case
utilization, or it will be heavily underutilized if it uses worst-case resource utilization.
In Q-RAM, we currently have two types of Probabilistic Level of Service (PLoS) metrics
in the context of network bandwidth [40]: (a) QoS availability (fraction of time there
is no degradation) and (b) fraction of packets delivered (not dropped). The “Resource
Priority Multiplexing” (RPM) policy module and its kernel-level mechanisms implement
the probabilistic guarantees for network bandwidth [37].
Apart from these two PLoS metrics, there are other probabilistic QoS dimensions, such as the number of packets (or jobs) that meet their deadlines.
In this context, Zhu et al. designed a Quantized EDF (Q-EDF) [86] scheduling mechanism
that minimizes the number of deadline misses of tasks. Hence, if we want to consider the
deadline miss rate as a QoS dimension (or a PLoS metric), we would have to integrate
Q-EDF with the QoS optimization scheme. In our QoS optimization model, the criticality
can be considered a QoS dimension that determines the utility loss relative to the number
of deadline misses of a task. Based on the statistics of resource usage of the task, we would
like to determine the resource requirements of a task in order to obtain a specific deadline
miss rate. However, determining the resource requirement for a specific deadline miss rate
can be a difficult problem.
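One plausible heuristic, sketched here under our own simplifying assumptions, is to reserve the (1 − target miss rate) quantile of the observed per-job demand, so roughly that fraction of jobs fits within the reservation. This ignores queueing effects across jobs, which is part of why the exact problem is difficult; the function name and sample values are ours.

```python
import math

def budget_for_miss_rate(samples, target_miss_rate):
    # Reserve enough resource to cover a (1 - target_miss_rate) fraction
    # of the observed per-job demands (an empirical quantile).
    ordered = sorted(samples)
    k = max(0, math.ceil((1.0 - target_miss_rate) * len(ordered)) - 1)
    return ordered[k]

usage = [2, 3, 3, 4, 4, 4, 5, 6, 8, 12]   # hypothetical per-job demands
budget = budget_for_miss_rate(usage, 0.10)  # covers ~90% of jobs
```

The gap between this quantile (8) and the worst case (12) illustrates the over-reservation that worst-case budgeting would incur.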
7.2.3 Profit Maximization Model for Resource Allocation
Our QoS-based resource allocation model maximizes the global utility of the system by apportioning fixed quantities of resources among a set of tasks. In this case, we maximize the
utilization of resources as well, since more resource usage generally provides more utility
to the end-users. Hence, we optimize our system toward maximizing benefits of the end-
users under the constraint of limited resource capacities. This is a typical consumer-centric
model where the consumer would like to maximize his/her satisfaction or utility by buying
a particular bundle of goods under his/her budget constraint.
The producer, on the other hand, sets the prices of goods based on the utility they
provide to the customers. Hence the revenue earned by the producer is proportional to
the sum of the utilities of his/her consumers. However, the producer strives to maximize
his/her profit, which is defined as the difference between the revenue and cost. It is possible
that maximizing the revenue may not maximize the profit: the cost generally increases with the size of the system and may outweigh the additional revenue at very large scales.
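A toy illustration of this point, with made-up revenue and cost curves: if revenue grows sublinearly with system size while cost grows linearly, the profit-maximizing size is far smaller than the revenue-maximizing one.

```python
def revenue(n):
    # Hypothetical: utility delivered (hence revenue) grows sublinearly.
    return 100 * n ** 0.5

def cost(n):
    # Hypothetical: purchase + maintenance cost grows linearly with size.
    return 8 * n

sizes = range(1, 201)
best_profit = max(sizes, key=lambda n: revenue(n) - cost(n))
best_revenue = max(sizes, key=revenue)
# Revenue keeps growing with n, but profit peaks at a much smaller system.
```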
In computer systems, the cost to the producer includes the purchasing cost and the
maintenance cost of hardware and software components. Therefore, the profit maximization
principle leads to producer-driven hardware-software co-design issues. As future work, we
would like to develop analytical tools that determine the hardware composition and the
software deployment for embedded systems driven by profit maximization principles.
Bibliography
[1] Abdelzaher, T., Atkins, E., and Shin, K. G. (1997). QoS negotiation in real-time systems and its application to automated flight control. In IEEE Real-Time Technology and Applications Symposium (RTAS), pages 228–238.
[2] Axelsson, J. (2000). Efficient integration of distributed automotive real-time systems.
In EDA-meeting.
[3] Baker, B. S. (1985). A new proof for the first-fit decreasing bin-packing algorithm.
Journal of Algorithms, 6:49–70.
[4] Banerjee, P. and Abraham, J. (1984). Fault-secure algorithms for multiprocessor sys-
tems. In 11th International Symp. on Computer Architecture.
[5] Baugh, R. (1973). Computer Control of Modern Radars. RCA M&SR-Moorestown
Library.
[6] Bellman, R. (1958). On a routing problem. Quart. Appl. Math., 16:87–90.
[7] Bennett, J. and Zhang, H. (1996). WF2Q: Worst-case fair weighted fair queueing. In
Conference on Computer Communications (INFOCOM).
[8] Bentley, J. L., Johnson, D. S., Leighton, F. T., McGeoch, C. C., and McGeoch, L. A.
(1984). Some unexpected expected behavior results for bin packing. In 16th annual ACM
symposium on Theory of computing, pages 279–288.
[9] Bosch (1991). CAN Specification, Version 2.0.
[10] Calvert, K. L., Doar, M. B., and Zegura, E. W. (1997). Modeling Internet topology.
IEEE Communications Magazine, 35(6):160–163.
[11] Chan, L. M. A., Simchi-Levi, D., and Bramel, L. (1998). Worst-case analyses, linear
programming and the bin-packing problem. Mathematical Programming, 83:213–227.
[12] Chen, S. and Nahrstedt, K. (1998a). Distributed quality-of-service routing in high-
speed networks based on selective probing. In IEEE Annual Conference on Local Area
Networks (LCN), pages 80–89.
[13] Chen, S. and Nahrstedt, K. (1998b). An overview of quality-of-service routing for the
next generation high-speed networks: Problems and solutions. IEEE Network Magazine,
Special Issue on Transmission and Distribution of Digital Video, 12(6):64–79.
[14] Cherukuri, R., Dykeman, D. (eds.), and Gouguen, M. (chair) (1995). PNNI draft specification.
[15] Chevochot, P. and Puaut, I. (1999). Scheduling fault-tolerant distributed hard real-
time tasks independently of the replication strategies. In 6th International Conference
on Real-Time Computing Systems and Applications.
[16] Liu, C. L. and Layland, J. W. (1973). Scheduling algorithms for multiprogramming in a hard-real-time environment. Journal of the ACM, 20(1):46–61.
[17] Coffman, E., Garey, J. M., and Johnson, D. (1987). Bin packing with divisible item
size. Journal of Complexity, 3:406–428.
[18] Coffman, E., Jr., Garey, M., and Johnson, D. (1996). Approximation Algorithms for Bin
Packing: A Survey. Approximation Algorithms for NP-Hard Problems. PWS Publishing,
Boston.
[19] Simchi-Levi, D. (1994). New worst-case results for the bin-packing problem.
Naval Research Logistics, 41:579–585.
[20] Demers, A., Keshav, S., and Shenker, S. (1989). Analysis and simulation of a fair queue-
ing algorithm. In ACM Special Interest Group on Data Communication (SIGCOMM).
[21] Deniz, D. (2004). Modeling Functional and Para-Functional Concerns In Embedded
Real-Time Systems. PhD thesis, Department of Electrical and Computer Engineering,
Carnegie Mellon University.
[22] Dijkstra, E. (1959). A note on two problems in connection with graphs. Num. Math.,
1:269–271.
[23] Everitt, B. S., Landau, S., and Leese, M. (2001). Cluster Analysis. Edward Arnold.
[24] Ford, L. and Fulkerson, D. (1963). Flows in Networks. Princeton Univ. Press.
[25] ATM Forum (1995). ATM Forum Traffic Management Specification, Version 4.0. ATM Forum/95-0013R8.
[26] Gao, X., Nandagopal, T., and Bharghavan, V. (2001). Achieving application level
fairness through utility-based wireless fair scheduling. In IEEE Global telecommunications
Conference (GLOBECOM).
[27] Ghosh, S., Hansen, J., Rajkumar, R., and Lehoczky, J. (2004a). Adaptive QoS op-
timizations with applications to radar tracking. In 10th International Conference on
Real-Time and Embedded Computing Systems and Applications (RTCSA).
[28] Ghosh, S. and Rajkumar, R. (1999). Practical management of end-to-end network
bandwidth reservation. In Proc. of Conference on Software in Telecommunications and
Computer Networks (SOFTCOM).
[29] Ghosh, S. and Rajkumar, R. (2002). Resource management of the os network subsys-
tem. In IEEE International Symposium on Object-oriented Real-time distributed Com-
puting.
[30] Ghosh, S., Rajkumar, R., Hansen, J., and Lehoczky, J. (2004b). Integrated resource
management and scheduling with multi-resource constraints. Technical Report 18-2-04,
Institute for Complex Engineering Systems, Carnegie Mellon University.
[31] Ghosh, S., Rajkumar, R. R., Hansen, J., and Lehoczky, J. (2003). Scalable resource
allocation for multi-processor QoS optimization. In 23rd IEEE International Conference
on Distributed Computing Systems (ICDCS 2003).
[32] Ghosh, S., Rajkumar, R. R., Hansen, J., and Lehoczky, J. (2004c). Adaptive QoS
optimizations with applications to radar tracking. Technical Report 18-3-04, Institute
for Complex Engineering Systems, Carnegie Mellon University.
[33] Goddard, S. and Jeffay, K. (1997). Analyzing the real-time properties of a dataflow
execution paradigm using a synthetic aperture radar application. In Proceedings of the
IEEE Real-Time and Embedded Technology and Applications Symposium.
[34] Guerin, R. and Orda, A. (1999). QoS-based routing in networks with inaccurate information: Theory and algorithms. IEEE/ACM Transactions on Networking, 7(3).
[35] Guerin, R., Orda, A., and Williams, D. (1996). QoS routing mechanisms and OSPF
extensions.
[36] Hansen, J., Ghosh, S., Rajkumar, R., and Lehoczky, J. (2004). Resource management
and highly configurable tasks. In 12th International Workshop on Parallel and Distributed
Real-Time Systems.
[37] Hansen, J., Zhu, H., and Rajkumar, R. (2001a). Probabilistic bandwidth reservation
by resource priority multiplexing. In Real-Time Technology and Applications Symposium
(RTAS), pages 171–178.
[38] Hansen, J. P., Lehoczky, J., and Rajkumar, R. (2001b). Optimization of quality of
service in dynamic systems. In Proceedings of the 9th International Workshop on Parallel
and Distributed Real-Time Systems (WPDRTS).
[39] Hansen, J. P., Zhu, H., Lehoczky, J., and Rajkumar, R. (2002). Quantized EDF scheduling in a stochastic environment. In Proc. of 10th International Workshop on Parallel and
Distributed Real-Time Systems (WPDRTS).
[40] Hoover, C., Hansen, J., Koopman, P., and Tamboli, S. (1999). The amaranth frame-
work: Probabilistic, utility-based quality of service management for high-assurance com-
puting. In 4th IEEE International Symposium on High-Assurance Systems Engineering
(HASE), pages 207–216.
[41] Huang, J., Wan, P. J., and Du, D. Z. (1998). Criticality- and QoS-based multiresource
negotiation and adaptation for continuous multimedia. Journal of Real-Time Systems,
15(1):249–273.
[42] Johnson, D. (1973). Near-Optimal Bin Packing Algorithms. PhD thesis, MIT, Cam-
bridge, MA.
[43] Johnson, D., Demers, A., Ullman, J., Garey, M., and Graham, R. L. (1974). Worst-case performance bounds for simple one-dimensional packing algorithms. SIAM J. Comput.,
3(4).
[44] Kleinrock, L. and Kamoun, F. (1977). Hierarchical routing for large networks: performance evaluation and optimization. Computer Networks, 1:155–174.
[45] Knuth, D. (1973). The Art of Computer Programming, Volume 3: Sorting and Searching. Addison-Wesley.
[46] Kolawole, M. O. (2002). Radar Systems, Peak Detection and Tracking. Newnes Press.
[47] Kopetz, H. (1998). The time-triggered architecture. In ISORC ’98, Kyoto, Japan, April 1998.
[48] Kuo, T. W., Chao, Y. S., Kuo, C. F., Chang, C., and Su, Y. (2002). Real-time dwell
scheduling of component-oriented phased array radars. In IEEE 2002 Radar Conferences.
[49] Lee, C. (1999). On Quality of Service Management. PhD thesis, Department of Elec-
trical and Computer Engineering, Carnegie Mellon University.
[50] Lee, C., Lehoczky, J., Rajkumar, R., and Siewiorek, D. (1998). On quality of service
optimization with discrete QoS options. In Proceedings of the IEEE Real-Time Technology
and Applications Symposium. IEEE.
[51] Lee, C., Lehoczky, J., Siewiorek, D., Rajkumar, R., and Hansen, J. (1999). A scalable
solution to the multi-resource QoS problem. In Proceedings of the IEEE Real-Time
Systems Symposium.
[52] Lin, K. (1995). Distributed pinwheel scheduling with end-to-end timing constraints.
In IEEE Real-Time Systems Symposium.
[53] Lui, K.-S., Nahrstedt, K., and Chen, S. (2000). Hierarchical QoS routing in delay-
bandwidth sensitive networks. In IEEE Local Computer Networks (LCN 2000), pages
579–588.
[54] Ma, Q. and Steenkiste, P. (1997). On path selection for traffic with bandwidth guar-
antees. In IEEE International Conference on Network Protocols.
[55] Mangharam, R., Demirhan, M., Rajkumar, R., and Raychaudhuri, D. (2004). Size
matters: Size-based scheduling for MPEG-4 over wireless channels. In SPIE Conference
on Multimedia Computing and Networking (MMCN), pages 110–122.
[56] Medina, A., Lakhina, A., Matta, I., and Byers, J. (2001a). BRITE: An approach to
universal topology generation. In International Workshop on Modeling, Analysis and
Simulation of Computer and Telecommunications Systems (MASCOTS 2001).
[57] Medina, A., Lakhina, A., Matta, I., and Byers, J. (2001b). BRITE: Universal topology
generation from a user’s perspective. Technical Report 2001-003, Boston University.
[58] Megginson, R., Axler, S., and Gehring, F. (2001). An Introduction to Banach Space Theory (Graduate Texts in Mathematics, 183). Springer-Verlag.
[59] Mercer, C. (1997). Operating System Resource Reservation for Real-Time and Multi-
media Applications. PhD thesis, School of Computer Science, Carnegie Mellon University.
[60] Munu, M., Harrison, I., Wilkin, D., and Woolfson, M. (1992). Target tracking al-
gorithms for phased array radar. Radar and Signal Processing, IEE Proceedings-F,
139(5):336–342.
[61] Nahrstedt, K., Chu, H.-h., and Narayan, S. (1999). QoS-aware resource management
for distributed multimedia applications. Journal of High Speed Networking, 7(3-4):229–
257.
[62] Oikawa, S. and Rajkumar, R. (1999). Portable RK: A portable resource kernel for guar-
anteed and enforced timing behavior. In IEEE Real-Time Technology and Applications
Symposium (RTAS).
[63] Parekh, A. and Gallager, R. G. (1993). A generalized processor sharing approach to flow
control in integrated services networks: the single node case. IEEE/ACM Transactions
on Networking, pages 344–357.
[64] Peressini, A., Sullivan, F., and Uhl, Jr., J. J. (1988). The Mathematics of Nonlinear Programming. Springer-Verlag.
[65] Rajkumar, R. (1991). Synchronization in Real-Time Systems: A Priority Inheritance
Approach. Kluwer Academic Publishers.
[66] Rajkumar, R., Lee, C., Lehoczky, J., and Siewiorek, D. (1997). A resource allocation
model for QoS management. In IEEE Real-Time Systems Symposium.
[67] Rosu, D. I., Schwan, K., Yalamanchili, S., and Jha, R. (1997). On adaptive resource
allocation for complex real-time applications. In 18th IEEE Real-Time Systems Sympo-
sium.
[68] Hedrick, C. L. (2002). Cisco white paper: An introduction to IGRP.
[69] Saewong, S. and Rajkumar, R. (1999). Cooperative scheduling of multiple resources.
In IEEE Real-time Systems Symposium.
[70] Sanfridson, M. (2000). Problem formulations for QoS management in automatic con-
trol. Technical report, Mechatronics Lab, Department of Machine Design Royal Institute
of Technology.
[71] Sha, L., Rajkumar, R., and Lehoczky, J. (1990). Priority inheritance protocols: An
approach to real-time synchronization. IEEE Transactions on Computers, pages 1175–
1185.
[72] Shenker, S. (1995). Fundamental design issues for the future internet. IEEE Journal
of Selected Areas in Communication, 13(7):1176–1188.
[73] Shih, C., Gopalakrishnan, S., Ganti, P., Caccamo, M., and Sha, L. (2003a). Scheduling
real-time dwells using tasks with synthetic periods. In Proceedings of the IEEE Real-Time
Systems Symposium.
[74] Shih, C., Gopalakrishnan, S., Ganti, P., Caccamo, M., and Sha, L. (2003b). Template-
based real-time dwell scheduling with energy constraint. In Proceedings of the IEEE
Real-Time and Embedded Technology and Applications Symposium.
[75] Shirazi, B., Hurson, A., and Kavi, K. (1995). Scheduling and Load Balancing in Parallel
and Distributed Systems. Wiley.
[76] Smith, C. A. and Kroll, M. J. (1989). Utility theory and rent optimization: Utilizing
cluster analysis to segment rental markets. Journal of Real Estate Research, 4(1):61–71.
[77] Staehli, R. (1996). Quality of Service Specification for Resource Management in Multi-
media Systems. PhD thesis, Department of Computer Science and Engineering, Oregon
Graduate Institute.
[78] Staehli, R., Walpole, J., and Maier, D. (1995). Quality of service specification for