Scalable QoS-based Resource Allocation

A Dissertation Submitted to the Graduate Education Committee at the Department of Electrical and Computer Engineering, Carnegie Mellon University, in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Electrical and Computer Engineering

by Sourav Ghosh

Committee members:
Advisor: Prof. Ragunathan (Raj) Rajkumar
Dr. Jeffery Hansen
Prof. John Lehoczky
Prof. Dan Siewiorek

Pittsburgh, Pennsylvania
August, 2004

Copyright © 2004 Sourav Ghosh

This research was supported by the DoD Multidisciplinary University Research Initiative (MURI) program administered by the Office of Naval Research (ONR) under Grant N00014-01-1-0576 and in part by the Defense Advanced Research Projects Agency (DARPA). The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies or endorsements, either expressed or implied, of DoD, ONR or DARPA.
queueing (WF²Q) [7], etc. Stoica et al. presented core-stateless fair queueing (CSFQ), which makes the fair-queueing principle scalable to large networks where per-flow management becomes intractable [79]. CSFQ differentiates between core and edge routers: edge routers perform per-flow management, while core routers avoid it by operating on flow aggregates instead. There are also other contributions that do not follow the fluid-flow model, such as fair Shortest Remaining Processing Time (fair-SRPT) scheduling [55], quantized EDF scheduling [39] and deadline-monotonic packet scheduling [28].
Utility-based QoS
Shenker first suggested the use of utility functions for modeling QoS in networks [72]. Bharghavan et al. adopt a similar utility-function-based QoS optimization method in the wireless environment [26]. They consider throughput, fairness, delay and loss as their system-wide QoS parameters. Similar to Q-RAM, they associate concave and continuous utility functions with QoS parameters, and they maximize the global utility of the system by allocating channel bandwidth to applications subject to a channel capacity constraint. They build their adaptive algorithm around specific utility functions such as U(r) = log(r). Unlike Q-RAM, their algorithm relies on the choice of a specific utility function.¹
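As a concrete illustration (a sketch, not code from [26]): with utility U(r) = log(r) and a single channel-capacity constraint, maximizing the weighted sum Σᵢ wᵢ log(rᵢ) subject to Σᵢ rᵢ ≤ C has the closed-form solution rᵢ = wᵢC/Σw (weighted proportional sharing), which follows from the KKT conditions. The weights and capacity below are hypothetical.

```python
def allocate_log_utility(weights, capacity):
    """Maximize sum(w_i * log(r_i)) subject to sum(r_i) <= capacity.
    The KKT conditions give the closed form r_i = w_i * C / sum(w)."""
    total = sum(weights)
    return [w * capacity / total for w in weights]

# Three flows sharing a 100-unit channel; the middle flow has double weight.
print(allocate_log_utility([1.0, 2.0, 1.0], 100.0))  # [25.0, 50.0, 25.0]
```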
1.3.3 QoS and Radar
As mentioned earlier, there are many real-time systems where the physical environment plays a key role in determining the QoS of applications. Because of the dynamic nature of the environment, QoS-based resource management has to adapt to changes in environmental factors such as temperature and noise. Consequently, a whole range of resource constraints, such as power and energy, comes into play. A radar system is a classic example of such a system.
¹ More details related to network QoS are described in Chapter 4.

Many recent studies have focused on phased-array radar systems. The focus has primarily been on performing schedulability analysis of radar tasks for their given execution
times. For example, Kuo et al. proposed a reservation-based approach for real-time radar
scheduling [48]. This approach allows the system to guarantee the performance requirement
when the schedulability condition holds. However, they do not consider energy constraints.
Shih et al. use a template-based scheduling algorithm in which a set of templates is constructed offline, and tasks are fitted into the templates at run-time [74, 73]. The templates
consider both the timing and power constraints. They also consider interleaving of dwells
that allow beam transmissions (or receptions) on one target to be interleaved with beam
transmissions and receptions on another. The space requirements of templates limit the
number of templates that can be used, and “service classes” designed offline determine how
QoS operating points are assigned to discrete sets of task configurations across an expected
operating range. Goddard et al. addressed real-time back-end scheduling of radar tracking algorithms using a data-flow model [33]. Our work in radar QoS optimization is most similar to that of Jha et al. [67], who use their adaptive QoS middleware framework (mentioned in Section 1.3.1) for QoS-based resource allocation and schedulability analysis in radar systems.
1.3.4 QoS and Embedded Systems
There has been comparatively little work on QoS in distributed embedded systems, which are mostly binary control systems. Abdelzaher et al. first introduced the notion of QoS in such systems [1]. They developed a negotiation model that adjusts the QoS levels of applications in real-time while maximizing the application-perceived system utility, or reward. They incorporated the proposed QoS mechanism into a middleware service called "RTPOOL". It uses a QoS optimization heuristic that starts with the maximum QoS of all tasks, and then repeatedly lowers the QoS of the task whose drop in reward is smallest at the next lower QoS level. Next, they introduce a distributed QoS-optimization protocol in which the hosts negotiate with each other and share the load based on the reward of accepting a
task of a certain reward level. Based on that work, Sanfridson introduces the concept of
integrating QoS with a feedback control mechanism for automotive systems [70].
1.4 Organization of this Dissertation
We organize this dissertation as follows.
In Chapter 2, we describe our generic model of QoS and distributed systems. In Chap-
ter 3, we describe our resource allocation algorithms for large multiprocessor systems. In
Chapter 4, we describe our distributed resource allocation scheme for large hierarchical
networks with large numbers of resources and tasks where each task requires many re-
sources. In Chapter 5, we describe an integrated resource allocation and scheduling model
for a Phased Array Radar System as a dynamic scalable real-time system with many dif-
ferent constraints. In Chapter 6, we describe QoS-based resource allocation in distributed
embedded systems. Finally, in Chapter 7, we summarize our research contributions and
discuss future work.
Chapter 2
System Model
2.1 Introduction
In this chapter, we describe a generic model of distributed systems that we use throughout
this dissertation. A distributed system consists of multiple tasks and multiple resources. A
task executes on the system by using the resources. An end-user derives a benefit or utility
from the system due to the execution of these tasks.
This chapter is divided into three parts. First, we define the terms task and resource and discuss their interactions. Second, we elaborate on our mathematical model of a distributed system. Finally, we describe our basic optimization algorithms [31] that maximize the accrued utility of the end-user while allocating resources to tasks.
2.1.1 Definitions: Tasks and Resources
In computer systems, a task is a basic unit of programming that an operating system
controls. Depending on how the operating system defines a task in its design, this unit
of programming may be an entire program or each successive invocation of a program.
A task is considered to be a container that holds a set of instantiating objects known as
threads, as in the case of Mach, Mach-type operating systems and Linux. In BSD Unix-like
environments, however, the word process is used instead of task. When multiple processes
work in the same “context” (address space) of a task, we refer to them as threads.
In this dissertation, we refer to a task as an application that provides a service to the
end-user. It can be a video conferencing task, or a tracking task that tracks a target using a
phased array radar. In its implementation, it consists of one or more processes or threads.
A resource, on the other hand, is defined as a source of aid or supply that can be drawn
upon when needed. Tasks need resources to be executed. Furthermore, a resource is a
measurable entity that has a finite supply. The major computer system resource categories
are processor cycles, network bandwidth, memory, and disk space. Embedded systems may
have other resources. For example, in a radar system, resources include antenna bandwidth
and antenna power.
Resources can be classified into two main categories: (1) time-shared resources, and (2) spatial resources. Each is discussed in the following subsections.
2.1.2 Time-shared resources
A resource is time-shared when at a given instant, only one task receives the entire supply
of a resource, while other tasks that require it receive none of it. Processor (CPU) cycles,
and network bandwidth are time-shared resources. We can express a time-shared resource as a system-wide supply of an amount ∆R available in every small time unit ∆T, such that the rate of supply can be expressed as:

    r(t) = ∆R/∆T.    (2.1)

A task τi can specify its requirement as a total share Ci over a time span Di. In this case, we can express the task's average usage rate as Ci/Di.
If a task is periodic, its requirement can be expressed as Ci units of resource in every period of Ti time units.¹ The resource requirement of a periodic task can then be expressed as a rate by:

    si(t) = Ci/Ti.    (2.2)

¹ Periodic and aperiodic tasks are described in [59] in more detail.
For n periodic tasks, Liu and Layland [16] introduced a fixed-priority scheduling scheme in which the scheduling priority of a task is inversely proportional to its period, and a higher-priority task can instantly preempt a lower-priority task with no context-switching overhead. This is known as the rate-monotonic scheduling (RMS) algorithm. They proved that each task τi obtains its share of Ci units in every period Ti if

    (1/r(t)) Σ_{i=1}^{n} si(t) ≤ n(2^{1/n} − 1).    (2.3)

The quantity n(2^{1/n} − 1) is the least upper bound on the utilization of a time-shared resource under RMS. In other words, Equation (2.3) is only a sufficient condition, not a necessary one; average-case behavior permits a much higher utilization under RMS than the bound in Equation (2.3).
As n → ∞, this bound approaches ln 2 ≈ 0.693. In addition, if a task has non-preemptive regions during its resource usage, it causes blocking of higher-priority tasks. This is known as priority inversion [71, 65]: a low-priority task holds a resource that a high-priority task is waiting for. Denoting the blocking time by Bi, Equation (2.3) becomes:

    (1/r(t)) Σ_{i=1}^{n} (Ci + Bi)/Ti ≤ n(2^{1/n} − 1).    (2.4)

Thus, the effective least upper bound on the utilization (1/r(t)) Σ_{i=1}^{n} Ci/Ti is reduced below the least upper bound of 0.693. However, as mentioned before, this is a pathological case.
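The sufficient test of Equation (2.4) is easy to check programmatically. The sketch below (with hypothetical task parameters) evaluates the Liu-Layland bound and the schedulability condition:

```python
import math

def rms_bound(n):
    """Liu-Layland least upper bound n(2^(1/n) - 1) from Eq. (2.3)."""
    return n * (2 ** (1.0 / n) - 1)

def rms_schedulable(tasks, rate=1.0):
    """Sufficient (not necessary) test per Eq. (2.4).
    tasks: list of (C_i, B_i, T_i) tuples; rate is r(t)."""
    utilization = sum((c + b) / t for c, b, t in tasks) / rate
    return utilization <= rms_bound(len(tasks))

print(round(rms_bound(2), 3))                    # 0.828
print(round(math.log(2), 3))                     # 0.693, the limit as n grows
print(rms_schedulable([(1, 0, 4), (2, 0, 8)]))   # True: utilization 0.5
```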
If the periods of the tasks are harmonic, then the utilization bound for the rate-monotonic scheduling algorithm is 1.0. Hence, in the special case where all tasks have the same constant small period T and the context-switching cost is zero, Equation (2.4) reduces to a Generalized Processor Sharing (GPS) [63] model, which is a special case of the rate-monotonic model with harmonic periods. In this case, the task-set is schedulable when:

    (1/r(t)) Σ_{i=1}^{n} Ci/T = 1.    (2.5)
2.1.3 Spatial resources
A spatial resource can be shared by multiple tasks simultaneously. Disk space is a good
example of a spatial resource. Each task requires a certain amount of disk-space to store
its data and instructions. A memory buffer is also a spatial resource. If R is the total size
of a spatial resource at any time t and Si is the demand made by task i for that resource,
then

    Σ_{i=1}^{n} Si ≤ R    (2.6)

is the constraint on the resource demands.
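Equation (2.6) yields a one-line admission test for a spatial resource; a minimal sketch with hypothetical demands:

```python
def admit_spatial(demands, capacity):
    """Eq. (2.6): the task set fits iff total demand stays within supply R."""
    return sum(demands) <= capacity

print(admit_spatial([10, 25, 40], 100))  # True: 75 <= 100
print(admit_spatial([60, 50], 100))      # False: 110 > 100
```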
In addition, there are other resources such as memory that can be divided spatially into
multiple time-shared resources. We will discuss this as future work in Chapter 7 of this
dissertation. In the next section, we will discuss our resource allocation model.
2.2 QoS and Resource Allocation
In our QoS optimization model, each task is assumed to have multiple QoS settings, each
of which provides a different quality level to the user. Each setting is associated with
certain resource levels. We employ a modified version of the existing QoS-Based Resource
Allocation Model (Q-RAM) [49, 51, 50, 66] as the basic building block of the optimization
process. Our model determines the near optimal quality levels of each task and apportions
the available resources to them. We assume a simple model of resources where each resource
can be divided among the tasks, either in a time-shared or in a spatial manner. In the case
of a time-shared resource, we limit the total allocable amount by its exact schedulability
bound or an approximate utilization bound.
As a generic model, let us consider a distributed system with m shared resources
r1, . . . , rm. Resources can be of any type including CPU, memory, link bandwidth, or
even radar bandwidth in the case of a radar tracking application. We use the term Re-
source Vector to describe a set of resource units (e.g., a processor of certain frequency,
a network link of certain bandwidth) in a multi-resource environment. For example, the
resource vector ~R^max = (r^max_1, . . . , r^max_m) denotes the capacity of the individual resources.
The resources are shared by a set of n independent tasks τ1, . . . , τn. Each task is as-
sumed to have a set of parameters that can be changed to configure its quality levels and
resource demands. We commonly refer to these parameters as dimensions. They are classified into two main categories: operational dimensions and environmental dimensions. From the user's perspective, however, there is only one type of dimension, known as QoS dimensions. We discuss each of these in detail next.
2.2.1 Operational Dimensions
Operational dimensions are the control knobs that are directly controlled by the user or
the system administrator. Values of these dimensions determine the resource allocation of
the application and hence directly or indirectly influence its quality. The choice of a coding
scheme for video conferencing, and the choice of a route for a networked application between
its source and destination are examples of operational dimensions.
Operational Space: This is defined as the set of operational points, as shown for task τi in Equation (2.7), where Φij is the jth operational dimension and N_Φi is the number of operational dimensions:

    Φi = Φi1 × · · · × Φi,N_Φi.    (2.7)
Operational Indices: An index in {1, 2, . . . , |Φij|} enumerating the possible values of operational dimension j is called an operational index. Operational dimensions can be of two types: monotonic and non-monotonic.
Monotonic Operational Dimensions: The value of this type of dimension is directly
or inversely related to the utility of the task. In other words, increasing values along this
dimension either increases or decreases utility. For example, increasing the frequency of a
tracking task in radar increases the quality of tracking.
Non-Monotonic Operational Dimensions: The value of this dimension is not directly
or inversely related to the utility of a task. An example is the selection of a video coding
algorithm for a video task. There may be multiple types of video coding algorithms, but it
may not be possible to sort them in the increasing or decreasing order of utility.
Next, we will introduce another type of dimension that affects the QoS of tasks and
hence the utility, but is not in the direct control of the user or the system administrator.
2.2.2 Environmental Dimensions
The quality obtained by a task may even depend on factors in the environment in addition
to the operational settings. For example, the quality of a video conferencing task in a
wireless medium can depend not only on the strength of the wireless signal received at the
receiver, but also on factors such as environmental noise. Noise is therefore an example of an environmental dimension.
Environmental Space: This is defined as the set of environmental points, as shown for task τi in Equation (2.8), where Θij is the jth environmental dimension and N_Θi is the number of environmental dimensions:

    Θi = Θi1 × · · · × Θi,N_Θi.    (2.8)
Environmental Indices: An index in {1, 2, . . . , |Θij|} enumerating the possible values of environmental dimension j is called an environmental index.
Different values of the operational and environmental dimensions yield different QoS settings for a task. Next, we discuss the dimensions that are of direct relevance to the end-user and that provide QoS to the end-user.
2.2.3 QoS Dimensions
The dimensions that are of direct relevance to the user are known as QoS dimensions. For
example, the frame rate of a video-conferencing task and the tracking precision of a radar
tracking task are QoS dimensions. A higher value along a QoS dimension generally requires
higher resource levels.
QoS dimensions are derived from operational and environmental dimensions. A QoS dimension can also be the same as a monotonic operational dimension. For example, the frame rate of a video-conferencing task is an operational dimension (a controllable knob) that is also a QoS dimension.
Users derive satisfaction, or utility, from the values of the QoS dimensions. The higher the value along a QoS dimension, the higher the utility to the user. For example, a
higher frame-rate in a video-conferencing application provides a higher utility to the user.
The value of the utility along different QoS dimensions depends on the task, and perhaps
the user.
In the context of QoS dimensions, we use the following terms from [49].
Quality Space: This is defined as a set of quality points, as given by:

    Qi = Qi1 × · · · × Qi,N_Qi,    (2.9)

for task τi, where Qij is the jth QoS dimension and N_Qi is the number of QoS dimensions.
Figure 2.1: Dimensions and Their Relations (the system-centric operational and environmental dimensions determine the resource requirements and map to the user-centric QoS dimensions, which in turn determine utility)
Quality Indices: An index in {1, 2, . . . , |Qij|} enumerating the quality levels for dimension j, arranged in increasing order of quality, is called a quality index.
Dimension-wise Utility: This is the utility associated with a particular quality level of a QoS dimension. In other words, it is defined as the mapping uij : Qij → ℝ representing the utility achieved by assigning quality level qij to dimension Qij.
Application Utility: It is normally expressed as the weighted sum of dimension-wise utilities across all QoS dimensions as a mapping ui : Qi → ℝ.
For example, if an application has two QoS dimensions, a particular QoS setting is denoted by (qj1, qk2), where j and k are the quality indices of the respective QoS dimensions. The utility of the application at this QoS setting is expressed as (w1 uj1 + w2 uk2), where w1 and w2 are the respective weights of the two dimensions.
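The weighted sum above can be sketched directly. The weights (0.4, 0.6) and the three service levels per dimension follow the video-conferencing example discussed later in this chapter, while the per-level utility values below are illustrative assumptions:

```python
# Hypothetical per-level utilities for a two-dimension video task.
frame_rate_utility = {1: 0.2, 2: 0.7, 3: 1.0}   # 10, 20, 30 frames/sec
resolution_utility = {1: 0.3, 2: 0.8, 3: 1.0}
w1, w2 = 0.4, 0.6

def application_utility(qj1, qk2):
    """Weighted sum of dimension-wise utilities: w1*u_j1 + w2*u_k2."""
    return w1 * frame_rate_utility[qj1] + w2 * resolution_utility[qk2]

print(round(application_utility(3, 2), 2))  # 0.88 = 0.4*1.0 + 0.6*0.8
```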
Based on the operational and environmental dimensions, we generate the different operating points of a task. We refer to them as set-points.
Definition 2.2.1 (Set-point). A set-point is an operating point of the task. It consists of a particular value of each of the task's operational and environmental dimensions, together with a utility value.
2.2.4 Set-point Generation
Set-points are generated by creating a QoS Profile and a Resource Profile [49].
QoS Profile Generation
The QoS Profile consists of different QoS levels of the task and the values of the corre-
sponding utilities. For some tasks, the operational dimensions and QoS dimensions may
be equivalent and there may be no environmental dimensions, but in general we say that
there is a Quality Function fqi : Φi ×Θi → Qi mapping each point in the cross product of
the operational space and environment space to a point in the quality space. The relation
between operational, environmental and QoS dimensions is illustrated in Figure 2.1.
Resource Profile Generation
In order for a task to operate at a particular set-point φi, it requires resources. We define a function gi : Φi → ~Ri specifying the amount of resources required for the task to operate at each set-point, where ~Ri = (r_i1, . . . , r_im) is defined as the Resource Vector describing the resource requirements of the task at that set-point. Apart from its resource requirement, a task may also have a deployment constraint, which is given by a non-monotonic operational dimension. For example, in a networked system, if a task requires bandwidth between a source and a destination, the multiple choices of paths belong to a non-monotonic operational dimension [31].
For each task, all QoS dimensions Qij must satisfy the condition

    ∂r_k/∂q_ij ≥ 0,  ∀k ∈ {1, . . . , m},    (2.10)
where rk denotes the kth resource. That is, an increase in any quality index value never results in a decrease in any resource requirement. Set-points that do not satisfy this condition can be dropped from consideration, because other set-points yield higher QoS with fewer resources. The same condition applies to monotonic operational dimensions.
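This pruning rule amounts to a Pareto filter over (utility, resource-vector) pairs; the set-points and values in the sketch below are hypothetical:

```python
def prune_dominated(setpoints):
    """Keep only set-points not dominated in the sense of Eq. (2.10): drop a
    point if some other point offers at least as much utility using no more
    of any resource (strictly better in utility or in some resource)."""
    def dominates(a, b):
        (ua, ra), (ub, rb) = a, b
        return (ua >= ub
                and all(x <= y for x, y in zip(ra, rb))
                and (ua > ub or any(x < y for x, y in zip(ra, rb))))
    return [p for p in setpoints if not any(dominates(q, p) for q in setpoints)]

# (utility, (cpu, bandwidth)) pairs; the middle point wastes resources.
pts = [(0.5, (2, 1)), (0.5, (3, 2)), (0.9, (4, 4))]
print(prune_dominated(pts))  # [(0.5, (2, 1)), (0.9, (4, 4))]
```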
However, for non-monotonic operational dimensions, the conditions are given by:

    ∃k ∈ {1, . . . , m}: ∂r_k/∂φ_ij < 0,    (2.11)
    ∃k′ ∈ {1, . . . , m}, k′ ≠ k: ∂r_k′/∂φ_ij > 0.    (2.12)
These equations indicate that switching from one "resource configuration" to another removes resource demand from one or more resource elements and adds it to one or more other resource elements.
2.2.5 Example Application
As an example, consider a video conference application with QoS and operational dimensions
as shown in Table 2.1. There are two monotonic operational dimensions that have one-to-
one correspondence with QoS dimensions: frame rate and resolution. They are assumed
to have weights 0.4 and 0.6 respectively. The weights represent the relative importance of
the QoS dimensions from the user’s perspective. For frame rate, there are three possible
levels of service at 10 frames/sec, 20 frames/sec and 30 frames/sec. A quality index is
associated with each of these service levels with 1 for the lowest level of service, and 3 for
the highest level of service. The user of the application has assigned utility values to each
of these levels of service indicating the relative desirability of these service levels. Similar
quality index and utility values are assigned for various resolutions.
In addition to the monotonic operational dimensions, there are also two non-monotonic
operational dimensions. The first operational dimension is the format, or codec, to use for
[Table 2.1: QoS/Monotonic Operational Dimensions (weight), Levels, Quality/Monotonic Operational Index, Utility]

2.3 Existing Optimization Algorithm
In this section, we briefly describe the optimization technique presented in [51, 49]. We
denote the number of tasks by n and the number of resources by m. Let Ci represent the
set of utility-resource pairs for task τi, as shown:
    Ci = 〈(u_i1, ~R_i1), . . . , (u_iki, ~R_iki)〉.    (2.13)
Next, we would like to determine and compare the costs of the resource vectors in order
to choose one which gives higher utility at a lower cost. When there is a single resource in
the system, the cost of a set-point is simply equal to its resource amount. When there are
multiple resources, a scalar metric known as compound resource is computed.
To compute the compound resource, we first compute a penalty vector ~P = (p_1, . . . , p_m) (assuming we have m resources) that assigns a "price" to each resource. The value of an element in the vector is directly related to the overall demand on the corresponding resource, and is defined as:

    p_k = r^sum_k / r^max_k + 1,    (2.14)
where r^sum_k is computed as the sum of the kth resource elements of all the set-points of all the tasks, as given by:

    r^sum_k = Σ_{all tasks} Σ_{all set-points} r_jk.    (2.15)
The compound resource h is a scalar metric defined for each set-point by:

    h = √((r_1 p_1)² + · · · + (r_m p_m)²).    (2.16)
The metric h is used to compare the relative cost of each of the resource combinations. We
now augment Ci by adding h to get:
    Cic = 〈(u_i1, ~R_i1, h_i1), . . . , (u_iki, ~R_iki, h_iki)〉.    (2.17)
Cic is called a compound resource vector. We use the parameters in Cic to determine the
near-optimal resource allocation for tasks that maximizes the global utility value. The
algorithm is called Approximate Multiple Resource Multiple Dimension or AMRMD1 [51]. It
is briefly presented in Algorithm 1.
    input : profiles of tasks
    output: resource allocation of tasks by maximizing utility
    Calculate initial penalty vector;
    for iter = 0 to max_iter do  // max_iter is usually set to 3
        for all tasks i do
            Generate compound resource Cic for each task τi;
            Perform concave majorant optimization [51] on Cic;
        end
        Create slope list by merging set-points of all Cics based on their slopes;
        Go through the entire slope list and enter/update the resource allocation of the tasks;
        Update penalty vector from the usage of the individual resources;
        if the utility in the previous iteration differs from this utility by a small fraction ε then
            Break from the loop;
        end
    end
    Finalize resource allocations of the tasks;

Algorithm 1: Basic "AMRMD1" algorithm

This algorithm computes the compound resource vector for each task. The procedure concave_majorant() retains the points in Cic falling along the line of highest slope. The slope of the utility function at a set-point j is defined by:

    slope(j) = (u(j) − u(j−1)) / (h(j) − h(j−1)),    (2.18)
where h(j) and u(j) are the compound resource and the utility at the set-point j respectively.
This is also known as the marginal utility.
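Equations (2.14) through (2.18) can be sketched in a few lines. The task profile below is hypothetical, and set-points are represented as (utility, resource-vector) pairs sorted by increasing utility:

```python
import math

def penalty_vector(all_profiles, r_max):
    """Eqs. (2.14)/(2.15): p_k = r_sum_k / r_max_k + 1, where r_sum_k sums
    the k-th resource element over every set-point of every task."""
    m = len(r_max)
    r_sum = [0.0] * m
    for profile in all_profiles:
        for _, r in profile:
            for k in range(m):
                r_sum[k] += r[k]
    return [r_sum[k] / r_max[k] + 1.0 for k in range(m)]

def compound_resource(r, p):
    """Eq. (2.16): h = sqrt(sum_k (r_k * p_k)^2)."""
    return math.sqrt(sum((rk * pk) ** 2 for rk, pk in zip(r, p)))

def slopes(profile, p):
    """Eq. (2.18): marginal utility between consecutive set-points."""
    pts = [(u, compound_resource(r, p)) for u, r in profile]
    return [(u2 - u1) / (h2 - h1) for (u1, h1), (u2, h2) in zip(pts, pts[1:])]

# One task with three set-points over two resources of capacity 10 each.
profiles = [[(0.2, (1, 0)), (0.6, (2, 1)), (0.9, (4, 2))]]
p = penalty_vector(profiles, (10, 10))
print(p)                                              # [1.7, 1.3]
print([round(s, 3) for s in slopes(profiles[0], p)])  # [0.206, 0.082]
```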
2.3.2 Drawbacks of the AMRMD1 Algorithm
There are two problems in applying the above algorithm in a multi-resource environment. We describe them in order of importance.
Static Penalty Vector Computation
The AMRMD1 algorithm statically computes the “penalty” vector. It is determined based on
the aggregate potential demand placed on a resource, and penalizes the choices of resources
that are perceived to be heavily loaded in favor of the less loaded resources. The aggregate
is determined by summing the resource requirements of all set-points of all the tasks. In
a true sense, the computation of the penalty vector should reflect the real usage of the
resources at any given point in time during resource allocation. In other words, the penalty
vector should be computed dynamically each time a set-point gets admitted, based on the
quality points that have already been admitted into the system so far. This is particularly
true for a large distributed system where a task can have multiple values of its operational
dimensions in terms of its resource trade-offs. Adding all possible resource trade-off values will unnecessarily create heavy penalties for small resources. If the dynamic computation is to be avoided for complexity reasons², we need a smarter way of evaluating the penalty vector that does not unnecessarily penalize small resources.
Neglecting Co-located Points
Even after using the static penalty vector computation, there can still be many set-points
that have the same values of utility and compound resource but different resource vectors
(or resource combinations). These set-points are known as co-located set-points.
There can be multiple co-located set-points, and keeping only one of them can be potentially sub-optimal. However, while determining the concave majorant, the AMRMD1 algorithm will choose only one of those co-located points (whichever appears first in the list) and eliminate the others completely from consideration. This decision may not be the best one
simply because during the course of the resource allocation process, one point may be in-
feasible while another co-located point with the same utility may be feasible. This depends
on the status of the current allocation of resources. As a result, AMRMD1 may stop allocating

² We discuss the complexity of dynamic penalty vector computation in Section 2.4.1.

[Table 2.5: AMRMD1 Resource Allocation for Unequal Processors]
In short, the original AMRMD1 algorithm can clearly lead to sub-optimal solutions. In
the next two sections, we will discuss two new algorithms that attempt to overcome these
limitations of AMRMD1.
2.4 Enhanced Optimization Algorithms
We now describe two new algorithms that address the limitations of the AMRMD1 algorithm
described earlier.
2.4.1 Dynamic Penalty Vector (AMRMD DP)
In this algorithm, we compute the penalty vector dynamically as we assign set-points for
the tasks. It works as follows.
    input : profiles of tasks
    output: resource allocation of tasks by maximizing utility using dynamic penalty vectors
    Calculate initial penalty vector;
    while number of set-points of all tasks is more than 1 and resources are available do
        Create sorted slope list by merging all set-points of the tasks based on their slopes;
        Allocate set-point of highest slope/marginal utility;
        Eliminate the set-points of the task with the same or lower utilities;
        Recompute penalty vector based on the available resources;
        Update compound resources of the remaining set-points of the tasks;
    end
    Finalize resource allocations of the tasks;

Algorithm 2: Basic "AMRMD DP" algorithm
First, it creates the Ci lists. Without performing the concave majorant operation, it computes the marginal utility as the slope of the compound resource/utility curve. Next, it selects the point with the highest marginal utility to be allocated. If the allocation is successful, it updates the penalty vector. This step requires an update to the compound resource parameters of all tasks containing the remaining set-points. Thus, the set-points migrate from one location to another in the compound resource-utility space as the algorithm progresses. It repeats this procedure until all the set-points of all the tasks, or the resources, are exhausted.

Figure 2.5: AMRMD DP Algorithm (utility vs. compound resource: each update of the penalty vector shifts the remaining set-points, leaving fewer points as allocation proceeds)
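This loop can be sketched compactly. The code below is an illustrative reading of Algorithm 2, not the thesis implementation: the dynamic penalty here grows linearly with the fraction of each resource already consumed, and the marginal utility of an upgrade is priced on its incremental demand:

```python
import math

def compound(r, p):
    """Eq. (2.16) applied to an incremental resource demand."""
    return math.sqrt(sum((rk * pk) ** 2 for rk, pk in zip(r, p)))

def amrmd_dp(profiles, r_max):
    """Greedy sketch of the dynamic-penalty loop. Each profile is a list of
    (utility, resource-vector) set-points sorted by increasing utility."""
    remaining = list(r_max)
    level = [-1] * len(profiles)        # -1 means nothing allocated yet
    granted = [None] * len(profiles)    # resources currently held per task
    while True:
        # Penalty grows with the fraction of each resource already consumed.
        p = [(cap - rem) / cap + 1.0 for rem, cap in zip(remaining, r_max)]
        best, best_slope = None, 0.0
        for i, prof in enumerate(profiles):
            j = level[i] + 1
            if j >= len(prof):
                continue                # task already at its top set-point
            u_new, r_new = prof[j]
            u_old = prof[level[i]][0] if level[i] >= 0 else 0.0
            held = granted[i] or [0] * len(r_max)
            extra = [a - b for a, b in zip(r_new, held)]
            if any(e > rem for e, rem in zip(extra, remaining)):
                continue                # upgrade is infeasible
            slope = (u_new - u_old) / (compound(extra, p) or 1e-12)
            if slope > best_slope:
                best, best_slope = i, slope
        if best is None:
            break
        j = level[best] + 1
        held = granted[best] or [0] * len(r_max)
        extra = [a - b for a, b in zip(profiles[best][j][1], held)]
        remaining = [rem - e for rem, e in zip(remaining, extra)]
        granted[best] = list(profiles[best][j][1])
        level[best] = j
    return level

# Two tasks competing for a single 3-unit resource.
profiles = [[(0.5, (1,)), (0.8, (2,))], [(0.4, (1,))]]
print(amrmd_dp(profiles, (3,)))  # [1, 0]: both served, task 0 upgraded
```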
Complexity of AMRMD DP
The asymptotic computational complexity of AMRMD DP is as follows. The initial computation of the penalty vector takes O(nL) operations, where n is the number of tasks and L is the maximum number of set-points per task. Within the loop, the procedure for updating the compound resources takes O(nL) operations, the procedure for selecting a set-point takes O(nL) operations and the procedure for adjusting the penalty takes O(nL) operations. The loop can repeat nL times in the worst case. This yields a total complexity of O(n²L²). Therefore, this algorithm has a higher complexity than AMRMD1, whose complexity is O(nL log(nL)); however, unlike AMRMD1, AMRMD DP yields the optimal solutions for both the examples discussed in Section 2.3.2.
2.4.2 Co-mapping of Quality Points (AMRMD CM)
Figure 2.6: AMRMD CM Algorithm. (Utility vs. compound resource in three steps: (1) co-located points are gathered; (2) the K list is generated and the concave majorant is performed, eliminating some points; (3) the essential co-located points eliminated by the concave majorant are brought back.)
The AMRMD CM algorithm explicitly keeps track of co-located quality points, and performs both the penalty vector and concave majorant computations in ways that differ from AMRMD1.
Penalty Vector Computation
Similar to AMRMD1, the AMRMD CM algorithm also evaluates the penalty vector statically. However, its computation differs from that of AMRMD1. In AMRMD1, all the resource deployment options are added together to determine the potential resource demand. However, we know that only one out of multiple resource options needs to be selected for a task, and each resource option may not be equally likely to be selected. Therefore, we would like to include the likelihood of selection of a resource option while computing the penalty vector of resources.
Let us consider the likelihood of a particular resource trade-off being chosen. At a particular utility value, let us denote the resource vector of the jth trade-off by R⃗_j = (r_j1, . . . , r_jm), where r_jk, r_k^max and m denote the demand of the kth resource, the capacity of the kth resource and the number of resources in the resource vector, respectively. In this context, we define the following terms.

Definition 2.4.1 (Bottleneck Resource). At a given utility level, the kth resource is said to be the bottleneck resource of the resource vector corresponding to the jth trade-off if

    r_jk / r_k^max ≥ r_jl / r_l^max,  ∀ 1 ≤ l ≤ m.    (2.19)

Definition 2.4.2 (Bottleneck Factor). At a given utility level, for the jth trade-off, if the kth resource is the bottleneck resource, then the factor β_j = r_k^max / r_jk is defined as the Bottleneck Factor of the jth trade-off.

Definition 2.4.3 (Selection Factor). The selection factor of the jth trade-off at a fixed utility level is given by:

    ρ_j = β_j / Σ_{i=1}^{N_T} β_i,    (2.20)

where N_T denotes the number of elements along the trade-off dimension.
Using Definition 2.4.3, at a given utility level, multiple resource allocations are weighed
based on the values of their Selection Factors in order to evaluate their demands. For
example, if a task is allocable to 2 processors of unequal capacities, the selection factor of
the larger processor is higher. However, if the task is allocable to only one processor, the
selection factor of that processor is 1 while that of the other one is 0. We evaluate the
demand of resources by modifying Equation 2.15 as given by:
    r_i^sum = Σ_{all n tasks} Σ_{all L set-points} ρ_j r_i.    (2.21)
Next, we compute the penalty vector using Equation (2.14) and consequently derive
compound resource for the set-points using Equation (2.16).
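A minimal sketch of Definitions 2.4.1–2.4.3 is shown below: the bottleneck factor of each trade-off and the resulting selection factors. Variable names are illustrative, not from the dissertation.

```python
# Sketch of the bottleneck/selection-factor computation (Defs. 2.4.1-2.4.3).

def bottleneck_factor(rvec, capacity):
    """beta_j = r_k^max / r_jk, where k is the bottleneck resource
    (the resource with the highest demand-to-capacity ratio)."""
    k = max(range(len(rvec)), key=lambda i: rvec[i] / capacity[i])
    return capacity[k] / rvec[k]

def selection_factors(tradeoffs, capacity):
    """rho_j = beta_j / sum_i beta_i  (Equation 2.20)."""
    betas = [bottleneck_factor(rvec, capacity) for rvec in tradeoffs]
    total = sum(betas)
    return [b / total for b in betas]
```

For a task that needs 10 units on either of two processors of capacities 100 and 50, `selection_factors([[10, 0], [0, 10]], [100, 50])` gives (2/3, 1/3): the larger processor receives the higher selection factor, as in the example in the text.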
Concave Majorant Computation
Similar to AMRMD1, the AMRMD CM algorithm performs the concave majorant procedure to retain only the necessary set-points. However, unlike AMRMD1, it retains all the co-located
set-points which yield the same utility values with the same compound resource values but
with different resource vectors.
From the Cic lists, we create another compound resource list Ki by including only those elements of Cic that have distinct values of compound resource (h). In other words, if two or more elements in Cic have the same value³ for h but different resource vectors r, then they are mapped to a single element in the Ki list. This is called “co-mapping” of set-points. Each element in Ki also stores the indices of the corresponding elements in Cic.
AMRMD CM then performs the concave majorant operation on the Ki list instead of the
Cic list, and maintains the set of co-located points of the same utility if they lie on the
concave majorant. It attempts to allocate one of the co-located points of a task if a point is infeasible due to resource constraints, and allocation continues until all the points of all the tasks are exhausted.³

³In order to account for floating-point precision issues, we consider two points co-located when the fractional difference between their compound resource values is less than a small fraction ε, which is typically set to 0.1.

input : profiles of tasks
output: resource allocation of tasks by maximizing utility
Calculate initial penalty vector;
for iter = 0 to max_iter do    // max_iter is usually set to 3
    for all tasks i do
        Generate compound resource Cic for each task τi;
        Generate new list Ki where multiple co-located points in Cic are mapped to a single point in Ki;
        Perform concave majorant optimization [51] on Ki;
        Retain the corresponding set-points of Cic that map to the remaining set-points in Ki in terms of compound resources and utilities, and discard the rest;
    end
    Create slope list by merging set-points of all Cic's of all tasks based on their slopes;
    Go through the entire slope list and enter/update the resource allocation of the tasks;
    After the procedure is finished, update penalty vector from the usage of the individual resources;
    if the utility in the previous iteration differs from this utility by a small fraction ε then
        Break from the loop;
    end
end
Finalize resource allocations of the tasks;

Algorithm 3: Basic "AMRMD CM" algorithm
The process of forming a Ki list and the corresponding retrieval of the relevant co-
located points are illustrated in Figure 2.6. In Step (1), co-located points are gathered. In
Step (2), the concave majorant is determined. In Step (3), only points (and their co-located
points) along the concave majorant are used for making resource allocation decisions. The
procedure is briefly described in Algorithm 3.
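The co-mapping step can be sketched as follows. The point representation and the use of the previous entry as the comparison anchor are illustrative simplifications; the fractional-difference test with ε mirrors the footnoted tolerance.

```python
# Sketch of "co-mapping": elements of a C list whose compound-resource values
# differ by less than a fraction eps (and whose utilities match) are folded
# into one K-list element that remembers their indices.

def co_map(points, eps=0.1):
    """points: list of (h, u, rvec) tuples sorted by compound resource h.
    Returns the K list as (h, u, [indices of co-located points in `points`])."""
    k_list = []
    for idx, (h, u, _rvec) in enumerate(points):
        if k_list:
            h0, u0, members = k_list[-1]
            if u == u0 and abs(h - h0) <= eps * h0:
                members.append(idx)  # co-located with the previous entry
                continue
        k_list.append((h, u, [idx]))
    return k_list
```

Each K-list entry keeps the indices of its co-located set-points, so that after the concave majorant an alternative resource vector can be tried when the first choice is infeasible.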
Complexity of AMRMD CM
The asymptotic computational complexity of AMRMD CM can be obtained as follows. Let L = max_{i=1..n} |Qi| and L′ = max_{i=1..n} |Ci|. In other words, L is the maximum number of QoS levels and L′ is the maximum number of set-points, where a task may have multiple set-points at a particular QoS level. The procedures for creating the Ki list and co-mapping the set-points take O(nL′) operations, while the concave majorant operation takes O(nL log(L)) [51]. The merging operation takes O(nL′ log(n)). Therefore, the complexity of the algorithm is: O(nL′(1 + log(n))) + O(nL log(L)) = O(nL log(L)) + O(nL′ log(n)).
This is somewhat higher than that of AMRMD1 since L′ ≥ L, but much smaller than that of
AMRMD DP. In addition, AMRMD CM yields the optimal results for both the examples discussed
in Section 2.3.2, similar to AMRMD DP.
2.5 Large-scale Optimization Issues
Based on the above discussion, we shall use AMRMD CM as the algorithm for optimization in the rest of the dissertation. It exhibits a benign computational complexity (O(nL log(nL))) compared to the optimal algorithm, which is NP-hard. However, other problems arise when using this algorithm directly in a large-scale distributed system.
2.5.1 Set-Point Generation Complexity
The computational complexity of O(nL log(nL)) ignores the cost of generating the set-points from the various dimensions. In many applications, generating the different values of the operational dimensions can itself be of much higher complexity than the core optimization process. For example, in a networked system, determining all the possible paths between a source and a destination can be much more complex than finding a suitable path with an appropriate bandwidth by Q-RAM. This will be discussed in more detail in Chapter 4.
Figure 2.7: Q-RAM & Scheduler Admission Control. (Tasks supply QoS and resource specifications to the Q-RAM resource allocator, which issues resource reservation requests to the OS scheduler's admission control over the CPU, disk and network resources; success/failure responses determine the QoS and resource assignments for the tasks.)
If the task is highly configurable, i.e., it has a large number of possible values per dimension, we should generate only a few values of each dimension before performing the concave majorant, instead of exhaustively generating all possible values. This reduces the complexity of the concave majorant step, O(nL log(L)), by reducing L.
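One way to realize this pruning is sketched below: sample at most k values from each operational dimension before taking the Cartesian product. The even-spacing policy is an assumption; the dissertation does not prescribe a particular sampling rule.

```python
# Sketch: subsample each operational dimension to at most k values before
# enumerating set-points, so the concave majorant sees a much smaller L.
from itertools import product

def subsample(values, k):
    """Keep at most k roughly evenly spaced values from a sorted list."""
    if len(values) <= k:
        return list(values)
    step = (len(values) - 1) / (k - 1)
    return [values[round(i * step)] for i in range(k)]

def generate_set_points(dimensions, k):
    """Enumerate set-points over the subsampled dimensions only."""
    return list(product(*(subsample(dim, k) for dim in dimensions)))
```

Two dimensions with 100 and 50 candidate values would yield 5,000 exhaustive set-points, but `generate_set_points(dims, 3)` produces only 9, while still covering the extremes of each dimension.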
2.5.2 Core Algorithm Complexity
After reducing L, we would like to reduce the second part of the complexity that comes
from merging all set-points of all n tasks, which is O(nL log(n)). This is solved by dividing
the problem into smaller subproblems and solving these subproblems as independently as
possible. This includes the clustering of the tasks into a small number of groups and the
division of the entire distributed system into a number of small partitions. This reduces
both the complexity of the concave majorant and the merging operations. The technique
to perform this division varies depending on the type of the system.
Figure 2.8: Dynamic Q-RAM Optimization. (Requested tasks queue up for the Q-RAM resource allocator, which is driven by a reconfiguration clock; its output task settings pass through scheduler admission control, and admitted tasks join the reconfigurable task queue.)
2.5.3 QoS Optimization and Resource Scheduling
In our QoS optimization model, we assume a simple model of resources in which each resource is perfectly divisible among the tasks, either in a time-shared or a spatial manner. However, in the case of time-shared resources, in order to obtain real-time guarantees, we need to perform scheduler admission tests for tasks once the resources are allocated to them by Q-RAM. Moreover, the admission tests of multiple resources must be integrated with each other [29, 69]. The interaction between a resource scheduler and Q-RAM optimization is shown in Figure 2.7. Q-RAM can allocate resources more optimistically or conservatively depending on the utilization bounds assigned to the resources; the bound must be set less than or equal to 1.
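A minimal sketch of the utilization-bound check that follows a Q-RAM allocation on a time-shared resource is given below. The bound value 0.8 is illustrative; the text only requires that it not exceed 1.

```python
# Sketch of an admission check on one time-shared resource: the summed
# utilization of the reserved tasks must stay within the assigned bound.

def admit(reservations, bound=0.8):
    """reservations: (compute_time, period) pairs reserved on one resource.
    Accept only if the summed utilization stays within the bound."""
    utilization = sum(c / t for c, t in reservations)
    return utilization <= bound
```

Here `admit([(1, 10), (2, 10)])` succeeds at 30% utilization, while `admit([(5, 10), (4, 10)])` is rejected at 90%; lowering the bound makes Q-RAM correspondingly more conservative.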
In addition, in a dynamic scenario where the task set is not fixed and tasks are continuously arriving at and departing from the system, Q-RAM optimization needs to be performed either reactively or at regular intervals (known as the reconfiguration rate) as a background process. In this case, the arriving tasks form a queuing system with Q-RAM as the “server” [38]. Q-RAM accepts multiple newly arrived tasks, performs optimizations along with the existing
schedulable tasks, and finally produces the resource allocations of those tasks. In this
process, only a few out of all existing tasks need to be selected for optimization along with
the newly arrived tasks. The process is illustrated in Figure 2.8. The details of this dynamic
process are described in [38] along with experimental results.
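This batched reconfiguration cycle can be sketched as a toy loop. Here `optimize` stands in for the Q-RAM allocator, and all names and the batch size are illustrative.

```python
# Toy sketch of the dynamic setting: arriving tasks queue up, and at each
# reconfiguration tick Q-RAM re-optimizes a batch of them together with the
# already-admitted tasks.
from collections import deque

def reconfigure_loop(arrivals, optimize, batch=4):
    """Drain up to `batch` queued tasks per tick and re-optimize them with the
    current admitted set. Returns the final admitted set and the tick count."""
    queue = deque(arrivals)
    admitted = []
    ticks = 0
    while queue:
        fresh = [queue.popleft() for _ in range(min(batch, len(queue)))]
        admitted = optimize(admitted, fresh)  # new allocation for old + new
        ticks += 1
    return admitted, ticks
```

With a pass-through `optimize` that admits everything, ten arrivals processed in batches of four take three ticks; the achievable reconfiguration rate thus bounds how many tasks the allocator can handle per interval.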
Based on the above model, it must be noted that the scalability of Q-RAM optimization depends on how many tasks it can handle within a given reconfiguration interval. In the rest of this dissertation, we investigate improving the scalability of Q-RAM.
2.6 Chapter Summary
In this chapter, we developed a generic model of a distributed system consisting of multiple
resources and applications. We also presented our QoS model, which is based on Q-RAM.
In the context of Q-RAM, we have presented new QoS optimization algorithms that handle
resource trade-offs more efficiently in a multi-resource environment. Finally, we highlighted
the challenges involved in performing QoS-based resource allocation in large systems. In the
next chapter, we will discuss QoS-based resource allocation in large multiprocessor systems.
Chapter 3
Resource Allocation in
Multiprocessor Systems
3.1 Introduction
In this chapter, we present our approach to QoS-based resource allocation in a multiprocessor environment. The tasks are assumed to be independent of each other, i.e., there is no communication among the tasks, and they are indivisible. We also consider the fault-tolerance requirements of the tasks along with standard QoS requirements such as timeliness.
A typical multiprocessor system consists of multiple processors connected via a bus, as
shown in Figure 3.1. Typical examples of multiprocessor systems are present in distributed
embedded environments such as automotive systems, back-end processors in phased-array
radar and distributed server systems.
There are existing algorithms such as bin-packing [11, 18, 19, 17, 3, 8, 43, 42] and load-balancing [75] for deploying tasks with fixed resource requirements onto a fixed set of resources. There are also QoS-based resource allocation schemes, such as Q-RAM, that determine the QoS settings and associated resource allocations for tasks in a generic distributed system. However, these algorithms are not effective at performing resource allocation in multiprocessor systems that must integrate QoS requirements with the fault-tolerance requirements of applications. In this chapter, we address this problem with new algorithms that combine QoS optimization and fault-tolerance with resource selection.
Figure 3.1: Typical Multiprocessor System. (Processors P1–P4 connected via a bus with I/O.)
One other problem with existing QoS optimization algorithms is that they are not scalable to very large numbers of resources and tasks. We present a new hierarchical decomposition technique for solving very large optimization problems. The hierarchical technique divides the problem into smaller sub-problems and then solves these sub-problems individually. As we shall see, this leads to a reduction in execution time of two or more orders of magnitude.
3.2 Q-RAM in Multiprocessor Systems
In this chapter, we make the following four assumptions in our model of a multiprocessor system.

• A task has no specific bias or preference for any processor. In other words, a task can be deployed on any processor as long as there is space for it to be allocated.

• The number of QoS dimensions and the number of elements along any dimension are both small. We have limited our analysis to 2 or 3 QoS dimensions, each with only 2 or 3 discrete levels.
Table 3.1: Experimental Settings with Optimal Algorithm
    Number of QoS dimensions (q): 2
    Length of each dimension: 3
    Utilities for QoS dimension (u(q)): (0.5, 0.7, 0.8)
    Weight for each QoS dimension: random(0.00, 1.00)
    Minimum resource for each task: random(1, 3) units
    Resource increment for higher QoS: random(1, 2) units
    Number of processors: 5
    Resource amount per processor: 10 units
• Tasks do not communicate with one another. In other words, there are no communi-
cation bandwidth requirements among the tasks.
• A task is deployed to only one (or in the case of fault-tolerance, several) resource
(processor) from a pool of resources.
As discussed, Q-RAM has an algorithm called AMRMD1 that performs resource allocation for tasks in a multi-resource environment. In Chapter 2, we presented two algorithms, AMRMD CM and AMRMD DP, as modified versions of AMRMD1 designed to handle resource deployment trade-offs more efficiently. In this section, we evaluate the performance of these algorithms in multiprocessor systems. For a given number of task profiles and resources, our experiments focus on measuring the following performance metrics:
• the maximum number of tasks that can be admitted while satisfying the minimum
QoS requirements of all the admitted tasks,
• the utility obtained with the maximum number of admitted tasks when their minimum
QoS is 0, and
• the execution time of the algorithms.
Figure 3.2: Utility Variation. (Accrued utility vs. number of tasks, 2 to 52, for mrmd-optimal, amrmd1, amrmd_cm and amrmd_dp.)
3.2.1 Comparison with Optimal Algorithm
The first experiment compares all three AMRMD algorithms that are presented in Chapter 2,
along with the optimal exhaustive search algorithm called MRMD. The optimal algorithm is
presented in [49].
For convenience in running the exponentially complex optimal algorithm, a small multiprocessor system consisting of 5 processors is assumed. The assumed configurations of the tasks and of the system are presented in Table 3.1.
Figure 3.2 shows the variation in utility as the number of tasks is varied from 2 to 52. The result is averaged over 50 runs. It shows that AMRMD CM performed closest to the optimal MRMD scheme in terms of utility, with AMRMD1 the farthest.

Figure 3.3 shows the variation of the execution time of the algorithms. The results are plotted on a log scale, as the optimal solution runs approximately 30,000 times slower than AMRMD1. The execution time of AMRMD CM is approximately 1.5 times greater than that
Figure 3.3: Run-time Variation. (Run-times in μsec, log scale, vs. number of tasks for mrmd-optimal, amrmd1, amrmd_cm and amrmd_dp.)
of AMRMD1 as expected. The algorithm AMRMD DP has quadratic complexity and thus runs
slower by an order of magnitude compared to the other two AMRMD algorithms.
These results show that AMRMD CM yields utility values closest to those of the optimal
algorithm with somewhat higher execution times than those of AMRMD1.
3.2.2 Results for Larger Systems
In this experiment, we consider a system with 20 processors. The full experimental set-up
is given in Table 3.2. In the first case, we do not allocate the tasks with their minimum QoS
before performing the optimization. Instead, we compare the three algorithms in terms of the maximum number of tasks they can admit, where each task has non-zero QoS requirements that must be satisfied for admission into the system.
The results for the maximum number of tasks that can be admitted under each algo-
rithm, averaged over 100 randomly generated task configurations, are shown in Figure 3.4.
Figure 3.4: Number of Admitted Tasks (20 processors). (Bar chart of the number of admitted tasks for amrmd1, amrmd_dp and amrmd_cm.)
As can be seen, AMRMD CM is able to admit 6 times as many tasks as AMRMD1 and twice as many as AMRMD DP.
Figure 3.5 shows the standard deviation of the results for the 3 algorithms. We observed a very high relative standard deviation for AMRMD1, with AMRMD CM having the lowest. The reason for this behavior is the following. The AMRMD1 algorithm randomly selects one of any co-mapped resource deployment points for a QoS setting and discards the rest based on its own concave majorant operation. This random selection makes a significant difference in performance and contributes to the large standard deviation of AMRMD1. AMRMD DP, on the other hand, uses a better technique by evaluating the penalty vector dynamically at each resource allocation. However, it follows the same technique as AMRMD1 in discarding deployment options (trade-offs), resulting in a high standard deviation similar to that of AMRMD1. This shows that AMRMD CM is the most consistent and predictable: its results do not depend on randomness in the ordering of the input data.
In the next experiment, we vary the number of tasks and determine the utility accrued
under each algorithm. In this case, we assume all tasks have the zero minimum QoS
Figure 3.5: Percentage standard deviation (= 100 × (standard deviation)/mean) of the number of admitted tasks on 20 processors, for amrmd1, amrmd_dp and amrmd_cm.
requirements and thus all can be “admitted”. This assumption equalizes the algorithms in terms of their admission control characteristics and allows us to compare utility accrual. We plot the results, averaged over 100 randomly generated problems, in Figures 3.6 and 3.7. As AMRMD CM accommodates more resource options for each task, the results show higher utility for AMRMD CM than for AMRMD1. Although AMRMD1 yields utility fairly close to that of AMRMD CM here, this is again due to the randomized ordering of trade-off values and the distribution of utilities among tasks, and it can be made worse by choosing a different utility distribution for tasks. In particular, in the case of AMRMD1, a few tasks obtained very high utility values at the expense of many tasks remaining at the 0 utility level. In terms of execution times, AMRMD CM shows slightly higher execution time than AMRMD1, while AMRMD DP needs close to two orders of magnitude (or higher) more execution time than the other two.
Figure 3.6: Utility Variation of Three Algorithms in a System of 20 Processors. (Utility vs. number of tasks for amrmd, amrmd_cm and amrmd_dp.)
3.2.3 Results on Fault-tolerance
The notion of incorporating fault-tolerance in the QoS-based Resource Allocation Model
(Q-RAM) was explained in Section 2.2.6 of Chapter 2. We assume that fault-tolerance is
supported using replication on multiple processors. The higher the degree of replication, the
higher the utility obtained along the fault-tolerance dimension. In this experiment, tasks
are assumed to have fault-tolerance as the only QoS dimension for ease of comparison. Table 3.3 lists the experimental specifications.

Table 3.2: Settings for Second Experiment
    Number of QoS dimensions (q): 2
    Length of each dimension: 3
    Utilities for each quality dimension (u(q)): (0.5, 0.7, 0.8)
    Computational resource on each processor: 100
    Minimum resource for each task: random(1, 25)
    Resource increment for higher QoS: random(1, 10)
    Number of processors: 20

Figure 3.7: Run-time Variation (log-scale) of Three Algorithms in a System of 20 Processors. (Run-times in μsec vs. number of tasks for amrmd, amrmd_cm and amrmd_dp.)
The results shown in Figures 3.8 and 3.9 demonstrate that AMRMD CM outperforms the other two algorithms with respect to the number of tasks it admits and the utility it achieves for a fixed number of tasks. The number of tasks it typically admits varies between 2 and 6 times that of AMRMD1. Essentially, if tasks are allowed to be admitted with zero QoS (i.e., effectively “rejected”), AMRMD1 can maximize utility fairly well, depending on the utility values of the tasks. However, if all incoming tasks must be admitted at a non-zero QoS level, AMRMD1 performs poorly compared to AMRMD CM. In other words, AMRMD1 admits fewer tasks and thus provides a higher average utility value per task, whereas AMRMD CM admits more tasks and thus provides a lower average utility value per task. For certain utility values of tasks where a minimum QoS gives a very large marginal utility, the utility of the result of AMRMD1 can be made arbitrarily worse than that of AMRMD CM.
The results show an abundance of co-located set-points in the case of fault-tolerant multi-processor scheduling.
Table 3.3: Settings for Experiment on Fault-Tolerance
    Number of QoS dimensions: 1
    Number of copies: 1–2
    Number of quality indices: 2
    Utilities: (0.5, 0.7, 0.8)
Figure 3.8: Utility Variation under Fault-Tolerance. (Utility vs. number of tasks for amrmd1, amrmd_cm and amrmd_dp.)
3.3 Hierarchical Q-RAM in Multiprocessor System
None of the algorithms we have presented so far scales well when there is a large number of resources. For example, in multi-dimensional radar systems, 64 or more processing nodes are common. Under these conditions, all three algorithms can consume large amounts of computation time and memory, making them usable only offline. This scalability bottleneck
Figure 3.9: Number of Admitted Tasks (20 processors) under Fault-Tolerance. (Bar chart, two copies on 20 processors, for amrmd1, amrmd_cm and amrmd_dp.)
Figure 3.10: Run-time Variation (log-scale). (Execution time in μsec vs. number of tasks for the fault-tolerant case: amrmd1, amrmd_cm and amrmd_dp.)
arises because the factor L¹ increases proportionally with the number of resources in the system. This happens because we enumerate all possible allocations of each task on every resource unit (processor), even though a task is allocated to only one resource unit out of many, unless fault-tolerance is required. Even under fault-tolerance, the number of resource units assigned equals the number of replicas needed for a task. Thus, we propose a hierarchical Q-RAM approach in which we partition the problem into smaller sub-problems, each dealing with a smaller number of resources.
Before we discuss the details of our approach, we provide the following definitions and
then state a theorem based on theorems of constrained extrema in linear programming [83].
Definition 3.3.1 (Task Profile). The profile or type of a task is defined by its set-points
containing different values of its operational and environmental dimensions and the associ-
ated utility values.
Definition 3.3.2 (Identical Tasks). Two tasks are said to be identical if they are of the
same type, i.e., they have identical task profiles.
Definition 3.3.3 (Utility Function). The utility function of a task describes the variation of its utility relative to its allocated resource(s).
Figure 3.11: A Typical Continuous Utility Function. (Utility f(r) plotted against resource r.)

¹L denotes the maximum number of set-points per task, as mentioned in the complexity analysis in Section 2.4 of Chapter 2.
Theorem 3.3.4 (Resource Distribution of Identical Tasks). If a resource has to be distributed among identical tasks with a continuous, monotonically increasing, concave utility function, the total utility is maximized when each task is allocated an equal amount of the resource.
Proof. The proof of this theorem follows directly from the theorems of constrained extrema in linear programming [83], as a special case of the Karush–Kuhn–Tucker theorem [64].
Corollary 3.3.5. If a resource has to be allocated among tasks of a fixed number of types with continuous concave utility functions, the maximum utility is obtained when the same resource amount is allocated to all tasks of the same type.
Proof. This corollary can also be derived from the Karush–Kuhn–Tucker theorem.
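A quick numeric illustration of Theorem 3.3.4, using sqrt as an example of a continuous, monotonically increasing, concave utility function (the choice of sqrt is ours, not the dissertation's):

```python
# Splitting 10 resource units equally between two identical tasks with a
# concave utility f(r) = sqrt(r) beats any unequal split.
import math

def total_utility(split, f=math.sqrt):
    """Total utility of identical tasks receiving the given resource shares."""
    return sum(f(r) for r in split)

equal = total_utility([5.0, 5.0])            # 10 units split equally
for split in ([4.0, 6.0], [1.0, 9.0], [0.0, 10.0]):
    assert total_utility(split) < equal      # every unequal split does worse
```

For instance, sqrt(5) + sqrt(5) ≈ 4.47 exceeds sqrt(4) + sqrt(6) ≈ 4.45; concavity makes the gap grow as the split becomes more lopsided.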
Corollary 3.3.5 guides us in designing an efficient QoS allocation scheme when tasks can be classified into a finite set of categories. In our case, tasks do not have continuous utility functions. Instead, each has a few discrete set-points corresponding to a set of discrete utility–resource pairs. Lee et al. [49] derived the bound on the obtained global utility relative to the optimal utility when the Karush–Kuhn–Tucker theorem is applied in such cases.
3.3.1 Hierarchical Q-RAM Algorithm
In order to reduce the complexity of Q-RAM optimization, we employ a divide-and-conquer technique. In other words, we would like to divide the problem into identical subproblems and solve these subproblems independently. Each subproblem is considered a cluster, and each cluster contains an equal number of resources. Hence, the total number of resources in the system must be an integral multiple of the number of clusters created.
Figure 3.12: Initial Slope of a Task. (Utility vs. resource; the initial slope is Y/X, the utility Y over the resource requirement X at the minimum non-zero QoS level.)
Next, we assume that there are only a small number of types of tasks. We would like to
allocate the computing resource equally to all tasks of the same type. Hence, we distribute
tasks to clusters where each cluster contains identical numbers of tasks of the same type.
If the number of tasks of a particular type is not an integral multiple of the number of
clusters, we will have a few residual tasks of that type which cannot be distributed equally
among the clusters. We keep those tasks temporarily un-allocated.
We then sort the un-allocated tasks in decreasing order of the initial slopes of their utility functions. As shown in Figure 3.12, the initial slope of a task is the ratio of the utility to the resource requirement at its minimum non-zero QoS level. In other words, we prioritize these tasks by their initial marginal utility values. We sequentially choose a task from the list and allocate it to the cluster least populated with tasks of its type. Finally, we perform resource allocation within each cluster by executing the AMRMD CM algorithm for each of them individually.
For example, let us consider a multiprocessor system of R = l × P processors. If we divide the system into P clusters p1, ..., pP, each cluster will contain R/P = l resources. Let us assume that we have 2 types of tasks, a and b, and that the numbers of tasks of these types are na and nb respectively. Hence, each cluster also obtains ⌊na/P⌋ tasks of type a and ⌊nb/P⌋ tasks of type b. The numbers of un-allocated tasks are
Figure 3.13: Hierarchical QoS Optimization with Clustering. (Tasks of types a and b are distributed evenly across the single-processor clusters.)
(na − P⌊na/P⌋) and (nb − P⌊nb/P⌋) for types a and b respectively.
Based on task profiles, let us assume that the initial slope of a task of type a is higher
than that of type b. Therefore, we allocate the remaining tasks of type a among the clusters
first in a load-balancing manner followed by the remaining tasks of type b. In this way, we
approximately divide the system into P near-identical subsystems. Figure 3.13 illustrates
the process where each cluster contains a single processor i.e., l = 1. Algorithm 4 details
the whole procedure.
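The distribution step of this procedure can be sketched as follows. The task tuples (type, id) and the slope table are illustrative, and the sketch balances residuals on overall cluster size for brevity, whereas the dissertation balances on the population of the same type.

```python
# Sketch of the H-Q-RAM task-distribution step: tasks of each type are split
# evenly across P clusters; residual tasks are ordered by initial slope
# (utility / resource at the minimum non-zero QoS level) and placed,
# highest slope first, on the least-populated cluster.

def distribute(tasks_by_type, initial_slope, P):
    """tasks_by_type: {type: [(type, id), ...]}; initial_slope: {type: float}.
    Returns P clusters, each a list of tasks."""
    clusters = [[] for _ in range(P)]
    leftovers = []
    for ttype, tasks in tasks_by_type.items():
        share = len(tasks) // P
        for c in range(P):                      # equal share per cluster
            clusters[c].extend(tasks[c * share:(c + 1) * share])
        leftovers.extend(tasks[P * share:])     # residual tasks of this type
    leftovers.sort(key=lambda t: initial_slope[t[0]], reverse=True)
    for task in leftovers:                      # highest initial slope first
        min(clusters, key=len).append(task)
    return clusters
```

With 5 tasks of type a and 3 of type b over P = 2 clusters, each cluster receives 2 a's and 1 b directly, and the two residual tasks are spread so both clusters end up with 4 tasks.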
Complexity Analysis of H-Q-RAM
First, let us estimate the complexity of AMRMD CM in a multiprocessor system. Let |Qm| denote the maximum number of QoS settings of a task and R the number of processors. If
input : tasks of a fixed number of types and a multiprocessor system
output: QoS assignment and resource allocation of tasks
Cluster the resources/processors into P groups;
Divide the tasks of each type into P identical groups; if the number of tasks of a type is not an integral multiple of P, keep the remaining tasks un-allocated;
Assign each of the P identical groups of tasks to a distinct resource cluster;
for all remaining tasks do
    Perform concave majorant;
    Order by the initial slope of the utility–resource curve;
end
for all sorted remaining tasks do
    Choose the task with the highest initial slope and allocate it to the cluster least populated with tasks of the same type;
end
for all processor groups do
    Run the AMRMD algorithm for QoS optimization;
    Run the selected algorithm only once for multiple identical groups of tasks, applying the result directly to the subsequent identical groups;
end

Algorithm 4: Hierarchical Q-RAM Optimization for Multiprocessor System
tasks do not have any fault-tolerance requirements, the maximum number of set-points of
a task is given by L = |Qm|R. Using the expression of complexity obtained for the basic
AMRMD CM in Chapter 2, the complexity of the Q-RAM optimization in a multiprocessor
system is O(nL log(nL)) = O(n|Qm|R log(n|Qm|R)), where n is the total number of tasks.
For H-Q-RAM, if we divide the system into P clusters, the maximum number of set-points of a task within a cluster is given by L = |Qm|R/P. Each cluster contains ⌈n/P⌉ tasks. Hence the complexity of the optimization for each cluster is O(⌈n/P⌉ |Qm| (R/P) log(⌈n/P⌉ |Qm| (R/P))). If we run the operation on a single processor, the total complexity is O(⌈n/P⌉ |Qm| R log(⌈n/P⌉ |Qm| (R/P))), but since the optimization per cluster can be performed in parallel, the total complexity is reduced to the complexity of a single cluster.
As can be seen from the expressions, the complexity of H-Q-RAM is reduced by a factor of P^2 log(P) compared to that of Q-RAM, where P is the number of clusters.
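As a quick numerical illustration of these expressions (a sketch of ours; the parameter values are arbitrary), the estimated cost of the flat optimization can be compared with the per-cluster cost of H-Q-RAM:

```python
import math

def qram_cost(n, qm, r):
    """Flat Q-RAM: O(nL log(nL)) with L = |Qm| * R set-points per task."""
    L = qm * r
    return n * L * math.log(n * L)

def hqram_cluster_cost(n, qm, r, p):
    """H-Q-RAM: each of the p clusters holds ceil(n/p) tasks with
    L = |Qm| * R / p set-points; clusters are optimized in parallel,
    so the parallel cost is that of a single cluster."""
    nc = math.ceil(n / p)
    L = qm * r / p
    return nc * L * math.log(nc * L)

n, qm, r = 256, 10, 32
speedups = {p: qram_cost(n, qm, r) / hqram_cluster_cost(n, qm, r, p)
            for p in (2, 4, 8, 16)}
```

For these parameters the speedup grows somewhat faster than P^2, consistent with the factor claimed above.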
Number of task types:        8 (0,1,2,3,4,5,6,7)
Type of a task:              random(0,7)
Utilities on QoS dimension:  [0.5, 0.7, 0.8] / [0.4, 0.6, 0.65]
Minimum resource:            24 − 2·Type
Resource increment:          random(16 − 2·Type, 20 − 2·Type)
Distribution of task types:  12.5% each on average
Number of processors:        32
Number of groups formed:     1, 2, 4, 8 and 16
Table 3.4: Experimental Specifications (H-Q-RAM)
3.4 Performance Evaluation: H-Q-RAM
In this section, we evaluate the scalability of H-Q-RAM. In this process, we primarily
measure the variation of execution time of H-Q-RAM with respect to the number of clusters
and the corresponding performance in terms of the accrued utility2.
We assume the presence of 8 types of tasks in the system. Each task is independently,
randomly assigned a task type out of these 8 different types. The specifications of the tasks
are detailed in Table 3.4. We assume a multiprocessor system consisting of 32 processors.
Hence, we can create 1, 2, 4, 8 and 16 possible clusters under 5 different configurations and
each cluster has 32, 16, 8, 4 and 2 processors in those 5 configurations respectively. Having
a single cluster of all 32 processors is equivalent to the basic Q-RAM algorithm.
3.4.1 Multi-processor Resource Allocation
This experiment deals with the tasks that have no Fault-tolerant QoS specification. We
measure the maximum number of tasks admitted under each of the 5 configurations and
plot them in Figure 3.14. The result is the average over 100 randomly generated task-sets.
The results in Figure 3.14 show that the Hierarchical AMRMD (H-AMRMD) under each
grouping was able to admit nearly the same number of tasks as the non-hierarchical AMRMD.
The maximum drop in the number of admitted tasks was only 5%, from 291 in the non-hierarchical version to 276 in the hierarchical version with 16 groups.

2H-Q-RAM reduces to the basic Q-RAM when there is only one cluster.

Figure 3.14: Number of Tasks (32 processors) [plot of the maximum number of tasks admitted (0 to 300) against the number of groups formed: 1, 2, 4, 8 and 16]
Next, we keep the number of tasks constant at 276, the maximum that can be admitted with 16 groups, and measure the execution time under each case. The results
averaged over 100 iterations are presented in Figure 3.15, which shows a very sharp drop in execution
time as the number of groups is increased. For example, the execution time for non-
hierarchical AMRMD is 73 times that of hierarchical AMRMD with groups of 16.
Next, we vary the number of input tasks from 16 to 256 and plot the utility and execution
time of the algorithms against the number of tasks in Figures 3.16 and 3.17. The results
show a negligible difference in utilities for a fixed number of tasks. It also shows a huge
drop in execution time (73 times for 276 tasks) as the number of groups is increased from
1 to 16.
3.4.2 Fault-tolerance and Hierarchical Q-RAM
In this experiment, we consider tasks having fault-tolerance as the only QoS dimension.
The specifications for the fault-tolerance are the same as presented in Table 3.3 of the non-hierarchical case. Other configuration parameters are maintained to be the same as those of the previous experiments.

Figure 3.15: Run-time (276 tasks) [plot of the time taken to run (usec, 0 to 9 × 10^5) against the number of groups formed: 1, 2, 4, 8 and 16]
As observed from Figures 3.18 and 3.19, a relatively smaller number of tasks (82, down from 112)
are admitted with 16 groups. This is because each group of 16 has only 2 processors, and the
maximum number of replicas for the fault-tolerance is 2, thereby significantly reducing the
number of possible trade-off options. On the other hand, we observe a 37,000-fold reduction
in the execution time as the grouping is increased from 1 to 16 (Figure 3.19). In addition,
all of the groupings produce a near-identical utility curve when we vary the number of tasks
from 16 to 76, as shown in Figure 3.21.
In summary, H-Q-RAM obtains a near-optimal system utility while reducing execution
time by two orders of magnitude or more.
3.5 Chapter Summary
In this chapter, we investigated extensions to Q-RAM to apply it to multi-processor systems.
We showed that AMRMD CM was able to admit more tasks and achieve larger global utility values compared to the basic algorithm AMRMD1 with only a small increase in the execution time. A similar pattern was observed when we used reliability as a QoS dimension.

Figure 3.16: Utility Variation (max 256 tasks) [plot of the utility accrued (0 to 7000) against the number of tasks (0 to 300) for 1, 2, 4, 8 and 16 groups]
Unfortunately, algorithms AMRMD1, AMRMD CM and AMRMD DP take too long to run when
allocating resources on large multi-processor systems. For example, a radar tracking system
may consist of a bank of 64 or more processors for signal processing tasks. It would take
around 5s to perform the resource allocation under AMRMD CM, which may be unacceptably
long.
We then presented a hierarchical decomposition approach for applying our QoS opti-
mization algorithms to such systems. In this approach, we divided the system into multiple
smaller identical subsystems and uniformly distributed tasks into those subsystems. Then,
we performed QoS optimization on each of these subsystems independently. We showed that
this hierarchical approach significantly reduced the execution time for all of the algorithms.
In particular, the resource allocation problem involving fault tolerance as a QoS dimension becomes feasible as a result of our hierarchical approach, since it reduces the execution time by 5 orders of magnitude for a system of 32 processors. This difference increases with the size of the system.

Figure 3.17: Run-time plot with grouping for 32 processors (max 256 tasks) [run-time in usec (0 to 5 × 10^7) against the number of tasks (0 to 300) for 1, 2, 4, 8 and 16 groups]

Figure 3.18: Number of Tasks under Fault-Tolerance [maximum number of tasks admitted (0 to 120) against the number of groups formed: 1, 2, 4, 8 and 16]

Figure 3.19: Run-time (log-scale) under Fault-Tolerance [time taken to run in usec, log scale (10^2 to 10^8), against the number of groups formed, for 76 tasks]

Figure 3.20: Run-time plot in log-scale with grouping for 32 processors under fault-tolerance (max 76 tasks) [execution time in usec, log scale, against the number of tasks for 1, 2, 4, 8 and 16 groups]

Figure 3.21: Utility Plot under Fault-Tolerance (max 76 tasks) [utility against the number of tasks for 1, 2, 4, 8 and 16 groups]
In the next chapter, we consider the extension of the first assumption where a task has
constraints in selecting resource trade-offs. For example, if a task needs a route between a
source and destination in a network, the selection of the links (as resources) is not arbitrary
and is dependent on the topology of the network.
Chapter 4
Resource Allocation in Networks
4.1 Introduction
In this chapter, we discuss QoS optimization in distributed networked environments. Apart
from the Internet, examples of distributed networked systems include sensor networks, au-
tonomous systems and overlay networks. In order to provide QoS to tasks executing on these
systems, we need to guarantee the allocation and scheduling of resources. The resources
include computational cycles, storage and network bandwidth across a route between the
source and the destination. For example, a typical video transmission application requires
a certain amount of network bandwidth and CPU cycles from various network links and
routers respectively. Higher quality in terms of its frame rates and resolutions requires a
greater quantity of these resources.
For a large number of tasks to be deployed on a system consisting of a large number
of resources, we designed a hierarchical scheme in Chapter 3 that provides near-optimal
resource allocation in a scalable manner. The hierarchical technique divides the problem
into smaller independent sub-problems. Specifically, it divides the system into identical
subsystems, assigns tasks to these subsystems in an equitable fashion so that each subsystem
obtains a (nearly) identical number of tasks of the same type, and then makes resource
allocation decisions within each subsystem independently. Implementing this scheme on a
networked system, however, presents two major difficulties. First, it is difficult to divide a
networked system into a number of identical subsystems if the architecture is heterogeneous
(even if it is hierarchical). Secondly and most importantly, it is not possible to isolate the
subsystems in the network. This is because the route of a task can potentially span a very
large number of links and routers over the entire network. If we consider each network sub-
domain as a subsystem, many tasks can have routes across multiple sub-domains and thus
the resource allocation in one subsystem may be dependent on that obtained in another
and vice versa. Hence, multiple subsystems need to negotiate with each other in order to
determine near-optimal resource allocations.
4.1.1 Our Contribution
In the context of network QoS, we make our contribution in network bandwidth allocation
and route selection. However, our model differs from traditional network QoS models in two fundamental ways. First, instead of
specifying a single QoS requirement, our Q-RAM-based QoS model allows a task/flow to
specify multiple levels of bandwidth and delay requirements for different levels of service.
Second, our resource allocation scheme determines the allocation of a near-optimal route
and a near-optimal network bandwidth along the route for each flow. The scheme relies on a
signaling protocol such as RSVP and packet scheduling policies across the network in order
to satisfy the network bandwidth reservations. In addition, as we will discuss later in this
chapter, it can also exploit the existing routing protocols to perform efficient optimization.
4.2 Modeling of Networked System
In this section, we describe our model of a distributed networked system. We first briefly
describe our generic resource allocation model based on Q-RAM. Next, we introduce a
graph-theoretical model of the network and demonstrate how to formulate and solve the
network QoS optimization problem in Q-RAM.
4.2.1 Network Model and QoS
We assume that the network is a distributed system consisting of multiple resources where
each resource corresponds to the link capacity in terms of the available bandwidth of the
link1. We consider a set of tasks that involve the transfer of data from one node in the
network to another. Each task has a set of QoS set-points in terms of bandwidth and
delay requirements. In addition, there is a utility associated with each of its set-points. In
general, a higher bandwidth provides higher quality and hence higher utility for a task. If
a network is modeled as an undirected graph, these tasks can be modeled as flows across
the graph with variable capacity requirements.
Q-RAM optimization in a network works as follows. Using the edges of the graph as
network links with a certain amount of bandwidth R, we construct a resource capacity
vector ~R = (R1, . . . , Rm), where m is the total number of weighted edges of the graph and Ri
is the bandwidth of the ith edge. We enumerate the operational dimensions of each task as
follows.
Set of bandwidth settings
The set of bandwidth settings of a task τi is given by:

Bi = {bi1, · · · , biNBi}, (4.1)

where NBi = the number of possible bandwidth settings for task τi. The bandwidth maps
directly to the resource requirement on the network link.
1It is relatively straightforward to extend our formulation to include processing resources but we do notdo so for simplicity of presentation.
70 Chapter 4. Resource Allocation in Networks
Set of delay settings
The set of delay settings of τi is given by:

Di = {di1, · · · , diNDi}, (4.2)

where NDi = the number of delay levels for τi.
The network delay encountered by a flow is dependent on the value of total bandwidth
(or speed) of the network link(s) used. It is expressed as the sum of three components:
(1) circuit delay (propagation delay of 1 bit), (2) transmission delay, and (3) switching
delay [68]. In our model, for simplicity, we assume that the circuit delay is much smaller
compared to the other two factors and much smaller than the minimum delay requirements
of the applications. The transmission delay is the manifestation of bandwidth capacities
of the links along a route. In other words, it is expressed as the sum of the transmission
delays of a single packet across each link. Finally, the switching delay is the sum of the
queueing delay and the processing delay at each node of the route. Assuming the node has
enough computing power, the queuing delay is a more dominant factor than the processing
delay. This, in turn, depends on the scheduling policy of the packet scheduler on the node.
Since our QoS model deals with resource allocation that separates it from the scheduling
concern at the lower level, we only need to consider the bandwidth of the links for our
model. We assume that once the bandwidth has been allocated, the router will have enough
processing cycles to process the packets between its incoming and outgoing links, and its
lower level packet scheduler can schedule the packets appropriately so that each flow meets
its deadlines2.
In conclusion, assuming that the routers can provide scheduling guarantees to meet the
deadlines of the packets, the delay encountered by a flow is simply expressed as the sum
of the transmission times along all links in the route. In this case, we can also add an
2A lot of work on packet scheduling has been done in the past, with varying degrees of schedulable utilization bounds on the routers [79, 39, 28].
estimated queueing delay time along each hop in the route. Having this constraint will
prevent the QoS optimizer Q-RAM from choosing an excessively long route. However, the delay
must be managed by a proper packet scheduling scheme once the bandwidth is allocated to
each flow or task.
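Under these assumptions, the delay of a candidate route is just the sum of per-link transmission times, optionally padded with an estimated queueing delay per hop. A minimal sketch (function names and numbers are ours, for illustration only):

```python
def route_delay(link_bandwidths_bps, packet_bits, per_hop_queueing_s=0.0):
    """End-to-end delay = sum of transmission delays along the route
    plus an estimated queueing delay at each hop; circuit (propagation)
    delay is ignored, as in the model above."""
    transmission = sum(packet_bits / b for b in link_bandwidths_bps)
    return transmission + per_hop_queueing_s * len(link_bandwidths_bps)

# a 3-link route carrying 12000-bit packets, 1 ms estimated queueing per hop
delay = route_delay([10e6, 5e6, 10e6], 12000, per_hop_queueing_s=0.001)
```

A set-point's delay requirement can then be checked against this value when routes are enumerated, steering the optimizer away from long routes.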
Set of routes

The set of routes of a task τi is given by:

Pi = {pi1, · · · , piNPi}. (4.3)
For a connected graph, we always have |Pi| ≥ 1.
The procedure for determining all the routes for a fixed source-destination (S-D) pair is
described in Algorithm 5. This is similar to the basic broadcast route discovery except that
all possible routes are discovered in this case. First, the source node broadcasts its route
request to the destination to its neighboring nodes. Each neighboring node, upon receiving
the request, constructs a temporary route, and forwards that route along with the original
request to all of its neighbors other than the sender of the request (in this case, it was the
source node). Each intermediate node copies that route, creates a new route adding itself
and sends it to its other neighbors. An intermediate node prevents cycles by not forwarding the request to a neighbor that is already included in the temporary route being copied. This process continues recursively until a node has no neighbor left to forward the request to, or the destination node is reached.
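The flooding procedure above amounts to enumerating all cycle-free (simple) routes between the source and the destination. A compact recursive sketch in Python (the adjacency-list representation and names are ours):

```python
def all_routes(adj, src, dst):
    """Enumerate every cycle-free route from src to dst, mirroring the
    broadcast discovery: each step forwards the pending route to all
    neighbors not already on it."""
    routes = []

    def forward(node, pending):
        for nxt in adj[node]:
            if nxt in pending:          # cycle prevention
                continue
            if nxt == dst:              # destination discovered
                routes.append(pending + [nxt])
            else:
                forward(nxt, pending + [nxt])

    forward(src, [src])
    return routes

# small test graph: two disjoint routes from A to D
adj = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
paths = all_routes(adj, "A", "D")
```

The number of simple routes grows exponentially with the size of the graph, which is precisely what motivates the selective routing techniques later in this chapter.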
input : Source vertex S, Destination vertex D, current vertex I
// I = S when the algorithm is called for the first time
output: Set of routes connecting S and D
// p = pending/incomplete route under consideration
// Vp = set of vertices for p
for all edges ei leaving I do
    r ← 0; // accept the link by default
    if the next vertex N of the edge already belongs to the pending route then
        r ← 1;
    end
    if r ≠ 1 then
        if N = D then
            A route is constructed;
            Insert the route into the list of routes;
            Update the routing table entries of the vertices falling on this route;
        else
            I ← N;
            Call this algorithm with (S, D, N) as inputs; // recursion
        end
    end
end

Algorithm 5: Basic Route Discovery Algorithm

Basic Q-RAM Algorithm

By combining Equations (4.1), (4.2) and (4.3), we obtain the set-points of the tasks as Si = Bi × Di × Pi. The utility of a set-point is obtained from the QoS dimensions as Bi → u, while the corresponding resource requirements are obtained as Bi × Di × Pi → R. Thus a set-point is represented by ⟨qj, uj, (rj1, . . . , rjm), hj⟩, where
qj = Quality level,
uj = Utility level,
(rj1 , . . . , rjm) = resource vector representing resource requirement at each edge of the
system, and
hj = compound resource describing a cost of allocating the resource.
The procedure is detailed in Algorithm 6.
input : profiles of tasks with bandwidths and network routes
output: route and bandwidth allocation of tasks by maximizing utility
for each task i = 1 to n do
    Determine the QoS points as bandwidths Bi;
    Determine Pi, the set of resource options, using Algorithm 5;
    Generate the set-points Si = Bi × Di × Pi for τi and map them to resource requirements Si → R in terms of link bandwidths;
    Determine the "compound resource" as a scalar cost metric for each set-point;
    Determine the concave majorant of the set-points based on their (compound resource, utility) values and the corresponding gradients;
end
Merge the set-points of the n tasks in decreasing order of their gradients and perform a global resource allocation starting with the point of highest gradient;

Algorithm 6: Basic Global QoS Optimization For Networks
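The concave-majorant and gradient-ordered allocation steps of Algorithm 6 can be sketched as follows (an illustrative Python sketch of ours: set-points are (compound resource, utility) pairs with distinct costs, and the budget is spent fractionally for simplicity, whereas Q-RAM allocates discrete set-points):

```python
def concave_majorant(points):
    """Upper hull of (compound-resource cost, utility) set-points,
    sorted by cost; keeps only points on the concave envelope."""
    hull = []
    for p in sorted(points):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # drop hull[-1] if it lies on or below the chord hull[-2] -> p
            if (y2 - y1) * (p[0] - x2) <= (p[1] - y2) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def greedy_allocate(tasks, budget):
    """Merge the majorant segments of all tasks by decreasing gradient
    and spend the compound-resource budget on the steepest first.
    Along each task's majorant the slopes decrease, so a global sort
    never allocates a later segment before an earlier one."""
    segs = []
    for tid, pts in enumerate(tasks):
        h = concave_majorant(pts)
        for (x1, y1), (x2, y2) in zip(h, h[1:]):
            segs.append(((y2 - y1) / (x2 - x1), x2 - x1, tid))
    segs.sort(reverse=True)                 # highest gradient first
    utility = 0.0
    for slope, dx, tid in segs:
        if budget <= 0:
            break
        take = min(dx, budget)
        utility += slope * take
        budget -= take
    return utility

tasks = [[(0, 0), (1, 5), (2, 6)], [(0, 0), (1, 3), (2, 8)]]
total = greedy_allocate(tasks, budget=2.0)
```

In the second task the point (1, 3) falls below the chord from (0, 0) to (2, 8), so the majorant discards it, exactly as the concave majorant step of Algorithm 6 prescribes.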
Algorithm 6 is the most direct way of solving the problem of network bandwidth allo-
cation in Q-RAM. However, there are two main drawbacks to this approach.
First, it requires each task to enumerate all of its set-points, which, in turn, requires
them to determine all possible routes Pi between the source and destination. As the size
of the network increases, |Pi| increases exponentially, and the complexity of the whole
route discovery process dominates the complexity of the optimization, making the process
intractable for large networks. Therefore, we must use an efficient route discovery technique
that can exploit the architecture of the network, namely hierarchical route discovery [34][53].
Second, suppose for the sake of simplicity that each task has a small set (e.g., ≤ 10) of QoS levels. Even in this case, since Pi is the enumerated list of all routes between
two nodes in the network, it can potentially be very large. Therefore, we must select a few
routes to make the problem tractable. The challenge is to pick these few routes such that
the resulting utility is close to what would be achieved if the exhaustive lists of routes were
considered.
4.3 Hierarchical Network Architecture
In this section, we first formulate the hierarchical network problem using Graph-theoretical
techniques. Next, we describe how this formulation can be used in decomposing our opti-
mization process.
4.3.1 Graph-Theoretical Representation
We follow the description of the hierarchical network model as presented for the Internet
[10, 34, 81]. The ATM forum also adopts a hierarchical architecture for their network [14].
The entire network is represented as a connected undirected graph G = (V,E) as shown
in Figure 4.1, where V denotes the set of vertices and E denotes the set of edges.
The nodes or vertices of a graph represent switches, and the edges represent links. The
bandwidth across each link ej is expressed as the capacity cj of an edge in the graph. If the
network is hierarchically organized, Gp represents the network architecture at a particular
layer p.
The nodes get clustered to form the graph of the next layer. The nodes of the same layer
that are clustered into the same higher layer are said to belong to the same peer group [14].
At a particular layer, a set of edges partition the graph into multiple induced subgraphs,
whose vertices form peer groups. This set of edges defines the edges of the graph at the
next higher layer. We call these edges backbone-edges. If two subgraphs are connected by a
single edge, their connecting backbone-edge becomes a cut-edge of the graph.
Figure 4.1: Hierarchical Graph Model of Network [(a) Layer 1 architecture, showing Layer 1 and Layer 2 backbone-edges; (b) Layer 2 architecture, with Layer 2 vertices / Layer 1 supervertices; (c) Layer 3, with Layer 3 vertices / Layer 2 supervertices]

If we collapse all the vertices and edges of a subgraph Gi of G into a single vertex, it is called a supervertex. Thus the graph at a higher layer is the supervertex graph of that of
the next lower layer. This layered architecture is illustrated in Figure 4.1. Expanding each
supervertex at any layer reveals the entire network of nodes in that subgraph at the lower
layer.
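Forming the supervertex graph is the standard quotient-graph construction: contract each peer group to one vertex and keep only the edges that cross groups. A small sketch (the `membership` map and names are ours, for illustration):

```python
def supervertex_graph(edges, membership):
    """Collapse each peer group into one supervertex; edges whose
    endpoints fall in different groups survive as backbone-edges of
    the next-higher layer."""
    super_edges = set()
    for u, v in edges:
        gu, gv = membership[u], membership[v]
        if gu != gv:                      # backbone-edge survives
            super_edges.add((min(gu, gv), max(gu, gv)))
    return super_edges

# peer groups 0, 1 and 2; edges (b,c), (b,d) and (d,e) cross groups
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("b", "d")]
membership = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}
se = supervertex_graph(edges, membership)
```

Applying the function repeatedly, with a coarser membership map at each layer, yields the layered architecture of Figure 4.1.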
Let us consider a task that sends data from a source node x to a destination node y. We
define PG(x, y) to be the set of all possible routes from x to y. For a connected graph, we
have |PG(x, y)| ≥ 1. Let us also define pG(x, y) ∈ PG(x, y) as a particular route from x to
y. This is formed by concatenating a set of edges that connect x and y. This includes the
edges inside multiple sub-graphs and the backbone edges connecting them. Let us assume
that Vx and Vy are sets of vertices of two subgraphs of G such that x ∈ Vx,y ∈ Vy and
Vx ∩ Vy = ∅. Let the supervertices v′x and v′y of the supervertex graph G′ represent the sets
of vertices Vx and Vy in the original graph G. By definition, PG′(v′x, v′y) denotes the set of
routes between the supervertices v′x and v′y. Therefore, for every pG′(v′x, v′y) ∈ PG′(v′x, v′y),
there is at least one corresponding pG(x, y) ∈ PG(x, y).
Definition 4.3.1 (Border vertices). The vertices in two different induced sub-graphs that
are connected by one or more backbone-edges are known as border vertices.
Definition 4.3.2 (Sub-Route). The set of edges of a particular route connecting two
border vertices of an induced sub-graph between two backbone-edges is called a “sub-route”
or a “child-route”.
Definition 4.3.3 (Parent Route). The route in the supervertex graph that connects the
source and the destination supervertices is called the “parent route” of the “sub-routes”
internal to each supervertex of the (supervertex) graph.
According to the above definitions, each parent route has sub-routes within each super-
vertex it connects. Using the same notation, PG′(v′x, v′y) denotes the set of parent routes, and
each element in PG(x, y) consists of a concatenation of the edges from a route in PG′(v′x, v′y)
and its sub-routes, one from each of the supervertices it traverses. As an example, in the case of the Internet, border vertices denote the edge routers that connect two sub-domains, a parent route represents a route corresponding to "Inter-domain routing", and a sub-route represents one corresponding to "Intra-domain routing".

Figure 4.2: Network sub-domain and Supervertex Graph Example for |PG′(v′x, v′y)| = 1 [source "src" in Sub-domain 1 and destination "dst" in Sub-domain 3, connected through Sub-domain 2 by Backbone 1 and Backbone 2; in the supervertex graph, all routes collapse to a single route between S and D]
Next, we state theorems dealing with route selection for a given flow with a fixed capacity
(or bandwidth) constraint.
Lemma 4.3.4 (Backbone edge and Route selection). If all routes in PG(x, y) share
the same set of backbone edges in Graph G, then |PG′(v′x, v′y)| = 1.
Proof. If all routes in PG share the same set of backbone edges, they go through the same
set of subgraphs. In the supervertex graph G′, these subgraphs are replaced by vertices.
Thus all routes in PG(x, y) collapse to having the same set of supervertices and hence are
connected by the same set of edges in G′. Therefore they collapse to a single route. In other
words, |PG′(v′x, v′y)| = 1.
Let us consider the network of 3 sub-domains illustrated in Figure 4.2. The source node
is present in Sub-domain 1 while the destination node is present in Sub-domain 3. As can
be seen from the figure, every route connecting the source “src” and the destination “dst”
has to go through the same sub-domains 1, 2, 3 and the backbone edges 1 and 2 connecting
those sub-domains. Hence, in the supervertex graph, all routes collapse to a single route
that traverses across 3 supervertices.
Next, we would like to determine the routes internal to each sub-domain. Using the same
example in Figure 4.2, we build a complete route between the source and the destination
by selecting a sub-route within each sub-domain that connects the backbone edges. We can
have multiple possible choices of sub-routes inside each sub-domain. If the selection of the
sub-route in one sub-domain does not affect that in another, we say that the sub-routes
can be chosen independently of each other. Based on that, we state Lemma 4.3.5 under the
situation where we would like to determine a route of a particular bandwidth for a flow.
Lemma 4.3.5 (Independent Sub-Route Selection). For a fixed route pG′(v′x, v′y) ∈
PG′(v′x, v′y) in the supervertex graph with a fixed capacity (bandwidth) requirement, the sub-
routes inside each sub-graph can be chosen independently of each other.
Proof. Let us consider a hierarchical Graph G consisting of multiple induced subgraphs and
backbone edges joining them. The source node and the destination node of a particular task
are denoted by x and y respectively. Any route pG(x, y) ∈ PG(x, y) traverses a fixed set of
subgraphs g1, . . . , gl and a fixed set of backbone edges L1, . . . , Ll−1. If pg1 , . . . , pgl are the sub-routes in the respective subgraphs g1, . . . , gl of the route pG(x, y), then we express pG(x, y) as

pG(x, y) = pg1 ∪ L1 ∪ pg2 ∪ · · · ∪ Ll−1 ∪ pgl . (4.4)

The choice of each sub-route pgi is constrained only by the edges and capacities inside gi. This shows that selecting edges inside each subgraph can be performed independently
under a fixed capacity constraint.
Delay and Hierarchical Routing Lemma 4.3.5 holds true when delay is not considered.
The approximated delay is the main drawback of hierarchical routing [53]. In order to satisfy
the delay constraint in terms of the number of hops as mentioned in Section 4.2.1, we divide
the delay requirements equally in each subgraph falling in the route, similar to what is done
in [39].
Based on Lemma 4.3.4 and Lemma 4.3.5, we state a lemma on the complexity of route
selections.
Lemma 4.3.6 (Complexity of Route Selections). Suppose all routes in PG(x, y) share
the same set of backbone edges L1, . . . , Ll−1, and hence the same set of subgraphs g1, . . . , gl
in Graph G. Furthermore, suppose that the set of edges for the route within a subgraph
gi can be chosen in si different ways under a bandwidth constraint. Then the number of
possible routes is s1 × · · · × sl = ∏_{i=1}^{l} si and the number of computational steps required to choose a route is s1 + · · · + sl = ∑_{i=1}^{l} si.
Proof. Using the notation from (4.4), the set of links pg1 satisfying the bandwidth constraint
from the sub-graph g1 can be chosen in s1 different ways. From Lemma 4.3.5, for each choice
in g1, we can choose the set of links in g2 by s2 different ways and so on. Therefore, the
maximum number of possible ways a route can be selected is s1 × · · · × sl = ∏_{i=1}^{l} si.
Next, the number of steps required to choose the near-optimal set of edges inside a
subgraph gi (sub-route) is si. Since all routes map to a single route in the supervertex
domain, Lemma 4.3.5 proved that the selection of edges in each subgraph can be done
independently of each other under a fixed capacity requirement. Therefore, the maximum
number of steps required to choose a suitable route is s1 + · · · + sl = ∑_{i=1}^{l} si.
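The two counts in the lemma can be checked on a toy instance (an illustrative sketch; the sub-route costs are arbitrary): enumerating every combination of sub-routes examines the product of the per-subgraph choices, while selecting the best sub-route independently within each subgraph examines only their sum.

```python
from itertools import product

# three subgraphs g1, g2, g3 offering s_i candidate sub-routes with these costs
subroute_costs = [[4, 2, 7], [3, 5], [6, 1, 2, 9]]

# exhaustive search: prod(s_i) = 3 * 2 * 4 = 24 combinations
combos = list(product(*subroute_costs))
best_exhaustive = min(sum(c) for c in combos)

# independent selection (Lemma 4.3.5): sum(s_i) = 3 + 2 + 4 = 9 examinations
best_independent = sum(min(costs) for costs in subroute_costs)
```

Both strategies find the same least-cost route, but the independent selection does so in ∑ si steps instead of ∏ si.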
Based on Lemma 4.3.6, we describe our hierarchical route discovery method next. Later, we will also discuss how it assists in hierarchical QoS optimization.
4.3.2 Hierarchical Route Discovery
The process of route discovery assumes a top-down approach. In other words, we obtain
the set of routes for a task at its highest level of network hierarchy. Next, for each of the
(super)vertices in each route, we obtain the sub-routes inside the subgraphs represented by
those vertices. The process starts with the highest level of the task and continues to the
lowest level of the hierarchy. This recursive procedure is described in Algorithm 7. From
Lemma 4.3.6, if we would like to determine ηth sub-routes for each sub-domain, the complexity of hierarchical route discovery is O(p · ηth), where p is the number of sub-domains. On the other hand, a flat route discovery has a complexity of O(ηth^p) for the same set of routes.
For each subgraph at every level, Algorithm 7 determines the set of routes between the
two edge routers using Algorithm 5 or the more efficient Algorithm 8.
4.4 Selective Routing
As proved in Lemma 4.3.6, the hierarchical scheme is able to reduce the complexity of
the route discovery process. However, it does not reduce the overall number of routes per
task. In order to reduce the complexity of the Q-RAM optimization, we must also limit the number of routes considered per task.
input : Level of the hierarchy of the graph, source and destination nodes
output: Hierarchical routes between source and destination nodes
Determine routes between the source and destination nodes within their domain (AS);
// Use Algorithm 5 or 8 or something similar
if the level of the routes is not the lowest then
    for each node in the route do
        Obtain the corresponding subgraph represented by the node;
        Determine the entry router node and the exit router node of the subgraph;
        Call this procedure between the above two nodes within the domain of the subgraph recursively;
    end
end

Algorithm 7: Hierarchical Route Discovery
The route discovery process employed in our scheme is developed in three phases, start-
ing from generating the exhaustive lists of routes for each task to a smart discovery of
fewer routes, with the aim of improving the execution time without incurring any
significant loss in overall utility.
4.4.1 Broadcast Routing
Broadcast routing is the basic approach that uses flooding from the source across the net-
work to determine all possible routes to the destination. It assumes that each node only
knows its neighbors. This process can potentially yield an exponentially large number of
routes, and can therefore become intractable as the size of the network increases.
4.4.2 Smart Route Discovery
Instead of choosing all possible routes between a source and a destination, we would like to
select only a few best or least-cost routes. We use a metric called Route Count Threshold.
Definition 4.4.1 (Route Count Threshold). The route count threshold is defined as
the maximum number of choices of routes for a particular source-destination pair.
input : Source vertex S, Destination vertex D
output: Set of routes connecting S and D
// I = current vertex; I = S when the algorithm is called for the first time
// p = pending/incomplete route under consideration
// Vp = set of vertices for p
if T > Tth then
    Sort all edges ei connected to I within the graph in terms of their minimum cost of routing to D;
    // T = Task ID, Tth = Task Count Threshold
end
for all edges ei do
    r ← 0; // accept the link by default
    if the cost of a potential route, added by the load of the edge, exceeds the maximum cost of the routes already included and the number of routes ηi = ηth then
        r ← 1; // reject this link from route discovery
    else if the next vertex N of the edge already belongs to the pending route then
        r ← 1;
    end
    if r ≠ 1 then
        if N = D then
            A route is constructed;
            Insert the route into the list of routes;
            Update the routing table entries of the vertices falling on this route;
        else
            I ← N;
            Call this algorithm recursively;
        end
    end
end

Algorithm 8: Smart Route Discovery
We denote this limit by ηth. We assume that the number of hops is the measure of the
cost of a route. Using this principle, for ηth = 1, we know that Dijkstra’s shortest route
algorithm can provide the best route between a source and a destination [22]. However,
running Dijkstra's algorithm for each source-destination pair has O(|V|²) complexity, where |V| is the
number of nodes. This can be quite expensive for large networks.
Another alternative is the Bellman-Ford algorithm. This algorithm finds the shortest
routes from a single source vertex to all other vertices in a weighted, directed graph [6, 24].
The algorithm initializes the distance to the source vertex to 0 and all other vertices to
∞. It then does |V | − 1 passes over all edges relaxing, or updating, the distance to the
destination of each edge. The time complexity is O(|V ||E|), where |E| is the number of
edges. A variant of this algorithm is used for distance-vector routing in the Internet, in protocols such as RIP, BGP, ISO IDRP, and Novell IPX.
In our routing scheme that we call “Smart Route Discovery”, we use a modified version
of the Bellman-Ford algorithm within each sub-domain of a network, where we determine
ηth shortest routes for each source-destination pair.
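The idea of keeping only the ηth cheapest loop-free routes, with hop count as the cost, can be sketched as a best-first enumeration; this is an illustrative stand-in for the modified Bellman-Ford used in the thesis, not its exact implementation:

```python
import heapq

def k_cheapest_routes(adj, src, dst, eta_th):
    # Pop partial routes in nondecreasing hop count; the first eta_th
    # completed routes are therefore the cheapest loop-free ones.
    heap = [(0, [src])]
    found = []
    while heap and len(found) < eta_th:
        hops, route = heapq.heappop(heap)
        node = route[-1]
        if node == dst:
            found.append(route)
            continue
        for nxt in adj.get(node, []):
            if nxt not in route:          # keep routes loop-free
                heapq.heappush(heap, (hops + 1, route + [nxt]))
    return found
```

For ηth = 1 this returns the same route a shortest-path algorithm would, but the worst-case work is exponential; the routing-table sort and rejection tests in Algorithm 8 are what keep the real discovery tractable.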
4.4.3 Route Caching
In a distance-vector routing algorithm, a router learns routes from neighboring routers'
perspectives and then advertises the routes from its own perspective. We implement a
reactive distance vector routing protocol in our simulation.
According to this protocol, each node (router) is initialized with the routes of its next
hop neighbors. The algorithm discovers routes of a task starting from its source. Once a
route is established, each node across the route adds the entry to its routing table. The
existing routing table, in turn, is exploited in route discovery. During this process, at any
intermediate node, we sort the neighboring vertices in increasing order of the minimum cost
of routing to the destination based on their routing tables, and reject the neighbors with
more expensive routing in their tables once the number of routes reaches the limit ηth. This
algorithm can provide a potentially sub-optimal route compared to the exhaustive discovery
of the best routes. Therefore, we would like to use this routing information to assist in this
step only after we finish discovering routes for a sufficient number of tasks. We define a
parameter called Task Count Threshold Tth.
Definition 4.4.2 (Task Count Threshold). The task count threshold is defined as the
number of tasks whose routes are determined by exhaustive search using only the next-hop
routing information for each node.
Cached routes are exploited simply to reduce the complexity of the route discovery process. The process becomes intractable for a large dynamic system if we perform
the exhaustive search for every incoming request. Instead, we eliminate this complexity
using cached routing information present in the nodes. However, we need to make sure that
the network has been sufficiently explored by the nodes. Otherwise, the cached route information
at the intermediate nodes may provide sub-optimal routes. Therefore, we would like to
have Tth to be sufficiently large so that the cached routing information can be used without
significantly sacrificing optimality. Route caching is very important in a dynamic networked
system, where flows dynamically enter and leave the system.
Algorithm 8 describes the procedure for Smart Route Discovery that includes the usage
of parameters ηth and Tth.
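The interplay of Tth and the route cache can be sketched as a simple gate around route discovery; the function and cache names below are illustrative, not from the thesis code:

```python
def discover_routes(task_index, src, dst, route_cache, exhaustive_search,
                    t_th):
    # For the first t_th tasks, run the exhaustive (next-hop-only) search
    # and let every node on each discovered route cache a table entry;
    # afterwards, answer from the cached routing information when possible.
    if task_index <= t_th or (src, dst) not in route_cache:
        routes = exhaustive_search(src, dst)
        for r in routes:
            for node in r:
                route_cache.setdefault((node, dst), []).append(r)
    else:
        routes = route_cache[(src, dst)]
    return routes
```

A larger t_th means more tasks pay the exhaustive-search cost, but later lookups draw from a better-populated cache, which is the optimality/run-time trade-off evaluated in Section 4.6.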
4.4.4 QoS Optimization in Large Networks
So far, we have discussed a single centralized optimization scheme that distributes band-
width among tasks. In a large network, a centralized scheme is likely to be infeasible. In
addition, it may not scale well with a very large number of tasks. In the next section, we
will describe a hierarchical QoS optimization technique that exploits the inherent hierarchy
of the network. It can also be distributed across the entire system, thus making the QoS
optimization feasible and scalable for a large network using a large number of tasks or flows.
4.5 Hierarchical QoS Optimization (H-Q-RAM)
In this section, we present H-Q-RAM for networks that utilizes the hierarchical architecture
of networks [81]. In this dissertation, we confine our discussion to only 2 levels of hierarchy
for ease of presentation. The process is divided into two major steps. They are: (1) hierar-
chical concave majorant operation, and (2) distributed resource allocation. The process is
outlined in Algorithm 9, which will be described in detail in the following sections.
[Figure 4.3: Compound Resource Composition — a task routed from source S to destination D through subgraphs g1 and g2. The subgraph set-points (Sg1 : u, q, <R>g1, hg1) and (Sg2 : u, q, <R>g2, hg2) combine into the composite set-point (S : u, q, <<R>g1, <R>g2>, hg1 + hg2).]
4.5.1 Hierarchical Concave Majorant Operation
This process is divided into two steps. First, we generate separate profiles for each task in
each of the sub-domains containing its sub-routes. Second, we combine information from
each sub-domain and update the set-points.
Creation of Multiple Profiles
At the lowest level for each sub-graph, we obtain the set of tasks whose routes include the
sub-graph. Next, we generate local set-points Si = Bi × Dig × Pig for these tasks, where
Pig is the set of sub-routes inside the subgraph g and Dig is the delay assigned for the
route inside subgraph g. As mentioned before, a set-point consists of a utility value, a
corresponding QoS level and a resource vector specifying the route inside the subgraph and
the bandwidth requirement of the links of that route. Thus each task has distinct profiles
within each subgraph.
Next, we evaluate the compound resources for set-points. Using compound resource
values, we prune the list of set-points and discard the ones that are “inefficient”. A set-
point is called inefficient if it has a larger compound resource value than another point at
the same utility level. In other words, if we have multiple set-points for a particular value
of utility, we keep the one that has the smallest compound resource value and discard the
rest. If there is more than one set-point with the same minimum compound resource value
at a utility level, we keep all of those points as co-located set-points (see Chapter 2).
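This pruning rule can be sketched directly; representing a set-point as a (utility, quality, route, h) tuple is an illustrative encoding of our own:

```python
def prune_inefficient(setpoints):
    # For each utility value keep only the set-point(s) with the minimum
    # compound resource h; exact ties survive as co-located set-points.
    best = {}
    for sp in setpoints:
        u, h = sp[0], sp[3]
        if u not in best or h < best[u][0][3]:
            best[u] = [sp]              # strictly cheaper: replace the group
        elif h == best[u][0][3]:
            best[u].append(sp)          # co-located set-point, keep it too
    return [sp for group in best.values() for sp in group]
```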
Creation of Composite Profiles
We next merge the profiles of multiple subgraphs or sub-domains into a single profile for
each task. First, we choose a single set-point for each utility value from each subgraph for
each parent route, and then combine the compound resource values of all subgraphs. Since
all the resources in this case are considered to be of identical type (as network links), the
compound resource of the global set-point of a task spanning two subgraphs g1 and g2 is
given by:
hcomp = hg1 + hg2 , (4.9)
where hg1 and hg2 are the compound resource values of the task (or flow) at its particular
quality setting in the two sub-domains g1 and g2. The generation of a composite set-point
is illustrated in Figure 4.3, where the local set-points of the subgraphs are assumed to be
(Sg1 : u, q, < R >g1 , hg1) and (Sg2 : u, q, < R >g2 , hg2) for a particular value of utility u and
quality level q.
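Equation (4.9) makes the composition step a simple sum over sub-domains. A sketch follows, with each subgraph's set-points represented as a map from (utility, quality) to compound resource h — an illustrative encoding, not the thesis's data structure:

```python
def composite_setpoints(per_subgraph):
    # per_subgraph: list of {(u, q): h} maps, one per subgraph on the route.
    # A composite set-point exists only where every subgraph offers the
    # (u, q) level; its compound resource is the sum h_g1 + h_g2 + ...
    keys = set(per_subgraph[0])
    for sg in per_subgraph[1:]:
        keys &= set(sg)
    return {k: sum(sg[k] for sg in per_subgraph) for k in keys}
```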
[Figure 4.4: Distributed QoS Optimization — optimization threads 1 and 2 for sub-domains G1 and G2 of G, coordinating through global information transactions.]
Second, we determine the concave majorant of these global set-points.
Third, we replace the compound resource values of the local set-points in each sub-
domain by the corresponding composite compound resource values. For example, as shown
in Figure 4.3, the set-points for a task in subgraphs g1 and g2 are changed from (Sg1 : u, q, <
R >g1 , hg1) and (Sg2 : u, q, < R >g2 , hg2) to (Sg1 : u, q, < R >g1 , hg1 + hg2) and (Sg2 : u, q, <
R >g2 , hg1 +hg2) respectively. In addition, since the concave majorant operation eliminates
set-points, a few global set-points may be discarded. In that case, we also discard the
corresponding local set-points in the subgraphs.
Finally, we merge all the local set-points of tasks in each sub-domain to create lists of
set-points called slope lists (see Section 2.3), which are going to be traversed for resource
allocation purposes. As mentioned in Chapter 2, the set-points in the slope list are ordered
by increasing slope or marginal utility values. We will discuss the resource allocation in the
next section.
4.5.2 Transaction-based Resource Allocation
We perform concurrent resource allocation within each sub-domain. Thus, the entire global
resource allocation problem is partitioned into multiple sub-problems within each subgraph,
for Each sub-domain in the network do
    for Each task in the sub-domain do
        Determine set-points Qi = Bi × Di × Pg(i);
        // Pg(i) = number of sub-routes for task τi in the domain
    end
end
for Each task in the entire network do
    Generate global set-points by combining compound resources at each utility level;
    Perform concave majorant on global set-points;
end
for Each sub-domain in the network do
    for Each task in the sub-domain do
        Discard the set-points whose global counterpart has been eliminated by the concave majorant operation;
    end
    Merge the remaining set-points of all tasks in the sub-domain into a single list;
end
for Each sub-domain in the network do
    Execute transaction-based resource allocation as described in Figure 4.5;
end
Algorithm 9: Hierarchical QoS Optimization (H-Q-RAM)
similar to the situation in Chapter 3. However, the sub-problems are not completely in-
dependent of each other in this case, since some tasks may be present in more than one
sub-problem. Such tasks must be assigned the resources to achieve the same utility value
(or quality setting) in all the sub-problems that they are present in. This requires coordi-
nation between these sub-problems, since a resource allocation in one sub-domain may be
infeasible in another sub-domain. In this context, we define three terms.
Definition 4.5.1 (Local Task). A task is called a local task if its source and destination
nodes are in the same sub-domain.
Definition 4.5.2 (Global Task). A task is called a global task if its source and destination
nodes are in different sub-domains.
Definition 4.5.3 (Locality of Tasks). The locality is the fraction of tasks that are local.
Distributed Negotiation
The resource allocator in each sub-domain sequentially goes through its slope list. If it finds that the next set-point in the list belongs to a local task, it determines the feasibility of the allocation locally, and accepts or rejects it based on the availability of local resources. Hence, it works independently for local tasks, assuming that the best route for a local task is available within the sub-domain it belongs to. (A network sub-domain is designed in such a way that the best route for a local task falls within the sub-domain unless its links are extremely crowded.)

When the allocator comes across a set-point of a global task that needs a route spanning multiple subgraphs, it does the following. First, it checks whether the corresponding global set-point has already been rejected. This happens when another sub-domain included in the parent route of the task fails to allocate its corresponding local set-point. In that case, the current allocator also discards the set-point and moves on. Otherwise, it marks the set-point as allocable and waits until every other sub-domain along the route decides the allocation of its corresponding set-point. During this time, it goes to sleep
and wakes up only when all other sub-domains make their decisions. Upon waking
up, it checks if the allocation has been successful. The allocation becomes successful when
all sub-domains are able to allocate their corresponding local set-points that complete the
route with a specific utility value. The allocation is unsuccessful if one of the sub-domains
fails. Upon a successful allocation, it finalizes the local allocation. Otherwise, it rejects the
initial tentative allocation. Next, it proceeds further to complete the operation of QoS-based
resource allocation.
Deadlock Avoidance in Negotiation
Since allocators negotiate the allocation of set-points belonging to global tasks, it is important to ensure that a deadlock never happens. An allocator follows the slope list, which is ordered by increasing marginal utility values, and it is possible for multiple set-points belonging to different tasks or flows, or to different routes of the same task, to have the same marginal utility value. In that case, we must implement an ordering mechanism for set-points to avoid any deadlock.
We implement two levels of ordering to avoid the deadlock. First, we assign a global
number to each flow or task in the entire network. This global number can be obtained as a
combination of IP addresses of the source and the destination nodes, and the corresponding
port numbers.
Second, we also assign a global number to each “Parent Route” within a flow. Using
these numbers, we resolve the contention in the slope list when multiple set-points have the
same marginal utility value. First, we order them in the increasing order of their global
flow IDs. Next, for multiple co-located set-points of the same flow, we order them in the
increasing order of their Parent Route IDs. For the co-located points of the same Parent
Route of the same task, we do not require any ordering since their selections are independent
in sub-domains, as proved in Lemma 4.3.5. (The marginal utility of a task is defined as the ratio of the difference in utility values to the difference in compound resource values between two successive set-points of different utility values.) The allocation process is illustrated in Figure 4.4
and is detailed by a flow-chart in Figure 4.5.
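The two-level tie-breaking can be expressed as a single sort key: because every allocator orders contended set-points by the same (marginal utility, flow ID, parent-route ID) tuple, no cyclic wait can arise. The record layout below is illustrative:

```python
def slope_list_order(setpoints):
    # setpoints: iterable of (marginal_utility, flow_id, parent_route_id)
    # records. Sorting by the full tuple resolves marginal-utility ties
    # first by global flow ID, then by parent-route ID.
    return sorted(setpoints, key=lambda sp: (sp[0], sp[1], sp[2]))
```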
[Figure 4.5: Distributed Resource Allocator — flow chart. The slope list is fed with the sorted set-points of all tasks and task utilities are initialized to 0.0. For each set-point that increases the current utility of its task: a local task is allocated if the allocation is locally feasible; a global task whose allocation is locally and globally feasible is marked allocable, and the allocator sleeps until the other allocators decide, then allocates locally on success or de-allocates locally on failure; infeasible set-points are marked rejected, along with any co-located set-points of the same parent path. The process finishes when no set-point is left.]
4.5.3 Complexity of Network QoS Optimization
In this section, we compare the complexities of the Q-RAM and the H-Q-RAM optimization.
Q-RAM Complexity
Suppose there are n tasks in the entire network. Using the same notation as before, let us assume that |Qm| denotes the maximum number of QoS settings, and let ηth = max_{1≤i≤n} |PG(i)|. This yields the maximum number of set-points L = |Qm|ηth. Hence, the complexity of the concave majorant operation is O(n|Qm| log |Qm|), and the complexity of the merging operation is O(n|Qm|ηth log(n)).

Since the complexity of the Q-RAM optimization is the sum of the complexities of the concave majorant and the merging operation, the total complexity is O(n|Qm|(log |Qm| + ηth log(n))).
H-Q-RAM Complexity
For H-Q-RAM, initial local set-point pruning has O(lnl|Qm|ηth) complexity per sub-domain,
where l equals the number of sub-domains and nl equals the maximum number of tasks per
sub-domain. Unlike the Q-RAM optimization, ηth denotes the upper limit on the number
of routes inside each sub-domain for a task.
Next, we have the concave majorant operation that has the global complexity of O(n|Qm| log(|Qm|)).
The second pruning operation after the concave majorant also has the same complexity
O(lnl|Qm|ηth).
The merging operation requires O(lnl|Qm|ηth log(nl)) steps, and the distributed trans-
action requires a maximum of O(nlηth|Qm|) steps per sub-domain.
We can now express the generic complexity of H-Q-RAM as the sum O(l·nl·|Qm|·ηth) + O(n|Qm| log |Qm|) + O(l·nl·|Qm|·ηth·log(nl)), which is dominated by O(n|Qm| log |Qm| + l·nl·|Qm|·ηth·log(nl)).

From the expression, in the worst case, when every task has a profile in every sub-domain, we have nl = n. Then, the complexity of H-Q-RAM is higher than that of Q-RAM.
In the best case, when every flow is a local task that does not span sub-domains, we have nl = n/l, which yields a complexity lower than that of Q-RAM. However,
in a very large network (the size of the Internet), it is very unlikely that a task traverses
across all sub-domains. Therefore, H-Q-RAM performs better than Q-RAM for practical
cases. Since H-Q-RAM computations can be distributed (one node per sub-domain), we
can further reduce the complexity to O(nl|Qm|(log |Qm| + (nl/n)·ηth·log(nl))). Thus, H-Q-RAM
can scale well with large networks.
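The crossover between the two complexity expressions can be checked numerically. The operation-count functions below drop all constant factors, and interpolating nl between n/l (all tasks local) and n (every task in every sub-domain) by a locality fraction is our own modeling assumption, not a formula from the text:

```python
from math import log2

def qram_ops(n, qm, eta):
    # O(n |Qm| (log |Qm| + eta_th log n)), constants dropped
    return n * qm * (log2(qm) + eta * log2(n))

def hqram_ops(n, l, qm, eta, locality):
    # dominant terms O(n |Qm| log |Qm| + l nl |Qm| eta_th log nl),
    # with nl interpolated by the fraction of local tasks (an assumption)
    nl = locality * (n / l) + (1.0 - locality) * n
    return n * qm * log2(qm) + l * nl * qm * eta * log2(nl)

# all-local tasks (nl = n/l) favor H-Q-RAM, while fully global tasks
# (nl = n) favor flat Q-RAM, matching the best/worst cases above
```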
4.6 Experimental Evaluation
Our experimental evaluation is intended to quantify the performance of H-Q-RAM and Q-
RAM in terms of the trade-off between optimality and scalability. We focus on measuring
two main parameters:
• the global utility obtained by the optimization, and
• the total execution time of the algorithm.
First, we investigate the efficiency of our enhancements in route discovery. We deter-
mine how a selective set of routes obtained through our smart route discovery process can
eliminate the necessity of selecting a large number of routes for optimization purposes.
We also investigate the performance of the optimization when we vary the parameter Tth.
Second, we compare the performance of H-Q-RAM optimization with respect to Q-RAM
optimization.
4.6.1 Experimental Configuration
In order to validate our technique, we generate network topologies using BRITE [56, 57], a topology generation tool. The bandwidth distribution of the network links is presented in
Table 4.1: Settings of Tasks

    Number of QoS dimensions (Bandwidth, Delay)   2
    Length of bandwidth dimension                 random(1, 4)
    Length of delay dimension                     1
    Minimum bandwidth (Bmin)                      min((Rayleigh distr.: µ = 152 Kbps), 8000.0 Kbps)
    Bandwidth increment                           0.3 Bmin
    Maximum delay                                 random(16, 20) hops
    Utilities for QoS dimension (u(q))            (0.5, 0.7, 0.8)
Table 4.2: Settings of Networks

    Network topology generator     BRITE [56]
    Intra-domain link bandwidth    10.0 Mbps
    Inter-domain link bandwidth    10000.0 Mbps
Table 4.2.
The specifications of the tasks are presented in Table 4.1. As seen from the table,
the minimum bandwidth is randomly chosen following a Rayleigh distribution with µ =
152 Kbps. This distribution ensures a positive value for the minimum bandwidth of any
task. For simplicity, we choose a single value of delay, which is expressed by a certain
maximum number of hops for a route. The source and the destination nodes of a task are
chosen randomly across the entire network. The experiments are performed on a 2.0 GHz
Pentium IV processor with 768 MB of memory.
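The task generator implied by Table 4.1 can be sketched as follows; treating µ = 152 Kbps as the Rayleigh scale parameter, as well as the field names, are assumptions on our part:

```python
import math
import random

def rayleigh(rng, sigma):
    # inverse-CDF sampling: R = sigma * sqrt(-2 ln(1 - U)), U ~ Uniform[0,1)
    return sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))

def make_task(rng, sigma=152.0):
    # One task per Table 4.1: Rayleigh-distributed minimum bandwidth capped
    # at 8000 Kbps, 1-4 bandwidth levels spaced by 0.3*Bmin, and a maximum
    # delay of 16-20 hops.
    b_min = min(rayleigh(rng, sigma), 8000.0)        # Kbps
    n_levels = rng.randint(1, 4)
    bandwidths = [b_min * (1.0 + 0.3 * i) for i in range(n_levels)]
    return {"bandwidths": bandwidths,
            "max_delay_hops": rng.randint(16, 20)}
```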
4.6.2 Performance Evaluation of Selective Routing
In this section, we evaluate the performance of the selective routing algorithms.
Results on Smart Route Selection
In this experiment, we demonstrate the effectiveness of smart route selection as described
in Section 4.4.2.
[Figure 4.6: Comparison of Smart Route Discovery and Random Route Discovery — utility vs. number of tasks for the two schemes.]
[Figure 4.7: Utility Variation with Number of Routes — utility vs. number of tasks for ηth = 1, 2, 5, 80, and ∞ (the ηth = ∞ case shown as a bar graph).]
First, we compare the smart route discovery algorithm with the random route discovery algorithm, where we randomly select ηth routes out of all possible routes. We vary the number of tasks in the system in geometric progression as N = 10, 20, 40, . . . , 640.
We plot the accrued utility against the number of tasks for ηth = 5 under both schemes
in Figure 4.6. The results show that a random route selection scheme yields a much lower
utility (29.5% for N = 320) compared to the smart route selection.
Next, we compare smart route selection for different values of ηth. In this case, we use
5 values of ηth as [1, 2, 5, 80,∞]. The value ∞ signifies that all possible routes are chosen
for each source-destination pair. The plots of utility against the number of tasks are shown
in Figure 4.7. The “ηth =∞” case is shown by the bar graph instead of a line.
From the bar graph, we observe that we do not have any data beyond N = 40 for
ηth = ∞. This is because for N ≥ 80, the route discovery and the optimization processes
become intractable. This is further confirmed by its steep rise in execution time as shown
in Figure 4.8.
On average, the utility increases as ηth increases since it provides more alternative
routes for each task. However, the difference between utilities at ηth = 5 and ηth = ∞
is statistically insignificant (< 0.09%), whereas the reduction in execution time for ηth = 5
is 93.6% (or, 15.6 times). Overall, we observe a 99.997% (or, 38239.4 times) reduction in
execution time for ηth = 5 relative to ηth = ∞ when the number of tasks is 40. Even for
ηth = 2, the reduction in utility is only 3.57% relative to ηth = 80 for 640 tasks, with a
run-time reduction of 96.9%.
Results on Route Caching
This experiment demonstrates how caching route information helps in reducing the execu-
tion time of the optimization. In this case, we fix the number of tasks N to 640 and vary
the parameter Task Count Threshold Tth. Figure 4.9 shows the percentage drop in utility
for different values of Tth relative to the case with no route caching, i.e., Tth = ∞. The
[Figure 4.8: Run-Time Variation with Number of Routes — optimization run-time in seconds (log scale) vs. number of tasks for ηth = ∞ (all paths), 80, 5, 2, and 1.]
[Figure 4.9: Percentage Utility Drop with Routing Task Count Threshold — percentage utility drop vs. Tth.]
[Figure 4.10: Percentage Run-Time Variation with Routing Task Count Threshold — percentage execution-time change vs. Tth.]
value of ηth is kept constant at 5.
We observe that even for Tth = 1, where we start exploiting cached route information right after the first task's routes have been determined, the percentage loss of utility is less than 3%. On the other hand, we also observe a huge drop in execution time
(> 60%) as shown in Figure 4.10.
Using the route caching technique, the route discovery time per task will reduce with
time as nodes keep adding more entries to their routing tables. Figure 4.11 shows the plot
of route discovery time per task against the number of tasks considered for optimization.
It clearly shows that the route discovery time decreases exponentially with the number of
tasks. Hence it decreases with time in a dynamic system where tasks regularly arrive in and
depart from the system. In other words, we can claim that in a dynamic scenario, in steady
state, the optimization time dominates the route discovery time. This is also corroborated
in Figure 4.12, which shows the ratio of route discovery time and optimization time per
task.
[Figure 4.11: Average Execution Time for Route-discovery Simulation Per Task — route-discovery time per task vs. number of tasks.]
[Figure 4.12: Ratio of Q-RAM Optimization Time To Route-Discovery Per Task — run-time ratio vs. number of tasks.]
Table 4.3: Specifications of the Networks

    Network ID   Number of Sub-domains   Number of nodes   Number of links
    1            5                       100               207
    2            8                       160               334
    3            15                      450               930
    4            20                      600               1240
4.6.3 Performance Evaluation of Hierarchical Optimization
In this section, we evaluate the performance of Hierarchical QoS optimization. We use 2
levels of hierarchy for our experimental evaluation. We use the same specifications of tasks
as mentioned in Table 4.1. In order to validate the usefulness of H-Q-RAM, we use larger
networks, consisting of 5, 8, 15 and 20 sub-domains respectively. Their specifications are
presented in Table 4.3, and their bandwidth distributions are as specified in Table 4.2. For these large networks, we use ηth = 2 and Tth = 1, since these settings have provided reasonably good utility values (within 5% of the optimal) with large reductions in execution time for smaller networks.
[Figure 4.13: Absolute Utility Variation in Q-RAM and H-Q-RAM — utility vs. number of tasks for the two schemes.]
In the first experiment, we use Network 3 from Table 4.3. In this case, we vary the
number of tasks for optimization between 100 and 10240 in a geometric progression with
[Figure 4.14: Absolute Execution Time Variation in Q-RAM and H-Q-RAM — execution time (sec) vs. number of tasks for the two schemes.]
a power of 2. Figure 4.13 shows the variation of utility between Q-RAM and H-Q-RAM
against the number of tasks. Figure 4.14 shows the variation of execution time against the
number of tasks.
We observe that H-Q-RAM reduces the optimization time for 10,240 tasks by 64% while incurring a utility reduction of less than 2% relative to Q-RAM. From Figure 4.14, we also observe that the execution-time difference between Q-RAM and H-Q-RAM increases further with the increase in the number of tasks.
Implementation Considerations: As can be seen from Figure 4.14, the execution time
of the optimization increases exponentially for a large number of tasks to be deployed
in larger networks. This is because the simulation becomes memory-intensive under this
situation and hence many page faults and swapping operations cause the non-linear (expo-
nential) increase in the execution times. Consequently, it becomes difficult to simulate the
hierarchical optimization of a very large network in a single host, as the memory requirement of the optimization process also increases. This effectively suggests the necessity of
[Figure 4.15: Variation of Percentage Utility Loss for 6400 Tasks with the Number of Sub-domains — percentage loss in global utility vs. locality of tasks (%) for networks with 5, 15, and 20 sub-domains.]
studying the performance improvement of distributed transaction-based optimization using
H-Q-RAM. The execution time for H-Q-RAM will be reduced further if the optimization
is distributed over multiple hosts. This will be the only option available, since running the
Q-RAM optimization for all the tasks in a single host becomes intractable.
Next, we measure the performance of H-Q-RAM relative to the locality of tasks in
different sub-domains. From our complexity analysis, we know that H-Q-RAM performs
best when the source and the destination of a task are confined within a single domain, which
in turn also eliminates transactions between sub-domains during the optimization step. In
this experiment, we keep the number of tasks constant at 6400 and vary the locality of tasks
between 0% and 96% and measure the performance of Q-RAM and H-Q-RAM. The results
are taken for Networks 1, 3 and 4 from Table 4.3.
Figure 4.15 plots the percentage loss in utility under H-Q-RAM, which does not exceed
4.5%. In addition, the loss drops with the increase in the locality of the task and with the
increase in the size of the network.
[Figure 4.16: Variation of Percentage Run-Time Reduction for 6400 Tasks with the Number of Sub-domains — percentage reduction in execution time vs. locality of tasks (%) for networks with 5, 15, and 20 sub-domains, with the y = 0 line marked.]
Figure 4.16 plots the percentage gain in execution time under H-Q-RAM. As seen from
the figure, H-Q-RAM actually has 20% higher execution time under 0% task locality for
the smallest network (Network 1 with 5 sub-domains). However, the gain increases with the size of the network as well as with the locality of the tasks. Moreover, the rate of increase in
percentage gain decreases with the increase in the size of the network. In other words, for a
very large network, H-Q-RAM performs better than Q-RAM and the significance of locality
on this performance decreases.
The above experiment shows that H-Q-RAM provides a significant gain in performance
when (a) the size of the network is large, and (b) the locality of the tasks is high. These
results are in agreement with the complexity analysis of H-Q-RAM.
In Figure 4.17, we also plot the number of transactions against the locality of the
tasks. As expected, the number of transactions decreases with the increase in task locality.
However, we observe a larger number of transactions with the increase in the size of the
network. This affects the absolute execution time of H-Q-RAM in our simulation due to
a large amount of switching among optimization threads and the consequent page faults.
Figure 4.18 plots the H-Q-RAM execution time against the number of sub-domains under
[Figure 4.17: Number of Transactions for 6400 Tasks with the Number of Sub-domains — number of transactions vs. locality of tasks (%) for networks with 5, 8, 15, and 20 sub-domains.]
different values of the task localities for 6400 tasks. At a very high task locality (96%), the
number of transactions becomes negligible, and the execution time becomes independent of
the number of sub-domains. On the other hand, the execution time monotonically increases
with the number of sub-domains for lower task locality values.
Based on the above results, we conclude that H-Q-RAM performs well for large networks
compared to Q-RAM, which makes it feasible to employ QoS-based optimization in large
networked environments. However, we also observe that the number of transactions increases with the size of the network. Therefore, we would like to reduce
the number of transactions for future implementations.
[Figure 4.18: Variation of (H-Q-RAM Execution Time / Number of Sub-domains) for 6400 Tasks with the Number of Sub-domains — time in seconds (log scale) vs. number of sub-domains for task localities 0.0, 0.24, 0.48, 0.72, and 0.96.]
4.7 Chapter Summary
In this chapter, we have discussed a resource allocation scheme for a networked system based on Q-RAM. First, we proposed several pruning algorithms for smart route selection that make the basic optimization more scalable without any significant loss in the optimality of the solution. Our main goal was to analyze the trade-off between optimality and the execution time of our QoS optimization. Although the specific values may vary depending on the topology, restricting the maximum number of routes to 2 reduces optimality by only 5%. In addition, exploiting cached route information across the network becomes more useful as the size of the network increases.
Next, we presented a transaction-based hierarchical scheme (H-Q-RAM) that can make
the problem more scalable by exploiting the presence of hierarchy in networks. The perfor-
mance of H-Q-RAM improves with the increase in the size of the network and the locality
of the tasks. We also observed that the simulation is memory-intensive, and it becomes increasingly expensive on a single host as the size of the network grows. Therefore, a centralized scheme becomes infeasible for a network the size of the Internet. Since H-Q-RAM can be executed concurrently on multiple machines using distributed transactions, it can be run in parallel to address large networks. In addition, we would also like to reduce the number of transactions, which increases with the size of the network. This can be done if we can aggregate multiple tasks into a few “super-tasks” and perform transactions on those “super-tasks”. Hence, our future work will investigate efficient methods of task aggregation.
Chapter 5
Resource Allocation in Phased
Array Radar
5.1 Introduction
There are certain systems where a task has a large number of operational dimensions and/or
a large number of elements across these dimensions¹. For example, an application that has 10 operational dimensions with 10 levels along each dimension will have \(10^{10}\) set-points or more. These tasks are called highly configurable tasks.
In addition, there are also certain tasks for which environmental factors play a key role
in deciding the QoS levels. In a dynamically changing environment, the mapping between
resources and utility may change with time. That necessitates frequent QoS optimizations
in order to allocate the resources among the tasks in a near-optimal manner.
Certain distributed embedded systems operate in conditions where both the above situ-
ations hold. An example of such a system is a phased array radar tracking system. A radar
system is an example for which environmental factors outside the direct control of the sys-
tem affect the relationship between the level of service and the resource requirements of the
¹The operational and the environmental dimensions are defined in Chapter 2.
tasks, which in turn affect the perceived utility. In these systems, a finite amount of radar
bandwidth and computing resources must be apportioned among multiple tasks tracking
and searching targets in the sky. In addition, environmental factors such as noise, heating
constraints of the radar and the speed, distance and maneuverability of the tracked targets
dynamically affect the mapping between the level of service and resource requirements as
well as the mapping between the level of service and the user-perceived utility. Their highly
dynamic nature and stringent time constraints lead to complex cross-layer interactions in
these systems. Therefore, the design of such systems has long been a handcrafted mixture of pre-computed schedules, pessimistic resource allocations, cautious energy usage, and operator intuition.
In this chapter, we consider an integrated framework for QoS optimization and schedul-
ing for a phased-array radar system. The antenna in a phased-array radar system can
electronically steer the energy beam in a desired direction. This allows it to track targets at differing frequencies depending upon each target's distance and characteristics such as speed and acceleration. Some characteristics of a radar system are as follows.
• The longer the distance between the target and the radar, the higher the energy
requirement.
• Once a beam transmission starts, it cannot be preempted.
The goal of the radar system is to utilize its finite energy and time resources to maximize
the quality of tracking. In addition to the tracking tasks, the system also includes search tasks and target confirmation tasks. A search task searches for new targets in the sky, and a target confirmation task confirms a target after it is detected by the search task.
A radar system must make two sets of decisions. First, it must decide what fraction
of resources (energy and time) to spend on each target. It must then schedule the radar
antenna(s) to transmit the beams and receive the return echoes in a non-preemptive fashion.
Since targets in the sky are continually moving, resource allocation and scheduling decisions
must be made on a frequent basis. Due to the multi-dimensional nature of radar resource
allocation, the problem of maximizing the benefits gained is NP-hard.
In our scheme, we develop an integrated framework that performs a near-optimal re-
source allocation and scheduling of the tracking tasks in real-time. We show that such
decisions can be made near-optimally, while maintaining schedulability and satisfying the
resource constraints of the system. We concentrate primarily on the radar antenna resources
as these are generally scarce compared to the computing resources. Unlike traditional radar
systems, we use two layered components. A QoS optimization component is concerned
with determining how much of the resources should be given to each task, and a scheduling
component is concerned with determining when radar tracking tasks should be scheduled.
In short, our radar resource management scheme deals with two primary concerns: the
selections of operating points and ensuring schedulability.
Selection of Operating Points: This is performed by using Q-RAM. In this chapter,
we describe a scalable Q-RAM technique for allocating resources to radar tasks. This is
also presented in [27, 32, 36].
Ensuring Schedulability: In general, Q-RAM uses only straightforward resource constraints (such as requiring that the total usage of any resource be less than some utilization bound). In the radar system, a given allocation generated by Q-RAM may or
may not be schedulable, and furthermore, jitter constraints can be violated even if all the
resource utilizations are less than 100%. In other words, the QoS allocator (Q-RAM) and
the scheduler need to be tightly integrated. Therefore, we present a scheme that integrates
the Q-RAM framework with the radar schedulability test. This is also described in [30].
Although radar scheduling incorporates a pipelined scheduling of a back-end and a front-
end, the front-end antenna remains the bottleneck resource. Therefore, we concentrate our
schedulability analysis on the front-end only. In order to provide stringent periodic jitter
constraints, we use harmonic periods for tasks [52].
The rest of this chapter is organized as follows. Section 5.2 presents our model of the
Figure 5.1: Radar System Model (angular regions covered by R1 only, R2 only, R3 only, and R4 only, plus the overlap regions R1&R2, R2&R3, R3&R4, and R1&R4)
radar system, its associated resources and constraints. Section 5.6 describes our radar dwell scheduling scheme. In Section 5.9, we present an evaluation of our experimental results. Finally, in Section 5.10, we summarize our concluding remarks and provide a brief description of our future work.
5.2 Radar Task Model
We assume the same radar model as used in [30, 32]. It consists of a single ship with 4 radar antennas oriented at 90° to each other as shown in Figure 5.1. We also assume that each antenna is capable of tracking targets over a 120° arc. This means that there are regions
of the sky that are capable of being tracked by only one radar antenna, as well as regions
that can be tracked by two antennas. The antennas are assumed to share a large pool of
processors used for tracking and signal-processing algorithms, and a common power source
to supply energy to the antennas. The main tasks of an antenna are search and tracking.
• Search: There are multiple search tasks that cover the entire angular range of the
radar.
• Tracking: There is one tracking task corresponding to each target being tracked.
A single instance of tracking a particular target consists of sending a radar signal con-
sisting of a series of high frequency pulses and receiving the echo of those pulses. This
instance is known as a dwell, as shown in Figure 5.2. It is characterized in terms of a trans-
mit power Ai, a transmission time txi, a wait time twi and a receive time tri. Note that in
an actual radar, the transmission time actually consists of a series of rapid pulses over a
time period txi as opposed to a continuous transmission. Generally, txi = tri, and the wait
time is based on the round-trip time of the radar signal (e.g., about 1 ms for a target 100
miles away). Also, while the radar may dissipate some power while receiving, this power is
much smaller than the transmit power. For simplicity, we assume that the receive power
is negligible compared to the transmit power. The time between two successive dwells is
called the dwell period (Ti).
Figure 5.2: Radar Dwell (transmit power Ai during transmit time txi, followed by wait time twi and receive time tri, repeating with period Ti)
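As a concrete illustration of the dwell parameters above, here is a small Python sketch; the class layout, field names, and the 100-mile example are ours, not part of the thesis:

```python
from dataclasses import dataclass

C = 299_792_458.0  # speed of light in m/s

@dataclass
class Dwell:
    # One dwell, following Figure 5.2; field names are illustrative.
    A: float   # transmit power (W)
    tx: float  # transmission time (s)
    tw: float  # wait time (s)
    tr: float  # receive time (s), typically equal to tx
    T: float   # dwell period (s)

def wait_time(target_range_m: float) -> float:
    # Round-trip travel time of the radar signal to the target and back.
    return 2.0 * target_range_m / C

# A target 100 miles (~160.9 km) away gives a round trip of about 1.07 ms,
# consistent with the "about 1 ms" figure quoted in the text.
tw_100mi = wait_time(100 * 1609.344)
```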
In order to appropriately track a target, the dwell needs to have a sufficient number of
pulses (target illumination time or txi) with a sufficient amount of power (Ai) on the pulses
to traverse through the air, illuminate the target and return back after reflection. Larger
txi (more pulses) and Ai provide better tracking information. The value of Ai required to
adequately track a target is proportional to the 4th power of distance between the target
and the radar [46]. The transmit power is limited not only by the output capability of the energy source but also by the heat dissipation constraint of the radar. The tracking information is also
dependent on many environmental factors beyond the radar system’s control such as the
speed, the acceleration, the distance and the type of the target, the presence of noise in the
atmosphere and the use of electronic counter-measures by the target.
Based on the received pulses, an appropriate signal-processing algorithm must be used
in order to properly estimate the target range, velocity, acceleration, type, etc. There are
many tracking algorithms used in radar systems. They provide trade-offs between the noise
tolerance and dealing with target maneuverability. They also have different computational
requirements. Thus, each radar task consists of a front-end sub-task at the antenna and a
back-end signal-processing sub-task at the processors.
Since a target can maneuver to avoid being tracked, the estimates are valid only for the
duration of illumination time. Based on these data, the time-instant of the next dwell for the
task must be determined. Therefore, the tracking task needs to be repeated periodically with
a smaller period providing better estimates. In the absence of any jitter, the tracking period
is equal to the temporal distance between two consecutive dwells. For a large temporal
distance, the estimated error can be so large that the dwell will miss the target. On the
other hand, a small temporal distance will require higher resource utilization. The radar
needs to track the targets with higher importance using greater tracking precision than the
ones with lower importance [30].
A radar task is periodic with a strict jitter constraint. For example, for a task with
period Ti, the start of each dwell must be exactly² Ti milliseconds from the start of the
previous dwell. We make the seemingly conservative choice of using only harmonic periods
for radar tasks since by using harmonics we can automatically satisfy the stringent periodic
jitter constraints (a pin-wheel scheduling problem [52]).
²In practice, if two successive dwells are not separated exactly by Ti, lower tracking quality will result. If the jitter is higher than a (small) threshold, an entire track may be lost.
5.3 Radar Resource Model
The radar resource space consists of the following resource dimensions: radar bandwidth, radar power, and computing resources.
5.3.1 Radar Bandwidth
As we mentioned earlier, a radar can track only a limited number of targets at a specific
time. Since the radar is unused during the waiting period of a dwell, this time can often be
used by other dwells through interleaving. This gives us a radar utilization value of:

\[ U_r = \sum_{i=1}^{N} \frac{t_{xi} + t_{ri}}{T_i}. \tag{5.1} \]

If we assume the receiving time to be equal to the transmission time, we obtain:

\[ U_r = 2 \sum_{i=1}^{N} \frac{t_{xi}}{T_i}. \tag{5.2} \]
5.3.2 Radar Power Constraints
In addition to timing constraints, radars also have power constraints. Violating a power
constraint can lead to overheating and even permanent damage to the radar. The radar can
have both long-term and short-term constraints. For example, there may be a long-term constraint of operating below an average power of 1 kW, and a (less stringent) short-term constraint of operating below an average power of 1.25 kW in a 200 ms window. The short-term constraint is generally specified using an exponential weighting within a sliding window.
Long-Term Power Utilization Bound
If Pmax is the maximum sustained long-term power dissipation for the radar, then we define
the long-term power utilization for a set of N tasks as:
\[ U_P = \frac{1}{P_{max}} \sum_{i=1}^{N} \frac{A_i t_{xi}}{T_i}. \tag{5.3} \]

That is, the long-term power is given by the fraction of time each task is transmitting, multiplied by the transmit power for that task. Dividing by Pmax gives a utilization value that cannot exceed 1. To handle long-term constraints in Q-RAM, we simply treat power as a resource, and denote the amount of that resource consumed by task i as \( \frac{1}{P_{max}} \frac{A_i t_{xi}}{T_i} \).
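The radar utilization of (5.2) and the long-term power utilization of (5.3) are straightforward to compute; a minimal sketch, assuming each task is represented as an (A_i, tx_i, T_i) triple (a representation we chose for illustration):

```python
def radar_utilization(tasks):
    # U_r from (5.2), assuming t_ri = t_xi for every task.
    return 2.0 * sum(tx / T for _, tx, T in tasks)

def long_term_power_utilization(tasks, p_max):
    # U_P from (5.3): duty cycle of each task times its transmit power,
    # normalized by the sustained power limit P_max.
    return sum(A * tx / T for A, tx, T in tasks) / p_max

tasks = [(2000.0, 0.002, 0.1),   # (A_i in W, tx_i in s, T_i in s) -- made-up values
         (1500.0, 0.004, 0.2)]
ur = radar_utilization(tasks)
up = long_term_power_utilization(tasks, p_max=1000.0)
```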
Short-Term Power Utilization Bound
We will now derive a short-term power utilization bound. Short-term power needs are
defined in terms of a sliding window [5] with time constant τ . With an exponential sliding
window, pulses transmitted more recently have a larger impact on the average power value
than less recently transmitted pulses. Also, the rate at which the average power decreases is proportional to the average power value, meaning that immediately after transmitting a pulse, we have a relatively high but steadily decreasing cooling rate. The use of a sliding
exponential window has two benefits: it is memory-less, and it closely models thermal
cooling, which is the primary motivation for the constraint.
In order to define the short-term average power, we first define instantaneous power
dissipation as p(t). This function is 0 when the radar is not transmitting and Ai while pulse
i is being transmitted. We then define the average power at time t for a time constant τ as:
\[ P^{\tau}(t) = \frac{1}{\tau} \int_{-\infty}^{t} p(x)\, e^{(x-t)/\tau}\, dx. \tag{5.4} \]
Figure 5.3: Average Power Exponential Window

Figure 5.4: Cool-Down Time

Figure 5.5: Non-Optimal Initial Average Power
Figure 5.3 shows an example of the average power value for a set of pulses along with the
exponential sliding window at time t0. The shaded bars represent the transmitted radar
energy, and the dotted line represents the sliding window at time t0. The short-term average power constraint is considered satisfied if (5.4) never exceeds some bound \(P^{\tau}_{max}\). This bound is called the power threshold over a look-back period τ. Alternatively, the expression \(E_{th} = P^{\tau}_{max}\,\tau\) is defined as the energy threshold of the system.
Now, we would like to translate the short-term energy constraint of the radar antenna
to a timing constraint. In this context, we define a timing parameter called the cool-down
time tci that precedes a dwell of each task i.
Definition 5.3.1 (Cool-down Time). The cool-down time for a task is the time required for \(P^{\tau}(t)\) to fall from \(P^{\tau}_{max}\) to a value just low enough that, at the end of the transmit phase of a dwell, \(P^{\tau}(t)\) will be restored to \(P^{\tau}_{max}\).
The effect of cool-down time is shown in Figure 5.4. It is a function of the transmit time txi and the transmit power Ai of a dwell, the time constant τ, and the short-term average power constraint \(P^{\tau}_{max}\). This factor allows the power constraints to be converted into simple timing constraints.
We will now derive the cool-down time tci for a task i. We will assume that for this task \(A_i \ge P^{\tau}_{max}\). For a task with \(A_i \le P^{\tau}_{max}\), there is no need for a cool-down time, since the radar cools down even while it continues transmitting; that is, \(t_{ci} = 0\). Let \(P_s\) be the average power at the beginning of the cool-down period, \(P_{in}\) be the average power at the end of the cool-down period, and \(P_{out}\) be the average power at the end of the transmission. We want \(P_s = P_{out} = P^{\tau}_{max}\). We can express \(P_{in}\) in terms of \(P_s\) as:

\[ P_{in} = P_s\, e^{-t_{ci}/\tau}, \tag{5.5} \]
and \(P_{out}\) in terms of \(P_{in}\) as:

\[ P_{out} = P_{in}\, e^{-t_{xi}/\tau} + A_i (1 - e^{-t_{xi}/\tau}). \tag{5.6} \]

Substituting \(P^{\tau}_{max}\) for \(P_{out}\) in (5.6) and solving for \(P_{in}\), we get:

\[ P_{in} = \frac{P^{\tau}_{max} - A_i (1 - e^{-t_{xi}/\tau})}{e^{-t_{xi}/\tau}}. \tag{5.7} \]

We can now substitute \(P^{\tau}_{max}\) for \(P_s\) in (5.5), set the forward and backward definitions (5.5) and (5.7) for \(P_{in}\) equal, and solve for tci to yield the expression for the cool-down time:

\[ t_{ci} = -\tau \ln \frac{P^{\tau}_{max} - A_i (1 - e^{-t_{xi}/\tau})}{P^{\tau}_{max}\, e^{-t_{xi}/\tau}}. \tag{5.8} \]
We now present the following theorem:
Theorem 5.3.2. For any set of N periodic radar tasks that does not violate the short-term average power constraint (where \(A_i \ge P^{\tau}_{max}\) for all tasks), the total short-term average power utilization given by

\[ U_{\tau} = \sum_{i=1}^{N} \frac{t_{ci} + t_{xi}}{T_i} \tag{5.9} \]

must be no greater than 1.
Proof. Assume that we have a set of tasks for which \(U_{\tau} = 1\). From (5.8), it can be shown that any decrease in \(P^{\tau}_{max}\) will cause the tci to increase and thus cause \(U_{\tau}\) to exceed 1. If we can show that when \(U_{\tau} = 1\) the optimal schedule must include a point where the average power \(P^{\tau}(t)\) equals \(P^{\tau}_{max}\), then this implies that the theorem must hold. Now, assume that we have a schedule S where tasks are scheduled such that each dwell transmission period txi is preceded by an idle time of tci, with the cool-down time for each dwell beginning exactly at the end of the previous dwell's transmission. Now let \(P_s\) be the average power at the beginning of the cool-down period preceding a dwell transmission. It can be shown from (5.5) and (5.6) that if \(P_s < P^{\tau}_{max}\), then the \(P_{out}\) for that dwell must satisfy \(P_s < P_{out} < P^{\tau}_{max}\), as shown in Figure 5.5, due to the fact that the cooling rate is proportional to the current average power. This implies that at the end of each transmit period for each successive dwell, the average power will increase until it converges to \(P^{\tau}_{max}\). This means that in the steady state, the average power will be \(P^{\tau}_{max}\) at the end of the transmission period of every dwell. The schedule S must be optimal, since moving a dwell any sooner would result in an increase in \(P_{in}\) for that dwell and thus increase \(P_{out}\) as well (exceeding \(P^{\tau}_{max}\)). Moving a dwell any later would trade off the efficient cooling immediately after the transmission, when the average power is at \(P^{\tau}_{max}\), for less efficient cooling before the transmission, resulting in a violation after the next dwell. This shows that the schedule S must be optimal and that it must have a point where the average power is equal to \(P^{\tau}_{max}\).
Based on (5.9), we model the short-term average power constraint in the Q-RAM optimization framework by treating power as a pseudo-resource with a maximum value of 1 and treating each radar task as if it consumes \(\frac{t_{ci} + t_{xi}}{T_i}\) units of that resource, with tci computed using (5.8). Hence, the expression in (5.9) is also referred to as the cool-down utilization \(U_c\) of the system.
It is interesting to note that if we take the limit \(\tau \to \infty\) in Equation (5.8), it can be shown that

\[ t_{ci} = \left( \frac{A_i}{P^{\tau}_{max}} - 1 \right) t_{xi}. \tag{5.10} \]
If we then substitute the above into (5.9), we obtain:

\[ U_{\tau=\infty} = \frac{1}{P^{\tau}_{max}} \sum_{i=1}^{N} \frac{A_i t_{xi}}{T_i}. \tag{5.11} \]

We see that this equation has exactly the same form as the long-term power utilization given in Equation (5.3).
Computational Resource
In addition to the radar resource, each track requires computing resources to process the
radar data, and to predict the next location of the target. The computing resources required
depend on the tracking algorithm Πi used, and the period Ti. We assume that the required
CPU is of the form CΠi/Ti where CΠi is the coefficient representing the computational cost
of algorithm Πi in each time period Ti. If we treat the back-end multiprocessor system as
a single resource, then we have the CPU constraint:
\[ \sum_{i} C_{\Pi_i} / T_i \le C_{max}, \tag{5.12} \]
where Cmax represents the total processing power of the bank of processors. This abstraction
is reasonable as long as the amount of processing required by each of the individual tasks
is small compared with the amount available on each of the processors.
5.3.3 Radar QoS Model
In [32], we developed a Q-RAM model for the radar tracking problem. There are two
principal QoS dimensions in the quality space of the radar tracking problem: tracking error
and search quality.
Tracking Error
This is the difference between the actual position and the tracked position of the target.
Although one cannot know the true tracking error, many tracking algorithms yield a precision estimate for a particular tracking result. As mentioned in Section 5.3, this tracking precision
is dependent on the availability of the physical resources in addition to the computing re-
sources. A smaller tracking error leads to better tracking precision and hence better quality
of tracking. Therefore, we assume that the tracking quality qtrack is inversely dependent on
tracking error ε, as given by:
\[ q_{track} = \frac{1}{\varepsilon}. \tag{5.13} \]
Search Quality
We also define a QoS parameter for the search task. A search task must span the entire
angle in order to find the targets. It consists of multiple dwells (radar beams) to search
a particular angular space. Hence, the searching QoS increases with the increase in the
number of beams within a fixed angular space.
Reliability
This is the probability that there is no hardware/software failure in a specified time interval.
Higher reliability of a task is obtained by replicating resources, such as using two radars
to track a single target. Since we handle the use of replicas in Chapter 3, we will consider tracking error and search quality as the only QoS dimensions here.
Next, we list the operational and environmental dimensions of the system.
Operational Dimensions
In our tracking model, the operational dimensions are the dwell period (Ti), the dwell time
(txi), the dwell power (Ai), and the choice of the tracking algorithm Πi.
The above parameters can be controlled by the system designer or the optimizer in order
to achieve the desired quality of tracking of a target.
Environmental dimensions
The environmental dimensions we consider are the type of target ξi (e.g., airplane, helicopter, missile, etc.), the distance of the target from the radar ri, the velocity vector of the target \(\vec{v}_i\), the acceleration vector of the target \(\vec{a}_i\), the active noise or the presence of electro-magnetic interference as counter-measures ni, and the angular location of the target in the sky.
Higher tracking quality yields higher utility. Hence we assume that the utility of tracking
a target for a certain quality qtrack is given by the following concave exponential function:
\[ U(q_{track}) = W_{track}(1 - e^{-\beta q_{track}}), \tag{5.20} \]
where β is a parameter specific to the ranges of the speeds of three different types of
targets (airplane, missile, or helicopter). Equation 5.20 assumes that the utility increases with tracking precision and ultimately saturates at a very high precision [49]. The parameter Wtrack is a weight factor that determines the importance of the target and is also dependent on the type of the target. Moreover, it is assumed to be proportional to the speed of the target and inversely proportional to its distance. We assume a weight
factor Wtrack of the form:
\[ W_{track} = K_t \left( \frac{v_i}{r_i + K_r} \right), \tag{5.21} \]
providing an estimate of the importance of a particular target. The Kt and Kr terms
represent the importance based on the target type, and the right-most term represents the
time-to-intercept (i.e., the time that would be required for a target to reach the ship if flying
directly toward it).
The objective of our optimization is to allocate resources to each tracking process such
that the total utility is maximized. From our stated assumptions on tracking precision,
Figure 5.6: Resource Management Model of Radar Tracking System (a task profiler feeds the Q-RAM QoS optimizer; the resulting allocation passes through scheduler admission control, with a utilization-bound adjustment loop back to the optimizer)
quality and utility, we obtain an expression for utility as a function of the tracking error,
\[ U(\varepsilon) = W_{track}(1 - e^{-\gamma/\varepsilon}), \tag{5.22} \]
where γ is a function of β and the relation between quality and tracking error. The required
values of operational dimensions needed to obtain a particular value of tracking error from
(5.14) can be translated into the resource usages.
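As an illustration of the utility model in (5.21) and (5.22), here is a small sketch with made-up parameter values (the function names and defaults are ours):

```python
import math

def track_weight(v, r, K_t=1.0, K_r=1.0):
    # W_track from (5.21): grows with target speed v, shrinks with range r.
    # K_t and K_r depend on target type; the values here are illustrative.
    return K_t * (v / (r + K_r))

def track_utility(eps, w_track, gamma=1.0):
    # U(eps) from (5.22): concave in the precision 1/eps, saturating at
    # W_track as the tracking error eps goes to zero.
    return w_track * (1.0 - math.exp(-gamma / eps))

w = track_weight(v=300.0, r=10_000.0)  # a fast, close target gets more weight
```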
5.4 Resource Management in Phased Array Radar
Since radar systems are very dynamic, with a constantly changing environment, it is necessary for the radar to continuously redistribute its resources among the tasks. The resource allocation process needs to be repeated at regular intervals; hence, its efficient execution is critical for the scheme to be of practical use. Our proposed radar resource management
approach consists of 3 main steps: (1) QoS-based resource allocation, (2) resource scheduler
admission test, and (3) utilization bound adjustment. These steps may need to be repeated
more than once in order to obtain a near-optimal solution. We next describe these three
steps.
1. QoS-based Resource Allocation: Basic Q-RAM optimization maximizes the
global utility of the system by allocating the resources to the tasks. We use the
input: schedulability fails or Ub needs adjustment
/* Ub = present utilization bound, Up = previous utilization bound,
   Umax = upper level of bound, Umin = lower level of bound */
if schedulability fails then
    Umax ← Ub
    Un ← (Umax + Umin)/2          // Un = next utilization bound
    if previous schedule was successful and (Un − Up)/Un < 0.1% then
        switch to previous schedule
        Ub ← Up
        return 1                   // previous schedule is selected
    else
        Ub ← Un                    // utilization bound is reduced
        Up ← Ub
        return 0                   // return to Q-RAM
    end
else
    Umin ← Ub                      // successful schedule
    Un ← (Umax + Umin)/2
    if (Un − Ub)/Un < 0.1% then
        return 1                   // current schedule is selected
    else
        Ub ← Un                    // utilization bound is increased
        Up ← Ub
        return 0                   // return to Q-RAM
    end
end

Algorithm 10: Utilization Bound Adjustment
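Algorithm 10 can be sketched as a simple binary search; in the sketch below, `is_schedulable` stands in for a full Q-RAM reallocation plus scheduler admission test at the candidate bound, and the convergence test uses a single absolute tolerance rather than the separate success/failure branches of the pseudocode:

```python
def adjust_utilization_bound(is_schedulable, u_min=0.0, u_max=1.0, precision=1e-3):
    # Binary search for the largest utilization bound that still yields a
    # schedulable task set, in the spirit of Algorithm 10.
    best = None
    ub = u_max
    while u_max - u_min >= precision:
        if is_schedulable(ub):
            best = ub      # successful schedule: raise the lower bracket
            u_min = ub
        else:
            u_max = ub     # schedulability failed: lower the upper bracket
        ub = (u_max + u_min) / 2.0
    return best

# Toy admission test that accepts any bound up to 0.7: the search
# converges to just under 0.7.
bound = adjust_utilization_bound(lambda u: u <= 0.7)
```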
current snapshot of the sky at a particular instant during which the environmental
dimensions of the objects are constant. Next, we generate a profile for each target from the values of its environmental parameters and the chosen ranges of values of the operational parameters. Profiles are used in the optimization process to provide
resource allocations for the tasks. The resulting resource allocation does not always
guarantee schedulability. This is due to the non-preemptive nature of the radar front-end tasks, which requires us to perform a sophisticated scheduler admission test to determine schedulability.
2. Scheduler Admission Test: The resource scheduler takes the results of the Q-RAM
resource allocations, interleaves the tasks and then runs the schedulability test. If the
task set is not schedulable, we reduce the utilization bound of the radar and return to Step 1 in order to produce a schedulable task-set.
3. Utilization Bound Adjustment: This function reduces the utilization bound if the
interleaved tasks are not schedulable, or increases the utilization bound if they are
schedulable. Thus, it searches for the maximum utilization bound for a schedulable
task-set using a binary search technique. This is described in Algorithm 10.
The entire resource allocation process iteratively searches for the best possible utilization bound. It stops when it reaches a schedulable task-set and the utility values from two successive iterations differ by only a small value (such as 0.1%), called the “utility precision factor”. This is detailed in the flow-chart of Figure 5.6.
5.5 Resource Allocation with Q-RAM
As we recall, the Q-RAM optimization involves the following steps:
• Generate set-points for each task.
• Construct the concave majorant of its set-points.
• Merge all the set-points of all tasks based on their marginal utility values.
• Traverse the sorted list of set-points to generate resource allocations to the tasks.
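The concave-majorant step above can be implemented with a standard monotone-chain upper-hull sweep; a minimal sketch over (compound resource, utility) pairs (the tuple representation and function name are ours):

```python
def concave_majorant(points):
    # Upper concave hull of (resource, utility) set-points via a
    # monotone-chain sweep; O(L log L) from the sort.
    pts = sorted(set(points))
    hull = []
    for x, y in pts:
        # Pop the last hull point while it lies on or below the chord from
        # its predecessor to the new point (it would break concavity).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append((x, y))
    return hull
```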
The basic Q-RAM optimization requires that each task explicitly provide the list of all
possible set-points. The concave majorant is determined on these input set-points next. We
know that the best-known algorithm for computing the exact concave majorant of L set-
points is O(L log L). Even though it is a relatively benign complexity, it has two drawbacks.
• We need to generate all possible set-points of a task and use them in determining the
penalty vector of the resources before we determine their concave majorant.
• The computational complexity of the concave majorant operation can be prohibitively
expensive when the number of set-points is large even when the number of output set-
points it generates is much smaller.
Since an application with d operational dimensions and p index values per dimension has a total of \(l = p^d\) set-points, the number of set-points can quickly become unmanageable
when there are a large number of operational dimensions. In the following sections, we
describe algorithms that traverse the set-point space generating the subsets of set-points
that are likely to lie on the concave majorant and thus eliminate the requirement of enu-
merating all possible set-points. For simplicity, we will first assume all tasks have only
monotonic operational dimensions. Later we discuss the general case in which some tasks
have operational dimensions that are non-monotonic.
5.5.1 Slope-based Traversal (ST)
This is the simplest approach to the traversal process. Let the minimum set-point for a task τi, for which all operational dimensions are monotonic, be defined as \(\vec{\Phi}^{min}_i = \langle 1, \ldots, 1 \rangle\), and let the maximum set-point be defined as \(\vec{\Phi}^{max}_i = \langle \phi^{max}_{i1}, \ldots, \phi^{max}_{iN^{\Phi}_i} \rangle\). Clearly, all of the set-points in the utility/compound-resource space that lie below a “terminating” line from \((u(\vec{\Phi}^{min}_i), h(\vec{\Phi}^{min}_i))\) to \((u(\vec{\Phi}^{max}_i), h(\vec{\Phi}^{max}_i))\), as shown in Figure 5.7, cannot be on the
Figure 5.7: Slope-Based Traversal of Concave Majorant (utility vs. compound resource)
concave majorant. These points can be eliminated immediately without being passed on
to the concave majorant step. We call this heuristic “slope-based traversal” (ST). While
this heuristic can reduce the time to compute the concave majorant by a constant factor, it
must still scan all of the set-points to determine if they are above or below the terminating
line.
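As an illustration, the terminating-line filter can be sketched as follows (a minimal sketch in Python; the `(h, u)` point representation and the function name are our own, not from the radar implementation):

```python
def slope_filter(points):
    """Slope-based traversal (ST): drop set-points strictly below the
    'terminating' line from the minimum to the maximum set-point.
    Each point is an (h, u) pair: compound resource and utility."""
    pts = sorted(points)                 # sort by compound resource h
    (h0, u0), (h1, u1) = pts[0], pts[-1]
    slope = (u1 - u0) / (h1 - h0)
    # Keep a point only if it lies on or above the terminating line.
    return [(h, u) for (h, u) in pts if u >= u0 + slope * (h - h0)]

# The minimum and maximum set-points always survive; interior points
# below the line are eliminated before the concave-majorant step.
kept = slope_filter([(0.0, 0.0), (0.5, 0.2), (0.5, 0.8), (1.0, 1.0)])
```

As the section notes, this still scans every set-point once; it only shrinks the input to the concave majorant step by a constant factor.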
5.5.2 Fast Set-point Traversals
We now consider a set of fast traversal heuristics that do not require computations on all
of the set-points. We (temporarily) assume that all operational dimensions are monotonic.
A key observation we make is that when the actual concave majorant is generated using
all of the set-points for typical tasks, the concave majorant tends to consist of sequences
of set-points that vary in only one dimension at a time with occasional jumps between
sequences of points. This insight suggests that we can use local search techniques to follow
the set-points up the concave majorant. We also know that $\vec{\Phi}^{min}_i$ will always be the first point on the concave majorant, and $\vec{\Phi}^{max}_i$ will always be the last. The methods presented
5.5. Resource Allocation with Q-RAM 129
Figure 5.8: Incremental Traversal (set-point sequences $\langle 1,1,* \rangle$, $\langle 1,*,5 \rangle$, and $\langle *,7,5 \rangle$ along the outer envelope; utility vs. composite cost)
here differ primarily in the method used to perform the local search.
As an example, consider a task with three operational dimensions. If we consider the subset of the set-points $\langle 1, 1, * \rangle$ consisting of all the set-points for which dimensions 1 and 2 have index value 1, these points will tend to form a line as shown in Figure 5.8. The concave majorant will tend to follow such a line until it switches to some other line, in this case $\langle 1, *, 5 \rangle$ followed by $\langle *, 7, 5 \rangle$.
While the fast traversal heuristics presented in this section are not guaranteed to find the exact concave majorant, in Section 5.8 we will show that these heuristics produce very good approximations to the concave majorant in our radar QoS optimization and, more importantly, that the drop in system utility from using the approximations is negligible.
First-Order Fast Traversal (FOFT)
In first-order fast traversal (FOFT), we keep a current point $\vec{\Phi}_i$ for each task $\tau_i$, which we initialize to $\vec{\Phi}^{min}_i$. We then compute the marginal utility for all the set-points adjacent to $\vec{\Phi}_i$. A set-point is adjacent if all of its index values except for one are identical, and the one that differs varies by only one (i.e., they have a Manhattan distance of one). We, in fact, need only consider positive index-value changes. We then choose the point that has the highest marginal utility, add it to the concave majorant, and make that point the current point. Formally, if $\vec{\Phi}_i$ is the current point, we choose the next current point $\vec{\Phi}'_i = \vec{\Phi}_i + \vec{\Xi}_j$, where $j$ maximizes the marginal utility:
$$\frac{u(\vec{\Phi}_i + \vec{\Xi}_j) - u(\vec{\Phi}_i)}{h(\vec{\Phi}_i + \vec{\Xi}_j) - h(\vec{\Phi}_i)}, \qquad (5.23)$$
and where $\vec{\Xi}_j$ is a vector that is zero everywhere except in dimension $j$, where it is equal to 1. We repeat this step until we reach $\vec{\Phi}^{max}_i$. After we have generated this set of points, the resulting curve may not be a concave majorant. Hence, we perform a final concave majorant operation (albeit on a much smaller number of points than before).
The number of set-points generated before the final concave majorant step will be the Manhattan distance between $\vec{\Phi}^{min}_i$ and $\vec{\Phi}^{max}_i$, which is $\sum_{j=1}^{N^{\Phi}_i} (\phi^{max}_{ij} - \phi^{min}_{ij})$. Ignoring boundary conditions, at each point we only consider $N^{\Phi}_i$ possible next set-points. This means that when we have $d$ dimensions and $k$ index levels per dimension, the complexity of this algorithm is $O(kd^2)$. If we include the complexity of the concave majorant determination, we have a total complexity of $O(kd^2 + kd \log kd)$.
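The FOFT loop can be sketched as follows (a hedged sketch: `u` and `h` stand in for the utility and compound-resource functions over index vectors, and we assume `h` strictly increases with every index increment; the toy profile at the end is purely illustrative):

```python
def foft(u, h, phi_max):
    """First-Order Fast Traversal: greedily follow the adjacent set-point
    with the highest marginal utility (Eq. 5.23) from the minimum to the
    maximum set-point. u(phi), h(phi): utility and compound resource."""
    phi = tuple(1 for _ in phi_max)          # Phi_min = <1, ..., 1>
    path = [phi]
    while phi != tuple(phi_max):
        best, best_mu = None, None
        for j in range(len(phi)):            # one +1 step per dimension
            if phi[j] >= phi_max[j]:
                continue
            nxt = phi[:j] + (phi[j] + 1,) + phi[j + 1:]
            mu = (u(nxt) - u(phi)) / (h(nxt) - h(phi))  # marginal utility
            if best_mu is None or mu > best_mu:
                best, best_mu = nxt, mu
        phi = best
        path.append(phi)
    return path   # a final concave-majorant pass over `path` would follow

# Toy profile: additive utility, quadratic resource cost per dimension.
u = lambda p: sum(p)
h = lambda p: sum(x * x for x in p)
path = foft(u, h, (3, 3))
```

The path length matches the Manhattan-distance bound given above: with $\vec{\Phi}^{min} = (1,1)$ and $\vec{\Phi}^{max} = (3,3)$, the traversal visits $1 + \sum_j(\phi^{max}_j - \phi^{min}_j) = 5$ points.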
5.5.3 Higher-Order Fast Traversal Methods
We can generalize the FOFT algorithm to an $m$-step, $p$-order fast traversal algorithm as follows. Just as in the FOFT heuristic, initialize the current point $\vec{\Phi}_i$ to $\vec{\Phi}^{min}_i$. Then choose the next point $\vec{\Phi}'_i = \vec{\Phi}_i + \vec{Z}$, where $\vec{Z} \in G^p_m$ such that the marginal utility is maximized, and $G^p_m$ is defined as:
$$G^1_m = \bigcup_{1 \le j \le N^{\Phi},\, 1 \le k \le m} \{k\vec{\Xi}_j\}, \qquad (5.24)$$
$$G^p_m = \{\vec{X} + \vec{Y} : \vec{X} \in G^1_m,\ \vec{Y} \in G^{p-1}_m,\ \vec{X} \cdot \vec{Y} \equiv 0\} \cup G^{p-1}_m. \qquad (5.25)$$
That is, we look at all of the next set-points that can be reached from the current set-point by increasing up to $p$ dimensions by up to $m$ steps each. The FOFT algorithm described above then corresponds to $G^1_1$. As with FOFT, we need to perform a final concave majorant operation on the points generated by this heuristic. As we observe, if we let $m$ take as large a value as possible, the procedure becomes a standard concave majorant operation.
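One way to enumerate the candidate step vectors $G^p_m$ of Equations (5.24) and (5.25) is sketched below (an illustrative enumeration of our own; by allowing a step size of zero in a chosen dimension, the lower-order sets $G^{p-1}_m$ are included automatically):

```python
from itertools import combinations, product

def steps(n_dims, p, m):
    """Candidate step vectors G^p_m: increase up to p dimensions,
    each by up to m index levels, returned as a set of tuples."""
    out = set()
    for dims in combinations(range(n_dims), p):      # choose p dimensions
        for sizes in product(range(0, m + 1), repeat=p):
            z = [0] * n_dims
            for d, s in zip(dims, sizes):
                z[d] = s                             # size 0 covers G^{p-1}_m
            if any(z):                               # exclude the zero vector
                out.add(tuple(z))
    return out
```

For example, `steps(2, 1, 1)` yields exactly the two FOFT steps, while `steps(2, 2, 2)` yields the eight non-zero vectors with entries in {0, 1, 2}.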
5.5.4 Non-Monotonic Dimensions
The fast traversal algorithms described above assume that all of the operational dimensions
are monotonic. Unlike monotonic dimensions, non-monotonic operational dimensions gen-
erally do not have a structure that can be easily exploited. For example, the choice of a
coding scheme for a video, the choice of a route in a networked system, or the choice of a
tracking algorithm in a radar system can be considered to be non-monotonic.
Suppose that some of the operational dimensions are non-monotonic. Then, for every
combination of the index values of the non-monotonic dimensions, we simply apply the fast
traversal algorithms to the subset that is monotonic. We then form the union of all these
results and apply a concave majorant. In the worst case, when a task has only non-monotonic dimensions, this simply reduces to a full concave majorant operation. For example, in a
radar tracking system, we can apply fast-traversal methods for each of the three tracking
algorithms separately, and then merge all three results and perform a concave majorant
operation.
If there is a large number of non-monotonic operational dimensions, we apply smart
heuristics to guess the best possible values of those dimensions to perform the same traversal.
How this is done depends on the characteristic of a particular system and the influence of
the dimensions on the resource requirements of the task.
5.5.5 Complexity of Traversal
Since we linearly traverse the points and do not examine a point more than once, the worst-case complexity of these schemes is $O(L)$ instead of $O(L \log L)$. However, the cost is likely to be much smaller on average, since we go through only a small number of points. We discuss the experiments related to these techniques in detail in Section 5.8.
5.5.6 Discrete Profile Generation
In certain situations, even efficient profile generation at run-time can take too long. An alternative is to generate the profiles off-line, but the profile space has multiple dimensions with wide ranges; off-line computation and storage of full profiles would therefore require exponentially large space and become unwieldy. The approach we adopt
is to quantize each continuous environmental dimension into a collection of discrete regions.
We then only need to generate a number of discrete task profiles offline for a variety of
environmental conditions. At run-time, we simply map each task into one of the discrete
profiles. Any quantization carried out must be such that (1) the storage needs of the discrete
profiles are practical, and (2) there is no significant drop in the quality of the tracks (as
measured by the total system utility). The quantization along any dimension can employ
an arithmetic, a geometric or some other progression.
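As an illustration of such a quantization, a geometric progression over a continuous environmental dimension (distance, say) could be built as follows (a sketch with arbitrary ranges; the function names and parameter values are ours, not from the thesis):

```python
def geometric_bins(lo, hi, n):
    """Quantize [lo, hi] into n regions whose boundaries grow geometrically."""
    ratio = (hi / lo) ** (1.0 / n)
    return [lo * ratio ** k for k in range(n + 1)]

def region_index(x, bins):
    """Map a continuous environmental value to the discrete region used
    to select a pre-computed task profile at run-time."""
    for i in range(len(bins) - 1):
        if x < bins[i + 1]:
            return i
    return len(bins) - 2

bins = geometric_bins(30.0, 400.0, 8)   # e.g., target distance in km
idx = region_index(100.0, bins)
```

A geometric progression gives finer resolution at the low end of the range, where profile changes tend to matter more; an arithmetic progression would use uniform bin widths instead.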
5.6 Scheduling Considerations
Our model of each radar dwell task, as discussed in Section 5.2, consists of 4 phases: cool-down time ($t_c$), transmission time ($t_x$), waiting time ($t_w$), and receiving time ($t_r$), as shown
in Figure 5.9(a). The durations tx and tr are non-preemptive, since a radar can only perform
a single transmission or a single reception at a time. However, the tc of one task can be
Figure 5.9: Interleaving of Radar Dwells. (a) Dwell with cool-down time ($t_c$, $t_x$, $t_w$, $t_r$); (b) proper nesting example; (c) improper nesting example.
overlapped with tr or tw of another task, since the radar can cool down during the waiting
and the receiving period.
Considering the entire duration of a dwell (from transmission start to reception end) as
a non-preemptive job wastes resources and decreases the schedulability of the system [74].
Task dwells can be interleaved to improve schedulability. Dwells can be interleaved in two
ways: (1) properly nested interleaving and (2) improperly nested interleaving. An optimal
construction of interleaved schedules using a branch-and-bound method has been described
in [74] and [73].
In this thesis, we focus on fast and inexpensive construction of dwell interleavings in the
presence of dynamically changing task-sets. The interleavings that we construct may not
necessarily be optimal in the sense of [74], but they will be schedulable.
5.6.1 Proper Nesting of Dwells
Two dwells are said to be properly nested if one dwell fits inside the waiting time ($t_w$) of another. Figure 5.9(b) demonstrates this situation, in which dwell $W_2$ fits in the waiting time of dwell $W_1$. The necessary condition for this interleaving is given by
$$t_{w_{W_1}} \ge t_{c_{W_2}} + t_{x_{W_2}} + t_{w_{W_2}} + t_{r_{W_2}}. \qquad (5.26)$$
input : n > 1
nv ← n; // n = number of input tasks, nv = number of virtual tasks
Create a sorted list of the tasks in increasing order of (tc + tx + tw + tr);
Create a sorted list of the tasks in increasing order of tw;
while 1 do
    if nv > 1 then
        Choose the task τa with the smallest tc + tx + tw + tr;
        Find the task τw with the smallest possible tw that can properly nest τa in its tw;
        if no task τw is found then
            Break from the loop;
        else
            Fit τa inside τw by proper nesting (Figure 3) to form a single virtual task;
            Remove the original two tasks from the sorted lists and insert the new virtual task into them;
            nv ← nv − 1;
        end
    else
        Break from the loop;
    end
end

Algorithm 11: Proper Nesting Algorithm
We define a phase offset for a proper interleaving as given by:
$$o_p = t_{w_{W_1}} - (t_{c_{W_2}} + t_{x_{W_2}} + t_{w_{W_2}} + t_{r_{W_2}}). \qquad (5.27)$$
For instance, we can schedule the cool-down time of the dwell $W_2$ right after the transmission time of $W_1$. Thus, the value of the phase offset determines how tightly two nested tasks fit together. Our aim is to minimize this offset.
The proper nesting procedure is detailed in Algorithm 11. The core of the scheme deals
with fitting a dwell of the smallest size into a dwell with the smallest feasible waiting time.
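The proper-nesting condition (5.26) and offset (5.27) are simple to check for a pair of dwells (a sketch assuming each dwell is represented as a `(tc, tx, tw, tr)` tuple, a representation we introduce here for illustration):

```python
def can_properly_nest(inner, outer):
    """Eq. (5.26): dwell `inner` fits entirely inside the waiting
    time tw of dwell `outer`. Dwells are (tc, tx, tw, tr) tuples."""
    tc, tx, tw, tr = inner
    return outer[2] >= tc + tx + tw + tr

def proper_offset(inner, outer):
    """Phase offset of Eq. (5.27); smaller means a tighter fit."""
    tc, tx, tw, tr = inner
    return outer[2] - (tc + tx + tw + tr)
```

Algorithm 11 pairs the dwell with the smallest total duration against the candidate outer dwell with the smallest feasible waiting time, which keeps this offset small.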
5.6.2 Improper Nesting of Dwells
Two dwells are said to be improperly nested when one dwell only partially overlaps with another (e.g., as illustrated in Figure 5.9(c)). Suppose that task $W_1$ is improperly interleaved with task $W_2$, where $W_1$ starts first. Task $W_1$ is called the leading task and task $W_2$ is called the trailing task. Based on the phasing illustrated in Figure 5.9(c), the necessary conditions for the interleaving to occur are given by
$$t_{w_{W_1}} \ge t_{c_{W_2}} + t_{x_{W_2}}, \qquad (5.28)$$
$$t_{c_{W_2}} + t_{x_{W_2}} + t_{w_{W_2}} \ge t_{w_{W_1}} + t_{r_{W_1}}. \qquad (5.29)$$
We define a phase offset for this case by
$$o_i = t_{c_{W_2}} + t_{x_{W_2}} + t_{w_{W_2}} - (t_{w_{W_1}} + t_{r_{W_1}}). \qquad (5.30)$$
Our improper nesting scheme is given in Algorithm 12. It starts with the task with
the largest waiting time (tw), and attempts to interleave it with the task with the largest
possible tw that is smaller than that of the original task and satisfies the conditions stated
in Equations (5.28) and (5.29). The algorithm repeats the process until it reaches the task
with the smallest tw that can no longer be interleaved, or all tasks are interleaved to form
a single virtual task.
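Similarly, the improper-nesting conditions (5.28) and (5.29) and the offset (5.30) can be checked as follows (a sketch assuming each dwell is a `(tc, tx, tw, tr)` tuple, a representation introduced for illustration):

```python
def can_improperly_nest(lead, trail):
    """Eqs. (5.28)-(5.29): `trail` (W2) can partially overlap the
    leading dwell `lead` (W1). Dwells are (tc, tx, tw, tr) tuples."""
    tc1, tx1, tw1, tr1 = lead
    tc2, tx2, tw2, tr2 = trail
    return tw1 >= tc2 + tx2 and tc2 + tx2 + tw2 >= tw1 + tr1

def improper_offset(lead, trail):
    """Phase offset of Eq. (5.30); Algorithm 12 prefers the pairing
    (leading or trailing) with the smaller offset."""
    tc1, tx1, tw1, tr1 = lead
    tc2, tx2, tw2, tr2 = trail
    return tc2 + tx2 + tw2 - (tw1 + tr1)
```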
5.6.3 Dwell Scheduler
The responsibilities of the radar dwell scheduler are as follows:
• Obtain the period and the dwell-time information (tc, tx, tw, tr) from Q-RAM for each
task.
• Interleave tasks with the same period using proper and/or improper nesting to create
a smaller number of virtual tasks.
input : Set of tasks with n > 1
output: Modified set of virtual (improperly interleaved) tasks with nv ≥ 1
nv ← n; // n = number of input tasks, nv = number of virtual tasks
Sort the list of tasks in increasing order of tw;
while nv > 1 do
    Start with the task τw with the biggest tw;
    while A task is found do
        Find a task τwn with the biggest possible tw smaller than that of τw that can be the leading task in improper nesting with τw;
        if τwn is found then
            Compute the nesting offset as on;
        end
        Find a task τwi with the biggest possible tw smaller than that of τw that can be the trailing task in improper nesting with τw;
        if τwi is found then
            Compute the nesting offset as oi;
        end
        if Both τwn and τwi are found then
            if on < oi then
                Merge τw and τwn by improper nesting with τwn as the leading task;
            else
                Merge τw and τwi by improper nesting with τwi as the trailing task;
            end
            nv ← nv − 1;
            Remove the merged two tasks from the sorted list and insert the new virtual task into it;
        else if Only τwn is found then
            Merge τw and τwn by improper nesting with τwn as the leading task;
            nv ← nv − 1;
            Remove the merged two tasks from the sorted list and insert the new virtual task into it;
        else if Only τwi is found then
            Merge τw and τwi by improper nesting with τwi as the trailing task;
            nv ← nv − 1;
            Remove the merged two tasks from the sorted list and insert the new virtual task into it;
        else
            Go to the task with the next lower tw;
        end
    end
end

Algorithm 12: Improper Nesting Algorithm
• Perform a non-preemptive schedulability test for the virtual tasks.
Next, we describe our schedulability test.
Schedulability Test
As mentioned earlier, in order to satisfy the jitter requirements, only relatively harmonic periods are used for the dwells³. We define the following terms:
• $N_i$ = number of tasks with a period $T_i$
• $C_{ij}$ = total run-time of the $j$th task among the tasks within the period $T_i$
• $N_T$ = total number of distinct periods
• $T_i < T_j, \forall i < j$ (periods are indexed in increasing order, so a smaller index means a higher priority)
The response time $t_{R_i}$ of the tasks for a given period $T_i$ is given by
$$t_{R_i} = \underbrace{\sum_{j=1}^{i-1} \left\lceil \frac{T_i}{T_j} \right\rceil \sum_{k=1}^{N_j} C_{jk}}_{\text{run-time of higher-priority tasks}} + \underbrace{\sum_{k=1}^{N_i} C_{ik}}_{\text{run-time of tasks with period } T_i} + \underbrace{B_i}_{\text{blocking term}}. \qquad (5.31)$$
The blocking term $B_i$ is defined as the maximum run-time $C_{mn}$ among tasks with lower priority:
$$B_i = \underbrace{\max(C_{mn}),\ \forall\, i < m \le N_T,\ 1 \le n \le N_m}_{\text{maximum task size among all tasks of lower priority}}. \qquad (5.32)$$
As already mentioned, each radar task (virtual or otherwise) is considered to be non-
preemptive under the schedulability test.
For a task-set to be schedulable, it must satisfy:
$$t_{R_i} \le T_i, \quad \forall\, 1 \le i \le N_T. \qquad (5.33)$$
³As we show in the next section, our model of the radar system does not show significant degradation in the accrued utility due to the restricted use of harmonic periods.
It must be remembered that, using nesting, we combine multiple tasks into a few virtual dwell tasks within each period. The run-time of a task is given by $C_{jk} = t_{c_{jk}} + t_{x_{jk}} + t_{w_{jk}} + t_{r_{jk}}$, where the parameters $t_{c_{jk}}$, etc., may be virtual parameters if the dwells are nested.
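The test of Equations (5.31) through (5.33) can be sketched as follows (a minimal sketch assuming periods are indexed in increasing order, so index 0 has the highest priority; the data layout is our own):

```python
import math

def schedulable(periods, runtimes):
    """Non-preemptive schedulability test of Eqs. (5.31)-(5.33).
    periods: harmonic periods T_1 < T_2 < ... < T_NT (increasing).
    runtimes[i]: list of (virtual) task run-times C_ik for period T_i."""
    NT = len(periods)
    for i in range(NT):
        # Interference from higher-priority (shorter-period) tasks.
        hp = sum(math.ceil(periods[i] / periods[j]) * sum(runtimes[j])
                 for j in range(i))
        own = sum(runtimes[i])                 # tasks sharing period T_i
        # Blocking term B_i: largest run-time among lower-priority tasks.
        lower = [c for m in range(i + 1, NT) for c in runtimes[m]]
        blocking = max(lower, default=0)
        if hp + own + blocking > periods[i]:   # violates Eq. (5.33)
            return False
    return True
```

Each `runtimes[i][k]` corresponds to a $C_{ik}$, which may itself be a virtual run-time produced by the nesting algorithms above.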
5.7 Experimental Configuration
Parameter      Type   Range
Distance       All    [30, 400] km
Acceleration   All    [0.001g, 6g]
Noise          All    [kTB_w, 10^3 kTB_w]^a
This shows that although the error is always < 100%, it can be made arbitrarily close to 1 when $\Delta r \gg R$.
Case 3: $(n_1 + n_2) r_1 \ge R > (n_1 r_1 + n_2 r_2)$:
This case represents the situation where sufficient resources are available to maximize
the QoS levels of all tasks of both types. The optimal utility is given by:
$$U_{opt} = (n_1 + n_2)u. \qquad (6.14)$$
Using the approximated classification, the obtained utility is given by:
$$U_s = n_2 u + \frac{R\, n_1}{(n_1 + n_2) r_1} u. \qquad (6.15)$$
The fractional utility loss is given by:
$$\varepsilon = \frac{n_1 \left[ (n_1 + n_2) r_1 - R \right]}{(n_1 + n_2)^2\, r_1}. \qquad (6.16)$$
Considering the lowest value of $R$ in this range, we obtain:
$$\varepsilon \le \frac{1 - \frac{r_2}{r_1}}{\frac{n_1}{n_2} + \frac{n_2}{n_1} + 2}. \qquad (6.17)$$
The worst case happens when r2 = 0 and n2 = n1. In this case, the value of error is 25%.
Based on the above results, we conclude that the approximated classification is applicable
where ∆r, the difference between maximum resource requirements of two tasks of two types,
is less than the capacity of the entire resource R, which is usually the case for most systems.
In addition, the worst case happens when the optimal utility value is infinitesimally small.
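The error expressions of this case can be sanity-checked numerically (a quick sketch; it assumes the factor $r_1$ in the denominator of the fractional-loss expression, which is what makes the bound follow at $R = n_1 r_1 + n_2 r_2$):

```python
def frac_utility_loss(n1, n2, r1, r2, R):
    """Fractional utility loss (Eq. 6.16) for Case 3,
    (n1 + n2) r1 >= R > n1 r1 + n2 r2.
    Assumption: denominator includes the factor r1."""
    return n1 * ((n1 + n2) * r1 - R) / ((n1 + n2) ** 2 * r1)

def bound(n1, n2, r1, r2):
    """Worst-case bound (Eq. 6.17), reached at R = n1 r1 + n2 r2."""
    return (1 - r2 / r1) / (n1 / n2 + n2 / n1 + 2)
```

At $r_2 = 0$ and $n_1 = n_2$ the bound evaluates to 1/4, matching the 25% worst case stated above, and the loss shrinks as $R$ grows toward $(n_1 + n_2) r_1$.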
In the next section, we will discuss in detail the design of our H-Q-RAM algorithm, which classifies tasks based on their average slope values, i.e., the slope of the line joining their end-points.
6.4 H-Q-RAM Algorithm Design
This algorithm is a more generalized version of Algorithm 4 described in Chapter 3. The
whole process is divided into 4 main parts: (1) Task classification, (2) Clustering, (3)
172 Chapter 6. Resource Allocation in Distributed Embedded Systems
Virtual task formation, and (4) Hierarchical resource allocation.
First, we classify tasks by ordering them in decreasing values of their average slopes.
Next, we create two groups and allocate tasks in each group alternatively. This way, the
average slopes of tasks in each group are similar. Once two groups are formed, we com-
pute the resource demand of each group and allocate processors and communication (bus)
bandwidth to them proportional to their demands. We then recursively divide each group
hierarchically until we allocate at most 2 processors per group.
6.4.1 Task Classification
Tasks are classified based on the average slopes of their utility functions. We make the
following assumptions.
• A task always needs some amount of processing resource.
• If two tasks communicate with each other, we can eliminate their communication bandwidth requirements by placing them on the same processor. Thus, unlike CPU bandwidth, communication bandwidth may not always be needed by a task. We assume a weight of 0.5 for network (sending) bandwidth and a weight of 1.0 for CPU bandwidth.
Based on the above observations, we express the resource requirement of task i in terms
of a 2-element resource vector consisting of communication (bus) bandwidth and processing
bandwidth at each QoS level1.
If $r_{c_j}$ is the processing resource requirement at QoS level $j$ and $r_{n_j}$ is the corresponding total communication (bus) bandwidth requirement with other tasks, we define a composite resource metric at level $j$ by:
$$H_j = \sqrt{r_{c_j}^2 + (0.5\, r_{n_j})^2}. \qquad (6.18)$$
¹We have yet to obtain resource dimensions for tasks, since clustering is not done. Hence these settings are not set-points in the pure sense.
Thus, we construct set-points of a task by computing the composite resource value at each
QoS level. Next, we compute the average slope of the utility function.
Definition 6.4.1 (Average Slope of a Task). The average slope of a task $i$ is given by the following expression:
$$s_i = \frac{U_{max} - U_{min}}{H_{max} - H_{min}}, \qquad (6.19)$$
where $H_{min}$ = composite resource at the lowest QoS level, $H_{max}$ = composite resource at the highest QoS level, $U_{min}$ = utility at the lowest QoS level, and $U_{max}$ = utility at the highest QoS level.
Tasks are sorted based on their average slopes. We can either sort the tasks in decreasing
order of their slopes, or perform a radix sorting [45], in which we can divide the slope range
into N discrete slots and fit the tasks into one of the slots.
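The classification step can be sketched as follows (a minimal sketch; the slot mapping and its parameters are illustrative assumptions, not values from the thesis):

```python
import math

def composite_resource(rc, rn, w_net=0.5):
    """Composite resource metric H_j (Eq. 6.18): CPU requirement rc
    combined with weighted network (sending) bandwidth rn."""
    return math.sqrt(rc ** 2 + (w_net * rn) ** 2)

def average_slope(levels, utils):
    """Average slope of a task (Eq. 6.19). `levels` is a list of
    (rc, rn) pairs ordered from lowest to highest QoS level."""
    h = [composite_resource(rc, rn) for rc, rn in levels]
    return (utils[-1] - utils[0]) / (h[-1] - h[0])

def slope_slot(s, n_slots=100, s_max=10.0):
    """Radix-style classification: fit a slope into one of n_slots
    discrete slots over an assumed slope range [0, s_max)."""
    return min(int(s / s_max * n_slots), n_slots - 1)
```

Sorting or radix-binning by `slope_slot` then groups tasks of similar marginal value per unit of composite resource, which is the notion of "class" used by the clustering step below.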
6.4.2 Clustering
First, we create two clusters and allocate tasks to them. The allocation is performed in such a way that tasks of the same class (i.e., with similar average slopes) are distributed in equal numbers between the clusters. Each cluster can again be divided into two more clusters, and thus the clustering process continues recursively until we reach a maximum threshold number of tasks or a maximum threshold amount of resources per cluster. The scalability and the accuracy of the solution depend on these thresholds.
In this chapter, for simplicity, we assume that the threshold for resources is 2 processors
per cluster. We do not assume any threshold number for tasks.
In order to distribute tasks and resources to the clusters, we require the following defi-
nitions.
Definition 6.4.2 (Mean Slope of a Cluster). The mean slope of a cluster is given by
the arithmetic mean of the average slopes of tasks present in the cluster.
Figure 6.4: Slope-based Task Clustering Procedure (division of tasks among groups/clusters; allocation of processors to groups)
Definition 6.4.3 (Average Demand of a Task). The average demand of a particular type of resource (processing bandwidth or communication bandwidth) of a task is given by:
$$r^{avg}_c = \frac{\sum_{j=1}^{N} r_{c_j}}{N}, \qquad (6.20)$$
where $N$ = the number of QoS levels of the task, and $r_{c_j}$ = the resource vector consisting of two components, CPU and network bandwidth, at the $j$th level.
Definition 6.4.4 (Resource Demand of a Cluster). The resource demand of a cluster
is given by the sum of the average demands of its tasks.
Task Clustering
At a particular stage of the clustering algorithm, we start with the task with the highest
slope, and allocate tasks to each cluster so that the mean slopes of the two clusters are
nearly equal.
In order to minimize the network resource requirements, we would like to allocate tasks
that communicate with each other in the same cluster. This may conflict with our goal of
balancing mean slopes on clusters. As mentioned before, we divide the slope range into a
number of discrete slots and fit each task into one of the slots. Within each slot, we sort the
tasks by their communication (sending) bandwidth requirements in increasing order. Next,
we select a cluster for each task. If both clusters have the same mean slope, we allocate the task to the cluster whose tasks have the largest total communication bandwidth with it. If the mean slopes are not equal, the task is allocated to whichever cluster better equalizes the mean slopes.
In this way, we ensure two aspects. First, the mean slopes of the clusters are equalized.
Second, the tasks that communicate with one another with larger communication bandwidth
requirements fall in the same cluster and may eventually be allocated to the same processor.
If two tasks are allocated to the same processor, their mutual communication bandwidth is
eliminated.
Resource Clustering
Once two clusters are formed, we apportion the processing and communication resources
based on the resource demands of the clusters. For example, if $R_{d_1}$ and $R_{d_2}$ are the total processing resource demands of the two clusters, and $P$ is the total number of processors, each of capacity $C$, then the resource allocations $R_{a_1}$ and $R_{a_2}$ are given by:
$$R_{a_1} = \frac{P C R_{d_1}}{R_{d_1} + R_{d_2}}, \qquad (6.21)$$
$$R_{a_2} = \frac{P C R_{d_2}}{R_{d_1} + R_{d_2}}. \qquad (6.22)$$
This resource allocation may lead to a fractional allocation of processors, which must
be managed while performing the scheduler admission test on each processor. In the same
way, we distribute the bus bandwidth between the clusters. The process of hierarchical
Figure 6.5: Virtual Task Creation Procedure
clustering for one iteration is illustrated in Algorithm 13.
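The proportional split of Equations (6.21) and (6.22) is straightforward to compute (a sketch; the demand and capacity values are illustrative):

```python
def apportion(demands, total_capacity):
    """Split a resource between clusters in proportion to their demands
    (Eqs. 6.21-6.22); may yield fractional processor shares."""
    total = sum(demands)
    return [total_capacity * d / total for d in demands]

# 16 processors of 2 GHz each, split between two clusters whose
# processing demands are 30 and 10 GHz-equivalents (illustrative).
shares = apportion([30.0, 10.0], 16 * 2.0)
```

The same function applies unchanged to the bus bandwidth; as noted above, fractional processor shares must then be handled by the per-processor scheduler admission test.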
6.4.3 QoS Optimization
So far, we have been able to divide the system into multiple independent subsystems called
clusters. In this step, we perform QoS optimization independently on each cluster. We will
compare this with basic Q-RAM optimization in which we directly model the entire system
without performing clustering. The optimization process is divided into two steps: Virtual
Task Formation and Resource Allocation.
Virtual Task Formation
As mentioned earlier, when a group of tasks communicate only with each other, their
network bandwidth requirements disappear when they are allocated to the same processor.
Therefore, the resource allocations of these tasks are mutually dependent. We form virtual
tasks by combining these tasks, as shown in Figure 6.5. Consequently, we generate profiles
of virtual tasks by enumerating their resource allocation in the cluster (or in the entire
system for basic Q-RAM).
Parameter                                          Value
Number of QoS dimensions q                         1
Number of elements of each dimension               3
Utility range for QoS dimension (u(q))             random [0.1, 1.0]
Weight range for each QoS dimension                random [0.01, 1.00]
CPU requirement for a task                         random [2 MHz, 200 MHz]
Network bandwidth requirement between two tasks    random [20 Kbps, 200 Kbps]
Number of communicating tasks for each task        random [1, 8]
Number of processors                               16
Resource capacity per processor                    2 GHz
Network bandwidth capacity of the bus              100 Mbps

Table 6.2: Experimental Settings with Optimal Algorithm
Resource Allocation
The resource allocation within each cluster follows the basic AMRMD CM algorithm, as men-
tioned in Chapter 2. We perform allocation in each cluster independently.
6.5 Experimental Results
In this section, we compare the performances of H-Q-RAM and Q-RAM optimizations. As
in previous chapters, our experiment focuses on measuring two parameters: (1) the global
utility obtained by the optimization, and (2) the total execution time of the algorithm.
We consider a distributed system consisting of 16 processors, each with a frequency of
2GHz, connected by a bus of bandwidth 100Mbps. The assumed configuration of the tasks
and that of the system are presented in Table 6.2.
In the case of Q-RAM optimization, we enumerate all possible choices of deployment of
tasks in the system in order to obtain the optimal result. In H-Q-RAM optimization, we
implement Algorithm 13 to divide the system into multiple subsystems or clusters, repeat
the clustering process until we have fewer than 3 processors per cluster, enumerate possible
choices of deployment of tasks within each cluster, and determine the near-optimal resource
allocation within each cluster independently.
Create 2 clusters;
Create a 3rd cluster; // this stores the odd-numbered task from each region
Linearly divide the slope range (0, ∞) into nth discrete regions; // nth = number of discrete regions, 100, for example
Fit the tasks into the regions based on the average slope values of their utility functions;
Classify a task to be of a type based on its presence in a region;
for Each slope region do
    Within the region, sort the tasks in decreasing order of their average transmission bandwidth requirements;
    if the number of tasks is odd then
        Put the last task in the 3rd cluster; // this takes the last odd-numbered task out of the next loop
    end
    for Each task in the region do
        Determine the proportion of communication bandwidth requirements for tasks already allocated in the 2 clusters;
        if Each cluster has an equal number of tasks of this type then
            Allocate the task to the cluster whose tasks have greater communication bandwidth with this task;
        else
            Allocate the task to the cluster that has fewer tasks of this type;
        end
    end
end
/* We would like to take tasks out of the 3rd cluster and put them into the first two */
for Each task in the 3rd cluster do
    if The mean slopes of the first 2 clusters are equal then
        Determine the proportion of communication bandwidth requirements for tasks already allocated in the 2 clusters;
        Allocate the task to the cluster that has more tasks communicating with this task;
    else
        Allocate the task to the cluster that balances the mean slopes of the 2 clusters;
    end
end

Algorithm 13: Clustering Algorithm for Communicating Heterogeneous Tasks
Figure 6.6: Utility Variation
We vary the number of tasks as N = 50, 100, . . . , 300, and measure the accrued utility and execution time for Q-RAM and H-Q-RAM. Each configuration is averaged over 50 iterations.
Figure 6.6 shows the bar-graph containing the variation of the obtained utility against
the number of tasks. From the figure, we notice that H-Q-RAM yields a utility very close
to that of Q-RAM. In fact, the maximum reduction in utility is less than 4%. In addition,
this drop decreases with an increase in the number of tasks, as shown in Figure 6.7.
We plot execution times for Q-RAM and H-Q-RAM in Figure 6.8. As expected, H-Q-RAM shows a large improvement in execution time. For example, the reduction in execution time for H-Q-RAM is 85% for 300 tasks. Moreover, the difference in execution times between the algorithms increases with the number of tasks in the system. This demonstrates the usefulness of H-Q-RAM for large distributed embedded systems.
Figure 6.7: Percentage Utility Reduction
Figure 6.8: Execution Time Variation
6.6 Chapter Summary
In this chapter, we investigated the QoS-based resource allocation problem in distributed
embedded systems. This is an extension of the resource allocation we discussed for mul-
tiprocessor systems in Chapter 3. However, we relaxed a few assumptions that we had
made in Chapter 3. First, tasks can communicate with each other. Therefore, we need to
consider allocating the network bandwidth (which is assumed to be bus bandwidth) along
with the processor cycles. Second, we did not assume any fixed set of types of tasks. In
other words, a task can have any possible profile within certain ranges of processor cycles
and network bandwidth requirements. In this case, in order to implement a similar hierar-
chical decomposition technique, we discretized profiles based on their average slopes. We
also minimized the usage of network bandwidth by clustering heavily communicating tasks
together as much as possible. This ensures that highly communicating tasks are likely to
be allocated to the same processor thereby eliminating the network bandwidth requirement
among themselves. The results also demonstrated that our H-Q-RAM is scalable enough
to be used as an adaptive run-time QoS optimizer for distributed embedded systems.
As future work, we would like to implement this as adaptive QoS-aware middleware in specific types of embedded systems, such as automotive systems. We would also like to integrate this approach with a design-time code-generation tool such as Time Weaver [21].
Chapter 7
Conclusion and Future Work
The fundamental motivation for this dissertation is the growing need for the development
of scalable resource management infrastructure for large, dynamic and distributed real-
time systems. Instead of maximizing the throughput of one or more resources, the goal
of our scheme is to maximize the satisfaction of the end-users. We consider traditional
distributed systems as well as embedded distributed systems that interact directly with the
physical environment, and hence operate under physical constraints. In all such systems,
the satisfaction of the end-users is the primary parameter that must be maximized.
Our goal was to address the complexity of resource management schemes that allocate
resources to a large number of tasks, perform their deployment in the system and ensure
their timing guarantees by interacting with the admission control of the scheduler. Since solving this problem optimally is NP-hard, we investigated heuristic solutions that scale well with the size of the system.
7.1 Contributions
The contribution of this dissertation can be divided into three major areas. First, we
designed a generic model of a distributed system consisting of resources and other physical
constraints. Secondly, we developed a set of algorithms that perform the QoS optimization
in a large system in a scalable manner while obtaining a global utility close to that of the
optimal algorithm. Finally, we designed and implemented a scheme that integrates our
QoS optimization model with the admission control mechanisms of resources for guaranteed
schedulability.
7.1.1 Modeling
We borrowed the existing model of resources and tasks from the QoS-based Resource Allocation Model (Q-RAM) [49]. In Q-RAM, a system is represented by a resource vector whose number of elements denotes the number of resources and whose element values denote the capacities of the corresponding resources. A task is represented by a set of QoS dimensions as user-level dimensions. Each QoS dimension is associated with a utility function.
A particular QoS level of a task contains a fixed value for each of the QoS dimensions.
Hence, each QoS level is associated with a utility that is a sum of the utilities obtained
from the individual QoS dimensions. We also defined a set of system-level dimensions that
influence the allocation of resources of a task. These include operational dimensions and
environmental dimensions.
Operational Dimensions: The operational dimensions are parameters under the control of the system administrator that influence the resource demands of an application. Some operational dimensions may be of direct relevance to the user in terms of quality, and hence can also serve as QoS dimensions. Examples of operational dimensions include resource deployment options, coding schemes for video applications, etc.
Environmental Dimensions: The environmental dimensions are parameters that are not under the control of the system administrator or the user. An example is the noise in a
wireless environment. Changes in the environmental conditions require us to re-optimize
the QoS of the system.
The system dimensions determine the resource requirements, the values of the QoS di-
mensions and the corresponding utility values. Combining all these dimensions, we generate
set-points of tasks, where each set-point consists of a utility value and a particular setting
of operational and environmental dimensions, which includes a QoS level and a resource
configuration.
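The task model above can be sketched as a small data structure. This is our illustration of the concepts, not code from Q-RAM; the class names, dimension names, and utility values are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SetPoint:
    qos: dict            # QoS dimensions, e.g. {"frame_rate": 30}
    operational: dict    # operational dimensions, e.g. coding scheme
    environmental: dict  # environmental dimensions, e.g. channel noise
    resources: dict      # resource configuration: resource -> demand
    utility: float       # sum of per-QoS-dimension utilities

@dataclass
class Task:
    name: str
    set_points: list = field(default_factory=list)

# A hypothetical video task with a low- and a high-quality set-point.
video = Task("video", [
    SetPoint({"frame_rate": 10}, {"codec": "mpeg1"}, {}, {"cpu": 0.1, "net": 1.0}, 2.0),
    SetPoint({"frame_rate": 30}, {"codec": "mpeg2"}, {}, {"cpu": 0.3, "net": 4.0}, 5.0),
])
best = max(video.set_points, key=lambda sp: sp.utility)
```

Each set-point thus bundles a QoS level, a resource configuration, and the utility the optimizer trades off against other tasks.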
7.1.2 Scalable QoS Optimization
Our QoS optimization algorithm chooses a set-point for each task and allocates resources
to tasks according to the requirements of their assigned set-points. We define the global
utility of the system as the sum of the utilities of the assigned set-points of tasks. The
optimization process maximizes the global utility.
We developed a basic algorithm of polynomial complexity called AMRMD CM as a modified
version of the AMRMD1 algorithm for Q-RAM [51]. AMRMD CM extends the functionality of the
basic algorithm by handling trade-offs more efficiently for tasks with multiple resource
deployment options. The complexity of the basic algorithm is O(nL log(nL)), where n
equals the number of tasks and L equals the maximum number of set-points per task.
Although this complexity is seemingly benign, it grows monotonically with both n and L, which can be problematic when either is very large. To manage this
complexity, we have developed a hierarchical decomposition technique, collectively called
Hierarchical Q-RAM or H-Q-RAM for large distributed systems.
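The greedy, marginal-utility idea behind such algorithms can be sketched for a single resource. This is a simplified sketch under the assumption of concave (diminishing-returns) set-point profiles, not the dissertation's AMRMD CM implementation; the O(nL log(nL)) term comes from the one global sort.

```python
def allocate(tasks, capacity):
    # tasks: {name: [(resource_demand, utility), ...]}, set-points sorted
    # by increasing demand. Returns the chosen set-point index per task
    # (-1 means the task gets no allocation).
    upgrades = []
    for name, sps in tasks.items():
        prev_r, prev_u = 0.0, 0.0
        for i, (r, u) in enumerate(sps):
            dr, du = r - prev_r, u - prev_u
            if dr > 0:
                upgrades.append((du / dr, dr, name, i))
            prev_r, prev_u = r, u
    choice = {name: -1 for name in tasks}
    used = 0.0
    # Apply upgrades in order of marginal utility per unit of resource;
    # sorting all n*L upgrades gives the O(nL log(nL)) complexity.
    for slope, dr, name, i in sorted(upgrades, reverse=True):
        if used + dr <= capacity and i == choice[name] + 1:
            used += dr
            choice[name] = i
    return choice

tasks = {"a": [(1.0, 3.0), (2.0, 4.0)], "b": [(1.0, 2.0), (3.0, 5.0)]}
choice = allocate(tasks, capacity=3.0)
```

With concave profiles, each task's upgrade slopes decrease, so the global sort never tries an upgrade before its predecessor.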
In multiprocessor systems, where a task can be allocated to any of the processors, we
divide the problem into multiple sub-problems, and solve these subproblems independently.
This is done by distributing the processors into near identical processor-groups, distributing
the tasks into near-identical task-clusters, assigning each task-cluster to each processor-
group to form near-identical subsystems, and finally performing the QoS optimization in
each of the subsystems concurrently.
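The decomposition step can be sketched as follows. Round-robin splitting is our simplification here; the dissertation's clustering is more careful, and the processor and task names are illustrative.

```python
def make_subsystems(processors, tasks, g):
    # Spread processors into g near-identical groups and tasks into g
    # near-identical clusters, then pair group i with cluster i so each
    # subsystem can be optimized concurrently and independently.
    groups = [processors[i::g] for i in range(g)]
    clusters = [tasks[i::g] for i in range(g)]
    return list(zip(groups, clusters))

subsystems = make_subsystems(list(range(8)), [f"t{i}" for i in range(6)], 2)
# Each of the 2 subsystems gets 4 processors and 3 tasks.
```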
A hierarchical networked architecture similar to the Internet consists of loosely connected
sub-domains. Each sub-domain can be considered a separate subsystem. However, if a task has its source and destination fixed in different sub-domains, the routes between them can span multiple sub-domains, and in such cases we cannot perform the QoS optimization of each sub-domain independently.
The resource allocation is very likely to be made locally within a sub-domain if the source
node and the destination node of a task both fall inside the same sub-domain. This type
of task is called a local task. A task whose source and destination nodes belong to different
sub-domains is called a global task. Sub-domains negotiate with each other using transaction
techniques to allocate resources (route and bandwidth) to a global task.
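The local/global distinction can be sketched directly; the node names and sub-domain membership below are illustrative.

```python
# Hypothetical mapping from node to its sub-domain.
domain_of = {"n1": "D1", "n2": "D1", "n3": "D2"}

def is_local(task):
    # A task is local when its source and destination nodes fall inside
    # the same sub-domain; otherwise it is global and requires an
    # inter-sub-domain transaction for its route and bandwidth.
    src, dst = task
    return domain_of[src] == domain_of[dst]

is_local(("n1", "n2"))   # local: handled within sub-domain D1
is_local(("n1", "n3"))   # global: spans D1 and D2
```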
For certain systems, the complexity arises from the size of L, i.e., the number of set-points per task. In such a system, a task may have a very large number of possible configurations; a typical example is a radar system. For this case, we have developed efficient algorithms that select only a few important set-points per task without enumerating all possible set-points. We studied the performance of these algorithms in
terms of the global utility and the execution time.
7.1.3 Integration of QoS Optimization and Scheduling
In this dissertation, we have presented an integrated approach that simultaneously maxi-
mizes overall system utility, performs task scheduling analysis and satisfies multi-resource
constraints in dynamic real-time systems such as a radar system. In our implementation
of a resource manager for a phased array radar system, we show that our approach is not
only efficient enough to be used on-line in real-time, but also performs within 10% of the
optimal solution. In this process, we develop efficient scheduling schemes for radar tracking
tasks that can generate high resource utilization of the radar by interleaving tasks with each
other.
7.2 Future Work
This dissertation analyzes the complexities associated with QoS-based resource management
in distributed systems, and outlines a scalable framework for it. This has opened up multiple directions for future research, from modest, incremental improvements to broader and more fundamental ones. We present these areas of future work below.
7.2.1 Implementation
We have a prototype implementation of a middleware that performs the QoS optimization
in a distributed networked system consisting of 12 nodes. A global server, known as the “Session Coordinator” (SesCo), runs on a single node and performs QoS-based resource allocation for
the entire system [40]. It enforces resource reservations by interacting with “Local Resource
Managers (LRMD)” running on individual hosts. LRMD, in turn, relies on the reservation
mechanisms of the real-time operating system Linux/RK running on individual hosts [62].
Following the principles of H-Q-RAM, we would like to extend this prototype by incorporating a distributed implementation of SesCo. In addition, large-scale networks can also be emulated in this test-bed for future research problems.
7.2.2 Stochastic QoS and Resource Requirements
In this dissertation, we have implicitly assumed deterministic resource requirements for tasks. Whether these requirements are worst-case or average-case estimates, the variation in the resource usage of tasks was not considered. If the resource requirement of a task changes, the current framework reruns the optimization to generate a new resource allocation. However, this may not be sufficient if the resource requirements of a task vary rapidly. In this case, our system may not meet the deadlines of all the tasks if it uses average-case
utilization, or it will be heavily underutilized if it uses worst-case resource utilization.
In Q-RAM, we currently have two types of Probabilistic Level of Service (PLoS) metrics
in the context of network bandwidth [40]: (a) QoS availability (fraction of time there
is no degradation) and (b) fraction of packets delivered (not dropped). The “Resource
Priority Multiplexing” (RPM) policy module and its kernel-level mechanisms implement
the probabilistic guarantees for network bandwidth [37].
Apart from these two PLoS metrics, there are other probabilistic QoS dimensions, such as the number of packets (or jobs) that meet their deadlines.
In this context, Zhu et al. designed a Quantized EDF (Q-EDF) [86] scheduling mechanism
that minimizes the number of deadline misses of tasks. Hence, if we want to consider the
deadline miss rate as a QoS dimension (or a PLoS metric), we would have to integrate
Q-EDF with the QoS optimization scheme. In our QoS optimization model, the criticality
can be considered a QoS dimension that determines the utility loss relative to the number
of deadline misses of a task. Based on the statistics of resource usage of the task, we would
like to determine the resource requirements of a task in order to obtain a specific deadline
miss rate. However, determining the resource requirement for a specific deadline miss rate
can be a difficult problem.
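One plausible heuristic, sketched here under our own simplifying assumptions, is to reserve the (1 − target miss rate) quantile of the observed per-job demand, so roughly that fraction of jobs fits within the reservation. This ignores queueing effects across jobs, which is part of why the exact problem is difficult; the function name and sample values are ours.

```python
import math

def budget_for_miss_rate(samples, target_miss_rate):
    # Reserve enough resource to cover a (1 - target_miss_rate) fraction
    # of the observed per-job demands (an empirical quantile).
    ordered = sorted(samples)
    k = max(0, math.ceil((1.0 - target_miss_rate) * len(ordered)) - 1)
    return ordered[k]

usage = [2, 3, 3, 4, 4, 4, 5, 6, 8, 12]   # hypothetical per-job demands
budget = budget_for_miss_rate(usage, 0.10)  # covers ~90% of jobs
```

The gap between this quantile (8) and the worst case (12) illustrates the over-reservation that worst-case budgeting would incur.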
7.2.3 Profit Maximization Model for Resource Allocation
Our QoS-based resource allocation model maximizes the global utility of the system by apportioning fixed quantities of resources among a set of tasks. In this case, we maximize the
utilization of resources as well, since more resource usage generally provides more utility
to the end-users. Hence, we optimize our system toward maximizing benefits of the end-
users under the constraint of limited resource capacities. This is a typical consumer-centric
model where the consumer would like to maximize his/her satisfaction or utility by buying
a particular bundle of goods under his/her budget constraint.
The producer, on the other hand, sets the prices of goods based on the utility they
provide to the customers. Hence the revenue earned by the producer is proportional to
the sum of the utilities of his/her consumers. However, the producer strives to maximize
his/her profit, which is defined as the difference between the revenue and cost. It is possible
that maximizing the revenue may not maximize the profit: the cost generally increases with the size of the system and may outweigh the additional revenue at very large scales.
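A toy illustration of this point, with made-up revenue and cost curves: if revenue grows sublinearly with system size while cost grows linearly, the profit-maximizing size is far smaller than the revenue-maximizing one.

```python
def revenue(n):
    # Hypothetical: utility delivered (hence revenue) grows sublinearly.
    return 100 * n ** 0.5

def cost(n):
    # Hypothetical: purchase + maintenance cost grows linearly with size.
    return 8 * n

sizes = range(1, 201)
best_profit = max(sizes, key=lambda n: revenue(n) - cost(n))
best_revenue = max(sizes, key=revenue)
# Revenue keeps growing with n, but profit peaks at a much smaller system.
```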
In computer systems, the cost to the producer includes the purchasing cost and the
maintenance cost of hardware and software components. Therefore, the profit maximization
principle leads to producer-driven hardware-software co-design issues. As future work, we
would like to develop analytical tools that determine the hardware composition and the
software deployment for embedded systems driven by profit maximization principles.
Bibliography
[1] Abdelzaher, T., Atkins, E., and Shin, K. G. (1997). QoS negotiation in real-time systems and its application to automated flight control. In IEEE Real-Time Technology and Applications Symposium (RTAS), pages 228–238.
[2] Axelsson, J. (2000). Efficient integration of distributed automotive real-time systems.
In EDA-meeting.
[3] Baker, B. S. (1985). A new proof for the first-fit decreasing bin-packing algorithm.
Journal of Algorithms, 6:49–70.
[4] Banerjee, P. and Abraham, J. (1984). Fault-secure algorithms for multiprocessor sys-
tems. In 11th International Symp. on Computer Architecture.
[5] Baugh, R. (1973). Computer Control of Modern Radars. RCA M&SR-Moorestown
Library.
[6] Bellman, R. (1958). On a routing problem. Quart. Appl. Math., 16:87–90.
[7] Bennett, J. and Zhang, H. (1996). WF2Q: Worst-case fair weighted fair queueing. In
Conference on Computer Communications (INFOCOM).
[8] Bentley, J. L., Johnson, D. S., Leighton, F. T., McGeoch, C. C., and McGeoch, L. A.
(1984). Some unexpected expected behavior results for bin packing. In 16th annual ACM
symposium on Theory of computing, pages 279–288.
[9] Bosch (1991). CAN Specification, Version 2.0.
[10] Calvert, K. L., Doar, M. B., and Zegura, E. W. (1997). Modeling Internet topology.
IEEE Communications Magazine, 35(6):160–163.
[11] Chan, L. M. A., Simchi-Levi, D., and Bramel, L. (1998). Worst-case analyses, linear
programming and the bin-packing problem. Mathematical Programming, 83:213–227.
[12] Chen, S. and Nahrstedt, K. (1998a). Distributed quality-of-service routing in high-
speed networks based on selective probing. In IEEE Annual Conference on Local Area
Networks (LCN), pages 80–89.
[13] Chen, S. and Nahrstedt, K. (1998b). An overview of quality-of-service routing for the
next generation high-speed networks: Problems and solutions. IEEE Network Magazine,
Special Issue on Transmission and Distribution of Digital Video, 12(6):64–79.
[14] Cherukuri, R., Dykeman, D. (eds.), and Gouguen, M. (chair) (1995). PNNI draft specification.
[15] Chevochot, P. and Puaut, I. (1999). Scheduling fault-tolerant distributed hard real-
time tasks independently of the replication strategies. In 6th International Conference
on Real-Time Computing Systems and Applications.
[16] Liu, C. L. and Layland, J. W. (1973). Scheduling algorithms for multiprogramming in a hard-real-time environment. Journal of the ACM, 20(1):46–61.
[17] Coffman, E., Garey, J. M., and Johnson, D. (1987). Bin packing with divisible item
size. Journal of Complexity, 3:406–428.
[18] Coffman, E., Jr., Garey, M., and Johnson, D. (1996). Approximation Algorithms for Bin
Packing: A Survey. Approximation Algorithms for NP-Hard Problems. PWS Publishing,
Boston.
[19] Simchi-Levi, D. (1994). New worst-case results for the bin-packing problem.
Naval Research Logistics, 41:579–585.
[20] Demers, A., Keshav, S., and Shenker, S. (1989). Analysis and simulation of a fair queue-
ing algorithm. In ACM Special Interest Group on Data Communication (SIGCOMM).
[21] Deniz, D. (2004). Modeling Functional and Para-Functional Concerns In Embedded
Real-Time Systems. PhD thesis, Department of Electrical and Computer Engineering,
Carnegie Mellon University.
[22] Dijkstra, E. (1959). A note on two problems in connection with graphs. Num. Math.,
1:269–271.
[23] Everitt, B. S., Landau, S., and Leese, M. (2001). Cluster Analysis. Edward Arnold.
[24] Ford, L. and Fulkerson, D. (1963). Flows in Networks. Princeton Univ. Press.
[25] ATM Forum (1995). ATM Forum Traffic Management Specification, Version 4.0. ATM Forum/95-0013R8.
[26] Gao, X., Nandagopal, T., and Bharghavan, V. (2001). Achieving application level
fairness through utility-based wireless fair scheduling. In IEEE Global telecommunications
Conference (GLOBECOM).
[27] Ghosh, S., Hansen, J., Rajkumar, R., and Lehoczky, J. (2004a). Adaptive QoS op-
timizations with applications to radar tracking. In 10th International Conference on
Real-Time and Embedded Computing Systems and Applications (RTCSA).
[28] Ghosh, S. and Rajkumar, R. (1999). Practical management of end-to-end network
bandwidth reservation. In Proc. of Conference on Software in Telecommunications and
Computer Networks (SOFTCOM).
[29] Ghosh, S. and Rajkumar, R. (2002). Resource management of the os network subsys-
tem. In IEEE International Symposium on Object-oriented Real-time distributed Com-
puting.
[30] Ghosh, S., Rajkumar, R., Hansen, J., and Lehoczky, J. (2004b). Integrated resource
management and scheduling with multi-resource constraints. Technical Report 18-2-04,
Institute for Complex Engineering Systems, Carnegie Mellon University.
[31] Ghosh, S., Rajkumar, R. R., Hansen, J., and Lehoczky, J. (2003). Scalable resource
allocation for multi-processor QoS optimization. In 23rd IEEE International Conference
on Distributed Computing Systems (ICDCS 2003).
[32] Ghosh, S., Rajkumar, R. R., Hansen, J., and Lehoczky, J. (2004c). Adaptive QoS
optimizations with applications to radar tracking. Technical Report 18-3-04, Institute
for Complex Engineering Systems, Carnegie Mellon University.
[33] Goddard, S. and Jeffay, K. (1997). Analyzing the real-time properties of a dataflow
execution paradigm using a synthetic aperture radar application. In Proceedings of the
IEEE Real-Time and Embedded Technology and Applications Symposium.
[34] Guerin, R. and Orda, A. (1999). QoS-based routing in networks with inaccurate information: Theory and algorithms. IEEE/ACM Transactions on Networking, 7(3).
[35] Guerin, R., Orda, A., and Williams, D. (1996). QoS routing mechanisms and OSPF
extensions.
[36] Hansen, J., Ghosh, S., Rajkumar, R., and Lehoczky, J. (2004). Resource management
and highly configurable tasks. In 12th International Workshop on Parallel and Distributed
Real-Time Systems.
[37] Hansen, J., Zhu, H., and Rajkumar, R. (2001a). Probabilistic bandwidth reservation
by resource priority multiplexing. In Real-Time Technology and Applications Symposium
(RTAS), pages 171–178.
[38] Hansen, J. P., Lehoczky, J., and Rajkumar, R. (2001b). Optimization of quality of
service in dynamic systems. In Proceedings of the 9th International Workshop on Parallel
and Distributed Real-Time Systems (WPDRTS).
[39] Hansen, J. P., Zhu, H., Lehoczky, J., and Rajkumar, R. (2002). Quantized EDF scheduling in a stochastic environment. In Proc. of 10th International Workshop on Parallel and
Distributed Real-Time Systems (WPDRTS).
[40] Hoover, C., Hansen, J., Koopman, P., and Tamboli, S. (1999). The amaranth frame-
work: Probabilistic, utility-based quality of service management for high-assurance com-
puting. In 4th IEEE International Symposium on High-Assurance Systems Engineering
(HASE), pages 207–216.
[41] Huang, J., Wan, P. J., and Du, D. Z. (1998). Criticality- and QoS-based multiresource
negotiation and adaptation for continuous multimedia. Journal of Real-Time Systems,
15(1):249–273.
[42] Johnson, D. (1973). Near-Optimal Bin Packing Algorithms. PhD thesis, MIT, Cam-
bridge, MA.
[43] Johnson, D., Demers, A., Ullman, J., Garey, M., and Graham, R. L. (1974). Worst-case performance bounds for simple one-dimensional packing algorithms. SIAM J. Comput.,
3(4).
[44] Kleinrock, L. and Kamoun, F. (1977). Hierarchical routing for large networks: performance evaluation and optimization. Computer Networks, 1:155–174.
[45] Knuth, D. (1973). The Art of Computer Programming, Volume 3: Sorting and Searching. Addison-Wesley.
[46] Kolawole, M. O. (2002). Radar Systems, Peak Detection and Tracking. Newnes Press.
[47] Kopetz, H. (1998). The time-triggered architecture. In ISORC ’98, Kyoto, Japan, April 1998.
[48] Kuo, T. W., Chao, Y. S., Kuo, C. F., Chang, C., and Su, Y. (2002). Real-time dwell
scheduling of component-oriented phased array radars. In IEEE 2002 Radar Conferences.
[49] Lee, C. (1999). On Quality of Service Management. PhD thesis, Department of Elec-
trical and Computer Engineering, Carnegie Mellon University.
[50] Lee, C., Lehoczky, J., Rajkumar, R., and Siewiorek, D. (1998). On quality of service
optimization with discrete QoS options. In Proceedings of the IEEE Real-Time Technology
and Applications Symposium. IEEE.
[51] Lee, C., Lehoczky, J., Siewiorek, D., Rajkumar, R., and Hansen, J. (1999). A scalable
solution to the multi-resource QoS problem. In Proceedings of the IEEE Real-Time
Systems Symposium.
[52] Lin, K. (1995). Distributed pinwheel scheduling with end-to-end timing constraints.
In IEEE Real-Time Systems Symposium.
[53] Lui, K.-S., Nahrstedt, K., and Chen, S. (2000). Hierarchical QoS routing in delay-
bandwidth sensitive networks. In IEEE Local Computer Networks (LCN 2000), pages
579–588.
[54] Ma, Q. and Steenkiste, P. (1997). On path selection for traffic with bandwidth guar-
antees. In IEEE International Conference on Network Protocols.
[55] Mangharam, R., Demirhan, M., Rajkumar, R., and Raychaudhuri, D. (2004). Size
matters: Size-based scheduling for MPEG-4 over wireless channels. In SPIE Conference
on Multimedia Computing and Networking (MMCN), pages 110–122.
[56] Medina, A., Lakhina, A., Matta, I., and Byers, J. (2001a). BRITE: An approach to
universal topology generation. In International Workshop on Modeling, Analysis and
Simulation of Computer and Telecommunications Systems (MASCOTS 2001).
[57] Medina, A., Lakhina, A., Matta, I., and Byers, J. (2001b). BRITE: Universal topology
generation from a user’s perspective. Technical Report 2001-003, Boston University.
[58] Megginson, R., Axler, S., and Gehring, F. (2001). An Introduction to Banach Space Theory (Graduate Texts in Mathematics, 183). Springer-Verlag.
[59] Mercer, C. (1997). Operating System Resource Reservation for Real-Time and Multi-
media Applications. PhD thesis, School of Computer Science, Carnegie Mellon University.
[60] Munu, M., Harrison, I., Wilkin, D., and Woolfson, M. (1992). Target tracking al-
gorithms for phased array radar. Radar and Signal Processing, IEE Proceedings-F,
139(5):336–342.
[61] Nahrstedt, K., Chu, H.-h., and Narayan, S. (1999). QoS-aware resource management
for distributed multimedia applications. Journal of High Speed Networking, 7(3-4):229–
257.
[62] Oikawa, S. and Rajkumar, R. (1999). Portable RK: A portable resource kernel for guar-
anteed and enforced timing behavior. In IEEE Real-Time Technology and Applications
Symposium (RTAS).
[63] Parekh, A. and Gallager, R. G. (1993). A generalized processor sharing approach to flow
control in integrated services networks: the single node case. IEEE/ACM Transactions
on Networking, pages 344–357.
[64] Peressini, A., Sullivan, F., and Uhl, Jr., J. J. (1988). The Mathematics of Nonlinear Programming. Springer-Verlag.
[65] Rajkumar, R. (1991). Synchronization in Real-Time Systems: A Priority Inheritance
Approach. Kluwer Academic Publishers.
[66] Rajkumar, R., Lee, C., Lehoczky, J., and Siewiorek, D. (1997). A resource allocation
model for QoS management. In IEEE Real-Time Systems Symposium.
[67] Rosu, D. I., Schwan, K., Yalamanchili, S., and Jha, R. (1997). On adaptive resource
allocation for complex real-time applications. In 18th IEEE Real-Time Systems Sympo-
sium.
[68] Hedrick, C. L. (2002). Cisco white paper: An introduction to IGRP.
[69] Saewong, S. and Rajkumar, R. (1999). Cooperative scheduling of multiple resources.
In IEEE Real-time Systems Symposium.
[70] Sanfridson, M. (2000). Problem formulations for QoS management in automatic con-
trol. Technical report, Mechatronics Lab, Department of Machine Design Royal Institute
of Technology.
[71] Sha, L., Rajkumar, R., and Lehoczky, J. (1990). Priority inheritance protocols: An
approach to real-time synchronization. IEEE Transactions on Computers, pages 1175–
1185.
[72] Shenker, S. (1995). Fundamental design issues for the future internet. IEEE Journal
of Selected Areas in Communication, 13(7):1176–1188.
[73] Shih, C., Gopalakrishnan, S., Ganti, P., Caccamo, M., and Sha, L. (2003a). Scheduling
real-time dwells using tasks with synthetic periods. In Proceedings of the IEEE Real-Time
Systems Symposium.
[74] Shih, C., Gopalakrishnan, S., Ganti, P., Caccamo, M., and Sha, L. (2003b). Template-
based real-time dwell scheduling with energy constraint. In Proceedings of the IEEE
Real-Time and Embedded Technology and Applications Symposium.
[75] Shirazi, B., Hurson, A., and Kavi, K. (1995). Scheduling and Load Balancing in Parallel
and Distributed Systems. Wiley.
[76] Smith, C. A. and Kroll, M. J. (1989). Utility theory and rent optimization: Utilizing
cluster analysis to segment rental markets. Journal of Real Estate Research, 4(1):61–71.
[77] Staehli, R. (1996). Quality of Service Specification for Resource Management in Multi-
media Systems. PhD thesis, Department of Computer Science and Engineering, Oregon
Graduate Institute.
[78] Staehli, R., Walpole, J., and Maier, D. (1995). Quality of service specification for