Design of topology–aware networked applications - CiteSeerX

Design of topology–aware networked applications

Roger P. Karrer

© Roger P. Karrer, 2002.


Diss. ETH No. 14828

Design of topology–aware networked applications

A dissertation submitted to the SWISS FEDERAL INSTITUTE OF TECHNOLOGY ZURICH

(ETH ZURICH)

for the degree of Doctor of Technical Sciences

presented by Roger P. Karrer

Dipl. Informatik-Ing. ETH

born May 17, 1969

citizen of Winterthur, Switzerland

accepted on the recommendation of Prof. Dr. Thomas R. Gross, examiner

Prof. Dr. Roger P. Wattenhofer, co-examiner

2002


Sometimes, after a battle that seems to have no end, the warrior has an idea and manages to win in a matter of seconds.

Then he thinks: "Why did I suffer for so long in a combat that could have been resolved with half the energy I spent?"

In truth, every problem, once solved, seems very simple. The great victory that today appears easy was the result of a series of small victories that went unnoticed.

Then the warrior understands what happened, and he sleeps in peace. Instead of blaming himself for having taken so long to arrive where he wanted, he rejoices in knowing that he arrived in the end.

Paulo Coelho, Manual of the Warrior of Light.


Abstract

The Internet is designed along several simple principles: best–effort delivery, separation between end systems and networks, and point–to–point connectivity. This simplicity has greatly contributed to the growth of the Internet to its current dimensions, but is no longer suited to efficiently support the communication requirements of today's and tomorrow's applications, such as multicast, mobility, bandwidth–based routing or adaptation to address heterogeneity.

Our thesis is that these communication requirements can be addressed by a single application–layer communication architecture that integrates knowledge about the topology and the available resources in a network into the application context, and that this integration pays off. We thereby assume that nodes inside the network (proxies) exist on which applications can instantiate their code.

This dissertation first describes the architecture, a three–layered topology–aware framework we call Octopus. This application–layer framework provides abstractions for services which can be deployed inside the network, algorithms and tools to locate, evaluate and select proxies, as well as mechanisms to steer an ongoing transmission. The framework fixes the process of topology–aware communication while allowing applications to express their preferences by customizing the communication abstractions.

The concept of topology–awareness implies that information about the available proxies must be available. At the lowest layer, this dissertation describes a scalable and practicable application–centric solution to discover available proxies. These proxies are organized in a graph to form the network topology for a topology–aware application. The topology discovery is combined with the measurement of the available resources along this topology.

The middle layer contains abstractions to evaluate the topology graph and select a proxy to instantiate application–specific code. The evaluation can be made with application–specific preferences, e.g., to find the path through the graph with the smallest latency or the largest bandwidth. Similarly, the selection can be customized, e.g., to place code for adaptation or for multicasting.

Finally, at the topmost layer, we evaluate our solution with two topology–aware applications. A collaborative application distributes data of different content and size to multiple participants. We can show that a topology–aware path selection can considerably improve the delivery time or the delivery quality, based on the application preferences. Second, we integrate an existing MPEG–1 application into Octopus to deploy different communication forms. Octopus supports the dynamic creation of a multicast tree or the dynamic change from an existing connection to an alternative connection.

We conclude that breaking the end–to–end concept and the layering concept of the Internet allows an application to opportunistically use the available resources in a way that better reflects the application requirements. The topology–aware framework is thereby well suited to deal with the increased complexity of taking topology– and performance information about the network into account.


Kurzfassung

The design of the Internet is based on a few simple principles: data is transmitted without guarantees, end systems and the network are clearly separated from each other, and a connection links exactly two end systems. The simplicity of these principles has contributed much to the Internet reaching its current dimensions. However, this very simplicity prevents efficient support for the communication requirements of today's and tomorrow's applications, such as multicast, mobility, data transmission along paths with high bandwidth, or an adaptation of the data necessitated by the heterogeneity of the network.

The thesis underlying this dissertation is that these requirements can be supported by a single application–layer communication architecture in which information about the topology and the available resources of a network is integrated into the application context; moreover, this architecture offers applications certain advantages. In our model, we assume that the network provides special nodes (proxies) on which applications can install code.

This dissertation first describes the communication architecture. It consists of a topology–aware framework that we call Octopus. The framework, which operates exclusively at the application layer, offers the following components: abstractions for services that can be instantiated on the proxies; algorithms and tools to locate proxies in a network, to evaluate them and to select the best proxy; and finally mechanisms to steer an ongoing data transmission. The framework thereby defines and fixes the course of a topology–aware application, while the applications can express their preferences by extending the abstractions of the framework.

The framework consists of three layers. At the lowest layer, this dissertation describes a scalable and practical solution to locate available proxies in a network. The topology that results from the arrangement of the discovered proxies is additionally enriched with measurements of the available resources.

The middle layer offers abstractions with which the topology can be evaluated and the best proxy for a given application can be selected. An application can thereby determine, for example, whether the best path through the topology is defined by the smallest latency or the largest bandwidth. Likewise, the application determines which location is best suited to install a particular service, for example for multicast or data adaptation.

The topmost layer, finally, presents two different applications with which the thesis of this dissertation is evaluated. In a distributed application, the participants send each other data of various types and sizes. The topology–aware framework supports the transmission in such a way that the best path for a given data type is selected automatically. The transmission quality can thereby be improved, or the transmission time significantly reduced. The second application, which uses MPEG–1, shows how various services can improve the quality of the video stream, and how Octopus supports the application in placing multicast services or in switching from an existing to an alternative connection.

The conclusion of this dissertation is that giving up the transmission from one end system to the other, or softening the strict layering in networks, leads to a better use of the available resources, one that better matches the requirements of the various applications. The topology–aware framework is well suited to absorb the increased complexity that arises from abandoning these principles and thus to support applications efficiently.


Acknowledgments

This dissertation and its contributions have been inspired, enabled and improved by an endless number of people with whom I have had the great opportunity to be in touch over the course of this work.

I am extremely grateful to Thomas Gross, my thesis advisor. He has given me the opportunity and the time to find not only a way but my way through the tasks of a Ph.D. student. He has managed to find a suitable balance between giving me the freedom to identify and choose a research topic while at the same time reminding me to maintain my focus on the research problem in the initial stages. His technical advice helped me to improve my skills as a researcher but also influenced my personal development. His valuable comments have greatly contributed to this dissertation.

I am very grateful to Roger Wattenhofer for accepting the duties of being my co–advisor. His knowledge of topics related to this dissertation has not only resulted in fruitful comments and criticism, it has also opened my mind to other interesting ideas that I may pursue after this dissertation.

My sincerest thanks go to Michela Taufer and Jurg Bolliger, who volunteered to read through the first draft of my dissertation in spite of being heavily loaded with their own duties. Their comments helped me to considerably improve the text of this dissertation.

I am very thankful to Michela Taufer, who endured the burden of sharing the office space with me for many years. Far more than just being an office mate, you were a most valuable friend to me, ready to support me in any trouble I was in, with a firmness and kindness that I have seldom experienced in any other person. The trust I was able to place in you, but also the fruitful discussions about any technical and philosophical question, and, in particular, your volcanic temperament made my time with you a terrific experience in many ways.

It is a pleasure to acknowledge the members of Thomas Gross' research group for creating an inspiring environment. The discussions with Jurg Bolliger at the initial stages of the Chariot project provided one of the basic motivations for this dissertation. The numerous discussions with the other members of our group, Peter Brandt, Irina Chihaia, Matteo Corti, Hans Domjan, Urs Hengartner, Valeri Naoumov, Christoph von Praun, Luca Previtali, Alex Scherer and Cristian Tuduce provided the stimulating challenge to open my mind to subjects that are not necessarily related to the topic of this dissertation. Many thanks also to the members of the Department of Computer Science who collaborated with me in many ways, among them Silvania Avelar, Christian Kurmann and Felix Rauch.

The possibility to participate in the Remos project led to fruitful discussions with Professor Peter Steenkiste, Peter Dinda, Bruce Lowekamp, Nancy Miller, and Dean Sutherland. This collaboration broadened my knowledge of research in the networking domain, and my stay at CMU was a rich experience.

I had the pleasure to supervise a number of undergraduate student projects. I learnt a great deal about how to supervise projects from the collaboration with Jurg Bolliger supervising the projects of Thomas Ammann, Daniel Estermann, Roland Vogeli and Patrick Walther at the beginning of my time at ETH. Nikolaos Kaintantzis and Nathalie Kocher supported my efforts in the Remos project. Daniel Sporndli and Min Zhou provided useful insight related to Internet traffic behavior, while Raphael Fontana, Michael Gahwiler, Michael Keller and Daniel Sporndli implemented and studied the behavior of topology–aware multimedia streams. Finally, Thomas Hug did a good job in his integration study of Octopus services and Jini.

I particularly want to thank those people who provided me with an account. Getting an account these days is becoming harder and harder for security reasons. However, the opportunity to install code and run experiments in the real Internet is vital for understanding Internet behavior. I would therefore like to thank Peter Steenkiste and Nancy Miller at CMU, Peter Dinda at Northwestern University, Martin Vetterli at EPFL Lausanne, Miguel Revilla at the Universidad de Valladolid, Jorge Granjal at the Universidade de Coimbra, and Silvania Avelar at the Universidade Federal de Minas Gerais.

While all those people helped me to identify the path and to remove obstacles along that way, the driving force that pushed me forward was my motivation. The source of this motivation lies in my family and my friends who supported me through all those years. First and foremost, my parents, Hilde and Rudolf Karrer, were a never–expiring source of love and understanding and a harbor in stormy times. My god–daughter Lea is my shining star and my most beloved princess. Finally, many thanks go to Nadja and Michael, Laura and Otto, as well as Heidi and Heini for their everlasting support.

Finally, I would like to thank Julia Pfister for her love, understanding and support. She brought a new, hot and inspiring rhythm into my daily life that also had a stimulating influence on the pace of my work. Yo vivire!


Table of Contents

Abstract v
Kurzfassung vii
Acknowledgments ix
Table of Contents xi

1 Introduction 1
  1.1 Motivation 1
  1.2 Challenges 5
  1.3 Thesis Statement 6
  1.4 Roadmap 7

2 Topology–awareness 9
  2.1 Topology information 9
  2.2 Resource information 12
  2.3 Bandwidth prediction 13
  2.4 Routing 14
    2.4.1 Server selection 15
    2.4.2 Overlay routing 15
    2.4.3 Overlay multicast 16
    2.4.4 Content Distribution Networks 18
    2.4.5 Peer–to–peer networks 18
  2.5 Adaptation 19
    2.5.1 Image adaptation 19
    2.5.2 Video adaptation 20
  2.6 Communication architectures 21
    2.6.1 Active networks 21
    2.6.2 Active services 22
    2.6.3 Grid 23
    2.6.4 Overlay network architecture 24
  2.7 Summary 25

3 Octopus: a topology–aware framework 27
  3.1 Scenarios 27
    3.1.1 Scenario 1: topology–aware adaptation 28
    3.1.2 Scenario 2: dynamic handoff 29
    3.1.3 Scenario 3: topology–aware multicast 30
    3.1.4 Scenario 4: multipath streaming 30
    3.1.5 Summary 31
  3.2 A unified process for topology–awareness 32
  3.3 Octopus – a framework for topology–awareness 33
    3.3.1 The design of the Octopus framework 34
    3.3.2 The dynamic view of the Octopus framework 37
    3.3.3 The layered structure of the Octopus framework 38
  3.4 Summary 39

4 Network support for topology–aware applications 43
  4.1 Octopus Location Discovery 43
    4.1.1 Dynamic network topology discovery 45
    4.1.2 Path discovery 45
    4.1.3 Octopus node discovery 49
    4.1.4 Topology–aware Remos 51
    4.1.5 Overhead estimation 53
    4.1.6 Summary 54
  4.2 Bandwidth Prediction 55
    4.2.1 Time series analysis 57
    4.2.2 Analysis methodology 60
    4.2.3 Sampling interval 60
    4.2.4 Prediction models 65
    4.2.5 Prediction Time 73
    4.2.6 Negative and zero predictions 74
    4.2.7 Prediction Error 78
    4.2.8 Conclusions 84
  4.3 Conclusions 86

5 Management for topology–aware applications 89
  5.1 Resource management and storage 89
    5.1.1 Topology information for alternative paths 91
  5.2 Evaluation and Selection 95
    5.2.1 Evaluation Methods 95
    5.2.2 Internet experiment 97
    5.2.3 Selection 104
  5.3 Octopus and Jini 105
  5.4 Summary 110

6 A topology–aware collaborative application 113
  6.1 Collaborative communication 113
  6.2 Design of a topology–aware collaborative application 115
    6.2.1 Scenario 116
    6.2.2 Design 117
  6.3 Topology–aware data delivery 121
  6.4 Topology–aware data delivery in the Internet 124
    6.4.1 Strategy parameters 126
    6.4.2 Comparisons 127
  6.5 Prediction–based adaptation in Octopus 133
    6.5.1 Large time limit 134
    6.5.2 Small time limit 135
    6.5.3 Variation 136
    6.5.4 Summary 137
  6.6 Summary 138

7 A topology–aware video application 141
  7.1 Medusa: design, implementation and integration of an adaptive MPEG–1 application 142
    7.1.1 Medusa components and integration with Octopus 142
    7.1.2 Adaptive filtering in Medusa 144
    7.1.3 Multicast in Medusa 145
    7.1.4 Trace modulation 147
  7.2 Server Selection 148
  7.3 Dynamic placement of filters on Octopus nodes 151
    7.3.1 Video streaming with two parties 151
    7.3.2 Multicast video streaming 155
    7.3.3 Conclusions 158
  7.4 Multipath streaming 159
    7.4.1 Management–layer multipath streaming 161
    7.4.2 Application–layer multipath streaming 162
    7.4.3 Comparison of the multipath streaming approaches 164
    7.4.4 Evaluation 166
    7.4.5 Performance 166
    7.4.6 Synchronization 169
    7.4.7 Multipath streaming in best-effort networks 173
    7.4.8 Summary 175
  7.5 Handoff 177
    7.5.1 Client handoff 179
    7.5.2 Handoff phases 180
    7.5.3 Handoff parameters 181
    7.5.4 Influence of the parameters on the handoff 184
    7.5.5 Conclusions 191
  7.6 Conclusions 192

8 Conclusion 195
  8.1 Conclusions 195
  8.2 Future Work 197

Bibliography 201

Curriculum Vitae 209


1 Introduction

1.1 Motivation

The Internet has been designed along several principles, such as best–effort delivery, a separation between end systems and networks, and a point–to–point connectivity model. The simplicity of these principles has greatly contributed to the growth of the Internet to its current dimensions.

However, after several decades of rapid growth in the Internet, this simplicity is no longer suited to support the connectivity requirements of all of today's and tomorrow's applications. The best–effort service and the IP routing protocols do not transport the data in the way that best suits all application needs. Especially applications with communication–intensive and communication–sensitive (bandwidth or timing) demands, such as multimedia applications, suffer from the non–optimal data transport. The point–to–point connectivity model fails to efficiently support applications that communicate with multiple entities, such as multicast and anycast, but also mobility. Finally, the separation of applications and networks prohibits data processing or the selection of a routing path by the application. One form of data processing is adaptation, which allows an application to address heterogeneity and dynamic resource fluctuations that are inherent in the Internet. Adaptation is currently only possible on an end–to–end basis. However, adaptation inside the network is an alternative and provides some advantages over end–system based adaptation.

The fact that the original design principles of the Internet are no longer suited to efficiently support the communication requirements is not only a consequence of the increasing application requirements. New forms of networks have appeared, such as wireless networks or ad–hoc networks. On one hand, these new networks are seamlessly integrated into the Internet from an application point of view by hiding their particularities at the network layer. On the other hand, their speed and reliability differ by some orders of magnitude from those known from the Internet, adding to the heterogeneity problem. As a result, the mentioned problems of today's communication are likely to persist into the future.

Solutions to address the new communication requirements have been proposed in previous work. Figure 1.1 gives a rough overview of the approaches. This figure shows two problem areas. One area refers to the connectivity problems, including multicast or anycast, whereas the second refers to quality problems, e.g., applications that are sensitive to bandwidth or to timing constraints. Approaches to address the two problems can be separated into three groups: approaches at the network layer, approaches at the application (overlay) layer and approaches within applications.

Figure 1.1: Connectivity and quality issues in today's communication and approaches to address these issues.

Solutions to address the communication requirements at the network layer have resulted in special IP features, such as IP multicast [24] or mobile IP [61]. To address the quality requirements, quality of service or resource reservation protocols, such as RSVP [13], have been proposed. One disadvantage is that no approach addresses both connectivity and quality issues. Special IP protocols each address only one connectivity requirement, e.g., multicast or mobility, but not all of them, and do not provide support for quality–sensitive applications. QoS helps to support the quality requirements of applications, but has little or no support for the connectivity requirements. Active networks, in contrast, provide an infrastructure that allows applications to inject code into the network [78, 80, 85]. This code may contain support for both connectivity and quality requirements.

In general, however, addressing the new communication requirements at the network layer has significant drawbacks. Apart from the fact that IP mechanisms each address only one connectivity issue, their deployment in the Internet is very slow. Similarly, mechanisms to support quality requirements are hard to deploy. The active networks approach, finally, is considered to endanger the stability of the Internet [23].


Therefore, alternative solutions have been proposed that restrict the deployment of communication mechanisms to the application layer. Overlay routing allows the transmission of data along application–layer paths [21, 4, 68]. The routing along these paths is steered by protocols based on application–layer metrics, such as bandwidth or latency. These protocols can therefore be used to support sensitive applications. Other approaches in the domain of overlay networks focus on application–layer mechanisms to support complex connectivity requirements, such as multicast [47, 53, 22, 75] or anycast [8, 52]. These approaches typically rely on an overlay network routing protocol. However, similar to the situation at the network layer, every overlay approach has so far targeted only a single connectivity issue.
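Metric–based path selection over an overlay can be sketched as follows. This is a minimal illustration only: the graph, node names and bandwidth values are hypothetical and not taken from any system discussed here. To find the path with the largest bottleneck bandwidth, a widest–path variant of Dijkstra's algorithm can be used:

```python
import heapq

def widest_path(graph, src, dst):
    """Return the path from src to dst maximizing the bottleneck
    bandwidth, plus that bottleneck value.
    graph: {node: {neighbor: link_bandwidth}} (application-layer links)."""
    best = {src: float("inf")}   # best known bottleneck to each node
    prev = {}                    # predecessor on the widest path
    heap = [(-float("inf"), src)]  # max-heap via negated bandwidths
    while heap:
        neg_bw, node = heapq.heappop(heap)
        bw = -neg_bw
        if node == dst:
            break
        if bw < best.get(node, 0):
            continue  # stale heap entry
        for nbr, link_bw in graph[node].items():
            bottleneck = min(bw, link_bw)
            if bottleneck > best.get(nbr, 0):
                best[nbr] = bottleneck
                prev[nbr] = node
                heapq.heappush(heap, (-bottleneck, nbr))
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path)), best[dst]

# Hypothetical overlay of proxies, bandwidths in Mbit/s
overlay = {
    "server": {"proxy1": 10, "proxy2": 4},
    "proxy1": {"client": 3},
    "proxy2": {"client": 8},
    "client": {},
}
path, bottleneck = widest_path(overlay, "server", "client")
# path == ["server", "proxy2", "client"], bottleneck == 4
```

Replacing the bandwidth maximization with a sum of per–link latencies yields a smallest–latency selection instead; an overlay routing protocol hard–wired to one such metric cannot serve both kinds of applications, which is exactly the limitation noted above.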

Network–awareness is an alternative approach to address the quality requirements by shifting the responsibility to the application developer. Network–awareness allows an application to react to changes in the network by measuring the availability of resources. It is therefore a concept that does not require a special overlay layer but rather tries to abandon the layering.
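As an illustration of this measure–and–react loop, a network–aware sender might map periodic bandwidth measurements to a quality level; the thresholds and level names below are invented for the sketch and do not come from this dissertation:

```python
def choose_quality(measured_kbps):
    """Map a measured bandwidth sample to a stream quality level.
    Thresholds are illustrative placeholders only."""
    for threshold, level in [(1500, "high"), (600, "medium"), (0, "low")]:
        if measured_kbps >= threshold:
            return level

# The application re-runs this whenever a new measurement arrives,
# adapting the stream instead of relying on the network to do so.
```

The point is architectural rather than algorithmic: the decision lives in the application, using network information that the classic layering would hide from it.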

Finally, active services [3, 40] also advocate the execution of application–specific code inside a network, similar to active networks. In contrast, however, active services are only allowed to access application–layer resources. A video gateway which adapts a video stream inside a network provides a sample implementation of an application that uses active services [3]. Similarly, active services have also been proposed for multicasting [51].

This discussion shows that active services provide the potential to address both connectivity requirements and quality requirements. However, many questions are still unanswered. In particular, the integration of active services with the application has not been addressed. Similarly, network–awareness has mostly focused on quality issues and has not been combined with active services to combine adaptivity and routing. We claim that this combination of active services and network–awareness opens new possibilities to address the connectivity and quality requirements of today's and tomorrow's applications. Our vision is that a unified architecture can be provided that supports different applications on different kinds of networks.

To find a unified solution, we must first prove that the connectivity requirements and the quality requirements can be addressed in a single architecture. Both issues address application requirements that are related to communication. We learn from active services and active networks that the application–specific code injected into the network may contain routing information, thereby addressing the connectivity requirements, as well as data processing information, e.g., for adaptation.

Finally, we learn from network–awareness and from (the drawbacks of) overlay routing and overlay networks that the application plays an important role in the communication. Overlay routing is only able to address one metric at a time, e.g., bandwidth. The diversity of application requirements, however, requires that multiple, different routing schemes must be available. Network–awareness shows that integrating network information into the application is one possible approach. The integration of network information is a difficult task from a software engineering point of view. Since the original separation between applications and networks is no longer maintained, concepts are needed that regulate the interaction process between applications and networks. This process is complex because it must be flexible to support different applications and different networks.

Figure 1.2: Comparison of the current Internet communication model, overlay approaches and the topology–aware architecture.

The differences between the communication mechanisms in the current Internet, in overlay networks, in overlay multicast and finally with active services are depicted in Figure 1.2. Figure 1.2(a) shows the communication model of the current Internet. The data transport is hidden from the application and beyond application influence, as indicated by the dotted lines. Figure 1.2(b) shows the application–layer routing of overlay networks. Overlay networks allow data to be sent along application–layer paths. In this figure, path A and path B are two possible paths from the server to client 1. The paths are selected by a forwarder (in the terminology of [4]) that operates at the application layer but is (still) independent of the application and beyond its influence. Figure 1.2(c) depicts an overlay multicast architecture. Overlay multicast typically relies on an underlying overlay routing protocol. It then adds the multicast capability to forwarders and adds protocols to create a multicast distribution scheme (e.g., a tree). In this figure, the data is sent from the server towards client 2. The multicast node splits the stream towards the two clients. Finally, Figure 1.2(d) depicts an active service network model. In contrast to the previous approaches, the active service nodes are visible to the application (the links are therefore shown as solid lines). In our vision, the (network–aware) application is able to influence the communication by selecting the best routing path through the network and by selecting the best node on which the data is adapted. Because the definition of the “best” routing path and the best location for adaptation depends on the application preferences, we claim that the selection process should be integrated into the application context. Finally, since the application adapts the communication based on the network topology, we call our concept topology–awareness.
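This selection process can be sketched in code. The data layout, the scoring function and its weights below are hypothetical illustrations, not part of any system described in this dissertation; they only show how an application could weigh bandwidth against latency when picking a path and then pick an adaptation node on it.

```python
# Hypothetical sketch of application-driven path and node selection.
# Path metrics, node names and preference weights are illustrative.

def best_path(paths, weight_bw, weight_lat):
    """Score candidate paths by application preference.

    paths: list of dicts with 'bandwidth' (Mbit/s), 'latency' (ms)
    and 'nodes' (active service nodes along the path).
    Higher bandwidth is better, lower latency is better.
    """
    def score(p):
        return weight_bw * p["bandwidth"] - weight_lat * p["latency"]
    return max(paths, key=score)

def best_adaptation_node(path, cpu_load):
    """Pick the least loaded active service node on the chosen path."""
    return min(path["nodes"], key=lambda n: cpu_load.get(n, float("inf")))

paths = [
    {"bandwidth": 2.0, "latency": 120.0, "nodes": ["octopus-a", "octopus-b"]},
    {"bandwidth": 1.5, "latency": 30.0, "nodes": ["octopus-c"]},
]
# A bandwidth-sensitive application weights bandwidth heavily:
p = best_path(paths, weight_bw=100.0, weight_lat=0.1)
node = best_adaptation_node(p, {"octopus-a": 0.7, "octopus-b": 0.2})
```

A latency–sensitive application would simply supply different weights; the point is that the preference lives in the application context, not in the network.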

1.2 Challenges

The concept of topology–awareness brings up various challenges. First and foremost, an architecture for topology–awareness is needed that addresses both the connectivity and the quality requirements. The architecture should be flexible enough to be used for different application scenarios, but at the same time it should allow an easy deployment of topology–aware applications. Optimally, the architecture should be able to integrate both new and legacy applications. The architecture should also be flexible enough to take various network characteristics into account, such as those of the Internet, managed wireless networks and ad–hoc networks.

Second, topology–awareness requires information about the network topology, i.e., the location of active service nodes. In contrast to an overlay network, where the overlay topology is given by the participating nodes, topology–awareness is an application–centric solution that does not necessarily rely on an infrastructure providing information about the location of active service nodes. An application must therefore dynamically find available nodes in a network by itself. The challenge is to formulate this node discovery in a way that can be implemented for different network types and for different application requirements. Additional challenges in the Internet are scalability, i.e., available nodes must be found efficiently in a large network, and practicability, i.e., the solution should only use tools that are available in the current Internet to allow a deployment and an evaluation of the concept of topology–awareness in the Internet.

The third challenge is the gathering of performance information from the network, i.e., bandwidth, latency or jitter information. Existing tools for gathering network information provide a basis for these measurements, but they must be customized and integrated into the application context. The integration requires that an application can specify its requirements (e.g., bandwidth) and that a corresponding tool measures the performance. In the other direction, a unified structure for these measurements is needed that hides the details of the measurements from the application.

Fourth, mechanisms must be found that allow a topology–aware application to easily use all the desired and provided information to determine which routing behavior and which adaptation mechanisms are most suited. One problem is again that the term “most suited” depends on the application context; a solution should therefore be flexible enough for multiple application requirements yet at the same time easy to use.

Finally, a last challenge is to evaluate the concept of topology–awareness. We are especially interested in seeing to what degree topology–awareness is able to satisfy these challenges and at what cost.

1.3 Thesis Statement

Our hypothesis for this dissertation is that:

The integration of (application–layer) topology and performance information into the application context can be achieved by a single architecture, and it pays off.

We call an application that integrates topology and performance information a topology–aware application, and hence call the concept of integrating topology and performance information topology–awareness. The terminology is motivated by the concept of network–awareness, which also aims at integrating information about the network into the application context. However, network–awareness only takes end–to–end information into account.

This dissertation establishes the thesis in the following way.

1. We define a concept for topology–awareness and thereby show how the different connectivity requirements and the quality requirements can be expressed in a single architecture.

2. We show that a framework architecture is well suited to address these requirements. A framework targets design reuse; the design in this dissertation is the interaction between networks and applications. The framework fixes interaction abstractions while at the same time allowing an application to customize these abstractions for its purposes.

3. The (network part of the) framework is customized to gather network information from the Internet. It provides mechanisms to dynamically gather topology information and to measure performance information.

4. We provide a study of different statistical prediction models to predict the available bandwidth using Internet traces.


5. The integration of current middleware for service management provides a possible interaction between the applications and active services.

6. We prove that the proposed framework is able to take different connectivity and quality requirements into account by customizing the framework with two applications. A collaborative application is used to study the importance of taking different metrics into account, whereas an MPEG video application focuses on the deployment of connectivity requirements (multicast, dynamic handoff).

7. We stress the importance of the integration of application knowledge by showing that a multipath streaming protocol has significant drawbacks due to synchronization problems if the multipath streaming is hidden from the application.

1.4 Roadmap

This dissertation is organized as follows. Chapter 2 reviews previous work related to this dissertation. This chapter provides evidence that the challenges outlined above have remained unsolved so far.

Chapter 3 presents an analysis of the different connectivity and quality requirements, which leads to the design of the unified topology–aware architecture, as outlined in the first two items of the previous list. This architecture is a topology–aware framework we name Octopus. The analysis shows that the issues addressed by topology–awareness can be split into three layers: a network layer, a (core) management layer and an application layer. These layers are subsequently described in a bottom–up fashion in the following chapters.

Chapter 4 addresses the issues of the network layer in the Octopus framework (the third item in the previous list). It describes the information that must be gathered about a network to support topology–aware applications. First, Section 4.1 discusses how the location of the active service nodes (which we now call Octopus nodes) is found in a (real) network. The implementation of the resource discovery is integrated into the network information system Remos [28]. Second, Section 4.2 describes several statistical methods to predict the future bandwidth availability. This dissertation categorizes the different prediction models by different parameters so that every application can automatically pick the prediction method that is best suited.

The management layer of the Octopus framework addresses the fifth item in our list and is described in Chapter 5. The main task of the management layer is to integrate applications and networks by providing abstractions. This layer contains several methods to evaluate the information provided by the network layer to pick the best path or to select the best Octopus node to run its code. This chapter also shows the integration of a service management middleware, Jini, into the concept of topology–awareness.


We study the use of the topology–aware information by means of two applications, thereby addressing points 6 and 7 of our thesis list.

Chapter 6 presents a topology–aware collaborative application. Its goal is to distribute different data types of varying sizes to multiple clients. The topology–aware approach provides five strategies to send the data over a network and is able to deliver each data type according to the metric it is sensitive to.

Chapter 7 describes the integration of a legacy adaptive MPEG–1 application [43] into the Octopus framework. After the integration, this application contains several topology–aware mechanisms: an adaptive filter, a multicast filter, a handoff trigger and a multipath streaming mechanism. The multicast filter proves that multicast can easily be expressed in the Octopus framework. The handoff trigger allows an application to switch from one connection to another at run time; it shows that the application layer is a reasonable place to deploy communication mechanisms. Finally, the multipath streaming protocol provides a comparison between a topology–aware approach and a layered approach, thereby addressing the discussion of whether communication mechanisms should be hidden from applications.

Chapter 8 summarizes our findings and concludes this dissertation.


2 Topology–awareness

This chapter presents a comprehensive survey of work related to this dissertation. The goals are twofold. First, we show the limitations of the current state–of–the–art approaches, especially the lack of a unified architecture. Second, we present sources of previous work that contributed to this dissertation.

The review of related work is separated into sections according to the topics depicted in Figure 2.1. Each circle in this figure addresses one issue related to topology–awareness (see also Section 1.2). We start with a survey of approaches to gather topology information about a network. Section 2.2 describes approaches to gather performance information about a network. Approaches to process raw performance information with statistical prediction methods to improve the information quality are presented in Section 2.3. Section 2.4 gives an overview of previous approaches to routing issues. We thereby especially focus on application–layer solutions and how application–layer routing reflects application requirements. An overview of adaptation mechanisms for different data types is given in Section 2.5. Finally, Section 2.6 studies the architectural issues of communication infrastructures.

2.1 Topology information

A topology–aware application needs information about the network topology, more precisely, about the location of active service nodes inside a network. The literature describes several approaches to providing this information. We distinguish three main approaches: overlay networks, service management systems and network information systems.

Overlay networks are typically built from nodes that subscribe to the overlay communication. That is, a client that joins an overlay network communication announces its interest by subscribing to an existing infrastructure, e.g., by contacting a well–known existing rendez–vous point. After the subscription, a client node not only receives data but can also forward data to other nodes. The network topology is therefore a “side effect” of nodes joining and leaving. It is assumed that no other nodes are available in any other part of the network, so no protocol is used to discover additional resources. The topology built by this process is the bare network topology on which applications may communicate. The building of a distribution structure (tree, mesh) and its optimization is the task of other protocols (e.g., Narada [22]).

[Figure 2.1 depicts the subjects related to topology–awareness as circles around a central topology–awareness architecture: topology information (Section 2.1), performance information (Section 2.2), bandwidth prediction (Section 2.3), routing (Section 2.4), adaptation (Section 2.5) and architecture (Section 2.6).]

Figure 2.1: Subjects related to topology–awareness.

A discovery of active service nodes (aka services, distributed objects) is needed by different service management systems, such as CORBA [58], AS1 [3], or Jini [76]. The clients of a service are decoupled from the service and need information about the location of the desired service before using it. CORBA [58] applications either know the location of the service or use a well–known point–of–contact called the naming service. The naming service allows services to register themselves. When a client queries for a particular service (by name), the naming service returns the reference of the service object. This rendez–vous mechanism is static in that clients and services must know the location of the naming service. The AS1 active service framework [3] uses a well–known (static) point–of–contact address to instantiate services, whereas a client uses the (dynamic) Active Service Control Protocol (ASCP). ASCP is a decentralized announce–listen protocol with which a client can get hold of an instantiated service. Finally, Jini uses the most dynamic discovery: services as well as clients send a broadcast message into the network, to which the central registry replies.

Neither the overlay network nor the discovery mechanisms of service management systems are suited for the purposes of this dissertation. The overlay approach cannot be used because the topology–aware network consists of nodes other than just the clients. Active service nodes are simply located inside the network and do not actively make themselves known. Using a well–known point–of–contact is technically possible but is a dangerous approach for the Internet because it may easily become a bottleneck when both services and clients must access the same point. Finally, a dynamic search using broadcast simply does not scale to large networks and is a non–starter for the Internet.

The problem of having a single point–of–contact can be alleviated by a distributed, hierarchically decomposed protocol. Chord [74], e.g., is a distributed protocol that can be used for the management of service locations in the Internet. Chord is self–organizing and provides good search performance (O(log n) steps, where n is the number of servers that store object references). However, many questions regarding such protocols have not yet been addressed or answered. Little is known about how fast these protocols are in practice because no such protocol has been deployed on a large scale in the Internet. In spite of the logarithmic behavior, the latency and the processing time of O(log n) steps may be too large for topology–aware applications.
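The key idea behind such protocols can be illustrated with a small, deliberately centralized sketch of the ring mapping that Chord distributes. Real Chord gives every node a finger table so that a lookup needs only O(log n) routing hops between nodes; the sketch below only shows how keys and servers are placed on an identifier ring and how a key is assigned to its successor. The identifier width and server names are illustrative assumptions.

```python
# Centralized sketch of a Chord-style identifier ring. Real Chord
# distributes this state and resolves a key in O(log n) hops via
# per-node finger tables; here we only show key-to-successor mapping.
import hashlib
from bisect import bisect_right

RING_BITS = 16  # small identifier space, just for the sketch

def ring_id(name):
    """Hash a name onto the identifier ring."""
    digest = hashlib.sha1(name.encode()).hexdigest()
    return int(digest, 16) % (2 ** RING_BITS)

class Ring:
    def __init__(self, servers):
        # each server sits on the ring at the hash of its name
        self.points = sorted((ring_id(s), s) for s in servers)

    def successor(self, key):
        """Server responsible for a key: the first node clockwise."""
        ids = [p[0] for p in self.points]
        i = bisect_right(ids, ring_id(key)) % len(self.points)
        return self.points[i][1]

ring = Ring(["srv-%d" % i for i in range(8)])
owner = ring.successor("service:transcoder")
```

Because the mapping only depends on hashes, any participant that knows the ring can resolve a key without a central naming service, which is exactly what removes the single point–of–contact.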

A third approach to get information about the topology is to let the application search for it. We assume that active service nodes in our model are normal Internet hosts from a networking point of view, but that they can somehow be identified by applications, e.g., by a special tag in their host name. As a consequence, when an application gets a list of hosts in a network, it can identify and filter out the active service nodes. Different tools provide raw network topology information. In a LAN, SNMP [70] can be used to gather routing information from the routers. This raw information can be used to detect active service nodes. In the Internet, traceroute provides topology information related to the routing path. Finally, other tools collect information from routing tables (e.g., BGP tables).
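If active service nodes carry a distinguishing tag in their host name, as hypothesized above, filtering them out of a raw host list is straightforward. The tag `octopus` and the host names below are illustrative assumptions, not a convention defined anywhere in this dissertation:

```python
# Hypothetical filter: pick active service nodes out of a raw host
# list (e.g., obtained via SNMP or traceroute) by a naming convention.

def active_service_nodes(hosts, tag="octopus"):
    """Return the hosts whose first name label contains the tag."""
    return [h for h in hosts if tag in h.split(".")[0].lower()]

hosts = [
    "router1.example.edu",
    "octopus-3.example.edu",
    "www.example.edu",
    "node-octopus.example.edu",
]
nodes = active_service_nodes(hosts)
```

The same filtering step works regardless of which tool produced the host list, which is what makes this approach attractive for combining SNMP, traceroute and BGP data.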

These tools provide the topology information in a very raw and non–uniform way. The interfaces of these tools as well as the presentation of the results are not easily usable by applications. In addition, a combination of the information from different tools may be needed, e.g., to combine routing path information with that of a LAN. Tools have therefore been developed that provide a single, easy–to–use API. Remos [28], e.g., combines SNMP and traceroute information and hides the details of the gathering and of the formats behind an API. The application specifies the hosts it is interested in; Remos gathers the information about their interconnection and returns it as a network graph that can easily be used by applications. A similar approach is taken by Siamwalla et al. [69], who use ping, traceroute and DNS zone transfers to gather network information.

We consider network information systems, such as Remos, a viable starting point for getting topology information for topology–aware networks. Remos uses techniques that are well understood and can be deployed in the Internet. However, these network information systems must be extended and adapted to the issues of topology–awareness. First, they must be integrated to work with active service nodes. Second, although the network information systems are well understood, their usage for the purpose of topology–aware systems is far from easy. How can an application specify the part of the network it needs information about? How fast can information be gathered, and how accurate can and must this information be? These issues have only been brought up by topology–awareness.


2.2 Resource information

The gathering of information about the available resources can be separated into three groups: raw performance measurement tools, (large) measurement infrastructures, and network information systems that measure the performance on behalf of an application.

CAIDA [14] lists some of the available tools to measure the raw performance of Internet connections, e.g., b/c–probe, netperf, nettimer, pathchar and ttcp. These tools measure raw bandwidth, available bandwidth, congestion throughput, and latency, using different techniques. However, similar to the topology tools, the performance tools all differ in their APIs and their output data formats.

CAIDA also provides a list of about 20 Internet measurement infrastructures, such as Manta, i2, Skitter, Surveyor or NIMI. These infrastructures are used to study Internet traffic on a large scale. They are, however, not designed to provide online information about the current resource availability to an application. Therefore, no such measurement infrastructure has so far been integrated into the application context to support the data delivery of applications.

The drawback of non–uniform APIs is addressed (again) by different network information systems. Remos [28] integrates some of the raw performance tools and hides the details of the measurements and the different reporting formats behind a uniform API. Remos is therefore able to provide bandwidth and latency information about LANs and WANs using different tools.

A similar goal is achieved by the Grid Monitoring Architecture [81] defined by the Grid Forum. These systems greatly support the deployment of topology–aware applications because their APIs allow an easy integration into the application context.

The above performance measurement tools are active in that they create traffic for their measurements. These measurements impose a load on the network, which is undesired and also limits their scalability. An alternative approach is Shared Passive Network Performance Discovery (SPAND) [72, 67]. SPAND is a system that gathers network information by making shared, passive measurements from a collection of hosts. Rather than sending probe packets into the network, SPAND uses a packet capture host. This node analyzes the traffic flowing through a part of the network and allows the calculation of the available bandwidth or the loss rate for a collection of hosts in the same area. SPAND is an alternative to active monitoring and could be integrated into either a network information system (Remos) or a topology–aware architecture.
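The shared, passive idea can be sketched as follows: observed per-transfer throughputs are pooled by client subnet, so a new client in the same area can reuse its neighbors' observations without sending any probes. The data layout and the /24 grouping below are illustrative assumptions, not SPAND's actual implementation:

```python
# Illustrative sketch of shared passive measurement: pool observed
# throughputs by client /24 subnet and answer queries from the pool.
from collections import defaultdict
from statistics import median

class PassiveStore:
    def __init__(self):
        self.obs = defaultdict(list)  # subnet -> throughput samples (Mbit/s)

    @staticmethod
    def subnet(ip):
        return ".".join(ip.split(".")[:3])  # crude /24 grouping

    def record(self, client_ip, mbits):
        """Record a throughput observed for a past transfer."""
        self.obs[self.subnet(client_ip)].append(mbits)

    def estimate(self, client_ip):
        """Median of what nearby hosts observed, or None if no data."""
        samples = self.obs.get(self.subnet(client_ip))
        return median(samples) if samples else None

store = PassiveStore()
for ip, bw in [("10.1.2.3", 4.0), ("10.1.2.9", 6.0), ("10.1.2.40", 5.0)]:
    store.record(ip, bw)
estimate = store.estimate("10.1.2.77")  # a new host in the same /24
```

The trade-off is visible even in the sketch: the estimate costs the network nothing, but it is only available where traffic has already been observed.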

The discussion of related work in the area of resource information gathering shows that performance information is available and that network information systems provide their results through a uniform API. While information about the architecture and the performance of network information systems is available, little is known about their suitability to support (topology–aware) applications.


2.3 Bandwidth prediction

Measurements of network properties, such as the topology and the available bandwidth, result in a snapshot of the actual resource usage in a network. While this snapshot has a high probability of remaining stable for some time [59], the available bandwidth is likely to change, given the inherent bandwidth fluctuations. An application that adapts its behavior to a single bandwidth measurement is therefore likely to be let down. Research has therefore recently turned to means of predicting the future bandwidth based on this history data.

A wide variety of statistical models has been proposed for prediction tasks. Linear models, such as the Bestmean model, take previous measurements into account, average them over a time interval, and use this average as a prediction. More complex models, such as auto–regressive models (AR), moving average processes (MA), mixed auto–regressive–moving–average processes (ARMA) or integrated ARMA processes (ARIMA), additionally try to model the dynamics. The (mathematical) properties of all these models are described by Box and Jenkins [12]. Basu et al. [5] show that even bandwidth traces with heavy fluctuations can be predicted quite well by applying the correct model with the correct set of model parameters. Unfortunately, little is said about how the optimal model and parameter set can be found, how long it takes to find them, or how long it takes to calculate the prediction for such a trace.
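The two simplest predictor families mentioned above can be sketched in a few lines of pure Python: a windowed mean in the spirit of the Bestmean model, and an AR(1) model fitted by least squares. The window size and the bandwidth trace are illustrative assumptions; real AR model fitting, as in Box and Jenkins, is considerably more involved.

```python
# Minimal predictor sketches: a Bestmean-style windowed mean and a
# least-squares AR(1) fit. Trace values and window size are made up.

def bestmean_predict(trace, window=4):
    """Predict the next value as the mean of the last `window` samples."""
    recent = trace[-window:]
    return sum(recent) / len(recent)

def ar1_predict(trace):
    """Fit x_t ~ phi * x_{t-1} by least squares, predict one step ahead."""
    pairs = list(zip(trace[:-1], trace[1:]))
    phi = sum(a * b for a, b in pairs) / sum(a * a for a, _ in pairs)
    return phi * trace[-1]

trace = [10.0, 11.0, 9.0, 10.5, 10.0, 9.5]  # bandwidth samples, Mbit/s
mean_pred = bestmean_predict(trace)
ar_pred = ar1_predict(trace)
```

Even this toy comparison hints at the trade-off discussed in this section: the windowed mean is nearly free to compute, while richer models buy potential accuracy at the cost of fitting time.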

The Network Weather Service (NWS) uses prediction in a distributed environment [86]. NWS periodically measures the available bandwidth between a set of hosts that form a pool of available servers and clients. When a new application request is to be started, the available bandwidth is calculated from the previous values, and the server with the best bandwidth prediction is chosen for the task. NWS uses three types of predictors: mean–based, median–based and autoregressive methods. For the traces considered in that study, a gradient–based predictor is shown to be the most effective for bandwidth prediction.

Groschwitz et al. [42] use time series analysis to forecast future NSFNET backbone traffic. They use the ARIMA model because the NSFNET behavior is non–stationary. In their analysis, they provide a model identification order in which the parameters for ARIMA are determined, and they select the most suited models from a parameter value range of 0 to 2 for each parameter. Their evaluation shows that their prediction is fairly accurate up to 1 year ahead. Their model especially manages to fit the ups and downs of the observed traces.

In a Grid environment, where data is replicated on different servers, prediction can be used to select the server from which the data can be accessed most efficiently. Vazhkudai et al. [82] apply several predictors (average– and median–based ones) to end–to–end data transfers, where end–to–end means that all components (storage systems, networks, clients) are integrated. The authors argue that this kind of prediction is easier and more realistic in the sense that a user eventually only notices the end–to–end performance.


Their measurements show that their predictors are off by at most 25%. The authors also state that large data transfers seem to be more predictable than small ones.

Dinda [27] applies the prediction models of Box and Jenkins [12] to host load traces and tries to predict the future host load. Dinda finds that of all models, AR(16) is most suited for host load prediction. Because his methodology is used as guidance in this dissertation, it will be interesting to see whether bandwidth traces show a similar behavior.

We conclude that statistical models can be used to predict the future behavior of traces from different sources. Statistical models have already been applied to real bandwidth traces. The key problem for a topology–aware application, however, is to find the correct model and the correct set of parameters within reasonable time. There is little time for a topology–aware application to calculate the results of different models and parameter sets and compare the results. In addition, especially complex models may require a significant amount of computation power and time to calculate a single value. A topology–aware application must trade off the (possible) accuracy of a prediction model against its time and overhead. We consider that the few models that have been applied to bandwidth prediction still leave room for investigation.

2.4 Routing

The IP routing protocol has been designed to be stable and to deliver data over as few hops as possible. However, a study by Savage et al. [66] shows that alternative paths exist in the Internet which provide better performance. An alternative path is constructed by combining two “normal” Internet paths via the same host (e.g., an alternative path from a node A to a node B via a node C is created by combining the path AC and the path CB). The bandwidth of the alternative path is the minimum of the two original paths, and the latencies of the two paths are summed. The study, based on a large set of Internet traces, shows that in 30–80% of the test cases, alternate paths exist whose quality (delivery time) is significantly better than that of the default routing path. These results show that alternative routing strategies may make better use of the available resources in the Internet.
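The composition rule stated above (the alternate path's bandwidth is the minimum of the two segments, its latency is their sum) is easy to express in code. The sample metric values are illustrative:

```python
# Compose an alternate path A->C->B from two measured Internet paths,
# following the rule above: bandwidth = min of segments, latency = sum.

def compose(seg1, seg2):
    """Each segment is (bandwidth in Mbit/s, latency in ms)."""
    return (min(seg1[0], seg2[0]), seg1[1] + seg2[1])

path_ac = (8.0, 40.0)
path_cb = (5.0, 25.0)
default_ab = (3.0, 50.0)  # the default IP routing path, for comparison

alt_ab = compose(path_ac, path_cb)
# The alternate path trades 15 ms of extra latency for a 2 Mbit/s gain
# over the default path, the kind of win Savage et al. observed.
```

Whether the trade is worth it depends on the application's metric, which is exactly why routing by a single fixed metric falls short.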

This section looks at different ways of improving the routing performance. First, Section 2.4.1 describes approaches for server selection. We consider server selection a simple form of routing because a client decides along which path the data is sent. Section 2.4.2 reviews different overlay routing protocols. Multicast protocols for overlay networks are compared in Section 2.4.3. Finally, two concrete overlay networks, content distribution networks (CDNs) and peer–to–peer (P2P) networks, are discussed in Sections 2.4.4 and 2.4.5, respectively.


2.4.1 Server selection

Frequently accessed servers are replicated at places all over the world to avoid the overload of a single hot–spot server and to provide multiple possibilities to route data, thereby avoiding possible bottlenecks. A first approach to server selection is to confront a user with a list of replica servers, from which he has to choose the one he thinks has the best performance. However, this decision is very hard for a user who knows nothing about the resource availability in a network.

Carter and Crovella [16] describe a dynamic server selection method that relieves the user from selecting a server. They use two tools, bprobe and cprobe, to estimate the uncongested and congested bottleneck bandwidth, and a third tool to measure the round trip time to the server. When downloading data, they check the data size and decide upon it which metric to use: small data is downloaded from the server with the best RTT, whereas large documents are retrieved from the server with the best bandwidth. The evaluation shows that dynamic server selection consistently outperforms static policies, reducing the response times by as much as 50%.
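The size-dependent policy can be sketched as follows. The size threshold and the server metrics are illustrative assumptions; Carter and Crovella's actual decision rule and measured values differ.

```python
# Sketch of size-dependent server selection: small downloads go to the
# server with the lowest RTT, large ones to the server with the highest
# available bandwidth. The threshold below is an illustrative guess.

SMALL_DOC_BYTES = 64 * 1024

def pick_server(servers, doc_size):
    """servers: list of dicts with 'name', 'rtt_ms', 'bw_mbits'."""
    if doc_size <= SMALL_DOC_BYTES:
        return min(servers, key=lambda s: s["rtt_ms"])
    return max(servers, key=lambda s: s["bw_mbits"])

servers = [
    {"name": "mirror-eu", "rtt_ms": 30.0, "bw_mbits": 2.0},
    {"name": "mirror-us", "rtt_ms": 120.0, "bw_mbits": 8.0},
]
small_pick = pick_server(servers, 4 * 1024)["name"]
large_pick = pick_server(servers, 10 * 1024 * 1024)["name"]
```

The sketch makes the underlying insight explicit: for a short transfer the connection setup time dominates, while for a long one the sustained bandwidth does.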

Fei et al. [30] also address the problem of server selection. Every replica server has its own IP address, and revealing these IP addresses to a user is by no means helpful. The authors propose the use of anycast addresses. An anycast address is a pseudo IP address that can be mapped to a set of real IP addresses by an anycast resolver. This mapping can be steered by different strategies. One strategy is to integrate the approach of Carter and Crovella to dynamically select the server based on resource information. The combination of these two approaches hides server selection completely from the user.

2.4.2 Overlay routing

Overlay routing protocols build a routing layer on top of IP routing and forward data along application–layer paths. Applications may expect better performance because the routing of overlay protocols is based on metrics that better reflect the application needs. The overlay routing protocols differ in the metric they use to route the data. Narada [22] and Scattercast [18] measure latency. Tapestry [89] defines that two nodes are close by “a variety of mechanisms, including network latency and geographic locality”. Yoid [37] suggests latency–based routing but does not implement it. Overcast [47] uses active probes to measure the bandwidth. A RON router [4] implements three different routing metrics from which a user can choose: latency, loss and TCP throughput.

A study by Chu et al. [21] compares the influence of different metrics on the routing behavior and on fixed–rate data stream applications. Among the routing strategies are a latency–based and a bandwidth–based routing scheme as well as a combined bandwidth–latency scheme, where bandwidth is the primary metric and latency is considered as a secondary metric if two paths have similar bandwidth. Their analysis with Internet experiments shows that the combined bandwidth–latency scheme outperforms the other techniques.
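The combined scheme can be sketched as a two-level comparison: bandwidth decides unless the two paths are close, in which case latency breaks the tie. The 10% similarity margin is an illustrative assumption, not the threshold used by Chu et al.:

```python
# Sketch of a combined bandwidth-latency comparison: bandwidth is the
# primary metric; latency breaks ties between similar-bandwidth paths.

SIMILAR = 0.10  # paths within 10% bandwidth count as "similar" (assumed)

def better(p, q):
    """Return the preferred of two (bandwidth, latency) paths."""
    bw_p, lat_p = p
    bw_q, lat_q = q
    if abs(bw_p - bw_q) <= SIMILAR * max(bw_p, bw_q):
        return p if lat_p <= lat_q else q   # tie-break on latency
    return p if bw_p > bw_q else q          # otherwise bandwidth wins

tie_break = better((5.0, 80.0), (4.9, 20.0))   # similar bw, lower latency
dominated = better((5.0, 80.0), (2.0, 20.0))   # bandwidth dominates
```

The two-level structure is why the scheme works for streaming: it never sacrifices the rate the stream needs, but among adequate paths it still minimizes delay.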


Detour [65] is a framework that supports alternate–hop routing with an emphasis on high–performance packet classification and routing. An alternative path is created by concatenating two end–to–end paths; packets are routed along alternative paths by an IP–in–IP encapsulation. The work is mainly focused on the argument why alternative paths are better than the default IP routing path. However, little is known about the details of the deployment and the use of Detour. Questions such as “how are alternative paths found?” or “how often are bandwidth updates needed?” are left unanswered.

Resilient Overlay Network (RON) [4] is an overlay routing architecture that is targeted at detecting and recovering from path outages and periods of degraded performance. RON defines an overlay architecture containing RON routers. A RON router forwards the data based on latency, loss or TCP throughput. The routing policy can be set by administrators or on a per–packet basis: every packet may include a classification tag that allows a router to identify its forwarding preference.

The discussion of related work in overlay routing leads to the following conclusions. First, most overlay routing architectures take only one metric into account, mostly latency (although the routing is mostly used by overlay multicast protocols, which might be assumed to optimize routing according to bandwidth). The main exception is RON. RON in particular shares our view that uniform routing based on a single metric is not suited for a single, unified communication infrastructure. In contrast to our work, however, RON focuses only on routing and does not address application mechanisms, such as adaptation. We conclude that RON is the overlay routing mechanism that is closest to our idea of topology–awareness with respect to the routing behavior.

2.4.3 Overlay multicast

In spite of a decade of research on network–layer (IP) multicast, no global IP multicast service has been deployed. Among the reasons that hindered the deployment are problems related to the multicast service model, such as group management, lack of access control, absence of inter–domain multicast routing and the lack of a distributed multicast address allocation. In addition, the IP best–effort service model combined with IP unicast makes it hard to add improved services, such as reliable, sequenced data delivery and congestion control [18].

As a result, application–layer multicast protocols have been proposed in combination with overlay routing protocols. Jannotti et al. [47] list the pros and cons of these application–layer multicast protocols: incremental deployment of protocols, application–based adaptation, robustness, customizability, building on top of standard protocols (TCP, UDP), and complexity management. On the flip side, inefficiency and loss of information are mentioned.

Overcast [47] is a single–source multicast protocol. Overcast defines a self–organizing protocol that forms the multicast tree based on bandwidth measurements to maximize the bandwidth to the root for all nodes. When a node contacts the Overcast root node, the protocol iteratively searches for the node that is closest to the new node without sacrificing bandwidth. The node’s position is periodically reevaluated to adapt to resource changes.
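The iterative descent can be sketched as follows; the bandwidth table and the 90% tolerance are illustrative assumptions, not Overcast's actual parameters.

```python
# Sketch of Overcast-style tree insertion: a joining node starts at the
# root and descends to a child as long as the measured bandwidth through
# that child is (nearly) as good as through the current parent, ending
# up as deep (close) in the tree as possible without sacrificing
# bandwidth. Bandwidth values and tolerance are illustrative.

TOLERANCE = 0.9  # a child is acceptable at >= 90% of the parent's bandwidth

def insert(root, children, bandwidth):
    """children: node -> list of child nodes; bandwidth: node -> measured
    bandwidth from the joining node through that node to the root."""
    current = root
    while True:
        candidates = [c for c in children.get(current, [])
                      if bandwidth[c] >= TOLERANCE * bandwidth[current]]
        if not candidates:
            return current  # attach as a child of this node
        current = max(candidates, key=bandwidth.get)

children = {"root": ["a", "b"], "a": ["c"]}
bandwidth = {"root": 10.0, "a": 9.5, "b": 4.0, "c": 9.2}
print(insert("root", children, bandwidth))  # -> c
```

Periodic reevaluation would simply rerun this descent with fresh measurements.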

Narada [22] differs from Overcast in that it is designed for multi–source multicast but only supports small–scale multicast groups, whereas Overcast supports large–scale multicast groups. The routing layer of Narada uses a distance–vector protocol (DVMRP) to optimize the routing paths (i.e., the dominating metric is latency). On top of this routing layer, a mesh structure of the participating nodes is built. This mesh is then evaluated to form the final multicast tree.

Scattercast [19] is similar to Narada; it uses latency as the primary metric for routing. Scattercast uses a set of rendez–vous points to allow new nodes to join the distribution.

Yoid [37] is a generic architecture for overlay networks with a number of new protocols. Unlike Overcast, e.g., Yoid aims at one–to–many and many–to–many data distribution, providing support for different applications such as netnews, streaming broadcast and bulk email delivery. On top of the traditional transport layer, a new transport layer implements a set of protocols, such as yTCP or yRTP. On top of this layer, a tree protocol is responsible for the creation of the data distribution tree.

Bayeux [90] is a multicast protocol on top of Tapestry. Tapestry uses local routing maps (called neighbor maps) at each node in a network. An entry in such a map is either a node or a prefix that defines the next hop in the routing path towards all nodes that share this prefix. Bayeux extends this routing scheme by storing the prefixes of all nodes in the multicast tree.

Despite the promising proposals, many questions regarding application–layer multicast are still unanswered. Most overlay multicast protocols are based on an underlying overlay routing protocol. We have seen that many of these routing protocols are not yet at a stage of being deployed and giving clear evidence about their performance in a real, large environment. Given this lack, how should a multicast protocol be evaluated? One issue that is completely omitted is, e.g., the fact that resources fluctuate. How often should a routing protocol check for fluctuations, at what granularity should the routing protocol react to these fluctuations, and how does the overlay multicast routing protocol react to the changes? These questions can only be addressed if they are studied together. In addition, we also note that a lot of theoretical work has been published about overlay multicast. As an example, Shi et al. [68] introduce “several routing algorithms that are suitable for overlay multicast networks”. For their evaluation, they use a network topology of 50 large U.S. metropolitan areas, but “all multicast sessions are assumed to have the same bandwidth”. While such an assumption may suffice to study the protocol performance and show the differences to other protocols, it also shows that much work is still far from deployment in a real network.

Similarly, little attention has been paid to combining the multicast delivery with mechanisms to adapt the data. It may be that adaptation is addressed by yet another protocol on top of the overlay multicast protocol. In this case, the protocol stack would contain three overlay layers: routing, multicast and adaptation. Given the experiences with the drawbacks of the OSI protocol stack, we prefer to consider an alternative approach, namely that of integrating multicast communication into the common architecture.

2.4.4 Content Distribution Networks

Content distribution networks (CDNs) offer hosting services to Web content providers. These distributed services allow the replication of provider content to improve the access performance (faster response and download times). A successful CDN must address two issues regarding routing. First, the replicas must be placed at strategic locations, and second, an efficient mechanism is needed to (re)direct an access query to the “best” replica server.

Qiu et al. [62] investigate how Web server replicas are best placed to reduce the access time. Several strategies for server placement are described that make use of workload information, such as client latency or request rates. They conclude, first, that placement algorithms which incorporate dynamic client workload information perform a factor of 2–5 better than workload–oblivious random algorithms. Second, they conclude that placement algorithms are not very sensitive to noise in the estimates of distance and load. Third, the prediction of future request load must be taken into account by any placement algorithm; for their work, a simple moving–window average fits their requirements.
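The prediction scheme can be sketched in a few lines; the window size and load figures are illustrative, not taken from Qiu et al.

```python
# Sketch of a simple moving-window average for predicting future request
# load: the prediction for the next interval is the mean of the last k
# observed request counts. Window size and sample data are illustrative.

from collections import deque

class MovingWindowPredictor:
    def __init__(self, window=3):
        self.samples = deque(maxlen=window)  # keeps only the last k samples

    def observe(self, requests):
        self.samples.append(requests)

    def predict(self):
        return sum(self.samples) / len(self.samples)

p = MovingWindowPredictor(window=3)
for load in [100, 120, 110, 130]:
    p.observe(load)
print(p.predict())  # mean of the last three samples: (120+110+130)/3 = 120.0
```

The appeal for placement algorithms is its robustness: a single noisy sample shifts the prediction only by 1/k of its error.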

Little information is available regarding efficient redirection mechanisms. Although a number of service providers operate CDNs, e.g., Akamai [1], Exodus [29] or Digital Island [25], no in–depth information about the internals of their routing behavior is publicly available.

2.4.5 Peer–to–peer networks

Peer–to–peer (P2P) communication breaks with the traditional client–server communication model. A peer acts as both server and client and may additionally provide routing support.

One large group of P2P systems allows users to locate objects (files) in a distributed network. Two of the best–known P2P systems are Napster and Gnutella. They differ significantly in their architecture. Napster stores large indices of the file locations on large servers at well–known locations. When a client wants to access a file, it receives from these servers the locations of the peers that store the file, together with metadata (e.g., the reported bandwidth of each peer). Gnutella, in contrast, does not use servers. Gnutella is a fully decentralized system where every peer maintains connections to its immediate neighbors. Files are found by broadcasting a query over a part of the Gnutella network. The distribution of the query is limited by a TTL field.
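The TTL-limited query distribution can be sketched as a bounded breadth-first search; the topology and file placement below are illustrative.

```python
# Sketch of Gnutella-style query flooding: a query is forwarded to a
# peer's neighbors and re-forwarded until its TTL reaches zero, so only
# a bounded part of the network is searched. Topology and file
# placement are illustrative.

def flood_query(graph, start, filename, files, ttl):
    hits, visited = [], {start}
    frontier = [start]
    while frontier and ttl > 0:
        ttl -= 1
        next_frontier = []
        for peer in frontier:
            for neighbor in graph[peer]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    if filename in files.get(neighbor, ()):
                        hits.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return hits

graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
files = {"c": {"song.mp3"}, "d": {"song.mp3"}}
print(flood_query(graph, "a", "song.mp3", files, ttl=2))  # -> ['c']
```

With ttl=2, the copy of the file on peer d is never found: the TTL bounds the search horizon, trading completeness for limited traffic.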

A study by Saroiu et al. [64] analyzes Gnutella and Napster along different parameters. One of their findings is that there is a lot of heterogeneity in P2P networks, sometimes spanning five orders of magnitude. The bandwidth of the peers, e.g., ranges from modem connection speeds (33% of the users) to T1 and T3 links (about 8%). However, this heterogeneity is not addressed by either system. The bandwidth that is reported by peers (by manual user configuration) is returned to a client. If multiple peers store a file, the user can choose from the list of servers and (usually) picks the one with the highest bandwidth. However, a large number of peers report a wrong bandwidth (they mostly report a smaller bandwidth to discourage downloads). The authors conclude that the bandwidth information should be measured dynamically rather than taken from the reported value.

Summing up the research in overlay routing, we conclude that overlay routing is able to route data along application–layer metrics. However, little of this capability is used. Most routing protocols take only one metric (mostly latency) into account, although a study shows that a combined bandwidth–latency metric provides the best performance for fixed–rate streaming applications. A second observation is that there is a large gap between theory and practical deployment: P2P networks, e.g., are still young, but they do not include any dynamic performance information in their routing. Finally, we note that only RON is aware of the fact that different applications have different needs that can (or should) be taken into account by overlay routing.

2.5 Adaptation

Adaptation addresses heterogeneity and resource fluctuations in best–effort networks. Adaptive applications trade one resource against another, e.g., CPU against bandwidth. Of relevance for this dissertation is the adaptation of two data types: images and videos.

2.5.1 Image adaptation

Applications that deliver images have been a fruitful target for adaptation. One reason is that many methods and algorithms exist to adapt images: size reduction, format conversion, compression ratios. Images are also an important data type because many sources produce or distribute them; images are a dominant data type in the WWW.

The dissertation of Bolliger [10] presents a framework for network–aware applications. The framework encapsulates the abstraction of how the application should adapt images and lets the application choose the exact adaptation algorithm that is best suited for the application’s purpose. The interaction between the network and the application is also encapsulated in a way that shields applications from many details of the network.

While Bolliger addresses adaptation on an end–to–end basis, adaptation is performed on proxies in the global mobile computing scenario described by Fox et al. [35, 36]. These proxies (which can be viewed as active service nodes) are typically placed at the border between a global network and slow links, e.g., modem or wireless links. Data flowing through the proxy is distilled (i.e., adapted) according to the data type.


The above work shows that images can be adapted in different ways and at different places. The combination of these two options implies that a decision about how and where adaptation is performed is needed. Especially the question about the location of adaptation has not yet been addressed properly: end–to–end adaptation stands against proxy–based adaptation. In the case of proxy–based adaptation, we identify the lack of a dynamic selection of where the adaptation should be performed. The cited work assumes that the proxy is located before a bottleneck. Such an assumption (mostly) holds when the proxy is located before a modem link. However, in a best–effort network, bottlenecks may occur at various places and in different orders. An extension of proxy–based adaptation to dynamically find the place where adaptation is needed has not been made so far.

2.5.2 Video adaptation

A set of real–time multimedia conferencing tools was created in the mid–nineties to transmit video streams over the Internet: nv from Xerox PARC, ivs from INRIA and vic from Berkeley. These tools consist of codecs that compress video streams on the fly to a target bandwidth. The streams are transmitted over RTP and IP and can be used in combination with IP multicast for the MBone. However, due to the drawbacks of IP multicast, none of these tools is widely deployed in the Internet today.

Research in streaming media has resulted in many techniques for adaptation. Filtering may be done by codecs, frame–dropping or layer–dropping filters. Compression techniques may be based on wavelets or the discrete cosine transform. Streams are congestion or rate controlled, by senders or receivers. Metrics to steer the stream include bandwidth, delay, loss and jitter. Wu et al. [87] give an overview of different adaptation techniques.

In the remainder of this section, we focus on MPEG stream filtering because we use an adaptive MPEG application later in this dissertation. MPEG–1 streams have been designed for transmission rates of 1–1.5 Mbps and are therefore suited for transmissions over the Internet. An MPEG–1 stream consists of one or more compressed audio and video bit streams. The video stream consists of a sequence of frames. An MPEG video stream distinguishes between I–frames, P–frames and B–frames of different importance: I–frames are the most important frames, B–frames the least.

Yeadon et al. [88] pioneered work in MPEG stream filtering. They present six categories of filter mechanisms that can be applied to MPEG streams, among them frame–dropping filters and frequency filters. Every filter category requires a different degree of de– and encoding of the stream and therefore has a different adaptation capacity and a different computation requirement.

Hemy et al. [43] describe a frame–dropping filter called MTP for MPEG–1 streams. The adaptive filter drops the least important frames in times of congestion to adapt the amount of data sent per time unit to the available bandwidth and thus avoid packet loss on the connection. The degree of filtering is steered by the client: the client reacts to lost packets by requesting a reduction of the data rate, i.e., a higher frame–dropping level. This feedback mechanism is smoothed by two thresholds to limit the impact of rapid and small bandwidth fluctuations.
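The feedback loop with two thresholds can be sketched as follows; the drop levels and threshold values are illustrative assumptions, not the MTP parameters.

```python
# Sketch of an MTP-like frame-dropping filter: MPEG-1 frames are dropped
# by importance (B before P before I), and the drop level is steered by
# client loss feedback smoothed with two thresholds (hysteresis) to
# avoid reacting to every small bandwidth fluctuation. Levels and
# thresholds are illustrative, not the actual MTP parameters.

DROPPED = {0: set(), 1: {"B"}, 2: {"B", "P"}}  # level -> frame types dropped
HIGH_LOSS, LOW_LOSS = 0.05, 0.01               # hysteresis thresholds

class FrameDropFilter:
    def __init__(self):
        self.level = 0

    def feedback(self, loss_rate):
        if loss_rate > HIGH_LOSS and self.level < 2:
            self.level += 1      # congestion: drop more
        elif loss_rate < LOW_LOSS and self.level > 0:
            self.level -= 1      # bandwidth recovered: drop less

    def filter(self, frames):
        return [f for f in frames if f not in DROPPED[self.level]]

gop = ["I", "B", "B", "P", "B", "B", "P"]
f = FrameDropFilter()
f.feedback(0.08)                 # heavy loss: start dropping B-frames
print(f.filter(gop))             # -> ['I', 'P', 'P']
f.feedback(0.03)                 # between thresholds: level unchanged
print(f.filter(gop))             # -> ['I', 'P', 'P']
```

Loss rates between the two thresholds leave the level untouched, which is what damps the reaction to small fluctuations.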

We consider frame–dropping MPEG–1 filters well suited to investigate the issues related to topology–awareness. First, their behavior is well understood. Second, frame–dropping is an efficient adaptation mechanism because it does not require a de– and re–encoding of an MPEG–1 stream. Finally, we note that although the possibility to adapt MPEG streams on proxies has been mentioned in previous work, no attention has been paid so far to a dynamic selection of the location where the adaptation should be performed in a real network. That is, it is known that MPEG streams can be adapted on proxies, but dynamically deciding which proxy is best suited in a real network has not been addressed so far.

2.6 Communication architectures

This section describes architectural approaches to integrate networks and applications for communication purposes. A short review of active networks is presented in Section 2.6.1. Section 2.6.2 discusses approaches in the area of active services. Section 2.6.3 looks at the integration of applications into the Grid. Finally, Section 2.6.4 reviews overlay networking architectures.

2.6.1 Active networks

An active network is a network architecture where the behavior of network nodes (switches and routers) can actively be influenced by applications. A packet injected into the network may still contain (passive) application data. In addition, however, a packet in an active network architecture may contain active code that influences the data delivery. This code may steer the forwarding or the (IP) routing preferences, or it may allow the processing of application data in transit, e.g., to adapt the data on a router.

Active networks were designed to address the following problems: the difficulty of integrating and deploying new technologies and standards in a network, poor performance due to redundancy in the different layers of the protocol stack, and the difficulty of accommodating new services in the existing architectural model. Active networks open new research issues in several domains, such as security, application–driven routing and programmability. An overview of different topics can be found in a survey by Tennenhouse et al. [79]. Two important samples of active network architectures are ANTS by Wetherall et al. [84] and CANES by Calvert and Zegura [9].

ANTS replaces the packet in a network protocol by a basic abstraction called a capsule. An ANTS protocol is a collection of related capsules that are treated as a single unit of protection by network nodes. New network protocols can therefore be deployed by packing the protocol code into capsules and shipping them to the active nodes. ANTS provides easy support for different communication forms, e.g., host mobility or multicast.

In addition, Legedza et al. [54] provide a sample of application–specific data processing inside an active network node. They describe an application which gathers samples from distributed sensors. These samples could, e.g., be used to steer the adaptivity. An active network is able to steer the amount of sensor information that is gathered and sent to the application, depending on the availability of resources inside the network. For the described application, the active node can drop, merge or split single samples.

In the CANES project [9], every node in the network supports a particular set of functions. Every function has a unique identifier. Every packet that is sent into the network may contain (i) the identifier of the function that is to be applied to this packet and (ii) parameters to be supplied to this function. The headers and parameters are included in common IP packets by a newly defined IPOPT AP option. The authors evaluate their work with an MPEG application. The MPEG stream can be adapted in times of congestion in several ways. A partial packet discard mechanism defines each IP fragment to be a unit that can be discarded. A similar mechanism at the frame level is the frame–level discard mechanism, where every unit is an MPEG frame. Finally, a third drop level works on a group of pictures (GOP). Congestion on a node is detected when the node is no longer able to fit the incoming data into the output queue.
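The dispatch on function identifiers can be sketched as follows; the function table and packet layout are illustrative stand-ins, not the CANES wire format.

```python
# Sketch of CANES-style processing: each packet names, by identifier,
# the function an active node should apply and carries parameters for
# it; the node dispatches on the identifier. Function table and packet
# layout are illustrative.

def drop_b_frames(payload, params):
    # e.g., frame-level discard of the least important MPEG frames
    return [f for f in payload if f != "B"]

def scale(payload, params):
    # e.g., keep only the first `keep` units of the payload
    return payload[: params["keep"]]

FUNCTIONS = {1: drop_b_frames, 2: scale}  # identifier -> installed function

def process(packet):
    func = FUNCTIONS[packet["func_id"]]
    return func(packet["payload"], packet.get("params", {}))

pkt = {"func_id": 1, "payload": ["I", "B", "P", "B"]}
print(process(pkt))  # -> ['I', 'P']
```

The key property is that the node only ever executes pre-installed functions; the packet selects and parameterizes, but cannot inject, code.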

Although originally designed to provide flexibility in the deployment of new network protocols, no active network architecture has been deployed. The main reason is the common belief that they are dangerous to the stability of networks: the flexibility of applications to influence the routing behavior in an arbitrary way may change the routing behavior of a whole network in an uncontrollable way, deliberately or involuntarily. As an alternative to active networks, active services have therefore been proposed.

2.6.2 Active services

Active services are similar to active networks in that they allow application data processing within a network. However, the processing is restricted to the application layer. Active services thereby preserve the routing and forwarding semantics of the current Internet and address the perceived weakness of active networks. The active service approaches share the idea of providing nodes inside the network (active service nodes) where applications can install and instantiate application–specific services.

The AS1 framework by Amir et al. [3] shows that active services are both sufficient and necessary for deploying scalable, flexible, and robust services within the network. The architecture has been designed as a framework to allow an easy customization and extension. The authors present a media gateway (MeGa) as a sample implementation of AS1. The media gateway distributes a video stream to multiple, subscribed clients.

TACC stands for transforming, aggregating, caching and customizing [34]. These functions describe possible implementations of adaptation mechanisms that can be expressed as active services to address the heterogeneity problem of the Internet. In the TACC model, these services are typically located on scalable servers inside a network. The TACC model is evaluated for image and text data in a modem environment [35]: a set of (proxy) servers is available before the data stream from the Internet reaches the modem line to the end user. These proxy servers host the TACC services, e.g., distillation filters that adapt the incoming stream to the modem speed.

Finally, ALAN (application–layer active network) [39] describes an architecture that is similar to the previous approaches. Fry et al. call an active service node an Execution Environment for Proxylets (EEP), on which proxylets (aka active services) can be installed. The ALAN architecture consists of four components: EEP discovery, routing exchanges, service creation and information routing. The discovery consists of building a large distributed database of all nodes. The routing in ALAN is transparent to the application and is based on bandwidth. ALAN is evaluated with Web applications [38]: adaptive proxylets compress objects while they are transmitted. The authors show that the download time of a large text file (the Bible) can be significantly decreased with ALAN.

All active service architectures are similar in that they allow applications to alter the content of their data on nodes inside a network. The strength of this approach has been shown with different applications. While these architectures show how data can be adapted, it is generally not known how such an architecture gathers the performance information that is needed for the adaptation. In addition, although these architectures define adaptation protocols to steer the adaptation, no information is available about how these protocols are used by real applications.

2.6.3 Grid

The Grid is an evolving architecture that connects high–performance computing resources, e.g., clusters, over large distances. Very large applications, e.g., simulations, use the combined computational power to speed up their execution. Grid applications are similar to topology–aware applications because they adapt the scheduling of their tasks to the availability of Grid resources. Berman et al. [7] describe such an adaptation task as a sequence of resource discovery, resource selection, schedule generation, schedule selection and application execution. Resource discovery especially focuses on the discovery of available computing power and the measurement of the bandwidth among the resources to ensure a fast distribution of the tasks. Because the requirements of Grid applications differ yet the process of discovering and selecting a schedule is similar, the functionality of this process is encapsulated in an architecture with which a client application can interact. This architecture interacts with other parts of the Grid infrastructure, e.g., the Network Weather Service [86], and can therefore be considered as a middleware between applications and the Grid environment.

The Network Weather Service (NWS) [86] is a monitoring system that periodically measures different metrics (bandwidth, latency) among Grid resources. An application can query these raw data. In addition, the NWS includes different bandwidth prediction models.

Since the Grid is an evolving architecture, many issues are currently not yet addressed. However, given that the Grid spans multiple network technologies and is used by different applications, we consider that the Grid architecture has to address issues that are similar to those in this dissertation.

2.6.4 Overlay network architecture

This section reviews work related to the construction of an overlay network architecture. In contrast to Section 2.4.2, where only routing in overlay networks is discussed, this section presents whole architectures that address multiple issues.

The Ninja architecture [41] aims at providing a service design that allows an easy construction of scalable, robust services through a distributed service architecture. It consists of four basic elements: bases, which are powerful workstation cluster environments with a software platform that simplifies scalable service construction; units, which are the devices by which users access the services; active proxies, which are transformational elements that are used for unit– or service–specific adaptation; and paths, which are an abstraction through which units, services and active proxies are composed. A sample application is the Ninja Jukebox, which allows the streaming of MP3 files to clients.

Nakao et al. [56] describe a framework for constructing network services for accessing media objects. The framework consists of different components that are sequentially connected to form an end–to–end path. The framework contains abstractions for, e.g., nodes or rules (path rules and node rules). An end–to–end path is constructed by applying rules on a network (graph). The authors use an adaptive MPEG stream as a possible use of their framework.

The Resilient Overlay Networks (RON) architecture consists of a set of nodes that communicate with each other via an overlay architecture. Every RON node contains a forwarder which determines the best path for every packet that enters the RON network; within RON, no further path evaluation is made to improve the performance. The routing path is determined based on bandwidth. To learn about the bandwidth, active probing techniques or passive observations can be used. In addition, latency and loss are measured along the paths as well. An application influences the routing behavior by specifying the metric that should be chosen for the routing. The implementation is tightly integrated into the router core.

The Internet Indirection Infrastructure (i3) [73] provides a rendez–vous based architecture that decouples the sender and the receiver in a peer–to–peer communication. This indirection in communication is achieved by receivers placing a trigger inside a network. A trigger is a piece of software that is injected into the network by a receiver. The trigger contains a unique communication id rather than an IP address, and it knows the location of the receiver. A sender in i3 does not send the data directly to the address of the receiver; it sends it to the trigger instead by adding the trigger id to the data. An indirection is thus created by sending the data from the sender to the trigger, via the id, and from the trigger to the receiver. This form of indirection allows the deployment of a set of communication models. Multicast is achieved when multiple receivers insert triggers with the same id: the data is then automatically sent from a sender to all triggers that contain this id. A second communication model that can be expressed in i3 is host mobility. Host mobility is easier than in the P2P model because whenever a host changes its location, it inserts a new trigger with the same id which replaces the previous trigger. The new trigger automatically routes the data to the new receiver location; the sender is completely unaware of this change. Finally, adaptation is possible if the trigger does not only implement a simple data forwarding but a customized data processing.
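The trigger mechanism can be sketched with a simple in-memory table; the class, ids and host names are illustrative, not i3's actual interface.

```python
# Sketch of i3-style indirection: receivers insert triggers (id ->
# address) into the infrastructure; senders address data to an id, and
# the infrastructure forwards it to every matching trigger. Multiple
# triggers on one id give multicast; re-inserting a trigger from a new
# address gives mobility. Names and data structures are illustrative.

class I3:
    def __init__(self):
        self.triggers = {}  # communication id -> {receiver addresses}

    def insert_trigger(self, trig_id, addr):
        self.triggers.setdefault(trig_id, set()).add(addr)

    def remove_trigger(self, trig_id, addr):
        self.triggers[trig_id].discard(addr)

    def send(self, trig_id, data):
        # the sender never learns the receiver addresses
        return {addr: data for addr in self.triggers.get(trig_id, ())}

i3 = I3()
i3.insert_trigger("id42", "hostA")
i3.insert_trigger("id42", "hostB")        # multicast: same id, two triggers
print(i3.send("id42", "pkt"))             # delivered to hostA and hostB
i3.remove_trigger("id42", "hostA")
i3.insert_trigger("id42", "hostA-new")    # mobility: the host has moved
print(sorted(i3.send("id42", "pkt")))     # -> ['hostA-new', 'hostB']
```

Customized processing in the trigger (the adaptation case) would replace the plain forwarding in `send` with a per-trigger transformation of `data`.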

All these architectures have in common that they allow applications to customize the communication and thereby express their requirements. The customization comes in the form of abstractions that are named differently in the various approaches: paths, rules, tags, triggers.

The architectures show a great variety of possible approaches. Ninja provides a set of abstractions that can be combined to build a path. Nakao et al. present a framework architecture. RON is an overlay structure where tags in packets allow applications to define the routing. i3 is also an overlay structure whose customization is achieved by triggers. Because none of these architectures extensively studies the use by applications, it is hard to draw conclusions about which architecture is best suited to integrate applications and networks, and it is also hard to see how much interaction is possible or needed. These questions have to be addressed in this dissertation.

2.7 Summary

The discussion of background and related work shows that previous research has come up with a great variety of solutions to the problems related to topology–awareness. Especially the shift from network–layer solutions to the application layer has created an amazing number of proposals. From this volume we conclude that the issues we address in this dissertation are a relevant problem.

The discussion also shows that many approaches are targeted at only one single problem. Taking again overlay routing as an example, little effort is spent on how network information is integrated into the application–layer routing in a real network. Similarly, little is known about the benefit of overlay routing for real applications. There is a clear lack of knowledge about the work that surrounds a single issue. And, as an immediate conclusion, no work has thoroughly studied a tighter integration of applications and networks to address the issues related to topology–awareness.


3 Octopus: a topology–aware framework

The concept of topology–awareness aims at a unified architecture that allows applications to express their communication and sensitivity requirements. The creation of such an architecture requires a major software engineering effort. We start this software engineering process with the analysis of the requirements of typical topology–aware scenarios; the process results in the design of the architecture. The process and the resulting architecture are described in this chapter.

We start with the analysis of the requirements on a unified architecture by looking at different application scenarios. Section 3.1 describes four different sample scenarios: adaptation, handoff (the dynamic change from one connection to an alternate connection, similar to host mobility), multicast and multi–path streaming. We have selected these scenarios from the larger set of possible scenarios because we consider them important features of current and future applications. We also consider four scenarios a sufficient number to analyze the different application requirements.

Based on these scenarios, we derive the basic design of the unified architecture. In a first step, Section 3.2 defines the main actors that interact in the topology–aware process. Then, these actors are transformed into a topology–aware framework in Section 3.3. We argue that a framework is a well–suited structure to anchor the common topology–aware process around abstractions that can be customized by applications. Finally, Section 3.4 summarizes the work.

3.1 Scenarios

This section analyzes four different communication scenarios that we consider important for today's and tomorrow's communication needs. These scenarios must therefore be supported by a topology–aware architecture. The analysis focuses on the process that the application runs through to set up the communication and to steer it while transmitting data. In the figures that depict the scenarios and describe the process, terms that depend on application–specific preferences are italicized. These terms must be customizable by applications in a topology–aware architecture.

Figure 3.1: Scenario 1: topology–aware adaptation before a bottleneck. (The figure shows a client, a server and Octopus nodes along the selected path, annotated with the steps: 1. find Octopus nodes, 2. measure, 3. select path, 4. find bottleneck, 5. select node, 6. instantiate, 7. start transmission, 8. adaptation.)

The following scenarios are already expressed in the terminology that is used throughout this dissertation, although the terms are only introduced in Section 3.3. The reason for this new terminology is that different approaches in the various research domains use different names for the same concept. As an example, a node inside a network that is used to instantiate application–specific code is sometimes called an active service node, sometimes a proxy. In addition, both terms have typically been used in areas that are closely related to application–specific adaptation, but they are seldom used in the area of multicasting. To avoid such pre–defined, misleading ideas about various terms, we prefer to use our own terminology. This terminology is still close enough to the common understanding of the defined terms that the reader does not need to learn a new vocabulary.

The terms that are used in the scenarios are Octopus node and Octopus service. An Octopus node is a node inside a network that provides a programmable platform that is visible (accessible) to applications. The program that is instantiated on such a node is denoted as an Octopus service. A service comprises any kind of data processing, including data forwarding, redirecting, multicasting as well as adaptation. Because all Octopus nodes are visible to the applications, the links among the Octopus nodes and the paths through a network are application–layer connections which run on top of an IP network.

3.1.1 Scenario 1: topology–aware adaptation

Figure 3.1 depicts a server and a client of a topology–aware application. Assume that the application wants to stream a real–time video from the server to the client. The topology–aware application has two options to improve the streaming. First, it can decide along which path it wants to route the data, and second, it may adapt the data if a bottleneck is encountered in the transmission. Adaptation can be performed on the server, or it can be done inside the network on an Octopus node. An adaptation inside the network may, e.g., be motivated by a large load on the server node.

The first step for the topology–aware application is to find available Octopus nodes inside the network. When the application starts up, it does not yet have information about where Octopus nodes are located in the network. The application needs information about available Octopus nodes to determine the routing of the data. After determining the location of these nodes, metrics (bandwidth) must be measured. The nodes and the metrics now form a directed graph that can be evaluated by the application to select the best transmission path. This path is simply denoted as "path" in Figure 3.1 and uses three Octopus nodes which route the data along application–layer connections. When the best path is determined, it must be checked whether a bottleneck along the path limits the data transmission. If so, the best Octopus node on the server side of the bottleneck must be selected to instantiate an adaptive service which adapts the data to the available bandwidth. The term "best" again depends on application preferences. Having instantiated the service, the application can start the transmission. During the transmission, the adaptation may be steered by the application.

Figure 3.2: Scenario 2: dynamic handoff of a connection. (The figure shows the old and the new path between server and client, annotated with the steps: 1. find paths, 2. measure, 3. alternative path, 4. find bottleneck, 5. select node, 6. instantiate, 7. switch connection.)

3.1.2 Scenario 2: dynamic handoff

A dynamic handoff is a switch from an existing connection to an alternative connection. In Figure 3.2, assume that a server is sending data over an existing path (denoted as "old path") to the client. During the transmission, the capacity of one of the links goes down (indicated by a cross). Knowing the topology of the network, the application initiates a search for alternative routing paths. Among the set of possible paths, the best alternative can be selected according to the application–specific metric (bandwidth). If adaptation is needed along this path, an adaptive service may be instantiated. When all parts of the new path are set up, the application can switch from the old to the new connection and shut the old connection down.


Figure 3.3: Scenario 3: topology–aware multicast. (The figure shows a server, two clients and Octopus nodes hosting multicast and adaptation services, annotated with the steps: 1. find Octopus nodes, 2. measure, 3. paths, 4. find mc points, 5. instantiate, 6. find bottlenecks, 7. adaptation nodes, 8. instantiate, 9. start transmission.)

3.1.3 Scenario 3: topology–aware multicast

Figure 3.3 depicts a simple multicast scenario with one server and two clients. At the beginning, the Octopus nodes inside the network are again unknown to the application. A discovery of the nodes is necessary, as described in the first scenario. After finding available nodes and measuring the necessary metrics, the best transmission paths can be selected from the set of possible paths. This selection is again application–specific: some applications may only look at bandwidth, others may take latency or jitter into account. When all paths are found, multicast points are identified where two streams towards the clients diverge. A multicast service is instantiated on the corresponding Octopus nodes. If the paths contain bottlenecks, it is additionally necessary to search for Octopus nodes that can be used for adaptation.
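The multicast–point computation just described can be sketched as follows. This is an illustrative fragment under our own naming assumptions (`split_point`, `find_multicast_points`), not code from the framework: given the application–layer path chosen for each client, the multicast service belongs on the last Octopus node the paths still share before they diverge.

```python
def split_point(path_a, path_b):
    """Return the last node common to both paths before they diverge."""
    last = None
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        last = a
    return last

def find_multicast_points(paths):
    """Map each divergence node to the indices of the client paths it serves."""
    points = {}
    for i in range(len(paths)):
        for j in range(i + 1, len(paths)):
            node = split_point(paths[i], paths[j])
            if node is not None:
                points.setdefault(node, set()).update({i, j})
    return points

# Hypothetical example: both client paths share the server S and node O1,
# then diverge, so a multicast service would be instantiated on O1.
paths = [["S", "O1", "O2", "client1"],
         ["S", "O1", "O3", "O4", "client2"]]
print(find_multicast_points(paths))  # {'O1': {0, 1}}
```

With more than two clients, the same pairwise computation yields one multicast point per branching of the distribution tree.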

The process described here is not necessarily limited to the startup of the application, nor does it have to concern the whole distribution tree. A reorganization of the tree may be done at any time while the application is running (the reorganization may, e.g., be followed by a dynamic handoff), and a reorganization may be limited to any part of the distribution tree.

3.1.4 Scenario 4: multipath streaming

The traditional Internet design allows the routing of data along only one path. Since a topology–aware application has knowledge of the network topology and can influence the routing, it may also select multiple paths rather than a single one to transmit the data. We denote this capability as multipath streaming. Figure 3.4 shows two possible paths from the server to the client. Multipath streaming may be used by communication–intensive applications, such as multimedia applications.


Figure 3.4: Scenario 4: Streaming data simultaneously over multiple paths. (The figure shows a split and a merge point between server and client, annotated with the steps: 1. find Octopus nodes, 2. measure, 3. find set of paths that match requirements, 4. find split points, 5. find bottlenecks, 6. adaptation nodes, 7. instantiate, 8. start transmission, 9. adaptation and splitting.)

The process of multipath streaming also requires the discovery of Octopus nodes and the measurement of the metric as a first step. Rather than selecting a single path, however, a set of paths is chosen that matches the application requirements. Such a requirement for a multimedia application may be that the sum of the bandwidth of the different paths must exceed the bandwidth required by the multimedia stream, while the difference in the latency of the paths must be small enough to avoid synchronization problems. When such a set of paths is found, the splitting and merging points for the streams must be identified, similar to the finding of multicast points. If adaptation is needed, the corresponding nodes must be found as well. Once all services are in place, the transmission can begin. During the transmission, the adaptation as well as the splitting must be steered by the application.
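The path–set requirement described above (aggregate bandwidth above the stream's demand, latency spread below a synchronization bound) can be sketched as follows. The function and path names are hypothetical, and the greedy strategy is just one way to satisfy the constraints:

```python
def select_path_set(paths, bw_needed, max_latency_spread):
    """Pick paths whose bandwidths sum to at least bw_needed while keeping
    the latency spread within max_latency_spread (for synchronization)."""
    by_latency = sorted(paths, key=lambda p: p[2])
    for i in range(len(by_latency)):
        # candidate window: all paths within the latency bound of path i
        window = [p for p in by_latency[i:]
                  if p[2] - by_latency[i][2] <= max_latency_spread]
        window.sort(key=lambda p: p[1], reverse=True)  # widest first
        chosen, total = [], 0.0
        for path in window:
            chosen.append(path)
            total += path[1]
            if total >= bw_needed:
                return chosen
    return None  # no feasible combination found

# (name, bandwidth in Mbit/s, latency in ms) -- hypothetical measurements
paths = [("path1", 3.0, 40), ("path2", 2.5, 55), ("satellite", 4.0, 300)]
# A 5 Mbit/s stream tolerating a 50 ms spread: the high-latency satellite
# link is rejected; path1 and path2 together satisfy the demand.
chosen = select_path_set(paths, bw_needed=5.0, max_latency_spread=50)
print([p[0] for p in chosen])  # ['path1', 'path2']
```

Note how the latency constraint excludes a path that would win on bandwidth alone: the two constraints must be evaluated jointly, not one after the other.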

3.1.5 Summary

The presentation of the four scenarios leads to the following conclusions. First, the process of finding nodes, selecting the path and the nodes, and instantiating the services is similar in all scenarios. This similarity is used to define the basis for a unified topology–aware architecture.

Second, the comparison also reveals differences among the scenarios. The differences represent application–specific preferences that must be kept flexible to implement multiple scenarios. These preferences include the metric, the selection criteria and the (kind of) service. These parts must be built into the architecture as abstractions that can be customized by applications.

Third, we observe that the process is not linear in that several abstractions influence the topology–aware process at various places. The metric to which an application is sensitive, e.g., influences the measurement (because the corresponding metric must be measured), the selection of a path (should the metric be maximized, e.g., for bandwidth, or minimized, e.g., for latency) and the placement of the service. This complexity is an effect of the tighter integration of applications and networks and requires a sophisticated design for a unified topology–aware architecture.

Figure 3.5: Use case of a topology–aware process. (The diagram shows the actors application, manager, monitor, predictor, evaluator and selector across the application, management and network layers: the application requires a service and defines the evaluation and selection strategies; the manager delegates to the monitor, which gathers and delivers network information; the predictor predicts network information; the evaluator delivers the best path; and the application uses, places and instantiates services.)

3.2 A unified process for topology–awareness

The unified topology–aware process is described in a UML use case diagram [33] in Figure 3.5. A use case describes the communication among the different actors in a system at a high abstraction level. The use case in Figure 3.5 is a direct consequence of the analysis of the application scenarios.

Figure 3.5 depicts seven actors. A first actor is the topology–aware application. It interacts with a manager by inquiring about the best way of transmitting the data. The manager maps the application request into a network request and asks the monitor to gather the necessary information (e.g., topology, bandwidth). The monitor returns the information to the manager, which processes this information on behalf of the application with the aid of the other three actors. First, the manager may ask a predictor to predict the future availability of resources based on the current measurements. Second, the evaluator evaluates the gathered topology information, i.e., it finds the best path through a network topology. Note that the application delegates the evaluation to this actor, i.e., the evaluator is defined by the application and evaluates the topology on its behalf. Third, the selector encapsulates the selection strategy for the service that should be instantiated (e.g., multicast, adaptive). The selector also works on behalf of the application. Finally, when all information about the transmission has been processed and the transmission strategy is known, the application instantiates services on the corresponding Octopus nodes.

Figure 3.5 separates the actors into three layers, as indicated by the dashed lines: the application, the network and an intermediate layer around the manager actor. We therefore call this middle layer the management layer. These three layers reflect the fact that the two existing layers, the application and the network layer, are integrated by means of a third layer. The management layer takes the application's preferences and requirements regarding communication and sensitivity and combines them with the information from the network. The three Octopus layers must not be considered transparent layers, as, e.g., in the OSI network model. These layers form an organizational unit and group related classes together. An interaction among the classes in the different layers should only be made through a small interface between the layers.

The interaction between the classes is necessary and important, but it must be done in a predictable and controlled manner. Without these layers, an application would be free to interact with the other layers at will. However, we have shown that there is a common process in all the scenarios. This process can be very complex, so it is useful to hide it from the application. When the application does not have to deal with and implement this process, it is easier to integrate new applications than if the whole process had to be defined anew for every application. Similarly, the interface to the network layer shields the details of the information gathering from the other layers. This shielding not only facilitates the design of the management layer, it also ensures that monitors for different network types can be used without changing the management layer.

3.3 Octopus – a framework for topology–awareness

Following the use case, the requirements on a unified architecture can be summarized as follows:

- The architecture must contain three layers. The application and the network layer must be extensible.

- The interaction among the parts of the architecture may be complex and may span all three layers.


- The process of a topology–aware application must be fixed in the architecture. Its complexity should be hidden from the application to facilitate the development and the integration of applications.

Figure 3.6: Layered structure of the Octopus framework. (The classes are arranged into the network, management and application layers: Collector with TopologyColl, BandwidthColl and LatencyColl; Metric with Latency and Bandwidth; Resource with Node and Link; Graph; Predictor; Evaluator with ShortestPath and MaxFlow; Selector with Multicast and Adaptation; and Service with ImageAdapter and Forwarder.)

The software engineering community has come up with several approaches to build such architectures, such as libraries, toolkits, components and frameworks. A comprehensive study of different approaches in the area of network–awareness is given by Bolliger [10]. Since topology–awareness and network–awareness are closely related, we can map some of the pros and cons to our requirements. We learn that libraries and toolkits are not flexible enough to meet the requirements of a topology–aware architecture. Components are flexible, but they do not fix the process well enough. Frameworks, in contrast, seem well suited to satisfy the requirements. A framework fosters design reuse, which complies with the requirement that the topology–aware process must be fixed. Frameworks are also well suited to capture complex interactions, and they provide techniques which allow easy extensibility. A framework also allows the separation into several layers while maintaining the interaction among the layers.

3.3.1 The design of the Octopus framework

Figure 3.6 gives a rough overview of the design of the Octopus framework. The classes are attributed to one of the three layers, corresponding to the requirements. The main part of the framework lies at the management layer because this part of the framework regulates the interactions between the two other layers.

Apart from the separation of the classes into the different layers, we additionally group them by functionality. Classes that are related to network information are shown in white. Boxes in gray represent classes which deal with the selection of the best path or the best Octopus nodes. Finally, classes that deal with the service infrastructure are depicted in black. A large part of the classes from the use case are encountered in this structure again, with the exception of the manager, which is not designed as a class but whose functionality is distributed among the different classes at the management layer. In addition to the actors of Figure 3.5, several classes are shown that represent data structures. These classes do not have an active role and have therefore not come up in the use case.

Figure 3.7: The design of the Octopus framework core. (The class diagram shows Resource with its subclasses Node (IP, hostname, getLoad()) and Link (srcNode, dstNode, getBw(), getStaticBw(), getLatency()); Metric with Latency and Bandwidth, each holding current_value and history; Graph (refresh(), removeNodes(NodeType)) and Path (getBw()), which contain resources; Evaluator (evaluate(graph, metric, src, dst): Graph, findBottleneck(graph, metric, src, dst)) with ShortestPathEval and MaxFlowEval; Selector (selectNode(graph, metric), selectPath(graph, src, dsts): Nodelist) with PathSelector, MulticastSelector and AdaptiveSelector; Predictor (predict(metric, model)); and Service (handle()).)

Figure 3.7 shows the different classes in more detail, though only the most important methods are mentioned for each class. One fundamental class in the design is the Octopus resource class. An Octopus resource is an (abstract) resource in a network that is visible at the application layer. Two classes extend the resource class: Octopus node and Octopus link.

An Octopus node is a node inside a network whose resources (CPU, memory) can be used by an application. An Octopus node provides an application–layer platform on which applications can instantiate specific Octopus services. Octopus nodes correspond to active service nodes [3] as well as to end systems that are used as intermediate nodes in the terminology of overlay networks [21]. Octopus nodes which run client or server code are called Octopus end systems.

An Octopus link is a connection between two Octopus nodes or between an Octopus node and an Octopus end system. An Octopus link therefore corresponds to an application–layer connection in a network.

Important for topology–awareness is that Octopus resources know about their resource availability; e.g., a link should know its bandwidth. Every resource in Octopus therefore has one or more references to a metric. A metric class stores the current value of the resource metric and a history of past values. These values are needed by the Octopus predictor. Two extensions of the metric class are shown in the figure, bandwidth and latency, but others can additionally be defined by the application.
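A minimal sketch of this metric abstraction, with our own method names and a bounded history as an added assumption (the framework's actual implementation may differ):

```python
class Metric:
    """A metric stores its current value plus a history of past values,
    which the predictor later consumes."""
    def __init__(self, name, history_size=100):
        self.name = name
        self.current_value = None
        self.history = []              # past values, oldest first
        self.history_size = history_size

    def update(self, value):
        """Record a new measurement, retiring the oldest if needed."""
        if self.current_value is not None:
            self.history.append(self.current_value)
            self.history = self.history[-self.history_size:]
        self.current_value = value

class Bandwidth(Metric):
    def __init__(self):
        super().__init__("bandwidth")

class Latency(Metric):
    def __init__(self):
        super().__init__("latency")

bw = Bandwidth()
for sample in (10.0, 8.0, 12.0):       # Mbit/s measurements
    bw.update(sample)
print(bw.current_value, bw.history)    # 12.0 [10.0, 8.0]
```

An application–defined metric (e.g., jitter) would simply be another subclass, which is what makes the evaluators below metric–agnostic.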

Single resources are aggregated by an Octopus graph. This class represents the network topology that is used by topology–aware applications. The graph is not only the central data structure in this framework, it also controls a large part of the process of the framework.

An Octopus path is another resource aggregation, similar to the Octopus graph. An Octopus path is a linear set of Octopus nodes between two Octopus end systems. We introduce the Octopus path in addition to the Octopus graph because applications often deal with this data structure when they have to decide where to instantiate Octopus services.

The predictor is a simple class that encapsulates the prediction of different metrics. Different (statistical) models can be selected for the prediction.

The Octopus evaluator processes the topology graph on behalf of the application. Evaluators implement well–known graph traversal algorithms. One extension, the shortestPathEvaluator, implements Dijkstra's shortest path algorithm; the maxFlowEvaluator determines the maximal flow through a network (also called the widest path). The evaluators are parameterized by a metric. Most commonly, maximal flow algorithms are used in conjunction with bandwidth, shortest path algorithms with latency. However, other metrics, such as jitter or a combination of bandwidth and latency, can be considered as well. Here lies an advantage of creating a special abstraction for metrics: the network must provide information about the same metrics that are later passed to the topology evaluation, at a different place in the framework and at a different time in the process. The flexibility obtained by combining evaluators and metrics allows the creation of evaluators that deal with different metrics.
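The widest–path computation can be sketched as a Dijkstra variant that maximizes the bottleneck value of the chosen metric instead of minimizing a sum. The code below is an illustration under our own naming and data–structure assumptions (a plain adjacency dict), not the framework's maxFlowEvaluator implementation:

```python
import heapq

def widest_path(graph, src, dst):
    """Return (bottleneck_value, path) of the path maximizing the minimum
    edge value between src and dst, or (0, None) if dst is unreachable."""
    best = {src: float("inf")}
    heap = [(-float("inf"), src, [src])]
    while heap:
        neg_width, node, path = heapq.heappop(heap)
        width = -neg_width
        if node == dst:
            return width, path
        for nxt, value in graph.get(node, {}).items():
            w = min(width, value)          # bottleneck so far
            if w > best.get(nxt, 0):
                best[nxt] = w
                heapq.heappush(heap, (-w, nxt, path + [nxt]))
    return 0, None

# Hypothetical application-layer topology: the direct link offers 2 Mbit/s,
# the detour over O1/O2 sustains 5 Mbit/s end to end.
graph = {"server": {"client": 2, "O1": 10},
         "O1": {"O2": 8},
         "O2": {"client": 5}}
print(widest_path(graph, "server", "client"))
# (5, ['server', 'O1', 'O2', 'client'])
```

Swapping the edge values for latencies and the max–min rule for a sum–minimizing one yields the shortest–path variant; the traversal skeleton stays the same, which is precisely the benefit of parameterizing evaluators by a metric.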

Octopus selectors capture the preferences for the different filters or services that should be instantiated. Depending on the filter type, the strategy to place an adaptive filter differs. A path selector selects one path out of a list of possible paths. Assume, e.g., that the application searches for paths to create a multipath, i.e., where the aggregated bandwidth of all selected paths must exceed a certain threshold. The evaluator returns a list of paths, ordered by their capacities. The selector then chooses the ones that are needed to satisfy the application's condition. A multicast selector searches for the multicast points, i.e., the places where streams split. Finally, an adaptive selector selects a node where a topology–aware application can instantiate an adaptive service.

In this design, evaluators and selectors are separated. It would also be possible to integrate the methods of both classes into a single class. However, such a class would be very hard to write and especially hard to reuse. With the separation, the functionality of each class is well encapsulated and limited. Given this separation, we estimate that the two presented extensions of the Evaluator class, the shortest path and the maximal flow evaluator, are by far the most likely used types of evaluators. As a consequence, many topology–aware applications can not only reuse the design of the topology–aware framework but also one of these two evaluator implementations. The selector then implements only the strategy according to which the evaluated topology should be traversed. The complexity here lies especially in the expression of the application preferences. In the case of a multicast selector, e.g., a first preference for a node is certainly the Octopus node where two streams split. However, if this node is already loaded, an alternative node may be selected. An application can express such sophisticated demands by customizing the selector class.

Figure 3.8: Sequence diagram of the Octopus framework. (The diagram shows the application contacting the manager (1), which calls the collector (2, 3), the predictor (4), the evaluator (5) and the selector (6); a new URL is returned to the application (7), which then instantiates the Octopus service and starts.)
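The load–aware fallback just described might be expressed by a customized selector along these lines. This is a sketch under our own assumptions (the load threshold, the class and method names, and the fallback rule of walking downstream along the path are all illustrative):

```python
class MulticastSelector:
    """Prefer the node where the streams split; if it is already loaded,
    fall back to the next acceptable node downstream on the path."""
    def __init__(self, max_load=0.8):
        self.max_load = max_load  # assumed load threshold (fraction of CPU)

    def select_node(self, path, split_node, load):
        start = path.index(split_node)
        for node in path[start:]:
            if load.get(node, 0.0) <= self.max_load:
                return node
        return None  # every candidate is overloaded

path = ["server", "O1", "O2", "O3", "client"]
load = {"O1": 0.95, "O2": 0.30, "O3": 0.60}   # O1 (the split node) is loaded
selector = MulticastSelector()
print(selector.select_node(path, "O1", load))  # O2
```

The point of the customization hook is that only this small strategy class changes per application; the evaluator that produced the path is reused unmodified.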

3.3.2 The dynamic view of the Octopus framework

Figure 3.8 shows the dynamic flow of the Octopus framework in a sequence diagram. This sequence diagram shows a typical interaction of the classes in the Octopus framework. It thereby combines the sequence identified by the scenarios with the class diagram of the Octopus framework. Note that the encircled numbers in the figure will be shown again in Figure 3.9.

The sequence starts with a topology–aware application that wants to transmit data. It first needs information about the network topology. To get this information, the Octopus manager is contacted. A typical request might be: "What is the best transmission path from A to B, and do I have to instantiate a filter on a node?" (arrow 1 in Figure 3.8). The manager interacts with the collector(s) to get topology and resource information (2, 3). The result is a resource graph, as described in Figure 3.7. This resource graph is subsequently processed by the predictor (4), the evaluator (5) and the selector (6). The result (which may be a path or a location where a service can be instantiated) is returned to the application (7), e.g., in the form of a new URL.

Figure 3.9: The process of a topology–aware application using the Octopus framework. (The figure spans the application, management and network layers: the client passes a URL of the form xtp://client/ to the manager, which invokes the Remos collector and modeler, the predictor, the evaluator and the selector, and returns a new URL of the form xtp://service,client/.)
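The numbered sequence can be condensed into a single manager call that chains the four actors. All names below are illustrative stand–ins for the framework classes, and the URL format merely mimics the xtp:// notation used in Figure 3.9:

```python
def manager_request(src, dst, collector, predictor, evaluator, selector):
    """Sketch of the manager's control flow for one application request."""
    graph = collector()                     # (2, 3) gather topology + resources
    graph = predictor(graph)                # (4) predict future availability
    path = evaluator(graph, src, dst)       # (5) best path through the graph
    service_node = selector(path)           # (6) where to instantiate a service
    # (7) hand the result back to the application, encoded as a new URL
    return "xtp://" + ",".join([service_node, dst]) + "/"

url = manager_request(
    "server", "client",
    collector=lambda: {"server": ["O1"], "O1": ["client"]},
    predictor=lambda g: g,                  # identity: no prediction model
    evaluator=lambda g, s, d: [s, "O1", d], # pretend O1 lies on the best path
    selector=lambda path: path[1],          # place the service on O1
)
print(url)  # xtp://O1,client/
```

Passing the predictor, evaluator and selector in as callables mirrors the framework idea that the fixed manager keeps control of the execution while the extensible classes are supplied by the application.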

In addition to the temporal understanding of the interactions among the different parts of the framework, Figure 3.8 also shows a typical interaction pattern for a framework. A framework typically consists of two types of classes: fixed classes and extensible classes. In a typical interaction pattern, a fixed class (here the manager) keeps control of the execution. Calls to fixed classes are merged with calls to extended classes (shaded in gray). The full implementation of the framework contains even more of these interactions, such as calls to the metric abstraction or calls to metric–specific monitors. These calls are not shown in Figure 3.8 to keep the figure simple.

3.3.3 The layered structure of the Octopus framework

The previous description of Octopus has focused on the core Octopus framework. This section describes the complete architecture of Octopus, i.e., the Octopus core plus the two adjacent layers, the application and the network layer.

Figure 3.9 focuses on the calling sequence of the topology–aware process. It thereby combines the sequence diagram of Figure 3.8 with the layered structure and the classes of the Octopus framework. When a distributed application starts up, no information about available Octopus nodes is available. The client queries the Octopus framework by passing the URL (xtp thereby stands for any transport protocol). Knowing the client and the server address, Octopus tries to determine the available Octopus nodes between these two Octopus end systems. In our implementation, we are using the Remos system for this purpose. Remos consists of two main parts: a modeler and a collector. A collector gathers information about a part of a network. Collectors are specific to the kind of networks they support: an SNMP collector gathers information about a LAN using SNMP, a WAN collector gets its information from various Internet tools, such as traceroute or nettest. A modeler combines the information from various collectors and creates a topology out of the information. We attribute part of the Remos modeler to the network layer and another part to the management layer.

Remos is well suited for our needs for several reasons. First, Remos provides information about the topology and its resources via a uniform API and thereby hides the details of the underlying tools. Second, the Remos collectors can easily be extended. Currently, they measure the latency and the bandwidth of links. Additional collectors for nodes (to measure the load on the nodes) or for other metrics, such as jitter, could easily be integrated into the system. Moreover, given that the network layer and the management layer are well separated, tools or network management systems other than Remos can be integrated into Octopus.

The topology and the performance information are then returned to the manager. After applying the predictor, the evaluator and the selector, a new URL is returned to the application. In the example, the URL denotes an Octopus path consisting of two Octopus nodes. The service node refers to the node where a service should be instantiated.

Figure 3.10 shows the same layout as Figure 3.9, but concentrates on the data structures that are exchanged. The network layer gathers topology information about all available nodes in a certain part of the network, including Octopus nodes and routers. When this raw information is returned to the management layer, all nodes that are not Octopus nodes must be filtered out of the data structure. The predictor, the evaluator and the selector are then applied to the graph that only contains Octopus nodes.
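The filtering step might look as follows: routers are removed, and chains that run through them are contracted into direct application–layer edges between the Octopus nodes they connect. This is a sketch with our own function names and data structures, not the framework's removeNodes implementation:

```python
def remove_non_octopus(edges, octopus_nodes):
    """edges: iterable of (a, b) links from the raw topology. Returns the
    links between Octopus nodes, contracting chains through routers."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    result = set()
    for start in octopus_nodes:
        # walk outward from each Octopus node; the walk may pass through
        # routers but stops at the first Octopus node reached on each branch
        frontier, seen = [start], {start}
        while frontier:
            node = frontier.pop()
            for nxt in adjacency.get(node, ()):
                if nxt in seen:
                    continue
                seen.add(nxt)
                if nxt in octopus_nodes:
                    result.add(tuple(sorted((start, nxt))))
                else:
                    frontier.append(nxt)   # a router: keep walking
    return sorted(result)

# Hypothetical raw topology: r1 and r2 are routers, the rest are Octopus nodes.
raw = [("server", "r1"), ("r1", "O1"), ("O1", "r2"), ("r2", "client")]
print(remove_non_octopus(raw, {"server", "O1", "client"}))
# [('O1', 'client'), ('O1', 'server')]
```

The resulting edges are exactly the application–layer links of the Octopus graph; link metrics for a contracted edge would be aggregated from the underlying hops (e.g., the minimum bandwidth, the summed latency).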

3.4 Summary

This chapter has analyzed the requirements of topology–aware applications by means of four different application scenarios. The result of this analysis is that, in spite of the different requirements of the scenarios, a common process in the data flow can be identified. The main actors have been described by means of a use case. This use case then led to the design of the Octopus framework.

Octopus is structured as a three–layer framework. This structure is a consequence of the goal to integrate applications and networks. These two parts form the outer layers, which are combined by an additional layer that bridges the gap between them. A layer in the context of Octopus does not maintain the completely transparent characteristics of the network layers in the OSI model. The parts of the framework interact with each other through a well–defined interface. Especially the interface between the management layer and the network layer can be considered a facade pattern that hides the details of the information gathering at the network layer from the management layer.

Figure 3.10: Resource gathering and processing in the Octopus framework. (The figure shows the raw topology gathered by the Remos collector and modeler, containing both Octopus nodes and routers; the management layer filters out the routers, and the selected node is returned to the application.)

An alternative design decision would have been to avoid layering completely. In addition to the given nature of the layering, we see two reasons for maintaining the layers. First, the framework must be extensible to integrate different applications and different networks. If, e.g., the layer that separates the network were abandoned, it would be more difficult to apply Octopus to other kinds of networks, such as wireless networks or ad–hoc networks. Second, the integration of networks and applications must be done in a structured way. Mixing all parts of the framework without any structure might lead to very complex flow logic. In contrast, the separation between the application and the management layer hides the process of topology–awareness from the application. The design of topology–aware applications (new ones, but also the integration of legacy applications) is much easier when the application does not have to deal with this complex process. Applications express their preferences only by extending a limited set of abstractions. We claim that the design of the framework provides the necessary and sufficient abstractions to take the communication requirements into account. The customization of the framework by two different applications and the implementation of different scenarios support this claim by showing that Octopus is able to support a wide range of communication requirements.

The creation of this topology–aware framework as a unified architecture for today's and tomorrow's communication requirements has been a challenge that has not been undertaken in this domain before. Most previous approaches have addressed just one of the issues related to topology–awareness. We have been able to come up with this architecture through the use of well–known software engineering design principles and concepts. We consider the application of these principles to the issues of topology–awareness one of the contributions of this dissertation.


4 Network support for topology–aware applications

Topology-aware applications run on top of a network. Information about the network topology and the available resources must be available to select the best communication paths for an application.

This chapter addresses two issues related to networks. First, Section 4.1 describes how topology and resource information can be gathered in a network that does not directly provide this information [50]. That is, the network does not reveal where Octopus nodes are located or what the connectivity among these nodes is. Instead, we argue that an application-centric approach, in which the application actively gathers this information, yields a practicable and scalable solution to discover nodes and to measure resources in a network. Although this is an application-centric approach, we can show that the three–layered structure of Octopus allows a flexible and extensible solution that takes the characteristics and challenges of different network types into account. Finally, we show how our approach is integrated into Remos [28].

The second part of this chapter addresses the predictability of bandwidth. The raw network information that is gathered only reflects the current status of the network. However, given the dynamic fluctuations of the resources, this status is likely to change on any time scale. We can show that the magnitude of the fluctuations can span an order of magnitude. The success of topology-awareness is highly correlated with reliable information about the future resource availability, because a topology-aware application must base its decisions about future communication on current measurements. We therefore study different methods and parameters to predict the future bandwidth availability in Section 4.2.

4.1 Octopus Location Discovery

The analysis of the application scenarios in Chapter 3 has shown that topology–aware applications need information about available Octopus nodes in a network to deploy routing and adaptation services. The discussion of related work has shown that there are basically two possible approaches (originally three, but the overlay approach does not address the dynamic search for nodes): the information is either provided by a network management system or it must be searched for by the application. We have shown that current network management systems are not suited for deployment in the Internet due to scalability limitations.

Figure 4.1: The network between a client and a server may be as large as the Internet.

The alternative is to let the application search for and discover Octopus nodes. Discovering the location of Octopus nodes is far from easy, especially in large networks such as the Internet. Figure 4.1 shows a visualization of the Internet AS core [14]. The ASes are organized by their peering outdegree, i.e., the Internet backbones are typically at the center whereas ASes with an outdegree of one are at the border. Application end systems are therefore typically located at the border. Data sent from the server to the client travels from the border towards the center and from there towards the client again on a specified path. In our model, Octopus nodes are located somewhere in the Internet and have to be found by an application. The question we treat in this section is: how can an application find available Octopus nodes in the network, as depicted in Figure 4.1?

A possible solution must satisfy a set of requirements. First, a solution must scale to large networks. Scalability means that the location discovery is still reasonably fast even if the diameter of the network or the number of (network) nodes between clients and servers grows. Second, a solution should be practicable, i.e., it should only use information and tools that are currently available in networks (in our case, the Internet). Both demands must be met to prove that topology–awareness can be deployed in the Internet. Having a real and running solution also enables us to draw conclusions about the future deployment of Octopus services. Finally, solutions that can be deployed in the Internet are important because the Internet is an excellent target for topology–aware applications. In contrast to local area networks, where the resource availability is often sufficient, the Internet is often the bottleneck in an application.

4.1.1 Dynamic network topology discovery

Octopus nodes have so far been defined as a concept at the management layer of the Octopus framework. However, we have not yet defined how Octopus nodes are mapped to nodes in a real network. We therefore describe our assumptions regarding the properties of Octopus nodes. How these properties can be mapped to real Internet nodes will be described in the discussion of the discovery algorithm.

A first assumption is that Octopus nodes are easily recognizable as such in a network and can therefore be distinguished from a non–customizable network node, e.g., a switch or a router. This requirement can be met, e.g., if Octopus nodes have a special host name or a particular IP address (e.g., special lower order bits).
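Neither convention is prescribed by the framework; as an illustration, a check combining both hypothetical conventions (an "octopus-" host name prefix and a reserved low-order byte, both our assumptions, not part of the dissertation's implementation) might look as follows:

```python
import ipaddress

OCTOPUS_NAME_PREFIX = "octopus-"   # assumed naming convention
OCTOPUS_LOW_ORDER_BITS = 42        # assumed reserved host-part value

def is_octopus_node(hostname: str, ip: str) -> bool:
    """Return True if either convention identifies the node as an Octopus node."""
    # Rule 1: special host name, e.g. octopus-3.inf.ethz.ch.
    if hostname.split(".")[0].startswith(OCTOPUS_NAME_PREFIX):
        return True
    # Rule 2: special lower order bits of the IPv4 address (here: last octet).
    last_octet = int(ipaddress.IPv4Address(ip)) & 0xFF
    return last_octet == OCTOPUS_LOW_ORDER_BITS

print(is_octopus_node("octopus-3.inf.ethz.ch", "129.132.1.7"))   # True (name rule)
print(is_octopus_node("router1.inf.ethz.ch", "129.132.1.42"))    # True (address rule)
print(is_octopus_node("router1.inf.ethz.ch", "129.132.1.7"))     # False
```

Either rule alone is sufficient for the discovery algorithm; what matters is only that the test is cheap and needs no cooperation from the node itself.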

Another assumption is that Octopus nodes are located close to routers. Closeness is defined here by the property that the connection between an Octopus node and a router provides high bandwidth and low latency, compared to the performance of the connections in the Internet (i.e., the connections between different autonomous systems). A set of Octopus and network nodes that are close together is called a cloud (following the definition of [60]). There are several reasons why we assume that Octopus nodes are located close to routers but not on them. First, making services available on routers would require a conceptual change in the lower layers of the Internet architecture. It is much easier to add a new node type (Octopus node) to the architecture at the application layer than to change one of the core pillars of the Internet architecture. Adding Octopus nodes to the Internet does not require changes in the lower layers. Second, for performance reasons: Octopus services that run on Octopus nodes may use up resources that are then no longer available for routing. The routing performance, however, is one of the important factors of the Internet performance and must not be reduced by adding services to the nodes. Finally, several Octopus nodes may be close to each other. If these Octopus nodes were routers, the routing would become more complex without any additional gain. In contrast, adding Octopus nodes as additional nodes does not influence or confuse the IP routing.

4.1.2 Path discovery

When a topology–aware application starts up, it sees the network as a black box. The only information the application has is the address of each peer (client(s), server(s)). The nodes and connections that make up the network are hidden from the application, as depicted in Figure 4.2.

Figure 4.2: Initial application view of a network, before a topology discovery.

Considering that the network hides millions of nodes, a first challenge for a discovery algorithm is to select a part of the network that is of interest to the application. Scanning the whole network for available Octopus nodes is not realistic for two reasons. First, a scan over the whole Internet takes too long. In addition, a scan always creates traffic on the network. This additional load on the network increases with the number of topology-aware applications. Second, even assuming that a full scan were possible (technically and within a reasonable time), it would report all available Octopus nodes in the network. The number of reported nodes is likely to be too large for an application to efficiently determine a suitable path and to locate Octopus nodes to instantiate a service.

We therefore claim that a selection of a part of the network must be made as a very first step to comply with the scalability requirement. That part of the network must be of interest to the application. The optimal solution would be if an application could specify some search criteria and the network presented those network parts that correspond to the query; e.g., an application could specify that it is interested in at most n paths through the network that exhibit a minimal bandwidth of x Mbps. However, such a query is far from possible with the current Internet infrastructure. Given that we want to find a practicable solution, we are bound to existing tools and information.

We consider the part of the network that the application is interested in to be the routing path, as defined by IP routing. As indicated in Figure 4.3, nodes that are on the routing path, i.e., "between" a server and a client, are certainly of interest to an application. In contrast, nodes that lie "behind" the server or the client (in Figure 4.3, nodes outside the network cloud) are not of interest. Mapping the situation of Figure 4.3 to a real scenario where a server is located in Switzerland and the clients in the U.S., we know from our experience that the routing path passes transatlantic links. The set of transatlantic links is certainly of interest to the applications, whereas paths via Asia have a very low probability of being useful to an application. Unfortunately, the Internet does not know about this fact, nor does it know what "behind" means. One possible way to define the network part of interest is therefore to follow the routing path. The routing path can be detected from the end systems and therefore provides a practical solution. It also has other advantages. The routing path is usually the shortest connection between two end systems in terms of the number of hops. It is therefore guaranteed to be a scalable solution (the average path length is almost independent of the size of the Internet). We term this first step in the Octopus location discovery path discovery.

Figure 4.3: Using the routing path to identify the part of the network that is of interest to an application.

Following the routing path also has its drawbacks, which must be taken into account. First, the routing path is usually the shortest path in the number of hops. However, this metric is not of interest to applications: they are more interested in high–bandwidth or low–latency connections. These application–specific metrics are not considered in the packet routing. Savage et al. [66] show in fact that the routing path is not the most efficient in terms of bandwidth or latency. They show that 30–55% of the alternative paths have a better latency than the default path. In 10% of the cases, the best alternative path has a latency which is 50% better than the latency of the default path. The results of the bandwidth analysis are similar. At least 10–20% of the paths have a potential bandwidth gain of a factor of 3.

The idea of following the routing path in a large network can also be looked at from a search theory point of view. The problem of finding a path through a network can be mapped to a search problem in a tree. The server, which is the root of this tree, has multiple paths to the client(s), which form the leaves of the tree. Routers and Octopus nodes are the inner nodes because they form the paths from a server to a client. Note that the number of inner nodes may be gigantic, because it corresponds to the total number of Internet nodes. The branching factor of an inner node is therefore immensely high.

Figure 4.4: Breadth–first search through a network topology.

Searching for paths in a tree is basically done in two ways: depth-first or breadth-first search. Following the routing path corresponds to a depth–first search. At every level, we only consider one single node, and then go on to the next level. Depth–first search is simple and fast: the complexity of the search is in the order of the path length, O(path length).

In contrast, a breadth-first search increments the search depth only when all nodes at a given level have been visited. Such a search strategy is shown in Figure 4.4. In a first iteration, the server searches all nodes that are one hop away. In every additional iteration, it increments the diameter of the search range by one until some stopping criterion is fulfilled. As a concrete example, assume that the server in Figure 4.4 is located at ETH Zurich. It wants to find available Octopus nodes to find a multicast point and an Octopus node where it can instantiate an adaptive Octopus service if necessary. In a first iteration, it searches for Octopus nodes in the local domain of the department (*.inf.ethz.ch). Then, in a second iteration, it takes on the ETH domain (*.ethz.ch), and so on, until the necessary information is available.

Breadth-first search has one true advantage over depth-first search: a breadth–first search guarantees that all Octopus nodes are found within a certain diameter. The application is therefore most likely to find an optimal or almost optimal location for the instantiation of an Octopus service (given that the stopping criterion for the search is well chosen).

However, there are a few drawbacks that make a breadth–first search ill suited for this kind of problem. First, it is hard to define the diameter. In the above example, we defined the diameter by DNS domains. The first two diameters were easy to find in the example. However, defining the next hop and extending the search to it becomes difficult. Leaving the "home" domain requires additional steps, e.g., looking at global routing tables that show to which domains the home domain is connected. This step is difficult. An alternative for defining a diameter would be hop counts: in a first step, all nodes are considered that can be reached via one hop, then via two, etc. This solution is even more difficult to implement because it would require support from routers about the following hops. Remember that a breadth-first search does not only search along the routing path, but must consider all possible nodes that can be reached within n hops. The second problem is the definition of a stopping criterion. Searching for an Octopus node in a multicast scenario may look easy: identify the location where two paths split. But finding a place for an adaptive service requires the identification of a bottleneck, which is not known until the whole path is known and all resources are measured. A last problem is the complexity of the algorithm. The number of nodes found in every step grows exponentially with the branching factor k, resulting in a complexity of O(k^path length). To give an approximation of the expected complexity, consider that the average length of an Internet path is around 16 hops and the number of DNS domains crossed is around 6 in our experiment. Assuming an (unrealistically) low branching factor of 2, the algorithm would end with 65000 nodes (or 64 domains) after completing all iterations over 16 hops. Comparing these numbers to the complexity of the depth–first search shows that the breadth–first search is far less scalable. For all these reasons, we consider the breadth–first alternative unsuited for this problem.
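The numbers above can be reproduced with a short back-of-the-envelope calculation (a sketch only; the branching factor of 2 is the deliberately low value assumed in the text):

```python
# Compare a depth-first walk along the routing path with the worst-case
# size of a breadth-first search frontier.

def depth_first_nodes(path_length: int) -> int:
    # Following the routing path visits one node per hop: O(path_length).
    return path_length

def breadth_first_frontier(branching_factor: int, depth: int) -> int:
    # Nodes at exactly `depth` hops from the root: k^depth.
    return branching_factor ** depth

# Average Internet path of ~16 hops, (unrealistically low) branching factor 2:
print(depth_first_nodes(16))          # 16 nodes visited
print(breadth_first_frontier(2, 16))  # 65536, the ~65000 nodes quoted above
print(breadth_first_frontier(2, 6))   # 64 domains when iterating over DNS domains
```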

4.1.3 Octopus node discovery

Octopus nodes are not located on routers, but close to them. Closeness has been defined by a connection between routers and Octopus nodes with higher bandwidth, lower latency, etc., compared to the connections between the routers in a path; such an aggregation of close routers and Octopus nodes is called a cloud. So the idea of the next step, which we call Octopus node discovery, is to find those Octopus nodes that belong to the same cloud. This situation is shown in Figure 4.5.

The idea is simple but its implementation is not, because there is nothing like clouds in the Internet. In future networks or overlay architectures, an information system may be available that can be queried with questions like "given this router, which Octopus nodes are close to it?". But for the current implementation, another solution must be found.

We use the Domain Name System (DNS) to locate nearby Octopus nodes. Every node in the Internet has an IP address, and those nodes that belong to the same domain can be considered to be in the same cloud, which corresponds to an Autonomous System (AS). Although an AS is a purely organizational collection of hosts, it often implies that (at least some) nodes within the AS have better connectivity than those between ASes.

Figure 4.5: Default routing paths and Octopus nodes in the same clouds.

The algorithm for the Octopus node discovery is as follows: every router found on the path is taken one by one, and the following two steps are applied. First, the router's name is used to identify the DNS domain (AS) the router belongs to. Speaking in terms of the model, the cloud of every router is identified. Second, a DNS zone transfer is used to get a list of all nodes that belong to the AS (cloud) that has just been found. From this list, the Octopus nodes can be filtered out because we have required that Octopus nodes have a particular host name or a particular IP address. Therefore, the result of the Octopus node discovery is a set or a list of Octopus node locations, as required. Figure 4.5 shows the situation after the Octopus location discovery. Octopus node locations that are available close to the routers are visible. The number of these locations may vary from cloud to cloud.
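The filtering step itself is never spelled out; a minimal sketch in Python, assuming the hypothetical "octopus-" host name convention and a zone listing obtained elsewhere (e.g., as the result of a real zone transfer), could look as follows:

```python
# Filter Octopus nodes out of a DNS zone listing. The "octopus-" prefix is an
# assumed naming convention; the listing below is a synthetic stand-in for the
# host names returned by a zone transfer.

def filter_octopus_nodes(zone_listing):
    """Keep only hosts whose unqualified name marks them as Octopus nodes."""
    return [host for host in zone_listing
            if host.split(".")[0].startswith("octopus-")]

zone = [
    "router1.inf.ethz.ch",
    "octopus-1.inf.ethz.ch",
    "www.inf.ethz.ch",
    "octopus-2.inf.ethz.ch",
]
print(filter_octopus_nodes(zone))
# ['octopus-1.inf.ethz.ch', 'octopus-2.inf.ethz.ch']
```

The same filter would work on IP-address conventions; only the predicate inside the comprehension changes.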

Using DNS or any organizational closeness is certainly subject to debate. It is clear that the use of the domain name system is not the solution that would come out of an architectural design. The idea of self–organizing networks, based on some dynamic performance metrics, is more tempting (e.g., Zegura [17]).

Nevertheless, the organizational closeness has some advantages. Basing the closeness only on dynamic parameters, such as bandwidth, would imply that the closeness of ASes would change as the dynamic behavior of the Internet changes. An Octopus node might therefore change its connectivity and belong once to one routing path, once to another. In addition, it might be hard to define what closeness means in terms of bandwidth anyway (i.e., is a connection of 1 Mbps much or not?). The use of DNS allows a cloud to be stable and to be monitored and/or managed. A management system for Octopus nodes within a LAN is more feasible than one for the Internet.

A second advantage of DNS is that it is set up hierarchically. These hierarchies allow the definition of different levels of closeness. So, to find an Octopus node close to a router, an application may first look at those nodes that are only 1 AS away, then those which are 2 hops away, etc. In addition, the hierarchy limits the number of nodes that can be found with a single query. The available locations can be found faster, and the final selection of the most suited active service location is faster if the number of nodes is kept small.

Figure 4.6: Application-layer connections among Octopus services on selected Octopus nodes.
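This hierarchical widening can be sketched by enumerating the enclosing DNS zones of a router's name, nearest first (a sketch; the helper name and the two-label stopping point are our assumptions):

```python
# Widen the search through the DNS hierarchy: each step strips one label,
# moving from the router's own zone to progressively larger enclosing zones.

def closeness_levels(fqdn: str, min_labels: int = 2):
    """Yield enclosing DNS zones of `fqdn`, nearest first."""
    labels = fqdn.split(".")
    # e.g. "r1.inf.ethz.ch" -> "inf.ethz.ch", then "ethz.ch"
    for i in range(1, len(labels) - min_labels + 1):
        yield ".".join(labels[i:])

print(list(closeness_levels("r1.inf.ethz.ch")))
# ['inf.ethz.ch', 'ethz.ch']
```

Each yielded zone would be queried in turn until enough Octopus nodes have been found, mirroring the *.inf.ethz.ch, then *.ethz.ch example above.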

Last but not least, DNS provides all the functionality needed to find Octopus node locations close to the path. It is therefore a practicable solution and fulfills this requirement.

The result after the Octopus node discovery is a directed acyclic graph (DAG) which consists of Octopus nodes and Octopus links, as depicted in Figure 4.6. It is now up to the application or a topology–aware system to select the node that is best suited for a given application.

Figure 4.7 summarizes the algorithm of the path and the Octopus node discovery. The algorithm starts with the default routing path that can be traced from the source to the destination. After the path discovery, every router is replaced by the corresponding domain (AS, cloud). Nodes without an entry in the DNS are removed from the path. Each domain is subsequently queried for the nodes in its domain and the Octopus nodes are extracted. Finally, the domain is replaced by the Octopus nodes in the path. The topology discovery returns a graph–like structure as its result.

4.1.4 Topology–aware Remos

    path = trace(source, destination);
    graph = path;
    for each router in graph {
        domain = DNS.map(router);
        if (domain == null) graph.remove(router);
        else graph.replace(router, domain);
    }
    for each domain in graph {
        nodelist = DNS.list_nodes(domain);
        services = filter(nodelist);
        graph.replace(domain, services);
    }

Figure 4.7: Algorithm for topology discovery.

Figure 4.8: Remos with extensions for topology discovery: extensions to the collector hierarchy (left) and to the topology data structures (right).

We have implemented the proposed algorithm in the Octopus framework. We have integrated the topology discovery into the Remos system [28], as shown in Figure 4.8. Remos encapsulates the functionality to gather network information into an abstraction called Collector. The original Remos contains two collectors: an SNMP collector for LAN measurements and a WAN collector for WAN measurements. We have added a new collector type called TraceCollector which gathers the path information. From this information, a graph is constructed (this graph is internal to Remos and is different from the Octopus graph). In the original Remos, a graph node denoted a router. We extended this hierarchy to additionally include Clouds. The mapping of a router to a cloud and the extraction of Octopus nodes from a DNS query are done by the DNS resolver.

All extensions for the location discovery are local to the Remos data structures and are therefore limited to the network layer of the Octopus framework. As a result, no changes to the interface between the network and the management layer are necessary. The interface therefore remains small and stable, and shows that the originally designed interface is able to hide changes at the network layer from the other parts of Octopus. This finding is important because it strengthens the claim that the information gathering process at the network layer can be easily exchanged, e.g., to use different tools for other network types, such as ad-hoc networks, or to add other network information sources, such as SPAND [72].


                      path                       DNS mapping
client        time (s)  hops  domains    nodes  time (s)  service locations
ethz.ch         0.03      1      1       28710    0.90          80
epfl.ch         0.28      8      3       14219    1.22           5
cs.cmu.edu      1.53     13      6        5658    0.68          30
uc.pt           2.25     16      5         487    0.87           2
uva.es          3.56     13      5        n.a.    n.a.           2
ufmg.br         3.55     15      6        n.a.    n.a.           3
harvard.edu     1.89     15      5       56348    2.10         n.a.
nwu.edu         1.76     15      4        n.a.    n.a.         n.a.

Table 4.1: The overhead of Internet measurements.

4.1.5 Overhead estimation

An evaluation of the proposed algorithm is difficult because the penalty of the topology discovery (the time needed to find the Octopus nodes) must be weighed against the benefit of using Octopus nodes. But an estimation of the overhead of the topology discovery is possible and can be done without the application context. Table 4.1 summarizes the overhead of the different steps of the topology discovery for different domains. The evaluation is limited to those domains that are under our control, where we have been able to install Remos. The first column contains the domain where the client is located; the server is always at ethz.ch. The following three columns show the statistics of the path discovery: the time needed to trace the path, the number of routers and the number of clouds in the path. The remaining three columns show the number of nodes in the domains, the time needed for the zone transfer and the number of Octopus nodes in these domains.

The path discovery is influenced by the path length. The average path length in the Internet is about 15 nodes, so an overhead of 1 to 2 seconds can be expected. The time for the DNS zone transfer (to list all hosts in a domain and filter out the Octopus nodes) is also in the order of 1 second. The expected overhead of the topology discovery is therefore in the order of a few seconds.

One number that is not shown in the table is the time needed for the bandwidth measurement. Bandwidth measurements are also performed by Remos at the network layer, and these measurements are used by the evaluation. To measure the bandwidth, Remos stresses the network by sending data as fast as possible over the network. The measurement time can be configured by the application, and the accuracy of the measurement correlates with the measurement interval. For the experiments, we used an interval of 10 seconds for each measurement. These measurements use up most of the time unless the measurement interval is shortened. However, a shorter measurement interval may lead to less accurate results.


4.1.6 Summary

An Octopus location discovery is necessary to provide a topology–aware application with topology information, especially about the location of Octopus nodes. This information forms the basis for a topology–aware application to influence the routing and express its communication needs.

A location discovery can be done at any time while the application runs. It is typically made when the application starts up, to define the routing path and to instantiate the necessary Octopus services. However, a topology discovery may also be necessary at run time, e.g., when the resource availability changes in a way that no longer suits the application requirements. We have described such a scenario in Section 3.1.2.

The described location discovery is scalable to a large number of nodes in a network. The key to the scalability lies in an early reduction of the search space of the network by focusing on the routing path. This early reduction comes at a price: because only one path is considered, it is possible that the best solution to transmit and adapt data is not found. We argue in favor of our solution for two reasons. First, it is always possible to take additional search spaces into account by constructing alternative paths. The precondition for creating an alternative path, however, is that there is a node in the network via which the data can be routed efficiently as well. Second, we argue that there is no such thing as the "best" path. Resources along the paths fluctuate, and what looks like the best path at one moment in time may be worse than another a second later. A trade–off between the search time for a good solution and the probability that the path remains good has to be found. Our solution tries to limit the search time. If more time is available or the first search has not been successful in satisfying the application requirements, alternative paths can still be considered.

The proposed approach is practicable in that it uses routing information that is easily gathered. In addition, the assumptions regarding the Octopus nodes are easily satisfied. Our solution especially refrains from changes at the network layer, e.g., by requiring changes in the IP routing behavior.

The use of the DNS to find nodes near the routing path is certainly not an optimal solution. A DNS mapping for large ISPs may result in a large number of nodes that are geographically distributed over large distances. We have again sacrificed a better solution for practicability and note that if topology-awareness is to be deployed, a better solution is needed to find available Octopus nodes near the routing path.

The search for Octopus nodes comes at a price: the discovery takes several seconds, and the resource measurements require several seconds as well. This time can be omitted if a network information system or a shared database stores the resource information. There exists a (growing) number of such monitoring systems, so we are optimistic that in the future, this aspect of our system will be improved.

We have been able to integrate the Octopus location discovery into Remos as well as into the Octopus framework. The location discovery fits well into the structure of Remos, especially into the collector abstraction. The information gathered by the TraceCollector is easily combined with the information gathered by the other collectors. This combination is important because a topology–aware application does not only need topology information but also information about the available resources in this topology. The gathering of both kinds of information and their combination can therefore be expressed in a single processing step. This processing can be hidden behind a simple and uniform interface. This interface is located at the border between the network and the management layer in the Octopus framework. Because the process is hidden behind the interface, other tools, e.g., for different kinds of networks, can be transparently integrated into the Octopus framework without changing the management or the application layer.

4.2 Bandwidth Prediction

Remos (like most network information systems) measures a snapshot of the actual resource usage in a network. This information is helpful for topology–aware applications because it gives hints about where to instantiate Octopus services. However, the usability of single measurements is limited by the dynamics of network fluctuations. Topology–aware applications need information about the future evolution of the resources. If the bandwidth availability were a stationary process, the current bandwidth could be used as a prediction of the future bandwidth availability. This assumption has frequently been made. Bolliger [10], e.g., uses a single measurement for his adaptive framework. However, previous work has shown that bandwidth is far from stationary.

The discussion of related work has shown that prediction (or forecasting) has been applied in cases that are similar to bandwidth prediction. Dinda [27], e.g., studies different prediction models for host load prediction. He applies statistical models that have been developed for general time series [12]. These models have been applied by Basu et al. [5] to bandwidth traces. The study shows that these models are able to fit single bandwidth traces quite accurately. In a real environment, a subset of these models has been applied to predict the available bandwidth for NSFNET backbone traffic [42] or the Grid [86, 82].

In spite of this previous work, we note a significant gap in knowledge about the applicability of bandwidth prediction models for topology–aware applications. First, the amount of previous work that applies statistical models to predict bandwidth is small, so little is known about which model and which parameters are suited to predict the future bandwidth availability. The studies in the Grid environment have not investigated all the issues regarding bandwidth prediction; especially non–stationary models have not been taken into account. Second, the study by Basu et al. focuses on the predictability of traces, i.e., how close a prediction model and a given set of prediction parameters can come to a bandwidth trace. Although not explicitly stated in the study, we assume that it took many iterations to find the best matching model. In addition, we anticipate that the calculation of the prediction may take long (on the order of seconds) for more sophisticated models.

In contrast to related work, we focus on the problem of predicting the available bandwidth for topology–aware applications. Bandwidth prediction for topology–aware applications differs in several ways from previous work. First, the time to calculate the new prediction model is limited. In contrast to previous approaches, real–time forecasting is needed. An adaptive application, e.g., needs predictions to estimate how it should adapt the data in the (near) future. Depending on the data and the application preferences, we expect that new predictions are needed at intervals ranging from seconds to minutes. We anticipate that several prediction models need several seconds to calculate one prediction value on an unloaded commodity PC.

Second, the time horizon for a prediction spans a similar range as the prediction frequency. The prediction of backbone traffic, e.g., attempted to predict the bandwidth up to one year in advance. In contrast, we expect that topology–aware applications require a different time scale, ranging from seconds (e.g., for adaptation) to minutes or hours (e.g., for path evaluation).

Third, a topology–aware application may not have a long history of samples for theprediction. If Octopus is used only in the context of an application, no sample historymay be available. Not all prediction models require the same number of samples to becorrectly initialized. In such a case, the use of a simple prediction model that can beinitialized with fewer samples may outperform a complex model that is not sufficientlywell initialized.

So, a topology-aware application must find a trade–off between the quality of theprediction (e.g., the error rate) and the time needed to calculate the prediction. To decidewhich model and which parameter set is best suited, a given topology–aware applicationmay only have a limited time and a limited number of samples available for its decision.Once a model has been selected, the application is most likely to stick to this model, unlessthe characteristics of the trace change drastically. For the applications we are looking at,we consider a change in the model as possible for long-running video applications butrather unlikely for shorter image applications.

This section focuses on two questions. First, how does the sampling interval affect the different prediction models and their parameters? We claim that the interval at which a prediction is needed, and hence at which new samples must be gathered, has an influence on the error of the different prediction models and hence on the application.

Second, we try to find a criterion that allows an application to select a prediction model and (a set of) prediction parameters that is well suited for a given network characteristic (e.g., defined by the first bandwidth samples of a transmission) and a given sampling interval. This selection criterion should be easy and efficient to calculate. As in the previous section, we do not expect that the selected model and the selected parameters are guaranteed to be the best solution because the selection must be done within reasonable time and with a limited overhead. The reason why we do not search for an optimal solution is that a single bandwidth trace, as measured by an application, is the result of many parameters. The queuing delay of a packet at a single router, e.g., is the result of all streams that are sent via this router (and also of all streams that are not sent). The combination of all parameters that can influence a single bandwidth trace along the whole path leads to an even larger number of possible results. This huge factor space also makes it immensely difficult to simulate Internet behavior. Rather than trying to model these factors, we try to observe the influence of the factors and to select the model based on these observations.

To answer the two questions, we first give an overview of prediction models for timeseries in Section 4.2.1. Section 4.2.2 then describes the methodology for the analysis.The subsequent sections describe the important factors for bandwidth prediction and theirinfluence on the predicted bandwidth. First, Section 4.2.3 studies the influence of thesampling interval on the bandwidth trace. Section 4.2.4 shows the characteristics of theprediction models when applied to a single bandwidth trace. The time needed to cal-culate a prediction is considered in Section 4.2.5. Section 4.2.6 studies the amount ofzero and negative prediction values. Section 4.2.7 analyzes the quality of the predictionmodels as a function of the trace characteristics. We find that not all prediction modelsare able to model bandwidth fluctuations in the same way. We therefore derive a simplemodel that allows an application to select a prediction model based on the trace variation.Section 4.2.8 concludes this chapter.

4.2.1 Time series analysis

The forecasting (prediction) of time series requires that observations are available at discrete, equally spaced time intervals $z_1, z_2, \ldots, z_t$ of $t$ successive observations [12]. Such a time series, denoted as $W_t$, is regarded as a sample. Using mathematical models for forecasting is a well known technique. Two types of models are distinguished: stationary models and non–stationary models. A model is statistically stationary iff:

1. the expected values and variances are constant, i.e., $E[z_k] = \mu$ and $\mathrm{Var}[z_k] = \sigma_z^2$ for any $k$.

2. the autocovariances (and autocorrelations) are constant, i.e., $\mathrm{cov}(z_k, z_{k+m}) = \mathrm{const}$ for any $k$ and $m$.

If a time series is non–stationary, it can be transformed into a stationary time series with the use of the differencing operator $\nabla$ (nabla). Because of the use of the differencing operator, this transformation is often called an integrated process.

Different types of statistical models have been proposed for time series forecasts. We give only a short overview; a detailed description of the models is given by Box et al. [12].


The simplest models are linear filter models, which work on the weighted sum of previous observations. The BestMean model averages the last $p$ samples to calculate a prediction, i.e.,

$\mathrm{BM}(p) = \frac{1}{p} \sum_{k=1}^{p} z_{t-k+1} \qquad (4.1)$

The BestMean(1) model is alternatively named the Last model because it only takes the last measured value into account.
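The BestMean rule of Eq. 4.1 can be sketched in a few lines. This is a minimal illustration; the function names are chosen for this sketch and are not part of RPS:

```python
def best_mean(samples, p):
    """BestMean(p): average of the last p observations (Eq. 4.1)."""
    if len(samples) < p:
        raise ValueError("need at least p samples")
    return sum(samples[-p:]) / p

def last(samples):
    """BestMean(1), the 'Last' model: the most recent observation."""
    return best_mean(samples, 1)

trace = [120.0, 80.0, 100.0, 140.0, 60.0]  # bandwidth samples in KBps
print(best_mean(trace, 3))  # mean of 100, 140, 60 -> 100.0
print(last(trace))          # -> 60.0
```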

Autoregressive and moving average models are more complex prediction models. Theautoregressive model is denoted by AR(p). Its prediction uses the weighted sum of pprevious values plus a random shock, where a shock is a random drawing from a whitenoise process with zero mean and finite variance. The term autoregressive is used becausethe variable z is regressed on previous values of itself.

In the definition of the autoregressive models, we need two operators: the B–operator (backshift operator) and the ∇–operator (nabla operator). The B–operator transforms an observation of a time series to the previous one:

$B Y_t = Y_{t-1} \qquad (4.2)$

and

$B^k Y_t = Y_{t-k} \qquad (4.3)$

The backshift operator can be used to define the ∇–operator:

$\nabla Y_t = Y_t - Y_{t-1} \qquad (4.4)$

and

$\nabla_s^k Y_t = (1 - B^s)^k Y_t \qquad (4.5)$

An AR(p) process is defined as

$(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)\, W_t = e_t \qquad (4.6)$

where $W_t$ is the stationary time series and $e_t$ the white noise error. The corresponding forecasting function $F_t$ for AR(p) is then

$F_t = \sum_{i=1}^{p} \phi_i W_{t-i} \qquad (4.7)$
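As an illustration of the AR forecasting function, the following sketch computes a one-step AR(1) forecast. Estimating φ by least squares on the mean-removed series is an assumption of this sketch, not necessarily the estimator used by RPS:

```python
def ar1_forecast(series):
    """One-step AR(1) forecast F_t = phi * W_{t-1} (Eq. 4.7 with p = 1).

    W_t is the mean-removed series; phi is estimated by least squares,
    phi = sum(W_t * W_{t-1}) / sum(W_{t-1}^2). Simple sketch only.
    """
    mu = sum(series) / len(series)
    w = [x - mu for x in series]
    num = sum(w[t] * w[t - 1] for t in range(1, len(w)))
    den = sum(w[t - 1] ** 2 for t in range(1, len(w)))
    phi = num / den if den else 0.0
    return mu + phi * w[-1]  # add the mean back onto the forecast

# A trace alternating around 100 KBps: phi comes out negative, so the
# forecast swings to the other side of the mean.
trace = [120.0, 80.0, 120.0, 80.0, 120.0, 80.0]
print(ar1_forecast(trace))  # -> 120.0
```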

The moving average model MA(q) predicts the current value based on a random shock and weighted values of $q$ previous shocks. A MA(q) process can be modeled as

$W_t = (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q)\, e_t \qquad (4.8)$


where $W_t$ is the stationary time series and $e_t$ the white noise error. The corresponding forecasting function $F_t$ is then

$F_t = -\sum_{i=1}^{q} \theta_i e_{t-i} \qquad (4.9)$

The combination of AR and MA results in the ARMA(p,q) process. It can be defined as

$\phi(B)\, W_t = \theta(B)\, e_t \qquad (4.10)$

The forecasting function for ARMA is:

$F_t = \sum_{i=1}^{p} \phi_i W_{t-i} - \sum_{i=1}^{q} \theta_i e_{t-i} \qquad (4.11)$

ARMA processes achieve a greater flexibility in fitting actual time series. Experience shows that stationary time series can be fitted with AR, MA or ARMA models in which p and q are often not greater than 2.

Non–stationary models impose additional complexity on the model. The autoregressive integrated moving average ARIMA(p,q,r) model of order (p,q,r) is frequently used for non–stationary data. The ARIMA model is an ARMA(p,q) model that is differenced r times. The name ARIMA comes from the fact that this series is created by an integrated process, i.e., derived from the ARMA series.

The choice of a prediction model depends on whether the time series generated from bandwidth traces are stationary or non–stationary, because different models are used for the two cases. Internet traffic on an uncongested Internet wire has been shown to exhibit a pervasive non–stationarity [15], and Groschwitz et al. [42] also use a non–stationary prediction model to predict the NSFNET backbone traffic. However, no previous research has addressed the question whether the time series used by topology–aware applications are stationary or non–stationary. Answering this question in detail would require a thorough analysis of probably thousands of (Internet) bandwidth traces; the time for such an analysis exceeds the time frame of this dissertation. However, the following sections show a first approach to how such an analysis can be performed and provide initial answers to the above questions.

The second problem for bandwidth prediction for topology–aware applications is thatthe time to select a model and determine the model parameters is limited. The selectionof the parameters for a prediction model is a well understood process. In the case of anARIMA model, the first parameter to be determined is d, the differencing parameter, thenp, the autoregression, and finally q, the moving average. Differencing is applied until thedata is stationary. Autoregression is determined by examining the patterns of autocor-relation after differencing. Finally, the moving average is determined by examining thepattern of the partial autocorrelations.
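The identification steps above can be sketched as follows. The helper names `difference` and `autocorrelation` are chosen for this illustration, and a full ARIMA identification would of course examine partial autocorrelations as well:

```python
def difference(series, d=1):
    """Apply the nabla operator d times: (nabla Y)_t = Y_t - Y_{t-1} (Eq. 4.4)."""
    for _ in range(d):
        series = [series[t] - series[t - 1] for t in range(1, len(series))]
    return series

def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mu = sum(series) / n
    var = sum((x - mu) ** 2 for x in series)
    cov = sum((series[t] - mu) * (series[t - lag] - mu) for t in range(lag, n))
    return cov / var if var else 0.0

# A trace with a linear trend is non-stationary; one differencing step
# removes the trend entirely.
trend = [10.0 * t for t in range(8)]
print(difference(trend))  # -> [10.0] * 7
```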

Although the techniques to estimate these prediction model parameters are understood, we consider the overhead of calculating the parameters too high for topology–aware applications. Throughout the remainder of this section, we try to identify an easier, less expensive criterion to select the best prediction model and the best model parameters.

4.2.2 Analysis methodology

We perform the analysis as follows: we use a set of 1000 Internet traces. These raw traces are resampled at different, regular time intervals. The resulting traces are then passed for prediction to the Resource Prediction System (RPS). RPS is part of the Remos system [28]. Although we use RPS as a standalone tool for this analysis, it is well embedded into the Remos architecture and can be used for the needs of topology–aware applications without changes. Note that from an architectural point of view, we have attributed the predictor to the management layer in the Octopus design (see, e.g., Figure 3.6). We will give the reason for this architecture in Chapter 5. We nevertheless treat the issues related to resource prediction in this chapter because its functionality is closely related to networking issues.

The 1000 bandwidth traces have been measured and collected by Bolliger [10]. These traces were created using a 10 MB data transfer between two hosts over the Internet. The host locations are spread over several sites in Europe and the U.S. The raw traces contain the timestamps of the acknowledgments received by the sending host. Given a constant packet size, the available bandwidth, as experienced by the sender, can be reconstructed.
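The reconstruction described above can be sketched as follows. The function and its exact binning are assumptions for illustration; the real trace format and tooling used by Bolliger may differ:

```python
def resample_bandwidth(ack_times, packet_kb, t_sample):
    """Turn ACK arrival timestamps (seconds) into equally spaced bandwidth
    samples in KBps, assuming every ACK acknowledges packet_kb kilobytes.

    Hypothetical sketch of the resampling described in the text.
    """
    if not ack_times:
        return []
    duration = ack_times[-1] - ack_times[0]
    n_bins = int(duration / t_sample) + 1
    bins = [0.0] * n_bins
    for t in ack_times:
        idx = min(int((t - ack_times[0]) / t_sample), n_bins - 1)
        bins[idx] += packet_kb
    return [b / t_sample for b in bins]  # KB per bin -> KBps

# Four 1.5 KB packets ACKed over roughly one second, sampled at 0.5 s:
acks = [0.0, 0.1, 0.2, 0.9]
print(resample_bandwidth(acks, 1.5, 0.5))  # -> [9.0, 3.0]
```

Note how a burst of ACKs in the first bin yields a high bandwidth sample, while a sparse bin yields a low one; this is exactly the peak-and-gap pattern discussed for the fast sampling intervals below.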

Figure 4.9 shows the Cumulative Distribution Function (CDF) of all traces, i.e., the x–axis shows the bandwidth of each trace, in KBps, and the y–axis denotes the percentage of traces with a bandwidth of at least x KBps. The wide range of experienced bandwidths, from 10 KBps up to 603 KBps, supports the conclusion that the traces differ significantly in their bandwidth and that the conclusions drawn from these studies are well founded.

4.2.3 Sampling interval

The raw traces have been re–sampled to create equally spaced traces, as required by the prediction models. The sampling interval, t_sample, is varied for two reasons. First, we expect that different sampling intervals yield different trace characteristics, which in turn influence the choice of the prediction models and their parameters. Second, different applications have different time requirements: one application may require new bandwidth values every 100 ms, another every second. These requirements depend on the relationship between the application data granularity and the network bandwidth: small data chunks over fast networks require more frequent adaptation. These application requirements may be an important factor that limits the choice of a possible prediction model.

For our study, we use seven different sampling intervals: 0.01, 0.1, 0.2, 0.5, 1.0, 2.0 and 5.0 seconds. The range of sampling intervals is limited by the characteristics of the bandwidth traces. The time span of the bandwidth traces is on the order of several tens of seconds, so a sampling interval larger than 5 seconds would have yielded too few samples. 0.01 seconds is a lower bound because the amount of data that is sent within such a short interval is often zero.

Figure 4.9: CDF of bandwidth of all 1000 sample traces.

To visualize the differences of the sampling intervals, Figures 4.10 and 4.11 show apart of one sample trace after the resampling.

Figure 4.10 shows the trace (number 118) after being sampled with the two fast sam-pling intervals, at 0.01 and 0.1 seconds. Sampling at such high frequencies implies thatin some time intervals no data is received. A high peak is measured when data is finallyreceived within an interval.

Figure 4.11 shows the same trace when sampled at 0.2, 0.5 and 1 second, respectively. Note that both the x– and the y–axis are scaled differently than in Figure 4.10. In particular, the y–axis range is smaller because the peaks are lower at these sampling intervals. With the exception of the beginning of the trace, there is also no sample with a zero value anymore.

It is generally known that sampling at a lower frequency corresponds to a low–pass filtering of the samples, and the resulting curve tends to be smoother with an increasing sampling period. However, comparing the traces at the 0.2 and 0.5 second intervals, we can also see that a lower sampling frequency does not always guarantee a smoother curve.


Figure 4.10: Part of bandwidth trace (no. 118), sampled at 0.01 and 0.1 seconds

Figure 4.11: Part of bandwidth trace (no. 118), sampled at 0.2, 0.5 and 1.0 seconds


Figure 4.12: The CDF of the coefficient of variation (COV) for all traces.

Between 3 and 5 seconds in Figure 4.11, for example, the 0.2 second sampling yields a smoother curve than the 0.5 second sampling. The reason is that the ACKs in the original trace do not arrive at regular intervals, and the resulting trace just depends on how the ACKs are distributed over the different intervals. So, although there is a tendency towards a smoother curve when larger sampling intervals are used, there is no guarantee that this will be the case. If there were such a guarantee, the prediction problem could easily be solved by just using a large enough sampling interval. However, since there is no such guarantee and since there are other factors that influence the sampling interval (e.g., application requirements), we must try to find a method to select a good prediction model under these constraints.

One statistical parameter that captures the size and the frequency of the fluctuations of the traces is the coefficient of variation (COV). The COV is the standard deviation normalized by the average, i.e., $\mathrm{COV} = \sigma / \mu$ [46]. The COV is a useful metric because it describes the deviation of the samples in relation to the average. The result is therefore independent of the original unit, e.g., whether the bandwidth was 100 Mbps or 1 Mbps. A COV value of, e.g., 0.1 always indicates a small variation whereas values larger than 1 indicate high variation.
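A minimal sketch of the COV computation, illustrating that it is independent of the unit of measurement:

```python
def cov(samples):
    """Coefficient of variation: standard deviation divided by the mean."""
    n = len(samples)
    mu = sum(samples) / n
    sigma = (sum((x - mu) ** 2 for x in samples) / n) ** 0.5
    return sigma / mu

# Scaling the whole trace (e.g., KBps -> Bps) leaves the COV unchanged.
trace = [80.0, 120.0, 100.0, 90.0, 110.0]
print(round(cov(trace), 3))                    # -> 0.141
print(round(cov([x * 1000 for x in trace]), 3))  # -> 0.141, same value
```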

Figure 4.12 shows the CDF of the COV for all 1000 traces. The different curves denote the various sampling intervals. We note a huge gap between the COV of the 0.01 s samples and all the other lines in Figure 4.12. Note that the y–axis is logarithmically scaled. The alternation of high peaks and low values that we have identified in Figure 4.10 is visible throughout all 1000 traces of our study. The COV values above 1 show that the expected standard deviation of the sampled trace is at least in the order of the average bandwidth. For the other intervals, the tendency towards smoother traces for larger sampling intervals can be confirmed. We can also confirm that there is no guaranteed smoothness: a significant number of traces at a lower sampling frequency have a COV that is larger than the corresponding COV at a higher sampling frequency. Finally, we note that for the slower sampling intervals (i.e., all except 0.01 s), a large part of the traces have a COV between 0.25 and 0.75. A prediction model is therefore expected to handle large, but not extreme, variation. However, even at 5 second sampling, a COV larger than 1 is possible.

Figure 4.13: The CDF of the difference between the mean and the median bandwidth.

In statistics, both mean and median can be used to describe the average. Median values are preferred when the samples contain single, large outliers that would influence the calculation of the mean.

Figure 4.13 shows the CDF of the difference between mean and median, normalized by the mean, in %. A high value on the x–axis shows a large difference between the mean and the median. For a sampling interval of 0.01 seconds, only 15% of all traces have a difference smaller than 90% (relative to the average bandwidth). This difference does not come as a surprise: the trace in Figure 4.10 has shown that the sampled values are either 0 or very high. Since the number of peaks is much smaller, a median value of zero can be expected whereas the mean value lies somewhere between the peaks and zero. This result shows that a statistical analysis must be made with care.

This analysis of the different sampling intervals shows that the characteristics of the sampled traces change significantly with the sampling interval. The sampling interval is an important factor because it can be selected by the application (within its requirements). A higher sampling frequency tends to lead to higher fluctuations that have to be taken into account by a prediction model.

4.2.4 Prediction models

This section addresses the question whether there is a single model that best predicts allthe Internet traces with all the sampling frequencies, and, if not, whether we can find acriterion that allows the selection of a prediction model.

To answer these questions, we apply the different prediction models to the traces and compare the predicted values with the values of the sampled traces. The differences between the predicted and the sampled values are summed over the whole trace. We denote this value the prediction error of a trace. The goal is to minimize the prediction error.
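The prediction error can be sketched as follows. Normalizing the mean absolute difference by the average bandwidth matches the relative errors reported later in this section, although the exact error metric used is an assumption of this sketch:

```python
def prediction_error(predicted, sampled):
    """Per-trace prediction error: mean absolute difference between the
    predicted and the sampled values, normalized by the average sampled
    bandwidth. A value of 1.0 means the error is as large as the mean
    bandwidth. Using the absolute difference is an assumption here.
    """
    assert len(predicted) == len(sampled)
    mean_bw = sum(sampled) / len(sampled)
    abs_err = sum(abs(p - s) for p, s in zip(predicted, sampled))
    return abs_err / len(sampled) / mean_bw

sampled = [100.0, 120.0, 80.0, 100.0]
predicted = [90.0, 100.0, 100.0, 90.0]
print(prediction_error(predicted, sampled))  # -> 0.15
```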

In the analysis, we first look again at a single trace and show the effects of the different prediction models and prediction parameters on the trace. This first analysis should again give an idea of the characteristics of the prediction. Then, in a second step, we calculate the prediction over all 1000 traces.

Model and trace characteristics

The Internet traces as well as the prediction models have different characteristics. A sim-ple analysis of the prediction results of all traces is difficult because it is hard to interpretthe results. A result may have its main cause in the prediction model, but it may also bethat the characteristic of the Internet trace has influenced the result. This section thereforeapplies different prediction models to a single trace and identifies the characteristics ofthe models.

Note that the graphs in the following discussion should not be compared among themselves because each graph is optimized to show the behavior of the prediction model. The axes are therefore not consistent throughout the charts. The trace we consider as an example has been sampled at 0.1 seconds. It exhibits some fluctuations around a mean bandwidth value.

BestMean models  The easiest model to calculate and to understand is the BestMean(p) model. The number of samples over which a value is averaged is denoted by the averaging window parameter p. Figure 4.14 shows the behavior of the BestMean(1) (Last) model. It takes the last value as a prediction of the future available bandwidth.


Figure 4.14: BestMean(1) at 0.1 seconds.

Figure 4.15: BestMean(10) at 0.1 seconds


Figure 4.16: AR(1) at 0.1 seconds

BestMean(1) is typically suited for stationary traces with a low variance. The sample trace is poorly predicted by the BestMean(1) model: the prediction is almost always far away from the real value since the original trace has an almost regular jigsaw pattern.

Higher parameters in the BestMean model result in a stronger smoothing of the predicted trace, as shown in Figure 4.15. Two observations can be made from this trace. First, the predicted trace is slow in following the sampled trace and stays around the average bandwidth. This steady adherence to the average bandwidth may be beneficial for some applications, e.g., long running applications that are not interested in the detailed peaks. Other applications, however, may want to take advantage of the dynamic changes. Second, the sampling interval and the prediction model are two parameters that interact: a trace that was sampled at a high frequency can be "transformed" into a trace that is similar to a low frequency trace by applying a model with a large parameter p (in the case of the BestMean model).

AR models  Autoregressive models try to model the fluctuations of the original traces with the shock parameter. Figures 4.16 and 4.17 show this modeling by the AR model with parameters 1 and 16, respectively. The modeling of the fluctuations is well visible between samples 21 and 31: the sampled trace has a constant average. Both AR(1) and AR(16) are able to identify this average and try to model the variations as white noise. AR(1) models all these peaks with the same variation since it uses only a single previous value in the prediction calculation. In contrast, AR(16) models these peaks differently. In general, we clearly note the difference between the autoregressive and the linear models.

Figure 4.17: AR(16) at 0.1 seconds

ARMA model  Figure 4.18 shows that ARMA(1,1) yields similar results as the AR models. It also models the fluctuations and the peaks nicely. The shown window is at the beginning of the trace. The initialization of the model seems to have been too low, so that the model first has to "ramp up" to reach the correct trace level.

ARIMA model  Finally, ARIMA(1,1,1) is also capable of modeling the dynamics well. However, in contrast to the previous models, ARIMA sometimes overshoots the high peaks of the bandwidth trace. The predicted bandwidth at sample 50 has a value of zero although the real bandwidth value is higher than zero. Of the three autoregressive models, AR is the most conservative with respect to the exploitation of extreme bandwidth values; ARIMA is the most dynamic and aggressive. The result is that the prediction is sometimes far off: for some parameter combinations and some traces, the prediction error is a factor of 1000 away from the real trace! For the discussed trace, ARIMA(8,8,8), e.g., has an average prediction error of 1000 times the average bandwidth of the sampled trace. The same applies to ARMA models. The reason may be that the model per se is not suited for this kind of trace, or that the initialization has misled the model. In either case, such a misprediction is disastrous for applications.


Figure 4.18: ARMA(1,1) at 0.1 seconds

Figure 4.19: ARIMA(1,1,1) at 0.1 seconds


Figure 4.20: AR(1) at 0.2 seconds

Trace at 0.2 seconds

The trace sampled at 0.1 seconds is very dynamic, and the autoregressive models seem well suited to model this characteristic. To show their behavior with smoother traces, we consider the same original Internet trace sampled at 0.2 seconds. The dynamics of this trace are quite different: the COV of the sampled trace drops from 0.6 at the 0.1 second sampling interval to 0.2 at the 0.2 second sampling interval.

Figure 4.20 shows the trace modeled with AR(1). In contrast to the previous modeling at 0.1 seconds, AR(1) does not capture the dynamics well. The alternation between periods with an almost constant bandwidth and periods with fluctuations seems to confuse the AR(1) model.

In contrast to AR(1), AR(16) looks better suited. Since it takes more values intoaccount for the prediction, it integrates the changes in the fluctuations.

Finally, ARMA(1,1) shows a similar problem as AR(1): the short window, as defined by the (1,1) parameters, seems to confuse the model.

Summary

To briefly sum up, we have shown how the different prediction models try to model the dynamics of this bandwidth trace. The BestMean model does not model the dynamics of non–stationary traces well. However, if the trace has a rather constant average bandwidth and the application is not interested in the exact peaks, a BestMean model can well be used. Autoregressive models are more suited for these non–stationary traces. The degree of aggressiveness (the flexibility) of the modeling differs among the models and their parameters.

Figure 4.21: AR(16) at 0.2 seconds

Figure 4.22: ARMA(1,1) at 0.2 seconds

                 0.1 seconds                0.2 seconds
                 mean error   median error  mean error   median error
BestMean(1)      1.0          0.75          -            -
BestMean(10)     0.57         0.42          -            -
AR(1)            0.41         0.31          0.18         0.14
AR(16)           0.32         0.25          0.19         0.15
ARMA(1,1)        0.18         0.14          0.4          0.3
ARIMA(1,1,1)     0.34         0.23          -            -

Table 4.2: Prediction error of the different models, relative to the average bandwidth of the trace.

The effects shown in the graphs also have an impact on the prediction error, which is listed for all models in Table 4.2. This table shows the prediction error of the traces as a function of the prediction models. The values in this table are prediction errors relative to the average bandwidth of the trace; a value of 1.0 therefore means that the expected error is of the same order as the average bandwidth. The error of BestMean(1) is by far the largest of all models. Remember that BestMean(1) is a very frequently used model in bandwidth prediction. The autoregressive models show a far better error ratio. Their ranking changes when the trace moves from a higher sampling frequency to a lower one.

The description of the dynamics of the prediction models is by far not complete. Especially ARMA and ARIMA have a wide range of parameters that can be varied. The parameters used in the charts are the simplest that can be found. In addition, the calculation of the models with these parameters is reasonably fast (the time for the calculation of the prediction is analyzed later in this section). The same model with a different parameter set may result in a different characterization. More parameters are used in the analysis of the prediction error.

Finally, the graphs also show that errors are caused by different reasons. Consider, e.g., Figures 4.21 and 4.22. In the former, almost every prediction has an error and the two curves look very different. In contrast, Figure 4.22 looks more harmonic, meaning that periods with small errors alternate with periods of significant errors. In the end, both traces may result in the same prediction error, but the impact on the application will be significantly different. This fact has to be kept in mind.

Page 89: Design of topology–aware networked applications - CiteSeerX

Figure 4.23: Calculation time for 1 prediction step (BestMean(p) and AR(p) models; calculation time [ms] on a logarithmic scale).

4.2.5 Prediction Time

The time to calculate a prediction value is important for topology-aware applications. Depending on the application requirements, new values may have to be calculated at frequent intervals. For the adaptation mechanisms of the topology-aware applications considered in this dissertation, we estimate that prediction frequencies in the order of seconds are realistic. As a consequence, prediction models whose calculation time significantly exceeds one second do not fulfill the requirements of the considered applications. It must also be noted that bandwidth predictions should be made in parallel to the application. If the prediction has to be triggered by the application, the application has to wait until the new prediction value is available. If the prediction is calculated offline, a new value is readily available.
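The offline variant can be sketched as a background thread that recomputes the prediction once per sampling interval, so that the application only ever reads the latest precomputed value. The class and the trivial last-value model below are illustrative assumptions, not part of Remos:

```python
import threading
import time

class OfflinePredictor:
    """Recompute a bandwidth prediction in the background so the
    application never waits for a prediction step (illustrative sketch)."""

    def __init__(self, sample_fn, model_fn, interval_s=0.5):
        self._sample_fn = sample_fn   # returns the newest bandwidth sample
        self._model_fn = model_fn     # maps the sample history to a prediction
        self._interval = interval_s
        self._history = []
        self._latest = None
        self._lock = threading.Lock()
        self._stop = threading.Event()
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while not self._stop.is_set():
            self._history.append(self._sample_fn())
            value = self._model_fn(self._history)
            with self._lock:
                self._latest = value          # precomputed, ready for readers
            self._stop.wait(self._interval)   # sleep until the next interval

    def latest(self):
        with self._lock:
            return self._latest               # returns immediately

    def stop(self):
        self._stop.set()
```

The application calls `latest()` and gets the most recent precomputed value without blocking on the model calculation.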

Figure 4.23 shows the time for one prediction step for BestMean and AR models. The y–axis denotes the time to calculate a prediction value in milliseconds on a logarithmic scale. Each value is the average over all steps of all 1000 traces. The timing experiments are executed on an unloaded 933 MHz Pentium III machine with 256 MB RAM.

Two observations can be made here. First, the calculation time lies around 1 millisecond for these models. The prediction can be considered fast enough for the considered applications. Second, the time to calculate the prediction is almost independent of the model parameter p for AR, but it slightly increases for BestMean models for the implementation we used. This fact would have to be taken into account for very large p on slow

Figure 4.24: Calculation time for 1 prediction step (ARMA(p,q) models; calculation time [ms] on a logarithmic scale).

or loaded machines.

Figure 4.24 shows the time to calculate a prediction for the ARMA models. The x–axis shows the different model parameters (p, q). The y–axis is again scaled logarithmically. A significant increase in the prediction time is visible with increasing model parameters. The cheapest ARMA models take an order of magnitude longer than AR or BestMean models, but are still fast enough for the application needs on unloaded standard PCs. However, the calculation of the prediction becomes expensive for larger parameters. Prediction times of 1 second can be considered too large for the application needs. We therefore conclude that only simple ARMA models are fast enough for the considered applications.

Figure 4.25 finally shows the prediction times for the different ARIMA models. The different parameters of the ARIMA model have a large influence on the prediction time. An increase in the parameter r from 1 to 3 increases the time by almost an order of magnitude. In general, the same statements can be made as for ARMA: only simple ARIMA models are fast enough for the problem addressed in this dissertation.

4.2.6 Negative and zero predictions

The different sampling intervals and the different prediction models do not have the same probability of reporting a bandwidth of zero or even a negative bandwidth. An application must discard a reported negative bandwidth value and wait for the next prediction.

Figure 4.25: Calculation time for 1 prediction step (ARIMA(p,q,r) models; time [milliseconds] on a logarithmic scale).

This delay wastes useful time for an application. A reported bandwidth of zero may not be wrong, since a connection may not be able to transmit any data. For the traces we consider, however, a zero value means that the application is currently blocked. It is interesting to see how frequent negative and zero predicted bandwidth values are for a given prediction model.
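The rule stated above can be expressed as a small guard in front of the model output (a sketch; what an application does with a zero value depends on its adaptation logic):

```python
def usable_prediction(value):
    """Return the predicted bandwidth if the application can act on it.

    Negative predictions (possible with autoregressive models) must be
    discarded, forcing the application to wait for the next prediction.
    A zero prediction is kept here: for the traces we consider it means
    the connection is currently blocked, which the application may still
    want to know.
    """
    if value < 0:
        return None   # discard; wait for the next prediction step
    return value

usable_prediction(-3.2)   # -> None (discarded)
usable_prediction(0.0)    # -> 0.0 (connection blocked)
usable_prediction(1.5)    # -> 1.5
```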

Figure 4.26 shows the average number of zero values over all 1000 traces for the BestMean model. The x–axis shows the model parameter p and the y–axis the percentage of zero values. The graph first shows that the number of zero values decreases with the window parameter p. This result is intuitive because the probability of having a full window of zero values decreases with a larger window. Second, the number of zero values is higher for higher sampling frequencies. Combining BestMean(1) and the 0.1 second sampling interval results in a total of 40% zero values. Note that the values for a sampling interval of 0.01 would have been much higher.

In contrast to the BestMean model, there are almost no zero predictions in autoregressive models. However, autoregressive models are subject to negative predictions. Negative predictions are not possible with the BestMean model (unless the collector reported an erroneously negative bandwidth).

Figure 4.26: Number of zero prediction values in the BestMean model (zero prediction values [%] per model parameter; sampling intervals 0.1 s, 0.2 s and 0.5 s).

Figure 4.27 shows the number of negative predictions for the AR model. The model parameter AR(p) is varied on the x–axis, and the y–axis denotes the probability of a negative prediction in percent. A first observation is that the probability of negative predictions is much lower than that of zero values in BestMean models. Second, the probability of negative predictions is higher for high sampling frequencies and for higher model parameters. This observation can be explained by the dynamics of the autoregressive models. As shown in the analysis of the AR model for a single trace, a higher model parameter p implies a higher degree of freedom. This freedom leads to an extreme modeling of fluctuations, which may result in negative peaks. In contrast, the AR(1) model is more conservative in the modeling of the fluctuations.

Figure 4.27: Number of negative prediction values in the AR model (negative prediction values [%] per model parameter; sampling intervals 0.1 s, 0.2 s and 0.5 s).

Figure 4.28: Number of negative prediction values for different ARIMA models (negative prediction values [%]; sampling intervals 0.1 s, 0.2 s and 0.5 s).

No chart for the ARMA model is shown because the probability of a negative prediction is lower than 1%.

However, the characteristics are quite different again for ARIMA models, as shown in Figure 4.28. The number of negative predictions for traces sampled at 0.1 seconds is similar to AR, but the probability is very high at 0.01 seconds. There is a clear tendency towards more negative values with increasing values of both q and r.

To sum up, BestMean models are prone to reporting zero values, whereas autoregressive models may predict a negative bandwidth. Zero and negative values may make it hard for topology-aware applications to make a decision, e.g., to adapt data, where a positive value is needed (unless the outcome is that no data is sent at all). Zero and negative values may, in contrast, be used to select paths. A path with a negative predicted bandwidth is more likely to slow down (though not to negative values) than a path with a positive predicted value.

Figure 4.29: Normalized mean error of all 1000 traces (prediction error on a logarithmic scale for AR(p), BestMean(p) and ARIMA(p,q,r) models; sampling intervals 0.1 s, 0.2 s, 0.5 s, 1 s and 2 s).

4.2.7 Prediction Error

The description of the traces and the prediction models has given an understanding of their characteristics. This section shows the effects of these characteristics when the different models are applied to all 1000 traces.

Figure 4.29 shows the mean prediction error of the different prediction models on all 1000 traces. The mean prediction error is calculated as follows: first, for each trace, the mean prediction error is calculated. This error is then divided by the average bandwidth of the trace to get a normalized prediction error. This normalized error is finally averaged over all traces to give the mean prediction error of a model.
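The normalization described above can be written out in a few lines (a sketch; `traces` is assumed to map each trace to its list of (predicted, observed) bandwidth pairs):

```python
def normalized_mean_error(traces):
    """Mean prediction error of one model over many traces.

    For each trace: average the absolute prediction error, then divide
    by the average bandwidth of the trace (normalization).  Finally,
    average the normalized errors over all traces.
    """
    normalized = []
    for pairs in traces:
        observed = [obs for _, obs in pairs]
        mean_bw = sum(observed) / len(observed)
        mean_err = sum(abs(pred - obs) for pred, obs in pairs) / len(pairs)
        normalized.append(mean_err / mean_bw)
    return sum(normalized) / len(normalized)
```

A value of 1.0 thus means the expected error is of the same order as the average bandwidth of the trace, matching the interpretation given for Table 4.2.4.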

The x–axis denotes the different prediction models and parameters. The y–axis is logarithmically scaled. The graph shows that the prediction errors of AR and BestMean are at least an order of magnitude smaller than those of the ARIMA models. Additionally, the differences among the ARIMA models are extremely large. Only ARIMA(2,2,1) has a prediction error that comes close to the AR and BestMean models. The ARMA results are not shown in this graph; their prediction error is similar to that of the ARIMA models.

Figure 4.30 shows the prediction errors of the AR and BestMean models in detail. The y–axis is scaled linearly. The AR models on the left side show no clear pattern when the model parameter p is varied. There is a slight tendency that models with a higher p are

Figure 4.30: Normalized mean error of all 1000 traces, only AR and BestMean models (sampling intervals 0.1 s, 0.2 s, 0.5 s, 1 s and 2 s).

better suited at high sampling frequencies but worse for long sampling intervals. On the right, mid–range BestMean models seem to perform best, with parameter values of p=5 or p=10. Clearly, a longer p is needed for small sampling times. In general, we consider the differences for both AR and BestMean models too small to draw a relevant conclusion about which model is best suited, also given that these results are obtained by looking at just 1000 traces.

The analysis leads to the conclusion that ARMA and ARIMA models are not as well suited as the BestMean and AR models. For any sampling interval, ARIMA and ARMA yield a worse average prediction error than AR and BestMean. Combined with the fact that the calculation of one prediction step is more expensive than with AR or BestMean, we exclude the ARMA and ARIMA models from further analysis.

Focusing on the AR and BestMean models, no model in Figure 4.30 can be identified that performs best in all cases. A criterion is therefore needed by which an application can easily determine which model is well suited for a given bandwidth trace characteristic. Our hypothesis is that the coefficient of variation (COV) correlates with the properties of the prediction models. The COV describes the fluctuation of a trace in relation to the average bandwidth.
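The COV of a trace is the standard deviation of its samples divided by their mean, i.e., fluctuation relative to the average bandwidth; a minimal computation:

```python
from statistics import mean, pstdev

def cov(samples):
    """Coefficient of variation of a bandwidth trace:
    standard deviation relative to the average bandwidth."""
    return pstdev(samples) / mean(samples)

cov([5.0, 5.0, 5.0])   # constant trace, no fluctuation -> 0.0
cov([1.0, 9.0])        # strong fluctuation relative to the mean
```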

The following four figures (Figures 4.31 to 4.34) show the correlation between the COV of a trace and the mean prediction error. Each figure depicts the results of one sampling rate (0.1, 0.2, 0.5 and 1 seconds). The COV is depicted on the x–axis, the prediction error on the y–axis. Each figure shows four prediction models: AR(1), AR(16), BestMean(1) and BestMean(10). These models are chosen as representatives of both a small and a large prediction model parameter p. The reason why we consider 10 and 16 large parameters lies in the model initialization. Prediction models must be initialized before a value can be predicted. The number of samples needed for the model initialization corresponds to the parameter p: for BestMean, it defines the window size; for AR, the degree of freedom is p + 2. A higher value of p would therefore delay the prediction of the first value, and the time to calculate the prediction step also increases with higher p.

Figure 4.31: Correlation between COV of the sampled trace and the mean prediction error for 0.1 second samples.

In addition to the raw sample points, a trend line for every model is shown in each figure. To calculate the trend line, the raw samples of each model are used to estimate the parameters of an exponential function.
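One common way to estimate such an exponential trend line y = a·exp(b·x) is a least-squares fit of ln y against x. This is a sketch of that standard log-linear approach; we do not know the exact fitting procedure used for the charts:

```python
from math import exp, log

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by linear least squares on (x, ln y).

    Assumes all y values are positive (as prediction errors are);
    returns the coefficients (a, b) of the trend line.
    """
    n = len(xs)
    ls = [log(y) for y in ys]                 # work in log space
    mx = sum(xs) / n
    ml = sum(ls) / n
    b = sum((x - mx) * (l - ml) for x, l in zip(xs, ls)) / \
        sum((x - mx) ** 2 for x in xs)        # slope of the regression line
    a = exp(ml - b * mx)                      # intercept, mapped back
    return a, b
```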

First note that all figures share the same scaling on the y–axis but not on the x–axis. The charts are optimized to show the greatest distribution of the values within the visible range. Figure 4.31 has the largest range of x values because the COV at this high sampling frequency shows the largest variation. The COV here lies between 0 and 3.5, whereas Figure 4.34 has its maximum at 1.5.

The figures show a clear correlation between the COV and the mean prediction error. The higher the COV, i.e., the larger the fluctuations within the trace, the larger the probability of a prediction error. This trend holds for all prediction models and for all sampling intervals.

Figure 4.32: Correlation between COV of the sampled trace and the mean prediction error for 0.2 second samples.

Figure 4.33: Correlation between COV of the sampled trace and the mean prediction error for 0.5 second samples.

Figure 4.34: Correlation between COV of the sampled trace and the mean prediction error for 1.0 second samples.

A second observation is that the distribution of the prediction error per model changes with the sampling interval. AR(16) is clearly the best model when the values are sampled at 0.1 seconds. AR(16) especially performs best at higher COV values (> 0.75). Below that threshold, the differences among the models are less accentuated, but AR(16) is still among the models with a lower prediction error. BestMean(10) has a slightly smaller prediction error than AR(16) below a COV of 0.75, but higher prediction errors above 0.75. AR(1) is the third-best model throughout the graph, and BestMean(1) is the worst model.

The same ranking as for 0.1 seconds can be reported for Figure 4.32, where the traces are sampled at 0.2 seconds.

The prediction errors appear more bundled at a sampling interval of 0.5 seconds because the traces show a smaller variation (a smaller COV). The sample distribution of the different models also starts to change in Figure 4.33: compared to Figures 4.32 and 4.31, BestMean(10) becomes the prediction model with the fewest errors. AR(1) and AR(16) are slightly worse than BestMean(10). The statistics reveal that although BestMean(10) has a better average prediction error, its standard deviation is larger than that of AR(1). When an application has to select a prediction model based on these characteristics, it is again up to the application to decide which model it prefers. The higher standard deviation of BestMean(10) means that the application risks that a predicted value is further off than a value calculated by AR(1). A large misprediction may cause problems in the adaptation algorithm.

Figure 4.35: Mean prediction error of the prediction models, as a function of the COV of the sampled traces (models: BestMean(1), BestMean(10), AR(1), AR(16)).

Finally, Figure 4.34 accentuates the findings for the 1.0 second traces. The BestMean(1) model is by far the worst model. AR(1) and BestMean(10) have a similar average prediction error. However, the mean value of BestMean(10) is better than that of AR(1), but its standard deviation is also larger. The AR(16) prediction error values are worse than those of AR(1) and BestMean(10), but better than BestMean(1).

To sum up, a correlation between the COV of the sampled bandwidth traces and the prediction error is clearly visible. The higher the COV, the greater the probability of a large prediction error. Second, not all prediction models are equally suited to model bandwidth fluctuations. AR(16) is better suited to model traces with a high COV, whereas BestMean(10) and AR(1) are better suited for traces with less fluctuation. Of the latter two models, BestMean(10) has a better average prediction error whereas AR(1) has a smaller standard deviation.

These results are also visible in Figure 4.35. The figure shows the COV of all traces (independent of the sampling interval) on the x–axis. The y–axis shows the average prediction error.

Model parameters

The previous section has only compared two model parameters for each model. For both AR and BestMean we chose a model with a small and a large model parameter p. In this section, we report the mean prediction error of the AR and BestMean models for more parameters.

Figure 4.36: Mean prediction error of AR models with different model parameters, as a function of the COV of the sampled traces.

Figure 4.36 shows the differences in the prediction error for the AR model with different model parameters p. The differences for COV values below 1 are small. For higher COVs, the prediction errors of AR(1) and AR(5) begin to increase, compared to AR(10) and AR(16).

A similar conclusion can be drawn for the BestMean models, as Figure 4.37 shows. For smaller COVs, all models show a similar behavior, but as soon as the COV exceeds a threshold (starting at 0.75), the prediction errors of the different models diverge. The smaller the prediction model parameter p, the higher the average prediction error.

Figure 4.37: Mean prediction error of BestMean models with different model parameters, as a function of the COV of the sampled traces.

4.2.8 Conclusions

Bandwidth in the Internet is inherently subject to fluctuations. Many adaptive applications have so far measured the available bandwidth at one point in time and taken this value as a prediction for the future. However, many more sophisticated prediction models for time series can be found in the literature. This section has studied various prediction models that are integrated into Remos and can therefore be used by topology–aware applications that use the Octopus framework.

We have first shown that the sampling frequency of the bandwidth has a significant influence on the resulting time series. We have chosen sampling intervals between 0.01 seconds and 5 seconds because we estimate that this range is important for the adaptation mechanisms of topology–aware applications. Depending on the sampling interval, the coefficient of variation, which expresses the relative variation of a trace, spans two orders of magnitude, ranging from 0.1 up to 10. We conclude that the sampling interval is an important parameter for bandwidth prediction. The sampling interval is also a parameter that applications can choose, within their possibilities.

We have also shown that not all prediction models are equally able to model the bandwidth fluctuations. Especially autoregressive models are better at predicting values for traces with higher fluctuations, whereas BestMean models are better at predicting traces with smaller and less frequent fluctuations. We have shown that an application can select the prediction model and the prediction parameters based on the coefficient of variation. For the traces we analyzed in this study, AR(1) and BestMean(10) were the best models for traces with low fluctuations (COV < 0.75) and AR(16) had the least prediction error for higher COV values.
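The selection criterion can be summarized in a few lines. The threshold of 0.75 is the value observed for our 1000 traces and, as cautioned in this section, should not be taken as an absolute constant:

```python
def select_model(cov, want_small_std_dev=False):
    """Choose a prediction model from the COV of the sampled trace.

    For our traces: below a COV of 0.75, BestMean(10) had the better
    average error and AR(1) the smaller standard deviation; above it,
    AR(16) had the least prediction error.  The 0.75 threshold is an
    empirical value for this study, not a universal constant.
    """
    if cov >= 0.75:
        return "AR(16)"
    return "AR(1)" if want_small_std_dev else "BestMean(10)"

select_model(0.3)                            # -> "BestMean(10)"
select_model(0.3, want_small_std_dev=True)   # -> "AR(1)"
select_model(1.2)                            # -> "AR(16)"
```

The `want_small_std_dev` flag reflects the trade-off noted earlier: an application that fears large mispredictions in its adaptation algorithm may prefer the smaller standard deviation of AR(1) over the better mean of BestMean(10).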

We acknowledge that these COV values must not be taken as absolute values for any future experiment. A more detailed study with more traces is necessary, e.g., to define a COV threshold that determines which model should be used. We consider our results a preliminary study. However, we claim that the study allows us to criticize the (frequent) use of the BestMean(1) prediction model for bandwidth prediction. This model is suited if the bandwidth trace has little or no fluctuation. However, bandwidth traces in the Internet are very unlikely to have little fluctuation. One might argue that our study shows that a large sampling interval automatically leads to no fluctuations. However, bandwidth has been shown to have a self–similar behavior. Therefore, it is necessary to deal with bandwidth fluctuations.

The more complex autoregressive models, such as ARMA and ARIMA, are not well suited for the topology–aware applications we considered. First, the calculation of a single prediction value is significantly more expensive than for the BestMean and AR models. Second, their prediction error is much higher than for the other models. We attribute this high error (at least partially) to the fact that these complex models need more samples to work correctly. However, more samples imply longer prediction times, and they also imply that more samples must be available. While the simplest BestMean and AR models can be initialized with few samples, ARIMA models may require 10 or 100 samples. If a topology–aware application starts up without any old samples, a model that needs only a small number of samples makes the prediction usable sooner.

4.3 Conclusions

The availability of information about the topology and the available resources in the network is a prerequisite for topology–aware applications. This chapter has addressed two issues related to the gathering of this information.

First, we presented an algorithm by which an application can locate available Octopus nodes in the Internet. The ability to search dynamically for Octopus nodes is important for a topology–aware application when no infrastructure is available that stores and reveals the location information. Building such an infrastructure in the Internet is far from easy, especially for scalability reasons. Other kinds of networks, such as ad-hoc networks, make the building of an infrastructure difficult because of their dynamics. We therefore conclude that the presented solution is necessary to deploy the concept of topology–awareness.

However, the dynamic location discovery comes at a price. The proposed algorithm takes at least several seconds to complete. An application pays this time with a delayed startup. Not all applications are willing to pay this overhead. Therefore, more work is needed to provide the information faster.

The second issue we addressed is the prediction of the future bandwidth availability for topology–aware applications. We have analyzed several prediction models using Internet traces. Our study shows that taking a snapshot of the actual bandwidth availability is a poor predictor for the future bandwidth availability, especially when the bandwidth fluctuates. We propose a simple criterion based on the bandwidth variation to select a more sophisticated predictor, either a BestMean predictor or an autoregressive predictor (AR). Second, we have shown that a topology–aware application can influence the fluctuation of the bandwidth trace by choosing the sampling frequency. A higher sampling frequency tends to lead to a higher fluctuation, which in turn is harder for the prediction models to predict.

Finally, from a software–engineering point of view, we have implemented all the requirements of the network layer as part of the Remos system. The details of the information gathering are separated from the other structures of Octopus. We have aimed at such a separation to make the networking parts exchangeable, e.g., to address other network types than the Internet. And if other network information systems become available, they can easily be integrated into the architecture.


5 Management for topology–aware applications

The management layer is located between the network and the application layer in the Octopus framework. One responsibility of this layer is to glue the other two layers together. Especially the graph data structure with nodes and links allows an integration of the two other layers. The integration has been described in Section 3.3.

This chapter focuses on three other issues that are attributed to the management layer. A first issue is the management of the resource information. The network layer gathers the network information. Section 5.1 shows how this information can be stored at the management layer to be used by other applications. Storing is also necessary when a first topology discovery has not been sufficiently successful. In such a case, alternative paths must be searched. In this context, we look at the construction of alternative paths and their resource properties.

The second issue focuses on the evaluation of a topology graph and the selection of Octopus nodes. Section 5.2 looks at architectural and algorithmic issues of node selection. It also applies the evaluator to the alternative paths constructed in Section 5.1 to investigate and exploit the properties of alternative paths.

Finally, Section 5.3 shows how Octopus services can be integrated into Jini. Jini is a system that manages different kinds of services. In this chapter, we design a class structure that allows the deployment of Octopus services based on the Jini infrastructure.

5.1 Resource management and storage

The topology and resource information gathered by the network layer can be used directly by topology–aware applications. However, it can also be stored in a database. There are several reasons for storing resource information. First, a history of values is needed for the resource prediction. Second, the resource information can be shared with other applications. If two applications send data over the same part of a network, the sharing of measurements can lessen the overhead of the measurements (especially if the measurements are made via active probing). Third, some measurements may be reused at a later point in time. Assume, e.g., that an application sends data, then is inactive for some amount of time, and starts sending again. Assuming that the resource gathering is stopped while the application is not sending (to save resources), no accurate information is available when it starts sending again. We have shown that the gathering of new information may delay the application start. Stored values in a database allow the application to get an initial value. Especially the topology information can be used since it is more stable than bandwidth values.

Figure 5.1: Overview of the management layer (selector, evaluator and predictor in the management layer; Remos modeler and Remos collectors in the network layer; O = Octopus node).
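A minimal sketch of such a management-layer store (the class and method names are hypothetical): it keeps a per-link measurement history for the predictor, can be shared between applications, and serves a possibly stale initial value at application startup.

```python
import time

class ResourceStore:
    """Management-layer database sketch: per-link measurement history.

    The history feeds the resource prediction, can be shared between
    applications that measure the same network part, and provides an
    initial (possibly stale) value at application startup so that the
    start is not delayed by fresh measurements.
    """

    def __init__(self):
        self._history = {}   # link id -> list of (timestamp, value)

    def record(self, link, value, now=None):
        now = time.time() if now is None else now
        self._history.setdefault(link, []).append((now, value))

    def history(self, link):
        """Measurement values for a link, oldest first (predictor input)."""
        return [v for _, v in self._history.get(link, [])]

    def initial_value(self, link):
        """Latest stored value, or None if the link was never measured."""
        entries = self._history.get(link)
        return entries[-1][1] if entries else None
```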

From an architectural point of view, databases can be used at two places in the Octopus framework, as shown in Figure 5.1. At the network layer, every collector can have a small database to store the gathered values. One collector is responsible for one part of the network (a domain, a subnet, etc.). It therefore makes sense to share the gathered information among all users that access a certain network part. We do not discuss this database any further because it is internal to a Remos collector and does not influence the design and the functionality of Octopus.

In contrast, the database at the management layer is able to address the issues outlined above. In addition, placing the database at the management layer has the benefit that not only network–specific values can be stored. A multimedia application, e.g., may want to store how many frames had to be discarded. The storing and retrieval of such information is much easier when the database is located at the management layer. Similarly, Bolliger et al. [11], e.g., show that measurements by the application also accurately measure the available bandwidth. Although this kind of measurement architecturally belongs to the management layer, legacy applications may have been designed in a way that makes it hard to separate the application and the monitoring. For these legacy applications, the storing of application–layer data is much easier if the database is at the management layer.

Finally, a management–layer database is needed to create Octopus paths. As depicted in Figure 5.1, the information that is gathered by Remos collectors is typically about one part of the network. Multiple collectors may be contacted to satisfy one application request. The responses are combined in the Remos modeler. That is, Octopus paths are only visible after the processing by the Remos modeler; they are not visible at the network layer.

5.1.1 Topology information for alternative paths

Section 4.1 describes how to find Octopus nodes along the routing path. The focus on one path provides scalability, but on the flip side, the resources found along the default path may not satisfy the application requirements.

Savage et al. [66] describe how alternative paths can be constructed. When the application knows the location of an intermediate host somewhere in the Internet that allows a route discovery from itself to other hosts, it can construct such an alternative path. The path is created by combining the route from the source node s to the intermediate host h and the route from h to the destination node d. The construction of such alternative paths is easy to deploy because all tools can be run at the application layer and do not require changes in routers.¹
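The construction can be sketched at the application layer: obtain the traceroute hop list from s to the intermediate host h and from h to d, and concatenate the two segments. The helper name and list representation below are illustrative assumptions:

```python
def alternative_path(route_s_to_h, route_h_to_d):
    """Combine two routes via an intermediate host h into one alternative path.

    Both arguments are hop lists as produced by, e.g., traceroute.  The last
    hop of the first segment is h; the first hop of the second segment is
    dropped if it repeats h, so h appears exactly once on the combined path.
    """
    h = route_s_to_h[-1]
    tail = route_h_to_d[1:] if route_h_to_d and route_h_to_d[0] == h else route_h_to_d
    return route_s_to_h + tail

alternative_path(["s", "r1", "h"], ["h", "r7", "d"])
# -> ["s", "r1", "h", "r7", "d"]
```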

An application can learn about intermediate hosts in various ways. One possibility is to store the location of possible intermediate hosts in the management–layer database or in an external database. If the location information is additionally combined with performance information, it may be useful to topology–aware applications. An alternative is to use or create an additional infrastructure that supports the search for Octopus nodes. We imagine an infrastructure similar to the Looking Glass project or to public traceroute servers [55]. Looking Glass servers and public traceroute servers can be considered as intermediate hosts. To create an alternative path, the traceroute information from the user’s host to the intermediate server can be combined with the routing information from the server to the receiving host. Currently, these servers only allow the tracing of the routing path. However, the service of these servers could be

1 The IP source route option would be even easier to use because it does not need collaboration from intermediate hosts, but only a few routers support this option.


92 Chapter 5: Management for topology–aware applications

Figure 5.2: Overlapping of paths. (a) hops that overlap with default path; (b) domains that overlap with default path.

extended, e.g., to include tools that measure bandwidth, to store bandwidth information, or even to yield information about the location of Octopus nodes.

The question is to what degree a structure like Looking Glass and traceroute servers allows the finding of true alternative paths. That is, assume that an application uses 10 Looking Glass servers as intermediate hosts. How large is the probability of finding paths that differ enough from the default routing path to provide a different availability of resources or to allow the detection of new Octopus nodes? If the default and the alternative paths follow the same routers or the same domains, they are likely to provide a similar resource usage. In contrast, if two paths only overlap for a short part, they are likely to be real alternatives.

We performed an experiment with 106 traceroute servers to address this question. We set up a server at ETH and a client at CMU. The 106 traceroute servers are used to create alternative paths. To analyze the overlap of the paths, the names of the routers and the domains are stored and compared with each other.

As a first analysis, we compare the number of hops (routers) and the number of domains that are common to the default path and every alternative path. Figure 5.2 shows the separation of the paths as a graph, whereas Figure 5.3 shows the same results as a chart. In Figure 5.3, the number of hops and domains is shown on the x–axis. The y–axis shows the number of measured traces that overlap for a certain number of hops or domains. For


5.1 Resource management and storage 93

Figure 5.3: Overlap of an alternative path with the default path (x–axis: number of overlapping hops/domains; y–axis: number of traces; series: hops forward, hops backward, domains forward, domains backward).

each path, the number of hops is considered in the forward direction (i.e., how many hops two paths share until they diverge) and in the backward direction, i.e., how many hops they share before the client (where two paths merge again).

The figure shows that all paths have at least two nodes in common, looking forward. In the considered topology, i.e., for the server at ETH, it takes 3 hops until the provider that connects ETH to the Internet is reached. Almost half of the paths diverge after three hops, and about a fourth each after 4 and 5 hops. The situation at the client is slightly different. The last hop before the destination either has two interfaces with different names, or there are two routers over which the client can be reached. This explains the high number of path joins at the last hop. Another large peak occurs when the streams reach the CMU provider (sharing of 3 hops). On average, most alternative paths share between 4 and 8 hops with the default path in this case.
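The forward and backward overlap used in this analysis amounts to computing the shared prefix and suffix of two router (or domain) lists. The following is a minimal sketch of that comparison; the class name and the sample router lists are our own illustration, not taken from the experiment code:

```java
import java.util.List;

// Computes how many hops two traceroute paths share at the beginning
// (forward overlap) and at the end (backward overlap).
public class PathOverlap {

    // Number of common entries from the start until the paths diverge.
    public static int forwardOverlap(List<String> a, List<String> b) {
        int n = Math.min(a.size(), b.size());
        int i = 0;
        while (i < n && a.get(i).equals(b.get(i))) i++;
        return i;
    }

    // Number of common entries counted from the end (where paths merge again).
    public static int backwardOverlap(List<String> a, List<String> b) {
        int n = Math.min(a.size(), b.size());
        int i = 0;
        while (i < n && a.get(a.size() - 1 - i).equals(b.get(b.size() - 1 - i))) i++;
        return i;
    }

    public static void main(String[] args) {
        // Hypothetical domain lists for a default and an alternative path.
        List<String> def = List.of("ethz.ch", "switch.ch", "geant.net", "abilene.net", "cmu.edu");
        List<String> alt = List.of("ethz.ch", "switch.ch", "telia.net", "abilene.net", "cmu.edu");
        System.out.println(forwardOverlap(def, alt));  // shared prefix: 2
        System.out.println(backwardOverlap(def, alt)); // shared suffix: 2
    }
}
```

Running this comparison for every alternative path against the default path yields exactly the histogram data of Figure 5.3.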

The number of overlapping hops is not as important as the number of overlapping domains because the topology discovery algorithm described in Section 4.1 maps the routers to domains. It is therefore important to see how many domains are shared between the default and the alternative paths. Figure 5.3 shows that most paths diverge after the second domain in our example. The first domain is ethz.ch, the second is switch.ch (the ETH provider). After these two domains, most paths take a different route than the default path. At the receiver side, most paths share the last three domains.

Now compare the overlap in hops and domains with the total length of the paths.


Figure 5.4: Overlapping of paths. (a) hops that overlap with every other path; (b) domains that overlap with every other path.

The average number of hops in a path is 32 in our experiment. Sharing only about 6 hops shows that a large part of each path is different. The same result is seen for the domains: of the 11 domains in an average path, only 5 are shared, which leaves 6 domains that an alternative path traverses on its own. There is therefore a fair chance that an alternative path finds more Octopus nodes and also provides a different resource availability. We can especially expect the bottleneck of a path to lie in the part that is not shared among the paths. Of the 5 shared domains, the first and the last in our example are the local domains of the corresponding universities, which have a high–bandwidth connection to the Internet. Note, of course, that there is no guarantee that an alternative path yields a better resource availability. However, the study by Savage et al. [66] has shown that between 30–80% of the alternative paths have a better bandwidth. So there is a chance that a better path can be found.

The previous result only compared the default path to all alternative paths. It could be that the default path follows an exceptional route whereas all other paths take the same route. To exclude this argument, we compare every path with every other path. If all alternative paths are similar, the number of overlapping hops and domains should shift to


Figure 5.5: Overlap of all paths with every other (x–axis: number of overlapping hops/domains; y–axis: number of paths; series: hops forward, hops backward, domains forward, domains backward).

higher numbers. However, as Figures 5.4 and 5.5 show, this is not the case. Figure 5.5 shows the number of overlapping hops and domains on the x–axis and the number of paths on the y–axis. The number of paths on the y–axis is larger than in Figure 5.3 because the study compares 106×105 path pairs. The peaks are distributed similarly to those in Figure 5.3. We can therefore conclude that all paths of our experiment differ over a large part of their length. The construction of alternative paths therefore provides real alternatives.

5.2 Evaluation and Selection

The evaluation of a topology–aware graph is encapsulated in the evaluator class in the Octopus framework. This section first gives a short overview of different evaluation algorithms that may be incorporated in the evaluator class. Section 5.2.2 evaluates an Internet topology and shows that the evaluation comes up with alternative paths that yield a significantly better bandwidth or latency. Finally, Section 5.2.3 gives an overview of possible Selector implementations.

5.2.1 Evaluation Methods

The evaluation of a graph, as it is created at the management layer, is a well–known problem in theory and has been applied to network graphs, e.g., in (QoS) routing [20] or in routing in overlay networks [21]. The evaluator class in the Octopus framework (see


Figure 3.7) is an abstraction of the graph evaluation. We just mention a few standard algorithms. The first two algorithms have been implemented in the Octopus framework and are therefore ready for applications to use.

• Shortest path: the shortest–path algorithm finds the path in the graph whose summed–up edge values are minimal among all paths in the graph. This problem was originally solved by Dijkstra [26]. This algorithm can be used by latency–bound applications.

• Maximal flow: the algorithm by Ford–Fulkerson finds the maximal flow through a graph by repeated path augmentation [32]. This solution is well suited for bandwidth–bound applications.

• Link–optimization problem: Dijkstra [26] and Bellman–Ford [6] described solutions for finding a link (or a path) through a network whose metric is above an (application–specified) threshold. This algorithm can be used by bandwidth–limited applications that know their resource demands. The same algorithms can also be used for combined–metric problems (i.e., when some metrics are additive, such as latency, and others are maximized, such as bandwidth).

• Steiner–Tree: Steiner–Trees are optimal solutions for multicast problems and scenarios [45].

The implementation of these algorithms should cover a large part of the requirements of different topology–aware applications. The four scenarios presented in Chapter 3, e.g., can be addressed with the presented implementations. The first scenario, where a filter is placed inside a network to adapt data, as well as the multi–path streaming scenario, can be solved by a maximal–flow algorithm, or they can be viewed as a link–optimization problem. Steiner–Trees solve the multicast problem. Finally, finding an alternative path for a handoff is either a shortest–path or a maximal–flow problem.
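As an illustration of the first algorithm, a stand–alone sketch of Dijkstra's shortest–path computation over a graph with an additive metric such as latency is shown below. The class and method names are our own and do not reproduce the actual Octopus evaluator interface:

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Minimal Dijkstra evaluation over a graph whose edges carry an additive
// metric such as latency (ms): returns the minimal summed-up metric from
// the source to every reachable node.
public class ShortestPathEvaluator {

    // adjacency: node -> (neighbor -> edge cost)
    public static Map<String, Double> evaluate(Map<String, Map<String, Double>> graph, String source) {
        Map<String, Double> dist = new HashMap<>();
        // queue entries are [distance, node]; stale entries are skipped on poll
        PriorityQueue<Object[]> queue =
            new PriorityQueue<>(Comparator.comparingDouble((Object[] e) -> (Double) e[0]));
        dist.put(source, 0.0);
        queue.add(new Object[]{0.0, source});
        while (!queue.isEmpty()) {
            Object[] top = queue.poll();
            double d = (Double) top[0];
            String u = (String) top[1];
            if (d > dist.getOrDefault(u, Double.MAX_VALUE)) continue; // stale entry
            for (Map.Entry<String, Double> edge : graph.getOrDefault(u, Map.of()).entrySet()) {
                double alt = d + edge.getValue();
                if (alt < dist.getOrDefault(edge.getKey(), Double.MAX_VALUE)) {
                    dist.put(edge.getKey(), alt);
                    queue.add(new Object[]{alt, edge.getKey()});
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        // Hypothetical latencies (ms) between some of the sites of Figure 5.6.
        Map<String, Map<String, Double>> g = Map.of(
            "ETH", Map.of("EPFL", 5.0, "CMU", 100.0),
            "EPFL", Map.of("UVA", 30.0),
            "UVA", Map.of("CMU", 90.0),
            "CMU", Map.of());
        System.out.println(evaluate(g, "ETH").get("CMU")); // 100.0 (direct beats 5+30+90)
    }
}
```

The same structure works for any additive metric (e.g., an error rate), which is exactly why the metric is passed as a parameter to the evaluator abstraction.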

However, an application is always able to implement its own evaluator by extending the evaluator class in the Octopus framework. Such an implementation may, e.g., be important to address problems with the run–time complexity of some algorithms. Computing Steiner–Trees, e.g., is NP–complete and therefore very sensitive to the size of a topology graph. In graph theory, however, a number of polynomial–time approximation schemes (PTAS) have been proposed that find a solution to an NP–complete problem that is off from the optimal solution by a factor (1+ε), but in polynomial time. The definition of the evaluator abstraction allows an application to select the PTAS solution that best corresponds to its preferences. We expect that PTAS solutions are well suited for integration as evaluators because an evaluation will always be off by some factor anyway: the resource availability has changed between the time of the resource measurement and the end of the evaluation.


Figure 5.6: Sample topology of various sites used in this dissertation (clouds: ETH, EPFL, UVA, UC, Harvard, CMU, NWU, UFMG; link labels denote available bandwidths).

One strength of Octopus is that the evaluator class is not designed for just a single metric. The metric is passed as a parameter to the Evaluator. A shortest–path evaluator, e.g., works with latency as well as with an error–rate metric to find the path with the lowest error rate in the graph.

5.2.2 Internet experiment

The previous section has shown that alternative paths can be constructed that exhibit different path properties with respect to their locations. This section also performs an Internet experiment with alternative paths. However, this experiment focuses on the resource availability. We create an application–layer topology with hosts that are located at different sites. The bandwidth and the latency are measured periodically among all sites. After every measurement, a maxFlowEvaluator evaluates the graph according to the bandwidth metric and a shortestPathEvaluator according to the latency.

The goal of this evaluation is twofold. First, we want to see whether alternative paths have a better performance than the default routing paths. Second, we want to observe the changes of the resource availability over time. We know that the available bandwidth fluctuates. But are the fluctuations such that the evaluation of a topology graph suddenly changes? Are these changes visible only for one sample, or are they visible over a longer time? With respect to the first question, we expect the evaluation to show that alternative paths exist that provide a better resource availability. Such a result would confirm the results of Savage et al. [66]. In contrast, we claim that the second question provides a new contribution.

The topology for the experiment is shown in Figure 5.6. The topology graph shows 8 clouds. For reasons of simplicity, the Octopus nodes in each cloud are not shown, but every cloud could contain a number of Octopus nodes. For our experiment, we use one node per cloud, with the exception of ETH, where we have two nodes running. These clouds represent the domains that provided us with an account to run the experiments in this dissertation: ETH (ETH Zurich, Switzerland), EPFL (EPFL Lausanne, Switzerland), UVA (Universidad de Valladolid, Spain), UC (Universidade de Coimbra, Portugal), CMU (Carnegie Mellon University, Pittsburgh), NWU (Northwestern University, Chicago), Harvard (Massachusetts), and UFMG (Universidade Federal de Minas Gerais, Brazil). The links among the clouds are drawn arbitrarily to illustrate the graph structure. In the experiment, every node is connected to every other node via a direct link (denoting the default routing path) plus via alternative links. The labels along the links denote the available bandwidth and are also attached just for illustration.

In the experiment, we measured the available bandwidth and the latency over a period of 24 hours. Every site measured these metrics to every other site in a round–robin way. The tool that Remos uses for bandwidth measurements is nettest, and ping is used for latency measurements. For the bandwidth measurement, a total of 10 MBytes is transmitted.

Although the study is limited to 8 sites, we still consider that relevant conclusions can be drawn from this experiment. First, the sites span three continents, and on two of them, more than one site participates, so that we can study continental as well as intercontinental connections. Second, the study lasts 24 hours and should therefore include daily fluctuations.

Bandwidth

Figure 5.7 shows the recorded (raw) bandwidth traces from one source to one destination via the default routing path. The label of each figure denotes the source node from which the experiment is run. The x–axis denotes the sample number. The time difference between two samples is about 5 minutes. The y–axis denotes the available bandwidth in Mbps on a logarithmically scaled axis (note that the scaling is different for the various sites).

The upper two charts (Figures 5.7(a) and 5.7(b)) show an almost stable bandwidth. As a consequence, the ranking of the connectivity of the different destinations remains the same throughout the experiment (with very few exceptions). Figures 5.7(d) and 5.7(e) show two different behaviors. Little fluctuation is visible at the beginning. However, after about half of the time, fluctuations are visible in various traces. At the same time, some connections experience a significant drop in bandwidth, which changes the ranking of the connectivity. Finally, Figure 5.7(f) shows frequent fluctuations at the beginning. The values are not clearly separated, so that no clear ranking can be made out. After 60 samples, however, the fluctuations stop and the connectivity gets more stable. A ranking is now possible because the connectivity is different. Looking at all figures, we note that there are different behaviors in the traces: some remain stable, some change over time so that even the ranking changes. An evaluation of the graph must therefore be repeated at different times.

Figure 5.7: Bandwidth measurements among different hosts (x–axis: sample; y–axis: bandwidth [Mbps], log scale). (a) ETH1; (b) EPFL; (c) CMU; (d) NWU; (e) UVA; (f) UFMG.

Figure 5.8: Bandwidth between ETH1 and UFMG via alternative paths (direct, via CMU, via NWU).

A second observation is that these raw measurements already reveal some information about the location of the bottleneck bandwidth. The bandwidth from ETH1, e.g., differs so much across the traces that we conclude that the bottleneck is not immediately close to the node ETH1; the bottleneck must be located after the paths separate from each other. In contrast, the bandwidth from UFMG (Figure 5.7(f)) is similar for all destinations in the first part of the trace. The topology analysis reveals that at least the first 10 hops (from ufmg.br to ucaid.edu) are common to all paths. In addition, a measurement of the static bottleneck bandwidth using pathchar reveals that the static connectivity between the first and the second cloud is significantly lower than the connectivity of the other links. None of the paths in this scenario is a real alternative because all share the same bottleneck. However, this topology information allows narrowing down the space in which alternative paths must be sought. This narrowing can speed up the search for alternatives and contributes to the scalability of searching for alternative paths.

The other question we look at is whether alternative paths can be identified that have a better performance than the default path. To answer this question, alternative paths are created by attaching one path to another. The available bandwidth of the alternative path is the minimum of the bandwidths of the individual path segments.
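This composition rule (composite bandwidth = minimum over the segments) can be sketched as follows; the class is our own illustration, and the numbers are invented, not measured values from the experiment:

```java
// Composes the available bandwidth of an alternative path s -> h -> d from
// the measured bandwidths of its segments: the composite bandwidth is
// limited by the weakest segment.
public class AltPathBandwidth {

    public static double composite(double... segmentBandwidthsMbps) {
        double min = Double.MAX_VALUE;
        for (double b : segmentBandwidthsMbps) min = Math.min(min, b);
        return min;
    }

    public static void main(String[] args) {
        double direct = 0.12;                 // hypothetical ETH1 -> UFMG (Mbps)
        double viaCmu = composite(5.0, 0.35); // hypothetical ETH1 -> CMU, CMU -> UFMG
        System.out.println(viaCmu);           // 0.35: the second segment is the bottleneck
        System.out.println(viaCmu > direct);  // true: the alternative path wins here
    }
}
```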

Figure 5.8 shows the comparison of the different paths between ETH1 and UFMG. Up to sample 60 and between samples 150 and 200, it is hard to make out which path has the best bandwidth. In between and after sample 200, however, the direct path has the lowest available bandwidth. Other paths, e.g., via CMU, provide an available bandwidth


Figure 5.9: Bandwidth between NWU and CMU via alternative paths (direct, via ETH2).

that is 3 times higher. Averaging over all samples, the CMU path exceeds the bandwidth of the direct path by a factor of 2.5. Another interesting result is found by analyzing the available bandwidth from NWU to CMU, as shown in Figure 5.9. Geographically, and also from a physical–connectivity point of view, the direct connection can be expected to be the best connection. This expectation is met up to sample 120. Afterwards, however, the bandwidth drops and other paths show a better available bandwidth, e.g., via ETH2. The difference in the bandwidth of these two paths is a factor of 1.6 for the second part of the samples. This example shows that the availability of bandwidth does not follow geographical or connectivity rules. In our experiment, a path from a U.S. site via Europe to another U.S. site has, at certain times, a better bandwidth than a path within the U.S. Therefore, we conclude that the ability to automatically select a transmission path is vital for the deployment of topology–aware applications.

Latency

In addition to the bandwidth, we also measured the latency of the connections. The results of four sites are shown in Figure 5.10. The label of each figure again denotes the measuring node. The x–axis denotes the sample, the y–axis the latency in milliseconds.

We note a behavior similar to that of the bandwidth traces. The latency is almost stable


Figure 5.10: Latency measurements among different hosts (x–axis: sample; y–axis: latency [ms]). (a) Latency ETH1; (b) Latency CMU; (c) Latency UVA; (d) Latency UFMG.


Figure 5.11: Latency between UVA and EPFL via alternative paths (direct, via CMU, via UFMG).

in Figure 5.10(a), and the ranking among the different hosts is maintained throughout all samples, with a few exceptions. Figure 5.10(b) shows large differences in single traces after sample 130. Finally, Figures 5.10(c) and 5.10(d) show a mix of fluctuations and smooth periods. The same conclusion as for the bandwidth evaluation applies to the latency evaluation.

The latency of alternative paths can again be computed by summing up the latencies of the two single paths. Figure 5.11 shows the latency of the default and alternative paths from UVA to EPFL. In the first half, the direct path has the smallest latency. At sample 130, the latency suddenly increases for the direct path. Note that the bandwidth drops significantly at the same time between these two sites (Figure 5.7(e)). After sample 130, the latency via CMU becomes smaller for two reasons. First, although the latency from UVA increases for all hosts, it does not do so to the same degree for all hosts, e.g., for the connection to CMU. Second, the latency from CMU to EPFL stays as low as before. We therefore conclude that alternative paths not only provide the possibility of a better bandwidth, but also of a lower latency.

We omit the forwarding delay at the intermediate host in the latency calculation for an alternative path. The assumption that the forwarding delay is zero is, of course, not true. However, the results in Figure 5.11 show a difference of about 100 ms between the direct and the best alternative path. We argue that the forwarding delay at an intermediate host must not exceed this value because such a delay cannot be justified for latency–bound applications.
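The additive composition, extended with a forwarding–delay term per intermediate host, can be sketched as follows; the class is our own illustration and the numbers are invented, not the measured values of Figure 5.11:

```java
// Sums the segment latencies of an alternative path and adds a forwarding
// delay for each intermediate host (one host joins every pair of segments),
// so the result can be compared against the direct path.
public class AltPathLatency {

    public static double composite(double forwardingDelayMs, double... segmentLatenciesMs) {
        // n segments imply n - 1 intermediate forwarding hosts
        double sum = (segmentLatenciesMs.length - 1) * forwardingDelayMs;
        for (double l : segmentLatenciesMs) sum += l;
        return sum;
    }

    public static void main(String[] args) {
        double direct = 400.0;                         // hypothetical UVA -> EPFL direct latency (ms)
        double viaCmu = composite(50.0, 150.0, 120.0); // UVA -> CMU, CMU -> EPFL, 50 ms forwarding
        System.out.println(viaCmu);                    // 320.0
        System.out.println(viaCmu < direct);           // true: alternative wins despite the delay
    }
}
```

With invented numbers like these, a 50 ms forwarding delay still leaves the alternative path ahead; a delay above the roughly 100 ms margin observed in Figure 5.11 would not.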


5.2.3 Selection

The evaluation of the topology graph results in one or several paths along which the data is best sent to satisfy the application requirements. We use the term “best” here because the application has expressed its requirements by implementing the evaluator abstraction. Along these paths, it may be necessary to instantiate Octopus services to deploy the communication mechanisms of the application, e.g., for adaptation or multicast.

The selection of a node is a task that is orthogonal to the evaluation of a graph. A first consequence is that the selection has been designed as a separate class, named selector, in the Octopus framework, as shown in Figure 3.6. A concrete instance of this abstract class contains the logic to place an Octopus service of a certain kind. A multimedia application, e.g., may need to locate Octopus nodes where the data is multicast, but, in addition, it may need places to adapt the data. Both the finding of multicast nodes and of adaptation nodes can be expressed as separate kinds of selection problems. A multicast selector identifies the Octopus node where two streams diverge; an adaptive selector selects an Octopus node before a bottleneck.

The selector is typically applied after the completion of the graph evaluation. The sequential nature of evaluation and selection allows using the results of the evaluation for multiple selections. In the previous example of a multimedia application, the two selectors can use the same evaluated graph to place multicast and adaptive services. Re–using the results of the evaluation for multiple selectors saves the application valuable time.

A third advantage of the separation of evaluation and selection is the reduction in framework complexity and the increase in reuse. The reduction of complexity can easily be expressed. Assume that the sum of all topology–aware applications has m evaluation requirements and n selection criteria. If evaluation and selection criteria were to be expressed by a single class, a total of m · n different implementations would be necessary. In contrast, with a separation, the number of needed instances is just m + n. With m = 4 evaluators and n = 3 selectors, e.g., a combined class hierarchy would require 12 implementations, whereas the separation requires only 7.

The Selector class encapsulates the logic behind the placement of the different filters. In this dissertation, the applications in Chapters 6 and 7 use three types of filters that can be instantiated on Octopus nodes: forwarding services, adaptive services, and multicast services.

To place forwarding services, the selection algorithm selects all Octopus nodes along a path that does not contain other filters. In Figure 5.6, e.g., three paths from ETH to UC are possible: a direct path, which does not need any additional filter; an indirect path via UVA, requiring one forwarding service; or a path via EPFL and UVA, with two filters along the path. It is up to the application to select the most suitable path and place the forwarding services accordingly. Forwarding services are also necessary for a multi–path streaming scenario, as described in Chapter 3.

Adaptive filters reduce the amount of data in transmission before a bottleneck so that the data can be delivered within a certain time frame. Typical locations to place an adaptive filter are before modem or wireless connections, or before the data stream reaches the Internet (before or at the provider). As shown in Chapter 6, it is beneficial for feedback–based adaptive applications to place the filters as close to the bottleneck as possible, to keep the feedback loop as small as possible. The selector for such an application first determines the path that best fits the application requirements (e.g., the path with the highest bandwidth). Then, it identifies the bottleneck in the path and searches for available Octopus nodes before the bottleneck. This selector is implemented in the Octopus framework core. If an application has additional requirements for the selection of the Octopus node, e.g., if the node should provide a certain CPU power 2, the application can easily extend the selection mechanism.
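The core placement step, identifying the bottleneck link on the evaluated path and choosing the last node before it, can be sketched as follows. The class name and the path representation are our own simplification, not the Octopus implementation:

```java
import java.util.List;

// Given a path annotated with per-link bandwidths, finds the bottleneck link
// and returns the last node before it as the placement for an adaptive filter.
public class AdaptiveSelector {

    // nodes.size() == linkBandwidthsMbps.size() + 1; link i connects node i and node i+1
    public static String selectBeforeBottleneck(List<String> nodes, List<Double> linkBandwidthsMbps) {
        int bottleneck = 0;
        for (int i = 1; i < linkBandwidthsMbps.size(); i++) {
            if (linkBandwidthsMbps.get(i) < linkBandwidthsMbps.get(bottleneck)) bottleneck = i;
        }
        return nodes.get(bottleneck); // node on the upstream side of the bottleneck link
    }

    public static void main(String[] args) {
        // Hypothetical path: the last link (e.g., a wireless hop) is the bottleneck,
        // so the filter is placed on the node right before it.
        List<String> nodes = List.of("server", "provider", "access", "client");
        List<Double> links = List.of(100.0, 10.0, 0.5);
        System.out.println(selectBeforeBottleneck(nodes, links)); // access
    }
}
```

A real selector would additionally restrict the choice to nodes that actually run an Octopus proxy and, if required, meet constraints such as available CPU power.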

Finally, to place multicast services, the selector must first choose the paths from the server(s) to the clients. A multicast service is instantiated at every Octopus node where the incoming stream is split.

The selection (like the evaluation) can be done at any time while the application is running. One point is certainly when the application starts up. However, an evaluation and a selection may also be needed, e.g., when the bottleneck of a connection shifts, or when resources change in a multi–path or handoff scenario.

One advantage of the framework is that the selectors work with any metric. For a video application, e.g., bandwidth or jitter may be the important metrics for placing filters.

5.3 Octopus and Jini

The process of finding available Octopus nodes in a network and evaluating and selecting the best of these nodes for a given application does not imply that the selected node already contains the application–specific code. However, every Octopus node runs an Octopus proxy, which is able to install, manage, and run Octopus services. The Octopus service is the abstraction of the application–specific code that must be run on the selected node (see, e.g., Figure 3.9).

There are different possibilities to install and instantiate Octopus services. An application may download and install a service in the application context. This “plug and play”–like dynamic code instantiation is widely deployed today. Java provides several features that allow an easy dynamic installation and instantiation of code: the Java virtual machine allows code to run on different platforms, the serialization mechanism of the Java language allows code to be shipped, dynamic class loading allows the integration of new code into a running system, and remote method invocation (RMI) allows an easy calling of remote code.
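The dynamic class loading and instantiation that this installation relies on can be illustrated with plain Java reflection. This is a simplified stand–in for the proxy's installation step: in the real system the class bytes would first be downloaded over the network, whereas here we load a class that is already on the class path:

```java
// Illustrates dynamic instantiation: a class is located by name at run time
// and instantiated without being referenced at compile time. An Octopus
// proxy can use the same mechanism for downloaded service code.
public class DynamicLoadDemo {

    public static Object instantiate(String className) {
        try {
            Class<?> clazz = Class.forName(className);           // dynamic class loading
            return clazz.getDeclaredConstructor().newInstance(); // instantiation via reflection
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("could not load service " + className, e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a downloaded service class; any class name known only
        // at run time works the same way.
        Object service = instantiate("java.util.ArrayList");
        System.out.println(service.getClass().getName()); // java.util.ArrayList
    }
}
```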

An alternative to the dynamic code installation in the application context is the combination of Octopus with an existing service management system, such as Jini [76]. Jini

2 If this information is available.


Figure 5.12: Scenario of the usage of Jini by topology–aware applications (server, client, Jini lookup service, and Octopus nodes; numbered interaction steps 1–3).

is a distributed computing environment that offers “network plug and play”. A device or a software service can be connected to a network and announce its presence, and clients that wish to use such a service can then locate it and call it to perform tasks.

We envisage a combination of Octopus and Jini as follows. A Jini system is deployed at various places in the network, e.g., every cloud may contain a Jini system. When an application selects an Octopus node to install a service, it first checks whether the desired service is already available on this node. If not, the local Jini system is contacted for the desired service. If the service is available, it is downloaded from the local system and instantiated. The download from a local system is expected to be faster than from a remote host, and it may improve security because a locally stored service may have been checked and approved before. Finally, if no Jini system is available or if the service cannot be found, the service is downloaded in the application context.

This section focuses on the design and the implementation of a combined Octopus–Jini mechanism. We study how difficult it is to design Octopus services such that they can be made available via Jini mechanisms.

Figure 5.12 depicts a typical interaction of a topology–aware application with Jini. The figure shows an application consisting of a server and a client. Assume that the location discovery (see Section 4.1) has come up with one cloud where an Octopus service should be instantiated (e.g., it is the cloud before a bottleneck). We assume that every cloud contains one or several Jini lookup services with which all available Octopus services register when they become available. The server sends a (Jini) discovery request to the cloud (1), which contains the search criteria (e.g., the service name). The Jini lookup service checks the registered Octopus services and returns a reference to an available service (2). The application can then use the service via a normal method call (3).

Figure 5.13: Overview of the design that integrates Octopus into Jini. [UML class diagram: ServiceReg, ServiceFinder, ServiceFinderListener, ServiceEventHandler, EventManager, LeaseManager, OctopusService, OctopusServiceServer, OctopusServiceImpl, Servent, ServentInterface, ServentImpl, Landlord; Jini/RMI types Remote, RemoteEventListener, UnicastRemoteObject, DiscoveryListener, Lease.]

This scenario is just one possible way for an application to get a handle on an Octopus service. This solution is best combined with the Octopus node discovery, as described in Section 4.1.3, and may replace the use of the domain name system to find available Octopus nodes within a cloud.

For a simple case study, we designed a framework that enables the integration of Octopus with Jini [44].

Figure 5.13 shows the core classes needed for this integration. The integration between Octopus services and Jini is made by the OctopusService interface. This interface contains all the functionality of a service that a client needs: the instantiation of a servent and the registration of a RemoteEventListener. It also ensures that the service can be accessed via RMI. Every concrete service must implement the OctopusService interface. In addition, every OctopusService must implement the OctopusServiceServer interface, which guarantees that a service can be accessed via the Jini registration mechanism.
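A minimal sketch of these two interfaces might look as follows. Only Remote and RemoteException come from the standard java.rmi package; Servent, Lease, RemoteEventListener, and the ExampleService implementation are simplified stand-ins for the corresponding Jini and Octopus types, not the actual code.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Simplified stand-ins for the Jini/Octopus types referenced by the interfaces.
interface Servent extends Remote { }
interface Lease { long getExpiration(); }
interface RemoteEventListener { void notify(Object event); }

// Every Octopus service lets a client instantiate a servent and register a
// RemoteEventListener; extending Remote makes it accessible via RMI.
interface OctopusService extends Remote {
    Servent createServent() throws RemoteException;
    Lease addRemoteEventListener(RemoteEventListener l) throws RemoteException;
}

// Guarantees that a service can be published via the Jini registration
// mechanism (in the real design this returns a Jini ServiceItem).
interface OctopusServiceServer {
    Object getServiceItem();
}

// Trivial implementation showing how a concrete service fills in both roles.
class ExampleService implements OctopusService, OctopusServiceServer {
    public Servent createServent() { return new Servent() { }; }
    public Lease addRemoteEventListener(RemoteEventListener l) {
        return () -> System.currentTimeMillis() + 60_000; // one-minute lease
    }
    public Object getServiceItem() { return this; }
}
```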

The ServiceFinder class contains features for asynchronous as well as synchronous communication to find a specific OctopusService. Discovery events, e.g., are typically asynchronous, whereas applications may use synchronous communication.

Figure 5.14: Integration of Remos into Jini. [UML class diagram: RemosImpl, RemosServent, RemosServentImpl, PathDiscovery; built on OctopusService, OctopusServiceServer, Servent, Remote, UnicastRemoteObject; key methods getServiceItem(), startDiscovery(String dst), trace(String dst).]
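The ServiceFinder's two access styles might be sketched as follows. This is an illustrative reconstruction under assumed names (ServiceFinderSketch, serviceFound), not the thesis code; in the real class, addServiceItem is driven by Jini discovery events.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a finder that supports both asynchronous (listener callback)
// and synchronous (blocking) retrieval of a discovered service item.
class ServiceFinderSketch {
    interface ServiceFinderListener { void serviceFound(Object item); }

    private final List<ServiceFinderListener> listeners = new ArrayList<>();
    private Object serviceItem;

    // Asynchronous use: discovery events push items to registered listeners.
    synchronized void addServiceFinderListener(ServiceFinderListener l) {
        listeners.add(l);
    }

    synchronized void addServiceItem(Object item) {
        serviceItem = item;
        for (ServiceFinderListener l : listeners) l.serviceFound(item);
        notifyAll();  // wake up synchronous callers
    }

    // Synchronous use: block until a matching service has been discovered.
    synchronized Object getServiceItem() {
        while (serviceItem == null) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return serviceItem;
    }
}
```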

The EventManager implements the handling of service–specific events. It interacts with the LeaseManager, which implements Jini leasing.
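The leasing idea behind the LeaseManager can be sketched as follows: a registration is granted for a limited duration and reclaimed unless the holder renews it. This is a simplified stand-in under assumed names; the real design builds on Jini's Landlord and Lease classes.

```java
import java.util.ArrayList;
import java.util.List;

// A lease grants a resource until an expiration time and can be renewed.
class LeaseSketch {
    private long expiration;
    LeaseSketch(long durationMillis) { renew(durationMillis); }
    void renew(long durationMillis) {
        expiration = System.currentTimeMillis() + durationMillis;
    }
    boolean expired(long now) { return now > expiration; }
}

class LeaseManagerSketch {
    private final List<LeaseSketch> leases = new ArrayList<>();

    LeaseSketch getLease(long duration) {
        LeaseSketch l = new LeaseSketch(duration);
        leases.add(l);
        return l;
    }

    // Called periodically: drop all registrations whose lease ran out,
    // returning how many were reclaimed.
    int expireLeases() {
        long now = System.currentTimeMillis();
        int before = leases.size();
        leases.removeIf(l -> l.expired(now));
        return before - leases.size();
    }
}
```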

In summary, the whole package consists of only 9 classes. This small number stresses the ease of integrating Octopus and Jini. A large part of the functionality needed to deploy Octopus services is already available from the Jini system. The design is flexible and extensible for any kind of Octopus service. The interfaces do not contain any application–specific methods.

In the remainder of this section, we study the design of three existing Octopus services within this framework: a Remos network information service, an adaptive image filter and a frame–dropping filter for MPEG–1 streams. The latter two anticipate services that will be described in the context of the applications in Chapters 6 and 7.

A Remos–Jini service

The first service to be integrated with Jini is the path discovery algorithm, which is integrated into the Remos system. The goal here is to show that parts of Remos, or Remos itself, can be integrated. Jini is thus able to support not only application–specific services but also network information. The service in this study is called with a destination network address and returns the list of hops or clouds.

Figure 5.14 shows the design of this service. RemosImpl is the service, which is created when getServiceItem is called on the OctopusServiceServer. The RemosServent provides the interface that is specific to this servent, i.e., the interface to start the discovery. RemosServentImpl implements the code for this service.

An application gets a reference to the service by calling the serviceFinder in the main framework. Given the name of the Octopus service, the serviceFinder locates the available service. The application instantiates a new Octopus service via the returned reference.

Figure 5.15: Integration of the image application into Jini. [UML class diagram: ImageImpl, ImageServent, ImageServentImpl, ImageFilter, and MimeType with the match constants NO_MATCH, MATCH_TYPE, MATCH and the predefined types GIF, JPEG, TIFF; built on OctopusService, OctopusServiceServer, Servent, Remote, UnicastRemoteObject, and the Jini Entry interface.]

The design shows that the integration of a Remos component is easy and requires very little knowledge about Jini. The details of Jini are well hidden in the integration framework. This example demonstrates that the integration of networking information tools is easy. We expect that the integration of similar tools, e.g., Remos components (collectors, databases) or information from a management–layer database, is similarly easy.

An image–Jini service

Chapter 6 describes an image application that employs adaptive Octopus services to reduce the size of images before a bandwidth bottleneck. These services can be managed by Jini as well. Figure 5.15 shows how this integration is made. The structure of this application service is similar to the structure of the Remos service. The four classes (imageServent, imageImpl, imageService, imageServentImpl) are analogous to the Remos classes. This analogy stresses the ease of implementing other, new services because they can reuse the design of the presented services.

An interesting difference is the MimeType class. This class encapsulates search criteria for different adaptive filters. If an application sends a JPEG or a GIF image, it must find a filter that can process these image types. When searching for a service, the application additionally specifies the image type. This search criterion may also be abstract, e.g., that the filter should apply lossless or lossy compression.
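The matching levels suggested by the constants in Figure 5.15 might be realized as follows. This is a reconstruction from the constant and field names in the diagram; the method bodies are assumptions.

```java
// Sketch of the MimeType search criterion: a candidate filter can match a
// request exactly, match only the major type, or not match at all.
class MimeType {
    static final int NO_MATCH = 0;
    static final int MATCH_TYPE = 1;  // same major type, e.g. image/* vs image/jpeg
    static final int MATCH = 2;       // exact match

    static final MimeType GIF  = new MimeType("image", "gif");
    static final MimeType JPEG = new MimeType("image", "jpeg");
    static final MimeType TIFF = new MimeType("image", "tiff");

    private final String type;
    private final String subtype;

    MimeType(String type, String subtype) {
        this.type = type;
        this.subtype = subtype;
    }

    int match(MimeType other) {
        if (!type.equals(other.type)) return NO_MATCH;
        return subtype.equals(other.subtype) ? MATCH : MATCH_TYPE;
    }
}
```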

A video–Jini service

As a third example, we consider an adaptive multicast filter for MPEG–1 streams. As expected, the design of the classes to integrate this filter is similar to the previous designs. One important difference from the previous examples is the use of the service after the instantiation: the communication between the service user (e.g., the video server) and the service is significantly different. In the previous examples, the communication is made via RMI. In contrast, the video stream is sent via sockets. Technically, it would have been possible to change the communication mechanism to RMI. However, by showing that the communication via sockets is possible as well, we stress that legacy applications can extend our integration framework as easily as a newly designed application.

5.4 Summary

The management layer integrates the network and the application layer of the Octopus framework. The data structures model network resources, but they also allow the application to customize the selection of paths and locations to deploy Octopus services. Second, the management layer manages the resources gathered by the network layer and also steers the search by looking for alternative paths. Finally, the management layer allows the integration and interaction with service management systems, such as Jini.

Topology–aware applications must take alternative paths into account when the search for Octopus locations does not satisfy the application requirements. Alternative paths can be constructed by sending the data via a third–party host that forwards the data towards its destination. Such forwarding can be implemented with an Octopus service. Since we expect a forwarding service to be used often, it can be expected to be optimized to forward data as efficiently as possible, and it can also be expected to be installed by default on all Octopus nodes. Such an optimization is necessary to keep the (latency) penalty for alternative path routing low.

Alternative path routing requires some information about the location of third–party hosts via which the data can be sent. We have shown that the information currently available from public traceroute servers and Looking Glass servers is already sufficient to create alternative paths. These paths are real alternatives from a topology point of view because they share less than 50% of the path with any other path. It can therefore also be expected that these alternative paths yield a different metric characterization.

The evaluation of a network topology is based on well–known algorithms from graph theory, but it also allows a customization of the evaluation by the application, e.g., to implement a less complex (non–NP–complete) algorithm. Similarly, the selection of an Octopus node to deploy a service provides standard implementations for frequently used selections, e.g., for adaptive services or multicast services, but also allows customization.

Applying the evaluation to a network graph created from Internet traces has shown that the network behavior is hard to predict. However, an automatic evaluation of a network graph, as it is done in Octopus, is efficient and easy. In contrast, this evaluation is often delegated to the user in today’s systems. From this experiment we conclude that the user has too little knowledge to perform such an evaluation.

The possibility to combine Octopus and Jini shows that our approach is compatible with other systems. We consider the ability of Octopus to interact with other systems a positive aspect of our design. Advances in other domains that are closely related to Octopus may be integrated so that Octopus can profit from these advances.

Finally, the structure and the functionality of the management layer are well defined and can easily be understood from its design. This ease of use allows an easy integration of new and legacy applications into the Octopus framework. At the same time, however, some parts of the management layer provide enough flexibility to take various application preferences into account. The framework structure, with its fixed and flexible parts, is therefore well suited to address the needs of topology–aware applications.

6 A topology–aware collaborative application

The integration of network information into the application context also requires an effort from the application programmer to obtain and use the provided information. This chapter presents a topology–aware collaborative application which aims at distributing different data types among multiple participants in the most efficient way, by selecting transmission paths and by adapting the data. Section 6.1 describes the requirements and the challenges of a collaborative application. Section 6.2 shows the design of a collaborative application and its interaction with Octopus. Section 6.3 studies the data delivery. It first shows how the data requirements and the receiver requirements are integrated into the application design. Second, it shows the benefit of topology–awareness with an Internet experiment. Finally, Section 6.6 summarizes the findings of this chapter.

6.1 Collaborative communication

We define a collaborative application as an application that allows a set of users at various places to communicate with each other, e.g., to work together on a common task or a problem (hence the name collaborative). The core of such a collaborative application is data distribution: when one user provides some information (a text, an image, a video stream), it must be displayed on the screen of all participating users. Examples of collaborative applications can be found in telemedicine or teleteaching. In a teleteaching session, a video camera streams live audio and video of the teacher to the different clients. In addition, slides prepared by the teacher are transmitted to all clients. Finally, text written or graphics drawn on an electronic whiteboard are sent as well.

In telemedicine, consider an emergency case where an ambulance is equipped with laptops, cameras and other medical devices, e.g., a heart rate monitor. These devices transmit information to the nearby hospitals in real time, where experts may give advice about the treatment of the patient. Information about the patient record can also be sent from the hospital to the ambulance. In more severe cases, information from the local hospital can be shared with other hospitals, e.g., where specialists are located. To support the discussion, live images of the patient may be transmitted, and medical images acquired by different sources (X–ray, CT, MRI) can be shown when available.

Figure 6.1: Web site of the 2002 soccer world championship, showing a mix of text, image and video data.

These scenarios have in common that a significant amount of flexibility is needed. First, the devices produce a wide variety of data with different requirements for their transport. Heart rate monitors produce a steady stream of one reading every fraction of a second; images are transmitted once and are therefore bursty. Video data, finally, produces a continuous data stream with high bandwidth requirements. Flexibility is also needed because the receiving devices may be heterogeneous: laptops (or even smaller handhelds) with limited displaying capabilities may be connected, whereas large screens are available in hospitals. Finally, the connectivity also varies, from wired networks in the hospital to wireless connections for the mobile devices.

The problem space addressed by collaborative applications covers a large range of problems. Many other applications only have to deal with a part of the issues the collaborative application addresses. Electronic whiteboards, e.g., often do not need to adapt the data because (i) the amount of data is small and (ii) text is difficult to adapt while maintaining a good quality. Another example of a simplified communication is when only one server is transmitting data, as in the Web. Figure 6.1 shows the match cast of the 2002 soccer world championship while the game is being played. This site contains a mix of information of different data types. On the left of the figure are the current score and the lineups in the form of text and small images. This information is updated infrequently. In contrast, the match log in the form of text on the right is updated more frequently. At the bottom, thumbnail images are displayed that are enlarged when clicked upon. Finally, a link at the top of the page leads to video highlights. The data displayed on this site is thus very heterogeneous. Further heterogeneity arises from the capacity of the receiving connections and the displaying capabilities of the devices (especially in Japan, where the new generation of mobile phones allows the display of images and video). Both kinds of heterogeneity make topology–awareness an attractive approach to distribute the data. First, the heterogeneity of the connectivity and the devices requires adaptation. Second, the adaptation depends on the data type. Third, multicast can be used to distribute the data to multiple users. And finally, given the popularity of soccer, the Web server is likely to become a bottleneck, so that a distribution of the load (in combination with adaptation) is possible.

So, the goal of a topology–aware collaborative application is to distribute the data as efficiently as possible to all users. The challenge is thereby to distribute the data over those paths that are best for a particular data item. The following sections show how Octopus is able to support a collaborative application in improving its quality.

6.2 Design of a topology–aware collaborative application

The design of a topology–aware collaborative application requires the integration of various components of the Octopus framework into the application context. A first group of abstractions is the communication infrastructure that allows the sending of data via third–party hosts. For the communication infrastructure, the application must deal with Octopus nodes, links and paths. The abstraction of Octopus nodes is needed to instantiate Octopus services. An Octopus link must enable the communication between two Octopus nodes. The data is thereby tunneled from one node to the other over a normal IP path. Finally, the Octopus path must provide an easy–to–use abstraction for applications to send data from one Octopus end system to the other.

A second group of abstractions that have to be integrated into the application are Octopus services. The application must define its own services. Remember that services are not only used to work on application data (adaptation); more complex communication implementations, such as multicast, are also expressed as services. The challenge in the design of this application is to allow an easy installation and instantiation of new services.

Note that both the communication infrastructure and the service abstractions have so far only been described as concepts (e.g., in Chapter 5). This section provides examples for a concrete implementation of these concepts. Although the implementation is application–specific, we anticipate that many applications can use the same implementation of communication mechanisms and service abstractions. The communication mechanisms, e.g., are targeted at the current Internet, with Octopus nodes being real end systems and Octopus links being overlay links that tunnel the data via normal IP paths over the current Internet. Because this implementation can be reused so frequently, we attribute these abstractions to the management layer rather than the application layer. Another reason is that the application layer should only contain application–specific mechanisms, such as adaptation mechanisms.

Figure 6.2: Scenario for a collaborative application setup. [Sites ETH, UVA, CMU, NWU and EPFL, with Octopus nodes (O) on the interconnecting paths.]

The challenge in the design of the topology–aware collaborative application is to integrate these abstractions in a way that allows an easy customization of the different abstractions on the one hand, but still gives the application enough flexibility to express its preferences on the other.

6.2.1 Scenario

The challenges in the design of a collaborative application can be described with a sample scenario of a topology–aware collaborative application. Figure 6.2 shows 4 users at different locations that are using a collaborative application, e.g., for teleteaching or telemedicine. At application startup, only the end systems are visible. The application must first detect available Octopus nodes in the network. Then, depending on the application preferences (user preferences, data types), the routing path must be determined among all the participants. Note here that it is possible to have multiple paths between two end systems. One of these paths may be used for high bandwidth transmissions, the other may be optimized for a low latency. After the determination of the transmission paths, services are instantiated on the Octopus nodes, either to forward the data in a special way (multicast) or to adapt the data. Finally, after establishing Octopus links among all Octopus nodes, the application is ready to start sending.

The sending of the data must be coordinated so that the correct data is sent over the corresponding path, e.g., video data over a high–bandwidth link. However, this coordination cannot always be derived from the data type alone. In a teleteaching session, e.g., presentation slides may be available when the session starts. A clever strategy is to send the first slide as fast as possible and then send the following slides when enough resources are available. Every client can cache these slides so that they are immediately available when needed. Such a difference in the transmission must be expressed by the application.
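The slide strategy might be sketched as follows. All names (SlidePrefetch, fetch, show) are hypothetical, and the transfer itself is simulated by building a string; the sketch only illustrates the send-first-then-prefetch-and-cache idea.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the slide-caching strategy: the first slide is transferred
// immediately, the rest are prefetched when resources permit and cached.
class SlidePrefetch {
    private final List<String> slides;
    private final Map<String, String> cache = new HashMap<>();
    private int next = 0;

    SlidePrefetch(List<String> slides) {
        this.slides = slides;
        fetch();  // transfer the first slide as fast as possible
    }

    // Called whenever spare resources are available: prefetch the next slide.
    void fetch() {
        if (next < slides.size()) {
            String name = slides.get(next++);
            cache.put(name, "data:" + name);  // stand-in for the actual transfer
        }
    }

    // A cached slide is available immediately when it is needed.
    String show(String name) { return cache.get(name); }
}
```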

This section describes the selection of a transmission scheme at application startup. However, the same process can be repeated at any time in the lifetime of a topology–aware application. Such a re–consideration of the transmission scheme is necessary when the available resources along the paths change significantly. The term significantly depends again on the application preferences. Octopus makes it easy to take these preferences into account.

The issues that have to be addressed can be summarized as follows:

- Discover the topology and measure the available resources for the needed metric (bandwidth, latency, jitter).

- Evaluate the topology graph. The evaluation depends on the data type, the user preferences and timing factors (how often to measure and sample).

- Select the Octopus nodes, depending on the scenario.

- Install the Octopus services.

- Establish all Octopus paths in a way that is easy for the application.

- Send the data along the best path.

- Adapt to significant changes in the resource availability, e.g., by changing the paths.

The main challenge in addressing these issues is to provide as many abstractions and as much implementation as necessary within the framework to make the implementation of this application easy, while at the same time still providing enough flexibility to allow an easy expression of application requirements and preferences.
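The first three steps of this list might be strung together as follows. Every interface and method name here is an assumption for illustration, not the actual Octopus API; the remaining steps (installing services, establishing paths, sending, adapting) would follow the returned node selection.

```java
import java.util.List;

// Illustrative stand-ins for the components named in the list above.
interface Modeler { Object getTopology(); }                      // discover + measure
interface Evaluator { List<String> evaluate(Object graph); }     // rank candidate paths
interface Selector { List<String> selectNodes(List<String> p); } // pick Octopus nodes

class SetupFlow {
    // Runs discovery, evaluation, and node selection in sequence.
    static List<String> setup(Modeler m, Evaluator e, Selector s) {
        Object graph = m.getTopology();
        List<String> paths = e.evaluate(graph);
        return s.selectNodes(paths);
    }
}
```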

6.2.2 Design

Figure 6.3 describes the design of the topology–aware collaborative application and its integration into Octopus. The figure shows two layers, the management layer and the application layer. The classes can be separated into three groups, as indicated by the different colors in the diagram. Classes shaded in dark gray belong to the core Octopus framework and have been discussed in Chapters 3 and 5. They are depicted to show the integration of the application into Octopus. Classes shaded in light gray provide the communication and computation mechanisms for the collaborative application and may also be reused by other applications. Communication mechanisms allow the establishment of Octopus paths, including multicast and multipath streaming facilities. Computation mechanisms include the platform for service instantiation on Octopus nodes as well as abstractions for the services. All classes in white are specific to the collaborative application. Their implementation might also be used by applications with the same data types or the same communication requirements.

Figure 6.3: Design of a collaborative application. [UML class diagram: management–layer classes OctopusObj, OctopusService, OctopusProxy, TopologySocket, TopologyXSocket, Manager, Modeler, Evaluator, Selector, StorageService, ForwardingService; application–layer classes UserGUI, TextObj, ImageObj, VideoObj, TextService, ImageService, VideoService and the ImageAdapter hierarchy with GIFAdapter and JPEGAdapter.]

Starting the tour from the application’s point of view, the UserGUI provides the mechanisms for displaying the content of the collaborative application. It sends and receives the data from the other participants via a special kind of socket we call topologySocket. The functionality of a topologySocket corresponds to a normal socket, but it hides the more complex connection management (establishing a connection path and sending data via multiple hops) from the application. TopologySockets are an extension of standard sockets and can therefore easily replace standard sockets to support legacy applications. TopologySockets contact the modeler to gather information about the topology and the available resources.
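The socket-extension idea might be sketched as follows: a Socket subclass that carries the remaining Octopus path, so that legacy code can use it wherever a plain Socket is expected. Field names, methods, and the port number are illustrative assumptions, not the thesis implementation.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Arrays;
import java.util.List;

// Sketch of a TopologySocket: a Socket that knows the chain of Octopus
// hops the data should traverse; the last hop is the receiver.
class TopologySocket extends Socket {
    private final List<String> path;

    TopologySocket(String... hops) {
        this.path = Arrays.asList(hops);
    }

    String firstHop() { return path.get(0); }

    List<String> remainingPath() { return path.subList(1, path.size()); }

    // In the described design, connecting opens a connection to the first hop
    // and forwards the remaining path so each proxy can extend the chain.
    void connectPath() throws IOException {
        connect(new InetSocketAddress(firstHop(), 4242)); // 4242: assumed port
    }
}
```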

Figure 6.4: Sequence diagram of the communication in the collaborative application. [Sender, OctopusProxy (P) and receiver (R): (1) new TopologySocket(P,R); (2) connect(P,R); (3) start services; (4) connect(R); (5) set(policy, preferences); (6) send(OctopusObj); (7) adapt(OctopusObj); (8) store(OctopusObj)/send(OctopusObj).]

A topologySocket is used only for the communication between two Octopus nodes. In contrast, topologyXSockets provide mechanisms for multi–point communication, e.g., for multicast. The functionality provided by topologyXSockets could also be implemented as services in the application context, but since multi–point communication is needed for different scenarios, we decided to provide this functionality at the management layer.

TopologySockets are used by the UserGUI, but also by the Octopus proxy. The Octopus proxy is a platform for applications that runs on an Octopus node. The Octopus proxy allows the installation and the instantiation of application–specific services. These services are defined at the management layer by the Octopus service class. An Octopus service encapsulates the information how the application data should be processed at the Octopus proxy. Two services, the storageService and the forwardingService, are simple services that store and forward application data independent of the data type. For this reason, these services are located at the management layer rather than the application layer. The application must customize the Octopus service for every data type it wants to process on a proxy. The imageService allows the adaptation of images; the videoService implements a frame–dropping filter or a format conversion of the video stream. A text service might convert one text format into another, e.g., converting a Word document into a simple text format to be displayed on a simple handheld device.

Corresponding to each Octopus service is an Octopus object that encapsulates the data (the image, video frame or text). In analogy to the class hierarchy of the Octopus services, the Octopus object is defined at the management layer, and every application must extend this class for every data type it wants to process.
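The pairing of objects and services might be sketched as follows. Class and method names loosely follow Figure 6.3, but the bodies, including the stand-in "adaptation", are illustrative assumptions.

```java
// Each data type extends OctopusObj; the matching service knows how to
// process that type on a proxy.
abstract class OctopusObj {
    abstract double getSize();  // used by adaptation decisions
}

class ImageObj extends OctopusObj {
    byte[] data;
    ImageObj(byte[] data) { this.data = data; }
    double getSize() { return data.length; }
}

abstract class OctopusServiceSketch {
    abstract OctopusObj handle(OctopusObj obj);  // process data on the proxy
}

class ImageServiceSketch extends OctopusServiceSketch {
    // Stand-in adaptation: keep every second byte to halve the "image".
    OctopusObj handle(OctopusObj obj) {
        byte[] in = ((ImageObj) obj).data;
        byte[] out = new byte[in.length / 2];
        for (int i = 0; i < out.length; i++) out[i] = in[2 * i];
        return new ImageObj(out);
    }
}
```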

The interaction of the different parts of the collaborative application at runtime is depicted in Figure 6.4. This figure shows two collaborative users, labeled sender and receiver, and one Octopus proxy in between. When the application starts up, a new topologySocket is instantiated (1). This socket interacts with the network layer to get the topology information and to evaluate the network graph. To build the Octopus path, the sender connects to the proxy and sends the remaining path to the proxy (2). In this figure, the remaining path is just the receiver, but it could contain multiple proxies or even multiple receivers. Note that this connection establishment is transparent to the application, which simplifies the programming of topology–aware applications. Upon receiving a connection request, the proxy starts a first service that is responsible for the handling of this connection (3), e.g., the setup and tear down of the connection. This service connects to the receiver (4). Having established the connections among the collaborative users, the application may customize the communication by setting policies and preferences in all proxies (5). Customization includes the starting of services on the proxy that are later needed for the communication, e.g., starting an imageService or a videoService. Another type of customization is to set preferences in these services. If a receiver has a limited displaying capability, it must set the corresponding preferences in the service it receives the data from. Note that step (5) is optional. Starting services before they are used may have a negative impact on the proxy performance. In contrast, starting a service only when it is used may have a negative impact on the application performance. An agreement and a trade–off must be found between the application and the proxy. Also note that step (5) may be executed or repeated at any later time. Finally, to send an Octopus object, the sender sends the object over the first hop. At the proxy, the data may be processed (e.g., adapted or stored) before it is forwarded to the receiver.

The design covers the issues of topology–aware applications mentioned earlier in this chapter. The topologySocket communicates with the modeler to discover the network topology and gather resource information. The resource gathering is influenced by the application via the manager. The evaluation and the selection are customized by subclassing the evaluator and the selector class. Finally, the topologySocket shields the application from the details of establishing the Octopus path and sending the data along that path. The installation and instantiation of Octopus services is extensible and allows an easy customization of the services, based on different data types and application preferences.

The design of this application shows that it is possible to add functionality even to the management layer. In addition, the framework is easily extended by subclassing several classes while the interaction among the different classes (the program flow) remains fixed. This ease of use and ease of (design) reuse shows that Octopus is a powerful framework.

The classes shown in this design are those classes that are needed to implement a topology–aware application. Other classes, e.g., those which implement the adaptation strategy, are not shown here because they are already needed for non–topology–aware applications. Bolliger [10], e.g., describes these classes in his framework for network–aware applications. Depending on the complexity and the flexibility of the application, a significant effort may be necessary to deploy mechanisms within the application context to optimize the communication.

The kind of integration shown here for the collaborative application is termed white–box framework use. White–box here refers to two issues. First, the classes in the framework are customized by extension, rather than just being used as fixed parts. Second, application designers may extend the functionality of the management layer to fit the requirements of different applications. This white–box use is especially useful for the implementation of new applications. In contrast, a black–box approach is often more suited for legacy applications. This approach will be shown in Chapter 7.
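The white–box customization can be illustrated with a small sketch. The class and method names below (Evaluator, HopAwareEvaluator, score) are assumptions for illustration, not the actual Octopus classes; the point is that the application changes behavior by subclassing while the framework keeps the program flow fixed.

```java
import java.util.Collections;
import java.util.List;

public class WhiteBoxSketch {
    // Stand-in for a framework class that ranks candidate paths.
    static class Evaluator {
        double score(List<Double> linkBandwidths) {
            // default policy: the bottleneck bandwidth of the path
            return Collections.min(linkBandwidths);
        }
    }

    // White-box extension: the application subclasses the evaluator and
    // additionally penalizes long paths (more hops usually mean more latency).
    static class HopAwareEvaluator extends Evaluator {
        @Override
        double score(List<Double> linkBandwidths) {
            return super.score(linkBandwidths) / linkBandwidths.size();
        }
    }

    public static void main(String[] args) {
        Evaluator custom = new HopAwareEvaluator();
        // two-hop path with 500 and 250 kBps links: bottleneck 250, 2 hops
        System.out.println(custom.score(List.of(500.0, 250.0))); // prints 125.0
    }
}
```

The framework calls `score` without knowing which subclass the application supplied, which is exactly the fixed-program-flow, customized-parts property described above.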

6.3 Topology–aware data delivery

The integration of network information allows a topology–aware application to route and process the data according to its own preferences. A topology–aware application thereby has a choice of five strategies to transmit the data:

1. send the data over the default path, without adaptation.

2. send the data over the default path, adapt the data at the sender.

3. send the data over an alternative path, without adaptation.

4. send the data over an alternative path, adapt the data at the sender.

5. send the data over an alternative path, adapt the data on the proxy.

The first strategy is the traditional, end–to–end data delivery without adaptation. The second strategy is that of a network–aware application which adapts the data on an end–to–end basis. The transmission time is thereby reduced at the cost of an adaptation of the data prior to sending it. Time can be saved when the reduction in transmission time is larger than the adaptation time. Strategies 3 to 5, in contrast, provide new possibilities due to the concept of topology–awareness. Strategies 3 and 4 are analogous to strategies 1 and 2, but the data is routed along a different path (via a proxy). Finally, strategy 5 even moves the adaptation process onto the proxy, i.e., it adapts the data inside the network.

A topology–aware application selects the strategy that allows the best data delivery.We already mentioned that two factors are relevant in the definition of what is the bestdata delivery: application preferences (or constraints) and the transmission time. Theapplication preferences limit the choices of an application to change the data delivery orthe adaptation. A possible list of constraints for the collaborative application is:

- Text data, e.g., written on the electronic white board, must be transmitted unchanged to all users.

122 Chapter 6: A topology–aware collaborative application

[Figure 6.5 consists of two timing diagrams: (a) topology–aware data delivery over the default path (strategies 1: without adaptation, and 2: with adaptation), where the delivery time is composed of lat_{SR} and data/bw_{SR}, plus t_{adapt,s} when adapting; (b) topology–aware data delivery using alternative paths (strategies 3–5), where the delivery time is composed of lat_{SP}, data_{SP}/bw_{SP}, t_{proc,p}, lat_{PR} and data_{PR}/bw_{PR}.]

Figure 6.5: Topology–aware data delivery.

- Slides can be transmitted as plain text (ignoring figures if they are on the slides) or as images.

- Images (including slides) can be transmitted as tiff, gif and jpeg. Images may be adapted by converting them to another format or by reducing the image size.

- Both images and text must be visible to every participant at the latest T seconds after being written or displayed at the sender.

- Video is adapted by a format conversion (RGB stream from the video source to MPEG–1) and by dropping frames of lower priority. The discussion of video is deferred to the next chapter.

The second parameter that influences the selection of a transmission strategy is the transmission time. The transmission time depends on various factors that also vary for the different transmission strategies. Figure 6.5 depicts the different strategies to identify the relevant parameters for each strategy. Figure 6.5(a) shows the first two strategies, which send the data on an end–to–end basis. The first strategy (Figure 6.5(a), top) depends on two parameters: the one–way latency and the bandwidth between the sender and the receiver. The second strategy, which uses adaptation, additionally depends on the time to adapt the data.

Figure 6.5(b) depicts the topology–aware data delivery which includes a proxy in the data delivery. Hence, there are two latency and two bandwidth parameters that have to be taken into account. Depending on the transmission strategy, an additional parameter for the adaptation influences the delivery time. However, with topology–awareness, adaptation can be executed on the server or the proxy.

strategy 1: t_{delivery} = \frac{data}{bw_{sr}} + lat_{sr}    (6.1)

strategy 2: t_{delivery} = \frac{data_{adapt}}{bw_{sr}} + lat_{sr} + t_{adapt,s}    (6.2)

strategy 3: t_{delivery} = \frac{data}{bw_{alt}} + lat_{sp} + lat_{pr} + t_{forward,p}    (6.3)

strategy 4: t_{delivery} = \frac{data_{adapt}}{bw_{alt}} + lat_{sp} + lat_{pr} + t_{adapt,s} + t_{forward,p}    (6.4)

strategy 5: t_{delivery} = \frac{data}{bw_{sp}} + lat_{sp} + \frac{data_{adapt}}{bw_{pr}} + lat_{pr} + t_{adapt,p}    (6.5)

with: bw_{alt} = \max_{paths} \{ \min(bw_{sp}, bw_{pr}) \}    (6.6)

Table 6.1: Formulas for the time needed to send data for the 5 strategies.

Figure 6.5(b) also shows a small detail about our implementation of proxy–based adaptation. We require that a data item (an image, an MPEG frame, a text) always has to be received completely at the proxy before it is processed, and it is sent only when the processing is finished. Other adaptation schemes, such as progressive encoding, exist which do not require such a sequential process. Our implementation only considers a sequential process because (i) we use Java RMI, which serializes and de–serializes data items completely before the next processing step is started, and (ii) because it simplifies the following analysis of which transmission strategy should be chosen by an application. However, proxy–based adaptation could also implement progressive data processing.

The parameters which influence the transmission time for a specific transmission strategy are summarized in Table 6.1 (these equations correspond to the list of strategies shown earlier in this section). The formulas in this table express the delivery time of a data item as a function of those parameters that are relevant for a specific transmission strategy. In these formulas, lat_{xy} denotes the latency between two Octopus nodes x and y. Possible values for x and y are s: sender, p: proxy and r: receiver. bw_{xy} denotes the bandwidth between two Octopus nodes. data corresponds to the size of the data and data_{adapt} is the size of the data after the adaptation. t_{adapt,x} denotes the adaptation time on an Octopus node x. Finally, bw_{alt} is the bandwidth over the best alternative path, where the bandwidth of an alternative path is the minimal bandwidth of the two subpaths.

Every strategy adds an additional option to transmit the data for a topology–aware application. Note that not all strategies add the same number of alternatives. The first strategy only contains one possibility, namely to transmit the data unchanged. The second strategy, in contrast, opens many possible transmission options, depending on the
adaptability of the application data. An image, e.g., can be adapted in different ways(scaling, format conversion, etc.), and for each adaptation option, a (possibly infinite)range of parameters are available. Alternative path routing (strategy 3) adds as manyoptions as alternative paths are considered. Strategies 4 and 5, finally, are a combinationof adaptation (strategy 2) and alternative path routing (strategy 3) and therefore multiplythe number of transmission options they provide.

However, the number of options goes hand in hand with an increased complexity of gathering and evaluating the information. Strategy 1 does not need any information to be gathered. Strategy 2 needs end–to–end bandwidth measurements. Strategies 3 to 5, in addition, need topology and resource information. The complexity of the resource gathering has several drawbacks for an application. The design of the Octopus framework hides the complexity of the architecture and the design of topology–aware applications from the application. However, we are not able to hide the time overhead of the resource gathering that is needed for strategies 3 to 5.

It is up to the application programmer to define which strategies should be taken into consideration and in which order. Figure 6.6 shows a sample algorithm in pseudo–code in which the strategies are tried one by one. After specifying the resource constraints in line 1, the application starts sending the data with strategy 1. Octopus provides an exception–based mechanism (a callback) if the application constraints are not met by the underlying network. The exception is caught by the application to select the transmission strategy. If the constraints are not met, the application must decide whether it wants to consider adaptation first (strategy 2) or whether it wants to consider strategies 3–5. Adaptation only tries to meet the application constraints (e.g., to transmit data within a certain time, using the application–specified adaptation mechanisms), whereas strategies 3–5 try to optimize the data transport by taking alternative paths into account. In the algorithm, strategy 2 is considered first because it requires less resource information. The algorithm measures the available resources (line 10), checks whether the resources satisfy the application constraints (line 11) and, if they do, adapts and sends the data (line 13). If the resources do not satisfy the application constraints, the more expensive strategies have to be taken into account. First, topology information is gathered and the application checks again whether the resource requirements can be satisfied by alternative paths (strategy 3, line 21). If not, the application has to decide whether the necessary adaptation should be executed on the server or the proxy (strategy 4 or 5).

6.4 Topology–aware data delivery in the Internet

This section evaluates the different strategies of topology–aware data delivery using the Internet traces introduced in Section 4.2. This evaluation focuses on two issues. First, we show that the Internet traces change the results of the delivery time in the application strategies for different data types (i.e., for different data sizes). Second, we show that the delivery time changes for a given data size, depending on the transmission strategy. Both issues finally show that taking application–specific information into account is essential and hence that our approach for topology–awareness is justified.

 1 specifyConstraints(...);
 2 // start sending with strategy 1
 3 while (more data items to be sent) {
 4   try {
 5     send(item); // strategy 1
 6   } catch (OctopusTimeConstraintsException e) {
 7     // thrown if the timing constraints are not met
 8     // take other strategies into account
 9     if (!optimizeTransmission) {
10       resources = Octopus.measureResources(); // end-to-end
11       if (checkRequirements(resources)) {
12         adaptedItem = item.adapt(resources);
13         send(adaptedItem); // strategy 2
14         break;
15       }
16     }
17     // check more expensive strategies 3-5
18     graph = Octopus.getGraph(src, dst);
19     path = Evaluator.evaluateGraph(graph);
20     if (checkRequirements(path)) {
21       send(item, path); // strategy 3
22       break;
23     }
24     if (adaptOnServer) {
25       adaptedItem = item.adapt(resources);
26       send(adaptedItem); // strategy 4
27       break;
28     }
29     path p = Selector.select(graph);
30     send(item, p); // strategy 5
31   } // end catch
32 } // end while

Figure 6.6: Sample algorithm for topology–aware data delivery.

In the following discussion, we limit the application data type to images, to simplifythe discussion. We justify this limitation by the fact that images span a wide range of datasizes. Small images correspond to text sizes, and large images may be several megabyteslarge. At the same time, images provide a wide variety of adaptation options, e.g., com-pared to text which is hard to adapt. However, the same line of thought can be used forother data types as well.


Original size    convert to jpeg       convert to gif        downscale (0.5) [ms]
(tiff) [KB]     time [ms]  size [KB]  time [ms]  size [KB]   tiff   jpeg   gif
      11            38         2          38         5         23      9     22
      20            27         3          27         8         61     10     11
      34            28         4          29        13         20     18     13
      76            23         7          22        29         12     22     34
     170           107        12          99        65         34     98     53
     300           238        18         140       115        455    137    165
     540           529        29         520       205        494    509    343
    1216          1024        54         919       460        936    863    734
    1979          1338        80        2364       748       1973   1899   1871
    2166          2249        85        2413       819       2013   1929   2338

Table 6.2: Times for image adaptation (format conversion and scaling).

6.4.1 Strategy parameters

The parameters of the transmission strategies in Table 6.1 can be separated into two groups. The first group comprises the networking parameters, such as latency and bandwidth. The values of these parameters for the evaluation are taken from the Internet traces.

The second group comprises parameters which depend on the properties of Octopus nodes. These parameters determine the adaptation time. They depend on the hardware of a host, e.g., the CPU speed, and on the load on the node. A detailed study of these parameters and how they influence topology–aware data delivery is beyond the scope of this dissertation. However, to get an idea of the processing times, we have measured the execution time of image conversion and size reduction on a 933 MHz Pentium III running Linux 7.2, using Java Advanced Imaging version 1.1 [48]. We take these measurements to model the node–specific parameters.

Table 6.2 shows the time to adapt an image. The first column denotes the original image size. The next two groups of columns denote the adaptation time and the resulting image size for a format conversion to jpeg and gif format, respectively. Finally, the last three columns list the time to scale the original images by a factor of 0.5, depending on the original image format. All values are averages of 10 runs.

The conversion times for both jpeg and gif increase almost linearly with the size of theimage, except for small images where the costs are almost constant. The times are alsosimilar for both format conversions. The resulting image sizes, however, are significantlydifferent: gif images tend to be larger than the corresponding jpeg images. However,the size of a jpeg image depends on the image content. Finally, the scaling times ofthe size reduction show a similar increase for all formats, but with deviations. In fact,the coefficient of variation is around 0.2 for these measurements. In general, we conclude
from our measurements that: (i) format conversions can reduce the data size by a factor of about 20 for (large) jpeg images and about 3 for gifs; (ii) the time to convert a given image is similar to the time to scale the same image; and (iii) the adaptation time ranges from several 10s of milliseconds to seconds. As a consequence, we model the adaptation time t_{adapt} as a linear function τ(data) · data, where data stands for the size of the original data. Corresponding to the measured results, we set τ(data) to 1. Second, the target adaptation size data_{adapt} is a function of the original data size. For a format conversion from tiff to jpeg, we set data_{adapt} to \frac{1}{φ(data_{orig})} · data_{orig}, with φ = exp(3.2).
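As a sketch, the adaptation model can be written down directly, assuming τ = 1 ms/KB and φ = exp(3.2) as stated above; the class and method names are illustrative, not part of the framework.

```java
public class AdaptationModel {
    // Model constants from the text: linear adaptation time and a
    // tiff-to-jpeg compression factor of exp(3.2) (roughly 24.5).
    static final double TAU_MS_PER_KB = 1.0;
    static final double PHI = Math.exp(3.2);

    // adaptation time grows linearly with the original size
    static double adaptTimeMs(double sizeKB)   { return TAU_MS_PER_KB * sizeKB; }

    // target size after tiff-to-jpeg conversion
    static double adaptedSizeKB(double sizeKB) { return sizeKB / PHI; }

    public static void main(String[] args) {
        // 1216 KB tiff image, cf. Table 6.2
        System.out.printf("%.0f ms, %.1f KB%n",
                adaptTimeMs(1216), adaptedSizeKB(1216));
    }
}
```

For the 1216 KB tiff of Table 6.2 the model yields about 1216 ms and 50 KB, close to the measured 1024 ms and 54 KB.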

These numbers are filled into the 5 formulas of Table 6.1 for the evaluation. Assume that an application must send an image of 1.2 MB. Which of the strategies are suited to transmit the image within 2 seconds if the default path has a bandwidth of 1 Mbps and the best alternative path has 4 Mbps from the sender to the proxy and 2 Mbps from the proxy to the receiver? Every link has a latency of 200 ms. Strategy 1 (without any network–awareness) takes 1200 kB / 125 kBps + 0.2 s = 9.8 s. A format conversion to jpeg at the server (strategy 2) cuts the time down to 54 kB / 125 kBps + 0.2 s + 1.02 s = 1.6 s. Strategy 3 transmits the data within 1200 kB / 250 kBps + 0.2 s + 0.2 s + 0.008 s = 5.2 seconds. Strategy 4 takes 54 kB / 250 kBps + 0.2 s + 0.2 s + 0.008 s + 1.02 s = 1.64 seconds. Strategy 5, finally, has a total time of 1200 kB / 500 kBps + 0.2 s + 0.2 s + 54 kB / 250 kBps + 1.02 s = 4.03 s. In this case, strategies 2 and 4 satisfy the application requirements.
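The formulas of Table 6.1 and the worked example can be checked with a short sketch. The class below is a hypothetical helper for illustration, not part of the Octopus framework; units are kB, kBps and seconds.

```java
public class DeliveryTime {
    // Equations 6.1 to 6.5 of Table 6.1
    static double strategy1(double data, double bwSR, double latSR) {
        return data / bwSR + latSR;
    }
    static double strategy2(double dataAdapt, double bwSR, double latSR, double tAdaptS) {
        return dataAdapt / bwSR + latSR + tAdaptS;
    }
    static double strategy3(double data, double bwAlt, double latSP, double latPR,
                            double tForwardP) {
        return data / bwAlt + latSP + latPR + tForwardP;
    }
    static double strategy4(double dataAdapt, double bwAlt, double latSP, double latPR,
                            double tAdaptS, double tForwardP) {
        return dataAdapt / bwAlt + latSP + latPR + tAdaptS + tForwardP;
    }
    static double strategy5(double data, double dataAdapt, double bwSP, double bwPR,
                            double latSP, double latPR, double tAdaptP) {
        return data / bwSP + latSP + dataAdapt / bwPR + latPR + tAdaptP;
    }

    public static void main(String[] args) {
        // Values from the worked example: 1.2 MB image, 1 Mbps default path,
        // 4/2 Mbps alternative path, 200 ms per link, 54 kB after jpeg
        // conversion, 1.02 s adaptation time, 8 ms forwarding time.
        System.out.printf("%.2f %.2f %.2f %.2f %.2f%n",
                strategy1(1200, 125, 0.2),                     // ~9.8 s
                strategy2(54, 125, 0.2, 1.02),                 // ~1.65 s
                strategy3(1200, 250, 0.2, 0.2, 0.008),         // ~5.21 s
                strategy4(54, 250, 0.2, 0.2, 1.02, 0.008),     // ~1.64 s
                strategy5(1200, 54, 500, 250, 0.2, 0.2, 1.02));// ~4.04 s
    }
}
```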

To sum up, the differences in the results show that the strategies really provide alternatives for the application to steer the data delivery. However, every additional strategy comes at a higher cost and requires more information about the network, the load on nodes, the application run time, etc. Rather than performing all optimizations for every data delivery, the presented stepwise algorithm provides an alternative: it evaluates one strategy at a time and requires more measurements only if the previous strategy has not satisfied the user requirements. The formulas are simple linear expressions that can be calculated quickly once the necessary information is available.

6.4.2 Comparisons

We omit the comparison between strategies 1 and 2 because neither requires any topology–awareness. The comparison of strategies 1 and 3 shows whether an alternative path exists in our setup which provides better performance than the default routing path. The default routing path typically has a lower latency than the alternative path, but there are alternative paths which have better bandwidth than the default routing path. Table 6.3 shows the number of alternative paths that have better bandwidth than the default path. The number is given in percent of the total number of measurements.

Table 6.3 shows that taking alternative paths into consideration is an interesting approach. For some host pairs, all alternative paths have a better bandwidth than the default routing path, and many numbers range between 90 and 100%. These numbers show that the use of alternative path routing opens new transmission options for an application, which may justify the additional overhead of searching for alternative paths.

src/dst   ETH1   ETH2   EPFL    CMU    NWU    UVA   UFMG
ETH1         -      0    100    100    100    100    100
ETH2         0      -      0    2.5   96.3    1.2   98.4
EPFL       100      0      -    100    100   99.6   99.6
CMU        100   48.2    100      -      0   91.8   20.8
NWU       99.2   51.8    100   47.3      -    100   83.7
UVA        100   47.8   99.2   76.7   78.4      -   97.6
UFMG       100   99.2    100   40.4     62   99.2      -

Table 6.3: Percentage of alternative paths that have a better bandwidth than the default routing path.

src/dst   ETH1   ETH2   EPFL    CMU    NWU    UVA   UFMG
ETH1         -      0   33.1   44.1   46.1   37.9   51.1
ETH2      99.6      -    0.4    1.2    5.7    4.1   14.7
EPFL       1.2   27.3      -    2.1    7.3    2.9   19.6
CMU        1.6   22.5      0      -    1.2   47.7    4.1
NWU        1.7   22.2      0    1.2      -   46.3    3.7
UVA       48.5   62.8   48.2    1.2    6.1      -    6.2
UFMG      56.7   71.8   57.6    4.1    4.0   12.2      -

Table 6.4: Percentage of alternative paths that have a better latency than the default routing path.

Table 6.4 shows the same evaluation for the latency. In contrast to the bandwidth, the values here are much lower. The results are not surprising because a path is usually optimized for the number of hops, and there is a correlation between the number of hops and the latency. The results are also too optimistic because only the pure network–measured latency was summed up. The latency of the alternative path does not include the processing time at the Octopus node. If this overhead was included as well, the percentage would be even lower. Finally, the table only lists the percentage of the paths but not the difference in the latency. The alternative paths from ETH2 to ETH1, e.g., which have a better latency in 99.6% of the cases, differ by only 10% from the default path, whereas almost all alternative paths with UFMG as source differ by 30% or more from the default path.

The fact that a significant percentage of alternative paths has a higher bandwidth thanthe default path does not allow any conclusions about the benefit of alternative paths yet.Besides the pure percentage, the bandwidth difference of the two paths is important aswell.


src/dst    ETH1    ETH2    EPFL     CMU     NWU     UVA    UFMG
ETH1          -     3.7   161.3   280.3   216.5   209.8   247.9
ETH2        3.4       -    64.8    64.3   202.6    53.5   160.1
EPFL      181.2    57.2       -   274.8   209.6   207.6   252.1
CMU       231.2   132.3   219.7       -    14.5   147.7    84.1
NWU       200.4   122.1   196.5    90.9       -   209.5   141.1
UVA       252.1   106.2   231.3   133.1   134.9       -   177.2
UFMG      253.9   165.9   248.1    97.9   107.4   182.7       -

Table 6.5: Average bandwidth of the best alternative path, compared to the default routing path, in %.

src/dst    ETH1    ETH2    EPFL     CMU     NWU     UVA    UFMG
ETH1          -       -   161.3   280.3   216.5   209.8   247.9
ETH2          -       -       -   125.4   207.2   131.8   161.1
EPFL      181.2       -       -   274.8   209.6   208.2   252.6
CMU       231.2   202.9   219.7       -       -   152.5   113.1
NWU       201.6   186.6   196.5   173.8       -   209.5   150.5
UVA       252.1   168.7   232.3   145.1   146.7       -   178.9
UFMG      253.9   166.4   248.1   111.3   117.2   183.5       -

Table 6.6: Average bandwidth gain of alternative paths if the alternative path is better than the default path, in %.

Table 6.5 therefore shows the average bandwidth difference between the default andthe best alternative path, relative to the default path bandwidth, in %. A value largerthan 100% indicates that the average bandwidth of the best alternative path exceeds thebandwidth of the default path (if the value is 200%, then there is a difference by a factorof 2). If the value is below 100%, then the default path is better on the average.

There is clearly a relationship between the number of alternative paths with a better bandwidth (Table 6.3) and Table 6.5. If all paths have a better bandwidth, then the values in the latter table will be above 100%. Even with these expectations, the numbers are impressive: more than just a few alternative paths exceed the default path by a factor of 2 on average.

Table 6.6 shows the same results as Table 6.5, except that these values only include those measurements where the alternative path yields a better bandwidth than the default path. In contrast to Table 6.5, which included all measurements and can therefore be used to compare the best alternative path to the default path, the results of Table 6.6 show the performance gain of an application if it uses alternative path streaming and always takes the path with the highest bandwidth (which may also be the default path in this table).

src/dst    ETH1    ETH2    EPFL     CMU     NWU     UVA    UFMG
ETH1          -       -   165.5   305.3   358.4   290.1   182.7
ETH2          -       -       -       -  1269.5       -   406.8
EPFL     1180.6       -       -   331.8   385.4   316.5   146.8
CMU       514.8   349.8   686.7       -       -       -       -
NWU       667.1   657.9   406.8   390.6       -  1365.2  1037.2
UVA       401.5   436.6   347.2   384.4       -       -       -
UFMG      150.5   397.8   167.5       -  1125.2   355.6       -

Table 6.7: Data size threshold [KB] for strategies 1 vs. 3, i.e., the amount of data to be sent until the quality of the alternative path exceeds that of the default path.

The results of Table 6.6 show that a topology–aware application has an average bandwidth gain of a factor of 2 (200%) compared to an application which only takes the default path into account. We therefore note that almost all sites can profit from a better bandwidth availability.

The comparison of the bandwidth and the latency of strategies 1 and 3 shows that the bandwidth of strategy 3 is often better, whereas the latency is smaller for the default path. Looking at Equations 6.1 and 6.3, the delivery time for small data sizes is mostly dominated by the latency term; the influence of the bandwidth term increases with the data size. We can therefore calculate a threshold for the data size where transmission of the data is equally fast for both strategies. Beneath this threshold, i.e., for small data, the delivery time is dominated by the latency term and strategy 1 is therefore better; above the threshold, the larger bandwidth allows a faster delivery of the larger data using strategy 3. To determine this data size threshold, Equations 6.1 and 6.3 are set equal and solved for the variable x, which stands for the data size threshold:

x = \frac{(lat_d - lat_{alt}) \cdot bw_d \cdot bw_{alt}}{bw_d - bw_{alt}}    (6.7)

We use this equation in this section to compare strategies 1 and 3. In addition, however,the same equation is used by a topology–aware application to compare the data deliveryover two paths in a network for a given data size. In the application context, these twopaths are then just labeled path 1 and path 2. This (and the following) equations makeclear why the routing decision should be included in the application context. The def-inition of the best transmission strategy depends on various factors. Here, these factorsare only bandwidth and latency. However, more parameters are included in the equationslater in this section, and additional constraints may have to be taken into account that arenot important for the delivery of images (e.g., jitter for multimedia applications). Thisinformation is only available in the application context.
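Equation 6.7 can be sketched with the link values of the earlier worked example. The helper class is hypothetical, and the 408 ms alternative–path latency below assumes two 200 ms links plus 8 ms forwarding time.

```java
public class PathThreshold {
    // Equation 6.7: below the threshold the default path wins (lower
    // latency), above it the alternative path wins (higher bandwidth).
    // Units: seconds for latencies, kBps for bandwidths, kB for the result.
    static double threshold(double latD, double latAlt, double bwD, double bwAlt) {
        return (latD - latAlt) * bwD * bwAlt / (bwD - bwAlt);
    }

    public static void main(String[] args) {
        // Default path: 1 Mbps (125 kBps), 200 ms.
        // Alternative path: 2 Mbps bottleneck (250 kBps), 408 ms.
        double x = threshold(0.2, 0.408, 125, 250);
        System.out.printf("threshold = %.0f kB%n", x); // 52 kB
    }
}
```

At 52 kB both paths take 0.616 s; larger items favor the alternative path.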

Page 147: Design of topology–aware networked applications - CiteSeerX

6.4 Topology–aware data delivery in the Internet 131

Table 6.7 shows the data size threshold for the Internet traces. Every value is theaverage of all traces from a given source to a given destination. The results show that thedata threshold in our experiment lies between 400 KB and 1.5 MB. The range of thesevalues is important for images because the size of many images is in the same range. It istherefore important for a topology–aware application to calculate the data size thresholdand decide over which path the data should be sent.

The results for strategies 1 and 3 can also be applied to strategies 2 and 4 because the only parameter that is different, the adaptation time, is the same for both comparisons and therefore cancels out when calculating Equation 6.7. The difference is that the calculated value x is no longer the size of the original data, but the data size after the adaptation. Since the adaptation function typically reduces the data size, the results of Table 6.7 move towards higher values. That is, the default routing path is also better for larger original data sizes. The target size can be calculated by the following equation:

x = \frac{bw_d \cdot (T - lat_d)}{φ(x_0) + bw_d \cdot τ(x_0)}    (6.8)

where x0 is the original data size, φ is the adaptation factor (e.g., by how much the data iscompressed) and τ is the adaptation time. This formula holds for strategy 2. The equationfor strategy 4 additionally takes the forwarding time into account:

x = \frac{bw_{alt} \cdot (T - lat_{alt} - t_{forward})}{φ(x_0) + bw_{alt} \cdot τ(x_0)}    (6.9)

The equation for strategy 5 finally contains two data–bandwidth terms because theoriginal data is transported unadapted over the first path and the adapted data is thentransmitted over the second path.

x = \frac{bw_{sp} \cdot bw_{pr} \cdot (T - lat_{sp} - lat_{pr} - t_{forward})}{bw_{sp} \cdot φ(x_0) + bw_{pr} + τ(x_0) \cdot bw_{sp} \cdot bw_{pr}}    (6.10)

The indices denote the sender (s), proxy (p) and receiver (r), respectively.

Alternative path routing and adaptation are two different solutions for an application to meet a time deadline. Adaptation results in a loss of quality; alternative path routing requires the gathering of networking knowledge. It is up to the application to decide which solution is preferred. This decision has already been shown in Figure 6.6 (line 9).
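For illustration, Equation 6.8 can be evaluated with assumed numbers taken from the worked example; we read x as the largest original data size for which strategy 2 still meets the deadline T. The helper class and this reading are our own sketch, not the dissertation's code.

```java
public class TargetSize {
    // Equation 6.8: largest data size deliverable within deadline T over the
    // default path with server-side adaptation. phi is the compression
    // factor (adapted/original), tau the per-kB adaptation time.
    static double strategy2Threshold(double bwD, double T, double latD,
                                     double phi, double tau) {
        return bwD * (T - latD) / (phi + bwD * tau);
    }

    public static void main(String[] args) {
        // Assumed values from the worked example: 125 kBps default path,
        // 200 ms latency, 2 s deadline, phi = 54/1200, tau = 1 ms/kB.
        double x = strategy2Threshold(125, 2.0, 0.2, 54.0 / 1200, 0.001);
        System.out.printf("%.0f kB%n", x); // roughly 1.3 MB
    }
}
```

The result is consistent with the worked example, where a 1.2 MB image met the 2 s deadline with strategy 2 (1.6 s).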

The difference between end system based adaptation and proxy–based adaptation isshown by a comparison of strategies 4 and 5. Using the same approach as for the previouscomparison, i.e., determining the data size by combining Equations 6.4 and 6.5, results inthe following equation:

x_0 = \frac{bw_{alt} \cdot bw_{sp} \cdot bw_{pr} \cdot (t_{adapt,p} - t_{adapt,s} - t_{forward,p})}{φ(x_0) \cdot bw_{sp} \cdot bw_{pr} - φ(x_0) \cdot bw_{alt} \cdot bw_{sp} - bw_{alt} \cdot bw_{pr}}    (6.11)


src/dst   ETH1   ETH2   EPFL    CMU    NWU    UVA   UFMG
ETH1         -      -   0.02   0.01   0.03   0.01   0.1
ETH2         -      -      -      -   0.56      -   0.22
EPFL      2.26      -      -   0.08   0.11   0.2    0.09
CMU        0.8    2.1   0.38      -      -   0.09   1.78
NWU       0.89   0.26   0.15   0.26      -   0.50      -
UVA       0.57   1.96   0.04   0.7    0.31      -   0.24
UFMG      0.02   0.03   0.02   0.23   0.06   0.02      -

Table 6.8: Threshold [KB] for strategies 2 and 5.

The comparison of strategies 4 and 5 is difficult for this dissertation because more information about the performance of the Octopus nodes on which the adaptation should be done is needed. We noted earlier that adaptation inside the network is useful when, e.g., the server is likely to be overloaded and become a bottleneck, whereas a node inside the network may have resources available. This difference in the dynamic resource availability is expressed by the parameters t_{adapt,p} and t_{adapt,s}, respectively. If the adaptation time on the proxy is higher than the one on the server, proxy–based adaptation leads to worse results. However, since we have only measured the adaptation time on a single node, we cannot go into a detailed comparison.
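As a sketch only, Equation 6.11 can still be evaluated with assumed numbers; since adaptation was measured on a single node, the proxy adaptation time below is an assumption (e.g., a less loaded node), not a measured value.

```java
public class Strategy4Vs5 {
    // Equation 6.11: original-size threshold at which proxy-based adaptation
    // (strategy 5) overtakes server-based adaptation (strategy 4).
    // phi is the compression factor (adapted/original size).
    static double threshold(double bwAlt, double bwSp, double bwPr,
                            double tAdaptP, double tAdaptS, double tForwardP,
                            double phi) {
        return bwAlt * bwSp * bwPr * (tAdaptP - tAdaptS - tForwardP)
             / (phi * bwSp * bwPr - phi * bwAlt * bwSp - bwAlt * bwPr);
    }

    public static void main(String[] args) {
        // Assumed numbers: 4/2 Mbps alternative path as in the worked
        // example, server adaptation 1.02 s, proxy adaptation 0.8 s
        // (hypothetical), forwarding 8 ms, phi = 54/1200.
        double x0 = threshold(250, 500, 250, 0.8, 1.02, 0.008, 54.0 / 1200);
        System.out.printf("%.0f kB%n", x0); // larger images favor strategy 5
    }
}
```

With these assumptions the crossover lies at 114 kB: above that original size the faster proxy makes strategy 5 the better choice.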

Since we have no significant numbers about real server loads, we limit the comparison of strategies 4 and 5 to one question: which subpath in an overlay path containing a proxy has a higher bandwidth, the subpath between the sender and the proxy or the subpath between the proxy and the receiver? Proxy–based adaptation makes sense if the first subpath is not the bottleneck of the connection. Of the total number of 51452 paths which contain proxies, about half (25949) have a first subpath with a better bandwidth, and for 25503 paths the second subpath is better. Calculating the bandwidth ratio between the first part and the second part of the path results in a median value of 0.993 over all paths. These results show that proxy–based adaptation is an alternative to server–based adaptation in our experiment.

Finally, comparing strategies 2 and 5 shows the difference when both strategies can adapt the data, but strategy 5 can additionally use alternative paths. To solve the equations, we assume that t_{adapt,p} = t_{adapt,s} because we only want to compare default path routing versus alternative path routing with proxy–based adaptation. The equation which results from comparing Equations 6.2 and 6.5 is:

x_0 = \frac{bw_d \cdot bw_{sp} \cdot bw_{pr} \cdot (lat_{alt} - lat_d)}{φ(x_0) \cdot bw_{sp} \cdot bw_{pr} - φ(x_0) \cdot bw_d \cdot bw_{sp} - bw_d \cdot bw_{pr}}    (6.12)

The results of this equation for the traces are shown in Table 6.8. One way to interpret these numbers is to look at the numbers of Table 6.3 at the same time. There is a correlation
between the number of alternative paths that have a better bandwidth and the superiorityof strategy 5, expressed by a positive number in Table 6.8. The threshold numbers areusually small so that a benefit with alternative path adaptation can be achieved even withsmall image sizes.
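Equation 6.12 can likewise be sketched in code. All numbers below are assumptions for illustration; note that with the strong jpeg compression of the earlier example the default path wins for all sizes, so a milder compression factor is used here to obtain a positive crossover.

```java
public class Strategy2Vs5 {
    // Equation 6.12: original-size threshold between server-based adaptation
    // over the default path (strategy 2) and proxy-based adaptation over an
    // alternative path (strategy 5), assuming equal adaptation times.
    static double threshold(double bwD, double bwSp, double bwPr,
                            double latAlt, double latD, double phi) {
        return bwD * bwSp * bwPr * (latAlt - latD)
             / (phi * bwSp * bwPr - phi * bwD * bwSp - bwD * bwPr);
    }

    public static void main(String[] args) {
        // Assumed numbers: 1 Mbps default path (200 ms), 4/2 Mbps alternative
        // path (400 ms total), and a mild compression factor phi = 0.8.
        double x0 = threshold(125, 500, 250, 0.4, 0.2, 0.8);
        System.out.printf("%.1f kB%n", x0); // above this size, strategy 5 wins
    }
}
```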

6.5 Prediction–based adaptation in Octopus

The study of the bandwidth predictability has shown that the applicability of a predictor and the result of a prediction depend on the frequency with which a new prediction result is required. This frequency in turn depends on the application needs.

Some applications can steer the frequency with which new prediction values are needed. For a video stream, e.g., adaptation decisions can be made for every single frame, for every GOP or at discrete time steps (e.g., every 10 seconds). Different image formats support a tiling of a large image into different subimages which can be adapted individually. A set of images can be adapted either individually or as a whole. A larger number of data chunks allows a more detailed adaptation but also increases the prediction frequency. This section analyzes this dependency between the image size and the prediction frequency.

To analyze this dependency, we use the collaborative application. The application is given a set of images to be transmitted within a time limit T. The application requests a first prediction of the available bandwidth for T. The images are sent unchanged if there is enough bandwidth available; otherwise, the images are adapted. We thereby only allow an adaptation which reduces the image size because we assume that the sender stores a high quality version of an image. Increasing the image size would not add information to the image and thus would not improve the image quality.

The analysis uses the same prediction models and the same sampling intervals as in Section 4.2, i.e., AR(1), AR(16), BestMean(1) and BestMean(10) are used as prediction models and 0.1, 0.2 and 1 seconds are used as sampling intervals. The application must send 100 equally sized images of 10 KB each, resulting in a total amount of 1 MB. The application can adapt every image individually, or it may group the adaptation as if the set were 1, 2, 10 or 50 images. The size of the images is rather small. However, we have chosen this size because it allows the use of the traces introduced in Section 4.2. We again argue that it is not necessarily the absolute size of the image that is important, but the relationship between the trace fluctuation and the image. We use 100 randomly selected traces for the analysis.

We separate the analysis into two parts. First, we analyze the interaction with a time limit T which requires an average compression rate of a factor of 0.5 (i.e., a compression by 50%). Second, we reduce the time limit T so that the data has to be compressed by a factor of 0.05. This reduction lowers the chance of the application to compensate mispredictions by compressing the following images by a higher factor.
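The per-chunk adaptation loop can be made concrete as follows: before each chunk, the sender queries a bandwidth prediction and chooses the largest compression factor (capped at 1, since images are only ever reduced) that fits the chunk's share of the remaining time budget. The predictor interface and the equal-share split of the remaining time are our assumptions, not the exact policy of the experiment:

```java
import java.util.function.DoubleSupplier;

/** Sketch of prediction-driven adaptation of n equally sized chunks
 *  within a time limit. Predictor interface and equal time share per
 *  chunk are assumptions. */
public class AdaptiveSender {
    static double[] planChunks(double chunkBytes, int n, double timeLimit,
                               DoubleSupplier predictedBw) {
        double[] factors = new double[n];
        double timeLeft = timeLimit;
        for (int i = 0; i < n; i++) {
            double bw = predictedBw.getAsDouble();   // bytes per second
            double share = timeLeft / (n - i);       // time budget for this chunk
            factors[i] = Math.min(1.0, bw * share / chunkBytes); // never enlarge
            timeLeft -= factors[i] * chunkBytes / bw; // prediction assumed exact here;
                                                      // a real run would measure it
        }
        return factors;
    }

    public static void main(String[] args) {
        // 100 images of 10 KB in 5 s at a constant 100 KB/s: every chunk
        // gets factor 0.5, i.e. the 50% target compression of this section.
        double[] f = planChunks(10_000, 100, 5.0, () -> 100_000.0);
        System.out.println(f[0] + " .. " + f[99]);
    }
}
```

With a fluctuating predictor, a misprediction on one chunk changes the remaining time budget and is compensated by the factors of the following chunks, which is exactly the effect the two experiments below measure.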


[Figure: bar chart; x–axis: sampling interval (0.1 s, 0.2 s, 1.0 s) / number of images (1, 2, 10, 50, 100) / prediction model (AR1, AR16, BM1, BM10); y–axis: compression factor, 0 to 1.]

Figure 6.7: Average compression factor, as a function of the prediction model, the sampling interval and the number of images.

6.5.1 Large time limit

The time limit T is set so that only half of the data can be transmitted, given the available bandwidth of the bandwidth trace. That is, the target compression factor of the images is 0.5. However, the application is not aware of the correct value of this factor and must base the adaptation on the bandwidth prediction at every step.

Figure 6.7 shows the average compression factor of the adaptation, as a function of the prediction model, the sampling interval and the number of (tiled) images. The y–axis denotes the compression factor. A compression factor of 0.5 means that the data size is halved. The chart shows the median values and the 10– and 90–percentiles for every parameter set.

If the images are considered as a single image and therefore adapted based on the first prediction value, we note a wide variety of compression factors, ranging from 0.05 to 1. The compression factors in the experiment are distributed over the whole range. We therefore conclude that adapting the data based on a single value is a bad idea. Although we noted in Section 4.2 that a good predictor achieves a low average prediction error of 5%, there is no guarantee that a single value is within this range. It is interesting to note that the wide distribution of values holds for all prediction models and for all sampling intervals.


The variations get smaller with an increasing number of images. Already the possibility to consider the data as 2 images reduces the variation significantly. The difference in the variation for 10, 50 and 100 images is so small that we consider it to be in the range of noise.

A second observation is that there is a slight tendency towards a lower average compression factor as the number of images increases. We find one reason for this decrease by looking at the raw data of the prediction and the adaptation factor of the images. A bandwidth misprediction may either over– or underestimate the available bandwidth. The probability of over– and underestimating the bandwidth is the same. However, the effects of the mispredictions are not equal: the impact of a bandwidth overestimation is limited because the compression factor has an upper bound of 1, whereas there is no corresponding lower limit for an underestimation – in the worst case, the factor is even set to 0. This inequality grows with the number of adaptation decisions that have to be cast; hence, the average compression for 100 images is smaller than for 10 images.

Third, Figure 6.7 shows that the number of images is the dominant factor that influences the compression factor for a small number of images (1 or 2 images in our experiment). For 10 or more images, in contrast, the prediction model and the sampling interval are more important. The lower sampling frequencies at 0.2 and 1.0 seconds have almost the same average compression factor, and their result is almost independent of the prediction model in our experiment. The highest sampling frequency (0.1 seconds) shows a larger variation with a slightly lower average value.

The compression factor alone does not determine the quality of the adaptation. A compression factor that is too high risks reducing the image too little for the given bandwidth, so that the time limit is violated. On the other hand, a compression factor that is too low may lead to an underusage of the time limit and thus of the available resources. For our experiment, we know that the target average compression factor to meet the time limit is 0.5. Every compression factor in Figure 6.7 that is above 0.5 therefore violates the time limit, whereas a lower value underuses the available resources. The over– or underusage increases with the distance to the target value of 0.5. The conclusion is therefore that treating the data as a single image leads to a drastic over– or underusage of the available resources, whereas an adaptation of 10 or 100 images is able to closely meet the time limit.

6.5.2 Small time limit

Reducing the time limit T within which the application must send the data increases the pressure on the application in that a misprediction (especially a too optimistic one) has a more severe impact on the adaptation quality. With a shorter time limit, the application has less time left to compensate a data transmission that exceeds the predicted time.

Figure 6.8 shows the average compression factor of the adaptation, as a function of the prediction model, the sampling interval and the number of (tiled) images. The y–axis denotes the compression factor on a logarithmic scale. The chart shows the median values and the 90–percentile for every parameter set (the 10–percentile is omitted because it is around 0.001 for all experiments). The target average compression factor is 0.005. Such a strong compression may be necessary when data is transmitted over a modem line.

[Figure: bar chart; x–axis: sampling interval (0.1 s, 0.2 s, 1.0 s) / number of images (1, 2, 10, 50, 100) / prediction model (AR1, AR16, BM1, BM10); y–axis: compression factor, 0.001 to 1, logarithmic.]

Figure 6.8: Average compression factor, as a function of the prediction model, the sampling interval and the number of images.

The results in Figure 6.8 are similar to those in Figure 6.7 in that the compression factor is reduced with an increasing number of images. Note that the average compression factor for 1 image is permanently too high, and its 90–percentile even reaches a value of 1. The average compression factor is between 0.01 and 0.1 for 10 images or more.

6.5.3 Variation

An important factor in the analysis of the prediction models has been the coefficient of variation, COV, which models the variation of the samples relative to their median value. The COV also has an influence on the adaptation behavior of the collaborative application. There is randomness in the results of the previous analysis in that the adaptation may always have gotten a too pessimistic or too optimistic bandwidth prediction. To show the effects of this randomness, we have run the same experiment again, but varied the


[Figure: compression factor (y–axis, 0 to 1) for every combination of number of images (1, 2, 10, 50, 100), sampling interval (0.1, 0.2, 1 s) and prediction model (AR1, AR16, BM1, BM10); points are medians, bars the 10– and 90–percentiles.]

Figure 6.9: Compression factor variation.

starting point within the trace. Figure 6.9 shows the compression factor as a function of the number of images, the sampling interval and the prediction model. The points in the graph are the median values of the compression factors for the runs with the different starting points. The bars denote the 10– and 90–percentiles. The wide distribution of values for a single image shows how dangerous it is to send a large image without tiling: it is almost impossible to predict the result of such an adaptation. In contrast, the results for 10 and more images show a small variation. As a consequence, we can state that the results of the previous analysis are independent of where the prediction starts in a trace. This conclusion is important for an application for two reasons: first, it allows starting the bandwidth prediction at any time, e.g., also before the data should be transmitted, and second, it gives a certain guarantee to the application that the result will be similar independent of when exactly the bandwidth prediction is queried.

6.5.4 Summary

Using a good prediction model and a suited sampling interval is essential for a collaborative application to determine the compression factor and, in consequence, to use the available bandwidth as well as possible while at the same time meeting the imposed time limit. However, apart from these network parameters, an application also has a chance to


influence the result with parameters that are purely application specific. Splitting the amount of data to be sent into multiple chunks allows an application to repeatedly adapt data to the latest changes in the available bandwidth and, if necessary, to correct previous prediction errors. This evaluation has now shown that the time limit can be met in more cases and with a greater accuracy.

This analysis has also shown that the combination of networking and application parameters opens a wide range of parameter combinations which may be related among themselves and mutually influence each other. The number of images influences the sampling frequency, which influences the variability of the resulting time series, which in turn influences the prediction result. We have shown that not every combination is equally well suited for a given application. Although we have not been able to study all the details of the parameters, we have at least shown the dependency of these parameters among themselves. It becomes clear that only a solution which is able to influence all parameters and select the correct combination (e.g., of prediction models) is able to achieve the best result.

6.6 Summary

The discussion of the design of the collaborative application shows that the Octopus framework is able to support the building of topology–aware applications. The framework allows the application to easily implement a sending mechanism for multiple data types via proxies. Additional functionality beyond data forwarding can be installed on these proxies, e.g., to adapt the data.

A second opportunity for the collaborative application is that data can be sent and adapted using different strategies. These strategies also depend on application–layer metrics, such as the data type or the data size. Taking these application parameters into account has so far been prevented by the separation of networks and applications.

The implementation of the collaborative application has shown that the abstractions of the Octopus framework are sufficient to efficiently support the application needs. The customization of the framework is limited to a few definitions and calls into the Octopus framework. This fact shows that the customization of Octopus is easy. As a consequence, new applications as well as legacy applications can easily make use of the functionality provided by Octopus.

The topology–aware data delivery in Octopus opens new possibilities to better use the available resources of a network. While a bandwidth–aware application only has two strategies to adapt the data, a topology–aware application has five. But it is especially the number of possible options for each strategy that provides the application with a wide range of options to transmit the data, i.e., the strategy to stream the data via an alternative path opens a number of options that equals the number of alternative paths that are taken into account.


The analysis with real Internet traces shows that an application can profit from using these choices. Alternative paths often provide a higher available bandwidth than the default routing path. By selecting the path with the highest bandwidth, it is possible to reduce the delivery time of large data items significantly.

However, the collaborative application also shows that the benefit of using alternative paths highly depends on application parameters, such as the data size. For small data sizes, e.g., for text or small images, other paths are better than for large amounts of data. The Octopus framework is able to support an application in the decision over which path to send the data.

Another important strategy for a topology–aware application is to adapt the data inside the network. This strategy is especially useful when the server, which adapts the data when only bandwidth–awareness is available, is in danger of becoming a bottleneck. The main task of the server should be to answer client requests as fast as possible. When the load of processing these requests becomes too large, a topology–aware server may shed the load of the data adaptation onto the proxies.

The possibility of sending a large amount of data in separate parts that can be adapted individually allows the collaborative application to send the data within a given time limit. Sending the whole amount at once and using a prediction only once makes the data delivery susceptible to mispredictions. By splitting the data into smaller parts, a misprediction can be compensated at the next adaptation point. However, the evaluation also shows that there are upper bounds to the data separation. First, in a real system, each adaptation point requires bandwidth predictions and the adaptation of the next data part. This overhead stresses every participating system and may lead to an overload of the resources. Second, an increase in the adaptation frequency also increases the probability of hitting a mispredicted bandwidth value. For these reasons, a trade–off has to be found for the number of data parts that should be used.


7 A topology–aware video application

This chapter evaluates the concept of topology–awareness with an adaptive MPEG–1 application. The evaluation in this chapter is different from the previous one for the following reasons (these properties may not hold in every case, but they do in general):

- A video application runs for a longer period than image–based applications. The video access may take several minutes up to hours.

- Adaptation points (possible points where an application can adapt the data) are more frequent for MPEG–1 movies (frames) than for images.

- Videos require a higher bandwidth over a longer period of time.

These properties are important because they imply a different use of topology–awareness. A longer run time, e.g., allows an application to spend more time for server selection or location selection because it has a longer time span to compensate the overhead. The frequency of adaptation intervals has an influence on the information needed from the network layer. Finally, the high need for bandwidth implies optimizing strategies for the data delivery, such as multicast. For these reasons we consider an evaluation with a video application a necessary and important extension to the evaluation in the previous chapter.

This chapter is organized as follows. Section 7.1 first describes the original design and implementation of the MTP MPEG–1 application. Second, it describes the extensions made to make MTP topology–aware. We call this topology–aware version of MTP Medusa. The new name allows us to clearly distinguish between the old, adaptive implementation and the new, topology–aware version. Section 7.2 describes the server selection scenario, using the information provided by Remos. The evaluation of the Octopus node selection, as described in Chapters 4 and 5, is addressed in Section 7.3. Section 7.4 describes the idea of using multiple paths to stream video data from a source to one or several destinations. Finally, Section 7.5 describes how a topology–aware application can dynamically switch from one connection to an alternative connection.



7.1 Medusa: design, implementation and integration of an adaptive MPEG–1 application

The MPEG–1 standard is designed for transmission rates between 1 and 1.5 Mbps. Such transmission rates can be expected from today's Internet. However, there are also times when the network is congested. One possibility to address this situation is to avoid it by reserving resources or providing quality of service (QoS) guarantees [63, 2, 13]. However, making bandwidth reservations for an application is very hard to achieve in the Internet.

A second possibility is to download the video before watching it. During the download, the bandwidth may fluctuate arbitrarily. However, at least two restrictions are obvious here. First, there is a limited amount of space on the client machines, limiting the amount of data that can be downloaded. Second, the user must wait for the video until the download is complete. Not all users are ready to accept this waiting period. In addition, this model is not well suited for real–time transmissions, such as the streaming of news or sports events.

Yet another alternative is to accept the simple best–effort service provided by the Internet and adapt the data stream whenever necessary. An MPEG–1 stream can be adapted by dropping selected frames of low priority. Adaptation is suited for real–time video streaming. The problems and solutions discussed in this chapter focus on such a real–time streaming scenario. MTP is such an adaptive MPEG–1 video streaming application. It was designed and implemented by Hemy et al. [43]. The application consists of three parts: a video server, a client, and an adaptive frame–dropping filter. The parts can run on three different hosts, connected via application–layer connections.

We use the MTP application as a second application to show the use and the benefits of the concept of topology–awareness. We therefore extend the original MTP application by several mechanisms that are described in this chapter. We call the resulting topology–aware MPEG–1 application Medusa.

7.1.1 Medusa components and integration with Octopus

[Figure: a Medusa setup – the server sends the MPEG stream to a filter running on an Octopus node, which forwards the filtered MPEG stream to the client; the setup is described by a URL of the form medusa://filter,server/video.mpg.]

Figure 7.1: The main components of Medusa: server, filter and client. Setup of Medusa.


The design of MTP distinguishes three independent components, as depicted in Figure 7.1: the server, the filter, and the client. The MTP filter is responsible for the adaptation. The figure shows that the connection between the server and the filter must provide enough capacity to transmit the MPEG stream unadapted, i.e., the filter must be placed before a bottleneck. It therefore becomes clear that the (correct) placement of the filter has an important influence on the quality of the video stream.

In MTP, the client starts the download by specifying a URL. This URL may contain a list of hosts, separated by commas, e.g.:

mtp://filter.epfl.ch,server.cmu.edu/video.mpg

The first entries in this list are filter hosts whereas the last entry is the server.

Medusa adopts the three–component design of MTP. However, Medusa adds the features of topology discovery and Octopus node selection to MTP. That is, in MTP, a user is required to enter the URL by hand. In contrast, Medusa is able to detect intermediate hosts where a filter can be instantiated automatically. The user is only required to enter the server host in the URL, e.g.

medusa://server.cmu.edu/video.mpg

This URL is then passed to the management layer, which gathers the topology information and evaluates the graph to determine whether a filter is needed, given the current resource usage. If so, it selects the best Octopus node and adds it to the URL. The resulting URL is then returned to the application, which can start the download.
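Splitting such a URL into filter hosts and server is straightforward; a sketch (class and method names are ours, not taken from the Medusa sources):

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of parsing a Medusa/MTP-style URL of the form
 *  scheme://filter1,...,server/path: all hosts before the last one are
 *  filters, the last one is the server. Names are assumptions. */
public class HostListUrl {
    final List<String> filters;
    final String server;
    final String path;

    HostListUrl(List<String> filters, String server, String path) {
        this.filters = filters; this.server = server; this.path = path;
    }

    static HostListUrl parse(String url) {
        int scheme = url.indexOf("://");
        int slash = url.indexOf('/', scheme + 3);
        String[] hosts = url.substring(scheme + 3, slash).split(",");
        return new HostListUrl(
            Arrays.asList(hosts).subList(0, hosts.length - 1), // filters
            hosts[hosts.length - 1],                           // server
            url.substring(slash));                             // /video.mpg
    }

    public static void main(String[] args) {
        HostListUrl u = parse("mtp://filter.epfl.ch,server.cmu.edu/video.mpg");
        System.out.println(u.filters + " -> " + u.server + u.path);
    }
}
```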

The comparison of the two ways in which the URL is created shows that very few changes to the MTP code are necessary to make this application topology–aware. In fact, only 3 lines of code have been added. Two lines are needed to specify the Evaluator and the Selector (since Medusa can use standard implementations available at the management layer, no code has to be written here). Finally, the third line calls the topology system to substitute the URL:

newUrl = Manager.replaceUrl(oldUrl);

Additional code is necessary when the application should react to resource changes at run time. Such mechanisms are not foreseen in MTP. A handoff, e.g., requires that an impending congestion on the actual path is detected. For the detection, two design variants are possible: the network may detect the congestion when it measures the bandwidth, or the application can detect it (Medusa measures the packet loss at the application layer; an increased loss hints at a congestion).

Which alternative is used depends on the availability of mechanisms to detect such a situation. Quite often, only one of the two is available or, if both are available, only one is used (e.g., because measurements are expensive). From a system design point of view, the application measurement has the advantage that the interface between the application and the topology system does not change: if congestion is detected, the application calls the topology system to replace the URL. If the network detects the congestion, an additional mechanism is needed that allows the network to inform the application. Such a callback could easily be integrated into the Octopus framework.
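The application-layer detection can be sketched as a loss monitor over the packet sequence numbers: once the loss ratio within a window of packets exceeds a threshold, the client treats this as a congestion hint. Window size, threshold and all names here are our assumptions:

```java
/** Sketch: application-layer congestion detection from packet sequence
 *  numbers. Window size, threshold and names are assumptions. */
public class LossMonitor {
    private final int window;        // packets per measurement window
    private final double threshold;  // loss ratio that signals congestion
    private long expectedNext = -1;  // next sequence number we expect
    private int received, lost;

    LossMonitor(int window, double threshold) {
        this.window = window; this.threshold = threshold;
    }

    /** Feed one received sequence number; returns true when the loss
     *  ratio in the completed window exceeds the threshold. */
    boolean onPacket(long seq) {
        if (expectedNext >= 0 && seq > expectedNext)
            lost += seq - expectedNext;      // gap => packets missing
        expectedNext = seq + 1;
        received++;
        if (received + lost < window) return false;
        double ratio = (double) lost / (received + lost);
        received = 0; lost = 0;              // start a new window
        return ratio > threshold;
    }
}
```

On a true return, Medusa would invoke the topology system, e.g. newUrl = Manager.replaceUrl(oldUrl), to obtain a path around the congested link.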


[Figure: the filtering pipeline – the input stream (MPEG frames in UDP packets) is buffered; for each frame the filter determines the frame type, evaluates the frame and either drops it or keeps it; kept data is packetized, possibly delayed and sent as UDP packets on the output stream.]

Figure 7.2: Frame–dropping in the Medusa filter.

In general, however, we can see that Medusa has a very lean and clean interface to the topology system. The framework might be customized, but with Medusa we can show that the framework can also be used as is, because Octopus provides implementations for all abstractions. Such a use of the framework is denoted as black–box use. In contrast, the application shown in Chapter 6 showed a white–box use of Octopus. The black–box use is important because it shows how easily legacy applications can be integrated into Octopus.

7.1.2 Adaptive filtering in Medusa

The filter can adapt to changes in the availability of network resources by selectively dropping single MPEG frames. Figure 7.2 shows this filtering process. The first step is the reading of a packet from the input stream, i.e., from the network. The filter then determines whether the packet should be copied to the output queue, depending on the content of the packet (the frame type) and the current capacity of the outgoing network link. If the packet has a special content, such as audio or a system part, it is always copied to the outgoing packet queue. If the stream contains a video frame, the filter determines the type of the frame. MPEG–1 distinguishes between three video frame types: I–frames are independent frames and have the highest priority. P–frames depend on the nearest I–frames and are therefore only displayed when the corresponding I–frame has been transmitted. Finally, B–frames depend on P– and I–frames and have the lowest priority. A typical MPEG–1 stream layout may look like IBBPBBPBBPBBPBB. This sequence of frames is also called a group of pictures, GOP. Other GOP layouts are possible as well. One sample video used for the evaluation, e.g., consists of 68 GOPs, containing 68 I–frames with an average size of 8.9 KB, 272 P–frames of 3.3 KB and 676 B–frames of 1.5 KB each. To achieve a (typically desired) frame rate of 24 frames per second, the I–frames require about 200 KBps, the P–frames 320 KBps, and the B–frames 360 KBps.

Having determined the frame type, the filter compares it against the current filtering rate (denoted "evaluate frame" in Figure 7.2). The filtering rate is based on client feedback. Every packet that is sent to the client contains a unique sequence number. If the client notices that packets are lost, it sends a message to the filter to increase the filtering level, thereby reducing the outgoing stream. (Uncontrolled) packet loss is worse than the (controlled) dropping of frames because a packet may contain several frames (which all have to be invalidated) or a packet may contain a (high priority) I–frame. A lost I–frame automatically leads to an invalidation of all P– and B–frames in the same GOP. Details about the client feedback can be found in the corresponding papers (e.g., [43]). As a result of the filter threshold, the frame is either dropped or copied to the output queue. The data in the output queue is accumulated until enough data is ready to fill a network packet (of the size of the MTU). This packetization allows for an optimized use of the network bandwidth. Before sending a packet, it may be delayed to ensure the frame rate required by the video.
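The priority order described above (audio/system always, then I before P before B) can be sketched as a drop decision; the three discrete filter levels are a simplification of the finer-grained filtering rate that MTP actually maintains:

```java
/** Sketch of the frame-type priority used by the filter: B-frames are
 *  dropped first, then P-frames; I-frames and non-video content always
 *  pass. The discrete levels are a simplification of MTP's rate. */
public class FrameFilter {
    enum FrameType { I, P, B, AUDIO, SYSTEM }

    /** level 0: pass everything; 1: drop B; 2: drop B and P. */
    static boolean keep(FrameType t, int level) {
        switch (t) {
            case AUDIO:
            case SYSTEM: return true;   // special content always passes
            case I:      return true;   // independent, highest priority
            case P:      return level < 2;
            case B:      return level < 1;
            default:     return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(keep(FrameType.B, 1)); // B-frames dropped at level 1
    }
}
```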

The separation of the filtering process from server and client allows an easy integration of Medusa into Octopus because a filter can be instantiated on an Octopus node. It therefore allows us to investigate how Medusa can profit from the information provided by application–aware networks. The question of where a filter should be instantiated in a network, e.g., has not been addressed in previous work.

On the other hand, Medusa has been designed without the idea of topology–awareness. The feedback mechanism, e.g., uses information gathered by the client, i.e., at the application layer. An important question is therefore whether the adaptation can be improved with information from the network or whether the client feedback is good enough for this kind of application.

7.1.3 Multicast in Medusa

Concept and implementation

Although the structure of MTP is already suited for topology–awareness, we extended its functionality for multicast. Since multicast is a well–known technique, we concentrate on the implementation of the multicasting and a brief evaluation.

The multicast filter for Medusa implements a combined adaptive/multicast filter. Figure 7.3 shows its design. Compared to the original MTP design, two parts had to be extended. First, the buffer is extended for the access of multiple threads, one per (adaptive) connection. This extension was straightforward. Second, the protocol between the client and the filter had to be slightly changed to support multicasting. A filter automatically recognizes a multicast scenario when multiple clients contact it with the same URL. Some commands from the client to the filter, such as closing a connection, had to be changed for a correct multicast behavior. Details can be found in the project description [31].
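The extension of the buffer for multiple reader threads can be sketched as a shared packet list with one cursor per client connection; the names and the coarse locking are our assumptions, and the real filter additionally adapts each outgoing stream:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the shared buffer behind the multicast filter: the stream
 *  from the server is read once, and every client connection keeps its
 *  own cursor into it. Names and locking granularity are assumptions. */
public class MulticastBuffer {
    private final List<byte[]> packets = new ArrayList<>();

    /** Called by the single thread reading from the server. */
    synchronized void append(byte[] packet) { packets.add(packet); }

    /** One cursor per client connection; each client thread polls it. */
    class Cursor {
        private int next = 0;
        /** Next packet for this client, or null if none is buffered yet. */
        byte[] poll() {
            synchronized (MulticastBuffer.this) {
                return next < packets.size() ? packets.get(next++) : null;
            }
        }
    }

    Cursor newClient() { return new Cursor(); }
}
```

Each client thread polls its own cursor, so all clients see the full stream although it is read from the server only once.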


[Figure: the multicast filter – a single MPEG stream from the server is read into a shared buffer; per client, the data is filtered, packetized and sent as UDP packets, while each client plays from its own buffer and returns feedback to the filter.]

Figure 7.3: Multicast filter for Medusa.

Multicast performance

Multicasting a stream in combination with adaptation increases the stress on a host. The scalability of the filter, i.e., how many multicast streams can be supported on a single host, is an important factor for the scalability and the deployment of multicast for Medusa. For this investigation, a 350 MHz Pentium II host with 128 MB RAM running Windows 2000 is used. Figure 7.4(b) shows the load on the system, as a function of the number of clients (on the x–axis). The graph compares the load when multiple instances of the original MTP filter are run to the load imposed by a single Medusa multicast filter with the corresponding number of clients to be served. The Medusa implementation causes a slightly higher load than the multiple instances of the MTP filter. This result is astonishing because running multiple instances of the MTP filter implies that multiple Java VMs are running, which should cause a higher load. In addition, some tasks, such as reading the incoming data stream (from the server), can be shared and should therefore produce a lower load. One possible reason for this higher load may be that the implementation of the Medusa filter instantiates a larger number of threads than the original MTP filter. No further investigation has been made here. A second observation is that the system load increases almost linearly with the number of clients, from 5% for 1 outgoing connection to 20% for 8. There is no reason to believe that the increase will slow down for more clients. We therefore conclude that there is an upper limit on the scalability of this filter. Running a Medusa filter for up to 10 clients is well possible. However, note that the hardware used for this experiment is no longer up to the speed of today's machines, so that we expect a far better scalability if the experiment were run on current state–of–the–art hosts.

[Figure: two panels comparing multiple MTP filter instances with a single Medusa filter, as a function of the number of connections (1–8): (a) memory usage [KB], (b) system load [%].]

Figure 7.4: Comparison of resource usage of MTP and Medusa filters.

Figure 7.4(a) shows the memory usage in KBytes. The Medusa filter is again compared to multiple instances of the original MTP filter. The graph shows that Medusa scales better to a larger number of connections. For 1 and 2 clients, an overhead in memory has to be paid, but then the memory usage of Medusa falls below that of MTP. The difference could be accentuated by increasing the size of the buffer for the incoming data stream. This buffer is always replicated for multiple instances of MTP, but can be shared by Medusa filters. Also note that the memory usage is rather low: only 4 MBytes are needed for 8 outgoing connections.

We conclude from this evaluation that the implementation of a multicast Medusa filter is able to support up to 10 connections without overloading a slow host. Depending on the hardware of the host (CPU, memory), a larger number can be supported as well.

7.1.4 Trace modulation

The remainder of this chapter evaluates the adaptation and streaming behavior of Medusa in the Internet. Running experiments in the Internet is difficult because the resource behavior cannot be controlled and reproduced. We therefore use a technique called trace modulation [57], which emulates a slower trace over a faster network. We have implemented this emulation in a special socket class. This emulation socket allows the sending of a specified amount of data per time interval. The amount of data may follow a regular pattern (e.g., constant, linearly increasing/decreasing, sinusoidal) or it may be steered by Internet traces. The latter allows the emulation of the Internet bandwidth and its fluctuations in a reproducible way. We distinguish between two different emulation socket types: reliable and non–reliable sockets. These socket types differ in the way they treat data that cannot be sent within a given time interval. The non–reliable emulation socket discards data that exceeds the capacity of the emulated link. In contrast, the reliable socket sends the data, but delays the delivery as long as the trace requires.
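
The per-interval budgeting described above can be sketched as follows. This is an illustrative reconstruction, not the dissertation's actual socket class: the class and method names (`EmulationSocket`, `tick`, `send`) and the byte-counting details are assumptions; the real implementation sits behind the regular socket API.

```python
class EmulationSocket:
    """Sketch of a trace-driven emulation socket (names are illustrative).

    Each interval (e.g. 100 ms) allows a byte budget taken from a trace.
    A non-reliable socket drops data that exceeds the interval budget;
    a reliable socket keeps a backlog and delivers it in later intervals.
    """

    def __init__(self, trace_bytes_per_interval, reliable=False):
        self.trace = iter(trace_bytes_per_interval)
        self.reliable = reliable
        self.budget = 0        # bytes still sendable in the current interval
        self.backlog = []      # delayed data (reliable mode only)
        self.sent = self.dropped = 0

    def tick(self):
        """Start a new time interval: refill the budget from the trace."""
        self.budget = next(self.trace, 0)
        if self.reliable:
            pending, self.backlog = self.backlog, []
            for chunk in pending:          # retry delayed data first
                self.send(chunk)

    def send(self, data):
        if len(data) <= self.budget:
            self.budget -= len(data)
            self.sent += len(data)         # would go to the real socket here
        elif self.reliable:
            self.backlog.append(data)      # delay delivery, as the trace requires
        else:
            self.dropped += len(data)      # exceeds emulated link capacity

# Emulate three intervals of a fluctuating trace (bytes per interval).
sock = EmulationSocket([5000, 2000, 5000], reliable=False)
for _ in range(3):
    sock.tick()
    sock.send(b"x" * 3000)
    sock.send(b"x" * 3000)
print(sock.sent, sock.dropped)  # → 6000 12000
```

The same driver with `reliable=True` would drop nothing and instead carry the excess data into later intervals, which is exactly the distinction the two socket types make.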

Although a trace emulation typically provides the same bandwidth as a real trace, there is no absolute correspondence between the two. A UDP connection in the Internet, e.g., drops application packets randomly in an MPEG stream. In contrast, the non–reliable socket has a regular pattern: it always transmits the first packets of a stream up to the moment where the bandwidth capacity is reached and then drops the remaining packets until the end of the time frame. The difference between the real trace and the emulated trace grows with the emulated time interval. We therefore typically use a time interval of 100 ms. We have compared different time intervals and noticed that 100 ms is well suited for our evaluation: the interval is large enough to maintain a certain transmission regularity, but it is small enough to keep the differences between real and emulated streams small.

Server location    average bandwidth    standard deviation
ETH                63.1                 5.61
EPFL                3.03                0.17
CMU                 0.50                0.28
UVA                 0.37                0.28
UC                  0.18                0.07

Table 7.1: Server location, the available bandwidth and the standard deviation, measured by Remos.

7.2 Server Selection

Server selection is a well–known concept by which a client can choose among replicated servers from which to download its data. Server selection is a relatively simple scenario for topology–awareness because the information about the network topology that the client needs is limited. This simplicity, however, also makes server selection an easy–to–use and cheap mechanism.

This section describes how the topology–aware Medusa application uses the information from the network. The scenario for this experiment is as follows: a Medusa client wants to access a live video stream and display it in real time. The client has a choice of several sources that it can connect to. These servers are listed in Table 7.1.

Before the download, the Medusa client issues a query to the management layer. The query contains the list of available servers and the metric that should be used for the evaluation, in this case the bandwidth. The management layer retrieves the value from the database or triggers Remos to take a new measurement. The topology resulting from this kind of query is very simple: the client acts as the root of a tree, and the servers are its leaves. This simple topology, without intermediate nodes, makes server selection easy to use. The evaluator and the selector build a ranked list of the servers, based on the bandwidth measurements. The client selects the first server from the list and downloads the video. In this experiment, the video is not only downloaded from the first server, but from all servers sequentially, in the order of their ranking. This experiment is run several times within 24 hours with different movies.
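
The ranking step is simple enough to sketch. The following is an illustration only: the function name is hypothetical, and the assumption that the Table 7.1 values are in Mbps is ours (the table does not state units).

```python
def rank_servers(measurements):
    """Rank candidate servers by measured available bandwidth.

    `measurements` maps server name -> bandwidth reported by the
    management layer (a cached value or a fresh Remos measurement).
    Returns server names, best first.
    """
    return sorted(measurements, key=measurements.get, reverse=True)

# Average bandwidths from Table 7.1, assumed to be Mbps, as an illustration.
remos = {"ETH": 63.1, "EPFL": 3.03, "CMU": 0.50, "UVA": 0.37, "UC": 0.18}
ranking = rank_servers(remos)
best = ranking[0]    # the client connects to this server first
print(ranking)       # → ['ETH', 'EPFL', 'CMU', 'UVA', 'UC']
```

In the experiment the client then walks the whole ranked list, downloading from each server in turn, so that the quality obtained from the picked server can be compared against the alternatives.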


[Figure: number of correctly received frames (0–3500) for experiments 1–21, for the best, 2nd best and worst server; the picked server is marked by a large circle.]

Figure 7.5: Server picked according to the measured bandwidth (large circle) and thenumber of correctly received frames in the following download.

This experiment serves two purposes. First, it should confirm that server selection works with a real application in the Internet. Did the selected server provide the best performance? Is the assumption that Medusa is bandwidth–bound true? The problem in the investigation of these questions is that the time between the measurement/selection and the video download is not the same for all servers: the first server is contacted directly after the selection, and every subsequent server must wait until the previous server is done.

A second goal is to compare the measured bandwidth to the bandwidth the application experiences. The filter adapts the stream to the available bandwidth by reducing the amount of data to be sent in times of congestion and by increasing the amount when more bandwidth becomes available. The bandwidth measured by Remos before the transmission can therefore be compared to the application–perceived bandwidth when the video is downloaded.

The setup of this experiment is as follows: the video client is located at ETH. Servers from which the videos can be downloaded are placed at different locations in Europe and the U.S. (see Table 7.1). The table shows that the bandwidth to the local server at ETH is an order of magnitude higher than to EPFL, which in turn is an order of magnitude larger than to all other sources. The adaptive filter is run on the same host as the server to adapt the outgoing data stream.

Figure 7.5 shows the results of the server selection. The y–axis denotes the number of correctly received frames for each experiment. This metric is used to compare the quality of the received videos. The various experiments are shown on the x–axis. The individual points show the ranking of the servers, based on the Remos measurements. The servers at ETH and EPFL are excluded because their ranking never changes and because all data can always be transmitted due to the high available bandwidth. The server that is selected first (excluding ETH and EPFL) is indicated by a large circle. In 90% of the selections, the selected server performs better than the others. Only in experiments 16 and 21 is the selected server not the best one. A detailed view of the log files shows that the server sent only about half of the packets, most likely due to a high load on the server.

[Figure: bandwidth [Mbps] (0–1) over time [sec] (0–35), measured by the application and averaged over 1, 2 and 10 second intervals, for a local and a remote server, together with the bandwidth reported by Remos.]

Figure 7.6: The bandwidth measured by the application, averaged over different time intervals, and the bandwidth reported by Remos.

The results show that the available bandwidth corresponds well to the application–perceived quality. However, the two wrong picks indicate that the bandwidth alone does not guarantee a good video download. Other parameters may influence the download as well and should be taken into account. The work by Dinda [27], e.g., can be used to address this issue.

Figure 7.6 addresses the second question: how well do the bandwidth measurement and the application–perceived quality correspond? The figure shows two experiments in detail, one accessing a local server and one accessing a remote server (UVA). Each packet that arrives at the client is time–stamped, and the application–perceived bandwidth is calculated as the average over 3 different time intervals: 1, 2, and 10 seconds.

The download from the local server is not limited by the bandwidth. The server therefore sends the complete video at the speed that maintains the required frame rate. The bandwidth fluctuates between 500 and 600 Kbps. These small fluctuations can be caused by many factors, such as different bandwidth requirements due to varying movie content (the size of the frames depends on the content) or temporary server load fluctuations. However, all three sampled bandwidth lines are close to each other.

For the remote server experiment, the bandwidth measured by Remos is 0.15 Mbps, as shown by the straight line. This line corresponds well to the bandwidth measured by the application if it is averaged over a large interval. The 10 second interval corresponds to the time interval that Remos uses to measure the available bandwidth. Calculating the average over smaller intervals reveals higher fluctuations in the network. These fluctuations are visible at 1 or 2 seconds. This fact leads to several conclusions. First, the sampling interval has an important influence on the application behavior and the application performance. Second, the choice of the sampling interval depends on the adaptation granularity and the application preferences. As shown in this figure, the application is able to react to changes in the order of seconds. Choosing a sampling interval of 10 seconds is too coarse for this application and slows down the reaction of the application. A sampling interval of 10 seconds leads to an underutilization of the available bandwidth in this part of the figure. In other parts, the bandwidth estimation may be too high, leading to lost packets.
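
The averaging behind Figure 7.6 can be sketched as follows. This is a minimal illustration of binning time-stamped packets into fixed windows; the function name and the synthetic trace are our assumptions, not the dissertation's measurement code.

```python
def sampled_bandwidth(packets, window):
    """Average bandwidth per window from (timestamp_sec, size_bytes) pairs.

    Returns one Mbps value per `window`-second interval, mirroring the
    1/2/10 second averaging used for Figure 7.6: larger windows smooth
    out fluctuations, smaller windows expose them.
    """
    if not packets:
        return []
    start = packets[0][0]
    nbins = int((packets[-1][0] - start) // window) + 1
    bins = [0] * nbins
    for t, size in packets:
        bins[int((t - start) // window)] += size
    return [8 * b / window / 1e6 for b in bins]   # bytes -> Mbps

# Synthetic trace: 1 KB packets every 10 ms for 4 seconds (~0.8 Mbps).
trace = [(i * 0.01, 1000) for i in range(400)]
print(sampled_bandwidth(trace, 2.0))   # → [0.8, 0.8]
```

With a real Internet trace the 10-second average would track the Remos value, while the 1- and 2-second averages would show the short-term fluctuations discussed above.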

Nevertheless, the overall experiment shows that Medusa can use the information about the topology for server selection. A correct server selection can improve the quality of the video considerably. In some cases, the number of correctly received frames is one third higher than for the second best server. This dynamic server selection is also most likely to outperform any server selection by a user, as it is often done today.

7.3 Dynamic placement of filters on Octopus nodes

The correct placement of a Medusa filter is important for the received video quality at the client. Section 4.1 and Chapter 5 described the necessary algorithms to place and select the locations for the instantiation of the filters. This section evaluates these methods in the context of the video application. To evaluate a potential benefit of topology–aware applications, the overhead of the Octopus location selection must be weighed against the benefit of the improved application quality achieved by the correct selection of the Octopus node placement. The cost of the selection has been described in Section 4.1.5. This section therefore assesses the potential benefit for the application.

7.3.1 Video streaming with two parties

A first part of the evaluation focuses on the differences in the received video quality, depending on the placement of adaptive and multicast filters. For this evaluation, the following scenario is set up: a video stream should be multicast from ETH to two clients at CMU. The network topology between the server and the clients consists of Octopus nodes at both ETH and CMU. End–systems are configured as Octopus nodes for this experiment. On these nodes, multicast and adaptive filters can be instantiated.

[Figure: the server at ETH is connected over a transatlantic link (emulated at 0.3, 0.7 or 1.0 Mbps) to CMU; client 1 is attached over a 10 Mbps LAN link, client 2 over a 0.4 Mbps wireless link; Octopus nodes are located at ETH and CMU.]

Figure 7.7: Evaluation topology for experiment 1.

setup    multicast node    adaptive node
1        ETH               ETH
2        ETH               CMU
3        CMU               ETH
4        CMU               CMU
5        dynamic           dynamic

Table 7.2: Setups for experiment 1.

The topology of this experiment is shown in Figure 7.7. The thin line between ETH and CMU denotes a low–bandwidth link; the wireless connection between the CMU cloud and client 2 is indicated by a dashed line. These bandwidths are emulated in our local network using trace modulation, as described in Section 7.1.4. The bandwidth of the transatlantic link is emulated with three different values: 1 Mbps, 0.7 Mbps and 0.3 Mbps. These are typical values found on the transatlantic link at different times throughout a day [77]. These three values are also chosen because they imply different operations from the application. The video stream requires an average bandwidth of 0.6 Mbps. A single video stream can therefore be transmitted in full quality for the two higher bandwidth values, but not for the lowest one. A second restriction is applied to the link to client 2 at CMU. This link is limited to a constant bandwidth of 0.4 Mbps, emulating the bandwidth of a wireless link. All other links are inside a LAN and have a static bandwidth of 10 Mbps. It is ensured that no packets are lost due to congestion over these links.

The experiment is run with four different configurations, resulting from the combinations of placing an adaptive and a multicast filter at either ETH or CMU. The possible setups for this topology are shown in Table 7.2. For each configuration, the number of correctly received frames at both clients is measured. Figure 7.8 shows the results for client 1. The results for client 2 are not shown in the graph because they show a smaller variation; they are only described in the text.

[Figure: received frames [%] (0–100) for setups 1–4 and the dynamic placement, for transatlantic link bandwidths of 1, 0.7 and 0.3 Mbps.]

Figure 7.8: Percentage of correctly received frames at client 1.

The figure shows the three different values for the transatlantic link on the x–axis and the number of the correctly received frames in percent on the y–axis. Note that the number of frames also corresponds to the filtering level of the adaptive filter: the first frames that are not received are the (low–priority) B–frames, etc. The first four bars correspond to the four different setups of placing the adaptive and multicast filters.
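
The priority ordering mentioned here (B-frames dropped first, I-frames kept longest) can be sketched as a simple budget-constrained selection. This is an illustrative reconstruction, not the actual Medusa filter code: the priority map, sizes, and the greedy selection are our assumptions.

```python
# Illustrative MPEG frame priorities: lower value = more important.
# B-frames depend on I- and P-frames, so they are dropped first.
PRIORITY = {"I": 0, "P": 1, "B": 2}

def filter_frames(frames, budget_bytes):
    """Keep the most important frames that fit into the byte budget,
    preserving the original stream order of the kept frames."""
    kept = set()
    used = 0
    # Consider frames in priority order (I, then P, then B) ...
    for idx, (ftype, size) in sorted(enumerate(frames),
                                     key=lambda e: PRIORITY[e[1][0]]):
        if used + size <= budget_bytes:
            kept.add(idx)
            used += size
    # ... but emit the survivors in stream order.
    return [f for i, f in enumerate(frames) if i in kept]

# One hypothetical group of pictures with (type, size_bytes) entries.
gop = [("I", 8000), ("B", 2000), ("B", 2000), ("P", 4000), ("B", 2000)]
print(filter_frames(gop, 12000))   # → [('I', 8000), ('P', 4000)]
```

At a tight budget only the I- and P-frames survive, which is why the percentage of received frames in Figure 7.8 directly reflects the filtering level.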

The first setup (black) shows the situation when both filters are located at ETH. The video stream from the server is first multicast and then individually adapted to the bandwidth of the remaining connection. Placing the multicast filter before the adaptive filter, and especially before the bottleneck, is a bad idea. The splitting of the stream lets the two streams compete for the available bandwidth on the bottleneck link. As a result, even 10% of the frames have to be dropped when there is no congestion on the transatlantic link (1.0 Mbps). This setup may seem obviously poor, but only because an experienced user can easily tell which link is the bottleneck in this simple topology. Another setup with hosts at other locations is easily found where the location of the bottleneck is not so readily identified. In such a case, even a poor setup like the one described here is entirely possible.

The situation becomes even worse if the multicast filter is placed at ETH and the adaptive filters at CMU (setup 2). The multicast filter splits the data stream before the bottleneck and congests it. The adaptive filter is placed after the bottleneck and therefore has no effect. As a consequence, packets are lost over the transatlantic link, leading to incomplete frames that have to be dropped at the client. The result is that the number of correctly received frames is very low for client 1. At worst, at 0.3 Mbps, almost no frame at all is displayed.

In the third setup, the adaptive filter is placed at ETH and the multicast filter at CMU. If the transatlantic link is the bottleneck (0.3 Mbps), this setup reaches the maximum possible number of frames that can be transmitted over the bottleneck. For higher bandwidths, however, this setup is dominated by client 2. The multicast filter at CMU is not adaptive and can only transmit as many packets as the limiting client. In this setup, client 2 limits the transmission to 0.4 Mbps. The adaptive filter at ETH adapts the stream to this rate. The number of correctly received frames at client 1 is therefore much smaller than the number that could have been transmitted. A solution to this problem would be to additionally install an adaptive filter for client 2 at CMU.

In setup 4, all filters are placed at CMU. No adaptation is performed over the shared transatlantic link. When enough bandwidth is available (1.0 and 0.7 Mbps), no adaptation is requested, and both clients receive the full number of frames. However, once the bandwidth drops below the threshold of 0.6 Mbps, the same effect as in setup 2 occurs: many packets are dropped and no frame is received correctly.

The last setup shows the results of the dynamic location selection performed by Octopus. Octopus evaluates the topology, depending on the available bandwidth, and selects the place for multicast and adaptive filters. The algorithm always places the multicast filter at CMU because CMU is the last point where the two streams to the clients separate. This strategy avoids congestion on the transatlantic link. The placement of the adaptive filter, in contrast, depends on the bottleneck. If the transatlantic link is the bottleneck (0.3 Mbps), the adaptive filter is placed at ETH. For the other values of the transatlantic link, the adaptive filter is placed at CMU, to adapt the stream for client 2.
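
The decision rule just described can be condensed into a small sketch for the Figure 7.7 topology. This is not the Octopus selection algorithm itself (which evaluates a general topology graph, Chapter 5); the function and its hard-coded two-node structure are assumptions for illustration.

```python
def place_filters(transatlantic_bw, client2_bw, stream_bw=0.6):
    """Illustrative placement heuristic for the Figure 7.7 topology (Mbps).

    The multicast filter goes to the last node where the two client
    streams still share a path (CMU), avoiding duplicate traffic on the
    transatlantic link.  The adaptive filter goes just before the
    tightest link: ETH if the transatlantic link is the bottleneck,
    otherwise CMU, where it adapts the stream for client 2.
    """
    multicast_node = "CMU"   # last branch point before the streams separate
    if transatlantic_bw < stream_bw and transatlantic_bw < client2_bw:
        adaptive_node = "ETH"
    else:
        adaptive_node = "CMU"
    return multicast_node, adaptive_node

print(place_filters(0.3, 0.4))   # → ('CMU', 'ETH')
print(place_filters(1.0, 0.4))   # → ('CMU', 'CMU')
```

The two sample calls reproduce the choices reported above for the 0.3 Mbps and 1.0 Mbps transatlantic settings.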

This experiment shows that the dynamic selection is capable of achieving the optimal results. A static placement of the filters on Octopus nodes may obtain the same results; in such a simple scenario, an experienced user may even guess the correct location. However, it is certainly hard to figure out how much bandwidth is available on the bottleneck link. A second advantage of a topology–aware application is that it may automatically reconfigure the placement when the usage of the links changes. A real–time video transmission may last several hours, during which the bandwidth and the location of the bottleneck may shift. A topology–aware application may automatically react to such changes.


[Figure: the three distribution setups — fan (multicast at ETH directly to EPFL, CMU, UC and UVA), chain (data forwarded along ETH, EPFL, UC, UVA) and tree (multicast at EPFL) — with the Octopus nodes marked.]

Figure 7.9: Network topology for the complex video experiment.

7.3.2 Multicast video streaming

A more complex setup is used for a second experiment. A video stream is emitted from a server at ETH. Clients are located at EPFL, CMU, UC and UVA. Each of these domains additionally contains Octopus nodes that can be used for the multicasting. The video requires an average bandwidth of 0.6 Mbps to be transmitted without packet loss. The task is to find a setup of adaptive and multicast filters so that the video quality of all clients is optimized.

Figuring out the most efficient setup is difficult in this scenario. Three setups are compared against each other, as shown in Figure 7.9. The first setup is simple: the stream is multicast at ETH and adapted for every single destination individually. This setup is a simple end–to–end multicast and is merely used as a worst–case multicast scenario. We call this the fan setup. The second setup derives from the geographical closeness of the hosts. The stream is split into two streams at ETH: one towards CMU and one towards EPFL. Every stream is individually adapted. A second multicast filter is located at EPFL, which branches the stream off to the client at EPFL and forwards and adapts the stream to UC. A third multicast/adaptive filter pair is located at UC to split the stream to the local client at UC and forward it to UVA. Because the distribution forwards the data along a chain, from ETH to EPFL to UC to UVA, we call this the chain setup. Finally, the tree setup is the result of the dynamic evaluation of the network topology. It is similar to the chain setup, but multicasts the data at EPFL.

Figure 7.10 shows the resulting video quality at the different clients. The y–axis shows the number of correctly received frames at the clients. Each bar defines one setup. The results of setup 1 reflect the bandwidth properties of the different links. Enough bandwidth is available to EPFL and CMU. The small number of missing frames can be explained by the load on the node that multicasts the streams. All filters (multicast filter, adaptive filter) run on this node, and the sum of all tasks loads this Octopus node. The bandwidth limits the percentage of received frames for the clients at UC and UVA.

[Figure: received frames [%] (0–100) at the clients EPFL, CMU, UC and UVA for the fan, chain and tree setups.]

Figure 7.10: Video quality at the clients.

In the second setup, all frames are received at EPFL and CMU because the load of multicasting and adaptive filtering is distributed over several nodes. The two Iberian clients show an astonishing behavior: the client at UC receives more frames than in setup 1, whereas the client at UVA receives significantly fewer frames. Because no overload could be detected throughout this configuration, it must be the network that causes the packet loss.

Figure 7.11 explains this behavior. An analysis of the network paths shows that in fact no direct path exists between the two sites, in spite of their geographical closeness. The data sent from UC to UVA travels about half the way back to EPFL before it takes a different route. Sending the data to UC and back again over the same path obviously increases the congestion on a bottleneck link, which explains the high loss rate at UVA.

The danger of the congestion between UC and UVA is detected by the evaluation of the topology graph. Multicasting the streams at EPFL avoids the congestion. Both clients at UC and UVA therefore receive more frames than in the previous setups.

[Figure: the 2001 backbone topology connecting ETH, EPFL, UC and UVA; the path from UC to UVA routes back toward EPFL, with no direct link between UC and UVA.]

Figure 7.11: Topology map of Ten in 2001.

The delegation of the placement of Octopus nodes to the user also bears a problem if the user has some knowledge about the topology. In addition to the fact that the physical connectivity does not necessarily correspond to the available bandwidth along a path, it is also possible that the physical topology changes over time. Between the time when the experiments were run in 2001 and the writing of this dissertation in 2002, the physical topology of the European research backbone changed significantly. The new topology, which is depicted in Figure 7.12, provides a new, direct connection between UVA and UC. We noted this change because we reran the experiment in spring 2002, and the topology–aware discovery came up with a different placement of filters than in 2001.

Table 7.3 shows the percentage of received frames in 2001 and 2002, as a function of the distribution setup. We observe three changes between the setups of 2001 and 2002. First, the number of correctly received frames increases. Second, the differences between the setups are smaller. Finally, the best data distribution setup changes: in 2001, the tree setup was the best setup, whereas the chain setup is better in 2002. We attribute all these changes mostly to the change in the physical connectivity.


[Figure: the 2002 GEANT backbone topology connecting ETH, EPFL, UC and UVA, now with a direct connection between UC and UVA.]

Figure 7.12: Topology map of the GEANT backbone in 2002

The smaller differences in the 2002 experiments might lead to the conclusion that with increasing network speed, topology–awareness is no longer needed; that is, the gain in quality no longer pays off the overhead introduced by topology–awareness. However, experience shows that the requirements of applications and users increase steadily as well.

7.3.3 Conclusions

The two experiments show that the topology–aware application generally outperforms the static placement of filters. A user can only pick the best setup for simple scenarios, and even there some knowledge about the network topology is necessary. An example of the difficulties of a placement by the user is shown in Section 7.3.2, where a user could hardly guess the right setup. In addition, we have shown that even the physical network topology can change over time. Keeping track of all these changes is very hard. The change in the physical connectivity between the time the evaluation was done (2001) and the writing of the dissertation (2002) has resulted in a different optimal placement of Octopus nodes and a different video quality.

                  2001                     2002
client    Fan     Chain    Tree       Chain    Tree
UVA       60%     70%      80%        100%     96%
UC        45%     25%      60%        96%      88%

Table 7.3: Differences in video quality (% of received frames), as a function of the distribution setup and the physical topology (2001, 2002).

The topology discovery, evaluation and selection come at a price: the start of the sending of the data stream is delayed. Depending on the availability of the information, this delay ranges from a few seconds (if the bandwidth and topology information is cached) to several tens of seconds (if the information must be gathered first). The application can also shorten the delay by requesting the information as soon as possible. Topology information can be gathered while the server is busy processing a client request, fetching data from storage, etc. Such techniques may hide the delay at least partially. In the end, however, it is the user who must decide whether the delay can be tolerated.

The difference in the video quality is significant for the different setups. A difference of 20% in the number of correctly received frames results in a noticeable difference in the video quality perceived at the client. In spite of the price, we therefore consider the differences in the video quality significant enough to claim that topology–awareness pays off.

7.4 Multipath streaming

The Internet is often not able to satisfy the bandwidth requirements of multimedia applications. Adaptation allows an application to deal with shortages, but does not increase the available bandwidth.

A topology–aware application has an alternative way to address shortages: it can send the data over multiple paths. The problem in the traditional Internet is that a single path may not be able to satisfy the application requirements. This limitation to a single path for transmitting the data does not apply to topology–aware applications. Chapter 5 has described algorithms for the evaluation of a network topology. Some of these algorithms are not only able to find the best path; they can also find a set of paths that satisfy some constraints. One of these constraints may be that the summed bandwidth of all paths exceeds the required application bandwidth.
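
The summed-bandwidth constraint can be illustrated with a small greedy sketch. This is not one of the Chapter 5 evaluation algorithms; the function, the path identifiers, and the greedy strategy are assumptions chosen for brevity (the real algorithms operate on the full topology graph).

```python
def select_paths(paths, required_bw):
    """Greedily pick paths until their summed bandwidth covers the need.

    `paths` maps a path identifier to its estimated available bandwidth.
    Returns the chosen subset (highest-bandwidth paths first), or None
    if even all paths together are insufficient.
    """
    chosen, total = [], 0.0
    for path in sorted(paths, key=paths.get, reverse=True):
        chosen.append(path)
        total += paths[path]
        if total >= required_bw:
            return chosen
    return None

# Hypothetical per-path estimates (Mbps) for the 0.6 Mbps video stream.
estimates = {"path-A": 0.4, "path-B": 0.3, "path-C": 0.1}
print(select_paths(estimates, 0.6))   # → ['path-A', 'path-B']
```

In a best-effort network these estimates fluctuate, which is exactly why the selection must be combined with continuous adaptation, as discussed next.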

Multipath streaming has been explored in the context of QoS networks, e.g. [20, 63], and has been proposed for the Internet at a lower layer [83], but it has not been deployed in the Internet as an application–layer protocol. Such a multipath streaming protocol fits well into the architecture of Octopus: first, because the splitting and the merging of the streams can be implemented as Octopus services, and second, because a topology–aware application has the knowledge of the network topology that is required to find and select multiple paths.

Multipath streaming in a best–effort network poses a set of new research issues. The main problem is that the resources in such a network are not known and, even worse, they change dynamically over time, so that continuous adaptation is needed even for a multipath streaming setup. The combination of adaptation and multipath streaming, however, is far from easy:

- How can a multimedia stream be split efficiently and dynamically onto different substreams?

- Every path is likely to have different timing characteristics, e.g., latency or jitter. How can the different substreams be synchronized?

- How does the adaptation mechanism interact with multipath streaming? In contrast to QoS networks, neither bandwidth guarantees nor bandwidth information is directly available from the Internet; one aspect of this problem is selecting the various paths for a multipath setup.

Multimedia applications are a good vehicle to investigate these issues, not only because of their bandwidth demands, but also because of their additional synchronization issues. We therefore study the above issues in the context of Medusa.

The following sections investigate two approaches to adaptive multipath streaming in the Internet. A first approach, presented in Section 7.4.1, is to split the data completely transparently to the application. In contrast, Section 7.4.2 presents an alternative approach that integrates the splitting into the application context and combines it with the adaptation algorithm. The first approach, which hides the splitting, is an example of the traditional system structure of hiding functionality behind layers to separate applications from the data transport. This architecture is also maintained in most application–layer routing infrastructures (overlay networks). In contrast, the second approach focuses on an integration of the data transport into the application context. This integration results in a software architecture that is similar in style to those scenarios that provide application–driven data transport. The motivation in both cases is also the same: some parameters that decide how the data should be transported are only known within the context of the application.

The comparison of the two approaches yields interesting insights into the benefits of each. The two approaches differ significantly in how they address the questions listed above. Because both operate at the application layer, we are able to compare them in real Internet experiments.


7.4 Multipath streaming 161

[Figure 7.13 (diagram): at the filter, frames are read from a buffer, filtered, packetized, and sent through a split socket over multiple paths; at the client, a merge socket reassembles the packets into a buffer for playback, with a single feedback channel from the client back to the filter.]

Figure 7.13: Design of a management–layer multipath Medusa filter.

7.4.1 Management–layer multipath streaming

The first approach to multipath streaming is to split the stream as transparently as possible to the application. This transparency is achieved in two steps. First, in the design of a multipath streaming solution, the stream must be split as late as possible at the sender (the filter) and merged as soon as possible at the receiver (the client).

Figure 7.13 shows the design of such a multipath streaming filter for Medusa. A comparison with the single–path streaming filter, as shown in Figure 7.2, reveals that no differences occur in the data processing until the data is sent. Sending the data over multiple paths requires an additional splitting phase and multiple outgoing streams over which the data can be sent.

The second step in the design is to hide the splitting and the merging of the streams behind a layer that makes it easy to exchange single–path and multipath streaming. We use the socket API for this purpose. That is, we model the splitting and merging of the stream as a special socket class. In the Octopus model, these splitting and merging sockets can be attributed to the management layer because they are not specific to a single application. We therefore call this approach management–layer multipath streaming (MLMS).

Stream setup

Setting up the stream splitting socket when the application establishes the connection does not require any changes in the application code. The splitting and merging sockets must establish a set of connections rather than a single one. The number of parallel streams must be given as a parameter and results from the topology evaluation.
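As a concrete illustration, such a splitting socket can be sketched as a class that keeps the familiar single-socket API while opening one connection per path. The class name, constructor signature, and round-robin default below are illustrative assumptions, not the actual Octopus implementation:

```python
import socket

class SplitSocket:
    """Sketch of an MLMS-style splitting socket (illustrative, not the
    actual Octopus class): it exposes a single-socket send() API but
    opens one UDP socket per parallel path."""

    def __init__(self, destinations):
        # One (host, port) pair per parallel path; the number of paths
        # is the parameter that results from the topology evaluation.
        self.dests = list(destinations)
        self.socks = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
                      for _ in self.dests]
        self.next_path = 0  # cursor for the default round-robin policy

    def send(self, packet):
        # The application calls send() exactly as on a single socket;
        # the choice of the outgoing path is hidden in this layer.
        i = self.next_path
        self.next_path = (self.next_path + 1) % len(self.socks)
        return self.socks[i].sendto(packet, self.dests[i])

    def close(self):
        for s in self.socks:
            s.close()
```

A merging counterpart at the client would expose a matching `recv()` that reads from all paths and hands the application a single packet stream.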

Adaptation

No changes in the adaptation mechanism are needed for MLMS. As shown in Figure 7.13, the adaptation is made independently of the number of parallel streams. At the receiver


162 Chapter 7: A topology–aware video application

[Figure 7.14 (diagram): each substream has its own filter, packetize, and send stage reading from a shared buffer at the filter; a coordinator controls the individual filters, and each substream has its own feedback channel from the client back to its filter.]

Figure 7.14: Design of an application–layer multipath Medusa filter.

side, the measurement of the packet loss is made after the merging of the substreams. Hence, the feedback mechanism does not have to be changed either.

Splitting

It is likely that not all outgoing streams have the same properties, e.g., the same bandwidth, latency, or jitter. The distribution of the data onto the different outgoing streams depends on the relationship of these dynamic properties. That is, if a multipath stream consists of two substreams and one of them has twice as much bandwidth as the other, the data should be distributed in relation to the bandwidth.

However, not all applications are sensitive only to bandwidth. It should therefore be possible for the application to customize the splitting of the data, because only the application knows about its sensitivity. This application–specific splitting can be implemented in Octopus by defining an abstraction at the management layer that we call a splitting strategy. Any application can customize this splitting strategy.
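Such a splitting strategy could be sketched as follows; the class names and the deficit-counter scheme are hypothetical, meant only to illustrate how an application might plug in its own policy:

```python
class SplittingStrategy:
    """Base class of the splitting-strategy abstraction (a sketch with
    illustrative names, not the thesis' actual API)."""
    def pick_path(self, path_stats):
        raise NotImplementedError

class BandwidthStrategy(SplittingStrategy):
    """Distribute packets in proportion to the measured per-path
    bandwidth: deficit counters ensure that a 2:1 bandwidth ratio
    yields a 2:1 packet ratio over time."""
    def __init__(self):
        self.credit = []

    def pick_path(self, path_stats):
        # path_stats: one dict of measured properties per path
        if len(self.credit) != len(path_stats):
            self.credit = [0.0] * len(path_stats)
        total = sum(p["bandwidth"] for p in path_stats)
        for i, p in enumerate(path_stats):
            self.credit[i] += p["bandwidth"] / total  # fair share per packet
        best = max(range(len(self.credit)), key=self.credit.__getitem__)
        self.credit[best] -= 1.0  # one packet's worth of credit consumed
        return best
```

A latency- or jitter-sensitive application would subclass `SplittingStrategy` and weight the paths by its own metric instead of bandwidth.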

7.4.2 Application–layer multipath streaming

An alternative to management–layer multipath streaming is application–layer multipath streaming (ALMS). ALMS integrates the splitting of the stream into the application context. Figure 7.14 depicts ALMS.

The application-layer multipath streaming setup is similar to a multicast setup with individual stream adaptation, with two exceptions. First, all substreams reach the same destination. Second, multipath streaming and filtering must be combined to ensure that (i) the same data is not sent over different substreams and (ii) the most important frames are transmitted first (over any path). The latter problem is difficult because the filter must


decide for each frame (i) whether it should be transmitted and (ii) if so, over which path. We address these problems by introducing a filter coordinator. The adaptive multipath filtering process lets every substream read the full stream from the buffer at its own pace. However, the filters are no longer independent; instead, the coordinator makes sure that the most important frames are sent first and are sent over one path only. E.g., if a multipath setup consists of two paths, the coordinator may set one filter to pass all I-frames and the other to pass all P-frames, or it may set one filter to pass all even frames and the other to pass all odd frames.
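One possible sketch of such a coordinator's assignment step is shown below; the representation of frame classes and per-path capacities is an assumption for illustration, not the Medusa implementation:

```python
def assign_frame_classes(priority, capacities):
    """Sketch of the coordinator's assignment step (hypothetical helper).
    priority:   frame classes ordered from most to least important,
                e.g. ["I", "P1", "P2", "B1", ...]
    capacities: for each path, how many frame classes it can carry.
    Classes are handed out in priority order to the path with the most
    remaining capacity, so the most important frames are always covered
    first and no class is assigned to two paths."""
    assignment = [[] for _ in capacities]
    free = list(capacities)
    for cls in priority:
        i = max(range(len(free)), key=free.__getitem__)
        if free[i] == 0:
            break  # no path has room left: remaining classes are dropped
        assignment[i].append(cls)
        free[i] -= 1
    return assignment
```

The returned lists are disjoint by construction, which is exactly the coordinator's invariant: every frame class is sent over at most one path.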

Stream setup

The stream setup for ALMS is more difficult than for MLMS. A first issue is that the streams have to be set up in the context of the application. The application design has to be extended, and depending on the original design, a tedious redesign of at least a part of the application may be necessary.

A second problem is the initialization of the different filters and the coordinator. In the single–path streaming case, the client sends feedback to the filter immediately after receiving the first packets. We see two possible solutions for multipath streaming. A first solution is closely related to the single–path case. The coordinator is initially set to pass all frames over all connections. After the feedback from the client, the coordinator can start assigning drop levels to the filters. This approach may cause frames to be sent more than once from the filter to the client. However, this solution does not need any additional information.

An alternative is to use bandwidth measurements to set up the coordinator and the filters. The measurements could be made by the application itself; in Octopus, however, bandwidth values are already available from the topology evaluation. We propose to use these values to initially set up the coordinator and the filter levels. For this solution, the metrics of the network (bandwidth) must be mapped onto application–layer metrics (frames, frame rate). The frame layout within the stream (e.g., a GOP of IPBBPBBPBBPBB), the average size of a frame (per type), and the frame rate of a video are known at the beginning of the transmission, so the initial filter levels can be calculated.
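This mapping could be sketched as follows; the frame sizes, frame rate, and the coarse three-level result are illustrative assumptions, not measured values from the thesis:

```python
def initial_filter_setup(bandwidth_bps, gop="IPBBPBBPBBPBB", fps=25,
                         frame_bytes={"I": 12000, "P": 6000, "B": 2500}):
    """Sketch of the bandwidth-to-frame-metric mapping; frame sizes
    (bytes) and frame rate are illustrative assumptions. Returns which
    frame types the filters should initially pass."""
    gops_per_second = fps / len(gop)
    # bits per second needed for all frames of one type in the stream
    cost = {t: gop.count(t) * frame_bytes[t] * 8 * gops_per_second
            for t in "IPB"}
    if bandwidth_bps >= cost["I"] + cost["P"] + cost["B"]:
        return "send I+P+B"
    if bandwidth_bps >= cost["I"] + cost["P"]:
        return "send I+P, drop B"
    return "send I only"
```

The same per-type costs can then be divided among the substreams according to each path's measured bandwidth to derive the initial per-filter drop levels.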

Adaptation

The adaptation mechanism must be changed because complementary data must be sent over the different substreams. The streams are no longer independent of each other. One of the consequences is that the adaptation mechanism works differently for ALMS.

At the client, the loss rate must now be measured per substream, which requires changes in the application code. A second change is that feedback from the client to increase or decrease the filtering level can no longer be handled by the corresponding filter alone. It must be forwarded to the coordinator because a change in the filtering level


for one path affects all paths. If one path reduces the filtering level to transmit additional frames, the coordinator must ensure that the streams are still complementary. Similarly, if one stream increases the filtering level, the coordinator must ensure that the high-priority frames are still transmitted first. Assume, e.g., that a multipath streaming filter has 2 substreams, one transmitting I-frames, the other P- and B-frames. When the first connection slows down, it cannot simply start dropping I-frames. The coordinator must adjust the filtering level of both paths. An I-frame that can no longer be transmitted over the first path must be transmitted over the second path, which in turn may have to start dropping B– and P–frames.

In the opposite case, i.e., when a stream can be increased, the multipath coordinator must be contacted to find out which frame type is the next that can be transmitted "globally". Consider, e.g., a multipath streaming setup with two streams where the first stream transmits I and P1¹ and the second stream P2. If stream 1 wants to increase its stream, it must not simply choose the next frame type, P2, since this frame type is already transmitted by another connection. The coordinator, which has the global view, correctly determines P3 as the next frame type to be sent.
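The coordinator's global decision in this example can be sketched as a small lookup (a hypothetical helper, not the actual Octopus API):

```python
def next_frame_type(priority, assigned):
    """Sketch of the coordinator's global view (hypothetical helper).
    priority: frame types ordered from most to least important.
    assigned: maps each substream to the frame types it already carries.
    Returns the most important frame type carried by no substream."""
    in_use = set().union(*assigned.values()) if assigned else set()
    for frame_type in priority:
        if frame_type not in in_use:
            return frame_type
    return None  # every frame type is already transmitted somewhere
```

In the scenario from the text, stream 1 carries I and P1 and stream 2 carries P2, so the next frame type to enable globally is P3 rather than P2.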

Two changes at the client are necessary to merge the streams for ALMS. In the single–path stream implementation, the measurement of the packet loss is combined with the buffer instance; this implementation now has to provide feedback for every substream. That is, both the measurement of the loss rate and the sending of the feedback messages must be attached to every substream. The messages themselves, however, remain unchanged.

7.4.3 Comparison of the multipath streaming approaches

Figure 7.15 compares the two approaches from an architectural view. Figure 7.15(a) shows that the stream splitting and merging can be integrated into the management layer, so neither the application nor the network has to be changed. Figure 7.15(b), in contrast, shows only two layers because the adaptation and the splitting are integrated into the application. A second difference is that MLMS has only one feedback stream whereas ALMS has a feedback channel for every substream.

Table 7.4 compares the two approaches along different criteria. To enable an existing conventional single-path application, MLMS limits any source code changes to using the new splitting and merging sockets, whereas ALMS requires changes to the application code. These changes may be hard to implement, especially if the application's module structure does not encapsulate the communication activities.

The splitting granularity of the two implementations is different. MLMS, which is closely related to the network, works with UDP packets. In contrast, ALMS works with application-specific data types (MPEG frames in this case). The splitting granularity for

¹ The index of P1 means that P is the first P–frame in this GOP.


[Figure 7.15 (diagram): (a) Management–layer multipath streaming: the filter's drop stage resides in the application layer; split and merge reside in the management layer; a single feedback channel connects client and filter above the network layer. (b) Application–layer multipath streaming: drop, split, and merge all reside in the application layer, with one feedback channel per substream.]

Figure 7.15: Management–layer (a) and application–layer (b) multipath streaming.

criteria                    MLMS                       ALMS
filter changes              no                         coordinator
client changes              no                         merging
other changes               splitting/merging socket   no
portability to other app    yes                        no
split granularity           (UDP) packet               MPEG frame
number of streams           unlimited                  depends on frame layout
split metric                bandwidth                  packet loss
adaptation metric           packet loss                packet loss
stream adaptation           all together               individual substreams

Table 7.4: Comparison of management–layer splitting and application–layer splitting.

ALMS is coarser than for MLMS because an average MPEG frame is larger than a UDP packet.

The number of parallel streams for MLMS is only limited by the host communication (network) interface (and the host's memory bandwidth), so for many settings, the number of paths can be quite large. ALMS, on the other hand, may be limited by application restrictions. The MPEG filtering process distinguishes frames within a GOP, but not across GOPs. That is, the filter knows whether a B-frame is the first or the second after an I-frame, but it does not know the difference between the first B-frames of different GOPs. As a consequence, the number of parallel streams is limited by the number of frames in a GOP. Typical numbers are 12 or 15 possible parallel streams.

The split metric is also related to the network for management-layer stream splitting. Bandwidth is a typical split metric, but other network-related metrics, e.g., the error rate for wireless connections, could be used as well. In contrast, the coordinator of the


paths            1      2      3      5      10     15

Sysload  ALMS    2.0%   2.6%   3.4%   5.0%   7.6%   11.2%
         MLMS    2.1%   2.1%   2.0%   2.1%   2.0%   2.2%

Delay    ALMS    1.00   1.06   1.06   1.06   1.07   1.07
         MLMS    1.02   1.02   1.01   1.02   1.02   1.02

Table 7.5: Performance of MLMS and ALMS (% load and normalized delay).

application-layer splitting distributes frames (not packets) according to the packet loss along the paths. The adaptation metric is packet loss for both implementations, but MLMS adapts all substreams together, as if they were one single stream, whereas ALMS adapts each substream individually.

7.4.4 Evaluation

Table 7.4 shows that the two multipath streaming approaches differ not only architecturally but also in many parameters. All these differences influence the streaming performance of each approach in various ways. This section discusses this influence along three lines of thought. A first part looks at the overhead of each approach with respect to the system load on a filter and the delay overhead for the streaming. The second part discusses synchronization problems in multipath streaming, e.g., how the two approaches deal with large delay differences along the subpaths. Finally, the third part discusses the interaction between multipath streaming and adaptation and its impact on the final multimedia quality.

7.4.5 Performance

Table 7.5 compares the load imposed on the filtering host and the delay for data forwarding of the two approaches, as a function of the number of paths in the multipath setup. The system load is measured on a 933 MHz Pentium III with 256 MB RAM running Linux 7.2. The results are averages of 100 tests and are expressed in % of the total system load. They show that the load imposed by MLMS is independent of the number of paths, whereas for ALMS the load increases almost linearly. The reason for this increase is that the whole filtering process is replicated for each additional stream in the application-layer approach. ALMS therefore scales significantly worse than MLMS.

The second set of rows shows the delay overhead of multipath streaming normalized to the corresponding single-stream implementation. For MLMS, the delay overhead is measured between the sending command by the application and the sending command by the splitting socket. For ALMS, the measurement compares the time between the fetching of a frame from the buffer and the sending of a frame. The numbers show that the delay overhead is negligible; in practice, the absolute values are on the order of milliseconds.


[Figure 7.16 (bar chart): received bytes [%] (y-axis, 0–100) versus the bandwidth distribution of the parallel paths (x-axis: (1/2, 1/2); (1/3, 2/3); (1/4, 1/4, 1/4, 1/4); (1/8, 1/8, 3/8, 3/8); (1/15, 2/15, 4/15, 8/15)) for the strategies round robin, random, BW(1), and BW(10).]

Figure 7.16: Transmitted packets (network emulation with real traces).

Splitting strategy

The performance of multipath streaming depends on the efficiency of distributing the stream onto the different substreams. For ALMS, this splitting is implicitly integrated into the adaptation mechanism. The filtering process of each substream fetches the data from the buffer at its own rate, matching the capacity of the outgoing streams. In contrast, an explicit splitting mechanism is needed for MLMS. This splitting mechanism, which is part of the splitting socket, receives UDP packets from the application and must distribute them onto the outgoing connections. Three principal splitting strategies are evaluated here. Round robin distributes the packets sequentially at an equal share. Random uses a uniform distribution. Finally, the third strategy measures the available bandwidth along each path and distributes the packets based on the actual bandwidth availability. These bandwidth measurements are also integrated into the splitting and merging sockets and are therefore hidden from the application. We report on two different bandwidth splitting strategies: BW(1) splits the streams according to a bandwidth sample that is taken anew every second, whereas BW(10) takes a value averaged over the last 10 samples to smooth heavy fluctuations.
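The BW(k) metric can be sketched as a sliding-window mean (a hypothetical reading of the strategy, with illustrative names):

```python
from collections import deque

class SmoothedBandwidth:
    """Sketch of the BW(k) split metric: BW(1) uses only the newest
    sample, BW(10) the mean of the last 10 samples, which smooths
    heavy bandwidth fluctuations."""
    def __init__(self, window):
        self.samples = deque(maxlen=window)  # old samples fall out

    def add_sample(self, bits_per_second):
        self.samples.append(bits_per_second)

    def value(self):
        return sum(self.samples) / len(self.samples)

def split_weights(estimators):
    """Relative share of the packets that each path should receive."""
    values = [e.value() for e in estimators]
    total = sum(values)
    return [v / total for v in values]
```

With `window=1` this degenerates to BW(1), so the same splitter code covers both reported strategies.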

To compare the different splitting strategies, we emulate real Internet traces between the filter and the client, as described in Section 7.1.4. Two different videos are used in


this experiment. The amount of transmitted data is measured as a function of the splitting strategy, the number of parallel streams, and the bandwidth distribution among the paths. Figure 7.16 shows the amount of transmitted bytes, relative to the total bytes sent by the filter, in %. A value of less than 100% shows that some packets had to be dropped because they could not be sent due to congestion. The x-axis denotes the relative bandwidth distribution of the paths. A value of (1/2, 1/2), e.g., means that the multipath setup consists of two subpaths with an equal average bandwidth (50% each). To achieve this distribution, the original traces are scaled so that the average bandwidth of a trace matches the target average bandwidth, while still keeping its fluctuation patterns. The total amount of streaming bandwidth is always chosen to match the bandwidth requirements of the video for this experiment.

The first and the third group of columns in Figure 7.16 exhibit an equal bandwidth share for each subpath. There, the results of all 4 splitting strategies are similar. In contrast, the two static splitting strategies (random and round robin) drop significantly in performance when the bandwidth share is no longer equal across the paths, as shown in the second, fourth, and fifth group of columns. Regarding the two dynamic splitting strategies, we note that BW(10) performs 5-10% better than BW(1).

The conclusion from this figure is that a good splitting strategy is essential to achieve decent performance with MLMS. Static splitting strategies are only useful when all paths have equal bandwidth, which is hardly ever the case in real best-effort networks.

Splitting granularity and metric

MLMS and ALMS differ in a third factor: splitting granularity and splitting metric. The result of this difference is shown in Figure 7.17. The experiment that leads to this figure is similar to the previous one. Two different videos are sent over the emulated traces for both approaches. The x-axis denotes the total amount of available bandwidth for the multipath setup, and each of the n paths has 1/n-th of the total bandwidth. MLMS uses BW(10) as a splitting metric. The y-axis shows the amount of received bytes, in % of the total bytes sent.

The performance for a single connection is always higher because it does not have to pay the multipath streaming overhead. Interestingly, the relationship within one group of bars differs between MLMS and ALMS. The performance of ALMS decreases with an increasing number of parallel paths while it increases for MLMS. This different behavior is due to the differences in the splitting mechanism. The decrease in performance for ALMS is due to the coarse and slow splitting-adaptation mechanism. The adaptation mechanism is deliberately lazy: it ignores small bandwidth fluctuations to smooth the video and, especially, to avoid packet loss. If a request to change the filtering level arrives in ALMS, this request has a much larger impact than the corresponding request for MLMS because frames are a factor of 2-5 larger than UDP packets. It is therefore much more difficult to


[Figure 7.17 (bar chart): received bytes [%] (y-axis, 0–100) for ALMS and MLMS at total bandwidths of 200, 400, and 600, with 1, 2, 4, and 8 connections.]

Figure 7.17: Amount of received data as a function of the number of parallel connections and the available bandwidth.

coordinate the different filters. In contrast, MLMS is more efficient for a larger number of connections. We explain this observation with bandwidth mispredictions. A misprediction affects about 50% of the data if only two parallel connections are used; with 8 connections, only 1/8 of the data is affected. This discussion shows that splitting granularity and metric affect the two approaches in different ways. Comparing the two, ALMS is generally better suited for a lower number of parallel streams. MLMS not only scales better to a large number of streams from an implementation's point of view, it is also more efficient when the number of paths increases.

7.4.6 Synchronization

Real Internet paths not only differ in the available bandwidth, they also differ in other metrics, such as delay or jitter. These differences cause a synchronization problem for multipath streaming. Packets that travel over a slower path may arrive late at the client. In the worst case, they must be discarded because they are too late to be displayed. Discarding or losing packets due to congestion has a negative impact on the video quality. Because a packet is smaller than a frame, a single packet loss may invalidate other packets as well, i.e., they are transmitted in vain. Even worse, the loss of a packet containing an


[Figure 7.18 (graph): time on the x-axis (0–140 s); latency in ms of the two paths on the left y-axis (200–2000); filter drop level on the right y-axis (0–10) for MLMS with 0.5 s and 1.0 s buffers and ALMS with a 0.5 s buffer.]

Figure 7.18: Change of filtering level due to synchronization problems.

I-frame leads to an invalidation of all the frames that depend on this I-frame. The ability to deal with a large difference in the path latencies varies between the two multipath streaming approaches. To show the differences, a video stream is sent over two emulated paths. The latency of these traces is shown in the lower part of Figure 7.18. The x-axis denotes the time, the left y-axis the latency in milliseconds. The two lower lines show the latency of the two paths that form the multipath. Their latency is similar until t = 30 seconds, when one trace doubles its latency in two major steps whereas the latency of the other path increases only slightly.

The increase in the latency difference has an impact on the filtering level if the client buffer is not much larger than the latency difference. We therefore set the buffer size corresponding to data for 0.5 seconds and 1.0 seconds, respectively, for MLMS, and to 0.5 seconds for ALMS. These buffer sizes are relatively small for Internet applications, but the latency difference in the real Internet may also be much larger than in the shown traces, so the same effect may occur with a larger buffer.

The top of Figure 7.18 shows the filter drop level (y-axis on the right) for MLMS and ALMS. (In the case of ALMS, the filtering level of the coordinator is shown.) All three implementations increase their filtering level 30 seconds into the experiment as a reaction to the increase in latency. ALMS reduces the drop level again after 10 seconds. The reason for this reduction is that the adaptation mechanism (of the


single-path streaming) is able to adjust the speed of a stream (the data rate) by delaying the sending of packets at the filter to keep a steady data flow to the client. Because ALMS adapts every substream individually, it can increase the speed of the high-latency connection until the two substreams are synchronized and no more packets are dropped. The same adjustment in the speed is visible after 53 seconds and 96 seconds.

MLMS does not have the ability to react accordingly because the multipath streaming is hidden. The adaptation algorithm only notices that some packets are discarded by the client buffer. Assuming that the discarding is a result of congestion, MLMS increases the drop level at the filter. However, this increase has no effect because the latency difference is still there. The “0.5s buffer” client continues to increase the drop level until the maximal level of 10 is reached (i.e., 10 out of the 12 frames in each GOP are filtered out). The effects on the video quality are devastating, especially compared to ALMS with the same buffer size! Doubling the buffer size for MLMS to “1.0s” also leads to an increase in the filtering level, but with two differences with respect to the “0.5s” buffer. First, the filtering level increases later because the larger buffer can hide the effects of the first latency increase. Second, when the latency difference decreases again after 95 seconds, the filtering level is also reduced.

We make three observations here. First, the buffer size is a critical parameter for multipath streaming, especially at the management layer. A larger buffer is advantageous for dealing with synchronization problems. However, other factors limit the buffer size, e.g., constraints imposed by the client hardware.

Second, the ability to deal with synchronization problems is limited for MLMS because the multipath mechanism is separated from the adaptation mechanism. The adaptation mechanism has no way to find out the true cause of the packet loss in this implementation. One way to deal with the synchronization would be to integrate synchronization into the multipath streaming (socket) as well. Timestamps could be set by the splitting socket at the filter and be compared at the client. A mismatch could be adjusted by delaying the sending of packets over individual paths. However, such an adjustment would imply that a buffering mechanism is included in the splitting socket - one buffer per outgoing socket! Apart from being resource intensive, such a buffering mechanism may be hard to implement; especially the timing question may be hard to answer, i.e., when exactly should which buffer be drained by how much.
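The timestamp comparison itself could be sketched as follows; because only differences between paths are used, the clock offset between filter and client cancels out. All names are hypothetical:

```python
import time

class SkewDetector:
    """Sketch of the timestamp idea (hypothetical names): the splitting
    socket stamps each packet at the filter; the merging side records
    the per-path delay and reports the skew between the fastest and the
    slowest path, i.e., how much a per-path buffer would have to absorb."""
    def __init__(self, n_paths):
        self.delay = [None] * n_paths

    def on_packet(self, path, send_ts, recv_ts=None):
        if recv_ts is None:
            recv_ts = time.time()
        # includes the filter/client clock offset, which is identical
        # for all paths and therefore cancels out in skew()
        self.delay[path] = recv_ts - send_ts

    def skew(self):
        seen = [d for d in self.delay if d is not None]
        return max(seen) - min(seen) if len(seen) > 1 else 0.0
```

Even with such a detector, the hard part noted above remains: deciding when and by how much to drain each per-path buffer.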

Finally, synchronization is only one kind of the “asymmetry” problems that can occur with multipath streaming. In a wireless environment, e.g., paths may have an “asymmetric” probability of losing packets due to transmission errors. ALMS can detect these differences and react, e.g., by sending the most important frames over the most reliable connection. MLMS cannot deal with an unequal error rate because it does not know that multiple paths are used. Implementing a scheme to send high-priority frames over the most reliable connection within the splitting socket would be very difficult, especially because the splitting socket does not know which part of a frame is contained in a packet.


[Figure 7.19 (bar chart): received frames [%] (y-axis, 0–100) per frame type (I, P, B) at multipath bandwidths of 200, 400, and 600, with 1, 2, and 4 connections.]

Figure 7.19: Transmitted video frames as a function of the available bandwidth of the multipath for management-layer streaming.

Multipath adaptation

The previous performance analysis has measured only the amount of transmitted bytes. Important for the video quality at the client, however, is not the amount of transmitted data but the number of correctly displayable frames.

To assess the video quality at the client, we again stream two different videos over the emulated Internet traces with a varying number of paths. Each run is repeated 50 times with different Internet traces. The traces are again scaled to match the average target bandwidth. For each run, we measure the number of displayable frames, separately for each frame type. The results of this experiment are shown in Figures 7.19 and 7.20.

Both figures show the number of displayable frames on the y-axis, grouped by frame type, as a function of the multipath bandwidth and the number of paths that make up the multipath. Both figures are similar in the layout of the received frames: if the multipath bandwidth is low, only I-frames are transmitted; P-frames are sent when more bandwidth becomes available, and finally B-frames get through. This shows that the adaptation and multipath streaming mechanisms work well together. Analyzing the data further reveals that ALMS results in a slightly higher number of frames. Especially the


[Figure 7.20 (bar chart): received frames [%] (y-axis, 0–100) per frame type (I, P, B) at multipath bandwidths of 200, 400, and 600, with 1, 2, and 4 connections.]

Figure 7.20: Transmitted video frames as a function of the available bandwidth of the multipath for application-layer streaming.

probability of losing an I-frame is almost zero, which means that this approach almost perfectly combines adaptation and multipath streaming. The results for MLMS, however, are almost as good as those of ALMS. It must be noted, though, that these values are achieved only with a good setup of the management-layer streaming: we used the BW(10) splitting strategy, the paths are free of errors, and the client-side buffer is large enough to avoid synchronization problems. Any of these problems can significantly reduce the performance of MLMS, whereas ALMS is more robust and only slightly affected.

7.4.7 Multipath streaming in best-effort networks

The previous discussion compares ALMS and MLMS and highlights their pros and cons; the net benefit of multipath streaming for an application is shown in Figure 7.21. It shows the number of received frames as a function of the number of paths and the multipath streaming strategy. For this experiment, a video stream of 1.5 Mbps is streamed over a varying number of parallel connections. Each connection is emulated with Internet traces whose average bandwidth is between 300 and 500 kbps. Every experiment is repeated 10 times with different traces. Every bar shows the average number of frames received, together with the minimum and maximum numbers. The first bar denotes the number of received frames


[Figure 7.21 (bar chart): received frames [%] (y-axis, 0–100) for a single path and for MLMS and ALMS with 2, 3, and 4 paths.]

Figure 7.21: Multipath streaming in a best-effort network.

over a single path, i.e., the common single-path streaming. All other bars use multipath streaming. With two streams, the number of received frames is almost doubled and the video quality is greatly increased. Finally, four parallel streams transmit the video almost in the original quality. These results show that the presented approaches to multipath streaming can significantly increase the video quality. The key to this improvement lies in the exploitation of the resources that are available in the network via alternative paths.

Multipath multicast streaming

Multipath streaming has so far been applied to replace a single connection between two Octopus nodes (typically between a filter and a client) by multiple connections. However, multipath streaming is not limited to this setup. A special setup is possible, e.g., when combining multicast and application–layer multipath streaming.

Figure 7.22 shows such a combined setup. In Figure 7.22, a server multicasts its data to three clients. Assume that every link has the capacity to stream a pattern of 1 I–frame and 2 P–frames per GOP. All filters could be set up to let the pattern IP1P2 pass. Each client would thereby receive the same frame pattern and hence the same video quality. However, as shown in the figure, a global view of the topology allows a different setup. The server streams different P–frames over the outgoing connections: P1P2 to one filter, P3P4 to the other filter (the I–frames are needed over both connections). Clients 1 and 3 receive the same streams that pass the corresponding filters. Client 2, in contrast,

Page 191: Design of topology–aware networked applications - CiteSeerX

7.4 Multipath streaming 175

client 1

server

client 2

client 3

IP P1 2

IP P3 4

IP P1 2

P P3 4

IP P1 2

IP P3 4O Octopus node

O

O

Figure 7.22: Combined multipath multicast setup.

may receive streams from both filters. Because their content is complementary, client 2receives the sequence IP1P2P3P4. Receiving all P–frames increases the video quality forclient 2.
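The merge that client 2 performs on the two complementary substreams can be sketched as follows (the frame labels and function names are illustrative assumptions, not the Medusa implementation):

```python
# Sketch: merging complementary GOP patterns at client 2.
# Frame labels ("I", "P1", ...) are hypothetical identifiers, not the
# actual Medusa/Octopus wire format.

def merge_gop(stream_a, stream_b):
    """Merge two complementary frame lists for one GOP, collapsing
    duplicate frames (the I-frame arrives over both connections) and
    ordering P-frames by their index."""
    frames = {}
    for f in stream_a + stream_b:
        frames.setdefault(f, None)   # duplicates collapse on the key
    order = lambda f: 0 if f == "I" else int(f[1:])
    return sorted(frames, key=order)

# Clients 1 and 3 receive one pattern each; client 2 merges both:
print(merge_gop(["I", "P1", "P2"], ["I", "P3", "P4"]))
# -> ['I', 'P1', 'P2', 'P3', 'P4']
```

The duplicate I–frame is discarded exactly as the client buffer discards packets received over both connections.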

Such a scenario is possible but difficult to achieve. It requires that the video distribution is not only coordinated locally at one filter, but also over the whole distribution tree. This "global" coordinator must coordinate all local filters so that they pass complementary frames for streams that lead to the same client. Such a coordination is difficult and scales badly to very large multicast trees. In addition, changes in the local filters must be approved by the global coordinator. This operation may be too time consuming (for the message passing as well as the optimization of the multicast tree) to be deployed on a large scale for real applications. However, it is important to notice that this kind of optimization is only possible with ALMS because information about the frame layout of every stream is needed.

7.4.8 Summary

Multipath streaming is a feature that extends the possibilities of end–to–end applications. The basics for the implementation of multipath streaming lie in the availability of information about alternative paths and about the resource usage of the network topology. Octopus is able to gather and present this information to applications.

This section presents two approaches for multipath streaming. Splitting the streams at the management layer provides a solution that is completely transparent to the application. It is therefore portable to any topology–aware application, and its overhead is low. The integration of the multipath splitting into Octopus allows the application to customize the splitting according to its preferences. The presented solution uses bandwidth as a metric for the splitting, but others may (additionally) integrate latency or jitter.

As an alternative, integrating multipath streaming into the application enables the application to fully control and steer the splitting of data onto the different streams. The increased overhead and the loss of portability may be compensated in environments which require application knowledge for the splitting.

In spite of the many advantages of MLMS, it also has severe drawbacks. Every path in a best-effort network has its own dynamic behavior, e.g., different latencies or error rates. Any asymmetry in this behavior must be addressed by an application, e.g., to maintain synchronization. MLMS only sees the effects of the asymmetry on the whole stream; it is neither able to identify the misbehaving path nor can it react appropriately. In contrast, ALMS integrates the multipath streaming and the adaptation into the application context. Because it first splits the data and then adapts every substream individually, it is able to deal with path asymmetries.

We also observed that although MLMS has a lower overhead and is more efficient with respect to the number of bytes, ALMS is able to deliver a better video quality because its adaptation and splitting mechanisms use application-layer metrics. The mapping of networking metrics (bandwidth) onto application-layer metrics (MPEG frames) is not easily performed. We therefore caution against expecting that the promising performance results of overlay networks will automatically translate into noticeable quality increases for an application.

The feedback mechanism of the original application has been changed as little as possible for this evaluation. That is, the client continues to steer the filter drop level. Related work shows that client–based feedback is only one possibility to steer adaptation. Steering is also possible at the sender (i.e., the filter). Since MLMS requires the measurement of bandwidth to efficiently steer the splitting anyway, this bandwidth information could also be used to steer the filter. We have not considered this issue any further because such an adaptation mechanism would also require significant changes in the ALMS adaptation mechanism. Finally, we do not expect additional insight about the comparison of MLMS and ALMS because this comparison is to a great part independent of the feedback initiator (i.e., whether the filter drop level is steered by the client or the filter itself).

The conclusions drawn from the comparison of the two approaches can also be applied to other protocols or approaches that have not been considered here. One idea that is often brought up when discussing adaptation is the idea of redundancy. The frame dropping filter takes the different frame priorities into account by dropping low–priority frames first. A similar prioritization could also be applied in the transmission of the frames by adding redundancy to the higher priority frames. In the single streaming case, redundancy can be added at a "bit level": the bits of a frame can be packetized and duplicated in a way that the loss of a packet containing a part of a frame does not automatically lead to the discarding of the whole frame. That is, when a packet is lost that contains a part of an important frame, enough redundancy should be contained in the other packets to reconstruct this frame. (Application–layer) multipath streaming offers an additional possibility to add redundancy at a packet level: packets containing important frames can be sent via different streams. We have not pursued this idea any further because of timing constraints. However, we claim that such an idea may be interesting to investigate further for the following reasons. First, we expect a "packet–level redundancy" to be faster than a "bit–level" solution, similar to the fact that dropping whole packets is faster than de– and re–encoding frames. Second, such a packet–level mechanism is easier to implement because it does not require detailed knowledge about the MPEG encoding. "Only" the packetization mechanism must be changed to send important frames over multiple connections. Next, the study of dynamic mechanisms to decide how much data should be replicated, e.g., as a function of the loss probability of a connection, might bring up interesting results. Finally, it would be interesting to evaluate the benefit of adding redundancy because the redundancy always increases the load on the network. It is not clear whether this addition yields better results or whether the adaptation mechanism we used is good enough at adapting the streams so that the probability of losing frames is low enough.

The usage of multiple parallel streams raises the question of whether multipath streaming increases the congestion in the Internet. Congestion is a severe problem in the Internet. If applications start to exploit additional resources by using multiple paths, this problem could be aggravated. However, there are two arguments that alleviate this concern. First, it cannot be expected that all applications in the Internet exploit the benefits of multipath streaming. The design overhead pays off for bandwidth–intensive applications, but we do not expect a significant benefit for the transmission of images because multipath streaming increases the setup time and hence the access latency. Second, congestion is already a problem in single–path streaming. TCP–friendly protocols, e.g., reduce the danger of congestion. A multipath streaming application should be implemented with the same principles in mind.

7.5 Handoff

The previous sections have focused on the selection of Octopus nodes and the setup of video streams at application startup. However, a topology–aware application must also observe the topology and the resources while the data is transmitted to detect fluctuations in the resource availability. The smoothing of the adaptive filter takes care of small resource fluctuations. However, if the resource availability changes significantly or over a longer period of time, a topology–aware application has the option to react to these changes. It may gather the latest information about the network topology and re–evaluate the availability of resources along alternative paths. If more resources are available over an alternative path, the application may decide to switch to an alternative connection.

Note that such a fluctuation may be both positive and negative: existing congestion may vanish, and new congestion may slow down the data delivery at other parts of the network. In the worst case, a connection may even go down completely.

If the application decides to change the data transmission, it should be done in a way that the application is least affected by this change. This section describes an application–layer handoff for Medusa [49]. A handoff is the switch from one connection to an alternative connection. Handoffs are a well–known technique in mobile communication systems: when a user moves from one cell to another, the connection must be handed off from an old to a new base station (see e.g. [71] for mobile hosts). The same idea is applied to Medusa: if Medusa detects that the current connection is slowing down and an alternative connection could be established that offers better resource availability, it performs a handoff.

[Figure 7.23: Handoff: the switch from an old connection (ETH–UFMG) to a new connection via CMU.]

Figure 7.23 shows a sample handoff scenario. The Medusa components are distributed over three locations. Assume that at the beginning, a server at ETH Zurich sends its video data stream to a client at UFMG in Brazil. The link between ETH and UFMG has only a small bandwidth, as indicated by the thin line. At some point in time, a client at CMU joins the transmission. A topology–aware application notices that a new link between CMU and UFMG shows up. If the path from ETH via CMU to UFMG provides a better performance than the direct path from ETH to UFMG, a handoff can be triggered by the client at UFMG. As soon as the path via CMU is established, the old connection may be shut down.

The handoff in the description of this scenario is caused by a change in the topology. A scenario where the topology remains the same (both clients) and the handoff is triggered because of bandwidth fluctuations can easily be imagined as well.

A handoff is a typical topology–aware mechanism. It requires that additional information about the network topology is available: at least a second path must be known to which the application can switch. Simple network–aware applications do not have this possibility and are therefore limited to adaptation.

This section makes two contributions. First, it describes the design and the implementation of a handoff mechanism for Medusa. Second, it analyzes the different factors that influence the quality of a handoff.

[Figure 7.24: Synchronization during the different phases of a handoff at the client. (a) During phases 1, 2, 4 and 5, a single input stream of (UDP) packets feeds the client buffer, from which the player fetches MPEG frames. (b) During phase 3, the old and the new connection stream into the buffer concurrently and must be synchronized.]

7.5.1 Client handoff

This section describes the design and the implementation of a handoff mechanism for a Medusa client. The reason why the client is chosen for the discussion, and not a filter, is that the Medusa client monitors the incoming data stream. This monitored information is used to steer the adaptive filter. It is also one possible source to detect high fluctuations in the resource usage and can therefore trigger a handoff. Given the original design of Medusa, this kind of triggering is only possible at the client. An alternative is that the management layer is responsible for the detection of a possible handoff: the management layer monitors the network topology and informs the application whenever the resources change significantly. This mechanism can be applied to filters and clients. Which of the two methods is used is not important here because they are only two different triggers for a handoff. This section concentrates on the actions taken after the triggering. Also omitted from the discussion is how alternative connections can be found; Chapter 5 has described mechanisms to search and detect alternative paths.

A handoff takes place while the application is in full transmission. The goal of a handoff is to be as unnoticeable as possible, i.e., the digital media stream should continue to play as smoothly as possible. In the design of a handoff, this goal implies that a second connection must be established in parallel to the current connection, as shown in Figure 7.24(b). The switch is not done until the two connections are synchronized. Establishing a second connection requires a change in the design of the Medusa client.

The server splits the data stream, which consists of frames, into packets. Each packet is identified by a sequence number (the offset in the video stream). The sequence number is therefore unique within a stream, but the same packet may be sent over two different connections and therefore arrive twice at the client. The buffer does not consider over which connection the data was received and therefore treats such packets as duplicates. The video player fetches the packets from the buffer, converts them into frames and displays them. Figure 7.24(a) shows how packets are inserted into the client buffer: at the top, only one connection is inserting packets; at the bottom, two connections are concurrently streaming and the buffer must therefore be synchronized. The buffer must also discard packets that are sent over both connections. This functionality is often already implemented for single connections.

[Figure 7.25: The handoff can be divided into 5 phases: (1) connection A only, (2) connection setup (connection B), (3) synchronization, (4) connection A teardown and (5) connection B only. The client exchanges the commands connect, set_start_packet, start_send and stop_send with the two connections, beginning with the handoff request.]
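A minimal sketch of such a duplicate-discarding buffer, assuming packets are keyed by their sequence number (the names are illustrative, not the actual Medusa client code):

```python
# Sketch of a client buffer that tolerates two concurrent connections
# during the synchronization phase: packets are keyed by their sequence
# number (the offset in the video stream), so a copy arriving over the
# second connection is detected and discarded. Illustrative only.
import heapq

class ClientBuffer:
    def __init__(self):
        self._heap = []      # min-heap ordered by sequence number
        self._seen = set()   # sequence numbers already inserted

    def insert(self, seq, payload=None):
        """Insert a packet; return False if it is a duplicate."""
        if seq in self._seen:
            return False     # same packet arrived over the other connection
        self._seen.add(seq)
        heapq.heappush(self._heap, (seq, payload))
        return True

    def fetch(self):
        """Player fetches the packet with the lowest sequence number."""
        seq, _payload = heapq.heappop(self._heap)
        return seq

buf = ClientBuffer()
for seq in [10, 11, 12]:      # packets from the old connection
    buf.insert(seq)
for seq in [12, 13]:          # new connection overlaps at packet 12
    buf.insert(seq)
assert buf.fetch() == 10      # packet 12 is buffered only once
```

The same keying makes the buffer indifferent to which connection delivered a packet, which is why the mechanism is often already present for single connections (reordering and retransmission handling).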

An important feature of our implementation is that neither server nor filters have to be changed. All changes in the implementation due to the handoff are limited to the client. The handoff–enhanced client therefore works with the original MTP filter and server as well as with all kinds of Medusa filters.

7.5.2 Handoff phases

A handoff can be divided into 5 phases, as depicted in Figure 7.25. The first phase is the pre–handoff phase where the handoff is triggered. In this phase, the client is receiving data over the old connection A. This first phase ends with the triggering of a handoff. During phase 2, the new connection B is established. The establishment includes the setup of the whole Octopus path as well as the exchange of the commands to start the streaming of the video over the new connection. One of these commands, e.g., contains the current position in the video stream sequence. In phase 3, both connections are simultaneously streaming data to the Medusa client. The client must synchronize the two streams because copies of some packets may arrive over both connections. In phase 4, the transmission of the old connection A is stopped and the connection is shut down. Finally, in phase 5, data is sent only over the new connection B. The handoff is completed.
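The five phases can be sketched as a simple linear progression (the phase labels follow Figure 7.25; the code is an illustrative sketch, not the Medusa implementation):

```python
# The five handoff phases as a hypothetical linear state progression.
PHASES = [
    "1: old connection A only",
    "2: connection setup (connection B)",
    "3: synchronization (A and B stream concurrently)",
    "4: connection A teardown",
    "5: new connection B only",
]

def next_phase(phase):
    """Advance to the next phase; the final phase is absorbing."""
    i = PHASES.index(phase)
    return PHASES[i + 1] if i + 1 < len(PHASES) else phase

p = PHASES[0]
for _ in range(4):
    p = next_phase(p)
print(p)   # prints "5: new connection B only"
```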


7.5.3 Handoff parameters

The quality of the video during the handoff is influenced by different factors. From an examination of the handoff phases and the design of the handoff at the client, we have identified four major parameters that significantly influence the handoff:

- the size of the buffer
- the initialization of the new stream
- the delay in the connection setup
- the length of the synchronization phase

These four parameters can be separated into two groups: the first two refer to the buffer, the last two to the timing.

Buffer size

The buffer at the client is used to cache data to smooth fluctuations in the transmission and to allow a steady video stream to the video player. This buffer also plays an important role during the handoff. Between the last packet that arrives over the old connection and the first packet that arrives over the new connection, the video player fetches data from the buffer. When the buffer has drained, the player stops and must wait for new data to arrive. The buffer capacity and the fill degree thus set an upper bound on the time allowed for the handoff.
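This upper bound amounts to a back-of-the-envelope calculation, sketched here under the assumption of a constant playout rate (the numbers are illustrative, not measurements):

```python
# The buffer drain sets an upper bound on the tolerable handoff time:
# the player can keep running from the buffer alone for as long as
# buffered packets last at the playout rate. Illustrative values only.

def max_handoff_time(buffered_packets, playout_rate_pps):
    """Seconds the player survives without new packets arriving."""
    return buffered_packets / playout_rate_pps

# e.g. 100 buffered packets at a playout rate of ~76 packets/s:
print(round(max_handoff_time(100, 76.0), 2))   # -> 1.32
```

If the connection setup plus start-packet signaling exceeds this bound, the player stalls, which is exactly the failure case discussed for the connection setup delay below.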

The buffer size and the fill degree can be steered by the application. However, we expect that the size of the buffer is set at application startup and remains unchanged while the client is receiving data. A dynamic adaptation of the buffer size is an issue that is not solely related to handoff: it is also possible to dynamically adapt the buffer size in reaction to different degrees of resource fluctuations. For a handoff, however, a dynamic adaptation only makes sense when an impending handoff is noted early enough to also fill the buffer with content. Since this is not always the case, we assume that the buffer size is fixed throughout the transmission.

Setting the initial buffer size requires a trade–off among multiple parameters. Since the buffer must deal with fluctuations, it should be made as large as possible. However, a large buffer must also be filled, so a large buffer implies a larger startup delay and a later display of the content. Finally, the buffer size may be limited by hardware constraints. Memory is typically abundant in desktop computers; however, modern cellular phones may also receive video streams, and we expect that the buffer size is limited for these devices.

Also note that the buffer size is usually steered by the "normal" operation, i.e., adaptive streaming, and is not optimized for a handoff. When a handoff is triggered, the handoff is usually started immediately. In this case, an increase in the buffer size is no longer useful since the old connection is already slowing down. For these reasons, we consider the buffer size an important parameter that is statically defined when the application starts up and is not changed at run time.

Stream initialization

When a handoff is triggered, a part of the video has already been transmitted via the old connection. The current position in the video stream must be made known to the new connection. The position in the video sequence can, e.g., be expressed by a packet number. This position is called the start packet in Figure 7.25. However, since the client has a buffer, it is possible to define the start packet in different ways.

The goal in defining the start packet is to optimize the combination of the old and the new data stream. Obviously, the new connection should neither transmit frames that have already been transmitted over the old connection, nor should there be a large gap of untransmitted packets. Determining the start packet depends on two parameters: the buffer quality and the round–trip time of the new connection. If the old connection has lost many packets just before the handoff, the new connection may try to retransmit the lost frames. (A special case exists if the old connection has unexpectedly shut down; then only the new connection is available to transmit data.) Such retransmission may allow an application to achieve a better frame rate, but there is the risk that resources are wasted. If the old connection has lost only a few frames, resources are wasted either to retransmit frames that have already arrived or to inform the server of the exact status of previously sent frames.

The round–trip time denotes the time for the first packet to arrive at the client over the new connection. During this time, the player consumes packets from the buffer; these packets should not be retransmitted. Often, the round–trip time can be determined when the connection is set up.

The buffer and the connection setup time influence the setting of the start packet. To get a first assessment of this influence, we use only two discrete values for each parameter, although both parameters take on continuous values over a larger range. For the buffer quality we use "low" and "high", and for the round–trip time "small" and "large". The combination of the two parameters leads to 4 test cases.

For each of these 4 cases, we define a strategy S to determine the start packet:

S0: The oldest packet in the buffer, i.e., the one with the lowest sequence number, is the start packet. This packet will be displayed next.

S1: The newest packet number in the buffer is the start packet.

S2: The packet that is expected to be displayed when the first packet arrives over the newconnection.


[Figure 7.26: Handoff timing: different methods to define the start packet (S0 to S3). The client buffer holds packets by sequence number between the oldest (fetched next by the player) and the newest inserted from the input stream; S0 marks the oldest packet, S1 the newest, and S2 and S3 add an offset to S0 and S1, respectively.]

S3: The packet that is expected to have arrived last at the buffer when the first packetarrives over the new connection.

Figure 7.26 depicts the four strategies to determine the start packet. The figure shows the client buffer and a set of video packets with their sequence numbers. The video packets stream from the left (from the server) to the right (to the video player).

S0 points at the first packet in the buffer (the oldest packet). S1 uses the latest packet inserted into the buffer. Note that the buffer is often only partially filled (typically between 50 and 80%) to balance fluctuations in the arrival rate.

S2 is based on S0, but adds an offset to the oldest frame in the buffer:

    offset = rtt × f

where rtt is the round–trip time between the client and the new server and f is the average frequency with which the player fetches the frames from the buffer. The offset corresponds to the number of packets that are consumed by the player until the first packet arrives via the new connection. S3 uses the same offset, but adds it to the newest packet in the buffer (S1).

Each combination of parameter values (high/low buffer quality, small/long connection setup delay) can be mapped to a strategy, as shown in Table 7.6. Strategies S0 and S2 try to compensate for the lost packets that are missing in the buffer. If the new connection starts sending immediately, it can be set up at the position of the packet that will be fetched next by the player (S0). If the setup takes longer, the round–trip time should be taken into account to avoid the sending of packets that have already been consumed when they arrive at the client. Similarly, S1 and S3 are used when the buffer is full. S3 avoids the sending of data that may already have been transmitted over the old connection.


delay | low buffer quality | high buffer quality
small | S0                 | S1
long  | S2                 | S3

Table 7.6: Strategy S, dependent on the buffer quality and the connection setup delay.
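The four strategies and the selection rule of Table 7.6 can be sketched as follows (function names and the parameter values are illustrative assumptions, not the Medusa implementation; the offset follows offset = rtt × f from above):

```python
# Sketch of the start-packet strategies S0–S3 and the selection rule
# of Table 7.6. Names and example values are hypothetical.

def start_packet(strategy, oldest_seq, newest_seq, rtt, fetch_rate):
    """Return the sequence number the new connection should start at.

    offset = rtt * f: the number of packets the player consumes until
    the first packet arrives over the new connection.
    """
    offset = int(rtt * fetch_rate)
    return {
        "S0": oldest_seq,            # oldest packet in the buffer
        "S1": newest_seq,            # newest packet in the buffer
        "S2": oldest_seq + offset,   # S0 plus the RTT offset
        "S3": newest_seq + offset,   # S1 plus the RTT offset
    }[strategy]

def select_strategy(buffer_quality, setup_delay):
    """Table 7.6: map (buffer quality, setup delay) to a strategy."""
    table = {("low", "small"): "S0", ("high", "small"): "S1",
             ("low", "long"): "S2", ("high", "long"): "S3"}
    return table[(buffer_quality, setup_delay)]

s = select_strategy("low", "long")             # -> "S2"
print(start_packet(s, 1450, 1500, 0.5, 76.0))  # -> 1488
```

With an RTT of 0.5 s and a fetch rate of 76 packets/s, S2 skips the 38 packets the player will have consumed before the new connection delivers its first packet.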

Connection setup delay

The connection setup delay influences the video quality during the handoff in conjunction with other parameters, especially the buffer size and the bandwidth difference between the old and the new connection. Clearly, the larger the bandwidth difference between the two connections, the greater the effect of the handoff. If, in the worst case, a handoff is triggered because the old connection went down, the connection setup delay becomes important together with the buffer size. In this case, the setup delay should not exceed the playing time that is kept in the buffer. If it does, the video stops because the buffer has run out of frames.

There is only a limited possibility for an application to influence the effects of the connection setup delay. The application can, however, start the search for alternative connections early, without establishing them. It is certainly not wise to start searching for alternative connections only when the old connection is already down (unless it goes down unexpectedly).

Length of the synchronization phase

During the synchronization phase, the client receives data over two connections concurrently. The client may profit from both connections if the content of the data streams is distinct, e.g., one connection sends even packet numbers and the other odd numbers. The client would have to configure the servers accordingly because the servers do not know which packets to send. Currently, this customization is not implemented; the servers decide independently which frames they send.

At the same time, concurrently maintaining two connections carries the danger of overloading resources. If both connections send their data over a common link, it may become a bottleneck. Similarly, the client may be overloaded by receiving packets from two sources.

Although it is interesting to discuss further topics in this area, this section only investigates the influence of the length of the synchronization phase on the quality of the handoff.

7.5.4 Influence of the parameters on the handoff

To investigate the effects of the various parameters and methods on the handoff quality, several experiments are performed. The evaluation scenario is presented in Figure 7.27.


[Figure 7.27: Experiment setup. The client at ETH receives data over connection A from a local server and is forced to hand off to either connection B (a second local server at ETH, short setup time) or connection C (a remote server at CMU, long setup time).]

At the beginning, a client at ETH receives data from a local server over connection A. The packets that arrive at the client are buffered before the client displays them. The average fill degree of the buffer is set to 50%. At some point in time, this connection is artificially slowed down and a handoff is triggered. The topology–awareness of Medusa allows the application to find two alternative connections it can switch to: another server at ETH or a server at CMU. The server at ETH has a short connection setup delay, the one at CMU a large delay. For the experiments, an MPEG movie containing audio and video tracks is used. The movie requires a transmission rate of about 650 Kbps. It consists of a total of 3200 packets, resulting in a playing time of 42 seconds. The local connection easily satisfies this bandwidth requirement. In contrast, the transatlantic connection from CMU to ETH cannot always satisfy the requirements. The handoff is (automatically) triggered after the arrival of packet 1500 at the client to create reproducible results. The new server is determined before the handoff so that no delay occurs for the search of an alternate connection path.
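As a quick consistency check of these parameters (the derived packet rate and packet size are approximations inferred from the stated numbers, not values reported by the experiments):

```python
# Sanity check: 650 Kbps, 3200 packets and 42 s playing time imply a
# packet rate of ~76 packets/s and a payload of ~1 KB per packet.
# Derived values are approximations, not measurements.

bitrate_bps = 650_000      # ~650 Kbps MPEG movie
packets = 3200
playing_time_s = 42

packet_rate = packets / playing_time_s                      # packets/s
packet_size_bytes = bitrate_bps * playing_time_s / packets / 8

print(round(packet_rate, 1), round(packet_size_bytes))      # -> 76.2 1066
```

A payload near 1 KB fits comfortably into a single UDP datagram, which is consistent with the packet-based transmission described above.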

Buffer size

The first experiment investigates the effects of the (statically configured) buffer size on the handoff. Figure 7.28 shows the arrival rate of 4 video transmissions at the client. The x–axis denotes the time (the time stamp when the packet is inserted into the buffer), the y–axis the sequence number of the packet. Three experiments are measured using strategy S0 with a varying buffer size of 50, 100 and 200 packets, respectively. A fourth experiment uses strategy S1 with a buffer size of 50. In contrast to S0, the sequence number of the start packet with strategy S1 is independent of the buffer size. A discussion of strategies S2 and S3 is omitted for reasons of simplicity: because they are both based on the discussed strategies, their effect can be deduced from the presented experimental data.

The video is transmitted over the old connection A (only one trace is shown before the handoff because it is similar for all runs). When the handoff is triggered, connection A hands off to connection B. The old connection A is immediately shut down. After triggering the handoff, no packets arrive at the client for about half a second. This delay comprises the new connection setup and the setting of the start packet. Because the server remains the same for all experiments, the delay is similar for all handoff strategies. During the setup time, the video player fetches the data from the buffer. A gap in the packet arrival in Figure 7.28 therefore does not necessarily imply a pause for the player.

[Figure 7.28: Packet sequence number (y-axis, 1400–1560) over time (x-axis, 25.5–30.5 sec) for three experiments with variable buffer size (50, 100, 200 packets) for strategy S0, and one experiment with strategy S1 (50 packets). The traces show connection A until the handoff is triggered and connection B afterwards.]

Strategy S1 starts sending at packet 1500, not retransmitting packets that have already been sent via connection A. Therefore, the packets that are most needed (those with a sequence number larger than 1500) arrive earlier than with the other strategies.

With strategy S0, the number of retransmitted packets depends on the buffer size: the larger the buffer, the more packets are retransmitted. In this experiment, the retransmission is unnecessary. The old connection A has a steady arrival rate before the handoff, and the buffer is therefore well filled. The number of packets received twice at the buffer (because of the retransmission) is as high as 98% between the start packet and packet 1500 for all experiments with strategy S0. These packets must be discarded. Such duplicates should be minimized because they may overload the client or the network.

The figure also shows an additional fact: the arrival time of packets with a given sequence number also depends on the buffer size. With larger buffers, the packets tend to arrive later than with small buffers, and packets arrive later with S0 than with S1. However, this is unimportant for the video player because the late arrival time always corresponds to a large buffer, and the buffer is capable of compensating for the (large) delay.

[Figure 7.29: The influence of the buffer size on the video with strategy S0. Packet sequence number (y-axis, 1400–1560) over time (x-axis, 25–32 sec) for connection A, connection B with buffer sizes 50 and 200, and the corresponding player fetch times.]

Figure 7.29 uses the same experiment setup, but the old connection A is artificially made lossy before the handoff. In contrast, the new connections send over an uncongested link.

In addition to the packet arrival times, the time a packet is fetched from the buffer by the video player is shown. Strategy S0 tries to compensate for a lossy connection before the handoff. This compensation is visible approximately between packets 1460 and 1500. With the 50–packet buffer, strategy S0 finds the correct start packet almost ideally: when the new packets arrive at the buffer, they are almost immediately consumed by the player. Because of the high losses of the old connection, only few packets must be discarded. With the larger buffer (200 packets), the start packet is set to a lower sequence number. The retransmission of more packets could improve the repair of the video stream even more. In this example, however, no benefit is visible: the packets over the new connection arrive too late at the buffer, as the player has already consumed these packets. Therefore, most of the transmitted packets at the beginning are discarded. Because the new connection delivers the data faster than the player consumes it, the new connection finally manages to deliver its data early enough. In the figure, the new connection B/200 reaches the player at about packet 1470.


188 Chapter 7: A topology–aware video application

This experiment shows that the different strategies have different effects on the arrival rate of the packets. A correct strategy can reduce the number of packets discarded by the client; unnecessary transmissions put load on the client and on network resources. In addition, the choice of strategy depends on the buffer size. Because applications usually cannot change the buffer size at run time, the buffer size should be taken into account when selecting a strategy.

Finally, Figure 7.29 also shows the original motivation for a handoff. Connection A is obviously no longer capable of maintaining a constant video stream because many packets are not received by the buffer. This loss is noticeable when watching the video: the stream is no longer smooth and the images are of poor quality. A detailed analysis of the received video shows that not even all I–frames were transmitted. After the handoff, in contrast, the video streams smoothly again. This is also visible in Figure 7.29: the players fetch the packets at regular, small intervals.

Connection setup delay

As a second experiment, the influence of the connection setup delay on the packet arrival is measured. Here, a handoff from connection A to connection B (the local server) is compared to a handoff to connection C (the server at CMU). The measured connection setup delay is about 50 ms for the local server and 2000 ms for the remote server. Strategy S0 is used with a buffer size of 50 packets. Figure 7.30 shows the packet sequence number on the y–axis and the time on the x–axis. For each packet, the arrival time and the time the packet is fetched by the player are shown. In contrast to the almost linear arrival rate of the packets (for all connections), the player fetches the data in bunches, resulting in a step–like function.

The behavior of the player is worth discussing. Handing off to the local server at ETH (connection B) delays the player (Player ETH) only slightly. Although the player seems to stall briefly between packets 1480 and 1500, no effects are visible to an untrained user watching the movie. In contrast, the handoff to the CMU server (connection C) is clearly noticeable: the player waits after packet 1500 until the new packets arrive via connection C.

The discussion shows that strategy S0 is not optimal in either case. When handing off to connection B, most packets are dropped because the buffer is filled. Nevertheless, because of the small delay, no effects are visible in the final movie. This shows that even a non–optimal strategy selection may go unnoticed by the user (always under the assumption that the client does not suffer from receiving packets twice!). In contrast, the handoff to connection C has a negative impact on the video player: the movie stops and must wait until new packets arrive at the buffer. The waiting time could have been reduced with strategy S1. With S1, the new stream starts at the handoff packet (1500), so the player does not have to wait while packets before 1500 arrive only to be discarded.


[Figure: packet sequence number (y–axis) vs. time [sec] (x–axis); curves: connection A, video player, connection B, connection C, player B, player C; handoff point marked]

Figure 7.30: Handoff to the local filter/server (ETH, connection B) and to the remote server (CMU, connection C).

Influence on video quality

As a third set of experiments, the influence of the strategy on the video quality is investigated. The number of correctly received frames is taken as a measure of video quality. For each strategy S0 to S3, the connection setup time and the length of the synchronization phase are varied. In experiments 1 and 2, the old connection is torn down immediately after the handoff request, so the synchronization phase is completely omitted. In experiments 3 and 4, in contrast, the old connection is maintained until packet 1800; by that time, the new connection is certain to be fully sending its data. Experiments 1 and 3 perform a handoff to a local server, with a measured connection setup delay of 50 ms, whereas experiments 2 and 4 have a delay of 2000 ms.

Only packets between 1400 and 1800 are used for the measurement because before and after the handoff the number of received frames is the same for all experiments.

In addition, the number of frames that are discarded during the handoff phase (i.e., between frames 1400 and 1800) is measured. Figures 7.31(a) and 7.31(b) show the number of correctly received frames between sequence numbers 1400 and 1800. The results are shown as a percentage of the ideal case. The figures show that a handoff to a nearby server (experiments 1 and 3) allows more frames to be received correctly


[Figure: % of correctly received frames (y–axis, 0–100) for strategies S0–S3, local vs. remote server; (a) immediate connection shutdown (experiments 1 and 2), (b) delayed connection shutdown (experiments 3 and 4)]

Figure 7.31: Video quality (the number of correctly received frames) for immediate (a) and delayed (b) shutdown of the old connection.

than from a remote server. In experiments 1 and 3, the differences between the strategies are quite small: the handoff is fast enough to maintain the constant data stream at the video player. However, experiment 4 shows that the numbers may be similarly high even for a handoff to a remote server if the old connection continues to send (under the assumption that the client can support two streams at once). In experiment 2, connecting to the remote server takes too long, and the handoff cannot avoid the packet loss. However, the influence of the correct strategy is visible in this experiment.

Figures 7.32(a) and 7.32(b) show a great diversity in the number of packets discarded during the handoff. In experiment 1, strategies S0 and S2 have a higher discard rate because of the frames that are retransmitted unnecessarily. With strategies S1 and S3, fewer frames have to be discarded. Experiment 2 shows very few discarded packets because the packets arrive too late. In contrast, in experiment 3, both connections send all frames, so the duplicates must be discarded. Finally, in experiment 4, few packets are discarded because the new connection C takes a long time to start sending.

This last experiment shows that the quality of the video stream can be improved by prolonging the synchronization phase. However, if the synchronization phase is longer than the time needed to set up the new connection, the improvement comes at the cost of discarding duplicate frames. No negative effect was visible in our experiments because both the network and the client had enough resources to support two simultaneous connections. If the application is aware of a congestion problem, it can shut down the old connection as soon as the first packet arrives over the new connection.
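During the synchronization phase, both connections may deliver the same sequence numbers, and some arriving packets refer to positions the player has already consumed. The resulting discard decision can be summarized in a minimal sketch (illustrative names and structure, not the Medusa client code; it only assumes the client tracks the next sequence number the player will fetch):

```python
def accept_packet(seq: int, next_player_seq: int, buffered: set) -> bool:
    """Keep a packet only if the player still needs it and it has not
    already arrived over the other connection."""
    if seq < next_player_seq:
        return False        # player already consumed this position: discard
    if seq in buffered:
        return False        # duplicate from the parallel connection: discard
    buffered.add(seq)
    return True

buffered = set()
assert accept_packet(1500, 1480, buffered)        # needed, first arrival
assert not accept_packet(1500, 1480, buffered)    # duplicate over old path
assert not accept_packet(1470, 1480, buffered)    # already consumed
```

In this model, a long synchronization phase increases the number of duplicates rejected by the second check, which corresponds to the discard rates shown in Figure 7.32.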


[Figure: % of discarded frames (y–axis, 0–100) for strategies S0–S3, local vs. remote server; (a) immediate connection shutdown (experiments 1 and 2), (b) delayed connection shutdown (experiments 3 and 4)]

Figure 7.32: The number of frames discarded during the handoff (between frame 1400 and frame 1800).

7.5.5 Conclusions

A handoff is a technique that extends the capabilities of pure end–to–end applications. For a handoff, more information about the network is needed than just the current transmission path: in addition to knowledge about the topology, resource information is needed as well. To detect an impending handoff, the bandwidth of the current connection must be compared to the bandwidth of alternative paths. A monitoring system, as typically used in a topology–aware environment, can even be used to trigger a handoff. Finally, the connection setup delay may become an important parameter that influences the video quality during a handoff. When an existing connection suddenly goes down and an alternative connection must be found as fast as possible, the selection of the fastest accessible source may greatly influence the handoff quality.
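A minimal sketch of such a monitoring–based trigger, assuming the monitoring system delivers bandwidth estimates for the current connection and for the best alternative path (the function name and the safety margin are illustrative assumptions, not values from the dissertation):

```python
def should_handoff(current_bw: float, best_alt_bw: float,
                   required_bw: float, margin: float = 1.2) -> bool:
    """Trigger a handoff when the current connection can no longer
    sustain the stream and an alternative path is clearly better.
    The margin avoids flapping between two nearly equal paths."""
    return current_bw < required_bw and best_alt_bw > margin * current_bw

# Bandwidths in Mbit/s; a stream requiring 1.5 Mbit/s:
assert should_handoff(current_bw=0.4, best_alt_bw=1.6, required_bw=1.5)
assert not should_handoff(current_bw=1.6, best_alt_bw=2.0, required_bw=1.5)
```

A real trigger would additionally weigh the connection setup delay of the alternative path, as the experiments above show that a slow setup can dominate the handoff quality.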

Handoffs can be performed to improve the quality of a transmission or even to ensure the continuation of a transmission. A handoff is therefore another approach to deal with service breakdowns. As long as the Internet provides only a best–effort service model, applications have to prepare for performance problems and even service breakdowns.

One advantage of the presented implementation is that only the client functionality is extended; all other components (server, filters) are unchanged. Handoffs provide a practical approach to deal with those service breakdowns that are outside the range of events that can be addressed by adaptivity.

The effects of the handoff, i.e., the connection setup and synchronization, should not be noticed by the client. This chapter investigates the effects of various parameters on the video stream during the handoff. The connection setup time of the new connection as well as the size and the fill degree of the buffer before the handoff have a noticeable influence on the handoff quality.

Because the application cannot influence these parameters, four strategies for setting the start packet are investigated (the start packet determines the first packet to be sent over the new connection). In addition, the application may influence the time the old connection is maintained, if that option exists, i.e., unless the old connection has been closed.
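As a hedged sketch of such a start–packet decision (only S0 and S1 are described in enough detail in this chapter to model; the function name and signature are illustrative, not the Medusa implementation):

```python
def start_packet(strategy: str, oldest_buffered: int, handoff_seq: int) -> int:
    """Choose the first sequence number the new connection should send.

    S0: start at the oldest packet still in the client buffer, which can
        repair losses on the old connection but may resend packets the
        player has already consumed.
    S1: start at the handoff packet itself, which avoids duplicate
        transmissions but does not repair earlier losses.
    (S2 and S3 are evaluated in the text but not sketched here.)
    """
    if strategy == "S0":
        return oldest_buffered
    if strategy == "S1":
        return handoff_seq
    raise ValueError(f"strategy {strategy} not modeled in this sketch")

# Example with sequence numbers in the range of Figure 7.30:
assert start_packet("S0", 1460, 1500) == 1460
assert start_packet("S1", 1460, 1500) == 1500
```

The trade–off measured above follows directly from this choice: S0 maximizes repair at the risk of discarded duplicates, S1 minimizes duplicates at the risk of a stalled player.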

The evaluation shows that a long connection setup time can be compensated by maintaining the old connection until the new connection is fully sending. If the old connection is maintained too long, duplicate packets arriving over both connections must be discarded. The unnecessary transmission of duplicate packets along different paths is problematic: these transmissions may have a negative influence on the network and/or the client if they cause or aggravate overloads. The start packet strategy allows a trade–off between the number of correctly received packets and the number of packets that must be discarded. By choosing the oldest packet, e.g., the application can compensate for a lossy connection before the handoff.

Handoffs provide the application developer with another option to deal with undesirable changes of connection properties if an alternative connection can be identified. Handoffs are practical and can be realized even in today's environment without requiring changes to servers or filters. The strategies that orchestrate a handoff can be based on a range of parameters, and handoffs are therefore another reason to encourage the deployment of topology–awareness.

7.6 Conclusions

The unified architecture of Octopus supports applications with different communication requirements. Medusa is an MPEG–1 streaming application with several such requirements, and this chapter shows that Octopus is well suited to address them. Octopus supports bandwidth sensitivity through server selection and alternative path streaming. It implements network–based multicast, which allows bandwidth to be shared and hence used more efficiently. Octopus supports dynamic handoff of streams at run time when a connection degrades. Medusa can adapt the data inside the network, e.g., to avoid overloading the server resources. In addition, a combination of multicast and adaptation allows every substream to be adapted individually to the available resources of the receiving host. And finally, Octopus supports multipath streaming, which allows multiple resources to be combined to better meet the resource expectations of the application. In contrast to previous work, Octopus provides these communication features in a single infrastructure.

The implementation of Medusa shows that Octopus is easy to use. The interface between the application and the Octopus core (the management layer) consists of a single, comprehensive call which replaces a given URL with a new URL that contains the new transmission paths. To express its communication requirements, the application must additionally extend the evaluator and selector abstractions of the Octopus framework. However, Octopus also provides several concrete implementations of both abstractions; an extension of these abstractions by the application programmer is therefore only necessary when these implementations are not suited for the application. Finally, the application must provide the necessary Octopus services, which implement the communication mechanisms for the particular application (e.g., the adaptation). We consider the implementation requirements a minor challenge for an application programmer. The parts that must be programmed are either well defined in their functionality (e.g., the evaluator), or the functionality is closely related to the application process (e.g., adaptation) and may be needed anyway (i.e., even without topology–awareness). We therefore consider the overhead of making an application topology–aware acceptable.

The implementation of Medusa and its integration with Octopus show that topology–awareness is not just a concept on paper. The deployment of Medusa and the evaluation of the mechanisms in the Internet show that the different parts of the Octopus framework work well together. We note a clear improvement of the video quality at the client when topology–aware mechanisms are used, compared to previous mechanisms.

The ease of use of the Octopus communication mechanisms and the benefit of topology–awareness do not mean that no further effort is needed within the application to improve the application quality. Topology–aware adaptation is only possible because adaptive filters had previously been deployed on an end–to–end basis. The comparison of the two implementations of multipath streaming has also shown that the information available within the application context allows an additional improvement of the application quality, compared to management–layer multipath streaming. However, such an additional improvement comes only at the price of a software engineering effort.

In general, we note that the integration of the legacy application into Octopus was only so easy and successful because the design already foresaw the need to model the filter as a separate entity. It is therefore important that future applications that may use the features of topology–awareness are designed accordingly from scratch. However, since the idea of integrating communication into the application is relatively new, little research has been published that studies this integration and provides guidance to the community on how to build such applications.


8 Conclusion

8.1 Conclusions

Customization and responsibilities

Topology–awareness allows the deployment of customized communication in a network. This customization includes the use of different communication operations, such as multicast, mobility, etc., as well as awareness of application–specific resources, such as bandwidth, latency, etc. Customization is a major step away from the current simple, "one–size–fits–all" service offered by the Internet. The possibility to customize communication leads towards the deployment of more sophisticated distributed applications.

Customization, however, inherently introduces new responsibilities to define and to guide communication in an uncontrollable, large, dynamic network such as the Internet. These responsibilities can no longer be hidden behind transparent layers; they have to be taken individually by topology–aware applications. The Octopus framework provides vital help with these responsibilities by offering guidelines for the design and use of topology–aware applications and by including frequently used algorithms in its core classes.

However, in spite of all this help, the final responsibility remains within the application context. The application designer and/or the user have to be aware of which resources are needed by an application. For multipath streaming, e.g., knowledge is needed about the total bandwidth required, about the minimal bandwidth that one path must have, and about delay and jitter constraints. A part of this information may be statically provided by an application; e.g., the required bandwidth of a video can be determined by analyzing the MPEG layout. Other factors, however, require additional dynamic interactions between the application and the network at run time. These interactions can only be partially modeled in the framework. Therefore, building sophisticated, distributed applications remains a challenging effort in the future as well.
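The statically known part of such a resource specification could be captured as follows (a sketch with illustrative field names and example values; the dissertation does not prescribe this structure):

```python
from dataclasses import dataclass

@dataclass
class MultipathRequirements:
    """Resource needs for multipath streaming, as discussed above."""
    total_bandwidth_kbps: int      # e.g., derived from the MPEG layout
    min_path_bandwidth_kbps: int   # minimal bandwidth one path must have
    max_delay_ms: int              # delay constraint
    max_jitter_ms: int             # jitter constraint

# A hypothetical MPEG-1 stream: 1.5 Mbit/s total, at least 300 kbit/s
# per path, 200 ms delay and 50 ms jitter bounds.
reqs = MultipathRequirements(1500, 300, 200, 50)
assert reqs.total_bandwidth_kbps >= reqs.min_path_bandwidth_kbps
```

The dynamic factors mentioned above (run–time interactions with the network) would complement, not replace, such a static specification.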

However, this dissertation has also shown that the effort of building topology–aware applications pays off. We have studied the benefit of topology–aware mechanisms for two applications. Note, however, that we have only applied basic topology–aware mechanisms. The MPEG application, e.g., has not been substantially changed from its original code to make it topology–aware; especially the feedback–based adaptation mechanism has not been changed. Even without these changes, we have been able to show a significant improvement in the application performance. By additionally integrating new options from topology–awareness, e.g., new information sources, we expect an even larger gain in performance. We therefore conclude that a trade–off can be made between the effort of tuning and exploiting topology–aware features and the performance gain.

Network information

Topology–awareness depends on the availability of information about the network topology and performance. In a large and dynamic network such as the Internet, an application is far from having perfect information. It is especially hard to get information quickly and accurately.

An application–centric approach, as taken in this dissertation, has the benefit that customization of the information is possible. Every application can specify the metric it is interested in and at which rate measurements must be made. Similarly, each application can search on its own for the network topology information. Such an application–centric approach has the advantage that no generalization is needed, i.e., the information gathering mechanisms can be targeted to the needs of a single application. Problems, such as the need for scalability, can thereby be addressed at a very early stage. Similarly, measurements can be made at the rate that is most suited for a single application, so that no trade–off has to be found to satisfy multiple application needs at the same time.

On the other hand, if topology–awareness is to be deployed on a larger scale, it is not reasonable to let every application spend time and resources (most measurements are active) to get the desired information. A general network information architecture is therefore needed. When building such an architecture, two conclusions from this dissertation should be taken into account. First, build a customizable system. Customization does not mean that every application should have direct access to and influence on the information collection. But an application should, somewhere between the information collection and the information processing, be able to specify which information it is interested in and possibly filter out unnecessary information. Second, we expect that a trade–off has to be made between the speed at which information can be provided and the accuracy of the information. This trade–off again has to be negotiable by applications: especially latency–limited applications need information very fast, whereas long–running applications can tolerate an initial setup time.
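A sketch of how such a negotiable speed/accuracy trade–off could appear at the query interface (purely illustrative: the function, the cache structure, and the deadline cut–off are assumptions, not part of any described system):

```python
def bandwidth_estimate(cache: dict, measure, deadline_ms: int):
    """Return a bandwidth estimate within the caller's deadline.

    A latency-limited application passes a tight deadline and accepts a
    cached, possibly stale value; a long-running application tolerates
    the setup time of a fresh (active) measurement.
    """
    if deadline_ms < 100:      # assumed: too tight for an active measurement
        return cache.get("bw")
    return measure()           # slow but accurate active measurement

cache = {"bw": 500}                                        # stale estimate
assert bandwidth_estimate(cache, lambda: 480, deadline_ms=50) == 500
assert bandwidth_estimate(cache, lambda: 480, deadline_ms=5000) == 480
```

The key design point is that the deadline is chosen by the application, not fixed by the information service.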

Layering concept in networks

Topology–awareness advocates a re–thinking of the layering concept that has so far been dominant in the network community. Any kind of transparent layering, however, limits the chance to build smart applications. The same layering is also visible in the protocol stack and even in (application–layer) overlay networks. This dissertation shows that there is an alternative to transparent layering, namely a controlled and organized architecture that still hides many details behind a semi–transparent layer with a well–defined interface but allows a controlled access to selected resources. Our conclusion is therefore that it is well possible to replace the strict transparency without exposing applications to an undesired complexity of the underlying layers.

Evolutionary solutions

Topology–awareness is a concept that advocates an evolutionary solution to address the need for complex communication operations. The restriction to access only resources available at the application layer has been motivated by the goal of building a real, deployable solution. With that motivation, we have been able to show that it is technically possible to deploy topology–awareness in the current Internet, and we have shown some of the benefits for topology–aware applications. Nevertheless, our evaluation has been limited by the few sites that accepted to run Octopus code, in spite of our restriction to access only application–layer resources. To further deploy topology–awareness, more sites must be available. Flexibility would thereby be needed not only from universities, but also from other "resource owners", such as ISPs or even private companies. This opening of resources raises many security concerns. However, given the recent advances in dynamic code downloading and authentication, and given that the code only needs to access resources at the application layer, we hope that topology–awareness can soon be deployed on a larger scale.

8.2 Future Work

The customization and the flexibility of Octopus open many possibilities to extend the work of this dissertation. We see the focus especially on improvements of existing Octopus parts as well as on the use of Octopus for different applications and networks. In contrast, we currently see little motivation to re–think the structure of the Octopus framework because the framework has proven to be usable for different applications. A re–design of the framework may become necessary, however, when new requirements emerge from new applications and new networks.

Network layer

At the network layer, improvements are highly desired to provide a faster and more accurate access to network topology and performance information.


Global network information infrastructure Proposals for building global information systems have primarily been made for network management and research purposes. The information they provide may not be suited for topology–aware applications, e.g., because it is too detailed. However, recent advances in other domains, e.g., in peer–to–peer networks, may be taken into consideration to build such a network information system. We envision a distributed, hierarchical system (similar to DNS) that provides topology and performance information at different levels of coarseness and accuracy. The application may gradually request information at a finer granularity for a limited number of network clouds.

Application collaboration A second option to extend the availability of network information is to integrate information from other systems or from the application. If information is collected at a network site for administrative purposes, it can be fed into topology–aware applications, e.g., by means of the collector database or the management–layer database. Similarly, SPAND–like systems can be integrated at the management layer. Both administration information and SPAND can be used to set up an application as well as to drive an application at run time.

Information from other layers The findings of this dissertation regarding the drawbacks of transparently layered solutions can also be used in other contexts. Previous research has already shown that bandwidth information can be extracted from TCP mechanisms, e.g., timers or congestion control. At an even lower layer, routers that are in danger of becoming congested can notify passing streams at a higher level. Exploring how the Internet should or could be changed to pass performance information in a controlled way through the protocol stack, and how this information could be used by topology–aware applications, would be a large but challenging project.

New network architectures In addition to improving Internet network information, a deployment of topology–awareness in other kinds of networks is challenging as well. First, wireless networks have different metrics that must be measured, such as packet loss. These metrics may be reported directly to the application. However, packet loss also has an influence on the application–perceived bandwidth or latency. Extending the Octopus framework to model this influence and to allow applications to react appropriately to changes would be a challenging extension to the work of this dissertation.

Similarly, new challenges arise in an ad–hoc environment. An ad–hoc network typically has no fixed infrastructure; the development of a topology collector for ad–hoc networks is therefore an interesting task. In addition, the changes of the topology also influence the latency and the throughput in the network. A combination of the topology change information and the performance metrics can easily be implemented inside the Octopus core. This combination may be used by topology–aware applications to influence the packet routing and/or react to changes.

Application layer

At the application layer, future work may also be directed towards optimizing the described application algorithms as well as integrating new applications.

Optimization An optimization of the current algorithms includes a fine tuning of the adaptation. The adaptation algorithm of Medusa, e.g., is currently based on client feedback about the loss rate. Alternatively, topology–aware bandwidth measurements could be taken into account. Finally, new abstractions that relieve applications from processing measurements can be integrated. An event–driven mechanism, e.g., that allows an application to specify that it wants to be notified if and only if the difference in bandwidth exceeds a certain threshold, has been proposed previously. We expect that such mechanisms first provide a higher abstraction to the application and thus make the use of topology–aware mechanisms easier, and second increase the application performance as compared to the current algorithms.
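Such an event–driven abstraction could be sketched as follows (an illustrative class, not a proposed Octopus interface; the threshold semantics follow the description above):

```python
class BandwidthNotifier:
    """Invoke a callback only when the bandwidth has changed by more than
    an application-specified threshold since the last notification."""

    def __init__(self, threshold_kbps: float, callback):
        self.threshold = threshold_kbps
        self.callback = callback
        self.last_notified = None

    def report(self, bw_kbps: float):
        # Measurements within the threshold of the last notified value
        # are absorbed and never reach the application.
        if (self.last_notified is None
                or abs(bw_kbps - self.last_notified) > self.threshold):
            self.last_notified = bw_kbps
            self.callback(bw_kbps)

events = []
n = BandwidthNotifier(threshold_kbps=50, callback=events.append)
for bw in (1000, 1020, 1100):
    n.report(bw)
assert events == [1000, 1100]   # 1020 is within the threshold: no event
```

The application thus trades measurement-processing load for a coarser, but usually sufficient, view of the bandwidth.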

Other topology–aware applications We consider a set of applications other than the presented ones suited to use topology–aware mechanisms. First, however, note that the two presented applications belong to important application groups. Medusa belongs to the group of multimedia applications, which also includes video on demand, teleteaching and telemedicine. Their benefit is typically straightforward since these applications are typically long–running, which compensates for the setup delay, and their bandwidth requirements are high, so that adaptation is required. Depending on their connectivity pattern, they may also be considered collaborative applications.

An application that would be interesting to integrate is the WWW. The WWW causes a large part of the Internet traffic, and more sophisticated sites offer a large amount of data to be transferred, including video and large sets of images. The WWW is a special case in that it already offers proxies inside the network whose functionality could be extended to become Octopus nodes. These proxies could additionally be used to build a hierarchical network information system.

A second group of applications that can use Octopus are large databases, data warehouses or Internet data centers, where a (typically small) user request is first processed and may result in a large volume of data to be sent back, e.g., a set of images or even a video stream. We consider this group of applications typical for standalone Octopus for two reasons. First, users from any place in the world can contact the servers where the data is stored; that is, an integration of other systems may be difficult. In addition, this application group also has a good possibility to hide the startup latency of Octopus: the processing time of the request can be used to gather network information. When the data is ready to be sent, the best transmission path found by Octopus can be used.

A third application group are applications that use the peer–to–peer paradigm, i.e., where every participant acts as sender and receiver at the same time. Since every participant in a peer–to–peer network can be considered an Octopus node, P2P applications could be integrated into Octopus. We expect that an integration of currently deployed large P2P applications, such as Napster or Gnutella, could profit from the network information provided by Octopus. First, the time to search for an object could be customized by the user, thereby trading off search time against the possibility of finding an object with a better download time. Second, the network information can support the selection of the fastest download location. While the second issue can be achieved almost straightforwardly with the current Octopus mechanisms, the first issue opens a lot of new research questions. Allowing a user to specify the search time has a direct impact on the traffic imposed onto a network: if all users select a very large search time, many search messages must be sent over a large distance in the network, thereby imposing load on the network and the nodes. Finding an efficient solution or a trade–off is a challenging issue that goes beyond the pure use of Octopus.

Finally, the Grid is an evolving infrastructure that combines multiple computational resources worldwide. Grid applications must distribute their tasks in a way that best suits their requirements. Both their computational requirements and their communication requirements are heterogeneous. We consider Octopus a system that could act as a middleware for the Grid because it could integrate Grid–specific network services at the lower layer. At a higher layer, calls to the Octopus management layer allow an application to express its resource requirements in an easy way.

Page 217: Design of topology–aware networked applications - CiteSeerX

Bibliography

[1] Akamai. www.akamai.com.

[2] T. Alrabiah and T. Znati. QoS-based routing for online multicasting. Technical report, Department of Computer Science, University of Pittsburgh, Pittsburgh, February 1999.

[3] E. Amir, S. McCanne, and R. Katz. An active service framework and its application to real-time multimedia transcoding. In Proceedings of ACM SIGCOMM ’98, pages 178–189, Vancouver, BC, Canada, August 1998.

[4] D. Andersen, H. Balakrishnan, M. Kaashoek, and R. Morris. Resilient overlay networks. Operating Systems Review, 35(5):131–145, December 2001.

[5] S. Basu, A. Mukherjee, and S. Klivansky. Time series models for Internet traffic. In Proceedings of IEEE Infocom ’96, pages 611–620, San Francisco, CA, March 1996.

[6] R. Bellman. Dynamic programming. Princeton University Press, Princeton, NJ, 1957.

[7] F. Berman, R. Wolski, H. Casanova, W. Cirne, H. Dail, M. Faerman, S. Figueira, J. Hayes, G. Obertelli, J. Schopf, G. Shao, S. Smallen, N. Spring, A. Su, and D. Zagorodnov. Adaptive computing on the Grid using AppLeS. Submitted to IEEE Transactions on Parallel and Distributed Systems, November 2001.

[8] S. Bhattacharjee, M. Ammar, E. Zegura, V. Shah, and Z. Fei. Application-layer anycasting. In Proceedings of IEEE Infocom ’97, pages 1388–1396, Kobe, Japan, April 1997.

[9] S. Bhattacharjee, K. Calvert, and E. Zegura. An architecture for active networking. In Proceedings of the 7th IFIP Conference on High Performance Networking (HPN ’97), pages 265–279, White Plains, NY, April 1997.

[10] J. Bolliger. A framework for network-aware applications. PhD thesis, ETH Zurich, April 2000. Nr. 13636.

[11] J. Bolliger and T. Gross. Bandwidth monitoring for network-aware applications. In Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing (HPDC-10), pages 241–251, San Francisco, CA, August 2001.

[12] G. Box, G. Jenkins, and G. Reinsel. Time Series Analysis: Forecasting and Control. Prentice Hall, Englewood Cliffs, NJ, 3rd edition, 1994. ISBN 0130607746.

[13] R. Braden, L. Zhang, S. Berson, S. Herzog, and S. Jamin. Resource ReSerVation Protocol - Version 1 Functional Specification. Request for Comment (RFC 2205), September 1997.

[14] Cooperative Association for Internet Data Analysis (CAIDA). http://www.caida.org/.

[15] J. Cao, W. Cleveland, D. Lin, and D. Sun. On the nonstationarity of Internet traffic. In Proceedings of ACM Sigmetrics ’01, pages 102–112, Cambridge, MA, June 2001.

[16] R. Carter and M. Crovella. Server selection using dynamic path characterization in wide-area networks. In Proceedings of IEEE Infocom ’97, pages 1014–1021, Kobe, Japan, April 1997.

[17] Y. Chae, S. Merugu, E. Zegura, and S. Bhattacharjee. Exposing the network: support for topology-sensitive applications. In Proceedings of the 3rd IEEE Conference on Open Architectures and Network Programming (OPENARCH 2000), pages 65–74, Tel Aviv, Israel, March 2000.

[18] Y. Chawathe. An architecture for Internet content distribution as an infrastructure service. PhD thesis, University of California, Berkeley, December 2000.

[19] Y. Chawathe, S. McCanne, and E. Brewer. RMX: reliable multicast for heterogeneous networks. In Proceedings of IEEE Infocom 2000, pages 795–804, Tel Aviv, Israel, March 2000.

[20] S. Chen and K. Nahrstedt. An overview of quality-of-service routing for the next generation high-speed networks: problems and solutions. IEEE Network Magazine, special issue on transmission and distribution of digital video, 12(6):64–79, November 1998.

[21] Y. Chu, S. Rao, S. Seshan, and H. Zhang. Enabling conferencing applications on the Internet using an overlay multicast architecture. In Proceedings of ACM SIGCOMM ’01, pages 55–67, San Diego, CA, August 2001.

[22] Y. Chu, S. Rao, and H. Zhang. A case for end system multicast. In ACM Sigmetrics 2000, pages 1–12, Santa Clara, CA, June 2000.

[23] D. Clark and M. Blumenthal. Rethinking the design of the Internet: The end-to-end arguments vs. the brave new world. ACM Transactions on Internet Technology, 1(1):70–109, August 2001.

[24] S. Deering. Host extensions for IP multicasting. Request for Comment (RFC 1112), August 1989.

[25] Digital Island. www.digitalisland.com.

[26] E. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:269–271, 1959.

[27] P. Dinda. Online prediction of the running time of tasks. In Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing (HPDC-10), pages 383–394, San Francisco, CA, August 2001.

[28] P. Dinda, T. Gross, R. Karrer, B. Lowekamp, N. Miller, P. Steenkiste, and D. Sutherland. The architecture of the Remos system. In Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing (HPDC-10), pages 252–265, San Francisco, CA, August 2001.

[29] Exodus. www.exodus.com.

[30] Z. Fei, S. Bhattacharjee, E. Zegura, and M. Ammar. A novel server selection technique for improving the response time of a replicated service. In Proceedings of IEEE Infocom ’98, pages 783–791, San Francisco, CA, April 1998.

[31] R. Fontana. Video multicast. Master’s thesis, Laboratory for Software Technology, ETH Zurich, October 2000.

[32] L. Ford and D. Fulkerson. Flows in Networks. Princeton University Press, Princeton, NJ, 1962.

[33] M. Fowler and K. Scott. UML Distilled: Applying the standard object modeling language. Addison-Wesley Publishing Company, Inc., 1997.

[34] A. Fox. The case for TACC: scalable servers for transformation, aggregation, caching and customization, 1998. Qualifying Exam Proposal.

[35] A. Fox, S. Gribble, E. Brewer, and E. Amir. Adapting to network and client variability via on-demand dynamic distillation. In Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VII), pages 160–173, Cambridge, MA, October 1996.

[36] A. Fox, S. Gribble, Y. Chawathe, and E. Brewer. Adapting to network and client variations using infrastructural proxies: lessons and perspectives. IEEE Personal Communications, 5(4):10–19, September 1998.

[37] P. Francis. Yallcast: extending the Internet multicast architecture. Technical report, NTT Information Sharing Platform Laboratories, September 1999.

[38] M. Fry and A. Ghosh. Application level active networking. Computer Networks, 31(7):655–667, April 1999.

[39] A. Ghosh, M. Fry, and J. Crowcroft. An architecture for application layer routing. In International Workshop on Active Networks (IWAN 2000), pages 71–86, Tokyo, Japan, October 2000.

[40] R. Govindan, C. Alaettinoglu, and D. Estrin. A framework for active distributed services. Technical Report TR 98-669, University of Southern California, Los Angeles, CA, January 1998.

[41] S. Gribble, M. Welsh, R. von Behren, E. Brewer, D. Culler, N. Borisov, S. Czerwinski, R. Gummadi, J. Hill, A. Joseph, R. Katz, Z. Mao, S. Ross, and B. Zhao. The Ninja architecture for robust Internet-scale systems and services. Computer Networks, 35(4):473–497, 2001.

[42] N. Groschwitz and G. Polyzos. A time series model of long-term NSFNET backbone traffic. In Proceedings of IEEE Conference on Communication (ICC ’94), volume 3, pages 1400–1404, New Orleans, May 1994.

[43] M. Hemy, P. Steenkiste, and T. Gross. Evaluation of adaptive filtering of MPEG system streams in IP networks. In Proceedings of the IEEE International Conference on Multimedia and Expo 2000 (ICME 2000), pages 1313–1317, New York, NY, August 2000.

[44] T. Hug. Managing active services with Jini. Master’s thesis, Laboratory for Software Technology, ETH Zurich, October 2000.

[45] F. Hwang, D. Richards, and P. Winter. The Steiner Tree Problem. Annals of Discrete Mathematics, Vol 53. North-Holland, Amsterdam, Netherlands, 1992.

[46] R. Jain. The art of computer systems performance analysis. Techniques for experimental design, measurement, simulation, and modeling. John Wiley & Sons, Inc., 1991. ISBN 0-471-50336-3.

[47] J. Jannotti, D. Gifford, K. Johnson, M. Kaashoek, and J. O’Toole. Overcast: reliable multicasting with an overlay network. In Proceedings of the 4th Symposium on Operating System Design and Implementation (OSDI 2000), pages 197–212, San Diego, CA, October 2000.

[48] Java Advanced Imaging. http://java.sun.com/products/java-media/jai/index.html.

[49] R. Karrer and T. Gross. Dynamic handoff of multimedia streams. In Proceedings of the 11th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 2001), pages 125–133, Port Jefferson, NY, June 2001.

[50] R. Karrer and T. Gross. Location selection for active services. Cluster Computing: the Journal of Networks, Software and Applications, 5(1):365–376, July 2002. Kluwer Academic Publishers.

[51] S. Kasera, S. Bhattacharyya, M. Keaton, D. Kiwior, J. Kurose, D. Towsley, and S. Zabele. Scalable fair reliable multicast using active services. IEEE Network Magazine, 14(1):48–57, January 2000.

[52] D. Katabi and J. Wroclawski. A framework for scalable global IP-anycast (GIA). In Proceedings of ACM SIGCOMM 2000, pages 3–15, Stockholm, Sweden, August 2000.

[53] K. Lee, S. Ha, J. Li, and V. Bharghavan. An application-level multicast architecture for multimedia communications. In Proceedings of ACM Multimedia 2000, pages 398–400, Los Angeles, CA, October 2000.

[54] U. Legedza, D. Wetherall, and J. Guttag. Improving the performance of distributed applications using active networks. In Proceedings of IEEE Infocom ’98, pages 590–599, San Francisco, CA, April 1998.

[55] Looking glass and public traceroute servers. http://www.traceroute.org.

[56] A. Nakao, L. Peterson, and A. Bavier. Constructing end-to-end paths for playing media objects. In Proceedings of IEEE Openarch ’01, pages 117–128, Anchorage, April 2001.

[57] B. Noble, M. Satyanarayanan, G. Nguyen, and R. Katz. Trace-based mobile network emulation. In Proceedings of ACM SIGCOMM ’97, pages 51–62, Cannes, France, September 1997.

[58] OMG’s CORBA Web site. www.omg.org.

[59] V. Paxson. End-to-end Internet packet dynamics. In Proceedings of ACM SIGCOMM ’97, pages 139–152, Cannes, France, September 1997.

[60] V. Paxson, G. Almes, J. Mahdavi, and M. Mathis. A framework for IP performance metrics. Request for Comment (RFC 2330), May 1998.

[61] C. Perkins. IP mobility support. Request for Comment (RFC 2002), October 1996.

[62] L. Qiu, V. Padmanabhan, and G. Voelker. On the placement of Web server replicas. In Proceedings of IEEE Infocom 2001, pages 1587–1596, Anchorage, Alaska, April 2001.

[63] N. Rao and S. Batsell. QoS routing via multiple paths using bandwidth reservation. In Proceedings of IEEE Infocom ’98, pages 11–18, San Francisco, CA, March 1998.

[64] S. Saroiu, P. Gummadi, and S. Gribble. A measurement study of peer-to-peer file sharing systems. In Proceedings of IS&T/SPIE Conference on Multimedia Computing and Networking (MMCN ’02), San Jose, CA, January 2002.

[65] S. Savage, T. Anderson, A. Aggarwal, D. Becker, N. Cardwell, A. Collins, E. Hoffman, J. Snell, A. Vahdat, G. Voelker, and J. Zahorjan. Detour: informed Internet routing and transport. IEEE Micro, 19(1):50–59, January 1999.

[66] S. Savage, A. Collins, E. Hoffman, J. Snell, and T. Anderson. The end-to-end effects of Internet path selection. In Proceedings of ACM SIGCOMM ’99, pages 289–299, Boston, Massachusetts, August 1999.

[67] S. Seshan, M. Stemm, and R. Katz. SPAND: shared passive network performance discovery. In Proceedings of the 1st USENIX Symposium on Internet Technologies and Systems (USITS ’97), pages 135–146, Monterey, CA, December 1997.

[68] S. Shi and J. Turner. Routing in overlay networks. In Proceedings of IEEE Infocom 2002, volume 3, pages 1200–1208, New York, NY, June 2002.

[69] R. Siamwalla, R. Sharma, and S. Keshav. Discovering Internet topology. Technical report, Department of Computer Science, Cornell University, July 1998.

[70] M. Siegl and G. Trausmuth. Hierarchical network management: a concept and its prototype in SNMPv2. In Proceedings of the 6th Joint European Networking Conference (JENC6), Tel Aviv, Israel, May 1995.

[71] A. Snoeren and H. Balakrishnan. An end-to-end approach to host mobility. In Proceedings of the 6th ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom ’00), pages 155–166, Boston, MA, August 2000.

[72] M. Stemm, R. Katz, and S. Seshan. A network measurement architecture for adaptive applications. In Proceedings of IEEE Infocom 2000, pages 285–294, Tel Aviv, Israel, March 2000.

[73] I. Stoica, D. Adkins, S. Zhuang, S. Shenker, and S. Surana. Internet Indirection Infrastructure. In Proceedings of ACM SIGCOMM ’02, Pittsburgh, PA, August 2002.

[74] I. Stoica, R. Morris, D. Karger, M. Kaashoek, and H. Balakrishnan. Chord: a scalable peer-to-peer lookup service for Internet applications. In Proceedings of ACM SIGCOMM ’01, pages 149–160, San Diego, CA, August 2001.

[75] I. Stoica, T. Ng, and H. Zhang. REUNITE: A recursive unicast approach to multicast. In Proceedings of IEEE Infocom 2000, pages 1644–1653, Tel Aviv, Israel, March 2000.

[76] Sun Microsystems. The Jini homepage. http://www.java.sun.com/jini/.

[77] Switch traffic statistics. www.switch.ch/lan/stat.

[78] D. Tennenhouse, S. Garland, L. Shrira, and M. Kaashoek. From Internet to ActiveNet. Request for Comment, January 1996. TNS Group, MIT LCS.

[79] D. Tennenhouse, J. Smith, W. Sincoskie, D. Wetherall, and G. Minden. A survey of active network research. IEEE Communications Magazine, pages 80–86, January 1997.

[80] D. Tennenhouse and D. Wetherall. Towards an active network architecture. Computer Communications Review, 26(2):5–18, April 1996. Also in: Proceedings of Multimedia Computing and Networking 96, San Jose, CA.

[81] B. Tierney, B. Crowley, D. Gunter, M. Holding, J. Lee, and M. Thompson. A monitoring sensor management system for Grid environments. In Proceedings of the 9th IEEE International Symposium on High Performance Distributed Computing (HPDC-9), pages 97–104, Pittsburgh, PA, August 2000.

[82] S. Vazhkudai, J. Schopf, and I. Foster. Predicting the performance of wide-area data transfers. In Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), Fort Lauderdale, April 2002.

[83] C. Villamizar. OSPF Optimized Multipath (OSPF-OMP). Internet Draft (draft-ietf-ospf-omp-02), work in progress, February 1999.

[84] D. Wetherall, J. Guttag, and D. Tennenhouse. ANTS: A toolkit for building and dynamically deploying network protocols. In Proceedings of the 1st IEEE Conference on Open Architectures and Network Programming (OPENARCH ’98), pages 117–129, San Francisco, CA, April 1998.

[85] D. Wetherall, U. Legedza, and J. Guttag. Introducing new Internet services: why and how. IEEE Network Magazine (Special Issue on Active and Programmable Networks), pages 12–19, July 1998.

[86] R. Wolski. Dynamically forecasting network performance using the Network Weather Service. In Proceedings of the 6th High-Performance Distributed Computing Conference (HPDC-6), pages 316–325, Portland, OR, August 1997.

[87] D. Wu, Y. Hou, W. Zhu, Y. Zhang, and J. Peha. Streaming video over the Internet: approaches and directions. IEEE Transactions on Circuits and Systems for Video Technology, 11(3):282–300, March 2001.

[88] N. Yeadon, F. Garcia, D. Hutchison, and D. Shepherd. Continuous media filters for heterogeneous Internetworking. In Proceedings of SPIE - Multimedia Computing and Networking (MMCN 96), pages 118–134, San Jose, CA, January 1996.

[89] B. Zhao, J. Kubiatowicz, and A. Joseph. Tapestry: an infrastructure for fault-tolerant wide-area location and routing. Technical Report UCB/CSD-01-1141, University of California, Berkeley, April 2001.

[90] S. Zhuang, B. Zhao, A. Joseph, R. Katz, and J. Kubiatowicz. Bayeux: an architecture for scalable and fault-tolerant wide-area data dissemination. In Proceedings of the 11th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 2001), pages 11–20, Port Jefferson, NY, June 2001.

Curriculum Vitae

17. May 1969 Born in Menziken, Switzerland; son of Hildegard and Rudolf Karrer-Hug

1976–1985 Primary school in Seon

1985–1989 Secondary school: Alte Kantonsschule Aarau; Matura Typus B

1990–1996 Studies in Computer Science, Swiss Federal Institute of Technology, Zurich

1993 Internship at Spectrospin, Fallanden, and Bruker Intl. Inc, Woburn, Massachusetts

1996 Diploma in Computer Science, ETH Zurich (Dipl. Informatik-Ingenieur ETH); Diploma Thesis on “Application-level QoS Monitoring”, supervised by Prof. Thomas R. Gross and Jurg Bolliger

1996–2002 Research and teaching assistant at ETH Zurich (Institute for Computer Systems, Laboratory for Software Technology), in the research group of Prof. Thomas R. Gross
