
Journal of Systems Architecture 52 (2006) 737–772

www.elsevier.com/locate/sysarc

Decentralized media streaming infrastructure (DeMSI): An adaptive and high-performance peer-to-peer content delivery network

Alan Kin Wah Yim, Rajkumar Buyya *

Grid Computing and Distributed Systems (GRIDS) Laboratory, Department of Computer Science and Software Engineering,

The University of Melbourne, Carlton, Melbourne VIC 3053, Australia

Received 13 January 2005; received in revised form 24 March 2006; accepted 8 May 2006
Available online 10 July 2006

Abstract

Hosting an on-demand media content streaming service has been a challenging task mainly because of the enormous network and server bandwidth required to deliver large amounts of content data to users simultaneously. We propose an infrastructure that helps online media content providers offload their server and network resources for media streaming. Using application level resource diversity together with the peer-to-peer resource-sharing model is a feasible approach to decentralizing the content storage, server and network bandwidth. Each subscriber is responsible for only a small fraction of such resources. Most importantly, the cost of maintaining the service can also be shared amongst subscribers, especially when the subscriber base is large. As a result, subscribers can benefit from a lower subscription cost. A few existing solutions focus only on sharing the network bandwidth load by dividing a streaming task among multiple sources. However, those solutions require the content to be replicated in full and stored at each source, which is impractical for a subscriber whose storage resource is of consumer capacity. Our solution focuses on dividing responsibility for both the network bandwidth and the content storage, such that each subscriber is responsible for only a small portion of the content. We propose a lightweight candidate peer selection strategy based on avoidance of network congestion, and an adaptive re-scheduling algorithm, in order to enhance the smoothness of the aggregated streaming rate perceived at the consumer side. Experiments show that our peer-selection strategy outperforms the traditional strategy based on end-to-end streaming bandwidth.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Content delivery networks; Decentralised systems; Media streaming; Peer-to-peer computing

1383-7621/$ - see front matter © 2006 Elsevier B.V. All rights reserved

doi:10.1016/j.sysarc.2006.05.001

* Corresponding author. E-mail addresses: [email protected] (A.K. Wah Yim), [email protected], [email protected] (R. Buyya).

1. Introduction

Hosting an on-demand streaming service of persistent media content, such as video-on-demand, has been a challenging task mainly because of the enormous network and server bandwidth required to deliver, in real time, large amounts of video data to users simultaneously. In order to deliver a near-DVD-quality video stream at as low a streaming rate as possible, a video compression technology such as MPEG-4 [5] is typically used. Informal studies [6] show that the streaming rate required for a near-DVD reproduction is at least 500 kbps. Maintaining a network pipe big enough to support the simultaneous video streams, with a persistent 500 kbps of bandwidth per stream for the duration of a movie (ranging from 1 to 3 h), is expensive. Therefore a pure single-video-server-cluster-to-multiple-consumers approach scales poorly. A variant is to use multiple server clusters working like proxies in different regional locations or "edges" of the network to allow better scalability [8]. The consumer node is instructed to contact the proxy local to the consumer for streaming. Each proxy may act like the master that carries a replica of the contents, or may cache the subset of contents most frequently requested by the local consumers [7]. Such a distributed "edge architecture" helps reduce latency and the number of hops before reaching the consumer, as the stream is scheduled for delivery from the proxy closest to the consumer. Hence the chance of encountering network congestion is lower. However, this does not mean that a local connection is free of congestion. As [9] suggests, packet loss (hence congestion) in an end-to-end connection is usually caused by only a few hop-links in the path. Although more than one video server cluster shares the server and network load, the system still suffers from the single-point-of-failure problem, as the stream is still pushed from a single source over a single connection. Although the stream could be diverted through multiple paths of the network to avoid congestion [10–12], the technique is out of the question as routing is beyond the control of the content provider. In addition, since the cost of the servers and the network bandwidth for the streams is borne by the content provider, both the traditional single-server and the edge architectures suffer from under-utilized server and network resources during off-peak hours.

The problems of implementing a cost-effective streaming service of persistent media content described above lead to the design of DeMSI, the Decentralized Media Streaming Infrastructure. The main objective of DeMSI is to ease the cost of content storage and the workload of a video content distribution/delivery network (CDN), traditionally managed by the content provider, by offloading the streaming server, network and storage resources to subscriber workstations and their upstream internet bandwidth, without sacrificing video quality. Subscribers are not only consumers of the service, but also members of the content server. The fundamental idea is to allow multiple subscriber peers to serve streams of the same video content simultaneously to a consuming peer, rather than the traditional single-server-to-client streaming model, while allowing each peer to store only a small portion of the content. It is anticipated that a subscriber peer can be a PC workstation, or simply a set-top box with a few gigabytes of disk space to spare. Each peer has a broadband connection of at least 1.5 Mbps downstream and 256 kbps upstream to the internet. DeMSI is designed to be independent of the type of media content. It is anticipated to work with both CBR and VBR video of any format and bit rate, and it is not limited to serving video content, but can serve any other media type that is streamable.
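The figures quoted so far (a 500 kbps stream, 256 kbps of upstream bandwidth per peer, movies of 1 to 3 h) already imply some simple bounds. The following is a back-of-the-envelope sketch only; the function names are illustrative, and the 2 h duration is an assumed example value within the stated range.

```python
import math

STREAM_RATE_KBPS = 500     # near-DVD streaming rate quoted in the introduction
PEER_UPSTREAM_KBPS = 256   # minimum upstream bandwidth assumed per peer


def min_serving_peers(stream_kbps: float, upstream_kbps: float) -> int:
    """Fewest peers whose combined upstream bandwidth covers the stream rate."""
    return math.ceil(stream_kbps / upstream_kbps)


def content_size_mb(stream_kbps: float, duration_hours: float) -> float:
    """Total content size in megabytes (1 MB = 10**6 bytes)."""
    bits = stream_kbps * 1000 * duration_hours * 3600
    return bits / 8 / 1e6


if __name__ == "__main__":
    # At least two peers must serve at once, even if each uses its full upstream.
    print(min_serving_peers(STREAM_RATE_KBPS, PEER_UPSTREAM_KBPS))
    # Size of an assumed 2 h movie at 500 kbps, in MB.
    print(content_size_mb(STREAM_RATE_KBPS, 2))
```

In practice a peer rarely delivers its full upstream rate, so the peer count here is a lower bound rather than an estimate.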

Like other peer-to-peer applications, DeMSI has to face the reliability issues of peer resources. Since peer resources are largely beyond the control of the service owner, the reliability problems that DeMSI has to overcome include:

1. The unpredictable dynamics of the connection between the serving peers and the consuming peer. As a connection is made up of a path through hop-links, some links sharing traffic with other connections may be congested, resulting in delays and lost packets, and hence a varying end-to-end effective bandwidth;

2. A peer may be turned off at any time. Even worse, a peer can be shut down abnormally, so one cannot expect a peer to notify another party of its unavailability;

3. The integrity of contents is vulnerable, as the content is stored at the peer end, which is beyond the control of the service owner.

The integrity of contents can be easily verified by applying a hash scheme, such as SHA-1, to the content data, such that a tampered copy of the content can be detected when the hash code derived from it differs from the original. In DeMSI, the consumer may send the SHA-1 code of the content segment to the target peer along with the request for streaming. The peer then verifies it against the local copy and replies either by commencing the stream or with a negative acknowledgement. This technique has been used in many P2P applications and we will not discuss it further here. Therefore, addressing problems 1 and 2 is the primary focus of this paper.
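The hash check described above can be sketched as follows. This is a minimal illustration using Python's standard hashlib; the function names and the "STREAM"/"NACK" reply values are hypothetical and do not represent DeMSI's actual wire protocol.

```python
import hashlib


def sha1_hex(segment: bytes) -> str:
    """SHA-1 digest of a content segment, as a hex string."""
    return hashlib.sha1(segment).hexdigest()


def handle_stream_request(local_segment: bytes, expected_sha1: str) -> str:
    """Serving-peer side: compare the local copy with the consumer's hash.

    Returns "STREAM" when the local copy matches and "NACK" otherwise
    (both reply values are placeholders for illustration).
    """
    if sha1_hex(local_segment) == expected_sha1:
        return "STREAM"
    return "NACK"


if __name__ == "__main__":
    original = b"segment 17 of some movie"
    expected = sha1_hex(original)            # hash known to the consumer
    print(handle_stream_request(original, expected))
    print(handle_stream_request(original + b"!", expected))  # tampered copy
```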

As the peers and their connections are unreliable, every P2P application has to deal with re-scheduling of streaming tasks and switching-over of peers when they become unavailable or the service level does not meet expectations. When a DeMSI consumer has a list of candidate subscriber peers, either discovered or previously contacted by others as consumers, it has to make a selection that achieves the following goals:

1. To maximize the utilization of the network and peers.

2. To minimize the number of peers to serve the content.

3. To minimize the frequency of re-scheduling or emergency switching-over to other candidates over the course of streaming.

The goals are attributed to two important facts. Firstly, needing fewer peers at a time in streaming implies fewer transitions over the course of a streaming session, and fewer peers are required to be online at any point in time. Secondly, the goals promote stability in the aggregated streaming rate from the active serving peers. As a result, less buffering is required for received content prior to playback. At first glance, peers with the largest historic streaming rate should be selected first in order to achieve such goals. However, this may not be true. Since the internet is made up of hop-links that can share traffic from multiple connections, DeMSI should expect that there exist two or more candidates that have to send packets through the same hop-link(s). If one of those hop-links has tight bandwidth or is filled with cross-traffic, then, while previously selecting any one of those peers allowed that peer to deliver 100% of its offered streaming rate, selecting two or more of those peers may result in congestion such that those peers can only serve at a fraction of their offered rate. As our performance evaluation shows, this not only adds fluctuations to the aggregated streaming rate, but also requires more peers in subsequent scheduling of streaming tasks. Moreover, re-scheduling becomes more frequent. DeMSI deals with this problem from two major directions:

1. Proactive scheduling: Candidate peers that have the largest historic end-to-end streaming bandwidth and the smallest packet loss rate, and that offer the largest portion of the content, while sharing no or very few congested link(s) with the other actively serving peers, are selected first. The consumer constantly monitors and stores in its knowledge base the above-mentioned network metrics for each peer whenever it is actively serving. In addition, the consumer infers incrementally during the streaming session which peer connections are possibly sharing a congested link in the network, without adding any overhead to the streams.

2. Reactive scheduling: The underlying network characteristics of the peer-consumer connections and the availability of the peer itself change over time. Re-scheduling of streaming tasks and emergency switching-over of actively serving peers are unavoidable no matter how good the selection algorithm is. We design a divide-and-conquer based scheduling/re-scheduling algorithm that is highly adaptive, flexible, aware of deadlines, and promotes smooth transitions.
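The proactive criteria listed above can be illustrated with a small ranking sketch. The text here does not say how the four criteria are combined, so the lexicographic ordering below (fewest shared congested links first, then bandwidth, loss rate and content portion) is an assumption made purely for illustration, as are all class, field and function names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    bandwidth_kbps: float    # historic end-to-end streaming bandwidth
    loss_rate: float         # historic packet loss rate, 0.0 to 1.0
    content_fraction: float  # portion of the content the peer offers
    shared_congested: int    # active peers inferred to share a congested link


def rank_candidates(candidates):
    """Order candidates best-first (the lexicographic order is an assumption)."""
    return sorted(
        candidates,
        key=lambda c: (c.shared_congested, -c.bandwidth_kbps,
                       c.loss_rate, -c.content_fraction),
    )


def select_peers(candidates, required_kbps):
    """Take ranked peers until their historic bandwidth covers the stream rate."""
    chosen, total = [], 0.0
    for c in rank_candidates(candidates):
        if total >= required_kbps:
            break
        chosen.append(c)
        total += c.bandwidth_kbps
    return chosen


if __name__ == "__main__":
    pool = [
        Candidate("a", 250, 0.01, 0.05, shared_congested=1),
        Candidate("b", 200, 0.02, 0.05, shared_congested=0),
        Candidate("c", 240, 0.01, 0.04, shared_congested=0),
        Candidate("d", 128, 0.03, 0.02, shared_congested=0),
    ]
    # Peer "a" has the highest bandwidth but is ranked last because it is
    # inferred to share a congested link with an active peer.
    print([c.name for c in select_peers(pool, required_kbps=500)])
```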

As the storage of media content in DeMSI's scenario is decentralized, with no single peer containing a complete replica of the content, it is inherent that the consumer has to look for hundreds of peers, which means hundreds of transitions from one peer to another over the course of streaming. The scheduling and re-scheduling algorithms have to be lightweight and perform in a timely fashion.

The remainder of this paper is organized as follows. Section 2 discusses related work. Section 3 provides the design details of DeMSI. Section 4 analyses the performance of our system in terms of the effectiveness of its peer selection strategy in the scheduling/re-scheduling processes, and the re-scheduling algorithm itself. Finally, the paper concludes with an outline of future work in Section 5.

2. Related work

A number of attempts have been made to decentralize CDNs. One of the popular approaches is to deploy several server clusters serving the same content in various regions of the world, commonly known as the "edges" of the internet [8,35]. Each server cluster serves the end-user population that is "local" to where the data center of that cluster node is located. The definition of local is a function of one or many parameters such as network topology, packet round-trip time, available bandwidth, or even the physical location of the end-user's ISP network. The delivery of streaming content such as video is still from a single source over a single connection (i.e., point-to-point), as in traditional centralized CDNs. Although this approach has greatly reduced the chance of an outage over the whole population of end-users, the point-to-point delivery of video is still subject to the single-point-of-failure problem for each end-user. In addition, as contents have to be mirrored over the server clusters around the world, the cost of maintaining such a CDN can be huge. Table 2-1 outlines their approaches.

In the past half a decade, a number of P2P content delivery network models have been developed and deployed widely. The most popular P2P CDN model is that of the general file-sharing applications and infrastructures. Napster [28,29,34], Gnutella (Bearshare) [29,30,34], FastTrack (Kazaa) [31,34], eDonkey [32], and Bit Torrent [33] are the popular examples. Each application represents an iteration of improvement in the approaches to resource discovery, peer selection, and content delivery. Table 2-2 provides a summary of their approaches. However, none of today's file-sharing applications supports real-time streaming of media content files. Acquisition of a file in those applications is basically accomplished by batched download, which has no notion of sequencing or timing constraints in the delivery timeline.

Table 2-1
Summary of popular non-P2P CDN architectures

CDN solutions provider: Akamai
Type of release: Infrastructure consists of server clusters and monitoring, deployment tools
Resource selection strategy: The end-user node relies on Akamai's domain name server (DNS) to determine which IP address (of the server serving the content) the end-user node should contact. The server that is closest to the end-user in terms of measures such as network round-trip time, with minimal packet loss, and below the load threshold, is selected
Content delivery strategy: The content is delivered from a single server source (point-to-point). Upon failure, the end-user has to manually restart the player session such that the player will contact the DNS again to obtain the IP address of another server. In the case of archived video, the content is mirrored over the server community. The content may not be replicated across every available server around the world. The extent of the mirroring depends on the popularity of the content in each region

CDN solutions provider: IntelliDNS
Type of release: Application
Resource selection strategy: The end-user node contacts the DNS running the IntelliDNS service to determine which IP address (of the server serving the content) the end-user node should contact. The server that is closest to the end-user in terms of measures such as network round-trip time, and the best-guess geographical location of the end-user's ISP network, is selected
Content delivery strategy: The architecture covers only the resource selection. However, as IntelliDNS is designed to locate a single server resource, it is targeted for decentralized CDNs that deliver content from a single server source to an end-user client

Another emerging P2P CDN model is commonly known as application level multicast (ALM). As the term implies, the delivery of the content to multiple requesting peers simultaneously is achieved at the application layer rather than the network layer, such that it can be used over a traditional unicast network. The motivation for ALM is that multicast networks are still rare in today's internet. ALM is commonly accomplished by a one-to-many distribution tree of peers managed either in a centralized fashion at the content source peer, as in CoopNet by Padmanabhan et al. [26] and DirectStream [38], or in a centralized-decentralized fashion at the source and intermediate peers of the distribution tree, as in P2Cast [39] and PeerCast by Deshpande et al. [27]. However, ALM is essentially a point-to-point content delivery model that relies on a single connection from one peer to each of its child peers. Failure of a parent peer or of the path between two peers results in interrupted delivery while the tree is re-oriented to switch over to another parent. The Padmanabhan group [26] addressed this problem by the use of multiple distribution trees, where multiple sub-streams of the original stream are sent down to each peer. The orientation of each node in the tree for one sub-stream is different from that for another. When the parent fails, at least one of the sub-streams is still likely to reach each child peer. Padmanabhan employs multiple description coding (MDC) on the media content in order to sustain uninterrupted playback of content when some of the sub-streams are interrupted. MDC is an encoding technique for dividing a media content stream into m sub-streams, each of which can be delivered at a fraction of the rate required by the original stream. It also allows partial reproduction of the media content out of p sub-streams (p < m) being delivered simultaneously. If the content is a video, partial reproduction results in loss of video quality during playback, typically in terms of a lower frame rate than the original.

Table 2-2
Summary of popular P2P file-sharing architectures

File-sharing architecture: Napster
Type of release: Application
Resource discovery strategy: The consuming peer contacts the centralized global directory server to locate where the file is. More than one peer claimed to have the requested file may be returned but no mechanism exists for verification of actual identity
Peer selection strategy: None (manual)
Content delivery strategy: Point-to-point single-peer file transfer. Upon peer failure, the user has to manually select another peer that essentially restarts the file transfer

File-sharing architecture: Gnutella
Type of release: Infrastructure focused on resource discovery; client examples: Bearshare, XoloX
Resource discovery strategy: The consuming peer contacts its neighbor seed peers to locate the file on its behalf. Each contacted neighbor peer in turn forwards the same request to its neighbor recursively if it does not have the file. More than one peer claimed to have the requested file may be returned but no mechanism exists for verification of actual identity
Peer selection strategy: None (client dependent)
Content delivery strategy: None (client dependent). File transfer is usually performed in point-to-point single-peer fashion by clients released in early days. Most recently released clients support aggregated file transfer from multiple peers selected manually. However, a file has to be downloaded completely into a peer before it can be made available for sharing

File-sharing architecture: FastTrack
Type of release: Infrastructure; client examples: Kazaa
Resource discovery strategy: The consuming peer contacts its local "supernode", which is another consuming or serving peer with the capability of maintaining a partial file directory of local peers. A supernode may query other supernodes for the requested file. Each peer informs its local supernode upon completion of a file download
Peer selection strategy: None (client dependent)
Content delivery strategy: None (client dependent). Clients such as Kazaa schedule delivery of the requested file in different blocks, in no particular order, from multiple selected peers to be accomplished simultaneously. When an active peer fails, the user has to manually select another peer to fill in the gap. Recovery from a broken file transfer is inherent. A file has to be downloaded completely into a peer before it can be made available for sharing

File-sharing architecture: eDonkey
Type of release: Application
Resource discovery strategy: Similar to FastTrack. In contrast, the eDonkey's directory peer also maintains a list of peers who are downloading the requested file as well
Peer selection strategy: Discovered peers are automatically selected for aggregated file transfer based on the time they are discovered and the available upload slots of the peer. The maximum number of peers allowed in the active set is predefined by user.
Content delivery strategy: eDonkey schedules delivery of the requested file in different blocks, in no particular order, from multiple selected peers to be accomplished simultaneously. When an active peer fails, it looks up another peer to fill in the gap automatically. Recovery from a broken file transfer is inherent. Whenever a fixed-sized chunk of the file is downloaded completely into a peer, it is made available for sharing

File-sharing architecture: Bit Torrent
Type of release: Application
Resource discovery strategy: The consuming peer locates the object by contacting a "Tracker" peer that keeps track of other peers who are currently downloading, and/or have bits and pieces of the same object. The object consists of a set of files predefined by the publisher. The location of the Tracker is found in a token file made available on the web by the publisher
Peer selection strategy: Random with preference to peers that carry the chunks of an object that are least commonly found in other peers
Content delivery strategy: Bit Torrent schedules delivery of the requested file in fixed-sized chunks from multiple selected peers to be accomplished simultaneously. The chunks that are least commonly distributed are downloaded first. When an active peer fails, it looks up another peer to fill in the gap automatically. Recovery from a broken file transfer is inherent. Whenever a fixed-sized chunk of the file is downloaded completely into a peer, it is made available for sharing. The download rate of a peer is proportional to its upload rate in order to facilitate fairness

The idea of allowing multiple peers to push sub-streams of the same media content simultaneously to a consuming peer, in order to share the network bandwidth that is originally required for a single media stream, is now commonly known as aggregated streaming or multiple-sender path diversity in the research community. This CDN model under the P2P paradigm received the least attention until recently. The concept probably originated about 3 years ago, when Calvert et al. [13] outlined it in their Concast paper. The subject of aggregated streaming slowly came to research attention through works such as those of Nguyen and Zakhor [14] and CoopNet by Padmanabhan et al. [26]; finally, Hefeeda et al. [15] were probably among the first to integrate this concept with the peer-to-peer paradigm, with the introduction of CollectCast (also known as PROMISE). The papers [14,15,26] brought out a number of important issues related to aggregated streaming, with remarkable solutions. For example, the Nguyen group proposed the use of forward error correction (FEC) in their aggregated streaming architecture such that the receiver can recover the original stream by receiving any n of n_FEC (n_FEC > n) FEC-encoded packets [14], as long as the number of lost packets during the transmission does not exceed n_FEC − n. The solution neatly avoids the need to re-transmit lost packets, which imposes delay and control overhead. In [15], the Hefeeda group raised the importance of network topology awareness in the selection of candidate peers in the aggregated streaming scenario, in order to avoid having too many active serving peers deliver their sub-streams through the same link of the network and cause congestion. The Padmanabhan group [26] attempted to support partial content storage in each active serving peer participating in aggregated streaming in the CoopNet system, by employing MDC on the media content.
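The "any n of n_FEC packets" property can be seen in its simplest form with a single XOR parity packet, where n_FEC = n + 1 and one lost packet is tolerated. This is only an illustrative sketch: the code used in [14] is not specified here, XOR parity is chosen as the simplest erasure code with this property, equal-length packets are assumed, and all function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_packets):
    """Append one XOR parity packet: n packets in, n_FEC = n + 1 out."""
    parity = reduce(xor_bytes, data_packets)
    return list(data_packets) + [parity]


def decode(received, n):
    """Recover the n data packets from any n of the n + 1 encoded packets.

    `received` maps packet index (0..n) to payload; index n is the parity.
    """
    missing = [i for i in range(n + 1) if i not in received]
    if len(missing) > 1:
        raise ValueError("more than n_FEC - n packets lost")
    if missing and missing[0] < n:
        # XOR of the n surviving packets equals the single lost data packet.
        received = dict(received)
        received[missing[0]] = reduce(xor_bytes, received.values())
    return [received[i] for i in range(n)]


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]
    sent = encode(data)
    # Packet 1 is lost in transit; the other three arrive.
    arrived = {0: sent[0], 2: sent[2], 3: sent[3]}
    print(decode(arrived, n=3) == data)
```

Tolerating more than one loss requires a stronger code (for example Reed-Solomon), which is outside the scope of this sketch.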

The key points of the papers mentioned above have become an important inspiration in the design of DeMSI. However, the implementation of an MDC technique like the one used by CoopNet is highly dependent on the type of media content. Moreover, in terms of storage, the content can only be split into m parts, where m is limited by the number of sub-streams required to be delivered simultaneously in order to achieve original reproduction quality. In other words, m cannot be large. Therefore the partial content to be stored in each peer is still quite large. Hefeeda's CollectCast requires each serving peer to store a complete replica of the content rather than a small portion of it as in DeMSI's decentralized content storage model. The topology-aware peer selection technique used in CollectCast is too costly for DeMSI's content storage model, which requires visiting many serving peers over the course of a streaming session, typically in the order of hundreds for an hour-long video.

CollectCast requires prior knowledge of the inferred topology of the network used by the connections between all candidate peers and the consumer before streaming can commence. Its peer selection algorithm relies on the heavyweight traceroute utility to obtain the network topology information before the selection can be made. The accuracy of the topology inference is high, and its granularity can be down to the hop-link level that is visible to traceroute. This enables CollectCast to calculate precisely, for each hop-link shared by a group of peers, which ones can be chosen to serve simultaneously. However, the use of traceroute introduces significant overhead in both time and network load because it requires co-operation from the routers. In addition, some routers may not even respond to traceroute requests [16]. Unlike CollectCast, DeMSI does not have any one peer that holds a complete replica of the content. Regular switching of the set of actively serving peers is required for the duration of a streaming session regardless of peer availability and network condition. In contrast, CollectCast relies heavily on a handful of peers (an average of four, as discussed in [15]). Switching of active peers is inherently less frequent in CollectCast's scenario, as it only occurs when a peer becomes unavailable or the network condition becomes inferior. The number of candidate peers in DeMSI's scenario is far larger than in CollectCast's. The overhead required to infer the topology of the network used by all candidate peers before streaming commences is out of the question for DeMSI. Intuitively, a topology-aware peer selection method will also not be as effective in DeMSI as in CollectCast. The need to visit a large number of peers makes DeMSI less likely to pick peers that have to push sub-streams through the same tight hop-link before reaching the consumer for the duration of the streaming session, except in the situation where the total unused bandwidth of all such hop-links in the network is approaching the aggregated streaming rate required to serve a consumer.

The goal of peer selection is to maximize the utilization of the network while minimizing the number of active peers serving the content at a time, and the frequency of re-scheduling or switching over to other candidates over the course of streaming. DeMSI also requires the selection to be timely. For that reason, we design alternative solutions for inference of network characteristics and peer selection that sacrifice granularity for efficiency. As the performance evaluation shows, our solution outperforms the selection strategy based purely on end-to-end bandwidth in terms of achieving this goal.

The inference of internal network characteristics using end-to-end measurements is a popular area of research, commonly referred to as “network tomography”. There are two major research directions in this area that we are interested in: (1) inference of network topology [16,21]; (2) inference of shared congestion points of the network [17,1,2]. Within each, there are two main sender–receiver relationships: the single-sender–multiple-receiver case (sometimes known as the inverted Y-topology) and the multiple-sender–single-receiver case (sometimes known as the Y-topology). In summary, the topology based inference techniques generally exhibit high approximation granularity, slow convergence (in the order of minutes) and overwhelming algorithmic complexity. On the other hand, the congestion based inference techniques generally offer lower approximation granularity, converge quickly (in the order of seconds) and are lightweight. Therefore, DeMSI’s inference solution is based on inference of shared congestion points, or “congestion based” in short. Its design is inspired by Flowmate [2], a tool for partitioning flows into clusters, each of which represents a congested link in the network. Flowmate uses the packet delay correlation test algorithm proposed by Rubenstein et al. in [1] to periodically determine whether two flows traverse the same congested link when they come from or go to the same partner. It requires in-band or out-of-band Poisson probe traffic to be injected from the sender node to work properly. The two end-nodes may either be receivers or senders, while the partner may either be a sender or a receiver respectively. The idea is that when the flows share a congested link, their probes reach the congested link at a time that is a Poisson random variable, but they are queued up and serviced at a deterministic rate. As a result, the spacing between packets of different flows after the bottleneck is smaller than the spacing between packets within the same flow. Rubenstein suggests comparing correlation coefficients on delay samples from the two flows to detect this phenomenon. As the correlation test algorithm only addresses two flows at a time, Flowmate is built on top of it to support the clustering of multiple flows in an efficient way. Unfortunately, it works with inverted Y-topologies, whereas in DeMSI, connections and flows between serving peers and the consuming peer follow a Y-topology formation. Rubenstein et al. also proposed an alternative correlation test algorithm based on packet loss of the two flows. However, experiments show that it converges more slowly than delay correlation.

Another notable attempt at inferring shared congestion points of the network is by Katabi et al. [17], who use an entropy function of packet spacing to determine whether the flows traverse the same congested link. The idea relies on the fact that packets from various senders are sent at different rates and times. Since the packets from various flows are queued up and serviced at a fixed deterministic rate at the congested link, their inter-packet spacing measured after the bottleneck should vary the least, regardless of where each packet is from. Therefore, Katabi’s approach does not require extra probe traffic, and it is capable of partitioning multiple flows into clusters, each of which represents a congested link. However, it takes more packet samples (hence more time) than Rubenstein’s algorithm to converge, especially when the congested link is filled with cross-traffic. Another serious drawback is that it requires prior knowledge of the number of congested links to be identified amongst the flows.

3. Architecture of DeMSI

This section presents an overview of our system and its functional components.

3.1. Overview of functional components

DeMSI is the P2P media streaming service middleware that bridges the content player system at the subscriber end and the other end, made up of the CDN itself and other online subscribers. Its key objective is to promote decentralized media streaming from a selection of multiple subscriber peers, and decentralized storage of media contents divided and distributed amongst subscriber peers. At this stage, selection of peers is primarily based on the past history of their streaming performance, and on congestion avoidance by analysis of correlation with the sub-stream flows from other selected peers. Fig. 3.1-1 shows the block diagram of the components in DeMSI and their relationship in terms of their interactions.

Here is an overview of the main workflow of DeMSI. When the user requests a video to be played via the user interface of the Player, the Player informs DeMSI through the DeMSI-Player API, which in turn kicks off the Scheduler. The Scheduler is in charge of the initial selection of candidate peers, discovered by the Peer Hunter at the Scheduler’s request through the DeMSI-Peer Hunter API, and schedules each selected peer to serve the segment(s) of the content, one segment at a time. The peers to which the streaming task is scheduled become active serving peers. In each active serving peer, the segment(s) of the content are retrieved from the local file system via the Storage Manager and delivered by the Segment Sender. The sub-stream is received by the Segment Receiver at the consumer side. It stores the sub-stream segment by segment on-the-fly in the Segment Cache, and collects network statistics of the sub-stream flow from the originating peer of each received packet. Concurrently, the Player plays the content by pulling the received segments from the Segment Cache in order, via the DeMSI-Player API. Meanwhile, the Peer Monitor periodically performs the following: (1) checking the health of each active serving peer and determining whether the peer needs help from another redundant peer candidate; (2) inferring points of network congestion shared by sub-stream flows, if there are any. The Peer Monitor informs the Re-scheduler if there is a need to schedule another peer to assist one of the current active serving peers found to be “unhealthy”, such as when a peer goes offline or the actual streaming rate is below expectation.

Implementation of systems like DeMSI is challenging. Most interactions amongst the components and their activities occur concurrently. Therefore, in Fig. 3.1-1, each component represented as a rounded rectangular block is a separate thread executing on its own. In other words, DeMSI is designed and implemented as a team of autonomous agents. When two components are connected by a fat arrow, their interactions are purely one-way asynchronous requests. A fat arrow paired with a thin arrow pointing in the opposite direction denotes request–response type interactions, which originate from the starting end of the fat arrow: the fat arrow denotes the request and the thin one denotes the response. The normal rectangular block and the cylinder denote a package of methods to be executed under the caller’s thread. However, the cylinder also denotes a repository of data objects: local segment, remote segment, peer, and point of congestion. Most data objects are persistent except the remote segment, which stays in the Segment Cache between the time it is received and the time it is consumed by the Player.

Fig. 3.1-1. Components of DeMSI – a team of agents.

As there are many existing works in the research community on resource discovery or lookup substrates over P2P networks, we make DeMSI independent of the substrate used as the Peer Hunter agent, as long as it implements the DeMSI-Peer Hunter API. As this is not our primary focus at this stage, we do not discuss it further, except for an outline in Section 3.4 of what DeMSI requires the Peer Hunter agent to perform.

Each DeMSI peer uses one TCP port for incoming control flows from other consumers and a UDP port for content sub-stream flows from active serving peers.

3.2. Storage strategy

The storage of media content in DeMSI employs a decentralized approach with division of responsibility. No single subscriber peer stores a complete replica of the content, only a small part of it. As the peers and their connections are unreliable, the aggregated streaming may need to rely partially on reliable resources from the CDN when there are not enough peers available. For that reason, DeMSI supports a special type of peer called the dedicated server, which offers a complete archive of the contents. Therefore, while DeMSI handles unreliable subscriber peers as serving peers, it assumes the existence of some more reliable dedicated servers. Such peers may be thought of as being owned by the CDN and/or content publishers. DeMSI assumes that the dedicated servers are online all the time, although their connectivity may still be unreliable. Dedicated servers are treated differently from other peers in terms of peer selection, which is described in more detail in Sections 3.5 and 3.6.

The division of responsibility strategy leads to the need to divide a media content into segments before distribution to subscriber peers. A media content $M_v$ is divided into $n$ equal-sized segments $S_i$ where $0 \le i \le n-1$. The stream of a media content is now represented by

$$S_0 S_1 S_2 \ldots S_{n-1}$$

A segment is the smallest unit of data block to be stored in the subscriber’s workstation. Each subscriber keeps at least one segment of the same ID for each $M_v$. Each peer may store $k$ consecutive segments of each $M_v$ in the order

$$S_s, S_{s+1}, S_{s+2}, \ldots, S_{s+k-1}$$

There are several advantages of assigning the $k$ segments consecutively as opposed to scattered. Firstly, it ensures that the segment offering information of each peer is represented in the most compact manner. Secondly, scaling the segment offerings up or down is simple to manage. Thirdly, it helps ease the subsequent scheduling effort once the consumer finds a peer that offers, for example, the next $k$ segments it needs. Most importantly, it helps increase the accuracy of correlating the sub-stream of one peer against others.
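To illustrate the compactness argument, the following minimal Python sketch represents a peer's offering by its first segment ID and segment count only. The function names `offers` and `covered_run` are hypothetical and are not part of DeMSI.

```python
def offers(first_id: int, count: int, segment_id: int) -> bool:
    """True if a peer storing `count` consecutive segments starting at
    `first_id` holds the given segment."""
    return first_id <= segment_id < first_id + count


def covered_run(first_id: int, count: int, needed_from: int) -> int:
    """Number of consecutive segments, starting at `needed_from`, that the
    peer can supply without the consumer having to look elsewhere."""
    if not offers(first_id, count, needed_from):
        return 0
    return first_id + count - needed_from


# A peer holding segments 5..8 can supply three segments from segment 6 on.
print(covered_run(5, 4, 6))
```

Because the offering is a single interval, both checks are constant-time, which is one way to read the first and third advantages listed above.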

While a peer has segments stored locally, other segments that are streamed from other peers for the purpose of consumption are transient. In the current implementation of DeMSI, both types of segments are managed by the Segment Cache. A future version will include the Storage Manager agent, which manages the inter-peer re-distribution of new segments received from other peers through their re-distribution process.

3.2.1. Forward error correction and segment structure

In order to avoid re-transmission of lost packets that may occur in the streaming, each segment is encoded using a Forward Error Correction (FEC) algorithm before being sent to the consumer. The FEC encoding process is associated with a parameter known as the tolerance level $l_{FEC}$, $0 < l_{FEC} < 1$, which indicates the maximum packet loss rate that the FEC can tolerate. DeMSI employs a fixed tolerance level approach such that each segment is stored pre-encoded with FEC at the peer. FEC deals with data in blocks, or in other words, packets. A segment has to be split up further into small blocks, which we call fragments, such that it can be transmitted in a series of packets. The fragment size is defined such that each fragment fits into a packet of the sub-stream. It therefore makes sense to use the same segment structure for the FEC algorithm to encode.

Let us define $\varsigma$ as the size of a segment and $q$ as the size of a fragment in bytes. Then each segment consists of $n = \varsigma/q$ fragments. When a segment is encoded with FEC, the size of each segment stored in a subscriber peer becomes $\varsigma/(1 - l_{FEC})$. Hence an encoded segment consists of $n_{FEC} = \varsigma/(q(1 - l_{FEC})) = n/(1 - l_{FEC})$ encoded fragments. At the consumer side, a segment $S_i$ is decoded on-the-fly using a separate thread after receiving any $n$ of the $n_{FEC}$ fragments that belong to $S_i$.

We decide to employ a Reed–Solomon based FEC algorithm in DeMSI because it guarantees the tolerance level, regardless of the order and of which $n$ of the $n_{FEC}$ fragments are received. Moreover, it has existing Java code available [4]. The downside is that the algorithm is slow, although this does not introduce much of a problem during the evaluation, when it is executed on a Pentium 4 class PC. Another FEC implementation, known as Tornado Codes [20], should be more desirable. Unfortunately, an existing piece of working code is yet to be found. Although Tornado Codes use a probabilistic approach, which does not guarantee a 100% QoS in terms of tolerance level, they are claimed to be a lot more efficient than the Reed–Solomon approach.
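As a worked illustration of the fragment arithmetic in this section, the Python sketch below computes the number of source fragments and FEC-encoded fragments per segment. The segment size, fragment size and tolerance level are hypothetical values, and rounding up to whole fragments is an assumption, since the text gives only the real-valued ratios.

```python
import math


def fragment_counts(segment_bytes: int, fragment_bytes: int, l_fec: float):
    """Return (n, n_fec): source fragments and FEC-encoded fragments
    per segment, for a tolerance level 0 < l_fec < 1."""
    if not 0.0 < l_fec < 1.0:
        raise ValueError("tolerance level must satisfy 0 < l_FEC < 1")
    n = math.ceil(segment_bytes / fragment_bytes)
    # Encoded segment holds n / (1 - l_FEC) fragments (rounded up here).
    n_fec = math.ceil(n / (1.0 - l_fec))
    return n, n_fec


# Hypothetical numbers: 1 MB segment, 1 kB fragments, 25% tolerance.
print(fragment_counts(1_000_000, 1_000, 0.25))
```

With these numbers a consumer would need any 1000 of the 1334 encoded fragments to decode the segment.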

3.3. The knowledgebase of discovered peers

DeMSI has to maintain the Peer Cache, a semi-persistent knowledgebase of discovered peers for the purpose of monitoring and selection of candidates to be active serving peers. The name “semi-persistent” comes from the fact that the Peer Cache does not maintain a global collection of peers, although the knowledge is stored in the file system for subsequent streaming sessions. Rather, the Peer Cache maintains a limited number of candidates. When DeMSI acquires knowledge of a new candidate peer and the Peer Cache has reached the limit, the least recently contacted candidate is removed. The knowledge of a new candidate peer comes from one of two sources: either the response to a Peer Hunter’s hunting request, or a hunting request from another peer.
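The eviction rule described above can be sketched as a bounded, ordered map. This is only an illustration under stated assumptions: the class name `PeerCache`, the method `touch` and the metrics dictionary are hypothetical, and persistence to the file system is omitted.

```python
from collections import OrderedDict


class PeerCache:
    """Bounded peer knowledgebase that evicts the least recently
    contacted candidate when the limit is reached."""

    def __init__(self, limit: int):
        if limit < 1:
            raise ValueError("limit must be positive")
        self.limit = limit
        self._peers = OrderedDict()  # peer_id -> metrics, oldest contact first

    def touch(self, peer_id, metrics=None):
        """Record a contact with a peer, inserting it if it is unknown."""
        if peer_id in self._peers:
            self._peers.move_to_end(peer_id)
            if metrics:
                self._peers[peer_id].update(metrics)
            return
        if len(self._peers) >= self.limit:
            self._peers.popitem(last=False)  # drop least recently contacted
        self._peers[peer_id] = dict(metrics or {})

    def __contains__(self, peer_id):
        return peer_id in self._peers

    def __len__(self):
        return len(self._peers)
```

For example, with a limit of two, contacting peers a, b, a and then c leaves a and c in the cache, because b is the least recently contacted entry when c arrives.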

DeMSI maintains a number of service level metrics for each peer in order to aid the peer selection during the scheduling/re-scheduling processes, and the decision by the Peer Monitor on whether a particular flow has to be re-scheduled. There are two groups of metrics: dynamic and static. The dynamic metrics change over time, whereas the static ones remain constant at least for the period of a streaming session. Let $P_j$ be a peer of ID $j$. The static metrics are as follows:

• First segment ID offered: $s_j$, $s_j \ge 1$.

• Number of segments offered: $k_j$, $k_j \ge 1$.

The dynamic metrics are as follows. In particular, the first three metrics are obtained based on the methodologies discussed in [3].

• Average net receive rate of content sub-stream: $R_j$ – this is the actual receive rate of content data detected by the consumer. Let $R_{recv,j}$ be the historic gross receive rate of peer $j$, and $U$ be the packet data utilization. The average net receive rate is calculated as

$$R_j = R_{recv,j}\,(1 - l_j)\,U$$

• Average loss rate of sub-stream packets: $l_j$, $0 < l_j \le 1$ – the percentage of packets lost over the number of packets supposed to be received from $P_j$.

• Average round-trip time: $T_j$ – this is the time taken for a packet to make a consumer – $P_j$ – consumer round-trip.

• Average response time to a hunting request: $t_j$ – the time between the sending of a hunting request and the receipt of the corresponding response from $P_j$.

• Inferred point of congestion: $G_j$ – DeMSI detects whether the sub-stream flows from any two of the active serving peers share a congested link. All peers whose flows are inferred to share the same congested link are put into a group $G_j$. Please refer to Section 3.7 for details.

• Congestion index: $C_j$, $0 \le C_j \le 1$ – if there exists a $G_j$ for a peer $P_j$, it indicates how congested the shared link that this peer is believed to be using currently is. The lower the value, the less congested. A $C_j$ of zero indicates that the flows from $P_j$ are believed not to share any congested link with flows from other active serving peers. The value of $C_j$ changes as the set of active serving peers changes:

$$C_j = \sum_{j \in \wp} \frac{R_j}{R_{up\,max}\,U}$$

where $\wp$ is the set of active serving peers with the same $G_j$.
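The two derived metrics in this list, the average net receive rate and the congestion index, can be sketched in a few lines of Python. The function and parameter names are hypothetical, the rates are assumed to share one unit, and the example numbers are made up for illustration.

```python
def net_receive_rate(r_recv: float, loss: float, utilization: float) -> float:
    """R_j = R_recv,j * (1 - l_j) * U."""
    return r_recv * (1.0 - loss) * utilization


def congestion_index(group_rates, r_up_max: float, utilization: float) -> float:
    """C_j: sum of R_j / (R_up_max * U) over the active serving peers
    that are inferred to share the same congested link."""
    return sum(rate / (r_up_max * utilization) for rate in group_rates)


# Hypothetical example: two active peers in one congestion group.
rate = net_receive_rate(100.0, 0.5, 0.5)
print(rate, congestion_index([rate, rate], 200.0, 0.5))
```

An empty group yields an index of zero, matching the statement that a zero index means no shared congested link is believed to exist.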

3.4. Peer hunting

It is inherently necessary for DeMSI to look for peers that carry the segment(s) of the content it needs in a decentralized way. Looking for hundreds of candidate peers at once would indeed be a challenge. Fortunately, this is not necessary, since the Player consumes the content one segment at a time over a period, in ascending order. Peer hunting can therefore be performed incrementally, for at least 2 segments at a time. DeMSI works independently of the resource discovery algorithms in order to promote reuse, as there are many such technologies available in the field [18,36].

The Scheduler and Re-scheduler agents rely on the Peer Hunter agent to look for at least $c$ candidate peers for each segment $S_i$ where

$$\sum_{a=1}^{c} R_{cand[a]} \ge R_{content} \quad \forall\, cand[a] \text{ that are dedicated servers or subscriber peers}$$

$R_{content}$ is the minimum required aggregated content consumption rate, $cand[a]$ denotes the peer ID of the $a$th candidate peer in the candidate list, and $R_{cand[a]}$ denotes the net receive rate of content sub-stream from peer ID $cand[a]$. At the beginning of a streaming session, DeMSI refreshes and enriches the Peer Cache by asking the Peer Hunter agent to find peers that carry one or more of the $k$ segments required by the requested content. Whenever DeMSI is running short of candidate peers that supply a particular segment, such that it has to contact the publisher’s dedicated servers for delivery whenever any one of the serving peers fails to satisfy its estimated net content receive rate and loss rate, DeMSI will ask the Peer Hunter agent again to find more peers that carry one or more of the next $h$ segments, including the current segment being delivered. This is known as a repeated peer-hunting request. The number $h$ must be at least 2, and is determined such that it is enough to fill up the segment cache to at least $n_{min}$ decoded fragments, which is also the threshold cache level for DeMSI to determine whether it should involve dedicated servers for delivery right away, without trying other candidate peers. Even though the implementation of the resource discovery algorithm is beyond the scope of this project, the substrate must satisfy the following requirements:

• As each peer stores the same set of segments for every movie, the resource that the algorithm needs to look for is the segment.

• It is essential for the algorithm to find more than $c$ candidate peers for each segment requested, at least one of which must be a publisher’s dedicated server. If it manages to find only one candidate peer, it must be a dedicated server.

• It shall confine the scope of peer-hunting to the consumer’s local communities. The scope of hunting may only be expanded upon a repeated hunting request.

• It is preferred that the substrate be capable of estimating the candidate peer’s upstream bandwidth in return. One way to achieve that is to employ a fast packet-dispersion based estimation method, such as SProbe [19], at the candidate peer side. The estimation involves an overhead of only a few packets and a couple of round-trips of several tens of milliseconds. For candidate peers that are new to the consumer and whose upstream bandwidth cannot be estimated, the requested streaming rate $R_{req}$ when the peer is selected is initially $R_{up\,min}$, the minimum gross upstream rate of the peer.
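The rate condition on the candidate list and the dedicated-server requirement above can be checked together, as in the following minimal Python sketch. The tuple layout and the function name `enough_candidates` are assumptions made for illustration, not part of the DeMSI-Peer Hunter API.

```python
def enough_candidates(candidates, r_content: float) -> bool:
    """candidates: list of (net_receive_rate, is_dedicated_server) tuples
    for one segment. True if their summed rate reaches R_content and at
    least one of them is a dedicated server."""
    if not candidates:
        return False
    total_rate = sum(rate for rate, _ in candidates)
    has_dedicated = any(dedicated for _, dedicated in candidates)
    return total_rate >= r_content and has_dedicated


# Hypothetical rates in kbit/s against a 1000 kbit/s content rate.
print(enough_candidates([(300.0, False), (800.0, True)], 1000.0))
```

A hunting result that fails this check would be one plausible trigger for a repeated peer-hunting request.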

Fig. 3.5-1. Example of an aggregated streaming scenario.

Existing resource discovery substrates such as Kelips [36] and Pastry [18] can be good candidates for the Peer Hunter agent, as they both have the notion of locality in the search. However, further enhancement of the substrate is unavoidable in order to satisfy the requirements stated above and to be compatible with the DeMSI-Peer Hunter API.

3.5. Scheduler and segment cache

The Scheduler is an agent that co-ordinates peer hunting and dispatches various streaming and peer monitoring tasks to be carried out during the streaming session upon request from the Player agent. The media content is served as an aggregation of $p$ sub-stream flows from $p$ active serving peers at a time, where $p > 0$. Let $actv(a)$ denote a function that returns the peer ID of the $a$th active serving peer. $p$ is determined according to the historic average net receive rate of content sub-stream $R_{actv(a)}$, $1 \le a \le p$, of each selected peer. As Fig. 3.5-1 shows, each active serving peer is assigned a fraction of the segment to be delivered to the consumer. The number of fragments to be delivered is proportioned by

$$\min\left(R_{actv(a)}\,\alpha / R_{content},\ 1\right)$$

where $\alpha$, $0 < \alpha < 1$, $\alpha \in \mathbb{R}$, is called the re-scheduling threshold. The use of $\alpha$ prevents the decision to re-schedule from being too sensitive to noise from the network and to the statistical oscillations in the calculation of $R_{actv(a)}$. The peers serve the assigned range of fragments in parallel until the consumer instructs them to stop. The total number of fragments $n_{total,actv(a)}$ of segment $S_i$ to be delivered from all the active serving peers is based on the smaller of the tolerance level and 2 times the highest loss rate ($l_{max}$) of the active serving peers, such that

$$n_{total,actv(a)} = \frac{\varsigma}{q\,(1 - \min(l_{FEC},\ 2\,l_{max}))}$$

where $\varsigma$ is the segment size and $q$ is the fragment size in bytes.
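The two quantities defined in this part of the section, the per-peer share of fragments and the total number of fragments to deliver, can be sketched as follows. The function names, the parameter name `alpha` for the re-scheduling threshold, and the rounding to whole fragments are assumptions made for illustration.

```python
import math


def fragments_for_peer(n_fec: int, r_peer: float, r_content: float,
                       alpha: float) -> int:
    """Fragments assigned to one active peer: n_FEC scaled by
    min(R_peer * alpha / R_content, 1), rounded down."""
    share = min(r_peer * alpha / r_content, 1.0)
    return math.floor(n_fec * share)


def total_fragments(segment_bytes: int, fragment_bytes: int,
                    l_fec: float, l_max: float) -> int:
    """Total fragments to deliver for a segment: segment size over
    fragment size * (1 - min(l_FEC, 2 * l_max)), rounded up."""
    redundancy = min(l_fec, 2.0 * l_max)
    return math.ceil(segment_bytes / (fragment_bytes * (1.0 - redundancy)))


# Hypothetical values: a peer at half the content rate, threshold 0.5.
print(fragments_for_peer(1000, 500.0, 1000.0, 0.5))
print(total_fragments(1_000_000, 1_000, 0.25, 0.0))
```

When no loss has been observed, only the source fragments are requested; the redundancy grows with the highest loss rate until it reaches the FEC tolerance level.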

When a fragment $f$ of segment $S$ is received from the active serving peers via the Segment Receiver into the Segment Cache, it is in FEC-encoded form. It is only decoded after receiving $n - 1$ other fragments of segment $S$. Therefore, the Segment Cache contains both encoded and decoded fragments. A decoded fragment is removed from the Cache after it is consumed by the Player.

Let $t_{max}$ be the maximum time allowed for the Peer Hunter to collect discovery responses from the peers, and $d_{max}$ be the worst case time taken to decode a received segment. The Scheduler may contact any type of discovered peers for streaming if the number of received and decoded fragments in the Segment Cache is larger than $n_{min}$ where

$$n_{min} = n + (t_{max} + d_{max})\,R_{content} / q$$

Otherwise the Scheduler contacts only dedicated servers for streaming. On the other hand, if the number of fragments received in the Segment Cache, decoded or not, is larger than the maximum number of fragments allowed in the Segment Cache $n_{max}$, where

$$n_{max} = n_{min} + h\,n,$$

the Scheduler pauses until the condition no longer holds.
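The cache thresholds and the resulting Scheduler behaviour can be summarised in a short Python sketch. The function names and the three mode labels are hypothetical, and the example assumes times in seconds, the content rate in bytes per second and the fragment size in bytes.

```python
def cache_thresholds(n, t_max, d_max, r_content, q, h):
    """Return (n_min, n_max) where n_min = n + (t_max + d_max) * R_content / q
    and n_max = n_min + h * n."""
    n_min = n + (t_max + d_max) * r_content / q
    n_max = n_min + h * n
    return n_min, n_max


def scheduler_mode(decoded: int, received_total: int,
                   n_min: float, n_max: float) -> str:
    """Decide what the Scheduler may do given the Segment Cache level."""
    if received_total > n_max:
        return "pause"            # cache is full enough; wait
    if decoded > n_min:
        return "any_peer"         # safe to try subscriber peers
    return "dedicated_only"       # cache too low; use dedicated servers


n_min, n_max = cache_thresholds(1000, 2.0, 1.0, 125000.0, 1000.0, 2)
print(n_min, n_max, scheduler_mode(500, 800, n_min, n_max))
```

The margin above `n` covers the fragments the Player would consume while a peer hunt and a segment decode are in progress.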

The segment next to the most recently delivered one, or the first segment to be scheduled for delivery in a streaming session, is called the urgent segment. There is only one urgent segment at any time in a streaming session. The urgent segment is given priority in the scheduling and re-scheduling processes. The selection of candidate peers and the scheduling of streaming tasks for each segment are described in the following pseudo-code:

1. f_b = 0; f_e = 0; l_max = 0; n_total,i+1 = n_FEC; // where f_b, f_e denote the first fragment ID and the last fragment ID to be scheduled for delivery, respectively
2. For each segment S_i, 0 ≤ i < numSegments(M_v)
3.     Get list of discovered peer candidates that carry segment S_i (excluding the ones tried in the previous round) sorted by subscriber first, C ascending, R descending, online first, s descending, k descending, l ascending, T ascending, t ascending;
4.     For each peer candidate P_j from the list until all n_total,i fragments have been scheduled or end of list
5.         If S_i is not urgent & (P_j is a dedicated server or P_j does not carry the urgent segment as well or C_j > 0), continue with next candidate;
6.         If S_i is urgent and number of fragments decoded ≤ n_min & P_j is not a dedicated server, continue with next candidate;
7.         If no. of fragments decoded > n_min & P_j is a dedicated server, PeerHunter.findPeers(S_i, S_i+1);
8.         If P_j can be connected, wait until R_recv − min(R_up max, w R_j) < R_down max; Else continue with next candidate; // R_recv is the aggregated gross receive rate; R_up max is the maximum gross upstream rate from a peer; w is the growth factor allowed for R_j; R_down max is the max allowed aggregated gross downstream rate at the consumer
9.         f_e = f_b + n_FEC min(R_j α / R_content, 1); // 0 < α < 1, α ∈ R is the re-scheduling threshold
10.        If requestDelivery(P_j, f_b, f_e, R_j) is successful, {f_b = f_e + 1; R_e,i,j = R_j; n_total,i = ς/(q(1 − min(l_FEC, 2 l_max))); Repeat from 3} else f_e = f_b; // where R_e,i,j is the estimated content receive rate for the delivery request, ς is the segment size and q is the fragment size
11.    End For;
12.    If S_i is not urgent, wait until S_i is urgent;
13.    If there are still fragments remaining to be scheduled, repeat from 3;
14.    l_max = 0; n_total,i+1 = n_FEC;
15. End For;
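The multi-key ordering in step 3 of the pseudo-code maps naturally onto a tuple sort key. The sketch below is an illustration only: the `Candidate` record and its field names are hypothetical stand-ins for the peer data object, with fields named after the metrics of Section 3.3.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    peer_id: str
    dedicated: bool   # True for a dedicated server, False for a subscriber
    online: bool
    C: float          # congestion index
    R: float          # average net receive rate
    s: int            # first segment ID offered
    k: int            # number of segments offered
    l: float          # average loss rate
    T: float          # average round-trip time
    t: float          # average response time to a hunting request


def candidate_order(p: Candidate):
    """Subscriber first, C ascending, R descending, online first,
    s descending, k descending, l ascending, T ascending, t ascending."""
    return (p.dedicated, p.C, -p.R, not p.online, -p.s, -p.k, p.l, p.T, p.t)


def sort_candidates(candidates):
    return sorted(candidates, key=candidate_order)
```

Because Python compares tuples element by element, later keys only break ties left by earlier ones, which mirrors the order in which the criteria are listed in step 3.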

For newly discovered peers, the consumer has only the static service level information about the candidate serving peers discovered. The dynamic service level information is mostly unknown except $t_j$. Peers offering the same segment are initially selected in ascending order of $t_j$, and if the candidate list is big enough, the selection process avoids picking peers that have the same first 24 bits of the IP address, except the first one in the sorted candidate list. The intuition is that the longer the $t_j$, the more probable it is that the candidate is further from the consumer, and the more probable that the packet path encounters a congested link. As it is common to allocate the last 8 bits of the IP addresses to the same ISP, or in many cases, to the same LAN of an enterprise, peers that have the same first 24 bits of the IP address have quite a high probability of sharing the same backbone, which may be of limited capacity. A special case is when there are peers that have the same IP address, which probably suggests that they are behind the same firewall/NAT, which may introduce a bottleneck. All selected candidate peers that are new are initially allocated a requested streaming rate $R_{req}$ equal to $R_{up\,min}$, except in the case where the Peer Hunter agent supports rate estimation as discussed in Section 3.4. This is also the basis for the estimated net content receive rate $R_j = R_{up\,min}(1 - l_j)U$. As the sub-stream flow arrives at the consumer node, the rest of the dynamic service level metrics can be collected.
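The initial selection rule for new peers, ascending response time with at most one peer per 24-bit address prefix, can be sketched with the standard `ipaddress` module. The function name `pick_diverse`, the tuple layout and the assumption of IPv4 addresses are illustrative choices.

```python
import ipaddress


def pick_diverse(candidates, limit: int):
    """candidates: list of (ipv4_string, response_time) tuples.
    Pick up to `limit` peers in ascending response time, keeping only
    the first peer seen for each /24 prefix."""
    chosen, seen_prefixes = [], set()
    for ip, _response_time in sorted(candidates, key=lambda c: c[1]):
        prefix = ipaddress.ip_network(f"{ip}/24", strict=False)
        if prefix in seen_prefixes:
            continue  # shares the first 24 bits with an earlier pick
        seen_prefixes.add(prefix)
        chosen.append(ip)
        if len(chosen) == limit:
            break
    return chosen


peers = [("10.0.0.5", 0.03), ("10.0.0.9", 0.01),
         ("192.168.1.2", 0.02), ("172.16.0.1", 0.05)]
print(pick_diverse(peers, 3))
```

Peers with identical addresses, such as those behind one NAT, fall under the same prefix and are therefore also limited to a single pick.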

3.6. Segment receiver

While the Scheduler agent schedules the streaming of content segment by segment, the Segment Receiver agent listens on the UDP port for streams of fragments from the active serving peers. It parses each packet received and updates any dynamic service level information ($R_j$, $l_j$, $T_j$) of the source peer data object $P_j$ whenever applicable. The timestamps from both the origin and the receiving end are stored if the packet contains either a round-trip-time reply or a probe for inference of shared congestion points. Please refer to Section 3.7 for details.

In the event of a changing packet loss rate and/or round-trip time for the sub-stream flow from $P_j$, the Segment Receiver adjusts the upstream rate $R_{req,j}$ according to the renewed calculation of the TCP friendly rate [3] based on round-trip time and packet loss rate. This congestion control mechanism ensures that both the round-trip time and the loss rate can be kept under control. The peer $P_j$ is informed of such a change only at regular intervals by the Peer Monitor, as discussed in Section 3.7. We fine-tune the TCP friendly rate equation in order to allow a slightly more aggressive streaming rate allocation at the expense of a slightly higher delay to reach its equilibrium state:

$$R_{req,j} = \frac{q}{U\left(T_j\sqrt{\dfrac{2000\,l_j}{3}} + 12\,T_j\sqrt{\dfrac{3000\,l_j}{8}}\;l_j\,(1 + 32\,l_j^2)\right)}$$

Fragments received are initially left untouched in FEC-encoded form and stored in the Segment Cache. They are decoded, consumed by the Player, and finally purged from the cache at a later time. Please refer to Section 3.5 for more detail about the arrangement of received fragments in the Segment Cache.
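For reference, the fine-tuned TCP friendly rate used by the Segment Receiver can be evaluated as in the Python sketch below. It assumes the rate expression reads as the fragment size divided by U times the sum of a square-root term in 2000·l/3 and a timeout term in 3000·l/8, which follows the usual shape of the TCP friendly rate equation. The function name and the example values are hypothetical, and the units of the round-trip time are not stated in the text.

```python
import math


def requested_rate(q: float, utilization: float, rtt: float, loss: float) -> float:
    """R_req = q / (U * (T*sqrt(2000*l/3)
                         + 12*T*sqrt(3000*l/8) * l * (1 + 32*l^2)))."""
    if loss <= 0.0 or rtt <= 0.0:
        raise ValueError("loss rate and round-trip time must be positive")
    denominator = utilization * (
        rtt * math.sqrt(2000.0 * loss / 3.0)
        + 12.0 * rtt * math.sqrt(3000.0 * loss / 8.0)
        * loss * (1.0 + 32.0 * loss ** 2)
    )
    return q / denominator


# Hypothetical inputs: 1000-byte fragments, U = 0.9, RTT = 50, 1% loss.
print(requested_rate(1000.0, 0.9, 50.0, 0.01))
```

The function decreases as either the loss rate or the round-trip time grows, which is the property the congestion control mechanism relies on.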

We anticipate that a future version of the Segment Receiver will also handle the reception of segments of new content re-distributed from other peers, and co-ordination of the archival process with the Storage Manager.

3.7. Peer monitor

The Peer Monitor agent invokes itself regularly at a fixed interval. It performs the following tasks at each execution for each active serving peer:

1. Sending a request for measurement of the round-trip time between the consumer and the active serving peer. The request is sent once per second, except during the first two seconds of a session with a particular peer, when the frequency is 4 in the first second followed by 2 in the second, in order to reduce the extra delay in response incurred in the initialization stage at the peer side.

2. As the loss rate of a peer is usually well below the FEC tolerance level, it is a waste of network resources to have all the redundant fragments delivered in order to support the tolerance level. Therefore, the Scheduler usually does not schedule all the redundant fragments to be delivered to the consumer. However, when the loss rate of an active peer goes beyond the rate estimated at the time of scheduling, the Peer Monitor assigns an instance of the Re-scheduler agent to schedule another group of candidate peers to deliver the remaining redundant fragments of the segment. This task is performed once a second.

3. Examination of the dynamic service level information once per second. The Peer Monitor informs the Re-scheduler to re-schedule the delivery of a range of fragments $[f_b, f_e]$ upon encountering one of the following events from an active peer $P_j$:
• When the renewed TCP friendly rate $R_{req,j} < R_{up\,min}$.
• When the current net content receive rate $R_j$ is smaller than $R_{e,i,j}\,\alpha$, the estimated content receive rate of the current delivery request scaled by the re-scheduling threshold $\alpha$.
• When an active serving peer goes offline suddenly.
• When $l_j > \min(l_{FEC}, 2\,l_{max})$.

4. Inference of network congestion points possibly shared by the sub-stream flows, as discussed below.
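The four re-scheduling events listed in task 3 amount to a simple predicate over the dynamic metrics of an active peer. The following Python sketch is a minimal illustration; the function name and parameter names are hypothetical, with `alpha` standing for the re-scheduling threshold.

```python
def needs_reschedule(r_req: float, r_up_min: float, r_net: float,
                     r_est: float, alpha: float, online: bool,
                     loss: float, l_fec: float, l_max: float) -> bool:
    """True if any of the four re-scheduling events holds for a peer."""
    return (
        r_req < r_up_min                      # TCP friendly rate too low
        or r_net < r_est * alpha              # receive rate below expectation
        or not online                         # peer went offline
        or loss > min(l_fec, 2.0 * l_max)     # loss beyond planned redundancy
    )


# Hypothetical healthy peer: none of the four events applies.
print(needs_reschedule(50.0, 40.0, 90.0, 100.0, 0.8, True, 0.01, 0.2, 0.02))
```

Scaling the estimated rate by the threshold keeps small fluctuations in the measured rate from triggering a re-schedule.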


The peer selection technique used by DeMSI employs a network congestion avoidance strategy. In order to achieve the goal, DeMSI has to have knowledge about the path of the sub-stream flow from each peer, such that the hop-links that are shared by two or more sub-stream flows can be identified. Recent works [15,16,21] indicate that the inference of fine-grain knowledge such as network topology takes too much time to converge, even for a small network consisting of a few peers. DeMSI has to visit a diverse selection of candidate peers over the course of the streaming due to the decentralized storage of content segments. The adverse effect of having a particular group of peers streaming through the same congested link of the network becomes less significant, as each sub-stream flow is likely to be short-lived. The inference of coarse-grain knowledge such as shared congestion points [17,1,2] of the network is enough for the purpose. A short period of convergence is what DeMSI requires, as a sub-stream flow from each peer is often short-lived: a life span ranging from 1 to 5 s out of a segment of 10-s playing time is typical. The short life span can be worked around by forcing a peer to deliver at least 2 segments consecutively, but this significantly reduces the flexibility to scale the storage offering from a subscriber. Another challenge for DeMSI in implementing a congestion based inference algorithm is the need to have packets flowing at the same time from the peers to be correlated, for the period of correlation. As it is impossible to correlate a large number of candidate peers (in the order of hundreds) before the selection process can even start, the knowledge is accumulated incrementally during a streaming session.

Our inference algorithm extends Rubenstein’s method of determining whether two flows share a congested link by a correlation test on packet delay samples [1]. Once every second, the Peer Monitor performs a pair-wise correlation of the sub-stream flows from the set of active serving peers to determine whether any two peers share a congested link. The mappings between the peers and the congested links are kept across streaming sessions. The test measures two quantities: the correlation of time spacing between adjacent probe packets from two sub-streams, spaced apart by time x_x > 0 (expressed as a cross-correlation coefficient Mx), and the correlation of time spacing between successive probe packets from one of the two sub-streams, spaced apart by time x_a > x_x (expressed as an auto-correlation coefficient Ma). When Mx > Ma, the sub-streams share at least one congested link. Otherwise they do not. The idea is as follows. Consider two sub-streams of packet flow in which the time spacing between successive probe packets within a flow is a Poisson random variable of mean k. When they flow through a pipe with a service rate larger than their aggregated rate, the packets do not queue up, so the time spacing between them remains essentially unchanged. The time spacing therefore remains Poisson and hence uncorrelated. In contrast, when the probe packets of the two sub-streams travel through a congested link, the time spacing between adjacent probe packets from the two sub-streams is shorter than that between successive probe packets of one sub-stream. The spacing between the probe packets no longer follows the Poisson distribution, because the packets now follow the same independent, identically distributed general distribution as the congested link’s service rate, which introduces correlation in the spacing between packets of the sub-stream flows. The delay of each probe packet is calculated using the timestamps from both the origin and the receiving end. As Rubenstein’s correlation test algorithm assumes no network-layer path diversity in the topology used by the flows, the same assumption applies to our inference algorithm.
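As a rough illustration of the comparison Mx > Ma, the sketch below computes both coefficients from two lists of timestamped delay samples. It is neither the DeMSI implementation nor a faithful reproduction of the estimator in [1]: the use of the Pearson coefficient, the pairing rule (each sample of flow 1 is paired with the most recent earlier sample of flow 2, and with its own predecessor in flow 1), and all function names are assumptions made for this example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        return 0.0
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)

def cross_pairs(flow1, flow2):
    """Pair each (time, delay) sample of flow1 with the latest earlier sample of flow2.

    Both flows are assumed to be sorted by time.
    """
    pairs, j = [], -1
    for t1, d1 in flow1:
        while j + 1 < len(flow2) and flow2[j + 1][0] < t1:
            j += 1
        if j >= 0:
            pairs.append((d1, flow2[j][1]))
    return pairs

def coefficients(flow1, flow2):
    """Return (Mx, Ma): cross-correlation of flow1 against flow2, auto-correlation of flow1."""
    pairs = cross_pairs(flow1, flow2)
    m_x = pearson([a for a, _ in pairs], [b for _, b in pairs])
    delays = [d for _, d in flow1]
    m_a = pearson(delays[1:], delays[:-1])
    return m_x, m_a

def share_congested_link(flow1, flow2):
    """One-sided test: True when Mx > Ma."""
    m_x, m_a = coefficients(flow1, flow2)
    return m_x > m_a
```

In practice the samples would come from the probe timestamps described above, and the test would be repeated over each correlation period.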

The active serving peers are grouped together by the points of congestion identified incrementally during a streaming session, as follows. At the very beginning, the Peer Monitor assumes that no peers share any congested links. When the sub-stream flows from active peers P1 and P2 are found to share a congested link, a group g1 that represents a point of congestion is created, and P1 and P2 are inserted into that group. Later in the streaming session, P1 no longer delivers but P3 starts delivery. The Peer Monitor takes a duration of time d, or d/k probes in each sub-stream, to find out that P2 and P3 share a congested link. Knowing that the flow paths from the peers converge as they approach the consumer, and that the paths usually remain unchanged for at least a day [9], it is quite safe for our algorithm to adopt a transitive induction approach to relate a new inference to existing ones inferred minutes before. Therefore, P3 joins g1 because P2 belongs to g1.

Now let us assume that another group g2 has been formed with members P7, P8 and P9 in another streaming session. P2 is no longer an active peer, but P1 and P8 are, and they are found to share a congested link. Since P1 belongs to g1, P8 belongs to g2, and g2 has more members than g1, g1 is deleted and its members are moved to g2.
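The grouping rules above (create a group, join by transitive induction, merge the smaller group into the larger) can be sketched as a small bookkeeping class. The class and method names are hypothetical, and the tie-breaking rule for groups of equal size is an assumption, since the text only covers the case where one group is larger.

```python
class CongestionGroups:
    """Maps peers to inferred point-of-congestion groups (illustrative helper)."""

    def __init__(self):
        self.group_of = {}   # peer id -> group id
        self.members = {}    # group id -> set of peer ids
        self._next_id = 1

    def _add(self, peer, gid):
        self.members[gid].add(peer)
        self.group_of[peer] = gid

    def record_shared_link(self, p, q):
        """Record that peers p and q were found to share a congested link."""
        gp, gq = self.group_of.get(p), self.group_of.get(q)
        if gp is None and gq is None:
            gid = f"g{self._next_id}"      # new point of congestion
            self._next_id += 1
            self.members[gid] = set()
            self._add(p, gid)
            self._add(q, gid)
        elif gp is None:
            self._add(p, gq)               # transitive induction: p joins q's group
        elif gq is None:
            self._add(q, gp)
        elif gp != gq:
            # Merge the smaller group into the larger one and delete it.
            # Keeping p's group on a tie is an assumption of this sketch.
            if len(self.members[gp]) >= len(self.members[gq]):
                keep, drop = gp, gq
            else:
                keep, drop = gq, gp
            for peer in self.members.pop(drop):
                self._add(peer, keep)


groups = CongestionGroups()
groups.record_shared_link("P1", "P2")    # creates g1
groups.record_shared_link("P2", "P3")    # P3 joins g1 through P2
groups.record_shared_link("P7", "P8")    # creates g2
groups.record_shared_link("P8", "P9")
groups.record_shared_link("P9", "P10")   # g2 now has four members
groups.record_shared_link("P1", "P8")    # g1 (smaller) is merged into g2
```

After the last call only g2 remains, holding all seven peers.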

The inference algorithm of the Peer Monitor takes a conservative approach in determining whether two sub-stream flows are routed through a congested link. In other words, the inference algorithm would rather return a false negative (it determines that the sub-stream flows do not share a congested link when they actually share one) than a false positive. Both types of error have an adverse effect on the inference accuracy. False negatives increase the probability of selecting peers that share a congested link into the set of active serving peers. False positives, however, lead to lower utilization of peers that do not share a congested link, in addition to the adverse effect of false negatives. First, let us denote Mx-1,2 as the cross-correlation coefficient calculated from the delay samples of P1 against those of P2, and Ma-1 as the auto-correlation coefficient calculated from the delay samples of P1. When two sub-stream flows from active peers P1 and P2 are initially found to be correlated (to share a congested link) by testing whether Mx-1,2 > Ma-1, the sub-streams are then re-tested on whether Mx-2,1 > Ma-2. Second, our experiments indicate that false positives often result when Ma is small, which implies that the flow itself is unlikely to be congested by any hop-link it traverses. Therefore, our algorithm considers P1 and P2 to be correlated only if they pass both tests on the correlation coefficients and min(Ma-1, Ma-2) ≥ d. If they pass the two-sided correlation tests but min(Ma-1, Ma-2) < d, no conclusion is made and the outcome of the comparison is ignored. Otherwise, they are considered uncorrelated. Experiments showed that the combined use of the two-sided correlation tests and the avoidance of small Ma significantly reduced the chance of getting false positives. The downside is that it also slightly increases the chance of getting false negatives. In addition, if two active peers are found to be uncorrelated but have been allocated to the same group, both are removed from that group, in keeping with the conservative approach.
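The three possible outcomes of the conservative test can be written as a small decision function. This is a sketch of the rule as stated above; the function name and the string labels are illustrative, and the default threshold of 0.2 is the value of d reported later in Section 4.1.

```python
def classify_pair(mx_12, ma_1, mx_21, ma_2, d=0.2):
    """Classify peers P1 and P2 as 'correlated', 'ignored' or 'uncorrelated'.

    mx_12, mx_21: cross-correlation coefficients in both directions.
    ma_1, ma_2:   auto-correlation coefficients of each flow.
    d:            minimum auto-correlation needed to trust a positive result.
    """
    passes_both = mx_12 > ma_1 and mx_21 > ma_2
    if not passes_both:
        return "uncorrelated"
    if min(ma_1, ma_2) >= d:
        return "correlated"
    return "ignored"  # both tests passed, but Ma is too small to draw a conclusion
```

For example, `classify_pair(0.9, 0.1, 0.8, 0.4)` passes both one-sided tests but is ignored because the smaller auto-correlation coefficient falls below d.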

3.8. Re-scheduler

Network conditions, in terms of dynamic service level metrics, and peer availability change over time. Although the trend of the time series usually follows a pattern [22,23], in the very short and immediate term the changes occur by random quantities at random times, possibly within a range bounded by the trend. It is crucial for DeMSI to react to random adverse changes in a timely fashion, by assigning additional peers to rectify a lagging aggregated streaming rate and to meet time-to-play deadlines. This is where the Re-scheduler agent comes into play. There can be multiple instances of the Re-scheduler agent, each of which takes care of a re-scheduling task concurrently for a different range of fragments to be received.

Assume that there is an active serving peer Pactv which is delivering fragments up to fcurr,0 of segment Sdr(0), where dr(0) denotes the segment ID of the delivery request r = 0 currently being served. It has been scheduled to deliver up to fragment fe,0, but the Peer Monitor has detected an event that requires re-scheduling. The role of the Re-scheduler is to find and schedule another candidate peer that is suitable for assisting Pactv in delivering the range of outstanding fragments. The algorithm for the Re-scheduler takes a highly adaptive divide-and-conquer approach. Firstly, as Pactv is still delivering the fragments at a slower than expected rate, the range of outstanding fragments is re-scheduled to be delivered by another peer Pj in the reverse direction of the current sub-stream from Pactv, in order to avoid repeated delivery of the same fragments. Secondly, as it cannot be assumed that Pj can assist Pactv within the newly estimated time frame, the re-scheduling algorithm simply treats this new schedule as another, smaller delivery request rj which assists the original one, ractv, scheduled to Pactv. In other words, the algorithm may locate yet another peer to assist rj. We call this the “spiral”, or recursive divide-and-conquer, re-scheduling strategy. Like the Scheduler, the re-scheduling algorithm has a notion of the “urgent segment”, which is the segment next to the most recently delivered one. The key implication of the urgent segment from the perspective of the Re-scheduler is that any active serving peer will be called upon if it has a copy of the urgent segment, unless it is already serving some other fragments of the urgent segment. In other words, even if a peer is delivering a non-urgent segment, it will be preempted to serve the urgent segment first as instructed by the Re-scheduler.
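The reverse-direction idea can be pictured with a toy simulation. Below, a slow peer keeps delivering forward from the current fragment while an assisting peer delivers backward from the end of the range; they stop when they meet, so no fragment is delivered twice. The per-tick delivery counts and the function name are invented for illustration, and the sketch leaves out the rate estimation and the recursion of the actual algorithm.

```python
def deliver_from_both_ends(f_curr, f_end, fwd_per_tick, bwd_per_tick):
    """Simulate two peers covering fragments f_curr..f_end (inclusive) from opposite ends.

    Returns (fragments sent by the forward peer, fragments sent by the backward peer).
    """
    if fwd_per_tick <= 0 and bwd_per_tick <= 0:
        raise ValueError("at least one peer must make progress")
    lo, hi = f_curr, f_end            # next fragment each peer will send
    forward, backward = [], []
    while lo <= hi:
        for _ in range(fwd_per_tick):  # slow original peer, forward direction
            if lo > hi:
                break
            forward.append(lo)
            lo += 1
        for _ in range(bwd_per_tick):  # assisting peer, reverse direction
            if lo > hi:
                break
            backward.append(hi)
            hi -= 1
    return forward, backward


fwd, bwd = deliver_from_both_ends(10, 19, fwd_per_tick=1, bwd_per_tick=3)
```

With these numbers the forward peer ends up sending fragments 10 to 12 and the assisting peer fragments 19 down to 13, with no overlap between the two.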

The pseudo-code for re-scheduling is as follows. Fig. 3.8-1 illustrates an example of how a delivery request is re-scheduled, in a spiral fashion, to be carried out by another peer.


Fig. 3.8-1. An example to show the Re-scheduler at work – a delivery request is re-scheduled in a spiraling divide-and-conquer fashion.


1. r = 0; This is round 1 of the workflow;
2. For each Sdr(r) in delivery request r scheduled for the active serving peer Pactv
3. If this is round 1, Get list of discovered peer candidates that carry segment Sdr(r) (excluding Pactv & others tried in previous round for the same delivery request) sorted by subscriber first, online first, C ascending, l ascending, R descending, s descending, k descending, T ascending, t ascending;
4. If this is round 2, Get list of discovered peer candidates that carry segment Sdr(r) (excluding Pactv & others tried in previous round for the same delivery request) sorted by subscriber first, offline first, C ascending, l ascending, R ascending, s descending, k descending, T ascending, t ascending
5. For each Pj from the list until end of list
6. If this is round 1 & Pj is offline, {re-sort the list of discovered candidates by subscriber first, offline first, C ascending, l ascending, R ascending, s descending, k descending, T ascending, t ascending; The workflow is now in round 2; go to 5};
7. If this is round 2 & Pj is online
8. If r is 0 & Sdr(r) is not an urgent segment,
9. Wait until Sdr(r) becomes urgent; The workflow is now back to round 1;
10. Repeat from 3;
11. Else go to 33; // It means that the Re-scheduler has run out of candidates. Pactv has to be on its own!
12. End If;
13. End If;
14. If r is 0 & Sdr(r) is not an urgent segment & (Pj is a dedicated server or Pj does not carry the urgent segment as well or Cj > 0), continue with next candidate;
15. If size of segment cache < nmin & Pj is not a dedicated server, continue with next candidate;
16. If Pj can be connected, wait until Rrecv - min(Rup max, wRj) < Rdown max; Else continue with next candidate;
17. D = |fe,r - fcurr,r|; // D is the number of fragments left to be delivered - 1
18. g = 0 if D == 0; otherwise g = -(fe,r - fcurr,r) / |fe,r - fcurr,r|; // g ∈ {-1, 0, 1} is the unit-direction multiplier to indicate the direction of streaming for the new schedule. That is, the opposite of the direction for the current schedule.
19. Rres = min(Rcontent, (D + 1)q/sleft); // Rres is the new content stream rate required from the candidate; sleft = max(1, (1/Rcontent) - selasped - 1) is the time left for fulfilling the delivery of this range of fragments; selasped is the time already spent on the delivery of the current fragments range
20. fb-new = fe,r;
21. fe-new = fb-new + g D min(Rj-cand a/Rres, 1);
22. If requestDelivery(Pj, fb-new, fe-new, Rj) is successful
23. If current active peer is still online
24. Inform the current active peer to deliver up to fragment fe-new + g;
25. Renew the estimated net content receive rate of the current active peer Re-actp = min(Rcontent, (|fe-new + g - fcurr,r| + 1) q/(sleft a));
26. Go to 34;
27. Else
28. If g fe-new < g fcurr,r, {fe,r = fe-new + g; go to 3}
29. End If;
30. Go to 34;
31. End If;
32. End For;
33. If r is 0 & tried all Pj & Sdr(r) is not urgent, {Wait until Sdr(r) becomes urgent; The workflow is now in round 1; go to 3};
34. If Pj is still online, quit;
35. End For;
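To make steps 17 to 21 concrete, the sketch below computes the direction multiplier g and the fragment range handed to the assisting peer. It follows the formulas in the pseudo-code, with two assumptions that the paper does not state: the time left (sleft) is passed in directly rather than derived, and the end fragment is truncated to a whole number. The function and parameter names are illustrative.

```python
def plan_assist_range(f_curr, f_end, r_content, r_cand, a, q, s_left):
    """Return (g, r_res, f_b_new, f_e_new) for the assisting delivery request.

    f_curr, f_end: current and final fragment of the existing schedule.
    r_content:     content consumption rate (kB/s).
    r_cand:        rate offered by the candidate peer (kB/s).
    a:             the factor a applied to the candidate's rate.
    q:             fragment size (kB).
    s_left:        time left to deliver the outstanding range (s), supplied by the caller.
    """
    d = abs(f_end - f_curr)                        # step 17
    if d == 0:
        g = 0
    else:
        g = -1 if f_end > f_curr else 1            # step 18: opposite of current direction
    r_res = min(r_content, (d + 1) * q / s_left)   # step 19
    f_b_new = f_end                                # step 20: start from the far end
    share = min(r_cand * a / r_res, 1.0)           # fraction the candidate can cover
    f_e_new = f_b_new + g * int(d * share)         # step 21, truncated (assumption)
    return g, r_res, f_b_new, f_e_new


g, r_res, f_b_new, f_e_new = plan_assist_range(
    f_curr=200, f_end=1023, r_content=100, r_cand=30, a=0.84, q=1, s_left=8
)
```

With these illustrative inputs the assisting peer streams backward from fragment 1023 to 816, and per step 24 the original peer would be told to stop at fragment 815.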

A more aggressive extension of the scheduling/re-scheduling algorithm is to maintain an idle connection with a redundant peer after each segment is scheduled for delivery by the Scheduler, and after an outstanding delivery request is re-scheduled by the Re-scheduler. This strategy moves the time-consuming socket connection process to an earlier time, before the failure event occurs. It helps ensure a smooth transition in the case of an emergency switch-over, such as when an active serving peer goes offline while the streaming is in progress. One way to implement this is to have the Scheduler/Re-scheduler spawn a separate thread, which tries to establish a TCP connection with each subsequent candidate peer in the sorted list until one of them is connected. This peer only serves as a stand-by when there is no re-scheduling activity. Otherwise, a Re-scheduler agent spawned at a later time may communicate with the redundant peer right away without the need to make a prior TCP connection. When the redundant peer is consumed, the Scheduler/Re-scheduler has to locate another one immediately in case of subsequent use. In the case where the candidate list is exhausted or left with only dedicated servers, the thread approaches the Peer Hunter to discover more peers that carry the segment it needs before the trial connection process can continue.
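A minimal sketch of the stand-by thread is given below, written in Python rather than the Java of the prototype. It walks the sorted candidate list and keeps the first TCP connection that succeeds. The class and function names are hypothetical, the connector is injectable so that the logic can be exercised without a network, and the fallback to the Peer Hunter is left out.

```python
import socket
import threading

def connect_first_available(candidates, connect=socket.create_connection, timeout=2.0):
    """Try (host, port) candidates in order; return (address, connection) for the first success."""
    for address in candidates:
        try:
            return address, connect(address, timeout)
        except OSError:
            continue  # refused or unreachable: move on to the next candidate
    return None, None  # list exhausted; the caller would ask for more peers

class StandbyFinder(threading.Thread):
    """Background thread that establishes an idle connection to a redundant peer."""

    def __init__(self, candidates, connect=socket.create_connection):
        super().__init__(daemon=True)
        self.candidates = candidates
        self.connect = connect
        self.result = (None, None)

    def run(self):
        self.result = connect_first_available(self.candidates, self.connect)
```

A Scheduler would start one `StandbyFinder` after scheduling a segment and read `result` once the thread has finished; the connection is then ready for an emergency switch-over.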

3.9. Segment Sender

The Segment Sender agent is responsible for the delivery of a segment, in part or in whole, as a sub-stream of fragments in response to a delivery request from the consumer. Fragments can be streamed in either forward or backward sequence in order to be compatible with the re-scheduling algorithm. The streaming in progress may be preempted by a subsequent delivery request from the Re-scheduler if it requests a segment whose ID is smaller than that of the one currently being delivered. The Segment Sender also handles round-trip-time request tokens and the generation of probes as required by the Peer Monitor for the inference of congestion points. The probes are generated such that they are spaced apart by x, where x is a Poisson random variable with mean k. The round-trip-time reply and the probe are piggybacked onto the sub-stream packet. As every sub-stream packet contains timestamps at the origin and the receiving end, the probe does not introduce any additional overhead. It is distinguished from a normal sub-stream packet by simply flipping the packet ID field to a negative value.
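The probe spacing and the sign-flip marking can be sketched as follows. The sketch assumes that the Poisson probing amounts to a Poisson process, that is, exponentially distributed gaps with mean k, and that packet IDs are positive integers so that negation is unambiguous. Neither assumption is stated explicitly in the text, and the function names are illustrative.

```python
import random

def probe_times(mean_gap, duration, seed=None):
    """Return probe send times in [0, duration) with exponentially distributed gaps."""
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(1.0 / mean_gap)  # gap with the given mean (assumed model)
        if t >= duration:
            return times
        times.append(t)

def mark_as_probe(packet_id):
    """Flip the packet ID to a negative value to flag the packet as a probe."""
    if packet_id <= 0:
        raise ValueError("packet IDs are assumed to be positive")
    return -packet_id

def is_probe(packet_id):
    """A negative packet ID identifies a probe."""
    return packet_id < 0
```

For instance, `probe_times(0.1, 4.0)` corresponds to a mean probing rate of 10/s over a 4 s correlation period, which yields about 40 probes on average.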

We anticipate that a future version of the Segment Sender will also participate in the new content re-distribution process. Its role will be to deliver whole segments to other peers.

4. Performance evaluation

We evaluate DeMSI under a simplex (one-way) network, shown in Fig. 4-1, emulated by the NS-2 network simulator [25]. The network is made up of eight hop-links. Each cloud represents a combination of 3 Pareto traffic sources as cross-traffic. In particular, each of the clouds c1, c4, c5 and c8 also contains 3 CBR traffic sources. Pareto sources are a good approximation of web traffic, which is self-similar, whereas CBR sources approximate other long-lived streaming traffic. To simulate the asymmetric upstream/downstream bandwidth offered by mainstream ADSL modems of today, every peer is given a 32 kB/s connection to the network, while the consumer has a 192 kB/s connection. The maximum gross upstream rate Rup max offered by each peer is also set to 32 kB/s. In other words, assuming the cross-traffic arrives at the maximum rate allowed at its link, each of the four tight hop-links r1–r9, r4–r9, r5–r9 and r8–r9 allows at most one peer to stream at the maximum rate while another streams simultaneously at marginally less than the maximum rate. Although every link has the same propagation delay of 1 ms, the bandwidth allocated to each link, the average rate of each cross-traffic source, and the shape parameter of each Pareto traffic source differ in order to promote heterogeneity. As the flow of control packets is of low volume and the control packets are small, the impact of the control flow delay, and of its difference between the consumer and each peer, is insignificant relative to the difference in delays of the sub-stream flows. Therefore, we focus on emulating the downstream paths (towards the consumer) of the network.

Fig. 4-1. Configuration of the simulated network for performance evaluation.

Fig. 4-2 illustrates how the system is set up for the evaluation experiments. We split the peers into 2 groups of 10. One group consists of peers with odd peer IDs, while the other consists of peers with even peer IDs. Each group is executed on a Pentium 4 2 GHz class workstation. The consumer peer is executed on one of the two workstations. Since we emulate a network in real time using NS-2, we assign a third workstation exclusively to NS-2. NS-2 has to be executed in real-time mode under Windows XP so that it can keep up with the events occurring in real time. Normally, during the scheduling or re-scheduling process, the consumer tries to establish a TCP connection with the selected candidate peer before the control flow, which consists of delivery requests and round-trip-time requests, commences. The candidate peer becomes an active serving peer by pushing a sub-stream flow of content fragments in UDP packets directly to the consumer. Under the NS-2 scenario, TCP connections are established as normal, but UDP flows are emulated. The UDP packets from an active serving peer are sent to the NS-2 workstation as if it were the consumer. NS-2 eventually forwards most UDP packets to the real consumer at emulated rates and with emulated delays. Some packets are not forwarded because of emulated packet loss occurring in the middle of the network.

We have implemented a prototype of DeMSI which includes a Player with a progress monitor user interface, as shown in Fig. 4-3. Although a DeMSI peer is both a consumer and a content server, the prototype supports an optional serving-peer-only execution mode. With this option, the consumer-related components, including the Player and its user interface, Segment Receiver, Scheduler, Re-scheduler and Peer Monitor, are turned off. The process under this execution option is compact enough to allow multiple instances of it to be executed on the same workstation for evaluation purposes. The prototype is implemented in Java 1.4.2 with Java Media Framework 2.1.1. The perceived dynamic service level statistics (Rj, lj, Tj, and the number of active serving peers) are collected and written to a file once a second for further analysis. In addition, each claim and each update to the peer-to-point-of-congestion mapping resulting from a pair-wise flow correlation test is written to a file whenever it becomes available.

Fig. 4-2. Physical system configuration for performance evaluation.

We encode a small portion of a video clip using MPEG-1 with a constant consumption rate Rcontent of 100 kB/s for the evaluations. We use the rather dated MPEG-1 format simply because of a constraint of the Java Media Framework, which we have leveraged for a quick implementation of the primitive Player agent. The clip consists of 25 segments of 10.24 s each. Each segment that is ready to play contains 1024 fragments. The size q of each fragment is 1 kB. The data utilizes 96% of a stream packet on average. We use a tolerance level lFEC of 0.2 for the FEC, such that each segment encoded with FEC contains 1280 fragments. The Peer Hunter agent has been implemented as a stub that simply reads a pre-defined list of candidate peers from an XML-formatted file, as if they were discovered in response to a hunting request. In order to ensure that congestion occurs in the experiments, each candidate j is assigned the following every time DeMSI is started:

$$R_j = R_{up\,max}\,(1 - l_j)\,U, \qquad l_j = 0.001, \qquad T_j = 1\ \text{ms}, \qquad t_j \text{ is assigned a random value}$$

When DeMSI is started, it has no knowledge of congestion information. Therefore, the initial selection of peers is essentially random. We use an a of 0.84 for all experiments such that, if peers deliver at Rup max, the Scheduler will schedule four peers to stream. Segments are distributed to peers evenly. Each segment Si, 5 ≤ i ≤ 19, is distributed to eight peers. Each segment Si, 0 ≤ i ≤ 4 or 20 ≤ i ≤ 24, is distributed to four peers. Four peers are dedicated servers. Table 4.1 provides the details of the assignment.
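The configuration figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below relies on three assumptions that the text does not spell out: that U is the 96% packet utilization, that the FEC-encoded fragment count is the playable count divided by (1 - lFEC), and that the number of peers scheduled is the content rate divided by a times the net peer rate, rounded up. The last formula is only one plausible reading of how a of 0.84 leads to four peers.

```python
import math

R_CONTENT = 100.0        # kB/s, constant consumption rate
SEGMENT_SECONDS = 10.24  # playing time of one segment
Q = 1.0                  # kB per fragment
L_FEC = 0.2              # FEC tolerance level
R_UP_MAX = 32.0          # kB/s, maximum gross upstream rate per peer
L_J = 0.001              # initial loss rate assigned to each candidate
U = 0.96                 # assumed: fraction of a stream packet occupied by data
A = 0.84                 # the factor a used in all experiments

fragments_per_segment = round(R_CONTENT * SEGMENT_SECONDS / Q)   # playable fragments
fec_fragments = round(fragments_per_segment / (1 - L_FEC))       # assumed FEC relation
r_j = R_UP_MAX * (1 - L_J) * U                                   # initial net rate per peer
peers_needed = math.ceil(R_CONTENT / (A * r_j))                  # assumed scheduling rule

print(fragments_per_segment, fec_fragments, round(r_j, 2), peers_needed)
```

Under these assumptions the numbers agree with the text: 1024 playable fragments, 1280 fragments with FEC, a net rate of about 30.69 kB/s per peer, and four peers to sustain the content rate.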

Fig. 4-3. DeMSI player UI showing the progress bars (shown in green) of each sub-stream flow and the inferred POC on the lower-right (shown in grayish red). Each blue number shown within a POC block represents the ID of a peer believed to have pushed a sub-stream through that POC. A blue number tuple separated by a colon represents <ID of the peer from which the sub-stream is delivered>:<ID of the fragment to be received>. The black numbers represent the start and end points of a sub-stream expressed in fragment ID. (For interpretation of the references in colour in this figure legend, the reader is referred to the web version of this article.)

Table 4.1. Distribution of segments to peers

| Peers | Segments assignment |
|---|---|
| P0, P5, P10, P15 | S0 ... S24 (dedicated server) |
| P1 ... P4 | S0 ... S9 |
| P6 ... P9 | S5 ... S14 |
| P11 ... P14 | S10 ... S19 |
| P16 ... P19 | S15 ... S24 |

4.1. Finding the optimal parameters for correlation tests

First, we survey a range of parameter value pairs, Poisson probe rate and correlation time, in order to find the optimal combination for the point-of-congestion inference algorithm under DeMSI’s aggregated streaming scenario. For each parameter pair, we start DeMSI and play the video twice in a row. Then we restart DeMSI and play the video twice again. Each positive claim (where two flows are claimed to share a point of congestion) from the pair-wise comparison of flows is verified against the actual network topology. Each playback typically generates tens of positive claims, and the number of positive claims decreases in subsequent playbacks without quitting DeMSI. The reason is that, as DeMSI accumulates knowledge of where the congestion points are, it avoids visiting more than one peer in each partially identified group. Hence the chance of getting positive claims decreases. Our experience is that the number of positive claims generated by the third playback in the same DeMSI session is of little statistical value. This survey also helps us determine an optimal value to use. We have tested a range of correlation times between 2 and 8 s. The results exhibit a trade-off between the accuracy of inference and the number of positive claims during a playback. Accuracy improves as the correlation time increases, but the rate of improvement is very small when the correlation time is more than 5 s. On the other hand, the number of claims decreases at a converging rate as the correlation time increases. This is expected because the sub-stream flow from a peer is short-lived: the probability of having two peers stream together for as long as the correlation time decreases as the correlation time increases. Therefore, we have narrowed down the survey to correlation times between 3 and 5 s. A d of 0.2 is determined. We first try several mean probing rates with a fixed correlation time of 4 s. This survey is conducted under the network topology shown in Fig. 4-1 but without cross-traffic. Congestion in the hop-links r1–r9, r4–r9, r5–r9 and r8–r9 is instead induced by reducing their bandwidth to 64 kB/s. The result suggests an obvious increase in accuracy as the probing rate increases, except that, without d filtering, the accuracy of claims for the probing rate of 10 is slightly lower than that for the probing rate of 8. This is possibly a statistical artifact resulting from the small number of samples in the survey. Table 4.1-1a shows the result of the survey.

Although an increase in the probing rate increases the accuracy, we stop at the probing rate of 10/s. Since the probes are sent in-band with the sub-stream flow, the probing rate is directly proportional to the minimum gross upstream rate, such that

$$R_{up\,min} = \frac{q}{U k}$$

A probing rate of 10/s translates to 10.4 kB/s according to our configuration. A further increase of Rup min reduces the coverage of the low-end broadband community, where the upstream bandwidth of each connection can be as low as 16 kB/s.
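The 10.4 kB/s figure can be reproduced from the relation between Rup min, q, U and k. The sketch assumes that k is the mean spacing between probes (the reciprocal of the probing rate) and that U is the 96% packet utilization quoted in the configuration; both readings are inferred from the numbers rather than stated in this section.

```python
Q_KB = 1.0   # fragment size q in kB
U = 0.96     # assumed: fraction of a stream packet occupied by data

def min_gross_upstream_rate(probes_per_second):
    """Minimum gross upstream rate (kB/s) needed to carry probes in-band at the given rate."""
    k = 1.0 / probes_per_second   # assumed: mean spacing between probes, in seconds
    return Q_KB / (U * k)

for rate in (5, 8, 10):
    print(rate, round(min_gross_upstream_rate(rate), 1))
```

For the three probing rates surveyed this gives roughly 5.2, 8.3 and 10.4 kB/s, all of which remain below the 16 kB/s upstream bandwidth of a low-end connection.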

The accuracy figures shown in Table 4.1-1a are particularly discouraging. However, when the congestion is partly due to cross-traffic, the accuracy improves significantly, as shown in Table 4.1-1b. Here we shift the focus to surveying a variety of correlation times while fixing the probing rate at 10/s. Fortunately, the network with cross-traffic resembles the Internet more closely than the network without cross-traffic.

Table 4.1-1a. Implications of increasing probing rate and the use of d using the correlation time of 4 s – the network topology without cross-traffic is used

| Probing rate (/s) | Interval b/w correlation tests (no. of probes) | Total no. of correct positive claims | No. of correct positive claims survived after d filtering | Total no. of positive claims incl. false positives | Accuracy, accuracy with d filtering (col 3/col 5, col 4/col 5) |
|---|---|---|---|---|---|
| 5 | 20 | 45 | 26 | 119 | 0.378, 0.218 |
| 8 | 32 | 44 | 27 | 74 | 0.595, 0.365 |
| 10 | 40 | 52 | 41 | 91 | 0.571, 0.451 |

Table 4.1-1b. Implications of increasing correlation time and the use of d using the probing rate of 10/s – the same network topology with cross-traffic is used

| Correlation time (s) | Interval b/w correlation tests (no. of probes) | Total no. of correct positive claims | No. of correct positive claims survived after d filtering | Total no. of positive claims incl. false positives | Accuracy, accuracy with d filtering (col 3/col 5, col 4/col 5) |
|---|---|---|---|---|---|
| 3 | 30 | 125 | 108 | 162 | 0.772, 0.667 |
| 4 | 40 | 100 | 93 | 115 | 0.870, 0.809 |
| 5 | 50 | 52 | 47 | 60 | 0.867, 0.783 |

4.2. Efficiency of scheduling and re-scheduling processes

This section provides more insight into the performance of the streaming task scheduling and re-scheduling algorithms. The objectives of the evaluation are as follows:

1. To show that the concept of the proactive peer selection algorithm based on congestion avoidance is useful under DeMSI’s decentralized storage scenario.
2. To show how our reactive re-scheduling algorithm enhances the performance of any proactive scheduling strategy.

In order to achieve the first objective, the evaluation of the algorithm has to be independent of its underlying inference algorithm, that is, the pair-wise flow correlation test algorithm by Rubenstein [1]. The experiment has to assume that the inference algorithm is 100% accurate in the point-of-congestion inference, so that it can show how well the concept works when compared against peer selection based on end-to-end bandwidth measurement [15] (or “best-bandwidth-first”, as we refer to it in the remainder of this paper). We achieve such independence by injecting the correct peer-to-point-of-congestion mappings into the data structures of the Peer Cache and Peer Monitor agents before the streaming session starts. We override the correlation test algorithm completely in this experiment. We then repeat the experiment and present the comparison using the correlation test algorithm.

Fig. 4.2-1a. Average number of active serving peers (sub-streams) – aim for less. [Plot: average no. of serving peers against elapsed time (s); series AC-ideal-nores and BW-nores.]

Table 4.2-1. Notations to be used in Figs. 4.2-1–4.2-8

| Notation | Meaning |
|---|---|
| AC-ideal-nores | Peer selection by ideal congestion avoidance without Re-scheduler |
| AC-ideal | Peer selection by ideal congestion avoidance with Re-scheduler |
| AC-nores | Peer selection by congestion avoidance using correlation tests without Re-scheduler |
| AC | Peer selection by congestion avoidance using correlation tests with Re-scheduler |
| BW-nores | Peer selection by best-bandwidth-first without Re-scheduler |
| BW | Peer selection by best-bandwidth-first with Re-scheduler |

In addition, we turn off most re-scheduling functionality except the handling of active serving peers going offline and the handling of loss rates exceeding lFEC. In other words, the streaming sessions in this experiment rely almost completely on proactive scheduling of streaming tasks, except in events that require an emergency switch-over. As a reminder, the Scheduler agent schedules streaming tasks mainly at the beginning of a segment delivery. The partitioning of a segment is revised only when it proceeds to the next segment. We run the experiment under the network topology with cross-traffic shown in Fig. 4-1. The experiment involves running and quitting the DeMSI Player 5 times. Each time, the Player plays the video for 3 repetitions without quitting DeMSI. We repeat the experiment for each of the following configurations:

1. Peer selection based on end-to-end bandwidth.
2. Peer selection based on congestion avoidance with ideal inference simulation.
3. Peer selection based on congestion avoidance with correlation test algorithm.

We also address the second objective in this experiment by repeating each of the above configurations with the Re-scheduler fully enabled.

The dynamic service level statistics of each active peer are aggregated and extracted once a second during the playback. The statistics from the 5 runs are aligned by repetition number and elapsed playback time. The records from the 5 runs that share the same elapsed time and repetition number are then averaged.
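The alignment and averaging step can be sketched as below. The record layout, a tuple of (repetition, elapsed second, value), and the function name are assumptions made for illustration; the actual log format of the prototype is not described.

```python
from collections import defaultdict

def average_runs(runs):
    """Average records that share (repetition, elapsed_second) across runs.

    Each run is a list of (repetition, elapsed_second, value) tuples.
    Returns a dict keyed by (repetition, elapsed_second).
    """
    buckets = defaultdict(list)
    for run in runs:
        for repetition, second, value in run:
            buckets[(repetition, second)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


runs = [
    [(1, 0, 4.0), (1, 1, 5.0)],   # run 1: number of active serving peers per second
    [(1, 0, 6.0), (1, 1, 5.0)],   # run 2
]
averaged = average_runs(runs)
```

Keys that are missing from some runs are averaged over the runs that do contain them, which is a simplification of this sketch.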

Figs. 4.2-1a, 4.2-2a, 4.2-2c, 4.2-3a and 4.2-3c show how far peer selection based on congestion avoidance can ideally go. The notations used in Figs. 4.2-1–4.2-8 are described in Table 4.2-1. Under the congestion avoidance selection strategy, the average number of active serving peers (hence the number of sub-streams) scheduled by the consumer at almost any time during the playback is lower than that scheduled by a consumer using selection based on end-to-end bandwidth. The average utilization of each active serving peer is also higher than that of its bandwidth-based counterpart at almost any time during the playback. Likewise, the consumer using selection based on congestion avoidance yields lower average round-trip times between the consumer and the peers than the consumer using selection based on end-to-end bandwidth. As expected, the lower average round-trip times lead to lower average loss rates than the counterpart, as shown in Figs. 4.2-4a and

Page 24: Decentralized media streaming infrastructure …buyya.com/papers/DEMSI_JSA_Elsevier.pdfDecentralized media streaming infrastructure (DeMSI): An adaptive and high-performance peer-to-peer

3.5

4

4.5

5

5.5

6

6.5

1119181713121111 6141 51 101 121 131 141 151 161 171 181 191 201

elasped time (s)

aver

age

no. o

f ser

ving

pee

rsAC-noresBW-nores

Fig. 4.2-1b. Average number of active serving peers (sub-streams) – aim for less.

3.5

4

4.5

5

5.5

6

6.5

1119181713121111 6141 51 101 121 131 141 151 161 171 181 191 201

elasped time (s)

aver

age

no. o

f ser

ving

pee

rs

AC-idealBW-nores

Fig. 4.2-1c. Average number of active serving peers (sub-streams) – aim for less.

3.5

4

4.5

5

5.5

6

6.5

1119181713121111 6141 51 101 121 131 141 151 161 171 181 191

elasped time (s)

aver

age

no. o

f ser

ving

pee

rs

BW-noresAC

Fig. 4.2-1d. Average number of active serving peers (sub-streams) – aim for less.

760 A.K. Wah Yim, R. Buyya / Journal of Systems Architecture 52 (2006) 737–772

4.2-4b. There is one characteristic in common. Thatis, the difference in performance between the twoselection strategies, in terms of any type of statistics,converges towards the end of the playback. This isbecause there are only 4 peers available to deliverthe last 5 segments: S20 . . .S24, and 3 peers out of 4share the same hop-link. As the Segment Cachehas accumulated a considerable amount of frag-

ments towards the end of the streaming session,the Scheduler does not need to contact the dedicatedservers for help. As a result, the same set of peers isselected for the delivery of the last 5 segmentsregardless of the selection strategy. Hence the differ-ence in performance converges towards the end.

[Fig. 4.2-1e. Average number of active serving peers (sub-streams); aim for less. Axes: elapsed time (s) vs. average no. of serving peers. Series: BW, BW-nores.]

[Fig. 4.2-1f. Average number of active serving peers (sub-streams); aim for less. Axes: elapsed time (s) vs. average no. of serving peers. Series: AC-ideal-nores, AC-ideal.]

[Fig. 4.2-1g. Average number of active serving peers (sub-streams); aim for less. Axes: elapsed time (s) vs. average no. of serving peers. Series: AC-nores, AC.]

[Fig. 4.2-2a. Average net content upstream rate of active serving peers; aim for more. Axes: elapsed time (s) vs. average net upstream rate (kB/s). Series: AC-ideal-nores, BW-nores.]

[Fig. 4.2-2b. Average net content upstream rate of active serving peers; aim for more. Axes: elapsed time (s) vs. average net upstream rate (kB/s). Series: AC-nores, BW-nores.]

[Fig. 4.2-2c. Average net content upstream rate of active serving peers; aim for more. Axes: elapsed time (s) vs. average net upstream rate (kB/s). Series: AC-ideal, BW.]

[Fig. 4.2-2d. Average net content upstream rate of active serving peers; aim for more. Axes: elapsed time (s) vs. average net upstream rate (kB/s). Series: AC, BW.]

[Fig. 4.2-3a. Average round-trip-time between the consumer and the active serving peers; aim for less. Axes: elapsed time (s) vs. round-trip-time (ms). Series: AC-ideal-nores, BW-nores.]

[Fig. 4.2-3b. Average round-trip-time between the consumer and the active serving peers; aim for less. Axes: elapsed time (s) vs. round-trip-time (ms). Series: AC-nores, BW-nores.]

However, when we compare the average aggregated net content receive rates between the two selection strategies, as shown in Figs. 4.2-5a and 4.2-5d, the figures achieved by peer selection based on congestion avoidance are lower than those achieved by selection based on end-to-end bandwidth. Indeed, the consumer employing the congestion avoidance selection strategy takes longer to finish streaming than the one employing the best-bandwidth-first strategy. This is largely due to the phenomenon of diversity on peer revisit. Selection by best-bandwidth-first promotes diverse selections on subsequent revisits of previously used peers that have encountered congestion before. This can be illustrated by an example. When peers sharing a congested link are selected and scheduled to stream, their end-to-end bandwidths perceived by the consumer decrease considerably. When the next segment delivery is due to be scheduled, the selection of those peers alone is no longer enough. Thus the selection algorithm adds two more peers, which apparently have a similarly slow end-to-end receive rate perceived by the consumer during the previous playback, due to a previous selection of P5, P6, P7, P8, which share another congested link. Now, since only P6 and P7 are selected, the actual receive rate increases considerably from the original estimate. The outcome is an increase in the aggregated receive rate from the selection P0, P1, P2, P3, P6, P7. The larger the set of active peers selected, the higher the chance of encountering such a phenomenon. In contrast, the congestion avoidance selection tends to avoid fluctuations in the perceived end-to-end receive rate. Unless the peer has encountered independent congestion earlier in the streaming, the headroom for the previously perceived receive rate to grow is limited.
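The peer-revisit effect described above can be reproduced with a small Python sketch. It is a simplified stand-in for best-bandwidth-first selection, not the DeMSI Scheduler itself; the function name, the target rate of 100 kB/s and all per-peer rates are hypothetical values chosen only to mirror the P0 ... P7 example.

```python
def select_best_bandwidth_first(estimates, target_rate):
    """Pick peers in descending order of estimated rate until the
    estimated aggregate reaches target_rate or candidates run out."""
    selected, total = [], 0.0
    for peer, rate in sorted(estimates.items(), key=lambda kv: -kv[1]):
        if total >= target_rate:
            break
        selected.append(peer)
        total += rate
    return selected, total


# Hypothetical rates in kB/s. P6 and P7 look slow because they were last
# measured while P5..P8 were all competing on one congested link.
stale_estimates = {"P0": 22.0, "P1": 22.0, "P2": 22.0, "P3": 22.0,
                   "P6": 10.0, "P7": 10.0}
# With only P6 and P7 active on that link, each now delivers more.
actual_now = dict(stale_estimates, P6=20.0, P7=20.0)

selected, estimated_total = select_best_bandwidth_first(stale_estimates, 100.0)
actual_total = sum(actual_now[p] for p in selected)
# estimated_total is 108.0 while actual_total is 128.0
```

The gap between the estimated and the actual aggregate is the extra receive rate that the bandwidth-based strategy gains on revisit, at the price of a larger active peer set.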

[Fig. 4.2-3c. Average round-trip-time between the consumer and the active serving peers; aim for less. Axes: elapsed time (s) vs. round-trip-time (ms). Series: AC-ideal, BW.]

[Fig. 4.2-3d. Average round-trip-time between the consumer and the active serving peers; aim for less. Axes: elapsed time (s) vs. round-trip-time (ms). Series: AC, BW.]

[Fig. 4.2-4a. Average packet loss rate of sub-stream flows from active serving peers; aim for less. Axes: elapsed time (s) vs. averaged loss rate. Series: AC-ideal-nores, BW-nores, AC-ideal, BW.]

Fortunately, the aggregated net content receive rates can be boosted by the Re-scheduler, as shown in Figs. 4.2-5b, 4.2-5c and 4.2-5e. The boost occurs regardless of the selection strategy used for scheduling and re-scheduling. Fig. 4.2-6 shows that the Re-scheduler also helps ease the fluctuations of the aggregated receive rates. As the Re-scheduler acts upon slower-than-expected sub-stream flows in a defensive manner by adding a redundant peer to assist the streaming, it slightly increases the average number of active serving peers at almost any second of the playback regardless of the selection strategy. Figs. 4.2-1e, 4.2-1f and 4.2-1g illustrate this. This is perhaps the cost of smoothing the aggregated receive rate with a minor boost. However, as Figs. 4.2-1c and 4.2-1d show, this cost is small relative to the significant performance improvement of the congestion avoidance selection strategy over the best-bandwidth-first counterpart. Other statistics show no evident or dominating differences after enabling the fully functional Re-scheduler in the experiments. As the congestion avoidance selection promotes a smoother end-to-end receive rate than the best-bandwidth-first selection, it reduces the frequency of re-scheduling, as Fig. 4.2-7 shows.
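The defensive behaviour of the Re-scheduler, adding one redundant peer to a slower-than-expected sub-stream, might be sketched as follows. This is an illustration under assumptions, not the actual Re-scheduler: the function name `reschedule`, the `slack` threshold of 0.8 and the peer names and rates are all hypothetical.

```python
def reschedule(substreams, spares, slack=0.8):
    """Return (lagging_peer, helper_peer) pairs.

    `substreams` maps an active peer to (expected_rate, observed_rate).
    A sub-stream lags when observed < slack * expected (assumed rule);
    each lagging sub-stream gets one redundant helper while spares remain.
    """
    spares = list(spares)  # copy so the caller's list is left untouched
    assignments = []
    for peer, (expected, observed) in substreams.items():
        if observed < slack * expected and spares:
            assignments.append((peer, spares.pop(0)))
    return assignments


# Hypothetical rates in kB/s: only P1 falls below 80% of its expected rate.
substreams = {"P0": (25.0, 24.0), "P1": (25.0, 12.0), "P2": (25.0, 26.0)}
helpers = reschedule(substreams, spares=["P9", "P10"])
# helpers is [("P1", "P9")]
```

Each helper raises the number of active serving peers by one, which matches the slight increase visible when the Re-scheduler is enabled.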

[Fig. 4.2-4b. Average packet loss rate of sub-stream flows from active serving peers; aim for less. Axes: elapsed time (s) vs. averaged loss rate. Series: AC-nores, BW-nores, AC, BW.]

[Fig. 4.2-5a. Average aggregated net content receive rate perceived by the consumer; aim for smoothness and at least Rcontent. Axes: elapsed time (s) vs. averaged R (kB/s). Series: AC-ideal-nores, BW-nores.]

[Fig. 4.2-5b. Average aggregated net content receive rate perceived by the consumer; aim for smoothness and at least Rcontent. Axes: elapsed time (s) vs. averaged R (kB/s). Series: BW-nores, BW.]

[Fig. 4.2-5c. Average aggregated net content receive rate perceived by the consumer; aim for smoothness and at least Rcontent. Axes: elapsed time (s) vs. averaged R (kB/s). Series: AC-ideal-nores, AC-ideal.]

[Fig. 4.2-5d. Average aggregated net content receive rate perceived by the consumer; aim for smoothness and at least Rcontent. Axes: elapsed time (s) vs. averaged R (kB/s). Series: AC-nores, BW-nores.]

[Fig. 4.2-5e. Average aggregated net content receive rate perceived by the consumer; aim for smoothness and at least Rcontent. Axes: elapsed time (s) vs. averaged R (kB/s). Series: AC-nores, AC.]

Another crucial feature of the Re-scheduler is to ensure a smooth transition in the event of peer failure, and to reduce the impact of such events on the aggregated streaming. We examine the impact of a single-peer failure on aggregated receive rates during a playback. We shut down a peer when it becomes active and is pushing a sub-stream of fragments to the consumer. The aggregated receive rates and the number of active serving peers for the 10 s before and after the failure event are then captured. This is repeated 5 times for each peer-selection strategy. Out of 10 trials, 9 exhibit no sudden drop in aggregated receive rate. Six of the nine cases exhibit a varying degree of burst in the 2–5 s after the failure event. During the burst period, the number of active serving peers often increases by 1. This implies that, in most cases, 2 peers are re-scheduled to finish the outstanding streaming task. The remaining 3 exhibit no obvious changes. The trials using the best-bandwidth-first selection strategy exhibit a less evident burst in aggregated receive rate than those using the congestion avoidance counterpart. This is expected because the active peer set resulting from the best-bandwidth-first selection strategy is often larger than that resulting from the congestion avoidance strategy. In addition to the fact that the utilization of each active serving peer under the best-bandwidth-first strategy is lower than that under the congestion avoidance strategy, the contribution of each active peer under the best-bandwidth-first strategy is relatively less significant than that under the congestion avoidance counterpart. This applies to the impact of peer failure as well. Fig. 4.2-8 illustrates an example of a short burst due to re-scheduling upon peer failure from one of the playback trials using the ideal congestion avoidance selection strategy.

[Fig. 4.2-6. Variance of net content receive rates obtained from each playback. The rates are averaged over 5 runs. The lower the variance, the more stable (smooth) the receive rates perceived over the course of the playback. Axes: playback repetition (1 to 3) vs. var(R). Series: BW-nores, BW, AC-nores, AC, AC-ideal-nores, AC-ideal.]
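As a reading aid for the smoothness metric var(R), the following Python sketch computes the variance of per-second receive rates for each playback. The paper does not state whether the population or the sample variance is used; the sketch assumes the population variance, and the two rate series are invented for illustration.

```python
from statistics import pvariance


def receive_rate_variance(rates_by_playback):
    """Map each playback label to the population variance of its
    per-second aggregated receive rates (kB/s)."""
    return {label: pvariance(rates)
            for label, rates in rates_by_playback.items()}


# Hypothetical rate series with the same mean (120 kB/s).
smooth = [118.0, 120.0, 122.0, 120.0]
bursty = [100.0, 140.0, 100.0, 140.0]
variances = receive_rate_variance({"smooth": smooth, "bursty": bursty})
# variances["smooth"] is 2.0 and variances["bursty"] is 400.0
```

Both series deliver the same average rate, so only the variance separates the smooth playback from the bursty one.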

[Fig. 4.2-7. Total number of re-schedules during each playback over 5 runs. Note that, as the congestion avoidance algorithm using correlation tests takes time to infer the peer-point of congestion mappings, the performance of the first play is similar to that when the best-bandwidth-first peer-selection is used. Axes: playback repetition (1 to 3) vs. re-schedule frequency over 5 runs. Series: BW, AC, AC-ideal.]

[Fig. 4.2-8a. A typical impact of a single-peer failure on aggregated net content receive rate from an example playback. Axes: elapsed time (s) vs. aggregated R (kB/s). Series: AC-ideal. Annotation: a peer fails at this point.]

[Fig. 4.2-8b. A typical impact of a single-peer failure on number of active serving peers from an example playback. Axes: elapsed time (s) vs. no. of serving peers. Series: AC-ideal. Annotation: a peer fails at this point.]

Finally, we repeat the experiments using the congestion avoidance selection with the correlation test as the underlying inference algorithm. As shown in Figs. 4.2-1a versus 4.2-1b, 4.2-2a versus 4.2-2b, 4.2-2c versus 4.2-2d, 4.2-3a versus 4.2-3b, 4.2-3c versus 4.2-3d, and 4.2-4a versus 4.2-4b, the difference in performance between the correlation test version and the selection based on end-to-end bandwidth is less evident than that using the ideal version. This is expected, as the correlation tests cannot give the full picture of the peer-congestion point relationships, although false positives in the groupings are rare. At its best out of all rounds of experiments, the algorithm successfully identifies all 4 points of congestion with 3 peers in each. Although the selection algorithm avoids picking more than one peer from each group when there are enough candidates, there are still occasions where more than one peer from the same group is selected as active peers at the same time. False negatives, which disintegrate the groupings, may be introduced when peers in the same group are tested for correlation while there are not enough active peers in that group to produce congestion. Moreover, as the points of congestion are inferred incrementally during the streaming session, the performance statistics obtained from the first playback of each DeMSI Player session have adversely affected the average values over all runs to some degree. We deliberately include the statistics from the first playback in the overall averages because, in reality, each streaming session will probably encounter a significant population of discovered candidates that have not been contacted before. In addition, the network path between the consumer and a previously contacted peer will change [9]. Some of the previously inferred knowledge may become invalid. The opportunity for new exploration is always there for DeMSI.
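The rule of avoiding more than one peer per inferred congestion group, with a fallback when candidates are scarce, can be sketched in Python as below. This is a simplified illustration, not the DeMSI selection algorithm: the function name, the preference ordering of `candidates` and the group labels A, B and C are hypothetical.

```python
def select_congestion_aware(candidates, group_of, needed):
    """Pick `needed` peers, preferring at most one peer per congestion group.

    `candidates` is ordered by preference; `group_of` maps a peer to its
    inferred point-of-congestion group (absent means no known group).
    If distinct groups run out, remaining slots are filled from groups
    already used, which is when shared congestion can recur.
    """
    if needed <= 0:
        return []
    chosen, used_groups, deferred = [], set(), []
    for peer in candidates:
        group = group_of.get(peer)
        if group is not None and group in used_groups:
            deferred.append(peer)  # same group as a chosen peer: hold back
            continue
        chosen.append(peer)
        if group is not None:
            used_groups.add(group)
        if len(chosen) == needed:
            return chosen
    return chosen + deferred[: needed - len(chosen)]


# Hypothetical grouping inferred by the correlation tests.
group_of = {"P0": "A", "P1": "A", "P2": "A", "P3": "B", "P4": "B", "P5": "C"}
candidates = ["P0", "P1", "P2", "P3", "P4", "P5"]
three = select_congestion_aware(candidates, group_of, 3)  # ["P0", "P3", "P5"]
four = select_congestion_aware(candidates, group_of, 4)   # adds "P1" from group A
```

The second call shows the occasion mentioned above: with only three groups and four peers needed, two peers from the same group end up active at the same time.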

5. Conclusion and future work

This paper presents an infrastructural solution to address aggregated media streaming from a decentralized collection of unreliable subscriber resources, under the scenario where the media content is collectively stored at the subscriber ends. Unlike other P2P resource sharing solutions, each subscriber is responsible for only a small portion of the content rather than a complete replica of it. Our simulations demonstrate the effectiveness of the peer selection algorithm that employs a proactive congestion avoidance strategy, which only requires coarse-grain point-of-congestion inference and clustering of peers, under DeMSI's scenario. It can be concluded that the use of the congestion avoidance strategy in peer selection outperforms the use of the best-bandwidth-first strategy in terms of the following goals set out in Section 1:

1. To maximize the utilization of the network and peers.

2. To minimize the number of peers to serve the content.

3. To minimize the frequency of re-scheduling or emergency switching-over to other candidates over the course of streaming.

We also demonstrate the power of our novel approach to promote smooth reactive re-scheduling of aggregated streaming tasks. It has been shown to improve the performance of aggregated streaming, in particular the streaming rate and its smoothness, regardless of which proactive peer-selection strategy is used in scheduling and re-scheduling. The combined use of proactive peer selection and the re-scheduling algorithm brings the best of both worlds together.

It is anticipated that the smoothing of the aggregated receive rate achieved by reactive re-scheduling, in the event of a fluctuating perceived receive rate of a single peer, can also be achieved solely by scheduling as the segment size decreases. As the segment size decreases, the frequency of scheduling increases. In that case, the scheduling process has more up-to-date data on dynamic service level metrics. Therefore, its adaptability to changing network conditions increases. However, it is expected that a decrease in segment size reduces the frequency of claims produced by the correlation test algorithm used for the point-of-congestion inference, so the inference takes longer. One way to work around this problem is to have more consecutive segments distributed to each peer such that the continuity of the sub-stream flow from a peer can be maintained across schedules, in order to ensure enough time for a correlation test against another flow. However, a smaller segment size also implies more load on the network, caused by more frequent use of control packets by the Scheduler for sending delivery requests to the active peers. In contrast, the Re-scheduler sends additional delivery requests to other peers only when there is a need.
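The trade-off between segment size and control overhead can be made concrete with a short Python calculation. It assumes one scheduling round per segment and one delivery request per active peer per round; the content length, segment durations and peer count are illustrative values, not measurements from the experiments.

```python
import math


def scheduling_overhead(content_s, segment_s, active_peers):
    """Return (scheduling_rounds, delivery_requests) for one playback,
    assuming one round per segment and one request per peer per round."""
    rounds = math.ceil(content_s / segment_s)
    return rounds, rounds * active_peers


# Hypothetical 250 s of content served by 5 active peers.
coarse = scheduling_overhead(content_s=250, segment_s=10, active_peers=5)
fine = scheduling_overhead(content_s=250, segment_s=2, active_peers=5)
# coarse is (25, 125); fine is (125, 625)
```

Under these assumptions, shrinking the segment by a factor of 5 gives the Scheduler 5 times as many chances to adapt, and also 5 times as many delivery requests on the network.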

As we have discovered from the experiments, our inference algorithm is particularly vulnerable to false positives from the correlation tests of the sub-stream flows. The existing peer-point of congestion mappings can be easily disintegrated by false positives. This is because the introduction of a false positive into a group leads to subsequent correlation tests of an existing peer that is correctly identified against one that is not, which increases the probability of removing correctly identified peers from the group together with the incorrect ones. Nevertheless, our conservative approaches applied to correlation tests have significantly reduced the rate of false positives in the results. Our experiments also confirm that the correlation tests yield more accurate inference under an asymmetric network with shared links congested by heavy cross-traffic than under the same asymmetric network with shared links congested by tight bandwidth assignment. A possible explanation is that shared links carrying cross-traffic differ more from one another. The outcome is a network that is more asymmetric than that without cross-traffic. The phenomenon is in line with the findings discussed in [2] that the correlation test performs better under an asymmetric network than under a symmetric one.

The subject of P2P aggregated media streaming is large and involves a diverse collection of disciplines such as security, networking, agent-oriented design and development, artificial intelligence, and statistics. The future research directions of DeMSI are also diverse. We outline the most important ones in descending order of priority:


5.1. Peer hunting

As discussed in Section 3.4, DeMSI works independently from the resource discovery algorithms. However, resource discovery is one of the most critical components of this infrastructure, and unfortunately there is not yet any existing resource discovery substrate that can be "plugged into" DeMSI nicely. Major enhancement of the design of the substrate is expected.

Our future evaluation of resource discovery substrates will focus on distributed hash table (DHT) systems that have the notion of locality in the search. Good candidates for such substrates include Kelips [36] and Pastry [18]. In particular, we will pay more attention to those that utilize a gossip protocol, such as Kelips, as it manages to offer constant lookup time and overhead bandwidth regardless of the number of peers in the system [36].

5.2. Intelligent pattern learning for enhanced proactiveness in peer-selection

Peer-selection approaches based on the past history of network characteristics have proved effective in the aggregated streaming scenario. However, the approach discussed in this paper does not proactively predict whether a candidate peer is available at the time of selection, or the probability that the peer will become unavailable during the delivery. In that sense, DeMSI is completely reactive when it comes to the dynamics of peer availability. Selection based on past history, and even on prediction of peer availability as well as network characteristics, should be an interesting field of research. Given that users of peer-to-peer file-sharing systems generally have a regular usage pattern over time [22,23], the availability of peers and their underlying network characteristics over time should also have a pattern. Such properties can be exploited by the peer selection algorithm such that only the peers believed most likely to be available at the time of selection are chosen. Likewise, it is anticipated that peer selection can also be based on the prediction of the streaming rate of the candidate peer, and even the prediction of peer-point of congestion mappings at the time of selection. Hefeeda et al. [15] have briefly proposed a purely statistical method of estimating the current availability of a peer upon request by the consumer. The estimation process is situated at the peer end. However, the architecture does not allow prediction of future availability, because the data sample for estimation is probably too large to be maintained collectively on the consumer side. For example, the consumer has no way to predict whether the candidate peer selected to be contacted is actually available at all. Moreover, the estimation algorithm assumes that the usage pattern repeats every 24 h, which probably covers only a narrow range of users.

Let us narrow the focus to peer availability prediction for now. There are two main approaches to the architecture for pattern learning. The first approach is to have the peer collect the usage statistics and send a summary of them to the consumer regularly, possibly at an interval of at least a day. The consumer then analyses the summary and infers the future availability of the peer incrementally. In this approach, the summary has to be as compact as possible and summaries cannot be generated too frequently, in order to minimize overhead on the network. The second approach is to have the consumer infer the future availability of a peer based on past experience of connection attempts to that peer. This approach does not require any action on the peer side.

It is anticipated that the architecture may employ some of the existing incremental learning algorithms on time-series data such as [24]. In traditional neural networks such as back-propagation neural networks, the network has to be trained with a stream of data samples for a number of iterations in order to predict the next data sample in the stream. When new data samples come in, the network has to be re-trained with the original set of data samples plus the new data samples in order to ensure accurate predictions. In contrast, an incremental learning algorithm allows the network to be trained incrementally using the new data samples together with fixed-size metadata, or a "hypothesis". The outcome of the training is a renewed hypothesis that can be used for the next training. This model can be applied to the first approach mentioned above: the summary sent regularly by the peers is the hypothesis resulting from incremental training with availability and usage data obtained since the last training at the peer side. The past experience "hypothesis" or metadata of each candidate peer is to be stored persistently at the consumer side across multiple streaming sessions. However, there must be a limit on the number of peers for which the past experience can be stored. The size of the hypothesis, its update interval and the prediction accuracy are open issues. The second approach is even more challenging, as it has to deal with availability data resulting from polls (trial connection attempts) that occur irregularly over the time series. Although the perceived data can be grouped and expressed in terms of some interpolated and accumulated statistics as a function of poll rate over a time period, it is difficult to keep the accuracy of the statistics consistent along the timeline. It is impossible for DeMSI to maintain a consistent poll rate over a time period as there are too many candidates to be polled. In addition, the consumer may take DeMSI offline at any time. Therefore, the second approach is unlikely to be considered.
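The idea of a fixed-size "hypothesis" that is renewed with each batch of new observations can be illustrated with a deliberately simple Python sketch. It is not the incremental learning algorithm of [24] and not a neural network: it swaps in an exponentially weighted average per hour of the week as a stand-in, and the class name, the weekly period, `alpha` and `prior` are all assumptions.

```python
class AvailabilityHypothesis:
    """Fixed-size summary: one smoothed availability estimate per hour
    of the week (168 slots), updated without keeping past samples."""

    SLOTS = 7 * 24

    def __init__(self, alpha=0.25, prior=0.5):
        self.alpha = alpha                    # weight of each new observation
        self.estimates = [prior] * self.SLOTS

    def update(self, hour_of_week, online):
        """Fold one new observation into the summary."""
        slot = hour_of_week % self.SLOTS
        old = self.estimates[slot]
        sample = 1.0 if online else 0.0
        self.estimates[slot] = old + self.alpha * (sample - old)

    def predict(self, hour_of_week):
        """Estimated probability that the peer is online in that hour."""
        return self.estimates[hour_of_week % self.SLOTS]


# A peer observed online in the same weekly hour three times in a row.
h = AvailabilityHypothesis()
for _ in range(3):
    h.update(hour_of_week=20, online=True)
# h.predict(20) is 0.7890625; untouched slots stay at the prior 0.5
```

The summary always holds 168 numbers regardless of how long the peer has been observed, which is the property that would keep the regular peer-to-consumer transfer compact.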

5.3. Publishing of new contents to peers

We have discussed the storage strategy of DeMSI in this paper. However, it cannot be considered complete without the content publishing and re-distribution processes. It can be very costly if new content is published to the peers from a single source such as the content provider itself. A more scalable and cost-effective solution is to employ a power-law approach: the content provider first publishes the content in blocks of segments to an initial set of peers. Those peers are then scheduled to do the re-distribution work on behalf of the content provider. Each peer that receives the re-distribution is scheduled to re-distribute the new segments again, in different combinations, to its local peers. Such a decentralized approach faces the challenge of making sure every single peer that comes online at a later time can be synchronized with the new content. Another challenge is to ensure evenness of the re-distributions such that the peers in a local community are not biased towards offering a particular range of segments of the content. The re-distribution strategy must ensure some degree of redundancy or overlap in the range of segments offered by a local collection of peers.

5.4. Incentive model

Since the purpose of DeMSI is to ease the workload of a traditional single-point (or client–server based) CDN by offloading it to the subscriber peers, it is natural to hope that the longer and the more peers stay online, the more workload can be offloaded from the provider. However, subscribers have little reason to stay online if the provider does not offer any incentive for doing so. The incentive can be calculated based on accumulated online time and the amount of content data delivered to other consuming peers. In other words, the system must be able to record the above usage statistics reliably and accurately. Since the delivery of content is decentralized, the accounting service has to rely on the peers to report usage statistics. It is anticipated that such a decentralized usage accounting model is subject to a higher risk of fraud attacks from malicious users than the conventional centralized model, which is largely under the content provider's control.
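One possible form of such an incentive calculation is a weighted sum of the two usage statistics. The Python sketch below is purely illustrative: the paper does not define a formula, so the function name, the linear form, the units and the weights are all assumptions.

```python
def incentive_score(online_hours, delivered_mb, w_time=1.0, w_data=0.5):
    """Weighted sum of accumulated online time (hours) and content
    delivered to other consuming peers (MB); weights are hypothetical."""
    if online_hours < 0 or delivered_mb < 0:
        raise ValueError("usage statistics must be non-negative")
    return w_time * online_hours + w_data * delivered_mb


score = incentive_score(online_hours=40, delivered_mb=100)
# score is 90.0
```

Because both inputs would be self-reported by peers, any such score is only as trustworthy as the accounting that feeds it, which is the fraud risk noted above.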

5.5. VCR operations

The current version of DeMSI is capable of delivering video content at VBR (variable bit rate). However, the Player can only support trivial VCR operations such as play, pause, and stop. More complex VCR functions such as slide-bar style video skipping, fast forward and fast reverse scan require DeMSI to deliver video at CBR (constant bit rate). Since most streamable video coding technologies such as MPEG-4 [5] have coding dependencies between video frames in a GOP (group of pictures) [37], any video skipping operation will fail if DeMSI does not know where a GOP starts (where the I-frame is) and where a requested frame ends (in terms of fragment ID). In other words, the size and structure of DeMSI's segments and fragments can no longer be independent of the video coding. The use of CBR to deliver video promotes implicit mappings between the frame structure of the video coding and the structure of DeMSI's segments and fragments. With CBR, the size of each GOP of the stream can be consistent, and each GOP can fit completely into a constant number of fragments nGOP of a segment such that the total number of fragments n that make up a segment is divisible by nGOP. On the other hand, if VBR is used, there will be additional overhead in tagging the stream with explicit mapping information, as the size of each GOP and frame of the stream can differ from one another.
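The implicit mapping that CBR makes possible can be sketched as simple integer arithmetic. The Python function below assumes that every GOP occupies exactly nGOP fragments, that n is divisible by nGOP, and that GOPs, segments and fragments are numbered from 0; the function name and the numbering scheme are assumptions, not part of DeMSI.

```python
def gop_fragment_range(gop_index, n_gop, n):
    """Return (segment_index, first_fragment, last_fragment) for a GOP,
    assuming each GOP fills exactly n_gop fragments and a segment holds
    n fragments with n divisible by n_gop (all indices are 0-based)."""
    if n % n_gop != 0:
        raise ValueError("n must be divisible by n_gop")
    gops_per_segment = n // n_gop
    segment, offset = divmod(gop_index, gops_per_segment)
    first = offset * n_gop  # fragment holding the start of the I-frame
    return segment, first, first + n_gop - 1


# Hypothetical layout: 20 fragments per segment, 4 fragments per GOP.
location = gop_fragment_range(gop_index=7, n_gop=4, n=20)
# location is (1, 8, 11): GOP 7 sits in segment 1, fragments 8 to 11
```

With VBR no such closed-form lookup exists, which is why explicit mapping information would have to be tagged onto the stream.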

Another implication of supporting video skipping functions is that the Scheduler will no longer request every fragment of the segment to be delivered from the active serving peers. In other words, if a segment has 10 s worth of normal playing time, it lasts only 2 s under a fast forward scan that is 5 times faster than normal. In that case, DeMSI has to look for more peers up front to serve more upcoming segments than under normal playing. As DeMSI has to react quickly enough when the user requests any video skipping function, it has to rely more heavily on the dedicated servers than under normal playing, especially at the beginning of the operation. The impact on DeMSI in terms of the above-mentioned aspects is what we have to look into further if video skipping functions are to be supported.
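The effect of scan speed on scheduling can be made concrete with a small calculation. The sketch below assumes that the Scheduler wants peers lined up for a fixed amount of wall-clock time ahead of the playback position; the `lookahead_seconds` parameter and both function names are hypothetical and are used only to illustrate the arithmetic.

```python
import math


def effective_duration(segment_seconds: float, speed: float) -> float:
    """Wall-clock time a segment lasts when played at `speed` times normal."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return segment_seconds / speed


def segments_to_cover(lookahead_seconds: float,
                      segment_seconds: float,
                      speed: float) -> int:
    """Number of upcoming segments needed to cover the look-ahead window."""
    return math.ceil(lookahead_seconds / effective_duration(segment_seconds, speed))


if __name__ == "__main__":
    # A 10 s segment lasts 2 s under a 5x fast forward scan.
    print(effective_duration(10, 5))
    # Covering 20 s of wall-clock time: normal play vs. 5x scan.
    print(segments_to_cover(20, 10, 1), segments_to_cover(20, 10, 5))
```

Under these assumptions the number of segments that must be scheduled in advance grows in proportion to the scan speed, which is why more peers, or the dedicated servers, are needed at the start of a skipping operation.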

References

[1] D. Rubenstein, J. Kurose, D. Towsley, Detecting shared congestion of flows via end-to-end measurement, IEEE/ACM Transactions on Networking 10 (3) (2002).

[2] O. Younis, S. Fahmy, On efficient on-line grouping of flows with shared bottlenecks at loaded servers, Technical Report CSD-02-018, Purdue University, August 2002.

[3] M. Handley, S. Floyd, J. Padhye, J. Widmer, TCP Friendly Rate Control (TFRC) Protocol Specification – RFC 3448, January 2003.

[4] Onion Networks Inc., Java FEC Library v1.0.3. Available from: <http://www.onionnetworks.com/developers/>.

[5] MPEG-4 Industry Forum FAQ. Available from: <http://www.m4if.org/resources/mpeg4userfaq.php>.

[6] Dixon, Streaming Media: Trends and Formats, Manifest Technology, 2003.

[7] Bouras, Kapoulas, Konidaris, Sevasti, A dynamic distributed video on demand service, in: 20th IEEE International Conference on Distributed Computing Systems-ICDCS 2000, Taipei, Taiwan, April 10–13, 2000, pp. 496–503.

[8] Akamai Technologies Inc. Available from: <http://www.akamai.com>.

[9] V.N. Padmanabhan, L. Qiu, H.J. Wang, Server-based inference of Internet link lossiness, in: Infocom 2003, IEEE, 2003.

[10] R. Teixeira, K. Marzullo, S. Savage, G.M. Voelker, In Search of Path Diversity in ISP Networks, IMC 03, ACM, October 2003.

[11] T. Nguyen, A. Zakhor, Path diversity with forward error correction (PDF) system for packet switched networks, in: Infocom 2003, IEEE, 2003.

[12] J.G. Apostolopoulos, M.D. Trott, Path diversity for enhanced media streaming, in: IEEE Communications Magazine, IEEE, August 2004.

[13] K. Calvert, J. Griffioen, B. Mullins, A. Sehgal, S. Wen, Concast: design and implementation of an active network service, IEEE Journal on Selected Areas in Communications 19 (3) (2001) 426–427.

[14] T. Nguyen, A. Zakhor, Distributed video streaming with forward error correction, Packet Video Workshop 2002, Pittsburgh PA, USA, April 2002.

[15] M. Hefeeda, A. Habib, D. Xu, B. Bhargava, B. Botev, CollectCast: a peer-to-peer service for media streaming, ACM Multimedia 2003, Berkeley CA, USA, November 2003.

[16] M. Coates, R. Hero, A. Nowak, B. Yu, Internet tomography, IEEE Signal Processing Magazine 19 (3) (2002).

[17] D. Katabi, C. Blake, Inferring congestion sharing and path characteristics for packet interarrival times, MIT-LCS-TR-828, December 2001.

[18] A. Rowstron, P. Druschel, Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems, in: Proceedings of 18th IFIP/ACM International Conference on Distributed Systems Platforms (Middleware 2001), Heidelberg, Germany, November 2001.

[19] S. Saroiu, P.K. Gummadi, S.D. Gribble, SProbe: a fast technique for measuring bottleneck bandwidth in uncooperative environments, in: Infocom 2002, IEEE, 2002.

[20] B. Byers, M. Luby, M. Mitzenmacher, A. Rege, A digital fountain approach to reliable distribution of bulk data, in: Proceedings of the ACM SIGCOMM 98, Vancouver, British Columbia, August 1998, pp. 56–67.

[21] A. Bestavros, J. Byers, K. Harfoush, Inference and labeling of metric-induced network topologies, Computer Science Department, Boston University, Boston, MA, USA, Tech. Rep., BUCS-2001-010, June 2001.

[22] S. Saroiu, P. Krishna Gummadi, S.D. Gribble, Measuring and analyzing the characteristics of Napster and Gnutella hosts, Multimedia Systems Journal 8 (5) (2002).

[23] S. Sen, J. Wang, Analyzing peer-to-peer traffic across large networks, IEEE/ACM Transactions on Networking 12 (2) (2004).

[24] K. Okamoto, S. Ozawa, S. Abe, A fast incremental learning algorithm of RBF networks with long-term memory, in: Proceedings of the International Conference on Neural Networks, 2003 (IJCNN2003-Portland).

[25] UCB/LBNL/VINT Groups, Network Simulator NS-2. Available from: <http://www.isi.edu/nsnam/ns>.

[26] V. Padmanabhan, H. Wang, P. Chou, K. Sripanidkulchai, Distributing streaming media content using cooperative networking, in: Proceedings of the ACM International Workshop on Networking and Operating Systems Support for Digital Audio and Video (NOSSDAV’02), Miami Beach, FL, USA, May 2002.

[27] H. Deshpande, M. Bawa, H. Garcia-Molina, Streaming live media over a peer-to-peer network, Technical report, Stanford University, August 2001.

[28] Marshall Brain, Howstuffworks “How File Sharing Works”. Available from: <http://computer.howstuffworks.com/file-sharing1.htm>.

[29] S.M. Lui, S.H. Kwok, Interoperability of peer-to-peer file sharing protocols, ACM SIGecom Exchanges 3 (3) (2002) 25–33.

[30] J.E. Berkes, Decentralized peer-to-peer network architecture: Gnutella and Freenet, University of Manitoba, Winnipeg, Manitoba, Canada, 2003.

[31] Peer-to-Peer (P2P) and How Kazaa Works. Available from: <http://www.kazaa.com/us/help/glossary/p2p.htm>.

[32] K. Tutschku, A measurement-based traffic profile of the eDonkey filesharing service, Passive and Active Network Measurement, in: 5th International Workshop, PAM 2004, Antibes Juan-les-Pins, France, April 19–20, 2004. Proceedings, LNCS, vol. 3015/2004.

[33] B. Cohen, Incentives Build Robustness in BitTorrent, May 2003. Available from: <http://bittorrent.com/bittorrentecon.pdf>.

[34] C.H. Ding, S. Nutanong, R. Buyya, Peer-to-peer networks for content sharing, Technical Report, GRIDS-TR-2003-7, Grid Computing and Distributed Systems Laboratory, University of Melbourne, Australia, December 2003.

[35] IntelliDNS, Available from: <http://www.intellidns.com>.

[36] I. Gupta, K. Birman, P. Linga, A. Demers, R. Van Renesse, Kelips: Building an efficient and stable P2P DHT through increased memory and background overhead, in: Proceedings of the 2nd International Workshop on Peer-to-Peer Systems (IPTPS ’03), 2003.

[37] C.M. Huang, K.C. Yang, J.S. Wang, Support fast scan operations with video streaming technology, in: Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, ICME 2004, June 2004.

[38] Y. Guo, K. Suh, J. Kurose, D. Towsley, A peer-to-peer on-demand streaming service and its performance evaluation, in: Proceedings of 2003 IEEE International Conference on Multimedia & Expo (ICME 2003), Baltimore, MD, July 2003.

[39] Y. Guo, K. Suh, J. Kurose, D. Towsley, P2Cast: Peer-to-peer patching scheme for VoD service, in: Proceedings of the 12th World Wide Web Conference (WWW-03), Budapest, Hungary, May 2003.