
0018-9340 (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2014.2308211, IEEE Transactions on Computers


Maximizing P2P File Access Availability in Mobile Ad hoc Networks Though Replication for Efficient File Sharing

Kang Chen, Student Member, IEEE, Haiying Shen*, Senior Member, IEEE

Abstract: File sharing applications in mobile ad hoc networks (MANETs) have attracted more and more attention in recent years. The efficiency of file querying suffers from the distinctive properties of such networks, including node mobility and limited communication range and resources. An intuitive method to alleviate this problem is to create file replicas in the network. However, despite the efforts on file replication, no research has focused on globally optimal replica creation with minimum average querying delay. Specifically, current file replication protocols in mobile ad hoc networks have two shortcomings. First, they lack a rule to allocate limited resources to different files in order to minimize the average querying delay. Second, they simply consider storage as the resource for replicas, but neglect the fact that a file holder's frequency of meeting other nodes also plays an important role in determining file availability. In fact, a node that has a higher meeting frequency with others provides higher availability to its files. This becomes even more evident in sparsely distributed MANETs, where nodes meet disruptively. In this paper, we introduce a new concept of resource for file replication, which considers both node storage and meeting frequency. We theoretically study the influence of resource allocation on the average querying delay and derive a resource allocation rule to minimize the average querying delay. We further propose a distributed file replication protocol to realize the proposed rule. Extensive trace-driven experiments with synthesized traces and real traces show that our protocol can achieve shorter average querying delay at a lower cost than current replication protocols.

Index Terms: MANETs, Peer-to-Peer, File Sharing, File Availability


1 INTRODUCTION

With the popularity of mobile devices, e.g., smartphones and laptops, we envision future MANETs consisting of these mobile devices. By MANETs, we refer to both normal MANETs and disconnected MANETs, also known as delay tolerant networks (DTNs). The former has a relatively dense node distribution in a local area, while the latter has sparsely distributed nodes that meet each other opportunistically. Meanwhile, the emergence of mobile file sharing applications (e.g., Qik [1] and Flixwagon [2]) motivates the investigation of peer-to-peer (P2P) file sharing over such MANETs.

The local P2P model provides three advantages. First, it enables file sharing when no base stations are available (e.g., in rural areas). Second, with the P2P architecture, the bottleneck on overloaded servers in current client-server based file sharing systems can be avoided. Third, it exploits the otherwise wasted peer-to-peer communication opportunities among mobile nodes. As a result, nodes can freely and unobtrusively access and share files in the distributed MANET environment, which can support some interesting applications. For example, mobile nodes can share files based on users' proximity [3] in the same building or a local community. Tourists can share their travel experiences or emergency information with other tourists through digital devices directly, even when no base station is available in remote areas. Drivers can share road or weather information through vehicle-to-vehicle communication.

* Corresponding Author. Email: [email protected]; Phone: (864) 656 5931; Fax: (864) 656 5910.

The authors are with the Department of Electrical and Computer Engineering, Clemson University, Clemson, SC, 29634. E-mail: {kangc, shenh}@clemson.edu

However, the distinctive properties of MANETs, including node mobility and limited communication range and resources, make such a P2P file sharing system difficult to realize. For example, file searching turns out to be non-trivial and time consuming since nodes in MANETs move around freely and can exchange information only when they are within communication range. Broadcasting can quickly discover files, but it leads to the broadcast storm problem [4] with high energy consumption. Probabilistic routing and file discovery protocols [5]-[7] avoid broadcasting by forwarding a query to a node with a higher probability of meeting the destination. But the opportunistic encountering of nodes in MANETs makes file searching and retrieval non-trivial.

File replication is an effective way to enhance file availability and reduce file querying delay. It creates replicas of a file to improve its probability of being encountered by requests. Unfortunately, it is impractical and inefficient to let every node hold replicas of all files in the system, considering limited node resources. Also, file querying delay is always a main concern in a file sharing system. Users often desire to receive their requested files quickly, no matter whether the files are popular or unpopular. Thus, a critical issue is raised for further investigation: how to allocate the limited resource in the network to different files for replication so that the overall average file querying delay is minimized?




Recently, a number of file replication protocols have been proposed for MANETs [8]-[12]. In these protocols, each individual node replicates files it frequently queries [8]-[10], or a group of nodes creates one replica for each file they frequently query [10]-[12]. In the former, redundant replicas are easily created in the system, wasting resources. In the latter, though redundant replicas are reduced by group cooperation, neighboring nodes may separate from each other due to node mobility, leading to large query delay. There are also some works addressing content caching in more sparsely distributed MANETs (disconnected MANETs/DTNs) for efficient data retrieval [13]-[19] or message routing [20]. They basically follow an intuitive approach: cache data that are frequently queried at places that are frequently visited by mobile nodes. Both categories of replication methods fail to thoroughly consider that a node's mobility affects the availability of its files.

In spite of these efforts, current file replication protocols lack a rule to allocate limited resources to different files for replica creation in order to achieve the minimum global average querying delay, i.e., global search efficiency optimization under limited resources. Moreover, they simply consider storage as the resource for replicas, but neglect that a node's frequency of meeting other nodes (meeting ability in short) also influences the availability of its files. Files in a node with a higher meeting ability have higher availability.

In this paper, we introduce a new concept of resource for file replication, which considers both node storage and node meeting ability. We theoretically study the influence of resource allocation on the average querying delay and derive an optimal file replication rule that allocates resources to each file based on its popularity and size. To the best of our knowledge, this work is the first attempt to theoretically investigate the problem of resource allocation for replica creation to achieve global file searching optimization in MANETs. We further propose a file replication protocol based on the rule, which approximates the minimum global querying delay in a fully distributed manner. Our experiment and simulation results show the superior performance of the proposed protocol in comparison with other representative replication protocols.

The remainder of this paper is organized as follows. Section 2 presents an overview of the related works. Section 3 presents the analysis and modeling of the influence of resource allocation on file searching efficiency under two representative mobility models. Section 4 details the file replication protocol. In Sections 5, 6, and 7, the performance of our proposed system is evaluated through real traces and synthesized mobility. Section 8 concludes the paper.

2 RELATED WORK

2.1 File Sharing in Normal MANETs

The topic of file replication for efficient file sharing applications in MANETs has been studied recently. In [10]-[12], an individual node or a group of nodes decides the list of files to replicate according to file visiting frequency. Hara [10] proposed three file replication protocols: Static Access Frequency (SAF), Dynamic Access Frequency and Neighborhood (DAFN), and Dynamic Connectivity based Grouping (DCG). In SAF, each node replicates its frequently queried files until its available storage is used up. SAF may lead to many duplicate replicas among neighboring nodes when they are interested in the same files. DAFN eliminates duplicate replicas among neighbors. DCG further reduces duplicate replicas in a group of nodes with frequent connections. It sums the access frequencies of all nodes in a group and creates replicas for files in descending order of the summed frequency. Though DAFN and DCG enable replicas to be shared among neighbors, neighboring nodes may separate from each other due to node mobility. Also, they incur high traffic load in identifying duplicates or managing groups.

Zhang et al. [11] proposed to let each node collect access statistics from neighbors to decide the creation or relinquishment of a replica. Duong and Demeure [12] proposed to group nodes with stable connections and let each node check its group members' potential for requesting a file and their storage status to decide whether to replicate the file. Also, each node notifies all other nodes in the system about its newly created files by broadcasting. Yin and Cao [9] proposed to cache popular files on the intersection nodes of file retrieval paths. Though it is effective for popular files, it fails to utilize the storage space in nodes other than the intersection nodes.

Gianuzzi [21] investigated the probability of acquiring a file, which has $n$ replicas in the network, from a potentially partitioned network. He also studied the file retrieval performance when erasure coding [22] is employed to segment files. Chen [23] discussed how to decide the minimal number of mobile servers needed to satisfy the requirement that every data item can be obtained within at most $k$ ($k \geq 1$) hops by any node in the system. Moussaoui et al. [8] proposed two steps of file replication, primary replication and dynamic replication, to disseminate replicas in the network in order to meet user needs and prevent data loss in the case of network partition. In the primary replication step, newly created files are distributed evenly among nodes that are three hops away from each other through replication. Later, when the network topology changes, dynamic replication is conducted, in which each node checks its visiting frequency to a file or the density of a file to make the replication decision.

2.2 File Sharing in Disconnected MANETs/DTNs

Huang et al. [13] discussed how to cache files in servers to realize the optimal file availability to mobile users in WiFi-based wireless networks based on node mobility pattern, AP topology, and file popularity. However, the file servers in that paper are fixed nodes connected to APs, while we consider a more general P2P scenario, in which all mobile nodes are both file servers and clients. Pitkanen and Ott [17] proposed the DTN storage module to leverage the DTN store-carry-and-forward paradigm and make DTN nodes keep a copy of a message for a longer period of time than required by forwarding. Gao et al. [14] proposed a cooperative caching method in DTNs that copies each file to the node in each network central location, which is frequently visited by other nodes. When the central node is full, less popular replicas are moved to its neighbor nodes. However, central nodes may change frequently, leading to frequent file transfers and high overhead. QCR [15] leverages caching for multimedia content dissemination in opportunistic networks. It considers data retrieval delay and the probability that users will require the same content, based on previous experience, to decide the caching policy. SEDUM [20] also uses replication to create redundant messages in routing for DTNs, thereby enhancing the routing success rate. PSEPHOS [16] considers three factors, data access frequency, user preference, and node mobility, to decide the data caching. The author in [18] considers the contact duration in DTNs to improve data retrieval probability through replication. In [19], both social community structures and contact duration among nodes are considered to decide where and how much data to cache in DTNs. However, these methods fail to consider that the mobility of a node affects the availability of files or messages, and do not further optimize the replica distribution to enhance file availability or routing success rate.

2.3 Modeling Replication Optimization Problem

We present the general process to model the expected file querying delay with file replication. We let $m_i$ be the probability that a node's newly met node in the coming time interval is node $i$, which reflects the meeting ability of the files on node $i$. We also use $X_{ij}$ to denote whether node $i$ owns file $j$ or its replica. Then, the average number of time intervals needed to meet a specific file, say file $j$, can be represented as:

$$T_j = \frac{1}{\sum_{i=1}^{N} m_i X_{ij}} \tag{1}$$

Then, the average number of intervals needed to satisfy a request is

$$T = \sum_{j=1}^{F} q_j T_j = \sum_{j=1}^{F} \frac{q_j}{\sum_{i=1}^{N} m_i X_{ij}}, \tag{2}$$

where $q_j$ is the probability of querying file $j$. With Formula (2), we can formulate the global optimization problem as minimizing $T$, which can be further utilized to deduce the optimal replication rule.

However, the calculation of $m_i$ may be complex, which makes the minimization problem non-trivial. We will discuss how this is handled with the two common mobility models in Section 3.
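The modeling process of this subsection can be sketched in a few lines of Python. This is a minimal illustration of Formulas (1) and (2) that assumes the meeting probabilities $m_i$ are already known; the function name `expected_delay` and the toy input values are hypothetical and are not taken from the paper.

```python
from typing import List

def expected_delay(m: List[float], X: List[List[int]], q: List[float]) -> float:
    """Average number of intervals needed to satisfy a request, Formula (2).

    m[i]    : probability that the next met node is node i (assumed known)
    X[i][j] : 1 if node i holds file j or a replica of it, else 0
    q[j]    : probability of querying file j
    """
    total = 0.0
    for j, qj in enumerate(q):
        # Probability that the next met node holds file j
        hit = sum(m[i] * X[i][j] for i in range(len(m)))
        if hit == 0:
            raise ValueError(f"file {j} has no holder, so T_j is unbounded")
        total += qj / hit  # q_j * T_j with T_j = 1 / hit, Formula (1)
    return total

# Toy example with hypothetical values: 3 nodes, 2 files
m = [0.5, 0.3, 0.2]
X = [[1, 0], [0, 1], [1, 1]]
q = [0.6, 0.4]
T = expected_delay(m, X, q)
print(f"expected number of intervals: {T:.3f}")
```

Changing a single entry of `X` and recomputing `T` shows how the placement of one replica shifts the global delay, which is the quantity the optimization in Section 3 targets.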

3 THEORETICAL ANALYSIS OF GLOBALLY OPTIMAL FILE REPLICATION

3.1 Node Movement Models

Recall that we consider two types of MANETs (i.e., normal MANETs and disconnected MANETs) in this paper. In MANET research, the random waypoint model (RWP) [24] is usually used for normal MANETs and the community-based mobility model [25] is usually used for disconnected MANETs (and DTNs). Thus, we also use the two models to represent the two types of MANETs in the theoretical analysis. We leave the analysis for other mobility models (e.g., those created by the Bonn Motion Tool [26]) as future work.

3.1.1 Random Waypoint Model for Normal MANETs

Like some MANET replication protocols [10], [11], [21], we use the random waypoint model (RWP) [24] to model node mobility in normal MANETs. In RWP, nodes repeatedly move to a randomly selected point at a random speed, which means each node has a roughly similar probability of meeting other nodes. However, nodes usually have different probabilities of meeting other nodes in reality (e.g., nodes with faster speed can meet others more frequently). We hence let each node have a randomly obtained speed that stays fixed, rather than continuously varying a node's speed across different paths as in the normal RWP model.
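The RWP variant described above, in which each node draws its speed once and keeps it, can be sketched as follows. This is only an illustration under stated assumptions: the class name `RwpNode`, the square area, the speed range, and the choice to discard leftover travel time on arrival are all hypothetical simplifications, not details given in the paper.

```python
import math
import random

class RwpNode:
    """Random-waypoint node whose speed is drawn once at creation."""

    def __init__(self, area: float, rng: random.Random,
                 v_min: float = 1.0, v_max: float = 5.0):
        self.area, self.rng = area, rng
        self.x, self.y = rng.uniform(0, area), rng.uniform(0, area)
        self.speed = rng.uniform(v_min, v_max)  # fixed for the node's lifetime
        self._pick_waypoint()

    def _pick_waypoint(self) -> None:
        self.wx = self.rng.uniform(0, self.area)
        self.wy = self.rng.uniform(0, self.area)

    def step(self, dt: float = 1.0) -> None:
        """Move dt time units toward the waypoint; pick a new one on arrival."""
        dx, dy = self.wx - self.x, self.wy - self.y
        dist = math.hypot(dx, dy)
        travel = self.speed * dt
        if dist <= travel:
            # Arrived: snap to the waypoint (leftover travel is discarded)
            self.x, self.y = self.wx, self.wy
            self._pick_waypoint()
        else:
            self.x += dx / dist * travel
            self.y += dy / dist * travel

rng = random.Random(42)
nodes = [RwpNode(area=1000.0, rng=rng) for _ in range(5)]
for _ in range(100):
    for n in nodes:
        n.step()
```

Because the speed never changes, faster nodes cover more ground per unit time and tend to meet more peers, which is the heterogeneity the analysis relies on.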

3.1.2 Community-Based Mobility Model for Disconnected MANETs

The community-based mobility model [25] has been used in some content dissemination or routing algorithms for disconnected MANETs/DTNs [27], [28] to depict node mobility. In this model, the entire test area is split into different sub-areas, denoted as caves. Each cave holds one community. A node belongs to one or more communities (i.e., its home communities). The routines and/or social relationships of a node tend to decide its mobility pattern. When moving, a node has probability $P_{in}$ of staying in the home community and probability $1 - P_{in}$ of visiting a foreign community. A node moves within its home communities most of the time (i.e., $P_{in}$ is usually large). Please refer to [25] for more detail.
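The stay-or-visit decision of the community-based model can be sketched as below. This is a simplified illustration: the function name `next_community` and the uniform choice among home or foreign communities are assumptions made here for brevity; the full model is specified in [25].

```python
import random
from typing import List

def next_community(home: List[int], num_communities: int,
                   p_in: float, rng: random.Random) -> int:
    """Pick the community a node moves to next.

    With probability p_in the node stays in one of its home communities;
    otherwise it visits a foreign community (uniform choice is an assumption).
    """
    foreign = [c for c in range(num_communities) if c not in home]
    if not foreign or rng.random() < p_in:
        return rng.choice(home)
    return rng.choice(foreign)

rng = random.Random(0)
visits = [next_community([0], 4, 0.8, rng) for _ in range(10_000)]
home_fraction = visits.count(0) / len(visits)
print(f"fraction of moves inside the home community: {home_fraction:.2f}")
```

With a large `p_in`, most moves stay local, so a node mainly meets members of its own community. This is why the inter-meeting time assumption used for RWP does not carry over to this model.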

3.1.3 Assumptions and Limitations

With the above two mobility models, our analysis relies on two assumptions: 1) the probability of meeting a certain node is the same for all nodes (RWP model) or for all nodes in its home community (community-based model), and 2) nodes move independently in the network (both models). The two assumptions may not hold in real cases, which limits the applicability of the analysis results in our paper to different real scenarios. However, the analysis results can provide guidance on file replication because the two models represent key characteristics of real mobility and have been widely used in research works [10], [11], [21], [27], [28]. We also briefly discuss how to extend the analysis to general scenarios, in which the two assumptions do not hold, in Sections 2.3 and 3.2.3. Due to the complexity of such general cases, we leave the detailed research without the two assumptions to future work.




3.2 Theoretical Analysis

TABLE 1: Notations in analysis.

Notation    Meaning
$q_j$       The probability of querying file $j$ in the system
$m_i$       The probability that the next encountered node is node $i$
$p_j$       The probability of obtaining file $j$ in the next encountered node
$N$         Total number of nodes
$V_i$       Node $i$'s meeting ability (i.e., frequency of meeting nodes)
$S_i$       Storage space of node $i$
$\bar{V}$   Average meeting ability of all nodes in the system
$F$         Total number of files in the system
$b_j$       Size of file $j$
$X_{ij}$    Whether node $i$ contains file $j$ or not
$V_{jk}$    Meeting ability of the $k$th node that holds file $j$
$n_j$       The number of nodes holding file $j$ or its replicas
$A_j$       Allocated resource for file $j$ for replication
$T_j$       Average number of time intervals needed to meet file $j$
$T$         Average number of time intervals needed to meet a file
$R$         Total amount of resource in the system
$P_j$       Priority value of file $j$, $P_j = \sqrt{q_j/b_j}$

In this section, we theoretically analyze the influence of the file replica distribution on the overall query efficiency in MANETs under the two mobility models, following the process introduced in Section 2.3. Please refer to Table 1 for the meanings of notations.

3.2.1 Optimal File Replication with the RWP model

In the RWP model, we can assume that the inter-meeting time among nodes follows an exponential distribution [29], [30]. Then, the probability of meeting a node is independent of the previously encountered node. Therefore, we define the meeting ability of a node as the average number of nodes it meets in a unit time and use it to investigate the optimal file replication. Specifically, if a node is able to meet more nodes, it has a higher probability of being encountered by other nodes later on. We use $m_i$ to denote the probability that the next node a request holder meets is node $i$. Then, $m_i$ is proportional to node $i$'s meeting ability (i.e., $V_i$). That is

$$m_i = \frac{V_i}{\sum_{k=1}^{N} V_k} = \frac{V_i}{\bar{V} N} \tag{3}$$

where $N$ denotes the total number of nodes and $\bar{V}$ denotes the average meeting ability of all nodes.

We use the vector $(V_{j1}, V_{j2}, \ldots, V_{jn_j})$ to denote the meeting abilities of the group of nodes holding file $j$ or its replica, where $n_j$ is the number of copies of file $j$ (including replicas) in the system. Then, the probability that a node obtains its requested file $j$ from the node it encounters is the sum of the probabilities of encountering nodes that hold file $j$ or its replica. That is,

$$p_j = \sum_{i=1}^{N} m_i X_{ij} = \sum_{i=1}^{N} \frac{V_i}{\bar{V} N} X_{ij} = \sum_{k=1}^{n_j} \frac{V_{jk}}{\bar{V} N} \tag{4}$$

where $X_{ij}$ is a zero-one variable that denotes whether node $i$ contains file $j$ or its replica.
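To make Equations (3) and (4) concrete, the following sketch computes the meeting probabilities and the per-file hit probability from a list of meeting abilities. The function names and the numeric values are hypothetical and chosen only for illustration.

```python
from typing import List

def meeting_probabilities(V: List[float]) -> List[float]:
    """m_i = V_i / sum_k V_k, Equation (3)."""
    total = sum(V)
    return [v / total for v in V]

def hit_probability(V: List[float], holders: List[int]) -> float:
    """p_j = sum of the holders' meeting abilities / (V_bar * N), Equation (4)."""
    v_bar = sum(V) / len(V)
    return sum(V[i] for i in holders) / (v_bar * len(V))

V = [4.0, 2.0, 1.0, 1.0]              # hypothetical meeting abilities of 4 nodes
m = meeting_probabilities(V)
p_fast = hit_probability(V, [0])      # one replica on the most active node
p_slow = hit_probability(V, [2, 3])   # two replicas on the least active nodes
print(m, p_fast, p_slow)
```

In this toy setting a single replica on the most active node is met with higher probability than two replicas on the least active nodes, which illustrates why storage alone is an incomplete measure of resource.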

As stated above, a node's probability of being encountered by other nodes is proportional to the meeting ability of the node. This indicates that files residing in nodes with higher meeting ability have higher availability than files in nodes with lower meeting ability. So we take into account both meeting ability and storage in measuring a node's resource. When a replica is created in a node, it occupies memory on the node. Also, its probability of being met by others is decided by the node's meeting ability. This means that the replica naturally consumes both the storage resource and the meeting ability resource of the node. Therefore, we denote the resource on a node by $S_i V_i$, in which $S_i$ denotes node $i$'s storage space and $V_i$ denotes its meeting ability. Then, the total amount of resource in the system ($R$) is:

$$R = \sum_{i=1}^{N} S_i V_i \tag{5}$$

Thus, the total resource allocated to file j is:

$$R_j = b_j \sum_{k=1}^{n_j} V_{jk} \tag{6}$$

where $b_j$ is the size of file $j$. Based on Equation (6), Equation (4) can be represented as

$$p_j = \frac{b_j \sum_{k=1}^{n_j} V_{jk}}{b_j \bar{V} N} = \frac{R_j}{b_j \bar{V} N} \tag{7}$$

Thus, the probability of meeting file $j$ after $k$ ($k = 1, 2, 3, \ldots$) time intervals (i.e., average inter-meeting time among nodes) is

$$(1 - p_j)^{k-1} p_j$$

and the average number of time intervals needed for a node to meet a node containing file $j$ is

$$T_j = \sum_{k=1}^{\infty} k (1 - p_j)^{k-1} p_j = \frac{1}{p_j} = \frac{b_j \bar{V} N}{R_j} \tag{8}$$
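The closed form in Equation (8) is the mean of a geometric distribution. As a quick numerical sanity check (not part of the paper's derivation), the sketch below truncates the infinite series and compares it with $1/p_j$; the function name `mean_intervals` and the truncation length are arbitrary choices made here.

```python
def mean_intervals(p: float, terms: int = 10_000) -> float:
    """Truncated sum of k * (1 - p)**(k - 1) * p, the series in Equation (8)."""
    return sum(k * (1 - p) ** (k - 1) * p for k in range(1, terms + 1))

for p in (0.5, 0.25, 0.1):
    # The two printed values should agree up to floating-point error
    print(p, mean_intervals(p), 1 / p)
```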

We use $q_j \in [0, 1]$ to denote the probability of a node originating a request for file $j$ in the system during a unit time period ($\sum_{j=1}^{F} q_j = 1$). Then, the average number of intervals needed to satisfy a request is

$$T = \sum_{j=1}^{F} q_j T_j = \sum_{j=1}^{F} \frac{q_j b_j \bar{V} N}{R_j} = \bar{V} N \sum_{j=1}^{F} \frac{q_j b_j}{R_j} \tag{9}$$

We aim to minimize the global file querying delay (i.e., $T$) by file replication. According to Equation (9), $T$ is decided by $q_j$, $b_j$, and $R_j$, and the values of $q_j$ and $b_j$ are decided by the system. Thus, the problem of optimal resource allocation is converted to finding the optimal amount of resource ($R_j$) for each file $j$ under the restriction of total available resource in order to achieve the minimum average querying delay.

Let $B_j = q_j b_j$. With Equations (5) and (9), the problem of optimal resource allocation is expressed by

$$\min(T) = \min\left\{\sum_{j=1}^{F} \frac{q_j b_j}{R_j}\right\} = \min\left\{\sum_{j=1}^{F} \frac{B_j}{R_j}\right\} \tag{10}$$

subject to: $\sum_{j=1}^{F} R_j \leq R$.

Equation (9) also indicates that each $R_j$ should be as large as possible in order to minimize $T$. Therefore, we assume all resources ($R$) are allocated.




$$\sum_{j=1}^{F} R_j = R \tag{11}$$

By applying Formula (11), Formula (10) is changed to

$$\min(T) = \min\left\{\frac{B_1}{R_1} + \frac{B_2}{R_2} + \cdots + \frac{B_F}{R - (R_1 + R_2 + \cdots + R_{F-1})}\right\} \tag{12}$$

Next, we try to find the value of $R_j$ ($1 \leq j \leq F-1$) that satisfies Formula (12). Specifically, we first calculate the first order (necessary) condition by differentiating $T$ with respect to each $R_j$ ($1 \leq j \leq F-1$), and find the value of $R_j$ that makes the derivative equal 0. The resultant formulas after differentiation are

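$$-\frac{B_1}{R_1^2} + \frac{B_F}{\{R - (R_1 + R_2 + \cdots + R_{F-1})\}^2} = 0 \tag{13}$$

$$\vdots$$

$$-\frac{B_{F-1}}{R_{F-1}^2} + \frac{B_F}{\{R - (R_1 + R_2 + \cdots + R_{F-1})\}^2} = 0 \tag{14}$$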

Combining all of the above $F-1$ equations, we get

$$\frac{B_1}{R_1^2} = \frac{B_2}{R_2^2} = \frac{B_3}{R_3^2} = \cdots = \frac{B_{F-1}}{R_{F-1}^2} = \frac{B_F}{R_F^2} \tag{15}$$

To achieve the minimal average delay, the second order (sufficient) condition should be larger than 0, as below:

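$$-\frac{2B_1}{R_1^3} + \frac{2B_F}{\{R - (R_1 + R_2 + \cdots + R_{F-1})\}^3} > 0 \tag{16}$$

$$\vdots$$

$$-\frac{2B_{F-1}}{R_{F-1}^3} + \frac{2B_F}{\{R - (R_1 + R_2 + \cdots + R_{F-1})\}^3} > 0 \tag{17}$$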

If Equation (15) is true, based on Equation (11), Formulas (16) and (17) can be transformed as below.

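$$\left(\frac{1}{R_F} - \frac{1}{R_1}\right) \frac{2B_1}{R_1^2} > 0 \tag{18}$$

$$\vdots$$

$$\left(\frac{1}{R_F} - \frac{1}{R_{F-1}}\right) \frac{2B_{F-1}}{R_{F-1}^2} > 0 \tag{19}$$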

When $R_F < R_j$ ($j \in [1, F-1]$), Equations (18) and (19) (and also the second order condition) are satisfied. Recall that the above result is obtained when we replace $R_F$ with $R - (R_1 + R_2 + \cdots + R_{F-1})$ in Equation (10). If we replace $R_k$ ($k \in [1, F]$) with $R - (R_1 + \cdots + R_{k-1} + R_{k+1} + \cdots + R_F)$, the second order condition is also satisfied when $R_k < R_j$ ($j \in [1, F]$, $j \neq k$). In summary, the second order condition is satisfied when the resource allocated for one file is less than the resource allocated for any other file. This condition is always true because there always exists a file with the minimum allocated resource. Therefore, as long as the first order condition (Equation (15)) is satisfied, the second order condition is also satisfied.

Then, according to Equation (11) and Equation (15), we can see that the optimal allocation is

$$R_j = \frac{\sqrt{B_j}}{\sum_{k=1}^{F} \sqrt{B_k}} R \quad (j = 1, 2, 3, \ldots, F) \tag{20}$$

This means that the optimal resource allocation is achieved through the square root policy, i.e., the portion of resource for file $j$ is in direct proportion to the square root of $B_j$:

$$R_j \propto \sqrt{B_j} \;\Rightarrow\; b_j \sum_{k=1}^{n_j} V_{jk} \propto \sqrt{b_j q_j} \tag{21}$$

That is

$$\sum_{k=1}^{n_j} V_{jk} \propto \sqrt{\frac{q_j}{b_j}} \;\Rightarrow\; \sum_{k=1}^{n_j} V_{jk} \propto P_j \tag{22}$$

We call $\sqrt{q_j/b_j}$ the Priority Value ($P$) of file $j$, as it represents the relative priority in acquiring resource for the global optimization of querying delay.

Based on Formula (22), we derive the Optimal File Replication Rule (OFRR), which gives the direction for the optimal resource allocation for each file that leads to the minimum average file querying delay under the RWP model.

OFRR. In order to achieve minimum overall file querying delay, the sum of the meeting ability of replica nodes of file $j$ should be proportional to $P_j = \sqrt{q_j/b_j}$.
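The square root policy of Equation (20) can be checked numerically. The sketch below, using a hypothetical three-file workload, computes the allocation $R_j$, compares the objective of Formula (10) against two naive allocations (uniform and proportional to popularity), and derives the per-file target for the sum of holders' meeting abilities, $R_j / b_j$. All function and variable names are illustrative assumptions.

```python
import math
from typing import List

def optimal_allocation(q: List[float], b: List[float], R: float) -> List[float]:
    """R_j = sqrt(B_j) / sum_k sqrt(B_k) * R with B_j = q_j * b_j, Equation (20)."""
    roots = [math.sqrt(qj * bj) for qj, bj in zip(q, b)]
    total = sum(roots)
    return [r / total * R for r in roots]

def delay_cost(q: List[float], b: List[float], alloc: List[float]) -> float:
    """sum_j q_j * b_j / R_j, the quantity minimized in Formula (10)."""
    return sum(qj * bj / rj for qj, bj, rj in zip(q, b, alloc))

# Hypothetical workload: three files, total resource R = 100
q = [0.7, 0.2, 0.1]
b = [1.0, 4.0, 2.0]
R = 100.0

opt = optimal_allocation(q, b, R)
uniform = [R / len(q)] * len(q)
by_popularity = [qj * R for qj in q]

# Sum of holders' meeting abilities each file should receive (R_j / b_j);
# it is proportional to the priority value sqrt(q_j / b_j)
target_meeting_ability = [rj / bj for rj, bj in zip(opt, b)]
priority = [math.sqrt(qj / bj) for qj, bj in zip(q, b)]

for name, alloc in (("square root", opt), ("uniform", uniform),
                    ("popularity", by_popularity)):
    print(f"{name:12s} cost = {delay_cost(q, b, alloc):.5f}")
```

The ratio `target_meeting_ability[j] / priority[j]` is the same constant for every file, which is what the OFRR states.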

3.2.2 Optimal File Replication with the Community-Based Mobility Model

In this section, we conduct the analysis under the community-based mobility model. Unless otherwise specified, we use the same notations as in Table 1 (which is for the RWP model) but add a prime ($'$) to each notation to denote that it is for the community-based mobility model. Recall that in the RWP model, we can assume that the inter-meeting time of nodes follows an exponential distribution. Based on this assumption, we can calculate the probability that a newly met node is node $i$ (i.e., $m_i$), which is used to find the expected time $T$ to satisfy a request and finally deduce the OFRR to minimize $T$. However, under the community-based mobility model, this assumption does not hold [31]. This makes it difficult to calculate $m'_i$, which makes minimizing the overall delay $T'$ a formidable problem. To deal with this problem, rather than considering meeting ability, we consider each node's satisfying ability. It is defined as a node's ability to satisfy queries in the system (denoted by $V'_i$) and is calculated based on the node's capacity to satisfy queries in each community.

We use $N_c$ to denote the number of nodes in community $c$. Then, community $c$ holds a fraction $N_c/N$ of the nodes in the system. Node $i$'s satisfying ability to community $c$ depends on both the number of different nodes in $c$ it meets in a unit time period (denoted by $M_{ic}$), and the number of queries generated by nodes in $c$. In this model, since nodes' file interests are stable during a certain time period, we assume that each node's querying pattern (i.e., different querying rates for different files) remains stable during that period.

Then, the number of nodes in a community represents the number of queries for a given file generated in this community. As a result, a file holder has low ability to satisfy queries from a small community. Thus, we integrate each community's fraction of nodes (i.e., $N_c/N$) into the calculation of the satisfying ability. Therefore,




$$V'_i = \sum_{c=1}^{C} M_{ic} \frac{N_c}{N} \tag{23}$$

where $C$ is the total number of communities.

Given $n_j$ nodes that hold file $j$ or its replicas, we again use the vector $(V'_{j1}, V'_{j2}, \ldots, V'_{jk}, \ldots, V'_{jn_j})$ to denote the satisfying abilities of these nodes. Then, the overall ability of nodes in the system to satisfy requests for file $j$ (denoted by $O_j$) is the sum of all the satisfying abilities times a redundancy elimination factor $\alpha$.

$$O_j = \alpha \sum_{k=1}^{n_j} V'_{jk} \quad (\alpha \in [0, 1]) \tag{24}$$

The factor $\alpha$ is added because different holders of file $j$ may meet the same requester for file $j$ in the same time unit. Since the requester has only one request for file $j$, only the first meeting satisfies the file request, and the subsequent meetings do not satisfy any requests for file $j$. In other words, $\alpha$ denotes the discount on the overall satisfying ability, considering the fact that the satisfying abilities of different file holders may overlap.

Then, the number of time intervals (i.e., average inter-meeting time among nodes) needed to satisfy a request for file $j$ is

$$T'_j = \frac{1}{O_j} = \frac{1}{\alpha \sum_{k=1}^{n_j} V'_{jk}} \tag{25}$$
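Equations (23) to (25) can be illustrated with a short sketch. It assumes, for simplicity, that every node is counted in exactly one community so that $N$ equals the sum of the community sizes, and it calls the redundancy elimination factor `alpha`; the function names, community sizes, and meeting counts are all hypothetical.

```python
from typing import List

def satisfying_ability(M_i: List[float], N_c: List[int]) -> float:
    """V'_i = sum_c M_ic * N_c / N, Equation (23).

    M_i[c] : number of different nodes of community c met per unit time
    N_c[c] : number of nodes in community c (N is taken as their sum)
    """
    N = sum(N_c)
    return sum(m * n / N for m, n in zip(M_i, N_c))

def intervals_for_file(holder_abilities: List[float], alpha: float) -> float:
    """T'_j = 1 / (alpha * sum_k V'_jk), Equations (24) and (25)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1] for a finite delay")
    return 1.0 / (alpha * sum(holder_abilities))

N_c = [60, 30, 10]  # hypothetical community sizes
holders = [
    satisfying_ability([0.5, 0.1, 0.0], N_c),  # holder active in the large community
    satisfying_ability([0.0, 0.1, 0.5], N_c),  # holder active in the small community
]
T_j = intervals_for_file(holders, alpha=0.8)
print(holders, T_j)
```

Both holders meet the same number of nodes per unit time, yet the one active in the large community has the higher satisfying ability, because more queries originate there. With a single community, `satisfying_ability` reduces to the plain meeting ability.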

Recall that $b_j$ denotes the size of file $j$ and $q_j$ denotes the probability of initiating a request for file $j$ from nodes in the system. Similar to Equation (6), the total resource (satisfying resource and storage resource) allocated to file $j$ can be represented by $R'_j = b_j \sum_{k=1}^{n_j} V'_{jk}$. As a result, the average number of time intervals needed to satisfy a request in the system is

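$$T' = \sum_{j=1}^{F} q_j T'_j = \sum_{j=1}^{F} q_j \frac{1}{\alpha \sum_{k=1}^{n_j} V'_{jk}} = \frac{1}{\alpha} \sum_{j=1}^{F} \frac{q_j b_j}{R'_j} \tag{26}$$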

Then, the problem of optimal resource allocation can be expressed by

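$$\min(T') = \min\left\{\sum_{j=1}^{F} \frac{q_j b_j}{R'_j}\right\} = \min\left\{\sum_{j=1}^{F} \frac{B_j}{R'_j}\right\} \tag{27}$$

subject to: $\sum_{j=1}^{F} R'_j \leq R'$.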

Equation (27) has the same form as Equation (10). We therefore follow the same process as after Equation (10) and deduce the OFRR in disconnected MANETs as

$$\sum_{k=1}^{n_j} V'_{jk} \propto \sqrt{\frac{q_j}{b_j}} \;\Rightarrow\; \sum_{k=1}^{n_j} V'_{jk} \propto P_j \tag{28}$$

We see that the OFRR under the community-based mobility model (Equation (28)) is the same as the OFRR deduced with the RWP model (Equation (22)), except that $V'_{jk}$ is the satisfying ability (Equation (23)) in the former while $V_{jk}$ is the meeting ability (defined in Table 1) in the latter. It is intriguing to find that Equation (23) turns out to be the same as the definition of $V_i$ in Table 1 if the number of communities is 1. This means that the OFRR expressed by Equation (22) is a special case of the OFRR expressed by Equation (28). As a result, our previously deduced OFRR can serve as the OFRR for MANETs under both mobility models.

It is interesting to find that the OFRR matches the square root assignment rule derived by Kleinrock [32] for link capacity assignment in wireless communication to maximize network efficiency. It is also in line with the findings in [33] that, when file servers may be unavailable due to node dynamism, wired P2P content distribution systems can achieve the maximum file hit rate when available storage is allocated in proportion to a constant value plus $\ln(q_j/b_j)$ for each file.

3.2.3 Extension to General Node Mobility Models

In the above two subsections, we deduced the OFRR under the RWP mobility model and the community-based mobility model following the basic idea in Section 2.3. However, the above analysis relies on the two assumptions mentioned in Section 3.1.3, which may not hold in general node mobility models. Therefore, it is nontrivial to extend the above analysis to general cases directly. Specifically, in certain mobility models, different nodes may have different visiting preferences or patterns, so that different nodes' probabilities of meeting node $i$ in the next encounter ($m_i$) lack a direct general expression.

However, there are some ways to make the analysis possible in general cases. For example, we can incorporate new factors into $m_i$ to express each node's distinct pattern, e.g., active levels and community identities. These factors usually represent how frequently a node meets other nodes. We can also first measure the meeting abilities of different nodes in a real scenario. Then, we can assign a label to each node to indicate its rough meeting ability. With these simplifications, $m_i$ can be expressed and the analysis can be conducted. We leave the research following such a direction to future work.

On the other hand, there may be fixed nodes in the system, which are naturally supported in our analysis. This is because we only care about a node's storage and meeting ability when creating file replicas. Though fixed nodes do not move, they can meet other nodes, which means their meeting abilities can be measured or even formulated. As a result, fixed nodes are treated the same as mobile nodes in the system.

3.3 Meeting Ability Distribution in Real Traces

We measured the meeting ability distribution from real traces to confirm the necessity of considering node meeting ability as an important factor in resource allocation in our design. Specifically, for normal MANETs, we used the Dartmouth trace [34], which was obtained through an outdoor project at Dartmouth College. The trace provides position records of 35 laptop nodes moving randomly and independently across different sections of an open field. For disconnected MANETs, we used the MIT Reality trace [35] and the Haggle trace [36]. In the former, 97 smart phones were distributed to students and faculty at MIT. In the latter, 98 iMotes were assigned to scholars attending the Infocom06 conference. In both traces, nodes' contact records were recorded.

[Fig. 1: Meeting ability distribution. (a) In a connected MANET (Dartmouth trace). (b) In disconnected MANETs (MIT Reality trace and Haggle trace). Both panels plot meeting ability against node sequence.]

For each trace, we measured the meeting abilities of all nodes and ranked them in decreasing order, as shown in Figure 1(a) and Figure 1(b). We see that in all three traces, node meeting ability is distributed over a wide range. This matches our previous claim that nodes usually have different meeting abilities. It also verifies the necessity of considering node meeting ability as a resource in file replication: if all nodes had similar meeting ability, replicas on different nodes would have similar probability of meeting requesters, and there would be no need to consider meeting ability in resource allocation.
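The ranking step above can be sketched as follows. This is a minimal illustration, assuming each contact record is a pair of node identifiers and that meeting ability is taken as the number of contacts per unit time; the function name `rank_meeting_abilities` and the record format are hypothetical and are not taken from the traces.

```python
from collections import Counter


def rank_meeting_abilities(contacts, duration):
    """Return (node, contacts per unit time) pairs in decreasing order.

    contacts: iterable of (node_a, node_b) pairs (assumed record format).
    duration: length of the observation window, in the chosen time unit.
    """
    counts = Counter()
    for a, b in contacts:
        # A contact counts once for each of the two nodes involved.
        counts[a] += 1
        counts[b] += 1
    abilities = {node: c / duration for node, c in counts.items()}
    # Sort by decreasing ability; ties are broken by node id for determinism.
    return sorted(abilities.items(), key=lambda kv: (-kv[1], kv[0]))
```

A wide spread between the first and last values of the returned list corresponds to the wide range observed in Figure 1.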

4 DISTRIBUTED FILE REPLICATION PROTOCOL

In this section, we propose a distributed file replication protocol that can approximately realize the optimal file replication rule (OFRR) under the two mobility models in a distributed manner. Since the OFRR in the two scenarios (i.e., Equation (22) and Equation (28)) has the same form, we present the protocol in this section without indicating the specific scenario. We first introduce the challenges in realizing the OFRR and our solutions to these challenges. Then, we propose a replication protocol to realize OFRR and analyze the effect of the protocol.

4.1 Challenges and Solutions to Achieve the OFRR

Challenge 1: resource allocation without a central server. OFRR shows that, to realize the globally optimal querying delay, each file's popularity ($q_j$) and size ($b_j$), and the system resource ($R$) information (both node storage size and meeting ability), must be known in order to decide the portion of resource for each file for replica creation. Specifically, suppose there are $F$ files in the system with $b_1 q_1, \ldots, b_F q_F$ and total resource $R$. The resource allocated to file $j$ ($R_j$) should be

$$R_j = R\, b_j q_j \Big/ \sum_{k=1}^{F} b_k q_k \qquad (29)$$

An intuitive way to achieve this goal is to set up a central server that collects all of the above information, conducts the resource allocation for each file, and distributes the result to file owners so that they can replicate their files. However, the distributed nature of the network, node mobility, and the transmission range constraint are obstacles to building such a central service. For example, since nodes are constantly moving and have limited communication ranges, it is impossible for each node to update its information to, or receive information from, the server in a timely manner. Thus, a severe challenge is to enable each node to figure out, in a distributed manner, the proper portion of resource for each of its files without a central server.

Even when each node knows $b_j q_j / \sum_{k=1}^{F} b_k q_k$ for each of its files, the total amount of resource available in the system may change due to node joins and departures, which makes it difficult for a node to calculate the portion of resource for each of its files ($R_j$). For example, suppose there are only two files in the system, $f_1$ and $f_2$, and the ratio of their allocated resources should be 4:1. If the total amount of resource is $R = 40$, the amount of resource allocated to $f_1$ is 32. If $R = 60$, the amount for $f_1$ should be adjusted to 48. Further, the time-varying file popularity ($q_j$) makes the problem even more formidable. Therefore, OFRR cannot be realized simply by letting each node distribute replicas of a file until an absolute amount of resource is used.
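The two-file example can be reproduced with a short sketch of proportional allocation. The function `allocate` and its `weights` argument are illustrative names only; the weights stand in for the per-file terms that appear in the numerator of Equation (29).

```python
def allocate(total_resource, weights):
    """Split total_resource across files in proportion to their weights."""
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("weights must have a positive sum")
    return [total_resource * w / weight_sum for w in weights]
```

With weights in the ratio 4:1, `allocate(40, [4, 1])` gives 32 and 8, and `allocate(60, [4, 1])` gives 48 and 12. The absolute amount for each file moves with $R$ even though the ratio is fixed, which is why a node cannot replicate up to a fixed absolute amount.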

Solution to Challenge 1: resource competition. OFRR (i.e., Formula (22)) requires that, for each file $j$, the sum of its replica nodes' meeting abilities, $\sum_{k=1}^{n_j} V_{jk}$, is proportional to its priority value $P_j$. In other words, OFRR can be expressed as

$$\frac{P_1}{\sum_{k=1}^{n_1} V_{1k}} = \frac{P_2}{\sum_{k=1}^{n_2} V_{2k}} = \cdots = \frac{P_F}{\sum_{k=1}^{n_F} V_{Fk}} \qquad (30)$$

where $n_j$ ($j \in [1, 2, \ldots, F]$) represents the number of replica nodes of file $j$. We can then let each file, say file $j$, periodically compete for resource with its current $P_j / \sum_{k=1}^{n_j} V_{jk}$. In one competition, the file with the highest $P_j / \sum_{k=1}^{n_j} V_{jk}$ wins and receives resource for one replica. After a file creates a replica, its $P_j / \sum_{k=1}^{n_j} V_{jk}$ decreases. The competition stops when all available resource is allocated and no file can win a competition. Thus, files with larger $P_j / \sum_{k=1}^{n_j} V_{jk}$ win more competitions and receive more resource, while files with smaller $P_j / \sum_{k=1}^{n_j} V_{jk}$ win only a few competitions and receive less resource. The competition gradually lets each file receive its deserved portion of resource based on OFRR. By enabling file owners to compete for resource for their files in a distributed manner, we can realize OFRR without a central server.
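The effect of repeated competition can be illustrated with a small centralized simulation. This is only a sketch under simplifying assumptions that are not part of the protocol: every node has the same meeting ability `v`, each file starts with one copy on its owner, and a fixed number of replica `slots` is available. The function name `compete` is hypothetical.

```python
import heapq


def compete(priorities, slots, v=1.0):
    """Greedy competition: the file with the largest P_j / sum_k V_jk wins each slot.

    Returns the summed meeting ability of the holders of each file.
    """
    total_v = [v] * len(priorities)  # each file starts with its owner's copy
    # Max-heap on the ratio, implemented by negating it for heapq's min-heap.
    heap = [(-p / v, j) for j, p in enumerate(priorities)]
    heapq.heapify(heap)
    for _ in range(slots):
        _, j = heapq.heappop(heap)   # current winner
        total_v[j] += v              # the winner gains one replica
        heapq.heappush(heap, (-priorities[j] / total_v[j], j))
    return total_v
```

For priorities 3:2:1 and 9 slots, the summed meeting abilities end at 6, 4 and 2, so all three ratios in Equation (30) are equal.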

Challenge 2: competition for distributed resource. In a MANET, all available resource is scattered among different nodes moving around in the network. This poses three problems. First, different file owners are scattered and can hardly gather together to conduct the resource competition. Second, after a file is replicated to a number of nodes, it is difficult to collect the popularity of the replicas to update the $P$ of the file. Third, since the number of nodes met by a file owner is limited, a single file owner cannot distribute replicas efficiently and quickly. We propose a work-around for these problems. Specifically, we regard a file and its newly created replica as two different files, which participate in further competition independently with evenly split $P$. However, this brings another challenge: since the replica nodes of a file are scattered in the network, how can we ensure that the overall $\sum V_{jk}$ is proportional to the overall $P$ of the file? We solve this in the next subsection.

Note that the term competition in this description does not imply that resources are very limited. It only describes the process of resource allocation, which can be viewed as a probability-based resource allocation algorithm. Such a solution increases the complexity of the system, but this is caused by the distributed nature of MANETs. We will investigate how to reduce the complexity in future work. For example, we can check whether files can reduce the frequency of competition but still obtain the deserved amount of resource.

Solution to Challenge 2: distributive competition on selective resources. In the solution to Challenge 1, each file periodically competes for resource with its current $P_j / \sum_{k=1}^{n_j} V_{jk}$. However, as previously mentioned, it is a challenge to keep the overall $P$ proportional to the overall $\sum V_{jk}$ while replica holders are scattered. We resolve this problem indirectly by keeping the average $V$ of the replica nodes of a file close to $\bar{V}$. Then, Formula (22) can be re-expressed as

$$n_j \bar{V} \propto \frac{q_j}{b_j} \;\Rightarrow\; n_j \propto \frac{q_j}{b_j} \;\Rightarrow\; n_j \propto P_j \qquad (31)$$

In such a case, when the number of replicas of each file is proportional to its $P_j = q_j / b_j$, OFRR is satisfied.

To attain this goal, we let each node deliberately select a neighbor node on which to create replicas of its file, so that the average meeting ability of the replica nodes of the file is equal or closest to $\bar{V}$. Considering the diverse mobility of nodes in the network, a node should be able to find, during its movement, replica nodes whose average meeting ability equals $\bar{V}$. Then, based on Equation (31), each node only needs to consider the $P$ of each file in the resource competition. Upon winning a competition for a file, a node splits the file's $P$ evenly between the file and the replica. After this, the popularity of each file or replica is continuously updated based on the number of requests received for it in a unit time period, which is used to update its priority value $P$.

When a replica is deleted in the competition, we cannot reverse the priority split because, owing to node mobility in MANETs, it is very difficult to track the locations of the holders of the original file in a distributed manner. Fortunately, we can use the querying popularity $q$ to handle this problem. When a replica is deleted, the $q$s (or $P$s) of the other replicas of the file increase, since they receive more requests for the file while the total amount of requests is stable. That is, the sum of the replicas' $P$s equals the overall $P$ of the original file $j$ ($P_j$). The increase in priority value caused by the replica deletion can be regarded as the reverse of the priority split. As a result, the number of replicas of each file is proportional to the sum of the meeting abilities of its replica nodes, realizing Formula (22).

4.2 Design of the File Replication Protocol

The two solutions described above approximate the OFRR as closely as possible in a distributed manner. Based on these solutions, we propose the Priority Competition and Split file replication protocol (PCS). We first introduce how a node retrieves the parameters needed in PCS and then present the details of PCS.

Fig. 2: Replica distribution process. (Flowchart labels: select one neighbor by the OFRR rule; file priority competition; success: replica creation and priority split; failure: try at most K times.)

In PCS, each node dynamically updates its meeting ability ($V_i$) and the average meeting ability of all nodes in the system ($\bar{V}$). Such information is exchanged among neighbor nodes. We explain the details of this step in Section 4.3. Each node also periodically calculates the $P_j = q_j / b_j$ of each of its files. The $q_j$ is calculated by $q_j = u_j / U$, where $u_j$ and $U$ are the number of received requests for the file and the total number of queries generated in a unit time period, respectively. Note that $U$ is a pre-defined system parameter.

In the solution to Challenge 2, nodes replicate files distributively and select replica nodes so that the average meeting ability of the replica nodes of a file is the closest to $\bar{V}$. That is, $\bar{V}_{n_j} \approx \bar{V}$, where $n_j$ is the number of created replicas of file $j$ and $\bar{V}_{n_j}$ is the average meeting ability of these replica nodes. Therefore, each node needs to keep track of $n_j$ and $\bar{V}_{n_j}$ for each of its files. After creating a replica, the node increases $n_j$ by 1 and updates $\bar{V}_{n_j}$ using the $V$ of the new replica node.

With the above information, we introduce the process of replicating a file in PCS. Based on OFRR, since a file with a higher $P$ should receive more resource, a node should assign higher priority to its files with higher $P$ when competing for resource with other nodes. Thus, each node orders all of its files in descending order of their $P$s and periodically creates replicas for the files in a top-down manner. Algorithm 1 presents the pseudo-code for the process of PCS between two encountered nodes. In detail, suppose node $i$ needs to replicate file $j$ at the top of the list. As shown in Figure 2, it keeps trying to replicate file $j$ on nodes it encounters until one replica is created or $K$ attempts have been made. If file $j$ is replicated, its $P$ is split and it is inserted at its new place in the list. Next, the node fetches the file at the top of the list and repeats the process. If file $j$ fails to replicate after $K$ attempts, the node stops launching competitions until the next period.

Following the solution to Challenge 2, a replicating node should keep the average meeting ability of the replica nodes for file $j$ around $\bar{V}$. Node $i$ first checks the meeting abilities of its neighbors and then chooses, as the replica node candidate, the neighbor $k$ that does not hold file $j$ and makes $\bar{V}^{new}_{n_j} = (n_j \bar{V}_{n_j} + V_k)/(n_j + 1)$ the closest to $\bar{V}$. It is possible that $\bar{V}^{new}_{n_j}$ is far away from $\bar{V}$. Therefore, we set a deviation range $r$. If creating a replica on the selected neighbor makes $|\bar{V}^{new}_{n_j} - \bar{V}| > r$, then the node does not replicate file $j$ on the selected neighbor and waits until it has a different set of neighbors.
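The neighbor selection rule can be sketched as below. The data layout (a dictionary mapping a neighbor id to its meeting ability and a flag saying whether it already holds file $j$) and the function name `select_replica_node` are assumptions made for illustration; the deviation is taken as an absolute difference.

```python
def select_replica_node(n_j, avg_v_j, neighbors, v_bar, r):
    """Pick the neighbor that moves the replica-node average closest to v_bar.

    n_j, avg_v_j: current replica count of file j and the average meeting
                  ability of its replica nodes.
    neighbors:    {node_id: (meeting_ability, holds_file_j)} (assumed layout).
    Returns (node_id, new_average), or None if no candidate stays within r.
    """
    best = None
    for node_id, (v_k, holds_file) in neighbors.items():
        if holds_file:
            continue  # a neighbor that already holds file j is not a candidate
        new_avg = (n_j * avg_v_j + v_k) / (n_j + 1)
        if best is None or abs(new_avg - v_bar) < abs(best[1] - v_bar):
            best = (node_id, new_avg)
    if best is None or abs(best[1] - v_bar) > r:
        return None  # wait for a different set of neighbors
    return best
```

Returning the new average lets the caller update its record of $n_j$ and the running average in one step after the replica is created.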

In the case that $|\bar{V}^{new}_{n_j} - \bar{V}| \le r$, if the selected neighbor's available storage is larger than the size of file $j$ ($S_j$), it creates a replica of file $j$ directly. Otherwise, a competition is launched between the replica of file $j$ and the replicas already on the neighbor node, based on their $P$s. The priority value of the new replica is set to half of the original file's $P$. According to the solution to Challenge 1, the probability that a replica wins the resource competition is proportional to its $P$, i.e., a replica's probability of being selected for removal is inversely proportional to its $P$. Then, suppose there are $d$ replicas in the competition. We let each replica be responsible for a range of length $1/P$ in the range space $[0, \sum_{k=1}^{d} 1/P_k]$. The neighbor node randomly chooses a number in $[0, \sum_{k=1}^{d} 1/P_k]$, and the replica whose range contains the number is selected for removal. The neighbor node repeats this process until the available storage is no less than the size of file $j$.

If file $j$ is among the selected files, it fails the competition and will not be replicated on the neighbor node. Otherwise, all selected files are removed and file $j$ is replicated. If file $j$ fails, node $i$ launches another attempt for file $j$ until the maximum number of attempts ($K$) is reached. The limit of $K$ attempts ensures that each file can compete with a sufficient subset of replicas in the system. If node $i$ fails to create a replica of file $j$ after $K$ attempts, then replicas on node $i$ with smaller $P$s than file $j$ are unlikely to win a competition. Thus, at this moment, node $i$ stops replicating files until the next round. Finally, all available resource in the system is allocated to replicas according to their $P$s (i.e., OFRR is realized).

According to the solution to Challenge 2, we regard file $j$'s replica as a different file from file $j$ in PCS. Therefore, if node $i$ successfully creates a replica of file $j$, it splits the file's $P$ evenly between file $j$ and the new replica, so that each copy has priority $P/2$. After the split, the two copies of file $j$ take part in further resource competition independently. Note that the PCS algorithm does not split files; it splits the priority value of a file when a replica is created.

The replication of a file stops when the communication session between the two involved nodes ends. The node then continues the replication process for the file after excluding the disconnected node from its neighbor node list. Since file popularity, the $P$s, and the available system resource change over time, each node periodically executes PCS to handle these time-varying factors dynamically. Each node also periodically calculates the popularity of its files ($q_j$) to reflect changes in file popularity (due to changes in node querying pattern and rate) in different time periods. The periodic popularity update automatically handles file dynamism. The popularity of newly added files is calculated, and hence these files are considered in resource allocation. Similarly, the popularity of deleted files is no longer calculated, and hence these files are not considered in resource allocation.

Algorithm 1 Pseudo-code of PCS between node i and k.

i.createReplicasOn(k) //node i tries to create a replica on node k
k.createReplicasOn(i) //node k tries to create a replica on node i

Procedure createReplicasOn(node)
    nCount ← 0 //initialize a count
    this.orderFilesByP() //order files by priority value
    For (each file f in current node) //try to replicate each file
        If (node.compete4File(f) == true) //competition
            node.createAReplica4(f) //create a replica if win
        else
            nCount ← nCount + 1
        If nCount ≥ K //try at most K times
            Break
end Procedure

Procedure compete4File(j) //compete for file j
    While (nRemainingMem < j.size())
        nSum ← nTotal ← nRandom ← fFile ← 0 //initialization
        For (each file f (including j) in current node)
            nTotal ← nTotal + 1/P_f
        nRandom ← generateARandomNumber() % nTotal
        For (each file f (including j) in current node)
            nSum ← nSum + 1/P_f
            If (nSum >= nRandom)
                fFile ← f; Break //pick the file
        If (fFile == j) //j is the picked file, competition fails
            return false
        Else //win the competition
            select fFile
    delSelectedFiles() //delete the selected files
    return true
end Procedure

4.3 How to Collect Meeting Ability Information

In a MANET, nodes periodically exchange beacon messages to discover neighbor nodes. The frequency of the beacon messages depends on the mobility of nodes. The size of a beacon message is usually several bytes. To save communication cost, the values of $V_i$ and $\bar{V}$ are piggybacked onto beacon messages. Since $V_i$ and $\bar{V}$ are only several bytes, the piggybacking only slightly increases the size of the beacon message. In normal MANETs, a node's meeting ability ($V_i$) is simply measured by the frequency at which it meets other nodes. In disconnected MANETs, a node needs to know the distribution of different communities to calculate its satisfying ability (Equation (23)). We therefore let each node piggyback its community ID and the community information it knows onto the beacon message. Also, it is hard to collect the satisfying abilities of all nodes in such MANETs in a timely manner, since nodes are sparsely distributed. We let each node simply use the average meeting ability of all nodes encountered so far as that of all nodes in the system. As a node meets more and more nodes, the calculated value can generally represent that of all nodes.
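The local estimate of the system-wide average can be sketched as a small bookkeeping class. The class name `MeetingAbilityEstimator`, the method names, and the choice to keep only the latest value reported by each node are assumptions for illustration; the text does not specify how repeated encounters are weighted.

```python
class MeetingAbilityEstimator:
    """Tracks the mean meeting ability over the distinct nodes met so far."""

    def __init__(self):
        self._latest = {}  # node id -> most recent meeting ability heard

    def on_beacon(self, node_id, meeting_ability):
        # Keep only the newest value per node, so that frequently met
        # nodes do not dominate the estimate (an assumption of this sketch).
        self._latest[node_id] = meeting_ability

    def average(self):
        """Return the current estimate, or None before any encounter."""
        if not self._latest:
            return None
        return sum(self._latest.values()) / len(self._latest)
```

A node would call `on_beacon` for each received beacon and use `average()` wherever the system-wide average meeting ability is needed.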

4.4 Analysis of the Effectiveness of PCS

In this section, we briefly prove the effectiveness of PCS. We refer to the process in which a node tries to copy a file to its neighbors as one round of replica distribution.

Recall that when a replica is created for a file with priority $P$, the two copies each compete with priority $P/2$ in the next round. This means that the creation of replicas does not increase the overall $P$ of the file. Also, after each round, the priority value of each file or replica is updated based on the received requests for the file. Then, though some replicas may be deleted in the competition, the total amount of requests for the file remains stable, making the sum of the $P$s of all replicas and the original file roughly equal to the overall priority value of the file. We can therefore regard the replicas of a file as one entity that competes for available resource in the system with accumulated priority $P$ in each round. Therefore, in each round of replica distribution, based on our design of PCS, the overall probability of creating a replica for an original file $j$, denoted by $P_{s_j}$, is proportional to its overall $P_j$. That is:

$$P_{s_j} \propto P_j \qquad (32)$$

Then, suppose a total of $M$ rounds of competition are conducted. The expected number of replicas of file $j$, denoted by $n_j$, is

$$n_j = M P_{s_j} \;\Rightarrow\; n_j \propto P_j \qquad (33)$$

Therefore, we conclude that PCS can realize Equation (31), in which the number of replicas of each file is proportional to its $P$, thereby realizing the OFRR.

We further briefly discuss the security and incentive considerations for PCS in Appendix B.

5 PERFORMANCE EVALUATION IN NORMAL MANETS WITH THE RWP MODEL

To evaluate the performance of PCS in normal MANETs, we conducted experiments on both the GENI Orbit testbed [37], [38] and the NS-2 [39] simulator. The GENI testbed consists of 400 nodes equipped with wireless cards. We used the Dartmouth real-world MANET trace [34], which provides the mobility trace of 35 laptops moving in an open field, to drive node mobility in both experiments. To validate the adaptability of PCS, we used two routing protocols in the experiments. We first used the StaticWait protocol [40] in the GENI experiment, in which each query stays on the source node waiting for the destination. We then used a probabilistic routing protocol (PROPHET) [6], in which a node routes requests to the neighbor with the highest meeting ability. We set a larger TTL for StaticWait since it needs more time to find a file holder. We used a 95% confidence interval when handling the experimental results.

We evaluated the performance of PCS in normal MANETs in comparison with several MANET replication algorithms: SAF [10], DCG [10], PDRS [12] and CACHE [9]. The details of these protocols can be found in Section 2. To better validate our analysis, we also compared PCS with Random, which places replicas on nodes randomly, and OPTM, which is a centralized protocol that calculates the ideal number of replicas for each file based on our derived optimal replication rule. OPTM represents the best possible performance that can be obtained by the OFRR. To evaluate our protocol under different network sizes and node mobilities, we also conducted simulations on NS-2 with different network sizes and node mobilities synthesized by the modified RWP model. Due to the page limit, the results of these tests are shown in Appendix A.

Table 2 shows the parameters used in the experiments, unless otherwise specified. The parameters were determined by referring to the settings in [9], [41] and the real trace. Following the works in [9], [42], we determined the file size and the storage space on each node. As in [33], the probability of originating requests for different files in each node followed a Zipf distribution, and the Zipf parameter was set to 0.7. Initially, files were evenly distributed to each node and no replica existed in the system. In the synthesized mobility, the speed of a node was randomly chosen from the range $[s/2, 3s/2]$, where $s$ is the configured average node movement speed. Since the real trace does not indicate the communication range of each node, we set the communication range to 100m in the simulation and to 60m in the GENI experiment in order to see the influence of different transmission ranges on the performance. We evaluated the performance of PCS with $K = 3$.
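The request and speed settings can be sketched as below. The function names are hypothetical, the Zipf weights use the standard form in which the file of popularity rank $i$ has weight $1/i^{0.7}$, and the speed is drawn uniformly from the stated range; the text does not say which distribution was used within $[s/2, 3s/2]$, so uniform sampling is an assumption.

```python
import random


def zipf_probabilities(num_files, exponent=0.7):
    """Request probability of each file, ordered by popularity rank."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_speed(avg_speed, rng=random):
    """Draw a node speed from [avg_speed / 2, 3 * avg_speed / 2] (uniform assumed)."""
    return rng.uniform(avg_speed / 2, 3 * avg_speed / 2)
```

A request generator could then pick files with `random.choices(files, weights=zipf_probabilities(len(files)))`.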

We used the following metrics in the experiments:

• Hit rate. This refers to the percentage of requests that are successfully resolved by either original files or replicas. This metric shows the effectiveness of replication protocols in enhancing file availability.

• Average delay. This is the average delay of all requests. To make the comparison fair, we included all requests in the calculation. For unresolved requests, we set their delays to the TTL. This metric shows the efficiency of replication protocols in terms of file querying delay.

• Replication cost. This is the total number of messages generated in creating replicas. This metric shows the overhead of replication protocols.

• Cumulative Distribution Function (CDF) of the proportion of replicas. This is the CDF of the proportion of replicas of each file. This metric reflects the amount of resource allocated to each file for replication.

TABLE 2: Simulation parameters.

Parameter                          Real trace         Synthesized mobility
Environment Parameters             GENI / NS-2        NS-2
Simulation area                    600m x 300m        1000m x 1000m
Node Parameters
Number of nodes                    35                 60
Communication range                60m / 100m         250m
Average movement speed             -                  6m/s
The size of a file (kb)            1-10               1-10
Number of files in each node       10                 10
Storage space for replicas (kb)    50                 50
Query Parameters
Initialization period              500s / 800s        200s
Querying period                    1500s / 1200s      600s
TTL of each request                1000s / 200s       200s
Total time for each test           3000s / 3000s      1000s

5.1 Performance in the Trace-Driven GENI Experiments

5.1.1 Hit Rate and Average Delay

Table 3 shows the results of each protocol in the trace-driven experiments on GENI. We see that the hit rates of the different replication protocols follow Random<CACHE<SAF<PDRS<DCG<PCS<OPTM, and that the average delays in Table 3 follow the reverse order. We see that OPTM and PCS lead to higher hit rate and lower average delay than the others. This is attributed to the guidance of OFRR, which aims to minimize the average querying delay by considering both storage and meeting ability as resource to enhance overall file availability. PCS generates a slightly lower hit rate and around 20% higher average delay than OPTM. This is because OPTM has all the information needed in OFRR beforehand, while PCS has to distribute replicas in a fully distributed manner.

In contrast, the other protocols only replicate files locally, creating redundant replicas and failing to achieve high file availability under node mobility. Random has the worst hit rate and average delay. This is because Random creates replicas for files randomly and fails to assign more resource to popular files, which are queried more frequently by nodes. CACHE only utilizes the storage on intersection nodes, which means that it fails to fully utilize the storage space of all nodes. Therefore, it cannot create as many replicas as the other protocols and exhibits a low hit rate and a high delay. In SAF, each node replicates its frequently queried files until its memory is filled up. As a result, almost all resource is allocated to popular files, and SAF cannot optimize query delay globally. In PDRS, a node replicates files that interest its neighbors that have less storage resource than itself. However, as replicas are not shared across the whole group, PDRS only renders a slight performance improvement over SAF. DCG further improves on SAF and PDRS by conducting the file replication at the group level. It eliminates duplicate replicas among group members and uses the released memory for other replicas, thereby generating a higher hit rate and a smaller average delay.

We find that the 1st percentiles of the delays of all protocols are 0.01s. This is because some requests are immediately satisfied by direct neighbors. The 99th percentiles of the delays of the protocols approximately follow the same relationship as the average delay. These results show that PCS enhances file searching efficiency through its global optimization of file availability. The fact that Random performs worse than all methods that give priority to popular files when creating replicas also shows that a resource allocation strategy is necessary for file availability optimization.

TABLE 3: Experimental results of the trace-driven GENI experiments.

Protocol   Hit rate    Average / 1% / 99% delay (s)    Replication cost
Random     0.840139    263.176 / 0.01 / 991.9843       13387
CACHE      0.842454    260.469 / 0.01 / 994.2487       0
SAF        0.857341    259.1768 / 0.01 / 997.1095      0
PDRS       0.863074    256.1983 / 0.01 / 991.2384      175140
DCG        0.878559    251.3287 / 0.01 / 993.3947      67549
PCS        0.898823    240.7031 / 0.01 / 990.4522      28983
OPTM       0.910370    195.1776 / 0.01 / 990.1296      14542

5.1.2 Replication Cost

From the table, we find that the replication costs of the different protocols follow PDRS>DCG>PCS>OPTM≈Random>SAF=CACHE=0. PDRS shows the highest replication cost because it needs to broadcast each new file to all nodes in the system. DCG incurs a moderate replication cost because group members need to exchange information to reduce duplicate replicas. PCS has a low replication cost because each node tries at most $K$ times to create a new replica for each file it holds. OPTM and Random have a very low cost since nodes only need to communicate with the central server for the replica list. SAF and CACHE have no replication cost since they do not need to exchange information among nodes for file replication. However, SAF generates a lot of redundant replicas, and Random and CACHE lead to low performance.

5.1.3 Replica Distribution

Figure 3 shows the CDF of the proportion of resource allocated to each file for replica creation in the different protocols. From the figure, we find that PCS exhibits the closest similarity to OPTM, while the other protocols follow DCG, Random, CACHE, PDRS and SAF, in decreasing order of similarity to OPTM. Combining this with the results on average delay, we find an interesting phenomenon: except for CACHE and Random, a protocol with closer similarity to OPTM has a lower average delay. This proves the correctness of our theoretical analysis and the resultant OFRR rule expressed in Formula (22). CACHE has low performance because it does not utilize all storage space, though it exhibits similarity with PDRS. Random creates replicas for each file randomly without considering popularity, leading to low performance since popular files are not replicated with priority. We also observe that the CDFs of the proportion of resource allocated to replicas for DCG, CACHE, PDRS and SAF increase to 0.9 quickly. This is because they allocate most resource to popular files, resulting in a lot of replicas for these files. Though these protocols can reduce the delay of queries for popular files, they cannot reduce the delay for unpopular files. PCS is superior to these protocols because it can globally reduce the query delay for all files.

Fig. 3: CDF of the resource allocated to replicas in trace-driven GENI experiment. (CDF of the proportion of replicas versus file sequence in decreasing order of popularity, for PCS, DCG, SAF, CACHE, OPTM, PDRS and Random.)

5.2 Performance in the Trace-Driven Simulation

5.2.1 Hit Rate and Average Delay

Table 4 shows the results of each protocol in the trace-driven experiments on NS-2. We see that the hit rates and average delays of the seven protocols follow the same relationship as in Table 3, for the same reasons. We find that the average delays of the seven protocols are much lower than those in the GENI experiment. There are two reasons. First, the trace-driven simulation adopts PROPHET for file searching, which can locate files more quickly than the StaticWait searching protocol used in the GENI experiment. Second, the communication range of nodes in the simulation (100m) is larger than that in the GENI experiment (60m), leading to shorter searching delay since a node can reach more neighbors. The hit rates of the seven protocols are lower than those in the GENI experiment. This is because the trace-driven simulation used a much smaller TTL. The relative performance of the different protocols in the simulation matches that in the GENI experiment, which further proves the effectiveness of PCS.

5.2.2 Replication Cost

From Table 4, we find that the replication costs of the different protocols follow PDRS>DCG>PCS>OPTM≈Random>SAF=CACHE=0. This matches the results in Table 3, and the reasons are the same.

5.2.3 Replica Distribution

Figure 4 shows the CDF of the proportion of resource allocated to replicas of each file in the seven protocols. From the figure, we find a trend similar to that in Figure 3. That is, except for CACHE and Random, a protocol with closer similarity to OPTM has a lower average delay. This further proves the correctness of our analysis through the trace-driven simulation.

Fig. 4: CDF of the resource allocated to replicas in trace-driven simulation. (CDF of the proportion of replicas, for PCS, DCG, SAF, CACHE, OPTM, PDRS and Random.)

TABLE 4: Simulation results of the trace-driven experiments.

Protocol   Hit rate    Average / 1% / 99% delay (s)        Replication cost
Random     0.828652    67.9564 / 0.00175637 / 193.259      4695
CACHE      0.830038    64.6417 / 0.00172859 / 191.703      0
SAF        0.837664    62.1525 / 0.00172887 / 190.896      0
PDRS       0.842982    61.0969 / 0.00172652 / 191.279      246454
DCG        0.848559    59.0611 / 0.00172883 / 189.270      14510
PCS        0.868749    50.2859 / 0.00172885 / 188.550      9846
OPTM       0.878677    41.2282 / 0.00172874 / 188.428      4721

6 PERFORMANCE EVALUATION IN DISCONNECTED MANETS WITH THE COMMUNITY-BASED MOBILITY MODEL

To evaluate the performance of PCS in disconnected MANETs, we conducted event-driven experiments with the MIT Reality project [35] trace and the Haggle project [36] trace. The MIT Reality trace lasts about 2.56 million seconds (Ms), while the Haggle project trace lasts about 0.34 Ms. Both traces represent typical disconnected MANET scenarios. We used the StaticWait routing protocol [40] in this test.

We evaluated the performance of PCS in comparison with DCG [10], CACHE-DTN [14], OPTM, and Random. CACHE-DTN is a caching algorithm for DTNs. It caches each file in the central node of each network center location (NCL). If a central node is full, its replicas are stored in its neighbor nodes according to their popularity. A more popular replica is stored closer to the central node. The experiment settings and measurement metrics are the same as in Section 5 unless otherwise specified below. The total number of queries was set to $6000 \times R_p$, where $R_p$ is the query rate and was varied in the range [2, 6]. In the experiments with the MIT Reality trace and the Haggle trace, all queries were generated evenly in the time periods [0.3Ms, 2.3Ms] and [0.05Ms, 0.25Ms], and the TTL of each query was set to 0.3Ms and 0.04Ms, respectively. We again adopted a 95% confidence interval when handling experimental data.

6.1 Hit Rate

Figure 5(a) and Figure 6(a) plot the hit rates of the five methods with the Haggle trace and the MIT Reality trace, respectively. We see that in both scenarios, the hit rates follow OPTM > PCS > CACHE-DTN > DCG > Random. OPTM and PCS achieve higher hit rates than the other methods because they follow the deduced OFRR. However, since PCS realizes OFRR in a distributed way, it performs slightly worse than OPTM. CACHE-DTN considers the intermittent connection properties of disconnected MANETs and replicates each file to every NCL, leading to high data accessibility, though not optimal. DCG only considers temporarily connected groups for data replication, which are not stable in disconnected MANETs. Therefore, it has a low hit rate. Random assigns resources to files randomly, which means it cannot create more replicas for popular files, leading to the lowest hit rate. This result demonstrates the effectiveness of the proposed PCS in improving the overall file availability and supports the correctness of our derived OFRR for disconnected MANETs.

We also see that the hit rates of the different methods fluctuate only slightly as the query rate increases. This is because the hit rate is not affected by the query rate: even when the number of queries increases, the file availability remains at the same level, leading to similar hit rates, as shown in the two figures.
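The argument above can be illustrated with a toy calculation. The sketch below assumes that a query counts as a hit when it is answered within its TTL, and uses `None` to mark a query that is never answered; the function name `hit_rate` and the sample numbers are hypothetical and are not taken from the experiments.

```python
from typing import Optional, Sequence


def hit_rate(delays: Sequence[Optional[float]], ttl: float) -> float:
    """Fraction of queries answered within `ttl`.

    `delays[i]` is the delay of query i in seconds, or None if the
    query was never answered (assumed convention for this sketch).
    """
    if not delays:
        raise ValueError("at least one query is required")
    hits = sum(1 for d in delays if d is not None and d <= ttl)
    return hits / len(delays)


if __name__ == "__main__":
    sample = [10.0, None, 250.0, 400.0]  # made-up delays, TTL of 300 s
    print(hit_rate(sample, ttl=300.0))
    # Tripling the number of queries with the same outcome mix
    # leaves the ratio unchanged.
    print(hit_rate(sample * 3, ttl=300.0))
```

Because the hit rate is a ratio, issuing more queries against the same replica placement changes both the numerator and the denominator in proportion, which is consistent with the nearly flat curves.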

6.2 Average Delay

Figure 5(b) and Figure 6(b) show the average delays of the five methods with the Haggle trace and the MIT Reality trace, respectively. We find that with both traces, the average delays follow OPTM



[Figure 5 panels: (a) Hit rate, (b) Average delay (x10^3 s), (c) Replication cost, each plotted against query rate; (d) CDF of allocated resources (CDF of the proportion of replicas). Legend: PCS, DCG, CACHE-DTN, OPTM, Random.]

Fig. 5: Performance of the file replication protocols with the Haggle trace.

[Figure 6 panels: (a) Hit rate, (b) Average delay (x10^4 s), (c) Replication cost, each plotted against query rate; (d) CDF of allocated resources (CDF of the proportion of replicas).]

Fig. 6: Performance of the file replication protocols with the MIT Reality trace.

6.3 Replication Cost

Figure 5(c) and Figure 6(c) show the replication costs of the five methods with the Haggle trace and the MIT Reality trace, respectively. OPTM and Random have the lowest replication cost while the costs of the other three methods follow PCS




protocols that only consider storage space as a resource, we also consider file holders' ability to meet nodes as an available resource since it also affects the average querying delay. This new concept enhances the correctness of the deduced rule and the effectiveness of the accordingly developed replication protocol. Finally, we designed the Priority Competition and Split replication protocol (PCS) that realizes the proposed optimal replication rule in a fully distributed manner. Extensive experiments on the real-world GENI testbed, NS-2, and an event-driven simulator with real traces and synthesized mobility confirm both the correctness of our theoretical analysis and the effectiveness of PCS in MANETs. In this study, we focus on a static set of files in the network. In our future work, we will theoretically analyze a more complex environment including file dynamics (file addition and deletion, file timeout) and dynamic node querying patterns.

ACKNOWLEDGMENT

This research was supported in part by U.S. NSF grants OCI-1064230, CNS-1049947, CNS-1025652, CNS-1025649, CNS-1057530 and CNS-0917056, Microsoft Research Faculty Fellowship 8300751, and Sandia National Laboratories grant 10002282.

REFERENCES

[1] Qik, http://qik.com/.
[2] Flixwagon, http://www.flixwagon.com/.
[3] C. Palazzi and A. Bujari, "A delay/disruption tolerant solution for mobile to mobile file sharing," in Proc. of IFIP/IEEE Wireless Days, 2010.
[4] Y. Tseng, S. Ni, and E. Shih, "Adaptive approaches to relieving broadcast storms in a wireless multihop mobile ad hoc network," in Proc. of ICDCS, 2001, pp. 481-488.
[5] B. Chiara, C. Marco, et al., "Hibop: A history based routing protocol for opportunistic networks," in Proc. of WoWMoM, 2007.
[6] A. Lindgren, A. Doria, and O. Schelen, "Probabilistic routing in intermittently connected networks," MC2R, vol. 7, no. 3, pp. 19-20, 2003.
[7] F. Li and J. Wu, "MOPS: Providing content-based service in disruption-tolerant networks," in Proc. of ICDCS, 2009.
[8] S. Moussaoui, M. Guerroumi, and N. Badache, "Data replication in mobile ad hoc networks," in Proc. of MSN, 2006, pp. 685-697.
[9] L. Yin and G. Cao, "Supporting cooperative caching in ad hoc networks," TMC, vol. 5, no. 1, pp. 77-89, 2006.
[10] T. Hara and S. K. Madria, "Data replication for improving data accessibility in ad hoc networks," TMC, vol. 5, no. 11, pp. 1515-1532, 2006.
[11] J. Zheng, J. Su, K. Yang, and Y. Wang, "Stable neighbor based adaptive replica allocation in mobile ad hoc networks," in Proc. of ICCS, 2004.
[12] H. Duong and I. Demeure, "Proactive data replication semantic information within mobility groups in MANET," in Proc. of Mobilware, 2009.
[13] Y. Huang, Y. Gao, et al., "Optimizing file retrieval in delay-tolerant content distribution community," in Proc. of ICDCS, 2009.
[14] W. Gao, G. Cao, A. Iyengar, and M. Srivatsa, "Supporting cooperative caching in disruption tolerant networks," in Proc. of ICDCS, 2011.
[15] J. Reich and A. Chaintreau, "The age of impatience: optimal replication schemes for opportunistic networks," in Proc. of CoNEXT, 2009.
[16] S. Ioannidis, L. Massoulie, and A. Chaintreau, "Distributed caching over heterogeneous mobile networks," in Proc. of SIGMETRICS, 2010.
[17] M. J. Pitkanen and J. Ott, "Redundancy and distributed caching in mobile DTNs," in Proc. of MobiArch, 2007.
[18] X. Zhuo, Q. Li, W. Gao, G. Cao, and Y. Dai, "Contact duration aware data replication in delay tolerant networks," in Proc. of ICNP, 2011.
[19] X. Zhuo, Q. Li, G. Cao, Y. Dai, B. K. Szymanski, and T. L. Porta, "Social-based cooperative caching in DTNs: A contact duration aware approach," in Proc. of MASS, 2011.
[20] Z. Li and H. Shen, "Sedum: Exploiting social networks in utility-based distributed routing for DTNs," TC, 2012.
[21] V. Gianuzzi, "Data replication effectiveness in mobile ad-hoc networks," in Proc. of PE-WASUN, 2004, pp. 17-22.
[22] S. Chessa and P. Maestrini, "Dependable and secure data storage and retrieval in mobile wireless networks," in Proc. of DSN, 2003.
[23] X. Chen, "Data replication approaches for ad hoc wireless networks satisfying time constraints," IJPEDS, vol. 22, no. 3, pp. 149-161, 2007.
[24] J. Broch, D. A. Maltz, D. B. Johnson, Y. Hu, and J. G. Jetcheva, "A performance comparison of multi-hop wireless ad hoc network routing protocols," in Proc. of MOBICOM, 1998, pp. 85-97.
[25] M. Musolesi and C. Mascolo, "Designing mobility models based on social network theory," MCCR, vol. 11, pp. 59-70, 2007.
[26] http://web.informatik.uni-bonn.de/IV/BoMoNet/BonnMotion.htm.
[27] P. Costa, C. Mascolo, M. Musolesi, and G. P. Picco, "Socially-aware routing for publish-subscribe in delay-tolerant mobile ad hoc networks," IEEE JSAC, vol. 26, no. 5, pp. 748-760, 2008.
[28] M. Musolesi and C. Mascolo, "Car: Context-aware adaptive routing for delay-tolerant mobile networks," TMC, 2009.
[29] H. Cai and D. Y. Eun, "Crossing over the bounded domain: from exponential to power-law inter-meeting time in MANET," in Proc. of MOBICOM, 2007.
[30] R. Groenevelt, P. Nain, and G. Koole, "The message delay in mobile ad hoc networks," Perform. Eval., vol. 62, pp. 210-228, 2005.
[31] G. Sharma, R. Mazumdar, and N. B. Shroff, "Delay and capacity trade-offs in mobile ad hoc networks: A global perspective," in Proc. of INFOCOM, 2006.
[32] L. Kleinrock, Queueing Systems, Volume II: Computer Applications. John Wiley & Sons, 1976.
[33] J. Kangasharju, K. W. Ross, and D. A. Turner, "Optimizing file availability in peer-to-peer content distribution," in Proc. of INFOCOM, 2007.
[34] R. S. Gray, D. Kotz, C. Newport, N. Dubrovsky, A. Fiske, J. Liu, C. Masone, S. McGrath, and Y. Yuan, "CRAWDAD data set dartmouth/outdoor (v. 2006-11-06)," http://crawdad.cs.dartmouth.edu/dartmouth/outdoor.
[35] N. Eagle, A. Pentland, and D. Lazer, "Inferring social network structure using mobile phone data," PNAS, vol. 106, no. 36, 2009.
[36] A. Chaintreau, P. Hui, J. Scott, R. Gass, J. Crowcroft, and C. Diot, "Impact of human mobility on opportunistic forwarding algorithms," in Proc. of INFOCOM, 2006.
[37] GENI project, http://www.geni.net/.
[38] Orbit, http://www.orbit-lab.org/.
[39] The Network Simulator ns-2, http://www.isi.edu/nsnam/ns/.
[40] T. Spyropoulos, K. Psounis, and C. Raghavendra, "Efficient routing in intermittently connected mobile networks: The single-copy case," ACM/IEEE Transactions on Networking, 2007.
[41] M. Lu and J. Wu, "Opportunistic routing algebra and its application," in Proc. of INFOCOM, 2009.
[42] T. Hara, "Effective replica allocation in ad hoc networks for improving data accessibility," in Proc. of INFOCOM, 2001.
[43] Z. Li and H. Shen, "Analysis of cooperation incentive strategies in mobile ad hoc networks," TMC, 2012.
[44] B. Chen and M. C. Chan, "MobiCent: a credit-based incentive system for disruption tolerant network," in Proc. of INFOCOM, 2010.

Kang Chen received the BS degree in Electronics and Information Engineering from Huazhong University of Science and Technology, China in 2005, and the MS in Communication and Information Systems from the Graduate University of Chinese Academy of Sciences, China in 2008. He is currently a Ph.D. student in the Department of Electrical and Computer Engineering at Clemson University. His research interests include mobile ad hoc networks and delay tolerant networks.

Haiying Shen received the BS degree in Computer Science and Engineering from Tongji University, China in 2000, and the MS and Ph.D. degrees in Computer Engineering from Wayne State University in 2004 and 2006, respectively. She is currently an Assistant Professor in the Holcombe Department of Electrical and Computer Engineering at Clemson University. Her research interests include distributed and parallel computer systems and computer networks, with an emphasis on peer-to-peer and content delivery networks, mobile computing, wireless sensor networks, and grid and cloud computing. She was the Program Co-Chair for a number of international conferences and member of the Program Committees of many leading conferences. She is a Microsoft Faculty Fellow of 2010 and a member of the IEEE and ACM.

Cheng-Zhong Xu received B.S. and M.S. degrees from Nanjing University in 1986 and 1989, respectively, and a Ph.D. degree in Computer Science from the University of Hong Kong in 1993. He is currently a Professor in the Department of Electrical and Computer Engineering of Wayne State University and the Director of Suns Center of Excellence in Open Source Computing and Applications. His research interests are mainly in distributed and parallel systems, particularly in scalable and secure Internet services, autonomic cloud management, energy-aware task scheduling in wireless embedded systems, and high performance cluster and grid computing. He has published more than 160 articles in peer-reviewed journals and conferences in these areas. He is the author of Scalable and Secure Internet Services and Architecture (Chapman & Hall/CRC Press, 2005) and a co-author of Load Balancing in Parallel Computers: Theory and Practice

Haiying Shen received the BS degree in Computer Science and Engineering from Tongji University, China in 2000, and the MS and Ph.D. degrees in Computer Engineering from Wayne State University in 2004 and 2006, respectively. She is currently an Assistant Professor in the Department of Electrical and Computer Engineering at Clemson University. Her research interests include distributed computer systems and computer networks, with an emphasis on P2P and content delivery networks, mobile computing, wireless sensor networks, and cloud computing. She is a Microsoft Faculty Fellow of 2010, a senior member of the IEEE and a member of the ACM.
