
On Quantifying the Effects of Mobility on Data Replication in Mobile Ad Hoc Networks

Yang Zhang (a), Guohong Cao (a), Bhaskar Krishnamachari (b), Tom La Porta (a)

(a) Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA

(b) Department of Electrical Engineering, University of Southern California, Los Angeles, CA

Abstract

In mobile ad hoc networks, nodes move freely and network partitions occur frequently. To mitigate this problem, data replication is commonly used to increase data availability and reduce data access delay. However, most previous work assumed a particular mobility model and could not fully study the effects of mobility on data replication. In this paper, we quantify the effects of mobility on different data replication algorithms from various perspectives. The study uses several metrics that go beyond average access delay and data availability by also including the geographical distribution of these values. Through extensive experiments, we study the effects of four typical mobility models on data replication and identify the most suitable data replication algorithms under each mobility model.

Keywords: Mobile ad hoc networks, Replication, Mobility model

1. Introduction

In mobile ad hoc networks (MANETs) [1], since nodes move freely, network partitions may occur, in which nodes in one partition cannot access data held by nodes in other partitions. To mitigate this problem, data replication can be used. By replicating data at a number of nodes, a data request can be served by the closest node that holds a replica. Then, even if there is a network partition between the requesting node and the original data source, the request can still be served as long as it can reach a node with a replica. Further, since the request can be served over fewer hops, the data access delay is reduced.

Email addresses: [email protected] (Yang Zhang), [email protected] (Guohong Cao), [email protected] (Bhaskar Krishnamachari), [email protected] (Tom La Porta)

Preprint submitted to Ad Hoc Networks, February 7, 2011

Data replication can increase data availability and reduce data access delay, but at the cost of data storage. Since mobile nodes have only limited storage space, bandwidth, and power, it is impossible for one node to hold all the data. Therefore, it is important for mobile nodes to cooperate with each other to decide which node should hold which data replica. To increase data availability, a node may avoid holding data that has already been replicated by its neighbors, so that its local storage can be used for additional data. However, this may increase the hop count for some data items and thus the data access delay. The problem becomes more complex when mobility is considered, since mobility changes the locations of data replicas and thereby affects data availability and data access delay.

There have been several studies on data replication in MANETs [2, 3, 4]. These studies show that node mobility significantly affects the performance of data replication. However, most of these works assume a particular mobility model and examine only its effect on the performance of the proposed algorithm. In other words, they do not provide general insights into the relationship between mobility models and data replication algorithms. In this paper, we study the effects of various mobility models on data replication and then identify the most suitable data replication algorithms under each mobility model. More specifically, the contributions of this paper are as follows:

1. We quantify the effects of mobility on data replication using metrics such as data access delay and data availability. Besides these traditional metrics, we also examine the geographical distribution of access delay and data availability.

2. Our experimental results show that different replication algorithms exhibit quite different node cooperation behavior, and thus achieve different data access performance under different mobility models. Specifically, we provide an in-depth analysis and evaluation of the relationship between data replication and node mobility, and identify the reasons behind it.


3. We identify the most suitable data replication algorithms under various mobility models. These results can serve as guidelines for researchers and system developers who design and evaluate data replication algorithms in the presence of node mobility.

The remainder of this paper is organized as follows. In Section 2, we summarize related work in this area. In Section 3, we present the system model, performance metrics, and data replication algorithms that will be evaluated in this paper. Section 4 reports the evaluation results of how different data replication algorithms perform under various mobility models. Finally, Section 5 concludes the paper.

2. Related work

Data replication has been extensively studied in the Web environment [5, 6] and in distributed database systems [7], where the goal is to place replicas of web servers or databases among a number of candidate locations so that performance in terms of query delay or data availability is optimized. However, all these conventional works assume that web servers and database systems are static, whereas our work targets a mobile ad hoc environment.

Recently, much research has investigated the effect of mobility on network performance, such as the efficiency of routing protocols [8] and network partitioning [9]. For routing, the main objective is to find destinations and forward data with low message overhead, high data delivery ratio, and short delivery delay. For network partitioning, the dynamic changes in the size and shape of each partition are the important issues. These studies are similar to ours in that they investigate the effect of mobility. However, they mainly focus on link stability and node distribution in the network; i.e., they operate at the link and node level. Our work, in contrast, focuses on the effects of mobility on data replication.

Some existing works have studied the effects of mobility on data availability and data dissemination speed in MANETs. In [10], the authors mathematically define metrics that capture the effects of mobility on information diffusion in MANETs. Huang and Chen [11] studied how to replicate data when nodes follow a group mobility pattern. However, these works aim at studying network dynamics and the characteristics of node mobility; they do not provide deep and general insight into the internal relationship between mobility model and data access. The existing work most relevant to ours is [12], where Hara proposes metrics to evaluate the impact of mobility on data availability. However, all of these metrics are limited to data availability: no specific data replication algorithm is analyzed, and data access performance is not considered. Therefore, they are not sufficient to fully examine the relationship between mobility model and data access. In this work, we study the effects of mobility on data replication in terms of both data access delay and data availability, and we identify the most suitable data replication algorithms under various mobility models.

3. Preliminary

In this section, we present the system model, propose new metrics to quantify the effects of mobility on data replication, and describe the four data replication algorithms that will be used in the evaluation.

3.1. System Model

We assume there are 𝑚 nodes in the network. The nodes are denoted by 𝑁 = {𝑁1, 𝑁2, ..., 𝑁𝑚}, where 𝑁𝑘 (𝑘 = 1, ..., 𝑚) is a node identifier. The communication range of each mobile node is represented by a circle with radius 𝑅. When two nodes move out of each other's communication range, the link between them fails; the link failure probability between 𝑁𝑖 and 𝑁𝑗 is denoted as 𝑓𝑖𝑗. The link failure probability depends on the distance between the nodes and on their moving directions and velocities. For example, if two connected nodes are far apart and move in opposite directions, they are likely to disconnect soon, so the link failure probability between them is high. We assume every link is bidirectional, and thus 𝑓𝑖𝑗 = 𝑓𝑗𝑖. The network can be partitioned due to the limited communication range and link failures.
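To make the notion of a partition concrete, the sketch below groups nodes into partitions from their positions alone. It assumes a deterministic disk model (a link exists exactly when two nodes are within distance 𝑅) and ignores the link failure probability 𝑓𝑖𝑗; the function name `partitions` is hypothetical and not taken from the paper.

```python
import math
from collections import deque


def partitions(positions, radius):
    """Group nodes into network partitions (connected components).

    Two nodes share a link when their Euclidean distance is at most
    `radius`; a partition is a maximal set of nodes joined by multi-hop links.
    """
    n = len(positions)
    # Build adjacency lists of the disk graph.
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= radius:
                adj[i].append(j)
                adj[j].append(i)
    seen = [False] * n
    groups = []
    for start in range(n):
        if seen[start]:
            continue
        # Breadth-first search collects every node reachable from `start`.
        seen[start] = True
        queue, comp = deque([start]), []
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        groups.append(sorted(comp))
    return groups


# Three nodes in a chain plus one isolated node, with R = 100 m.
print(partitions([(0, 0), (90, 0), (180, 0), (1000, 1000)], 100))  # [[0, 1, 2], [3]]
```

Each inner list is one partition; under this model a request can only be served by nodes in the requester's own partition.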

There are 𝑛 different data items in the network. The set of data items is denoted by 𝐷 = {𝑑1, 𝑑2, ..., 𝑑𝑛}, where 𝑑𝑘 (𝑘 = 1, ..., 𝑛) is a data identifier. Each mobile node maintains some amount of data locally. For simplicity, we assume that data are not updated; techniques similar to those used in [13] and [14, 15] can be applied to extend the proposed scheme to handle data updates or data consistency issues. These data items may be replicated to other nodes based on some data replication algorithm. Because of limited memory (or disk) size, each mobile node can host only 𝐵 (𝐵 < 𝑛) replicas, including its original data. When a mobile node 𝑁𝑖 needs to access a data item 𝑑𝑗, 𝑁𝑖 first searches its local memory. If 𝑁𝑖 cannot find a copy of 𝑑𝑗 in its local memory, 𝑁𝑖 communicates with its reachable nodes (through one-hop or multi-hop links in its partition) to get 𝑑𝑗. If the requesting node cannot communicate with any of the nodes that have 𝑑𝑗, 𝑑𝑗 is considered inaccessible to 𝑁𝑖.
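The access procedure above amounts to a breadth-first search outward from the requester. The following is a minimal sketch, assuming the current topology is given as adjacency lists and each node's stored items as a set; `access` and its arguments are illustrative names, not part of the paper. Because BFS explores nodes in order of hop count, the first holder it meets is the nearest one, which matches the hop-count notion of access delay used in Section 3.2.

```python
from collections import deque


def access(requester, item, adj, holdings):
    """Resolve one request for `item` issued by node `requester`.

    adj: adjacency lists of the current topology (one list per node)
    holdings: holdings[k] is the set of data items stored at node k
    Returns the hop count to the nearest node holding `item`
    (0 if held locally), or None if no reachable node has it.
    """
    if item in holdings[requester]:
        return 0  # served from local memory
    dist = {requester: 0}
    queue = deque([requester])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                # BFS reaches nodes in order of hop count, so the first
                # holder found is the nearest one.
                if item in holdings[v]:
                    return dist[v]
                queue.append(v)
    return None  # the item is not accessible from this partition


# Chain 0-1-2 plus isolated node 3; "d1" is held only at node 2.
adj = [[1], [0, 2], [1], []]
holdings = [set(), set(), {"d1"}, {"d2"}]
print(access(0, "d1", adj, holdings))  # 2
print(access(0, "d2", adj, holdings))  # None
```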

3.2. Evaluation Metrics

Based on the system model, we define several metrics that represent the performance of different data replication algorithms.

3.2.1. Average Access Delay (𝒟)

This metric is defined as the average number of hops from the query node to the nearest node that has the data. Formally, if we use 𝑡𝑖𝑗 to denote the access delay of the 𝑗th request of node 𝑁𝑖, the average access delay over the whole experiment can be expressed as:

$$\mathcal{D} = \frac{\sum_{i=1}^{m} \sum_{j=1}^{\mathcal{R}(i)} t_{ij}}{\sum_{i=1}^{m} \mathcal{R}(i)} \qquad (1)$$

Here, ℛ(𝑖) returns the number of requests initiated by node 𝑁𝑖 during the experiment.

3.2.2. Average Availability (𝒜)

Average availability is the average probability that a query is served successfully. Similarly, we use a binary variable 𝑠𝑖𝑗 to indicate whether the 𝑗th request of node 𝑁𝑖 is satisfied; the metric is then defined as:

$$\mathcal{A} = \frac{\sum_{i=1}^{m} \sum_{j=1}^{\mathcal{R}(i)} s_{ij}}{\sum_{i=1}^{m} \mathcal{R}(i)} \qquad (2)$$

where 𝑠𝑖𝑗 = 1 if the 𝑗th request of node 𝑁𝑖 is satisfied; otherwise, 𝑠𝑖𝑗 = 0.
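Equations (1) and (2) can be evaluated directly from per-request logs. The sketch below assumes `delays[i]` and `satisfied[i]` hold 𝑡𝑖𝑗 and 𝑠𝑖𝑗 for every request of node 𝑁𝑖 (so each list's length is ℛ(𝑖)); the function names are hypothetical. It follows the formulas as written, dividing by all requests, so whatever delay is logged for an unsatisfied request (0 in the example, which is an assumption) enters Eq. (1) unchanged.

```python
def average_delay(delays):
    """Eq. (1): sum of all t_ij divided by the total number of requests."""
    total_requests = sum(len(node) for node in delays)  # sum of R(i)
    return sum(sum(node) for node in delays) / total_requests


def average_availability(satisfied):
    """Eq. (2): fraction of all requests with s_ij = 1."""
    total_requests = sum(len(node) for node in satisfied)
    return sum(sum(node) for node in satisfied) / total_requests


# Node N_1 issued three requests (the third failed); node N_2 issued one.
delays = [[0, 1, 0], [3]]      # t_ij; 0 logged for the failed request (assumption)
satisfied = [[1, 1, 0], [1]]   # s_ij
print(average_delay(delays))            # 1.0
print(average_availability(satisfied))  # 0.75
```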

3.2.3. Distribution of the Access Delay (𝒟ℎ)

We believe that the average access delay may not always be a very meaningful metric, since it treats two cases equally: 1) every request has a similar access delay; and 2) some requests have a long access delay while others have a short one. Therefore, to study the performance of data replication algorithms, the distribution of access delay is more informative than its average value, and it is heavily affected by the adopted mobility model and replication algorithm. We therefore define the distribution of access delay as a new metric:

$$\mathcal{D}_h = \sum_{i=1}^{m} \sum_{j=1}^{\mathcal{R}(i)} \mathit{bel}(t_{ij}, h), \qquad h = 0, t, 2t, \ldots \qquad (3)$$

where 𝑡 is the statistical interval (bin width) of the access delay, and 𝑏𝑒𝑙(𝑡𝑖𝑗, ℎ) indicates whether 𝑡𝑖𝑗 falls in the range [ℎ, ℎ + 𝑡): 𝑏𝑒𝑙(𝑡𝑖𝑗, ℎ) = 1 if ℎ ≤ 𝑡𝑖𝑗 < ℎ + 𝑡; otherwise, 𝑏𝑒𝑙(𝑡𝑖𝑗, ℎ) = 0.
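Eq. (3) is a histogram with bin width 𝑡. A minimal sketch, assuming non-negative delays; `delay_distribution` is a hypothetical name. Flooring 𝑡𝑖𝑗/𝑡 picks the single bin [ℎ, ℎ + 𝑡) for which 𝑏𝑒𝑙(𝑡𝑖𝑗, ℎ) = 1, so each request is counted exactly once.

```python
from collections import Counter


def delay_distribution(delays, t):
    """Eq. (3): number of requests whose delay falls in each bin [h, h + t).

    delays[i] lists t_ij for node N_i. Returns {h: count} for the
    non-empty bins h = 0, t, 2t, ..., in increasing order of h.
    """
    counts = Counter()
    for node in delays:
        for d in node:
            h = (d // t) * t  # lower edge of the bin containing d
            counts[h] += 1
    return dict(sorted(counts.items()))


print(delay_distribution([[0, 1, 1, 3], [2, 7]], t=2))  # {0: 3, 2: 2, 6: 1}
```

With hop-count delays and 𝑡 = 1, each bin is a single hop value; dividing the counts by the total number of requests gives per-hop fractions.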

3.2.4. Geographical Distribution of the Access Delay (𝒟⟨ℎ𝑥,ℎ𝑦⟩)

Since different mobility models may lead to different deployment patterns of mobile nodes, we study the geographical distribution of the data access delay. We divide the entire network area into ℎ × ℎ small subareas and compare the results in different subareas. The geographical distribution of access delay at subarea ⟨ℎ𝑥, ℎ𝑦⟩ is expressed as:

$$\mathcal{D}_{\langle h_x, h_y \rangle} = \sum_{i=1}^{m} \sum_{j=1}^{\mathcal{R}(i)} \mathcal{L}(t_{ij}, \langle h_x, h_y \rangle) \qquad (4)$$

where ℒ(𝑡𝑖𝑗, ⟨ℎ𝑥, ℎ𝑦⟩) selects the requests that take place in subarea ⟨ℎ𝑥, ℎ𝑦⟩: ℒ(𝑡𝑖𝑗, ⟨ℎ𝑥, ℎ𝑦⟩) = 𝑡𝑖𝑗 if the 𝑗th request of node 𝑁𝑖 is initiated in subarea ⟨ℎ𝑥, ℎ𝑦⟩; otherwise, ℒ(𝑡𝑖𝑗, ⟨ℎ𝑥, ℎ𝑦⟩) = 0.

3.2.5. Geographical Distribution of Availability (𝒜⟨ℎ𝑥,ℎ𝑦⟩)

Similar to the geographical distribution of access delay, the geographical distribution of availability is defined as:

$$\mathcal{A}_{\langle h_x, h_y \rangle} = \sum_{i=1}^{m} \sum_{j=1}^{\mathcal{R}(i)} \mathcal{L}(s_{ij}, \langle h_x, h_y \rangle) \qquad (5)$$

where ℒ(𝑠𝑖𝑗, ⟨ℎ𝑥, ℎ𝑦⟩) selects the requests that take place in subarea ⟨ℎ𝑥, ℎ𝑦⟩: ℒ(𝑠𝑖𝑗, ⟨ℎ𝑥, ℎ𝑦⟩) = 𝑠𝑖𝑗 if the 𝑗th request of node 𝑁𝑖 is initiated in subarea ⟨ℎ𝑥, ℎ𝑦⟩; otherwise, ℒ(𝑠𝑖𝑗, ⟨ℎ𝑥, ℎ𝑦⟩) = 0.
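Equations (4) and (5) can be computed in one pass by mapping each request's origin to a grid cell. This sketch assumes each logged request carries the position where it was initiated; `geo_distribution` is a hypothetical name. It returns the two sums defined by the equations, plus a per-cell request count that is not part of Eq. (4) or (5) but could be used to normalize the sums if a per-cell average is wanted.

```python
def geo_distribution(requests, area, h):
    """Eqs. (4) and (5): per-subarea sums over an h x h grid.

    requests: iterable of (x, y, t_ij, s_ij), i.e. where the request was
              initiated, its delay, and whether it was satisfied (1 or 0)
    area: side length of the square network area
    Returns (delay_sum, avail_sum, count), each an h x h list indexed [hx][hy].
    """
    delay_sum = [[0.0] * h for _ in range(h)]
    avail_sum = [[0] * h for _ in range(h)]
    count = [[0] * h for _ in range(h)]
    cell = area / h
    for x, y, t_ij, s_ij in requests:
        # Map the position to its subarea; points on the far edge go to the last cell.
        hx = min(int(x // cell), h - 1)
        hy = min(int(y // cell), h - 1)
        delay_sum[hx][hy] += t_ij  # L(t_ij, <hx, hy>) contributes t_ij here
        avail_sum[hx][hy] += s_ij  # L(s_ij, <hx, hy>) contributes s_ij here
        count[hx][hy] += 1         # extra: not part of Eq. (4) or (5)
    return delay_sum, avail_sum, count


reqs = [(100, 100, 1, 1), (200, 50, 3, 1), (2400, 2400, 0, 0)]
d, a, c = geo_distribution(reqs, area=2500, h=2)
print(d[0][0], a[0][0], c[0][0])  # 4.0 2 2
```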

3.3. Data Replication Algorithms

To study the effects of mobility on data replication, we use the following four representative data replication algorithms.


3.3.1. Greedy Data Replication

Greedy data replication is a naive algorithm in which each node replicates its most frequently accessed data until its memory is full. More specifically, let 𝑎𝑖𝑗 denote the access frequency of node 𝑁𝑖 to data 𝑑𝑗. Each node then always replicates the data with the highest 𝑎𝑖𝑗. Since each node takes only its own data access pattern into account during data replication, this algorithm is non-cooperative.
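A minimal sketch of the Greedy choice for one node, assuming its access frequencies are given as a dict and ignoring the requirement from Section 3.1 that the node's original data count toward its 𝐵 slots; `greedy_replicas` is a hypothetical name.

```python
def greedy_replicas(access_freq, B):
    """Greedy: keep the B data items this node accesses most often.

    access_freq: dict mapping data id d_j to a_ij for this node N_i.
    Ties are broken by data id so the result is deterministic.
    """
    ranked = sorted(access_freq, key=lambda d: (-access_freq[d], d))
    return ranked[:B]


print(greedy_replicas({"d1": 0.1, "d2": 0.5, "d3": 0.3, "d4": 0.1}, B=2))  # ['d2', 'd3']
```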

3.3.2. Pairing Cooperation Data Replication

Unlike Greedy data replication, in the Pairing algorithm (e.g., the OTOO scheme in [16] and the DAFN scheme in [2]), each mobile node cooperates with one of its neighbors to decide which data to replicate. More specifically, for each data item 𝑑𝑘, the paired nodes 𝑁𝑖 and 𝑁𝑗 each compute a combined access frequency (CAF) value, denoted 𝐶𝐴𝐹𝑖𝑗 at 𝑁𝑖. For example, for 𝑁𝑖:

$$\mathit{CAF}_{ij}(k) = a_{ik} + a_{jk} \times (1 - f_{ij}) \qquad (6)$$

Similarly, 𝑁𝑗 calculates its own combined access frequency. Each node sorts the data by CAF value and replicates the items with the highest values in its memory until no more data items can be replicated. Thus, the replication decision does not depend only on the access frequency of a single node; it also depends on the access frequency of the paired node and on the stability of the link between them.
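The Pairing rule can be sketched as follows for node 𝑁𝑖, assuming the two nodes' access frequencies and 𝑓𝑖𝑗 are already known; the function names are hypothetical. The sketch ranks items by Eq. (6) only and does not attempt any further coordination between the two nodes.

```python
def caf_pair(a_i, a_j, f_ij):
    """Eq. (6): CAF_ij(k) = a_ik + a_jk * (1 - f_ij) for every data item k.

    a_i, a_j: dicts mapping data id to access frequency at N_i and N_j
    f_ij: link failure probability between N_i and N_j
    """
    items = set(a_i) | set(a_j)
    return {k: a_i.get(k, 0.0) + a_j.get(k, 0.0) * (1.0 - f_ij) for k in items}


def pairing_replicas(a_i, a_j, f_ij, B):
    """Replicate at N_i the B items with the highest CAF (ties broken by id)."""
    caf = caf_pair(a_i, a_j, f_ij)
    return sorted(caf, key=lambda k: (-caf[k], k))[:B]


a_i = {"d1": 0.6, "d2": 0.1, "d3": 0.0}
a_j = {"d1": 0.0, "d2": 0.2, "d3": 0.8}
print(pairing_replicas(a_i, a_j, f_ij=0.5, B=2))  # ['d1', 'd3']
```

Setting 𝑓𝑖𝑗 = 1 (a link that always fails) makes the partner's frequencies vanish, so the rule degenerates to the Greedy choice.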

3.3.3. Reliable Neighboring Data Replication

The Pairing algorithm considers a neighboring node when making data replication choices. However, it still treats its own access frequency as the most important factor and cooperates with only one neighboring node. As described in [16], the reliable Neighboring data replication algorithm further increases the degree of cooperation and allows nodes to replicate and share data with multiple reliable neighbors within their one-hop range. The replication decision depends on the data access frequency and the link stability. More specifically, in this algorithm, part of a node's memory is used to hold the data most interesting to the node itself, and the rest is used for its reliable neighbors. The combined access frequency of node 𝑁𝑖 for data 𝑑𝑘 in the Neighboring algorithm is defined as:

$$\mathit{CAF}_i(k) = \sum_{N_j \in \mathit{nb}(i)} a_{jk} \times (1 - f_{ij}) \qquad (7)$$


where 𝑛𝑏(𝑖) is the set of reliable neighbors of 𝑁𝑖, i.e., neighbors whose link failure probability to 𝑁𝑖 is below a threshold.
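A sketch of Eq. (7) together with the memory split described above. The split size `self_slots`, the threshold value, and all function names are assumptions for illustration, since the text does not specify how much memory is reserved for the node's own data. Items already chosen for the node itself are skipped when filling the neighbor share.

```python
def reliable_neighbors(f_i, threshold):
    """nb(i): neighbors whose link failure probability f_ij is below threshold.

    f_i: dict mapping neighbor id j to f_ij
    """
    return [j for j, f in f_i.items() if f < threshold]


def caf_neighbors(a, f_i, threshold):
    """Eq. (7): CAF_i(k) = sum over N_j in nb(i) of a_jk * (1 - f_ij).

    a: dict mapping node id to that node's {data id: access frequency}
    """
    caf = {}
    for j in reliable_neighbors(f_i, threshold):
        for k, freq in a[j].items():
            caf[k] = caf.get(k, 0.0) + freq * (1.0 - f_i[j])
    return caf


def neighboring_replicas(a, i, f_i, threshold, B, self_slots):
    """Reserve up to self_slots for N_i's own favorites; fill the rest by CAF_i."""
    own = sorted(a[i], key=lambda k: (-a[i][k], k))[: min(self_slots, B)]
    caf = caf_neighbors(a, f_i, threshold)
    ranked = sorted(caf, key=lambda k: (-caf[k], k))
    shared = [k for k in ranked if k not in own]
    return own + shared[: B - len(own)]


a = {
    0: {"d1": 0.9, "d2": 0.1},
    1: {"d2": 0.3, "d3": 0.7},
    2: {"d4": 1.0},
}
f_0 = {1: 0.2, 2: 0.9}  # the link from node 0 to node 2 is unreliable
print(neighboring_replicas(a, 0, f_0, threshold=0.5, B=2, self_slots=1))  # ['d1', 'd3']
```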

3.3.4. Reliable Grouping Data Replication

Reliable Grouping data replication (e.g., the DCG scheme in [2] and the DRAM scheme in [11]) is the most aggressively cooperative data replication algorithm. All nodes in a group contribute part of their memory to share and replicate data for all members of that group. More specifically, the access frequency and access overhead of each data item are evaluated from the group's perspective. During data replication, the data item with the highest group access frequency is allocated first, at the node that minimizes the total access delay within the group. The allocation process is repeated for all data items in order of their access frequency until the memory of all nodes in the group is filled. The Grouping algorithm can thus fully exploit cooperation among a group of well-connected nodes. Its performance, however, depends heavily on group connectivity: the better connected the group, the better it performs.
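The allocation loop can be sketched as below. This is a simplified illustration, not the DCG or DRAM scheme itself: it assumes a known hop matrix within the group, approximates the group's total access delay for item 𝑑𝑘 placed at node 𝑢 as the sum over members 𝑚 of 𝑎𝑚𝑘 × hops(𝑚, 𝑢), places at most one replica per item, and ignores pinned original data. `grouping_allocation` is a hypothetical name.

```python
def grouping_allocation(members, a, hops, B):
    """Simplified Grouping allocation: one replica per item, at most B items per node.

    members: node ids in the group
    a: a[m][k] is the access frequency of member m to data item k
    hops: hops[u][v] is the hop distance between members u and v
    Returns {node id: [data ids]} for the group.
    """
    items = {k for m in members for k in a[m]}
    # Group access frequency: sum of the members' access frequencies.
    group_freq = {k: sum(a[m].get(k, 0.0) for m in members) for k in items}
    placement = {m: [] for m in members}
    for k in sorted(items, key=lambda k: (-group_freq[k], k)):
        free = [u for u in members if len(placement[u]) < B]
        if not free:
            break  # every member's memory is full
        # Place k where the frequency-weighted hop count over all members is smallest.
        best = min(
            free,
            key=lambda u: (sum(a[m].get(k, 0.0) * hops[m][u] for m in members), u),
        )
        placement[best].append(k)
    return placement


members = [0, 1, 2]  # a chain 0 - 1 - 2
hops = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
a = {0: {"d1": 0.8, "d2": 0.1}, 1: {"d1": 0.1, "d3": 0.3}, 2: {"d2": 0.6, "d3": 0.1}}
print(grouping_allocation(members, a, hops, B=1))  # {0: ['d1'], 1: ['d3'], 2: ['d2']}
```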

4. Experiments

In this section, we measure the performance of the four data replication algorithms under four typical mobility models: random walk [17], random waypoint [18], Manhattan mobility [10], and reference point group mobility [19].

4.1. Mobility Models

Random Walk (RW): In this model, at every unit of experimental time, each mobile node randomly chooses a movement direction and a movement speed between 0 and 𝑉 m/sec. In the long term, this model yields very low effective mobility, similar to vibrating around the same position, because mobile nodes keep randomly changing their movement direction.

Random WayPoint (RWP): Each node remains stationary for a pause time of 𝑆 seconds. Then it selects a random destination in the entire area and moves there at a speed chosen randomly between 0 and 𝑉 m/sec. After reaching the destination, it pauses again and then repeats this process. In this model, mobile nodes tend to gather at the center of the area.

Manhattan Mobility (MM): This model emulates node movement on streets, where nodes travel only along the pathways in the map. Manhattan grid maps of horizontal and vertical streets are used to restrict node movement. On each street, mobile nodes move along the lanes in both directions. At each intersection, mobile nodes choose their direction and speed (0 to 𝑉 m/sec) randomly.

Reference Point Group Mobility (RPGM): This model captures group mobility. Each group has a logical "center", called a reference point, and a set of group members (nodes). Each reference point moves according to the RWP model with maximum speed 𝑉′ m/sec and pause time 𝑆′ sec. In each group, nodes are uniformly distributed within a certain radius of the reference point. To achieve this, we assume that each node moves according to the RW model with maximum speed 𝑉 m/sec within that range. Specifically, a node's movement vector is the sum of its own RW movement vector and the RWP movement vector of its reference point.

Table 1: Parameter Configuration

| Parameter | Symbol | Value | Range |
|---|---|---|---|
| Number of nodes | m | 300 | |
| Node movement speed | V | 5 m/s | (3, 8 m/s) |
| Group movement speed (RPGM) | V′ | 5 m/s | (3, 8 m/s) |
| Radius of group (RPGM) | R | 300 m | |
| Node pause time | S | 5 sec | (3, 7 sec) |
| Group pause time (RPGM) | S′ | 5 sec | (3, 7 sec) |
| Communication range | C | 100 m | |
| Number of data items | n | 200 | |
| Memory size | B | 10 | |
| Zipf access parameter | 𝜃 | 0.8 | |
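The movement rules above can be simulated with a few small functions. The sketch below covers RW, RWP, and the RPGM composition; MM would additionally need a street graph and is omitted. Assumptions not taken from the paper: positions are clamped to the 2500 m × 2500 m area used in Section 4.2, time advances in 1-second steps, and an RPGM node's RW move is rejected when it would leave the group radius. All names are hypothetical.

```python
import math
import random

AREA = 2500.0  # side of the square simulation area in meters (Section 4.2)


def clamp(p):
    """Keep a coordinate inside [0, AREA] (assumed boundary handling)."""
    return max(0.0, min(AREA, p))


def rw_step(v_max, rng):
    """RW: random direction and speed uniform in [0, v_max]; displacement for 1 s."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    speed = rng.uniform(0.0, v_max)
    return speed * math.cos(angle), speed * math.sin(angle)


class RandomWaypoint:
    """RWP: pause, pick a random destination and speed, travel there, repeat."""

    def __init__(self, x, y, v_max, pause, rng):
        self.x, self.y = x, y
        self.v_max, self.pause, self.rng = v_max, pause, rng
        self.wait = pause   # remaining pause time in seconds
        self.dest = None    # current destination, None while pausing
        self.speed = 0.0

    def step(self, dt=1.0):
        """Advance by dt seconds and return the displacement (dx, dy)."""
        if self.wait > 0:
            self.wait -= dt
            return 0.0, 0.0
        if self.dest is None:
            self.dest = (self.rng.uniform(0.0, AREA), self.rng.uniform(0.0, AREA))
            self.speed = self.rng.uniform(0.0, self.v_max)
        dx, dy = self.dest[0] - self.x, self.dest[1] - self.y
        dist = math.hypot(dx, dy)
        travel = self.speed * dt
        if travel >= dist:
            # Arrive at the destination and start the next pause.
            self.dest, self.wait = None, self.pause
        else:
            dx, dy = dx / dist * travel, dy / dist * travel
        self.x, self.y = self.x + dx, self.y + dy
        return dx, dy


def rpgm_step(node_xy, ref, radius, v_max, rng):
    """RPGM: node displacement = reference point's RWP move + node's own RW move."""
    rx, ry = ref.step()
    wx, wy = rw_step(v_max, rng)
    x, y = node_xy[0] + rx + wx, node_xy[1] + ry + wy
    # Assumption: drop the RW component if it would leave the group radius.
    if math.hypot(x - ref.x, y - ref.y) > radius:
        x, y = node_xy[0] + rx, node_xy[1] + ry
    return clamp(x), clamp(y)


# Usage: one group member following its reference point for 100 seconds.
rng = random.Random(7)
ref = RandomWaypoint(1250.0, 1250.0, v_max=5.0, pause=5, rng=random.Random(8))
pos = (1250.0, 1250.0)
for _ in range(100):
    pos = rpgm_step(pos, ref, radius=300.0, v_max=5.0, rng=rng)
```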

4.2. Simulation Settings

There are 𝑚 mobile nodes (𝑁 = {𝑁1, ..., 𝑁𝑚}) in a 2500 m × 2500 m square area. All nodes move according to the mobility model. For the MM model, we use a grid road map with six vertical and six horizontal streets, i.e., 25 blocks of the same size (500 m × 500 m). For the RPGM model, we assume that there are 25 reference points 𝑟𝑝1, ..., 𝑟𝑝25, and node 𝑁𝑗 (𝑗 = 1, ..., 𝑚) sets its reference point to 𝑟𝑝⌈𝑗/(𝑚/25)⌉, so that each reference point leads a group of 𝑚/25 nodes.
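The geometry above can be checked with a few lines of arithmetic. The sketch assumes the six streets in each direction are evenly spaced and include the two boundary streets, and that node 𝑁𝑗 joins reference point ⌈𝑗/(𝑚/25)⌉, i.e., consecutive runs of 𝑚/25 = 12 nodes per group; both are assumptions, and `reference_point` is a hypothetical name.

```python
import math

AREA = 2500      # side of the square area in meters
STREETS = 6      # six vertical and six horizontal streets
M, GROUPS = 300, 25

# Evenly spaced streets, including the two on the area boundary.
spacing = AREA // (STREETS - 1)
street_coords = [s * spacing for s in range(STREETS)]
blocks = (STREETS - 1) ** 2


def reference_point(j, m=M, groups=GROUPS):
    """Hypothetical assignment: node N_j (1-based) follows rp_ceil(j / (m / groups))."""
    return math.ceil(j / (m / groups))


print(street_coords)     # [0, 500, 1000, 1500, 2000, 2500]
print(blocks, spacing)   # 25 500
print([reference_point(j) for j in (1, 12, 13, 300)])  # [1, 1, 2, 25]
```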

At the beginning of the simulations, the initial position of each mobile node is chosen randomly within the space where the node can exist; for example, in the MM model, nodes can exist only on a road. We set the simulation time 𝑇 to 500,000 seconds. Each node initiates a query request every 5 seconds, so each node issues almost 100,000 requests during the entire simulation period. We discard the first 1000 seconds to remove the impact of the initial state. Table 1 summarizes the parameters and their values used in the experiments. Most parameters are fixed at constant values, while others vary within the range given in parentheses.

Table 2: Access Delay (hops) with Uniform Data Access Pattern

| Algorithm | RW | RWP | MM | RPGM |
|---|---|---|---|---|
| Greedy | 0.8410 | 0.8511 | 1.4296 | 1.4346 |
| Pairing | 0.9047 | 0.8619 | 1.3238 | 1.3347 |
| Neighboring | 0.9863 | 0.8664 | 1.0423 | 1.3153 |
| Grouping | 1.0093 | 0.8689 | 1.237 | 1.3559 |

Table 3: Access Delay with Zipf Data Access Pattern (𝜃 = 0.8)

4.3. Results

4.3.1. Average Delay and Average Data Availability

In this subsection, we study the average delay and average data availability of the four data replication algorithms under the four mobility models, with a uniform data access pattern and with a more skewed, Zipf-distributed data access pattern.
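For reference, a common form of Zipf-like access assigns the 𝑘th most popular item a probability proportional to 1/𝑘^𝜃. The sketch assumes this form and a single popularity ranking shared by all nodes, neither of which the text states explicitly; 𝜃 = 0 reduces to the uniform pattern. Function names are hypothetical.

```python
import random


def zipf_probabilities(n, theta):
    """Zipf-like access: P(d_k) proportional to 1 / k**theta for k = 1..n.

    theta = 0 gives the uniform pattern; larger theta is more skewed.
    """
    weights = [1.0 / k ** theta for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_requests(n, theta, count, seed=0):
    """Draw `count` data ids (1-based) according to the Zipf probabilities."""
    rng = random.Random(seed)
    probs = zipf_probabilities(n, theta)
    return rng.choices(range(1, n + 1), weights=probs, k=count)


p = zipf_probabilities(200, 0.8)   # n = 200 and theta = 0.8, as in Table 1
print(round(p[0] / p[1], 3))       # 1.741 (= 2 ** 0.8)
```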

Table 2 shows the average query delay with a uniform data access pattern. Under the RW mobility model, the Greedy and Pairing algorithms both achieve shorter access delays than the other two. The short access delay of the Greedy algorithm is due to its low data availability (as shown in Table 4), since missed queries are not counted toward the delay. The Pairing algorithm, however, shares data with one-hop pairing nodes, so that both a node and its pairing node can serve its requests. Given the relatively reliable connectivity between pairing nodes under RW mobility, the Pairing algorithm achieves higher data availability than Greedy and lower query delay than Neighboring and Grouping. Similar results can be observed in Table 3.

Table 4: Data Availability with Uniform Data Access Pattern

Table 5: Data Availability with Zipf Data Access Pattern (𝜃 = 0.8)

In RWP, there is no reliable connectivity between any node pair, and hence the Pairing algorithm may not help data sharing. As with the Greedy algorithm, most requests in Pairing are served locally. Therefore, the average query delay of the Pairing algorithm becomes even shorter in this case, at the cost of low data availability (see Table 4). Similar results hold for the Neighboring and Grouping algorithms. However, the average delay of the Greedy algorithm increases in RWP. This is related to the network formation under RWP, where nodes tend to gather in the central area. Large partitions may thus form in the center, which increases the chance that a query is served by a nearby, possibly multi-hop, node when data access is uniformly distributed.

Similarly, in the MM and RPGM mobility models, the road layout constraint and the restricted mobility pattern give the network a relatively higher node density from each node's perspective. As expected, larger partitions can form in MM and RPGM than in RW. Therefore, the query delay becomes larger in the MM and RPGM models.

Table 3 shows the query delay with skewed data access following a Zipf (𝜃 = 0.8) distribution. Comparing Table 2 and Table 3, we can see that as data access becomes more skewed, the average data access delay decreases dramatically. This is because, with more skewed access, it becomes easier for each node to buffer and replicate the data it is interested in, either in its own memory or at nearby nodes, so that more query requests can be served locally or by nearby neighbors. We also note that two factors may affect performance in RWP. First, due to random mobility, there are fewer reliable connections in RWP, so the cooperation-based algorithms tend to behave like the Greedy algorithm, resulting in low access delay. Second, the cooperative algorithms may still replicate and share data with other nodes when they occasionally find reliable connections. This increases the access delay when the two nodes later move apart but remain reachable over multiple hops. When the data access pattern is uniform, the first factor carries more weight. When data access becomes skewed, the second factor carries more weight, because some data of interest with high access frequency may not be replicated locally. Therefore, the Pairing and Neighboring algorithms have a larger access delay in RWP than in RW when the access pattern follows a Zipf distribution.

Table 4 and Table 5 show data availability with uniform and Zipf data access patterns, respectively. Similar to the results for data access delay, data availability is much higher with Zipf data access than with uniform data access. Moreover, MM and RPGM always have better data availability than RW and RWP. This advantage comes from the higher relative node density and the more similar node mobility in MM and RPGM: more nodes can be accessed, so more data can be used to serve query requests.

Tables 2, 3, 4, and 5 also demonstrate that cooperation improves performance in MM and RPGM, helps less in RW, and does not help in RWP. This is because nearby nodes have more similar mobility in MM and RPGM than in RW and RWP. If nearby nodes move together for a long time, cooperative data replication algorithms such as Pairing, Neighboring, and Grouping gain more advantage.

In summary, in RWP, where nodes move randomly, the Greedy algorithm is the best solution. In the RW model, where nearby nodes have more reliable connections, the Pairing algorithm works best. In MM, where mobile nodes tend to have more reliable neighbors and higher node density, the Neighboring algorithm has the advantage. In the RPGM model, where nodes follow strict group mobility, Grouping data replication outperforms the others.


[Figure 1: bar charts, panels (a) RW, (b) RWP, (c) MM, (d) RPGM; x-axis: Greedy, Pairing, Neighboring, Grouping; bars: local, 1 hop, 2 hops, 3 hops, 4 hops, 5 hops, 6 hops, 6+ hops; y-axis: 0 to 0.25.]

Figure 1: Distribution of the data access delay (with uniform data access pattern)

4.3.2. Distribution of Access Delay

Figure 1 and Figure 2 show the distribution of access delay under uniform and Zipf access patterns. In these figures, the y-axis indicates the request success ratio, i.e., the fraction of requests served at a given delay. For each algorithm, the bars break down the query delay in hops, from 0 hops (local) to 6+ hops.

As shown in Figure 1(a), for the RW model, since the Pairing, Neighboring, and Grouping algorithms share data among nearby nodes, some requests that are not satisfied locally can be served by one-hop or two-hop neighbors. Because data access is uniformly distributed, the improvement from cooperation is small. When data access becomes more skewed (as shown in Figure 2(a)), cooperation plays a larger role and more requests are served by nearby nodes. For example, compared to the Greedy algorithm, the Neighboring algorithm serves 5% fewer requests locally but serves 15% more requests from one-hop neighbors. Similarly, the Grouping algorithm shares data over a larger area. It has the fewest requests served locally but the most requests satisfied by two-hop or three-hop neighbors.

[Figure 2: bar charts, panels (a) RW, (b) RWP, (c) MM, (d) RPGM; x-axis: Greedy, Pairing, Neighboring, Grouping; bars by access delay in hops; y-axis: 0 to 0.4.]

Figure 2: Distribution of the data access delay (with Zipf (𝜃 = 0.8) data access pattern)

In Figures 1(b) and 2(b), since nodes move randomly in RWP, cooperative algorithms such as Pairing, Neighboring, and Grouping gain nothing from cooperation. The Greedy algorithm, in which each node replicates the data it is most interested in, is therefore better suited to RWP.

In MM, due to the road layout constraint, nodes can only move on and along the roads. Therefore, each node has more neighbors in MM than in RW and RWP, and the average network partition can be larger than in RW and RWP. As a result, as shown in Figures 1(c) and 2(c), each replication algorithm has more requests satisfied by multi-hop neighbors.

In Figures 1(d) and 2(d), the results are similar to those in MM, owing to the relatively reliable connectivity and higher density in RPGM. These two figures also clearly show that in RPGM, the Grouping replication algorithm has more requests served by neighboring nodes that are multiple hops away.

[Figure 3: surface plots, panels (a) RW, (b) RWP, (c) MM, (d) RPGM; X (m) and Y (m) from 0 to 2500; vertical axis: access delay, 0 to 1.2.]

Figure 3: Geographical distribution of the access delay (Greedy Algorithm)

4.3.3. Geographical Distribution of Access Delay

Figures 3 to 6 show the geographical distribution of access delay for the different data replication algorithms. Due to space limits, we present only the results with the Zipf access pattern.

As shown in Figures 3(a), 4(a), 5(a), and 6(a), geographical location has little effect on access delay under the RW mobility model. This is because nodes are initially distributed at random and choose their movement directions at random in RW. The node density is therefore relatively even, and there is no large variation in data access delay across locations. However, we can still see that the Greedy and Pairing algorithms have lower access delay than the other two, which is consistent with our earlier results on average access delay.

[Figure 4: surface plots, panels (a) RW, (b) RWP, (c) MM, (d) RPGM; X (m) and Y (m) from 0 to 2500; vertical axis: access delay, 0 to 1.2.]

Figure 4: Geographical distribution of the access delay (Pairing Algorithm)

From Figures 3(b), 4(b), 5(b), and 6(b), we can see some interesting results under RWP. When a node is at the boundary of the simulation area, its access delay is short. As it moves toward the center area, its access delay first increases and then begins to decrease. In RWP, during each movement cycle, each node randomly chooses a destination and moves there. Therefore, nodes are more likely to appear in the center area, which thus has higher node density than the boundary area. As a result, nodes are easily isolated at the boundary but form large partitions at the center of the simulation area. In the extreme case where a node is isolated, its access delay is the lowest, since it can only access locally replicated data. In the center area, the node density is high, which helps nodes find the data they are interested in at nearby nodes, resulting in a low access delay.

[Figure 5: surface plots, panels (a) RW, (b) RWP, (c) MM, (d) RPGM; X (m) and Y (m) from 0 to 2500; vertical axis: access delay, 0 to 1.2.]

Figure 5: Geographical distribution of the access delay (Neighboring Algorithm)

Under MM, shown in Figures 3(c), 4(c), 5(c), and 6(c), mobile nodes may only move vertically or horizontally along the road layout, so access delay values appear only at positions where there is a road. This yields a relatively higher node density and keeps nodes from being isolated, but it results in larger access delay than in RW and RWP.

Finally, Figures 3(d), 4(d), 5(d), and 6(d) compare the access delay of different data replication algorithms under the RPGM mobility model. Similar to RWP, RPGM has a lower access delay at the boundary and in the center, but a larger delay in between. This is because the reference point of each group moves according to the RWP model, so the mobility pattern of each group as a whole follows RWP. Because of the group mobility characteristic of RPGM, the connectivity among nodes in the same group is relatively reliable. This helps nodes form larger partitions, so more nodes, including distant ones, can be reached over multiple hops. Therefore, the access delay is larger under RPGM than under RWP.

Figure 6: Geographical distribution of the access delay (Grouping Algorithm). Panels: (a) RW, (b) RWP, (c) MM, (d) RPGM; each panel plots Access Delay over X (m) and Y (m).
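The reliable intra-group connectivity can be illustrated with a simplified RPGM sketch. It assumes, for illustration only, that the group reference point follows RWP and that each member is placed at a random offset drawn uniformly from a disk of radius `radius` around it; real RPGM implementations add per-member random motion, and all names and values here are hypothetical.

```python
import math
import random


def rpgm_trace(group_size=5, steps=500, speed=10.0, radius=100.0, side=2500.0, seed=3):
    """Simplified RPGM: an RWP reference point plus members scattered around it."""
    rng = random.Random(seed)
    rx, ry = rng.uniform(0, side), rng.uniform(0, side)
    wx, wy = rng.uniform(0, side), rng.uniform(0, side)
    trace = []
    for _ in range(steps):
        # Move the reference point toward its waypoint (RWP, no pause time).
        dist = math.hypot(wx - rx, wy - ry)
        if dist <= speed:
            rx, ry = wx, wy
            wx, wy = rng.uniform(0, side), rng.uniform(0, side)
        else:
            rx += speed * (wx - rx) / dist
            ry += speed * (wy - ry) / dist
        members = []
        for _ in range(group_size):
            # Uniform point in a disk of the given radius around the reference point.
            r = radius * math.sqrt(rng.random())
            a = rng.uniform(0, 2 * math.pi)
            # Clamping to the area never increases the distance to the reference point.
            mx = min(max(rx + r * math.cos(a), 0.0), side)
            my = min(max(ry + r * math.sin(a), 0.0), side)
            members.append((mx, my))
        trace.append(((rx, ry), members))
    return trace


trace = rpgm_trace()
```

Under this assumption any two members are at most 2 x `radius` apart at every step, so if that distance is within the radio range, group members stay directly connected no matter where the group travels.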

4.3.4. Geographical Distribution of Data Availability

Similar to the geographical distribution of data access delay, Figures 7(a), 8(a), 9(a), and 10(a) show that under RW the data availability is independent of the location where the query is initiated. However, different data replication algorithms achieve different data availability. In the Greedy algorithm, there is no data sharing, since each node replicates data only according to its own interest. Therefore, closely connected nodes may hold duplicate data, which reduces the overall data availability. The Neighboring algorithm and the Grouping algorithm aim to share data with nearby nodes, which removes some data redundancy and improves data availability. However, when a partition occurs, data saved on neighbors may not be available. The Paring algorithm, in contrast, replicates data on the most reliable neighbor and achieves the best balance between node cooperation and the risk of partition. Therefore, under RW the Paring algorithm has the highest data availability.

Figure 7: Geographical distribution of the data availability (Greedy Algorithm). Panels: (a) RW, (b) RWP, (c) MM, (d) RPGM; each panel plots Availability over X (m) and Y (m).
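The duplicate-data effect of Greedy replication can be made concrete with a toy allocation. This sketch assumes nodes in one connected partition share identical interests and each can store `capacity` items; the cooperative variant is a hypothetical round-robin deduplication, not the exact Neighboring, Grouping, or Paring algorithm.

```python
def greedy_allocation(preferences, capacity):
    """Greedy: each node stores its own `capacity` most-preferred items, with no sharing."""
    return {node: set(prefs[:capacity]) for node, prefs in preferences.items()}


def cooperative_allocation(preferences, capacity):
    """Hypothetical cooperative scheme: nodes in one partition take turns storing
    their most-preferred item that no node in the partition holds yet."""
    stored = {node: set() for node in preferences}
    held = set()
    for _ in range(capacity):
        for node, prefs in preferences.items():
            for item in prefs:
                if item not in held:
                    stored[node].add(item)
                    held.add(item)
                    break
    return stored


def distinct_items(allocation):
    """Items reachable from anywhere in a connected partition."""
    return set().union(*allocation.values())


prefs = {n: list(range(10)) for n in ("A", "B", "C")}  # identical interests
print(len(distinct_items(greedy_allocation(prefs, 2))))       # 2
print(len(distinct_items(cooperative_allocation(prefs, 2))))  # 6
```

With the same total memory, deduplication triples the number of distinct items the partition can serve; the flip side, as noted above, is that items placed on a neighbor are lost to a node once that neighbor is partitioned away.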

As shown in Figures 7(b), 8(b), 9(b), and 10(b), under RWP nodes are more likely to stay around the center area, and thus the data availability is higher there. We can also see that the Greedy algorithm has the best data availability under RWP, since it does not rely on any cooperation.

In MM, shown in Figures 7(c), 8(c), 9(c), and 10(c), there are more nodes around the intersections than in other areas, and hence the data availability at the intersections is higher. We also observe an interesting effect in the cooperative data replication algorithms. Take the Paring algorithm as an example. Figure 11 shows the data availability along the third horizontal road, i.e., as the position (x, y) changes from (0, 1500) to (2500, 1500). Both the intersections and the middle segments of the road have higher data availability, whereas the availability is low in the areas close to, but not at, the intersections. This comes from the characteristics of the MM mobility model. Due to the road layout constraint, mobile nodes may split at an intersection when they choose different movement directions. Since cooperation-based data replication algorithms rely on data sharing among nearby nodes, some data may become unavailable when a split happens, which lowers the data availability in these areas. However, once nodes detect the split, they reorganize their collaborating nodes and share data with them. Therefore, after this reorganization, i.e., in the middle segments of the road, they achieve a relatively higher data availability.

Figure 8: Geographical distribution of the data availability (Paring Algorithm). Panels: (a) RW, (b) RWP, (c) MM, (d) RPGM; each panel plots Availability over X (m) and Y (m).
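A per-position curve such as Figure 11 amounts to binning query outcomes by where the query was issued. The sketch below assumes a hypothetical log of `(x, y, succeeded)` tuples and counts only queries issued within `tol` meters of the road at y = 1500; the bin width and tolerance are illustrative choices, not values from this paper.

```python
from collections import defaultdict


def availability_along_road(queries, road_y=1500.0, tol=10.0, bin_width=100.0):
    """Per-bin query success ratio along a horizontal road.

    `queries` is a hypothetical log of (x, y, succeeded) tuples.
    Keys of the result are the x coordinates where each bin starts.
    """
    hits, total = defaultdict(int), defaultdict(int)
    for x, y, ok in queries:
        if abs(y - road_y) <= tol:
            b = int(x // bin_width)
            total[b] += 1
            hits[b] += bool(ok)
    return {b * bin_width: hits[b] / total[b] for b in sorted(total)}


log = [(50, 1500, True), (60, 1500, False), (150, 1502, True), (150, 1600, False)]
print(availability_along_road(log))  # {0.0: 0.5, 100.0: 1.0}
```

The last query in the example is ignored because it was issued 100 m away from the road.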

Figure 9: Geographical distribution of the data availability (Neighboring Algorithm). Panels: (a) RW, (b) RWP, (c) MM, (d) RPGM; each panel plots Availability over X (m) and Y (m).

Figures 7(d), 8(d), 9(d), and 10(d) present the results for RPGM. Because the mobile nodes in RWP and the reference points in RPGM follow similar mobility patterns, the shape of the data availability plot under RPGM is similar to that under RWP, i.e., higher data availability near the center and lower data availability at the boundary. Since nodes in the same group have quite similar mobility patterns and more reliable connectivity, RPGM achieves much higher data availability than RWP. Comparing the data replication algorithms, the Grouping algorithm has the highest data availability. This advantage comes from data sharing within each mobile group, which lets nodes use their memory more efficiently.

4.4. Discussions

In this section, we summarize the experimental results and identify the most suitable data replication algorithms under various mobility models.

Figure 10: Geographical distribution of the data availability (Grouping Algorithm). Panels: (a) RW, (b) RWP, (c) MM, (d) RPGM; each panel plots Availability over X (m) and Y (m).

RW Model: Under RW, nodes have low mobility, similar to vibrating around the same position. Hence, the connectivity between closely connected nodes is relatively reliable. Also, RW tends to form small network partitions but rarely forms large ones. Due to the low mobility, even when there is a network partition, each partition is relatively stable. Thus, when designing a data replication algorithm, it is more appropriate for nodes to cooperatively replicate data with their closely connected neighbors, and replication should not rely on data sharing with a large number of nodes. This also explains why the Paring algorithm is the most efficient algorithm in the simulation.
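Pairing with a closely connected neighbor needs some estimate of link reliability, which this section does not define. The sketch below assumes one plausible estimate, the fraction of the last `window` neighbor scans in which a neighbor was in radio range, and picks the highest-scoring neighbor as the partner; the class and method names are hypothetical and this is not presented as the Paring algorithm's actual metric.

```python
from collections import deque


class LinkHistory:
    """Hypothetical link-reliability tracker: the fraction of the last `window`
    neighbor scans in which a given neighbor was within radio range."""

    def __init__(self, window=20):
        # Each entry is the set of neighbor ids seen in one scan; old scans fall off.
        self.samples = deque(maxlen=window)

    def observe(self, neighbors_in_range):
        self.samples.append(set(neighbors_in_range))

    def reliability(self, neighbor):
        if not self.samples:
            return 0.0
        return sum(neighbor in s for s in self.samples) / len(self.samples)

    def best_partner(self):
        """Neighbor with the highest reliability (ties broken by id), or None."""
        candidates = set().union(*self.samples)
        if not candidates:
            return None
        return max(sorted(candidates), key=self.reliability)


h = LinkHistory(window=4)
for scan in ({"b", "c"}, {"b"}, {"b", "d"}, {"c", "b"}):
    h.observe(scan)
print(h.best_partner())  # b
```

The sliding window lets a node switch partners when an old neighbor drifts away, which matches the low-mobility, slowly changing neighborhoods of RW.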

RWP Model: In RWP, nodes move randomly and do not maintain reliable connections with each other, and hence the partition rate is high. Thus, sharing data with others may not pay off, and the non-cooperative Greedy algorithm may be the best choice. On the other hand, since nodes tend to gather at the center of the network, they form a large partition around the center, where the availability is high. Thus, when designing a data replication algorithm, it is better to push the most important data to, and replicate it on, the nodes around the center. Further, mobile nodes should forward their requests toward the center to improve the query success ratio.

Figure 11: Geographical distribution of the Paring algorithm in MM (data availability along the third horizontal road, for x from 0 to 2500; availability axis from 0.68 to 0.75; four intersections are marked along the road).
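Forwarding requests toward the center could be done greedily: each hop hands the request to the neighbor closest to the center of the area. This is a minimal sketch assuming nodes know their own and their neighbors' positions and a 2500 m square area; the function name and inputs are hypothetical, and greedy geographic forwarding can stall when no neighbor is closer to the target than the current node.

```python
import math


def next_hop_toward_center(my_pos, neighbors, side=2500.0):
    """Pick the neighbor closest to the area center, if it is closer than we are.

    `neighbors` maps a neighbor id to its (x, y) position. Returns None when
    no neighbor makes progress toward the center.
    """
    cx = cy = side / 2

    def dist_to_center(p):
        return math.hypot(p[0] - cx, p[1] - cy)

    best, best_d = None, dist_to_center(my_pos)
    for nid, pos in sorted(neighbors.items()):
        d = dist_to_center(pos)
        if d < best_d:
            best, best_d = nid, d
    return best


print(next_hop_toward_center((100, 100), {"a": (400, 400), "b": (200, 150), "c": (50, 50)}))  # a
```

A node that receives `None` is already the local closest point to the center and can answer from its own replicas or flood locally.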

MM Model: The MM model has several interesting features due to its restricted mobility. First, in MM the connection between neighboring nodes lasts longer than in RWP. The connectivity is relatively reliable because neighboring nodes on the same street heading in the same direction often move together. Therefore, when designing a data replication algorithm under MM, it is effective to share data among neighbors moving in the same direction. Second, the node density is higher at intersections than in other areas. Similar to the RWP model, it is more suitable to buffer some important data in these areas to better serve future requests. Finally, partitions frequently occur just after intersections, resulting in low data availability in these areas. To maintain high data availability and low query delay, new schemes should be designed to predict partitions at intersections and prefetch the important data before the partition occurs.
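Choosing neighbors that move in the same direction can be as simple as comparing velocity vectors. This sketch assumes each node can learn its neighbors' velocities (for example from periodic beacons) and treats two headings as matching when the cosine of the angle between them is at least 0.9, roughly 26 degrees; the threshold and names are hypothetical.

```python
import math


def same_direction_neighbors(my_velocity, neighbor_velocities, cos_threshold=0.9):
    """Ids of neighbors whose heading is close to ours.

    `neighbor_velocities` maps a neighbor id to its (vx, vy). Stationary nodes
    are skipped because they have no heading.
    """
    mvx, mvy = my_velocity
    my_speed = math.hypot(mvx, mvy)
    if my_speed == 0:
        return []
    result = []
    for nid, (vx, vy) in sorted(neighbor_velocities.items()):
        speed = math.hypot(vx, vy)
        if speed == 0:
            continue
        cos_angle = (mvx * vx + mvy * vy) / (my_speed * speed)
        if cos_angle >= cos_threshold:
            result.append(nid)
    return result


vel = {"a": (2, 0), "b": (0, 1), "c": (-1, 0), "d": (0, 0), "e": (1, 0.3)}
print(same_direction_neighbors((1, 0), vel))  # ['a', 'e']
```

Since MM headings are axis-aligned, a loose threshold still separates same-direction neighbors from those on a crossing road or the opposite lane.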

RPGM Model: The RPGM model provides much higher data availability but longer query delay than the other mobility models. Due to group mobility, RPGM provides higher connectivity among nodes in the same group and the most reliable group connections. As a result, cooperation-based data replication algorithms achieve the best data availability by cooperatively sharing data within each group. However, the downside is that the query delay is relatively longer than under the other mobility models because of node cooperation: by contributing more memory to replicate data for other group members, mobile nodes have to access some of the data they are interested in from other nodes over multiple hops. In summary, under RPGM it is effective to share data among nodes in the same group. It is important to have a good group detection technique to detect nodes moving in the same group and then allocate data effectively within the group.
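One simple form of group detection is to link two nodes that stayed within radio range of each other throughout a recent observation window and to report the connected components of those links as groups. The sketch below assumes every node has a position trace sampled at the same times; the rule and names are hypothetical and not a technique evaluated in this paper.

```python
import math


def detect_groups(traces, radio_range):
    """Group nodes that stayed within `radio_range` of each other at every sample.

    `traces` maps a node id to a list of (x, y) positions taken at the same
    times. Groups are the connected components of the "always in range" relation.
    """
    nodes = sorted(traces)
    adj = {n: set() for n in nodes}
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if all(math.dist(p, q) <= radio_range for p, q in zip(traces[a], traces[b])):
                adj[a].add(b)
                adj[b].add(a)
    groups, seen = [], set()
    for n in nodes:
        if n in seen:
            continue
        # Depth-first search collects one connected component.
        stack, comp = [n], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        groups.append(sorted(comp))
    return groups


traces = {
    "a": [(0, 0), (10, 0), (20, 0)],
    "b": [(5, 0), (15, 0), (25, 0)],
    "c": [(100, 0), (50, 0), (0, 0)],
}
print(detect_groups(traces, radio_range=20))  # [['a', 'b'], ['c']]
```

Node c passes close to a at the last sample but is not grouped with it, because group membership requires sustained proximity rather than a single encounter.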

5. Conclusion

In mobile ad hoc networks, nodes move freely and network partition occurs frequently. To mitigate this problem, data replication is commonly used to increase data availability and reduce data access delay. However, most previous work assumed a particular mobility model and could not fully study the effects of mobility on data replication. In this paper, we quantify the effects of mobility on different data replication algorithms from various perspectives. The study uses several metrics that go beyond the average access delay and data availability by also including the geographical distribution of these values. Through extensive experiments, we study the effects of four typical mobility models on data replication and identify the most suitable data replication algorithms under various mobility models.

We believe that the experimental results and the knowledge obtained from them are useful for researchers designing algorithms for data sharing and replication under these typical mobility models. To the best of our knowledge, this is the first work that explores and provides an in-depth explanation of the relationship between node mobility and data replication algorithms.

References

[1] D. B. Johnson, D. A. Maltz, Dynamic source routing in ad hoc wireless networks, Mobile Computing, Kluwer (1996) 153–181.

[2] T. Hara, S. K. Madria, Data replication for improving data accessibility in ad hoc networks, IEEE Transactions on Mobile Computing 5 (11) (2006) 1515–1532.

[3] K. Wang, B. Li, Efficient and guaranteed service coverage in partitionable mobile ad-hoc networks, IEEE INFOCOM.


[4] J. Luo, J. Hubaux, P. Eugster, PAN: Providing reliable storage in mobile ad hoc networks with probabilistic quorum systems, ACM MobiHoc.

[5] H. Yu, A. Vahdat, Minimal replication cost for availability, ACM Symposium on Principles of Distributed Computing (PODC).

[6] H. Yu, A. Vahdat, The costs and limits of availability for replicated services, ACM Transactions on Computer Systems 24 (2006) 70–113.

[7] L. Gao, M. Dahlin, A. Nayate, J. Zheng, A. Iyengar, Consistency and Replication: Application specific data replication for edge services, International Conference on World Wide Web.

[8] J. Zhao, G. Cao, VADD: Vehicle-assisted data delivery in vehicular ad hoc networks, IEEE Transactions on Vehicular Technology 57 (3) (2008) 1910–1922 (a preliminary version appeared in IEEE INFOCOM'06).

[9] J. Hahner, D. Dudkowski, P. Marron, K. Rothermel, Quantifying network partitioning in mobile ad hoc networks, International Conference on Mobile Data Management (2007) 174–181.

[10] F. Bai, N. Sadagopan, A. Helmy, IMPORTANT: A framework to systematically analyze the impact of mobility on performance of routing protocols for adhoc networks, IEEE INFOCOM.

[11] J. Huang, M. Chen, On the effect of group mobility to data replication in ad hoc networks, IEEE Transactions on Mobile Computing 5 (2006) 492–507.

[12] T. Hara, Quantifying impact of mobility on data availability in mobile ad hoc networks, IEEE Transactions on Mobile Computing 9 (2) (2010) 241–258.

[13] T. Hara, Replica allocation in ad hoc networks with periodic data update, International Conference on Mobile Data Management.

[14] T. Hara, S. Madria, Consistency management strategies for data replication in mobile ad hoc networks, IEEE Transactions on Mobile Computing 8 (7) (2009) 950–967.


[15] J. Cao, Y. Zhang, G. Cao, L. Xie, Data consistency for cooperative caching in mobile environments, IEEE Computer 40 (4) (2007) 60–66.

[16] L. Yin, G. Cao, Balancing the tradeoffs between data accessibility and query delay in ad hoc networks, IEEE International Symposium on Reliable Distributed Systems (2004) 289–298.

[17] K. Pearson, The problem of the random walk, Nature 72 (1867) (1905) 342.

[18] J. Broch, D. Maltz, D. Johnson, Y. Hu, J. Jetcheva, A performance comparison of multi-hop wireless ad hoc network routing protocols, ACM MobiCom (1998) 85–97.

[19] X. Hong, M. Gerla, G. Pei, C. Chiang, A group mobility model for ad hoc wireless networks, ACM International Workshop on Modeling, Analysis and Simulation of Wireless and Mobile Systems (1999) 53–60.
