Cost Based Data Dissemination in Satellite Networks

Bo Xu and Ouri Wolfson
Department of Electrical Engineering and Computer Science
University of Illinois at Chicago
{bxu,wolfson}@eecs.uic.edu

Sam Chamberlain
Army Research Laboratory
[email protected]

This research was supported in part by Army Research Labs grant DAAL01-96-2-0003 and NATO grant CRG-960648.

Abstract

We consider the problem of data dissemination in a satellite network. In contrast to previously studied models, broadcasting is among peers rather than client-server. We introduce a cost model for data dissemination in peer-to-peer satellite networks. The model quantifies the tradeoff between the inconsistency of the data and its transmission cost; the transmission cost may be given in terms of dollars, energy, or bandwidth. Using the model, we first determine the parameters for which eager (i.e. consistent) replication has a lower cost than lazy (i.e. inconsistent) replication. We then introduce a lazy broadcast policy and compare it with several naive or traditional approaches to solving the problem.

1 Introduction

A mobile computing problem that has generated a significant amount of interest in the database community is data broadcasting (see for example [39]). The problem is how to organize the pages in a broadcast from a server to a large client population in the dissemination of public information (e.g. electronic news services, stock-price information, etc.). A strongly related problem is how to replicate (or cache) the broadcast data in the Mobile Units that receive the broadcast.

In this paper we study the problems of broadcasting and replication in a peer-to-peer rather than client-server architecture. More precisely, we study the problem of dissemination, i.e. full replication at all the nodes in the system.
This architecture is motivated by new types of emerging wireless broadcast networks such as Mobile Ad-hoc Networks (see [?]), sensor and "smart dust" networks (see [26, 27]), and satellite networks. These networks enable novel applications in which the nodes of a network collaborate to assemble a complete database. For instance, in the case of sensors that are parachuted or sprayed from an airplane, the database renders a global picture of an unknown terrain from local images collected by individual sensors. Or, the database consists of the current location of each member in a military unit (in a MANET case), or another meaningful
database constructed from a set of widely distributed fragments.
We model such applications using a "master" replication environment (see [22]), in which each node $i$ "owns" the master copy of a data item $d_i$, i.e. it generates all the updates to $d_i$. For example, $d_i$ may be the latest in a sequence of images taken periodically by node $i$ of its local surroundings; each new image updates $d_i$. Or, $d_i$ may be the location of the node, which is moving; $d_i$ is updated when the Global Positioning System (GPS) on board node $i$ indicates a current location that deviates from $d_i$ by more than a prespecified threshold. The database of interest is $D = \{d_1, \ldots, d_n\}$, where $n$ is the number of nodes and also the number of items in the database.¹
It is required that $D$ be accessible from each node in the network;² thus each node stores a (possibly inconsistent³) copy of $D$. Our paper deals with various policies for broadcasting updates of the data items. In each broadcast a data item is associated with its version number, and a node that receives a broadcast data item updates its local database if and only if the local version is older than the newly arrived version. In the broadcast policies there is a tradeoff between data consistency and communication cost. In satellite networks the communication cost is in terms of the actual dollars the customer is charged by the network provider; in sensor networks, due to the small size of the battery, the communication cost is in terms of energy consumption for message transmission; and in MANETs the critical cost component is bandwidth (see [?]). Bandwidth for (secure) communication is an important and scarce resource, particularly in military applications (see [35, 36]).

¹ In case $d_i$ is the location of $i$, the database $D$ is of interest in what are called Moving Objects Database (MOD) applications (see [28, 30, 16, 17, 34]). If $d_i$ is the location of object $i$ in a battlefield situation, then a typical query may be: retrieve the friendly helicopters that are in a given region. Other MOD applications involve emergency (fire, police) vehicles and local transportation systems (e.g. a city bus system).

² For example, the location of the members of a platoon should be viewable by any member at any time.

³ By inconsistency of $D$ we mean that some data items may not contain the most recent version.
Now let us discuss the broadcast policies. One obvious policy is the following: for each node $i$, when $d_i$ is updated, node $i$ broadcasts the new version of $d_i$ to the other nodes in the network. We call this the Single-item Broadcast Dissemination (SBD) policy. In the networks and applications we discuss in this paper, nodes may be disconnected, turned off, or out of battery. Thus the broadcast of $d_i$ may not be received by all the nodes in the system. A natural way to deal with this problem is to rebroadcast an update to $d_i$ until it is acknowledged by all the nodes, i.e. Reliable Broadcast Dissemination (RBD). Clearly, if the new version is not much different from the previous one and if the probability of reception is low (thus necessitating multiple broadcasts), then this increase in communication cost is not justified. An alternative option, which we adopt in SBD, is to broadcast each update once and let copies diverge. Thus the delivery of updates is unreliable, and consequently the dissemination of $d_i$ is "lazy" in the sense that the copy of $d_i$ stored at a node may be inconsistent.
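To make the communication side of RBD concrete, the sketch below computes the expected number of times a single update must be broadcast before every other node has received it. It is our own illustration, not a formula from the paper: it assumes that each receiving node $k$ hears each broadcast independently with its connection probability $p_k$ (defined in Section 2.1) and it ignores acknowledgment traffic. With $T$ the number of broadcasts, $P(T > t) = 1 - \prod_k \bigl(1 - (1 - p_k)^t\bigr)$ and $E[T] = \sum_{t \ge 0} P(T > t)$; the sum is truncated once the tail is negligible. The function names are hypothetical.

```python
from math import prod


def expected_rbd_broadcasts(probs, tol=1e-12, max_rounds=1_000_000):
    """Expected number of broadcasts until every receiver has the update.

    probs: connection probabilities p_k of the receiving nodes, each in (0, 1].
    Assumes receptions are independent across nodes and across broadcasts.
    """
    if not probs:
        return 0.0
    if any(not 0.0 < p <= 1.0 for p in probs):
        raise ValueError("each p_k must be in (0, 1]")
    expected = 0.0
    for t in range(max_rounds):
        # P(T > t): some node has missed all of the first t broadcasts.
        tail = 1.0 - prod(1.0 - (1.0 - p) ** t for p in probs)
        expected += tail
        if tail < tol:
            break
    return expected


def expected_receivers_sbd(probs):
    """Expected number of nodes that hear a single SBD broadcast."""
    return sum(probs)
```

Under this assumption SBD always pays for exactly one broadcast per update, while the RBD count grows as connection probabilities drop or as nodes are added, which is the communication side of the tradeoff discussed next.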
How can we quantify the tradeoff between the increase in consistency afforded by a reliable broadcast and its increase in communication cost? To answer this question we introduce the concept of the inconsistency cost of a data item. This concept, in turn, is quantified via the notion of the cost difference between two versions of a data item $d_i$. In other words, the inconsistency cost of using an older version rather than the latest version is the distance between the two versions. For example, if $d_i$ represents a location, then the cost difference between two versions of $d_i$ can be taken to be the distance between the two locations. If $d_i$ is an image, an existing algorithm that quantifies the difference between two images can be used (see for example [6]). If $d_i$ is the quantity-on-hand of a widget, then the difference between the two versions is the difference between the quantities. Now, in order to quantify the tradeoff between inconsistency and communication, one has to answer the question: what amount of bandwidth/energy/dollars am I willing to spend in order to reduce the inconsistency cost on a data item by one unit? Using this model we establish the cost formulas for RBD and SBD, i.e. reliable and unreliable broadcasting, and, based on them, formulas for selecting one of the two policies for a given set of system parameters.
For the cases in which unreliable broadcast, particularly SBD, is more appropriate, the consistency of the local databases can be enhanced by a policy that we call Full Broadcast Dissemination (FBD). In FBD, whenever $d_i$ is updated, $i$ broadcasts its local copy of the whole database $D$, called $D^i$. In other words, $i$ broadcasts $d_i$ as well as its local version of each one of the other data items in the database. When a node $k$ receives this broadcast, $k$ updates its version of $d_i$, and $k$ also updates its local copy of each other item $d_j$ for which the version number in $D^i$ is more recent. Thus these indirect broadcasts of $d_j$ (to $k$ via $i$) are "gossip" messages that increase the consistency of each local database. However, again, this comes at the price of an increase in communication cost, since each broadcast message is $n$ times longer.
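The FBD receive step can be sketched as follows. This is an illustrative sketch that assumes (hypothetically) a local database is a Python dict mapping an owner id to a `(version_number, value)` pair; the function name `merge_full_broadcast` is ours. A received copy replaces the local one only when its version number is strictly larger, which is the same rule applied to a single-item broadcast.

```python
from typing import Any, Dict, List, Tuple

# owner id -> (version number, value); a hypothetical representation of D^k
LocalDB = Dict[int, Tuple[int, Any]]


def merge_full_broadcast(local: LocalDB, received: LocalDB) -> List[int]:
    """Apply an FBD message (the sender's whole local database) to `local`.

    Each item is replaced only if the received version number is newer.
    Returns the sorted owner ids whose local copy was updated.
    """
    updated = []
    for owner, (version, value) in received.items():
        # Version 0 stands for "no version held yet".
        local_version = local.get(owner, (0, None))[0]
        if version > local_version:
            local[owner] = (version, value)
            updated.append(owner)
    return sorted(updated)
```

Every id returned for an owner other than the sender is an item that the receiver obtained through gossip rather than from its owner.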
The SBD and FBD policies represent, in some sense, two extreme solutions on a consistency-communication spectrum of lazy dissemination policies. SBD has minimum communication cost and minimum local database consistency, whereas FBD has maximum communication cost and maximum (under the imperfect circumstances) local database consistency.
In this paper we introduce and analyze the Adaptive Broadcast Dissemination (ABD) policy, which optimizes the tradeoff between consistency and communication using a cost-based approach. In the ABD policy, when node $i$ receives an update to $d_i$, it first determines whether the expected reduction in inconsistency justifies broadcasting a message. If so, then $i$ "pads" the broadcast message that contains $d_i$ with a set of data items (that $i$ does not own) from its local database, so as to optimize the total cost. One problem that we solve in this paper is how to determine this set, i.e. how node $i$ should select, for each broadcast message, which data items from the local database to piggyback on $d_i$. In order to do so, $i$ estimates for each $j$ and $k$ the expected benefit (in terms of inconsistency reduction) to node $k$ of including in the broadcast message its local version of $d_j$.
Let us now put this paper in the context of existing work on consistency in distributed systems. To our knowledge, our approach is new. Although gossiping has been studied extensively in distributed systems and databases (see section 7), none of the existing works uses an inconsistency-communication tradeoff cost function in order to determine what gossip messages to send. Furthermore, in the emerging resource-constrained environments (e.g. sensor networks, satellite communication, and MANETs) this tradeoff is crucial. Also, our notion of consistency is appropriate for the types of novel applications discussed in this paper, and is different from the traditional notion of consistency in distributed systems discussed in the literature (e.g. [3, 13, 7, 18]). Specifically, in contrast to the traditional approaches, our notion of consistency does not mean consistency of different copies of a data item at different nodes, and it does not mean mutual consistency of different data items at a node. In this paper a copy of a data item at a node is consistent if it has the latest version of the data item. Otherwise it is inconsistent, and the inconsistency cost is the distance between the local copy and the latest version of the data item. The inconsistency of a local database is simply the sum of the inconsistencies of all its data items. We employ gossiping to reduce inconsistency, not to ensure consistency as in the use of vector clocks ([13, 3]).
In this paper we provide a comparative analysis of dissemination policies. The analysis is probabilistic and experimental, and it achieves the following objectives. First, it gives a formula for the expected total cost of SBD and RBD, and a complete characterization of the parameters for which each policy has a lower cost than the other. Second, for ABD we prove cost optimality of the set of data items broadcast by a node $i$, given $i$'s level of knowledge of the system state. Third, the analysis compares the three unreliable policies discussed above, namely SBD, FBD, and ABD, and a fourth, traditional one called flooding (FLD)⁴ [37]. ABD consistently outperformed the other three policies, often having a total cost (which includes the cost of inconsistency and the cost of communication) several times lower than that of the other policies.

⁴ In flooding, a node $i$ broadcasts each new data item it receives, either as a result of a local update of $d_i$ or from a broadcast message.
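The flooding rule in the footnote can be sketched as a small state machine. This is a hypothetical illustration: the class `FloodingNode`, its `outbox`, and the choice to treat a version as "new" exactly when it is newer than the locally stored one are our assumptions, not details taken from [37].

```python
from typing import Dict, List, Tuple

Item = Tuple[int, int]  # (owner id, version number)


class FloodingNode:
    """Hypothetical node that follows the flooding (FLD) rule."""

    def __init__(self, node_id: int):
        self.node_id = node_id
        self.versions: Dict[int, int] = {}  # owner id -> newest version known
        self.outbox: List[Item] = []        # items this node will broadcast

    def _accept(self, owner: int, version: int) -> None:
        # A version is treated as new if it is newer than the stored one;
        # under flooding every new item is broadcast once more.
        if version > self.versions.get(owner, 0):
            self.versions[owner] = version
            self.outbox.append((owner, version))

    def local_update(self) -> None:
        """The owner generates the next version of its own data item."""
        self._accept(self.node_id, self.versions.get(self.node_id, 0) + 1)

    def receive(self, owner: int, version: int) -> None:
        """Handle a data item heard in another node's broadcast."""
        self._accept(owner, version)
```

Duplicate and stale copies are dropped, so each node forwards a given version at most once; the communication cost of flooding is nevertheless high because every node may rebroadcast every version.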
In summary, the key contributions of this paper are as follows.
• Introduction of a cost model to quantify the tradeoff between consistency and communication.
• Analysis of the performance of eager and lazy dissemination via reliable and unreliable broadcasts, respectively, obtaining cost formulas for each case and determining the data and communication parameters for which eager is superior to lazy, and vice versa.
• Development and analysis of the Adaptive Broadcast Dissemination policy, and its comparison to the other lazy dissemination policies.
The rest of the paper is organized as follows. In section 2 we introduce the operational model and the cost model. In section 3 we analyze and compare reliable and unreliable broadcasting. In section 4 we describe the ABD policy, and in section 5 we analyze it. In section 6 we compare the unreliable broadcast policies by simulation. In section 7 we discuss relevant work, and in the last section we summarize the paper. In appendix A we provide the proofs of our theorems and lemmas. In appendix B we describe further experimental results.
2 The Model
In subsection 2.1 we precisely define the overall operational model, and in subsection 2.2 we define the cost model.
2.1 Operational model
The system consists of a set of $n$ nodes that communicate by message broadcasting. Each node $i$ ($1 \le i \le n$) has a data item $d_i$ associated with it. Node $i$ is called $d_i$'s owner. This data item may contain a single numeric value, or a complex data structure such as a motion plan or an image of the local environment. Only $i$, and no other node, has the authorization to modify the state of $d_i$. A data item is updated at discrete time points. Each update creates a new version of the data item. In other words, the $j$th version of $d_i$, denoted $d_i(j)$, is generated by the $j$th update. We denote the latest version of $d_i$ by $\hat{d}_i$. Furthermore, we use $v(d_i(j))$ to represent the version number of $d_i(j)$, i.e. $v(d_i(j)) = j$. For two versions $d_i(j)$ and $d_i(l)$, we say that $d_i(j)$ is newer than $d_i(l)$ if $j > l$, and that $d_i(j)$ is older than $d_i(l)$ if $j < l$.
An owner $i$ periodically broadcasts its data item $d_i$ to the rest of the system. Each such broadcast includes the version number of $d_i$. Since nodes may be disconnected, some broadcasts may be missed by some nodes; thus each node $k$ has a version of each $d_i$ which may be older than the latest one. The local database of node $k$ at any given time is the set $D^k = \{d_1^k, d_2^k, \ldots, d_n^k\}$, where each $d_i^k$ (for $1 \le i \le n$) is a version of $d_i$. Observe that since all the updates of $d_i$ originate at $i$, $d_i^i$ is always the latest version of $d_i$. Node $k$ updates $d_i^k$ ($i \ne k$) in its local database when it receives a broadcast from $i$.
Nodes may be disconnected (e.g. shut down) and thus miss messages. Let $p_k$ be the fraction of time a node $k$ is connected. Then $p_k$ is also the probability that $k$ receives a message from any other node $i$. For example, if $k$ is connected 60% of the time (i.e. $p_k = 0.6$), then a message from $i$ is received by $k$ with probability 0.6. We call $p_k$ the connection probability of $k$.
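The operational model can be exercised with a small simulation. The sketch below is ours, not the paper's: it represents each local database $D^k$ as a Python dict from owner id to the version number node $k$ holds, lets the owner broadcast a new version exactly once (as in SBD), and lets each other node $k$ hear it with probability $p_k$. The function name `sbd_broadcast` is hypothetical, and version contents are omitted because only version numbers decide whether a copy is replaced.

```python
import random
from typing import Dict, List


def sbd_broadcast(owner: int, version: int, dbs: List[Dict[int, int]],
                  p: List[float], rng: random.Random) -> List[int]:
    """Owner broadcasts version `version` of its item once (SBD).

    dbs[k] maps owner id -> version number known at node k (the local DB D^k).
    Node k != owner hears the message with probability p[k] and stores the
    version only if it is newer than its local one.
    Returns the nodes whose local database was updated.
    """
    dbs[owner][owner] = version  # the owner always holds the latest version
    updated = []
    for k, db in enumerate(dbs):
        if k == owner:
            continue
        heard = rng.random() < p[k]  # connected with probability p_k
        if heard and version > db.get(owner, 0):
            db[owner] = version
            updated.append(k)
    return updated
```

Repeating the call many times with fresh databases, the fraction of broadcasts that node $k$ hears approaches $p_k$, which is exactly how the connection probability is used in the cost analysis.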
2.2 Cost Model
In this subsection we introduce a cost function that quantifies the tradeoff between consistency and communication. The function has two purposes: first, to enable determining the items that will be included in each broadcast of the ABD policy, and second, to enable comparing the various policies.
Inconsistency cost
Assume that the distance between any two versions of a data item can be quantified. For example, in moving objects database (MOD) applications, the distance between two data item versions may be taken to be the Euclidean distance between the two locations. If $d_i$ is an image, one of the many existing distance functions between images (e.g. the cross-correlation distance [6]) can be used.
Formally, the distance between two versions $d_i(j)$ and $d_i(l)$, denoted $\mathrm{DIST}(d_i(j), d_i(l))$, is a function whose values are nonnegative reals, and it has the property that the distance between two identical versions is 0. If the data item owned by each node consists of two or more types of logical objects, each with its own distance function, then the distance between the items should be taken to be the weighted average of the pairwise distances.
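For illustration, distance functions of the kind described above might look as follows. The version representations (a 2-D location tuple, a numeric quantity) and the weights are hypothetical; only the stated properties are taken from the text: nonnegative values, zero distance between identical versions, and a weighted average over per-component distances.

```python
import math
from typing import Callable, Sequence, Tuple


def location_distance(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Euclidean distance between two location versions (MOD example)."""
    return math.dist(a, b)


def quantity_distance(a: float, b: float) -> float:
    """Difference between two quantity-on-hand versions."""
    return abs(a - b)


def composite_distance(a: Sequence, b: Sequence,
                       parts: Sequence[Tuple[Callable, float]]) -> float:
    """Weighted average of per-component distances.

    a, b: two versions, each given as a sequence of components.
    parts: one (distance function, positive weight) pair per component.
    """
    total_weight = sum(w for _, w in parts)
    weighted = sum(w * dist(x, y) for (dist, w), x, y in zip(parts, a, b))
    return weighted / total_weight
```

The weights express how much one unit of distance in each component matters relative to the others; they are a modeling decision, not something the distance functions can determine.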
We take the DIST function to represent the cost, or the penalty, of using the older version rather than the newer one. More precisely, consider two consecutive updates on $d_i$, namely the $j$th update and the $(j+1)$st update. Assume that the $j$th update happened at time $t_j$ and the $(j+1)$st update at time $t_{j+1}$. Intuitively, at time $t_{j+1}$ each node $k$ that did not receive the $j$th version $d_i(j)$ during the interval $[t_j, t_{j+1}]$ pays a price equal to the distance between the latest version of $d_i$ that $k$ knows and $d_i(j)$. In other words, this price is the penalty that $k$ pays for using an older version during the time in which $k$ should have used $d_i(j)$. If $k$ receives $d_i(j)$ sometime during the interval $[t_j, t_{j+1}]$, then the price that $k$ pays on $d_i(j)$ is zero. Formally, assume that at time $t_{j+1}$ the latest version of $d_i$ that $k$ knows is $d_i(l)$ ($l \le j$). Then the inconsistency cost of node $k$ on $d_i(j)$ is $\mathrm{INCON}_k(d_i(j)) = \mathrm{DIST}(d_i(l), d_i(j))$, and the inconsistency cost of the system on $d_i(j)$ is

$$\mathrm{INCON}(d_i(j)) = \sum_{k=1}^{n} \mathrm{INCON}_k(d_i(j)).$$

The total inconsistency cost of the system on $d_i$ up to the $m$th update of $d_i$, denoted $\mathrm{INCON}(d_i, m)$, is

$$\mathrm{INCON}(d_i, m) = \sum_{j=1}^{m} \mathrm{INCON}(d_i(j)).$$

The total inconsistency cost for the system up to time …
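The inconsistency cost defined above can be computed directly from a record of which version each node knew at each update time. The sketch below is a hypothetical illustration: it assumes every node starts with an initial version numbered 0, stores version values in a dict keyed by version number, and takes `dist` as any function with the properties of DIST.

```python
from typing import Callable, Dict, List, Sequence


def version_inconsistency(j: int, known_at_next_update: Sequence[int],
                          versions: Dict[int, object],
                          dist: Callable[[object, object], float]) -> float:
    """System inconsistency cost on version j of one data item.

    known_at_next_update[k]: number l (<= j) of the latest version that node k
    knows at time t_{j+1}; a node holding version j pays DIST(d(j), d(j)) = 0.
    """
    if any(l > j for l in known_at_next_update):
        raise ValueError("a node cannot know a version newer than j at t_{j+1}")
    return sum(dist(versions[l], versions[j]) for l in known_at_next_update)


def total_inconsistency(history: List[Sequence[int]],
                        versions: Dict[int, object],
                        dist: Callable[[object, object], float]) -> float:
    """Sum over updates j = 1..m; history[j - 1] lists the l for version j."""
    return sum(version_inconsistency(j, known, versions, dist)
               for j, known in enumerate(history, start=1))
```

For a numeric item with `dist = lambda a, b: abs(a - b)`, a node still holding the initial value 0 when version 1 (value 2.0) is superseded contributes 2.0 to the cost of version 1.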
Observe that the characterization of Theorem 3 depends on the total inconsistency cost of each data item (since the W quantities in Theorem 3 depend on these inconsistencies). In some cases, however, the difference between any two versions of $d_i$ is a constant. For example, assume that if a node $k$ does not have the latest version of $d_i$, then it pays a fixed cost (because, say, $k$ makes an erroneous decision), regardless of the version of $d_i$ that $k$ actually has. In this case, the difference between any two distinct versions is a constant. We now characterize when SBD is better than RBD in this case.
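In the constant-difference case the inconsistency part of the SBD cost reduces to counting the nodes that miss an update. The sketch below computes only that part; it is our own illustration, not formula (5), which also accounts for communication cost. It assumes that each non-owner node $k$ misses the single SBD broadcast of a version independently with probability $1 - p_k$ and then pays the constant for that version; the name `sbd_expected_inconsistency` is hypothetical.

```python
from typing import Sequence


def sbd_expected_inconsistency(delta: float, p: Sequence[float], owner: int,
                               num_updates: int = 1) -> float:
    """Expected SBD inconsistency cost when any two distinct versions differ by delta.

    p[k]: connection probability of node k. Each non-owner node k misses the
    single broadcast of a version with probability 1 - p[k] and then pays delta
    for that version; the owner never pays.
    """
    per_version = delta * sum(1.0 - pk for k, pk in enumerate(p) if k != owner)
    return num_updates * per_version
```

Because the expression is linear in the constant difference and in the number of updates, the comparison with RBD reduces to comparing this term against RBD's extra communication cost.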
Theorem 4 Assume that for any node $i$, the difference between two arbitrary distinct versions $d_i(j)$ and $d_i(l)$ is a constant $\delta$. Then the expected system cost of SBD up to time $t$ is:
� � ��� �� � ������ � ����� � � ��� � � � � � � �
��� ��� �
� � � � � � � � �,� � � � � � � � ��� ��� � �
� �(� � � � � (5)
Theorems 2 and 4 enable us to compare the performance of SBD and RBD for given � � , � , and constant difference $\delta$. We identify the ranges of � � , � , and $\delta$ for which SBD outperforms RBD
and vice versa. This is illustrated in Figure 1(b), where �is the angle between the line �� and the axis � � , and is the angle between the line � � and the axis
SBD is better than RBD if and only if the parameters � � , � , and $\delta$ (the constant version difference) denote a point that lies below the shaded plane of Figure 1(b). One implication of this result is that as a point on the ( � � , � ) plane moves farther away from the origin, SBD is better for a wider range of $\delta$ values. This quantifies the intuition that SBD becomes the preferred policy as the communication cost increases.
4 The Adaptive Broadcast Dissemination Policy

In this section we describe the Adaptive Broadcast Dissemination policy. Intuitively, a node $i$ executing the policy behaves as follows. When it receives an update to $d_i$, node $i$ constructs a broadcast message by evaluating the benefit of including in the message each one of the data items in its local database. Specifically, the ABD policy executed by $i$ consists of the following two steps.
(1) Benefit estimation: For each data item in the local database, estimate how much the inconsistency of the system could be reduced if that data item is included in the message.

(2) Message construction: Construct the message, which is a subset of the local database, so that the total estimated net benefit of the message is maximized (the net benefit is the difference between the inconsistency reduced by the message and the cost of the message). Observe that the set of data items to be broadcast may be empty. In other words, when $d_i$ is updated, node $i$ may estimate that the net benefit of broadcasting any data item is negative.
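The message-construction step can be sketched under a simple, hypothetical cost model: a fixed cost `c_msg` per message plus `c_item` per included data item. The paper defines its own message cost elsewhere, so this is an illustration of the net-benefit maximization, not the authors' algorithm. With a linear cost, an item is worth including exactly when its estimated benefit exceeds `c_item`, and the message is sent only if the resulting net benefit is positive; otherwise the empty set is optimal. The sketch treats every local item uniformly, including $d_i$ itself.

```python
from typing import Dict, List


def construct_message(benefits: Dict[int, float], c_msg: float,
                      c_item: float) -> List[int]:
    """Choose the subset of local items that maximizes the net benefit.

    benefits: estimated inconsistency reduction per item id (step 1).
    Assumed cost model (hypothetical): c_msg per message plus c_item per item.
    Returns the chosen item ids; an empty list means "do not broadcast".
    """
    # With a linear cost, an item pays for itself iff its benefit > c_item,
    # so no other nonempty subset can have a larger net benefit.
    chosen = sorted(item for item, b in benefits.items() if b > c_item)
    net = sum(benefits[item] - c_item for item in chosen) - c_msg
    # The empty message has net benefit 0, so broadcast only if net > 0.
    return chosen if net > 0 else []
```

For example, with benefits `{1: 5.0, 2: 0.5, 3: 2.0}`, `c_msg=3.0` and `c_item=1.0`, items 1 and 3 give a net benefit of 2.0, so they are broadcast; with a richer (nonlinear) message cost the selection becomes a genuine optimization problem.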
Each one of the above steps is executed by an algorithm
which is described in one of the next two subsections.
4.1 Benefit Estimation
Intuitively, the benefit to the system of including a data item $d_j$ in a message that node $i$ broadcasts is measured in terms of inconsistency reduction. This reduction depends on the nodes that receive the broadcast, and on the latest version of $d_j$ at each one of these nodes. Node $i$ maintains data structures that enable it to estimate the latest version of $d_j$ at each node. Then the benefit of including $d_j$ in a message that $i$ broadcasts is simply the sum of the expected inconsistency reductions at all the nodes.
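The summation above can be sketched as follows, under stated assumptions: node $k$ hears the broadcast with probability $p_k$; the sender has an estimated probability distribution over the version of $d_j$ that $k$ holds; and the reduction at $k$ is taken to be the stored distance between that version and the sender's latest version (zero if $k$ already has it). The function and parameter names are hypothetical, and the paper's exact estimator is given in subsection 4.1.2.

```python
from typing import Dict, Sequence


def expected_benefit(p: Sequence[float],
                     version_probs: Sequence[Dict[int, float]],
                     dist_to_latest: Dict[int, float], sender: int) -> float:
    """Estimated inconsistency reduction from broadcasting one data item.

    p[k]: connection probability of node k.
    version_probs[k]: sender's estimate of P(node k holds version v), keyed by v.
    dist_to_latest[v]: stored distance between version v and the sender's
        latest version (0 for the latest version itself).
    """
    total = 0.0
    for k, pk in enumerate(p):
        if k == sender:
            continue
        # Expected reduction at k if it hears the message ...
        reduction = sum(prob * dist_to_latest.get(v, 0.0)
                        for v, prob in version_probs[k].items())
        # ... weighted by the probability that k is connected.
        total += pk * reduction
    return total
```

Nodes believed to hold the sender's version contribute nothing, so an item that is already widely disseminated yields little benefit and tends to be left out of the message.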
In computing the inconsistency reduction for a node $k$ we attempt to be as accurate as possible, and we do so as follows. Node $i$ maintains a "knowledge matrix" which stores in entry $(j, k)$ the last version number of $d_j$ that node $i$ received from node $k$ (this version number is called $V_{jk}$), and the time when it was received. Additionally, $i$ saves in the "real history" for each $d_j$ all the versions of $d_j$ that $i$ has "heard" from other nodes, the times at which it has done so, and the nodes from which they were received.⁵ The reason for maintaining all this information is that, in estimating which version of $d_j$ node $k$ has, node $i$ can take into consideration two factors: (1) the last version of $d_j$ that $i$ received from $k$, at time, say, $t$; and (2) the fact that since time $t$ node $k$ may have received updates of $d_j$ through "third-party" messages that were transmitted after time $t$ and "heard" by both $k$ and $i$. Node $i$ also saves, with each version $v$ of $d_j$ that it "heard", the distance (i.e. the inconsistency caused by the version difference) between $v$ and the last version of $d_j$ that $i$ knows; this distance is the parameter needed to compute the inconsistency cost reduction obtained if node $i$ broadcasts its latest version of $d_j$.

⁵ There is a potential storage problem here, which we address, but we postpone the discussion for now.

In subsection 4.1.1 we describe the data structures that are used by a node $i$ in benefit estimation. In subsection 4.1.2 we present $i$'s benefit estimation method.
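The knowledge matrix and the real history can be sketched as follows, together with one plausible way to combine the two factors (1) and (2) into a distribution over the version node $k$ holds. All names are hypothetical, and the estimator is our assumption rather than the method of subsection 4.1.2: each third-party message carrying a version newer than $V_{jk}$ and heard by $i$ after $T_{jk}$ is assumed to have been heard by $k$ independently with probability $p_k$. Keeping the stored distances current when $i$ learns a newer version is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class HeardVersion:
    version: int     # version number of d_j carried by the message
    time: float      # when node i heard it
    sender: int      # node that broadcast it
    distance: float  # DIST to the latest version of d_j that i knows


@dataclass
class NodeKnowledge:
    # knowledge[(j, k)] = (V_jk, T_jk): last version of d_j received from k, and when
    knowledge: Dict[Tuple[int, int], Tuple[int, float]] = field(default_factory=dict)
    # history[j]: every version of d_j that i has heard (the "real history")
    history: Dict[int, List[HeardVersion]] = field(default_factory=dict)

    def record(self, j: int, msg: HeardVersion) -> None:
        """Store a heard version of d_j and update the knowledge matrix."""
        self.history.setdefault(j, []).append(msg)
        self.knowledge[(j, msg.sender)] = (msg.version, msg.time)

    def estimate_version(self, j: int, k: int, p_k: float) -> Dict[int, float]:
        """Distribution of the newest version of d_j held by node k (sketch)."""
        base, t = self.knowledge.get((j, k), (0, float("-inf")))
        # Count third-party messages heard after t with a version newer than base.
        counts: Dict[int, int] = {}
        for m in self.history.get(j, []):
            if m.time > t and m.sender != k and m.version > base:
                counts[m.version] = counts.get(m.version, 0) + 1
        dist: Dict[int, float] = {}
        missed_newer = 1.0  # P(k missed every message carrying a newer version)
        for v in sorted(counts, reverse=True):
            miss_v = (1.0 - p_k) ** counts[v]
            dist[v] = missed_newer * (1.0 - miss_v)
            missed_newer *= miss_v
        dist[base] = dist.get(base, 0.0) + missed_newer
        return dist
```

With no third-party traffic after $T_{jk}$, the estimate collapses to certainty that $k$ still holds $V_{jk}$; each additional third-party message shifts probability toward newer versions.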
4.1.1 Data Structures
(1) The Knowledge matrix: For each data item $d_j$ ($j \ne i$), denote by $V_{jk}$ the latest version number of $d_j$ that $i$ received from $k$, and denote by $T_{jk}$ the last time at which $i$ received $d_j$ from $k$. The knowledge matrix at node $i$ is: